NoSQL

Ahsan Ijaz

NoSQL overview

Relaxing consistency guarantee

Ensure high availability
Support updates

Relaxing consistency guarantee

DataBases: "Everyone MUST see the same thing, either old or new, no matter how long it takes." NoSQL: "For large applications, we can't afford to wait that long, and maybe it doesn't matter anyway."

New design space: High scalability, high availability, eventual consistency.

Eventual Consistency

Write conflicts will eventually propogate throughout the system.

In absence of updates, all replicas converge towards identical copies.
What the application sees in the meantime is sensitive to replication mechanisms.
Contrast with RDBMS: Immediate consistency, but maybe deadlocks.

CAP Theorem

Consistency
- Do all applications see the same data?
Availability
- Can I interact with the system in the presence of failures?
Partitioning
- If two sections of your system cannot talk to each other, can they make forward progress on their own?
- If not, you sacrifice availability
- If so, you might have to sacrifice consistency
Conventional databases assume no partitioning
NoSQL systems may sacrfice consistency.

CAP Triangle

NoSQL summary

Features of NoSQL

Ability to horzontally scale
The ability to replicate and partition data over multiple servers
A simple call level interface (in contrast to SQL binding)
Weaker concurrency model than ACID (Atomicity, Consistency, Isolation, Durability)
Efficient used of distributed indexes and RAM
Dynamically add new attributes

Horizontal Scalability VS Vertical Scalability

Horizontal Scalability

Distribute both data and load of operations over many servers, with no RAM or disk shared.

Vertical Scalability

A database system utilizes many cores and/or CPUs that share RAM and disks.

Data Store categories

Key-value Stores: Stores with values and index to find them, based on a programmer defined key.
Document stores: Stores documents. The documents are indexed and a simple query mechanism is provided.
Wide column stores: Store extensible records that can be partitioned vertically and horizontally across nodes.
Relational Databases: These systems store (and index and query) tuples.

Key-value stores

Project Voldemort

Written in Java, Open source, LinkedIn (most contributions).
Multi-version concurrency control (MVCC) for updates.
Replica updates are asynchronous.
No guarantee of consistent data
Guarantee of an up-to-date view if you read a majority of replicas.
Optimistic locking for consisten multi-record updates
- In case of conflict with other process, back out.
Automatic sharding of data.
Consistent hashing for distribution
- data hashed to node K is replicated on node K+1...K+n where n is number of extra copies.
Nodes can be added or removed from cluster; system adopts automatically.
Voldemort automatically detects and recovers failed nodes.

Riak

More functionality than typical key-value store.
Riak objects can be fetched and stored in JSON format and thus can have multiple fields (like documents)
Only support indices on the primary key.
Limits the non-primary fields to storage and fetching as part of a JSON object.
Riak lacks the query mechanisms of the document stores; the only lookup you can do is on primary key.
Replication of objects and sharding by hashing on the primary key.
Replica values to be temporarily inconsistent.
Tunable consistency by specifying how many replicas must respond for a successful read.
Same tuning for successful write.
Different parts of an application can choose different trade-offs
Uses consistent hashing.
No distinguished node to track status of the system.
Gossip protocol to track alive nodes and know what data they have- Any node may service a client request.
Also includes a map/reduce mechanism to split work over all the nodes in a cluster.

Redis

The client side does the distributed hashing over servers.
The servers store data in RAM, but data can be copied to disk for backup or system shutdown.
System shutdown needed to add more nodes.
Like Voldemort, it allows lists and sets to be associated with a key.
Redis does atomic updates by locking.
Replication is asynchronous.

Scalaris

Allows key ranges to be assigned to nodes, rather than simply hashing to nodes.
- A query on a range of values does not need to go to every node.
- Better load balancing depending on key distributions.
Replication is synchronous. (copies must be updated before the operation is complete)
- Data is guaranteed to be consistent.
Supports transactions with ACID properties on multiple objects.
Data is stored in memory, but replication and recovery from node failures provides durability of the updates.
Scalaris reads and writes must go to a majority of the replicas before an operation completes.

Summary of key-value store

	database	basic_operations	store_data	Transactions	Replication	Concurrency	Scalability
1	Voldemort	TRUE	RAM/Disk	FALSE	Asynchronous	MVCC	TRUE
2	Riak	TRUE	5	FALSE	Asynchronous	MVCC	TRUE
3	Redis	TRUE	Both ram and disk	FALSE	Asynchronous	locks	TRUE
4	Scalaris	TRUE	RAM/Disk	TRUE	Synchronous	locks	TRUE
5	Tokyo Cabinet	TRUE	5	TRUE	Asynchronous	locks	TRUE
6	Memcached	TRUE	Both ram and disk	FALSE	Synchronous	locks	TRUE

NoSQL

NoSQL overview

Relaxing consistency guarantee

Relaxing consistency guarantee

Eventual Consistency

CAP Theorem

CAP Triangle

NoSQL summary

Features of NoSQL

Horizontal Scalability VS Vertical Scalability

Horizontal Scalability

Vertical Scalability

Data Store categories

Key-value stores

Project Voldemort

Riak

Redis

Scalaris

Summary of key-value store

Document stores

SimpleDB

CouchDB

MongoDB

Terrastore