Distributed caches or databases are based on the CAP theorem. CAP stands for Consistency, Availability and Partition Tolerance. Lets see what each of these are and subsequently understand the theorem.
Consistency - This is not the Consistency in ACID in RDBMS world. In NoSQL world, this means that every read will return back the most up-to-date copy of the data and every write will modify the most up-to-date copy of the data or return with an error. i.e. system is giving consistent results no matter what node in the system the request lands up into.
Availability - This means that no requests returns with an error, with out the guarantee that it contains the most recent write.
Partition Tolerance - Network partition gets created when few nodes in the cluster are not able to talk to the other nodes due to some network failures, packet losses etc. Partition Tolerance means that system continues to operate despite an arbitrary no of messages being dropped (or delayed) by the network between the nodes.
Now, CAP theorem says that it is not possible for a distributed system to simulteneously provide more that 2 out of the above 3 guarantees.
Most distributed systems are AP compliant. i.e. the give up on Consistency but ensure Availability and Partition Tolerance.
In presence of a network partition which is quite normal, a request can either be denied service (unavailable) but remaining consistent, or can return results (hence be available) but serviced on an inconsistent copy (created due to n/w partitioning) of data. Thus, you have to choose from either consistency or availability in presence of a network partition.
If there is no network partition, the system is consistent and available.
Consistency - This is not the Consistency in ACID in RDBMS world. In NoSQL world, this means that every read will return back the most up-to-date copy of the data and every write will modify the most up-to-date copy of the data or return with an error. i.e. system is giving consistent results no matter what node in the system the request lands up into.
Availability - This means that no requests returns with an error, with out the guarantee that it contains the most recent write.
Partition Tolerance - Network partition gets created when few nodes in the cluster are not able to talk to the other nodes due to some network failures, packet losses etc. Partition Tolerance means that system continues to operate despite an arbitrary no of messages being dropped (or delayed) by the network between the nodes.
Now, CAP theorem says that it is not possible for a distributed system to simulteneously provide more that 2 out of the above 3 guarantees.
Most distributed systems are AP compliant. i.e. the give up on Consistency but ensure Availability and Partition Tolerance.
In presence of a network partition which is quite normal, a request can either be denied service (unavailable) but remaining consistent, or can return results (hence be available) but serviced on an inconsistent copy (created due to n/w partitioning) of data. Thus, you have to choose from either consistency or availability in presence of a network partition.
If there is no network partition, the system is consistent and available.
Database systems designed with traditional ACID guarantees in mind such as RDBMS choose consistency over availability, whereas systems designed around the BASE philosophy, common in the NoSql movement for example, choose availability over consistency.
No comments:
Post a Comment