Database Selection & Design (Part IV)
— Non-Relational Properties —
Audience: The article covers the Non-Relational Properties (Distributed Computing, CAP Theorem and PACELC Theorem) and its details. These concepts will be referred in the later parts of this series during the database selection and design procedures.
Proficiency Level: Intermediate to Expert
In the previous parts, we reviewed the relational properties and the internals of the ACID implementation. In this part, let us take a look at the non-relational properties and how they are different from the relational properties. Let’s start with reviewing what distributed database computing is and its taxonomy.
What is distributed computing?
Distributed computing is the process of managing data across multiple interrelated servers (nodes) across many data centers and locations over a computer network. This improves the performance at end-user worksites by allowing transactions to be processed on many machines, instead of being limited to one.
Why do we need distributed database computing?
As we have seen in the parts before, with the explosion of data in the current world scenarios, the need for expanding the data storage presented itself. This expansion drove the need for scaling the database infrastructure to accommodate this increase in volume.
How can we implement distributed database computing?
There are two ways you can scale with the storage needs: Horizontally and Vertically.
- Vertical scaling is adding resources to the existing server. This is very easy and since all resources are in one server, it is faster. But, there is an absolute limitation to the amount of resource you can add to your server. So this scaling is not feasible beyond a certain point. Vertical scaling is typically costly. It does not make the system fault tolerant, i.e if you are scaling application running with single server, if that server goes down, your system will go down
- Horizontal scaling is adding more machines into your pool of resources. This is more feasible as there is no limitation to this scaling. This can be applied to both relational and non-relational databases. The techniques used to manage the data across the nodes differs with different databases. Partition / Sharding (which will be covered in a later part) is the key to enable the distributed computing. Advantages include, ability to store large volume of data, speed in retrieving the data due to proximity, fault tolerance as there will be data redundancy via replication. Disadvantages include, complexity with the data management, inconsistencies with the data replication (which will be covered in the later part) process etc.
While both these scaling techniques apply to relational databases, the non-relational databases are built ground up with horizontal scaling in mind, which enables distributing database computing.
When many servers come together for distributed database computing, connecting these servers with an optimal network topology plays a key role in the performance and efficiency. Network topologies are used to connect these multiple nodes together and there are many network topologies (Mesh, Star, Bus, Ring, Hybrid etc). The most commonly used topology in distributed database computing is the ring topology. In this topology, each node is connected to two other nodes in a ring form.
Distributed computing comes with the challenge of adhering to ACID properties. With data in multiple nodes and network connecting them, there is always a probability of delays or failures across network. This prevents the ACID properties from staying intact. In the diagram above, assume we have a user information stored in node 3 and 4. When there is a network failure during the time of an operation, these two nodes will end up with different information, thereby losing the consistency property within ACID. Since these properties are hard to accomplish in the distributing database computing environment, innovation kicked off to live with these compromised properties to come up with a new set of rules called BaSE properties.
Transactional Management (BaSE) Requirements
- Basically Available: This constraint states that the system does guarantee the availability of the data as regards to CAP Theorem (explained below); there will be a response to any request. But the response could still be ‘failure’ to obtain the requested data or the data may be in an inconsistent or changing state
- Soft state: The state of the system could change over time, so even during times without input there may be changes going on due to ‘eventual consistency,’ thus the state of the system is always soft.
- Eventual consistency: The system will eventually become consistent once it stops receiving input. The data will propagate to everywhere sooner or later, but the system will continue to receive input, and is not checking the consistency of every transaction before it moves onto the next one.
CAP Theorem Properties
CAP Theorem: Fundamental theorem for any distributed storage system pertaining to achievable properties
Before looking at the definition of CAP theorem, let’s see what CAP (Consistency, Availability & Partition Tolerance) is.
- Consistency (C): Reads and writes are always executed atomically and are linearizable (in-between strict and sequential consistency, where every operation appears to take place atomically, consistent with the global time order of those operations). All clients who are reading the data from the cluster see the same data at any given point of time.
- Availability (A): Every non-failing node in the system can always accept read and write requests by clients and will eventually return with a meaningful response, without an error message.
- Partition-tolerance (P): It is the tolerance level of your database to keep running with its available nodes and serve the client requests. The system upholds the previously displayed consistency guarantees and availability in the presence of message loss between the nodes or partial system failure. When there is a partition failure, nodes in each side of the partition will continue to serve requests (Read / Write). When this happens, there is a possibility that nodes in each side will end up with different values via different write requests. This is called a “Split-Brain” issue. There are many conflict resolution techniques (in-built in the databases) that will help with this situation. Examples: Revision ID, Timestamp etc.
With this understanding, let us look at the definition of CAP theorem:
It states that a sequentially consistent read/write register that eventually responds to every request cannot be realized in an asynchronous system that is prone to network partitions. In other words, it can guarantee at most two of the above three properties at the same time. Remember, when there is a partition, the failures and delays are inevitable. The below segments explains the possible combinations and how it can / can’t be achieved in relational / non-relational databases.
Availability — Partition Tolerant (AP) Systems:
If you set your system to be Available and Partition Tolerant: When there is a write you make to a certain set of nodes (red) during a time of partition failure, the writes will not be propagated to the set of nodes (blue) across the system partition. In this case, the read performed from those nodes (blue) will not be consistent, or we should say, be eventually consistent. These kinds of systems are called AP (Available and Partition Tolerant) systems. Eg: Cassandra, CouchDB, Dynamo DB, Riak etc
Consistency — Partition Tolerant (CP) Systems:
If you set your system to be Consistent and Partition Tolerant: When there is a write you make to a certain set of nodes (red) during a time of partition failure, the writes will not be propagated to the set of nodes (blue) across the system partition. In this case, if a read is performed in those nodes (blue), they will not respond (unavailable) due to inconsistent data. These kinds of systems are called CP (Consistent and Partition Tolerant) systems. Eg: Mango DB, HBase, Redis etc
Consistency — Availability (CA) Systems:
If you set your system to be Consistent and Available: When there is a write you make to a certain set of nodes, you can access consistently with availability from other nodes. The minute you throw a partition in to the mix, you are now forced into one of the above situations. So, this type is possible only when there is no partition failure in between, which in real world situation is inevitable. These kinds of systems are called CA (Consistent and Available) systems. Eg: RDBMS (Oracle, MySQL, Vertica, Postgres, etc
Remember, when there is a partition, the failures and delays are inevitable.
We listed the CA type last for a reason — in a distributed system, partitions can’t be avoided. So, while we can discuss a CA distributed database in theory, for all practical purposes, a CA distributed database can’t exist. However, this doesn’t mean you can’t have a CA database for your distributed application if you need one. Many relational databases, such as PostgreSQL, deliver consistency and availability and can be deployed to multiple nodes using replication.
The CAP theorem explains how the system behaves when there is a partition. What happens it there is no partition tolerance ?
Please note that the CAP Theorem (explained above) fails to capture the trade-off between latency and consistency during normal operation; it merely tells us whether a system favors availability or consistency in the face of a network partition. On the other hand, the PACELC theorem provides options to see how the system performs during failures and also during normal conditions.
PACELC (passELK) Requirements — An alternate CAP formulation
The X-axis defines the options available when there is network partition. The Y-axis defines parameters when there is no network partition. In case of a Partition, there is an Availability-Consistency trade-off; Else, in normal operation, there is a Latency-Consistency trade-off.
- When there is a network partition, the system chooses availability over consistency, else, choose consistency over latency.
- This is observed in the databases like MySQL and Kafka.
- When there is a network partition, the system chooses availability over consistency, else, choose latency over consistency.
- This is observed in the databases like Amazon’s Dynamo, Facebook’s Cassandra, and Riak databases. These systems employ eventual consistency as is seen in AP systems of the CAP theorem.
- When there is a network partition, the system chooses consistency over availability, else, choose consistency over latency.
- This is observed in the databases like VoltDB/H-Store, MegaStore, BigTable and Hbase, Kafka, Zookeeper.
- When there is a network partition, the system chooses consistency over availability, else, choose latency over consistency.
- This can be seen in the PNUTS database built by Yahoo.
Both CAP and PACELC theorem states that all of Consistency, Availability and Partition cannot be achieved at the same time. So, your selection process has to trade one of these parameters as it fits your use case and your ecosystem.
BaSE properties are not very stringent like the ACID properties. It is all about the trade-offs you have to make between the related properties. These trade-offs come from your business requirements and your organizational policies. While these theoretical concepts provide the direction, there are lot of these properties backed into different databases. When it comes to decisioning, any database evaluation means analyzing the above properties and see how they can help you achieve your business goals.
Happy Reading !