Database Selection & Design (Part II)
— Relational Properties (ACID) —
In the last part in this series, we reviewed the different key databases based on their structural categorization, properties and types. Let us closely look at the relational properties. We will review the ACID properties in this section and will review other properties in the upcoming sections.
Transactional Management (ACID) Requirements
ACID stands for Atomicity, Consistency, Isolation and Durability. This applies to the Relational databases. The corresponding term for Non-Relational database is BaSE, which stands for Basically Available, Soft State and Eventually Consistent.
Atomicity (Serializable transaction):
What is Atomicity? Atomicity requires that each transaction be “all or nothing”. If any part of the transaction fails, the entire transaction fails and the database state is left unchanged. The definition of what constitutes an atomic transaction is decided by its context or the environment in which it being implemented.
For example, in an online airline-booking system, a booking may consist of 3 separate actions that together form a transaction — paying for the seat, reserving the seat & removing from inventory. Business logic dictates that all these actions, though distinct and separate actions, must occur together. If one happens without the other, problems could occur. For example, the system may reserve the same seat for two separate customers OR reserving a seat without charging the customers, etc. The diagram below indicates that if any part of the transaction fails, the entire transaction is marked as failed and the state is left unchanged.
It is essential that a database system that claims to offer atomicity be able to do so even in the face of failure in power supply or the underlying operating system or application that uses the database. This can be achieved by complex mechanisms such as journaling or logging or CAS (Compare and Swap), or via operating-system calls.
- Logging is a general technique for achieving all-or-nothing atomicity. This is widely used in databases and can achieved with reasonable performance, as it stores very minimal information possible. Writes are always fast as it is sequential. Reads can be fast with cell storage. Write-ahead logging (WAL — Golden Rule), makes it safe to update cell storage, Recovery protocol, undo losers / redo winners. Undo helps with the atomicity and Redo helps with the Durability property. This concept has been detailed out in the previous part of this series. Please refer to that for in-depth understanding of logging and recovery.
- Journaling file system is a file system that keeps track of changes not yet committed to the file system’s main part by recording the intentions of such changes in a data structure known as a “journal”, which is usually a circular log. In the event of a system crash or power failure, such file systems can be brought back online more quickly with a lower likelihood of becoming corrupted. There are two types: Physical Journaling and Logical Journaling. A physical journal logs an advance copy of every block that will later be written to the main file system. A logical journal stores only changes to file metadata in the journal, and trades fault tolerance for substantially better write performance
- Shadow pages, also called as Copy-on-write, keeps updated copies of all modified entries on disk but retain old pages. It is very easy to roll back, just by aborting and by reverting back to shadow page.
Atomicity can also be achieved via patterns like two-phase commits and three-phase-commits, which is detailed out later in this series.
There is a significant confusion regarding Consistency and how it is different from Atomicity. Consistency ensures that any transaction will bring the database from one valid state to another. While Atomicity focuses on “all or nothing” for transactions, Consistency guarantees the following:
- Operations in transactions are performed accurately, correctly, and with validity, with respect to application semantics
- Any transactions started in the future necessarily see the effects of other transactions committed in the past
- Database constraints are not violated, particularly once a transaction commits
In the reservation example from above, assume we have 10 seats available for customers to book from. At this point, the inventory database has a count=10. Since the customer has not initiated any reservation yet, his count will be 0. Across customer and Inventory database, the total count is 10 and is considered a valid state across.
- Now, let us say, the customer initiates the reservation, the seat is added to the customer count as 1. At the same time, there should be a seat reduced from the inventory database, making it a count of 9. The total count across customer and inventory still maintains as 10, which is a valid state as before. This becomes an acceptable transaction.
- If for some reason, after adding the seat to customer and before removing it from inventory, there is a failure situation, then the total count between customer and inventory database adds up to 11, which is not a valid state.
- If a failure occurs after removing count from inventory and before adding the seat to customer then the total count between customer and inventory database drops to 9, which is again not a valid state.
This is how the transaction integrity is maintained. This is just one aspect of consistency, which resonates very closely with atomicity. Now let us take the situation with multiple customers trying to make the reservation for these open seats. It is imperative that the transaction started in the future should see the effects of the valid transactions that completed in the past. Otherwise, you might book 10 available seats to 50 customers.
Also, Constraints make it possible to further restrict the domain of an attribute. For instance, a constraint can restrict a given integer attribute to values between 1 and 10. Constraints provide one method of implementing business functions in the database and support subsequent data use within the application layer.
A transaction can be perfectly atomic, but may not be consistent. In the example above, assume that the customer cancels the seat and deletes the profile. When both succeeds, it becomes a perfectly atomic transaction. Since this action violated the database constraints by leaving the foreign key entries (pending travel/booking), it lost its consistency.
Types of Consistency: On a very high level, there are 2 types of consistencies. Strong Consistency and Weak Consistency. Weak Consistency is also called Eventual Consistency, as it happens over time.
- Strong Consistency: After the updates, any subsequent access will return the same updated value. It offers up-to-date data but at the cost of high latency.
- Weak Consistency: After the updates, it is not guaranteed that subsequent access will return the same updated value. Eventual consistency (aka lazy propagation) is also a form of weak consistency. It offers low latency but may reply to read requests with stale data since all nodes of the database may not have the updated data.
In NoSQL world, Tunable Consistency is also becoming popular to provide more control to the Engineers. We will see more (patterns) about consistency in the upcoming parts of this series.
Isolation (Concurrency Control):
Concurrency control can be Optimistic control or Pessimistic control.
- Pessimistic Control: This involves locking of the Tuple (a row) to acquire exclusive locking and to perform the required operations. This method has its disadvantages that comes with deadlocks, failure of the process which procured the lock that can bring down the system.
- Optimistic Control: This involves versioning. As diagram below depicts, when two services try to read the same state of a data and try to update at different time, the versioning kicks in and rejects the outdated one by comparing the versioning of the data being updated.
Isolation Read Phenomenon: There are three different read phenomena when Transaction 1 reads data that Transaction 2 might have changed.
- Dirty Read: A dirty read (aka uncommitted dependency) occurs when a transaction is allowed to read data from a row that has been modified by another running transaction and not yet committed. This creates issues when the uncommitted transaction is rolled back for some reason, the read that happened in between is not valid any more
- Non-Repeatable Read: A non-repeatable read occurs when, during the course of a transaction, a row is retrieved twice and the values within the row differ between reads.
- Phantom Read: A phantom read occurs when, in the course of a transaction, new rows are added or removed by another transaction to the records being read.
To address these issues, we have the following Isolation levels that are available for us to choose to adapt to the application we are designing. The default isolation level of different DBMS’s varies quite widely.
- Read Uncommitted (No Lock): This is the lowest isolation level. In this level, dirty reads are allowed, so one transaction may see the uncommitted changes made by other transactions
- Read Committed (Write Lock): This isolation level is bit stronger than the previous level. In this level, a lock-based concurrency control implementation with write locks (acquired on selected data) is established until the end of the transaction, but read locks are released as soon as the SELECT operation is performed (so the non-repeatable reads phenomenon can occur in this isolation level). By this, it simply restricts the reader from seeing any intermediate, uncommitted, ‘dirty’ read
- Repeatable Read (Read & Write Lock): This isolation level is bit stronger than the previous level. In this isolation level, a lock-based concurrency control implementation keeps read and write locks (acquired on selected data) until the end of the transaction. However, range-locks are not managed, so phantom reads may occur
- Serializable: This is the highest isolation level. With a lock-based concurrency control implementation, serializability requires read and write locks (acquired on selected data) to be released at the end of the transaction. Also range-locks must be acquired when a SELECT query uses a ranged WHERE clause, especially to avoid the phantom reads phenomenon
- Snapshot: This level of isolation is a guarantee that all reads made in a transaction will see a consistent snapshot of the database, and the transaction itself will successfully commit only if no updates it has made conflict with any concurrent updates made since that snapshot.
There are many organizations and lines of business, like financial and health services in the industry whose very existence depends on the information that is stored in their databases. The durability of their data and their ability to recover in case of any failure makes a difference in the industry. Loosing data might be very undesirable for their organization and to their customers.
In database systems, durability is a property that guarantees the transactions that have committed will be persisted permanently. The committed data will not be lost at any cost. During the data read/write process and when data is in transit between your volatile and non-volatile layer, there could be multiple possibilities for your data to be lost due to failures are different stages. This can be achieved by flushing either the actual transaction or the transaction’s log records to non-volatile storage before acknowledging commitment.
Durability is implemented by writing transactions into a transaction log that can be reprocessed to recreate the system state to a valid state. A transaction is deemed committed only after it is entered in the log and the log entries flushed to the disk. The REDO operation of the Write Ahead Logging (explained above) is used to improve the durability of the databases. This has been explained in detail in the next part of this series.
The ACID property is a crucial parameter to be considered during the selection and design process of your application. The nature of your business requirements dictates the need for these properties and how it needs to implemented. Applications with strong OLTP requirements uses these concepts more than applications with OLAP requirements. As the industry moved from relational to non-relational databases, the strict regulations around ACID properties were relaxed to come up with a new property type called BaSE, which will be discussed in the later part of this series.