Database Selection & Design (Part IX)
— Microservices Data Access Patterns —
The database selection process depends on the application architecture. Data management in a monolithic system can get pretty complex; in a microservices architecture, however, it is a completely different ball game. There are different patterns available to help with this complexity, and a few of them are discussed below.
CQRS (Command Query Responsibility Segregation):
In most application scenarios, read and write frequencies differ drastically: the workload tends to be either write-heavy or read-heavy. This pattern separates reads and writes into separate data stores, which decouples the command and query models. The application should not be able to change the contents of the read store directly. With read-only replicas, there is flexibility to place these data stores near the application instances, which can improve performance tremendously.
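To make the split concrete, here is a minimal sketch in Python. The names (OrderCommandService, OrderQueryService, the in-memory stores) and the projection step are illustrative assumptions rather than a prescribed implementation; in a real system the read store would typically be a separate database updated asynchronously, which is where the eventual consistency discussed below comes from.

```python
# Minimal CQRS sketch: separate write and read models (all names are illustrative).
from dataclasses import dataclass, field


@dataclass
class OrderWriteStore:
    """Write side: normalized rows, optimized for transactional updates."""
    orders: dict = field(default_factory=dict)        # order_id -> {"customer_id", "status"}
    order_lines: dict = field(default_factory=dict)   # order_id -> [{"sku", "qty", "price"}]


@dataclass
class OrderReadStore:
    """Read side: denormalized (materialized) view, optimized for queries."""
    order_summaries: dict = field(default_factory=dict)  # order_id -> flat summary document


class OrderCommandService:
    """Handles commands; only this side is allowed to mutate the write store."""
    def __init__(self, write_store: OrderWriteStore):
        self.write_store = write_store

    def place_order(self, order_id: str, customer_id: str, lines: list) -> None:
        self.write_store.orders[order_id] = {"customer_id": customer_id, "status": "PLACED"}
        self.write_store.order_lines[order_id] = lines


def project_to_read_store(write_store: OrderWriteStore, read_store: OrderReadStore) -> None:
    """Rebuilds the materialized view; in practice this runs asynchronously,
    so reads are only eventually consistent with writes."""
    for order_id, order in write_store.orders.items():
        lines = write_store.order_lines.get(order_id, [])
        read_store.order_summaries[order_id] = {
            "customer_id": order["customer_id"],
            "status": order["status"],
            "total": sum(l["qty"] * l["price"] for l in lines),  # pre-joined: no query-time joins
        }


class OrderQueryService:
    """Handles queries; reads only from the read store."""
    def __init__(self, read_store: OrderReadStore):
        self.read_store = read_store

    def get_order_summary(self, order_id: str):
        return self.read_store.order_summaries.get(order_id)


write_store, read_store = OrderWriteStore(), OrderReadStore()
OrderCommandService(write_store).place_order("o-1", "c-42", [{"sku": "A", "qty": 2, "price": 10.0}])
project_to_read_store(write_store, read_store)   # the eventual-consistency window lives here
print(OrderQueryService(read_store).get_order_summary("o-1"))
```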
Pros:
- Read and write schemas can be completely different and can be optimized for read efficiency and write efficiency respectively. By storing a materialized view in the read database, the application can avoid complex joins when querying
- Read and write stores can scale independently. Different database types can also be used for the read and write stores. Example: writes can go to a relational database while reads are served from a non-relational one
- Security can be managed more tightly, as there is a clear separation of responsibilities
- Each view can be complex, with relations, and can be rebuilt in the background
Cons:
- The consistency of data between the two stores must be closely managed. It is always going to be eventual consistency, so stale reads can occur with this design. This pattern differs from caching in where the data is stored: the read store is the primary source for queries, not a temporary layer in front of one
- The design can get complex quickly as you combine this with other patterns such as Event Sourcing
Event Sourcing:
This pattern is designed to store the operations on the data as a sequence of events. These events are recorded in append-only mode and are stored in a persistent event store. Each event represents a set of changes to the data. This event store acts as the system of record (the authoritative data source) for the current state of the data. The application code’s responsibility is to update this event store; what happens after that is not the application’s responsibility, so it is decoupled.
These immutable events are usually published to downstream services for further action. The consumer services take these events and initiate tasks to complete the rest of the operations. This decoupling of the tasks from the events provides flexibility and extensibility. The events can also be published to external systems to notify them about additions or updates. In addition, a materialized view is created to serve the read operations of the application.
This materialized view is built by summarizing the events published by the event store. At any point it is possible for applications to read the history of events and use it to materialize the current state of an entity by playing back and consuming all the events related to that entity.
Storing this information as a history of events serves as a critical piece of information for audit-trail and compliance purposes. Event sourcing is commonly combined with the CQRS pattern: data management tasks are performed in response to the events, and views are materialized from the stored events.
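As a rough illustration of the mechanics described above, the sketch below shows an append-only event store, a subscriber standing in for a downstream consumer or projection, and rehydration of an entity's current state by replaying its events. The event names and the bank-account entity are assumptions made for illustration only; a production event store would add durable storage, versioning, and concurrency control.

```python
# Minimal event-sourcing sketch (illustrative names; no persistence or concurrency control).
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Event:
    entity_id: str
    type: str        # e.g. "AccountOpened", "MoneyDeposited", "MoneyWithdrawn"
    amount: float = 0.0


class EventStore:
    """Append-only system of record; events are never updated or deleted."""
    def __init__(self):
        self._events: List[Event] = []
        self._subscribers = []

    def append(self, event: Event) -> None:
        self._events.append(event)
        for handler in self._subscribers:    # publish to downstream consumers / projections
            handler(event)

    def subscribe(self, handler) -> None:
        self._subscribers.append(handler)

    def events_for(self, entity_id: str) -> List[Event]:
        return [e for e in self._events if e.entity_id == entity_id]


def rehydrate_balance(store: EventStore, account_id: str) -> float:
    """Materializes current state by replaying every event for the entity."""
    balance = 0.0
    for event in store.events_for(account_id):
        if event.type == "MoneyDeposited":
            balance += event.amount
        elif event.type == "MoneyWithdrawn":
            balance -= event.amount
    return balance


store = EventStore()
store.subscribe(lambda e: print("projection sees:", e.type))   # e.g. update a materialized view
store.append(Event("acct-1", "AccountOpened"))
store.append(Event("acct-1", "MoneyDeposited", 100.0))
store.append(Event("acct-1", "MoneyWithdrawn", 30.0))
print(rehydrate_balance(store, "acct-1"))   # 70.0
```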
Use this pattern in the following scenarios:
- When you want to capture intent, purpose, or reason in the data
- When it’s vital to minimize or completely avoid the occurrence of conflicting updates to data
- When you want to record events that occur and be able to replay them to restore the state of a system, roll back changes, or keep a history and audit log
- When using events is a natural feature of the operation of the application and requires little additional development or implementation effort.
- When you need to decouple the process of inputting or updating data from the tasks required to apply these actions. This might be to improve UI performance, or to distribute events to other listeners that take action when the events occur
- When used in conjunction with CQRS, and eventual consistency is acceptable while a read model is updated, or the performance impact of rehydrating entities and data from an event stream is acceptable.
Two or Three Phase Commits:
Unlike a transaction in a local database, a transaction in a distributed database is more complicated, as it involves updating information in multiple databases across the network. The databases must coordinate the commit or rollback of the changes in a transaction as a self-contained unit: either the entire transaction commits, or the entire transaction rolls back. Phased commits are a simple algorithm-based approach to implementing this requirement.
Two Phase Commit: In this approach, a coordinator is required to perform the operation. The client contacts the coordinator and proposes a value. The coordinator then tries to establish consensus among a set of processes (a.k.a. participants) in two phases: Prepare and Commit.
The greatest disadvantage of the two-phase commit protocol is that it is a blocking protocol. If the coordinator fails permanently, some participants will never resolve their transactions: after a participant has sent an agreement message to the coordinator, it blocks until a commit or rollback is received. To overcome this challenge, the three-phase commit protocol was introduced.
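A simplified, single-process sketch of the two phases is shown below. The Participant class and the in-memory coordinator function are assumptions made for illustration; real implementations rely on durable logs, timeouts, and real network messaging, all of which are omitted here. It is precisely the coordinator-failure case this sketch ignores that causes the blocking problem described above.

```python
# Simplified two-phase commit sketch (single process, no timeouts or durable logs).
class Participant:
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "INIT"

    def prepare(self) -> bool:
        """Phase 1: vote; after voting YES the participant blocks until it hears back."""
        self.state = "PREPARED" if self.can_commit else "ABORTED"
        return self.can_commit

    def commit(self) -> None:
        self.state = "COMMITTED"

    def rollback(self) -> None:
        self.state = "ABORTED"


def two_phase_commit(participants) -> bool:
    # Phase 1 (Prepare): collect a vote from every participant.
    votes = [p.prepare() for p in participants]

    # Phase 2 (Commit/Rollback): the decision is all-or-nothing.
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.rollback()
    return False


nodes = [Participant("inventory"), Participant("payment", can_commit=False)]
print(two_phase_commit(nodes))                 # False: one NO vote aborts the whole transaction
print([(p.name, p.state) for p in nodes])
```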
Three Phase Commit: In this approach, any node can take on the coordinator responsibility. This is an extension of the 2-PC approach, where the complete operation happens in three phases: Prepare, Pre-Commit, and Commit. The pre-commit phase handles failure situations and helps us recover from the case where a participant fails, or both the coordinator and a participant fail, during the commit phase.
When the current coordinator fails and another node has to take over, the new coordinator learns the status of the participants before taking any action. If they are in the commit phase, it assumes that the previous coordinator made the decision to commit before crashing, and it can therefore shepherd the protocol to commit. Similarly, if a participant reports that it did not receive a pre-commit message, the new coordinator can assume that the previous coordinator failed before it even started the pre-commit phase. It can therefore safely assume that no other participant has committed the changes and safely abort the transaction.
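Purely to contrast the message flow with 2-PC, the sketch below adds the pre-commit phase. The participant and coordinator are again illustrative assumptions, and the timeout and coordinator-takeover logic that actually makes 3-PC non-blocking is not shown here.

```python
# Simplified three-phase commit sketch: CanCommit -> PreCommit -> DoCommit.
# Timeouts and coordinator takeover (the parts that make 3-PC non-blocking) are omitted.
class Participant3PC:
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "INIT"

    def can_commit_vote(self) -> bool:     # Phase 1: CanCommit
        self.state = "READY" if self.can_commit else "ABORTED"
        return self.can_commit

    def pre_commit(self) -> None:          # Phase 2: PreCommit
        self.state = "PRE_COMMITTED"       # a new coordinator seeing this state can decide to commit

    def do_commit(self) -> None:           # Phase 3: DoCommit
        self.state = "COMMITTED"

    def abort(self) -> None:
        self.state = "ABORTED"


def three_phase_commit(participants) -> bool:
    if not all(p.can_commit_vote() for p in participants):
        for p in participants:
            p.abort()
        return False
    for p in participants:
        p.pre_commit()
    for p in participants:
        p.do_commit()
    return True


nodes = [Participant3PC("inventory"), Participant3PC("payment")]
print(three_phase_commit(nodes))               # True: all three phases complete
print([(p.name, p.state) for p in nodes])      # both COMMITTED
```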
Saga Pattern:
This pattern was defined in 1987. It helps solve data-handling issues in the microservices world, and it comes into play when the 2-PC or 3-PC patterns cannot be applied. There are several ways to implement the Saga pattern; the two most commonly used modes are Choreography and Orchestration.
SAGA — Choreography: A sequence of steps/events is choreographed to perform the end-to-end transaction with atomicity. In this scenario, the services talk to each other to coordinate the events.
With this approach, all services involved talk to each other via events. Keeping the coordination intact quickly becomes complicated as the number of participating services increases. Rollback also becomes complicated, as the application needs to track the sequence in the correct order to roll it back without violating transaction management.
SAGA — Orchestration: In order to address the complexity mentioned above, a process manager can be added to the mix whose sole responsibility is to take care of the orchestration. The orchestrator communicates with each of the services and makes sure the transaction is either completed successfully or rolled back completely. It is easier to maintain, as the transaction complexity remains linear as new services are added. It also avoids cyclic dependencies between the services.
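A stripped-down orchestrator might look like the following sketch. The step names and compensation functions are illustrative assumptions; in practice each step would call a separate service (often through a message broker, as noted below) and the orchestrator would persist its own state so it can resume after a crash.

```python
# Minimal saga-orchestration sketch: run steps in order; on failure, compensate in reverse.
class SagaStep:
    def __init__(self, name, action, compensation):
        self.name = name
        self.action = action              # forward local transaction in one service
        self.compensation = compensation  # compensating transaction that undoes it


class SagaOrchestrator:
    def __init__(self, steps):
        self.steps = steps

    def execute(self) -> bool:
        completed = []
        for step in self.steps:
            try:
                step.action()
                completed.append(step)
            except Exception as exc:
                print(f"step '{step.name}' failed ({exc}); compensating...")
                for done in reversed(completed):   # undo already-completed steps in reverse order
                    done.compensation()
                return False
        return True


# Illustrative steps for an order saga (each would normally call a different service).
def reserve_stock():   print("stock reserved")
def release_stock():   print("stock released")
def charge_payment():  raise RuntimeError("card declined")
def refund_payment():  print("payment refunded")

saga = SagaOrchestrator([
    SagaStep("reserve-stock", reserve_stock, release_stock),
    SagaStep("charge-payment", charge_payment, refund_payment),
])
print("saga committed:", saga.execute())   # compensations run, then False
```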
The Saga pattern is usually used together with a message broker (such as Kafka or RabbitMQ) for better performance, reliability, and scalability.
Pros: The main benefit of the Saga Pattern is that it helps maintain data consistency across multiple services without tight coupling. This is an extremely important aspect for a microservices architecture.
Cons: However, the main disadvantage of the Saga pattern is its complexity from a programming point of view. Developers are not as accustomed to writing sagas as they are to traditional transactions. The other challenge is that compensating transactions also have to be designed to make sagas work.
Link for the next part in this series:
https://medium.com/@f5sal/database-selection-design-part-x-5b27cac11915