Database Selection & Design (Part V)

— Partition / Sharding —

Audience: The article covers the some of the common properties (Partitioning & Sharding) and its implementation techniques. These concepts will be referred in the later parts of this series during the database selection and design procedures.

  • In case of Masterless configuration, the read request will be send to the nearest node, which then becomes the coordinator node (The coordinator is selected by the driver based on the policy you have set. Common policies are DCAwareRoundRobinPolicy and TokenAwarePolicy. Every node in the masterless configuration will have all information about the partition and the token ranges each node within the partition serves. This coordinator node works with the relevant partition/node(s) to get the data and send it back to the client.
  • Performance: When data is partitioned into smaller chunks and stored in multiple databases, the data access operations is much faster on this smaller volume.
  • Parallelism: Operations affecting multiple partitions can execute in parallel. Parallel implementation offers detailed benefits to optimize resource utilization and lessens the implementation time too
  • Security: Partitioned an be applied based on the sensitivity of your data as well. You can separate sensitive and nonsensitive data into different partitions and apply different security controls to the sensitive data.
  • Availability: Separating data across multiple servers avoids a single-point-of-failure. If any one instance fails, only the data in that partition is unavailable. Operations on other partitions can continue.
  1. Joins across shards: Often times, application / business logic may need data from different shards, which will require programmer to build complex queries. While pulling data from multiple shards, performance and availability concerns needs to be kept in mind.
  2. No Native Support: Sharding is not natively supported by every database engine. With many databases, the algorithms and nuances of sharding needs to be managed explicitly by the programmer.

Conclusion:

There are pros and cons for different types of partition techniques we reviewed above. Consistent hash sharding is better for scalability and preventing hot spots, while range sharding is better for range based queries. A simple comparison of algorithm mentioned below.

Engineering Director, People Leader, Offroader, Handyman, Movie Buff, Photographer, Gardener