Database Selection & Design (Part I)
— Introduction to Database Technologies —
Database is the backbone of any technology enterprise, to store information in a structured set, and for efficient management of information. Database has evolved to a greater extend in the last 50+ years from Hierarchical, Network, Relational, Object Oriented, No-SQL, New-SQL to HTAP (Hybrid transaction/analytical processing).
Before Database’s era, things were being tracked as hard copies in journals, files, cabinets, ledgers and archives of paper. Real estates were dedicated to store just this information. Scanning through this to locate a record is a mundane task, which can become complicated when records were misplaced. Data loss was prevalent during catastrophic situations like fire, flood and other natural disasters.
To solve the above issues, databases were created. If you step back and think about the evolution of database management systems, below are the different types that supports different varieties of data structure. On a high level, they are classified as below:
Let us look at these types and the sub categories in detail. They can also be referred as SQL and NO-SQL databases.
Pre — Relational (SQL) Databases:
File based System:
The is one of the very first file based databases that involves data tabulated via hole punches in paper cards, may be considered the first computerized flat-file database. This concept of storing information in a flat format evolved overtime and became a flat-file database in 1960s. In this type, records follow a uniform format, and there are no structures for indexing or recognizing relationships between records. The file is simple and a flat file can be a plain text file, or a binary file.
One thing that flat file failed to establish was the relation with other records that it co-existed with.
Then came the hierarchical database system in 1968. A hierarchical database is a data model in which the data are organized into a tree-like structure. The data are stored as records, which are connected to one another with a parent child relationship. A record is a collection of fields, with each field containing only one value. IBMs Information Management Systems (IMS) database was a classic example of this type. The hierarchy model is simple to understand and most of the complexity was backed into the application logic.
The issue with the hierarchical database is that it cannot easily represent relationships between types of data, except for the parent-child linkage.
This issue hierarchical database had was overcome by the Network database. In 1969, scientists at the Conference on Data Systems Languages (CODASYL) released a publication that described the network model. A network database is a type of database model wherein multiple member records or files can be linked to multiple owner files and vice versa. The model can be viewed as an upside-down tree where each member information is represented by the branch linked to the owner, which is the bottom of the tree. This model allows relationships, and it has a “schema” (a diagrammatic representation of the relationships).
A relational database is a type of database that stores and provides access to data points that relate to one another. Relational databases are based on the relational model, an intuitive, straightforward way of representing data in tables.
- Very efficient in handling highly structured data
- String ACID (Atomicity, Consistency, Isolation, and Durability) properties
- Data is easily stored and retrieved using SQL queries. The structure can be scaled up quickly because adding data without modifying existing data is simple.
- Well-suited to applications that require tiered access, since RDBMS has inbuilt structure of access controls. i.e.. customers could view their accounts while company representatives could both view and update.
- Very difficult to handle and support unstructured data
- Cost is a consideration with relational databases, as they tend to be more expensive to set up and grow
- Representing real world entities in context is difficult in the bounds of an RDBMS. “Sliced” data has to be reassembled from tables into something more readable, and speed can be negatively impacted. The fixed schema doesn’t react well to change, either.
- The structure of relational database is well suited for vertical scaling. Horizontal scaling is usually faster and economical. Sharding (in which data is horizontally partitioned and distributed across a collection of machines) is necessary to scale out a relational database. Sharding relational databases while maintaining ACID compliance can be a challenge
Popular examples of standard relational databases include Microsoft SQL Server, Oracle Database, MySQL and IBM DB2. These types of databases are used in top companies like Wells Fargo, Chase and Capital One. Real world use cases Banking and Financial Services.
Post — Relational (NoSQL) Databases:
When the structure of the data became too unpredictable and the volume is exploding with the Internet of Things, innovations kicked off to find a new set of database types to support them. With that, Non-Relational databases were introduced and there were many different types helps you to handle the different data types. A few of them are discussed below:
Key Value store:
A key-value database, or key-value store, is a data storage paradigm designed for storing, retrieving, and managing associative arrays, and a data structure more commonly known today as a dictionary or hash table.
- This style of database has a lot of positives. It’s incredibly flexible, able to handle a very wide array of data types easily. Keys are used to go straight to the value with no index searching or joins, so performance is high.
- Portability is another benefit: key-value stores can be moved from one system to another without rewriting code. Finally, they’re highly horizontally scalable and have lower operating costs overall.
- Flexibility comes at a price. It’s impossible to query values, because they’re stored as a blob and can only be returned as such. This makes it hard to do reporting or edit parts of values. Not all objects are easy to model as key-value pairs, either.
Popular examples of Key-Value pair databases are Redis, Amazon S3, CouchBase, Riak etc. These types of databases are used in top companies like Twitter, Pinterest, GitHub and Snapchat. Real world use cases include Social Networking and Gaming
A document database is a type of non-relational database that is designed to store and query data as JSON-like documents. Document databases make it easier for developers to store and query data in a database by using the same document-model format they use in their application code. The flexible, semi-structured, and hierarchical nature of documents and document databases allows them to evolve with applications’ needs.
- Document stores are very flexible. They handle semi-structured and unstructured data well. When you don’t know what type of data will be stored during set-up, document stores are a good choice. You can create your desired structure in a particular document without affecting all documents.
- Schema can be modified without causing downtime, which leads to high availability. Write speed is generally fast as well.
- Easy to scale horizontally. Document stores scale out fast and efficiently
- Document databases sacrifice ACID compliance for flexibility. While querying can be performed within a document, it’s not possible across documents
Popular examples of Document store databases are MongoDB, CouchDB, RethinkDB, RevenDB etc. These types of databases are used in top companies like Uber, Lyft and Stack. Real world use cases include Content Platform by leading Magazine publishers.
In this type of database, internally the data is column fashion unlike a record-oriented database. A column-oriented database stores each column continuously, i.e. on disk or in-memory. Each column on the left will be stored in sequential blocks. This reduces IO in access, Sequential access makes it faster.
- This type of database has some benefits of both relational and non-relational databases. It deals better with both structured and semi-structured data than other non-relational databases, and it’s easier to update.
- Compared to relational databases, it’s more horizontally scalable and faster at scale.
- Column databases compress better than row-based systems.
- Large data sets are simple to explore. Wide-column stores are particularly good at aggregation queries
- Writes are expensive in the small. While updating is easy to do in bulk, uploading and updating individual records is hard.
- Slower than relational databases when handling transactions.
Popular examples of Column based databases are Cassandra, Amazon Dynamo DB, HBase, Google Store, etc. These types of databases are used in top companies like Hulu, Instagram, Netflix, Reddit etc. Real world use cases include online video streaming and Financial Banking.
A database that uses graph structures for queries with nodes, edges and properties to represent and store data. The graph relates the data items in the store to a collection of nodes and edges, the edges representing the relationships between the nodes.
Popular examples of Graph databases are Neo4j, InfiniteGraph etc. These types of databases are used in top companies like Facebook, Walmart eBay, adidas etc. Real world use cases include Social Networking and real time recommendations.
Database has grown to great extends in the last 50+ years and will continue to grow based on the ever changing trends in the way of life. Having this information organized and processed on time, gives us the ability to make sense of what is happening around us and the business opportunities it opens up for us to pursue.
When it comes to selecting the right database for your application, there are many more parameters that you need to understand and consider before making your choice. Once you made yours, it becomes very tedious and expensive to change it at a later point in time. So, please take the time to review all parameters required for your organization to operate with.
In this series, I have taken an attempt to review all these properties, how they behave in different conditions, their strengths and weakness along with some recommendations. The properties are explained at the conceptual level, so that you can understand and take the right decision for your organization and for your business processes.
Next part in this series → here
Previous part in this series → Well, this is the first part in the series.