Database Selection & Design (Part X)
— Security Options and Data Temperature —
Database Security:
In the last several years, the number of data breaches has risen considerably, and the damage a breach causes to an organization and its customer base can be irreversible, so Personally Identifiable Information (PII) must be secured and protected. The core information of any organization is stored in the databases behind the applications and platforms it exposes for customers' consumption, and when malicious attacks happen, hackers target that database content either directly or through application vulnerabilities. Organizations must also deal with an increasing number of regulations and penalties for data breaches, such as the General Data Protection Regulation (GDPR), some of which are extremely costly. Effective database security is therefore key to remaining compliant, protecting an organization's reputation, and keeping its customers. The different layers of security are described below:
Database Level Security:
Encryption: This is the most commonly used form of data protection and, when implemented correctly, the strongest. It uses complex algorithms to convert plaintext into unreadable ciphertext, which can only be decoded by a consumer holding the appropriate keys. The different forms are listed below, followed by a short encryption sketch:
- Network Encryption: This mode protects data as it travels, but leaves data in the clear at either end of the transmission
- Transparent Encryption: This mode protects data at rest, decrypting it transparently when accessed by authorized users
- Persistent Encryption: This mode protects data regardless of where it is stored or copied, providing maximum protection against inappropriate use.
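As a minimal illustration of application-level encryption, the sketch below uses the Fernet symmetric cipher from Python's third-party cryptography package. Key management, which is arguably the hard part, is out of scope here and a throwaway key is generated instead:

```python
from cryptography.fernet import Fernet

# In production the key would live in a key management service;
# here we generate a throwaway key. Losing the key means losing the data.
key = Fernet.generate_key()
cipher = Fernet(key)

ciphertext = cipher.encrypt(b"4111-1111-1111-1111")  # unreadable without the key
plaintext = cipher.decrypt(ciphertext)               # only key holders can do this
print(plaintext)  # b'4111-1111-1111-1111'
```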
Tokenization: Like encryption, this is a reversible process that replaces sensitive data with data that can't be used by unauthorized parties. While encryption uses algorithms to generate ciphertext from plaintext, tokenization replaces the original data with randomly generated characters in the same format (token values). Relationships between the original values and token values are stored on a token server. When a user or application needs the original data, the tokenization system looks up the token value and retrieves it. Tokenization is often used to protect credit card numbers and other sensitive information in payment processing systems, customer service databases, and other structured data environments. However, length- and format-preserving encryption can address the same use cases, often with less complexity.
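A minimal sketch of the token-server idea, using a hypothetical in-memory TokenVault class as a stand-in for a real, hardened token store:

```python
import secrets
import string

class TokenVault:
    # Maps token values back to originals. A real token server would
    # persist this mapping in a hardened, access-controlled store and
    # handle the (unlikely) case of token collisions.
    def __init__(self):
        self._vault = {}  # token -> original value

    def tokenize(self, value: str) -> str:
        # Generate a random token in the same format (digits stay digits,
        # letters stay letters) so downstream systems keep working.
        token = "".join(
            secrets.choice(string.digits) if ch.isdigit()
            else secrets.choice(string.ascii_letters) if ch.isalpha()
            else ch
            for ch in value
        )
        self._vault[token] = value
        return token

    def detokenize(self, token: str) -> str:
        # Only authorized callers should ever reach this lookup.
        return self._vault[token]

vault = TokenVault()
token = vault.tokenize("4111-1111-1111-1111")   # e.g. "7302-9841-5526-0874"
original = vault.detokenize(token)              # "4111-1111-1111-1111"
```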
Masking: It is essentially permanent tokenization. Sensitive information is replaced by random characters in the same format as the original data, without a mechanism for retrieving the original values. This is a common practice in test environments, which require realistic-looking data but cannot be populated with actual customer or employee data. Masking can also be used to control access to sensitive data based on entitlements. This approach, known as dynamic data masking, allows authorized users and applications to retrieve unmasked data from a database, while providing masked data to users who are not authorized to view the sensitive information.
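A minimal sketch of dynamic data masking, assuming a hypothetical entitlement flag. The stored value is never altered; only the view returned to the caller changes:

```python
def mask_card(card_number: str) -> str:
    # Replace every digit except the last four, keeping the original format.
    masked = "".join("X" if ch.isdigit() else ch for ch in card_number[:-4])
    return masked + card_number[-4:]

def read_card(card_number: str, user_is_authorized: bool) -> str:
    # Dynamic data masking: the database row is unchanged; the caller's
    # entitlement decides which view they receive.
    return card_number if user_is_authorized else mask_card(card_number)

print(read_card("4111-1111-1111-1111", user_is_authorized=False))
# XXXX-XXXX-XXXX-1111
```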
Redaction: It is the process of permanently removing sensitive data, the digital equivalent of blacking out text in a printed document. It can be accomplished by simply deleting content or by replacing critical characters with special characters such as asterisks. Automated data redaction is an effective method of eliminating sensitive data from documents, spreadsheets, and other files without altering the remaining file contents. Organizations often adopt this approach to prevent the spread of sensitive information that has been extracted from a database and saved on file servers, laptops, or desktops.
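A minimal redaction sketch using regular expressions for two common PII shapes. The patterns here are illustrative; real redaction tools use much richer detection (named-entity recognition, checksums, context rules):

```python
import re

# Illustrative patterns for US social security numbers and email addresses.
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b")

def redact(text: str) -> str:
    # Irreversible: unlike tokenization or masking, there is no vault or
    # unmasked view from which to recover the original characters.
    text = SSN.sub("***-**-****", text)
    text = EMAIL.sub("[REDACTED EMAIL]", text)
    return text

print(redact("Contact jane@example.com, SSN 123-45-6789."))
# Contact [REDACTED EMAIL], SSN ***-**-****.
```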
Access Level Security:
This depends on organizational decisions about who gets access to which data in the database, managed via an Access Control List (ACL) or permission-based access. It is mostly the responsibility of the Database Administrator or the Business Owner controlling the data.
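A minimal sketch of permission-based column access, assuming a hypothetical role-to-columns ACL. Real databases enforce this natively with GRANT/REVOKE and row- or column-level security:

```python
# Hypothetical ACL: each role maps to the columns it may read.
ACL = {
    "support_agent": {"customer_id", "name", "email"},
    "billing_admin": {"customer_id", "name", "email", "card_number"},
}

def check_read(role: str, requested: set) -> set:
    # Deny by default: unknown roles get an empty set of columns.
    allowed = ACL.get(role, set())
    denied = requested - allowed
    if denied:
        raise PermissionError(f"{role} may not read: {sorted(denied)}")
    return requested

print(check_read("support_agent", {"name", "email"}))  # allowed
# check_read("support_agent", {"card_number"})         # raises PermissionError
```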
Perimeter Level Access:
This utilizes security controls at the perimeter, such as firewalls, network segmentation, and VPN access, that indirectly control access to the underlying database structure and information.
Backup & Restore:
Storing copies of backups in a safe, off-site location protects you from potentially catastrophic data loss. Backing up is often the only way to recover your data after such an event, and it is also very critical from an audit and compliance standpoint. There are many ways to accomplish this; based on the data temperature requirements of your application, you can define the backup strategy. A few terms to note here:
Offline Backup: The process of backing up your database after stopping all traffic from the application side. This guarantees data integrity during the backup procedure and avoids data loss
Online Backup: The process of backing up your database without stopping the traffic. There is a potential for missing some in-flight changes during the procedure, which will be picked up by subsequent backup procedures
Incremental Backup: This is also called a delta backup, where only the data added since the last backup is backed up, as opposed to a full backup. It has to be used in conjunction with a full backup: the interval between two full backups is covered by these incremental backups
Log Backup: A backup of the transaction log that includes all log records that were not backed up in a previous log backup
Restore: All backup procedures should be tested on a periodic basis to avoid surprises at the time of a disaster. During recovery, the full backup needs to be restored first, and then all incremental (delta) backups are applied on top of it to reach a fully valid state, as in the sketch below.
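A minimal sketch of that restore ordering, with plain dictionaries standing in for hypothetical full and incremental backup sets. Real tooling (e.g., pg_basebackup plus WAL replay in PostgreSQL) implements the same idea:

```python
# Hypothetical backup sets: one full backup, then daily deltas.
full_backup = {"k1": "v1", "k2": "v2"}
incrementals = [
    {"k2": "v2-monday"},   # Monday's changes
    {"k3": "v3-tuesday"},  # Tuesday's changes
]

def restore(full: dict, deltas: list) -> dict:
    state = dict(full)      # 1. restore the full backup first
    for delta in deltas:    # 2. apply incrementals in chronological order
        state.update(delta)
    return state

print(restore(full_backup, incrementals))
# {'k1': 'v1', 'k2': 'v2-monday', 'k3': 'v3-tuesday'}
```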
In the case of distributed systems, there are other procedures to mitigate this challenge: replication, CQRS, event sourcing, and other patterns can in most cases be used as a backup procedure. Special care needs to be taken about the level of physical separation you want to create in these cases; having everything in one rack is not a foolproof design, whereas having data replicated across data centers helps tremendously. There is obviously a trade-off that comes into play: these long-distance patterns increase latency across the network. There are also many third-party software products available to support backup and restore procedures.
Data temperature requirements to support compliance:
Data temperature is used to evaluate the type of data retention required for your application. Different applications and organizations need to maintain different data temperatures to adhere to organizational policies and audit requirements. The types available are listed below, with a simple tiering sketch after the list:
- Hot: Represents data cached in memory for quicker access and performance
- Warm: Includes data stored on drives for easy access and scalability
- Cold: Data stored on tape drives for longer retention periods
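A minimal sketch of routing records to a storage tier by age. The cutoffs are hypothetical and would come from your own access patterns and retention requirements:

```python
from datetime import datetime, timedelta, timezone

def storage_tier(last_accessed: datetime) -> str:
    # Hypothetical policy: tier purely by time since last access.
    age = datetime.now(timezone.utc) - last_accessed
    if age < timedelta(days=7):
        return "hot"    # in-memory cache for fast access
    if age < timedelta(days=365):
        return "warm"   # standard drives
    return "cold"       # tape or archival storage for long retention

now = datetime.now(timezone.utc)
print(storage_tier(now - timedelta(days=3)))    # hot
print(storage_tier(now - timedelta(days=900)))  # cold
```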
Link to the next part in this series:
https://medium.com/@f5sal/database-selection-design-part-xi-346c74291d3d