System Design Note #04: Relational Database

The history of relational databases stretches back to the 1970s, marking them as a time-tested method for structuring and accessing data. In the digital era, data is the lifeblood of companies, and efficient data storage becomes crucial as the amount of data skyrockets. Today, we will talk about foundational concepts about these types of data storage.

  • Tabular Data Storage and Normalization: Central to relational databases is the concept of storing data in tables. This structure is not just for the organization; it's designed with a process called normalization in mind. Normalization is a systematic approach that decomposes data into the most granular level, thereby preventing duplication. This meticulous organization is crucial for efficient storage and retrieval, reducing redundancy and ensuring that each piece of information is stored only once.
  • Schema-First Approach: Relational databases are built on a schema-first philosophy. This means that before any data can be entered, the database schema, which is the blueprint defining the database structure, must be defined. The schema dictates the tables, the fields within those tables, and the relationships between them. This pre-defined structure allows for robust querying capabilities, ensuring that data can be accessed and reported in a consistent and reliable manner.
  • Use of SQL for Queries and Data Manipulation: The language spoken by relational databases is SQL or Structured Query Language. SQL provides a powerful toolset for querying and manipulating the data within these databases. It allows users to execute a wide range of operations, from simple data retrieval to complex transactions involving multiple tables and conditions.
Example SQL database schema. Source: https://dev.to/duomly/what-is-a-relational-database-and-relational-database-management-system-112f

ACID Properties

The ACID properties are a set of principles that guarantee reliable processing of database transactions. These principles are fundamental to relational database systems and are critical for ensuring data integrity and handling concurrency. ACID stands for Atomicity, Consistency, Isolation, and Durability:

  • Atomicity: This property ensures that all operations within a database transaction are completed successfully. If any part of the transaction fails, the entire transaction fails, and the database state is left unchanged. Imagine a banking system where a fund transfer transaction involves debiting one account and crediting another. Atomicity guarantees that both the debit and credit happen together; if one fails, neither operation will be carried out.
  • Consistency: Consistency ensures that a transaction can only bring the database from one valid state to another. This means that any data written to the database must be valid according to all defined rules, including constraints, cascades, and triggers. If a transaction results in a violation of any of these rules, the transaction is rolled back. For example, if a transaction attempts to insert a row with a duplicate primary key, the transaction will be rejected to maintain consistency.
  • Isolation: Isolation means that the concurrent execution of transactions results in a system state that would be obtained if transactions were executed serially. In other words, transactions should not interfere with each other even when they are executed at the same time. If one transaction is updating data while another reads it, isolation ensures that the second transaction will see either the state of the data before or after the update, but not an intermediate state. This is often implemented using various locking mechanisms or multiversion concurrency control (MVCC).
  • Durability: Durability assures that once a transaction has been committed, it will remain so, even in the event of power loss, crashes, or errors. In a banking system, once a fund transfer is confirmed, the transaction is recorded permanently to the database storage. Even if the system crashes immediately after, the changes won’t be lost.
Source: https://www.javatpoint.com/acid-properties-in-dbms

Here’s an example that illustrates all four properties:

Imagine you are shopping online. You add items to your cart and proceed to checkout. When you submit your order, the following transaction begins:

  1. Atomicity: The payment is processed, and the inventory is updated. Both actions must succeed to complete the order. If your payment fails, the inventory update won't occur, and you won't be charged.
  2. Consistency: The database checks if the inventory is sufficient and whether your credit card information meets the required format and funds availability. Only if these checks pass will the transaction proceed.
  3. Isolation: While you are placing your order, many other transactions may be happening concurrently. Isolation ensures that these transactions do not affect the inventory seen by your transaction until it is complete.
  4. Durability: Once the transaction is successful, the results are permanent. Your order details and the updated inventory are stored, and even if the system fails afterward, your order will not be lost.

Pros and Cons

Pros:

  • Complex and Flexible Queries: Relational databases shine when it comes to the complexity and flexibility of the queries they can handle. With a powerful language like SQL, users can perform a wide range of data manipulations and complex joins that may not be possible with non-relational databases. For instance, SQL allows for subqueries, nested joins, and transactions, which can be crucial for complex business logic.
  • Efficient Storage: Thanks to normalization, a process that minimizes redundancy and dependency by organizing fields and table of data, relational databases can store information efficiently. This not only conserves storage space but also ensures that the data is as concise as possible, reducing the number of data-related errors.
  • Natural Data Structure: The tabular format of relational databases is intuitive since it reflects how we tend to organize information naturally – in rows and columns. This makes the design, implementation, and use of relational databases more aligned with how humans perceive and think about data, facilitating easier data entry, storage, and analysis.
  • ACID Transaction Guarantees: The ACID properties (Atomicity, Consistency, Isolation, Durability) of relational databases ensure that all database transactions are processed reliably. This means that the database maintains a state of consistency even in the face of errors, power failures, and other mishaps, which is paramount for any application that requires data reliability and integrity.

Cons:

  • Schema Rigidity: Relational databases require a predefined schema, which makes them less flexible when it comes to adapting to changes. Any significant alteration in data structure demands a corresponding change in the schema, which can be labor-intensive and prone to errors, especially if the database contains a large amount of data.
  • Maintenance and Scaling Challenges: As the amount of data grows, relational databases can become more difficult to maintain due to their strict schema and complex relationships between tables. Scaling out (adding more nodes to a system to handle increased load) can be particularly challenging, as it often requires significant engineering effort and changes to the application logic.
  • Performance Issues with Large-Scale Data: Relational databases are traditionally optimized for ACID properties, but when it comes to handling very large datasets or high throughput applications, they can suffer from performance bottlenecks. Particularly, read operations can become slow, as they may require time-consuming joins across multiple tables. As the data volume grows, the performance can degrade, which may necessitate additional strategies like sharding or caching to maintain performance.

Conclusion

When To Choose a Relational Database

Choosing a relational database is prudent when your data operations require complex, flexible queries to sift through structured data. Relational databases excel at handling operations that involve multi-table joins, transactions, and subqueries, which are often necessary for deep analytical tasks.

Moreover, if your application demands transactional integrity and consistency, the ACID (Atomicity, Consistency, Isolation, Durability) guarantees provided by relational databases are indispensable. They ensure that all your transactions are processed reliably, maintaining data accuracy and reliability, which is crucial for applications like banking systems, inventory management, and any other domain where data consistency is non-negotiable.

When Not To Choose A Relational Database

Conversely, a relational database might not be the ideal choice in situations where the relationships between data points are not the primary concern, and the data does not naturally fit into a tabular format. This is often the case with unstructured data such as text, images, and videos, where a NoSQL database might be more appropriate.

Additionally, if your system's priority is to provide lightning-fast read operations, especially at a large scale, the performance overhead that comes with the rich query capabilities and transactional integrity of relational databases might become a bottleneck. In such cases, simpler key-value stores or document databases might provide the speed required for a better user experience without the overhead of maintaining ACID properties.

In essence, the choice of a relational database should be driven by the specific needs of your data structure, the complexity of your queries, and the necessity of transactional integrity, weighed against the need for scalability and read performance.