A relational database management system (RDBMS)—often referred to as a SQL database because of the Structured Query Language (SQL) used to interact with it—has long been the gold standard for storing transaction-based data. Since the 1970s, RDBMSs have been used to store bank transactions, stock trades, inventory management, and more. While commercial databases like Oracle and SQL Server® are in broad use—along with open-source databases like MySQL and PostgreSQL—some applications have needs beyond the scope of RDBMSs.
The changing types of data generated and stored have led to the need for different types of data stores. Whether it’s streaming internet of things (IoT) data coming from edge devices or graph API data to manage your application’s security model, changing data has led to changes in databases. NoSQL databases have emerged to fill a number of these gaps.
Key Differences Between NoSQL and SQL Databases
Consistency and Durability
One of the main benefits promoted for RDBMSs is transactional consistency. This strict guarantee of consistency causes an RDBMS to suffer a performance penalty compared to NoSQL databases, which generally promise eventual consistency.
Relational databases have a specific set of characteristics to help guarantee data durability and consistency. Collectively, these properties are known as ACID:
- Atomicity: Ensures each transaction is treated as a single unit, which either succeeds or fails completely, not partially
- Consistency: Ensures only successful transactions can change the state of the database
- Isolation: Allows transactions to operate concurrently while using a locking mechanism to maintain data integrity
- Durability: Ensures transactions are logged to secondary media, which will survive in the event of a system failure or crash
However, as a trade-off to performance, RDBMSs boast two other properties: durability and isolation. Durability effectively requires recording any insert/update/delete operation twice: once as the update is made and then into a transaction log stored on durable media. In most modern relational databases, these operations are performed on the actual table data in memory (which is volatile), but the transaction isn’t marked as committed (which means completed) until it reaches the transaction log. This additional write operation can cause additional overhead, especially for many singleton operations, such as a series of delete statements.
The second property designed to affect overall throughput is isolation. This property prevents incomplete transactions from being shown to users who are querying the table. In some scenarios where data operations are inefficient and take too long to complete—or where the calling code handles transactions poorly—subsequent operations can end up waiting until the original operation is complete. Blocking problems like these can cause highly variable performance during query execution. Though blocking problems are usually easy to remediate with index tuning and code reviews, they still represent a risk to the performance of a database application.
Scalability
Besides consistency differences, most NoSQL solutions are designed around a scale-out paradigm, where data is stored across nodes in a cluster for both high availability and increased throughput. Though this is possible in an RDBMS, it requires a data structure designed to be partitioned across servers and can also result in complex query patterns. This schema design would have to be built at the beginning of the application’s life, so this pattern is less common. This allows for a horizontal scaling pattern, which can be more cost-effective than increasing the size of a single server solution like the ones most databases run on.
Schemas
Finally, SQL databases have “rigid” schemas, which means changes to the structure of a table or tables require code changes at both the database and application level. Though some NoSQL databases like CosmosDB have tunable consistency levels and may be used for transactional activity, relational databases are still your best bet for transactional data.
When to Use NoSQL
Although many of the concepts behind NoSQL systems have existed for decades in academic circles, aside from some narrow use cases, they didn’t become common until the 2010s. So what changed to drive this demand? Factors include the rise of DevOps and continuous integration and continuous deployment (CI/CD) processes to allow for more frequent code deployments and more flexibility for data stores. The main characteristics of these NoSQL stores are the ability to scale out across multiple servers, flexible schemas, and built-in data replication between nodes.
However, a bigger driver was the amount of data being logged, most often from mobile devices and applications. Most of these applications output their log records either in classical log record format (timestamp and text string) or, more commonly, in JavaScript Object Notation (JSON) files, as shown in the following example:
JSON data has its own schema, and it wasn’t easily processed by traditional databases. JSON also allows developers more flexibility to add additional data fields for business uses, such as product catalogs. Since most relational databases in the 2010s didn’t query JSON easily, organizations turned to NoSQL databases such as MongoDB and Microsoft DocumentDB, which were optimized for querying these JSON documents.
These new types of databases are just a subset of the data stores that came into use, such as Cassandra—a column family database—and Gremlin or Neo4J, graph databases. There are now many NoSQL stores capable of meeting the specific needs of some part of your application.
While some relational databases—notably, SQL Server and Oracle—have implemented graph and JSON features, they aren’t as robust and fully featured. For example, in SQL Server, JSON data doesn’t have its own dedicated data type or indexes, which can limit the use case for storing large amounts of JSON data there.
Will NoSQL Replace SQL?
Another change shaking the computer industry over recent years is the acceleration of container orchestration platforms like Kubernetes and the rise of the public cloud. Although neither of these trends deals directly with data stores, both make it easier to build an application out of a larger number of components.
Instead of having to provision a virtual machine or acquire a new physical server, developers can simply start up another service or pod in their infrastructure as code deployment packages. This makes it easier to choose the right data store for the right type of data: for instance, the JSON logs from your mobile app can go into a document database, e-commerce transactions can go into a relational store such as PostgreSQL, and your identity and access management can go into a graph database such as Gremlin, all without adding a great deal of cost or burden to your application’s architecture.
Managing Performance in Databases
Whether you use a relational SQL database or a non-relational NoSQL data store, you’ll experience similar concerns about performance and metrics. SolarWinds® Database Performance Monitor (DPM) lets you track and analyze your databases’ most important performance metrics and includes a comprehensive set of NoSQL database performance monitoring tools to help you fix issues faster.
Joey D’Antoni is an ActualTech media contributor and a principal consultant at Denny Cherry and Associates, Microsoft Data Platform MVP, and VMware vExpert.