Saturday, December 28, 2024

What is a Document Database?

Introduction

With unique capabilities, NoSQL databases overcome the constraints found in the relational database model. NoSQL is an umbrella term that covers four main types of database: key-value stores, document databases, wide-column stores and graph databases.

In this article, we’ll explain what a document database is, describe its benefits and drawbacks, and provide examples.


Document Database Definition

A document database is a type of NoSQL database that stores data as JSON documents instead of rows and columns. JSON is the native format used to both store and query the data. Documents are grouped into collections, and collections together make up a database.


Each document consists of a number of key-value pairs. Here is an example of a document that consists of four key-value pairs:

{
  "ID": "001",
  "Book": "Java: The Complete Reference",
  "Genre": "Reference work",
  "Author": "Herbert Schildt"
}

Using JSON enables app developers to store and query data in the same document-model format that they use to organize their app’s code. The object model can be converted into other formats, such as JSON, BSON and XML.
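As a rough illustration of that round trip, the sketch below turns an application object into a JSON document and back using only Python's standard library. The `BookRecord` class is a hypothetical stand-in for an app's object model and is not tied to any particular database.

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical application object whose fields mirror the example document above.
@dataclass
class BookRecord:
    ID: str
    Book: str
    Genre: str
    Author: str

book = BookRecord("001", "Java: The Complete Reference", "Reference work", "Herbert Schildt")

# Serialize the object to a JSON document, the form in which it could be stored.
document = json.dumps(asdict(book), indent=2)
print(document)

# Load the stored document back into the same object model.
restored = BookRecord(**json.loads(document))
print(restored == book)
```

The object and the document share the same shape, so no mapping layer between tables and objects is needed in this sketch.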

Relational vs. Document Database

Relational database management systems (RDBMS) rely on Structured Query Language (SQL). NoSQL databases generally do not, although some offer SQL-like query languages.

An RDBMS focuses on creating relationships between tables to store and read data. Document databases focus on the data itself, and relationships are represented with nested data.

Key comparisons between relational and document databases:

| RDBMS | Document Database System |
| --- | --- |
| Structured around the concept of relationships. | Focused on data rather than relationships. |
| Organizes data into tuples (or rows). | Documents have properties without theoretical definitions, instead of rows. |
| Defines data (forms relationships) via constraints and foreign keys (e.g., a child table references the master table via its ID). | No DDL for defining schemas. |
| Uses DDL (Data Definition Language) to create relationships. | Relationships are represented via nested data, not foreign keys (any document may contain others nested inside it, leading to an N:1 or 1:N relationship between the two document entities). |
| Offers strong consistency, critical for some use cases such as daily banking. | Offers eventual consistency, with a period of inconsistency. |

Features of Document Databases

Document databases provide fast queries, a structure well suited to handling big data, flexible indexing and a simplified method of maintaining the database. They are efficient for web apps and have been adopted by large-scale IT companies such as Amazon.

Although SQL databases offer great stability and scale well vertically, they struggle with very large datasets. Use cases that require immediate access to data, such as healthcare apps, are a better fit for document databases. Document databases make it easy to query data with the same document model used to code the application.

Document Databases Use Cases

General Use Cases

  • User profiles
  • Extracting real-time big data
  • Book databases
  • Data of varying structures
  • Content management
  • Catalogs
  • Patients’ data

We’ll cover some of the above-mentioned use cases in greater detail in the following sections.

Book Database

Both relational and NoSQL document systems are used to form a book database, although in different ways.

The relational approach would represent the relationship between books and authors via tables with IDs: an Author table and a Books table. Disallowing null values in the author ID column of the Books table forces each book to reference an author.

By comparison, the document model lets you nest data. It expresses the relationship more naturally and simply: each author document has a property called Books that holds an array of related book documents. When you retrieve an author, that author's entire book collection comes with it.
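The two approaches can be compared in a minimal sketch. It uses Python's built-in `sqlite3` module for the relational side and a plain dictionary as a stand-in for an author document. The table layout, column names and the second book title are illustrative assumptions, not a prescribed schema.

```python
import sqlite3

# Relational approach: two tables linked by a foreign key (assumed layout).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE Author (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE Books ("
    " id INTEGER PRIMARY KEY,"
    " title TEXT NOT NULL,"
    " author_id INTEGER NOT NULL REFERENCES Author(id))"
)
conn.execute("INSERT INTO Author VALUES (1, 'Herbert Schildt')")
conn.executemany(
    "INSERT INTO Books (title, author_id) VALUES (?, ?)",
    [("Java: The Complete Reference", 1), ("C++: The Complete Reference", 1)],
)

# Reading an author's books requires a JOIN across both tables.
rows = conn.execute(
    "SELECT b.title FROM Books b JOIN Author a ON a.id = b.author_id"
    " WHERE a.name = ? ORDER BY b.id",
    ("Herbert Schildt",),
).fetchall()
relational_titles = [title for (title,) in rows]

# Document approach: the author document nests its books in a Books array.
author_document = {
    "Author": "Herbert Schildt",
    "Books": [
        {"Book": "Java: The Complete Reference", "Genre": "Reference work"},
        {"Book": "C++: The Complete Reference", "Genre": "Reference work"},
    ],
}
nested_titles = [book["Book"] for book in author_document["Books"]]

print(relational_titles == nested_titles)
```

Both reads return the same titles. The difference is that the relational version assembles them from two tables at query time, while the document version reads them from a single nested structure.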

Content Management

Developers use document databases to create video streaming platforms, blogs and similar services. Each file is stored as a single document, and the database is easier to maintain as the service evolves over time. Significant data modifications, such as data model changes, require no downtime because no schema update is necessary.

Catalogs

Document databases are much more efficient than relational databases when it comes to storing and reading catalog files. Catalogs may have thousands of attributes stored and document databases provide fast reading times. In document databases, attributes related to a single product are stored in a single document. Modifying one product’s attributes does not affect other documents.
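A small sketch of the idea, using an in-memory Python list as a stand-in for a product collection. The SKUs, attribute names and the `find_by_attribute` helper are hypothetical and only illustrate how documents with different attribute sets can sit side by side.

```python
# Hypothetical catalog: each product is one document with its own attribute set.
catalog = [
    {"sku": "TV-100", "type": "television", "screen_inches": 55, "hdr": True},
    {"sku": "SH-200", "type": "shoe", "size_eu": 42, "color": "black"},
    {"sku": "BK-300", "type": "book", "pages": 1248},
]

def find_by_attribute(products, key, value):
    """Return the products whose document contains key with the given value."""
    return [p for p in products if p.get(key) == value]

# Modify one product's attributes; the other documents are not touched.
shoe = find_by_attribute(catalog, "sku", "SH-200")[0]
shoe["color"] = "brown"
shoe["waterproof"] = True  # new attribute, added without any schema change

print(find_by_attribute(catalog, "color", "brown"))
print([sorted(p) for p in catalog])
```

Products that lack an attribute are simply skipped by the lookup, so a new attribute on one document does not require the others to carry a placeholder value.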

Document Database Advantages and Disadvantages   

Below are some key advantages and disadvantages of document databases:

| Document Database Advantages | Document Database Disadvantages |
| --- | --- |
| Schema-less | Consistency-Check Limitations |
| Faster creation and care | Atomicity weaknesses |
| No foreign keys | Security |
| Open formats | |
| Built-in versioning | |

The advantages and disadvantages are further explained in the sections below.

Advantages

  • Schema-less. There are no restrictions on the format and structure of stored data. This is good for retaining existing data at massive volumes and in different structural states, especially in a continuously transforming system.
  • Faster creation and care. Minimal maintenance is required once you create the document, which can be as simple as adding your complex object once.
  • No foreign keys. With the absence of this relationship dynamic, documents can be independent of one another.
  • Open formats. A clean build process that uses XML, JSON and other derivatives to describe documents.
  • Built-in versioning. As your documents grow in size they can also grow in complexity. Versioning decreases conflicts.

Disadvantages

  • Consistency-Check Limitations. In the book database use case example above, it would be possible to search for books from a non-existent author. You could search the book collection and find documents that are not connected to an author collection.
    Each listing may also duplicate author information for each book. These inconsistencies are insignificant in some contexts, but in systems that must meet strict relational-style consistency standards, they can seriously hamper database performance.
  • Atomicity weaknesses. Relational systems also let you modify data from one place without the need for JOINs. All new reading queries will inherit changes made to your data via a single command (such as updating or deleting a row).
    In a document database, a change involving two collections requires two separate queries (one per collection). This breaks atomicity requirements.
  • Security. Nearly half of web applications today actively leak sensitive data. Owners of NoSQL databases, therefore, need to pay careful attention to web app vulnerabilities.
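The consistency-check limitation described above can be sketched with two in-memory collections. The collection contents, the `author_id` field and the `find_orphaned_books` helper are hypothetical. The point is only that such a check has to be written by the application, because nothing like a foreign key enforces it.

```python
# Hypothetical collections; nothing enforces that author_id points at a real author.
authors = [{"_id": "a1", "name": "Herbert Schildt"}]
books = [
    {"_id": "b1", "title": "Java: The Complete Reference", "author_id": "a1"},
    {"_id": "b2", "title": "Untitled Draft", "author_id": "a9"},  # no such author
]

def find_orphaned_books(books, authors):
    """Return the books whose author_id matches no document in authors."""
    known_ids = {author["_id"] for author in authors}
    return [book for book in books if book.get("author_id") not in known_ids]

orphans = find_orphaned_books(books, authors)
print([book["_id"] for book in orphans])
```

In a relational system a foreign key constraint would reject the second book at insert time. Here it is stored without complaint and only surfaces when the application goes looking for it.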

Note: Read more about data atomicity and consistency in our article ACID vs. BASE.

Best Document Databases     

Amazon DocumentDB

Features:

  • MongoDB-compatible
  • Fully managed
  • High performance with low latency querying
  • Strong compliance and security
  • High availability

Used for:

  • Amazon’s entire development team uses Amazon DocumentDB to increase agility and productivity. They needed nested indexes, aggregations and ad hoc queries, with a fully managed process.
  • The BBC uses it for querying and storing data from multiple data streams and compiling into single customer feeds. They migrated to Amazon DocumentDB for the benefits of a fully managed service with high availability, durability, and default backups.
  • Rappi switched to Amazon DocumentDB to reduce coding time, Dow Jones to simplify operations and Samsung to handle large logs more flexibly.

MongoDB

Features:

Used for:

  • Forbes decreased build time by 58%, gaining a 28% increase in subscriptions due to quicker building of new features, simpler incorporations and better handling of increasingly diverse data types.
  • Toyota found it much simpler for developers to work at high speeds by using natural JSON documents. More time is spent on building the business value instead of data modeling.

Cosmos DB

Features:

  • Fast reads at any scale
  • 99.999% availability
  • Fully managed
  • NoSQL/Native Core APIs
  • Serverless, scaling instantly and cost-effectively

Used for:

  • Coca-Cola gets insights delivered in minutes, facilitating global scaling. Before migrating to Cosmos DB, it took hours.
  • ASOS needed a distributed database that flexibly and seamlessly scales to handle over 100 million global retail customers.

ArangoDB

Features:

  • Schema validations
  • Diverse indexing
  • Fast distributed clusters
  • Efficient with very large datasets
  • Supports multiple NoSQL data models
  • Combine models into single queries

Used for:

  • Oxford University reduced hospital attendance and improved test results by developing a web-based assessment test for cardiopulmonary disease.
  • FlightStats transformed fragmented flight data (flight status, weather, airport delays, and reference data) into one standard, enabling accurate, predictive and analytical results.

Couchbase Server

Features:

  • Ability to manage global deployments
  • Extreme agility and flexibility
  • Fast at large scale
  • Easy cloud integrations

Used for:

  • BT used Couchbase’s flexible data model to accelerate its capacity to deliver content at high performance while scaling with ease against demand spikes.
  • eBay migrated from Oracle to a more cost-effective solution whose features better fit their key-value store and document system. App performance and availability improved, while developers could use their SQL know-how to speed up their CI/CD pipeline via a more flexible schema.

CouchDB

Features:

  • Browser-based GUI
  • Simple replication
  • User authentication
  • ACID properties 

Used for:

  • Meebo, the social platform, used CouchDB for the web-based interface and its applications.
  • The BBC used CouchDB for its dynamic content platforms. 

How to Choose?

Your app’s critical demands determine how to structure data. A few key questions:

  • Will you be doing more reading or writing? Relational systems are superior if you are doing more writing, as they avoid duplications during updates.
  • How important is synchronisation? Due to their ACID framework, relational systems do this better.
  • How much will your database schema need to transform in the future? Document databases are a winning choice if you work with diverse data at scale and require minimal maintenance.

Neither document nor SQL is strictly better than the other. The right choice depends on your use case. When making your decision, consider the types of operations that will be most frequently carried out.

Note: Learn more about databases, the definition, how they started, what they are used for, their components and more in our post What Is A Database?

Conclusion

In this article, we explained document database features, use cases, advantages and disadvantages. We also listed some of the best document databases and showed how Fortune 500 companies have been using them to improve the efficiency of their business and development processes.
