Friday, November 15, 2024
Google search engine
HomeData Modelling & AIMongoDB Sharding – How to Deploy a Sharded Cluster

MongoDB Sharding – How to Deploy a Sharded Cluster

Introduction

MongoDB is a highly scalable NoSQL database manager designed for working with big data. It is document-oriented and uses JSON-like documents for storing key-value pairs. This NoSQL solution features optional schemas and uses the BASE transaction model.

Sharding is a MongoDB feature that enables easy scaling. As a method for data distribution across multiple database instances, sharding allows MongoDB to support high throughput operations on large data sets.

This article will show you how to deploy a sharded MongoDB cluster using Docker and Docker Compose.

MongoDB sharding - how to deploy a sharded cluster.MongoDB sharding - how to deploy a sharded cluster.

Prerequisites

What is Sharding?

Sharding is the process in MongoDB that enables horizontal scaling of a database. With sharding, the system divides the data into subsets called shards. Shards are often deployed as replica sets stored on multiple machines for high availability.

Each sharded cluster in MongoDB consists of the following components:

  • Config servers store the cluster configuration and metadata. One of the servers acts as the primary server, and others act as secondary.
  • Shards contain data subsets. One shard is primary, while others are secondary.
  • Query router enables client applications to interact with the cluster.

The diagram below illustrates the architecture of a sharded MongoDB cluster.

The diagram illustrating MongoDB architecture.The diagram illustrating MongoDB architecture.

How to Deploy a Sharded Cluster in MongoDB

To deploy a fully functional MongoDB sharded cluster, deploy each cluster element separately. Below are the steps for sharded cluster deployment using Docker containers and Docker Compose.

Note: The tutorial uses a single test machine to deploy all cluster elements. While it is possible to implement sharding in this way, MongoDB recommends using a separate machine for each member of each deployed replica set in a production environment.

Step 1: Deploy a Config Server Replica Set

Start by deploying a replica set of config servers for storing configuration settings and cluster metadata.

1. Create a directory and navigate to it.

mkdir config && cd config

2. Use a text editor to create a docker-compose.yaml file.

nano docker-compose.yaml

3. Write the configuration you want to deploy. The example below defines three config server replicas in the services section and three Docker volumes for persistent data storage in the volumes section.

version: '3'

services:

  configs1:
    container_name: configs1
    image: mongo
    command: mongod --configsvr --replSet cfgrs --port 27017 --dbpath /data/db
    ports:
      - 10001:27017
    volumes:
      - configs1:/data/db

  configs2:
    container_name: configs2
    image: mongo
    command: mongod --configsvr --replSet cfgrs --port 27017 --dbpath /data/db
    ports:
      - 10002:27017
    volumes:
      - configs2:/data/db

  configs3:
    container_name: configs3
    image: mongo
    command: mongod --configsvr --replSet cfgrs --port 27017 --dbpath /data/db
    ports:
      - 10003:27017
    volumes:
      - configs3:/data/db

volumes:
  configs1: {}
  configs2: {}
  configs3: {}

Each config server replica requires the following parameters:

  • Name. Choose any name you want. Numbered names are recommended for easier instance management.
  • Name of the Docker container. Choose any name.
  • Docker image. Use the mongo image available on Docker Hub.
  • mongod command. The command specifies the instance is a config server (--configsvr) and part of a replica set (--replSet). Furthermore, it defines the default port (27017) and the path to the database.
  • Ports. Map the default Docker port to an external port of your choosing.
  • Volumes. Define the database path on a permanent storage volume.

Save and exit the file when you finish.

4. Apply the configuration with the docker-compose command:

docker-compose -f [path-to-file]/docker-compose.yaml up -d

The system confirms the successful deployment of the MongoDB config servers.

Creating mongodb config server instances with docker-compose.Creating mongodb config server instances with docker-compose.

5. Check the running containers in Docker.

docker ps

All three config server replicas show as separate containers with different external ports.

Checking if the mongodb config server containers are running.Checking if the mongodb config server containers are running.

Alternatively, you can use the docker-compose command to list only the containers relevant to the deployment:

docker-compose -f config/docker-compose.yaml ps
Checking running mongodb config server instances using docker-compose.Checking running mongodb config server instances using docker-compose.

6. Check the Docker volumes:

docker volume ls
Checking if docker volumes have been successfully created.Checking if docker volumes have been successfully created.

7. Use the Mongo client application to log in to one of the config server replicas:

mongo mongodb://[ip-address]:[port]

As a result, the MongoDB shell command prompt appears:

Loging into a mongodb config server.Loging into a mongodb config server.

8. Initiate the replicas in MongoDB by using the rs.initiate() method. The configsvr field set to true is required for config server initiation.

rs.initiate(
  {
    _id: "cfgrs",
    configsvr: true,
    members: [
      { _id : 0, host : "[ip-address]:[port]" },
      { _id : 1, host : "[ip-address]:[port]" },
      { _id : 2, host : "[ip-address]:[port]" }
    ]
  }
)

If the operation is successful, the "ok" value in the output is 1. Conversely, if an error occurs, the value is 0, and an error message is displayed.

Initiating config servers in mongodb.Initiating config servers in mongodb.

Press Enter to exit the secondary and return to the primary instance.

9. Use the rs.status() method to check the status of your instances.

rs.status()
Checking the status of the primary config server replica in mongodb.Checking the status of the primary config server replica in mongodb.

Step 2: Create Shard Replica Sets

After setting up a config server replica set, create shards that will contain your data. The example below shows how to create and initiate only one shard replica set, but the process for each subsequent shard is the same.

1. Create and navigate to the directory where you will store shard-related manifests.

mkdir shard && cd shard

2. Create a docker-compose.yaml file with a text editor.

nano docker-compose.yaml

3. Configure shard instances. Below is an example of a docker-compose.yaml that defines three shard replica sets and three permanent storage volumes.

version: '3'

services:

  shard1s1:
    container_name: shard1s1
    image: mongo
    command: mongod --shardsvr --replSet shard1rs --port 27017 --dbpath /data/db
    ports:
      - 20001:27017
    volumes:
      - shard1s1:/data/db

  shard1s2:
    container_name: shard1s2
    image: mongo
    command: mongod --shardsvr --replSet shard1rs --port 27017 --dbpath /data/db
    ports:
      - 50002:27017
    volumes:
      - shard1s2:/data/db

  shard1s3:
    container_name: shard1s3
    image: mongo
    command: mongod --shardsvr --replSet shard1rs --port 27017 --dbpath /data/db
    ports:
      - 50003:27017
    volumes:
      - shard1s3:/data/db

volumes:
  shard1s1: {}
  shard1s2: {}
  shard1s3: {}

The YAML file for the shard replica set contains specifications similar to the config server specifications. The main difference is in the command field for each replica, where the mongod command for shards is issued with the --shardsvr option. As a result, MongoDB recognizes the servers as shard instances.

When you finish, save and exit the file.

4. Use Docker Compose to apply the replica set configuration.

docker-compose -f shard/docker-compose.yaml up -d

The output confirms the successful creation of the Docker containers.

Creating mongodb shard instances using docker-compose.Creating mongodb shard instances using docker-compose.

5. Log in to one of the replicas using the mongo command.

mongo mongodb://10.0.2.15:20001

6. Initiate the replica set with rs.initiate().

rs.initiate(
  {
    _id: "shard1rs",
    members: [
      { _id : 0, host : "10.0.2.15:20001" },
      { _id : 1, host : "10.0.2.15:20002" },
      { _id : 2, host : "10.0.2.15:20003" }
    ]
  }
)

If the initiation is successful, the output value of "ok" is 1.

Initiating sharded servers in mongodb.Initiating sharded servers in mongodb.

Note: Replica set names must be unique for each shard replica set you add to the cluster.

Step 3: Start a mongos Instance

A mongos instance acts as a query router, i.e., an interface between the cluster and client apps. Follow the steps below to set it up in your cluster.

1. Create a directory for your mongos configuration and navigate to it.

mkdir mongos && cd mongos

2. Create a Docker Compose file.

nano docker-compose.yaml

3. Configure the mongos instance. For example, the file below creates a mongos instance and exposes it to port 30000. The command section should contain the --configdb option, followed by references to the addresses of config server replicas.

version: '3'

services:

  mongos:
    container_name: mongos
    image: mongo
    command: mongos --configdb cfgrs/10.0.2.15:10001,10.0.2.15:10002,10.0.2.15:10003 --bind_ip 0.0.0.0 --port 27017
    ports:
      - 30000:27017

Save the file and exit.

4. Next, apply the configuration with docker-compose:

docker-compose -f mongos/docker-compose.yaml up -d

The output shows Docker has created the mongos instance container.

Creating a mongodb query router instance using docker-compose.Creating a mongodb query router instance using docker-compose.

Check the running containers in Docker:

docker ps

After deploying three config server replicas, three shard replicas, and one mongos instance, the output shows seven containers based on the mongo image.

Checking if all the mongodb instance containers are running.Checking if all the mongodb instance containers are running.

Step 4: Connect to the Sharded Cluster

With all the instances up and running, the rest of the cluster configuration takes place inside the cluster. Connect to the cluster using the mongo command:

mongo mongodb://[mongos-ip-address]:[mongos-port]

The MongoDB shell command prompt appears.

Connecting to a sharded cluster via the mongos instance.Connecting to a sharded cluster via the mongos instance.

Step 5: Add Shards to the Cluster

Use the sh.addshard() method and connect the shard replicas to the cluster:

sh.addShard("[shard-replica-set-name]/[shard-replica-1-ip]:[port],[shard-replica-2-ip]:[port],[shard-replica-3-ip]:[port]")

The output shows that the system successfully added the shards to the cluster. The "ok" value is 1:

Adding shards to a mongdb cluster.Adding shards to a mongdb cluster.

Check the status with the sh.status() method:

sh.status()

The output lists the active shards in the shards section:

Checking sharding status in MongoDB.Checking sharding status in MongoDB.

Step 6: Enable Sharding for a Database

Enable sharding for each database you plan to use it on. Use the sh.enableSharding() method, followed by the database name.

sh.enableSharding("[database-name")

The example below enables sharding for the database named testdb:

Enabling sharding for a database in MongoDB.Enabling sharding for a database in MongoDB.

Note: Database workloads require high memory density and large storage capacity to perform well. Our BMC database servers are workload-optimized and support all major databases.

Step 7: Shard a Collection

There are two ways to shard a collection in MongoDB. Both methods use the sh.shardCollection() method:

  • Range-based sharding produces a shard key using multiple fields and creates contiguous data ranges based on the shard key values.
  • Hashed sharding forms a shard key using a single field’s hashed index.

To shard a collection using range-based sharding, specify the field to use as a shard key, and set its value to 1:

sh.shardCollection("[database].[collection]", { [field]: 1 } )
Range-based collection sharding in MongoDB.Range-based collection sharding in MongoDB.

Use the same syntax to set up hashed sharding. This time set the value of the field to "hashed":

sh.shardCollection("[database].[collection]", { [field]: "hashed" } )
Hashed collection sharding in MongoDB.Hashed collection sharding in MongoDB.

Note: Once you shard a collection, you cannot unshard it.

Conclusion

After reading this article, you will know how to deploy a sharded MongoDB cluster using Docker and Docker Compose. If you are interested in how MongoDB compares against popular database management solutions, read MongoDB vs. MySQL and Cassandra vs. MongoDB.

Additionally, for an overview of the best open-source DBMSs, read 8 Best Open-Source Databases.

Was this article helpful?
YesNo

RELATED ARTICLES

Most Popular

Recent Comments