MongoDB - Replication and Sharding Tutorial

Welcome to the MongoDB Tutorial. This is Lesson 5 of the Mongo DB Developer course offered by Simplilearn. This tutorial will explain what these concepts are and how they help in scaling read and write operations. Let us begin by exploring the objectives of this lesson.


After completing MongoDB Tutorial, you will be able to:

• Explain how the MongoDB replication process works

• Explain the master-slave replication, replica sets, and replica set members

• Explain automatic failover

• Explain write concern, read preference, and tag sets

• Describe the replica set deployment architecture

• Explain how to enable MongoDB sharding for database and collections

• List different types of MongoDB sharding keys and

• Explain when to use sharding in MongoDB, and

• Explain how to deploy a MongoDB sharded cluster

We will begin with a brief introduction to replication in MongoDB in the next section.

Introduction to Replication in MongoDB

The primary task of a MongoDB administrator is to set up replication and ensure that it is functioning correctly. The replication functionality is recommended for use in the production setting. Replication in MongoDB increases data availability by creating data redundancy. MongoDB replication stores multiple copies of data across different databases in multiple locations, and thus protects data when the database suffers any loss.

In addition, it helps you manage data in the event of hardware failure and any kind of service interruptions. Having multiple copies of data stored on various servers across various locales, you can perform tasks, such as disaster recovery, backup, or reporting with ease.  

You can also use replication to enhance read operations. Typically, clients send read and write operations to different servers. You can store copies of these operations in different data centers to increase the locality and availability of data for distributed applications. In the next section, we will discuss the master-slave replication.

Master-Slave Replication in MongoDB

The master-slave replication is the oldest mode of replication that MongoDB supports. In the earlier versions of MongoDB, the master-slave replication was used for failover, backup, and read scaling. However, in the new versions, it is replaced by replica sets for most use cases. 


Although, MongoDB still supports the master-slave replication, replica sets are recommended for new production deployments to replicate data in a cluster.

Let's discuss replica sets in MongoDB.

Replica Set in MongoDB

A replica set consists of a group of mongod (read as Mongo D) instances that host the same data set. In a replica set, the primary mongod receives all write operations and the secondary mongod replicates the operations from the primary and thus both have the same data set. The primary node receives write operations from clients.

A replica set can have only one primary and therefore only one member of the replica set can receive write operations. A replica set provides strict consistency for all read operations from the primary. The primary logs any changes or updates to its data sets in its oplog(read as op log).

The secondaries also replicate the oplog of the primary and apply all the operations to their data sets. When the primary becomes unavailable, the replica set nominates a secondary as the primary. By default, clients read data from the primary. However, they can also specify read preferences and request read operations to send to the secondaries. Reads from secondaries may return data that does not reflect the state of the primary.


The diagram given above depicts that the primary member of the replica set receives all the read and write requests by default. However, read requests can be directly sent to the secondary member of the replica set too.

We will continue our discussion on MongoDB replica sets in the next section.

Replica Set in MongoDB (contd.)

You can choose to add an extra mongod instance to a replica set to act as an arbiter.

Given below are features of arbiters.

They DO NOT maintain a dataset

Their primary function is to select the primary node

They do not store data and hence need no additional hardware

Secondary members in a replica set asynchronously apply operations from the primary. By applying operations after the primary, replica sets can function without any secondary members. As a result, all secondary members may not return the updated data to clients. A primary can convert to a secondary or vice versa. However, an arbiter will remain unchanged.

We will discuss how a replica set handles automatic failover in the next section.

Automatic Failover in MongoDB

When the primary node of a replica set stops communicating with other members for more than 10 seconds or fails, the replica set selects another member as the new primary. The selection of the new primary happens through an election process and whichever secondary node gets majority of the votes becomes the primary.


A replica set supports application needs in various ways. For example, you may deploy a replica set in multiple data centers, or manipulate the primary election by adjusting the priority of members. In addition, replica sets support dedicated members for functions, such as reporting, disaster recovery, or backup.

In the next section, we will discuss replica set members in MongoDB.

MongoDB Replica Set Members

In addition to the primary and secondaries, a replica set can also have an arbiter. Unlike secondaries, arbiters do not replicate or store data. However, arbiters play a crucial role in selecting a secondary to take the place of the primary when the primary becomes unavailable.


A replica set typically contains:

A primary node

A secondary node

An arbiter

Typically, most of the replica set deployments keep three members that store data, one primary and two secondaries.

A replica set in MongoDB version 3.0. (read as three-point O ) can have up to 50 members with 7 voting members. In the previous versions of MongoDB, replica sets could have maximum 12 members.



Priority 0 Replica Set Members in MongoDB

A priority zero member in a replica set is a secondary member that cannot become the primary. These members can act as normal secondaries, but cannot trigger any election.

The main functions of a priority are as follows:

Maintains data set copies

Accepts and performs read operations

Elects the primary node

A priority zero member is particularly useful in multi-data center deployments. In a replica set containing three members, one data center hosts both, the primary and a secondary, and the second data center hosts one priority zero member.


Typically, a priority zero member acts as a backup. For example, in some replica sets, a new member may not be able to add immediately. The backup or standby member stores the updated data and can immediately replace an unavailable member.

Next, we will discuss hidden replica set members in MongoDB.

Hidden MongoDB Replica Set Members

Hidden members of a replica set are invisible to the client applications. They store a copy of the primary node’s data and are good for workloads with different usage patterns from the other members in the replica set.

Although the hidden replica set members are priority zero members and can never replace the primary, they may elect the primary.


In the five-member replica set shown above, all the four secondary members contain the primary’s dataset copies and one hidden secondary member. Typically, the clients of read operations do not distribute appropriate read rights to the hidden members. Therefore, the hidden members receive only the basic replication traffic.

You can utilize the hidden members for dedicated functions like reporting and backups. In a sharded cluster, mongos (read as mongo s) do not interact with the hidden replica set members.

In the next section, we will discuss delayed replica set members.


Interested in learning more about MongoDB? Enroll in our MongoDB course today!

Delayed MongoDB Replica Set Members

Delayed replica set members are those secondaries that copy data from the primary node’s oplog file after a certain interval or delay. Delayed replica set members store their data set copies. However, they reflect a previous version, or a delayed state of the set.

For example, if the data shown in the primary is for 10 AM then the delayed member will show data for 9 AM. Delayed members perform a “roll backup” or run a “historical” snapshot of the data set. Therefore, they help you manage various human errors and recover from errors, such as unsuccessful application upgrade, and dropped databases and collections.

To be considered a delayed member, a replica set member must:

Be a priority zero member

Be hidden and not visible to applications and

Participate in electing the primary

In the next section, we will discuss the configuration settings of delayed replica set members

Delayed MongoDB Replica Set Members (contd.)

You can configure a delayed secondary member with the following settings:

Priority Value - Zero,

Hidden Value - True, and

SlaveDelay Value - Number of seconds to delay

You can set a 1-hour delay on a secondary member currently at the index 0 in the members array. To set the delay, issue the sequence of operations given here in a mongo shell connected to the primary. After a replica set reconfigures, the delayed secondary member cannot replace the primary and is hidden from applications. The slaveDelay value delays both replication and the member’s oplog by 3600 seconds or 1 hour.

In the next section, we will view a demo on how to start a replica set.

Write Concern in MongoDB

The write concern provides an assurance of the success of a write operation. The strength or weakness of a write concern determines the level of assurance. For operations that have weak concerns, such as updates, inserts, and deletes, the write operations return a quick result.

Write operations having weak write concerns may fail. When stronger write concerns are used in an operation, clients wait after sending the operation for MongoDB to confirm their success. MongoDB provides different levels of write concerns to address the varied application needs.

Clients may manage write concerns to ensure that the critical operations are successfully carried out on the MongoDB deployment. For less critical operations, clients can manage the write concern to ensure faster performance rather than ensure persistence to the entire deployment.

Typically, regardless of the write concern level or journaling configuration, MongoDB allows clients to read the inserted or modified documents before committing the modifications to the disk. As a result, if an application has multiple readers and writers, MongoDB allows a reader to read the data of another writer even when the data is still not committed to the journal.

We will continue our discussion on Write Concern in the next section.

Write Concern in MongoDB (contd.)

MongoDB modifies each document of a collection separately. For multi-document operations, MongoDB does not provide any multi-document transactions or isolation. When a standalone mongod returns a successful journaled write concern, the data is stored to the disk and becomes available after mongod restarts.

Write operations in a replica set can become durable only after a write data gets replicated and committed to the journal on majority of the voting members of a replica set. MongoDB periodically commits data to the journal as defined by the commitIntervalMs (read as commit interval millisecond) parameter. In the next section, we will discuss write concern levels.

Write Concern Levels in MongoDB

MongoDB has two levels of conceptual write concern:

Unacknowledged Write Concern - MongoDB does not acknowledge the received write operations or ignores it. This is similar to ignored errors.

Acknowledged Write Concern -  The mongod confirms the receipt of the write operation. It also confirms that the changes to the in-memory data view are applied.

In an unacknowledged write concern, the driver is capable of receiving and handling network errors whenever possible. The driver’s ability to detect and handle network errors depend on the networking configuration of the system. In an acknowledged write concern can detect network, duplicate key, and other errors.

By default, MongoDB uses the acknowledged write concern in the driver. The mongo shell includes the write concern in the write methods and provides the default write concern whether it runs interactively or in a script.

An acknowledged write concern does not confirm that the write operation has persisted to the disk system.

Note that journaling must be enabled to use a write concern.  

In the next section, we will discuss the write concern for a MongoDB replica set.

Write Concern for a MongoDB Replica Set

Replica sets have additional considerations for write concerns. These are:

The default write concerns in a replica set require acknowledgement only from the primary member.

A write concern acknowledged by a replica set ensures that the write operation spreads to the other members of the set.

The default write concern confirms write operations only for the primary.

However, you can choose to override this write concern to check write operations on some replica set members. You can override the default write concern by specifying a write concern for each write operation.

db.products.insert({ item: "envelopes",qty: 100, type: "Clasp" },
{ writeConcern: { w: 2, wtimeout: 5000 } })

For example, the method given above contains a write concern that specifies that the method should return a response only after the write operation is spread to the primary and minimum one secondary, or the method times out after 5 seconds.  

In the next section, we will discuss how to modify default write concerns.

Modify Default Write Concern for MongoDB

You can modify a default write concern by changing the “getLastErrorDefaults” (read as get last error defaults) setting in the replica set configuration. This write operation will complete most of the voting members before returning the result. The sequence of commands given here creates a configuration that waits for the write operation to complete. Additionally, you can include a timeout threshold for a write concern to prevent blocking write operations.

cfg = rs.conf()
cfg.settings = {}
cfg.settings.getLastErrorDefaults = { w: "majority", wtimeout: 5000 }

For example, a write concern must be acknowledged by four members of a replica set. However, only three members are available in the replica set. The operation blocks until all the four members are available. You can prevent the operation from blocking by adding a timeout (as can be seen in the example above).

In the next section, we will discuss Read Preferences.

MongoDB Read Preference

Read preferences define how MongoDB clients route read operations to replica set members. By default, a client directs its read operations to the primary. Therefore, reading from the primary member means that you have the latest version of a document. However, you can distribute reads among the secondary members to improve the read performance or reduce latency for applications that do not require the up-to-date data.

The typical use cases for using non-primary read preferences are as follows.

• Running systems operations that do not affect the front-end application.

• Providing local reads for the geographically distributed applications.

• Maintaining availability during a failover.

You can use Mongo.setReadPref() (read as mongo dot set read preference method) to set a read preference.

We will discuss the read preference modes in the next section.

Read Preference Modes in MongoDB

The read preference modes supported in a replica set are as follows.

Primary: This is the default read preference mode. All operations read from the current replica set are primary.

PrimaryPreferred: Operations are mostly read from the primary member. However, when the primary is unavailable, operations are read from a secondary member.

Secondary: All operations are read from secondary members of the replica set.

SecondaryPreferred: In most situations, operations are read from a secondary, but when a secondary member is unavailable, operations are read from the primary.

Nearest: Operations are read from a member of the replica set with the least network latency, irrespective of the member’s type.

In the next section, we will discuss blocking for replication.

Blocking for Replication in MongoDB

The getLastError (read as get last error) command in MongoDB ensures how the up-to-date replication is done by using the optional "w" parameter.

db.runCommand({getLastError: 1, w: N});

In the query shown above, the getLastError command is run that blocks until at least N number of servers have replicated the last write operation. The query will return immediately if N is either not present or is less than two. If N equals to two, the master will respond to the command only after one slave replicates the last operation. Note that the master is included in N. The master uses the "syncedTo" information stored in local slaves to identify how up-to-date each slave is.

When specifying "w", the getLastError command takes an additional parameter, "wtimeout"(read as w- timeout) —a timeout in milliseconds. This allows the getLastError command to time out and return an error before the last operation has replicated to N servers. However, by default the command has no timeout.

Blocking the replication significantly slows down write operations, particularly for large values of "w".

Setting "w" to two or three for important operations yields a good combination of efficiency and safety.

In the next section, we will discuss tag sets.

MongoDB Tag Set

Tag sets allow tagging target read operations to select the replica set members. Customized read preferences and write concerns assess tag sets differently. Typically, read preferences stresses on the tag value when selecting a replica set member to read from. Write concerns on the other hand, ignore the tag value when selecting a member. However, they confirm whether the value is unique.

You can specify tag sets with the following read preference modes:

• PrimaryPreferred

• Secondary

• SecondaryPreferred and

• Nearest

Remember, tags and the primary mode are not compatible and, in general, tags apply only when selecting a secondary member of a read operation. However, tags are compatible with the nearest mode. When combined together, the nearest mode selects the matching member, primary or secondary, with the lowest network latency.

Typically, all interfaces use the same member selection logic to choose the member to direct read operations, and base the choice on the read preference mode and tag sets.

In the next section, we will discuss how to configure tag sets for replica sets.

Configure Tag Sets for MongoDB Replica set

Using tag sets, you can customize write concerns and read preferences in a replica set.

MongoDB stores tag sets in the replica set configuration object, which is the document returned by rs.conf() (read as R-S dot conf), in the members.tags (read as members dot tags) embedded document.

When using either write concern or read scaling, you may want to have some control over which secondaries receive writes or reads.

For example, you have deployed a five-node replica set across two data centers, NYK (read as New York) and LON (read as London). New York is the primary data center that contains three nodes. The secondary data center, London contains two. Suppose you want to use a write concern to block until a certain write is replicated to at least one node in the London data center. You cannot use a “w” value of majority, because this will translate into a value of 3, and the most likely scenario is that the three nodes in New York will acknowledge first.

Replica set tagging resolves such problems by allowing you to define special write concern modes that target the replica set members with defined tags. To see how this works, you first need to learn how to tag a replica set member.

conf = rs.conf()

conf.members[0].tags = { "dc": "NYK", "rackNYK": "A"}

conf.members[1].tags = { "dc": "NYK", "rackNYK": "A"}

conf.members[2].tags = { "dc": "NYK", "rackNYK": "B" }

conf.members[3].tags = {"dc": "LON", "rackLON": "A"}

conf.members[4].tags = { "dc": "LON", "rackLON": "B"}

conf.settings = { getLastErrorModes: { MultipleDC : { "dc": 2}, multiRack: { rackNYK: 2 } }


In the config document, each member can have a key called tags pointing to an object containing key-value pairs. You can add tag sets to the members of a replica set with the following sequence of commands in the mongo shell as given above. Modes can be defined as a document with one or more keys in the “getLastErrorModes” command.

Here, in this command, you have defined a mode containing 2 keys, “dc” and “rackNYK” (read as rack new york) whose values are integers 2. These integers indicate that the number of the tagged value must be satisfied in order to complete the getlastError command.

In the next section, we will discuss Replica Set Deployment.

Replica Set Deployment Strategies for MongoDB

A replica set architecture impacts the set’s capability. In a standard replica set for a production system, a three-member replica set is deployed. These sets provide redundancy and fault tolerance, avoid complexity, but let your application requirements define an architecture.

Use the following deployment strategies for a replica set.

1. Deploy an Odd Number of Member: For electing the primary member, you need an odd number of members in a replica set. Add an arbiter in case a replica set has even number of members.

2. Consider Fault Tolerance: Fault tolerance for a replica set is the number of members that can become unavailable and still leave enough members in the set to elect the primary. A replica set cannot accept any write operation if the primary member is not available. Remember that adding a new member to a replica set does not always increase its fault tolerance capability. However, the additional members added in a replica set can provide support for some dedicated functions, such as backups or reporting.

3. Use Hidden and Delayed Members: Add hidden or delayed members to support dedicated functions, such as backup or reporting.

4. Load Balance on Read-Heavy Deployments: In deployments with high read traffic, to improve read performance, distribute reads to secondary members. As your deployment grows, add or move members to alternate data centers and improve redundancy and availability. However, ensure that the main facility is able to elect the primary.

In the next section, we will continue with the Replica Set Deployment strategy.

Replica Set Deployment Strategies for MongoDB  (contd.)

Some more replica set deployment strategies are given below.

Add Capacity Ahead of Demand: You must be able to add a new member to an existing replica set to meet the increased data demands. Add new members before new demands arise.

Distribute Members Geographically: For data safety in case of any failure, keep at least one member in an alternate data center. Also set the priorities of these members to zero so that they do not become the primary.

Keep a Majority of Members in One Location: When replica set members are distributed across multiple locations, network partition may hinder the communication between the data centers, and affect data replication. When electing the primary, all the members must be able to see each other to create a majority. To enable the replica set members create a majority and elect the primary, ensure that majority of the members are in one location.

Use replica set tag sets: This ensures that all operations are replicated at specific data centers. Using tag sets helps to route read operations to specific computers.

Use Journaling to Protect Against Power Failures: To avoid the data loss problem, use journaling so that data can be safely written on a disk in case of shutdowns, power failure, and other unexpected failures.

In the next section, we will discuss replica set deployment patterns.

MongoDB Replica Set Deployment Patterns

The common deployment patterns for a replica set are as follows:

Three Member Replica Sets: This is the minimum recommended architecture for a replica set.

Replica Sets with Four or More Members: This set provides greater redundancy and supports greater distribution of read operations and dedicated functionality.

Geographically Distributed Replica Sets: These include members in multiple locations to protect data against facility-specific failures, such as power outages.

In the next section, we will discuss the oplog file.

MongoDB Oplog Files

The record of operations maintained by the master server is called the operation log or oplog (read as op log). The oplog is stored in a database called local, in the “oplog.$main” (read as op log dot main) collection.

Each oplog document denotes a single operation performed on the master server and contains the following keys:

1. Timestamp or TS for the operation: It is an internal function used to track operations. It contains a 4-byte timestamp and a 4-byte incrementing counter.

2. Op: This is the type of operation performed as a 1-byte code, for example, “i” for an insert.

3. Namespace or Ns: This is the collection name where the operation is performed.

4. O: This key is used to document further specifying the operation to perform. For an insert, this would be the document to insert.

The oplog stores only those operations that change the state of the database. The oplog is intended only as a mechanism for keeping the data on slaves in sync with the master. The oplog files do not exactly store the operations that are performed on the master rather the operation gets transformed and then is stored into the oplog file. In the next section, we will discuss replication state and local database.

MongoDB Replication State and Local Database

MongoDB maintains a local database called “local” to keep the information about the replication state and the list of master and slaves. The content of this database does not get replicated and remains local to the master and slaves. This list is stored in the slave’s collection. Slaves store the replication information in the local database.

The unique slave identifier gets saved in the “me” collection and the list of masters gets saved in “sources” collection. The master and slave both use the timestamp stored in the “syncedTo” command to understand how up-to-date a slave is.

A slave uses “synced to” to query the oplog for new operations and find out if any operation is out of sync.

In the next section, we will discuss Replication Administration.

Replication Administration in MongoDB

MongoDB inspects the replication status using some administrative helpers. To check the replication status, use the function given below when connected to the master. This function provides information on the oplog size and the date ranges of operations contained in the oplog.

configured oplog size: 10.48576MB

log length start to end: 34secs (0.01hrs)

oplog first event time: Tue Mar 30 2010 16:42:57 GMT-0400 (EDT)

oplog last event time: Tue Mar 30 2010 16:43:31 GMT-0400 (EDT)

now: Tue Mar 30 2010 16:43:37 GMT-0400 (EDT)

In the given example, the oplog size is 10 megabyte and can accommodate about 30 seconds of operations. In such a case, you should increase the size of the oplog. You can compute the log length by determining the time difference between the first and the last operation. In case the server has recently started, then the first logged operation will be relatively recent.

Hence, the log length will be small, even though the oplog may have some free space available. The log length serves as a metric for servers that have been operational long enough for the oplog to “roll over.” You can also get some information when connected to the slave, using the functions given below. This function will populate a list of sources for a slave, each displaying information such as how far behind it is from the master.


In the next section, we will view a demo on how to check the status of a replica set.

What is Sharding in MongoDB?

In this MongoDB sharding tutorial, you will learn what sharding is and how it can be used. Sharding in MongoDB is the process of distributing data across multiple servers for storage. MongoDB uses sharding to manage massive data growth. With an increase in the data size, a single machine may not be able to store data or provide an acceptable read and write throughput.

MongoDB sharding supports horizontal scaling and thus is capable of distributing data across multiple machines. Sharding in MongoDB allows you to add more servers to your database to support data growth and automatically balances data and load across various servers. MongoDB sharding provides additional write capacity by distributing the write load over a number of mongod instances. It splits the data set and distributes them across multiple databases, or shards.

Each shard serves as an independent database, and together, shards make a single logical database. MongoDB sharding reduces the number of operations each shard handles and as a cluster grows, each shard handles fewer operations and stores lesser data. As a result, a cluster can increase its capacity and input horizontally.

For example, to insert data into a particular record, the application needs to access only the shard that holds the record. If a database has a 1 terabyte data set distributed amongst 4 shards, then each shard may hold only 256 Giga Byte of data. If the database contains 40 shards, then each shard will hold only 25 Giga Byte of data In the next section, we will discuss when sharding in MongoDB should be used.

When Can I Use Sharding in MongoDB?

Typically, sharded clusters in MongoDB require a proper infrastructure setup. This increases the overall complexity of the deployment. Therefore, consider deploying sharded clusters only when there is an application or operational requirement.

You must consider deploying a MongoDB sharded cluster when your system shows the following characteristics:

• The data set outgrows the storage capacity of a single MongoDB instance.

• The size of the active working set exceeds the capacity of the maximum available RAM.

• A single MongoDB instance is unable to manage write operations.

In the absence of these characteristics, sharding in MongoDB will not benefit your system, rather, it will add complexity. Deploying MongoDB sharding consumes time and resources. In case your database system has already surpassed its capacity, deploying sharding in MongoDB without impacting the application is not possible. Therefore, deploy MongoDB sharding if you expect that the read and write operations are going to be increased in future.

In the next section, we will discuss what a shard is.

What is a MongoDB Shard?

A shard is a replica set or a single mongod instance that holds the data subset used in a sharded cluster. Shards hold the entire data set for a cluster. Each shard is a replica set that provides redundancy and high availability for the data it holds.


MongoDB shards data on a per collection basis and lets you access the sharded data through mongos instances. If you directly connect to a shard, you will be able to view only a fraction of the data contained in a cluster. Data is not organized in any particular order. In addition, MongoDB does not guarantee that any two contiguous data chunk will reside on any particular shard.

Note that every database contains a “primary” shard that holds all the un-sharded collections in that database.  

In the next section, we will discuss a shard key.

What is a Shard Key in MongoDB?

When deploying sharding, you need to choose a key from a collection and split the data using the key’s value. This key is called a shard key that determines how to distribute the documents of a collection among the different shards in a cluster.

The shard key is a field that exists in every document in the collection and can be an indexed or indexed compound field. MongoDB performs data partitions in a collection using the different ranges or chunks of shard key values.

Each range or chunk defines a non-overlapping range of shard key values. MongoDB distributes chunks, and their documents, among the shards in a cluster. MongoDB also distributes documents according to the range of values in the shard key.  

In the next section, we will discuss how to choose a shard key.

Choosing a Shard Key in MongoDB

To enhance and optimize the performance, functioning and capability of your database, you need to choose the correct shard key.  

Choosing the appropriate shard key depends on two factors:
The schema of your data and
The way applications in your database query and perform write operations.  

In the next section, we will discuss the characteristics of an ideal shard key.

Ideal MongoDB Shard Key

You will learn the characteristics of an ideal shard key in this MongoDB sharding tutorial. They are:

Must be Easily Divisible: An easily divisible shard key enables MongoDB to perform data distribution among shards. If shard keys contain limited number of possible values, then the chunks in shards cannot be split. For example, if a chunk or a range represents a single shard key value, then the chunk cannot be split even if it exceeds the recommended size.

Must Have a High Degree of Randomness: A shard key must possess a high degree of randomness. This ensures that a single shard distributes write operations among the cluster and does not become a bottleneck.

Must be Able to Target a Single Shard: A shard key must target a single shard to enable the mongos program to return most of the query operations directly from a single mongod instance. In addition, the shard key should be the primary field in queries. Fields having a high degree of “randomness” cannot target operations to specific shards.

Note:  If an existing field in your collection is not the ideal key, compute a special purpose shard key or use a compound shard key.

In the next section, we will discuss what range-based sharding in MongoDB is.

Range-Based MongoDB Shard Key

In range-based sharding, MongoDB divides data sets into different ranges based on the values of shard keys. Thus it provides range-based partitioning.


For example, consider a numeric shard key. If an imaginary number line goes from negative infinity to positive infinity, each shard key value falls at some point on that line. MongoDB partitions this line into chunks where each chunk can have a range of values. In range-based MongoDB sharding, documents having “close” shard key values reside in the same chunk and shard.

Range-based partitioning supports range queries because for a given range query of a shard key, the query router can easily find which shards contain those chunks. Data distribution in range-based partitioning can be uneven, which may negate some benefits of sharding.  

For example, if a shard key field size increases linearly, such as time, then all requests for a given time range will map to the same chunk and shard. In such cases, a small set of shards may receive most of the requests and the system would fail to scale.

In the next section, we will discuss Hash based sharding.

Hash-Based Sharding in MongoDB

For hash based partitioning, MongoDB first calculates the hash of a field’s value, and then creates chunks using those hashes. Unlike range-based partitioning, in hash based partitioning, documents with “close” shard key values may not reside in the same chunk.  

Hash-based Sharding have the following features:

Data is evenly distributed among shards

Hashed key values randomly distribute data across chunks and shards

Range queries are ineffective, as data will be distributed to many shards rather than a few number of shards as in the case of range based partitioning

In the next section, we will discuss the impact of shard key cluster operation.

Impact of MongoDB Shard Keys on Cluster Operation

Some shard keys are capable of scaling write operations. Typically, a computed shard key that possess “randomness,” to some extent allows a cluster to scale write operations. For example, hash keys that include a cryptographic hash, such as MD5 or SHA1 (read asMD Five or S-H-A one) can scale write operations. However, random shard keys do not provide query isolation. To improve write scaling, MongoDB enables sharding a collection on a hashed index.  

MongoDB improves write scaling using two methods:

Querying: A mongos instance provides an interface to enable applications to interact with sharded clusters. When mongos receives queries from client applications, it uses metadata from the config server and routs queries to the mongod instances. Mongos makes querying operational in sharded environments. However, the shard key you select can impact the query performance.

Query Isolation: Query execution will be fast and efficient if mongos can route to a single shard using a shard key and metadata stored from the config server. Queries that require routed to many shared will not be efficient as they will have to wait for a response from all of those shards. If your query contains the first component of a compound shard key then mongos can route a query to a single or the minimum number of shards, which provides good performance.  

In the next section, we will discuss production cluster architecture.

Production Cluster Architecture for MongoDB

Special mongod instances, such as config servers store metadata for a sharded cluster. Config servers provide consistency and reliability using a two-phase commit. Config servers do not run as replica sets and must be available to deploy a sharded cluster or to make changes to a cluster metadata. The production sharded cluster shown in the image here has the following three config servers.


Each config server must be on separate machines. A single sharded cluster must have an exclusive use of its config servers. If you have multiple sharded clusters, you will need to have a group of config servers for each cluster. Config servers store the metadata in the config databases. The mongos instance caches this data and uses it to route the reads and writes to shards.  

MongoDB performs writes operations in the config server only in the following cases:

To split the existing chunks and

To migrate a chunk between shards.

MongoDB reads data from the config server when:

A new mongos starts for the first time, or an existing mongos restarts.

After a chunk migration, the mongos instances update themselves with the new cluster metadata.

MongoDB also uses the config server to manage the distributed locks.  

In the next section of the MongoDB sharding tutorial, we will discuss Config server availability.

Config Server Availability for MongoDB

When one or more config servers are unavailable, the metadata of the cluster becomes read-only. You can still read and write data from shards. However chunk migrations occur only when all the three servers are available. If all three config servers are unavailable, you can still use the cluster provided the mongos instances do not restart. If the mongos instances are restarted before the config servers are available, the mongos become unable to route reads and writes.

Clusters are inoperable without the cluster metadata. Therefore, ensure that the config servers remain available and intact throughout. Query routers are the mongos processes that interface with the client applications and direct queries to the appropriate shard or shards. All the queries from client applications are processed by the query router and in the sharded cluster it is recommended to use 3 query routers.  

In the next section of the MongoDB sharding tutorial, we will discuss production cluster deployment.

Production Cluster Deployment in MongoDB

When deploying a production cluster, ensure data redundancy and systems availability. A production cluster must have the following components:

Config Servers: there are three config servers. Each of the config servers must be hosted on separate machines. Each single sharded cluster must have an exclusive use of its config servers. For multiple sharded clusters, you must have a group of config servers for each cluster.

Shards: A production cluster must have two or more replica sets or shards.

Query Routers or mongos: A production cluster must have one or more mongos instances. The mongos instances act as the routers for the cluster. Production cluster deployments typically have one mongos instance on each application server. You may deploy more than one mongos instances and also use a proxy or load balancer between the application and mongos.  

When making these deployments, configure the load balancer to enable a connection from a single client reach the same mongos. Cursors and other resources are specific to a single mongos instance, and hence each client must interact with only one mongos instance.

In the next section MongoDB sharding tutorial, we will discuss how to deploy a sharded cluster.

Deploy a MongoDB Sharded Cluster

To deploy a sharded cluster for MongoDB, perform the following sequence of tasks:

Step 1: Create data directories for each of the three config server instances. By default, a config server stores its data files in the /data/configdb directory (read as slash data slash config D-B directory). You can also choose an alternate location to store the data. To create a data directory, issue the command as shown below.

mkdir /data/configdb

Step 2: Start each config server by issuing a command using the syntax given here. The default port for config servers is 27019 (read as 2-7-0-1-9). You can also specify a different port. The example given here starts a config server using the default port and default data directory. By default, a mongos instance runs on the port 27017.

mongod --configsvr --dbpath /data/configdb --port 27019

Step 3: To start a mongos instance, issue a command using the syntax given here:

mongo --host --port 27017

For example, to start a mongos that connects to a config server instance running on the following hosts and on the default ports, issue the last command given here.

sh.addShard( “")


In the next section of the MongoDB sharding tutorial, we will discuss how to add shards to a cluster.

Add MongoDB Shards to a Cluster

To add shards to a cluster, perform the following steps.


Step 1: From a mongo shell, connect to the mongos instance and issue a command using the syntax given here.

mongo --host <hostname of machine running mongos> --port <port mongos listens on>

If a mongos is accessible at (read as node one dot example dot net) on the port 27017, (read as 2-7-0-1-7) issue the following command given here:

mongo --host --port 27017

Step 2: Add each shard to the cluster using the sh.addShard() (read as Add shard method on the sh console ) method, as shown in the examples given here.


Step 3: issue the sh.addShard() method separately for each shard. If a shard is a replica set, specify the name of the replica set and specify a member of the set. In production deployments, all shards should be replica sets. To add a shard for a replica set named rs1 (read as RS one) with a member running on the port 27017 on as mongo D-B zero dot example dot net), issue the following command given here.

sh.addShard( "rs1/" )

To add a shard for a standalone mongod on the port 27017 of, issue the following command given here:

sh.addShard( "" )

In the next section of the MongoDB sharding tutorial, we will view a demo on how to create a sharded cluster in MongoDB.

Enable Sharding for MongoDB Database

Before you start sharding a collection, first enable sharding for the database of the collection. When you enable sharding for a database, it does not redistribute data but allows sharding the collections in that database. MongoDB assigns a primary shard for that database where all the data is stored before sharding begins. To enable sharding, perform the following steps.


Step 1: From a mongo shell, connect to the mongos instance. Issue a command using the syntax given here.

mongo --host <hostname of machine running mongos> --port <port mongos listens on>

Step 2: Issue the sh.enableSharding() (read as Sh dot enable sharding) method, specify the name of the database for which you want to enable sharding.

Use the syntax given here:


Optionally, you can enable sharding for a database using the “enableSharding” command. For this, use the syntax given here.

db.runCommand( { enableSharding: <database> } )

In the next section of the MongoDB sharding tutorial, we will discuss how to enable sharding for a collection.

Enable MongoDB Sharding for Collection

You can enable sharding on a per-collection basis. To enable sharding for a collection, perform the following steps.


Step 1: Determine the shard key value. The selected shard key value impacts the efficiency of sharding.

Step 2: If the collection already contains data, create an index on the shard key using the createIndex() method. If the collection is empty then MongoDB will create the index as a part of the sh.shardCollection()(read as S-H dot shard collection) step.

Step 3: To enable sharding for a collection, open the mongo shell and issue the sh.shardCollection()(read as S-H dot shard collection) method. The method uses the syntax given here.

sh.shardCollection("<database>.<collection>", shard-key-pattern)

We will continue with our discussion on enabling MongoDB sharding for a collection.

Enable MongoDB Sharding for Collection (contd.)

To enable sharding for a collection replace the string. (read as database dot collection) with the full namespace of your database. This string consists of the name of your database, a dot, and the full name of the collection.


The shard-key-pattern represents your shard key, which you specify in the same form as you would an index key pattern.

sh.shardCollection("records.people", { "zipcode": 1, "name": 1 } )

sh.shardCollection("people.addresses", { "state": 1, "_id": 1 } )

sh.shardCollection("assets.chairs", { "type": 1, "_id": 1 } )

sh.shardCollection("events.alerts", { "_id": "hashed" } )

This example given above shows MongoDB sharding collections based on the partition key.

In the next section of the MongoDB sharding tutorial, we will discuss maintaining a balanced data distribution.

Maintaining a Balanced Data Distribution for MongoDB

When you add new data or new servers to a cluster, it may impact the data distribution balances within the cluster.


For example, some shards may contain more chunks than another shard or the chunk sizes may vary and some can be significantly larger than other chunks. MongoDB uses two background processes, splitting and a balancer to ensure a balanced cluster.

In the next section of the MongoDB sharding tutorial, we will discuss how MongoDB uses splitting to ensure a balanced cluster.

Splitting in MongoDB

Splitting is a background process that checks the chunk sizes. When a chunk exceeds a specified size, MongoDB splits the chunk into two halves.
When performing a split function:

MongoDB uses the insert and update functions to trigger a split.

MongoDB does not perform any data migration or affect the shards.

When you split, the chunk distribution may uneven a collection across the shards. In such cases, the mongos instances will redistribute the chunks across shards.

In the next section of the MongoDB sharding tutorial, we will discuss Chunk Size

Chunk Size in MongoDB

In MongoDB, the default chunk size is 64 megabytes. You can alter a chunk size. However, this may impact the efficiency of the cluster. Smaller chunks enable an even distribution of data but may result in frequent migrations. This creates expense at the query routing or mongos layer.  

Larger chunks encourage fewer migrations and are more efficient from the networking perspective and in terms of internal overhead at the query routing layer. However, this may lead to an uneven distribution of data. The chunk size affects the maximum number of documents per chunk to migrate. If you have many deployments to make, you can use larger chunks. This will help avoid frequent and unnecessary migrations although the data distribution will be uneven.

When you split chunks, changing the chunk size affects with the following limitations.

Automatic splitting occurs only during inserts or updates. When you reduce the chunk size, all chunks may take time to split to the new size.

Once you split chunks, they cannot be “undone”. When you increase the chunk size, the existing chunks must grow through inserts or updates until they reach the new size.  

In the next section of the MongoDB sharding tutorial, we will discuss special chunk types.

Special Chunk Types in MongoDB

Following are the special chunk types.

Jumbo Chunks

Indivisible Chunks

Jumbo chunks

MongoDB does not migrate a chunk if its size is more than the specified size or if the number of documents contained in the chunk exceeds the maximum number of documents per chunk to migrate.

In such situations, MongoDB splits the chunk. If the chunk cannot be split, MongoDB labels the chunk as jumbo.

This avoids repeated attempts to migrate the same chunk. 

Indivisible Chunks

Sometimes, chunks can grow beyond the specified size but cannot be split. For example, when a chunk represents a single shard key value. Invisible chunks are those chunks that cannot be broken by the split process.
In the next section of the MongoDB sharding tutorial, we will discuss shard balancing.

Shard Balancing in MongoDB

MongoDB uses balancing to redistribute data within a sharded cluster. When the shard distribution in a cluster is uneven, the balancer migrates chunks from one shard to another to achieve a balance in chunk numbers per shard.

Chunk migrations is a background operation that occurs between two shards—an origin and a destination. The origin shard sends the destination shard to all the current documents. During the migration, if an error occurs, the balancer aborts the process and leaves the chunk unchanged in the origin shard. 

Adding a new shard to a cluster may create an imbalance because the new shard has no chunks. Similarly, when a shard is being removed, the balancer migrates all the chunks from the shard to other shards. After all the data is migrated and the metadata is updated, the shard can be safely removed.

In the next section of the MongoDB sharding tutorial, we will discuss shard balancing.

Shard Balancing in MongoDB  (contd.)

Chunk migrations carry bandwidth and workload overheads, which may impact the database performance.  

Shard balancing minimizes overhead impact by:

Moving only one chunk at a time.
Starting a balancing round only when the difference between the greatest number of chunks for a sharded collection and the shard with the lowest number of chunks for that collection reaches the migration threshold.  

In some scenarios, you may want to disable the balancer temporarily such as for maintenance of MongoDB or during the peak load time to prevent impacting the performance of MongoDB.

In the next section of the MongoDB sharding tutorial, we will discuss customized data distribution.

Customized Data Distribution with Tag Aware Sharding in MongoDB

A MongoDB administrator can control sharding by using tags. Tags help control the balancer behaviour and chunks distribution in a cluster. As an administrator you can create tags with the ranges of shard keys and assign them those to the shards.

The balancer process migrates the tagged data to the shards that are assigned for those tags. Tag aware sharding in MongoDB helps you improve the locality of data for a sharded cluster.  

In the next section of the MongoDB sharding tutorial, we will discuss Tag Aware Sharding for MongoDB.

Tag Aware Sharding in MongoDB

In mongoDB, you can create tags for a range of shard keys to associate those ranges to a group of shards. Those shards receive all inserts within that tagged range. The balancer that moves chunks from one shard to another always obeys these tagged ranges. The balancer moves or keeps a specific subset of the data on a specific set of shards and ensures that the most relevant data resides on the shard which is geographically closer to the client\application server.

In the next section of the MongoDB sharding tutorial, we will discuss how to add shard tags.

Add Shard Tags in MongoDB

When connected to a mongos instance, use the sh.addShardTag() (read as s-h dot add shard tag) method to associate tags with a particular shard. A single shard may have multiple tags whereas multiple shards may have a common tag.

sh.addShardTag("shard0000", "NYC")

sh.addShardTag("shard0001", "NYC")

sh.addShardTag("shard0002", "SFO")

sh.addShardTag("shard0002", "NRT")

Consider the example given above. This adds the tag NYC to two shards, and adds the tags SFO and NRT to a third shard. To assign a tag to a range of shard keys, connect to the mongos instance and use the sh.addTagRange()(read as s-h dot add tag Range) method.

Any given shard key range may only have one assigned tag. You cannot overlap defined ranges, or tag the same range more than once. A collection named users in the records database is sharded by the zipcode field.


The following operations assign:

• Two ranges of zip codes in Manhattan and Brooklyn, the NYC tag

• One range of zip codes in San Francisco, the SFO tag

sh.addTagRange("records.users", { zipcode: "10001" }, { zipcode: "10281" }, "NYC")
sh.addTagRange("records.users", { zipcode: "11201" }, { zipcode: "11240" }, "NYC")
sh.addTagRange("records.users", { zipcode: "94102" }, { zipcode: "94135" }, "SFO")

(Note that the shard key range tags are distinct from the replica set member tags.)

Hash-based sharding in MongoDB only supports tag-aware sharding on an entire collection. Shard ranges include the lower value and exclude the upper value. 

In the next section of the MongoDB sharding tutorial, we will discuss how to remove shard tags.

Remove Shard Tags in MongoDB

Typically, shard tags exist in the shard’s document in a collection of the config database. To return all shards with a specific tag, use the operations given here.

use config

db.shards.find({ tags: "NYC" })

This will return only those shards that are tagged with NYC. You can find tag ranges for all the namespaces in the tags collection of the config database. The output of sh.status()(read as s-h dot status) displays all tag ranges. To return all shard key ranges tagged with NYC, use the following sequence of operations given here:

use config

db.tags.find({ tags: "NYC" })

The mongod facilitates removing a tag range. To delete a tag assignment from a shard key range, remove the corresponding document from the tags collection of the config database. Each document in the tags holds the namespace of the sharded collection and a minimum shard key value.

The third example given here removes the NYC tag assignment for the range of zip codes within Manhattan.

use config

db.tags.remove({ _id: { ns: "records.users", min: { zipcode: "10001" }}, tag: "NYC" })


Here is a quick recap of what was covered in this MongoDB replication and Sharding tutorial:

Replication is recommended in the production setting to increase data availability and redundancy.

Replica sets are recommended for new production deployments to replicate data in a cluster.

A typical replica set in MongoDB consists of one primary and two secondaries.

In addition to the primary and secondary, a replica set can have an arbiter, a priority zero member, a hidden member, and a delayed member.

Consider deploying a sharded cluster when the data set exceeds the storage capacity of a single MongoDB instance, the size of the active working set exceeds the capacity of the maximum available RAM, or a single • • • • MongoDB instance is unable to manage the write operations.

To deploy a sharded cluster, start the config server database instances and then start the three config server instances.

Shard balancing is a background process that manages chunk migration for an even distribution data shards in a cluster.


This concludes the lesson Replication and Sharding in MongoDB. In the next lesson, we will discuss developing Java and Node JS application with MongoDB.


Find our MongoDB Developer and Administrator Online Classroom training classes in top cities:

Name Date Place
MongoDB Developer and Administrator 22 Feb -15 Mar 2020, Weekend batch Your City View Details
  • Disclaimer
  • PMP, PMI, PMBOK, CAPM, PgMP, PfMP, ACP, PBA, RMP, SP, and OPM3 are registered marks of the Project Management Institute, Inc.

Request more information

For individuals
For business
Phone Number*
Your Message (Optional)
We are looking into your query.
Our consultants will get in touch with you soon.

A Simplilearn representative will get back to you in one business day.

First Name*
Last Name*
Phone Number*
Job Title*