Administration of MongoDB Cluster Operations Tutorial

7.1 Administration of MongoDB Cluster Operations

Hello and welcome to Lesson 7 of the MongoDB Administrator course offered by Simplilearn. This lesson will explain the administrative features of MongoDB. Let us explore the objectives of this lesson in the next screen.

7.2 Objectives

After completing this lesson, you will be able to:
• Explain what capped collections are
• Explain how a Grid File System, or GridFS (Read as Grid F-S), is used for storing large data
• Explain what memory-mapped files are
• Explain allocation algorithms
• Describe the storage engines MongoDB supports
In the next screen, we will begin with what capped collections are.

7.3 Capped Collection

Capped collections in MongoDB are collections with predefined sizes that support operations to insert and retrieve documents. Capped collections function like circular buffers: once a collection fills its allocated space, it overwrites the oldest documents to make room for new ones. Capped collections exhibit the following behaviors:
• They preserve insertion order. Therefore, queries can return documents in insertion order without the help of an index, which helps them support higher insertion throughput.
• They ensure that the insertion order matches the order on disk, or the natural order. They do so by prohibiting updates that increase document size; only updates that fit the original document size are allowed. This ensures that a document’s location on disk remains unaltered.
• They automatically delete the oldest documents in the collection, without requiring scripts or explicit remove operations.
In the next screen, we will discuss how to create a capped collection.

7.4 Capped Collection Creation

To create a capped collection, you need to use the createCollection() (Read as create collection) method explicitly. When creating a capped collection, specify the maximum size of the collection in bytes; MongoDB will pre-allocate this size for the collection. The size must include space for internal overhead. Use the syntax given on the screen to specify the size.
db.createCollection( "serverlog", { capped: true, size: 100000 } )
If the specified size field is less than or equal to 4096 (Read as four thousand ninety-six) bytes, the collection will have a cap of 4096 bytes. Otherwise, MongoDB will raise the size to an integer multiple of 256 bytes. Additionally, you may specify the maximum number of documents for the collection using the max field, as shown on the screen.
db.createCollection("server", { capped : true, size : 5242880, max : 5000 } )
We will continue our discussion on the creation of a capped collection in the next screen.

7.5 Capped Collection Creation (contd.)

When working with a capped collection, you can do the following:
Query a Capped Collection: If you perform a find() query on a capped collection without specifying any order, MongoDB returns the results in insertion order. To retrieve documents in reverse insertion order, issue the find() query along with the sort() method and set the $natural parameter to -1 (Read as minus one), as shown in the example given on the screen.
db.cappedCollection.find().sort( { $natural: -1 } )
Check if a Collection is Capped: Use the isCapped() method to determine whether a collection is capped, as given in the command on the screen.
db.collection.isCapped()
Convert a Collection to Capped: Convert a non-capped collection to a capped collection with the convertToCapped (Read as convert to capped) command.
db.runCommand({"convertToCapped": "items", size: 100000});
In the next screen, we will view a demo on how to create a capped collection.

7.6 Demo-Create a Capped Collection in MongoDB

This demo will show the steps to create a capped collection in MongoDB. Click the demo icon to view the demo.

7.8 Capped Collection Restriction

When updating documents in a capped collection, you can make only in-place updates. If an update operation increases the original document size, the operation fails. To update documents in a capped collection, first create an index so the update does not require a full collection scan. If you update a document to a size smaller than the original, and a secondary then resyncs from the primary, the secondary replicates and allocates space based on the current, smaller size of the document. If the primary later receives an update that restores the document to its original size, the primary accepts the update, whereas the secondary fails with the error message “objects in a capped ns (Read as N-S) cannot grow”. Note that you cannot remove individual documents from a capped collection. However, you can remove all documents by dropping the entire collection using the drop() method. You also cannot shard a capped collection. In the next screen, we will discuss TTL collection features.
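The in-place restriction can be seen from the mongo shell. The collection, fields, and values below are hypothetical, and the commands assume a running mongod:

```javascript
db.createCollection("serverlog", { capped: true, size: 100000 })
db.serverlog.insert({ host: "app01", status: "ok" })
db.serverlog.createIndex({ host: 1 })   // index first, to avoid a collection scan on update

// In-place update: "up" is the same size as "ok", so this succeeds
db.serverlog.update({ host: "app01" }, { $set: { status: "up" } })

// Growing update: adding a new field increases the document size, so this fails
db.serverlog.update({ host: "app01" }, { $set: { note: "disk usage high" } })
```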

7.9 TTL Collection Features

For a capped collection, data is automatically removed once the collection grows beyond the threshold size or the defined number of documents. For more flexibility in expiring data, consider MongoDB’s time to live, or TTL, indexes. TTL indexes delete data from normal collections using a date-typed field and a TTL value defined on the index. For TTL collections, mongod (Read as mongo D) automatically removes data after a specified duration in seconds or at a specific clock time. Note that TTL indexes are not compatible with capped collections. A special TTL index property supports the implementation of TTL collections, as shown on the screen.
db.log_events.createIndex( { "createdAt": 1 }, { expireAfterSeconds: 3600 } )
In the next screen, we will view a demo on how to create TTL indexes in MongoDB.
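For expiry at a specific clock time, create the TTL index with expireAfterSeconds set to 0 and store the intended expiration time in the indexed date field. The collection, field names, and date below are illustrative:

```javascript
db.log_events.createIndex( { "expireAt": 1 }, { expireAfterSeconds: 0 } )
db.log_events.insert( {
    expireAt: new Date("2015-07-22T14:00:00Z"),   // mongod removes the document after this time
    logEvent: 2,
    logMessage: "Success!"
} )
```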

7.10 Demo-Create TTL Indexes

This demo will show the steps to create TTL indexes in MongoDB. Click the demo icon to view the demo.

7.12 GridFS

GridFS is a specification for storing and retrieving files that exceed 16 MB, the BSON document size limit. GridFS splits a file into parts, or chunks, and stores each chunk as a separate document. The default chunk size in GridFS is 255 KB. GridFS stores files in two collections: one collection stores the file chunks, and the other stores the file metadata. When returning a query result from GridFS, the driver or client reassembles the chunks as required. You can perform range queries on files stored in GridFS. For example, you can “skip” into the middle of a video or audio file to access information. GridFS allows you to access sections of files without loading the entire file into memory. In the next screen, we will discuss GridFS collections.
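As a rough sketch of the chunking idea, the following Node.js snippet splits a byte buffer into 255 KB pieces and tags each piece with a sequence number n, much as a driver does before writing the chunks collection. This is an illustration of the concept, not the driver’s actual implementation:

```javascript
// Illustrative GridFS-style chunking: split a buffer into fixed-size
// chunks (default 255 KB), each carrying its sequence number n.
const CHUNK_SIZE = 255 * 1024;

function splitIntoChunks(buffer, chunkSize = CHUNK_SIZE) {
  const chunks = [];
  for (let offset = 0, n = 0; offset < buffer.length; offset += chunkSize, n += 1) {
    chunks.push({ n: n, data: buffer.slice(offset, offset + chunkSize) });
  }
  return chunks;
}

// A 600 KB file becomes three chunks: 255 KB, 255 KB, and 90 KB.
const chunks = splitIntoChunks(Buffer.alloc(600 * 1024));
```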

7.13 GridFS Collection

You can store and retrieve files from GridFS using either of the following tools:
• A MongoDB driver
• The mongofiles command-line tool
GridFS stores files in two collections:
• fs.chunks (Read as F-S dot chunks) stores the binary chunks.
• fs.files (Read as F-S dot files) stores the file’s metadata.
You can choose a different bucket name and also create multiple buckets in a single database. Each document in the chunks collection represents a distinct chunk of a file as stored in GridFS. Each chunk is identified by the unique ObjectId in its _id field.
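For example, you can load a file into GridFS with the mongofiles tool and then inspect the two collections from the shell. The database name, file name, and field values below are illustrative placeholders:

```shell
$ mongofiles -d records put report.pdf

> use records
> db.fs.files.findOne()
{ "_id" : ObjectId("..."), "filename" : "report.pdf", "length" : 1048576,
  "chunkSize" : 261120, "uploadDate" : ISODate("..."), "md5" : "..." }
> db.fs.chunks.findOne( {}, { data: 0 } )   // omit the binary data field
{ "_id" : ObjectId("..."), "files_id" : ObjectId("..."), "n" : 0 }
```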

7.14 GridFS Index

GridFS uses a unique, compound index on the chunks collection covering the files_id (Read as files underscore ID) and n fields. The files_id field contains the _id of the chunk’s “parent” document, and the n field contains the chunk’s sequence number. GridFS numbers all chunks starting with 0 (Read as zero). This index allows the efficient retrieval of chunks using the files_id and n values, as shown in the example given on the screen.
cursor = db.fs.chunks.find({files_id: myFileID}).sort({n:1});
If your driver does not create this index, issue the following operation given on the screen using the mongo shell.
db.fs.chunks.createIndex( { files_id: 1, n: 1 }, { unique: true } );
In the next screen, we will view a demo on how to insert an image in MongoDB from a Java application.

7.15 Demo-Create GridFS in MongoDB Java Application

This demo will show the steps to insert an image in MongoDB from a Java application. Click the demo icon to view the demo.

7.16 Memory-Mapped Files

A memory-mapped file is a file whose contents the operating system maps into memory using the mmap() (Read as M-Map) system call, which maps the file to a region of virtual memory. Memory-mapped files are a critical piece of the MMAPv1 (Read as M-Map version one) storage engine in MongoDB. They enable MongoDB to treat the contents of its data files as if they were in memory, which allows MongoDB to access and manipulate data faster. MongoDB uses memory-mapped files to manage and interact with all types of data. Memory mapping assigns data files to a block of virtual memory with a byte-for-byte correlation. As documents are accessed, MongoDB memory-maps them; only accessed data is mapped into memory, whereas data that is not accessed is not mapped. Once memory mapping is complete, the relationship between the file and memory allows MongoDB to interact with the data in the file as if it were in memory. In the next screen, we will discuss journaling mechanics.

7.17 Journaling Mechanics

MongoDB records write operations in an on-disk journal before applying them to the data files. This ensures the durability of write operations. Before making any changes to the data files, MongoDB first writes the change operation to the journal. If a MongoDB instance encounters an error or terminates before writing the changes from the journal to the data files, MongoDB can re-apply the write operations to restore a consistent state. If a mongod instance exits unexpectedly without a journal, your data may be in an inconsistent state; you must then run a repair or, preferably, resync from a clean member of the replica set. When journaling is enabled, even if mongod stops unexpectedly, the program can recover all data that was written to the journal, and the recovered data is in a consistent state. By default, the most data MongoDB can lose is the writes made in the last 100 milliseconds, that is, the writes not yet committed to the journal. If journaling is enabled and you have sufficient RAM in your system, the entire data set and the write working set can reside in RAM. To enable journaling, start mongod with the --journal command-line option. In the next screen, we will discuss storage engines.
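For example, you might start mongod with journaling enabled and then confirm it from the shell. The dbpath below is an assumption; the dur section appears in serverStatus only when journaling is active:

```shell
$ mongod --dbpath /data/db --journal

> db.serverStatus().dur            // journaling statistics; present only when journaling is enabled
> db.serverStatus().dur.commits    // journal commits in the most recent interval
```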

7.18 Storage Engines

A storage engine of a database manages how data is stored on disk. Typically, databases support multiple storage engines, each optimized for specific workloads. For example, one storage engine may manage read-heavy operations, whereas another may support higher throughput for write operations. With multiple storage engines available, you can choose the one that best suits your application. In the next screen, we will discuss the MMAPv1 (Read as M-Map version 1) storage engine.

7.19 MMAPv1 Storage Engine

MMAPv1 is a storage engine based on memory-mapped files. This storage engine can manage high-volume inserts, reads, and in-place updates. MMAPv1 is the default storage engine in MongoDB 3.0 and all previous versions. To ensure that all data set modifications are persisted to disk, MongoDB records all modifications in a journal, which it writes to disk more frequently than the data files. The journal lets MongoDB recover the data files after a mongod instance exits before flushing all changes. In the next screen, we will discuss the WiredTiger (Read as Wired Tiger) storage engine.

7.20 WiredTiger Storage Engine

In MongoDB version 3.0, an additional storage engine is available. Although MMAPv1 remains the default storage engine, the new WiredTiger engine offers additional flexibility and improved throughput for many workloads. The WiredTiger storage engine is optionally available in the 64-bit build of MongoDB version 3.0. This engine supports high volumes of reads, inserts, and more complex update workloads.
Document-Level Locking: All write operations in WiredTiger occur within the context of a document-level lock. Therefore, multiple clients can simultaneously modify different documents in a single collection. With such fine-grained control, MongoDB can effectively support high-throughput concurrent read, write, and update workloads.
For data persistence, WiredTiger uses a write-ahead transaction log in combination with checkpoints. MongoDB commits a checkpoint to disk every 60 seconds or after 2 gigabytes of journal data have been written. The checkpoint thresholds are configurable, and all data files are always valid between and during checkpoints. The WiredTiger journal persists all data modifications between checkpoints. If MongoDB exits between checkpoints, it replays the journal to recover all data modified since the last checkpoint. By default, the WiredTiger journal is compressed using the snappy algorithm. In the next screen, we will discuss WiredTiger compression support.
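In MongoDB 3.0 you select the engine at startup and can confirm it from the shell; the dbpath below is an assumption:

```shell
$ mongod --storageEngine wiredTiger --dbpath /data/wt

> db.serverStatus().storageEngine
{ "name" : "wiredTiger" }
```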

7.21 WiredTiger Compression Support

MongoDB uses block and prefix compression to support compression for collections and indexes. Compression minimizes storage use at the cost of additional CPU. By default, the WiredTiger engine uses prefix compression for all indexes and block compression with the snappy algorithm for all collections. Compression with zlib (Read as z-lib) is also available. You can modify the default compression settings for all collections and indexes. During collection and index creation, you can also configure compression on a per-collection and per-index basis. For most workloads, the default compression settings balance storage efficiency and processing requirements. In the next screen, we will discuss power of 2 sized allocations.
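For example, to override the default snappy block compressor for a single collection at creation time (the collection name here is illustrative):

```javascript
db.createCollection( "events", {
    storageEngine: { wiredTiger: { configString: "block_compressor=zlib" } }
} )
```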

7.22 Power of 2-Sized Allocations

MongoDB version 3.0 uses power of 2 sized allocation as the default record allocation strategy for the MMAPv1 storage engine. With this strategy, each record has a size in bytes that is a power of 2, for example 32, 64, 128, 256, or 512 bytes, up to 2 MB. For documents larger than 2 MB, the allocation is rounded up to the nearest multiple of 2 MB. The power of 2 sized allocation strategy has the following key properties:
• It can reuse freed records and reduce fragmentation. Quantizing record allocation sizes to a fixed set of values ensures that an insert will fit into free space created by the deletion or relocation of earlier documents.
• It can reduce moves. The added padding gives a document room to grow without requiring a move. In addition to saving the cost of moving, this results in fewer document relocations during updates.
In the next screen, we will discuss the no padding allocation strategy.
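The rounding rule can be sketched as a small JavaScript function. This helper is purely illustrative and not a MongoDB API:

```javascript
// Illustrative sketch of power-of-2 record allocation: round a document
// size up to the next power of 2, and round documents larger than 2 MB
// up to the nearest multiple of 2 MB.
function allocationSize(docBytes) {
  const TWO_MB = 2 * 1024 * 1024;
  if (docBytes > TWO_MB) {
    return Math.ceil(docBytes / TWO_MB) * TWO_MB;
  }
  let size = 32; // smallest bucket mentioned in the lesson
  while (size < docBytes) {
    size *= 2;
  }
  return size;
}
```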

7.23 No Padding Allocation Strategy

For some collections, workloads consist only of inserts, or of updates that do not increase or change document sizes. For such workloads, you can disable power of 2 allocation using the collMod (Read as collection modification) command with the noPadding flag, or the db.createCollection() (Read as D-B dot create collection) method with the noPadding option. Prior to version 3.0, MongoDB used an allocation strategy that included dynamically calculated padding as a factor of the document size. In the next screen, we will discuss how to diagnose performance issues.
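Assuming a hypothetical insert-only collection named logs, either form disables the power of 2 strategy:

```javascript
db.runCommand( { collMod: "logs", noPadding: true } )
// or, at creation time:
db.createCollection( "logs", { noPadding: true } )
```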

7.24 Diagnosing Performance Issues

Degraded performance in MongoDB is typically a function of the relationship among the following:
• The quantity of data stored in the database
• The amount of available system RAM
• The number of connections to the database, and
• The amount of time the database spends in a locked state.
In some cases, performance issues are momentary and relate to traffic load, data access patterns, or, in virtualized environments, the availability of hardware on the host system. Performance issues can also result from inadequate or inappropriate indexing strategies and poor schema design patterns. In other situations, performance issues may indicate that the database requires additional capacity. We will continue our discussion on diagnosing performance issues in the next screen.

7.25 Diagnosing Performance Issues (contd.)

The following are a few causes of performance degradation in MongoDB.
Locks: To ensure data set consistency, MongoDB uses locks. However, if certain operations run for long durations or queue up, performance degrades because requests and operations wait for the lock. Lock-related slowdowns can be intermittent. To know whether a lock is affecting your database performance, review the data in the globalLock section of the serverStatus output. If globalLock.currentQueue.total (Read as global lock dot current queue dot total) is consistently high, it is likely that a large number of requests are waiting for a lock. This indicates that a concurrency issue may be affecting your database performance.
Memory Usage: MongoDB uses memory-mapped files to store data. For a data set of sufficient size, the MongoDB process allocates all the available memory on the system. This is part of the design for MongoDB’s enhanced performance. However, memory-mapped files make it difficult to determine whether the amount of RAM is sufficient for a data set. To determine MongoDB’s memory usage, check the memory usage metrics of the serverStatus output. Check the resident memory using the mem.resident (Read as mem dot resident) field: if this exceeds the system memory and there is a significant amount of data on disk that is not in RAM, you have exceeded the capacity of your system. In addition, check the amount of mapped memory, mem.mapped (Read as mem dot mapped). If this value is greater than the system memory, some operations will require reads through virtual memory, which will negatively impact system performance. We will discuss a few more performance issues in the next screen.
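From the shell, the metrics mentioned above can be read directly out of serverStatus; the values returned will vary per deployment:

```javascript
db.serverStatus().globalLock.currentQueue.total   // requests currently waiting for a lock
db.serverStatus().mem.resident                    // resident memory, in MB
db.serverStatus().mem.mapped                      // mapped memory, in MB (MMAPv1)
```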

7.26 Diagnosing Performance Issues (contd.)

A few more causes of performance issues are as follows:
Page Faults: When the MMAPv1 storage engine is used, page faults occur as MongoDB reads from or writes to data files that are not located in physical memory. In contrast, operating system page faults occur when physical memory is exhausted and pages of physical memory are swapped to disk. MongoDB reports the page faults it triggers as the total number of page faults in one second. To check for page faults, view the extra_info.page_faults (Read as extra underscore info dot page underscore faults) value in the serverStatus output. Page fault counters may rise drastically during moments of poor performance and may correlate with limited physical memory environments. Page faults can also increase when accessing larger data sets, for example, when scanning an entire collection. Limited and sporadic page faults do not necessarily indicate an issue and do not require corrective measures. To reduce the page fault frequency, increase the amount of RAM available to MongoDB. Alternatively, deploy a sharded cluster or add shards to distribute the load among mongod instances.
Number of Connections: The server’s ability to handle requests depends on the number of connections between the application layer and the database. If the number of connections is too high, this can result in performance irregularities. If the number of requests is high because of numerous concurrent application interactions, the database may not be able to meet the demand. In such a case, you need to increase the capacity of your deployment. For read-heavy applications, increase the size of your replica set and distribute read operations to secondary members. For write-heavy applications, deploy sharding and add one or more shards to a sharded cluster to distribute the load among mongod instances. MongoDB has no limit on incoming connections unless constrained by system-wide limits.
In the next screen, we will view a demo on monitoring performance in MongoDB.

7.27 Demo-Monitor Performance in MongoDB

This demo will show the steps to monitor performance in MongoDB. Click the demo icon to view the demo.

7.29 Optimization Strategies for MongoDB

Many factors can affect database performance and responsiveness, including index usage, query structure, data models, and application design. In addition, operational factors such as architecture and system configuration can also affect database performance. The following techniques are used for evaluating the operational performance of MongoDB.
Database Profiler: MongoDB provides a database profiler that identifies the performance characteristics of each operation against the database. For example, using the profiler, you can identify queries or write operations that are running slowly. You can use this information to determine what indexes to create.
Capped Collections: Capped collections are circular, fixed-size collections that keep documents in order even without the use of an index. Capped collections support very high-speed writes and sequential reads. For faster write operations, use capped collections.
$natural Order: To return documents in the order they exist on disk, use the $natural operator in sort operations. On a capped collection, this returns the documents in the order in which they were written. The natural order does not use indexes, but it enables fast operations when you want to select the first or last items on disk.
In the next screen, we will discuss methods for evaluating the performance of MongoDB operations.
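The profiler mentioned above can be enabled from the shell; the 100-millisecond threshold here is an arbitrary choice:

```javascript
db.setProfilingLevel(1, 100)   // level 1: record operations slower than 100 ms
db.system.profile.find().sort( { ts: -1 } ).limit(5).pretty()   // inspect the latest slow operations
```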

7.30 Evaluate the Performance of MongoDB Operations

You can evaluate the performance of MongoDB operations using the following methods:
db.currentOp() (Read as D-B dot current op): Use this method to evaluate mongod operations. This method reports the operations currently running on a mongod instance.
cursor.explain() (Read as cursor dot explain) and db.collection.explain() (Read as D-B dot collection dot explain): Use these explain methods to evaluate query performance, such as the index MongoDB selects to fulfil a query and its execution statistics. You can run the methods in the following three modes to control the amount of information returned:
• queryPlanner (Read as query planner)
• executionStats (Read as execution stats)
• allPlansExecution (Read as all plans execution)
In the next screen, we will discuss how to optimize query performance.
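Both methods can be tried from the shell; the orders collection and query below are hypothetical:

```javascript
db.currentOp()                                                 // operations currently in progress
db.orders.find( { status: "A" } ).explain("executionStats")    // query plan plus execution statistics
```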

7.31 Optimize Query Performance

You can optimize query performance in the following ways:
Create Indexes to Support Queries: Typically, scanning an index is much faster than scanning a collection. For commonly issued queries, create indexes for quick returns. To search on multiple fields, create a compound index for the query. Index structures are smaller than the documents they reference and store the references in sorted order.
Limit Query Results to Reduce Network Demand: MongoDB cursors return results in groups of multiple documents. To get only the desired number of results, use the limit() method and reduce the demand on network resources. For example, to get 10 results from a query to the “posts” collection, issue the following command given on the screen.
db.posts.find().sort( { timestamp : -1 } ).limit(10)
Use Projections to Return Only Necessary Data: When you need only a subset of fields from documents, you can achieve better performance by returning only the required fields. For example, if in your query to the posts collection you need only the timestamp, title, author, and abstract fields, you would issue the following command given on the screen:
db.posts.find( {}, { timestamp : 1 , title : 1 , author : 1 , abstract : 1} ).sort( { timestamp : -1 } )
Use hint() to Select a Particular Index: You can force MongoDB to use a specific index using the hint() method. Use the hint() method to support performance testing, or on queries where you must select a field included in several indexes.
Use the Increment Operator to Perform Operations Server-Side: If a field holds a number, use the $inc operator to increment or decrement its value in documents. This operator executes the logic on the server side rather than changing the value on the client side and sending the whole document to the server. It also helps avoid race conditions.
In the next screen, we will discuss the monitoring strategies used in MongoDB.
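For example, to decrement a counter server-side in a hypothetical products collection:

```javascript
// Decrement quantity by 1 on the server; no read-modify-write round trip
db.products.update( { sku: "abc123" }, { $inc: { quantity: -1 } } )
```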

7.32 Monitoring Strategies for MongoDB

Monitoring is a critical component of database administration. The following three strategies are used to collect data and monitor a MongoDB instance:
• Utilities distributed with MongoDB provide real-time reporting of database activities.
• Database commands return statistics regarding the current database state with greater fidelity.
• MMS Monitoring collects data from running MongoDB deployments and provides visualization and alerts based on that data.
Note that these strategies are complementary: each can help answer different questions and is useful in different contexts. In the next screen, we will discuss MongoDB utilities.

7.33 MongoDB Utilities

MongoDB includes a number of utilities that quickly return statistics about the performance and activity of an instance. Typically, these are useful for diagnosing issues and assessing normal MongoDB operation. These utilities are as follows:
mongostat: It captures and returns counts of database operations by type, for example, insert, query, update, and delete. These counts report the load distribution on the server. You can use mongostat to understand the distribution of operation types and to inform capacity planning.
mongotop: The mongotop (Read as mongo top) utility tracks the current read and write activity of a MongoDB instance and reports the statistics on a per-collection basis. You can use mongotop to check whether your database activity matches your expectations.
HTTP Console: MongoDB provides a Web interface that exposes diagnostic and monitoring information in a simple Web page. You can access this Web interface at localhost:<port> (Read as localhost colon port), where the port number is 1000 more than the mongod port. For example, if a locally running mongod is using the default port 27017 (Read as 2-7-0-1-7), you can access the HTTP console from the link given on the screen.
http://localhost:28017
In the next screen, we will discuss MongoDB commands.
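Both utilities are run from the system shell, not the mongo shell; the polling intervals below are arbitrary:

```shell
$ mongostat 5     # report operation counters every 5 seconds
$ mongotop 15     # report per-collection read/write activity every 15 seconds
```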

7.34 MongoDB Commands

MongoDB uses a number of commands that report on the state of the database. This data can provide a finer level of granularity than the utilities. You can use the output in scripts and programs to develop custom alerts or to modify the behavior of your application. The commands are as follows:
db.currentOp(): This method helps identify the in-progress operations of a database instance.
serverStatus: The serverStatus command, or the db.serverStatus() (Read as D-B dot server status) method from the shell, returns an overview of the database status. This command provides details of disk usage, memory usage, connections, journaling, and index access. The command returns quickly and does not impact MongoDB performance.
dbStats: This command, executed from the shell, returns a document that addresses storage usage and data volumes. dbStats reflects the amount of storage used, the quantity of data contained in the database, and object, collection, and index counters.
collStats: The collStats (Read as collection stats) command, or the db.collection.stats() (Read as D-B dot collection dot stats) method executed from the shell, provides statistics such as the number of documents in a collection, the size of the collection, the hard disk space used by the collection, and information about its indexes.
replSetGetStatus: The replSetGetStatus (Read as replica set get status) command, or rs.status() (Read as R-S dot status) from the shell, returns an overview of the replica set’s status. This command provides details of the state and configuration of the replica set as well as statistics about its members. These metrics are useful for checking whether a replica set is properly configured.
gangliamongodb (Read as ganglia Mongo D-B): This is a Python script for the Ganglia monitoring system that reports replica set information, such as memory usage, B-tree statistics, master-slave status, and current connections. In addition, mtop, Munin, and Nagios are tools and scripts used to check server statistics.
In the next screen, we will discuss MongoDB Management Service, or MMS.

7.35 MongoDB Management service (MMS)

MMS is a cloud service that helps you monitor, back up, and scale MongoDB on the infrastructure of your choice. You can monitor more than 100 system metrics and get custom alerts before your system degrades. You can create your own alerts and integrate monitoring with the tools you already use. In the next screen, we will discuss the data backup strategies in MongoDB.

7.36 Data Backup Strategies in MongoDB

When you deploy MongoDB in production, you should have a strategy for capturing and restoring backups to prepare for data loss events. You can perform a backup of MongoDB clusters in the following ways:
• Back up by copying the underlying data files
• Back up a database with mongodump
• Use the MMS cloud backup
In the next screen, we will discuss how to copy underlying data files.

7.37 Copying Underlying Data Files

You can create a backup of MongoDB by copying its underlying data files. If the volume where MongoDB stores its data files supports point-in-time snapshots, you can use these snapshots to create backups of a MongoDB system at an exact moment in time. File system snapshots are not specific to MongoDB; their mechanics depend on the underlying storage system. On a Linux computer, the Logical Volume Manager, or LVM, can create snapshots. To get a correct snapshot of a running mongod process, you must enable journaling, and the journal must reside on the same logical volume as the other MongoDB data files. Without journaling enabled, there is no guarantee that the snapshot will be consistent or valid. To get a consistent snapshot of a sharded cluster, first disable the balancer and capture snapshots from every shard and the config server at approximately the same moment. If your storage system does not support snapshots, you can copy the files directly using cp, rsync (Read as R-sync), or a similar tool. Because copying multiple files is not an atomic operation, you must stop all writes to mongod before copying the files; otherwise, the files will be copied in an invalid state. Backups created by copying the underlying data do not support point-in-time recovery for replica sets and are difficult to manage for larger sharded clusters. These backups are also large because they include the indexes and duplicate the underlying storage padding and fragmentation. In contrast, mongodump creates smaller backups. In the next screen, we will discuss backup with mongodump.

7.38 Backup with MongoDump

The mongodump tool reads data from a MongoDB database and creates BSON files, and the mongorestore (Read as mongo restore) tool populates a MongoDB database with the data from these BSON files. These tools are simple and efficient for backing up small MongoDB deployments; however, they are not suitable for backing up larger systems. mongodump and mongorestore can operate against a running mongod process, and they can also manipulate the underlying data files directly when the server is stopped. By default, mongodump does not capture the contents of the local database; it captures only the documents in the database. Although the resulting backup is space efficient, mongorestore or mongod must rebuild the indexes after restoring the data. When connected to a MongoDB instance, mongodump can adversely affect mongod performance: if your data volume is larger than the available system memory, the queries will push the working set out of memory. To mitigate the impact of mongodump on the performance of a replica set, use mongodump to capture backups from a secondary member. Alternatively, you can shut down a secondary and use mongodump against its data files directly. If you shut down a secondary to capture data with mongodump, ensure that the operation completes before the secondary's oplog falls too far behind to catch up with the primary. To restore a point-in-time backup created with the oplog, use mongorestore with the oplogReplay option. Note that if applications modify data while mongodump is creating a backup, mongodump will compete with those applications for resources. In the next screen, we will discuss fsync (Read as F-sync) and lock.
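For example, a dump that includes the oplog for point-in-time restore might look as follows; the host name and output path are illustrative:

```shell
$ mongodump --host mongodb1.example.net --port 27017 --oplog --out /backup/dump
$ mongorestore --oplogReplay /backup/dump
```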

7.39 Fsync and Lock

mongodump and mongorestore allow you to back up data without shutting down the MongoDB server; however, you lose the ability to get a point-in-time view of the data. MongoDB's fsync command allows you to copy the data directory of a running MongoDB server without risk of corruption. The fsync command forces the MongoDB server to flush all pending writes to disk. Optionally, it also prevents any further writes to the database until the server is unlocked. This write-lock feature makes the fsync command suitable for backups. The example given on the screen shows how to run the command from the shell, forcing an fsync and acquiring a write lock. > use admin > db.runCommand({"fsync" : 1, "lock" : 1}); At this point, the data directory represents a consistent, point-in-time snapshot of your data. Because the server is locked for writes, you can safely copy the data directory; on a file system or volume manager that supports snapshots, such as LVM, you can take a quick snapshot of it instead. After performing the backup, unlock the database using the command given on the screen. > db.$cmd.sys.unlock.findOne(); Next, run the currentOp (Read as current op) command to confirm that the lock has been released; it may take a moment after the unlock is first requested. The fsync command thus allows a flexible backup without shutting down the server or sacrificing the point-in-time nature of the backup. The only way to get a point-in-time snapshot without any downtime for reads or writes is to back up from a secondary. In the next screen, we will discuss Ops Manager, the backup software in MongoDB.
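Recent versions of the mongo shell also provide helper methods that wrap these commands; assuming a current shell, the sequence above can be sketched as:

```javascript
// Flush pending writes and block further writes
// (equivalent to db.runCommand({fsync: 1, lock: 1})).
db.fsyncLock()

// ... copy the data directory or take a file system snapshot here ...

// Release the write lock.
db.fsyncUnlock()

// Confirm that no fsync lock remains on the server.
db.currentOp()
```

These helpers must be run against a live mongod from the admin context; the comments describe the intent of each step.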

7.40 MongoDB Ops Manager Backup Software

MongoDB Ops (Read as Operations) Manager is a service that allows you to manage, monitor, and back up your MongoDB infrastructure. Ops Manager provides the following backup services: • Ops Manager Backup provides scheduled snapshots and point-in-time recovery of your MongoDB replica sets and sharded clusters. It can also create snapshots of standalones that are run as single-member replica sets. • Ops Manager uses a lightweight Backup Agent that runs within your infrastructure and backs up data from the MongoDB processes you have specified. In the next screen, we will discuss security strategies in MongoDB.

7.41 Security Strategies in MongoDB

To maintain a secure MongoDB deployment, you need to implement controls that ensure users and applications have access only to the data required for their job roles. MongoDB provides various features that allow you, as an administrator, to implement these controls and restrictions for any MongoDB deployment. Following are some of the security strategies in MongoDB: Authentication: The mechanism for verifying user and instance access to MongoDB. Authorization: This allows you to control a user's or an application's access to MongoDB instances. Collection-Level Access Control: This allows you to scope privileges to specific collections. Network Exposure and Security: This covers the potential security risks related to the network and strategies for reducing the possible network-based attack vectors against MongoDB. Security and MongoDB API Interfaces: This allows you to reduce and control the potential risks related to MongoDB's JavaScript, HTTP, and REST interfaces. Auditing: This enables auditing of server and client activity for mongod and mongos instances. In the next screen, we will discuss authentication implementation.

7.42 Authentication Implementation in MongoDB

Before gaining access to a system, clients should identify themselves to MongoDB. This ensures that no client can access MongoDB data without authenticating. MongoDB supports various authentication mechanisms that enable clients to verify their identities: a password-based challenge-response protocol and x.509 (Read as X-five-O-nine) certificates. Additionally, MongoDB Enterprise provides support for Lightweight Directory Access Protocol, or LDAP, proxy authentication and for Kerberos authentication. x.509 Certificate Authentication: Introduced in MongoDB version 2.6, this mechanism supports x.509 certificate authentication over a secure SSL connection. For client authentication, clients can use x.509 certificates instead of usernames and passwords. For membership authentication, members of sharded clusters and replica sets can use x.509 certificates instead of key files. Kerberos Authentication: MongoDB Enterprise supports Kerberos service authentication, an industry-standard authentication protocol for large client-server systems. To use MongoDB with Kerberos, you must have a properly configured Kerberos deployment with Kerberos service principals, and you must add the Kerberos user principals to MongoDB. LDAP Proxy Authentication: MongoDB Enterprise supports proxy authentication through an LDAP service. In the next screen, we will discuss authentication in a replica set and a sharded cluster.
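As an illustrative sketch of x.509 membership authentication, a replica set member can be started with options along the following lines. The replica set name and certificate paths are placeholders.

```sh
mongod --replSet rs0 \
       --sslMode requireSSL \
       --clusterAuthMode x509 \
       --sslPEMKeyFile /etc/ssl/mongodb-member.pem \
       --sslCAFile /etc/ssl/ca.pem
```

Each member presents its own certificate (signed by the shared CA) to the other members in place of a key file.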

7.43 Authentication in a Replica set

You can authenticate the members of replica sets and sharded clusters to each other. To do so within a single MongoDB deployment, use the keyFile or x.509 (Read as X-five-O-nine) mechanism. Using keyFile authentication also enables authorization for the members. You should also run replica sets and sharded clusters in a trusted networking environment. Ensure that the network permits only trusted traffic to reach each mongod and mongos instance by using: • A firewall and network routing • Virtual private networks (VPNs) or wide area networks (WANs) In addition, always ensure that: • Your network configuration allows every member of the replica set or sharded cluster to contact every other member. • The keyFile configuration is the same on all members so that they can authenticate to each other. In the next screen, we will discuss authentication on sharded clusters.
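A minimal keyFile setup might look like the following sketch; the file path and replica set name are assumptions.

```sh
# Generate a shared key and restrict its permissions;
# every member of the deployment needs an identical copy.
openssl rand -base64 741 > /srv/mongodb/keyfile
chmod 600 /srv/mongodb/keyfile

# Start each member with the key.
# Note: keyFile authentication also enables authorization.
mongod --replSet rs0 --keyFile /srv/mongodb/keyfile
```

The key content itself is arbitrary; what matters is that all members share the same file and that its permissions block access by other users.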

7.44 Authentication on Sharded Clusters

In sharded clusters, applications authenticate directly to mongos instances, using the credentials stored in the admin database of the config servers. The shards in a sharded cluster also have credentials, and clients can authenticate directly to the shards to perform maintenance on them. In general, applications and clients should connect to the sharded cluster through mongos. However, some maintenance operations, such as cleanupOrphaned, compact, and rs.reconfig() (Read as R-S dot reconfig), require direct connections to specific shards. To perform these operations with authentication enabled, you must connect directly to the shard and authenticate as a shard local administrative user. To create a shard local administrative user, connect directly to the shard and create the user there. MongoDB stores shard local users in the admin database of the shard itself. These shard local users are completely independent of the users added to the sharded cluster via mongos; they are local to the shard and inaccessible through mongos. Direct connections to a shard should be used only for shard-specific maintenance and configuration. In the next screen, we will discuss authorization.
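For instance, a shard local administrative user can be created by connecting directly to the shard's primary, not through mongos, and running something like the following in the shell. The user name, password, and role choices are placeholders.

```javascript
// Run while connected directly to the shard's primary.
use admin
db.createUser(
  {
    user: "shardAdmin",
    pwd: "changeMe",
    roles: [ { role: "clusterAdmin", db: "admin" },
             { role: "dbAdminAnyDatabase", db: "admin" } ]
  }
)
```

Because this user lives in the shard's own admin database, it can authenticate only on direct connections to that shard.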

7.45 Authorization

Access control, or authorization, defines a user's access to a system's resources and operations. Ideally, users should be able to perform only those operations required to fulfill their defined job roles. This is the "principle of least privilege," which limits the potential damage from a compromised application. MongoDB's role-based access control system allows you, as an administrator, to control all access and to ensure that every granted access applies as narrowly as possible. MongoDB does not enable authorization by default. You can enable it with the --auth or --keyFile command-line options or, if you are using a configuration file, with the security.authorization (Read as security dot authorization) or security.keyFile (Read as security dot key file) settings. When authorization is enabled, MongoDB requires authentication for all connections and controls each user's access through the assigned roles. Each role consists of a set of privileges, and each privilege pairs a resource with the actions, or operations, permitted on that resource. Every user may have one or more assigned roles that describe their access. MongoDB provides several built-in roles, such as read, readWrite, dbAdmin, and root. You can also create custom roles and privileges to cater to operational needs, with privileges scoped as granularly as the collection level. When you grant a role to a user, the user receives all the privileges associated with that role. A user can have several concurrent roles; in that case, the user receives the union of the privileges of those roles. In the next screen, we will discuss end-to-end auditing for compliance.
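As a sketch, authorization can be enabled in the configuration file, after which an administrator grants users the built-in roles they need. The database and user names below are illustrative.

```javascript
// mongod.conf (YAML):
//   security:
//     authorization: enabled

// Then, authenticated as a user administrator, grant built-in roles:
use reporting
db.createUser(
  {
    user: "reportUser",
    pwd: "changeMe",
    roles: [ { role: "read", db: "reporting" },
             { role: "readWrite", db: "staging" } ]
  }
)
```

This user can read the reporting database and read and write the staging database, and nothing else, following the principle of least privilege.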

7.46 End-to-End Auditing for Compliance

As an administrator, you need to implement security policies that control the activities in a system. Auditing enables you to verify that the implemented security policies are controlling activities as intended. Retaining audit information ensures that you have enough data to perform forensic investigations and to comply with regulations and policies that require audit data. The auditing facility allows you and other users to track system activity for deployments with multiple users and applications. It can write audit events to the console, the syslog, a JSON file, or a BSON file.
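In MongoDB Enterprise, the audit destination is selected at startup. For example, to write JSON audit events to a file, mongod might be started as follows; the dbpath and log path are placeholders.

```sh
mongod --dbpath /data/db \
       --auditDestination file \
       --auditFormat JSON \
       --auditPath /var/log/mongodb/auditLog.json
```

Using --auditDestination syslog or --auditDestination console instead routes the events to the syslog or the console, respectively.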

7.47 Quiz

With this, we come to the end of this lesson. Following are a few questions to test your understanding of the concepts discussed here.

7.48 Summary

Here is a quick recap of what was covered in this lesson: • Capped collections in MongoDB are collections with predefined sizes that support insert and retrieve operations. • When creating a capped collection, you specify its size; you can also query the collection, check whether a collection is capped, and convert an existing collection to a capped one. • The GridFS specification splits a file into parts, or chunks, and stores each chunk as a separate document. • GridFS uses two collections—fs.files and fs.chunks. • A memory-mapped file contains data that the operating system has mapped into memory. • MongoDB supports multiple storage engines, such as MMAPv1 and WiredTiger, and allows you to choose the one best suited to your needs. • The causes of performance issues in MongoDB include locks, memory usage, page faults, and the number of connections.

7.49 Conclusion

This concludes the lesson Administration of MongoDB Cluster Operations.
