MongoDB replica set

Time:2021-7-29

1、 Introduction to MongoDB

MongoDB is a document-oriented database. It is often described as sitting between relational and non-relational databases. In MongoDB there is no concept of a row; the document model is used instead. Documents and arrays can be embedded in other documents, which makes the model very flexible. It supports the JSON and BSON data formats and can store complex data types, so it is also popular with DBAs.
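As a rough illustration of the document model, the sketch below builds one document with an embedded document and an embedded array as a plain Python dictionary, then round-trips it through JSON. The field names (`name`, `address`, `tags`) and values are made up for this example.

```python
import json

# A hypothetical user document: "address" is an embedded document and
# "tags" is an embedded array. There is no fixed row/column layout.
user = {
    "name": "alice",
    "address": {"city": "Shanghai", "zip": "200000"},
    "tags": ["dba", "mongodb"],
}

# Nested fields are reached by walking the structure.
city = user["address"]["city"]

# The same structure survives a JSON round trip unchanged.
decoded = json.loads(json.dumps(user))
print(city, decoded == user)
```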
In production, a single node is rarely used to carry business traffic, mainly because of node performance and data safety concerns. MongoDB can use replica sets to provide data redundancy, failure recovery and related functions, and sharding to let the cluster store more data, handle more load, and keep storage balanced. This article mainly introduces replica sets; sharding will be covered in a later article.

2、 What is a replica set

A replica set is a group of servers that run the mongod process and maintain the same dataset. One of them is the primary server, which processes client requests; the other nodes are backup (secondary) servers, which hold replicas of the data. When the primary server goes down, the backup servers automatically elect one of themselves as the new primary server to ensure data safety and service availability.
The following is an official structure diagram:

[Figure: MongoDB replica set structure diagram]

The primary server accepts read and write requests from all clients (backup servers can also be configured to accept read requests, which is not recommended in environments that require strong data consistency). The primary server records changes to the data in the operation log, the oplog (similar to the binlog in MySQL). Backup servers copy this log from the primary server and apply it to their own databases so that their data stays consistent with the primary. All nodes also send heartbeat messages to one another.
When the primary server goes down, or loses contact with the other nodes for 10 seconds because of a network failure, one of the backup servers initiates an election to promote itself to the new primary server. The whole process is usually completed within one minute.
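The timeout rule above can be sketched as a small check. This is a simplified model and not MongoDB's actual implementation: the function names are hypothetical, and the only value taken from the text is the 10 second limit.

```python
ELECTION_TIMEOUT_S = 10.0  # the "10 seconds" mentioned in the text above


def primary_unreachable(last_heartbeat_at: float, now: float,
                        timeout_s: float = ELECTION_TIMEOUT_S) -> bool:
    """Return True once no heartbeat has been seen for longer than timeout_s."""
    return (now - last_heartbeat_at) > timeout_s


def should_call_election(is_backup_server: bool,
                         last_heartbeat_at: float, now: float) -> bool:
    """Only a backup server considers an election, and only after the timeout."""
    return is_backup_server and primary_unreachable(last_heartbeat_at, now)


# Last heartbeat from the primary arrived at t=100 s (hypothetical clock values).
print(should_call_election(True, 100.0, 105.0))  # still within the timeout
print(should_call_election(True, 100.0, 111.0))  # timeout exceeded
```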


3、 Replica set members

Before introducing the principle of replication, let’s take a look at the members in the entire replica set and their functions.

1. Primary server

The primary server processes client requests. By default, all client read and write requests go to the primary server.

2. Backup server (secondary)

A backup server keeps a copy of the data and takes part in the election when the primary server fails, which keeps the cluster highly available. By default, the backup server does not accept read requests.

3. Arbiter

MongoDB supports a special type of member: the arbiter. It does not store data; its only role is to take part in elections and cast a vote. If the application data set is relatively small and you want a replica set to guard against accidental data loss, but running several data nodes would waste resources, an arbiter can be a good choice. It can run as a lightweight process on a low-spec server, and it can be deployed in a fault domain different from the data nodes, which improves the robustness of the replica set.
When using an arbiter, pay attention to the following two limitations:

  • 1. Use at most one arbiter in a replica set
  • 2. If conditions permit, prefer a data node over an arbiter

Consider a small replica set of three nodes: a primary server, a backup server, and an arbiter. Suppose the primary server goes down (its data may even be damaged), so the backup server is promoted to primary. If the new primary server then goes down as well, the result is devastating, because the arbiter holds no data. Therefore, weigh all of these aspects before choosing an arbiter.
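The trade-off in this scenario can be checked with a little arithmetic. The sketch below assumes the usual rule that an election needs votes from a majority of the voting members; the helper names are hypothetical and the model ignores everything except counting.

```python
def majority(voting_members: int) -> int:
    """Smallest number of votes that is more than half of the voting members."""
    return voting_members // 2 + 1


def can_elect_primary(voting_members: int, reachable_voters: int) -> bool:
    """An election can succeed only if a majority of voters is reachable."""
    return reachable_voters >= majority(voting_members)


def surviving_data_copies(data_nodes: int, failed_data_nodes: int) -> int:
    """Number of nodes that still hold a copy of the data."""
    return max(data_nodes - failed_data_nodes, 0)


# Primary + backup + arbiter: 3 voters, but only 2 data-bearing nodes.
psa_after_one = (can_elect_primary(3, 2), surviving_data_copies(2, 1))
psa_after_two = (can_elect_primary(3, 1), surviving_data_copies(2, 2))

# Primary + two backups: 3 voters and 3 data-bearing nodes.
pss_after_one = (can_elect_primary(3, 2), surviving_data_copies(3, 1))

print(psa_after_one, psa_after_two, pss_after_one)
```

With an arbiter, the set can still elect a primary after one data node fails, but only a single copy of the data remains; with three data nodes, two copies remain after the same failure.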

The above three are the main member types in a replica set. The following two are actually backup nodes, but they have some special functions, so they are introduced separately.

4. Hidden members

A hidden member is invisible to clients and is generally not used as a replication source in the replica set (a member can be hidden only when its priority is 0). Many people hide servers with less powerful hardware.

5. Delayed backup node

As the name suggests, a delayed backup node deliberately lags behind the primary server's data by a set period of time. Its purpose is to protect against major mistakes. For example, a DBA's hand slips and some data gets deleted. After running the delete command, the DBA suddenly realizes that the data is gone from the primary server and also from the backup servers that replicate quickly. However, because the delayed backup node is configured to lag behind the primary server, the data is still there, and the replica set can be saved.
The priority of the delayed backup node should also be 0. To ensure data consistency, the delayed backup node should also be set as a hidden member.
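To make the member types more concrete, here is a sketch of a replica set configuration document written as a Python dictionary, together with a small check of the priority rules described above. The replica set name and host names are hypothetical. The member fields `priority`, `hidden` and `secondaryDelaySecs` follow the replica set configuration format of MongoDB 5.0 and later (older versions call the delay field `slaveDelay`); `check_member` is only an illustrative helper, not part of MongoDB.

```python
def check_member(member: dict) -> list:
    """Return a list of rule violations for one member document."""
    problems = []
    priority = member.get("priority", 1)  # priority defaults to 1
    if member.get("hidden", False) and priority != 0:
        problems.append("hidden member must have priority 0")
    if member.get("secondaryDelaySecs", 0) > 0 and priority != 0:
        problems.append("delayed member should have priority 0")
    return problems


config = {
    "_id": "rs0",  # hypothetical replica set name
    "members": [
        {"_id": 0, "host": "db1.example.net:27017"},
        {"_id": 1, "host": "db2.example.net:27017"},
        # Delayed backup node: lags one hour, hidden, never becomes primary.
        {"_id": 2, "host": "db3.example.net:27017",
         "priority": 0, "hidden": True, "secondaryDelaySecs": 3600},
    ],
}

problems = {m["host"]: check_member(m) for m in config["members"]}
bad = check_member({"_id": 3, "host": "db4.example.net:27017", "hidden": True})
print(problems, bad)
```

An arbiter would be declared in the same list with `"arbiterOnly": True`.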

4、 Data synchronization

MongoDB's replication is implemented through the operation log, the oplog. Clients' changes to the data are written to the oplog, which is itself a collection stored in the local database. Specifically, it is a capped collection: it does not keep every change ever made to the database, only the most recent portion. When the collection is full, new entries push out the oldest ones. Each member of the replica set maintains its own oplog.
In addition, applying an oplog operation to a node several times has the same effect as applying it once. This design covers the case where the primary server goes down while a backup node is synchronizing, and the oplog copied from the new primary server overlaps with what was already copied from the old one.
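Both properties, the fixed size and the safe replay, can be modelled in a few lines. This is only a toy model: the entry layout (`ts`, `op`, `key`, `value`) is invented for the example and is not the real oplog format, and a bounded `deque` stands in for the capped collection.

```python
from collections import deque

# A "capped" log: once 3 entries are stored, appending drops the oldest one.
oplog = deque(maxlen=3)
for ts in range(1, 6):
    oplog.append({"ts": ts, "op": "set", "key": "counter", "value": ts * 10})


def apply_entry(data: dict, entry: dict) -> None:
    """Apply one entry. It records the resulting value, not a delta,
    so applying it again leaves the data unchanged."""
    if entry["op"] == "set":
        data[entry["key"]] = entry["value"]
    elif entry["op"] == "delete":
        data.pop(entry["key"], None)


once, twice = {}, {}
for entry in oplog:
    apply_entry(once, entry)
for entry in list(oplog) + list(oplog):  # replay the same window twice
    apply_entry(twice, entry)

remaining = [entry["ts"] for entry in oplog]
print(remaining, once, twice)
```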
Data synchronization in MongoDB can be divided into two types: initial synchronization and incremental synchronization.

1. Initial synchronization

We can also understand it as full synchronization. Generally, members of the replica set enter this stage after they are started or newly added.
There are three conditions that trigger initial synchronization:

  • The oplog.rs collection in the local database is empty.
  • The minvalid collection stores _initialSyncFlag (used to handle a failed initial sync).
  • initialSyncRequested is true (set by the resync command; resync applies to the master-slave architecture, not to replica sets).
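The three conditions above can be read as a single predicate. The sketch assumes that any one of them is sufficient to trigger the sync; the function and parameter names are hypothetical and simply mirror the list.

```python
def needs_initial_sync(oplog_is_empty: bool,
                       initial_sync_flag_set: bool,
                       initial_sync_requested: bool) -> bool:
    """Assumption: any single condition triggers initial synchronization."""
    return oplog_is_empty or initial_sync_flag_set or initial_sync_requested


# A brand-new member has an empty oplog, so it starts with a full sync.
print(needs_initial_sync(True, False, False))
# A member with oplog entries and no flags continues incrementally.
print(needs_initial_sync(False, False, False))
```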

The whole initial synchronization phase includes the following steps:

  1. Select the synchronization source. At this point the member creates its own identifier in local.me, and then deletes all of its data (except the local database) so that it can synchronize from scratch.
  2. Clone. Put simply, all data on the synchronization source is copied to the local node.
  3. After cloning is completed, the oplog synchronization phase begins. This phase has two steps. Step 1: if a document was moved during cloning, clone it again. Step 2: record the operations in step 1.
  4. Create indexes.
  5. Synchronize the data changes that occurred on the primary server during index creation.
  6. After initial synchronization, the node becomes a backup server.
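The steps above can be sketched as a toy simulation that shows why replaying the operations recorded during the sync brings the new member in line with its source, no matter when the clone happened. Everything here is a simplification with hypothetical names: data is a flat dictionary, a write is a `(key, value)` pair where `None` means delete, and index creation (step 4) is left out.

```python
def apply_write(data: dict, write: tuple) -> None:
    """Apply one (key, value) write; a value of None deletes the key."""
    key, value = write
    if value is None:
        data.pop(key, None)
    else:
        data[key] = value


def initial_sync(source: dict, writes: list, clone_after: int) -> dict:
    """Simulate a sync where the clone is taken after `clone_after`
    of the concurrent writes have already reached the source."""
    local = {}                          # step 1: start from empty data
    for write in writes[:clone_after]:
        apply_write(source, write)
    local.update(source)                # step 2: clone the source
    for write in writes[clone_after:]:
        apply_write(source, write)      # the source keeps changing
    for write in writes:                # steps 3 and 5: replay the recorded
        apply_write(local, write)       # operations; replay is idempotent
    return local


writes = [("a", 10), ("c", 3), ("b", None)]
results = [initial_sync({"a": 1, "b": 2}, writes, n) for n in range(4)]
print(results)
```

Every entry in `results` ends up equal to the final state of the source, whichever moment the clone was taken at.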

2. Incremental synchronization

After initial synchronization is complete, the backup node continues to synchronize from the primary server by reading the oplog through a tailable cursor. A tailable cursor is similar to the tail -f command in Linux. Incremental synchronization is carried out by several threads; the specific process will be introduced in later articles.
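The idea of following the oplog like tail -f can be sketched with a generator that only yields entries newer than the last one applied. This is a single-threaded toy model with hypothetical names and an invented entry layout, not how mongod implements it.

```python
def tail_oplog(oplog: list, last_applied_ts: int):
    """Yield, in order, the entries newer than last_applied_ts."""
    for entry in oplog:
        if entry["ts"] > last_applied_ts:
            yield entry


def catch_up(data: dict, oplog: list, last_applied_ts: int) -> int:
    """Apply all new entries and return the new last-applied timestamp."""
    for entry in tail_oplog(oplog, last_applied_ts):
        data[entry["key"]] = entry["value"]
        last_applied_ts = entry["ts"]
    return last_applied_ts


source_oplog = [
    {"ts": 1, "key": "a", "value": 1},
    {"ts": 2, "key": "b", "value": 2},
]
backup_data = {"a": 1}   # state left behind by initial synchronization
last_ts = 1

last_ts = catch_up(backup_data, source_oplog, last_ts)
source_oplog.append({"ts": 3, "key": "a", "value": 5})  # new write on primary
last_ts = catch_up(backup_data, source_oplog, last_ts)
print(backup_data, last_ts)
```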

5、 Member status

We already know that heartbeat messages will be sent between nodes in the replica set every 2 seconds (as can be seen from packet capturing), and the messages will contain the status information of the nodes themselves. Let’s see what states the members will have and under what circumstances they will enter these states.

  1. PRIMARY: exclusive to the primary server.
  2. SECONDARY: exclusive to the backup server.
  3. ARBITER: exclusive to the arbiter.
  4. STARTUP: a member enters this state when it is started. In this state, the replica set configuration is loaded. After loading, the member enters the STARTUP2 state.
  5. STARTUP2: we mentioned initial synchronization in the previous section. During the whole initial synchronization process, the member's state is STARTUP2.
  6. RECOVERING: each member enters this state before becoming a backup node. It indicates that the node is running normally but still needs to perform further checks on itself. Members in this state cannot process read requests.
  7. DOWN: a member is reported as DOWN when the other members lose contact with it.
  8. UNKNOWN: if a member of the replica set cannot be reached by the other members, and they cannot obtain its status information, it is set to the UNKNOWN state.
  9. REMOVED: a member that has been removed from the replica set is set to this state.
  10. ROLLBACK: a member is set to this state when it enters the rollback phase. After the rollback is completed, it enters the RECOVERING state and then becomes a backup server.
  11. FATAL: a member is set to this state after a failure that cannot be repaired automatically.

Officially, these 11 states are divided into three categories:
Core states (PRIMARY, SECONDARY and ARBITER), other states (STARTUP, STARTUP2 and RECOVERING), and error states (UNKNOWN, DOWN, REMOVED, ROLLBACK and FATAL).
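For monitoring scripts, the grouping above can be written down as a small lookup table. The sketch below only encodes the three categories as listed in this article; the names `STATE_CATEGORIES` and `categorize` are hypothetical.

```python
STATE_CATEGORIES = {
    "core": {"PRIMARY", "SECONDARY", "ARBITER"},
    "other": {"STARTUP", "STARTUP2", "RECOVERING"},
    "error": {"UNKNOWN", "DOWN", "REMOVED", "ROLLBACK", "FATAL"},
}


def categorize(state: str) -> str:
    """Return the category of a member state name (case-insensitive)."""
    name = state.upper()
    for category, states in STATE_CATEGORIES.items():
        if name in states:
            return category
    raise ValueError(f"unknown replica set state: {state}")


total_states = sum(len(states) for states in STATE_CATEGORIES.values())
print(categorize("PRIMARY"), categorize("startup2"), categorize("ROLLBACK"))
```

In a live deployment, the state name would typically come from the `stateStr` field of each member in the `replSetGetStatus` output.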