MongoDB sharded cluster for distributed document storage

Time: 2020-11-25

Previously, we talked about MongoDB replica sets and their configuration; please refer to https://www.cnblogs.com/qiuhom-1874/p/13953598.html. Today, let's talk about sharding in MongoDB.

1. What is sharding? Why shard?

We know that a database server's bottleneck usually comes from disk I/O, network I/O under high concurrency, or the CPU and memory of a single server. To get past these bottlenecks we have to scale, and there are generally two ways to do that: scaling up and scaling out. Scaling up means giving the one server bigger disks, larger and better memory, and a better CPU; this relieves the bottleneck to a certain extent, but as the amount of data keeps growing the bottleneck reappears, so scaling up is not recommended. Scaling out means that when one server is not enough you add a second, and when two are not enough you add a third; as long as there is a bottleneck you can keep adding servers. That addresses the server-side capacity problem, but how do we spread users' reads and writes across multiple servers? We need a way to partition the data into multiple chunks so that each server holds only part of the data set; a large data set is sliced into multiple pieces and stored across multiple servers in a distributed way. This is what sharding means, and it effectively removes the bottleneck on users' write operations.

Although sharding solves the write-performance problem, it introduces a new one: how do users query? If the data set is spread over multiple servers, how does a user query, say, all users older than 30? Some matching data may sit on server1 and some on server2, so how do we return everything that matches? This scenario is a bit like the MogileFS architecture we discussed earlier: when a user uploads an image to MogileFS, the image's metadata is first written to the tracker and the data is then stored on the corresponding data nodes; to read it back, the client first asks the tracker, the tracker returns the metadata for the requested file, and the client then fetches the data from the corresponding data nodes and reassembles the picture. MongoDB works very similarly. The difference is that with MogileFS the client has to talk to the back-end data nodes itself, whereas with MongoDB the client does not talk to the back-end data nodes directly; instead, MongoDB's own client proxy interacts with them on the client's behalf and returns the result to the client through the proxy. That solves the query problem. In short, sharding means splitting a large data set into multiple partitions and storing them across multiple servers in a distributed way, and its purpose is to solve the performance problems caused by data sets that are too large.

2. Data set sharding diagram

Tip: by sharding, a 1 TB data set can be split evenly into four parts, with each node storing 1/4 of the original data set, so the original 1 TB of data is handled by four servers. This effectively improves data-processing capacity, which is the point of a distributed system. In MongoDB, a node that handles one part of the data set is called a shard, and a MongoDB cluster that uses this sharding mechanism is called a MongoDB sharded cluster.

3. MongoDB sharded cluster architecture

 

Tips: there are three types of roles in a MongoDB sharded cluster. The first is the router role, which receives read and write requests from clients and runs mongos; to keep the router highly available, several nodes are usually combined into a router high-availability cluster. The second is the config server role, which stores the metadata of the sharded cluster and of the data in it, somewhat like the tracker in MogileFS; to keep the config server highly available, it is usually also run as a replica set. The third is the shard role, which stores the data itself; to guarantee high availability and data integrity, each shard is usually a replica set.

4. How a MongoDB sharded cluster works

First, the user sends a request to the router; the router receives it and fetches the metadata for that request from the config server. With the metadata in hand, the router then requests the data from the corresponding shard(s), and finally merges the results and responds to the user. In this process the router acts as MongoDB's client proxy, and the config server stores the metadata about the data, mainly which data lives on which shards, which is very similar to the tracker in MogileFS. It mainly keeps two mappings: one indexed by the data and one indexed by the shard nodes.

5. How does MongoDB shard the data?

In a MongoDB sharded cluster, data is split according to a collection field; the chosen field is called the shard key. Depending on the shard key's value range and the application scenario, we can shard by ranges of the shard key's values or by a hash of the shard key. The resulting split is recorded on the config server. For example, with range-based sharding, the config server records a contiguous range of shard key values and maps that range to one shard, as shown in the following figure.

The figure above illustrates range-based sharding on the shard key: chunks are assigned by contiguous ranges of shard key values, for example the chunk from the shard key's minimum value up to -75 goes to the first shard, the chunk covering -75 to -25 goes to the second shard, and so on. Range-based sharding can easily leave one shard with far too much data while other shards hold very little, producing an uneven distribution. So besides sharding on ranges of shard key values, we can also shard on a hash of the shard key, as shown in the following figure.

With hash-based sharding we compute a hash of the shard key and place each chunk on the shard indicated by the result. For example, we hash the shard key and take the result modulo the number of shards: if the result is 0, the chunk goes to the first shard; if it is 1, it goes to the second shard; and so on. Hash-based sharding effectively reduces the imbalance between shards, because hashed values are spread out evenly. A toy sketch of the idea follows.
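To make the idea concrete, here is a toy illustration in the mongo shell's JavaScript. It is not MongoDB's real hashing (MongoDB uses its own hash function and chunk ranges rather than a plain modulo); it only shows the "hash the key, map the result to a shard" principle described above.

// Toy illustration only: pick a shard index from a shard key value.
function pickShard(shardKeyValue, numShards) {
  var s = String(shardKeyValue), h = 0;
  for (var i = 0; i < s.length; i++) { h = (h * 31 + s.charCodeAt(i)) % 1000000007; }
  return h % numShards;   // 0 -> first shard, 1 -> second shard, ...
}
pickShard("people42", 2)  // returns 0 or 1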

Besides the two methods above, we can also shard by region, also known as list-based sharding, as shown in the following figure.

The figure above illustrates region-based sharding: the shard key's values form a discrete set rather than a continuous range. For example, we could shard on a province field this way, putting the provinces with heavy traffic on one shard and grouping several low-traffic provinces on another, or separating foreign visits from domestic ones. This kind of sharding amounts to classifying the shard key's values. Whatever method we use, the guiding principle is that writes should be spread out as widely as possible while reads should be served from as few shards as possible; a hedged sketch of this zone-based (region) approach is shown below.
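MongoDB implements this kind of region/list sharding with zones (tag-aware sharding). The snippet below is only a sketch under assumptions: a hypothetical testdb.users collection already sharded on { province: 1 }, with zone names made up for illustration; it is not part of the cluster built later in this post.

mongos> sh.addShardToZone("shard1_replset", "busy-provinces")
mongos> sh.addShardToZone("shard2_replset", "quiet-provinces")
mongos> // route ranges of province values to each zone; MinKey/MaxKey cover the ends
mongos> sh.updateZoneKeyRange("testdb.users", { province: MinKey }, { province: "m" }, "busy-provinces")
mongos> sh.updateZoneKeyRange("testdb.users", { province: "m" }, { province: MaxKey }, "quiet-provinces")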

6. Building a MongoDB sharded cluster

Environmental description

host name               role                        IP address
node01                  router                      192.168.0.41
node02/node03/node04    config server replica set   192.168.0.42 / 192.168.0.43 / 192.168.0.44
node05/node06/node07    shard1 replica set          192.168.0.45 / 192.168.0.46 / 192.168.0.47
node08/node09/node10    shard2 replica set          192.168.0.48 / 192.168.0.49 / 192.168.0.50

For the basic environment, all servers need time synchronization, the firewall turned off, SELinux disabled, SSH mutual trust, and host name resolution; a rough sketch of those steps is shown below.
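A rough sketch of that preparation on each node (assuming CentOS 7 with chrony available; adjust to your environment):

systemctl stop firewalld && systemctl disable firewalld                 # turn off the firewall
setenforce 0                                                            # disable SELinux for the current boot...
sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config     # ...and after reboot
systemctl enable --now chronyd                                          # time synchronization
ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa                                # SSH mutual trust: generate a key,
for i in {01..10}; do ssh-copy-id node$i; done                          # then push it to every node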

Host name resolution

[root@node01 ~]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.0.99 time.test.org time-node
192.168.0.41 node01.test.org node01
192.168.0.42 node02.test.org node02
192.168.0.43 node03.test.org node03
192.168.0.44 node04.test.org node04
192.168.0.45 node05.test.org node05
192.168.0.46 node06.test.org node06
192.168.0.47 node07.test.org node07
192.168.0.48 node08.test.org node08
192.168.0.49 node09.test.org node09
192.168.0.50 node10.test.org node10
192.168.0.51 node11.test.org node11
192.168.0.52 node12.test.org node12
[root@node01 ~]#

After the basic environment is ready, configure the MongoDB yum repository

[root@node01 ~]# cat /etc/yum.repos.d/mongodb.repo
[mongodb-org]
name = MongoDB Repository
baseurl = https://mirrors.aliyun.com/mongodb/yum/redhat/7/mongodb-org/4.4/x86_64/
gpgcheck = 1
enabled = 1
gpgkey = https://www.mongodb.org/static/pgp/server-4.4.asc
[root@node01 ~]# 

Copy the MongoDB yum repo file to the other nodes

[root@node01 ~]# for i in {02..10} ; do scp /etc/yum.repos.d/mongodb.repo node$i:/etc/yum.repos.d/; done
mongodb.repo                                                                  100%  206   247.2KB/s   00:00    
mongodb.repo                                                                  100%  206   222.3KB/s   00:00    
mongodb.repo                                                                  100%  206   118.7KB/s   00:00    
mongodb.repo                                                                  100%  206   164.0KB/s   00:00    
mongodb.repo                                                                  100%  206   145.2KB/s   00:00    
mongodb.repo                                                                  100%  206   119.9KB/s   00:00    
mongodb.repo                                                                  100%  206   219.2KB/s   00:00    
mongodb.repo                                                                  100%  206   302.1KB/s   00:00    
mongodb.repo                                                                  100%  206   289.3KB/s   00:00    
[root@node01 ~]# 

Install the mongodb-org package on each node

for i in {01..10} ; do ssh node$i ' yum -y install mongodb-org '; done

Create the data directory and log directory on the config server and shard nodes, and change their owner and group to mongod

[root@node01 ~]# for i in {02..10} ; do ssh node$i 'mkdir -p /mongodb/{data,log} && chown -R mongod.mongod /mongodb/ && ls -ld /mongodb'; done
drwxr-xr-x 4 mongod mongod 29 Nov 11 22:47 /mongodb
drwxr-xr-x 4 mongod mongod 29 Nov 11 22:45 /mongodb
drwxr-xr-x 4 mongod mongod 29 Nov 11 22:45 /mongodb
drwxr-xr-x 4 mongod mongod 29 Nov 11 22:45 /mongodb
drwxr-xr-x 4 mongod mongod 29 Nov 11 22:45 /mongodb
drwxr-xr-x 4 mongod mongod 29 Nov 11 22:45 /mongodb
drwxr-xr-x 4 mongod mongod 29 Nov 11 22:45 /mongodb
drwxr-xr-x 4 mongod mongod 29 Nov 11 22:45 /mongodb
drwxr-xr-x 4 mongod mongod 29 Nov 11 22:45 /mongodb
[root@node01 ~]# 

Configure the shard1 replica set

[root@node05 ~]# cat /etc/mongod.conf 
systemLog:
  destination: file
  logAppend: true
  path: /mongodb/log/mongod.log

storage:
  dbPath: /mongodb/data/
  journal:
    enabled: true

processManagement:
  fork: true
  pidFilePath: /var/run/mongodb/mongod.pid
  timeZoneInfo: /usr/share/zoneinfo

net:
  bindIp: 0.0.0.0

sharding:
  clusterRole: shardsvr

replication:
  replSetName: shard1_replset
[root@node05 ~]# scp /etc/mongod.conf node06:/etc/
mongod.conf                                                                   100%  360   394.5KB/s   00:00    
[root@node05 ~]# scp /etc/mongod.conf node07:/etc/
mongod.conf                                                                   100%  360   351.7KB/s   00:00    
[root@node05 ~]#

Configure the shard2 replica set

[root@node08 ~]# cat /etc/mongod.conf
systemLog:
  destination: file
  logAppend: true
  path: /mongodb/log/mongod.log

storage:
  dbPath: /mongodb/data/
  journal:
    enabled: true

processManagement:
  fork: true
  pidFilePath: /var/run/mongodb/mongod.pid
  timeZoneInfo: /usr/share/zoneinfo

net:
  bindIp: 0.0.0.0

sharding:
  clusterRole: shardsvr

replication:
  replSetName: shard2_replset
[root@node08 ~]# scp /etc/mongod.conf node09:/etc/
mongod.conf                                                                   100%  360   330.9KB/s   00:00    
[root@node08 ~]# scp /etc/mongod.conf node10:/etc/
mongod.conf                                                                   100%  360   385.9KB/s   00:00    
[root@node08 ~]# 

Start the shard1 and shard2 replica sets

[root@node05 ~]# systemctl start mongod.service 
[root@node05 ~]# ss -tnl
State      Recv-Q Send-Q           Local Address:Port                          Peer Address:Port              
LISTEN     0      128                          *:22                                       *:*                  
LISTEN     0      100                  127.0.0.1:25                                       *:*                  
LISTEN     0      128                          *:27018                                    *:*                  
LISTEN     0      128                         :::22                                      :::*                  
LISTEN     0      100                        ::1:25                                      :::*                  
[root@node05 ~]# for i in {06..10} ; do ssh node$i 'systemctl start mongod.service && ss -tnl';done
State      Recv-Q Send-Q Local Address:Port               Peer Address:Port              
LISTEN     0      128          *:22                       *:*                  
LISTEN     0      100    127.0.0.1:25                       *:*                  
LISTEN     0      128          *:27018                    *:*                  
LISTEN     0      128         :::22                      :::*                  
LISTEN     0      100        ::1:25                      :::*                  
State      Recv-Q Send-Q Local Address:Port               Peer Address:Port              
LISTEN     0      128          *:22                       *:*                  
LISTEN     0      100    127.0.0.1:25                       *:*                  
LISTEN     0      128          *:27018                    *:*                  
LISTEN     0      128         :::22                      :::*                  
LISTEN     0      100        ::1:25                      :::*                  
State      Recv-Q Send-Q Local Address:Port               Peer Address:Port              
LISTEN     0      128          *:22                       *:*                  
LISTEN     0      100    127.0.0.1:25                       *:*                  
LISTEN     0      128          *:27018                    *:*                  
LISTEN     0      128         :::22                      :::*                  
LISTEN     0      100        ::1:25                      :::*                  
State      Recv-Q Send-Q Local Address:Port               Peer Address:Port              
LISTEN     0      128          *:22                       *:*                  
LISTEN     0      100    127.0.0.1:25                       *:*                  
LISTEN     0      128          *:27018                    *:*                  
LISTEN     0      128         :::22                      :::*                  
LISTEN     0      100        ::1:25                      :::*                  
State      Recv-Q Send-Q Local Address:Port               Peer Address:Port              
LISTEN     0      128          *:22                       *:*                  
LISTEN     0      100    127.0.0.1:25                       *:*                  
LISTEN     0      128          *:27018                    *:*                  
LISTEN     0      128         :::22                      :::*                  
LISTEN     0      100        ::1:25                      :::*                  
[root@node05 ~]# 

Tip: no listening port is specified here, so the shard members listen on the default shard port 27018; after starting the shard nodes, make sure port 27018 is listening. If you want to set the port explicitly, a sketch is shown below.
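A sketch of pinning the port explicitly in the net section of /etc/mongod.conf (optional here, since 27018 is already the default for clusterRole: shardsvr):

net:
  bindIp: 0.0.0.0
  port: 27018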

Connect to the mongod on node05 and initialize the shard1_replset replica set
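A sketch of how one might reach it before running rs.initiate() below (the mongo shell pointed at the shard port 27018):

[root@node05 ~]# mongo --host node05 --port 27018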

> rs.initiate(
...   {
...     _id : "shard1_replset",
...     members: [
...       { _id : 0, host : "node05:27018" },
...       { _id : 1, host : "node06:27018" },
...       { _id : 2, host : "node07:27018" }
...     ]
...   }
... )
{
        "ok" : 1,
        "$clusterTime" : {
                "clusterTime" : Timestamp(1605107401, 1),
                "signature" : {
                        "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
                        "keyId" : NumberLong(0)
                }
        },
        "operationTime" : Timestamp(1605107401, 1)
}
shard1_replset:SECONDARY>

Connect to the mongod on node08 and initialize the shard2_replset replica set

> rs.initiate(
...   {
...     _id : "shard2_replset",
...     members: [
...       { _id : 0, host : "node08:27018" },
...       { _id : 1, host : "node09:27018" },
...       { _id : 2, host : "node10:27018" }
...     ]
...   }
... )
{
        "ok" : 1,
        "$clusterTime" : {
                "clusterTime" : Timestamp(1605107644, 1),
                "signature" : {
                        "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
                        "keyId" : NumberLong(0)
                }
        },
        "operationTime" : Timestamp(1605107644, 1)
}
shard2_replset:OTHER> 

Configure the config server replica set

[root@node02 ~]# cat /etc/mongod.conf
systemLog:
  destination: file
  logAppend: true
  path: /mongodb/log/mongod.log

storage:
  dbPath: /mongodb/data/
  journal:
    enabled: true

processManagement:
  fork: true
  pidFilePath: /var/run/mongodb/mongod.pid
  timeZoneInfo: /usr/share/zoneinfo

net:
  bindIp: 0.0.0.0

sharding:
  clusterRole: configsvr

replication:
  replSetName: cfg_replset
[root@node02 ~]# scp /etc/mongod.conf node03:/etc/mongod.conf 
mongod.conf                                                                   100%  358   398.9KB/s   00:00    
[root@node02 ~]# scp /etc/mongod.conf node04:/etc/mongod.conf  
mongod.conf                                                                   100%  358   270.7KB/s   00:00    
[root@node02 ~]# 

Start the config server

[root@node02 ~]# systemctl start mongod.service 
[root@node02 ~]# ss -tnl
State      Recv-Q Send-Q           Local Address:Port                          Peer Address:Port              
LISTEN     0      128                          *:27019                                    *:*                  
LISTEN     0      128                          *:22                                       *:*                  
LISTEN     0      100                  127.0.0.1:25                                       *:*                  
LISTEN     0      128                         :::22                                      :::*                  
LISTEN     0      100                        ::1:25                                      :::*                  
[root@node02 ~]# ssh node03 'systemctl start mongod.service && ss -tnl'  
State      Recv-Q Send-Q Local Address:Port               Peer Address:Port              
LISTEN     0      128          *:27019                    *:*                  
LISTEN     0      128          *:22                       *:*                  
LISTEN     0      100    127.0.0.1:25                       *:*                  
LISTEN     0      128         :::22                      :::*                  
LISTEN     0      100        ::1:25                      :::*                  
[root@node02 ~]# ssh node04 'systemctl start mongod.service && ss -tnl' 
State      Recv-Q Send-Q Local Address:Port               Peer Address:Port              
LISTEN     0      128          *:27019                    *:*                  
LISTEN     0      128          *:22                       *:*                  
LISTEN     0      100    127.0.0.1:25                       *:*                  
LISTEN     0      128         :::22                      :::*                  
LISTEN     0      100        ::1:25                      :::*                  
[root@node02 ~]# 

Tip: with no port specified, the config servers listen on the default config server port 27019; after starting, make sure that port is listening.

Connect to the mongod on node02 and initialize the cfg_replset replica set

> rs.initiate(
...   {
...     _id: "cfg_replset",
...     configsvr: true,
...     members: [
...       { _id : 0, host : "node02:27019" },
...       { _id : 1, host : "node03:27019" },
...       { _id : 2, host : "node04:27019" }
...     ]
...   }
... )
{
        "ok" : 1,
        "$gleStats" : {
                "lastOpTime" : Timestamp(1605108177, 1),
                "electionId" : ObjectId("000000000000000000000000")
        },
        "lastCommittedOpTime" : Timestamp(0, 0),
        "$clusterTime" : {
                "clusterTime" : Timestamp(1605108177, 1),
                "signature" : {
                        "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
                        "keyId" : NumberLong(0)
                }
        },
        "operationTime" : Timestamp(1605108177, 1)
}
cfg_replset:SECONDARY> 

Configure the router (mongos)

[root@node01 ~]# cat /etc/mongos.conf
systemLog:
   destination: file
   path: /var/log/mongodb/mongos.log
   logAppend: true

processManagement:
   fork: true

net:
   bindIp: 0.0.0.0
sharding:
  configDB: "cfg_replset/node02:27019,node03:27019,node04:27019"
[root@node01 ~]# 

Note: configDB must be given in the form <replica set name>/<member address>:<port>[,...], and at least one member must be listed.

Start the router

[root@node01 ~]# mongos -f /etc/mongos.conf
about to fork child process, waiting until server is ready for connections.
forked process: 1510
child process started successfully, parent exiting
[root@node01 ~]# ss -tnl
State      Recv-Q Send-Q           Local Address:Port                          Peer Address:Port              
LISTEN     0      128                          *:22                                       *:*                  
LISTEN     0      100                  127.0.0.1:25                                       *:*                  
LISTEN     0      128                          *:27017                                    *:*                  
LISTEN     0      128                         :::22                                      :::*                  
LISTEN     0      100                        ::1:25                                      :::*                  
[root@node01 ~]#

Connect to mongos and add the shard1 and shard2 replica sets
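A quick sketch of reaching mongos first (it listens on the default port 27017 on node01):

[root@node01 ~]# mongo --host node01 --port 27017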

mongos> sh.addShard("shard1_replset/node05:27018,node06:27018,node07:27018")
{
        "shardAdded" : "shard1_replset",
        "ok" : 1,
        "operationTime" : Timestamp(1605109085, 3),
        "$clusterTime" : {
                "clusterTime" : Timestamp(1605109086, 1),
                "signature" : {
                        "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
                        "keyId" : NumberLong(0)
                }
        }
}
mongos> sh.addShard("shard2_replset/node08:27018,node09:27018,node10:27018")
{
        "shardAdded" : "shard2_replset",
        "ok" : 1,
        "operationTime" : Timestamp(1605109118, 2),
        "$clusterTime" : {
                "clusterTime" : Timestamp(1605109118, 3),
                "signature" : {
                        "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
                        "keyId" : NumberLong(0)
                }
        }
}
mongos>

Tip: adding a shard also requires the <replica set name>/<members> format;

At this point, the sharded cluster is configured

View the sharded cluster status

mongos> sh.status()
--- Sharding Status --- 
  sharding version: {
        "_id" : 1,
        "minCompatibleVersion" : 5,
        "currentVersion" : 6,
        "clusterId" : ObjectId("5fac01dd8d6fa3fe899662c8")
  }
  shards:
        {  "_id" : "shard1_replset",  "host" : "shard1_replset/node05:27018,node06:27018,node07:27018",  "state" : 1 }
        {  "_id" : "shard2_replset",  "host" : "shard2_replset/node08:27018,node09:27018,node10:27018",  "state" : 1 }
  active mongoses:
        "4.4.1" : 1
  autosplit:
        Currently enabled: yes
  balancer:
        Currently enabled:  yes
        Currently running:  yes
        Collections with active migrations: 
                config.system.sessions started at Wed Nov 11 2020 23:43:14 GMT+0800 (CST)
        Failed balancer rounds in last 5 attempts:  0
        Migration Results for the last 24 hours: 
                45 : Success
  databases:
        {  "_id" : "config",  "primary" : "config",  "partitioned" : true }
                config.system.sessions
                        shard key: { "_id" : 1 }
                        unique: false
                        balancing: true
                        chunks:
                                shard1_replset  978
                                shard2_replset  46
                        too many chunks to print, use verbose if you want to force print
mongos> 

Tip: you can see that the current sharded cluster has two shard replica sets, shard1_replset and shard2_replset, plus a config server replica set;

Enable sharding for testdb database

mongos> sh.enableSharding("testdb")
{
        "ok" : 1,
        "operationTime" : Timestamp(1605109993, 9),
        "$clusterTime" : {
                "clusterTime" : Timestamp(1605109993, 9),
                "signature" : {
                        "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
                        "keyId" : NumberLong(0)
                }
        }
}
mongos> sh.status()
--- Sharding Status --- 
  sharding version: {
        "_id" : 1,
        "minCompatibleVersion" : 5,
        "currentVersion" : 6,
        "clusterId" : ObjectId("5fac01dd8d6fa3fe899662c8")
  }
  shards:
        {  "_id" : "shard1_replset",  "host" : "shard1_replset/node05:27018,node06:27018,node07:27018",  "state" : 1 }
        {  "_id" : "shard2_replset",  "host" : "shard2_replset/node08:27018,node09:27018,node10:27018",  "state" : 1 }
  active mongoses:
        "4.4.1" : 1
  autosplit:
        Currently enabled: yes
  balancer:
        Currently enabled:  yes
        Currently running:  no
        Failed balancer rounds in last 5 attempts:  0
        Migration Results for the last 24 hours: 
                214 : Success
  databases:
        {  "_id" : "config",  "primary" : "config",  "partitioned" : true }
                config.system.sessions
                        shard key: { "_id" : 1 }
                        unique: false
                        balancing: true
                        chunks:
                                shard1_replset  810
                                shard2_replset  214
                        too many chunks to print, use verbose if you want to force print
        {  "_id" : "testdb",  "primary" : "shard2_replset",  "partitioned" : true,  "version" : {  "uuid" : UUID("454aad2e-b397-4c88-b5c4-c3b21d37e480"),  "lastMod" : 1 } }
mongos> 

Tip: after sharding is enabled for a database, the database is assigned a primary shard. The primary shard stores the collections of that database that are not sharded; sharded collections are distributed across the shards. If the automatically chosen primary shard is not the one you want, it can be changed later; a hedged sketch follows.
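A sketch of moving the primary shard, run against mongos (the target shard name here is just an example):

mongos> use admin
mongos> db.adminCommand({ movePrimary: "testdb", to: "shard1_replset" })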

Enable sharding for the peoples collection in the testdb database, with range-based sharding on the age field

mongos> sh.shardCollection("testdb.peoples",{"age":1})
{
        "collectionsharded" : "testdb.peoples",
        "collectionUUID" : UUID("ec095411-240d-4484-b45d-b541c33c3975"),
        "ok" : 1,
        "operationTime" : Timestamp(1605110694, 11),
        "$clusterTime" : {
                "clusterTime" : Timestamp(1605110694, 11),
                "signature" : {
                        "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
                        "keyId" : NumberLong(0)
                }
        }
}
mongos> sh.status()
--- Sharding Status --- 
  sharding version: {
        "_id" : 1,
        "minCompatibleVersion" : 5,
        "currentVersion" : 6,
        "clusterId" : ObjectId("5fac01dd8d6fa3fe899662c8")
  }
  shards:
        {  "_id" : "shard1_replset",  "host" : "shard1_replset/node05:27018,node06:27018,node07:27018",  "state" : 1 }
        {  "_id" : "shard2_replset",  "host" : "shard2_replset/node08:27018,node09:27018,node10:27018",  "state" : 1 }
  active mongoses:
        "4.4.1" : 1
  autosplit:
        Currently enabled: yes
  balancer:
        Currently enabled:  yes
        Currently running:  no
        Failed balancer rounds in last 5 attempts:  0
        Migration Results for the last 24 hours: 
                408 : Success
  databases:
        {  "_id" : "config",  "primary" : "config",  "partitioned" : true }
                config.system.sessions
                        shard key: { "_id" : 1 }
                        unique: false
                        balancing: true
                        chunks:
                                shard1_replset  616
                                shard2_replset  408
                        too many chunks to print, use verbose if you want to force print
        {  "_id" : "testdb",  "primary" : "shard2_replset",  "partitioned" : true,  "version" : {  "uuid" : UUID("454aad2e-b397-4c88-b5c4-c3b21d37e480"),  "lastMod" : 1 } }
                testdb.peoples
                        shard key: { "age" : 1 }
                        unique: false
                        balancing: true
                        chunks:
                                shard2_replset  1
                        { "age" : { "$minKey" : 1 } } -->> { "age" : { "$maxKey" : 1 } } on : shard2_replset Timestamp(1, 0) 
mongos> 

Tip: if the collection already exists and holds data, we need to create an index on the shard key first, and then use sh.shardCollection() to enable sharding on the collection; for range-based sharding, the shard key can span multiple fields. A sketch of that order of operations follows.
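A sketch of sharding a collection that already contains data (same names as the example above): create the shard key index first, then shard.

mongos> use testdb
mongos> db.peoples.createIndex({ age: 1 })                  // shard key index first
mongos> sh.shardCollection("testdb.peoples", { age: 1 })    // then enable sharding on it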

Hash-based sharding

mongos> sh.shardCollection("testdb.peoples1",{"name":"hashed"})
{
        "collectionsharded" : "testdb.peoples1",
        "collectionUUID" : UUID("f6213da1-7c7d-4d5e-8fb1-fc554efb9df2"),
        "ok" : 1,
        "operationTime" : Timestamp(1605111014, 2),
        "$clusterTime" : {
                "clusterTime" : Timestamp(1605111014, 2),
                "signature" : {
                        "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
                        "keyId" : NumberLong(0)
                }
        }
}
mongos> sh.status()
--- Sharding Status --- 
  sharding version: {
        "_id" : 1,
        "minCompatibleVersion" : 5,
        "currentVersion" : 6,
        "clusterId" : ObjectId("5fac01dd8d6fa3fe899662c8")
  }
  shards:
        {  "_id" : "shard1_replset",  "host" : "shard1_replset/node05:27018,node06:27018,node07:27018",  "state" : 1 }
        {  "_id" : "shard2_replset",  "host" : "shard2_replset/node08:27018,node09:27018,node10:27018",  "state" : 1 }
  active mongoses:
        "4.4.1" : 1
  autosplit:
        Currently enabled: yes
  balancer:
        Currently enabled:  yes
        Currently running:  yes
        Collections with active migrations: 
                config.system.sessions started at Thu Nov 12 2020 00:10:16 GMT+0800 (CST)
        Failed balancer rounds in last 5 attempts:  0
        Migration Results for the last 24 hours: 
                480 : Success
  databases:
        {  "_id" : "config",  "primary" : "config",  "partitioned" : true }
                config.system.sessions
                        shard key: { "_id" : 1 }
                        unique: false
                        balancing: true
                        chunks:
                                shard1_replset  543
                                shard2_replset  481
                        too many chunks to print, use verbose if you want to force print
        {  "_id" : "testdb",  "primary" : "shard2_replset",  "partitioned" : true,  "version" : {  "uuid" : UUID("454aad2e-b397-4c88-b5c4-c3b21d37e480"),  "lastMod" : 1 } }
                testdb.peoples
                        shard key: { "age" : 1 }
                        unique: false
                        balancing: true
                        chunks:
                                shard2_replset  1
                        { "age" : { "$minKey" : 1 } } -->> { "age" : { "$maxKey" : 1 } } on : shard2_replset Timestamp(1, 0) 
                testdb.peoples1
                        shard key: { "name" : "hashed" }
                        unique: false
                        balancing: true
                        chunks:
                                shard1_replset  2
                                shard2_replset  2
                        { "name" : { "$minKey" : 1 } } -->> { "name" : NumberLong("-4611686018427387902") } on : shard1_replset Timestamp(1, 0) 
                        { "name" : NumberLong("-4611686018427387902") } -->> { "name" : NumberLong(0) } on : shard1_replset Timestamp(1, 1) 
                        { "name" : NumberLong(0) } -->> { "name" : NumberLong("4611686018427387902") } on : shard2_replset Timestamp(1, 2) 
                        { "name" : NumberLong("4611686018427387902") } -->> { "name" : { "$maxKey" : 1 } } on : shard2_replset Timestamp(1, 3) 
mongos> 

Tip: only a single field can be hashed; multiple fields cannot be specified. From the status information above you can see that testdb.peoples lives only on shard2 while testdb.peoples1 is split across shard1 and shard2, so however many documents are inserted into peoples, they will all be written to shard2, while documents inserted into peoples1 will be written to both shard1 and shard2;

Validation: insert data into the peoples1 collection and check whether it is distributed to different shards.

Insert data on mongos

mongos> use testdb
switched to db testdb
mongos> for (i=1;i<=10000;i++) db.peoples1.insert({name:"people"+i,age:(i%120),classes:(i%20)})
WriteResult({ "nInserted" : 1 })
mongos> 

View data on shard1

shard1_replset:PRIMARY> show dbs
admin   0.000GB
config  0.001GB
local   0.001GB
testdb  0.000GB
shard1_replset:PRIMARY> use testdb
switched to db testdb
shard1_replset:PRIMARY> show tables
peoples1
shard1_replset:PRIMARY> db.peoples1.find().count()
4966
shard1_replset:PRIMARY> 

Tip: we can see that 4966 documents of the collection are stored on shard1;

View data on shard2

shard2_replset:PRIMARY> show dbs
admin   0.000GB
config  0.001GB
local   0.011GB
testdb  0.011GB
shard2_replset:PRIMARY> use testdb
switched to db testdb
shard2_replset:PRIMARY> show tables
peoples
peoples1
shard2_replset:PRIMARY> db.peoples1.find().count()
5034
shard2_replset:PRIMARY> 

Tip: on shard2 we can see both the peoples and peoples1 collections; the peoples1 collection holds 5034 documents. Together, shard1 and shard2 hold the 10000 documents we just inserted. The same distribution can also be checked from mongos, as sketched below.
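A quick sketch of checking the distribution from mongos instead of logging into each shard:

mongos> use testdb
mongos> db.peoples1.getShardDistribution()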

OK, the MongoDB sharded cluster is set up and the test is complete;