MongoDB performance improved dozens of times in a specific scenario: an optimization practice (a record of a MongoDB core cluster avalanche failure)

Time: 2020-12-07

1. Background

A core Java long-connection service uses MongoDB as its primary storage, with hundreds of client machines connected to the same MongoDB cluster. Within a short period the cluster suffered several performance jitters, and then an "avalanche" failure in which traffic instantly dropped to zero and could not recover on its own. This article analyzes the root causes of both failures, which turned out to be a combination of problems: unreasonable client configuration, an inefficient connection-authentication path in the MongoDB kernel, and incomplete proxy configuration. The root cause was finally pinned down through the joint effort of several teams.

More than ten business interfaces access the cluster, and each interface is deployed on dozens of business servers. In total, several hundred client machines access MongoDB, and some requests pull dozens or even more than a hundred rows of data at a time.

The cluster is an active-active deployment across two machine rooms in the same city (since the election node consumes few resources, it is deployed in a third, remote machine room). The architecture is as follows:

[Figure: multi-machine-room cluster architecture across machine rooms A, B, and C]

As the figure above shows, to achieve an active-active deployment, mongos proxies are deployed in each machine room, clients in a machine room connect to the mongos proxies of the same machine room, and each machine room runs multiple proxies. The ip:port list of the proxy layer (note: not the real IP addresses) is as follows:

Proxy address list of machine room A: 1.1.1.1:1111, 2.2.2.2:1111, 3.3.3.3:1111

Proxy address list of machine room B: 4.4.4.4:1111, 4.4.4.4:2222

The three proxies of machine room A are deployed on three different physical machines, while the two proxies of machine room B are deployed on the same physical machine. Machine rooms A and B are in the same city, so the cross-room access latency is negligible.

Both the storage layer and the config servers use the same architecture: machine room A (1 primary + 1 secondary) + machine room B (2 secondaries) + machine room C (1 arbiter, the election node), i.e. a 2 (data nodes) + 2 (data nodes) + 1 (arbiter) layout.

This active-active architecture guarantees that if either machine room goes down, business in the other machine room is unaffected. The principle is as follows:

  1. If machine room A goes down, the proxies in machine room B are unaffected, because proxies are stateless nodes.
  2. If machine room A goes down while the primary is in machine room A, the two data nodes in machine room B plus the arbiter in machine room C still make three voting nodes out of five, which satisfies the "more than half" requirement for an election. A data node in machine room B is therefore elected as the new primary within a short time, and access to the whole storage layer is not affected.

This paper focuses on the following six questions:

  1. Why did burst traffic cause jitter?
  2. Why did the data nodes have no slow logs at all, while the proxy CPU load hit 100%?
  3. Why did the mongos proxies "avalanche", with traffic dropping to zero?
  4. Why, when the proxies in one machine room jittered and the business was switched to the other machine room, did the jitter continue?
  5. Why did packet captures taken during the anomaly show clients frequently building and tearing down connections, with the same connection closed very shortly after it was established?
  6. In theory a proxy does layer-7 forwarding, consumes fewer resources, and should be faster than mongod storage; why did the mongod storage nodes show no jitter while the mongos proxies jittered?

2. Failure process

2.1 Occasional traffic peaks causing service jitter

Over a period of time the cluster experienced several short jitters. When the clients in machine room A jittered, the load on the machine room A proxies was found to be very high, so access was switched from the machine room A proxies to the machine room B proxies. However, the machine room B proxies then jittered as well; in other words, the active-active switchover did not help. The detailed analysis follows.

2.1.1 slow log analysis of storage node

First, the CPU, memory, IO, and load monitoring of all mongod storage nodes in the cluster was analyzed, and everything looked normal. Then the slow log of each mongod node was analyzed (because the cluster is latency sensitive, the slow-log threshold had been lowered to 30ms). The results are as follows:

[Figures: slow log analysis results for the mongod storage nodes]

As the figures show, the storage nodes produced no slow logs at all during the service jitter, so the storage nodes can be judged healthy and the jitter is unrelated to the mongod storage layer.

2.1.2 mongos proxy analysis

Since the storage nodes were fine, troubleshooting moved to the mongos proxy nodes. For historical reasons the cluster was deployed on an older platform whose QPS and latency monitoring was incomplete, so the early jitters were not caught in time. After the jitter the cluster was migrated to the new management and control platform developed by OPPO, which has detailed monitoring. The QPS curve after migration is as follows:

[Figure: QPS monitoring curve after migration to the new platform]

At every time point where traffic spiked, the corresponding business monitoring showed a wave of timeouts or jitter, as follows:

[Figure: business monitoring showing timeouts/jitter at the traffic spikes]

Analysis of the corresponding mongos.log showed that at the jitter time points a large number of connections were being built and torn down, as shown in the following figure:

[Figure: mongos.log entries showing mass connection setup and teardown at the jitter time point]

As the figure shows, thousands of connections were established and thousands were torn down within a single second. Packet captures also show many connections being closed very shortly after being established (disconnection time minus connection time = 51ms in the example below; some connections were closed after a little more than 100ms):

[Figure: statistics of connections closed within tens of milliseconds of being established]

The corresponding packet capture is as follows:

[Figure: corresponding packet capture]

In addition, even during off-peak periods the number of client connections on the proxy machine was very high, exceeding the normal QPS: the QPS was about 7000-8000, yet the conn count was as high as 13000. The monitoring output from mongostat is as follows:

[Figure: mongostat output showing roughly 13000 connections at 7000-8000 QPS]

2.1.3 Proxy machine load analysis

Every time a traffic burst occurred, the proxy load spiked. A script was deployed to sample the load periodically; the charts at a jitter time point are shown below:

[Figures: proxy machine CPU and load sampling at the jitter time points]

As the figures show, CPU consumption was very high at every traffic peak, and it was almost all sy% (kernel mode) while us% (user mode) was very low. At the same time the system load average reached several hundred, sometimes even over a thousand.

2.1.4 jitter analysis summary

The analysis above suggests that the high system load was caused by burst traffic at certain time points. But was burst traffic really the root cause? It was not, as the follow-up analysis will show; this conclusion was wrong, and within a few days the same cluster avalanched.

At the time, however, the interface responsible for the burst traffic was identified and taken offline. The QPS curve afterwards is as follows:

[Figure: QPS curve after the burst-traffic interface was taken offline]

After the burst-traffic interface was taken offline to reduce the jitter, the service ran without jitter for several hours. In addition, we did the following:

  1. Since the real cause of the mongos load had not been found, the proxies in each machine room were scaled out to 4, all on different servers, to reduce the per-proxy load as much as possible.
  2. The business teams were told to configure all 8 proxies of machine rooms A and B in their clients, instead of configuring only the proxies of their own machine room (after the first jitter we analyzed the MongoDB Java SDK and confirmed that its balancing strategy automatically removes proxies with high request latency, so if a proxy misbehaved again it would be removed automatically).
  3. The business teams were asked to increase the timeout of all clients to 500ms.

However, several doubts remained, mainly the following:

  1. With 4 storage nodes and 5 proxy nodes, why did the storage nodes show no jitter while the layer-7 forwarding proxies had high load?
  2. Why did the packet captures show many new connections being closed after only tens of milliseconds or a little over 100ms, i.e. frequent connection setup and teardown?
  3. Why, with a proxy QPS of only a few tens of thousands, was the proxy CPU consumption so high and almost entirely sy% (kernel mode)? In my years of middleware proxy development experience, a proxy consumes very few resources, and the CPU it does use is us% (user mode), not sy%.

2.2 the same business “avalanched” a few days later

A few days later a far more serious failure occurred: at a certain moment the business traffic of machine room B dropped straight to zero. This was no longer a simple jitter; traffic actually fell to zero, the system sy% CPU was at 100%, and almost all business clients were reconnecting.

2.2.1 monitoring system

Machine CPU and system load monitoring are as follows:

[Figure: machine CPU and system load monitoring during the failure]

As the figure shows, the symptoms are almost identical to the earlier high-load episodes attributed to burst traffic: CPU sy% is at 100% and the load average is very high. Logging into the machine and checking top shows the same picture as the monitoring.

[Figure: top output on the proxy machine]

The corresponding network monitoring at the same time is as follows:

[Figure: network traffic monitoring, dropping to almost zero]

Disk IO monitoring is as follows:

[Figures: disk IO monitoring, normal throughout]

The system monitoring shows that during the incident the CPU sy% and the load average were very high, network read/write traffic dropped almost to zero, and disk IO was normal. The whole pattern is nearly identical to the earlier jitter attributed to burst traffic.

2.2.2 How the business was recovered

After the first jitter we had scaled the proxies out to 8 and asked the business teams to configure all of them for every service interface. Because there are so many interfaces, the machine room B business had not yet been reconfigured with all of the proxies and was still using only the two proxies on the same physical machine (4.4.4.4:1111 and 4.4.4.4:2222). This eventually triggered a MongoDB performance bottleneck (analyzed later) and caused the "avalanche" of the whole MongoDB cluster.

The problem was finally resolved by restarting the business services and, at the same time, configuring all 8 proxies for the machine room B business.

2.2.3 monitoring and analysis of mongos proxy instance

Analysis of the proxy logs during this period shows the same phenomenon as in section 2.1: a large number of new connections, with those connections closed again after tens of milliseconds to a little over 100ms. Since the phenomenon matches the earlier analysis, the corresponding log statistics are not repeated here.

The proxy QPS monitoring from that time tells the same story. The query (read) QPS curve is shown below; during the failure the QPS fell almost to zero, i.e. an avalanche:

[Figure: query QPS curve falling to almost zero during the failure]

Command statistical monitoring curve is as follows:

[Figure: command statistics curve spiking to about 22000]

The statistics show that at the failure time point the proxy's command count spiked, instantly soaring to about 22000 (the real number may be higher, since the monitoring samples every 30s and this is an average), which means roughly 22000 connections arrived almost at once. The command counter here is essentially counting db.isMaster(): the first message a client sends after connecting is isMaster, the server executes it and replies, and once the client receives the reply the formal SASL authentication process begins.

The normal client access process is as follows:

  1. The client initiates a connection to mongos
  2. mongos accepts the connection and the connection is established
  3. The client sends the db.isMaster() command to the server
  4. The server replies to the isMaster command
  5. The client starts SASL authentication with the mongos proxy (multiple round trips)
  6. The client issues normal requests such as find()

The client SDK sends db.isMaster() right after the connection is established for two purposes: load balancing and node-type discovery. Measuring the isMaster round trip lets the client quickly notice proxies with high access latency and exclude them, and the reply tells the client what type of node it is talking to.
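To make the idea concrete, here is a minimal C++ sketch of this latency-based selection logic. It is an illustration only, not the Java SDK's actual code; the ProxyStat type, the probe callback, and the 15ms latency window are assumptions introduced for the example.

#include <algorithm>
#include <chrono>
#include <string>
#include <vector>

struct ProxyStat {
    std::string address;
    std::chrono::milliseconds rtt{0};
};

// Probe each proxy with a cheap command (e.g. isMaster), record the round-trip
// time, and keep only proxies whose latency is close to the fastest one.
template <typename ProbeFn>
std::vector<ProxyStat> selectUsableProxies(std::vector<ProxyStat> proxies,
                                           ProbeFn probe,
                                           std::chrono::milliseconds window) {
    using clock = std::chrono::steady_clock;
    auto best = std::chrono::milliseconds::max();
    for (auto& p : proxies) {
        auto start = clock::now();
        probe(p.address);  // "send isMaster, wait for the reply"
        p.rtt = std::chrono::duration_cast<std::chrono::milliseconds>(clock::now() - start);
        best = std::min(best, p.rtt);
    }
    std::vector<ProxyStat> usable;
    for (auto& p : proxies)
        if (p.rtt - best <= window) usable.push_back(p);  // latency-window filter
    return usable;
}

int main() {
    std::vector<ProxyStat> proxies{{"1.1.1.1:1111"}, {"2.2.2.2:1111"}, {"3.3.3.3:1111"}};
    auto fakeProbe = [](const std::string&) { /* placeholder for the isMaster round trip */ };
    auto usable = selectUsableProxies(proxies, fakeProbe, std::chrono::milliseconds(15));
    return usable.empty() ? 1 : 0;
}

A proxy that is overloaded answers isMaster slowly, falls outside the window, and stops receiving traffic; this is the removal behaviour the SDK analysis above relies on.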

In addition, a script deployed in advance automatically captured packets whenever the system load was high. The packet-capture analysis is shown below:

[Figure: packet capture taken automatically while system load was high]

The time sequence analysis of the above figure is as follows:

  1. 11:21:59.506174 connection established
  2. 11:21:59.506254 client sends db.isMaster() to the server
  3. 11:21:59.656479 client sends FIN to close the connection
  4. 11:21:59.674717 server sends the db.isMaster() reply to the client
  5. 11:21:59.675480 client replies with RST

The gap between the third packet and the first is about 150ms. We checked with the business team and confirmed that the timeout configured for this client IP is exactly 150ms. Other captures show intervals of 40ms, 100ms, and so on, and the owners of those clients confirmed that the corresponding interfaces are configured with 40ms, 100ms, etc. Combining the captures with the client configuration, we can conclude that whenever the proxy fails to return the db.isMaster() result within the configured timeout, the client times out immediately and immediately initiates a reconnection.

Summary: the packet captures and mongos logs show that connections were closed quickly after being established because the client's first request, db.isMaster(), timed out and triggered a reconnect. After reconnecting the client sends db.isMaster() again, and since the CPU is at 100% the request times out again after every reconnect. Only the clients configured with a 500ms timeout did not time out on db.isMaster() and went on to the SASL authentication process.

The high system load is therefore tied to the repeated connection setup and teardown: at some moment the clients established a huge number of connections (about 22,000), which drove the load up. Because clients have different timeout settings, some of them do reach the SASL process, which obtains random numbers from kernel space and pushes sy% even higher; the high sy% in turn causes more client timeouts. The access pattern becomes a vicious cycle and finally causes the mongos proxies to avalanche.

2.3 offline simulated fault

We still could not explain why repeated connection setup and teardown alone would drive the CPU into this state. The key to this failure is therefore to analyze why repeated connection setup and teardown pushes the system sy% load to 100%.

2.3.1 simulation of fault process

The steps to simulate frequent connection setup and teardown are as follows:

  1. Modify the mongos kernel code so that every request is delayed by 600ms
  2. Start two identical mongos instances on the same machine, distinguished by port
  3. Run clients with 6000 concurrent connections and a 500ms timeout

With this setup every request is guaranteed to time out; after timing out, the client immediately rebuilds the connection, and the next request times out again, which reproduces the repeated setup/teardown pattern. To match the avalanche environment, the two mongos proxies are deployed on the same physical machine.
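The client side of the simulation can be pictured with the following minimal C++ sketch (an illustration only, not the actual test harness): each worker keeps a 500ms receive timeout and immediately rebuilds its connection whenever a request times out, which is exactly the setup/teardown churn being reproduced. The address, port, and dummy payload are placeholders.

#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>
#include <thread>
#include <vector>

// One worker: connect, send a request, wait at most 500ms for a reply,
// then tear the connection down and start over.
static void reconnectLoop(const char* ip, int port) {
    for (;;) {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0) continue;

        timeval tv{0, 500 * 1000};                        // 500ms receive timeout
        setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));

        sockaddr_in addr{};
        addr.sin_family = AF_INET;
        addr.sin_port = htons(port);
        inet_pton(AF_INET, ip, &addr.sin_addr);

        if (connect(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) == 0) {
            const char req[] = "dummy request";           // placeholder payload
            (void)write(fd, req, sizeof(req));
            char buf[512];
            (void)read(fd, buf, sizeof(buf));             // the patched mongos replies after 600ms,
                                                          // so this read times out at 500ms
        }
        close(fd);                                        // immediate teardown, then rebuild
    }
}

int main() {
    const int kWorkers = 6000;                            // 6000 concurrent links, as in the test
    std::vector<std::thread> workers;
    for (int i = 0; i < kWorkers; ++i)
        workers.emplace_back(reconnectLoop, "127.0.0.1", 27017);  // placeholder address/port
    for (auto& t : workers) t.join();
    return 0;
}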

2.3.2 fault simulation test results

To keep the mongos hardware environment consistent with the failed one, a server of the same type running the same operating system version (2.6.32-642.el6.x86_64) was selected. As soon as all the programs were running, the problem reproduced immediately:

[Figure: test result on linux-2.6, sy% CPU saturated]

Since the failed server runs the rather old linux-2.6 kernel, the operating system version itself was suspected, so a physical machine of the same type was upgraded to linux-3.10 and the test repeated. The results are as follows:

[Figure: test result on linux-3.10, sy% very low with 6000 reconnecting clients]

As the figure shows, with 6000 clients reconnecting repeatedly the server pressure stays normal: the CPU consumption is all us%, sy% is very low, user-mode consumption is about 3 CPU cores, and kernel-mode consumption is almost zero. This is the result we would normally expect, so the problem appears to be related to the operating system version.

To verify whether linux-3.10 exhibits the same high sy% kernel-mode CPU consumption at higher pressure, the concurrency was increased from 6000 to 30000. The results are as follows:

Test results: with the modified mongos deliberately forcing clients to repeatedly build and break connections, on linux-2.6 the CPU sy% 100% problem appears once the setup/teardown concurrency exceeds about 1500. On linux-3.10, the sy% load only starts to climb once the concurrency reaches about 10000, and the higher the concurrency, the higher the sy% load.

Summary: on linux-2.6, as soon as MongoDB sees a few thousand connection teardowns per second, the system sy% load approaches 100%. On linux-3.10, sy% reaches about 30% with 20000 connections being repeatedly built and broken, and it keeps rising as client concurrency grows. linux-3.10 is a big improvement over 2.6 for this repeated setup/teardown scenario, but it does not solve the fundamental problem.

2.4 The root cause of sy% 100% under repeated connection setup and teardown

To analyze the high sy% load, perf was installed to profile the system. Almost all CPU time turns out to be spent in the following kernel function:

[Figure: perf output showing CPU time concentrated in _spin_lock_irqsave]

The perf analysis shows that the CPU is consumed in _spin_lock_irqsave. Expanding the kernel-mode call stack gives the following:

- 89.81% 89.81% [kernel] [k] _spin_lock_irqsave
   - _spin_lock_irqsave
      - mix_pool_bytes_extract
         - extract_buf
              extract_entropy_user
              urandom_read
              vfs_read
              sys_read
              system_call_fastpath
              0xe82d

The stack shows that mongodb is reading /dev/urandom, and because multiple threads read the file at the same time, the CPU time is burned spinning on a lock.

At this point the picture is clearer. The root cause is not simply that tens of thousands of connections per second overwhelm the system; it is that each new client connection spawns a new thread in the mongodb backend, and under certain conditions that thread calls urandom_read to fetch random numbers from /dev/urandom. With many threads reading at the same time, kernel-mode CPU is consumed spinning on the entropy-pool spinlock, which drives the CPU load sky-high.

2.5 mongodb kernel random number optimization

2.5.1 analysis of mongodb kernel source code

The analysis above establishes that the root cause is multiple mongodb threads reading random numbers from /dev/urandom. Reading the mongodb kernel code shows that the file is read in the following place:

[Figure: mongodb kernel code that reads /dev/urandom each time a random number is generated]

This is the core random-number generation code: every time a random number is requested, the "/dev/urandom" system file is read, so the next step is to find every caller of this interface.
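As an illustration of what such a kernel-backed generator looks like (a simplified sketch, not MongoDB's actual SecureRandom code; the class and method names are invented for the example), every call ends up as a read() on /dev/urandom, i.e. a system call into the kernel entropy pool that many threads then contend on:

#include <cstdint>
#include <fstream>
#include <stdexcept>

// Simplified sketch of a "kernel mode" random source: each value comes from a
// read of /dev/urandom, so every call crosses into the kernel, and many threads
// doing this at once contend on the entropy-pool spinlock seen in the perf stack.
class UrandomSecureRandom {
public:
    UrandomSecureRandom() : _in("/dev/urandom", std::ios::binary) {
        if (!_in) throw std::runtime_error("cannot open /dev/urandom");
    }

    int64_t nextInt64() {
        int64_t v = 0;
        _in.read(reinterpret_cast<char*>(&v), sizeof(v));  // kernel read on every call
        return v;
    }

private:
    std::ifstream _in;
};

int main() {
    UrandomSecureRandom r;
    return static_cast<int>(r.nextInt64() & 1);  // use the value so the call is not optimized away
}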

Reading on through the code, the main callers are the following:

//After the server receives the first SASL authentication message from the client, a random nonce is generated here
//On mongos, this is the handler for the first SASL authentication message received from the client
SaslSCRAMSHA1ServerConversation::_firstStep(...) {
 ... ...
 unique_ptr<SecureRandom> sr(SecureRandom::create());
 binaryNonce[0] = sr->nextInt64();
 binaryNonce[1] = sr->nextInt64();
 binaryNonce[2] = sr->nextInt64();
 ... ...
}
//Relative to the mongod storage node, mongos is itself a client, and as a client it also needs to generate a random nonce
SaslSCRAMSHA1ClientConversation::_firstStep(...) {
 ... ...
 unique_ptr<SecureRandom> sr(SecureRandom::create());
 binaryNonce[0] = sr->nextInt64();
 binaryNonce[1] = sr->nextInt64();
 binaryNonce[2] = sr->nextInt64();
 ... ...
}
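To put the code above in context with the numbers from section 2.2.3: roughly 22,000 new connections arrived in one burst, and each _firstStep() call draws three nextInt64() values, so a single burst implies on the order of 66,000 reads of /dev/urandom within a very short window, all contending on the same kernel entropy-pool spinlock.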

2.5.2 random number optimization of mongodb kernel source code

Section 2.5.1 shows that mongos generates random numbers via "/dev/urandom" during SASL authentication of every new client connection, which is what drives the system sy% CPU up. Optimizing the random-number generation is therefore the key to solving the problem.

Further reading of the mongodb kernel source shows that random numbers are used in many places, and some of them are already generated by a user-mode algorithm. We can therefore generate the SASL nonce the same way, in user mode. The core of the user-mode random-number generator is as follows:

class PseudoRandom {
 ... ...
 uint32_t _x;
 uint32_t _y;
 uint32_t _z;
 uint32_t _w;
}

This algorithm ensures that the generated values are randomly distributed; see:

http://en.wikipedia.org/wiki/…

You can also view the annotated implementation of the algorithm at the Git address: mongodb random number generation algorithm annotation
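For illustration, here is a minimal user-mode generator built around the same four 32-bit state words; it uses the classic xorshift128 step, and the class name and seeding constants are invented for the example, so treat it as a sketch rather than MongoDB's exact implementation:

#include <cstdint>

class PseudoRandomSketch {
public:
    explicit PseudoRandomSketch(uint32_t seed)
        : _x(123456789u ^ seed), _y(362436069u), _z(521288629u), _w(88675123u) {}

    uint32_t nextUInt32() {
        // Marsaglia xorshift128: pure arithmetic, no system call, no lock.
        uint32_t t = _x ^ (_x << 11);
        _x = _y;
        _y = _z;
        _z = _w;
        _w = _w ^ (_w >> 19) ^ (t ^ (t >> 8));
        return _w;
    }

private:
    uint32_t _x, _y, _z, _w;
};

int main() {
    PseudoRandomSketch rng(2020);
    return static_cast<int>(rng.nextUInt32() & 1);
}

Because the whole state lives in user space, generating a nonce this way never enters the kernel, which is why it removes the spinlock contention entirely.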

Summary: by switching the SASL-authentication random-number generation to a user-mode algorithm, the CPU sy% 100% problem is solved, and at the same time proxy performance improves several times to dozens of times in short-connection scenarios.

3. Problem summary and answers to the questions

The analysis shows that the fault was caused by a combination of factors: improper client configuration, a defect in the mongodb server kernel under extreme conditions, incomplete monitoring, and so on. In summary:

  1. Client configuration was not uniform: the many business interfaces on the same cluster used different timeout and connection settings, which made packet capture and troubleshooting harder, and timeouts that are too small easily cause repeated reconnection.
  2. Clients need to configure all mongos proxies. Then, when a proxy misbehaves, the client SDK removes it by default, minimizing business impact and avoiding a single point of failure.
  3. The business interfaces sharing a cluster should pull their settings from the same configuration center to avoid inconsistent configuration.
  4. The random-number generation used by the mongodb kernel for new connections has a serious defect that causes severe performance jitter and, in extreme cases, a service "avalanche".

At this point, we can answer the six questions in Chapter 1 as follows:

Why did burst traffic cause jitter?

A: the service is a Java business that uses a connection pool to talk to the mongos proxies. When traffic bursts, the pool grows to improve MongoDB access throughput, so clients open new connections; with so many client machines, a large number of new connections to mongos can be created at once. After a connection is established, SASL authentication begins, and its first step generates random numbers, which requires reading the operating system's "/dev/urandom" file. Because mongos uses a one-thread-per-connection model by default, many threads end up reading that file at the same instant, which drives the kernel-mode sy% load up.
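The "one thread per connection" point can be pictured with this minimal sketch (a deliberately simplified illustration, not mongos' real networking code; handleConnection and the port are placeholders): every accepted socket gets its own thread, and it is that thread which later runs the SASL step that reads /dev/urandom, so a burst of N new connections means N threads hitting the kernel RNG at roughly the same time.

#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <thread>

// Placeholder for the per-connection work: reply to isMaster, run SASL
// authentication (this is where the random-number generation happens),
// then serve normal queries until the client disconnects.
static void handleConnection(int clientFd) {
    // ... protocol handling elided ...
    close(clientFd);
}

// One-thread-per-connection accept loop: every new client link spawns a thread.
static void acceptLoop(int listenFd) {
    for (;;) {
        int clientFd = accept(listenFd, nullptr, nullptr);
        if (clientFd < 0) continue;
        std::thread(handleConnection, clientFd).detach();
    }
}

int main() {
    int listenFd = socket(AF_INET, SOCK_STREAM, 0);
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(27017);                         // placeholder port
    if (bind(listenFd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) != 0) return 1;
    if (listen(listenFd, 1024) != 0) return 1;
    acceptLoop(listenFd);
    return 0;
}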

Why did the mongos proxies "avalanche", with traffic dropping to zero?

A: at some moment the number of client connections to the proxies surges. The mongos server obtains random numbers by reading "/dev/urandom", and with many threads reading the file simultaneously the kernel-mode spinlock drives CPU sy% to 100%. With sy% so high and the client timeouts so small, client requests time out; the clients reconnect, the reconnections enter SASL authentication again, and the "/dev/urandom" reads intensify further. The cycle feeds itself.

In addition, after the first jitter the server side was expanded to 8 mongos proxies, but the clients were not reconfigured. The machine room B business was still configured with only two proxies, both on the same server, so the Mongo Java SDK's strategy of automatically removing high-load nodes could not help, and the result was an "avalanche".

Why did the data nodes have no slow logs while the proxy CPU sy% load was 100%?

A: because the Java clients access the mongos proxies directly, the flood of connections occurs only between the clients and mongos. And because the client timeouts are so short (some interfaces are set to tens of milliseconds, some to a little over 100ms, some to 500ms), a chain reaction starts at traffic peaks: the high system load caused by the burst makes clients time out, the clients reconnect quickly, which causes further timeouts, and so on in a loop. Between mongos and mongod there is also a connection-pool model, but mongos, acting as the client of the mongod storage nodes, uses a very long timeout, seconds by default, so it never goes through the same repeated timeout-and-disconnect cycle.

Why, when the machine room A proxies jittered and the machine room A business was switched to machine room B, did the jitter continue?

A: when the machine room A service jitters and is switched to machine room B, the clients must re-establish and re-authenticate all their connections with the new proxies, which triggers a burst of repeated connection setup/teardown and "/dev/urandom" reads on the machine room B proxies. The active-active switchover therefore fails to help.

Why did packet captures taken during the anomaly show clients frequently building and tearing down connections, with the same connection closed very shortly after it was established?

A: the root cause of the frequent setup and teardown is the high system sy% load; the reason the clients close connections so soon after opening them is that the configured client timeouts are too short.

In theory a proxy does layer-7 forwarding, consumes fewer resources, and should be faster than mongod storage. Why did the mongod storage nodes show no jitter while the mongos proxies jittered badly?

A: in the sharded architecture a layer of mongos proxies sits in front of all mongod storage nodes. As the client of the mongod storage nodes, mongos uses a timeout of seconds by default, so it never times out and never goes through the frequent setup/teardown cycle.

If the mongodb cluster used an ordinary replica set instead, would frequent client connection setup and teardown cause the same "avalanche" on the mongod storage nodes?

A: yes. With many clients, an old operating system kernel, and overly short timeouts, clients directly accessing the replica set's mongod storage nodes go through the same authentication process as with a mongos proxy, so they would trigger the same frequent reads of "/dev/urandom", excessive CPU sy% load, and, in extreme cases, an avalanche.

4. “Avalanche” solution

The analysis shows that the problem is a combination of unreasonable client configuration and the mongodb kernel's authentication process reading kernel random numbers, which under extreme conditions leads to an avalanche. Teams without mongodb kernel development capability can avoid the problem by standardizing client configuration; combining standardized client configuration with a kernel-level fix for the random-number reads solves it completely.

4.1 standardization of Java SDK client configuration

In scenarios with many business interfaces and many client machines, the client must be configured as follows (an example connection string is given after the list):

  1. Set the timeout to the second level, to avoid the repeated connection setup and teardown caused by timeouts that are too small.
  2. Configure all mongos proxy addresses in the client, never a single point; otherwise traffic concentrates on one mongos and an instantaneous peak of connection setup and authentication is easy to trigger.
  3. Deploy enough mongos proxies so that each proxy handles as few simultaneous new connections as possible. With multiple proxies configured, the client balances traffic across them by default and automatically removes a proxy whose access latency is high.
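As an illustration only (the addresses here are the example ones from section 1, and exact option names can vary by driver version), a connection string that follows these rules lists every mongos of both machine rooms and uses second-level timeouts:

mongodb://1.1.1.1:1111,2.2.2.2:1111,3.3.3.3:1111,4.4.4.4:1111,4.4.4.4:2222/?connectTimeoutMS=2000&socketTimeoutMS=3000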

If you cannot modify the mongodb kernel source, follow the client configuration rules above and replace the linux-2.6 kernel with linux-3.10 or newer; that is usually enough to avoid stepping into the same kind of pit.

4.2 Mongodb kernel source optimization (replace the kernel-mode random number with a user-mode algorithm)

See section 2.5.2 for details.

4.3 How PHP short-connection services can avoid this pit

PHP businesses use short connections, so under high traffic frequent connection setup and teardown is unavoidable, and every connection goes through SASL authentication, which again means many threads frequently reading "/dev/urandom" and easily reproduces the problems described above. In this case, follow guidance similar to the Java client rules in 4.1, and do not run on an old Linux kernel: using a 3.x or newer kernel avoids the problem in practice.

5. Mongodb kernel source code design and implementation analysis

Source-code analyses of the related mongodb thread models and random-number algorithms are available here:

Mongodb dynamic thread model source design and Implementation Analysis:

https://github.com/y123456yz/…

Mongodb one link one thread model source design and Implementation Analysis:

https://github.com/y123456yz/…

Mongodb kernel mode and user mode random number algorithm implementation analysis:

https://github.com/y123456yz/…

