Optimization practice: improving mongodb performance by tens of times in a specific scenario (an avalanche failure of a core mongodb cluster)

Time:2021-9-18


1. Problem background

A core Java long-connection service uses mongodb as its primary storage. Hundreds of client machines connect to the same mongodb cluster. The cluster experienced many short periods of performance jitter, and later an “avalanche” failure in which traffic instantly dropped to zero and could not recover on its own. This article analyzes the root causes of the two incidents, which involve a series of problems such as unreasonable client configuration, a defect in the mongodb kernel’s connection authentication, and incomplete proxy configuration.

The cluster serves more than ten business interfaces, each deployed on dozens of business servers. In total, more than a hundred client machines access the mongodb cluster, and some requests pull dozens or even more than a hundred documents at a time.

The cluster is a two-machine-room multi-active deployment in the same city (the election node consumes very few resources, so it is deployed in a third machine room in another location). The architecture diagram is as follows:

[Figure: cluster architecture diagram]

As the figure shows, to achieve multi-active access, proxies are deployed in each machine room, and the clients of each machine room connect to the mongos proxies of their own machine room; each machine room has multiple proxies. The proxy-layer deployment IP:port address list (note: not real IP addresses) is as follows:

  1. Proxy address list of machine room A: 1.1.1.1:111, 2.2.2.2:1111, 3.3.3.3:1111
  2. Proxy address list of machine room B: 4.4.4.4:1111, 4.4.4.4:2222

The three proxies in machine room A are deployed on three different physical machines, while the two proxies in machine room B are deployed on the same physical machine. In addition, machine rooms A and B are in the same city, so cross-machine-room access latency is negligible.

The storage layer and the config server of the cluster use the same architecture: machine room A (1 primary + 1 secondary) + machine room B (2 secondaries) + machine room C (1 arbiter election node), i.e. a 2 (data nodes) + 2 (data nodes) + 1 (election node) deployment.

This multi-active architecture ensures that the loss of any one machine room has no impact on the business running in the other machine room. The principle is as follows:

  1. If machine room A goes down, the proxies in machine room B are unaffected, because the proxies are stateless nodes.
  2. If machine room A goes down and the primary was in machine room A, the two data nodes in machine room B plus the election node in machine room C still form three of the five voting members, which satisfies the majority required for an election. The data nodes in machine room B therefore elect a new primary within a short time, and access to the storage layer is not affected.

This paper focuses on the following six questions:

  1. **Why does burst traffic cause jitter?**
  2. **Why do the data nodes have no slow logs at all, while the proxy CPU load reaches almost 100%?**
  3. **Why did the mongos proxies “avalanche” for several hours and fail to recover for a long time?**
  4. **Why does the proxy in the other machine room also jitter when one machine room’s business is switched over to it?**
  5. **Why does packet capture show the client frequently setting up and tearing down connections, with a very short interval between setup and teardown on the same connection?**
  6. **In theory a proxy only does layer-7 forwarding and consumes few resources, so it should be faster than mongod storage. Why do the mongod storage nodes show no jitter while the mongos proxies jitter badly?**

2. Fault process

2.1 occasional traffic peaks and business jitter

The cluster experienced several short jitters over a period of time. When the clients in machine room A jittered, the load of the corresponding proxies in machine room A was found to be very high, so machine room A’s traffic was switched to the proxies in machine room B; however, the machine room B proxies jittered as well after the switch, i.e. the multi-active switchover had no effect. The detailed analysis follows.

2.1.1 storage node slow log analysis

First, the CPU, memory, IO and load monitoring of all mongod storage nodes in the cluster was checked and everything looked normal, so the slow logs of each mongod node were analyzed (since the cluster is latency-sensitive, the slow-log threshold had been lowered to 30ms). The analysis results are as follows:

[Figures: slow log statistics of the mongod storage nodes during the jitter period]

As the figures show, the storage nodes produced no slow logs at all while the business was jittering, so the storage nodes can be judged to be healthy and the jitter has nothing to do with the mongod storage layer.
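For reference, the 30ms slow-log threshold mentioned above corresponds to mongod’s slowms setting, which can be changed per node with the profile command. The following is only a hedged sketch using the MongoDB C++ driver (mongocxx); the node address is hypothetical, and the cluster in this article is managed through its own platform rather than this snippet.

```cpp
#include <mongocxx/client.hpp>
#include <mongocxx/instance.hpp>
#include <mongocxx/uri.hpp>
#include <bsoncxx/builder/basic/document.hpp>
#include <bsoncxx/builder/basic/kvp.hpp>

int main() {
    using bsoncxx::builder::basic::kvp;
    using bsoncxx::builder::basic::make_document;

    mongocxx::instance inst{};  // driver bootstrap, one per process
    // Hypothetical mongod address; slowms is a per-mongod setting.
    mongocxx::client conn{mongocxx::uri{"mongodb://1.1.1.1:2001"}};

    // profile level 0 keeps the profiler off but still logs operations
    // slower than slowms (30ms here) to the mongod log.
    conn["test"].run_command(make_document(kvp("profile", 0), kvp("slowms", 30)));
}
```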

2.1.2 mongos proxy analysis

Since the storage nodes showed no problem, troubleshooting moved to the mongos proxy nodes. For historical reasons the cluster was deployed on another platform that did not fully monitor QPS and latency, so the early jitters were not detected in time. After the jitter the cluster was migrated to the new management platform developed by OPPO, which provides detailed monitoring. The QPS curve after migration is as follows:

[Figure: proxy QPS monitoring curve after migration]

At every point where the traffic rises, the corresponding business monitoring shows a wave of timeouts or jitter, as follows:

[Figure: business-side timeout/jitter monitoring]

Analysis of the corresponding mongos logs shows a large number of connection setups and teardowns in mongos.log at the jitter time points, as shown below:

[Figure: connection setup/teardown counts per second in mongos.log at the jitter time point]

As shown above, thousands of connections are established and thousands are torn down within a single second. Packet capture also shows many connections being closed again within a very short time (teardown time minus setup time = 51ms; some connections are closed after 100+ ms):

[Figure: statistics of connections closed shortly after being established]

The corresponding packet capture is as follows:

[Figure: packet capture of a short-lived connection]

In addition, the number of client connections on the proxy machine is very high even during off-peak periods and exceeds the normal QPS: the QPS is about 7000-8000, yet the conn count reaches 13000.

The monitoring information obtained by mongostat is as follows:

[Figure: mongostat output showing QPS and connection counts]

2.1.3 proxy machine load analysis

During each traffic burst the proxy load is very high. A pre-deployed script samples the machine regularly; the monitoring at the jitter time points is shown below:

[Figures: proxy machine CPU and load average sampled at the jitter time points]

As shown above, CPU consumption is very high at every traffic peak and is almost entirely sy% (kernel-state) load, while the us% (user-state) load is very low; the load average reaches several hundred, sometimes even several thousand.
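The sampling script itself is not shown in this article. As a minimal illustration of how us% and sy% can be told apart, the sketch below (an assumption, not the script that was actually deployed) reads the aggregate cpu line of /proc/stat twice and prints the user-state and kernel-state shares over the interval:

```cpp
#include <fstream>
#include <iostream>
#include <string>
#include <thread>
#include <chrono>

// One sample of the aggregate "cpu" line from /proc/stat.
struct CpuSample {
    unsigned long long user{}, nice{}, system{}, idle{}, iowait{}, irq{}, softirq{}, steal{};
    unsigned long long total() const {
        return user + nice + system + idle + iowait + irq + softirq + steal;
    }
};

static CpuSample readCpu() {
    std::ifstream f("/proc/stat");
    std::string cpu;
    CpuSample s;
    f >> cpu >> s.user >> s.nice >> s.system >> s.idle >> s.iowait >> s.irq >> s.softirq >> s.steal;
    return s;
}

int main() {
    // Sample twice, one second apart, and print us% / sy% over that interval.
    CpuSample a = readCpu();
    std::this_thread::sleep_for(std::chrono::seconds(1));
    CpuSample b = readCpu();

    double total = static_cast<double>(b.total() - a.total());
    std::cout << "us% " << 100.0 * (b.user - a.user) / total
              << "  sy% " << 100.0 * (b.system - a.system) / total << std::endl;
}
```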

2.1.4 jitter analysis summary

From the analysis above it appears that the business sends burst traffic at certain points, which drives the system load up. **Is burst traffic really the root cause? In fact it is not, as the follow-up analysis shows; this turned out to be a wrong conclusion, and a few days later the same cluster avalanched.**

The business therefore traced the interface responsible for the burst traffic and took it offline. The QPS curve afterwards is as follows:

[Figure: proxy QPS after the burst-traffic interface was taken offline]

After the burst-traffic interface was taken offline to reduce the jitter, the business did not jitter for the next few hours. We also did the following:

  1. Since the real reason for the 100% mongos load had not been found, each machine room expanded its mongos proxies to four, all placed on different servers, to spread the load as much as possible.
  2. All business clients of machine rooms A and B were asked to configure all 8 proxies instead of only the proxies of their own machine room (after the first jitter we analyzed mongodb’s Java SDK and confirmed that its balancing strategy automatically excludes proxies with high request latency, so if a proxy misbehaves again it will be excluded automatically).
  3. All clients were asked to increase their timeout to 500ms.

However, **many doubts and open questions remained in my mind, mainly the following:**

  1. There are 4 storage nodes and 5 proxy nodes; the storage nodes show no jitter at all, yet the layer-7 forwarding proxies are the ones with high load?
  2. Why does packet capture show many new connections being closed after only tens of milliseconds or a bit over 100ms? Why the frequent connection setup and teardown?
  3. Why is the proxy CPU consumption so high at only tens of thousands of QPS, and why is it all sy% system load? From years of middleware proxy development experience, a proxy should consume very few resources, and the CPU it does consume should be us%, not sy%.

2.2 the same business “avalanched” a few days later

It did not take long: only a few days after the burst-traffic interface was taken offline, a more serious fault occurred. At a certain moment the business traffic in machine room B dropped straight to zero; this was not a simple jitter but a complete outage, with the system sy% load at 100% and the business timing out and reconnecting almost constantly.

2.2.1 machine system monitoring analysis

Machine CPU and system load monitoring are as follows:

[Figure: machine CPU and system load monitoring during the avalanche]

As shown above, the pattern is almost identical to the earlier high load caused by burst traffic: the CPU sy% load is 100% and the load average is very high. Logging in to the machine and checking top confirms what the monitoring shows.

[Figure: top output on the proxy machine]

The corresponding network monitoring at the same time is as follows:

[Figure: network traffic monitoring at the same time]

Disk IO monitoring is as follows:

[Figures: disk IO monitoring]

The system monitoring shows that the CPU sy% load is very high, network read/write traffic drops to almost zero, and disk IO is normal. The whole picture is almost identical to the jitter caused by the earlier burst traffic.

2.2.2 how the business was restored

After the first jitter caused by burst traffic, we had expanded all proxies to 8 and asked the business to configure all proxies for every service interface. Because there are many business interfaces, the services in machine room B had not been reconfigured and still used only the original two proxies on the same physical machine (4.4.4.4:1111, 4.4.4.4:2222). This finally hit a mongodb performance bottleneck (see the analysis below) and caused the “avalanche” of the whole mongodb cluster.

The problem was finally resolved by restarting the business services and configuring all eight proxies for machine room B at the same time.

2.2.3 mongos proxy instance monitoring analysis

The proxy logs for this period show the same phenomenon as in section 2.1: a large number of new connections that are closed again after tens of milliseconds to a bit over 100ms. Since the phenomenon matches the earlier analysis, the logs are not analyzed statistically again here.

In addition, the proxy QPS monitoring at the time shows the QPS curve of normal query (read) requests below; QPS fell to almost zero during the failure:

[Figures: proxy query QPS and command statistics during the failure]

The statistics show a sharp spike at the moment the proxy traffic fails: the command counter jumps to 22000 (possibly higher, because the monitoring sampling period is 30s and this is only an average), i.e. roughly 22000 connections arrive at that instant. The command counter here essentially reflects db.isMaster() calls: the first message a client sends after connecting to the server is isMaster; the server executes db.isMaster() and replies, and only after receiving the reply does the client start the formal SASL authentication.

The normal client access process is as follows:

  1. The client initiates a connection to mongos
  2. mongos accepts the connection; the connection is established
  3. The client sends the db.isMaster() command to the server
  4. The server replies to the isMaster request
  5. The client performs SASL authentication with the mongos proxy (multiple round trips with mongos)
  6. The client issues the normal find() and other requests

After the connection is established, the client SDK sends db.isMaster() to the server to determine the type of the node and to feed its load-balancing strategy: by measuring the round-trip delay of this request it can quickly detect proxies with high latency and exclude nodes with a high round-trip time.
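For illustration, isMaster can also be issued explicitly as a command. The sketch below is only an assumption-level example using the MongoDB C++ driver (mongocxx) against a hypothetical proxy address; the Java SDK performs the equivalent step automatically on every new connection.

```cpp
#include <iostream>
#include <mongocxx/client.hpp>
#include <mongocxx/instance.hpp>
#include <mongocxx/uri.hpp>
#include <bsoncxx/builder/basic/document.hpp>
#include <bsoncxx/builder/basic/kvp.hpp>
#include <bsoncxx/json.hpp>

int main() {
    using bsoncxx::builder::basic::kvp;
    using bsoncxx::builder::basic::make_document;

    mongocxx::instance inst{};
    // Hypothetical mongos proxy address from the example list above.
    mongocxx::client conn{mongocxx::uri{"mongodb://1.1.1.1:1111/?connectTimeoutMS=3000"}};

    // isMaster is the first command a driver sends on a new connection; the
    // reply describes the node type and is also used to measure latency.
    auto reply = conn["admin"].run_command(make_document(kvp("isMaster", 1)));
    std::cout << bsoncxx::to_json(reply.view()) << std::endl;
}
```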

In addition, a pre-deployed script automatically captures packets when the system load is high. The packet capture analysis is shown in the following figure:

[Figure: packet capture of a short-lived connection during the failure]

The timing analysis of the above figure is as follows:

  1. 11:21:59.506174 connection established successfully
  2. 11:21:59.506254 client sends db.isMaster() to the server
  3. 11:21:59.656479 client sends a FIN to close the connection
  4. 11:21:59.674717 server sends the db.isMaster() response to the client
  5. 11:21:59.675480 client replies with a RST

The gap between the third packet and the first packet is about 150ms. We confirmed with the business that the timeout configured for that client IP is exactly 150ms. Other captures show similar values such as 40ms and 100ms, and the business confirmed that the corresponding client interfaces are indeed configured with 40ms and 100ms timeouts. Combining the packet captures with the client configuration, we can conclude that whenever the proxy fails to answer the client’s db.isMaster() within the configured timeout, the client times out and immediately initiates a reconnection.

**Summary:** packet capture and mongos log analysis show that connections are torn down quickly after being established because the client’s first request to the proxy, db.isMaster(), times out, which makes the client reconnect. After reconnecting it sends db.isMaster() again, and because the proxy CPU load is at 100%, the request after reconnection times out once more. For clients with a 500ms timeout, db.isMaster() does not time out, so they continue into the SASL authentication flow.

**It follows that the high system load is tied to the repeated connection setup and teardown. At a certain moment a large number of clients establish connections (about 22K), driving the load up. Because the clients are configured with different timeouts, some of them do reach the SASL phase, which fetches random numbers from kernel space and drives the sy% load up; the high sy% load in turn makes more clients time out. The whole access process thus becomes a vicious circle that finally brings down the mongos proxies in an avalanche.**

**2.3 simulating the fault offline**

By this point we had roughly identified the cause of the problem, but why can about 20000 requests push sy% to 100% the moment the fault breaks out? In theory, tens of thousands of connections per second should not cause such a serious problem; after all, the machine has 40 CPUs. The key question of this fault is therefore why repeated connection setup and teardown drives the system sy% load to 100%.

2.3.1 simulated fault process

The steps to simulate the frequent connection setup/teardown fault are as follows:

  1. Modify the mongos kernel code so that every request is delayed by 600ms
  2. Run two identical mongos instances on the same machine, distinguished only by port
  3. Run a client with 6000 concurrent connections and a 500ms timeout

This guarantees that every request times out; after the timeout the client immediately reconnects, and the request after reconnection times out again, which reproduces the repeated setup/teardown cycle. In addition, to stay consistent with the avalanche environment, the two mongos proxies are deployed on the same physical machine. A hedged sketch of such a reconnecting client follows.
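The actual test used the MongoDB client with 6000 concurrent connections. The sketch below is only an assumption-level stand-in that reproduces the same TCP-level timeout/close/reconnect pattern against a hypothetical address; it does not speak the MongoDB wire protocol, so it exercises the connection churn rather than the SASL path.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstring>
#include <thread>
#include <vector>

// Each worker connects to the (slowed-down) mongos, waits for a reply that
// never arrives within the timeout, then closes and reconnects, mimicking
// the timeout/reconnect storm seen in the incident.
static void worker(const char* ip, int port) {
    for (;;) {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0) continue;

        sockaddr_in addr{};
        addr.sin_family = AF_INET;
        addr.sin_port = htons(port);
        inet_pton(AF_INET, ip, &addr.sin_addr);

        timeval tv{0, 500000};                       // 500ms receive timeout
        setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));

        if (connect(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) == 0) {
            const char dummy[] = "ping";             // placeholder payload, not a real wire message
            send(fd, dummy, sizeof(dummy), 0);
            char buf[512];
            recv(fd, buf, sizeof(buf), 0);           // times out because the server delays 600ms
        }
        close(fd);                                   // immediately reconnect, repeating the cycle
    }
}

int main() {
    const char* ip = "1.1.1.1";                      // hypothetical mongos address
    std::vector<std::thread> threads;
    // 6000 workers match the test concurrency; file-descriptor and thread
    // limits (ulimit) may need to be raised first.
    for (int i = 0; i < 6000; ++i) threads.emplace_back(worker, ip, 1111);
    for (auto& t : threads) t.join();
}
```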

2.3.2 fault simulation test results

To stay consistent with the hardware environment of the failed mongos proxies, a server of the same type running the same operating system version (2.6.32-642.el6.x86_64) was selected. Once the programs were started, the problem appeared immediately:

[Figure: CPU sy% close to 100% on the linux-2.6 test machine]

Because the failed server runs an old linux-2.6 kernel, we suspected that the operating system version might be part of the problem, so a physical machine of the same type was upgraded to linux-3.10 and the test repeated. The results are as follows:

[Figure: CPU consumption on the linux-3.10 test machine with 6000 concurrent reconnections]

As shown above, with 6000 clients repeatedly reconnecting, the server pressure is normal: almost all CPU consumption is us% and sy% consumption is very low. The user-state CPU consumes about 3 cores and the kernel-state CPU is almost 0. This is the result we expect, so the problem appears to be related to the operating system version.

To verify whether the linux-3.10 kernel suffers the same high sy% kernel-state CPU consumption as version 2.6, the concurrency was increased from 6000 to 30000. The results are as follows:

**Test results:** by modifying the mongodb kernel so that clients deliberately time out and repeatedly reconnect, on linux-2.6 the 100% system CPU sy% problem can be reproduced with just over 1500 concurrent reconnecting clients; on linux-3.10 the sy% load starts to climb once the concurrency reaches about 10000, and the higher the concurrency, the higher the sy% load.

**Summary:** on linux-2.6, a few thousand mongodb connection setups and teardowns per second are enough to push the system sy% load close to 100%. On linux-3.10 the same repeated setup/teardown drives sy% to about 30%, and it rises as the client concurrency increases. linux-3.10 is therefore a large improvement over version 2.6 for this repeated setup/teardown scenario, but it does not solve the underlying problem.

2.4 root cause of sy% 100% under repeated client connection setup and teardown

To analyze why the sy% system load is so high, perf was installed to profile the system. Almost all CPU turns out to be consumed in the following function:

[Figure: perf top output]

perf shows that the CPU is spent in the _spin_lock_irqsave function. Analyzing the kernel-state call stack gives the following:

```
89.81%  89.81%  [kernel]  [k] _spin_lock_irqsave
        _spin_lock_irqsave
        mix_pool_bytes_extract
        extract_buf
        extract_entropy_user
        urandom_read
        vfs_read
        sys_read
        system_call_fastpath
        0xe82d
```

The stack shows that mongodb is reading /dev/urandom, and because multiple threads read the file at the same time, the time is spent spinning on a spinlock.

At this point the problem is much clearer. The root cause of the fault is not simply that tens of thousands of connections per second push sys too high. The root cause is that every new connection from a mongo client makes the mongodb server create a new thread, and under certain conditions that thread calls urandom_read to read random bytes from /dev/urandom; because multiple threads read the file at the same time, the kernel serializes them on a spinlock, which burns CPU in kernel state.
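The contention can be reproduced outside mongodb. The following minimal sketch (an assumption, not mongodb code) spawns many threads that each keep reading small chunks from /dev/urandom, the same pattern the per-connection SASL nonce generation follows; on an old 2.6 kernel the reads serialize on the entropy-pool spinlock and sy% climbs sharply.

```cpp
#include <fcntl.h>
#include <unistd.h>
#include <cstdint>
#include <thread>
#include <vector>

// Each thread opens /dev/urandom and keeps reading small chunks,
// roughly what three nextInt64() calls per new connection consume.
static void urandomReader() {
    int fd = open("/dev/urandom", O_RDONLY);
    if (fd < 0) return;
    int64_t buf[3];
    for (;;) {
        if (read(fd, buf, sizeof(buf)) < 0) break;  // 24 bytes per iteration
    }
    close(fd);
}

int main() {
    std::vector<std::thread> threads;
    for (int i = 0; i < 64; ++i) threads.emplace_back(urandomReader);
    for (auto& t : threads) t.join();
}
```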

2.5 mongodb kernel random number optimization

2.5.1 locating the random number code in the mongodb kernel source

The analysis above established that the root cause is multiple mongodb kernel threads reading random numbers from /dev/urandom. Reading the mongodb kernel code shows that the file is read in the following place:

[Figure: SecureRandom implementation that reads /dev/urandom]

This is the core code for generating secure random numbers: every time a random number is fetched, the /dev/urandom system file is read, so the problem can be located by finding the callers of this interface.
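The screenshot is not reproduced here. As a rough illustration of the behaviour described above, a urandom-backed generator of this kind could look like the following hedged sketch (an approximation, not the actual mongodb SecureRandom code): every nextInt64() ends up as a read() system call on /dev/urandom.

```cpp
#include <fcntl.h>
#include <unistd.h>
#include <cstdint>
#include <stdexcept>

// Approximation of a urandom-backed secure random source: every nextInt64()
// is a kernel-state read() on /dev/urandom, which is what serializes
// concurrent callers on the entropy-pool spinlock.
class UrandomSecureRandom {
public:
    UrandomSecureRandom() : _fd(open("/dev/urandom", O_RDONLY)) {
        if (_fd < 0) throw std::runtime_error("cannot open /dev/urandom");
    }
    ~UrandomSecureRandom() { close(_fd); }

    int64_t nextInt64() {
        int64_t v;
        if (read(_fd, &v, sizeof(v)) != sizeof(v))
            throw std::runtime_error("short read from /dev/urandom");
        return v;
    }

private:
    int _fd;
};
```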

Continuing through the code, the main callers are the following:

```cpp
// Server-side handling of the first SASL authentication message from the
// client; a random nonce is generated here. On mongos this is the flow that
// processes the first SASL message of a newly connected client.
Sasl_scramsha1_server_conversation::_firstStep(...) {
    ... ...
    unique_ptr<SecureRandom> sr(SecureRandom::create());
    binaryNonce[0] = sr->nextInt64();
    binaryNonce[1] = sr->nextInt64();
    binaryNonce[2] = sr->nextInt64();
    ... ...
}

// Relative to the mongod storage nodes, mongos acts as the client, and as a
// SASL client it also needs to generate a random nonce.
SaslSCRAMSHA1ClientConversation::_firstStep(...) {
    ... ...
    unique_ptr<SecureRandom> sr(SecureRandom::create());
    binaryNonce[0] = sr->nextInt64();
    binaryNonce[1] = sr->nextInt64();
    binaryNonce[2] = sr->nextInt64();
    ... ...
}
```

2.5.2 random number optimization of mongodb kernel source code

From the analysis in 2.5.1 we can see that during the SASL authentication of a new client connection, mongos generates random numbers from /dev/urandom, which drives the system sy% CPU too high. Optimizing the random number generation is therefore the key to solving this problem.

Continuing through the mongodb kernel source, random numbers are used in many places, and some of them are generated by a user-state algorithm. The same approach can be used for the SASL nonce: generate it with the user-state generator. The state of the user-state generator is as follows:

```cpp
class PseudoRandom {
    ... ...
    uint32_t _x;
    uint32_t _y;
    uint32_t _z;
    uint32_t _w;
};
```

[Figure: xorshift-based nextInt32() implementation of PseudoRandom]

The algorithm guarantees that the generated values are well distributed; its principle is described here:

en.wikipedia.org/wiki/Xorshi…

The implementation can also be viewed at the following git address:

Comments on mongodb random number generation algorithm
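For readers who do not want to dig through the source, the sketch below is a hedged approximation of a Marsaglia xorshift128 generator consistent with the _x/_y/_z/_w state shown above (the authoritative implementation is the mongodb source linked above). No system call is involved, so concurrent connections no longer contend on the kernel entropy pool.

```cpp
#include <cstdint>

// xorshift128 pseudo random generator (Marsaglia), purely user-state.
class XorShift128 {
public:
    explicit XorShift128(uint32_t seed)
        : _x(seed ? seed : 0x9e3779b9u), _y(362436069u), _z(521288629u), _w(88675123u) {}

    uint32_t nextInt32() {
        uint32_t t = _x ^ (_x << 11);
        _x = _y; _y = _z; _z = _w;
        return _w = _w ^ (_w >> 19) ^ (t ^ (t >> 8));
    }

    int64_t nextInt64() {
        // Combine two 32-bit outputs into one 64-bit value.
        uint64_t hi = nextInt32();
        uint64_t lo = nextInt32();
        return static_cast<int64_t>((hi << 32) | lo);
    }

private:
    uint32_t _x, _y, _z, _w;
};
```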

**Summary:** after switching the random number generation used by SASL authentication to the user-state algorithm, the CPU sy% 100% problem is solved. **At the same time, proxy performance in the short-connection scenario improves severalfold to dozens of times.**

3. Question summary and answers

From the analysis above, the fault was caused by a combination of factors: improper client configuration and usage, a defect in the mongodb server kernel under extreme conditions, incomplete monitoring, and so on. In summary:

  1. The client configuration was not unified: multiple business interfaces on the same cluster used a variety of timeout and connection configurations, which made packet capture and troubleshooting harder, and a timeout that is too small easily causes repeated reconnection.
  2. The client must be configured with all mongos proxies, so that when a proxy fails the client SDK excludes the failed node by default, the business impact is minimized, and there is no single point of failure.
  3. Multiple business interfaces on the same cluster should share a unified configuration in the same configuration center to avoid inconsistent configurations.
  4. The random number generation used for new connections in the mongodb kernel has a serious defect that causes severe performance jitter and, in extreme cases, a service “avalanche”.

With this analysis, the six questions from Chapter 1 can be answered as follows:

**Why does burst traffic cause jitter?**

A: the business is a Java service that uses a connection pool to talk to the mongos proxies. During burst traffic the connection pool grows to improve mongodb access performance, so the clients open new connections; since there are many client machines, a large number of new connections to mongos may be created at the same instant. Each connection, once established, starts SASL authentication, and the first step of authentication generates a random number by reading the operating system’s /dev/urandom file. Because the mongos proxy uses the one-connection-one-thread model by default, many threads end up reading that file at the same moment, which drives the kernel sy% load too high.

**Why did the mongos proxies “avalanche”, with traffic dropping to zero and the service becoming unavailable?**

A: the client traffic surged at some moment, the connection pool did not have enough connections, and the clients opened additional connections to the mongos proxies. Because this is an old cluster, the proxies still default to the one-connection-one-thread model, so a burst of connections is created in an instant. Each connection, once established, starts SASL authentication, whose first step requires the server to generate a random number, which mongos obtains by reading /dev/urandom; with many threads reading the file at the same time, the kernel spinlock contention pushes CPU sy% to 100%. The excessive sy% load combined with the clients’ overly small timeouts then causes client accesses to time out; after the timeout the clients reconnect, enter SASL authentication again after reconnection, and read /dev/urandom even more heavily, and the cycle repeats.

In addition, after the first business jitter the server side had expanded to 8 mongos proxies, but the clients had not been reconfigured. As a result, the two proxies configured for the business in machine room B were on the same server, the Mongo Java SDK’s strategy of automatically excluding high-load nodes could not help, and the result was an “avalanche”.

**Why do the data nodes have no slow logs at all, while the proxy CPU sy% load is 100%?**

A: the client Java programs access the mongos proxies directly, so the flood of connections occurs only between the clients and mongos. In addition, because the client timeouts are set too short (some interfaces are set to tens of milliseconds, some to a bit over 100ms, and some to 500ms), a chain reaction occurs at the traffic peak: the high system load caused by the burst makes the clients time out quickly, and the fast reconnection after the timeout causes further timeouts, in an endless loop. mongos also uses a connection pool towards mongod, but as the client of the mongod storage nodes its timeout is very long (seconds by default), so it does not fall into the same repeated timeout and disconnection.

**Why does the proxy in machine room B also jitter when the business in machine room A is switched to machine room B?**

A: when the business in machine room A jitters and is switched to machine room B, the clients must re-establish connections and re-authenticate with the machine room B proxies, which again triggers massive connection setup/teardown and reads of the random number file /dev/urandom, so the machine-room multi-active switchover ultimately does not help.

**Why does packet capture show the client frequently setting up and tearing down connections, with a very short interval between setup and teardown on the same connection?**

A: the root cause of the frequent setup and teardown is the high system sy% load; the reason the client closes a connection so soon after establishing it is that the client’s configured timeout is too short.

**In theory a proxy only does layer-7 forwarding and consumes few resources, so it should be faster than mongod storage. Why do the mongod storage nodes show no jitter while the mongos proxies jitter badly?**

A: in the sharded architecture, all mongod storage nodes sit behind a layer of mongos proxies. As the client of the mongod storage nodes, the mongos proxy uses a default timeout of seconds, so it does not time out and does not go through frequent connection setup and teardown.

**If the cluster were an ordinary replica set, could frequent connection setup and teardown by the clients cause the same “avalanche” on the mongod storage nodes?**

A: yes. If there are many clients, the operating system kernel version is too old, and the timeouts are configured too small, clients accessing the replica set’s mongod storage nodes directly go through the same authentication process as with the mongos proxy, so frequent reads of /dev/urandom would still be triggered, the CPU sy% load would become excessive, and in extreme cases an avalanche could occur.

4. Avalanche solutions

From the analysis above, the problem stems from unreasonable client configuration combined with a defect in the mongodb kernel authentication path, which reads random numbers from kernel space under extreme conditions; together these cause the avalanche. If you are not able to modify the mongodb kernel, standardizing the client configuration is enough to avoid the problem. If the client configuration is standardized and the extreme-case random number reading is also fixed at the mongodb kernel level, the problem is solved completely.

4.1 standardization of Java SDK client configuration

In a business scenario with many interfaces and many client machines, the client must be configured as follows (a hedged connection-string sketch follows this list):

  1. Set the timeout to the second level to avoid the repeated connection setup and teardown caused by overly small timeouts.
  2. The client must be configured with all mongos proxy addresses, never a single proxy; otherwise directing all traffic to one mongos easily creates an instantaneous peak of connection and authentication requests.
  3. Increase the number of mongos proxies to spread the load, so that each proxy sees as few new connections at the same moment as possible. When the client is configured with multiple proxies, traffic is balanced across them by default, and if one proxy has a high load the client excludes it automatically.
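The sketch below expresses these rules as a MongoDB connection string, shown with the C++ driver (mongocxx) for consistency with the other sketches in this article; the addresses are the example proxy addresses from Chapter 1 and the option values are assumptions to be adapted to your latency requirements. The same URI options (connectTimeoutMS, socketTimeoutMS, maxPoolSize) work the same way in the Java driver.

```cpp
#include <mongocxx/client.hpp>
#include <mongocxx/instance.hpp>
#include <mongocxx/uri.hpp>

int main() {
    mongocxx::instance inst{};  // one per process

    // All mongos proxies of both machine rooms are listed, and the timeouts
    // are at the second level instead of tens of milliseconds.
    mongocxx::uri uri{
        "mongodb://1.1.1.1:111,2.2.2.2:1111,3.3.3.3:1111,"
        "4.4.4.4:1111,4.4.4.4:2222/"
        "?connectTimeoutMS=3000&socketTimeoutMS=3000&maxPoolSize=100"};

    mongocxx::client conn{uri};
    // ... use conn["db"]["collection"] as usual ...
}
```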

**If you are not able to modify the mongodb kernel source, you can avoid stepping into the same kind of pit by following the client configuration above, retiring the linux-2.6 kernel and moving to linux-3.10 or a later kernel.**

4.2 mongodb kernel source code optimization (discarding kernel state random number acquisition and selecting user state random number algorithm)

See section 2.5.2 for details.

4.3 PHP short-connection services: how to avoid the same pitfall

PHP businesses use short connections, so if the traffic is high, frequent connection setup and teardown is unavoidable and every new connection goes through the SASL authentication flow; many threads then read the /dev/urandom file frequently, which easily reproduces the problem described above. In this case, adopt the same guidelines as the Java client in 4.1 and avoid old Linux kernel versions; using a 3.x or later kernel avoids the problem.

5. Mongodb kernel source code design and implementation analysis

The mongodb thread model and random number algorithm implementations involved in this article are analyzed in the following companion articles:

Source code design and implementation analysis of mongodb dynamic thread model

Mongodb one link one thread model source code design and implementation analysis

Implementation analysis of mongodb kernel state and user state random number algorithm


 

Source: my.oschina.net/u/4087916/b…

 
