This is the last article in the “I want to join a big tech company” series. The title may look like clickbait, but the intent behind it is sincere.
This is a very common interview question, yet most candidates don’t know how to answer it. It comes in many forms; you have probably seen one of them and felt you couldn’t handle it:
- How do you deal with rapid business growth?
- How do you handle a 10x or 100x increase in business volume?
- How does your system support high concurrency?
- How do you design a high-concurrency system?
- What are the characteristics of a high-concurrency system?
However it is phrased, this type of question is hard to approach head-on. Still, there is a conventional way to think about it: how do you design a system, around a concrete business scenario, that supports high concurrency? With that framing, we can focus on what supports high concurrency at the hardware and software levels. In essence, the question is a comprehensive test of whether you know how to handle the details involved and whether you have hands-on experience with them.
In the face of very high concurrency: first, at the hardware level, the machines must be able to bear the load; second, the architecture must be properly split into microservices; at the code level, caching, peak shaving, and decoupling must be handled; at the database level, read-write separation and database/table sharding are needed; and for stability, monitoring must be in place, with circuit breaking, rate limiting, and degradation available, so that problems can be found and handled in time. With that, you have a preliminary outline of the whole system design.
Microservice architecture evolution
In the early days of the Internet, a monolithic architecture was enough to support day-to-day business. All services were deployed as one project on a single physical machine, with everything mixed together: your trading system, member information, inventory, products, and so on. Once traffic rises, the problems of the monolith are exposed: when that one machine goes down, every business function becomes unavailable.
So cluster architectures appeared. When a single machine can no longer take the pressure, the simplest remedy is horizontal scaling: traffic is distributed across machines by a load balancer, which temporarily solves the unavailability caused by a single point of failure.
However, as the business grows, keeping every business scenario in one project makes development and code maintenance harder and harder. A simple requirement change forces a release of the entire service, merge conflicts become more and more frequent, and the chance of a production failure grows. Thus the microservice architecture was born.
By splitting each independent business domain into its own independently deployed service, development and maintenance costs drop and the load the cluster can bear rises. A small change no longer ripples through the entire system.
From the perspective of high concurrency, the points above can be summarized as improving the system’s overall capacity through service splitting and cluster expansion. The problems introduced by that splitting are, in turn, exactly the problems a high-concurrency system must solve.
The benefits and convenience of splitting into microservices are obvious, but communication between the services now has to be considered. Plain HTTP wastes a lot of performance, so an RPC framework such as Dubbo, built on TCP long connections, is introduced to improve communication efficiency across the cluster.
Suppose the client generates 9,000 QPS; the load-balancing strategy distributes 3,000 to each machine. After switching from HTTP to RPC, interface latency drops and both single-machine and overall QPS improve. RPC frameworks generally ship with their own load balancing, circuit breaking, and degradation mechanisms, which help keep the whole system highly available.
Since Dubbo is the usual RPC choice in China, its basic principles are worth going through.
How Dubbo works
- When a service starts, the provider and consumer connect to the registry according to their configuration, and register and subscribe respectively
- The registry returns provider information to the consumer according to the subscription relationship, and the consumer caches it locally; if the information changes, the registry pushes an update to the consumer
- The consumer generates a proxy object and selects a provider according to the load-balancing strategy, periodically reporting call counts and latency for monitoring
- Holding the proxy object, the consumer initiates the interface call through it
- After receiving the request, the provider deserializes the data and invokes the concrete implementation through the proxy
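The flow above can be sketched in a few lines of Python. This is a toy in-memory registry, purely illustrative — real Dubbo does this over TCP against a registry such as ZooKeeper or Nacos, and all names here are made up:

```python
import random

class Registry:
    """Toy stand-in for a Dubbo-style registry (illustrative only)."""
    def __init__(self):
        self.providers = {}      # service name -> list of provider addresses
        self.subscribers = {}    # service name -> list of consumer callbacks

    def register(self, service, address):
        self.providers.setdefault(service, []).append(address)
        # Push the updated provider list to every subscribed consumer.
        for notify in self.subscribers.get(service, []):
            notify(list(self.providers[service]))

    def subscribe(self, service, notify):
        self.subscribers.setdefault(service, []).append(notify)
        notify(list(self.providers.get(service, [])))  # initial snapshot

class Consumer:
    def __init__(self, registry, service):
        self.cached = []         # local cache of provider addresses
        registry.subscribe(service, self._on_change)

    def _on_change(self, providers):
        self.cached = providers  # registry pushes changes; consumer re-caches

    def invoke(self):
        # Pick a provider (random load balancing here) and "call" it.
        provider = random.choice(self.cached)
        return f"response from {provider}"

registry = Registry()
registry.register("OrderService", "10.0.0.1:20880")
consumer = Consumer(registry, "OrderService")
registry.register("OrderService", "10.0.0.2:20880")  # consumer receives a push
```

After the second `register`, the consumer’s local cache holds both providers without being restarted — which is exactly the push-on-change behavior described above.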
Dubbo load balancing strategy
- Weighted random: suppose we have servers = [A, B, C] with weights = [5, 3, 2], totaling 10. Lay the weights out on a one-dimensional axis: [0, 5) belongs to server A, [5, 8) to server B, and [8, 10) to server C. Next, generate a random number in [0, 10) and see which interval it falls into.
- Least active: each provider has an active count, initially 0. The count is incremented when a request arrives and decremented when it completes. After the service has been running for a while, well-performing providers process requests faster, so their active counts drop faster, and they receive new requests first.
- Consistent hash: each provider’s invokers, plus virtual nodes, are hashed onto a ring [0, 2^32 - 1]. On a query, the key is hashed with MD5, and the request goes to the first invoker whose position on the ring is greater than or equal to that hash.
- Weighted round robin: if servers A, B, and C have a weight ratio of 5:2:1, then out of 8 requests, A receives five, B receives two, and C receives one.
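As a concrete illustration, here is a minimal Python sketch of the weighted random strategy from the first bullet (server names and weights are the example’s own, not Dubbo’s API):

```python
import random

def weighted_random(servers, weights):
    """Lay the weights out on a line ([0,5) -> A, [5,8) -> B, [8,10) -> C)
    and see which interval a random point falls into."""
    total = sum(weights)
    point = random.uniform(0, total)      # random point in [0, total)
    for server, weight in zip(servers, weights):
        if point < weight:
            return server
        point -= weight                   # shift into the next interval
    return servers[-1]                    # guard against float edge cases

# Over many draws the distribution approaches the 5:3:2 weight ratio.
counts = {"A": 0, "B": 0, "C": 0}
for _ in range(10000):
    counts[weighted_random(["A", "B", "C"], [5, 3, 2])] += 1
```

Running this, `counts` ends up roughly 50% / 30% / 20%, matching the weights.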
Cluster fault tolerance
- Failover (automatic switch on failure): Dubbo’s default fault-tolerance scheme. When a call fails, it automatically switches to another available node. The retry count and interval are configurable on the service reference; the default is 2 retries, i.e. up to 3 calls in total.
- Failback (failure recovery): when a call fails, the failure and call information are logged, an empty result is returned to the consumer, and a timed task retries the failed call every 5 seconds
- Failfast (fast failure): the call is made only once, and an exception is thrown immediately on failure
- Failsafe (failure safety): if the call throws, the exception is logged but not propagated, and an empty result is returned
- Forking (parallel calls): multiple providers are called concurrently via a thread pool, with results placed in a blocking queue; as soon as one provider returns successfully, that result is returned
- Broadcast: every provider is called one by one; if any of them fails, an exception is thrown after the loop completes.
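A minimal sketch of the default Failover behavior — try a provider, and on failure switch to the next. The `flaky_call` function is a made-up stand-in for a real remote invocation:

```python
def failover_invoke(providers, call, retries=2):
    """Failover sketch: on failure, switch to the next available provider,
    up to `retries` retries (so retries + 1 calls in total)."""
    last_error = None
    for provider in providers[:retries + 1]:
        try:
            return call(provider)
        except Exception as e:    # in real RPC: timeouts, connection errors
            last_error = e        # remember the failure and switch nodes
    raise last_error              # every node failed

def flaky_call(provider):
    if provider == "bad-node":
        raise RuntimeError("connection refused")
    return f"ok from {provider}"

result = failover_invoke(["bad-node", "good-node"], flaky_call)
```

The first provider fails, the call transparently switches to the second, and the consumer never sees the error — the essence of Failover.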
Everyone should be familiar with the role of MQ: peak shaving (smoothing traffic spikes) and decoupling. By relying on a message queue to turn synchronous calls into asynchronous ones, coupling between microservices is reduced.
For interfaces that don’t need to execute synchronously, a message queue can make them asynchronous and improve response time. For example, after a trade completes, inventory must be deducted, and then member points may need to be issued. Issuing points is not latency-critical; we only need to guarantee eventual consistency, i.e., that it eventually succeeds. Requests of this nature can go through MQ asynchronously, improving the system’s capacity under load.
For a message queue, how do we ensure messages are reliable and never lost?
Messages can be lost at three points: when the producer sends, inside the MQ itself, and when the consumer consumes.
The producer can lose a message when the send fails with an exception and is not retried, or when the send appears to succeed but a network blip means MQ never actually receives it.
Since synchronous sending is generally not used in these scenarios, we won’t consider it; the discussion below is based on asynchronous sending.
There are two ways to send asynchronously: with a callback and without. Without a callback, the producer doesn’t care about the result after sending, so messages can be lost silently. A workable scheme is asynchronous send + callback notification + a local message table. Take placing an order as an example:
- After the order is placed, first save the order data and the MQ message row in the local database within one transaction, with the message status set to “sending”. If the local transaction fails, the order fails and everything is rolled back.
- If the order succeeds, return to the client immediately and send the MQ message asynchronously
- The MQ callback reports the send result, and the message’s send status in the database is updated accordingly
- A job polls for messages that still haven’t been sent successfully after a certain time (configured per business) and retries them
- Messages that remain unsent after a certain number of attempts trigger an alarm, via the monitoring platform or the job itself, for manual intervention.
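The five steps above can be sketched roughly as follows. Dicts stand in for the two database tables, and `fake_mq_send` simulates a broker that acks successfully; all names are illustrative:

```python
import uuid

SENDING, SENT = "sending", "sent"   # statuses in the local message table

orders, message_table = {}, {}      # stand-ins for two tables in one local DB

def place_order(order_id, payload, mq_send):
    # Step 1: in ONE local transaction, save the order and a message row
    # in state "sending". (The two dict writes simulate that transaction.)
    orders[order_id] = payload
    msg_id = str(uuid.uuid4())
    message_table[msg_id] = {"order_id": order_id, "status": SENDING}
    # Step 2: return to the client, then send to MQ asynchronously.
    mq_send(msg_id, payload, on_success=lambda: mark_sent(msg_id))
    return msg_id

def mark_sent(msg_id):
    # Step 3: the MQ callback updates the send status in the database.
    message_table[msg_id]["status"] = SENT

def resend_stuck_messages(mq_send):
    # Step 4: a periodic job retries rows still stuck in "sending".
    for msg_id, row in message_table.items():
        if row["status"] == SENDING:
            mq_send(msg_id, orders[row["order_id"]],
                    on_success=lambda m=msg_id: mark_sent(m))

def fake_mq_send(msg_id, payload, on_success):
    on_success()   # simulate a successful broker ack

msg_id = place_order("order-1", {"amount": 100}, fake_mq_send)
```

Step 5, alarming after repeated failures, would hang off the same polling job once a retry counter passes a threshold.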
Generally speaking, plain asynchronous send with a callback is enough for most scenarios; the complete scheme is only needed when messages absolutely must not be lost.
Even if the producer guarantees the message reaches MQ, the message may still sit only in memory after MQ receives it; if the broker crashes before flushing to disk or syncing to a slave node, the message is lost.
Take RocketMQ as an example:
RocketMQ supports synchronous and asynchronous disk flushing. The default, asynchronous flushing, can lose a message that hasn’t yet been written to disk. Setting synchronous flushing ensures reliability: even if MQ crashes, messages can be recovered from disk on restart.
Kafka, likewise, can be configured for reliability:
- acks=all: success is returned to the producer only after all in-sync replicas have received the message; messages are lost only if every node goes down
- replication.factor=N, with N greater than 1: each partition keeps at least 2 copies
- min.insync.replicas=N, with N greater than 1: the leader must perceive at least one follower still connected
- retries=N, set to a very large value: the producer retries on send failure
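As a hedged illustration, these settings could be collected like this. The producer keys follow kafka-python’s `KafkaProducer` naming; `replication.factor` and `min.insync.replicas` are broker/topic-level settings and are shown here only for reference, not as producer arguments:

```python
# Producer-side settings (kafka-python KafkaProducer keyword style).
producer_config = {
    "acks": "all",          # wait for all in-sync replicas before success
    "retries": 2147483647,  # retry (practically) forever on transient errors
}

# Broker/topic-level settings, configured on the cluster, not the producer.
topic_config = {
    "replication.factor": 3,    # at least 2 copies of every partition
    "min.insync.replicas": 2,   # leader plus at least one follower must ack
}
```

With `acks=all` and `min.insync.replicas=2` together, a write succeeds only once it exists on at least two brokers — which is exactly the “lost only if all nodes die” guarantee described above.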
Although configuration can make MQ highly available, it costs performance; the right settings are a trade-off that depends on the business.
Consumers lose messages in this scenario: the consumer has just received a message when its server goes down. MQ believes the consumer has consumed it and will not redeliver, so the message is lost.
RocketMQ requires the consumer to reply with an ack by default; with Kafka, you must turn off automatic offset commits and commit manually.
If the consumer does not return an ack, the redelivery mechanism varies by MQ implementation; once the retry limit is exceeded, the message enters a dead-letter queue for manual handling. (Kafka has no built-in retry or dead-letter mechanism.)
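A toy simulation of the ack/redelivery/dead-letter behavior described here — this is not any real MQ’s API, just the mechanism in miniature:

```python
import collections

class AckQueue:
    """Toy at-least-once queue: a message is redelivered until the consumer
    acks it, and moves to a dead-letter list after too many attempts."""
    def __init__(self, max_retries=3):
        self.pending = collections.deque()   # (message, attempts)
        self.dead_letter = []
        self.max_retries = max_retries

    def publish(self, message):
        self.pending.append((message, 0))

    def deliver(self, consume):
        """Deliver every currently pending message once; re-queue on failure."""
        for _ in range(len(self.pending)):
            message, attempts = self.pending.popleft()
            try:
                consume(message)             # consumer acks by not raising
            except Exception:
                if attempts + 1 >= self.max_retries:
                    self.dead_letter.append(message)  # manual handling needed
                else:
                    self.pending.append((message, attempts + 1))

q = AckQueue(max_retries=2)
q.publish("deduct-stock")
crashes = [True, True]   # consumer "crashes" on the first two deliveries

def consumer(message):
    if crashes:
        crashes.pop()
        raise RuntimeError("consumer crashed before ack")

q.deliver(consumer)   # first attempt fails, message is re-queued
q.deliver(consumer)   # second attempt fails -> dead letter
```

Because delivery only stops on an explicit ack, a crash before the ack means redelivery, not loss — the price is that the consumer must be idempotent.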
Final consistency of messages
Transactional messages can achieve eventual consistency for distributed transactions. A transactional message is the XA-like distributed transaction capability provided by the MQ.
A half message is one that MQ has received from the producer but cannot yet deliver, because the second confirmation has not arrived.
The implementation principle is as follows:
- The producer first sends a half message to MQ
- After receiving it, MQ returns an ack to the producer
- The producer begins executing its local transaction
- If the transaction succeeds, the producer sends a commit to MQ; if it fails, a rollback
- If MQ receives neither commit nor rollback for a long time, it initiates a check-back to the producer
- The producer queries the final state of the local transaction
- Based on that state, the second confirmation is submitted again
Finally, if MQ receives a commit as the second confirmation, the message can be delivered to the consumer; on a rollback, the message is kept and deleted after three days.
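The half-message state machine can be sketched as follows. This is a toy model, not RocketMQ’s API; the check-back callback stands in for querying the producer’s local transaction state:

```python
COMMIT, ROLLBACK = "commit", "rollback"

class TransactionalMQ:
    """Toy half-message flow: a half message stays invisible to consumers
    until the producer's second confirmation (or a check-back) commits it."""
    def __init__(self):
        self.half_messages = {}   # msg_id -> body, not yet deliverable
        self.deliverable = []     # messages visible to consumers

    def send_half(self, msg_id, body):
        self.half_messages[msg_id] = body     # stored, but not delivered

    def confirm(self, msg_id, verdict):
        body = self.half_messages.pop(msg_id, None)
        if body is not None and verdict == COMMIT:
            self.deliverable.append(body)     # rollback: simply dropped here

    def check_back(self, msg_id, query_local_tx):
        # MQ heard nothing for too long: ask the producer for the tx state.
        self.confirm(msg_id, query_local_tx(msg_id))

mq = TransactionalMQ()
mq.send_half("m1", {"order": 1})
mq.confirm("m1", COMMIT)                   # local transaction succeeded
mq.send_half("m2", {"order": 2})
mq.check_back("m2", lambda _: ROLLBACK)    # producer reports the tx failed
```

Only the committed message ever becomes deliverable; the rolled-back one never reaches a consumer, which is the whole point of the two-phase confirmation.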
For the system as a whole, all read and write traffic ultimately lands on the database, which makes it the core of the system’s high-concurrency capability. Reducing database pressure and improving database performance is the cornerstone of supporting high concurrency, mainly through read-write separation and database/table sharding.
For the whole system, traffic should form a funnel. Say we have 200,000 daily active users; perhaps only 30,000 of them reach the order page each day, and only 10,000 convert into successfully paid orders. So reads far outnumber writes, and read-write separation can be used to reduce database pressure.
Read-write separation is essentially a database cluster that relieves the pressure on a single node. Facing rapid data growth, the original single-database, single-table storage can no longer support the business, and sharding becomes necessary. With microservices, vertical database splitting has already been done per service; what remains is mostly table-sharding schemes.
Horizontal table sharding
First, use the business scenario to decide which field to use as the sharding key. For example, suppose we have 10 million orders per day, and most traffic comes from the C side; we can use user_id as the sharding key. Queries only need to cover orders from the last three months, with older data archived. Three months of data is about 900 million rows, which can be split into 1,024 tables at roughly 1 million rows each.
For example, if the user ID is 100, we compute hash(100) and take it modulo 1,024, which lands us on the corresponding table.
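A minimal sketch of that routing rule; `crc32` stands in for whatever stable hash you choose, and the table-name format is illustrative:

```python
import zlib

def route_table(user_id, table_count=1024):
    """Hash the sharding key (user_id) and take it modulo the table count.
    crc32 is just a stable, cheap hash for the sketch."""
    bucket = zlib.crc32(str(user_id).encode()) % table_count
    return f"order_{bucket:04d}"

# The same user always lands in the same table, so "orders of user X in the
# last three months" stays a single-table query.
table_for_user_100 = route_table(100)
```

The key property is determinism: every read and write for one user routes to one table, which is why user_id works as the sharding key for C-side traffic.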
ID uniqueness after sharding
Because primary keys auto-increment by default, keys in different tables will collide after sharding. There are several options:
- Set the auto-increment step. For example, with 1,024 tables, give table i a starting offset of i and a step of 1,024, so primary keys in different tables never collide.
- Distributed IDs: implement your own distributed ID generation algorithm, or use an open-source one such as the Snowflake algorithm
- Stop using the primary key as the query basis after sharding, and add a separate unique business field to each table. For example, the order table’s order number is unique: whichever table the row lands in, queries and updates use the order number.
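For the second option, a minimal Snowflake-style generator might look like this. The field widths follow the common 41-bit timestamp / 10-bit worker / 12-bit sequence layout; treat it as a sketch, not production code:

```python
import time
import threading

class Snowflake:
    """Minimal Snowflake-style ID generator: 41-bit timestamp, 10-bit
    worker id, 12-bit per-millisecond sequence."""
    EPOCH = 1577836800000   # 2020-01-01 in ms, an arbitrary custom epoch

    def __init__(self, worker_id):
        assert 0 <= worker_id < 1024          # fits in 10 bits
        self.worker_id = worker_id
        self.sequence = 0
        self.last_ms = -1
        self.lock = threading.Lock()

    def next_id(self):
        with self.lock:
            now = int(time.time() * 1000)
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & 0xFFF   # 12-bit wrap
                if self.sequence == 0:        # 4096 ids used this millisecond
                    while now <= self.last_ms:                # wait for next ms
                        now = int(time.time() * 1000)
            else:
                self.sequence = 0
            self.last_ms = now
            return ((now - self.EPOCH) << 22) | (self.worker_id << 12) | self.sequence

gen = Snowflake(worker_id=1)
ids = [gen.next_id() for _ in range(1000)]
```

IDs are unique per worker and roughly time-ordered, which also makes them reasonable clustered-index keys after sharding.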
Principle of master-slave synchronization
- After the master commits a transaction, it writes to its binlog
- The slave connects to the master and fetches the binlog
- The master creates a dump thread and pushes the binlog to the slave
- The slave starts an I/O thread that reads the master’s binlog and records it to the relay log
- The slave starts a SQL thread that reads relay-log events and replays them on the slave, completing synchronization
- The slave records its own binlog
Because MySQL replication is asynchronous by default, the master does not care whether the slave has processed the log after sending it. This causes a problem: suppose the master goes down while the slave has not finished applying the log; after the slave is promoted to master, that log is lost. Two concepts address this.
Full synchronous replication
After the master writes the binlog, it forcibly synchronizes the log to the slaves, and only returns to the client after all slaves have finished executing it. Obviously, this seriously hurts performance.
Semi synchronous replication
Different from full synchronization, the logic of semi synchronous replication is as follows: after the slave database writes the log successfully, it returns ack confirmation to the master database, and the master database will consider the write operation completed when it receives at least one confirmation from the slave database.
As the representative of high performance, a cache may absorb more than 90% of hot traffic in some services. For activities such as flash sales (seckill), which may see hundreds of thousands of concurrent QPS, cache pre-warming can greatly reduce database pressure. 100,000 QPS might bring down a standalone database, but it is no problem for a cache like Redis.
Taking a seckill system as an example: product information can be cached before the event to serve queries, the event’s inventory can be pre-loaded into the cache so that stock deduction during ordering happens entirely in cache, and the database bears far less pressure once the sale ends. Of course, introducing a cache means we must also consider issues such as cache breakdown, avalanche, and hot keys.
Hot key problem
The so-called hot key problem: suddenly hundreds of thousands of requests access one specific key on Redis. The traffic is so concentrated that it hits the ceiling of the machine’s network card, bringing down that Redis server and causing an avalanche.
Solutions for hot key:
- Spread the hot key across different servers in advance to reduce the pressure
- Add a second-level cache: load hot-key data into local memory in advance, and fall back to the in-memory copy if Redis goes down
Cache breakdown means concurrent access to a single key is extremely high, and when that key expires, all requests fall straight through to the DB. This resembles the hot-key problem, except that the trigger is expiry. Solutions:
- Lock the update. For example, when a request for key A misses the cache, acquire a lock on A, query the database, write the result to the cache, and then return it to the user. Subsequent requests can then get the data from the cache.
- Store the expiration time inside the value and refresh it asynchronously before it actually expires, so the key never goes missing.
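The lock-update idea from the first bullet, as a single-process Python sketch. Across processes you would use a distributed lock (for example Redis SETNX); here `db_hits` exists only to demonstrate that exactly one thread reaches the DB:

```python
import threading

cache = {}
db = {"product:1": "iPhone"}
db_hits = 0                 # counts how often the DB is actually queried
key_locks = {}
locks_guard = threading.Lock()

def get_with_lock(key):
    """On a cache miss, only one thread rebuilds the value from the DB;
    the rest wait on the per-key lock and then read the fresh cache entry."""
    global db_hits
    value = cache.get(key)
    if value is not None:
        return value
    with locks_guard:                        # get/create the per-key lock
        lock = key_locks.setdefault(key, threading.Lock())
    with lock:
        value = cache.get(key)               # double-check after acquiring:
        if value is not None:                # another thread may have filled it
            return value
        db_hits += 1
        value = db[key]                      # exactly one thread hits the DB
        cache[key] = value
        return value

threads = [threading.Thread(target=get_with_lock, args=("product:1",))
           for _ in range(10)]
for t in threads: t.start()
for t in threads: t.join()
```

Ten concurrent requests, one DB query: the double-check inside the lock is what turns a stampede into a single rebuild.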
Cache penetration means querying data that does not exist at all: every request hits the DB as if the cache weren’t there.
To solve this, add a layer of Bloom filter. A Bloom filter works by mapping each stored key, via k hash functions, to k positions in a bit array, setting each to 1.
When a user later queries key A, and the Bloom filter shows any of A’s bits are 0, the request returns immediately and never reaches the DB.
The Bloom filter obviously has a caveat: since it is just an array, multiple keys can map to the same positions, producing false positives. In theory, the longer the array, the lower the false-positive rate; sizing it depends on the actual situation.
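A minimal Bloom filter along these lines; MD5 with a per-function salt stands in for the k hash functions, and the sizes are illustrative:

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: k hash functions map each key to k bit
    positions. A 'no' answer is definitive; a 'yes' may be a false
    positive, as noted in the text."""
    def __init__(self, size=1 << 20, k=3):
        self.size = size                  # number of bits
        self.k = k                        # number of hash functions
        self.bits = bytearray(size // 8)

    def _positions(self, key):
        # Derive k independent positions by salting one hash k ways.
        for i in range(self.k):
            digest = hashlib.md5(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))

existing_ids = BloomFilter()
existing_ids.add("user:42")
```

In front of the cache, a lookup first asks `might_contain`; a `False` answer short-circuits the request, so queries for nonexistent keys never penetrate to the DB.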
When a large-scale cache failure happens at some moment, for example your cache service goes down, a flood of requests hits the DB directly and may collapse the whole system. This is known as an avalanche. It differs from breakdown and hot keys in that cache entries expire or disappear at large scale.
Several solutions for avalanche:
- Set different expiration time for different keys to avoid expiration at the same time
- Rate limit: if Redis goes down, limiting traffic avoids a flood of requests crushing the DB all at once
- Use a second-level cache, the same as in the hot-key scheme.
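The first bullet, spreading out expirations, can be as simple as adding random jitter to each key’s TTL (the numbers here are illustrative):

```python
import random

def jittered_ttl(base_seconds, jitter_seconds=300):
    """Add random jitter to a TTL so keys written together don't all
    expire together, which is what triggers an avalanche."""
    return base_seconds + random.randint(0, jitter_seconds)

# 1,000 product keys warmed at the same moment no longer share one expiry.
ttls = {f"product:{i}": jittered_ttl(3600) for i in range(1000)}
```

Instead of 1,000 keys expiring in the same second, expirations are smeared over a five-minute window, so the rebuild load on the DB arrives gradually.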
Degradation also matters: for example, if the marketing service hangs or a large number of its calls time out, that must not affect the main order chain, and operations such as points deduction can be remedied afterwards.
For sudden bursts of high concurrency, such as flash sales during a big promotion, a service may be brought down outright if some interfaces are not rate limited. It is especially important to set appropriate limits based on the load-test results for each interface.
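One common way to implement such a limit is a token bucket; here is a minimal sketch (the rate and capacity would come from your load tests, not from these illustrative numbers):

```python
import time

class TokenBucket:
    """Token-bucket rate limiter: tokens refill at `rate` per second up to
    `capacity`; a request passes only if it can take a whole token."""
    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill proportionally to the time elapsed since the last call.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False        # over the limit: reject (or queue/degrade)

bucket = TokenBucket(rate=1, capacity=5)   # burst of 5, then 1 request/s
results = [bucket.allow() for _ in range(10)]
```

In a burst of 10 back-to-back requests, the first 5 pass on the stored burst capacity and the rest are rejected until tokens refill; the capacity absorbs short spikes while the rate caps sustained load.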
Circuit breaking can itself be seen as a form of degradation: for example, after the marketing interface trips the breaker, the degradation plan is to stop calling the marketing service for a short period, resuming once it recovers.
Generally, even with a unified configuration center, changes are not allowed during business peak periods. However, with reasonable contingency plans configured in advance, some modifications can still be made in an emergency.
For data inconsistencies produced by distributed transactions across systems, or data anomalies caused by attacks, a reconciliation platform is needed for final data verification: for example, checking that the amounts in the downstream payment system and the order system match, and that data passing through intermediate systems lands in the warehouse correctly.
In fact, the question of how to design a high-concurrency system is not itself difficult: based on what you know, you optimize from the physical hardware level up through software architecture and code, adding middleware to keep raising the system’s capacity. But each answer raises further questions. Splitting microservices brings distributed transactions; HTTP and RPC frameworks bring questions of communication efficiency, routing, and fault tolerance; introducing MQ brings message loss, backlogs, transactional messages, and ordered messages; introducing a cache brings consistency, avalanche, and breakdown; read-write separation and sharding bring master-slave replication lag, distributed IDs, and transaction consistency. To address them all, we keep adding measures such as circuit breaking, rate limiting, degradation, offline verification, and contingency plans to prevent and trace these problems.
This article draws on content from several earlier pieces. I actually wanted to write this one first, but it grew too long and was hard to summarize, so I split it up and wrote the parts first. This article is a summary of that earlier content, not filler.