Why can single-threaded Redis achieve a million+ QPS?


Author: coding in the world

Performance test report

Looking at Alibaba's Redis performance test report, we can see that Redis can reach hundreds of thousands to a million+ QPS (setting aside Alibaba's own optimizations to Redis for now). Let's analyze how Redis achieves this from its design and implementation.


Design and implementation of Redis

Redis achieves such efficient throughput mainly through three aspects:

  • Efficient data structures
  • I/O multiplexing model
  • Event mechanism

1. Efficient data structures

Redis exposes several efficient data structures: string, hash, list, set, and zset (sorted set).

The underlying encodings of these exposed data structures are each optimized differently, but that is not the focus of this article.

2. I/O multiplexing model

Suppose that at some moment 10,000 long-lived connections are established to the Redis server. With blocking I/O, one thread is created for each connection, so 10,000 threads are required. Meanwhile, a common rule of thumb is: for I/O-intensive workloads, number of threads = 2 * number of CPUs + 1; for CPU-intensive workloads, number of threads = number of CPUs + 1.
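
These rules of thumb can be sketched in a few lines of Python. This is only an illustration of the formulas quoted above; the function name and defaults are my own, not anything from Redis:

```python
import os

def suggested_threads(cpu_bound, cpus=None):
    """Rule-of-thumb thread counts from the text:
    CPU-bound: cpus + 1; I/O-bound: 2 * cpus + 1."""
    cpus = cpus or os.cpu_count() or 1
    return cpus + 1 if cpu_bound else 2 * cpus + 1

# With 8 CPUs: CPU-bound -> 9 threads, I/O-bound -> 17 threads.
print(suggested_threads(cpu_bound=True, cpus=8))   # 9
print(suggested_threads(cpu_bound=False, cpus=8))  # 17
```

Either way, the suggested thread count stays tiny compared with the 10,000 threads blocking I/O would need.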

Of course, books and articles give more detailed formulas for calculating a more appropriate, more accurate thread count, but the result is usually a relatively small number. Blocking I/O, by contrast, would create thousands of threads; the system cannot carry such a load, let alone provide efficient throughput and service.

The I/O multiplexing model instead uses one thread to register these 10,000 established connections with epoll. epoll registers a callback for each connection; when a connection becomes ready (connection established, data readable, and so on), the callback adds it to epoll's ready list (rdlist). The single thread can then obtain all the ready connections simply by reading the rdlist.
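
As a concrete sketch of this model, Python's standard `selectors` module (backed by epoll on Linux) shows the same pattern: one thread registers many connections, blocks once, and receives every ready connection in a single call. The socketpairs and connection names below are illustrative stand-ins for real client sockets:

```python
import selectors
import socket

# One thread, one selector (epoll on Linux) watching many connections.
sel = selectors.DefaultSelector()

# socketpairs stand in for two accepted client connections.
pairs = [socket.socketpair() for _ in range(2)]
for i, (server_side, _client_side) in enumerate(pairs):
    server_side.setblocking(False)
    sel.register(server_side, selectors.EVENT_READ, data=f"conn-{i}")

# The "clients" send data; the kernel marks those sockets ready.
pairs[0][1].sendall(b"PING")
pairs[1][1].sendall(b"SET k v")

# One blocking call returns *all* ready connections at once.
results = {key.data: key.fileobj.recv(1024)
           for key, _mask in sel.select(timeout=1)}
print(results)

for server_side, client_side in pairs:
    server_side.close()
    client_side.close()
sel.close()
```

With real epoll the scheme scales the same way from 2 connections to 10,000: the thread only ever touches the connections that are actually ready.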

Note that, apart from asynchronous I/O, the other I/O models can all be classified as blocking I/O models. The difference is that the blocking I/O model blocks in the first stage, waiting for the data to become ready, and again in the second stage, copying the data from kernel space to user space. The I/O multiplexing model does not block in the first stage; it only blocks in the second.

In this way, one thread (or a few) can handle a huge number of connections, which greatly improves throughput.


3. Event mechanism

The connection between the Redis client and the Redis server, the sending of commands, and the server's responses to those commands are all driven by the event mechanism, as shown in the figure below.

[Figure: Redis event mechanism]

  • First, when the Redis server starts, it associates the AE_READABLE event of the listening socket with the connection acceptance processor and begins listening.
  • When a client initiates a connection, the listening socket generates an AE_READABLE event. Once the I/O multiplexer detects that it is ready, it pushes the event into the queue; the file event dispatcher takes the event from the queue and hands it to the connection acceptance processor, which accepts the connection and replies to the client. At the same time, it associates the AE_READABLE event of the new client socket with the command request processor.
  • When the client sends a request such as SET key value, the client socket generates an AE_READABLE event. Once the I/O multiplexer detects that it is ready, it pushes the event into the queue; the file event dispatcher takes it from the queue and hands it to the command request processor for handling.
  • After the command request processor finishes, it needs to reply to the client, so it associates the AE_WRITABLE event of the client socket with the command reply processor. When that event is generated, it is pushed into the queue and handed by the file event dispatcher to the command reply processor, which returns the result of the operation and then disassociates the AE_WRITABLE event from the command reply processor.
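
The association flow above can be modeled with a toy dispatcher. This is only an illustrative sketch, not Redis's actual ae implementation; the processor functions mirror the bullets, and only the AE_READABLE/AE_WRITABLE names come from Redis:

```python
from collections import deque

AE_READABLE, AE_WRITABLE = "AE_READABLE", "AE_WRITABLE"

class FileEventDispatcher:
    """Toy model: events are queued, then dispatched to whichever
    processor is associated with (socket, event_type)."""
    def __init__(self):
        self.queue = deque()
        self.handlers = {}  # (sock, event) -> processor

    def associate(self, sock, event, processor):
        self.handlers[(sock, event)] = processor

    def disassociate(self, sock, event):
        self.handlers.pop((sock, event), None)

    def push(self, sock, event):
        self.queue.append((sock, event))

    def dispatch_all(self):
        log = []
        while self.queue:
            sock, event = self.queue.popleft()
            processor = self.handlers.get((sock, event))
            if processor:
                log.append(processor(self, sock))
        return log

def connection_acceptor(d, sock):
    client = f"client-of-{sock}"
    # Associate the new client's readable event with the request processor.
    d.associate(client, AE_READABLE, command_request_processor)
    return f"accepted {client}"

def command_request_processor(d, sock):
    # After handling the command, arrange to reply via AE_WRITABLE.
    d.associate(sock, AE_WRITABLE, command_reply_processor)
    d.push(sock, AE_WRITABLE)
    return f"executed command from {sock}"

def command_reply_processor(d, sock):
    d.disassociate(sock, AE_WRITABLE)
    return f"replied to {sock}"

d = FileEventDispatcher()
d.associate("listen-socket", AE_READABLE, connection_acceptor)
d.push("listen-socket", AE_READABLE)            # client connects
print(d.dispatch_all())
d.push("client-of-listen-socket", AE_READABLE)  # client sends SET k v
print(d.dispatch_all())
```

Running it walks through exactly the four bullets: accept, execute, reply, and finally drop the AE_WRITABLE association.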

Reactor pattern

Generally speaking, Redis works in the Reactor pattern combined with a queue: a single accept thread handles incoming connection requests, and the I/O multiplexing model lets the kernel listen on these sockets. Once the read or write events of some sockets are ready, the corresponding events are pushed into the queue; the file event dispatcher then takes events from the queue one after another and processes them.

Similarly, in Netty we usually configure a bossGroup and a workerGroup; by default bossGroup has 1 thread and workerGroup has 2 * number of CPUs. In this way multiple threads can handle read/write-ready events, but they must not perform time-consuming operations; if such operations exist, they need to be handed off to a thread pool, otherwise throughput will drop. In Redis, both of these values are effectively 1.

Why stored values should not be too large

For example, suppose a string key = a stores a 500MB value. First the read event is pushed into the queue; after the file event dispatcher picks it up, it hands it to the command request processor, which at this point has to load the 500MB value.

For comparison, an ordinary SSD reads at about 200MB/s, so reading 500MB from disk would take 2.5s. Reading from memory is much faster: at roughly 50GB/s for DDR4, reading 500MB still takes about 10ms.
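
The arithmetic is simple enough to check directly (the bandwidth figures are the rough ones quoted above, not measurements):

```python
def transfer_time_ms(size_mb, bandwidth_mb_per_s):
    """Time to move size_mb at the given bandwidth, in milliseconds."""
    return size_mb / bandwidth_mb_per_s * 1000

# 500 MB from an SSD at 200 MB/s:
print(transfer_time_ms(500, 200))      # 2500.0 ms = 2.5 s
# 500 MB from DDR4 memory at ~50 GB/s (= 50,000 MB/s):
print(transfer_time_ms(500, 50_000))   # 10.0 ms
```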

Redis's slow-query log threshold defaults to just 10ms, and most commands execute in microseconds. During those roughly 10ms of blocking, the requests from all other sockets are left waiting, and the huge value also occupies a large amount of bandwidth, further reducing throughput.

