Previously, we mentioned that ensuring the high availability of Redis requires the following:
- Data persistence
- Master-slave replication
- Automatic fault recovery
Let's briefly analyze the characteristics of these schemes and how they relate to one another.
Data persistence is essentially data backup. With persistence enabled, when Redis goes down we can recover the data from disk. However, the service is unavailable until the data is restored, and the recovery time depends on the instance size: the larger the data set, the slower the recovery. For the details of Redis persistence, see: How does Redis persistence work? Comparative analysis of RDB and AOF.
Master-slave replication deploys multiple replica nodes that copy the master's data in real time. When the master goes down, we still have a complete replica to fall back on. Moreover, if the business generates a volume of read requests that a single master cannot handle, multiple replicas can share the read load. This read-write separation improves Redis's access performance.
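The read-write split can be sketched as a simple routing rule. The node addresses and the write-command set below are illustrative assumptions, not part of any real deployment:

```python
import random

# Hypothetical addresses for a master and its replicas.
MASTER = "redis-master:6379"
REPLICAS = ["redis-replica-1:6379", "redis-replica-2:6379"]

# A deliberately incomplete set of commands that mutate data.
WRITE_COMMANDS = {"SET", "DEL", "EXPIRE", "LPUSH", "HSET"}

def route(command: str) -> str:
    """Send writes to the master; spread reads across the replicas."""
    if command.upper() in WRITE_COMMANDS:
        return MASTER
    return random.choice(REPLICAS)
```

A real client library would additionally keep live connections to each node and refresh the replica list when the topology changes.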
But there is a problem. When the master goes down, although we have a complete replica, we must manually promote the replica to master before service can resume. If every master failure requires manual intervention, the process is time-consuming, labor-intensive, and cannot guarantee timeliness, which greatly reduces availability. How can we improve this? The answer is automatic fault recovery: a mechanism that monitors the master and promotes a replica automatically when the master fails.
With data persistence, master-slave replication, and automatic fault recovery in place, can we rest easy when using Redis?
The answer is no. If most of the business traffic is read requests, read-write separation improves performance. But what if the volume of write requests is also large? In this era of big data, large companies such as Alibaba and Tencent ingest enormous write volumes all the time. A single master node cannot bear that load, so how do we handle it?
This is where clustering comes in. In a nutshell, multiple master-slave groups form a cluster, and each node stores part of the data. Write requests can then be distributed across multiple master nodes, relieving write pressure. At the same time, a cluster can dynamically add new nodes when capacity or performance falls short, expanding capacity and improving performance.
Starting from this article, we will introduce Redis clustering schemes. Of course, clustering also means a more complex deployment architecture and higher management and maintenance costs, and we will run into many problems along the way, which is why different clustering solutions have different focuses.
In this article, we will first give an overall introduction to several popular Redis clustering solutions to build a high-level understanding. Later, I will analyze in detail the cluster solutions I am most familiar with.
To build a cluster, we deploy multiple master nodes, each of which may also have multiple slave nodes. A cluster with this deployment structure can handle larger traffic and store more data.
Handling more traffic is the most basic function of a cluster. A typical clustering scheme also includes the functions mentioned above: data persistence, data replication, and automatic fault recovery, which together ensure the cluster's high performance and high availability.
In addition, an excellent clustering scheme also implements online horizontal scaling: when the existing nodes are not enough, new nodes can be added dynamically to improve the performance of the whole cluster, and this process completes online without the service noticing.
The mainstream Redis clustering solutions in the industry are the following:
- Client-side sharding
- Codis
- Twemproxy
- Redis Cluster
They can be divided by whether they are centralized: client-side sharding and Redis Cluster are decentralized solutions, while Codis and Twemproxy are centralized solutions.
Centralization refers to whether the client accesses the Redis nodes directly or through an intermediate proxy layer. Direct access is the decentralized approach; access through a proxy layer is the centralized approach. Each has its own advantages and disadvantages, introduced below.
Client-side sharding means we only need to deploy multiple Redis nodes; the logic for using them lives mainly in the client.
The client uses a fixed hash algorithm to compute a hash value for each key, and then reads and writes against the corresponding Redis node.
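A minimal sketch of this fixed-hash routing, assuming three hypothetical node addresses and CRC32 as the hash function:

```python
import zlib

# Hypothetical node addresses deployed by the DBA.
NODES = ["10.0.0.1:6379", "10.0.0.2:6379", "10.0.0.3:6379"]

def node_for(key: str) -> str:
    """Fixed-hash routing: hash the key, then take it modulo the node count."""
    return NODES[zlib.crc32(key.encode()) % len(NODES)]
```

Every client must apply exactly the same rule, or reads will miss the node that holds the data.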
Client-side sharding cluster mode
Client-side sharding requires business developers to first estimate the request and data volume, and then have the DBA deploy enough nodes for them to use.
The advantage of this scheme is that deployment is very simple: the DBA deploys and delivers the number of nodes the business needs, and the developers then write the request routing logic against that node count. Usually a fixed hash algorithm maps different keys to different nodes for writes, and reads follow the same rule.
Its disadvantage is that the cost of using Redis is high for business developers: they must write routing-rule code themselves to use multiple nodes, and if the data volume is not estimated accurately up front, later expansion and migration are very expensive, because once the node count changes, the hash algorithm no longer maps keys to the same nodes.
Consistent hashing was later introduced to minimize data migration and performance impact when the number of nodes changes.
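A minimal consistent-hash ring sketch shows why adding a node remaps only a fraction of the keys; MD5 and the virtual-node count here are common but arbitrary choices, not mandated by any standard:

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Minimal consistent-hash ring with virtual nodes."""

    def __init__(self, nodes, vnodes=100):
        self.ring = []  # sorted list of (hash, node)
        for node in nodes:
            for i in range(vnodes):
                h = self._hash(f"{node}#{i}")
                self.ring.append((h, node))
        self.ring.sort()
        self.keys = [h for h, _ in self.ring]

    @staticmethod
    def _hash(s: str) -> int:
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def get(self, key: str) -> str:
        """Walk clockwise from the key's hash to the first virtual node."""
        h = self._hash(key)
        idx = bisect.bisect(self.keys, h) % len(self.keys)
        return self.ring[idx][1]
```

With plain modulo hashing, changing the node count remaps almost every key; on the ring, only the keys falling between a new node and its predecessor move.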
Client-side sharding is therefore generally used in scenarios where the business data volume is relatively stable and will not grow significantly later, so that a one-time up-front estimate suffices.
As business and technology evolved, people increasingly felt that when using Redis, they should not have to care how many nodes sit behind the cluster. We want the Redis we use to behave like one large cluster, and when business volume grows, this cluster should simply add new nodes to solve capacity and performance shortfalls.
This is the server-side sharding scheme. The client does not need to care how many Redis nodes are behind the cluster; it operates the cluster as if it were a single Redis instance. This greatly reduces the cost for developers, who can focus solely on business logic without worrying about Redis resources.
How can a cluster of multiple nodes be used by developers like a single Redis instance? This comes down to how the nodes are organized to provide service. Generally, a proxy layer is added between the client and the server: the client only talks to the proxy, and the proxy implements the request-forwarding rules and forwards requests to the backend nodes. This approach is therefore called the centralized clustering scheme, and Codis is a clustering solution implemented this way.
Proxy cluster mode
Codis architecture diagram
Codis was developed at Wandoujia in China and adopts the centralized cluster scheme. Because the proxy layer forwards every request, the performance requirements on the proxy are very high. Codis is written in Go, which balances development efficiency with performance.
Codis consists of several components:
- Codis proxy: mainly responsible for forwarding read and write requests
- Codis dashboard: the unified control center, integrating data-forwarding rules, automatic fault recovery, online data migration, node scaling, automated operations APIs, and other functions
- Codis group: Redis servers based on Redis 3.2.8, with asynchronous data-migration support added
- Codis fe: a web UI for managing multiple clusters
As you can see, Codis has quite a few components, and its functionality is very complete: beyond request forwarding, it implements online data migration, node scaling, automatic fault recovery, and more.
The Codis proxy is the component responsible for request forwarding, and it maintains the concrete forwarding rules. Codis divides the whole key space into 1024 slots. When processing a read or write request, it computes a CRC32 hash of the key, takes that hash modulo 1024 to find the slot, and finally maps the slot to a specific Redis node.
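The slot computation can be sketched in a few lines. Whether this byte-level handling matches Codis exactly is an assumption on my part, but the scheme (CRC32 modulo 1024) is as described above:

```python
import zlib

NUM_SLOTS = 1024  # Codis divides the whole key space into 1024 slots

def codis_slot(key: str) -> int:
    """Map a key to one of the 1024 slots via CRC32 modulo 1024."""
    return zlib.crc32(key.encode()) % NUM_SLOTS

# Each slot is then assigned to a Redis group; that slot-to-group mapping
# is maintained by the dashboard and consulted by the proxy per request.
```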
The biggest feature of Codis is that it can scale out online without affecting client access, that is, without downtime. This is very convenient for business users: when cluster performance falls short, nodes can be added dynamically to improve it.
To support online scaling while keeping data reliable during migration, Codis modified Redis, adding commands for asynchronous data migration. It is based on Redis 3.2.8, and the dashboard and proxy components coordinate with it to complete data migration and scaling without business impact.
Consequently, to use Codis you must use its bundled Redis, which means there is no guarantee it will keep up with the latest official Redis features; that depends on the Codis maintainers. Codis is no longer maintained, so with Codis you are limited to Redis 3.2.8, which is a pain point.
In addition, since a cluster spans multiple nodes, operating a cluster cannot be exactly like operating a single Redis instance; commands that could cause problems across multiple nodes are disabled or restricted. For details, see the list of commands not supported by Codis.
However, none of this prevents it from being an excellent clustering solution. Our company needed a clustering solution early on, when Redis Cluster was not yet mature, so we adopted Codis.
At present my work mainly revolves around Codis. Our company has customized Codis and also modified Redis so that Codis can synchronize data across multiple data centers. I am therefore familiar with the Codis codebase, and later I will write dedicated articles analyzing how Codis is implemented; understanding its principles is very helpful for understanding distributed storage!
Twemproxy is an open-source clustering solution from Twitter. It can serve as a proxy for both Redis and Memcached.
Its functionality is relatively narrow: it implements only request routing and forwarding, without the online scaling that Codis offers. Essentially, it moves the client-side sharding logic into the proxy layer and does nothing else.
In the early days, before good server-side sharding solutions existed, Twemproxy was very widely used, and its performance was extremely stable.
Its pain point, however, is that it cannot scale in or out online, which makes operations very inconvenient, and it offers no friendly ops UI. Codis emerged against this background.
With the centralized model of inserting a proxy layer, the demands on the proxy are very high: once it fails, every client that talks to it is stuck. Making the proxy highly available requires an additional mechanism, such as Keepalived.
Moreover, an extra forwarding hop through the proxy inevitably costs some performance. Besides client-side sharding and the centralized solutions above, is there a better approach?
Redis Cluster, the official solution from Redis, takes a different path. Instead of the centralized proxy model, it puts part of the request-forwarding logic in the client and part in the server, and the two cooperate to process requests.
Redis Cluster was launched in Redis 3.0. Early versions were not widely adopted because they had not been rigorously tested or verified in production. It was in that context that the industry produced the centralized solutions above: Codis and Twemproxy.
However, as Redis iterated, the official cluster became more and more stable, and more people began adopting it. Because it is officially maintained, its continued upkeep is assured, which gives it an advantage over third-party open-source solutions.
Redis Cluster has no intermediate proxy layer, so how are requests forwarded?
Redis puts the forwarding logic in a smart client: to use Redis Cluster you must upgrade the client SDK, which has the request-forwarding logic built in, so business developers do not need to write routing rules themselves. Redis Cluster divides the key space into 16384 slots for routing.
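Redis Cluster's documented slot mapping is `HASH_SLOT = CRC16(key) mod 16384`, using the CRC16-CCITT (XMODEM) variant; keys containing a `{tag}` hash only the tag so related keys land in the same slot. A self-contained sketch:

```python
def crc16(data: bytes) -> int:
    """CRC16-CCITT (XMODEM), the variant used by Redis Cluster."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def key_slot(key: str) -> int:
    """HASH_SLOT = CRC16(key) mod 16384; a non-empty {tag} replaces the key."""
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end > start + 1:  # only a non-empty tag between the braces counts
            key = key[start + 1:end]
    return crc16(key.encode()) % 16384
```

Hash tags matter because multi-key operations in Redis Cluster only work on keys in the same slot.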
Without a proxy layer, the client talks to the right Redis node directly, avoiding the performance cost of proxy forwarding.
Redis Cluster also provides online data migration, node scaling, and more, and it has a built-in Sentinel-style mechanism for automatic fault recovery. It integrates all these capabilities, which makes deployment very simple: no extra components are needed, and it is extremely friendly to operate.
While Redis Cluster migrates node data or scales in or out, it must also keep serving client requests. When a client accesses data that happens to be mid-migration, the server and client follow an agreed protocol that tells the client to retry on the correct node and to fix up its routing table.
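This redirect-and-repair cycle can be illustrated with a toy simulation. It is not the real RESP protocol (an actual redirect reply looks like `MOVED 3999 127.0.0.1:6381`), and the node names and slot assignment below are made up:

```python
import zlib

SLOTS = 16384

def key_slot(key: str) -> int:
    # stand-in hash for the sketch; real Redis Cluster uses CRC16 % 16384
    return zlib.crc32(key.encode()) % SLOTS

class FakeNode:
    """Toy server node: redirects with MOVED when it does not own the slot."""
    def __init__(self, name, cluster):
        self.name = name
        self.cluster = cluster

    def handle(self, key):
        slot = key_slot(key)
        owner = self.cluster.owner[slot]
        if owner is not self:
            return ("MOVED", slot, owner)  # points the client at the owner
        return ("OK", self.name)

class FakeCluster:
    def __init__(self, names):
        self.nodes = [FakeNode(n, self) for n in names]
        # static round-robin slot assignment, purely for illustration
        self.owner = {s: self.nodes[s % len(self.nodes)] for s in range(SLOTS)}

class SmartClient:
    """Caches slot->node mappings and repairs them on MOVED replies."""
    def __init__(self, cluster):
        self.cluster = cluster
        self.slot_cache = {}

    def get(self, key):
        slot = key_slot(key)
        node = self.slot_cache.get(slot, self.cluster.nodes[0])
        reply = node.handle(key)
        if reply[0] == "MOVED":
            self.slot_cache[slot] = reply[2]  # fix up the routing table
            reply = reply[2].handle(key)      # retry on the correct node
        return reply
```

After one redirect, subsequent requests for the same slot go straight to the owning node from the cache.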
Although Redis Cluster provides online data migration, its migration performance is not high: when a large key is encountered during migration, the two nodes involved may block for a long time. By contrast, Codis has better data-migration performance. It is good to know this up front; later I will write articles comparing the online-migration performance of Codis and Redis Cluster.
More and more companies are now using Redis Cluster, and the capable ones have done secondary development and customization on top of it to address its remaining problems. We look forward to Redis Cluster developing even further.
Having compared these clustering solutions, let's summarize.
These are the mainstream clustering solutions in the industry, and we have briefly covered their characteristics and differences. We can choose the solution that suits us during development, but it is better to also understand how each is implemented, so that we can handle problems more calmly when they arise.