Redis cluster solution


Some time ago, we set up a Redis cluster to use as online storage for our recommendation system. Interestingly, the infrastructure here is far from complete, so those of us working on the recommender system had to build this storage environment ourselves, and we ran into a lot of trouble along the way. Fortunately, the machines the company provides perform very well and can satisfy current business needs. The main motivations for building a Redis cluster were as follows:

1. Scalability (scale-out): once the amount of data grows large, we do not want to have to tear everything down and start over. Although Redis can enable its virtual memory feature, so that a single machine can serve more data than fits in physical memory, frequent page swapping between memory and disk greatly reduces performance, which goes against the original design intent of Redis.

2. Redis uses a single-threaded I/O multiplexing model, which cannot make effective use of a multi-core server. If multiple Redis processes are started on a multi-core machine to serve requests together, efficiency is higher.

3. Master-slave replication, for data backup and disaster recovery.

Therefore, we hoped the planned Redis cluster would provide the following:

1. Data sharding: data is partitioned across multiple nodes.

2. Master-slave backup: the master node handles writes, both master and slave serve read requests, and automatic master-slave failover is supported.

3. Load balancing of read requests.

4. Ideally, tolerance of node failures with automatic data migration.

The process we went through was as follows:

[step 1] try the official plan

Naturally, I first looked at the official Redis cluster solution, but unfortunately the official statement on clustering was as follows:

Unfortunately Redis Cluster is currently not production ready, however you can get more information about it reading the specification or checking the partial implementation in the unstable branch of the Redis GitHub repository.

Once Redis Cluster will be available, and if a Redis Cluster compliant client is available for your language, Redis Cluster will be the de facto standard for Redis partitioning.

Redis Cluster is a mix between query routing and client side partitioning.

Because this was for a production deployment, we did not dare use the unstable branch, and we had neither the resources nor the time to develop cluster features ourselves on top of the current official release, so we gave up on this option.

[step 2] tentative plan

After abandoning the official scheme, I decided to build one myself. The initial idea was: use LVS to load balance read requests, write a consistent hashing algorithm in the client code for data sharding, configure Redis master-slave replication, and use keepalived for automatic master-slave failover. This scheme should be feasible, but I was unsure about some of the details, so I asked on Stack Overflow. The question was as follows:

Since the redis cluster is still a work in progress, I want to build a simplified one by myself in the current stage. The system should support data sharding, load balance and master-slave backup. A preliminary plan is as follows:

  1. Master-slave: use multiple master-slave pairs in different locations to enhance the data security. Masters are responsible for the write operation, while both masters and slaves can provide the read service. Data is sent to all the masters during one write operation. Use Keepalived between the master and the slave to detect failures and switch master-slave automatically.

  2. Data sharding: write a consistent hash on the client side to support data sharding during write/read in case the memory is not enough in a single machine.

  3. Load balance: use LVS to redirect the read request to the corresponding server for the load balance.

My question is how to combine the LVS and the data sharding together?

For example, because of data sharding, all keys are split and stored in servers A, B and C without overlap. Considering the slave backup and other master-slave pairs, the system will contain 1(A,B,C), 2(A,B,C), 3(A,B,C) and so on, where each one has three servers. How to configure the LVS to support the redirection in such a situation when a read request comes? Or are there other approaches in redis to achieve the same goal?


One user gave two suggestions:

You can get really close to what you need by using:

twemproxy to shard data across multiple redis nodes (it also supports node ejection and connection pooling)

redis master/slave replication

redis sentinel to handle master failover

depending on your needs you probably need some script listening to failovers (see sentinel docs) and cleaning things up when a master goes down

Both suggestions were very enlightening. I had come across twemproxy when reading the official Redis documentation, but I had not paid much attention to it at the time. As for the latter, using Redis Sentinel for master failover: Sentinel was also a Redis module still under development, so I did not dare to use it.

In addition, there were two reasons for abandoning my preliminary plan:

1. While writing the client-side data sharding and balancing service, I found that the problems to consider were more complicated than I had first thought. Finishing it would have amounted to reimplementing twemproxy, and it is better to reinvent the wheel as little as possible.

2. The functionality is redundant. A read request must go through sharding on the client side and then through LVS before it reaches the actual server. Without optimization, this adds considerable latency.
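
For reference, the client-side consistent hashing considered in the tentative plan can be sketched roughly as follows. This is a generic hash ring with virtual nodes, not the code that was actually written at the time; the class name `HashRing`, the node names and the use of MD5 are all assumptions made for illustration.

```python
import bisect
import hashlib


class HashRing:
    """Generic consistent hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes, replicas=100):
        self.replicas = replicas   # virtual nodes per physical node
        self._keys = []            # sorted hash positions on the ring
        self._owners = {}          # hash position -> node name
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(value):
        # MD5 is used only to spread keys evenly, not for security.
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def add_node(self, node):
        for i in range(self.replicas):
            point = self._hash(f"{node}#{i}")
            self._owners[point] = node
            bisect.insort(self._keys, point)

    def remove_node(self, node):
        for i in range(self.replicas):
            point = self._hash(f"{node}#{i}")
            del self._owners[point]
            self._keys.remove(point)

    def get_node(self, key):
        # First virtual node clockwise from the key, wrapping around the ring.
        idx = bisect.bisect(self._keys, self._hash(key)) % len(self._keys)
        return self._owners[self._keys[idx]]


# Hypothetical shards A, B and C, as in the question above.
ring = HashRing(["A", "B", "C"])
before = {f"user:{i}": ring.get_node(f"user:{i}") for i in range(1000)}

# Removing a shard only remaps the keys that lived on that shard.
ring.remove_node("C")
after = {key: ring.get_node(key) for key in before}
moved = [key for key in before if before[key] != after[key]]
```

Even this small version leaves out connection handling, failure detection and rebalancing, which is where most of the extra complexity mentioned above comes from.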

[step 3] the final scheme is shown in the figure below

The figure is fairly self-explanatory, so no further explanation is given.

Twemproxy is an open source proxy service from Twitter that can be used to shard memcached and Redis, and it is compatible with the standard protocol of both. However, it does not support Redis commands such as KEYS and DBSIZE. This makes sense when you think about it: such commands would require the proxy to gather statistics across every machine in the pool, so the proxy does not support them.

Twemproxy pipelines the commands it sends to the backend Redis servers, so the performance loss is relatively small. However, the size of the mbuf that twemproxy allocates for each client connection is limited, and the maximum it can be set to is 64K. If the client also pipelines its commands to twemproxy, the pipeline cannot be too deep, and pipeline atomicity is not supported. In effect there are now two layers of pipelining between the client connection and the Redis database: one from the client to twemproxy and one from twemproxy to the backend Redis servers. Because their buffer depths differ, pipelined transactions cannot be supported.

With twemproxy in place, inserting data in bulk takes longer. Moreover, since pipelining does not guarantee atomicity, the client side has to pay more attention to recovery when data is lost. When non-transactional pipelined data kept getting lost, or when operating on keys holding a large amount of data, the cause turned out to be that the timeout value on the twemproxy side was set too small. With the 400 ms from the official example, a timeout failure is returned when a large amount of data is operated on at once. This value needs to be chosen carefully according to the business (specifically, the size of a single command issued by the client). In general, 2000 ms is enough (it can support nearly one million data items per operation).
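
As a rough illustration of where the timeout lives, a twemproxy pool definition might look something like the fragment below. The pool name and server addresses are hypothetical, and the other values are placeholders rather than our production settings; only the configuration keys themselves are twemproxy's.

```yaml
# Hypothetical pool name and addresses, for illustration only.
recommender:
  listen: 127.0.0.1:22121
  hash: fnv1a_64
  distribution: ketama       # consistent hashing across the servers below
  redis: true                # speak the Redis protocol instead of memcached
  timeout: 2000              # ms to wait on a server; the official example uses 400
  auto_eject_hosts: false    # assumption: keep the key mapping stable for storage
  servers:
   - 10.0.0.1:6379:1
   - 10.0.0.2:6379:1
   - 10.0.0.3:6379:1
```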

In the above structure, load balancing of read operations is done in the client code, and control of write operations also lives in the client layer. In addition, keepalived can be introduced for the single points and for the master-slave pairs, to eliminate single points of failure and provide fault recovery.
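
The client-side read balancing could be as simple as a round-robin over the endpoints that hold a copy of the data, with writes always going to the master. The sketch below assumes exactly that; the class name `ShardRouter` and the addresses are hypothetical, and an endpoint here is just an address the client can connect to, since the exact topology depends on the figure.

```python
import itertools


class ShardRouter:
    """Sends writes to the master and rotates reads over master and slaves."""

    def __init__(self, master, slaves):
        self.master = master
        # The master also serves reads, so it is part of the rotation.
        self._readers = itertools.cycle([master, *slaves])

    def for_write(self):
        return self.master

    def for_read(self):
        return next(self._readers)


# Hypothetical addresses for one master and two slaves.
router = ShardRouter("10.0.0.1:6379", ["10.0.0.2:6379", "10.0.0.3:6379"])
reads = [router.for_read() for _ in range(6)]
write_target = router.for_write()
```

Round-robin is only one possible policy; a weighted or latency-aware choice would fit the same interface.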