More than 4,000 words explaining the internal working principles of Codis

Date: 2020-02-12

I. Introduction
Codis is a distributed Redis solution that can manage a large number of Redis nodes. As a professional third-party push service provider, Getui has focused for years on providing developers with efficient and stable message push services; the messages delivered through the Getui platform reach the tens of billions every day. Given Getui's demanding requirements for data volume, concurrency, and speed, a single Redis node easily becomes a performance bottleneck. After weighing all factors, we chose Codis to manage and use Redis more effectively.

II. Why choose Codis
With the rapid growth of the company's business, our demand for data storage has grown as well. In practice, high concurrency and massive data volumes on a single Redis instance easily cause its memory usage to balloon.

In addition, the memory of each Redis node must be kept limited, for two reasons:

First, when a node's memory is too large, full resynchronization during replication takes too long, which increases the risk of synchronization failure;
Second, an ever-growing number of Redis nodes leads to enormous maintenance costs later on.

We therefore researched three mainstream Redis cluster-management solutions in depth: Twemproxy, Codis, and Redis Cluster.

The biggest drawback of Twitter's open-source Twemproxy is that it cannot scale out or scale in smoothly. Redis Cluster requires clients to speak the cluster protocol, so adopting it means upgrading clients, which is a big cost for many existing services. Moreover, Redis Cluster's P2P gossip mode increases communication overhead and makes it hard to know the current state of the cluster, which undoubtedly adds operational difficulty.

Codis, open-sourced by Wandoujia (Pea Pod), not only solves Twemproxy's scaling problem but is also compatible with Twemproxy, and it matured and stabilized first, back when Redis Cluster (the official Redis clustering scheme) still had frequent bugs. So in the end we adopted Codis to manage our large number of Redis nodes.

Today Getui uses both plain Redis and Codis in its push business: plain Redis for small business lines, and Codis for business lines with large data volumes and many nodes.

To keep a Codis cluster running stably, we need a clear understanding of how Codis works internally. Below we analyze how the Codis dashboard and proxy work, from the perspective of the Codis source code.

III. Introduction to Codis
Codis is proxy middleware developed in Go. Its position in the system is shown in the following figure:
[Figure: Codis's position in the system]
Codis is a distributed Redis solution: to upper-layer applications, connecting to the Codis proxy is essentially no different from connecting to a native Redis server, apart from a small number of unsupported commands.

Under the hood, Codis handles request forwarding and performs data migration without downtime. To clients, Codis is transparent; you can simply think of the client as being connected to a Redis service with unlimited memory.

Codis consists of four components:
Codis Proxy (codis-proxy)
Codis Dashboard
Codis Redis (codis-server)
ZooKeeper/Etcd

[Figure: Codis architecture]

IV. How the dashboard works internally

Introduction to the dashboard
The dashboard is Codis's cluster-management tool. Every operation on the cluster, including adding and removing proxies and servers and migrating data, must be performed through the dashboard. The dashboard's startup process is essentially the initialization of the necessary data structures and cluster operations.

Dashboard startup process
The dashboard's startup is divided into two steps: New() and Start().

New() stage
● On startup, the dashboard first reads the configuration file and fills in the config. If the coordinator is "zookeeper" or "etcd", it creates a ZK or etcd client accordingly. It then creates a Topom{} object from the config. Topom{} is very important: it stores all node information (slots, groups, servers, and so on) in the cluster at a given moment, and New() populates this object.

● Next, it starts listening on port 18080 and handles the corresponding API requests.

● Finally, it starts a background goroutine that cleans invalid clients out of the pool once a minute.

The following figure shows the corresponding in-memory data structures when the dashboard is in New().

[Figure: in-memory data structures after New()]
Start() stage

● In the Start() stage, the dashboard writes the in-memory models.Topom{} to ZK, under the path /codis3/codis-demo/topom.

● It then sets topom.online = true.

● Next, through topom.store it fetches the latest SlotMapping, group, proxy, and other data from ZK and fills them into topom.cache. (topom.cache is a cache structure: if it is empty, the store pulls SlotMapping, proxy, group, and other information from ZK and fills the cache. The cache is not only empty on first startup; whenever an element of the cluster changes (a server, a slot, and so on), dirtyCache is called to set the cached information to nil, so that the next access fetches the latest data from ZK through topom.store.)

● Finally, it starts four goroutine for-loops to handle the corresponding actions.

Group creation process
The process of creating a group is simple.
● First, through topom.store we pull the latest SlotMapping, group, proxy, and other data from ZK into topom.cache.

● Then we validate against the latest in-memory data: check whether the group ID already exists and whether it falls in the range 1-9999.

● Next, we create the Group{} object in memory and call the ZK client to create the path /codis3/codis-demo/group/group-0001.

Initially, the group is empty:

{
    "id": 1,
    "servers": [],
    "promoting": {},
    "out_of_sync": false
}

Adding a codis-server
Next, we add a codis-server to the group. The dashboard first connects to the backend codis-server to check that the node is healthy.

● It then executes the SLOTSINFO command on the codis-server. If the command fails, the process of adding the codis-server ends.

● After that, it pulls the latest SlotMapping, group, proxy, and other data from ZK through topom.store into topom.cache, and validates against the latest in-memory data: if the current group is in the middle of a master-slave switch, it exits. It then checks whether the group server already exists in ZK.

● Finally, it creates a GroupServer{} object and writes it to ZK.
Once the codis-server has been added successfully, then, as noted above, Topom{} starts four goroutine for-loops at startup, one of which, RefreshRedisStats(), puts the codis-server's connection into the topom.stats.redisp pool.


Tips
● When Topom{} starts, there are four goroutine for-loops; during its execution, RefreshRedisStats puts codis-server connections into the topom.stats.redisp pool;

● RefreshRedisStats() runs once per second. Its logic: fetch all codis-servers from topom.cache, then look up a client in the topom.stats.redisp pool by each codis-server's addr. If a client is found, execute the INFO command with it; if not, create a new client, put it into the pool, then execute INFO with that client, and store the result of INFO into topom.stats.servers.
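The look-up-or-dial pattern above can be sketched as follows (a simplified illustration; the type and function names are hypothetical, not the actual Codis code):

```go
package main

import (
	"fmt"
	"sync"
)

// statsClient stands in for a Redis connection used to run INFO.
type statsClient struct {
	addr string
}

// statsPool caches one client per backend address.
type statsPool struct {
	mu      sync.Mutex
	clients map[string]*statsClient
}

// getOrCreate returns the pooled client for addr, dialing a new one
// (via dial) and caching it when none exists yet.
func (p *statsPool) getOrCreate(addr string, dial func(string) *statsClient) *statsClient {
	p.mu.Lock()
	defer p.mu.Unlock()
	if c, ok := p.clients[addr]; ok {
		return c
	}
	c := dial(addr)
	p.clients[addr] = c
	return c
}

func main() {
	p := &statsPool{clients: map[string]*statsClient{}}
	dial := func(addr string) *statsClient { return &statsClient{addr: addr} }

	a := p.getOrCreate("10.0.0.1:6379", dial)
	b := p.getOrCreate("10.0.0.1:6379", dial) // second call reuses the cached client
	fmt.Println(a == b)                       // true: same pooled client
}
```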

codis-server master-slave synchronization
After two nodes are added to a group, clicking the master-slave sync button turns the second node into a slave of the first.

● First, refresh topom.cache: retrieve the latest SlotMapping, group, proxy, and other data from ZK through topom.store and fill them into topom.cache.

● Then validate against the latest data: if group.Promoting.State != models.ActionNothing, the group's promoting field is non-empty, meaning the two codis-servers in the group are switching master and slave, and the synchronization fails;

If group.Servers[index].Action.State == models.ActionPending, the node intended as the slave is already in the pending state, and the synchronization fails;

● Once the checks pass, take the largest Action.Index among all codis-servers whose state is ActionPending, add 1, and assign it to the current codis-server; then set the state of the node acting as the slave with g.Servers[index].Action.State = models.ActionPending, and write this information to ZK.

● When Topom{} starts, there are four goroutine for-loops, one of which specifically handles master-slave synchronization.

● After the master-slave sync button is clicked on the page, the corresponding in-memory data structures change accordingly:
[Figure: in-memory data structures after clicking master-slave sync]

● Group information written to ZK:

[Figure: group JSON stored in ZK]
Tips

When Topom{} starts, there are four goroutine for-loops, one of which specifically handles master-slave synchronization. How does it work?

First, it fetches the latest SlotMapping, group, proxy, and other data from ZK through topom.store and fills them into topom.cache. With the latest cached data, it finds the group server that needs master-slave synchronization, sets group.Servers[index].Action.State = models.ActionSyncing, and writes that to ZK.

Next, the dashboard connects to the node acting as the slave, opens a Redis transaction, and executes the master-slave sync commands:

c.Send("MULTI")                                -> open the transaction
c.Send("config", "set", "masterauth", c.Auth)
c.Send("slaveof", host, port)
c.Send("config", "rewrite")
c.Send("client", "kill", "type", "normal")
c.Do("EXEC")                                   -> execute the transaction

After the sync commands have run, it sets group.Servers[index].Action.State = "synced" and writes it to ZK. At this point the whole master-slave synchronization process is complete.

During this process, the codis-server goes through five states from start to completion:

"" (ActionNothing)       -> a newly added codis-server with no master-slave relationship has an empty state
pending (ActionPending)  -> written to ZK after master-slave sync is clicked on the page
syncing (ActionSyncing)  -> the intermediate state written to ZK while the background goroutine for-loop processes the sync
synced                   -> the state written to ZK by the goroutine for-loop after the sync succeeds
synced_failed            -> the state written to ZK by the goroutine for-loop after the sync fails
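The five states above form a small state machine; a minimal sketch of the legal transitions (inferred from the description above, not taken verbatim from the Codis source):

```go
package main

import "fmt"

// validNext maps each replica-sync state to the states it may move to,
// following the five-state description above.
var validNext = map[string][]string{
	"":        {"pending"},
	"pending": {"syncing"},
	"syncing": {"synced", "synced_failed"},
}

// canAdvance reports whether moving from state `from` to state `next` is legal.
func canAdvance(from, next string) bool {
	for _, s := range validNext[from] {
		if s == next {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(canAdvance("pending", "syncing")) // true
	fmt.Println(canAdvance("pending", "synced"))  // false: must pass through syncing
}
```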

Slot assignment
Above, we added codis-servers to the Codis cluster and set up master-slave synchronization. Next, the 1024 slots must be assigned across the codis-servers. Codis offers several ways to do this: you can move a slot with a given number to a given group, or move several slots from one group to another. The most convenient way, though, is automatic rebalancing.

Through topom.store, we first retrieve the latest SlotMapping, group, proxy, and other data from ZK into topom.cache. Then, from the latest SlotMapping and group information in the cache, we generate a slot assignment plan, plans = {0:1, 1:1, ..., 342:3, ..., 512:2, ..., 853:2, ..., 1023:3}, where the key is the slot ID and the value is the group ID. Next, according to the plan, we update the SlotMapping information, setting Action.State = ActionPending and Action.TargetId = the group ID assigned to the slot, and write the updated information back to ZK.
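A heavily simplified sketch of producing such a plan (real Codis weighs each group's current load and in-flight migrations; the round-robin below is only an illustration):

```go
package main

import "fmt"

// rebalancePlan spreads numSlots slots across the given group IDs as evenly
// as possible, returning a map of slot ID -> group ID like the plan above.
func rebalancePlan(groupIDs []int, numSlots int) map[int]int {
	plan := make(map[int]int, numSlots)
	for slot := 0; slot < numSlots; slot++ {
		plan[slot] = groupIDs[slot%len(groupIDs)]
	}
	return plan
}

func main() {
	plan := rebalancePlan([]int{1, 2, 3}, 1024)
	fmt.Println(len(plan), plan[0], plan[1], plan[1023]) // 1024 1 2 1
}
```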

When Topom{} starts, there are four goroutine for-loops, one of which handles slot assignment.

SlotMapping:

[Figure: SlotMapping data in ZK]
Tips
● When Topom{} starts, there are four goroutine for-loops; during the execution of ProcessSlotAction, codis-server connections are put into the topom.action.redisp pool.

● ProcessSlotAction() runs once per second. After its processing logic, it looks up a client in the topom.action.redisp pool and executes the SLOTSMGRTTAGSLOT command on Redis. If a client is found, the dashboard runs the migration command with it; if not, it creates a new client, puts it into the pool, and runs the migration command with that client.

The states of the action field in SlotMapping:
[Figure: slot action states]
As we know, ZooKeeper manages Codis metadata. When the Codis dashboard changes slot information, the codis-proxy nodes watch the slot changes in ZooKeeper and synchronize their slot information promptly.

To summarize, starting the dashboard involves connecting to ZK, creating the Topom struct, interacting with the cluster through port 18080, and forwarding the requests received on that port. In addition, four goroutines are started to refresh the state of Redis and the proxies in the cluster and to handle slot and synchronization operations.

V. How the proxy works internally

Proxy startup process
The proxy's startup consists of four parts: New(), Online(), reinitProxy(), and receiving client requests.

New() stage
● First, a new Proxy{} structure is created in memory and its fields are filled in.
● Second, ports 11080 and 19000 are opened.
● Then three background goroutines are started to handle the corresponding work:
  - one goroutine handles requests on port 11080;
  - one goroutine handles requests on port 19000;
  - one goroutine maintains the backend bc (backend connections) by pinging the codis-servers.


Online() stage
● First, assign the ID of models.Proxy{}: id = ctx.maxProxyId() + 1. When the first proxy is added, ctx.maxProxyId() = 0, so the first proxy's ID is 0 + 1.

● Second, create the proxy directory in ZK.

● After that, call reinitProxy(ctx, p, c) to refresh the proxy's in-memory data.

● Fourth, set the following flags:
online = true
proxy.online = true
router.online = true
jodis.online = true

● Fifth, create the jodis directory in ZK.


reinitProxy()
● The dashboard gets the latest SlotMapping, group, proxy, and other data from ZK and fills them into topom.cache. From the SlotMapping and group data in the cache, the proxy obtains models.Slot{} entries, each containing the backend IP and port for its slot. It establishes a connection to each codis-server and puts the connections into the router.

● A Redis request is handled by a BackendConn taken from a SharedBackendConn. proxy.Router stores the mapping between all SharedBackendConnPools and slots in the cluster; it is used to forward Redis requests to the right slot for processing, and the SharedBackendConnPools and slots in the router are kept up to date through reinitProxy().

To summarize the proxy startup process: first, read the configuration file and build the config object; second, create a new Proxy from the config and fill in its fields. An important step is filling in models.Proxy (the details can be seen in ZK), connecting to ZK, and registering the relevant paths.

Then, goroutines are started to listen for and forward requests from the Codis cluster on port 11080, and to listen for and process Redis requests sent to port 19000. Next, the data in ZK is refreshed into memory, and 1024 models.Slot entries are created in proxy.Router according to models.SlotMapping and the groups. During this process, the router assigns each slot a corresponding BackendConn, so that Redis requests are forwarded to the right slot for processing.
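The slot-to-backend routing described above can be sketched roughly like this (illustrative types only; Codis's real Router, SharedBackendConn, and fill logic are more involved):

```go
package main

import (
	"fmt"
	"hash/crc32"
)

// slot holds the backend address serving this slot (a stand-in for
// models.Slot plus its SharedBackendConn).
type slot struct {
	backendAddr string
}

// router maps all 1024 slots to their backends, like proxy.Router.
type router struct {
	slots [1024]slot
}

// fill installs a slot -> backend-address assignment.
func (r *router) fill(assign map[int]string) {
	for id, addr := range assign {
		r.slots[id].backendAddr = addr
	}
}

// dispatch returns the backend address for a key: crc32(key) % 1024
// picks the slot, and the slot points at its backend.
func (r *router) dispatch(key string) string {
	id := int(crc32.ChecksumIEEE([]byte(key)) % 1024)
	return r.slots[id].backendAddr
}

func main() {
	r := &router{}
	assign := make(map[int]string, 1024)
	for i := 0; i < 1024; i++ {
		assign[i] = "group1-master:6379" // single group: every slot maps to it
	}
	r.fill(assign)
	fmt.Println(r.dispatch("user:42")) // group1-master:6379
}
```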

VI. Additional notes on Codis internals
Codis's key assignment algorithm first computes the CRC32 of the key, yielding a 32-bit number, then takes that hash modulo 1024. The remainder is the slot for the key, and the slot in turn maps to a Redis instance.
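The hash step fits in a few lines of Go (crc32.ChecksumIEEE gives the standard CRC32 used here; the printed slot numbers are whatever the hash produces):

```go
package main

import (
	"fmt"
	"hash/crc32"
)

// slotOf maps a key to one of Codis's 1024 slots: crc32(key) % 1024.
func slotOf(key string) uint32 {
	return crc32.ChecksumIEEE([]byte(key)) % 1024
}

func main() {
	for _, key := range []string{"user:1", "user:2", "order:99"} {
		fmt.Printf("key %-10q -> slot %d\n", key, slotOf(key))
	}
}
```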
A slot's action goes through the following states: nothing (represented by an empty string), pending, preparing, prepared, migrating, and finished.

How do we ensure that slot migration does not affect client traffic?
● The client sends a command to the proxy, and the proxy works out which slot the key belongs to, say slot 30. It then fetches that Slot{} from the proxy's router, which contains backend.bc and migrate.bc. If migrate.bc has a value, the slot is being migrated: the proxy takes migrate.bc.conn (a connection to the backend codis-server), forcibly migrates the key to the codis-server of the target group, then takes backend.bc.conn, accesses the corresponding backend codis-server, and performs the requested operation.
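A minimal sketch of that per-request decision, with the connections modelled as plain functions (hypothetical names, not Codis's actual types):

```go
package main

import "fmt"

// slotConns models one slot's connections: backend serves requests, and
// migrate, when non-nil, force-migrates a key to the target group first.
type slotConns struct {
	backend func(key string) (string, error)
	migrate func(key string) error // nil when the slot is not migrating
}

// forward applies the rule above: migrate the key first if the slot is
// mid-migration, then serve the request from the backend.
func forward(s *slotConns, key string) (string, error) {
	if s.migrate != nil {
		if err := s.migrate(key); err != nil {
			return "", err
		}
	}
	return s.backend(key)
}

func main() {
	migrated := []string{}
	s := &slotConns{
		backend: func(key string) (string, error) { return "value-of-" + key, nil },
		migrate: func(key string) error { migrated = append(migrated, key); return nil },
	}
	v, _ := forward(s, "user:42")
	fmt.Println(v, migrated) // value-of-user:42 [user:42]
}
```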

VII. Shortcomings of Codis and Getui's improvements

Shortcomings of Codis
● Security was not fully considered: the Codis FE page has no login authentication;
● No built-in multi-tenancy scheme;
● No cluster scale-in scheme.

Getui's improvements
● FE page access is restricted through a Squid proxy; later, login control will be added through secondary development on top of FE;
● Small businesses share a cluster by adding a business identifier to the key prefix, while large businesses use dedicated clusters and machines;
● Scale-in is done by manually migrating data, vacating nodes, and taking the nodes offline.

VIII. Summary
As an important foundation of the Getui message push service, the performance of Codis matters a great deal. After Getui migrated its Redis nodes to Codis, the problems of capacity expansion and operational management were effectively solved. Going forward, Getui will keep following Codis and discussing with the community how to use it better in production.
