Time：2021-10-22

# background

As the application becomes larger and larger, the amount of data becomes more and more. Neither the single database, single table nor single redis of MySQL database can meet the high concurrent read-write operations and the storage function of large amount of data. Therefore, we are familiar with itSub database and sub table

## Vertical split and horizontal split

Vertical splitting, that is, splitting columns, splits the information of the original table into two tables from the business,The image is understood as a vertical knife cutting the table

Horizontal split, i.e. split branches, throughSome ruleSplit the data in one table into different tables,The image is understood as a horizontal knife cutting the table

Of course, the concepts of vertical split and horizontal split are not the focus of this paper, but the rules of horizontal split. The essence of horizontal splitting actually coincides with the idea of microservices in distributed systems. The author mentioned earlier that the emergence of microservices is because a single service carries too much business logic, resulting in problems such asCode logic is huge and complex, developer relationship coupling, single point of failure and so on。 These problems also affect dB and redis:

• When there are more and more access requests, the stand-alone dB / redis cannot allocate enough threads to resist highly concurrent requests
• When the amount of data is increasing, the difficulty and time-consuming of finding a needle from a bowl of water and a needle from a sea are also different. What we have to do is to find the bowl and then get the needle from the bowl

Therefore, we need to expand dB / redis to multiple servers.

# Several routing algorithms in distributed clusters

I believe the previous introduction has explained the necessity of horizontal splitting. The problem now is how to split,According to what rulesClassify different data into different MySQL databases / MySQL tables / redis machines. Let’s take userid as the key as an example.

## Fixed hash

Fixed hash is easy to understand. The database and table division logic of the department database where the author is now is simple(userid % 32) ^ (userid >> 32), the upper 32 bits and lower 32 bits of userid are taken for and operation.

The advantage of this isSimple logic, I believe this is also the database and table splitting method for DB Business of many companies. If you can ensure that the user ID canuniform distributionOn each slice.
The disadvantage is poor scalability. When new services need to be added, the new machine cannot be routed at all; When offline service is required, the fixed hash will inevitably lead to the failure of the request to the service.

## Consistent Hashing

Consistent Hashing The biggest change is to turn the hash value corresponding to the node into a range. You can imagine the consistency hash as a ring clock. Now we have three machines in reverse at 12:00, 4:00 and 8:00. At this time, we calculatehash(key)Then find the nearest machine node

For example, if hash (key) = 2, the 4 o’clock node is found

Although consistent hash can stably switch requests to new machines, it also has some small defects. Because the result of hash modulus algorithm is random, we can not guarantee that each service node can be evenly distributed to the hash ring, which leads to the classicalHot issues, also known as data skewFor example, as shown in the figure, the 8 o’clock service will be overloaded.

## Introduce consistent hash of virtual node

In order to solve the above problems, we introduce the concept of virtual node. We map multiple hashes for each machine,

hashA = hash("192.168.0.1-A") % 32
hashB = hash("192.168.0.1-B") % 32
hashC = hash("192.168.0.1-C") % 32

Thus, a machine has multiple virtual nodes on the ring, as shown in the figure

## Custom calculation method

As mentioned above, in the early stage of the company where the author works, the rule of database distribution is fixed hash. However, according to the actual business performance, some users (called large users) have n times more access than small users. Routing ordinary users to a dB or redis is bound to affect the reading and writing of ordinary users, Therefore, for these special users, a separate rule is made to route to the special partition with partition number 9999.

$$ip = \begin{cases} ip[userid \% 32],\quad userid\ not\ in\ big\ users \\ ip[9999], \quad userid\ in\ big\ users \end{cases}$$

The above mode is actually a user-defined calculation method.

# reference

## SQL statement of three-level linkage of provinces, cities and counties

The first is the table creation statement Copy codeThe code is as follows: CREATE TABLE t_address_province ( id INT AUTO_ Increment primary key comment ‘primary key’,Code  char (6) not null comment ‘province code’,Name  varchar (40) not null comment ‘province name’）Engine = InnoDB default charset = utf8 comment = ‘province information table’; CREATE TABLE […]