How to implement the consistent hashing distribution algorithm in PHP

Time:2022-7-24
Contents
  • Defects of traditional algorithm
  • Algorithm idea
  • Algorithm implementation
  • Summary

Defects of traditional algorithm

For server distribution, we should consider three points: data should be evenly distributed, lookups should locate data accurately, and the impact of a server going down should be small.

Traditional algorithms usually map the key of each piece of data to a number with a hash function, take that number modulo the number of servers, and store the data on the server selected by the result. This meets the requirements of even distribution and accurate lookup, and has the advantages of a simple algorithm and little computation per access (which becomes noticeable when the amount of data is very large).

However, it has a fatal disadvantage: a single server going down has a large impact. We can work out that impact:

  • Most of the original data can no longer be found: with one server fewer, the modulus drops by 1 and most keys map to a different server. If there were n servers before, only about 1/n of the keys still map to the same server afterwards (those whose hash value h satisfies h mod n = h mod (n-1)), assuming the hash values are uniformly distributed.
  • The load cannot be balanced, which leads to cascading failure: if the down server is not dealt with in time, its storage tasks pile up on the next server in sequence, which is soon overloaded and goes down as well. In this way the whole server group quickly goes down.
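
The effect of the changed modulus can be checked directly. The following is a minimal sketch, not part of the original article: the helper name fraction_unmoved is hypothetical, and it assumes uniformly distributed hash values and that the server indices are not renumbered. It enumerates one full period of hash values and counts those that select the same server index before and after the server count drops from n to n-1.

```php
<?php
// Fraction of hash values that select the same server index under
// "hash mod server count" when the server count drops from $n to $n - 1.
// Assumes $n >= 2 and uniformly distributed hash values.
function fraction_unmoved(int $n): float
{
    // The pair (h mod n, h mod (n-1)) repeats with period n * (n-1),
    // so one full period is representative of all hash values.
    $period = $n * ($n - 1);
    $same = 0;
    for ($h = 0; $h < $period; $h++) {
        if ($h % $n === $h % ($n - 1)) {
            $same++;
        }
    }
    return $same / $period;
}

foreach ([4, 10, 100] as $n) {
    printf("n=%d: fraction unmoved = %.4f, 1/n = %.4f\n", $n, fraction_unmoved($n), 1 / $n);
}
```

Under these assumptions the fraction equals 1/n, so with 10 servers roughly 90% of the keys would be looked up on the wrong server after a single failure.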

Algorithm idea

The consistent hash algorithm uses a hash function to map a large amount of data evenly onto different storage targets. While keeping lookups accurate, it also ensures that when one storage target fails, the content it was responsible for is spread evenly over the remaining storage targets.

The idea behind the consistent hash algorithm is not difficult to understand, as shown in the figure:

1. A hash function is used to map several nodes (the number is configurable) for each server in the group to scattered, effectively random positions in the range 0 to 2^32 - 1. Because the positions are randomly distributed, the data is distributed evenly;

2. Use the same algorithm to hash the key of the data to be stored, and choose the server by the key's position relative to the server nodes: the data goes to the first server node at or after the key's position. Because the same algorithm is used every time, the result is always the same, which makes lookups accurate;

3. When looking up the data, hash the key again with the same algorithm and find the server node that holds the data;

4. If a server goes down, remove its nodes from the ring so that its data falls to the next node along. Because node positions are effectively random, the data is picked up evenly by the other servers, which reduces the impact of the downtime.

Note that this ring is only a virtual space: it represents the range each server is responsible for and the position of the data. To actually store, read, or modify the data, we still have to find its position on the ring and then operate on the corresponding server.
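
The four steps can be illustrated on a toy ring. This is a minimal sketch under stated assumptions, not the article's implementation: positions run from 0 to 99 instead of 0 to 2^32 - 1, the node positions and the server names A, B and C are made up, the ring is assumed to be non-empty, and ring_lookup is a hypothetical helper.

```php
<?php
// Find the server responsible for a position on a toy ring.
// $ring maps node position => server name and must not be empty.
function ring_lookup(array $ring, int $pos): string
{
    ksort($ring);                         // node positions in ascending order
    foreach ($ring as $nodePos => $server) {
        if ($pos <= $nodePos) {
            return $server;               // first node at or after the key
        }
    }
    return reset($ring);                  // past the last node: wrap to the first
}

$ring = [10 => 'A', 40 => 'B', 70 => 'C'];
echo ring_lookup($ring, 25), "\n";        // B
echo ring_lookup($ring, 85), "\n";        // A (wraps around the ring)

unset($ring[40]);                         // server B goes down
echo ring_lookup($ring, 25), "\n";        // C: B's keys fall to the next node
echo ring_lookup($ring, 5), "\n";         // A: keys on other nodes do not move
```

Only the keys that belonged to the removed node change servers. With several nodes per server at scattered positions, those keys are shared among the remaining servers rather than all landing on one of them.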

Algorithm implementation

We use PHP to implement the consistent hash algorithm.

We mainly use the following functions:

int crc32 ( string $str )
Generates the 32-bit cyclic redundancy checksum polynomial of str. This is usually used to check whether transmitted data is intact.

string sprintf ( string $format [, mixed $args [, mixed $… ]] )
Returns a string produced according to the format string format.
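
As a small illustration of how the two functions combine, here is a minimal sketch. The helper name ring_position and the host string are hypothetical and do not come from the original. The '%u' conversion formats the checksum as an unsigned decimal number, which keeps the result in the range 0 to 2^32 - 1 even on builds where integers are 32-bit and signed.

```php
<?php
// Map any string to a position in 0 .. 2^32 - 1, returned as a decimal string.
function ring_position(string $data): string
{
    return sprintf('%u', crc32($data));
}

// '123456789' is the conventional CRC-32 test input (check value 0xCBF43926).
echo ring_position('123456789'), "\n";        // 3421780262

// Position of virtual node 0 of a hypothetical server.
echo ring_position('192.168.1.10_0'), "\n";
```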

The implementation is as follows:

class Consistance
{
    // Number of virtual nodes per server. More nodes spread a downed server's load more evenly, but make lookups slower.
    protected $num=24;
    // Node list of the current server group (ring position => host).
    protected $nodes=array();

    // Compute the hash of a piece of data to determine its position on the ring.
    public function make_hash($data)
    {
        return sprintf('%u',crc32($data));
    }

    // Walk the sorted node list to find the server on which the data is stored or should be stored.
    public function set_loc($data)
    {
        $loc=$this->make_hash($data);
        foreach ($this->nodes as $key => $val)
        {
            if($loc<=$key)
            {
                return $val;
            }
        }
        // The position lies past the last node: wrap around to the first node (false if there are no nodes).
        return reset($this->nodes);
    }

    // Add a server: insert its virtual nodes into the node list of the server group.
    public function add_host($host)
    {
        for($i=0;$i<$this->num;$i++)
        {
            $key=$this->make_hash($host.'_'.$i);
            $this->nodes[$key]=$host;
        }
        ksort($this->nodes);        // Keep the nodes sorted by position so that lookups can scan in order.
    }

    // Remove a server: delete its virtual nodes from the node list of the server group.
    public function remove_host($host)
    {
        for($i=0;$i<$this->num;$i++)
        {
            $key=$this->make_hash($host.'_'.$i);
            unset($this->nodes[$key]);
        }
    }
}

We test with the following code:
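
A minimal sketch of such a test, under stated assumptions: the three host addresses and the key names key_0 to key_999 are hypothetical, 64-bit PHP is assumed, and DemoRing is a compact stand-in for the class above so that the sketch runs on its own. It checks the property described earlier: when a server is removed, only the keys that were stored on it move.

```php
<?php
// Compact stand-in for the consistent hash class, so this sketch is self-contained.
class DemoRing
{
    private $nodes = [];   // ring position => host
    private $num;          // virtual nodes per host

    public function __construct(int $num = 24) { $this->num = $num; }

    public function addHost(string $host): void
    {
        for ($i = 0; $i < $this->num; $i++) {
            $this->nodes[sprintf('%u', crc32($host . '_' . $i))] = $host;
        }
        ksort($this->nodes);
    }

    public function removeHost(string $host): void
    {
        for ($i = 0; $i < $this->num; $i++) {
            unset($this->nodes[sprintf('%u', crc32($host . '_' . $i))]);
        }
    }

    public function locate(string $key): string
    {
        $loc = sprintf('%u', crc32($key));
        foreach ($this->nodes as $pos => $host) {
            if ($loc <= $pos) {
                return $host;
            }
        }
        return reset($this->nodes);   // wrap around to the first node
    }
}

$ring = new DemoRing();
foreach (['192.168.1.1', '192.168.1.2', '192.168.1.3'] as $host) {   // hypothetical hosts
    $ring->addHost($host);
}

// Record where each key lives while all three hosts are up.
$before = [];
for ($i = 0; $i < 1000; $i++) {
    $before["key_$i"] = $ring->locate("key_$i");
}

// Take one host down and count the keys that changed location.
$ring->removeHost('192.168.1.2');
$moved = 0;
$movedFromSurvivors = 0;
foreach ($before as $key => $host) {
    if ($ring->locate($key) !== $host) {
        $moved++;
        if ($host !== '192.168.1.2') {
            $movedFromSurvivors++;
        }
    }
}
printf("%d of %d keys moved; %d of those were on a surviving host\n",
    $moved, count($before), $movedFromSurvivors);
```

Keys stored on the surviving hosts should stay where they were. How evenly the removed host's keys are shared out depends on the number of virtual nodes per host.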

The results are as follows:

Summary

That completes the implementation, but the algorithm can still be optimized. For example, when there are many servers and many nodes per server, the node lookup can be sped up: because the node list is sorted, the target node can be found by binary search instead of a linear scan. Which optimizations are worth making is a matter of judgment.

In addition, although nginx has plug-ins for consistent hashing, Memcache and redis have corresponding plug-ins, and MySQL middleware integrates it as well, it is still worthwhile to understand the consistent hash algorithm. We can also apply it flexibly elsewhere, for example to the distributed management of files.
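
The lookup optimization described above could look like the following minimal sketch. The helper name find_node_index is hypothetical, and the sketch assumes the node positions are kept as an ascending list of integers (for example the sorted keys of the node list on 64-bit PHP). It returns the index of the first node at or after a position, wrapping to index 0 past the last node.

```php
<?php
// Binary search for the first node position >= $loc.
// $positions must be a non-empty list of integers in ascending order.
function find_node_index(array $positions, int $loc): int
{
    $lo = 0;
    $hi = count($positions);              // search window is [lo, hi)
    while ($lo < $hi) {
        $mid = intdiv($lo + $hi, 2);
        if ($positions[$mid] < $loc) {
            $lo = $mid + 1;               // answer lies to the right of mid
        } else {
            $hi = $mid;                   // mid itself may be the answer
        }
    }
    // No node at or after $loc: wrap around to the first node.
    return $lo === count($positions) ? 0 : $lo;
}

$positions = [10, 40, 70];
echo find_node_index($positions, 25), "\n";   // 1
echo find_node_index($positions, 70), "\n";   // 2
echo find_node_index($positions, 85), "\n";   // 0 (wrap-around)
```

Each lookup then takes a number of steps logarithmic in the node count rather than linear, which matters when many servers each contribute many nodes.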
