Redis dictionary implementation


4.1 dictionary data structure

typedef struct dict {
    // type-specific functions
    dictType *type;
    // private data
    void *privdata;
    // hash tables
    dictht ht[2];
    // rehash index; its value is -1 when no rehash is in progress
    int rehashidx;
} dict;
  • The type attribute is a pointer to a dictType structure. Each dictType structure holds a cluster of functions for operating on key-value pairs of a specific type; Redis sets different type-specific functions for dictionaries that serve different purposes.
typedef struct dictType {
    // compute the hash value of a key
    unsigned int (*hashFunction)(const void *key);
    // duplicate a key
    void *(*keyDup)(void *privdata, const void *key);
    // duplicate a value
    void *(*valDup)(void *privdata, const void *obj);
    // compare two keys
    int (*keyCompare)(void *privdata, const void *key1, const void *key2);
    // destroy a key
    void (*keyDestructor)(void *privdata, void *key);
    // destroy a value
    void (*valDestructor)(void *privdata, void *obj);
} dictType;
  • The privdata attribute holds optional arguments that are passed to those type-specific functions.
  • The ht attribute is an array of two items, each of which is a dictht hash table. Normally the dictionary uses only the ht[0] hash table; ht[1] is used only while ht[0] is being rehashed.
// The hash table used by the Redis dictionary is represented by dictht.
// Redis dictionaries use hash tables as their underlying implementation.
// A hash table can hold multiple hash table nodes, and each node stores
// one key-value pair of the dictionary.
typedef struct dictht {
    // hash table array
    dictEntry **table;
    // hash table size
    unsigned long size;
    // size mask, used to compute index values; always equal to size - 1
    unsigned long sizemask;
    // number of nodes currently in the hash table
    unsigned long used;
} dictht;
// Hash table nodes are represented by dictEntry; each dictEntry stores one key-value pair.
typedef struct dictEntry {
    // key
    void *key;
    // value
    union {
        void *val;
        uint64_t u64;
        int64_t s64;
    } v;
    // points to the next hash table node, forming a linked list that resolves hash collisions
    struct dictEntry *next;
} dictEntry;
  • The rehashidx attribute records the current progress of rehashing; its value is -1 when no rehash is in progress.


4.2 hash algorithm

When a new key-value pair is added to the dictionary, the program first computes the hash value and index value from the pair's key, then, based on the index value, places the hash table node containing the pair at the corresponding slot of the hash table array.

Redis calculates hash and index values as follows:

# Use the hash function set for the dictionary to compute the hash value
hash = dict->type->hashFunction(key)

# Use the hash table's sizemask attribute and the hash value to compute the index
# Depending on the situation, ht[x] may be either ht[0] or ht[1]
index = hash & dict->ht[x].sizemask

When a dictionary serves as the underlying implementation of the database, or as the underlying implementation of the hash key type, Redis uses the MurmurHash2 algorithm to compute hash values.

4.3 resolving key conflicts

Redis uses separate chaining to resolve key conflicts. Each hash table node has a next pointer, and multiple nodes in the same bucket are linked into a singly linked list through it. Because the list of dictEntry nodes keeps no pointer to its tail, new nodes are always added at the head of the list for speed (O(1) complexity), placing them ahead of the existing nodes. For example, a newly added pair (k2, v2) would sit in front of an existing (k1, v1).

4.4 rehash

As operations are performed, the number of key-value pairs stored in the hash table gradually grows or shrinks. To keep the hash table's load factor within a reasonable range, the table is expanded or contracted when it holds too many or too few key-value pairs.

Both expansion and contraction of the hash table are carried out through rehashing. The specific steps are as follows:

  1. Allocate space for the dictionary's ht[1]. The size of this hash table depends on the operation to be performed and on the number of key-value pairs currently held in ht[0] (that is, the ht[0].used attribute):
    1. Expansion: the size of ht[1] is the first 2^n greater than or equal to ht[0].used * 2.
    2. Contraction: the size of ht[1] is the first 2^n greater than or equal to ht[0].used.
  2. Rehash all key-value pairs saved in ht[0] into ht[1]: rehashing means recomputing each key's hash value and index value, then placing the key-value pair at the corresponding position in ht[1].
  3. Once all key-value pairs in ht[0] have been migrated to ht[1], release ht[0], make ht[1] the new ht[0], and create a new blank hash table at ht[1], ready for the next rehash.

4.5 progressive rehash

To avoid the performance impact that rehashing a huge number of key-value pairs at once would have on the server (the computation involved can be enormous), the server does not move everything from ht[0] to ht[1] in a single step; instead, it migrates the key-value pairs gradually, over multiple steps.

Steps of progressive rehash:

  1. Allocate space for ht[1], so that the dictionary holds both ht[0] and ht[1] at the same time.
  2. Maintain an index counter variable rehashidx in the dictionary and set it to 0, marking the start of the rehash.
  3. During the rehash, every add, delete, find, or update performed on the dictionary additionally rehashes all key-value pairs in the ht[0] bucket at index rehashidx over to ht[1]; when that bucket has been moved, the program increments rehashidx by one.
  4. Eventually all of ht[0] has been rehashed to ht[1], at which point the program sets rehashidx back to -1 to mark the rehash as complete.

Progressive rehashing spreads the rehash work across every add, delete, find, and update operation, avoiding the problems that a single, centralized rehash would cause.
