Detailed explanation of SQL server2014 hash index principle

Time:2020-11-25

When a key value pair is passed to a hash function, after the calculation of the hash function, the key value pair will be placed in the appropriate hash buckets according to the results

Take a chestnut

Let’s assume that the modulus of 10 (% 10) is the hash function. If the key of the key value pair is 1525 and passed to the hash function, 1525 will be stored in the fifth bucket

Because 5 as 1525% 10 = 5.

Similarly, 537 will be stored in the seventh bucket, 2982 will be stored in the second bucket, and so on

Similarly, in the hash index, the hash index column will be passed to the hash function for matching (similar to the map operation of HashMap in Java). After the matching is successful, the,

The index column will be stored in a table in the matched hash bucket. This table will have the actual data row pointer, and then find the corresponding data row according to the actual data row pointer.

In summary, to find a row of data or process a where clause, the SQL server engine needs to do the following

1. Generate the appropriate hash function according to the parameters in the where condition

2. The index column is matched to the corresponding hash bucket. Finding the corresponding hash bucket means that the corresponding row pointer is also found

3. Read data

Hash index is simpler than b-tree index, because it does not need to traverse B-tree, so the access speed is faster

 

Examples of hash functions and corresponding syntax


CREATE TABLE dbo.HK_tbl
 (
  [ID] INT IDENTITY(1, 1)
    NOT NULL
    PRIMARY KEY NONCLUSTERED HASH WITH ( BUCKET_COUNT = 100000 ) ,
  [Data] char(32) COLLATE Latin1_General_100_BIN2
      NULL ,
  [dt] datetime NOT NULL,
 )
 WITH (
   MEMORY_OPTIMIZED =
   ON,
   DURABILITY =
   SCHEMA_AND_DATA);

In SQL Server 2014, the hash index cannot be added after the memory optimization table is created. However, in SQL Server 2016, the hash index can be added after the table is created

Adding a hash index is an offline operation.

Number of buckets in hash index

( BUCKET_ Count = 100000) defines the number of buckets that the hash index can use. This bucket is fixed and the number of buckets is specified by the user,

Instead of the number of buckets generated by SQL server during query execution. The number of buckets is always rounded to the power of 2 (1024, 2048, 4096, etc.)

In fact, the hash index of SQL Server 2014 is similar to the adaptive hash index principle of MySQL, both of which are to get rid of the shackles of B-tree and make the search efficiency faster

How does a relational database work also describes the principle of hash join