HBase reading notes – data structure


Skip list

Skip lists are widely used in KV databases: Redis, LevelDB, and HBase all use the skip list as a basic data structure for maintaining ordered data sets.
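Property 1: the probability that a node reaches the k-th layer is $p^{k-1}$, where p is the probability that a node is promoted one layer up.
Property 2: a skip list with n elements in its lowest-level linked list holds $n(1-p^k)/(1-p)$ elements in total (in expectation), where k is the height of the skip list.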
Property 3: the height of a skip list is O(log n).
Property 4: the query time complexity of a skip list is O(log n).
Property 5: the insertion / deletion time complexity of a skip list is O(log n).
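
To make the properties above concrete, here is a minimal skip list sketch in Java. It is not the implementation used by HBase, Redis, or LevelDB; the class name SkipListSketch, the height cap MAX_LEVEL, and the promotion probability P = 0.5 are assumptions chosen for illustration. Search, insert, and remove all descend from the top layer, which is where the O(log n) behaviour comes from.

```java
import java.util.concurrent.ThreadLocalRandom;

/** Minimal skip list of int keys; an illustrative sketch only. */
final class SkipListSketch {
    private static final int MAX_LEVEL = 16;  // assumed cap on the height
    private static final double P = 0.5;      // assumed promotion probability p

    private static final class Node {
        final int key;
        final Node[] next;                    // next[i] is the successor on layer i
        Node(int key, int level) {
            this.key = key;
            this.next = new Node[level];
        }
    }

    private final Node head = new Node(Integer.MIN_VALUE, MAX_LEVEL);
    private int height = 1;                   // number of layers currently in use

    /** A node reaches layer k with probability P^(k-1), capped at MAX_LEVEL. */
    private static int randomLevel() {
        int level = 1;
        while (level < MAX_LEVEL && ThreadLocalRandom.current().nextDouble() < P) {
            level++;
        }
        return level;
    }

    /** Fills update[i] with the last node on layer i whose key is smaller than key. */
    private Node[] findPredecessors(int key) {
        Node[] update = new Node[MAX_LEVEL];
        Node cur = head;
        for (int i = height - 1; i >= 0; i--) {
            while (cur.next[i] != null && cur.next[i].key < key) {
                cur = cur.next[i];
            }
            update[i] = cur;
        }
        return update;
    }

    /** Returns true if key is present. */
    boolean contains(int key) {
        Node candidate = findPredecessors(key)[0].next[0];
        return candidate != null && candidate.key == key;
    }

    /** Inserts key if absent; returns false when it was already present. */
    boolean insert(int key) {
        Node[] update = findPredecessors(key);
        Node existing = update[0].next[0];
        if (existing != null && existing.key == key) {
            return false;
        }
        int level = randomLevel();
        for (int i = height; i < level; i++) {
            update[i] = head;                 // new layers start at the head node
        }
        height = Math.max(height, level);
        Node node = new Node(key, level);
        for (int i = 0; i < level; i++) {
            node.next[i] = update[i].next[i];
            update[i].next[i] = node;
        }
        return true;
    }

    /** Unlinks key from every layer it appears on; returns false if absent. */
    boolean remove(int key) {
        Node[] update = findPredecessors(key);
        Node target = update[0].next[0];
        if (target == null || target.key != key) {
            return false;
        }
        for (int i = 0; i < target.next.length; i++) {
            update[i].next[i] = target.next[i];
        }
        return true;
    }
}
```

For example, after inserting 5, 1, 9, 3 and 7, contains(7) returns true and contains(4) returns false, whatever levels the random draws happened to assign.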


Like the B+ tree, the LSM tree is essentially an index structure for data on disk. Unlike the B+ tree, however, the LSM tree index is friendlier to write requests.
An LSM tree index generally consists of two parts: a memory part and a disk part. The memory part generally uses a skip list to maintain an ordered set of KeyValues. The disk part generally consists of multiple files, each of which stores its KeyValues in sorted order.

An LSM tree stores a collection of KeyValues. Each KeyValue is generally represented as a byte array.

  • Properties:
  • The binary content of the key.
  • A 64-bit long value representing the version number, which corresponds to the timestamp in HBase. The version number usually reflects the order in which the data was written: the larger the version number, the higher the priority with which the data is returned to the reader. Some policies even expire data with smaller version numbers (HBase has a TTL policy).
  • What the LSM tree stores is not the data itself but operation records. This is the meaning of "log" in the name LSM tree (log-structured merge tree): an operation log.
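
A small sketch may help show how the version number and the "operation record" idea interact. The class CellOrderSketch, the Cell type, and the PUT / DELETE operation names are hypothetical and only serve the illustration; this is not HBase's KeyValue layout. Records are ordered by key bytes ascending and then by version descending, so the first record found for a key is the newest operation on it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class CellOrderSketch {
    /** Hypothetical operation types; the names are assumptions. */
    enum Op { PUT, DELETE }

    /** One operation record: key bytes, 64-bit version, operation, value bytes. */
    static final class Cell {
        final byte[] key;
        final long version;
        final Op op;
        final byte[] value;

        Cell(byte[] key, long version, Op op, byte[] value) {
            this.key = key;
            this.version = version;
            this.op = op;
            this.value = value;
        }
    }

    /** Key bytes ascending, then version descending (newest record first). */
    static final Comparator<Cell> ORDER = (a, b) -> {
        int byKey = Arrays.compareUnsigned(a.key, b.key);
        return byKey != 0 ? byKey : Long.compare(b.version, a.version);
    };

    /** Returns the newest record for key, or null if the key has no records. */
    static Cell newest(List<Cell> cells, byte[] key) {
        List<Cell> sorted = new ArrayList<>(cells);
        sorted.sort(ORDER);
        for (Cell cell : sorted) {
            if (Arrays.equals(cell.key, key)) {
                return cell;
            }
        }
        return null;
    }
}
```

Under this ordering a delete is just another record: if the newest record for a key is a DELETE, a reader would treat the key as absent even though older PUT records still exist.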

2. Multi-way merge
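
A multi-way merge combines several already-sorted inputs into one sorted output. The sketch below uses in-memory lists of strings as stand-ins for the ordered files of the disk part; the class name MultiWayMergeSketch and the Cursor helper are hypothetical. A min-heap holds the current head of each input, so each output element costs O(log m) for m inputs.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

final class MultiWayMergeSketch {
    /** Pairs the current head element of one sorted input with the rest of it. */
    private static final class Cursor {
        String head;
        final Iterator<String> rest;

        Cursor(Iterator<String> it) {
            this.rest = it;
            this.head = it.next();
        }
    }

    /** Merges inputs that are each sorted ascending into one ascending list. */
    static List<String> merge(List<List<String>> sortedInputs) {
        PriorityQueue<Cursor> heap =
            new PriorityQueue<>((a, b) -> a.head.compareTo(b.head));
        for (List<String> input : sortedInputs) {
            Iterator<String> it = input.iterator();
            if (it.hasNext()) {
                heap.add(new Cursor(it));     // skip empty inputs
            }
        }
        List<String> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            Cursor smallest = heap.poll();    // input with the smallest head
            out.add(smallest.head);
            if (smallest.rest.hasNext()) {
                smallest.head = smallest.rest.next();
                heap.add(smallest);           // re-insert with its new head
            }
        }
        return out;
    }
}
```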


  1. Index structure of LSM tree
    Memory part and disk part
  • The memory part is a ConcurrentSkipListMap whose key is the key part described above and whose value is a byte array. When data is written, it goes directly into the MemStore.
    As writes continue, once memory usage exceeds a certain threshold, the data in the memory part is exported as an ordered data file and stored on disk.
  • Exporting the memory part as an ordered data file is called a flush. To keep the flush from hurting write performance, the MemStore currently being written is first turned into a snapshot, and no new writes are allowed into the snapshot. A new memory area is then opened as the MemStore for subsequent writes. Once the snapshot MemStore has been written to disk, its memory can be released. Using two MemStores in this way keeps write performance stable.
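
The snapshot-and-swap step can be sketched as follows. This is a simplified model, not HBase's MemStore code: the class MemStoreSketch, its String keys, the rough size accounting, and the in-memory flushedFiles list standing in for disk files are all assumptions. The flush also runs synchronously here, whereas the point of the two-MemStore design is that a real system can flush the snapshot in the background while writes continue into the new map.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentSkipListMap;

final class MemStoreSketch {
    private final long flushThresholdBytes;   // assumed flush trigger
    private ConcurrentSkipListMap<String, byte[]> active = new ConcurrentSkipListMap<>();
    private ConcurrentSkipListMap<String, byte[]> snapshot = null;
    private long activeBytes = 0;

    /** Stand-in for files on disk: each flush appends one sorted list of keys. */
    final List<List<String>> flushedFiles = new ArrayList<>();

    MemStoreSketch(long flushThresholdBytes) {
        this.flushThresholdBytes = flushThresholdBytes;
    }

    /** Writes go to the active map; crossing the threshold triggers a flush. */
    synchronized void put(String key, byte[] value) {
        active.put(key, value);
        activeBytes += key.length() + value.length;   // rough size estimate
        if (activeBytes > flushThresholdBytes && snapshot == null) {
            snapshot = active;                        // freeze the current map
            active = new ConcurrentSkipListMap<>();   // new writes go here
            activeBytes = 0;
            flushSnapshot();
        }
    }

    /** Exports the frozen map in key order, then releases its memory. */
    private void flushSnapshot() {
        flushedFiles.add(new ArrayList<>(snapshot.keySet()));
        snapshot = null;
    }

    /** Number of entries still held in the active map. */
    synchronized int activeSize() {
        return active.size();
    }
}
```

Because ConcurrentSkipListMap keeps its keys sorted, the exported list is already ordered and no extra sort is needed at flush time.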


Bloom filter

A hash index built on disk and memory can certainly solve this problem. A lower-cost alternative is to use a Bloom filter.
HBase’s get operation uses a cheap and efficient Bloom filter to rule out a large number of data blocks that cannot contain the requested key, which saves a lot of disk IO.
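
The idea can be sketched in a few lines. This is not HBase's Bloom filter implementation: the class BloomFilterSketch, the double-hashing scheme, and the choice of hash functions are assumptions for illustration. The key property is that a "no" answer is always correct, so the block can be skipped without touching disk, while a "yes" answer may be a false positive and still needs a real lookup.

```java
import java.util.Arrays;
import java.util.BitSet;

final class BloomFilterSketch {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    BloomFilterSketch(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    /** 32-bit FNV-1a hash, used as the second hash function. */
    private static int fnv1a(byte[] data) {
        int h = 0x811C9DC5;
        for (byte b : data) {
            h ^= (b & 0xff);
            h *= 0x01000193;
        }
        return h;
    }

    /** i-th probe position via double hashing h1 + i * h2 (an assumed scheme). */
    private int position(byte[] key, int i) {
        int h1 = Arrays.hashCode(key);
        int h2 = fnv1a(key);
        return Math.floorMod(h1 + i * h2, numBits);
    }

    /** Sets every probe position for key. */
    void add(byte[] key) {
        for (int i = 0; i < numHashes; i++) {
            bits.set(position(key, i));
        }
    }

    /** false means definitely absent; true means possibly present. */
    boolean mightContain(byte[] key) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(position(key, i))) {
                return false;
            }
        }
        return true;
    }
}
```

In a read path, one such filter per data file or block would be consulted first, and only blocks whose filter answers true would be read from disk.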