Collation of database index knowledge points

Time:2022-5-13

First knowledge index

Concept of index

Index is an auxiliary storage structure defined on the basis of storage table, which helps to quickly locate the required records without checking all records. It is composed of a series of index entries stored on disk, and each index entry is composed of two parts. Index field and row pointer.

Index field

It is formed by concatenating the values of some columns in a table, usually one column. Each value of the index field is usually stored in the index.

Row pointer

Points to the storage location on disk of records in the table that contain index field values.

The file that stores the index entries is the index file, and the storage table is called the main file.

Index file organization

(in contrast, the main file organization includes heap file, sort file, hash file, cluster file, etc.)

Sort index file: organize storage according to a certain order of index field values

Hash index file: it is stored by using hash function to allocate hash bucket according to index field value

Function of index

Create different index files for different attributes or attribute combinations on a table, and the index field value can be the value of any attribute or combination of attribute values in the table;

The index file is much smaller than the main file. By retrieving a small index file (which can be completely loaded into memory), quickly locate it, and then read the relevant records in the very large main file;

When there is an index, the update operation must synchronously update the index file and the main file. Maintain its data consistency.

Index creation and maintenance in SQL language

Basic knowledge

After defining a table, if a primary key is defined, the system automatically generates a primary index;

The index can be defined or revoked by the user;

After the index is created, the DBMS will automatically maintain all indexes, whether it is a primary index or a user-defined index;

When a table is deleted, all indexes defined on the table are automatically revoked.

Create, undo index


CREATE INDEX idxSname ON Student(Sname);
DROP INDEX idxSname;

Dense index and sparse index

Dense index

For each record in the master file (each index field value formed), there is an index item corresponding to it, indicating the location of the record. Such an index is called a dense index. (dense index)

Sparse index

For some records (formed index field values) in the main file, there are index entries corresponding to them. Such an index is called non dense index or sparse index.

How do sparse indexes locate records

Locate the record with the index field value of K

First, find the index item corresponding to the adjacent maximum index field value less than K; The table is retrieved sequentially from the record corresponding to the index item

Usage requirements of sparse indexThe storage master file must be sorted according to the attribute of the corresponding index field:

Compared with dense index: less space, lighter maintenance tasks, but slower

balance: the index item does not point to the record pointer, but to the storage block where the record is located, that is, each storage block has an index item, rather than each record has an index item – the main index

How do dense indexes locate records

1. The dense indexes of candidate key attributes can be matched one by one

2. For the dense index of non candidate key attributes, the main file is sorted according to the index field value, and the index item is established for each non repeated first index field value, and the same index field value is searched nearby;

3. For the dense index of non candidate key attributes, the main file is not sorted according to the index field value. Do not require the index field in the index item to be unique and can appear repeatedly, so as to point to the corresponding index field value in the main file;

4. For the dense index of non candidate key attributes, the main file is not sorted according to the index field value. If the index field in the index is required to be unique, an intermediate layer, pointer bucket, can be introduced; This pointer bucket is the third case.

Primary index

Primary index concept

Usually, there is an index entry for each storage block. The total number of index entries is the same as the number of storage blocks occupied by the storage table. The first record of each storage block of the storage table is also called anchor record, which is referred to as block anchor for short.

The index field value of the primary index is the index field value of the block anchor, and the pointer points to the storage block where it is located.

The primary index is an ordered file sorted by the value of the index field. It is usually based on the sorting field based on the primary code of the ordered primary file, that is, the index field of the primary index has a corresponding relationship with the sorting code (primary code) of the primary file.

The primary index is a sparse index.

Secondary index

Secondary index definition

Is an auxiliary storage structure defined on any or more non sorting fields of the main file; Usually, there is an index entry for different values on a non sorted field. The index field is the different values of the field, and the pointer points to the block containing the record or the record itself;

When the non sorting field is an index field, if the field value is not unique, a linked list like structure shall be adopted to save the positions of all records of the field value;

The secondary index is a dense index, and the retrieval speed is sometimes quite high

Differences and relations between primary index and secondary index

A master file has only one primary index, but can have multiple secondary indexes;

The main index is usually based on the main code or sorting code; The auxiliary index is based on non sorting fields;

The primary index can be used to reorganize the data of the primary file, but the secondary index cannot change the data of the primary file;

The primary index is a sparse index and the secondary index is a dense index.

Clustered index and non clustered index

Clustered index

It refers to that the adjacent records in the index are also stored adjacent in the main file;

Non clustered index

It refers to that the adjacent records in the index are not necessarily stored adjacent in the main file

be careful:

If a sorting field of the master file is not the master code, the value of each record in this field is not unique. At this time, this field is called clustering field; Cluster index is usually defined on the cluster field; Cluster index usually has an index entry for each different value on the cluster field (the total number of index entries is the same as the number of different values on the cluster field in the main file). The index field is the different value of the cluster field. Since the records with the same cluster field value may be stored in several blocks, the pointer of the index entry points to the first block. A master file can only have one clustered index file, but it can have multiple non clustered index files. The primary index is usually a clustered index (but the total number of index entries is not necessarily the same as the number of different values on the clustered field in the primary file, which is the same as the number of storage blocks in the primary file); Secondary indexes are usually non clustered indexes. The primary index / cluster index is an index that can determine the storage location of records; Non clustered indexes can only be used for queries, indicating the location of stored records.

Inverted index

Inverted index is a specific storage form to realize “word document matrix”. Through inverted index, you can quickly obtain the document list containing this word according to the word. The inverted index is mainly composed of two parts: “word dictionary” and “inverted file”.

Lexicon: the common index unit of search engine is word. The word dictionary is a string set composed of all words in the document set. Each index item in the word dictionary records some information of the word itself and a pointer to the “inverted list”.

PostingList: the inverted list records the document list of all documents in which a word has appeared and the location information of the word in the document. Each record is called a postingitem. According to the inverted list, you can know which documents contain a word.

Inverted file: the inverted list of all words is often sequentially stored in a file on the disk. This file is called inverted file. Inverted file is the physical file that stores the inverted index.

Multilevel index

When there are many index items, the index can be re indexed, which is called multi-level index.

Common multi-level indexes: B-tree / B + tree index

Multi attribute index

An index field is an index formed by the combination of multiple attribute values of a table

Hash index

Index hashing Technology

Grid index

Cross joint location and retrieval using multiple index fields

B + tree index

definition

A multi-level index that organizes index items in a tree data structure

Because a storage block can store multiple index items, each index item is composed of pointer and index field. Ki represents the index field value, PI represents the pointer, and points to the index block or data block or records in the data block.

Generally, n-1 index entries and 1 pointer can be stored in a block.

B + tree features

  • It can automatically maintain the hierarchy of the tree corresponding to the size of the main file
  • The pointer utilization of each index block is 50% ~ 100%

The index field value x is pointed by PI when Ki-1 < = x < Ki, and by PI + 1 when ki < = x < ki + 1.

What does the pointer of leaf node and leaf node point to respectively

The non leaf node pointer points to the index block, and the leaf node pointer points to the data block or data record of the main file

The last pointer of the leaf node points to the next data block

The number of index pointers actually used by an index block, D, meets the requirements (except for the root node)

n/2<=d<=n

At least 2 pointers of the root node are used

B + tree storage Convention

Index field values appear repeatedly in leaf nodes and non leaf nodes

The pointer to the main file only appears in the leaf node

All leaf nodes can overwrite the indexes of all key values

Index field values are arranged in order in leaf nodes

Only the set of leaf nodes is the complete index of the main file

This is the end of this article on sorting out the knowledge points of database index. For more relevant database index contents, please search the previous articles of developeppaer or continue to browse the relevant articles below. I hope you will support developeppaer in the future!