The boss asked me: “Why do you add an auto-increment ID when creating a table? Why not just use the serial number as the primary key?”

Time: 2022-1-14

It was time to start a new project again. I went at it full throttle: sorting out the workflow and drawing the diagrams. Sure enough, next came walking everyone through the process and the table structures.

Me: (explaining away)
Boss: Why did you add an auto-increment ID to this table? Why not just use the serial number (user number / product number) as the primary key?
Me: That's the DBA's rule. Every table must have the three fields id, create_time and update_time. The Java development guidelines require it too.
Teammate: (chiming in) Yes, that's the rule!
Boss: Isn't the serial number a unique index anyway? Make it the primary key, drop the id, and you save one lookup back to the table, don't you?
Me: … (He has a point. None of us dared to say anything.)
Boss: Fine, it's a rule, but what is the reason behind it? Why design it with an auto-increment ID?
Me: (pulls out a little notebook: time to go back and look it up~)

Table creation conventions

At work, whenever we create a table, the DBA reviews its CREATE TABLE statement to check that it follows the conventions and that indexes are set on commonly queried fields. A typical template looks like this:

CREATE TABLE `xxxx` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT COMMENT 'Auto increment primary key',
  `create_time` datetime(3) NOT NULL DEFAULT CURRENT_TIMESTAMP(3) COMMENT 'creation time',
  `update_time` datetime(3) NOT NULL DEFAULT CURRENT_TIMESTAMP(3) ON UPDATE CURRENT_TIMESTAMP(3) COMMENT 'update time',
  PRIMARY KEY (`id`) USING BTREE,
  KEY `idx_create_time` (`create_time`) USING BTREE,
  KEY `idx_update_time` (`update_time`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=5 DEFAULT CHARSET=utf8mb4 COMMENT='table comment';

So in practice I always put the serial number in a separate field, such as trans_no. This time, though, the question came up: since trans_no is unique, why not use trans_no directly as the primary key instead of id?

Let's work out why, step by step, by consulting the relevant documentation.

Primary key

What is a primary key

https://dev.mysql.com/doc/ref…

In this definition, we mainly focus on the last sentence:

When choosing primary key values, consider using arbitrary values (a synthetic key) rather than relying on values derived from some other source (a natural key).

In other words, when choosing a primary key, prefer a synthetic value such as a MySQL auto-increment ID over a value generated by the business (a natural key).

Characteristics of a primary key

In short:

Non-null, unique, and rarely or never changed.
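The first two properties can be checked mechanically against a snapshot of data. Below is a minimal Python sketch of such a check; the helper name check_primary_key_candidate and the idea of passing a column's values as a plain list are hypothetical, for illustration only. The third property (rarely changes) is a design decision that no data snapshot can prove.

```python
from collections import Counter


def check_primary_key_candidate(values):
    """Return the problems that rule out `values` as a primary key column.

    Hypothetical helper: only the non-null and unique properties are checked,
    because "rarely changes" cannot be seen in a single snapshot of data.
    """
    problems = []
    if any(v is None for v in values):
        problems.append("contains NULL")
    duplicates = [v for v, n in Counter(values).items() if v is not None and n > 1]
    if duplicates:
        problems.append(f"duplicate values: {sorted(duplicates)}")
    return problems


print(check_primary_key_candidate([1, 2, 3]))        # []
print(check_primary_key_candidate([1, None, 3, 3]))  # ['contains NULL', 'duplicate values: [3]']
```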

How to add a primary key

You can specify it when creating the table, or add it later with an ALTER TABLE statement; the official documentation recommends specifying it when the table is created.

Why add a primary key

  1. The primary key uniquely identifies a row, which ensures that a DELETE or UPDATE affects only that row.
  2. For indexing: every InnoDB table has a special index, the clustered index, which stores the row data. Usually the clustered index is synonymous with the primary key.

    1. If you declare a primary key, InnoDB uses it as the clustered index.
    2. If no primary key is declared, InnoDB uses the first UNIQUE index whose key columns are all NOT NULL as the clustered index.
    3. If there is neither a primary key nor a suitable UNIQUE index, InnoDB internally generates a hidden clustered index named GEN_CLUST_INDEX on a hidden 6-byte row ID whose values increase monotonically.
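The three rules above can be read as a small decision procedure. The following Python sketch models it; the Table and Index classes are hypothetical stand-ins for a table definition, and treating "first" as declaration order is a simplifying assumption rather than a statement about InnoDB internals.

```python
from dataclasses import dataclass, field


@dataclass
class Index:
    name: str
    columns: list
    unique: bool = False


@dataclass
class Table:
    columns: dict  # column name -> True if the column is declared NOT NULL
    primary_key: list = field(default_factory=list)  # empty: no primary key declared
    indexes: list = field(default_factory=list)      # assumed to be in declaration order


def choose_clustered_index(table):
    """Simplified model of how InnoDB picks the clustered index."""
    # Rule 1: a declared primary key becomes the clustered index.
    if table.primary_key:
        return "PRIMARY"
    # Rule 2: otherwise, the first UNIQUE index whose key columns are all NOT NULL.
    for idx in table.indexes:
        if idx.unique and all(table.columns[c] for c in idx.columns):
            return idx.name
    # Rule 3: otherwise, a hidden index on a 6-byte, monotonically increasing row ID.
    return "GEN_CLUST_INDEX"


t = Table(
    columns={"trans_no": True, "note": False},
    indexes=[Index("uk_note", ["note"], unique=True),
             Index("uk_trans_no", ["trans_no"], unique=True)],
)
print(choose_clustered_index(t))  # uk_trans_no (uk_note is skipped: note is nullable)
```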

Figure: what is an index

Indexes

Only the InnoDB engine is covered here, and only briefly; see the official documentation for details.

Index classification

  1. Clustered index: the table's storage is organized by the values of the primary key column, which speeds up queries and sorting that involve the primary key. The clustered index was already introduced in the primary key section above.
  2. Secondary index: also called an auxiliary index. Each secondary index record stores the secondary index columns plus the corresponding primary key columns. A lookup through a secondary index first finds the primary key value there, then searches the clustered index with that primary key. A long primary key is generally discouraged, because every secondary index then takes more space.
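Because every secondary index entry carries a copy of the primary key, the key length is multiplied by the row count in each secondary index. A back-of-the-envelope Python sketch of that effect, under stated assumptions: record headers, page headers and free space are ignored, the table has 10 million rows, the secondary index is idx_create_time on a DATETIME(3) column (5 bytes plus 2 bytes of fractional seconds), and a 32-character serial number is assumed to occupy 32 bytes.

```python
def secondary_index_bytes(rows, secondary_key_bytes, primary_key_bytes):
    """Rough size of one secondary index's leaf entries.

    Each entry stores the secondary key columns plus a copy of the primary key.
    Record headers, page headers and free space are ignored (simplification).
    """
    return rows * (secondary_key_bytes + primary_key_bytes)


ROWS = 10_000_000
DATETIME3 = 7   # DATETIME(3): 5 bytes + 2 bytes for fractional seconds
BIGINT_PK = 8   # auto-increment BIGINT id
SERIAL_PK = 32  # assumption: a 32-character serial number stored in 32 bytes

small = secondary_index_bytes(ROWS, DATETIME3, BIGINT_PK)
large = secondary_index_bytes(ROWS, DATETIME3, SERIAL_PK)
print(f"BIGINT id:      {small / 2**20:.0f} MiB per secondary index")
print(f"32-char serial: {large / 2**20:.0f} MiB per secondary index")
```

Under these assumptions the serial-number version is 2.6 times larger (39 bytes versus 15 bytes per entry), and the penalty repeats for every secondary index on the table.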

Supplement:

Back to table: first look up the primary key value in the secondary index, then fetch the row from the clustered index using that primary key.
Index coverage (covering index): a secondary index records the secondary index columns and the primary key columns. If a query only needs those columns, there is no need to go back to the table.
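The difference between the two can be shown with a toy model in Python: one dict plays the clustered index (primary key to full row) and another plays a secondary index on trans_no (trans_no to primary key). The data and the lookup helper are hypothetical; real InnoDB indexes are B+ trees, not hash maps.

```python
# Toy model: the clustered index maps primary key -> full row,
# the secondary index maps the indexed column -> primary key.
clustered = {
    1: {"id": 1, "trans_no": "T001", "amount": 100},
    2: {"id": 2, "trans_no": "T002", "amount": 250},
}
idx_trans_no = {row["trans_no"]: pk for pk, row in clustered.items()}


def lookup(trans_no, wanted_columns):
    """Return (result, went_back_to_table) for a lookup by trans_no."""
    pk = idx_trans_no[trans_no]                  # step 1: search the secondary index
    covered = {"id": pk, "trans_no": trans_no}   # columns the secondary index already holds
    if set(wanted_columns) <= set(covered):
        # Covering index: every requested column is in the secondary index.
        return {c: covered[c] for c in wanted_columns}, False
    row = clustered[pk]                          # step 2: back to table via the clustered index
    return {c: row[c] for c in wanted_columns}, True


print(lookup("T002", ["id"]))            # ({'id': 2}, False)
print(lookup("T002", ["id", "amount"]))  # ({'id': 2, 'amount': 250}, True)
```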

Physical structure of the index

InnoDB uses a B+ tree, built on the clustered index value (the primary key, a UNIQUE NOT NULL index, or the internally generated row ID). The leaf nodes store the row records, so each leaf node is also called a data page. The default page size is 16 KB; it can be changed through innodb_page_size when the MySQL instance is initialized.
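A common back-of-the-envelope estimate shows why such a tree stays shallow and why key size matters. This Python sketch assumes a BIGINT (8-byte) key, a 6-byte child page pointer per non-leaf entry, and 1 KB rows, and it ignores page headers, record headers and the reserved free space, so the numbers are upper bounds rather than measurements.

```python
PAGE_SIZE = 16 * 1024  # default innodb_page_size: 16 KB
KEY_BYTES = 8          # BIGINT primary key
POINTER_BYTES = 6      # assumption: child page pointer size in a non-leaf entry
ROW_BYTES = 1024       # assumption: average row size of 1 KB

fanout = PAGE_SIZE // (KEY_BYTES + POINTER_BYTES)  # entries per non-leaf page
rows_per_leaf = PAGE_SIZE // ROW_BYTES             # rows per leaf (data) page


def max_rows(height):
    """Rows a B+ tree of this height can hold, ignoring all overhead."""
    return fanout ** (height - 1) * rows_per_leaf


for h in (1, 2, 3):
    print(h, max_rows(h))  # 1 16 / 2 18720 / 3 21902400
```

With these assumptions a three-level tree holds about 21.9 million rows. If the key were a 32-byte serial number instead, the fanout would drop from 1170 to 16384 // 38 = 431 entries per page, so the same height covers far fewer rows.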

Data insertion

When inserting records, InnoDB tries to leave 1/16 of each index page free for future inserts and updates of index records.

  1. Sequential insertion (ascending or descending): the resulting index pages are about 15/16 full.
  2. Random insertion: the pages are only 1/2 to 15/16 full.

With random insertion, records are moved around and pages split frequently, which produces a lot of fragmentation and a less compact index tree. With sequential insertion, the data is stored more compactly and space utilization is higher.
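The fill-factor difference can be reproduced with a toy simulation of a B+ tree's leaf level in Python. It is a simplified model, not InnoDB's actual algorithm: a page holds a fixed number of records (the 1/16 reserve is not modeled), an insert at the far right end of the index starts a fresh page, and any other overflow splits the page in half.

```python
import bisect
import random


def leaf_fill_ratio(keys, capacity=16):
    """Insert keys one by one into a toy leaf level; return average page fill.

    Simplified model: `capacity` is the number of records per page. When a page
    overflows, a key that is the largest in the whole index moves alone to a new
    page (the old page stays full); otherwise the page is split in half.
    """
    pages = [[]]  # leaf pages in key order, each a sorted list of keys
    lows = []     # lows[i] is the smallest key of pages[i + 1]
    for key in keys:
        i = bisect.bisect_right(lows, key)  # page whose key range contains `key`
        page = pages[i]
        bisect.insort(page, key)
        if len(page) <= capacity:
            continue
        if i == len(pages) - 1 and page[-1] == key:
            new_page = [page.pop()]  # right-end insert: start a fresh page
        else:
            mid = len(page) // 2
            new_page = page[mid:]    # mid-index insert: split the page in half
            del page[mid:]
        pages.insert(i + 1, new_page)
        lows.insert(i, new_page[0])
    return sum(len(p) for p in pages) / (len(pages) * capacity)


n = 10_000
sequential = leaf_fill_ratio(range(n))  # like an auto-increment id
shuffled = list(range(n))
random.Random(42).shuffle(shuffled)     # like randomly ordered serial numbers
print(f"sequential: {sequential:.2f}, random: {leaf_fill_ratio(shuffled):.2f}")
```

In this model the sequential run fills every page completely, while the shuffled run leaves pages noticeably emptier, within the 1/2 to 15/16 range quoted above, and needs correspondingly more pages for the same data.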

Summary

Q&A

Q: What are back to table and index coverage?

A:

  1. Back to table: first look up the primary key value in the secondary index, then fetch the row from the clustered index using that primary key.
  2. Index coverage (covering index): a secondary index records the secondary index columns and the primary key columns. If a query only needs those columns, there is no need to go back to the table.

Q: Why use an auto-increment ID as the primary key?

A:

  1. It uniquely identifies a row, and InnoDB uses the primary key when building the index tree.
  2. Auto-increment IDs are sequential, which keeps the index tree compact, improves space utilization, reduces page splits and merges, and improves efficiency.
  3. Natural keys such as mobile phone numbers or ID card numbers generally cannot guarantee sequential order.
  4. Serial numbers are usually long, for example 28 or 32 characters, and a long primary key makes every secondary index take more space. In addition, for business reasons, serial numbers usually contain some randomness.
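Point 3 can be made concrete by counting how often a new key is larger than every key inserted before it, i.e. how often an insert lands at the cheap right end of the clustered index. In this Python sketch the 11-digit phone numbers are randomly generated and hypothetical, assuming users register in an order unrelated to their phone numbers.

```python
import random


def append_ratio(keys):
    """Fraction of inserts whose key is larger than every key inserted before it.

    Such inserts land at the right end of the clustered index (the compact case);
    all others land somewhere in the middle and may split a page.
    """
    largest = None
    appended = 0
    for k in keys:
        if largest is None or k > largest:
            appended += 1
            largest = k
    return appended / len(keys)


auto_ids = range(1, 1001)  # AUTO_INCREMENT: 1, 2, 3, ...
rng = random.Random(7)
# Hypothetical 11-digit phone numbers in registration order (equal-length strings
# compare in the same order as the numbers they represent).
phones = [f"1{rng.randrange(10**10):010d}" for _ in range(1000)]
print(append_ratio(auto_ids), append_ratio(phones))
```

The auto-increment sequence appends on every insert (ratio 1.0), while the phone numbers almost never do, so nearly every insert goes into the middle of the tree.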

Conclusion

In this article we used the reference material to understand why a table gets an auto-increment ID, unrelated to the business, as its primary key. Many topics were covered only briefly, such as InnoDB's B+ tree, page splits and merges, and the insertion process; interested readers can dig deeper into them.

Besides using an auto-increment ID as the primary key when creating tables, you may also have run into another convention in business development: user deregistration and data deletion are all done as logical deletes rather than physical deletes.

This article only scratches the surface; corrections are welcome.