Nine generation methods of distributed ID

Time:2021-3-2

Why use distributed ID?

Before looking at specific implementations, let’s briefly analyze why distributed IDs are needed and what characteristics a distributed ID should have.

  1. What is a distributed ID?

Take MySQL as an example.

When the amount of business data is small, a single database with a single table can fully support the business. When the data grows, MySQL master-slave replication with read-write separation can still cope.

However, as the data keeps growing, master-slave replication is no longer enough, and the data has to be split across multiple databases and tables (sharding). After sharding, every record still needs a unique ID, and the auto-increment ID of a single database obviously cannot provide that. Orders and coupons in particular must be identified by a unique ID. At this point a system that can generate globally unique IDs becomes necessary, and such a globally unique ID is called a distributed ID.

  2. What conditions should a distributed ID meet?

  • Globally unique: the ID must be globally unique; this is the basic requirement
  • High performance: low latency; ID generation must respond quickly, otherwise it becomes a business bottleneck
  • High availability: claiming 100% availability would be dishonest, but it should get as close to 100% as possible
  • Easy to integrate: follow the ready-to-use design principle, and keep the system design and implementation as simple as possible
  • Trend-increasing: IDs should preferably trend upward. This depends on the business scenario and is usually not a strict requirement

What are the generation methods of distributed ID?

Today we analyze the following nine distributed ID generators and their advantages and disadvantages:

  • UUID
  • Database auto-increment ID
  • Database multi-master mode
  • Segment mode
  • Redis
  • Snowflake algorithm
  • Tinyid
  • Baidu (uid-generator)
  • Leaf

So how is each of them implemented, and what are their advantages and disadvantages? Let’s take a look.


Based on UUID

In the Java world, UUID is probably the first thing that comes to mind when you need a unique ID, since it is designed to be unique across the whole world. Can a UUID be used as a distributed ID? The answer is yes, but it is not recommended!

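`import java.util.UUID;

public class UuidDemo {
    public static void main(String[] args) {
        // Random UUID with the hyphens stripped: 32 hexadecimal characters
        String uuid = UUID.randomUUID().toString().replace("-", "");
        System.out.println(uuid);
    }
}`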

Generating a UUID takes only one line of code, and the output looks like c2b8c2b9e46c47e3b30dca3b0d447718. But UUIDs do not suit real business needs. A string such as a UUID has no meaning as an order number, and no useful order-related information can be read from it. For the database it is not only too long but also a string, so storage performance is poor and queries are slow. It is therefore not recommended as a distributed ID.

Advantages:

  • Generation is simple, happens locally with no network cost, and the result is unique

Disadvantages:

  • It is an unordered string and does not trend upward
  • It carries no business meaning
  • It is too long: 16 bytes (128 bits), usually written as a 36-character string. Storing and querying it costs MySQL a lot of performance. MySQL officially recommends keeping the primary key as short as possible, and when a UUID is used as the primary key its lack of order causes rows to be moved frequently, which seriously hurts performance
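To make the length argument concrete, the following minimal sketch measures a UUID in its common forms and compares it with a numeric `long`. The class `UuidSize` and its helper `toBytes` are illustrative names, not part of any library.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

class UuidSize {
    // Pack the UUID's two 64-bit halves into its raw 16-byte binary form.
    static byte[] toBytes(UUID uuid) {
        return ByteBuffer.allocate(16)
                .putLong(uuid.getMostSignificantBits())
                .putLong(uuid.getLeastSignificantBits())
                .array();
    }

    public static void main(String[] args) {
        UUID uuid = UUID.randomUUID();
        String canonical = uuid.toString();           // hex digits grouped 8-4-4-4-12 with hyphens
        String compact = canonical.replace("-", "");  // hyphens stripped
        System.out.println(canonical.length());       // 36 characters
        System.out.println(compact.length());         // 32 characters
        System.out.println(toBytes(uuid).length);     // 16 bytes = 128 bits
        System.out.println(Long.BYTES);               // 8 bytes for a numeric long ID
    }
}
```

Even in binary form a UUID is twice the size of a `long`, and as a string key it is four times larger or more.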

Based on database auto-increment ID

The database’s auto_increment ID can serve as a distributed ID. Specific implementation: a separate MySQL instance is dedicated to generating IDs. The table structure is as follows:

`CREATE DATABASE SEQ_ID;

CREATE TABLE SEQ_ID.SEQUENCE_ID (
    id bigint(20) unsigned NOT NULL auto_increment,
    value char(10) NOT NULL default '',
    PRIMARY KEY (id)
) ENGINE=MyISAM;

insert into SEQ_ID.SEQUENCE_ID(value) VALUES ('values');`

When we need an ID, we insert a record into the table and use the returned primary key as the ID. This approach has a fatal flaw: when traffic surges, MySQL itself becomes the bottleneck of the system. Relying on it for a distributed service is a big risk, so it is not recommended!

Advantages: simple to implement; IDs are monotonically increasing; numeric types are fast to query.

Disadvantages: the database is a single point of failure, and it cannot withstand high-concurrency scenarios.

Based on database cluster mode

As noted above, a single database instance is not acceptable, so the previous approach needs some high-availability work: replace it with a master-slave cluster. If you are worried that a single master going down makes the service unavailable, build a dual-master cluster, in which both MySQL instances can generate auto-increment IDs independently.

That raises a problem. If the auto-increment IDs of both MySQL instances start from 1, duplicate IDs will be generated. What can be done?

Solution: set the starting value and self increasing step size

MySQL_1 configuration:

`set @@auto_increment_offset = 1;     -- starting value
set @@auto_increment_increment = 2;  -- step size`

MySQL_2 configuration:

`set @@auto_increment_offset = 2;     -- starting value
set @@auto_increment_increment = 2;  -- step size`

The self incrementing IDs of the two MySQL instances are as follows:

MySQL_1: 1, 3, 5, 7, 9
MySQL_2: 2, 4, 6, 8, 10
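The effect of the two settings can be simulated in memory. The sketch below is not MySQL code; `SteppedSequence` is a hypothetical class whose `offset` and `step` arguments stand in for auto_increment_offset and auto_increment_increment.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SteppedSequence {
    private final long step;  // plays the role of auto_increment_increment
    private long next;        // next value to hand out; starts at auto_increment_offset

    SteppedSequence(long offset, long step) {
        this.next = offset;
        this.step = step;
    }

    synchronized long nextId() {
        long id = next;
        next += step;
        return id;
    }

    public static void main(String[] args) {
        SteppedSequence mysql1 = new SteppedSequence(1, 2);
        SteppedSequence mysql2 = new SteppedSequence(2, 2);
        List<Long> first = new ArrayList<>();
        List<Long> second = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            first.add(mysql1.nextId());
            second.add(mysql2.nextId());
        }
        Set<Long> all = new HashSet<>(first);
        all.addAll(second);
        System.out.println(first);       // [1, 3, 5, 7, 9]
        System.out.println(second);      // [2, 4, 6, 8, 10]
        System.out.println(all.size());  // 10, so no ID was issued twice
    }
}
```

With N instances, the same idea uses offsets 1 to N and a step of N, which is why adding an instance later forces every existing instance to be reconfigured.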

What if the cluster still cannot handle the concurrency? Then MySQL has to be scaled out by adding nodes, which is troublesome.


Scaling the database cluster horizontally helps relieve the pressure on a single database. To keep the generated IDs unique, the auto-increment step is set according to the number of machines.

To add a third MySQL instance, you have to manually change the starting value and step size of the first two instances, and set the starting ID of the third instance well above the current maximum auto-increment ID. The change must be completed before the IDs of the first two instances grow to the starting value of the third instance, otherwise the auto-increment IDs will collide. If necessary, the service may even have to be stopped while the change is made.

Advantages: solves the database single point of failure.

Disadvantages: hard to scale out later, and in practice the pressure on each database is still high, so it still cannot meet high-concurrency scenarios.

Segment mode based on database

Segment mode is one of the mainstream ways to implement a distributed ID generator. It can be understood as fetching auto-increment IDs from the database in batches: each time, one segment (a range of IDs) is taken from the database. For example, the segment from 1 to 1000 represents 1000 IDs. The business service loads the segment into memory and hands out the auto-increment IDs 1 to 1000 from it. The table structure is as follows:

`CREATE TABLE id_generator (
  id int(10) NOT NULL,
  max_id bigint(20) NOT NULL COMMENT 'current maximum ID',
  step int(20) NOT NULL COMMENT 'length of segment',
  biz_type int(20) NOT NULL COMMENT 'business type',
  version int(20) NOT NULL COMMENT 'version number',
  PRIMARY KEY (id)
)`

biz_type: represents different business types

max_id: the current maximum available ID

step: the length of the segment

version: an optimistic lock; it is incremented on every update to keep the data correct under concurrency

id   biz_type   max_id   step   version
1    101        1000     2000   0

When the IDs of the current segment are used up, the service requests a new segment from the database by updating the max_id field: max_id = max_id + step. If the update succeeds, the new segment has been obtained, and its range is (max_id, max_id + step].

update id_generator set max_id = #{max_id+step}, version = version + 1 where version = #{version} and biz_type = XXX

Because several business clients may perform this update at the same time, the version field is used as an optimistic lock. This way of generating distributed IDs does not depend heavily on the database: the database is accessed infrequently, so the pressure on it is much lower.
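The interplay between the in-memory segment and the optimistic-lock update can be sketched without a database. In the hypothetical `SegmentIdGenerator` below, an `AtomicReference` holding a `Row` stands in for one row of id_generator, and `compareAndSet` plays the role of the `where version = ...` condition. It is a simplified illustration under those assumptions, not production code.

```java
import java.util.concurrent.atomic.AtomicReference;

class SegmentIdGenerator {
    /** In-memory stand-in for one row of id_generator (hypothetical). */
    static final class Row {
        final long maxId;
        final long version;
        Row(long maxId, long version) { this.maxId = maxId; this.version = version; }
    }

    private final AtomicReference<Row> table;  // the shared "database" row
    private final long step;
    private long current;     // last ID handed out from the local segment
    private long segmentEnd;  // inclusive upper bound of the local segment

    SegmentIdGenerator(AtomicReference<Row> table, long step) {
        this.table = table;
        this.step = step;     // current == segmentEnd == 0 means "no segment yet"
    }

    /** Mimics: set max_id = max_id + step, version = version + 1 where version = seen version. */
    private void fetchSegment() {
        while (true) {
            Row seen = table.get();
            Row updated = new Row(seen.maxId + step, seen.version + 1);
            if (table.compareAndSet(seen, updated)) {  // fails if another client updated first
                current = seen.maxId;                  // new segment is (seen.maxId, seen.maxId + step]
                segmentEnd = updated.maxId;
                return;
            }
            // Lost the race: re-read the row and try again.
        }
    }

    synchronized long nextId() {
        if (current >= segmentEnd) {
            fetchSegment();  // local segment exhausted
        }
        return ++current;
    }

    public static void main(String[] args) {
        AtomicReference<Row> table = new AtomicReference<>(new Row(0, 0));
        SegmentIdGenerator serviceA = new SegmentIdGenerator(table, 1000);
        SegmentIdGenerator serviceB = new SegmentIdGenerator(table, 1000);
        System.out.println(serviceA.nextId());  // 1    (A holds segment (0, 1000])
        System.out.println(serviceB.nextId());  // 1001 (B holds segment (1000, 2000])
        System.out.println(serviceA.nextId());  // 2
        System.out.println(table.get().maxId);  // 2000
    }
}
```

Each service touches the shared row only once per 1000 IDs, which is where the reduced database pressure comes from.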

Based on Redis

Redis can also be used. The principle is to use the Redis incr command to increment the ID atomically.

`127.0.0.1:6379> set seq_id 1    // initialize the auto-increment ID to 1
OK
127.0.0.1:6379> incr seq_id     // increase by 1 and return the incremented value
(integer) 2`

When using Redis, pay attention to persistence. Redis has two persistence methods: RDB and AOF.

RDB persists by taking snapshots at intervals. If the counter keeps increasing and Redis goes down before the next snapshot is taken, the increments since the last snapshot are lost, and after a restart Redis will issue duplicate IDs.

AOF persists every write command, so IDs are not duplicated even if Redis goes down. However, because every incr is logged as a separate command, replaying the log can make Redis take a long time to restart and recover its data.
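The RDB duplication problem can be illustrated with a small simulation. This is not Redis code: `SnapshotCounter` is a hypothetical in-memory counter whose `saveSnapshot` and `crashAndRestart` methods only imitate a periodic snapshot and a restart from it.

```java
class SnapshotCounter {
    private long value;     // live in-memory counter, like the Redis key
    private long snapshot;  // value captured by the most recent snapshot

    long incr() { return ++value; }               // analogous to incr
    void saveSnapshot() { snapshot = value; }     // analogous to an RDB snapshot
    void crashAndRestart() { value = snapshot; }  // memory is lost; state reloads from the snapshot

    public static void main(String[] args) {
        SnapshotCounter counter = new SnapshotCounter();
        counter.incr();                                    // issues 1
        counter.incr();                                    // issues 2
        counter.saveSnapshot();                            // snapshot now holds 2
        long issuedBeforeCrash = counter.incr();           // issues 3, not yet persisted
        counter.crashAndRestart();
        long issuedAfterRestart = counter.incr();          // issues 3 again
        System.out.println(issuedBeforeCrash == issuedAfterRestart);  // true: a duplicate ID
    }
}
```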

Based on snowflake

The snowflake algorithm is an ID generation algorithm used in Twitter’s internal distributed projects. After it was open-sourced it was widely adopted by major companies in China, and under its influence many of them developed their own distributed ID generators.


Snowflake generates IDs of type long. A long takes up 8 bytes of 8 bits each, that is, 64 bits.

Snowflake ID structure: sign bit (1 bit) + timestamp (41 bits) + data center ID (5 bits) + machine ID (5 bits) + sequence number (12 bits), 64 bits in total.

  • Sign bit (1 bit): in Java the highest bit of a long is the sign bit, 0 for positive and 1 for negative. Generated IDs are normally positive, so it is 0 by default.
  • Timestamp (41 bits): millisecond precision. Rather than storing the current timestamp, it is recommended to store the difference (current timestamp - fixed start timestamp), so that generated IDs start from a smaller value. A 41-bit timestamp lasts about 69 years: (1L << 41) / (1000L * 60 * 60 * 24 * 365) = 69.
  • Worker ID (10 bits): also called workId. It can be configured flexibly, for example as a combination of data center number and machine number.
  • Sequence number (12 bits): supports 4096 IDs generated by the same node within the same millisecond.

Following this logic, we only need to implement the algorithm in Java and wrap it as a utility method. Each business application can then call the utility directly to obtain distributed IDs. We only need to make sure that each business application has its own worker ID; there is no need to build a separate service for obtaining distributed IDs.
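The capacity figures quoted for each field follow directly from the bit widths. The short sketch below recomputes them; `SnowflakeCapacity` is an illustrative name.

```java
class SnowflakeCapacity {
    static final long MILLIS_PER_YEAR = 1000L * 60 * 60 * 24 * 365;

    // Number of distinct values that fit in the given number of bits.
    static long capacity(int bits) {
        return 1L << bits;
    }

    public static void main(String[] args) {
        System.out.println(capacity(41) / MILLIS_PER_YEAR);  // 69 years of millisecond timestamps
        System.out.println(capacity(10));                    // 1024 worker IDs (5 + 5 bits)
        System.out.println(capacity(12));                    // 4096 sequence numbers per millisecond
    }
}
```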

Java implementation of the snowflake algorithm:

`/**
 * Twitter's snowflake algorithm: generates a 64-bit integer ID, which can then be converted into a base-62 short URL.
 *
 * https://github.com/beyondfeng…
 */
public class SnowFlakeShortUrl {

/**
 * Start timestamp
 */
private final static long START_TIMESTAMP = 1480166465631L;

/**
 * Number of bits occupied by each part
 */
private final static long SEQUENCE_BIT = 12;    // bits occupied by the sequence number
private final static long MACHINE_BIT = 5;      // bits occupied by the machine ID
private final static long DATA_CENTER_BIT = 5;  // bits occupied by the data center ID

/**
 * Maximum value of each part
 */
private final static long MAX_SEQUENCE = -1L ^ (-1L << SEQUENCE_BIT);
private final static long MAX_MACHINE_NUM = -1L ^ (-1L << MACHINE_BIT);
private final static long MAX_DATA_CENTER_NUM = -1L ^ (-1L << DATA_CENTER_BIT);

/**
 * Left shift applied to each part
 */
private final static long MACHINE_LEFT = SEQUENCE_BIT;
private final static long DATA_CENTER_LEFT = SEQUENCE_BIT + MACHINE_BIT;
private final static long TIMESTAMP_LEFT = DATA_CENTER_LEFT + DATA_CENTER_BIT;

private long dataCenterId;         // data center ID
private long machineId;            // machine ID
private long sequence = 0L;        // sequence number
private long lastTimeStamp = -1L;  // timestamp of the last generated ID

private long getNextMill() {
    long mill = getNewTimeStamp();
    while (mill <= lastTimeStamp) {
        mill = getNewTimeStamp();
    }
    return mill;
}

private long getNewTimeStamp() {
    return System.currentTimeMillis();
}

/**
 * Creates a generator for the given data center ID and machine ID
 *
 * @param dataCenterId data center ID
 * @param machineId    machine ID
 */
public SnowFlakeShortUrl(long dataCenterId, long machineId) {
    if (dataCenterId > MAX_DATA_CENTER_NUM || dataCenterId < 0) {
        throw new IllegalArgumentException("DataCenterId can't be greater than MAX_DATA_CENTER_NUM or less than 0!");
    }
    if (machineId > MAX_MACHINE_NUM || machineId < 0) {
        throw new IllegalArgumentException("MachineId can't be greater than MAX_MACHINE_NUM or less than 0!");
    }
    this.dataCenterId = dataCenterId;
    this.machineId = machineId;
}

/**
 * Generate the next ID
 *
 * @return the next unique ID
 */
public synchronized long nextId() {
    long currTimeStamp = getNewTimeStamp();
    if (currTimeStamp < lastTimeStamp) {
        throw new RuntimeException("Clock moved backwards.  Refusing to generate id");
    }

    if (currTimeStamp == lastTimeStamp) {
        //Within the same millisecond, the serial number increases automatically
        sequence = (sequence + 1) & MAX_SEQUENCE;
        //The number of sequences in the same millisecond has reached the maximum
        if (sequence == 0L) {
            currTimeStamp = getNextMill();
        }
    } else {
        //The serial number is set to 0 in different milliseconds
        sequence = 0L;
    }

    lastTimeStamp = currTimeStamp;

    return (currTimeStamp - START_TIMESTAMP) << TIMESTAMP_LEFT  // timestamp part
            | dataCenterId << DATA_CENTER_LEFT                   // data center part
            | machineId << MACHINE_LEFT                          // machine ID part
            | sequence;                                          // sequence number part
}

public static void main(String[] args) {
    SnowFlakeShortUrl snowFlake = new SnowFlakeShortUrl(2, 3);

    for (int i = 0; i < (1 << 4); i++) {
        //Decimal system
        System.out.println(snowFlake.nextId());
    }
}

}`
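Because every field sits at a fixed bit position, an ID can also be taken apart again, which is handy when debugging. The sketch below assumes the same layout and start timestamp as the listing above; `SnowflakeDecoder` and its method names are illustrative and not part of the original class.

```java
class SnowflakeDecoder {
    static final long START_TIMESTAMP = 1480166465631L;  // same start timestamp as the listing
    static final int SEQUENCE_BIT = 12;
    static final int MACHINE_BIT = 5;
    static final int DATA_CENTER_BIT = 5;

    // Bit mask with the lowest `bits` bits set.
    static long mask(int bits) {
        return ~(-1L << bits);
    }

    static long sequence(long id) {
        return id & mask(SEQUENCE_BIT);
    }

    static long machineId(long id) {
        return (id >> SEQUENCE_BIT) & mask(MACHINE_BIT);
    }

    static long dataCenterId(long id) {
        return (id >> (SEQUENCE_BIT + MACHINE_BIT)) & mask(DATA_CENTER_BIT);
    }

    static long timestampMillis(long id) {
        return (id >> (SEQUENCE_BIT + MACHINE_BIT + DATA_CENTER_BIT)) + START_TIMESTAMP;
    }

    public static void main(String[] args) {
        // Build an ID by hand from known fields, then decode it.
        long timestamp = START_TIMESTAMP + 1000;
        long id = ((timestamp - START_TIMESTAMP) << 22) | (2L << 17) | (3L << 12) | 7L;
        System.out.println(dataCenterId(id));                  // 2
        System.out.println(machineId(id));                     // 3
        System.out.println(sequence(id));                      // 7
        System.out.println(timestampMillis(id) == timestamp);  // true
    }
}
```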

Baidu (uid-generator)

uid-generator was developed by Baidu’s technology department. The project’s GitHub address is https://github.com/baidu/uid-generator

uid-generator is based on the snowflake algorithm. Unlike the original snowflake algorithm, it lets the user customize the number of bits for the timestamp, the worker ID and the sequence number, and it uses a user-defined strategy for generating the workId.

uid-generator must be used together with a database, and a WORKER_NODE table has to be created. When the application starts, it inserts a row into this table, and the auto-increment ID returned by the insert becomes the workId of that machine. The row consists of the host and port.

ID structure of uid-generator:

The workId takes up 22 bits, the time takes up 28 bits and the sequence number takes up 13 bits. Note that, unlike the original snowflake, the unit of time is seconds rather than milliseconds, and the workId is obtained differently. In addition, the same application consumes one workId every time it restarts.
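What this bit allocation implies can be checked with a little arithmetic. The sketch below uses only the widths given above plus the one sign bit; `UidLayoutCapacity` is an illustrative name and not part of uid-generator.

```java
class UidLayoutCapacity {
    static final long SECONDS_PER_YEAR = 60L * 60 * 24 * 365;
    static final int TIME_BITS = 28;
    static final int WORKER_BITS = 22;
    static final int SEQUENCE_BITS = 13;

    // Whole years that a second-level timestamp of the given width can cover.
    static long yearsCovered(int timeBits) {
        return (1L << timeBits) / SECONDS_PER_YEAR;
    }

    public static void main(String[] args) {
        System.out.println(1 + TIME_BITS + WORKER_BITS + SEQUENCE_BITS);  // 64 bits including the sign bit
        System.out.println(yearsCovered(TIME_BITS));                      // 8 years
        System.out.println(1L << WORKER_BITS);                            // 4194304 workIds, one used per restart
        System.out.println(1L << SEQUENCE_BITS);                          // 8192 IDs per second per worker
    }
}
```

With these widths the time field runs out after roughly eight and a half years, which is one reason the bit allocation is left configurable.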

Reference: https://github.com/baidu/uid-generator/blob/master/README.zh_cn.md

Leaf

Leaf was developed by Meituan. GitHub address: https://github.com/Meituan-Dianping/Leaf

Leaf supports both segment mode and snowflake mode, and you can switch between them.

Segment mode

First import the source code from https://github.com/Meituan-Dianping/Leaf , then create the table leaf_alloc:

`DROP TABLE IF EXISTS leaf_alloc;

CREATE TABLE leaf_alloc (
  biz_tag varchar(128) NOT NULL DEFAULT '' COMMENT 'business key',
  max_id bigint(20) NOT NULL DEFAULT '1' COMMENT 'the maximum ID currently assigned',
  step int(11) NOT NULL COMMENT 'the initial step size, also the minimum step size for dynamic adjustment',
  description varchar(256) DEFAULT NULL COMMENT 'description of the business key',
  update_time timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT 'update time, maintained by the database',
  PRIMARY KEY (biz_tag)
) ENGINE=InnoDB;`

Then, in the project, enable segment mode, configure the database connection, and disable snowflake mode:

`leaf.name=com.sankuai.leaf.opensource.test
leaf.segment.enable=true
leaf.jdbc.url=jdbc:mysql://localhost:3306/leaf_test?useUnicode=true&characterEncoding=utf8&characterSetResults=utf8
leaf.jdbc.username=root
leaf.jdbc.password=root

leaf.snowflake.enable=false

leaf.snowflake.zk.address=

leaf.snowflake.port=`

Start the LeafServerApplication in the leaf-server module to run the service.

Test URL for obtaining a distributed auto-increment ID in segment mode: http://localhost:8080/api/segment/get/leaf-segment-test

Monitoring page for segment mode: http://localhost:8080/cache

Snowflake mode

Leaf’s snowflake mode depends on ZooKeeper. Unlike the original snowflake algorithm, the difference lies mainly in how the workId is generated: in Leaf the workId is based on ZooKeeper’s sequential IDs. When an application uses Leaf snowflake, it creates a sequential node in ZooKeeper at startup, so each machine corresponds to one sequential node, that is, one workId.

`leaf.snowflake.enable=true
leaf.snowflake.zk.address=127.0.0.1
leaf.snowflake.port=2181`

Test URL for obtaining a distributed auto-increment ID in snowflake mode: http://localhost:8080/api/snowflake/get/test

Tinyid

Tinyid was developed by Didi. GitHub address: https://github.com/didi/tinyid

Tinyid is implemented on the segment-mode principle, the same as Leaf. Each service obtains one segment, for example (1000,2000], (2000,3000], (3000,4000].


Tinyid provides two access methods: HTTP and tinyid-client.

HTTP access

(1) Import the Tinyid source code:

git clone https://github.com/didi/tinyid.git

(2) Create the data tables:

`CREATE TABLE tiny_id_info (
  id bigint(20) unsigned NOT NULL AUTO_INCREMENT COMMENT 'auto-increment primary key',
  biz_type varchar(63) NOT NULL DEFAULT '' COMMENT 'business type, unique',
  begin_id bigint(20) NOT NULL DEFAULT '0' COMMENT 'start ID; records the initial value only and has no other meaning. begin_id and max_id should be the same at initialization',
  max_id bigint(20) NOT NULL DEFAULT '0' COMMENT 'current maximum ID',
  step int(11) DEFAULT '0' COMMENT 'step length',
  delta int(11) NOT NULL DEFAULT '1' COMMENT 'ID increment per allocation',
  remainder int(11) NOT NULL DEFAULT '0' COMMENT 'remainder',
  create_time timestamp NOT NULL DEFAULT '2010-01-01 00:00:00' COMMENT 'creation time',
  update_time timestamp NOT NULL DEFAULT '2010-01-01 00:00:00' COMMENT 'update time',
  version bigint(20) NOT NULL DEFAULT '0' COMMENT 'version number',
  PRIMARY KEY (id),
  UNIQUE KEY uniq_biz_type (biz_type)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 COMMENT 'ID information table';

CREATE TABLE tiny_id_token (
  id int(11) unsigned NOT NULL AUTO_INCREMENT COMMENT 'auto-increment ID',
  token varchar(255) NOT NULL DEFAULT '' COMMENT 'token',
  biz_type varchar(63) NOT NULL DEFAULT '' COMMENT 'the business type this token may access',
  remark varchar(255) NOT NULL DEFAULT '' COMMENT 'remarks',
  create_time timestamp NOT NULL DEFAULT '2010-01-01 00:00:00' COMMENT 'creation time',
  update_time timestamp NOT NULL DEFAULT '2010-01-01 00:00:00' COMMENT 'update time',
  PRIMARY KEY (id)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 COMMENT 'token information table';

INSERT INTO tiny_id_info (id, biz_type, begin_id, max_id, step, delta, remainder, create_time, update_time, version)
VALUES
(1, 'test', 1, 1, 100000, 1, 0, '2018-07-21 23:52:58', '2018-07-22 23:19:27', 1);

INSERT INTO tiny_id_info (id, biz_type, begin_id, max_id, step, delta, remainder, create_time, update_time, version)
VALUES
(2, 'test_odd', 1, 1, 100000, 2, 1, '2018-07-21 23:52:58', '2018-07-23 00:39:24', 3);

INSERT INTO tiny_id_token (id, token, biz_type, remark, create_time, update_time)
VALUES
(1, '0f673adf80504e2eaa552f5d791b644c', 'test', '1', '2017-12-14 16:36:46', '2017-12-14 16:36:48');

INSERT INTO tiny_id_token (id, token, biz_type, remark, create_time, update_time)
VALUES
(2, '0f673adf80504e2eaa552f5d791b644c', 'test_odd', '1', '2017-12-14 16:36:46', '2017-12-14 16:36:48');`
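The delta and remainder columns, together with the test_odd row (delta 2, remainder 1), suggest that a business type can be restricted to IDs with a fixed remainder, such as odd numbers only. The sketch below shows one way such a rule could be applied inside a segment. It is an assumption drawn from the column names and sample data, Tinyid’s own implementation may differ, and `DeltaRemainder` is a hypothetical class.

```java
import java.util.ArrayList;
import java.util.List;

class DeltaRemainder {
    // IDs in the half-open segment (begin, end] that satisfy id % delta == remainder.
    static List<Long> idsInSegment(long begin, long end, long delta, long remainder) {
        if (delta <= 0 || remainder < 0 || remainder >= delta) {
            throw new IllegalArgumentException("require delta > 0 and 0 <= remainder < delta");
        }
        List<Long> ids = new ArrayList<>();
        long id = begin + 1;
        while (id % delta != remainder) {
            id++;  // advance to the first ID with the required remainder
        }
        for (; id <= end; id += delta) {
            ids.add(id);
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(idsInSegment(1, 11, 2, 1));  // [3, 5, 7, 9, 11]  (like test_odd)
        System.out.println(idsInSegment(1, 6, 1, 0));   // [2, 3, 4, 5, 6]   (like test)
    }
}
```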

(3) Configure the database:

`datasource.tinyid.names=primary
datasource.tinyid.primary.driver-class-name=com.mysql.jdbc.Driver
datasource.tinyid.primary.url=jdbc:mysql://ip:port/databaseName?autoReconnect=true&useUnicode=true&characterEncoding=UTF-8
datasource.tinyid.primary.username=root
datasource.tinyid.primary.password=123456`

(4) Test after starting tinyid-server

`Get a distributed auto-increment ID: http://localhost:9999/tinyid/id/nextIdSimple?bizType=test&token=0f673adf80504e2eaa552f5d791b644c
Return result: 3

Get distributed auto-increment IDs in batch:
http://localhost:9999/tinyid/id/nextIdSimple?bizType=test&token=0f673adf80504e2eaa552f5d791b644c&batchSize=10
Return results: 4,5,6,7,8,9,10,11,12,13`
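On the caller’s side, the batch response is just a comma-separated string that has to be turned into numbers. The sketch below builds the request URL shown above and parses such a response body. `TinyidResponse`, `buildUrl` and `parseIds` are hypothetical helpers written for illustration and are not part of tinyid-client; the HTTP call itself is left out so the example runs without a server.

```java
import java.util.ArrayList;
import java.util.List;

class TinyidResponse {
    // Build the nextIdSimple URL for a server address, business type, token and batch size.
    static String buildUrl(String baseUrl, String bizType, String token, int batchSize) {
        return baseUrl + "/tinyid/id/nextIdSimple?bizType=" + bizType
                + "&token=" + token + "&batchSize=" + batchSize;
    }

    // Parse a comma-separated body such as "4,5,6" into a list of IDs.
    static List<Long> parseIds(String body) {
        List<Long> ids = new ArrayList<>();
        for (String part : body.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                ids.add(Long.parseLong(trimmed));
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        String url = buildUrl("http://localhost:9999", "test", "0f673adf80504e2eaa552f5d791b644c", 10);
        System.out.println(url);
        List<Long> ids = parseIds("4,5,6,7,8,9,10,11,12,13");  // sample body from the test above
        System.out.println(ids.size());  // 10
        System.out.println(ids.get(0));  // 4
        System.out.println(ids.get(9));  // 13
    }
}
```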