Choosing a distributed unique ID generation scheme: a detailed analysis of the Snowflake algorithm

Time:2022-1-6
Distributed unique ID

  • When using RocketMQ, you need a distributed unique ID
  • Messages may be delivered more than once, so the consumer side must be idempotent. To achieve business idempotency, each message must carry a unique ID that meets the following conditions:
    • It is globally unique within the same business scenario
    • It is generated by the message sender and sent to MQ along with the message
    • The consumer uses the ID to decide whether a message is a duplicate, which ensures idempotency
  • Where the ID is generated and how the consumer achieves idempotency do not depend on the ID itself. The ID only needs to have the following characteristics:
    • Unique locally, or even globally
    • Trend-increasing
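
The consumer-side check described above can be sketched as follows. This is a minimal illustration, not RocketMQ API code: the class `IdempotentConsumer` and its method `onMessage` are hypothetical names, and the in-memory set stands in for whatever shared store (a database unique key, Redis, and so on) a real consumer would use.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical consumer that deduplicates messages by the producer's unique ID. */
class IdempotentConsumer {
    // IDs of messages that have already been processed (assumption: kept in memory).
    private final Set<Long> processedIds = ConcurrentHashMap.newKeySet();
    private int handledCount = 0;

    /** Returns true if the message was processed, false if it was a duplicate. */
    boolean onMessage(long messageId, String body) {
        // Set.add returns false when the ID is already present, so a
        // redelivered message is skipped without running the business logic.
        if (!processedIds.add(messageId)) {
            return false;
        }
        handle(body);
        return true;
    }

    // Placeholder for the real business logic.
    private synchronized void handle(String body) {
        handledCount++;
    }

    synchronized int handledCount() {
        return handledCount;
    }

    public static void main(String[] args) {
        IdempotentConsumer consumer = new IdempotentConsumer();
        System.out.println(consumer.onMessage(1001L, "create order")); // true
        System.out.println(consumer.onMessage(1001L, "create order")); // false (duplicate)
        System.out.println(consumer.handledCount());                   // 1
    }
}
```

The deduplication only works if the ID is assigned once by the producer, which is why the ID has to be generated on the sending side.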

Snowflake algorithm

  • Snowflake is Twitter's open-source distributed ID generation algorithm. The result is an ID of type long. The core idea is:
    • Use 1 bit as the sign bit, fixed at 0, which means the ID is positive
    • Use 41 bits for the timestamp in milliseconds
    • Use 10 bits for the machine ID: the high 5 bits are the data center ID and the low 5 bits are the machine ID
    • Use 12 bits for the sequence number within a millisecond, which means each node can generate 4096 (2^12) IDs per millisecond

      The algorithm is implemented with bit operations. In theory, a single machine can generate at most 1000 * 2^12 IDs per second, that is, about 4.096 million IDs
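
The capacity figures that follow from this bit layout can be checked with a few lines of arithmetic. The class name `SnowflakeCapacity` and its helper methods are made up for this sketch. The year calculation assumes 365-day years.

```java
/** Arithmetic for the 1 + 41 + 5 + 5 + 12 bit layout described above. */
class SnowflakeCapacity {
    static final int TIMESTAMP_BITS = 41;
    static final int DATACENTER_BITS = 5;
    static final int WORKER_BITS = 5;
    static final int SEQUENCE_BITS = 12;

    /** Number of distinct values representable in the given number of bits. */
    static long valuesFor(int bits) {
        return 1L << bits;
    }

    /** Whole years covered by a millisecond counter of the given width. */
    static long yearsFor(int timestampBits) {
        return (1L << timestampBits) / (1000L * 60 * 60 * 24 * 365);
    }

    public static void main(String[] args) {
        int totalBits = 1 + TIMESTAMP_BITS + DATACENTER_BITS + WORKER_BITS + SEQUENCE_BITS;
        System.out.println("total bits     = " + totalBits);                                    // 64
        System.out.println("nodes          = " + valuesFor(DATACENTER_BITS + WORKER_BITS));     // 1024
        System.out.println("IDs per ms     = " + valuesFor(SEQUENCE_BITS));                     // 4096
        System.out.println("IDs per second = " + 1000L * valuesFor(SEQUENCE_BITS));             // 4096000
        System.out.println("years of range = " + yearsFor(TIMESTAMP_BITS));                     // 69
    }
}
```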

SnowflakeIdWorker

  • Java implementation of the Snowflake algorithm, SnowflakeIdWorker:
/**
 * Twitter_Snowflake<br>
 * A Snowflake ID has the following structure (the parts are separated by -):<br>
 * 0 - 0000000000 0000000000 0000000000 0000000000 0 - 00000 - 00000 - 000000000000 <br>
 * 1 sign bit. Because long is a signed type in Java, the highest bit is the sign bit: 0 for positive numbers and 1 for negative numbers. IDs are generally positive, so the highest bit is 0.<br>
 * 41-bit timestamp (in milliseconds). Note that these 41 bits do not store the current timestamp itself, but the difference between timestamps (current timestamp - start timestamp).
 * The start timestamp is generally the time when the ID generator was first put into use, and is specified by the program (the twepoch field of the SnowflakeIdWorker class below). A 41-bit timestamp can be used for 69 years: T = (1L << 41) / (1000L * 60 * 60 * 24 * 365) = 69<br>
 * 10 machine bits, which allow deployment on 1024 nodes, consisting of a 5-bit datacenterId and a 5-bit workerId<br>
 * 12-bit sequence, a counter within each millisecond. Each node can generate 4096 sequence numbers per millisecond (same machine, same timestamp)<br>
 * These add up to exactly 64 bits, which is one long.<br>
 * The advantages of Snowflake are that IDs are roughly sorted by time, no ID collisions occur across the distributed system (nodes are distinguished by data center ID and machine ID), and it is efficient. In testing, Snowflake can generate about 260,000 IDs per second.
 */
public class SnowflakeIdWorker {

    // ==============================Fields===========================================
    /** Start epoch (2015-01-01 00:00:00 UTC+8) */
    private final long twepoch = 1420041600000L;

    /** Number of bits occupied by the worker ID */
    private final long workerIdBits = 5L;

    /** Number of bits occupied by the data center ID */
    private final long datacenterIdBits = 5L;

    /** Maximum supported worker ID, which is 31 (this shift trick quickly computes the largest number representable in the given number of bits) */
    private final long maxWorkerId = -1L ^ (-1L << workerIdBits);

    /** Maximum supported data center ID, which is 31 */
    private final long maxDatacenterId = -1L ^ (-1L << datacenterIdBits);

    /**The number of bits the sequence occupies in the ID*/
    private final long sequenceBits = 12L;

    /** The worker ID is shifted left by 12 bits */
    private final long workerIdShift = sequenceBits;

    /** The data center ID is shifted left by 17 bits (12 + 5) */
    private final long datacenterIdShift = sequenceBits + workerIdBits;

    /** The timestamp is shifted left by 22 bits (5 + 5 + 12) */
    private final long timestampLeftShift = sequenceBits + workerIdBits + datacenterIdBits;

    /** Mask for the sequence, which is 4095 (0b111111111111 = 0xfff = 4095) */
    private final long sequenceMask = -1L ^ (-1L << sequenceBits);

    /** Worker ID (0 ~ 31) */
    private long workerId;

    /** Data center ID (0 ~ 31) */
    private long datacenterId;

    /** Sequence within the current millisecond (0 ~ 4095) */
    private long sequence = 0L;

    /** Timestamp of the last generated ID */
    private long lastTimestamp = -1L;

    //==============================Constructors=====================================
    /**
     * Constructor
     * @param workerId worker ID (0 ~ 31)
     * @param datacenterId data center ID (0 ~ 31)
     */
    public SnowflakeIdWorker(long workerId, long datacenterId) {
        if (workerId > maxWorkerId || workerId < 0) {
            throw new IllegalArgumentException(String.format("worker Id can't be greater than %d or less than 0", maxWorkerId));
        }
        if (datacenterId > maxDatacenterId || datacenterId < 0) {
            throw new IllegalArgumentException(String.format("datacenter Id can't be greater than %d or less than 0", maxDatacenterId));
        }
        this.workerId = workerId;
        this.datacenterId = datacenterId;
    }

    // ==============================Methods==========================================
    /**
     *Get the next ID (the method is thread safe)
     * @return SnowflakeId
     */
    public synchronized long nextId() {
        long timestamp = timeGen();

        // If the current time is earlier than the timestamp of the last generated ID, the system clock has been rolled back, so throw an exception
        if (timestamp < lastTimestamp) {
            throw new RuntimeException(
                    String.format("Clock moved backwards.  Refusing to generate id for %d milliseconds", lastTimestamp - timestamp));
        }

        // If this ID is generated in the same millisecond as the last one, increment the sequence
        if (lastTimestamp == timestamp) {
            sequence = (sequence + 1) & sequenceMask;
            // The sequence overflowed within this millisecond
            if (sequence == 0) {
                // Block until the next millisecond and get a new timestamp
                timestamp = tilNextMillis(lastTimestamp);
            }
        }
        // The timestamp changed, so reset the sequence
        else {
            sequence = 0L;
        }

        // Record the timestamp of the last generated ID
        lastTimestamp = timestamp;

        // Shift each part into place and combine them with bitwise OR to form a 64-bit ID
        return ((timestamp - twepoch) << timestampLeftShift) //
                | (datacenterId << datacenterIdShift) //
                | (workerId << workerIdShift) //
                | sequence;
    }

    /**
     * Block until the next millisecond, that is, until a new timestamp is obtained
     * @param lastTimestamp timestamp of the last generated ID
     * @return current timestamp
     */
    protected long tilNextMillis(long lastTimestamp) {
        long timestamp = timeGen();
        while (timestamp <= lastTimestamp) {
            timestamp = timeGen();
        }
        return timestamp;
    }

    /**
     * Returns the current time in milliseconds
     * @return current time (ms)
     */
    protected long timeGen() {
        return System.currentTimeMillis();
    }

    //==============================Test=============================================
    /**Testing*/
    public static void main(String[] args) {
        SnowflakeIdWorker idWorker = new SnowflakeIdWorker(0, 0);
        for (int i = 0; i < 1000; i++) {
            long id = idWorker.nextId();
            System.out.println(Long.toBinaryString(id));
            System.out.println(id);
        }
    }
}
  • Advantages:

    • Fast generation
    • Simple implementation with no extra dependencies
    • Each bit segment can be adjusted to fit the actual situation, which is convenient and flexible
  • Disadvantages:

    • IDs are only trend-increasing (across nodes they are not strictly increasing)
    • It relies on the machine clock. If the clock is rolled back, the generated IDs may be duplicated
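
Because each field occupies a fixed bit range, an ID can be split back into its parts, which helps when debugging ordering or clock issues. The sketch below assumes the same layout and epoch as SnowflakeIdWorker (22/17/12-bit shifts, twepoch 1420041600000L). The class `SnowflakeIdDecoder` and its methods are hypothetical helpers, not part of the original implementation.

```java
/** Hypothetical helper that packs and unpacks IDs using the SnowflakeIdWorker layout. */
class SnowflakeIdDecoder {
    static final long TWEPOCH = 1420041600000L; // same start epoch as SnowflakeIdWorker
    static final int SEQUENCE_BITS = 12;
    static final int WORKER_ID_BITS = 5;
    static final int DATACENTER_ID_BITS = 5;

    static final int WORKER_ID_SHIFT = SEQUENCE_BITS;                                        // 12
    static final int DATACENTER_ID_SHIFT = SEQUENCE_BITS + WORKER_ID_BITS;                   // 17
    static final int TIMESTAMP_SHIFT = SEQUENCE_BITS + WORKER_ID_BITS + DATACENTER_ID_BITS;  // 22

    /** Packs the four fields the same way nextId() does. */
    static long compose(long timestampMillis, long datacenterId, long workerId, long sequence) {
        return ((timestampMillis - TWEPOCH) << TIMESTAMP_SHIFT)
                | (datacenterId << DATACENTER_ID_SHIFT)
                | (workerId << WORKER_ID_SHIFT)
                | sequence;
    }

    /** Absolute generation time in milliseconds since the Unix epoch. */
    static long timestampMillis(long id) {
        return (id >> TIMESTAMP_SHIFT) + TWEPOCH;
    }

    static long datacenterId(long id) {
        return (id >> DATACENTER_ID_SHIFT) & ((1L << DATACENTER_ID_BITS) - 1);
    }

    static long workerId(long id) {
        return (id >> WORKER_ID_SHIFT) & ((1L << WORKER_ID_BITS) - 1);
    }

    static long sequence(long id) {
        return id & ((1L << SEQUENCE_BITS) - 1);
    }

    public static void main(String[] args) {
        // One second after the epoch, data center 3, worker 7, sequence 42.
        long id = compose(TWEPOCH + 1000L, 3L, 7L, 42L);
        System.out.println(timestampMillis(id) - TWEPOCH); // 1000
        System.out.println(datacenterId(id));              // 3
        System.out.println(workerId(id));                  // 7
        System.out.println(sequence(id));                  // 42
    }
}
```

Since the timestamp sits in the highest bits, comparing two IDs numerically compares their generation times first, which is where the trend-increasing property comes from.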

Snowflake algorithm clock rollback problem:

  • Causes of clock rollback:

    • For business reasons, the machine needs to synchronize with a time server
  • Solutions to the clock rollback problem:

    • When the rollback is less than 15 ms, wait for the clock to catch up and then continue generating IDs
    • When the rollback is greater than 15 ms, switch to a different workerId so that the generated IDs are ones that have not been produced before
  • Steps:

    • First, expand the workerId to 15 bits
    • Then adjust the bit segments in SnowflakeIdWorker:
      • Use 1 bit as the sign bit, so the generated distributed unique ID is a positive number
      • Use 38 bits as the timestamp, which represents the increment of the current time relative to the initial time, in milliseconds
      • Use 15 bits as the machine ID, which supports up to 32768 (2^15) nodes
      • Use 10 bits as the sequence number within a millisecond, so in theory 1024 (2^10) sequence numbers can be generated
    • Because services are stateless, the workerId is normally not set in a node-specific configuration file. Redis can be chosen here as centralized storage:
      • Put the roughly 30,000 extra workerIds gained by widening the workerId field into a Redis-based queue for centralized management
      • Each time a node starts, it checks whether it has a workerId stored locally. If so, it uses that one. If not, it takes a workerId from the queue, which removes it from the queue
      • When the clock rollback is found to be too large, the node takes a new workerId from the queue and puts the workerId that was in use during the rollback back into the queue. Because the queue is always read from the head and appended at the tail, this avoids the situation where the workerId just used by machine A is immediately obtained by machine B
      • Using Redis raises some new, smaller problems: how is Redis consistency ensured, and what happens if Redis goes down?
  • From the perspective of a basic component, SnowflakeIdWorker only needs to throw an exception when it encounters a clock rollback, which keeps the algorithm simple
  • You can also refer to the approach of uid-generator: take a batch of workerIds at a time. Fetching in batches from the central store solves the performance problem of every node accessing the centralized machine
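
The rollback handling described in this section can be sketched as below, with several assumptions. The class `RollbackTolerantIdWorker` is hypothetical, an in-memory `Deque` stands in for the Redis queue, and the clock is injected so a rollback can be simulated. A rollback of exactly 15 ms is treated as "small", since the text does not say which side it falls on. The sketch also uses its own assumed epoch (2024-01-01 00:00:00 UTC), because a 38-bit millisecond counter covers only about 8.7 years.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.LongSupplier;

/** Hypothetical worker using the adjusted 1 + 38 + 15 + 10 bit layout. */
class RollbackTolerantIdWorker {
    static final long EPOCH = 1704067200000L; // assumed epoch: 2024-01-01 00:00:00 UTC
    static final int WORKER_ID_BITS = 15;
    static final int SEQUENCE_BITS = 10;
    static final long SEQUENCE_MASK = (1L << SEQUENCE_BITS) - 1;
    static final long MAX_WAIT_MS = 15L;      // threshold taken from the text

    private final Deque<Long> workerIdQueue;  // in-memory stand-in for the Redis queue
    private final LongSupplier clock;         // injectable so rollbacks can be simulated
    private long workerId;
    private long sequence = 0L;
    private long lastTimestamp = -1L;

    RollbackTolerantIdWorker(Deque<Long> workerIdQueue, LongSupplier clock) {
        this.workerIdQueue = workerIdQueue;
        this.clock = clock;
        this.workerId = workerIdQueue.removeFirst(); // take a workerId from the head
    }

    synchronized long currentWorkerId() {
        return workerId;
    }

    synchronized long nextId() {
        long timestamp = clock.getAsLong();
        if (timestamp < lastTimestamp) {
            if (lastTimestamp - timestamp <= MAX_WAIT_MS) {
                // Small rollback: spin until the clock catches up.
                while (timestamp < lastTimestamp) {
                    timestamp = clock.getAsLong();
                }
            } else {
                // Large rollback: return the old workerId to the tail of the
                // queue and take a fresh one from the head.
                workerIdQueue.addLast(workerId);
                workerId = workerIdQueue.removeFirst();
            }
        }
        if (timestamp == lastTimestamp) {
            sequence = (sequence + 1) & SEQUENCE_MASK;
            if (sequence == 0) {
                // Sequence exhausted in this millisecond: wait for the next one.
                while (timestamp <= lastTimestamp) {
                    timestamp = clock.getAsLong();
                }
            }
        } else {
            sequence = 0L;
        }
        lastTimestamp = timestamp;
        return ((timestamp - EPOCH) << (WORKER_ID_BITS + SEQUENCE_BITS))
                | (workerId << SEQUENCE_BITS)
                | sequence;
    }

    public static void main(String[] args) {
        Deque<Long> queue = new ArrayDeque<>();
        for (long id = 0; id < 4; id++) {
            queue.addLast(id);
        }
        long[] now = {EPOCH + 100_000L}; // fake clock value
        RollbackTolerantIdWorker worker = new RollbackTolerantIdWorker(queue, () -> now[0]);
        long first = worker.nextId();
        now[0] -= 1_000L;                // simulate a 1 s rollback
        long second = worker.nextId();
        System.out.println(worker.currentWorkerId()); // 1
        System.out.println(first != second);          // true
    }
}
```

The queue must hold more than one workerId for the switch to help. With a single entry the node would get its own workerId back and could repeat earlier IDs.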
