Unique ID generation algorithm for distributed systems

Time:2022-5-10

Background: when data is split across multiple databases and tables (sharding), IDs must be globally unique. If an ID is not unique, a record cannot be located reliably through the primary key index.

1. Auto-increment ID from a standalone database

With this scheme, every time your system needs an ID, it inserts a row with no business meaning into a dedicated table in a standalone database and takes the auto-increment ID that the database assigns. It then writes that ID into the corresponding sharded database and table. For example, suppose you have an auto_id database containing an auto_id table whose ID column is auto-incremented. Whenever you need a globally unique ID, you insert a record into this table, read back the generated ID, and use it when inserting into the sharded order databases and tables. The advantage of this scheme is that it is simple and convenient, and anyone can use it. The disadvantage is that a single database generates all the IDs, so it becomes a bottleneck under high concurrency: expecting the auto_id database to handle tens of thousands of concurrent requests per second is unrealistic.
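
The flow described above can be sketched without a real database. The following is a minimal in-memory simulation, not a production design: the `AutoIdTable` class is a hypothetical stand-in for the auto_id table, and the modulo routing rule in `shardFor` is an assumption made only for illustration. With a real database, the ID would come from an INSERT into the auto-increment table.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the auto_id table: each "insert" returns the next auto-increment value.
class AutoIdTable {
    private long lastId = 0;

    // synchronized mimics a single database serializing the inserts,
    // which is also where the bottleneck comes from.
    synchronized long insertAndGetId() {
        return ++lastId;
    }
}

class AutoIdDemo {
    // Assumed routing rule for illustration: shard index = id mod shard count.
    static int shardFor(long id, int shardCount) {
        return (int) (id % shardCount);
    }

    public static void main(String[] args) {
        AutoIdTable autoId = new AutoIdTable();
        int shardCount = 4;
        List<List<Long>> shards = new ArrayList<>();
        for (int i = 0; i < shardCount; i++) {
            shards.add(new ArrayList<>());
        }
        // Obtain a global ID first, then write the record into the shard chosen from that ID.
        for (int i = 0; i < 8; i++) {
            long id = autoId.insertAndGetId();
            shards.get(shardFor(id, shardCount)).add(id);
        }
        System.out.println(shards);
    }
}
```

Every caller goes through the one `insertAndGetId()` method, which mirrors why a single ID database limits throughput.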

2. UUID

Generate a globally unique ID with a UUID.
The advantage is that each system generates IDs locally, without relying on a database.
The disadvantage is that a UUID is long (128 bits, usually written as 32 hexadecimal characters, or 36 characters with hyphens), and it performs poorly as a primary key. A UUID is fine for a randomly generated file name, a serial number and so on, but it should not be used as the primary key.
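
As a small illustration of the length point, the sketch below uses the standard `java.util.UUID` class to generate a random UUID and print both its hyphenated and compact forms. The `UuidDemo` class and `compactUuid` helper are names invented for this example.

```java
import java.util.UUID;

class UuidDemo {
    // Returns a random UUID as 32 lowercase hexadecimal characters, without hyphens.
    static String compactUuid() {
        return UUID.randomUUID().toString().replace("-", "");
    }

    public static void main(String[] args) {
        UUID id = UUID.randomUUID();
        // Standard text form: 36 characters, including 4 hyphens.
        System.out.println(id + " (" + id.toString().length() + " chars)");
        // Compact form: 32 hex characters, 4 bits each, 128 bits in total.
        String compact = compactUuid();
        System.out.println(compact + " (" + compact.length() + " chars)");
    }
}
```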

3. Get the current time of the system

This scheme uses the current time as the globally unique ID. The problem is that under high concurrency, such as thousands of requests per second, duplicates will occur, so a timestamp alone is not suitable. In practice, the current time is usually concatenated with other business fields to form the ID, and if the business can accept that, the scheme is workable. For example, an order number could be timestamp + user ID + business code, or timestamp + 6 random characters (drawn from the 26 English letters and the digits).
In addition, a unique index can be added to the table to guard against duplicates.
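
The two concatenation patterns mentioned above might look like the following sketch. The class and method names are hypothetical, and the zero-padding of the user ID to 10 digits is an assumption added so that the fields keep a fixed width. Neither form is guaranteed unique by construction, which is why the unique index still matters.

```java
import java.security.SecureRandom;

class OrderNoDemo {
    // 26 English letters plus the digits, as described in the text.
    private static final String ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
    private static final SecureRandom RANDOM = new SecureRandom();

    // timestamp + user ID + business code.
    // Assumption: the user ID is zero-padded to 10 digits to keep a fixed width.
    static String byBusinessFields(long timestampMillis, long userId, String bizCode) {
        return String.format("%d%010d%s", timestampMillis, userId, bizCode);
    }

    // timestamp + 6 random characters from ALPHABET.
    static String byRandomSuffix(long timestampMillis) {
        StringBuilder sb = new StringBuilder(Long.toString(timestampMillis));
        for (int i = 0; i < 6; i++) {
            sb.append(ALPHABET.charAt(RANDOM.nextInt(ALPHABET.length())));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        System.out.println(byBusinessFields(now, 42L, "OD"));
        System.out.println(byRandomSuffix(now));
    }
}
```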

4. Snowflake algorithm

The snowflake algorithm is an open-source distributed ID generation algorithm from Twitter.
The core idea is to use a 64-bit long number as the globally unique ID. Of the 64 bits, 1 bit is unused, 41 bits hold the timestamp in milliseconds, 10 bits hold the worker machine ID, and 12 bits hold the sequence number.

[Figure: image.png]

The first part is 1 bit: 0, which carries no meaning (if the first bit of the binary number were 1, the value would be negative, but the IDs we generate are all positive, so the first bit is always 0).
The second part is 41 bits: the timestamp, in milliseconds (41 bits can represent values up to 2^41 - 1, that is, 2^41 - 1 milliseconds, which is about 69 years).
The third part is 5 bits: the machine room ID, for example 10001 (5 bits allow up to 2^5 = 32 machine rooms).
The fourth part is 5 bits: the machine ID, for example 11001 (each machine room can have up to 2^5 = 32 machines).
The fifth part is 12 bits: the sequence number of the IDs generated within the same millisecond on one machine in one machine room, for example 0000 00000000 (12 bits give 2^12 = 4096 distinct values, from 0 to 4095, so up to 4096 different IDs can be distinguished within the same millisecond).
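
The capacities quoted for each part follow from the bit widths. The short sketch below recomputes them; the `SnowflakeLayoutDemo` class and its constant names are made up for this example, and the year figure assumes 365-day years.

```java
class SnowflakeLayoutDemo {
    static final int SIGN_BITS = 1;
    static final int TIMESTAMP_BITS = 41;
    static final int DATACENTER_BITS = 5;
    static final int WORKER_BITS = 5;
    static final int SEQUENCE_BITS = 12;

    // Largest value that fits in the given number of bits: 2^bits - 1.
    static long maxValue(int bits) {
        return (1L << bits) - 1;
    }

    // Converts a millisecond count to years, assuming 365-day years.
    static double millisToYears(long millis) {
        return millis / (1000.0 * 60 * 60 * 24 * 365);
    }

    public static void main(String[] args) {
        int total = SIGN_BITS + TIMESTAMP_BITS + DATACENTER_BITS + WORKER_BITS + SEQUENCE_BITS;
        System.out.println("total bits: " + total);
        System.out.printf("timestamp: max %d ms, about %.1f years%n",
                maxValue(TIMESTAMP_BITS), millisToYears(maxValue(TIMESTAMP_BITS)));
        System.out.println("machine rooms: 0 to " + maxValue(DATACENTER_BITS));
        System.out.println("machines per room: 0 to " + maxValue(WORKER_BITS));
        System.out.println("sequence per millisecond: 0 to " + maxValue(SEQUENCE_BITS));
    }
}
```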

Simply put, if one of your services needs a globally unique ID, it can send a request to the system where the snowflake algorithm is deployed, and that system generates the unique ID.

After receiving the request, the snowflake system builds a 64-bit long ID with binary bit operations. The first of the 64 bits carries no meaning. The next 41 bits hold the current timestamp (in milliseconds), then 5 bits hold the machine room ID and 5 bits hold the machine ID. Finally, the system checks how many requests this machine in this machine room has already served within the current millisecond and assigns the request a sequence number, which becomes the last 12 bits. The result is a 64-bit ID like the one in the figure above.
This algorithm guarantees that the IDs generated on one machine in one machine room within the same millisecond are unique. Several IDs may be generated within a millisecond, but they are distinguished by the sequence number in the last 12 bits. In short, different groups of bits in the 64-bit number are used as separate fields that together distinguish each ID.

The algorithm is implemented as follows

public class IdWorker {
    private long workerId;      // machine ID
    private long datacenterId;  // machine room ID
    private long sequence;      // latest sequence number among the IDs generated within 1 millisecond

    public IdWorker(long workerId, long datacenterId, long sequence) {
        // The machine room ID and machine ID passed in must be between 0 and 31
        if (workerId > maxWorkerId || workerId < 0) {
            throw new IllegalArgumentException(
                    String.format("worker Id can't be greater than %d or less than 0", maxWorkerId));
        }
        if (datacenterId > maxDatacenterId || datacenterId < 0) {
            throw new IllegalArgumentException(
                    String.format("datacenter Id can't be greater than %d or less than 0", maxDatacenterId));
        }
        this.workerId = workerId;
        this.datacenterId = datacenterId;
        this.sequence = sequence;
    }

    private long twepoch = 1288834974657L;  // custom epoch in milliseconds, subtracted from the current time
    private long workerIdBits = 5L;
    private long datacenterIdBits = 5L;

    // Bit trick: 5 bits hold the values 0 to 31, so there can be at most 32 machine IDs
    private long maxWorkerId = -1L ^ (-1L << workerIdBits);
    // Likewise, 5 bits hold the values 0 to 31, so there can be at most 32 machine room IDs
    private long maxDatacenterId = -1L ^ (-1L << datacenterIdBits);
    private long sequenceBits = 12L;
    private long workerIdShift = sequenceBits;
    private long datacenterIdShift = sequenceBits + workerIdBits;
    private long timestampLeftShift = sequenceBits + workerIdBits + datacenterIdBits;
    private long sequenceMask = -1L ^ (-1L << sequenceBits);
    private long lastTimestamp = -1L;

    public long getWorkerId() {
        return workerId;
    }

    public long getDatacenterId() {
        return datacenterId;
    }

    public long getTimestamp() {
        return System.currentTimeMillis();
    }

    // Core algorithm: calling nextId() makes the snowflake generator on the current machine produce one ID
    public synchronized long nextId() {
        // Get the current timestamp in milliseconds
        long timestamp = timeGen();
        if (timestamp < lastTimestamp) {
            System.err.printf("clock is moving backwards. Rejecting requests until %d.%n", lastTimestamp);
            throw new RuntimeException(String.format(
                    "Clock moved backwards. Refusing to generate id for %d milliseconds",
                    lastTimestamp - timestamp));
        }

        // Suppose another request to generate an ID arrives within the same millisecond.
        // The sequence number is then incremented by 1, up to 4095.
        if (lastTimestamp == timestamp) {
            // At most 4096 IDs can be generated within one millisecond.
            // The mask keeps the sequence within 0 to 4095; when it wraps to 0, wait for the next millisecond.
            sequence = (sequence + 1) & sequenceMask;
            if (sequence == 0) {
                timestamp = tilNextMillis(lastTimestamp);
            }
        } else {
            sequence = 0;
        }
        // Record the timestamp of the last generated ID, in milliseconds
        lastTimestamp = timestamp;
        // Core bit operation that builds the 64-bit ID:
        // shift the timestamp offset left into its 41-bit field, the machine room ID into its 5 bits
        // and the machine ID into its 5 bits, then OR them together with the 12-bit sequence number.
        // The result is a single 64-bit value of type long.
        return ((timestamp - twepoch) << timestampLeftShift)
                | (datacenterId << datacenterIdShift)
                | (workerId << workerIdShift)
                | sequence;
    }
    // Spin until the clock has moved past lastTimestamp
    private long tilNextMillis(long lastTimestamp) {
        long timestamp = timeGen();
        while (timestamp <= lastTimestamp) {
            timestamp = timeGen();
        }
        return timestamp;
    }

    private long timeGen() {
        return System.currentTimeMillis();
    }

    //---------------Testing---------------
    public static void main(String[] args) {
        IdWorker worker = new IdWorker(1, 1, 1);
        for (int i = 0; i < 30; i++) {
            System.out.println(worker.nextId());
        }
    }
}
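
Because each field sits in a fixed bit range, an ID can also be taken apart again. The sketch below is an illustrative companion to the listing, not part of the original implementation: the `SnowflakeDecodeDemo` class and its method names are invented here, and it assumes the same epoch and the 41/5/5/12 layout used above.

```java
class SnowflakeDecodeDemo {
    static final long TWEPOCH = 1288834974657L; // same custom epoch as in IdWorker
    static final int SEQUENCE_BITS = 12;
    static final int WORKER_BITS = 5;
    static final int DATACENTER_BITS = 5;

    // Lowest 12 bits: the sequence number.
    static long sequence(long id) {
        return id & ((1L << SEQUENCE_BITS) - 1);
    }

    // Next 5 bits: the machine ID.
    static long workerId(long id) {
        return (id >> SEQUENCE_BITS) & ((1L << WORKER_BITS) - 1);
    }

    // Next 5 bits: the machine room ID.
    static long datacenterId(long id) {
        return (id >> (SEQUENCE_BITS + WORKER_BITS)) & ((1L << DATACENTER_BITS) - 1);
    }

    // Remaining high bits: milliseconds since TWEPOCH, converted back to Unix time.
    static long timestampMillis(long id) {
        return (id >> (SEQUENCE_BITS + WORKER_BITS + DATACENTER_BITS)) + TWEPOCH;
    }

    public static void main(String[] args) {
        // Build a sample ID the same way nextId() does, then decode it.
        long now = System.currentTimeMillis();
        long id = ((now - TWEPOCH) << 22) | (3L << 17) | (7L << 12) | 42L;
        System.out.println("id          = " + id);
        System.out.println("timestamp   = " + timestampMillis(id));
        System.out.println("machineRoom = " + datacenterId(id));
        System.out.println("machine     = " + workerId(id));
        System.out.println("sequence    = " + sequence(id));
    }
}
```

Decoding like this can help when debugging, for example to find out which machine issued a given ID and when.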

In actual development, the snowflake algorithm can be adapted slightly. When we generate a unique ID, we usually know which table it is for, such as the order table. Therefore, among the 64 bits above, the 5 bits that represent the machine room can instead hold a code for the business table; for example, 00001 represents the order table. In many cases there are not that many machine rooms, so spending 5 bits on the machine room ID adds little value. With this change, each machine in the snowflake system generates unique IDs for a given business table within a given millisecond, and when many IDs are generated within one millisecond, the sequence number in the last 12 bits distinguishes them.
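
One possible shape for this variant is sketched below. It is only an illustration under stated assumptions: the `TableIdWorker` class, the `ORDER_TABLE` constant and the `tableCodeOf` helper are hypothetical names, and a single sequence counter shared by all tables is a simplification chosen to keep the example short.

```java
class TableIdWorker {
    private static final long TWEPOCH = 1288834974657L;
    private static final long SEQUENCE_BITS = 12L;
    private static final long WORKER_BITS = 5L;
    private static final long TABLE_BITS = 5L; // takes the place of the 5 machine room bits
    private static final long MAX_WORKER = ~(-1L << WORKER_BITS);
    private static final long MAX_TABLE = ~(-1L << TABLE_BITS);
    private static final long SEQUENCE_MASK = ~(-1L << SEQUENCE_BITS);

    // Hypothetical table code, following the 00001 example for the order table.
    static final long ORDER_TABLE = 0b00001L;

    private final long workerId;
    private long sequence = 0L;
    private long lastTimestamp = -1L;

    TableIdWorker(long workerId) {
        if (workerId < 0 || workerId > MAX_WORKER) {
            throw new IllegalArgumentException("workerId must be between 0 and " + MAX_WORKER);
        }
        this.workerId = workerId;
    }

    // Generates an ID whose former machine room field carries the business table code.
    synchronized long nextId(long tableCode) {
        if (tableCode < 0 || tableCode > MAX_TABLE) {
            throw new IllegalArgumentException("tableCode must be between 0 and " + MAX_TABLE);
        }
        long timestamp = System.currentTimeMillis();
        if (timestamp < lastTimestamp) {
            throw new IllegalStateException("Clock moved backwards");
        }
        if (timestamp == lastTimestamp) {
            sequence = (sequence + 1) & SEQUENCE_MASK;
            if (sequence == 0) {
                // Sequence exhausted for this millisecond: wait for the next one.
                while (timestamp <= lastTimestamp) {
                    timestamp = System.currentTimeMillis();
                }
            }
        } else {
            sequence = 0L;
        }
        lastTimestamp = timestamp;
        return ((timestamp - TWEPOCH) << (TABLE_BITS + WORKER_BITS + SEQUENCE_BITS))
                | (tableCode << (WORKER_BITS + SEQUENCE_BITS))
                | (workerId << SEQUENCE_BITS)
                | sequence;
    }

    // Reads the table code back out of an ID.
    static long tableCodeOf(long id) {
        return (id >> (WORKER_BITS + SEQUENCE_BITS)) & MAX_TABLE;
    }

    public static void main(String[] args) {
        TableIdWorker worker = new TableIdWorker(1);
        long id = worker.nextId(ORDER_TABLE);
        System.out.println(id + " -> table code " + tableCodeOf(id));
    }
}
```

With 5 bits, this layout supports at most 32 business tables, so it fits only when the number of tables that need IDs is small.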