How to generate a globally unique ID in a distributed environment?

Time: 2021-2-22

In a distributed system, some scenarios require a globally unique ID. The ID may be tied to the business, such as a payment serial number, or have nothing to do with it, for example a primary key after database and table sharding, a transaction version number, or a trace ID for distributed tracing. Such an ID generally has to meet the following requirements:

  • Global uniqueness: this is the most basic requirement; an ID must never repeat.
  • Monotonic increase: some scenarios require it, such as transaction version numbers, where an ID generated later must be greater than any earlier one. In other scenarios increasing IDs are simply preferable to non-increasing ones, because they are better for database index performance.
  • High availability: a system or service that generates unique IDs is called very frequently, so its availability is critical.
  • Information security: consecutive IDs are easy to exploit and leak information. For example, if order numbers are consecutive, it is easy to work out how many orders are placed each day.
  • Compactness: given the storage cost, the shorter the ID, the better.

So what are the options for generating unique IDs in a distributed system?

Generating IDs with a database

Let's start with the easiest scheme to understand: the database's auto-increment sequence. The database generates a unique primary key, and a service exposes it to other systems. For a small system where neither the total data volume nor the concurrency is very large, this scheme is sufficient.

Generating one ID per request can put pressure on the database. Instead, you can fetch N IDs at a time and keep them in a cache; once every cached ID has been handed out, fetch the next batch from the database.
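The batching idea above can be sketched as a segment allocator: each trip to the database reserves a whole range of IDs, and the service hands them out from memory. This is a minimal illustration, not a production design. `FakeSegmentStore` is a hypothetical in-memory stand-in for the database, and the SQL in its comment (table `id_segment`, column `biz_tag`) is an assumed schema, not one taken from any particular product.

```java
// Hands out IDs from an in-memory segment and only goes back to the store
// when the current segment has been used up.
class SegmentIdGenerator {
    private final FakeSegmentStore store;
    private final int step;
    private long next = 1; // next ID to hand out
    private long max = 0;  // last ID in the current segment (inclusive)

    SegmentIdGenerator(FakeSegmentStore store, int step) {
        this.store = store;
        this.step = step;
    }

    synchronized long nextId() {
        if (next > max) {
            // Segment exhausted: reserve the next `step` IDs in one round trip.
            max = store.allocate(step);
            next = max - step + 1;
        }
        return next++;
    }

    public static void main(String[] args) {
        FakeSegmentStore store = new FakeSegmentStore();
        SegmentIdGenerator gen = new SegmentIdGenerator(store, 1000);
        for (int i = 0; i < 2500; i++) {
            gen.nextId();
        }
        // 2500 IDs with a step of 1000 need 3 round trips to the store.
        System.out.println("store calls: " + store.calls());
    }
}

// Hypothetical stand-in for the database. A real implementation might run
// "UPDATE id_segment SET max_id = max_id + ? WHERE biz_tag = ?" and read max_id back
// in one transaction, so that concurrent callers never receive overlapping segments.
class FakeSegmentStore {
    private long maxId = 0;
    private int calls = 0;

    // Reserves the next `step` IDs and returns the new upper bound (inclusive).
    synchronized long allocate(int step) {
        calls++;
        maxId += step;
        return maxId;
    }

    synchronized int calls() {
        return calls;
    }
}
```

A larger step means fewer database round trips, but any IDs left in a segment when the service restarts are skipped. If several service instances share the same store, each one receives disjoint segments, so IDs stay unique but are only roughly increasing across instances.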

  • Advantages: it is the easiest scheme to understand and to implement.
  • Disadvantages: equally obvious. Every database implements auto-increment differently, so migrating to another database is troublesome. The biggest problem is performance: once concurrency reaches a certain level, this approach will probably struggle to keep up. In addition, an auto-increment ID carries very little information and serves only as an identifier, and because it is consecutive it suffers from the information-security problem described above.

Generating IDs with other components, software, or middleware

Redis, MongoDB, and ZooKeeper can all generate globally unique identifiers: Redis through the INCR and INCRBY commands, MongoDB through its ObjectId, and ZooKeeper through the znode data version.

Take MongoDB's ObjectId as an example:

{"_id": ObjectId("5d47ca7528021724ac19f745")}

A MongoDB ObjectId is 12 bytes in total:

  • Up to and including version 3.2: 4-byte timestamp + 3-byte machine identifier + 2-byte process ID + 3-byte counter (starting from a random value);
  • After version 3.2: 4-byte timestamp + 5-byte random value + 3-byte incrementing counter.
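To make the newer layout concrete, here is a sketch that assembles a 12-byte value the same way: a 4-byte timestamp in seconds, a 5-byte random value chosen once per process, and a 3-byte counter that starts at a random value. `ObjectIdSketch` is a hypothetical illustration, not MongoDB driver code; real applications should use the ObjectId class their driver provides.

```java
import java.security.SecureRandom;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative ObjectId-style layout: 4-byte timestamp + 5-byte random + 3-byte counter.
class ObjectIdSketch {
    private static final SecureRandom RANDOM = new SecureRandom();
    // Random value generated once per process.
    private static final byte[] PROCESS_RANDOM = new byte[5];
    // Counter starts at a random 24-bit value; only the low 24 bits are used.
    private static final AtomicInteger COUNTER;

    static {
        RANDOM.nextBytes(PROCESS_RANDOM);
        COUNTER = new AtomicInteger(RANDOM.nextInt(1 << 24));
    }

    static byte[] next(long epochSeconds) {
        byte[] id = new byte[12];
        int ts = (int) epochSeconds;
        // Big-endian timestamp, so IDs sort roughly by creation time.
        id[0] = (byte) (ts >>> 24);
        id[1] = (byte) (ts >>> 16);
        id[2] = (byte) (ts >>> 8);
        id[3] = (byte) ts;
        System.arraycopy(PROCESS_RANDOM, 0, id, 4, 5);
        int c = COUNTER.getAndIncrement() & 0xFFFFFF; // wrap at 2^24
        id[9] = (byte) (c >>> 16);
        id[10] = (byte) (c >>> 8);
        id[11] = (byte) c;
        return id;
    }

    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    // Reads the timestamp (seconds since the Unix epoch) back out of the first 4 bytes.
    static long timestampOf(byte[] id) {
        return ((id[0] & 0xFFL) << 24) | ((id[1] & 0xFFL) << 16)
                | ((id[2] & 0xFFL) << 8) | (id[3] & 0xFFL);
    }

    public static void main(String[] args) {
        byte[] id = next(System.currentTimeMillis() / 1000);
        System.out.println(toHex(id) + " created at " + timestampOf(id));
    }
}
```

Reading the timestamp this way also explains the "ID has some meaning" advantage: the first four bytes of the ObjectId shown earlier, 5d47ca75, decode to 1564985973 seconds since the Unix epoch, which falls in early August 2019.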

In both the old and the new format, an ObjectId is guaranteed to be unique at least within the cluster. We can therefore build a global unique ID service that uses MongoDB to generate ObjectIds and exposes them to other systems (every MongoDB language driver implements the ObjectId generation algorithm).

  • Advantages: better performance than the database approach; can be deployed as a cluster; the ID carries some meaning, such as a timestamp.
  • Disadvantages: as with the database approach, an extra component has to be introduced, which increases system complexity. More importantly, in both schemes the system (service) that generates the global unique IDs becomes a single point. In a software architecture, a single point is a risk in itself: if that service fails, every system that depends on it fails too.

UUID

UUID is the most commonly used way to generate unique identifiers in distributed architectures. To guarantee uniqueness, its generation factors include the MAC address, a timestamp, a namespace, random or pseudo-random numbers, a clock sequence, and other elements. UUID has several versions, each with its own algorithm and scope of application:

  • Version 1: time-based UUID, built from a timestamp, a clock sequence (often randomly initialized), and the MAC address; an application used only on a LAN can use its IP address instead of the MAC address. Highly unique, although exposing the MAC address is itself a security concern.
  • Version 2: DCE Security UUID; replaces the first four bytes of the version 1 timestamp with a POSIX UID or GID. Highly unique.
  • Version 3: name-based UUID (MD5), computed from the MD5 hash of a namespace and a name. Unique within a given scope: the same namespace and name always yield the same UUID.
  • Version 4: random UUID, generated from random or pseudo-random numbers. Collisions are possible in principle, though with 122 random bits they are extremely unlikely.
  • Version 5: name-based UUID (SHA-1), like version 3 except that SHA-1 is used as the hash. Unique within a given scope.
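
import java.util.UUID;

public class CreateUUID {
    public static void main(String[] args) {
        // Version 4 (random) UUID in the standard 36-character hyphenated form
        String uuid = UUID.randomUUID().toString();
        System.out.println("uuid : " + uuid);

        // Another random UUID with the hyphens stripped: 32 hex characters
        uuid = UUID.randomUUID().toString().replace("-", "");
        System.out.println("uuid : " + uuid);
    }
}
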
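The example above only produces version 4 UUIDs. The Java standard library also provides `UUID.nameUUIDFromBytes`, which returns a version 3 (MD5) UUID, but it hashes exactly the bytes it is given and takes no namespace argument. The sketch below assumes the namespace-plus-name scheme from the list above and prepends the namespace UUID's 16 bytes to the UTF-8 name itself; `nameBasedV3` is a hypothetical helper name, and "example.com" is just a sample input.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.UUID;

class UuidVersions {
    // Name-based (version 3) UUID: MD5 over the namespace UUID's 16 bytes followed by
    // the UTF-8 name. nameUUIDFromBytes sets the version and variant bits for us.
    static UUID nameBasedV3(UUID namespace, String name) {
        byte[] nameBytes = name.getBytes(StandardCharsets.UTF_8);
        byte[] input = ByteBuffer.allocate(16 + nameBytes.length)
                .putLong(namespace.getMostSignificantBits())
                .putLong(namespace.getLeastSignificantBits())
                .put(nameBytes)
                .array();
        return UUID.nameUUIDFromBytes(input);
    }

    public static void main(String[] args) {
        // Version 4: random, different on every call
        UUID random = UUID.randomUUID();
        System.out.println(random + " (version " + random.version() + ")");

        // Version 3: deterministic, same namespace + name gives the same UUID.
        // This is the DNS namespace UUID defined in RFC 4122.
        UUID dns = UUID.fromString("6ba7b810-9dad-11d1-80b4-00c04fd430c8");
        UUID a = nameBasedV3(dns, "example.com");
        UUID b = nameBasedV3(dns, "example.com");
        System.out.println(a + " (version " + a.version() + "), repeatable: " + a.equals(b));
    }
}
```

Because a name-based UUID is fully determined by its inputs, it is unique only as long as the names are unique within the namespace, which is what "unique within a given scope" means in the list above.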
  • Advantages: generated locally with no network overhead and no third-party components (so no single point of failure); simple to generate; good performance.
  • Disadvantages: UUIDs are long, which is bad for storage, and they are unordered, which hurts performance. For example, if a UUID is used as the primary key in MySQL's InnoDB engine, which stores rows in primary-key order, its randomness causes frequent changes in where rows are placed and leads to page splits.
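The length problem can be eased somewhat by storing a UUID as its raw 16 bytes (for example in a BINARY(16) column) instead of the 36-character string. The helper class below, `UuidBytes`, is a hypothetical sketch of that conversion using only `java.util.UUID` and `java.nio.ByteBuffer`.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

// Packs a UUID into 16 bytes and back, instead of storing the 36-character string.
class UuidBytes {
    static byte[] toBytes(UUID u) {
        return ByteBuffer.allocate(16)
                .putLong(u.getMostSignificantBits())
                .putLong(u.getLeastSignificantBits())
                .array();
    }

    static UUID fromBytes(byte[] b) {
        ByteBuffer buf = ByteBuffer.wrap(b);
        long msb = buf.getLong(); // first 8 bytes
        long lsb = buf.getLong(); // last 8 bytes
        return new UUID(msb, lsb);
    }

    public static void main(String[] args) {
        UUID u = UUID.randomUUID();
        byte[] packed = toBytes(u);
        System.out.println(u.toString().length() + " chars vs " + packed.length + " bytes"); // 36 chars vs 16 bytes
        System.out.println("round trip ok: " + u.equals(fromBytes(packed)));
    }
}
```

This only shrinks the key. Random (version 4) UUIDs are still unordered, so the index-locality problem described above remains.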

Snowflake

If you want IDs to be generated locally but without the disorder of UUIDs, consider the Snowflake algorithm, open-sourced by Twitter.

A Snowflake ID is a 64-bit integer made up of:

  • 1 bit: unused, always 0 (this is the sign bit, so the ID stays positive);
  • 41 bits: timestamp in milliseconds, ranging from 0 to 2^41 − 1. Since 2^41 ms ≈ 2.2 × 10^12 ms, this covers about 69 years (the timestamp is usually counted from a custom epoch rather than from 1970);
  • 10 bits: machine ID, made up of a 5-bit data center ID and a 5-bit machine ID. When the cluster is small, these can be configured manually; at larger scale they can be assigned automatically by a third-party component. For example, Meituan's Leaf-snowflake uses ZooKeeper persistent sequential nodes to assign machine IDs;
  • 12 bits: sequence number, distinguishing IDs generated within the same millisecond (up to 4096 per machine per millisecond).
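The bit layout above translates directly into a small generator. The following is a minimal sketch, not Twitter's original implementation: the class name `SnowflakeIdGenerator` is hypothetical, and the custom epoch (2021-01-01T00:00:00Z) is an assumed value chosen for illustration. When the clock moves backwards it simply refuses to issue IDs.

```java
// Snowflake-style layout: 1 sign bit | 41-bit timestamp | 5-bit datacenter | 5-bit machine | 12-bit sequence
class SnowflakeIdGenerator {
    // Assumed custom epoch: 2021-01-01T00:00:00Z in milliseconds.
    private static final long EPOCH = 1609459200000L;
    private static final int DATACENTER_BITS = 5;
    private static final int MACHINE_BITS = 5;
    private static final int SEQUENCE_BITS = 12;
    private static final long MAX_DATACENTER = (1L << DATACENTER_BITS) - 1; // 31
    private static final long MAX_MACHINE = (1L << MACHINE_BITS) - 1;       // 31
    private static final long SEQUENCE_MASK = (1L << SEQUENCE_BITS) - 1;    // 4095
    private static final int MACHINE_SHIFT = SEQUENCE_BITS;                        // 12
    private static final int DATACENTER_SHIFT = SEQUENCE_BITS + MACHINE_BITS;      // 17
    private static final int TIMESTAMP_SHIFT = DATACENTER_SHIFT + DATACENTER_BITS; // 22

    private final long datacenterId;
    private final long machineId;
    private long lastTimestamp = -1L;
    private long sequence = 0L;

    SnowflakeIdGenerator(long datacenterId, long machineId) {
        if (datacenterId < 0 || datacenterId > MAX_DATACENTER || machineId < 0 || machineId > MAX_MACHINE) {
            throw new IllegalArgumentException("datacenterId and machineId must be in [0, 31]");
        }
        this.datacenterId = datacenterId;
        this.machineId = machineId;
    }

    synchronized long nextId() {
        long now = System.currentTimeMillis();
        if (now < lastTimestamp) {
            // Clock moved backwards: refuse rather than risk duplicate IDs.
            throw new IllegalStateException("Clock moved backwards by " + (lastTimestamp - now) + " ms");
        }
        if (now == lastTimestamp) {
            sequence = (sequence + 1) & SEQUENCE_MASK;
            if (sequence == 0) {
                // All 4096 sequence values used in this millisecond: spin until the next one.
                while (now <= lastTimestamp) {
                    now = System.currentTimeMillis();
                }
            }
        } else {
            sequence = 0L; // new millisecond: restart the sequence
        }
        lastTimestamp = now;
        return ((now - EPOCH) << TIMESTAMP_SHIFT)
                | (datacenterId << DATACENTER_SHIFT)
                | (machineId << MACHINE_SHIFT)
                | sequence;
    }

    public static void main(String[] args) {
        SnowflakeIdGenerator gen = new SnowflakeIdGenerator(1, 3);
        for (int i = 0; i < 3; i++) {
            System.out.println(gen.nextId());
        }
    }
}
```

Because the timestamp occupies the high bits, IDs from one generator are strictly increasing, and IDs from different machines are ordered by time to within clock skew, which is the "trend-increasing" property.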

In Java, a Snowflake ID fits in a long.
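Since the fields sit at fixed bit offsets, a long can also be decoded back into its parts with shifts and masks. The sketch below assumes the 41/5/5/12 layout described above and a hypothetical custom epoch of 2021-01-01T00:00:00Z; `SnowflakeParts` and its methods are illustrative names.

```java
import java.time.Instant;

class SnowflakeParts {
    static final long EPOCH = 1609459200000L; // assumed custom epoch (ms)

    // Packs the fields into a long: timestamp << 22 | datacenter << 17 | machine << 12 | sequence.
    static long compose(long timestampMs, long datacenterId, long machineId, long sequence) {
        return ((timestampMs - EPOCH) << 22) | (datacenterId << 17) | (machineId << 12) | sequence;
    }

    // Unpacks the fields again with unsigned shifts and masks.
    static String describe(long id) {
        long sequence = id & 0xFFF;              // low 12 bits
        long machineId = (id >>> 12) & 0x1F;     // next 5 bits
        long datacenterId = (id >>> 17) & 0x1F;  // next 5 bits
        long timestampMs = (id >>> 22) + EPOCH;  // remaining 41 bits, back to Unix time
        return Instant.ofEpochMilli(timestampMs) + " dc=" + datacenterId
                + " machine=" + machineId + " seq=" + sequence;
    }

    public static void main(String[] args) {
        long id = compose(1613952000000L, 1, 3, 42); // 2021-02-22T00:00:00Z
        System.out.println(describe(id)); // 2021-02-22T00:00:00Z dc=1 machine=3 seq=42
    }
}
```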

  • Advantages: generated locally with no network overhead and no third-party components (no single point of failure); unique within a given scope, which satisfies most scenarios; good performance; IDs increase with the timestamp (trend-increasing).
  • Disadvantages: it depends on the machine clock. If a machine's clock is set back, the generated IDs risk being duplicated.
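One common way to limit the clock-rollback risk is to tolerate small backward jumps by waiting for the clock to catch up, and to fail loudly on large ones. The sketch below is an assumption-based illustration, not code from Leaf or any other library: `ClockGuard` is a hypothetical class, the clock is injected as a `LongSupplier` so the behavior can be tested, and the 5 ms tolerance in `main` is an arbitrary example value.

```java
import java.util.function.LongSupplier;

class ClockGuard {
    private final LongSupplier clock;  // time source in ms; injectable for testing
    private final long maxToleratedMs; // largest backward jump we are willing to wait out
    private long lastTimestamp = -1L;

    ClockGuard(LongSupplier clock, long maxToleratedMs) {
        this.clock = clock;
        this.maxToleratedMs = maxToleratedMs;
    }

    // Returns a timestamp that is never smaller than the previously returned one.
    synchronized long nextTimestamp() {
        long now = clock.getAsLong();
        if (now < lastTimestamp) {
            long drift = lastTimestamp - now;
            if (drift > maxToleratedMs) {
                // Large rollback: stop issuing IDs instead of risking duplicates.
                throw new IllegalStateException("Clock moved backwards by " + drift + " ms");
            }
            try {
                Thread.sleep(drift); // small rollback: wait for the clock to catch up
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting for the clock", e);
            }
            now = clock.getAsLong();
            if (now < lastTimestamp) {
                throw new IllegalStateException("Clock is still behind after waiting");
            }
        }
        lastTimestamp = now;
        return now;
    }

    public static void main(String[] args) {
        ClockGuard guard = new ClockGuard(System::currentTimeMillis, 5);
        System.out.println(guard.nextTimestamp());
    }
}
```

A Snowflake generator would call `nextTimestamp()` in place of reading `System.currentTimeMillis()` directly. Note that waiting only protects against duplicates from this process; if the machine restarts after a large rollback, the last issued timestamp is lost unless it is persisted somewhere.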


Beyond these, many leading Internet companies have open-sourced unique ID generation solutions or frameworks, such as Meituan's Leaf and Baidu's UidGenerator. For example, with Baidu's UidGenerator injected as a Spring bean, generating and parsing an ID looks like this:

@Resource
private UidGenerator uidGenerator;

@Test
public void testSerialGenerate() {
    // Generate a UID
    long uid = uidGenerator.getUID();
    // Decode the UID into its component fields and print them
    System.out.println(uidGenerator.parseUID(uid));
}

Uncle Wen of Huidian code [original]

