Business systems need to generate unique IDs in many scenarios, such as order numbers, payment slip numbers, and coupon numbers. This article explains why distributed IDs are needed, surveys four distributed ID schemes commonly used in the industry, and describes two of them in detail, along with their advantages and disadvantages. We hope it gives you some ideas about distributed IDs.
Why use distributed ID
As business data grows, more and more data is stored in the database. Once the space occupied by an index exceeds the available memory, queries have to read the index from disk, which greatly slows them down. The usual first remedy is to split the data across multiple databases and tables (sharding). After sharding, the database auto-increment ID can no longer serve as the unique number of a row, so we need distributed IDs for unique numbering.
Implementation of distributed ID
At present, the industry uses four main schemes for distributed IDs:
- UUID: an ID generated by the JDK's `UUID.randomUUID()`;
- Redis atomic increment: an ID generated by calling `Jedis#incr(String key)`;
- Snowflake algorithm: a 64-bit long ID composed of a timestamp, a machine number, and a sequence number within the millisecond;
- Step size: a range of available IDs is read from the database, one step at a time.
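As a point of reference for the first scheme, here is a minimal sketch of UUID-based ID generation. The class name `UuidIdExample` is made up for illustration. The Redis variant is not shown because it needs a running Redis server and the Jedis client on the classpath.

```java
import java.util.UUID;

/** Hypothetical helper showing the UUID scheme; not taken from the original article. */
final class UuidIdExample {

    /** Returns a random (version 4) UUID in its 36-character string form. */
    static String nextId() {
        return UUID.randomUUID().toString();
    }

    public static void main(String[] args) {
        // Each call yields a different, unordered value.
        System.out.println(nextId());
        System.out.println(nextId());
    }
}
```

The values carry no ordering, which is why the UUID scheme is listed as unordered in the comparison.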
The table below summarizes the characteristics of these schemes:
|Scheme||Ordering||Uniqueness||Availability||Deployment mode||Usable lifetime|
|UUID||Unordered||Collisions are extremely unlikely thanks to the large number of random bits, but possible in theory||Always available||Direct JDK call||Permanent|
|Redis||Monotonically increasing||Duplicates can occur with RDB persistence (increments made after the last snapshot are lost if Redis crashes)||Unavailable while Redis is down||Jedis client call||Permanent|
|Snowflake||Trending upward||No duplicates||Unavailable when the clock is set back by more than the wait threshold||Integrated deployment, cluster deployment||69 years|
|Step size||Trending upward||No duplicates||Unavailable if the database is down and the IDs in the current step are used up||Integrated deployment, cluster deployment||Permanent|
The first two schemes are widely known, so we will not repeat them here. This article describes the Snowflake algorithm and the step size scheme in detail.
Snowflake can be used as soon as a machine number has been assigned. It generates IDs locally without relying on any third-party service at runtime, and the fewer third-party services a component depends on, the higher its availability. Let's introduce the Snowflake algorithm first.
The range of a 64-bit signed long integer is -2^63 to 2^63-1.
A Snowflake ID consists of 64 binary bits, numbered from left to right, but the first bit (the sign bit) is not used. Snowflake therefore uses the remaining 63 bits as a non-negative long integer composed of a timestamp, a machine number, and a sequence number within the millisecond:
- Timestamp bits: the difference between the current millisecond timestamp and the epoch timestamp (the epoch is the moment the application starts using Snowflake). If no epoch is set, the timestamp is counted from 1970 by default; setting an epoch extends the usable lifetime of Snowflake. 41 bits can represent 2^41 milliseconds; divided by (365 days * 24 hours * 3600 seconds * 1000 milliseconds), that is about 69 years, so the scheme can be used for about 69 years at most;
- Machine number: 10 bits, that is 2^10 = 1024, so at most 1024 machine nodes are supported;
- Sequence number within the millisecond: 12 bits, that is 2^12 = 4096, so one machine node can generate at most 4096 IDs in a single millisecond.
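The limits in the list above follow directly from the bit widths. The sketch below simply redoes that arithmetic; the class and method names are hypothetical.

```java
/** Recomputes the Snowflake limits from the bit widths given in the text. */
final class SnowflakeCapacity {
    static final int TIMESTAMP_BITS = 41;
    static final int MACHINE_BITS = 10;
    static final int SEQUENCE_BITS = 12;

    // 365 days * 24 hours * 3600 seconds * 1000 milliseconds
    static final long MILLIS_PER_YEAR = 365L * 24 * 3600 * 1000;

    /** Number of years that 2^41 milliseconds can cover. */
    static double usableYears() {
        return (double) (1L << TIMESTAMP_BITS) / MILLIS_PER_YEAR;
    }

    /** Number of distinct machine numbers: 2^10. */
    static long maxMachines() {
        return 1L << MACHINE_BITS;
    }

    /** Number of sequence numbers available in one millisecond: 2^12. */
    static long maxIdsPerMillisecond() {
        return 1L << SEQUENCE_BITS;
    }

    /** Theoretical per-node ceiling for one second. */
    static long maxIdsPerSecondPerNode() {
        return maxIdsPerMillisecond() * 1000;
    }

    public static void main(String[] args) {
        System.out.println("usable years: " + usableYears());
        System.out.println("machines: " + maxMachines());
        System.out.println("IDs per ms: " + maxIdsPerMillisecond());
        System.out.println("IDs per second per node: " + maxIdsPerSecondPerNode());
    }
}
```

The per-second figure (4096 * 1000 = 4,096,000) is a theoretical ceiling for one node; real throughput depends on the host and on lock contention.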
Let’s take a look at the usage of each segment
|Bit range||[1]||[2, 42]||[43, 52]||[53, 64]|
|Description||The highest bit is the sign bit and is not used||41 bits in total: the millisecond timestamp||10 bits in total: the machine number||12 bits in total: the sequence number within the millisecond. If the current request has the same timestamp as the previous request, the sequence number is incremented by one|
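The bit layout in the table can be expressed with shifts and masks. The following is a minimal sketch of packing and unpacking the three fields; `SnowflakeLayout` and its method names are hypothetical, and the timestamp argument is the offset from the epoch in milliseconds.

```java
/** Packs and unpacks the 41/10/12-bit fields described in the table above. */
final class SnowflakeLayout {
    static final int SEQUENCE_BITS = 12;
    static final int MACHINE_BITS = 10;
    static final int TIMESTAMP_BITS = 41;

    static final int MACHINE_SHIFT = SEQUENCE_BITS;                  // 12
    static final int TIMESTAMP_SHIFT = SEQUENCE_BITS + MACHINE_BITS; // 22

    static final long MAX_SEQUENCE = (1L << SEQUENCE_BITS) - 1;      // 4095
    static final long MAX_MACHINE = (1L << MACHINE_BITS) - 1;        // 1023
    static final long MAX_ELAPSED = (1L << TIMESTAMP_BITS) - 1;

    /** Combines the three fields into one non-negative long. */
    static long compose(long elapsedMillis, long machineId, long sequence) {
        if (elapsedMillis < 0 || elapsedMillis > MAX_ELAPSED) {
            throw new IllegalArgumentException("elapsedMillis out of range");
        }
        if (machineId < 0 || machineId > MAX_MACHINE) {
            throw new IllegalArgumentException("machineId out of range");
        }
        if (sequence < 0 || sequence > MAX_SEQUENCE) {
            throw new IllegalArgumentException("sequence out of range");
        }
        return (elapsedMillis << TIMESTAMP_SHIFT) | (machineId << MACHINE_SHIFT) | sequence;
    }

    static long elapsedMillis(long id) {
        return id >>> TIMESTAMP_SHIFT;
    }

    static long machineId(long id) {
        return (id >>> MACHINE_SHIFT) & MAX_MACHINE;
    }

    static long sequence(long id) {
        return id & MAX_SEQUENCE;
    }
}
```

For instance, `SnowflakeLayout.compose(1000, 1, 1)` evaluates to 1000 * 2^22 + 1 * 2^12 + 1 = 4194308097.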
So what does an ID generated by Snowflake look like? Here are a few examples (assuming the epoch is 2020-12-31 00:00:00):
|Time||Machine number||Sequence within the millisecond||Decimal Snowflake ID|
Snowflake can be deployed in three different ways: integrated distributed deployment, central cluster deployment and direct cluster deployment. Let’s introduce these deployment methods.
Snowflake integrated distributed deployment
When only a small number of application nodes use IDs, for example 200 nodes, integrated distributed deployment is a good fit. Each application node obtains its machine number at startup. After that, it does not rely on any third-party service at runtime and generates IDs locally from the timestamp, the machine number, and the sequence number within the millisecond.
The following figure shows how an application server obtains distributed IDs by including the jar package. Each application server node that uses distributed IDs is assigned a machine number that is unique within the network topology. The machine numbers are managed and stored in MySQL or ZooKeeper.
When many machine nodes in the topology use distributed IDs, for example more than 1000, integrated deployment is no longer appropriate: the machine number has only 10 bits, so at most 1024 machine numbers exist. With more than 1000 machine nodes, use the central cluster deployment mode described below.
Snowflake central cluster deployment
Central cluster deployment adds an ID gateway for request forwarding, for example an nginx reverse proxy (the ID REST API gateway in the figure below).
Once the ID gateway is in place, application servers request distributed IDs from it over HTTP or RPC. Compared with integrated distributed deployment, this allows more application nodes to use distributed IDs.
As the figure below shows, machine numbers are assigned only to the ID generator nodes; application nodes do not need one.
The central cluster deployment mode introduces an nginx reverse proxy as the gateway, which increases system complexity and reduces service availability. Next, we introduce a direct cluster deployment mode that supports more than 1000 application nodes without nginx.
Snowflake direct cluster deployment
Compared with the central cluster deployment mode, the direct cluster deployment mode can remove the intermediate ID gateway and improve the service availability.
With an ID gateway, the service addresses of the ID generator nodes are configured in the gateway. With direct cluster deployment, those addresses are configured in the application server's local configuration file or in a configuration center. After the application server obtains the address list, it must implement service routing itself and connect directly to an ID generator to obtain IDs.
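The article does not say which routing strategy the application server should use. As one possibility, the sketch below picks ID generator addresses in round-robin order. The class name `IdGeneratorRouter` and the addresses in the usage note are placeholders, not part of the original design.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical client-side router: cycles through the configured ID generator addresses. */
final class IdGeneratorRouter {
    private final List<String> addresses;
    private final AtomicInteger cursor = new AtomicInteger();

    IdGeneratorRouter(List<String> addresses) {
        if (addresses == null || addresses.isEmpty()) {
            throw new IllegalArgumentException("at least one ID generator address is required");
        }
        // Defensive copy so later changes to the caller's list do not affect routing.
        this.addresses = new ArrayList<>(addresses);
    }

    /** Returns the next address in round-robin order; safe to call from several threads. */
    String next() {
        // floorMod keeps the index non-negative even after the counter overflows.
        int index = Math.floorMod(cursor.getAndIncrement(), addresses.size());
        return addresses.get(index);
    }
}
```

A caller might build it as `new IdGeneratorRouter(Arrays.asList("10.0.0.1:8080", "10.0.0.2:8080"))` and call `next()` before each request. A production router would also need health checks and retries, which are left out here.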
Problems of snowflake algorithm
The Snowflake algorithm depends on timestamps. If the clock is set back (clock rollback), duplicate IDs can be generated. So how does clock rollback come about, and how can we deal with it?
Automatic calibration by the NTP (Network Time Protocol) service can cause clock rollback. Every computer has its own local clock, driven by the pulses of a crystal oscillator. Over time, the deviation between this local time and world time grows larger and larger, and NTP is the service that calibrates the clock.
In general, the probability of clock rollback is very small. When the local clock needs to be calibrated against world time and the deviation is below the step threshold (128 ms by default), the computer synchronizes in slew mode: it adjusts the clock rate by up to 0.5 ms/s so that the local clock keeps moving forward, without rollback, until it is aligned with the world clock.
However, if the deviation between the local clock and the world clock exceeds the step threshold, the clock is stepped, and a rollback occurs if the local clock was ahead. The step threshold can be raised, but the larger it is, the longer slew calibration can take. For example, with a step threshold of 10 minutes, any deviation under 10 minutes is corrected in slew mode, which can take up to 14 days (600 s / 0.5 ms/s = 1,200,000 s, about 13.9 days).
To avoid duplicate IDs caused by clock rollback, keep the default step threshold of 128 ms. In addition, each time a Snowflake ID is requested, compare the current timestamp with the last one used: if the clock has gone back by no more than 1 second, wait for 1 second and continue; otherwise, treat the service as unavailable. This handles clock rollbacks of up to 1 second.
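The rollback check described above can be sketched as follows. This is an illustrative generator, not the author's implementation: the class name, the injectable `clock` parameter, and the choice to wait only until the clock catches up (instead of a fixed 1 second) are assumptions made for the example.

```java
import java.util.function.LongSupplier;

/** Sketch of a Snowflake generator that tolerates clock rollbacks of up to 1 second. */
final class SnowflakeIdGenerator {
    private static final int SEQUENCE_BITS = 12;
    private static final int MACHINE_BITS = 10;
    private static final long MAX_SEQUENCE = (1L << SEQUENCE_BITS) - 1; // 4095
    private static final long MAX_MACHINE = (1L << MACHINE_BITS) - 1;   // 1023
    private static final long MAX_ROLLBACK_MILLIS = 1000;               // the 1 second threshold from the text

    private final long epochMillis;
    private final long machineId;
    private final LongSupplier clock; // injectable so a rollback can be simulated
    private long lastTimestamp = -1L;
    private long sequence = 0L;

    SnowflakeIdGenerator(long epochMillis, long machineId, LongSupplier clock) {
        if (machineId < 0 || machineId > MAX_MACHINE) {
            throw new IllegalArgumentException("machineId must be in [0, " + MAX_MACHINE + "]");
        }
        this.epochMillis = epochMillis;
        this.machineId = machineId;
        this.clock = clock;
    }

    synchronized long nextId() {
        long now = clock.getAsLong();
        if (now < lastTimestamp) {
            long rollback = lastTimestamp - now;
            if (rollback > MAX_ROLLBACK_MILLIS) {
                // Large rollback: refuse to serve rather than risk duplicate IDs.
                throw new IllegalStateException("clock moved back by " + rollback + " ms");
            }
            // Small rollback: wait until the clock reaches the last used timestamp again.
            now = waitUntil(lastTimestamp);
        }
        if (now == lastTimestamp) {
            sequence = (sequence + 1) & MAX_SEQUENCE;
            if (sequence == 0) {
                // All 4096 sequence numbers of this millisecond are used; move to the next one.
                now = waitUntil(lastTimestamp + 1);
            }
        } else {
            sequence = 0;
        }
        lastTimestamp = now;
        return ((now - epochMillis) << (MACHINE_BITS + SEQUENCE_BITS))
                | (machineId << SEQUENCE_BITS)
                | sequence;
    }

    /** Blocks until the clock reports at least targetMillis and returns that reading. */
    private long waitUntil(long targetMillis) {
        long now = clock.getAsLong();
        while (now < targetMillis) {
            try {
                Thread.sleep(1);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for the clock", e);
            }
            now = clock.getAsLong();
        }
        return now;
    }
}
```

In production the clock would typically be `System::currentTimeMillis`, for example `new SnowflakeIdGenerator(epochMillis, 1, System::currentTimeMillis)`. The sketch does not check that the current time is later than the epoch or that the 41 timestamp bits have not run out.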
Step size scheme
Snowflake puts the timestamp in the high bits of the long integer, so even the smallest generated IDs are large. For example, one second after the epoch (a timestamp offset of 1000 ms), with machine number 1 and sequence number 1, the generated ID is already 4194308097. Is there a way to generate smaller IDs in the initial state? The answer is yes. Let's introduce the segmented step ID scheme.
Generating IDs by step size means storing the step size and the current maximum ID in the database, and increasing the maximum ID in the database by one step each time a new range of IDs is obtained.
The core table structure of the database is as follows:
``CREATE TABLE `segment_id` (``
``  `id` bigint(20) NOT NULL AUTO_INCREMENT,``
``  `biz_type` varchar(64) NOT NULL DEFAULT '' COMMENT 'service type',``
``  `max` bigint(20) DEFAULT '0' COMMENT 'current maximum ID value',``
``  `step` bigint(20) DEFAULT '10000' COMMENT 'ID step',``
``  PRIMARY KEY (`id`)``
`) ENGINE=InnoDB AUTO_INCREMENT=3 DEFAULT CHARSET=utf8`
When obtaining IDs, open a transaction and rely on the row lock to ensure that the value read is the maximum ID just written by the current update:
`start transaction;`
`update segment_id set max = max + step where biz_type = 'ORDER';`
`select max from segment_id where biz_type = 'ORDER';`
`commit;`
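On the application side, the transaction above yields a new maximum, and the service then hands out the IDs in (max - step, max] from memory. The sketch below shows that bookkeeping. To keep it runnable without a database, `InMemorySegmentStore` is a hypothetical stand-in for the `segment_id` table; a real implementation would run the SQL through JDBC.

```java
import java.util.HashMap;
import java.util.Map;

/** Hands out IDs from segments; fetches a new segment when the current one is used up. */
final class SegmentIdAllocator {

    /** Abstraction over "update max = max + step; select max" for one biz_type. */
    interface SegmentStore {
        long advanceMax(String bizType, long step);
    }

    /** In-memory stand-in for the database table; for illustration only. */
    static final class InMemorySegmentStore implements SegmentStore {
        private final Map<String, Long> maxByBizType = new HashMap<>();

        @Override
        public synchronized long advanceMax(String bizType, long step) {
            // Starts from 0 for an unknown biz_type, then adds one step per call.
            return maxByBizType.merge(bizType, step, Long::sum);
        }
    }

    private final SegmentStore store;
    private final String bizType;
    private final long step;
    private long next = 1; // next ID to hand out
    private long max = 0;  // last ID of the current segment; next > max means exhausted

    SegmentIdAllocator(SegmentStore store, String bizType, long step) {
        if (step <= 0) {
            throw new IllegalArgumentException("step must be positive");
        }
        this.store = store;
        this.bizType = bizType;
        this.step = step;
    }

    synchronized long nextId() {
        if (next > max) {
            // Current segment used up (or not loaded yet): reserve the next one.
            max = store.advanceMax(bizType, step);
            next = max - step + 1;
        }
        return next++;
    }
}
```

With a step of 3, a first allocator returns 1, 2, 3, 4 and holds the segment [4, 6]. A second allocator created on the same store, as after a restart, starts at 7, so 5 and 6 are never issued.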
The advantages and disadvantages of the segmented step ID generation scheme are as follows:
- Advantages: ID generation does not depend on timestamps, and IDs can start near 0 and grow gradually;
- Disadvantages: each time the service restarts, it must fetch a new segment, which advances the maximum ID by one step. If the service restarts frequently, many segments are wasted.
Optimizing the two schemes
The Snowflake algorithm and the segmented step size scheme described above each have their own advantages and disadvantages. For each of them, this article gives a corresponding optimization.
ID buffer ring
To improve the concurrency performance and availability of Snowflake IDs, an ID buffer ring can be used. The buffer ring makes full use of the millisecond timestamps and improves availability, which partly mitigates the service unavailability caused by clock rollback. The buffer ring is implemented with a fixed-length array and cursors that are hashed onto array slots. Compared with a linked list, it does not need frequent memory allocation.
When the ID buffer ring is initialized, it requests IDs from the ID generator until it is full. When the business needs an ID, IDs are taken from the head of the buffer ring in order. When the number of remaining IDs drops below a configured percentage, for example below 30% of the ring's capacity, an asynchronous fill is triggered. The asynchronous fill appends newly generated IDs to the tail of the ID buffer queue and maps them onto the ring according to the hash algorithm. In addition, a separate timer thread fills the ID buffer ring periodically.
The following animation shows the three stages of the ID buffer ring: initial ID loading, ID consumption, and ID filling after consumption:
- Buffer ring initialize load: get IDs from the ID generator and put them into the ID buffer ring until it is full;
- Buffer ring consumption: the business application obtains IDs from the ID buffer ring;
- Async reload (asynchronous loading and filling of the ID buffer ring): the timer thread obtains IDs from the ID generator asynchronously, appends them to the ID buffer queue, and maps them onto the ID buffer ring according to the hash algorithm. The asynchronous load ends when the ID buffer ring is full;
The following flow chart shows the whole life cycle of ID buffer ring operation, in which:
- Idconsumerserver: business system using distributed ID;
- Idbufferring: ID buffer ring;
- Idgenerator: ID generator;
- Idbufferringasyncloadthread: the thread that loads ID to buffer ring asynchronously;
- Timer: responsible for adding tasks to asynchronous loading thread regularly to load ID;
- ID consumption process: the buffer ring consumption mentioned above;
Overall process: the client sends a business request to the application server, and the application server obtains an ID from the ID buffer ring. If the ring is empty, the service is unavailable; if it contains IDs, one ID is consumed. When an ID is consumed and the number of IDs remaining in the ring drops below 30% of its capacity, an asynchronous load is triggered to refill the ring.
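The consume and refill cycle can be sketched with a fixed-length array and two cursors. This is a simplified illustration under several assumptions: the class shape is invented, the "hash" is a plain modulo of the cursor, the refill runs on whatever `Executor` the caller supplies, and the periodic timer thread is left out.

```java
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.LongSupplier;

/** Simplified ID buffer ring: fixed array, head/tail cursors, refill below 30% remaining. */
final class IdBufferRing {
    private final long[] slots;
    private final LongSupplier idGenerator; // e.g. a Snowflake generator's nextId
    private final Executor refillExecutor;  // runs the asynchronous fill
    private final AtomicBoolean refilling = new AtomicBoolean();
    private long head = 0; // cursor of the next ID to consume
    private long tail = 0; // cursor of the next free slot

    IdBufferRing(int capacity, LongSupplier idGenerator, Executor refillExecutor) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.slots = new long[capacity];
        this.idGenerator = idGenerator;
        this.refillExecutor = refillExecutor;
        fill(); // initial load: fill the ring completely
    }

    synchronized int remaining() {
        return (int) (tail - head);
    }

    /** Takes one ID from the head; fails if the ring is empty (service unavailable). */
    synchronized long take() {
        if (head == tail) {
            throw new IllegalStateException("ID buffer ring is empty");
        }
        long id = slots[(int) (head % slots.length)];
        head++;
        // Trigger a refill when fewer than 30% of the slots still hold IDs.
        boolean belowThreshold = (tail - head) * 10 < slots.length * 3L;
        if (belowThreshold && refilling.compareAndSet(false, true)) {
            refillExecutor.execute(() -> {
                try {
                    fill();
                } finally {
                    refilling.set(false);
                }
            });
        }
        return id;
    }

    /** Appends new IDs at the tail until the ring is full. */
    private synchronized void fill() {
        while (tail - head < slots.length) {
            slots[(int) (tail % slots.length)] = idGenerator.getAsLong();
            tail++;
        }
    }
}
```

Passing `Runnable::run` as the executor makes the refill synchronous, which is convenient for trying the class out; a single-thread executor gives the asynchronous behavior described in the text.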
ID double bucket buffer
With the segmented step ID scheme, once the IDs of a segment are used up, the maximum value in the database must be updated before ID generation can continue. To reduce the impact of this database round trip on the latency of the ID service, a double bucket cache can be used to improve the availability of the ID generation service.
The main idea is to keep two cache buckets: currentbufferbucket and nextbufferbucket. Each bucket holds one step's worth of IDs. When the IDs of the current bucket are used up, the next bucket becomes the current bucket.
The following animation shows the whole process: initializing the double bucket cache, asynchronously loading the next bucket, and switching the next bucket to be the current bucket:
- Current bucket initial load: initialize the current bucket, that is, run update max = max + step and then read the updated max value. For example, if the step size is 1000 and the updated max value is 1000, the capacity of the bucket equals the step size, 1000, with min = max - step + 1 = 1 and max = 1000;
- Current bucket remaining id count down to 20%, next bucket start to load: when fewer than 20% of the current bucket's IDs remain, the next bucket can be loaded, that is, run update max = max + step and then read the updated max value. At this point the updated max value is 2000, so min = max - step + 1 = 1001 and max = 2000;
- Current bucket is drained, switch current bucket to the next bucket: when all IDs of the current bucket are used up, the next bucket becomes the current bucket;
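The three stages above can be sketched as follows. This is an illustration under assumptions, not the author's code: the class shape is invented, `advanceMax` stands in for the database update (it receives the step and returns the new max), and the next bucket is loaded inline where a real service would load it on a background thread.

```java
import java.util.function.LongUnaryOperator;

/** Double bucket buffer: serve IDs from the current bucket, preload the next one below 20%. */
final class DoubleBucketIdBuffer {

    /** A contiguous range [min, max] of IDs, one step long. */
    static final class Bucket {
        final long min;
        final long max;
        long next;

        Bucket(long min, long max) {
            this.min = min;
            this.max = max;
            this.next = min;
        }

        long remaining() {
            return max - next + 1;
        }
    }

    private final long step;
    private final LongUnaryOperator advanceMax; // stand-in for "update max = max + step; select max"
    private Bucket current;
    private Bucket nextBucket; // null until it has been preloaded

    DoubleBucketIdBuffer(long step, LongUnaryOperator advanceMax) {
        if (step <= 0) {
            throw new IllegalArgumentException("step must be positive");
        }
        this.step = step;
        this.advanceMax = advanceMax;
        this.current = load(); // current bucket initial load
    }

    private Bucket load() {
        long max = advanceMax.applyAsLong(step);
        return new Bucket(max - step + 1, max);
    }

    synchronized long nextId() {
        if (current.remaining() == 0) {
            // Current bucket is drained: switch to the preloaded bucket, or load one now.
            current = (nextBucket != null) ? nextBucket : load();
            nextBucket = null;
        }
        long id = current.next++;
        // Preload when fewer than 20% of the current bucket's IDs remain.
        if (nextBucket == null && current.remaining() * 5 < step) {
            nextBucket = load(); // done inline here; asynchronous in a real service
        }
        return id;
    }
}
```

With an `AtomicLong` standing in for the database row, `new DoubleBucketIdBuffer(1000, dbMax::addAndGet)` starts with the bucket [1, 1000] and reserves [1001, 2000] once fewer than 200 IDs remain.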
The flow chart of double bucket buffer is as follows:
This article introduced the implementation schemes for distributed IDs, described the Snowflake scheme and the segmented step size scheme in detail, and presented an optimization for each of them. To briefly summarize the two options:
- The Snowflake algorithm suits high-concurrency scenarios that generate a large number of distributed IDs. With 2^12 = 4096 sequence numbers per millisecond, a single node supports up to about 4 million IDs per second, but the machine numbers of the ID generators need to be managed;
- For the segmented step size scheme, the step size must be chosen carefully: if it is too short, it cannot keep up with concurrent demand; if it is too long, too many IDs in a segment are wasted;
That is all for this article. If you have more thoughts on distributed ID technology, you are welcome to leave a message and discuss them with us.
About the author
Goode, senior Java development engineer at NetEase Yunxin. Currently responsible for the design and development of the NetEase conference account system, interactive live streaming, and other modules. Experienced in middleware technologies such as microservices and distributed transactions. Loves technology and coding; skilled in object-oriented design and programming, domain-driven design, and code optimization and refactoring.