Design and implementation of distributed ID generator (cosid)

Time: 2021-9-26


Introduction to CosId

CosId aims to provide a general, flexible and high-performance distributed ID generator. It currently offers two types of ID generators:

  • SnowflakeId: single-instance throughput of 4,096,000 ops/s (JMH benchmark). It mainly solves the clock rollback problem (the clock moving backwards) and the machine number assignment problem, and provides a friendlier, more flexible user experience.
  • SegmentId: fetches a segment (Step) of IDs at a time, reducing the frequency of network IO requests to the segment distributor and improving performance.

    • IdSegmentDistributor: the segment distributor (segment storage)

      • RedisIdSegmentDistributor: a segment distributor based on Redis.
      • JdbcIdSegmentDistributor: a segment distributor based on JDBC, supporting various relational databases.
    • SegmentChainId (recommended): SegmentChainId (lock-free) is an enhanced SegmentId. Its performance approaches that of AtomicLong: 127,430,000+ ops/s (JMH benchmark).

      • PrefetchWorker maintains a safe distance (safeDistance) and supports dynamic expansion/contraction of safeDistance based on the starvation state.

Background (why distributed IDs?)

As a software system evolves and the business grows, we need cluster deployment to share the computing and storage load. Application services can easily be made stateless and elastically scalable.
But is simply increasing the number of service replicas enough? Clearly not, because the performance bottleneck is usually at the database layer. At that point we need to consider how to scale out and cluster the database, usually by splitting it into multiple databases and tables (sharding).
So how do we partition? (Here we mean horizontal partitioning; vertical partitioning also exists but is not the topic of this article.) The prerequisite for partitioning is that we already have an ID; only then can we route the row with a partitioning algorithm. For example, the simple and commonly used modulo sharding algorithm is conceptually similar to hashing: we need a key to hash in order to obtain the insertion slot.
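The modulo sharding mentioned above can be sketched in a few lines of Java. This is a minimal illustration, assuming a hypothetical naming scheme in which the logical table t_order is split into physical tables t_order_0 .. t_order_{n-1}; the class and method names are made up for this example. Math.floorMod is used instead of % so that negative IDs still map to a valid shard.

```java
final class ModShardingSketch {

    /** Returns the shard index in [0, shardCount) for the given ID. */
    static int shardOf(long id, int shardCount) {
        if (shardCount <= 0) {
            throw new IllegalArgumentException("shardCount must be positive");
        }
        // floorMod, unlike %, never returns a negative remainder.
        return (int) Math.floorMod(id, (long) shardCount);
    }

    /** Hypothetical physical table naming: logical table + "_" + shard index. */
    static String tableOf(String logicalTable, long id, int shardCount) {
        return logicalTable + "_" + shardOf(id, shardCount);
    }

    public static void main(String[] args) {
        for (long id = 1; id <= 5; id++) {
            System.out.println(id + " -> " + tableOf("t_order", id, 4));
        }
    }
}
```

Note that the routing key must exist before the insert, which is exactly why the ID has to be generated first.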

Of course, many other distributed scenarios also need distributed IDs; I won't list them all here.

Core indicators of a distributed ID scheme

  • Global (same-business) uniqueness: uniqueness is the one guarantee an ID must provide. It is easy to see that if IDs are not unique, primary keys will conflict.

    • Generally, global uniqueness does not mean that IDs must be unique across all business services, but that IDs generated by different deployment replicas of the same business service are unique.
      For example, the IDs that multiple replicas of the order service generate for the t_order table must be globally unique. Whether the IDs generated for t_order_item are unique relative to those of t_order does not affect the uniqueness constraint and has no side effects.
      The same holds for different business modules. In short, uniqueness is mainly about avoiding ID conflicts.
  • Order: an ordering guarantee is the prerequisite for query-oriented data structures and algorithms (other than hashing), namely binary search (divide and conquer).

    • MySQL InnoDB's B+ tree is the most widely used example. If IDs are unordered, then to keep the index sorted the B+ tree must frequently insert into the middle of the index and shift the subsequent entries, which can even cause frequent page splits and greatly hurts performance. If IDs are ordered, the situation is completely different: new entries are simply appended. Ordering is therefore very important and an essential property of ID design.
  • Throughput/performance (ops/s): the number of IDs that can be generated per unit of time (per second). Generating IDs is an extremely frequent and fundamental operation. If ID generation is slow, no amount of optimization elsewhere in the system can make the whole fast.

    • Typically we generate the ID first and then perform the write, so slow ID generation caps the overall performance ceiling.
  • Stability (time/op): stability is generally analyzed by percentile sampling of the time taken by each operation. For example, CosId's percentile sample P9999 = 0.208 us/op means that 99.99% of operations take at most 0.208 us.

    • Percentile (Wikipedia): in statistics, if a set of data is sorted in ascending order and the cumulative percentages are computed, the data value corresponding to a given percentage is called the percentile for that percentage; Pk denotes the kth percentile. Percentiles are used to compare the relative position of individuals within a group.
    • Why not use the average time per operation? If you average your net worth with Mr. Ma's, does the result say anything meaningful about yours? An average hides how the values are distributed.
    • Can the minimum or maximum time per operation serve as a reference? The minimum and maximum describe only the two boundary points; they can serve as a reference for stability, but they are not comprehensive. Moreover, percentiles already cover both: the minimum is P0 and the maximum is P100.
  • Autonomy (dependency): whether the scheme depends on the external environment. For example, segment mode strongly depends on third-party storage middleware to obtain NextMaxId. Autonomy also affects availability.
  • Availability: the availability of a distributed ID scheme is mainly affected by its autonomy. For example, SnowflakeId is affected by clock rollback, which can cause short periods of unavailability, while segment mode is affected by the availability of the third-party distributor (NextMaxId).

    • Availability (Wikipedia): the proportion of time a functional unit is available within a given time interval.
    • MTBF: mean time between failures
    • MDT: mean down time (average repair/recovery time)
    • Availability = MTBF / (MTBF + MDT)
    • Assuming MTBF is 1 year and MDT is 1 hour: Availability = (365*24) / (365*24 + 1) = 0.999885857778792 ≈ 99.99%, which is what we usually call "four nines" of availability.
  • Adaptability: the ability to adapt to changes in the external environment. Here we mainly mean how a distributed ID scheme dynamically scales its performance in the face of traffic bursts.

    • SegmentChainId can dynamically scale its safe distance based on the starvation state.
    • With the conventional bit allocation, SnowflakeId's performance is fixed at 4,096,000 IDs per second. Different TPS can be obtained by adjusting the bit allocation, but changing it is destructive, so the bit allocation is generally not changed once it has been chosen for a business scenario.
  • Storage space: take the MySQL InnoDB B+ tree as an example. Secondary indexes store the primary key value, so the larger the primary key, the more memory cache and disk space it occupies; the less data each page holds, the more disk IO is needed. In short, occupying as little storage space as possible while meeting business requirements is a good design principle in most scenarios.
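The percentile, average, and availability definitions above can be checked with a few lines of Java. This is a minimal sketch: it uses the nearest-rank percentile definition (one of several in use; benchmark tools may interpolate differently), the latency samples are made-up illustrative values rather than CosId benchmark data, and the class name is hypothetical.

```java
import java.util.Arrays;

final class IdIndicatorMath {

    /**
     * Nearest-rank percentile: the smallest sample such that at least p% of
     * all samples are less than or equal to it. p must be in (0, 100].
     */
    static double percentile(double[] samples, double p) {
        if (samples.length == 0 || p <= 0 || p > 100) {
            throw new IllegalArgumentException("need samples and 0 < p <= 100");
        }
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0); // 1-based rank
        return sorted[Math.max(rank, 1) - 1];
    }

    /** Arithmetic mean, shown only to contrast with percentiles. */
    static double mean(double[] samples) {
        return Arrays.stream(samples).average().orElseThrow(IllegalArgumentException::new);
    }

    /** Availability = MTBF / (MTBF + MDT); both durations in the same unit. */
    static double availability(double mtbf, double mdt) {
        return mtbf / (mtbf + mdt);
    }

    public static void main(String[] args) {
        // Four fast operations and one slow outlier, in us/op (illustrative values).
        double[] latencies = {0.2, 0.2, 0.2, 0.2, 100.0};
        System.out.println("mean = " + mean(latencies));            // dragged up by the outlier
        System.out.println("P50  = " + percentile(latencies, 50));  // the typical operation
        System.out.println("P100 = " + percentile(latencies, 100)); // equals the maximum
        // MTBF = 1 year, MDT = 1 hour, both expressed in hours.
        System.out.println("availability = " + availability(365 * 24, 1));
    }
}
```

The mean of these samples is above 20 us/op even though four out of five operations took 0.2 us/op, which is why the percentiles are the more useful stability indicator.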

Comparison of core indicators across distributed ID schemes

Distributed ID | Global uniqueness | Order | Throughput | Stability (1 s = 1,000,000 us) | Autonomy | Availability | Adaptability | Storage space
UUID/GUID | Yes | Completely unordered | 3,078,638 ops/s | P9999 = 0.325 us/op | Fully autonomous | 100% | No | 128-bit
SnowflakeId | Yes | Locally monotonically increasing, globally trend-increasing (affected by the global clock) | 4,096,000 ops/s | P9999 = 0.244 us/op | Depends on the clock | Clock rollback can cause temporary unavailability | No | 64-bit
SegmentId | Yes | Locally monotonically increasing, globally trend-increasing (affected by Step) | 29,506,073 ops/s | P9999 = 46.624 us/op | Depends on a third-party segment distributor | Affected by the availability of the segment distributor | No | 64-bit
SegmentChainId | Yes | Locally monotonically increasing, globally trend-increasing (affected by Step and the safe distance) | 127,439,148 ops/s | P9999 = 0.208 us/op | Depends on a third-party segment distributor | Affected by the availability of the segment distributor, but higher than SegmentId because the safe distance keeps ID segments in reserve | Yes | 64-bit

Order (divide and conquer and binary search depend on it, so it must be preserved)

We have just discussed why ID ordering matters, so an ID algorithm should make IDs monotonically increasing as far as possible, like a table's auto-increment primary key. Unfortunately, because of distributed-system constraints such as the lack of a global clock and performance, we can usually only choose a combination of local monotonic increase and global trend increase (just as distributed systems often have to settle for eventual consistency) to balance the trade-offs. Let's look at what monotonic increase and trend increase mean.

Order: monotonic increase


Monotonic increase: let $T$ denote a global absolute point in time, and assume $T_{n+1} > T_n$ (absolute time always moves forward; relativity and time machines are not considered here). Then it must hold that $F(T_{n+1}) > F(T_n)$. Database auto-increment primary keys fall into this category.
Note also that monotonic increase and continuous increase are different concepts. Continuous increase means $F(n+1) = F(n) + \text{Step}$, i.e., the next ID must equal the current ID plus Step; when Step = 1 the sequence looks like 1->2->3->4->5.

Further thought: a database's auto-increment primary key is not continuously increasing either. You have probably run into this; think about why the database is designed this way.
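The difference between the two definitions can be made concrete with two small checks. A minimal sketch; the class and method names are hypothetical and the sample sequences are illustrative.

```java
final class SequenceChecks {

    /** Monotonically increasing: every ID is strictly greater than the previous one. */
    static boolean isMonotonic(long[] ids) {
        for (int i = 1; i < ids.length; i++) {
            if (ids[i] <= ids[i - 1]) {
                return false;
            }
        }
        return true;
    }

    /** Continuously increasing: F(n+1) = F(n) + step for every adjacent pair. */
    static boolean isContinuous(long[] ids, long step) {
        for (int i = 1; i < ids.length; i++) {
            if (ids[i] != ids[i - 1] + step) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        long[] continuous = {1, 2, 3, 4, 5};
        long[] gapped = {1, 2, 5, 9}; // monotonic, but with gaps
        System.out.println(isMonotonic(continuous) + " " + isContinuous(continuous, 1));
        System.out.println(isMonotonic(gapped) + " " + isContinuous(gapped, 1));
    }
}
```

Every continuous sequence is monotonic, but not the other way round: the gapped sequence passes the first check and fails the second.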

Order: trend increase


Trend increase: if $T_n > T_{n-s}$, then with high probability $F(T_n) > F(T_{n-s})$. Although there is disorder within a period of time, the overall trend is increasing; the figure above shows an upward trend (trend line).

  • In SnowflakeId, the interval $s$ is affected by global clock synchronization.
  • In segment mode (SegmentId), $s$ is affected by the available interval of a segment (Step).

Distributed ID allocation schemes

UUID/GUID

  • Does not rely on any third-party Middleware
  • High performance
  • Completely unordered
  • Large footprint: requires 128 bits of storage.

The biggest drawback of UUIDs is that they are random and unordered. When used as primary keys, they make the database's primary-key index inefficient (to maintain the index tree, data is frequently inserted into the middle of the index instead of being appended). This is the main reason UUIDs are not suitable as database primary keys.
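Java's standard java.util.UUID makes both the 128-bit footprint and the randomness visible. This short sketch only prints a few random (version 4) UUIDs; it does not measure index performance, and the class name is hypothetical.

```java
import java.util.UUID;

final class UuidFootprint {
    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) {
            // randomUUID() produces a version-4 UUID: 122 random bits plus
            // 6 fixed version/variant bits, so consecutive values carry no order.
            UUID id = UUID.randomUUID();
            System.out.printf("%s version=%d bits=%d%n",
                    id, id.version(), 2 * Long.SIZE); // stored as two 64-bit halves
        }
    }
}
```

Each value needs two longs, twice the size of the 64-bit IDs discussed below.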

SnowflakeId


SnowflakeId is a distributed ID algorithm that generates IDs by partitioning the bits of a Long (64-bit).
The conventional bit allocation is: timestamp (41-bit) + machineId (10-bit) + sequence (12-bit) = 63 bits (the sign bit is left unused, so IDs stay positive).

  • 41-bit timestamp = (1L << 41) / (1000 * 3600 * 24 * 365) ≈ 69 years, so the usable absolute time range is EPOCH + 69 years. We generally set a custom EPOCH, such as the product's development start date. The available time can also be extended by compressing the bits allocated to the other fields and giving the timestamp more bits.
  • 10-bit machineId = (1L << 10) = 1024, i.e., up to 1024 replicas of the same service can be deployed (Kubernetes has no master/slave replica concept, and its definition of a replica is used directly here). Usually that many are not needed, so the number of bits is redefined according to the deployment scale.
  • 12-bit sequence = (1L << 12) * 1000 = 4,096,000, i.e., a single machine can generate about 4.09 million IDs per second, and a cluster serving the same business can generate 4,096,000 * 1024 ≈ 4.19 billion IDs per second.
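The conventional 41/10/12 layout above can be expressed with shifts and masks. This is a minimal sketch, not CosId's implementation: the timestamp argument is assumed to already be milliseconds since a custom EPOCH, and the class and method names are hypothetical. The main method also reproduces the capacity arithmetic from the bullets.

```java
final class SnowflakeLayout {
    // Conventional allocation: 41-bit timestamp + 10-bit machineId + 12-bit sequence = 63 bits.
    static final int TIMESTAMP_BITS = 41;
    static final int MACHINE_BITS = 10;
    static final int SEQUENCE_BITS = 12;

    static final long MAX_TIMESTAMP = (1L << TIMESTAMP_BITS) - 1;
    static final long MAX_MACHINE_ID = (1L << MACHINE_BITS) - 1; // 1023
    static final long MAX_SEQUENCE = (1L << SEQUENCE_BITS) - 1;  // 4095

    static final int MACHINE_SHIFT = SEQUENCE_BITS;                  // 12
    static final int TIMESTAMP_SHIFT = SEQUENCE_BITS + MACHINE_BITS; // 22

    /** Packs the fields into a long; the sign bit stays 0, so IDs are positive. */
    static long compose(long timestamp, long machineId, long sequence) {
        if (timestamp < 0 || timestamp > MAX_TIMESTAMP
                || machineId < 0 || machineId > MAX_MACHINE_ID
                || sequence < 0 || sequence > MAX_SEQUENCE) {
            throw new IllegalArgumentException("field out of range");
        }
        return (timestamp << TIMESTAMP_SHIFT) | (machineId << MACHINE_SHIFT) | sequence;
    }

    static long timestampOf(long id) { return id >>> TIMESTAMP_SHIFT; }
    static long machineIdOf(long id) { return (id >>> MACHINE_SHIFT) & MAX_MACHINE_ID; }
    static long sequenceOf(long id) { return id & MAX_SEQUENCE; }

    public static void main(String[] args) {
        long id = compose(123_456_789L, 1L, 7L); // timestamp = millis since the custom EPOCH
        System.out.println(id + " -> " + timestampOf(id) + "/" + machineIdOf(id) + "/" + sequenceOf(id));

        double years = (MAX_TIMESTAMP + 1) / (1000.0 * 3600 * 24 * 365);
        long perMachine = (MAX_SEQUENCE + 1) * 1000;         // IDs per second per machine
        long perCluster = perMachine * (MAX_MACHINE_ID + 1); // IDs per second per cluster
        System.out.printf("%.1f years, %d IDs/s per machine, %d IDs/s per cluster%n",
                years, perMachine, perCluster);
    }
}
```

Because the timestamp sits in the highest bits, comparing two IDs numerically compares their timestamps first, which is what makes a single instance's IDs monotonic.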

From the design of SnowflakeId we can see:

  • 👍 The timestamp occupies the high bits, and a single SnowflakeId instance ensures the clock always moves forward (it checks for local clock rollback), so its IDs are monotonically increasing. Because of global clock synchronization and clock rollback, SnowflakeId is globally trend-increasing.
  • 👍 SnowflakeId has no strong dependency on any third-party middleware, and its performance is very high.
  • 👍 The bit allocation scheme can be configured flexibly according to the needs of the business system to get the best results.
  • 👎 It strongly depends on the local clock; a potential clock rollback can cause duplicate IDs and short-term unavailability.
  • 👎 machineId must be set manually. Assigning machineId by hand during actual deployment is very inefficient.

Machine number assignment for SnowflakeId

In SnowflakeId, the bit allocation scheme is decided by the business design, rarely changes, and needs little maintenance. The machineId, however, must always be configured and must not be duplicated within the cluster; otherwise the partitioning principle is broken, and with it ID uniqueness. When the cluster is large, maintaining machineIds is very tedious and inefficient.

One point needs special explanation: the MachineId of SnowflakeId is a logical concept, not a physical one.
Suppose MachineId were physical: a machine could then have only one MachineId. What problem would that cause?

CosId currently provides the following three MachineIdDistributors:

  • ManualMachineIdDistributor: configures machineId manually. Generally usable only when the cluster is very small; not recommended.
  • StatefulSetMachineIdDistributor: uses the stable identity provided by a Kubernetes StatefulSet (e.g., hostname = service-01) as the machine number.
  • RedisMachineIdDistributor: uses Redis to store machine-number assignments; it also stores the last timestamp of each MachineId for the startup clock rollback check.
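The core contract of a machine-number distributor is that each logical instance gets a machineId no other live instance holds. The following is a minimal in-memory sketch under that assumption: a real RedisMachineIdDistributor keeps this state in Redis so that all instances share it, this sketch does not record the last timestamp used for the startup check, and the names (MachineIdDistributorSketch, distribute, revert, fromStatefulSetHostname) are hypothetical rather than CosId's API. The StatefulSet helper assumes the hostname ends in -<ordinal>.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

final class MachineIdDistributorSketch {
    private final int capacity;                                    // 1 << machineBits
    private final BitSet used = new BitSet();                      // machineIds currently assigned
    private final Map<String, Integer> assigned = new HashMap<>(); // instanceId -> machineId

    MachineIdDistributorSketch(int machineBits) {
        this.capacity = 1 << machineBits;
    }

    /** Same instance -> same machineId; a new instance gets the lowest free one. */
    synchronized int distribute(String instanceId) {
        Integer existing = assigned.get(instanceId);
        if (existing != null) {
            return existing;
        }
        int machineId = used.nextClearBit(0);
        if (machineId >= capacity) {
            throw new IllegalStateException("machineIds exhausted, cannot assign " + instanceId);
        }
        used.set(machineId);
        assigned.put(instanceId, machineId);
        return machineId;
    }

    /** Releases the machineId when the instance shuts down so it can be reused. */
    synchronized void revert(String instanceId) {
        Integer machineId = assigned.remove(instanceId);
        if (machineId != null) {
            used.clear(machineId);
        }
    }

    /** StatefulSet style: reuse the stable ordinal suffix of the hostname, e.g. service-01 -> 1. */
    static int fromStatefulSetHostname(String hostname) {
        return Integer.parseInt(hostname.substring(hostname.lastIndexOf('-') + 1));
    }
}
```

Keying by instance rather than by host is what makes the machineId logical: two instances on the same host simply receive two different machineIds.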


Clock rollback problem of SnowflakeId

The fatal problem with clock rollback is that it can produce duplicate, conflicting IDs (this is not hard to understand), and duplicate IDs are clearly intolerable.
In the SnowflakeId algorithm, IDs are partitioned by MachineId, so it is easy to see that different MachineIds can never produce the same ID. Therefore, the clock rollback problem we need to solve is that of the current MachineId, not that of every node in the cluster.

Clock rollback for a MachineId falls into two cases:

  • Runtime clock rollback: the current timestamp obtained at runtime is smaller than the one obtained last time. This case is easy to handle: SnowflakeId implementations generally keep lastTimestamp to detect clock rollback at runtime and throw a clock rollback exception.

    • Throwing an exception directly on clock rollback is not good practice, because downstream callers have almost no alternative (what else can they do but wait?). Waiting for the clock to catch up is the only choice, and when there is only one choice, don't make the user choose.
    • ClockSyncSnowflakeId is a wrapper around SnowflakeId. When clock rollback occurs, it uses ClockBackwardsSynchronizer to actively wait for the clock to synchronize and then regenerates the ID, providing a friendlier user experience.
  • Startup clock rollback: the clock obtained when a service instance starts is smaller than the clock when the service last shut down. In this case lastTimestamp cannot be kept in process memory. When the machine state read from external storage is greater than the current clock, ClockBackwardsSynchronizer is used to actively synchronize the clock.

    • LocalMachineStateStorage: stores MachineState (machine number, last timestamp) in a local file. Because it relies on local files, LocalMachineStateStorage is suitable only when the instance's deployment environment is stable.
    • RedisMachineIdDistributor: stores MachineState in the Redis distributed cache, which ensures the machine state from the last instance shutdown can always be retrieved.
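Both cases come down to comparing the current clock with lastTimestamp and waiting instead of failing. The following is a minimal sketch, not CosId's ClockSyncSnowflakeId or ClockBackwardsSynchronizer. It assumes the 41/10/12 layout, an injectable millisecond clock (System::currentTimeMillis in production, a fake in tests), a hypothetical maxBackwardsMillis threshold beyond which it still throws, and a restoredLastTimestamp constructor argument standing in for the value read from machine-state storage at startup.

```java
import java.util.function.LongSupplier;

final class ClockSyncSnowflakeSketch {
    private static final long MAX_SEQUENCE = (1L << 12) - 1;

    private final long epoch;
    private final long machineId;
    private final LongSupplier clock;      // milliseconds, e.g. System::currentTimeMillis
    private final long maxBackwardsMillis; // hypothetical limit for waiting instead of failing
    private long lastTimestamp;
    // Start "exhausted" so a restored lastTimestamp is never reused with a
    // sequence that may already have been issued before the restart.
    private long sequence = MAX_SEQUENCE;

    ClockSyncSnowflakeSketch(long epoch, long machineId, LongSupplier clock,
                             long maxBackwardsMillis, long restoredLastTimestamp) {
        this.epoch = epoch;
        this.machineId = machineId;
        this.clock = clock;
        this.maxBackwardsMillis = maxBackwardsMillis;
        this.lastTimestamp = restoredLastTimestamp;
    }

    synchronized long generate() {
        long now = clock.getAsLong();
        if (now < lastTimestamp) {
            long backwards = lastTimestamp - now;
            if (backwards > maxBackwardsMillis) {
                throw new IllegalStateException("Clock moved backwards by " + backwards + " ms");
            }
            now = waitUntilAfter(lastTimestamp); // small rollback: wait for the clock to catch up
        }
        if (now == lastTimestamp) {
            sequence = (sequence + 1) & MAX_SEQUENCE;
            if (sequence == 0) {
                now = waitUntilAfter(lastTimestamp); // sequence exhausted in this millisecond
            }
        } else {
            sequence = 0;
        }
        lastTimestamp = now;
        return ((now - epoch) << 22) | (machineId << 12) | sequence;
    }

    /** Busy-waits until the clock passes the given timestamp (Thread.onSpinWait needs Java 9+). */
    private long waitUntilAfter(long timestamp) {
        long now = clock.getAsLong();
        while (now <= timestamp) {
            Thread.onSpinWait();
            now = clock.getAsLong();
        }
        return now;
    }
}
```

The same code path covers both cases: at runtime lastTimestamp comes from memory, and at startup it comes from the restored machine state.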

JavaScript numeric overflow problem of SnowflakeId

JavaScript's Number.MAX_SAFE_INTEGER is only 53 bits (2^53 - 1). If a 63-bit SnowflakeId is returned to the front end as a number, it exceeds the safe range and loses precision. (So any long value passed from the back end to the front end will hit this overflow problem sooner or later; SnowflakeId just hits it sooner.)
Clearly, such overflow is unacceptable. Generally, one of the following two solutions is used:

  • Convert the generated 63-bit SnowflakeId to a String.

    • Convert the long to a String directly.
    • Use SnowflakeFriendlyId to convert the SnowflakeId into a friendly string: {timestamp}-{machineId}-{sequence} -> 20210623131730192-1-0
  • Customize the SnowflakeId bit allocation to shorten it to 53 bits, so the ID does not overflow when provided to the front end.

    • Use SafeJavaScriptSnowflakeId (a JavaScript-safe SnowflakeId).
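A minimal Java sketch of the problem and of the string option, assuming the conventional 41/10/12 layout and a custom epoch in milliseconds. The friendly format mirrors the {timestamp}-{machineId}-{sequence} example above, rendering the timestamp as yyyyMMddHHmmssSSS in a chosen time zone; it is an illustration, not the SnowflakeFriendlyId implementation, and the class name is hypothetical.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

final class SnowflakeJsSafety {
    /** JavaScript Number.MAX_SAFE_INTEGER = 2^53 - 1. */
    static final long MAX_SAFE_INTEGER = (1L << 53) - 1;
    private static final DateTimeFormatter FRIENDLY_TIME =
            DateTimeFormatter.ofPattern("yyyyMMddHHmmssSSS");

    /** True if a JavaScript Number (an IEEE 754 double) can hold the ID exactly. */
    static boolean isJsSafe(long id) {
        return id >= -MAX_SAFE_INTEGER && id <= MAX_SAFE_INTEGER;
    }

    /** Friendly form {timestamp}-{machineId}-{sequence}, assuming the 41/10/12 layout. */
    static String toFriendly(long id, long epochMillis, ZoneId zone) {
        long timestamp = (id >>> 22) + epochMillis;
        long machineId = (id >>> 12) & 0x3FF; // 10 bits
        long sequence = id & 0xFFF;           // 12 bits
        LocalDateTime time = LocalDateTime.ofInstant(Instant.ofEpochMilli(timestamp), zone);
        return FRIENDLY_TIME.format(time) + "-" + machineId + "-" + sequence;
    }

    public static void main(String[] args) {
        long id = ((1L << 40) << 22) | (1L << 12) | 1L; // a large 63-bit Snowflake-style ID
        double asJsNumber = (double) id;                 // what a JavaScript client would hold
        System.out.println(id + " safe=" + isJsSafe(id)
                + " exactAfterRoundTrip=" + ((long) asJsNumber == id));
        System.out.println("as string: " + Long.toString(id));
        System.out.println("friendly:  " + toFriendly(id, 0L, ZoneOffset.UTC));
    }
}
```

Sending the ID as a string sidesteps the problem entirely; the 53-bit option instead keeps every ID within isJsSafe at the cost of fewer bits for the timestamp, machineId, or sequence.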

Segment mode (SegmentId)


From the design diagram above, it is easy to see that the basic idea of segment mode is to fetch a range of available IDs of a certain length (Step), called an ID segment, each time, which reduces the number of network IO requests and improves performance.

  • It strongly depends on a third-party segment distributor, and its availability is affected by that distributor.
  • Each time a segment is used up, fetching NextMaxId requires a network IO request, and performance drops at that moment.
  • A single instance's IDs increase monotonically; globally they are trend-increasing.

    • From the design diagram, each NextMaxId that Instance 1 obtains is larger than the previous one, which means its next segment is always larger than its previous one. So from the perspective of a single instance, IDs increase monotonically.
    • Multiple instances hold different segments, so IDs generated by different instances at the same moment are out of order, but the overall trend is increasing; hence the global trend increase.
  • The degree of ID disorder depends on the Step and the cluster size (as the trend-increase diagram shows).

    • If the cluster has only one instance, segment mode is monotonically increasing.
    • The smaller the Step, the less disorder. When Step = 1, it gets infinitely close to monotonically increasing. Note that this is an infinite approach, not true monotonic increase. To see why, consider this scenario:

      • The segment distributor hands ID=1 to Instance 1 at time T1 and ID=2 to Instance 2 at time T2. Because of machine performance, network conditions, and so on, Instance 2's write request reaches the database before Instance 1's. From the database's point of view, the IDs are still out of order.
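A minimal in-memory simulation makes the per-instance monotonicity and the global trend increase concrete. InMemorySegmentDistributor is a hypothetical stand-in for RedisIdSegmentDistributor/JdbcIdSegmentDistributor (an AtomicLong instead of a network call), and SegmentIdSketch is a simplified SegmentId; neither is one of CosId's classes.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for a Redis/JDBC segment distributor. */
final class InMemorySegmentDistributor {
    private final AtomicLong maxId = new AtomicLong();
    private final long step;

    InMemorySegmentDistributor(long step) {
        this.step = step;
    }

    long step() {
        return step;
    }

    /** Atomically reserves the segment (maxId, maxId + step] and returns its new upper bound. */
    long nextMaxId() {
        return maxId.addAndGet(step);
    }
}

/** Simplified SegmentId: hands out IDs from the held segment, fetching a new one when used up. */
final class SegmentIdSketch {
    private final InMemorySegmentDistributor distributor;
    private long current; // last ID handed out
    private long maxId;   // inclusive upper bound of the held segment

    SegmentIdSketch(InMemorySegmentDistributor distributor) {
        this.distributor = distributor;
    }

    synchronized long generate() {
        if (current >= maxId) {
            // Segment used up: with a real distributor this is a synchronous network IO call.
            maxId = distributor.nextMaxId();
            current = maxId - distributor.step();
        }
        return ++current;
    }

    public static void main(String[] args) {
        InMemorySegmentDistributor distributor = new InMemorySegmentDistributor(10);
        SegmentIdSketch instance1 = new SegmentIdSketch(distributor);
        SegmentIdSketch instance2 = new SegmentIdSketch(distributor);
        // Interleaved calls: each instance is monotonic, the merged sequence only trends upward.
        for (int i = 0; i < 3; i++) {
            System.out.print(instance1.generate() + " " + instance2.generate() + " ");
        }
        System.out.println();
    }
}
```

With Step = 10 the interleaved output is 1 11 2 12 3 13: Instance 1 holds 1..10 and Instance 2 holds 11..20. With Step = 1 the same interleaving yields 1 2 3 4 5 6, but as the scenario above shows, the database can still receive the writes out of order.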

SegmentChainId


SegmentChainId is an enhanced SegmentId. Compared with SegmentId, it has the following advantages:

  • Stability: SegmentId's stability problem (P9999 = 46.624 us/op) is mainly caused by synchronously fetching NextMaxId (which involves network IO) after a segment is used up.

    • SegmentChainId (P9999 = 0.208 us/op) introduces a new role, PrefetchWorker, to maintain and guarantee the safe distance. Ideally, threads that get IDs almost never have to wait for a synchronous NextMaxId fetch, and performance approaches that of AtomicLong: 127,430,000+ ops/s (JMH benchmark).
  • Adaptability: from the SegmentId discussion we know that two factors affect ID disorder: cluster size and Step size. Cluster size is beyond our control, but Step can be tuned.

    • Step should be as small as possible to increase the likelihood that IDs increase monotonically.
    • But a Step that is too small hurts throughput, so how should Step be set? The answer is that we cannot accurately estimate the throughput demand at every point in time. The best approach is for Step to grow automatically when throughput demand is high and shrink automatically when it is low.
    • SegmentChainId introduces the concept of a starvation state. PrefetchWorker uses the starvation state to decide whether the current safe distance should expand or contract, trading off throughput against orderliness. This is the adaptability of SegmentChainId.
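The prefetch idea can be sketched as follows, under loudly simplifying assumptions: a LongSupplier stands in for the distributor's NextMaxId call, the doubling/halving rule for safeDistance is a hypothetical policy chosen for illustration, and everything is guarded by one lock (so here the worker's fetch briefly blocks callers). CosId's SegmentChainId is lock-free and its actual starvation logic may differ; this only shows the shape of "prefetch ahead, expand when starved, shrink when idle".

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.LongSupplier;

final class PrefetchingSegmentIdSketch {
    private final LongSupplier nextMaxId; // hypothetical stand-in for the distributor call
    private final long step;
    private final int minSafeDistance;
    private final int maxSafeDistance;
    private final Deque<long[]> ready = new ArrayDeque<>(); // prefetched {startExclusive, maxId}
    private long current;
    private long maxId;
    private int safeDistance;
    private boolean starved;

    PrefetchingSegmentIdSketch(LongSupplier nextMaxId, long step,
                               int minSafeDistance, int maxSafeDistance) {
        this.nextMaxId = nextMaxId;
        this.step = step;
        this.minSafeDistance = minSafeDistance;
        this.maxSafeDistance = maxSafeDistance;
        this.safeDistance = minSafeDistance;
    }

    /** Caller path: normally just switches to an already-prefetched segment. */
    synchronized long generate() {
        if (current >= maxId) {
            long[] segment = ready.pollFirst();
            if (segment == null) {
                starved = true; // nothing prefetched: the caller pays the IO cost itself
                segment = fetch();
            }
            current = segment[0];
            maxId = segment[1];
        }
        return ++current;
    }

    /** Worker path: meant to run periodically, e.g. from a ScheduledExecutorService. */
    synchronized void prefetch() {
        if (starved) {
            safeDistance = Math.min(safeDistance * 2, maxSafeDistance); // expand under pressure
            starved = false;
        } else if (ready.size() >= safeDistance) {
            // Nothing consumed since the last fill: shrink to favor orderliness.
            safeDistance = Math.max(safeDistance / 2, minSafeDistance);
        }
        while (ready.size() < safeDistance) {
            ready.addLast(fetch());
        }
    }

    synchronized int safeDistance() {
        return safeDistance;
    }

    synchronized int readySegments() {
        return ready.size();
    }

    private long[] fetch() {
        long max = nextMaxId.getAsLong();
        return new long[] {max - step, max};
    }
}
```

A larger safe distance means callers rarely wait on IO but more IDs are reserved ahead of time (more global disorder); a smaller one does the opposite, which is the trade-off the starvation state steers.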

SegmentChainId: throughput (ops/s)

Figure: MySqlChainIdBenchmark-Throughput

SegmentChainId: percentile of time per operation (us/op)

Figure: MySqlChainIdBenchmark-Percentile

Benchmark environment

  • Environment: development laptop (MacBook Pro, M1)
  • All benchmarks were run on this development laptop.
