Interview questions

How do you handle primary key IDs after splitting databases and tables (sharding)?

What the interviewer is thinking

In fact, this is a problem you must face once you split databases and tables: how do you generate IDs? If one table is split into several tables and each of them auto-increments from 1, the IDs will collide. You need a globally unique ID. This is something you have to consider in a real production environment.

Analysis of interview questions

Database-based implementation schemes

Database auto-increment ID

This means that every time your system needs an ID, it inserts a row with no business meaning into a table in one database and takes the auto-increment ID that the database generates. It then uses that ID when writing to the corresponding sharded database and table.

The advantage of this scheme is that it is simple and anyone can use it. The disadvantage is that the IDs come from a single database, so under high concurrency it becomes a bottleneck. To improve on it, you can run a dedicated service. On each request, the service reads the current maximum ID, reserves a batch of IDs above it, returns the whole batch at once, and then updates the stored maximum to the last ID of the batch. Either way, it still depends on a single database.
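The batch improvement described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `MaxIdStore` is a hypothetical in-memory stand-in for the database row that holds the current maximum ID, and the names `BatchIdAllocator`, `reserve`, and `batchSize` are made up for the example. In a real service, the reserve step would be an atomic update against the database.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hands out IDs from a locally cached batch and goes back to the store only when it runs out. */
class BatchIdAllocator {

    /** Hypothetical stand-in for the single database row that holds the current maximum ID. */
    static class MaxIdStore {
        private final AtomicLong maxId = new AtomicLong(0);

        /** Atomically raises the stored maximum by batchSize and returns the new maximum. */
        long reserve(int batchSize) {
            return maxId.addAndGet(batchSize);
        }
    }

    private final MaxIdStore store;
    private final int batchSize;
    private long next = 1;  // next ID to hand out
    private long limit = 0; // last ID of the current batch; 0 means nothing reserved yet

    BatchIdAllocator(MaxIdStore store, int batchSize) {
        this.store = store;
        this.batchSize = batchSize;
    }

    synchronized long nextId() {
        if (next > limit) {
            // Current batch is used up: reserve the next range from the shared store.
            limit = store.reserve(batchSize);
            next = limit - batchSize + 1;
        }
        return next++;
    }

    public static void main(String[] args) {
        MaxIdStore store = new MaxIdStore();
        BatchIdAllocator a = new BatchIdAllocator(store, 3);
        BatchIdAllocator b = new BatchIdAllocator(store, 3);
        System.out.println(a.nextId()); // 1 (a reserved 1..3)
        System.out.println(b.nextId()); // 4 (b reserved 4..6)
    }
}
```

With a batch size of N, the shared store is touched once per N IDs instead of once per ID, which is where the relief comes from. The store itself is still a single point, as the text notes.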

Suitable scenario: there are two reasons to split databases and tables: either the concurrency on a single database is too high, or its data volume is too large. This scheme fits only when concurrency is low and you are splitting because of data volume. In that case the peak load may be a few hundred requests per second at most, and a single dedicated database and table can generate the auto-increment primary keys.

Setting the database sequence or table auto-increment step

You can scale horizontally by setting the step of a database sequence or of a table's auto-increment field.

For example, suppose there are eight service nodes. Each node uses its own sequence to generate IDs. Each sequence starts from a different value, and all of them increase with a step of 8.
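The eight-node example can be simulated in a few lines. This is a sketch rather than a database configuration: `SteppedSequence` is a hypothetical class that imitates one node's sequence in memory, with node k starting at k and stepping by 8.

```java
import java.util.HashSet;
import java.util.Set;

/** Imitates one node's sequence: starts at a node-specific offset and advances by a fixed step. */
class SteppedSequence {
    private final long step;
    private long next;

    SteppedSequence(long start, long step) {
        if (start < 1 || start > step) {
            throw new IllegalArgumentException("start must be between 1 and step");
        }
        this.next = start;
        this.step = step;
    }

    synchronized long nextId() {
        long id = next;
        next += step;
        return id;
    }

    public static void main(String[] args) {
        int nodes = 8;
        Set<Long> seen = new HashSet<>();
        for (int node = 1; node <= nodes; node++) {
            SteppedSequence seq = new SteppedSequence(node, nodes);
            for (int i = 0; i < 3; i++) {
                long id = seq.nextId();
                // Every node draws from its own residue class modulo 8, so no ID repeats.
                if (!seen.add(id)) {
                    throw new IllegalStateException("duplicate id " + id);
                }
            }
        }
        System.out.println(seen.size()); // 24
    }
}
```

Node 1 produces 1, 9, 17, node 2 produces 2, 10, 18, and so on. The sketch also makes the limitation visible: the step is baked into every node, so a ninth node cannot join without changing all of them.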


Suitable scenario: when all you need is to keep generated IDs from repeating, this scheme is relatively simple to implement and can meet the performance goal. However, both the number of service nodes and the step are fixed, so adding more service nodes later is difficult.


UUID

The advantage is that a UUID is generated locally, without relying on a database. The disadvantage is that it is long and takes up a lot of space, so it performs poorly as a primary key. What is more, UUIDs are not ordered, which causes many random writes to the B+ tree index (sequential IDs produce mostly sequential writes). Because a new key cannot simply be appended at the end, it has to be inserted in the middle: the whole B+ tree node is read into memory, the record is inserted, and the whole node is written back to disk. When records are large, performance drops noticeably.

Suitable scenario: if you need to randomly generate file names, serial numbers, and so on, you can use a UUID, but do not use it as a primary key.

UUID.randomUUID().toString().replace("-", "") -> a 32-character hexadecimal string
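For reference, here is a small sketch of that call using the standard `java.util.UUID` class. The class name `UuidDemo` and the helper `compactUuid` are made up for the example. It shows the size cost mentioned above: 32 hexadecimal characters as text (128 bits), compared with the 64 bits of a `long` ID.

```java
import java.util.UUID;

class UuidDemo {
    /** Returns a random (version 4) UUID as 32 hex characters with the hyphens removed. */
    static String compactUuid() {
        return UUID.randomUUID().toString().replace("-", "");
    }

    public static void main(String[] args) {
        UUID id = UUID.randomUUID();
        System.out.println(id.toString().length()); // 36, including four hyphens
        System.out.println(compactUuid().length()); // 32
        System.out.println(id.version());           // 4, meaning randomly generated
    }
}
```

Because the value is random, two consecutive calls have no ordering relationship, which is the property that hurts B+ tree inserts.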

Using the current system time

This means using the current time as the ID. The problem is that under high concurrency, for example thousands of requests per second, IDs will repeat. That is clearly unacceptable, so on its own this scheme is basically not worth considering.

Suitable scenario: when this scheme is used, the current time is usually concatenated with other business fields to form the ID. If that is acceptable for your business, it works: the business field values combined with the current time form a globally unique number.
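A rough sketch of concatenating the time with business fields follows. Everything here is assumed for illustration: the fields `channelCode` and `userId`, the field widths, and the `OrderNumberDemo` class are hypothetical, not something the scheme prescribes.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

class OrderNumberDemo {
    // Millisecond-precision timestamp rendered as 17 digits.
    private static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("yyyyMMddHHmmssSSS");

    /**
     * Builds a number from the timestamp plus two hypothetical business fields:
     * a 2-digit channel code and the last 6 digits of the user ID.
     */
    static String orderNumber(LocalDateTime now, int channelCode, long userId) {
        return String.format(Locale.ROOT, "%s%02d%06d",
                now.format(FMT), channelCode, userId % 1_000_000);
    }

    public static void main(String[] args) {
        // Fixed time so the result is reproducible: 2020-01-02 03:04:05.006
        LocalDateTime t = LocalDateTime.of(2020, 1, 2, 3, 4, 5, 6_000_000);
        System.out.println(orderNumber(t, 7, 123456789L)); // 2020010203040500607456789
    }
}
```

The result is unique only if the business guarantees that the same combination of fields cannot occur twice in one millisecond, which is exactly the "acceptable in business" condition above.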

Snowflake algorithm

Snowflake is Twitter's open-source distributed ID generation algorithm, implemented in Scala. It produces a 64-bit long ID: 1 bit is unused, 41 bits hold the millisecond timestamp, 10 bits hold the worker machine ID, and 12 bits hold the sequence number.

  • 1 bit: unused. Why? In a signed binary number, a leading 1 means the number is negative, but the IDs we generate are all positive, so the first bit is always 0.
  • 41 bits: the timestamp, in milliseconds. 41 bits can represent values up to 2^41 - 1, that is, 2^41 - 1 milliseconds, which is about 69 years.
  • 10 bits: the worker machine ID, which means the service can be deployed on at most 2^10 = 1024 machines. Of the 10 bits, 5 represent the data center ID and 5 represent the machine ID, so there can be at most 2^5 (32) data centers, each with at most 2^5 (32) machines.
  • 12 bits: distinguishes IDs generated within the same millisecond. The largest integer that 12 bits can represent is 2^12 - 1 = 4095, so the 12 bits can distinguish 4096 different IDs within the same millisecond.
0 | 0001100 10100010 10111110 10001001 01011100 00 | 10001 | 1 1001 | 0000 00000000
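The numbers in the list above can be checked with a little arithmetic, and the layout itself comes down to a few shifts. The following is a sketch only; the class name `SnowflakeLayout` and its helper methods are made up for illustration and are not part of Twitter's code.

```java
class SnowflakeLayout {
    static final int TIMESTAMP_BITS = 41;
    static final int DATACENTER_BITS = 5;
    static final int WORKER_BITS = 5;
    static final int SEQUENCE_BITS = 12;

    /** Largest value that fits in the given number of bits, i.e. 2^bits - 1. */
    static long maxValue(int bits) {
        return (1L << bits) - 1;
    }

    /** Packs the four fields into one long; the sign bit stays 0 for in-range inputs. */
    static long pack(long timestampOffset, long datacenterId, long workerId, long sequence) {
        return (timestampOffset << (DATACENTER_BITS + WORKER_BITS + SEQUENCE_BITS))
                | (datacenterId << (WORKER_BITS + SEQUENCE_BITS))
                | (workerId << SEQUENCE_BITS)
                | sequence;
    }

    public static void main(String[] args) {
        long msPerYear = 1000L * 60 * 60 * 24 * 365;
        System.out.println((1L << TIMESTAMP_BITS) / msPerYear); // 69 whole years
        System.out.println(maxValue(DATACENTER_BITS));          // 31, so 32 data centers
        System.out.println(maxValue(WORKER_BITS));              // 31, so 32 machines each
        System.out.println(maxValue(SEQUENCE_BITS));            // 4095, so 4096 IDs per ms
        System.out.println(pack(1, 1, 1, 1));                   // 4329473
    }
}
```

The last line is 2^22 + 2^17 + 2^12 + 1: each field lands in its own bit range, so the OR never mixes them.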
public class IdWorker {

    private long workerId;
    private long datacenterId;
    private long sequence;

    public IdWorker(long workerId, long datacenterId, long sequence) {
        // Sanity checks: the machine ID and the data center ID must each be between 0 and 31
        if (workerId > maxWorkerId || workerId < 0) {
            throw new IllegalArgumentException(
                    String.format("worker Id can't be greater than %d or less than 0", maxWorkerId));
        }
        if (datacenterId > maxDatacenterId || datacenterId < 0) {
            throw new IllegalArgumentException(
                    String.format("datacenter Id can't be greater than %d or less than 0", maxDatacenterId));
        }
        System.out.printf(
                "worker starting. timestamp left shift %d, datacenter id bits %d, worker id bits %d, sequence bits %d, workerid %d%n",
                timestampLeftShift, datacenterIdBits, workerIdBits, sequenceBits, workerId);

        this.workerId = workerId;
        this.datacenterId = datacenterId;
        this.sequence = sequence;
    }

    // Custom epoch in milliseconds; timestamps are stored relative to it
    private long twepoch = 1288834974657L;

    private long workerIdBits = 5L;
    private long datacenterIdBits = 5L;

    // Bit trick for 2^5 - 1: 5 bits hold values 0..31, so there are at most 32 machine IDs
    private long maxWorkerId = -1L ^ (-1L << workerIdBits);

    // Likewise, 5 bits hold values 0..31, so there are at most 32 data center IDs
    private long maxDatacenterId = -1L ^ (-1L << datacenterIdBits);
    private long sequenceBits = 12L;

    private long workerIdShift = sequenceBits;
    private long datacenterIdShift = sequenceBits + workerIdBits;
    private long timestampLeftShift = sequenceBits + workerIdBits + datacenterIdBits;
    // 2^12 - 1 = 4095: mask that keeps only the low 12 bits
    private long sequenceMask = -1L ^ (-1L << sequenceBits);

    private long lastTimestamp = -1L;

    public long getWorkerId() {
        return workerId;
    }

    public long getDatacenterId() {
        return datacenterId;
    }

    public long getTimestamp() {
        return System.currentTimeMillis();
    }

    public synchronized long nextId() {
        // Get the current timestamp, in milliseconds
        long timestamp = timeGen();

        if (timestamp < lastTimestamp) {
            System.err.printf("clock is moving backwards.  Rejecting requests until %d.%n", lastTimestamp);
            throw new RuntimeException(String.format(
                    "Clock moved backwards.  Refusing to generate id for %d milliseconds", lastTimestamp - timestamp));
        }

        if (lastTimestamp == timestamp) {
            // Same millisecond as the previous ID: at most 4096 sequence numbers are available.
            // The mask keeps the sequence within 0..4095; when it wraps around to 0,
            // the millisecond is exhausted and we wait for the next one.
            sequence = (sequence + 1) & sequenceMask;
            if (sequence == 0) {
                timestamp = tilNextMillis(lastTimestamp);
            }
        } else {
            // New millisecond: restart the sequence
            sequence = 0;
        }

        // Record the timestamp of the most recent ID, in milliseconds
        lastTimestamp = timestamp;

        // Shift the timestamp left into its 41 bits;
        // shift the data center ID left into its 5 bits;
        // shift the machine ID left into its 5 bits; the sequence number fills the last 12 bits.
        // The OR of the parts is one 64-bit binary number, returned as a long
        return ((timestamp - twepoch) << timestampLeftShift) | (datacenterId << datacenterIdShift)
                | (workerId << workerIdShift) | sequence;
    }

    // Spin until the clock moves past lastTimestamp
    private long tilNextMillis(long lastTimestamp) {
        long timestamp = timeGen();
        while (timestamp <= lastTimestamp) {
            timestamp = timeGen();
        }
        return timestamp;
    }

    private long timeGen() {
        return System.currentTimeMillis();
    }

    public static void main(String[] args) {
        IdWorker worker = new IdWorker(1, 1, 1);
        for (int i = 0; i < 30; i++) {
            System.out.println(worker.nextId());
        }
    }
}

How does this work? Roughly speaking, the 41 bits are the current timestamp in milliseconds. Five bits are the data center ID you pass in (0 to 31, so at most 32 values), and another five bits are the machine ID you pass in (likewise 0 to 31). The remaining 12 bits are the sequence number: if a request arrives in the same millisecond as the previous ID, the sequence is incremented, which allows up to 4096 sequence numbers per millisecond.
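Going the other way can make the layout easier to see: given an ID, the fields can be recovered with the inverse shifts and masks. This is a sketch that assumes the same bit widths and the same `twepoch` value as the `IdWorker` class; the `SnowflakeDecoder` class itself is hypothetical.

```java
class SnowflakeDecoder {
    // Assumed to match IdWorker: custom epoch and field widths of 5, 5 and 12 bits.
    static final long TWEPOCH = 1288834974657L;

    /** Wall-clock time of generation, in milliseconds since the Unix epoch. */
    static long timestamp(long id) {
        return (id >> 22) + TWEPOCH;
    }

    static long datacenterId(long id) {
        return (id >> 17) & 0x1F; // 5 bits
    }

    static long workerId(long id) {
        return (id >> 12) & 0x1F; // 5 bits
    }

    static long sequence(long id) {
        return id & 0xFFF; // low 12 bits
    }

    public static void main(String[] args) {
        // Build an ID by hand with known fields, then take it apart again.
        long id = ((1600000000000L - TWEPOCH) << 22) | (17L << 17) | (25L << 12) | 5L;
        System.out.println(timestamp(id));    // 1600000000000
        System.out.println(datacenterId(id)); // 17
        System.out.println(workerId(id));     // 25
        System.out.println(sequence(id));     // 5
    }
}
```

Since the timestamp occupies the highest bits, IDs from one worker grow over time, which is what keeps index writes mostly sequential.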

So you can build your own service around this utility class and initialize one worker for each machine in each data center, with the sequence number starting at 0. Whenever a request arrives asking for an ID for a given machine in a given data center, the corresponding worker generates it.

With the Snowflake algorithm, you can build a service for your own company. The 5 + 5 bits reserved for the data center ID and the machine ID can even be repurposed for something with business meaning.

The Snowflake algorithm is relatively reliable, so if you really need distributed ID generation under high concurrency, use it: it performs well, and it is generally sufficient for scenarios with tens of thousands of requests per second.
