Why can duplicate records appear in the database even when the code logically guards against duplication?

Time: 2020-04-04

Under abnormal conditions such as high concurrency or a poor network, duplicate records occasionally appear in the database.

Suppose there is a books table whose contents look like this:

+----+------------------------------+
| id | name                         |
+----+------------------------------+
|  1 | A Brief History of the World |
+----+------------------------------+

Under abnormal conditions, records like the following may appear:

+----+------------------------------+
| id | name                         |
+----+------------------------------+
|  1 | A Brief History of the World |
|  2 | A Brief History of Humankind |
|  3 | A Brief History of Humankind |
+----+------------------------------+

Yet on reflection, the code that handles this data does include duplicate-checking logic: when someone adds a new book whose name already exists, the code is supposed to report that the book is a duplicate and return.

The first time I ran into this I was puzzled, but after some thought it became clear: it has to do with the database's transaction isolation level.

First we will briefly go over the four isolation levels of database transactions, then reproduce the problem above, and finally discuss solutions.

1 The four isolation levels of database transactions

1.1 Read Uncommitted

As the name implies, at this isolation level a transaction can read data that other transactions have not yet committed.

To illustrate, I open two clients (A and B) and set their session isolation level to READ UNCOMMITTED (no global setting).

The command to set the isolation level:

SET [SESSION | GLOBAL] TRANSACTION ISOLATION LEVEL {READ UNCOMMITTED | READ COMMITTED | REPEATABLE READ | SERIALIZABLE}

OK, let’s start.

Client A

mysql> SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT @@session.tx_isolation;
+------------------------+
| @@session.tx_isolation |
+------------------------+
| READ-UNCOMMITTED       |
+------------------------+
1 row in set (0.00 sec)

mysql> start transaction;
Query OK, 0 rows affected (0.01 sec)

mysql> select * from books;
+----+------------------------------+
| id | name                         |
+----+------------------------------+
|  1 | A Brief History of the World |
+----+------------------------------+
1 row in set (0.00 sec)

mysql> insert into books (name) values ('A Brief History of Humankind');
Query OK, 1 row affected (0.01 sec)

mysql> select * from books;
+----+------------------------------+
| id | name                         |
+----+------------------------------+
|  1 | A Brief History of the World |
|  4 | A Brief History of Humankind |
+----+------------------------------+
2 rows in set (0.00 sec)

While the transaction in A is still open, let's look at the data in B.

Client B

mysql> SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT @@session.tx_isolation;
+------------------------+
| @@session.tx_isolation |
+------------------------+
| READ-UNCOMMITTED       |
+------------------------+
1 row in set (0.00 sec)

mysql> start transaction;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from books;
+----+------------------------------+
| id | name                         |
+----+------------------------------+
|  1 | A Brief History of the World |
|  4 | A Brief History of Humankind |
+----+------------------------------+
2 rows in set (0.00 sec)

B can read A's uncommitted data; this is the so-called read-uncommitted behavior (a dirty read).

Finally, remember to commit each transaction.

Client A & Client B

mysql> commit;


1.2 Read Committed

At this level, a transaction can only read data that other transactions have already committed.

I will not repeat the statement that sets the isolation level; assume the corresponding level has been set before each test.

Client A

mysql> start transaction;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from books;
+----+------------------------------+
| id | name                         |
+----+------------------------------+
|  1 | A Brief History of the World |
+----+------------------------------+
1 row in set (0.00 sec)

mysql> insert into books (name) values ('A Brief History of Humankind');
Query OK, 1 row affected (0.00 sec)

mysql> select * from books;
+----+------------------------------+
| id | name                         |
+----+------------------------------+
|  1 | A Brief History of the World |
|  5 | A Brief History of Humankind |
+----+------------------------------+
2 rows in set (0.00 sec)

A has not committed yet. Let's look at the data in B.

Client B

mysql> start transaction;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from books;
+----+------------------------------+
| id | name                         |
+----+------------------------------+
|  1 | A Brief History of the World |
+----+------------------------------+
1 row in set (0.00 sec)

As expected, A's uncommitted data is not visible in B.

Commit the transaction in A:

Client A

mysql> commit;

Then look in B:

Client B

mysql> select * from books;
+----+------------------------------+
| id | name                         |
+----+------------------------------+
|  1 | A Brief History of the World |
|  5 | A Brief History of Humankind |
+----+------------------------------+
2 rows in set (0.00 sec)

mysql> commit;
Query OK, 0 rows affected (0.00 sec)

The data committed in A can now be seen in B.

1.3 Repeatable Read

A careful reader may have noticed that in B, reading the same table twice within the same transaction produced different results: at first there was one row, then there were two. If that cannot happen, reads are repeatable.

Let’s verify

Client A

mysql> start transaction;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from books;
+----+------------------------------+
| id | name                         |
+----+------------------------------+
|  1 | A Brief History of the World |
+----+------------------------------+
1 row in set (0.00 sec)

mysql> insert into books (name) values ('A Brief History of Humankind');
Query OK, 1 row affected (0.01 sec)

mysql> select * from books;
+----+------------------------------+
| id | name                         |
+----+------------------------------+
|  1 | A Brief History of the World |
|  6 | A Brief History of Humankind |
+----+------------------------------+
2 rows in set (0.00 sec)

Client B

mysql> start transaction;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from books;
+----+------------------------------+
| id | name                         |
+----+------------------------------+
|  1 | A Brief History of the World |
+----+------------------------------+
1 row in set (0.00 sec)

Client A

mysql> commit;
Query OK, 0 rows affected (0.00 sec)

Client B

mysql> select * from books;
+----+------------------------------+
| id | name                         |
+----+------------------------------+
|  1 | A Brief History of the World |
+----+------------------------------+
1 row in set (0.00 sec)

As expected: the transaction in B is not affected by the commit in A. It reads the same data as at the start of its transaction, a single row in books. That is repeatable read.

Of course, changes that B makes within its own transaction are visible to itself.

Client B

mysql> insert into books (name) values ('A Brief History of Time');
Query OK, 1 row affected (0.00 sec)

mysql> select * from books;
+----+------------------------------+
| id | name                         |
+----+------------------------------+
|  1 | A Brief History of the World |
|  8 | A Brief History of Time      |
+----+------------------------------+
2 rows in set (0.00 sec)

mysql> commit;
Query OK, 0 rows affected (0.00 sec)

1.4 Serializable

This is the strictest isolation level: reads and writes in different transactions block each other.

Client A

mysql> start transaction;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from books;
+----+------------------------------+
| id | name                         |
+----+------------------------------+
|  1 | A Brief History of the World |
+----+------------------------------+
1 row in set (0.00 sec)

While A has not committed, write to the same table in B:

Client B

mysql> start transaction;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from books;
+----+------------------------------+
| id | name                         |
+----+------------------------------+
|  1 | A Brief History of the World |
+----+------------------------------+
1 row in set (0.00 sec)

mysql> insert into books (name) values ('A Brief History of Humankind');
ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction

Because reads and writes in different transactions block each other, the write above times out waiting for the lock.

After the transaction in A is committed:

Client A

mysql> commit;
Query OK, 0 rows affected (0.00 sec)

B can then write normally:

Client B

mysql> insert into books (name) values ('A Brief History of Humankind');
Query OK, 1 row affected (0.00 sec)

mysql> commit;
Query OK, 0 rows affected (0.01 sec)

Similarly, if A opens a transaction, inserts a record into books and does not commit, then a transaction in B that reads the table will also time out. Once the transaction in A commits, B can read books without any problem.

2 Reproducing the problem

Because the default transaction isolation level of InnoDB in MySQL is Repeatable Read, the duplicate-check logic can go wrong. Let's reproduce it.

Suppose the table currently contains:

+----+------------------------------+
| id | name                         |
+----+------------------------------+
|  1 | A Brief History of the World |
+----+------------------------------+

The back-end logic looks like this:

try:
    book_name = 'A Brief History of Humankind'
    book = get_by_name(book_name)
    if book:
        raise Exception(f'Book "{book_name}" already exists')

    # insert operation
    # other operations

    db.session.commit()
    return {'success': True}
except Exception as e:
    db.session.rollback()
    return {'success': False, 'msg': f'Failed to add book: {e}'}

When two users enter the book name "A Brief History of Humankind" and submit at the same time, two threads execute this logic concurrently, which is equivalent to the two clients above opening transactions at the same time. We use the two clients to illustrate the problem.

Client A

mysql> start transaction;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from books where name = 'A Brief History of Humankind';
Empty set (0.00 sec)

mysql> insert into books (name) values ('A Brief History of Humankind');
Query OK, 1 row affected (0.00 sec)

A detects that the book does not exist and inserts it. However, because the "other operations" take too long (network problems or other reasons), the commit is delayed.

Meanwhile, B performs the same operations:

Client B

mysql> start transaction;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from books where name = 'A Brief History of Humankind';
Empty set (0.00 sec)

mysql> insert into books (name) values ('A Brief History of Humankind');
Query OK, 1 row affected (0.00 sec)

Because the isolation level is Repeatable Read, B cannot read A's uncommitted row, so the duplicate check passes and the same book is inserted again. (In other words, this problem can occur at Repeatable Read and at any lower isolation level.)

Finally, both A and B commit.

Client A & Client B

mysql> commit;
Query OK, 0 rows affected (0.01 sec)

Duplicate records now exist:

+----+------------------------------+
| id | name                         |
+----+------------------------------+
|  1 | A Brief History of the World |
| 12 | A Brief History of Humankind |
| 13 | A Brief History of Humankind |
+----+------------------------------+

3 How to solve it

3.1 At the database level

Constrain it at the bottom layer. After adding a unique index on name, inserting a duplicate record raises an error. Simple and blunt, but it solves the problem.
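As a self-contained sketch of this behavior, the snippet below uses Python's sqlite3 as a stand-in for MySQL (in MySQL the equivalent would be something like `ALTER TABLE books ADD UNIQUE INDEX uk_name (name);`, where the index name `uk_name` is illustrative). With a unique constraint on name, the second insert of the same title fails regardless of what any application-level check concluded.

```python
import sqlite3

# sqlite3 stands in for MySQL here; the effect of a UNIQUE index is the same.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE books ("
    "  id INTEGER PRIMARY KEY AUTOINCREMENT,"
    "  name TEXT NOT NULL UNIQUE)"  # the unique index on name
)

conn.execute("INSERT INTO books (name) VALUES (?)",
             ("A Brief History of Humankind",))
conn.commit()

# A second insert of the same name is rejected by the database itself,
# no matter what the check-then-insert logic in application code saw.
try:
    conn.execute("INSERT INTO books (name) VALUES (?)",
                 ("A Brief History of Humankind",))
    conn.commit()
except sqlite3.IntegrityError as e:
    print(f"duplicate rejected: {e}")
```

At the application level, this means catching the database's duplicate-key error (error 1062 in MySQL) rather than relying solely on a check-then-insert sequence.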

3.2 At the code level

A unique index solves the problem, but relying on it alone leaves the code feeling incomplete; the problem can also be handled at the code level.

If, when receiving a request, we find a concurrent request with the same key parameter, we can reject it directly and return a response like "operation in progress", cutting the problem off at the source.

The idea is simple: use redis's SETNX.

book_name = request.form.get('book_name', '')
if not book_name:
    return json.dumps({'success': False, 'msg': 'Please fill in the book name'})

redis_key = f'add_book_{book_name}'
set_res = redis_client.setnx(redis_key, 1)
if not set_res:
    return json.dumps({'success': False, 'msg': 'operation in progress'})

add_res = add_book(book_name)  # the insert operation

redis_client.delete(redis_key)
return json.dumps(add_res)

If there are many similar scenarios, consider wrapping the redis operations in a decorator so the code can be reused.
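A minimal sketch of such a decorator follows. The names are illustrative, and `InMemoryLock` is a tiny stand-in for a real redis client so the example runs on its own; it mimics only the `setnx`/`delete` calls used here.

```python
import functools
import json

class InMemoryLock:
    """Stand-in for a redis client; mimics only setnx/delete."""
    def __init__(self):
        self._keys = set()

    def setnx(self, key, value):
        if key in self._keys:
            return 0  # key already exists: someone else holds the lock
        self._keys.add(key)
        return 1

    def delete(self, key):
        self._keys.discard(key)

lock_client = InMemoryLock()

def reject_duplicate(key_prefix):
    """Reject concurrent calls that carry the same key argument."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(key, *args, **kwargs):
            lock_key = f'{key_prefix}_{key}'
            if not lock_client.setnx(lock_key, 1):
                return json.dumps({'success': False,
                                   'msg': 'operation in progress'})
            try:
                return func(key, *args, **kwargs)
            finally:
                # always release the lock, even if func raises
                lock_client.delete(lock_key)
        return wrapper
    return decorator

@reject_duplicate('add_book')
def add_book(book_name):
    # the real insert operation would go here
    return json.dumps({'success': True, 'msg': f'added {book_name}'})
```

Note the try/finally: releasing the key even when the wrapped function raises avoids leaving the lock stuck, which the inline version above would also benefit from. With a real redis client, consider SET with the NX and EX options so that a crashed process cannot hold the lock forever.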

4 Summary

Because of the database's transaction isolation level, duplicate records can appear even when the logic guards against duplicates. To solve this, you can add a unique index at the database level, or handle it at the code level.

Original link: http://www.kevinbai.com/articles/8.html
