Read optimistic lock and pessimistic lock and mvcc in database

Time:2021-1-20

preface

In the actual use of the database, we often encounter situations in which we do not want the data to be written or read at the same time. For example, in the second kill scenario, two requests read to the system at the same time, and there is one inventory in the system, and then the inventory is updated to 0 successively. At this time, there will be oversold. At this time, the actual inventory of the goods and our records will not correspond.

In order to solve the problem of data inconsistency caused by resource competition, we need a mechanism to ensure the correct access and modification of data. In database, this mechanism is the concurrency control of database. Among them, optimistic concurrency control, pessimistic concurrency control and multi version concurrency control are the main technical means of database concurrency control.

Pessimistic Concurrency

essence

Wikipedia: pessimistic concurrency control (PCC) is a method of concurrency control in relational database management system. It prevents a transaction from modifying data in a way that affects other users. If a lock is applied to a row of data read by a transaction, only when the lock is released by the transaction can other transactions perform the operation that conflicts with the lock.

In fact, we often say pessimistic lock is not an actual lock, but an idea of concurrency control. Pessimistic concurrency control holds a pessimistic attitude towards data modification. It thinks that when data is accessed by the outside world, there will inevitably be conflicts, so locking is used in the process of data processing to ensure the exclusive use of resources.

In fact, database lock mechanism is based on pessimistic concurrency control, and according to the actual use, database lock can be divided into many kinds, see my later article for details.

Implementation mode

The locking process of database pessimistic lock is as follows:

  • After starting a transaction, add a certain type of lock to the data application that needs to be locked according to the operation type: for example, shared row lock, etc
  • If the locking is successful, the subsequent operation will continue. If the data has been locked by other locks and conflicts with the lock to be added now, the locking will fail (for example, the exclusive lock has been added). At this time, it is necessary to wait for other locks to be released (deadlock may occur)
  • Release the lock after completing the transaction

Advantages and disadvantages

advantage:
Pessimistic concurrency control adopts a conservative strategy: “take the lock first, and access the data only after success”, which ensures that the data acquisition and modification are carried out in an orderly manner, so it is suitable for use in the environment of more writing and less reading. Of course, pessimistic lock can not maintain very high performance, but pessimistic lock can ensure the security of data on the premise that optimistic lock can not provide better performance.

Disadvantages:
Pessimistic concurrency control increases the overhead of the system, reduces the efficiency of the system, and reduces the parallelism of the system.

Optimistic concurrency control

essence

Wikipedia: in relational database management system, optimistic concurrency control (also known as “optimistic lock”, abbreviated as “OCC”) is a method of concurrency control. It assumes that multi-user concurrent transactions will not affect each other when they are processed, and each transaction can process the part of data affected by each other without generating locks.

Optimistic concurrency control is optimistic about data modification. Even in a concurrency environment, external operations on data generally do not cause conflicts, so it does not lock. Instead, before submitting data updates, each transaction will check whether other transactions have modified the data after the transaction reads the data. If other transactions are updated, the conflict information will be returned, and the user will decide how to do the next step, such as retrying or rolling back.

It can be seen that optimistic lock is not an actual lock, and it doesn’t even use lock to realize concurrency control. Instead, it uses other methods to judge whether data can be modified. Optimistic lock is generally a lock mechanism implemented by users themselves. Although it does not use the actual lock, it can produce the effect of locking.

Implementation mode

CAS (compare and swap) is a famous lock free algorithm. Lock free programming is to realize variable synchronization between multiple threads without using lock, that is, to realize variable synchronization without thread being blocked, so it is also called non blocking synchronization. The scheme to realize non blocking synchronization is called “non blocking algorithm”.

Optimistic locking is based on CAS (compare and swap) algorithm. Let’s look at the CAS process first. A CAS operation process can be represented by the following C code:

int cas(long *addr, long old, long new)
{
    /* Executes atomically. */
    if(*addr != old)
        return 0;
    *addr = new;
    return 1;
}

CAS has three operands: memory value V, old expected value a, and new value B to be modified. If and only if the expected value a and the memory value v are the same, change the memory value V to B, otherwise nothing will be done. The whole CAS operation is an atomic operation, which is indivisible.

The implementation of optimistic lock is similar to the above process, mainly in the following ways:

  • Version number mark: add a new field in the table: version, which is used to save the version number. Get the version number at the same time when getting the data, and then use the following command when updating the data:update xxx set version=version+1,… where … version="old version" and ..... At this time, we can judge whether the update is successful by judging whether the number of affected rows of the returned result is 0. If the update fails, it means that other requests have updated the data.
  • Time stamp mark: same as version number, only judged by time stamp. Generally speaking, many data tables have an update time field, so you don’t need to add another field by judging from this field.
  • Field to be updated: if there is no timestamp field and you don’t want to add a new field, you can consider using the field to be updated, because the updated data will generally change. Before updating, you can compare the old value of the field to be updated with the present value of the database, and if there is no change, it will be updated.
  • All field Tags: all fields in the data table are used to judge. This is equivalent to locking not only a few fields, but the entire data row. As long as the data of the row changes, it will not be updated.

Advantages and disadvantages

advantage:
Optimistic concurrency control has no actual locking, so there is no extra cost. It is also good for deadlock problem. It is suitable for concurrency scenarios with more read and less write. Because there is no extra cost, it can greatly improve the performance of the database.

Disadvantages:
Optimistic concurrency control is not suitable for the scenario of write more and read less concurrency, because there will be a lot of write conflicts, resulting in data writing to wait for multiple retries, in this case, its overhead is actually higher than pessimistic lock. Moreover, the business logic of optimistic lock is more complex than pessimistic lock. In business logic, failure and waiting for retrial should be considered, and the direct modification of database by other third-party systems cannot be avoided.

Multi version concurrency control

essence

Wikipedia: multi version concurrency control (MCC or mvcc) is a common concurrency control in database management system. It is also used in programming language to realize transaction memory.

Both optimistic concurrency control and pessimistic concurrency control guarantee the serialization of transactions by delaying or terminating the corresponding transactions to solve the competitive conditions between transactions. Although the former two concurrency control mechanisms can fundamentally solve the serialization problem of concurrent transactions, they are actually solving the problem of write conflict, and the difference between them lies in the optimistic attitude towards write conflict Pessimistic lock can also solve the problem of read-write conflict, but the performance is average. In the actual use process, the database read request is many times of the write request. If we can solve the problem of read-write concurrency, we can greatly improve the read performance of the database, and this is what multi version concurrency control can do.

Different from pessimistic concurrency control and optimistic concurrency control, mvcc is to solve the problem of multiple, long-time read and write operations caused by read-write lock, that is to solve the problem of read-write conflict. Mvcc can be combined with either of the two mechanisms to improve the read performance of database.

Pessimistic locking of database, based on the consideration of improving concurrency performance, generally implements multi version concurrency control at the same time. Not only mysql, but also Oracle, PostgreSQL and other database systems have implemented mvcc, but their implementation mechanisms are different, because mvcc does not have a unified implementation standard.

Generally speaking, the emergence of mvcc is a solution that database is dissatisfied with using pessimistic lock to solve the problem of read-write conflict.

Implementation mode

Mvcc is realized by saving the snapshot of data at a certain time point. The data item read by each transaction is a historical snapshot, which is called snapshot reading. Different from the current reading, the data read by snapshot reading may not be the latest, but snapshot isolation can make the data seen by the whole transaction be the data state when it was started. The write operation does not cover the existing data items, but creates a new version, which will not be visible until the transaction is committed.

Current read and snapshot read

What are current read and snapshot read in MySQL InnoDB?

Current reading
Such operations as select lock in share mode, select for update, update, insert and delete are all current reads. Why is current read? It reads the latest version of the record. When reading, it also ensures that other concurrent transactions cannot modify the current record and locks the read record.

Snapshot read
For example, the select operation without locking is snapshot read, that is, non blocking read without locking; the premise of snapshot read is that the isolation level is not uncommitted read and serialization level, because uncommitted read always reads the latest data row, rather than the data row conforming to the current transaction version. Serialization locks all read lines

Advantages and disadvantages

Mvcc allows most read operations to be done without locking. This design makes the read operation very simple, with good performance, and also ensures that only the rows that meet the standard will be read. The disadvantage is that each row of records needs additional storage space, more line checking work and some additional maintenance work.

Applicable scenarios

  • Pessimistic lock
    • Lock concurrency control for solving read write conflict and write write conflict
    • It is suitable for the situation of more writing, less reading and serious writing conflict, because pessimistic lock is locked when reading data. The scene of more reading will require frequent locking and a lot of waiting time, and using pessimistic lock in the case of serious writing conflict can ensure the consistency of data
    • High requirement of data consistency
    • It can solve the problems of dirty read, unreal read, non repeatable read, the first type of update lost, the second type of update lost
  • Optimistic lock
    • Lock free concurrency control for solving write write conflicts
    • It is suitable for reading more and writing less, because if there are a large number of write operations, the possibility of write conflicts will increase, and the business layer needs to constantly try again, which will greatly reduce the system performance
    • The requirement of data consistency is not high, but it requires very high response speed
    • Can not solve dirty read, unreal read, can not repeat read, but can solve the problem of update loss
  • MVCC
    • Lock free concurrency control for solving read write conflicts
    • Combined with the above two, improve their read performance

reference material

Wikipedia
www.cnblogs.com/rinack/p/10032207….
draveness.me/database-concurrency-…

This work adoptsCC agreementReprint must indicate the author and the link of this article

Mr. X