Database knowledge (1)

Time:2020-10-24

Database transactions

Transaction is a program execution logic unit composed of a series of operations to access and update data in the system. Transaction is the most basic unit in DBMS, and transaction is inseparable.

Four characteristics of database transaction (acid)

  1. Atomicity: indivisible, either all commit or all roll back
  2. Consistency: consistency means that the transaction must make the database change from one consistent state to another, that is to say, a transaction must be in a consistent state before and after execution.
  3. Isolation: multi users conduct transaction operation together, do not affect each other and are isolated from each other.
  4. Persistence: once a transaction is committed, the changes to the data in the database are permanent, and the transaction commit operation will not be lost even in the case of database system failure.

Database transaction isolation level

Read problem

users

id name age
1 Joe 20
2 Jill 25
  1. Dirty reading

    Dirty reads can occur when one transaction allows data modified by another transaction but not committed.

    脏读

    In the example, transaction 2 modifies a row but does not commit, and transaction 1 reads the uncommitted data. Now, if transaction 2 rolls back the previous modification or makes another modification, the data found in transaction 1 is incorrect.

  2. Non repeatable

    In a transaction, when a row of data is retrieved twice and different results are obtained, it means that “non repeatable read” occurs

    In lock based concurrency control, “non repeatable read” occurs when the select operation is not obtained or the read lock is released immediately after the select operation is executed; in multi version concurrency control, when no commit conflict is required to roll back a transaction, the “non repeatable read” phenomenon occurs.

    不可重复读

    In this example, transaction 2 commits successfully, so his changes to the row with ID 1 are visible to other transactions. But transaction 1 has already read another value of “age” from this line. At the serializable and repeatable isolation levels, the database should return the value before the transaction 2 update on the second select request. In committed read and uncommitted read, the updated value is returned. This phenomenon is non repeatable reading.

    Solve non repeatable read:

    1. Transaction 2 is required to be delayed until transaction 1 is committed or rolled back. This way it worksT1, T2Serial scheduling. Serialization scheduling can support repeatable reads.
    2. Another strategy is multi version concurrency control. For better concurrency performance, transaction 2 is allowed to commit first. But because transaction 1 starts before transaction 2, transaction 1 must operate on the snapshot of the database at which it started execution. When transaction 1 is finally committed, the database will check whether the result is equivalent to T1, T2 serial scheduling. If it is equivalent, transaction 1 is allowed to commit. If not, transaction 1 needs to roll back and throw a serialization failure error
  3. Illusory reading

    In the process of transaction execution, when two identical query statements are executed, different result sets are obtained. This phenomenon is called phantom read

    When the transaction does not getRange lockIn the case ofSELECT … WHEREThe operation may have a phantom read.

    “Phantom reading” isNon repeatableWhen transaction 1 is executed twiceSELECT … WHEREDuring the operation of retrieving a certain range of data, transaction 2 creates (such as insert) a new row of data in this table, which exactly meets the “where” clause of transaction 1.

    幻读

    It should be noted that transaction 1 executed the same query twice. If the highest isolation level is set, the same result set will be obtained twice, which is exactly what the database needs to meet at the serializable isolation level. However, at a lower isolation level, a second query may result in a different result set.

Transaction isolation is actually to solve the above-mentioned dirty read, non repeatable read and phantom read

Isolation level

  1. Read uncommitted

    Read uncommitted is the lowest isolation level. Allow dirty reads, and the transaction can see the “uncommitted” changes of other transactions.

  2. Read committed

    In the read committed level, the DBMS based on the lock mechanism concurrency control needs toWrite lockIt is held until the end of the transaction, but the read lock is released immediately after the select operation is completed.

  3. Repeatable read

    In the repeatable reads isolation level, the DBMS based on the lock mechanism concurrency control needs to keep the read locks and write locks of the selected objects until the end of the transaction.

    In order to solve the problem of non repeatable reading, or to achieve repeatable reading, MySQL adopts mvvc (multi version concurrency control).

    A row of records that we see in the database table may actually have multiple versions. In addition to the data itself, each version of the record also has a field indicating the version, which is marked as row TRX_ ID, and this field is the ID of the transaction generated. The transaction ID is recorded as transaction ID. it applies to the transaction system at the beginning of the transaction and increases in chronological order.

    版本

    According to the above figure, there are three versions of a row of records, and each version records the transaction ID generated. For example, if the transaction ID of transaction a is 100, then the row TRX of version 1_ ID is 100, the same as version 2 and version 3.

    When reading commit and repeatable read, there is a word called snapshot, and its scientific name is consistency view. This is also the key to repeatable and non repeatable reads.Repeatable read generates a global snapshot of the current transaction at the beginning of a transaction, while read commit generates a snapshot every time a statement is executed.

  4. Serializable

    Serialization is the best isolation level among the four transaction isolation levels, which solves the problems of dirty read, repeatable read and phantom read, but the effect is the worst. It changes the execution of transactions into sequential execution. Compared with the other three isolation levels, it is equivalent to single thread, and the execution of the latter transaction must wait for the end of the previous transaction. A shared lock is added when reading, that is, other transactions can be read concurrently, but cannot be written. Other locks are added when writing. Other transactions cannot be written or read concurrently.

    In order to realize serialization of DBMS based on lock mechanism concurrency control, the read lock and write lock on the selected object should be kept until the end of the transaction. When a “where” clause is used to describe a range in a select query, a range locks should be obtained. This mechanism can avoid the phenomenon of “phantom reads”

Concurrent write problem

In this case, two transactions modify the same data. The final result should be the result of which transaction. It must be the later one.

Suppose transaction a performs the update operation, and the row lock must be added to the modified row during update. The row lock will not be released until it is committed. Before transaction a is committed, transaction B also wants to update this row of data, so it applies for a row lock. However, transaction B cannot apply for a row lock because it has been occupied by transaction A. at this time, transaction B will be in a waiting state until transaction a commits. If transaction a takes too long, transaction B is likely to have a timeout exception. As shown in the figure below.

并发写

It should be noted that if the index is set for the field added to the query, the database system can easily add a row lock to the row data; if the index is not set, the database system will directly lock all the rows in the table, and then traverse it again to release the useless row locks.

Solving fantasy reading

The solution to the concurrent write problem is the row lock, and the solution to the phantom read is also the lock, called the gap lock. MySQL combines the row lock and the gap lock to solve the problem of concurrent write and phantom read. This lock is called the next key lock.

In the database, a set of B + trees will be maintained for the index to quickly locate row records. The B + index tree is ordered, so the index of this table is divided into several intervals. There may be the following B + trees:

数据库维护的B+树

As shown in the figure, it is divided into three intervals, (negative infinity, 10], (10,30] and (30, positive infinity), in which gap locking can be applied.

幻读事务处理过程

Before transaction a commits, the insert operation of transaction B can only wait, which is the role of gap lock. When transaction a executesUpdate user set name = 'kite 2' where age = 10;Because of the condition where age = 10, the database not only adds a row lock on the row with age = 10, but also adds a gap lock on both sides of the record, namely (negative infinity, 10] and (10, 30), which results in transaction B’s inserting operation unable to complete and can only wait for transaction a to commit. Not only the records with age = 10 need to wait for transaction a to commit, but the record pages with age < 10 and 10 < age < 30 cannot be completed, while the records greater than or equal to 30 will not be affected, which is enough to solve the problem of phantom reading.

This is an index case. If age is not an index column, the database will put a gap lock on the whole table. Therefore, if there is no index, no matter whether the age is greater than or equal to 30, transaction a has to be committed before it can be inserted successfully.

Isolation level vs read phenomenon

Isolation level Dirty reading Non repeatable Phantom reading
Uncommitted read It could happen It could happen It could happen
Submit for reading It could happen It could happen
Repeatable It could happen
Serializable

Serializable isolation level is not the same as serializable. Serializable scheduling is a necessary but not sufficient condition to avoid the above three phenomena.

“May happen” indicates that the corresponding phenomenon will occur at this isolation level and “-” indicates that it will not happen. The isolation level to be adopted depends on the system requirements trade-offs, among which,RepeatableIs the default level of MySQL.

Isolation level vs lock duration

In lock based concurrency control, the isolation level determines the lock holding time.“C”-Indicates that the lock will persist until the transaction is committed.“S”– indicates that the lock lasts until the current statement is executed. If the lock is released after the statement is executed, another transaction can modify the locked data before the transaction is committed, thus causing confusion

Isolation level Write operation Read operation Range operation (… Where…)
Uncommitted read S S S
Submit for reading C S S
Repeatable C C S
Serializable C C C

MySQL isolation operation:

  1. To view the default isolation level of MySQL:
mysql> show variables like 'transaction_isolation';
+-----------------------+-----------------+
| Variable_name         | Value           |
+-----------------------+-----------------+
| transaction_isolation | REPEATABLE-READ |
+-----------------------+-----------------+
1 row in set, 1 warning (0.11 sec)

You can find that the default isolation level of MySQL isRepeatable

Modify the isolation level of the database:Set [scope] transaction isolation levelThe details are as follows:

SET [SESSION | GLOBAL] TRANSACTION ISOLATION LEVEL {READ UNCOMMITTED | READ COMMITTED | REPEATABLE READ | SERIALIZABLE}。

Database index (index)

Database index is a kind of identification attached to the table field to increase the query speed. It is a structure to sort the values of one or more columns in the database table.

MySQL create index:

CREATE INDEX INDEX_NAME ON TABLE_NAME (COLUMN_LIST)

Three paradigms of database

The first paradigm

When all the attributes of relational schema R can no longer be decomposed into more basic data units, R is said to satisfy the first normal form, that is, attributes are not separable

The second paradigm

If the relational schema r satisfies the first normal form and all the non primary attributes of R are completely dependent on each candidate key attribute of R, R is said to satisfy the second normal form

The third paradigm

Let R be a transitive pattern independent property of R, i.e. if the Transitive Pattern of R satisfies an arbitrary property of R


Reference link:

  1. https://www.cnblogs.com/fengzheng/p/12557762.html
  2. https://zh.wikipedia.org/wiki/%E4%BA%8B%E5%8B%99%E9%9A%94%E9%9B%A2#%E8%84%8F%E8%AF%BB

Recommended Today

CSS uses position: sticky; to implement sticky layout

Four common attributes for positioning are as follows: { //1. The default value of positioning, no positioning, is equivalent to standard flow position:static; //2. Relative positioning, relative to the initial position of the element itself, usually set absolute (absolute positioning) to the child element of the parent element of relative position:relative; //3. Absolute positioning, relative […]