MySQL source code reading: implementation of consistent reading

Time:2022-5-6

InnoDB is a storage engine that supports mvcc (i.e. multi version concurrency control), and the consistent read function is based on mvcc. Based on the source code of MySQL 5.7, this paper discusses the principle of consistent reading, including snapshot creation, judging whether it is visible, snapshot closing, etc.

premise

  • This paper is based on mysql-5.7.29. In order to facilitate reading, the code in this paper is greatly deleted.
  • You need to understand “snapshot read”: when generating a snapshot, the changes of uncommitted transactions are not visible, and the changes of committed transactions are visible.

Transaction structure

The structure definition of the transaction is located instorage/innobase/include/trx0trx.h:898。 The parts related to mvcc are as follows:

struct trx_t {
    /*Transaction ID*/
    trx_id_t    id;        /*!< transaction id */


    /*Snapshot of consistent read*/
    ReadView*    read_view;    /*!< consistent read view used in the transaction, or NULL if not yet set */

    //Omit a lot of attributes
    // ...
}

The transaction ID is a 64 bit non negative integer. It should be noted that onlyRead write transactionTransaction ID will be assigned, and read-only transaction will not be assigned.
Each transaction has its own readview, hereinafter referred to as snapshot.

Readview source code

stayReadViewOne is defined inm_ids, it saves a list of transaction IDS in the active state, which is used to judge whether the modifications of other transactions are visible to the current transaction.

class ReadView {

    /**
     *The transaction ID that created this snapshot
     */
    trx_id_t    m_creator_trx_id;

    /**
     *A list of transaction IDs that were active when this snapshot was generated,
     *It's an ordered list
     */
    ids_t        m_ids;

    /** 
     *High water mark: ID ≥ M_ low_ limit_ No transaction with ID is visible.
     *When a snapshot is generated, it is assigned "next transaction ID to be assigned" (which will be greater than all assigned transaction IDS).
     */
    trx_id_t    m_low_limit_id;

    /**
     *Low water mark: ID less than m_ up_ limit_ No transaction with ID is visible.
     *It is the minimum value of the active transaction ID list, which is less than m when the snapshot is generated_ up_ limit_ All transactions with ID have been committed (or rolled back).
     */
    trx_id_t    m_up_limit_id;


    //Method to determine whether a transaction is visible
    bool changes_visible(){}

    //How to close a snapshot
    void close(){}
    // ...
}

Determine whether changes are visible

stayReadViewOne of themchanges_visible()Method to determine whether the change of a transaction is visible to the current transaction:

/*Judge whether the modification of a transaction is visible to the current transaction*/
bool changes_visible(){

        /**
         *Visible conditions:
         *  1.  Less than the low water mark, that is, when the snapshot is created, the transaction has been committed (or rolled back)
         *  2.  The transaction ID is the current transaction.
         */
        if (id < m_up_limit_id || id == m_creator_trx_id) {
            return(true);
        }

        If (ID > = m_low_limit_id) {/ * invisible above the water mark, that is, the transaction has not been committed when the snapshot is created*/
            return(false);

        }Else if (m_ids. Empty()) {/ * visible when there are no other active read-write transactions when creating a snapshot*/

            return(true);
        }

        /**
         *At this step, it indicates that the transaction ID is between the low water level and the high water level, i.e. ID ∈ [m_up_limit_id, m_low_limit_id)
         *It is necessary to judge whether it belongs to the active transaction list m_ In IDS,
         *If the transaction is active (uncommitted) when the snapshot is created, the modification is not visible to the current transaction.
         */

        //Obtain the list of active transaction IDs and use binary search to determine whether the transaction ID is in M_ In IDS
        const ids_t::value_type*    p = m_ids.data();
        return(!std::binary_search(p, p + m_ids.size(), id));
}

It can be seen that the current transaction mainly depends on the active transaction list m_ IDS to judge.

For a specific line of records, how to judge whether the current transaction is visible?

Judge whether the primary key index is visible

For each data row in the primary key index, in addition to the user-defined fields, there are additional system fields, including:

  • Transaction ID: modify the transaction ID of this line of record.
  • Roll pointer: undo log pointer

With the transaction ID of the record line, callchanges_visible()You can know whether this record is visible to the current transaction:

/**
Checks that a record is seen in a consistent read.
@return true if sees, or false if an earlier version of the record
should be retrieved */
bool lock_clust_rec_cons_read_sees()
{
    //Get the transaction ID of modifying this data row
    trx_id_t    trx_id = row_get_rec_trx_id(rec, index, offsets);

    //Call changes_ Visible() determines whether it is visible
    return(view->changes_visible(trx_id, index->table->name));
}

What if it’s not visible? You need to find the previous version in the undo log:

if (!lock_clust_rec_cons_read_sees(clust_rec, index, offsets,
                           node->read_view)) {
            //When judging that the data line is invisible, look for the previous version                    
            err = row_sel_build_prev_vers(
                node->read_view, index, clust_rec,
                &offsets, &heap, &plan->old_vers_heap,
                &old_vers, mtr);
            // ......
}

Judge whether the auxiliary index is visible

Suppose you query the SQL of ID through a secondary index:

select id from t where idx=2; 

adoptidx=2When locating the index record, the corresponding ID can be obtained only from the auxiliary index. Can I return this ID directly at this time?

Obviously not.
Unlike the primary key index, the secondary index does not save the transaction ID of modifying this record, so it cannot be judgedidx=2Whether the corresponding record is visible. For example, this record is created by another transaction only after the snapshot of the current transaction is created, which is invisible at this time (when the isolation level is repeatable reading).

For secondary indexes, each data page has one field:PAGE_MAX_TRX_ID, the maximum transaction ID for modifying this data page is saved.

Therefore, when judging whether the records in the auxiliary index are visible, the judgment conditions are:max_trx_id < m_up_limit_id

/**
 *Judge whether the records in the auxiliary index are visible.
 *Visible when true is returned. When false is returned, you are not sure. You need to query in the primary key index.
 */ 
bool lock_sec_rec_cons_read_sees()
{
    //Modify the maximum transaction ID of this page
    trx_id_t    max_trx_id = page_get_max_trx_id(page_align(rec));

    //To judge whether it is visible, the condition is max_ trx_ id < m_ up_ limit_ id
    return(view->sees(max_trx_id));
}

Specifically:

  • Whenmax_trx_idWhen it is less than the low water mark, it is visible because the transaction modifying this index page has been committed when the current transaction creates a snapshot.
  • When the condition does not hold, it is impossible to determine whether it is visible. At this time, you need to search in the primary key index, and then according to the previouschanges_visible()To judge.

The code is as follows:

/**
 *When it is not possible to determine whether the secondary index is visible,
 *First execute ICP (index push down) to determine whether the indexes match. If so, check the primary key index.
 *If the conditions do not match, the next secondary index record is processed.
 */ 
if (!srv_read_only_mode
    && !lock_sec_rec_cons_read_sees(rec, index, trx->read_view)) {

    switch (row_search_idx_cond_check(
            buf, prebuilt, rec, offsets)) {
    case ICP_ NO_ Match: // ICP does not match. You don't need to look at the primary key index. Process the next record directly
        goto next_rec;
    case ICP_OUT_OF_RANGE:
        err = DB_RECORD_NOT_FOUND;
        goto idx_cond_failed;
    case ICP_ Match: // when the ICP meets the conditions, check the primary key index to determine whether it is visible
        goto requires_clust_rec;
    }

    ut_error;
}

Get active transaction list

When creating a snapshot, get the list of active read-write transactions:

/**
 *Generate a read-write transaction ID list and calculate the high and low watermarks.
 *Creating a snapshot mvcc:: View_ Called when open()
 */
void ReadView::prepare(trx_id_t id){

    //The ID of the snapshot created is the current transaction ID
    m_creator_trx_id = id;

    //The high watermark is "next transaction ID to be assigned"
    m_low_limit_no = m_low_limit_id = trx_sys->max_trx_id;

    /**
     *The active transaction ID list (trx_sys - > rw_trx_ids) will be recorded in the system transaction structure (trx_sys),
     *If there are active read-write transactions, start from TRX_ Sys copies the read / write transaction ID list to M_ In IDS
     */ 
    if (!trx_sys->rw_trx_ids.empty()) {
        copy_trx_ids(trx_sys->rw_trx_ids);
    } else {
        //There were no active read-write transactions when the snapshot was created
        m_ids.clear();
    }

    // ...
}

The above code obtainedm_idsAnd the high water mark and the low water markReadView::complete()Calculated in:

void ReadView::complete()
{
    /*The low watermark is the minimum value of the active transaction list, i.e. the first active transaction ID*/
    m_up_limit_id = !m_ids.empty() ? m_ids.front() : m_low_limit_id;

    // ...
}

Snapshot generation

Snapshot generation is defined inReadView* trx_assign_read_view()Method:

ReadView* trx_assign_read_view(
/*=================*/
    trx_t*        trx)    /*!< in/out: active transaction */
{

    If (srv_read_only_mode) {// in read-only mode, no snapshot is generated
        return(NULL);

    }Else if (! Mvcc:: is_view_active (TRX - > read_view)) {// if a snapshot has been generated, it will be returned directly
        trx_ sys->mvcc->view_ open(trx->read_view, trx);  //  Execute snapshot generation
    }

    return(trx->read_view);
}

calltrx_assign_read_view()There are two main methods:

  • row_search_mvcc()
    Called when a lock free select is executed, that is, the select does not have a lockfor updatelock in share mode
  • innobase_start_trx_and_assign_read_view()
    usestart transaction with consistent snapshotStatement creates a consistency view at the beginning of a transaction (valid only when it is repeatable). The code is as follows.
static int innobase_start_trx_and_assign_read_view()
{
    if (trx->isolation_level == TRX_ISO_REPEATABLE_READ) {
        trx_assign_read_view(trx);
    } else {
        //When the isolation level is not repeatable, a warning message will be output
        push_warning_printf(thd, Sql_condition::SL_WARNING,
                    HA_ERR_UNSUPPORTED,
                    "InnoDB: WITH CONSISTENT SNAPSHOT"
                    " was ignored because this phrase"
                    " can only be used with"
                    " REPEATABLE READ isolation level.");
    }
}

Snapshot shutdown

Processing and closing snapshots are somewhat different at different isolation levels.

  • At the repeatable read isolation level, the snapshot closes at the end of the transaction. Multiple SQL for the entire transaction use the same snapshot.

    static void trx_commit_in_memory(){
    
      If (trx_is_autocommit_non_locking (TRX)) {// automatically committed non locking consistent reads
          if (trx->read_view != NULL) {
              //Execute close snapshot
              trx_sys->mvcc->view_close(trx->read_view, false);
          }
      } else {
    
          if (trx->id > 0) {
              /*TRX - > ID greater than 0 indicates that the current transaction is a read-write transaction.
              *You need to remove the current transaction ID from the read-write transaction list of the system transaction structure (trx_sys - > rw_trx_ids) and close the view
              */
              /* For consistent snapshot, we need to remove current
              transaction from running transaction id list for mvcc
              before doing commit and releasing locks. */
              trx_erase_lists(trx, serialised);
          }
    
          //Read only transaction
          if (trx->read_only || trx->rsegs.m_redo.rseg == NULL) {
              
              //Close snapshot when snapshot is not empty
              if (trx->read_view != NULL) {
                  trx_sys->mvcc->view_close(trx->read_view, false);
              }
          }
          
    }
  • At the read committed isolation level, the snapshot is closed at the end of the SQL statement.

    /* If the MySQL lock count drops to zero we know that the current SQL
    statement has ended */
    If (TRX - > n_mysql_tables_in_use = = 0) {// the number of tables used in the current SQL is 0, indicating that the SQL execution is over
    
      //When the isolation level is read committed, execute to close the snapshot
      if (trx->isolation_level <= TRX_ISO_READ_COMMITTED
              && MVCC::is_view_active(trx->read_view)) {
    
    
          trx_sys->mvcc->view_close(trx->read_view, true);
    
      }
    }

summary

  • A snapshot is generated when a lock free select statement is executed
  • At the repeatable read isolation level, the snapshot closes at the end of the transaction.
  • At the read committed isolation level, the snapshot is closed at the end of the SQL statement.
  • Visibility of the snapshot when the transaction is committed.