Transactions in practice

Time: 2022-5-25

1、 Foreword

Transactions are an essential feature of traditional relational databases: MySQL, Oracle and PostgreSQL all support them. In NoSQL databases, however, the concept of a transaction is weaker, and implementations are usually less complex than in relational databases.

Still, for the sake of data integrity and consistency, most K-V stores implement the basic properties of transactions, for example LevelDB and RocksDB, the two ancestors of K-V databases. Some open-source K-V stores written in Go, such as bolt and badger, also support transactions.

Transactions in rosedb have only just reached a first, basic version, and the code is still fairly simple, but I expect it to become more complex in the future.

Note that before implementing transactions in rosedb, my understanding of transactions was limited to the basic ACID concepts. This implementation is therefore very much "crossing the river by feeling for the stones", and there may be things worth criticizing. If you have any questions, please point them out, and I will keep learning and improving.

2、 Basic concepts

When it comes to transactions, the ACID properties come to mind first. Let's review them:

  • Atomicity: all operations in a transaction either complete or fail together, and never stop in an intermediate state. If an error occurs while a transaction is executing, it can be rolled back to the state before the transaction started.
  • Consistency: the integrity of the database is not broken before or after the transaction, which means the state of the data always meets expectations.
  • Isolation: describes the degree to which concurrently executing transactions affect each other. There are four common isolation levels, each allowing a different degree of interference:
    • Read uncommitted: changes made by a transaction that has not yet committed are visible to other transactions (dirty reads are possible)
    • Read committed: a transaction's changes become visible to other transactions only after it commits (no dirty reads, but reads are not repeatable)
    • Repeatable read: data read during a transaction stays consistent with what it saw at the start (no dirty reads or non-repeatable reads, but phantom reads are possible)
    • Serializable: reads and writes are mutually exclusive, so transactions do not run concurrently; a transaction runs only after the previous one has committed (no dirty reads, no non-repeatable reads, no phantom reads)
  • Durability: once a transaction is committed, its changes are permanent and survive even if the database crashes.

ACID may look like a lot of concepts, but it is not hard to understand. Implementing transactions means guaranteeing these properties whenever data is read and written, and of the four, atomicity, isolation and durability (AID) must be guaranteed by the database itself.

Consistency can be understood simply as the ultimate goal of a transaction. The database ensures consistency through AID, but we also need to ensure it at the application level: if the data we write is logically wrong, consistency cannot be guaranteed no matter how perfect the database is.

3、 Concrete implementation

Before explaining the implementation of transactions, let’s take a look at the basic usage of transactions in rosedb:

```go
package main

import (
	"fmt"

	"github.com/roseduan/rosedb"
)

func main() {
	// Open a database instance
	db, err := rosedb.Open(rosedb.DefaultConfig())
	if err != nil {
		panic(err)
	}

	// Manipulate data in a transaction: it is committed automatically when the
	// function returns nil, and rolled back when it returns an error
	err = db.Txn(func(tx *rosedb.Txn) (err error) {
		err = tx.Set([]byte("k1"), []byte("val-1"))
		if err != nil {
			return
		}
		err = tx.LPush([]byte("my_list"), []byte("val-1"), []byte("val-2"))
		return
	})
	if err != nil {
		panic(fmt.Sprintf("commit tx err: %+v", err))
	}
}
```

First, a database instance is opened, and then the Txn method is called. Its argument is a function; all operations of the transaction are performed inside it and are executed all at once when the transaction is committed.

Used this way, the transaction is committed automatically. You can also start and commit the transaction manually, and roll it back manually when an error occurs:

```go
// Open a database instance
db, err := rosedb.Open(rosedb.DefaultConfig())
if err != nil {
	panic(err)
}

// Start a transaction manually
tx := db.NewTransaction()
err = tx.Set([]byte("k1"), []byte("val-1"))
if err != nil {
	// Roll back when an error occurs
	tx.Rollback()
	return
}

// Commit the transaction
if err = tx.Commit(); err != nil {
	panic(fmt.Sprintf("commit tx err: %+v", err))
}
```

The first usage is recommended, since it saves you from committing and rolling back by hand.

The Txn method represents a read-write transaction. There is also a TxnView method, which represents a read-only transaction. It is used in exactly the same way, except that write operations inside TxnView are ignored.

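```go
// db is the instance opened in the previous examples
err = db.TxnView(func(tx *rosedb.Txn) error {
	val, err := tx.Get([]byte("k1"))
	if err != nil {
		return err
	}

	hVal := tx.HGet([]byte("k1"), []byte("f1"))

	// Process val and hVal
	fmt.Printf("k1 = %s, f1 = %s\n", val, hVal)
	return nil
})
if err != nil {
	panic(err)
}
```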

Having covered the ACID concepts and the basic usage of rosedb transactions, let's look at how rosedb implements transactions and how it guarantees the AID properties.

3.1 Atomicity

As mentioned earlier, atomicity means a transaction executes as a whole: either everything succeeds or everything fails, and it never stays in an intermediate state.

Atomicity is actually not hard to achieve; it can be solved by relying on how rosedb writes data. Recall the two steps of a rosedb write: first the data is written to disk to ensure reliability, then the index information in memory is updated.

For a transaction, atomicity can be guaranteed by buffering the data to be written in memory and writing it to the disk file all at once when the transaction commits.

This raises a problem: what if an error occurs while writing the batch to disk, or the system crashes? In that case some of the data may have been written successfully while the rest failed. By the definition of atomicity, a transaction that did not finish committing is invalid. How do we know that the data already written is invalid?

Currently, rosedb solves this with one of the simplest and easiest-to-understand methods.

It works as follows: at the start of each transaction, a globally unique transaction ID is assigned, and every entry the transaction writes to the file carries this ID. Once all the data has been written to disk, the transaction ID itself is saved separately (also in a file). When the database starts, it first loads all the transaction IDs from this file into a set, called the set of committed transaction IDs.

This way, if an error occurs during the batch write, the corresponding transaction ID is never saved. When the database starts and reads the data back to build the index (recall rosedb's startup process), it finds that the transaction ID of these entries is not in the committed set, so the entries are treated as invalid.
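The startup check can be sketched as follows. It is a simplified illustration, assuming hypothetical `record`, `loadCommitted` and `rebuildIndex` helpers and modelling both the data file and the transaction-ID file as in-memory slices; rosedb's real file formats and startup code differ.

```go
package main

import "fmt"

// record is a hypothetical data-file entry tagged with its transaction ID.
type record struct {
	txID  uint64
	key   string
	value string
}

// loadCommitted builds the set of committed transaction IDs from the
// separately persisted transaction-ID file (modelled here as a slice).
func loadCommitted(txnIDFile []uint64) map[uint64]struct{} {
	committed := make(map[uint64]struct{}, len(txnIDFile))
	for _, id := range txnIDFile {
		committed[id] = struct{}{}
	}
	return committed
}

// rebuildIndex replays the data file in order and skips every record whose
// transaction ID was never committed, so a half-written batch stays invisible.
func rebuildIndex(data []record, committed map[uint64]struct{}) map[string]string {
	index := make(map[string]string)
	for _, r := range data {
		if _, ok := committed[r.txID]; !ok {
			continue // transaction did not finish committing: treat as invalid
		}
		index[r.key] = r.value
	}
	return index
}

func main() {
	// Txn 1 committed fully; txn 2 crashed after writing one record,
	// so its ID never reached the transaction-ID file.
	data := []record{
		{txID: 1, key: "k1", value: "v1"},
		{txID: 1, key: "k2", value: "v2"},
		{txID: 2, key: "k1", value: "dirty"},
	}
	index := rebuildIndex(data, loadCommitted([]uint64{1}))
	fmt.Println(index["k1"], index["k2"]) // v1 v2
}
```

The ordering matters: the transaction ID is persisted only after all of the transaction's data, so its presence in the set implies the whole batch is on disk.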

Most LSM-style K-V stores use a similar idea to guarantee transaction atomicity. For example, RocksDB collects all writes of a transaction in a WriteBatch and writes it in one go when the transaction commits.

3.2 Isolation

Currently, rosedb supports two transaction types: read-write transactions and read-only transactions. Only one read-write transaction can run at a time, while multiple read-only transactions can run concurrently.

In this mode, reads take a read lock and writes take a write lock, so reads and writes are mutually exclusive and cannot proceed at the same time. Among the four isolation levels, this corresponds to serializable. Its advantage is that it is simple and easy to implement; its disadvantage is poor concurrency.
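A single global read-write lock like this maps directly onto Go's `sync.RWMutex`. The sketch below assumes a hypothetical in-memory `DB` with `Txn`/`TxnView` methods named after rosedb's; it shows only the locking, not the buffering and logging described in 3.1, and it silently drops writes in read-only transactions.

```go
package main

import (
	"fmt"
	"sync"
)

// DB is a hypothetical store guarded by one global read-write lock.
type DB struct {
	mu   sync.RWMutex
	data map[string]string
}

// Tx is the handle passed to transaction callbacks.
type Tx struct {
	db       *DB
	readOnly bool
}

func (tx *Tx) Get(key string) (string, bool) {
	v, ok := tx.db.data[key]
	return v, ok
}

// Set drops writes made inside a read-only transaction.
func (tx *Tx) Set(key, value string) {
	if tx.readOnly {
		return
	}
	tx.db.data[key] = value
}

// Txn takes the write lock: one read-write transaction at a time, and no
// readers while it runs.
func (db *DB) Txn(fn func(tx *Tx) error) error {
	db.mu.Lock()
	defer db.mu.Unlock()
	return fn(&Tx{db: db})
}

// TxnView takes the read lock: many read-only transactions may run together.
func (db *DB) TxnView(fn func(tx *Tx) error) error {
	db.mu.RLock()
	defer db.mu.RUnlock()
	return fn(&Tx{db: db, readOnly: true})
}

func main() {
	db := &DB{data: make(map[string]string)}
	_ = db.Txn(func(tx *Tx) error {
		tx.Set("k1", "val-1")
		return nil
	})

	var wg sync.WaitGroup
	for i := 0; i < 3; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// These readers hold the read lock at the same time.
			_ = db.TxnView(func(tx *Tx) error {
				v, _ := tx.Get("k1")
				fmt.Println(v)
				return nil
			})
		}()
	}
	wg.Wait()
}
```

Because a writer holds the exclusive lock for the whole callback, every reader sees either all of a transaction's writes or none of them, which is exactly why concurrency suffers.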

Note that the current implementation will be adjusted later. My idea is to use snapshot isolation to support read committed or repeatable read, so that reads can see historical versions of the data without blocking writes, but that implementation is considerably more complex.
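For comparison, one common way to get snapshot reads is multi-version storage: each commit gets a timestamp, and a reader sees only the versions committed at or before its snapshot. This is not how rosedb works today; `MVStore`, `Snapshot`, `Commit` and `Get` are hypothetical names, and write-write conflict detection and garbage collection of old versions are left out.

```go
package main

import (
	"fmt"
	"sync"
)

// version is one committed value of a key, stamped with its commit timestamp.
type version struct {
	commitTS uint64
	value    string
}

// MVStore keeps every committed version. Its mutex only guards the maps for
// the duration of a single call, so a reader never waits for a whole
// write transaction to finish.
type MVStore struct {
	mu       sync.Mutex
	clock    uint64
	versions map[string][]version // ordered by ascending commitTS
}

func NewMVStore() *MVStore {
	return &MVStore{versions: make(map[string][]version)}
}

// Snapshot returns a read timestamp covering everything committed so far.
func (s *MVStore) Snapshot() uint64 {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.clock
}

// Commit installs a batch of writes under a new commit timestamp.
func (s *MVStore) Commit(writes map[string]string) uint64 {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.clock++
	for k, v := range writes {
		s.versions[k] = append(s.versions[k], version{commitTS: s.clock, value: v})
	}
	return s.clock
}

// Get returns the newest version of key visible at readTS.
func (s *MVStore) Get(key string, readTS uint64) (string, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	vs := s.versions[key]
	for i := len(vs) - 1; i >= 0; i-- {
		if vs[i].commitTS <= readTS {
			return vs[i].value, true
		}
	}
	return "", false
}

func main() {
	s := NewMVStore()
	s.Commit(map[string]string{"k1": "v1"})
	snap := s.Snapshot()                    // a reader starts here
	s.Commit(map[string]string{"k1": "v2"}) // a later writer is not blocked
	old, _ := s.Get("k1", snap)
	cur, _ := s.Get("k1", s.Snapshot())
	fmt.Println(old, cur) // v1 v2
}
```

Reading at a fixed snapshot for the whole transaction gives repeatable reads; taking a fresh snapshot for every read would give read committed instead.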

3.3 Durability

Durability is ensured once the data has been written to disk or another common non-volatile storage medium.

In rosedb, with the default flush strategy, written data goes into the operating system's page cache and is not actually on disk yet. If the machine crashes or loses power before the operating system flushes the page cache to disk, data is lost. Durability is therefore not fully guaranteed, but performance is better, because syncing to disk is an extremely slow operation.

If the sync configuration option is set to true when starting rosedb, every write is forced to sync to disk. This guarantees no data loss, at the cost of lower write performance.

Choose according to your own scenario. If the system is stable, needs high performance, and can tolerate losing a small amount of data, use the default strategy (sync is false); otherwise, force a sync on every write.

4、 Defects

From the brief analysis above, we can see that rosedb basically implements the AID properties of transactions. Overall, the implementation is very simple, easy to learn and use, and easy to understand as a basis for further extension. Of course, there are still some defects to be solved.

The first is the isolation level mentioned above. The current approach is too simple: it uses one global big lock to achieve serialization. In the future, we could consider locking only the keys being operated on, to reduce the granularity of the lock.

Another problem is that rosedb supports several data structures, but for structures such as List and ZSet it is very difficult to support every command inside a transaction. Therefore, for now, List only supports LPush and RPush, and ZSet only supports ZAdd, ZScore and ZRem.

The main reason is that when a transaction both reads and writes an existing key, commands such as range lookups become difficult to support. I haven't found a better solution yet.

Finally, here is the project address: github.com/roseduan/rosedb. Stars and criticism are both welcome.

This work is licensed under a CC license; reprints must credit the author and link to this article.