Random talk on database (3)

Time:2021-1-26

A few days ago, Dalian had a heavy snow which was not seen in decades. Although it caused a lot of trouble for commuting, the children were very happy. They had snowball fights and made snowmen in the community.

The shops in the community are selling skis. For a while, the quiet valley community has become an amusement park, giving people who have been depressed by the epidemic for more than a year and can’t go out to travel a few days free trip to the snow village.

Today is the Japanese holiday “adult day”, so I will rest at home to do technical support for Japanese customers, and take this time to continue to chat with you about the database.

Last time we talked about it“Random I / O”How does the seek time affect the read and write of data blocks? Today, let’s talk about the methods that smart engineers have come up with to solve or improve these problems.

First of all“Random I / O”As for the solution to the impact of write operation, the “write” special index here refers to the write operation during block update (“update” and “delete”). Of course, “insert” also receives the impact of random I / O, but the block search algorithm involved is different from that during update,

We won’t repeat it here today. Of course, if you want to know more about the data block search action of “insert”, you can leave a message. We will find an opportunity to elaborate later.

Because the update (“update” and “delete”) operations are based on the “where” condition to write specific data blocks, it is inevitable to update these data blocks“Random I / O”How to solve this seemingly unsolvable problem.

Let’s think for five minutes.

By the way, first write these operations of updating data blocks to a file — redo log.

So why can the above method solve the problem“Random I / O”The problem is, because it’s a problem“Random I / O”Convert to“Continuous I / O”It’s the best way.

Now let’s talk about what is“Continuous I / O”In the last article, we know the seek method of HDD Hard disk, so if the write operation is not for a specific data block, but directly seek to the end of the redo log file, and then write multiple sectors in succession,

So“Random I / O”The lost seek time will be greatly reduced. That’s how traditional HDDs work“Continuous I / O”。

So the problem is that the changes of data blocks are written into the redo log, which seems to have failed to achieve our goal of updating the changes into data blocks.

Yes, we just did the following processing to tell application that the database update is complete.

1. Read the update object data block into memory.
2. Write the update to the data block in memory.
3. When certain conditions are met or there is a commit request, write the update to the redo log with "* *" continuous I / O * *.
4. Tell application that the database has been updated.

When do you write data blocks? Ha ha, this can be done with backstage software.

Now let’s make a simple summary: in order to improve the performance of HDD Hard disk“Random I / O”The impact on update (“update” and “delete”) processing is introduced“Redo Log”, using the“Continuous I / O”Substitute“Random I / O”。

Does it sound strange? I only heard about it before“Redo Log”It’s used to back up and restore data. Haven’t you heard of it?

Yes, it is“Redo Log”Another important role of, but with“Continuous I / O”Improvement“Random I / O”It’s really an important consideration.

This is why RDBMS relational databases based on Oracle involve redo log, and write log file before write data file.

That’s it“Random I / O”The solution to the impact of write operation, let’s talk about the solution to its impact on read operation (select).

Because the data is stored in the database (“insert”) and later updates (“update” and “delete”), it reflects the original value of the data, that is, the data is saved safely.

The data query (“select”) reflects the added value of the data, that is, the data can be retrieved according to a variety of needs, in order to provide the basis for higher value behavior or decision-making.

So any database optimization of select is the most important, the specific method is also eight immortals across the sea, each show their magic power.

Based on the above reasons and my own lack of knowledge, we can’t explain them one by one today. We are just trying to reduce them“Random I / O”Let’s give a brief explanation.

Oracle based relational database can be reduced by the following two methods“Random I / O”:

1. Use the memory cache to store the read data blocks and share them with all processes. In this way, the repeated reading of the same data block can be reduced as much as possible.

To develop this idea to the extreme is the memory database such as Hana.
   Of course, there are middleware databases like redis, which can help databases like MySQL with imperfect memory caching algorithms to provide additional memory caching functions.

2. Store the data of the same column, not the same row, in a data block to facilitate the aggregate processing of sum and AVG.

The database made of this idea is the famous data warehouse.
    Later, Oracle proposed the concept of "in memory" in the version above 12C. In fact, it is to open up a separate memory area in the memory of the traditional "row storage" database,
    In this area, data is provided to select in the form of "column storage". This concept was also adopted by later tidb in the form of "tiflash".

That’s all for today.

The next article talks about how to deal with multiple processes updating a data record at the same time.

2021/01/11  @  Dalian