The Overview of TiDB 4.0

Time: 2020-8-1

In the previous article, Huang Dongxu, CTO of our company, shared our vision of the “database of the future”. I am glad that we have stayed on the road of “building a better database”. On April 8, PingCAP’s fifth anniversary, we released the first RC of TiDB 4.0, a milestone for the project.

In 4.0, we have delivered many important and promising features. This article introduces TiDB 4.0 from several angles: installation, usage, operations and maintenance, the ecosystem, and the cloud. You are welcome to try it out and give us feedback.

Deploying a TiDB cluster in one minute

“How long does it take to deploy a TiDB cluster on a single machine?”

Previously, this question was hard for us to answer, but now we can proudly say: “one minute”. Why so fast? Because we built TiUP, a new component management tool, for TiDB 4.0.

First, install TiUP with the following command:

curl --proto '=https' --tlsv1.2 -sSf https://tiup-mirrors.pingcap.com/install.sh | sh

After installation, the console prompts you to start a single-machine TiDB cluster with:

tiup playground

You can then connect to the cluster with a MySQL client (TiDB listens on port 4000 by default) and start testing.

The above is only a single-machine test. How do we deploy TiDB in a production environment? Suppose we have ten machines: how long does it take to deploy a TiDB cluster? This used to be hard to answer as well, but now the answer is still “one minute”, because we can simply use the TiUP cluster component.

First, prepare a topology file describing the deployment, using the tiup cluster example as a reference.

Then execute the deploy command:

tiup cluster deploy test v4.0.0-rc topology.yaml -i ~/.ssh/id_rsa

This deploys a TiDB cluster named test running the latest version, v4.0.0-rc. From then on, we can manage the cluster by its name. For example, start it with:

tiup cluster start test

Cool, isn’t it? Even better, TiUP manages the components of the entire TiDB ecosystem: TiDB, TiFlash, and the ecosystem tools can all be managed and run through TiUP, and users can add their own components to TiUP as well.

Is OLTP or OLAP still a problem?

Is my workload OLTP or OLAP? In many cases, users cannot answer this clearly. What we do know is what users want most: “Whatever my workload is, I need to get results from your database quickly.”

This requirement sounds simple but is very hard to meet. With TiDB 4.0, we can proudly say that we are a step closer to fully meeting it, because we provide a complete Hybrid Transactional/Analytical Processing (HTAP) solution: TiDB + TiFlash.

In short, TiDB handles OLTP workloads and TiFlash handles OLAP workloads. Compared with traditional ETL pipelines or other HTAP solutions, we go further:

  1. Real-time strong consistency. Data updated in TiDB is replicated to TiFlash in real time, so TiFlash always reads the latest data when processing queries.
  2. TiDB automatically chooses between row storage and column storage for different query scenarios, without user intervention.
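
The consistency guarantee in item 1 can be pictured as a read-index check. Before serving a query, the columnar replica asks the row-store leader how far its log has been committed, then waits until it has applied at least that much. The sketch below is a simplified model under that assumption (a Raft-learner-style replica). The `Leader` and `ColumnarReplica` classes are hypothetical and are not TiFlash’s code. The model only shows why such a replica never returns stale data even though it replicates asynchronously.

```python
from dataclasses import dataclass, field

# Simplified, hypothetical model of a read-index check; not TiFlash's implementation.


@dataclass
class Leader:
    """Hypothetical row-store leader: an append-only log of committed writes."""
    log: list = field(default_factory=list)

    def write(self, key, value):
        self.log.append((key, value))
        return len(self.log)  # commit index of this write

    def read_index(self):
        # Index a reader must reach to observe every write committed so far.
        return len(self.log)


@dataclass
class ColumnarReplica:
    """Hypothetical learner replica that applies the leader's log asynchronously."""
    leader: Leader
    applied: int = 0
    data: dict = field(default_factory=dict)

    def apply_next(self):
        # Apply one more log entry (simulates asynchronous replication).
        key, value = self.leader.log[self.applied]
        self.data[key] = value
        self.applied += 1

    def consistent_read(self, key):
        # 1. Ask the leader how far the log has been committed.
        target = self.leader.read_index()
        # 2. Catch up until at least that index is applied
        #    (a real system would wait for replication instead of pulling).
        while self.applied < target:
            self.apply_next()
        # 3. Only now is it safe to serve the read.
        return self.data.get(key)


if __name__ == "__main__":
    leader = Leader()
    replica = ColumnarReplica(leader)
    leader.write("k", 1)
    leader.write("k", 2)
    print(replica.consistent_read("k"))  # 2, although the replica started with nothing applied
```

The wait in step 2 is what trades a little read latency for freshness: a replica that is behind delays the query rather than answering it with old data.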

Make the system more observable

“I just want to know what’s wrong. Why do I need to understand how TiDB works internally?” So goes a common complaint from users.

Before TiDB 4.0, troubleshooting the system effectively was not easy. DBAs needed to understand TiDB’s basic architecture, become familiar with thousands of TiDB monitoring metrics, and build up hands-on experience before they could solve problems efficiently. To address this, TiDB 4.0 ships with a built-in Dashboard, and we hope most problems can be located easily through it.

We have always believed that “a picture is worth a thousand words”: many problems can be observed directly through visualization. The Dashboard provides:

  1. Key Visualizer (KeyViz), which shows how the workload accesses data over a period of time, so you can quickly diagnose read/write hotspots and other anomalies in the system.
  2. SQL statement analysis, which shows which SQL statements are consuming excessive system resources.
  3. Cluster diagnostics, which automatically analyzes the current state of the cluster, produces a diagnostic report, and warns users of potential risks.
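
SQL statement analysis generally groups statements that differ only in their literal values, then ranks the groups by resource usage. The sketch below illustrates that idea, using latency as the resource. The regex-based `normalize` function is a hypothetical, much cruder stand-in for real SQL normalization, and the sample statements and latencies are made up.

```python
import re
from collections import defaultdict


def normalize(sql: str) -> str:
    """Crude, hypothetical normalizer: replace literals with '?' and tidy whitespace."""
    sql = re.sub(r"'(?:[^'\\]|\\.)*'", "?", sql)   # string literals
    sql = re.sub(r"\b\d+(?:\.\d+)?\b", "?", sql)  # numeric literals
    return re.sub(r"\s+", " ", sql).strip().lower()


def top_statements(samples, limit=3):
    """Group (sql, latency_ms) samples by normalized text and rank by total latency."""
    stats = defaultdict(lambda: [0, 0.0])  # normalized sql -> [count, total_ms]
    for sql, latency_ms in samples:
        entry = stats[normalize(sql)]
        entry[0] += 1
        entry[1] += latency_ms
    ranked = sorted(stats.items(), key=lambda kv: kv[1][1], reverse=True)
    return [(text, count, total) for text, (count, total) in ranked[:limit]]


# Made-up samples for illustration only.
samples = [
    ("SELECT * FROM orders WHERE id = 42", 3.0),
    ("SELECT * FROM orders WHERE id = 7", 2.5),
    ("SELECT name FROM users WHERE email = 'a@b.com'", 40.0),
    ("select  *  from orders where id = 99", 1.5),
]
for text, count, total in top_statements(samples):
    print(f"{total:7.1f} ms  x{count}  {text}")
```

Grouping by normalized text is what makes the three `orders` lookups show up as one statement, so a single expensive query is not hidden among many cheap ones.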

Fast backup and restore for 100 TB+ clusters

Although TiDB keeps three replicas by default to ensure high availability, many users, especially in finance, securities, and similar industries, still want their data backed up regularly. When TiDB clusters were small, traditional backup tools were good enough. Once a cluster holds tens of TB or even 100 TB of data, however, we need a different approach.

TiDB 4.0 provides BR (Backup & Restore), a distributed backup tool. BR backs up TiDB in a distributed fashion and writes the data to the user’s shared storage or to S3 in the cloud. As a result, the larger the cluster, the more BR benefits from distribution and the faster the backup. In our internal tests, BR reached backup and restore speeds of 1 GB/s.
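
A quick back-of-the-envelope calculation puts the 1 GB/s figure in perspective. The sketch assumes decimal units (1 TB = 1000 GB) and constant throughput. `aggregate_rate` and its 0.25 GB/s per-node rate are a hypothetical linear model, meant only to illustrate why a larger cluster can back up faster. They do not describe measured BR behavior.

```python
def backup_hours(data_tb: float, rate_gb_per_s: float) -> float:
    """Hours needed to move data_tb terabytes at rate_gb_per_s (decimal units)."""
    seconds = data_tb * 1000 / rate_gb_per_s
    return seconds / 3600


def aggregate_rate(nodes: int, per_node_gb_per_s: float) -> float:
    """Hypothetical linear model: every storage node backs up its own data in parallel."""
    return nodes * per_node_gb_per_s


print(f"{backup_hours(100, 1.0):.1f} h")  # 100 TB at 1 GB/s -> 27.8 h

for nodes in (4, 8, 16):
    rate = aggregate_rate(nodes, per_node_gb_per_s=0.25)  # hypothetical per-node rate
    print(f"{nodes:2d} nodes -> {rate:.2f} GB/s -> {backup_hours(100, rate):.1f} h for 100 TB")
```

At 1 GB/s, a full 100 TB backup still takes more than a day. Throughput that grows with the number of nodes is therefore what keeps backup windows manageable at this scale.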

Besides BR for full cluster backups, we also provide CDC (Change Data Capture). CDC subscribes directly to TiDB’s data changes and delivers incremental changes with second-level latency, as low as milliseconds at best.
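
Delivering incremental changes in order requires knowing which changes are safe to emit. Events can arrive out of order, so the pipeline needs a point in time before which no more changes will show up. The sketch below models this with a resolved timestamp, a concept used in TiCDC. The `ChangeEvent` and `ChangeBuffer` types are hypothetical and far simpler than a real CDC pipeline.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class ChangeEvent:
    """Hypothetical change record; events are ordered by commit timestamp only."""
    commit_ts: int
    key: str = field(compare=False)
    value: object = field(compare=False)


class ChangeBuffer:
    """Hypothetical buffer: collect changes, emit them in commit order up to a resolved ts."""

    def __init__(self):
        self._heap = []

    def add(self, event: ChangeEvent):
        # Events may arrive out of order from different storage nodes.
        heapq.heappush(self._heap, event)

    def flush(self, resolved_ts: int):
        # resolved_ts promises that no future event has commit_ts <= resolved_ts,
        # so everything up to it can be sent downstream in commit order.
        out = []
        while self._heap and self._heap[0].commit_ts <= resolved_ts:
            out.append(heapq.heappop(self._heap))
        return out


buf = ChangeBuffer()
for ts, key, value in [(105, "a", 2), (101, "b", 1), (110, "a", 3), (103, "c", 9)]:
    buf.add(ChangeEvent(ts, key, value))
print([e.commit_ts for e in buf.flush(resolved_ts=105)])  # [101, 103, 105]
print([e.commit_ts for e in buf.flush(resolved_ts=120)])  # [110]
```

How quickly the resolved timestamp advances bounds the delivery latency, which is one way to read the “second-level, down to milliseconds” figure above.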

Beyond BR and CDC, TiDB 4.0 offers a complete set of ecosystem tools. These include TiUP for deployment and operations (introduced above), DM (Data Migration), and the data import tool TiDB Lightning. With these tools, TiDB integrates easily with users’ existing ecosystems, and we can deliver more valuable services to users.

Hello! Serverless TiDB

We have always hoped that users could use TiDB without thinking about it and focus only on their own business. To users, TiDB should be a database resource consumed on demand. This is a very important concept in cloud services: serverless.

Before TiDB 4.0, to make sure a cluster could withstand peak traffic, users sized the entire cluster up front, yet most of the time those resources sat underutilized. In 4.0, we implemented an elastic scheduling mechanism on Kubernetes, turning TiDB into a serverless architecture in the cloud.

Now users only need to deploy a minimal TiDB cluster, and TiDB automatically adapts to their workload, including:

  1. Elastic scaling. When business peaks arrive, TiDB automatically adds instances to serve the requests; when load drops, it automatically removes instances.
  2. Automatic spreading of read hotspots under high read load.
  3. Hotspot isolation: hot business data is moved to a dedicated instance so that other workloads are not affected.
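
One common way to turn “add instances at peak, remove them when idle” into a rule is the proportional formula of the Kubernetes Horizontal Pod Autoscaler: desired = ceil(current × currentMetric / targetMetric). This article does not say which rule TiDB’s auto-scaling actually uses. The sketch below simply assumes this one, with a CPU-utilization target, an HPA-style tolerance, and hypothetical bounds `min_instances` and `max_instances`.

```python
import math


def desired_instances(current: int, cpu_util: float, target_util: float = 0.5,
                      min_instances: int = 1, max_instances: int = 10,
                      tolerance: float = 0.1) -> int:
    """HPA-style proportional scaling (an assumption, not TiDB's documented policy)."""
    ratio = cpu_util / target_util
    # Do nothing when the load is already close to the target.
    if abs(ratio - 1.0) <= tolerance:
        return current
    desired = math.ceil(current * ratio)
    # Clamp to the hypothetical allowed range.
    return max(min_instances, min(max_instances, desired))


print(desired_instances(2, cpu_util=0.75))   # peak: 2 * 0.75 / 0.5 = 3.0 -> 3
print(desired_instances(4, cpu_util=0.125))  # quiet: 4 * 0.25 = 1.0 -> 1
print(desired_instances(3, cpu_util=0.52))   # within 10% of target -> stays 3
print(desired_instances(8, cpu_util=1.0))    # 16, clamped to max_instances = 10
```

The tolerance band keeps small load fluctuations from causing constant scale-in and scale-out. The bounds cap cost at peak and keep a minimal cluster alive when traffic is idle.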

Sounds cool, right? You can start a TiDB cluster at very low cost, and later costs scale elastically with your business, which is commonly known as “pay as you go”. All of this can be experienced directly on the upcoming TiDB DBaaS cloud platform.

Final words

The above covers only some of the features in 4.0; many more are not introduced here. See the TiDB 4.0 RC release notes for details.

In addition, here are some quick benchmark results to give you a feel for the performance improvements in TiDB 4.0.

TPC-C (Note: the test used a high-spec TiDB DBaaS cluster on AWS, with TiDB on two c5.4xlarge instances of 16 cores and 32 GB each, and TiKV on three i3.4xlarge instances of 16 cores and 122 GB each.)

TPC-H 10G (Note: results are in seconds; lower is better. TiDB ran on two c5.4xlarge instances of 16 cores and 32 GB each, and TiKV on three i3.4xlarge instances of 16 cores and 122 GB each.)

Sysbench, 16 tables of 10,000,000 rows each (Note: three TiKV instances were deployed on three virtual machines with 16 cores and 62 GB each, and one TiDB instance on a server with 40 cores and 189 GB.)

We believe TiDB 4.0 is a very exciting release and a solid milestone on TiDB’s road toward the “database of the future”. At this happy moment, we owe our thanks to our users: it is because of all of you that we have come this far. We also believe TiDB will keep getting better and bring ever more value to users.

Finally, everyone interested is welcome to try it out and share feedback. Add the TiDB robot (WeChat: tidbai) as a friend and reply “new features” to join the “TiDB 4.0 tasting group” for discussion.