“Have we missed another ship?” The journey of databases lies toward the sea of stars!


Introduction | The relational database has been around for 50 years, and over that time the database has become a core component of the digital era. In recent years, domestic Chinese databases have flourished and stepped onto the historical stage. What key evolutions has database technology gone through at each stage of its development, and where will it go next? This article is compiled from the talk “A Song of Ice and Fire of Data: From Online Database Technology to Massive Data Analysis Technology”, given at the Techo TVP Developer Summit by Gai Guoqiang, CEO of Yunhe Enmo and Tencent Cloud TVP. It describes in detail the key stages in the evolution of database technology and the leading directions of exploration, and sets out the latest trends in database development in China.


1、 Three trends in the database industry

Today, these four words are an apt measure of China’s database field. A large number of domestic databases are emerging in the market, and different domestic databases have found broader room for application in different scenarios. This is China’s moment in the database era.

To explain these views, let me first share some of my judgments about the industry.

1. Entering the new database era

First of all, I think the development of database technology has gone through three eras. It began with the commercial database era, moved to the open source era, and has now reached what I call the new database era. Note that many people call today’s era the “cloud database era”. I prefer the word “new” because the foreign and domestic database markets are following different development patterns.

The commercial database era is represented mainly by Oracle, alongside which a series of commercial database software companies emerged. During this era, many domestic database companies had already begun exploring.

The second era is the open source database era, represented by MySQL. Open source made the Internet. Today we use many Tencent services and products, so why can we get fast, cheap and even free Internet services? The foundation of the Internet rests on the spirit of free and open source software, and open source technology made the Internet.

Third, we have now entered a new database era, in which open source database technology creates value through the cloud. In the past it was hard to assess how much commercial revenue and value open source technologies generated. Today they can be sold on the cloud, so their value can be measured. A Gartner report released yesterday noted that Microsoft has overtaken Oracle to become number one in the global database market. In the Chinese market, Tencent Cloud, Alibaba Cloud and Huawei Cloud are the three leading database vendors. What does this mean? It shows that the cloud has become the most important battleground for databases in the new era.

However, China’s database field also shows a new characteristic: a database ecosystem in full bloom. Independent innovation and iteration of domestic databases started late, so we still have to retrace the technical road that foreign databases have already traveled. This gives Chinese databases a different pattern. TDSQL started in 2012, and today a whole series of domestic databases is emerging in the market.

2. The database becomes the ultimate battleground of the cloud

What is the second big trend? The database is becoming the ultimate battleground of the cloud.

The cloud is built first of all on IaaS. Once the IaaS layer is in place, the PaaS layer follows, and the PaaS layer must solve the database problem. Whoever breaks through at the PaaS layer first wins the competitive advantage.

For example, in 2019 AWS announced the complete replacement of its 7,500 Oracle databases. Why was this such an important event? At Oracle’s annual global conference, Oracle used to mock Amazon: you sell open source databases and your own cloud databases on your cloud, yet you have to buy a large number of Oracle licenses from me. So Amazon committed to replacing them all and finished in 2019.

What is the essence of this replacement? Once it was done, Amazon’s chief evangelist said something worth quoting. Most of our traditional DBAs’ working time used to go into database scaling, storage expansion and license negotiation, and today all of this can be done through cloud self-service. This is an essential change: Internet and cloud technology allow traditional work to be done automatically. That, I think, is the core of the change.

3. The ToB market is the key to database success

The Chinese market is slightly different. I think China will see a long era in which public and hybrid cloud coexist, and private cloud will continue to hold a large market. What kind of database landscape will emerge under these conditions?

I would summarize it this way: the cloud experience must eventually be brought below the cloud. Not all databases and data can move to the public cloud. We therefore need to bring the public cloud’s best user experience, along with its automation and autonomy, to private on-premises data environments. Analyst reports call this TPC. IaaS and infrastructure alone do not make a true private cloud; it also needs core capabilities such as an Internet-style user experience and genuine cloud elasticity.

So I think the next step in the database field, especially in the Chinese market, is to bring the cloud experience below the cloud. In the end, the cloud and what runs below it will converge. The cloud will become the only form in which infrastructure is delivered. It will come in two versions, public and private, whose essential core will be the same.

2、 Academic view of database technology

We have just discussed several major trends seen in industrial practice:
- from commercial to open source to the cloud era;
- from above the cloud to below the cloud;
- bringing the cloud experience below the cloud.

Now let’s look at what academia is discussing and thinking about.

Of the Turing Award winners of the past 50 years, five are giants of the database field.

The first is Bachman, the originator of the network (mesh) database.

The second is Codd, the founder of the relational database we are discussing today. A paper he published in 1970 gave birth to the entire magnificent landscape of relational databases, and he won the Turing Award in 1981.

The third is Gray, who researched transaction theory in depth. He was a guiding spirit behind the industrial implementation of database products, including Microsoft’s.

The fourth is Professor Michael Stonebraker, who won the Turing Award in 2014 and launched many database projects.

The fifth is Ullman, who was just awarded the Turing Award in 2021 and is best known for his teaching. He co-authored the famous Dragon Book and has been the first mentor of many data scientists.

Think for a moment about what has changed along this path. It begins with the earliest theoretical founders and moves to the explorers of transactions. Then comes Stonebraker’s push toward commercialization in industry, exploring many fields from relational row storage to column storage to big data. Today, Ullman is starting from academia and education to ask whether these database theories still leave room for exploration and innovation.

I was just talking with Mr. Li Haixiang, who said he has recently been thinking along the same lines, looking for new ideas and innovation in the transaction model. So, in my view, relational database theory has not reached a point beyond which it cannot develop. Rather, we are beginning to look back and ask whether these foundational theories still have a bright future. That is my point of view.

Professor Ullman put forward this view in a recent paper titled “The Battle for Data Science”. So I have now mentioned two battles. The first is the database as the ultimate battleground of the cloud, which everyone recognizes. The second is the battle for data science.

For a long time, a pessimistic voice in the database field has said that database management systems are becoming irrelevant. A line database people often repeat is “Have we missed another ship?”, and that ship is data science. Its vigorous growth and popularity have left database people feeling they missed the best cruise ship.

Professor Ullman, however, argues that databases, and the technology produced by database research, remain the essential core of data science. This has not changed. The core of a database system has always been processing as much data as possible, and we should study all of the data. I think he has reinterpreted the essence of the database.

What do we who study database and data technology want to do? Store all data, study all data, and ultimately generate insight from it. If artificial intelligence can play a truly innovative role in data science, the world can become a better place, but its foundation must be native data.

3、 Core evolution and innovation of Oracle Database

1. Macro evolution

Part of the academic community holds that storing and studying all data is still the essence of a data management system. I would therefore like to review how Oracle, still the king of today’s database field, has traveled this road.

Let me briefly list the key milestones from Oracle8 to today.

After Oracle8, Oracle launched 8i, its Internet edition. Was Oracle late to the Internet? No. Since 1998 it has labeled its products with an “i”, for “born for the Internet”. Think about what 1998 was: many people born in the 1990s were still in kindergarten, yet the database world was already talking about the Internet. Oracle was doing R&D on parallel processing and native XML support in the database. That was just when I started out, so the first thing I studied was how this technology was implemented inside the database.

What came with Oracle9i? Clustering, and the cluster truly matured: a distributed cluster built on shared storage. Oracle also developed its own Linux distribution and began working on database automation.

In 2004, Oracle10g introduced Automatic Storage Management (ASM), a very successful product. It directly drove some storage software companies out of the market, which gives you an idea of its influence. This was a very important technical event.

By 2008, in the Oracle 11g era, Oracle began building all-in-one products and working on technologies such as read/write separation on the database side.

But note one date I have highlighted here: in 2006, AWS launched S3. I think the one thing Oracle Database missed in its development is the cloud. After AWS launched S3 in 2006, Oracle’s founder responded in 2008 that the cloud was old wine in a new bottle. In his view it had no new technical innovation and was just a concept everyone was hyping.

Looking back, his judgment was wrong, and that misjudgment mattered: today’s leaders in the database field are the ones that succeeded in the cloud. Microsoft’s cloud succeeded, for example, so Microsoft became the number one database vendor and Oracle dropped to second. Insight into the future is very difficult but very important. It can be a matter of life and death, which is why we are here today to discuss the future of databases.

Let’s move quickly forward.
- 12c (2017): Oracle began building distributed capabilities. It is not true, as is commonly believed, that Oracle has no distributed capability.
- 18c (2018): extended sharding to its cluster architecture and added IoT support.
- 19c (2019): added automatic indexing, which I consider a real innovation that has actually been used in production.
- 20c: Oracle releases a version every year, named after the year. Because of the pandemic, 20c was never formally released and was merged into this year’s 21c, which applies persistent memory to the database.

What does this mean? It shows that Oracle Database is still an innovative product. It keeps bringing leading technologies into the database kernel to deliver new productivity. These include AI-driven indexing, which represents today’s cutting edge, as well as multi-model support and hardware/software integration. Most of the technical routes Oracle has taken still look right in today’s database field. The only thing it missed is the cloud.

2. Micro evolution

What I have just described are macro-level concepts. At the micro level, I have picked a few small points to analyze with you.

What matters most in the performance evolution of databases today? Breaking up what used to be serial points into parallel ones, which delivers huge performance gains. Whether in Oracle, TDSQL or other domestic databases, most of today’s innovation and performance improvement comes from exactly this. A few simple examples:
- Starting with Oracle9i, the shared pool was split into seven subpools.
- ASM distributes storage across disks in fragments.
- Processes were split into master and slave roles. For decades Oracle had only one log writer process, a single process writing all the logs. Starting with 12c, the log writer became a master with slaves. This was a difficult change. The performance bottleneck of most databases lies in synchronously flushing logs to disk, and everyone is optimizing it.
- 19c introduced real-time statistics and automatic indexing. The idea behind automatic indexing is fairly simple: it imitates how a human expert thinks. It creates an index and tests it, keeping it if performance improves and deleting it if performance degrades. But it is not easy to implement industrially.
- 20c proposed automatic parameter tuning and autonomous active/standby switchover, building these capabilities into the database.

This is the road of technological change Oracle has traveled for more than 40 years, and it is still running forward on it.
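The create, test, keep-or-drop loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Oracle’s implementation or API. `ToyDatabase`, `auto_index`, the index names and every cost figure are hypothetical. A real system would measure actual execution statistics rather than multiply made-up cost factors.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyDatabase:
    """Hypothetical stand-in for a database; all costs are made-up units."""

    base_costs: dict[str, float]                # cost of each query with no indexes
    index_effect: dict[tuple[str, str], float]  # (index, query) -> cost multiplier
    indexes: set[str] = field(default_factory=set)

    def query_cost(self, query: str) -> float:
        cost = self.base_costs[query]
        for idx in self.indexes:
            # < 1.0: the index speeds the query up; > 1.0: extra maintenance
            # work, e.g. an INSERT must also update the index.
            cost *= self.index_effect.get((idx, query), 1.0)
        return cost

    def workload_cost(self, workload: list[str]) -> float:
        return sum(self.query_cost(q) for q in workload)


def auto_index(db: ToyDatabase, candidates: list[str], workload: list[str],
               min_gain: float = 0.05) -> list[str]:
    """Try each candidate index: create it, measure, then keep or drop it."""
    kept = []
    for idx in candidates:
        before = db.workload_cost(workload)
        db.indexes.add(idx)                 # 1. create the candidate index
        after = db.workload_cost(workload)  # 2. test it against the workload
        if after <= before * (1 - min_gain):
            kept.append(idx)                # 3a. clear improvement: retain it
        else:
            db.indexes.discard(idx)         # 3b. no gain or a regression: drop it
    return kept


db = ToyDatabase(
    base_costs={"select_by_customer": 100.0, "insert_order": 10.0},
    index_effect={
        ("idx_customer", "select_by_customer"): 0.1,  # big read speedup
        ("idx_customer", "insert_order"): 1.2,        # small write penalty
        ("idx_note", "insert_order"): 1.3,            # write penalty, no read benefit
    },
)
workload = ["select_by_customer"] * 10 + ["insert_order"] * 5
print(auto_index(db, ["idx_customer", "idx_note"], workload))  # ['idx_customer']
```

The `min_gain` threshold (an arbitrary 5% here) stops the loop from retaining an index whose benefit is within noise. In these toy numbers, `idx_note` adds a write penalty without any read benefit, so the loop drops it.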

4、 The evolution of domestic databases: Tencent TDSQL as an example

Let’s look at the road of technological evolution that domestic databases are taking. As I mentioned earlier, relational database technology has come a long way in the 50 years since Dr. Codd’s 1970 paper. Database people often feel that the relational database is nearing its end. Where does its future lie? I think the future of the relational database is in China. Why China? Because China has the largest data infrastructure and the most centralized data application systems. That is my judgment.

Look at China: many business systems are centralized at least at the provincial level, and a large province may have more than 100 million people. Such centralized data infrastructure is unimaginable abroad.

How can relational database theory break through today? I think the breakthrough must come from applications, which is why I have been following the development of Tencent TDSQL. The first iteration was making TDSQL usable and then mature. The second iteration is taking it outside to the broadest possible range of users and scenarios.

The most noteworthy case is Weizhong Bank. It now handles peaks of 600 million transactions in a single day, with maximum throughput reaching 100,000 TPS. What does that mean? Imagine how large a user base it takes to generate 600 million transactions. Such frequent, high-volume transactions push the database to keep improving.

If you have used Oracle, you know the peak transaction rates and concurrency an Oracle database can sustain in production. An Oracle database running more than 10,000 transactions per second is already a great challenge. Today, however, the Internet model and distributed architecture can support massive applications with massive concurrency. This is the catalyst I think Chinese scenarios can provide, and it will push the database to find new room to grow.
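A quick back-of-the-envelope calculation, using only the figures quoted above, puts them in proportion. It assumes the 600 million transactions are spread evenly over 24 hours, which real traffic never is. The point is that the peak is far above the daily average.

```python
SECONDS_PER_DAY = 24 * 60 * 60        # 86,400

daily_transactions = 600_000_000       # single-day peak volume quoted for Weizhong Bank
peak_tps = 100_000                     # maximum TPS quoted in the talk
oracle_tps_benchmark = 10_000          # the "already a great challenge" Oracle level

# Assumes an even spread over the day, which real traffic never has.
average_tps = daily_transactions / SECONDS_PER_DAY
peak_to_average = peak_tps / average_tps

print(f"average TPS:       {average_tps:,.0f}")                      # 6,944
print(f"peak / average:    {peak_to_average:.1f}")                   # 14.4
print(f"peak / 10,000 TPS: {peak_tps / oracle_tps_benchmark:.0f}x")  # 10x
```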

When an open source database is extended to financial core systems, what core problems must be solved?

The first is data safety: how to make data consistency absolutely reliable. TDSQL has been reworked for strong data consistency. The primary and standby nodes are strongly synchronized by default, which is a technical innovation.

Second, a distributed architecture means the number of data nodes to manage grows rapidly. What used to be a single database may become hundreds of data nodes, and hundreds of nodes can no longer be maintained by hand. People simply cannot keep up; Weizhong Bank, for example, runs 10,000 service nodes. What can be done? We must rely on highly automated monitoring and fault handling, ideally with no human intervention at all. I think this is also an important direction for the future of data technology.

Building on these capabilities, TDSQL as a whole is constructed on a distributed architecture, with read/write separation as a basic feature. Supported by such scenarios, this system has gradually found its path forward.
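Asynchronous replication and the strong synchronization described above differ mainly in when the primary tells the client “committed”. The talk does not describe TDSQL’s actual replication protocol, so this is only a minimal in-memory sketch of the general rule. `Primary`, `Replica` and the `required_acks` knob are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    online: bool = True
    log: list[str] = field(default_factory=list)

    def append(self, record: str) -> bool:
        """Persist a log record and acknowledge it; an offline replica cannot."""
        if not self.online:
            return False
        self.log.append(record)
        return True


@dataclass
class Primary:
    replicas: list[Replica]
    required_acks: int = 1  # hypothetical knob: standby acks needed per commit
    log: list[str] = field(default_factory=list)

    def commit(self, record: str) -> bool:
        self.log.append(record)  # write the record locally first
        acks = 0
        for replica in self.replicas:
            if replica.append(record):
                acks += 1
        # Strong synchronization: report success only when enough standbys hold
        # the record, so losing the primary alone cannot lose a transaction the
        # client was told had committed. Asynchronous replication would instead
        # return True before any standby acknowledged.
        return acks >= self.required_acks


standbys = [Replica("standby-1"), Replica("standby-2")]
primary = Primary(standbys)
print(primary.commit("txn-1"))  # True: both standbys acknowledged
for r in standbys:
    r.online = False
print(primary.commit("txn-2"))  # False: no standby could acknowledge
```

The sketch leaves out what a real system does when too few standbys acknowledge, such as blocking, retrying or refusing. That choice is a significant design decision in its own right.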

Here is a case from two years ago: at Zhangjiagang Rural Commercial Bank, TDSQL completely replaced the previous core financial transaction system. Weizhong Bank is an Internet bank, and its inherent advantage is that it carries no legacy burden. A traditional bank’s data architecture and data applications are very complex. They cover not only transactions, deposits and withdrawals, and accounts, but also reconciliation. Implementing such complex business on a new database architecture is genuinely difficult.

In this case, Tencent and the end customer have now been in production for more than two years, and the results look very good. What I have been watching closely is the one-primary, three-standby deployment and second-level failover.
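Second-level failover in a one-primary, three-standby deployment depends on two automated decisions: detecting that the primary is gone, and choosing which standby to promote. This minimal sketch assumes heartbeat-timeout detection and highest-log-position election. The 3-second timeout, the `Node` fields and both functions are hypothetical. Production systems add safeguards this sketch omits, such as quorum agreement and fencing the old primary.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Node:
    name: str
    last_heartbeat: float  # seconds; time the last heartbeat was received
    applied_lsn: int       # log position up to which this node has applied changes


def primary_is_down(primary: Node, now: float, timeout: float = 3.0) -> bool:
    """Declare the primary failed after `timeout` seconds without a heartbeat."""
    return now - primary.last_heartbeat > timeout


def elect_new_primary(standbys: list[Node], now: float,
                      timeout: float = 3.0) -> Node | None:
    """Promote the reachable standby with the most recent log, if any."""
    healthy = [n for n in standbys if now - n.last_heartbeat <= timeout]
    if not healthy:
        return None  # nothing safe to promote: escalate to a human
    # Choosing the highest applied_lsn minimizes data loss among reachable standbys.
    return max(healthy, key=lambda n: n.applied_lsn)


now = 100.0
primary = Node("primary", last_heartbeat=95.0, applied_lsn=500)
standbys = [
    Node("standby-1", last_heartbeat=99.5, applied_lsn=498),
    Node("standby-2", last_heartbeat=99.8, applied_lsn=500),
    Node("standby-3", last_heartbeat=90.0, applied_lsn=500),  # also unreachable
]
if primary_is_down(primary, now):
    print(elect_new_primary(standbys, now).name)  # standby-2
```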

Another very important lesson from this case is that the Chitu and Bianque systems provided by TDSQL have become a foundation for automated operations and maintenance. Why do I think this matters so much? The database of the future will not be a standalone kernel but a database ecosystem. It will include automated O&M, autonomous active/standby switchover, autonomous high availability and other features. These are exactly the features Oracle is working on today, and I think they will be highly competitive core capabilities in the future.

5、 The future direction of database technology

To sum up, we have looked at several kinds of change:
- in industry applications;
- in database technology, from theory to practice;
- at Oracle;
- in the development of TDSQL.

What should the evolution of databases look like in the future? First, as I said, database replacement must not become another Long March. If a commercial database is replaced by a domestic one that is hard to use, users will not accept it. Replacement must therefore be an upgrade, not a degradation of functionality and experience. But how can it be an upgrade? I think we should consider these five directions:

The first is distribution. A distributed architecture can deliver both elastic scaling and self-healing from faults, and both matter. Above all, when a fault occurs, the system heals itself without emergency human intervention.

The second is intelligence: applying artificial intelligence to the database, for example to solve the core challenges DBAs have faced. Much of a DBA’s optimization work has been helping the database tune its indexes, so Oracle’s priority at this step was to automate indexing. Once routine work is automated, indexing is intelligent and execution plans are under control. At that point the database no longer needs much human intervention.

The third is multi-model, which is controversial. I mentioned earlier that Amazon replaced 7,500 Oracle databases with its own cloud databases. How many cloud databases do you think it took to replace those 7,500 Oracle databases? Probably 75,000, or even 750,000. Why? Because Oracle is a multi-model database that can store large objects, IoT data and other types of data. This hybrid storage may not be the best for each workload, but it gives users the simplest interface.

I think database development goes through cycles of splitting apart and coming back together, and it will eventually converge. A single interface is best for users. If the system is split across many databases, it must be highly automated.

The fourth is hardware/software integration. Database technology cannot develop in isolation; it must rely on progress in hardware to achieve major performance gains. Why are so many databases built as all-in-one appliances today? Because high-speed networking and fast NVMe storage raise overall performance, coordinated development of software and hardware is indispensable. In the future, core database technology should be optimized down to the CPU level. I think this can be done, and it will be done in China.

The fifth is cloud convergence. The cloud will become the only form in the future. Public and private cloud are still two worlds, but they will converge in both technology stack and user experience. Moreover, bringing the cloud experience below the cloud is the most important trend in the development of database technology.
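Elastic scaling, the first of these five directions, depends on how data is placed across nodes: adding a node should move as little data as possible. Consistent hashing is one well-known technique for this. The talk does not say which partitioning scheme TDSQL or any other product uses, so treat the following as an illustrative sketch. `HashRing`, the node names and the 100 virtual nodes per server are all hypothetical choices.

```python
from __future__ import annotations

import bisect
import hashlib


def _hash(key: str) -> int:
    """Stable 64-bit hash (Python's built-in hash() of str is randomized per process)."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring with virtual nodes, mapping keys to data nodes."""

    def __init__(self, nodes: list[str], vnodes: int = 100):
        self.vnodes = vnodes
        self.nodes = set(nodes)
        self._rebuild()

    def _rebuild(self) -> None:
        ring = sorted((_hash(f"{n}#{i}"), n)
                      for n in self.nodes for i in range(self.vnodes))
        self._points = [h for h, _ in ring]
        self._owners = [n for _, n in ring]

    def add_node(self, node: str) -> None:
        self.nodes.add(node)
        self._rebuild()

    def node_for(self, key: str) -> str:
        # First ring point clockwise from the key's hash, wrapping at the end.
        i = bisect.bisect(self._points, _hash(key)) % len(self._points)
        return self._owners[i]


keys = [f"account:{i}" for i in range(10_000)]
ring = HashRing(["node-1", "node-2", "node-3", "node-4"])
before = {k: ring.node_for(k) for k in keys}

ring.add_node("node-5")  # scale out from 4 to 5 data nodes
moved = [k for k in keys if ring.node_for(k) != before[k]]
print(f"consistent hashing moved {len(moved) / len(keys):.0%} of keys")

# Naive modulo placement for comparison: most keys change node.
naive_moved = sum(1 for k in keys if _hash(k) % 4 != _hash(k) % 5)
print(f"hash % N placement moved {naive_moved / len(keys):.0%} of keys")
```

Plain `hash % N` placement reassigns about 80% of keys when going from 4 to 5 nodes. On the ring, only keys that now fall on the new node’s points move, roughly one fifth, and every one of them moves to `node-5`.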

Taken together, these five views represent my judgment on the future development of databases. To repeat: domestic databases must give users a better experience rather than a degraded one. There is only one way to make up the gap, and that is automation. The kernel may not yet be world-class, but an automated ecosystem and tooling can spare users from facing the complex underlying infrastructure.

Finally, let me come back to the title, the DBA’s song of ice and fire. Many DBA friends have felt lost, especially those in the Oracle camp, and ask whether Oracle DBAs can survive the wave of domestic databases. Many people ask me whether they will be eliminated by history. My answer is no.

First, we believe data will become the core asset of digital enterprises and will grow ever more important. This is a widely accepted premise.

Second, running a database is fairly complex. It involves hosts, networks and storage, demands full-stack skills and carries real technical barriers, so DBAs need not fear unemployment.

Third, all the experience you have accumulated in database administration can open up broader career opportunities today. If you thoroughly understand Oracle or MySQL, you can even become a product designer, product manager or kernel developer for a domestic database. If you can turn the advanced technology of foreign databases into product design and implementation, you become a core driving force of our domestic databases.

So in the domestic-database era our road is not narrower but wider, and our career path is broader and brighter. DBAs are people who love to explore, share and summarize. With that ability, transferring what we have learned to a new technical route is not hard.

Finally, one more thought. In the new era, we database people should remember two phrases: “one primary and one backup” and “commercial and open source together”. Learning only one database is definitely not enough. Study both commercial and open source databases so that each complements and corrects the other. Have one major and one minor, choosing two subjects.

In this domestic-database era, we tend to overestimate the difficulty in the short term and always underestimate the opportunity in the long term. If you recognize the change in this industry, transform early and join the wave of domestic databases early, you will be among the first to seize the opportunity. Only by getting into the game early can we stay one step ahead.

Speaker Introduction

Gai Guoqiang

CEO of Yunhe Enmo, Tencent Cloud TVP

Founder of Yunhe Enmo and chairman of ACDU, and one of the best-known Oracle technology evangelists in China. His books, including In-Depth Analysis of Oracle and Step-by-Step Oracle, have been widely praised by Oracle enthusiasts.

In 2009, Mr. Gai Guoqiang founded Yunhe Enmo, which is dedicated to providing professional data services, products and solutions to Chinese users. Yunhe Enmo’s dbPaaS products and expert services have served more than 500 enterprise customers in China and abroad. In 2019, he initiated the inkjet technology community and ACDU (All China DBA Union), which are dedicated to spreading and promoting data knowledge and applications.