A Conversation with Jushan's Core R&D Team: The Road to an In-House Distributed Database

Time: 2020-1-17

Core database R&D teams have long been something of a mystery. As the recluses behind the scenes, how do they see the development of databases and of database R&D teams? In this article, a veteran (an “old driver”) of the Jushan database core R&D team shares the road to building a distributed database in-house.

Q: As a veteran of the database industry, could you introduce yourself first?
A: My name is Danny. I am a member of the core R&D team of Jushan database, and a senior database engineer and architect. I have more than 20 years of experience in database kernel R&D. As a member of the DB2 core R&D team, I took part in the architecture design and development of DB2, DPF and other products.
At present, our North American R&D lab has many veteran database experts, all from the core technology team of DB2.
Many of our team members are “traditional enterprise IT people” from IBM and Huawei who do not like to appear in public. But the technology world has entered a new era of change. Our products are now open source, so we will have our team's technical experts take part in community activities, share our experience in core database R&D, and make progress together with everyone.

Q: As an “old IBMer”, what do you think of the core R&D teams at long-established IT companies like IBM? What impressed you most?
A: IBM was the first company to put forward the concept and theoretical system of the “relational database”. Technically, each of the three traditional relational databases built up deep technical reserves over the course of its development. DB2 is the only distributed product among the three, so our team's experience in distributed technology runs just as deep.
In my more than ten years at DB2, what impressed me most was the depth of technical background and accumulated experience.
For example, before UNIX properly supported threads, the team used assembly language to switch and schedule logical threads in order to implement a multithreading model, and did so for different hardware platforms. These mechanisms were quite advanced at the time.
As for R&D teams, IBM's labs are full of hidden talent. Ever since the days of assembly language, their technical experts have worked on the low-level internals of databases, operating systems and compilers. It is fair to say that they created the earliest concept of the relational database and truly built the database into a general-purpose software platform.

Q: What makes basic software such as a database technically difficult?
A: Building database software, especially a truly enterprise-ready product, is not as simple as just developing a piece of software.
Technically, a database needs both inherited technical genes and innovation.
Database technology has been developing for more than 40 years. Over that time, the database has grown into a huge software product system with complex functions, a large architecture and high security requirements. It therefore demands both accumulated technical experience and new innovation.
At the same time, on the application side, many users are long-standing customers such as banks and governments that started using databases 30 years ago, and they usually cannot bear the risk of a wholesale migration. As a result, their business technology architecture inevitably retains the legacy of each era. For example, the core IT systems of some banks in North America are still running on technology platforms from 40 years ago. This requires enterprise-ready database software to offer strong compatibility, so that it can keep the old business running while continuing to introduce new capabilities.
This kind of innovation is necessary, but technically it is also the hardest part.

Q: With your nearly 20 years of experience in the database industry, what is your view of core database teams?
A: I think the genes of a core database R&D team are very important. For example, IBM's DB2 team is built around many veterans of the database field, senior engineers with real technical strength. This is unlike many new open-source products, which are developed mainly by young, innovation-driven teams.
As I mentioned above regarding technical complexity and the long history of these products, if a database product is to be used in large enterprises, the technical team must have experience developing traditional databases. That is the role of the technology veteran.
In short, basic database software combines innovative technology with accumulated technical experience.

Q: What are the differences in basic software development at home and abroad?
A: Relatively speaking, overseas there is an established base of technical talent, and companies like IBM and Oracle have cultivated a generation of technical experts and teams. So many of the new-generation basic software teams in North America are built around the older generation of veterans.
In China, the accumulation of talent in basic software is not yet sufficient, so distinct “schools” (in the martial-arts sense) have not fully formed in this field. This is also why domestic companies in basic software and AI have been recruiting so aggressively overseas in recent years. For historical reasons, it will take time for both Internet companies and research teams in China to form schools of their own.
Our team at Jushan includes many core technical experts from the DB2 team, represented by Wang Tao, as well as core technical team members from Huawei. It is a good combination of technical genes and technical innovation.

Q: What is the difference between database development and other software?
A: Because of the characteristics I just mentioned, developing basic software, and databases in particular, is quite different from developing application software. One of the biggest differences lies in the development language and development model.
From the perspective of how computers work, C is the language closest to machine language (assembly code). In principle, each line of C code can be mapped precisely to a few assembly instructions, so it offers the most precise control over low-level operating system operations.
C++ is an object-oriented language built on top of C. In low-level programming, the advanced features of C++ are rarely used, but its design patterns are very helpful for modular development. Using C++ therefore allows the most precise control of the underlying operating system while bringing object-oriented concepts into the code, which plays an important role in building complex systems.
Some newer development languages are not object-oriented, so they are not well suited to developing large-scale complex systems. These languages also simplify away many of the most important pointer concepts of C/C++, which makes precise memory manipulation impossible. Pointers are a powerful tool when used well and a liability when used badly. Less experienced programmers, or projects without a thorough test framework, struggle to handle advanced features such as pointers correctly, which leads to memory leaks and crashes throughout large projects.
At Jushan, however, we have experience developing the DB2 database kernel. From the ability of our people and our code quality management to the completeness of our test framework, we are able to keep such advanced features under control and get the most performance and processing power out of the operating system and the underlying database.

Q: What is the direction of distributed databases?
A: In the shared view of Gartner and our CTO Wang Tao, the number of tables that are truly too large for a traditional relational database to store is relatively manageable. There are many workarounds for this problem, which is why, although splitting databases and tables is troublesome, it does make the application problem solvable.
In fact, the real pain point for databases is the pooling of data service resources under microservices.
As applications are transformed from traditional siloed (“chimney-style”) construction to microservices, it is impossible to give every microservice its own independent database. The data service resource pool therefore has to serve hundreds of upper-level requirements directly, coming from different developers and teams with different development capabilities, different application types, different SLA and security levels, and so on.
The resource pool must therefore offer a series of capabilities, such as elastic scaling, resource isolation, multi-tenancy, configurable consistency, multi-model support (supporting various SQL protocols), and configurable disaster-recovery strategies within the cluster. At the same time, the computing and storage capacity of each database instance must be able to scale without limit. After all, some microservices may involve a large volume of flow data, so the resources used by a database instance cannot be limited to a single physical device.
So pursuing distributed OLTP for its own sake only solves a problem that is not a rigid need (it has long been solvable by splitting databases and tables). In a microservice development environment, however, the database should serve the upper layer as a pooled resource. At the same time, each database instance in the resource pool should also support features such as distributed transactions, so that it is fully compatible with traditional databases.
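
To make the “splitting databases and tables” workaround mentioned above more concrete, here is a minimal Python sketch of hash-based routing. It is only an illustration under stated assumptions: the function names, the `orders` table and the layout of 4 databases with 8 tables each are hypothetical, and none of this is taken from Jushan's or any other product.

```python
import hashlib
from typing import List, Tuple


def shard_for(key: str, num_databases: int, tables_per_database: int) -> Tuple[int, int]:
    """Map a sharding key to a (database index, table index) pair."""
    # Use a stable hash; Python's built-in hash() is randomized per process for strings.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    slot = int.from_bytes(digest[:8], "big") % (num_databases * tables_per_database)
    return slot // tables_per_database, slot % tables_per_database


def table_name(base: str, key: str, num_databases: int = 4,
               tables_per_database: int = 8) -> str:
    """Return the physical table holding rows for this key (hypothetical naming scheme)."""
    db, tbl = shard_for(key, num_databases, tables_per_database)
    return f"{base}_db{db}.{base}_{tbl}"


def all_tables(base: str, num_databases: int = 4,
               tables_per_database: int = 8) -> List[str]:
    """List every physical table; a query without the sharding key must visit all of them."""
    return [
        f"{base}_db{db}.{base}_{tbl}"
        for db in range(num_databases)
        for tbl in range(tables_per_database)
    ]
```

In this sketch, a lookup by key touches exactly one physical table, while a query that lacks the key has to fan out over all 32 tables and merge the results in the application. That extra application-side work is the kind of trouble the answer above refers to.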

Q: Since the release of 3.0, SequoiaDB has received good feedback from the community and the market. Can you tell us what is new in the product?
A: In the near future we will release a new version in which performance in OLTP scenarios will be greatly improved, along with SQL processing capability. For distributed transactional workloads, overall performance will be 2-3 times that of the current version, and 5-6 times that of similar products.
We will also share and introduce these in our activities this week.
On March 30, the second session of our Jushan technology day will be held in Beijing this weekend, where we will bring some in-depth technical talks. There will also be a live video stream. I hope you will follow along and take part! In the future, more “mysterious” database veterans will share technology, trends and news with you.