Open source databases are so popular, so why do we still work hard on self-developed ones?


Summary: From open source to self-development, the ins and outs of the database field.

When it comes to databases, we must talk about open source.

For a long time, however, databases (relational databases in particular) were the exclusive preserve of commercial companies, and a handful of large vendors monopolized and divided up the whole market. Only after the first open source version of MySQL appeared in the 1990s did today's open source database market begin to take shape.

According to the latest database popularity ranking released by DB-Engines, open source databases occupy 7 of the top 10 places, including the relational databases MySQL and PostgreSQL and the non-relational databases MongoDB, Redis, Elasticsearch and Cassandra.

Because open source databases are so popular, more and more commercial companies are willing to build deeper optimizations on top of them.

Why self-develop on top of open source databases?

Although open source databases carry no expensive commercial license fees, using them raises many problems. In the data-driven Internet era especially, an open source database on its own cannot be expected to cope with every kind of incident.

Many open source databases are hard to use, come with weak supporting tools, and need continuous maintenance. Once data is lost, it is difficult to recover quickly, and the resulting damage can be immeasurable. Users of open source databases also face a range of large and small costs, such as servers, database maintenance and upgrades, and operations staff, which makes it hard to keep up with rapid business expansion and sustainable growth.

This is where many cloud vendors step in to take this work off the hands of DBAs and operations staff once and for all: they move the open source database to the cloud and take care of the “trivial” low-level operation and maintenance work.

Take Huawei Cloud's RDS product series as an example. RDS for MySQL, RDS for PostgreSQL and DDS, the document database service (document type, Mongo), are database services built on open source that focus on the most basic requirements cloud native development places on a cloud database. They mainly target business scenarios with small data volumes and ordinary performance requirements, and provide highly cost-effective solutions.

However, new problems follow. Moving open source databases to the cloud only addresses the needs of small and medium-sized enterprises, such as simplified deployment, operation and maintenance, tuning, and high cost-effectiveness. It cannot meet the stringent requirements of finance, government and large enterprises for data security, response speed, reliability and availability.

Weighing the pros and cons, many enterprises choose to combine open source databases with commercial databases to ensure the availability and reliability of their data.

The GaussDB series is a new generation of distributed database products that draws on the database R&D experience Huawei has accumulated over many years. Self-developed and built on a unified architecture, the GaussDB series on the one hand embraces and is compatible with the MySQL, Mongo and other ecosystems, and on the other hand builds the openGauss ecosystem. It mainly targets government and enterprise customers and emphasizes high performance, high reliability and high security.

For relational databases, Huawei Cloud officially launched the cloud native GaussDB (for MySQL) database in July this year. The distributed database GaussDB (openGauss), built on the kernel of Huawei's open ecosystem openGauss, will also be released commercially within the year.

For non-relational databases, Huawei Cloud is focusing on building the cloud native GaussDB NoSQL multi-model database series, which supports multiple protocol interfaces such as document (Mongo), wide-column (Cassandra), time series (Influx) and key-value (Redis). At present, GaussDB (for Mongo), GaussDB (for Cassandra) and GaussDB (for Redis) have been launched.

Compared with open source databases, the GaussDB series supports NDP (near data processing) technology, which brings computation to where the data is stored, accelerates data processing, and greatly improves overall performance.
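As a rough illustration of the near data processing idea (not GaussDB's actual implementation), the Python sketch below compares shipping every row to the compute layer with evaluating the filter on the storage side. The StorageNode class, the function names and the sample data are all hypothetical and exist only for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Row = Dict[str, int]


@dataclass
class StorageNode:
    """Hypothetical storage node holding rows (an assumption for this sketch)."""
    rows: List[Row]

    def scan(self) -> List[Row]:
        # No pushdown: every row is shipped to the compute layer.
        return list(self.rows)

    def scan_filtered(self, predicate: Callable[[Row], bool]) -> List[Row]:
        # Pushdown: the predicate is evaluated next to the data,
        # so only matching rows leave the storage node.
        return [r for r in self.rows if predicate(r)]


def query_without_pushdown(node: StorageNode,
                           predicate: Callable[[Row], bool]) -> Tuple[List[Row], int]:
    shipped = node.scan()
    result = [r for r in shipped if predicate(r)]
    return result, len(shipped)


def query_with_pushdown(node: StorageNode,
                        predicate: Callable[[Row], bool]) -> Tuple[List[Row], int]:
    shipped = node.scan_filtered(predicate)
    return shipped, len(shipped)


def is_large(row: Row) -> bool:
    return row["amount"] >= 9900


node = StorageNode(rows=[{"id": i, "amount": i * 10} for i in range(1000)])
plain_result, plain_shipped = query_without_pushdown(node, is_large)
ndp_result, ndp_shipped = query_with_pushdown(node, is_large)

print(plain_shipped, ndp_shipped)
```

Both queries return the same rows. The difference is how many rows have to cross the boundary between storage and compute, which is the cost that pushing work toward the data is meant to reduce.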

Take GaussDB (for MySQL) as an example. It is based on Huawei's latest generation of DFV distributed storage and adopts an architecture that separates compute from storage. It supports one write node and rapid scale-out to as many as 15 read-only nodes, supports up to 128 TB of storage, and can achieve a throughput of over one million QPS. The performance of a single node is 7 times that of native MySQL.
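To make the "one write node, up to 15 read-only nodes" topology concrete, here is a minimal sketch of how a client-side router might split traffic. It is an assumption-laden illustration, not how GaussDB (for MySQL) or its proxy actually routes statements: the Router class, the node names, and the rule "statements starting with SELECT are reads" are all hypothetical simplifications.

```python
import itertools
from typing import List

# From the article: one write node and up to 15 read-only nodes.
MAX_READ_REPLICAS = 15


class Router:
    """Hypothetical read/write splitter (illustration only)."""

    def __init__(self, primary: str, replicas: List[str]):
        if len(replicas) > MAX_READ_REPLICAS:
            raise ValueError("at most 15 read-only nodes are supported")
        self.primary = primary
        self.replicas = list(replicas)
        # Round-robin over the read-only nodes, if there are any.
        self._cycle = itertools.cycle(self.replicas) if self.replicas else None

    def route(self, sql: str) -> str:
        # Simplification: anything starting with SELECT counts as a read.
        # A real router would also have to handle transactions,
        # SELECT ... FOR UPDATE, replication lag and so on.
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and self._cycle is not None:
            return next(self._cycle)
        return self.primary


router = Router("primary", ["ro-1", "ro-2"])
targets = [
    router.route("SELECT * FROM orders"),
    router.route("select 1"),
    router.route("INSERT INTO orders VALUES (1)"),
    router.route("SELECT 2"),
]
print(targets)
```

Because compute is separated from storage, adding a read-only node in this model only means adding another entry to the replica list. The data itself stays in the shared storage layer.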

GaussDB NoSQL has strong multi-model data management capabilities. Compared with pure open source software, it has made a qualitative leap in concurrent read/write capability, scaling out and in, fault rebuild time, and backup and recovery efficiency.

Most importantly, Huawei's GaussDB databases fully support diverse computing power, including Kunpeng and x86, and Huawei has end-to-end R&D capability from chip to server, storage, operating system and database. This gives it a unique advantage in tuning database performance across software and hardware. For example, GaussDB pushes operators down to the storage layer, which improves performance by 30% compared with competing databases.

openGauss: creating a new open source database ecosystem

While actively embracing the existing open source database ecosystems, Huawei Cloud is also building an openGauss ecosystem.

openGauss is an open source relational database management system distributed under the Mulan Permissive Software License v2 (Mulan PSL v2). Its kernel is derived from PostgreSQL, and it focuses on building competitive features in architecture, transactions, the storage engine and the optimizer. It is deeply optimized for ARM-architecture chips and is also compatible with the x86 architecture. Its technical features are as follows:

Concurrency control designed for multi-core architectures, a NUMA-aware storage engine and SQL bypass intelligent routing unlock the multi-core scalability of the processor and achieve 1.5 million tpmC on a two-socket, 128-core Kunpeng server;

Fast failover with RTO < 10 s and full-link data protection meet security and reliability requirements;

Intelligent parameter tuning, slow SQL diagnosis, multi-dimensional performance self-monitoring, online SQL execution time prediction and other capabilities turn operation and maintenance from complex to simple.

Huawei open-sourced the openGauss community edition in June 2020, encouraging capable partners to launch databases based on openGauss and jointly grow the database industry ecosystem.

At present, Huawei Cloud has launched GaussDB (openGauss), a commercial edition built on the openGauss kernel with enhanced distributed capabilities, and more business partners will join in the future.

It should be emphasized that openGauss is an open ecosystem: open architecture, open code, open technology and open community. Although it is mainly promoted by Huawei, it will not lead the database ecosystem from the closed Oracle to another closed “new Oracle”. Being open, openGauss lets more like-minded developers fix defects and understand the architecture, which makes it easier to maintain.

For enterprises, only an open ecosystem can give their business better continuity. Moving from one closed ecosystem to another does not fundamentally solve the problem of business continuity.

After all, an ecosystem that is not open lacks vitality, and this is especially true of database software.
