We graduated! Hacking Camp 2021 is complete, and six major ecosystem projects have entered a new stage

Time: 2021-12-1

On November 7, the ecosystem-themed Hacking Camp 2021, co-hosted by the TiDB Community and Jingwei China and sponsored by Chuxin Capital, Mingshi Capital, Jiyuan Capital and JuiceFS, held its defense meeting, where the teams presented the results achieved so far and their plans for future work.

Some Hacking Camp projects are star projects from TiDB Hackathon, and some are new ideas from ecosystem partners. This Hacking Camp takes the ecosystem as its theme and helps partners incubate their projects. The six participating projects have largely met their objectives. After graduation, they will continue to improve their features and release more stable versions. During this period, the mentors will continue to guide the projects and help polish them.

The projects that took part in the Hacking Camp defense are:
JuiceFS, a distributed POSIX file system with TiKV as the metadata engine

ServerlessDB for HTAP, which provides a serverless database service based on TiDB

TiDB for PostgreSQL, which improves PostgreSQL compatibility on TiDB

TiBigData, a one-stop solution for TiDB in the big data field

HugeGraph with TiKV as back-end storage

Doris Connector, which uses TiDB as the upstream data source for Doris

The judges evaluated project completion, application value, contribution to the TiDB ecosystem, and the quality of the defense. In the end, ServerlessDB for HTAP received unanimously high scores from the jury and won two awards: “Excellent Graduate” and “Best Application”.

Special thanks to the following reviewers:
Xu Zhihao, executive manager of Mingshi Capital; Liu Yang, Flomesh CTO & co-founder; Wang Cong, TiDB team tech leader; Zhang Jian, R&D director of PingCAP; and Li Jianjun, TiKV maintainer

Let’s take a look at the graduation results of each project.

JuiceFS:

JuiceFS is a cloud-native distributed POSIX file system. With TiKV as the metadata engine, JuiceFS can support a scale of 10 billion files and EB-level data storage capacity while maintaining stable latency at large scale. In metadata operation performance tests, the average latency of the TiKV engine is about 2 to 4 times that of Redis, and slightly better than that of a local MySQL.

At present, the main features have been developed and released in v0.16, and have passed the pjdfstest suite. Users are already running it in testing and production environments. Going forward, JuiceFS will treat TiKV as the first-choice metadata engine for large-scale production environments, and will actively adopt new TiKV features while ensuring compatibility.

ServerlessDB for HTAP

The ultimate goal of the project is to turn the cloud database service into a black box, so that application developers only need to focus on how to express the business in SQL, and users no longer have to worry about matters unrelated to the business, such as data volume, business load, or whether a SQL statement is AP or TP.

Development content

Business load module:

The business load module evaluates whether the resources currently serving requests match the current business load, and builds a business load model to support scale-out and scale-in decisions.

Serverless module:

The serverless module checks the CPU utilization of all compute nodes and the underlying storage capacity in real time, and triggers scale-out or scale-in of compute / storage resources.
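As a rough illustration of this kind of check, here is a minimal Python sketch that turns per-node CPU utilization into a scaling decision. The function name, the 0.8 / 0.3 thresholds, and the returned labels are hypothetical and are not taken from the ServerlessDB code.

```python
from statistics import mean


def decide_scaling(cpu_utilizations, high=0.8, low=0.3, min_nodes=1):
    """Return 'scale_out', 'scale_in' or 'hold' for per-node CPU utilizations (0.0 to 1.0).

    The thresholds are illustrative assumptions, not values used by the project.
    """
    if not cpu_utilizations:
        raise ValueError("at least one compute node is required")
    average = mean(cpu_utilizations)
    if average > high:
        return "scale_out"
    # Never shrink below the minimum number of compute nodes.
    if average < low and len(cpu_utilizations) > min_nodes:
        return "scale_in"
    return "hold"
```

A real implementation would presumably also smooth the signal over time to avoid flapping; this sketch only looks at a single sample.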

Database middleware:

The middleware decouples user connections from the back-end database service nodes, so that even if users use a connection pool, the middleware can still balance traffic across all nodes, including new ones, after scale-out.
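To see why this decoupling helps, consider the following Python sketch. It uses a least-in-flight-requests policy, which is an assumption chosen for illustration; the text does not say which algorithm the middleware uses, and the class and node names are hypothetical.

```python
class ConnectionBalancer:
    """Routes each request to the back-end node with the fewest in-flight requests."""

    def __init__(self, nodes):
        self.in_flight = {node: 0 for node in nodes}

    def add_node(self, node):
        # A node added after scale-out starts with zero load,
        # so it immediately attracts new requests.
        self.in_flight.setdefault(node, 0)

    def acquire(self):
        node = min(self.in_flight, key=self.in_flight.get)
        self.in_flight[node] += 1
        return node

    def release(self, node):
        self.in_flight[node] -= 1
```

Because the client connection ends at the middleware, requests arriving over a long-lived pooled connection can still be routed to a node that was added later.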

Rule system:

Through the rule system, the resource allocation within a specific time range can be fixed. By setting rules, resources can be allocated in advance, before traffic grows.
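A time-window rule of this kind could look like the Python sketch below. The rule format (start, end, node count) and the function name are hypothetical, and the sketch only handles windows that do not cross midnight.

```python
from datetime import time


def nodes_for(now, rules, default_nodes):
    """Return the node count fixed by the first rule whose window contains `now`.

    `rules` is a list of (start, end, nodes) tuples; windows are half-open
    [start, end) and must not cross midnight in this simplified sketch.
    """
    for start, end, nodes in rules:
        if start <= now < end:
            return nodes
    return default_nodes
```

For example, a rule such as `(time(19, 30), time(22, 0), 8)` would pin the cluster at 8 nodes from 19:30, ahead of an evening peak expected around 20:00.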

Serverless service orchestration module:

The service orchestration module creates, releases and dynamically adjusts TiDB clusters. It also implements k8s local disk management, which solves the problem that cloud disks are not available in private deployments.

An admission webhook was developed so that when TiDB components are scaled in, their records in the middleware registry are deleted in advance, making the scale-in imperceptible to users.

Follow-up R&D plan:

Hint and rule modules are planned to distinguish TP / AP more accurately; this can also reduce the CPU utilization of the middleware by more than half.

Provide richer load balancing algorithms, such as one based on SQL runtime cost.

Add business flow control to the middleware. If the business load grows too fast and exceeds the growth rate that the serverless layer can handle, the back-end service becomes unstable. Flow control allows it to handle surges in business traffic gracefully.
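One common way to implement the flow control mentioned in the last item is a token bucket. The Python sketch below is only an assumption about how such a limiter might look; the plan above does not name a specific algorithm, and the class and parameter names are hypothetical.

```python
import time


class TokenBucket:
    """Admits at most `rate` requests per second on average, with bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock  # injectable so the limiter can be tested deterministically
        self.tokens = capacity
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Refill tokens for the elapsed time, capped at the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Requests for which `allow()` returns False would be queued or rejected by the middleware, which gives the serverless layer time to finish scaling out.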

The project also won the Hacking Camp Excellent Graduate and Best Application awards. It seems the judges were impressed by the project’s vision and development strength. You are welcome to follow it and try it out.

Project address:https://github.com/tidb-incub…

TiDB for PostgreSQL

Initiated by Digital China, the project aims to make TiDB compatible with PostgreSQL while retaining TiDB’s high availability, elasticity and scalability. It allows users to connect existing PostgreSQL clients to TiDB and use PostgreSQL-specific syntax.

Currently completed development:

Adapted the DELETE syntax

Added the PostgreSQL-specific RETURNING keyword

Completed sysbench_tpcc testing over the PgSQL protocol and compared it with native TiDB at the same version

Completed BenchmarkSQL testing over the PgSQL protocol and compared it with native TiDB at the same version

Comparison of benchmark test results:

In the future, the project plans to support system database and table structures, graphical clients, and an abstract protocol layer that allows switching between protocols at any time. You are welcome to try it out.

Project address:https://github.com/DigitalChi…

TiBigData

TiBigData provides TiDB connectors for various OLAP computing engines, including Flink, Presto and MapReduce. During Hacking Camp, we mainly worked on Flink-related features.

First, we implemented a snapshot source and a TiCDC streaming source in Flink. By combining these two sources, we achieved unified stream and batch processing on TiDB.

Second, data interworking. Using TiKV’s cross-data-center deployment and the follower read feature of the Flink connector, we achieved true interworking with offline data.

Finally, computation pushdown. All of the connectors are compatible with TiKV’s pushdown operators, which can greatly improve data scanning and computation efficiency.
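To make the first point more concrete, the idea of combining a snapshot source with a streaming source can be sketched in a few lines of Python. This is a conceptual illustration only, not the Flink connector's API: the function name, the event layout (`commit_ts`, `row`) and the timestamp filter are all assumptions.

```python
def unified_read(snapshot_rows, snapshot_ts, change_events):
    """Yield the batch snapshot first, then only the changes committed after it."""
    for row in snapshot_rows:
        yield ("snapshot", row)
    for event in change_events:
        # Changes at or before the snapshot timestamp are already
        # reflected in the snapshot, so they are skipped.
        if event["commit_ts"] > snapshot_ts:
            yield ("change", event["row"])
```

Filtering by commit timestamp is what lets the batch phase hand over to the streaming phase without applying the same change twice.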

TiBigData core enhancements:

The general-purpose capability of the TiDB Java client is enhanced. We implemented a TiDB encoder whose code is decoupled from TiSpark, so it can be adapted to other OLAP engines and reused by other community partners as a general-purpose tool.

Data type conversion tools are implemented to convert between Flink / Presto data types and TiDB data types.

A distributed TiKV client is implemented, whose API is better suited to distributed computing frameworks.

In the future, we will continue to develop change log write, TiDB x Presto / Trino, Flink state backend in TiKV, and more. Anyone interested is welcome to join the community.

Project address:https://github.com/tidb-incub…

HugeGraph on TiKV

HugeGraph on TiKV is suitable for scenarios that require a large-scale graph database, particularly those that need high read-write performance and already have a team operating and maintaining TiKV storage.

Implemented features:

Support for a single graph instance

Support for creating, deleting, updating and querying the schema

Support for importing data with the loader, and for creating, deleting, updating and querying vertices and edges

Support for K-out, K-neighbor and other traversal algorithms, Gremlin queries and index queries (incomplete)
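For readers unfamiliar with the traversal algorithms in the last item, the following Python sketch shows the idea on an in-memory adjacency map. It is not HugeGraph code. It assumes that K-out returns the vertices whose shortest distance from the source is exactly `depth`, and that K-neighbor returns every vertex within `depth` hops, including the source.

```python
from collections import deque


def hop_distances(adjacency, source, max_depth):
    """Breadth-first search: shortest hop count to every vertex within max_depth."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        vertex = queue.popleft()
        if distances[vertex] == max_depth:
            continue  # do not expand beyond the requested depth
        for neighbor in adjacency.get(vertex, ()):
            if neighbor not in distances:
                distances[neighbor] = distances[vertex] + 1
                queue.append(neighbor)
    return distances


def k_out(adjacency, source, depth):
    """Vertices exactly `depth` hops from the source."""
    found = hop_distances(adjacency, source, depth)
    return {v for v, d in found.items() if d == depth}


def k_neighbor(adjacency, source, depth):
    """All vertices within `depth` hops of the source, including the source."""
    return set(hop_distances(adjacency, source, depth))
```

In a real deployment each neighbor lookup becomes a read against the storage back end, which is why random read performance matters for traversals.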

Demo:

After importing the [Xinyu novel coronavirus pneumonia dataset], the resulting graph can be viewed through the HugeGraph-Hubble interface:
Performance test results:

Import speed (write)
Query by ID (random read)
Follow-up plan:

Feature completion

Support multiple graph instances, truncate / clear graph data, monitoring interface metrics, TTL and other advanced features

Performance optimization

Write performance optimization: commit mode, batch sizing, etc.

Query performance optimization: data encoding optimization, sorting optimization, etc.

Project address:https://github.com/tidb-incub…

Doris Connector:

The project uses TiDB as the data source and provides Doris with a native connector, opening up the data flow for TP-to-AP scenarios. It supports DML / DDL synchronization and filtering data by specified conditions. The project is currently about 70% complete.

Design ideas

Stream load: a standalone service periodically reads and parses TiDB binlog files, assembles the data rows into CSV files, and imports them into Doris through Stream Load.

Routine load: synchronize the binlog to Kafka with the help of TiDB’s Drainer. Doris implements data synchronization by adding support for the TiDB binlog data format.

TiDB native protocol synchronization: implement the TiDB replica synchronization protocol in Doris, so that Doris acts as a node of the TiDB cluster.
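The row-to-CSV step of the stream load approach can be sketched as follows. This minimal Python example only covers assembling already-parsed rows into CSV text. Parsing the binlog and uploading to Doris through Stream Load are omitted, and the function name and row layout (a dict per row) are hypothetical.

```python
import csv
import io


def rows_to_csv(rows, columns):
    """Serialize parsed rows (dicts) into CSV text with a fixed column order."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in rows:
        # Missing columns are written as empty fields in this sketch.
        writer.writerow([row.get(column, "") for column in columns])
    return buffer.getvalue()
```

Using the `csv` module rather than joining strings by hand ensures that values containing commas or quotes are escaped correctly.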

Follow-up plan:

The project will continue to iterate, starting from users’ real-world scenarios to make the data processing pipeline smoother. It will later be merged into the Doris trunk.

Project address:https://github.com/apache/inc…

This round of Hacking Camp ended with the defense of six excellent projects, but maintaining the ecosystem is a long-term effort. We will continue to provide follow-up support for these ecosystem projects to keep them thriving. Anyone interested in the projects should also watch for follow-up posts, in which the founding teams will explain, from the application level, the value of each project to the whole TiDB ecosystem. Please look forward to the special meetup being planned!

From Planet Ti to the universe, we use hacking to connect a wider ecosystem. TiDB Hackathon 2021 is also about to begin. Come and explore the mysteries of database technology with us!