On November 7, the 2021 Ecology Hacking Camp, co-hosted by the TiDB community and Jingwei China and supported by Chuxin Capital, Mingshi Capital, Jiyuan Capital, and JuiceFS, held its final defense meeting, where the teams presented the phased achievements of their projects and outlined plans for future work.
Some Hacking Camp projects are star projects from past TiDB Hackathons, while others are new ideas from ecosystem partners. This Hacking Camp took the ecosystem as its theme and helped partners incubate their projects. The six participating projects have largely met their stated goals. After graduation, the teams will continue to improve functionality and iterate toward more stable releases, with mentors continuing to provide guidance and help polish the projects.
The projects that presented at the Hacking Camp defense include:
JuiceFS, a distributed POSIX file system using TiKV as its metadata engine
ServerlessDB for HTAP, providing serverless database services on top of TiDB
TiDB for PostgreSQL, improving PostgreSQL compatibility on TiDB
TiBigData, a one-stop solution for TiDB in the big data field
HugeGraph with TiKV as its backend storage
A Doris connector using TiDB as the upstream data source
The judges evaluated each project on completion, application value, contribution to the TiDB ecosystem, and quality of the defense. In the end, ServerlessDB for HTAP won unanimous high scores from the jury and took home both the "Excellent Graduate" and "Best Application" awards.
Special thanks to the following reviewers:
Xu Zhihao, Executive Manager of Mingshi Capital; Liu Yang, CTO and co-founder of Flomesh; Wang Cong, TiDB team tech lead; Zhang Jian, R&D Director at PingCAP; and Li Jianjun, TiKV maintainer.
Let's take a look at each project's graduation results.
JuiceFS
JuiceFS is a cloud-native POSIX distributed file system. With TiKV as its metadata engine, JuiceFS can support a scale of ten billion files and EB-level data storage while keeping latency stable at large scale. In metadata operation performance tests, the TiKV engine's average latency is roughly 2 to 4 times that of Redis, and slightly better than a local MySQL.
The main features have been developed and released in v0.16 and have passed the pjdfstest suite; users are already running it in test and production environments. Going forward, JuiceFS will treat TiKV as the first-choice metadata engine for large-scale production environments and actively adopt new TiKV features while preserving compatibility.
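To make the "TiKV as metadata engine" idea concrete, here is a toy sketch of how a POSIX file system can map inodes and directory entries onto key-value pairs, so that a transactional KV store like TiKV can serve as the metadata backend. The key names and layout below are hypothetical illustrations, not JuiceFS's real on-disk format.

```python
import json

class KVMetaStore:
    """Toy POSIX metadata store over a key-value map (stands in for a TiKV client)."""

    def __init__(self):
        self.kv = {}          # key -> value, as a TiKV client would expose
        self.next_inode = 2   # inode 1 is reserved for the root directory

    def mkfile(self, parent_inode, name, mode=0o644):
        ino = self.next_inode
        self.next_inode += 1
        # attribute record: "A<inode>" -> serialized file attributes
        self.kv[f"A{ino}"] = json.dumps({"mode": mode, "size": 0, "nlink": 1})
        # directory entry: "D<parent>/<name>" -> child inode number
        self.kv[f"D{parent_inode}/{name}"] = str(ino)
        return ino

    def lookup(self, parent_inode, name):
        v = self.kv.get(f"D{parent_inode}/{name}")
        return int(v) if v is not None else None

store = KVMetaStore()
ino = store.mkfile(1, "hello.txt")
print(store.lookup(1, "hello.txt") == ino)  # True
```

Because TiKV transactions are distributed and strongly consistent, the same pattern scales to billions of such keys, which is what enables the file counts quoted above.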
ServerlessDB for HTAP
The ultimate goal of the project is to turn the cloud database service into a black box: application developers only need to focus on translating business logic into SQL, and no longer need to worry about data volume, business load, or whether a SQL statement is AP or TP, none of which are business concerns.
Business load module:
The business load module evaluates whether the currently provisioned resources match the current business load, and builds a business load model to drive scale-out and scale-in decisions.
The serverless module monitors the CPU utilization of all compute nodes and the underlying storage capacity in real time, triggering the scaling of compute and storage resources.
The middleware decouples user connections from the backend database service nodes, so that even when users rely on connection pools, the middleware can balance traffic across all new nodes after scale-out.
A rule system can pin resource allocation within a specific time window; through rules, resources can be allocated in advance of anticipated traffic growth.
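The scaling logic above can be sketched as a simple threshold decision: compare recent CPU utilization against high and low watermarks and return the target node count. The thresholds and step sizes here are illustrative assumptions, not ServerlessDB's actual algorithm.

```python
def scale_decision(cpu_samples, nodes, high=0.70, low=0.30, min_nodes=1):
    """Decide a target compute-node count from recent CPU utilization samples.

    cpu_samples: recent cluster-wide CPU utilization readings in [0.0, 1.0].
    """
    avg = sum(cpu_samples) / len(cpu_samples)
    if avg > high:                        # sustained pressure: scale out
        return nodes + max(1, nodes // 2)
    if avg < low and nodes > min_nodes:   # idle capacity: scale in gradually
        return nodes - 1
    return nodes                          # load roughly matches capacity

print(scale_decision([0.85, 0.90, 0.80], nodes=2))  # 3: scale out
print(scale_decision([0.10, 0.15, 0.20], nodes=3))  # 2: scale in
print(scale_decision([0.50, 0.50, 0.50], nodes=2))  # 2: hold steady
```

A real implementation would also smooth over longer windows and consult the business load model and rule system before acting, so that transient spikes do not cause churn.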
Serverless service orchestration module:
The service orchestration module handles the creation, release, and dynamic adjustment of TiDB clusters. It also implements Kubernetes local-disk management, solving the problem that cloud disks are unavailable in private deployments.
An admission webhook was developed so that when scaling in TiDB components, middleware registry records are deleted in advance, making scale-in imperceptible to users.
Follow-up R&D plan:
Hint and rule modules are planned to distinguish TP and AP workloads more accurately, and middleware CPU utilization is expected to be cut by more than half.
Richer load-balancing algorithms will be provided, such as balancing based on SQL runtime cost.
The middleware will add business flow control: if business load grows faster than serverless scaling can absorb, backend services become unstable, and flow control handles such traffic surges gracefully.
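One common way to implement the flow control described above is a token bucket: requests beyond what scaling can absorb are shed instead of destabilizing the backend. The rate and capacity values below are assumptions for illustration, not the project's planned parameters.

```python
class TokenBucket:
    """Minimal token-bucket rate limiter sketch."""

    def __init__(self, rate, capacity):
        self.rate = rate          # tokens replenished per second
        self.capacity = capacity  # burst allowance
        self.tokens = capacity
        self.last = 0.0

    def allow(self, now):
        # refill in proportion to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False              # shed load: surge exceeds available capacity

bucket = TokenBucket(rate=2, capacity=2)
# five requests arrive at once: only the burst capacity (2) gets through
results = [bucket.allow(0.0) for _ in range(5)]
print(results)  # [True, True, False, False, False]
```

In the middleware this would sit in front of the backend connections, with the refill rate raised as new nodes come online.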
The project also won Hacking Camp's Excellent Graduate and Best Application awards; the judges were clearly won over by its vision and engineering strength. You are welcome to follow the project and try it out!
TiDB for PostgreSQL
Initiated by Digital China, the project aims to give TiDB compatibility with PostgreSQL while retaining TiDB's high availability, elasticity, and scalability, allowing users to connect existing PostgreSQL clients to TiDB and use PostgreSQL-specific syntax.
Development completed so far:
DELETE syntax transformation
Support for the PostgreSQL-specific RETURNING keyword
Completed sysbench and TPC-C testing under the PostgreSQL protocol, compared against native TiDB on the same version
Completed BenchmarkSQL benchmark testing under the PostgreSQL protocol, compared against native TiDB on the same version
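To illustrate one of the PostgreSQL-specific features listed above, the RETURNING clause lets an INSERT hand back generated values in the same statement, something the MySQL protocol that native TiDB speaks does not offer. The helper below is a hypothetical sketch of the kind of syntax handling such a compatibility layer performs, not the project's actual code.

```python
def add_returning(insert_sql, columns=("id",)):
    """Append a PostgreSQL-style RETURNING clause to an INSERT statement."""
    return insert_sql.rstrip("; ") + " RETURNING " + ", ".join(columns)

sql = add_returning("INSERT INTO users (name) VALUES ('alice');")
print(sql)  # INSERT INTO users (name) VALUES ('alice') RETURNING id
```

With this syntax supported server-side, PostgreSQL clients and ORMs that rely on RETURNING can work against TiDB unchanged.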
Benchmark comparison results:
Future plans include supporting the system catalog tables, graphical clients, and an abstract protocol layer that can switch between protocols at any time. Come give it a try!
TiBigData
TiBigData provides connectors between TiDB and various OLAP compute engines, including Flink, Presto, and MapReduce. During Hacking Camp, the team focused mainly on Flink-related development.
We implemented a snapshot source and a TiCDC streaming source in Flink; combining the two achieves unified stream and batch processing of TiDB data.
Second is data interconnection: by combining cross-datacenter TiKV deployment with the follower-read capability in the Flink connector, we achieve true interconnection of offline data.
Finally, computation pushdown: the connectors are compatible with TiKV's pushdown operators, which can greatly improve data scanning and computation efficiency.
Tibigdata core enhancements:
The general capabilities of the TiDB Java client were enhanced: we implemented a TiDB encoder whose code is decoupled from TiSpark, so it can be adapted to other OLAP engines and reused as a general-purpose tool by other community members.
Data type conversion utilities were implemented to convert between Flink/Presto data types and TiDB data types.
A distributed TiKV client was implemented, making the API better suited to distributed computing frameworks.
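The type-conversion utilities mentioned above can be pictured as a two-way mapping between engine-side type names (Flink SQL style) and TiDB/MySQL types. The real TiBigData converters are written in Java and cover many more cases; the table below is an illustrative assumption only.

```python
# Hypothetical Flink SQL -> TiDB type table (illustrative, not TiBigData's actual mapping)
FLINK_TO_TIDB = {
    "STRING": "VARCHAR",
    "INT": "INT",
    "BIGINT": "BIGINT",
    "DOUBLE": "DOUBLE",
    "TIMESTAMP": "DATETIME",
    "BOOLEAN": "TINYINT(1)",
}
# the reverse direction, for reading TiDB schemas into Flink
TIDB_TO_FLINK = {v: k for k, v in FLINK_TO_TIDB.items()}

def to_tidb_type(flink_type):
    """Translate a Flink SQL type name into its TiDB counterpart."""
    return FLINK_TO_TIDB[flink_type.upper()]

print(to_tidb_type("timestamp"))  # DATETIME
print(TIDB_TO_FLINK["VARCHAR"])   # STRING
```

Keeping such mappings in one decoupled module is what lets the same encoder serve Flink, Presto, and other engines.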
In the future, we will continue developing change-log writing, TiDB x Presto/Trino integration, a Flink state backend on TiKV, and more. Interested students are welcome to join the community!
HugeGraph on TiKV
HugeGraph on TiKV is suitable for scenarios requiring a large-scale graph database, and especially those that demand high read/write performance and have a team able to operate and maintain TiKV storage.
Supports single graph instances
Supports creating, deleting, updating, and querying schemas
Supports importing data with Loader, and create/delete/update/query operations on vertices and edges
Supports kout, kneighbor, and other traversal algorithms, Gremlin queries, and index queries (incomplete)
After importing data (a Xinyu COVID-19 dataset), the resulting graph can be explored through the HugeGraph-Hubble interface:
Performance test results:
Import speed (write)
Query by ID (random read)
Follow up plan:
Support multi-graph instances, truncating/clearing graph data, monitoring metrics endpoints, TTL, and other advanced features
Write performance optimization: commit mode, batch sizing, etc.
Query performance optimization: data encoding optimization, sort optimization, etc.
Doris Connector for TiDB
The project takes TiDB as the data source and provides Doris with a native connector, opening up the data flow in TP-to-AP scenarios. It supports DML/DDL synchronization and filtering data by specified conditions. The project is currently about 70% complete.
Stream Load: an independent service in TiDB periodically reads and parses TiDB binlog files, assembles the data rows into CSV files, and imports them into Doris via Stream Load.
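The Stream Load path described above can be sketched as: parsed binlog rows are assembled into CSV text that Doris ingests through its Stream Load HTTP API. The row schema and column order below are assumptions for illustration; the real service also handles batching, escaping, and retries.

```python
import csv
import io

def rows_to_csv(rows):
    """Assemble parsed binlog rows (one dict per changed row) into CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for r in rows:
        writer.writerow([r["id"], r["name"], r["ts"]])
    return buf.getvalue()

payload = rows_to_csv([
    {"id": 1, "name": "alice", "ts": "2021-11-07 10:00:00"},
    {"id": 2, "name": "bob",   "ts": "2021-11-07 10:00:01"},
])
print(payload)
# 1,alice,2021-11-07 10:00:00
# 2,bob,2021-11-07 10:00:01
```

The resulting payload would then be sent as the body of a Stream Load HTTP request to a Doris frontend.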
Routine Load: synchronize the binlog to Kafka with the help of TiDB's components; Doris then implements data synchronization by adding support for the TiDB binlog data format.
TiDB native protocol synchronization: implement the TiDB replica synchronization protocol in Doris, disguising Doris as a node in the TiDB cluster.
The project will continue to iterate, starting from users' real scenarios to make the data-processing pipeline smoother. It will later be merged into the Doris trunk.
This Hacking Camp concluded with the defense of six excellent projects, but ecosystem maintenance is a long-term effort. We will continue to provide follow-up support for these outstanding ecosystem projects to ensure their lasting vitality. Students interested in the projects should watch for follow-up posts, in which the founding teams will interpret each project's value to the whole TiDB ecosystem from the application level. Stay tuned for the special meetup!
From planet Ti to the universe, we use hacking to connect a broader ecosystem. TiDB Hackathon 2021 is also about to open. Come explore the mysteries of database technology with us!