How to land gracefully on TiDB?

Time:2020-4-8

When people hear “the open source project TiDB”, the habitual reaction is to quote numbers: more than 17,000 stars on GitHub and 260+ contributors from all over the world. But numbers are cold and cannot vividly convey the charm of the TiDB community. So today we bring you the personal journey of TiDB contributor Du Chuan, before and after he joined the TiDB community. Drawing on his own experience, he tells you:

  • Are the PingCAPers nice enough?
  • How can taking an active part in the TiDB community help you improve your skills?
  • How do you find the spot on TiDB that suits you best? (Or: how do you find your own best “little branch” on the big tree? Haha.)
  • And... by making good use of spare moments, you too can send TiDB 70 PRs a year!

Author: Du Chuan, TiDB contributor

Over the past few years I have been submitting changes to TiDB on and off. Two days ago I looked through my GitHub history and found that they add up to more than 70 PRs. Considering that the past year has basically been one of crazy overtime, and that I also snatched enough spare moments to get through most of a reading list of more than ten books I had drawn up earlier, I think this is a modest achievement worth marking.

That said, although I only started submitting PRs to TiDB in the middle of 2017, I had first heard of the project more than a year earlier, probably around April 2016. At that time my main job was also building a SQL execution engine, so I paid close attention to news from the distributed database industry.

Databases are a field notorious for reinventing the wheel, and wheels of every kind exist. In China, however, databases, and distributed databases in particular, are mostly built by a few large companies, and they are either not open source or have an open-source community that is not very active. I was really impressed that TiDB set out to build a distributed database from scratch and kept it fully open source. Later, a junior colleague on my team left to join PingCAP. Under the pretext of catching up with him in person, I went to several TiDB offline meetups and got to know many people in the TiDB community.

After I moved back to Chengdu from Beijing at the end of 2016, the focus of my work shifted from pure infrastructure toward business requirements. After several years of infrastructure work, though, I was still interested in database kernels. So I started to study TiDB's implementation and set up a TiDB cluster to replace MySQL in my development environment. As we all know, after years of development MySQL's SQL syntax has become fairly complex. TiDB aims to be compatible with MySQL's syntax and protocol, but because it does not reuse MySQL code, 100% compatibility is impossible, and some specific statements are bound to behave differently from MySQL. I had been developing the SQL engine of an OLAP system before and was familiar with this area, so when I ran into such differences, fixing them did not seem very hard, and I gradually began to send TiDB PRs for them. Once I was more familiar with the code, I would go through TiDB's issue list in my free time and pick related issues to solve, mainly in the SQL parser, expression evaluation, and MySQL compatibility. Recently I have been working on some features related to aggregate functions.

Because my day job is busy and overtime is routine, I mostly submit PRs to TiDB and reply to review comments on weekends, in the evening after my wife has gone to bed, or after the lunch break. The drawback is that this time is fragmented, so long stretches of coherent thinking are hard to come by. For now, therefore, I pick relatively small and self-contained features when I open a PR. I also try to do the development on weekends, when I have plenty of time, and use evenings and other scraps of time to read and answer review comments, update the code, and run regression tests. Done this way, a PR takes about 3 to 5 hours on average, including development, testing, and discussion with community members.

I think this investment of time is well worth it. First, I am very interested in databases. I treat TiDB community development as a hobby, so it is a way to relax after work. Second, I have always worked on database-related things, including runtime optimization of an OLAP SQL engine and cloud databases, and that work is closely related to what I do in the community. Take a MySQL builtin function: how does it perform under various extreme inputs, and how do different combinations of sql_mode affect its behavior? In my daily work I would rarely think such questions through completely. But to submit a PR that implements the builtin function in the community, I have to work all of them out and then survive a barrage of test cases from fellow community members. By the time the PR is merged, I know these details thoroughly.

Speaking of the community, I think TiDB has done quite well. On the one hand, the PingCAPers are very active. Issues raised on GitHub get a reply quickly, and questions asked on GitHub, in the WeChat group, or even on Zhihu are answered quickly too. On the other hand, and more importantly, community members keep a rigorous attitude when reviewing PRs.

In my experience, most of the corner cases and detail errors that I missed during development are caught in PR review. This requires the reviewer not only to understand the details of the feature the PR touches and to construct scenarios that might break it, but also to follow the PR author's line of thinking. The effort involved is often no less than developing the feature itself. There is one more thing I think is very good: TiDB has put a lot of effort into building a series of testing frameworks, from unit tests (UT) and functional tests (FT) to integration tests. This makes it much easier for me to test every aspect of a feature I have developed, and saves a lot of back and forth.

Overall, taking part in the TiDB community is very interesting and has brought me a lot of benefits, and I will keep following the progress of the TiDB project. In the short term, my plan is mainly to find time to finish the aggregate function features I have on hand, including support for the MySQL aggregate functions StdDev, variance, and so on, together with the corresponding changes on the TiKV coprocessor side. After that, I would like to see whether I can draw on my earlier experience in OLAP SQL engine runtime optimization to improve TiDB's capabilities in the OLAP field. That is a big goal, though, and it will need to be discussed with the community first.
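To illustrate why aggregate functions such as StdDev and variance need work on both the SQL layer and the coprocessor side, here is a minimal sketch in Python. It is not TiDB's or TiKV's actual code, and the names `VarState` and `partial_aggregate` are hypothetical. It assumes the common two-phase design in which each storage node computes a partial state over its own rows and the SQL layer merges those states. The per-row update uses Welford's method and the merge uses the pairwise formula of Chan et al.; the final result is the population variance, which is what MySQL's VARIANCE() and STDDEV() return.

```python
import math
from dataclasses import dataclass


@dataclass
class VarState:
    """Hypothetical partial state: row count, mean, sum of squared deviations."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0

    def update(self, x: float) -> None:
        # Welford's online update for one new value.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    def merge(self, other: "VarState") -> "VarState":
        # Combine two partial states without revisiting any rows.
        if other.count == 0:
            return VarState(self.count, self.mean, self.m2)
        if self.count == 0:
            return VarState(other.count, other.mean, other.m2)
        n = self.count + other.count
        delta = other.mean - self.mean
        mean = self.mean + delta * other.count / n
        m2 = self.m2 + other.m2 + delta * delta * self.count * other.count / n
        return VarState(n, mean, m2)

    def var_pop(self):
        # Population variance; None stands in for SQL NULL on empty input.
        return None if self.count == 0 else self.m2 / self.count

    def stddev_pop(self):
        v = self.var_pop()
        return None if v is None else math.sqrt(v)


def partial_aggregate(rows):
    """Build the partial state for one node's rows, skipping NULLs (None)."""
    state = VarState()
    for x in rows:
        if x is not None:
            state.update(x)
    return state


if __name__ == "__main__":
    # Two "nodes" each hold part of the column; the merged state covers all rows.
    node_a = partial_aggregate([2.0, 4.0, 4.0, 4.0])
    node_b = partial_aggregate([5.0, 5.0, None, 7.0, 9.0])
    merged = node_a.merge(node_b)
    print(merged.count, merged.var_pop(), merged.stddev_pop())
```

For the eight non-NULL values above, the merged population variance is approximately 4 and the standard deviation approximately 2, the same as computing them over all rows in one place. The point of the design is that only three numbers per node have to travel over the network.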

TiDB community events

TiDB TechDay 2018 will be held in Shenzhen on July 28. Registration is now full. See you on Saturday! P.S. You are welcome to add the TiDB Robot on WeChat (ID: tidbai) to join the TiDB community.
