Tencent Worker Bee: an enterprise code management and collaboration solution

Time: 2020-9-29

Agile Internet R&D depends on an efficient code management system. As a foundational part of the R&D process, code management connects upstream and downstream stages such as requirements management, continuous integration, and continuous delivery, and it also underpins engineering culture, including the pursuit of code quality and code reuse. Tencent has nearly 30,000 R&D staff, long product lines, and a wide range of businesses. Teams differ in size, technology stack, and R&D model, so they place different demands on collaboration, and repository sizes and R&D processes vary widely as a result. At the same time, build systems, release systems, and similar tools need to check out the full code, so the higher the degree of automation, the greater the access pressure on the repositories. Providing secure and stable code services, managing repositories of very different sizes, and supporting many kinds of R&D processes are the three major challenges facing code management. Based on the state of the industry and its own development needs, Tencent chose Git as the foundation and incubated its own internal Git system: Worker Bee.

The first problem to solve is scaling server-side repository storage, because a single storage node cannot keep up with growth at the terabyte level. Two approaches are worth considering: custom data sharding and general-purpose distributed file storage. The advantage of distributed storage is that the underlying storage structure is hidden from the application layer, which keeps the architecture relatively simple. However, code hosting is I/O-intensive, so this approach depends heavily on the I/O performance of the distributed file system, and it is not very portable. Custom data sharding, by contrast, gives full control over the sharding strategy and allows resource load to be balanced flexibly. In addition, the underlying storage of each shard can still be combined with distributed storage to extend data backup. Worker Bee chose data sharding: it uses the repository path as the routing rule and implements cross-shard operations in the application layer. Hundreds of thousands of repositories are distributed across different clusters, with dynamic scaling and seamless migration between clusters.
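The idea of routing by repository path can be illustrated with a small sketch. This is not Worker Bee's actual implementation: the class name ShardRouter, the shard names, and the hash-based default placement are all assumptions made for illustration. The source only states that the repository path is the routing rule and that repositories can be migrated between clusters.

```python
import hashlib


class ShardRouter:
    """Maps a repository path to the storage shard that holds it (illustrative only)."""

    def __init__(self, shards):
        if not shards:
            raise ValueError("at least one shard is required")
        self._shards = list(shards)
        # Explicit placements recorded by migrations; they win over the default rule.
        self._placements = {}

    @staticmethod
    def _normalize(repo_path):
        # Treat "/team/project.git/" and "team/project.git" as the same repository.
        return repo_path.strip("/")

    def locate(self, repo_path):
        """Return the shard that should serve requests for this repository."""
        key = self._normalize(repo_path)
        if key in self._placements:
            return self._placements[key]
        # Assumed default rule: a stable hash of the path picks the shard.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return self._shards[int(digest, 16) % len(self._shards)]

    def migrate(self, repo_path, target_shard):
        """Record that a repository now lives on another shard."""
        if target_shard not in self._shards:
            raise ValueError(f"unknown shard: {target_shard}")
        self._placements[self._normalize(repo_path)] = target_shard


router = ShardRouter(["shard-a", "shard-b", "shard-c"])  # hypothetical shard names
print(router.locate("team/project.git"))
router.migrate("team/project.git", "shard-c")
print(router.locate("team/project.git"))
```

A plain modulo hash would reassign most repositories whenever a shard is added, which is why this sketch keeps an explicit placement table for migrated repositories. A production router would more likely persist the whole path-to-shard mapping.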

Once storage scaling was solved, growing traffic gradually exposed the performance bottleneck of a single machine. When all reads and writes for a repository are concentrated on one host, compute and memory resources run short. An analysis of request sources showed that a large share of read requests come from the build and release systems. For this read-heavy, write-light workload, Worker Bee implements repository-level read-write separation with one master and multiple slaves. Write requests are routed to the master, and read requests are distributed across the slaves according to their current load. Data synchronization between master and slaves uses native Git operations, which preserves operation atomicity and data consistency as far as possible. The slaves also serve as real-time hot standby data and, together with remote cold backups, form a complete disaster recovery system for repository data. Figure 1 shows the complete back-end storage architecture for repositories.

Figure 1 data sharding and read-write separation
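The read-write separation described above can be sketched as a small dispatch rule. The names Node and RepoReplicaGroup and the numeric load field are assumptions for illustration, not Worker Bee's real interfaces. The sketch relies only on the fact that, in Git's transfer protocol, a push is served by git-receive-pack (a write) and a fetch or clone by git-upload-pack (a read).

```python
from dataclasses import dataclass, field
from typing import List

# Git server-side services: receive-pack handles pushes, upload-pack handles fetches.
WRITE_SERVICES = {"git-receive-pack"}
READ_SERVICES = {"git-upload-pack"}


@dataclass
class Node:
    name: str
    load: float = 0.0  # hypothetical load metric, e.g. a CPU or connection ratio


@dataclass
class RepoReplicaGroup:
    """One master and several slaves that hold copies of the same repository."""
    master: Node
    slaves: List[Node] = field(default_factory=list)

    def pick(self, service: str) -> Node:
        if service in WRITE_SERVICES:
            # All writes go to the master so there is a single source of truth.
            return self.master
        if service in READ_SERVICES:
            if not self.slaves:
                return self.master  # no slave available: fall back to the master
            # Reads go to the slave that currently reports the lowest load.
            return min(self.slaves, key=lambda node: node.load)
        raise ValueError(f"unsupported service: {service}")


group = RepoReplicaGroup(
    master=Node("master"),
    slaves=[Node("slave-1", load=0.7), Node("slave-2", load=0.2)],
)
print(group.pick("git-receive-pack").name)
print(group.pick("git-upload-pack").name)
```

Choosing the least-loaded slave is one simple reading of "according to their current load". Weighted or randomized selection would be equally valid under the same description.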

Managing very large repositories has long been a hard problem for code management tools. Git was originally designed to manage text source files, but projects inevitably contain dependency libraries and resource files. This is especially pronounced at Tencent, whose game businesses hold large numbers of image, audio, and video files. Worker Bee introduced Git LFS, an open-source extension, to host large binary files. As shown in Figure 2, these files are stored outside the Git repository and only a text pointer to each file is kept in the repository, which greatly reduces the size of the repository itself and speeds up cloning. At present, the largest single game repository hosted on Worker Bee exceeds 2.5 TB, which resolves the problem of a per-repository size limit.

Figure 2 large file storage
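To make the "text pointer" concrete, the sketch below builds and parses a pointer in the format defined by the public Git LFS specification: a version line, the SHA-256 object ID of the real content, and its size in bytes. The helper names make_pointer and parse_pointer are illustrative and not part of Git LFS or Worker Bee. In practice the git-lfs client generates these files automatically.

```python
import hashlib

LFS_SPEC_VERSION = "https://git-lfs.github.com/spec/v1"


def make_pointer(content: bytes) -> str:
    """Build the small text file that Git stores in place of a large binary."""
    oid = hashlib.sha256(content).hexdigest()
    return (
        f"version {LFS_SPEC_VERSION}\n"
        f"oid sha256:{oid}\n"
        f"size {len(content)}\n"
    )


def parse_pointer(text: str) -> dict:
    """Read the object ID and size back out of a pointer file."""
    fields = {}
    for line in text.splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    if fields.get("version") != LFS_SPEC_VERSION:
        raise ValueError("not a Git LFS v1 pointer")
    return {"oid": fields["oid"], "size": int(fields["size"])}


asset = b"\x00" * 1024  # stand-in for a large texture or audio file
pointer = make_pointer(asset)
print(pointer)
print(parse_pointer(pointer))
```

However large the asset is, the pointer stays at three short lines, which is why cloning the Git repository itself remains fast.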

For its overall architecture, Worker Bee adopts the widely used microservice architecture. In Figure 3, the protocol proxy services provide independent access paths for the HTTP, SSH, and LFS protocols. The data service encapsulates database access. The routing service locates the back-end repository data node for each request. Business services are split by platform function: code browsing, code statistics, code review, code search, and so on are each independent microservices. In addition, a unified registry and configuration center provide global capabilities such as service discovery, service routing, circuit breaking on failure, and service configuration. All microservices are designed to be stateless, so they can easily be scaled horizontally. With containerized deployment, the number of instances can be adjusted at any time to handle high-concurrency scenarios.

Figure 3 microservice architecture

If code tools are not connected to the upstream and downstream R&D processes, their contribution to R&D efficiency is very limited. One of Worker Bee's strengths is its rich set of open capabilities, which support integration with third-party systems. The webhook push mechanism lets third parties subscribe to code submission events in a repository and is widely used to trigger builds in continuous integration systems automatically after code is submitted. The commit check interception mechanism automatically runs checks such as coding-standard checks, defect detection, and unit tests before code is merged, and enforces the quality of incoming code through a configurable quality red line. Worker Bee also provides a rich set of RESTful APIs with private token and OAuth authorization mechanisms, giving third parties a secure, standardized means of access and broadening Worker Bee's application scenarios.
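A quality red line can be thought of as a set of thresholds that check results must satisfy before a merge is allowed. The sketch below is a minimal illustration under that assumption. The CheckReport and RedLine types, their field names, and the default thresholds are all hypothetical and do not reflect Worker Bee's actual configuration or API.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CheckReport:
    """Hypothetical summary of the checks run against a merge request."""
    style_violations: int
    critical_defects: int
    tests_run: int
    tests_passed: int


@dataclass(frozen=True)
class RedLine:
    """Hypothetical thresholds; anything beyond them blocks the merge."""
    max_style_violations: int = 0
    max_critical_defects: int = 0
    min_test_pass_rate: float = 1.0


def violations(report: CheckReport, red_line: RedLine) -> List[str]:
    """Return the reasons a merge must be blocked; an empty list means it may proceed."""
    reasons = []
    if report.style_violations > red_line.max_style_violations:
        reasons.append("coding-standard violations exceed the red line")
    if report.critical_defects > red_line.max_critical_defects:
        reasons.append("critical defects exceed the red line")
    # A run with no tests counts as a pass rate of 0 so it cannot slip through.
    pass_rate = report.tests_passed / report.tests_run if report.tests_run else 0.0
    if pass_rate < red_line.min_test_pass_rate:
        reasons.append("unit test pass rate is below the red line")
    return reasons


report = CheckReport(style_violations=2, critical_defects=0, tests_run=40, tests_passed=40)
for reason in violations(report, RedLine()):
    print("blocked:", reason)
```

Returning the list of reasons rather than a bare boolean lets the platform show the author exactly which check blocked the merge.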

Within Tencent, Worker Bee has been rolled out across six business groups and serves thousands of business lines, including WeChat and QQ. It hosts nearly 200,000 repositories, with tens of millions of visits per day and millions of API calls per day on average, which has effectively improved the company's overall R&D efficiency. Under the company's strategic goal of internal open-source collaboration, Worker Bee is also quietly changing the way the company collaborates. At present, more than half of the projects on Worker Bee are fully open source internally, and discussion through issues is becoming an effective way to collaborate across teams.

At the end of September this year, the project “Tencent Worker Bee: Git-Based R&D Engineering Platform” stood out in the selection held by the China Computer Federation (CCF) and won the 2019 “CCF Science and Technology Award”. According to reports, the award recognizes outstanding achievements that involve important discoveries, inventions, or original innovations in computer science, technology, or engineering and that have some international influence in their fields. The award is a strong endorsement of Worker Bee. Going forward, Worker Bee will focus on exploring code reuse, an integrated R&D experience, measurement of R&D process data, and related areas, and will continue to invest in code management to deliver greater value to the company and the industry.