The first lecture of the “DevOps transformation and practice” salon

Time:2021-1-2

On September 19, the 9th Shenzhen meetup, jointly held by CODING and the DevOps community of China, concluded successfully on the 2nd floor of the Tencent building. The theme of this salon was “DevOps transformation and practice”. Four technology experts from world top-500 companies in the Internet, finance and retail industries shared their views and experience of putting DevOps transformation into practice. More than 80 attendees joined the speakers for in-depth technical discussions of the new opportunities and challenges that enterprises may face under the DevOps trend.


CODING has always been committed to giving every developer the chance to hear cutting-edge DevOps talks, and will go on to hold a series of DevOps technology salons across the country, providing a platform for developers in different cities to exchange ideas. Without further ado, this issue covers how DevOps was adopted and practiced at SEE shop.

Background

SEE shop is the largest mini program service provider in the WeChat ecosystem. It currently works with more than 10,000 self-media partners, is an official SaaS provider for Weibo, TikTok and other platforms, and also serves as a channel for channel businesses.

Ma Zhixiong, technical director of SEE shop, described how, against fierce competition in the e-commerce industry, the company introduced proven industry DevOps practices, built an efficient and stable application delivery system, and moved toward the cloud-native era. With a focus on adoption, he shared the objectives, principles and specific practices of DevOps at SEE shop.

Introducing DevOps practice

During the epidemic, SEE shop staff worked remotely, which placed higher demands on collaboration and efficiency. The problems that surfaced in the R&D process led SEE shop to reflect on how its technical architecture, R&D process, measurement tools and team culture could deliver user value smoothly and with high quality.


The first step in continuous improvement is to identify problems. After investigation and discussion, SEE shop first introduced the DevOps capability maturity model to sort out the problems in the current team and the capabilities that needed to be added.

For organizational support, SEE shop set up a DevOps working group, driven by the engineering efficiency team. Specific DevOps items were written into OKRs for goal management and broken down to the relevant front-line colleagues. The team culture was reshaped along the way: the working group kept talking with front-line colleagues about the future technical direction and why it mattered, and promoted the DevOps goals, methods and strategies to everyone. For the rollout path, an incremental (gray-scale) approach gave priority to colleagues who already strongly supported DevOps, starting with projects that were easier to transform and could show results quickly.

SEE shop then set its phase-one improvement goals:


  • Unified tool platform

In the past, SEE shop used a variety of tools, which were costly to maintain and did not fit together smoothly in the actual R&D process. Deployment, for example, still depended on manual operation combined with scripts. After evaluating the DevOps tools on the market, SEE shop chose CODING DevOps as its one-stop R&D platform. The main advantages were that there is no need to jump between multiple products, the data is connected end to end, and the platform is deployed in China, with good access speed, strong Chinese-language support, and a short learning curve for team members. In addition, continuous deployment is based on Spinnaker, which is powerful and integrates well with Tencent Cloud. Test management is included at no extra charge, and because the platform is hosted there is nothing to build or maintain in-house, which makes it cost-effective.

  • Unified branch strategy

After switching tools, SEE shop faced the problem that its branch strategy was not unified. GitLab Flow and Git Flow were used side by side, and the whole process was complex, error-prone and time-consuming. When multiple versions were developed in parallel, GitLab Flow needed to support multiple environments, which runs against the idea of continuous integration. To push R&D efficiency further, SEE shop decided, based on the team's own situation, to adopt trunk-based development with feature branches. The team then kept discussing internally how to use it well, including how to break down user stories and tasks, how to use git cherry-pick and rebase, and how to make commits more atomic.


  • Unified artifact management

The artifact management problem SEE shop faced was that application distribution packages came in many forms. After the whole system was rebuilt as microservices, there were 40 to 50 services to maintain, and managing and configuring collections of applications was difficult. In addition, the artifact repository was not unified and lacked central control and permission control. SEE shop uses Docker and Helm to unify its artifacts. Docker images standardize the delivered components, while Helm simplifies the management and configuration of sets of k8s resources and makes rollbacks and updates easier. On top of that, the sources of the artifact repository and the permissions on hosted artifact repositories are controlled at a fine-grained level.

  • Unified integration and deployment pipeline

After years of business growth, SEE shop has a very large number of applications but lacked systematically built pipelines. A few well-maintained projects used GitLab CI, some used Jenkins to build and package, and most applications still used shell scripts to pull JAR packages for release. The existing automated deployment was coupled into CI, which made environments and configuration complex, hard to maintain, non-standard, inefficient and error-prone. SEE shop therefore first defined a CI pipeline specification, covering the pipeline design for each technology stack, the split between full integration and incremental integration according to the branch model and the merge request (MR) process, and a time budget for each stage. It then switched tooling to the Jenkins-based CI supported by CODING and planned the build-node resources carefully, so that every pipeline triggers automated testing, including code format checks, coding-standard checks and static analysis, as well as automatic execution of unit, integration and E2E tests. The final results are synchronized to a dedicated bot group chat in enterprise WeChat.


The continuous deployment pipeline relies mainly on Spinnaker to orchestrate and configure the bake and deploy stages. Helm charts are used as the delivery artifacts, and CODING's continuous deployment feature is used to manage release orders, including creating release requests, parallel deployment, manual approval and so on.

Cloud native infrastructure

Beyond the four steps above, SEE shop's DevOps practice also draws on several other concepts. First, on the infrastructure side, SEE shop follows the cloud-native approach: it containerized all of its applications on k8s and uses Tencent Cloud TKE to host the clusters, which helped it finish containerization quickly and orchestrate containers efficiently under the microservice and BFF architecture styles. This standardizes application delivery, ensures resource isolation and efficient utilization, and keeps environments consistent. When traffic surges, pods and nodes can be scaled out and in quickly and elastically based on HPA and Cluster Autoscaler, so business campaigns and big promotions can be handled calmly. With the help of k8s readiness and liveness probes, services can also self-heal and tolerate faults.

After k8s was introduced into the infrastructure, SEE shop added a service mesh for finer-grained service governance. Using Istio's core governance resources, VirtualService and DestinationRule, together with CODING's continuous deployment, it automated gray-scale (canary) releases in place of the original cluster-level blue-green strategy. This brings resource utilization and traffic control close to the limit, and turns the service governance that is hard to implement in a microservice architecture into standard infrastructure, so that developers can focus on the applications themselves.
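To make the HPA-based scaling mentioned above more concrete, here is a minimal Python sketch of the replica calculation described in the Kubernetes HPA documentation (desired = ceil(current replicas × current metric / target metric)). The function name, the min/max defaults and the sample numbers are illustrative assumptions, not SEE shop's configuration; the real controller also applies a tolerance and stabilization behavior that this sketch leaves out.

```python
import math


def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float,
                     min_replicas: int = 1,
                     max_replicas: int = 10) -> int:
    """Return the replica count from the documented HPA ratio, clamped to bounds.

    The bounds and all sample values are hypothetical; tolerance and
    stabilization windows of the real controller are not modeled.
    """
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    # 4 pods at 90% average CPU with a 60% target: scale out.
    print(desired_replicas(4, 90, 60))
    # 4 pods at 30% average CPU with a 60% target: scale in.
    print(desired_replicas(4, 30, 60))
```

The clamp at the end mirrors the minimum and maximum replica bounds that an HPA object declares, which is what keeps a traffic spike from scaling a service without limit.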

[Figure: percentage-based traffic split between application versions configured in Istio]

In the figure above, the rules configured in Istio send 10% of the traffic to the new version of the application. After verification, all traffic is switched over to it, which gives fine-grained traffic shifting.
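The effect of such a weighted rule can be illustrated with a small Python simulation. This is only a sketch of percentage-based routing, not Istio itself: the version names "stable" and "canary", the request count and the seed are assumptions made for the example.

```python
import random
from collections import Counter
from typing import Dict


def split_traffic(n_requests: int, weights: Dict[str, int], seed: int = 0) -> Counter:
    """Assign each simulated request to a version in proportion to its weight.

    `weights` maps a (hypothetical) version name to its routing weight,
    in the same spirit as the per-destination weights of a routing rule.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    versions = list(weights)
    picks = rng.choices(versions,
                        weights=[weights[v] for v in versions],
                        k=n_requests)
    return Counter(picks)


if __name__ == "__main__":
    # Gray-scale phase: roughly 10% of requests reach the new version.
    print(split_traffic(10_000, {"stable": 90, "canary": 10}))
    # After verification: all traffic is switched to the new version.
    print(split_traffic(10_000, {"stable": 0, "canary": 100}))
```

Because routing is probabilistic per request, the observed share of the new version is close to, but not exactly, the configured percentage.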

Reliability & observability

SEE shop introduced the SRE methodology to guide its reliability work. First, its SLO reliability model covers three dimensions: availability, latency, and tickets (manual intervention). For each dimension, white-box and black-box monitoring metrics are collected for front-end and back-end applications, and the model's formula outputs overall service indicators. Based on this model, the SRE team built a reliability dashboard for all internal services, where the reliability indicators of the production environment can be viewed and followed up in real time. Another very useful concept in the SRE methodology is the error budget. Based on the SLO model, the team can calculate how much error budget has been consumed over a period and then decide whether to adjust the release frequency and pace for that period. For example, to maintain 99.9% reliability, the allowed unavailability window in a 30-day month is 43.2 minutes.
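The 43.2-minute figure follows directly from the SLO: 30 days × 24 hours × 60 minutes × (1 − 0.999). A minimal Python sketch of that arithmetic is shown below; the function names and the sample downtime value are illustrative assumptions rather than SEE shop's actual model, which also covers latency and tickets.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Allowed unavailability, in minutes, for an availability SLO over a window."""
    if not 0 < slo <= 1:
        raise ValueError("slo must be in (0, 1]")
    return window_days * 24 * 60 * (1 - slo)


def budget_remaining(slo: float, downtime_minutes: float, window_days: int = 30) -> float:
    """Error budget left after the downtime already observed in the window."""
    return error_budget_minutes(slo, window_days) - downtime_minutes


if __name__ == "__main__":
    # 99.9% over 30 days allows about 43.2 minutes of unavailability.
    print(f"{error_budget_minutes(0.999):.1f}")
    # Hypothetical: 30 minutes already consumed this month.
    left = budget_remaining(0.999, downtime_minutes=30)
    print("slow down releases" if left < 0 else f"{left:.1f} minutes left")
```

A negative remaining budget is the signal, in this model, to reduce release frequency until reliability recovers.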

For observability, an EFK log platform collects and aggregates the logs of all services, which makes it possible to gather logs from every container after the k8s migration. A metrics monitoring platform built on Prometheus and Grafana helps the team watch releases and record and alert on abnormal pod metrics.

Quality built in

For built-in quality, SEE shop follows some popular principles from the DevOps community. The first is the quality gate. Unit test coverage of every module must exceed 80% on all internal metrics, and older systems are brought up to this level gradually as project schedules allow. Web, mini program, iOS, Android and React Native applications require E2E tests that cover 100% of application scenarios. The quality gate also controls merge requests: an MR must pass CI automated testing, static code scanning and the peer code review provided by CODING before it is merged into the trunk. Second, CODING's test management freed SEE shop from managing test cases in Excel, and supports online review of test cases and iteration test plans. With dogfood self-testing, developers run smoke tests in the integration environment, track them through the test plan in test management, and give a demo before handing over to testing.
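The merge-request gate described above can be sketched as a simple check. The following Python example is a hypothetical illustration: the `MergeRequest` fields, the function names and the failure messages are assumptions made for this sketch, and only the rules themselves (coverage above 80%, CI tests, static scan, code review) come from the talk.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MergeRequest:
    """Hypothetical summary of the checks attached to one MR."""
    unit_coverage: float       # unit test coverage in percent
    ci_passed: bool            # CI automated tests
    static_scan_passed: bool   # static code scan
    review_approved: bool      # peer code review


def gate_failures(mr: MergeRequest, min_coverage: float = 80.0) -> List[str]:
    """Return the list of gate rules the MR violates (empty means it may merge)."""
    failures = []
    if mr.unit_coverage <= min_coverage:  # coverage must exceed the threshold
        failures.append(f"unit coverage {mr.unit_coverage}% is not above {min_coverage}%")
    if not mr.ci_passed:
        failures.append("CI automated tests failed")
    if not mr.static_scan_passed:
        failures.append("static code scan failed")
    if not mr.review_approved:
        failures.append("code review not approved")
    return failures


def can_merge(mr: MergeRequest) -> bool:
    """True only when every gate rule is satisfied."""
    return not gate_failures(mr)


if __name__ == "__main__":
    mr = MergeRequest(unit_coverage=76.5, ci_passed=True,
                      static_scan_passed=True, review_approved=False)
    print(can_merge(mr), gate_failures(mr))
```

Returning the list of violated rules, rather than a bare boolean, makes it easy to report to the author exactly what blocks the merge.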

Achievements

Over more than half a year of DevOps reform and practice, SEE shop improved a number of core indicators. The deployment time for a single service dropped from several hours to five minutes, a full-environment deployment dropped from one or two weeks to one hour, and the development bug rate fell to about one tenth of its previous level. After the iteration planning was adjusted, full-environment deployments happen about twice a week, the change lead time is under one week, and availability reaches 99.9%. More importantly, the team is more focused on business delivery itself.

Q: Helm charts manage a lot of resources. Do you have any best practices in this area?
A: We took a rough one-size-fits-all approach. Front-end applications share one common Helm chart, and back-end applications share another. Each service is one of the charts, as a sub-application of the shared chart. After the internal engineering efficiency team standardized this, all applications follow the template.

Q: How does your team ensure quality after a release?
A: Our availability is maintained at about 99.9%, but some escaped bugs are inevitable. We use SLOs to make the trade-off between business innovation and stability. As long as the overall service stays at 99.9% reliability, we release our applications as quickly as possible.

Reminder:
More speakers' slides and talk transcripts will be published gradually after sensitive information has been removed.
