Kubernetes stability assurance manual – minimalist version

Time: 2022-1-3

Introduction: Kubernetes is increasingly adopted in production environments and its deployments are increasingly complex, which poses growing challenges for stability assurance.


As both the adoption rate and the complexity of Kubernetes in production environments keep rising, stability assurance faces growing challenges.

For Kubernetes-based cloud products, stability assurance has become a basic requirement. Stability defects can cause heavy losses to a product, such as user churn, declining user confidence and slower product iteration.

Stability assurance practices differ from one Kubernetes-based product to another, but products built on the same technology stack can share the same standard practices.

Therefore, drawing on past development practice and experience with Kubernetes stability assurance, this series attempts to compile a Kubernetes stability assurance manual that captures best practices. The goals are to give everyone a comprehensive understanding of the theory of Kubernetes stability assurance, to turn the corresponding tools and services into infrastructure that can be reused by products on similar technology stacks, and to accelerate the dissemination, iteration and application of stability assurance best practices.

As the first article of the Kubernetes stability assurance manual, this article abstracts the core content of stability assurance and serves as its simplest user manual.

Minimalist manual objectives

  • 1 min to understand the objectives of stability assurance
  • 3 min to grasp the global view of stability assurance
  • One-stop lookup of recommended tools or services for stability assurance

Stability assurance objectives

  • Meet the stability requirements of the service or product
  • Accelerate service or product iterations

Stability assurance inspection items

[Figures: stability assurance inspection items]

Stability assurance levels

[Figure: stability assurance levels]

Practice

Methodology

Global view

Practice process:

  1. Draw the operation link diagram and mark which links are key links
  2. Configure observability based on the operation link diagram
  3. Manage controllability based on link importance
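
The three steps above can be sketched in code. This is only an illustration, not the author's tooling: the `Link` type, the component names, and the rules that key links get alerts and stricter release control are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """One hop in an operation link diagram (hypothetical model)."""
    source: str
    target: str
    key: bool  # step 1: mark whether the link is a key link


# Step 1: the operation link diagram, with illustrative component names.
LINKS = [
    Link("user", "api-gateway", key=True),
    Link("api-gateway", "controller", key=True),
    Link("controller", "report-job", key=False),
]


def observability_plan(links):
    """Step 2: every link gets metrics; key links also get alerts (assumed rule)."""
    return {
        (link.source, link.target): ["metrics", "alerts"] if link.key else ["metrics"]
        for link in links
    }


def change_policy(link):
    """Step 3: stricter change control on key links (assumed policy names)."""
    return "gray-release+rollback" if link.key else "direct-release"


plan = observability_plan(LINKS)
policies = {(link.source, link.target): change_policy(link) for link in LINKS}
```

Keeping the key-link flag on the link itself means the observability and controllability rules are derived from one source, so the diagram and the configuration are less likely to drift apart.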

To reduce the cost of practice, it is necessary to understand the elements of cloud products and how they interact, and to deconstruct complex systems starting from these basic elements and interactions:

  • Elements (2 types): cloud product components; cloud products
  • Interactions (2 types, 3 scenarios in total): inside a cloud product (within a component, between components); between cloud products

As shown below:

[Figure: elements and interactions in cloud products]

As the number of elements and the number of interaction relationships increase, the system becomes more and more complex, and stability assurance faces greater and greater challenges. It is necessary to avoid introducing unnecessary complexity.
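
A rough way to see why complexity grows quickly is to count how many pairwise interactions are possible. The sketch below assumes the worst case in which any two elements may interact, so it gives an upper bound; real systems usually have far fewer interactions, and the function name is hypothetical.

```python
def max_pairwise_interactions(n: int) -> int:
    """Upper bound on undirected pairwise interactions among n elements."""
    return n * (n - 1) // 2


for n in (2, 5, 10, 20):
    # Doubling the number of elements roughly quadruples the bound.
    print(n, max_pairwise_interactions(n))
```

Going from 10 to 20 elements raises the bound from 45 to 190, which is why each additional component or cloud product dependency deserves scrutiny.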

Therefore, it is necessary to draw the current operation link diagram, analyze the importance of each link, draw the overall component diagram, and assess the blast radius of each component. On this basis, it is also necessary to review who is involved, to avoid a single point of risk in staffing.
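
One possible way to estimate a component's blast radius is to treat the component diagram as a dependency graph and collect everything that depends on the component, directly or transitively. The sketch below takes that approach and also flags components with fewer than two owners; the graph, the owner names and both function names are hypothetical.

```python
from collections import deque

# Hypothetical component diagram: component -> components it depends on.
DEPENDS_ON = {
    "web": ["api"],
    "api": ["auth", "db"],
    "auth": ["db"],
    "batch": ["db"],
    "db": [],
}

# Hypothetical ownership list: component -> people able to operate it.
OWNERS = {
    "web": ["alice", "bob"],
    "api": ["bob", "carol"],
    "auth": ["carol"],
    "batch": ["dave", "erin"],
    "db": ["alice", "dave"],
}


def blast_radius(component, depends_on):
    """Return every component that directly or transitively depends on `component`."""
    # Invert the edges so we can walk from a component to its dependents.
    dependents = {name: [] for name in depends_on}
    for name, deps in depends_on.items():
        for dep in deps:
            dependents[dep].append(name)

    affected, queue = set(), deque([component])
    while queue:
        current = queue.popleft()
        for parent in dependents[current]:
            if parent not in affected:
                affected.add(parent)
                queue.append(parent)
    return affected


def single_owner_components(owners):
    """Return components with fewer than two owners (a staffing single point)."""
    return sorted(name for name, people in owners.items() if len(people) < 2)
```

In this example a failure of `db` affects `api`, `auth`, `batch` and `web`, while a failure of `web` affects nothing else, and `auth` is the only component with a single owner.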

Example of an operation link diagram:

[Figure: operation link diagram]

Example of link importance:

[Figure: link importance]

Example of interaction between cloud products:

[Figure: interaction between cloud products]

Based on the above analysis of system complexity and operation links, we can propose and implement effective solutions for the problem domain of stability assurance.

Problem handling

Practice process:

  1. Maintain the role list, function flow chart and operation link diagram over the long term
  2. Perceive the occurrence and recovery of problems in multiple hierarchical “alarm groups”
  3. Handle and review problems in the single “problem handling group”

For complex systems, there are usually the following role relationships:

[Figure: role relationships]

Sorting out the roles at each layer makes it easy for the people involved to find the right contact, which shortens problem handling time.
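
As a minimal sketch of this process, a role list can be kept as data and used to answer three questions: which alarm group perceives the problem, where it is handled, and whom to contact. The layer names, group names and contacts below are placeholders invented for the example, not the roles from the original figure.

```python
# Hypothetical role list: layer -> role and contact. All names are placeholders.
ROLE_LIST = {
    "product": {"role": "product owner", "contact": "owner-a"},
    "component": {"role": "component maintainer", "contact": "maintainer-b"},
    "infrastructure": {"role": "SRE on call", "contact": "sre-c"},
}

# Several layered alarm groups perceive problems; a single group handles them.
ALARM_GROUPS = {
    "product": "alarm-product",
    "component": "alarm-component",
    "infrastructure": "alarm-infra",
}
HANDLING_GROUP = "problem-handling"


def route(layer):
    """Return where a problem is announced, where it is handled, and whom to contact."""
    if layer not in ROLE_LIST:
        raise KeyError(f"unknown layer: {layer}")
    return {
        "notify": ALARM_GROUPS[layer],
        "handle_in": HANDLING_GROUP,
        "contact": ROLE_LIST[layer]["contact"],
    }
```

Whatever layer raises the alarm, `handle_in` is always the same group, which mirrors the idea of perceiving problems in many places but handling them in one.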

Problem domain

Summary

[Figure: summary of the problem domain]

Recommendations

[Figures: recommended tools and services]

Follow-up

The following chapters of the Kubernetes stability assurance manual will be refined and summarized from the perspectives of methodology and of tools and services, and shared once the first edition is complete:

[Figure: planned chapters]

Author: Wu Peng
This article is original content from Alibaba Cloud and may not be reproduced without permission.
