How can k8s provide more efficient and stable orchestration capability? An analysis of the implementation mechanism of k8s watch

Time:2022-5-7


Author

Wang Cheng, R&D engineer at Tencent Cloud and Kubernetes member, works on the containerization of database products and on resource management and control, and follows Kubernetes, Go and cloud native technologies.

Contents

1. Overview

2. Start with HTTP

2.1 Content-Length

2.2 Chunked Transfer Encoding

2.3 HTTP/2

3. Apiserver startup

4. Etcd resource encapsulation

5. Client watch implementation

6. Server watch implementation

7. Summary

Overview

Entering the k8s world, you will find that almost everything is abstracted as a resource, including k8s core resources (Pod, Service, Namespace, etc.), CRDs, and the extended resource types served through APIService. Under the hood, k8s abstracts all of these resources as RESTful storage. On the server side, resources are stored in etcd in directory form (/registry/xxx); towards the client, k8s exposes a RESTful API for operating on them (GET / POST / PUT / PATCH / DELETE, etc.).

The k8s watch API is a mechanism for continuously monitoring resource changes. Whenever a resource changes, the change is delivered to the client in real time, in order and reliably, so that users can react to and operate on the target resources flexibly.

How is the k8s watch mechanism implemented? What underlying technologies does it depend on?

This article analyzes the implementation mechanism of k8s watch from the following aspects: the HTTP protocol, apiserver startup, etcd watch encapsulation, the client watch implementation and the server watch implementation.

The process overview is as follows:

This article and subsequent related articles are based on k8s v1.23.

Start with HTTP

Content-Length

As shown below, an HTTP request or server response carries Content-Length in the HTTP header to indicate the total length of the data in this transmission. If Content-Length does not match the actual length, an error occurs: if it is larger than the actual length, the receiver waits until it times out; if it is smaller, the data is truncated and the parsing of subsequent data may be confused.

curl baidu.com -v

> GET / HTTP/1.1
> User-Agent: curl/7.29.0
> Host: baidu.com
> Accept: */*

< HTTP/1.1 200 OK
< Date: Thu, 17 Mar 2022 04:15:25 GMT
< Server: Apache
< Last-Modified: Tue, 12 Jan 2010 13:48:00 GMT
< ETag: "51-47cf7e6ee8400"
< Accept-Ranges: bytes
< Content-Length: 81
< Cache-Control: max-age=86400
< Expires: Fri, 18 Mar 2022 04:15:25 GMT
< Connection: Keep-Alive
< Content-Type: text/html

What if the server does not know the total length of data to be transmitted in advance?

Chunked Transfer Encoding

HTTP has supported chunked transfer encoding since version 1.1. It splits the data into a series of chunks and sends them one or more at a time, so the server can send data without knowing the total size of the content in advance. The length of each chunk is given in hexadecimal, followed by \r\n, then the chunk data itself, then another \r\n; the terminating chunk is a chunk of length 0.

> GET /test HTTP/1.1
> Host: baidu.com
> Accept-Encoding: gzip

< HTTP/1.1 200 OK
< Server: Apache
< Date: Sun, 03 May 2015 17:25:23 GMT
< Content-Type: text/html
< Transfer-Encoding: chunked
< Connection: keep-alive
< Content-Encoding: gzip

4\r\n        (bytes to send)
Wiki\r\n     (data)
6\r\n        (bytes to send)
pedia \r\n   (data)
E\r\n        (bytes to send)
in \r\n
\r\n
chunks.\r\n  (data)
0\r\n        (final byte - 0)
\r\n         (end message)

To watch server-side resource changes in a streaming fashion over HTTP/1.1, the server tells the client in the response header that Transfer-Encoding is chunked, and then transmits the data chunk by chunk until it sends a chunk of size 0.
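The chunk framing described above can be reproduced with the Go standard library. The following is a minimal sketch, not apiserver code: encodeChunks and decodeChunks are hypothetical helper names, and the payload is the same "Wikipedia" example shown earlier.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http/httputil"
	"strings"
)

// encodeChunks frames each part as one HTTP/1.1 chunk
// (<hex length>\r\n<data>\r\n) and ends with a zero-length chunk.
func encodeChunks(parts []string) (string, error) {
	var buf bytes.Buffer
	cw := httputil.NewChunkedWriter(&buf)
	for _, p := range parts {
		if _, err := io.WriteString(cw, p); err != nil {
			return "", err
		}
	}
	// Close writes the terminating "0\r\n"; the final CRLF after the
	// (empty) trailer section is left to the caller.
	if err := cw.Close(); err != nil {
		return "", err
	}
	buf.WriteString("\r\n")
	return buf.String(), nil
}

// decodeChunks strips the chunk framing and returns the original payload.
func decodeChunks(wire string) (string, error) {
	body, err := io.ReadAll(httputil.NewChunkedReader(strings.NewReader(wire)))
	return string(body), err
}

func main() {
	wire, err := encodeChunks([]string{"Wiki", "pedia ", "in \r\n\r\nchunks."})
	if err != nil {
		panic(err)
	}
	fmt.Printf("on the wire: %q\n", wire)

	body, err := decodeChunks(wire)
	if err != nil {
		panic(err)
	}
	fmt.Printf("decoded:     %q\n", body)
}
```

Go writes the chunk size in lowercase hexadecimal ("e" rather than "E"); both forms are valid on the wire.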

HTTP/2

HTTP/2 does not use chunked transfer encoding for streaming. Instead, it introduces transmission in units of frames. This completely changes the original encoding and decoding, and the overall model resembles many RPC protocols. Frames are binary encoded: the bytes at fixed positions in the frame header describe the length of the body, and the body can be read frame by frame until a frame's flags carry END_STREAM. This naturally lets the server send data on the stream without having to notify the client of any change in encoding.

+-----------------------------------------------+
|                Body Length (24)               | ----Frame Header
+---------------+---------------+---------------+
|   Type (8)    |   Flags (8)   |
+-+-------------+---------------+-------------------+
|R|                 Stream Identifier (31)          |
+=+=================================================+
|                   Frame Payload (0...)        ...    ----Frame Data
+---------------------------------------------------+
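To make the layout above concrete, here is a minimal sketch that parses the 9-byte frame header by hand. It only illustrates the field positions in the diagram; frameHeader and parseFrameHeader are hypothetical names, and real code would normally rely on an HTTP/2 library instead.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

const (
	frameHeaderLen = 9   // 3 (length) + 1 (type) + 1 (flags) + 4 (R + stream id)
	flagEndStream  = 0x1 // END_STREAM flag on DATA and HEADERS frames
)

// frameHeader mirrors the fixed-position fields of the diagram.
type frameHeader struct {
	Length   uint32 // 24-bit body length
	Type     uint8
	Flags    uint8
	StreamID uint32 // 31 bits; the reserved R bit is masked off
}

func parseFrameHeader(b []byte) (frameHeader, error) {
	if len(b) < frameHeaderLen {
		return frameHeader{}, errors.New("frame header needs at least 9 bytes")
	}
	return frameHeader{
		Length:   uint32(b[0])<<16 | uint32(b[1])<<8 | uint32(b[2]),
		Type:     b[3],
		Flags:    b[4],
		StreamID: binary.BigEndian.Uint32(b[5:9]) & 0x7fffffff,
	}, nil
}

// EndStream reports whether this frame is the last one on its stream.
func (h frameHeader) EndStream() bool { return h.Flags&flagEndStream != 0 }

func main() {
	// A DATA frame (type 0x0) with a 5-byte body, END_STREAM set, on stream 1.
	raw := []byte{0x00, 0x00, 0x05, 0x00, 0x01, 0x00, 0x00, 0x00, 0x01}
	h, err := parseFrameHeader(raw)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v endStream=%v\n", h, h.EndStream())
}
```

A reader of the stream keeps consuming Length bytes of body per frame until EndStream reports true, which is the behavior described in the paragraph above.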

To make full use of HTTP/2's high-performance stream features such as server push and multiplexing, k8s, when implementing the RESTful watch, offers both HTTP/1.1 and HTTP/2 through the ALPN (Application-Layer Protocol Negotiation) mechanism, and the server prefers HTTP/2. The negotiation process is as follows:

curl  https://{kube-apiserver}/api/v1/watch/namespaces/default/pods/mysql-0 -v

* ALPN, offering h2
* ALPN, offering http/1.1
* SSL verify...
* ALPN, server accepted to use h2
* Using HTTP2, server supports multiplexing
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x7f2b921a6a90)
> GET /api/v1/watch/namespaces/default/pods/mysql-0 HTTP/2
> Host: 9.165.12.1
> user-agent: curl/7.79.1
> accept: */*
> authorization: Bearer xxx
> 
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* Connection state changed (MAX_CONCURRENT_STREAMS == 250)!

< HTTP/2 200 
< cache-control: no-cache, private
< content-type: application/json
< date: Thu, 17 Mar 2022 04:46:36 GMT

{"type":"ADDED","object":{"kind":"Pod","apiVersion":"v1","metadata":xxx}}
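The same negotiation can be observed from Go without a cluster. The sketch below is an illustration only: it starts a throwaway local TLS server from net/http/httptest (not kube-apiserver), and negotiatedProtocols is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// negotiatedProtocols starts a local TLS test server, issues one GET and
// reports the ALPN result and the HTTP version the request actually used.
func negotiatedProtocols(enableHTTP2 bool) (alpn, proto string, err error) {
	srv := httptest.NewUnstartedServer(http.HandlerFunc(
		func(w http.ResponseWriter, r *http.Request) {
			fmt.Fprint(w, r.Proto)
		}))
	srv.EnableHTTP2 = enableHTTP2 // when true, the server advertises "h2" via ALPN
	srv.StartTLS()
	defer srv.Close()

	// srv.Client() trusts the test certificate and attempts HTTP/2
	// when EnableHTTP2 is set.
	resp, err := srv.Client().Get(srv.URL)
	if err != nil {
		return "", "", err
	}
	defer resp.Body.Close()
	if _, err := io.Copy(io.Discard, resp.Body); err != nil {
		return "", "", err
	}
	return resp.TLS.NegotiatedProtocol, resp.Proto, nil
}

func main() {
	alpn, proto, err := negotiatedProtocols(true)
	if err != nil {
		panic(err)
	}
	fmt.Printf("ALPN=%q proto=%s\n", alpn, proto)
}
```

When both sides support it, the TLS handshake settles on "h2" before any HTTP request is sent, which corresponds to the "ALPN, server accepted to use h2" line in the curl output.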

Apiserver startup

Apiserver is started through a cobra command line. It parses the relevant flags and, after the Complete -> Validate logic, starts the service through Run. The startup entry point is as follows:

// kubernetes/cmd/kube-apiserver/app/server.go
// NewAPIServerCommand creates a *cobra.Command object with default parameters
func NewAPIServerCommand() *cobra.Command {
   s := options.NewServerRunOptions()
   cmd := &cobra.Command{
      Use: "kube-apiserver",
      ...
      RunE: func(cmd *cobra.Command, args []string) error {
         ...
         // set default options
         completedOptions, err := Complete(s)
         if err != nil {
            return err
         }

         // validate options
         if errs := completedOptions.Validate(); len(errs) != 0 {
            return utilerrors.NewAggregate(errs)
         }

         return Run(completedOptions, genericapiserver.SetupSignalHandler())
      },
   }
   ...

   return cmd
}

In the Run function, the apiserver chain (APIExtensionsServer, KubeAPIServer and AggregatorServer) is initialized in order, serving resource requests for CRDs (user-defined resources), the k8s API (built-in resources) and APIService (API extension resources) respectively. The relevant code is as follows:

// kubernetes/cmd/kube-apiserver/app/server.go
// Create the apiserver chain (APIExtensionsServer, KubeAPIServer and AggregatorServer) to serve CRD, k8s API and APIService respectively
func CreateServerChain(completedOptions completedServerRunOptions, stopCh

After that, SecureServingInfo.Serve is started in a non-blocking way (NonBlockingRun), the HTTP/2 transport options are configured (HTTP/2 is enabled by default), and finally Serve is started to listen for client requests.

For security reasons, k8s apiserver only supports HTTPS requests from clients, not plain HTTP.

Etcd resource encapsulation

The etcd watch mechanism went through a transition from etcd2 to etcd3. Etcd2 monitored resource change events through long polling; etcd3 implements watch streams through gRPC over HTTP/2, with greatly improved performance.

Polling: because HTTP/1.x has no server push mechanism, the simplest way to watch server-side data changes is for the client to pull: at fixed intervals the client pulls data from the server to synchronize, whether or not the server data has changed. This inevitably leads to delayed notifications and a large number of useless polls.

Long polling: an optimization of polling. When the client initiates a long poll and the server has no relevant data, the server holds the request until it has data to send or the request times out.

When the apiserver config was built in the previous step, the etcd used for underlying storage was encapsulated. We take kubeAPIServerConfig as an example to illustrate how k8s built-in resources encapsulate the etcd underlying storage.

First, buildGenericConfig instantiates a RESTOptionsGetter, which is used to encapsulate RESTStorage. Then InstallLegacyAPI -> NewLegacyRESTStorage instantiates the RESTStorage of the k8s built-in resources, including podStorage, nsStorage, pvStorage, serviceStorage and so on. These are the back-end resource stores that apiserver calls when handling client resource requests.

The source code of InstallLegacyAPI is as follows:

// kubernetes/pkg/controlplane/instance.go
// Register the k8s built-in resources and wrap them into the corresponding RESTStorage (such as podStorage / pvStorage)
func (m *Instance) InstallLegacyAPI(c *completedConfig, restOptionsGetter generic.RESTOptionsGetter) error {
   ...
   legacyRESTStorage, apiGroupInfo, err := legacyRESTStorageProvider.NewLegacyRESTStorage(c.ExtraConfig.APIResourceConfigSource, restOptionsGetter)
   if err != nil {
      return fmt.Errorf("error building core storage: %v", err)
   }
   if len(apiGroupInfo.VersionedResourcesStorageMap) == 0 { // if all core storage is disabled, return.
      return nil
   }

   controllerName := "bootstrap-controller"
   coreClient := corev1client.NewForConfigOrDie(c.GenericConfig.LoopbackClientConfig)
   bootstrapController, err := c.NewBootstrapController(legacyRESTStorage, coreClient, coreClient, coreClient, coreClient.RESTClient())
   if err != nil {
      return fmt.Errorf("error creating bootstrap controller: %v", err)
   }
   m.GenericAPIServer.AddPostStartHookOrDie(controllerName, bootstrapController.PostStartHook)
   m.GenericAPIServer.AddPreShutdownHookOrDie(controllerName, bootstrapController.PreShutdownHook)

   ...
   return nil
}

When the etcd underlying storage is instantiated, the EnableWatchCache switch controls whether the watch cache is enabled. If it is enabled, the StorageWithCacher logic is used first, and then UndecoratedStorage is called to actually reach the underlying etcd3 storage.

K8s currently supports only etcd3; etcd2 is no longer supported. K8s fully trusts the etcd3 watch mechanism to keep the resource state consistent with the underlying etcd storage.

The whole calling process is as follows:

All kinds of k8s resources (CRD / core / aggregator) expose HTTP request interfaces in RESTful style and support multiple encoding formats, such as JSON / YAML / protobuf.

Client watch implementation

After the above steps, the apiserver has prepared the RESTStorage for the various k8s resources (wrapping etcd3 underneath). The client can now send resource requests to apiserver through the RESTful HTTP interface, including GET / POST / PATCH / WATCH / DELETE and other operations.

The client watch includes:
(1). kubectl get xxx -w, which gets certain resources and continuously watches for resource changes;
(2). The Reflector in client-go, which performs ListAndWatch on the various resources of apiserver.

We take kubectl get pod -w as an example to illustrate how the client implements the watch operation on resources.

First, kubectl likewise parses its parameters (--watch or --watch-only) through a cobra command line. Run then calls the Watch interface under the cli-runtime package, which sends a watch request to apiserver through RESTClient.Watch and obtains a streaming watch.Interface; watch.Event values are then continuously read from its ResultChan. According to the encoding type (JSON / YAML / protobuf) sent by the client, the data is read from the stream and decoded frame by frame, and the output is displayed on the command-line terminal.

The client initiates a watch request through the RESTClient. The code is as follows:

// kubernetes/staging/src/k8s.io/cli-runtime/pkg/resource/helper.go
func (m *Helper) Watch(namespace, apiVersion string, options *metav1.ListOptions) (watch.Interface, error) {
   options.Watch = true
   return m.RESTClient.Get().
      NamespaceIfScoped(namespace, m.NamespaceScoped).
      Resource(m.Resource).
      VersionedParams(options, metav1.ParameterCodec).
      Watch(context.TODO())
}

The implementation process of client watch is summarized as follows:

Server watch implementation

After the apiserver starts, it continuously watches the change events of the various resources. On receiving a watch request for a certain type of resource, it calls the Watch interface of RESTStorage, with the EnableWatchCache switch controlling whether the watch cache is used. Finally, etcd3.Watch wraps the change events of the underlying etcd.

RESTStorage is the etcd resource storage that was registered and encapsulated in advance when apiserver started.

etcd3.watcher uses two channels (an incoming event channel and a result channel, each with a default capacity of 100) to convert the underlying etcd events into watch.Event. serveWatch then streams from the returned watch.Interface, continuously taking change events out of ResultChan. According to the encoding type (JSON / YAML / protobuf) sent by the client, the data is encoded, assembled into frames and sent on the stream to the client.

The server streams from the returned watch.Interface through serveWatch. The code is as follows:
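The two-channel arrangement can be modeled as a small pipeline: one goroutine feeds raw storage events into a buffered incoming channel, and another transforms them and publishes them on a buffered result channel. The sketch below is a simplified model, not the etcd3 watcher source; rawEvent, watchEvent, startWatch and transform are hypothetical names, and only the capacity of 100 is taken from the text.

```go
package main

import "fmt"

// rawEvent is a hypothetical stand-in for an event from the storage layer.
type rawEvent struct {
	key      string
	isCreate bool
	isDelete bool
}

// watchEvent is a hypothetical stand-in for the event handed to the API layer.
type watchEvent struct {
	Type string
	Key  string
}

const bufSize = 100 // capacity of both channels, as described above

// transform converts a storage-level event into an API-level event.
func transform(e rawEvent) watchEvent {
	switch {
	case e.isDelete:
		return watchEvent{Type: "DELETED", Key: e.key}
	case e.isCreate:
		return watchEvent{Type: "ADDED", Key: e.key}
	default:
		return watchEvent{Type: "MODIFIED", Key: e.key}
	}
}

// startWatch wires the two channels together and returns the result channel.
func startWatch(source []rawEvent) <-chan watchEvent {
	incoming := make(chan rawEvent, bufSize)
	result := make(chan watchEvent, bufSize)

	// Producer: pushes storage events into the incoming channel.
	go func() {
		defer close(incoming)
		for _, e := range source {
			incoming <- e
		}
	}()

	// Processor: transforms each event and publishes it for consumers.
	go func() {
		defer close(result)
		for e := range incoming {
			result <- transform(e)
		}
	}()

	return result
}

func main() {
	key := "/registry/pods/default/mysql-0"
	events := []rawEvent{
		{key: key, isCreate: true},
		{key: key},
		{key: key, isDelete: true},
	}
	for e := range startWatch(events) {
		fmt.Println(e.Type, e.Key)
	}
}
```

The buffers decouple the speed of the storage layer from the speed of the consumer: a briefly slow client does not immediately block the goroutine that receives storage events.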

// kubernetes/staging/src/k8s.io/apiserver/pkg/endpoints/handlers/get.go
func ListResource(r rest.Lister, rw rest.Watcher, scope *RequestScope, forceWatch bool, minRequestTimeout time.Duration) http.HandlerFunc {
   return func(w http.ResponseWriter, req *http.Request) {
      ...

      if opts.Watch || forceWatch {
         ...
         watcher, err := rw.Watch(ctx, &opts)
         if err != nil {
            scope.err(err, w, req)
            return
         }
         requestInfo, _ := request.RequestInfoFrom(ctx)
         metrics.RecordLongRunning(req, requestInfo, metrics.APIServerComponent, func() {
            serveWatch(watcher, scope, outputMediaType, req, w, timeout)
         })
         return
      }
      ...
   }
}

Since v1.11, k8s has deprecated the WATCH / WATCHLIST action verbs, and these requests are handled uniformly by LIST -> restfulListResource.

The implementation process of server watch is summarized as follows:

In addition to HTTP/2, apiserver also supports WebSocket communication. When the client request includes Upgrade: websocket and Connection: Upgrade, the server transmits data to the client over WebSocket.

It is worth noting that the underlying etcd events are converted to watch.Event through the transform function, including the following types:

Summary

This article analyzes the implementation mechanism of k8s watch by walking through the core processes of apiserver startup, etcd watch encapsulation, the server watch implementation and the client watch implementation in k8s. The relevant process logic is explained through source code and diagrams in order to better understand the implementation details of k8s watch.

At the bottom layer, k8s fully trusts etcd (ListAndWatch). It abstracts all kinds of resources as RESTful storage, obtains their change events through the watch mechanism, and then distributes them to the downstream ResourceEventHandler listeners through the informer mechanism. Finally, the controller carries out the business logic for the resources. As etcd3 continues to be optimized and improved on top of HTTP/2, k8s will provide more efficient and stable orchestration capability.
