OpenKruise v0.5.0 released, with support for lossless streaming and batch release strategies

Time:2020-9-28

Author: Jiuzhu (酒祝), Alibaba Cloud technical expert

Reading guide: OpenKruise is Alibaba Cloud's open-source engine for automated management of large-scale applications. Functionally it is comparable to Kubernetes' native Deployment / StatefulSet controllers, but it provides more enhanced features, such as graceful in-place upgrade, release priority / scatter strategies, workload abstraction across multiple availability zones, and unified sidecar container injection management. These are core capabilities honed in Alibaba's ultra-large-scale application scenarios. They help us cope with more diverse deployment environments and requirements, and give cluster maintainers and application developers more flexible combinations of deployment and release strategies.

At present, in Alibaba's internal cloud-native environment, the vast majority of applications use OpenKruise for pod deployment and release management. Many companies in the industry and Alibaba Cloud customers have also turned to OpenKruise as their application deployment vehicle, because native K8s workloads such as Deployment cannot fully meet their needs.

Background issues

Before introducing the new capabilities of OpenKruise, let's look at the release capabilities provided by native K8s workloads:

  • Deployment currently supports maxUnavailable and maxSurge.

  • StatefulSet currently supports partition.

  • Other workloads, such as DaemonSet, only support maxUnavailable.
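
For reference, these native fields appear in manifests roughly as follows. This is a minimal sketch, not a complete set of manifests: the `demo-*` names and all field values are illustrative assumptions, and the selector and pod template that a real manifest requires are elided.

```yaml
# Deployment: rolling update bounded by maxSurge and maxUnavailable
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-deploy          # hypothetical name
spec:
  replicas: 5
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 25%          # extra pods allowed above replicas during the update
      maxUnavailable: 25%    # pods allowed to be unavailable during the update
  # selector and template omitted
---
# StatefulSet: only pods whose ordinal is >= partition are updated
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: demo-sts             # hypothetical name
spec:
  replicas: 5
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      partition: 3           # demo-sts-3 and demo-sts-4 are updated, 0..2 stay on the old revision
  # serviceName, selector and template omitted
---
# DaemonSet: rolling update bounded by maxUnavailable only
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: demo-ds              # hypothetical name
spec:
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
  # selector and template omitted
```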

These strategies work in test environments or small-scale scenarios, but they cannot fully meet the needs of large-scale applications. For example:

  • First, Deployment does not support grayscale release in batches. Want to upgrade only 20% of the pods for verification? Sorry, it can't be done. Users can only set a small maxUnavailable and wait for the whole rollout to finish, or pause it urgently when something goes wrong during the release;
  • StatefulSet does support grayscale release through partition, but at present it can only upgrade one pod at a time. If the total number of replicas is in the hundreds or thousands, a single release may take until dark.

New features in v0.5.0

Here we only introduce the two main feature changes in v0.5.0, in CloneSet and SidecarSet. Interested readers can find the full list of changes in the GitHub changelog: https://github.com/openkruise/kruise/blob/master/CHANGELOG.md

CloneSet supports the maxSurge strategy

In Alibaba's cloud-native environment, the vast majority of stateless applications are managed by CloneSet. To meet the demanding deployment requirements of ultra-large-scale applications, it supports:

  • In-place upgrade (the pod object, IP and volumes remain unchanged before and after the release; only the container image is upgraded)
  • Deleting specified pods when scaling down replicas
  • Rich release strategies (streaming, grayscale batching, priority, scatter, etc.)

We open-sourced CloneSet in February. It has attracted wide attention since its release, and a number of well-known Internet companies are already using it.

The initial version of CloneSet did not support maxSurge (scale up first, then scale down); it only supported maxUnavailable, partition and similar policies. This is not a problem for Alibaba's internal large-scale applications, but many community users run small-scale applications on their platforms. If they cannot configure scale-up-before-scale-down, application availability may be affected during a release.

After receiving this feedback through community issues, we added support for the maxSurge strategy to CloneSet in v0.5.0. With this, CloneSet covers all the release policies of native K8s workloads. The following figure summarizes the release capabilities currently provided by CloneSet:

[Figure: release capabilities currently provided by CloneSet]

We will not describe CloneSet's release strategies in detail here; a dedicated article will cover them later. For now, let's look at how the new maxSurge works together with streaming and batch release, using a few simple examples.

  1. Releasing with maxSurge + maxUnavailable + partition

apiVersion: apps.kruise.io/v1alpha1
kind: CloneSet
# ...
spec:
  replicas: 5           # the total number of pods is 5
  updateStrategy:
    maxSurge: 20%       # scale out 5 * 20% = 1 extra pod (rounded up)
    maxUnavailable: 0   # keep 5 - 0 = 5 pods available during the release
    partition: 3        # keep 3 pods on the old version (release only 5 - 3 = 2 pods)

When the release starts, CloneSet first scales out one pod according to maxSurge. The total number of pods is now 6 (5 old, 1 new):

$ kubectl get clone demo
NAME    DESIRED   UPDATED   UPDATED_READY   READY   TOTAL   AGE
demo    5         1         0               5       6       17m

CloneSet then gradually updates pods by deleting and recreating them, while honoring maxUnavailable, until partition = 3 is satisfied, that is, until three old-version pods remain. Because the expected end state has now been reached, CloneSet deletes one new-version pod. The total number of pods is now 5 (3 old, 2 new):

$ kubectl get clone demo
NAME    DESIRED   UPDATED   UPDATED_READY   READY   TOTAL   AGE
demo    5         2         2               5       5       17m

At this point we can observe the new version for a while. When you want to continue the release, change partition to 0. CloneSet then scales out one more pod according to maxSurge, and the total number of pods becomes 6 (3 old, 3 new):

$ kubectl get clone demo
NAME    DESIRED   UPDATED   UPDATED_READY   READY   TOTAL   AGE
demo    5         3         2               5       6       17m

CloneSet then continues updating pods by deleting and recreating them, while honoring maxUnavailable, until partition = 0 is satisfied, that is, until all pods are upgraded to the new version. Finally, CloneSet deletes one new-version pod. The total number of pods is now 5 (5 new):

$ kubectl get clone demo
NAME    DESIRED   UPDATED   UPDATED_READY   READY   TOTAL   AGE
demo    5         5         5               5       5       17m

  2. maxSurge with in-place upgrade

CloneSet provides both in-place upgrade and recreate upgrade for pods. Either can be combined with the maxSurge / maxUnavailable / partition policies.

apiVersion: apps.kruise.io/v1alpha1
kind: CloneSet
# ...
spec:
  updateStrategy:
    type: InPlaceIfPossible
    maxSurge: 20%

If maxSurge is configured in in-place upgrade mode, CloneSet first scales out maxSurge extra pods, then upgrades the old-version pods in place (by updating the image in the pod spec), and finally, once the partition end state is reached, deletes the maxSurge surplus pods.

This both guarantees service availability during the release and keeps the pod's IP, volumes and other information unchanged as far as possible.
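
Putting the two examples together, a complete manifest might look like the sketch below. It only combines the fields already shown above; the name `demo`, the `app: demo` label and the image reference are placeholders chosen for illustration, and the numeric values are simply reused from the first example.

```yaml
apiVersion: apps.kruise.io/v1alpha1
kind: CloneSet
metadata:
  name: demo                         # hypothetical name, matching the kubectl output above
spec:
  replicas: 5
  selector:
    matchLabels:
      app: demo                      # hypothetical label
  template:
    metadata:
      labels:
        app: demo
    spec:
      containers:
      - name: app
        image: example.com/demo:v2   # placeholder image; changing it triggers a release
  updateStrategy:
    type: InPlaceIfPossible          # upgrade in place when possible
    maxSurge: 20%                    # scale out 1 extra pod before upgrading
    maxUnavailable: 0                # never drop below 5 available pods
    partition: 3                     # stop after 2 pods are on the new version
```

Lowering `partition` step by step (for example 3, then 0) releases the remaining pods in batches, as in the first example.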

SidecarSet supports volume injection merging

SidecarSet is another major feature provided by Kruise. Unlike CloneSet / StatefulSet, which are workloads that manage pods, SidecarSet manages the versions and injection of sidecar containers across the cluster.

The new feature in v0.5.0 resolves the conflict that arises during sidecar container injection when the same volume is defined in both the SidecarSet and the pod. This also came from community feedback, in issue #254: the users manage a log-collection sidecar with SidecarSet and want it injected into all pods as a bypass.

For example, suppose we need to inject a log-collection sidecar container into every pod in the cluster. First, we cannot ask every application developer to add this container definition to their own CloneSet / Deployment. Second, even if the container were added to every application workload, upgrading the image version of the log-collection container would require updating every application's workload, which is far too costly.

The SidecarSet provided by OpenKruise is designed to solve exactly this problem. We only need to write the sidecar definition into a global SidecarSet. Whether users deploy with CloneSet, Deployment, StatefulSet or something else, the pods that are created will be injected with the sidecar container we defined.

Taking log collection as an example, we can first define a SidecarSet:

apiVersion: apps.kruise.io/v1alpha1
kind: SidecarSet
metadata:
  name: log-sidecar
spec:
  selector:
    matchLabels:
      app-type: long-term   # inject into all pods carrying the long-term label
  containers:
  - name: log-collector
    image: xxx:latest
    volumeMounts:
    - name: log-volume
      mountPath: /var/log   # mount log-volume at /var/log and collect the logs under this directory
  volumes:
  - name: log-volume        # defines a volume named log-volume
    emptyDir: {}

You may ask: what if each application writes its logs to a different directory? Don't worry, this is exactly what volume merging is for.

For example, suppose the original pod of application A is as follows:

apiVersion: v1
kind: Pod
metadata:
  labels:
    app-type: long-term
spec:
  containers:
  - name: app
    image: xxx:latest
    volumeMounts:
    - name: log-volume
      mountPath: /app/logs   # the application's own log directory
  volumes:
  - name: log-volume         # defines a volume named log-volume
    persistentVolumeClaim:
      claimName: pvc-xxx

The Kruise webhook then injects the log-collector sidecar container defined in the SidecarSet into the pod:

apiVersion: v1
kind: Pod
metadata:
  labels:
    app-type: long-term
spec:
  containers:
  - name: app
    image: xxx:latest
    volumeMounts:
    - name: log-volume
      mountPath: /app/logs   # the application's own log directory
  - name: log-collector
    image: xxx:latest
    volumeMounts:
    - name: log-volume
      mountPath: /var/log
  volumes:
  - name: log-volume         # the volume defined in the pod is kept
    persistentVolumeClaim:
      claimName: pvc-xxx

As you can see, because the volume is named log-volume in both the SidecarSet and the pod, the volume defined in the pod prevails during injection. Here the pod's volume uses a PVC to mount a PV, so after injection the same volume is also mounted at /var/log in the sidecar container, and log collection can proceed.

Managing the sidecar container through SidecarSet in this way decouples it from application deployment and release, while still letting it share volumes with the application container to implement sidecar functions such as log collection and monitoring.
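
The same merging should apply to pods created by other workloads, since injection happens when the pod is created. As a minimal sketch, assume a hypothetical application `app-b` deployed with a native Deployment that writes its logs to `/data/logs`; the names, label value `app-b` and paths are illustrative only.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-b                    # hypothetical application
spec:
  replicas: 2
  selector:
    matchLabels:
      app: app-b
  template:
    metadata:
      labels:
        app: app-b
        app-type: long-term      # matches the selector of the log-sidecar SidecarSet
    spec:
      containers:
      - name: app
        image: xxx:latest
        volumeMounts:
        - name: log-volume
          mountPath: /data/logs  # this application's own log directory (assumed)
      volumes:
      - name: log-volume         # same name as in the SidecarSet, so this definition prevails
        emptyDir: {}
```

With the SidecarSet defined earlier, each pod of this Deployment would be expected to receive the log-collector container with log-volume mounted at /var/log, without any change to the Deployment itself.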

Summary

Version v0.5.0 mainly brings lossless application release and more convenient sidecar container management.

In the future, OpenKruise will keep optimizing its application deployment and release capabilities. We also welcome more people to join the OpenKruise community and jointly build richer and more complete K8s application management, delivery and extension capabilities for larger-scale, more complex and more performance-demanding scenarios.

This article is original content of Alibaba Cloud and may not be reproduced without permission.