Flagger on ASM – Progressive Canary Release Series (3): Progressive Canary Release Based on Mixerless Telemetry

Time: 2021-09-18

Introduction: As a CNCF member, Weave Flagger provides capabilities for continuous integration and continuous delivery. Flagger summarizes progressive releases into three categories:

  • Grayscale / canary release: used for progressive traffic shifting
  • A/B testing: used to route user requests to the A or B version based on HTTP headers and cookies
  • Blue/green release: used for traffic switching and mirroring

This article introduces the practice of progressive canary releases with Flagger on ASM.

Setup Flagger

1 Deploy Flagger

Execute the following commands to deploy Flagger (for the complete script, see demo_canary.sh):

alias k="kubectl --kubeconfig $USER_CONFIG"
alias h="helm --kubeconfig $USER_CONFIG"

cp $MESH_CONFIG kubeconfig
k -n istio-system create secret generic istio-kubeconfig --from-file kubeconfig
k -n istio-system label secret istio-kubeconfig istio/multiCluster=true

h repo add flagger https://flagger.app
h repo update
k apply -f $FLAAGER_SRC/artifacts/flagger/crd.yaml
h upgrade -i flagger flagger/flagger --namespace=istio-system \
    --set crd.create=false \
    --set meshProvider=istio \
    --set metricsServer=http://prometheus:9090 \
    --set istio.kubeconfig.secretName=istio-kubeconfig \
    --set istio.kubeconfig.key=kubeconfig
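
Before moving on, you can optionally confirm that the Flagger controller is up and connected (a quick sanity check, not part of the original script):

# Check the Flagger deployment and scan its recent logs for startup errors
kubectl --kubeconfig "$USER_CONFIG" -n istio-system get deploy/flagger
kubectl --kubeconfig "$USER_CONFIG" -n istio-system logs deploy/flagger --tail=20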

2 Deploy a gateway

During the canary release, Flagger asks ASM to update the VirtualService used for canary traffic routing. This VirtualService references a Gateway named public-gateway. To that end, we create the gateway configuration file public-gateway.yaml as follows:

apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: public-gateway
  namespace: istio-system
spec:
  selector:
    istio: ingressgateway
  servers:
    - port:
        number: 80
        name: http
        protocol: HTTP
      hosts:
        - "*"

Execute the following command to deploy the gateway:

kubectl --kubeconfig "$MESH_CONFIG" apply -f resources_canary/public-gateway.yaml

3 Deploy flagger-loadtester

flagger-loadtester is used during the canary analysis stage to probe and load-test the application running in the canary pod instances.

Execute the following command to deploy flagger loadtester:

kubectl --kubeconfig "$USER_CONFIG" apply -k "https://github.com/fluxcd/flagger//kustomize/tester?ref=main"

4 Deploy podinfo and its HPA

For now, we use the HPA configuration that ships with the Flagger distribution (an operations-level HPA). Once the complete process works end to end, we will switch to an application-level HPA.

Execute the following command to deploy podinfo and its HPA:

kubectl --kubeconfig "$USER_CONFIG" apply -k "https://github.com/fluxcd/flagger//kustomize/podinfo?ref=main"

Progressive canary release

1 Deploy the Canary resource

Canary is the core CRD of a Flagger-based canary release (see "How it works"). We first deploy the following Canary configuration file, podinfo-canary.yaml, to complete the full progressive canary process, and later introduce application-level monitoring metrics to further achieve an application-aware progressive canary release.

apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: podinfo
  namespace: test
spec:
  # deployment reference
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  # the maximum time in seconds for the canary deployment
  # to make progress before it is rollback (default 600s)
  progressDeadlineSeconds: 60
  # HPA reference (optional)
  autoscalerRef:
    apiVersion: autoscaling/v2beta2
    kind: HorizontalPodAutoscaler
    name: podinfo
  service:
    # service port number
    port: 9898
    # container port number or name (optional)
    targetPort: 9898
    # Istio gateways (optional)
    gateways:
    - public-gateway.istio-system.svc.cluster.local
    # Istio virtual service host names (optional)
    hosts:
    - '*'
    # Istio traffic policy (optional)
    trafficPolicy:
      tls:
        # use ISTIO_MUTUAL when mTLS is enabled
        mode: DISABLE
    # Istio retry policy (optional)
    retries:
      attempts: 3
      perTryTimeout: 1s
      retryOn: "gateway-error,connect-failure,refused-stream"
  analysis:
    # schedule interval (default 60s)
    interval: 1m
    # max number of failed metric checks before rollback
    threshold: 5
    # max traffic percentage routed to canary
    # percentage (0-100)
    maxWeight: 50
    # canary increment step
    # percentage (0-100)
    stepWeight: 10
    metrics:
    - name: request-success-rate
      # minimum req success rate (non 5xx responses)
      # percentage (0-100)
      thresholdRange:
        min: 99
      interval: 1m
    - name: request-duration
      # maximum req duration P99
      # milliseconds
      thresholdRange:
        max: 500
      interval: 30s
    # testing (optional)
    webhooks:
      - name: acceptance-test
        type: pre-rollout
        url: http://flagger-loadtester.test/
        timeout: 30s
        metadata:
          type: bash
          cmd: "curl -sd 'test' http://podinfo-canary:9898/token | grep token"
      - name: load-test
        url: http://flagger-loadtester.test/
        timeout: 5s
        metadata:
          cmd: "hey -z 1m -q 10 -c 2 http://podinfo-canary.test:9898/"

Execute the following command to deploy Canary:

kubectl --kubeconfig "$USER_CONFIG" apply -f resources_canary/podinfo-canary.yaml

After the Canary resource is deployed, Flagger copies the deployment named podinfo to podinfo-primary and scales podinfo-primary up to the minimum number of pods defined by the HPA. It then gradually scales the podinfo deployment down to 0. In other words, podinfo acts as the canary deployment, while podinfo-primary acts as the production deployment.

At the same time, Flagger creates three services: podinfo, podinfo-primary, and podinfo-canary. The first two point to the podinfo-primary deployment, and the last one points to the podinfo deployment.
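
The result of this bootstrap can be inspected directly (an optional check; exact replica counts depend on the HPA):

kubectl --kubeconfig "$USER_CONFIG" -n test get deploy,svc
# Expect: deployment podinfo scaled down to 0, podinfo-primary running,
# and the three services podinfo, podinfo-primary, podinfo-canary.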

2 Upgrade podinfo

Execute the following command to upgrade the canary deployment from version 3.1.0 to 3.1.1:

kubectl --kubeconfig "$USER_CONFIG" -n test set image deployment/podinfo podinfod=stefanprodan/podinfo:3.1.1

3 Progressive canary release

At this point, Flagger starts the progressive canary release process described in the first part of this series. The main steps are briefly summarized as follows:

  1. Gradually scale up the canary pods and verify them
  2. Progressively shift traffic and verify
  3. Roll out the production deployment and verify
  4. Shift 100% of the traffic back to production
  5. Scale the canary pods down to 0

We can observe the progressive traffic shifting with the following command:

while true; do kubectl --kubeconfig "$USER_CONFIG" -n test describe canary/podinfo; sleep 10s;done

The log information output is as follows:

Events:
  Type     Reason  Age                From     Message
  ----     ------  ----               ----     -------
  Warning  Synced  39m                flagger  podinfo-primary.test not ready: waiting for rollout to finish: observed deployment generation less then desired generation
  Normal   Synced  38m (x2 over 39m)  flagger  all the metrics providers are available!
  Normal   Synced  38m                flagger  Initialization done! podinfo.test
  Normal   Synced  37m                flagger  New revision detected! Scaling up podinfo.test
  Normal   Synced  36m                flagger  Starting canary analysis for podinfo.test
  Normal   Synced  36m                flagger  Pre-rollout check acceptance-test passed
  Normal   Synced  36m                flagger  Advance podinfo.test canary weight 10
  Normal   Synced  35m                flagger  Advance podinfo.test canary weight 20
  Normal   Synced  34m                flagger  Advance podinfo.test canary weight 30
  Normal   Synced  33m                flagger  Advance podinfo.test canary weight 40
  Normal   Synced  29m (x4 over 32m)  flagger  (combined from similar events): Promotion completed! Scaling down podinfo.test

The corresponding Kiali view (optional) is shown in the following figure:

(Figure: Kiali view of the canary traffic shifting)

So far, we have completed a full progressive canary release. The remaining sections are extended reading.

Application-level scaling during the canary release

Having completed the progressive canary release above, let's look at the HPA reference in the Canary configuration:

  autoscalerRef:
    apiVersion: autoscaling/v2beta2
    kind: HorizontalPodAutoscaler
    name: podinfo

This HPA, named podinfo, is the one that ships with Flagger. It scales up the canary deployment when its CPU utilization reaches 99%. The complete configuration is as follows:

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: podinfo
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  minReplicas: 2
  maxReplicas: 4
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          # scale up if usage is above
          # 99% of the requested CPU (100m)
          averageUtilization: 99

In the previous article, we described the practice of application-level scaling. Here, we apply it to the canary release process.

1 QPS-aware HPA

Execute the following command to deploy an HPA that is aware of the number of application requests and scales up when QPS reaches 10 (for the complete script, see advanced_canary.sh):

kubectl --kubeconfig "$USER_CONFIG" apply -f resources_hpa/requests_total_hpa.yaml

Accordingly, the Canary configuration is updated to:

  autoscalerRef:
    apiVersion: autoscaling/v2beta2
    kind: HorizontalPodAutoscaler
    name: podinfo-total

2 Upgrade podinfo

Execute the following command to upgrade the canary deployment from version 3.1.0 to 3.1.1:

kubectl --kubeconfig "$USER_CONFIG" -n test set image deployment/podinfo podinfod=stefanprodan/podinfo:3.1.1

3 Verify the progressive canary release and HPA

Use the following command to observe the progressive traffic shifting:

while true; do k -n test describe canary/podinfo; sleep 10s;done

During the progressive canary release, when the message Advance podinfo.test canary weight 10 appears (see the figure below), we use the following commands to send requests through the ingress gateway and increase QPS:

INGRESS_GATEWAY=$(kubectl --kubeconfig $USER_CONFIG -n istio-system get service istio-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
hey -z 20m -c 2 -q 10 http://$INGRESS_GATEWAY

Use the following command to observe the progress of the canary release:

watch kubectl --kubeconfig $USER_CONFIG get canaries --all-namespaces

Use the following command to observe the change in the number of replicas managed by the HPA:

watch kubectl --kubeconfig $USER_CONFIG -n test get hpa/podinfo-total

As shown in the following figure, during the progressive canary release, when 30% of the traffic has been shifted to the canary, the number of canary replicas is 4:

(Figure: canary weight at 30% with 4 canary replicas)

Application-level monitoring metrics during the canary release

Building on the application-level scaling above, let's finally look at the metrics configuration in the Canary resource:

  analysis:
    metrics:
    - name: request-success-rate
      # minimum req success rate (non 5xx responses)
      # percentage (0-100)
      thresholdRange:
        min: 99
      interval: 1m
    - name: request-duration
      # maximum req duration P99
      # milliseconds
      thresholdRange:
        max: 500
      interval: 30s
    # testing (optional)

1 Flagger built-in monitoring metrics

So far, the metrics used in the Canary configuration have been two of Flagger's built-in monitoring metrics: request success rate (request-success-rate) and request duration (request-duration). As shown in the following figure, Flagger defines these built-in metrics differently for each platform; for Istio, it uses the Mixerless Telemetry data introduced in the first part of this series.

(Figure: Flagger built-in metric definitions per mesh provider)
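
For reference, the Istio flavor of request-success-rate evaluates a PromQL expression over istio_requests_total roughly of the following shape (an illustrative approximation only, not Flagger's exact built-in definition; consult the Flagger documentation for the authoritative query):

# Percentage of non-5xx requests for the test/podinfo workload over the last minute
sum(
    rate(
        istio_requests_total{
          reporter="destination",
          destination_workload_namespace="test",
          destination_workload="podinfo",
          response_code!~"5.*"
        }[1m]
    )
)
/
sum(
    rate(
        istio_requests_total{
          reporter="destination",
          destination_workload_namespace="test",
          destination_workload="podinfo"
        }[1m]
    )
) * 100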

2 User-defined monitoring metrics

To show the additional flexibility that telemetry data brings to validating the canary environment during a release, we take istio_requests_total as an example and create a MetricTemplate named not-found-percentage, which calculates the percentage of requests returning a 404 error code out of the total number of requests.

The configuration file metrics-404.yaml is as follows (for the complete script, see advanced_canary.sh):

apiVersion: flagger.app/v1beta1
kind: MetricTemplate
metadata:
  name: not-found-percentage
  namespace: istio-system
spec:
  provider:
    type: prometheus
    address: http://prometheus.istio-system:9090
  query: |
    100 - sum(
        rate(
            istio_requests_total{
              reporter="destination",
              destination_workload_namespace="{{ namespace }}",
              destination_workload="{{ target }}",
              response_code!="404"
            }[{{ interval }}]
        )
    )
    /
    sum(
        rate(
            istio_requests_total{
              reporter="destination",
              destination_workload_namespace="{{ namespace }}",
              destination_workload="{{ target }}"
            }[{{ interval }}]
        )
    ) * 100

Execute the following command to create the above MetricTemplate:

k apply -f resources_canary2/metrics-404.yaml
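
Optionally, you can sanity-check the query behind this template directly against Prometheus before relying on it in the analysis (a hedged example, assuming the in-cluster Prometheus service in istio-system, referenced by the template address, is reachable via port-forward):

# Forward the in-cluster Prometheus locally
kubectl --kubeconfig "$USER_CONFIG" -n istio-system port-forward svc/prometheus 9090:9090 &

# Evaluate the denominator of the template for the test/podinfo workload
curl -s 'http://localhost:9090/api/v1/query' --data-urlencode \
  'query=sum(rate(istio_requests_total{reporter="destination",destination_workload_namespace="test",destination_workload="podinfo"}[1m]))'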

Accordingly, the metrics configuration in the Canary resource is updated to:

  analysis:
    metrics:
      - name: "404s percentage"
        templateRef:
          name: not-found-percentage
          namespace: istio-system
        thresholdRange:
          max: 5
        interval: 1m

3 Final verification

Finally, we run the complete experiment script advanced_canary.sh in one go. The script is as follows:

#!/usr/bin/env sh
SCRIPT_PATH="$(
    cd "$(dirname "$0")" >/dev/null 2>&1
    pwd -P
)/"
cd "$SCRIPT_PATH" || exit

source config
alias k="kubectl --kubeconfig $USER_CONFIG"
alias m="kubectl --kubeconfig $MESH_CONFIG"
alias h="helm --kubeconfig $USER_CONFIG"

echo "#### I Bootstrap ####"
echo "1 Create a test namespace with Istio sidecar injection enabled:"
k delete ns test
m delete ns test
k create ns test
m create ns test
m label namespace test istio-injection=enabled

echo "2 Create a deployment and a horizontal pod autoscaler:"
k apply -f $FLAAGER_SRC/kustomize/podinfo/deployment.yaml -n test
k apply -f resources_hpa/requests_total_hpa.yaml
k get hpa -n test

echo "3 Deploy the load testing service to generate traffic during the canary analysis:"
k apply -k "https://github.com/fluxcd/flagger//kustomize/tester?ref=main"

k get pod,svc -n test
echo "......"
sleep 40s

echo "4 Create a canary custom resource:"
k apply -f resources_canary2/metrics-404.yaml
k apply -f resources_canary2/podinfo-canary.yaml

k get pod,svc -n test
echo "......"
sleep 120s

echo "#### III Automated canary promotion ####"

echo "1 Trigger a canary deployment by updating the container image:"
k -n test set image deployment/podinfo podinfod=stefanprodan/podinfo:3.1.1

echo "2 Flagger detects that the deployment revision changed and starts a new rollout:"

while true; do k -n test describe canary/podinfo; sleep 10s;done

Execute the complete experiment script with the following command:

sh progressive_delivery/advanced_canary.sh

The experimental results are as follows:


#### I Bootstrap ####
1 Create a test namespace with Istio sidecar injection enabled:
namespace "test" deleted
namespace "test" deleted
namespace/test created
namespace/test created
namespace/test labeled
2 Create a deployment and a horizontal pod autoscaler:
deployment.apps/podinfo created
horizontalpodautoscaler.autoscaling/podinfo-total created
NAME            REFERENCE            TARGETS              MINPODS   MAXPODS   REPLICAS   AGE
podinfo-total   Deployment/podinfo   <unknown>/10 (avg)   1         5         0          0s
3 Deploy the load testing service to generate traffic during the canary analysis:
service/flagger-loadtester created
deployment.apps/flagger-loadtester created
NAME                                      READY   STATUS     RESTARTS   AGE
pod/flagger-loadtester-76798b5f4c-ftlbn   0/2     Init:0/1   0          1s
pod/podinfo-689f645b78-65n9d              1/1     Running    0          28s

NAME                         TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
service/flagger-loadtester   ClusterIP   172.21.15.223   <none>        80/TCP    1s
......
4 Create a canary custom resource:
metrictemplate.flagger.app/not-found-percentage created
canary.flagger.app/podinfo created
NAME                                      READY   STATUS    RESTARTS   AGE
pod/flagger-loadtester-76798b5f4c-ftlbn   2/2     Running   0          41s
pod/podinfo-689f645b78-65n9d              1/1     Running   0          68s

NAME                         TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
service/flagger-loadtester   ClusterIP   172.21.15.223   <none>        80/TCP    41s
......
#### III Automated canary promotion ####
1 Trigger a canary deployment by updating the container image:
deployment.apps/podinfo image updated
2 Flagger detects that the deployment revision changed and starts a new rollout:

Events:
  Type     Reason  Age                  From     Message
  ----     ------  ----                 ----     -------
  Warning  Synced  10m                  flagger  podinfo-primary.test not ready: waiting for rollout to finish: observed deployment generation less then desired generation
  Normal   Synced  9m23s (x2 over 10m)  flagger  all the metrics providers are available!
  Normal   Synced  9m23s                flagger  Initialization done! podinfo.test
  Normal   Synced  8m23s                flagger  New revision detected! Scaling up podinfo.test
  Normal   Synced  7m23s                flagger  Starting canary analysis for podinfo.test
  Normal   Synced  7m23s                flagger  Pre-rollout check acceptance-test passed
  Normal   Synced  7m23s                flagger  Advance podinfo.test canary weight 10
  Normal   Synced  6m23s                flagger  Advance podinfo.test canary weight 20
  Normal   Synced  5m23s                flagger  Advance podinfo.test canary weight 30
  Normal   Synced  4m23s                flagger  Advance podinfo.test canary weight 40
  Normal   Synced  23s (x4 over 3m23s)  flagger  (combined from similar events): Promotion completed! Scaling down podinfo.test
