K8s and etcd – backup etcd data to S3

Time: 2019-03-11

Preface

Most components of k8s are stateless; all cluster state is stored in etcd, which is effectively the database of the entire k8s cluster. Its importance is hard to overstate, so backing up etcd data properly is critical. This article describes our related practice.

Backup etcd data to S3

There are many schemes for backing up etcd, but they are all similar: basically, they use the etcdctl command to take a snapshot.
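For reference, the core of any such scheme is an etcdctl v3 snapshot followed by an upload. A minimal manual sketch (the endpoint, certificate paths, and bucket name are assumptions that mirror the CronJob used later in this article):

```shell
# Take an etcd v3 snapshot and copy it to S3.
export ETCDCTL_API=3
SNAPSHOT="etcd-snapshot-$(date +%Y%m%d-%H%M%S).db"
etcdctl --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  snapshot save "/tmp/${SNAPSHOT}"
aws s3 cp "/tmp/${SNAPSHOT}" "s3://mybucket/k8s-prod-1/${SNAPSHOT}"
```

This must run on (or have network access to) an etcd member; the etcd-backup tool below automates exactly this loop.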

Why S3?

  • Our company uses AWS heavily, and we want backups in highly available storage rather than on the local disks of the etcd nodes themselves.
  • In addition, S3 supports lifecycle rules for stored objects: once configured, AWS automatically deletes old backups for us while retaining the recent ones.
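As an illustration, a lifecycle rule of the kind we mean might look like this (the bucket name, prefix, and 30-day retention are placeholder choices for the sketch, not values from our setup):

```shell
# Write an S3 lifecycle policy that expires backup objects under the
# k8s-prod-1/ prefix after 30 days, so old snapshots are pruned automatically.
cat > lifecycle.json <<'EOF'
{
  "Rules": [
    {
      "ID": "expire-old-etcd-backups",
      "Filter": { "Prefix": "k8s-prod-1/" },
      "Status": "Enabled",
      "Expiration": { "Days": 30 }
    }
  ]
}
EOF
# Apply it with the AWS CLI:
#   aws s3api put-bucket-lifecycle-configuration \
#     --bucket mybucket --lifecycle-configuration file://lifecycle.json
```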

Specific plan

We basically adopted the etcd-backup project (forked, of course) with a slight change, mainly to the Dockerfile: updating etcdctl to the version we actually run online.

The modified Dockerfile is as follows:

FROM alpine:3.8

RUN apk add --no-cache curl

# Get etcdctl
ENV ETCD_VER=v3.2.24
RUN \
  cd /tmp && \
  curl -L https://storage.googleapis.com/etcd/${ETCD_VER}/etcd-${ETCD_VER}-linux-amd64.tar.gz | \
  tar xz -C /usr/local/bin --strip-components=1

COPY ./etcd-backup /
ENTRYPOINT ["/etcd-backup"]
CMD ["-h"]

Then build and push the image as usual.
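For reference, the build step might look like the following (the Go build flags are assumptions; the image name and tag match the ones used in the CronJob manifest below):

```shell
# Build a static Linux binary of etcd-backup, then build and push the image.
CGO_ENABLED=0 GOOS=linux go build -o etcd-backup .
docker build -t iyacontrol/etcd-backup:0.1 .
docker push iyacontrol/etcd-backup:0.1
```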

K8s deployment scheme

A CronJob is the natural fit in k8s. Our strategy is to take a backup every four hours.

cronjob.yaml:

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: etcd-backup
  namespace: kube-system
spec:
  schedule: "0 */4 * * *"
  successfulJobsHistoryLimit: 2
  failedJobsHistoryLimit: 2
  jobTemplate:
    spec:
      # Job timeout
      activeDeadlineSeconds: 300
      template:
        spec:
          tolerations:
          # Tolerate master taint
          - key: node-role.kubernetes.io/master
            operator: Exists
            effect: NoSchedule
          # Container creates etcd backups.
          # Run container in host network mode on k8s masters
          # to be able to use 127.0.0.1 as etcd address.
          # For etcd v2 backups the container needs access
          # to the etcd data directory. To achieve that,
          # mount /var/lib/etcd as a volume.
          nodeSelector:
            node-role.kubernetes.io/master: ""
          containers:
          - name: etcd-backup
            image: iyacontrol/etcd-backup:0.1
            args:
            # Backup guest clusters only on production installations;
            # testing installations can have many broken guest clusters.
            - -prefix=k8s-prod-1
            - -etcd-v2-datadir=/var/lib/etcd
            - -etcd-v3-endpoints=https://172.xx.xx.221:2379,https://172.xx.xx.83:2379,https://172.xx.xx.246:2379
            - -etcd-v3-cacert=/certs/ca.crt
            - -etcd-v3-cert=/certs/server.crt
            - -etcd-v3-key=/certs/server.key
            - -aws-s3-bucket=mybucket
            - -aws-s3-region=us-east-1
            volumeMounts:
            - mountPath: /var/lib/etcd
              name: etcd-datadir
            - mountPath: /certs
              name: etcd-certs
            env:
              - name: ETCDBACKUP_AWS_ACCESS_KEY
                valueFrom:
                  secretKeyRef:
                    name: etcd-backup
                    key: ETCDBACKUP_AWS_ACCESS_KEY
              - name: ETCDBACKUP_AWS_SECRET_KEY
                valueFrom:
                  secretKeyRef:
                    name: etcd-backup
                    key: ETCDBACKUP_AWS_SECRET_KEY
              - name: ETCDBACKUP_PASSPHRASE
                valueFrom:
                  secretKeyRef:
                    name: etcd-backup
                    key: ETCDBACKUP_PASSPHRASE
          volumes:
          - name: etcd-datadir
            hostPath:
              path: /var/lib/etcd
          - name: etcd-certs
            hostPath:
              path: /etc/kubernetes/pki/etcd/
          # Do not restart the pod; the Job takes care of restarting failed pods.
          restartPolicy: Never
          hostNetwork: true
   

Note: the tolerations, together with the nodeSelector, allow the pod to be scheduled on the master nodes.

Then secret.yaml:

apiVersion: v1
kind: Secret
metadata:
  name: etcd-backup
  namespace: kube-system
type: Opaque
data:
  ETCDBACKUP_AWS_ACCESS_KEY: QUtJTI0TktCT0xQRlEK
  ETCDBACKUP_AWS_SECRET_KEY: aXJ6eThjQnM2MVRaSkdGMGxDeHhoeFZNUDU4ZGRNbgo=
  ETCDBACKUP_PASSPHRASE: ""
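The data values in a Secret must be base64-encoded. A quick way to generate them (the credential strings here are placeholders, not real keys):

```shell
# Encode placeholder credentials for use in secret.yaml.
# printf (no trailing newline) avoids baking a stray newline into the value.
printf '%s' 'AKIAEXAMPLE' | base64          # ETCDBACKUP_AWS_ACCESS_KEY
printf '%s' 'secret-key-example' | base64   # ETCDBACKUP_AWS_SECRET_KEY
```

Alternatively, `kubectl create secret generic etcd-backup -n kube-system --from-literal=ETCDBACKUP_AWS_ACCESS_KEY=... ` does the encoding for you.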

Summary

We previously tried etcd-operator for backups. In actual use we found it unsatisfactory: it introduces many concepts, its components are complex, and much of its behavior is hard-coded.

In the end we chose etcd-backup, mainly for its simplicity: less is more. Its source is written in Go and easy to read, and extending it with our own requirements is relatively simple.