From getting started to mastery: what is a k8s persistent volume?

Time: 2022-05-26


This article is part of a series introducing the basic concepts of Kubernetes. In the first article, we briefly introduced persistent volumes. In this article, we’ll learn how to set up data persistence and write Kubernetes scripts to connect our pods to a persistent volume. In this example, Azure File Storage will be used to store the data from our MongoDB database, but you can use any type of volume to achieve the same result (such as Azure Disk, GCE Persistent Disk, AWS Elastic Block Store, etc.).

If you want to fully understand the other concepts of k8s, you can read the previously published articles in this series first.

Please note: the scripts provided in this article are not tied to any one platform, so you can follow this tutorial with other cloud providers or with a local cluster running k3s. This article recommends k3s because it is very lightweight: all of its dependencies are packaged into a single binary of less than 100MB. It is also a highly available, certified Kubernetes distribution designed for production workloads in resource-constrained environments. For more information, please check the official documentation:

https://docs.rancher.cn/k3s/

Prerequisites

Before starting this tutorial, make sure Docker is installed. You will also need kubectl; if it is not installed, please follow the instructions at the link below:

https://kubernetes.io/docs/tasks/tools/#install-kubectl-on-windows

The kubectl commands used throughout this tutorial can be found in the kubectl cheat sheet:

https://kubernetes.io/docs/reference/kubectl/cheatsheet/

In this tutorial, we will use Visual Studio Code, but you can also use another editor.

What problems can Kubernetes persistent volumes solve?

Remember that we have nodes (hardware devices or virtual machines), inside a node we have one or more pods, and inside the pods we have containers. The state of a pod is ephemeral: pods are frequently deleted, recreated, and rescheduled. So if you want to keep a pod’s data after the pod is deleted, you need to move the data outside the pod, where it can exist independently of any pod. This external location is called a volume, and it is an abstraction over a storage system. Using volumes, you can maintain persistent state across multiple pods.

When to use persistent volumes

When containers first became widely used, they were designed to support stateless workloads, with persistent data stored elsewhere. Since then, many efforts have been made to support stateful applications in the container ecosystem.

Almost every project requires some kind of data persistence, so you usually need a database to store the data. But in a clean design, you don’t want to depend on a specific implementation; you want to write an application that is as reusable and platform-independent as possible.

There has always been a need to hide the details of the storage implementation from applications. But now, in the era of cloud-native applications, cloud providers create environments in which applications or users who want to access data must integrate with a specific storage system. For example, many applications directly use specific storage systems such as Amazon S3, Azure Files, or block storage, which creates unhealthy dependencies. Kubernetes is trying to change this with an abstraction called persistent volumes, which allows cloud-native applications to connect to a wide variety of cloud storage systems without creating an explicit dependency on those systems. This makes the consumption of cloud storage much more seamless and eliminates integration costs. It also makes it easier to migrate between clouds and adopt a multi-cloud strategy.

Even if, due to objective constraints such as money, time, or manpower, you sometimes need to make compromises and couple your application directly to a specific platform or provider, you should try to avoid as many direct dependencies as possible. One way to decouple your application from the actual database implementation (there are other solutions, but they are more complex) is to use containers (and persistent volumes to prevent data loss). This way, your application depends on an abstraction rather than on a specific implementation.

The real question now is: should we always use containerized databases with persistent volumes, or are there types of storage systems that should not be used in containers?

There is no universal golden rule for when to use persistent volumes, but as a starting point, you should consider scalability and how the loss of a node in the cluster is handled.

Based on scalability, we can distinguish two types of storage systems:

  • Vertically scalable – includes traditional RDBMS solutions such as MySQL, PostgreSQL, and SQL Server
  • Horizontally scalable – includes “NoSQL” solutions such as Elasticsearch or Hadoop-based solutions

Vertically scalable solutions such as MySQL, PostgreSQL, and Microsoft SQL Server should not go into containers. These database platforms require high I/O, shared disks, block storage, etc., and cannot gracefully handle the loss of a node in the cluster, which often happens in container-based ecosystems.

For horizontally scalable applications (Elasticsearch, Cassandra, Kafka, etc.), you can and should use containers, because they can withstand the loss of a node in the database cluster and the database application can rebalance itself.

Generally, you can and should containerize distributed databases that use redundant storage techniques and can withstand the loss of a node in the database cluster (Elasticsearch is a very good example).

Types of Kubernetes persistent volumes

Kubernetes volumes can be classified by their lifecycle and by the way they are provisioned.

Considering the life cycle of volumes, we can divide them into:

  • Ephemeral volumes, which are tightly coupled to the lifetime of the node (for example emptyDir or hostPath); if the node goes down, the data in them is deleted (see the sketch after this list).
  • Persistent volumes, i.e. long-term storage that is independent of the pod or node lifecycle. These can be cloud volumes (such as gcePersistentDisk, awsElasticBlockStore, azureFile, or azureDisk), NFS (Network File System), or a Persistent Volume Claim (a series of abstractions for connecting to the underlying cloud storage).
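
To make the first category concrete, the following is a minimal sketch of a pod using an emptyDir volume (the pod name, image, and mount path are placeholders chosen for illustration, not taken from this tutorial); everything written to /cache is lost once the pod is removed:

apiVersion: v1
kind: Pod
metadata:
  name: cache-pod               # placeholder name
spec:
  containers:
    - name: app
      image: nginx              # any image works; nginx is just an example
      volumeMounts:
        - name: cache
          mountPath: /cache     # the ephemeral volume appears here inside the container
  volumes:
    - name: cache
      emptyDir: {}              # created when the pod starts, deleted together with the pod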

Based on how volumes are provisioned, we can divide them into:

  1. Direct access
  2. Static provisioning
  3. Dynamic provisioning

Direct access to persistent volumes

In this case, the pod is directly coupled to the volume, so it knows about the storage system (for example, the pod is coupled to an Azure storage account). This solution is not cloud agnostic, and it depends on an implementation rather than an abstraction, so try to avoid it if possible. Its only advantage is that it is fast: in the pod you specify a Secret and the exact storage type that should be used.

The script to create the Secret is as follows:

apiVersion: v1  
kind: Secret  
metadata:  
  name: static-persistence-secret  
type: Opaque  
data:  
  azurestorageaccountname: "base64StorageAccountName"  
  azurestorageaccountkey: "base64StorageAccountKey"

In line 2 of our script, we specify the resource type; in this case, it is a Secret. In line 4, we give it a name (we call it static because it is created manually by the administrator rather than being generated automatically). From the perspective of Kubernetes, the Opaque type means that the content (data) of this Secret is unstructured (it can contain arbitrary key-value pairs). To learn more about Kubernetes Secrets, see the Secrets design document and the guide to configuring Kubernetes Secrets:

https://github.com/kubernetes/community/blob/master/contributors/design-proposals/auth/secrets.md

https://kubernetes.io/docs/concepts/configuration/secret/

In the data section, we must specify the account name (in Azure, this is the name of the storage account) and the access key (in Azure, under the storage account, select Settings, then Access keys). Don’t forget that both values must be Base64-encoded.
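
As a minimal sketch (using the placeholder account name "mystorageaccount", not a real account), the values can be produced with echo -n "mystorageaccount" | base64 and placed in the data section like this:

data:
  # base64 of the placeholder "mystorageaccount"
  azurestorageaccountname: bXlzdG9yYWdlYWNjb3VudA==
  # base64 of your real access key goes here
  azurestorageaccountkey: "<base64-encoded-key>"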

The next step is to modify our deployment script to use the volume (in this case, the volume is Azure File Storage).

apiVersion: apps/v1  
kind: Deployment  
metadata:  
  name: user-db-deployment  
spec:  
  selector:  
    matchLabels:  
      app: user-db-app  
  replicas: 1  
  template:  
    metadata:  
      labels:  
        app: user-db-app  
    spec:  
      containers:  
        - name: mongo  
          image: mongo:3.6.4  
          command:  
            - mongod  
            - "--bind_ip_all"  
            - "--directoryperdb"  
          ports:  
            - containerPort: 27017  
          volumeMounts:  
            - name: data  
              mountPath: /data/db  
          resources:  
            limits:  
              memory: "256Mi"  
              cpu: "500m"  
      volumes:  
        - name: data  
          azureFile:  
            secretName: static-persistence-secret  
            shareName: user-mongo-db  
            readOnly: false

We can see that the only difference is that, starting at line 32, we specify the volume to be used, give it a name, and specify the exact details of the underlying storage system. secretName must be the name of the previously created Secret.
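
Assuming the two scripts above are saved as secret.yaml and deployment.yaml (placeholder file names), they can be applied and checked like this:

# kubectl apply -f secret.yaml
# kubectl apply -f deployment.yaml
# kubectl get pods        # the user-db-deployment pod should reach the Running state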

Kubernetes storage class

To understand static and dynamic provisioning, we must first look at Kubernetes storage classes.

A StorageClass allows administrators to describe the profiles, or “classes”, of available storage. Different classes may map to different quality-of-service levels, backup policies, or arbitrary policies determined by the cluster administrators.

For example, you could have a profile named slow-storage that stores data on an HDD, and a profile named fast-storage that stores data on an SSD. The kind of storage is determined by the provisioner. For Azure, there are two provisioners: azure-file and azure-disk (the difference is that Azure Files can be used with the ReadWriteMany access mode, while Azure Disk only supports ReadWriteOnce, which can be a disadvantage when you want multiple pods to use the volume at the same time). Here you can learn about the different types of storage classes:

https://kubernetes.io/docs/concepts/storage/storage-classes/

The following is the script for the storage class:

kind: StorageClass  
apiVersion: storage.k8s.io/v1  
metadata:  
  name: azurefilestorage  
provisioner: kubernetes.io/azure-file  
parameters:  
  storageAccount: storageaccountname  
reclaimPolicy: Retain  
allowVolumeExpansion: true

Kubernetes predefines the possible values of the provisioner attribute (see the Kubernetes storage classes documentation). The Retain reclaim policy means that after we delete the PVC and the PV, the actual storage medium is not purged. We could instead set it to Delete; with that setting, deleting the PVC also triggers the deletion of the corresponding PV and of the actual storage medium (here, the actual storage is Azure File Storage).
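
For comparison, here is a sketch of the same storage class with the Delete policy; only the name (a placeholder) and the reclaimPolicy change:

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: azurefilestorage-delete    # placeholder name for the Delete variant
provisioner: kubernetes.io/azure-file
parameters:
  storageAccount: storageaccountname
reclaimPolicy: Delete              # deleting the PVC also deletes the PV and the file share
allowVolumeExpansion: true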

Persistent volume claim

Kubernetes has a matching primitive for each traditional storage operational activity: provisioning, configuring, and attaching. A persistent volume covers provisioning, a storage class covers configuration, and a persistent volume claim covers attaching.

From the official documentation:

*A persistent volume (PV) is a piece of storage in the cluster that has been provisioned by an administrator or dynamically provisioned using storage classes.*

*A persistent volume claim (PVC) is a request for storage by a user. It is similar to a pod: pods consume node resources and PVCs consume PV resources. Pods can request specific resource levels (CPU and memory). Claims can request specific sizes and access modes (for example, they can be mounted ReadWriteOnce or ReadOnlyMany).*

This means that the administrator creates persistent volumes to specify the storage size, access mode, and storage type that pods can use. The developer creates a persistent volume claim asking for a volume with a given size, access mode, and storage type. This way, there is a clear separation between the “development side” and the “operations side”: the developer is responsible for requesting the necessary volume (PVC), and the operator is responsible for preparing and provisioning the requested volume (PV).

The difference between static and dynamic provisioning is that if there is no persistent volume and no Secret created manually by the administrator, Kubernetes will try to create these resources automatically.

Dynamic provisioning

In this case, there is no manually created persistent volume and no Secret, so Kubernetes will try to generate them. A storage class is mandatory; we will use the storage class created in the previous section.

The PersistentVolumeClaim script is as follows:

apiVersion: v1  
kind: PersistentVolumeClaim
metadata:  
  name: persistent-volume-claim-mongo  
spec:  
  accessModes:  
    - ReadWriteMany  
  resources:  
    requests:  
      storage: 1Gi  
  storageClassName: azurefilestorage

And our updated deployment script:

apiVersion: apps/v1  
kind: Deployment  
metadata:  
  name: user-db-deployment  
spec:  
  selector:  
    matchLabels:  
      app: user-db-app  
  replicas: 1  
  template:  
    metadata:  
      labels:  
        app: user-db-app  
    spec:  
      containers:  
        - name: mongo  
          image: mongo:3.6.4  
          command:  
            - mongod  
            - "--bind_ip_all"  
            - "--directoryperdb"  
          ports:  
            - containerPort: 27017  
          volumeMounts:  
            - name: data  
              mountPath: /data/db  
          resources:  
            limits:  
              memory: "256Mi"  
              cpu: "500m"  
      volumes:  
        - name: data  
          persistentVolumeClaim:
            claimName: persistent-volume-claim-mongo

As you can see, in line 34 we refer to the previously created PVC by name. In this case, we did not manually create a persistent volume or a Secret for it, so they will be created automatically.

The most important advantages of this approach are that you don’t have to create the PV and the Secret manually and that the deployment is cloud agnostic: the underlying storage details are not present in the pod’s spec. There are also disadvantages: you cannot configure the storage account or the file share, because they are auto-generated, and you cannot reuse the PV or the Secret; they will be regenerated for each new claim.
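
To see what was generated, the claim and the automatically provisioned volume can be inspected like this (the generated PV name will differ from cluster to cluster):

# kubectl get pvc persistent-volume-claim-mongo   # shows the claim and the PV it is bound to
# kubectl get pv                                  # lists the automatically created persistent volume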

Static provisioning

The only difference between static and dynamic provisioning is that in static provisioning we create the persistent volume and the Secret manually. This way, we have full control over the resources created in the cluster.

The persistent volume script is as follows:

apiVersion: v1  
kind: PersistentVolume  
metadata:  
  name: static-persistent-volume-mongo  
  labels:  
    storage: azurefile  
spec:  
  capacity:  
    storage: 1Gi  
  accessModes:  
    - ReadWriteMany  
  storageClassName: azurefilestorage  
  azureFile:  
    secretName: static-persistence-secret  
    shareName: user-mongo-db  
    readOnly: false

What is important is that, in line 12, we refer to the storage class by name, and in line 14, we refer to the Secret used to access the underlying storage system.
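
To actually consume this manually created PV, the developer still defines a claim. Here is a minimal sketch (the claim name is a placeholder) that binds explicitly to the volume above:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: static-persistent-volume-claim-mongo   # placeholder name
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
  storageClassName: azurefilestorage
  volumeName: static-persistent-volume-mongo   # bind explicitly to the manually created PV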

This article recommends this solution: even though it requires more work, it is still cloud agnostic from the application’s point of view. It also lets you apply the separation of concerns between roles (cluster administrator and developer) and gives you control over the naming and creation of resources.

Summary

In this article, we saw how to persist data and state using volumes. We presented three different ways of setting this up, namely direct access, dynamic provisioning, and static provisioning, and discussed the advantages and disadvantages of each.

About the author

Czako Zoltan is an experienced full-stack developer with broad experience in many areas, including front-end, back-end, DevOps, the Internet of Things, and artificial intelligence.
