Refuse to “delete the database and run”! Getting started with Docker container data management

This article is written by mRc, a member of the Tuque community. Welcome to join the Tuque community and create wonderful free technical tutorials together for the programming world.

If you think our writing is good, remember the “triple combo” of like + follow + comment to encourage us to write even better tutorials!

Data is at the core of all applications and services. Especially after witnessing the tragedies caused by “deleting the database and running away”, we can deeply appreciate the importance of data storage and backup. Docker provides us with convenient and powerful ways to handle container data. In this article, we will take you through both theory and hands-on practice to master the two common data management methods in Docker, volumes and bind mounts, so that you can handle data with ease and provide strong support and assurance for your applications.

Overview of Docker data management

Long time no see! Welcome back to the Docker tutorials of the “Dream Builder” series:

  • In “Time for a cup of tea, start Docker”, we compared “application development” and “deployment” to “work” and “dreams”. Through some small experiments, you learned how Docker achieves the leap from “reality” to “dream”, understood the two key concepts of image and container, and successfully containerized your first application
  • In “Dreams are interlinked: container interconnection with networks”, we learned that “dreams” are interlinked: different containers can communicate with each other through Docker networks

In this tutorial, we will walk you through Docker data management and build a bridge between the “dream” (the container environment) and “reality” (the host environment). There are three ways to manage data in Docker:

  1. Volumes, also the most recommended way
  2. Bind mounts, a data management method commonly used in Docker’s early days
  3. tmpfs mounts, in-memory data management, which this tutorial will not cover

Note

tmpfs mounts are only available on the Linux operating system.

Let’s get a feel for all this right away through a few small experiments (if you are already familiar, feel free to skip ahead to the “Hands-on drill” section below).

Data volumes

Basic commands

As mentioned in “Tips for remembering dozens of Docker commands” at the end of the previous article, volumes are also a common type of Docker object, so they likewise support subcommands such as create, inspect (view details), ls (list all volumes), prune (remove unused volumes), and rm (remove).

Let’s walk through the whole flow and experience it. First, create a volume:

docker volume create my-volume

To view all current data volumes:

docker volume ls

The output shows the my-volume volume we just created:

DRIVER              VOLUME NAME
local               my-volume

To view the details of the my-volume volume:

docker volume inspect my-volume

You can see the information of my-volume output in JSON format:

[
    {
        "Driver": "local",
        "Labels": {},
        "Mountpoint": "/var/lib/docker/volumes/my-volume/_data",
        "Name": "my-volume",
        "Options": {},
        "Scope": "local"
    }
]


Curious readers may want to check the /var/lib/docker/volumes directory. The answer: on non-Linux systems (Windows and macOS), this directory does not exist in your own file system, but inside the Docker virtual machine.

Finally, delete the my-volume volume:

docker volume rm my-volume

Creating a volume by itself doesn’t accomplish much; after all, its purpose is to serve container data management. Please see the figure below (from Safari Books Online):

(Figure: a volume bridging the host environment and the container environment)

As you can see, a volume builds a “bridge” between the “host environment” and the “container environment”. Usually, we write the data to be persisted to the path where the volume is mounted in the container, and the data is then immediately and automatically stored in the corresponding area of the host.

When creating a container with volumes, there are usually two choices: 1) named volumes; 2) anonymous volumes. Next, we explain each in detail.

Create a named volume

First, let’s demonstrate how to create a container with a named volume. Run the following command:

docker run -it -v my-vol:/data --name container1 alpine

As you can see, we used the -v (or --volume) parameter to specify the volume configuration as my-vol:/data. As you may have guessed, my-vol is the name of the volume, and /data is the path where the volume is mounted in the container.

After entering the container, we add a file to the /data directory and then exit:

/ # touch /data/file.txt
/ # exit

Note

/ # is the default command prompt of the alpine image; what follows it, touch /data/file.txt, is the actual command to execute.

To verify that the data in /data is really persisted, we delete the container1 container, then create a new container2 and view the contents of /data:

docker rm container1
docker run -it -v my-vol:/data --name container2 alpine
/ # ls /data
/ # exit

You can see the file.txt created just now in container1! In fact, sharing volumes between containers is such a common need that Docker provides a convenient parameter, --volumes-from, to share volumes easily:

docker run -it --volumes-from container2 --name container3 alpine
/ # ls /data

Sure enough, the contents of the volume can be accessed in container3 as well.

Create an anonymous volume

Creating an anonymous volume is very simple: where we previously used my-vol:/data as the -v parameter, simply omit the volume name (my-vol), leaving only /data:

docker run -v /data --name container4 alpine

Let’s look at container4 through inspect:

docker inspect container4

In its Mounts field, we can see the following data:

"Mounts": [
    {
        "Type": "volume",
        "Name": "dfee1d707956e427cc1818a6ee6060699514102e145cde314d4d938ceb12dfd3",
        "Source": "/var/lib/docker/volumes/dfee1d707956e427cc1818a6ee6060699514102e145cde314d4d938ceb12dfd3/_data",
        "Destination": "/data",
        "Driver": "local",
        "Mode": "",
        "RW": true,
        "Propagation": ""
    }
]
Let’s analyze the important fields:

  • Name is the name of the volume. Since this is an anonymous volume, the Name field is a long string of random characters, whereas a named volume would show its given name
  • Source is the storage path of the volume in the host file system (as mentioned earlier, on Windows and macOS this is inside the Docker virtual machine)
  • Destination is the mount point of the volume in the container
  • RW means read-write; if it is false, the volume is read-only

Using volumes in a Dockerfile

It is very easy to use volumes in a Dockerfile: specify them with the VOLUME keyword:

VOLUME /data

# Or specify multiple volumes with a JSON array
VOLUME ["/data1", "/data2", "/data3"]

There are two points to note:

  • Only anonymous volumes can be created this way
  • When a volume is specified via docker run -v, the corresponding configuration in the Dockerfile is overridden
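To make the override concrete, here is a small sketch (the image name volume-demo and the container names are purely illustrative) that checks which volume each container actually got:

```shell
# Assume the Dockerfile contains:  VOLUME /data
docker build -t volume-demo .

# Without -v, Docker creates an anonymous volume for /data:
docker run --name c1 volume-demo true

# With -v, the named volume overrides the Dockerfile's VOLUME entry:
docker run --name c2 -v my-vol:/data volume-demo true

# Compare the mounted volume names: c1 shows a long random hash,
# c2 shows "my-vol"
docker inspect -f '{{ (index .Mounts 0).Name }}' c1
docker inspect -f '{{ (index .Mounts 0).Name }}' c2
```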

Bind mounts

Bind mounts are Docker’s earliest data management and storage solution. The general idea is the same as with volumes, except that a bind mount directly creates a mapping between the local (host) file system and the container file system, which makes it very suitable for transferring data between the local machine and the container simply and flexibly.

We can try mounting our machine’s desktop (or any other path) into a container:

docker run -it --rm -v ~/Desktop:/desktop alpine

We are still using the -v parameter: ~/Desktop is the local file system path, /desktop is the path in the container, and ~/Desktop:/desktop binds the local path to the container path, as if building a bridge. The --rm option automatically deletes the container after it stops (refer to the first article for more details on the container life cycle).

After entering the container, check whether /desktop contains the things on your desktop, then create a file in the container and see whether it appears on your desktop:

/# ls /desktop
# ...lots of things from my own desktop :D
/# touch /desktop/from-container.txt

You should see a new from-container.txt file, created by the container, on your desktop!


Here is the figure from the official documentation:

(Figure: volumes, bind mounts, and tmpfs mounts, from the official Docker documentation)

From the figure we can see that:

  • A volume is a special area that Docker maintains in the local file system to store container data
  • A bind mount is a mapping between the container file system and the local file system
  • A tmpfs mount manages container data directly in memory

When specifying a volume or bind mount, the -v parameter has the format <first_field>:<second_field>:<rw_options> (note the colon separators), with three fields:

  • The volume name or local path, which can be omitted (an anonymous volume if omitted)
  • The mount point (path) of the volume in the container, required
  • The read-write option, read-write by default; if ro (read-only) is specified, it is read-only


Docker 17.06 introduced the --mount parameter, whose functionality is almost identical to -v / --volume; it specifies the volume configuration through key-value pairs, which is more verbose but also clearer. This article focuses on the more common -v parameter; refer to the documentation for more on using --mount.
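As a quick comparison, here is the same mount written both ways, following the syntax from the official docs (alpine and the sleep command are just placeholders to keep a container running):

```shell
# Named volume: -v shorthand vs. --mount key-value pairs
docker run -d -v my-vol:/data alpine sleep 300
docker run -d --mount type=volume,source=my-vol,target=/data alpine sleep 300

# Read-only bind mount, both ways
docker run -d -v "$(pwd)":/backup:ro alpine sleep 300
docker run -d --mount type=bind,source="$(pwd)",target=/backup,readonly alpine sleep 300
```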

Hands-on drill

Preparation and objectives

OK, it’s finally time for the hands-on drill: we continue deploying the full-stack to-do application we have been working on (React front end + Express back end + MongoDB database). If you haven’t read the previous tutorials and want to start from this step, run the following commands:

git clone -b volume-start
cd docker-dream

Based on the previous project, we intend to:

  • Store and back up the log data output by the Express server, instead of leaving it in a container that may “live or die” at any moment
  • The MongoDB image already configures a volume, so we only need to practice how to back up and restore its data

Mount a volume for the Express server

OK, let’s add the VOLUME configuration in server/Dockerfile and set LOG_PATH (the environment variable for the log output path; see the server/index.js source code) to /var/log/server/access.log. The code is as follows:

# ...

# Specify the working directory as /usr/src/app; all subsequent commands run in this directory
WORKDIR /usr/src/app

VOLUME /var/log/server

# ...

# Set environment variables (the MongoDB connection string and the log path)
ENV MONGO_URI=mongodb://dream-db:27017/todos
ENV LOG_PATH=/var/log/server/access.log

# ...

Then build the server image:

docker build -t dream-server server/

After a moment, let’s get the whole project running:

# Create a network for container interconnection
docker network create dream-net

# Start the MongoDB container (dream-db)
docker run --name dream-db --network dream-net -d mongo

# Start the Express API container (dream-api)
docker run -p 4000:4000 --name dream-api --network dream-net -d dream-server

# Build the Nginx image serving the React front-end pages
docker build -t dream-client client

# Start the Nginx server container (client)
docker run -p 8080:80 --name client -d dream-client

Use docker ps to make sure all three containers are up:

(Screenshot: docker ps output showing the three running containers)

Visit localhost:8080, go to the to-do page, and create a few items:

(Screenshot: the to-do page with a few items created)

Back up the log data

Previously, we stored the log data in an anonymous volume. Since directly retrieving data from a volume is troublesome, the recommended approach is to create a temporary container and back up the data by sharing volumes. Sounds a little dizzying? See the figure below:

(Figure: backing up volume data through a temporary container)

Follow these steps:

Step 1: share data between the dream-api container and a volume (already done).

Step 2: create a temporary container that takes over dream-api’s volumes. Run the following command:

docker run -it --rm --volumes-from dream-api -v $(pwd):/backup alpine

The above command uses both of the techniques introduced earlier, volumes and bind mounts:

  • --volumes-from dream-api is used to share volumes between containers; here the temporary container gains access to dream-api’s volume
  • -v $(pwd):/backup establishes a bind mount between the current local path (obtained via the pwd command) and the temporary container’s /backup path

Step 3: after entering the temporary container, compress the log data into a tar archive under the container’s /backup directory, and then exit:

/ # tar cvf /backup/backup.tar /var/log/server/
tar: removing leading '/' from member names
/ # exit

After exiting, do you see the log backup backup.tar in the current directory? In fact, all of this can be done with a single command:

docker run -it --rm --volumes-from dream-api -v $(pwd):/backup alpine tar cvf /backup/backup.tar /var/log/server

If you find the above command hard to understand, promise me you will take a close look at the “Recall and sublimate” part of the previous article, “Understanding commands: the theme of dreams”!
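Symmetrically, here is a one-liner sketch for restoring such an archive (assuming the same container and paths as above): since tar stripped the leading / while archiving, we unpack with -C / so the var/log/server/ entries land back where they came from.

```shell
docker run --rm --volumes-from dream-api -v "$(pwd)":/backup alpine tar xvf /backup/backup.tar -C /
```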

Database backup and recovery

What follows is the highlight of this article, so summon all your attention! Whether our application can survive a database-deletion crisis depends on whether you master the operations in this section!


Here we use MongoDB’s backup and restore commands (mongodump and mongorestore); other databases (such as MySQL) have similar commands, and the same ideas apply.

Backup idea 1: temporary container + container interconnection

Following the earlier idea of sharing volumes, we again try to back up the data through a temporary Mongo container. The schematic diagram is as follows:

(Figure: backing up the database through a temporary container connected to the same network)

First, our temporary container has to connect to the dream-db container and configure the bind mount. The command is as follows:

docker run -it --rm -v $(pwd):/backup --network dream-net mongo sh

Compared with backing up the log data earlier, we additionally connect this temporary container to the dream-net network so that it can access dream-db. If you are not familiar with Docker networks, review the previous article.

Second, after entering the temporary container, run the mongodump command:

/ # mongodump -v --host dream-db:27017 --archive --gzip > /backup/mongo-backup.gz

Thanks to the bind mount, output written to /backup is saved to the current directory (pwd). After exiting, you can see that the mongo-backup.gz file is ready in the current directory.

Backup idea 2: set up the bind mount in advance

In the “Recall and sublimate” part of the previous tutorial, we briefly mentioned making a backup by running the mongodump command through docker exec, but the resulting backup file stays inside the container; once the container is deleted, the backup disappears with it. So a natural idea arises: can we set up a bind mount when creating the database container, and then back up the data to the mounted area through mongodump?

(Figure: bind-mounting a host directory when creating the database container)

In fact, if you had run the following command when creating the database container earlier:

docker run --name dream-db --network dream-net -v $(pwd):/backup -d mongo

You could then run the mongodump command through docker exec:

docker exec dream-db sh -c 'mongodump -v --archive --gzip > /backup/mongo-backup.gz'

And that’s it. Here we use sh -c to execute the entire shell command (as a string), so that the redirection character > is interpreted inside the container rather than by your local shell (if you don’t see why, try replacing sh -c 'xxx' with the bare xxx and observe). As you can see, the mongodump command is much simpler here: we no longer need the --host parameter, because the database runs in this very container.
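The quoting detail is worth a local experiment, because your own shell handles > before docker ever sees the command. A minimal demonstration (no Docker needed):

```shell
# Quoted with sh -c: the INNER shell sees and performs the redirection.
# (Unquoted, the redirection would belong to the OUTER shell instead,
# which is why a bare docker exec ... mongodump > file writes on the host.)
sh -c 'echo hello from the inner shell > /tmp/sh-c-demo.txt'
cat /tmp/sh-c-demo.txt
```

Running it prints the line that the inner shell wrote into the file.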

But there is a problem: if the database container was created without setting up the bind mount in advance, this method will not work!

Attention, this is not a drill!

With the database backup file in hand, we can run a “drill” without fear. The following commands directly terminate the current database and API server:

docker rm -f --volumes dream-db
docker rm -f dream-api

Yes, with the --volumes flag, we not only deleted the dream-db container but also its attached volumes! A drill has to be realistic, after all. Visit localhost:8080 again at this point: all previous to-do data is gone!

Now let’s start the post-disaster reconstruction and create a new dream-db container:

docker run --name dream-db --network dream-net -v $(pwd):/backup -d mongo

Notice that we bind-mount the current directory to the container’s /backup directory, which means /backup/mongo-backup.gz can be accessed in this new container. Run the following command to restore the data:

docker exec dream-db sh -c 'mongorestore --archive --gzip < /backup/mongo-backup.gz'

You should see some log output indicating that the data has been restored successfully. Finally, restart the API server:

docker run -p 4000:4000 --name dream-api --network dream-net -d dream-server

Go back to our to-do app: is all the data back?!

Recall and sublimate

Another way to share data: docker cp

Previously, we moved container data out of the container by sharing volumes or through bind mounts. In fact, there is yet another way to transfer and share data between a container and the local machine: the docker cp command. If you have used the cp command to copy files, its usage will feel familiar. For example, let’s copy the log file from the dream-api container to the current directory:

docker cp dream-api:/var/log/server/access.log .

Look! There’s access.log! Of course, we can also do the “reverse operation” and copy a local file into the container:

docker cp /path/to/some/file dream-api:/dest/path

As you can see, docker cp is very convenient to use and well suited to one-off operations. Its drawbacks are also obvious:

  1. Data management is entirely manual
  2. You need to know the exact path of the data inside the container, which is troublesome for fast-iterating applications
  3. Sharing data among multiple containers is cumbersome

Another way to back up and restore: docker export/import

When backing up and restoring the database, there is an even more blunt idea: why not just back up the entire container? Indeed, Docker provides two commands to package and load an entire container: export and import.

For example, export the file system of an entire container as a tar archive:

docker export my-container > my-container.tar

Note

The export command does not export the contents of the container’s volumes.

You can then create an image with exactly the same contents through the import command:

docker import my-container.tar

The import command outputs a sha256 string, which is the UUID of the image. You can then start it with docker run (either by specifying the sha256 string, or after giving it a label with docker tag).
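To skip juggling the raw sha256, import also accepts a repository:tag directly (my-app:restored is just an example name). Note that an imported image carries no CMD metadata, so you supply the command yourself:

```shell
docker import my-container.tar my-app:restored
docker run -it my-app:restored sh
```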

If you have just tried the export and import commands, you will notice a serious problem: the tar archive of the packaged container weighs several hundred megabytes. Clearly, bluntly packaging the whole container also includes a lot of useless data (such as other files of the operating system), and the pressure on the hard disk rises sharply.

Tracing the origin: the essence of images and containers (UFS)

Having learned and practiced volumes, and having touched the docker cp and docker export/import commands, we can’t help but ask: what is the essence of images and containers, and how is their data stored?

Or, to ask a more concrete question: why does the data in the image (such as the operating system’s various files) exist every time a container is created, while the data written after the container is created is lost once the container is deleted?

This comes down to the Union File System (UFS) mechanism Docker relies on. Let’s get a general feeling through a picture (source: The Docker Ecosystem):

(Figure: the layered structure of the Union File System)

Let’s analyze the main points of the UFS diagram above:

  • The whole UFS is composed of layers, from the underlying operating system kernel up to application software (such as an Apache server)
  • Each layer in UFS is either a read-only layer (the opaque boxes) or a writable layer (the transparent boxes)
  • An image (for example “add Apache” and “busybox” in the figure) consists of a series of read-only layers
  • When we create a container from an image, a writable layer is added on top of all the read-only layers of the image; any data changes made in the container are recorded in the writable layer and do not affect the underlying read-only layers
  • When the container is destroyed, all changes made in the writable layer are lost

The data management techniques explained in this article (volumes and bind mounts) completely bypass UFS, storing important business data independently so that it can be backed up and restored, instead of being trapped in the container’s writable layer and bloating the whole container.
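You can watch the writable layer at work with docker diff, which lists exactly what a container changed on top of its image’s read-only layers (a sketch; the container name is purely illustrative):

```shell
# Write a file into the container's writable layer
docker run --name layer-demo alpine sh -c 'echo scratch > /in-writable-layer.txt'

# List the writable-layer changes: expect an entry for /in-writable-layer.txt
docker diff layer-demo

# Destroy the container: the writable layer (and the file) are gone,
# while the alpine image's read-only layers remain untouched
docker rm layer-demo
```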

Looking back at the question above, do you have some ideas now?

Want more wonderful hands-on technical tutorials? Come to the Tuque community.

All the source code involved in this article is on GitHub. If you find our writing helpful, we hope you can give us ❤️ a like on this article + a star on the GitHub repo ❤️!
