Why is it not recommended to deploy a database in a Docker container?

Time:2021-9-22

The original link is as follows:
https://www.toutiao.com/i6805…

In recent years, Docker has shown great potential for running applications in enterprise environments, and containerizing stateless services is clearly the general trend. The question is whether the database, as the core of the system, should be containerized as well.

Different people will give different answers on whether a database is suitable for containerization. Before answering, let’s compare containerized database deployment with conventional database deployment.

7 reasons why Docker is not suitable for database deployment

1. Data security issues

Do not store data in containers; this is also one of Docker’s official tips for using containers. A container can be stopped or deleted at any time, and when a container is removed (docker rm), the data inside it is lost. To avoid this, users can mount a data volume to store the data. However, container volumes are designed to provide persistent storage that bypasses the union file system of the image layers, and they do not guarantee data safety. If the container crashes suddenly and the database does not shut down cleanly, the data may be damaged. In addition, sharing data volume groups between containers and the physical machine can put the host itself at considerable risk.
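As a minimal sketch of the data volume mount mentioned above, the Python snippet below only assembles a docker run command line and prints it; it does not start anything. The container name, the host path /srv/mysql-data, the image tag mysql:8.0 and the placeholder password are assumptions made for illustration. /var/lib/mysql is the data directory used by the official MySQL image.

```python
import shlex


def mysql_run_command(name: str, host_dir: str, image: str = "mysql:8.0") -> list[str]:
    """Build (but do not execute) a docker run command that bind-mounts
    a host directory over MySQL's data directory."""
    return [
        "docker", "run", "-d",
        "--name", name,
        # Bind mount: files written under /var/lib/mysql land in host_dir,
        # so they survive removal of this container.
        "-v", f"{host_dir}:/var/lib/mysql",
        # Placeholder credential, for illustration only.
        "-e", "MYSQL_ROOT_PASSWORD=change-me",
        image,
    ]


# Hypothetical container name and host path.
cmd = mysql_run_command("demo-mysql", "/srv/mysql-data")
print(shlex.join(cmd))
```

The mount keeps the files when the container is removed, but it does nothing about the corruption risk from an unclean shutdown.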

Storing Docker data on the host still does not guarantee against data loss: volumes provide persistence, but no assurance of integrity.

With the current storage drivers, Docker still carries a risk of unreliability. If the container crashes and the database does not shut down properly, the data may be corrupted.
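One mitigation for the unclean shutdown case is to give the database more time to stop. docker stop sends SIGTERM to the container's main process and follows up with SIGKILL once a grace period (10 seconds by default) runs out; the -t option changes that period. The sketch below only builds such a command. The container name and the 60-second value are assumptions, not recommendations from the original article.

```python
import shlex


def graceful_stop_command(name: str, grace_seconds: int = 60) -> list[str]:
    """Build (but do not execute) a docker stop command with a longer
    grace period, so the database has time to flush and close its files."""
    if grace_seconds <= 0:
        raise ValueError("grace_seconds must be positive")
    return ["docker", "stop", "-t", str(grace_seconds), name]


# Hypothetical container name.
stop_cmd = graceful_stop_command("demo-mysql")
print(shlex.join(stop_cmd))
```

This helps only with planned stops; it does not cover a container or host that crashes outright.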

2. Performance issues

As we all know, MySQL is a relational database with high I/O requirements. When one physical machine runs several instances, their I/O adds up and creates an I/O bottleneck, which greatly reduces MySQL’s read and write performance.

In a special session on the top ten difficulties of applying Docker, an architect at a state-owned bank made the same point: “The performance bottleneck of a database usually appears in I/O. If you follow the Docker approach, the I/O requests of multiple containers all end up on the same storage. Most Internet databases today are based on a shared-nothing architecture, which may also be a reason why migrating to Docker is not considered.”
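The arithmetic behind this bottleneck can be illustrated with a deliberately simple model: a storage device with a fixed IOPS budget, split evenly among the database instances that share it. The 20,000 IOPS figure is an assumed value, not a measurement, and real devices do not share capacity this neatly.

```python
def per_instance_iops(device_iops: int, instances: int) -> float:
    """Even-split model: every instance gets an equal share of the device."""
    if instances < 1:
        raise ValueError("instances must be at least 1")
    return device_iops / instances


DEVICE_IOPS = 20_000  # assumed capacity of the shared device

for n in (1, 2, 4, 8):
    share = per_instance_iops(DEVICE_IOPS, n)
    print(f"{n} instance(s): {share:.0f} IOPS each")
```

Even in this idealized form, each added instance shrinks the budget available to every other instance on the same storage.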

Some readers may already have solutions in mind for these performance problems:

(1) Separation of database program and data

If Docker is used to run MySQL, the database program and its data need to be separated: the data is stored on shared storage and the program runs in the container. If the container or the MySQL service fails, a new container is started automatically. It is also recommended not to store the data on the host, because the host and the container share volume groups, so damage to the host has a large impact.

(2) Run lightweight or distributed databases

When a lightweight or distributed database is deployed in Docker, the approach Docker itself recommends applies: if the service goes down, a new container is started automatically instead of the service being restarted inside the existing container.

(3) Sensible placement of applications

For applications or services with high I/O requirements, it is more appropriate to deploy the database on a physical machine or a KVM virtual machine. At present, TDSQL from Tencent Cloud and OceanBase from Alibaba are deployed directly on physical machines rather than in Docker.

3. Network problems

To understand Docker networking, you need a deep understanding of network virtualization. You must also be prepared for unexpected situations, and you may need to fix bugs without support or additional tools.

We know that databases need dedicated, sustained throughput to handle higher load. We also know that a container is a further isolation layer behind the hypervisor and the host virtual machine. The network is critical for database replication, which requires a stable 24/7 connection between master and slave databases. Docker’s outstanding networking problems were still unresolved as of version 1.9.

Taken together, these problems make containerized databases difficult to manage. You may well be a top engineer who can solve any problem, but how much time will you spend solving Docker networking problems? Wouldn’t it be better to put the database in a dedicated environment and save that time for the business goals that really matter?

4. State

Packaging stateless services in Docker is great: containers can be orchestrated and single points of failure eliminated. But what about the database? If the database is placed in the same environment, that environment becomes stateful and the scope of a system failure grows. The next time an application instance crashes, it may affect the database.

Key point: horizontal scaling in Docker can only be used for stateless computing services, not for databases.

Docker’s rapid scaling relies on services being stateless. Services that hold data state are not suitable for running directly in Docker. If a database is installed in Docker, storage must be provided separately.

At present, both TDSQL (a distributed database for financial workloads) from Tencent Cloud and OceanBase (a distributed database system) from Alibaba Cloud run directly on physical machines rather than in Docker, even though Docker is easier to manage.

5. Resource isolation

In terms of resource isolation, Docker is indeed weaker than a KVM virtual machine. Docker uses cgroups to limit resources, which can only cap a container’s maximum resource consumption; it cannot stop other programs from taking resources the container needs. If other applications consume too much of the physical machine’s resources, the read and write efficiency of MySQL in the container suffers.
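The difference between a cap and a guarantee can be shown with a toy model. Each workload asks for some I/O rate, never more than its cap, and when the total exceeds what the host can deliver, the host's capacity is divided in proportion to what was asked for. The proportional split, the workload names and all the numbers are assumptions chosen for illustration; this is not how the kernel actually schedules I/O.

```python
def achieved_rates(
    demands: dict[str, float],
    caps: dict[str, float],
    capacity: float,
) -> dict[str, float]:
    """Toy contention model: apply per-workload caps, then scale everything
    down proportionally if the host cannot serve the total."""
    # A missing cap means the workload is unlimited.
    wanted = {name: min(d, caps.get(name, float("inf"))) for name, d in demands.items()}
    total = sum(wanted.values())
    if total <= capacity:
        return wanted
    return {name: w * capacity / total for name, w in wanted.items()}


HOST_IOPS = 10_000  # assumed host capacity

# MySQL container capped at 6,000 IOPS, running alone.
alone = achieved_rates({"mysql": 6_000}, {"mysql": 6_000}, HOST_IOPS)

# Same cap, but an uncapped neighbor on the host asks for 9,000 IOPS.
contended = achieved_rates(
    {"mysql": 6_000, "neighbor": 9_000}, {"mysql": 6_000}, HOST_IOPS
)

print("alone:", alone)
print("contended:", contended)
```

In the contended case the MySQL container stays under its cap yet receives less than the cap allows, which is the sense in which a limit is not a reservation.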

The more levels of isolation are required, the greater the resource overhead. Compared with a dedicated environment, easy horizontal scaling is a major advantage of Docker. However, in Docker, horizontal scaling can only be used for stateless computing services; it does not apply to databases.

We see no isolation benefit for the database, so why put it in a container?

6. Poor fit with cloud platforms

Most people start their projects on a public cloud. The cloud takes away the complexity of operating and replacing virtual machines, so there is no need to test new hardware environments at night or on weekends. When we can start an instance quickly, why should we worry about the environment that instance runs in?

That is why we pay cloud providers so much. Once we place a database container on the instance, this convenience disappears: because the data does not match, a new instance will not be compatible with the existing one. If the instance has to be restricted to stand-alone services anyway, the database should run in a non-container environment. We only need to keep elastic scaling for the computing service layer.

7. Environment requirements for running database

You often see DBMS containers running on the same host as other services. However, the hardware requirements of these services are very different.

Databases (especially relational databases) have high I/O requirements. Database engines generally use a dedicated environment to avoid contention for resources. If you put your database in a container, you will waste your project’s resources, because you need to provision a lot of additional resources for that instance. In the public cloud, when you need 34 GB of memory, the instance you start must have 64 GB. In practice, these resources are not fully used.

What is the solution? Design in tiers and use fixed resources to start multiple instances at different tiers. Horizontal scaling is always better than vertical scaling.
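The 34 GB versus 64 GB example can be worked through in a few lines. The sketch assumes a catalogue of instance memory sizes that doubles at each step; real providers offer different size ladders, so the list is hypothetical.

```python
SIZES_GB = (4, 8, 16, 32, 64, 128)  # assumed instance sizes, doubling each step


def smallest_fitting(required_gb: int, sizes: tuple[int, ...] = SIZES_GB) -> int:
    """Return the smallest catalogue size that satisfies the requirement."""
    for size in sorted(sizes):
        if size >= required_gb:
            return size
    raise ValueError(f"no instance size fits {required_gb} GB")


required = 34
provisioned = smallest_fitting(required)
unused = provisioned - required
print(
    f"need {required} GB -> provision {provisioned} GB, "
    f"{unused} GB ({unused / provisioned:.0%}) unused"
)
```

Under this assumption, a requirement just above one size step leaves close to half of the provisioned memory idle.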

Summary

Given the problems above, does this mean that a database must never be deployed in a container?

The answer is: No

We can containerize services that are not sensitive to data loss (such as search and event tracking), and use database sharding to increase the number of instances and thereby the throughput.
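Splitting the data across more database instances (sharding) needs a rule for deciding which instance holds which record. A minimal sketch of one common rule, hashing the key and taking it modulo the number of shards, is shown below. The key format and the shard count of 4 are assumptions, and this is only one of several possible routing schemes.

```python
import hashlib
from collections import Counter


def shard_for(key: str, shard_count: int) -> int:
    """Map a key to a shard index with a stable hash, so every process
    routes the same key to the same database instance."""
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


SHARDS = 4  # assumed number of database instances

# Hypothetical keys, used only to show how records spread over the shards.
spread = Counter(shard_for(f"user-{i}", SHARDS) for i in range(1000))
for shard, count in sorted(spread.items()):
    print(f"shard {shard}: {count} keys")
```

A stable hash is used instead of Python's built-in hash(), which is randomized per process for strings. With plain modulo routing, changing the shard count moves most keys, so it suits a fixed number of instances.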

Docker is suitable for running lightweight or distributed databases. When a service in Docker goes down, a new container is started automatically instead of the service being restarted inside the container.

With middleware and a container orchestration system, a database can scale automatically, recover from disasters, fail over, and run across multiple nodes, so it can be containerized as well.
