Aurora notes build UMS private cloud file server


Aurora Senior Engineer – Hu Guanjun

1、 Background

UMS 5.1 adds SMS signatures and lets e-mail messages carry uploaded local pictures and attachments, and later scenarios may require a large amount of file storage. The private cloud therefore needs its own file server, and that server should also be compatible with customers' file servers. (Note: customers' file servers are generally compatible with the S3 protocol.)

2、 Research file server

After investigating and comparing several options and discussing them within the team, we finally selected Minio.

1. Introduction to Minio

Minio is an open-source object storage service written in Go and released under the Apache License v2.0. It is compatible with the Amazon S3 cloud storage interface and is well suited to storing large volumes of unstructured data, such as pictures, videos, log files, backup data and container / virtual machine images. A single object can range from a few KB up to a maximum of 5 TB. Minio is a very lightweight service that can easily be combined with other applications, such as Node.js, Redis or MySQL.

2. Minio advantages

Compatible with Amazon S3

Minio uses the Amazon S3 V2 / V4 API.

Data protection

Minio uses erasure coding to protect against hardware failures. Even if up to half of the hard disks are damaged, the data can still be recovered.

High availability

In distributed mode, a Minio cluster of n nodes can tolerate up to (n / 2) - 1 node failures.

Lambda compute

The Minio server triggers Lambda functions through its event notification service, which is compatible with AWS SNS / SQS. Supported targets include message queues such as Kafka and AMQP, as well as Elasticsearch, Redis, MySQL and other databases.

Encryption and tamper-proofing

Minio provides confidentiality, integrity and authenticity guarantees for encrypted data with minimal performance overhead. It supports server-side and client-side encryption using AES-256-GCM, ChaCha20-Poly1305 and AES-CBC.

Pluggable back-end storage

In addition to Minio's own file system, it also supports DAS, JBODs, NAS, Google Cloud Storage and Azure Blob Storage.

SDK support

Because Minio is lightweight, SDKs are available for languages such as Java, Python and Go.


In both distributed and stand-alone mode, all read and write operations in Minio strictly follow the read-after-write consistency model.

3. Minio architecture diagram

Minio adopts a decentralized, shared-nothing architecture. Object data is spread across multiple hard disks on different nodes and exposed through a unified namespace, and load is balanced across servers by a web load balancer or DNS round robin.

[Figure: Minio architecture diagram]

4. Minio storage mechanism

4.1 Basic concepts

Drive: a disk that stores data. Drives are passed to Minio as parameters at startup.
Set: a group of drives. A distributed deployment automatically divides the drives into one or more sets according to the cluster size, and the drives in each set are distributed across different locations. An object is stored on one set.
Bucket: the logical location where file objects are stored. For the client, it is equivalent to a top-level folder that holds files.

4.2 Erasure code

Minio uses erasure coding and checksums to protect data from hardware failure and silent data corruption. Even if you lose half (N / 2) of your hard drives, you can still recover your data.

What is an erasure code? It is a mathematical algorithm for recovering lost or damaged data. Minio implements erasure coding with Reed-Solomon codes, dividing each object into N / 2 data blocks and N / 2 parity blocks. With 12 disks, for example, an object is divided into 6 data blocks and 6 parity blocks. You can lose any 6 disks (whether they hold data blocks or parity blocks) and still recover the data from the remaining disks.
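To make the arithmetic concrete, here is a minimal Python sketch of the default N / 2 + N / 2 split. The function name and the returned fields are illustrative only, and the calculation ignores metadata and the block-by-block way a real encoder streams data.

```python
import math


def erasure_layout(total_drives: int, object_size: int) -> dict:
    """Sizes implied by an N/2 data + N/2 parity split (illustrative only)."""
    if total_drives < 2 or total_drives % 2:
        raise ValueError("expected an even number of drives")
    data_blocks = total_drives // 2
    parity_blocks = total_drives - data_blocks
    # Each drive holds one shard; only the data shards carry object bytes.
    shard_size = math.ceil(object_size / data_blocks)
    return {
        "data_blocks": data_blocks,
        "parity_blocks": parity_blocks,
        "shard_size": shard_size,
        "raw_bytes_stored": shard_size * total_drives,
        "max_lost_drives": parity_blocks,
    }


# The 12-disk example from the text, with a 600 MiB object.
layout = erasure_layout(total_drives=12, object_size=600 * 1024 * 1024)
print(layout)
```

With an equal split the raw space consumed is about twice the object size, which is the price paid for surviving the loss of half of the drives.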

4.3 Brief analysis of the Reed-Solomon data recovery principle

RS coding uses the word as its encoding and decoding unit. Large data blocks are split into words of length w (generally 8 or 16 bits), and each word is then encoded and decoded. Data blocks are coded on the same principle as words, so the rest of this section works with words, and the variables Di and Ci each denote one word. The input data is treated as a vector D = (D1, D2, ..., Dn) and the encoded data as a vector (D1, D2, ..., Dn, C1, C2, ..., Cm), so RS encoding can be viewed as the matrix operation shown in Figure 1. The leftmost matrix in Figure 1 is the coding matrix (also called the generator matrix or distribution matrix). Every n * n submatrix of the coding matrix must be invertible. To make data storage convenient, the upper part of the coding matrix is the n * n identity matrix and the lower part is an m * n matrix, which can be a Vandermonde matrix or a Cauchy matrix.
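Since Figure 1 is not reproduced here, the encoding step it depicts can be written out as follows. This is a reconstruction from the description above, not a copy of the figure: B is the coding matrix, and the label G for its lower m * n part (Vandermonde or Cauchy) is introduced here only for illustration.

```latex
\underbrace{\begin{pmatrix} I_{n \times n} \\ G_{m \times n} \end{pmatrix}}_{B}
\begin{pmatrix} D_1 \\ D_2 \\ \vdots \\ D_n \end{pmatrix}
=
\begin{pmatrix} D_1 \\ \vdots \\ D_n \\ C_1 \\ \vdots \\ C_m \end{pmatrix}
```

Because the upper block is the identity, the first n outputs are the data words themselves, and only the m code words have to be computed.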

RS can tolerate the loss of up to m blocks. The data recovery process is as follows:

(1) Suppose D1, D4 and C2 are lost. Delete the rows corresponding to the lost data blocks / coding blocks from the coding matrix, which leaves the matrix B'. (Figures 2 and 3)

(2) Since B' is invertible, denote its inverse by B'^-1, so that B' * B'^-1 = I, the identity matrix. Left-multiply both sides of the equation by B'^-1. (Figures 4 and 5)

(3) This gives the formula for the original data D, as shown in the following figure:

[Figure: formula for recovering the original data D]

(4) Re-encode D to obtain the lost code blocks.

[Figure: re-encoding D to obtain the lost code blocks]
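The four steps can be tried end to end with a small Python sketch. Several simplifications are assumptions made for readability and are not how Minio does it: arithmetic is done modulo the prime 257 rather than in GF(2^8), the lower part of the coding matrix is a Cauchy matrix, and n = 5, m = 3 are chosen so that D1, D4 and C2 can be lost as in the example above.

```python
P = 257  # prime modulus, chosen so that every byte value 0..255 is a valid word


def inv(a):
    """Multiplicative inverse modulo P (Fermat's little theorem)."""
    return pow(a, P - 2, P)


def coding_matrix(n, m):
    """n x n identity on top, m x n Cauchy matrix 1 / (x_i + y_j) below."""
    rows = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for i in range(m):
        rows.append([inv((i + m + j) % P) for j in range(n)])
    return rows


def encode(matrix, data):
    """Matrix-vector product modulo P."""
    return [sum(a * d for a, d in zip(row, data)) % P for row in matrix]


def solve(a, b):
    """Solve a * x = b modulo P by Gauss-Jordan elimination."""
    n = len(a)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] % P)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = inv(aug[col][col])
        aug[col] = [v * scale % P for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col]:
                f = aug[r][col]
                aug[r] = [(v - f * w) % P for v, w in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


n, m = 5, 3
B = coding_matrix(n, m)
data = [10, 20, 30, 40, 50]   # D1..D5
stored = encode(B, data)      # D1..D5 followed by C1..C3

lost = {0, 3, 6}              # step (1): D1, D4 and C2 are lost
survivors = [i for i in range(n + m) if i not in lost][:n]
B_prime = [B[i] for i in survivors]

# Steps (2) and (3): solving B' * D = survivors is the same as applying B'^-1.
recovered = solve(B_prime, [stored[i] for i in survivors])

# Step (4): re-encode D to regenerate the lost code word.
reencoded = encode(B, recovered)
print(recovered, reencoded[6] == stored[6])
```

Any five of the eight stored words would do as survivors, because every square submatrix of a Cauchy matrix is invertible.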

4.4 Running Minio in erasure code mode

This example starts Minio on 12 disks. The command is as follows:

[Figure: command for starting Minio in erasure code mode]

4.5 Storage form

When a data object is stored in a Minio cluster, it is first erasure-coded and split into shards, and the shards are then spread across the hard disks. Specifically, Minio automatically creates several erasure groups in the cluster, each containing a group of hard disks, usually 4 to 16. The data object is then sharded; the default strategy produces the same number of data shards and parity shards. Finally, a hash algorithm determines the erasure group for the data object, and the data and parity shards are stored on the hard disks in that group.

[Figure: storage layout of a data object within an erasure group]

As shown in the figure above, suppose that an erasure group in a Minio cluster contains four hard disks, a data object is named MyObject, its bucket is named mybucket, and hashing maps it to the erasure group made up of disks 1 to 4. A mybucket/MyObject sub-path is then created under the data path of each of disks 1 to 4. The sub-path contains two files: xl.json, which stores the metadata, and part.1, which is the first shard of MyObject on that disk. Here xl denotes the default storage format of data objects in Minio.
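The placement rule can be sketched in a few lines of Python. Everything specific here is an assumption for illustration: the disk paths are made up, the object name is used as the hash input, and CRC32 stands in for whichever hash function a given Minio version actually uses.

```python
import zlib

# Hypothetical cluster with two erasure groups of four disks each.
ERASURE_GROUPS = [
    ["/data/disk1", "/data/disk2", "/data/disk3", "/data/disk4"],
    ["/data/disk5", "/data/disk6", "/data/disk7", "/data/disk8"],
]


def pick_erasure_group(object_name: str, group_count: int) -> int:
    """Deterministically map an object name to a group index."""
    return zlib.crc32(object_name.encode("utf-8")) % group_count


def files_for_object(bucket: str, object_name: str, groups=ERASURE_GROUPS):
    """List the files the text describes: xl.json and part.1 on every disk."""
    disks = groups[pick_erasure_group(object_name, len(groups))]
    files = []
    for disk in disks:
        base = f"{disk}/{bucket}/{object_name}"
        files.append(f"{base}/xl.json")  # metadata
        files.append(f"{base}/part.1")   # this disk's shard of the object
    return files


for path in files_for_object("mybucket", "MyObject"):
    print(path)
```

Because the hash is deterministic, a later read of mybucket/MyObject reaches the same group without consulting any central index.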

5. Using the Minio Go SDK

The following file upload example can be run as is; the file will be uploaded to the official Minio server.

[Figure: file upload example using the Minio Go SDK]

3、 Practical application of Minio in UMS system

1. Application system architecture

In the overall architecture, the modules communicate over HTTP. The function of each module is as follows:

(1) The Web / API server provides authentication and authorization for the UMS system, that is, it verifies the legitimacy of requests from the web client or the developer API.

[Figure: application system architecture]

(2) The file management server provides an external interface for operating the Minio server. Based on the current business requirements of the UMS system, it offers only the following operations: obtaining a presignedurl for uploading a file, setting the expiration time, setting the external access policy, creating a bucket, and generating a file download URL. So what is a presignedurl? The object owner uses their own security credentials to create a pre-signed URL that grants time-limited permission to upload or download an object, so that objects can be shared with other users. Note: even private objects can be shared with others through a presignedurl, and the maximum validity of a presignedurl is 7 days.

The file management server uses the official Minio API directly to obtain the presignedurl for uploading files, although you could also implement the presignedurl method yourself. A download presignedurl, however, is valid for at most 7 days, which does not meet the business requirements of the UMS system. The file management server therefore implements its own method for generating download URLs. The expiration time of such a link can be set arbitrarily, but the external access policy of the bucket must be set to public. The client can then upload a file directly to the Minio server through the presignedurl and download it directly through the download link.
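For readers unfamiliar with the idea, the following Python sketch shows how a time-limited signed URL works in general. It is a simplified HMAC scheme written for illustration: it is not AWS Signature V4, it is not the method the file management server actually uses, and the secret, host name and query parameter names are all made up.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlparse

SECRET_KEY = b"example-secret"        # hypothetical credential of the object owner
MAX_PRESIGN_SECONDS = 7 * 24 * 3600   # the 7-day ceiling mentioned above


def sign(method: str, path: str, expires_at: int) -> str:
    """HMAC over the request method, object path and expiry timestamp."""
    message = f"{method}\n{path}\n{expires_at}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def make_signed_url(base_url, method, path, now, ttl_seconds):
    if not 0 < ttl_seconds <= MAX_PRESIGN_SECONDS:
        raise ValueError("ttl must be between 1 second and 7 days")
    expires_at = int(now) + ttl_seconds
    query = urlencode({"expires": expires_at,
                       "signature": sign(method, path, expires_at)})
    return f"{base_url}{path}?{query}"


def verify_signed_url(url, method, now) -> bool:
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires_at = int(params["expires"][0])
        signature = params["signature"][0]
    except (KeyError, ValueError):
        return False
    if now > expires_at:
        return False  # the link has expired
    return hmac.compare_digest(signature, sign(method, parsed.path, expires_at))


url = make_signed_url("http://files.example.internal", "PUT",
                      "/mybucket/MyObject", now=1_700_000_000, ttl_seconds=600)
print(verify_signed_url(url, "PUT", now=1_700_000_100))  # True: within 10 minutes
print(verify_signed_url(url, "PUT", now=1_700_001_000))  # False: expired
```

The signature binds the method, the object path and the expiry together, so a URL issued for uploading one object cannot be reused for another object, another method or a later time.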

(3) The Minio cluster stores the actual files. The cluster adopts a decentralized, shared-nothing architecture in which all nodes are peers, so connecting to any node gives access to the whole cluster. Nginx is placed in front of the Minio cluster as a reverse proxy, and the Minio nodes communicate with each other over RPC. Besides the SDK mentioned above, Minio also officially provides a command-line tool and a web page for managing the server, as follows:

[Figure: Minio command-line tool]

Enter the Nginx proxy IP and port, or the IP and port of any node in the Minio cluster, into the browser, then log in with the Minio account name and password. The interface is as follows:

[Figure: Minio web interface]

2. Specific interaction logic

[Figure: interaction flow between the client, the business server and the file server]

First, the client asks the business server (webserver / apiserver) for the credential used to upload a file (the presignedurl), and the business server responds with an upload URL and a download URL. The client uploads the file to the file server through the upload URL and then passes the download URL as the file parameter in its later requests to the back end. For example, e-mail messages support uploading local pictures, and when such a message is sent, the picture can be passed to the back end as its file download URL.

The advantages of this scheme are as follows:

The client uploads files directly to the Minio server without going through the business server, which reduces the load on the business server and improves availability
The database server stores only the download URL of each file, which reduces database storage
Large files are supported, for example 3 GB or more. With sufficient hardware, a single file on the Minio server can be as large as 5 TB
There is no limit on the number of uploaded files
Files with the same name no longer overwrite each other
The scheme can be adapted to any file server compatible with the S3 protocol, which meets the requirements of different customers

4、 Minio distributed deployment

1. Minio distributed deployment architecture
1.1 Architecture overview

The Minio cluster adopts a decentralized, shared-nothing architecture in which all nodes are peers, so connecting to any node gives access to the whole cluster. Keeping all nodes as peers is not the most common design for a distributed cluster. In most distributed storage clusters today, nodes are divided into several roles, such as access nodes that accept and process external application requests, management nodes that store metadata, and the nodes that actually store data. Minio is different: every node in a Minio cluster takes on all of these roles at once, combining metadata storage, data storage and application access, so the cluster is truly decentralized and all nodes are fully equal. The advantage is that this removes much of the complex scheduling inside the cluster, along with the failure risk and performance bottleneck of a central node.

In the following figure, an Nginx proxy is placed in front of the Minio cluster:

[Figure: Minio cluster behind an Nginx proxy]

Deploying a Minio cluster takes only one command, but every node in the cluster must execute the same command.

[Figure: command for starting the Minio cluster]

The official recommendation is that the node IP addresses be consecutive.

1.2 Minio capacity expansion scheme

Because of Minio's minimalist design, a distributed Minio cluster does not support expansion by adding a single node and rebalancing automatically. Adding a single node would raise problems such as data rebalancing and re-division of erasure groups, which would bring complex scheduling and processing to the whole cluster and make it harder to maintain. Minio therefore provides peer-to-peer capacity expansion instead: the number of nodes and disks added must be equal to that of the original cluster.

For example, if the original cluster contains four nodes and four disks, then four nodes and four disks (or a multiple thereof) must be added during expansion, so that the system keeps the same data redundancy SLA. This greatly reduces the complexity of expansion. In this example, after expansion the Minio cluster does not rebalance data across all eight nodes. Instead it treats the original four nodes as one zone and the four new nodes as another. When a new object is uploaded, the cluster chooses the zone according to the proportion of available space in each zone, and within that zone the erasure group is still determined by the hash algorithm. After one peer-to-peer expansion, the cluster can be expanded again by the same rules. However, for security reasons, the number of nodes in the cluster must not exceed 32.
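The two-level placement described above (zone by free space, then erasure group by hash) can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are made up, CRC32 stands in for the real hash, and the random draw is passed in as a parameter so that the behaviour is easy to check.

```python
import random
import zlib


def pick_zone(free_bytes_per_zone, r):
    """Choose a zone with probability proportional to its free space.

    r is a uniform random number in [0, 1).
    """
    total = sum(free_bytes_per_zone)
    if total <= 0:
        raise ValueError("no free space in any zone")
    threshold = r * total
    running = 0
    for zone, free in enumerate(free_bytes_per_zone):
        running += free
        if threshold < running:
            return zone
    return len(free_bytes_per_zone) - 1


def place_object(object_name, free_bytes_per_zone, groups_per_zone, r):
    """Return (zone index, erasure group index within that zone)."""
    zone = pick_zone(free_bytes_per_zone, r)
    group = zlib.crc32(object_name.encode("utf-8")) % groups_per_zone[zone]
    return zone, group


# Original zone nearly full (100 GB free), new zone mostly empty (300 GB free):
# about three out of four new objects should land in the new zone.
free = [100, 300]
print(place_object("MyObject", free, groups_per_zone=[1, 1], r=random.random()))
```

Existing objects stay where they are. Only the placement of new objects is influenced by how full each zone is, which is why no data rebalancing is needed after expansion.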

Minio can expand an existing cluster (in erasure code mode) by specifying a new cluster on the command line, as follows:

[Figure: command for expanding the cluster]

The cluster has now been expanded by 1024 disks, bringing the total to 2048. New object upload requests are automatically allocated to the least-used cluster. With this expansion strategy you can grow your cluster as needed. The new configuration takes effect as soon as the cluster is restarted and has no impact on the existing cluster. In the command above, the original cluster can be regarded as one zone and the new cluster as another. New objects are placed in a zone according to the proportion of available space in each zone, and within each zone the location is determined by a deterministic hash algorithm.

Note: each zone you add must have the same number of disks (erasure set size) as the original zone in order to maintain the same data redundancy SLA. For example, if the first zone has 8 disks, you can expand the cluster to 16, 32 or 1024 disks; you only need to ensure that the disk count of the deployment is a multiple of that of the original zone.

The advantages and disadvantages of peer-to-peer capacity expansion are as follows:

Advantages: configuration is simple, and the expansion can be completed with a single command.

Disadvantages: ① the cluster must be restarted to expand; ② expansion is limited, since the number of cluster nodes generally does not exceed 32. This is because the Minio cluster guarantees strong consistency through distributed locks, and with too many nodes, maintaining strong consistency causes performance problems.

However, when the initial storage volume is not very large and a brief shutdown and restart of the cluster has little impact on the business, peer-to-peer capacity expansion is a workable choice.

2. Points to note
All nodes in distributed Minio must use the same access key and secret key, that is, the same user name and password
The disk directories where distributed Minio stores data must be empty
Distributed Minio officially recommends at least 4 nodes in a production environment, because a cluster of N nodes needs at least N / 2 nodes available for reads and at least N / 2 + 1 nodes available for writes
The clocks of the distributed Minio nodes should be synchronized, and the machines should have the same configuration
Distributed Minio stores a data file on every disk to ensure data reliability and security
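The read and write thresholds in the list above are easy to tabulate. The helper below is a small sketch with a made-up name; treating N / 2 as integer division is an assumption, which makes no difference for the even node counts used in this article.

```python
def quorums(node_count: int) -> dict:
    """Read/write quorum sizes for a cluster of N nodes, per the rule above."""
    if node_count < 1:
        raise ValueError("node_count must be positive")
    read_quorum = node_count // 2
    write_quorum = node_count // 2 + 1
    return {
        "read_quorum": read_quorum,
        "write_quorum": write_quorum,
        # How many nodes may be down while each operation still succeeds.
        "failures_tolerated_for_reads": node_count - read_quorum,
        "failures_tolerated_for_writes": node_count - write_quorum,
    }


for n in (4, 8, 16):
    print(n, quorums(n))
```

With the recommended minimum of 4 nodes, reads survive two failed nodes but writes survive only one, which matches the (n / 2) - 1 figure quoted for high availability earlier in the article.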

3. Specific implementation steps

Many online guides deploy a Minio cluster with a single script. That is awkward in a real production environment, because Minio only starts successfully when every node in the cluster executes the same command. The best approach is therefore to deploy the Minio cluster with ansible.

3.1 Installing ansible

[Figure: ansible installation commands]

3.2 Deploying the Minio cluster with ansible

The core ansible code is shown below. Readers can search online for further details.

[Figure: core ansible code for deploying the Minio cluster]

3.3 Configuring the Nginx proxy for the cluster

The contents of the Nginx configuration file are as follows:

[Figure: Nginx configuration file]

3.4 Verifying that the Minio cluster was deployed successfully

In the browser, enter the address of the server where Nginx runs and the listening port from the Nginx configuration to open the file server web page. A successful deployment looks like this:

[Figure: Minio web page after a successful deployment]

5、 Conclusion

This covers the main points of developing and deploying the UMS private cloud file service. The scheme has been implemented and verified, so if you want to build a file server compatible with the S3 protocol, this article should be a useful reference. Because time was short and the initial storage volume is small, the scheme still has room for optimization. For example, to get dynamic capacity expansion you can use the official federation-based expansion method, but that requires introducing etcd and more machines. In short, the decision still depends on the specific business scenario. As with shoes, bigger is not always better; the ones that fit are the best.
