In recent years, cloud computing has become mainstream. Enterprises, whether to avoid being locked in to a single cloud provider, to gain business and data redundancy, or to optimize costs, often migrate some or all of their workloads from on-premises data centers to the cloud, or from one cloud platform to another. Migrating a business means migrating its data. Conveniently, JuiceFS already integrates with the APIs of many object storage services and implements data synchronization logic. Let's take a look at the sync command of JuiceFS.
What is juicefs sync
The sync subcommand of JuiceFS is a full-featured data synchronization utility that can concurrently synchronize or migrate data between all object stores supported by JuiceFS. It supports data migration not only between an object store and JuiceFS, but also between one object store and another, across clouds and regions. Like rsync, it also supports synchronizing local directories and accessing remote directories through SSH, HDFS, WebDAV and so on, and it provides advanced features such as full synchronization, incremental synchronization and pattern matching.
Basic Usage
Command format
juicefs sync [command options] SRC DST
That is, SRC is synchronized to DST. Both directories and files can be synchronized.
Where:
- SRC is the address and path of the data source
- DST is the address and path of the destination
- [command options] are optional synchronization options; see the Command Reference for details
The address format is:
[NAME://][ACCESS_KEY:[email protected]]BUCKET[.ENDPOINT][/PREFIX]
Where:
- NAME is the storage type, such as s3 or oss; see All supported storage services for details
- ACCESS_KEY and SECRET_KEY are the API access keys of the object storage
- BUCKET[.ENDPOINT] is the access address of the object storage
- PREFIX is optional and defines the prefix of the directory names to be synchronized
The following is an example address for Amazon S3 object storage:
s3://ABCDEFG:[email protected]
In particular, a SRC or DST that ends with / is treated as a directory, for example movies/. One that does not end with / is treated as a prefix and matched according to prefix-matching rules. For example, if the current directory contains two directories named test and text, they can be synchronized to the target path ~/mnt/ with the following command:
juicefs sync ./te ~/mnt/te
In this way, the sync command treats te as a prefix and matches all directories or files in the current path whose names begin with it, i.e. test and text. The te in the target path ~/mnt/te is also a prefix, and it replaces the prefix of every synchronized directory and file. In this example te is replaced with te, so the prefix stays unchanged. If you change the prefix of the destination path, for example to ab:
juicefs sync ./te ~/mnt/ab
Once synchronized to the target path, the test directory is named abst and text is named abxt.
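The prefix replacement described above can be modeled with plain shell string operations. This is only a sketch of the naming rule, not JuiceFS code; replace_prefix is a hypothetical helper name introduced for illustration.

```shell
# Hypothetical helper: models how a destination prefix replaces the source
# prefix in a synchronized name. It is not part of JuiceFS.
replace_prefix() {
    src_prefix=$1
    dst_prefix=$2
    name=$3
    case "$name" in
        "$src_prefix"*)
            # Strip the source prefix, then prepend the destination prefix.
            printf '%s\n' "${dst_prefix}${name#"$src_prefix"}"
            ;;
        *)
            # Names that do not start with the source prefix are not matched.
            return 1
            ;;
    esac
}

replace_prefix te ab test   # prints "abst"
replace_prefix te ab text   # prints "abxt"
```

With te as both source and destination prefix, the helper returns the name unchanged, which corresponds to the first command above.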
Resource list
The following storage resources are assumed:
- Object storage A
  - Bucket Name: AAA
  - Endpoint: https://aaa.s3.us-west-1.amazonaws.com
- Object storage B
  - Bucket Name: BBB
  - Endpoint: https://bbb.oss-cn-hangzhou.aliyuncs.com
- JuiceFS file system
  - Metadata storage: redis://10.10.0.8:6379/1
  - Object storage: https://ccc-125000.cos.ap-beijing.myqcloud.com
The access keys of all the storage services are:
- ACCESS_KEY: ABCDEFG
- SECRET_KEY: HIJKLMN
Synchronization between object storage and juicefs
Synchronize the movies directory of object storage A to the JuiceFS file system:
#Mount juicefs
sudo juicefs mount -d redis://10.10.0.8:6379/1 /mnt/jfs
#Perform synchronization
juicefs sync s3://ABCDEFG:[email protected]/movies/ /mnt/jfs/movies/
Synchronize the images directory of the JuiceFS file system to object storage A:
#Mount juicefs
sudo juicefs mount -d redis://10.10.0.8:6379/1 /mnt/jfs
#Perform synchronization
juicefs sync /mnt/jfs/images/ s3://ABCDEFG:[email protected]/images/
Synchronization between object storage and object storage
Synchronize all data of object storage A to object storage B:
juicefs sync s3://ABCDEFG:[email protected] oss://ABCDEFG:[email protected]
Advanced Usage
Incremental synchronization and full synchronization
The sync command works in incremental mode by default: it first compares the source path with the target path and then synchronizes only the parts that differ. You can use the --update or -u option to update the mtime of files.
For full synchronization, that is, resynchronizing regardless of whether the same file already exists on the target path, use --force-update or -f. For example, fully synchronize the movies directory of object storage A to the JuiceFS file system:
#Mount juicefs
sudo juicefs mount -d redis://10.10.0.8:6379/1 /mnt/jfs
#Perform full synchronization
juicefs sync --force-update s3://ABCDEFG:[email protected]/movies/ /mnt/jfs/movies/
Pattern matching
The pattern matching of the sync command is similar to rsync. You can exclude or include certain kinds of files through rules, and combine multiple rules to synchronize any set of files. The rules are as follows:
- A pattern ending with / only matches directories; otherwise it matches files, links or devices;
- A pattern containing *, ? or [ is matched as a wildcard pattern; otherwise it is matched as a plain string;
- * matches any non-empty path component and stops at /;
- ? matches any single character except /;
- [ matches a set of characters, such as [a-z] or [[:alpha:]];
- In a wildcard pattern, a backslash can be used to escape a wildcard character; when the pattern contains no wildcards, the backslash is matched literally;
- A pattern is always matched recursively as a prefix.
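To get a feel for the wildcard rules, the shell's own case matcher can be used on single names. This is only an approximation and not the matcher used by sync: in a shell case pattern * also matches /, so the sketch is meant for one path component at a time. glob_match is a hypothetical helper name.

```shell
# Hypothetical helper: tests one name (without "/") against one glob pattern
# using the shell's case matcher, which only approximates the sync rules.
glob_match() {
    pattern=$1
    name=$2
    # $pattern is left unquoted on purpose so that *, ? and [ act as wildcards.
    case "$name" in
        $pattern) return 0 ;;
        *)        return 1 ;;
    esac
}

glob_match '*.png'  '4.png'  && echo "*.png matches 4.png"
glob_match '?.png'  '4.png'  && echo "?.png matches 4.png"
glob_match '[a-z]*' 'pic'    && echo "[a-z]* matches pic"
glob_match '?.png'  '10.png' || echo "?.png does not match 10.png"
```

The last line shows that ? stands for exactly one character, so a two-character stem such as 10 is not matched.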
Exclude files / directories
Use the --exclude option to set the directories or files to exclude. For example, to fully synchronize the JuiceFS file system to object storage A without hidden files and folders:
On Linux, every name beginning with . is treated as a hidden file.
#Mount juicefs
sudo juicefs mount -d redis://10.10.0.8:6379/1 /mnt/jfs
#Full synchronization, excluding hidden files and directories
juicefs sync --exclude '.*' /mnt/jfs/ s3://ABCDEFG:[email protected]/
You can repeat this option to match more rules, for example to exclude all hidden files, the pic/ directory and the 4.png file:
juicefs sync --exclude '.*' --exclude 'pic/' --exclude '4.png' /mnt/jfs/ s3://ABCDEFG:[email protected]
Include files / directories
Use the --include option to set the directories or files to include (that is, not to exclude). For example, to synchronize only the pic/ directory and the 4.png file and exclude everything else:
juicefs sync --include 'pic/' --include '4.png' --exclude '*' /mnt/jfs/ s3://ABCDEFG:[email protected]
When using include / exclude rules, the option that appears earlier has higher priority. --include options should come first: if --exclude '*' is placed first and excludes all files, the --include 'pic/' --include '4.png' rules that follow it will not take effect.
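The ordering rule can be pictured as "first matching rule wins". The sketch below is a simplified model under that assumption, not JuiceFS code: decide is a hypothetical helper, rules are written as include:PATTERN or exclude:PATTERN purely for illustration, patterns are matched with the shell's case matcher, and directory patterns such as pic/ are not modeled.

```shell
# Hypothetical model of first-match-wins rule evaluation; not JuiceFS code.
# Usage: decide NAME RULE...   where each RULE is "include:PATTERN" or
# "exclude:PATTERN". Prints the action of the first rule whose pattern
# matches NAME; a name matched by no rule is included.
decide() {
    name=$1
    shift
    for rule in "$@"; do
        action=${rule%%:*}     # text before the first colon
        pattern=${rule#*:}     # text after the first colon
        case "$name" in
            $pattern)
                printf '%s\n' "$action"
                return 0
                ;;
        esac
    done
    printf '%s\n' include
}

decide 4.png 'include:4.png' 'exclude:*'   # prints "include"
decide 5.png 'include:4.png' 'exclude:*'   # prints "exclude"
decide 4.png 'exclude:*' 'include:4.png'   # prints "exclude"
```

The third call shows why the include rules must come before the catch-all exclude: once exclude:* matches, later rules are never consulted.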
Multithreading and bandwidth limitation
By default, JuiceFS sync uses 10 threads to perform synchronization tasks. Set the --threads option to increase or decrease the number of threads as needed.
In addition, to limit the bandwidth used by synchronization tasks, set the --bwlimit option. The unit is Mbps and the default is 0, which means no limit.
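When choosing a bandwidth limit it can help to estimate how long a transfer would take at that rate. The following is rough arithmetic only, assuming that 1 Mbps means 1,000,000 bits per second and ignoring protocol overhead and request latency; transfer_seconds is a hypothetical helper name.

```shell
# Hypothetical helper: estimates the seconds needed to move SIZE_BYTES at a
# bandwidth limit of MBPS, assuming 1 Mbps = 1,000,000 bits per second and no
# overhead. Integer arithmetic, so the result is rounded down.
transfer_seconds() {
    size_bytes=$1
    mbps=$2
    # A limit of 0 means "no limit", so no estimate can be made.
    [ "$mbps" -gt 0 ] || { echo "a limit of 0 means no limit" >&2; return 1; }
    echo $(( size_bytes * 8 / (mbps * 1000000) ))
}

transfer_seconds 1000000000 100   # 1 GB at 100 Mbps: prints 80
```

In practice the real duration also depends on the number of threads, object sizes and the storage services themselves, so treat the result as a lower bound.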
Directory structure and file permissions
By default, the sync command only synchronizes file objects and directories that contain file objects; empty directories are not synchronized. To synchronize empty directories, use the --dirs option.
In addition, when synchronizing between local, SFTP, HDFS and other file systems, use the --perms option if you need to preserve file permissions.
Copy symbolic links
When synchronizing between local directories, JuiceFS sync supports the --links option, which makes it copy a symbolic link itself rather than the object it points to. The path stored in the synchronized symbolic link is the original path stored in the source link. It is not converted, whether or not that path is reachable before or after synchronization.
A few other details to note:
- The mtime of a symbolic link itself is not copied;
- The --check-new and --perms options are ignored when symbolic links are encountered.
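The "stored path is kept as-is" behavior can be illustrated with ordinary tools. The sketch below uses cp -P as an analogy, not JuiceFS itself, and assumes mktemp and readlink are available; copied_link_target is a hypothetical helper name.

```shell
# Analogy using cp -P (not JuiceFS): copying a symbolic link as a link keeps
# the stored target path verbatim, even when that path does not exist.
copied_link_target() {
    target=$1
    demo_dir=$(mktemp -d) || return 1
    ln -s "$target" "$demo_dir/link"               # dangling links are allowed
    cp -P "$demo_dir/link" "$demo_dir/link_copy"   # copy the link, not its target
    readlink "$demo_dir/link_copy"                 # print the stored path
    rm -rf "$demo_dir"
}

copied_link_target ../missing/target   # prints "../missing/target"
```

The relative path is printed unchanged even though it resolves to nothing, which is the same non-conversion described above.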
Multi machine concurrent synchronization
In essence, synchronizing data between two object stores means pulling data from one end and pushing it to the other. As shown in the figure below, the efficiency of synchronization depends on the bandwidth between the client and the cloud.
When the bandwidth of a single machine is saturated, juicefs sync supports concurrent synchronization across multiple machines, as shown in the figure below.
The manager, acting as the master, executes the sync command and defines multiple worker hosts with the --worker parameter. JuiceFS dynamically splits the synchronization workload according to the total number of workers and distributes it to the hosts for simultaneous execution. In other words, the work originally processed on one host is split into several parts and handled by multiple hosts at once, so more data is processed per unit time and the total bandwidth is multiplied accordingly.
When configuring multi-machine concurrent synchronization, you need to set up passwordless SSH login from the manager host to the worker hosts in advance, so that the client and the tasks can be distributed to the workers successfully.
The manager distributes the JuiceFS client program to the worker hosts. To avoid client compatibility problems, make sure the manager and the workers use the same operating system type and architecture.
For example, synchronize object storage A to object storage B using multiple hosts in parallel:
juicefs sync --worker [email protected],[email protected] s3://ABCDEFG:[email protected] oss://ABCDEFG:[email protected]
The current host and the two worker hosts, [email protected] and [email protected], will share the task of synchronizing data between the two object stores.
If the SSH service of a worker host does not use the default port 22, set the worker's SSH port number in the .ssh/config configuration file on the manager host.
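A per-host port can be recorded with the Host and Port keywords of the OpenSSH client configuration. The sketch below appends such an entry to a file; add_worker_port is a hypothetical helper, and the host name and port in the usage comment are placeholders, not values from this article.

```shell
# Hypothetical helper: appends a Host block with a custom Port to an
# OpenSSH client configuration file.
add_worker_port() {
    config_file=$1
    host=$2
    port=$3
    cat >> "$config_file" <<EOF
Host $host
    Port $port
EOF
}

# Example with a placeholder host name and port (left commented out so that
# running this block does not modify your real configuration):
# add_worker_port "$HOME/.ssh/config" worker1.example.com 2222
```

After the entry is added, connections from the manager to that host name use the configured port without any extra command-line option.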
Application scenarios
Remote disaster recovery and backup of data
Remote disaster recovery backup targets the files themselves, so the files stored in JuiceFS should be synchronized to another object store. For example, synchronize the files in the JuiceFS file system to object storage A:
#Mount juicefs
sudo juicefs mount -d redis://10.10.0.8:6379/1 /mnt/jfs
#Perform synchronization
sudo juicefs sync /mnt/jfs/ s3://ABCDEFG:[email protected]/
After synchronization, you can see all the files directly in object storage A.
Create a copy of juicefs data
Unlike disaster recovery backup, which targets the files themselves, a JuiceFS data copy is an image of the JuiceFS data storage with exactly the same content and structure. When the object storage in use fails, you can switch to the copy by modifying the configuration and continue working. Note that only the data of the JuiceFS file system is copied here, not the metadata, so the metadata engine still needs its own backup.
This requires operating directly on the underlying object storage of JuiceFS and synchronizing it with the target object storage. For example, to use object storage B as a copy of the data of the JuiceFS file system:
juicefs sync cos://ABCDEFG:[email protected] oss://ABCDEFG:[email protected]
After synchronization, the content and structure of object storage B are exactly the same as those of the object storage used by JuiceFS.
If you find this helpful, please follow our project Juicedata/JuiceFS! (0ᴗ0✿)