Splitting files with the split command on Linux: a tutorial

Time:2021-1-27

Have you ever needed to split a large file into smaller ones? For example, you might want to break a 5 GB log file into pieces small enough to open in an ordinary text editor, or split a 20 GB file into several parts so that it is easier to transfer to another server. This tutorial explains how to use the split command to cut files into pieces.

First, check the size of the file to be split:

# ls -lh gkdb.db

-r--r--r--    1 root     root         411M Jul 23 17:20 gkdb.db

To split this 411 MB file into 20 MB pieces, run:

# split -b 20m gkdb.db gkdb_pack_

In this command:

split is the command itself.

-b 20m sets the maximum size of each piece to 20 MB.

gkdb.db is the file to be split.

gkdb_pack_ is the prefix for the output file names. The pieces are named gkdb_pack_aa, gkdb_pack_ab, gkdb_pack_ac, and so on.
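The effect of -b and the prefix argument is easy to try on a small scratch file before touching real data. The sketch below is a minimal example, assuming a Linux shell with split, head and wc available; the file names demo.bin and demo_pack_ are hypothetical, and the 1000-byte input is split into 300-byte pieces rather than 20 MB ones.

```shell
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical 1000-byte input file filled with zero bytes.
head -c 1000 /dev/zero > demo.bin

# Split into pieces of at most 300 bytes, using demo_pack_ as the prefix.
split -b 300 demo.bin demo_pack_

# Print each piece with its size in bytes ($(( )) strips any padding from wc).
for piece in demo_pack_*; do
    echo "$piece $(( $(wc -c < "$piece") ))"
done
```

This produces demo_pack_aa, demo_pack_ab and demo_pack_ac at 300 bytes each, plus demo_pack_ad holding the remaining 100 bytes.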

List the resulting pieces and their sizes:

# ls -lh gkdb_pack_a*

-rw-r--r--    1 root     root          20M Jul 27 16:09 gkdb_pack_aa
-rw-r--r--    1 root     root          20M Jul 27 16:09 gkdb_pack_ab
-rw-r--r--    1 root     root          20M Jul 27 16:09 gkdb_pack_ac
-rw-r--r--    1 root     root          20M Jul 27 16:09 gkdb_pack_ad
-rw-r--r--    1 root     root          20M Jul 27 16:09 gkdb_pack_ae
-rw-r--r--    1 root     root          20M Jul 27 16:09 gkdb_pack_af
-rw-r--r--    1 root     root          20M Jul 27 16:09 gkdb_pack_ag
-rw-r--r--    1 root     root          20M Jul 27 16:09 gkdb_pack_ah
-rw-r--r--    1 root     root          20M Jul 27 16:09 gkdb_pack_ai
-rw-r--r--    1 root     root          20M Jul 27 16:09 gkdb_pack_aj
-rw-r--r--    1 root     root          20M Jul 27 16:09 gkdb_pack_ak
-rw-r--r--    1 root     root          20M Jul 27 16:09 gkdb_pack_al
-rw-r--r--    1 root     root          20M Jul 27 16:09 gkdb_pack_am
-rw-r--r--    1 root     root          20M Jul 27 16:09 gkdb_pack_an
-rw-r--r--    1 root     root          20M Jul 27 16:09 gkdb_pack_ao
-rw-r--r--    1 root     root          20M Jul 27 16:10 gkdb_pack_ap
-rw-r--r--    1 root     root          20M Jul 27 16:10 gkdb_pack_aq
-rw-r--r--    1 root     root          20M Jul 27 16:10 gkdb_pack_ar
-rw-r--r--    1 root     root          20M Jul 27 16:10 gkdb_pack_as
-rw-r--r--    1 root     root          20M Jul 27 16:10 gkdb_pack_at
-rw-r--r--    1 root     root          11M Jul 27 16:10 gkdb_pack_au

The last piece holds the remainder: 20 × 20 MB + 11 MB = 411 MB.
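The same arithmetic works in reverse: from the file size and the piece size you can predict how many pieces split will produce and roughly how large the last one will be. A minimal sketch in shell arithmetic, using the 411 MB and 20 MB figures from this example (whole megabytes, as rounded by ls -lh, so real byte counts will differ slightly):

```shell
# Sizes in MB, taken from the example above.
total=411
chunk=20

# Number of pieces = ceiling(total / chunk), using integer arithmetic.
pieces=$(( (total + chunk - 1) / chunk ))

# Every piece except the last is exactly $chunk; the last holds the remainder.
last=$(( total - (pieces - 1) * chunk ))

echo "pieces=$pieces last=${last}MB"
# prints: pieces=21 last=11MB
```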

Merge files

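To reassemble the pieces, concatenate them in order. The glob gkdb_pack_* expands in sorted order (aa, ab, ac, ...), which is the order in which split wrote them:

# cat gkdb_pack_* > gkdb_merged.db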

The merged file is identical to the original, so their MD5 checksums match. You can verify this yourself with the md5sum command.
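A minimal end-to-end sketch of split, merge and checksum comparison on a scratch file, assuming GNU coreutils (md5sum is not installed everywhere; macOS, for example, ships md5 instead). The file names sample.dat, part_ and rebuilt.dat are hypothetical.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical input: 5000 numbered lines (about 24 KB).
seq 1 5000 > sample.dat

# Split into 4 KB pieces, then concatenate them back.
# The glob part_* expands in sorted order (part_aa, part_ab, ...),
# which is the order in which split created them.
split -b 4K sample.dat part_
cat part_* > rebuilt.dat

# Compare the MD5 checksums of the original and the rebuilt file.
orig_sum=$(md5sum < sample.dat | cut -d' ' -f1)
new_sum=$(md5sum < rebuilt.dat | cut -d' ' -f1)

if [ "$orig_sum" = "$new_sum" ]; then
    echo "checksums match: $orig_sum"
else
    echo "checksums differ" >&2
fi
```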

You can also split a text file by line count. For example, a file might contain thousands of lines of different lengths; to write one output file for every 100 lines, use the -l option:

# split -l 100 test.txt
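To see -l in action without a real log, the sketch below generates a hypothetical 250-line test.txt with seq in a scratch directory and splits it every 100 lines; the output files use the default prefix x.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical 250-line test file, one number per line.
seq 1 250 > test.txt

# One output file per 100 lines; the default prefix x is used.
split -l 100 test.txt

# xaa and xab get 100 lines each, xac gets the remaining 50.
wc -l xa?
```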

If you give no size option at all, split puts 1000 lines in each output file by default.

Example 1: split a file into 1000-line pieces
With no options, split writes 1000 lines to each output file and names the files [prefix]aa, [prefix]ab, [prefix]ac, and so on; the default prefix is x. For example:

$ split mylog
$ wc -l *

     4450 mylog
     1000 xaa
     1000 xab
     1000 xac
     1000 xad
      450 xae
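The numbers in Example 1 can be reproduced with a generated file. This sketch assumes a scratch directory and a hypothetical 4450-line mylog built with seq; it then checks that the pieces add back up to the original line count.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical 4450-line log file.
seq 1 4450 > mylog

# No options: 1000 lines per piece, prefix x.
split mylog

# Count the pieces and the total number of lines across them.
piece_count=$(( $(ls x* | wc -l) ))
line_total=$(( $(cat x* | wc -l) ))
echo "pieces=$piece_count lines=$line_total"
# prints: pieces=5 lines=4450
```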

Example 2: split a file into 20 MB pieces
Use the -b option to split the file into 20 MB pieces:

$ split -b 20M logdata
$ ls -lh | tail -n +2

-rw------- 1 sathiya sathiya 102M Jul 25 18:47 logdata
-rw------- 1 sathiya sathiya  20M Jul 25 19:20 xaa
-rw------- 1 sathiya sathiya  20M Jul 25 19:20 xab
-rw------- 1 sathiya sathiya  20M Jul 25 19:20 xac
-rw------- 1 sathiya sathiya  20M Jul 25 19:20 xad
-rw------- 1 sathiya sathiya  20M Jul 25 19:20 xae
-rw------- 1 sathiya sathiya 1.6M Jul 25 19:20 xaf
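In GNU split the size suffixes are binary multiples: K is 1024 bytes and M is 1024 × 1024 bytes, so 20M means 20,971,520 bytes per piece. A small sketch, assuming GNU coreutils and a hypothetical 2500-byte file data.bin, shows the effect with K:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical 2500-byte file.
head -c 2500 /dev/zero > data.bin

# 1K = 1024 bytes, so the pieces are 1024, 1024 and 452 bytes.
split -b 1K data.bin

# Print each piece with its size in bytes.
for piece in xa?; do
    echo "$piece $(( $(wc -c < "$piece") ))"
done
```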

Example 3: split a file into 50 MB pieces with a custom prefix
The --bytes option is the long form of -b. Here it splits the file into 50 MB pieces, and the second argument, mydatafile, sets the prefix of the output file names:

$ split --bytes=50M logdata mydatafile
$ ls -lh

total 204M
-rw------- 1 sathiya sathiya 102M Jul 25 18:47 logdata
-rw------- 1 sathiya sathiya  50M Jul 25 19:23 mydatafileaa
-rw------- 1 sathiya sathiya  50M Jul 25 19:23 mydatafileab
-rw------- 1 sathiya sathiya 1.6M Jul 25 19:23 mydatafileac
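Because --bytes=SIZE is just the long spelling of -b SIZE, both forms should produce identical pieces. A quick check on a scratch file, assuming GNU split (long option names are a GNU extension) and hypothetical prefixes long_ and short_:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical input of 3000 numbered lines (about 14 KB).
seq 1 3000 > logdata

# Same size, once with the long option and once with the short one.
split --bytes=5K logdata long_
split -b 5K logdata short_

# Compare the pieces pairwise; cmp -s is silent and exits 0 when files match.
for piece in long_*; do
    suffix=${piece#long_}
    if cmp -s "$piece" "short_$suffix"; then
        echo "$suffix identical"
    else
        echo "$suffix differs" >&2
    fi
done
```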

Example 4: split a file by number of lines
Use the -l option to set the number of lines per output file. Every piece gets that many lines except the last, which gets whatever remains:

$ wc -l testfile
2591 testfile
$ split -l 1500 testfile importantlog
$ wc -l *
1500 importantlogaa
1091 importantlogab
2591 testfile
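A sketch that reproduces Example 4 with a generated 2591-line testfile (hypothetical content from seq) and checks that no lines are lost: the pieces together must hold as many lines as the original.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical 2591-line input file.
seq 1 2591 > testfile
split -l 1500 testfile importantlog

# Per-piece line counts: 1500 and 1091.
wc -l importantlog*

# The pieces together must contain every line of the original.
total=$(( $(cat importantlog* | wc -l) ))
[ "$total" -eq "$(( $(wc -l < testfile) ))" ] && echo "all $total lines present"
```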

Example 5: use numeric suffixes
Use the -d option to give the output files numeric suffixes such as 00, 01, 02 instead of aa, ab, ac:

$ split -d testfile
$ ls
testfile x00 x01 x02
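Numeric suffixes sort the same way as alphabetic ones, so the pieces can still be reassembled with a glob. A sketch, assuming GNU split (-d may be missing from other implementations) and a hypothetical 2591-line testfile:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical 2591-line input file.
seq 1 2591 > testfile

# Numeric suffixes: x00, x01, x02 (default 1000 lines each, last one 591).
split -d testfile
ls x*

# x* expands to x00 x01 x02 in order, so cat rebuilds the original.
cat x* > rebuilt
cmp -s testfile rebuilt && echo "rebuilt file matches"
```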
