Using fdupes to find and delete duplicate files on Linux

Time:2021-6-10

Finding and removing duplicate files is a common need for most computer users, but doing it by hand is tedious and time-consuming. If you're running GNU/Linux on your machine, the fdupes tool makes it easy to find duplicate files.
What is fdupes?
fdupes is a Linux tool written in C by Adrian Lopez and released under the MIT license. It finds duplicate files in the specified directories and, optionally, their subdirectories. fdupes identifies duplicates by comparing the MD5 signatures of files and then comparing the files byte by byte. It provides options to list duplicates, delete them, or replace them with hard links.

Files are compared in the following order:

Size comparison > partial MD5 signature comparison > full MD5 signature comparison > byte by byte comparison
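The staged order above can be approximated with standard tools. What follows is only a rough sketch of the idea, not how fdupes is implemented: same_content is a hypothetical helper, and the 4096-byte prefix used for the partial signature is an assumed value.

```shell
#!/bin/sh
# Rough sketch of the staged comparison; NOT fdupes' actual implementation.
# same_content is a hypothetical helper; the 4096-byte prefix is an assumption.
same_content() {
    # Stage 1: files of different sizes cannot be duplicates.
    [ "$(wc -c < "$1")" -eq "$(wc -c < "$2")" ] || return 1
    # Stage 2: partial MD5 signature over the first bytes only.
    [ "$(head -c 4096 "$1" | md5sum)" = "$(head -c 4096 "$2" | md5sum)" ] || return 1
    # Stage 3: MD5 signature over the whole file.
    [ "$(md5sum < "$1")" = "$(md5sum < "$2")" ] || return 1
    # Stage 4: byte-by-byte comparison rules out hash collisions.
    cmp -s "$1" "$2"
}

# Throwaway test files in a temporary directory.
demo=$(mktemp -d)
echo "same text" > "$demo/a.txt"
echo "same text" > "$demo/b.txt"
echo "other text" > "$demo/c.txt"

same_content "$demo/a.txt" "$demo/b.txt" && echo "a.txt and b.txt are duplicates"
same_content "$demo/a.txt" "$demo/c.txt" || echo "a.txt and c.txt differ"
```

The cheap checks run first, so most non-duplicates are rejected before any file has to be read in full.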

Install fdupes to Linux
On Debian-based systems such as Ubuntu and Linux Mint, install the latest version of fdupes with the following command.

$ sudo apt-get install fdupes

On CentOS / RHEL, you need to enable the EPEL repository before installing the fdupes package; on Fedora, install it with the distribution's package manager.

# yum install fdupes      [On CentOS / RHEL]
# dnf install fdupes      [On Fedora 22 and later]

Note: since Fedora 22, DNF has replaced Yum as the default package manager.
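If you are unsure which of these commands applies to your machine, a small check like the following can help. It is only a sketch: install_hint is a hypothetical helper, and the epel-release package name is the usual way to enable EPEL on CentOS (an assumption here; RHEL may need a different procedure).

```shell
#!/bin/sh
# Hypothetical helper: print how fdupes could be installed on this machine.
install_hint() {
    if command -v fdupes >/dev/null 2>&1; then
        echo "fdupes is already installed"
    elif command -v apt-get >/dev/null 2>&1; then
        echo "sudo apt-get install fdupes"
    elif command -v dnf >/dev/null 2>&1; then
        echo "sudo dnf install fdupes"
    elif command -v yum >/dev/null 2>&1; then
        # Assumption: EPEL is enabled through the epel-release package.
        echo "sudo yum install epel-release && sudo yum install fdupes"
    else
        echo "no supported package manager found"
    fi
}

install_hint
```

The function only prints a suggestion and never installs anything itself.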

How to use the fdupes command
1. For demonstration purposes, let's create some duplicate files in a directory (named tecmint here) with the following command:

$ mkdir /home/"$USER"/Desktop/tecmint && cd /home/"$USER"/Desktop/tecmint && for i in {1..15}; do echo "I Love Tecmint. Tecmint is a very nice community of Linux Users." > tecmint${i}.txt ; done

After running the above command, use the ls command to verify that the duplicate files were created.

$ ls -l

total 60
-rw-r--r-- 1 tecmint tecmint 65 Aug  8 11:22 tecmint10.txt
-rw-r--r-- 1 tecmint tecmint 65 Aug  8 11:22 tecmint11.txt
-rw-r--r-- 1 tecmint tecmint 65 Aug  8 11:22 tecmint12.txt
-rw-r--r-- 1 tecmint tecmint 65 Aug  8 11:22 tecmint13.txt
-rw-r--r-- 1 tecmint tecmint 65 Aug  8 11:22 tecmint14.txt
-rw-r--r-- 1 tecmint tecmint 65 Aug  8 11:22 tecmint15.txt
-rw-r--r-- 1 tecmint tecmint 65 Aug  8 11:22 tecmint1.txt
-rw-r--r-- 1 tecmint tecmint 65 Aug  8 11:22 tecmint2.txt
-rw-r--r-- 1 tecmint tecmint 65 Aug  8 11:22 tecmint3.txt
-rw-r--r-- 1 tecmint tecmint 65 Aug  8 11:22 tecmint4.txt
-rw-r--r-- 1 tecmint tecmint 65 Aug  8 11:22 tecmint5.txt
-rw-r--r-- 1 tecmint tecmint 65 Aug  8 11:22 tecmint6.txt
-rw-r--r-- 1 tecmint tecmint 65 Aug  8 11:22 tecmint7.txt
-rw-r--r-- 1 tecmint tecmint 65 Aug  8 11:22 tecmint8.txt
-rw-r--r-- 1 tecmint tecmint 65 Aug  8 11:22 tecmint9.txt

The above command creates 15 files named tecmint1.txt, tecmint2.txt ... tecmint15.txt, each containing the same text:

"I Love Tecmint. Tecmint is a very nice community of Linux Users."
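
Before running fdupes, a quick sanity check can confirm that the generated files really are identical. The sketch below recreates the 15 files in a temporary directory (an assumption, so that your Desktop is left untouched) and counts the distinct MD5 checksums.

```shell
#!/bin/sh
# Recreate the demo files in a throwaway directory (assumed location).
demo=$(mktemp -d)
for i in $(seq 1 15); do
    echo "I Love Tecmint. Tecmint is a very nice community of Linux Users." > "$demo/tecmint${i}.txt"
done

# One distinct checksum means every file has identical content.
distinct=$(md5sum "$demo"/*.txt | awk '{print $1}' | sort -u | wc -l)
echo "files: $(ls "$demo" | wc -l), distinct checksums: $distinct"
```

A single distinct checksum across 15 files is exactly the situation fdupes is meant to detect.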
2. Now search the tecmint folder for duplicate files.

$ fdupes /home/$USER/Desktop/tecmint
/home/tecmint/Desktop/tecmint/tecmint13.txt
/home/tecmint/Desktop/tecmint/tecmint8.txt
/home/tecmint/Desktop/tecmint/tecmint11.txt
/home/tecmint/Desktop/tecmint/tecmint3.txt
/home/tecmint/Desktop/tecmint/tecmint4.txt
/home/tecmint/Desktop/tecmint/tecmint6.txt
/home/tecmint/Desktop/tecmint/tecmint7.txt
/home/tecmint/Desktop/tecmint/tecmint9.txt
/home/tecmint/Desktop/tecmint/tecmint10.txt
/home/tecmint/Desktop/tecmint/tecmint2.txt
/home/tecmint/Desktop/tecmint/tecmint5.txt
/home/tecmint/Desktop/tecmint/tecmint14.txt
/home/tecmint/Desktop/tecmint/tecmint1.txt
/home/tecmint/Desktop/tecmint/tecmint15.txt
/home/tecmint/Desktop/tecmint/tecmint12.txt
3. Use the -r option to search for duplicate files recursively in each directory, including its subdirectories.

The scan covers all files and folders, so it can take a while depending on how many there are. In the meantime, the terminal shows a progress indicator, as below.

$ fdupes -r /home

Progress [37780/54747] 69%
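
To see what -r changes, it can help to try it on a tiny directory tree first. This is a minimal sketch that assumes a throwaway directory created with mktemp; the file names are made up for illustration, and the fdupes calls are skipped if the tool is not installed.

```shell
#!/bin/sh
# Throwaway layout: one copy at the top level, one in a subdirectory.
demo=$(mktemp -d)
mkdir "$demo/sub"
echo "duplicate content" > "$demo/top.txt"
echo "duplicate content" > "$demo/sub/nested.txt"

if command -v fdupes >/dev/null 2>&1; then
    # Count reported paths; sed drops the blank lines that separate sets.
    flat=$(fdupes -q "$demo" | sed '/^$/d' | wc -l)
    deep=$(fdupes -r -q "$demo" | sed '/^$/d' | wc -l)
    echo "without -r: $flat paths, with -r: $deep paths"
else
    echo "fdupes is not installed; skipping" >&2
fi
```

Without -r the nested copy is never examined, so no duplicates should be reported; with -r both paths should appear.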
4. Use the -S option to see the size of the duplicate files found in a folder.

$ fdupes -S /home/$USER/Desktop/tecmint

65 bytes each:                         
/home/tecmint/Desktop/tecmint/tecmint13.txt
/home/tecmint/Desktop/tecmint/tecmint8.txt
/home/tecmint/Desktop/tecmint/tecmint11.txt
/home/tecmint/Desktop/tecmint/tecmint3.txt
/home/tecmint/Desktop/tecmint/tecmint4.txt
/home/tecmint/Desktop/tecmint/tecmint6.txt
/home/tecmint/Desktop/tecmint/tecmint7.txt
/home/tecmint/Desktop/tecmint/tecmint9.txt
/home/tecmint/Desktop/tecmint/tecmint10.txt
/home/tecmint/Desktop/tecmint/tecmint2.txt
/home/tecmint/Desktop/tecmint/tecmint5.txt
/home/tecmint/Desktop/tecmint/tecmint14.txt
/home/tecmint/Desktop/tecmint/tecmint1.txt
/home/tecmint/Desktop/tecmint/tecmint15.txt
/home/tecmint/Desktop/tecmint/tecmint12.txt
5. You can combine the -S and -r options to see the size of duplicate files in every directory and subdirectory involved, as follows:

$ fdupes -Sr /home/$USER/Desktop/

65 bytes each:                         
/home/tecmint/Desktop/tecmint/tecmint13.txt
/home/tecmint/Desktop/tecmint/tecmint8.txt
/home/tecmint/Desktop/tecmint/tecmint11.txt
/home/tecmint/Desktop/tecmint/tecmint3.txt
/home/tecmint/Desktop/tecmint/tecmint4.txt
/home/tecmint/Desktop/tecmint/tecmint6.txt
/home/tecmint/Desktop/tecmint/tecmint7.txt
/home/tecmint/Desktop/tecmint/tecmint9.txt
/home/tecmint/Desktop/tecmint/tecmint10.txt
/home/tecmint/Desktop/tecmint/tecmint2.txt
/home/tecmint/Desktop/tecmint/tecmint5.txt
/home/tecmint/Desktop/tecmint/tecmint14.txt
/home/tecmint/Desktop/tecmint/tecmint1.txt
/home/tecmint/Desktop/tecmint/tecmint15.txt
/home/tecmint/Desktop/tecmint/tecmint12.txt
107 bytes each:
/home/tecmint/Desktop/resume_files/r-csc.html
/home/tecmint/Desktop/resume_files/fc.html
6. Instead of searching one folder (or all of them) recursively, you can search selectively in two or three folders. As before, you can add the -S and/or -r options if needed.

$ fdupes /home/avi/Desktop/ /home/avi/Templates/

7. To delete duplicate files while preserving one copy, use the -d option. Be extra careful with this option, because a mistake can result in lost files or data, and the deletion cannot be undone.

$ fdupes -d /home/$USER/Desktop/tecmint

[1] /home/tecmint/Desktop/tecmint/tecmint13.txt
[2] /home/tecmint/Desktop/tecmint/tecmint8.txt
[3] /home/tecmint/Desktop/tecmint/tecmint11.txt
[4] /home/tecmint/Desktop/tecmint/tecmint3.txt
[5] /home/tecmint/Desktop/tecmint/tecmint4.txt
[6] /home/tecmint/Desktop/tecmint/tecmint6.txt
[7] /home/tecmint/Desktop/tecmint/tecmint7.txt
[8] /home/tecmint/Desktop/tecmint/tecmint9.txt
[9] /home/tecmint/Desktop/tecmint/tecmint10.txt
[10] /home/tecmint/Desktop/tecmint/tecmint2.txt
[11] /home/tecmint/Desktop/tecmint/tecmint5.txt
[12] /home/tecmint/Desktop/tecmint/tecmint14.txt
[13] /home/tecmint/Desktop/tecmint/tecmint1.txt
[14] /home/tecmint/Desktop/tecmint/tecmint15.txt
[15] /home/tecmint/Desktop/tecmint/tecmint12.txt

Set 1 of 1, preserve files [1 - 15, all]:

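fdupes lists all the duplicates and asks which files to preserve: you can enter one or more file numbers, or all. Note that the selection names the files to keep, and every other file in the set is deleted. In the run below, entering 2-15 made this version of fdupes keep only file [2] (marked [+]) and delete the rest (marked [-]), so check the markers before relying on range input.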

Set 1 of 1, preserve files [1 - 15, all]: 2-15

   [-] /home/tecmint/Desktop/tecmint/tecmint13.txt
   [+] /home/tecmint/Desktop/tecmint/tecmint8.txt
   [-] /home/tecmint/Desktop/tecmint/tecmint11.txt
   [-] /home/tecmint/Desktop/tecmint/tecmint3.txt
   [-] /home/tecmint/Desktop/tecmint/tecmint4.txt
   [-] /home/tecmint/Desktop/tecmint/tecmint6.txt
   [-] /home/tecmint/Desktop/tecmint/tecmint7.txt
   [-] /home/tecmint/Desktop/tecmint/tecmint9.txt
   [-] /home/tecmint/Desktop/tecmint/tecmint10.txt
   [-] /home/tecmint/Desktop/tecmint/tecmint2.txt
   [-] /home/tecmint/Desktop/tecmint/tecmint5.txt
   [-] /home/tecmint/Desktop/tecmint/tecmint14.txt
   [-] /home/tecmint/Desktop/tecmint/tecmint1.txt
   [-] /home/tecmint/Desktop/tecmint/tecmint15.txt
   [-] /home/tecmint/Desktop/tecmint/tecmint12.txt
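
For scripted cleanups, the fdupes help text also lists -N (--noprompt), which together with -d preserves the first file in each set and deletes the rest without asking. The sketch below tries this on throwaway files only; the directory and file names are assumptions for illustration, and the call is skipped if fdupes is not installed.

```shell
#!/bin/sh
# Work on throwaway files only: -d with -N deletes without asking.
demo=$(mktemp -d)
for i in 1 2 3; do
    echo "duplicate content" > "$demo/copy$i.txt"
done
echo "unique content" > "$demo/unique.txt"

if command -v fdupes >/dev/null 2>&1; then
    # -d deletes duplicates; -N keeps the first file of each set without prompting.
    fdupes -d -N -q "$demo"
else
    echo "fdupes is not installed; skipping" >&2
fi

# Show what is left in the directory.
ls "$demo"
```

Because nothing is confirmed interactively, it is worth running the same command without -d first to see which sets would be affected.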
8. To be safe, you may want to write the output of fdupes to a file and then review that file to decide which files to delete. This reduces the risk of deleting files by accident. You can do it like this:

$ fdupes -Sr /home > /home/fdupes.txt

Note: replace /home with the folder you want to scan, and write the report to a location your user can write to. The -r and -S options used here search recursively and print the size of each set.
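
Once the report exists, a short summary makes it easier to judge how much there is to review. The following sketch works on a small sample report that mimics the output format shown earlier (paths reused from this article); the awk one-liner is an illustration, not part of fdupes.

```shell
#!/bin/sh
# Sample report mimicking `fdupes -Sr` output (paths reused from the article).
report=$(mktemp)
cat > "$report" <<'EOF'
65 bytes each:
/home/tecmint/Desktop/tecmint/tecmint13.txt
/home/tecmint/Desktop/tecmint/tecmint8.txt

107 bytes each:
/home/tecmint/Desktop/resume_files/r-csc.html
/home/tecmint/Desktop/resume_files/fc.html

EOF

# Size headers start a new set; every other non-blank line is a file path.
summary=$(awk '/ bytes? each: *$/ {sets++; next} NF {files++}
               END {print sets " sets, " files " files"}' "$report")
echo "$summary"
```

For the sample above this prints "2 sets, 4 files". Point the awk command at your own report file to summarize a real scan.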

9. Use the -f option to omit the first file in each set of matches.

First, list the files in the directory.

$ ls -l /home/$USER/Desktop/tecmint

total 20
-rw-r--r-- 1 tecmint tecmint 65 Aug  8 11:22 tecmint9 (3rd copy).txt
-rw-r--r-- 1 tecmint tecmint 65 Aug  8 11:22 tecmint9 (4th copy).txt
-rw-r--r-- 1 tecmint tecmint 65 Aug  8 11:22 tecmint9 (another copy).txt
-rw-r--r-- 1 tecmint tecmint 65 Aug  8 11:22 tecmint9 (copy).txt
-rw-r--r-- 1 tecmint tecmint 65 Aug  8 11:22 tecmint9.txt

Then run fdupes with the -f option, which omits the first file in each matching set from the output.

$ fdupes -f /home/$USER/Desktop/tecmint

/home/tecmint/Desktop/tecmint9 (copy).txt
/home/tecmint/Desktop/tecmint9 (3rd copy).txt
/home/tecmint/Desktop/tecmint9 (another copy).txt
/home/tecmint/Desktop/tecmint9 (4th copy).txt
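
Because -f leaves out one file per set, its output is effectively a list of redundant copies. A minimal sketch of saving that list for manual review follows; the temporary directory, the file names and the candidates file are assumptions for illustration, and the call is skipped if fdupes is not installed.

```shell
#!/bin/sh
# Three identical throwaway files, including names that contain spaces.
demo=$(mktemp -d)
for name in "tecmint9.txt" "tecmint9 (copy).txt" "tecmint9 (another copy).txt"; do
    echo "I Love Tecmint." > "$demo/$name"
done

candidates=$(mktemp)
if command -v fdupes >/dev/null 2>&1; then
    # -f omits the first file of each set; sed drops the blank separator lines,
    # leaving one redundant path per line for manual review.
    fdupes -f -q "$demo" | sed '/^$/d' > "$candidates"
    echo "redundant copies: $(wc -l < "$candidates")"
else
    echo "fdupes is not installed; skipping" >&2
fi
```

Nothing is deleted here. The list is only written to a file so that each path can be checked before any removal.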
10. Check the installed version of fdupes.

$ fdupes --version

fdupes 1.51
11. If you need help with fdupes, use the -h switch.

$ fdupes -h
Usage: fdupes [options] DIRECTORY...
 -r --recurse       for every directory given follow subdirectories
                    encountered within
 -R --recurse:      for each directory given after this option follow
                    subdirectories encountered within (note the ':' at
                    the end of the option, manpage for more details)
 -s --symlinks      follow symlinks
 -H --hardlinks     normally, when two or more files point to the same
                    disk area they are treated as non-duplicates; this
                    option will change this behavior
 -n --noempty       exclude zero-length files from consideration
 -A --nohidden      exclude hidden files from consideration
 -f --omitfirst     omit the first file in each set of matches
 -1 --sameline      list each set of matches on a single line
 -S --size          show size of duplicate files
 -m --summarize     summarize dupe information
 -q --quiet         hide progress indicator
 -d --delete        prompt user for files to preserve and delete all
                    others; important: under particular circumstances,
                    data may be lost when using this option together
                    with -s or --symlinks, or when specifying a
                    particular directory more than once; refer to the
                    fdupes documentation for additional information
 -N --noprompt      together with --delete, preserve the first file in
                    each set of duplicates and delete the rest without
                    prompting the user
 -v --version       display fdupes version
 -h --help          display this help message
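
Several of the options listed above can be combined in one call. As a hedged illustration, the sketch below uses -r, -n, -1 and -q together on a throwaway directory (file names are assumptions) to print one duplicate set per line while ignoring empty files; the call is skipped if fdupes is not installed.

```shell
#!/bin/sh
# Two sets of real duplicates plus two empty files in a throwaway tree.
demo=$(mktemp -d)
mkdir "$demo/sub"
echo "alpha" > "$demo/a1.txt"
echo "alpha" > "$demo/sub/a2.txt"
echo "beta" > "$demo/b1.txt"
echo "beta" > "$demo/sub/b2.txt"
: > "$demo/empty1.txt"        # zero-length files, excluded by -n
: > "$demo/sub/empty2.txt"

if command -v fdupes >/dev/null 2>&1; then
    # -r recurse, -n skip empty files, -1 one set per line, -q no progress output
    sets=$(fdupes -r -n -1 -q "$demo" | sed '/^$/d' | wc -l)
    echo "duplicate sets found: $sets"
else
    echo "fdupes is not installed; skipping" >&2
fi
```

With -1 each set occupies a single line, so counting lines gives the number of duplicate sets.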

That's it. Let us know in the comments section below how you find and delete duplicate files on Linux, and what you think of this tool.