Deduplicating text in the shell while preserving the original order

Time:2022-5-26

In short, this technique addresses the following scenario.

Suppose we have the following text:

cccc
aaaa
bbbb
dddd
bbbb
cccc
aaaa

Now it needs to be deduplicated. That alone is simple: sort -u can handle it. But suppose I want to keep the original order of the text. For example, aaaa appears twice here; I only want to remove the second aaaa, and since the first aaaa comes before bbbb, it must still come before bbbb after deduplication. So my expected output is:

cccc
aaaa
bbbb
dddd
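
To make the contrast concrete, the short sketch below shows what sort -u produces for this input. The temporary file created with mktemp and the variable names are assumptions for illustration only.

```shell
# Recreate the sample input in a temporary file (path chosen by mktemp).
input=$(mktemp)
printf '%s\n' cccc aaaa bbbb dddd bbbb cccc aaaa > "$input"

# sort -u removes duplicates but emits the lines in sorted order,
# so the original ordering (cccc first) is lost.
sorted_unique=$(sort -u "$input")
printf '%s\n' "$sorted_unique"
# aaaa
# bbbb
# cccc
# dddd

rm -f "$input"
```

The duplicates are gone, but cccc has moved from the first line to the third, which is exactly what we want to avoid.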

Of course, the problem itself is not difficult. It is easy to write in C++ or Python, but there is no need for a sledgehammer to crack a nut: when a shell command can solve it, the shell is always our first choice. The answer is given at the end. First, here is how I came across the problem.

Sometimes we want to add our own directory to the PATH environment variable, and we write something like the following in the ~/.bashrc file. For example, if the directory to be added is $HOME/bin:

export PATH=$HOME/bin:$PATH

This prepends $HOME/bin to PATH so that it is searched first. After we run source ~/.bashrc, $HOME/bin is added to PATH. Suppose we later add another directory, such as:

export PATH=$HOME/local/bin:$HOME/bin:$PATH

After running source ~/.bashrc again in the same shell, PATH actually contains $HOME/bin twice. This does no harm in practice, but it is unbearable for anyone with a perfectionist streak. So the problem becomes: remove the repeated entries in $PATH while keeping the original order, so that whichever entry came first still comes first after deduplication. Order matters because the shell searches the PATH entries from the first one onward when looking up a command.
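
The duplication can be reproduced without touching the real environment. In the sketch below, demo_home and demo_path are hypothetical stand-ins for $HOME and $PATH, and the starting value /usr/bin:/bin is an assumption.

```shell
# Hypothetical stand-ins, so the real HOME and PATH are left untouched.
demo_home=/home/user
demo_path=/usr/bin:/bin

# The first version of ~/.bashrc is sourced.
demo_path=$demo_home/bin:$demo_path

# ~/.bashrc is edited to add local/bin, then sourced again in the same shell.
demo_path=$demo_home/local/bin:$demo_home/bin:$demo_path

# Print one entry per line to make the repeat visible.
printf '%s\n' "$demo_path" | tr ':' '\n'
# /home/user/local/bin
# /home/user/bin
# /home/user/bin
# /usr/bin
# /bin
```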

Well, having said so much, let's reveal the final result. Taking the data at the beginning of the article as an example, and assuming the input file is in.txt, the command is as follows:

cat -n in.txt | sort -k2,2 -k1,1n | uniq -f1 | sort -k1,1n | cut -f2-

These are all very simple shell commands, explained below:

cat -n in.txt: output the text with each line preceded by its line number, separated by \t
sort -k2,2 -k1,1n: sort the input; the primary key is the second field, the secondary key is the first field, compared numerically
uniq -f1: ignore the first field when comparing lines for duplicates; the first field is still included in the output
sort -k1,1n: sort the input; the key is the first field, compared numerically
cut -f2-: output field 2 and everything after it; the default delimiter is \t
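
The stages described above can be traced one at a time on the sample data. This is a sketch: the temporary file and the result variable are assumptions for illustration, and the comments list each stage's output as "line number, text" pairs rather than reproducing the exact spacing printed by cat -n.

```shell
# Recreate the sample input in a temporary file (path chosen by mktemp).
input=$(mktemp)
printf '%s\n' cccc aaaa bbbb dddd bbbb cccc aaaa > "$input"

# Stage 1: number every line.
cat -n "$input"
# 1 cccc, 2 aaaa, 3 bbbb, 4 dddd, 5 bbbb, 6 cccc, 7 aaaa

# Stage 2: group identical lines, earliest occurrence first within a group.
cat -n "$input" | sort -k2,2 -k1,1n
# 2 aaaa, 7 aaaa, 3 bbbb, 5 bbbb, 1 cccc, 6 cccc, 4 dddd

# Stage 3: keep only the first line of each group (line number ignored).
cat -n "$input" | sort -k2,2 -k1,1n | uniq -f1
# 2 aaaa, 3 bbbb, 1 cccc, 4 dddd

# Stage 4: restore the original order using the saved line numbers.
cat -n "$input" | sort -k2,2 -k1,1n | uniq -f1 | sort -k1,1n
# 1 cccc, 2 aaaa, 3 bbbb, 4 dddd

# Stage 5: drop the line numbers.
result=$(cat -n "$input" | sort -k2,2 -k1,1n | uniq -f1 | sort -k1,1n | cut -f2-)
printf '%s\n' "$result"
# cccc
# aaaa
# bbbb
# dddd

rm -f "$input"
```

One caveat: the key -k2,2 compares only the first whitespace-delimited word of each line. For input lines that contain spaces, a key of -k2 (field 2 through the end of the line) is likely the safer choice; that is a variation on the command, not the original.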

You can start with the first command and add the others one at a time to see the actual output, which makes it easier to understand. So how do we deal with the repeated entries in $PATH? Continuing the previous example, just use tr to convert the separators before and after:

export PATH=$HOME/local/bin:$HOME/bin:$PATH
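# The final sed strips the trailing ':' left by tr; an empty PATH entry would mean the current directory
export PATH=`echo "$PATH" | tr ':' '\n' | cat -n | sort -k2,2 -k1,1n | uniq -f1 | sort -k1,1n | cut -f2- | tr '\n' ':' | sed 's/:$//'`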
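
Before applying this to the real PATH, it may help to try the pipeline on a sample string. The sketch below wraps it in a function; dedup_colon_list is a hypothetical helper name, not a standard command, and the sample value is made up for illustration.

```shell
# dedup_colon_list: hypothetical helper that removes repeated entries from
# a colon-separated list while keeping the first occurrence of each.
dedup_colon_list() {
    printf '%s\n' "$1" \
        | tr ':' '\n' \
        | cat -n | sort -k2,2 -k1,1n | uniq -f1 | sort -k1,1n | cut -f2- \
        | tr '\n' ':' \
        | sed 's/:$//'   # drop the trailing ':' left by the last tr
}

# Made-up sample value with two repeated entries.
sample=/home/user/local/bin:/home/user/bin:/usr/bin:/home/user/bin:/usr/bin
deduped=$(dedup_colon_list "$sample")
printf '%s\n' "$deduped"
# /home/user/local/bin:/home/user/bin:/usr/bin
```

Once the output looks right, the same function could be used as PATH=$(dedup_colon_list "$PATH").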

In fact, using PATH this way still has a problem. For example, if we want to remove $HOME/bin from PATH after running the commands above, changing ~/.bashrc to the following is not enough:

export PATH=$HOME/local/bin:$PATH
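# The final sed strips the trailing ':' left by tr; an empty PATH entry would mean the current directory
export PATH=`echo "$PATH" | tr ':' '\n' | cat -n | sort -k2,2 -k1,1n | uniq -f1 | sort -k1,1n | cut -f2- | tr '\n' ':' | sed 's/:$//'`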

Because $HOME/bin has already been added to $PATH in the current shell, this does not delete it. Perhaps the best approach is to know exactly which paths are needed and specify them explicitly instead of appending to the existing value.
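
As a rough sketch of that last suggestion, a ~/.bashrc could assign the complete list instead of building on the inherited value. The system directories listed here are an assumption and would need to match what the machine actually requires.

```shell
# Hypothetical ~/.bashrc fragment: the full search path is spelled out, so
# sourcing the file repeatedly always yields the same value. The system
# directories below are assumed, not taken from any particular machine.
PATH="$HOME/local/bin:/usr/local/bin:/usr/bin:/bin"
export PATH
```

With this form, removing a directory is just a matter of deleting it from the list and sourcing the file again. The trade-off is that entries added by system-wide startup files are no longer inherited.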