grep, sed and awk: three powerful tools of Linux

Time:2021-2-26

1、 grep and regular expressions

grep

grep (global regular expression print) is a powerful text search tool. It searches text with regular expressions and prints the lines that match.

Options

-d <action>: specifies how to handle a directory given as the search target (read, skip or recurse). Without it, grep only reports that the target is a directory and does not search inside it.

-h: when searching multiple files, do not prefix matching lines with the file name.

-i: ignore case distinctions.

-l: list only the names of the files that contain a match.

-n: print each matching line together with its line number.

-r: recursive search of the current directory and its subdirectories; this has the same effect as "-d recurse".

-v: invert the match and show only the lines that do not match.
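A minimal sketch of how these options behave, assuming GNU grep. The temporary directory and the files a.txt and b.txt are made up for illustration.

```shell
# Work in a throwaway directory; the file names below are hypothetical.
dir=$(mktemp -d)
printf 'Root login\nguest\n' > "$dir/a.txt"
printf 'root shell\nnobody\n' > "$dir/b.txt"

# -i ignores case, so "Root" and "root" both match.
# -h drops the file name prefix that grep adds when several files are searched.
grep_ih=$(grep -i -h 'root' "$dir/a.txt" "$dir/b.txt")

# -n prefixes each matching line with its line number.
grep_n=$(grep -n 'guest' "$dir/a.txt")

# -v prints only the lines that do NOT match.
grep_v=$(grep -v 'root' "$dir/b.txt")

# -l prints only the names of files that contain a match (case-sensitive here).
grep_l=$(cd "$dir" && grep -l 'root' a.txt b.txt)

# -r searches the given directory and everything below it.
grep_r=$(cd "$dir" && grep -r 'nobody' .)

printf '%s\n' "$grep_ih" "$grep_n" "$grep_v" "$grep_l" "$grep_r"
rm -rf "$dir"
```

Only b.txt is listed by -l because the search there is case-sensitive and a.txt contains "Root" with a capital letter.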

1. -r: recursive search

[email protected]:/home/xhprof/trunk# grep -r XHProfRuns_Default *
examples/sample.php:$xhprof_runs = new XHProfRuns_Default();
xhprof_html/callgraph.php:$xhprof_runs_impl = new XHProfRuns_Default();
xhprof_html/typeahead.php:$xhprof_runs_impl = new XHProfRuns_Default();

2. Searching several files: each matching line is prefixed with its file name (with GNU grep, -I additionally skips binary files)

[email protected]:~# grep -I root abc.txt 123.txt passwd 
passwd:root:x:0:0:root:/root:/bin/bash

3. -n

[email protected]:~# grep -n 'root' passwd 
1:root:x:0:0:root:/root:/bin/bash

Regular expressions

1. Matching a single character

  • A specific character

    • grep 'a' passwd
  • A character from a range

    • grep '[a-z]' passwd
    • grep '[A-Za-z0-9]' passwd
    • grep '[^0-9]' passwd: a ^ inside the brackets negates the set, so this matches any character that is not a digit
  • Any character

    • grep '.' passwd

Inside brackets, as in grep '[.]', the dot stands only for a literal dot; note the difference. To match a literal dot outside brackets, escape it with a backslash: grep '\.' passwd

  • Any combination of the three above
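The difference between a range, a negated range, the dot and a literal dot can be checked on a few made-up lines. This is only a sketch; the three sample lines are invented, and grep -c is used to count matching lines.

```shell
# Three hypothetical sample lines: letters, digits, and a string with a real dot.
sample=$(printf 'abc\n123\na.c\n')

# [a-z]: lines containing at least one lowercase letter.
with_letter=$(printf '%s\n' "$sample" | grep -c '[a-z]')

# [^0-9]: lines containing at least one character that is not a digit.
with_non_digit=$(printf '%s\n' "$sample" | grep -c '[^0-9]')

# a.c: the dot matches any single character, so "abc" and "a.c" both match.
any_char=$(printf '%s\n' "$sample" | grep -c 'a.c')

# a[.]c and a\.c: only a literal dot matches, so only "a.c" is counted.
literal_bracket=$(printf '%s\n' "$sample" | grep -c 'a[.]c')
literal_escape=$(printf '%s\n' "$sample" | grep -c 'a\.c')

echo "$with_letter $with_non_digit $any_char $literal_bracket $literal_escape"
```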

2. Other regular expression symbols

  • Boundary characters

    • ^ anchors the match at the start of a line: grep '^root' passwd
    • $ anchors the match at the end of a line: false$ matches lines that end with false
    • ^$ matches an empty line: grep '^$' passwd
  • Metacharacters

    • \w: matches any word character, including the underscore. Equivalent to [A-Za-z0-9_]
    • \W: uppercase W, matches any non-word character. Equivalent to [^A-Za-z0-9_]
    • \b: word boundary. For example, grep '\bx\b' passwd selects an x that stands alone, but not an x inside a word
  • Combining characters

    • Repetition

      *: matches the preceding character or subexpression zero or more times. Example: grep 'se*' test.txt
      \+: matches the preceding character or subexpression one or more times. Example: grep 'se\+' test.txt. Note the backslash before the plus sign
      \?: matches the preceding character or subexpression zero or one time. Example: grep 'se\?' test.txt. The question mark also needs a backslash
      Grouping with parentheses: grep '\(se\)*' test.txt. Note the backslash before each parenthesis
      Specifying the number of repetitions: grep '[0-9]\{2,3\}' passwd
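The anchors, \b and the repetition operators above can be tried on a small invented input. This sketch assumes GNU grep, since \b, \+ and \? are GNU extensions to basic regular expressions; the sample lines are hypothetical.

```shell
# Hypothetical input: eight lines, the fourth one empty.
sample() { printf 'root:x:0\nx ray\nbox\n\nse\nsee\ns\n1234\n'; }

starts_root=$(sample | grep -c '^root')        # lines beginning with "root"
empty_lines=$(sample | grep -c '^$')           # lines with nothing between start and end
lone_x=$(sample | grep -c '\bx\b')             # x as a separate word, not the x in "box"
s_then_any_e=$(sample | grep -c 'se*')         # s followed by zero or more e
s_then_some_e=$(sample | grep -c 'se\+')       # s followed by one or more e
s_optional_e=$(sample | grep -c '^se\?$')      # exactly "s" or "se"
grouped=$(sample | grep -c '^\(se\)*$')        # "se" repeated zero or more times
digit_run=$(sample | grep -c '[0-9]\{2,3\}')   # a run of two to three digits

echo "$starts_root $empty_lines $lone_x $s_then_any_e $s_then_some_e $s_optional_e $grouped $digit_run"
```

The grouped pattern also counts the empty line, because zero repetitions of the group is a valid match.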

2、 sed stream editor

sed is a stream editor and an important text-processing tool that works well with regular expressions. It handles the input one line at a time: the current line is copied into a temporary buffer called the "pattern space", the sed commands are applied to that buffer, and the result is sent to standard output. sed then moves on to the next line and repeats this until the end of the file. The file itself does not change unless you redirect the output to store it. sed is mainly used to edit one or more files automatically, to simplify repetitive edits, and to write conversion programs.
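A small sketch of this cycle: the edit is applied to a copy of each line, so the input file keeps its content unless the output is redirected. The temporary file and its two lines are invented for the example.

```shell
# Hypothetical two-line input file.
f=$(mktemp)
printf 'one\ntwo\n' > "$f"

# Each line is copied to the pattern space, edited there, then printed.
shown=$(sed 's/o/0/' "$f")

# The file on disk still holds the original text.
kept=$(cat "$f")

# Redirecting the output is what stores the edited text.
sed 's/o/0/' "$f" > "$f.new"
saved=$(cat "$f.new")

printf '%s\n' "$shown" "$kept" "$saved"
rm -f "$f" "$f.new"
```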

Command format

sed [options] 'command' file(s)
sed [options] -f scriptfile file(s)

Common options

-e <script> or --expression=<script>: process the input with the script given in the option;

-n or --quiet or --silent: suppress the automatic printing of every line, so that only the lines the script prints explicitly are shown;

Common commands

a\ append text below the current line.
i\ insert text above the current line.
c\ change the selected lines to new text.
d delete the selected lines.
n read the next input line and continue with the next command on it, instead of starting again from the first command.
s substitute the specified text.
p print the pattern space.
q quit sed.
r file read lines from file.
w file write the pattern space to file.

1. p: print selected lines

nl passwd | sed -n '10p'               # print line 10
sed -n 'p' passwd                      # print every line
sed -n '/root/p' passwd                # print the lines that match a regular expression
nl passwd | sed -n '10,20p'            # print lines 10 to 20
nl passwd | sed -n '/news/,/nobody/p'  # use regular expressions to delimit the range
nl passwd | sed -n '10,20!p'           # print everything except lines 10 to 20; ! negates the address
nl passwd | sed -n '1~2p'              # every second line: 1, 3, 5, ... (GNU sed)

Note that the -n option is required here. Without it sed also prints every line automatically, so each selected line appears twice and the unselected lines are shown as well.
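The effect of -n is easy to verify on a five-line input. A sketch assuming GNU sed (the first~step address 1~2 is a GNU extension); the helper function lines and its content are made up, and paste is used only to join the output onto one line.

```shell
# Hypothetical five-line input.
lines() { printf 'l1\nl2\nl3\nl4\nl5\n'; }

# Without -n every line is printed once and line 2 a second time: 6 lines.
without_n=$(lines | sed '2p' | wc -l)

# With -n only the explicitly printed line remains.
with_n=$(lines | sed -n '2p')

# A range, a negated range, and every second line starting at line 1.
range=$(lines | sed -n '2,3p' | paste -sd ' ' -)
negated=$(lines | sed -n '2,4!p' | paste -sd ' ' -)
step=$(lines | sed -n '1~2p' | paste -sd ' ' -)

echo "$without_n | $with_n | $range | $negated | $step"
```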

2. a: append text after a line

[email protected]:~# nl passwd|sed '2a **************'
     1    root:x:0:0:root:/root:/bin/bash
     2    daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
**************
     3    bin:x:2:2:bin:/bin:/usr/sbin/nologin
     
nl passwd | sed '1,2a ^^^^^^^^^^^^^^'   # append the text after each of lines 1 and 2

3. i: insert text before a line

[email protected]:~# nl passwd|sed '1,2i **************'
**************
     1    root:x:0:0:root:/root:/bin/bash
**************
     2    daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin

4. c: change the selected lines to new text

[email protected]:~# nl passwd|sed '1c abcd'
abcd
     2    daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
     
# Unlike a and i, when c is given a range of lines the whole range is replaced by a single copy of the new text
[email protected]:~# nl passwd|sed '1,3c abcd'
abcd
     4    sys:x:3:3:sys:/dev:/usr/sbin/nologin

5. d: delete lines

[email protected]:~# nl passwd | sed '/root/d'
     2    daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
     3    bin:x:2:2:bin:/bin:/usr/sbin/nologin

Application cases

Insert 2 lines at the end of the file
nl passwd | sed '$a \    abcd \n    linux'

    49    memcache:x:126:132:Memcached,,,:/nonexistent:/bin/false
    50    postfix:x:127:133::/var/spool/postfix:/bin/false
    51    mongodb:x:128:65534::/var/lib/mongodb:/bin/false
    abcd 
    linux
    
    
    Delete the blank lines in the file; ^ and $ written together (^$) match an empty line
    nl passwd | sed '/^$/d'

6. s: substitution command

sed 's/false/true/' passwd 
Output:
...
sphinxsearch:x:124:131::/home/sphinxsearch:/bin/true
sshd:x:125:65534::/var/run/sshd:/usr/sbin/nologin
memcache:x:126:132:Memcached,,,:/nonexistent:/bin/true
postfix:x:127:133::/var/spool/postfix:/bin/true

sed 's/:/%/g' passwd    # the g flag replaces every occurrence on the line, not just the first
Output:
sphinxsearch%x%124%131%%/home/sphinxsearch%/bin/false
sshd%x%125%65534%%/var/run/sshd%/usr/sbin/nologin
memcache%x%126%132%Memcached,,,%/nonexistent%/bin/false
postfix%x%127%133%%/var/spool/postfix%/bin/false

Filtering IP in ifconfig

eno1      Link encap: Ethernet  hardware address: f8:b1:56:c5:e7:44
          inet address: 172.19.5.175 broadcast: 172.19.5.255 mask: 255.255.255.0
          inet6 address: fe80::c422:e82d:ad66:7a92/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST   MTU:1500   Metric: 1
          Received packet: 35171885 error: 53864 discard: 0 overload: 0 frames: 29047
          Send packet: 25049325 error: 0 discard: 0 overload: 0 carrier: 0
          Collision: 0 send queue length: 1000 
          Receive byte: 8124495140 (8.1 GB) send byte: 4549284803 (4.5 GB)
          Interrupt: 20 Memory:f7f00000-f7f20000 

 ifconfig eno1 | sed -n '/inet /p' | sed 's/^ *inet.*address: *//' | sed 's/ *broadcast.*$//'
 
 Output:
 172.19.5.175

Advanced commands

1. Multiple sed commands: wrap them in {} and separate them with ';'

Delete lines 44-48 and replace false with true
nl passwd|sed '{44,48d;s/false/true/}'

    41    statd:x:121:65534::/var/lib/nfs:/bin/true
    42    mysql:x:1001:1001::/home/mysql:/sbin/nologin
    43    www:x:1002:1002::/home/www:/sbin/nologin
    49    memcache:x:126:132:Memcached,,,:/nonexistent:/bin/true
    50    postfix:x:127:133::/var/spool/postfix:/bin/true
    51    mongodb:x:128:65534::/var/lib/mongodb:/bin/true
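The same combination can also be written with the -e option from the option list, one expression per command. A sketch on three invented lines, assuming GNU sed; both forms should give the same result.

```shell
# Hypothetical three-line input in passwd style.
lines() { printf 'a:/bin/false\nb:/bin/false\nc:/bin/bash\n'; }

# Commands grouped in {} and separated by ';': delete line 2, then substitute.
grouped=$(lines | sed '{2d;s/false/true/}' | paste -sd ' ' -)

# The same script given as two -e expressions.
with_e=$(lines | sed -e '2d' -e 's/false/true/' | paste -sd ' ' -)

echo "$grouped"
echo "$with_e"
```

Line 2 is deleted before the substitution runs, so only lines 1 and 3 reach the s command.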

2. n: read the next input line

# Usage of n: print a line, then read the next one without printing it
[email protected]:~# nl passwd|sed -n '{p;n}'
     1    root:x:0:0:root:/root:/bin/bash
     3    bin:x:2:2:bin:/bin:/usr/sbin/nologin
     5    sync:x:4:65534:sync:/bin:/bin/sync
     7    man:x:6:12:man:/var/cache/man:/usr/sbin/nologin

Tip: nl passwd | sed -n '1~2p' achieves the same effect

3. &: in the replacement text, & stands for the string matched by the pattern

# Insert spaces between the user name and the rest of the line
[email protected]:~# sed 's/^[a-z_]\+/&     /' passwd
root     :x:0:0:root:/root:/bin/bash
daemon     :x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin     :x:2:2:bin:/bin:/usr/sbin/nologin


# Capitalize the first letter of the user name
# Case-conversion metacharacters (GNU sed): \u and \l convert the next character to upper or lower case; \U and \L convert the whole string that follows

# Lowercase \u: capitalize the first letter of the user name
[email protected]:~# sed 's/^[a-z_]\+/\u&/' passwd
Root:x:0:0:root:/root:/bin/bash
Daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
Bin:x:2:2:bin:/bin:/usr/sbin/nologin

# Uppercase \U: convert the whole user name to uppercase
[email protected]:~# sed 's/^[a-z_]\+/\U&/' passwd
ROOT:x:0:0:root:/root:/bin/bash
DAEMON:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
BIN:x:2:2:bin:/bin:/usr/sbin/nologin

4. Use of \( \)

# Extract the user name, UID and GID from the passwd file. The text matched by the first \( \) is referenced in the replacement as \1, the second as \2, and so on
[email protected]:~# sed 's/\(^[a-z_-]\+\):x:\([0-9]\+\):\([0-9]\+\):.*$/USER:\1    UID:\2   GID:\3/' passwd
USER:root    UID:0   GID:0
USER:daemon    UID:1   GID:1
USER:bin    UID:2   GID:2
USER:sys    UID:3   GID:3
USER:sync    UID:4   GID:65534

5. r reads the specified file and inserts its contents after the matching line; w writes the matching line to the specified file

# 123.txt has three lines, all digits. abc.txt has three lines, all letters
# The following command reads 123.txt and inserts its contents after the first line of abc.txt in the output. Neither file is changed
[email protected]:~# sed '1r 123.txt' abc.txt 
qwefadssa
1232323223
32343434
23333
trwrda
asdfasdf

# The following command takes the second line of abc.txt and writes it to 123.txt. The content of 123.txt is replaced; abc.txt remains unchanged
[email protected]:~# sed '2w 123.txt' abc.txt 
qwefadssa
trwrda
asdfasdf
[email protected]:~# cat 123.txt 
trwrda


# Summary
sed '2w fileA' fileB or sed '2r fileA' fileB
The address (here line 2) always refers to fileB; the file that is read or written is always fileA
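A sketch of both commands side by side. The names fileA and fileB and their contents are invented, and the temporary directory path is assumed to contain no spaces, since sed takes everything after r or w up to the end of the line as the file name.

```shell
# Hypothetical files: fileA holds a1..a3, fileB holds b1..b3.
dir=$(mktemp -d)
printf 'a1\na2\na3\n' > "$dir/fileA"
printf 'b1\nb2\nb3\n' > "$dir/fileB"

# r: address 1 refers to fileB; fileA is read and printed after that line.
read_result=$(sed "1r $dir/fileA" "$dir/fileB" | paste -sd ' ' -)

# w: address 2 refers to fileB; that line is written to fileA,
# whose previous content is replaced. -n keeps the screen output quiet.
sed -n "2w $dir/fileA" "$dir/fileB"
written=$(cat "$dir/fileA")

# fileB is only read in both cases.
unchanged=$(paste -sd ' ' "$dir/fileB")

echo "$read_result | $written | $unchanged"
rm -rf "$dir"
```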

6. q: quit early once the specified line has been reached

[email protected]:~# nl passwd |sed '2q'
     1    root:x:0:0:root:/root:/bin/bash
     2    daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
[email protected]:~# nl passwd |sed '/root/q'
     1    root:x:0:0:root:/root:/bin/bash
[email protected]:~# 

7. Batch replacement in multiple files with sed

With the -i option, sed changes the files themselves. Without -i, the result is only printed and the files are left unchanged
Format: sed -i "s/search string/replacement string/g" `grep 'search string' -rl path`
Example: sed -i "s/oldstring/newString/g" `grep oldstring -rl yourdir`
         sed -i "s/English/China/"  `ls test*`
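The difference between running with and without -i can be checked safely in a scratch directory. A sketch assuming GNU sed and GNU grep (BSD sed expects an argument after -i); the directory and the files test1.txt and test2.txt are invented. $(...) is used in place of backticks and behaves the same way here.

```shell
# Hypothetical scratch directory with one file that contains the search string.
dir=$(mktemp -d)
printf 'oldstring here\n' > "$dir/test1.txt"
printf 'nothing to do\n'  > "$dir/test2.txt"

# Without -i the result is only printed; the file keeps its content.
preview=$(sed 's/oldstring/newstring/g' "$dir/test1.txt")
before=$(cat "$dir/test1.txt")

# With -i, the files listed by grep -rl are edited in place.
# The unquoted substitution splits on whitespace, so this form
# assumes file names without spaces.
sed -i 's/oldstring/newstring/g' $(grep -rl 'oldstring' "$dir")
after=$(cat "$dir/test1.txt")
other=$(cat "$dir/test2.txt")

printf '%s\n' "$preview" "$before" "$after" "$other"
rm -rf "$dir"
```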

3、 awk

awk is a language for processing text files and a powerful tool for text analysis. Its processing is flexible, and it can produce statistics, tables and other reports.

The name awk comes from the initials of the family names of its authors: Alfred Aho, Peter Weinberger and Brian Kernighan.

Format

  • Command line format
    awk [options] 'command' file(s)
  • Script format
    awk -f awk-script-file file(s)

Command form:

awk [-F|-f|-v] 'BEGIN{} //{command1; command2} END{}' file

  • [-F|-f|-v]: options. -F sets the field separator, -f runs a script file, -v defines a variable (var=value)

' ' quotes the code block

  • BEGIN: initialization block, executed once before any line is processed. It is mainly used to initialize global variables and set the FS separator
  • //: the pattern for the block that follows; it can be a string or a regular expression
  • {}: command block containing one or more commands
  • ;: multiple commands are separated by semicolons
  • END: closing block, executed once after all lines have been processed. It is mainly used for final calculations or to print summary information
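The three kinds of block and the -F and -v options can be seen together in one short program. This is only a sketch: the three input lines, the variable label and the pattern /^[a-d]/ are invented for illustration.

```shell
# Hypothetical passwd-style input; -F sets the separator, -v passes a variable.
report=$(printf 'root:x:0\ndaemon:x:1\nbin:x:2\n' |
  awk -F ':' -v label='users' '
    BEGIN { n = 0 }                          # runs once, before any input is read
    /^[a-d]/ { n++; last = $1 }              # runs for each line starting with a, b, c or d
    END { print label "=" n, "last=" last }  # runs once, after the last line
  ')
echo "$report"
```

Only daemon and bin start with a letter from a to d, so the counter ends at 2 and last holds the name from the final matching line.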

Common built-in variables

  • $0: the entire current line
  • $1, $2, ...: the first, second, ... field of the line
  • NF: number of fields in the current line
  • NR: number of the current record (line); it keeps increasing across multiple files
  • FILENAME: name of the current input file
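A sketch that prints these variables for two small files, to show in particular that NR keeps counting across files. The files one.txt and two.txt are made up, and the default whitespace separator is used.

```shell
# Two hypothetical input files in a scratch directory.
dir=$(mktemp -d)
printf 'a b c\nd e\n' > "$dir/one.txt"
printf 'f\n' > "$dir/two.txt"

# For every line: file name, record number, field count, first field, whole line.
vars=$(cd "$dir" && awk '{ print FILENAME, NR, NF, $1, $0 }' one.txt two.txt)

printf '%s\n' "$vars"
rm -rf "$dir"
```

The single line of two.txt is reported as record 3, not record 1.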

1. Fields $1, $2, ...: the line is split by the separator and the fields are numbered in order. The default separator is whitespace

awk -F ':' '{print "USERNAME:"$1"\t""UID:"$3}' passwd

2. NR,NF,FILENAME

awk -F ':' '{print "Line:"NR,"Col:"NF,"USER:"$1}' passwd 

3. Print in the format specified by printf

awk -F ':' '{printf("Line:%3s Col:%s User:%s\n",NR,NF,$1)}' passwd

[email protected]:~# awk -F ':' '{printf("Line:%3s Col:%s User:%s\n",NR,NF,$1)}' passwd
Line:  1 Col:7 User:root
Line:  2 Col:7 User:daemon
Line:  3 Col:7 User:bin
Line:  4 Col:7 User:sys
...

4. Use if

awk -F ':' '{if ($3>100) printf("Line:%3s Col:%s User:%s\n",NR,NF,$1)}' passwd 

5. Combining a regular expression with a command

awk -F ':' '/root/{print $1}' passwd

[email protected]:~# awk -F ':' '/root/{print $1}' passwd
root

6. Use BEGIN and END to print a table

awk -F ':' 'BEGIN{print "line  col   user"}{print NR" |"NF" |"$1}END{print "----------------"FILENAME}' passwd

7. Use BEGIN and END to add up the total size of the files in a directory

ls -l|awk 'BEGIN{size=0}{size+=$5}END{print " size is "size/1024/1024"M"}'

8. Count the non-empty lines in passwd. In $1!~/^$/, ~ means "matches the regular expression that follows" and !~ means "does not match"; /^$/ is the regular expression for an empty line

awk -F ':' 'BEGIN{count=0}$1!~/^$/{count++}END{print " count ="count}' passwd
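Both operators can be tried on a short invented input with two empty lines. A minimal sketch; the sample lines and the pattern /^sys/ are hypothetical.

```shell
# Hypothetical input: three passwd-style lines with empty lines in between.
sample() { printf 'root:x:0\n\nsyslog:x:104\n\nbin:x:2\n'; }

# !~ : count the lines whose first field does NOT match "empty".
non_empty=$(sample | awk -F ':' '$1 !~ /^$/ { count++ } END { print count }')

# ~ : print the first field when it DOES match the pattern.
sys_users=$(sample | awk -F ':' '$1 ~ /^sys/ { print $1 }')

echo "$non_empty $sys_users"
```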

9. Put the statistical results into the array and print them out

awk -F ':' 'BEGIN{count=0}{if ($3>100) name[count++]=$1}END{for(i=0;i<count;i++) print i,name[i]}' passwd 

[email protected]:~# awk -F ':' 'BEGIN{count=0}{if ($3>100) name[count++]=$1}END{for(i=0;i<count;i++) print i,name[i]}' passwd 
0 nobody
1 systemd-network
2 systemd-resolve
3 systemd-bus-proxy
4 syslog

10. Calculate the total score and average score

Test content

zhangsan 80
lisi 81.5
wangwu 93
zhangsan 85
lisi 88
wangwu 97
zhangsan 90
lisi 92
wangwu 88

Required output format: (average: average, total: total)
name#######average#######total
zhangsan xxx xxx
lisi xxx xxx
wangwu xxx xxx

awk 'BEGIN{print "name####average#####total"}{score[$1]+=$2;count[$1]+=1}END{for (i in score) print i,score[i]/count[i],score[i]}' test.txt

This example introduces another way to traverse an array: for (x in array)
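awk does not guarantee the order in which for (x in array) visits the keys, so the rows may come out in any order. One possible way to get a stable listing is to pipe the result through sort, as in this sketch. It uses only the first few lines of the test data, and the helper function scores is invented.

```shell
# A subset of the test content above, supplied by a hypothetical helper.
scores() { printf 'zhangsan 80\nlisi 81.5\nzhangsan 85\nlisi 88\n'; }

# Sum and count per name, then print name, average and total.
# sort makes the row order predictable regardless of awk's traversal order.
summary=$(scores | awk '
  { score[$1] += $2; count[$1]++ }
  END { for (name in score) print name, score[name] / count[name], score[name] }
' | sort)

printf '%s\n' "$summary"
```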
