Introduction to the basic Linux text analysis command awk (8)

Time:2021-4-6

Awk is a pattern scanning and processing language, which is a very powerful tool for data analysis and processing.

awk [options] 'pattern {action}' file…

awk works as follows: it reads the input (standard input or files) line by line and executes action for each line that matches pattern. When pattern is omitted, every line matches; when action is omitted, '{print}' is executed; the two cannot both be omitted.
Each line of input is a record for awk. awk uses $0 to refer to the current record:


[root@centos7 ~]# head -1 /etc/passwd | awk '{print $0}'
root:x:0:0:root:/root:/bin/bash

In the example, the output of the command head -1 /etc/passwd is used as the input of awk. The pattern is omitted, so every record matches, and the action prints the current record ($0).
For each record, awk uses a separator to split it into columns (fields). The first column is represented by $1, the second by $2, and so on; the last column is represented by $NF.

The -F option specifies the separator.
For example, to output the first column (user name) and the last column (login shell) of the first line of the file /etc/passwd:


[root@centos7 ~]# head -1 /etc/passwd | awk -F: '{print $1,$NF}'
root /bin/bash

When no separator is specified, one or more blanks (whitespace characters produced by the space bar or the Tab key) are used as the separator. The default output separator is a single space.
For example, to output the file size and file name from the result of the command ls -l *:


[root@centos7 temp]# ls -l * | awk '{print $5,$NF}'
13 b.txt
58 c.txt
12 d.txt
0 e.txt
0 f.txt
24 test.sh
[root@centos7 temp]#
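The default splitting rule can be checked on inline input. The following is a minimal sketch; the sample strings are made up for illustration. It shows that runs of spaces and tabs count as one separator and leading blanks are ignored, whereas an explicit -F '\t' splits on tabs only.

```shell
# Default FS: a run of spaces and tabs is a single separator; leading blanks are ignored.
ws_out=$(printf '  alpha   beta\tgamma\n' | awk '{print NF, $1, $NF}')
printf '%s\n' "$ws_out"

# With an explicit tab separator, spaces no longer split fields.
tab_out=$(printf 'a b\tc d\n' | awk -F '\t' '{print NF, $1}')
printf '%s\n' "$tab_out"
```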

You can also filter any column:

[root@centos7 temp]# ls -l *|awk '$5>20 && $NF ~ /txt$/'
-rw-r--r-- 1 body body 58 Nov 16 16:34 c.txt

Here $5>20 means that the value of the fifth column is greater than 20; && means logical AND; in $NF ~ /txt$/, ~ means match and the text between the slashes is a regular expression. The action is omitted, so the whole awk statement prints the lines whose file size is greater than 20 bytes and whose file name ends in txt.

awk uses NR to represent the line number:


[root@centos7 temp]# awk '/^root/ || NR==2' /etc/passwd
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
[root@centos7 temp]#

In the example, || represents logical OR, and the statement outputs every line of the file /etc/passwd that begins with root, as well as the second line.

In some cases, filtering with awk is even more flexible than using grep.
For example, to get each network card name and its MTU value from the output of ifconfig:

[root@centos7 ~]# ifconfig|awk '/^\S/{print $1"\t"$NF}'
ens32: 1500
ens33: 1500
lo:   65536
[root@centos7 ~]#
#The regular expression here matches lines that do not start with a whitespace character, and the output is formatted with a tab character.
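The \S escape is a GNU awk extension, so the command above may not behave the same way under other awk implementations. A more portable sketch uses a bracket expression instead. The interface text below is a hypothetical sample in the style of ifconfig output, not real data.

```shell
# Hypothetical sample mimicking ifconfig output (not real data).
sample='ens32: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.0.2.10  netmask 255.255.255.0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0'

# /^[^ \t]/ matches lines whose first character is neither a space nor a tab;
# empty lines do not match because the bracket expression needs one character.
mtu_out=$(printf '%s\n' "$sample" | awk '/^[^ \t]/ {print $1 "\t" $NF}')
printf '%s\n' "$mtu_out"
```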

The NR and NF mentioned above are built-in variables of awk. Some commonly used built-in variables are listed below

$0 the current record (this variable holds the contents of the whole line)
$1 to $n the nth field of the current record; fields are separated by FS
FS input field separator, space or tab by default
NF number of fields in the current record, that is, the number of columns
NR line number, starting from 1; with multiple input files the count keeps accumulating
FNR line number within the current input file
RS input record separator, a newline by default
OFS output field separator, a space by default
ORS output record separator, a newline by default
FILENAME the name of the current input file
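Several of these variables can be observed together. The sketch below assumes two throwaway files created in a temporary directory (the file names are illustrative). It prints FILENAME, NR, FNR and NF for every record, with OFS changed so the separator is visible.

```shell
# Two throwaway files in a temporary directory (names are illustrative).
tmpdir=$(mktemp -d)
printf 'a b\nc d\n' > "$tmpdir/one.txt"
printf 'e f\n'      > "$tmpdir/two.txt"

# NR keeps counting across files; FNR restarts at 1 for each file.
# Assigning OFS in BEGIN changes the separator that print puts between items.
vars_out=$(cd "$tmpdir" && awk 'BEGIN {OFS = "|"} {print FILENAME, NR, FNR, NF, $1}' one.txt two.txt)
printf '%s\n' "$vars_out"
rm -rf "$tmpdir"
```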

For example, to output each network card name and its corresponding RX bytes value; the card name is saved in a variable so that it can be printed later:


[root@centos7 ~]# ifconfig|awk '/^\S/{a=$1}/RX p/{print a,$5}'
ens32: 999477100
ens33: 1663197120
lo: 0

There are two special patterns in awk: BEGIN and END. They do not match the input text. The actions of all BEGIN patterns are combined into one code block that is executed before any input is read. The actions of all END patterns are combined into one code block that is executed after all input has been processed.

#Note that assignment and print are used in a way similar to the C language
[root@centos7 temp]# ls -l *|awk 'BEGIN{print "size name\n---------"}$5>20{x+=$5;print $5,$NF}END{print "---------\ntotal",x}'
size name
---------
58 c.txt
24 test.sh
---------
total 82
[root@centos7 temp]#
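One detail of such reports is that a variable that was never assigned prints as an empty string. The sketch below uses made-up size and name pairs and a hypothetical helper function summarize. It adds 0 to the total in END so that a run with no matching records still prints a number.

```shell
# summarize is a hypothetical helper; it reads "size name" pairs from standard input.
summarize() {
  awk '
    BEGIN { print "size name"; print "---------" }       # once, before any input is read
    $1 > 20 { total += $1; print $1, $2 }                # for every record whose size exceeds 20
    END { print "---------"; print "total", total + 0 }  # once, after the last record
  '
}

report=$(printf '13 b.txt\n58 c.txt\n24 test.sh\n' | summarize)
printf '%s\n' "$report"

# No record matches here, yet the last line still shows a numeric total.
none=$(printf '1 a.txt\n' | summarize | tail -n 1)
printf '%s\n' "$none"
```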

awk also supports arrays. Array indexes are treated as strings (associative arrays). You can use a for loop to traverse the array elements.
For example, to output the login shells that appear in the file /etc/passwd and the number of times each one occurs:

#Note the syntax of array assignment and of traversing an array with a for loop
[root@centos7 temp]# awk -F ':' '{a[$NF]++}END{for(i in a) print i,a[i]}' /etc/passwd
/bin/sync 1
/bin/bash 2
/sbin/nologin 19
/sbin/halt 1
/sbin/shutdown 1
[root@centos7 temp]#
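The order in which for (i in a) visits the indexes is not specified, so the listing above can come out in a different order on another system. A minimal sketch that makes the order deterministic by piping the result through sort is shown below. The passwd-style records are hypothetical.

```shell
# Hypothetical passwd-style records (not taken from a real system).
shells=$(printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'bin:x:1:1:bin:/bin:/sbin/nologin' \
  'daemon:x:2:2:daemon:/sbin:/sbin/nologin' \
  'adm:x:3:4:adm:/var/adm:/sbin/nologin' \
  'learner:x:1000:1000::/home/learner:/bin/bash' |
  awk -F: '{count[$NF]++} END {for (shell in count) print count[shell], shell}' |
  sort -k1,1nr -k2)   # highest count first, then by shell name
printf '%s\n' "$shells"
```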

Of course, there are also if branch statements

#Notice how the braces define the action block
[root@centos7 temp]# netstat -antp|awk '{if($6=="LISTEN"){x++}else{y++}}END{print x,y}'
6 3
[root@centos7 temp]#

Two patterns can be separated by a comma to form a range: matching starts at the line that matches the first pattern and continues through the line that matches the second pattern.


[root@centos7 ~]# awk '/^root/,/^adm/' /etc/passwd
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
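A range can open more than once, and a range whose second pattern never matches runs to the end of the input. The following sketch on made-up marker lines illustrates both cases.

```shell
# The first START/END pair is closed; the second START has no END,
# so that range continues to the last line of input.
range_out=$(printf 'one\nSTART\ntwo\nEND\nthree\nSTART\nfour\n' | awk '/^START/,/^END/')
printf '%s\n' "$range_out"
```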

awk also supports the ternary operator pattern1 ? pattern2 : pattern3, which tests whether pattern1 matches; if it does, pattern2 is tested, otherwise pattern3 is tested. A pattern can also be an expression similar to those of the C language.
For example, for the lines of the file /etc/passwd whose UID is greater than 500, output the whole line if the login shell is /bin/bash; for the remaining lines, output the line if the UID is 0:

#Note that the directory separator is escaped to avoid confusion
[root@centos7 ~]# awk -F: '$3>500?/\/bin\/bash$/:$3==0 {print $0}' /etc/passwd
root:x:0:0:root:/root:/bin/bash
learner:x:1000:1000::/home/learner:/bin/bash
#Ternary operators can also be nested; examples are omitted
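To make the selection logic of the ternary pattern easier to follow, the sketch below runs it next to an equivalent if/else written inside an action. The user records are hypothetical, and the variable name keep is introduced only for illustration.

```shell
# Hypothetical passwd-style records.
users='root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
learner:x:1000:1000::/home/learner:/bin/bash
guest:x:1001:1001::/home/guest:/sbin/nologin'

# Ternary expression used as the pattern; the default action prints the line.
ternary=$(printf '%s\n' "$users" | awk -F: '($3 > 500) ? /\/bin\/bash$/ : ($3 == 0)')

# The same selection spelled out with if/else inside an action.
spelled=$(printf '%s\n' "$users" | awk -F: '{
  if ($3 > 500) keep = ($0 ~ /\/bin\/bash$/)
  else          keep = ($3 == 0)
  if (keep) print
}')
printf '%s\n' "$ternary"
```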

The -f file option reads the awk program from a file.

#Print the first ten terms of the Fibonacci sequence
[root@centos7 temp]# cat test.awk
BEGIN{
  $1=1
  $2=1
  OFS=","
  for(i=3;i<=10;i++)
  {
    $i=$(i-2)+$(i-1)
  }
  print
}
[root@centos7 temp]# awk -f test.awk
1,1,2,3,5,8,13,21,34,55
[root@centos7 temp]#
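The same result can be produced without assigning to fields. The sketch below is one possible variant, not the author's script. It writes a program to a temporary file (the path comes from mktemp), keeps the terms in an array, and runs the file with awk -f.

```shell
# Write a small awk program to a temporary file (the path is illustrative).
script=$(mktemp)
cat > "$script" <<'EOF'
# Print the first n terms of the Fibonacci sequence, comma-separated.
BEGIN {
    n = 10
    fib[1] = 1; fib[2] = 1
    line = "1,1"
    for (i = 3; i <= n; i++) {
        fib[i] = fib[i-2] + fib[i-1]
        line = line "," fib[i]
    }
    print line
}
EOF

fib_out=$(awk -f "$script")
printf '%s\n' "$fib_out"
rm -f "$script"
```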

The -F option specifies the column separator.

#When multiple characters are used as separators
[root@centos7 temp]# echo 1.2,3:4 5|awk -F '[., :]' '{print $2,$NF}'
2 5
[root@centos7 temp]#
#Here, the content in the single quotation marks after -F is also a regular expression
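Because the separator is a regular expression, it can also absorb optional padding around the delimiter. A minimal sketch on a made-up input line:

```shell
# Split on a comma or a semicolon followed by any number of spaces.
sep_out=$(printf 'a, b;c;  d\n' | awk -F '[,;] *' '{print NF, $2, $4}')
printf '%s\n' "$sep_out"
```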

The -v var=val option sets a variable.

#Here, printf is used much like the function of the same name in the C language
[root@centos7 ~]# awk -v n=5 'BEGIN{for(i=0;i<n;i++) printf "%02d\n",i}'
00
01
02
03
04
[root@centos7 ~]#
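A common use of -v is to hand a shell variable to awk without splicing it into the program text. In the sketch below, the shell variable threshold and the awk variable min are hypothetical names, and the input is made up.

```shell
threshold=20   # hypothetical shell variable

# min receives the shell value; it is compared as a number because it looks like one.
big=$(printf '13 b.txt\n58 c.txt\n24 test.sh\n' | awk -v min="$threshold" '$1 > min {print $2}')
printf '%s\n' "$big"
```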

Output statements such as print also support the redirections > and >> to save the output to a file.

#For example, split the file access.log by its first column (IP) and save each part to a file named ip.txt
[root@centos7 temp]# awk '{print > $1".txt"}' access.log
[root@centos7 temp]# ls -l 172.20.71.*
-rw-r--r-- 1 root root 5297 Nov 22 21:33 172.20.71.38.txt
-rw-r--r-- 1 root root 1236 Nov 22 21:33 172.20.71.39.txt
-rw-r--r-- 1 root root 4533 Nov 22 21:33 172.20.71.84.txt
-rw-r--r-- 1 root root 2328 Nov 22 21:33 172.20.71.85.txt
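When the number of distinct keys is large, some awk implementations may run out of open files, and an unparenthesized concatenation after > is not parsed the same way everywhere. A cautious sketch builds the file name in a variable first, appends with >>, and closes the file after each write. The log lines and the variable name out are hypothetical.

```shell
workdir=$(mktemp -d)
# Hypothetical access log with the client IP in the first column.
printf '%s\n' \
  '192.0.2.1 GET /index.html' \
  '192.0.2.2 GET /about.html' \
  '192.0.2.1 GET /logo.png' > "$workdir/access.log"

(
  cd "$workdir" &&
  awk '{
    out = $1 ".txt"   # build the file name first
    print >> out      # append the whole record to that file
    close(out)        # release the file; >> reopens it on the next use
  }' access.log
)

first=$(cat "$workdir/192.0.2.1.txt")
second=$(cat "$workdir/192.0.2.2.txt")
printf '%s\n---\n%s\n' "$first" "$second"
rm -rf "$workdir"
```

Closing after every record is slower than keeping files open, so this form fits best when there are many keys and modest input.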

Built-in functions
length() returns the length of a string


[root@centos7 temp]# awk -F: '{if(length($1)>=16)print}' /etc/passwd
systemd-bus-proxy:x:999:997:systemd Bus Proxy:/:/sbin/nologin
[root@centos7 temp]#

split() splits a string by a separator and saves the pieces to an array


[root@centos7 temp]# head -1 /etc/passwd|awk '{split($0,arr,/:/);for(i=1;i<=length(arr);i++) print arr[i]}'
root
x
0
0
root
/root
/bin/bash
[root@centos7 temp]#
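split() also returns the number of pieces it produced, which avoids calling length() on an array (not every awk implementation accepts that). A minimal sketch on a sample record:

```shell
parts=$(printf 'root:x:0:0:root:/root:/bin/bash\n' | awk '{
  n = split($0, arr, ":")   # n is the number of pieces stored in arr[1] .. arr[n]
  print n, arr[1], arr[n]
}')
printf '%s\n' "$parts"
```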

getline reads a record from an input source (a pipe, another file, or the next line of the current file) and either assigns it to a variable or resets $0 and some of the built-in variables.

#Get the current hour through a pipe from the shell command date (which field holds the time depends on the date format of the locale; here it is $5)
[root@centos7 temp]# awk 'BEGIN{"date"|getline;split($5,arr,/:/);print arr[1]}'
09
#Read from a file, which overwrites the current $0. When c.txt has fewer lines than b.txt, getline has nothing left to read and $0 is no longer overwritten for the remaining lines
[root@centos7 temp]# awk '{getline <"c.txt";print $4}' b.txt
"https://segmentfault.com/blog/learnning"
[root@centos7 temp]#
#Assign to a variable
[root@centos7 temp]# awk '{getline blog <"c.txt";print $0"\n"blog}' b.txt
aasdasdadsad
BLOG ADDRESS IS "https://segmentfault.com/blog/learnning"
[root@centos7 temp]#
#Read the next line (this also overwrites the current $0)
[root@centos7 temp]# cat file
anny
100
bob
150
cindy
120
[root@centos7 temp]# awk '{getline;total+=$0}END{print total}' file
370
#This means that only the even-numbered lines are added up
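getline returns 1 when it reads a record, 0 at end of file and -1 on error, so a loop should test for a result greater than 0. The sketch below reads a whole file from a BEGIN block. The file is a temporary one created for the example, and the variable names file and line are illustrative.

```shell
names=$(mktemp)
printf 'anny\nbob\ncindy\n' > "$names"

# Comparing with > 0 ends the loop at end of file and also when the file cannot be read.
listing=$(awk -v file="$names" 'BEGIN {
  while ((getline line < file) > 0) {
    count++
    print count ": " line
  }
  close(file)
}')
printf '%s\n' "$listing"
rm -f "$names"
```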

next is similar to getline in that it also reads the next line and overwrites $0. The difference is that once next is executed, the remaining commands are skipped for the current record; awk reads the next line and starts again from the first pattern.

#Skip the lines beginning with a-s, count the remaining lines, and print the final result
[root@centos7 temp]# awk '/^[a-s]/{next}{count++}END{print count}' /etc/passwd
2
[root@centos7 temp]#
#Another example is merging two files with the same column
[root@centos7 temp]# cat f.txt
StudentID Score
00001 80
00002 75
00003 90
[root@centos7 temp]# cat e.txt
Name StudentID
ZhangSan 00001
LiSi 00002
WangWu 00003
[root@centos7 temp]# awk 'NR==FNR{a[$1]=$2;next}{print $0,a[$2]}' f.txt e.txt
Name StudentID Score
ZhangSan 00001 80
LiSi 00002 75
WangWu 00003 90
#Here, while the first file is being read, NR==FNR holds, so a[$1]=$2 is executed and next skips the rest. While the second file is being read, NR==FNR no longer holds, so the print command that follows is executed
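When a key in the second file has no counterpart in the first, the lookup yields an empty string. The sketch below tests membership with the in operator and prints a placeholder instead. The file names, the array name score, and the placeholder N/A are assumptions made for the example.

```shell
joindir=$(mktemp -d)
printf '00001 80\n00002 75\n' > "$joindir/scores.txt"
printf 'ZhangSan 00001\nLiSi 00002\nWangWu 00003\n' > "$joindir/names.txt"

joined=$(awk '
  NR == FNR { score[$1] = $2; next }                 # first file: build the lookup table
  { print $0, (($2 in score) ? score[$2] : "N/A") }  # second file: look up, with a fallback
' "$joindir/scores.txt" "$joindir/names.txt")
printf '%s\n' "$joined"
rm -rf "$joindir"
```

Note that the NR==FNR test assumes the first file is not empty; with an empty first file the condition would hold for the second file instead.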

sub(regex, substr, string) replaces the first substring of string (which is $0 when omitted) that matches the regular expression regex with substr


[root@centos7 temp]# echo 178278 world|awk 'sub(/[0-9]+/,"hello")'
hello world
[root@centos7 temp]#

gsub(regex, substr, string) is similar to sub(), but instead of replacing only the first match it replaces every match (a global replacement)


[root@centos7 temp]# head -n5 /etc/passwd|awk '{gsub(/[0-9]+/,"----");print $0}'
root:x:----:----:root:/root:/bin/bash
bin:x:----:----:bin:/bin:/sbin/nologin
daemon:x:----:----:daemon:/sbin:/sbin/nologin
adm:x:----:----:adm:/var/adm:/sbin/nologin
lp:x:----:----:lp:/var/spool/lpd:/sbin/nologin
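Both sub() and gsub() return the number of replacements they made, and the optional third argument lets them edit a copy rather than $0. A minimal sketch on a made-up input line follows; the variable names line and n are illustrative.

```shell
masked=$(printf 'uid=1000 gid=1000 name=learner\n' | awk '{
  line = $0
  n = gsub(/[0-9]+/, "N", line)   # third argument: edit line and leave $0 untouched
  print n, line
  print $0
}')
printf '%s\n' "$masked"
```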

substr(str, n, m) extracts part of the string str, starting from the nth character and taking m characters. If m is omitted, it extracts to the end of the string

[root@centos7 temp]# echo "Hello, the world!"|awk '{print substr($0,8,1)}'
t
[root@centos7 temp]#

tolower(str) and toupper(str) perform case conversion

[root@centos7 temp]# echo "Hello, the world!"|awk '{A=toupper($0);print A}'
HELLO, THE WORLD!
[root@centos7 temp]#

system(cmd) executes the shell command cmd and returns its exit status: 0 on success and non-zero on failure

#Here, if statement judgment is consistent with C language, 0 is false, non-0 is true
[root@centos7 temp]# awk 'BEGIN{if(!system("date>/dev/null"))print "success"}'
success
[root@centos7 temp]#

match(str, regex) returns the position in the string str at which the regular expression regex first matches


[root@centos7 temp]# awk 'BEGIN{A=match("abc.f.11.12.1.98",/[0-9]{1,3}\./);print A}'
7
[root@centos7 temp]#
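match() also sets the built-in variables RSTART and RLENGTH, which can be combined with substr() to extract the matched text. The sketch below uses the same string with a simpler pattern, since interval expressions such as {1,3} are not accepted by every awk implementation.

```shell
found=$(awk 'BEGIN {
  s = "abc.f.11.12.1.98"
  if (match(s, /[0-9]+\./))   # match returns 0 when nothing matches
    print RSTART, RLENGTH, substr(s, RSTART, RLENGTH)
}')
printf '%s\n' "$found"
```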

As a programming language, awk can handle all kinds of problems and can even be used to write application software, but it is more commonly used on the command line for text analysis, report generation and similar tasks, where it works well. If your work often involves text analysis, mastering this command will save you a lot of time.

That is the whole content of this article. I hope it helps you learn.