awk: one of the three swordsmen of shell text processing

Time:2020-11-26

awk is a text processing tool, usually used to process data and generate reports. Its name comes from the surname initials of its three authors: Alfred Aho, Peter Weinberger and Brian Kernighan.

Syntax:

  • awk [options] 'BEGIN{} pattern {commands} END{}' file
  • stdout | awk [options] 'BEGIN{} pattern {commands} END{}'

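Explanation:

  • options: command-line options
  • BEGIN{}: executed once, before any input is processed
  • pattern: the match pattern that selects which lines to process
  • {commands;...}: the processing commands, possibly spanning multiple lines
  • END{}: executed once, after all input has been processed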


Built-in variables

Variable  Description
$0        The entire current line
$1-$n     Fields 1 through n of the current line (its columns)
NF        Number of Fields: how many fields (columns) the current line has
NR        Number of Records: the number of the current line, counting from 1
FNR       File Number of Records: when several files are processed, lines are counted separately for each file, starting from 1
FS        Field Separator: the input field separator (default: spaces or tabs)
RS        Record Separator: the input line separator (default: newline)
OFS       Output Field Separator (default: a single space)
ORS       Output Record Separator (default: newline)
FILENAME  Name of the file currently being read
ARGC      Number of command-line arguments
ARGV      Array of command-line arguments

Example:

#Use : as the field separator and print column 1
➜  awk 'BEGIN{FS=":"} {print $1}' /etc/passwd

#Use -- as the line separator and : as the field separator, then print columns 1 and 2
➜  awk 'BEGIN{FS=":";RS="--"} {print $1,$2}' /etc/passwd

#Use : as the field separator and print the last column; NF holds the number of columns, so $NF is the last one
➜  awk 'BEGIN{FS=":"} {print $NF}' /etc/passwd
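The difference between NR and FNR only shows up when more than one file is read. The following is a minimal sketch of that case; the files a.txt and b.txt and their contents are made up for illustration and are not part of the original examples.

```shell
# Create two small sample files in a temporary directory (hypothetical data).
tmpdir=$(mktemp -d)
printf 'root:x:0\nbin:x:1\n' > "$tmpdir/a.txt"
printf 'daemon:x:2\n' > "$tmpdir/b.txt"

# NR keeps counting across files, FNR restarts at 1 for each file.
# OFS is placed between the comma-separated arguments of print.
out=$(cd "$tmpdir" && awk 'BEGIN{FS=":"; OFS=" | "} {print FILENAME, NR, FNR, NF, $1}' a.txt b.txt)
printf '%s\n' "$out"
# a.txt | 1 | 1 | 3 | root
# a.txt | 2 | 2 | 3 | bin
# b.txt | 3 | 1 | 3 | daemon

rm -rf "$tmpdir"
```

On the third line NR is 3 while FNR is back to 1, because b.txt has just started.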

Formatted output (printf)

Format  Description            Modifier  Description
%s      String                 -         Left-align within the field width
%d      Decimal integer        +         Always print the sign of a number (right alignment is the default)
%f      Floating-point number  #         Prefix octal with 0 and hexadecimal with 0x
%x      Hexadecimal integer
%o      Octal integer
%e      Scientific notation
%c      Single character

Example:

# printf "%20s %-20s\n",$1,$7
#Fields are right-aligned by default; - aligns left (+ only forces a sign on numbers)
#20 sets the field width to 20 columns, padded with spaces if the value is shorter
#s prints a string
#.3f prints a floating-point number with 3 decimal places
➜  awk 'BEGIN{FS=":"}{printf "%20s %20.3f %-20s\n",$1,$3,$7}' /etc/passwd
                root                0.000 /bin/bash
                 bin                1.000 /sbin/nologin
              daemon                2.000 /sbin/nologin
                 adm                3.000 /sbin/nologin
                  lp                4.000 /sbin/nologin
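The modifiers are easier to tell apart on fixed values than on /etc/passwd. Here is a small sketch using literal numbers chosen only for illustration; the square brackets are there to make the padding visible.

```shell
out=$(awk 'BEGIN{
    # forced sign, default right alignment, left alignment
    printf "[%+d] [%5d] [%-5d]\n", 42, 42, 42
    # hexadecimal and octal, without and with the # prefix modifier
    printf "[%x] [%#x] [%o] [%#o]\n", 255, 255, 8, 8
    # two decimal places, and a single character
    printf "[%.2f] [%c]\n", 3.14159, "A"
}')
printf '%s\n' "$out"
# [+42] [   42] [42   ]
# [ff] [0xff] [10] [010]
# [3.14] [A]
```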

Pattern matching

  • Regular expression: /pattern/
  • Relational and logical operators: < > <= >= == != ~ (matches a regular expression) !~ (does not match a regular expression) && (and) || (or) ! (not)

Example:

#Print lines starting with root
➜  awk 'BEGIN{FS=":"} /^root/ {print $0}' /etc/passwd
root:x:0:0:root:/root:/bin/bash

#Print rows with column 3 greater than 1000
➜  awk 'BEGIN{FS=":"} $3>1000 {print $0}' /etc/passwd
nfsnobody:x:65534:65534:Anonymous NFS User:/var/lib/nfs:/sbin/nologin

#Print the rows where column 7 is /sbin/nologin
➜  awk 'BEGIN{FS=":"} $7=="/sbin/nologin" {print $0}' /etc/passwd
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin

#Print the lines whose column 7 ends in nologin
➜  awk 'BEGIN{FS=":"} $7~/.*nologin$/ {print $0}' /etc/passwd
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin

#Print lines whose column 3 is greater than 500 and whose column 7 ends in nologin
➜  awk 'BEGIN{FS=":"} $3>500 && $7~/.*nologin$/ {print $0}' /etc/passwd
chrony:x:997:995::/var/lib/chrony:/sbin/nologin
dockerroot:x:996:993:Docker User:/var/lib/docker:/sbin/nologin
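The examples above cover /regex/, comparisons, ~ and &&. The remaining operators, !~ and ||, can be sketched the same way. The four passwd-style lines below are sample data assumed for illustration, so the result does not depend on the local /etc/passwd.

```shell
# Hypothetical sample data in /etc/passwd format.
passwd_sample='root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
sync:x:5:0:sync:/sbin:/bin/sync
chrony:x:997:995::/var/lib/chrony:/sbin/nologin'

# !~ keeps the lines whose column 7 does NOT end in nologin.
no_match=$(printf '%s\n' "$passwd_sample" | awk 'BEGIN{FS=":"} $7 !~ /nologin$/ {print $1}')

# || keeps the lines where at least one condition is true.
either=$(printf '%s\n' "$passwd_sample" | awk 'BEGIN{FS=":"} $3 == 0 || $3 > 500 {print $1}')

printf '%s\n' "$no_match" "$either"
# root
# sync
# root
# chrony
```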

Arithmetic expressions

Example:

#Mathematical calculation
➜  awk 'BEGIN{x=10;y=2; print x+y}'
12
➜  awk 'BEGIN{x=10;y=2; print x*y}'
20
➜  awk 'BEGIN{x=10;y=2; print x^y}'
100
➜  awk 'BEGIN{x=10;y=2; print x**y}'
100
➜  awk 'BEGIN{x=10;y=x++; print x,y}'
11 10
➜  awk 'BEGIN{x=10;y=++x; print x,y}'
11 11


#Print the line number of every blank line, then the total number of blank lines
➜  awk 'BEGIN{idx=0;} /^$/ {idx++; print NR} END{print idx;}' /etc/passwd
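The same accumulate-then-report shape works for sums and averages. A minimal sketch, assuming a one-number-per-line input that is invented here for illustration; the pattern NF skips blank lines because they have zero fields.

```shell
# Sum the first column, count the non-blank lines, and print the average in END.
out=$(printf '10\n20\n\n30\n' | awk 'NF {sum += $1; n++} END{print sum, n, sum/n}')
printf '%s\n' "$out"
# 60 3 20
```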

Flow control statements

Syntax:

#Conditional judgment
if(condition1) {
    # do something
} else if(condition2) {
    # do something
} else {
    # do something
}

#Loops
while(condition) {
    #do something
}

do {
    # do something
} while(condition)

for(i=0;i<10;i++) {
    # do something
}

Example:

#If column 3 is less than 10 and column 7 is /sbin/nologin, print this is if
#Otherwise, if column 3 is greater than 500, print this is else if
#Otherwise, print this is else
#Note: the comparison must be ==; a single = would assign to $7 instead of comparing it
➜  awk 'BEGIN{FS=":"} { if($3<10 && $7=="/sbin/nologin") {print "this is if"} else if($3>500) {print "this is else if"} else {print "this is else"}}' /etc/passwd
this is if
this is if
this is else
this is else if
this is else

#Sum the integers from 0 to 9
#Note: variables do not need to be declared in advance; an unset variable counts as 0 in arithmetic
➜  awk 'BEGIN{ while(i<10) { sum+=i; i++}; print sum}'
45
➜  awk 'BEGIN{do { sum+=i; i++; } while(i<10); print sum}'
45
➜  awk 'BEGIN{ for(i=0;i<10;i++) { sum+=i; }; print sum}'
45

String functions

Function name              Description                                                  Return value
length(str)                Compute the length of str                                    Length as an integer
index(str,sub_str)         Find the position of sub_str in str                          Position, counting from 1 (0 if not found)
tolower(str)               Convert str to lower case                                    The lower-case string
toupper(str)               Convert str to upper case                                    The upper-case string
substr(str,start,length)   Extract length characters of str starting at position start  The extracted substring
split(str,arr,fs)          Split str on fs and store the pieces in arr                  Number of pieces after splitting
match(str,reg)             Search str for a match of the regex reg                      Position of the match, counting from 1 (0 if none)
sub(reg,new_sub_str,str)   Replace the first match of reg in str with new_sub_str       Number of substitutions
gsub(reg,new_sub_str,str)  Like sub, but replace every match                            Number of substitutions

Example:

#sub(/oo/,"11",$1) returns the number of substitutions; the $1 after it is the value after replacement
#substr positions start at 1, so substr($1,1,2) takes the first two characters
➜  awk 'BEGIN{FS=":"} { print length($1),toupper($1),substr($1,1,2),sub(/oo/,"11",$1),$1}' /etc/passwd
4 ROOT ro 1 r11t
3 BIN bi 0 bin
6 DAEMON da 0 daemon
4 SYNC sy 0 sync

#Arrays filled by split() are indexed from 1; for (i in arr) visits the elements in no particular order
➜  awk 'BEGIN{str="Shell;Python;C;C++;Java;PHP"; split(str,arr,";"); print arr[2]}'
Python
➜  awk 'BEGIN{str="Shell;Python;C;C++;Java;PHP"; split(str,arr,";"); for(i in arr) { print arr[i]; }}'
C++
Java
PHP
Shell
Python
C
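The functions from the table that the examples above do not exercise (index, match, gsub, tolower) can be tried on a fixed string. This is a small sketch; the string and the variable names s, pos and n are arbitrary choices for illustration.

```shell
out=$(awk 'BEGIN{
    s = "Hello Awk World"
    print index(s, "Awk")          # position of the substring, counting from 1
    pos = match(s, /W[a-z]+/)      # match() also sets RSTART and RLENGTH
    print pos, RSTART, RLENGTH
    n = gsub(/o/, "0", s)          # replaces every o in s and returns the count
    print n, s
    print tolower(s)
}')
printf '%s\n' "$out"
# 7
# 11 11 5
# 2 Hell0 Awk W0rld
# hell0 awk w0rld
```

Note that sub and gsub modify their third argument in place, which is why s has changed by the time tolower is called.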

Common options

  • -v: pass a variable into the awk program
  • -f: read the awk program from a script file
  • -F: specify the field separator
  • -V: show the awk version

Example:

#Pass shell variables into awk
➜  var1=10
➜  var2="hello awk"
➜  awk -v var1="$var1" -v var2="$var2" 'BEGIN{print var1,var2}'
10 hello awk

#Move the whole program into a separate file
#Suggestion: prefer this approach for complex programs, since it is easier to read and maintain
➜  cat script.awk
BEGIN{
    FS=":"
}

{
    if($3<10 && $7=="/sbin/nologin") {
        print "this is if"
    } else if($3>500) {
        print "this is else if"
    } else {
        print "this is else"
    }
}

➜  awk -f script.awk /etc/passwd

#-F: is equivalent to BEGIN{FS=":"}
➜  awk -F: '{print $1}' /etc/passwd
root
bin
daemon
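The options can be combined on one command line. A minimal sketch, assuming a hypothetical shell variable min_uid and three made-up input lines: -F sets the separator and -v hands the threshold to the awk program under the name min.

```shell
min_uid=500   # hypothetical threshold, chosen for illustration

# Print the names of accounts whose third column is greater than min.
out=$(printf 'root:x:0\nchrony:x:997\ndockerroot:x:996\n' |
    awk -F: -v min="$min_uid" '$3 > min {print $1}')
printf '%s\n' "$out"
# chrony
# dockerroot
```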

Arrays

The array operations in the shell are as follows:

Operation                              Example                                  Output
Define an array                        arr=("Python" "PHP" "Java" "Go" "Rust")
Access an element (index starts at 0)  echo ${arr[2]}                           Java
Number of elements                     echo ${#arr[@]}                          5
Length of one element                  echo ${#arr[0]}                          6
Modify an element                      arr[2]="JAVA"
Delete an element                      unset arr[1]
Print all elements                     echo ${arr[@]}                           Python JAVA Go Rust
Slice                                  echo ${arr[@]:0:2}                       Python JAVA
Replace in each element (first match)  echo ${arr[@]/A/a}                       Python JaVA Go Rust
Replace in each element (all matches)  echo ${arr[@]//A/a}                      Python JaVa Go Rust
Iterate over the array                 for a in ${arr[*]}; do echo $a; done

Arrays in awk work a little differently: awk implements them as associative arrays, so an index can be a number or an arbitrary string.

Syntax example:

#Definition
#Syntax: array_name[index]=value
➜  awk 'BEGIN{arr[0]=0; arr["second"]="2"; print arr[0],arr["second"];}'
0 2

#Array elements participate in calculation
➜  awk 'BEGIN{arr[0]=0; arr["second"]="2"; print arr[0]+3,arr["second"];}'
3 2

#Delete an array element (the print below then outputs an empty line)
#Syntax: delete array_name[index]
➜  awk 'BEGIN{arr[0]=0;arr["second"]="2"; delete arr["second"]; print arr["second"];}'

#Iterate over an array
#Method 1: for (i in arr) visits the elements in no particular order
➜  awk 'BEGIN{str="Python Rust PHP Go"; arrLen=split(str,arr," "); for(i in arr){ print i,arr[i] }}'
4 Go
1 Python
2 Rust
3 PHP
#Method 2: for (i = 1; i <= len; i++) {...} visits the elements in index order
➜  awk 'BEGIN{str="Python Rust PHP Go"; arrLen=split(str,arr," "); for(i=1;i<=arrLen;i++){ print i,arr[i] }}'
1 Python
2 Rust
3 PHP
4 Go
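String indexes are what make awk arrays handy for counting. The sketch below tallies how often each login shell appears; the three input lines are sample data assumed for illustration, and the array name count is arbitrary. Because for (key in array) has no guaranteed order, the result is piped through sort.

```shell
out=$(printf '%s\n' \
    'root:x:0:0:root:/root:/bin/bash' \
    'bin:x:1:1:bin:/bin:/sbin/nologin' \
    'daemon:x:2:2:daemon:/sbin:/sbin/nologin' |
    awk -F: '{count[$7]++} END{for (shell in count) print count[shell], shell}' |
    sort)
printf '%s\n' "$out"
# 1 /bin/bash
# 2 /sbin/nologin
```

An element that has never been assigned counts as 0 in arithmetic, so count[$7]++ needs no initialization.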