Linux text-processing "three swordsmen" (grep, sed, awk): awk learning notes 07: Syntax

Time:2021-11-29

Syntax

Variables

I think awk is best regarded as a dynamic language. Variables can be used without prior declaration, and their data type never has to be declared in advance; a variable can simply be referenced whenever it is needed.

An awk variable is always in one of three states:

  • Untyped: the variable has never been referenced or assigned.
  • Unassigned: the variable has been referenced but not yet assigned.
  • Assigned: the variable has been given a value.

An unassigned variable has an initial value that acts as both the empty string and the number 0.

Since GNU awk (gawk) 4.2.0, the typeof() function can be used to determine the type of a variable.

# awk 'BEGIN{print typeof(a)}'
untyped
# awk 'BEGIN{a;print typeof(a)}'
unassigned
# awk 'BEGIN{a=3;print typeof(a)}'
number
# awk 'BEGIN{a="alongdidi";print typeof(a)}'
string

typeof() returns untyped for an array that has never been referenced or assigned. However, for an element of such an array it returns unassigned. This looks odd, but it is worth remembering.

# awk 'BEGIN{print typeof(arr)}'
untyped
# awk 'BEGIN{print typeof(arr["name"])}'
unassigned

For versions before 4.2.0, the following test can tell whether a variable is still untyped or unassigned, because an uninitialized value compares equal to both "" and 0:

awk 'BEGIN {
    if(a==""&&a==0) {
        print "Untyped or unassigned."
    } else {
        print "Assigned."
    }
}'

Variable assignment

In awk, an assignment is an expression: it returns the value that was assigned.

# awk 'BEGIN{print a=3}'
3

It is equivalent to:

awk 'BEGIN{a=3;print a}'

Because an assignment returns a value, assignments can be chained:

x=y=z=5
#Equivalent to
z=5;y=5;x=5

An assignment can appear anywhere an expression is allowed, for example as the condition of an if statement:

# awk 'BEGIN{if(a=1){print "True"}else{print "False"}}'
True
# awk 'BEGIN{if(a=0){print "True"}else{print "False"}}'
False

In complex code, avoid putting assignments inside other expressions, because the order of evaluation becomes hard to follow:

# awk 'BEGIN{a=1;arr[a+=2]=a=a+6;print arr[9];print arr[3]}'
7
# (an empty line: arr[3] has no value)

It is not obvious whether the array subscript on the left of the first = (a+=2) or the assignment on the right (a=a+6) is evaluated first. The output shows that this awk evaluated the right side first: a became 7, then a+=2 turned the subscript into 9, so arr[9] is 7 and arr[3] is empty. Other awk implementations or versions may behave differently.

Variable use

Variables can be assigned in three places:

awk -v var=val [-v var=val ...] '{code}' var=val file1 var=val file2
  1. With the -v option. To assign several variables, use one -v option per assignment.
  2. Inside the awk code.
  3. On the command line, before a file operand (var=val file1).

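Where a variable is assigned determines when its value becomes available. A -v assignment is done before the BEGIN block runs. An assignment before a file operand is done only when awk reaches that operand, that is, after BEGIN. An assignment in the main code block happens while records are being processed. So a variable assigned in the main block (or before a file operand) cannot be referenced in BEGIN; this follows directly from awk's workflow.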
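A minimal sketch of that timing, assuming any POSIX awk. It reads stdin through the operand - instead of a file; the variable names early and late are made up for illustration:

```shell
# early is set with -v, so it already has its value in BEGIN.
# late is set by the operand late=2, which awk processes only when it
# reaches that point in the argument list, i.e. after BEGIN has run.
echo line | awk -v early=1 '
    BEGIN { print "BEGIN: early=" early ", late=" late }
          { print "main: early=" early ", late=" late }
' late=2 -
```

The first line printed is `BEGIN: early=1, late=` and the second is `main: early=1, late=2`.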

Assigning a variable before a file operand is handy for changing FS from one file to the next:

awk '{...}' FS=" " a.txt FS=":" /etc/passwd
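A self-contained sketch of the same idea. The two sample files and their contents are hypothetical and are created in a temporary directory:

```shell
tmpdir=$(mktemp -d)
printf 'alice 24\nbob 28\n' > "$tmpdir/space.txt"       # space-separated
printf 'root:x:0\ndaemon:x:1\n' > "$tmpdir/colon.txt"    # colon-separated
# FS=" " is in effect while space.txt is read; FS=":" is assigned just
# before colon.txt is opened, so $1 is split correctly in both files.
awk '{ print $1 }' FS=" " "$tmpdir/space.txt" FS=":" "$tmpdir/colon.txt"
rm -r "$tmpdir"
```

It prints alice, bob, root and daemon, one per line. Without the second assignment, $1 of the colon file would be the whole line, e.g. root:x:0.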

awk can also use the values of shell variables:

# name="alongdidi"
# awk -v nameAwk="$name" 'BEGIN{print nameAwk}'
alongdidi
# awk 'BEGIN{print nameAwk="'"$name"'"}'
alongdidi
# awk '{print nameAwk}' nameAwk="$name" a.txt
alongdidi
...
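Quoting "$name" matters as soon as the shell value contains whitespace. A sketch with a hypothetical value along didi:

```shell
name="along didi"
# Quoted, "$name" reaches awk as a single argument: nameAwk=along didi.
# Unquoted, the shell would split it into two words, and awk would take
# the second word (didi) as its program text.
awk -v nameAwk="$name" 'BEGIN { print nameAwk }'
echo x | awk '{ print nameAwk }' nameAwk="$name" -
```

Both commands print along didi.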

data type

awk has two basic data types: string and number. Since gawk 4.2.0, a regexp type is also supported.

Do not judge a value's type by how it looks: something that looks like a number is not necessarily numeric. What matters is the context. In a string context the value is converted to a string, and in a numeric context it is converted to a number.

Conversion is divided into implicit conversion and explicit conversion.

Implicit conversion

1. Performing arithmetic on a value converts it to a number.

A string that begins with a number (leading blanks are allowed) is converted to that number, for example "123", "123abc" and "   123abc". A string that does not begin with a number is converted to 0, for example "abc123".

# awk 'BEGIN{a="123";print a+0;print typeof(a+0)}'
123
number
# awk 'BEGIN{a="123abc";print a+0;print typeof(a+0)}'
123
number
# awk 'BEGIN{a="   123abc";print a+0;print typeof(a+0)}'
123
number
# awk 'BEGIN{a="abc123";print a+0;print typeof(a+0)}'
0
number

Any arithmetic operator works, not just addition. The idiom "string"+0 is common because adding 0 converts the value to a number without changing it.

2. Concatenating a value converts it to a string.

Concatenate it with the empty string "", or with another value separated by a space.

# awk 'BEGIN{print typeof(123"")}'
string
# awk 'BEGIN{print typeof(123 123)}'
string

Variables a and b hold numbers. Concatenating them (a space between the names) converts them to the string "23"; adding 4 then converts the result back to a number.

# awk 'BEGIN{a=2;b=3;print a b}'
23
# awk 'BEGIN{a=2;b=3;print (a b)+4}'
27
# awk 'BEGIN{a=2;b=3;print typeof((a b)+4)}'
number

Explicit conversion

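1. Convert a number to a string. Concatenation does this implicitly, using the predefined variable CONVFMT; the sprintf() function does it explicitly, with a format passed as its argument.

The default value of CONVFMT is %.6g.

# awk 'BEGIN{print CONVFMT}'
%.6g

Variable a holds a non-integer value. When it is concatenated with "", it is converted to a string using CONVFMT: the default %.6g gives "123.457", and with CONVFMT="%.2f" the result is "123.46". The result is already a string, so print outputs it unchanged (OFMT is used only when print outputs a number).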

# awk 'BEGIN{a=123.4567;print a""}'
123.457
# awk 'BEGIN{a=123.4567;CONVFMT="%.2f";print a""}'
123.46

2. Use the strtonum() function (a gawk extension) to convert a string to a number explicitly.

# awk 'BEGIN{a="100";print strtonum(a);print typeof(strtonum(a))}'
100
number
# awk 'BEGIN{a="abc";print strtonum(a);print typeof(strtonum(a))}'
0
number
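Returning to the number-to-string direction: besides the implicit conversion through CONVFMT, sprintf() converts explicitly with whatever format you pass it. A sketch for any POSIX awk that contrasts sprintf(), CONVFMT and OFMT (the last is used only when print outputs a number):

```shell
awk 'BEGIN {
    a = 123.4567
    s = sprintf("%.2f", a)   # explicit conversion; the format is an argument
    print s                  # 123.46
    print length(s)          # 6: s is the six-character string "123.46"
    OFMT = "%.1f"
    print a                  # a is still a number, so print uses OFMT: 123.5
    print a ""               # concatenation converts with CONVFMT (%.6g): 123.457
}'
```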

Literal

awk has three kinds of literals (constants), which correspond exactly to the three data types:

  • String literal.
  • Numeric literal.
  • Regular expression literal.

A literal stands for exactly the value it spells out; it does not refer to anything else.

Numeric literal

Integers, floating-point numbers and numbers in scientific notation are numeric literals, as long as they are not enclosed in double quotes; a quoted number is a string literal.

123
123.00
1.23e+8
1.23e-06

Inside awk, all numbers are stored as floating-point values. When a value turns out to be an integer at output time, it is printed without a decimal part.

# awk 'BEGIN{a=10.0;print a}'
10
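A quick check of how the numeric literals listed above are printed, as a sketch for any POSIX awk with the default OFMT:

```shell
awk 'BEGIN {
    print 123.00     # integral value, printed as an integer: 123
    print 1.23e+8    # integral value: 123000000
    print 1.23e-06   # not integral, formatted with OFMT (default %.6g): 1.23e-06
    print 10 / 4     # not integral: 2.5
}'
```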

Arithmetic operation

The arithmetic operators, from highest to lowest precedence:

++ --    increment and decrement.
^ **     exponentiation (power).
+ -      unary plus and minus (the sign of a number).
* / %    multiplication, division and modulo.
+ -      binary addition and subtraction.
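A small sketch that checks a few of these levels; it should behave the same in any POSIX awk:

```shell
awk 'BEGIN {
    print -2^2     # ^ binds tighter than unary minus: -(2^2) = -4
    print 2*3^2    # ^ before *: 2*9 = 18
    print 7%3*2    # % and * share a level, left to right: (7%3)*2 = 2
    print 1+2*3    # * before binary +: 7
}'
```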

As in other programming languages, the increment and decrement operators behave differently depending on whether they come before or after the variable. We use increment as the example.

a++: the current value of a is used in the expression, then a is incremented.

++a: a is incremented first, then its new value is used in the expression.

When the increment stands alone as a statement, the two forms are equivalent.

# awk 'BEGIN{a=3;a++;print a}'
4
# awk 'BEGIN{a=3;++a;print a}'
4

The situation is different when they participate in expressions.

# awk 'BEGIN{a=3;print a++;print a}'
3
4
# awk 'BEGIN{a=3;print ++a;print a}'
4
4

Exponentiation: ^ is the exponentiation operator defined by POSIX. ** is not supported by some awk versions, so it is best to use only ^.

# awk 'BEGIN{print 2^3}'
8

Exponentiation is right-associative (evaluated from right to left), so 2^3^2 means 2^(3^2) = 2^9 = 512, not (2^3)^2 = 64.

# awk 'BEGIN{print 2^3^2}'
512

The assignment operators have the lowest precedence, lower than all the arithmetic operators above.

= += -= *= /= %= ^= **=

Avoid ambiguous statements like the one below; different awk versions may give different results.

# awk 'BEGIN{b=3;print b+=b++}'
7

string literal

In awk, anything enclosed in double quotes is a string literal. Single quotes cannot be used for this.

"alongdidi"
"29"
"\n"
" "
""

awk has no dedicated string concatenation operator. Simply write the strings next to each other, with or without any number of spaces between them.

# awk 'BEGIN{print "abc""def"}'
abcdef
# awk 'BEGIN{print "abc" "def"}'
abcdef
# awk 'BEGIN{print "abc"     "def"}'
abcdef

Variables, however, must be separated by a space. Written side by side, their names merge into a different variable name: below, ab is a new, unassigned variable, so only an empty line is printed.

# awk 'BEGIN{a="abc";b="def";print ab}'

Concatenation also has a precedence, lower than that of binary addition and subtraction. That is why only the first example below concatenates as intended (two numbers joined by the string " ").

# awk 'BEGIN{print 12 " " 23}'
12 23
# awk 'BEGIN{print 12 " " -23}'
12-23      # if the concatenation had worked as intended, this would be 12 -23

In the second example, subtraction has higher precedence than concatenation, so " " -23 is parsed as a binary subtraction: " " converts to 0, giving 0-23 = -23. Then the number 12 and the number -23 are concatenated. The result is therefore "12-23" rather than "12 -23".
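One way around this, sketched below, is to put the negative number in parentheses so that the minus sign can only be read as unary:

```shell
awk 'BEGIN {
    print 12 " " -23      # gawk parses this as 12 (" " - 23) and prints 12-23
    print 12 " " (-23)    # parentheses force the unary minus: prints 12 -23
}'
```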

The priority of operators is described in detail below.

Regular expression literal

We have seen these before: literals that look like the following are regular expression literals.

/Alice/
/A.*e/
/[[:alnum:]]+/

Regex literals are mainly used in patterns. Matching is written as follows:

"str"~/pattern/
"str"!~/pattern/

The string literal "str" can also be replaced by a variable, most often $0 or $n.

When /pattern/ appears on its own, it is shorthand for $0 ~ /pattern/.

A match returns a value: 1 if it succeeds and 0 if it fails.

This leads to some pitfalls when using regexes.

1. The following two lines are equivalent. a always receives 0 or 1; the regex itself is not saved for later use. How to store a regex in a variable is described below.

a=/pattern/
a=$0~/pattern/

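2. The following three lines show, step by step, how /pattern/~$1 is evaluated: /pattern/ becomes $0~/pattern/, which yields 0 or 1, and that number is then matched against $1 (used as a dynamic regex). Nobody writes this on purpose, but it is worth understanding, because awk usually does not report an error.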

/pattern/~$1
$0~/pattern/~$1
0/1~$1

3. You might expect to pass a regex literal to a function like this (myfunc stands for any user-defined function; func itself cannot be used as a name, because gawk accepts it as a keyword meaning function):

a=/Alice/
myfunc(arg1,a)

In fact, the value passed in a is only 0 or 1.

There are more pitfalls besides these three. They all come from the same fact: /pattern/ is shorthand for $0 ~ /pattern/, so a regex literal cannot be assigned directly to a variable.
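Before gawk 4.2.0, the usual workaround is to keep the regex in an ordinary string and use that string on the right of ~, where awk treats it as a dynamic regex. The sketch below assumes that approach; the function name field_matches and the sample data are hypothetical:

```shell
printf '1 Bob\n2 Alice\n3 Tony\n' | awk '
# hypothetical helper: 1 if field number f of the current record matches re
function field_matches(f, re) { return $f ~ re }
BEGIN { pat = "^A" }       # a plain string, not a regex literal
field_matches(2, pat)      # pattern with no action: print matching records
'
```

It prints only the line 2 Alice. Because pat is a string, its value survives being stored and passed to a function, which a regex literal does not.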

To assign a regex literal to a variable you need gawk 4.2.0 or later, which added a regexp data type. Put @ in front of the regex literal when assigning it.

# awk 'BEGIN{a=@/Alice/;print typeof(a)}' a.txt
regexp
# awk 'BEGIN{a=@/Alice/}$0~a{print}' a.txt
2   Alice   female  24   [email protected]  18084925203

When a regexp-typed variable is used, the match cannot be abbreviated:

a=@/pattern/
$0 ~ a {action}   # matches as intended
a {action}        # does not mean $0 ~ a

So the following does not print only Alice's line:

awk 'BEGIN{a=@/Alice/}a{print}' a.txt

Regular expressions supported by gawk

.: matches any single character; in gawk this includes the newline character.

^: matches the beginning of the string (for a record, the beginning of the line).

$: matches the end of the string (for a record, the end of the line).

[…]: matches any single character inside the brackets.

[^…]: matches any single character not inside the brackets.

|: alternation: matches either the expression on its left or the one on its right.

+: matches the preceding item one or more times.

*: matches the preceding item any number of times (0, 1 or more).

?: matches the preceding item 0 or 1 times.

(): grouping. awk regexes do not support back-references inside the pattern; in gawk, the text matched by each group can be retrieved with match() (third argument) or referenced in the replacement of gensub().

{m}: matches the preceding item exactly m times.

{m,}: matches the preceding item at least m times.

{m,n}: matches the preceding item m to n times.

{,n}: matches the preceding item at most n times.

The following character classes are used inside a bracket expression, e.g. [[:lower:]]:

[:lower:]: lowercase letters.

[:upper:]: uppercase letters.

[:alpha:]: letters (both uppercase and lowercase).

[:digit:]: digits.

[:alnum:]: letters or digits.

[:xdigit:]: hexadecimal digits.

[:blank:]: a space or tab character.

[:space:]: whitespace characters: space, tab, line feed, carriage return, form feed and vertical tab.

[:punct:]: punctuation characters, i.e. printable characters that are not letters, digits or space.

[:graph:]: printable, visible characters. For example, letters are printable and visible; a space is printable but not visible, so it is not included.

[:print:]: printable characters, i.e. non-control characters (including space).

[:cntrl:]: control characters.
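A small sketch using two of these classes inside bracket expressions, with made-up input; gsub() returns the number of replacements it made:

```shell
echo 'Bob_42, Alice!' | awk '{
    s = $0
    print gsub(/[[:digit:]]/, "", s)   # removes 4 and 2: prints 2
    print gsub(/[[:punct:]]/, "", s)   # removes _ , and !: prints 3
    print s                            # what is left: Bob Alice
}'
```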

gawk also supports the following GNU regex operators (extensions that are not in POSIX):

\y: matches the empty string at the beginning or end of a word, i.e. a word boundary. For example, \yballs?\y matches the word ball or the word balls.

\B: matches the empty string between two characters inside a word. For example, cr\Bea\Bte matches create, but not "crea te".

\<: matches the beginning (left boundary) of a word.

\>: matches the end (right boundary) of a word.

\s: any single whitespace character, equivalent to [[:space:]].

\S: any single non-whitespace character, equivalent to [^[:space:]].

\w: any single word character, i.e. a letter, digit or underscore (the characters words are made of); equivalent to [[:alnum:]_].

\W: any single non-word character; the negation of \w, equivalent to [^[:alnum:]_].

\`: matches the empty string at the absolute beginning of the string. For example, the string "abc\ndef\nghi" contains three lines, but its absolute beginning is only before the a, and its absolute end only after the i. (In awk, ^ and $ also match only at the beginning and end of the whole string, so \` and \' mainly exist for compatibility with other GNU tools.)

\': matches the empty string at the absolute end of the string.

awk does not support regex modifiers (such as /i), so to match case-insensitively, either convert the text to lowercase or uppercase first with tolower() or toupper(), or, in gawk, set the predefined variable IGNORECASE.

# awk '$2~/alice/{print}' a.txt 
# awk 'tolower($2)~/alice/{print}' a.txt 
2   Alice   female  24   [email protected]  18084925203
# awk 'BEGIN{IGNORECASE=1}$2~/alice/{print}' a.txt 
2   Alice   female  24   [email protected]  18084925203

Awk Boolean

Unlike many other languages, awk has no true and false keywords for Boolean values. Its Boolean logic is simple, however:

  • The number 0 is false.
  • The empty string is false.
  • Everything else is true. Note that the string "0" is true, because it is a non-empty string rather than the number 0.

As mentioned above, a regex match returns a value: 1 on success and 0 on failure.

Comparison and logical operations also return values: 1 for true and 0 for false.

# awk 'BEGIN{if(0){print "True"}}'
# awk 'BEGIN{if(""){print "True"}}'
# awk 'BEGIN{if("0"){print "True"}}'
True
# awk 'BEGIN{if("alongdidi"){print "True"}}'
True
# awk 'BEGIN{if(100){print "True"}}'
True
# awk 'BEGIN{if(a=100){print "True"}}'   # an assignment returns a value; here it returns 100, which is true
True
# awk 'BEGIN{if(a==100){print "True"}}'
# awk 'BEGIN{if("alongdidi"~/a+/){print "True"}}'   # the match succeeds and returns 1, which is true
True
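One caveat to the "0" rule: it applies to the string constant "0". An input field containing 0 is a strnum (covered in the next section), is treated as the number 0, and is therefore false. A sketch:

```shell
awk 'BEGIN { if ("0") print "constant \"0\": true" }'
echo 0 | awk '{ if ($1) print "field 0: true"; else print "field 0: false" }'
```

The first command prints constant "0": true, the second prints field 0: false.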

Comparison operation in awk

Strnum type

Data in awk comes from two sources:

  1. Internal data, such as values assigned to variables and the results of expressions or functions.
  2. External data, such as lines read from files or user input.

Leaving aside the regexp type introduced in gawk 4.2.0, awk's basic data types are string and number. External data (for example, data read from a file) should in theory be strings. However, some of it looks like numbers, such as $1, $4 and $NF on line 2 of a.txt. Sometimes such data needs to be treated as a number, and sometimes as a string.

POSIX therefore defines a "numeric string" type for such data; gawk calls it strnum. When external data looks like a number, its type is strnum, and it is compared as a number when the other operand is a number or another strnum.

Note: strnum applies only to data that comes from outside the program, such as fields read from files, input read with getline, and array elements created by split(). Numeric constants, string constants and the results of expressions are never strnum.

Although data arriving through a pipe would be expected to be a string, it is recognized as strnum when it looks like a number:

# echo "30" | awk '{print typeof($0)}'
strnum
# echo "+30" | awk '{print typeof($0)}'
strnum
# echo " 30" | awk '{print typeof($0)}'
strnum
# echo " +30" | awk '{print typeof($0)}'
strnum
# echo "30a" | awk '{print typeof($0)}'
string
# echo "a30" | awk '{print typeof($0)}'
string
# echo "30 a" | awk '{print typeof($0),typeof($1)}'
string strnum

Comparison operations

Comparison operators:

<, >, <=, >=, !=, ==: relational and equality comparison.
in: array membership test.

Comparison rules:

        +----------------------------------------------
        |       STRING          NUMERIC         STRNUM
--------+----------------------------------------------
STRING  |       string          string          string
NUMERIC |       string          numeric         numeric
STRNUM  |       string          numeric         numeric
--------+----------------------------------------------

In short, string wins: if either operand is a string, both sides are compared as strings. In all other cases they are compared as numbers.

Below we print $0 and $1 and their types. Although all external data should be strings, both are recognized as strnum because they look like numbers.

# echo ' +3.14' | awk '{print "---"$0"---";print "---"$1"---"}' 
--- +3.14---
---+3.14---
# echo ' +3.14' | awk '{print typeof($0);print typeof($1)}'
strnum
strnum

First group: here a strnum is compared with a string. Since one operand is a string, both are compared as strings, character by character. So the first comparison is true and the last two are false.

# echo ' +3.14' | awk '{print($0==" +3.14")}'
1
# echo ' +3.14' | awk '{print($0=="+3.14")}'
0
# echo ' +3.14' | awk '{print($0=="3.14")}'
0

Second group: $0 and $1 are both strnum; they differ only by the leading space character. A strnum compared with a number is compared numerically, so both $0 and $1 are compared as the value 3.14, and both comparisons are true.

# echo ' +3.14' | awk '{print($0==3.14)}'
1
# echo ' +3.14' | awk '{print($1==3.14)}'
1

Third group: the same rule as in the first group applies. $1 is a strnum and the constants are strings, so the comparison is done character by character. Field splitting removed the leading space, so $1 is "+3.14", and only the second comparison is true.

# echo ' +3.14' | awk '{print($1==" +3.14")}'
0
# echo ' +3.14' | awk '{print($1=="+3.14")}'
1
# echo ' +3.14' | awk '{print($1=="3.14")}'
0

Fourth group: awk recognizes 1e2 (scientific notation, passed through the pipe by echo) as a strnum. Both operands are strnum, so they are compared numerically: 100 > 3, which is true. The ternary operator ?: is explained later.

# echo 1e2 3 | awk '{print $1;print $2;print typeof($1);print typeof($2)}'
1e2
3
strnum
strnum
# echo 1e2 3 | awk '{print ($1>$2)?"True":"False"}'
True

When strings are compared, they are compared character by character by character code (ASCII order, at least in the C locale).

The ASCII codes (decimal) of the digit 1, the digit 9 and the letter a are 49, 57 and 97. Therefore string comparisons such as "10" < "9" and "9" < "a" are true.
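A sketch of such comparisons, contrasting string constants with the same values as numbers (the parentheses keep print from reading < as redirection):

```shell
awk 'BEGIN {
    print ("10" < "9")      # 1: strings, so "1" (49) is compared with "9" (57)
    print (10 < 9)          # 0: numbers, compared numerically
    print ("9" < "a")       # 1: "9" (57) sorts before "a" (97)
    print ("abc" < "abd")   # 1: the first difference is "c" (99) vs "d" (100)
}'
```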

Logical operation

expr1 && expr2: logical AND, a binary operator. If expr1 is false, expr2 is not evaluated (short-circuit evaluation) and the whole expression is false.

expr1 || expr2: logical OR, a binary operator. If expr1 is true, expr2 is not evaluated (short-circuit evaluation) and the whole expression is true.

!expr: logical NOT, a unary operator.
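Short-circuit evaluation can be observed by putting an assignment on the right-hand side and checking whether it ran. A minimal sketch:

```shell
awk 'BEGIN {
    n = 0
    r = 0 && (n = 1)    # left side is false: (n = 1) is skipped
    print n              # 0
    r = 1 || (n = 2)    # left side is true: (n = 2) is skipped
    print n              # 0
    r = 1 && (n = 3)    # left side is true: the right side must be evaluated
    print n              # 3
}'
```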

You can use ! or !! to turn any value into the Boolean 0 or 1: ! gives the negated truth value, and !! gives the truth value itself.

awk 'BEGIN{print(!99)}'    # 0
awk 'BEGIN{print(!"ab")}'    # 0
awk 'BEGIN{print(!0)}'    # 1
awk 'BEGIN{print(!ab)}'    # 1
awk 'BEGIN{print(!"")}'    # 1

awk 'BEGIN{print(!!99)}'    # 1
awk 'BEGIN{print(!!"ab")}'    # 1
awk 'BEGIN{print(!!0)}'    # 0
awk 'BEGIN{print(!!ab)}'    # 0
awk 'BEGIN{print(!!"")}'    # 0

An unassigned variable is the empty string or 0 when first used, so its Boolean value is false. Combining this with negation lets us process a specified range of data. The idea is:

  1. At the start of the range, run var=!var to make var true. var can be any variable, as long as it is unassigned (or holds the empty string or 0).
  2. Use var as the pattern for the records to process.
  3. At the end of the range, run var=!var again to make var false.

For example, to print lines 1 to 4 of a.txt, we could use the approach shown earlier:

awk 'NR<=4{print}' a.txt
awk 'NR>=1&&NR<=4{print}' a.txt

Or use the idea we just said.

# awk 'NR==1{print;along=!along;next}along{print}NR==4{along=!along;next}' a.txt 
ID  name    gender  age  email          phone
1   Bob     male    28   [email protected]     18023394012
2   Alice   female  24   [email protected]  18084925203
3   Tony    male    21   [email protected]    17048792503

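You can also collect lines 2 to 4 in a variable through string concatenation and print them at the end. printf "%s" is used so that a % character in the data is not interpreted as a format directive.

awk 'NR==1{print;along=!along;next}along{multiLine=multiLine$0"\n"}NR==4{along=!along;next}END{printf "%s", multiLine}' a.txt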
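A self-contained variant of the same toggle idea, using seq instead of a.txt; the variable name inrange is arbitrary:

```shell
# Print lines 3 through 6 of the input with a toggle variable.
# For each line the rules run in order: switch on at line 3,
# print while on, switch off after line 6 has been printed.
seq 1 10 | awk 'NR==3 { inrange = !inrange } inrange { print } NR==6 { inrange = !inrange }'
```

It prints 3, 4, 5 and 6.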

Operator priority

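Operators from highest to lowest precedence:

()                        grouping
$                         field reference
++ --                     increment and decrement
^ **                      exponentiation
+ - !                     unary plus, unary minus, logical NOT
* / %                     multiplication, division, modulo
+ -                       binary addition and subtraction
space                     string concatenation
| |&                      pipes for getline, print and printf
< > <= >= != ==           relational operators (> is also output redirection)
~ !~                      regex match and non-match
in                        array membership
&&                        logical AND
||                        logical OR
?:                        conditional (ternary) operator
= += -= *= /= %= ^=       assignment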

() has the highest precedence; unary operators bind tighter than most binary operators; "space" stands for string concatenation. In the section on string literals we saw how the precedence of concatenation affects the result.

Operators at the same level are evaluated from left to right, except assignment, the conditional operator ?: and exponentiation, which are evaluated from right to left:

a-b+c  ==>  (a-b)+c
a=b=c  ==>  a=(b=c)
2^2^3  ==>  2^(2^3)

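> is not only the greater-than operator but also output redirection. In a print statement, awk takes what follows > as the redirection target, so put parentheses around a complex target to avoid unexpected results:

awk 'BEGIN{print "foo">1<2?"true.txt":"false.txt"}'     # ambiguous: depending on the awk, a syntax error or an unexpected file name
awk 'BEGIN{print "foo">(1<2?"true.txt":"false.txt")}'   # writes foo to true.txt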

Process control statement

This part is largely the same as in most programming languages, and my Bash learning notes on this blog cover these basics in more detail, so I will not repeat them here.

In awk, a code block {…} does not create a new variable scope; apart from function parameters, all variables are global. For example, a variable i used in a loop is still valid after the loop ends.

# awk 'BEGIN{for(i=1;i<=10;i++){} print i}'
11

If statement

#Single branch
if(cond){
    statements
}

#Double branch
if(cond){
    statements1
}else{
    statements2
}

#Multi branch
if(cond1){
    statements1
}else if(cond2){
    statements2
}else if(cond3){
    statements3
}else{
    statementsLast
}

Here is a well-known joke. A programmer's wife says to him: "Go out and buy a steamed stuffed bun. If you see someone selling watermelons, buy two."

#Natural language understanding
Buy a steamed stuffed bun
If (see the one selling watermelon){
    Buy 2 watermelons
}

#Programming language understanding
If (see the one selling watermelon){
    Buy two steamed stuffed buns
}else{
    Buy a steamed stuffed bun
}

Example.

# cat if.awk 
BEGIN{
    if(mark>=0&&mark<60) {
        print "bad"
    } else if(mark>=60&&mark<90) {
        print "ordinary"
    } else if(mark>=90&&mark<=100) {
        print "Good"
    } else {
        print "error mark"
    }
}
# awk -v mark=100 -f if.awk 
Good

Conditional expression

The conditional expression is the familiar ternary operator.

selector ? if-true-exp : if-false-exp

All three parts are expressions. The selector is evaluated first. If it is true (see the section on Booleans for what awk considers true), if-true-exp is evaluated and its value becomes the value of the whole expression; otherwise if-false-exp is evaluated and its value is used.

Let’s look at two successful examples.

awk 'BEGIN{mark=60;grade=mark>=60?"pass":"fail";print grade}'
awk 'BEGIN{mark=60;mark>=60?grade="pass":grade="fail";print grade}'

Note that all three parts must be expressions. Unlike ordinary statements such as print, expressions have a value, so a statement cannot be used as an operand. For example:

# awk 'BEGIN{mark=60;mark>=60?print "pass":print "fail"}'
awk: cmd. line:1: BEGIN{mark=60;mark>=60?print "pass":print "fail"}
awk: cmd. line:1:                        ^ syntax error
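The working form keeps print outside and lets the conditional expression choose only the value to print. A sketch with made-up marks:

```shell
# ?: yields a value; print (a statement) stays outside it
printf '59\n60\n' | awk '{ print $1, ($1 >= 60 ? "pass" : "fail") }'
```

It prints 59 fail and 60 pass. $1 is a strnum, so it is compared with 60 numerically.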

switch Statements

The switch statement (a gawk extension) does the same job as the case statement in bash. The difference is that in awk each branch must explicitly end with break to leave the switch; otherwise execution falls through into the next branch.

switch(expression) {
    case val1|regex1 : statements1     # each case takes either a constant value or a regex
    case val2|regex2 : statements2
    case val3|regex3 : statements3
... ...
    [default: statementsLast]
}

The example below is, apart from the break in each branch, equivalent to a bash case statement.

# cat switch1.awk 
{
    switch($0){
        case 1:
            print "Monday."
            break
        case 2:
            print "Tuesday."
            break
        case 3:
            print "Wednesday."
            break
        case 4:
            print "Thursday."
            break
        case 5:
            print "Friday."
            break
        case 6:
            print "Saturday."
            break
        case 7:
            print "Sunday."
            break
        default:
            print "What day is today?"
    }
}
# awk -f switch1.awk 
0
What day is today?
8
What day is today?
1
Monday.
2
Tuesday.

What day is today?    # an empty line was entered; then Ctrl+D ends the input

Try commenting out the break in the case 1 to case 4 branches to see what fall-through means; it is simple, so it is not demonstrated here.

Fall-through is useful when several values share one action. For example, to print weekday or weekend information depending on the input, leave out break in the grouped cases:

{
    switch($0){
        case 1:
        case 2:
        case 3:
        case 4:
        case 5:
            print "Weekday."
            break
        case 6:
        case 7:
            print "Weekend."
            break
        default:
            print "What day is today?"
            break
     }
}

A case value cannot combine alternatives with logical OR (|); doing so is a syntax error.

# cat switch4.awk
{
    switch($0){
        case 1|2|3|4|5:
            print "Weekday."
            break
        case 6|7:
            print "Weekend."
            break
        default:
            print "What day is today?"
            break
     }
}
# awk -f switch4.awk 
awk: switch4.awk:3:         case 1|2|3|4|5:
awk: switch4.awk:3:               ^ syntax error

In this situation, use a regex instead. Anchor it with ^ and $, so that an input such as 15 does not also count as a weekday:

{
    switch($0){
        case /^[12345]$/:
            print "Weekday."
            break
        case /^[67]$/:
            print "Weekend."
            break
        default:
            print "What day is today?"
            break
    }
}

loop

While loop.

while(condition) {
    statements
}

Do while loop.

do {
    statements
} while(condition)

For loop.

for(expr1;expr2;expr3) {
    statements
}

The for loop can also iterate over the indexes of an array (the order of traversal is unspecified).

for(idx in array) {
    statements
}
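A sketch for any POSIX awk showing the practical difference between while and do-while, plus a for-in loop over an array built with split():

```shell
awk 'BEGIN {
    i = 5
    while (i < 3) { print "while:", i; i++ }         # false at the start: body never runs
    do { print "do-while:", i; i++ } while (i < 3)   # body runs once before the test
    n = split("b a c", arr)                          # arr[1]="b", arr[2]="a", arr[3]="c"
    count = 0
    for (idx in arr) count++                         # visits every index, order unspecified
    print "split:", n, "visited:", count
}'
```

It prints do-while: 5 and then split: 3 visited: 3.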

Break and continue

break and continue in awk work almost the same as in bash, except that in awk break is also used to leave a branch of a switch...case statement.

# awk 'BEGIN{for(i=1;i<=5;i++){if(i==3){break}print i}}'
1
2
# awk 'BEGIN{for(i=1;i<=5;i++){if(i==3){continue}print i}}'
1
2
4
5

Next and nextfile

next is best explained together with getline. Let's look at the results first.

# seq 1 5 | awk 'NR==3{next}{print}'
1
2
4
5
# seq 1 5 | awk 'NR==3{getline}{print}'
1
2
4
5

The execution result of the command is the same, but the execution process is different.

A pattern and its {action} together form a rule. This example has two rules: the first has the pattern NR==3, and the second has no pattern, so every record matches it.

next: stop processing the current record, read the next one, and start again from the first rule (NR==3).

getline: read the next record and continue processing at the current position (right after getline).

So after next, the first command goes back to the first rule and tests NR==3 again, while after getline the second command goes on to the pattern-less print.
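With a pattern that matches two consecutive records, the two commands no longer produce the same output. A sketch:

```shell
seq 1 5 | awk '/3|4/ { next } { print }'      # 3 and 4 are both skipped: 1 2 5
seq 1 5 | awk '/3|4/ { getline } { print }'   # at 3, getline reads 4 and print runs: 1 2 4 5
```

With next, record 4 goes back through the /3|4/ rule and is skipped as well; with getline, record 4 is read inside the action and is printed without the pattern being tested again.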

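nextfile: where next stops processing the current record and moves on to the next record, nextfile stops processing the current file and moves on to the next file. In the first command below, NR==3 is true only once, because NR keeps counting across files, so only the first copy of a.txt is cut short. In the second, FNR starts again at 1 for each file, so both copies stop after two lines.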

# awk 'NR==3{nextfile}{print}' a.txt a.txt 
ID  name    gender  age  email          phone
1   Bob     male    28   [email protected]     18023394012
ID  name    gender  age  email          phone
... ...
10  Bruce   female  27   [email protected]   13942943905
# awk 'FNR==3{nextfile}{print}' a.txt a.txt 
ID  name    gender  age  email          phone
1   Bob     male    28   [email protected]     18023394012
ID  name    gender  age  email          phone
1   Bob     male    28   [email protected]     18023394012

exit

exit [code]

exit ends the awk program, optionally with a return value (exit status).

If exit runs in the BEGIN block or the main code block, current processing stops and the END block (if any) is executed. In other words, executing exit still runs the END block.

If exit runs in the END block, the program exits immediately.

When we discussed the BEGIN and END blocks, we said that if there is an END block but EOF is never reached (end of file, or Ctrl+D typed on stdin), the END block is not executed. With exit, however, the END block runs even without EOF.

# awk 'BEGIN{print "Hello world!";exit}END{print "This is not end!!!"}'   # no input is processed; both lines below are output, not stdin
Hello world!
This is not end!!!

Without BEGIN, exit in the main block only runs after a record has been read, so at least one line of input must be entered.

# awk '{print "Hello world!";exit}END{print "This is not end!!!"}'
1 # user input
Hello world!
This is not end!!!

Sometimes we want exit in BEGIN or the main block to end the program immediately, as exit does in other languages such as bash. To do that, set a variable (such as flag) before exit, then test it at the very beginning of END to decide whether to exit.

# awk 'BEGIN{print "Hello world!";flag=1;exit}{}END{if(flag){exit};print "end code"}'
Hello world!

The complete pseudo code is as follows:

BEGIN {
    ... ...
    if(begin cond) {
        flag=1
        exit
    }
    ...
}
{
    ... ...
    if(main cond) {
        flag=1
        exit
    }
    ... ...
}
END {
    if (flag) {exit}   # must come first in END
    ... ...
}

exit can specify an exit status (return value). If only one exit with a status code is executed, that code becomes awk's exit status.

# awk 'BEGIN{exit 100}'
# echo $?
100
# awk '{exit 100}'   # at least one line must be entered here; pressing Ctrl+D right away returns 0 instead
1
# echo $?
100
# awk '{exit 100}'   # Ctrl+D pressed right away
# echo $?
0
# awk 'END{exit 100}'   # Ctrl+D pressed right away again; the result differs because END runs even without input
# echo $?
100

If several exit statements with status codes are executed, awk's exit status is the code of the last one executed.

# awk 'BEGIN{exit 10}{exit 20}END{exit 30}'
# echo $?
30
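# awk '{exit 20}END{exit 30}'   # Ctrl+D pressed right away: exit 20 never runs
# echo $?
30
# awk '{exit 20}END{exit 30}'   # one line entered: exit 20 runs, then END runs exit 30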
1
# echo $?
30
# awk 'BEGIN{exit 10}{exit 20}END{exit}'
# echo $?
10   # the last exit (in END) has no status code, so the code from exit 10 in BEGIN is used; exit 20 in the main block never runs
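The interactive Ctrl+D cases above can be reproduced in a script by redirecting stdin from /dev/null, which gives awk empty input. A sketch; the || s=$? form simply records awk's non-zero status so that a script running under set -e does not stop:

```shell
# </dev/null stands in for pressing Ctrl+D right away (no input at all).
s=0; awk '{ exit 100 }' < /dev/null || s=$?
echo "main block only, no input: $s"     # 0: exit 100 never runs
s=0; awk 'END { exit 100 }' < /dev/null || s=$?
echo "END block, no input: $s"           # 100: END runs even without input
s=0; echo 1 | awk '{ exit 20 } END { exit 30 }' || s=$?
echo "main block, then END: $s"          # 30: the last exit with a code wins
```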