We all know the "three swordsmen" of Linux: grep, sed, and awk. I covered grep and sed earlier; readers who have not seen those articles can read them first. Today's topic is the more powerful awk.
Sed performs non-interactive string replacement, and grep provides effective filtering. Compared with these two, awk is a powerful text analysis tool, especially for analyzing data and generating reports.
Few general-purpose Linux commands can match what awk does. In this article I won't dwell on the fact that awk is also a programming language, so as not to scare you. We only need to treat it as a powerful text analysis tool on Linux.
As always, this article sticks to the principles of being practical and hands-on: it provides plenty of examples without trying to be exhaustive. It should help you start applying awk quickly, which is enough for most scenarios at work.
Use cases
Before learning how to use it, let’s take a look at what awk can do:
1. It can output and display given text in the desired format and print it as a report.
2. It can analyze and process system logs, quickly extracting the data we care about and generating statistics.
3. It makes statistics easy, such as website visits or visits per IP.
4. Combined with other tools, it can quickly summarize and analyze system runtime information, so that you know the state of the system like the back of your hand.
5. It is an expressive scripting language, with loops, conditions, and arrays to help you analyze more complex data.
……
Of course, awk can do more than this. Once you have mastered its usage, you can run efficient data analysis and statistics however you wish.
However, awk is not omnipotent. It is best at processing formatted text, such as logs and CSV data.
How it works
Let’s first briefly understand the basic working principle of awk. Through the following graphic description, I hope you can understand how awk works.
Awk basic command format
The working principle of awk is explained in detail in combination with the following figure

- First, the commands in the {} block marked with the keyword BEGIN are executed;
- After the commands in the BEGIN braces finish, execution of the body commands starts;
- Data is read line by line. By default, the content delimited by \n is one record, which is really just the concept of a row;
- Each record is split by the specified separator into fields, which is really just the concept of a column;
- The commands in the body block are executed in a loop: body runs once for each line read, until it has run for every line;
- Finally, the END commands are executed. The final result is usually output in END.
Awk is input driven: the body commands are executed once for every input line.
In the examples that follow, always remember: a record is a row, and a field is a column. BEGIN is the preprocessing stage, body is the stage where awk does its real work, and END is the final processing stage.
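The stages described above can be sketched with a tiny program of the general shape awk 'BEGIN{...} {...} END{...}'. This is only an illustration: the three input lines are made up, and the variable name n is arbitrary.

```shell
# Three made-up input lines stand in for a real file.
result=$(printf 'a\nb\nc\n' | awk '
BEGIN { print "start" }                 # runs once, before any input is read
      { n++; print "line " n ": " $0 }  # body: runs once per input line
END   { print "total " n }              # runs once, after the last line
')
printf '%s\n' "$result"
```

The output starts with "start", then shows one "line N: ..." entry per input line, and ends with "total 3", which mirrors the BEGIN, body, END order.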
Hands-on – Introduction
From here on we go straight to hands-on practice. As sample data, I save the following information to file.txt:

OK, let’s start with the simplest and most commonly used awk example, outputting columns 1, 4 and 8:

The awk statement goes inside the braces, and the whole program should be enclosed in single quotes so that the shell leaves it untouched. $1..$N refer to columns 1 through N, and $0 represents the entire line.
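As a minimal sketch of this kind of command, assume a made-up listing in an ls -l style layout (it is not the article's file.txt; the column order is given in the comment):

```shell
# Made-up sample data, columns:
# perms links owner group size month day year name
sample='-rw-r--r-- 1 root root 1024 10 12 2019 notes.txt
-rw-r--r-- 1 alice staff 2048 11 03 2019 todo.txt
drwxr-xr-x 2 root root 4096 12 25 2018 backup'

# Print columns 1, 4 and 8 of every line.
result=$(printf '%s\n' "$sample" | awk '{ print $1, $4, $8 }')
printf '%s\n' "$result"
```

With this sample, the first output line is "-rw-r--r-- root 2019". The commas in print insert the output separator (a space by default) between the columns.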
Next, look at a more practical awk feature: formatted output. It works exactly like printf in C. Personally I prefer this style to the stream style of C++.

%s is a string placeholder; -4 means the column is 4 characters wide and left-aligned. More complex formats can be written as needed, so we won't give detailed examples here.
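A small sketch of left-aligned columns follows. The sample listing is made up, and the widths 6 and 10 and the | separators are arbitrary choices that make the padding visible.

```shell
# Made-up sample data, columns:
# perms links owner group size month day year name
sample='-rw-r--r-- 1 root root 1024 10 12 2019 notes.txt
-rw-r--r-- 1 alice staff 2048 11 03 2019 todo.txt
drwxr-xr-x 2 root root 4096 12 25 2018 backup'

# %-6s and %-10s pad the owner and the name on the right (left-aligned).
result=$(printf '%s\n' "$sample" | awk '{ printf "%-6s|%-10s|%s\n", $3, $9, $5 }')
printf '%s\n' "$result"
```

Unlike print, printf does not add a newline by itself, which is why the format string ends with \n.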
Hands-on – Intermediate
(1) Filter records
Some data may not be what you want and can be filtered as needed

The filter above outputs the rows whose column 3 is root and whose column 6 is 10.
Awk supports the usual comparison operators, such as !=, >, <, >= and <=; $0 represents the entire line.
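A filter of this kind might look like the sketch below. The listing is made up (owner in column 3, numeric month in column 6, size in column 5), so the column numbers only hold for this sample.

```shell
# Made-up sample data, columns:
# perms links owner group size month day year name
sample='-rw-r--r-- 1 root root 1024 10 12 2019 notes.txt
-rw-r--r-- 1 alice staff 2048 11 03 2019 todo.txt
drwxr-xr-x 2 root root 4096 12 25 2018 backup'

# Rows whose owner is root AND whose month is 10; a pattern with no
# action prints the whole line ($0).
both=$(printf '%s\n' "$sample" | awk '$3 == "root" && $6 == 10')

# Rows whose size is greater than 2000, printing only name and size.
big=$(printf '%s\n' "$sample" | awk '$5 > 2000 { print $9, $5 }')

printf '%s\n' "$both" "$big"
```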
(2) Built-in variables
Awk provides built-in variables that make data processing easier:

This selects the rows whose column 3 is the root user, together with line 2, and prints the line number as well. NR is the number of the current line, and NF is the number of columns in the current line.
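The sketch below shows NR and NF on a made-up listing. It assumes the two conditions are combined with ||, which is one possible reading of the description above.

```shell
# Made-up sample data, columns:
# perms links owner group size month day year name
sample='-rw-r--r-- 1 bob users 512 09 30 2019 a.log
-rw-r--r-- 1 alice staff 2048 11 03 2019 todo.txt
drwxr-xr-x 2 root root 4096 12 25 2018 backup
-rw-r--r-- 1 bob users 100 01 02 2019 b.log'

# Select rows owned by root, plus line 2; print line number, field count, name.
result=$(printf '%s\n' "$sample" | awk '$3 == "root" || NR == 2 { print NR, NF, $9 }')
printf '%s\n' "$result"
```

Here lines 2 and 3 are selected, and each has 9 fields.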
(3) Specifying the separator
Our data is not always separated by spaces. We can specify the delimiter through the FS variable.

We specify 2019 as the separator, so each line is split into two parts, and 2019 is replaced with * in the output.
The delimiter can also be specified with the -F option:

If you need several separators, pass a bracket expression such as -F '[;:]'. I am sure you can work out how it behaves.
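The three ways of setting the input separator can be sketched as follows. The sample line and the a;b:c string are made up for illustration.

```shell
# One made-up line that contains the string 2019.
line='-rw-r--r-- 1 root root 1024 10 12 2019 notes.txt'

# FS set in BEGIN: the line is split around "2019" into $1 and $2.
with_fs=$(printf '%s\n' "$line" | awk 'BEGIN { FS = "2019" } { print $1 "*" $2 }')

# The same separator given with the -F option.
with_opt=$(printf '%s\n' "$line" | awk -F 2019 '{ print $1 "*" $2 }')

# A bracket expression: either ; or : separates fields.
multi=$(printf 'a;b:c\n' | awk -F '[;:]' '{ print $2, NF }')

printf '%s\n' "$with_fs" "$with_opt" "$multi"
```

The first two commands print the same line with 2019 replaced by *, and the last prints "b 3" because the string splits into three fields.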
Similarly, awk lets you set the output delimiter through the OFS variable:

On output, the fields are separated by the string specified in OFS.
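As a brief sketch on a made-up listing, setting OFS to a comma turns selected columns into CSV-like output:

```shell
# Made-up sample data, columns:
# perms links owner group size month day year name
sample='-rw-r--r-- 1 root root 1024 10 12 2019 notes.txt
-rw-r--r-- 1 alice staff 2048 11 03 2019 todo.txt
drwxr-xr-x 2 root root 4096 12 25 2018 backup'

# OFS is inserted wherever print arguments are separated by commas.
result=$(printf '%s\n' "$sample" | awk 'BEGIN { OFS = "," } { print $3, $5, $9 }')
printf '%s\n' "$result"
```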
Hands-on – Advanced
(1) Condition matching
List all the files that belong to the root user, along with the first line of the file:

The command above matches the rows whose third column contains root; ~ is the regular expression match operator.
Similarly, awk can match whole lines the way grep does:

In addition, a pattern such as /Aug|Dec/ matches any of several keywords.
A pattern can be negated with the ! symbol:
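The matching forms in this subsection can be sketched together on a made-up listing (owner in column 3, month name in column 6, file name in column 9):

```shell
# Made-up sample data, columns:
# perms links owner group size month day year name
sample='-rw-r--r-- 1 root root 1024 Aug 12 2019 notes.txt
-rw-r--r-- 1 alice staff 2048 Dec 03 2019 todo.txt
drwxr-xr-x 2 root root 4096 Oct 25 2018 backup'

# Column 3 matches the regular expression root.
by_owner=$(printf '%s\n' "$sample" | awk '$3 ~ /root/ { print $9 }')

# Whole-line match, like grep -E, with alternation.
by_month=$(printf '%s\n' "$sample" | awk '/Aug|Dec/ { print $9 }')

# Negated pattern: lines that do NOT contain root anywhere.
not_root=$(printf '%s\n' "$sample" | awk '!/root/ { print $9 }')

printf '%s\n' "$by_owner" "$by_month" "$not_root"
```

For a single column, the negated form of ~ is !~, for example $3 !~ /root/.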

(2) Splitting files
Let's do something fun: splitting text into multiple files. The following command splits the file information into multiple files by month (column 5):

Awk supports the redirection operator >, so each line is redirected straight to a file named after its month. Of course, you can also write only selected columns to the file.
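A sketch of splitting by month follows. It assumes a made-up listing in which the month is column 6, and it writes into a scratch directory created with mktemp; the .txt and .sizes suffixes are arbitrary.

```shell
# Made-up sample data, columns:
# perms links owner group size month day year name
sample='-rw-r--r-- 1 root root 1024 Aug 12 2019 notes.txt
-rw-r--r-- 1 alice staff 2048 Dec 03 2019 todo.txt
drwxr-xr-x 2 root root 4096 Aug 25 2018 backup'

outdir=$(mktemp -d)   # scratch directory for the generated files

# One output file per month: Aug.txt and Dec.txt.
printf '%s\n' "$sample" | awk -v dir="$outdir" '{ print > (dir "/" $6 ".txt") }'

# Same split, but writing only the name and size columns.
printf '%s\n' "$sample" | awk -v dir="$outdir" '{ print $9, $5 > (dir "/" $6 ".sizes") }'

cat "$outdir/Aug.sizes"
```

The target expression is parenthesized because an unparenthesized concatenation after > is parsed differently by different awk implementations.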
(3) The if statement
For complex conditions, awk provides the if statement. Awk is powerful because it is a script interpreter with the programming capabilities of a general scripting language. The following example splits the file using slightly more complex conditions:

Note that the if statement goes inside the curly braces.
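As an illustration of if / else if / else inside the body block, the sketch below routes each line of a made-up listing to one of three files. The thresholds and file names are hypothetical.

```shell
# Made-up sample data, columns:
# perms links owner group size month day year name
sample='-rw-r--r-- 1 root root 1024 10 12 2019 notes.txt
-rw-r--r-- 1 alice staff 2048 11 03 2019 todo.txt
drwxr-xr-x 2 root root 4096 12 25 2018 backup
-rw-r--r-- 1 bob users 100 09 30 2019 a.log'

outdir=$(mktemp -d)   # scratch directory for the generated files

printf '%s\n' "$sample" | awk -v dir="$outdir" '{
    if ($5 >= 2000) {
        print > (dir "/large.txt")        # size of at least 2000
    } else if ($3 == "root") {
        print > (dir "/root_small.txt")   # smaller files owned by root
    } else {
        print > (dir "/other.txt")        # everything else
    }
}'

cat "$outdir/large.txt"
```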
(4) Statistics
Compute the total space occupied by all *.c and *.h files in the current directory:

Column 5 is the file size. For every row read, it is added to the sum variable, and the END stage prints sum, which is the total size of all files.
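The accumulation can be sketched on a canned listing, so that the result does not depend on real files. The file names and sizes are made up.

```shell
# Made-up listing in ls -l style; column 5 is the size in bytes.
sample='-rw-r--r-- 1 root root 1024 Aug 12 2019 main.c
-rw-r--r-- 1 root root 2048 Aug 12 2019 util.c
-rw-r--r-- 1 root root 4096 Aug 12 2019 util.h'

# Add column 5 of every row to sum, then print it once at the end.
total=$(printf '%s\n' "$sample" | awk '{ sum += $5 } END { print sum }')
printf '%s\n' "$total"
```

For real files, the same awk program would read the output of ls -l *.c *.h instead of the canned text. An uninitialized awk variable such as sum starts out as 0 in numeric context, so no BEGIN block is needed.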
Let's take another example: counting how much memory each user's processes occupy. Note that the value used is the RSS column:

This uses an array and a for loop. Note that an awk array can be thought of as a dictionary or map: the key can be a number or a string. This data type comes up often in everyday use.
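A per-user total might be computed as in the sketch below. It uses canned text in the shape of ps aux output (RSS is column 6 there), with made-up processes, and pipes the result through sort because the iteration order of for (key in array) is unspecified.

```shell
# Canned text in the shape of `ps aux` output; the processes are made up.
sample='USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.1 1000 300 ? Ss 10:00 0:01 init
alice 20 0.5 1.0 5000 1200 pts/0 S 10:05 0:10 vim
root 35 0.0 0.2 2000 700 ? S 10:01 0:00 cron
alice 41 0.1 0.4 3000 800 pts/1 S 10:07 0:02 bash'

# Skip the header (NR > 1), add RSS to the entry keyed by user name,
# then print every key and its total.
result=$(printf '%s\n' "$sample" |
    awk 'NR > 1 { mem[$1] += $6 } END { for (u in mem) print u, mem[u] }' |
    sort)
printf '%s\n' "$result"
```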
(5) String
The following simple example shows awk’s support for string operations

Awk supports a range of string functions: length returns the length of a string, and toupper converts a string to uppercase.
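A short sketch of both functions, applied to the file-name column of a made-up listing:

```shell
# Made-up sample data, columns:
# perms links owner group size month day year name
sample='-rw-r--r-- 1 root root 1024 10 12 2019 notes.txt
-rw-r--r-- 1 alice staff 2048 11 03 2019 todo.txt
drwxr-xr-x 2 root root 4096 12 25 2018 backup'

# For each file name: the name, its length, and its uppercase form.
result=$(printf '%s\n' "$sample" | awk '{ print $9, length($9), toupper($9) }')
printf '%s\n' "$result"
```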
Hands-on – Tips
To understand the working mechanism of awk as a whole, let's take a comprehensive example. Suppose there is a student transcript:

Because the sample program is slightly complex and difficult to read on the command line, we also want to introduce another awk execution method through this case. Our awk script is as follows:

The results of executing awk are as follows

We can put complex awk statements in a script file, cal.awk, and then use the -f option to run awk from that script file.
- In the BEGIN stage, we initialize the relevant variables and print the table header;
- In the body stage, we read each line of data and accumulate the total score for each subject and for each student;
- In the END stage, we first print the table footer, then print the total scores and calculate the averages.
This simple example fully reflects the working mechanism and principle of awk. I hope this example can help you really understand how awk works.
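The three stages can be sketched with a script file run through -f. Everything here is hypothetical: the transcript has only a name and two scores per line, and this cal.awk is a reconstruction in the spirit of the description, not the original script.

```shell
workdir=$(mktemp -d)

# Made-up transcript: name, math score, English score.
cat > "$workdir/score.txt" <<'EOF'
Tom 80 90
Ann 95 85
Bob 60 75
EOF

# Hypothetical cal.awk; the layout and column names are illustrative.
cat > "$workdir/cal.awk" <<'EOF'
BEGIN {
    math = 0; english = 0                      # initialize the totals
    printf "%-6s %5s %8s %6s\n", "NAME", "MATH", "ENGLISH", "TOTAL"
    print "----------------------------"
}
{
    math += $2; english += $3                  # per-subject totals
    printf "%-6s %5d %8d %6d\n", $1, $2, $3, $2 + $3   # per-student total
}
END {
    print "----------------------------"
    printf "%-6s %5d %8d\n", "TOTAL", math, english
    printf "%-6s %5.1f %8.1f\n", "AVG", math / NR, english / NR
}
EOF

result=$(awk -f "$workdir/cal.awk" "$workdir/score.txt")
printf '%s\n' "$result"
```

In the END block, NR still holds the number of records read, so dividing by it gives the average per student.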
Summary
Through the above examples, we have learned the working principle of awk. Let’s summarize the following concepts and common knowledge points.
(1) Built-in variables
1. Each line of content is called a record.
2. Each column of a line, as split by the separator, is called a field.
After clarifying these concepts, let’s summarize several important built-in variables:
- NR: the number of the current record (the current line number);
- NF: the number of fields (columns) in the current record;
- RS: the record (line) separator, which defaults to a newline;
- FS: the field (column) separator, which defaults to spaces and tabs;
- OFS: the output field separator, used between fields when printing. The default is a space;
- ORS: the output record separator, used between records when printing. The default is a newline.
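The separator variables can be seen working together in the sketch below. The input string and the chosen separators are made up; the input deliberately has no trailing newline so that the last record is exactly e,f.

```shell
# Records are separated by ";" and fields by ","; on output, fields are
# joined with "-" and each record is terminated by "|".
result=$(printf 'a,b;c,d;e,f' | awk '
BEGIN { RS = ";"; FS = ","; OFS = "-"; ORS = "|" }
      { print NR, NF, $1, $2 }
')
printf '%s\n' "$result"
```

Each record yields its number, its field count, and its two fields, so the whole output is the single line 1-2-a-b|2-2-c-d|3-2-e-f|.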
(2) Output format
Awk provides the printf function for formatted output; its usage is basically the same as in C.
Basic Usage

Common formatting methods:
- %d: signed decimal integer
- %u: unsigned decimal integer
- %f: floating-point number
- %s: string
- %c: single character
- %e: floating-point number in exponential form
- %x, %X: unsigned hexadecimal integer
- %o: unsigned octal integer
- %g: automatically chooses the most suitable representation
- \n: newline character
- \t: tab character
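Several of these conversions can be tried in a BEGIN block, which needs no input. The values are arbitrary, and the | separators only make the field boundaries visible.

```shell
# No input file is needed: everything runs in BEGIN.
result=$(awk 'BEGIN {
    printf "%d|%5.2f|%s|%x|%X|%o|%e\n", 255, 3.14159, "awk", 255, 255, 8, 1234.5
}')
printf '%s\n' "$result"
```

Here %5.2f rounds to two decimals and pads to a width of 5, %x and %X print 255 as ff and FF, and %o prints 8 as 10.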
(3) Programming statements
Awk is not only a Linux command-line tool but also a scripting language, with the usual control structures of a programming language. It supports:
- Conditional statements
- Loop statements
- Arrays
- Functions
(4) Common functions
Awk has a large number of useful built-in functions and also supports user-defined functions, so you can write your own to extend the built-in set.
Here is a brief list of some commonly used string functions:
- index(s, t): returns the position of substring t in s
- length(s): returns the length of the string s
- split(s, a, sep): splits the string s on sep and stores the pieces in array a
- substr(s, p, n): returns the substring of s that starts at position p and is n characters long
- tolower(s): converts the string to lowercase
- toupper(s): converts the string to uppercase
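The functions listed above can be exercised in a few lines. The test string is made up; note that string positions in awk start at 1.

```shell
result=$(awk 'BEGIN {
    s = "Hello,awk,World"
    print index(s, "awk")       # position of the substring awk
    print length(s)             # number of characters in s
    n = split(s, parts, ",")    # pieces go into parts[1], parts[2], ...
    print n, parts[2]
    print substr(s, 1, 5)       # 5 characters starting at position 1
    print tolower(s)
    print toupper(parts[2])
}')
printf '%s\n' "$result"
```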
This has been a brief summary of some commonly used string functions and how to use them. Refer back to the earlier example programs as well, generalize from them, and apply them to practical problems.