sed handles non-interactive string replacement, and grep provides efficient filtering. Compared with those two, awk is a powerful text analysis tool that really shines when analyzing data and generating reports.
Few general Linux commands can match awk's power. In this article I won't dwell on the fact that awk is also a programming language, so as not to scare you; we only need to treat it as a powerful text analysis tool on Linux.
As usual, this article sticks to the principles of being practical and hands-on: it provides plenty of examples but is not exhaustive. It should help you start using awk quickly, which is enough for most scenarios you will meet at work.
Before learning how to use it, let’s take a look at what awk can do:
1. Output the given text in the format you want and print it as a report.
2. Analyze and process system logs, quickly dig out the data we care about, and generate statistics.
3. Compute statistics conveniently, such as website visits or visits per IP.
4. Combined with other tools, quickly summarize and analyze the system's runtime information, so that you know the state of the system like the back of your hand.
5. Offer the expressive power of a scripting language, with loops, conditions and arrays, to help you analyze more complex data.
Of course, awk can do more than this. Once you have mastered its usage, you can carry out efficient data analysis and statistics in whatever way you need.
However, awk is not omnipotent. It is good at processing formatted text, such as logs and CSV data.
Let’s first briefly understand the basic working principle of awk. Through the following graphic description, I hope you can understand how awk works.
Awk basic command format
The working principle of awk is explained in detail below, in combination with the following figure:
1. First, the commands inside the braces after the BEGIN keyword are executed.
2. Then the data is read line by line. By default, the content delimited by \n is one record, which is really just the concept of a line.
3. Each record is divided into fields by the specified separator; a field is really just the concept of a column.
4. The commands in the body block are executed once for each line read.
5. After the body has finished, the END commands are executed. The final result is usually output in END.
Awk is input driven: the body is executed as many times as there are input lines.
In the examples that follow, always keep in mind: a record is a line, and a field is a column. BEGIN is the preprocessing stage, the body is where awk does its real work, and END is the final processing stage.
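To make the three stages concrete, here is a minimal sketch. The three input lines and the variable name count are made up for illustration; the point is only that BEGIN runs once, the body runs once per line, and END runs once at the end.

```shell
# Three hypothetical input lines are piped into awk.
result=$(printf 'alpha\nbeta\ngamma\n' | awk '
BEGIN { print "start"; count = 0 }           # runs once, before any input is read
      { count++; print "line " NR ": " $0 }  # body: runs once for every record
END   { print "total lines: " count }        # runs once, after the last record
')
printf '%s\n' "$result"
```

With this input the body runs three times, so the last line reports a total of 3.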
Hands-on: Introduction
From here on we go straight to hands-on practice. As sample data, I save the following information to file.txt:
OK, let’s start with the simplest and most commonly used awk example, outputting columns 1, 4 and 8:
The awk statement goes inside the braces, and the whole program should be enclosed in single quotes so that the shell does not interpret $1 and the like. $1..$N refer to columns 1 through N, and $0 represents the entire line.
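As a rough illustration of printing selected columns, the sketch below uses a few hypothetical rows, not the article's original file.txt. They are laid out so that column 3 is the owner, column 4 the size, column 5 the month, column 6 the day and column 8 the file name.

```shell
# Hypothetical sample rows (an assumption, not the original file.txt).
sample='-rw-r--r-- 1 root 1024 Aug 10 09:30 a.txt
-rw-r--r-- 1 alice 2048 Dec 10 18:00 b.txt
-rw-r--r-- 1 root 512 Dec 25 12:15 c.txt'

# Print columns 1, 4 and 8 of every row; the commas insert a space between them.
result=$(printf '%s\n' "$sample" | awk '{ print $1, $4, $8 }')
printf '%s\n' "$result"
```

With a real file you would simply write the file name after the program, for example awk '{ print $1, $4, $8 }' file.txt.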
A more practical awk feature is formatted output. awk's printf works exactly like printf in C. Personally I like this style better than the stream style of C++. %s is a string placeholder, and in %-4s the -4 means a column width of 4, left-aligned. More complex formats can be built as needed; we won't list them all here.
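Here is a small sketch of left-aligned columns with printf. The sample rows and the widths 8 and 6 are arbitrary choices for illustration.

```shell
# Hypothetical sample rows: column 3 is the owner, 4 the size, 8 the file name.
sample='-rw-r--r-- 1 root 1024 Aug 10 09:30 a.txt
-rw-r--r-- 1 alice 2048 Dec 10 18:00 b.txt
-rw-r--r-- 1 root 512 Dec 25 12:15 c.txt'

# %-8s pads the owner to 8 characters, left-aligned; %-6s does the same for the size.
# Unlike print, printf adds no newline by itself, so \n is written explicitly.
result=$(printf '%s\n' "$sample" | awk '{ printf "%-8s|%-6s|%s\n", $3, $4, $8 }')
printf '%s\n' "$result"
```

The | characters are only there to make the padding visible.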
Hands-on: Intermediate
(1) Filter records
Some of the data may not be what you want, so you can filter it as needed.
A condition such as $3 == "root" && $6 == 10 outputs only the rows that have root in column 3 and 10 in column 6.
Awk supports the usual comparison operators: ==, !=, <, <=, > and >=.
$0 represents the entire content of the line.
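A minimal sketch of filtering, again on hypothetical rows in which column 3 is the owner, column 4 the size and column 6 the day:

```shell
sample='-rw-r--r-- 1 root 1024 Aug 10 09:30 a.txt
-rw-r--r-- 1 alice 2048 Dec 10 18:00 b.txt
-rw-r--r-- 1 root 512 Dec 25 12:15 c.txt'

# Keep only the rows whose owner is root AND whose day is 10, and print the whole row.
by_owner_and_day=$(printf '%s\n' "$sample" | awk '$3 == "root" && $6 == 10 { print $0 }')

# A numeric comparison: print the names of the files larger than 1000 bytes.
big_files=$(printf '%s\n' "$sample" | awk '$4 > 1000 { print $8 }')

printf '%s\n' "$by_owner_and_day" "$big_files"
```

Only a.txt satisfies the first condition, while both a.txt and b.txt pass the size check.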
(2) Built-in variables
Awk has built-in variables that make data processing easier.
For example, we can select the rows with the root user in column 3 together with line 2, and output the line number when printing.
NR is the number of the current row, and NF is the number of columns in the current row.
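The following is a sketch of NR and NF rather than the original command; the sample rows are hypothetical.

```shell
sample='-rw-r--r-- 1 root 1024 Aug 10 09:30 a.txt
-rw-r--r-- 1 alice 2048 Dec 10 18:00 b.txt
-rw-r--r-- 1 root 512 Dec 25 12:15 c.txt'

# For every row owned by root, show its row number (NR) and its field count (NF).
result=$(printf '%s\n' "$sample" | awk '$3 == "root" { print NR ": " NF " fields, " $8 }')
printf '%s\n' "$result"
```

Rows 1 and 3 belong to root, and each of these sample rows has 8 fields.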
(3) Specify the separator
Our data is not always separated by spaces. The FS variable lets us specify the delimiter.
For example, if we specify 2019 as the separator, each line is split into two parts.
The same can be done with the -F option, which also specifies the delimiter.
If you need several delimiters, use a bracket expression such as -F '[;:]'. I'm sure you can work out what it means.
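The three ways of choosing an input separator can be sketched as follows. The input strings are invented for the example.

```shell
# FS set in BEGIN: the string 2019 splits the line into two parts.
r1=$(printf '%s\n' 'backup2019done' | awk 'BEGIN { FS = "2019" } { print $1, $2 }')

# The -F option does the same job from the command line.
r2=$(printf '%s\n' 'name:alice:30' | awk -F ':' '{ print $3 }')

# A bracket expression accepts either ; or : as the separator.
r3=$(printf '%s\n' 'alice;30:engineer' | awk -F '[;:]' '{ print $1, $3 }')

printf '%s\n' "$r1" "$r2" "$r3"
```

Setting FS in BEGIN matters: if it were assigned in the body, the first line would already have been split with the default separator.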
Similarly, awk can set the delimiter used for output through the OFS variable. When printing, the fields are joined with the string specified by OFS.
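A minimal sketch of OFS, turning a few hypothetical space-separated rows into comma-separated output:

```shell
sample='-rw-r--r-- 1 root 1024 Aug 10 09:30 a.txt
-rw-r--r-- 1 alice 2048 Dec 10 18:00 b.txt
-rw-r--r-- 1 root 512 Dec 25 12:15 c.txt'

# OFS is inserted wherever print separates its arguments with a comma.
result=$(printf '%s\n' "$sample" | awk 'BEGIN { OFS = "," } { print $3, $4, $8 }')
printf '%s\n' "$result"
```

Note that OFS only applies to arguments separated by commas; writing print $3 $4 $8 would concatenate the fields with nothing in between.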
Hands-on: Advanced
(1) Condition matching
List all the files that belong to the root user, along with the first line of the file.
The match checks whether the third column contains root; the ~ operator is in fact a regular expression match.
Similarly, awk can match a whole line the way grep does, by giving just a /pattern/.
In addition, a pattern such as /Aug|Dec/ matches any of several keywords.
A pattern can be negated with !, for example !~ for a field match.
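The matching variants described here can be sketched like this, using four hypothetical rows in which column 3 is the owner and column 5 the month:

```shell
sample='-rw-r--r-- 1 root 1024 Aug 10 09:30 a.txt
-rw-r--r-- 1 alice 2048 Dec 10 18:00 b.txt
-rw-r--r-- 1 root 512 Dec 25 12:15 c.txt
-rw-r--r-- 1 bob 300 Jan 05 08:00 d.txt'

# Field match: column 3 matches the regular expression root.
owned_by_root=$(printf '%s\n' "$sample" | awk '$3 ~ /root/ { print $8 }')

# Whole-line match with alternation, similar to grep -E "Aug|Dec".
aug_or_dec=$(printf '%s\n' "$sample" | awk '/Aug|Dec/ { print $8 }')

# Negated field match: column 3 does NOT match root.
not_root=$(printf '%s\n' "$sample" | awk '$3 !~ /root/ { print $8 }')

# Negated whole-line match: lines that do not contain Dec.
not_dec=$(printf '%s\n' "$sample" | awk '!/Dec/ { print $8 }')

printf '%s\n' "$owned_by_root" "$aug_or_dec" "$not_root" "$not_dec"
```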
(2) Split a file
Let's do something fun: splitting the text into multiple files. The following command splits the file information into several files according to the month (column 5).
Awk supports the redirection symbol >, so each line can be redirected straight into a file named after the month. Of course, you can also write only selected columns to the file.
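A sketch of splitting by month, written so that it does not litter the current directory: the output goes to a temporary directory, and the variable dir and the .txt suffix are my own additions for the example.

```shell
sample='-rw-r--r-- 1 root 1024 Aug 10 09:30 a.txt
-rw-r--r-- 1 alice 2048 Dec 10 18:00 b.txt
-rw-r--r-- 1 root 512 Dec 25 12:15 c.txt'

tmp=$(mktemp -d)   # scratch directory for the output files

# Build the target file name from the month in column 5, then redirect the row into it.
# Inside awk, > creates the file on first use and keeps appending for the rest of the run.
printf '%s\n' "$sample" | awk -v dir="$tmp" '{ out = dir "/" $5 ".txt"; print > out }'

ls "$tmp"
cat "$tmp/Dec.txt"
```

Storing the file name in a variable (or wrapping the expression in parentheses) avoids the ambiguity some awk implementations have with print > expression.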
(3) The if statement
For more complex conditions, awk provides the if statement. Awk is powerful precisely because it is a script interpreter with the programming ability of a general scripting language. The following example splits the file using slightly more complex conditions.
Note that the if statement goes inside the curly braces.
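As an illustration of if / else if / else, the sketch below routes hypothetical rows into three files. The file names aug.txt, dec.txt and others.txt and the dir variable are assumptions made for the example.

```shell
sample='-rw-r--r-- 1 root 1024 Aug 10 09:30 a.txt
-rw-r--r-- 1 alice 2048 Dec 10 18:00 b.txt
-rw-r--r-- 1 root 512 Dec 25 12:15 c.txt
-rw-r--r-- 1 bob 300 Jan 05 08:00 d.txt'

tmp=$(mktemp -d)   # scratch directory for the output files

printf '%s\n' "$sample" | awk -v dir="$tmp" '{
    # Column 5 holds the month in this sample layout.
    if ($5 == "Aug")      print > (dir "/aug.txt")
    else if ($5 == "Dec") print > (dir "/dec.txt")
    else                  print > (dir "/others.txt")
}'

ls "$tmp"
```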
Next, let's total the space occupied by all *.h files in the current directory. Column 5 holds the file size. For every row read, it is added to the sum variable, and in the END stage sum is printed, that is, the total size of all the files.
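A sketch of the accumulation, fed with made-up ls -l style lines so that the result is predictable. In ls -l output the size is column 5.

```shell
# Hypothetical `ls -l *.h` output.
listing='-rw-r--r-- 1 root root 1024 Aug 10 09:30 a.h
-rw-r--r-- 1 root root 2048 Dec 10 18:00 b.h
-rw-r--r-- 1 root root 512 Dec 25 12:15 c.h'

# Add column 5 of every row to sum, then print the total once at the end.
result=$(printf '%s\n' "$listing" | awk '{ sum += $5 } END { print sum }')
printf '%s\n' "$result"

# Against real files, the same program would be used as:
#   ls -l *.h | awk '{ sum += $5 } END { print sum }'
```

The variable sum needs no initialization: an unset awk variable counts as 0 in arithmetic.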
Let's take another example and count how much memory each user's processes occupy. Note that the value used is the RSS column. This example uses an array and a for loop. It is worth mentioning that an awk array can be thought of as a dictionary or map whose keys can be numbers or strings. This data type comes up all the time in everyday work.
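Here is a sketch of the per-user total, run on invented ps aux style output in which RSS is column 6. The array name mem is arbitrary.

```shell
# Hypothetical `ps aux` output: a header line followed by three processes.
ps_sample='USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.1 1000 400 ? Ss 10:00 0:01 init
alice 20 0.5 1.0 5000 2000 ? S 10:01 0:10 vim
root 30 0.0 0.2 2000 600 ? S 10:02 0:00 cron'

# Skip the header (NR > 1), add each RSS value to the entry keyed by the user name,
# then walk the array in END. The order of for (u in mem) is unspecified, hence sort.
result=$(printf '%s\n' "$ps_sample" |
    awk 'NR > 1 { mem[$1] += $6 } END { for (u in mem) print u, mem[u] " KB" }' |
    sort)
printf '%s\n' "$result"
```

On a live system the input would come from ps aux instead of the sample variable.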
The following simple example shows awk's support for string operations. Awk supports a series of string functions: length returns the length of a string, and toupper converts a string to uppercase.
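A small sketch of length and toupper applied to the owner column of some hypothetical rows:

```shell
sample='-rw-r--r-- 1 root 1024 Aug 10 09:30 a.txt
-rw-r--r-- 1 alice 2048 Dec 10 18:00 b.txt
-rw-r--r-- 1 root 512 Dec 25 12:15 c.txt'

# Print the length of the owner name, followed by the name in uppercase.
result=$(printf '%s\n' "$sample" | awk '{ print length($3), toupper($3) }')
printf '%s\n' "$result"
```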
Hands-on: Tips
In order to understand the working mechanism of awk as a whole, let’s take a comprehensive example. Suppose there is a student transcript:
Because the sample program is somewhat complex and hard to read on the command line, this case is also a chance to introduce another way of running awk. Our awk script is as follows:
The results of running awk are as follows:
We can write complex awk statements into a script file, cal.awk, and then use the -f option to run the program from that file.
In the BEGIN stage, we initialize the relevant variables and print the table header.
In the body stage, we read each line of data and accumulate the total score for each subject and for each student.
In the END stage, we first print the table footer, then print the total scores and calculate the averages.
This simple example fully reflects the working mechanism and principle of awk. I hope this example can help you really understand how awk works.
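For readers who want something to run, here is a sketch of what such a script could look like. It is not the original cal.awk: the transcript layout (name, math score, English score), the two students and the column widths are all assumptions made for illustration.

```shell
tmp=$(mktemp -d)   # scratch directory for the data file and the script

# Hypothetical transcript: name, math score, English score.
printf '%s\n' 'Alice 80 90' 'Bob 70 65' > "$tmp/score.txt"

# Hypothetical stand-in for cal.awk.
cat > "$tmp/cal.awk" <<'EOF'
BEGIN {
    # Preprocessing: initialize the totals and print the table header.
    math = 0; english = 0
    printf "%-8s %5s %7s %5s\n", "NAME", "MATH", "ENGLISH", "TOTAL"
    print "-----------------------------"
}
{
    # Body: accumulate per-subject totals and print each student's own total.
    math += $2; english += $3
    printf "%-8s %5d %7d %5d\n", $1, $2, $3, $2 + $3
}
END {
    # Final stage: table footer, subject totals, and averages over all NR rows.
    print "-----------------------------"
    printf "%-8s %5d %7d\n", "TOTAL", math, english
    if (NR > 0)
        printf "%-8s %5.1f %7.1f\n", "AVERAGE", math / NR, english / NR
}
EOF

# -f tells awk to read the program from a file instead of the command line.
result=$(awk -f "$tmp/cal.awk" "$tmp/score.txt")
printf '%s\n' "$result"
```

Keeping the program in a file also means the single-quote restriction of the command line no longer applies.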
Through the above examples, we have learned the working principle of awk. Let’s summarize the following concepts and common knowledge points.
(1) Built-in variables
1. Each line of content is called a record.
2. Each column of a line, as split by the separator, is called a field.
After clarifying these concepts, let's summarize several important built-in variables:
NR: the number of the current row (record);
NF: the number of columns (fields) in the current row;
RS: the input row (record) separator, which defaults to a newline;
FS: the input column (field) separator, which defaults to whitespace (spaces and tabs);
OFS: the output column separator, placed between fields when printing; the default is a space;
ORS: the output row separator, placed after each record when printing; the default is a newline.
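RS and ORS did not appear in the earlier examples, so here is a minimal sketch with an invented input in which records are separated by semicolons instead of newlines:

```shell
# Input records are separated by ";" (RS); output records are terminated by " | " (ORS).
result=$(printf 'a b;c d;e f' | awk 'BEGIN { RS = ";"; ORS = " | " } { print NR ":" $1 }')
printf '%s\n' "$result"
```

Each print ends with ORS instead of a newline, so the three records come out on a single line, each followed by " | ".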
(2) Output format
Awk provides the printf function for formatted output. Its usage is basically the same as printf in C.
Common format specifiers:
%d: signed decimal integer
%u: unsigned decimal integer
%f: floating-point number
%e: floating-point number in exponential notation
%X: unsigned hexadecimal integer
%o: unsigned octal integer
%g: automatically chooses the more suitable of %f and %e
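The specifiers in this list can be tried out directly in a BEGIN block, which needs no input. The values are arbitrary.

```shell
result=$(awk 'BEGIN {
    # Integer and fixed-point conversions.
    printf "%d|%u|%f|%X|%o\n", -42, 42, 3.5, 255, 8
    # Exponential notation, and %g picking the shorter form.
    printf "%e|%g\n", 1500, 0.0001
}')
printf '%s\n' "$result"
```

Here 255 is shown as FF in hexadecimal and 8 as 10 in octal; %f prints six decimal places unless a precision such as %.2f is given.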
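(3) Programming statements
Awk is not only a Linux command-line tool but also a scripting language. It supports the usual control structures of a programming language, including if/else conditions, for, while and do-while loops, and the break, continue, next and exit statements.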
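A short sketch of a few of these control structures, run entirely in a BEGIN block; the numbers are chosen only for illustration.

```shell
result=$(awk 'BEGIN {
    # for loop with if/else: classify the numbers 1 to 5.
    for (i = 1; i <= 5; i++) {
        if (i % 2 == 0) kind = "even"; else kind = "odd"
        print i, kind
    }
    # while loop with break: find the first n whose square exceeds 50.
    n = 0
    while (1) {
        n++
        if (n * n > 50) break
    }
    print "first n with n*n > 50:", n
}')
printf '%s\n' "$result"
```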
(4) Common functions
Awk has a large number of useful built-in functions and also supports user-defined functions, so you can write your own to extend what is built in.
Here is a brief list of some commonly used string functions:
index(s, t): returns the position of substring t in s (0 if it is not found)
length(s): returns the length of string s
split(s, a, sep): splits s on sep, stores the pieces in array a, and returns the number of pieces
substr(s, p, n): returns the substring of s that starts at position p and is at most n characters long
tolower(s): converts the string to lowercase
toupper(s): converts the string to uppercase
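The functions in this list, plus one user-defined function, can be exercised as follows. The helper capitalize is hypothetical and exists only to show the function syntax.

```shell
result=$(awk '
# Hypothetical custom function: uppercase the first letter, lowercase the rest.
function capitalize(w) { return toupper(substr(w, 1, 1)) tolower(substr(w, 2)) }
BEGIN {
    s = "hello,awk,world"
    print index(s, "awk")        # position of the substring, counted from 1
    print length(s)              # number of characters in s
    n = split(s, parts, ",")     # pieces go into parts[1], parts[2], ...
    print n, parts[2]
    print substr(s, 7, 3)        # 3 characters starting at position 7
    print toupper(parts[1]), capitalize("wORLD")
}')
printf '%s\n' "$result"
```

Note that awk string positions start at 1, not 0, and that omitting the third argument of substr returns everything up to the end of the string.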
This has been a brief summary of some commonly used string functions and how to use them. Refer back to the earlier example programs, draw inferences from them, and apply them to your own practical problems.