Detailed explanation of the usage of sed and awk in Linux

Time:2020-2-9

sed usage:

sed is a stream editor that is well suited to processing files and to use in pipelines. It works line by line, and can replace, delete, add, and select lines of data, among other tasks. Let's look at sed first.

The sed command line format is:

sed [-nefri] 'command' input-file

Common options:

-n: silent mode. By default, sed prints every line from stdin to the screen. With -n, only the lines explicitly handled by a sed command (such as p) are printed.

-e: give the sed command directly on the command line; several -e options can be combined;

-f: read the sed commands from a file; -f filename runs the sed commands stored in filename;

-r: use extended regular expression syntax in the sed commands. (The default is basic regular expression syntax.)

-i: modify the file in place instead of printing the result to the screen.
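A minimal sketch of how -n and -e change what sed prints. The temporary file and its three lines are assumptions made up for this illustration; they stand in for any small text file.

```shell
# Hypothetical three-line sample file, created only for this demo.
demo=$(mktemp)
printf 'Hello!\nruby is me,welcome to my blog.\nend\n' > "$demo"

# Without -n, sed prints every line, and 'p' prints line 1 a second time.
with_default=$(sed '1p' "$demo")

# With -n, only the lines explicitly printed by 'p' appear.
with_n=$(sed -n '1p' "$demo")

# Several -e options are applied in turn: print line 1 and line 3.
two_cmds=$(sed -n -e '1p' -e '3p' "$demo")

printf '%s\n---\n%s\n---\n%s\n' "$with_default" "$with_n" "$two_cmds"
rm -f "$demo"
```

The first result has four lines because line 1 is printed twice, which is why p is normally paired with -n.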

Common commands:

a: append. a can be followed by a string, which appears on a new line after the current line;
c: change. c can be followed by a string, which replaces the lines between n1 and n2;
d: delete. Because it deletes, d usually takes no argument;
i: insert. i can be followed by a string, which appears on a new line before the current line;
p: print the selected lines. p is usually used together with sed -n;
s: substitute. s performs a replacement directly and is usually combined with a regular expression, for example 1,20s/old/new/g.

For example: (suppose we have a file named ab)

Delete lines

[root@localhost ruby]# sed '1d' ab      # delete the first line
[root@localhost ruby]# sed '$d' ab      # delete the last line
[root@localhost ruby]# sed '1,2d' ab    # delete lines 1 to 2
[root@localhost ruby]# sed '2,$d' ab    # delete from line 2 to the last line

Display lines

[root@localhost ruby]# sed -n '1p' ab      # display the first line
[root@localhost ruby]# sed -n '$p' ab      # display the last line
[root@localhost ruby]# sed -n '1,2p' ab    # display lines 1 to 2
[root@localhost ruby]# sed -n '2,$p' ab    # display from line 2 to the last line

Query by pattern

[root@localhost ruby]# sed -n '/ruby/p' ab    # show all lines containing the keyword ruby
[root@localhost ruby]# sed -n '/\$/p' ab      # show all lines containing the character $; the backslash \ removes its special meaning

Add one or more lines of text

[root@localhost ruby]# cat ab
Hello!
ruby is me,welcome to my blog.
end
[root@localhost ruby]# sed '1a drink tea' ab    # add the string "drink tea" after the first line
Hello!
drink tea
ruby is me,welcome to my blog.
end
[root@localhost ruby]# sed '1,3a drink tea' ab    # add the string "drink tea" after each of lines 1 to 3
Hello!
drink tea
ruby is me,welcome to my blog.
drink tea
end
drink tea
[root@localhost ruby]# sed '1a drink tea\nor coffee' ab    # add more than one line after the first line, using the newline escape \n
Hello!
drink tea
or coffee
ruby is me,welcome to my blog.
end

Replace one or more lines

[root@localhost ruby]# sed '1c Hi' ab    # replace the first line with Hi
Hi
ruby is me,welcome to my blog.
end
[root@localhost ruby]# sed '1,2c Hi' ab    # replace lines 1 to 2 with Hi
Hi
end

Replace part of a line

Format: sed 's/string to replace/new string/g' (the string to replace can be a regular expression)

[root@localhost ruby]# sed -n '/ruby/p' ab | sed 's/ruby/bird/g'    # replace ruby with bird
[root@localhost ruby]# sed -n '/ruby/p' ab | sed 's/ruby//g'        # delete ruby
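As a small illustration of the s command, the sketch below compares a substitution with and without the trailing g flag. The input line "ruby likes ruby" is an assumption invented for the demo.

```shell
# Hypothetical input line that contains the search string twice.
line='ruby likes ruby'

# Without g, only the first match on each line is replaced.
first_only=$(printf '%s\n' "$line" | sed 's/ruby/bird/')

# With g, every match on the line is replaced.
all=$(printf '%s\n' "$line" | sed 's/ruby/bird/g')

# An empty replacement deletes every match.
removed=$(printf '%s\n' "$line" | sed 's/ruby//g')

printf '%s\n%s\n%s\n' "$first_only" "$all" "$removed"
```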

Insert

[root@localhost ruby]# sed -i '$a bye' ab    # append "bye" after the last line of the file ab, modifying the file in place
[root@localhost ruby]# cat ab
Hello!
ruby is me,welcome to my blog.
end
bye

Delete matching lines

sed -i '/matching string/d' filename (Note: if the matching string is a shell variable, the command must be wrapped in double quotes "" rather than single quotes ''. That is how I remember it.)
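The quoting note can be checked with a short sketch. The sample file and the variable name word are assumptions made up for this demo, and -i is left out so that the sample file itself is not modified.

```shell
# Hypothetical sample file and shell variable, made up for this demo.
demo=$(mktemp)
printf 'Hello!\nruby is me\nend\n' > "$demo"
word=ruby

# Single quotes: the shell does not expand $word, so sed searches for the
# literal text "$word", finds nothing, and deletes no lines.
single=$(sed '/$word/d' "$demo")

# Double quotes: the shell expands $word to ruby before sed runs,
# so the line containing ruby is deleted.
double=$(sed "/$word/d" "$demo")

printf 'single quotes:\n%s\n\ndouble quotes:\n%s\n' "$single" "$double"
rm -f "$demo"
```

If the variable may contain a slash or regular expression metacharacters, it would need escaping first; this sketch assumes a plain word.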

Replace a string in a matching line

sed -i '/match string/s/source string/target string/g' filename
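A minimal sketch of this address-plus-substitution form, run without -i so the result goes to standard output. The two sample lines are assumptions chosen so that only one of them matches the address.

```shell
# Hypothetical two-line sample: only the first line contains "ruby".
demo=$(mktemp)
printf 'ruby is me\nthis is me\n' > "$demo"

# The address /ruby/ selects lines; s/me/you/g then runs only on those lines.
result=$(sed '/ruby/s/me/you/g' "$demo")

printf '%s\n' "$result"
rm -f "$demo"
```

The second line keeps its "me" because it does not match /ruby/.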

The usage of Linux awk

Brief introduction

awk is a powerful text analysis tool. Where grep searches and sed edits, awk is particularly strong at analyzing data and generating reports. In short, awk reads a file line by line, splits each line into fields using whitespace as the default separator, and then processes the resulting fields.

There are three different versions of awk: awk, nawk and gawk. Without special explanation, it generally refers to gawk, which is the GNU version of awk.

awk's name comes from the initials of its creators' surnames: Alfred Aho, Peter Weinberger and Brian Kernighan. awk is in fact a language of its own, the awk programming language, which its three creators formally defined as a "pattern scanning and processing language". It lets you write short programs that read input files, sort data, process data, perform calculations on the input, generate reports, and much more.

Usage


awk '{pattern + action}' {filenames}

Although operations can be complex, the syntax is always the same: pattern is what awk looks for in the data, and action is the series of commands executed when a match is found. Curly braces ({}) need not always appear in a program, but they are used to group a series of instructions under a specific pattern. A pattern is typically a regular expression enclosed in slashes.

The most basic function of awk language is to browse and extract information based on specified rules in files or strings. Only after awk extracts information can other text operations be performed. A full awk script is usually used to format information in a text file.

In general, awk processes a file one line at a time: it reads a line of the file and then runs the corresponding commands on that line.

Calling awk

There are three ways to call awk

1. Command line mode


awk [-F field-separator] 'commands' input-file(s)

Here commands is the actual awk program, and [-F field-separator] is optional. input-file(s) is the file or files to be processed.

In awk, each item on a line that is delimited by the field separator is called a field. If no separator is given with -F, the default field separator is whitespace.
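A short sketch of the effect of -F. The record below is an assumed passwd-style line used only for illustration.

```shell
# Hypothetical passwd-style record with colon-separated fields.
record='root:x:0:0:root:/root:/bin/bash'

# Default separator is whitespace; the record has none, so $1 is the whole line.
default_first=$(printf '%s\n' "$record" | awk '{print $1}')

# With -F ':' the record is split on colons.
colon_first=$(printf '%s\n' "$record" | awk -F ':' '{print $1}')

# $NF is the last field, because NF holds the number of fields.
colon_last=$(printf '%s\n' "$record" | awk -F ':' '{print $NF}')

printf '%s\n%s\n%s\n' "$default_first" "$colon_first" "$colon_last"
```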

2. Shell script mode

Put all the awk commands into a file and make the awk program executable. The first line of the script then names the awk interpreter, and the script is run by typing its name.

This corresponds to the first line of a shell script: #!/bin/sh

which is replaced with: #!/bin/awk -f (the -f is needed so that awk reads its program from the script file)

3. Put all the awk commands into a separate file, and then call:


awk -f awk-script-file input-file(s)

Here the -f option loads the awk script from awk-script-file, and input-file(s) is the same as above.
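The third calling method might look like the sketch below. The script contents and the two input records are assumptions made up for the demo; the script is written to a temporary file rather than a fixed name.

```shell
# Write a small awk program to a temporary script file (hypothetical content).
script=$(mktemp)
cat > "$script" <<'EOF'
# Print the first colon-separated field of every record.
BEGIN { FS = ":" }
{ print $1 }
EOF

# Run the script file with -f; input comes from the pipe here.
names=$(printf 'root:x:0\ndaemon:x:1\n' | awk -f "$script")

printf '%s\n' "$names"
rm -f "$script"
```

Setting FS in BEGIN inside the script has the same effect as passing -F ':' on the command line.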

This chapter focuses on the command line approach.

Introductory example

Suppose the output of last -n 5 is as follows

[root@www ~]# last -n 5    <== show only the first five lines
root   pts/1  192.168.1.100 Tue Feb 10 11:21  still logged in
root   pts/1  192.168.1.100 Tue Feb 10 00:46 - 02:28 (01:41)
root   pts/1  192.168.1.100 Mon Feb 9 11:41 - 18:30 (06:48)
dmtsai  pts/1  192.168.1.100 Mon Feb 9 11:41 - 11:41 (00:00)
root   tty1          Fri Sep 5 14:09 - 14:10 (00:01)

To display only the accounts of the five most recent logins:


#last -n 5 | awk '{print $1}'
root
root
root
dmtsai
root

awk works as follows: it reads in one record, delimited by the newline character '\n', then splits the record into fields according to the field separator and fills in the field variables: $0 is the whole record, $1 is the first field, and $n is the nth field. The default field separator is a space or a tab, so $1 is the login user, $3 is the login user's IP address, and so on.
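The field splitting described here can be tried on a single line. The line below is copied from the sample last output; the variable names are assumptions for the demo.

```shell
# One line from the sample `last` output, with runs of several spaces.
line='root   pts/1  192.168.1.100 Tue Feb 10 11:21  still logged in'

# Runs of spaces count as a single separator with the default setting.
user=$(printf '%s\n' "$line" | awk '{print $1}')
ip=$(printf '%s\n' "$line" | awk '{print $3}')

# NF is the number of fields in the record; $0 is the unmodified record.
nf=$(printf '%s\n' "$line" | awk '{print NF}')
whole=$(printf '%s\n' "$line" | awk '{print $0}')

printf '%s %s %s\n' "$user" "$ip" "$nf"
```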

To display only the account names in /etc/passwd:


#cat /etc/passwd |awk -F ':' '{print $1}' 
root
daemon
bin
sys

This is an example of awk with an action only: the action {print $1} is executed for every line.

-F specifies that the field separator is ':'.

To display only the account names in /etc/passwd together with each account's shell, separated by a tab:


#cat /etc/passwd |awk -F ':' '{print $1"\t"$7}'
root  /bin/bash
daemon /bin/sh
bin   /bin/sh
sys   /bin/sh 

To display only the account names in /etc/passwd and their shells, separated by commas, with a header line name,shell before all the lines and a final line "blue,/bin/nosh" after them:


cat /etc/passwd |awk -F ':' 'BEGIN {print "name,shell"} {print $1","$7} END {print "blue,/bin/nosh"}'
name,shell
root,/bin/bash
daemon,/bin/sh
bin,/bin/sh
sys,/bin/sh
....
blue,/bin/nosh

awk works as follows: it first executes BEGIN, then reads the file. It reads in one record, delimited by the newline character \n, splits the record into fields according to the field separator, and fills in the field variables ($0 is the whole record, $1 the first field, $n the nth field), then runs the action associated with the pattern. It then reads the second record, and so on until all records have been read, and finally executes END.
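The order of BEGIN, the per-record action, and END can be made visible with a small trace. The two-line input is an assumption for the demo.

```shell
# Trace the three phases on a hypothetical two-record input.
trace=$(printf 'a\nb\n' | awk '
  BEGIN { print "begin" }                    # runs once, before any input is read
  { print "record " NR ": " $0 }             # runs once per record
  END { print "end after " NR " records" }   # runs once, after the last record
')

printf '%s\n' "$trace"
```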

Search /etc/passwd for all lines containing the keyword root


#awk -F: '/root/' /etc/passwd
root:x:0:0:root:/root:/bin/bash

This is an example of using a pattern. Only lines that match the pattern (root in this case) have the action executed (no action is specified here, so each matching line is printed in full by default).

The search supports regular expressions; for example, to find lines starting with root: awk -F: '/^root/' /etc/passwd

Search /etc/passwd for all lines containing the keyword root and display the corresponding shell
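To see the difference between /root/ and /^root/, the sketch below uses a small stand-in for /etc/passwd. The three records are assumptions made up for the demo, so the result does not depend on the accounts of the local machine.

```shell
# Hypothetical stand-in for /etc/passwd with three records.
sample=$(mktemp)
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'operator:x:11:0:operator:/root:/sbin/nologin' \
  'daemon:x:1:1:daemon:/usr/sbin:/bin/sh' > "$sample"

# /root/ matches "root" anywhere in the line, including the home directory.
anywhere=$(awk -F: '/root/ {print $1}' "$sample")

# /^root/ matches only lines that begin with "root".
at_start=$(awk -F: '/^root/ {print $1}' "$sample")

printf 'anywhere:\n%s\nat start:\n%s\n' "$anywhere" "$at_start"
rm -f "$sample"
```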


# awk -F: '/root/{print $7}' /etc/passwd       
/bin/bash

Here the action {print $7} is specified.

awk built-in variables

awk has many built-in variables that hold environment information, and these can be changed. The most commonly used ones are listed below.

ARGC number of command line arguments
ARGV array of command line arguments
ENVIRON array giving access to the system environment variables
FILENAME name of the file awk is currently reading
FNR number of records read so far in the current file
FS input field separator, equivalent to the command line -F option
NF number of fields in the current record
NR number of records read so far
OFS output field separator
ORS output record separator
RS input record separator

In addition, $0 refers to the entire record. $1 is the first field of the current line, $2 is the second field, and so on.
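The difference between NR and FNR only shows up with more than one input file. A minimal sketch, assuming two small temporary files created just for the demo:

```shell
# Two hypothetical input files: two records in the first, one in the second.
f1=$(mktemp)
f2=$(mktemp)
printf 'a b\nc d e\n' > "$f1"
printf 'x\n' > "$f2"

# NR keeps counting across files, FNR restarts at 1 for each file,
# and NF is the number of fields in the current record.
report=$(awk '{print NR, FNR, NF}' "$f1" "$f2")

printf '%s\n' "$report"
rm -f "$f1" "$f2"
```

Each output line shows NR, FNR and NF in that order; the last record has NR 3 but FNR 1.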

For /etc/passwd, print the file name, the line number of each line, the number of columns in each line, and the full content of the line:


#awk -F ':' '{print "filename:" FILENAME ",linenumber:" NR ",columns:" NF ",linecontent:"$0}' /etc/passwd
filename:/etc/passwd,linenumber:1,columns:7,linecontent:root:x:0:0:root:/root:/bin/bash
filename:/etc/passwd,linenumber:2,columns:7,linecontent:daemon:x:1:1:daemon:/usr/sbin:/bin/sh
filename:/etc/passwd,linenumber:3,columns:7,linecontent:bin:x:2:2:bin:/bin:/bin/sh
filename:/etc/passwd,linenumber:4,columns:7,linecontent:sys:x:3:3:sys:/dev:/bin/sh 

Using printf instead of print can make the code more concise and easier to read:


 awk -F ':' '{printf("filename:%10s,linenumber:%s,columns:%s,linecontent:%s\n",FILENAME,NR,NF,$0)}' /etc/passwd 

Print and printf

awk provides both print and printf.

The arguments to print can be variables, numbers, or strings. Strings must be enclosed in double quotes, and arguments are separated by commas. Without commas, the arguments are concatenated with nothing between them. The comma stands for the output field separator, which is a space by default.

printf works much like printf in C and can format strings. When the output is complex, printf is easier to use and the code is easier to understand.
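A small sketch of the difference between concatenation, the comma, and printf. The input "a b" and the separator "-" are assumptions chosen for the demo.

```shell
# No comma: the two fields are concatenated directly.
concat=$(echo 'a b' | awk '{print $1 $2}')

# Comma: the fields are joined with the output field separator (a space by default).
comma=$(echo 'a b' | awk '{print $1, $2}')

# Changing OFS changes what the comma inserts.
custom=$(echo 'a b' | awk 'BEGIN {OFS="-"} {print $1, $2}')

# printf gives explicit control; %-5s left-justifies in a 5-character column.
formatted=$(echo 'a b' | awk '{printf("%-5s|%s\n", $1, $2)}')

printf '%s\n%s\n%s\n%s\n' "$concat" "$comma" "$custom" "$formatted"
```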

Awk programming

Variables and Assignment

In addition to the built-in variables, you can define your own variables in awk.

The following counts the number of accounts in /etc/passwd


awk '{count++;print $0;} END{print "user count is ", count}' /etc/passwd
root:x:0:0:root:/root:/bin/bash
......
user count is 40

count is a user-defined variable. The earlier action {} blocks contained only a single print, but print is just one statement, and an action {} can hold several statements separated by semicolons (;).

count is not initialized here. Although it defaults to 0, it is better practice to initialize it to 0 explicitly:


awk 'BEGIN {count=0;print "[start]user count is ", count} {count=count+1;print $0;} END{print "[end]user count is ", count}' /etc/passwd
[start]user count is 0
root:x:0:0:root:/root:/bin/bash
...
[end]user count is 40 

Count the number of bytes occupied by the files in a directory


ls -l |awk 'BEGIN {size=0;} {size=size+$5;} END{print "[end]size is ", size}'
[end]size is 8657198

To display the size in MB:


ls -l |awk 'BEGIN {size=0;} {size=size+$5;} END{print "[end]size is ", size/1024/1024,"M"}' 
[end]size is 8.25889 M

Note that the total does not include the contents of subdirectories.

Conditional statement

The conditional statements in awk are borrowed from C, and take the following forms:


if (expression) {
  statement;
  statement;
  ... ...
}
if (expression) {
  statement;
} else {
  statement2;
}
if (expression) {
  statement1;
} else if (expression1) {
  statement2;
} else {
  statement3;
}
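A minimal runnable sketch of the if / else if / else form. The three sizes and the labels are assumptions invented for the demo; 4096 is used because the text treats it as the typical size of a directory entry.

```shell
# Classify hypothetical file sizes with an if / else if / else chain.
sizes=$(printf '100\n4096\n5000\n' | awk '{
  if ($1 == 4096) {
    print $1, "directory-sized"
  } else if ($1 > 4096) {
    print $1, "large"
  } else {
    print $1, "small"
  }
}')

printf '%s\n' "$sizes"
```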

Count the number of bytes occupied by the files in a directory, skipping entries whose size is 4096 (usually directories):


ls -l |awk 'BEGIN {size=0;print "[start]size is ", size} {if($5!=4096){size=size+$5;}} END{print "[end]size is ", size/1024/1024,"M"}' 
[end]size is 8.22339 M 

Loop statement

The loop statements in awk are also borrowed from C: while, do/while, for, break and continue are supported, and these keywords have the same semantics as in C.
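A short sketch of for, while with break, and do/while. The numbers used are assumptions chosen so the results are easy to check by hand.

```shell
# for: sum the integers 1 to 5.
sum=$(awk 'BEGIN {
  total = 0
  for (i = 1; i <= 5; i++)
    total += i
  print total
}')

# while + break: find the first field greater than 10.
first_big=$(echo '3 8 12 20' | awk '{
  i = 1
  while (i <= NF) {
    if ($i > 10)
      break
    i++
  }
  print $i
}')

# do/while: the body runs at least once; halve 40 until it is no longer above 5.
steps=$(awk 'BEGIN {
  n = 40
  count = 0
  do {
    n = n / 2
    count++
  } while (n > 5)
  print count
}')

printf '%s %s %s\n' "$sum" "$first_big" "$steps"
```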

Arrays

Because the subscript of an array in awk can be a number or a string, the subscript is usually called a key. Keys and values are stored in an internal hash table. Because a hash table is not stored in order, when you display the contents of an array you may find that they do not appear in the order you expect. Like variables, arrays are created automatically when they are first used, and awk determines automatically whether they hold numbers or strings. In general, arrays in awk are used to gather information from records, to calculate totals, to count words, to track how many times a pattern has been matched, and so on.

Show the accounts in /etc/passwd


awk -F ':' 'BEGIN {count=0;} {name[count] = $1;count++;}; END{for (i = 0; i < NR; i++) print i, name[i]}' /etc/passwd
0 root
1 daemon
2 bin
3 sys
4 sync
5 games
...... 

Here we use the for loop to traverse the array
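Array keys do not have to be numbers. The sketch below counts how many accounts use each shell, with the shell path as the key; the three input records are assumptions made up for the demo. Since for (key in array) visits keys in no guaranteed order, the output is piped through sort.

```shell
# Count records per shell using the shell path as a string key.
counts=$(printf 'root:/bin/bash\ndaemon:/bin/sh\nbin:/bin/sh\n' |
  awk -F ':' '
    { shells[$2]++ }                                    # element is created on first use
    END { for (s in shells) print s, shells[s] }        # iteration order is unspecified
  ' | sort)

printf '%s\n' "$counts"
```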

Summary

The above is a detailed explanation of sed and awk in Linux. I hope it helps you.