Using shell commands to count logs



Shell commands make it easy to count and analyze logs. When a service misbehaves you need to dig into its logs, so log-counting skills are well worth mastering.

Suppose you have a log file, access.log, with the following contents. We will use it as the running example.

date=2017-09-23 13:32:50 | ip= | method=GET | url=/api/foo/bar?params=something | status=200 | time=9.703 | bytes=129 | referrer="-" | user-agent="Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.7 (KHTML, like Gecko) Chrome/16.0.912.63 Safari/535.7" | cookie="-"
date=2017-09-23 00:00:00 | ip= | method=HEAD | url=/api/foo/healthcheck | status=200 | time=0.337 | bytes=10 | referrer="-" | user-agent="-" | cookie="-"
date=2017-09-23 13:32:50 | ip= | method=GET | url=/api/foo/bar?params=anything | status=200 | time=8.829 | bytes=466 | referrer="-" | user-agent="GuzzleHttp/6.2.0 curl/7.19.7 PHP/7.0.15" | cookie="-"
date=2017-09-23 13:32:50 | ip= | method=GET | url=/api/foo/bar?params=everything | status=200 | time=9.962 | bytes=129 | referrer="-" | user-agent="Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.7 (KHTML, like Gecko) Chrome/16.0.912.63 Safari/535.7" | cookie="-"
date=2017-09-23 13:32:50 | ip= | method=GET | url=/api/foo/bar?params=nothing | status=200 | time=11.822 | bytes=121 | referrer="-" | user-agent="Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.7 (KHTML, like Gecko) Chrome/16.0.912.63 Safari/535.7" | cookie="-"

Log formats vary from service to service. The sample logs in this article use the format:

date | ip | method | url | status | time | bytes | referrer | user-agent | cookie

Note: command behavior can differ between macOS and Linux. The commands below assume a Linux system with GNU tools.

Exclude special logs

When counting logs, we may want to ignore HEAD requests, or count only GET requests. So the first step is to filter the log with the grep command; its -v option excludes lines that match the pattern.

grep GET access.log                # keep only GET requests
grep -v HEAD access.log            # exclude HEAD requests
grep -v 'HEAD\|POST' access.log    # exclude HEAD and POST requests
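To count matching lines rather than print them, grep's -c option does the job directly. A quick sketch using demo.log, a throwaway file holding a few lines in the article's format (not the real access.log):

```shell
# Write a few sample lines in the article's log format
cat > demo.log <<'EOF'
date=2017-09-23 13:32:50 | ip= | method=GET | url=/api/foo/bar?params=something | status=200 | time=9.703
date=2017-09-23 00:00:00 | ip= | method=HEAD | url=/api/foo/healthcheck | status=200 | time=0.337
date=2017-09-23 13:32:50 | ip= | method=GET | url=/api/foo/bar?params=anything | status=200 | time=8.829
EOF

# -c prints the number of matching lines instead of the lines themselves
grep -c GET demo.log     # 2 GET requests
grep -vc HEAD demo.log   # 2 non-HEAD requests
```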

View interface time consumption

We can extract the time from each line and then sort the values. awk's match function lets us apply a regular expression:

awk '{ match($0, /time=([0-9]+\.[0-9]+)/, result); print result[1]}' access.log

The awk command is used as follows:

awk 'pattern { action }' filenames

Here we use only an action, with no pattern: match($0, /time=([0-9]+\.[0-9]+)/, result); print result[1]
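For completeness, here is the pattern and action working together: the pattern selects lines, the action runs for each selected line, and an END block runs once after all input. The data below is invented purely for illustration:

```shell
# sum the second field of lines matching /a/; END prints the total once
printf 'a 1\nb 2\na 3\n' | awk '/a/ { sum += $2 } END { print sum }'   # prints 4
```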

The match function takes three arguments: the text to match ($0, i.e. the current line as processed by awk), the regular expression, and an array to receive the results. The array argument is optional; we pass one because we want to keep the captured match.

Note that the regular expression does not use \d for digits: awk uses POSIX ERE (extended regular expressions) by default, which has no \d shorthand. For details, see a comparison of the Linux shell regular expression flavors (BRE, ERE, PCRE).
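Incidentally, the three-argument form of match is a gawk extension; on other awks you can use the POSIX RSTART/RLENGTH variables that match() sets. A sketch that computes the average time this way (sample.log and its values are invented for illustration):

```shell
# two made-up lines carrying the time= field
cat > sample.log <<'EOF'
status=200 | time=9.703 | bytes=129
status=200 | time=0.337 | bytes=10
EOF

# match() sets RSTART/RLENGTH; substr skips the 5-character "time=" prefix
awk 'match($0, /time=[0-9]+\.[0-9]+/) {
       sum += substr($0, RSTART + 5, RLENGTH - 5); n++
     }
     END { if (n) printf "avg=%.3f over %d lines\n", sum / n, n }' sample.log
```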

The result array works much like a match result in JavaScript: index 0 holds the whole match and index 1 holds the first captured group, so we print result[1]. Running the command produces output like this:


Of course, a real log may contain thousands of lines per day. We want to sort them and show only the top three, using the sort command.

By default, sort orders ascending and compares strings, so "11" would sort before "8". Use -n to compare numerically and -r to sort descending, then take the first three entries:

awk '{ match($0, /time=([0-9]+\.[0-9]+)/, result); print result[1]}' access.log | sort -rn | head -3
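The string-versus-numeric difference in sort is easy to see on a couple of made-up values:

```shell
printf '11\n8\n' | sort       # string order: 11 before 8
printf '11\n8\n' | sort -n    # numeric order: 8 before 11
printf '11\n8\n' | sort -rn   # numeric, descending: 11 before 8
```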



View the most time-consuming interfaces

Usually we don't want just the timings; we also want the corresponding log details, which the command above cannot show.

By default awk splits fields on whitespace, so awk '{print $1}' prints date=2017-09-23 and $2 prints 13:32:50. The whitespace split does not line up with the fields we actually care about.

Given the log format, | is the natural field separator, letting us print exactly the values of interest. Since we want the most time-consuming requests, let's extract time, date, and url.

awk's -F option sets a custom field separator. Splitting on |, time is field 6, date is field 1, and url is field 4:

awk -F '|' '{print $6 $1 $4}' access.log

The result is as follows:

 time=9.703 date=2017-09-23 13:32:50 url=/api/foo/bar?params=something
 time=0.337 date=2017-09-23 00:00:00 url=/api/foo/healthcheck
 time=8.829 date=2017-09-23 13:32:50 url=/api/foo/bar?params=anything
 time=9.962 date=2017-09-23 13:32:50 url=/api/foo/bar?params=everything
 time=11.822 date=2017-09-23 13:32:50 url=/api/foo/bar?params=nothing

We want to sort by time, and sort can sort by column, where columns are separated by spaces. Right now the first column is time=xxx, which will not sort numerically, so we need to strip the time= prefix. Since the time is already in the first column, we can simply split on time= with a second awk:

awk -F '|' '{print $6 $1 $4}' access.log | awk -F 'time=' '{print $2}'


9.703 date=2017-09-23 13:32:50 url=/api/foo/bar?params=something
0.337 date=2017-09-23 00:00:00 url=/api/foo/healthcheck
8.829 date=2017-09-23 13:32:50 url=/api/foo/bar?params=anything
9.962 date=2017-09-23 13:32:50 url=/api/foo/bar?params=everything
11.822 date=2017-09-23 13:32:50 url=/api/foo/bar?params=nothing

sort's -k option selects the column to sort by, here the first. Combined with the numeric reverse sort from before, we can print the most time-consuming requests:

awk -F '|' '{print $6 $1 $4}' access.log | awk -F 'time=' '{print $2}' | sort -k1nr | head -3


11.822 date=2017-09-23 13:32:50 url=/api/foo/bar?params=nothing
9.962 date=2017-09-23 13:32:50 url=/api/foo/bar?params=everything
9.703 date=2017-09-23 13:32:50 url=/api/foo/bar?params=something
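The two awk invocations can also be collapsed into one by deleting the time= prefix with sub(). This is an equivalent sketch rather than the approach above; logs.txt holds a couple of invented lines in the same format:

```shell
cat > logs.txt <<'EOF'
date=2017-09-23 13:32:50 | ip= | method=GET | url=/api/foo/bar?params=something | status=200 | time=9.703 | bytes=129
date=2017-09-23 13:32:50 | ip= | method=GET | url=/api/foo/bar?params=nothing | status=200 | time=11.822 | bytes=121
EOF

# sub() deletes the leading " time=" from field 6 in place
awk -F '|' '{ sub(/ *time=/, "", $6); print $6 $1 $4 }' logs.txt | sort -k1nr | head -3
```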

Interface with the most requests

To find which interfaces receive the most requests each day, we only need to introduce one new command: uniq.

We can already extract all URLs with grep -v HEAD access.log | awk -F '|' '{print $4}'. The uniq command collapses adjacent identical lines, and its -c option prefixes each line with the number of times it occurred.
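Because uniq only merges adjacent duplicates, unsorted input gives misleading counts. A tiny demonstration with made-up values:

```shell
printf 'a\nb\na\n' | uniq -c          # a is counted twice, separately
printf 'a\nb\na\n' | sort | uniq -c   # sorted first: a is counted once, as 2
```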

So we first sort the URLs, putting identical ones next to each other, and then count them with uniq -c:

grep -v HEAD access.log | awk -F '|' '{print $4}' | sort | uniq -c

Since the sample log is tiny, let's pretend it contains many entries; the result would look something like this:

1 url=/api/foo/bar?params=anything
19 url=/api/foo/bar?params=everything
4 url=/api/foo/bar?params=nothing
5 url=/api/foo/bar?params=something

Finally, sort by count and take the top ten:

grep -v HEAD access.log | awk -F '|' '{print $4}' | sort | uniq -c | sort -k1nr | head -10
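An alternative is a single awk pass with an associative array, replacing sort | uniq -c. This is a sketch under the same log-format assumptions; urls.log is invented test data:

```shell
cat > urls.log <<'EOF'
date=2017-09-23 13:32:50 | ip= | method=GET | url=/api/foo/bar?params=something | status=200
date=2017-09-23 00:00:00 | ip= | method=HEAD | url=/api/foo/healthcheck | status=200
date=2017-09-23 13:40:12 | ip= | method=GET | url=/api/foo/bar?params=something | status=200
date=2017-09-23 13:41:00 | ip= | method=GET | url=/api/foo/bar?params=nothing | status=200
EOF

# skip HEAD requests, count each url field, then sort by count descending
awk -F '|' '$3 !~ /HEAD/ { n[$4]++ } END { for (u in n) print n[u], u }' urls.log |
  sort -rn | head -10
```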


That's all for this article. I hope it brings some help to your study or work. If you have any questions, leave a comment and let's discuss. Thank you for supporting developpaer.