An introduction to the Graylog log analysis system


A log analysis system can collect, analyze and monitor logs and raise alarms in real time, and it can also analyze logs offline. Splunk is powerful and easy to use, but it is commercial: the free version is limited to 500 MB per day, and anything beyond that cannot be processed. The ELK stack is the most common choice, but it is heavyweight and cumbersome to configure. Graylog is open source and free, and it is simpler to configure than ELK. This article therefore builds a Graylog system with containers. It does not set up real-time log collection or alerting. It only covers passively receiving website logs offline and analyzing various log metrics.

Pulling from Docker's official registry is slow from inside China, so I switched to a domestic registry mirror. Create the file /etc/docker/daemon.json as follows:

vi /etc/docker/daemon.json
{
  "registry-mirrors": [""]
}

Put the address of your mirror inside the quotes; a NetEase mirror works as well.
After configuring it, restart Docker for the change to take effect:

# service docker restart
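Hand-editing daemon.json makes it easy to drop the enclosing braces or overwrite settings that are already in the file. As a hedged alternative, here is a minimal Python sketch that merges a mirror URL into the existing file content and always produces a complete JSON object. The function name and the example URL are my own placeholders, not part of Docker.

```python
import json


def merge_registry_mirror(existing_text: str, mirror_url: str) -> str:
    """Return daemon.json content with mirror_url added to "registry-mirrors".

    Settings already in the file are kept, and the result is always a
    complete JSON object, enclosing braces included.
    """
    config = json.loads(existing_text) if existing_text.strip() else {}
    mirrors = config.setdefault("registry-mirrors", [])
    if mirror_url not in mirrors:  # re-running does not add duplicates
        mirrors.append(mirror_url)
    return json.dumps(config, indent=2) + "\n"


# Hypothetical mirror URL; replace it with the mirror you actually use.
print(merge_registry_mirror("", "https://mirror.example.com"))
```

Write the returned text to /etc/docker/daemon.json as root, then restart Docker as above.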

Pull the following three images (MongoDB, Elasticsearch and Graylog):

docker pull mongo:3
docker pull
docker pull graylog/graylog:3.3

The containers appeared to start, but even after half a minute the web interface still would not open in the browser. The container startup logs finally showed that Elasticsearch has requirements on system parameters, so modify them as follows.

Add a line to the end of /etc/sysctl.conf:


vi /etc/security/limits.conf

*              -       nofile            102400  

After making these changes, reboot the system so that the new values take effect.
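To confirm that the new limits actually apply after the reboot, a small Python sketch can read the current process's soft limits with the standard resource module (Linux/Unix only) and compare them with the minimums Elasticsearch asks for in the error log quoted further down (65535 open files, 4096 threads). The helper names are hypothetical. limits.conf applies to login sessions, so run it from a fresh login shell. The container's own limits come from the --ulimit options below, so this only checks the host side.

```python
import resource

# Minimums quoted by the Elasticsearch bootstrap checks in this article.
MIN_NOFILE = 65535  # max open file descriptors
MIN_NPROC = 4096    # max number of threads (processes) per user


def meets_minimum(value: int, minimum: int) -> bool:
    """RLIM_INFINITY means no limit at all, which always satisfies the check."""
    return value == resource.RLIM_INFINITY or value >= minimum


def check_limits() -> dict:
    """Map each limit name to (current soft limit, whether it is high enough)."""
    nofile, _ = resource.getrlimit(resource.RLIMIT_NOFILE)
    nproc, _ = resource.getrlimit(resource.RLIMIT_NPROC)
    return {
        "nofile": (nofile, meets_minimum(nofile, MIN_NOFILE)),
        "nproc": (nproc, meets_minimum(nproc, MIN_NPROC)),
    }


print(check_limits())
```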

In addition, pass these options to docker when starting the Elasticsearch container:

--ulimit nofile=65536:65536 --ulimit nproc=4096:4096

These options make the environment inside the container meet Elasticsearch's requirements. Without them, the container exits abnormally, and "docker ps -a" shows it as Exited (78) or Exited (1).
The most reliable way to see why a container failed to start is "docker logs -f <container ID>". Let's try starting it without the --ulimit options:

# docker ps
CONTAINER ID        IMAGE                                                      COMMAND                CREATED             STATUS              PORTS                                            NAMES
7e4a811093d9   "/usr/local/bin/dock   6 seconds ago       Up 4 seconds>9200/tcp,>9300/tcp   elasticsearch

Use the container ID above to view its startup log:

# docker logs -f 7e4a811093d9
It prints the following:
[1]: max file descriptors [4096] for elasticsearch process is too low, increase to at least [65535]
[2]: max number of threads [3869] for user [elasticsearch] is too low, increase to at least [4096]
[2020-08-27T06:10:25,888][INFO ][o.e.n.Node               ] [WG6mVz4] stopping ...
[2020-08-27T06:10:25,903][INFO ][o.e.n.Node               ] [WG6mVz4] stopped
[2020-08-27T06:10:25,903][INFO ][o.e.n.Node               ] [WG6mVz4] closing ...
[2020-08-27T06:10:25,928][INFO ][o.e.n.Node               ] [WG6mVz4] closed

These two "too low" messages are why the container exits.
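When a container keeps exiting, these lines are easy to miss among the INFO output. The hypothetical Python helper below (not an Elasticsearch tool) pulls the "too low" bootstrap-check lines out of the text of docker logs <container ID> and reports the current and required values. Its regular expression is written only against the two message shapes quoted above.

```python
import re

# Matches bootstrap-check lines of the form quoted above, e.g.
# [1]: max file descriptors [4096] for elasticsearch process is too low, increase to at least [65535]
CHECK_RE = re.compile(
    r"^\[\d+\]: (?P<what>.+?) \[(?P<current>\d+)\].*?"
    r"too low, increase to at least \[(?P<required>\d+)\]"
)


def failed_checks(log_text: str) -> list[tuple[str, int, int]]:
    """Return (setting, current value, required value) for each 'too low' line."""
    results = []
    for line in log_text.splitlines():
        m = CHECK_RE.match(line.strip())
        if m:
            results.append((m["what"], int(m["current"]), int(m["required"])))
    return results


sample = (
    "[1]: max file descriptors [4096] for elasticsearch process is too low, "
    "increase to at least [65535]\n"
    "[2]: max number of threads [3869] for user [elasticsearch] is too low, "
    "increase to at least [4096]\n"
    "[2020-08-27T06:10:25,888][INFO ][o.e.n.Node               ] [WG6mVz4] stopping ...\n"
)
for what, current, required in failed_checks(sample):
    print(f"{what}: {current} -> at least {required}")
```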
The correct start commands for the three containers are as follows:

docker run --name mongo -d mongo:3

docker run --name elasticsearch \
    -e "" \
    -e "ES_JAVA_OPTS=-Xms512m -Xmx512m" \
    --ulimit nofile=65536:65536 --ulimit nproc=4096:4096 \
    -p 9200:9200 -p 9300:9300 \
    -d <Elasticsearch image pulled above>

docker run --name graylog --link mongo --link elasticsearch \
    -p 9000:9000 -p 12201:12201 -p 1514:1514 -p 5555:5555 \
    -v /home/graylog/geodata:/usr/share/graylog/log \
    -d graylog/graylog:3.3

Starting MongoDB needs no explanation.
Elasticsearch must be started with the --ulimit options, otherwise it exits right after startup. -p 9200:9200 publishes its HTTP management port, which you will need later to delete data.
For Graylog, port 9000 serves the web interface, and 5555 is an open TCP port used to passively receive log data.

-v /home/graylog/geodata:/usr/share/graylog/log

This mounts the host directory /home/graylog/geodata onto /usr/share/graylog/log inside the container. I set it up this way so that Graylog can read the GeoLite2-City.mmdb geolocation database, which maps IP addresses to geographic locations. I first tried to copy the file into the container, but got an error:

# docker cp ./GeoLite2-City.mmdb 151960c2f33b:/usr/share/graylog/data/
Error: Path not specified

Reportedly the fix is to upgrade Docker 1.7 to a newer version. I didn't want to upgrade, so I used a bind mount instead. If you don't need to mount any files, the -v line can be removed.
Before that, I entered the container with "docker exec -it <graylog container ID> bash" and saw that /usr/share/graylog/log was empty, so I chose it as the mount point.
The geographic data is used to show which cities and countries the visitors' IP addresses come from, and to plot them on a world map. It has to be downloaded from … and the catch is that you need to register there. I downloaded GeoLite2-City_20200825.tar.gz, extracted it, and uploaded GeoLite2-City.mmdb to /home/graylog/geodata on the Linux host. The file has to be mounted into the container for Graylog to use it.
If you don't want to register, download it from the link below:
Extraction code: bsmm

Do not set GRAYLOG_HTTP_EXTERNAL_URI to a localhost address: if you do, the web page comes up blank when opened from outside the Linux host. Set it to the Linux host's IP address so that an external browser can open it.

Graylog depends on MongoDB and Elasticsearch, so start it only after the other two have started successfully.
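If you script the startup, one way to respect this order is to wait until the dependencies accept TCP connections before running the Graylog command. Below is a minimal Python sketch with a hypothetical helper name. It assumes Elasticsearch's port 9200 is published on the host. MongoDB is not published with -p in the commands above, so checking it from the host would need the container's own IP address. An open port only means the service is listening, not that Elasticsearch is fully ready.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0,
                  interval: float = 1.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True  # something is listening on host:port
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, wait_for_port("localhost", 9200, timeout=120) returns True once Elasticsearch is listening, and you can then start the Graylog container.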

Next, I'll demonstrate how to configure Graylog to analyze a website's logs in the standard Apache format. The general steps are:
Configure an input > configure an extractor for the input > configure the geolocation database > send the log in manually > analyze the log.
Open port 9000 of the Linux host in a browser (http://<host IP>:9000/) and log in to Graylog; the user name and password are both admin.

Click the drop-down arrow to the right of "Select input" to open the list, and choose "Raw/Plaintext TCP".

Then click "Launch new input". For Node, select the only option in the drop-down. Give the input any title, and set the port to 5555, which must match the -p 5555:5555 in our docker run command.
Nothing else needs to be filled in. Click Save at the bottom; the input starts automatically and appears under "Local inputs". At this point, a command such as "cat access.log | nc localhost 5555" can already send log data to port 5555, and the data enters Graylog and can be searched. But this kind of search is only basic string matching and is of little value. To analyze log metrics and generate charts, the system must be able to parse each field of every log line; for example, clientip is one field and request is another. Parsing fields requires configuring an extractor for the input, so click "Manage extractors".

Choose "Import extractors" and paste the following into the Extractors JSON box:

{
  "extractors": [
    {
      "title": "commonapache",
      "extractor_type": "grok",
      "converters": [],
      "order": 0,
      "cursor_strategy": "copy",
      "source_field": "message",
      "target_field": "",
      "extractor_config": {
        "grok_pattern": "%{COMMONAPACHELOG}"
      },
      "condition_type": "none",
      "condition_value": ""
    }
  ],
  "version": "3.3.5"
}


Finally, click "Add extractors to input"; a success message is displayed.
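To picture what the imported extractor does, here is a rough Python equivalent of %{COMMONAPACHELOG} written as a plain regular expression. It is an approximation for illustration, not Graylog's implementation. The field names follow the grok pattern, and the sample line is the Common Log Format example from the Apache HTTP Server documentation.

```python
import re
from typing import Optional

# Simplified stand-in for grok's %{COMMONAPACHELOG}. Field names follow the
# grok pattern; the real sub-patterns (IPORHOST, HTTPDATE, ...) are stricter.
COMMON_APACHE_RE = re.compile(
    r'^(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?:(?P<verb>\w+) (?P<request>\S+)(?: HTTP/(?P<httpversion>[\d.]+))?'
    r'|(?P<rawrequest>[^"]*))" '
    r'(?P<response>\d{3}) (?P<bytes>\d+|-)$'
)


def parse_common_apache(line: str) -> Optional[dict]:
    """Split one Common Log Format line into named fields, or return None."""
    m = COMMON_APACHE_RE.match(line)
    if m is None:
        return None
    # Drop groups that did not take part in the match (e.g. rawrequest).
    return {k: v for k, v in m.groupdict().items() if v is not None}


line = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
        '"GET /apache_pb.gif HTTP/1.0" 200 2326')
print(parse_common_apache(line))
```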
The log fields can now be parsed correctly. To analyze geographic information as well, we must also configure the geolocation database, i.e. the MMDB file downloaded above.
Go to System > Configurations. At the bottom right there is a Geo-Location Processor section; click the Update button under it to change its settings.


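Enable the processor, point the database path at the mounted file (/usr/share/graylog/log/GeoLite2-City.mmdb with the -v option above), and click Save.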

In the Message Processors Configuration table at the top of the Configurations page, the GeoIP Resolver must be placed at the bottom. Click Update below that table,
then hold down the GeoIP Resolver and drag it to the last position.

Click Save when finished. The GeoIP Resolver is now at the bottom of the Message Processors Configuration table.
Now send the log to the input manually. I put access2020-07-06.log in a directory on the Linux host and ran the following in that directory:

# cat access2020-07-06.log | head -n 10000 | nc localhost 5555

The command sends the first 10,000 lines of the log to port 5555 on the local machine. The Graylog input also listens on port 5555, and the docker run command for Graylog publishes it with -p 5555:5555. As long as these three places agree, the command works. nc, ncat and netcat all work equally well here.
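Where nc is not installed, the same job can be done with a few lines of Python. This is a minimal sketch of what head -n 10000 | nc localhost 5555 does: open one TCP connection and write each line followed by a newline. A Raw/Plaintext TCP input with its default framing treats each line as one message. The function name is hypothetical, and the sketch assumes the log is UTF-8 text.

```python
import itertools
import socket
from typing import Iterable, Optional


def send_lines(host: str, port: int, lines: Iterable[str],
               limit: Optional[int] = None) -> int:
    """Send up to `limit` lines (all if None) over one TCP connection.

    Rough equivalent of `head -n LIMIT file | nc HOST PORT`; returns the
    number of lines sent.
    """
    sent = 0
    with socket.create_connection((host, port)) as conn:
        for line in itertools.islice(lines, limit):
            # Normalise the line ending so every message ends with "\n".
            conn.sendall(line.rstrip("\r\n").encode("utf-8") + b"\n")
            sent += 1
    return sent
```

For example, pass an open file object for access2020-07-06.log as lines, with host "localhost", port 5555 and limit 10000.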

When the import is complete, select Search in Graylog's top menu.

The button at the top sets the query time range. This range refers to when the logs were imported, not to the request times recorded in the logs. To search everything, select "Search in all messages".
The magnifying-glass button below runs the search. You can add search keywords or restrict the search to values of a particular field; the search syntax is rich and convenient. After you click search, non-matching log records are filtered out.
The All Messages panel below lists the original log entries that match.
To count which cities the visits come from, click the X (Fields) button at the bottom of the left sidebar and select clientip_city_name > Show top values.
Click the gray area on the right to return to the main view; the source cities now appear in the list.
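"Show top values" is essentially a frequency count over one field. Here is a hedged Python sketch of the idea on already-parsed records, represented as dicts keyed by field name. The records are made-up sample data, and Graylog itself computes this inside Elasticsearch rather than like this.

```python
from collections import Counter


def top_values(records, field, n=5):
    """Return the n most frequent values of `field`, most common first.

    Records that lack the field are skipped.
    """
    counts = Counter(r[field] for r in records if field in r)
    return counts.most_common(n)


# Made-up records, shaped like messages after extraction and GeoIP lookup.
records = [
    {"clientip": "203.0.113.5", "clientip_city_name": "Beijing"},
    {"clientip": "198.51.100.7", "clientip_city_name": "Shanghai"},
    {"clientip": "203.0.113.9", "clientip_city_name": "Beijing"},
    {"clientip": "192.168.1.10"},  # no city resolved
]
print(top_values(records, "clientip_city_name"))
```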
N/A means that for a large number of requests, the city of the client IP could not be identified. This may be because our geolocation database is incomplete or outdated, or because some addresses are internal ones (for example, addresses beginning with 192 or 172) whose location cannot be identified. I won't go into that here. To remove the N/A data and look only at the distribution of identifiable cities, hover over the right side of the N/A entry until a drop-down arrow appears, click it, and select "Exclude from results". The N/A data is removed, and this filter is added automatically to the search bar above.
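To check whether the N/A entries are mostly internal addresses, Python's standard ipaddress module can flag addresses that are not globally routable and therefore cannot appear in a public GeoIP database. This is a hedged sketch with a hypothetical helper name. It says nothing about public addresses that are simply missing from an old database.

```python
import ipaddress


def is_unlocatable(ip: str) -> bool:
    """True for non-global addresses (private ranges such as 192.168.0.0/16
    and 172.16.0.0/12, loopback, link-local, ...), which a public GeoIP
    database cannot place on a map."""
    return not ipaddress.ip_address(ip).is_global


ips = ["192.168.1.10", "172.20.3.4", "8.8.8.8", "127.0.0.1"]
print([ip for ip in ips if is_unlocatable(ip)])
```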

Note that the statistics now exclude the N/A data, so they cover a smaller range than the whole log. This is very useful in practice, because we often need to compute metrics over a particular subset. Now let's look at a chart of the source cities: click the drop-down arrow in the upper right corner and select Edit.
Click the Data Table drop-down menu on the left. It lists bar charts, pie charts, scatter plots and so on; whichever you select is drawn on the right.
To show the distribution of visitors on a world map, select clientip_geolocation > Show top values from the field menu.
The table that appears counts visits per latitude/longitude coordinate. As with the chart above, open the Data Table drop-down menu; World Map is at the bottom.
Select it to display the results on a map, then zoom in and adjust the position as needed.

Other metrics, such as the distribution of requests or of access times, are also available in the field list; follow the same steps as above if you need them. Geolocation data works together with the standard Apache log extractor, but I'm not sure whether it works with every custom extractor.

Bonus section

Configuring an extractor for an input: the section above handles logs in the standard Apache format. What if the logs are in nginx format or a custom format?
Graylog lets you build an extractor from a sample log. Suppose we configured an input without an extractor and imported logs directly; then configure the extractor as follows.
On the Inputs page, select "Manage extractors".
"Load Message" displays one of the logs just imported. Click "Select extractor type" next to the message field to configure an extractor for message, i.e. the whole log line, and choose "Grok pattern" from the drop-down menu. If the logs were imported a long time ago, Load Message cannot display them. Instead, use the "Message ID" tab next to it, which looks up a log by its message ID and index. Both values are shown when you click a log entry in All Messages at the bottom of the search page: a message ID looks like 4b282600-e8d2-11ea-b962-0242ac110008, and an index looks like graylog_0.
In the extractor configuration you have to write the pattern yourself, either by combining existing patterns listed on the right or by defining your own. This requires familiarity with grok and regular-expression syntax. The pattern I used parses native nginx logs; I also found it online. After filling it in, click "Try against example". If parsing succeeds, the value of each field is listed in the table below; if it fails, an error is shown, and you need to revise the pattern until no error is reported.
My pattern is as follows

^%{IPORHOST:clientip} (?:-|%{USER:ident}) (?:-|%{USER:auth}) \[%{HTTPDATE:timestamp}\] \"(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|-)\" %{NUMBER:response} (?:-|%{NUMBER:bytes})([\s\S]{1})\"(?<http_referer>\S+)\"([\s\S]{1})\"(?<http_user_agent>(\S+\s+)*\S+)\".*%{BASE16FLOAT:request_time}
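Because debugging inside Graylog is slow, it can help to test an equivalent regular expression offline first. Below is a hedged Python translation of the pattern above. It is an approximation: the ([\s\S]{1}) separators become plain spaces, the user agent is matched as any text without a double quote, and grok sub-patterns such as HTTPDATE and BASE16FLOAT are loosened. The sample line is made up, assuming nginx's combined format with $request_time appended at the end.

```python
import re
from typing import Optional

# Approximate Python translation of the nginx grok pattern above.
NGINX_RE = re.compile(
    r'^(?P<clientip>\S+) (?:-|(?P<ident>\S+)) (?:-|(?P<auth>\S+)) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?:(?P<verb>\w+) (?P<request>\S+)(?: HTTP/(?P<httpversion>[\d.]+))?|-)" '
    r'(?P<response>\d+) (?:-|(?P<bytes>\d+)) '
    r'"(?P<http_referer>\S+)" "(?P<http_user_agent>[^"]*)"'
    r'.*?(?P<request_time>\d+(?:\.\d+)?)$'
)


def parse_nginx(line: str) -> Optional[dict]:
    """Return the named fields of one log line, or None if it does not match."""
    m = NGINX_RE.match(line)
    if m is None:
        return None
    # Fields written as "-" (ident, auth, bytes) are simply absent.
    return {k: v for k, v in m.groupdict().items() if v is not None}


# Hypothetical sample line (combined format plus $request_time).
sample = ('203.0.113.5 - - [27/Aug/2020:14:10:25 +0800] "GET /index.html HTTP/1.1" '
          '200 612 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)" 0.005')
print(parse_nginx(sample))
```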

If parsing succeeds, give the extractor any title and click "Create extractor" at the bottom.


The extractor has now been added to the input. The Actions menu above contains "Export extractors"; clicking it displays the extractor you just configured in JSON format.
Copy the JSON and save it locally. The next time you encounter native nginx logs, you can load it directly with "Import extractors", without writing and testing the grok pattern again.
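If you already have a working pattern, you could also generate the import JSON instead of building the extractor in the UI. This Python sketch reuses the structure of the commonapache extractor shown earlier (exported from Graylog 3.3.5) and lets json.dumps escape the backslashes and quotes in the pattern, which is easy to get wrong by hand. It assumes Graylog accepts the same layout for any grok pattern; the function name is hypothetical.

```python
import json


def grok_extractor_json(title: str, grok_pattern: str,
                        source_field: str = "message") -> str:
    """Build an 'Import extractors' document for a single grok extractor.

    The layout mirrors the commonapache example in this article
    (Graylog 3.3.5); other Graylog versions may expect different keys.
    """
    doc = {
        "extractors": [{
            "title": title,
            "extractor_type": "grok",
            "converters": [],
            "order": 0,
            "cursor_strategy": "copy",
            "source_field": source_field,
            "target_field": "",
            "extractor_config": {"grok_pattern": grok_pattern},
            "condition_type": "none",
            "condition_value": "",
        }],
        "version": "3.3.5",
    }
    # json.dumps escapes backslashes and double quotes inside the pattern.
    return json.dumps(doc, indent=2)


print(grok_extractor_json("nginx", r'\[%{HTTPDATE:timestamp}\] "%{WORD:verb}'))
```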
Note that whether a log record is split into fields depends on whether the extractor was already configured when the record entered the system. An extractor added later does not parse earlier logs.
If, after the extractor is configured, only a small part of the logs in that format enter the system, don't look for other causes: the pattern is wrong. Even though it passed the test and was saved, it needs to be revised. With a correct pattern, all logs in that format should enter the system.
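When only some lines get through, it is quicker to find the lines the pattern rejects than to keep re-testing in the UI. This hedged Python sketch counts how many lines of a sample a regular expression matches and shows the first failures. It uses Python regex rather than grok, so the pattern has to be translated first. The example patterns and log lines are hypothetical. The failure they show is one common cause: a "-" in the bytes field for responses with no body.

```python
import re
from typing import Iterable


def match_report(pattern: "re.Pattern", lines: Iterable[str], show: int = 3):
    """Return (matched, total, first `show` lines the pattern rejects)."""
    matched = total = 0
    failures = []
    for line in lines:
        line = line.rstrip("\n")
        total += 1
        if pattern.match(line):
            matched += 1
        elif len(failures) < show:
            failures.append(line)
    return matched, total, failures


# Hypothetical example: the bytes field may be "-" (e.g. a 304 with no body),
# which a pattern that insists on digits silently rejects.
strict = re.compile(r'^\S+ \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \d+$')
fixed = re.compile(r'^\S+ \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} (?:\d+|-)$')
lines = [
    '198.51.100.7 - - [06/Jul/2020:10:00:00 +0800] "GET / HTTP/1.1" 200 512',
    '198.51.100.7 - - [06/Jul/2020:10:00:01 +0800] "GET /logo.png HTTP/1.1" 304 -',
]
print(match_report(strict, lines))
print(match_report(fixed, lines))
```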

For some log formats, getting the grok pattern right takes a lot of debugging, but debugging inside Graylog is inconvenient, and the official Grok Debugger website cannot be reached from China. The tool below lets you paste a log into a web page and debug the pattern there:
Extraction code: t6q6
In a Windows command prompt, run: java -jar grokconstructor-0.1.0-SNAPSHOT-standalone.jar
Then open it in a browser, click Matcher, paste the log into the upper box and the grok pattern into the lower box.
Click Go. If parsing succeeds, the value of each field is shown in a table.
"Random example" shows examples of common log formats together with matching patterns.

To reconfigure Graylog and import data from scratch, first run

docker stop $(docker ps -a -q)

to stop all containers, and then

docker rm $(docker ps -a -q)

to delete all containers. Then start the three containers again in order with docker run. The new containers are completely fresh, and the previous configuration and data are lost.
Managing the containers this way is tedious; you can install docker-compose instead:

curl -L`uname -s`-`uname -m` -o /usr/local/bin/docker-compose

Then make it executable with chmod +x /usr/local/bin/docker-compose and write the startup parameters into a docker-compose.yml file; the containers are much easier to manage that way.
