Understanding ELK is not particularly difficult


This article introduces the components, principles and practical use of ELK. The ELK version used throughout is 7.7.0.

ELK introduction

ELK is an acronym for Elasticsearch, Logstash and Kibana. Filebeat (one of the Beats) was added later as a lightweight replacement for the data collection role of Logstash, and the whole suite is now also marketed as the Elastic Stack.

Filebeat is a lightweight shipper for forwarding and centralizing log data. It monitors the log files or locations you specify, collects log events, and forwards them to Elasticsearch or Logstash for indexing. It works as follows: when Filebeat starts, it starts one or more inputs that look in the locations you specified for log data. For each log it finds, Filebeat starts a harvester. Each harvester reads a single log for new content and sends the new log data to libbeat, which aggregates the events and sends the aggregated data to the output configured for Filebeat.

Logstash is a free and open server-side data processing pipeline that collects data from multiple sources, transforms it, and sends it to your chosen "stash". It can ingest, transform and ship data dynamically regardless of format or complexity: for example, it can use grok to derive structure from unstructured data, decode geographic coordinates from IP addresses, anonymize or exclude sensitive fields, and simplify overall processing.

Elasticsearch is the distributed search and analytics engine at the core of the Elastic Stack. It is built on Lucene, is distributed, and is accessed through a RESTful API. Elasticsearch provides near real-time search and analytics for all types of data: whether you have structured or unstructured text, numerical data or geospatial data, it can store and index it efficiently in a way that supports fast search.
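To make "index it in a way that supports fast search" concrete, here is a toy inverted index in Python, the kind of data structure Lucene-based engines rely on for full-text lookups. It is only a sketch: the function names, the whitespace tokenizer and the sample documents are illustrative assumptions, not Elasticsearch APIs.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every term of the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical sample documents standing in for indexed log lines.
docs = {
    1: "GET /index.html 200",
    2: "GET /missing.html 404",
    3: "POST /login 200",
}
index = build_inverted_index(docs)
print(sorted(search(index, "get 200")))
```

A query only touches the entries for its own terms instead of scanning every document, which is why lookups stay fast as the data grows.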

Kibana is an open source analytics and visualization platform for Elasticsearch, used to search, view and interact with data stored in Elasticsearch indices. With Kibana you can perform advanced data analysis and present the results in a variety of charts. It gives Logstash and Elasticsearch a log-analysis-friendly web interface for aggregating, analyzing and searching important log data, and it makes large volumes of data easier to understand. It is easy to operate: the browser-based interface lets you quickly build dashboards that display Elasticsearch query results in real time.

Why use ELK

Logs mainly include system logs, application logs and security logs. Operations staff and developers use logs to learn about a server's software and hardware, and to find configuration errors and their causes. Analyzing logs regularly reveals the load, performance and security of the server, so that problems can be corrected in time.

The logs of a single machine can usually be analyzed with tools such as grep and awk. When logs are scattered across different machines, however, this stops working: if you manage dozens or hundreds of servers, logging in to each machine in turn is cumbersome and inefficient. Centralized log management becomes necessary, for example using the open source syslog to collect and aggregate the logs of all servers. Once logs are centralized, statistics and retrieval become the harder problem. Linux commands such as grep, awk and wc can do retrieval and counting, but they fall short for more demanding queries, sorting and statistics, and for large numbers of machines.

Large systems are generally deployed as distributed architectures, with different service modules on different servers. When a problem occurs, you need to locate the specific server and service module from the key information the problem exposes. A centralized log system makes locating problems more efficient.

Basic characteristics of a complete log system
  • Collection: it can collect log data from multiple sources
  • Transmission: it can parse, filter and reliably transmit log data to the storage system
  • Storage: it stores the log data
  • Analysis: it supports analysis through a UI
  • Alerting: it provides error reporting and monitoring mechanisms

ELK architecture analysis

Beats + Elasticsearch + Kibana mode


In this architecture, ELK consists of Beats (for log analysis we usually use Filebeat) + Elasticsearch + Kibana. It is relatively simple and entry-level. Filebeat can also do simple parsing and indexing of logs through its modules, which come with prebuilt Kibana dashboards.

This architecture suits simple log data and is generally fine for experimenting. For production environments, adding Logstash is recommended.

Beats + Logstash + Elasticsearch + Kibana mode

This architecture adds Logstash to the previous one. The benefits of introducing Logstash are as follows:

  • Logstash has an adaptive disk-based buffering system that absorbs incoming throughput and reduces back pressure
  • It can ingest from other data sources, such as databases, S3 or message queues
  • It can send data to multiple destinations, such as S3, HDFS or a file
  • It can use conditional dataflow logic to compose more complex processing pipelines

Advantages of combining Filebeat with Logstash:

  • Horizontal scalability, high availability and variable load handling: Filebeat can load-balance across multiple Logstash nodes, and running several Logstash nodes makes Logstash highly available.
  • Message durability with at-least-once delivery guarantees: when Filebeat or Winlogbeat is used for log collection, at-least-once delivery is guaranteed. Both communication protocols, from Filebeat or Winlogbeat to Logstash and from Logstash to Elasticsearch, are synchronous and support acknowledgements. Logstash persistent queues protect against node failures. For disk-level resiliency in Logstash, it is important to ensure disk redundancy.
  • End-to-end secure transport with authentication and wire encryption: traffic from Beats to Logstash and from Logstash to Elasticsearch can be encrypted. When communicating with Elasticsearch there are many security options, including basic authentication, TLS, PKI, LDAP, AD and other custom realms.
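The at-least-once guarantee in the list above follows from acknowledgements plus retries. The Python sketch below illustrates the idea only; `send_at_least_once` and `FlakyReceiver` are hypothetical names, not part of Beats or Logstash. The sender repeats an event until it sees an acknowledgement, so an event is never lost, but it can be delivered twice when an acknowledgement goes missing.

```python
def send_at_least_once(event, transport, max_attempts=5):
    """Resend `event` until `transport` acknowledges it; return the attempts used."""
    for attempt in range(1, max_attempts + 1):
        if transport(event):
            return attempt
    raise RuntimeError("no acknowledgement after %d attempts" % max_attempts)


class FlakyReceiver:
    """Hypothetical receiver that stores every event but loses its first ack."""

    def __init__(self):
        self.received = []
        self._calls = 0

    def __call__(self, event):
        self._calls += 1
        self.received.append(event)
        # The event was stored, but the first acknowledgement never arrives.
        return self._calls > 1


receiver = FlakyReceiver()
attempts = send_at_least_once("line 1", receiver)
print(attempts, receiver.received)
```

The receiver ends up with the same line twice, which is why pipelines with this guarantee usually need downstream processing that tolerates duplicates.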

Other data inputs can of course be added on top of this architecture: for example, TCP, UDP and HTTP are common ways of getting data into Logstash.


Beats + cache / message queue + Logstash + Elasticsearch + Kibana mode

Building on the above, we can add a middleware component such as Redis, Kafka or RabbitMQ between Beats and Logstash. Adding middleware has the following benefits:

  • It reduces the impact on the machines that produce the logs. These machines usually run reverse proxies or application services and are already heavily loaded, so as little work as possible should be done on them;
  • If many machines need their logs collected, having every machine write continuously to Elasticsearch inevitably puts pressure on Elasticsearch, so the data needs to be buffered. Buffering also protects the data from loss to some extent;
  • Formatting and processing of log data can be done in one place, in the indexer. Code and deployment are changed in a single location instead of modifying the configuration on many machines.
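The buffering benefit in the list above can be illustrated with a small simulation. This is a minimal sketch with made-up numbers, not a model of Kafka or Redis: the hypothetical `simulate_buffer` function tracks how many events wait in the queue when producers send a burst and the consumer drains at a fixed rate.

```python
def simulate_buffer(arrivals, drain_per_tick):
    """Return the queue depth after each tick.

    `arrivals` lists how many events the shippers send per tick and
    `drain_per_tick` is how many events the consumer can index per tick.
    """
    depth = 0
    depths = []
    for arrived in arrivals:
        depth += arrived
        depth -= min(depth, drain_per_tick)
        depths.append(depth)
    return depths


# A burst of 500 events arrives in the second tick (illustrative numbers).
print(simulate_buffer([100, 500, 50, 0, 0], drain_per_tick=200))
```

The burst is absorbed by the queue and worked off over the following ticks, so the indexing side never has to handle more than its fixed rate.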

ELK deployment

The ELK components can be downloaded from the official website: https://www.elastic.co/cn/

Or from the Chinese community: https://elasticsearch.cn/down…

Note: this installation uses the compressed packages.

Introduction to the installation of filebeat



Filebeat structure: Filebeat consists of two main components, inputs and harvesters, which work together to tail files and send event data to the output you specify. A harvester is responsible for reading the contents of a single file: it reads the file line by line and sends the contents to the output. One harvester is started per file. The harvester opens and closes the file, which means the file descriptor stays open while the harvester is running. If a file is deleted or renamed while it is being harvested, Filebeat continues to read it. The side effect is that the disk space stays reserved until the harvester closes. By default, Filebeat keeps the file open until close_inactive is reached.
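The harvester behaviour described above can be sketched in a few lines of Python. This is an illustration only, not Filebeat's implementation: the `Harvester` class is hypothetical, and unlike Filebeat it reopens the file on every read instead of holding the descriptor open. What it does show is the core idea of remembering a byte offset so that each read returns only new, complete lines.

```python
import os
import tempfile


class Harvester:
    """Reads new complete lines from one file, remembering the byte offset."""

    def __init__(self, path):
        self.path = path
        self.offset = 0

    def read_new_lines(self):
        with open(self.path, "rb") as f:
            f.seek(self.offset)
            data = f.read()
        # Consume only up to the last newline; a partial line waits for the next read.
        end = data.rfind(b"\n") + 1
        self.offset += end
        return [line.decode("utf-8") for line in data[:end].splitlines()]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "app.log")
    with open(path, "w") as f:
        f.write("first\nsecond\npart")   # the last line is not finished yet
    harvester = Harvester(path)
    batch1 = harvester.read_new_lines()
    with open(path, "a") as f:
        f.write("ial\n")                 # the writer completes the line
    batch2 = harvester.read_new_lines()

print(batch1, batch2)
```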

Simple installation

This article installs from the compressed package for Linux, filebeat-7.7.0-linux-x86_64.tar.gz.

tar -xzvf filebeat-7.7.0-linux-x86_64.tar.gz

Configuration sample file: filebeat.reference.yml (including all configuration items that are not outdated)

Configuration file: filebeat.yml

Start command: ./filebeat -e

For detailed principles, usage and examples, see: https://www.cnblogs.com/zsql/…

Introduction to logstash installation

Basic principles

A Logstash pipeline has three stages: inputs (required) → filters (optional) → outputs (required). Inputs generate events, filters process and modify them, and outputs send them to their destination or decide which component stores them. Inputs and outputs support codecs for encoding and decoding.

Each input stage in the Logstash pipeline runs in its own thread. Inputs write events to a central queue, which is in memory (the default) or on disk. Each pipeline worker thread takes a batch of events off the queue, runs the batch through the configured filters, and then sends it through the outputs to the specified destination. The batch size and the number of pipeline worker threads are configurable.
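The threading model described above can be sketched with Python's standard library. This is a simplified illustration, not Logstash code: `run_pipeline`, the `length` field and the sentinel-based shutdown are assumptions made for the example. One input thread writes events to a central queue, and each worker thread takes a batch, applies a filter and hands the batch to the output.

```python
import queue
import threading


def run_pipeline(lines, batch_size=2, workers=2):
    """Push `lines` through input -> queue -> batched filter -> output."""
    events = queue.Queue()      # the central in-memory queue
    output = []                 # stands in for the output stage
    lock = threading.Lock()
    sentinel = None             # tells a worker that no more events will come

    def input_stage():
        for line in lines:
            events.put({"message": line})
        for _ in range(workers):
            events.put(sentinel)

    def worker():
        done = False
        while not done:
            batch = []
            while len(batch) < batch_size:
                event = events.get()
                if event is sentinel:
                    done = True
                    break
                batch.append(event)
            for event in batch:                 # filter stage: enrich each event
                event["length"] = len(event["message"])
            with lock:                          # output stage: store the batch
                output.extend(batch)

    threads = [threading.Thread(target=input_stage)]
    threads += [threading.Thread(target=worker) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return output


result = run_pipeline(["a", "bb", "ccc"])
print(sorted(event["message"] for event in result))
```

Because several workers run in parallel, the order in which events reach the output is not guaranteed, which is why the example sorts before printing.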

Simple installation

Download address 1: https://www.elastic.co/cn/dow…

Download address 2: https://elasticsearch.cn/down…

A JDK is required here. I use the JDK bundled with Elasticsearch 7.7.0.

Unzip and install:

tar -zxvf logstash-7.7.0.tar.gz

A logstash version of HelloWorld:

./bin/logstash -e 'input { stdin { } } output { stdout {} }'


Introduction to elasticsearch installation

Basic introduction

Elasticsearch (ES) is an open source, distributed, RESTful full-text search engine built on Lucene. It is also a distributed document database in which every field can be indexed and searched. ES scales horizontally to hundreds of servers to store and process petabytes of data, and it can store, search and analyze large volumes of data in a very short time.

The basic concepts include cluster, node, index, document, shards and replicas.

Advantages of elasticsearch:

  • Distributed: flexible horizontal scaling;
  • Full-text retrieval: powerful full-text search based on Lucene;
  • Near real-time search and analysis: once data enters ES it can be searched and aggregated in near real time;
  • High availability: fault tolerance, automatic discovery of new or failed nodes, and reorganization and rebalancing of data;
  • Schema-free: the dynamic mapping mechanism of ES can automatically detect the structure and types of the data, create the index and make the data searchable.

Linux system parameter setting

1. Set system configuration

ulimit -n 65535  # Temporary change; switch to the es user first
/etc/security/limits.conf  # Permanent change: add the line  es  -  nofile  65535
ulimit -a  # View the resource limits of the current user

2. Disable swapping

Mode 1:

swapoff -a  # Temporarily disable all swap
vim /etc/fstab  # Comment out all swap-related lines to disable swap permanently

Mode 2:

cat /proc/sys/vm/swappiness  # View the current value
sysctl vm.swappiness=1  # Temporarily set the value to 1
vim /etc/sysctl.conf  # Edit the file to make the change permanent
vm.swappiness = 1  # Change this value if present, otherwise append it; run sysctl -p to apply

Mode 3:

Add the following to the elasticsearch.yml file:
bootstrap.memory_lock: true
GET _nodes?filter_path=**.mlockall  # Check whether the setting took effect

Note: if you try to allocate more memory than is available, mlockall may cause the JVM or shell session to exit!
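When the mlockall check above is run against several nodes, reading the JSON by eye gets tedious. The following Python sketch evaluates such a response; the helper name and the node ids `node-a` and `node-b` are hypothetical, and the sample document only assumes the `nodes.<id>.process.mlockall` shape that the filter path in the request selects.

```python
import json


def nodes_without_mlockall(response):
    """Return the ids of nodes whose process.mlockall flag is not true."""
    return sorted(
        node_id
        for node_id, info in response.get("nodes", {}).items()
        if not info.get("process", {}).get("mlockall", False)
    )


# Hypothetical response body for GET _nodes?filter_path=**.mlockall
sample = json.loads(
    '{"nodes": {"node-a": {"process": {"mlockall": true}},'
    ' "node-b": {"process": {"mlockall": false}}}}'
)
print(nodes_without_mlockall(sample))
```

Any node listed in the result started without its memory locked, so its bootstrap.memory_lock setting or memlock limit needs another look.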

3. Configure file descriptors

ulimit -n 65535  # Temporary change
vim /etc/security/limits.conf  # Permanent change
es         soft    nofile    65535
es         hard    nofile    65535

4. Configure virtual memory

sysctl -w vm.max_map_count=262144  # Temporary change
vim /etc/sysctl.conf  # Permanent change: set vm.max_map_count=262144
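Before starting Elasticsearch it can be handy to verify the file-descriptor and virtual-memory settings above from a script. The Python sketch below does that on Linux under a few assumptions: the function names are made up for this example, the thresholds are simply the values used in this article, and it must be run as the user that will start Elasticsearch.

```python
import resource  # Unix only


def read_max_map_count(path="/proc/sys/vm/max_map_count"):
    """Return vm.max_map_count, or None when it cannot be read."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def find_problems(nofile_soft, max_map_count,
                  min_nofile=65535, min_map_count=262144):
    """Compare the current limits with the values used in this article."""
    problems = []
    if nofile_soft != resource.RLIM_INFINITY and nofile_soft < min_nofile:
        problems.append("nofile is %d, expected at least %d" % (nofile_soft, min_nofile))
    if max_map_count is not None and max_map_count < min_map_count:
        problems.append("vm.max_map_count is %d, expected at least %d"
                        % (max_map_count, min_map_count))
    return problems


soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
for problem in find_problems(soft, read_max_map_count()):
    print(problem)
```

No output means both settings already meet the values configured above.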

5. Configure the number of threads

ulimit -u 4096  # Temporary change
vim /etc/security/limits.conf  # Permanent change

Elasticsearch installation

Elasticsearch cannot be started as root, so first create a dedicated user, elk:

groupadd  elastic
useradd elk -d /data/hd05/elk -g elastic
echo '[email protected]' | passwd elk --stdin


You can also download it with wget from the official website: https://artifacts.elastic.co/…

Decompress: tar -zxvf elasticsearch-7.7.0-linux-x86_64.tar.gz
Create a symbolic link: ln -s elasticsearch-7.7.0 es

Directory structure:

bin: $ES_HOME/bin  # Elasticsearch startup and plugin installation commands
config: $ES_HOME/config  # Directory containing elasticsearch.yml
data: $ES_HOME/data  # Corresponds to path.data; stores index shard data files
logs: $ES_HOME/logs  # Corresponds to path.logs; stores logs
jdk: $ES_HOME/jdk  # The bundled JDK that supports this Elasticsearch version
plugins: $ES_HOME/plugins  # Plugin directory
lib: $ES_HOME/lib  # Dependencies, such as Java class libraries
modules: $ES_HOME/modules  # Contains all Elasticsearch modules

Configure the built-in Java environment:

vim ~/.bashrc
############Add the following to the end######################
export JAVA_HOME=/data/hd05/elk/es/jdk
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

jvm.options file description:

There are two ways to configure Java parameters: edit the /data/hd05/elk/elasticsearch-7.7.0/config/jvm.options file, or declare JVM parameters through the ES_JAVA_OPTS variable.
Introduction to /data/hd05/elk/elasticsearch-7.7.0/config/jvm.options:
8:-Xmx2g  # Applies to Java 8 only
8-:-Xmx2g  # Applies to Java 8 and later
8-9:-Xmx2g  # Applies to Java 8 and Java 9
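The version prefixes above are easy to misread, so here is a small Python sketch that decides whether a jvm.options line applies to a given Java major version. It mirrors only the three prefix forms shown above plus unprefixed options; `option_for` is a hypothetical helper, not the parser Elasticsearch uses.

```python
import re

# <n>:<opt>, <n>-:<opt> or <n>-<m>:<opt>, where the option itself starts with "-"
_VERSIONED = re.compile(r"^(\d+)(-)?(\d+)?:(-.*)$")


def option_for(line, java_major):
    """Return the JVM option if `line` applies to `java_major`, else None."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None                      # blank line or comment
    if line.startswith("-"):
        return line                      # no prefix: applies to every version
    match = _VERSIONED.match(line)
    if match is None:
        raise ValueError("unrecognized jvm.options line: %r" % line)
    low, dash, high, option = match.groups()
    low = int(low)
    if dash is None:
        applies = java_major == low              # "8:"   exactly Java 8
    elif high is None:
        applies = java_major >= low              # "8-:"  Java 8 and later
    else:
        applies = low <= java_major <= int(high) # "8-9:" Java 8 through 9
    return option if applies else None


for line in ("8:-Xmx2g", "8-:-Xmx2g", "8-9:-Xmx2g"):
    print(line, "->", option_for(line, 11))
```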

Declaring JVM parameters through the ES_JAVA_OPTS variable:

For example: export ES_JAVA_OPTS="$ES_JAVA_OPTS -Djava.io.tmpdir=/path/to/temp/dir"

Configure config/jvm.options:

[[email protected] config]$ cat  jvm.options  | egrep -v '^$|#'                 

Configure encrypted communication certificate:

Generate certificate:

Method 1:
./bin/elasticsearch-certutil ca -out config/elastic-certificates.p12 -pass "password"

Check the config directory: the elastic-certificates.p12 file has been generated.


Method 2:

./bin/elasticsearch-certutil ca  # Create the cluster certificate authority; you are prompted for a password
./bin/elasticsearch-certutil cert --ca elastic-stack-ca.p12  # Issue a certificate for the node, using the same password as above
Run ./bin/elasticsearch-keystore add xpack.security.transport.ssl.keystore.secure_password and enter the password from the first step
Run ./bin/elasticsearch-keystore add xpack.security.transport.ssl.truststore.secure_password and enter the password from the first step
Move the generated elastic-certificates.p12 and elastic-stack-ca.p12 files to the config directory

Configure config/elasticsearch.yml:

[[email protected] config]$ cat  elasticsearch.yml  | egrep -v '^$|#'
cluster.name: my_cluster
node.name: lgh01
node.data: true
node.master: true
path.data: /data/hd05/elk/elasticsearch-7.7.0/data
path.logs: /data/hd05/elk/elasticsearch-7.7.0/logs
http.port: 9200
transport.tcp.port: 9300
discovery.seed_hosts: ["","","",""]
cluster.initial_master_nodes: ["lgh01","lgh02","lgh03"]
cluster.routing.allocation.cluster_concurrent_rebalance: 32
cluster.routing.allocation.node_concurrent_recoveries: 32
cluster.routing.allocation.node_initial_primaries_recoveries: 32
http.cors.enabled: true
http.cors.allow-origin: '*'
#The following configures X-Pack and TLS/SSL encrypted communication
xpack.security.enabled: true
xpack.license.self_generated.type: basic
xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: certificate
xpack.security.transport.ssl.keystore.path: elastic-certificates.p12
xpack.security.transport.ssl.truststore.path: elastic-certificates.p12
bootstrap.memory_lock: false  # Required on CentOS 6
bootstrap.system_call_filter: false  # Required on CentOS 6

Then copy the configuration to the other nodes with scp, modify node.name and node.master there, and delete the data directory, otherwise an error occurs.

Then start Elasticsearch in the background with ./bin/elasticsearch -d; omit -d to start it in the foreground.

Then run ./bin/elasticsearch-setup-passwords interactive to set the passwords of the built-in users (the interaction is shown below); use auto instead of interactive to generate them automatically.

[[email protected] elasticsearch-7.7.0]$ ./bin/elasticsearch-setup-passwords interactive
Enter password for the elasticsearch keystore : 
Initiating the setup of passwords for reserved users elastic,apm_system,kibana,logstash_system,beats_system,remote_monitoring_user.
You will be prompted to enter passwords as the process progresses.
Please confirm that you would like to continue [y/N]y
Enter password for [elastic]: 
Reenter password for [elastic]: 
Enter password for [apm_system]: 
Reenter password for [apm_system]: 
Enter password for [kibana]: 
Reenter password for [kibana]: 
Enter password for [logstash_system]: 
Reenter password for [logstash_system]: 
Enter password for [beats_system]: 
Reenter password for [beats_system]: 
Enter password for [remote_monitoring_user]: 
Reenter password for [remote_monitoring_user]: 
Changed password for user [apm_system]
Changed password for user [kibana]
Changed password for user [logstash_system]
Changed password for user [beats_system]
Changed password for user [remote_monitoring_user]

You can then open http://<node address>:9200/ in a browser. A password is required: log in as elastic with the password set above.
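The same login can be scripted. The sketch below builds an HTTP request with basic authentication using only the Python standard library; the `localhost` address and the literal credentials are placeholders for illustration, and the actual network call is left commented out because it needs a running cluster.

```python
import base64
import urllib.request


def build_request(url, user, password):
    """Create a request that carries HTTP basic authentication credentials."""
    token = base64.b64encode(("%s:%s" % (user, password)).encode("utf-8"))
    request = urllib.request.Request(url)
    request.add_header("Authorization", "Basic " + token.decode("ascii"))
    return request


# Placeholder address and credentials; replace them with your own values.
request = build_request("http://localhost:9200/_cluster/health", "elastic", "password")
print(request.get_header("Authorization"))

# With a running cluster the request could be sent like this:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```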


Head plug-in installation

Head official website: https://github.com/mobz/elast…

Node.js download: https://nodejs.org/zh-cn/down…

According to the official description, there are three ways to use the head plug-in with Elasticsearch 7. I have only tried two of them:

First: use the head extension for Google Chrome, which works as soon as the extension is installed in the browser.

Second: run head as a standalone service. The installation is as follows:

#Running with built in server
git clone git://github.com/mobz/elasticsearch-head.git
cd elasticsearch-head
npm install
npm run start
open http://localhost:9100/

If an error is reported during the installation above, try running this command before continuing: npm install phantomjs- [email protected] --ignore-scripts

Introduction to Kibana installation

Download address: https://elasticsearch.cn/down…

You can also download it from the official website.

After decompression, modify the kibana.yml file:

[[email protected] config]$ cat kibana.yml  | egrep -v "^$|#"
server.port: 5601
server.host: ""
server.name: "my-kibana"
elasticsearch.hosts: ["","",""]
elasticsearch.preserveHost: true
kibana.index: ".kibana"
elasticsearch.username: "elastic"
elasticsearch.password: "password"  # Or use the password saved in the keystore: "${ES_PWD}"

Start Kibana with ./bin/kibana.

Then visit http://<kibana host>:5601/ and log in as elastic with the password.

Case analysis

Now let's build an instance of the Beats + cache / message queue + Logstash + Elasticsearch + Kibana architecture:

We use Kafka as the intermediate component. See the official Filebeat documentation on using Kafka as an output: https://www.elastic.co/guide/…


Pay attention to the Kafka version here: I tried versions at both extremes, very new and very old, and ran into trouble each time. If you already have a Kafka cluster you can use it; I installed a standalone instance (1.1.1).

The data set is in Apache log format, and the download address is: https://download.elastic.co/d…

The log format is as follows:

[[email protected] ~]$ tail -3 logstash-tutorial.log
- - [04/Jan/2015:05:30:37 +0000] "GET /projects/xdotool/ HTTP/1.1" 200 12292 "http://www.haskell.org/haskellwiki/Xmonad/Frequently_asked_questions" "Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20140205 Firefox/24.0 Iceweasel/24.3.0"
- - [04/Jan/2015:05:30:37 +0000] "GET /reset.css HTTP/1.1" 200 1015 "http://www.semicomplete.com/projects/xdotool/" "Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20140205 Firefox/24.0 Iceweasel/24.3.0"
- - [04/Jan/2015:05:30:37 +0000] "GET /style2.css HTTP/1.1" 200 4877 "http://www.semicomplete.com/projects/xdotool/" "Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20140205 Firefox/24.0 Iceweasel/24.3.0"

First, we configure the configuration file filebeat.yml of filebeat:

#=========================== Filebeat inputs =============================
filebeat.inputs:
# Each - is an input. Most options can be set at the input level, so
# you can use different inputs for various configurations.
# Below are the input specific configurations.
- type: log
  # Change to true to enable this input configuration.
  enabled: true
  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    - /data/elk/logstash-tutorial.log   # The Apache log format is used here
    #- c:\programdata\elasticsearch\logs\*
  # Exclude lines. A list of regular expressions to match. It drops the lines that are
  # matching any regular expression from the list.
  #exclude_lines: ['^DBG']
  # Include lines. A list of regular expressions to match. It exports the lines that are
  # matching any regular expression from the list.
  #include_lines: ['^ERR', '^WARN']
  # Exclude files. A list of regular expressions to match. Filebeat drops the files that
  # are matching any regular expression from the list. By default, no files are dropped.
  #exclude_files: ['.gz$']
  # Optional additional fields. These fields can be freely picked
  # to add additional information to the crawled log files for filtering
  #fields:
  #  level: debug
  #  review: 1
  ### Multiline options
  # Multiline can be used for log messages spanning multiple lines. This is common
  # for Java Stack Traces or C-Line Continuation
  # The regexp Pattern that has to be matched. The example pattern matches all lines starting with [
  #multiline.pattern: ^\[
  # Defines if the pattern set under pattern should be negated or not. Default is false.
  #multiline.negate: false
  # Match can be set to "after" or "before". It is used to define if lines should be append to a pattern
  # that was (not) matched before or after or as long as a pattern is not matched based on negate.
  # Note: After is the equivalent to previous and before is the equivalent to to next in Logstash
  #multiline.match: after
#================================ Outputs =====================================
output.kafka:
  hosts: [""]   # Configure Kafka's broker
  topic: 'filebeat_test'   # Configure the topic name
  partition.round_robin:
    reachable_only: false
  required_acks: 1
  compression: gzip
  max_message_bytes: 1000000

Then use the command to start in the background:

cd filebeat-7.7.0-linux-x86_64 && nohup ./filebeat -e &

Next, we configure the logstash configuration file:

cd logstash-7.7.0/ && mkdir conf.d
cd conf.d
vim apache.conf 
################The apache.conf file is filled with the following contents##############################
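input {
        kafka {    # Assumed: the plugin name is missing in the source; kafka matches the options below
                bootstrap_servers => ""
                topics => ["filebeat_test"]
                group_id => "test123"
                auto_offset_reset => "earliest"
        }
}
filter {
        json {    # Assumed: the filter name is missing in the source; json is the usual way to decode Filebeat events read from Kafka
                source => "message"
        }
        grok {
                match => { "message" => "%{COMBINEDAPACHELOG}"}
                remove_field => "message"
        }
}
output {
    stdout { codec => rubydebug }
    elasticsearch {
        hosts => ["","","",""]
        index => "test_kakfa"
        user => "elastic"
        password => "${ES_PWD}"
    }
}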
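To see what the filter stage does to one event, here is a rough Python equivalent. It is a sketch under assumptions, not the grok implementation: the regular expression only approximates the COMBINEDAPACHELOG pattern, the client address 192.0.2.10 is a made-up documentation address (the addresses are missing from the sample above), and `parse_event` is a hypothetical helper. Filebeat's Kafka output publishes each event as a JSON document whose message field holds the raw log line, so the sketch first decodes the JSON and then extracts the fields.

```python
import json
import re

# Approximation of the Apache combined log format.
COMBINED = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\S+) (?P<request>\S+) HTTP/(?P<httpversion>[\d.]+)" '
    r'(?P<response>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_event(raw):
    """Decode one JSON event and split its message into named fields."""
    event = json.loads(raw)
    match = COMBINED.match(event["message"])
    if match is None:
        event["tags"] = ["_grokparsefailure"]   # the tag grok adds on failure
        return event
    event.update(match.groupdict())
    del event["message"]                        # like remove_field => "message"
    return event


line = ('192.0.2.10 - - [04/Jan/2015:05:30:37 +0000] "GET /reset.css HTTP/1.1" '
        '200 1015 "http://www.semicomplete.com/projects/xdotool/" '
        '"Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20140205 '
        'Firefox/24.0 Iceweasel/24.3.0"')
parsed = parse_event(json.dumps({"message": line}))
print(parsed["clientip"], parsed["verb"], parsed["request"], parsed["response"])
```

Once the line is split into fields like these, Kibana can aggregate on the response code or the requested path instead of searching raw text.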

Then start the logstash command in the background:

cd logstash-7.7.0/ && nohup ./bin/logstash -f conf.d/apache.conf &

Then we look at the elasticsearch cluster to see the index.


Next, we log in to kibana to view the analysis of the index.


Original link: https://www.cnblogs.com/zsql/…
