Log platform (gateway layer) – based on openresty + elkf + Kafka

Time:2020-6-29

Background

1. Problems and attempts

An online system without logging is a trap left for the operations staff. For projects with separated front and back ends in particular, back-end interface logs solve many problems in integration, testing and operations. On previous projects, the published interfaces were all orchestrated through Oracle Service Bus (OSB); logging was added during orchestration and the interface logs were stored in a database. A log platform was then built on top of that interface log data for unified interface log analysis.
But we cannot rely on OSB for logging forever; it is not free. This year we have developed many back-end interfaces with Spring, and the deployment environment of the back-end programs is no longer limited to Oracle middleware. How to record interface logs in scenarios that no longer go through OSB is the problem this article sets out to solve.

In my Spring series of articles I have already tried logging with Spring AOP: each project defines a logging aspect in its own code, and that aspect logs every interface in the project.
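
As a rough illustration, a minimal sketch of such a per-project logging aspect is shown below, assuming spring-boot-starter-aop is on the classpath; the package name and pointcut expression are illustrative and not taken from the original projects.

package com.example.demo.aop;   // illustrative package, not from the original projects

import java.util.Arrays;

import org.aspectj.lang.ProceedingJoinPoint;
import org.aspectj.lang.annotation.Around;
import org.aspectj.lang.annotation.Aspect;
import org.springframework.stereotype.Component;

@Aspect
@Component
public class ApiLogAspect {

    // Wrap every controller method of this project and log request/response information
    @Around("execution(* com.example.demo.controller..*(..))")
    public Object logApi(ProceedingJoinPoint pjp) throws Throwable {
        long start = System.currentTimeMillis();
        Object result = pjp.proceed();   // invoke the actual interface method
        long cost = System.currentTimeMillis() - start;
        System.out.println(pjp.getSignature() + " args=" + Arrays.toString(pjp.getArgs())
                + " result=" + result + " costMs=" + cost);   // in practice, persist to a log store
        return result;
    }
}
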
For an independent project with a long cycle and a large scale, this scheme works well. When the project runs for months, spending two days on the AOP logging code is nothing, and the resulting log fits the business characteristics of that system.
But our team mostly faces small projects: many of them, each with a short cycle, sometimes only about ten days of development. Even if logging takes only one day per project, that is a tenth of the total workload; doing it independently for every project adds up to a great deal of work, and repeating the same task is tedious.
Just as aspect oriented programming (AOP) lets us program one "aspect" across all interfaces of a project, if we could program one "aspect" across all projects we would solve the current problem. That aspect is the gateway.

2. Scheme design

This scheme came out of a discussion with two technical leads in the company, and the idea immediately cleared away the fog of the earlier troubles. I spent two days building a demo and verified that the scheme is indeed feasible; the working code from that demo is attached below.
In short, all project interfaces go through an nginx gateway. Instead of collecting logs at the code level, we capture the desired log information on nginx, and with the ELKF stack (Elasticsearch, Logstash, Kibana, Filebeat) we build a unified log platform:

  1. Nginx + Lua programming: every interface that passes through the gateway leaves log information, in the format we define, written to a log file.
  2. Filebeat collects the data: it monitors the target log files in real time and pushes what it collects to Logstash.
  3. Logstash filters and processes the data, then pushes it to both Elasticsearch and Kafka.
  4. Elasticsearch + Kibana: Elasticsearch acts as the data search engine, and Kibana's visual interface presents the log data as reports.
  5. Kafka message queue middleware: the log data is pushed to Kafka and published as messages, so any subscriber can read it from the queue. In this demo, a program reads the data from the queue in real time and stores it in the database.


3. System environment

In this demo, due to resource constraints, all products and services are deployed on a single server. The environment on the server is as follows:

Configuration item       Environment configuration information
Server                   Alibaba Cloud ECS (public IP: 47.96.238.21, private IP: 172.16.187.25)
Server configuration     2 vCPU + 4 GB memory
JDK version              JDK 1.8.0_181
Operating system         CentOS 7.4 64-bit
OpenResty                1.13.6.2
Filebeat                 6.2.4
Elasticsearch            6.2.4
Logstash                 6.2.4
Kibana                   6.2.4
Kafka                    2.10-0.10.2.1

Logging based on openresty

OpenResty® is a high-performance web platform based on nginx and Lua that bundles a large number of well-crafted Lua libraries, third-party modules and most of their dependencies. It makes it easy to build dynamic web applications, web services and dynamic gateways that handle very high concurrency and scale well.
We chose OpenResty for two reasons: (1) with Lua programming we can extract exactly the log information we want on nginx; (2) it integrates with the system's other functional modules, such as JWT; see the article "nginx implementation of JWT verification based on openresty" written by a colleague.

1. Openresty installation

Before installing OpenResty you need to install its dependencies: Perl 5.6.1+, libreadline, libpcre and libssl. On a CentOS system they can be installed directly with yum.

[[email protected] ~]# yum install readline-devel pcre-devel openssl-devel perl

Next, add the official OpenResty yum repository to the CentOS system:

[[email protected] ~]# yum install yum-utils
[[email protected] ~]# yum-config-manager --add-repo https://openresty.org/package/centos/openresty.repo

At this point, we can install openresty directly

[[email protected] ~]# yum install openresty
[[email protected] ~]# yum install openresty-resty

OpenResty is now installed; by default it is placed in the /usr/local/openresty directory.

#Verify that the installation succeeded
[[email protected] ~]# cd /usr/local/openresty/bin/
[[email protected] bin]# ./openresty -v
nginx version: openresty/1.13.6.2

#Setting environment variables
[[email protected] sbin]# vi /etc/profile
#Append export PATH=${PATH}:/usr/local/openresty/nginx/sbin to the end of the file
[[email protected] sbin]# source /etc/profile

2. Log nginx

The OpenResty installation comes with its own configuration files and related directories. To avoid mixing our working directory with the installation directory, we create a separate working directory: I created /openrestyTest/v1/ under the root directory, with logs and conf subdirectories for log files and configuration files respectively.

[[email protected] ~]# mkdir /openrestyTest /openrestyTest/v1 /openrestyTest/v1/conf /openrestyTest/v1/logs
[[email protected] ~]# cd /openrestyTest/v1/conf/
#Create and edit nginx.conf
[[email protected] conf]# vi nginx.conf

Copy the following into nginx.conf as a test:

worker_processes 1;              # number of nginx worker processes
error_log logs/error.log;        # error log file path
events {
    worker_connections 1024;
}

http {
    server {
        #Listening port; change it if 6699 is already in use on your machine
        listen 6699;
        location / {
            default_type text/html;

            content_by_lua_block {
                ngx.say("HelloWorld")
            }
        }
    }
}

The content_by_lua_block part uses Lua: this configuration listens on port 6699 and outputs HelloWorld. Now start the nginx bundled with OpenResty.

[[email protected] ~]# /usr/local/openresty/nginx/sbin/nginx -p '/openrestyTest/v1/' -c conf/nginx.conf
#Thanks to the PATH set earlier, the same thing works with just nginx:
[[email protected] ~]# nginx -p '/openrestyTest/v1/' -c conf/nginx.conf
[[email protected] conf]# curl http://localhost:6699
HelloWorld

Accessing that port successfully returns HelloWorld. I have deployed an interface in advance on this server's Tomcat, on port 8080. The idea is to reverse-proxy the 8080 service through port 9000, capture the log information of every request to that service as it passes through port 9000, and write it to a local log file.
The log content to record for now includes: interface address, request content, request time, response content, response time, and so on. The code is below; simply replace the content of /openrestyTest/v1/conf/nginx.conf with it.

worker_processes  1;
error_log logs/error.log;

events {
    worker_connections 1024;
}

http {
log_format myformat '{"status":"$status","requestTime":"$requestTime","responseTime":"$responseTime","requestURL":"$requestURL","method":"$method","requestContent":"$request_body","responseContent":"$responseContent"}';
access_log logs/test.log myformat;

upstream tomcatTest {
    server 47.96.238.21:8080;
}

server {
        server_name 47.96.238.21;
        listen 9000;
        #Read the request body so that $request_body is populated
        lua_need_request_body on;

        location / {
                log_escape_non_ascii off;
                proxy_pass  http://tomcatTest;
                set $requestURL '';
                set $method '';
                set $requestTime '';
                set $responseTime '';
                set $responseContent '';

                body_filter_by_lua '
                        ngx.var.requestTime=os.date("%Y-%m-%d %H:%M:%S")

                        ngx.var.requestURL=ngx.var.scheme.."://"..ngx.var.server_name..":"..ngx.var.server_port..ngx.var.request_uri
                        ngx.var.method=ngx.var.request_uri

                        local resp_body = string.sub(ngx.arg[1], 1, 1000)
                        ngx.ctx.buffered = (ngx.ctx.buffered or"") .. resp_body
                        if ngx.arg[2] then
                                ngx.var.responseContent = ngx.ctx.buffered
                        end

                        ngx.var.responseTime=os.date("%Y-%m-%d %H:%M:%S")
                  ';

        }

    }
}

Restart nginx and verify

[[email protected] conf]# nginx -p '/openrestyTest/v1/' -c conf/nginx.conf -s reload

The test interface I prepared in advance is http://47.96.238.21:8080/springbootDemo/hello, which returns "Hello!Spring boot!".
Now call http://47.96.238.21:9000/springbootDemo/hello with POST, sending the request body "segmentfault Log platform (gateway layer) – based on openresty + elkf + Kafka" as application/json. Then look in the logs folder: a new test.log file has appeared, and every call to the interface appends one interface log entry to it.

[[email protected] conf]#  tail -500f /openrestyTest/v1/logs/test.log
{"status":"200","requestTime":"2018-10-11 18:09:02","responseTime":"2018-10-11 18:09:02","requestURL":" http://47.96.238.21 : 9000 / springboot demo / Hello "," method ":" / springboot demo / Hello "," requestcontent ":" segmentfault "logging platform (gateway layer) - based on openresty + elkf + Kafka", "responsecontent": "Hello! Spring boot!"}

At this point, extracting the information of every interface call that passes through the nginx gateway and writing it to a log file is complete; all interface logs are written to test.log.

E + L + K + F = log data processing

ELKF is the combination Elasticsearch + Logstash + Kibana + Filebeat. ELK may be more familiar; ELKF simply adds Filebeat. All of them are open-source products from Elastic, which went public just recently, setting off a wave of discussion around its products.
In the original ELK architecture, Logstash was responsible for collecting the log information and reporting it. Elastic later released Filebeat, which turned out to be better at collecting log files, so Logstash is now left with processing and forwarding the logs. In this system, Elasticsearch serves as the search engine, Logstash analyses and forwards the logs, Filebeat collects the log files, and Kibana provides the visual web interface.

1. Filebeat installation configuration

Filebeat: a lightweight log collector, which is responsible for collecting logs in the form of files and pushing the collected logs to logstash for processing.

[[email protected] ~]# cd /u01/install/
[[email protected] install]# wget https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-6.2.4-x86_64.rpm
[[email protected] install]# yum localinstall -y filebeat-6.2.4-x86_64.rpm

After installation, configure Filebeat to collect the logs and push them to Logstash.

[[email protected] install]# cd /etc/filebeat/
[[email protected] filebeat]# vi filebeat.yml

filebeat.yml is Filebeat's configuration file; most of its sections ship commented out. The settings enabled in this configuration are as follows:

filebeat.prospectors:
- type: log
  enabled: true
  paths:
    - /openrestyTest/v1/logs/*.log
filebeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: false
setup.template.settings:
  index.number_of_shards: 3
output.logstash:
  hosts: ["47.96.238.21:5044"]

This watches the log files in /openrestyTest/v1/logs/ and pushes the collected log information to the Logstash address given in hosts; Logstash is installed in the next section. For now, start Filebeat first.

[[email protected] filebeat]# cd /usr/share/filebeat/bin/
[[email protected] bin]# touch admin.out
[[email protected] bin]# nohup ./filebeat -e -c /etc/filebeat/filebeat.yml > admin.out &
#Check the admin.out log to confirm that Filebeat started successfully

2. Logstash installation configuration

Logstash: log processing tool, which is responsible for log collection, conversion, parsing, etc., and pushes the parsed logs to elasticsearch for retrieval.

[[email protected] ~]# cd /u01/install/
[[email protected] install]# wget https://artifacts.elastic.co/downloads/logstash/logstash-6.2.4.rpm
[[email protected] install]# yum localinstall -y logstash-6.2.4.rpm
#Logstash should not be run as root; create a dedicated user
[[email protected] install]# groupadd logstash
[[email protected] install]# useradd -g logstash logstash
[[email protected] install]# passwd logstash
#Set password
[[email protected] install]# su logstash
[[email protected] install]# mkdir -pv /data/logstash/{data,logs}
[[email protected] install]# chown -R logstash.logstash /data/logstash/
[[email protected] install]# vi /etc/logstash/conf.d/logstash.conf

Create and edit the /etc/logstash/conf.d/logstash.conf file with the following configuration:

input {
  beats {
    port => 5044
    codec => plain {
          charset => "UTF-8"
    }
  }
}

output {
  elasticsearch {
    hosts => "47.96.238.21:9200"
    manage_template => false
    index => "%{[@metadata][beat]}-%{+YYYY.MM.dd}"
    document_type => "%{[@metadata][type]}"
  }
}

1. input: the data source of Logstash. After startup it listens on port 5044, the familiar hosts address that Filebeat pushes logs to in the previous section.
2. output: where Logstash sends the processed data. Here it is Elasticsearch, which is used for log analysis in the ELK architecture mentioned below.

Next, we modify /etc/logstash/logstash.yml

#vim /etc/logstash/logstash.yml
path.data: /data/logstash/data
path.logs: /data/logstash/logs

Now you can start logstash

[[email protected] install]# su logstash
[[email protected] root]$ cd /usr/share/logstash/bin/
[[email protected] bin]$ touch admin.out
[[email protected] bin]$ nohup ./logstash -f /etc/logstash/conf.d/logstash.conf >admin.out &

3. Elasticsearch installation configuration

Elasticsearch: a distributed, RESTful search and data analysis engine that also provides centralized storage; it is responsible for retrieving, querying and analysing the log data captured by Logstash.

[[email protected] ~]# cd /u01/install/
[[email protected] install]# wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.2.4.rpm
[[email protected] install]# yum localinstall -y elasticsearch-6.2.4.rpm
#Elasticsearch cannot be run as root; create a dedicated user
[[email protected] install]# groupadd elsearch
[[email protected] install]# useradd -g elsearch elsearch
[[email protected] install]# passwd elsearch
#Set password
[[email protected] install]# su elsearch
[[email protected] bin]$  mkdir -pv /data/elasticsearch/{data,logs}
[[email protected] bin]$  chown -R elsearch.elsearch /data/elasticsearch/
[[email protected] bin]$  vi /etc/elasticsearch/elasticsearch.yml
path.data: /data/elasticsearch/data
path.logs: /data/elasticsearch/logs
network.host: 0.0.0.0
http.port: 9200

To make it reachable from the public network, network.host must be set to 0.0.0.0. Start Elasticsearch as follows:

[[email protected] install]# su elsearch
[[email protected] bin]$ cd /usr/share/elasticsearch/bin/
[[email protected] bin]$ ./elasticsearch -d
#-d runs elasticsearch in the background (daemon mode)

4. Kibana installation configuration

Kibana: web front end, which can convert the logs retrieved by elasticsearch into various charts to provide data visualization support for users.

[[email protected] ~]# cd /u01/install/
[[email protected] install]# wget https://artifacts.elastic.co/downloads/kibana/kibana-6.2.4-x86_64.rpm
[[email protected] install]# yum localinstall -y kibana-6.2.4-x86_64.rpm
[[email protected] install]# vi /etc/kibana/kibana.yml
server.port: 5601
server.host: "0.0.0.0"
elasticsearch.url: "http://47.96.238.21:9200"

As before, server.host is 0.0.0.0 so that it can be reached from the public network. Kibana is only the front-end display; the log data comes from Elasticsearch, hence the elasticsearch.url setting. Start Kibana and the log reports can be viewed through its page.

[[email protected] ~]# cd /usr/share/kibana/bin/
[[email protected] bin]# touch admin.out
[[email protected] bin]# nohup ./kibana >admin.out &

Open http://47.96.238.21:5601/ in a browser and Kibana's page should load. If the ELKF configuration is correct, all log information can be seen in real time on Kibana's page.


From Kafka to database

With the log data in hand, Elasticsearch plus Kibana already gives us a log viewing platform. However, we have also developed a log platform inside our own project and would like to feed these logs into it, or alternatively build a custom log platform better suited to actual use. Either way, the log data needs to be stored in a database.
Interface logging clearly happens in a high-concurrency environment, and synchronous processing can easily fall behind and block requests: a flood of inserts and updates reaching the database at the same time causes countless row and table locks, requests pile up, and eventually "too many connections" errors are triggered. With a message queue we can process the records asynchronously and relieve that pressure on the system. After comparing the open-source message middleware on the market, I chose Kafka.
Apache Kafka is a distributed publish-subscribe messaging system that can carry massive volumes of data; it is widely used in both offline and real-time message processing. Kafka persists messages to disk and replicates them to keep the data safe. It consumes messages mainly in pull mode and aims for high throughput; it was originally built for log collection and transmission. Replication is supported from version 0.8, transactions are not supported, and there are no strict guarantees against duplicated, lost or out-of-order messages, which makes it well suited to data collection for Internet services that generate large amounts of data.


  • Broker: Kafka brokers are stateless; they rely on ZooKeeper to maintain the cluster state, and ZooKeeper is also in charge of leader election.
  • Zookeeper: ZooKeeper is responsible for maintaining and coordinating the brokers. When a new broker joins the Kafka cluster or a broker fails, ZooKeeper notifies the producers and consumers, which then coordinate with the brokers to publish and subscribe data according to the broker state kept in ZooKeeper.
  • Producer: producers push data to the brokers. When a new broker appears in the cluster, producers discover it and automatically start sending data to it.
  • Consumer: because Kafka brokers are stateless, each consumer must use partition offsets to record how much it has consumed. A consumer that acknowledges a given offset for a topic has consumed all messages before that offset, and it can start consuming from any position in the topic by specifying an offset. Consumer offsets are stored in ZooKeeper. A minimal sketch of a consumer working with offsets follows this list.
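
For reference, here is a minimal sketch (not part of this demo) of how a consumer works with partition offsets using the Kafka Java client. The broker address and topic match this demo's setup; the class name, group id and partition choice are illustrative assumptions.

package df.log.kafka.nginxlog.demo;   // hypothetical package used only for this illustration

import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class OffsetDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "47.96.238.21:9092");
        props.put("group.id", "offset-demo");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        TopicPartition partition = new TopicPartition("kerry", 0);
        consumer.assign(Collections.singletonList(partition));

        // Start reading from a chosen offset: everything before it is treated as already consumed
        consumer.seek(partition, 0L);

        ConsumerRecords<String, String> records = consumer.poll(1000);
        for (ConsumerRecord<String, String> record : records) {
            System.out.println(record.offset() + ": " + record.value());
        }
        consumer.close();
    }
}
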

1. Kafka installation and configuration

Now we install and start Kafka.

#Installation
[[email protected] ~]# cd /u01/install/
[[email protected] install]# wget http://apache.fayea.com/kafka/0.10.2.1/kafka_2.10-0.10.2.1.tgz
[[email protected] install]# tar -zvxf kafka_2.10-0.10.2.1.tgz -C /usr/local/
[[email protected] install]# cd /usr/local/
[[email protected] local]# mv kafka_2.10-0.10.2.1 kafka
#Start
[[email protected] local]# cd /usr/local/kafka/bin/
[[email protected] bin]# ./zookeeper-server-start.sh -daemon ../config/zookeeper.properties
[[email protected] bin]# touch admin.out
[[email protected] bin]# nohup ./kafka-server-start.sh ../config/server.properties >admin.out &

Create a topic named kerry

[[email protected] bin]# ./kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic kerry
#Topic created successfully. Let's take a look
[[email protected] bin]# ./kafka-topics.sh --list --zookeeper localhost:2181
kerry

We send information to this topic

[[email protected] bin]# ./kafka-console-producer.sh --broker-list localhost:9092 --topic kerry
Hello Kerry!this is the message for test

Let’s open another window to receive messages from topic

[[email protected] bin]# ./kafka-console-consumer.sh --zookeeper localhost:2181 --topic kerry --from-beginning
Hello Kerry!this is the message for test
#Can successfully receive

2. Producer: logstash

Kafka is installed and the topic is created. The producer I want to send messages to the topic is Logstash: after Logstash receives data from Filebeat, it outputs it to Kafka in addition to Elasticsearch, so Logstash acts as the Kafka producer.
To do this, modify the Logstash configuration file and add the Kafka settings to the output section:

vi /etc/logstash/conf.d/logstash.conf
input {
  beats {
    port => 5044
    codec => plain {
          charset => "UTF-8"
    }
  }
}

output {
  elasticsearch {
    hosts => "47.96.238.21:9200"
    manage_template => false
    index => "%{[@metadata][beat]}-%{+YYYY.MM.dd}"
    document_type => "%{[@metadata][type]}"
  }
  kafka {
    bootstrap_servers => "localhost:9092"    # the Kafka broker this producer writes to
    topic_id => "kerry"                      # the Kafka topic written to
    compression_type => "snappy"
    codec => plain {
            format => "%{message}"
        }
  }
}

Restart logstash

[[email protected] bin]# cd /usr/share/logstash/bin
[[email protected] bin]# ps -ef|grep logstash 
#Kill process
[[email protected] bin]# nohup ./logstash -f /etc/logstash/conf.d/logstash.conf >admin.out &

Call the earlier test interface http://47.96.238.21:9000/springbootDemo/hello again with POST, this time with the request body "this is a test of Kafka", and then check the messages received from the topic:

[[email protected] bin]#./kafka-console-consumer.sh --zookeeper localhost:2181 --topic kerry --from-beginning
{"status":"200","requestTime":"2018-10-12 09:40:02","responseTime":"2018-10-12 09:40:02","requestURL":" http://47.96.238.21 : 9000 / springboot demo / Hello "," method ": / springboot demo / Hello", "requestcontent": "this is a test of Kafka", "responsecontent": "Hello! Spring boot!"}

The pushed log message is received successfully.

3. Consumer: springboot programming

Logs are now continuously pushed to Kafka, so a consumer needs to subscribe to these messages and write them to the database. I wrote a Spring Boot program to subscribe to the Kafka logs; the important parts of the code are as follows:
1. application.yml

spring:
  # kafka
  kafka:
    #Kafka server address (multiple)
    bootstrap-servers: 47.96.238.21:9092
    consumer:
      #Specify a default group name
      group-id: kafka1
      #Earliest: when there are committed offsets under each partition, consumption starts from the committed offset; when there is no committed offset, consumption starts from the beginning
      #Latest: when there are committed offsets under each partition, the consumption starts from the committed offset; when there is no committed offset, the newly generated data under the partition is consumed
      #None: when every partition has a committed offset, consumption starts from the committed offset; if any partition has no committed offset, an exception is thrown
      auto-offset-reset: earliest
      #Deserialization of key / value
      key-deserializer: org.apache.kafka.common.serialization.StringDeserializer
      value-deserializer: org.apache.kafka.common.serialization.StringDeserializer
    producer:
      #Serialization of key / value
      key-serializer: org.apache.kafka.common.serialization.StringSerializer
      value-serializer: org.apache.kafka.common.serialization.StringSerializer
      #Batch capture
      batch-size: 65536
      #Cache capacity
      buffer-memory: 524288
      #Server address
      bootstrap-servers: 47.96.238.21:9092

2. POM.xml

        <dependency>
            <groupId>org.springframework.kafka</groupId>
            <artifactId>spring-kafka</artifactId>
            <version>1.0.6.RELEASE</version>
        </dependency>

3. KafkaController.java

package df.log.kafka.nginxlog.controller;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.springframework.boot.autoconfigure.EnableAutoConfiguration;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

import javax.naming.InitialContext;
import javax.sql.DataSource;
import java.sql.Connection;


@RestController
@EnableAutoConfiguration
public class KafkaController {

    @RequestMapping("/hello")
    public String hello(){
        return "Hello!Kerry. This is NginxLog program";
    }
    /**
     * Listen for log messages from the kerry topic
     */
    @KafkaListener(topics = "kerry" )
    public void receive(ConsumerRecord<?, ?> consumer) {
        //kafkaLog is the log message obtained from the topic
        String kafkaLog = (String) consumer.value();
        System.out.println("Received a message: " + kafkaLog);
        //Code that stores the log in the database is omitted here (see the sketch below)
    }

}
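
The database write in receive() is omitted above. As one possible completion, here is a hedged sketch using Spring's JdbcTemplate, assuming spring-boot-starter-jdbc and a configured DataSource; the interface_log table, its columns and the SQL dialect are assumptions and not part of the original demo.

package df.log.kafka.nginxlog.service;   // hypothetical companion class for this illustration

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.stereotype.Service;

@Service
public class NginxLogService {

    @Autowired
    private JdbcTemplate jdbcTemplate;

    // kafkaLog is the raw JSON line produced by the nginx gateway; it is stored
    // as-is here, but could equally be parsed first and mapped to separate columns.
    public void save(String kafkaLog) {
        jdbcTemplate.update(
                "INSERT INTO interface_log (log_json, created_at) VALUES (?, NOW())",
                kafkaLog);
    }
}

With such a component in place, receive() would simply call nginxLogService.save(kafkaLog) after printing the message.
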

Once the program is deployed, @KafkaListener(topics = "kerry") keeps listening for messages on the kerry topic. If we call the earlier test interface again, the new interface logs are picked up continuously, printed to the console and stored in the database.


End

This document records the demo process, and many parts are still immature, for example: obtaining more complete log information with nginx + Lua, further processing of the logs in Logstash, writing cleaner Spring Boot code for the database writes, and making better use of Kibana's charts.
The next step is to build the log platform formally in the project's production environment. We already have a Rancher environment, and the plan is to implement this architecture as microservices; follow-up build documents will be added as we go.