Understanding Telegraf, a powerful tool for metrics collection

Time:2021-9-18

Author: Jiang Mingming
Source: Erda official account

Reading guide: To help you better understand the design and implementation of the APM system in MSP, we are writing a series of articles on microservice observability that looks in depth at the product, architecture design, and underlying technology of the APM system. This is the third article in the series. It introduces how Telegraf's data processing pipeline works and how its plug-ins are implemented.


Telegraf is a very popular open-source metrics collection agent from InfluxData, with more than ten thousand stars on GitHub. With the help of the community, it has more than 200 input (collection) plug-ins and more than 40 output (export) plug-ins, covering almost every kind of monitoring, such as machine monitoring, service monitoring, and even hardware monitoring.

Architecture design

Pipeline concurrent programming


In Go, the pipeline is a common concurrency pattern. In short, a pipeline is a series of stages connected by channels, where each stage is a group of goroutines running the same function.

In each stage, the goroutines are responsible for the following:

  1. Receive the data generated by the upstream stage through the entry channel.
  2. Process data, such as format conversion, data filtering, aggregation, etc.
  3. Send the processed data to the downstream stage through the exit channel.

Each stage can have any number of entry and exit channels, except the first stage, which has only exit channels, and the last stage, which has only entry channels.
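The three responsibilities above can be sketched as a small three-stage pipeline. This is a minimal illustration rather than Telegraf code: the function names generate, normalize, and collect are made up for this example, and each stage runs a single goroutine to keep it short.

```go
package main

import (
	"fmt"
	"strings"
)

// generate is the first stage: it has no entry channel, only an exit channel.
func generate(lines ...string) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out) // closing tells the downstream stage there is no more data
		for _, l := range lines {
			out <- l
		}
	}()
	return out
}

// normalize is a middle stage: it receives from upstream, filters and
// converts the data, and sends the result downstream.
func normalize(in <-chan string) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		for l := range in {
			l = strings.TrimSpace(l)
			if l == "" {
				continue // data filtering: drop empty records
			}
			out <- strings.ToLower(l) // format conversion
		}
	}()
	return out
}

// collect is the last stage: it has only an entry channel.
func collect(in <-chan string) []string {
	var got []string
	for l := range in {
		got = append(got, l)
	}
	return got
}

func main() {
	fmt.Println(collect(normalize(generate(" CPU ", "", "Mem")))) // prints [cpu mem]
}
```

Because every stage closes its exit channel when it finishes, the downstream range loops end on their own and no extra shutdown signal is needed.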

Implementation in Telegraf


Telegraf adopts this pattern. Its pipeline has four main stages: inputs, processors, aggregators, and outputs.

  • Inputs: collect the raw monitoring metrics, either by actively pulling them or by passively receiving them.
  • Processors: process the data collected by inputs, for example deduplication, renaming, and format conversion.
  • Aggregators: aggregate the data processed by processors and compute the aggregate values.
  • Outputs: receive the data emitted by processors or aggregators and export it to other destinations, such as files and databases.

The stages are likewise connected to each other by channels. The architecture is as follows:

[Figure: Telegraf architecture diagram]

As the diagram shows, Telegraf as a whole follows the pipeline pattern. Here is a brief description of how it runs:

  • The first stage is inputs. Each input runs in its own goroutine, collects data, and fans in to a shared channel.
  • The second stage is processors. Each processor runs in its own goroutine, and the processors are chained in order through channels.
  • The third stage is aggregators. Each aggregator runs in its own goroutine, and the data produced by processors is fanned out to each aggregator.
  • The last stage is outputs. Each output runs in its own goroutine, and the data produced by processors or aggregators is fanned out to each output.

Fan-in: multiple functions write to the same channel, and one function reads from that channel until it is closed.

Fan-out: multiple functions read from the same channel until it is closed.
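The two definitions above can be illustrated with a short sketch. It is not taken from Telegraf: the helpers emit, fanIn, and fanOut are hypothetical names chosen for this example. Note that in this sketch each value is consumed by exactly one reader, so the workers share the load rather than each receiving a copy of every value.

```go
package main

import (
	"fmt"
	"sync"
)

// emit is a small source stage that sends the given values and then closes.
func emit(vals ...int) <-chan int {
	out := make(chan int)
	go func() {
		defer close(out)
		for _, v := range vals {
			out <- v
		}
	}()
	return out
}

// fanIn: several goroutines write to one channel. The merged channel is
// closed only after every source has been drained.
func fanIn(sources ...<-chan int) <-chan int {
	out := make(chan int)
	var wg sync.WaitGroup
	for _, src := range sources {
		wg.Add(1)
		go func(c <-chan int) {
			defer wg.Done()
			for v := range c {
				out <- v
			}
		}(src)
	}
	go func() {
		wg.Wait()
		close(out)
	}()
	return out
}

// fanOut: several goroutines read from the same channel until it is closed.
// Each worker sums the values it happened to receive; the partial sums are
// then added together.
func fanOut(in <-chan int, workers int) int {
	results := make(chan int, workers) // buffered so workers never block on send
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			sum := 0
			for v := range in {
				sum += v
			}
			results <- sum
		}()
	}
	wg.Wait()
	close(results)
	total := 0
	for s := range results {
		total += s
	}
	return total
}

func main() {
	merged := fanIn(emit(1, 2), emit(3, 4))
	fmt.Println(fanOut(merged, 3)) // prints 10
}
```

Which worker receives which value is not deterministic, but the total is, because every value is read exactly once.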

Plug-in design


With so many input, output, and processor plug-ins, how does Telegraf manage them efficiently? And how is the plug-in system designed to cope with ever-growing extension requirements? Let's go through it step by step.

In fact, a plug-in here is not a plug-in in the usual sense (a dynamic library that is loaded and bound at runtime), but a variant of the factory pattern. First, let's look at Telegraf's plug-in directory structure:

plugins
├── aggregators
│   ├── all
│   ├── basicstats
│   ├── registry.go
...
├── inputs
│   ├── all
│   ├── cpu
│   ├── registry.go
...
├── outputs
│   ├── all
│   ├── amqp
│   ├── registry.go
...
├── processors
│   ├── all
│   ├── clone
│   ├── registry.go
...


As shown above, the directory structure is regular. (Below we use input plug-ins as the example; the other modules are implemented similarly.)

  • plugins/inputs: contains one package directory per input plug-in.
  • plugins/inputs/all: imports every plug-in package (mainly to avoid circular imports).
  • plugins/inputs/registry.go: holds the registry and its related functions.

Interface declaration


Telegraf declares an Input interface that every input plug-in must implement. Its central method is Gather, which Telegraf calls at each collection interval so that the plug-in can hand its metrics to an accumulator.

Interface implementation


Create a plug-in package under plugins/inputs/, such as cpu, and implement the Input interface in it.

Register plug-ins


Finally, we only need to register the plug-in's factory function in the global registry.

In this way, the many plug-ins are managed in an orderly manner, and extending Telegraf is easy: you only need to implement the Input interface and register a factory function.
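The three steps (declare the interface, implement it, register a factory) can be sketched in a single file. This is a simplified illustration under stated assumptions, not Telegraf's source: the Accumulator and Input types here are reduced stand-ins for Telegraf's interfaces, the CPU plug-in reports a hard-coded value, and memAcc and gatherAll are hypothetical helpers added only so the example runs. In Telegraf itself these pieces live in separate packages, and registration typically happens in each plug-in package's init function.

```go
package main

import "fmt"

// Accumulator is a simplified stand-in for the sink that plug-ins write to.
type Accumulator interface {
	AddFields(measurement string, fields map[string]interface{}, tags map[string]string)
}

// Input is a simplified stand-in for the interface every input implements.
type Input interface {
	Description() string
	Gather(acc Accumulator) error
}

// Creator is the factory function type stored in the registry.
type Creator func() Input

// registry maps a plug-in name to its factory (the role of registry.go).
var registry = map[string]Creator{}

// Add registers a factory under the given name.
func Add(name string, creator Creator) { registry[name] = creator }

// CPU is a hypothetical plug-in that reports a fixed value.
type CPU struct{}

func (c *CPU) Description() string { return "Report CPU usage (hypothetical example)" }

func (c *CPU) Gather(acc Accumulator) error {
	acc.AddFields("cpu",
		map[string]interface{}{"usage_idle": 97.5},
		map[string]string{"cpu": "cpu-total"})
	return nil
}

// init runs when the package is imported, which is why importing a plug-in
// package is enough to make it available.
func init() { Add("cpu", func() Input { return &CPU{} }) }

// memAcc is a hypothetical accumulator that records measurement names.
type memAcc struct{ measurements []string }

func (m *memAcc) AddFields(measurement string, _ map[string]interface{}, _ map[string]string) {
	m.measurements = append(m.measurements, measurement)
}

// gatherAll builds each named plug-in through its factory and runs Gather once.
func gatherAll(names []string) ([]string, error) {
	acc := &memAcc{}
	for _, name := range names {
		creator, ok := registry[name]
		if !ok {
			return nil, fmt.Errorf("unknown input plug-in %q", name)
		}
		if err := creator().Gather(acc); err != nil {
			return nil, err
		}
	}
	return acc.measurements, nil
}

func main() {
	got, err := gatherAll([]string{"cpu"})
	fmt.Println(got, err) // prints [cpu] <nil>
}
```

The caller only ever looks plug-ins up by name, so adding a new plug-in does not require changing the code that runs them.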

Application in Erda


In Erda, we use Telegraf as the platform's metrics collection service and deploy it as a daemon on every physical machine. It is now widely used in production, runs stably on thousands of machines, and collects and reports a large volume of metrics that SREs and operations staff use for analysis and troubleshooting.

Because of some special requirements, we have had to do secondary development on top of Telegraf to fit our business needs. Even so, thanks to Telegraf's powerful plug-in system, we usually only need to add plug-ins: for example, an output plug-in that reports to our own collector, and an input plug-in that checks the health of Erda's own components.

In the future, we will gradually drop our secondary development, embrace open source, and stay as consistent as possible with the official open-source version of Telegraf, so that we can give back to the community.


If you have any questions, add our assistant on WeChat (erda202106) to join the discussion group.
