Detailed explanation of filebeat core configuration


Filebeat principle

Official description:


Filebeat consists of two main components: inputs and harvesters. An input is responsible for finding files (similar to the find command) and for managing harvesters. Each harvester corresponds to exactly one file, reads it line by line, and sends the content to the output (similar to tail -f).

Log input configuration details

Official description:

Basic examples

filebeat.inputs:
- type: log
  paths:
    - /var/log/system.log
    - /var/log/wifi.log
- type: log
  paths:
    - "/var/log/apache2/*"
  fields:
    apache: true
  fields_under_root: true

Multiple input blocks can be configured under inputs.
Each input can list multiple files under paths. Both directory names and file names support glob (wildcard) patterns.

ignore_older and scan_frequency

Scenario problems

Scenario 1: the path may contain many historical files, for example when daily log rotation is configured. Clearly, the old files do not need to be collected.
Scenario 2: how can the scanning frequency be controlled? If the glob patterns are complex, scanning the files too often is a significant overhead.


Scenario 1 is solved with the ignore_older parameter, which specifies how old a file must be before it is no longer scanned.
For example, if it is set to 1h, files last modified more than 1h ago are not collected by the input until new log lines are written to them.

Scenario 2 is controlled with the scan_frequency parameter, which specifies how often the paths are scanned for new files.
For example, with the default of 10s, a new file is discovered within 10s of being created. Likewise, an old file (older than ignore_older) is picked up again within 10s after a new log line is written to it.
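
As a minimal sketch of the two scenarios, a log input could combine both parameters as follows. The path /var/log/myapp/*.log is a hypothetical example, and the values are illustrative rather than recommendations.

```yaml
filebeat.inputs:
- type: log
  paths:
    # Hypothetical path, used only for illustration.
    - /var/log/myapp/*.log
  # Scenario 1: skip files last modified more than 1 hour ago.
  ignore_older: 1h
  # Scenario 2: scan the paths for new or updated files every 10 seconds (the default).
  scan_frequency: 10s
```

A larger scan_frequency reduces the scanning overhead at the cost of discovering new files later.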

close_* and clean_*

Does a harvester hold on to the file it has opened forever? What happens when a file is renamed or deleted?

close_* options

The close_* configuration options are used to close the harvester after a certain criteria or time. Closing the harvester means closing the file handler.

close_inactive: how long a file may go without new content before its handle is closed. For example, close the file handle of a log file after 10 minutes in which no new content was read.

This time does not depend on the file's last modification time. It is tracked internally by Filebeat as the difference between the last time the file was read and the current read attempt.

The official recommendation is to set this well above the interval at which the file receives new data, roughly an order of magnitude higher (default: 5m). For example, if logs are written every second, the value can be set to 1m.

close_renamed: whether to close the file handle when the file is renamed.

close_removed: enabled by default.
When a file is deleted, its file handle is closed.

This matches the usual case: log cleanup jobs typically delete logs that are many days old, an age far greater than ignore_older and close_inactive.
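
The close_* options described above might be set on a log input like this. It is only a sketch: the path is hypothetical, and the 1m value assumes a file that is written to about once per second, as in the example above.

```yaml
filebeat.inputs:
- type: log
  paths:
    # Hypothetical path, used only for illustration.
    - /var/log/myapp/*.log
  # Close the file handle after 1 minute without new content being read.
  close_inactive: 1m
  # Close the file handle when the file is renamed (for example by log rotation).
  close_renamed: true
  # Close the file handle when the file is deleted.
  close_removed: true
```

If a closed file receives new lines later, it is picked up again on the next scan, so closing a handle early delays collection rather than losing data, provided the file still matches the configured paths.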

clean_* options

The clean_* options are used to clean up the state entries in the registry file.

Filebeat internally records the state of many files and saves it in data/registry/filebeat/data.json. If the entries are never cleaned up, this file keeps growing and affects efficiency. A single entry looks like this:

{
    "source": "/xxx/logs/logFile.2021-09-20.log",
    "offset": 661620031,
    "timestamp": "2021-09-21T00:04:23.050179808+08:00",
    "ttl": 10800000000000,
    "type": "log",
    "meta": null,
    "FileStateOS": {
        "inode": 184559118,
        "device": 2056
    }
}

clean_inactive: how long a file must be inactive before its state is removed from the registry. The default value is 0, which disables this cleanup.

A state may only be removed once the file is really inactive, so this value must be greater than ignore_older + scan_frequency.
Otherwise, if the file is discovered again after its state has been cleaned, it is read again from the beginning and its content is sent twice.

clean_removed: whether to remove the registry state after the file is deleted. It is enabled by default.
Its value must be consistent with close_removed.
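
To make the timing constraint concrete, here is a sketch of an input that enables registry cleanup. The path and the 2h/3h values are assumptions chosen for illustration; the only requirement taken from the text is that clean_inactive exceeds ignore_older + scan_frequency.

```yaml
filebeat.inputs:
- type: log
  paths:
    # Hypothetical path, used only for illustration.
    - /var/log/myapp/*.log
  scan_frequency: 10s
  ignore_older: 2h
  # 2h + 10s = 7210s, which is less than 3h = 10800s, so the constraint holds.
  clean_inactive: 3h
  # Keep these two consistent: both enabled or both disabled.
  close_removed: true
  clean_removed: true
```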

To summarize, the time settings should satisfy:
clean_inactive > ignore_older + scan_frequency > close_inactive

Recommended configuration:

tail_files: false
scan_frequency: 10s
ignore_older: 60m
close_inactive: 10m
close_renamed: true
close_removed: true
clean_inactive: 70m
clean_removed: true

Resource constraints

When there are many logs and the machine is already under high load, Filebeat adds to that load. In production it is advisable to limit the resources Filebeat may use:
max_procs sets the maximum number of CPU cores Filebeat can use at the same time. By default it uses all available cores. Limiting it to 1-4 cores, depending on the machine, does not noticeably affect shipping throughput.
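
For instance, the limit might be set as below. max_procs is a general setting placed at the top level of filebeat.yml, not inside an input; the value 2 is only an illustrative choice within the 1-4 range mentioned above.

```yaml
# filebeat.yml, top level (not inside an input block)
# Allow Filebeat to use at most 2 CPU cores at the same time.
max_procs: 2
```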

Automatic configuration reloading

Official description:

filebeat.config.inputs:
  enabled: true
  path: configs/*.yml
  reload.enabled: true
  reload.period: 10s

The specific input configuration file is placed in the configs folder, for example:

- type: log
  paths:
    - /var/log/messages
    - /var/log/*.log