Alertmanager cluster setup

Time: 2021-08-15

1、 Alertmanager cluster setup

1. Background

If a single-node Alertmanager goes down, no alerts can be sent at all, which is a real risk. We therefore need a highly available Alertmanager deployment.

This post records how to build a three-node Alertmanager cluster.

2. Machines

Machine      Cluster port   Web port
127.0.0.1    9083           9082
127.0.0.1    9085           9084
127.0.0.1    9087           9086

3. Available cluster configuration flags

To create a highly available Alertmanager cluster, the instances need to be configured to communicate with each other. This is configured using the --cluster.* flags.

  • --cluster.listen-address string: cluster listen address (default “0.0.0.0:9094”; an empty string disables HA mode). This is the address the cluster service listens on.
  • --cluster.advertise-address string: cluster advertise address
  • --cluster.peer value: initial peers (repeat the flag for each additional peer). These are the cluster addresses of the other instances to contact at startup.
  • --cluster.peer-timeout value: peer timeout period (default “15s”)
  • --cluster.gossip-interval value: cluster message propagation speed (default “200ms”)
  • --cluster.pushpull-interval value: lower values will increase convergence speeds at expense of bandwidth (default “1m0s”)
  • --cluster.settle-timeout value: maximum time to wait for cluster connections to settle before evaluating notifications.
  • --cluster.tcp-timeout value: timeout value for tcp connections, reads and writes (default “10s”)
  • --cluster.probe-timeout value: time to wait for ack before marking node unhealthy (default “500ms”)
  • --cluster.probe-interval value: interval between random node probes (default “1s”)
  • --cluster.reconnect-interval value: interval between attempting to reconnect to lost peers (default “10s”)
  • --cluster.reconnect-timeout value: length of time to attempt to reconnect to a lost peer (default: “6h0m0s”)

The chosen port in the cluster.listen-address flag is the port that needs to be specified in the cluster.peer flag of the other peers.

The cluster.advertise-address flag is required if the instance doesn’t have an IP address that is part of RFC 6890 with a default route.
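
The relationship between the listen address and the peer flags can be sketched as a small shell helper. This is only an illustration: the function name cluster_flags is hypothetical, and it assumes that every node runs on 127.0.0.1 as in this post. It lists every other node as a peer, which is a common choice so that startup does not depend on one particular seed node.

```shell
#!/bin/sh
# Sketch: print the --cluster.* flags for one node of a local cluster.
# cluster_flags is a hypothetical helper, not part of Alertmanager.
# Usage: cluster_flags OWN_PORT ALL_CLUSTER_PORTS...
cluster_flags() {
    own="$1"
    shift
    # The node listens for cluster traffic on its own port.
    flags="--cluster.listen-address=0.0.0.0:${own}"
    for port in "$@"; do
        # The listen port of every other node becomes a --cluster.peer entry.
        if [ "$port" != "$own" ]; then
            flags="${flags} --cluster.peer=127.0.0.1:${port}"
        fi
    done
    printf '%s\n' "$flags"
}

# Flags for the node whose cluster port is 9085.
cluster_flags 9085 9083 9085 9087
```

The printed flags can be appended to the alertmanager command line in place of the hand-written --cluster.* options.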

The flag descriptions above are taken from the Alertmanager repository on GitHub. Address: https://github.com/prometheus…

4. Alertmanager startup script

1. Startup script for the node on 127.0.0.1:9083

nohup /Users/huan/soft/prometheus/alertmanager-0.21.0/alertmanager \
--config.file="/Users/huan/soft/prometheus/alertmanager-0.21.0/alertmanager.yml" \
--web.listen-address="0.0.0.0:9082" \
--data.retention=48h \
--storage.path="/Users/huan/soft/prometheus/alertmanager-0.21.0/data" \
--cluster.listen-address="0.0.0.0:9083" \
--log.level=debug \
> logs/alertmanager.out 2>&1 &

2. Startup script for the node on 127.0.0.1:9085

# All nodes run on the same host, so this node gets its own data directory and log file.
nohup /Users/huan/soft/prometheus/alertmanager-0.21.0/alertmanager \
--config.file="/Users/huan/soft/prometheus/alertmanager-0.21.0/alertmanager.yml" \
--web.listen-address="0.0.0.0:9084" \
--data.retention=48h \
--storage.path="/Users/huan/soft/prometheus/alertmanager-0.21.0/data9085" \
--cluster.listen-address="0.0.0.0:9085" \
--cluster.peer="127.0.0.1:9083" \
--log.level=debug \
> logs/alertmanager-9085.out 2>&1 &

3. Startup script for the node on 127.0.0.1:9087

nohup /Users/huan/soft/prometheus/alertmanager-0.21.0/alertmanager \
--config.file="/Users/huan/soft/prometheus/alertmanager-0.21.0/alertmanager.yml" \
--web.listen-address="0.0.0.0:9086" \
--data.retention=48h \
--storage.path="/Users/huan/soft/prometheus/alertmanager-0.21.0/data9087" \
--cluster.listen-address="0.0.0.0:9087" \
--cluster.peer="127.0.0.1:9083" \
--log.level=debug \
> logs/alertmanager-9087.out 2>&1 &
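
Once the three scripts have been run, each node can be probed over its web port. The sketch below assumes the /-/ready endpoint that Alertmanager exposes and the web ports from the table in section 2. The helper names ready_url and check_nodes are hypothetical, and curl is assumed to be installed.

```shell
#!/bin/sh
# Web ports of the three nodes, taken from the table in section 2.
WEB_PORTS="9082 9084 9086"

# ready_url (hypothetical helper) builds the readiness URL for one web port.
ready_url() {
    printf 'http://127.0.0.1:%s/-/ready\n' "$1"
}

# check_nodes probes every node; curl -f treats HTTP error codes as failures.
check_nodes() {
    for port in $WEB_PORTS; do
        if curl -fsS -o /dev/null "$(ready_url "$port")"; then
            echo "node on web port ${port}: ready"
        else
            echo "node on web port ${port}: NOT ready"
        fi
    done
}
```

Calling check_nodes after startup prints one line per node. A node that reports NOT ready is worth checking in its log file under logs/.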

5. Modify Prometheus configuration

Modify prometheus.yml so that Prometheus knows about every Alertmanager instance:

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - 127.0.0.1:9082
      - 127.0.0.1:9084
      - 127.0.0.1:9086

6. View cluster status

[Figure: Alertmanager cluster status]

At this point, the Alertmanager cluster is set up.
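
The cluster membership can also be checked from the command line. The following is a minimal sketch, assuming the /api/v2/status endpoint of Alertmanager, whose response lists the cluster peers with an address field each. The helper names cluster_status and count_peers are hypothetical, and the peer count is a rough text match rather than real JSON parsing.

```shell
#!/bin/sh
# cluster_status (hypothetical helper) fetches the status document
# from the node that serves its web UI on the given port.
cluster_status() {
    curl -fsS "http://127.0.0.1:${1}/api/v2/status"
}

# count_peers (hypothetical helper) reads that JSON on stdin and counts
# the "address" keys, which appear once per entry in cluster.peers.
# This is a plain text match, not a JSON parser.
count_peers() {
    grep -o '"address"' | wc -l | tr -d ' '
}
```

For example, `cluster_status 9082 | count_peers` should typically print 3 once all three nodes have joined. A lower number suggests that a node has not joined yet or that the cluster ports are not reachable.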

7. Precautions

1. If the instance does not have an IP address that is part of RFC 6890 with a default route, the --cluster.advertise-address flag is required.

2. With Alertmanager 0.15 or later, the cluster port must be reachable over both TCP and UDP.

3. Do not put a load balancer between Prometheus and Alertmanager. Configure Prometheus with the address of every Alertmanager instance.

4. The nodes in the cluster communicate with each other through the gossip protocol.

8. Alertmanager high-availability architecture diagram

[Figure: Alertmanager high-availability architecture diagram]

2、 Reference links

1、RFC 6890
2、Alertmanager cluster setup
3、https://www.bookstack.cn/read/prometheus-book/ha-alertmanager-high-availability.md