Consul 1.7 Multi Data Center: A HashiCorp Learning Guide

Time: 2020-11-26

More services, more problems

  1. With the popularity of distributed systems and microservices on the Internet, how do you improve scalability and elasticity between services?
  2. How do you minimize the impact on dependent services when a service on the server side changes?
  3. How does a client discover a service without knowing its address in advance?
  4. When load balancing is implemented with nginx, how do you reduce configuration changes and reloads of the nginx service for the backends behind it?
  5. If Prometheus is used to monitor containers or services, do you really need to write a lot of configuration? How do you monitor 1,000 services? When some services are decommissioned, do you have to remove them from the monitoring configuration one by one?

With the trend toward microservices, naming services and service discovery have become more and more popular as a way to maximize the flexibility of scaling out and in. The mainstream service discovery components today are Consul, etcd and ZooKeeper. Their differences are not covered here; you can compare them on the official websites.

What is Consul

Consul is an open source tool developed by HashiCorp that provides service discovery and a registry for distributed systems. Compared with other service registration and discovery solutions, Consul is more "one-stop": it has built-in service registration and discovery, a distributed consensus protocol implementation, health checks, key/value storage, access control and a multi data center solution, so it no longer needs to rely on other tools (such as ZooKeeper). It is also easy to use. Consul is written in the Go language, so it is naturally portable (supporting Linux, Windows and Mac OS X); the installation package contains only a single executable, which makes deployment easy, and it works seamlessly with lightweight containers such as Docker.

Consul advantages

  1. It uses the Raft consensus algorithm, which is simpler than Paxos; ZooKeeper uses a Paxos-like protocol (ZAB), while etcd, like Consul, uses Raft.
  2. It supports multiple data centers, with the internal (LAN) and external (WAN) services listening on different ports. A multi data center cluster avoids a single data center being a single point of failure, although its deployment has to take network latency, partitioning and so on into account. ZooKeeper and etcd do not offer multi data center support.
  3. It supports health checks. etcd does not provide this feature.
  4. It supports both HTTP and DNS interfaces (see the sketch after this list). Integrating ZooKeeper is complex, and etcd only supports HTTP.
  5. An official web management UI is provided; etcd has no such feature.
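
As a quick illustration of point 4, here is a minimal sketch of querying a registered service over both interfaces. It assumes a local agent with the default ports (8500 for HTTP, 8600 for DNS) and a hypothetical service named "web"; neither is part of the setup built later in this guide.

# DNS interface: resolve the hypothetical "web" service (SRV records also carry the port)
dig @127.0.0.1 -p 8600 web.service.consul SRV
# HTTP interface: the same information from the catalog API
curl http://127.0.0.1:8500/v1/catalog/service/web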

Consul service architecture and core concepts

(Figure: Consul service architecture)

The servers in the figure form the highly available cluster of Consul servers, and the clients are Consul clients. A client does not store data; it forwards the requests it receives to a server. Data consistency is achieved through LAN or WAN communication between the servers. Every server or client is a Consul agent; in other words, server and client are simply different roles played by the agent.
Consul uses two different gossip pools, referred to as the LAN pool and the WAN pool respectively. Each Consul data center has a LAN gossip pool containing all of its members (servers and clients).

The LAN pool has the following purposes:

  1. Membership information allows clients to automatically discover server nodes, reducing the amount of configuration required
  2. Distributed failure detection allows the work of failure detection to be shared across the cluster instead of being concentrated on a few servers
  3. Gossip allows reliable and fast event broadcasting, for example when a leader election takes place

The WAN pool is globally unique: all servers join the WAN pool, regardless of which data center they belong to. The WAN pool provides membership information so that server nodes can perform cross data center requests. In other words, unlike the LAN pool, its purpose is to let data centers discover each other in a low-touch way: when a server receives a request destined for a different data center, it can forward the request to a server in that data center.

In each data center, clients and servers are mixed. Generally 3 to 5 servers are recommended. This is a trade-off between availability and performance in the event of failures, because consensus gets slower as more machines join. There is no limit on the number of clients, however, and they can easily scale to thousands or tens of thousands.
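
As a side note, a client agent is started the same way as the servers below but without the -server flag. A minimal sketch, assuming a hypothetical client node 192.168.99.140 (not part of the lab environment below) joining the BJ data center:

# hypothetical client agent; -retry-join keeps retrying until the server is reachable
consul agent -data-dir=/tmp/consul -node=192.168.99.140 -bind=192.168.99.140 -retry-join=192.168.99.128 -datacenter=BJ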

Build a Consul cluster

Environment  Server IP       Node name  Role    DC
ubuntu16.04  192.168.99.128  node1      Server  BJ
ubuntu16.04  192.168.99.129  node2      Server  BJ
ubuntu16.04  192.168.99.130  node3      Server  BJ
ubuntu16.04  192.168.99.131  node4      Server  SH

We use the latest version at the time of writing (1.7.2) for this walkthrough:

wget https://releases.hashicorp.com/consul/1.7.2/consul_1.7.2_linux_amd64.zip
apt-get install unzip && unzip consul_1.7.2_linux_amd64.zip && mv consul /usr/local/bin/
for ip in 129 130 131
do
  scp /usr/local/bin/consul root@192.168.99.$ip:/usr/local/bin/
done
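
If you want to confirm that every node has the binary in place before starting any agents, a quick loop like the one above will do; this sketch assumes passwordless SSH as root to the same hosts:

# print the installed Consul version on every node
for ip in 128 129 130 131
do
  ssh root@192.168.99.$ip consul version
done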

With the Consul binary in place, we run a Consul server instance on each machine:

#node1
consul agent -server -bootstrap-expect=3 -data-dir=/tmp/consul -node=192.168.99.128 -bind=192.168.99.128 -client=0.0.0.0 -datacenter=BJ -ui
#node2
consul agent -server -bootstrap-expect=3 -data-dir=/tmp/consul -node=192.168.99.129 -bind=192.168.99.129 -client=0.0.0.0 -datacenter=BJ -ui
#node3
consul agent -server -bootstrap-expect=3 -data-dir=/tmp/consul -node=192.168.99.130 -bind=192.168.99.130 -client=0.0.0.0 -datacenter=BJ -ui

I usually just check the command options with --help; for completeness, here is what the ones above mean:

  1. -server: start the agent as a server; the default mode is client
  2. -bootstrap-expect: the number of servers the cluster expects; the cluster will not bootstrap (elect a leader) until this many servers have joined
  3. -data-dir: the directory where data is stored. For more detail, see Consul's data synchronization mechanism
  4. -node: the node name. Each node in the cluster must have a unique name; by default Consul uses the machine's hostname
  5. -bind: the IP address to listen on. It defaults to 0.0.0.0, so it can be left unspecified. This is the address Consul listens on and it must be reachable by every other node in the cluster. By default Consul listens on the first private IP, but it is better to provide one explicitly; production servers usually have several network interfaces, so specifying one does no harm
  6. -client: the address the client interfaces bind to. 0.0.0.0 means anyone can access them (without this, the web UI on port 8500 below cannot be reached from other machines)
  7. -ui: enables the built-in web UI
  8. -config-dir: specifies a configuration directory; Consul loads all the files in it (see the sketch after this list)
  9. -datacenter: specifies the data center name. The default is dc1
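
As mentioned for -config-dir, all of these options can also be kept in configuration files instead of on the command line. A minimal sketch for node1, assuming a hypothetical /etc/consul.d directory; the JSON keys mirror the flags used above:

# write a server configuration file and start the agent from the config directory
mkdir -p /etc/consul.d
cat > /etc/consul.d/server.json <<'EOF'
{
  "server": true,
  "bootstrap_expect": 3,
  "data_dir": "/tmp/consul",
  "node_name": "192.168.99.128",
  "bind_addr": "192.168.99.128",
  "client_addr": "0.0.0.0",
  "datacenter": "BJ",
  "ui": true
}
EOF
consul agent -config-dir=/etc/consul.d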
When we start these Consul server instances, we will see error logs. This is because at this point the three machines have not yet joined each other, so they do not form a cluster, and Consul on the three machines cannot work normally because no leader has been elected:
    2020-04-13T08:55:43.821-0400 [INFO]  agent.server.raft: entering follower state: follower="Node at 192.168.99.128:8300 [Follower]" leader=
    2020-04-13T08:55:43.823-0400 [INFO]  agent.server: Adding LAN server: server="192.168.99.128 (Addr: tcp/192.168.99.128:8300) (DC: bj)"
    2020-04-13T08:55:43.823-0400 [INFO]  agent.server: Handled event for server in area: event=member-join server=192.168.99.128.bj area=wan
    2020-04-13T08:55:43.824-0400 [INFO]  agent: Started DNS server: address=0.0.0.0:8600 network=tcp
    2020-04-13T08:55:43.825-0400 [INFO]  agent: Started HTTP server: address=[::]:8500 network=tcp
    2020-04-13T08:55:43.825-0400 [INFO]  agent: started state syncer
==> Consul agent running!
    2020-04-13T08:55:50.370-0400 [WARN]  agent.server.raft: no known peers, aborting election
    2020-04-13T08:55:50.833-0400 [ERROR] agent.anti_entropy: failed to sync remote state: error="No cluster leader"
    2020-04-13T08:56:07.711-0400 [ERROR] agent: Coordinate update error: error="No cluster leader"
    2020-04-13T08:56:18.200-0400 [ERROR] agent.anti_entropy: failed to sync remote state: error="No cluster leader"
    2020-04-13T08:56:37.477-0400 [ERROR] agent: Coordinate update error: error="No cluster leader"
    2020-04-13T08:56:50.035-0400 [ERROR] agent.anti_entropy: failed to sync remote state: error="No cluster leader"
    2020-04-13T08:57:13.322-0400 [ERROR] agent: Coordinate update error: error="No cluster leader"
    2020-04-13T08:57:27.154-0400 [ERROR] agent.anti_entropy: failed to sync remote state: error="No cluster leader"
    2020-04-13T08:57:38.397-0400 [ERROR] agent: Coordinate update error: error="No cluster leader"
    2020-04-13T08:57:58.613-0400 [ERROR] agent.anti_entropy: failed to sync remote state: error="No cluster leader"

When a Consul agent starts, it does not know about any other node. To learn about the rest of the cluster, the agent must join an existing cluster, and to do that it only needs to know a single node in that cluster. After joining, it gossips with that member and quickly discovers the other nodes in the cluster. A Consul agent can join any other agent, not just agents running in server mode. So we join 192.168.99.129 and 192.168.99.130 to 192.168.99.128:

# node2
consul join 192.168.99.128
# node3
consul join 192.168.99.128
# At this point the nodes have joined the cluster
root@node1:~# consul members
Node            Address              Status  Type    Build  Protocol  DC  Segment
192.168.99.128  192.168.99.128:8301  alive   server  1.7.2  2         bj  <all>
192.168.99.129  192.168.99.129:8301  alive   server  1.7.2  2         bj  <all>
192.168.99.130  192.168.99.130:8301  alive   server  1.7.2  2         bj  <all>
root@node1:~# consul operator raft list-peers
Node            ID                                    Address              State     Voter  RaftProtocol
192.168.99.128  9095c165-7f5f-6892-e9bc-722c3a08ebf0  192.168.99.128:8300  leader    true   3
192.168.99.129  7234405d-cde5-e0ef-56b1-55e958de5b6c  192.168.99.129:8300  follower  true   3
192.168.99.130  8bbc7729-e41c-e548-c9a8-9bf9c01fdb54  192.168.99.130:8300  follower  true   3
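
The same leader and peer information is also exposed over the HTTP API, which can be handy in scripts; a minimal sketch against the local agent on its default port:

# Raft leader and peer set via the status endpoints
curl http://127.0.0.1:8500/v1/status/leader
curl http://127.0.0.1:8500/v1/status/peers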

Consul supports multiple data centers. As mentioned above, nodes in different data centers discover each other through the WAN pool, so let us add a Consul server node for the SH data center:

#node4 (the only server in the SH data center, so it expects just one server)
consul agent -server -bootstrap-expect=1 -data-dir=/tmp/consul -node=192.168.99.131 -bind=192.168.99.131 -datacenter=SH -ui
root@node4:~# consul join -wan 192.168.99.128
Successfully joined cluster by contacting 1 nodes.
root@node4:~# consul members -wan
Node               Address              Status  Type    Build  Protocol  DC  Segment
192.168.99.128.bj  192.168.99.128:8302  alive   server  1.7.2  2         bj  <all>
192.168.99.129.bj  192.168.99.129:8302  alive   server  1.7.2  2         bj  <all>
192.168.99.130.bj  192.168.99.130:8302  alive   server  1.7.2  2         bj  <all>
192.168.99.131.sh  192.168.99.131:8302  alive   server  1.7.2  2         sh  <all>

With this, the multi data center cluster is configured. We can see the cluster information in Consul's web UI.
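
To confirm that cross data center forwarding over the WAN pool works as described, you can ask a BJ server for data that lives in SH. A minimal sketch, run on node1 and assuming the default local ports:

# HTTP: the dc parameter makes the BJ server forward the request to the SH servers
curl "http://127.0.0.1:8500/v1/catalog/nodes?dc=sh"
# DNS: the data center can be part of the name; this resolves the Consul servers in SH
dig @127.0.0.1 -p 8600 consul.service.sh.consul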

(Screenshots: the Consul web UI showing the cluster nodes in both data centers)
