More services, more problems
- With the popularity of Internet-scale distributed systems and microservices, how do we improve the scalability and extensibility of inter-service communication?
- When a server-side service changes, how do we minimize the impact on the services that depend on it?
- How does a client discover and call a service without knowing in advance where it is deployed?
- When load balancing is done with nginx, how do we cut down the configuration changes and nginx reloads needed every time the backend services behind it change?
- If Prometheus is used to monitor containers or services, do you really have to write all that configuration by hand? How do you monitor 1,000 services? And when some of them are "optimized away", do you have to remove them from the monitoring configuration one by one?
In the move to microservices, naming services and service discovery have become more and more popular as the way to get maximum flexibility when scaling out and in. The mainstream service discovery components today are Consul, etcd, and ZooKeeper. Their differences are not explained here; you can compare them on each project's official site.
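To give a taste of what service discovery buys you for the Prometheus question above, here is a minimal sketch of a Prometheus scrape job that pulls its target list from a discovery backend such as Consul (the tool the rest of this article covers) instead of a hand-maintained list. The Consul address, job name, and the `metrics` tag are illustrative assumptions, not from this article:

```
# Minimal sketch: write a prometheus.yml fragment that discovers targets
# from Consul (assumes a Consul agent reachable at localhost:8500 and that
# the services to scrape carry a "metrics" tag).
cat > prometheus.yml <<'EOF'
scrape_configs:
  - job_name: consul-services            # illustrative job name
    consul_sd_configs:
      - server: 'localhost:8500'         # local Consul agent (assumed)
        tags: ['metrics']                # only scrape services with this tag
    relabel_configs:
      # expose the Consul service name as the "job" label
      - source_labels: ['__meta_consul_service']
        target_label: job
EOF
```

With something like this in place, services that register in (or disappear from) Consul show up in (or drop out of) Prometheus automatically, which is exactly the "1,000 services" problem above.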
What is Consul
Consul is an open-source tool from HashiCorp for service discovery and service registration in distributed systems. Compared with other service registration and discovery solutions, Consul is more of a "one-stop shop": it has a built-in service registration and discovery framework, a distributed-consensus protocol implementation, health checking, key/value storage, access control, and a multi-datacenter solution, so it does not need to be combined with other tools (such as ZooKeeper). It is also easy to use: Consul is written in Go, which makes it naturally portable (Linux, Windows, and macOS are supported); the distribution is a single executable, so deployment is simple; and it works seamlessly with lightweight containers such as Docker. Compared with the alternatives (a short example after the list below exercises several of these features):
- Consensus: Consul is built on the Raft algorithm, which is simpler to reason about than Paxos; ZooKeeper uses Paxos, while etcd also uses Raft.
- Multi-datacenter support, with the internal (LAN) and external (WAN) gossip traffic listening on different ports. A multi-datacenter cluster avoids a single datacenter becoming a single point of failure, although deploying one means taking network latency, sharding, and so on into account. ZooKeeper and etcd have no multi-datacenter features.
- Health checks are built in; etcd does not provide this feature.
- Both HTTP and DNS interfaces are supported; ZooKeeper is complex to integrate with, and etcd only offers an HTTP interface.
- An official web management UI is provided; etcd has no such feature.
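These features are easy to exercise against a single agent once the cluster built below is running. The following is a minimal sketch assuming a Consul agent is running locally with the default ports (8500 for HTTP, 8600 for DNS); the `web` service, its port, and the health-check URL are illustrative assumptions:

```
# Register an illustrative service with an HTTP health check
# (assumes something is listening on :8080/health).
cat > web.json <<'EOF'
{
  "service": {
    "name": "web",
    "port": 8080,
    "check": { "http": "http://localhost:8080/health", "interval": "10s" }
  }
}
EOF
consul services register web.json

# HTTP interface: list services, then only the healthy instances of "web"
curl 'http://127.0.0.1:8500/v1/catalog/services'
curl 'http://127.0.0.1:8500/v1/health/service/web?passing'

# DNS interface: resolve instances of the service
dig @127.0.0.1 -p 8600 web.service.consul SRV

# Built-in key/value store
consul kv put config/web/max_conns 100
consul kv get  config/web/max_conns
```

Because the DNS interface only returns instances that pass their health checks, even clients that can speak nothing but DNS still benefit from health checking.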
Consul service architecture and core concepts
The servers in the figure form the highly available cluster of Consul servers, and the clients are Consul clients. A client does not store data; it forwards the requests it receives to the servers. Data consistency is maintained by the servers communicating with one another over the LAN or WAN. Every server or client is a Consul agent; server and client are just different roles the agent can play.
Consul uses two different gossip pools, referred to as the LAN pool and the WAN pool. Each Consul datacenter has a LAN gossip pool containing all of its members, both servers and clients.
The LAN pool serves several purposes:
- Membership lets clients discover server nodes automatically, reducing the amount of configuration required.
- Distributed failure detection lets the work of detecting failures be shared by the whole cluster instead of being concentrated on a few servers.
- Gossip enables reliable and fast event broadcasting, for events such as leader election.
The WAN pool is globally unique: every server should join the WAN pool, regardless of which datacenter it belongs to. The WAN pool provides the membership information that lets server nodes carry out cross-datacenter requests. In other words, unlike the LAN pool, its purpose is to let datacenters discover each other in a low-touch way: when a server receives a request meant for a different datacenter, it can forward it to a server in that datacenter.
Within each datacenter there is a mix of clients and servers. Three to five servers are generally recommended; this is a trade-off between availability and performance in the presence of failures, because reaching consensus gets slower as more servers are added. There is no such limit on clients, which can easily scale to thousands or tens of thousands.
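The cluster built in the next section only runs server agents, but in a typical deployment each application host also runs a client agent that joins the LAN pool and forwards requests to the servers. A minimal sketch of such a client, assuming a host on the same 192.168.99.0/24 network (the client IP and node name here are made up):

```
# Run a client agent (no -server flag) on an application host and let it
# join the LAN pool; -retry-join keeps retrying until a server is reachable.
# 192.168.99.140 and "client1" are assumed values for illustration.
consul agent \
  -data-dir=/tmp/consul \
  -node=client1 \
  -bind=192.168.99.140 \
  -retry-join=192.168.99.128 \
  -datacenter=BJ
```

Applications then talk only to their local agent on localhost, and the agent takes care of finding the servers.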
Build a Consul cluster
| Environment | Server IP | Node name | Role | DC |
| --- | --- | --- | --- | --- |
| Linux | 192.168.99.128 | 192.168.99.128 | server | BJ |
| Linux | 192.168.99.129 | 192.168.99.129 | server | BJ |
| Linux | 192.168.99.130 | 192.168.99.130 | server | BJ |
| Linux | 192.168.99.131 | 192.168.99.131 | server | SH |
For this test we use the latest version available at the time of writing:
```
wget https://releases.hashicorp.com/consul/1.7.2/consul_1.7.2_linux_amd64.zip
apt-get install unzip && unzip consul_1.7.2_linux_amd64.zip && mv consul /usr/local/bin/
# copy the binary to the other nodes
for ip in 129 130 131
do
  scp /usr/local/bin/consul root@192.168.99.$ip:/usr/local/bin/
done
```
With the Consul binary in place on every machine, we start a Consul server instance on each of them:
```
# node1
consul agent -server -bootstrap-expect=3 -data-dir=/tmp/consul -node=192.168.99.128 -bind=192.168.99.128 -client=0.0.0.0 -datacenter=BJ -ui
# node2
consul agent -server -bootstrap-expect=3 -data-dir=/tmp/consul -node=192.168.99.129 -bind=192.168.99.129 -client=0.0.0.0 -datacenter=BJ -ui
# node3
consul agent -server -bootstrap-expect=3 -data-dir=/tmp/consul -node=192.168.99.130 -bind=192.168.99.130 -client=0.0.0.0 -datacenter=BJ -ui
```
You can always check the available options with `--help`; here is a brief explanation of the flags used above:
- `-server`: run the agent in server mode; the default is client mode.
- `-bootstrap-expect`: the number of server nodes the datacenter expects; the cluster will not bootstrap (elect a leader) until this many servers are available.
- `-data-dir`: the directory where the agent's state is stored; see Consul's data synchronization mechanism for details.
- `-node`: the node name; every node in the cluster must have a unique name. By default Consul uses the machine's hostname.
- `-bind`: the address to bind to for internal cluster communication; it must be reachable by every other node in the cluster. By default Consul binds to 0.0.0.0 and advertises the first private IP it finds, but it is better to specify one explicitly: production servers often have several network interfaces, so naming the address is never wrong.
- `-client`: the address on which the client interfaces (HTTP API, DNS, UI) listen. 0.0.0.0 means they are reachable from anywhere (without this, the web UI on port 8500 below would not be reachable from other machines).
- `-ui`: enables the built-in web UI.
- `-config-dir`: a directory of configuration files, all of which Consul loads (a config-file sketch follows this list).
- `-datacenter`: the datacenter name; the default is dc1.
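The same settings can also live in a configuration directory instead of a long command line, which is what `-config-dir` is for. Below is a minimal sketch of a config-file equivalent of the node1 command above; the /etc/consul.d path is an assumed convention, and the keys are the file-format counterparts of the flags just described:

```
# Config-file equivalent of the node1 command line (keys mirror the CLI flags).
mkdir -p /etc/consul.d
cat > /etc/consul.d/server.json <<'EOF'
{
  "server": true,
  "bootstrap_expect": 3,
  "data_dir": "/tmp/consul",
  "node_name": "192.168.99.128",
  "bind_addr": "192.168.99.128",
  "client_addr": "0.0.0.0",
  "datacenter": "BJ",
  "ui": true
}
EOF
consul agent -config-dir=/etc/consul.d
```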
As soon as these Consul instances start, error messages appear in the logs. This is because the three machines have not yet joined one another and therefore do not form a cluster; with no leader elected, Consul on each of them cannot work normally:
```
2020-04-13T08:55:43.821-0400 [INFO]  agent.server.raft: entering follower state: follower="Node at 192.168.99.128:8300 [Follower]" leader=
2020-04-13T08:55:43.823-0400 [INFO]  agent.server: Adding LAN server: server="192.168.99.128 (Addr: tcp/192.168.99.128:8300) (DC: bj)"
2020-04-13T08:55:43.823-0400 [INFO]  agent.server: Handled event for server in area: event=member-join server=192.168.99.128.bj area=wan
2020-04-13T08:55:43.824-0400 [INFO]  agent: Started DNS server: address=0.0.0.0:8600 network=tcp
2020-04-13T08:55:43.825-0400 [INFO]  agent: Started HTTP server: address=[::]:8500 network=tcp
2020-04-13T08:55:43.825-0400 [INFO]  agent: started state syncer
==> Consul agent running!
2020-04-13T08:55:50.370-0400 [WARN]  agent.server.raft: no known peers, aborting election
2020-04-13T08:55:50.833-0400 [ERROR] agent.anti_entropy: failed to sync remote state: error="No cluster leader"
2020-04-13T08:56:07.711-0400 [ERROR] agent: Coordinate update error: error="No cluster leader"
2020-04-13T08:56:18.200-0400 [ERROR] agent.anti_entropy: failed to sync remote state: error="No cluster leader"
2020-04-13T08:56:37.477-0400 [ERROR] agent: Coordinate update error: error="No cluster leader"
2020-04-13T08:56:50.035-0400 [ERROR] agent.anti_entropy: failed to sync remote state: error="No cluster leader"
2020-04-13T08:57:13.322-0400 [ERROR] agent: Coordinate update error: error="No cluster leader"
2020-04-13T08:57:27.154-0400 [ERROR] agent.anti_entropy: failed to sync remote state: error="No cluster leader"
2020-04-13T08:57:38.397-0400 [ERROR] agent: Coordinate update error: error="No cluster leader"
2020-04-13T08:57:58.613-0400 [ERROR] agent.anti_entropy: failed to sync remote state: error="No cluster leader"
```
When a Consul agent starts, it does not know about any other nodes. To learn about the rest of the cluster, the agent must join an existing cluster, and for that it only needs to know a single node in it; after joining, it gossips with that member and quickly discovers the other nodes. A Consul agent can join any other agent, not just agents running in server mode. So we join node2 and node3 to node1:
```
# on node2
$ consul join 192.168.99.128
# on node3
$ consul join 192.168.99.128

# at this point the three nodes form a cluster
$ consul members
Node             Address              Status  Type    Build  Protocol  DC  Segment
192.168.99.128   192.168.99.128:8301  alive   server  1.7.2  2         bj  <all>
192.168.99.129   192.168.99.129:8301  alive   server  1.7.2  2         bj  <all>
192.168.99.130   192.168.99.130:8301  alive   server  1.7.2  2         bj  <all>

$ consul operator raft list-peers
Node             ID                                    Address              State     Voter  RaftProtocol
192.168.99.128   9095c165-7f5f-6892-e9bc-722c3a08ebf0  192.168.99.128:8300  leader    true   3
192.168.99.129   7234405d-cde5-e0ef-56b1-55e958de5b6c  192.168.99.129:8300  follower  true   3
192.168.99.130   8bbc7729-e41c-e548-c9a8-9bf9c01fdb54  192.168.99.130:8300  follower  true   3
```
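Joining by hand works, but it is common to let agents join automatically instead: the `-retry-join` flag makes an agent keep trying the given address until the join succeeds, so startup order no longer matters. A minimal sketch of the node2 command rewritten this way (the other flags are unchanged from above):

```
# node2, with automatic joining: the agent retries 192.168.99.128 until the
# join succeeds, so no separate "consul join" step is needed.
consul agent -server -bootstrap-expect=3 -data-dir=/tmp/consul \
  -node=192.168.99.129 -bind=192.168.99.129 -client=0.0.0.0 \
  -datacenter=BJ -ui \
  -retry-join=192.168.99.128
```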
Consul supports multiple datacenters; as described above, datacenters discover each other through the WAN pool. So let's add a Consul node in an sh datacenter:
```
# node4
consul agent -server -bootstrap-expect 3 -data-dir /tmp/consul -node=192.168.99.131 -bind=192.168.99.131 -datacenter SH -ui

$ consul join -wan 192.168.99.128
Successfully joined cluster by contacting 1 nodes.

$ consul members -wan
Node                Address              Status  Type    Build  Protocol  DC  Segment
192.168.99.128.bj   192.168.99.128:8302  alive   server  1.7.2  2         bj  <all>
192.168.99.129.bj   192.168.99.129:8302  alive   server  1.7.2  2         bj  <all>
192.168.99.130.bj   192.168.99.130:8302  alive   server  1.7.2  2         bj  <all>
192.168.99.131.sh   192.168.99.131:8302  alive   server  1.7.2  2         sh  <all>
```
With that, the multi-datacenter cluster is configured, and the cluster information is visible in Consul's web UI.
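With the WAN pool in place, a local agent can also answer requests about the other datacenter by forwarding them across the WAN, as described earlier. A minimal sketch, assuming the agents are still running on their default ports; the `web` service name is illustrative:

```
# Ask the local (bj) agent about nodes and services in the sh datacenter;
# the request is forwarded over the WAN pool to the sh servers.
curl 'http://127.0.0.1:8500/v1/catalog/nodes?dc=sh'
curl 'http://127.0.0.1:8500/v1/catalog/services?dc=sh'

# The DNS interface can also target another datacenter explicitly
# ("web" is an illustrative service name).
dig @127.0.0.1 -p 8600 web.service.sh.consul SRV
```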