Distributed communication

Time: 2020-06-28

Today we will discuss distributed communication technology.

Why distributed communication is needed

When we talked about distributed resource scheduling, we drew an analogy between the nodes of a distributed system and the processes of an operating system. Just as operating-system processes need an inter-process communication mechanism to exchange data, the nodes of a distributed system also need to communicate with each other. At the business level, each node generally carries a microservice, so microservices must also communicate with one another; for example, every business line needs to query the user-center microservice for user data. There are three common ways of communicating: RPC, publish subscribe, and message queues.

RPC

In the traditional B/S (browser/server) model, the server exposes an interface and the client completes communication by calling that interface. We could use the same model in a distributed system. However, the B/S architecture is based on HTTP: every interface call requires an HTTP request first, which is not suitable for large-scale distributed systems with low-latency requirements. Remote calls are therefore mostly implemented on lower-level network communication protocols. Let’s take a look at the RPC architecture with a picture:
[Figure: RPC architecture]
In this case, the order-system process does not need to know how the underlying transmission works. From the caller’s point of view, a remote procedure call is no different from a local call; this is the core idea of RPC. Steps 3 and 8 in the figure are transparent to the caller. Unlike the interface calls we usually make, the network communication in the figure is generally based on custom protocols encapsulated directly on top of TCP. This lets the two sides agree on a data format, so the client and server can pack and unpack packets quickly, which is better suited to distributed systems. For examples of such protocol encapsulation, see Redis’s RESP protocol and the FastCGI protocol.
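To make the idea of "encapsulating a protocol on top of TCP" concrete, here is a minimal sketch of length-prefixed framing in Java (class and method names are illustrative, not from any real framework): the sender prefixes the payload with a 4-byte length, so the receiver knows exactly how many bytes to read and can unpack the packet without scanning for delimiters.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Minimal sketch of length-prefixed framing, the basic idea behind
// binary protocols layered directly on TCP.
public class Framing {
    // Pack: write a 4-byte big-endian length, then the payload bytes.
    public static byte[] pack(String payload) throws IOException {
        byte[] body = payload.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        DataOutputStream data = new DataOutputStream(out);
        data.writeInt(body.length);
        data.write(body);
        return out.toByteArray();
    }

    // Unpack: read the length first, then exactly that many payload bytes.
    public static String unpack(byte[] packet) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(packet));
        int len = in.readInt();
        byte[] body = new byte[len];
        in.readFully(body);
        return new String(body, StandardCharsets.UTF_8);
    }
}
```

RESP and FastCGI are more elaborate, but both rest on the same principle: an agreed binary layout that makes message boundaries unambiguous.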

Typical implementation of RPC – Dubbo

Suppose we want to implement an RPC communication framework ourselves. How should we go about it? If we have four callers and four service providers, how do we manage them?
[Figure: call relationships between service callers and service providers]
The most obvious approach is for each service provider to supply an SDK; the service caller imports the SDK and initiates RPC requests directly, without caring what protocol the SDK uses internally. That is one scheme. However, as providers and callers multiply, the invocation relationships become more and more complex: with n service providers and m service callers there can be n * m call relationships, producing a large amount of system traffic that the SDK approach cannot cope with. At this point you may recall that in the computer field, most problems can be solved by adding an intermediate layer. So why not use a service registry for unified management? The caller then only needs to look up the corresponding address in the service registry and does not care how many providers there are, decoupling service callers from service providers:
[Figure: service callers and providers decoupled by a service registry]
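The registry idea can be sketched in a few lines of Java. This is a toy in-memory version for a single process, with illustrative names; real registries such as ZooKeeper or Nacos add heartbeats, failure detection, and change notification:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Toy in-memory service registry. Providers register (serviceName -> address);
// callers look up the address list instead of depending on provider SDKs.
public class ServiceRegistry {
    private final Map<String, List<String>> services = new ConcurrentHashMap<>();

    // Called by a service provider on startup.
    public void register(String serviceName, String address) {
        services.computeIfAbsent(serviceName, k -> new CopyOnWriteArrayList<>())
                .add(address);
    }

    // Called by a service caller; it does not care how many providers exist.
    public List<String> discover(String serviceName) {
        return services.getOrDefault(serviceName, List.of());
    }
}
```

A caller would do something like `registry.discover("UserService")` and then pick one address (round-robin, random, and so on) to make the actual call, which is exactly the load-balancing step a real RPC framework performs for you.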
On top of the service registry, Dubbo also adds a monitoring center component (used to monitor service calls for the purpose of service governance), implementing a complete RPC framework. As shown in the figure below, Dubbo’s architecture mainly includes four parts:

  • Service provider. The service provider registers its services with the service registry.
  • Service registry. The service registration and discovery center stores and manages the service information registered by providers and the service types subscribed to by callers.
  • Service caller. Using the address list returned by the service registry, the caller accesses the remote service through a remote call.
  • Monitoring center. Counts the number and duration of service calls to facilitate service governance and failure analysis.

The following is a consumer demo from Dubbo’s official website. First, configure the registry address and the remote service to be referenced:

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:dubbo="http://dubbo.apache.org/schema/dubbo"
       xmlns="http://www.springframework.org/schema/beans"
       xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-4.3.xsd
       http://dubbo.apache.org/schema/dubbo http://dubbo.apache.org/schema/dubbo/dubbo.xsd">

    <!-- consumer's application name, used for tracing dependency relationship (not a matching criterion),
    don't set it same as provider -->
    <dubbo:application name="demo-consumer"/>
    <!-- use multicast registry center to discover service -->
    <dubbo:registry address="multicast://224.5.6.7:1234"/>
    <!-- generate proxy for the remote service, then demoService can be used in the same way as the
    local regular interface -->
    <dubbo:reference id="demoService" check="false" interface="org.apache.dubbo.demo.DemoService"/>
</beans>

Then, invoke the newly configured remote service in the business code; the provider’s SDK is no longer needed:

import org.springframework.context.support.ClassPathXmlApplicationContext;
import org.apache.dubbo.demo.DemoService;
 
public class Consumer {
    public static void main(String[] args) throws Exception {
        ClassPathXmlApplicationContext context = new ClassPathXmlApplicationContext(new String[] {"META-INF/spring/dubbo-demo-consumer.xml"});
        context.start();
        // Obtaining a remote service proxy
        DemoService demoService = (DemoService)context.getBean("demoService");
        // Executing remote methods
        String hello = demoService.sayHello("world");
        // Display the call result
        System.out.println(hello);
    }
}

Publish subscribe

The idea of publish subscribe can be seen everywhere in daily life. For example, when playing PUBG ("eating chicken"), we usually team up as a squad of four, which we can compare to four nodes in a distributed system. Take a classic scene: after landing at the airport I have plenty of supplies, including 5.56 and 7.62 ammunition. So I announce to my teammates: "I have spare 5.56 and 7.62 ammo; whoever needs some, tell me." Teammates who landed in poorer spots reply: "I want 5.56" or "I want 7.62." I then find those teammates and hand them the corresponding 5.56 or 7.62 ammo, completing one publish-subscribe interaction. The announcement "I have spare ammo" publishes a message event; a teammate saying "I need such-and-such ammo" subscribes to the event I published; and the ammo I hand over is the message itself. This completes communication in the publish-subscribe model:
[Figure: publish-subscribe model]
Producers send messages to the message center, which is usually divided into topics. Each message belongs to a topic that represents its type, and every consumer subscribed to a topic receives that topic’s messages for consumption. Here, the 5.56 and 7.62 ammunition correspond to two topics, and each teammate subscribes to whichever one they need.
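The topic mechanism just described can be sketched with a toy in-memory message center (names are illustrative):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.HashMap;
import java.util.function.Consumer;

// Toy topic-based publish-subscribe message center. "5.56" and "7.62"
// would be two topics here.
public class MessageCenter {
    private final Map<String, List<Consumer<String>>> subscribers = new HashMap<>();

    // A consumer subscribes to a topic with a callback.
    public void subscribe(String topic, Consumer<String> handler) {
        subscribers.computeIfAbsent(topic, k -> new ArrayList<>()).add(handler);
    }

    // A producer publishes; every subscriber of that topic receives the message.
    public void publish(String topic, String message) {
        for (Consumer<String> handler : subscribers.getOrDefault(topic, List.of())) {
            handler.accept(message);
        }
    }
}
```

Note that the producer never names a specific consumer: it only names a topic, which is what decouples the two sides.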

Typical implementation of publish subscribe – Kafka

Kafka is a typical publish-subscribe messaging system. Its architecture also includes three parts: producers, consumers, and the message center:
[Figure: Kafka architecture]
In Kafka, to address load balancing and the reliability of message storage, the concepts of topic and partition are introduced. A topic, as just discussed, is a logical concept referring to a message or data type. Partitions are built on top of topics: the content of one topic can be split into multiple partitions distributed across different cluster nodes, and a data-synchronization mechanism ensures the consistency of the data stored in each partition:
[Figure: topics and partitions distributed across brokers]
Each broker represents a physical node in the cluster. The partition mechanism avoids "putting all the data in one basket": data is spread across different broker machines, improving the system’s data reliability and achieving load balancing.
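How a producer spreads one topic’s messages across partitions can be sketched with a simple key-hashing rule. Kafka’s default partitioner actually uses murmur2 hashing; plain `hashCode` here is a simplification for illustration:

```java
// Sketch of key-based partition selection: messages with the same key
// always land in the same partition, so per-key ordering is preserved
// while the topic as a whole is spread over the cluster.
public class Partitioner {
    public static int partitionFor(String key, int numPartitions) {
        // Mask off the sign bit so the result is a valid partition index.
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }
}
```
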
Another detail in the figure is that two consumers form a consumer group. Why introduce consumer groups? When there are too many messages and a single consumer’s capacity is limited, consumption becomes too slow, the broker’s storage overflows, and some messages have to be discarded. To solve this, Kafka introduced consumer groups: the partitions of a topic are divided among the members of a group so they can consume in parallel, increasing consumption speed.
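The division of a topic’s partitions among group members can be sketched as a round-robin assignment; Kafka’s real assignment strategies (range, round-robin, sticky) also handle rebalancing when members join or leave, which this toy version omits:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy round-robin assignment of partitions to the consumers in one group.
// Each partition is owned by exactly one group member, so members never
// process the same message twice and work in parallel.
public class GroupAssignment {
    public static Map<String, List<Integer>> assign(List<String> consumers, int numPartitions) {
        Map<String, List<Integer>> result = new HashMap<>();
        for (String c : consumers) {
            result.put(c, new ArrayList<>());
        }
        for (int p = 0; p < numPartitions; p++) {
            String owner = consumers.get(p % consumers.size());
            result.get(owner).add(p);
        }
        return result;
    }
}
```
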
Besides these three basic elements, Kafka also uses ZooKeeper, a third-party component that provides distributed coordination. It coordinates and manages the brokers and consumers in the whole cluster, decouples brokers from consumers, and provides reliability guarantees for the system. Both consumers and brokers register with ZooKeeper when they start, and ZooKeeper manages and coordinates them centrally.
ZooKeeper stores metadata: for brokers, the partitions belonging to each topic and the storage location of each partition; for consumers, which consumers are in each consumer group and which partitions each consumer is responsible for, and so on.

Message queuing

The message queue is similar to the publish-subscribe model, but with some differences. Returning to the earlier PUBG example: with a message queue, I don’t care which teammate needs which ammunition; I simply drop the spare supplies at a certain spot and let teammates pick up whatever they need themselves. A message queue does not deliver resources to a specific consumer; the producer is only responsible for publishing to the queue, and consumers take what they need. The most typical scenario is asynchronous communication.
For example, user registration needs to write to the database and send an email. With the simplest synchronous communication, from submitting the registration to receiving a response, the user must wait for both steps to complete. If sending the email is slow, the user just has to wait:
[Figure: synchronous registration flow]
As shown in the figure below, if a message queue is introduced as the intermediary between the registration component, the mail-sending component, and the SMS-sending component, the three can communicate and execute asynchronously:
[Figure: asynchronous registration flow with a message queue]
That is, after writing to the database, the registration service only needs to write a "send email" message to the queue before returning registration success; it does not wait for the email to actually be sent. We have thus decoupled the registration and email-sending operations, greatly improving the response speed of registration. You might ask: what if sending the email fails? Generally we write retry logic in the business layer, so that consumption is only acknowledged after the email is sent successfully. Queues also usually have a persistence mechanism to ensure messages are not lost.
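The decoupling described above can be sketched with a `BlockingQueue` standing in for the message queue (names are illustrative; a real system would use a persistent broker rather than in-process memory):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch of registration decoupled from email sending by a queue.
// The registration path only enqueues a message and returns; a background
// worker drains the queue and actually sends the mail.
public class AsyncRegistration {
    private final BlockingQueue<String> mailQueue = new LinkedBlockingQueue<>();

    // Fast path: write to the DB (elided here), enqueue the mail task,
    // and return immediately so the user gets a quick response.
    public String register(String user) throws InterruptedException {
        mailQueue.put("welcome-mail:" + user);
        return "registered:" + user;
    }

    // Slow path, executed by a background worker thread; blocks until
    // a task is available.
    public String takeMailTask() throws InterruptedException {
        return mailQueue.take();
    }
}
```

The registration response no longer depends on the mail server’s latency, which is exactly the asynchronous behavior the figure illustrates.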
In addition to converting synchronous calls into asynchronous ones, message queues also play a peak-shaving role in high-concurrency systems. For flow control there are the leaky-bucket and token-bucket algorithms, which interested readers can study further.
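As a brief sketch of the token-bucket algorithm just mentioned: tokens accumulate at a fixed rate up to the bucket’s capacity, and each request must take one token or be rejected. The clock is passed in explicitly here to keep the sketch deterministic and testable; names are illustrative:

```java
// Toy token-bucket rate limiter for peak shaving.
public class TokenBucket {
    private final long capacity;
    private final double tokensPerMilli;
    private double tokens;
    private long lastRefillMillis;

    public TokenBucket(long capacity, double tokensPerSecond, long nowMillis) {
        this.capacity = capacity;
        this.tokensPerMilli = tokensPerSecond / 1000.0;
        this.tokens = capacity;          // start with a full bucket
        this.lastRefillMillis = nowMillis;
    }

    // Refill according to elapsed time (capped at capacity), then try to
    // take one token; return false when the bucket is empty.
    public boolean tryAcquire(long nowMillis) {
        tokens = Math.min(capacity,
                tokens + (nowMillis - lastRefillMillis) * tokensPerMilli);
        lastRefillMillis = nowMillis;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }
}
```

A leaky bucket differs in that it drains requests at a fixed output rate, whereas a token bucket allows short bursts up to its capacity.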

Coming up next

Distributed computing

Follow us

Readers who are interested in this series are welcome to subscribe to our official account.
