By Guo Hao (Xiang Sheng), head of RPC framework for the Alibaba economy
Reading guide: The Dubbo community has planned a series of articles, “Dubbo cloud native road”, that reviews the development of the Apache Dubbo project and community and looks ahead to its future. The series covers three areas: Dubbo technology interpretation, community operation, and application case analysis. This is the fourth article in the series.
The protocol is the foundation of RPC. What format does data take on the connection? How does the server determine the size of a received request? Can multiple requests be in flight on the same connection at the same time? How should the server respond when a request fails? These and similar questions are all problems the protocol has to solve.
By definition, a protocol specifies the rules, formats and semantics by which data is transmitted across a network. RPC requires that both sides of the communication understand the same protocol. Data travels over the network as a bit stream. If the peer does not recognize the local protocol, it cannot extract any useful information from the request; the two sides end up talking past each other, and the business requirements of the upper layer cannot be met.
A simple protocol needs to define a data exchange format, a protocol format and a request mode.
In RPC, the data exchange format is also called the serialization format. Commonly used serialization formats include JSON, protobuf and Hessian. They are usually evaluated along three dimensions:
- Size of byte array after serialization
- Serialization and deserialization speed
- Readability after serialization
When choosing a serialization format, a protocol trades off these three dimensions against its specific requirements. The smaller the serialized byte array, the less network traffic is used, but serialization itself may take longer. Text-based formats such as JSON and XML are often more readily accepted by developers: compared with a raw byte array, text is easier to understand and can be recognized by devices at every layer. The price of that readability is noticeably lower performance.
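The size versus readability trade-off can be seen in a small sketch. The record, its field names and the fixed binary layout below are assumptions made up for illustration; they are not part of any Dubbo format.

```python
import json
import struct

# Hypothetical record used only for illustration.
record = {"user_id": 12345, "score": 98.5, "active": True}

def encode_text(rec):
    """Text encoding: self-describing and readable, but larger."""
    return json.dumps(rec).encode("utf-8")

# Assumed fixed binary layout: big-endian unsigned 32-bit id,
# 64-bit float score, 1-byte boolean. Both sides must agree on it.
BINARY_LAYOUT = struct.Struct(">Id?")

def encode_binary(rec):
    """Binary encoding: compact, but meaningless without the layout."""
    return BINARY_LAYOUT.pack(rec["user_id"], rec["score"], rec["active"])

def decode_binary(data):
    user_id, score, active = BINARY_LAYOUT.unpack(data)
    return {"user_id": user_id, "score": score, "active": active}

text_bytes = encode_text(record)
binary_bytes = encode_binary(record)
print(len(text_bytes), len(binary_bytes))
```

The binary form is much smaller, but an intermediate device that does not know the layout cannot interpret it, whereas the JSON form can be read by anything that understands text.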
The protocol format is closely tied to the RPC framework. By function, protocol formats fall into two kinds:
- One is the compact protocol, which carries only simple metadata and the data content needed for the call;
- The other is the composite protocol, which also carries framework-layer metadata to provide enhanced functionality. RSocket is a representative of this kind.
The request mode is closely related to the protocol format. The common request modes are synchronous request/response and asynchronous request/response; the difference is whether the client must wait for the response to come back after sending a request. If it does not have to wait, multiple unfinished requests can exist on one connection at the same time, which is also called multiplexing. Another request mode is streaming: a complete business call consists of multiple RPCs, each of which transfers part of the data, which suits streaming data transmission.
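Multiplexing usually comes down to tagging each request with an identifier so that responses can be matched even when they return out of order. The following is a minimal in-memory sketch of that idea; `MultiplexedClient` and its methods are hypothetical names, and no real network I/O is performed.

```python
class MultiplexedClient:
    """Tracks several unfinished requests on one (simulated) connection."""

    def __init__(self):
        self._next_id = 1
        self._pending = {}   # request id -> callback waiting for the response
        self.outbox = []     # frames "written" to the single connection

    def send(self, payload, on_response):
        request_id = self._next_id
        self._next_id += 1
        self._pending[request_id] = on_response
        self.outbox.append((request_id, payload))
        return request_id    # returns immediately, without waiting

    def on_frame(self, request_id, payload):
        # The id in the response selects the matching pending request.
        callback = self._pending.pop(request_id)
        callback(payload)

    def in_flight(self):
        return len(self._pending)


results = {}
client = MultiplexedClient()
slow_id = client.send("slow call", lambda r: results.update(slow=r))
fast_id = client.send("fast call", lambda r: results.update(fast=r))
in_flight_before = client.in_flight()

# Responses arrive in the opposite order to the requests.
client.on_frame(fast_id, "fast result")
client.on_frame(slow_id, "slow result")
```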
With these three basic conventions in place, a simple RPC protocol can be implemented.
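As a rough illustration of how the three conventions fit together, the sketch below defines a toy frame: a fixed header carrying the body length and a request ID, followed by a JSON body. The layout and the field names (`service`, `method`, `args`) are assumptions for this example only. The length prefix is what lets the receiver know how large each request is.

```python
import json
import struct

# Assumed toy header: body length (4 bytes) + request id (8 bytes), big-endian.
HEADER = struct.Struct(">IQ")

def encode_request(request_id, service, method, args):
    body = json.dumps(
        {"service": service, "method": method, "args": args}
    ).encode("utf-8")
    return HEADER.pack(len(body), request_id) + body

def decode_frames(buffer):
    """Return (complete frames, leftover bytes) from a received byte buffer."""
    frames = []
    offset = 0
    while len(buffer) - offset >= HEADER.size:
        length, request_id = HEADER.unpack_from(buffer, offset)
        end = offset + HEADER.size + length
        if end > len(buffer):
            break  # body not fully received yet; wait for more bytes
        body = buffer[offset + HEADER.size:end]
        frames.append((request_id, json.loads(body)))
        offset = end
    return frames, buffer[offset:]

stream = (
    encode_request(1, "DemoService", "sayHello", ["dubbo"])
    + encode_request(2, "DemoService", "add", [1, 2])
)
# Simulate a TCP read that cuts the second frame short.
first_read, rest = decode_frames(stream[:-3])
second_read, leftover = decode_frames(rest + stream[-3:])
```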
One of the core goals of Dubbo 3 is to define the next-generation RPC protocol. Beyond basic communication, the new protocol should have the following features:
- A unified, cross-language binary format
- Support for streaming and an application-layer full-duplex call model
- Easy to extend
- Recognizable by devices at every layer
Below we compare some common protocols to explore what form the new protocol should take.
HTTP/1.1 is probably the most widely used protocol. Its simple and clear syntax, cross-language nature and native support on mobile clients make it the most widely accepted RPC scheme.
However, measured against the demands of an RPC protocol, HTTP/1.1 has the following main problems:
- Head-of-line (HOL) blocking results in low performance on a single connection. Pipelining is supported, but responses must still be returned in order;
- As a text-based protocol, it repeats a large amount of redundant header information in every request, which wastes bandwidth and hurts performance;
- The pure request/response model cannot implement server push and has to rely on client polling. For the same reason, full-duplex streaming cannot be done safely.
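The cost of head-of-line blocking can be estimated with a few lines of arithmetic. The sketch assumes the server finishes each pipelined request independently, but the connection may only deliver responses in request order; the function names and timings are invented for illustration.

```python
from itertools import accumulate

def in_order_delivery(finish_times):
    """Response i cannot be delivered before every earlier response."""
    return list(accumulate(finish_times, max))

def out_of_order_delivery(finish_times):
    """With per-request identifiers, each response is delivered when ready."""
    return list(finish_times)

# One slow request (5 time units) followed by two fast ones (1 unit each).
finish_times = [5, 1, 1]
blocked = in_order_delivery(finish_times)
unblocked = out_of_order_delivery(finish_times)
print(blocked, unblocked)
```

The two fast requests are ready after one time unit, yet in-order delivery holds them back until the slow request in front of them has been answered.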
RESP is the communication protocol used by Redis. Its concise, easy-to-understand format has helped Redis clients appear quickly in many languages. But being similar to HTTP/1.1, it suffers from the same kinds of problems:
- The protocol does not support specifying a serialization method, so serialization can only rely on conventions between clients;
- It also suffers from head-of-line blocking, and pipelining cannot fundamentally solve the single-connection performance problem;
- Pub/Sub also hits a capacity bottleneck on a single connection.
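To make these points concrete, here is a small sketch of how a RESP command is encoded as an array of bulk strings. The helper names are hypothetical, and the reply splitter only handles single-line replies, which is enough for this demo but not a full RESP parser. Neither requests nor replies carry a request identifier, so pipelined replies can only be matched by position.

```python
def encode_command(*parts):
    """Encode a command as a RESP array of bulk strings."""
    out = [b"*%d\r\n" % len(parts)]
    for part in parts:
        data = part.encode("utf-8") if isinstance(part, str) else part
        # Values are opaque bytes: RESP says nothing about how they were serialized.
        out.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(out)

def split_simple_replies(data):
    """Split single-line replies such as +OK or :1 (demo only)."""
    return data.split(b"\r\n")[:-1]

commands = [("SET", "k", "v"), ("INCR", "n")]
pipeline = b"".join(encode_command(*c) for c in commands)

# Replies as a server might send them for the pipeline above.
replies = split_simple_replies(b"+OK\r\n:1\r\n")
matched = list(zip(commands, replies))  # matched purely by order
```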
The Dubbo 2.0 protocol is defined directly on top of TCP, the transport layer protocol, which gives the greatest flexibility in defining protocol features. It is precisely because of this flexibility that RPC protocols are generally custom, private protocols.
The body of the Dubbo protocol contains an extensible attachments section, which makes it possible to pass additional attributes alongside the RPC method. This is a good design. The header, however, lacks a comparable extensible attachments section. A useful reference here is the ASCII header design defined by HTTP, which separates the responsibilities of body attachments and header attachments. Other points that could be improved:
- Some RPC request locators in the body, such as the service name, method name and version, could be moved up into the header and decoupled from the specific serialization protocol, so that network infrastructure can identify them more easily or use them for traffic control;
- There is little provision for protocol evolution: for example, the header has no reserved status flag bits, and there is no special packet dedicated to protocol upgrade or negotiation as there is in HTTP;
- The Java implementation is not concise or general enough. For example, some language-bound content is transmitted on the wire, and the message body contains redundancy, such as the service name appearing in both the body and the attachments.
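For orientation, the sketch below packs and parses a fixed 16-byte header following the commonly documented Dubbo 2 layout: a 2-byte magic number, one flag byte (request, two-way, event and a 5-bit serialization ID), one status byte, an 8-byte request ID and a 4-byte body length. Treat the layout and constants as an assumption of this example rather than a normative definition. Note that the service name, method name and version do not appear here; they live in the serialized body.

```python
import struct

# Assumed layout: magic (2) | flags (1) | status (1) | request id (8) | body length (4)
DUBBO_HEADER = struct.Struct(">HBBQI")
MAGIC = 0xDABB
FLAG_REQUEST = 0x80
FLAG_TWOWAY = 0x40
FLAG_EVENT = 0x20
SERIALIZATION_MASK = 0x1F

def pack_header(is_request, two_way, event, serialization_id,
                status, request_id, body_length):
    flags = serialization_id & SERIALIZATION_MASK
    if is_request:
        flags |= FLAG_REQUEST
    if two_way:
        flags |= FLAG_TWOWAY
    if event:
        flags |= FLAG_EVENT
    return DUBBO_HEADER.pack(MAGIC, flags, status, request_id, body_length)

def parse_header(data):
    magic, flags, status, request_id, body_length = DUBBO_HEADER.unpack(data[:16])
    if magic != MAGIC:
        raise ValueError("not a Dubbo frame")
    return {
        "is_request": bool(flags & FLAG_REQUEST),
        "two_way": bool(flags & FLAG_TWOWAY),
        "event": bool(flags & FLAG_EVENT),
        "serialization_id": flags & SERIALIZATION_MASK,
        "status": status,
        "request_id": request_id,
        "body_length": body_length,
    }

# Serialization id 2 is used here as an example value.
header = pack_header(True, True, False, 2, 0, 42, 128)
parsed = parse_header(header)
```

Every bit of the flag byte is already assigned in this layout, which illustrates why there is no room left for upgrade or negotiation signals.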
HTTP/2.0 retains all the semantics of HTTP/1. While maintaining compatibility, it greatly improves the communication model and transmission efficiency, mainly to solve the problems of HTTP/1:
- It supports multiplexing on a single connection. Compared with a connection dedicated to one request/response at a time, the frame-based implementation uses the connection more efficiently. The stream ID provides context, so the client can accept responses returned out of order and match them by stream ID;
- Header compression with HPACK, which caches headers using a static table and a dynamic table to reduce the amount of data transferred;
- Request-stream semantics, which support server push and streaming data transmission;
- Binary framing, which allows headers and data to be processed separately.
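The framing and multiplexing described above rest on a 9-byte frame header: a 24-bit payload length, an 8-bit type, an 8-bit flags field, and a reserved bit followed by a 31-bit stream identifier. The sketch below packs and splits such frames and regroups DATA payloads by stream. The helper names are hypothetical, and the HEADERS payloads are placeholders rather than real HPACK output.

```python
import struct

TYPE_DATA = 0x0
TYPE_HEADERS = 0x1
FLAG_END_STREAM = 0x1
FLAG_END_HEADERS = 0x4

def pack_frame(frame_type, flags, stream_id, payload):
    if len(payload) >= 1 << 24:
        raise ValueError("payload too large for a 24-bit length")
    return (
        len(payload).to_bytes(3, "big")
        + struct.pack(">BBI", frame_type, flags, stream_id & 0x7FFFFFFF)
        + payload
    )

def split_frames(data):
    """Split a complete byte string into (stream_id, type, flags, payload)."""
    frames = []
    offset = 0
    while offset < len(data):
        length = int.from_bytes(data[offset:offset + 3], "big")
        frame_type, flags, word = struct.unpack(">BBI", data[offset + 3:offset + 9])
        payload = data[offset + 9:offset + 9 + length]
        frames.append((word & 0x7FFFFFFF, frame_type, flags, payload))
        offset += 9 + length
    return frames

# Two streams interleaved on one connection; stream 3 completes first.
wire = (
    pack_frame(TYPE_HEADERS, FLAG_END_HEADERS, 1, b"h1")
    + pack_frame(TYPE_HEADERS, FLAG_END_HEADERS, 3, b"h3")
    + pack_frame(TYPE_DATA, FLAG_END_STREAM, 3, b"fast")
    + pack_frame(TYPE_DATA, FLAG_END_STREAM, 1, b"slow")
)

data_by_stream = {}
for stream_id, frame_type, flags, payload in split_frames(wire):
    if frame_type == TYPE_DATA:
        data_by_stream[stream_id] = data_by_stream.get(stream_id, b"") + payload
```

Because headers and data travel in separate frame types, an intermediary can inspect HEADERS frames without touching the DATA payloads.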
Although HTTP/2.0 overcomes the problems above, some points remain controversial, such as whether flow control is necessary on top of TCP, and whether staying compatible with HTTP semantics through HPACK is too complicated.
Compared with frameworks that build an application-layer protocol on bare TCP, gRPC chooses HTTP/2.0 as its transport protocol. Its upper-layer protocol functions are implemented by constraining the content of headers and the format of the payload.
Here are some of the design principles of gRPC:
- Coverage & simplicity: the protocol design and framework implementation should be general and simple enough to run on any device, even resource-constrained devices such as IoT and mobile devices;
- Interoperability & reach: the protocol should be built on a more general protocol and should itself be supported by almost all infrastructure on the network;
- General purpose & performant: scenarios and performance should be balanced. The protocol should first be applicable to a wide range of scenarios, and should also be as performant as possible;
- Payload agnostic: the payload carried by the protocol should be language and platform neutral;
- Streaming: communication models such as request-response, request-stream and bidirectional streaming are supported;
- Flow control: the protocol itself is able to sense and limit traffic;
- Metadata exchange: beyond the RPC service definition, the protocol provides additional data transmission capabilities.
Guided by these principles, gRPC ended up as a cross-language, cross-platform, general-purpose protocol. The functions it needs are largely in place already, or can easily be added by extension. However, there is no silver bullet in software engineering: compared with a proprietary protocol on bare TCP, gRPC will fall short in peak performance. But for most applications, gRPC over HTTP/2 is a great improvement over HTTP/1.1 in performance and readability.
For serialization, gRPC is designed to be payload neutral, but real cross-language scenarios need a strongly specified interface definition language to keep serialization results consistent. The official gRPC implementation supports protobuf and JSON, for performance-oriented and development-efficiency-oriented scenarios respectively. Taking everything into account, from the choice of serialization format to the protocol-level comparison above, extending gRPC is the best basis for a new protocol.
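Payload neutrality shows up in how gRPC frames a message inside HTTP/2 DATA frames: a 1-byte compressed flag and a 4-byte big-endian length, followed by the message bytes, with the encoding announced separately in the content-type header (for example `application/grpc+proto` or `application/grpc+json`). The sketch below implements only that length prefix; the helper names are hypothetical, and the second payload is a placeholder standing in for protobuf-encoded bytes.

```python
import json
import struct

# Compressed flag (1 byte) + message length (4 bytes, big-endian).
PREFIX = struct.Struct(">BI")

def frame_message(message, compressed=False):
    return PREFIX.pack(1 if compressed else 0, len(message)) + message

def unframe_messages(data):
    """Split a complete byte string into (compressed, message) pairs."""
    messages = []
    offset = 0
    while offset < len(data):
        compressed, length = PREFIX.unpack_from(data, offset)
        start = offset + PREFIX.size
        messages.append((bool(compressed), data[start:start + length]))
        offset = start + length
    return messages

# The framing is identical whatever produced the bytes.
json_payload = json.dumps({"name": "dubbo"}).encode("utf-8")
opaque_payload = b"\x0a\x05dubbo"  # placeholder for binary-encoded bytes

wire = frame_message(json_payload) + frame_message(opaque_payload)
decoded = unframe_messages(wire)
```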
The Dubbo 3.0 protocol is based on gRPC and adds extensions for the application layer, exception handling, protocol-layer load balancing and Reactive support. It has three main goals:
- In large-scale distributed clusters, provide better load balancing to achieve higher performance and ensure stability;
- Support standard distributed extensions such as tracing and monitoring, and support microservice standardization and smooth migration;
- Enhance Reactive semantics at the protocol layer to provide distributed back-pressure capability and better streaming support.
Beyond the protocol layer, the new Dubbo 3.0 protocol also covers ease of use, including support for both an IDL compiler and an annotation compiler. The client will better support native asynchronous callbacks, Future-based asynchronous calls and synchronous calls, while the server will use non-reflective invocation, which significantly improves performance on both client and server. For user migration, the Dubbo framework will provide smooth protocol upgrade support, and aims to double performance with as few code or configuration changes as possible.
- Dubbo cloud native Road: ASF graduation anniversary, 3.0
- Dubbo takes an important step toward cloud native: an analysis of application-level service discovery
This article introduced the basic concepts of RPC protocols, compared some common protocols, and proposed the Dubbo 3.0 protocol after weighing their advantages and disadvantages. The Dubbo 3.0 protocol aims to lead in ease of use, cross-platform and cross-language support, and performance. The Dubbo 3.0 protocol is expected to be fully supported in March 2021. Stay tuned.