Introduction and selection of Dubbo 3 triple protocol

Time:2022-5-7

Dubbo 3 provides the Triple (Dubbo 3) and Dubbo 2 protocols, which are native to the Dubbo framework. Dubbo 3 also brings many third-party protocols into Dubbo's programming and service governance model, including gRPC, Thrift, JSON-RPC, Hessian 2 and REST. This article focuses on the Triple and Dubbo 2 protocols.

Next generation RPC Protocol – Triple

Triple is the main protocol introduced by Dubbo 3. The name means "third generation": after two generations of protocol evolution in Dubbo 1.0 and Dubbo 2.0, and with the wave of technical standardization brought by cloud native, the new Dubbo 3 protocol, Triple, came into being.

Introduction to RPC Protocol

The protocol is the core of RPC: it specifies the content and format of the data transmitted over the network. Besides the required request and response data, it usually carries additional control data, such as the serialization method, timeout, compression method and authentication information of a single request.

A protocol consists of three parts:

  • Data exchange format: defines the byte stream content of RPC request and response objects in network transmission, also known as the serialization method;
  • Protocol structure: defines the list of fields, the semantics of each field and the arrangement of different fields;
  • Rules, formats and semantics: by defining these, the protocol specifies how data is transmitted across the network. A successful RPC requires that both ends of the communication can read and write the network byte stream and convert objects according to the protocol. If the two sides cannot agree on the protocol to use, they will talk past each other and remote communication cannot work.
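
To make the three parts concrete, here is a minimal sketch of a toy wire format. The field layout (a one-byte serialization ID, a four-byte timeout, a four-byte payload length) and the names `encode_request` and `decode_request` are invented for illustration and are not part of any Dubbo protocol.

```python
import json
import struct

# Hypothetical toy layout, NOT a Dubbo format:
#   1 byte  serialization id (1 = JSON in this sketch)
#   4 bytes timeout in milliseconds
#   4 bytes payload length
#   N bytes serialized payload
HEADER = struct.Struct(">BII")  # big-endian, fixed 9-byte header
JSON_ID = 1


def encode_request(obj, timeout_ms):
    """Serialize obj (data exchange format) and prepend the header (protocol structure)."""
    payload = json.dumps(obj).encode("utf-8")
    return HEADER.pack(JSON_ID, timeout_ms, len(payload)) + payload


def decode_request(data):
    """Read the header, then decode exactly as many payload bytes as it announces."""
    serialization_id, timeout_ms, length = HEADER.unpack_from(data, 0)
    if serialization_id != JSON_ID:
        # Both ends must agree on the protocol, otherwise the bytes are meaningless.
        raise ValueError("peer used a serialization this side does not understand")
    payload = data[HEADER.size:HEADER.size + length]
    return json.loads(payload.decode("utf-8")), timeout_ms


wire = encode_request({"method": "sayHello", "args": ["world"]}, timeout_ms=3000)
request, timeout = decode_request(wire)
```

Both functions share the same header definition, which is the point: reader and writer only understand each other because they apply the same rules to the same bytes.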


The design of an RPC protocol needs to consider the following:

  • Universality: a unified binary format, with cross-language, cross-platform and multiple transport layer protocol support
  • Extensibility: the protocol can add fields and be upgraded, and it supports user extensions and additional business metadata
  • Performance: as fast as possible
  • Penetration: it can be recognized and forwarded by all kinds of intermediate devices, such as gateways and proxy servers

Universality and high performance usually cannot be achieved at the same time, so the protocol designer needs to make some trade-offs.

HTTP/1.1

Compared with a private RPC protocol built directly on the TCP transport layer, a remote call solution built on HTTP, such as WebServices or a REST architecture, has better universality. HTTP + JSON can be regarded as a de facto standard solution.

Building on HTTP has two major advantages:

  • The semantics and extensibility of HTTP meet the needs of RPC calls well.
  • Universality. HTTP is supported by almost every device on the network and has good protocol penetration.

However, there are also obvious problems:

  • It is a typical request/response model: a connection can carry only one outstanding request at a time, which causes head-of-line (HOL) blocking.
  • Human-readable headers: HTTP uses a more general, easier-to-read header format, but it performs poorly.
  • There is no direct server push support, and alternative modes such as polling and long polling need to be used

gRPC

The sections above covered the advantages and disadvantages of building an RPC protocol on HTTP versus TCP. Whereas Dubbo builds on the TCP transport layer, Google chose to define gRPC directly on top of HTTP/2. The advantages of gRPC are inherited from HTTP/2 and Protobuf.

  • Because it is based on HTTP/2, the protocol is simple, cheap for users to learn, and naturally supports server push, multiplexing and flow control
  • Thanks to Protobuf's multi-language, cross-platform binary compatibility, it provides strong, unified cross-language capability
  • The protocol has a rich ecosystem: it is natively supported by components such as k8s and etcd, and it is the de facto protocol standard for cloud native

But there are also some problems:

  • Support for service governance is basic, and the protocol leans toward plain RPC functionality. The protocol layer lacks the necessary unified definitions, so it is not easy for users to use directly.
  • Serialization is tightly bound to Protobuf, which brings high learning and migration costs. For existing single-language users, the migration cost cannot be ignored.

Thoughts on the selection of triple

In the end, we chose to stay compatible with gRPC and build a new protocol, Triple, with HTTP/2 as the transport layer. The rise of containerized applications and microservices has driven the development of techniques for optimizing payload content. The traditional communication protocols used by clients (RESTful or other custom HTTP-based protocols) struggle to meet application needs in performance, maintainability, scalability and security. A cross-language, modular protocol will gradually become the new standard for application development. Since gRPC became a CNCF project in 2017, more and more infrastructure and business systems, including k8s and etcd, have adopted the gRPC ecosystem. As a cloud native microservice framework, Dubbo makes its new protocol fully compatible with gRPC. Triple also enhances and supplements the parts of the gRPC protocol that are less complete. So, does the Triple protocol solve the problems mentioned above?

  • In terms of performance, Triple separates metadata from payload, so intermediate devices such as gateways do not need to parse and deserialize the payload, which reduces response time.
  • In terms of routing support, metadata lets users add custom headers, so users can more easily partition clusters or route by header. This gives more flexibility during releases for traffic switching, gray (canary) releases or disaster recovery.
  • In terms of security, it supports encrypted transmission capabilities such as mutual TLS authentication (mTLS).
  • In terms of ease of use, Triple supports not only the Protobuf serialization recommended by native gRPC but also other serialization methods such as Hessian and JSON in a generic way, which makes it easier for users to upgrade to the Triple protocol. For an existing Dubbo service, switching to or adding the Triple protocol only requires one extra line of protocol configuration in the block that declares the service, so the migration cost is almost zero.
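
The routing point can be sketched in a few lines: a gateway looks only at request metadata and passes the payload bytes through untouched. This is an illustrative model, not Dubbo or gateway code; the function names, the header `x-release-lane` and the cluster names are all hypothetical.

```python
def pick_cluster(headers, routes, default="stable"):
    """Choose a target cluster from request metadata only.

    headers: dict of header name -> value (the metadata part of a request)
    routes:  list of (header_name, expected_value, cluster) rules, checked in order
    """
    for name, expected, cluster in routes:
        if headers.get(name) == expected:
            return cluster
    return default


def forward(headers, payload, routes):
    """Return (cluster, payload); the payload bytes are never parsed or deserialized."""
    return pick_cluster(headers, routes), payload


# Hypothetical rule: requests tagged for the gray (canary) lane go to the gray cluster.
ROUTES = [("x-release-lane", "gray", "gray-cluster")]

gray = forward({"x-release-lane": "gray"}, b"\x00opaque-bytes", ROUTES)
normal = forward({}, b"\x00opaque-bytes", ROUTES)
```

Because the decision depends only on headers, the same rule works regardless of whether the payload is Protobuf, Hessian or JSON.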


Current status

1. Fully compatible with gRPC: Triple clients and servers can interoperate with native gRPC.

2. It has been verified in large-scale production use and is production ready.

Characteristics and advantages

1. Cross-language interoperability. Both the traditional multi-language, multi-SDK model and the mesh-based cross-language model need a more general and extensible data transmission format.

2. A more complete request model. In addition to the request/response model, it also supports streaming and bidirectional streaming.

3. Easy to extend and highly penetrable, including but not limited to tracing and monitoring support. It can be recognized by devices at every level: gateways can identify the data messages, which is friendly to service mesh deployment and makes the protocol easier for users to understand.

4. Support for multiple serialization methods and smooth upgrades.

5. Transparent upgrades for Java users. There is no need to define cumbersome IDL files; changing the protocol name is enough to upgrade to the Triple protocol.

Introduction to triple protocol

Further extension based on grpc protocol

  • Service-Version → “tri-service-version” {Dubbo service version}
  • Service-Group → “tri-service-group” {Dubbo service group}
  • Tracing-ID → “tri-trace-traceid” {tracing id}
  • Tracing-RPC-ID → “tri-trace-rpcid” {span id}
  • Cluster-Info → “tri-unit-info” {cluster information}

Service-Version and Service-Group identify the version and group of a Dubbo service. They are needed because the gRPC path declares only the service name and method name; compared with the Dubbo protocol, the version and group information is missing. Tracing-ID and Tracing-RPC-ID provide full-link tracing and carry the trace ID and span ID respectively. Cluster-Info carries cluster information, which can be used to build flexible routing-related service governance capabilities such as cluster partitioning.
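
As a rough illustration, the metadata for one call could be assembled as follows. The header names come from the list above and the path follows the gRPC convention of service name plus method name; the function `triple_metadata` and the example service are hypothetical, and all other HTTP/2 and gRPC headers are deliberately left out.

```python
def triple_metadata(service, method, version=None, group=None,
                    trace_id=None, rpc_id=None, unit_info=None):
    """Build the path and the tri-* headers for one call (sketch only)."""
    # The gRPC path carries only the service name and the method name.
    headers = {":path": "/{}/{}".format(service, method)}
    optional = {
        "tri-service-version": version,   # Dubbo service version
        "tri-service-group": group,       # Dubbo service group
        "tri-trace-traceid": trace_id,    # tracing id
        "tri-trace-rpcid": rpc_id,        # span id
        "tri-unit-info": unit_info,       # cluster information
    }
    # Only send headers that actually carry a value.
    headers.update({k: v for k, v in optional.items() if v is not None})
    return headers


# Service and method names below are made up for the example.
meta = triple_metadata("org.example.GreetService", "sayHello",
                       version="1.0.0", group="demo")
```

The version and group travel next to the path rather than inside it, which is how the information missing from the gRPC path is restored.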

Triple Streaming

In addition to the traditional unary mode, the Triple protocol also provides streaming RPC.

  • What scenarios is streaming used for?

In scenarios such as large file transfer and live streaming, consumers or providers need to exchange a large amount of data with the opposite end. Because the data volume is so large, it cannot be sent in a single RPC packet, so it has to be split into fragments and sent through multiple RPC calls. If the fragments are sent in parallel, they arrive out of order, and the receiver has to sort and reassemble them, which makes the logic very complex. If the fragments are sent serially, the accumulated network round-trip time (RTT) and processing delay become very large.

To solve these problems and to pipeline the transmission of large amounts of data between consumers and providers, the streaming RPC model came into being.

With the streaming RPC mode of the Triple protocol, multiple user-level long-lived connections, called streams, are established between consumers and providers. Multiple streams can exist on the same TCP connection at the same time, and each stream is identified by a stream ID. The data packets on one stream are read and written in order.
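
The following toy model shows the idea of several streams sharing one connection while each stream keeps its own order. It is not an HTTP/2 implementation; the `(stream_id, chunk)` frame shape and the function name `demultiplex` are assumptions made for the sketch.

```python
from collections import defaultdict


def demultiplex(frames):
    """Group interleaved (stream_id, chunk) frames from one connection by stream.

    Frames of different streams may interleave, but chunks that share a
    stream id are appended in arrival order, so each stream stays sequential.
    """
    streams = defaultdict(bytearray)
    for stream_id, chunk in frames:
        streams[stream_id] += chunk
    return {stream_id: bytes(buf) for stream_id, buf in streams.items()}


# Two streams sharing one connection; the ids 1 and 3 are arbitrary examples.
frames = [(1, b"file-"), (3, b"video-"), (1, b"part"), (3, b"frame")]
result = demultiplex(frames)
```

No sorting or sequence numbers are needed at the application level: ordering is a property of each stream, and the stream ID is enough to separate the traffic.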

Summary

The most important trend in the API field is the rise of standardized technology. Triple is the main protocol introduced by Dubbo 3. It uses a layered design, and its data exchange format is based on Protocol Buffers (Protobuf), which offers excellent serialization and deserialization efficiency. It also supports a variety of other serialization methods and many development languages. For the transport layer, Triple uses HTTP/2, which is far more efficient than HTTP/1.1. As a mature open standard, HTTP/2 also offers rich security and flow control capabilities as well as good interoperability. Triple can be used not only for server-to-server calls but also for interaction between browsers, mobile apps or IoT devices and back-end services. At the same time, the Triple protocol seamlessly supports all service governance capabilities of Dubbo 3.

Under the cloud native trend, the need for interoperability across platforms, vendors and environments will inevitably give rise to RPC technology based on open standards. gRPC follows this trend and is being used more and more widely. In the microservice field, the proposal and implementation of the Triple protocol is a big step for Dubbo 3 toward cloud native microservices.

Appendix: Dubbo 2 protocol spec


  • Magic – Magic High & Magic Low (16 bits): identifies the Dubbo protocol with the value 0xdabb.
  • Req/Res (1 bit): identifies whether this is a request or a response. Request – 1; Response – 0.
  • 2 Way (1 bit): only meaningful when Req/Res is 1 (Request). It marks whether a return value is expected from the server; set to 1 if a return value is needed.
  • Event (1 bit): identifies whether this is an event message, for example a heartbeat event. Set to 1 if this is an event.
  • Serialization ID (5 bits): identifies the serialization type; for example, the value for fastjson is 6.
  • Status (8 bits): only meaningful when Req/Res is 0 (Response). It identifies the status of the response:
    • 20 – OK
    • 30 – CLIENT_TIMEOUT
    • 31 – SERVER_TIMEOUT
    • 40 – BAD_REQUEST
    • 50 – BAD_RESPONSE
    • 60 – SERVICE_NOT_FOUND
    • 70 – SERVICE_ERROR
    • 80 – SERVER_ERROR
    • 90 – CLIENT_ERROR
    • 100 – SERVER_THREADPOOL_EXHAUSTED_ERROR
  • Request ID (64 bits): identifies a unique request. Numeric (long).
  • Data Length (32 bits): length of the content (the variable part) after serialization, counted in bytes. Numeric (integer).
  • Variable Part: each part is a byte[] produced by the serialization type identified by Serialization ID.
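
The fixed fields above add up to 16 bytes, which can be packed and parsed as in the sketch below. It assumes big-endian (network) byte order and that the Req/Res, 2 Way and Event flags occupy the three high bits of the third byte in the order listed, with the Serialization ID in the low five bits. The function names are hypothetical.

```python
import struct

MAGIC = 0xDABB
FLAG_REQUEST = 0x80        # Req/Res bit (assumed to be the highest bit)
FLAG_TWO_WAY = 0x40        # 2 Way bit
FLAG_EVENT = 0x20          # Event bit
SERIALIZATION_MASK = 0x1F  # low 5 bits hold the Serialization ID

# magic (16) | flags + serialization id (8) | status (8) | request id (64) | data length (32)
HEADER = struct.Struct(">HBBqI")


def pack_header(is_request, two_way, event, serialization_id,
                status, request_id, data_length):
    """Pack the fixed 16-byte header described in the field list."""
    flags = serialization_id & SERIALIZATION_MASK
    if is_request:
        flags |= FLAG_REQUEST
    if two_way:
        flags |= FLAG_TWO_WAY
    if event:
        flags |= FLAG_EVENT
    return HEADER.pack(MAGIC, flags, status, request_id, data_length)


def unpack_header(data):
    """Parse the fixed header and split the flag byte back into its fields."""
    magic, flags, status, request_id, data_length = HEADER.unpack_from(data, 0)
    if magic != MAGIC:
        raise ValueError("not a Dubbo 2 frame: bad magic number")
    return {
        "is_request": bool(flags & FLAG_REQUEST),
        "two_way": bool(flags & FLAG_TWO_WAY),
        "event": bool(flags & FLAG_EVENT),
        "serialization_id": flags & SERIALIZATION_MASK,
        "status": status,
        "request_id": request_id,
        "data_length": data_length,
    }


# A two-way request using serialization id 6 (fastjson) with a 10-byte body.
header = pack_header(True, True, False, 6, 0, 1, 10)
fields = unpack_header(header)
```

The status byte is written as 0 for the request here because, as noted above, it only carries meaning in responses.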

Every part of the variable part is a byte[] produced by a specific serialization type, which is identified by the Serialization ID.

  1. If the content is a Request (Req/Res = 1), the parts are, in order:
    • Dubbo version
    • Service name
    • Service version
    • Method name
    • Method parameter types
    • Method arguments
    • Attachments
  2. If the content is a Response (Req/Res = 0), the parts are, in order:
    • Return value type, which identifies what kind of value the server returns: RESPONSE_NULL_VALUE – 2, RESPONSE_VALUE – 1, RESPONSE_WITH_EXCEPTION – 0.
    • Return value, the actual value returned by the server.
Note: for the variable-length part, when the current version of the Dubbo framework uses JSON serialization, a newline character is added between the parts as a separator. Please add an additional newline character after each part of the variable part, such as:

Dubbo version bytes
Service name bytes
...
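
The newline rule can be sketched as follows, using Python's `json` module as a stand-in for the JSON serializer. The part values are placeholders, the function names are hypothetical, and how the framework serializes each individual part (for example the parameter types or multiple arguments) is not reproduced here.

```python
import json


def encode_json_body(parts):
    """Serialize each part as JSON and append a newline after every part."""
    return b"".join(json.dumps(part).encode("utf-8") + b"\n" for part in parts)


def decode_json_body(body):
    """Split the variable part on newlines and parse each part back."""
    return [json.loads(line) for line in body.decode("utf-8").splitlines()]


# Placeholder request parts in the order listed for a Request.
parts = [
    "2.0.2",                     # Dubbo version (example value)
    "org.example.GreetService",  # Service name (made up)
    "1.0.0",                     # Service version
    "sayHello",                  # Method name
    "Ljava/lang/String;",        # Method parameter types (illustrative)
    "world",                     # Method arguments (a single argument here)
    {},                          # Attachments
]
body = encode_json_body(parts)
decoded = decode_json_body(body)
```

The length of `body` in bytes is what would go into the Data Length field of the header.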