Spark: how the RPC components communicate


Basic concepts

Before we start, let’s talk about a few concepts:

  • RpcEndpoint: an RPC endpoint instance that defines how messages are handled, for example what to do when a message is received.
  • RpcEndpointRef: a reference to an RpcEndpoint. It points to the RpcEndpoint on the server, so an RpcEndpointRef carries the server's RPC address.


  • Inbox: holds the RpcEndpoint and the RpcEndpointRef, and also stores a message queue of InboxMessage.
  • EndpointData: bundles an RpcEndpoint, its NettyRpcEndpointRef, and an Inbox.

To sum up, an EndpointData wraps an RpcEndpoint, its NettyRpcEndpointRef, and an Inbox, and the Inbox in turn holds the same endpoint and reference plus its queue of InboxMessage.
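The containment described above can be sketched in a few lines. This is a minimal Python model, not Spark's Scala code: the class names mirror the concepts in the text, while the field names, the constructor signatures, and the address used at the end are assumptions made for illustration.

```python
from collections import deque
from dataclasses import dataclass


class RpcEndpoint:
    """Server-side object that defines how messages are handled."""

    def on_start(self):
        pass

    def receive(self, message):
        pass


@dataclass(frozen=True)
class RpcEndpointRef:
    """Handle to an endpoint: carries the server address and endpoint name."""
    host: str
    port: int
    name: str


class Inbox:
    """Holds the endpoint, its ref, and a queue of pending inbox messages."""

    def __init__(self, endpoint_ref, endpoint):
        self.endpoint_ref = endpoint_ref
        self.endpoint = endpoint
        self.messages = deque()  # pending InboxMessage items, consumed in order


class EndpointData:
    """Bundles an endpoint, its ref, and the inbox built from both."""

    def __init__(self, name, endpoint, ref):
        self.name = name
        self.endpoint = endpoint
        self.ref = ref
        self.inbox = Inbox(ref, endpoint)


# Hypothetical address and name, used only to build one instance.
ref = RpcEndpointRef("localhost", 7077, "demo")
data = EndpointData("demo", RpcEndpoint(), ref)
```

The point of the sketch is that the Inbox does not copy anything: it shares the same endpoint and reference objects that EndpointData holds.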

RpcEndpoint registration

When an RpcEndpoint is instantiated and registered with the Dispatcher, the components above are instantiated and wrapped in an EndpointData. Note that when the Inbox is instantiated, it puts an OnStart message into its messages queue.
After creating the EndpointData, the Dispatcher puts it into the receivers queue.
In addition, the Dispatcher stores the mapping between name and EndpointData, and the mapping between RpcEndpoint and RpcEndpointRef.
The EndpointData in the receivers queue does not just sit there: a thread pool processes it. The pool runs n threads, each executing a MessageLoop that listens on the receivers queue and takes an item out whenever one is available.
Each item is an EndpointData, and processing an EndpointData means processing the Inbox inside it.
As mentioned above, the Inbox maintains a queue of InboxMessage called messages, and it takes messages out and consumes them one by one. Messages have their own types, such as RpcMessage, OnStart, OnStop, and so on.
Because an OnStart message is enqueued when the Inbox is instantiated, the messages queue is non-empty from the very beginning, so the handler for OnStart, that is, the onStart method of the RpcEndpoint, runs as soon as a thread picks up the Inbox.
Therefore, the first part of the life cycle of an RpcEndpoint is: constructor -> onStart.
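The registration and message-loop flow can be sketched as below. This is a simplified Python model under stated assumptions, not Spark's implementation: EchoEndpoint is a hypothetical endpoint, the Inbox itself stands in for EndpointData, OnStart is represented by a plain string, and a single lock is used so that only one thread drains a given inbox at a time.

```python
import queue
import threading
from collections import deque

ON_START = "OnStart"  # stand-in for the OnStart inbox message


class EchoEndpoint:
    """Hypothetical endpoint that records what it sees, in order."""

    def __init__(self):
        self.events = []

    def on_start(self):
        self.events.append("on_start")

    def receive(self, message):
        self.events.append(("receive", message))


class Inbox:
    def __init__(self, endpoint):
        self.endpoint = endpoint
        self.run_lock = threading.Lock()  # one draining thread at a time
        # OnStart is queued at construction, so it is always handled first.
        self.messages = deque([ON_START])

    def post(self, message):
        self.messages.append(message)  # deque.append is thread-safe

    def process(self):
        """Drain the queue, dispatching each message by its type."""
        with self.run_lock:
            while self.messages:
                message = self.messages.popleft()
                if message == ON_START:
                    self.endpoint.on_start()
                else:
                    self.endpoint.receive(message)


class Dispatcher:
    def __init__(self, num_threads=2):
        self.endpoints = {}             # name -> Inbox (EndpointData stand-in)
        self.receivers = queue.Queue()  # inboxes that may have pending messages
        for _ in range(num_threads):
            threading.Thread(target=self._message_loop, daemon=True).start()

    def register(self, name, endpoint):
        inbox = Inbox(endpoint)
        self.endpoints[name] = inbox
        self.receivers.put(inbox)       # schedules the initial OnStart

    def post(self, name, message):
        inbox = self.endpoints[name]
        inbox.post(message)
        self.receivers.put(inbox)

    def _message_loop(self):
        while True:
            inbox = self.receivers.get()  # blocks until work arrives
            try:
                inbox.process()
            finally:
                self.receivers.task_done()


dispatcher = Dispatcher()
endpoint = EchoEndpoint()
dispatcher.register("echo", endpoint)
dispatcher.post("echo", "hello")
dispatcher.receivers.join()  # wait until every scheduled inbox was processed
```

After the join, endpoint.events holds "on_start" followed by the posted message, which matches the constructor -> onStart ordering described above.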

Client sending a request

Once an RpcEndpoint is registered, it can accept requests. Now let's see how the client sends one.
Before sending a message, the client wraps it in a RequestMessage and then checks whether the destination address is its own address. If it is, the flow is the same as above: the message is put into the Inbox, the EndpointData is put into the receivers queue, and a thread consumes it.
If the address is different, the message is serialized, wrapped in an OutboxMessage, and handed to the Outbox for processing.
Like the Inbox, the Outbox has its own queue of OutboxMessage, and it keeps taking messages from the queue and sending them.
After receiving the request, the server deserializes the message, puts it into the target Inbox, and puts the corresponding EndpointData into the receivers queue.
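The local-versus-remote decision on the sending side can be sketched as follows. This is an illustrative Python model only: RpcAddress, the RpcEnv.send signature, and the transport callable are hypothetical, a plain deque stands in for the Inbox, and JSON stands in for the serializer Spark actually uses.

```python
import json
from collections import deque
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RpcAddress:
    host: str
    port: int


@dataclass
class RequestMessage:
    sender: RpcAddress
    receiver: RpcAddress
    endpoint_name: str
    content: str


class Outbox:
    """Per-destination queue of already-serialized messages."""

    def __init__(self, address, transport):
        self.address = address
        self.transport = transport  # hypothetical callable(address, payload)
        self.messages = deque()

    def send(self, payload):
        self.messages.append(payload)
        self.drain()

    def drain(self):
        # Keep taking messages off the queue and sending them.
        while self.messages:
            self.transport(self.address, self.messages.popleft())


class RpcEnv:
    def __init__(self, address, transport):
        self.address = address
        self.transport = transport
        self.local_inboxes = {}  # endpoint name -> deque (Inbox stand-in)
        self.outboxes = {}       # remote address -> Outbox

    def send(self, receiver, endpoint_name, content):
        message = RequestMessage(self.address, receiver, endpoint_name, content)
        if receiver == self.address:
            # Same address: skip serialization and go straight to the inbox.
            self.local_inboxes[endpoint_name].append(message)
            return
        # Different address: serialize, then hand over to that peer's outbox.
        payload = json.dumps(asdict(message)).encode("utf-8")
        if receiver not in self.outboxes:
            self.outboxes[receiver] = Outbox(receiver, self.transport)
        self.outboxes[receiver].send(payload)


sent = []  # records what the fake transport was asked to send
local = RpcAddress("host-a", 5000)
remote = RpcAddress("host-b", 5000)
env = RpcEnv(local, lambda address, payload: sent.append((address, payload)))
env.local_inboxes["echo"] = deque()

env.send(local, "echo", "ping")   # local: lands in the inbox as an object
env.send(remote, "echo", "ping")  # remote: serialized, sent via the outbox
```

Only the remote path pays the serialization cost, which is the reason for the address check before sending.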

Overall process

1. The client creates a message, serializes it, and puts it into the Outbox queue.
2. A client thread takes the message out and sends it through Netty to the server's address.
3. The server receives the message, deserializes it, and then finds the EndpointData using the information supplied by the client.
4. The server takes the Inbox out of the EndpointData and puts the message into its queue.
5. The server puts the EndpointData into the receivers queue.
6. A thread in the thread pool takes the EndpointData out of the receivers queue and has its Inbox call the RpcEndpoint to process the message.
7. If there is a return value, it is serialized and put into the Outbox through the RpcEndpointRef, and a server thread sends it back to the client.
8. After receiving the message, the client processes it by repeating the flow from step 3.
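The eight steps can be walked through with a single-threaded toy model. Everything here is an assumption made for illustration: Node, its method names, and the reply_to field are hypothetical, the peer attribute stands in for a Netty connection, JSON stands in for the serializer, and the reply is delivered to an ordinary endpoint on the client, which is a simplification of how replies are handled.

```python
import json
from collections import deque


class Node:
    """One side of the conversation: endpoints, receivers queue, outbox."""

    def __init__(self, name):
        self.name = name
        self.endpoints = {}       # endpoint name -> (handler, inbox deque)
        self.receivers = deque()  # endpoint names with pending messages
        self.outbox = deque()     # serialized messages waiting to be sent
        self.peer = None          # the other Node, standing in for the network

    def register(self, endpoint_name, handler):
        self.endpoints[endpoint_name] = (handler, deque())

    def send(self, endpoint_name, body, reply_to=None):
        # Step 1: build the message, serialize it, queue it in the outbox.
        message = {"endpoint": endpoint_name, "body": body, "reply_to": reply_to}
        self.outbox.append(json.dumps(message).encode("utf-8"))

    def flush(self):
        # Step 2: take messages off the outbox and push them to the peer.
        while self.outbox:
            self.peer.on_bytes(self.outbox.popleft())

    def on_bytes(self, payload):
        # Steps 3 to 5: deserialize, find the endpoint, enqueue, schedule.
        message = json.loads(payload.decode("utf-8"))
        _, inbox = self.endpoints[message["endpoint"]]
        inbox.append(message)
        self.receivers.append(message["endpoint"])

    def run_message_loop(self):
        # Step 6: drain scheduled inboxes and call the endpoint handler.
        while self.receivers:
            handler, inbox = self.endpoints[self.receivers.popleft()]
            while inbox:
                message = inbox.popleft()
                result = handler(message["body"])
                # Step 7: a return value goes back through the outbox.
                if result is not None and message["reply_to"] is not None:
                    self.send(message["reply_to"], result)


client, server = Node("client"), Node("server")
client.peer, server.peer = server, client

replies = []
server.register("upper", lambda text: text.upper())
client.register("reply", replies.append)  # append returns None: no reply

client.send("upper", "hello", reply_to="reply")  # step 1
client.flush()                                   # steps 2 to 5
server.run_message_loop()                        # step 6, queues the reply
server.flush()                                   # step 7
client.run_message_loop()                        # step 8
```

At the end of the run, replies contains the upper-cased text, having travelled through both outboxes and both inboxes exactly once.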

Source code mind map

Creation of RpcEnv
Client sends a request
Server processes a request