Half an hour is enough to become proficient in WebSocket!

Time:2021-5-29

The original title of this article is “WebSocket: 5 minutes from introduction to mastery”. See the references section at the end of the article for a link to the original text. Some changes were made for this collection.

1. Introduction

Since WebSocket emerged with HTML5, it has completely changed the “pain point” of the basic channel for instant messaging on the web (short polling, long polling, Comet, SSE and other techniques had been a struggle for a long time…). We no longer have to worry about whether “polling” or “Comet” should be used to keep data real-time. Happiness comes so suddenly ^_^.

WebSocket is now not only widely used in web applications, but is also gradually being adopted by developers in rich clients (such as mobile apps) that originally used TCP and UDP directly.

In view of this, instant messaging developers need a comprehensive and in-depth understanding of WebSocket, and it is also a topic that comes up in job interviews.

For that reason, in the years since it was founded, the instant messaging network (52im.net) has continuously collected a large number of technical articles on web instant messaging (especially WebSocket). This article also takes WebSocket from introduction to proficiency. The content goes from simple to deep and suits developers who want a deeper understanding of the WebSocket protocol in a short time.

(This article is published at: http://www.52im.net/thread-3134-1-1.html)

2. Related articles

Detailed explanation of websocket (1): preliminary understanding of websocket Technology
Websocket (2): technical principles, code demonstration and application cases
Detailed explanation of websocket (3): go deep into the details of websocket communication protocol
The relationship between HTTP and websocket
The relationship between HTTP and websocket (Part 2)
Websocket (6): the relationship between websocket and socket

3. Text overview

The emergence of websocket makes the browser have the ability of real-time two-way communication.

This article will introduce how websocket establishes connection, the details of data exchange, and the format of data frame. In addition, it also briefly introduces the security attacks against websocket and how the protocol can resist similar attacks.

4. What is websocket

4.1 basic introduction

WebSocket is a full-duplex communication technology between browser and server that arrived with HTML5. It is an application-layer protocol built on TCP, and it reuses the HTTP handshake channel.

For most web developers, the above description is a bit boring. In fact, just remember a few points:

  • 1) Websocket can be used in browser;
  • 2) Support two-way communication;
  • 3) It’s easy to use.

4.2 what are the advantages

The comparison here is with the HTTP protocol. In short, WebSocket is more flexible, more efficient and more extensible:

  • 1) Two-way communication, so real-time behavior is better;
  • 2) Better binary support;
  • 3) Less control overhead. After the connection is created, the protocol header on each packet exchanged between the WS client and server is small. Without extensions, the header from server to client is only 2-10 bytes (depending on the packet length). From client to server, an additional 4-byte masking key is needed. The HTTP protocol, by contrast, carries a complete header on every exchange;
  • 4) Extension support. The WS protocol defines extensions, so users can extend the protocol or implement custom subprotocols (for example, a custom compression algorithm).

For the last two points, readers who have not studied the WebSocket protocol specification may not find them intuitive, but that does not get in the way of learning and using WebSocket.
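To make the "less control overhead" point more concrete, here is a minimal sketch that computes the size of a WebSocket frame header for a given payload length. The function name `frameHeaderSize` is made up for this illustration, and the sizes follow the frame format described in section 7 (no extensions assumed).

```javascript
// Hypothetical helper (not part of any library): size in bytes of a
// WebSocket frame header when no extension is in use.
function frameHeaderSize(payloadLength, masked) {
  let size = 2;                             // FIN/RSV/opcode byte + MASK/length byte
  if (payloadLength > 65535) size += 8;     // 64-bit extended payload length
  else if (payloadLength > 125) size += 2;  // 16-bit extended payload length
  if (masked) size += 4;                    // client-to-server frames carry a masking key
  return size;
}

console.log(frameHeaderSize(5, false));     // 2
console.log(frameHeaderSize(300, false));   // 4
console.log(frameHeaderSize(70000, false)); // 10
console.log(frameHeaderSize(5, true));      // 6
```

A short text message therefore costs 2 bytes of header from the server and 6 bytes from the client, compared with the full set of headers that an HTTP request repeats every time.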

4.3 what to learn

When learning an application-layer network protocol, the most important parts are usually connection establishment and data exchange. The data format cannot be skipped either, because it directly determines what the protocol is capable of. A good data format makes a protocol more efficient and more extensible.

The rest of this article focuses on the following points:

  • 1) How to establish the connection;
  • 2) How to exchange data;
  • 3) Data frame format;
  • 4) How to maintain the connection.

5. Introduction demo code

Before formally introducing the details of the protocol, let’s look at a simple example to get an intuitive feel. The example includes a WebSocket server and a WebSocket client. The complete code is linked from the original article.

Here, the server uses the ws library. Compared with the more familiar socket.io, ws is lighter and better suited to learning.

5.1 server

The code is as follows, monitoring port 8080. When a new connection request arrives, print the log and send a message to the client. When a message from the client is received, the log is also printed.

var app = require('express')();
var WebSocket = require('ws');

// WebSocket server listening on port 8080
var wss = new WebSocket.Server({ port: 8080 });

wss.on('connection', function connection(ws) {
    console.log('server: receive connection.');
    // Log every message received from the client
    ws.on('message', function incoming(message) {
        console.log('server: received: %s', message);
    });
    // Greet the client as soon as it connects
    ws.send('world');
});

// HTTP server on port 3000 that serves the client page
app.get('/', function(req, res) {
  res.sendFile(__dirname + '/index.html');
});
app.listen(3000);

5.2 client

The code is as follows: initiate websocket connection to port 8080. After the connection is established, print the log and send a message to the server. After receiving the message from the server, the log is also printed.

<script>
  var ws = new WebSocket('ws://localhost:8080');
  ws.onopen = function() {
    console.log('ws onopen');
    ws.send('from client: hello');
  };
  ws.onmessage = function(e) {
    console.log('ws onmessage');
    console.log('from server: ' + e.data);
  };
</script>

5.3 operation results

The logs of the server and of the client (browser console) look as follows.

Server output:

server: receive connection.
server: received: from client: hello

Client output:

ws onopen
ws onmessage
from server: world

6. How to establish a connection

As mentioned earlier, websocket reuses the HTTP handshake channel. Specifically, the client negotiates the upgrade protocol with websocket server through HTTP request. After the protocol upgrade, the subsequent data exchange will follow the websocket protocol.

6.1 client: apply for protocol upgrade

First, the client initiates a protocol upgrade request. As you can see, it uses the standard HTTP message format, and only the GET method is supported.

GET / HTTP/1.1
Host: localhost:8080
Origin: http://127.0.0.1:3000
Connection: Upgrade
Upgrade: websocket
Sec-WebSocket-Version: 13
Sec-WebSocket-Key: w4v7O6xFTi36lq3RNcgctw==

The key request headers have the following meanings:

  • 1) Connection: Upgrade: indicates that the protocol is to be upgraded;
  • 2) Upgrade: websocket: indicates an upgrade to the WebSocket protocol;
  • 3) Sec-WebSocket-Version: 13: indicates the WebSocket version. If the server does not support this version, it must return a Sec-WebSocket-Version header listing the versions it supports;
  • 4) Sec-WebSocket-Key: paired with Sec-WebSocket-Accept in the server response header. It provides basic protection, for example against malicious or accidental connections.

Note: the request above omits some non-essential request headers. Because it is a standard HTTP request, headers such as Host, Origin and Cookie are sent as usual. During the handshake, security restrictions and permission checks can be applied based on these request headers.
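The checks a server might run on such a request can be sketched as follows. This is only an illustration: `isWebSocketUpgrade` is a hypothetical helper, and it assumes the header names have already been lower-cased, as Node's http module does.

```javascript
// Hypothetical helper: decide whether a parsed HTTP request looks like a
// valid WebSocket upgrade request. `headers` is assumed to use lower-case keys.
function isWebSocketUpgrade(method, headers) {
  const connection = (headers['connection'] || '').toLowerCase();
  const upgrade = (headers['upgrade'] || '').toLowerCase();
  const key = headers['sec-websocket-key'] || '';
  return (
    method === 'GET' &&
    // Connection may list several tokens, e.g. "keep-alive, Upgrade"
    connection.split(',').map((token) => token.trim()).includes('upgrade') &&
    upgrade === 'websocket' &&
    headers['sec-websocket-version'] === '13' &&
    // The key is 16 random bytes, Base64-encoded
    Buffer.from(key, 'base64').length === 16
  );
}

const request = {
  method: 'GET',
  headers: {
    host: 'localhost:8080',
    origin: 'http://127.0.0.1:3000',
    connection: 'Upgrade',
    upgrade: 'websocket',
    'sec-websocket-version': '13',
    'sec-websocket-key': 'w4v7O6xFTi36lq3RNcgctw==',
  },
};
console.log(isWebSocketUpgrade(request.method, request.headers)); // true
```

A real server would typically also inspect Origin or Cookie at this point to decide whether the client is allowed to connect.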

6.2 server: respond to protocol upgrade

The content returned by the server is as follows, and the status code 101 indicates protocol switching. At this point, the protocol upgrade is completed, and the subsequent data interaction is carried out according to the new protocol.

HTTP/1.1 101 Switching Protocols
Connection: Upgrade
Upgrade: websocket
Sec-WebSocket-Accept: Oy4NRAQ13jhfONC7bP8dTKb4PTU=

Note: each header line ends with \r\n, and an extra blank line (\r\n) follows the last header. In addition, the HTTP status codes returned by the server can only be used during the handshake phase. After the handshake, only WebSocket-specific error codes can be used.

6.3 calculation of Sec-WebSocket-Accept

Sec-WebSocket-Accept is calculated from the Sec-WebSocket-Key in the client request header.

The calculation is as follows:

  • 1) Concatenate Sec-WebSocket-Key with 258EAFA5-E914-47DA-95CA-C5AB0DC85B11;
  • 2) Compute the SHA-1 digest of the result and convert it to a Base64 string.

The pseudo code is as follows:

toBase64( sha1( Sec-WebSocket-Key + "258EAFA5-E914-47DA-95CA-C5AB0DC85B11" ) )

Verify the result against the response shown earlier:

const crypto = require('crypto');
const magic = '258EAFA5-E914-47DA-95CA-C5AB0DC85B11';
const secWebSocketKey = 'w4v7O6xFTi36lq3RNcgctw==';
// SHA-1 over key + magic string, encoded as Base64
const secWebSocketAccept = crypto.createHash('sha1')
    .update(secWebSocketKey + magic)
    .digest('base64');
console.log(secWebSocketAccept);
// Oy4NRAQ13jhfONC7bP8dTKb4PTU=

7. Data frame format

7.1 overview

The data exchange between client and server is inseparable from the definition of data frame format. Therefore, before we talk about data exchange, let’s take a look at the data frame format of websocket.

The smallest unit of communication between a WebSocket client and server is the frame. One or more frames make up a complete message.

  • 1) Sender: cuts the message into multiple frames and sends them to the receiver;
  • 2) Receiver: receives the frames and reassembles the associated frames into a complete message.

The focus of this section is the data frame format. For the detailed definition, see RFC 6455, Section 5.2.

7.2 overview of data frame format

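The unified format of WebSocket data frames is given below. Readers familiar with TCP/IP will recognize this kind of diagram.

  • 1) From left to right, the unit is bits. For example, FIN and RSV1 occupy 1 bit each, and opcode occupies 4 bits;
  • 2) The content includes the flags, opcode, mask, data, data length and so on (explained in the next section).

Frame layout as given in RFC 6455, Section 5.2:

  0                   1                   2                   3
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
 +-+-+-+-+-------+-+-------------+-------------------------------+
 |F|R|R|R| opcode|M| Payload len |    Extended payload length    |
 |I|S|S|S|  (4)  |A|     (7)     |             (16/64)           |
 |N|V|V|V|       |S|             |   (if payload len==126/127)   |
 | |1|2|3|       |K|             |                               |
 +-+-+-+-+-------+-+-------------+ - - - - - - - - - - - - - - - +
 |     Extended payload length continued, if payload len == 127  |
 + - - - - - - - - - - - - - - - +-------------------------------+
 |                               |Masking-key, if MASK set to 1  |
 +-------------------------------+-------------------------------+
 | Masking-key (continued)       |          Payload Data         |
 +-------------------------------- - - - - - - - - - - - - - - - +
 :                     Payload Data continued ...                :
 + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
 |                     Payload Data continued ...                |
 +---------------------------------------------------------------+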

7.3 detailed explanation of data frame format

In view of the previous format overview diagram, here we will explain each field one by one. If there are any unclear points, please refer to the protocol specification or leave a message for communication.

1) FIN: 1 bit.

If it is 1, this is the last fragment of the message. If it is 0, this is not the last fragment of the message.

2) RSV1, RSV2, RSV3: 1 bit each.

Normally all three are 0. When the client and server negotiate a WebSocket extension, these three flag bits can be non-zero, and the meaning of the values is defined by the extension. If a non-zero value is received and no negotiated extension defines it, the receiver must fail the connection.

3) Opcode: 4 bits.

The value of opcode determines how the payload data that follows is interpreted. If the opcode is unknown, the receiver should fail the connection.

The available opcodes are as follows:

%x0: a continuation frame. When opcode is 0, the message is being sent in fragments, and the frame just received is one of those fragments.
%x1: a text frame.
%x2: a binary frame.
%x3-7: reserved for non-control frames to be defined later.
%x8: connection close.
%x9: a ping.
%xA: a pong.
%xB-F: reserved for control frames to be defined later.
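FIN, RSV1-3 and the opcode all live in the first byte of a frame. A minimal sketch of decoding that byte is shown below. The names `OPCODES` and `decodeFirstByte` are illustrative and do not come from any library.

```javascript
// Opcode values listed above; anything else is treated as reserved.
const OPCODES = {
  0x0: 'continuation',
  0x1: 'text',
  0x2: 'binary',
  0x8: 'close',
  0x9: 'ping',
  0xa: 'pong',
};

// Hypothetical helper: split the first byte of a frame into its fields.
function decodeFirstByte(byte) {
  const opcode = byte & 0x0f;            // low 4 bits
  return {
    fin: (byte & 0x80) !== 0,            // highest bit
    rsv1: (byte & 0x40) !== 0,
    rsv2: (byte & 0x20) !== 0,
    rsv3: (byte & 0x10) !== 0,
    opcode: opcode,
    type: OPCODES[opcode] || 'reserved',
    isControl: (opcode & 0x08) !== 0,    // control frames are 0x8-0xF
  };
}

const first = decodeFirstByte(0x81);
console.log(first.fin, first.opcode, first.type); // true 1 text
```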

4) Mask: 1 bit.

Indicates whether the payload data is masked. Data sent from the client to the server must be masked. Data sent from the server to the client is not masked.

If the server receives data that is not masked, it must close the connection.

If Mask is 1, a masking key is carried in the Masking-key field and is used to unmask the payload data. Mask is 1 for all data frames sent from client to server.

The masking algorithm is explained in section 7.4.

5) Payload length: the length of the payload data, in bytes. It is 7 bits, 7 + 16 bits, or 7 + 64 bits.

Suppose Payload length === x. If:

x is 0 ~ 125: the length of the data is x bytes.
x is 126: the next 2 bytes represent a 16-bit unsigned integer whose value is the length of the data.
x is 127: the next 8 bytes represent a 64-bit unsigned integer (the most significant bit must be 0) whose value is the length of the data.

In addition, if the payload length takes up more than one byte, it is expressed in network byte order (big endian).
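The three length cases can be sketched as follows, assuming the frame is available as a Node.js Buffer. `readPayloadLength` is a hypothetical helper. It returns the length as a JavaScript number, which is only exact up to Number.MAX_SAFE_INTEGER.

```javascript
// Hypothetical helper: read the payload length of a frame held in a Buffer.
// Returns the length plus the offset of the first byte after the length field.
function readPayloadLength(frame) {
  const x = frame[1] & 0x7f;                 // low 7 bits of the second byte
  if (x <= 125) {
    return { length: x, offset: 2 };
  }
  if (x === 126) {
    return { length: frame.readUInt16BE(2), offset: 4 };   // 16-bit, big endian
  }
  // x === 127: 64-bit big-endian unsigned integer
  return { length: Number(frame.readBigUInt64BE(2)), offset: 10 };
}

console.log(readPayloadLength(Buffer.from([0x81, 0x05])));             // { length: 5, offset: 2 }
console.log(readPayloadLength(Buffer.from([0x82, 126, 0x01, 0x00])));  // { length: 256, offset: 4 }
```

When the Mask bit is set, the 4-byte masking key starts at the returned offset and the payload starts 4 bytes later.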

6) Masking-key: 0 or 4 bytes (32 bits).

All data frames sent from the client to the server are masked: Mask is 1 and the frame carries a 4-byte masking key. If Mask is 0, there is no masking key.

Note: the payload length does not include the length of the masking key.

7) Payload data: (x + y) bytes.

Payload data: includes extension data and application data. The extension data is x bytes and the application data is y bytes.

Extension data: if no extension is negotiated, the extension data is 0 bytes. Every extension must declare the length of its extension data, or how that length is calculated. In addition, how the extension is used must be negotiated during the handshake. If extension data is present, the payload length includes the length of the extension data.

Application data: any application data. It follows the extension data (if any) and occupies the rest of the data frame. The length of the application data is the payload length minus the length of the extension data.
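Putting the fields together, the sketch below encodes a complete, unfragmented text frame as a server would send it (FIN = 1, opcode = 0x1, Mask = 0, no extensions). `encodeTextFrame` is a made-up name for this illustration, not an API of the ws library.

```javascript
// Hypothetical helper: encode an unmasked (server-to-client) text frame.
function encodeTextFrame(text) {
  const payload = Buffer.from(text, 'utf8');
  const len = payload.length;
  let header;
  if (len <= 125) {
    header = Buffer.from([0x81, len]);        // FIN=1, opcode=0x1, MASK=0, 7-bit length
  } else if (len <= 0xffff) {
    header = Buffer.alloc(4);
    header[0] = 0x81;
    header[1] = 126;                          // 16-bit extended length follows
    header.writeUInt16BE(len, 2);
  } else {
    header = Buffer.alloc(10);
    header[0] = 0x81;
    header[1] = 127;                          // 64-bit extended length follows
    header.writeBigUInt64BE(BigInt(len), 2);
  }
  return Buffer.concat([header, payload]);
}

console.log(encodeTextFrame('Hello').toString('hex')); // 810548656c6c6f
```

The output matches the unmasked "Hello" text frame used as an example in RFC 6455: two header bytes followed by the five payload bytes.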

7.4 Mask Algorithm

The masking key is a 32-bit random number selected by the client. The mask operation does not affect the length of the data payload.

The following algorithm is used for both masking and unmasking.

First, suppose:

original-octet-i: the i-th byte of the original data;
transformed-octet-i: the i-th byte of the transformed data;
j: the result of i mod 4;
masking-key-octet-j: the j-th byte of the masking key.

The algorithm is as follows: XOR original-octet-i with masking-key-octet-j to obtain transformed-octet-i.

j = i MOD 4
transformed-octet-i = original-octet-i XOR masking-key-octet-j
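The same algorithm in JavaScript, as a minimal sketch. `applyMask` is an illustrative name, and the key and payload are the masked "Hello" example from RFC 6455.

```javascript
// Masking and unmasking are the same XOR operation, so one function does both.
function applyMask(data, maskingKey) {
  const out = Buffer.alloc(data.length);
  for (let i = 0; i < data.length; i++) {
    out[i] = data[i] ^ maskingKey[i % 4];   // j = i MOD 4
  }
  return out;
}

const key = Buffer.from([0x37, 0xfa, 0x21, 0x3d]);
const masked = applyMask(Buffer.from('Hello'), key);
console.log(masked.toString('hex'));                   // 7f9f4d5158
console.log(applyMask(masked, key).toString('utf8'));  // Hello
```

Applying the mask twice with the same key restores the original bytes, which is why the receiver needs nothing more than the 4-byte key carried in the frame.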

8. Data transmission

Once the websocket client and server establish a connection, the subsequent operations are based on the transmission of data frames.

Websocket distinguishes the types of operations according to opcode. For example, 0x8 indicates disconnection, and 0x0-0x2 indicates data interaction.

8.1 data fragmentation

Each WebSocket message may be split into multiple data frames. When the receiver gets a data frame, it uses the value of FIN to decide whether it has received the last data frame of the message.

FIN = 1 means the current data frame is the last one of the message. At this point the receiver has the complete message and can process it. If FIN = 0, the receiver must keep listening for the remaining data frames.

In addition, in the data exchange scenario, opcode indicates the type of data: 0x01 for text and 0x02 for binary. 0x00 is special. It marks a continuation frame: as the name suggests, the frame continues a message that has already started, so the complete message has not been received yet.

8.2 data fragmentation example

It is easier to look at an example directly. The following example is from MDN and demonstrates data fragmentation well. The client sends two messages to the server, and the server responds to the client after receiving each message. Here we mainly look at the messages sent by the client to the server.

First message:

FIN = 1, indicating that this is the last data frame of the current message. After receiving it, the server can process the message. Opcode = 0x1, indicating that the client is sending text.

Second message:

  • 1) FIN = 0, opcode = 0x1: text is being sent, the message is not finished yet, and more data frames follow;
  • 2) FIN = 0, opcode = 0x0: the message is not finished yet and more data frames follow. The current data frame is appended to the previous one;
  • 3) FIN = 1, opcode = 0x0: the message is finished and no more data frames follow. The current data frame is appended to the previous one, and the server can assemble the associated data frames into a complete message.

Client: FIN=1, opcode=0x1, msg="hello"
Server: (process complete message immediately) Hi.
Client: FIN=0, opcode=0x1, msg="and a"
Server: (listening, new message containing text started)
Client: FIN=0, opcode=0x0, msg="happy new"
Server: (listening, payload concatenated to previous message)
Client: FIN=1, opcode=0x0, msg="year!"
Server: (process complete message) Happy new year to you too!
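The receiver's side of this exchange can be sketched as a small reassembler. This is an illustration only: `createReassembler` is a hypothetical helper that takes already-parsed and unmasked frames of the form { fin, opcode, payload }, and it handles data frames only (control frames such as ping, which may arrive between fragments, are left out).

```javascript
// Hypothetical reassembler: feed it data frames in order. It returns the
// complete message once the final fragment arrives, otherwise null.
function createReassembler() {
  let opcode = null;   // opcode of the first fragment (0x1 text, 0x2 binary)
  let parts = [];
  return function handleFrame(frame) {
    if (frame.opcode !== 0x0) {
      if (opcode !== null) {
        throw new Error('new message started before the previous one finished');
      }
      opcode = frame.opcode;
    } else if (opcode === null) {
      throw new Error('continuation frame without a preceding data frame');
    }
    parts.push(frame.payload);
    if (!frame.fin) return null;          // keep listening for more fragments
    const data = Buffer.concat(parts);
    const message = opcode === 0x1 ? data.toString('utf8') : data;
    opcode = null;
    parts = [];
    return message;
  };
}

const onFrame = createReassembler();
console.log(onFrame({ fin: true, opcode: 0x1, payload: Buffer.from('hello') }));      // hello
console.log(onFrame({ fin: false, opcode: 0x1, payload: Buffer.from('and a') }));     // null
console.log(onFrame({ fin: false, opcode: 0x0, payload: Buffer.from('happy new') })); // null
console.log(onFrame({ fin: true, opcode: 0x0, payload: Buffer.from('year!') }));      // and ahappy newyear!
```

The payloads are joined byte for byte, so the fragments of the second message come out without spaces between them, exactly as they were sent.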

9. Connection maintenance, heartbeat

In order to maintain the real-time two-way communication between client and server, websocket needs to ensure that the TCP channel between client and server is not disconnected.

However, if a connection with no data exchange for a long time is still kept open, the connection resources it holds may be wasted.

Still, some scenarios require the client and the server to stay connected even though no data has been exchanged for a long time.

In that case, heartbeats can be used:

Sender → receiver: Ping
Receiver → sender: Pong

Ping and Pong correspond to two WebSocket control frames, with opcodes 0x9 and 0xA respectively.

For example, for a WebSocket server to send a Ping to the client, only the following code is needed (using the ws module):

ws.ping('', false, true);
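A common way to turn single pings into a heartbeat is to ping every client on a timer and drop the ones that have not answered since the previous round. The sketch below shows one such sweep. It is modeled on the broken-connection detection pattern from the ws README, but `heartbeatSweep` itself is a hypothetical helper, and the `isAlive` flag is an application-level property, not something the library maintains.

```javascript
// One heartbeat sweep over a collection of client objects. Each client is
// assumed to expose ping() and terminate() (as sockets from the ws module do)
// plus an isAlive flag that the application sets to true when a pong arrives.
function heartbeatSweep(clients) {
  clients.forEach((client) => {
    if (client.isAlive === false) {
      client.terminate();     // no pong since the last sweep: drop the connection
      return;
    }
    client.isAlive = false;   // flipped back to true by the pong handler
    client.ping();
  });
}

// Assumed wiring with the ws module (kept as comments because it needs a running server):
//   wss.on('connection', (ws) => {
//     ws.isAlive = true;
//     ws.on('pong', () => { ws.isAlive = true; });
//   });
//   setInterval(() => heartbeatSweep(wss.clients), 30000);
```

The 30-second interval is an arbitrary choice for the example. A shorter interval detects dead connections faster at the cost of more control frames.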

10. The function of Sec-WebSocket-Key / Sec-WebSocket-Accept

As mentioned earlier, the main function of Sec-WebSocket-Key / Sec-WebSocket-Accept is to provide basic protection and reduce malicious and accidental connections.

The functions are summarized as follows:

  • 1) Prevent the server from accepting invalid WebSocket connections (for example, if an HTTP client accidentally requests a connection to the WebSocket service, the server can simply refuse it);
  • 2) Make sure the server understands WebSocket connections. Because the WS handshake uses the HTTP protocol, the WS connection request may be processed and answered by a plain HTTP server. With Sec-WebSocket-Key, the client can make sure the server knows the WS protocol (this is not a 100% guarantee: there are always some HTTP servers that handle Sec-WebSocket-Key without implementing the WS protocol);
  • 3) When a browser sends Ajax requests, setting Sec-WebSocket-Key and the other related headers is forbidden. This prevents a client from accidentally requesting a WebSocket upgrade while sending an Ajax request;
  • 4) It can prevent a reverse proxy that does not understand the WS protocol from returning wrong data. For example, the reverse proxy receives two WS upgrade requests one after the other, caches the response to the first request, and returns the cached response directly when the second request arrives (a meaningless response);
  • 5) The main purpose of Sec-WebSocket-Key is not to secure the data, because the formula that converts Sec-WebSocket-Key into Sec-WebSocket-Accept is public and very simple. Its main function is to prevent some common accidents (unintentional ones).

Emphasis: the Sec-WebSocket-Key / Sec-WebSocket-Accept conversion only provides basic protection. It gives no practical guarantee that the connection is secure, that the data is secure, or that the client / server is legitimate.
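The client side of this check can be sketched as follows, assuming a non-browser client that builds the handshake itself (browsers do this internally). `generateKey` and `isValidAccept` are hypothetical helper names, and the sample values are the ones used as an example in RFC 6455.

```javascript
const crypto = require('crypto');

const MAGIC = '258EAFA5-E914-47DA-95CA-C5AB0DC85B11';

// Client side: a fresh key is 16 random bytes, Base64-encoded.
function generateKey() {
  return crypto.randomBytes(16).toString('base64');
}

// Client side: check that the server derived Sec-WebSocket-Accept from our key.
function isValidAccept(secWebSocketKey, secWebSocketAccept) {
  const expected = crypto
    .createHash('sha1')
    .update(secWebSocketKey + MAGIC)
    .digest('base64');
  return expected === secWebSocketAccept;
}

console.log(isValidAccept('dGhlIHNhbXBsZSBub25jZQ==', 's3pPLMBiTxaQ9kYGzzhZRbK+xOo=')); // true
console.log(isValidAccept(generateKey(), 's3pPLMBiTxaQ9kYGzzhZRbK+xOo='));              // false
```

A cached or replayed response fails this check because every connection uses a new random key. As stressed above, passing the check says nothing about whether the server can be trusted.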

11. Function of data mask

11.1 general

In the WebSocket protocol, the role of data masking is to enhance the security of the protocol. But masking is not meant to protect the data itself, because the algorithm is public and the operation is simple. Apart from encrypting the channel itself, there do not seem to be many effective ways to protect communication security.

So why introduce masking at all? Apart from adding computation on the machine, it does not seem to bring much benefit (this is also what puzzles many readers).

The answer is still one word: security. However, the goal is not to prevent data leakage, but to prevent problems that existed in earlier versions of the protocol, such as proxy cache poisoning attacks.

11.2 proxy cache poisoning attack

Here is an excerpt from a 2010 security talk. It describes the security problems that defects in proxy servers’ protocol implementations can cause (the source is cited after the excerpt).

“We show, empirically, that the current version of the WebSocket consent mechanism is vulnerable to proxy cache poisoning attacks. Even though the WebSocket handshake is based on HTTP, which should be understood by most network intermediaries, the handshake uses the esoteric “Upgrade” mechanism of HTTP. In our experiment, we find that many proxies do not implement the Upgrade mechanism properly, which causes the handshake to succeed even though subsequent traffic over the socket will be misinterpreted by the proxy.”
[TALKING] Huang, L-S., Chen, E., Barth, A., Rescorla, E., and C. Jackson, “Talking to Yourself for Fun and Profit”, 2010.

Before formally describing the attack steps, we assume that there are the following participants:

  • 1) The attacker, the server controlled by the attacker (hereinafter referred to as “evil server”), and the resources forged by the attacker (hereinafter referred to as “evil resources”);
  • 2) Victims and resources that victims want to visit (referred to as “justice resources”);
  • 3) The server that the victim actually wants to access (referred to as “justice server”);
  • 4) Intermediate proxy server.

Attack step 1:

  • 1) The attacker’s browser initiates a WebSocket connection to the evil server. As described above, this starts with a protocol upgrade request;
  • 2) The protocol upgrade request actually arrives at the proxy server;
  • 3) The proxy server forwards the protocol upgrade request to the evil server;
  • 4) The evil server agrees to the connection, and the proxy server forwards the response to the attacker.

Because of the defect in its implementation of Upgrade, the proxy server believes it has been forwarding ordinary HTTP messages. Therefore, when the evil server agrees to the connection, the proxy server considers the session finished.

Attack step 2:

  • 1) Over the connection established earlier, the attacker sends data to the evil server through the WebSocket interface. The data is carefully constructed text in HTTP format. It contains the address of the justice resource and a forged Host header pointing to the justice server (see the message below);
  • 2) The request arrives at the proxy server. Although the previous TCP connection is reused, the proxy server thinks it is a new HTTP request;
  • 3) The proxy server requests the evil resource from the evil server;
  • 4) The evil server returns the evil resource. The proxy server caches the evil resource (the URL is right, but the Host is the address of the justice server).

Now the victim can come on stage:

  • 1) The victim accesses the justice resource on the justice server through the proxy server;
  • 2) The proxy server checks the URL and Host of the resource and finds a local cache entry (the forged one);
  • 3) The proxy server returns the evil resource to the victim;
  • 4) The victim is done for.

Attached: the carefully constructed “HTTP request message” mentioned earlier.

Client → Server:
POST /path/of/attackers/choice HTTP/1.1
Host: host-of-attackers-choice.com
Sec-WebSocket-Key: <connection-key>
Server → Client:
HTTP/1.1 200 OK
Sec-WebSocket-Accept: <connection-key>

11.3 current solutions

The original proposal was to encrypt the data. Weighing security against efficiency, a compromise was adopted in the end: masking the payload data.

Note that this only obliges browsers to mask the payload data. Bad actors can still implement their own WebSocket client and server. If they do not follow the rules, the attack can proceed as before.

But enforcing this restriction in the browser greatly increases the difficulty of the attack and limits its scope. Without this restriction, an attacker would only need to put a phishing website online and trick people into visiting it to launch a large-scale attack in a short time.
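The effect of the restriction can be illustrated with a short sketch. The browser picks a fresh random masking key for every frame, so a script running in a page cannot choose the bytes that actually travel over the wire. The HTTP-looking text and the host name target.example below are invented for the example, and `applyMask` is the same XOR transform as in section 7.4.

```javascript
const crypto = require('crypto');

// XOR each payload byte with the masking key byte at position i mod 4.
function applyMask(data, maskingKey) {
  const out = Buffer.alloc(data.length);
  for (let i = 0; i < data.length; i++) {
    out[i] = data[i] ^ maskingKey[i % 4];
  }
  return out;
}

// Text an attacker would like a confused proxy to read as an HTTP request.
const attackerText = Buffer.from('GET /script.js HTTP/1.1\r\nHost: target.example\r\n\r\n');

// Two frames with the same payload but different random keys.
const wireA = applyMask(attackerText, crypto.randomBytes(4));
const wireB = applyMask(attackerText, crypto.randomBytes(4));

console.log(wireA.toString('hex'));
console.log(wireB.toString('hex'));
```

The two hex dumps differ from run to run and (except in the extremely unlikely case of an all-zero key) no longer contain the request line, so an intermediary that misreads WebSocket traffic as HTTP does not see the attacker's text.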

12. Closing remarks

There is much more to say about WebSocket, for example WebSocket extensions and how the client and server negotiate and use them. Extensions add a lot of capability and room for imagination to the protocol, such as data compression, encryption and multiplexing.

Space is limited, so this is not expanded here. Interested readers are welcome to leave a message. If there are any mistakes or omissions in this article, please point them out.

13. References

[1] RFC 6455: WebSocket specification
[2] Specification: data frame mask details
[3] Specification: data frame format
[4] server-example
[5] Writing websocket server
[6] Attacks on network infrastructure (what data masking operations should prevent)
[7] Talking to yourself for fun and profit
[8] What is Sec-WebSocket-Key for?
[9] 10.3. Attacks On Infrastructure (Masking)
[10] Talking to Yourself for Fun and Profit
[11] Why are WebSockets masked?
[12] How does websocket frame masking protect against cache poisoning?
[13] What is the mask in a WebSocket frame?

Appendix: more web instant messaging information

SSE Technology: a new HTML5 server push event technology
Comet Technology: Web real time communication technology based on HTTP long connection
Some practice and ideas of pushing messages by socket.io
LinkedIn’s Web instant messaging practice: hundreds of thousands of long connections on a single machine
The development of Web instant messaging technology and the technical practice of websocket and socket.io
Web instant messaging security: detailed explanation of cross site websocket hijacking vulnerability (including sample code)
Open source framework pomelo practice: building high performance distributed IM chat server on Web
Using websocket and SSE technology to push messages on Web
The evolution of Web Communication: from Ajax and jsonp to SSE and websocket
Why does the network layer framework of mobile IM SDK web use socket.io instead of netty?
Integrating theory with practice: understanding the communication principle, protocol format and security of websocket from scratch
How to use websocket to realize long connection in wechat applet (including complete source code)
Quick understanding of electron: a new generation of web based cross platform desktop Technology
Understanding the evolution of front end technology in one article
Make up lessons of basic knowledge of instant messaging on the Web: understand all the problems of cross domain in one article!
Instant messaging on the Web: how to make your websocket disconnected and reconnected faster?
Half an hour is enough for websocket to be proficient!
