Websocket from entry to mastery, half an hour is enough!

Time:2021-8-6

The original title of this article is "WebSocket: 5 minutes from getting started to mastering", and the author is "program ape small card_Casper"; see the references at the end of the article for a link to the original. This collected version contains some changes.

1. Introduction

Since WebSocket arrived with HTML5, it has removed the long-standing pain point in the underlying channel of web instant messaging. Before it, developers had to put up with workarounds such as short polling, long polling, Comet and SSE. Now there is no need to agonize over whether to use polling or Comet to keep data real-time. Happiness comes suddenly ^_^.

WebSocket is not only widely used in web applications. Developers are also gradually adopting it in rich clients (such as mobile apps) that originally used protocols such as TCP and UDP.

In view of this, developers working on instant messaging need a comprehensive and in-depth understanding of WebSocket, and the topic is likely to come up in interviews.

Therefore, in the few years since it was founded, the Instant Messaging Network (52im.net) has collected a large number of technical articles on web instant messaging, especially WebSocket. This article is another one that takes WebSocket from introduction to mastery. The content moves from simple to deep, and suits developers who want a deeper understanding of the WebSocket protocol in a short time.

(this article is published at: http://www.52im.net/thread-3134-1-1.html)

2. Related articles

Detailed explanation of websocket (I): preliminary understanding of websocket Technology
Detailed explanation of websocket (II): technical principle, code demonstration and application cases
Detailed explanation of websocket (III): go deep into the details of websocket communication protocol
Detailed explanation of websocket (IV): get to the bottom of the relationship between HTTP and websocket (Part I)
Detailed explanation of websocket (V): get to the bottom of the relationship between HTTP and websocket (Part II)
Detailed explanation of websocket (VI): get to the bottom of the relationship between websocket and socket

3. Text overview

WebSocket gives the browser the ability to communicate in both directions in real time.

This article explains how WebSocket establishes a connection, how data is exchanged, and how data frames are formatted. It also briefly covers security attacks against WebSocket and how the protocol resists them.

4. What is websocket

4.1 basic introduction

WebSocket is a network technology, introduced with HTML5, for full-duplex communication between browser and server. It is an application-layer protocol built on TCP, and it reuses the HTTP handshake channel.

For most web developers this description is a little dry. In fact, it is enough to remember a few points:

  • 1) Websocket can be used in the browser;
  • 2) Support two-way communication;
  • 3) It’s easy to use.

4.2 what are the advantages

The advantages here are relative to the HTTP protocol. In a nutshell: WebSocket supports two-way communication and is more flexible, more efficient and more extensible.

  • 1) It supports two-way communication, so it is better suited to real-time use;
  • 2) Better binary support;
  • 3) Less control overhead. Once the connection is created, the frame headers exchanged between WebSocket client and server are small. Without extensions, the header from server to client is only 2 ~ 10 bytes (depending on the payload length). From client to server, an additional 4-byte masking key is added. HTTP, by contrast, carries a complete header with every request;
  • 4) Support for extensions. The WebSocket protocol defines extensions, and users can extend the protocol or implement custom subprotocols (for example, to support a custom compression algorithm).
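
To make point 3 concrete, the header size can be derived from the payload length alone. This is a minimal sketch, assuming the frame layout described in section 7 and no extension data; `frameHeaderSize` is a hypothetical helper written for this article, not part of any library.

```javascript
// Hypothetical helper: size in bytes of a WebSocket frame header.
function frameHeaderSize(payloadLength, masked) {
  let size = 2;                               // FIN/RSV/opcode byte + MASK/length byte
  if (payloadLength > 0xffff) size += 8;      // 64-bit extended payload length
  else if (payloadLength > 125) size += 2;    // 16-bit extended payload length
  if (masked) size += 4;                      // masking key (client -> server only)
  return size;
}

console.log(frameHeaderSize(5, false));       // 2  (server -> client, small payload)
console.log(frameHeaderSize(70000, false));   // 10 (server -> client, large payload)
console.log(frameHeaderSize(5, true));        // 6  (client -> server, small payload)
```

The smallest and largest unmasked results match the "2 ~ 10 bytes" range quoted above.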

For the latter two points, students who have not studied the websocket protocol specification may not understand it intuitively, but it will not affect the learning and use of websocket.

4.3 what to learn

When learning an application-layer network protocol, the most important parts are usually how a connection is established and how data is exchanged. The data format cannot be skipped either, because it directly determines what the protocol can do. A good data format makes a protocol more efficient and more extensible.

The rest of this article focuses on the following points:

  • 1) How to establish a connection;
  • 2) How to exchange data;
  • 3) Data frame format;
  • 4) How to maintain the connection.

5. Getting started demo code

Before going into the details of the protocol, let's look at a simple example to get an intuitive feel for it. The example includes a WebSocket server and a WebSocket client (a web page). The complete code is linked from the original article.

The server uses the ws library. Compared with the more familiar socket.io, ws is lighter and better suited to learning.

5.1 server

The code is as follows. Listen to port 8080. When a new connection request arrives, print the log and send a message to the client. When a message is received from the client, the log is also printed.

var app = require('express')();
var WebSocket = require('ws');

// WebSocket server listening on port 8080
var wss = new WebSocket.Server({ port: 8080 });

wss.on('connection', function connection(ws) {
    console.log('server: receive connection.');
    ws.on('message', function incoming(message) {
        console.log('server: received: %s', message);
    });
    ws.send('world');
});

// The client page itself is served over plain HTTP on port 3000
app.get('/', function(req, res) {
  res.sendFile(__dirname + '/index.html');
});
app.listen(3000);

5.2 client

The code is as follows. Initiate a websocket connection to port 8080. After the connection is established, print the log and send a message to the server. After receiving the message from the server, the log is also printed.

<script>
  var ws = new WebSocket('ws://localhost:8080');
  ws.onopen = function() {
    console.log('ws onopen');
    ws.send('from client: hello');
  };
  ws.onmessage = function(e) {
    console.log('ws onmessage');
    console.log('from server: ' + e.data);
  };
</script>

5.3 operation results

You can view the logs of the server and the client respectively. They are not expanded here.

Server output:

server: receive connection.
server: received: from client: hello

Client output:

ws onopen
ws onmessage
from server: world

6. How to establish a connection

As mentioned earlier, WebSocket reuses the HTTP handshake channel. Specifically, the client negotiates a protocol upgrade with the WebSocket server through an HTTP request. Once the upgrade is complete, all subsequent data exchange follows the WebSocket protocol.

6.1 client: apply for protocol upgrade

First, the client sends a protocol upgrade request. It uses the standard HTTP message format, and only the GET method is supported.

GET / HTTP/1.1
Host: localhost:8080
Origin: http://127.0.0.1:3000
Connection: Upgrade
Upgrade: websocket
Sec-WebSocket-Version: 13
Sec-WebSocket-Key: w4v7O6xFTi36lq3RNcgctw==

The key request headers mean the following:

  • 1) Connection: Upgrade: indicates that the protocol is to be upgraded;
  • 2) Upgrade: websocket: indicates an upgrade to the WebSocket protocol;
  • 3) Sec-WebSocket-Version: 13: indicates the WebSocket version. If the server does not support this version, it must return a Sec-WebSocket-Version header listing the versions it does support;
  • 4) Sec-WebSocket-Key: paired with the Sec-WebSocket-Accept header in the server's response. It provides basic protection against malicious or unintentional connections.

Note: the request above omits some non-essential headers. Because this is a standard HTTP request, headers such as Host, Origin and Cookie are sent as usual. During the handshake, these headers can be used for security restrictions and permission checks.
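
The request above can be assembled by hand, which shows where each header comes from. This is a minimal sketch, assuming Node.js; `buildUpgradeRequest` and its parameters are hypothetical names chosen for illustration. The one fact it relies on from RFC 6455 is that Sec-WebSocket-Key is a fresh 16-byte random value encoded as Base64.

```javascript
const crypto = require('crypto');

// Hypothetical helper: builds the text of a client upgrade request.
function buildUpgradeRequest(host, path, origin) {
  // A new random 16-byte nonce per connection, Base64-encoded.
  const key = crypto.randomBytes(16).toString('base64');
  const lines = [
    `GET ${path} HTTP/1.1`,
    `Host: ${host}`,
    `Origin: ${origin}`,
    'Connection: Upgrade',
    'Upgrade: websocket',
    'Sec-WebSocket-Version: 13',
    `Sec-WebSocket-Key: ${key}`,
  ];
  // Every line ends with \r\n, and an empty line closes the header block.
  return { key, request: lines.join('\r\n') + '\r\n\r\n' };
}

const upgrade = buildUpgradeRequest('localhost:8080', '/', 'http://127.0.0.1:3000');
console.log(upgrade.request);
```

The client keeps `key` so it can later check the server's Sec-WebSocket-Accept value against it.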

6.2 server: response protocol upgrade

The content returned by the server is as follows, and the status code 101 indicates protocol switching. This completes the protocol upgrade, and subsequent data interactions are performed according to the new protocol.

HTTP/1.1 101 Switching Protocols
Connection: Upgrade
Upgrade: websocket
Sec-WebSocket-Accept: Oy4NRAQ13jhfONC7bP8dTKb4PTU=

Note: each header line ends with \r\n, and an extra blank line (\r\n) follows the last header. In addition, the server can use HTTP status codes only during the handshake phase. After the handshake, only the specific error codes defined by the protocol can be used.
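
Before answering with 101, a server has to decide whether the request really is a valid upgrade request. The sketch below shows one way to make that decision. `checkUpgradeRequest` is a hypothetical helper, and it assumes header names have been lower-cased (as Node's http module does). The 426 response with a Sec-WebSocket-Version header follows the RFC 6455 suggestion for an unsupported version; the 400 and 405 codes are this sketch's own choice.

```javascript
// Hypothetical helper: decides how a server should answer an upgrade request.
// `headers` is assumed to be an object with lower-case header names.
function checkUpgradeRequest(method, headers) {
  // True if the comma-separated header `name` contains `token` (case-insensitive).
  const has = (name, token) =>
    (headers[name] || '').toLowerCase().split(',').map((s) => s.trim()).includes(token);

  if (method !== 'GET') return { status: 405 };
  if (!has('upgrade', 'websocket') || !has('connection', 'upgrade')) return { status: 400 };
  if (!headers['sec-websocket-key']) return { status: 400 };
  if (headers['sec-websocket-version'] !== '13') {
    // Unsupported version: tell the client which version the server supports.
    return { status: 426, headers: { 'Sec-WebSocket-Version': '13' } };
  }
  return { status: 101 };
}

console.log(checkUpgradeRequest('GET', {
  upgrade: 'websocket',
  connection: 'Upgrade',
  'sec-websocket-version': '13',
  'sec-websocket-key': 'w4v7O6xFTi36lq3RNcgctw==',
}));
```

This is also the natural place for the Origin and Cookie checks mentioned in section 6.1.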

6.3 calculation of Sec-WebSocket-Accept

Sec-WebSocket-Accept is calculated from the Sec-WebSocket-Key in the client's request header.

The calculation is:

  • 1) Concatenate the Sec-WebSocket-Key with 258EAFA5-E914-47DA-95CA-C5AB0DC85B11;
  • 2) Compute the SHA-1 digest of the result and encode it as a Base64 string.

The pseudo code is as follows:

toBase64( sha1( Sec-WebSocket-Key + 258EAFA5-E914-47DA-95CA-C5AB0DC85B11 ) )

Verify this against the response shown earlier:

const crypto = require('crypto');
const magic = '258EAFA5-E914-47DA-95CA-C5AB0DC85B11';
const secWebSocketKey = 'w4v7O6xFTi36lq3RNcgctw==';
const secWebSocketAccept = crypto.createHash('sha1')
    .update(secWebSocketKey + magic)
    .digest('base64');
console.log(secWebSocketAccept);
// Oy4NRAQ13jhfONC7bP8dTKb4PTU=
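
The same calculation can be wrapped in a function and used to produce the complete 101 response from section 6.2. This is a sketch only; `computeAccept` and `buildHandshakeResponse` are hypothetical helper names. The key used in the example is the sample key from RFC 6455, for which the specification gives the accept value s3pPLMBiTxaQ9kYGzzhZRbK+xOo=.

```javascript
const crypto = require('crypto');

const MAGIC = '258EAFA5-E914-47DA-95CA-C5AB0DC85B11';

// Computes Sec-WebSocket-Accept from the client's Sec-WebSocket-Key.
function computeAccept(secWebSocketKey) {
  return crypto.createHash('sha1').update(secWebSocketKey + MAGIC).digest('base64');
}

// Hypothetical helper: full text of the 101 response, with \r\n line endings.
function buildHandshakeResponse(secWebSocketKey) {
  return [
    'HTTP/1.1 101 Switching Protocols',
    'Connection: Upgrade',
    'Upgrade: websocket',
    `Sec-WebSocket-Accept: ${computeAccept(secWebSocketKey)}`,
  ].join('\r\n') + '\r\n\r\n';
}

// Sample key taken from RFC 6455.
console.log(buildHandshakeResponse('dGhlIHNhbXBsZSBub25jZQ=='));
```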

7. Data frame format

7.1 general

Data exchange between client and server depends on the data frame format. So before explaining data exchange itself, let's look at the WebSocket data frame format.

The smallest unit of communication between a WebSocket client and server is the frame. One or more frames make up a complete message.

  • 1) Sender: splits the message into one or more frames and sends them to the receiver;
  • 2) Receiver: receives the frames and reassembles the related frames into a complete message.

This section focuses on the data frame format. For the detailed definition, see RFC 6455, Section 5.2.

7.2 overview of data frame format

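The unified format of a WebSocket data frame is given below, as defined in RFC 6455, Section 5.2. Readers familiar with TCP/IP will recognize this kind of diagram.

  • 1) From left to right, the unit is bits. For example, FIN and RSV1 occupy 1 bit each, and opcode occupies 4 bits;
  • 2) The contents include flags, opcode, mask, data, data length, etc. (expanded in the next section).

  0                   1                   2                   3
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
 +-+-+-+-+-------+-+-------------+-------------------------------+
 |F|R|R|R| opcode|M| Payload len |    Extended payload length    |
 |I|S|S|S|  (4)  |A|     (7)     |             (16/64)           |
 |N|V|V|V|       |S|             |   (if payload len==126/127)   |
 | |1|2|3|       |K|             |                               |
 +-+-+-+-+-------+-+-------------+ - - - - - - - - - - - - - - - +
 |     Extended payload length continued, if payload len == 127  |
 + - - - - - - - - - - - - - - - +-------------------------------+
 |                               |Masking-key, if MASK set to 1  |
 +-------------------------------+-------------------------------+
 | Masking-key (continued)       |          Payload Data         |
 +-------------------------------- - - - - - - - - - - - - - - - +
 :                     Payload Data continued ...                :
 + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
 |                     Payload Data continued ...                |
 +---------------------------------------------------------------+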

7.3 detailed explanation of data frame format

For the format overview diagram above, we will explain it field by field. If there is any ambiguity, please refer to the protocol specification or leave a message.

1) FIN: 1 bit.

If it is 1, this is the last fragment of the message. If it is 0, this is not the last fragment of the message.

2) RSV1, RSV2, RSV3: 1 bit each.

Normally all three are 0. When the client and server negotiate a WebSocket extension, these flag bits may be non-zero, and their meaning is defined by the extension. If a non-zero value appears and no extension was negotiated, the receiver must fail the connection.

3) Opcode: 4 bits.

The opcode determines how the payload that follows should be parsed. If the opcode is unknown, the receiver should fail the connection.

The defined opcodes are as follows:

%x0: a continuation frame. An opcode of 0 means the message is fragmented, and the frame just received is one of its fragments.
%x1: a text frame.
%x2: a binary frame.
%x3-7: reserved for non-control frames to be defined later.
%x8: connection close.
%x9: a ping.
%xA: a pong.
%xB-F: reserved for control frames to be defined later.
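
The opcode list above maps naturally onto a small lookup table. This is a sketch; `OPCODES`, `isControlOpcode` and `opcodeName` are hypothetical names. It relies on the fact that control opcodes (0x8 to 0xF) are exactly those with the highest of the four opcode bits set.

```javascript
// Opcode values from the list above.
const OPCODES = {
  0x0: 'continuation',
  0x1: 'text',
  0x2: 'binary',
  0x8: 'close',
  0x9: 'ping',
  0xa: 'pong',
};

// Control frames are the opcodes 0x8-0xF, i.e. those with bit 3 set.
function isControlOpcode(opcode) {
  return (opcode & 0x8) !== 0;
}

// Returns the frame type, or throws so the caller can fail the connection.
function opcodeName(opcode) {
  const name = OPCODES[opcode];
  if (!name) throw new Error(`unknown or reserved opcode 0x${opcode.toString(16)}`);
  return name;
}

console.log(opcodeName(0x1), isControlOpcode(0x1));   // text false
console.log(opcodeName(0x9), isControlOpcode(0x9));   // ping true
```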

4) Mask: 1 bit.

Indicates whether the payload data is masked. Data sent from the client to the server must be masked. Data sent from the server to the client is not masked.

If the server receives data that has not been masked, it must close the connection.

If Mask is 1, a masking key is present in the Masking-key field and is used to unmask the payload data. For all data frames sent from the client to the server, Mask is 1.

The masking algorithm and its purpose are explained in later sections.

5) Payload length: the length of the payload data, in bytes. It occupies 7 bits, 7 + 16 bits, or 7 + 64 bits.

Suppose Payload length === x. If:

x is 0 ~ 125: the length of the data is x bytes.
x is 126: the next 2 bytes represent a 16-bit unsigned integer whose value is the length of the data.
x is 127: the next 8 bytes represent a 64-bit unsigned integer (the highest bit must be 0) whose value is the length of the data.

In addition, if the payload length occupies more than one byte, it is written in network byte order (big endian, most significant byte first).
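
The three length encodings can be illustrated with a small encoder for the second header byte and any extended length bytes that follow it. A sketch under stated assumptions: `encodePayloadLength` is a hypothetical helper, and it uses Node's Buffer with `writeBigUInt64BE`, which needs a reasonably recent Node.js version.

```javascript
// Hypothetical helper: encodes the MASK bit plus the payload length field.
function encodePayloadLength(length, masked) {
  const maskBit = masked ? 0x80 : 0x00;
  if (length <= 125) {
    return Buffer.from([maskBit | length]);          // 7-bit length
  }
  if (length <= 0xffff) {
    const buf = Buffer.alloc(3);
    buf[0] = maskBit | 126;                          // marker for a 16-bit length
    buf.writeUInt16BE(length, 1);                    // network byte order
    return buf;
  }
  const buf = Buffer.alloc(9);
  buf[0] = maskBit | 127;                            // marker for a 64-bit length
  buf.writeBigUInt64BE(BigInt(length), 1);           // network byte order
  return buf;
}

console.log(encodePayloadLength(125, false).toString('hex'));    // 7d
console.log(encodePayloadLength(126, false).toString('hex'));    // 7e007e
console.log(encodePayloadLength(65536, false).toString('hex'));  // 7f0000000000010000
```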

6) Masking-key: 0 or 4 bytes (32 bits).

All data frames sent from the client to the server are masked: Mask is 1 and the frame carries a 4-byte masking key. If Mask is 0, there is no masking key.

Note: the payload length does not include the length of the masking key.

7) Payload data: (x + y) bytes.

Payload data: consists of extension data and application data, where the extension data is x bytes and the application data is y bytes.

Extension data: if no extension was negotiated, the extension data is 0 bytes. Every extension must declare the length of its extension data, or how that length is calculated. In addition, how an extension is used must be negotiated during the handshake. If extension data is present, the payload length includes the length of the extension data.

Application data: arbitrary application data, occupying the rest of the frame after the extension data (if any). The length of the application data is the payload length minus the length of the extension data.
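
Putting the fields together, a complete frame can be decoded in a few lines. This is a minimal sketch, assuming no extension was negotiated, that `buf` holds exactly one whole frame, and that the payload length fits in a JavaScript number; `parseFrame` is a hypothetical helper. The unmasking step uses the algorithm described in section 7.4, and the sample bytes are the masked "Hello" frame from RFC 6455, Section 5.7.

```javascript
// Hypothetical helper: parses one complete WebSocket frame held in `buf`.
function parseFrame(buf) {
  const fin = (buf[0] & 0x80) !== 0;
  const rsv = (buf[0] & 0x70) >> 4;        // RSV1-3 as a 3-bit number
  const opcode = buf[0] & 0x0f;
  const masked = (buf[1] & 0x80) !== 0;
  let length = buf[1] & 0x7f;
  let offset = 2;

  if (length === 126) {
    length = buf.readUInt16BE(offset);
    offset += 2;
  } else if (length === 127) {
    length = Number(buf.readBigUInt64BE(offset));
    offset += 8;
  }

  let maskingKey = null;
  if (masked) {
    maskingKey = buf.subarray(offset, offset + 4);
    offset += 4;
  }

  // Copy the payload so that unmasking does not modify the input buffer.
  const payload = Buffer.from(buf.subarray(offset, offset + length));
  if (masked) {
    for (let i = 0; i < payload.length; i++) payload[i] ^= maskingKey[i % 4];
  }
  return { fin, rsv, opcode, masked, length, payload };
}

const sample = Buffer.from([0x81, 0x85, 0x37, 0xfa, 0x21, 0x3d, 0x7f, 0x9f, 0x4d, 0x51, 0x58]);
console.log(parseFrame(sample).payload.toString('utf8'));   // Hello
```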

7.4 Mask Algorithm

Masking key is a 32-bit random number selected by the client. The mask operation does not affect the length of the data payload.

The following algorithms are used for mask and unmask operations.

First, suppose:

original-octet-i: the i-th byte of the original data;
transformed-octet-i: the i-th byte of the transformed data;
j: the result of i mod 4;
masking-key-octet-j: the j-th byte of the masking key.

The algorithm is: XOR original-octet-i with masking-key-octet-j to obtain transformed-octet-i.

j = i MOD 4
transformed-octet-i = original-octet-i XOR masking-key-octet-j
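
The two lines above translate directly into code. A minimal sketch, assuming Node.js Buffers; `applyMask` is a hypothetical helper name. Because XOR is its own inverse, the same function both masks and unmasks. The example key and the resulting bytes are those of the masked "Hello" frame in RFC 6455, Section 5.7.

```javascript
// XORs every payload byte with masking-key byte (i mod 4).
function applyMask(payload, maskingKey) {
  const out = Buffer.alloc(payload.length);
  for (let i = 0; i < payload.length; i++) {
    out[i] = payload[i] ^ maskingKey[i % 4];
  }
  return out;
}

const exampleKey = Buffer.from([0x37, 0xfa, 0x21, 0x3d]);   // fixed here; random in practice
const wireBytes = applyMask(Buffer.from('Hello'), exampleKey);
console.log(wireBytes.toString('hex'));                      // 7f9f4d5158
console.log(applyMask(wireBytes, exampleKey).toString());    // Hello
```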

8. Data transmission

Once the websocket client and server establish a connection, the subsequent operations are based on the transmission of data frames.

WebSocket distinguishes the type of operation by opcode. For example, 0x8 means closing the connection, and 0x0 to 0x2 mean data exchange.

8.1 data fragmentation

Each WebSocket message may be split into several data frames. When the receiver gets a data frame, it uses the value of FIN to decide whether it has received the last frame of the message.

FIN = 1 means the current frame is the last frame of the message. The receiver now has the complete message and can process it. If FIN = 0, the receiver must keep listening for the remaining frames.

In addition, during data exchange the opcode indicates the type of data: 0x1 means text and 0x2 means binary. 0x0 is special: it marks a continuation frame, which, as the name suggests, continues a message whose frames have not all been received yet.

8.2 example of data fragmentation

An example makes this clearer. The following one, from MDN, demonstrates fragmentation well. The client sends two messages to the server, and the server replies to the client after receiving each message. Here we mainly look at the messages the client sends to the server.

First message:

FIN = 1, indicating the last data frame of the current message. After receiving this frame, the server can process the message. Opcode = 0x1, indicating that the client is sending text.

Second message:

  • 1) FIN = 0, opcode = 0x1: text is being sent, the message is not yet complete, and more frames follow;
  • 2) FIN = 0, opcode = 0x0: the message is not yet complete and more frames follow. The current frame is appended to the previous one;
  • 3) FIN = 1, opcode = 0x0: the message is complete and no more frames follow. The current frame is appended to the previous one, and the server can assemble the related frames into a complete message.

Client: FIN=1, opcode=0x1, msg="hello"
Server: (process complete message immediately) Hi.
Client: FIN=0, opcode=0x1, msg="and a"
Server: (listening, new message containing text started)
Client: FIN=0, opcode=0x0, msg="happy new"
Server: (listening, payload concatenated to previous message)
Client: FIN=1, opcode=0x0, msg="year!"
Server: (process complete message) Happy new year to you too!
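
The receiving side of this exchange can be sketched as a small state machine that buffers payloads until it sees FIN = 1. `MessageAssembler` is a hypothetical class written for this article. It assumes the frames have already been parsed and unmasked, and that control frames (ping, pong, close) are handled elsewhere.

```javascript
// Hypothetical receiver-side helper: collects data frames until FIN = 1.
class MessageAssembler {
  constructor() {
    this.parts = [];
    this.opcode = null;   // opcode of the first frame of the message in progress
  }

  // Returns the complete message, or null while more frames are expected.
  push({ fin, opcode, payload }) {
    if (opcode === 0x0) {
      if (this.opcode === null) throw new Error('continuation frame without a start frame');
    } else {
      if (this.opcode !== null) throw new Error('new message started before the previous one finished');
      this.opcode = opcode;
    }
    this.parts.push(payload);
    if (!fin) return null;

    const data = Buffer.concat(this.parts);
    const message = this.opcode === 0x1 ? data.toString('utf8') : data;
    this.parts = [];
    this.opcode = null;
    return message;
  }
}

const assembler = new MessageAssembler();
console.log(assembler.push({ fin: true, opcode: 0x1, payload: Buffer.from('hello') }));      // hello
console.log(assembler.push({ fin: false, opcode: 0x1, payload: Buffer.from('and a') }));     // null
console.log(assembler.push({ fin: false, opcode: 0x0, payload: Buffer.from('happy new') })); // null
console.log(assembler.push({ fin: true, opcode: 0x0, payload: Buffer.from('year!') }));      // and ahappy newyear!
```

The payloads are concatenated byte for byte, so any spaces between the fragments would have to be part of the fragments themselves.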

9. Connection hold, heartbeat

To keep real-time two-way communication going, WebSocket needs the TCP channel between client and server to stay connected.

However, keeping a connection open when no data has been exchanged for a long time may waste the resources it holds.

Still, there are scenarios in which the client and server exchange no data for a long time but need to keep the connection.

In that case, a heartbeat can be used:

Sender -> receiver: Ping
Receiver -> sender: Pong

Ping and Pong correspond to two WebSocket control frames, with opcodes 0x9 and 0xA respectively.

For example, a WebSocket server using the ws module needs only the following code to send a Ping to the client:

ws.ping('', false, true);
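
At the frame level, a heartbeat is just a pair of small control frames. The sketch below builds them by hand to show what goes on the wire; `buildControlFrame`, `buildPing` and `buildPong` are hypothetical helpers, and the frames are unmasked, as a server would send them. Per RFC 6455, control frames are never fragmented, carry at most 125 bytes of payload, and a Pong echoes the application data of the Ping it answers.

```javascript
// Hypothetical helper: builds an unmasked control frame (server -> client).
function buildControlFrame(opcode, payload = Buffer.alloc(0)) {
  if (payload.length > 125) {
    throw new Error('control frame payload must be 125 bytes or less');
  }
  // 0x80 sets FIN = 1; the second byte is the 7-bit payload length with MASK = 0.
  return Buffer.concat([Buffer.from([0x80 | opcode, payload.length]), payload]);
}

const buildPing = (payload) => buildControlFrame(0x9, payload);
// A Pong carries the same application data as the Ping it answers.
const buildPong = (pingPayload) => buildControlFrame(0xa, pingPayload);

console.log(buildPing(Buffer.from('hb')).toString('hex'));   // 89026862
console.log(buildPong(Buffer.from('hb')).toString('hex'));   // 8a026862
```

In practice a library such as ws builds these frames for you; the sketch only makes the opcodes above visible.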

10. Role of Sec-WebSocket-Key / Sec-WebSocket-Accept

As mentioned earlier, the main function of Sec-WebSocket-Key / Sec-WebSocket-Accept is to provide basic protection and reduce malicious and accidental connections.

The functions are summarized as follows:

  • 1) Prevent the server from accepting invalid WebSocket connections (for example, if an HTTP client accidentally requests a connection to the WebSocket service, the server can refuse it directly);
  • 2) Ensure that the server understands WebSocket connections. Because the handshake uses HTTP, the request may be handled and answered by a plain HTTP server. Sec-WebSocket-Key lets the client confirm that the server understands the WebSocket protocol. (This is not 100% safe: there are always odd HTTP servers that process Sec-WebSocket-Key but do not implement the WebSocket protocol.)
  • 3) When a browser script sends an Ajax request, setting Sec-WebSocket-Key and the other related headers is forbidden. This prevents a client from accidentally requesting a WebSocket upgrade when sending an Ajax request;
  • 4) Prevent a reverse proxy that does not understand the WebSocket protocol from returning wrong data. For example, the reverse proxy receives two WebSocket upgrade requests in succession, caches the response to the first, and then returns the cached response directly when the second request arrives (a meaningless response);
  • 5) The main purpose of Sec-WebSocket-Key is not data security, because the formula that converts Sec-WebSocket-Key to Sec-WebSocket-Accept is public and very simple. Its main function is to prevent common accidents (unintentional connections).

Emphasis: the Sec-WebSocket-Key / Sec-WebSocket-Accept exchange offers only a basic guarantee. It gives the WebSocket client and server no practical assurance that the connection is secure, that the data is secure, or that the other side is legitimate.
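
Points 2 and 4 come down to a check the client performs on the handshake response. A minimal sketch, assuming Node.js and lower-cased header names; `isValidHandshakeResponse` is a hypothetical helper. A plain HTTP server that answers 200, or a cached response computed for a different key, fails the check. The sample key and accept value in the usage lines are the ones given in RFC 6455.

```javascript
const crypto = require('crypto');

// Hypothetical client-side check of the server's handshake response.
// `headers` is assumed to be an object with lower-case header names.
function isValidHandshakeResponse(sentKey, statusCode, headers) {
  const expected = crypto.createHash('sha1')
    .update(sentKey + '258EAFA5-E914-47DA-95CA-C5AB0DC85B11')
    .digest('base64');
  return statusCode === 101 && headers['sec-websocket-accept'] === expected;
}

const sentKey = 'dGhlIHNhbXBsZSBub25jZQ==';
console.log(isValidHandshakeResponse(sentKey, 101,
  { 'sec-websocket-accept': 's3pPLMBiTxaQ9kYGzzhZRbK+xOo=' }));   // true
console.log(isValidHandshakeResponse(sentKey, 200, {}));          // false
```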

11. Function of data mask

11.1 general

In the WebSocket protocol, data masking exists to strengthen the security of the protocol. But masking does not protect the data itself, because the algorithm is public and the operation is simple. Apart from encrypting the channel itself, there seem to be few effective ways to protect communication security.

So why introduce masking at all? Apart from adding computation, it seems to bring little benefit (a point many readers question).

The answer is, once again, security. The aim is not to prevent data leakage, but to prevent the proxy cache pollution (poisoning) attacks that affected earlier versions of the protocol.

11.2 proxy cache pollution attack

The following is an excerpt from a 2010 talk on security. It describes the security problems that can arise from defects in how proxy servers implement the protocol (the source is listed in the References section).

“We show, empirically, that the current version of the WebSocket consent mechanism is vulnerable to proxy cache poisoning attacks. Even though the WebSocket handshake is based on HTTP, which should be understood by most network intermediaries, the handshake uses the esoteric “Upgrade” mechanism of HTTP. In our experiment, we find that many proxies do not implement the Upgrade mechanism properly, which causes the handshake to succeed even though subsequent traffic over the socket will be misinterpreted by the proxy.”
[TALKING] Huang, L-S., Chen, E., Barth, A., Rescorla, E., and C. Jackson, "Talking to Yourself for Fun and Profit", 2010.

Before formally describing the attack steps, we assume the following participants:

  • 1) The attacker, the server controlled by the attacker (hereinafter "evil server"), and the resource forged by the attacker (hereinafter "evil resource");
  • 2) The victim, and the resource the victim wants to access (hereinafter "justice resource");
  • 3) The server the victim actually wants to access (hereinafter "justice server");
  • 4) The intermediate proxy server.

Attack step 1:

  • 1) The attacker's browser initiates a WebSocket connection to the evil server. As described above, this starts with a protocol upgrade request;
  • 2) The protocol upgrade request actually arrives at the proxy server;
  • 3) The proxy server forwards the protocol upgrade request to the evil server;
  • 4) The evil server agrees to the connection, and the proxy server forwards the response to the attacker.

Because of defects in its implementation of Upgrade, the proxy server believes it is forwarding ordinary HTTP messages. Therefore, when the evil server agrees to the connection, the proxy server considers the session finished.

Attack step 2:

  • 1) Over the previously established connection, the attacker sends data to the evil server through the WebSocket interface. The data is carefully constructed text in HTTP format. It contains the address of the justice resource and a forged Host header pointing to the justice server (see the message below);
  • 2) The request arrives at the proxy server. Although the previous TCP connection is reused, the proxy server thinks it is a new HTTP request;
  • 3) The proxy server requests the evil resource from the evil server;
  • 4) The evil server returns the evil resource. The proxy server caches the evil resource (the URL is correct, but the host is the address of the justice server).

Now the victim enters:

  • 1) The victim accesses the justice resource on the justice server through the proxy server;
  • 2) The proxy server checks the URL and host of the resource and finds a (forged) copy in its local cache;
  • 3) The proxy server returns the evil resource to the victim;
  • 4) The victim is compromised.

Attached: the carefully constructed "HTTP request message" mentioned above.

Client → Server:
POST /path/of/attackers/choice HTTP/1.1
Host: host-of-attackers-choice.com
Sec-WebSocket-Key: <connection-key>

Server → Client:
HTTP/1.1 200 OK
Sec-WebSocket-Accept: <connection-key>

11.3 current solutions

The original proposal was to encrypt the data. After weighing security against efficiency, a compromise was adopted: mask the payload data.

Note that this only forces the browser to mask the payload. Attackers can still implement their own WebSocket client and server, and if they ignore the rule, the attack works as before.

However, enforcing this restriction in the browser greatly raises the difficulty of the attack and shrinks its reach. Without the restriction, an attacker would only need to put a phishing site online and lure people to visit it in order to launch a large-scale attack in a short time.
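
The effect of the restriction can be seen by comparing what a script asks the browser to send with what actually reaches the proxy. This is only a model of the idea, not browser code: `maskForWire` is a hypothetical function, and the crafted request text and host name are made up for illustration. Because the masking key is chosen randomly by the browser for each frame, the script cannot predict or control the bytes on the wire.

```javascript
const crypto = require('crypto');

// What the attacker's script asks the browser to send (made-up example text).
const crafted = Buffer.from(
  'GET /evil-resource HTTP/1.1\r\nHost: justice-server.example\r\n\r\n'
);

// Hypothetical model of the browser: a fresh random 4-byte key per frame.
function maskForWire(payload, maskingKey = crypto.randomBytes(4)) {
  const out = Buffer.alloc(payload.length);
  for (let i = 0; i < payload.length; i++) {
    out[i] = payload[i] ^ maskingKey[i % 4];
  }
  return out;
}

// The bytes differ on every run, so a proxy no longer sees a well-formed HTTP request.
console.log(maskForWire(crafted).toString('latin1'));
```

A client written by the attacker can of course skip the masking step, which is why the restriction limits the attack rather than eliminating it.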

12. Closing remarks

There is much more to say about WebSocket, for example WebSocket extensions and how clients and servers negotiate and use them. Extensions add a great deal of capability and room for imagination to the protocol, such as data compression, encryption and multiplexing.

Space is limited, so these topics are not covered here. Interested readers are welcome to leave a message to discuss them, and to point out any mistakes or omissions in the article.

13. References

[1] RFC 6455: WebSocket specification
[2] Specification: data frame masking details
[3] Specification: data frame format
[4] server-example
[5] Write websocket server
[6] Attacks on network infrastructure (what data mask operations should prevent)
[7] Talking to yourself for fun and profit
[8] What is Sec-WebSocket-Key for?
[9] 10.3. Attacks On Infrastructure (Masking)
[10] Talking to Yourself for Fun and Profit
[11] Why are WebSockets masked?
[12] How does websocket frame masking protect against cache poisoning?
[13] What is the mask in a WebSocket frame?

Appendix: more web instant messaging materials

SSE Technology Details: a new HTML5 server push event technology
Comet Technology Details: web side real-time communication technology based on HTTP long connection
Practice and thought of implementing message push with socket.io
LinkedIn’s Web instant messaging practice: realize hundreds of thousands of long connections on a single machine
The development of Web instant messaging technology and the technical practice of websocket and socket.io
Web instant messaging security: detailed explanation of cross site websocket hijacking vulnerability (including sample code)
Open source framework pomelo practice: building a web-side high-performance distributed IM chat server
Using websocket and SSE technology to realize web message push
Explain the evolution of Web Communication: from Ajax and jsonp to SSE and websocket
Why does the network layer framework of mobileimsdk web use socket.io instead of netty?
Integrating theory with practice: understand the communication principle, protocol format and security of websocket from zero
How to use websocket to realize long connection in wechat applet (including complete source code)
Get to know electron quickly: a new generation of Web-based cross platform desktop Technology
Understand the evolution of front-end technology: review the 20-year history of technological changes of Web front-end
Make up course of basic knowledge of Web instant messaging: understand all cross domain problems in one article!
Web instant messaging practice dry goods: how to make your websocket disconnected and reconnected faster?
Websocket from entry to mastery, half an hour is enough!