Instant messaging security (IX): why use HTTPS? In simple terms, explore the security of short connections

Time:2022-7-2

This article is shared by the elab technical team. The original title was “Exploring HTTPS”; the text has been revised.

1. Introduction

For IM developers, the two most commonly used communication technologies are socket long connections and HTTP short connections (a mainstream IM product usually combines the two). From the perspective of communication security, a socket long connection is secured by running TCP under SSL/TLS encryption (for example, WeChat’s mmtls; see “WeChat new generation communication security solution: detailed explanation of mmtls based on TLS 1.3”), while an HTTP short connection is secured by HTTPS.

What is HTTPS? Why use HTTPS? Today, I would like to take this opportunity to learn more about HTTPS with you, including the development of HTTP, the problems encountered by HTTP, symmetric and asymmetric encryption algorithms, digital signatures, third-party certification authorities and other concepts.


exchange of learning:

  • Introduction to mobile IM development: a beginner’s introduction is enough: develop mobile IM from scratch
  • Open source im framework source code:https://github.com/JackJiang2…

(this article has been synchronously published on:http://www.52im.net/thread-38…

2. Series of articles

This article is the ninth in a series of articles on IM communication security knowledge. The general contents of this series are as follows:

Instant messaging security (I): correctly understand and use Android encryption algorithm
Instant messaging security (II): discussing the application of combined encryption algorithm in im
Instant messaging security (III): explanation of common encryption and decryption algorithms and communication security
Instant messaging security (IV): example analysis of the risks of key hard coding in Android
Instant messaging security (V): application practice of symmetric encryption technology on Android platform
Instant messaging security (VI): principle and application practice of asymmetric encryption technology
Instant messaging security (VII): if you understand the principle of HTTPS in this way, one article is enough
Instant messaging security (VIII): do you know whether HTTPS uses symmetric encryption or asymmetric encryption?
Instant messaging security (IX): why use HTTPS? In simple terms, explore the security of short connections (* this article)

3. Preface

To talk about HTTPS, we have to go back to the HTTP protocol.

Everyone is familiar with HTTP. But do you know the difference between HTTPS and HTTP?

For this classic interview question, most people answer as follows:

1) HTTPS has one more letter than HTTP, S (secure): that is, HTTPS is a secure version of HTTP;
2) Different port numbers: HTTP uses port 80, HTTPS uses port 443;
3) Encryption algorithm: HTTPS uses an asymmetric encryption algorithm.

How good is this answer? After reading this article, we can come back and judge it.

So how does HTTPS achieve secure data transmission over short connections? To understand this thoroughly, we need to start with the development of HTTP.

4. HTTP protocol review

4.1 Basic knowledge
HTTP stands for Hypertext Transfer Protocol (see “understanding HTTP protocol in simple terms”).

Put simply:

1) Hypertext means content that goes beyond plain text: multimedia resources including but not limited to pictures, audio and video;
2) A protocol is the data format and communication rules agreed on by both parties.

HTTP is an application layer protocol, the top layer of the TCP/IP protocol suite:
[Figure: HTTP in the TCP/IP protocol stack]
▲ the above figure is quoted from “understanding HTTP protocol in simple terms”

When the browser and server use HTTP to exchange hypertext data, they put the data into the message body and add a header (request header or response header) to form a complete HTTP message, which is handed to the transport layer below. Each lower layer then adds its own header (control information) in turn, and finally the physical layer sends the binary data out as electrical signals.

The HTTP request is shown in the following figure:
[Figure: an HTTP request]
▲ the above figure is quoted from “understanding HTTP protocol in simple terms”

The HTTP message structure is as follows:
[Figure: HTTP message structure]
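
The message layout described above (start line, headers, a blank line, then the body) can be sketched in a few lines of Python. This is only an illustration: the helper names `build_request` and `parse_message`, the host `example.com` and the path `/login` are assumptions made for the example, not part of any real service.

```python
def build_request(method, path, host, headers=None, body=b""):
    """Assemble a raw HTTP/1.1 request: start line, headers, blank line, body."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    if body:
        lines.append(f"Content-Length: {len(body)}")
    head = "\r\n".join(lines) + "\r\n\r\n"  # the blank line ends the header part
    return head.encode("ascii") + body


def parse_message(raw):
    """Split a raw message back into start line, header dict and body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    start_line, *header_lines = head.decode("ascii").split("\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body


# Hypothetical request used only to show the layout.
raw = build_request("POST", "/login", "example.com",
                    {"Content-Type": "text/plain"}, b"user=alice")
start_line, headers, body = parse_message(raw)
```

Everything in `raw` is readable text, which is exactly why plain HTTP is easy to inspect or modify along the path.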

4.2 Development history
The development history of HTTP is as follows:
[Figure: HTTP development history]

In the early version of HTTP (HTTP/1.0), by default only one HTTP request can be sent over each TCP connection, and the TCP connection is released once the request completes.

Establishing a TCP connection requires a three-way handshake, so re-establishing the connection for every HTTP request is very inefficient. HTTP/1.1 improves this with a long connection (persistent connection) mechanism, that is, “one TCP connection, N HTTP requests”.

The long connection and short connection of the HTTP protocol are in essence the long connection and short connection of the TCP protocol.

With a long connection, the TCP connection that carries HTTP data between client and server is not closed after a web page is opened; when the client accesses the server again, it reuses the established connection. Keep-alive does not hold the connection forever: it has a hold time, which can be configured in the server software (such as Apache). Both the client and the server must support long connections for this to work.

PS: for IM developers, in order to distinguish it from the socket long connection channel, HTTP is usually regarded as a “short connection” (although this “short connection” is not necessarily “short”).

To enable a long connection in HTTP/1.0, the request must carry the Connection: keep-alive header; in HTTP/1.1, long connections are the default. For the detailed development history of the HTTP protocol, please read the article “understand the historical evolution and design ideas of HTTP Protocol”.
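
The rule for when a connection stays open can be written down as a small decision function. This is a minimal sketch of the default behaviour described above; the function name `is_persistent` is an assumption for illustration, and real servers also apply timeouts and request limits.

```python
def is_persistent(http_version, headers):
    """Return True if the TCP connection should stay open after this message."""
    connection = ""
    for name, value in headers.items():
        if name.lower() == "connection":
            connection = value.lower()
    tokens = {token.strip() for token in connection.split(",") if token.strip()}

    if "close" in tokens:
        return False                    # either side may ask to close
    if http_version == "HTTP/1.1":
        return True                     # persistent by default
    if http_version == "HTTP/1.0":
        return "keep-alive" in tokens   # opt-in only
    return False
```

For example, `is_persistent("HTTP/1.0", {})` is false, while the same call with `{"Connection": "keep-alive"}` is true.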

4.3 Security issues
As HTTP came into wider and wider use, its security problems were gradually exposed.

Many years ago, hijacking by network operators was everywhere. You visited a normal web page, and advertising banners, redirect scripts and deceptive red packet buttons appeared on it inexplicably. Sometimes the file you meant to download turned into something completely different. All of this was operators hijacking HTTP plaintext data.

The following figure shows a familiar example of operator hijacking:
[Figure: example of operator hijacking]

PS: about the problem of operator hijacking, you can read “comprehensive understanding of mobile DNS domain name hijacking and other complications: principle, root cause, httpdns solution, etc.”.

HTTP mainly has the following three security problems:
[Figure: the three main security problems of HTTP]

To sum up:

1) Data confidentiality: HTTP transmits data in plaintext, so everything travels unprotected across the network, including the user’s identity information, payment accounts and passwords. Such sensitive information is easily leaked;
2) Data integrity: HTTP packets pass through many forwarding devices before reaching the destination host. Any of these nodes may tamper with or replace the packet contents, and the receiver has no way to verify the integrity of the data;
3) Authentication: HTTP is open to man-in-the-middle attacks, because we cannot verify that the other party is the one we intend to communicate with.

Therefore, to secure data transmission, HTTP data must be encrypted.

5. Common encryption methods

5.1 Basic information
There are three common cryptographic techniques:

1) Symmetric encryption;
2) Asymmetric encryption;
3) Message digest.

The first two are suitable for encrypting data in transit. A message digest is irreversible (so strictly speaking it is not encryption), and this property is commonly used in digital signatures.

Next, we briefly go through the three one by one.

5.2 Symmetric encryption
Symmetric encryption, also known as secret-key or single-key encryption, uses the same key for both encryption and decryption. The key is a secret parameter fed to a public algorithm, not the algorithm itself.

Symmetric encryption is shown as follows:
[Figure: symmetric encryption]

Widely used symmetric encryption algorithms include:
[Figure: widely used symmetric encryption algorithms]

Advantages, disadvantages and applicable scenarios of symmetric encryption:

1) Advantages: the algorithms are public and simple, and encryption and decryption are fast and efficient;
2) Disadvantages: there is only one key, and it must be shared. If the ciphertext is intercepted and the key is also stolen, the information is easily decrypted;
3) Applicable scenarios: because it is fast, it suits encrypting large amounts of data. Because transmitting the key safely is difficult, it suits scenarios that need no key exchange, such as internal systems where the key can be agreed in advance.
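
The defining property, one shared key for both directions, can be shown with a toy cipher built only from the Python standard library. This is a sketch for illustration and must not be used to protect real data: the keystream construction and the names `keystream` and `xor_cipher` are assumptions for the example, and a production system would use a vetted algorithm such as AES from a maintained library.

```python
import hashlib
import secrets


def keystream(key, length):
    """Derive `length` pseudo-random bytes from the key (toy construction)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor_cipher(key, data):
    """Encrypt or decrypt: XOR with the keystream is its own inverse."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


key = secrets.token_bytes(32)                 # the shared secret key
plaintext = b"transfer 100 yuan to Bob"
ciphertext = xor_cipher(key, plaintext)       # sender encrypts
recovered = xor_cipher(key, ciphertext)       # receiver decrypts with the same key
```

Anyone who obtains `key` can run the same `xor_cipher` call, which is the key distribution problem described in the disadvantages above.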

PS: you can experience the symmetric encryption algorithm online. The link is:http://www.jsons.cn/textencrypt/

Note: Base64 is often mistaken for symmetric encryption, but it is only an encoding. It uses no key and anyone can decode it, so it provides no confidentiality.
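
A short check with the standard `base64` module makes the point: decoding needs nothing but the encoded text. The sample string is made up for the example.

```python
import base64

# Hypothetical credential string, used only for the demonstration.
secret_looking = base64.b64encode(b"password=123456")

# No key is involved: anyone who sees the encoded bytes can reverse them.
decoded = base64.b64decode(secret_looking)
```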

5.3 Asymmetric encryption
Asymmetric encryption uses a pair of keys (a public key and a private key) for encryption and decryption.

With asymmetric encryption, the decryption key never has to be transferred. The steps are as follows:

1) Party B generates two keys (a public key and a private key). The public key is public and can be obtained by anyone, while the private key is kept secret;
2) Party A obtains Party B’s public key and encrypts the information with it;
3) Party B receives the encrypted information and decrypts it with the private key.

Take RSA, the most typical asymmetric encryption algorithm, as an example:

Understanding RSA thoroughly requires some number theory and a walk through the full derivation. The basic idea: two very large prime numbers and their product are the material for generating the public and private keys, and deriving the private key from the public key is extremely hard because it requires factoring the very large product back into its two primes. The largest RSA modulus publicly known to have been factored is 829 bits (RSA-250, in 2020; the earlier record was 768 bits). A 1024-bit RSA key is therefore no longer considered adequate, and keys of at least 2048 bits are the current recommendation.
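
The idea can be tried out with deliberately tiny numbers. The following is a textbook sketch only: the primes 61 and 53 and the exponent 17 are the usual classroom values, the function names are assumptions for the example, and real RSA uses primes hundreds of digits long together with padding.

```python
# Key generation by Party B (toy sizes; Python 3.8+ for pow(e, -1, m)).
p, q = 61, 53
n = p * q                      # modulus, part of both keys
phi = (p - 1) * (q - 1)
e = 17                         # public exponent
d = pow(e, -1, phi)            # private exponent: e * d == 1 (mod phi)

public_key = (e, n)
private_key = (d, n)


def encrypt(message, key):
    """Party A: encrypt an integer message < n with the public key."""
    exponent, modulus = key
    return pow(message, exponent, modulus)


def decrypt(cipher, key):
    """Party B: decrypt with the private key."""
    exponent, modulus = key
    return pow(cipher, exponent, modulus)


cipher = encrypt(65, public_key)
plain = decrypt(cipher, private_key)
```

Recovering `d` from `(e, n)` requires knowing `p` and `q`. That is trivial for 3233 and believed infeasible for a 2048-bit modulus.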

Advantages, disadvantages and applicable scenarios of asymmetric encryption:

1) Advantages: the private key never needs to be transmitted, which removes the key distribution problem of symmetric encryption and greatly reduces the risk of key leakage;
2) Disadvantages: heavy computation and low speed;
3) Applicable scenarios: scenarios that require key exchange, such as Internet applications, where keys cannot be agreed in advance.

In practice, it can be combined with symmetric encryption:

1) Use the asymmetric algorithm to transfer the symmetric key securely;
2) Use the fast symmetric algorithm to encrypt the bulk data (as HTTPS does).
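
The two-step combination above can be sketched end to end with toy building blocks. Both the tiny RSA key pair and the XOR keystream cipher below are stand-ins chosen so the example runs on the standard library alone; the names `xor_stream`, `session_secret` and the sample message are assumptions for illustration, not how a real TLS stack is implemented.

```python
import hashlib
import secrets

# Receiver's toy RSA key pair (tiny primes, illustration only).
P, Q, E = 61, 53, 17
N = P * Q
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent, Python 3.8+


def xor_stream(key, data):
    """Toy symmetric cipher: XOR data with a SHA-256 derived keystream."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


message = b"a large HTTP response body ..."

# Step 1 (sender): wrap a random session secret with the public key (E, N).
session_secret = secrets.randbelow(N - 2) + 2        # integer in [2, N - 1]
wrapped_secret = pow(session_secret, E, N)

# Step 2 (sender): encrypt the bulk data with the fast symmetric cipher.
ciphertext = xor_stream(session_secret.to_bytes(2, "big"), message)

# Receiver: unwrap the secret with the private key, then decrypt the data.
unwrapped = pow(wrapped_secret, D, N)
plaintext = xor_stream(unwrapped.to_bytes(2, "big"), ciphertext)
```

Only the short session secret goes through the slow asymmetric operation; the payload, however large, is handled by the symmetric cipher.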

PS: for im developers, the article “discussing the application of combined encryption algorithm in IM” is worth reading.

5.4 How to choose?
1) If only symmetric encryption is used:

The HTTP requester encrypts the data with a symmetric algorithm. For the receiver to decrypt it, the sender must also pass the key to the receiver, and the key can be sniffed in transit. An attacker who steals the key can decrypt everything that is sent, so this scheme is not feasible.

2) If only asymmetric encryption is used:

The receiver keeps the private key and passes the public key to the sender. The sender encrypts the data with the public key, and the receiver decrypts it with the private key. An attacker cannot read the data directly (having no private key), but can intercept the public key in transit, hand his own public key to the sender instead, and then decrypt whatever the sender sends with his own private key.

Throughout the process, neither side of the communication is aware of the man in the middle, yet he obtains the complete data.

3) A mixture of the two encryption methods:

First the asymmetric algorithm is used to encrypt and transmit the symmetric key, and then both parties encrypt the data with symmetric encryption. This looks fine, but is it?

A man in the middle can still intercept the public key in transit and substitute his own, so this treats the symptoms rather than the root cause.

To cure the root cause, we need a third-party notary to prove that the public key has not been replaced. This is where the digital certificate comes in, which is covered in the next section.

6. Digital certificate

6.1 CA organization
CA stands for certificate authority, the institution that issues digital certificates.

As a trusted third party, the CA is responsible for verifying the legitimacy of public keys in the public key system.

A certificate is a data file that the origin server applies for from a trusted third-party organization. It states who the domain name belongs to and the date of issue, and it carries the server’s public key together with the CA’s digital signature. It never contains a private key.

The server puts its public key in the digital certificate. As long as the certificate is trusted, the public key is trusted.

The following two pictures show part of the contents of the certificate for the flybook domain name:
[Figure: excerpts from the certificate of the flybook domain name]

6.2 Digital signature
Digest algorithm: generally implemented by a hash function, it can be thought of as a fixed-length compression algorithm that maps data of any length to a fixed-length value. This is like sealing the data: any small change to the data produces a very different digest.
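
Both properties, fixed output length and sensitivity to small changes, are easy to observe with `hashlib` from the Python standard library. SHA-256 is used here as one common digest algorithm; the sample messages are made up for the example.

```python
import hashlib

digest_a = hashlib.sha256(b"pay Bob 100").hexdigest()
digest_b = hashlib.sha256(b"pay Bob 900").hexdigest()      # one character changed
digest_big = hashlib.sha256(b"x" * 1_000_000).hexdigest()  # much longer input

# Every SHA-256 digest is 32 bytes, shown here as 64 hexadecimal characters.
lengths = {len(digest_a), len(digest_b), len(digest_big)}
```

Comparing `digest_a` and `digest_b` character by character shows that they differ throughout, not just in one position.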

Generally, the applicant (server) for a digital certificate generates a key pair (a private key and a public key) and a certificate signing request (CSR). The CSR is an encoded text file that contains the public key and other information to be included in the certificate (such as domain name, organization and e-mail address). The key pair and CSR are usually generated on the server where the certificate will be installed, and the information contained in the CSR depends on the validation level of the certificate. Unlike the public key, the applicant’s private key is kept secret and should never be shown to the CA (or anyone else).

After generating the CSR, the applicant sends it to the CA, which verifies that the information in it is correct. If so, the CA digitally signs the certificate with the issuer’s (the CA’s own) private key, places the signature in the certificate, and sends the signed certificate back to the applicant.

In the SSL handshake stage: after receiving the server’s certificate, the browser takes out the certificate data (including the server’s public key) and the digital signature, and uses the CA’s public key to decrypt the signature and recover the digest computed by the CA. If this works with the public key of a trusted CA, the certificate really was issued by that CA, which verifies the identity of the server. The browser then hashes the certificate data itself and compares the result with the recovered digest. If the two match, the content has not been tampered with.

Asymmetric encryption is public key encryption and private key decryption, while a digital signature is just the opposite: private key encryption (signing) and public key decryption (verification), as shown in the following figure.

[Figure: public key encryption versus digital signature]
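
The sign-then-verify flow can be sketched with textbook RSA over a SHA-256 digest. This is a simplified illustration, not a real certificate format: the two primes are small Mersenne primes chosen for convenience, reducing the digest modulo `N` is a shortcut that real schemes replace with proper padding, and the names `sign`, `verify` and the certificate string are assumptions for the example.

```python
import hashlib

# Toy CA key pair (far too small for real use; Python 3.8+ for pow(e, -1, m)).
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q
E = 65537                              # CA public exponent
D = pow(E, -1, (P - 1) * (Q - 1))      # CA private exponent


def digest_as_int(data):
    """Hash the data and reduce it so it fits under the toy modulus."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


def sign(data):
    """CA side: 'encrypt' the digest with the private key."""
    return pow(digest_as_int(data), D, N)


def verify(data, signature):
    """Browser side: recover the digest with the public key and compare."""
    return pow(signature, E, N) == digest_as_int(data)


# Hypothetical certificate contents, used only for the demonstration.
certificate_data = b"CN=example.com; public_key=..."
signature = sign(certificate_data)

genuine_ok = verify(certificate_data, signature)
tampered_ok = verify(b"CN=evil.example; public_key=...", signature)
```

Here `genuine_ok` is true and `tampered_ok` is false: changing the certificate data changes its digest, so the old signature no longer matches.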

Limited by space, this article does not go further into digital certificates. For details, see:

1) Understand the security principle, digital certificate, single certification, dual certification, etc. of HTTPS;
2) Do you know whether HTTPS uses symmetric encryption or asymmetric encryption?;
3) If you understand HTTPS in this way, one article is enough.

7. Why use HTTPS

The book “Illustrated HTTP” describes HTTPS as HTTP in an SSL shell.

7.1 SSL
SSL was standardized under the name TLS in 1999, so the two names are often used interchangeably.

HTTPS is therefore not a new application layer protocol; it is HTTP with its communication interface replaced by SSL/TLS.

Specifically, HTTP communicates directly with TCP, whereas HTTPS communicates with SSL first, and SSL then communicates with TCP.

SSL is an independent protocol. It is not limited to HTTP: other application layer protocols, such as FTP and SMTP, can also use SSL for encryption.

7.2 HTTPS request process
The whole process of an HTTPS request is as follows:
[Figure: HTTPS request process]

As shown in the above figure:

1) The user initiates an HTTPS request in the browser, connecting to port 443 of the server by default;
2) HTTPS requires a digital certificate issued by a CA. The certificate carries the server’s public key pub, and the corresponding private key private is kept on the server;
3) The server receives the request and returns the configured certificate containing the public key pub to the client;
4) The client receives the certificate and verifies its validity, mainly: whether it is within the validity period, whether the domain name in the certificate matches the requested domain name, and whether the issuing certificate above it is valid (checked recursively up to a root certificate built into the system or configured in the browser). If verification fails, an HTTPS warning is displayed; if it passes, the process continues;
5) The client generates a random key for symmetric encryption, encrypts it with the public key pub in the certificate, and sends it to the server;
6) The server receives the ciphertext of the random key, decrypts it with the private key private paired with the public key pub, and obtains the random key that the client sent;
7) The server uses the random key to symmetrically encrypt the HTTP data to be transmitted and returns the ciphertext to the client;
8) The client uses the random key to symmetrically decrypt the ciphertext and obtains the HTTP data in plaintext;
9) Subsequent HTTPS requests use the previously exchanged random key for symmetric encryption and decryption.
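
From application code, most of these steps are carried out by the TLS library. The sketch below uses Python’s standard `ssl` module to show where the certificate checks of step 4 are switched on; the function names and the `HEAD /` request are assumptions for the example, and `fetch_over_tls` needs network access, so it is defined but not called here. Note also that current TLS versions usually agree on the session key with a Diffie-Hellman style exchange rather than by encrypting a random key with the server’s public key, so steps 5 and 6 should be read as the classic simplified model.

```python
import socket
import ssl


def make_client_context():
    """Build a client context that trusts the system root certificates."""
    context = ssl.create_default_context()
    # The defaults already enforce step 4: the certificate chain must lead to
    # a trusted root, and the certificate must match the requested host name.
    return context


def fetch_over_tls(host, port=443, timeout=5.0):
    """Open TCP, run the TLS handshake, then send HTTP inside the tunnel."""
    context = make_client_context()
    with socket.create_connection((host, port), timeout=timeout) as tcp:
        # wrap_socket performs the handshake and raises
        # ssl.SSLCertVerificationError if the certificate checks fail.
        with context.wrap_socket(tcp, server_hostname=host) as tls:
            request = f"HEAD / HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
            tls.sendall(request.encode("ascii"))
            return tls.version(), tls.cipher(), tls.recv(4096)


context = make_client_context()
```

Disabling `check_hostname` or setting `verify_mode` to `ssl.CERT_NONE` would bring back the man-in-the-middle problem from section 5.4, which is why the defaults are left untouched.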

7.3 What problems does HTTPS solve
HTTPS solves the three security problems of HTTP:

1) Confidentiality: combine asymmetric encryption and symmetric encryption. Asymmetric encryption protects the symmetric key, and symmetric encryption protects the data;
2) Integrity: the digital signature of the third-party CA guarantees that the certificate (and the public key in it) has not been tampered with, and the encrypted channel built on top of it detects modification of the data in transit;
3) Identity verification: verify the identity of the server through the digital certificate issued by the third-party CA.

7.4 Advantages and disadvantages of HTTPS
Finally, we summarize the advantages and disadvantages of HTTPS:
[Figure: advantages and disadvantages of HTTPS]

HTTPS is currently the best solution for securing HTTP transmission, but it is not perfect, and vulnerabilities can still exist.

8. References

[1] Understand HTTP protocol in simple terms
[2] Some knowledge of HTTP protocol
[3] Deep decryption of HTTP from the data transport layer
[4] Understand the historical evolution and design ideas of HTTP protocol in one article
[5] Do you know how many HTTP requests can be initiated on a TCP connection?
[6] If you understand HTTPS in this way, one article is enough
[7] One minute to understand what problem HTTPS solves
[8] Do you know whether HTTPS uses symmetric encryption or asymmetric encryption?
[9] The HTTPS era has come. Are you going to update your HTTP service?
[10] Understand HTTPS: encryption principle, security logic, digital certificate, etc
[11] Comprehensively understand the miscellaneous diseases such as DNS domain name hijacking on the mobile terminal: principle, root cause, httpdns solution, etc