Javascript: character encoding conversion and entity conversion
cattail asked 8 months ago

This question stems from this post: http://segmentfault.com/q/10100000001… It concerns the "interference string" page http://www.awflasher.com/jsmail that @sunny linked in the third reply. The source code that does the actual work is posted here:

function htmlEncode( s )

My understanding of the source code is that htmlEncode converts the ordinary characters displayed on the page into HTML entities, and URLEncode percent-encodes the URL in the page's href attribute as UTF-8, which is equivalent to using encodeURI. The reason encodeURI is not used here is that the author wants to escape only about 25% of the characters, which makes the string harder for a parser to decode.
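The idea described above can be sketched roughly as follows. This is not the original source, which is not reproduced in full here. The function names `htmlEncodeSome` and `urlEncodeSome`, the `ratio` and `rng` parameters, and the rule that forces escaping of markup characters are all assumptions made for illustration.

```javascript
// Hypothetical sketch, not the original author's code.
// Escapes roughly `ratio` of the characters as decimal numeric entities.
function htmlEncodeSome(s, ratio = 0.25, rng = Math.random) {
  let out = "";
  for (const ch of s) {
    // Characters with special meaning in HTML are always escaped (assumption).
    const forced = "&<>\"".includes(ch);
    out += forced || rng() < ratio ? "&#" + ch.codePointAt(0) + ";" : ch;
  }
  return out;
}

// Percent-encodes roughly `ratio` of the characters by character code.
// This only matches UTF-8 for ASCII input, so anything else is rejected.
function urlEncodeSome(s, ratio = 0.25, rng = Math.random) {
  let out = "";
  for (let i = 0; i < s.length; i++) {
    const code = s.charCodeAt(i);
    if (code > 0x7f) throw new RangeError("ASCII input only");
    // A literal "%" must always be escaped to keep the result unambiguous.
    const forced = s[i] === "%";
    out += forced || rng() < ratio
      ? "%" + code.toString(16).padStart(2, "0")
      : s[i];
  }
  return out;
}

// The output differs on every call because the escaped positions are random.
console.log(htmlEncodeSome("someone@example.com"));
console.log(urlEncodeSome("mailto:someone@example.com"));
```

Passing a fixed `rng` (for example `() => 0` to escape everything) makes the result repeatable, which is convenient for checking the output by hand.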
My questions are:

  1. The code uses one form of HTML entity, &#x;, where x is the Unicode code point of the character. The code point of a character can be obtained with the charCodeAt method. The problem is that when I look it up in Table 1 (see below), the space seems to be an exception: the value returned by charCodeAt is 32, but its HTML entity is listed as &nbsp; (&#160;). What is the reason?
  2. The way URLEncode converts Unicode to UTF-8 surprised me. http://cattail2012.wordpress.com/2012… This article describes my understanding of character sets and encodings; one section (related applications) covers the conversion from Unicode to UTF-8. Am I wrong, or is the author's code defective?

Table 1

Tychio replied 8 months ago

Actual test: &#32; is ' ' (a real space) and &#160; is &nbsp;, so the space is not a special case. No problem here.
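The check above can be reproduced with a few lines of JavaScript. The helper name `toNumericEntity` is an assumption made for this sketch and does not appear in the thread.

```javascript
// An ordinary space (U+0020) and a non-breaking space (U+00A0).
const space = " ";
const nbsp = "\u00A0";

// Hypothetical helper: build a decimal numeric character reference.
function toNumericEntity(ch) {
  return "&#" + ch.charCodeAt(0) + ";";
}

console.log(space.charCodeAt(0)); // 32
console.log(nbsp.charCodeAt(0)); // 160
console.log(toNumericEntity(space)); // &#32;
console.log(toNumericEntity(nbsp)); // &#160;
console.log(space === nbsp); // false
```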

1 Answers
Best Answer
coder answered 8 months ago

On the first question: &nbsp; and the space are two different characters in Unicode. The code point of the space is 32, and the code point of the non-breaking space is 160. In HTML text, the browser collapses several consecutive space characters into a single space, but it does not collapse consecutive &nbsp; characters.

On the second question: the author wrote the URLEncode function to encode an email address. An email address can only contain characters from the ASCII character set, and UTF-8 encodes ASCII characters exactly as ASCII does, so this method works for that purpose. Obviously, it cannot encode characters outside the ASCII character set, such as Chinese. Your understanding is right, and the author's code is right too. The key is where he intends to use it.
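To illustrate the second point, here is a minimal sketch that compares percent-encoding by character code with percent-encoding of the actual UTF-8 bytes. Both helper names are hypothetical, and the sketch assumes an environment that provides the standard `TextEncoder` global (modern browsers and Node.js).

```javascript
// Hypothetical helper: format a number as two or more uppercase hex digits.
function hex(n) {
  return n.toString(16).toUpperCase().padStart(2, "0");
}

// Naive approach: one "%XX" per UTF-16 code unit, with no UTF-8 conversion.
function naivePercentEncode(s) {
  let out = "";
  for (let i = 0; i < s.length; i++) {
    out += "%" + hex(s.charCodeAt(i));
  }
  return out;
}

// UTF-8 approach: convert to bytes first, then emit one "%XX" per byte.
function utf8PercentEncode(s) {
  const bytes = new TextEncoder().encode(s);
  return Array.from(bytes, (b) => "%" + hex(b)).join("");
}

// ASCII input: both approaches agree.
console.log(naivePercentEncode("a@b.c")); // %61%40%62%2E%63
console.log(utf8PercentEncode("a@b.c")); // %61%40%62%2E%63

// Non-ASCII input: only the UTF-8 version is a valid URL escape.
console.log(naivePercentEncode("中")); // %4E2D
console.log(utf8PercentEncode("中")); // %E4%B8%AD
console.log(encodeURIComponent("中")); // %E4%B8%AD
```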

cattail replied 8 months ago

Ha ha, thank you very much. I have learned two things here: 1. what a non-breaking space is; 2. another way to convert a decimal number to hexadecimal. :)

coder replied 8 months ago

It turns a number such as 0x32 into a string such as "32".
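A small sketch of that conversion, assuming the method under discussion is `Number.prototype.toString` called with a radix of 16 (the thread does not show the exact call):

```javascript
// 0x32 is only a hexadecimal way of writing the number 50.
const n = 0x32;

const hexString = n.toString(16); // hexadecimal digits as a string
const decString = n.toString(10); // decimal digits as a string

console.log(hexString); // 32
console.log(decString); // 50

// parseInt with radix 16 goes the other way, from hex string to number.
console.log(parseInt(hexString, 16)); // 50
```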