Fixing style and layout problems caused by the UTF-8 BOM

Time:2019-12-19

When web pages are written in UTF-8, the BOM (byte order mark) often causes unexplained blank lines or garbled characters on the page. The reason is that the BOM is optional in UTF-8, so programs differ in how they handle it when saving and reading files. For example, some browsers (Firefox) automatically filter out every UTF-8 BOM, while others (IE) filter out only the first one. Why does "only the first one" matter? When a page is assembled from several included files, each file may carry its own BOM, and every BOM after the first ends up in the middle of the output.
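The effect of including several files can be reproduced in a few lines. The following Python sketch is only an analogy: the two byte strings stand in for hypothetical include files, and Python's `utf-8-sig` codec plays the role of a reader that removes a BOM only at the very start of the stream.

```python
import codecs

# Two hypothetical include files, each saved by an editor that adds a BOM.
header = codecs.BOM_UTF8 + b"<div>header</div>"
body = codecs.BOM_UTF8 + b"<p>body</p>"

# What is sent when one file is included after the other.
page = header + body

# "utf-8-sig" strips a BOM only at the very beginning of the data.
text = page.decode("utf-8-sig")

# The second BOM survives as the character U+FEFF in the middle of the page.
leftover = text.count("\ufeff")
print(leftover, repr(text))
```

The leftover U+FEFF sits between the two fragments, which is where a stray blank line or odd character would show up on the page.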

Use EditPlus or another editor to remove the BOM signature from the file, then refresh the page. The styles should display normally again.
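If no suitable editor is at hand, the same cleanup can be scripted. The following is a minimal Python sketch, not the method used by any particular editor; the function names `strip_utf8_bom` and `remove_bom_from_file` are made up for illustration.

```python
import codecs
from pathlib import Path


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def remove_bom_from_file(path: str) -> bool:
    """Rewrite the file without its BOM. Returns True if the file was changed."""
    file_path = Path(path)
    original = file_path.read_bytes()
    cleaned = strip_utf8_bom(original)
    if cleaned == original:
        return False  # no BOM found, leave the file untouched
    file_path.write_bytes(cleaned)
    return True
```

Working on raw bytes rather than decoded text means the rest of the file is copied through unchanged, whatever it contains. It is still sensible to keep a backup before rewriting files in place.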

The following description of the BOM, found elsewhere, may help explain the problem:

In UCS encoding, there is a character called “zero width no-break space”, whose code is FEFF. FFFE, by contrast, is not a valid character in UCS, so it should never appear in actual transmission. The UCS specification suggests transmitting the character “zero width no-break space” before the byte stream itself. If the receiver then reads FE FF, the byte stream is big-endian; if it reads FF FE, the byte stream is little-endian. For this reason, the character “zero width no-break space” is also called the BOM.
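The byte-order check described above can be sketched in Python. This is an illustration only: the function name `detect_utf16_byte_order` is hypothetical, and the sketch assumes the stream is already known to be UTF-16 (a UTF-32 little-endian stream also begins with FF FE and would be misreported).

```python
import codecs

ZWNBSP = "\ufeff"  # U+FEFF, the character used as the byte order mark

# The same character serialized in both byte orders.
big_endian_bom = ZWNBSP.encode("utf-16-be")     # bytes FE FF
little_endian_bom = ZWNBSP.encode("utf-16-le")  # bytes FF FE


def detect_utf16_byte_order(stream: bytes) -> str:
    """Guess the byte order of a UTF-16 stream from its first two bytes."""
    if stream.startswith(codecs.BOM_UTF16_BE):
        return "big-endian"
    if stream.startswith(codecs.BOM_UTF16_LE):
        return "little-endian"
    return "unknown"  # no BOM, the byte order must be learned some other way
```

For example, `detect_utf16_byte_order("\ufeffhi".encode("utf-16-le"))` returns `"little-endian"`.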

UTF-8 does not need a BOM to indicate byte order, but a BOM can be used to indicate the encoding. The UTF-8 encoding of the character “zero width no-break space” is EF BB BF, so a receiver that sees a byte stream starting with EF BB BF knows that it is UTF-8 encoded.
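A short Python sketch makes the three bytes visible and shows how a reader can act on them. The helper name `has_utf8_bom` is made up for this example; `codecs.BOM_UTF8` and the `utf-8-sig` codec are part of the Python standard library.

```python
import codecs

ZWNBSP = "\ufeff"  # the "zero width no-break space" character

# Encoding U+FEFF as UTF-8 yields the three bytes EF BB BF.
utf8_bom = ZWNBSP.encode("utf-8")


def has_utf8_bom(data: bytes) -> bool:
    """True if the byte stream starts with the UTF-8 signature EF BB BF."""
    return data.startswith(codecs.BOM_UTF8)


raw = codecs.BOM_UTF8 + "<p>hello</p>".encode("utf-8")

kept = raw.decode("utf-8")         # the BOM stays as a leading U+FEFF character
dropped = raw.decode("utf-8-sig")  # the leading BOM is removed while decoding

print(utf8_bom.hex(), has_utf8_bom(raw), repr(kept), repr(dropped))
```

The difference between `kept` and `dropped` is the same difference as between software that passes the BOM through and software that filters it out.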

Windows uses the BOM to mark the encoding of text files.

In a UTF-8 encoded file, the BOM takes up three bytes. If you save a text file as UTF-8 with Notepad, then open it with UE and switch to hex editing mode, you can see the BOM at the beginning of the file. On disk those bytes are EF BB BF; FF FE is the UTF-16 little-endian form, which a hex view may show instead if the editor converts the file to UTF-16 internally. Checking for these bytes is a good way to identify a UTF-8 file, and software uses the BOM for exactly that purpose. Many programs even require the files they read to have a BOM. However, there are still many programs that cannot handle a BOM. When I was studying Firefox, I learned that extensions could not contain a BOM in early versions, while Firefox 1.5 and later versions support it. It now turns out that PHP does not support the BOM either.

PHP was not designed with the BOM in mind, which means it does not ignore the three BOM bytes at the beginning of a UTF-8 encoded file. Because only the code that follows <? or <?php is executed as PHP, those three bytes are sent to the output as they are. If a plug-in file has this problem, the back-end page shows a white screen, whether or not the plug-in is activated. If a template file has this problem, the three bytes are output directly and produce a small blank line at the top of the page. English-language plug-ins and templates from abroad are generally plain ASCII and have no BOM; the problem mostly arises with domestic plug-ins and templates whose authors are unaware of the issue. Likewise, because the output page uses UTF-8, a template that you modify to include Chinese characters must be converted to UTF-8 to display correctly. If the editor you use adds a BOM automatically at that point, these three bytes end up on the page. The visible effect depends on the browser; generally it is a blank line or garbled characters.
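To find out which plug-in or template files are affected, the files can be scanned for the three BOM bytes before anything is edited. The following Python sketch is one possible approach under simple assumptions: the function name `find_php_files_with_bom` is hypothetical, and only files with the `.php` extension are checked.

```python
import codecs
from pathlib import Path
from typing import List


def find_php_files_with_bom(root: str) -> List[Path]:
    """Return the .php files under root whose first three bytes are EF BB BF."""
    flagged = []
    for php_file in sorted(Path(root).rglob("*.php")):
        if not php_file.is_file():
            continue  # skip directories that happen to end in .php
        with php_file.open("rb") as handle:
            # Only the first three bytes matter for this check.
            if handle.read(3) == codecs.BOM_UTF8:
                flagged.append(php_file)
    return flagged
```

Calling `find_php_files_with_bom(".")` from the top of a site or theme directory lists the candidates. Each flagged file can then be saved again without the BOM in an editor.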