Detailed explanation of using PHP's mb_string to handle Chinese characters on Windows

Time:2022-7-27

As we all know, on Windows (the Chinese version, of course) file names and file contents are encoded in GBK, while during development the IDE saves our scripts in UTF-8. Why that is so is not discussed here, only how to bring both sides to the same encoding. As a result, the Chinese text in a regular expression pattern that I wrote in a UTF-8 script cannot correctly match the content of GBK-encoded files.
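The mismatch can be reproduced in a few lines. This is a minimal sketch, assuming the script is saved as UTF-8; the sample text 中文 and the hard-coded byte strings are illustrations and do not come from the original files.

```php
<?php
// Assumption: this script file is saved as UTF-8, so the pattern literal is UTF-8.
$pattern = '/中文/';

$gbkContent  = "\xD6\xD0\xCE\xC4";          // "中文" encoded as GBK
$utf8Content = "\xE4\xB8\xAD\xE6\x96\x87";  // "中文" encoded as UTF-8

// Without the /u modifier the pattern is compared byte by byte.
var_dump(preg_match($pattern, $gbkContent));   // int(0): the bytes differ
var_dump(preg_match($pattern, $utf8Content));  // int(1): same encoding, same bytes
```

The same two characters produce different bytes in each encoding, which is why the pattern only matches once both sides use one encoding.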

At first I had no idea what to do. I tried saving the PHP script file itself as GBK, which does work, but I felt this approach was too crude, so I looked for a PHP function that would meet my needs.

Then I thought of the iconv() function, which I had used for handling file names on Windows. Its prototype is as follows:

string iconv ( string $in_charset , string $out_charset , string $str )

Performs a character set conversion on the string str from in_charset to out_charset.

We often use:

$out_charset = 'utf-8';

$fileName = iconv('gbk', $out_charset, $fileName);

This processes the file name: it converts the name from GBK to UTF-8 without touching the file's content.
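A small sketch of that conversion, with the bytes checked afterwards. The file name 中文.txt is a hypothetical example, written as raw GBK bytes so that the script's own encoding does not matter.

```php
<?php
// Hypothetical file name as it would arrive on Chinese Windows: "中文.txt" in GBK.
$fileName = "\xD6\xD0\xCE\xC4.txt";

// Argument order: source charset, target charset, then the string.
$utf8Name = iconv('GBK', 'UTF-8', $fileName);

echo bin2hex($utf8Name), PHP_EOL; // e4b8ade696872e747874
```

Note that the string is the third argument; passing it first is an easy mistake to make.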

Notes translated from the manual:

  • If you append //TRANSLIT to $out_charset, i.e. $out_charset = 'utf-8//TRANSLIT', a character that cannot be converted to the target charset (UTF-8 here) is automatically replaced with one or more similar-looking characters;
  • If you append //IGNORE to $out_charset, i.e. $out_charset = 'utf-8//IGNORE', a character that cannot be converted to the target charset is silently skipped;
  • If you add nothing, the conversion is interrupted at the first character that cannot be converted.
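The three modes can be compared side by side. The sketch below is only illustrative: it converts in the opposite direction (UTF-8 to GBK) because an emoji is an easy example of a character GBK cannot represent, and the exact results depend on the iconv implementation PHP was built with, so no output is claimed here.

```php
<?php
// "中" exists in GBK; the emoji U+1F600 does not, so it cannot be converted.
$input = "\xE4\xB8\xAD\xF0\x9F\x98\x80"; // "中" followed by U+1F600, in UTF-8

// @ suppresses the notice PHP raises when a character cannot be converted.
$results = [
    'plain'    => @iconv('UTF-8', 'GBK', $input),
    'translit' => @iconv('UTF-8', 'GBK//TRANSLIT', $input),
    'ignore'   => @iconv('UTF-8', 'GBK//IGNORE', $input),
];

foreach ($results as $mode => $out) {
    // Each result is either a converted string or false, depending on the iconv build.
    echo $mode, ': ', $out === false ? 'false' : bin2hex($out), PHP_EOL;
}
```

Running this on the target machine is the quickest way to see how the local iconv build treats each suffix.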

However, when I used this function, the result seemed to show that iconv() can handle at most 64 characters, which is roughly the size of a file name. The content of my file is obviously longer than 64 characters.

I had no choice but to search for other functions again.

Then I found the mb_string function library, which is generally bundled with PHP environments. We can check for it in the output of phpinfo().
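Besides reading the phpinfo() page, the check can be done in code. A minimal sketch using extension_loaded() and function_exists(); the messages are illustrative.

```php
<?php
// extension_loaded() reports whether the mbstring extension is available.
$hasMbstring = extension_loaded('mbstring');

// function_exists() confirms that the specific function we need can be called.
$hasConvert = function_exists('mb_convert_encoding');

if ($hasMbstring && $hasConvert) {
    echo 'mbstring is available', PHP_EOL;
} else {
    echo 'mbstring is not loaded; enable the extension in php.ini', PHP_EOL;
}
```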

The mb_string library has an mb_convert_encoding() function that can change the encoding of a string. Its prototype is as follows:

string mb_convert_encoding ( string $str , string $to_encoding [, mixed $from_encoding ] )

Converts the character encoding of string str to to_encoding from optionally from_encoding.

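The prototype is similar to that of iconv(), but the output encoding takes no suffix modifiers (such as //TRANSLIT or //IGNORE), and there is no stated limit on the length of the string. Note that the argument order differs: the string comes first, then the target encoding, then the source encoding.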

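We can also see that $from_encoding is optional. If it is omitted, PHP's internal encoding is used as the source; if a list of candidate encodings is passed, the function guesses the source encoding among them.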
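A hedged sketch of guiding the detection explicitly: $from_encoding may be a list of candidate encodings, in which case the function guesses among them. The sample bytes and the candidate list below are assumptions for illustration.

```php
<?php
// "中文" encoded as GBK; these bytes are not valid UTF-8.
$gbkBytes = "\xD6\xD0\xCE\xC4";

// Passing candidates lets mb_convert_encoding() guess the source encoding.
$utf8 = mb_convert_encoding($gbkBytes, 'UTF-8', ['UTF-8', 'GBK']);

echo bin2hex($utf8), PHP_EOL; // e4b8ade69687
```

Guessing is a heuristic, so when the source encoding is known it is safer to name it directly.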

Because I could not find a character that fails to transcode, I do not know how the function deals with characters that cannot be converted.

I used the mb_convert_encoding() function to process the whole file, and the problem was solved smoothly.
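The whole-file approach might look like the sketch below. The helper name readGbkFileAsUtf8() and the temporary demo file are hypothetical, not taken from the original script; the script itself is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical helper: reads a GBK-encoded file and returns its content as UTF-8.
function readGbkFileAsUtf8(string $path): string
{
    $raw = file_get_contents($path);
    if ($raw === false) {
        throw new RuntimeException("Cannot read $path");
    }
    return mb_convert_encoding($raw, 'UTF-8', 'GBK');
}

// Demo: write a small GBK-encoded temporary file, then convert and search it.
$path = tempnam(sys_get_temp_dir(), 'gbk');
file_put_contents($path, "\xD6\xD0\xCE\xC4 test"); // "中文 test" in GBK

$text = readGbkFileAsUtf8($path);
unlink($path);

// The UTF-8 pattern now matches the converted content.
var_dump(preg_match('/中文/u', $text)); // int(1)
```

Converting the content once on read keeps every later pattern and string function working in a single encoding.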

Finally, a word about the mb_string function library, whose full name is Multibyte String. Many of its functions extend PHP's built-in string functions, and each is named after the original function with an "mb_" prefix. Besides doing what the original function does, these functions accept an extra optional $encoding parameter at the end of the parameter list, which specifies the encoding in which the strings are processed.

For example, the strpos() function finds the position of one string inside another.

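In the calls below, both string arguments were Chinese in the original script: "welcome to visit" stands for a Chinese string, and "ask" for the single character at index 4 of that string.

strpos("welcome to visit", "ask", 0) returns 12. The script is UTF-8 encoded, each Chinese character occupies 3 bytes in UTF-8, and strpos() counts bytes, so the four characters before the match take 4 × 3 = 12 bytes.

mb_strpos("welcome to visit", "ask", 0, 'utf-8') returns 4, because it treats the string as UTF-8 and counts characters.

And mb_strpos("welcome to visit", "ask", 0, 'gbk') returns 6, because the same 12 bytes are read as 2-byte GBK characters.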
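The difference between byte offsets and character offsets can be checked directly. In this sketch the strings 欢迎来访问 and 问 are hypothetical stand-ins chosen so that the needle sits at index 4; the script is assumed to be saved as UTF-8.

```php
<?php
// Assumption: this script is saved as UTF-8. The strings are stand-ins:
// five Chinese characters, with the needle as the fifth one.
$haystack = '欢迎来访问';
$needle   = '问';

// strpos() counts bytes: 4 preceding characters x 3 bytes each.
var_dump(strpos($haystack, $needle, 0));             // int(12)

// mb_strpos() with the correct encoding counts characters.
var_dump(mb_strpos($haystack, $needle, 0, 'UTF-8')); // int(4)

// With 'GBK' the UTF-8 bytes are misread as GBK characters;
// the result for these stand-in strings is printed but not claimed here.
var_dump(mb_strpos($haystack, $needle, 0, 'GBK'));
```

Passing the encoding that matches the actual bytes is what makes the character offset meaningful.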
