Detailed explanation of PCRE regular expression parsing code in PHP

Time:2020-1-15

1. Preface

The previous post covered character sets, so this one does not go over them again. In PHP, many functions process strings as UTF-8 encoded Unicode by default. Without further ado, let's get to the point.

2. Analysis of the PHP function mb_split

<?php
$preg_strings = '测、试、一、下';
$preg_str = mb_split('、', $preg_strings);
print_r($preg_str);

Output:

Array
(
    [0] => 测
    [1] => 试
    [2] => 一
    [3] => 下
)

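By default, mb_split interprets both the pattern and the subject as UTF-8 (the encoding comes from mb_regex_encoding()). The separator 、 (Unicode code point U+3001) is matched as one whole character rather than byte by byte, and $preg_strings is split at each occurrence.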
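To make the role of the encoding concrete, here is a minimal sketch. It assumes the mbstring extension is loaded and the script file is saved as UTF-8. The sample string and the variable names are illustrative, not taken from any library.

```php
<?php
// Assumption: mbstring is available and this file is saved as UTF-8.
mb_regex_encoding('UTF-8'); // state the encoding explicitly instead of relying on the default

$separator = '、'; // U+3001 IDEOGRAPHIC COMMA

// In UTF-8 the separator is three bytes long.
$separator_bytes = bin2hex($separator);

// mb_split treats the separator as one character and splits around it.
$parts = mb_split($separator, '测、试、一、下');

echo $separator_bytes, "\n";
print_r($parts);
```

Setting mb_regex_encoding() explicitly is optional when the default is already UTF-8, but it keeps the behavior independent of the server configuration.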

3. Analysis of the PHP function preg_split

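Split the string "测试一下" into individual characters:

<?php
$strings = '测试一下';
// The empty pattern matches between every character; the u modifier makes "character" mean a UTF-8 character.
$mb_arr = preg_split('//u', $strings, -1, PREG_SPLIT_NO_EMPTY);
print_r($mb_arr);

The printing results are as follows:

Array
(
    [0] => 测
    [1] => 试
    [2] => 一
    [3] => 下
)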
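The effect of the u modifier on an empty pattern is easiest to see by running the same split with and without it. The sketch below assumes a four-character Chinese sample string saved as UTF-8. The variable names are illustrative.

```php
<?php
// Assumption: this file is saved as UTF-8, so each Chinese character below occupies 3 bytes.
$sample = '测试一下';

// With u: the empty pattern matches between UTF-8 characters.
$by_char = preg_split('//u', $sample, -1, PREG_SPLIT_NO_EMPTY);

// Without u: the empty pattern matches between single bytes.
$by_byte = preg_split('//', $sample, -1, PREG_SPLIT_NO_EMPTY);

echo count($by_char), "\n"; // number of characters
echo count($by_byte), "\n"; // number of bytes
```

Without u, the pieces are single bytes and are not valid UTF-8 on their own. On PHP 7.4 and later, mb_str_split($sample) is another way to obtain the per-character array.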

4. Analysis of the /u modifier in PCRE

In PHP, the regular expression delimiter can be #, %, / and so on.

A pattern is sometimes followed by modifiers after the closing delimiter. So what do they mean?

For example:


%[\x{4e00}-\x{9fa5}]+%u

The trailing u modifier tells PCRE to treat both the pattern and the subject string as UTF-8, so matching works on whole characters rather than on single bytes.
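As a small illustration of both points, the sketch below runs one pattern with three different delimiters and then once more without the u modifier. The subject string and the variable names are chosen for the example only.

```php
<?php
// Assumption: this file is saved as UTF-8, so '测' is a single character made of 3 bytes.
$subject = '测';

// The delimiter is only a wrapper; these three patterns are equivalent.
$slash   = preg_match('/^.$/u', $subject);
$hash    = preg_match('#^.$#u', $subject);
$percent = preg_match('%^.$%u', $subject);

// Without u, the dot matches one byte, so a 3-byte character cannot satisfy ^.$
$no_u = preg_match('/^.$/', $subject);

var_dump($slash, $hash, $percent, $no_u);
```

Choosing % or # as the delimiter is mostly a matter of convenience: it avoids escaping / when the pattern itself contains slashes.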

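Example 1:

<?php
$strings = '测试一下';
// preg_match_all returns the number of full matches; the matched text is stored in $match.
$is_true = preg_match_all('%[\x{4e00}-\x{9fa5}]+%u', $strings, $match);
var_dump($is_true);
print_r($match);

The printing results are as follows:

int(1)
Array
(
    [0] => Array
        (
            [0] => 测试一下
        )

)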

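What does [\x{4e00}-\x{9fa5}] mean here?

In PCRE, \x{hhhh} denotes the character whose code point is the hexadecimal value hhhh. With the u modifier, values above FF are allowed.

The CJK Unified Ideographs block, which holds the common Chinese characters, spans code points 4e00 to 9fff (hexadecimal).

A character class [] with a range therefore covers it: [\x{4e00}-\x{9fa5}] or [\x{4e00}-\x{9fff}].

For everyday Chinese text the two patterns behave the same. The second one also matches the less common characters from 9fa6 to 9fff that were added to Unicode later.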
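The difference between the two ranges only shows up for code points above 9fa5. Here is a minimal sketch, assuming PHP 7 or later for the "\u{...}" string escape. The variable names and the choice of U+9FA6 as a test character are illustrative.

```php
<?php
// Single-quoted strings keep \x{...} intact so PCRE, not PHP, interprets it.
$narrow = '/^[\x{4e00}-\x{9fa5}]+$/u';
$wide   = '/^[\x{4e00}-\x{9fff}]+$/u';

$common = '测试一下';  // everyday characters, all inside 4e00-9fa5
$late   = "\u{9FA6}"; // a code point just past 9fa5 (PHP 7+ escape syntax)

$results = [
    'common_narrow' => preg_match($narrow, $common),
    'common_wide'   => preg_match($wide, $common),
    'late_narrow'   => preg_match($narrow, $late),
    'late_wide'     => preg_match($wide, $late),
];

print_r($results);
```

Only the late_narrow case fails to match, which is why the two patterns are interchangeable for most everyday text.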
