Using (?>...) in regular expressions to implement atomic grouping and improve efficiency

Time:2020-11-15

Specifically, matching with (?>...) is no different from a normal match, but once matching moves past this construct (that is, past the closing parenthesis), all saved states created inside the construct are discarded and can no longer be backtracked into.
In other words, when the atomic group finishes matching, the text it matched has been solidified into a single unit that can only be kept or given up as a whole. The untried saved states of the subexpression inside the parentheses no longer exist, so backtracking can never select any of them (at least, none of the states that were "locked in" when the construct finished matching).
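The behaviour described above can be seen in a minimal sketch. The subject string and both patterns are hypothetical samples chosen only to show the difference, not data from the example that follows:

```php
<?php
// Hypothetical sample input, chosen only to illustrate the difference.
$subject = 'abc';

// Normal (non-capturing) group: "bc" is tried first, the trailing "c" then
// fails, the engine backtracks to the alternative "b", and the match succeeds.
$normal = preg_match('/a(?:bc|b)c/', $subject);

// Atomic group: once "bc" has matched and the group is exited, the untried
// alternative "b" is discarded, so the trailing "c" can never match.
$atomic = preg_match('/a(?>bc|b)c/', $subject);

var_dump($normal); // int(1)
var_dump($atomic); // int(0)
```

The only difference between the two patterns is `(?:` versus `(?>`, so the different results come purely from the discarded saved state.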
Example:
Suppose we need to process a batch of data whose original format was 123.456. Later, because of floating-point display problems, some of the values turned into something like 123.4566000000789. We want to keep only 2-3 digits after the decimal point, but the last digit cannot be 0. How should this regex be written? (Below we only consider the digits after the decimal point.) Once the regex is written, we use it to match the data and replace the original value with the match result.

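Regex 1

The code is as follows:
$str = preg_replace('/\.(\d\d[1-9]?)\d*/', '.\\1', $str);
// PCRE patterns need delimiters (here "/"). \\1 refers back to group 1 of the match;
// the "." in the replacement restores the decimal point, which lies outside the group.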

Obviously, with this pattern the values that are already in the 123.456 format get processed for nothing. To improve efficiency, we need to refine the regex. Comparing 123.456 with the other strings, we see that nothing follows the digits 456, so processing it is wasted work. That is easy to fix: change the trailing quantifier * to +. This way, values such as 123.45, with only 1 or 2 digits after the decimal point, are not processed at all, while values with more than three digits are processed as before. The PHP code is:

Regex 2

The code is as follows:
$str = preg_replace('/\.(\d\d[1-9]?)\d+/', '.\\1', $str);
// Same as Regex 1 except for the trailing \d+, which now requires at least one extra digit.
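To check the claim about wasted work, here is a small sketch that counts how many replacements each pattern performs. The helper `countReplacements` is a hypothetical name introduced for illustration, and the patterns are written with `/` delimiters and a `'.\\1'` replacement on the assumption that the decimal point should be kept in the output:

```php
<?php
// Regex 1 (trailing \d*) and Regex 2 (trailing \d+), with "/" delimiters added.
$regex1 = '/\.(\d\d[1-9]?)\d*/';
$regex2 = '/\.(\d\d[1-9]?)\d+/';

// Hypothetical helper: returns how many replacements a pattern performs.
function countReplacements(string $pattern, string $value): int
{
    // The fifth argument of preg_replace receives the number of replacements.
    preg_replace($pattern, '.\\1', $value, -1, $count);
    return $count;
}

$short = '123.45';
echo countReplacements($regex1, $short), "\n"; // 1: matched and rewritten to the same text
echo countReplacements($regex2, $short), "\n"; // 0: no match, nothing to do

$long = '123.4566000000789';
$trimmed = preg_replace($regex2, '.\\1', $long);
echo $trimmed, "\n"; // 123.456
```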

So, is this regex really correct? Let's analyze how the match proceeds.
The string is "123.456" and the regular expression is [\.(\d\d[1-9]?)\d+]. Let's take a look.

First of all (leaving aside the 123 before the decimal point),
[\.] matches ".", so the match succeeds and control passes to the first [\d], which matches "4". Control then passes to the second [\d], which matches "5", and then to [[1-9]?]. Because the quantifier is [?] and quantifiers are greedy (they try to match first), the engine attempts the match and leaves a backtracking point. [[1-9]?] matches "6" successfully and control passes to [\d+]. [\d+] finds that no characters are left, so, following the "last in, first out" rule, the engine returns to the most recent backtracking point. [[1-9]?] gives back the "6" it had matched, and [\d+] then matches "6" successfully. The match is complete. The result captured by [(\d\d[1-9]?)] is therefore "45", not the "456" we want, because "6" has been matched by [\d+]. So what should we do? Can we make [[1-9]?] keep what it matched, without backtracking? This is where the atomic grouping described above comes in. The regex engine used by PHP's preg_replace function supports atomic grouping. Using the atomic group syntax, we can change the code as follows:

Regex 3

The code is as follows:
$str = preg_replace('/\.(\d\d(?>[1-9]?))\d+/', '.\\1', $str);
// (?>[1-9]?) is an atomic group: once it has matched, it cannot give the digit back.

With this pattern, the string "123.456" no longer matches and is therefore left untouched, which is what we require.

So let's take a look at how [\.(\d\d(?>[1-9]?))\d+] works.
Inside the atomic group the quantifier works normally, so if [[1-9]] cannot match, the regex returns to the saved state left by [?]. Matching then leaves the atomic group and continues on to [\d+]. In this case, when control leaves the atomic group, no saved state needs to be discarded (because none of the states created inside the group remain).
If [[1-9]] can match, the saved state left by [?] still exists when matching leaves the atomic group. However, because that state belongs to the atomic group that has just been closed, it is discarded.
This happens when matching ".625" or ".625000". In the latter case, discarding the state causes no trouble, because [\d+] matches the "000" that follows and the regex finishes matching. For ".625", however, [\d+] cannot match and the engine needs to backtrack, but it cannot, because the saved state no longer exists. With no saved state to return to, the whole match fails and ".625" is left unprocessed, which is exactly what we expect.
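As a rough check of the whole walkthrough, the sketch below runs the three patterns over a few sample values. The sample list, the array keys, and the `'.\\1'` replacement (which keeps the decimal point) are assumptions made for illustration:

```php
<?php
// The three patterns discussed above, written with "/" delimiters.
$patterns = [
    'regex 1' => '/\.(\d\d[1-9]?)\d*/',
    'regex 2' => '/\.(\d\d[1-9]?)\d+/',
    'regex 3' => '/\.(\d\d(?>[1-9]?))\d+/',
];

// Hypothetical sample values covering the cases from the text.
$samples = ['123.45', '123.456', '123.625000', '123.4566000000789'];

$results = [];
foreach ($patterns as $name => $pattern) {
    foreach ($samples as $sample) {
        $results[$name][$sample] = preg_replace($pattern, '.\\1', $sample);
        printf("%s: %s => %s\n", $name, $sample, $results[$name][$sample]);
    }
}

// The interesting case is "123.456":
//   regex 2 backtracks, gives the "6" to \d+, and produces 123.45
//   regex 3 cannot backtrack into the atomic group, so 123.456 is left as is
```

All three patterns shorten 123.625000 to 123.625 and 123.4566000000789 to 123.456; they differ only in how they treat values that are already short enough.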
