These are the notes I took while learning regular expressions. In fact, regular expressions are not difficult.

Time:2022-1-14

As the title says, regular expressions can handle a lot of things, so let's learn them.

I Regular expressions
1. Anchors (position-matching characters)
1) Start anchor “^”: for example, ^0754 matches only strings that begin with 0754.
2) End anchor “$”: for example, 0754$ matches only strings that end with 0754.
3) Whole-string match: combine ^ and $, as in ^0754$, to match only the string 0754.
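The three anchor cases above can be tried out with a minimal sketch. It assumes PHP 7 or later and the preg_match function (covered later in these notes); the sample strings are made up for illustration.

```php
<?php
// Hypothetical sample strings; only the 0754 patterns come from the notes.
$starts = preg_match('/^0754/', '0754-1234567');  // begins with 0754
$ends   = preg_match('/0754$/', '1234567-0754');  // ends with 0754
$exact  = preg_match('/^0754$/', '0754');         // the whole string is 0754
$extra  = preg_match('/^0754$/', '07540');        // extra trailing character

var_dump($starts, $ends, $exact, $extra); // int(1) int(1) int(1) int(0)
```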
2. Escape characters
1) Whitespace characters:
Line feed \n
Carriage return \r
Tab \t
2) Other characters:
        “$” \$
        “^” \^
        “+” \+
        “/” \/
3. Quantifiers (wildcards)
1) * sign: matches the preceding character zero or more times.
Example 1: ‘abc*’ matches all strings containing ab (ab followed by any number of c).
2) + sign: matches the preceding character one or more times.
Example 2: ‘abc+’ matches all strings containing abc.
3) ? sign: matches the preceding character zero times or once.
Example 3: ‘abc?’ matches ab followed by at most one c, so abca, aabc and aaab all match. Without further constraints abcc also matches (on its leading abc); to reject it, the pattern must also say what may follow, for example ‘abc?([^c]|$)’.
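A short sketch of the three quantifiers, assuming PHP 7 or later with preg_match. The test strings are illustrative, and the anchored ^abc?$ variant is an added assumption that shows how to limit the number of c characters.

```php
<?php
$star     = preg_match('/abc*/', 'xxab');    // 1: "ab" with zero c is enough
$plusMiss = preg_match('/abc+/', 'xxab');    // 0: at least one c is required
$plusHit  = preg_match('/abc+/', 'abccc');   // 1
$optional = preg_match('/^abc?$/', 'abc');   // 1: zero or one c
$tooMany  = preg_match('/^abc?$/', 'abcc');  // 0: the anchors forbid a second c
$loose    = preg_match('/abc?/', 'abcc');    // 1: unanchored, matches the leading "abc"
```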
4. The escape sequence \$ and double vs. single quotation marks (PHP 4 environment)
1) A regular expression is itself a string.
2) When $ appears inside quotation marks, double and single quotation marks behave differently:
(1) With single quotation marks, the interpreter assigns every character between the quotes (including $) to the string unchanged.
(2) With double quotation marks, the interpreter treats “$” plus the legal characters that follow it (letters, digits and underscores) as a variable name. The name ends at the first illegal character; that character and those after it are treated as ordinary characters until the next “$” is encountered.
(3) Note: when a lone $ is the last character inside double quotation marks, the interpreter does not treat it as a variable, so no escaping \ is needed, although relying on this is not recommended.
(4) If the text to be matched contains a literal $, the regular expression should not be defined in double quotation marks, because \$ means different things in the two kinds of quotes:
<1> In double quotation marks, PHP itself consumes the backslash, so \$ always produces the single character “$”: echo "c\$$" prints c$$. By the time the string reaches the regular expression engine, \$ and a lone $ (one that cannot form a variable name with the following characters) are identical, and both act as the end anchor. A literal “$” therefore cannot be written as \$ inside double quotation marks (it would have to be written \\\$), which is why most regular expressions that need to match $ are defined with single quotation marks.
<2> In single quotation marks, \$ stays as two characters, whether or not legal variable-name characters follow: echo 'c\$$' prints c\$$. Outside a regular expression the backslash has no special meaning, but inside one \$ stands for the literal character “$”, while a bare $ is the end anchor.
3) The end anchor “$” of a regular expression is the same character that introduces a PHP variable:
Example 1: to define the regular expression ^ab$, write $pattern = "^ab\$"; the escape sequence \$ in double quotation marks produces the character $, so the result is ^ab$.
Example 2: $pattern = "^ab$"; is sloppy, but because $ is the last character and nothing follows it, it still works.
Example 3: a regular expression for strings ending with the characters c$: $pattern = 'c\$$';
Example 4: with $pattern = "c\$$"; the pattern becomes c$$, in which both $ are end anchors, so it only matches strings ending with c.
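The quoting difference can be checked directly. This is a minimal sketch assuming PHP 7 or later; the notes describe PHP 4, but string quoting works the same way. The patterns are wrapped in “/” delimiters for preg_match, and the subject strings are made up.

```php
<?php
$single  = 'c\$$';    // 4 characters: c, backslash, $, $
$double  = "c\$$";    // 3 characters: c, $, $ (PHP consumed the backslash)
$escaped = "c\\\$$";  // double-quoted spelling of the same 4 characters as $single

// Single-quoted pattern: literal "$" followed by the end anchor.
$literalDollar = preg_match('/' . $single . '/', 'abc$');  // 1
// Double-quoted pattern: two end anchors, so "c" must be the last character.
$dollarSubject = preg_match('/' . $double . '/', 'abc$');  // 0
$plainSubject  = preg_match('/' . $double . '/', 'abc');   // 1
```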
5. Usage of “[]” square brackets (character classes)
1) [] matches a single character. A ^ at the start inside [] negates the class: it matches any character that is not listed.
Example 1: [a-zA-Z0-9] matches any upper or lower case letter or digit.
Example 2: [\n\t\r\f] matches any of these whitespace characters.
Example 3: [^A-Z] matches any character that is not an uppercase letter.
Example 4: ^[^0-9] matches a string that does not start with a digit.
2) The special character “.” (period) matches any character except a newline. The pattern ^.abc$ matches any single character followed by abc, but not abc itself. The pattern “.” matches any string except an empty string or a string consisting only of a newline character.
Example 1: ‘^.abc$’ matches strings made of one character (not a newline) followed by abc; abc alone is not matched.
Example 2: ‘.’ matches every string except an empty one.
Example 3: ‘.abc’ matches every string that contains abc preceded by at least one character; abc alone is not matched.
Example 4: ‘.abc$’ matches every string that ends with abc preceded by at least one character; abc alone is not matched.
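A small sketch of character classes and the period, assuming PHP 7 or later with preg_match; the subject strings are illustrative only.

```php
<?php
$alnum    = preg_match('/^[a-zA-Z0-9]+$/', 'Abc123');  // 1: letters and digits only
$notUpper = preg_match('/[^A-Z]/', 'ABC');             // 0: every character is uppercase
$noDigit  = preg_match('/^[^0-9]/', '9lives');         // 0: the string starts with a digit
$dotAbc   = preg_match('/^.abc$/', 'xabc');            // 1: one character, then abc
$bareAbc  = preg_match('/^.abc$/', 'abc');             // 0: nothing left for "." to match
$newline  = preg_match('/./', "\n");                   // 0: "." does not match a newline
```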
3) PHP provides built-in character classes:
[[:alpha:]] any letter
[[:digit:]] any digit
[[:alnum:]] any letter or digit
[[:space:]] any whitespace character
[[:upper:]] any uppercase letter
[[:lower:]] any lowercase letter
[[:punct:]] any punctuation character
[[:xdigit:]] any hexadecimal digit
[[:cntrl:]] any control character (ASCII value less than 32, or 127)
Note: each of these classes matches a single character. Used without anchors, the pattern succeeds as long as the string contains at least one such character, regardless of what else the string contains.
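The note above is easy to verify. This sketch assumes PHP 7 or later, where the same bracketed class names are accepted by the preg_* functions; the subjects are made up.

```php
<?php
$hasDigit  = preg_match('/[[:digit:]]/', 'abc1');            // 1: one digit anywhere is enough
$allDigits = preg_match('/^[[:digit:]]+$/', 'abc1');         // 0: anchors require digits only
$hex       = preg_match('/^[[:xdigit:]]+$/', 'DEADbeef01');  // 1: hexadecimal digits only
```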
6. “{}” brace usage
1) Square brackets match only one character; repetition is expressed with {}, which sets how many times the preceding item must occur. {n} means exactly n times; {m,n} means m to n times, inclusive; {n,} means n or more times.
Example 1: ^a{10}$ matches only aaaaaaaaaa (exactly ten a characters).
Example 2: [0-9]{1,}$ matches any string that ends with one or more digits.
2) Relationship between “{}” and the wildcard quantifiers
    ? is equivalent to {0,1}: zero or one time
    * is equivalent to {0,}: zero or more times
    + is equivalent to {1,}: one or more times
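A quick sketch of the brace counts and their shorthand equivalents, assuming PHP 7 or later with preg_match; the subject strings are made up for illustration.

```php
<?php
$ten    = preg_match('/^a{10}$/', str_repeat('a', 10));  // 1: exactly ten a
$nine   = preg_match('/^a{10}$/', str_repeat('a', 9));   // 0: one too few
$range  = preg_match('/^a{2,4}$/', 'aaa');               // 1: between two and four
$digits = preg_match('/[0-9]{1,}$/', 'order 42');        // 1: ends with digits

// "?" and "{0,1}" give the same answer on the same input.
$sameOptional = preg_match('/^ab{0,1}$/', 'a') === preg_match('/^ab?$/', 'a');
```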
7. “()” usage
A pattern enclosed in parentheses “()” is a sub pattern, as in $pattern = '([1-9]{1}[0-9]{3})-([0-1]{1}[1-2]{1})-([0-3]{1}[0-9]{1})'; Each () encloses one sub pattern, which makes the parts independent: they are matched and recorded separately without interfering with each other.
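To see sub patterns being recorded separately, here is a minimal sketch assuming PHP 7 or later. The date pattern below is a simplified illustration, not the pattern from the notes, and the date string is made up.

```php
<?php
// Illustrative pattern: three sub patterns for year, month and day.
$pattern = '/^([0-9]{4})-([0-9]{2})-([0-9]{2})$/';
$matched = preg_match($pattern, '2022-01-14', $m);

// $m[0] is the whole match; $m[1], $m[2], $m[3] are the sub patterns, left to right.
print_r($m);
```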
II POSIX-style regular expression functions
1. ereg and eregi
    ereg(pattern, string, [array $regs]);
    eregi(pattern, string, [array $regs]);
The ereg function searches string for text that satisfies pattern. It returns true if a match is found and false otherwise. If the third parameter $regs is given, the matched text is placed in $regs[0], and the texts matched by the parenthesized sub patterns are stored in order, from left to right: $regs[1] holds the match of the first sub pattern, $regs[2] the second, and so on. If no matching text is found, the $regs array is left unchanged.
Note: when a match is found, ereg() only fills the first 10 elements of the $regs array, whether the pattern has more or fewer than 9 sub patterns. This does not affect whether the pattern as a whole matches: ereg always performs the match first and returns false or true accordingly. If there are sub patterns, it then records their matching text one by one until the 10 elements of $regs are filled or all sub patterns have been recorded; if there are fewer sub patterns than that, the remaining elements of $regs are empty. In short, matching is one thing and $regs is another, and $regs holds only 10 values.
The eregi() function is the same as ereg(), except that it is case insensitive.
2. ereg_replace and eregi_replace
    ereg_replace(pattern, string replacement, string)
    eregi_replace(pattern, string replacement, string)
Text in string that satisfies pattern is replaced with replacement. If string contains text matching pattern, the string after replacement is returned; if not, the original string is returned unchanged.
If pattern contains sub patterns, the text they matched can be selectively kept instead of being replaced.
Example 1: to keep the text of the second sub pattern, write the replacement as replacement\\2. The text matching pattern is then replaced with replacement followed by the text that the second sub pattern matched. “\\0” stands for the entire matched text, so this feature can be used to insert text after a specific string.
replacement must be a string; if it is not, it is cast to a string during replacement.
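The ereg family was removed in PHP 7.0, so the sketch below swaps in preg_replace, which accepts the same \\n back-reference notation in the replacement. The patterns and strings are made up for illustration.

```php
<?php
// Keep the text of the second sub pattern: "foobar" becomes "X" + "bar".
$keepSecond = preg_replace('/(foo)(bar)/', 'X\\2', 'a foobar b');

// \\0 is the whole match, so this inserts "!" after every "foo".
$appended = preg_replace('/foo/', '\\0!', 'foo and foo');

// No match: the original string comes back unchanged.
$untouched = preg_replace('/zzz/', 'X', 'a foobar b');
```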
3. Usage of the split() and spliti() functions
    split(pattern, string, [int limit]);
    spliti(pattern, string, [int limit]);
split divides string into parts, using the text matched by the regular expression pattern as the separator. On success it returns an array of the separated parts; on failure it returns false. The optional limit sets the maximum number of parts. With a limit of 5, the string is divided into only 5 parts even if the separator occurs more often: the last part is whatever remains of the string after the first four parts, and the returned array has only 5 elements.
III Perl-style regular expressions and related functions
1. Perl regular expression syntax
Perl-style patterns are enclosed in delimiters; “/”, “!” and “{}” can all be used.
Example 1: /^[0-9]/, !^[0-9]! and {^[0-9]} are all the same.
Inside the pattern, the delimiter character itself is special and must be escaped. If “/” is the delimiter and the expression contains the character “/”, it must be written “\/”. A “/” inside a pattern delimited by “!”, or the reverse, needs no escaping.
Example 2: /\/\/$/ and !//$! are the same.
Example 3: !^\!\![0-9]$! and /^!![0-9]$/ are the same.
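The delimiter examples can be compared side by side. This is a minimal sketch assuming PHP 7 or later; the subject strings are made up for illustration.

```php
<?php
// With "/" as delimiter each slash in the pattern is escaped; with "!" it is not.
$slash = preg_match('/\/\/$/', 'http://');  // 1
$bang  = preg_match('!//$!', 'http://');    // 1

// With "!" as delimiter the "!" characters in the pattern are escaped instead.
$bangDelim  = preg_match('!^\!\![0-9]$!', '!!5');  // 1
$slashDelim = preg_match('/^!![0-9]$/', '!!5');    // 1

// Braces also work as a delimiter pair.
$brace = preg_match('{^[0-9]}', '7up');  // 1
```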
2. Perl special-meaning characters
\a    alarm character, ASCII value 7
\b    word boundary
\A    start of the target string (like “^”)
\B    non-word boundary
\cn   control character
\d    single digit
\D    single non-digit
\s    single whitespace character
\S    single non-whitespace character
\w    single word character (letter, digit or underscore)
\W    single non-word character (not a letter, digit or underscore)
\Z    end of the target string
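A few of these escapes in action, as a sketch assuming PHP 7 or later with preg_match; the subject strings are illustrative.

```php
<?php
$digit    = preg_match('/^\d$/', '7');               // 1
$nonDigit = preg_match('/^\D$/', '7');               // 0
$word     = preg_match('/^\w+$/', 'snake_case1');    // 1: letters, digits, underscore
$boundary = preg_match('/\bcat\b/', 'a cat sat');    // 1: "cat" as a whole word
$inside   = preg_match('/\bcat\b/', 'concatenate');  // 0: "cat" is inside a word
$start    = preg_match('/\Afoo/', "bar\nfoo");       // 0: \A is the start of the string only
$lineHead = preg_match('/^foo/m', "bar\nfoo");       // 1: with m, ^ also matches after a newline
```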
3. Advanced features
1) Or operation “|”:
For example, !^(ex|em)! matches a string beginning with ex or em, and can also be written !^e(x|m)!. The parentheses matter: | has the lowest precedence, so !^ex|em! would mean “begins with ex, or contains em anywhere”.
Note: the contents of () form a sub pattern.
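How alternation interacts with the ^ anchor can be checked with a short sketch, assuming PHP 7 or later; the words are made up for illustration.

```php
<?php
$grouped  = preg_match('!^(ex|em)!', 'empty');    // 1: begins with em
$factored = preg_match('!^e(x|m)!', 'example');   // 1: begins with ex
$strict   = preg_match('!^(ex|em)!', 'item');     // 0: does not begin with ex or em
$loose    = preg_match('!^ex|em!', 'item');       // 1: ungrouped, "em" may appear anywhere
```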
2) Pattern modifiers after the closing delimiter
        !regular expression!modifiers
A: the pattern matches only at the beginning of the target string.
D: the metacharacter $ matches only at the very end of the target string. This option is ignored if the m option is set.
U: disables greedy (longest) matching. Normally the search tries to find the longest matching text: in the string “caaaaaab”, the pattern /a+/ matches the whole run of a characters, but /a+/U matches just “a”.
S: studies the pattern to improve search speed.
i: ignores case.
m: treats a string containing newline characters as multiple lines rather than one. “^” and “$” then also match at the start and end of each line.
s: makes the period “.” match newline characters as well.
x: tells PHP to ignore unescaped whitespace in the pattern, so spaces can be used to make a regular expression more readable. To match a literal space in such a pattern, escape it.
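The modifiers are easiest to understand by comparing results with and without them. This is a minimal sketch assuming PHP 7 or later; the subject strings are made up, and a shorter run of a characters is used than in the notes.

```php
<?php
$caseless  = preg_match('/hello/i', 'HELLO');          // 1: i ignores case
$multiline = preg_match_all('/^\w+$/m', "one\ntwo");   // 2: m anchors at every line
$dotAll    = preg_match('/a.b/s', "a\nb");             // 1: s lets "." match the newline
$noDotAll  = preg_match('/a.b/', "a\nb");              // 0

// x ignores plain spaces in the pattern; "\ " is a literal space.
$extended = preg_match('/ \d+ \  \w+ /x', '42 apples');  // 1

preg_match('/a+/', 'caaab', $greedy);   // default: longest run
preg_match('/a+/U', 'caaab', $lazy);    // U: shortest possible match
```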
3) Extended pattern syntax
(?#comment) adds a comment to make the regular expression more readable.
(?=pattern) requires that the text at this point be followed by pattern.
(?!pattern) requires that the text at this point not be followed by pattern.
(?n) sets the mode option n inside the pattern instead of at the end.
(?:pattern) groups and consumes characters without capturing the matched text.
Example: echo ereg("?:^a$", "a"); // No output.
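These extensions belong to the Perl-style functions, so the sketch below uses preg_match rather than ereg. It assumes PHP 7 or later, and the patterns and subjects are made up for illustration.

```php
<?php
// Lookahead checks what follows without including it in the match.
$followed    = preg_match('/foo(?=bar)/', 'foobar', $ahead);  // 1
$notFollowed = preg_match('/foo(?!bar)/', 'foobar');          // 0

// (?i) turns on case-insensitive matching from inside the pattern.
$inline = preg_match('/(?i)hello/', 'HELLO');                 // 1

// (?:...) groups for the "+" but is not stored; only (c) is captured.
preg_match('/(?:ab)+(c)/', 'ababc', $groups);

// (?#...) is a comment and does not affect the match.
$comment = preg_match('/\d+(?#digits)/', '42');               // 1
```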
4. Perl-style regular expression functions
    1. preg_grep function
        preg_grep(pattern, array input);
Searches the input array for strings that match pattern. The return value is an array containing all the matching strings.
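A minimal preg_grep sketch, assuming PHP 7 or later; the input array is made up. The returned array keeps the keys of the input array.

```php
<?php
$input  = ['apple', '42', 'banana', '7'];
$digits = preg_grep('/^\d+$/', $input);  // keeps only the all-digit strings

print_r($digits);
```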
    2. preg_match function
        preg_match(pattern, string subject, [array matches])
This function searches the subject string for text matching pattern. It returns a non-zero value if a match is found and zero otherwise. If the optional matches argument is given, the full matched text is stored in its first element, $matches[0], and the texts matched by the parenthesized sub patterns follow in order: $matches[1] for the first, $matches[2] for the second, and so on.
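A small sketch of the return value and the matches array, assuming PHP 7 or later; the pattern is a deliberately simplified illustration and the subjects are made up.

```php
<?php
$pattern = '/(\w+)@(\w+)\.com/';  // simplified, illustrative pattern

$found   = preg_match($pattern, 'mail bob@example.com now', $matches);
$missing = preg_match($pattern, 'no address here', $none);

// $matches[0] is the full match, $matches[1] and $matches[2] the sub patterns.
print_r($matches);
```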
    3. preg_match_all function
        preg_match_all(pattern, subject, array matches, [int order])
This function searches the subject string for all non-overlapping pieces of text matching pattern. It returns the number of matches found, or 0 if there are none. The matched text is placed in the two-dimensional array matches: in the default PREG_PATTERN_ORDER, matches[0] holds all the full matches, and matches[1] to matches[n] hold the results of the embedded sub patterns.
The order parameter is optional; its possible values are PREG_PATTERN_ORDER and PREG_SET_ORDER.
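The two order values arrange the same results differently. This is a minimal sketch assuming PHP 7 or later; the pattern and subject are made up for illustration.

```php
<?php
$subject = 'a=1, b=2';

// PREG_PATTERN_ORDER: one row per sub pattern.
$count = preg_match_all('/(\w)=(\d)/', $subject, $byPattern, PREG_PATTERN_ORDER);

// PREG_SET_ORDER: one row per match.
preg_match_all('/(\w)=(\d)/', $subject, $bySet, PREG_SET_ORDER);

print_r($byPattern);
print_r($bySet);
```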
    4. preg_replace function
        preg_replace(pattern, replacement, subject, [int limit])
This function replaces the parts of subject that match pattern with replacement. The return value has the same type as subject: the replaced value if anything matched, otherwise the original value.
The parameters can be arrays or single values. There are several cases:
<1> if subject is an array, the function performs the replacement on each array element;
<2> if pattern is an array, the function applies each pattern in turn;
<3> if both pattern and replacement are arrays, each pattern is replaced with the corresponding element of replacement;
<4> if replacement has fewer elements than pattern, the patterns without a counterpart are replaced with an empty string.
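The array cases above, plus the limit parameter, in one short sketch. It assumes PHP 7 or later, and all strings are made up for illustration.

```php
<?php
// limit = 2: only the first two matches are replaced.
$limited = preg_replace('/\d+/', '#', 'a1b22c333', 2);

// Array subject: every element is processed and an array is returned.
$each = preg_replace('/\d+/', '#', ['x1', 'y22']);

// Array pattern and array replacement: paired element by element.
$paired = preg_replace(['/cat/', '/dog/'], ['feline', 'canine'], 'cat and dog');

// Fewer replacements than patterns: the missing one is an empty string.
$short = preg_replace(['/cat/', '/dog/'], ['feline'], 'cat and dog');
```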
    5. preg_split function
        preg_split(pattern, subject, [int limit], [int flags])
This function uses the text matched by pattern as the separator, divides the subject string into parts, and returns an array of those parts. limit restricts the number of strings returned; -1 means no limit. flags is also optional and has two values: PREG_SPLIT_NO_EMPTY makes the function skip empty strings, and PREG_SPLIT_DELIM_CAPTURE makes it also return the text matched by sub patterns embedded in the separator pattern.
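A minimal sketch of limit and the two flags, assuming PHP 7 or later; the input strings are made up for illustration.

```php
<?php
// limit = 5: four split parts, then the rest of the string as the fifth.
$limited = preg_split('/,/', 'a,b,c,d,e,f,g', 5);

// Without the flag the empty string between the two commas would be returned.
$noEmpty = preg_split('/,/', 'a,,b', -1, PREG_SPLIT_NO_EMPTY);

// The parenthesized separator is returned along with the parts.
$withSep = preg_split('/(-)/', 'a-b', -1, PREG_SPLIT_DELIM_CAPTURE);
```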