PHP regular expression rules and common methods

Time: 2020-12-11

Note: This article is reposted from the Pick up the star blog.

Regular expressions in PHP

The patterns below are listed without delimiters. To use one in PHP, wrap it in delimiters such as /.../ (escaping any / inside the pattern as \/) and pass it to a function such as preg_match().

"^\d+$"  // non-negative integer (positive integer + 0)
"^[0-9]*[1-9][0-9]*$"  // positive integer
"^((-\d+)|(0+))$"  // non-positive integer (negative integer + 0)
"^-[0-9]*[1-9][0-9]*$"  // negative integer
"^-?\d+$"  // integer
"^\d+(\.\d+)?$"  // non-negative floating-point number (positive floating-point number + 0)
"^(([0-9]+\.[0-9]*[1-9][0-9]*)|([0-9]*[1-9][0-9]*\.[0-9]+)|([0-9]*[1-9][0-9]*))$"  // positive floating-point number
"^((-\d+(\.\d+)?)|(0+(\.0+)?))$"  // non-positive floating-point number (negative floating-point number + 0)
"^(-(([0-9]+\.[0-9]*[1-9][0-9]*)|([0-9]*[1-9][0-9]*\.[0-9]+)|([0-9]*[1-9][0-9]*)))$"  // negative floating-point number
"^(-?\d+)(\.\d+)?$"  // floating-point number
"^[A-Za-z]+$"  // string of English letters
"^[A-Z]+$"  // string of uppercase English letters
"^[a-z]+$"  // string of lowercase English letters
"^[A-Za-z0-9]+$"  // string of digits and English letters
"^\w+$"  // string of digits, English letters or underscores
"^[\w-]+(\.[\w-]+)*@[\w-]+(\.[\w-]+)+$"  // email address
"^[a-zA-Z]+://(\w+(-\w+)*)(\.(\w+(-\w+)*))*(\?\S*)?$"  // URL
"^(\d{2}|\d{4})-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])$"  // year-month-day
"^(0[1-9]|1[0-2])/(0[1-9]|[12][0-9]|3[01])/(\d{2}|\d{4})$"  // month/day/year
"^([\w.-]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$"  // email
"^((\+?[0-9]{2,4}-[0-9]{3,4}-)|([0-9]{3,4}-))?([0-9]{7,8})(-[0-9]+)?$"  // phone number
"^((\d{1,2}|1\d\d|2[0-4]\d|25[0-5])\.){3}(\d{1,2}|1\d\d|2[0-4]\d|25[0-5])$"  // IP address
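As a sketch of how such patterns are used in PHP, the snippet below wraps three of them in / delimiters and tests them with preg_match(). The helper matchesPattern() and the array keys are illustrative names, not part of PHP.

```php
<?php
// A few patterns from the list, wrapped in / delimiters for PHP.
// The array keys are illustrative names only.
$patterns = [
    'integer' => '/^-?\d+$/',
    'float'   => '/^(-?\d+)(\.\d+)?$/',
    'ipv4'    => '/^((\d{1,2}|1\d\d|2[0-4]\d|25[0-5])\.){3}(\d{1,2}|1\d\d|2[0-4]\d|25[0-5])$/',
];

// Hypothetical helper: true when $value matches the named pattern.
// preg_match() returns 1 on a match, 0 on no match and false on error.
function matchesPattern(array $patterns, string $name, string $value): bool
{
    return preg_match($patterns[$name], $value) === 1;
}

var_dump(matchesPattern($patterns, 'integer', '-42'));        // bool(true)
var_dump(matchesPattern($patterns, 'float', '3.14'));         // bool(true)
var_dump(matchesPattern($patterns, 'float', '3.'));           // bool(false)
var_dump(matchesPattern($patterns, 'ipv4', '192.168.1.255')); // bool(true)
var_dump(matchesPattern($patterns, 'ipv4', '256.1.1.1'));     // bool(false)
```

One PHP-specific detail: by default $ also matches just before a trailing newline, so '/^\d+$/' accepts "42\n". Use \z instead of $ (or add the D modifier) when such input must be rejected.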

Regular expression matching Chinese characters: [\u4e00-\u9fa5] (JavaScript syntax; in PHP write [\x{4e00}-\x{9fa5}] and add the u modifier)

Matching double-byte characters (including Chinese characters): [^\x00-\xff] (in PHP, use it with the u modifier)

Regular expression matching blank lines: \n\s*\r
Regular expression matching HTML tags (simple, non-nested markup only): <(\S*?)[^>]*>.*?</\1>|<.*? />
Regular expression matching leading and trailing whitespace: (^\s*)|(\s*$)
Regular expression matching an email address: \w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*
Regular expression matching a web URL: ^[a-zA-Z]+://(\w+(-\w+)*)(\.(\w+(-\w+)*))*(\?\S*)?$
Checking whether an account name is valid (starts with a letter, 5-16 characters, letters, digits and underscores allowed): ^[a-zA-Z][a-zA-Z0-9_]{4,15}$
Matching a domestic (Chinese) phone number: (\d{3}-|\d{4}-)?(\d{8}|\d{7})
Matching a Tencent QQ number: ^[1-9][0-9]{4,}$
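A minimal PHP sketch of the Chinese-character, double-byte, account and QQ patterns follows. It assumes UTF-8 source and input. PCRE does not understand the JavaScript-style \u escape, so the Chinese range is written as \x{4e00}-\x{9fa5} with the u modifier. The sample strings are made up.

```php
<?php
// Chinese range in PCRE syntax; the u modifier makes PCRE read UTF-8 characters.
// Assumes this file and its input strings are UTF-8 encoded.
$chinese    = '/^[\x{4e00}-\x{9fa5}]+$/u';
$doubleByte = '/[^\x00-\xff]/u';                // any character above U+00FF
$account    = '/^[a-zA-Z][a-zA-Z0-9_]{4,15}$/'; // letter first, 5-16 characters
$qq         = '/^[1-9][0-9]{4,}$/';             // no leading zero, 5+ digits

var_dump(preg_match($chinese, '中文'));      // int(1)
var_dump(preg_match($chinese, '中文abc'));   // int(0)
var_dump(preg_match($doubleByte, 'abc中'));  // int(1)
var_dump(preg_match($account, 'user_01'));   // int(1)
var_dump(preg_match($account, '1user'));     // int(0)
var_dump(preg_match($qq, '10000'));          // int(1)
var_dump(preg_match($qq, '01234'));          // int(0)
```

Without the u modifier, '/[^\x00-\xff]/' never matches. PCRE then sees UTF-8 text as individual bytes, and every byte falls in \x00-\xff.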

Metacharacters and their behavior in regular expressions:
\ Marks the next character as a special character, a literal character, a backreference, or an octal escape. For example, n matches the character "n", \n matches a newline, and \\ matches "\".
^ Matches the start of the input string. In multiline mode (the m modifier in PHP), ^ also matches the position right after a newline.
$ Matches the end of the input string. In PHP it also matches just before a newline at the very end of the string, unless the D modifier is set. In multiline mode (m), it also matches right before any newline.
* Matches the preceding subexpression zero or more times. * is equivalent to {0,}.
+ Matches the preceding subexpression one or more times. + is equivalent to {1,}.
? Matches the preceding subexpression zero or one time. ? is equivalent to {0,1}.
{n} n is a non-negative integer. Matches exactly n times.
{n,} n is a non-negative integer. Matches at least n times.
{n,m} m and n are non-negative integers with n <= m. Matches at least n and at most m times. There must be no spaces around the comma.
? When this character immediately follows another quantifier (*, +, ?, {n}, {n,}, {n,m}), the match becomes non-greedy and matches as little of the string as possible. The default greedy mode matches as much as possible. For example, in "oooo", o+? matches a single "o", while o+ matches all four.
. Matches any single character except "\n". To match any character including "\n", use a pattern such as [\s\S], or add the s (dotall) modifier in PHP.
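(pattern) Matches pattern and captures the match. In PHP, the captured text is available in the $matches array filled by preg_match() and related functions.
(?:pattern) Matches pattern without capturing it. This is a non-capturing group and is not stored for later use. It is useful for combining alternatives with |, e.g. industr(?:y|ies).
(?=pattern) Positive lookahead: matches at any position where a string matching pattern begins. It is non-capturing and consumes no characters. For example, Windows(?=95|98|NT|2000) matches "Windows" in "Windows2000" but not in "Windows3.1".
(?!pattern) Negative lookahead, the opposite of (?=pattern): matches at any position where a string matching pattern does not begin. For example, Windows(?!95|98|NT|2000) matches "Windows" in "Windows3.1" but not in "Windows2000".
x|y Matches either x or y.
[xyz] Character set: matches any one of the enclosed characters.
[^xyz] Negated character set: matches any character that is not enclosed.
[a-z] Character range: matches any character in the specified range.
[^a-z] Negated character range: matches any character not in the specified range.
\b Matches a word boundary, i.e. the position between a word character and a non-word character (or the start or end of the string).
\B Matches a non-word boundary.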
\cx Matches the control character indicated by x. For example, \cM matches Control-M, i.e. a carriage return.
\d Matches a digit character. Equivalent to [0-9].
\D Matches a non-digit character. Equivalent to [^0-9].
\f Matches a form feed. Equivalent to \x0c and \cL.
\n Matches a newline. Equivalent to \x0a and \cJ.
\r Matches a carriage return. Equivalent to \x0d and \cM.
\s Matches any whitespace character, including space, tab and form feed. Equivalent to [ \f\n\r\t\x0b].
\S Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\x0b].
\t Matches a tab. Equivalent to \x09 and \cI.
\v Matches a vertical tab, equivalent to \x0b and \cK. In PHP's PCRE, however, \v matches any vertical whitespace (including \n and \r); use \x0b for the vertical tab alone.
\w Matches any word character, including underscore. Equivalent to [A-Za-z0-9_].
\W Matches any non-word character. Equivalent to [^A-Za-z0-9_].
\xn Matches n, where n is a two-digit hexadecimal escape value. For example, \x41 matches "A". PHP also accepts \x{hhhh}; code points above 0xFF require the u modifier.
\num Matches num, where num is a positive integer: a backreference to a captured match. For example, (.)\1 matches two consecutive identical characters.
\n (n a digit) Identifies either an octal escape value or a backreference. If \n is preceded by at least n captured subexpressions, n is a backreference. Otherwise, if n is an octal digit (0-7), n is an octal escape value.
\nm Identifies either an octal escape value or a backreference. If \nm is preceded by at least nm captured subexpressions, nm is a backreference. If \nm is preceded by at least n captures, n is a backreference followed by the literal m. If neither condition holds and n and m are both octal digits (0-7), \nm matches the octal escape value nm.
\nml If n is an octal digit (0-3) and m and l are octal digits (0-7), matches the octal escape value nml.
\un Matches n, where n is a Unicode character given as four hexadecimal digits. For example, \u00A9 matches the copyright symbol (©). This is JavaScript syntax. PHP's PCRE does not support \u, so write \x{00A9} with the u modifier instead.
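The table follows the JavaScript documentation's wording. The following short sketch shows how several of these metacharacters behave in PHP's PCRE engine. The sample strings are made up for illustration.

```php
<?php
// Greedy vs non-greedy quantifiers
$html = '<b>one</b><b>two</b>';
preg_match('/<b>.*<\/b>/', $html, $greedy);
preg_match('/<b>.*?<\/b>/', $html, $lazy);
echo $greedy[0], "\n"; // <b>one</b><b>two</b>
echo $lazy[0], "\n";   // <b>one</b>

// . skips "\n" unless the s (dotall) modifier is set
var_dump(preg_match('/a.b/', "a\nb"));  // int(0)
var_dump(preg_match('/a.b/s', "a\nb")); // int(1)

// Lookaheads check what follows without consuming it
var_dump(preg_match('/Windows(?=95|98|NT|2000)/', 'Windows2000')); // int(1)
var_dump(preg_match('/Windows(?!95|98|NT|2000)/', 'Windows2000')); // int(0)

// (?:...) groups without capturing, so $group holds only the full match
preg_match('/industr(?:y|ies)/', 'industries', $group);
var_dump(count($group)); // int(1)

// \1 is a backreference to the first captured group
preg_match('/(\w)\1/', 'hello', $pair);
echo $pair[0], "\n"; // ll

// \b: "cat" as a whole word only (not the "cat" inside "concat")
var_dump(preg_match_all('/\bcat\b/', 'cat concat cat.')); // int(2)

// PCRE uses \x{...} (with u) instead of JavaScript's \u escape,
// and \v means any vertical whitespace, not only the vertical tab
var_dump(preg_match('/\x{00A9}/u', '©')); // int(1)
var_dump(preg_match('/\v/', "\n"));       // int(1)
```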

Regular expression matching a URL: http://([\w-]+\.)+[\w-]+(/[\w ./?%&=-]*)?

Using regular expressions to restrict the input of a text box in a web form (client-side JavaScript; the clipboardData object used in onbeforepaste is Internet Explorer-specific):

Allow Chinese input only:

onkeyup="value=value.replace(/[^\u4E00-\u9FA5]/g,'')"

onbeforepaste="clipboardData.setData('text',clipboardData.getData('text').replace(/[^\u4E00-\u9FA5]/g,''))"

Allow full-width characters only:

onkeyup="value=value.replace(/[^\uFF00-\uFFFF]/g,'')"

onbeforepaste="clipboardData.setData('text',clipboardData.getData('text').replace(/[^\uFF00-\uFFFF]/g,''))"

Allow digits only:

onkeyup="value=value.replace(/[^\d]/g,'')"

onbeforepaste="clipboardData.setData('text',clipboardData.getData('text').replace(/[^\d]/g,''))"

Allow digits and English letters only:

onkeyup="value=value.replace(/[^A-Za-z0-9]/g,'')"

onbeforepaste="clipboardData.setData('text',clipboardData.getData('text').replace(/[^A-Za-z0-9]/g,''))"

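The browser-side filters run only on the client and are easy to bypass, so input is usually re-checked on the server as well. The following is a minimal PHP sketch of equivalent filters built on preg_replace(). It assumes UTF-8 input; the three function names are hypothetical.

```php
<?php
// Hypothetical helper: remove everything except Chinese characters (U+4E00-U+9FA5)
function keepChineseOnly(string $s): string
{
    return preg_replace('/[^\x{4e00}-\x{9fa5}]/u', '', $s);
}

// Hypothetical helper: remove every non-digit character
function keepDigitsOnly(string $s): string
{
    return preg_replace('/\D/', '', $s);
}

// Hypothetical helper: keep only ASCII letters and digits
function keepLettersAndDigits(string $s): string
{
    return preg_replace('/[^A-Za-z0-9]/', '', $s);
}

echo keepChineseOnly('abc中文123'), "\n";        // 中文
echo keepDigitsOnly('tel: 010-1234'), "\n";       // 0101234
echo keepLettersAndDigits('user_name-01!'), "\n"; // username01
```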
Regular expressions in common use
Regular expression matching an IP address: /(\d+)\.(\d+)\.(\d+)\.(\d+)/g (g is a JavaScript flag; in PHP omit it and use preg_match_all() to find every match)

SQL statement: ^(select|drop|delete|create|update|insert).*$
1. Non-negative integer: ^\d+$
2. Positive integer: ^[0-9]*[1-9][0-9]*$
3. Non-positive integer: ^((-\d+)|(0+))$
4. Negative integer: ^-[0-9]*[1-9][0-9]*$
5. Integer: ^-?\d+$
6. Non-negative floating-point number: ^\d+(\.\d+)?$
7. Positive floating-point number: ^(([0-9]+\.[0-9]*[1-9][0-9]*)|([0-9]*[1-9][0-9]*\.[0-9]+)|([0-9]*[1-9][0-9]*))$
8. Non-positive floating-point number: ^((-\d+(\.\d+)?)|(0+(\.0+)?))$
9. Negative floating-point number: ^(-(([0-9]+\.[0-9]*[1-9][0-9]*)|([0-9]*[1-9][0-9]*\.[0-9]+)|([0-9]*[1-9][0-9]*)))$
10. English letters: ^[A-Za-z]+$
11. Uppercase English letters: ^[A-Z]+$
12. Lowercase English letters: ^[a-z]+$
13. English letters and digits: ^[A-Za-z0-9]+$
14. English letters, digits and underscores: ^\w+$
15. Email address: ^[\w-]+(\.[\w-]+)*@[\w-]+(\.[\w-]+)+$
16. URL: ^[a-zA-Z]+://(\w+(-\w+)*)(\.(\w+(-\w+)*))*(\?\S*)?$
Or:

^http:\/\/[A-Za-z0-9]+\.[A-Za-z0-9]+[\/= \?%\-&_~`@[\]\':+!]*([^<>\"\"])*$

17. Postal code: ^[1-9]\d{5}$
18. Chinese: ^[\x{0391}-\x{FFE5}]+$ (with the u modifier in PHP; this wide range also covers Greek, Cyrillic and full-width characters)
19. Telephone number: ^((\(\d{2,3}\))|(\d{3}\-))?(\(0\d{2,3}\)|0\d{2,3}-)?[1-9]\d{6,7}(\-\d{1,4})?$
20. Mobile phone number (numbers starting with 13 only): ^((\(\d{2,3}\))|(\d{3}\-))?13\d{9}$
21. Double-byte characters (including Chinese characters): [^\x00-\xff]
22. Leading and trailing whitespace: (^\s*)|(\s*$) (like VBScript's Trim function)
23. HTML tags: <(\S*?)[^>]*>.*?</\1>|<.*? />
24. Blank line: \n\s*\r
25. Extracting links (href attributes) from text: (h|H)(r|R)(e|E)(f|F) *= *('|")?(\w|\\|\/|\.)+('|"| *|>)?
26. Extracting email addresses from text: \w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*
27. Extracting image links (src attributes) from text: (s|S)(r|R)(c|C) *= *('|")?(\w|\\|\/|\.)+('|"| *|>)?
28. Extracting IP addresses from text: (\d+)\.(\d+)\.(\d+)\.(\d+)
29. Extracting Chinese mobile phone numbers from text: (86)*0*13\d{9}
30. Extracting Chinese fixed-line telephone numbers from text: (\(\d{3,4}\)|\d{3,4}-|\s)?\d{8}
31. Extracting Chinese phone numbers (mobile and fixed-line) from text: (\(\d{3,4}\)|\d{3,4}-|\s)?\d{7,14}
32. Extracting Chinese postal codes from text: [1-9]\d{5}
33. Extracting floating-point numbers (decimals) from text: (-?\d*)\.?\d+
34. Extracting any number from text: (-?\d*)(\.\d+)?
35. IP: (\d+)\.(\d+)\.(\d+)\.(\d+)
36. Area code: /^0\d{2,3}$/
37. Tencent QQ number: ^[1-9][0-9]{4,}$
38. Account name (starts with a letter, 5-16 characters, letters, digits and underscores allowed): ^[a-zA-Z][a-zA-Z0-9_]{4,15}$
39. Chinese characters, English letters, digits and underscores: ^[\x{4e00}-\x{9fa5}_a-zA-Z0-9]+$ (with the u modifier in PHP)
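The following is a sketch of items 26 and 28 in PHP. preg_match_all() collects every match in the text rather than only the first. The sample text is made up.

```php
<?php
$text = 'Contact admin@example.com or ops.team@example.org; server at 10.0.0.12.';

// Item 26 (email) and item 28 (IP address)
preg_match_all('/\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*/', $text, $emails);
preg_match_all('/(\d+)\.(\d+)\.(\d+)\.(\d+)/', $text, $ips);

// Index 0 holds the full matches; higher indexes hold the captured groups
print_r($emails[0]); // array with admin@example.com and ops.team@example.org
print_r($ips[0]);    // array with 10.0.0.12
print_r($ips[1]);    // array with 10 (the first octet)
```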

40. Chinese characters, English letters, digits, underscores and hyphens: different extraction methods for UTF-8 and GB2312 (example follows):

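function getChinaEnglishNumStrlen($str, $charset = 'utf8')
{
    if ($charset == 'gb2312') {
        // GB2312: a Chinese character is two bytes in the range 0xA1-0xFF,
        // so match raw bytes without the u modifier
        $pattern = "/[" . chr(0xa1) . "-" . chr(0xff) . "A-Za-z0-9_\-]+/";
    } elseif ($charset == 'utf8') {
        // UTF-8: match the CJK range U+4E00-U+9FA5 (requires the u modifier)
        $pattern = '/[\x{4e00}-\x{9fa5}A-Za-z0-9_\-]+/u';
    } else {
        return false; // unsupported charset
    }

    // Collect every run of allowed characters; return false if there is none
    if (!preg_match_all($pattern, $str, $match)) {
        return false;
    }
    return implode('', $match[0]);
}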

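The function above returns the extracted Chinese characters, letters, digits, underscores and hyphens joined into a single string. It returns false if nothing matches or the charset is not supported.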

41. Filter out special characters, keeping only Chinese characters, English letters, digits, underscores and hyphens
Note: the method below keeps only Chinese characters, English letters, digits, underscores and hyphens, and filters out every other symbol. It expects GBK input. If the string is already UTF-8, no transcoding is needed and the two mb_convert_encoding() calls can be commented out.

/**
 * Filter special characters (keep only Chinese characters, English letters, digits, underscores and hyphens)
 * @desc Mainly used to strip the special symbols that spam advertisements insert to dodge sensitive-word filters
 * @param string $str The string to process (GBK-encoded)
 * @return string
 */
function filter_special_characters($str)
{
    if (empty($str)) return "";

    // Convert GBK to UTF-8
    $str = mb_convert_encoding($str, "utf-8", "gbk");

    // Filtered string
    $new_str = "";

    // Collect every run of allowed characters and join them
    if (preg_match_all("/[\x{4e00}-\x{9fa5}A-Za-z0-9_\-]+/u", $str, $match)) {
        $new_str = implode('', $match[0]);

        // Convert back to GBK for output
        $new_str = mb_convert_encoding($new_str, "gbk", "utf-8");
    }

    return $new_str;
}

// Call the method to filter the special characters out of a spam advertisement
$str = "a dream in the world of mortals + Q [1 ⒐ 6.2.4] [reputation first]";
$new_str = filter_special_characters($str);
print_r($new_str);

// With UTF-8 input (the two mb_convert_encoding() calls commented out) this prints:
// adreamintheworldofmortalsQ1624reputationfirst
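The method above collects the allowed runs and concatenates them. The sketch below is an alternative, not the original method: a single preg_replace() with the negated character class should give the same result for valid UTF-8 input. The function name keepAllowedCharacters() is hypothetical, and GBK text would need converting to UTF-8 first.

```php
<?php
// Hypothetical helper: delete every character outside the allowed set.
// [^...] negates the set: Chinese (U+4E00-U+9FA5), letters, digits, _ and -
function keepAllowedCharacters(string $str): string
{
    return preg_replace('/[^\x{4e00}-\x{9fa5}A-Za-z0-9_\-]+/u', '', $str);
}

echo keepAllowedCharacters('abc+中文【1.2】_x-y!'), "\n"; // abc中文12_x-y
```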

42. Using preg_match() with regular expressions
preg_match() stops searching after the first successful match. To find all matches, use preg_match_all() instead. preg_match() returns 1 if the pattern matches, 0 if it does not, and false on error.

preg_match($pattern, $subject, $matches)
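The following small sketch illustrates the difference with made-up input. It also shows two optional parameters of preg_match(): the flags argument (here PREG_OFFSET_CAPTURE) and the offset at which the search starts.

```php
<?php
$subject = 'a1 b22 c333';

// preg_match() stops after the first match
$count = preg_match('/\d+/', $subject, $first);
echo $count, ' ', $first[0], "\n"; // 1 1

// preg_match_all() returns the number of matches and collects them all
$count = preg_match_all('/\d+/', $subject, $all);
echo $count, ' ', implode(',', $all[0]), "\n"; // 3 1,22,333

// PREG_OFFSET_CAPTURE returns [text, byte offset] pairs;
// the last argument (3) tells preg_match() where to start searching
preg_match('/\d+/', $subject, $fromThree, PREG_OFFSET_CAPTURE, 3);
var_dump($fromThree[0] === ['22', 4]); // bool(true)
```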

Example 1: case-insensitive search

<?php
// The i modifier after the closing delimiter makes the search case-insensitive
if (preg_match("/hi/i", "Welcome to hi-docs.com.")) {
    echo "A match was found.";
} else {
    echo "A match was not found.";
}
?>

Output:
A match was found.

Example 2: matching hyperlinks in a string

<?php
$urls = '<h3><a target="_blank" href="/php/preg_match.html"><span class="hl">preg</span>_match()</a></h3><p>[<a href="/php.html">PHP</a>] Perform a regular expression match<br /><em>Applicable version: 5</em></p><dd><dd><h3><a target="_blank" href="/php/preg_match_all.html"><span class="hl">preg</span>_match_all()</a></h3>';
// Capture the href value of the first <a> tag; the lazy quantifiers keep the match short
if (preg_match("/<a[^>]*?href=\"([^>]+?)\"[^>]*?>.+?<\/a>/i", $urls, $match)) {
    print_r($match);
} else {
    echo "does not match.";
}
?>

Output:
Array
(
    [0] => <a target="_blank" href="/php/preg_match.html"><span class="hl">preg</span>_match()</a>
    [1] => /php/preg_match.html
)
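Because Example 2 uses preg_match(), only the first link is returned. The following sketch applies the same idea with preg_match_all() so that every link is collected. The HTML is a short made-up sample, and regexes like this are only suitable for simple, well-formed markup.

```php
<?php
$html = '<a href="/php/preg_match.html">preg_match()</a> and '
      . '<a target="_blank" href="/php/preg_match_all.html">preg_match_all()</a>';

// Group 1: the href value; group 2: the link text
$pattern = '/<a[^>]*?href="([^"]+)"[^>]*>(.+?)<\/a>/i';

// PREG_SET_ORDER groups the results per match instead of per capture group
preg_match_all($pattern, $html, $links, PREG_SET_ORDER);

foreach ($links as $link) {
    echo $link[1], ' => ', $link[2], "\n";
}
// /php/preg_match.html => preg_match()
// /php/preg_match_all.html => preg_match_all()
```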

Example 3: using a regular expression to match Chinese

// "正则匹配中文" means "regular expression matching Chinese"
$str = 'preg_match正则匹配中文123';
// Match Chinese characters (UTF-8 encoding, u modifier)
if (preg_match('/[\x{4e00}-\x{9fa5}]+/u', $str)) {
    echo 'match';
} else {
    echo 'no match';
}
// For GB2312-encoded strings, match the raw double-byte range instead (no u modifier)
preg_match("/^[".chr(0xa1)."-".chr(0xff)."A-Za-z0-9_]+$/", $str);
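The u modifier is what makes a UTF-8 pattern work on characters rather than bytes. The following sketch shows the difference, assuming the file is saved as UTF-8 (where each of these Chinese characters takes three bytes).

```php
<?php
$word = '正则';

var_dump(strlen($word));                  // int(6): strlen() counts bytes
var_dump(preg_match('/^.{2}$/', $word));  // int(0): without u, . matches one byte
var_dump(preg_match('/^.{2}$/u', $word)); // int(1): with u, . matches one character

// preg_match_all() returns the number of matches, here the Chinese characters
var_dump(preg_match_all('/[\x{4e00}-\x{9fa5}]/u', 'PHP 正则 123')); // int(2)
```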

Extracting related data by item number (SKU):

// Color category: A722287962 brown; Size: XXL; Couple style: men's
$sku = '颜色分类：A722287962 棕色；尺码：XXL；情侣款式：男款';

// A Chinese label and a separator (: ： ; ； or whitespace), then the item number,
// the color text, and the value of the next label
$pattern = '/[\x{4e00}-\x{9fa5}]+[:：;；\s]([A-Za-z0-9_-]+)\s*(.*)?[:：;；\s]+[\x{4e00}-\x{9fa5}]+[:：;；\s]([A-Za-z0-9]+)/u';
if (preg_match($pattern, $sku, $matches)) {
    print_r($matches);
}

Printed result ([2] is 棕色, "brown"):

Array
(
    [0] => 颜色分类：A722287962 棕色；尺码：XXL
    [1] => A722287962
    [2] => 棕色
    [3] => XXL
)
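The following sketch applies the same idea with named groups ((?<name>...)), so that results can be read by key instead of by index. The input string and the group names (code, color, size) are illustrative assumptions, and the separators are narrowed to the ones this sample uses.

```php
<?php
$sku = '颜色分类：A722287962 棕色；尺码：XXL'; // Color category / brown / Size

$pattern = '/[\x{4e00}-\x{9fa5}]+[:：](?<code>[A-Za-z0-9_-]+)\s*(?<color>[^;；]*)[;；]'
         . '[\x{4e00}-\x{9fa5}]+[:：](?<size>[A-Za-z0-9]+)/u';

if (preg_match($pattern, $sku, $m)) {
    // Named groups appear in $m under both their name and their number
    echo $m['code'], ' ', $m['color'], ' ', $m['size'], "\n"; // A722287962 棕色 XXL
}
```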

preg_match usage explanation