PHP regular expressions in practice (1): verifying mobile phone numbers

Time:2021-1-27

In this article, we gradually refine a regular expression for verifying mobile phone numbers, and along the way introduce character groups, quantifiers, string start/end positions, grouping, alternation inside groups, back references, named groups, and so on.

1 Basic verification

The first step is to verify that the string consists of 11 digits.

Expression

  • [0123456789]{11}

  • or [0-9]{11}

  • or \d{11}

Knowledge points

Character group: Square brackets [...] in a regular expression denote a character group. A character group lists the characters that may appear at a single position.
For example, [0123456789] matches any one of the digits 0 through 9; [0123abc] matches any one of the digits 0, 1, 2, 3 or the letters a, b, c.

Range representation of character group: A hyphen inside a character group ([..-..]) denotes a range of characters.
For example, [a-z] matches any one lowercase English letter; [a-zA-Z] matches any one lowercase or uppercase letter; [0-9] matches any one of the digits 0 through 9.
Note that by default a range covers every character whose ASCII code lies between that of the starting character and that of the ending character, inclusive.
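
The character groups above can be tried out with a short script. This is a minimal sketch: the variable names and the one-character test inputs are chosen here for illustration and do not come from the article.

```php
<?php
// preg_match() returns 1 when the pattern matches somewhere in the subject, 0 otherwise.
$inGroup      = preg_match('/[0123abc]/', 'b'); // 'b' is listed in the group
$notInGroup   = preg_match('/[0123abc]/', 'z'); // 'z' is not listed
$upperInLower = preg_match('/[a-z]/', 'Q');     // 'Q' lies outside the range a-z
$upperInBoth  = preg_match('/[a-zA-Z]/', 'Q');  // the second range covers 'Q'
$digitInRange = preg_match('/[0-9]/', '7');     // same set as [0123456789]

var_dump($inGroup, $notInGroup, $upperInLower, $upperInBoth, $digitInRange);
```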

Abbreviation of character group: For some common character groups, regular expressions provide shorthand symbols.

  • \d: all digits, i.e. [0-9]

  • \D: all non-digit characters, the complement of \d

  • \w: all word characters (letters, digits, underscores), i.e. [0-9a-zA-Z_]

  • \W: all non-word characters, the complement of \w

  • \s: all white space characters, including spaces, tabs, carriage returns, line breaks and other white space characters

  • \S: all non-white-space characters, the complement of \s
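
To see how each shorthand pairs with its complement, the sketch below counts matches in a sample string. The sample string is a hypothetical input made up for this illustration, and the script assumes the default matching mode without the u modifier.

```php
<?php
// Hypothetical sample: 10 single-byte characters, including a space and a tab.
$sample = "id_42 ok\t!";

// With no third argument, preg_match_all() just returns the number of matches.
$digits     = preg_match_all('/\d/', $sample); // '4', '2'
$nonDigits  = preg_match_all('/\D/', $sample); // everything else
$wordChars  = preg_match_all('/\w/', $sample); // i d _ 4 2 o k
$nonWord    = preg_match_all('/\W/', $sample); // space, tab, '!'
$whitespace = preg_match_all('/\s/', $sample); // space, tab
$nonSpace   = preg_match_all('/\S/', $sample);

// Each shorthand and its complement together cover every character exactly once.
printf("%d + %d = %d\n", $digits, $nonDigits, strlen($sample));
printf("%d + %d = %d\n", $wordChars, $nonWord, strlen($sample));
printf("%d + %d = %d\n", $whitespace, $nonSpace, strlen($sample));
```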

Quantifier: A quantifier specifies how many times the object it modifies (such as a character or a character group) may appear.
The general form of a quantifier is {m,n} (note the comma), which means the modified character (or character group) appears at least m times and at most n times. In particular:

  • {m} means the modified object must appear exactly m times;

  • {0,n} means the modified object can appear at most n times and at least 0 times;

  • {m,} means the modified object appears at least m times.
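
The quantifier forms above can be observed by capturing what each one actually consumes. This is only a sketch: the input strings are illustrative, and it relies on the fact that quantifiers are greedy by default, taking as many repetitions as their upper bound allows.

```php
<?php
// The third argument receives the matched text in element 0.
preg_match('/\d{3}/', '1234567', $exact);    // {m}: exactly 3 digits
preg_match('/\d{2,4}/', '1234567', $range);  // {m,n}: stops at the upper bound, 4
preg_match('/\d{0,2}/', 'abc', $optional);   // {0,n}: zero digits is still a match
preg_match('/\d{5,}/', '1234567', $atLeast); // {m,}: no upper bound

// Fewer than m occurrences means no match at all.
$tooShort = preg_match('/\d{5,}/', '1234');

var_dump($exact[0], $range[0], $optional[0], $atLeast[0], $tooShort);
```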

2 Is the length really limited to 11?

If you test the expression above, you will see that when the input string is a number with a length of 15, it still matches the first 11 digits. Even the input abcd180123412341234 matches, because it contains 11 digits.

This is because the regular expression above means “match 11 digits”: as long as the input string contains 11 consecutive digits, it matches. To verify that the input string is exactly a phone number, use the start-of-string position ^ and the end-of-string position $ in the regular expression.

Expression

  • ^\d{11}$

Knowledge points

Some symbols in regular expressions match a position rather than text. These symbols are called anchors. ^ and $ are two of them.

^ matches the position at the beginning of the string
$ matches the position at the end of the string
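
The difference the anchors make can be checked with the inputs discussed in this section. This is a minimal sketch with variable names chosen for illustration. One caveat it also demonstrates: by default, PCRE lets $ match just before a trailing newline as well, so \z (or the D modifier) is the stricter choice when the input must end exactly there.

```php
<?php
$unanchored = '/\d{11}/';
$anchored   = '/^\d{11}$/';

$inputs = ['18012341234', '180123412341234', 'abcd180123412341234'];
foreach ($inputs as $input) {
    // Only the first input is accepted once both anchors are present.
    printf("%-20s unanchored=%d anchored=%d\n",
        $input, preg_match($unanchored, $input), preg_match($anchored, $input));
}

// $ tolerates one trailing newline; \z does not.
$dollarWithNewline = preg_match('/^\d{11}$/', "18012341234\n");
$strictWithNewline = preg_match('/^\d{11}\z/', "18012341234\n");
var_dump($dollarWithNewline, $strictWithNewline);
```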

3 More rigorous verification

Common mobile phone numbers in China begin with 130-139, 150-153, 155-159, 180, 182, 185-189, and there are also 170, 176-178, etc. The expression from the previous section does not validate these leading digits.

Expression

^1(3[0-9]|5[012356789]|8[0256789]|7[0678])\d{8}$

Knowledge points

Grouping: Parentheses (...) in a regular expression denote a group (subexpression). The match result then contains, in addition to the full match, the text matched by each subexpression. In the example below, element 0 of the array is the text matched by the whole regular expression, and element 1 is the text matched by the subexpression inside the parentheses.

preg_match('/^1(3[0-9]|5[012356789]|8[0256789]|7[0678])\d{8}$/', '18012341234', $arr);
print_r($arr);
/*
Array
(
    [0] => 18012341234
    [1] => 80
)
*/

Alternation: Inside a pair of parentheses (...), subexpressions separated by vertical bars | represent different choices, and the parenthesized expression as a whole matches any one of them.
For example, (3[0-9]|5[012356789]|8[0256789]|7[0678]) matches any one of 3[0-9], 5[012356789], 8[0256789], or 7[0678].
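
To see which branch of the alternation accepts or rejects a prefix, the sketch below runs the expression from this section against a few numbers. The test numbers are hypothetical and differ only in their second and third digits.

```php
<?php
$pattern = '/^1(3[0-9]|5[012356789]|8[0256789]|7[0678])\d{8}$/';

$accepted135 = preg_match($pattern, '13512341234'); // "35" fits the branch 3[0-9]
$rejected154 = preg_match($pattern, '15412341234'); // "54": 4 is missing from 5[012356789]
$rejected181 = preg_match($pattern, '18112341234'); // "81": 1 is missing from 8[0256789]
$accepted176 = preg_match($pattern, '17612341234'); // "76" fits the branch 7[0678]

var_dump($accepted135, $rejected154, $rejected181, $accepted176);
```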

4 The icing on the cake

Sometimes a mobile phone number contains - symbols in the middle, as in 180-1234-1234. For example, the iPhone currently converts phone numbers to this format automatically.

Using what has been introduced so far, you can write the following regular expression to also accept the form 180-1234-1234:

^1(3[0-9]|5[012356789]|8[0256789]|7[0678])-{0,1}\d{4}-{0,1}\d{4}$

Here -{0,1} means that the character - may appear once or not at all. This is the quantifier we saw earlier. In fact, regular expressions also provide special notation for such common quantifiers:

  • ? is equivalent to {0,1}: the object may appear 0 or 1 times

  • + is equivalent to {1,}: the number of occurrences is greater than or equal to 1

  • * is equivalent to {0,}: the number of occurrences is greater than or equal to 0

Therefore, the regular expression above is equivalent to:

^1(3[0-9]|5[012356789]|8[0256789]|7[0678])-?\d{4}-?\d{4}$
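
As a quick check that the two spellings behave the same, the sketch below runs both expressions over the same inputs and compares the results. The variable names and the doubled-hyphen input are illustrative additions. The last two lines contrast + and * on an empty string.

```php
<?php
$longForm  = '/^1(3[0-9]|5[012356789]|8[0256789]|7[0678])-{0,1}\d{4}-{0,1}\d{4}$/';
$shortForm = '/^1(3[0-9]|5[012356789]|8[0256789]|7[0678])-?\d{4}-?\d{4}$/';

$inputs = ['18012341234', '180-1234-1234', '180--1234-1234'];
$agree = true;
foreach ($inputs as $input) {
    $a = preg_match($longForm, $input);
    $b = preg_match($shortForm, $input);
    $agree = $agree && ($a === $b); // both forms should give the same verdict
    echo $input, ': ', $a, ' / ', $b, "\n";
}

// + needs at least one occurrence, * also accepts none.
$plusOnEmpty = preg_match('/^-+$/', '');
$starOnEmpty = preg_match('/^-*$/', '');
var_dump($agree, $plusOnEmpty, $starOnEmpty);
```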

However, besides 18012341234 and 180-1234-1234, the expression above also matches the two forms 180-12341234 and 1801234-1234.
If we only want to match the two forms 18012341234 and 180-1234-1234, we can use a back reference:

^1(3[0-9]|5[012356789]|8[0256789]|7[0678])(-?)\d{4}\2\d{4}$

Here \2 is a back reference: it matches the same text that the second pair of parentheses (...) matched. A back reference has the form \num and refers to the text matched by group number num earlier in the regular expression.
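
The effect of the back reference can be checked against the four forms mentioned above. This is a minimal sketch: the array keys are descriptive labels invented for this example.

```php
<?php
// \2 must repeat whatever the second group (-?) matched: "-" or nothing.
$pattern = '/^1(3[0-9]|5[012356789]|8[0256789]|7[0678])(-?)\d{4}\2\d{4}$/';

$inputs = [
    'no hyphens'         => '18012341234',
    'both hyphens'       => '180-1234-1234',
    'first hyphen only'  => '180-12341234',
    'second hyphen only' => '1801234-1234',
];

$results = [];
foreach ($inputs as $label => $input) {
    $results[$label] = preg_match($pattern, $input);
    echo $label, ': ', $results[$label], "\n";
}

// The separator that was captured is available as element 2 of the match array.
preg_match($pattern, '180-1234-1234', $matches);
var_dump($matches[2]);
```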

In the regular expression above we use \2 as the back reference, while \1 goes unused. Can we skip groups we do not need? A non-capturing group meets this need:

^1(?:3[0-9]|5[012356789]|8[0256789]|7[0678])(-?)\d{4}\1\d{4}$

Here (?:3[0-9]|5[012356789]|8[0256789]|7[0678]) is a non-capturing group. A non-capturing group has the form (?:...). With a non-capturing group, the match result no longer contains the text matched by that group, so the separator group becomes group 1 and the back reference becomes \1.
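
Printing the match array shows what changes once the prefix group stops capturing. This is a sketch with an illustrative input.

```php
<?php
$pattern = '/^1(?:3[0-9]|5[012356789]|8[0256789]|7[0678])(-?)\d{4}\1\d{4}$/';

preg_match($pattern, '180-1234-1234', $matches);

// Only the full match and the separator remain; the prefix "80" is no longer captured.
print_r($matches);
```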

The group references above are based on the number of each subexpression. When a regular expression is complex or contains many groups, working out the number of each group is painful. Therefore, regular expressions provide named groups:

^1(?:3[0-9]|5[012356789]|8[0256789]|7[0678])(?P<separato>-?)\d{4}(?P=separato)\d{4}$

In the regular expression above, (?P<separato>-?) is a named group. A named group has the form (?P<name>...), and a reference to a named group has the form (?P=name).
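
In PHP, a named group also shows up in the match array under its name, in addition to its number. The sketch below reads the captured separator by the group name used in the expression above. The other variable names are illustrative.

```php
<?php
$pattern = '/^1(?:3[0-9]|5[012356789]|8[0256789]|7[0678])(?P<separato>-?)\d{4}(?P=separato)\d{4}$/';

preg_match($pattern, '180-1234-1234', $matches);
$byName   = $matches['separato']; // access by group name
$byNumber = $matches[1];          // the same group is still numbered

// Mixed forms are still rejected, exactly as with the numbered back reference.
$mixed = preg_match($pattern, '180-12341234');

var_dump($byName, $byNumber, $mixed);
```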

5 Summary

At this point, a robust regular expression for mobile phone number verification is complete. Although its function is very simple, it involves many regular expression concepts.