In this article, by gradually improving a regular expression that validates mobile phone numbers, we introduce the following features of regular expressions: character groups, quantifiers, string start/end positions, grouping, alternation (the selection structure) within groups, backreferences, named groups, and so on.
1 Basic verification
That is, verify that the string consists of 11 digits.
Expression

[0123456789]{11}

or
[0-9]{11}

or
\d{11}
Knowledge points
Character group: Square brackets [...] in a regular expression denote a character group (also called a character class). A character group lists the characters that may appear at a single position.
For example, [0123456789] matches any one of the digits 0 to 9; [0123abc] matches any one of the digits 0, 1, 2, 3 or the letters a, b, c.
Range notation in character groups: A hyphen (-) inside a character group [...] denotes a range of characters.
For example, [a-z] matches any one lowercase English letter; [a-zA-Z] matches any one lowercase or uppercase letter; [0-9] matches any one of the digits 0123456789.
Note that a range covers the characters whose ASCII codes lie between the code of the starting character and the code of the ending character.
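The three spellings of "11 digits" can be compared with a short PHP check. This is a minimal sketch: the sample number and the variable names are only illustrative, and the patterns are deliberately left without anchors, as in this section.

```php
<?php
// Three equivalent ways to write "11 digits" (no anchors yet).
$patterns = ['/[0123456789]{11}/', '/[0-9]{11}/', '/\d{11}/'];
$results = [];
foreach ($patterns as $p) {
    // preg_match returns 1 when the pattern is found, 0 otherwise.
    $results[$p] = preg_match($p, '18012341234');
}

// A range inside a character group: [a-c] accepts the same characters as [abc].
$rangeHit  = preg_match('/[a-c]/', 'b');
$rangeMiss = preg_match('/[a-c]/', 'd');

var_dump($results, $rangeHit, $rangeMiss);
```

All three patterns report a match for the sample number, and the range [a-c] accepts b but not d.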
Shorthand for character groups: For some commonly used character groups, regular expressions provide shorthand symbols.

\d
All digits, i.e. [0-9]

\D
All non-digits; the complement of \d

\w
All word characters (letters, digits, underscore), i.e. [0-9a-zA-Z_]

\W
All non-word characters; the complement of \w

\s
All whitespace characters, including spaces, tabs, carriage returns, line feeds, and other whitespace

\S
All non-whitespace characters; the complement of \s
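The complementary pairs can be checked by counting matches in a small sample string. The sketch below assumes a hypothetical six-character sample (a letter, a digit, an underscore, a space, a tab, and a hyphen); each character is matched by exactly one member of every pair.

```php
<?php
// Sample: letter, digit, underscore, space, tab, hyphen (6 characters).
$sample = "a1_ \t-";

// preg_match_all returns how many times the pattern matched.
$counts = [
    'd' => preg_match_all('/\d/', $sample), // digits
    'D' => preg_match_all('/\D/', $sample), // everything that is not a digit
    'w' => preg_match_all('/\w/', $sample), // letters, digits, underscore
    'W' => preg_match_all('/\W/', $sample), // everything that is not a word character
    's' => preg_match_all('/\s/', $sample), // whitespace
    'S' => preg_match_all('/\S/', $sample), // everything that is not whitespace
];

print_r($counts);
```

For each pair, the two counts add up to the length of the sample, which is what "complement" means here.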
Quantifier: A quantifier specifies how many times the object it modifies (such as a character or a character group) may appear.
The general form of a quantifier is {m,n} (m and n separated by a comma), which means that the modified character or character group appears at least m times and at most n times. In particular:

{m}
The modified object must appear exactly m times

{0,n}
The modified object may appear at most n times and at least 0 times

{m,}
The modified object must appear at least m times
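The quantifier forms can be tried out on short digit strings. This is only an illustrative sketch; the inputs and variable names are made up for the example.

```php
<?php
// {2,4}: at least 2 and at most 4 digits; the quantifier is greedy, so it takes 4.
preg_match('/\d{2,4}/', '123456', $range);

// {3}: exactly 3 digits are required, so a 2-digit input cannot match.
$exact = preg_match('/\d{3}/', '12');

// {3,}: at least 3 digits, with no upper bound.
preg_match('/\d{3,}/', '123456', $atLeast);

// {0,2}: zero occurrences are allowed, so the pattern matches
// the empty string at the start of "abc".
$zeroOk = preg_match('/\d{0,2}/', 'abc', $upTo);

var_dump($range[0], $exact, $atLeast[0], $zeroOk, $upTo[0]);
```

Note that {0,n} can succeed without consuming any character, which is why it is usually combined with other required parts of a pattern.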
2 Must the length be exactly 11?
As the GIF demonstration shows, when the input string is a 15-digit number, the expression still matches its first 11 digits. Even the input abcd180123412341234 produces a match on 11 digits.
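The behaviour can be reproduced with a few lines of PHP. This is a sketch using the unanchored expression \d{11} from the previous section and the two inputs mentioned above.

```php
<?php
$pattern = '/\d{11}/';

// A 15-digit input: the pattern still finds 11 consecutive digits.
$foundLong = preg_match($pattern, '180123412341234', $longMatch);

// Letters in front do not prevent a match either.
$foundMixed = preg_match($pattern, 'abcd180123412341234', $mixedMatch);

var_dump($foundLong, $longMatch[0], $foundMixed, $mixedMatch[0]);
```

In both cases preg_match reports a match, and element 0 of the result holds the first run of 11 digits rather than the whole input.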
This is because the regular expression above means "match 11 digits": as long as the input string contains 11 consecutive digits anywhere, it matches. To verify that the input string is exactly a phone number, use the start-of-string anchor ^ and the end-of-string anchor $ in the regular expression.
Expression

^\d{11}$
Knowledge points
Some symbols in a regular expression match a position rather than text. These symbols are called anchors; ^ and $ are two of them.
^
Matches the position at the start of the string
$
Matches the position at the end of the string
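With both anchors in place, the inputs that slipped through earlier are rejected. The sketch below also illustrates one detail of PHP's PCRE engine that is not covered in the text above: by default $ also matches just before a final newline, and the D modifier restricts it to the very end of the string.

```php
<?php
$pattern = '/^\d{11}$/';

$exact    = preg_match($pattern, '18012341234');     // exactly 11 digits
$tooLong  = preg_match($pattern, '180123412341234'); // 15 digits
$prefixed = preg_match($pattern, 'abcd18012341234'); // letters in front

// By default, $ also matches just before a trailing newline.
$withNewline = preg_match($pattern, "18012341234\n");

// The D modifier makes $ match only at the very end of the string.
$strictEnd = preg_match('/^\d{11}$/D', "18012341234\n");

var_dump($exact, $tooLong, $prefixed, $withNewline, $strictEnd);
```

Whether a trailing newline should be accepted depends on where the input comes from; for form input that has already been trimmed, the difference rarely matters.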
3 More rigorous verification
As we all know, common mobile phone numbers in China begin with 130-139, 150-153, 155-159, 180, 182, or 185-189, and there are also 170, 176-178, and so on. The expression from the previous section does not validate the prefix of the number.
Expression
^1(3[0-9]|5[012356789]|8[0256789]|7[0678])\d{8}$
Knowledge points
Grouping: Parentheses (...) in a regular expression denote a group (subexpression). The match result then contains, in addition to the text matched by the whole expression, the text matched by each subexpression. In the example below, element 0 of the array is the text matched by the whole regular expression, and element 1 is the text matched by the subexpression in the parentheses.
// Group 1 captures the two digits that follow the leading 1.
preg_match('/^1(3[0-9]|5[012356789]|8[0256789]|7[0678])\d{8}$/', '18012341234', $arr);
print_r($arr);
/*
Array
(
    [0] => 18012341234
    [1] => 80
)
*/
Alternation (selection structure): Subexpressions inside a pair of parentheses (...) that are separated by vertical bars | represent alternative choices; the group as a whole matches if any one alternative matches.
For example, in (3[0-9]|5[012356789]|8[0256789]|7[0678]) the matched text can be any one of 3[0-9], 5[012356789], 8[0256789], or 7[0678].
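To see the alternation at work, the expression can be run over a few numbers with different prefixes. This is a sketch with made-up sample numbers; the alternatives inside the group are separated by |, and 154 and 181 are chosen because they are not in the prefix list given earlier.

```php
<?php
$pattern = '/^1(3[0-9]|5[012356789]|8[0256789]|7[0678])\d{8}$/';
$numbers = ['13012341234', '17612341234', '15412341234', '18112341234'];

$accepted = [];
foreach ($numbers as $number) {
    // $m[1] holds the text matched by whichever alternative succeeded.
    if (preg_match($pattern, $number, $m) === 1) {
        $accepted[] = $number . ' (alternative matched: ' . $m[1] . ')';
    }
}

print_r($accepted);
```

Only the numbers beginning with 130 and 176 are accepted; 154 and 181 match none of the four alternatives.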
4 The icing on the cake
Sometimes a mobile phone number contains hyphens (-) in the middle, as in 180-1234-1234. For example, the iPhone automatically converts phone numbers to this format.
With what has been introduced so far, we can write the following regular expression, which also accepts the 180-1234-1234 form:
^1(3[0-9]|5[012356789]|8[0256789]|7[0678])-{0,1}\d{4}-{0,1}\d{4}$
Here -{0,1} means that the character - may appear once or not at all. This is the quantifier we saw earlier. In fact, regular expressions provide special notation for such commonly used quantifiers:

?
Equivalent to {0,1}: appears 0 or 1 times

+
Equivalent to {1,}: appears 1 or more times

*
Equivalent to {0,}: appears 0 or more times
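The three shorthand quantifiers can be compared side by side. The sketch below uses a toy pattern (the letter b between a and c) rather than the phone number expression, purely to keep the differences visible.

```php
<?php
$inputs = ['ac', 'abc', 'abbc'];
$patterns = [
    '?' => '/^ab?c$/', // b appears 0 or 1 times
    '+' => '/^ab+c$/', // b appears 1 or more times
    '*' => '/^ab*c$/', // b appears 0 or more times
];

$table = [];
foreach ($patterns as $quantifier => $pattern) {
    foreach ($inputs as $input) {
        $table[$quantifier][$input] = preg_match($pattern, $input);
    }
}

print_r($table);
```

The ? form rejects abbc, the + form rejects ac, and the * form accepts all three inputs.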
Therefore, the regular expression above is equivalent to
^1(3[0-9]|5[012356789]|8[0256789]|7[0678])-?\d{4}-?\d{4}$
However, besides 18012341234 and 180-1234-1234, this expression also matches the two forms 180-12341234 and 1801234-1234.
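The over-permissive behaviour is easy to confirm. In this sketch, which assumes the separator is a hyphen, each -? is decided independently, so all four spellings pass.

```php
<?php
$loose = '/^1(3[0-9]|5[012356789]|8[0256789]|7[0678])-?\d{4}-?\d{4}$/';
$forms = ['18012341234', '180-1234-1234', '180-12341234', '1801234-1234'];

// preg_grep keeps the array entries that match the pattern.
$matched = array_values(preg_grep($loose, $forms));

print_r($matched);
```

Every entry of $forms ends up in $matched, including the two forms with only one hyphen.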
If we want to match only the two forms 18012341234 and 180-1234-1234, we can use a backreference:
^1(3[0-9]|5[012356789]|8[0256789]|7[0678])(-?)\d{4}\2\d{4}$
Here \2 is a backreference: it matches the same text that the second pair of parentheses (...) matched. A backreference has the form \num, where num is the number of an earlier group in the regular expression, and it refers to the text matched by that group.
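The effect of the backreference can be checked against the same four spellings. In this sketch, again assuming a hyphen separator, group 2 captures either a hyphen or the empty string, and \2 forces the second separator position to contain exactly the same text.

```php
<?php
$strict = '/^1(3[0-9]|5[012356789]|8[0256789]|7[0678])(-?)\d{4}\2\d{4}$/';
$forms = ['18012341234', '180-1234-1234', '180-12341234', '1801234-1234'];

// Only the forms whose two separator positions agree survive.
$matched = array_values(preg_grep($strict, $forms));

print_r($matched);
```

Only 18012341234 and 180-1234-1234 remain; the two mixed forms are rejected.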
In the regular expression above we use \2 as the backreference, while \1 is never used. Can we skip groups that do not need to be captured? Non-capturing groups meet this need:
^1(?:3[0-9]|5[012356789]|8[0256789]|7[0678])(-?)\d{4}\1\d{4}$
Here (?:3[0-9]|5[012356789]|8[0256789]|7[0678]) is a non-capturing group. A non-capturing group has the form (?:...). The text matched by a non-capturing group no longer appears in the match result, and the group is not numbered.
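The difference shows up in the result array of preg_match. This sketch compares the capturing and the non-capturing variant on the same hyphenated input (the hyphen separator is an assumption carried over from the examples above).

```php
<?php
$capturing    = '/^1(3[0-9]|5[012356789]|8[0256789]|7[0678])(-?)\d{4}\2\d{4}$/';
$nonCapturing = '/^1(?:3[0-9]|5[012356789]|8[0256789]|7[0678])(-?)\d{4}\1\d{4}$/';

preg_match($capturing, '180-1234-1234', $withPrefixGroup);
preg_match($nonCapturing, '180-1234-1234', $withoutPrefixGroup);

// First array: whole match, prefix digits, separator.
// Second array: whole match, separator (now group 1).
print_r($withPrefixGroup);
print_r($withoutPrefixGroup);
```

Because the prefix group is no longer numbered, the separator group moves from number 2 to number 1, which is why the backreference changes from \2 to \1.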
The group references above are based on group numbers. When a regular expression is complex or contains many groups, working out the number of each group is painful. Regular expressions therefore provide named groups:
^1(?:3[0-9]|5[012356789]|8[0256789]|7[0678])(?P<separato>-?)\d{4}(?P=separato)\d{4}$
In the regular expression above, (?P<separato>-?) is a named group. A named group has the form (?P<name>...), and a reference to a named group has the form (?P=name).
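In PHP, a named group is returned under its name as well as under its number. The sketch below keeps the group name separato from the expression above and assumes a hyphen separator; the variable names are illustrative.

```php
<?php
$pattern = '/^1(?:3[0-9]|5[012356789]|8[0256789]|7[0678])(?P<separato>-?)\d{4}(?P=separato)\d{4}$/';

$hyphenated = preg_match($pattern, '180-1234-1234', $withHyphen);
$plain      = preg_match($pattern, '18012341234', $withoutHyphen);
$mixed      = preg_match($pattern, '180-12341234');

// The captured separator is available by name.
var_dump($hyphenated, $withHyphen['separato']);
var_dump($plain, $withoutHyphen['separato']);
var_dump($mixed);
```

Reading $withHyphen['separato'] stays correct even if more groups are later added in front of it, which is the main practical benefit over numbered references.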
5 Summary
So far, a robust regular expression for mobile phone number verification is complete. Although the function is very simple, it still involves many knowledge points in regular expressions.