Regular expression syntax
A regular expression is a text pattern made up of ordinary characters (such as digits, letters, punctuation, and metacharacters that represent a specific character or set of characters) and special characters (characters that act as quantifiers or have other special functions).
Ordinary characters
Digits, lowercase letters, uppercase letters, and punctuation marks that have no special meaning are ordinary characters.
To match characters other than these, or any character within a specific range, you need metacharacters. Metacharacters fall into two groups: literal characters and character classes.
Literal characters
Character | Description |
---|---|
`\uXXXX` | The Unicode character specified by the four-digit hexadecimal number XXXX |
`\xNN` | The Latin-1 character specified by the two-digit hexadecimal number NN |
`\0` | The NUL character (`\u0000`) |
`\t` | Tab (`\u0009`) |
`\n` | Line feed, i.e. newline (`\u000A`) |
`\v` | Vertical tab (`\u000B`) |
`\f` | Form feed (`\u000C`) |
`\r` | Carriage return (`\u000D`) |
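The escapes in the table can be checked directly in a JavaScript console. The sketch below assumes a modern engine such as Node.js; the sample strings are made up for illustration.

```javascript
// Each escape sequence stands for exactly one character.
const hasTab = /\t/.test("name\tvalue");         // true: \t matches U+0009
const unicodeA = /\u0041/.test("A");             // true: \u0041 is "A"
const latinA = /\x41/.test("A");                 // true: \x41 is also "A"
const hasNul = /\0/.test("a\u0000b");            // true: \0 matches U+0000
const noNewline = /\n/.test("one line only");    // false: no line feed present
const lines = "line1\nline2\nline3".split(/\n/); // ["line1", "line2", "line3"]

console.log(hasTab, unicodeA, latinA, hasNul, noNewline, lines);
```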
Character class metacharacters
Character class metacharacters match characters in a specific range. A character class matches any single character it contains.
Character | Description |
---|---|
`.` | Any single character except line terminators (such as `\n` and `\r`) |
`\w` | Any word character, equivalent to `[a-zA-Z0-9_]` |
`\W` | Any character not matched by `\w`, equivalent to `[^a-zA-Z0-9_]` |
`\s` | Any Unicode whitespace character |
`\S` | Any character that is not Unicode whitespace (a larger set than `\w`) |
`\d` | Any digit, equivalent to `[0-9]` |
`\D` | Any character that is not a digit, equivalent to `[^0-9]` |
`[...]` | Any one character listed inside the square brackets |
`[^...]` | Any one character not listed inside the square brackets |
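A small sketch of the character classes above, applied to one made-up sample string. It uses String.prototype.match() with the g flag (covered later in this article) to list every match; the values in the comments assume a standard JavaScript engine.

```javascript
const sample = "Item_42 costs $7.50\n(net)";

const digits = sample.match(/\d/g);         // \d: ["4", "2", "7", "5", "0"]
const firstWord = sample.match(/\w+/)[0];   // \w includes "_": "Item_42"
const vowels = sample.match(/[aeiou]/g);    // [...]: ["e", "o", "e"]
const symbols = sample.match(/[^\w\s]/g);   // [^...]: ["$", ".", "(", ")"]
const spaces = sample.match(/\s/g).length;  // \s: two spaces plus "\n", so 3

// "." stops at the line feed, so it cannot reach "(net)" on the next line.
// \( and \) are escaped parentheses (see Escape character below).
const dotCrossesLine = /costs.*\(net\)/.test(sample); // false
```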
Special characters
Repetition
When a character or group must be matched several times in a row, quantifiers specify how many repetitions to match.
Character | Description |
---|---|
`{n,m}` | Match at least n times, but no more than m times |
`{n,}` | Match at least n times, with no upper limit |
`{n}` | Match exactly n times |
`?` | Match 0 or 1 times, equivalent to `{0,1}` |
`+` | Match 1 or more times, with no upper limit, equivalent to `{1,}` |
`*` | Match 0 or more times, with no upper limit, equivalent to `{0,}` |
A quantifier applies to the item immediately before it, so it must follow an ordinary character, a character class, or a group. For example, `[a-z]{3,5}` matches 3 to 5 lowercase letters.
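The quantifiers can be compared side by side with the sketch below. The sample strings are invented, and match() with the g flag is used to collect every match. Note that `{3,5}` is greedy: it takes five letters whenever it can.

```javascript
const upTo = "abcdefgh".match(/[a-z]{3,5}/g);       // ["abcde", "fgh"]
const twoOrMore = "7 42 123".match(/\d{2,}/g);      // ["42", "123"]
const year = "2024-01-05".match(/\d{4}/)[0];        // "2024"
const optionalU = "colour color".match(/colou?r/g); // ["colour", "color"]
const oneOrMore = "a aa aaa".match(/a+/g);          // ["a", "aa", "aaa"]
const zeroOrMore = "bk bok book".match(/bo*k/g);    // ["bk", "bok", "book"]
```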
Anchor characters
Anchors match a position rather than a character, for example when a string must begin or end with a specified pattern.
Character | Description |
---|---|
`^` | Matches the beginning of the string; in multiline mode (`m` flag), also matches the beginning of each line |
`$` | Matches the end of the string; in multiline mode (`m` flag), also matches the end of each line |
`(?=p)` | Zero-width positive lookahead: the following characters must match p, but p is not included in the match |
`(?!p)` | Zero-width negative lookahead: the following characters must not match p |
`\b` | Matches a word boundary (inside square brackets, `[\b]` matches a backspace instead) |
`\B` | Matches a position that is not a word boundary |
Example: to check whether a variable name is legal, match strings that start with a letter, an underscore, or `$`: `/^[a-zA-Z_$].*/`.
Note: inside square brackets, a leading `^` negates the set, so `[^abc]` matches any character except a, b, or c.
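The sketch below tries the variable-name pattern from the example, plus two cases from the table: per-line anchors with the m flag, and negation with `[^...]`. The `strictName` pattern is a hypothetical ASCII-only variant added for comparison; real JavaScript identifiers also allow Unicode letters, so treat it as an illustration rather than a complete validator.

```javascript
// Pattern from the text: the first character must be a letter, "_" or "$".
const startsLikeName = /^[a-zA-Z_$].*/;
const okUnderscore = startsLikeName.test("_count"); // true
const okDollar = startsLikeName.test("$el");        // true
const badDigit = startsLikeName.test("9lives");     // false
const looseDash = startsLikeName.test("my-var");    // true: ".*" accepts anything afterwards

// Hypothetical stricter variant (ASCII names only): the final "$" anchors the end,
// so every later character must be \w or "$".
const strictName = /^[a-zA-Z_$][\w$]*$/;
const strictDash = strictName.test("my-var");       // false

// ^ and $ apply per line only with the m flag
const perLine = "one\ntwo".match(/^\w+$/gm);        // ["one", "two"]
const wholeString = "one\ntwo".match(/^\w+$/g);     // null

// A leading ^ inside brackets negates the set
const nonDigits = "a1b2".match(/[^0-9]/g);          // ["a", "b"]
```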
A zero-width lookahead assertion can use any regular expression as an ending anchor. `q(?=p)` matches q only where it is followed by p; the match contains only q, not the anchor p.
Example: to match words that are followed by a period, use `/\w+(?=\.)/g`.
Applied to 'a.b.c.d.e.f', it returns ['a', 'b', 'c', 'd', 'e']. No period follows f, so f is not matched.
`q(?!p)` matches q only where it is not followed by p.
Note: JavaScript engines before ES2018 do not support lookbehind assertions, so a regular expression cannot be used as a starting anchor there. ES2018 added lookbehind with `(?<=p)` and `(?<!p)`.
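Both lookahead forms can be seen on the 'a.b.c.d.e.f' string from the example. The 'JavaScript Java' string and the 'JVM' replacement are made up to show `(?!p)` inside replace(), which is described later in this article.

```javascript
const dotted = "a.b.c.d.e.f";

// q(?=p): q must be followed by p, and p is not part of the match
const beforeDot = dotted.match(/\w+(?=\.)/g);    // ["a", "b", "c", "d", "e"]

// q(?!p): q must NOT be followed by p
const notBeforeDot = dotted.match(/\w+(?!\.)/g); // ["f"]

// Replace "Java" only where it is not followed by "Script"
const renamed = "JavaScript Java".replace(/Java(?!Script)/g, "JVM"); // "JavaScript JVM"
```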
`\b` can match at the beginning or end of a word. Example: in "string", `/\bstr/` matches str and `/ing\b/` matches ing.
`\B` can match inside a word. Example: in "string", `/\Btrin\B/` matches trin.
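A sketch of `\b` and `\B` on the word "string" from the examples, plus a made-up phrase that shows why `\b` is useful for whole-word searches.

```javascript
const word = "string";
const atStart = /\bstr/.test(word);    // true: "str" begins the word
const atEnd = /ing\b/.test(word);      // true: "ing" ends the word
const inside = /\Btrin\B/.test(word);  // true: "trin" is strictly inside the word
const notAtStart = /\bing/.test(word); // false: "ing" does not begin a word

// Whole-word match versus plain substring match
const substrings = "cat concatenate".match(/cat/g);     // ["cat", "cat"]
const wholeWords = "cat concatenate".match(/\bcat\b/g); // ["cat"]

// Inside brackets, [\b] means backspace (U+0008), not a boundary
const backspace = /[\b]/.test("a\bb"); // true
```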
Alternation, grouping, and reference characters
Character | Description |
---|---|
`\|` | Alternation: matches either the expression on the left of the pipe or the expression on the right |
`(…)` | Grouping: combines several items into one unit that can be modified as a whole by `*`, `+`, `?`, or `{n,m}`, or used to limit the scope of `\|`; the text the group matched is remembered for later reference |
`(?:…)` | Non-capturing group: groups items for matching only and does not remember the matched text |
`(?:…)` and `(…)` differ only in the match result. Use `(…)` when you need to record part of the match; use `(?:…)` when you only need the grouping and not the substring it matched.
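A short sketch of alternation and grouping on invented sample strings. The last two lines show the remembered text of a `(…)` group through exec(), which is covered later in this article.

```javascript
const pets = "cat dog bird".match(/cat|dog/g);   // ["cat", "dog"]
const grouped = "abababx".match(/(ab)+/)[0];     // "ababab": + repeats the whole group
const ungrouped = "abababx".match(/ab+/)[0];     // "ab": + repeats only "b"
const scoped = /^(jpg|png)$/.test("xpng");       // false: ^ and $ apply to both choices
const unscoped = /^jpg|png$/.test("xpng");       // true: read as (^jpg) or (png$)

const remembered = /(ab)+/.exec("abababx")[1];           // "ab": the last repetition is remembered
const notRemembered = /(?:ab)+/.exec("abababx").length;  // 1: only the whole match
```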
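Escape character
Character | Description |
---|---|
`\` | Escapes the next character: a special character such as `.`, `*`, or `(` becomes a literal (for example `\.` matches a period), while some ordinary letters gain a special meaning (for example `\d`) |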
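The sketch below contrasts escaped and unescaped characters; the test strings are illustrative assumptions.

```javascript
const unescapedPlus = /1+1=2/.test("1+1=2"); // false: "+" is a quantifier here
const escapedPlus = /1\+1=2/.test("1+1=2");  // true: "\+" is a literal plus sign
const anyChar = /a.c/.test("abc");           // true: "." is any character
const literalDot = /a\.c/.test("abc");       // false: "\." only matches "."
const backslash = /\\/.test("C:\\temp");     // true: "\\" matches one backslash
// "\" can also give an ordinary letter a special meaning
const digitVsD = [/\d/.test("d"), /d/.test("d")]; // [false, true]
```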
Using RegExp
Creating a RegExp
A RegExp can be created in two ways: with a regular expression literal, or with the `new RegExp()` constructor.
var exp = /pattern/flags; // RegExp literal
var exp = new RegExp("pattern", "flags"); // RegExp constructor
Flags control how a regular expression behaves. The flags covered here are g, i, and m, and one or more can be set at the same time. (Modern engines also support other flags, such as s, u, and y.)
g: global. With g, the expression finds every match in the string; without g, matching stops after the first match.
i: case-insensitive. With i, letter case is ignored when matching.
m: multiline. With m, `^` and `$` match at the beginning and end of every line, not only at the beginning and end of the whole string.
pattern is the regular expression itself.
The advantage of the RegExp constructor is that the pattern can be built dynamically at run time, for example from a variable. Because the pattern is passed as a string, every backslash in it must be doubled (`"\\d+"` for `/\d+/`).
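A sketch of building a pattern dynamically. `escapeRegExp` and `countOccurrences` are hypothetical helpers written for this example, not built-ins; escaping is needed so that user input such as "." is matched literally instead of as "any character".

```javascript
// Hypothetical helper: escape every character that has a special meaning in a pattern
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"); // "$&" inserts the matched character
}

// Hypothetical helper: build the pattern at run time from a plain search string
function countOccurrences(haystack, needle) {
  const pattern = new RegExp(escapeRegExp(needle), "gi");
  const found = haystack.match(pattern);
  return found ? found.length : 0; // match() returns null when nothing matches
}

// In string form, backslashes must be doubled: "\\d+" is the pattern \d+
const viaString = new RegExp("\\d+", "g");
const viaLiteral = /\d+/g;
const sameSource = viaString.source === viaLiteral.source; // true

const dots = countOccurrences("a.b.c", ".");         // 2: a literal ".", not "any character"
const cats = countOccurrences("Cat cat CAT", "cat"); // 3: the i flag ignores case
```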
RegExp instance properties
global
Boolean value indicating whether the g flag is set.
ignoreCase
Boolean value indicating whether the i flag is set.
multiline
Boolean value indicating whether the m flag is set.
source
The text of the pattern, in the form it would take in a literal (without the enclosing slashes and flags), rather than the string that was passed to the constructor.
lastIndex
An integer giving the character position where the next search starts, counting from 0. exec() and test() only use and update it when the g flag (or the y flag) is set.
For example:
var exp = /\[bc\]at/gi;
exp.global; // true
exp.ignoreCase; // true
exp.multiline; // false
exp.source; // "\[bc\]at"
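The point about source is easiest to see by building the same pattern both ways. The sketch below reuses the `/\[bc\]at/gi` example; the test string 'the [bc]at sat' is made up, and because the pattern escapes the brackets it matches the literal text [bc]at, not bat or cat.

```javascript
const exp = /\[bc\]at/gi;
const built = new RegExp("\\[bc\\]at", "gi"); // the same pattern, written as a string

const flags = [exp.global, exp.ignoreCase, exp.multiline]; // [true, true, false]
const literalSource = exp.source; // the text \[bc\]at
const builtSource = built.source; // also \[bc\]at, not the doubled string "\\[bc\\]at"

const before = exp.lastIndex;     // 0
exp.test("the [bc]at sat");       // matches the literal text "[bc]at" at index 4
const after = exp.lastIndex;      // 10: with g, the next search starts after the match
```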
RegExp instance methods
exec()
exec() takes one argument, the string to apply the pattern to. It returns an array containing information about the first match, or null if there is no match. The array is an ordinary Array instance, but it also has two extra properties: index and input.
Let us go through this piece by piece.
Returns an array containing information about the first match, or null if there is no match:
There are two cases: global matching and non-global matching.
In global mode, after exec() succeeds once, the next call to exec() continues searching from the position just after the end of the previous match (stored in lastIndex). For example:
var exp = /.at/g;
var matches = exp.exec('cat, bat, sat, fat'); // first match
matches.index => 0
matches.input => 'cat, bat, sat, fat'
matches => ['cat']
exp.lastIndex => 3
matches = exp.exec('cat, bat, sat, fat'); // second match
matches.index => 5
matches.input => 'cat, bat, sat, fat'
matches => ['bat']
exp.lastIndex => 8
...
In non-global mode, each call to exec() starts matching from the beginning of the string again, so repeated calls return the same first match and lastIndex stays at 0.
var exp = /.at/;
var matches = exp.exec('cat, bat, sat, fat'); // first match
matches.index => 0
matches.input => 'cat, bat, sat, fat'
matches => ['cat']
exp.lastIndex => 0
matches = exp.exec('cat, bat, sat, fat'); // second match
matches.index => 0
matches.input => 'cat, bat, sat, fat'
matches => ['cat']
exp.lastIndex => 0
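In practice, a global exec() is usually called in a loop until it returns null. `allMatches` below is a hypothetical helper, a sketch rather than a library function; it rejects patterns without g and steps past empty matches, since either case would otherwise loop forever.

```javascript
// Collect every match of a global regex by calling exec() until it returns null.
function allMatches(pattern, text) {
  if (!pattern.global) {
    // Without g, exec() always restarts at 0, so this loop would never end.
    throw new Error("pattern needs the g flag");
  }
  pattern.lastIndex = 0; // start from the beginning of the string
  const results = [];
  let match;
  while ((match = pattern.exec(text)) !== null) {
    results.push({ text: match[0], index: match.index, next: pattern.lastIndex });
    if (match[0] === "") {
      pattern.lastIndex++; // step past an empty match to avoid an endless loop
    }
  }
  return results;
}

const found = allMatches(/.at/g, "cat, bat, sat, fat");
// found[1] is { text: "bat", index: 5, next: 8 }, as in the walkthrough above
```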
The returned array is an Array instance. When the pattern contains (…) groups, a single exec() call returns more than one result:
var exp = /(there)\s+(you)\s+(are)/;
var matches = exp.exec('hey, there you are my dear');
matches => ["there you are", "there", "you", "are"]
matches[0] is the string matched by the entire expression.
matches[1] is the string matched by the first (…) group, and so on.
Supplementary note: when the expression contains a non-capturing group (?:…), the array returned by exec() does not include the text matched by that group. This is the difference between a non-capturing group (?:…) and a capturing group (…).
The index and input properties:
index is the position in the string where the match starts.
input is the string the regular expression was applied to.
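A sketch that checks the group contents and the index and input properties on the same sample sentence; the (?:…) variant and the failing `/(xyz)/` pattern are added for comparison.

```javascript
const text = "hey, there you are my dear";

const capturing = /(there)\s+(you)\s+(are)/.exec(text);
const groups = Array.from(capturing);       // ["there you are", "there", "you", "are"]
const start = capturing.index;              // 5: position of "there"
const sameInput = capturing.input === text; // true

// Only (you) captures; the (?:...) groups take part in matching but are not stored
const nonCapturing = /(?:there)\s+(you)\s+(?:are)/.exec(text);
const fewerGroups = Array.from(nonCapturing); // ["there you are", "you"]

const noMatch = /(xyz)/.exec(text);         // null
```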
test()
test() takes one argument, the string to apply the pattern to, and returns true if the pattern matches it and false otherwise. It is convenient when you only need to know whether a string matches a pattern, not the matched text, which is why test() often appears in if statements.
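A sketch of test() in an if statement. The five-digit rule and the `describe` function are hypothetical, chosen only for illustration. The second half shows a known pitfall: with the g flag, test() uses and updates lastIndex, so calling it repeatedly on the same regex object can alternate between true and false.

```javascript
// Hypothetical rule for illustration: a code of exactly five digits
const fiveDigits = /^\d{5}$/;

function describe(code) {
  if (fiveDigits.test(code)) {
    return "valid";
  }
  return "invalid";
}

const a = describe("12345"); // "valid"
const b = describe("1234a"); // "invalid"

// Pitfall: with g, test() advances lastIndex between calls on the same regex
const withG = /a/g;
const first = withG.test("a");  // true, lastIndex becomes 1
const second = withG.test("a"); // false: the search starts at 1, fails, lastIndex resets to 0
const third = withG.test("a");  // true again
```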
String methods for pattern matching
The following four String methods accept a regular expression as an argument.
search()
str.search(exp) returns the starting position of the first substring of str that matches exp, or -1 if there is no match. If the argument passed to search() is a string, it is first converted to a regular expression with the RegExp constructor. search() does not support global search; it ignores the g flag.
'JavaScript'.search(/script/i) => 4
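The sketch below checks the three claims about search(): -1 for no match, string arguments converted to a RegExp, and the g flag ignored. The sample strings are illustrative.

```javascript
const found = "JavaScript".search(/script/i);   // 4
const missing = "JavaScript".search(/python/i); // -1

// A string argument is converted to a RegExp, so "." means "any character"
const stringArg = "a.b".search("."); // 0, not 1
const escaped = "a.b".search(/\./);  // 1

// The g flag is ignored: search() always reports the first match
const re = /a/g;
const firstTry = "xaxa".search(re);  // 1
const secondTry = "xaxa".search(re); // 1 again
```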
replace()
str.replace(exp, replaceStr) takes a regular expression as the first argument and the replacement string as the second. If exp has the g flag, every substring of str that matches exp is replaced; without g, only the first matching substring is replaced.
text.replace(/javascript/gi, 'JS') // change every JavaScript in text to JS
In replaceStr, a $ followed by a number n stands for the text matched by the nth parenthesized subexpression.
text.replace(/'([^']*)'/g, '"$1"') // replace every single-quoted substring in text with the same text in double quotes
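A sketch of replace() with and without g, and with $n references, on a made-up sentence. The name-swapping line is an extra illustration of two groups used together.

```javascript
const text = "I like javascript. JavaScript is 'fun' and 'fast'.";

const once = text.replace(/javascript/i, "JS"); // only the first match is replaced
const all = text.replace(/javascript/gi, "JS"); // every match is replaced
// all is "I like JS. JS is 'fun' and 'fast'."

// $1 is the text captured by the first (...) group
const doubleQuoted = text.replace(/'([^']*)'/g, '"$1"');
// doubleQuoted is: I like javascript. JavaScript is "fun" and "fast".

const swapped = "Doe, John".replace(/(\w+),\s*(\w+)/, "$2 $1"); // "John Doe"
```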
match()
str.match(exp) returns an array of match results, or null if there is no match.
With the g flag, this array differs from the one returned by the RegExp method exec(): it contains every substring that matched. Without the g flag, match() returns the same result as exec(): the first match, with index and input properties.
'11+2=13'.match(/\d/g) => ['1','1','2','1','3']
'11+2=13'.match(/\d/) => ['1']
With the g flag, match() does not return the substrings matched by capturing groups: unlike exec(), (…) groups add nothing to the result. Without the g flag, the captured substrings are included, exactly as with exec().
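The sketch below compares match() with and without g on a pattern that has groups; the sample strings are invented for illustration.

```javascript
const sum = "11+2=13";
const allDigits = sum.match(/\d/g); // ["1", "1", "2", "1", "3"]
const firstDigit = sum.match(/\d/); // ["1"], with index 0 and input "11+2=13"

const pairs = "a=1, b=2";
const withG = pairs.match(/(\w)=(\d)/g);   // ["a=1", "b=2"]: no group text
const withoutG = pairs.match(/(\w)=(\d)/); // ["a=1", "a", "1"]: same as exec()

const none = "abc".match(/\d/g); // null, not an empty array
```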
split()
When split() is given a regular expression as its separator, it becomes much more powerful. For example, you can specify a separator that allows any amount of whitespace on either side:
'1, 2 ,3 , 4 ,5'.split(/\s*,\s*/) => ['1','2','3','4','5']
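A few more split() cases, sketched on invented inputs: an empty field between two commas, a capturing group in the separator (the captured separators are included in the result), and the optional second argument that limits the number of items.

```javascript
const fields = "1, 2 ,3 , 4 ,5".split(/\s*,\s*/); // ["1", "2", "3", "4", "5"]
const withEmpty = "1 , ,2".split(/\s*,\s*/);     // ["1", "", "2"]: nothing between the two commas
const byDigits = "a1b22c".split(/\d+/);          // ["a", "b", "c"]
const keepSeparators = "a1b22c".split(/(\d+)/);  // ["a", "1", "b", "22", "c"]
const limited = "a,b,c,d".split(/,/, 2);         // ["a", "b"]: at most 2 items
```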