JS regular expression syntax

Time:2020-11-28

Regular expression syntax

A regular expression is a text pattern composed of ordinary characters (such as digits, letters, and punctuation), metacharacters (used to represent a specific character or character set), and special characters (used as quantifiers or for other special functions).

Ordinary characters

Digits, lowercase letters, uppercase letters, and punctuation marks that have no special meaning in regular expression syntax are ordinary characters. Each one matches itself.

Metacharacters are required to match characters other than those listed above, or to match any character in a specific range. Metacharacters are divided into literal characters (escape sequences) and character classes.

Literal characters

Character Description
\uXXXX The Unicode character specified by the four hexadecimal digits XXXX
\xNN The Latin-1 character specified by the two hexadecimal digits NN
\0 NUL character (\u0000)
\t Tab (\u0009)
\n Line feed (\u000A)
\v Vertical tab (\u000B)
\f Form feed (\u000C)
\r Carriage return (\u000D)
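
As a minimal sketch of the escape sequences in this table (the sample strings are made up for illustration), each escape in a pattern matches the single character it names:

```javascript
// Each escape sequence below matches exactly one character.
const tabPattern = /a\tb/;          // "a", a tab, then "b"
const copyrightPattern = /\u00A9/;  // U+00A9, the copyright sign
const latinPattern = /\x41/;        // 0x41 is "A"
const newlinePattern = /\n/;        // a line feed

console.log(tabPattern.test('a\tb'));               // true
console.log(copyrightPattern.test('\u00A9 2020'));  // true
console.log(latinPattern.test('A'));                // true
console.log(newlinePattern.test('one line'));       // false
```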

Character classes

Character class metacharacters are used to match characters in a specific range. A character class matches any one character that it contains.

Character Description
. Any single character except line terminators (line feed, carriage return, and so on)
\w Equivalent to [0-9a-zA-Z_]
\W Any character not matched by \w
\s Any Unicode whitespace character
\S Any character that is not Unicode whitespace (a larger set than \w)
\d Equivalent to [0-9]
\D Any character not matched by \d
[…] Any one character listed inside the square brackets
[^…] Any one character not listed inside the square brackets
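
The following sketch shows a few of these classes at work. The sample strings are assumptions chosen only to make the results easy to read.

```javascript
const sample = 'user_42 paid $3.50';

// \w covers letters, digits, and the underscore.
const wordRuns = sample.match(/\w+/g);
// \d covers the digits 0 through 9.
const digits = sample.match(/\d/g);
// A leading ^ inside brackets negates the set.
const nonVowels = 'banana'.match(/[^aeiou]/g);

console.log(wordRuns);  // [ 'user_42', 'paid', '3', '50' ]
console.log(digits);    // [ '4', '2', '3', '5', '0' ]
console.log(nonVowels); // [ 'b', 'n', 'n' ]
```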

Special characters

Repetition

When a character or group must be matched several times in a row, special characters (quantifiers) state how many repetitions are allowed.

Character Description
{n,m} Match at least n times, but not more than m times
{n,} Match at least n times, no upper limit
{n} Match exactly n times
? Match 0 or 1 times, equivalent to {0,1}
+ Match 1 or more times, no upper limit, equivalent to {1,}
* Match 0 or more times, no upper limit, equivalent to {0,}

A quantifier must follow the item it repeats. For example, [a-z]{3,5} matches 3 to 5 lowercase letters.
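
A short sketch of the quantifiers in use. The ^ and $ anchors are added here as an assumption so that each pattern has to cover the whole test string, which makes the counts easy to verify.

```javascript
const threeToFive = /^[a-z]{3,5}$/; // 3 to 5 lowercase letters
const optionalU = /^colou?r$/;      // "u" may appear 0 or 1 times
const oneOrMore = /^\d+$/;          // at least one digit
const zeroOrMore = /^\d*$/;         // any number of digits, including none

console.log(threeToFive.test('ab'));     // false (too short)
console.log(threeToFive.test('abcde'));  // true
console.log(threeToFive.test('abcdef')); // false (too long)
console.log(optionalU.test('color'));    // true
console.log(optionalU.test('colour'));   // true
console.log(oneOrMore.test(''));         // false
console.log(zeroOrMore.test(''));        // true
```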

Anchor characters

Anchors match a position rather than a character. Use them when a match must begin or end at a particular place, such as the start of a string or a word boundary.

Character Description
^ Matches the beginning of the string. In multiline mode, also matches the beginning of each line
$ Matches the end of the string. In multiline mode, also matches the end of each line
(?=p) Zero-width positive lookahead: requires the following characters to match p, but the match result does not include them
(?!p) Zero-width negative lookahead: requires that the following characters do not match p
\b Matches a word boundary (inside a character class, [\b] matches a backspace instead)
\B Matches a position that is not a word boundary

Example: to check whether a variable name starts legally, match strings that begin with a letter, an underscore, or $. You can use /^[a-zA-Z_$].*/

Note: inside square brackets, a leading ^ negates the character set instead of anchoring the match.
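
A minimal sketch of that check follows. The helper name startsLikeIdentifier is hypothetical, and the pattern only validates the first character; it does not reject illegal characters later in the name or reserved words.

```javascript
// Hypothetical helper: true when the first character is a letter, "_" or "$".
const startsLikeIdentifier = (name) => /^[a-zA-Z_$].*/.test(name);

console.log(startsLikeIdentifier('count'));    // true
console.log(startsLikeIdentifier('_private')); // true
console.log(startsLikeIdentifier('$el'));      // true
console.log(startsLikeIdentifier('9lives'));   // false
console.log(startsLikeIdentifier(''));         // false
```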

A positive lookahead lets any regular expression act as a trailing anchor. q(?=p) matches every q that is immediately followed by p. The match result contains only the q part, not the anchor p.

Example: to match the word characters that precede a ".", you can use /\w+(?=\.)/g
The result of matching 'a.b.c.d.e.f' is ['a','b','c','d','e']. No "." follows f, so f is not matched.

q(?!p) matches every q that is not immediately followed by p.
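
Both lookahead forms can be sketched as follows. The 'JavaBeans' and 'JavaScript' strings are made-up samples for the negative case.

```javascript
// Positive lookahead: word characters that are followed by a ".".
const beforeDot = 'a.b.c.d.e.f'.match(/\w+(?=\.)/g);
// Negative lookahead: word characters that are NOT followed by a ".".
const notBeforeDot = 'a.b.c.d.e.f'.match(/\w+(?!\.)/g);

console.log(beforeDot);    // [ 'a', 'b', 'c', 'd', 'e' ]
console.log(notBeforeDot); // [ 'f' ]

// "Java" only when it is not the start of "JavaScript".
const javaOnly = /Java(?!Script)/;
console.log(javaOnly.test('JavaBeans'));  // true
console.log(javaOnly.test('JavaScript')); // false
```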

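Note: JavaScript originally did not support lookbehind assertions, so a regular expression could not be used as a leading anchor. ES2018 added (?<=p) and (?<!p) for this purpose, but older engines may not support them.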

\b can be used to match the beginning or end of a word. Example: /\bstr/ matches the "str" in "string", and /ing\b/ matches the "ing" in "string".

\B can be used to match the middle part of a word. Example: /\Btrin\B/ matches the "trin" in "string".
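
The word-boundary examples can be checked with a short sketch. The extra strings 'substring', 'ingot', and 'trin' are assumptions added to show cases where the boundary condition fails.

```javascript
console.log(/\bstr/.test('string'));     // true: "str" starts the word
console.log(/\bstr/.test('substring'));  // false: "str" is preceded by "b"
console.log(/ing\b/.test('string'));     // true: "ing" ends the word
console.log(/ing\b/.test('ingot'));      // false: "ing" is followed by "o"
console.log(/\Btrin\B/.test('string'));  // true: "trin" is inside the word
console.log(/\Btrin\B/.test('trin'));    // false: both ends are boundaries
```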

Selection, grouping, and reference characters

Character Description
| Alternation: matches either the expression on its left or the expression on its right
(…) Capturing group: combines several items into one unit that can be modified with "*", "+", "?", "|", or "{n,m}", and remembers the text the group matched
(?:…) Non-capturing group: groups items for matching only and does not remember the matched text

(?:…) and (…) differ only in the match result. Use (…) when you need to record the part of the match it covers. If the group exists only to structure the pattern and its matched text is not needed, use (?:…).
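
A minimal sketch of that difference, using a made-up input string. The same text is matched in both cases; only the contents of the result array change.

```javascript
// Capturing group: the text matched by (ab) is remembered in the result.
const captured = /(ab)+(c|d)/.exec('ababd');
// Non-capturing group: (?:ab) still groups, but leaves no entry in the result.
const nonCaptured = /(?:ab)+(c|d)/.exec('ababd');

console.log(Array.from(captured));    // [ 'ababd', 'ab', 'd' ]
console.log(Array.from(nonCaptured)); // [ 'ababd', 'd' ]
```

When a capturing group is repeated, as (ab)+ is here, the result holds the text from the group's last repetition.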

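Escape character

Character Description
\ Treats the next character literally when it would otherwise be special (for example, \. matches a period and \\ matches a backslash)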

RegExp usage

Creating a RegExp

A RegExp can be created in two ways: with a literal, or with the new RegExp() constructor.

var exp = /pattern/flags; // RegExp literal
var exp = new RegExp("pattern", "flags"); // RegExp constructor

Flags control the behavior of the regular expression. The options are g, i, and m, and more than one can be set at the same time.

g: global. When flags contains g, the expression finds every match in the string; without g, matching stops at the first match.

i: case-insensitive. When flags contains i, letter case is ignored during matching.

m: multi-line. When flags contains m, ^ and $ match at the beginning and end of every line, not only at the beginning and end of the whole string.
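
The effect of each flag can be seen in a small sketch. The two-line sample text is an assumption made for the demonstration.

```javascript
const twoLines = 'first line\nsecond line';

// g: every match is returned instead of only the first one.
const allLines = twoLines.match(/line/g);
// i: letter case is ignored.
const ignoresCase = /LINE/i.test(twoLines);
// m: ^ also matches at the start of the second line.
const withoutM = /^second/.test(twoLines);
const withM = /^second/m.test(twoLines);

console.log(allLines.length); // 2
console.log(ignoresCase);     // true
console.log(withoutM);        // false
console.log(withM);           // true
```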

pattern is the regular expression itself.

The advantage of the RegExp constructor is that the pattern can be built dynamically from a string at run time.
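
As a sketch of building a pattern at run time: the helper wordPattern below is hypothetical, and it assumes the word passed in contains no characters that are special in regular expressions, because it does no escaping.

```javascript
// Hypothetical helper: whole-word, case-insensitive, global pattern for `word`.
function wordPattern(word) {
  // Inside a string literal, "\\b" is needed to produce the regex token \b.
  return new RegExp('\\b' + word + '\\b', 'gi');
}

const hits = 'Cat, concat, CAT'.match(wordPattern('cat'));
console.log(hits); // [ 'Cat', 'CAT' ]
```

The "cat" inside "concat" is skipped because no word boundary precedes it.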

RegExp instance properties

global

Boolean value indicating whether the g flag is set.

ignoreCase

Boolean value indicating whether the i flag is set.

multiline

Boolean value indicating whether the m flag is set.

source

The text of the pattern, returned in literal form (the text between the slashes) rather than as the string that was passed to the constructor.

lastIndex

An integer giving the character position, counted from 0, at which the search for the next match starts. It takes effect when the g flag is set.

For example:

var exp = /\[bc\]at/gi;
exp.global;     // true
exp.ignoreCase; // true
exp.multiline;  // false
exp.source;     // \[bc\]at (as a string)

RegExp instance methods

exec()

The exec method takes one parameter, the string to apply the pattern to, and returns an array containing information about the first match, or null if there is no match. The returned array also carries two additional properties: index and input.

Each part of that description is explained below.

Returns an array containing information about the first match, or null if there is no match:

There are two cases: global matching and non-global matching.

In global mode, after exec() matches successfully, calling exec() again continues searching from the position just after the previous match. For example:

var exp = /.at/g;
var matches = exp.exec('cat, bat, sat, fat'); // first match
matches.index;  // => 0
matches.input;  // => 'cat, bat, sat, fat'
matches;        // => ['cat']
exp.lastIndex;  // => 3
matches = exp.exec('cat, bat, sat, fat'); // second match
matches.index;  // => 5
matches.input;  // => 'cat, bat, sat, fat'
matches;        // => ['bat']
exp.lastIndex;  // => 8
// ...

In non-global mode, after exec() matches successfully, calling exec() again starts matching from the beginning of the string.

var exp = /.at/;
var matches = exp.exec('cat, bat, sat, fat'); // first match
matches.index;  // => 0
matches.input;  // => 'cat, bat, sat, fat'
matches;        // => ['cat']
exp.lastIndex;  // => 0
matches = exp.exec('cat, bat, sat, fat'); // second match
matches.index;  // => 0
matches.input;  // => 'cat, bat, sat, fat'
matches;        // => ['cat']
exp.lastIndex;  // => 0
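
Because a global pattern resumes from lastIndex, exec() can be called in a loop to collect every match. The following is one possible sketch of that idiom; the variable names are illustrative.

```javascript
const pattern = /.at/g;
const input = 'cat, bat, sat, fat';
const found = [];

let match;
// exec() returns null once no further match exists, which ends the loop.
while ((match = pattern.exec(input)) !== null) {
  found.push({ text: match[0], index: match.index });
}

console.log(found.map((m) => m.text));  // [ 'cat', 'bat', 'sat', 'fat' ]
console.log(found.map((m) => m.index)); // [ 0, 5, 10, 15 ]
console.log(pattern.lastIndex);         // 0, reset after the failed final call
```

The g flag is essential here: without it, exec() would return the first match on every call and the loop would never end.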

The returned array is an Array instance:

When (…) groups are used, the array returned by a single exec() call contains more than one element:

var exp = /(there)\s+(you)\s+(are)/;
var matches = exp.exec('hey, there you are my dear');
// matches => ["there you are", "there", "you", "are"]

matches[0] is the text matched by the entire expression.
matches[1] is the text matched by the first (…), and so on.

Supplementary note: when the expression contains a (?:…) non-capturing group, the array returned by exec() does not include an entry for the text that group matched. This is the difference between the non-capturing group (?:…) and the capturing group (…).

The index and input properties:
index is the position in the string where the match starts.
input is the string the regular expression was applied to.

test()

The test method takes one parameter, the string to apply the pattern to. It returns true if the pattern matches the string and false otherwise. This is convenient when you only need to know whether the target string matches a pattern and do not need the matched text. For that reason, test() is often used in if statements.
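
A typical use inside an if statement might look like this sketch. The function describeInput and its return strings are hypothetical.

```javascript
// Hypothetical helper: classifies a string using test() in an if statement.
function describeInput(value) {
  if (/^\d+$/.test(value)) {
    return 'digits only';
  }
  return 'contains non-digits';
}

console.log(describeInput('2020'));       // digits only
console.log(describeInput('2020-11-28')); // contains non-digits
```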

String methods for pattern matching

String has four methods that accept a regular expression as an argument.

search()

str.search(exp) returns the starting position of the first substring in str that matches exp. If no match is found, it returns -1. If the argument passed to search is a string, it is first converted to a regular expression with the RegExp constructor. search does not support global searching; it ignores the g flag.

'JavaScript'.search(/script/i) // => 4
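
A few more calls, sketched with made-up inputs, show the -1 result, the string-to-RegExp conversion, and the ignored g flag:

```javascript
const noMatch = 'JavaScript'.search(/python/);
// A string argument is converted to a RegExp, so "." means "any character".
const dotAsPattern = 'a.b'.search('.');
// The g flag has no effect: the first match position is returned either way.
const withG = 'JavaScript'.search(/a/g);
const withoutG = 'JavaScript'.search(/a/);

console.log(noMatch);      // -1
console.log(dotAsPattern); // 0
console.log(withG);        // 1
console.log(withoutG);     // 1
```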

replace()

str.replace(exp, replaceStr) takes a regular expression as its first parameter and the replacement string as its second. If exp has the g flag, every substring of str that matches exp is replaced. Without g, only the first matching substring is replaced.

text.replace(/javascript/gi, 'JS') // changes every "JavaScript" in text to "JS", ignoring case

In replaceStr, a $ followed by a number n stands for the text matched by the nth parenthesized subexpression.

text.replace(/'([^']*)'/g, '"$1"') // replaces every single-quoted substring in text with a double-quoted one
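
The following sketch runs these replacements on a made-up sample string and adds one more $n example that swaps two captured words:

```javascript
const source = "JavaScript and javascript say 'hi'";

const replacedAll = source.replace(/javascript/gi, 'JS');
const replacedFirst = source.replace(/javascript/i, 'JS');
// $1 is the text captured between the single quotes.
const requoted = source.replace(/'([^']*)'/g, '"$1"');
// $1 and $2 can be reordered in the replacement string.
const swapped = 'John Smith'.replace(/(\w+)\s(\w+)/, '$2, $1');

console.log(replacedAll);   // JS and JS say 'hi'
console.log(replacedFirst); // JS and javascript say 'hi'
console.log(requoted);      // JavaScript and javascript say "hi"
console.log(swapped);       // Smith, John
```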

match()

str.match(exp) returns an array of matching results.

This array can differ from the array returned by the RegExp instance method exec(). With the g flag, match returns an array of every matched string. Without the g flag, match returns the same array as exec(): the first match, followed by any captured groups.

'11+2=13'.match(/\d/g) // => ['1','1','2','1','3']
'11+2=13'.match(/\d/) // => ['1']

With the g flag, match does not return the substrings matched by capturing groups the way exec() does; only the full matches are returned.
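
A short sketch of how capture groups interact with match(). Note that with the g flag only full matches are returned, while without g the result has the same shape as the one from exec(), captured groups included.

```javascript
// With g: only the full matches, no capture groups.
const globalResult = '11+2=13'.match(/(\d)(\d)/g);
// Without g: first match, then its capture groups, plus index and input.
const firstResult = '11+2=13'.match(/(\d)(\d)/);
// No match at all gives null in both cases.
const none = 'abc'.match(/\d/g);

console.log(globalResult);            // [ '11', '13' ]
console.log(Array.from(firstResult)); // [ '11', '1', '1' ]
console.log(firstResult.index);       // 0
console.log(none);                    // null
```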

split()

Passing a regular expression to split() makes it much more flexible. For example, you can specify a separator that allows any amount of whitespace on either side:

'1   ,  2 , 3  ,   4  ,  5'.split(/\s*,\s*/) // => ['1','2','3','4','5']
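
A few more split() calls, sketched with made-up inputs. The last one shows that when the separator pattern contains a capturing group, the captured text is included in the result.

```javascript
const byComma = '1 , 2,3  ,   4'.split(/\s*,\s*/);
const byDash = '2020-11-28'.split(/-/);
const byDigit = 'a1b2c'.split(/\d/);
// A capturing group in the separator keeps the separators in the output.
const keepDigits = 'a1b2c'.split(/(\d)/);

console.log(byComma);    // [ '1', '2', '3', '4' ]
console.log(byDash);     // [ '2020', '11', '28' ]
console.log(byDigit);    // [ 'a', 'b', 'c' ]
console.log(keepDigits); // [ 'a', '1', 'b', '2', 'c' ]
```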
