Notes on regular expressions (1)



A regular expression is a pattern, written in advance, that describes which sequences of characters in a string should match.

Basic syntax


In a regular expression pattern, some characters have special meanings; these are called metacharacters. Each metacharacter listed below matches a single character.

\w  Matches any one upper- or lower-case English letter, digit from 0 to 9, or underscore; equivalent to [a-zA-Z0-9_]

\W  Matches any one character that is not an English letter, a digit from 0 to 9, or an underscore; equivalent to [^a-zA-Z0-9_]

\s  Matches any whitespace character; equivalent to [ \f\n\r\t\v]

\S  Matches any non-whitespace character; equivalent to [^\s]

\d  Matches any single digit from 0 to 9; equivalent to [0-9]

\D  Matches any single character that is not a digit from 0 to 9; equivalent to [^0-9]

[\u4e00-\u9fa5]  Matches any single common Chinese character (the range is written as Unicode code points)
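To make the character classes above concrete, here is a minimal sketch in Python (a language chosen for these examples, not one the notes prescribe). Python writes patterns as raw strings rather than between slashes, and the sample strings are made up for illustration. The re.ASCII flag is an assumption added so that \w, \d and \s behave like the ASCII sets listed above; without it, Python also matches Unicode letters and digits.

```python
import re

sample = "user_01 住在 Room 42\t!"

# re.ASCII restricts \w, \W, \d and \s to the ASCII sets described above.
word_chars = re.findall(r"\w", "a_1 é!", flags=re.ASCII)
non_word_chars = re.findall(r"\W", "a_1 é!", flags=re.ASCII)
digits = re.findall(r"\d", sample, flags=re.ASCII)
whitespace = re.findall(r"\s", sample, flags=re.ASCII)

# The Unicode range picks out the two Chinese characters in the sample.
chinese = re.findall(r"[\u4e00-\u9fa5]", sample)

print(word_chars)      # ['a', '_', '1']
print(non_word_chars)  # [' ', 'é', '!']
print(digits)          # ['0', '1', '4', '2']
print(whitespace)      # [' ', ' ', ' ', '\t']
print(chinese)         # ['住', '在']
```

Each call returns one list entry per matched character, which shows that every one of these classes consumes exactly one character at a time.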


The most common delimiter for a regular expression is a pair of forward slashes, as in /regex/

The concept of atoms

Atoms in regular expressions are divided into visible atoms and invisible atoms.

The characters in [ \f\n\r\t\v] are invisible atoms; all other characters are visible atoms.


*  Matches the preceding atom zero or more times; equivalent to {0,}

?  Matches the preceding atom zero or one time; equivalent to {0,1}

{n}  Matches the preceding atom exactly n times

{n,}  Matches the preceding atom at least n times

{n,m}  Matches the preceding atom at least n and at most m times

+  Matches the preceding atom at least once; equivalent to {1,}
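The quantifiers above can be checked one at a time. The following is a small sketch in Python; the helper name full and the test strings are hypothetical, chosen only for illustration. It uses re.fullmatch so that the whole string has to fit the pattern.

```python
import re

def full(pattern, text):
    """Return True when the entire text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

# * : zero or more of the preceding atom
star = [full(r"ab*", s) for s in ("a", "ab", "abbb")]
# + : one or more of the preceding atom
plus = [full(r"ab+", s) for s in ("a", "ab", "abbb")]
# ? : zero or one of the preceding atom
optional = [full(r"colou?r", s) for s in ("color", "colour", "colouur")]
# {n} : exactly n, {n,} : at least n, {n,m} : between n and m
exact = [full(r"\d{3}", s) for s in ("12", "123", "1234")]
at_least = [full(r"\d{2,}", s) for s in ("1", "12", "12345")]
between = [full(r"\d{2,4}", s) for s in ("1", "123", "12345")]

print(star)      # [True, True, True]
print(plus)      # [False, True, True]
print(optional)  # [True, True, False]
print(exact)     # [False, True, False]
print(at_least)  # [False, True, True]
print(between)   # [False, True, False]
```

Note that a quantifier applies only to the atom directly before it: in ab*, the * repeats b, not ab.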


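\b  Matches a word boundary

^  Anchors the match at the start of the string: the string must begin with the pattern that follows

$  Anchors the match at the end of the string: the string must end with the pattern that precedes it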
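The anchors above match positions rather than characters. A minimal Python sketch follows, with made-up sample strings, assuming the default (non-multiline) mode.

```python
import re

# \b : "cat" only counts when it stands alone as a word.
whole_words = re.findall(r"\bcat\b", "cat concat cat's category")

# ^ : the string must start with "Hello".
starts = [re.search(r"^Hello", s) is not None
          for s in ("Hello world", "Say Hello")]

# $ : the string must end with "world".
ends = [re.search(r"world$", s) is not None
        for s in ("Hello world", "world peace")]

# ^ and $ together force the whole string to consist of digits.
all_digits = [re.search(r"^\d+$", s) is not None
              for s in ("12345", "123a")]

print(whole_words)  # ['cat', 'cat']
print(starts)       # [True, False]
print(ends)         # [True, False]
print(all_digits)   # [True, False]
```

In Python, $ also matches just before a single trailing newline, so strict validation code often uses re.fullmatch instead of ^...$.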

Capture groups

In a regular expression, parentheses () combine several units (characters or sub-expressions) so that they are treated as a single unit.

Groups are divided into capturing groups and non-capturing groups. A capturing group, written (...), remembers the text it matched; a non-capturing group, written (?:...), only groups.
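The difference between the two kinds of group is easiest to see in a short example. This Python sketch uses a hypothetical date string and greeting; it is an illustration under those assumptions, not a recommended date parser.

```python
import re

# Three capturing groups pull out the parts of a date.
date_match = re.search(r"(\d{4})-(\d{2})-(\d{2})", "Released on 2024-03-15.")
whole = date_match.group(0)      # the entire match
parts = date_match.groups()      # one entry per capturing group

# A group is a single unit, so a quantifier can repeat all of it.
repeated = re.fullmatch(r"(ab)+", "ababab") is not None

# (?:...) groups the alternatives without capturing them,
# so only the name ends up in groups().
title_match = re.search(r"(?:Mr|Ms)\. (\w+)", "Dear Ms. Lee,")
names = title_match.groups()

print(whole)     # 2024-03-15
print(parts)     # ('2024', '03', '15')
print(repeated)  # True
print(names)     # ('Lee',)
```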


Pattern modifiers

Greedy / lazy matching, ignore case, ignore whitespace
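As a rough illustration of these three modifiers, the sketch below uses Python, where lazy matching is written by adding ? after a quantifier and the other two are the flags re.IGNORECASE and re.VERBOSE. Other engines spell these differently (for example as /i and /x suffixes). The sample strings and the phone pattern are invented for the example.

```python
import re

html = "<b>one</b><b>two</b>"

# Greedy: .* grabs as much as possible, so both tags fall into one match.
greedy = re.findall(r"<b>.*</b>", html)
# Lazy: .*? stops at the first closing tag it can.
lazy = re.findall(r"<b>.*?</b>", html)

# Ignore case: upper- and lower-case letters are treated as equal.
ignore_case = re.findall(r"python", "Python PYTHON python", flags=re.IGNORECASE)

# Ignore whitespace: the pattern can be laid out and commented freely.
phone = re.compile(r"""
    (\d{3})   # first three digits
    [- ]?     # optional separator (whitespace inside [] is kept)
    (\d{4})   # last four digits
""", flags=re.VERBOSE)
phone_parts = phone.fullmatch("555-1234").groups()

print(greedy)       # ['<b>one</b><b>two</b>']
print(lazy)         # ['<b>one</b>', '<b>two</b>']
print(ignore_case)  # ['Python', 'PYTHON', 'python']
print(phone_parts)  # ('555', '1234')
```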

Usage scenarios

Form validation, template engines
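Both scenarios can be sketched in a few lines of Python. Everything here is hypothetical: the username rule (a letter or underscore followed by 2 to 15 word characters), the {{ name }} placeholder syntax, and the function names is_valid_username and render are assumptions made for illustration, not a real library's API.

```python
import re

# Form validation: an assumed username rule, 3 to 16 characters long.
USERNAME = re.compile(r"[A-Za-z_]\w{2,15}", flags=re.ASCII)

def is_valid_username(value):
    """Return True when the whole value fits the assumed username rule."""
    return USERNAME.fullmatch(value) is not None

# Template engine: replace {{ key }} placeholders with values from a dict.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")

def render(template, context):
    """Substitute known placeholders; leave unknown ones untouched."""
    def substitute(match):
        key = match.group(1)
        return str(context.get(key, match.group(0)))
    return PLACEHOLDER.sub(substitute, template)

checks = [is_valid_username(v) for v in ("alice_01", "ab", "1alice")]
message = render("Hello, {{ name }}! You have {{count}} messages.",
                 {"name": "Ada", "count": 3})

print(checks)   # [True, False, False]
print(message)  # Hello, Ada! You have 3 messages.
```

The template sketch combines several ideas from these notes: escaped literal braces, \s* for optional whitespace, and a capturing group that extracts the placeholder name.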