Regular expressions – theoretical basis
Moved from my personal blog. Original post: Regular expressions – theoretical basis
- What is a regular expression?
  It can be thought of as a rule: an object that describes a pattern of characters. Letters and digits in a regular expression match themselves literally. Regular expressions are a powerful string-matching tool.
- How do you build a regular expression?
  - Use the RegExp() constructor to create a RegExp object.
  - Use the literal syntax, in which the pattern is written between a pair of slashes (/).

  Example: var reg = /s$/; and var reg = new RegExp("s$");
  The two have exactly the same effect: both match any string that ends with the letter "s".
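As a quick check of that equivalence, the following minimal sketch builds the pattern both ways and compares the results. The variable names and the sample strings are illustrative assumptions, not part of the original post.

```javascript
// The same pattern built two ways: literal syntax and the RegExp() constructor.
const literal = /s$/;
const constructed = new RegExp("s$");

// Both objects carry the same pattern text in their source property.
const sameSource = literal.source === constructed.source;

// Both match strings that end with the letter "s" and reject the others.
const endsWithS = [literal.test("expressions"), constructed.test("expressions")];
const endsWithX = [literal.test("regex"), constructed.test("regex")];

console.log(sameSource); // true
console.log(endsWithS);  // [ true, true ]
console.log(endsWithX);  // [ false, false ]
```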
Regular expression syntax for describing text patterns
1. Literal characters
Some non-alphabetic characters are matched through escape sequences that begin with a backslash (\).
Character | Match |
---|---|
Alphabetic and numeric characters | Themselves |
\0 | NUL character |
\t | Tab |
\n | Line feed (newline) |
\v | Vertical tab |
\f | Form feed |
\r | Carriage return |
\xnn | The Latin character specified by the hexadecimal number nn; for example, \x0A is equivalent to \n |
\uxxxx | The Unicode character specified by the hexadecimal number xxxx; for example, \u0009 is equivalent to \t |
\cX | The control character ^X; for example, \cJ is equivalent to the newline \n |
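The equivalences in the table can be checked directly. This is a small sketch in which each escape sequence is tested against the character it is said to stand for; the object and key names are made up for the illustration.

```javascript
// Each escape sequence is tested against the character the table says it matches.
const escapeChecks = {
  nul: /\0/.test("\u0000"),     // \0 matches the NUL character
  hex: /\x0A/.test("\n"),       // \x0A is the same character as \n
  unicode: /\u0009/.test("\t"), // \u0009 is the same character as \t
  control: /\cJ/.test("\n"),    // \cJ is the same character as \n
};

console.log(escapeChecks); // every entry is true
```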
- Punctuation with special meaning: ^ $ . * + ? = ! : | \ / ( ) [ ] { }

To match any of these characters literally in a regular expression, prefix it with a backslash (\). Other punctuation (such as @ and quotation marks) has no special meaning and simply matches itself.
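A minimal sketch of the difference between an escaped and an unescaped special character follows. The helper escapeRegExp is hypothetical (it is not a built-in and does not come from the original post); it simply puts a backslash in front of every character from the list above.

```javascript
// Unescaped, "." matches any character; escaped, it matches only a literal dot.
const dotAsWildcard = /1.5/.test("145");   // true
const dotEscapedMiss = /1\.5/.test("145"); // false
const dotEscapedHit = /1\.5/.test("1.5");  // true

// "@" has no special meaning, so it needs no backslash.
const atSign = /user@host/.test("user@host"); // true

// Hypothetical helper: backslash-escape every punctuation character
// that has a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[\^$.*+?=!:|\\\/()\[\]{}]/g, "\\$&");
}

const escaped = escapeRegExp("1+1=2?");
const literalSearch = new RegExp(escaped).test("Is 1+1=2? Yes.");
const unescapedSearch = new RegExp("1+1=2?").test("Is 1+1=2? Yes.");

console.log(escaped);         // 1\+1\=2\?
console.log(literalSearch);   // true
console.log(unescapedSearch); // false: "+" and "?" act as repetition characters
```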
2. Character classes
Square brackets, alternatives ("or"), ranges
A character class is formed by placing literal characters inside square brackets; it matches any one character that it contains. Character classes can use a hyphen to indicate a range of characters.
Character | Meaning |
---|---|
[...] | Any one character between the square brackets |
[^...] | Any one character not between the square brackets |
. | Any character except a newline or another Unicode line terminator |
\w | Any ASCII word character; equivalent to [a-zA-Z0-9_] |
\W | Any character that is not an ASCII word character; equivalent to [^a-zA-Z0-9_] |
\s | Any Unicode whitespace character |
\S | Any character that is not Unicode whitespace; note that \w and \S are not the same thing |
\d | Any ASCII digit; equivalent to [0-9] |
\D | Any character other than an ASCII digit; equivalent to [^0-9] |
[\b] | A literal backspace (special case) |
- Examples:
  - /[abc]/: matches any one of the letters "a", "b" or "c"
  - /[a-z]/: matches any lowercase letter of the Latin alphabet
  - /[\s\d]/: matches any whitespace character or digit
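The character classes above can be tried out as follows. This is only a sketch; the sample string "a1 B2" and the variable names are assumptions chosen for the illustration.

```javascript
const sample = "a1 B2";

// A class matches any single character it contains.
const anyOfAbc = /[abc]/.test("cat");

// Ranges, combined escapes and negation, collected with the g modifier.
const lowercase = sample.match(/[a-z]/g);
const spaceOrDigit = sample.match(/[\s\d]/g);
const notLowercase = sample.match(/[^a-z]/g);

// \w covers letters, digits and the underscore.
const underscoreIsWord = /\w/.test("_");

// Inside square brackets, \b means a literal backspace character.
const backspace = /[\b]/.test("\b");

console.log(anyOfAbc);         // true
console.log(lowercase);        // [ 'a' ]
console.log(spaceOrDigit);     // [ '1', ' ', '2' ]
console.log(notLowercase);     // [ '1', ' ', 'B', '2' ]
console.log(underscoreIsWord); // true
console.log(backspace);        // true
```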
3. Repetition
Tokens that specify how many times an item may be repeated
Character | Meaning |
---|---|
{n,m} | Match the previous item at least n times but no more than m times |
{n,} | Match the previous item n or more times |
{n} | Match the previous item exactly n times |
? | Match the previous item 0 or 1 times (that is, the previous item is optional); equivalent to {0,1} |
+ | Match the previous item 1 or more times; equivalent to {1,} |
* | Match the previous item 0 or more times; equivalent to {0,} |
- Examples:
  - /\d{2,4}/: matches between two and four digits
  - /\w{3}\d?/: matches exactly three word characters followed by an optional digit
  - /\s+java\s+/: matches the string "java" with one or more whitespace characters before and after it
  - /[^(]*/: matches zero or more characters that are not a left parenthesis "("
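The repetition examples can be exercised like this. It is a sketch only: the input strings and variable names are invented for the illustration.

```javascript
// {2,4}: runs of two to four digits; a longer run is split into several matches.
const digitRuns = "7 42 2024 123456".match(/\d{2,4}/g);

// \w{3}\d?: three word characters, then a digit if one is present.
const wordThenDigit = "abc1".match(/\w{3}\d?/)[0];
const wordNoDigit = "abc!".match(/\w{3}\d?/)[0];

// \s+java\s+: "java" must have whitespace on both sides.
const spacedJava = /\s+java\s+/.test("I like java a lot");
const bareJava = /\s+java\s+/.test("java");

// [^(]*: everything up to (but not including) the first "(".
const beforeParen = "sum(a, b)".match(/[^(]*/)[0];

console.log(digitRuns);     // [ '42', '2024', '1234', '56' ]
console.log(wordThenDigit); // abc1
console.log(wordNoDigit);   // abc
console.log(spacedJava);    // true
console.log(bareJava);      // false
console.log(beforeParen);   // sum
```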
- Be careful when using "*" and "?": because these characters can match zero occurrences of the preceding item, they allow a pattern to match nothing at all. For example:
  - /a*/ matches the string "bbbb", because that string contains zero occurrences of "a".
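The zero-length match is easy to observe. In this short sketch (variable names are illustrative), the match succeeds but the matched text is the empty string at position 0.

```javascript
// /a*/ succeeds on "bbbb" by matching zero characters at the very start.
const emptyMatch = "bbbb".match(/a*/);
const matchedText = emptyMatch[0];
const matchedAt = emptyMatch.index;

// /a+/ requires at least one "a", so it fails on the same string.
const plusNeedsOne = /a+/.test("bbbb");

console.log(JSON.stringify(matchedText)); // ""
console.log(matchedAt);                   // 0
console.log(plusNeedsOne);                // false
```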
- The repetition characters listed in the table above are "greedy": they match as many times as possible while still allowing the rest of the regular expression to match.
- For non-greedy matching, follow the repetition character with a question mark: ??, +?, *? or {1,5}?. Examples:
  - /a+/: against "aaa", matches all three characters
  - /a+?/: against "aaa", matches only the first "a", because it matches as few characters as possible
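A minimal sketch comparing greedy and non-greedy repetition on the same input. The last case is a common surprise: matching always starts at the earliest possible position, so a non-greedy pattern can still produce a long match. Variable names and inputs are assumptions for the illustration.

```javascript
// Greedy versus non-greedy on "aaa".
const greedy = "aaa".match(/a+/)[0];
const lazy = "aaa".match(/a+?/)[0];

// The same contrast with an explicit range.
const greedyRange = "aaaaaaa".match(/a{1,5}/)[0];
const lazyRange = "aaaaaaa".match(/a{1,5}?/)[0];

// Non-greedy does not mean "shortest overall": the match starts at the
// first "a", and a+? must then grow until a "b" can follow.
const lazyStillLong = "aaab".match(/a+?b/)[0];

console.log(greedy);        // aaa
console.log(lazy);          // a
console.log(greedyRange);   // aaaaa
console.log(lazyRange);     // a
console.log(lazyStillLong); // aaab
```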
4. Alternation, grouping and references
This covers specifying alternatives, grouping subexpressions, and the special characters that refer back to an earlier subexpression.
- Alternatives are tried from left to right until a match is found. If the alternative on the left matches, the alternatives to its right are ignored.
- Parentheses serve three purposes in regular expressions:
  - They group separate items into a single subexpression, so that the items can be treated as one unit by \|, *, + or ?, and so on
  - They define subpatterns within the complete pattern
  - They allow a later part of the same regular expression to refer back to an earlier subexpression
Character | Meaning |
---|---|
\| | Alternation: match either the subexpression to the left or the subexpression to the right |
(...) | Grouping: combine items into a single unit that can be used with \|, *, +, ? and so on, and remember the characters that match the group for use in later references |
(?:...) | Grouping only: combine items into a single unit, but do not remember the characters that match the group |
\n | Match the same characters that were matched by group number n. Groups are subexpressions in parentheses (possibly nested). Group numbers are assigned by counting left parentheses from left to right |
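The sketch below illustrates the three grouping behaviours from the table: a back-reference (\1), a non-capturing group, and the numbering of nested groups. The quote-matching pattern and all names are illustrative assumptions, not the post's own code.

```javascript
// \1 must match the same text as group 1, so the closing quotation mark
// has to be the same kind as the opening one.
const sameQuote = /(['"])[^'"]*\1/;
const matchedPair = sameQuote.test("'single'");
const mixedPair = sameQuote.test("\"mixed'");

// (?:...) groups without remembering, so "Script" ends up in group 1.
const nonCapturing = Array.from("JavaScript".match(/(?:[Jj]ava)([Ss]cript)/));

// Groups are numbered by the position of their left parenthesis.
const nested = Array.from("ab".match(/((a)(b))/));

console.log(matchedPair);  // true
console.log(mixedPair);    // false
console.log(nonCapturing); // [ 'JavaScript', 'Script' ]
console.log(nested);       // [ 'ab', 'ab', 'a', 'b' ]
```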
- Examples:
  - /ab|cd|ef/: matches "ab", "cd" or "ef"
  - /\d{3}|[a-z]{4}/: matches either three digits or four lowercase letters
  - /[a-z]+(\d+)/: matches one or more lowercase letters followed by one or more digits. Because the digits are wrapped in parentheses, they can be extracted from the match afterwards
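These examples, together with the left-to-right rule for alternatives, can be sketched as follows. The input strings and variable names are made up for the illustration.

```javascript
// Alternatives are tried left to right: "a" wins even though "ab" would also fit.
const leftWins = "ab".match(/a|ab/)[0];

// Any one of the alternatives may match.
const second = "cd".match(/ab|cd|ef/)[0];
const threeDigits = "12345".match(/\d{3}|[a-z]{4}/)[0];
const fourLetters = "abcd".match(/\d{3}|[a-z]{4}/)[0];

// The parenthesized digits are available separately from the whole match.
const captured = "item42".match(/[a-z]+(\d+)/);
const wholeMatch = captured[0];
const digitsOnly = captured[1];

console.log(leftWins);    // a
console.log(second);      // cd
console.log(threeDigits); // 123
console.log(fourLetters); // abcd
console.log(wholeMatch);  // item42
console.log(digitsOnly);  // 42
```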
5. Specifying the match position
These elements are called regular expression anchors; they pin the pattern to a specific position in the string being searched.
Character | Meaning |
---|---|
^ | Match the beginning of the string and, in multiline searches, the beginning of a line |
$ | Match the end of the string and, in multiline searches, the end of a line |
\b | Match a word boundary: the position between a \w character and a \W character, or between a \w character and the beginning or end of the string (note: [\b] matches the backspace character) |
\B | Match a position that is not a word boundary |
- Examples:
  - /^JavaScript$/: matches the word "JavaScript" only when it is the entire string
  - /\B[Ss]cript/: matches "JavaScript" and "postscript", but not "script" or "Scripting"
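A small sketch of how the anchors behave on a few sample words. The \bjava\b pattern and the inputs are added here as assumptions for illustration.

```javascript
// ^ and $ together require the whole string to be "JavaScript".
const wholeString = /^JavaScript$/.test("JavaScript");
const withExtra = /^JavaScript$/.test("JavaScript rocks");

// \B requires "script" to start inside a word, not at a word boundary.
const insideWord = ["JavaScript", "postscript", "script", "Scripting"]
  .map((word) => /\B[Ss]cript/.test(word));

// \b on both sides matches "java" only as a separate word.
const separateWord = ["Java is fun", "JavaScript"]
  .map((text) => /\bjava\b/i.test(text));

console.log(wholeString);  // true
console.log(withExtra);    // false
console.log(insideWord);   // [ true, true, false, false ]
console.log(separateWord); // [ true, false ]
```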
6. Modifiers
Modifiers specify high-level matching rules. They are written outside the slashes, after the second slash.
Character | Meaning |
---|---|
i | Perform case-insensitive matching |
g | Perform a global match: find all matches rather than stopping after the first one |
m | Multiline mode: ^ matches the beginning of a line as well as the beginning of the string, and $ matches the end of a line as well as the end of the string |
- Example:
  - /java$/im: matches "java" as well as "Java\nis fun"
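The effect of each modifier can be seen by switching it on and off. This is a sketch with illustrative inputs and names.

```javascript
// With i and m, "java" may end a line rather than the whole string.
const singleLine = /java$/im.test("java");
const endOfFirstLine = /java$/im.test("Java\nis fun");

// Without m, $ only matches at the very end of the string.
const withoutMultiline = /java$/i.test("Java\nis fun");

// g collects every match; i ignores case.
const allCases = "Java java JAVA".match(/java/gi);
const exactCase = "Java java JAVA".match(/java/g);

console.log(singleLine);       // true
console.log(endOfFirstLine);   // true
console.log(withoutMultiline); // false
console.log(allCases);         // [ 'Java', 'java', 'JAVA' ]
console.log(exactCase);        // [ 'java' ]
```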
String methods for pattern matching
Regular expressions are used to search and parse strings.
1. search()
- Searches a string. It takes a regular expression as its argument and returns the starting position of the first matching substring, or -1 if no match is found.
- Example:
  "JavaScript".search(/script/i); // returns 4
- search() does not support global searches, so the g modifier is ignored.
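A minimal sketch of search() covering the three behaviours described above: a hit, a miss, and the ignored g modifier. Variable names are illustrative.

```javascript
// Position of the first match, ignoring case.
const caseInsensitive = "JavaScript".search(/script/i);

// Without the i modifier, "script" does not match "Script".
const caseSensitive = "JavaScript".search(/script/);

// The g modifier makes no difference: the first "a" is always reported.
const globalIgnored = "JavaScript".search(/a/g);

console.log(caseInsensitive); // 4
console.log(caseSensitive);   // -1
console.log(globalIgnored);   // 1
```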
2. replace()
- Performs search-and-replace. It takes two parameters: the first is a regular expression, and the second is the replacement string.
- If the first parameter is a string, replace() searches for that string literally.
- If the replacement string contains a $ followed by a digit, replace() replaces those two characters with the text that matched the corresponding subexpression.

Example:
// Replace every occurrence of "javascript", whatever its case, with the correctly capitalized "JavaScript"
text.replace(/javascript/gi, "JavaScript");
// A quotation starts with a quotation mark and ends with a quotation mark;
// the text in between cannot contain quotation marks
var quote = /"([^"]*)"/g;
// Replace English (straight) quotation marks with Chinese (curly) ones, leaving the quoted text (stored in $1) unchanged
text.replace(quote, '“$1”');
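The snippets above assume a variable named text already exists. The following self-contained sketch supplies a sample sentence (an assumption made for the illustration) and also shows the difference between a string and a regular expression as the first parameter.

```javascript
// Sample input; the sentence is invented for this example.
const text = 'He said "stop" and then "go", and praised javascript.';

// Global, case-insensitive replacement.
const fixedCase = text.replace(/javascript/gi, "JavaScript");

// $1 in the replacement stands for the text captured by the first group.
const quote = /"([^"]*)"/g;
const requoted = text.replace(quote, "“$1”");

// A string pattern is searched literally and only its first occurrence is replaced.
const firstOnly = "a.b.c".replace(".", "-");
const everyDot = "a.b.c".replace(/\./g, "-");

console.log(fixedCase); // He said "stop" and then "go", and praised JavaScript.
console.log(requoted);  // He said “stop” and then “go”, and praised javascript.
console.log(firstOnly); // a-b.c
console.log(everyDot);  // a-b-c
```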
3. match()
- Takes a regular expression as its only parameter and returns an array containing the results of the match.
  "1 plus 2 equals 3".match(/\d+/g); // returns ["1", "2", "3"]
- If the regular expression does not have the g modifier, match() does not perform a global search; it only looks for the first match. Even so, match() still returns an array. In this case, the first element of the array is the matching string, and the remaining elements are the parenthesized subexpressions of the regular expression.
- So if match() returns an array a, then a[0] holds the complete match, a[1] holds the substring that matched the first parenthesized expression, and so on. To be consistent with replace(), a[n] holds the content of $n.
- Example: parsing a URL

var url = /(\w+):\/\/([\w.]+)\/(\S*)/;
var text = "Visit my blog at http://www.example.com/~david";
var result = text.match(url);
if (result != null) {
  var fullurl = result[0];  // contains "http://www.example.com/~david"
  var protocol = result[1]; // contains "http"
  var host = result[2];     // contains "www.example.com"
  var path = result[3];     // contains "~david"
}
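The contrast between a global and a non-global match() can be sketched as follows. The second pattern and the variable names are assumptions added for the illustration; note that match() returns null, not an empty array, when nothing matches.

```javascript
const sentence = "1 plus 2 equals 3";

// With g: every match, but no information about groups.
const allNumbers = sentence.match(/\d+/g);

// Without g: only the first match, followed by its captured groups.
const firstOnly = sentence.match(/(\d+) plus (\d+)/);
const firstParts = Array.from(firstOnly);
const firstIndex = firstOnly.index;

// No match at all yields null.
const noMatch = "no digits here".match(/\d+/g);

console.log(allNumbers); // [ '1', '2', '3' ]
console.log(firstParts); // [ '1 plus 2', '1', '2' ]
console.log(firstIndex); // 0
console.log(noMatch);    // null
```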
4. split()
- Splits the string on which it is called into an array of substrings, using its argument as the separator.
- The argument can be a string:
  "123,456,789".split(","); // returns ["123", "456", "789"]
- The argument can be a regular expression:
  "1, 2, 3, 4, 5".split(/\s*,\s*/); // returns ["1", "2", "3", "4", "5"]
  Here the separator is "," with any amount of whitespace allowed on either side.
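A short sketch of why the regular expression form is useful: a plain "," separator leaves stray whitespace in the pieces, while /\s*,\s*/ absorbs it. The inputs and names are illustrative.

```javascript
// String separator on clean input.
const plain = "123,456,789".split(",");

// String separator on untidy input keeps the surrounding spaces.
const untidy = "1, 2 ,3".split(",");

// Regular expression separator swallows whitespace around each comma.
const tidy = "1, 2 ,3 , 4,5".split(/\s*,\s*/);

console.log(plain);  // [ '123', '456', '789' ]
console.log(untidy); // [ '1', ' 2 ', '3' ]
console.log(tidy);   // [ '1', '2', '3', '4', '5' ]
```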
RegExp object
- The RegExp() constructor creates a new RegExp object. It takes two parameters, the second of which is optional.
- The first parameter is the body of the regular expression, that is, the text that would appear between the slashes in /...../. Because both string literals and regular expressions use "\" as the escape prefix, every "\" in the pattern must be written as "\\" when the pattern is passed as a string.
- The second, optional parameter specifies the modifiers of the regular expression: i, g, m, or a combination of them.

Example:
var reg = new RegExp("\\d{5}", "g"); // match 5 digits, in global mode
alert(reg.test("1J2a35786")); // true
alert(reg.test("1J2a356"));   // false
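The doubled backslash is the usual stumbling block with the constructor. This sketch (names are illustrative) compares a correctly escaped string, the equivalent literal, and a string in which the backslash was not doubled.

```javascript
// "\\d{5}" in a string literal is the five-character pattern \d{5}.
const fromString = new RegExp("\\d{5}", "g");
const fromLiteral = /\d{5}/g;
const samePattern = fromString.source === fromLiteral.source;

// With a single backslash the string is just "d{5}", which matches five letter d's.
const forgotten = new RegExp("\d{5}");
const forgottenSource = forgotten.source;
const forgottenOnDigits = forgotten.test("12345");
const forgottenOnLetters = forgotten.test("ddddd");

console.log(samePattern);        // true
console.log(forgottenSource);    // d{5}
console.log(forgottenOnDigits);  // false
console.log(forgottenOnLetters); // true
```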
RegExp properties
Property name | Type | Meaning |
---|---|---|
source | Read-only string | The text of the regular expression |
global | Read-only Boolean | Whether the regular expression has the g modifier |
ignoreCase | Read-only Boolean | Whether the regular expression has the i modifier |
multiline | Read-only Boolean | Whether the regular expression has the m modifier |
lastIndex | Read/write integer | If the pattern has the g modifier, the position in the string at which the next search begins |
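These properties can be read directly from a RegExp object. The sketch below also traces how lastIndex moves as test() is called repeatedly on a pattern with the g modifier; the sample text and names are assumptions for the illustration.

```javascript
const re = /\d+/gi;

// The read-only properties reflect the pattern text and its modifiers.
const flags = {
  source: re.source,
  global: re.global,
  ignoreCase: re.ignoreCase,
  multiline: re.multiline,
};

// With g set, each successful test() leaves lastIndex just past the match,
// and a failed test() resets it to 0.
const text = "10 apples, 20 pears";
const trace = [];
trace.push([re.test(text), re.lastIndex]);
trace.push([re.test(text), re.lastIndex]);
trace.push([re.test(text), re.lastIndex]);

console.log(flags); // { source: '\\d+', global: true, ignoreCase: true, multiline: false }
console.log(trace); // [ [ true, 2 ], [ true, 13 ], [ false, 0 ] ]
```

Because lastIndex is writable, setting re.lastIndex = 0 before reusing a global pattern on a new string avoids starting the search in the middle of that string.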