JavaScript Regular Expressions (RegExp): An Introductory Tutorial

Date: 2020-9-22

What is a regular expression?

A regular expression (also called a regex or regexp) is a concept from computer science: a pattern that describes a set of strings.

What’s the use of regular expressions?

It is often used to search for and replace text that matches a pattern.

Regular expressions are the best choice for matching special characters, or characters that follow particular combination rules.

Escape character “\”

Example: normally a double quotation mark (") cannot appear inside a string that is itself delimited by double quotation marks, but the escape character makes it possible.

Putting the escape character “\” in front of the inner " makes the statement valid. In the figure, the escape character plus the double quotation mark is turned into a literal quotation-mark character, so with var str = "asdf\"ghjs"; the browser outputs asdf"ghjs.
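A minimal sketch of the escape character in the two places it shows up in JavaScript: inside a string literal (the example above) and inside a regular expression literal. The variable names are illustrative only.

```javascript
// Inside a double-quoted string, \" produces a literal double quotation mark.
const str = "asdf\"ghjs";
console.log(str); // asdf"ghjs

// Inside a regular expression, "\" likewise turns a special character into
// a literal one: /\./ matches only a real ".", while /./ matches almost anything.
const hasDot = /\./.test("abc");
const hasAnyChar = /./.test("abc");
console.log(hasDot, hasAnyChar); // false true
```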

String newline character

Example: breaking the string var str = "asdfghj" onto a new line.

Result:
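The original result figure is not reproduced here, so this is a sketch of two common ways to get a line break into or across a string, assuming the example text asdfghj is split after asdf.

```javascript
// "\n" puts a real newline character into the string value.
const withNewline = "asdf\nghj";
console.log(withNewline);
// asdf
// ghj

// A "\" at the very end of a source line continues the string literal on the
// next line; the resulting value contains no newline at all.
const continued = "asdf\
ghj";
console.log(continued); // asdfghj
```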

How to create regular expressions

1. Regular expression literal

var reg = /.../; the pattern is written between the two slashes.

var reg = /abc/; defines the rule “abc”. Given var str = "abcdef";, reg.test(str) checks whether str contains a substring that matches reg: it returns true if it does and false if it does not.

The modifiers (flags) i, m and g are written after the closing slash, for example /abc/gi.
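A minimal sketch of the literal form described above; reg and str mirror the names in the text.

```javascript
// The pattern goes between the slashes; flags follow the closing slash.
const reg = /abc/;
const str = "abcdef";
console.log(reg.test(str));   // true: "abcdef" contains "abc"
console.log(reg.test("xyz")); // false

const regWithFlags = /abc/gi;
console.log(regWithFlags.flags); // gi
```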

2. new RegExp()

The effect is the same as with a literal: after var reg = new RegExp("abc");, reg.test(str) returns true if str contains a match and false if it does not.

Flags are passed as the second argument, for example new RegExp("abc", "igm").
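A short sketch comparing the constructor with the literal form. One detail worth knowing: because the pattern is an ordinary string, a regex backslash has to be written twice.

```javascript
// new RegExp(pattern, flags) builds the same kind of object as the literal form.
const reg = new RegExp("abc");
console.log(reg.test("abcdef")); // true

// Flags are passed as a string in the second argument.
const regWithFlags = new RegExp("abc", "gi");
console.log(String(regWithFlags)); // /abc/gi

// The string "\\d+" becomes the pattern \d+ (one or more digits).
const digits = new RegExp("\\d+");
console.log(digits.source);          // \d+
console.log(digits.test("room 42")); // true
```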

The three modifiers (flags) of a regular expression: i, m, g

i: case-insensitive matching

m: multiline matching

var reg = /^a/; matches an a only at the very start of the string, so if str does not begin with a (for example, its a only appears after a newline), nothing matches. With reg = /^a/m; the newline character is recognized: ^ then also matches right after each newline, so the text before a newline and the text after it are treated as separate lines.

g: perform a global match (find all matches instead of stopping after the first one)
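A minimal sketch of the three flags; the sample strings are made up for illustration.

```javascript
const str = "bcd\nabc";

// Without m, ^ only matches at the very start of the whole string.
console.log(/^a/.test(str));  // false
// With m, ^ also matches right after each newline.
console.log(/^a/m.test(str)); // true

// i ignores letter case.
console.log(/ABC/.test("abc"));  // false
console.log(/ABC/i.test("abc")); // true

// g acts on every match instead of stopping at the first one.
console.log("a1a2a3".replace(/a/, "x"));  // x1a2a3
console.log("a1a2a3".replace(/a/g, "x")); // x1x2x3
```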

RegExp method: reg.test(str) returns true or false.

String method: str.match(reg) returns the matched text itself, which is more intuitive to read.
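A quick sketch contrasting the two methods on the same illustrative string.

```javascript
const str = "a1b2c3";

// test() only answers yes or no.
console.log(/\d/.test(str)); // true

// match() returns the matched text itself.
const all = str.match(/\d/g);  // with g: every match
const first = str.match(/\d/); // without g: the first match plus details
console.log(all);                   // [ '1', '2', '3' ]
console.log(first[0], first.index); // 1 1
console.log("abc".match(/\d/));     // null: no match at all
```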

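Expressions (character sets)

[abc]: matches any one character listed inside the brackets

[^abc]: matches any one character that is NOT a, b or c

[0-9]: any digit from 0 to 9

[A-Z]: any uppercase letter from A to Z

[a-z]: any lowercase letter from a to z

[A-Za-z]: any letter, uppercase A-Z or lowercase a-z

|: means “or”; for example, cat|dog matches either cat or dog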
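A small sketch exercising each character set from the list above on made-up input.

```javascript
const inSet = "cat".match(/[abc]/g);     // characters that are a, b or c
const notInSet = "cat".match(/[^abc]/g); // characters that are NOT a, b or c

const mixed = "a1B2";
const digits = mixed.match(/[0-9]/g);
const upper = mixed.match(/[A-Z]/g);
const lower = mixed.match(/[a-z]/g);
const letters = mixed.match(/[A-Za-z]/g);

// | means "or": either whole alternative may match.
const isPet = /^(cat|dog)$/.test("dog");

console.log(inSet, notInSet);               // [ 'c', 'a' ] [ 't' ]
console.log(digits, upper, lower, letters); // [ '1', '2' ] [ 'B' ] [ 'a' ] [ 'a', 'B' ]
console.log(isPet);                         // true
```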


Metacharacters:

\w: word character, the same as [0-9A-Za-z_]

\W: non-word character

\d: digit, the same as [0-9]

\D: non-digit

\s: whitespace character (including space, tab, carriage return, line feed, vertical tab and form feed)

\S: non-whitespace character

\n: line feed (newline)

\r: carriage return

\b: word boundary

\B: non-word boundary

\t: tab

.: any single character except line terminators such as \n and \r
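A sketch of the most common metacharacters on an illustrative string that contains a space, a tab, a digit and punctuation.

```javascript
const text = "Hi Bob_1!\tok";

console.log(text.match(/\w+/g));        // [ 'Hi', 'Bob_1', 'ok' ]
console.log(text.match(/\d/g));         // [ '1' ]
console.log(text.match(/\s/g).length);  // 2: one space, one tab
console.log(text.match(/\W/g).length);  // 3: space, "!", tab

// . does not match a newline.
console.log("a\nb".match(/./g)); // [ 'a', 'b' ]

// \b: "cat" as a whole word only; \B: "cat" inside a longer word.
console.log("cat concat".match(/\bcat\b/g));      // [ 'cat' ]
console.log("cat concat".replace(/\Bcat/g, "X")); // cat conX
```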

Quantifiers (in the following, n stands for the character or subexpression being repeated)

n+: matches n one or more times

n*: matches n zero or more times; because zero times is allowed, a global match such as "abc".match(/\w*/g) also returns an empty string for the position at the end of the text

n?: matches n zero times or once; like n*, it can also produce an empty match at the end

n{x}: matches exactly x occurrences of n

n{x,y}: matches x to y occurrences of n (greedy: as many as possible)

n{x,}: matches at least x occurrences of n (greedy: as many as possible)

^n: matches n at the start of the string

n$: matches n at the end of the string
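A sketch of each quantifier on small sample strings, including the empty match at the end that n* and n? can produce.

```javascript
const results = {
  plus: "aaa".match(/a+/g),         // [ 'aaa' ]
  star: "abc".match(/\w*/g),        // [ 'abc', '' ]  empty match at the end
  optional: "abc".match(/\w?/g),    // [ 'a', 'b', 'c', '' ]
  exact: "aaaaa".match(/a{2}/g),    // [ 'aa', 'aa' ]
  range: "aaaaa".match(/a{2,3}/g),  // [ 'aaa', 'aa' ]  greedy: takes 3 first
  atLeast: "aaaaa".match(/a{2,}/g), // [ 'aaaaa' ]
  starts: /^a/.test("abc"),         // true
  ends: /c$/.test("abc"),           // true
};
console.log(results);
```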

RegExp object properties:

ignoreCase: whether the RegExp object has the i flag

global: whether the RegExp object has the g flag

multiline: whether the RegExp object has the m flag

source: the text of the pattern itself (the part between the slashes)
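A minimal sketch reading these properties from an illustrative pattern.

```javascript
const reg = /ab+c/gi;
console.log(reg.ignoreCase); // true
console.log(reg.global);     // true
console.log(reg.multiline);  // false
console.log(reg.source);     // ab+c
```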

Regular expression methods:

test: checks whether the string contains a match. Returns true or false.

exec: searches the string for a match. Returns the matched value together with its position, or null if there is no match.

With the g flag, each call starts searching from the regex's cursor (its lastIndex property), so successive calls return successive matches. After the last match, exec returns null and the cursor goes back to 0; if it is executed again, it starts from the beginning.

In the figure below:

“ab” does not return a value,

index is the position where the match was found (the cursor itself is stored in lastIndex)
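Since the figure is not reproduced, here is a small sketch of the cursor behaviour described above, using an illustrative /ab/g on "abab".

```javascript
const reg = /ab/g;
const str = "abab";

const first = reg.exec(str);
console.log(first[0], first.index, reg.lastIndex); // ab 0 2

const second = reg.exec(str);
console.log(second.index, reg.lastIndex); // 2 4

// No match left: exec returns null and lastIndex is reset to 0.
const third = reg.exec(str);
console.log(third, reg.lastIndex); // null 0

// The next call starts again from the beginning.
const again = reg.exec(str);
console.log(again.index); // 0
```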

Methods of the String object:

match: finds one or more matches of a regular expression and returns the matched values.

search: returns the position of the first match of the regular expression, or -1 if there is none.

split: splits the string; a regular expression can be used as the separator.

replace: replaces the substrings that match the regular expression.
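A sketch of the four String methods on one illustrative string.

```javascript
const str = "a1b22c";

console.log(str.search(/\d/));         // 1
console.log("abc".search(/\d/));       // -1
console.log(str.split(/\d+/));         // [ 'a', 'b', 'c' ]
console.log(str.replace(/\d+/, "-"));  // a-b22c (first match only)
console.log(str.replace(/\d+/g, "-")); // a-b-c
console.log(str.match(/\d+/g));        // [ '1', '22' ]
```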

Example: turn var str = "aabb" into "bbaa"

1. Using a replacement string:

var str = "aabb";
var reg = /(\w)\1(\w)\2/g;
console.log(str.replace(reg, "$2$2$1$1"));
// The output result is: "bbaa"

2. Using a replacement function:

var reg = /(\w)\1(\w)\2/g;
console.log(str.replace(reg, function ($, $1, $2) {
    return $2 + $2 + $1 + $1;
}));
// The output result is: "bbaa"

Here $ receives the whole text matched by the regular expression, $1 the text captured by the first group (\w) (referred to inside the pattern as \1), and $2 the text captured by the second group (\w) (referred to as \2).

Example: turn “the-first-name” into “theFirstName”;

Here $ is the whole text matched by the regular expression and $1 is the text captured by the group (\w) in -(\w).
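The code for this example is not shown, so this is a sketch assuming the pattern is -(\w), as the explanation suggests: each hyphen plus the following letter is replaced by that letter in upper case.

```javascript
const str = "the-first-name";
const reg = /-(\w)/g;

// $ is the whole match (e.g. "-f"); $1 is the captured letter (e.g. "f").
const camel = str.replace(reg, function ($, $1) {
  return $1.toUpperCase();
});
console.log(camel); // theFirstName
```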

Lookahead (forward assertion): in a regular expression it only acts as a condition on the match; it is not itself part of the selected text.

1. Find the a that is followed by b in var str = "abaaaa", without the b appearing in the output:

var str = "abaaaa";

var reg = /a(?=b)/g; // an a followed by b; the b only acts as a condition and is not selected

2. Find the a that is not followed by b in var str = "abaaaa":

var reg = /a(?!b)/g; // an a that is not followed by b
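A sketch that runs both lookahead patterns from the two steps above; replace() makes it easy to see which a was selected and that the b is left untouched.

```javascript
const str = "abaaaa";

const followedByB = str.match(/a(?=b)/g);
const notFollowedByB = str.match(/a(?!b)/g);
console.log(followedByB);    // [ 'a' ]
console.log(notFollowedByB); // [ 'a', 'a', 'a', 'a' ]

console.log(str.replace(/a(?=b)/g, "X")); // Xbaaaa: the b is not consumed
console.log(str.replace(/a(?!b)/g, "X")); // abXXXX
```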

Non-greedy matching: regular expressions are greedy by default, but any quantifier can be made non-greedy by adding ? right after it.

reg = /n{1,}/; // n may repeat from 1 to infinitely many times; greedy matching takes as many as possible

reg = /n{1,}?/; // with ? after {1,}, the match takes as few n as possible
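A sketch of the two patterns above applied to an illustrative run of four n characters.

```javascript
const str = "nnnn";

const greedy = str.match(/n{1,}/)[0];  // as many as possible
const lazy = str.match(/n{1,}?/)[0];   // as few as possible
const lazyAll = str.match(/n{1,}?/g);  // each lazy match takes a single n

console.log(greedy, lazy, lazyAll); // nnnn n [ 'n', 'n', 'n', 'n' ]
```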

2. Some advanced rules in regular expressions

2.1 Greedy and non-greedy repetition

When a special symbol such as “{m,n}”, “{m,}”, “?”, “*” or “+” is used to modify the number of repetitions, the same expression can match a varying number of times; the actual count depends on the string being matched. During matching, an expression with an indefinite repetition count always matches as many times as possible. For example, for the text “dxxxdxxxd”:

Expression and matching result:

(d)(\w+): “\w+” matches all the characters “xxxdxxxd” after the first “d”.
(d)(\w+)(d): “\w+” matches all the characters “xxxdxxx” between the first “d” and the last “d”. Although “\w+” could also match the last “d”, it “gives up” that “d” so that the whole expression can match successfully.

It can be seen from this that “\w+” always matches as many characters as its rule allows. In the second example it does not match the last “d”, but only so that the whole expression can succeed. Likewise, expressions with “*” and “{m,n}” match as much as possible, and an expression with “?” matches whenever it can, even though matching is optional. This matching principle is called the “greedy” mode.
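A sketch that reproduces the two greedy rows above; group 2 holds what “\w+” matched.

```javascript
const text = "dxxxdxxxd";

// Group 2 is what \w+ matched in each expression.
const greedyTail = text.match(/(d)(\w+)/)[2];
const greedyBetween = text.match(/(d)(\w+)(d)/)[2];

console.log(greedyTail);    // xxxdxxxd
console.log(greedyBetween); // xxxdxxx: gives up the last "d"
```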

Non-greedy mode:

Adding a “?” after the special symbol that modifies the repetition count makes an expression with an indefinite repetition count match as few times as possible, and makes an optional expression skip matching whenever it can. This matching principle is called the “non-greedy” mode, also known as the “reluctant” mode. If matching fewer characters would make the whole expression fail, then, just as in greedy mode, the non-greedy expression matches a little more so that the whole expression succeeds. For example, for the text “dxxxdxdxxd”:

Expression and matching result:

(d)(\w+?): “\w+?” matches as few characters as possible after the first “d”, so only one “x” is matched.
(d)(\w+?)(d): for the whole expression to succeed, “\w+?” has to match “xxx” so that the following “d” can match. The result is that “\w+?” matches “xxx”.

More examples are as follows:

Example 1: when the expression “<td>(.*)</td>” matches the string “<td><p>aa</p></td> <td><p>bb</p></td>”, the match succeeds and the matched content is the whole string “<td><p>aa</p></td> <td><p>bb</p></td>”: the “</td>” in the expression matches the last “</td>” in the string.

Example 2: in contrast, when the expression “<td>(.*?)</td>” matches the same string as in Example 1, it only gets “<td><p>aa</p></td>”; matching the next one again gets the second “<td><p>bb</p></td>”.
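A sketch reproducing the non-greedy rows and the two HTML examples. In a JavaScript regex literal the “/” in “</td>” has to be escaped as “\/”.

```javascript
const text = "dxxxdxdxxd";
const lazyTail = text.match(/(d)(\w+?)/)[2];
const lazyBetween = text.match(/(d)(\w+?)(d)/)[2];
console.log(lazyTail, lazyBetween); // x xxx

const html = "<td><p>aa</p></td> <td><p>bb</p></td>";
const greedyCell = html.match(/<td>(.*)<\/td>/)[0]; // runs to the last </td>
const lazyCells = html.match(/<td>(.*?)<\/td>/g);   // stops at each first </td>
console.log(greedyCell === html); // true
console.log(lazyCells); // [ '<td><p>aa</p></td>', '<td><p>bb</p></td>' ]
```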

2.2 Backreferences

During matching, the regular expression engine records the string matched by each subexpression enclosed in parentheses “()”, and when the match result is obtained, these captured strings can be retrieved separately. This has been shown many times in the previous examples. In practice, when you search using certain boundaries but the content you want does not include those boundaries, you must use parentheses to mark the part you want, for example “<td>(.*?)</td>”.

In fact, “the string matched by the expression in parentheses” can be used not only after matching is finished but also during the matching itself: a later part of the expression can refer to a string that a parenthesized subexpression has already matched. The reference is written as “\” followed by a number: “\1” refers to the string matched in the first pair of parentheses, “\2” to the string matched in the second pair, and so on. If one pair of parentheses contains another, the outer pair is numbered first. In other words, groups are numbered in the order in which their opening parenthesis “(” appears.

Examples are as follows:

Example 1: when the expression “('|")(.*?)(\1)” matches “ 'Hello', "World" ”, the match succeeds and the matched content is “'Hello'”. Matching the next one again finds “"World"”.

Example 2: when the expression “(\w)\1{4,}” matches “aa bbbb abcdefg ccccc 111121111 999999999”, the match succeeds and the matched content is “ccccc”. Matching the next one again gets “999999999”. This expression requires the same character from the “\w” range to be repeated at least 5 times; note the difference from “\w{5,}”, which accepts any 5 or more word characters.

Example 3: when the expression “<(\w+)\s*(\w+(=('|").*?\4)?\s*)*>.*?</\1>” matches “<td id='td1' style="bgcolor:white"></td>”, the match succeeds. If “<td>” and “</td>” are not a matching pair, the match fails; if they are changed to another matching pair, the match also succeeds.
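A sketch running the three backreference examples in JavaScript. The second call on tag uses a mismatched pair that is made up for illustration.

```javascript
// Example 1: \1 must repeat whichever quote character opened the string.
const quoted = " 'Hello', \"World\" ";
const quotes = quoted.match(/('|")(.*?)(\1)/g);
console.log(quotes); // [ "'Hello'", '"World"' ]

// Example 2: the same character at least 5 times vs. any 5+ word characters.
const runs = "aa bbbb abcdefg ccccc 111121111 999999999";
const sameChar = runs.match(/(\w)\1{4,}/g);
const anyChars = runs.match(/\w{5,}/g);
console.log(sameChar); // [ 'ccccc', '999999999' ]
console.log(anyChars); // [ 'abcdefg', 'ccccc', '111121111', '999999999' ]

// Example 3: \1 forces the closing tag to match the opening tag name,
// and \4 forces each attribute value to close with its opening quote.
const tag = /<(\w+)\s*(\w+(=('|").*?\4)?\s*)*>.*?<\/\1>/;
console.log(tag.test("<td id='td1' style=\"bgcolor:white\"></td>")); // true
console.log(tag.test("<td id='td1'></tr>"));                          // false
```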

2.3 Lookahead and negative lookahead; lookbehind and negative lookbehind

In the previous chapters I covered some special symbols that have an abstract meaning: “^”, “$” and “\b”. They all have one thing in common: they do not match any characters themselves, but attach a condition to “the two ends of the string” or “the gaps between characters”. With this concept understood, this section introduces another, more flexible way to express conditions on “the two ends” or “the gaps”.

Lookahead: “(?=xxxxx)”, “(?!xxxxx)”

Format: “(?=xxxxx)”. In the string being matched, the condition it attaches to the “gap” or “two ends” is that the text to the right of the gap must match the expression xxxxx. Because it is only a condition attached to the gap, it does not prevent the rest of the expression from matching the characters after the gap. This is similar to “\b”, which itself matches no characters: “\b” only examines the characters on either side of the gap to make a judgment, and does not affect what the rest of the expression actually matches.

Example 1: when the expression “Windows (?=NT|XP)” matches “Windows 98, Windows NT, Windows 2000”, it only matches the “Windows ” in “Windows NT”; the other occurrences of “Windows ” are not matched.

Example 2: when the expression “(\w)((?=\1\1\1)(\1))+” matches the string “aaa ffffff 999999999”, it matches the first four of the six “f” and the first seven of the nine “9”. This expression can be read as: for a letter or digit repeated 4 or more times, match the part before the last two characters. Of course, this expression does not have to be written this way; it is used here for demonstration.
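A sketch running both lookahead examples in JavaScript; the lookahead itself consumes nothing, so only the text outside it appears in the result.

```javascript
// Example 1: only the "Windows " that is followed by NT or XP matches.
const os = "Windows 98, Windows NT, Windows 2000";
const win = os.match(/Windows (?=NT|XP)/);
console.log(JSON.stringify(win[0]), win.index); // "Windows " 12

// Example 2: each step consumes one character only if three more copies follow.
const runs = "aaa ffffff 999999999";
const prefixes = runs.match(/(\w)((?=\1\1\1)(\1))+/g);
console.log(prefixes); // [ 'ffff', '9999999' ]
```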

Format: “(?!xxxxx)”: the text to the right of the gap must not match the expression xxxxx.

Example 3: when the expression “((?!\bstop\b).)+” matches “fdjka ljfdl stop fjdsla fdj”, it matches from the beginning up to the position just before “stop”. If the string contains no “stop”, the whole string is matched.

Example 4: when the expression “do(?!\w)” matches the string “done, do, dog”, it matches only the standalone “do”. In this example, using “(?!\w)” after “do” has the same effect as using “\b”.
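A sketch running Examples 3 and 4; the string without “stop” is made up to show the second half of Example 3.

```javascript
// Example 3: consume characters one at a time as long as "stop" does not start here.
const text = "fdjka ljfdl stop fjdsla fdj";
const beforeStop = text.match(/((?!\bstop\b).)+/)[0];
console.log(JSON.stringify(beforeStop)); // "fdjka ljfdl "

const noStop = "fdjka ljfdl fjdsla";
console.log(noStop.match(/((?!\bstop\b).)+/)[0] === noStop); // true

// Example 4: "do" not followed by a word character; \b gives the same result here.
const words = "done, do, dog";
const doMatch = words.match(/do(?!\w)/);
console.log(doMatch[0], doMatch.index);   // do 6
console.log(words.match(/do\b/).index);   // 6
```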

Lookbehind: “(?<=xxxxx)”, “(?<!xxxxx)”

The two formats are similar in concept to lookahead, but lookbehind places its condition on the “left side” of the gap instead of the right side: the first format requires that the text to the left must match the specified expression, and the second that it must not. As with lookahead, they are only conditions attached to the gap and do not themselves match any characters.

Example 5: when the expression “(?<=\d{4})\d+(?=\d{4})” matches “1234567890123456”, it matches the middle 8 digits, leaving out the first four and the last four. Because JScript.RegExp does not support lookbehind, this example cannot be demonstrated there. Many other engines do support lookbehind, such as the java.util.regex package in Java 1.4 and above, the System.Text.RegularExpressions namespace in .NET, and the DEELX regular expression engine, which this site recommends as the simplest and easiest to use.
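Unlike the old JScript engine, JavaScript itself has supported lookbehind since ECMAScript 2018, and current versions of the major browsers and Node.js implement it. So Example 5 can be run there; the price string below is a made-up illustration of the negative form.

```javascript
// Example 5: digits that have 4 digits before them and 4 digits after them.
const digits = "1234567890123456";
const middle = digits.match(/(?<=\d{4})\d+(?=\d{4})/)[0];
console.log(middle); // 56789012

// Positive lookbehind: a number preceded by "$" (the "$" is not selected).
const prices = "price: $30, qty: 7";
console.log(prices.match(/(?<=\$)\d+/g)); // [ '30' ]

// Negative lookbehind: a number NOT preceded by "$" or by another digit.
console.log(prices.match(/(?<![$\d])\d+/g)); // [ '7' ]
```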