JavaScript (ES5, ES6) regular expressions: a learning summary

Time: 2019-12-15

1. Overview

A regular expression is a way to describe a text pattern (the structure of a string).

There are two ways to create one.

One is to use a literal, with slashes marking the start and the end.

var regex = /xyz/;

The other is to use the RegExp constructor.

var regex = new RegExp('xyz');

The main difference is that a literal is compiled when the engine parses the code, while the constructor builds the regular expression at runtime, so the literal is more efficient. The literal is also more convenient and readable, so in practice regular expressions are almost always defined with literals.
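A short sketch comparing the two forms (the variable names are illustrative). The constructor takes the flags as a second string argument, and it is the form to reach for when the pattern is only known at runtime:

```javascript
// The literal and the constructor describe the same pattern.
const literal = /xyz/i;
const constructed = new RegExp('xyz', 'i');

console.log(literal.source === constructed.source); // true
console.log(literal.flags === constructed.flags);   // true

// The constructor is needed when the pattern is built at runtime.
// `word` is a hypothetical value, e.g. something read from user input.
const word = 'xyz';
const dynamic = new RegExp('^' + word + '$');
console.log(dynamic.test('xyz'));  // true
console.log(dynamic.test('axyz')); // false
```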

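2. Instance properties (flags)

  1. i (ignoreCase): case-insensitive matching
  2. m (multiline): multiline mode, in which ^ and $ also match at line breaks
  3. g (global): global search, which finds every match instead of stopping at the first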
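Each flag is also exposed as a read-only boolean property on the instance. A small sketch with an illustrative pattern; flags (added in ES6) lists the flags as a string, and source returns the pattern text:

```javascript
const reg = /abc/gim;
console.log(reg.ignoreCase); // true
console.log(reg.multiline);  // true
console.log(reg.global);     // true
console.log(reg.flags);      // "gim"
console.log(reg.source);     // "abc"
console.log(reg.lastIndex);  // 0: where the next g search would start

const plain = /abc/;
console.log(plain.global);   // false
```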

3. Instance methods

3.1 RegExp.prototype.test()

The test method of a regular expression instance returns a Boolean indicating whether the pattern matches the argument string.

/Xiaozhi/.test('Xiaozhi lifelong learning executor') // true

3.2 RegExp.prototype.exec()

3.2.1 reg.exec(str) returns an array with the match result, or null if there is no match. With the g flag, each call resumes from where the previous match ended.

var s = '_x_x';
var r1 = /x/;
var r2 = /y/;

r1.exec(s) // ["x"]
r2.exec(s) // null

3.2.2 If the pattern contains parentheses (), this is called group matching. In the returned array, the first element is the whole match, followed by the text matched by each group in order.

var s = '_x_x';
var r = /_(x)/;

r.exec(s) // ["_x", "x"]

The array returned by exec also has the following two properties:

  1. input: the entire original string.
  2. index: the position (counting from 0) where the match starts.

var r = /a(b+)a/;
var arr = r.exec('_abbba_aba_');

arr // ["abbba", "bbb"]

arr.index // 1
arr.input // "_abbba_aba_"

3.2.3 If the regular expression has the g flag (global search), exec can be called repeatedly: each call starts searching at lastIndex, where the previous match ended. When no further match is found, exec returns null and resets lastIndex to 0.

var reg = /a/g;
var str = 'abc_abc_abc'

var r1 = reg.exec(str);
r1 // ["a"]
r1.index // 0
reg.lastIndex // 1

var r2 = reg.exec(str);
r2 // ["a"]
r2.index // 4
reg.lastIndex // 5

var r3 = reg.exec(str);
r3 // ["a"]
r3.index // 8
reg.lastIndex // 9

var r4 = reg.exec(str);
r4 // null
reg.lastIndex // 0
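One consequence worth knowing: with the g flag, test() shares the same lastIndex state as exec(), so reusing one global regular expression on the same string can give alternating results. A small sketch:

```javascript
// With the g flag, test() also advances lastIndex, just like exec().
const reg = /a/g;
console.log(reg.test('abc'), reg.lastIndex); // true 1
console.log(reg.test('abc'), reg.lastIndex); // false 0 (search resumed at index 1)
console.log(reg.test('abc'), reg.lastIndex); // true 1 (lastIndex was reset after the failure)

// Without g, lastIndex is ignored and every call starts at index 0.
const once = /a/;
console.log(once.test('abc'), once.test('abc')); // true true
```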


4. String instance methods

4.1 str.match(reg) is similar to reg.exec, but with the g flag, str.match returns all matches at once.

var s = 'abba';
var r = /a/g;

s.match(r) // ["a", "a"]
r.exec(s) // ["a"]

4.2 str.search(reg) returns the index of the first match, or -1 if there is no match.

'_x_x'.search(/x/)
// 1

4.3 str.replace(reg, newStr)

The first argument, reg, is the pattern to match, and the second, newStr, is the replacement. Without the g flag, only the first match is replaced; with it, every match is replaced.

'aaa'.replace('a', 'b') // "baa"
'aaa'.replace(/a/, 'b') // "baa"
'aaa'.replace(/a/g, 'b') // "bbb" 
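The replacement does not have to be a fixed string. A sketch of the standard replacement patterns and of a replacement function (the sample strings are illustrative); section 6 below relies on both:

```javascript
// $1, $2, ... insert the text captured by the corresponding group.
const date = '2019-12-15';
console.log(date.replace(/(\d+)-(\d+)-(\d+)/, '$3/$2/$1')); // "15/12/2019"

// $& inserts the whole match.
console.log('abc'.replace(/b/, '[$&]')); // "a[b]c"

// A function can compute each replacement; its first argument is the match.
const upper = 'hello world'.replace(/\b\w/g, (match) => match.toUpperCase());
console.log(upper); // "Hello World"
```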

4.4 str.split(reg[, limit]) splits the string at each match of the pattern. The optional second argument limits the maximum number of elements returned.
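A few split calls with illustrative inputs, showing a regex separator, the limit argument, and what happens when the separator pattern contains a capturing group:

```javascript
// Split on commas together with any surrounding whitespace.
console.log('a, b ,c'.split(/\s*,\s*/));    // ["a", "b", "c"]

// The second argument caps the number of elements returned.
console.log('a, b ,c'.split(/\s*,\s*/, 2)); // ["a", "b"]

// If the pattern has a capturing group, the captured separators are kept.
console.log('a1b2c'.split(/(\d)/));         // ["a", "1", "b", "2", "c"]
```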

5. Matching rules

5.1 Literal characters and metacharacters

Most characters in a regular expression have their literal meaning: /a/ matches a, and /b/ matches b. A character that stands only for itself, like a and b here, is called a literal character.
Besides literal characters, some characters have special meanings and do not stand for themselves. These are called metacharacters; the main ones are listed below.

(1) Dot character (.)
The dot (.) matches any single character except carriage return (\r), line feed (\n), line separator (\u2028), and paragraph separator (\u2029).

/c.t/

In the code above, c.t matches a c and a t with exactly one character between them, as long as all three are on the same line, such as cat, c2t, or c-t. It does not match coot.
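The cases described above, written out as tests:

```javascript
const reg = /c.t/;
console.log(reg.test('cat'));  // true
console.log(reg.test('c2t'));  // true
console.log(reg.test('c-t'));  // true
console.log(reg.test('coot')); // false: two characters between c and t
console.log(reg.test('c\nt')); // false: the dot does not match a line feed
```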

(2) Position characters

  • ^ matches the start of a string
  • $ matches the end of a string

// "test" must appear at the start
/^test/.test('test123') // true

// "test" must appear at the end
/test$/.test('new test') // true

// The whole string must be exactly "test"
/^test$/.test('test') // true
/^test$/.test('test test') // false
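The m flag from section 2 changes what ^ and $ mean: they also match right after and right before a line break. A small sketch with an illustrative two-line string:

```javascript
const text = 'first line\nsecond line';

// Without m, ^ only matches at the very start of the string.
console.log(/^second/.test(text));  // false

// With m, ^ also matches right after a line break...
console.log(/^second/m.test(text)); // true

// ...and $ also matches right before one.
console.log(/line$/m.exec(text).index); // 6: the "line" at the end of the first line
```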

(3) Alternation (|)

The vertical bar (|) means "or" in a regular expression: cat|dog matches either cat or dog.

/11|22/.test('911') // true

In the code above, the pattern matches either 11 or 22; '911' contains 11, so the test succeeds.
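The vertical bar has the lowest precedence of all operators, so it splits the whole pattern unless a group limits it. A sketch of the common anchoring mistake:

```javascript
// ^ binds only to cat and $ only to dog: this means (^cat) or (dog$).
console.log(/^cat|dog$/.test('catfish'));  // true: "^cat" matches

// Grouping the alternatives applies both anchors to each of them.
console.log(/^(cat|dog)$/.test('catfish')); // false
console.log(/^(cat|dog)$/.test('dog'));     // true
```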

5.2 Escape characters

To match a metacharacter literally, put a backslash before it. For example, to match +, write \+.

/1+1/.test('1+1')
// false

/1\+1/.test('1+1')
// true

In regular expressions, 12 characters must be escaped with a backslash: ^, ., [, $, (, ), |, *, +, ?, {, and \. When a regular expression is created with the RegExp constructor, two backslashes are needed, because the string literal consumes one level of escaping first.

(new RegExp('1\+1')).test('1+1')
// false

(new RegExp('1\\+1')).test('1+1')
// true
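When a pattern is built from a string that may itself contain metacharacters (user input, for example), every metacharacter has to be escaped before it reaches the constructor. A sketch using a hypothetical helper, escapeRegExp (it is not a built-in), whose character class covers the characters listed above plus ] and }:

```javascript
// Hypothetical helper: prefix every regex metacharacter with a backslash.
// '\\$&' in the replacement means "a literal backslash, then the whole match".
function escapeRegExp(str) {
  return str.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

const pattern = escapeRegExp('1+1'); // the characters 1\+1
const reg = new RegExp(pattern);
console.log(reg.test('1+1')); // true
console.log(reg.test('11'));  // false
```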

5.3 Character classes

A character class lists a set of characters to choose from; it matches if any one of them matches. The candidates go inside square brackets, so [xyz] matches any one of x, y, or z.

/[abc]/.test('hello world') // false
/[abc]/.test('apple') // true

Two characters have special meanings inside a character class.

(1) Caret (^)
If the first character inside the brackets is ^, the class is negated: [^xyz] matches any character except x, y, and z.

/[^abc]/.test('hello world') // true
/[^abc]/.test('bbc') // false

If the brackets contain nothing but the caret, that is [^], the class matches any character, including line breaks (this [^] form is specific to JavaScript). By contrast, the dot metacharacter (.) does not match line breaks.

var s = 'Please yes\nmake my day!';

s.match(/yes.*day/) // null
s.match(/yes[^]*day/) // [ 'yes\nmake my day']

In the code above, the string s contains a line break, which the dot does not match, so the first regular expression fails. In the second, [^] matches any character, so the match succeeds.

(2) Hyphen (-)

For runs of consecutive characters, a hyphen (-) gives a shorthand for the whole range. For example, [abc] can be written as [a-c] and [0123456789] as [0-9]; likewise, [A-Z] stands for the 26 uppercase letters.

/a-z/.test('b') // false
/[a-z]/.test('b') // true 

The following are all legal shorthand forms of character classes.

[0-9.,]
[0-9a-fA-F]
[a-zA-Z0-9-]
[1-31]

The last class, [1-31], does not mean 1 to 31: it is the range 1-3 followed by the single character 1, so it matches only 1, 2, or 3.

Also, don't overuse hyphens to set large ranges, or you may select characters you did not intend. The classic example is [A-z]: at first glance it selects the 52 letters from uppercase A to lowercase z, but in ASCII six other characters ([, \, ], ^, _ and `) sit between the uppercase and lowercase letters, so the result can be unexpected.

/[A-z]/.test('\\') // true

In the code above, the backslash (\, ASCII 92) lies between Z (90) and a (97), so it is matched.
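Writing the two letter ranges separately avoids the stray characters. A quick comparison:

```javascript
// [A-z] spans ASCII 65-122, which includes [ \ ] ^ _ ` (91-96).
console.log(/[A-z]/.test('_'));     // true

// Listing the uppercase and lowercase ranges separately excludes them.
console.log(/[A-Za-z]/.test('_'));  // false
console.log(/[A-Za-z]/.test('\\')); // false
console.log(/[A-Za-z]/.test('q'));  // true
```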

5.4 Predefined patterns

Predefined patterns are shorthand for some common patterns.

  • \d matches any digit from 0 to 9, equivalent to [0-9]
  • \D matches any character except 0-9, equivalent to [^0-9]
  • \w matches any letter, digit, or underscore, equivalent to [A-Za-z0-9_]
  • \W matches any character other than letters, digits, and underscore, equivalent to [^A-Za-z0-9_]
  • \s matches whitespace (spaces, tabs, line breaks, etc.), roughly [ \t\r\n\v\f] (JavaScript also counts other Unicode spaces such as \u00a0)
  • \S matches any non-whitespace character, the complement of \s
  • \b matches a word boundary
  • \B matches a non-word boundary, that is, a position inside a word

// Example of \s
/\s\w*/.exec('hello world') // [" world"]

// Example of \b
/\bworld/.test('hello world') // true
/\bworld/.test('hello-world') // true
/\bworld/.test('helloworld') // false

// Example of \B
/\Bworld/.test('hello-world') // false
/\Bworld/.test('helloworld') // true
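The examples above cover \s, \b and \B; here are \d, \w and \S on illustrative strings. Note that \w only knows ASCII letters:

```javascript
// \d+ with the g flag collects every run of digits.
console.log('Room 42, floor 7'.match(/\d+/g)); // ["42", "7"]

// \w is ASCII-only, so an accented letter is not a word character.
console.log(/^\w+$/.test('user_01')); // true
console.log(/^\w+$/.test('café'));    // false

// \S+ picks out whitespace-separated tokens.
console.log('a  b\tc'.match(/\S+/g)); // ["a", "b", "c"]
```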

In general, a pattern built on the dot stops at a newline (\n), because the dot does not match it.

var html = "<b>Hello</b>\n<i>world!</i>";

/.*/.exec(html)[0]
// "<b>Hello</b>"

In the code above, the string html contains a newline. The dot does not match it, so the result stops at the end of the first line, which may not be what was intended. Combining \s and \S in one character class lets the match include line breaks.

var html = "<b>Hello</b>\n<i>world!</i>";

/[\S\s]*/.exec(html)[0]
// "<b>Hello</b>\n<i>world!</i>"

In the code above, [\S\s] matches any character.
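Engines that implement ES2018 or later also offer the s (dotAll) flag, which makes the dot match line breaks as well. It is not available in strictly ES5/ES6 environments, so treat this as an alternative where supported:

```javascript
const html = "<b>Hello</b>\n<i>world!</i>";

// Without s, the dot stops at the line feed.
console.log(/.*/.exec(html)[0]);           // "<b>Hello</b>"

// With s, the dot matches \n too, so the whole string is matched.
console.log(/.*/s.exec(html)[0] === html); // true
console.log(/.*/s.dotAll);                 // true
```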

5.5 Repetition

Curly braces ({}) specify exactly how many times a pattern repeats: {n} means exactly n times, {n,} means at least n times, and {n,m} means at least n and at most m times. Write {n,m} without a space after the comma.

/lo{2}k/.test('look') // true
/lo{2,5}k/.test('looook') // true

In the code above, the first pattern requires o to occur exactly twice in a row, and the second requires it to occur two to five times in a row.
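The {n,} form and the upper bound of {n,m} are easier to see with anchors, so that the whole string has to fit. A sketch that builds the long test strings with repeat() to keep the o counts exact:

```javascript
// {n,} sets only a lower bound; ^ and $ make the whole string count.
const atLeastTwo = /^lo{2,}k$/;
console.log(atLeastTwo.test('look'));                       // true
console.log(atLeastTwo.test('l' + 'o'.repeat(10) + 'k'));   // true
console.log(atLeastTwo.test('lok'));                        // false

// {n,m} also enforces the upper bound.
const twoToFive = /^lo{2,5}k$/;
console.log(twoToFive.test('l' + 'o'.repeat(5) + 'k'));     // true
console.log(twoToFive.test('l' + 'o'.repeat(6) + 'k'));     // false
```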

5.6 Quantifiers

  • ? means a pattern occurs 0 or 1 times, equivalent to {0,1}
  • * means a pattern occurs 0 or more times, equivalent to {0,}
  • + means a pattern occurs 1 or more times, equivalent to {1,}
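The three quantifiers side by side, on illustrative words:

```javascript
console.log(/colou?r/.test('color'));  // true: u is optional
console.log(/colou?r/.test('colour')); // true
console.log(/ab*c/.test('ac'));        // true: zero b's are allowed
console.log(/ab+c/.test('ac'));        // false: at least one b is required
console.log(/ab+c/.test('abbbc'));     // true
```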

5.7 Greedy mode

By default, the three quantifiers above match as much as possible, continuing until the next character no longer fits the pattern. This is called greedy mode.

var s = 'aaa';
s.match(/a+/) // ["aaa"]

In the code above, the pattern /a+/ matches one or more a characters. Because matching is greedy by default, it keeps consuming a's until none are left, so the result is all three.

To switch from greedy to non-greedy (lazy) mode, add a question mark after the quantifier.

var s = 'aaa';
s.match(/a+?/) // ["a"]

Besides the non-greedy plus sign, the asterisk and the question mark also have non-greedy forms:

  • +?: the pattern occurs 1 or more times, matched non-greedily.
  • *?: the pattern occurs 0 or more times, matched non-greedily.
  • ??: the pattern occurs 0 or 1 times, matched non-greedily.
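Where the difference matters in practice: matching a tag in a small illustrative HTML string.

```javascript
const html = '<b>bold</b>';

// Greedy: .+ runs to the end, then backs off only as far as the last >.
console.log(html.match(/<.+>/)[0]);  // "<b>bold</b>"

// Lazy: .+? stops at the first > that lets the match succeed.
console.log(html.match(/<.+?>/)[0]); // "<b>"
```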

5.8 Group matching

(1) Overview
Parentheses in a regular expression create a group: the pattern inside them is treated as one unit, and the text it matches is captured.

/fred+/.test('fredd') // true
/(fred)+/.test('fredfred') // true

In the code above, the first pattern has no parentheses, so + applies only to the letter d; the second pattern has parentheses, so + applies to the whole word fred.

Here is another example of capturing groups.

var m = 'abcabc'.match(/(.)b(.)/);
m
// ['abc', 'a', 'c']   

In the code above, the regular expression /(.)b(.)/ uses two groups: the first captures a, and the second captures c.

Note that if the g flag is used together with groups, str.match does not return the group contents; it returns only the complete matches.

var m = 'abcabc'.match(/(.)b(.)/g);
m // ['abc', 'abc']
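If you need both every match and its groups, reg.exec still returns the groups when the g flag is set, so a loop over exec collects them. A sketch:

```javascript
const reg = /(.)b(.)/g;
const str = 'abcabc';
const groups = [];
let m;

// Each exec call resumes at reg.lastIndex; null means no more matches.
while ((m = reg.exec(str)) !== null) {
  groups.push([m[1], m[2], m.index]);
}

console.log(groups); // [["a", "c", 0], ["a", "c", 3]]
```

This loop assumes the pattern cannot match an empty string; an empty match leaves lastIndex where it was, so the loop would never end unless lastIndex is advanced by hand.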

Inside a regular expression, \n refers back to the text matched by a group, where n is a natural number starting from 1 that gives the group's position.

/(.)b(.)\1b\2/.test("abcabc")
// true

In the code above, \1 stands for the text matched by the first group (a), and \2 for the text matched by the second group (c).
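A typical use of a backreference is finding a word that is accidentally repeated. In a replacement string the same group is written $1 rather than \1. A sketch with an illustrative sentence:

```javascript
// \1 must repeat exactly what (\w+) captured; \b keeps whole words.
const doubled = /\b(\w+)\s+\1\b/;
const text = 'this is is a test';

console.log(text.match(doubled)[0]); // "is is"

// Collapse every doubled word to a single copy.
console.log(text.replace(/\b(\w+)\s+\1\b/g, '$1')); // "this is a test"
```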

(2) Non-capturing groups
(?:x) is called a non-capturing group: the text it matches is not returned separately, so the group does not appear in the result array.

To match foo or foofoo, you could write /(foo){1,2}/, but that uses up a capturing group. A non-capturing group, /(?:foo){1,2}/, matches the same text without returning the contents of the parentheses separately.

var m = 'abc'.match(/(?:.)b(.)/);
m // ["abc", "c"]

The pattern above uses two pairs of parentheses. The first is a non-capturing group, so the result contains only the whole match and the text matched by the second group.
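The foo/foofoo case from above, compared by the length of the result array:

```javascript
const capturing = /(foo){1,2}/.exec('foofoo');
const nonCapturing = /(?:foo){1,2}/.exec('foofoo');

console.log(capturing.length);    // 2: the whole match plus the group
console.log(capturing[1]);        // "foo"
console.log(nonCapturing.length); // 1: only the whole match
console.log(nonCapturing[0]);     // "foofoo"
```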

(3) Positive lookahead

x(?=y) is called a positive lookahead: x matches only if it is followed by y, and y is not included in the result. For example, to match a number followed by a percent sign, write /\d+(?=%)/.

The part inside the lookahead parentheses is not returned.

var m = 'abc'.match(/b(?=c)/);
m // ["b"]

In the code above, b matches only because it is followed by c, but the c inside the lookahead is not returned.
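The percent-sign example mentioned above, on illustrative strings:

```javascript
// Digits followed by %; the % itself is not part of the match.
console.log('50% of 200'.match(/\d+(?=%)/)[0]);      // "50"

// With g, only the numbers that carry a % are returned.
console.log('a 10% b 20 c 30%'.match(/\d+(?=%)/g)); // ["10", "30"]
```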

(4) Negative lookahead
x(?!y) is called a negative lookahead: x matches only if it is not followed by y, and y is not included in the result. For example, to match a number not followed by a percent sign, write /\d+(?!%)/.

/\d+(?!\.)/.exec('3.14')
// ["14"]

In the code above, only digits that are not followed by a decimal point are matched, so the result is 14.
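One caveat with the simple /\d+(?!%)/ form: if the lookahead fails, backtracking lets \d+ give up its last digit, and the shorter run is no longer followed by %. A sketch on an illustrative string, plus one way to block that by also forbidding a following digit:

```javascript
const text = '10% 20 30%';

// "10" fails, but "1" (followed by "0") passes; the same happens with "3".
console.log(text.match(/\d+(?!%)/g));     // ["1", "20", "3"]

// Forbidding a following digit as well keeps only whole numbers without %.
console.log(text.match(/\d+(?![\d%])/g)); // ["20"]
```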

6. Practical examples

6.1 Trimming whitespace at the start and end of a string

var str = '  #id div.class  ';
str.replace(/^\s+|\s+$/g, '')
// "#id div.class"
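ES5 also added String.prototype.trim, which removes the same leading and trailing whitespace. A sketch comparing the two, and showing where a regex still helps, e.g. collapsing runs of whitespace inside the string (the messy sample is illustrative):

```javascript
const str = '  #id div.class  ';
const viaRegex = str.replace(/^\s+|\s+$/g, '');
console.log(viaRegex === str.trim()); // true

// trim() handles only the ends; a regex can also normalize inner spacing.
const messy = '  #id    div.class  ';
console.log(messy.trim().replace(/\s+/g, ' ')); // "#id div.class"
```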

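6.2 Validating a mobile phone number

The ^ and $ anchors require the whole string to be the number; without them, any matching 11-digit run inside a longer string would also pass.

var reg = /^1[24578]\d{9}$/;

reg.test('15455456899'); // true
reg.test('154554568997'); // false: 12 digits
reg.test('23455456899'); // false: does not start with 1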

6.3 Replacing a mobile phone number with *

var reg = /1[24578]\d{9}/;

var str = 'Name: Zhang San Mobile: 18210999999 Gender: Male';

str.replace(reg, '***') // "Name: Zhang San Mobile: *** Gender: Male"
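A common variation keeps the first three and last four digits visible and masks only the middle. This is a sketch reusing the same digit rule, with two capturing groups and the $1/$2 replacement patterns:

```javascript
const str = 'Name: Zhang San Mobile: 18210999999 Gender: Male';

// Group 1: first 3 digits, then 4 masked digits, group 2: last 4 digits.
const masked = str.replace(/(1[24578]\d)\d{4}(\d{4})/, '$1****$2');

console.log(masked); // "Name: Zhang San Mobile: 182****9999 Gender: Male"
```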

6.4 Matching HTML tags

var strHtml = 'Xiaozhi Xiaozhi <div>222222@qq.com</div> Xiaozhi';

var reg = /<(.+)>.+<\/\1>/;

strHtml.match(reg); // ["<div>222222@qq.com</div>", "div"]
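With several elements in one string, the greedy .+ above would run across tags. A sketch combining a backreference with a lazy quantifier and the g flag to pick out each simple element (this only suits flat, non-nested markup; real HTML needs a parser):

```javascript
const html = '<b>bold</b> and <i>italic</i>';

// \1 forces the closing tag name to equal the opening one;
// .*? keeps each match to a single element.
console.log(html.match(/<(\w+)>.*?<\/\1>/g)); // ["<b>bold</b>", "<i>italic</i>"]
```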

6.5 Replacing sensitive words

let str = "Communist Party of China, People's Liberation Army, People's Republic of China";

let r = str.replace(/China|Army/g, input => {
    let t = '';
    for (let i = 0; i < input.length; i++) {
        t += '*';
    }
    return t;
});

console.log(r); // "Communist Party of *****, People's Liberation ****, People's Republic of *****"

6.6 Thousands separator

let str = '100002003232322';

let r = str.replace(/(\d)(?=(?:\d{3})+$)/g, '$1,');

console.log(r); //100,002,003,232,322
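The lookahead (?=(?:\d{3})+$) lets a digit match only when the count of digits after it, up to the end, is a positive multiple of three. Because of the $ anchor, a decimal point would block every match, so a sketch that handles decimals has to group the integer part on its own. addThousandsSeparators is a hypothetical helper name:

```javascript
// Hypothetical helper: insert commas into the integer part only,
// leaving any decimal part unchanged.
function addThousandsSeparators(numStr) {
  const [intPart, fracPart] = numStr.split('.');
  const grouped = intPart.replace(/(\d)(?=(?:\d{3})+$)/g, '$1,');
  return fracPart === undefined ? grouped : grouped + '.' + fracPart;
}

console.log(addThousandsSeparators('1234567.891')); // "1,234,567.891"
console.log(addThousandsSeparators('123'));         // "123"
```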

Reference links

https://developer.mozilla.org…

https://wangdoc.com/javascrip…

If you found this useful, a like is what keeps me sharing. Thanks for the support!

A clumsy coder whose world is lifelong learning!

For more content, follow the WeChat official account "big move the world"!