Deep understanding of JavaScript – regular expressions

Time: 2021-6-22

Regular expressions

Regular expressions are patterns used to match character combinations in strings. In JavaScript, regular expressions are also objects. These patterns are used with the exec and test methods of RegExp, and with the match, matchAll, replace, search, and split methods of String.

Creating a regular expression

Literal

Write the pattern directly between two slashes: the first slash marks the beginning of the pattern and the second marks the end.

var reg = /ab/g

A regular expression literal is compiled when the script is loaded. If the regular expression stays constant, using a literal gives better performance.

The RegExp constructor

var reg = new RegExp('ab', 'g')
// Equivalent to var reg = /ab/g

With a literal, the modifiers (flags) are written after the closing slash; with the constructor, they are passed as the second argument.

Both forms create a new RegExp object. The difference is that a literal is compiled when the engine parses the code, while the constructor compiles the expression at run time, so the literal form is more efficient. Literals are also more convenient and readable, so they are the usual way to define a regular expression; the constructor is mainly useful when the pattern is only known at run time.
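As a quick check of the two forms, here is a minimal sketch that compares a literal with an equivalent constructor call and then builds a pattern from a variable, which is the case where the constructor is needed. The variable names (literal, constructed, word, wordRe) are only for illustration.

```javascript
// A literal and a constructor call with the same pattern and flags.
const literal = /ab/g;
const constructed = new RegExp('ab', 'g');

console.log(literal.source === constructed.source); // true
console.log(literal.flags === constructed.flags);   // true
console.log(literal === constructed);               // false: two separate objects

// The constructor takes a string, so the pattern can be assembled at run time.
// Backslashes must be doubled inside the string: '\\b' becomes \b in the pattern.
const word = 'love';
const wordRe = new RegExp('\\b' + word + '\\b', 'g');
console.log(wordRe.source);              // \blove\b
console.log('i love you'.match(wordRe)); // [ 'love' ]
```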

Instance properties

Instance properties related to modifiers (read-only)

  • ignoreCase: a Boolean indicating whether the i modifier is set
  • global: a Boolean indicating whether the g modifier is set
  • multiline: a Boolean indicating whether the m modifier is set
  • flags: a string containing all the modifiers that are set

Instance properties not related to modifiers:

  • lastIndex: an integer giving the position where the next search starts (readable and writable)
  • source: the text of the pattern as a string (read-only)

var reg = /abc/gim
//Modifier related properties
reg.ignoreCase  //true
reg.global  //true
reg.multiline  //true
reg.flags   //gim
//Modifier independent properties
reg.lastIndex //0
reg.source    //abc

Instance methods

RegExp instance methods

test()

Tests whether the pattern matches the string and returns true or false.

var reg = /av/g
var s = 'avbabc'
reg.test(s)  //true

reg.lastIndex = 2
reg.test(s) //false

When the regular expression has the g modifier, each call to test starts searching from where the previous match ended. You can read lastIndex to see the current position.

var reg = /av/g
var s = 'avbavabc'

reg.lastIndex //0
reg.test(s)		//true

reg.lastIndex	//2
reg.test(s)		//true

reg.lastIndex //5
reg.test(s)		//false
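Because lastIndex is stored on the regular expression object, reusing one global regular expression on different strings can give surprising results. A minimal sketch of the effect and of resetting lastIndex by hand; the test strings are made up for illustration.

```javascript
const reg = /av/g;

console.log(reg.test('avb')); // true, and lastIndex is now 2
console.log(reg.test('xav')); // false: the search began at index 2, past the "av" at index 1
console.log(reg.lastIndex);   // 0: a failed search resets lastIndex

reg.test('avb');              // true again, so lastIndex is 2
reg.lastIndex = 0;            // reset explicitly before testing a different string
console.log(reg.test('xav')); // true
```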

If the pattern is empty, it matches every string, so test returns true.
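An empty pattern cannot be written as a literal, because // starts a comment, so this small sketch uses the constructor. It also shows that the engine reports the source of an empty pattern as (?:).

```javascript
// An empty pattern: // would be a comment, so use the constructor.
const empty = new RegExp('');

console.log(empty.source);      // (?:)
console.log(empty.test('abc')); // true
console.log(empty.test(''));    // true: even the empty string matches
```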

exec()

Searches the string for a match and returns a result array, or null if there is no match.
Besides the matched text, the returned array has these extra properties:

  • input: the entire original string
  • index: the position in the string where the match starts
  • groups: an object holding named capture groups, or undefined if the pattern has none

var reg = /av/g
var s = 'avbavabc'

reg.exec(s)   //["av", index: 0, input: "avbavabc", groups: undefined]
reg.exec(s)		//["av", index: 3, input: "avbavabc", groups: undefined]
reg.exec(s)		//null

Like test, when the regular expression has the g modifier, each call to exec starts searching from where the previous match ended. You can read lastIndex to see the current position.

When the regular expression contains parentheses (capturing groups), the returned array holds several entries. The first is the text matched by the whole pattern, the second is the text matched by the first pair of parentheses, the third is the text matched by the second pair, and so on.

var reg = /a(v)/g
var s = 'avbavabc'

reg.exec(s)		//[ 'av', 'v', index: 0, input: 'avbavabc', groups: undefined ]
reg.exec(s) 	//[ 'av', 'v', index: 3, input: 'avbavabc', groups: undefined ]
reg.exec(s)  	//null

With multiple groups:

var reg = /a(v)(b)/g
var s = 'avbavabc'

reg.exec(s) // [ 'avb', 'v', 'b', index: 0, input: 'avbavabc', groups: undefined ]
reg.exec(s) //null
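Because each exec call resumes from lastIndex when the g modifier is set, a common way to walk through every match is a loop that stops when exec returns null. A minimal sketch reusing the pattern and string from above; the results array and its field names are only for illustration.

```javascript
const reg = /a(v)/g;
const s = 'avbavabc';
const results = [];

let m;
// exec returns null when there are no more matches, which ends the loop
// (and also resets reg.lastIndex to 0).
while ((m = reg.exec(s)) !== null) {
  results.push({
    match: m[0],         // text matched by the whole pattern
    group1: m[1],        // text matched by the first group
    index: m.index,      // where this match starts
    next: reg.lastIndex, // where the next search will start
  });
}

console.log(results.length); // 2
console.log(results[0]);     // { match: 'av', group1: 'v', index: 0, next: 2 }
console.log(results[1]);     // { match: 'av', group1: 'v', index: 3, next: 5 }
```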

String instance methods

match()

Searches the string for a match and returns an array, or null if there is no match.
Without the g modifier, the result has the same shape as an exec result: an array with index and input properties.

var reg = /ac/
var s = 'acbacvabc'
var s1 = 'aabaavabc'

s.match(reg)  //[ 'ac', index: 0, input: 'acbacvabc', groups: undefined ]
s1.match(reg) //null

With the g modifier, match returns an array of all matched substrings at once. This array has no index or input properties, and it does not include capture groups.

var reg = /ac/g
var s = 'acbacvabc'

s.match(reg) //[ 'ac', 'ac' ]

Note: setting lastIndex has no effect on match (unless the y modifier is used); match always starts searching from the beginning of the string.
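The sketch below contrasts match with and without the g modifier on the same string, and checks that a lastIndex set beforehand is ignored. The pattern /a(c)/ is a variation on the one above, chosen only to show a capture group.

```javascript
const s = 'acbacvabc';

// Without g: same shape as an exec result, including capture groups.
const single = s.match(/a(c)/);
console.log(single[0], single[1], single.index); // ac c 0

// With g: only the full matches, with no groups and no index property.
const all = s.match(/a(c)/g);
console.log(all);       // [ 'ac', 'ac' ]
console.log(all.index); // undefined

// A lastIndex set beforehand does not change the result of match.
const reg = /ac/g;
reg.lastIndex = 4;
console.log(s.match(reg));  // [ 'ac', 'ac' ]
console.log(reg.lastIndex); // 0: match resets it
```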

matchAll()

Searches the string for all matches and returns an iterator whose items are shaped like exec results. The regular expression passed to matchAll must have the g modifier; otherwise a TypeError is thrown.

var reg = /a/g
var s = 'acbacvabc'


var arr = [...s.matchAll(reg)]
console.log(arr)
//Output:
/**
[
  [ 'a', index: 0, input: 'acbacvabc', groups: undefined ],
  [ 'a', index: 3, input: 'acbacvabc', groups: undefined ],
  [ 'a', index: 6, input: 'acbacvabc', groups: undefined ]
]
**/
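The main advantage of matchAll over match with the g modifier is that every result keeps its capture groups and index. A minimal sketch, assuming a runtime that supports matchAll (ES2020 or later); the pattern /a(c|b)/g is chosen only for illustration.

```javascript
const s = 'acbacvabc';
const found = [];

// Unlike match() with g, matchAll keeps the capture groups and index of each match.
for (const m of s.matchAll(/a(c|b)/g)) {
  found.push([m[0], m[1], m.index]);
  console.log(m[0], m[1], m.index);
}
// ac c 0
// ac c 3
// ab b 6

// Without the g modifier, matchAll throws a TypeError.
let threw = false;
try {
  s.matchAll(/a/);
} catch (e) {
  threw = e instanceof TypeError;
}
console.log(threw); // true
```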

search()

Searches the string and returns the index of the first match, or -1 if there is no match. The g modifier and lastIndex are ignored.

var reg = /en/g
var reg1 = /yo/g
var s = 'yuwenbo'

s.search(reg) //3
s.search(reg1)	//-1
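search ignores both the g modifier and any lastIndex value, and leaves lastIndex as it found it. A small sketch of that behavior, reusing the string from above:

```javascript
const reg = /en/g;
reg.lastIndex = 5;

const pos = 'yuwenbo'.search(reg);
console.log(pos);           // 3: search always starts from index 0
console.log(reg.lastIndex); // 5: search restores lastIndex afterwards
```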

replace()

Searches the string for matches and replaces the matched substrings. It takes two arguments: the regular expression and the replacement.

Without the g modifier, only the first match is replaced. With the g modifier, every match is replaced.

var s = 'i love you'
console.log(s.replace(/\s/, '❤'))  //i❤love you
console.log(s.replace(/\s/g, '❤')) //i❤love❤you

The replacement string (the second argument) can contain special $ patterns that refer to parts of the match:

  • $&: the matched substring
  • $`: the text before the match
  • $': the text after the match
  • $n: the text captured by the nth group, where n starts at 1
  • $$: a literal dollar sign $

console.log('he llo'.replace(/(\w+)\s(\w+)/, '$2 $1')) //llo he
console.log('hello'.replace(/e/, '-$`-$&-$\'-')) //h-h-e-llo-llo
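A few more worked $ replacements, using made-up sample strings: reordering captured groups, inserting a literal dollar sign, and applying $& to every match with g.

```javascript
// Reorder the parts of a date with $1, $2 and $3.
const date = '2021-6-22'.replace(/(\d+)-(\d+)-(\d+)/, '$3/$2/$1');
console.log(date);    // 22/6/2021

// $$ inserts a literal dollar sign; $& inserts the matched text.
const price = 'price: 5'.replace(/\d+/, '$$$&');
console.log(price);   // price: $5

// With the g modifier, the patterns are evaluated separately for each match.
const wrapped = 'a-b-c'.replace(/-/g, '[$&]');
console.log(wrapped); // a[-]b[-]c
```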

The second argument of replace can also be a function; each match is replaced with the function's return value.

The function receives several arguments. The first is the matched text, followed by the text of each capture group (there may be several), then the position of the match in the string, and finally the original string. (If the pattern has named groups, an object of those groups is passed as one more argument at the end.)

console.log('hello'.replace(/e/, function (match, index, str) {
 console.log(match, index, str)
 return '❤'
}))

//e 1 hello
//h❤llo
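With capture groups and the g modifier, the callback receives one extra argument per group before the offset. This sketch capitalizes each word; the offsets array exists only to show the offset argument.

```javascript
const offsets = [];

// Arguments: full match, group 1, group 2, offset, original string.
const result = 'i love you'.replace(/\b(\w)(\w*)/g, (match, first, rest, offset, str) => {
  offsets.push(offset);
  return first.toUpperCase() + rest;
});

console.log(result);  // I Love You
console.log(offsets); // [ 0, 2, 7 ]
```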

split()

Splits a string using a regular expression or a fixed string as the separator and returns the pieces in an array.
The method accepts two arguments: the first is the separator (here a regular expression) that defines where to split, and the second, optional one is the maximum number of elements in the returned array.

var str = 'ni hao ya.hei hei hei'
str.split(/ |\./, 5) //[ 'ni', 'hao', 'ya', 'hei', 'hei' ]
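Two further cases where a regular expression separator helps, shown on made-up strings: absorbing uneven whitespace around a delimiter, and keeping the delimiters by wrapping them in a capture group.

```javascript
// A regex separator can absorb varying whitespace around each comma.
const parts = 'a , b,c ,  d'.split(/\s*,\s*/);
console.log(parts);   // [ 'a', 'b', 'c', 'd' ]

// When the separator contains a capture group, the captured text is kept.
const kept = 'a1b2c'.split(/(\d)/);
console.log(kept);    // [ 'a', '1', 'b', '2', 'c' ]

const dropped = 'a1b2c'.split(/\d/);
console.log(dropped); // [ 'a', 'b', 'c' ]
```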

Conclusion:

To check only whether a string matches, use the test or search method.
To get more information about the match, use exec or match, which are slower.

Modifiers (flags)

Modifiers specify additional matching rules and are placed after the closing slash of the pattern. They can be used individually or combined.

//Single modifier
'abAbab'.match(/a/g)  //["a","a"] 

//Use multiple modifiers together 
'abAbab'.match(/a/gi)  //["a", "A", "a"]

g modifier

Global search. By default, matching stops after the first match; with g, the search continues through the whole string.

i modifier

Case-insensitive search. By default, matching is case-sensitive; with i, uppercase and lowercase letters are treated as equal.

m modifier

Multiline mode, which changes the behavior of ^ and $.
By default, ^ and $ match only the beginning and end of the whole string.
With the m modifier, ^ and $ also match the beginning and end of each line, that is, they recognize the line break \n.

For example:

  • /yuwen$/m.test('hi yuwen\n') is true
  • /yuwen$/.test('hi yuwen\n') is false
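The effect of m on both anchors can be seen on a small multi-line sample string (made up for illustration):

```javascript
const text = 'line1\nline2\nline3';

// Without m, ^ only matches at the very start of the string.
const firstOnly = text.match(/^line\d/g);
console.log(firstOnly); // [ 'line1' ]

// With m, ^ also matches right after each \n.
const everyLine = text.match(/^line\d/gm);
console.log(everyLine); // [ 'line1', 'line2', 'line3' ]

// The same applies to $ at the end of each line.
console.log(/1$/.test(text));  // false
console.log(/1$/m.test(text)); // true
```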

s modifier

Allows . to match line breaks (dotAll mode).

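u modifier

Unicode mode: the pattern and the string are treated as sequences of Unicode code points instead of UTF-16 code units, and \u{...} escapes are enabled.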

y modifier

Sticky search: the match must start exactly at the position given by lastIndex in the target string.
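The s, u and y modifiers are easiest to understand side by side. A minimal sketch, assuming a runtime with ES2018 support (for the s flag); the sample strings and variable names are only for illustration, and '\u{1F600}' is an emoji outside the Basic Multilingual Plane.

```javascript
// s (dotAll): without it, . does not match a line break.
const dotAll = [/a.b/.test('a\nb'), /a.b/s.test('a\nb')];
console.log(dotAll); // [ false, true ]

// u (unicode): U+1F600 is one code point but two UTF-16 code units.
const smile = '\u{1F600}';
const unicode = [/^.$/.test(smile), /^.$/u.test(smile)];
console.log(unicode); // [ false, true ]

// y (sticky): the match must start exactly at lastIndex.
const str = 'afoo';
const globalRe = /foo/g;
const stickyRe = /foo/y;
console.log(globalRe.test(str)); // true: g searches ahead and finds foo at index 1
console.log(stickyRe.test(str)); // false: y only tries index 0
stickyRe.lastIndex = 1;
console.log(stickyRe.test(str)); // true: foo starts exactly at index 1
```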

Special characters

\ character

Escape character.
To match a special character literally, put a backslash \ in front of it.
The characters that need to be escaped this way are ^ . [ $ ( ) | * + ? { \ (and / inside a regular expression literal).
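Escaping matters most when a pattern is built from an arbitrary string with new RegExp. The sketch below uses a hypothetical helper, escapeRegExp (not a built-in), that puts a backslash before each special character listed above.

```javascript
// Hypothetical helper (not a built-in): escape the special characters so that
// a string can be used as a literal pattern in new RegExp().
function escapeRegExp(text) {
  // $& in the replacement is the matched character; '\\' adds a backslash before it.
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

console.log(/a.c/.test('abc'));     // true: . matches any character
console.log(/a\.c/.test('abc'));    // false: \. matches only a dot
console.log(escapeRegExp('1+1=2')); // 1\+1=2
console.log(new RegExp(escapeRegExp('1+1')).test('1+1=2')); // true
```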

^ character

Matches the beginning of the input.
If the multiline (m) flag is set, it also matches the position right after a line break.

For example, /^A/ matches the "A" in "Ant", but not the "A" in "ntA".

$ character

Matches the end of the input.
If the multiline (m) flag is set, it also matches the position right before a line break.

For example, /A$/ matches the "A" in "ntA", but not the "A" in "Ant".

* character

Matches the preceding item 0 or more times.
Equivalent to {0,}

For example, /yueno*/ matches both "yuenooo" and "yuen" in "yuenoooyuen".

+ character

Matches the preceding item 1 or more times.
Equivalent to {1,}

For example, /yueno+/ matches only "yuenooo" in "yuenoooyuen".

? character

Matches the preceding item 0 or 1 times.
Equivalent to {0,1}

  • For example, the first match of /yueno?/ in "yuenoooyuen" is "yueno"
  • Note: when ? follows another quantifier (*, +, ? or {}), it makes that quantifier non-greedy, so it matches as few characters as possible
  • For example, the first match of /yueno??/ in "yuenoooyuen" is "yuen"
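Running each quantifier with the g modifier on the same sample string makes the greedy and non-greedy forms easy to compare:

```javascript
const s = 'yuenoooyuen';

// Greedy quantifiers take as many o's as they can.
console.log(s.match(/yueno*/g)); // [ 'yuenooo', 'yuen' ]
console.log(s.match(/yueno+/g)); // [ 'yuenooo' ]
console.log(s.match(/yueno?/g)); // [ 'yueno', 'yuen' ]

// Adding ? makes them non-greedy: they take as few as possible.
console.log(s.match(/yueno*?/g)); // [ 'yuen', 'yuen' ]
console.log(s.match(/yueno+?/g)); // [ 'yueno' ]
console.log(s.match(/yueno??/g)); // [ 'yuen', 'yuen' ]
```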

. character

By default, matches any single character except line terminators such as \n (with the s modifier, it matches those too).

  • For example, /.y/ matches only "oy" in "yuenoooyuen"
  • For example, /..y/ matches only "ooy" in "yuenoooyuen"

(x) character

Capturing parentheses.
Parentheses group part of a pattern so that it is matched as a unit, and they remember (capture) the text that the group matched.
Within the pattern, a captured group can be referred to with \n (for example \1).
In a replacement string, use $1, $2, and so on.

  • For example, /(wenbo)+/.test('wenbowenbo') is true: wenbo is matched one or more times as a whole
  • For example, "wenbo,zhijian".replace(/(wenbo),(zhijian)/, '$2,$1')
  • Output: zhijian,wenbo
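Inside the pattern, \1 refers back to whatever group 1 captured. This sketch uses it to find a doubled word in a made-up sentence, then removes the duplicate with $1 in the replacement.

```javascript
const text = 'this is is a test';

// (\w+) captures a word; \1 requires the same word to appear again right after a space.
const doubled = text.match(/\b(\w+) \1\b/);
console.log(doubled[0]); // is is
console.log(doubled[1]); // is

// $1 in the replacement keeps a single copy of the captured word.
const fixed = text.replace(/\b(\w+) \1\b/g, '$1');
console.log(fixed);      // this is a test
```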

(?:x) character

Matches x but does not remember the match.
These non-capturing parentheses let you define subexpressions to use with regular expression operators, such as quantifiers.
The matched elements cannot be referred to with \n or $n.
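A short sketch of non-capturing groups: they still group for quantifiers, but they are skipped when groups are numbered. The date string is only sample data.

```javascript
// (?:...) groups without capturing, so it is skipped when numbering groups.
const m = '2021-6-22'.match(/(?:\d+)-(\d+)-(\d+)/);
console.log(m.length);   // 3: the full match plus two captured groups
console.log(m[1], m[2]); // 6 22

// A quantifier can apply to a non-capturing group as a whole.
console.log(/^(?:ab)+$/.test('ababab')); // true
console.log(/^(?:ab)+$/.test('ababa'));  // false
```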

x(?=y) character

Matches x only if x is followed by y (lookahead assertion).

  • For example: 'wenbo'.match(/wen(?=bo)/)
  • Output: [ 'wen', index: 0, input: 'wenbo', groups: undefined ]
  • For example: 'wenyu'.match(/wen(?=bo)/)
  • Output: null

(?<=y)x character

Matches x only if x is preceded by y (lookbehind assertion).

  • For example: 'wenbo'.match(/(?<=wen)bo/)
  • Output: [ 'bo', index: 3, input: 'wenbo', groups: undefined ]
  • For example: 'yubo'.match(/(?<=wen)bo/)
  • Output: null

x(?!y) character

Matches x only if x is not followed by y (negative lookahead assertion).

(?<!y)x character

Matches x only if x is not preceded by y (negative lookbehind assertion).
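The negative assertions mirror the earlier wenbo examples. The last line is a common use of lookarounds, inserting thousands separators, included here as an illustration rather than as the only way to format numbers. Lookbehind assumes ES2018 or later.

```javascript
// Negative lookahead: "wen" not followed by "bo".
console.log('wenyu'.match(/wen(?!bo)/)[0]); // wen
console.log('wenbo'.match(/wen(?!bo)/));    // null

// Negative lookbehind: "bo" not preceded by "wen".
console.log('yubo'.match(/(?<!wen)bo/).index); // 2
console.log('wenbo'.match(/(?<!wen)bo/));      // null

// Lookarounds match a position, not characters, so they can insert text:
// add a comma before every group of three digits that runs to the end.
const formatted = '1234567'.replace(/\B(?=(\d{3})+(?!\d))/g, ',');
console.log(formatted); // 1,234,567
```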

x|y character

Matches either x or y. Several alternatives can be chained.

  • For example: 'wenyu'.match(/w|e|n/g)
  • Output: [ 'w', 'e', 'n' ]

{n} character

Matches the preceding item exactly n times, where n is a non-negative integer.

  • For example: 'hello'.match(/l{2}/g)
  • Output: [ 'll' ]

{n,} character

Matches the preceding item at least n times, where n is a non-negative integer.

{n,m} character

Matches the preceding item at least n and at most m times, where n and m are non-negative integers and m is not less than n.
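The three counted quantifiers compared on one sample string, plus a typical length check; the 3-to-4-digit rule is only an example.

```javascript
const s = 'a aa aaa aaaa';

// \b on both sides makes each run of a's count as a whole word.
console.log(s.match(/\ba{2}\b/g));   // [ 'aa' ]
console.log(s.match(/\ba{2,}\b/g));  // [ 'aa', 'aaa', 'aaaa' ]
console.log(s.match(/\ba{2,3}\b/g)); // [ 'aa', 'aaa' ]

// Typical use: accept a 3- or 4-digit code and nothing else.
console.log(/^\d{3,4}$/.test('123'));   // true
console.log(/^\d{3,4}$/.test('12345')); // false
```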

[xyz] character

A character class: matches any one of the characters inside the brackets, including escape sequences. A hyphen (-) specifies a range of characters, for example [a-zA-Z1-9].

  • For example: 'hello 123'.match(/[a-h1-2]/g)
  • Output: [ 'h', 'e', '1', '2' ]

[^xyz] character

A negated character class: matches any character that is not inside the brackets.

  • For example: 'hello 123'.match(/[^a-h1-2]/g)
  • Output: [ 'l', 'l', 'o', ' ', '3' ]

[\b] character

Matches a backspace (U+0008). Do not confuse it with \b.

\b character

Matches a word boundary.

For example:

  • /\bworld/.test('hello world') // true
  • /\bworld/.test('hello-world') // true
  • /\bworld/.test('helloworld')  // false

\B character

Matches a non-word boundary.

For example:

  • /\Bworld/.test('hello world') // false
  • /\Bworld/.test('hello-world') // false
  • /\Bworld/.test('helloworld')  // true

\cX character

Where X is a letter from A to Z, matches the corresponding control character in the string. For example, \cJ matches a line feed.

\d character

Matches a digit; equivalent to [0-9].

\D character

Matches any character that is not a digit; equivalent to [^0-9].

\f character

Matches a form feed (U+000C).

\n character

Matches a line feed (U+000A).

\r character

Matches a carriage return (U+000D).

\s character

Matches a single whitespace character, including spaces, tabs, form feeds, and line breaks. It is equivalent to:

[\f\n\r\t\v\u0020\u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff]

\S character

Matches a single character that is not whitespace.

\t character

Matches a horizontal tab (U+0009).

\v character

Matches a vertical tab (U+000B).

\w character

Matches a single word character (a basic Latin letter, a digit, or an underscore); equivalent to [A-Za-z0-9_].

\W character

Matches a single non-word character; equivalent to [^A-Za-z0-9_].
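The class escapes \d, \w, \s and \W can be seen together on one sample string (made up for illustration):

```javascript
const text = 'Room 101, floor_3\tOK';

const digits = text.match(/\d+/g);
console.log(digits); // [ '101', '3' ]

const words = text.match(/\w+/g);
console.log(words);  // [ 'Room', '101', 'floor_3', 'OK' ]

// \s+ splits on the spaces and on the tab.
const pieces = text.split(/\s+/);
console.log(pieces); // [ 'Room', '101,', 'floor_3', 'OK' ]

// \W picks out the characters that are not letters, digits or _:
// a space, the comma, another space and the tab.
const others = text.match(/\W/g);
console.log(others.length); // 4
```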

\n character (backreference)

Where n is a positive integer, matches the same text that the nth capturing group matched. Groups are numbered by counting their opening parentheses from the left.
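Counting opening parentheses from the left also decides the numbering of nested groups, and a backreference can require a closing quote that matches the opening one. A small sketch on sample strings:

```javascript
// Groups are numbered by the position of their opening parenthesis:
// group 1 = ((\d+)-(\d+)), group 2 = first (\d+), group 3 = second (\d+), group 4 = last (\d+).
const m = '2021-06-22'.match(/((\d+)-(\d+))-(\d+)/);
console.log(m[1], m[2], m[3], m[4]); // 2021-06 2021 06 22

// \1 matches the same quote character that group 1 captured.
const quoted = `say "hi" and 'bye'`.match(/(['"]).*?\1/g);
console.log(quoted.length); // 2: "hi" in double quotes and 'bye' in single quotes
```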

\0 character

Matches the NUL character (U+0000).

\xhh character

Matches the character whose code is given by two hexadecimal digits hh (\x00 to \xFF).

\uhhhh character

Matches the UTF-16 code unit given by four hexadecimal digits hhhh.

\u{hhhh} or \u{hhhhh} character

Matches the Unicode character whose code point is given in hexadecimal (only when the u flag is set).
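The code-based escapes from this last group, each checked against a sample string. Note that '\b' and '\0' inside a JavaScript string literal already produce the backspace and NUL characters.

```javascript
const checks = [
  /\x41/.test('A'),           // hex 41 is "A"
  /\u00e9/.test('caf\u00e9'), // U+00E9 is "é"
  /\cJ/.test('a\nb'),         // \cJ is the line feed (U+000A)
  /[\b]/.test('a\bb'),        // '\b' in a string literal is a backspace
  /\0/.test('a\0b'),          // matches the NUL character
];
console.log(checks); // [ true, true, true, true, true ]

// \u{...} needs the u flag.
const smile = '\u{1F600}';
console.log(/^\u{1F600}$/u.test(smile)); // true
```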