ES6 (8) – regexp

Time:2021-4-18

RegExp

  • Sticky: the y modifier
  • Regular expressions for Chinese characters: the u modifier

    • Multibyte Chinese character matching
    • Dot character
    • New Unicode code point syntax for matching Chinese characters
    • Quantifiers
    • The i modifier
    • Predefined patterns
  • ES6-ES10 learning map

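Sticky: the y modifier

The y modifier stands for "sticky". Like g, it performs successive matches, each continuing from where the previous one ended, but each match must begin exactly at that position rather than anywhere after it.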

const s = 'aaa_aa_a'
const r1 = /a+/g
const r2 = /a+/y // sticky: each match is implicitly anchored (like ^) at lastIndex
console.log(r1.exec(s))
// ["aaa",index: 0, input:"aaa_aa_a"]
// matched text, start index of the match, input string
console.log(r2.exec(s))
// ["aaa",index: 0, input:"aaa_aa_a"]

console.log(r1.exec(s))
// ["aa",index: 4, input:"aaa_aa_a"]
console.log(r2.exec(s))
// null

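With the g modifier, the second search starts right after the first match 'aaa' (at index 3). The character there does not have to be an a, because the search scans forward until it finds a match.

With the y modifier, the second search also starts right after 'aaa', but the match must begin at exactly that position. The character there is _, not a, so exec returns null.

Example: the lastIndex property illustrates the y modifier more clearly.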

const REGEX = /a/g
// Start searching at position 2 (the character y)
REGEX.lastIndex = 2
// The match succeeds
const match = REGEX.exec('xaya')
// The match is found at position 3
console.log(match.index) // 3
// The next search starts at position 4
console.log(REGEX.lastIndex) // 4
// Searching from position 4 fails
REGEX.exec('xaya') // null

In the code above, the lastIndex property specifies where each search starts, and the g modifier searches forward from that position until it finds a match.
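As a rough illustration of this forward search, the sketch below collects the index of every match by calling exec in a loop. The names pattern, text, positions, and found are illustrative and not part of the original example.

```javascript
const pattern = /a/g;
const text = 'xaya';
const positions = [];

let found;
// Each call resumes at pattern.lastIndex and scans forward for the next 'a'.
while ((found = pattern.exec(text)) !== null) {
  positions.push(found.index);
}

console.log(positions); // [1, 3]
// After the final, failed search, lastIndex is reset to 0.
console.log(pattern.lastIndex); // 0
```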

The y modifier also honors the lastIndex property, but it requires the match to start exactly at the position that lastIndex specifies.

const REGEX = /a/y

// Start matching at position 2
REGEX.lastIndex = 2

// Position 2 is y, not a, so the sticky match fails
REGEX.exec('xaya') // null

// Start matching at position 3
REGEX.lastIndex = 3

// Position 3 is a, so the sticky match succeeds
const match = REGEX.exec('xaya')
console.log(match.index) // 3
console.log(REGEX.lastIndex) // 4
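One place where sticky matching is commonly useful is tokenizing, because it refuses to skip over unexpected characters. The following is a minimal sketch under that assumption; the tokenize helper and the TOKEN_Y and TOKEN_G patterns are hypothetical names chosen for illustration.

```javascript
// Hypothetical helper: collect the first capture group of every successive match.
function tokenize(tokenPattern, text) {
  const tokens = [];
  tokenPattern.lastIndex = 0; // always start from the beginning of the text
  let match;
  while ((match = tokenPattern.exec(text)) !== null) {
    tokens.push(match[1]);
  }
  return tokens;
}

// A token is a plus sign or a run of digits, with optional surrounding whitespace.
const TOKEN_Y = /\s*(\+|[0-9]+)\s*/y;
const TOKEN_G = /\s*(\+|[0-9]+)\s*/g;

console.log(tokenize(TOKEN_Y, '3 + 4'));  // ['3', '+', '4']
// g silently skips the unexpected 'x' and keeps going.
console.log(tokenize(TOKEN_G, '3x + 4')); // ['3', '+', '4']
// y stops at the unexpected 'x', so the invalid input is noticeable.
console.log(tokenize(TOKEN_Y, '3x + 4')); // ['3']
```

With the sticky pattern, a result shorter than expected signals that the input contained something the token pattern does not allow.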

Regular expressions for Chinese characters: the u modifier

In ES5, regular expressions cannot correctly match characters whose Unicode code point is greater than \uFFFF. Such characters take four bytes (two code units) in UTF-16, and the u modifier makes the regular expression handle them correctly.

For example, 𠮷 has the code point U+20BB7.
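To see why such a character causes trouble, it may help to inspect how JavaScript stores it. The short sketch below uses only standard string methods; the variable name ji is illustrative.

```javascript
const ji = '\u{20BB7}'; // the character U+20BB7

// The string holds two UTF-16 code units (a surrogate pair), not one.
console.log(ji.length); // 2
console.log(ji.charCodeAt(0).toString(16)); // 'd842' (high surrogate)
console.log(ji.charCodeAt(1).toString(16)); // 'dfb7' (low surrogate)

// codePointAt reads the pair as a single code point.
console.log(ji.codePointAt(0).toString(16)); // '20bb7'
console.log(ji === '\uD842\uDFB7'); // true
```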

Multibyte Chinese character matching

let s = '𠮷'
let s2 = '\uD842\uDFB7'

console.log(/^\uD842/.test(s2)) // true: the four-byte character is treated as two characters, which is wrong
console.log(/^\uD842/u.test(s2)) // false: the four bytes are treated as one character

Dot character

The dot character matches any single character except line terminators (such as the newline character), but without the u modifier it cannot recognize a single character whose code point is greater than 0xFFFF.

console.log(/^.$/.test(s)) // false: the dot fails to match the whole character, which is wrong
console.log(/^.$/u.test(s)) // true: the dot matches the character

New Unicode code point syntax for matching Chinese characters

console.log(/\u{20BB7}/u.test(s)) // true
console.log(/\u{61}/u.test('a')) // true: \u{61} is the code point of 'a'
console.log(/\u{61}/.test('a')) // false: without u, this means 'u' repeated 61 times
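The brace syntax also works inside character classes, which makes it possible to match ranges that extend past 0xFFFF. The sketch below is an illustration only: the name HAN_ONLY is hypothetical, and the pattern deliberately covers just two Unicode blocks (CJK Unified Ideographs, U+4E00 to U+9FFF, and Extension B, U+20000 to U+2A6DF), not every Chinese character.

```javascript
// Assumption: only two CJK blocks are covered, purely for illustration.
const HAN_ONLY = /^[\u{4E00}-\u{9FFF}\u{20000}-\u{2A6DF}]+$/u;

console.log(HAN_ONLY.test('\u4E2D\u6587')); // true ('中文', both in the basic block)
console.log(HAN_ONLY.test('\u{20BB7}'));    // true ('𠮷', in Extension B)
console.log(HAN_ONLY.test('abc'));          // false
```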

Quantifiers

With the u modifier, quantifiers correctly apply to characters whose code point is greater than 0xFFFF.

// {2} means the preceding character appears exactly twice
console.log(/𠮷{2}/u.test('𠮷𠮷')) // true
console.log(/𠮷{2}/.test('𠮷𠮷')) // false: without u, {2} applies only to the second half of the character

In addition, the braces in Unicode code point escapes are interpreted correctly only when the u modifier is used; otherwise they are interpreted as quantifiers.

/^\u{3}$/.test('uuu') // true

In the code above, the regular expression has no u modifier, so the braces are interpreted as a quantifier. With the u modifier, they are interpreted as a Unicode code point escape.

/\u{20BB7}{2}/u.test('𠮷𠮷') // true

With the u modifier, a Unicode code point escape followed by a quantifier also works.

The i modifier

console.log(/[a-z]/iu.test('\u212A')) // true
console.log(/[a-z]/i.test('\u212A')) // false: even with i (ignore case), the match fails without u

Predefined patterns

The u modifier also affects whether predefined patterns can correctly recognize Unicode characters with code points greater than 0xFFFF.

/^\S$/.test('𠮷') // false
/^\S$/u.test('𠮷') // true

In the code above, \S is a predefined pattern that matches any non-whitespace character. Only with the u modifier can it correctly match Unicode characters with code points greater than 0xFFFF.

With this, you can write a function that correctly returns the length of a string.

function codePointLength(text) {
  const result = text.match(/[\s\S]/gu);
  return result ? result.length : 0;
}

const s = '𠮷𠮷';

console.log(s.length) // 4
const reals = codePointLength(s)
console.log(reals) // 2
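As a cross-check, the same count can be obtained without a regular expression, because the string iterator used by Array.from walks a string by code points rather than by UTF-16 code units. The function name codePointLengthByIteration is hypothetical.

```javascript
// Hypothetical alternative: count code points via the string iterator.
function codePointLengthByIteration(text) {
  return Array.from(text).length;
}

const sample = '\u{20BB7}\u{20BB7}'; // '𠮷𠮷'

console.log(sample.length);                          // 4 (UTF-16 code units)
console.log(codePointLengthByIteration(sample));     // 2 (code points)
console.log(codePointLengthByIteration(''));         // 0
```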

ES6-ES10 learning map

ES6 (8) - regexp
