New features of ES9: RegExp

Time:2021-5-4

Brief introduction

Regular expressions are a common tool for matching data. Each individual construct is simple, but once several constructs are combined, a pattern can become hard to read and it is hard to know where to start.

That is why many programmers dread regular expressions. Today we’ll look at the regular expression features that arrived with ES9 (ES2018).

Numbered capture groups

A regular expression can contain groups, which are written with parentheses. A group whose matched text can be retrieved afterwards is called a capture group.

Usually we access capture groups by their position in the pattern. These are called numbered capture groups.

For instance:

const RE_DATE = /([0-9]{4})-([0-9]{2})-([0-9]{2})/;

const matchObj = RE_DATE.exec('1999-12-31');
const year = matchObj[1]; // 1999
const month = matchObj[2]; // 12
const day = matchObj[3]; // 31

The regular expression above matches the year, month and day of a date. The exec method returns a match array, which stores the information about the matched groups.

Because the pattern has three pairs of parentheses, it has three groups, which we access through indexes 1, 2 and 3.

Let’s take a look at the matchObj output from above:

[
  '1999-12-31',
  '1999',
  '12',
  '31',
  index: 0,
  input: '1999-12-31',
  groups: undefined
]

You can see that matchObj is an array whose index 0 stores the whole matched substring. matchObj also has a property named groups, which is undefined here because the pattern contains no named groups.
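One detail the example above does not show: exec returns null when the pattern does not match, so reading matchObj[1] directly would throw on bad input. A minimal sketch of a guard follows; parseDate is a hypothetical helper introduced only for illustration.

```javascript
const RE_DATE = /([0-9]{4})-([0-9]{2})-([0-9]{2})/;

// parseDate is a hypothetical helper, not part of the original example.
function parseDate(text) {
  const matchObj = RE_DATE.exec(text);
  if (matchObj === null) {
    return null; // exec returns null when the pattern does not match
  }
  // Skip index 0 (the whole match) and destructure groups 1 to 3.
  const [, year, month, day] = matchObj;
  return { year, month, day };
}

console.log(parseDate('1999-12-31')); // { year: '1999', month: '12', day: '31' }
console.log(parseDate('not a date')); // null
```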

Named capture groups

As mentioned above, numbered capture groups are accessed by their position. The groups themselves have no names, so the code does not say what each group means.

Let’s see how we can name these groups:

const RE_DATE = /(?<year>[0-9]{4})-(?<month>[0-9]{2})-(?<day>[0-9]{2})/;

const matchObj = RE_DATE.exec('1999-12-31');
const year = matchObj.groups.year; // 1999
const month = matchObj.groups.month; // 12
const day = matchObj.groups.day; // 31

Take a look at matchObj:

[
  '1999-12-31',
  '1999',
  '12',
  '31',
  index: 0,
  input: '1999-12-31',
  groups: [Object: null prototype] { year: '1999', month: '12', day: '31' }
]

As you can see, the groups property now holds the named matches, while the numbered entries are still available.
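Since groups is a plain object, it combines well with destructuring. The following is a small sketch that reuses the date pattern from above:

```javascript
const RE_DATE = /(?<year>[0-9]{4})-(?<month>[0-9]{2})-(?<day>[0-9]{2})/;

// Destructure the named groups straight out of the match object.
const { groups: { year, month, day } } = RE_DATE.exec('1999-12-31');
console.log(year, month, day); // 1999 12 31

// Named groups are still numbered, so both access styles work together.
const matchObj = RE_DATE.exec('1999-12-31');
const sameValue = matchObj[1] === matchObj.groups.year;
console.log(sameValue); // true
```

Note that the destructuring form throws if the pattern does not match, because exec then returns null.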

To match the same text again that an earlier group has already matched (a backreference), we can refer to that group either by number or by name.

Let’s look at an example:

const RE_TWICE = /^(?<word>[a-z]+)!\k<word>$/; // backreference by name
RE_TWICE.test('abc!abc'); // true
RE_TWICE.test('abc!ab'); // false

const RE_TWICE_NUMBERED = /^(?<word>[a-z]+)!\1$/; // backreference by number
RE_TWICE_NUMBERED.test('abc!abc'); // true
RE_TWICE_NUMBERED.test('abc!ab'); // false

Both syntaxes can be used.
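As a further illustration of a named backreference, here is a sketch that finds an accidentally repeated word. The pattern name RE_DOUBLED and the sample sentences are made up for this example.

```javascript
// Finds a lowercase word that is immediately repeated, e.g. "is is".
const RE_DOUBLED = /\b(?<word>[a-z]+)\s+\k<word>\b/;

const hit = RE_DOUBLED.exec('this is is a test');
console.log(hit[0]);          // is is
console.log(hit.groups.word); // is

const clean = RE_DOUBLED.test('this is a test');
console.log(clean); // false
```

The \b anchors keep the pattern from pairing the "is" at the end of "this" with the following "is".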

Named capture groups can also be used with replace.

With named groups, we can refer to a group by name in the replacement string, using $<name>:

const RE_DATE = /(?<year>[0-9]{4})-(?<month>[0-9]{2})-(?<day>[0-9]{2})/;
console.log('1999-12-31'.replace(RE_DATE,
    '$<month>/$<day>/$<year>'));
    // 12/31/1999

The second parameter of replace can also be a function. Its parameters receive the parts of the match, including the groups:

const RE_DATE = /(?<year>[0-9]{4})-(?<month>[0-9]{2})-(?<day>[0-9]{2})/;
console.log('1999-12-31'.replace(
    RE_DATE,
    (g0,y,m,d,offset,input, {year, month, day}) => // (A)
        month+'/'+day+'/'+year));
    // 12/31/1999

In the example above, g0 = '1999-12-31' is the matched substring, and y, m and d are the numbered groups 1, 2 and 3.

offset is the position of the match within the string, input is the whole input string, and {year, month, day} destructures the object that holds the named groups.
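Both forms of replace become more useful with the g flag, which processes every match instead of only the first. The sketch below uses a made-up input string; the rest-parameter trick relies on the groups object being passed as the last argument when the pattern has named groups.

```javascript
// The g flag makes replace() process every date, not just the first one.
const RE_DATES = /(?<year>[0-9]{4})-(?<month>[0-9]{2})-(?<day>[0-9]{2})/g;
const text = 'from 1999-12-31 to 2000-01-01';

// Replacement string with $<name> references.
const usStyle = text.replace(RE_DATES, '$<month>/$<day>/$<year>');
console.log(usStyle); // from 12/31/1999 to 01/01/2000

// Replacement function: collect all arguments and take the last one,
// which is the groups object for a pattern with named groups.
const dotted = text.replace(RE_DATES, (...args) => {
  const { year, month, day } = args[args.length - 1];
  return `${day}.${month}.${year}`;
});
console.log(dotted); // from 31.12.1999 to 01.01.2000
```

Taking the last argument avoids having to list every numbered group just to reach the named ones.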

Unicode property escapes in RegExp

In the Unicode standard, every character has properties. Put simply, the properties describe the character.

For example, General_Category is the classification of a character: x: General_Category = Lowercase_Letter

White_Space marks invisible spacing characters such as spaces, tabs and newlines; for these characters, White_Space = True

Age indicates the Unicode version in which the character was added, and so on.

Some property values also have abbreviations: Lowercase_Letter = Ll, Currency_Symbol = Sc, etc.

For example, suppose we want to match whitespace. The traditional approach is this:

> /^\s+$/.test('\t \n\r')
true

First comes the regular expression; its test method is then called with the string and returns true.

We just talked about Unicode properties. We can also match by property:

> /^\p{White_Space}+$/u.test('\t \n\r')
true

A property escape is written \p{...}, with the property name or value inside the braces.

Note that we also add the u flag after the regular expression. Property escapes only work in Unicode mode.
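A few more property escapes, as a sketch. The sample strings are made up; the property names (Lowercase_Letter, Ll, Sc, Script=Greek, White_Space) are standard Unicode names supported in ES2018.

```javascript
// General_Category values can be written in long or short form.
const allLower = /^\p{Lowercase_Letter}+$/u.test('abc');
const mixedCase = /^\p{Ll}+$/u.test('aBc'); // 'B' is not a lowercase letter
console.log(allLower, mixedCase); // true false

// Currency_Symbol (Sc) covers symbols such as $ and €.
const symbols = '$5 or €4'.match(/\p{Sc}/gu);
console.log(symbols); // [ '$', '€' ]

// Other properties use the Name=Value form.
const greek = /^\p{Script=Greek}+$/u.test('αβγ');
console.log(greek); // true

// \P (uppercase P) negates the property: everything that is not whitespace.
const masked = 'a b c'.replace(/\P{White_Space}/gu, '_');
console.log(masked); // _ _ _
```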

Lookaround assertions

A lookaround assertion is a construct in regular expressions that checks the context before or after the text being matched, without making that context part of the match.

There are two kinds of lookaround assertions: lookahead and lookbehind. Lookahead was already available before ES9; lookbehind is the new addition.

Let’s first look at the use of lookahead:

const RE_AS_BS = /aa(?=bb)/;
const match1 = RE_AS_BS.exec('aabb');
console.log(match1[0]); // 'aa'

const match2 = RE_AS_BS.exec('aab');
console.log(match2); // null

A lookahead looks at what comes next. We use (?=bb) to require that bb follows.

Note that although the regular expression needs aabb to be present, bb is not included in the match.

The result is that the first string matches and the second one doesn’t.

Besides ?=, we can also use ?!, which means that the pattern must not follow (negative lookahead):

> const RE_AS_NO_BS = /aa(?!bb)/;
> RE_AS_NO_BS.test('aabb')
false
> RE_AS_NO_BS.test('aab')
true
> RE_AS_NO_BS.test('aac')
true
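A slightly more practical sketch of both lookahead forms, using a made-up string of CSS-like values:

```javascript
const css = '10px 20em 30px';

// Positive lookahead: numbers that are followed by "px".
const pxValues = css.match(/\d+(?=px)/g);
console.log(pxValues); // [ '10', '30' ]

// Negative lookahead: numbers that are not followed by "px".
// The extra \d alternative stops the engine from backtracking
// to a shorter number such as the "1" in "10px".
const otherValues = css.match(/\d+(?!\d|px)/g);
console.log(otherValues); // [ '20' ]
```

Without the \d alternative, the negative version would also report 1 and 3, because a shorter run of digits is followed by another digit rather than by px.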

Now let’s look at the use of lookbehind.

Lookbehind works like lookahead, only in the opposite direction.

A lookbehind is written with ?<=. Let’s take a look at an example:

const RE_DOLLAR_PREFIX = /(?<=\$)foo/g; // \$ is a literal dollar sign
'$foo %foo foo'.replace(RE_DOLLAR_PREFIX, 'bar');
    // '$bar %foo foo'

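In the example above, we require a $ immediately before foo, and replace only that foo with bar. The $ itself is not part of the match, so it is kept.

We can also use ?<! for the negative case, where the pattern must not precede the match: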

const RE_NO_DOLLAR_PREFIX = /(?<!\$)foo/g; // \$ is a literal dollar sign
'$foo %foo foo'.replace(RE_NO_DOLLAR_PREFIX, 'bar');
    // '$foo %bar bar'
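Lookbehind is handy for extracting a value without its prefix. A minimal sketch with a made-up price string; note that the dollar sign must be escaped as \$, since a bare $ is the end-of-input anchor.

```javascript
const prices = 'cost $42 and €17';

// Positive lookbehind: digits directly preceded by a dollar sign.
const dollars = prices.match(/(?<=\$)\d+/g);
console.log(dollars); // [ '42' ]

// Negative lookbehind: digits not preceded by "$" or by another digit.
// Including \d prevents a match that starts in the middle of "42".
const notDollars = prices.match(/(?<![$\d])\d+/g);
console.log(notDollars); // [ '17' ]
```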

dotAll flag

Normally, the dot (.) matches any single character except line terminators:

> /^.$/.test('\n')
false

The dotAll flag, s, is added after the regular expression and makes the dot match line terminators as well:

> /^.$/s.test('\n')
true

In ECMAScript, the following characters are line terminators:

  • U+000A LINE FEED (LF) (\n)
  • U+000D CARRIAGE RETURN (CR) (\r)
  • U+2028 LINE SEPARATOR
  • U+2029 PARAGRAPH SEPARATOR
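The effect of the s flag is easiest to see on text that spans several lines. The following sketch uses a made-up string containing a block comment and also checks the four line terminators listed above.

```javascript
const source = 'a /* first\nsecond */ b';

// Without s, the dot stops at the newline, so the comment is not found.
const withoutFlag = /\/\*.*?\*\//.exec(source);
console.log(withoutFlag); // null

// With s, the dot also matches "\n".
const RE_COMMENT = /\/\*.*?\*\//s;
const comment = RE_COMMENT.exec(source)[0];
console.log(comment === '/* first\nsecond */'); // true

// The flag is visible on the regular expression object.
console.log(RE_COMMENT.dotAll, RE_COMMENT.flags); // true s

// Each line terminator is rejected without s and accepted with it.
const terminators = ['\n', '\r', '\u2028', '\u2029'];
const allCovered = terminators.every(
  (ch) => !/^.$/.test(ch) && /^.$/s.test(ch)
);
console.log(allCovered); // true
```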

Summary

These are the new RegExp features introduced in ES9. I hope you find them useful.

Author: flydean

Link to this article: http://www.flydean.com/es9-regexp/

Source: flydean’s blog

