JavaScript regular expressions (RegExp)

Time:2020-9-16

Regular expressions are often used to check and filter strings. Because they are flexible, logical, and functional, and can control complex strings in a very compact way, many programming languages support them. In JavaScript they are also very powerful and practical.

Basic form

A regular expression is a way of expressing a text pattern (i.e. string structure). It acts as a template for strings and is often used to match text against a “given pattern”. For example, a regular expression can describe the pattern of an email address and then be used to decide whether a string is an email address. JavaScript's regular expression system was modeled on Perl 5.

There are two ways to create a regular expression. One is the literal form, delimited by slashes; the other is the RegExp constructor.

//Literal form
var telRegex1 = /^1[3578]\d{9}$/;
//Constructor form
var telRegex2 = new RegExp('^1[3578]\\d{9}$');

Both statements create a regular expression with the pattern ^1[3578]\d{9}$, which validates a mobile phone number: it must start with 1, the second digit must be 3, 5, 7, or 8, and 9 more digits must follow. (Inside a character class, | is an ordinary character, so the alternatives are listed without separators.)

There is a subtle run-time difference between the two approaches. With the literal form, the regular expression object is created when the code is loaded (i.e. at compile time); with the constructor, it is created while the code is running. Because the literal form is more convenient and intuitive to write, it is the one mostly used in practice.

Note that when you create a regular expression with the constructor, the pattern is passed as a string, and \ is also the escape character in strings, so you need a second \ to escape it. In the second example above, \\d is written to represent any digit.
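The escaping rule can be checked with a short sketch. The variable names below are illustrative and not part of the original examples; the source property shows the pattern that the constructor actually received.

```javascript
// In a string literal, '\d' collapses to the plain letter "d",
// so the backslash never reaches the RegExp constructor.
const unescaped = new RegExp('^\d{3}$');
// Doubling the backslash keeps \d intact in the pattern.
const escaped = new RegExp('^\\d{3}$');

console.log(unescaped.source);      // ^d{3}$
console.log(escaped.source);        // ^\d{3}$
console.log(unescaped.test('123')); // false
console.log(unescaped.test('ddd')); // true
console.log(escaped.test('123'));   // true
```

Printing source is a quick way to debug a constructor-built pattern that unexpectedly fails to match.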

For the meaning and usage of various symbols in regular expressions, please refer to the following introduction:

Metacharacter

Some commonly used metacharacters are as follows:

  • . matches any character except a newline.

  • \w matches a letter, digit, or underscore (ASCII only; in JavaScript it does not match Chinese characters).

  • \s matches any whitespace character.

  • \d matches a digit.

  • \b matches a word boundary (the beginning or end of a word).

  • ^ matches the beginning of the string.

  • $ matches the end of the string.

  • * matches the preceding subexpression zero or more times, equivalent to {0,}.

  • ? matches the preceding subexpression zero times or once, equivalent to {0,1}.

  • + matches the preceding subexpression one or more times, equivalent to {1,}.

  • {n} matches the preceding subexpression exactly n times.

  • {m,n} matches the preceding subexpression at least m times and at most n times.

  • {n,} matches the preceding subexpression at least n times.

  • [xyz] is a character set that matches any one of the listed characters. A range can be written with -, such as [a-z] for any letter from a to z. Ranges can be combined, as in [A-Za-z0-9].

  • [^xyz] is a negated character set that matches any character not listed. A range can be written with -, such as [^a-z] for any character that is not a letter from a to z.

  • | expresses an "or" relationship, such as com|cn, which matches com or cn.

  • () is used for grouping. The content matched by a group can be retrieved with $1-$9 (in the string-related methods) and referenced later in the same pattern with \1-\9 (inside the regular expression). Group 0 represents the entire match.

In regular expressions, the metacharacters above, and some not listed here, have special meanings. To match a metacharacter itself, escape it with \.

For more metacharacters, see: regular expressions
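A few of the metacharacters above can be tried out directly. This is a minimal sketch with made-up sample strings, not taken from the original article.

```javascript
// \w covers only ASCII letters, digits, and underscore,
// so the Chinese character \u4e2d, the space, and the hyphen are skipped.
const wordChars = 'a_1 \u4e2d-b'.match(/\w/g);

// \b marks a word boundary, so "cat" inside "concat" does not count.
const wholeWord = /\bcat\b/.test('concat cat');
const insideWord = /\bcat\b/.test('concat');

// \1 repeats whatever the first group captured.
const doubled = /(\w)\1/.exec('hello');

// An escaped dot matches only a literal dot.
const literalDot = /^a\.b$/.test('a.b');
const escapedRejects = /^a\.b$/.test('axb');
const anyChar = /^a.b$/.test('axb');

console.log(wordChars);      // ["a", "_", "1", "b"]
console.log(wholeWord);      // true
console.log(insideWord);     // false
console.log(doubled);        // ["ll", "l"]
console.log(literalDot);     // true
console.log(escapedRejects); // false
console.log(anyChar);        // true
```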

Attributes

Modifier

  • ignoreCase: returns a Boolean indicating whether the i modifier is set; read-only.

  • global: returns a Boolean indicating whether the g modifier is set; read-only.

  • multiline: returns a Boolean indicating whether the m modifier is set; read-only.

  • sticky (ES6): returns a Boolean indicating whether the y modifier is set; read-only.

var r = /abc/igm;

r.ignoreCase; // true
r.global;  // true
r.multiline;  // true

Match-related properties

  • lastIndex: returns the position at which the next search starts. This property is readable and writable, but it is only meaningful when the g modifier is set.

  • source (ES5): returns the pattern text of the regular expression (excluding the enclosing slashes and the modifiers); read-only.

  • flags (ES6): returns the modifiers of the regular expression; read-only.

var r = /abc/igm;

r.lastIndex; // 0
r.source; // "abc"
r.flags; // "gim" (modifiers are always returned in a fixed order, not the order written)

Methods

test()

The test method of a regular expression object takes the string to test and returns a Boolean indicating whether the string matches the pattern.

telRegex1.test('13612341234'); // true
telRegex2.test('13612341234'); // true
telRegex1.test('136123412'); // false

If the regular expression has the g modifier, each call to test starts matching from where the previous match ended. The lastIndex property can also be set to specify where the search starts.

var xReg = /x/g;
var str = 'xyz_x1_y1_x3';

xReg.lastIndex; // 0
xReg.test(str); // true

xReg.lastIndex; // 1
xReg.test(str); // true
xReg.lastIndex; // 5

//Start from a specified position. Searching from the last character finds no match
xReg.lastIndex = 11; // 11
xReg.test(str); // false

xReg.lastIndex; // 0

var indexReg = /^(?:http|https).+\/jwebui\/pages\/themes\/(\w+)\/\1\.jspx(\?\S+)?$/i;

The regular expression above is used in F8 to check whether a URL is the home page.

  • The leading ^ and the trailing $ match the beginning and the end of the string, respectively.

  • (?:http|https) matches one of the two alternatives. Because of ?:, the parentheses group without capturing. It could also be written as (http|https), but then the later \1 would have to become \2, because this would form the first group.

  • .+ matches any character, at least once.

  • \/jwebui\/pages\/themes\/ matches the string "/jwebui/pages/themes/".

  • (\w+) matches one or more letters, digits, or underscores and captures them as the first group.

  • \1 is a reference to the first group: it matches the same text that the first group matched.

  • \.jspx matches .jspx.

  • (\?\S+)? means the content matched by (\?\S+) occurs zero times or once. Within it:

    • \? matches a literal ?.

    • \S+ matches any non-whitespace character, at least once.
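To see how the pieces listed above work together, here is a small sketch. It assumes the pattern contains the \1 back-reference described in the list, and the URLs are hypothetical examples on example.com, not real F8 addresses.

```javascript
// Assumed form of the pattern: \1 requires the file name to repeat the theme name.
const indexReg = /^(?:http|https).+\/jwebui\/pages\/themes\/(\w+)\/\1\.jspx(\?\S+)?$/i;

// Hypothetical URLs, for illustration only.
const home = 'http://example.com/jwebui/pages/themes/blue/blue.jspx?lang=en';
const other = 'http://example.com/jwebui/pages/themes/blue/red.jspx';

const homeResult = indexReg.exec(home);

console.log(indexReg.test(home));  // true
console.log(indexReg.test(other)); // false: "red" does not repeat "blue"
console.log(homeResult[1]);        // "blue" (first capturing group)
console.log(homeResult[2]);        // "?lang=en" (optional query group)
```

Because (?:http|https) does not capture, the theme name is group 1 and the query string is group 2.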

exec()

The exec method of a regular expression object returns the match result. If a match is found, it returns an array whose first member is the matched substring; otherwise it returns null.

If the regular expression contains parentheses (that is, capturing groups), the returned array has more than one member. The first member is the whole successful match, and the following members are the contents captured by each group: the second member corresponds to the first pair of parentheses, the third member to the second pair, and so on. The length property of the array equals the number of groups plus 1.

var ipReg = /(\d{1,3}\.){3}(\d{1,3})/;
var ipStr = 'My ip is "192.168.118.47" , please tell me yours';

ipReg.exec(ipStr); // ["192.168.118.47", "118.", "47"]

The code above is a simple IP check: a number of 1-3 digits followed by a ., with that whole unit repeated three times, and finally another number of 1-3 digits. In the result array, the first value is the overall match and the later values are the contents captured by the groups. A repeated group keeps only its last repetition, which is why the second value is "118.".

If you add the g modifier to a regular expression, you can call exec repeatedly, and each search starts where the previous successful match ended. You can also set lastIndex so that the next search starts at a specified position (see the earlier test examples).

var ipLastReg = /\d+(?=;)/g;

var ipsStr = '192.168.118.47;192.168.118.46;192.168.118.48;';

ipLastReg.exec(ipsStr); // ["47"]
ipLastReg.exec(ipsStr); // ["46"]
ipLastReg.exec(ipsStr); // ["48"]

In the code above, (?=;) is a lookahead assertion: \d+ is matched only when it is immediately followed by ;.

If you only need to know whether there is a match, use RegExp.test() or the string's search() method instead; they are more efficient.
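Repeated exec calls with the g modifier are usually written as a loop. The following is a minimal sketch reusing the pattern and string from the example above; the found array and its field names are illustrative.

```javascript
const ipLastReg = /\d+(?=;)/g;
const ipsStr = '192.168.118.47;192.168.118.46;192.168.118.48;';

const found = [];
let m;
// exec returns null once no further match exists and resets lastIndex to 0,
// so the loop ends cleanly and the regex can be reused afterwards.
while ((m = ipLastReg.exec(ipsStr)) !== null) {
    found.push({ text: m[0], index: m.index });
}

console.log(found);
// [ { text: "47", index: 12 }, { text: "46", index: 27 }, { text: "48", index: 42 } ]
console.log(ipLastReg.lastIndex); // 0
```

Unlike match with the g modifier, the loop also gives access to each match's index and groups.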

String-related methods

These are called string-related methods because they are called on a string (since ES6 they delegate internally to methods on the regular expression, but the entry point is still on the string).

  • match(): returns an array whose members are all matching substrings.

  • search(): searches according to the given regular expression and returns an integer indicating the start of the match.

  • replace(): replace according to the given regular expression and return the replaced string.

  • split(): split the string according to the given rules and return an array containing each member after segmentation.

match()

The match method runs a regular expression against the string and returns the result. It is very similar to the exec method of regular expression objects: a successful match returns an array, and a failed match returns null. If the regular expression has the g modifier, match behaves differently from exec and returns all matching results at once.

var ipLastReg = /\d+(?=;)/g;
var ipsStr = '192.168.118.47;192.168.118.46;192.168.118.48;';

ipsStr.match(ipLastReg); // ["47", "46", "48"]

The regular expression above matches the last segment of each IP. The lookahead assertion (?=;) means it only matches text directly before a ;, without including the ; itself. Lookahead assertions are covered in more detail below.
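The difference between match with and without the g modifier is easiest to see when the pattern has groups. This is a small sketch with a made-up date string.

```javascript
const dateStr = '2016-11-01 and 2017-02-03';

// Without g: behaves like exec, returning the first match plus its groups.
const withGroups = dateStr.match(/(\d{4})-(\d{2})/);
// With g: returns every full match, but no group contents.
const allMatches = dateStr.match(/(\d{4})-(\d{2})/g);
// No match at all yields null, not an empty array.
const noMatch = dateStr.match(/xyz/g);

console.log(withGroups[0], withGroups[1], withGroups[2]); // "2016-11" "2016" "11"
console.log(withGroups.index); // 0
console.log(allMatches);       // ["2016-11", "2017-02"]
console.log(noMatch);          // null
```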

search()

The search method returns the position in the string of the first match (the pattern may be given directly as a string, not necessarily a regular expression object). If there is no match, it returns -1.

var nowDateStr = '2016-11-1';
var testReg = /-/g;

nowDateStr.search(testReg); // 4
//Searching again still returns 4
nowDateStr.search(testReg); // 4

//Check lastIndex, then set it
testReg.lastIndex; // 0
testReg.lastIndex = 6;
nowDateStr.search(testReg); // 4, the result is still 4

The search method always searches from the start of the string, regardless of the regular expression's g modifier and lastIndex property.

replace()

The replace method replaces the matched value and returns the new string. It takes two parameters: the first is the search pattern (a string can be used directly, not necessarily a regular expression object), and the second is the replacement (a string or a function). If the search pattern does not have the g modifier, only the first match is replaced; otherwise, all matches are replaced.

In the second parameter of replace, the dollar sign $ can be used to refer to parts of the match, as follows:

  • $& refers to the matched substring.

  • $` refers to the text before the match.

  • $' refers to the text after the match.

  • $n refers to the content of the nth group that matched. n is a natural number starting from 1.

  • $$ refers to a literal dollar sign $.

var re = /-/g; 
var str = '2016-11-01';
var newstr = str.replace(re,'.');
console.log(newstr);  // "2016.11.01"

'hello world'.replace(/(\w+)\s(\w+)/, '$2 $1');
// "world hello"

'abc'.replace('b', '[$`-$&-$\']');
// "a[a-b-c]c"

When the second parameter is a function:

function toCamelStyle(str) {
    //Match - and a character after it, where the character is in a group
    var camelRegExp = /-([a-z])/ig;

    return str.replace(camelRegExp, function(all, letter) {
        //All is the matched content, and letter is the group matching        
        return letter.toUpperCase();
    });
}

toCamelStyle('margin-left'); // "marginLeft"
toCamelStyle('aa-bb-cccc'); // "aaBbCccc"

The code above converts a string such as aa-bb-cccc into the form aaBbCccc. The replace callback here receives two parameters: the first is the matched content and the second is the captured group. There is one group parameter per group in the pattern. After the group parameters come two more: the position of the match in the original string, and the original string itself.
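The two trailing callback parameters can be observed with a short sketch. The parameter names offset and original, and the log array, are illustrative choices.

```javascript
const text = 'a-b-c';
const log = [];

const out = text.replace(/-([a-z])/g, function (all, letter, offset, original) {
    // offset is where this match starts; original is the full input string.
    log.push(all + '@' + offset + ' in ' + original);
    return letter.toUpperCase();
});

console.log(out); // "aBC"
console.log(log); // ["-b@1 in a-b-c", "-c@3 in a-b-c"]
```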

split()

The split method splits a string according to a pattern and returns an array of the resulting parts. It takes two parameters: the first is the separator (a string can be used directly, not necessarily a regular expression object), and the second is the maximum number of members in the returned array.

'2016-11-01'.split('-'); // ["2016", "11", "01"]
'2016-11-01'.split(/-/); // ["2016", "11", "01"]
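The second parameter and a regular expression separator can be sketched as follows; the comma-separated sample string is made up for illustration.

```javascript
// The second argument caps how many parts are returned.
const limited = '2016-11-01'.split('-', 2);

// A regular expression separator can absorb optional whitespace around commas.
const byRegex = 'a, b ,c'.split(/\s*,\s*/);

console.log(limited); // ["2016", "11"]
console.log(byRegex); // ["a", "b", "c"]
```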

Greedy mode and lazy mode

When a regular expression contains quantifiers that accept repetition, the usual behavior is to match as many characters as possible (provided the whole expression can still match). This is called greedy mode.

For example:

var s = 'aaa';
s.match(/a+/); // ["aaa"]

Sometimes we need lazy matching, that is, matching as few characters as possible. Any of the quantifiers given above can be made lazy by adding a question mark ? after it. For example, *? means any number of repetitions, but as few as possible while still letting the whole match succeed.

var s = 'aaa';
s.match(/a+?/); // ["a"]

The lazy quantifiers are:

  • *? repeats any number of times, but as few as possible.

  • +? repeats once or more, but as few times as possible.

  • ?? repeats 0 or 1 times, but as few times as possible.

  • {n,m}? repeats n to m times, but as few times as possible.

  • {n,}? repeats n or more times, but as few times as possible.

In other words, quantifiers are greedy by default, and adding a ? turns them lazy, which is also known as non-greedy mode.
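A common place where the two modes differ is matching paired delimiters. The sketch below uses a made-up HTML-like string; it illustrates the quantifiers only and is not a recommendation to parse HTML with regular expressions.

```javascript
const html = '<b>one</b> and <b>two</b>';

// Greedy .* runs to the last </b> in the string.
const greedy = html.match(/<b>.*<\/b>/)[0];
// Lazy .*? stops at the first </b> it can reach.
const lazy = html.match(/<b>.*?<\/b>/)[0];
const lazyAll = html.match(/<b>.*?<\/b>/g);

console.log(greedy);  // "<b>one</b> and <b>two</b>"
console.log(lazy);    // "<b>one</b>"
console.log(lazyAll); // ["<b>one</b>", "<b>two</b>"]
```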

Group matching

Usually the content matched by a () group is stored and can be referenced inside the regular expression (with \1-\9) and in the related methods (with $1-$9). This was introduced earlier and is not repeated here.

As for group matching, there are the following situations:

Non-capturing group

(?:x) is called a non-capturing group: the content matched by the group is not returned, that is, the parentheses do not contribute a member to the match result.

//Normal matching
var url = /(http|ftp):\/\/([^/\r\n]+)(\/[^\r\n]*)?/;

url.exec('http://google.com/');
// ["http://google.com/", "http", "google.com", "/"]

//Non capture group matching
var url = /(?:http|ftp):\/\/([^/\r\n]+)(\/[^\r\n]*)?/;

url.exec('http://google.com/');
// ["http://google.com/", "google.com", "/"]

The lookahead assertion and negative lookahead assertion described next are also non-capturing.

Lookahead assertion

x(?=y) is called a positive lookahead: x is matched only when it is directly before y, and y is not included in the returned result.

For example, the previous IP matching example:

var ipLastReg = /\d+(?=;)/g;
var ipsStr = '192.168.118.47;192.168.118.46;192.168.118.48;';

ipsStr.match(ipLastReg); // ["47", "46", "48"]

In the regular expression above, (?=;) means that only text directly before a ; is matched, without including the ; itself.

Negative lookahead assertion

x(?!y) is called a negative lookahead: x is matched only when it is not directly before y, and y is not included in the returned result.

var xreg = /\d+(?!%)/g ;
xreg.exec('100% is 1'); // ["10"]
xreg.exec('100% is 1'); // ["1"]
/\d+?(?!%)/.exec('100% is 1'); // ["1"]

The code above matches a number that is not followed by %. Because \d+ in xreg is greedy, the first match is 10: the 10 itself is not directly before %, and the regular expression does not treat 100 as an indivisible whole. The second call matches the final 1. (Note: the regular expression must be stored in a variable here. If the literal is written inline, every call creates a new object and the result is always 10.)

To match the leading 1 in 100 instead, add a ? after \d+ to convert it to non-greedy mode.

JavaScript originally did not support lookbehind and negative lookbehind assertions; support was added to the language later (in ES2018). See the ES6 extension section below.
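The note about stored versus inline regular expressions can be verified with a short sketch. The last pattern, which also forbids a following digit so that no part of 100 is matched, is an addition for illustration and not from the original article.

```javascript
const input = '100% is 1';

// A stored /g regex remembers lastIndex between calls.
const stored = /\d+(?!%)/g;
const first = stored.exec(input)[0];
const second = stored.exec(input)[0];

// Each inline literal is a fresh object that starts again at lastIndex 0.
const freshA = /\d+(?!%)/g.exec(input)[0];
const freshB = /\d+(?!%)/g.exec(input)[0];

// Also rejecting a following digit stops the match from cutting 100 short.
const notPercent = input.match(/\d+(?![\d%])/g);

console.log(first, second);   // "10" "1"
console.log(freshA, freshB);  // "10" "10"
console.log(notPercent);      // ["1"]
```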

ES6 extension

Constructor

There are two cases for the arguments of the RegExp constructor.

  • In the first case, the first parameter is a string and the second parameter gives the modifiers (flags) of the regular expression.

  • In the second case, the parameter is a regular expression. Before ES6 no second parameter was allowed in this case, and a copy of the original regular expression is returned.

ES6 changes the second case: a second parameter may be passed to set the modifiers, overriding those of the regular expression given as the first parameter.

var regex = new RegExp(/xyz/, 'i'); // throws a TypeError in ES5

new RegExp(/abc/ig, 'i'); // the result in ES6 is /abc/i
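Both forms of the second case can be checked in an ES6 environment. This is a minimal sketch with illustrative variable names.

```javascript
const original = /abc/ig;

// One argument: a copy with the same source and modifiers.
const copy = new RegExp(original);
// Two arguments (ES6): the new modifiers replace the original ones.
const reflagged = new RegExp(original, 'i');

console.log(copy === original); // false, it is a new object
console.log(copy.source);       // "abc"
console.log(copy.flags);        // "gi"
console.log(reflagged.flags);   // "i"
console.log(reflagged.global);  // false
```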

Regular method of string

Four methods of string objects can use regular expressions: match(), replace(), search(), and split().

In ES6, these four methods call RegExp instance methods internally, so that all regex-related methods are defined on RegExp objects.
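The RegExp instance methods that the string methods delegate to are keyed by well-known symbols (Symbol.match, Symbol.replace, Symbol.search, Symbol.split). They are rarely called directly, but doing so in a sketch shows the relationship.

```javascript
const dateText = '2016-11-01';
const dash = /-/;
const dashAll = /-/g;

// Each string method has a counterpart on the regular expression itself.
const viaString = dateText.split(dash);
const viaRegExp = dash[Symbol.split](dateText);
const replaced = dashAll[Symbol.replace](dateText, '.');
const position = dash[Symbol.search](dateText);
const matched = /\d+/[Symbol.match](dateText);

console.log(viaString);  // ["2016", "11", "01"]
console.log(viaRegExp);  // ["2016", "11", "01"]
console.log(replaced);   // "2016.11.01"
console.log(position);   // 4
console.log(matched[0]); // "2016"
```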

Modifier

ES6 adds the u modifier to regular expressions, meaning “Unicode mode”. It is used to correctly handle Unicode characters larger than \uFFFF, that is, characters whose UTF-16 encoding takes four bytes.
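The effect of the u modifier shows up with a character outside the basic plane. The sketch below uses U+20BB7, written as its surrogate pair, purely as a sample character.

```javascript
// U+20BB7 is stored as two UTF-16 code units (a surrogate pair).
const astral = '\uD842\uDFB7';

// Without u, "." matches a single code unit, so one character looks like two.
const withoutU = /^.$/.test(astral);
// With u, "." matches the whole code point.
const withU = /^.$/u.test(astral);
// The \u{...} escape for code points is only recognized with u.
const byCodePoint = /\u{20BB7}/u.test(astral);

console.log(astral.length); // 2
console.log(withoutU);      // false
console.log(withU);         // true
console.log(byCodePoint);   // true
```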

ES6 also adds the y modifier, called the “sticky” modifier.

The y modifier works much like the g modifier: it is also a global match, and each match starts from the position after the previous successful match. The difference is that with g a match anywhere in the remaining text is enough, whereas y requires the match to start at the first remaining position, which is what “sticky” means.

var s = 'aaa_aa_a';
var r1 = /a+/g;
var r2 = /a+/y;

//The first time it matches correctly
r1.exec(s); // ["aaa"]
r2.exec(s); // ["aaa"]

//The second time the results were inconsistent
r1.exec(s); // ["aa"]
r2.exec(s); // null

My own understanding: the y modifier is similar to implicitly adding a ^ at the position where each match starts.
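The “anchored at lastIndex” reading can be tested by setting lastIndex by hand. In this sketch the second pattern, which also consumes the separator, is an illustrative addition that lets a sticky regular expression walk through the whole string.

```javascript
const s = 'aaa_aa_a';
const sticky = /a+/y;

const firstHit = sticky.exec(s);   // matches at index 0
const secondHit = sticky.exec(s);  // index 3 is "_", so the match fails
const resetIndex = sticky.lastIndex;

// Moving lastIndex onto an "a" lets the sticky match succeed again.
sticky.lastIndex = 4;
const thirdHit = sticky.exec(s);

// Consuming the separator keeps every match adjacent to the previous one.
const walker = /a+_?/y;
const tokens = [];
let t;
while ((t = walker.exec(s)) !== null) {
    tokens.push(t[0]);
}

console.log(firstHit[0]); // "aaa"
console.log(secondHit);   // null
console.log(resetIndex);  // 0
console.log(thirdHit[0]); // "aa"
console.log(tokens);      // ["aaa_", "aa_", "a"]
```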

Attributes

In ES5, regular expression objects have the source property, which returns the pattern text of the regular expression.

ES6 adds the flags property, which returns all the modifiers of a regular expression object.

Lookbehind assertion

A lookbehind assertion is the opposite of a lookahead assertion. For example, /(?<=y)x/ matches x only when x comes directly after y.

In the same way, a negative lookbehind assertion such as /(?<!y)x/ matches x only when x does not come directly after y.

Note that with lookbehind assertions the order of execution changes: inside the assertion, matching proceeds from right to left, the opposite of the usual order. As a result, some matches may give unexpected results, and the way groups capture and \1-\9 references are placed inside the assertion also changes.
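The following sketch shows both lookbehind forms and the right-to-left effect on greedy groups. It assumes an engine with lookbehind support (ES2018 or later), and the price string is a made-up example.

```javascript
const price = 'Cost: $42 or 17 EUR';

// Positive lookbehind: digits directly after a dollar sign.
const afterDollar = price.match(/(?<=\$)\d+/g);
// Negative lookbehind: whole numbers not directly after a dollar sign.
const notAfterDollar = price.match(/(?<!\$)\b\d+/g);

// Left to right, the first greedy group takes as much as it can.
const leftToRight = /^(\d+)(\d+)$/.exec('1053');
// Inside a lookbehind matching runs right to left,
// so the second group is the greedy one.
const rightToLeft = /(?<=(\d+)(\d+))$/.exec('1053');

console.log(afterDollar);                       // ["42"]
console.log(notAfterDollar);                    // ["17"]
console.log(leftToRight[1], leftToRight[2]);    // "105" "3"
console.log(rightToLeft[1], rightToLeft[2]);    // "1" "053"
```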


The original article is published on my blog, regexp. You are welcome to visit!

Error correction

In the negative lookahead section, the sentence

To match the final 1, add a ? after \d+ to convert it to non-greedy mode.

was wrong and has been corrected to

To match the leading 1 in 100, add a ? after \d+ to convert it to non-greedy mode.
