Understand regular expressions thoroughly and stop copying them

Time:2021-2-13

Regular expressions are used to match patterns in strings. The term “regular expression” is long, so it is usually abbreviated to “regex” or “regexp”. Regular expressions can be used to replace text in a string, validate form input, extract substrings that match a pattern, and so on.

From now on, say goodbye to copying regular expressions!

Regular expressions turn up constantly when we write code, especially when validating form fields. For convenience, we usually act as a middleman: we pick up a ready-made expression online and pass it on to the form. This works, but it only satisfies ordinary customers (forms). When a big customer needs a custom product (special validation rules), it is too late to start learning how to make one. The customer will not wait, and losing the customer (a delayed project) may also cost you some of your balance (criticism from your boss).

1、 Basic knowledge

Basic syntax

/pattern/[modifiers];
  • pattern: the expression to match
  • modifiers: optional flags that change how the match is performed

Modifiers

Modifiers control, for example, whether a search is case-insensitive or global:

Modifier  Description
i         Performs case-insensitive matching.
g         Performs a global match (finds all matches instead of stopping at the first one).
m         Performs multi-line matching (^ and $ also match at the start and end of each line).

Examples

var str = 'aBc Abcd abcde';
str.match(/bcd/);         // ["bcd"]
str.match(/bcd/g);        // ["bcd", "bcd"]

str.match(/abc/g);        // ["abc"]
str.match(/abc/gi);       // ["aBc", "Abc", "abc"]

In practice, global matching is what we want in most cases; whether to ignore case depends on the situation.

When several modifiers are used together, their order does not matter:

str.match(/abc/gi);       // ["aBc", "Abc", "abc"]
str.match(/abc/ig);       // ["aBc", "Abc", "abc"]
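
The m modifier appears in the table but not in the examples above. Here is a minimal sketch of its effect, using a made-up three-line string (the name sampleLines is only for illustration):

```javascript
// Hypothetical sample text containing three lines
const sampleLines = 'apple\nbanana\navocado';

// Without m, ^ only matches at the very start of the string
const withoutM = sampleLines.match(/^a\w+/g);   // ["apple"]

// With m, ^ also matches right after each line break
const withM = sampleLines.match(/^a\w+/gm);     // ["apple", "avocado"]

console.log(withoutM, withM);
```

The same applies to $: with m it matches before each line break as well as at the end of the string.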

Ordinary characters and metacharacters

According to regular expression syntax, most characters simply stand for themselves. These are called ordinary characters, for example all letters and digits.

A metacharacter is a character with a special meaning. Many metacharacters are written with a leading backslash to distinguish them from ordinary characters. The metacharacters supported by JavaScript regular expressions are shown in the table below.

Metacharacter  Description
.              Matches any single character except line terminators
\w             Matches a word character (letter, digit, or underscore)
\W             Matches a non-word character
\d             Matches a digit
\D             Matches a non-digit character
\s             Matches a whitespace character
\S             Matches a non-whitespace character
\b             Matches a word boundary
\B             Matches a non-word boundary
\n             Matches a line feed
\f             Matches a form feed
\r             Matches a carriage return
\t             Matches a tab
\v             Matches a vertical tab
\xxx           Matches the character specified by the octal number xxx
\xdd           Matches the character specified by the hexadecimal number dd
\uxxxx         Matches the Unicode character specified by the hexadecimal number xxxx

Examples

var str = "hello word 12a3";

str.match(/./gi);           // ["h", "e", "l", "l", "o", " ", "w", "o", "r", "d", " ", "1", "2", "a", "3"]
str.match(/\d/gi);          // ["1", "2", "3"]
str.match(/\D/gi);          // ["h", "e", "l", "l", "o", " ", "w", "o", "r", "d", " ", "a"]
str.match(/\w/gi);          // ["h", "e", "l", "l", "o", "w", "o", "r", "d", "1", "2", "a", "3"]
str.match(/\W/gi);          // [" ", " "]
str.match(/\s/gi);          // [" ", " "]
str.match(/\S/gi);          // ["h", "e", "l", "l", "o", "w", "o", "r", "d", "1", "2", "a", "3"]
str.match(/\b/gi);          // ["", "", "", "", "", ""]
str.match(/\B/gi);          // ["", "", "", "", "", "", "", "", "", ""]

var str = '你好世界! Hello word!';
// Match any character in the range \u0000-\u00ff (ASCII and Latin-1)
str.match(/[\u0000-\u00ff]/g);  // ["!", " ", "H", "e", "l", "l", "o", " ", "w", "o", "r", "d", "!"]
// Match any character outside that range, such as Chinese characters
str.match(/[^\u0000-\u00ff]/g); // ["你", "好", "世", "界"]
// Match capital letters (A-Z is \u0041-\u005A)
str.match(/[\u0041-\u005A]/g);  // ["H"]
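
The \b results above are all empty strings, because a boundary is a position rather than a character. A small sketch of where \b is useful in practice, matching whole words only (the sample sentence is made up for illustration):

```javascript
// Hypothetical sample sentence
const sentence = 'cat concat category cat';

// Without boundaries, "cat" is also found inside longer words
const anywhere = sentence.match(/cat/g);        // 4 matches

// With \b on both sides, only the standalone word is matched
const wholeWords = sentence.match(/\bcat\b/g);  // ["cat", "cat"]

console.log(anywhere.length, wholeWords);
```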

Repeated matching

A quantifier lets you match the same content several times in a row

Quantifier  Description
n+          Matches one or more consecutive n
n*          Matches zero or more consecutive n
n?          Matches zero or one n
n{x}        Matches exactly x consecutive n
n{x,y}      Matches at least x and at most y consecutive n
n{x,}       Matches at least x consecutive n

Examples

var str = 'Hello Helllo Hehello Hehehehelllloooo';

str.match(/he/gi);     // ["He", "He", "He", "he", "He", "he", "he", "he"]
str.match(/(he)+/gi);  // ["He", "He", "Hehe", "Hehehehe"]
str.match(/(he)*/gi);  // ["He", "", "", "", "", "He", "", "", "", "", "", "Hehe", "", "", "", "", "Hehehehe", "", "", "", "", "", "", "", "", ""]
str.match(/(he)?/gi);  // ["He", "", "", "", "", "He", "", "", "", "", "", "He", "he", "", "", "", "", "He", "he", "he", "he", "", "", "", "", "", "", "", "", ""]
str.match(/(he){1}/gi); // ["He", "He", "He", "he", "He", "he", "he", "he"]
str.match(/(he){2}/gi); // ["Hehe", "Hehe", "hehe"]
str.match(/(he){2,}/gi); //  ["Hehe", "Hehehehe"]
str.match(/(he){3,4}/gi); // ["Hehehehe"]
str.match(/(he)+l+/gi);   // ["Hell", "Helll", "Hehell", "Hehehehellll"]
str.match(/(he)+l{3,}/gi);   //  ["Helll", "Hehehehellll"]

The examples above show that different notations can produce the same result:

  • n+ is equivalent to n{1,}
  • n? is equivalent to n{0,1}
  • n* is equivalent to n{0,}
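
These equivalences can be checked directly. The sketch below compares each shorthand with its brace form on a made-up string (the variable names are only for illustration):

```javascript
// Hypothetical sample string
const sample = 'b ab aab aaab';

const plusResult = sample.match(/a+/g);          // ["a", "aa", "aaa"]
const plusBraces = sample.match(/a{1,}/g);

const optionalResult = sample.match(/a?b/g);     // ["b", "ab", "ab", "ab"]
const optionalBraces = sample.match(/a{0,1}b/g);

const starResult = sample.match(/a*b/g);         // ["b", "ab", "aab", "aaab"]
const starBraces = sample.match(/a{0,}b/g);

// Each pair prints the same array twice
console.log(plusResult, plusBraces);
console.log(optionalResult, optionalBraces);
console.log(starResult, starBraces);
```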

Boundary anchors

Anchors match a position in the string rather than a character

Anchor  Description
^       Matches the beginning of the string. With the m modifier, it also matches the beginning of each line
$       Matches the end of the string. With the m modifier, it also matches the end of each line

Examples

var str = 'abc ABC';

/^abc/gi.exec(str);    // ['abc']
/abc$/gi.exec(str);    // ['ABC'] 
/abc/gi.exec(str);     // ['abc']

Without ^ or $, the pattern may match anywhere in the string; the search starts at the beginning and returns the first match.
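
Anchors matter most in validation, where the whole string has to fit the pattern. A minimal sketch using test() (the names unanchored, anchored and checks are made up for this example):

```javascript
const unanchored = /\d+/;    // true if digits appear anywhere in the string
const anchored = /^\d+$/;    // true only if the whole string consists of digits

const checks = {
  unanchoredMixed: unanchored.test('abc123'),  // true: "123" is found inside
  anchoredMixed: anchored.test('abc123'),      // false: "abc" is not digits
  anchoredDigits: anchored.test('123'),        // true
};

console.log(checks);
```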

Matching range

Expression  Description
[abc]       Matches any one of the characters between the square brackets.
[0-9]       Matches any digit from 0 to 9.
(x|y)       Matches any of the alternatives separated by |, that is, x or y.

Examples

var str = 'Hello RegExp 369'

str.match(/[2-8]/gi);     // ["3", "6"]
str.match(/[el]/gi);      // ["e", "l", "l", "e", "E"]
str.match(/[x|5|6]/gi);   // ["x", "6"]  (inside [], | is a literal character; [x56] is enough)
str.match(/[a-h]/gi);     // ["H", "e", "e", "g", "E"]

// Several ranges can be combined in one set of brackets
str.match(/[2-8a-h]/gi);  // ["H", "e", "e", "g", "E", "3", "6"]
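
The (x|y) form from the table is not used in the examples above. The following sketch contrasts it with square brackets on the same string; the chosen alternatives are only an illustration:

```javascript
const text = 'Hello RegExp 369';

// Alternation: match the whole sequence "Hello" or the whole sequence "Reg"
const alternation = text.match(/(Hello|Reg)/g);
// ["Hello", "Reg"]

// Character class: match any single character from the set H, e, l, o, |, R, g
const characterClass = text.match(/[Hello|Reg]/g);
// ["H", "e", "l", "l", "o", "R", "e", "g"]

console.log(alternation, characterClass);
```

Parentheses choose between whole alternatives, while square brackets always match exactly one character.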

Escaping characters

As the sections above show, regular expressions use certain special characters, such as +, {}, ^ and ?, to express matching rules. So what do we do when we need to match these characters themselves? For example, to match the “+” in “1 + 2 = 3”, we must escape it by putting a “\” in front of the character.

var str = '1 + 2 = 3';
str.match(/\+/gi);    // ["+"]

If you try to match a lone “+” without escaping it, the pattern is invalid and JavaScript throws a SyntaxError.

When we need to match the following special characters, we need to escape them:

$()*+.[]?\^{}|
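
When the text to match comes from a variable, escaping by hand is not practical. Below is a minimal sketch of a helper that escapes every special character before the string is passed to new RegExp. The name escapeRegExp is an assumption made for this example; it is not a built-in function.

```javascript
// Hypothetical helper: put a backslash in front of every special character
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

const userInput = '1+1';

// Without escaping, + is a quantifier: "one or more 1s followed by a 1"
const unescaped = new RegExp(userInput);
// With escaping, the pattern matches the literal text "1+1"
const escaped = new RegExp(escapeRegExp(userInput));

const unescapedHit = unescaped.test('1+1=2');   // false
const escapedHit = escaped.test('1+1=2');       // true

// A lone + is not a valid pattern at all
let errorName = null;
try {
  new RegExp('+');
} catch (error) {
  errorName = error.name;                        // "SyntaxError"
}

console.log(unescapedHit, escapedHit, errorName);
```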

2、 Assertion

Suppose you need to extract the price from the text “Today 18:00-20:00, 50% off everything, laundry liquid only ¥19, don’t miss it”.

A price is made up of digits, but if we match digits alone we also pick up other information:

var str = "Today 18:00-20:00, 50% off everything, laundry liquid only ¥19, don't miss it";
str.match(/\d+/gi);       // ["18", "00", "20", "00", "50", "19"]

Clearly, matching digits alone is not enough. Notice that the price always follows the ¥ symbol and nothing else does. What we need is a way to match content only in a specific context, and that is what assertions are for:

var str = "Today 18:00-20:00, 50% off everything, laundry liquid only ¥19, don't miss it";
str.match(/(?<=¥)\d+/gi);       // ["19"]

There are four types of assertions

Symbol       Name                 Meaning
reg(?=exp)   Positive lookahead   Matches reg only if the content after it satisfies exp
reg(?!exp)   Negative lookahead   Matches reg only if the content after it does not satisfy exp
(?<=exp)reg  Positive lookbehind  Matches reg only if the content before it satisfies exp
(?<!exp)reg  Negative lookbehind  Matches reg only if the content before it does not satisfy exp
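
To compare the four forms side by side, here is a small sketch that runs each of them on the same made-up string. The string and variable names are only for illustration, and the (?<=...) and (?<!...) forms assume an engine with ES2018 support.

```javascript
// Hypothetical sample: four numbers in four different contexts
const sampleText = '$10 ¥20 30% 40';

// Digits followed by %
const followedByPercent = sampleText.match(/\d+(?=%)/g);         // ["30"]

// Whole numbers not followed by %. The \b keeps the engine from
// backtracking to "3", which is followed by "0" rather than "%".
const notFollowedByPercent = sampleText.match(/\d+\b(?!%)/g);    // ["10", "20", "40"]

// Digits preceded by ¥
const afterYen = sampleText.match(/(?<=¥)\d+/g);                 // ["20"]

// Digits not preceded by ¥, $ or another digit
const noCurrency = sampleText.match(/(?<![¥$\d])\d+/g);          // ["30", "40"]

console.log(followedByPercent, notFollowedByPercent, afterYen, noCurrency);
```

Negative assertions combined with quantifiers need care: without the \b, the second pattern would also return "3".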

Positive lookahead

The form A(?=B) matches A only when A is immediately followed by B.

var str = 'I scream, you scream, we all scream for ice-cream!'
//Match a word before scream
str.match(/\w+(?=\sscream)/gi);   // ["I", "you", "all"]

Negative lookahead

The form A(?!B) matches A only when A is not immediately followed by B.

var str = 'I scream, you scream, we all scream for ice-cream!';
// Match scream only when it is not followed by whitespace
str.match(/scream(?!\s)/gi);   // ["scream", "scream"] only the first and second scream match

Positive lookbehind

The form (?<=B)A matches A only when A is immediately preceded by B.

var str = 'I scream, you scream, we all scream for ice-cream!';
// Match the word that follows scream
str.match(/(?<=scream\s)\w+/gi);   // ["for"]

Negative lookbehind

The form (?<!B)A matches A only when A is not immediately preceded by B.

var str = 'I scream, you scream, we all scream for ice-cream!';
// Match cream only when it is not preceded by a word character
str.match(/(?<!\w)cream/gi);   // ["cream"] only the cream in ice-cream matches

These four forms are easy to mix up. Here is a simple way to understand and remember them:

  • If the assertion (exp) is written after the pattern, it checks the content that follows; if it is written before the pattern, it checks the content that precedes
  • Positive means the condition must be satisfied (symbol =); negative means it must not be satisfied (symbol !)
Welcome to my personal website (I’m sure you’ll like my style): www.dengzhanyong.com
Follow my official account 【Xiaoyuan】 so you don’t miss any of my posts

3、 Common regular expressions

  • Non-negative integer: ^\d+$
  • Negative integer: ^-\d+$
  • Telephone number: ^\+?[\d\s]{3,}$
  • Telephone number with code: ^\+?[\d\s]+\(?[\d\s]{10,}$
  • Integer: ^-?\d+$
  • User name: ^[\w\d_.]{4,16}$
  • Alphanumeric characters: ^[a-zA-Z0-9]*$
  • Alphanumeric characters with spaces: ^[a-zA-Z0-9 ]*$
  • Password (at least 6 characters, including an uppercase and a lowercase letter): ^(?=^.{6,}$)((?=.*[A-Za-z0-9])(?=.*[A-Z])(?=.*[a-z]))^.*$
  • E-mail: ^([a-zA-Z0-9._%-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4})*$
  • IPv4 address: ^((?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))*$
  • Lowercase letters: ^([a-z])*$
  • Uppercase letters: ^([A-Z])*$
  • Website: ^(((http|https|ftp):\/\/)?([a-zA-Z0-9\-]+\.)+[a-zA-Z0-9]{2,4}([a-zA-Z0-9\/+=%&_.~?\-]*))*$
  • Visa credit card number: ^(4[0-9]{12}(?:[0-9]{3})?)*$
  • Date (MM/DD/YYYY): ^(0?[1-9]|1[012])[- /.](0?[1-9]|[12][0-9]|3[01])[- /.](19|20)?[0-9]{2}$
  • Date (YYYY/MM/DD): ^(19|20)?[0-9]{2}[- /.](0?[1-9]|1[012])[- /.](0?[1-9]|[12][0-9]|3[01])$
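
As a rough sketch of how such expressions might be used in form validation, the snippet below wraps three patterns from the list in a small lookup. The names patterns and validate are made up for this example, and the / inside the date pattern is escaped only because it sits in a regex literal.

```javascript
// Three patterns copied from the list above
const patterns = {
  integer: /^-?\d+$/,
  ipv4: /^((?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))*$/,
  dateYmd: /^(19|20)?[0-9]{2}[- \/.](0?[1-9]|1[012])[- \/.](0?[1-9]|[12][0-9]|3[01])$/,
};

// Hypothetical helper: check a value against one of the named patterns
function validate(kind, value) {
  return patterns[kind].test(value);
}

const report = {
  integerOk: validate('integer', '-42'),          // true
  integerBad: validate('integer', '4.2'),         // false
  ipOk: validate('ipv4', '192.168.0.1'),          // true
  ipBad: validate('ipv4', '256.1.1.1'),           // false
  ipEmpty: validate('ipv4', ''),                  // true, see note below
  dateOk: validate('dateYmd', '2021-02-13'),      // true
  dateBad: validate('dateYmd', '2021-13-01'),     // false
};

console.log(report);
```

Note that the patterns ending in )*$ also accept an empty string, which suits optional fields; for a required field, check for emptiness separately.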

……

4、 Regular expression methods

test()

The test() method checks whether a string matches a pattern. It returns true if the string contains matching text and false otherwise.

exec()

The exec() method searches a string for a match of the regular expression. It returns an array containing the match result, or null if no match is found.
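
A short sketch of both methods follows. With the g modifier, a regular expression object remembers where the last match ended (its lastIndex property), which makes it possible to loop with exec() but can surprise you with test(). The variable names are only for illustration.

```javascript
// exec() with g: each call continues after the previous match
const digitRuns = /\d+/g;
const source = 'a1b22c333';
const found = [];
let match;
while ((match = digitRuns.exec(source)) !== null) {
  found.push({ value: match[0], index: match.index });
}
// found: [{value: "1", index: 1}, {value: "22", index: 3}, {value: "333", index: 6}]

// test() with g is stateful too: the second call resumes at lastIndex
const stateful = /a/g;
const firstCall = stateful.test('a');    // true, lastIndex is now 1
const secondCall = stateful.test('a');   // false, nothing left after index 1

// Without g, every call starts from the beginning
const stateless = /a/;
const repeated = [stateless.test('a'), stateless.test('a')];   // [true, true]

console.log(found, firstCall, secondCall, repeated);
```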

String methods that use regular expressions

match()

The match() method searches a string for a specified value or for matches of a regular expression. With the g modifier it returns all matches; otherwise it returns only the first one.

replace()

The replace() method replaces some characters in a string with others, or replaces the substrings that match a regular expression.

Examples

// Replace every digit with *
var text = 'aaa126bbb34278ccc23';
text.replace(/\d/gi, '*');   // "aaa***bbb*****ccc**"
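
replace() can do more than swap in a fixed string. The sketch below shows capture groups referenced as $1, $2 and so on, and a function used as the replacement. The sample values are made up for illustration.

```javascript
// Reorder the parts of a date using capture groups
const isoDate = '2021-02-13';
const usDate = isoDate.replace(/(\d{4})-(\d{2})-(\d{2})/, '$2/$3/$1');
// "02/13/2021"

// Compute each replacement with a function
const doubled = 'a1b22c333'.replace(/\d+/g, (digits) => String(Number(digits) * 2));
// "a2b44c666"

// Without the g modifier, only the first match is replaced
const onlyFirst = 'aaa126'.replace(/\d/, '*');
// "aaa*26"

console.log(usDate, doubled, onlyFirst);
```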

search()

The search() method searches a string for a match of the expression and returns the index of the first match, or -1 if there is none.

Examples

// Get the position of the first run of at least two consecutive digits
var text = 'ab1cfd3ff452de7532';
text.search(/\d{2,}/gi);   // 9
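
A small follow-up sketch on the same string shows what search() returns when nothing matches, and how it relates to match() (the variable names are only for illustration):

```javascript
const haystack = 'ab1cfd3ff452de7532';

const firstRun = haystack.search(/\d{2,}/);         // 9
const noRun = haystack.search(/\d{5,}/);            // -1, no run of five digits
const firstRunText = haystack.match(/\d{2,}/)[0];   // "452", the text found at index 9

console.log(firstRun, noRun, firstRunText);
```

search() only reports the first position, so the g modifier makes no difference here.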

5、 How to write efficient regular expressions

  1. Wrong matches

    The symbols +, * and ? have different meanings. Choose the one that fits the actual scenario and do not confuse them.

  2. Missed matches

    Suppose you need to match an 18-digit ID number. Using \d{18} will miss some valid numbers, because the last character of an ID number may be X. This can be improved to \d{17}(X|x|\d)

  3. Be explicit

    In general, the looser a pattern is, the more it matches. Take the ID number again: eighteen zeros also satisfy the pattern above, but that is obviously not a valid ID number. To get more accurate results, we need to make the pattern more explicit.
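
The three points can be illustrated with a short sketch. The ID number below is a made-up value used only to exercise the patterns, and the third pattern is just one possible way to tighten the rule.

```javascript
// Hypothetical 18-character ID ending in X, for illustration only
const sampleId = '11010119900307123X';
const zeros = '0'.repeat(18);

const digitsOnly = /^\d{18}$/;            // misses IDs that end in X
const withCheckChar = /^\d{17}[\dXx]$/;   // accepts X, but also eighteen zeros
const stricter = /^[1-9]\d{16}[\dXx]$/;   // the first digit may not be 0

const outcome = {
  missed: digitsOnly.test(sampleId),       // false
  caught: withCheckChar.test(sampleId),    // true
  tooLoose: withCheckChar.test(zeros),     // true
  tightened: stricter.test(zeros),         // false
  stillValid: stricter.test(sampleId),     // true
};

console.log(outcome);
```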

6、 Hands-on practice

Let’s write a matching rule for the ID number. First we need to know how an ID number is structured. I described this long ago in the article 【Do you know how to prevent ID card from counterfeiting?】, so I won’t explain it in detail here.

Address code: 6 digits; the first digit is 1-9 and the remaining five digits are 0-9

/^[1-9]\d{5}/

Year code: 4 digits; the first two may be 18, 19 or 20, and the last two are 0-9

/(18|19|20)\d{2}/

Month code: 2 digits, 01-12. Day code: 2 digits, 01-31

/((0[1-9])|1[0-2])(([0-2][1-9])|10|20|30|31)/

Sequence code: 3 digits, each 0-9

/\d{3}/

Check code: 1 character, which may be 0-9 or X; the X may also be lowercase

/\d{17}(X|\d|x)$/

It can also be written like this

/\d{17}[0-9Xx]$/

Finally, put them together

/^[1-9]\d{5}(18|19|20)\d{2}((0[1-9])|(1[0-2]))(([0-2][1-9])|10|20|30|31)\d{3}[0-9Xx]$/

This rule checks whether an ID number meets the basic format, but a more accurate validation needs some extra code. For example, the first two digits of the address code identify the province:
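
One way to keep a long expression like this readable is to assemble it from the named parts described above. This is only a sketch: the names idParts and idPattern and the sample numbers are made up for illustration.

```javascript
// Each part follows one of the rules described above
const idParts = [
  '[1-9]\\d{5}',                   // address code
  '(18|19|20)\\d{2}',              // year
  '((0[1-9])|(1[0-2]))',           // month
  '(([0-2][1-9])|10|20|30|31)',    // day
  '\\d{3}',                        // sequence code
  '[0-9Xx]',                       // check code
];

// Join the parts and anchor the pattern at both ends
const idPattern = new RegExp('^' + idParts.join('') + '$');

// Hypothetical sample values
const wellFormed = idPattern.test('11010119900307123X');   // true
const allZeros = idPattern.test('000000000000000000');     // false
const badMonth = idPattern.test('110101199013071234');     // false, month 13

console.log(wellFormed, allZeros, badMonth);
```

Inside a string passed to new RegExp, every backslash has to be doubled, which is why \d is written as \\d here.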

North China: Beijing 11, Tianjin 12, Hebei 13, Shanxi 14, Inner Mongolia 15

Northeast: Liaoning 21, Jilin 22, Heilongjiang 23

East China: Shanghai 31, Jiangsu 32, Zhejiang 33, Anhui 34, Fujian 35, Jiangxi 36, Shandong 37

Central China: Henan 41, Hubei 42, Hunan 43

South China: Guangdong 44, Guangxi 45, Hainan 46

Southwest: Sichuan 51, Guizhou 52, Yunnan 53, Tibet 54, Chongqing 50

Northwest: Shaanxi 61, Gansu 62, Qinghai 63, Ningxia 64, Xinjiang 65

Special: Taiwan 71, Hong Kong 81, Macao 82

Other checks include whether the date actually exists (some months have no 31st) and whether the check code is correct.
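
Here is a rough sketch of what such extra checks might look like. The province codes come from the list above. The weights and check characters are an assumption: they follow the MOD 11-2 scheme commonly described for 18-digit ID numbers and are not taken from this article. All function names and sample numbers are made up for illustration.

```javascript
// Province codes from the list above (first two digits of the address code)
const PROVINCE_CODES = new Set([
  11, 12, 13, 14, 15, 21, 22, 23, 31, 32, 33, 34, 35, 36, 37,
  41, 42, 43, 44, 45, 46, 50, 51, 52, 53, 54,
  61, 62, 63, 64, 65, 71, 81, 82,
]);

// Assumption: weights and check characters of the MOD 11-2 scheme
const WEIGHTS = [7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2];
const CHECK_CHARS = '10X98765432';

// True only if the calendar date really exists (rejects 02-31, for example)
function isRealDate(year, month, day) {
  const date = new Date(year, month - 1, day);
  return date.getFullYear() === year &&
    date.getMonth() === month - 1 &&
    date.getDate() === day;
}

// Compute the check character expected for the first 17 digits
function expectedCheckChar(first17) {
  let sum = 0;
  for (let i = 0; i < 17; i++) {
    sum += Number(first17[i]) * WEIGHTS[i];
  }
  return CHECK_CHARS[sum % 11];
}

// Combine the format check with the province, date and check code tests
function validateId(id) {
  if (!/^\d{17}[\dXx]$/.test(id)) return false;
  if (!PROVINCE_CODES.has(Number(id.slice(0, 2)))) return false;
  const year = Number(id.slice(6, 10));
  const month = Number(id.slice(10, 12));
  const day = Number(id.slice(12, 14));
  if (!isRealDate(year, month, day)) return false;
  return expectedCheckChar(id.slice(0, 17)) === id[17].toUpperCase();
}

// Sample values for illustration only
const validSample = validateId('11010519491231002X');    // true
const wrongCheck = validateId('110105194912310021');     // false
const wrongDate = validateId('11010519490231002X');      // false, no 31 February
const wrongProvince = validateId('99010519491231002X');  // false

console.log(validSample, wrongCheck, wrongDate, wrongProvince);
```

The regular expression handles the shape of the number, and ordinary code handles the rules that depend on arithmetic or on the calendar.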
