Summary of regular expression learning


1、 Background

A requirement came up at work:

Use a regular expression to replace each span tag in a string with an img tag, and put the content of the original span tag into the src attribute of the img tag. The problem is described in detail in the linked question.

When I saw this requirement, I knew regular expressions were the right tool. But I had rarely used them, and whenever I thought of regular expressions I pictured a pile of special symbols that seemed to follow no rules and were hard to understand. Knowing I could not avoid them, I tried to write one to solve my problem. The detailed problem description mentioned above more or less records my thought process. Soon after I posted the question, people answered it. Reading their answers, I was embarrassed by how little I knew, and felt that if I did not start studying now, I never would.

2、 Basics of regular expressions

2.1 Introduction to metacharacters

  • “^”: ^ matches the start of a line or string, and sometimes the beginning of the entire document.

  • “$”: $ matches the end of a line or string.

  • “\b”: does not consume any characters; it matches only a position and is often used to match word boundaries. For example: "This is Regex".match(/\bis\b/); Here “\b” does not match the characters on either side of is; it only checks that both sides of is are word boundaries.

  • “\d”: matches a digit.

  • “\w”: matches letters, digits, and underscores. Equivalent to ‘[A-Za-z0-9_]’.

  • “\s”: matches any whitespace character (space, tab, newline, and so on).

  • “.”: matches any character except newline.

  • “[a-zA-Z]”: a character class matches any single character listed inside the square brackets.

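  • Several opposites: when the letter is capitalized, the meaning is the opposite of the original. A ^ right after the opening bracket negates a character class in the same way.
    For example:
    “\W”: matches any non-word character. Equivalent to ‘[^A-Za-z0-9_]’.
    “[^abc]”: matches any character except a, b, and c.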

  • Character escape: in regular expressions, metacharacters have special meanings. To match a metacharacter itself, escape it with a backslash, such as: /\./.test("."); // true
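
The metacharacters above can be tried out in one place. This is a minimal sketch; the sample sentence and variable names are made up for illustration.

```javascript
// Illustrative input; any sentence would do.
const sample = "Order 42 shipped to unit_7B.";

const firstDigits = sample.match(/\d+/)[0];                 // "42": \d matches digits
const firstWord = sample.match(/\w+/)[0];                   // "Order": \w matches letters, digits, _
const wordWithUnderscore = sample.match(/\b\w+_\w+\b/)[0];  // "unit_7B": \b anchors at word edges
const startsWithOrder = /^Order/.test(sample);              // true: ^ anchors at the start
const endsWithDot = /\.$/.test(sample);                     // true: \. is a literal dot, $ is the end
const nonWordChars = sample.match(/\W/g);                   // four spaces and the final "."
const notVowels = "banana".match(/[^aeiou]/g);              // ["b", "n", "n"]: negated class
```

Note that the unescaped pattern /.$/ would also return true here, because . matches any character; the escape is what restricts it to a literal dot.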

2.2 Quantifiers

2.2.1 Common quantifiers

  • “*” repeats zero or more times. It is greedy, like every quantifier in this list: it first consumes as many characters as possible. If the rest of the pattern then fails, it gives back one character and tries again, repeating until a match is found or there is nothing left to give back. For example:
    "aaaaaa".match(/a*/) // ["aaaaaa"]

  • “?” repeats zero or one time. It is also greedy, not lazy: it takes one occurrence whenever it can. A lazy quantifier (written by appending ? to another quantifier, as covered in the lazy quantifier section) works the other way around: it starts with as few characters as possible and takes one more at a time until the rest of the pattern matches. For example: "aaaaaa".match(/a?/) // ["a"]

  • “+” repeats one or more times. It is greedy as well. A possessive quantifier, which takes as much as it can and never backtracks, exists in some other engines (written a*+ or a++ in Java and PCRE, for example), but JavaScript does not support it. For example:
    "aaaaaa".match(/a+/) // ["aaaaaa"]

  • “{n}” repeats exactly n times
    "aaaaaa".match(/a{3}/) // ["aaa"]

  • “{n,m}” repeats n to m times (no space after the comma, otherwise the braces are not treated as a quantifier)
    "aaaaaa".match(/a{3,4}/) // ["aaaa"]

  • “{n,}” repeats n or more times
    "aaaaaa".match(/a{3,}/) // ["aaaaaa"]

2.2.2 Lazy quantifiers

  • “*?” repeats any number of times, but as few as possible. For example:
    "aabab".match(/a.*?b/) // ["aab"]
    Why is the match aab (the first to third characters) instead of ab (the second to third characters)? Because one rule takes precedence over the lazy / greedy rule: the match that starts earliest wins.

  • “+?” repeats one or more times, but as few as possible. Same as above, except that it must repeat at least once. For example:
    "aabab".match(/a.+?b/) // ["aab"]

  • “??” repeats 0 or 1 times, but as few as possible. For example:
    "aabab".match(/a.??b/) // ["aab"]

  • “{n,m}?” repeats n to m times, but as few as possible. For example:
    "aaa".match(/a{1,3}?/) // ["a"]

  • “{n,}?” repeats n or more times, but as few as possible. For example:
    "aaa".match(/a{1,}?/) // ["a"]
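
The difference between greedy and lazy quantifiers is easiest to see when the same input is matched both ways. The following is a small sketch; the input string and variable names are illustrative only.

```javascript
const text = "<span>a.png</span><span>b.png</span>";

// Greedy: .* runs to the end of the string first, then gives characters back
// until </span> can match. It stops at the LAST </span>, so there is one match.
const greedyMatches = text.match(/<span>.*<\/span>/g);
// ["<span>a.png</span><span>b.png</span>"]

// Lazy: .*? grows one character at a time and stops at the FIRST </span>,
// so each element is matched separately.
const lazyMatches = text.match(/<span>.*?<\/span>/g);
// ["<span>a.png</span>", "<span>b.png</span>"]

// The leftmost match still wins over laziness: the match starts at the first "a".
const leftmost = "aabab".match(/a.*?b/)[0]; // "aab"

// + needs at least one occurrence, * is satisfied by zero.
const plusOnNoA = "bbb".match(/a+/);     // null
const starOnNoA = "bbb".match(/a*/)[0];  // "" (empty match at index 0)
```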

2.3 Processing options (flags)

  • JavaScript regular expressions support the flags g, i and m, which stand for global matching, case-insensitive matching and multiline mode respectively (newer engines add others such as s, u and y). Flags can be freely combined.

  • In the default mode, the metacharacters ^ and $ match the beginning and end of the whole string, respectively. The m flag changes these metacharacters to match the beginning and end of each line.
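
A short sketch of how the flags change a result; the three-line input is made up for illustration.

```javascript
const lines = "first line\nSecond line\nthird line";

// Without m, ^ only matches at the very start of the string.
const withoutM = lines.match(/^\w+/g);   // ["first"]

// With m, ^ also matches right after each line break.
const withM = lines.match(/^\w+/gm);     // ["first", "Second", "third"]

// With m, $ matches just before each line break as well as at the end.
const lineEnds = lines.match(/\w+$/gm);  // ["line", "line", "line"]

// i ignores case: the lowercase pattern finds the capitalized word.
const caseInsensitive = lines.match(/second/i)[0]; // "Second"
```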

3、 Advanced regular expressions

3.1 Capture groups

One of the most important features of regular expressions is the ability to store part of a successful match for later use. Putting parentheses around a pattern, or part of one, causes the text matched by that part to be stored in a temporary buffer. (You can use the non-capturing metacharacters ‘?:’, ‘?=’, or ‘?!’ to avoid saving that part of the match.)

Each captured sub-match is stored in the order in which its group is encountered, from left to right, in the regular expression pattern. Buffer numbers start at 1 and go up to a maximum of 99 subexpressions. Each buffer can be accessed using ‘\n’, where n is a one- or two-digit decimal number that identifies a particular buffer.

One of the simplest and most useful applications of back-references is locating two identical consecutive words in a text. For instance:

/(\b[a-zA-Z]+\b)\s+\1\b/.exec(" asd sf  hello hello asd"); //["hello hello", "hello"]

Explain this example:

1. (\b[a-zA-Z]+\b) is a capture group that captures every word,

" asd sf  hello hello asd".match(/(\b[a-zA-Z]+\b)/g) // ["asd", "sf", "hello", "hello", "asd"]

Note: I added the g flag to make this easier to follow. Without it, only the first word, asd, is matched.

2. \s adds a whitespace constraint, so the last word is excluded,

" asd sf  hello hello asd".match(/(\b[a-zA-Z]+\b)\s/g) // ["asd ", "sf ", "hello ", "hello "]

3. “\1” is a back-reference to whatever group 1 captured, so the pattern matches only when the same word appears again,

" asd sf  hello hello asd".match(/(\b[a-zA-Z]+\b)\s+\1\b/g) // ["hello hello"]

To be honest, this example took me a long time to work through, a little at a time. The concept seems simple, but writing such a pattern yourself is not easy.
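
The repeated-word pattern can be wrapped in a small function. This is only a sketch: findRepeatedWords is a hypothetical helper name, and it assumes an engine that supports String.prototype.matchAll (ES2020).

```javascript
// Hypothetical helper: returns each word that appears twice in a row.
function findRepeatedWords(text) {
  // Group 1 captures a word; \1 requires the same text again after whitespace.
  // The final \b stops "the theory" from counting as a repeated "the".
  const pattern = /\b([a-zA-Z]+)\s+\1\b/g;
  const repeated = [];
  for (const match of text.matchAll(pattern)) {
    repeated.push(match[1]); // match[1] is the content of capture group 1
  }
  return repeated;
}

const doubled = findRepeatedWords("it is is what it it is"); // ["is", "it"]
const noFalsePositive = findRepeatedWords("the theory");     // []
```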

3.2 Common uses of grouping (and assertions)

  • “(exp)” matches exp and captures the text into an automatically numbered group; for example:

/(hello)\sworld/.exec("asdadasd hello world asdasd") // ["hello world", "hello"]
  • “(?:exp)” matches exp, but does not capture the matched text and does not assign a group number to this group

/(?:hello)\sworld/.exec("asdadasd hello world asdasd")  // ["hello world"]
  • “(?=exp)” matches a position that is followed by exp (lookahead). The text of exp is not part of the match, is not captured, and gets no group number

/hello\s(?=world)/.exec("asdadasd hello world asdasd")  // ["hello "]
  • “(?!exp)” matches a position that is not followed by exp (negative lookahead). The contents of the group are not captured and no group number is assigned

/hello\s(?!world)/.exec("asdadasd hello world asdasd") //null
 If the word after hello is changed:
/hello\s(?!world)/.exec("asdadasd hello wosrlds asdasd") //["hello "]
  • “(?<!exp)” matches a position that is not preceded by exp (negative lookbehind); for example:

/(?<!\d)123/.exec("abc123 ") // ["123"]
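
The assertions can be compared on one input. This is a sketch with an invented price list; it assumes an engine with lookbehind support (ES2018 or later).

```javascript
const prices = "apple $30, pear 45 EUR, plum $7";

// Lookbehind: digits preceded by "$". The "$" itself is not part of the match.
const dollarAmounts = prices.match(/(?<=\$)\d+/g);   // ["30", "7"]

// Negative lookbehind: digits NOT preceded by "$" or by another digit.
// Excluding digits too prevents matching the "0" inside "$30".
const nonDollar = prices.match(/(?<![$\d])\d+/g);    // ["45"]

// Lookahead: digits followed by " EUR". The " EUR" is not consumed.
const euroAmounts = prices.match(/\d+(?= EUR)/g);    // ["45"]

// Negative lookahead: digits NOT followed by " EUR". The \d alternative stops
// the engine from backtracking to "4" and reporting a partial number.
const notEuro = prices.match(/\d+(?!\d| EUR)/g);     // ["30", "7"]
```

Because assertions match positions rather than characters, none of the results include the surrounding "$" or " EUR" text.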

4、 Using regular expressions in JavaScript

To define a regular expression in JavaScript, the syntax is as follows:

var reg = /hello/; or var reg = new RegExp("hello");

Next, I will list the methods in JavaScript that accept regular expressions and briefly describe what each one does.
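
The two syntaxes differ in how backslashes are written, which matters once the pattern contains metacharacters. The sketch below illustrates this; escapeRegExp is a hypothetical helper based on a common idiom, not a built-in function.

```javascript
// In a literal, one backslash is enough. In a string passed to the
// constructor, each backslash must itself be escaped.
const literal = /\d+\.\d+/;
const constructed = new RegExp("\\d+\\.\\d+");
const samePattern = literal.source === constructed.source; // true

// Hypothetical helper: escapes metacharacters so that arbitrary text
// can be used as a literal pattern. "$&" inserts the matched character.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const keyword = "a.b";
const unescaped = "a.b axb a.b".match(new RegExp(keyword, "g"));
// ["a.b", "axb", "a.b"]: the dot matches any character
const escaped = "a.b axb a.b".match(new RegExp(escapeRegExp(keyword), "g"));
// ["a.b", "a.b"]: the dot is now literal
```

The constructor form is mainly useful when the pattern is built at run time, as with keyword here.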

4.1 String.prototype.search method

It returns the index at which the pattern first matches in the original string. If there is no match, it returns -1. More can be found in the official documentation.

"abchello".search(/hello/);  //  3

4.2 String.prototype.replace method

Used to replace a substring in a string. Simple example:

"abchello".replace(/hello/,"hi");   //  "abchi"

The official documentation mentions that:

If the first parameter is a RegExp object, the replacement string can contain the special variable $n, where n is a positive integer less than 100, which inserts the string matched by the nth parenthesized group.

So the requirement I mentioned at the beginning of the article can be solved with
str.replace(/<span>(.*?)<\/span>/g, '<img src="$1"/>')
where $1 stands for the string matched by the group (.*?) in /<span>(.*?)<\/span>/.
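
The requirement from the Background section can be sketched end to end as follows. The input string is invented for illustration, and the second variant with a replacer function is an optional extra, not part of the original answer.

```javascript
const input = "See <span>cat.png</span> and <span> dog.png </span>.";

// $1 in the replacement string inserts whatever the (.*?) group captured.
const withDollar = input.replace(/<span>(.*?)<\/span>/g, '<img src="$1"/>');
// 'See <img src="cat.png"/> and <img src=" dog.png "/>.'

// A function can be passed instead of a string when the captured text needs
// processing first. Its arguments are the whole match, then each group.
const withFunction = input.replace(/<span>(.*?)<\/span>/g, (whole, content) => {
  return '<img src="' + content.trim() + '"/>';
});
// 'See <img src="cat.png"/> and <img src="dog.png"/>.'
```

The lazy .*? matters here: with a greedy .* both spans would be swallowed into a single img tag.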

4.3 String.prototype.split method

Used to split a string into an array at every match of the pattern

"abchelloasdasdhelloasd".split(/hello/);  //["abc", "asdasd", "asd"]

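A regular expression separator is most useful when the delimiter varies. A small sketch with made-up inputs:

```javascript
// Split on a comma or semicolon, swallowing any whitespace around it.
const fields = "a, b;c ,d".split(/\s*[,;]\s*/);
// ["a", "b", "c", "d"]

// If the separator pattern contains a capture group, the captured
// separators are kept in the result array.
const tokens = "1+2-3".split(/([+-])/);
// ["1", "+", "2", "-", "3"]
```
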
4.4 String.prototype.match method

Used to capture matching substrings of a string into an array. By default, only the first result is captured into the array. When the regular expression has the global flag (add g when defining the regular expression), all results are captured into the array.

"abchelloasdasdhelloasd".match(/hello/);  //["hello"]
"abchelloasdasdhelloasd".match(/hello/g);  //["hello","hello"]

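One detail worth knowing is what happens to capture groups in each mode. The sketch below uses an invented input and assumes String.prototype.matchAll (ES2020) is available.

```javascript
const log = "id=12 id=345";

// Without g: the first match plus its capture groups.
const first = log.match(/id=(\d+)/);
// first[0] is "id=12", first[1] is "12", first.index is 0

// With g: every full match, but the capture groups are dropped.
const all = log.match(/id=(\d+)/g);
// ["id=12", "id=345"]

// matchAll keeps the groups for every match (the g flag is required).
const ids = Array.from(log.matchAll(/id=(\d+)/g), (m) => m[1]);
// ["12", "345"]

// When nothing matches, match returns null rather than an empty array.
const none = "abc".match(/\d/g); // null
```
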
4.5 RegExp.prototype.exec method

Like the string match method, this method captures substrings that satisfy the pattern into an array, but there are two differences.
1. The exec method captures only one match per call, regardless of whether the regular expression has the global flag

/hello/g.exec("abchelloasdasdhelloasd"); // ["hello"]

2. Regular expression objects (that is, RegExp objects in JavaScript) have a lastIndex property that indicates where the next capture starts. When the regular expression has the g flag, each exec call moves lastIndex forward; once no further match is found, exec returns null and lastIndex resets to 0, so the next call starts from the beginning again. This property can be used to iterate over all the matches in a string.

var reg=/hello/g;
reg.lastIndex; //0
reg.exec("abchelloasdasdhelloasd"); // ["hello"]
reg.lastIndex; //8
reg.exec("abchelloasdasdhelloasd"); // ["hello"]
reg.lastIndex; //19
reg.exec("abchelloasdasdhelloasd"); // null
reg.lastIndex; //0
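
The calls above are usually written as a loop. The following is a sketch of that idiom; findAll is a hypothetical helper name.

```javascript
// Hypothetical helper: collects every match and its position using exec.
function findAll(pattern, text) {
  if (!pattern.global) {
    // Without g, lastIndex never advances and the loop would never end.
    throw new Error("findAll needs a regular expression with the g flag");
  }
  const results = [];
  pattern.lastIndex = 0; // start from the beginning even if the regex was used before
  let match;
  while ((match = pattern.exec(text)) !== null) {
    results.push({ text: match[0], index: match.index });
    // An empty match does not advance lastIndex, so step forward manually.
    if (match[0] === "") pattern.lastIndex++;
  }
  return results;
}

const found = findAll(/hello/g, "abchelloasdasdhelloasd");
// [{ text: "hello", index: 3 }, { text: "hello", index: 14 }]
```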

4.6 RegExp.prototype.test method

Used to test whether a string contains a match for the pattern

/hello/.test("abchello");  // true
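
Because test also uses lastIndex when the g flag is set, reusing one global regular expression can give surprising results. A small sketch with illustrative variable names:

```javascript
const stateless = /hello/;
const stateful = /hello/g;

// Without g, every call searches from the start.
const a1 = stateless.test("hello"); // true
const a2 = stateless.test("hello"); // true

// With g, each successful call leaves lastIndex after the match.
const b1 = stateful.test("hello");  // true, lastIndex is now 5
const b2 = stateful.test("hello");  // false: the search started at index 5
const b3 = stateful.test("hello");  // true: the failed call reset lastIndex to 0
```

For a simple yes/no check, leaving out the g flag avoids this state entirely.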

5、 Summary

I have finally learned a little about regular expressions, and mastering them will take more practice ^_^
