Understanding Martian (regular) expressions: capturing and non-capturing groups


Want to understand all the weird regular expressions out there? Dream on, young man. If that is what you want, put in the work and learn.


I have been learning and using JS for a year or two. I can use objects when I meet them, and I have learned how inheritance works. Yet whenever I read other people's framework code I keep getting stuck, and one of the major reasons is that I cannot read those strings of Martian characters (regular expressions). Learning is about finding the gaps and filling them. The danger is not that you don't understand something, but that you think you understand it all. Before writing a regex it pays to have a good tool at hand, whether that is dedicated regex software or an online tester.

Knowledge overview

Syntax review, focusing on five areas:

  1. Greedy quantifiers: ? (0 <= x <= 1), + (x >= 1), * (any number of times), and the related counted quantifiers {n} and {n,m};
  2. Special characters: ^ $ . * + ? = ! : | \ / ( ) [ ] { }
    To match one of these characters literally, put a single backslash in front of it in a regex literal, and a double backslash if you build the pattern with new RegExp;
  3. Non-capturing metacharacters: ?:, ?= (positive lookahead), ?! (negative lookahead);
  4. Backreferences, which refer back to text matched by an earlier group;
  5. Everything else: word boundaries, parentheses, square brackets and so on;
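
As a small sketch of the escaping rule in item 2 (the file names are made up for illustration): a regex literal needs one backslash to match a literal dot, while the string handed to new RegExp needs two, because the string parser consumes one of them first.

```javascript
// Match a literal "." followed by "css" at the end of the string.
const literalForm = /\.css$/;                   // one backslash in a literal
const constructorForm = new RegExp('\\.css$');  // two backslashes in a string

console.log(literalForm.test('app.css'));       // true
console.log(constructorForm.test('app.css'));   // true
console.log(literalForm.test('appxcss'));       // false: the dot is no longer a wildcard

// Both forms end up with the same pattern source.
console.log(literalForm.source === constructorForm.source); // true
```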

How a regex engine parses and matches: this is beyond what someone at my level can write up for now, so I can only recommend reading a dedicated article on it.

Step-by-step analysis

Greedy matching

Once you understand greedy matching, most everyday uses of regular expressions are covered. The opening of the rookie tutorial already explains the syntax in detail. Take the regex /Chapter [1-9]/: it can only match Chapter 1 through Chapter 9, that is, a chapter title with a single digit. To match titles with two or three digits we need a quantifier, and quantifiers are greedy: they take as much of the target string as they can. Change /Chapter [1-9]/ to /Chapter [1-9]+/ and it matches Chapter 1, Chapter 12 and Chapter 123. Change it to /Chapter [1-9]?/ and, no matter how many digits follow Chapter, it matches at most one of them, so for Chapter 123 the result is Chapter 1. Unlike the original expression, this one also matches a bare Chapter with no digit at all: in x? the x in front of the question mark may appear 0 times or 1 time. Change it to /Chapter [1-9]*/ and we get the union of ? and +: x may appear any number of times, including zero. We can also bound the number of occurrences with {n,m}, meaning n <= x <= m. {0,1} is equivalent to ?, {1,} is equivalent to + and {0,} is equivalent to *.
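
The quantifiers above can be compared side by side with a short sketch. The sample string and the greedyResults name are only for illustration.

```javascript
const greedySample = 'Chapter 12345';

const greedyResults = {
  single: greedySample.match(/Chapter [1-9]/)[0],      // exactly one digit
  plus: greedySample.match(/Chapter [1-9]+/)[0],       // one or more digits
  optional: greedySample.match(/Chapter [1-9]?/)[0],   // zero or one digit
  star: greedySample.match(/Chapter [1-9]*/)[0],       // zero or more digits
  range: greedySample.match(/Chapter [1-9]{2,4}/)[0],  // two to four digits
};

console.log(greedyResults.single);   // "Chapter 1"
console.log(greedyResults.plus);     // "Chapter 12345"
console.log(greedyResults.optional); // "Chapter 1"
console.log(greedyResults.star);     // "Chapter 12345"
console.log(greedyResults.range);    // "Chapter 1234"

// ? and * also accept zero digits, so the bare word still matches.
console.log('Chapter one'.match(/Chapter [1-9]?/)[0]); // "Chapter "
```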

Lazy matching

The counterpart of greedy matching is lazy matching. Add a ? after any of the greedy quantifiers from the previous part and that quantifier becomes lazy, which can be understood as matching as little as possible. For example, matching /Chapter [1-9]+/ against Chapter 12345 gives Chapter 12345, but /Chapter [1-9]+?/ gives Chapter 1. Likewise /Chapter [1-9]{2,4}/ gives Chapter 1234, while /Chapter [1-9]{2,4}?/ gives Chapter 12. The quantifier settles for its lower limit whenever the rest of the pattern still allows a match, which is why this is usually called lazy mode.
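
A minimal sketch of the greedy and lazy results quoted above. The last line is an extra case added here for illustration: a lazy quantifier still takes more than its lower limit when the rest of the pattern requires it.

```javascript
const lazySample = 'Chapter 12345';

const lazyResults = {
  greedyPlus: lazySample.match(/Chapter [1-9]+/)[0],
  lazyPlus: lazySample.match(/Chapter [1-9]+?/)[0],
  greedyRange: lazySample.match(/Chapter [1-9]{2,4}/)[0],
  lazyRange: lazySample.match(/Chapter [1-9]{2,4}?/)[0],
  // The trailing 5 forces the lazy quantifier to keep expanding.
  lazyForced: lazySample.match(/Chapter [1-9]+?5/)[0],
};

console.log(lazyResults.greedyPlus);  // "Chapter 12345"
console.log(lazyResults.lazyPlus);    // "Chapter 1"
console.log(lazyResults.greedyRange); // "Chapter 1234"
console.log(lazyResults.lazyRange);   // "Chapter 12"
console.log(lazyResults.lazyForced);  // "Chapter 12345"
```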

Capturing and non-capturing groups

I had seen ?:, ?= and ?! before and never paid attention to them. Lately they keep showing up everywhere, and it got scary. Then I ran into this regex in a gulp build: /-[0-9a-f]{8,10}-?/ (it matches the MD5 hash in app-7ef5d9ee29.css). What special meaning could the trailing '-?' have? After digging around I found out it is nothing more than an ordinary greedy quantifier that makes the hyphen optional, and I felt like an idiot. I still don't know what the author of that code had in mind. Maybe I have just never come across a file name like app-7ef5d9ee29-any.css. Either way, why add a '-?' there and send me straight into the pit?
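
For what it is worth, here is a quick check of what that pattern matches on the two file names mentioned above. This only illustrates the optional hyphen; it says nothing about how the gulp code actually used the result.

```javascript
const hashPattern = /-[0-9a-f]{8,10}-?/;

// No hyphen after the hash: -? matches zero characters.
const plainName = 'app-7ef5d9ee29.css'.match(hashPattern)[0];
// A hyphen after the hash: -? consumes it as well.
const suffixedName = 'app-7ef5d9ee29-any.css'.match(hashPattern)[0];

console.log(plainName);    // "-7ef5d9ee29"
console.log(suffixedName); // "-7ef5d9ee29-"
```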
Back to the main topic. First, what is a capture group? In short, a sub-pattern wrapped in parentheses, such as '(pattern)', is a capture group: whatever text matches inside the parentheses is captured. Let's start with the definitions from the rookie tutorial, which cover four forms: (pattern), (?:pattern), (?=pattern) and (?!pattern).
What is the difference between the four forms, with a ? and without one? It is the difference between a capture group and a non-capture group. When the expression is matched with the exec method, the text matched by each capture group is saved separately in the result. The theory is boring, so let's go straight to an example. It comes from Professional JavaScript for Web Developers, page 106, with slight changes:

var str = 'mom and dad and baby';
var pattern = /mom( and dad( and baby))/;   // capturing form
var pat = /mom(?: and dad(?: and baby))/;   // non-capturing form
var mat = pattern.exec(str);                // full match plus each captured group
var match = pat.exec(str);                  // full match only

Look at the results printed by devtools. The overall match is the same in both cases, but with capture groups each parenthesized unit is also saved as its own entry in the result: mat is ["mom and dad and baby", " and dad and baby", " and baby"], while match is just ["mom and dad and baby"]. Non-capture groups are not saved separately; only the complete match is. The familiar RegExp.$1 and RegExp.$2 are in fact references to capture group results.
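
To reproduce the devtools output without a screenshot, a short sketch like the following can be pasted into the console. The variable names are illustrative, and the replace call at the end is an extra showing $2 used as a group reference in a replacement string.

```javascript
const family = 'mom and dad and baby';

const withGroups = /mom( and dad( and baby))/.exec(family);
const withoutGroups = /mom(?: and dad(?: and baby))/.exec(family);

console.log(withGroups.length);    // 3: full match plus two captured groups
console.log(withGroups[1]);        // " and dad and baby"
console.log(withGroups[2]);        // " and baby"
console.log(withoutGroups.length); // 1: only the full match is kept

// Captured groups can be referenced as $1, $2 in a replacement string.
const replaced = family.replace(/mom( and dad( and baby))/, 'kid$2');
console.log(replaced);             // "kid and baby"
```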
Now that capture and non-capture groups are clear, what is the difference between (?:pattern) and (?=pattern)? There are two. First, the text matched by the former is included in the match result, while the text checked by the latter is not. Second, the former consumes characters (it advances the index) while the latter does not. Let's take an example:

var str = 'ababa';
var pattern = /ab(?:a)/g;
var pat = /ab(?=a)/g;
var mat = pattern.exec(str);
var match = pat.exec(str);
mat = pattern.exec(str);   // global pattern, second match
match = pat.exec(str);     // global pattern, second match


From the screenshot of the code run above we can see difference 1: the text matched by (?:pattern) is kept in the final result (the first match is aba), while the text checked by (?=pattern) is not (the first match is ab). Difference 2 is less obvious, and here RegexBuddy helps show what happens during matching. Look closely at its trace and you can spot the difference: after the first match, the second attempt starts at character index 3 for ?: and at index 2 for ?=. This is the consuming versus non-consuming behavior mentioned above.
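
The same observation can be made without RegexBuddy by reading lastIndex after each exec call on a global regex. This is a sketch with illustrative variable names.

```javascript
const lookaheadText = 'ababa';
const consuming = /ab(?:a)/g;
const peeking = /ab(?=a)/g;

const consumingFirst = consuming.exec(lookaheadText);  // matches "aba" at index 0
const consumingResume = consuming.lastIndex;           // 3: the "a" was consumed
const consumingSecond = consuming.exec(lookaheadText); // null: no "aba" from index 3

const peekingFirst = peeking.exec(lookaheadText);      // matches "ab" at index 0
const peekingResume = peeking.lastIndex;               // 2: the "a" was only checked
const peekingSecond = peeking.exec(lookaheadText);     // matches "ab" at index 2

console.log(consumingFirst[0], consumingResume, consumingSecond); // aba 3 null
console.log(peekingFirst[0], peekingResume, peekingSecond.index); // ab 2 2
```

Because the lookahead leaves the a unconsumed, the second ab can reuse it as its first character, which is why ?= finds two matches here and ?: finds only one.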
Finally, the last question: positive lookahead (?=pattern) versus negative lookahead (?!pattern). Reading the word "negative" too literally invites ambiguity. Here it simply means the opposite of the positive lookahead: the match succeeds only if the characters that follow do not match the pattern inside the lookahead.
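
A minimal sketch of the two lookaheads on the same input (the sample string abab is chosen here for illustration):

```javascript
const lookSample = 'abab';

const positiveHit = /ab(?=a)/.exec(lookSample); // "ab" must be followed by "a"
const negativeHit = /ab(?!a)/.exec(lookSample); // "ab" must NOT be followed by "a"

console.log(positiveHit[0], positiveHit.index); // ab 0
console.log(negativeHit[0], negativeHit.index); // ab 2

// In "ababa" every "ab" is followed by "a", so the negative form finds nothing.
const negativeOnAbaba = /ab(?!a)/.test('ababa');
console.log(negativeOnAbaba);                   // false
```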
If anything in this article is incorrect or vague, please point it out.
Well, that's it for now. Even the unemployed have the right to enjoy a weekend. After all, the pressure of job hunting is heavy, and we all need to unwind. See you next week.