[Regular expression] Basics reference: number of matches

Time:2021-4-22
*. Grouping

Before we talk about the number of matches, let's cover grouping first.
Wrapping part of a pattern in parentheses () makes it a subexpression. After a match, the text matched by each subexpression is stored in addition to the overall match result.
Parentheses here work much like parentheses in arithmetic: what is inside them has higher priority and takes part in the match as a single unit.
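As a rough illustration of both points, here is a small sketch using Python's re module. The pattern and the test string are made up for this example, and it borrows the + quantifier (one or more times) that is explained in section 1.

```python
import re

# Hypothetical pattern: the parentheses make "ab" a single unit,
# so + repeats "ab" as a whole.
match = re.search(r"(ab)+c", "xxababcxx")

whole = match.group(0)  # the overall match
sub = match.group(1)    # the text last captured by the subexpression
print(whole)  # ababc
print(sub)    # ab

# Without the parentheses, + applies only to the "b" right before it.
no_group = re.search(r"ab+c", "xxababcxx").group(0)
print(no_group)  # abc
```

When a group is repeated, Python keeps only the last repetition in group(1), which is why the stored value is ab rather than abab.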

1. Using metacharacters to specify the number of matches

Placing a quantifier metacharacter right after a single character or a subexpression specifies how many times in a row that character or subexpression should be matched.

1. ?

? means matching 0 or 1 times; that is, the preceding content is optional.
For example, the regular expression (www\.)?sss.com matches the string www.sss.com and also matches sss.com.
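A minimal sketch of the optional group, assuming Python's re module (the variable names are only for illustration). It uses the pattern exactly as written above; note that the unescaped . before com matches any character, so a stricter pattern would write sss\.com.

```python
import re

# Pattern from the text above.
pattern = re.compile(r"(www\.)?sss.com")

with_prefix = pattern.fullmatch("www.sss.com")
without_prefix = pattern.fullmatch("sss.com")

print(with_prefix.group(1))     # www.
print(without_prefix.group(1))  # None, because the optional group matched 0 times
```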

2. *

* means matching 0 or more times, as many times as possible.
For example, the regular expression ax*b matches the strings ab, axb, axxxxb, axxxxxxxxxxxb, and so on.

3. +

+ means matching 1 or more times: as many times as possible, but at least once.
For example, the regular expression ax+b matches the strings axb, axxxxb, axxxxxxxxxb, and so on, but it does not match ab.
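The difference between * and + only shows up when there are zero repetitions. A short sketch, assuming Python's re module and a hand-picked list of candidate strings:

```python
import re

candidates = ["ab", "axb", "axxxxb"]

# fullmatch requires the whole string to match the pattern.
star = [s for s in candidates if re.fullmatch(r"ax*b", s)]
plus = [s for s in candidates if re.fullmatch(r"ax+b", s)]

print(star)  # ['ab', 'axb', 'axxxxb']
print(plus)  # ['axb', 'axxxxb']
```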

4. Curly brackets

Wrap a natural number in curly brackets to specify an exact number of matches ({m}). Put a comma between two natural numbers to specify a range, from the first number to the second, inclusive ({m,n}).
If the curly brackets contain only the first number followed by a comma, the number of matches must be >= that number ({m,}).
For example:
ce{3}b matches only the string ceeeb.
ce{1,3}b matches only the strings ceb, ceeb, and ceeeb.
ce{3,}b matches the strings ceeeb, ceeeeeeeb, ceeeeeeeeeeeeeeeb, and so on, but does not match ceb or ceeb.
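The three forms can be checked side by side. The sketch below assumes Python's re module; matching_counts is a hypothetical helper written for this example that reports which numbers of e make the whole string match.

```python
import re

def matching_counts(pattern, max_e=6):
    """Return every n in 0..max_e for which 'c' + 'e'*n + 'b' fully matches."""
    return [n for n in range(max_e + 1)
            if re.fullmatch(pattern, "c" + "e" * n + "b")]

exactly_three = matching_counts(r"ce{3}b")
one_to_three = matching_counts(r"ce{1,3}b")
at_least_three = matching_counts(r"ce{3,}b")

print(exactly_three)   # [3]
print(one_to_three)    # [1, 2, 3]
print(at_least_three)  # [3, 4, 5, 6]

# In Python's re module, a space inside the braces stops them from being
# read as a quantifier; "{1, 3}" is then matched as literal text.
with_space = re.fullmatch(r"ce{1, 3}b", "ceb")
print(with_space)  # None
```

Other regex engines may treat whitespace inside the braces differently, so it is safest to write the range without spaces.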

2. Greedy mode and non-greedy mode

In a regular expression, the more characters a pattern element can stand for (for example, . matches almost any character) and the wider its allowed number of matches, the more substrings of the input can satisfy the pattern. When several substrings qualify, which one becomes the final match result?

For example, when the regular expression a.*b is matched against the string abccbxxb, should the result be ab, abccb, or abccbxxb?

The answer depends on whether greedy mode or non-greedy mode is in effect.

Greedy mode (the default): among the possible matches, take the one with the most characters.
Non-greedy mode: the opposite, take the one with the fewest characters. It is enabled by adding a ? after the metacharacter that indicates the number of matches.

By these definitions, the answer to the previous question is abccbxxb, because greedy mode is the default.
If you want to match ab using non-greedy mode, change the regular expression to a.*?b
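Both modes can be compared on the same string. This is a small sketch assuming Python's re module; the last case is an extra example not taken from the text above.

```python
import re

text = "abccbxxb"

greedy = re.search(r"a.*b", text).group(0)
lazy = re.search(r"a.*?b", text).group(0)

print(greedy)  # abccbxxb
print(lazy)    # ab

# Non-greedy still starts at the leftmost possible position, so the result
# is not always the shortest substring found anywhere in the input.
leftmost = re.search(r"a.*?b", "aab").group(0)
print(leftmost)  # aab
```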

Let's take a more realistic example: extracting the link contained in an a tag in HTML.
Suppose the HTML is:

<a href="a.com"></a><div class="one">

If we use the regular expression ".*" to match, the result is "a.com"></a><div class="one". Because of greedy mode, the longest possible string is matched.
Therefore ".*?" should be used instead, and the result is "a.com". This correctly matches the link contained in the a tag (although the result still includes the two quotation marks).
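The same comparison can be run directly, again as a sketch that assumes Python's re module. The last two lines go one step beyond the text: they combine the non-greedy quantifier with a group from the grouping section so the quotation marks are left out of the captured value.

```python
import re

html = '<a href="a.com"></a><div class="one">'

greedy = re.search(r'".*"', html).group(0)
lazy = re.search(r'".*?"', html).group(0)

print(greedy)  # "a.com"></a><div class="one"
print(lazy)    # "a.com"

# A subexpression captures only the text between the quotation marks.
link = re.search(r'"(.*?)"', html).group(1)
print(link)  # a.com
```

Note that ".*?" matches any quoted value, not only href, so running it over the whole input would also find "one". For real HTML a dedicated parser is usually a more robust choice than a regular expression.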