Another regular expression learning note

Time:2021-8-14

1. \b: matches the position at the beginning or end of a word. That position may sit next to a space, a punctuation mark, or a line break, but \b does not match any of those characters; it matches only the position itself.
Example: \bhi\b: finds every standalone word "hi" in the text, but not "him", "history", etc.
1.1 ^: matches the beginning of the string (with the multiline option, the beginning of each line).
1.2 $: matches the end of the string (with the multiline option, the end of each line). Like \b, both ^ and $ match a position rather than a character.
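
The following is a minimal sketch of the word boundary and the two anchors, assuming Python's built-in re module; the sample sentence and variable names are made up for illustration.

```python
import re

text = "hi there, him and history say hi"

# \bhi\b matches "hi" only as a whole word, not inside "him" or "history".
whole_words = re.findall(r"\bhi\b", text)

# Without the boundaries, "hi" is also found inside the longer words.
substrings = re.findall(r"hi", text)

# ^ anchors the match at the start of the string, $ at the end.
starts_with_hi = re.search(r"^hi", text) is not None
ends_with_hi = re.search(r"hi$", text) is not None

print(whole_words)      # ['hi', 'hi']
print(len(substrings))  # 4
```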
Repetition:
2. *: the item before * is repeated any number of times (including zero); ".*" together matches any number of characters that are not newlines.
Example: \bhi\b.*\bLucy\b: first the word "hi", then any number of characters (but no newline), and finally the word "Lucy".
2.1 +: also a quantifier, but + requires 1 or more repetitions, whereas * allows any number, including 0.
2.2 {n}: the preceding item is repeated exactly n times.
2.3 {n,m}: the preceding item is repeated n to m times, with n <= m.
2.4 ?: the preceding item is repeated 0 or 1 times.
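
The quantifiers can be compared side by side. This is a small sketch assuming Python's re module; the helper name fits and the test strings are hypothetical.

```python
import re

def fits(pattern, s):
    """Return True only if the whole string s matches pattern."""
    return re.fullmatch(pattern, s) is not None

star = [fits(r"ab*", s) for s in ["a", "ab", "abbb"]]         # zero or more b
plus = [fits(r"ab+", s) for s in ["a", "ab", "abbb"]]         # one or more b
exact = [fits(r"ab{2}", s) for s in ["ab", "abb", "abbb"]]    # exactly two b
ranged = [fits(r"ab{1,2}", s) for s in ["a", "abb", "abbb"]]  # one or two b
optional = [fits(r"ab?", s) for s in ["a", "ab", "abb"]]      # zero or one b

# ".*" stops at a newline, so "hi" and "Lucy" must be on the same line.
same_line = re.search(r"\bhi\b.*\bLucy\b", "hi, this is Lucy") is not None
split_line = re.search(r"\bhi\b.*\bLucy\b", "hi\nLucy") is not None

print(star, plus, exact, ranged, optional)
print(same_line, split_line)  # True False
```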
3. .: matches any single character except a line break.
4. \d: matches any digit (0, 1, 2, ... 9)
Example: 0\d\d-\d{7}: finds a string starting with 0, followed by two digits, then a hyphen "-", then seven digits, such as 025-8224110.
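
As a quick check of the phone-number pattern, here is a sketch assuming Python's re module; the surrounding sentence and the second, malformed number are invented for the test.

```python
import re

# 0, two digits, a hyphen, then exactly seven digits.
phone = re.compile(r"0\d\d-\d{7}")

text = "Call 025-8224110 or 0251-822411 today"
found = phone.findall(text)

print(found)  # ['025-8224110']
```

The second number is skipped because its hyphen does not come right after the third digit.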
5. \s: matches any whitespace character, including space, tab, line feed, the Chinese full-width space, etc.
6. \w: matches letters, digits, underscore, etc.
Example 1: \ba\w*\b: matches the letter "a" at the start of a word, followed by any number of word characters (no whitespace), and then a word boundary. In other words, all words that begin with "a".
Example 2: \b\w{6}\b: matches a word exactly 6 characters long.
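
The two \w examples and the \s class can be tried out as follows. This is a sketch assuming Python 3's re module with ordinary str patterns, where \s also covers the full-width space U+3000; the sample sentences are made up.

```python
import re

text = "an apple and a banana are bought"

# Words that begin with the letter "a".
a_words = re.findall(r"\ba\w*\b", text)

# Words that are exactly six characters long.
six_letter = re.findall(r"\b\w{6}\b", text)

# \s+ treats any run of spaces, tabs, and newlines as one separator.
parts = re.split(r"\s+", "a \t b\nc")

# The full-width (ideographic) space also counts as whitespace here.
wide = re.split(r"\s+", "a\u3000b")

print(a_words)     # ['an', 'apple', 'and', 'a', 'are']
print(six_letter)  # ['banana', 'bought']
print(parts, wide) # ['a', 'b', 'c'] ['a', 'b']
```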
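7. []: matches any single character listed inside the square brackets.
Example: [abc]\w{4}\b: one of the characters a, b, or c, followed by 4 word characters and then a word boundary. Because there is no \b at the front, the match may also begin in the middle of a longer word.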
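
A short sketch of the character-class example, assuming Python's re module and invented sample words. It also shows what happens when the pattern has no leading \b.

```python
import re

pattern = re.compile(r"[abc]\w{4}\b")

# Five-letter words starting with a, b, or c are matched in full.
matches = pattern.findall("apple bread chair dream")

# With no \b in front, the match can start inside a longer word.
inside = pattern.findall("tables")

print(matches)  # ['apple', 'bread', 'chair']
print(inside)   # ['ables']
```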
Negation
8. \D \S \W \B: the uppercase forms of these metacharacters match the opposite of what their lowercase forms match.
Example: \D: any character that is not a digit, such as the letters in "abced"
8.1 [^x]: matches any character that is not x
8.2 [^xyz]: matches any character that is not x, y, or z
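
The negated classes behave as in the sketch below, which assumes Python's re module; the sample strings are hypothetical.

```python
import re

text = "room 42, floor 7"

# \D+ collects runs of non-digit characters.
non_digits = re.findall(r"\D+", text)

# \S+ collects runs of non-whitespace characters.
non_space = re.findall(r"\S+", text)

# [^xyz] matches any single character other than x, y, or z.
not_xyz = re.findall(r"[^xyz]", "xaybzc")

# \B is the opposite of \b: "hi" must sit strictly inside a word.
inner_hi = re.findall(r"\Bhi\B", "hi him this")

print(non_digits)  # ['room ', ', floor ']
print(non_space)   # ['room', '42,', 'floor', '7']
print(not_xyz)     # ['a', 'b', 'c']
print(inner_hi)    # ['hi'] (the one inside "this")
```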
9. Alternation
"|": the "|" symbol implements a logical OR between alternatives. Combined with parentheses "()", it lets you apply OR to just part of an expression.
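
To illustrate how "|" interacts with parentheses, here is a small sketch assuming Python's re module; the gray/grey pattern is an invented example, not one from the note.

```python
import re

# Inside the group, | chooses between "a" and "e" only.
color = re.compile(r"gr(a|e)y")
results = [color.fullmatch(s) is not None for s in ["gray", "grey", "groy"]]

# Without parentheses, | splits the whole pattern into "gra" or "ey".
loose = re.fullmatch(r"gra|ey", "ey") is not None

print(results)  # [True, True, False]
print(loose)    # True
```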
10. Grouping
"()": enclose a sub-expression in parentheses so that repetition, alternation, and other operations can be applied to it as a whole.
Example: \b(\w+\b\s+)\1+\b: \1 refers to the text matched by the first parenthesized group, so this matches a word (with its trailing whitespace) that is immediately repeated, such as the "go go " in "go go go".
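
Grouping and the \1 backreference can be checked with the sketch below, assuming Python's re module; the sample sentence is made up.

```python
import re

# A quantifier after a group repeats the whole group.
grouped = re.fullmatch(r"(ab){2}", "abab") is not None
not_grouped = re.fullmatch(r"(ab){2}", "abb") is not None

# \1 must repeat exactly what group 1 captured ("go " here).
repeat = re.compile(r"\b(\w+\b\s+)\1+\b")
m = repeat.search("go go go now")

print(grouped, not_grouped)  # True False
print(repr(m.group(0)))      # 'go go go '
print(repr(m.group(1)))      # 'go '
```

Note that this pattern needs another word after the repeated run, because the final \b has to fall between the trailing whitespace and a word character.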
Learning this on my own is going well. Let's continue with the more advanced features of regular expressions.
Assertions:
(?=exp): lookahead assertion, placed after the expression. It checks that the text following the current position matches exp, but exp itself is not included in the match.
Example: \b\w*(?=ing\b): gets the part before "ing" of every word that ends in "ing".
(?<=exp): lookbehind assertion, placed at the head of the expression. It checks that the text just before the current position matches exp, and again exp itself is not included in the match.
Example: (?<=\bre)\w*\b: gets the remainder of every word that begins with "re"
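
Both assertions can be tried with the sketch below, which assumes Python's re module (its lookbehind requires a fixed-width pattern, which \bre satisfies); the sample sentence is invented.

```python
import re

text = "singing and reading while rewriting"

# Lookahead: keep the part of each word that comes before a final "ing".
stems = re.findall(r"\b\w*(?=ing\b)", text)

# Lookbehind: keep the part of each word that comes after a leading "re".
tails = re.findall(r"(?<=\bre)\w*\b", text)

print(stems)  # ['sing', 'read', 'rewrit']
print(tails)  # ['ading', 'writing']
```

In both cases the asserted text ("ing" or "re") is checked but left out of the returned match.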
Comments:
(?#comment): annotates a regular expression in this form.
Example: 2[0-4]\d(?#200-249)
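
A brief sketch showing that the inline comment has no effect on matching, assuming Python's re module, which accepts the (?#...) form; the test values are chosen for illustration.

```python
import re

# The (?#...) part is ignored by the engine; it only documents the pattern.
commented = re.compile(r"2[0-4]\d(?#200-249)")
plain = re.compile(r"2[0-4]\d")

values = ["200", "249", "250", "199"]
with_comment = [commented.fullmatch(v) is not None for v in values]
without_comment = [plain.fullmatch(v) is not None for v in values]

print(with_comment)  # [True, True, False, False]
```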
Lazy matching
*: greedy, matches as many characters as possible
*?: lazy, matches as few characters as possible
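
The difference between greedy and lazy repetition shows up on a short string. This is a sketch assuming Python's re module; the input "aabab" is an illustrative choice.

```python
import re

text = "aabab"

# Greedy: .* grabs as much as possible, so the match runs to the last "b".
greedy = re.search(r"a.*b", text).group()

# Lazy: .*? grabs as little as possible, so the match stops at the first "b".
lazy = re.search(r"a.*?b", text).group()

print(greedy)  # aabab
print(lazy)    # aab
```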