Regular expressions are easy to understand 1 — your symbols are up to date

Time:2020-8-31

Introduction

While working through LeetCode, I ran into a problem about regular expressions. Most of the material about regular expressions online consists of ready-made patterns. You can copy them directly, but you never grasp the essentials that way. So I am using this article to record the process of learning regular expressions, from the simple to the more advanced. The blog will be updated as my learning progresses. Please point out any misunderstandings you find along the way.

Starting from symbols

First, a brief introduction to the basic symbols in regular expressions and how they are commonly used.

^ Anchors the match at the beginning of a line
$ Anchors the match at the end of a line
These two symbols do not match any character themselves; they only fix a position. Here is an example from the book to distinguish them

^cat matches a string that starts with cat, such as catrd
cat$ matches a string that ends with cat, such as sscat
cat matches a string such as sdfcatsdf, that is, any string containing cat
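
A minimal Python sketch of the three cases above, assuming the standard re module. The sample strings are the ones from the examples, and re.search is used because it looks for a match anywhere in the string.

```python
import re

words = ["catrd", "sscat", "sdfcatsdf"]

# ^cat: only strings that begin with "cat"
starts = [w for w in words if re.search(r"^cat", w)]
# cat$: only strings that end with "cat"
ends = [w for w in words if re.search(r"cat$", w)]
# cat: any string that contains "cat" anywhere
contains = [w for w in words if re.search(r"cat", w)]

print(starts)    # ['catrd']
print(ends)      # ['sscat']
print(contains)  # ['catrd', 'sscat', 'sdfcatsdf']
```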

A special case is ^$. This expression matches an empty line (one with no characters at all, not even whitespace).
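
As a rough illustration, the sketch below checks each line of a small made-up text against ^$ in Python. The text and the variable names are assumptions for the example only.

```python
import re

# Four lines: "first", an empty line, a line of three spaces, "last"
text = "first\n\n   \nlast"
lines = text.split("\n")

# Collect the indexes of the lines that ^$ matches
empty = [i for i, line in enumerate(lines) if re.search(r"^$", line)]

print(empty)  # [1]
```

The line made only of spaces is not reported, because whitespace characters still count as characters.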


[] Character group: any one of the characters listed in the group may match
[-] Inside a character group, the hyphen is a metacharacter that denotes a range; it has this meaning only inside a character group
[^...] Negated (exclusive) character group: matches any one character that is not listed in the group. The ^ has this meaning only as the first character inside a character group (PS: this is the same symbol as the line-start ^; the position determines the meaning)
Again, some usage examples

[aA]pp: matches app or App; character groups are often used to handle case
[0-9]: matches any single digit. Ranges can be mixed within one group; for example, [0-9a-zA-Z] matches any single digit or letter
[^0-9]: matches any single non-digit character
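
The three character groups above can be tried out with a short Python sketch. The input strings are invented for illustration; re.findall returns every non-overlapping match.

```python
import re

# [aA]pp accepts either case for the first letter only
case_hits = re.findall(r"[aA]pp", "app App aPP")
# Mixed ranges: each match is one digit or one letter
alnum_hits = re.findall(r"[0-9a-zA-Z]", "a-1_B")
# Negated group: each match is one character that is not a digit
non_digits = re.findall(r"[^0-9]", "a1b2")

print(case_hits)   # ['app', 'App']
print(alnum_hits)  # ['a', '1', 'B']
print(non_digits)  # ['a', 'b']
```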

The negated character group deserves attention here. A negated character group means "match a character that is not listed", not "do not match the listed characters". The example given in the book is finding words in which the letter q is not followed by the letter u. The regular expression we try is q[^u], but Iraq is not in the result set. That is because [^u] must still match one character after the q, and in Iraq nothing follows the q. In other words, even a negated character group has to match exactly one character.
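
The Iraq case can be reproduced with a small Python sketch. The word list is made up for the example, and the second pattern is only one possible workaround (it uses | and $, both covered elsewhere in this article), not necessarily the one the book recommends.

```python
import re

words = ["Iraq", "Iraqi", "quit", "Qantas"]

# q[^u] needs a real character after the q, so "Iraq" is missed.
# "Qantas" is missed too, because matching is case-sensitive by default.
naive = [w for w in words if re.search(r"q[^u]", w)]

# Workaround sketch: accept either a non-u character or the end of the string
with_end = [w for w in words if re.search(r"q([^u]|$)", w)]

print(naive)     # ['Iraqi']
print(with_end)  # ['Iraq', 'Iraqi']
```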


. Matches any single character (in most flavors, any character except a newline)

09.10.11: matches 09-10-11 and 09/10/11, or even 09810511. Note that if you want to narrow what . accepts, you can use a character group such as [-./], but not [.-/], because there the - is read as a range from . to / and a literal - is no longer matched.
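
A hedged Python sketch of the difference between the bare dot and a character group of separators. The sample strings are illustrative, and re.fullmatch is used so that the whole string has to match.

```python
import re

samples = ["09-10-11", "09/10/11", "09.10.11", "09810511"]

# The dot accepts any character, including digits
loose = [s for s in samples if re.fullmatch(r"09.10.11", s)]
# Hyphen first: it is a literal, so -, . and / are all accepted
strict = [s for s in samples if re.fullmatch(r"09[-./]10[-./]11", s)]
# Hyphen in the middle: it forms the range from . to /, so - is rejected
ranged = [s for s in samples if re.fullmatch(r"09[.-/]10[.-/]11", s)]

print(loose)   # ['09-10-11', '09/10/11', '09.10.11', '09810511']
print(strict)  # ['09-10-11', '09/10/11', '09.10.11']
print(ranged)  # ['09/10/11', '09.10.11']
```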


| Alternation (multiple-choice structure), meaning "or". (a|b) is equivalent to [ab], but not to [a|b], which also matches a literal |

A practical case is (fir|1)st, which matches first or 1st, so it is equivalent to first|1st
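
The following Python sketch checks the alternation examples. The test strings are assumptions for illustration. Note that with a single pair of parentheses, re.findall returns the text captured by the parentheses.

```python
import re

# (fir|1)st accepts "first" and "1st" but not "fist"
ordinals = [w for w in ["first", "1st", "fist"] if re.fullmatch(r"(fir|1)st", w)]

# Compare alternation with character groups on the string "a|b"
paren = re.findall(r"(a|b)", "a|b")
group = re.findall(r"[ab]", "a|b")
group_with_bar = re.findall(r"[a|b]", "a|b")  # the | is a literal here

print(ordinals)        # ['first', '1st']
print(paren)           # ['a', 'b']
print(group)           # ['a', 'b']
print(group_with_bar)  # ['a', '|', 'b']
```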


? Optional element: the preceding element may appear, but its presence is not required for a successful match

colou?r matches color or colour
July? (4th|fourth) matches several ways of writing July 4, such as Jul 4th or July fourth
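
A short Python sketch of the optional element. It assumes a single space between the month and the day in the date pattern, and the candidate strings are invented for the example.

```python
import re

# The u is optional, but at most one u is allowed
spellings = [w for w in ["color", "colour", "colouur"] if re.fullmatch(r"colou?r", w)]

# The ? applies only to the y, so both "Jul" and "July" are accepted
dates = ["Jul 4th", "July 4th", "Jul fourth", "July fourth", "June 4th"]
matched = [d for d in dates if re.fullmatch(r"July? (4th|fourth)", d)]

print(spellings)  # ['color', 'colour']
print(matched)    # ['Jul 4th', 'July 4th', 'Jul fourth', 'July fourth']
```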


+ One or more occurrences of the immediately preceding element
* Zero or more occurrences of the immediately preceding element
{} Interval: specifies the allowed number of repetitions

[0-9]+ matches 11 or 18; the digit must appear at least once
[0-9]* matches 1, 2, or the empty string; the digit may appear any number of times, or not at all
[0-9]{2,8} matches a run of 2 to 8 digits
[0-9]{1,} matches at least one digit, equivalent to [0-9]+
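
The quantifiers can be compared side by side in a small Python sketch. The inputs are made up, and re.fullmatch is used where the whole string should be tested.

```python
import re

# + needs at least one digit; findall returns each run of digits
plus_runs = re.findall(r"[0-9]+", "a11b18c")
# {1,} behaves the same as +
interval_runs = re.findall(r"[0-9]{1,}", "a11b18c")

# * accepts the empty string, + does not
star_empty = bool(re.fullmatch(r"[0-9]*", ""))
plus_empty = bool(re.fullmatch(r"[0-9]+", ""))

# {2,8} accepts between 2 and 8 digits inclusive
too_short = bool(re.fullmatch(r"[0-9]{2,8}", "1"))
in_range = bool(re.fullmatch(r"[0-9]{2,8}", "12345678"))
too_long = bool(re.fullmatch(r"[0-9]{2,8}", "123456789"))

print(plus_runs, interval_runs)       # ['11', '18'] ['11', '18']
print(star_empty, plus_empty)         # True False
print(too_short, in_range, too_long)  # False True False
```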


\ Escape character: if the character to be matched is itself a metacharacter, put the escape character in front of it

[1\-9] is equivalent to (1|-|9)
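
To see what escaping changes, here is a minimal Python sketch with invented inputs. Raw strings (the r prefix) are used so that the backslash reaches the regex engine unchanged.

```python
import re

# Escaped hyphen: the group contains exactly 1, - and 9
escaped = re.findall(r"[1\-9]", "1-5-9")
# Unescaped hyphen: the group is the range 1 through 9
unescaped = re.findall(r"[1-9]", "1-5-9")

# Outside a character group, \. is a literal dot while . is a wildcard
literal_dot = re.findall(r"a\.b", "a.b axb")
any_char = re.findall(r"a.b", "a.b axb")

print(escaped)      # ['1', '-', '-', '9']
print(unescaped)    # ['1', '5', '9']
print(literal_dot)  # ['a.b']
print(any_char)     # ['a.b', 'axb']
```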


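Some shortcut symbols:
\t Matches a tab character
\n Matches a newline
\b Matches a word boundary in most flavors; inside a character group it usually means backspace
\f Matches an ASCII form feed
\s Matches any whitespace character, including spaces, tabs, newlines, and carriage returns
\S Matches any character that is not whitespace
\w [a-zA-Z0-9_] in most flavors
\W [^a-zA-Z0-9_] in most flavors
\d [0-9]
\D [^0-9]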

[ \t]* matches any number of spaces and tabs, including none
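
A few of the shortcuts can be tried in Python as a rough sketch. The inputs are invented, and the behavior shown is that of Python's re module, where \w also accepts the underscore and \b outside a character group is a word boundary. Other flavors may differ.

```python
import re

digits = re.findall(r"\d+", "tel 555 1234")
# \s+ treats any run of spaces, tabs and newlines as one separator
fields = re.split(r"\s+", "a \t b\nc")
# In Python, \w includes the underscore but not the hyphen
word_runs = re.findall(r"\w+", "snake_case-42")
# \b keeps "cat" from matching inside "concat"
whole_words = re.findall(r"\bcat\b", "cat concat cat.")
# [ \t]* at the start of the string strips leading spaces and tabs
stripped = re.sub(r"^[ \t]*", "", " \t  indented")

print(digits)       # ['555', '1234']
print(fields)       # ['a', 'b', 'c']
print(word_runs)    # ['snake_case', '42']
print(whole_words)  # ['cat', 'cat']
print(stripped)     # indented
```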

Summary

Here, the symbols introduced above are collected in one place for future reference.

^ Anchors the match at the beginning of a line
$ Anchors the match at the end of a line
[] Character group: any one of the characters listed in the group may match
[-] Inside a character group, the hyphen is a metacharacter that denotes a range; it has this meaning only inside a character group
[^...] Negated (exclusive) character group: matches any one character that is not listed in the group. The ^ has this meaning only as the first character inside a character group (PS: this is the same symbol as the line-start ^; the position determines the meaning)
. Matches any single character (in most flavors, any character except a newline)
| Alternation (multiple-choice structure), meaning "or". (a|b) is equivalent to [ab], but not to [a|b], which also matches a literal |
? Optional element: the preceding element may appear, but its presence is not required for a successful match
+ One or more occurrences of the immediately preceding element
* Zero or more occurrences of the immediately preceding element
{} Interval: specifies the allowed number of repetitions
\ Escape character: if the character to be matched is itself a metacharacter, put the escape character in front of it
\t Matches a tab character
\n Matches a newline
\b Matches a word boundary in most flavors; inside a character group it usually means backspace
\f Matches an ASCII form feed
\s Matches any whitespace character, including spaces, tabs, newlines, and carriage returns
\S Matches any character that is not whitespace
\w [a-zA-Z0-9_] in most flavors
\W [^a-zA-Z0-9_] in most flavors
\d [0-9]
\D [^0-9]

With these symbols, you can already read the vast majority of regular expressions! Using them fluently, though, takes more reading and practice~
If this blog is useful to you, please remember to bookmark it for later updates
