Regular Expressions: the Python re Library

Time: 2021-12-31

Created: December 10, 2021

Modified on: None

Platform

  • Windows 10
  • Python 3.9.9

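Basics

  • Strings (str) and bytes (bytes) cannot be mixed: a str pattern only searches str data, and a bytes pattern only searches bytes data
  • Regular expressions use the backslash as their escape character, just as Python string literals do, so write patterns as raw strings (r'…') whenever possible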
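The two points above can be checked with a short sketch. The sample strings and variable names are only illustrative assumptions, not part of the original notes.

```python
import re

# A str pattern works on str input, a bytes pattern on bytes input.
text_match = re.search(r"\d+", "order 66")
bytes_match = re.search(rb"\d+", b"order 66")

# Mixing the two types raises TypeError.
try:
    re.search(r"\d+", b"order 66")   # str pattern, bytes input
    mixed_ok = True
except TypeError:
    mixed_ok = False

# Without the r prefix, every backslash meant for the regex must be doubled.
raw_pattern = r"\\section"       # regex for a literal backslash followed by "section"
plain_pattern = "\\\\section"    # the same pattern written as an ordinary literal

print(text_match.group(), bytes_match.group(), mixed_ok)
print(raw_pattern == plain_pattern)
```

The raw string needs half as many backslashes for the same pattern, which is why it is the usual choice.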

Special characters

Special character  Meaning
. Without re.DOTALL, matches any character except a newline ('\n')

With re.DOTALL, matches any character, including a newline ('\n')
^ Without re.MULTILINE, matches at the start of the string

With re.MULTILINE, also matches at the start of each line of the string
$ Without re.MULTILINE, matches at the end of the string, or just before a newline ('\n') that ends the string

With re.MULTILINE, also matches at the end of each line of the string, just before each newline ('\n')
* Matches the preceding item zero or more times. The match is greedy: it repeats as many times as possible

For example, matching r'<.*>' against the string '<ABC> <ABC>' gives '<ABC> <ABC>', not '<ABC>'
+ Matches the preceding item one or more times. The match is greedy: it repeats as many times as possible
? Matches the preceding item zero or one time. The match is greedy: it takes one occurrence if it can
*?, +?, ?? Make the quantifier non-greedy: it matches as few times as possible

For example, matching r'<.*?>' against the string '<ABC> <ABC>' gives '<ABC>', not '<ABC> <ABC>'
{m,n} Matches the preceding item at least m and at most n times. The match is greedy. If m is omitted it defaults to 0; if n is omitted there is no upper limit
{m,n}? Non-greedy form of {m,n}: matches as few times as possible
\ Escape character: turns a special character into a literal one, or introduces a special sequence
[ ] Character set: lists the single characters that may match, such as a hexadecimal digit [0123456789abcdefABCDEF]

A '-' between two characters denotes a continuous range, such as a hexadecimal digit [0-9a-fA-F]

Special characters are treated as literal characters inside a set

Character classes such as \d or \w can be used inside a set
[^ ] Negated character set: matches any single character that is not in the set
| Alternation ("or"): tries the pattern on the left first and, if it does not match, the pattern on the right
( ) Capturing group: the text matched by the enclosed pattern is stored as a group and can be matched again later in the same regular expression. Groups are numbered from 1, and a \number reference can address at most group 99
(? ) Extension notation: the first character after the '?' determines the function. Except for (?P<name>…), these constructs do not create a group
(?aiLmsux) Inline flags: they must be placed at the start of the regular expression, and each letter corresponds to the re.A, re.I, re.L, re.M, re.S, re.U, or re.X flag
(?: ) Non-capturing group: the enclosed pattern is grouped by the parentheses, but the group cannot be referenced or retrieved after matching
(?aiLmsux-imsx: ) Turns the flags before the '-' on and the flags after it off, for the enclosed pattern only
(?P<name>) Named group: captures the match like ( ) and gives the group the name name. Within the pattern, reference it with (?P=name) or \number; in a replacement string, use \g<name>, \g<1>, or \1
(?P=name) Backreference to the group with that name; matches the same text the group matched
(?# ) Comment; the contents are ignored
(?= ) Positive lookahead assertion: matches only if the enclosed pattern matches next. It consumes none of the string
(?! ) Negative lookahead assertion: matches only if the enclosed pattern does not match next. It consumes none of the string
(?<= ) Positive lookbehind assertion: matches only if the text just before the current position matches the enclosed pattern, which must be of fixed length
(?<! ) Negative lookbehind assertion: matches only if the text just before the current position does not match the enclosed pattern, which must be of fixed length
(?(id/name)yes-pattern|no-pattern) If the group with the given id or name matched, match with yes-pattern; otherwise match with no-pattern, which can be omitted
\number Backreference to the group with this number; matches the same text the group matched
\A Matches only at the start of the string
\b Matches the empty string at a word boundary, that is, at the beginning or end of a word
\B Matches the empty string anywhere except at a word boundary
\d Matches any decimal digit [0-9]
\D Matches any character that is not a decimal digit [^0-9]
\s Matches any whitespace character [ \t\n\r\f\v]
\S Matches any non-whitespace character [^ \t\n\r\f\v]
\w Matches any letter, digit, or underscore [a-zA-Z0-9_]
\W Matches any character that is not a letter, digit, or underscore [^a-zA-Z0-9_]
\Z Matches only at the end of the string
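A few of the constructs in the table are easier to follow when run. The following is a minimal sketch with made-up sample strings; the conditional pattern for a bracketed e-mail address follows the example given in the Python re documentation.

```python
import re

sample = "<ABC> <ABC>"   # illustrative input

# Greedy versus non-greedy quantifiers.
greedy = re.search(r"<.*>", sample).group()
lazy = re.search(r"<.*?>", sample).group()

# Character set with ranges: runs of hexadecimal digits.
hex_runs = re.findall(r"[0-9a-fA-F]+", "ff 7G 1a")

# Named group plus a backreference: find a doubled word.
doubled = re.search(r"\b(?P<word>\w+) (?P=word)\b", "it is is fine")

# Lookbehind and lookahead test the context without consuming it.
amount = re.search(r"(?<=\$)\d+(?=\.)", "total: $42.50").group()
not_bar = re.findall(r"foo(?!bar)\w*", "foobar foobaz")

# Conditional: require '>' at the end only if '<' matched at the start.
address = re.compile(r"(<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$)")
balanced = address.match("<user@host.com>") is not None
unbalanced = address.match("<user@host.com") is not None

print(greedy, "|", lazy)
print(hex_runs, doubled.group("word"), amount, not_bar)
print(balanced, unbalanced)
```

The lookaround results contain only the digits or words themselves, because the asserted context is checked but never becomes part of the match.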

Regular expression flags

Flag  Inline flag  Description
re.A

re.ASCII
(?a) Makes \w, \W, \b, \B, \d, \D, \s, and \S match ASCII characters only
re.U

re.UNICODE
(?u) Makes \w, \W, \b, \B, \d, \D, \s, and \S match Unicode characters (redundant in Python 3, where this is the default for str patterns)
re.I

re.IGNORECASE
(?i) Case-insensitive matching: uppercase and lowercase letters are treated as the same
re.L

re.LOCALE
(?L) Makes \w, \W, \b, \B, and case-insensitive matching depend on the current locale. This flag can be used only with bytes patterns and is not recommended, because the locale mechanism is very unreliable
re.M

re.MULTILINE
(?m) Makes '^' and '$' match at the start and end of each line, instead of only at the start and end of the string
re.S

re.DOTALL
(?s) Makes '.' match any character at all, including a newline, instead of any character except a newline
re.X

re.VERBOSE
(?x) Allows the regular expression to be laid out freely.

Whitespace in the pattern is ignored, except inside a character set, when preceded by an unescaped backslash, or within tokens such as *?, (?:, or (?P<…>. You can add spaces and line breaks, and a # starts a comment that runs to the end of the line
re.DEBUG Displays debugging information about the compiled expression
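The effect of the most common flags can be seen in a short sketch. The sample text and variable names are illustrative assumptions.

```python
import re

text = "first line\nsecond line"   # illustrative two-line input

# re.MULTILINE: '^' matches at the start of every line.
starts_default = re.findall(r"^\w+", text)
starts_multi = re.findall(r"^\w+", text, flags=re.MULTILINE)

# re.DOTALL: '.' also matches the newline.
dot_default = re.search(r"first.*second", text)
dot_all = re.search(r"first.*second", text, flags=re.DOTALL)

# Inline flags: (?i) for the whole pattern, (?i:...) for one part of it.
inline = re.search(r"(?i)SECOND", text).group()
scoped = re.search(r"(?i:FIRST) line", text).group()

# re.VERBOSE: whitespace and # comments inside the pattern are ignored.
number = re.compile(r"""
    \d+             # integer part
    (?: \. \d+ )?   # optional fractional part
""", re.VERBOSE)
pi = number.search("pi is 3.14").group()

print(starts_default, starts_multi)
print(dot_default, dot_all is not None)
print(inline, scoped, pi)
```

The scoped form (?i:...) is useful when only part of a pattern should ignore case.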

Module functions and pattern object methods

A compiled pattern object can be reused, which is more efficient. The flags parameter sets the options described above; combine several flags with Python's | (bitwise OR) operator

import re

regex = re.compile(pattern, flags=0)
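As a minimal sketch of compiling once with combined flags and reusing the result, assuming a hypothetical "debug = value" configuration line as the target:

```python
import re

# Hypothetical pattern: a "debug = value" line, in any letter case,
# anywhere in a multi-line text.
debug_line = re.compile(r"^debug\s*=\s*(\w+)$", re.IGNORECASE | re.MULTILINE)

config = "name = demo\nDEBUG = true"   # illustrative input

# The same compiled object is reused for several searches.
first = debug_line.search(config).group(1)
second = debug_line.search("Debug=off").group(1)

# The combined flags are recorded on the pattern object.
has_multiline = bool(debug_line.flags & re.MULTILINE)

print(first, second, has_multiline)
```

Compiling up front mainly pays off when the same pattern is applied many times, for example inside a loop.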
Module function

Pattern object method
Description
re.search(pattern, string, flags=0)

regex.search(string[, pos[, endpos]])
Scans string for the first position where pattern matches. Returns a match object if a match is found, or None if there is none
re.match(pattern, string, flags=0)

regex.match(string[, pos[, endpos]])
Matches pattern only at the beginning of string. Returns a match object if it matches, or None if it does not
re.fullmatch(pattern, string, flags=0)

regex.fullmatch(string[, pos[, endpos]])
The whole string must match pattern. Returns a match object if it matches, or None if it does not
re.split(pattern, string, maxsplit=0, flags=0)

regex.split(string, maxsplit=0)
Splits string at each substring that matches pattern. A non-zero maxsplit is the maximum number of splits; if it is zero, the string is split at every match. If the pattern contains capturing groups, the text of all groups is also included in the resulting list
re.findall(pattern, string, flags=0)

regex.findall(string[, pos[, endpos]])
Returns all non-overlapping matches as a list. With no group, it is a list of matched strings; with one group, a list of that group's strings; with multiple groups, a list of tuples of strings. If there is no match, an empty list is returned
re.finditer(pattern, string, flags=0)

regex.finditer(string[, pos[, endpos]])
Returns an iterator that yields a match object for each non-overlapping match
re.sub(pattern, repl, string, count=0, flags=0)

regex.sub(repl, string, count=0)
Replaces the first count non-overlapping matches in string with repl and returns the resulting string. A count of 0 replaces all matches

repl can be a string, in which backslash escapes are processed, in particular group references such as \2 or \g<2>

repl can also be a function that receives a match object and returns the replacement string
re.subn(pattern, repl, string, count=0, flags=0)

regex.subn(repl, string, count=0)
Like re.sub, but returns a tuple (string after replacement, number of replacements made)
re.escape(pattern) Escapes the special characters in an ordinary string so that it can be used as a literal pattern. Do not use it for the repl string
re.purge() Clears the regular expression cache
regex.flags The flags the pattern was compiled with
regex.groups Number of capturing groups in the pattern
regex.groupindex Dictionary mapping the symbolic group names defined by (?P<id>) to group numbers
regex.pattern The pattern string the object was compiled from
re.error(msg, pattern=None, pos=None) Exception raised for an invalid regular expression. Its attributes:

- msg: the unformatted error message

- pattern: the regular expression pattern

- pos: the index in pattern where the error occurred; may be None

- lineno: the line number corresponding to pos; may be None

- colno: the column number corresponding to pos; may be None
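The functions in this table can be compared side by side in a short sketch. The inputs are made-up examples chosen only to show how the results differ.

```python
import re

# search, match and fullmatch differ in where the match may sit.
found_anywhere = re.search(r"\d+", "abc123") is not None
found_at_start = re.match(r"\d+", "abc123") is not None
found_whole = re.fullmatch(r"\d+", "123") is not None

# split: a capturing group puts the separators into the result as well.
parts = re.split(r",\s*", "a, b,c")
parts_with_sep = re.split(r"(,)\s*", "a, b,c")
first_split = re.split(r",\s*", "a, b,c", maxsplit=1)

# findall: the shape of the result depends on the number of groups.
pairs_text = "x=1 y=22"
no_group = re.findall(r"\w+=\d+", pairs_text)
one_group = re.findall(r"\w+=(\d+)", pairs_text)
two_groups = re.findall(r"(\w+)=(\d+)", pairs_text)

# sub with a group reference, with a function, and with a count; then subn.
swapped = re.sub(r"(\w+)=(\d+)", r"\2=\1", pairs_text)
twice = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), pairs_text)
only_first = re.sub(r"\d+", "#", pairs_text, count=1)
replaced, how_many = re.subn(r"\d+", "#", pairs_text)

# escape: make an ordinary string safe to use as a literal pattern.
literal = re.escape("1+1=2")
escaped_hit = re.fullmatch(literal, "1+1=2") is not None
unescaped_hit = re.fullmatch("1+1=2", "1+1=2") is not None

# An invalid pattern raises re.error, which carries the attributes listed above.
try:
    re.compile(r"(unclosed")
    bad_pattern = None
except re.error as exc:
    bad_pattern = exc.pattern
    print("error:", exc.msg, "at position", exc.pos)

print(parts, parts_with_sep, first_split)
print(no_group, one_group, two_groups)
print(swapped, twice, only_first, (replaced, how_many))
```

Without re.escape, the '+' in "1+1=2" is read as a quantifier, so the unescaped pattern does not match its own text.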

Match object methods and attributes

Attribute  Description
group([group1, …]) Returns the matched string of a group, or a tuple of strings if several groups are given. With no argument, group1 defaults to 0 and the whole match is returned; an argument of 0 also stands for the whole match. A negative number, or one larger than the number of groups in the pattern, raises IndexError. A group that did not take part in the match yields None; a group that matched several times yields its last match. Group names can also be used
__getitem__(g) Allows index syntax, m[g], to get the match of a single group
groups(default=None) Returns a tuple of all groups of the match; default is the value used for groups that did not take part in the match
groupdict(default=None) Returns a dictionary of the named groups, keyed by group name; default is the value used for groups that did not take part in the match
start([group])

end([group])
Return the start and end index of the substring matched by group, or -1 if the group did not take part in the match. group defaults to 0, which refers to the whole match. A negative number, or one larger than the number of groups in the pattern, raises IndexError
span([group]) Returns the tuple (m.start(group), m.end(group))
pos The start index that was set for the search
endpos The end index that was set for the search
lastindex Index of the last matched capturing group, or None if no group was matched
lastgroup Name of the last matched capturing group, or None if it has no name or no group was matched
re The pattern object that produced this match
string The string that was searched
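The methods and attributes above can be explored on a single match object. This sketch assumes a hypothetical date pattern with an optional day part, so that one group is left unmatched.

```python
import re

# Hypothetical pattern: year-month with an optional day.
date = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})(?:-(?P<day>\d{2}))?")
m = date.search("on 2021-12 we met")

whole = m.group()                 # same as m.group(0)
year = m.group("year")            # by name
year_month = m.group(1, 2)        # several groups give a tuple
month = m["month"]                # index syntax via __getitem__

all_groups = m.groups()                 # the unmatched day group is None
filled = m.groups(default="??")         # default replaces unmatched groups
named = m.groupdict()

whole_span = m.span()             # (m.start(), m.end())
year_span = m.span("year")
day_start = m.start("day")        # -1: the group did not take part

try:
    m.group(4)                    # the pattern has only three groups
    out_of_range = False
except IndexError:
    out_of_range = True

print(whole, year, year_month, month)
print(all_groups, filled, named)
print(whole_span, year_span, day_start, out_of_range)
print(m.pos, m.endpos, m.lastindex, m.lastgroup, m.re is date, m.string)
```

Because the day group is optional and absent here, lastindex and lastgroup point at the month group, the last one that actually matched.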

This work is released under a CC license. Reprints must credit the author and link to this article

Jason Yang