re.match function
re.match attempts to match a pattern at the beginning of the string. If the pattern does not match at the start, match() returns None.
Function syntax:
re.match(pattern, string, flags=0)
Function parameters:
Parameter | Description |
---|---|
pattern | The regular expression to match |
string | The string to match against |
flags | Flags that control how the regular expression is matched, such as case sensitivity or multiline matching. See: Regular expression modifier – optional flag |
If the match succeeds, re.match returns a match object; otherwise it returns None.
Use the match object's group(num) or groups() methods to retrieve the matched text.
Match object method | Description |
---|---|
group(num=0) | Returns the string matched by the whole expression (num=0, the default) or by the given group. group() can also take several group numbers at once, in which case it returns a tuple containing the values of those groups. |
groups() | Returns a tuple containing the strings of all groups, from group 1 up to the last group. |
example
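The code for this example did not survive. Below is a minimal sketch that should print the output shown after it. It assumes the hypothetical input string 'www.runoob.com'.

```python
import re

text = 'www.runoob.com'  # hypothetical input string

# 'www' occurs at position 0, so re.match succeeds
start_match = re.match('www', text)
print(start_match.span())

# 'com' occurs in the string but not at position 0, so re.match fails
end_match = re.match('com', text)
print(end_match)
```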
The output of the above example is:
(0, 3)
None
example
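The original code is missing. This sketch should reproduce the lines below. The pattern and the use of re.M | re.I are assumptions. The variable name matchObj is taken from the output labels.

```python
import re

line = "Cats are smarter than dogs"

# (.*) is greedy, (.*?) is non-greedy; re.M | re.I combines two flags
matchObj = re.match(r'(.*) are (.*?) .*', line, re.M | re.I)

if matchObj:
    print(f"matchObj.group() : {matchObj.group()}")
    print(f"matchObj.group(1) : {matchObj.group(1)}")
    print(f"matchObj.group(2) : {matchObj.group(2)}")
else:
    print("No match!!")
```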
The output of the above example is:
matchObj.group() : Cats are smarter than dogs
matchObj.group(1) : Cats
matchObj.group(2) : smarter
re.search method
re.search scans the entire string and returns the first successful match.
Function syntax:
re.search(pattern, string, flags=0)
Function parameters:
Parameter | Description |
---|---|
pattern | The regular expression to match |
string | The string to match against |
flags | Flags that control how the regular expression is matched, such as case sensitivity or multiline matching. See: Regular expression modifier – optional flag |
If the match succeeds, re.search returns a match object; otherwise it returns None.
Use the match object's group(num) or groups() methods to retrieve the matched text.
Match object method | Description |
---|---|
group(num=0) | Returns the string matched by the whole expression (num=0, the default) or by the given group. group() can also take several group numbers at once, in which case it returns a tuple containing the values of those groups. |
groups() | Returns a tuple containing the strings of all groups, from group 1 up to the last group. |
example
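A minimal sketch that should print the output shown below. The input string 'www.runoob.com' is a hypothetical stand-in for the lost original. Unlike re.match, re.search also finds 'com' near the end.

```python
import re

text = 'www.runoob.com'  # hypothetical input string

# re.search scans the whole string, so both patterns are found
www_span = re.search('www', text).span()  # found at the start
com_span = re.search('com', text).span()  # found near the end
print(www_span)
print(com_span)
```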
The output of the above example is:
(0, 3)
(11, 14)
example
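This is the re.search counterpart of the earlier group example, sketched to reproduce the lines below. The pattern and flags are assumptions, since the original code was lost.

```python
import re

line = "Cats are smarter than dogs"

# Same pattern as the re.match example; here it matches from position 0 anyway
searchObj = re.search(r'(.*) are (.*?) .*', line, re.M | re.I)

if searchObj:
    print(f"searchObj.group() : {searchObj.group()}")
    print(f"searchObj.group(1) : {searchObj.group(1)}")
    print(f"searchObj.group(2) : {searchObj.group(2)}")
else:
    print("Nothing found!!")
```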
The output of the above example is:
searchObj.group() : Cats are smarter than dogs
searchObj.group(1) : Cats
searchObj.group(2) : smarter
Difference between re.match and re.search
re.match only matches at the beginning of the string. If the start of the string does not match the regular expression, the match fails and the function returns None. re.search scans the entire string until it finds a match.
example
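A sketch of the comparison that should print the two lines below. It searches for 'dogs' in the same sample sentence. The code is reconstructed, not the original.

```python
import re

line = "Cats are smarter than dogs"

# re.match only looks at the beginning of the string, where "dogs" is absent
match_result = re.match(r'dogs', line, re.M | re.I)
if match_result:
    print(f"match --> matchObj.group() : {match_result.group()}")
else:
    print("No match!!")

# re.search scans the whole string and finds "dogs" at the end
search_result = re.search(r'dogs', line, re.M | re.I)
if search_result:
    print(f"search --> matchObj.group() : {search_result.group()}")
else:
    print("No match!!")
```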
The output of the above example is:
No match!!
search --> matchObj.group() : dogs
Search and replace
Python's re module provides re.sub for replacing matches in a string.
Syntax:
re.sub(pattern, repl, string, count=0, flags=0)
Parameters:
- pattern: the pattern string of the regular expression.
- repl: the replacement string; it can also be a function.
- string: the original string to search and replace in.
- count: the maximum number of replacements to make. The default, 0, replaces all matches.
- flags: the matching flags used when the pattern is compiled, given as an integer.
The first three are required parameters and the last two are optional parameters.
example
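A sketch that should produce the two lines below. The input string, with a trailing '#' comment, is an assumption. The first call strips the comment; the second keeps only digits.

```python
import re

phone = "2004-959-559 # This is a phone number"  # hypothetical input

# Remove the trailing comment (and the whitespace before it)
without_comment = re.sub(r'\s*#.*$', "", phone)
print("Telephone number:", without_comment)

# Remove every non-digit character
digits_only = re.sub(r'\D', "", phone)
print("Telephone number:", digits_only)
```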
The output of the above example is:
Telephone number: 2004-959-559
Telephone number: 2004959559
When the repl parameter is a function
The following example multiplies each number matched in the string by 2:
example
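A sketch of a repl function that should yield the output below. The input 'A23G4HFD567' is inferred from that output. The helper name double and the group name value are illustrative choices. re.sub calls the function once per match with a match object and uses its string return value as the replacement.

```python
import re

def double(matched):
    # matched is a Match object; the named group 'value' holds the digits
    value = int(matched.group('value'))
    return str(value * 2)  # repl functions must return a string

s = 'A23G4HFD567'  # hypothetical input
result = re.sub(r'(?P<value>\d+)', double, s)
print(result)
```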
The execution output is:
A46G8HFD1134
re.compile function
The compile function compiles a regular expression into a regular expression (Pattern) object, whose match() and search() methods can then be used.
The syntax format is:
re.compile(pattern[, flags])
Parameters:
- pattern: a regular expression as a string
- flags: optional; the matching mode, such as ignoring case or multiline mode. Possible values:
  - re.I: ignore case
  - re.L: make the special character classes \w, \W, \b, \B, \s, \S depend on the current locale
  - re.M: multiline mode
  - re.S: make '.' match any character, including a newline (by default '.' does not match a newline)
  - re.U: make the special character classes \w, \W, \b, \B, \d, \D, \s, \S depend on the Unicode character properties database
  - re.X: verbose mode; whitespace in the pattern is ignored and text after '#' is treated as a comment, for readability
example
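A minimal sketch of compiling a pattern once and then calling its match() method with optional start and end positions. The input string is made up for illustration.

```python
import re

pattern = re.compile(r'\d+')   # one or more digits
text = 'one12twothree34four'    # hypothetical input

print(pattern.match(text))         # None: position 0 is 'o', not a digit
print(pattern.match(text, 2, 10))  # None: position 2 is 'e', not a digit

m = pattern.match(text, 3, 10)     # position 3 is '1', so this matches
print(m)            # a Match object
print(m.group(0))   # 12
print(m.start(0))   # 3
print(m.end(0))     # 5
print(m.span(0))    # (3, 5)
```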
When the match succeeds, a match object is returned, where:
- group([group1, …]) returns the string matched by one or more groups. To get the whole matched substring, use group() or group(0).
- start([group]) returns the starting position of the substring matched by the group within the whole string (the index of its first character). The argument defaults to 0.
- end([group]) returns the end position of the substring matched by the group within the whole string (the index of its last character + 1). The argument defaults to 0.
- span([group]) returns (start(group), end(group)).
Take another example:
example
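A sketch of the group, span and groups methods on a pattern with two groups. The input string and pattern are assumptions for illustration. Asking for a group that does not exist raises IndexError.

```python
import re

# Two runs of letters separated by a space; re.I ignores case
pattern = re.compile(r'([a-z]+) ([a-z]+)', re.I)
m = pattern.match('Hello World Wide Web')  # hypothetical input

print(m.group(0), m.span(0))   # whole match: Hello World (0, 11)
print(m.group(1), m.span(1))   # first group: Hello (0, 5)
print(m.group(2), m.span(2))   # second group: World (6, 11)
print(m.groups())              # ('Hello', 'World')
print(m.group(1, 2))           # several groups at once: ('Hello', 'World')

try:
    m.group(3)                 # there is no third group
except IndexError as exc:
    print("IndexError:", exc)
```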
findall
Finds all substrings matched by the regular expression in the string and returns them as a list. If the pattern contains multiple groups, a list of tuples is returned. If nothing matches, an empty list is returned.
Note: match and search find a single match, while findall finds all of them.
The syntax format is:
re.findall(pattern, string, flags=0)
or
pattern.findall(string[, pos[, endpos]])
Parameters:
- pattern: the pattern to match.
- string: the string to search.
- pos: optional; the position in the string at which to start searching. Defaults to 0.
- endpos: optional; the position in the string at which to stop searching. Defaults to the length of the string.
Find all numbers in a string:
example
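A sketch that should print the three lists below. The input strings are assumptions inferred from that output. The third call passes pos=0 and endpos=10, so only the first ten characters are searched.

```python
import re

result1 = re.findall(r'\d+', 'runoob 123 google 456')

pattern = re.compile(r'\d+')  # find runs of digits
result2 = pattern.findall('runoob 123 google 456')
# pos=0, endpos=10: only the first 10 characters are searched
result3 = pattern.findall('run88oob123google456', 0, 10)

print(result1)
print(result2)
print(result3)
```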
Output:
['123', '456']
['123', '456']
['88', '12']
With multiple groups, findall returns a list of tuples:
example
import re
result = re.findall(r'(\w+)=(\d+)', 'set width=20 and height=10')
print(result)
Output:
[('width', '20'), ('height', '10')]
re.finditer
Like findall, finditer finds all substrings matched by the regular expression in the string, but returns them as an iterator of match objects.
re.finditer(pattern, string, flags=0)
Parameters:
Parameter | Description |
---|---|
pattern | The regular expression to match |
string | The string to match against |
flags | Flags that control how the regular expression is matched, such as case sensitivity or multiline matching. See: Regular expression modifier – optional flag |
example
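A sketch that should print the four numbers below. The input string is an assumption inferred from that output. finditer yields match objects lazily, so each one's group() is printed as the loop advances.

```python
import re

text = "12a32bc43jf3"  # hypothetical input
it = re.finditer(r"\d+", text)  # an iterator of Match objects, not a list
for match in it:
    print(match.group())
```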
Output:
12
32
43
3
re.split
The split method splits the string at each substring matched by the pattern and returns a list. Its form is:
re.split(pattern, string[, maxsplit=0, flags=0])
Parameters:
Parameter | Description |
---|---|
pattern | The regular expression to match |
string | The string to match against |
maxsplit | Maximum number of splits: maxsplit=1 splits once. The default, 0, places no limit on the number of splits. |
flags | Flags that control how the regular expression is matched, such as case sensitivity or multiline matching. See: Regular expression modifier – optional flag |
example
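A sketch of re.split on a made-up string. A trailing separator produces an empty last element. A capturing group keeps the separators in the result, and maxsplit limits the number of splits.

```python
import re

text = 'runoob, runoob, runoob.'  # hypothetical input

parts = re.split(r'\W+', text)          # split on runs of non-word characters
print(parts)            # ['runoob', 'runoob', 'runoob', '']

# A capturing group keeps the separators in the result
with_separators = re.split(r'(\W+)', text)
print(with_separators)  # ['runoob', ', ', 'runoob', ', ', 'runoob', '.', '']

# maxsplit=1 splits only once
once = re.split(r'\W+', text, maxsplit=1)
print(once)             # ['runoob', 'runoob, runoob.']

# If the pattern never matches, the whole string comes back in a list
no_match = re.split(r'\d+', 'hello world')
print(no_match)         # ['hello world']
```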
Regular expression object
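re.RegexObject
re.compile() returns a RegexObject (exposed as re.Pattern in recent Python 3 versions).
re.MatchObject
A match object (exposed as re.Match in recent Python 3 versions) provides:
- group() returns the string matched by the regular expression
- start() returns the position where the match starts
- end() returns the position where the match ends
- span() returns a tuple (start, end) with the position of the match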
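A short sketch showing the two object types and the four match-object methods listed above. It assumes a recent Python 3, where the classes are named re.Pattern and re.Match. The input string is made up.

```python
import re

pattern = re.compile(r'\d+')       # compile() returns a Pattern object
m = pattern.search('abc 123 def')  # search() returns a Match object (or None)

print(type(pattern).__name__)  # Pattern
print(type(m).__name__)        # Match
print(m.group())   # 123
print(m.start())   # 4
print(m.end())     # 7
print(m.span())    # (4, 7)
```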
Regular expression modifier – optional flag
Regular expressions can take optional flag modifiers that control how matching is performed. Modifiers are passed as the optional flags argument, and several flags can be combined with bitwise OR (|). For example, re.I | re.M sets both the I and M flags:
Modifier | Description |
---|---|
re.I | Makes matching case-insensitive |
re.L | Performs locale-aware matching |
re.M | Multiline matching; affects ^ and $ |
re.S | Makes . match any character, including newlines |
re.U | Interprets characters according to the Unicode character set. This flag affects \w, \W, \b, \B |
re.X | Verbose mode: allows a more flexible format so that regular expressions are easier to read |
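A sketch of several flags on a made-up two-line string: re.M, re.I combined with re.M, re.S and re.X. The date pattern in the re.X part is purely illustrative.

```python
import re

text = "first line\nSecond line"  # hypothetical two-line input

# Without re.M, ^ only matches at the start of the whole string
first_words = re.findall(r'^\w+', text)
# With re.M, ^ also matches after each newline
all_first_words = re.findall(r'^\w+', text, re.M)
# Flags combine with |: case-insensitive and multiline
combined = re.findall(r'^second', text, re.I | re.M)
print(first_words, all_first_words, combined)

# Without re.S, '.' does not match the newline; with re.S it does
no_dotall = re.search(r'line.Second', text)
with_dotall = re.search(r'line.Second', text, re.S)
print(no_dotall, with_dotall is not None)

# re.X allows whitespace and comments inside the pattern
date = re.compile(r"""
    (\d{4})  # year
    -
    (\d{2})  # month
""", re.X)
date_groups = date.match("2024-05").groups()
print(date_groups)
```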
Regular expression pattern
Pattern strings use a special syntax to represent a regular expression.
Letters and digits represent themselves: a letter or digit in a pattern matches the same character in the string.
Most letters and digits take on a different meaning when preceded by a backslash.
Special punctuation characters (such as . * + ? ( ) [ ] { } ^ $) match themselves only when escaped; otherwise they have a special meaning.
A backslash itself must be escaped with another backslash.
Because regular expressions usually contain backslashes, it is best to write them as raw strings. Pattern elements such as r'\t' (equivalent to '\\t') match the corresponding special character.
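A small sketch of raw strings and escaping. All sample strings are made up. The raw string r'\t' passes a backslash and a 't' to the regex engine, which then matches a tab character.

```python
import re

# r'\t' is two characters (backslash, 't'); '\t' is one tab character
print(len(r'\t'), len('\t'))   # 2 1
print(r'\t' == '\\t')          # True

# The regex engine turns the two-character sequence \t into a tab match
tab_pos = re.search(r'\t', 'a\tb').start()
print(tab_pos)                 # 1

# Escaped punctuation matches literally
dots = re.findall(r'\.', 'a.b.c')
print(dots)                    # ['.', '.']

# A literal backslash needs two backslashes in the regex: r'\\'
backslashes = re.findall(r'\\', 'C:\\dir')
print(backslashes)             # ['\\']
```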
The following table lists the special elements of regular expression pattern syntax. If you supply optional flags when using a pattern, the meaning of some pattern elements changes.
Pattern | Description |
---|---|
^ | Matches the beginning of the string |
$ | Matches the end of the string |
. | Matches any character except a newline. When the re.DOTALL flag is specified, it matches any character including a newline. |
[…] | Represents a set of characters, listed individually: [amk] matches 'a', 'm' or 'k' |
[^…] | Matches characters not in the set: [^abc] matches any character other than a, b or c. |
re* | Matches 0 or more occurrences of the preceding expression. |
re+ | Matches 1 or more occurrences of the preceding expression. |
re? | Matches 0 or 1 occurrence of the preceding expression, greedy (re?? is the non-greedy form) |
re{n} | Matches exactly n occurrences of the preceding expression. For example, “o{2}” cannot match the “o” in “Bob”, but matches the two o's in “food”. |
re{n,} | Matches n or more occurrences of the preceding expression. For example, “o{2,}” cannot match the “o” in “Bob”, but matches all the o's in “fooood”. “o{1,}” is equivalent to “o+”, and “o{0,}” is equivalent to “o*”. |
re{n,m} | Matches n to m occurrences of the preceding expression, greedy |
a\|b | Matches a or b |
(re) | Matches the expression in parentheses and also creates a group |
(?imx) | Sets the optional flags i, m or x. In Python these inline flags apply to the whole expression and should be placed at the start of the pattern. |
(?-imx) | Turns off the optional flags i, m or x. Python's re module only accepts this in the scoped form (?-imx: re). |
(?: re) | Like (…), but does not create a group |
(?imx: re) | Applies the optional flags i, m or x only inside the parentheses |
(?-imx: re) | Turns off the optional flags i, m or x only inside the parentheses |
(?#…) | Comment |
(?= re) | Positive lookahead assertion. Succeeds if the contained expression matches at the current position, and fails otherwise. The assertion consumes no characters: the rest of the pattern is tried from the same position. |
(?! re) | Negative lookahead assertion. The opposite of the positive assertion; succeeds if the contained expression does not match at the current position of the string. |
(?> re) | Matches an independent (atomic) group and discards its backtracking positions. Supported by Python's re module from Python 3.11. |
\w | Matches letters, digits and the underscore |
\W | Matches any character that is not a letter, digit or underscore |
\s | Matches any whitespace character, equivalent to [ \t\n\r\f\v] for ASCII text. |
\S | Matches any non-whitespace character |
\d | Matches any digit, equivalent to [0-9]. |
\D | Matches any non-digit |
\A | Matches the start of the string |
\Z | Matches the end of the string. In some engines (such as Perl) it also matches just before a final newline; in Python's re module it matches only at the very end of the string. |
\z | Matches the very end of the string. In Python's re module, \Z has this meaning. |
\G | Matches the position where the last match ended. Not supported by Python's re module. |
\b | Matches a word boundary, that is, the position between a word character and a non-word character. For example, 'er\b' matches the 'er' in 'never' but not the 'er' in 'verb'. |
\B | Matches a non-word boundary. 'er\B' matches the 'er' in 'verb' but not the 'er' in 'never'. |
\n, \t, etc. | Match a newline, a tab, etc. |
\1…\9 | Matches the contents of the nth group. |
\10 | Matches the contents of the 10th group if it has been matched; otherwise, in some engines, it is an octal character code. In Python, \10 always refers to group 10; an octal escape needs a leading 0 (as in \012) or three octal digits. |
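A few of the elements in the table above, shown in action. All input strings are made up for illustration. Note that re? is greedy and re?? is the non-greedy form.

```python
import re

# (?: ...) groups without capturing, so findall returns only the (\d+) group
non_capturing = re.findall(r'(?:ab)+(\d+)', 'abab12 ab3')
print(non_capturing)    # ['12', '3']

# Positive lookahead (?=...): 'Windows' only when followed by a space and a digit
lookahead = re.findall(r'Windows(?=\s\d)', 'Windows 10, Windows XP')
print(lookahead)        # ['Windows']

# Negative lookahead (?!...): whole numbers not followed by a percent sign
negative = re.findall(r'\b\d+\b(?!%)', '50% of 200 items')
print(negative)         # ['200']

# Backreference \1: the same word twice in a row
repeated = re.search(r'\b(\w+) \1\b', 'it is is fine').group()
print(repeated)         # is is

# \b versus \B around 'er'
boundary = re.findall(r'\w*er\b', 'never verb')
non_boundary = re.findall(r'\w*er\B\w*', 'never verb')
print(boundary, non_boundary)   # ['never'] ['verb']

# Greedy versus non-greedy quantifiers
greedy = re.match(r'<.*>', '<a><b>').group()       # <a><b>
lazy = re.match(r'<.*?>', '<a><b>').group()        # <a>
optional_greedy = re.match(r'ab?', 'abc').group()  # ab
optional_lazy = re.match(r'ab??', 'abc').group()   # a
print(greedy, lazy, optional_greedy, optional_lazy)

# {n}: exactly n repetitions
print(re.search(r'o{2}', 'Bob'))            # None
print(re.search(r'o{2}', 'food').group())   # oo
```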
Regular expression examples
Character matching
Example | Description |
---|---|
python | Matches “python” |
Character class
Example | Description |
---|---|
[Pp]ython | Matches “Python” or “python” |
rub[ye] | Matches “ruby” or “rube” |
[aeiou] | Matches any one of the letters in the brackets |
[0-9] | Matches any digit; same as [0123456789] |
[a-z] | Matches any lowercase letter |
[A-Z] | Matches any uppercase letter |
[a-zA-Z0-9] | Matches any letter or digit |
[^aeiou] | Matches any character except the letters a, e, i, o, u |
[^0-9] | Matches any character other than a digit |
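A quick sketch of the character classes above, using re.findall on made-up sample strings.

```python
import re

both_cases = re.findall(r'[Pp]ython', 'Python and python')   # ['Python', 'python']
ruby_words = re.findall(r'rub[ye]', 'ruby rube rubx')         # ['ruby', 'rube']
vowels = re.findall(r'[aeiou]', 'regex')                      # ['e', 'e']
digits = re.findall(r'[0-9]', 'a1b22')                        # ['1', '2', '2']
consonants = re.findall(r'[^aeiou]', 'regex')                 # ['r', 'g', 'x']
alnum_runs = re.findall(r'[a-zA-Z0-9]+', 'id_42 = X9!')       # ['id', '42', 'X9']

for result in (both_cases, ruby_words, vowels, digits, consonants, alnum_runs):
    print(result)
```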
Special character class
Example | Description |
---|---|
. | Matches any single character except '\n'. To match any character including '\n', use a pattern such as '[\s\S]', or use the re.S flag. |
\d | Matches a digit character. Equivalent to [0-9]. |
\D | Matches a non-digit character. Equivalent to [^0-9]. |
\s | Matches any whitespace character, including spaces, tabs, form feeds, and so on. Equivalent to [ \f\n\r\t\v]. |
\S | Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v]. |
\w | Matches any word character, including the underscore. Equivalent to '[A-Za-z0-9_]' for ASCII text. |
\W | Matches any non-word character. Equivalent to '[^A-Za-z0-9_]' for ASCII text. |
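A sketch of the special character classes on made-up text. It also shows why '[.\n]' is not "any character": inside brackets '.' is a literal dot, whereas '[\s\S]' (or re.S) really matches anything.

```python
import re

text = "Order #42:\tship_to Bob\n"  # made-up sample text

digits = re.findall(r'\d', text)      # ['4', '2']
words = re.findall(r'\w+', text)      # ['Order', '42', 'ship_to', 'Bob']
spaces = re.findall(r'\s', text)      # [' ', '\t', ' ', '\n']
non_word = re.findall(r'\W+', text)   # [' #', ':\t', ' ', '\n']
print(digits, words, spaces, non_word, sep="\n")

sample = "a\nb axb a.b"
# '.' does not match the newline by default
dot = re.findall(r'a.b', sample)                # ['axb', 'a.b']
# Inside [ ], '.' is a literal dot, so [.\n] only matches '.' or a newline
literal_class = re.findall(r'a[.\n]b', sample)  # ['a\nb', 'a.b']
# [\s\S] matches any character, including the newline
any_char = re.findall(r'a[\s\S]b', sample)      # ['a\nb', 'axb', 'a.b']
print(dot, literal_class, any_char, sep="\n")
```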