Regular expressions in Python tutorial 18

Time:2022-5-13

re.match function

re.match attempts to match a pattern at the start of the string. If the match does not succeed there, match() returns None.

Function syntax

re.match(pattern, string, flags=0)

Function parameters:

Parameter   Description
pattern     The regular expression to match.
string      The string to match against.
flags       Flags that control how the regular expression is matched, such as case sensitivity or multi-line matching. See: Regular expression modifier – optional flag

If the match succeeds, re.match returns a match object; otherwise it returns None.

We can use the group(num) or groups() methods of the match object to get the matched text.

Match object method   Description
group(num=0)          Returns the string matched by the whole expression (num=0) or by the given group. group() can take several group numbers at once, in which case it returns a tuple containing the values of those groups.
groups()              Returns a tuple containing the strings of all groups, from group 1 up to the last group in the pattern.

Example

#!/usr/bin/python
import re

print(re.match('www', 'www.runoob.com').span())  # Matches at the start
print(re.match('com', 'www.runoob.com'))         # Does not match at the start

The output of the example above is:

(0, 3)
None

Example

#!/usr/bin/python3
import re

line = "Cats are smarter than dogs"
# .* matches any run of characters except the newline (\n)
# (.*?) is the non-greedy form: it captures the shortest substring that lets the match succeed
matchObj = re.match(r'(.*) are (.*?) .*', line, re.M | re.I)

if matchObj:
    print("matchObj.group() : ", matchObj.group())
    print("matchObj.group(1) : ", matchObj.group(1))
    print("matchObj.group(2) : ", matchObj.group(2))
else:
    print("No match!!")

The output of the example above is:

matchObj.group() :  Cats are smarter than dogs
matchObj.group(1) :  Cats
matchObj.group(2) :  smarter
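The table of match object methods mentions that group() accepts several group numbers and that groups() returns every group. As a small sketch reusing the same sentence and pattern (the variable names here are only illustrative), the two calls can be compared side by side:

```python
import re

line = "Cats are smarter than dogs"
m = re.match(r'(.*) are (.*?) .*', line, re.M | re.I)

# Several group numbers in one call give back a tuple.
pair = m.group(1, 2)
# groups() returns all groups, starting from group 1.
everything = m.groups()

print(pair)        # ('Cats', 'smarter')
print(everything)  # ('Cats', 'smarter')
```

With only two groups in the pattern, both calls return the same tuple.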

re.search method

re.search scans the entire string and returns the first successful match.

Function syntax:

re.search(pattern, string, flags=0)

Function parameters:

Parameter   Description
pattern     The regular expression to match.
string      The string to match against.
flags       Flags that control how the regular expression is matched, such as case sensitivity or multi-line matching. See: Regular expression modifier – optional flag

If the match succeeds, re.search returns a match object; otherwise it returns None.

We can use the group(num) or groups() methods of the match object to get the matched text.

Match object method   Description
group(num=0)          Returns the string matched by the whole expression (num=0) or by the given group. group() can take several group numbers at once, in which case it returns a tuple containing the values of those groups.
groups()              Returns a tuple containing the strings of all groups, from group 1 up to the last group in the pattern.

Example

#!/usr/bin/python3
import re

print(re.search('www', 'www.runoob.com').span())  # Matches at the start
print(re.search('com', 'www.runoob.com').span())  # Matches, though not at the start

The output of the example above is:

(0, 3)
(11, 14)

Example

#!/usr/bin/python3
import re

line = "Cats are smarter than dogs"

searchObj = re.search(r'(.*) are (.*?) .*', line, re.M | re.I)

if searchObj:
    print("searchObj.group() : ", searchObj.group())
    print("searchObj.group(1) : ", searchObj.group(1))
    print("searchObj.group(2) : ", searchObj.group(2))
else:
    print("Nothing found!!")

The output of the example above is:

searchObj.group() :  Cats are smarter than dogs
searchObj.group(1) :  Cats
searchObj.group(2) :  smarter

The difference between re.match and re.search

re.match matches only at the beginning of the string: if the string does not match the pattern from its first character, the match fails and the function returns None. re.search scans the whole string until it finds a match.

Example

#!/usr/bin/python3
import re

line = "Cats are smarter than dogs"

matchObj = re.match(r'dogs', line, re.M | re.I)
if matchObj:
    print("match --> matchObj.group() : ", matchObj.group())
else:
    print("No match!!")

matchObj = re.search(r'dogs', line, re.M | re.I)
if matchObj:
    print("search --> matchObj.group() : ", matchObj.group())
else:
    print("No match!!")

The output of the example above is:

No match!!
search --> matchObj.group() :  dogs
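The examples here pass re.M to re.match, which can suggest that multi-line mode lets match succeed at the start of any line. It does not: match stays anchored at the very start of the string. The sketch below uses a made-up two-line string to show the contrast with re.search and ^:

```python
import re

text = "first line\ndogs on the second line"

# re.match is anchored at the very start of the string, even with re.M.
at_start = re.match(r'dogs', text, re.M)
# re.search with ^ and re.M can match at the start of any line.
line_start = re.search(r'^dogs', text, re.M)

print(at_start)           # None
print(line_start.span())  # (11, 15)
```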

Search and replace

Python's re module provides re.sub for replacing matches in a string.

Syntax:

re.sub(pattern, repl, string, count=0, flags=0)

Parameters:

  • pattern: the regular expression pattern string.
  • repl: the replacement string; it can also be a function.
  • string: the original string to search and replace in.
  • count: the maximum number of matches to replace. The default, 0, replaces all matches.
  • flags: the matching mode flags, given as an integer.

The first three are required parameters and the last two are optional parameters.

Example

#!/usr/bin/python3
import re

phone = "2004-959-559 # this is a telephone number"

# Delete the trailing comment
num = re.sub(r'#.*$', "", phone)
print("Telephone number:", num)

# Remove everything that is not a digit
num = re.sub(r'\D', "", phone)
print("Telephone number:", num)

The output of the example above is:

Telephone number: 2004-959-559 
Telephone number: 2004959559
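The count parameter is not exercised by the example above. A minimal sketch of its effect, using a shortened version of the same phone number (the variable names are illustrative):

```python
import re

phone = "2004-959-559"

# count=1: only the first hyphen is replaced.
first_only = re.sub(r'-', ' ', phone, count=1)
# count=0 (the default): every hyphen is replaced.
all_of_them = re.sub(r'-', ' ', phone)

print(first_only)   # 2004 959-559
print(all_of_them)  # 2004 959 559
```

Passing count by keyword, as done here, keeps it from being confused with flags.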

The repl parameter is a function

The following example multiplies the matching number in the string by 2:

Example

#!/usr/bin/python
import re

# Multiply the matched number by 2
def double(matched):
    value = int(matched.group('value'))
    return str(value * 2)

s = 'A23G4HFD567'
print(re.sub(r'(?P<value>\d+)', double, s))

The output is:

A46G8HFD1134
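For short transformations the replacement function can also be written inline. The following is a hypothetical variation on the same input string that zero-pads every number to four digits; it is only a sketch of the idea, with a lambda standing in for a named function:

```python
import re

s = 'A23G4HFD567'

# The lambda receives each match object and returns the replacement text.
padded = re.sub(r'\d+', lambda m: m.group().zfill(4), s)

print(padded)  # A0023G0004HFD0567
```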

re.compile function

The compile function compiles a regular expression and produces a regular expression (pattern) object, whose match() and search() methods can then be used.

The syntax format is:

re.compile(pattern[, flags])

Parameters:

  • pattern: a regular expression in the form of a string
  • flags: optional; selects the matching mode, such as ignoring case or multi-line mode. The available values are:
    • re.I: ignore case
    • re.L: make the special character classes \w, \W, \b, \B, \s, \S depend on the current locale
    • re.M: multi-line mode
    • re.S: make '.' match any character, including the newline (by default '.' does not match the newline)
    • re.U: make the special character classes \w, \W, \b, \B, \d, \D, \s, \S depend on the Unicode character property database
    • re.X: verbose mode; for readability, whitespace in the pattern and comments after '#' are ignored
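To make three of these flags concrete, here is a small sketch on a made-up two-line string. It passes the flags directly to the module-level functions; the same values can be given to re.compile.

```python
import re

text = "Hello\nworld"

# re.I: case-insensitive matching.
ignore_case = re.match(r'hello', text, re.I).group()
# re.M: ^ also matches right after each newline.
line_starts = re.findall(r'^\w+', text, re.M)
# Without re.S, '.' stops at the newline; with re.S it crosses it.
dot_default = re.match(r'.+', text).group()
dot_all = re.match(r'.+', text, re.S).group()

print(ignore_case)        # Hello
print(line_starts)        # ['Hello', 'world']
print(repr(dot_default))  # 'Hello'
print(repr(dot_all))      # 'Hello\nworld'
```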

Example

>>> import re
>>> pattern = re.compile(r'\d+')                      # Matches one or more digits
>>> m = pattern.match('one12twothree34four')          # Tries at the start: no match
>>> print(m)
None
>>> m = pattern.match('one12twothree34four', 2, 10)   # Starts at 'e': no match
>>> print(m)
None
>>> m = pattern.match('one12twothree34four', 3, 10)   # Starts at '1': matches
>>> print(m)                                          # Returns a match object
<_sre.SRE_Match object at 0x10a42aac0>
>>> m.group(0)   # 0 can be omitted
'12'
>>> m.start(0)   # 0 can be omitted
3
>>> m.end(0)     # 0 can be omitted
5
>>> m.span(0)    # 0 can be omitted
(3, 5)

Above, a successful match returns a match object, where:

  • group([group1, …]) returns the string matched by one or more groups. To get the whole matched substring, call group() or group(0).
  • start([group]) returns the starting position of the substring matched by the group within the whole string (the index of the first character of the substring). The argument defaults to 0.
  • end([group]) returns the end position of the substring matched by the group within the whole string (the index of the last character of the substring + 1). The argument defaults to 0.
  • span([group]) returns (start(group), end(group)).

Another example:

Example

>>> import re
>>> pattern = re.compile(r'([a-z]+) ([a-z]+)', re.I)   # re.I means ignore case
>>> m = pattern.match('Hello World Wide Web')
>>> print(m)       # On success, a match object is returned
<_sre.SRE_Match object at 0x10bea83e8>
>>> m.group(0)     # The entire matched substring
'Hello World'
>>> m.span(0)      # The index range of the entire matched substring
(0, 11)
>>> m.group(1)     # The substring matched by the first group
'Hello'
>>> m.span(1)      # The index range of the first group's substring
(0, 5)
>>> m.group(2)     # The substring matched by the second group
'World'
>>> m.span(2)      # The index range of the second group's substring
(6, 11)
>>> m.groups()     # Equivalent to (m.group(1), m.group(2), ...)
('Hello', 'World')
>>> m.group(3)     # There is no third group
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: no such group

findall

Finds all substrings in the string that the regular expression matches and returns them as a list. If the pattern contains more than one group, it returns a list of tuples. If nothing matches, it returns an empty list.

Note: match and search find a single match; findall finds all of them.

The syntax format is:

re.findall(pattern, string, flags=0)
or
pattern.findall(string[, pos[, endpos]])

Parameters:

  • pattern: the pattern to match.
  • string: the string to search.
  • pos: optional; the position in the string where the search starts. The default is 0.
  • endpos: optional; the position in the string where the search ends. The default is the length of the string.

Find all numbers in a string:

Example

import re

result1 = re.findall(r'\d+', 'runoob 123 google 456')

pattern = re.compile(r'\d+')   # Finds numbers
result2 = pattern.findall('runoob 123 google 456')
result3 = pattern.findall('run88oob123google456', 0, 10)

print(result1)
print(result2)
print(result3)

Output:

['123', '456']
['123', '456']
['88', '12']

When the pattern contains several groups, findall returns a list of tuples:

Example

import re

result = re.findall(r'(\w+)=(\d+)', 'set width=20 and height=10')
print(result)

Output:

[('width', '20'), ('height', '10')]
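The shape of findall's result depends on how many groups the pattern has: with no group it returns the whole matches, with one group only that group's text, and with several groups a tuple per match. A short sketch on the same input string (variable names are illustrative):

```python
import re

text = 'set width=20 and height=10'

no_group = re.findall(r'\w+=\d+', text)        # whole matches
one_group = re.findall(r'\w+=(\d+)', text)     # only the single group
two_groups = re.findall(r'(\w+)=(\d+)', text)  # one tuple per match

print(no_group)    # ['width=20', 'height=10']
print(one_group)   # ['20', '10']
print(two_groups)  # [('width', '20'), ('height', '10')]
```

A list of two-element tuples like the last one can be passed straight to dict() when a key/value mapping is wanted.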

re.finditer

Like findall, finditer finds all substrings in the string that the regular expression matches, but returns them as an iterator of match objects.

re.finditer(pattern, string, flags=0)

Parameters:

Parameter   Description
pattern     The regular expression to match.
string      The string to match against.
flags       Flags that control how the regular expression is matched, such as case sensitivity or multi-line matching. See: Regular expression modifier – optional flag

Example

import re

it = re.finditer(r'\d+', '12a32bc43jf3')
for match in it:
    print(match.group())

Output:

12 
32 
43 
3
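Because finditer yields match objects rather than plain strings, each result also carries its position. A minimal sketch on the same input, collecting the text and span of every match (the list name is illustrative):

```python
import re

found = []
for match in re.finditer(r'\d+', '12a32bc43jf3'):
    # Each item is a match object, so group() and span() are both available.
    found.append((match.group(), match.span()))

print(found)
# [('12', (0, 2)), ('32', (3, 5)), ('43', (7, 9)), ('3', (11, 12))]
```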

re.split

The split method splits the string at each substring the pattern matches and returns the pieces as a list. It is called as follows:

re.split(pattern, string[, maxsplit=0, flags=0])

Parameters:

Parameter   Description
pattern     The regular expression to match.
string      The string to split.
maxsplit    Maximum number of splits: maxsplit=1 splits once. The default is 0, which means no limit.
flags       Flags that control how the regular expression is matched, such as case sensitivity or multi-line matching. See: Regular expression modifier – optional flag

Example

>>> import re
>>> re.split(r'\W+', 'runoob, runoob, runoob.')
['runoob', 'runoob', 'runoob', '']
>>> re.split(r'(\W+)', ' runoob, runoob, runoob.')
['', ' ', 'runoob', ', ', 'runoob', ', ', 'runoob', '.', '']
>>> re.split(r'\W+', ' runoob, runoob, runoob.', 1)
['', 'runoob, runoob, runoob.']
>>> re.split('a+', 'hello world')   # No match, so the string is not split ('a*' would also match the empty string and, since Python 3.7, split between every character)
['hello world']
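A common use of re.split is breaking a string on several different delimiters at once, which str.split cannot do in a single call. The sketch below uses a made-up record and a character class; the names and the data are illustrative only.

```python
import re

record = "runoob, google;taobao  baidu"

# Split on any run of commas, semicolons, or whitespace.
parts = re.split(r'[,;\s]+', record)
# maxsplit=2 stops after two splits and leaves the rest untouched.
limited = re.split(r'[,;\s]+', record, maxsplit=2)

print(parts)    # ['runoob', 'google', 'taobao', 'baidu']
print(limited)  # ['runoob', 'google', 'taobao  baidu']
```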

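Regular expression object

re.RegexObject

re.compile() returns a RegexObject (named re.Pattern in current Python 3 releases).

re.MatchObject

A MatchObject (named re.Match in current Python 3 releases) provides the following methods:

  • group() returns the string matched by the regular expression
  • start() returns the position where the match starts
  • end() returns the position where the match ends
  • span() returns a tuple containing the positions of the match (start, end)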
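The usual reason to hold on to a compiled pattern object is to reuse it across many strings. The following sketch, with a hypothetical pattern and made-up inputs, compiles once and then reads the match object's methods listed above:

```python
import re

# Compile once, reuse for every input string.
number_range = re.compile(r'(\d+)-(\d+)')

for text in ("range 10-25 units", "no numbers here"):
    m = number_range.search(text)
    if m:
        print(m.group(), m.start(), m.end(), m.span())
    else:
        print("no match in", repr(text))
# 10-25 6 11 (6, 11)
# no match in 'no numbers here'
```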

Regular expression modifier – optional flag

Regular expressions can take optional flag modifiers that control how matching works. Several flags can be combined with bitwise OR (|). For example, re.I | re.M sets both the I and M flags:

Modifier   Description
re.I       Makes matching case-insensitive
re.L       Does locale-aware matching
re.M       Multi-line matching, affecting ^ and $
re.S       Makes . match all characters, including line breaks
re.U       Interprets characters according to the Unicode character set. This flag affects \w, \W, \b, \B
re.X       Allows a more flexible layout (whitespace and comments) so that regular expressions are easier to read.
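Two of these points are easier to see in code: combining flags with |, and the layout freedom that re.X gives. The sketch below is illustrative only; the date pattern and the sample strings are assumptions made for the example.

```python
import re

# re.I | re.M: case-insensitive, and ^/$ match at every line boundary.
lines = "Cats\ndogs\nCATS"
cats = re.findall(r'^cats$', lines, re.I | re.M)
print(cats)  # ['Cats', 'CATS']

# re.X: whitespace in the pattern is ignored and '#' starts a comment.
date = re.compile(r"""
    (\d{4})    # year
    -
    (\d{1,2})  # month
    -
    (\d{1,2})  # day
""", re.X)
parts = date.search("Time:2022-5-13").groups()
print(parts)  # ('2022', '5', '13')
```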

Regular expression pattern

Pattern strings use special syntax to represent a regular expression.

Letters and numbers represent themselves. Letters and numbers in a regular expression pattern match the same string.

Most letters and numbers have different meanings when preceded by a backslash.

Punctuation marks match themselves only when they are escaped, otherwise they represent a special meaning.

The backslash itself requires a backslash escape.

Since regular expressions usually contain backslashes, it is best to write them as raw strings. Pattern elements (e.g. r'\t', equivalent to '\\t') match the corresponding special characters.
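The effect of the r prefix can be checked directly in Python. This sketch compares the raw and escaped spellings and then matches a literal backslash and a tab; the sample strings are made up for illustration.

```python
import re

# '\t' is one tab character; r'\t' and '\\t' are both backslash + 't'.
lengths = (len('\t'), len(r'\t'), len('\\t'))
print(lengths)          # (1, 2, 2)
print(r'\t' == '\\t')   # True

# To match one literal backslash the regex needs \\,
# written r'\\' as a raw string or '\\\\' as a normal string.
path = r'C:\temp'
raw_hits = re.findall(r'\\', path)
plain_hits = re.findall('\\\\', path)
print(raw_hits == plain_hits, len(raw_hits))  # True 1

# The regex \t (from r'\t') matches a real tab character.
tab_span = re.search(r'\t', 'a\tb').span()
print(tab_span)         # (1, 2)
```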

The following table lists the special elements in the regular expression pattern syntax. If you provide optional flag parameters while using patterns, the meaning of some pattern elements will change.

Pattern        Description
^              Matches the beginning of the string.
$              Matches the end of the string.
.              Matches any character except the newline. When the re.DOTALL flag is specified, it matches any character including the newline.
[…]            Represents a set of characters, listed individually: [amk] matches 'a', 'm' or 'k'.
[^…]           Characters not in the set: [^abc] matches any character other than a, b, c.
re*            Matches 0 or more occurrences of the preceding expression.
re+            Matches 1 or more occurrences of the preceding expression.
re?            Matches 0 or 1 occurrence of the preceding expression. Placed after another quantifier (*?, +?, ??), it makes that quantifier non-greedy.
re{n}          Matches exactly n occurrences of the preceding expression. For example, "o{2}" cannot match the "o" in "Bob", but matches the two o's in "food".
re{n,}         Matches n or more occurrences of the preceding expression. For example, "o{2,}" cannot match the "o" in "Bob", but matches all the o's in "fooood". "o{1,}" is equivalent to "o+". "o{0,}" is equivalent to "o*".
re{n,m}        Matches the preceding expression n to m times, greedily.
a|b            Matches a or b.
(re)           Matches the expression in parentheses and also marks a group.
(?imx)         Turns on the i, m, or x optional flags inside the regular expression. In Python's re module these inline flags apply to the whole pattern and belong at its start.
(?-imx)        Turns off the i, m, or x optional flags. Python's re module accepts flag removal only in the scoped form (?-imx: re).
(?: re)        Like (…), but does not create a group.
(?imx: re)     Uses the i, m, or x optional flags inside the parentheses.
(?-imx: re)    Does not use the i, m, or x optional flags inside the parentheses.
(?#…)          Comment.
(?= re)        Positive lookahead assertion. Succeeds if the contained expression matches at the current position, and fails otherwise. It consumes nothing: once the contained expression has been tried, the matching engine does not advance, and the rest of the pattern is tried right where the assertion started.
(?! re)        Negative lookahead assertion. The opposite of the positive assertion: succeeds when the contained expression does not match at the current position of the string.
(?> re)        Matches an independent (atomic) pattern, eliminating backtracking. Python's re module supports this from version 3.11.
\w             Matches letters, digits, and the underscore.
\W             Matches anything other than letters, digits, and the underscore.
\s             Matches any whitespace character, equivalent to [ \t\n\r\f\v].
\S             Matches any non-whitespace character.
\d             Matches any digit, equivalent to [0-9].
\D             Matches any non-digit.
\A             Matches at the start of the string.
\Z             Matches at the end of the string. In Python's re module it matches only at the very end; in some other regex flavors it also matches before a final newline.
\z             Matches at the end of the string. Older versions of Python's re module do not recognize \z; use \Z there.
\G             Matches the position where the last match finished. Python's re module does not support it.
\b             Matches a word boundary, that is, the position between a word character and a non-word character. For example, 'er\b' matches the 'er' in 'never', but not the 'er' in 'verb'.
\B             Matches a non-word boundary. 'er\B' matches the 'er' in 'verb', but not the 'er' in 'never'.
\n, \t, etc.   Match a newline, a tab, and so on.
\1…\9          Matches the contents of the nth group.
\10            Matches the contents of the 10th group if it exists. Otherwise it refers to an octal character code (in Python's re module, only a number that starts with 0 or has three octal digits is read as octal).
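A few rows of this table are easier to grasp by running them. The sketch below, using made-up sample strings, contrasts greedy and non-greedy quantifiers, shows positive and negative lookahead, and uses the backreference \1 to find a repeated word.

```python
import re

html = "<b>bold</b> and <i>italic</i>"
greedy = re.findall(r'<.*>', html)   # .* takes as much as it can
lazy = re.findall(r'<.*?>', html)    # .*? takes as little as it can
print(greedy)  # ['<b>bold</b> and <i>italic</i>']
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']

# (?=...) checks what follows without consuming it.
before_com = re.findall(r'\w+(?=\.com)', 'www.runoob.com')
print(before_com)  # ['runoob']

# (?!...) succeeds only when what follows does NOT match.
not_bar = re.search(r'foo(?!bar)', 'foobar foobaz').span()
print(not_bar)     # (7, 10)

# \1 matches the same text that group 1 matched.
doubled = re.search(r'\b(\w+) \1\b', 'it was the the cat').group()
print(doubled)     # the the
```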

Regular expression instance

Character matching

Example       Description
python        Matches "python"

Character class

Example       Description
[Pp]ython     Matches "Python" or "python"
rub[ye]       Matches "ruby" or "rube"
[aeiou]       Matches any one letter in the brackets
[0-9]         Matches any digit. Same as [0123456789]
[a-z]         Matches any lowercase letter
[A-Z]         Matches any uppercase letter
[a-zA-Z0-9]   Matches any letter or digit
[^aeiou]      Matches any character other than the letters a, e, i, o, u
[^0-9]        Matches any character other than a digit

Special character class

Example   Description
.         Matches any single character except '\n'. To match any character including '\n', use the re.S flag or a pattern like '[\s\S]'.
\d        Matches a digit. Equivalent to [0-9].
\D        Matches a non-digit. Equivalent to [^0-9].
\s        Matches any whitespace character, including spaces, tabs, form feeds, and so on. Equivalent to [ \f\n\r\t\v].
\S        Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v].
\w        Matches any word character, including the underscore. Equivalent to '[A-Za-z0-9_]'.
\W        Matches any non-word character. Equivalent to '[^A-Za-z0-9_]'.
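The behaviour of '.' around newlines, and of \w and \W around the underscore, can be confirmed with a short sketch. The sample strings are made up for illustration. Note that inside square brackets a dot is a literal dot, so a class such as '[.\n]' matches only '.' and the newline.

```python
import re

text = "a\nb"
dot_plain = re.findall(r'.', text)        # newline is skipped
dot_flag = re.findall(r'.', text, re.S)   # re.S lets '.' match it
any_char = re.findall(r'[\s\S]', text)    # class that matches everything
print(dot_plain)  # ['a', 'b']
print(dot_flag)   # ['a', '\n', 'b']
print(any_char)   # ['a', '\n', 'b']

# Inside [...] the dot is literal.
literal_dot = re.findall(r'[.\n]', 'a.\nb')
print(literal_dot)  # ['.', '\n']

# \w includes the underscore; \W is everything else.
words = re.findall(r'\w+', 'snake_case x9!')
non_words = re.findall(r'\W', 'snake_case x9!')
print(words)      # ['snake_case', 'x9']
print(non_words)  # [' ', '!']
```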