1. Understanding modules
In the common case, a module is a file containing Python definitions and statements. The file name is the module name plus the suffix .py.
However, the modules loaded by import are divided into four general categories:
1. Code written in Python (.py files)
2. C or C++ extensions that have been compiled as shared libraries or DLLs
3. Packages, which group a set of modules
4. Built-in modules written in C and linked into the Python interpreter
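As a rough illustration of these categories, the standard library lets you inspect where a module comes from. The modules used here (sys and json) are only convenient examples, and results for other modules can differ between Python builds.

```python
import sys
import json
import json.decoder  # submodules of a package are imported with dotted names

# Built-in modules are compiled into the interpreter itself.
is_builtin = 'sys' in sys.builtin_module_names
print(is_builtin)

# A package is a module that groups other modules; it has a __path__ attribute.
is_package = hasattr(json, '__path__')
print(is_package)

# The submodule's name reflects the package it belongs to.
submodule_name = json.decoder.__name__
print(submodule_name)
```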
Why use modules?
If you exit the Python interpreter and start it again, the functions and variables you defined earlier are lost. For this reason, we usually write a program to a file so it is saved permanently, and run it with python test.py when needed. In this case, test.py is called a script.
As a program grows, it gains more and more functions. To keep it manageable, we usually split the program across several files, which makes its structure clearer. These files can then be run as scripts, and they can also be imported into other files as modules so that their functions are reused.
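The dual role of a file as script and module can be sketched as follows. The file name calc_demo.py and the function add are hypothetical, and writing the file to a temporary directory is only a device to keep the example self-contained.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical module source. The __name__ check lets one file serve as
# both a script and an importable module.
source = '''
def add(a, b):
    return a + b

if __name__ == '__main__':
    # Runs only when the file is executed as a script.
    print(add(1, 2))
'''

module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, 'calc_demo.py'), 'w') as f:
    f.write(source)

# Make the directory importable, then import the file as a module.
sys.path.insert(0, module_dir)
importlib.invalidate_caches()
calc_demo = importlib.import_module('calc_demo')

# The function defined in the file is reused here; the script-only
# branch did not run during the import.
result = calc_demo.add(2, 3)
print(result)  # 5
```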
2. The re module
2.1 Importing the re module
At the beginning of the .py file, write import re.
2.2 Common methods
findall: matches every occurrence; each match is an element of the returned list.
ret = re.findall(r'\d+', 'sjkhk172 at actual cost 928')  # arguments: regular expression, string to match, flags
print(ret)
ret = re.findall(r'\d', 'sjkhk172 at actual cost 928')
print(ret)
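For reference, here is a small sketch of what findall returns for these patterns. The last call, with re.IGNORECASE and its sample string, is an added illustration of the flags argument and is not part of the original examples.

```python
import re

text = 'sjkhk172 at actual cost 928'

# \d+ matches runs of digits, so each number is one list element.
all_numbers = re.findall(r'\d+', text)
print(all_numbers)    # ['172', '928']

# \d matches one digit at a time.
single_digits = re.findall(r'\d', text)
print(single_digits)  # ['1', '7', '2', '9', '2', '8']

# The optional flags argument changes matching behavior,
# here making [a-z] case-insensitive.
words = re.findall(r'[a-z]+', 'Alex TAIBAI egon', flags=re.IGNORECASE)
print(words)          # ['Alex', 'TAIBAI', 'egon']
```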
search: scans from left to right and matches only the first occurrence. It does not return the matched text directly but a match object; call the object's group method to get the text.
If nothing matches, search returns None, and calling group on it raises an error.
ret = re.search(r'\d+', 'sjkhk172 according to the actual cost 928')
print(ret)  # a match object, the result of a regular expression match
print(ret.group())  # get the actual result through ret.group()
ret = re.search(r'\d', 'owghabDJLBNdgv')
print(ret)  # None, because the string contains no digit
print(ret.group())  # raises AttributeError, because ret is None
ret = re.search(r'\d+', 'sjkhk172 according to the actual cost 928')
if ret:  # a match object, the result of a regular expression match
    print(ret.group())  # get the actual result through ret.group()
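One way to package the None check is a small helper. The function name first_number is hypothetical and only illustrates the pattern.

```python
import re

def first_number(text):
    """Return the first run of digits in text, or None if there is none."""
    match = re.search(r'\d+', text)
    # Only call group() when search actually found something.
    return match.group() if match else None

found = first_number('sjkhk172 according to the actual cost 928')
missing = first_number('owghabDJLBNdgv')
print(found)    # 172
print(missing)  # None
```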
match: matches only at the start of the string, which is equivalent to prefixing the regular expression passed to search with ^.
ret = re.match(r'\d+', '172sjkhk according to the actual cost 928')
print(ret)
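The equivalence can be checked with a short comparison. The two sample strings are chosen for illustration. Note that the equivalence holds without re.MULTILINE; with that flag, ^ also matches at the start of each line, while match still anchors only at the start of the string.

```python
import re

starts_with_digits = '172sjkhk'
digits_in_middle = 'sjkhk172'

# match succeeds only when the pattern matches at position 0.
at_start = re.match(r'\d+', starts_with_digits)
print(at_start.group())                           # 172
print(re.match(r'\d+', digits_in_middle))         # None

# search with a leading ^ behaves the same way.
print(re.search(r'^\d+', digits_in_middle))       # None

# Plain search scans the whole string.
anywhere = re.search(r'\d+', digits_in_middle)
print(anywhere.group())                           # 172
```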
2.2.4 Extensions of string processing
s = 'alex|taibai|egon|'
print(s.split('|'))
s = 'alex83taibai40egon25'
ret = re.split(r'\d+', s)
print(ret)
ret = re.sub(r'\d+', 'H', 'alex83taibai40egon25')
print(ret)
ret = re.sub(r'\d+', 'H', 'alex83taibai40egon25', count=1)  # arguments: pattern, replacement, string to process, number of replacements
print(ret)
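The following sketch shows the lists these split calls produce. The last call, which wraps the pattern in a capturing group, is an added illustration: with a group, re.split keeps the matched separators in the result.

```python
import re

# str.split cuts on a fixed separator; a trailing separator
# leaves an empty string at the end.
plain_parts = 'alex|taibai|egon|'.split('|')
print(plain_parts)  # ['alex', 'taibai', 'egon', '']

s = 'alex83taibai40egon25'

# re.split cuts on whatever the pattern matches and drops the matches.
regex_parts = re.split(r'\d+', s)
print(regex_parts)  # ['alex', 'taibai', 'egon', '']

# With a capturing group, the matched text is kept in the list.
parts_with_numbers = re.split(r'(\d+)', s)
print(parts_with_numbers)  # ['alex', '83', 'taibai', '40', 'egon', '25', '']
```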
subn: returns a tuple; the first element is the new string and the second is the number of substitutions made.
ret = re.subn(r'\d+', 'H', 'alex83taibai40egon25')
print(ret)
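As a quick check of the replacement functions, here are the values sub and subn return for the sample string. Passing count as a keyword argument is a stylistic choice made in this sketch.

```python
import re

s = 'alex83taibai40egon25'

# Replace every run of digits.
replaced_all = re.sub(r'\d+', 'H', s)
print(replaced_all)    # alexHtaibaiHegonH

# Replace only the first run of digits.
replaced_first = re.sub(r'\d+', 'H', s, count=1)
print(replaced_first)  # alexHtaibai40egon25

# subn also reports how many replacements were made.
new_string, n_replacements = re.subn(r'\d+', 'H', s)
print(new_string, n_replacements)  # alexHtaibaiHegonH 3
```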
2.3 Advanced use of the re module: time and space
compile saves time when you solve problems with regular expressions.
It compiles a regular expression into a pattern object (internally, a bytecode-like program for the matching engine).
When the pattern is used many times, it is not compiled again each time.
ret = re.compile(r'\d+')
print(ret)
res = ret.findall('alex83taibai40egon25')
print(res)
res = ret.search('sjkhk172 at 928')
print(res.group())
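A typical use is to compile once and reuse the pattern object inside a loop. The list of sample lines is made up for illustration. The re module also caches recently used patterns internally, so the saving is most noticeable when the same pattern is applied many times.

```python
import re

# Compile once, outside the loop.
number_pattern = re.compile(r'\d+')

lines = ['alex83', 'taibai40', 'egon25', 'no digits here']

# The same pattern object is reused for every line.
found = [number_pattern.findall(line) for line in lines]
print(found)  # [['83'], ['40'], ['25'], []]
```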
finditer saves space (memory) when you solve problems with regular expressions.
ret = re.finditer(r'\d+', 'alex83taibai40egon25')
for i in ret:
    print(i.group())
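The memory saving comes from the iterator producing match objects one at a time instead of building the whole list up front. A minimal sketch, using next to pull a single result:

```python
import re

matches = re.finditer(r'\d+', 'alex83taibai40egon25')

# Nothing is materialized as a list; each match is produced on demand.
first = next(matches)
print(first.group())  # 83
print(first.span())   # (4, 6)

# The iterator continues from where it stopped.
remaining = [m.group() for m in matches]
print(remaining)      # ['40', '25']
```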
2.4 Method summary
findall: returns a list of all matches.
search: returns a match object when there is a match, and group gives the first matched value; returns None when nothing matches, in which case calling group raises an error.
match: equivalent to search with a ^ added to the front of the regular expression.
split: returns a list, cutting the string according to the regular expression; by default the matched content is discarded.
sub / subn: replace the content found by the regular expression. subn returns a tuple whose second value is the number of replacements.
compile: compiles a regular expression. Using the result for search, match, findall, and finditer saves time.
finditer: returns an iterator that holds all the results. Use a loop plus group to read the values; this saves memory.
2.5 Use of groups
s = '<a>wahaha</a>'  # markup language: an HTML page
ret = re.search(r'<(\w+)>(\w+)</(\w+)>', s)
print(ret.group())  # the whole match
print(ret.group(1))  # a numeric argument returns the content of the corresponding group
print(ret.group(2))
print(ret.group(3))
findall has a special rule so that you can get at group content: when the pattern contains a group, findall returns the content of the group in preference to the whole match.
s = '<a>wahaha</a>'
ret = re.findall(r'(\w+)', s)
print(ret)
ret = re.findall(r'>(\w+)<', s)
print(ret)
To cancel group priority, use a non-capturing group: (?:regular expression)
ret = re.findall(r'\d+(\.\d+)?', '1.234*4')
print(ret)
ret = re.findall(r'\d+(?:\.\d+)?', '1.234*4')
print(ret)
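The difference between the two patterns is easiest to see in the returned lists. The sketch below reuses the same input and adds the expected results as comments.

```python
import re

expression = '1.234*4'

# Capturing group: findall returns only what the group matched.
# For '4' the optional group does not take part, so findall yields ''.
with_group = re.findall(r'\d+(\.\d+)?', expression)
print(with_group)     # ['.234', '']

# Non-capturing group: findall returns the whole match.
without_group = re.findall(r'\d+(?:\.\d+)?', expression)
print(without_group)  # ['1.234', '4']
```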
In regular expressions, we sometimes need a group in order to constrain how many times a whole set of characters occurs.
In Python, groups also help you pick out the content you really need more precisely.
Group priority is a special convention between Python and regular expressions.
Named groups: (?P<name>regular expression)
s = '<a>wahaha</a>'
ret = re.search(r'>(?P<con>\w+)<', s)
print(ret.group(1))
print(ret.group('con'))
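Named groups can be read by number, by name, or all at once. The sample string and the group names name and age below are made up for this sketch.

```python
import re

# Two named groups: letters followed by digits.
ret = re.search(r'(?P<name>[a-z]+)(?P<age>\d+)', 'alex83')

# A named group is still numbered, so both lookups give the same text.
by_number = ret.group(1)
by_name = ret.group('name')
print(by_number, by_name)  # alex alex

# groupdict collects every named group into a dictionary.
all_groups = ret.groupdict()
print(all_groups)          # {'name': 'alex', 'age': '83'}
```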
s = '<a>wahaha</a>'
pattern = r'<(\w+)>(\w+)</(\w+)>'
ret = re.search(pattern, s)
print(ret.group(1) == ret.group(3))
Referring back to an earlier group with (?P=name) requires this position to match exactly the same content as the group with that name.
pattern = r'<(?P<tab>\w+)>(\w+)</(?P=tab)>'
ret = re.search(pattern, s)
print(ret)
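A backreference makes the pattern reject mismatched tags. In this sketch the two HTML-like sample strings are assumptions chosen for illustration; the numbered form \1 is the unnamed equivalent of (?P=tab).

```python
import re

# The closing tag must repeat whatever the group named tab matched.
tag_pattern = re.compile(r'<(?P<tab>\w+)>(\w+)</(?P=tab)>')

balanced = tag_pattern.search('<a>wahaha</a>')
print(balanced.group('tab'))  # a
print(balanced.group(2))      # wahaha

# Opening and closing tags differ, so there is no match.
unbalanced = tag_pattern.search('<a>wahaha</b>')
print(unbalanced)             # None

# The same constraint written with a numbered backreference.
numbered = re.search(r'<(\w+)>(\w+)</\1>', '<a>wahaha</a>')
print(numbered.group(1))      # a
```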