Common symbols and methods of Python crawler regular expressions


Regular expressions are not unique to Python. They are a powerful tool for string processing, with their own syntax and an independent matching engine; Python provides them through the re module. They may be less efficient than the built-in methods of str, but they are far more expressive. Because the engine is independent, the core syntax is largely the same in every language that provides regular expressions. The main difference is how much of the syntax each language supports, and the unsupported parts are usually the rarely used ones.

1. Common symbols

.: matches any character except the newline \n

*: matches the previous character 0 or more times

?: matches the previous character 0 or 1 times

.*: greedy match, matches as many characters as possible

.*?: non-greedy match, matches as few characters as possible

(): the data inside the parentheses is returned as the result

2. Common methods

findall: matches everything that conforms to the pattern and returns a list containing the results

search: matches and extracts the first piece of content that conforms to the pattern, and returns a match object (or None if nothing matches)

sub: replaces the content that conforms to the pattern and returns the resulting string
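
As a rough illustration of what the three methods hand back, the sketch below calls each of them on the same string. The sample string text and the pattern are made up for this illustration and do not come from the examples in this article.

```python
import re

# Hypothetical sample data, chosen only for illustration.
text = 'axxonexxbxxtwoxxc'
pattern = 'xx(.*?)xx'

found = re.findall(pattern, text)     # list of the captured strings
first = re.search(pattern, text)      # match object, or None if nothing matches
replaced = re.sub(pattern, '-', text) # new string with every match replaced

print(found)           # ['one', 'two']
print(first.group(1))  # one
print(replaced)        # a-b-c
```

Note that findall and sub work on every match, while search stops at the first one.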

3. Use example

(1) Example of using . to match any character except the newline \n

import re  # import the re library

a = 'xy123'

b = re.findall('x..', a)

print(b)

The printed result is: ['xy1']. Each . stands for one character.

(2) * matches the previous character 0 or more times

a = 'xyxy123'

b = re.findall('x*', a)

print(b)

The printed result is: ['x', '', 'x', '', '', '', '', '']. The empty strings come from the positions where x occurs 0 times.

(3) ? matches the previous character 0 times or 1 time

a = 'xy123'

b = re.findall('x?', a)

print(b)

The printed result is: ['x', '', '', '', '', '']
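
The empty strings in the two results above can be surprising. The sketch below is one way to see where they come from: it uses re.finditer (a method not covered in this article) to show the position of every match of x* in the string from example (2), then filters the empty matches out.

```python
import re

a = 'xyxy123'

# finditer yields match objects, so the start/end position of each match is visible.
spans = [(m.group(), m.span()) for m in re.finditer('x*', a)]
print(spans)

# Dropping the empty strings keeps only the real runs of 'x'.
non_empty = [m for m in re.findall('x*', a) if m]
print(non_empty)  # ['x', 'x']
```

Every position where no x is present, including the end of the string, produces a zero-length match, which findall reports as ''.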

(4) Example of using .*

secret_code = 'hadkfalifexxIxxfasdjifja134xxlovexx23345sdfxxyouxx8dfse'

b = re.findall('xx.*xx', secret_code)

print(b)

The printed result is: ['xxIxxfasdjifja134xxlovexx23345sdfxxyouxx']

(5) Example of using .*?

secret_code = 'hadkfalifexxIxxfasdjifja134xxlovexx23345sdfxxyouxx8dfse'

c = re.findall('xx.*?xx', secret_code)

print(c)

The printed result is: ['xxIxx', 'xxlovexx', 'xxyouxx']

(6) Example of using ()

secret_code = 'hadkfalifexxIxxfasdjifja134xxlovexx23345sdfxxyouxx8dfse'

d = re.findall('xx(.*?)xx', secret_code)

print(d)

The printed result is: ['I', 'love', 'you']. Only the data inside the parentheses is returned.
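
Since this article is about crawlers, here is a minimal sketch of the same parentheses technique applied to a small piece of HTML. The html string and its paths are invented for illustration; a real page would first have to be downloaded.

```python
import re

# Invented HTML fragment standing in for a downloaded page.
html = '<a href="/page/1">First</a><a href="/page/2">Second</a>'

# Only the part inside the parentheses is returned, here the link target.
links = re.findall('<a href="(.*?)">', html)
print(links)  # ['/page/1', '/page/2']
```

This works for simple, regular markup. For messy real-world pages, a dedicated HTML parser is usually the more robust choice.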

(7) Example of using re.S

s = '''sdfxxhello
xxfsdfxxworldxxasdf'''

d = re.findall('xx(.*?)xx', s, re.S)

print(d)

The printed result is: ['hello\n', 'world']. re.S makes . match \n as well.
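
To make the effect of re.S easier to see, the sketch below runs the same pattern with and without the flag. The sample string is invented for this comparison; its first xx...xx pair spans a line break.

```python
import re

# Sample string invented for this sketch.
s = 'aaxxhello\nxxbbxxworldxxcc'

without_flag = re.findall('xx(.*?)xx', s)
with_flag = re.findall('xx(.*?)xx', s, re.S)

print(without_flag)  # ['bb']
print(with_flag)     # ['hello\n', 'world']
```

Without the flag, . stops at the line break, so the match that starts before hello fails and the engine picks up a different pair of xx markers on the second line.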

(8) An example of using findall

s2 = 'asdfxxIxx123xxlovexxdfd'

f2 = re.findall('xx(.*?)xx123xx(.*?)xx', s2)

print(f2[0][1])

The printed result is: love

Here f2 is a list containing one tuple, and the tuple contains two elements, which are the contents matched by the two pairs of (). If s2 contained several substrings of the form 'xx(.*?)xx123xx(.*?)xx', f2 would contain several tuples.
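
The multiple-tuple case can be sketched as follows. The string s3 is invented for this illustration and simply contains the pattern twice.

```python
import re

# Invented string containing the pattern twice.
s3 = 'xxIxx123xxlovexx--xxyouxx123xxtooxx'

pairs = re.findall('xx(.*?)xx123xx(.*?)xx', s3)
print(pairs)        # [('I', 'love'), ('you', 'too')]
print(pairs[1][0])  # you
```

The first index selects the match and the second index selects the pair of parentheses within that match.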

(9) An example of using search

s2 = 'asdfxxIxx123xxlovexxdfd'

f = re.search('xx(.*?)xx123xx(.*?)xx', s2).group(2)

print(f)

The printed result is: love

.group(2) returns the content matched by the second pair of parentheses. With .group(1), the printed content would be: I
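
The sketch below looks a little closer at the match object, using the same s2 and pattern as example (9). It also guards against the no-match case, since search returns None when nothing is found and calling .group on None raises an error. The string 'no markers here' is made up for this illustration.

```python
import re

s2 = 'asdfxxIxx123xxlovexxdfd'
m = re.search('xx(.*?)xx123xx(.*?)xx', s2)

if m:  # search returns None when nothing matches
    print(m.group(0))  # the whole match: xxIxx123xxlovexx
    print(m.group(1))  # I
    print(m.groups())  # ('I', 'love')

no_match = re.search('xx(.*?)xx123xx(.*?)xx', 'no markers here')
print(no_match)  # None
```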

(10) Example of using sub

s = '123rrrrr123'

output = re.sub('123(.*?)123', '123%d123' % 789, s)

print(output)

The printed result is: 123789123

Here %d works like %d in C's printf: Python substitutes 789 into the string before re.sub is called. Writing output = re.sub('123(.*?)123', '123789123', s) therefore gives the same result: 123789123
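
The following sketch separates the two steps of the sub example: the string formatting, then the substitution. It also shows the optional count argument of re.sub, which limits the number of replacements. The two-match input string is invented for this illustration.

```python
import re

# The % formatting runs first, so re.sub only ever sees the finished string.
replacement = '123%d123' % 789
print(replacement)  # 123789123

# Invented input with two matches: sub replaces all of them by default.
s = '123aaa123 and 123bbb123'
replace_all = re.sub('123(.*?)123', replacement, s)
replace_one = re.sub('123(.*?)123', replacement, s, count=1)

print(replace_all)  # 123789123 and 123789123
print(replace_one)  # 123789123 and 123bbb123
```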

(11) An example of using \d to match numbers

a = 'asdfasf1234567fasd555fas'

b = re.findall(r'(\d+)', a)

print(b)

The printed result is: ['1234567', '555']. \d+ matches a run of one or more digits.
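
findall always returns strings, even when the pattern matches digits. A minimal sketch of turning the result of example (11) into numbers is shown below; the conversion step is an addition for illustration.

```python
import re

a = 'asdfasf1234567fasd555fas'

# A raw string (r'...') keeps the backslash in \d from being read as a Python escape.
numbers = [int(n) for n in re.findall(r'\d+', a)]

print(numbers)       # [1234567, 555]
print(sum(numbers))  # 1235122
```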

The above covers some common symbols and methods of regular expressions for Python crawlers. I hope it helps Python beginners.