When you fetch web pages with the requests module, character encoding is a frequent source of trouble. requests normally works out the encoding from the HTTP response headers. If the Content-Type header of a text response does not specify a charset, requests falls back to ISO-8859-1, and for pages that actually use another encoding r.text then comes out garbled.
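To see what goes wrong, the following minimal sketch reproduces the garbling with the standard library alone, with no network access. The sample string and the variable names are made up for illustration. It only imitates what happens when UTF-8 bytes are decoded as ISO-8859-1.

```python
# Bytes as a server might send them: UTF-8 encoded text (illustrative sample).
raw = "编码测试".encode("utf-8")

# Decoding with the wrong codec does not raise an error. It silently yields
# garbled text, because every byte value is a valid ISO-8859-1 character.
garbled = raw.decode("iso-8859-1")

# Decoding the same bytes with the correct codec recovers the text.
correct = raw.decode("utf-8")

# The damage is reversible as long as the raw bytes are still available.
recovered = garbled.encode("iso-8859-1").decode("utf-8")

print(garbled != correct)    # the wrong codec produced different text
print(recovered == correct)  # round-tripping through the bytes repairs it
```

This is why the fixes discussed here either set the encoding before reading r.text or decode r.content directly: the raw bytes are still intact, only the choice of codec was wrong.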
There are several ways to deal with this. The simplest is to set the encoding by hand before reading r.text, for example r.encoding = 'utf-8'
However, a crawler usually visits many sites under different domain names, and it is impractical to set the right encoding by hand for each one. The following is a more general approach:
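import requests

def get_text(r):
    # r is a requests.Response; the name get_text is only illustrative
    if r.encoding == 'ISO-8859-1':
        # Probably the header default: look for a charset declared in the HTML itself
        encodings = requests.utils.get_encodings_from_content(r.text)
        if encodings:
            encoding = encodings[0]
        else:
            # Otherwise let requests guess from the raw bytes (UTF-8 if it cannot)
            encoding = r.apparent_encoding or 'utf-8'
        return r.content.decode(encoding, 'replace')
    else:
        return r.text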
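The same fallback idea can be sketched on raw bytes with the standard library only, which makes it easy to try without a network connection. This is a simplified stand-in, not how requests implements it: the function name decode_html, the regular expression and the 4 KB scan limit are all assumptions made for illustration, and a plain UTF-8 attempt replaces the statistical guess behind r.apparent_encoding.

```python
import re

# Simplified pattern covering <meta charset="..."> and
# <meta http-equiv="Content-Type" content="text/html; charset=...">.
META_CHARSET = re.compile(rb'<meta[^>]*?charset=["\']?([\w-]+)', re.IGNORECASE)


def decode_html(raw, declared=None):
    """Decode raw HTML bytes: header charset, then <meta> charset, then UTF-8."""
    candidates = []
    # 1. A charset from the HTTP header, unless it is just the ISO-8859-1 default.
    if declared and declared.lower() != "iso-8859-1":
        candidates.append(declared)
    # 2. A charset declared inside the page (only the first 4 KB is scanned).
    match = META_CHARSET.search(raw[:4096])
    if match:
        candidates.append(match.group(1).decode("ascii"))
    # 3. UTF-8 as a last guess.
    candidates.append("utf-8")
    for name in candidates:
        try:
            return raw.decode(name)
        except (LookupError, UnicodeDecodeError):
            continue  # unknown codec name or bytes that do not fit: try the next
    # Nothing decoded cleanly, so keep going with replacement characters.
    return raw.decode("utf-8", "replace")


page = '<html><head><meta charset="gbk"></head><body>中文</body></html>'
print(decode_html(page.encode("gbk"), "ISO-8859-1") == page)
```

With a real response r, the call would be decode_html(r.content, r.encoding). Unlike the 'replace' error handler used on its own, trying each candidate strictly first means a wrong declaration is skipped rather than silently producing replacement characters.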
This work is released under a CC license. Reprints must credit the author and link to this article.