Crawling Recruitment Website Data with Python and Visualizing It

Date: 2021-04-16

The text and images in this article come from the Internet and are for learning and exchange only; they are not for any commercial use. If you have any questions, please contact us promptly.

This article comes from Qingdeng Programming; author: Qingfeng.


Basic development environment

  • Python 3.6
  • Pycharm

Use of related modules

  • Crawler module
import requests
import re
import parsel
import csv

 

  • Word cloud module
import jieba
import wordcloud

 

Target page analysis


Through the developer tools, we can see that the returned data is embedded in `window.__SEARCH_RESULT__`, so it can be extracted with a regular expression.

As shown in the figure below

[Screenshot: the job data inside window.__SEARCH_RESULT__]

'https://jobs.51job.com/beijing-ftq/127676506.html?s=01&t=0'

 

Every recruitment detail page has a corresponding ID. You just need to extract the ID with a regular expression, splice it into the URL, and then request the detail page to extract the recruitment data.

# url and headers are defined earlier
response = requests.get(url=url, headers=headers)
lis = re.findall(r'"jobid":"(\d+)"', response.text)
for li in lis:
    page_url = 'https://jobs.51job.com/beijing-hdq/{}.html?s=01&t=0'.format(li)
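Instead of matching each field with its own regex, the embedded `window.__SEARCH_RESULT__` object can also be captured whole and parsed as JSON. A minimal sketch, using a made-up HTML snippet standing in for the real search page (the `engine_search_result` key is an assumption about the page's structure):

```python
import json
import re

# stand-in for the HTML returned by the search page (assumed structure)
html = '''<script>window.__SEARCH_RESULT__ = {"engine_search_result":
[{"jobid": "127676506"}, {"jobid": "127676507"}]}</script>'''

# capture everything between the assignment and the closing </script>
match = re.search(r'window\.__SEARCH_RESULT__\s*=\s*(\{.*\})</script>', html, re.S)
data = json.loads(match.group(1))
job_ids = [item['jobid'] for item in data['engine_search_result']]
print(job_ids)  # ['127676506', '127676507']
```

Parsing the JSON once gives access to every field of every listing, not just the ID.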


Although the website serves static pages, the page encoding is inconsistent, so the response has to be transcoded while crawling.
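Why the transcoding step matters can be shown offline: Chinese job sites are often GBK-encoded, while `requests` may fall back to Latin-1, producing mojibake. A small sketch of the round trip (the encodings here are assumptions about the site):

```python
# Simulate a GBK page decoded with the wrong charset (mojibake),
# then recover the text by round-tripping back through the right encoding.
raw = '招聘'.encode('gbk')                        # bytes as served by a GBK page
wrong = raw.decode('ISO-8859-1')                  # what you see before transcoding
fixed = wrong.encode('ISO-8859-1').decode('gbk')  # round-trip back to GBK
print(fixed)  # 招聘
```

Setting `response.encoding = response.apparent_encoding`, as the crawler below does, achieves the same fix by letting `requests` detect the real charset.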

f = open('recruitment.csv', mode='a', encoding='utf-8', newline='')
csv_writer = csv.DictWriter(f, fieldnames=['title', 'region', 'work experience', 'education', 'salary', 'benefits', 'number of recruits', 'release date'])
csv_writer.writeheader()
response = requests.get(url=page_url, headers=headers)
response.encoding = response.apparent_encoding  # fix the messy page encoding
selector = parsel.Selector(response.text)
title = selector.css('.cn h1::text').get()  # title
salary = selector.css('div.cn strong::text').get()  # salary
welfare = selector.css('.jtag div.t1 span::text').getall()  # benefits
welfare_info = '|'.join(welfare)
data_info = selector.css('.cn p.msg.ltype::attr(title)').get().split('  |  ')
area = data_info[0]  # region
work_experience = data_info[1]  # work experience
educational_background = data_info[2]  # education
number_of_people = data_info[3]  # number of recruits
release_date = data_info[-1].replace('发布', '')  # release date ('发布' means 'published')
all_info_list = selector.css('div.tCompany_main > div:nth-child(1) > div p span::text').getall()
all_info = '\n'.join(all_info_list)
dit = {
    'title': title,
    'region': area,
    'work experience': work_experience,
    'education': educational_background,
    'salary': salary,
    'benefits': welfare_info,
    'number of recruits': number_of_people,
    'release date': release_date,
}
csv_writer.writerow(dit)
with open('recruitment information.txt', mode='a', encoding='utf-8') as f:
    f.write(all_info)

 


The steps above complete the crawling of the recruitment data.

Simple and rough data cleaning

  • Salary and benefits
import pandas as pd

content = pd.read_csv(r'D:\python\demo\data analysis\recruitment\recruitment.csv', encoding='utf-8')
salary = content['salary']
salary_1 = salary[salary.notnull()]
salary_count = pd.value_counts(salary_1)
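Counting the raw salary strings works, but for real analysis they usually need normalizing to numbers first. A sketch with a hypothetical `parse_salary` helper that handles the two common 51job formats (`1-1.5万/月`, `8千/月`) and returns `None` for anything else:

```python
import re

def parse_salary(text):
    """Convert a 51job-style salary string to a (low, high) range in CNY/month.
    Handles the two common unit forms; returns None for anything else."""
    m = re.match(r'([\d.]+)(?:-([\d.]+))?([万千])/月', text)
    if not m:
        return None
    unit = 10000 if m.group(3) == '万' else 1000   # 万 = 10,000; 千 = 1,000
    low = float(m.group(1)) * unit
    high = float(m.group(2)) * unit if m.group(2) else low
    return low, high

print(parse_salary('1-1.5万/月'))  # (10000.0, 15000.0)
print(parse_salary('8千/月'))      # (8000.0, 8000.0)
```

With a numeric range per row, averages and histograms become possible instead of just string frequency counts.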

 


  • Education requirements
import pandas as pd
from pyecharts.charts import Bar

content = pd.read_csv(r'D:\python\demo\data analysis\recruitment\recruitment.csv', encoding='utf-8')
educational_background = content['education']
educational_background_1 = educational_background[educational_background.notnull()]
educational_background_count = pd.value_counts(educational_background_1).head()
print(educational_background_count)
bar = Bar()
bar.add_xaxis(educational_background_count.index.tolist())
bar.add_yaxis("education", educational_background_count.values.tolist())
bar.render('bar.html')
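The tally that pyecharts plots can be sanity-checked without pandas using `collections.Counter`; the education values below are made-up samples:

```python
from collections import Counter

# made-up sample of the education column (None = missing value)
education = ['bachelor', 'bachelor', 'associate', 'master', 'bachelor', None]
counts = Counter(e for e in education if e is not None)
# most_common() gives the same ordering a bar chart would use
print(counts.most_common())  # [('bachelor', 3), ('associate', 1), ('master', 1)]
```

This mirrors `pd.value_counts` on the non-null values, which makes it easy to verify the chart's input.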

 


As for the number of recruits, most postings list no specific requirement.

  • Work experience
import pandas as pd
from pyecharts.charts import Bar

content = pd.read_csv(r'D:\python\demo\data analysis\recruitment\recruitment.csv', encoding='utf-8')
work_experience = content['work experience']
work_experience_count = pd.value_counts(work_experience)
print(work_experience_count)
bar = Bar()
bar.add_xaxis(work_experience_count.index.tolist())
bar.add_yaxis("experience requirement", work_experience_count.values.tolist())
bar.render('bar.html')

 


Word cloud analysis of technical requirements

import re
import imageio
import jieba
import wordcloud

py = imageio.imread("python.png")
f = open('python recruitment information.txt', encoding='utf-8')
re_txt = f.read()
# keep only English words (the technology keywords)
result = re.findall(r'[a-zA-Z]+', re_txt)
txt = ' '.join(result)

# word segmentation
txt_list = jieba.lcut(txt)
string = ' '.join(txt_list)
# word cloud settings
wc = wordcloud.WordCloud(
        width=1000,                # image width
        height=700,                # image height
        background_color='white',  # background color
        font_path='msyh.ttc',      # word cloud font
        mask=py,                   # mask image for the word cloud shape
        scale=15,
        stopwords={' '},
        # contour_width=5,
        # contour_color='red',     # outline color
)
# feed the text into the word cloud
wc.generate(string)
# save the word cloud image
wc.to_file(r'python recruitment information.png')
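Before rendering the cloud, it can help to check which English keywords actually dominate. A small sketch using `collections.Counter` on a made-up stand-in for the scraped text:

```python
import re
from collections import Counter

# stand-in for the text scraped into 'python recruitment information.txt'
text = 'Familiar with Python and MySQL; Python, Linux, Git experience preferred. python web'
# same extraction the word cloud uses, lowercased so 'Python' and 'python' merge
words = [w.lower() for w in re.findall(r'[a-zA-Z]+', text)]
top = Counter(words).most_common(3)
print(top)
```

Lowercasing before counting avoids the same keyword appearing in the cloud twice with different capitalization.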

 

[Image: the generated word cloud]

Conclusion:

This data analysis is admittedly rough, but the results are still eye-catching~