Using Spark to analyze Lagou recruitment data (4): several commonly used scripts and charts of the analysis results

Time:2021-2-26

Summary

The previous article introduced the basic usage of BMR. Together with the Spark and Scala documentation, that should be enough for you to start your own data analysis. This article offers some simple guidance and ideas for that analysis. If you get stuck at some step while analyzing the recruitment data, try reading this article. Before you do, please make sure that you have read the third article in this series, configured BMR correctly, and imported the real recruitment data you need.

What if we used traditional programming tools?

Suppose we used a traditional language tool, such as nodejs, for everything from data collection and storage to reading and using the data.

If we want to know how many positions there are in different salary ranges and sort them from the most to the least, we may need to:

  1. Create a new object to store the data of each company;

  2. Loop over the data files to fill in the data of each company;

  3. Group by salary and record the information of each position in each company;

  4. Sort the groups by the number of positions.

The steps are relatively simple. Setting aside the fact that a larger data set would probably use too much memory, the logic of steps 2 and 3 still needs a lot of conditional code. How do you loop over the file data? What if the file names are irregular? What if a file's data is damaged or malformed? The JSON in each file is not a directly usable array of positions, so is the JSON restructuring logic easy for you to implement?

It’s true that there’s nothing you can’t do with a programming language. It’s just a matter of time. And since we’re talking about time: if there’s another way that’s obviously faster, wouldn’t you use it?

Using Spark for analysis

The following operations are based on the interactive programming tool Zeppelin:

1. Read data

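// Load the JSON data under the "jobs" path into a DataFrame
val job = sqlContext.read.json("jobs")
// Register it as a temporary table named "job" so that %sql paragraphs can query it
job.registerTempTable("job")
// Print the inferred schema to check the nested structure
job.printSchema()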


2. Count the positions in each salary range and sort them

%sql
SELECT  postionCol.salary,COUNT(postionCol.salary) salary_count
FROM job
LATERAL VIEW explode(content.positionResult.result) positionTable AS postionCol
WHERE content.positionResult.queryAnalysisInfo.positionName="ios" 
GROUP BY postionCol.salary
ORDER BY salary_count  DESC
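
For readers who prefer Scala to SQL, the same count can probably also be expressed with the DataFrame API. The following is only a sketch under a few assumptions: it runs in a Zeppelin Spark paragraph where `sqlContext` is predefined, "jobs" is the same data path as in step 1, and the intermediate names `iosPositions` and `salaryCounts` are my own, not from the original scripts.

```scala
import org.apache.spark.sql.functions.{col, count, desc, explode}

// Assumption: Zeppelin provides `sqlContext`; "jobs" is the path from step 1.
val job = sqlContext.read.json("jobs")

// Keep only the "ios" search results, then turn each element of the nested
// result array into its own row (the LATERAL VIEW explode step of the SQL).
val iosPositions = job
  .filter(col("content.positionResult.queryAnalysisInfo.positionName") === "ios")
  .select(explode(col("content.positionResult.result")).as("postionCol"))

// Count the positions per salary range, largest group first.
val salaryCounts = iosPositions
  .select(col("postionCol.salary").as("salary"))
  .groupBy("salary")
  .agg(count(col("salary")).as("salary_count"))
  .orderBy(desc("salary_count"))

salaryCounts.show()
```

The SQL version is shorter and plugs directly into Zeppelin's charts, so this form is mainly useful when the result needs further processing in Scala.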


You really can use SQL-like syntax directly for complex queries over semi-structured data. What do you think, now that you have seen it?

If your SQL skills are not particularly strong, my suggestion is: read the documentation more when you are free, and when a need comes up, search Google with English keywords first.

Several examples of Spark SQL queries for data you may be interested in

For those who need them:

Show the number of jobs by company name


%sql
SELECT  postionCol.companyFullName,COUNT(postionCol.companyFullName) position_count
FROM job
LATERAL VIEW explode(content.positionResult.result) positionTable AS postionCol
WHERE content.positionResult.queryAnalysisInfo.positionName="ios" 
GROUP BY postionCol.companyFullName
ORDER BY position_count  DESC


Show the years of experience required for a position

%sql
SELECT  postionCol.workYear,COUNT(postionCol.workYear) workYears
FROM job
LATERAL VIEW explode(content.positionResult.result) positionTable AS postionCol
WHERE content.positionResult.queryAnalysisInfo.positionName="ios" 
GROUP BY postionCol.workYear
ORDER BY workYears  DESC


Show the educational requirements of a position


%sql
SELECT  postionCol.education,COUNT(postionCol.education) education_count
FROM job
LATERAL VIEW explode(content.positionResult.result) positionTable AS postionCol
WHERE content.positionResult.queryAnalysisInfo.positionName="ios" 
GROUP BY postionCol.education
ORDER BY education_count  DESC


Show the company sizes for a position

%sql
SELECT  postionCol.companySize,COUNT(postionCol.companySize) company_size_count
FROM job
LATERAL VIEW explode(content.positionResult.result) positionTable AS postionCol
WHERE content.positionResult.queryAnalysisInfo.positionName="ios" 
GROUP BY postionCol.companySize
ORDER BY company_size_count  DESC
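
The queries in this section differ only in the field being counted, so the query text could also be generated from the field name. The helper below is a hypothetical sketch, not part of the original scripts: the name `countByFieldSql`, its parameters, and the `<field>_count` alias pattern are my own choices, and the `postionCol` alias is spelled as in the queries above. It only builds the SQL string, so it runs as plain Scala.

```scala
// Hypothetical helper (not from the original article): builds the
// "count positions per <field>" query for one field of a position record.
// The values are interpolated directly, so pass trusted names only.
def countByFieldSql(field: String, positionName: String = "ios"): String =
  s"""SELECT postionCol.$field, COUNT(postionCol.$field) ${field}_count
     |FROM job
     |LATERAL VIEW explode(content.positionResult.result) positionTable AS postionCol
     |WHERE content.positionResult.queryAnalysisInfo.positionName = '$positionName'
     |GROUP BY postionCol.$field
     |ORDER BY ${field}_count DESC""".stripMargin

// Print the generated query for each field used in this article.
Seq("salary", "companyFullName", "workYear", "education", "companySize")
  .foreach(field => println(countByFieldSql(field) + "\n"))
```

In a Zeppelin Spark paragraph, a generated query could then be run with something like `sqlContext.sql(countByFieldSql("education")).show()`, assuming the `job` table from step 1 has been registered.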


Postscript of the series

This is my first series of articles, and I think I have made a few problems clear. The value of an article depends on whether it reaches the people who really need it, so let’s leave that to time! As far as this series is concerned, the final analysis results left a strong impression on me: the recruitment market’s demand for high-end talent is now very large, and I suddenly realized that my thinking was still stuck two years in the past, in the era when “newbies” were everywhere.

Frankly speaking, I have been looking for the meaning in what I have done. The first and second articles in the series were rarely read. Fortunately, I kept going and wrote the third article, and people slowly began to notice and read it. As a side effect, the reading volume of the first two articles also rose a little.

We should still believe that people have the ability to appreciate beautiful and valuable things. If it doesn’t feel that way, it may simply be that your efforts have not yet been seen by the people who need them.

Write down what you find valuable, and leave the rest to time. That’s what I want to say to all the lovely folks who are trying to blog and share. Let’s go ↖ (^ω^) ↗


The GitHub repository for this series: https://github.com/ios122/spark_lagou