5. Comprehensive case study of RDD operations

Date: 2022-5-7

1、 Word frequency statistics

1. Prepare the text file

1. Download a novel or a long news article as a plain-text file

2. Upload the file to HDFS

2. Read the file and create an RDD

3. Tokenize (split each line into words)
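Here is a minimal sketch of the tokenizing function. It assumes whitespace splitting is enough at this stage, and the name `split_line` and the sample lines are made up for illustration. On Spark it would be used as `words = lines.flatMap(split_line)`. The list comprehension below only mimics flatMap locally, so you can check the logic without a cluster.

```python
def split_line(line):
    """Split one line of text into whitespace-separated tokens.

    Hypothetical helper; on Spark it would be passed to flatMap,
    e.g. lines.flatMap(split_line), so every token becomes its own element.
    """
    return line.split()


# Local check with two made-up lines: flatten the per-line lists
# the same way flatMap would.
sample_lines = ["It was the best of times,", "it was the worst of times."]
tokens = [tok for line in sample_lines for tok in split_line(line)]
print(tokens)
```

With `map(split_line)` instead, each RDD element would be a whole list of words, one list per line. `flatMap` flattens those lists into single words. Punctuation such as the comma in `times,` is still attached at this point and is dealt with in the punctuation step below.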

4. Ignore case: lower(), map()
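A sketch of the case-normalizing function, assuming plain `str.lower()` is sufficient. The name `to_lower` and the sample words are hypothetical. On Spark this becomes `words = words.map(to_lower)`. `map` fits here because each word produces exactly one output word.

```python
def to_lower(word):
    """Lower-case a single token (hypothetical helper; on Spark: words.map(to_lower))."""
    return word.lower()


# Made-up sample: "Spark" and "spark" should count as the same word.
sample = ["Spark", "RDD", "spark", "rdd"]
lowered = [to_lower(w) for w in sample]
print(lowered)  # ['spark', 'rdd', 'spark', 'rdd']
```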

Remove punctuation: re.split(pattern, str), flatMap()
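A sketch of punctuation removal with `re.split(pattern, str)`. It assumes a pattern that treats every run of non-word characters as a separator. The pattern, the function name `split_punctuation` and the samples are illustrative choices and do not come from the original. One token can become zero, one or several words, so the function is used with `flatMap`: `words = words.flatMap(split_punctuation)`.

```python
import re

# Hypothetical pattern: one or more characters that are not letters, digits
# or underscores, so both spaces and punctuation act as separators.
PATTERN = r"[^\w]+"


def split_punctuation(text):
    """Split text on punctuation/whitespace and drop the empty pieces.

    On Spark: words.flatMap(split_punctuation). flatMap is needed because
    one token such as "well-known" can produce several words.
    """
    return [piece for piece in re.split(PATTERN, text) if piece]


print(split_punctuation("times,"))       # ['times']
print(split_punctuation("well-known."))  # ['well', 'known']
```

This pattern also splits contractions, so `don't` becomes `don` and `t`. The length filter below removes the single-letter leftover. In Python 3, `\w` matches Unicode letters, so Chinese characters are not treated as punctuation.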

Remove stop words (a stopwords.txt list can be downloaded online): filter()
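A sketch of stop-word filtering, assuming the downloaded list has one word per line. The helper names and the tiny inline word list are made up. In practice the set would be built from the file, e.g. `with open("stopwords.txt", encoding="utf-8") as f: STOPWORDS = load_stopwords(f)` (the file name is hypothetical). On Spark this becomes `words = words.filter(is_not_stopword)`.

```python
def load_stopwords(lines):
    """Build a set of stop words from an iterable of lines (one word per line)."""
    return {line.strip().lower() for line in lines if line.strip()}


# Made-up stand-in for the lines of a downloaded stopwords.txt;
# the blank line is skipped by load_stopwords.
STOPWORDS = load_stopwords(["the\n", "a\n", "of\n", "and\n", "was\n", "it\n", "\n"])


def is_not_stopword(word):
    """Predicate for filter(): keep words that are not in the stop-word set."""
    return word not in STOPWORDS


sample = ["it", "was", "the", "best", "of", "times"]
kept = [w for w in sample if is_not_stopword(w)]
print(kept)  # ['best', 'times']
```

The set is captured in the function's closure and shipped with every task. For a large list, a broadcast variable (`bc = sc.broadcast(STOPWORDS)`, then `word not in bc.value`) sends it to each executor only once.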

Remove words shorter than 2 characters: filter()
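A sketch of the length filter. `long_enough` is a hypothetical name, and the threshold of 2 comes from the step title. On Spark this becomes `words = words.filter(long_enough)`. It also drops any empty strings left over from splitting.

```python
def long_enough(word, min_len=2):
    """Predicate for filter(): keep words with at least min_len characters."""
    return len(word) >= min_len


# Made-up sample including an empty string left over from splitting.
sample = ["a", "I", "ok", "spark", ""]
kept = [w for w in sample if long_enough(w)]
print(kept)  # ['ok', 'spark']
```

`len` counts characters, so on Chinese text this filter would also drop every single-character word.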

5. Count word frequencies
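A sketch of the counting step, assuming the usual word-count pattern: map every word to `(word, 1)`, then add up the ones for each word. On Spark this is `counts = words.map(to_pair).reduceByKey(add)`. `reduce_by_key_locally` is a hypothetical plain-Python stand-in for `reduceByKey`, included only so the logic runs without Spark. The sample words are made up.

```python
from operator import add


def to_pair(word):
    """map() step: turn each word into a (word, 1) pair."""
    return (word, 1)


def reduce_by_key_locally(pairs, func):
    """Hypothetical plain-Python stand-in for RDD.reduceByKey (local testing only).

    Values that share a key are combined pairwise with func.
    """
    result = {}
    for key, value in pairs:
        result[key] = func(result[key], value) if key in result else value
    return list(result.items())


sample = ["spark", "rdd", "spark", "hdfs", "spark", "rdd"]
counts = reduce_by_key_locally(map(to_pair, sample), add)
print(counts)  # [('spark', 3), ('rdd', 2), ('hdfs', 1)]
```

`reduceByKey` first combines values inside each partition and then across partitions. The function passed to it should therefore be associative and commutative, which `operator.add` is. `words.countByValue()` returns the same counts as a dict on the driver, but it only makes sense when the number of distinct words is small.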

6. Sort by word frequency
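A sketch of the sort key. It assumes descending order by count, with ties broken alphabetically so the order is deterministic. The tie-break rule and the name `by_count_desc` are illustrative additions. On Spark this is `ranked = counts.sortBy(by_count_desc)`. If you only need the top N words, `counts.takeOrdered(10, key=by_count_desc)` returns them directly to the driver. Locally, `sorted` with the same key shows the resulting order.

```python
def by_count_desc(pair):
    """Sort key: highest count first, ties broken alphabetically by word."""
    word, count = pair
    return (-count, word)


# Made-up (word, count) pairs as produced by the counting step.
counts = [("rdd", 2), ("hdfs", 1), ("spark", 3), ("python", 2)]
ranked = sorted(counts, key=by_count_desc)
print(ranked)  # [('spark', 3), ('python', 2), ('rdd', 2), ('hdfs', 1)]
```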

7. Output to file
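A sketch of formatting the output lines, assuming tab-separated `word<TAB>count` lines are wanted. The format and the name `to_tsv` are illustrative choices. On Spark this is `ranked.map(to_tsv).saveAsTextFile(output_dir)`, where `output_dir` is a placeholder HDFS path. Without the `map`, PySpark writes each element's `str()` form, e.g. `('spark', 3)`.

```python
def to_tsv(pair):
    """Format a (word, count) pair as one tab-separated output line."""
    word, count = pair
    return f"{word}\t{count}"


# Made-up ranked pairs.
ranked = [("spark", 3), ("python", 2)]
lines = [to_tsv(p) for p in ranked]
print(lines)  # ['spark\t3', 'python\t2']
```

`saveAsTextFile` creates a directory of `part-NNNNN` files and fails if that directory already exists. Calling `coalesce(1)` first produces a single part file, at the cost of writing it from one task.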

8. View results
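You can print the saved files with `hdfs dfs -cat <output_dir>/part-*`, or read them back in PySpark with `sc.textFile(output_dir).take(10)` (`output_dir` is a placeholder). The sketch below parses such lines back into pairs, assuming each line has the form `word<TAB>count`. `from_tsv` and the sample lines are hypothetical.

```python
def from_tsv(line):
    """Parse one 'word<TAB>count' line back into a (word, count) pair."""
    word, count = line.rsplit("\t", 1)
    return word, int(count)


# Made-up lines standing in for the contents of a part-00000 file.
output_lines = ["spark\t3", "python\t2", "rdd\t2"]
pairs = [from_tsv(line) for line in output_lines]
print(pairs[:2])  # [('spark', 3), ('python', 2)]
```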

2、 Find the top value