Crawling the bullet comments (danmaku) of "Houlang" with Python and displaying the data as a word cloud


The text and pictures in this article come from the Internet and are for learning and communication only, not for any commercial use. The copyright belongs to the original author. If you have any questions, please contact us promptly.

This article is from Tencent Cloud, by Python sophomore.
A few days ago, Bilibili (station B) released a short video called "Houlang", which drew a strong reaction across the internet, both praise and criticism. In this article, we crawl the video's bullet comments (danmaku) to understand what Bilibili users think of the video.

The video's bullet comments are stored in an XML file, and the link to that file is built from the video's CID, so we only need to get the CID.
Let's look at how to obtain it. First open the video link, then press F12 to open the developer tools, select Network, and refresh the page. We can enter CID in the filter, as shown below:
After obtaining the CID, we know the link to the bullet-comment file. Open the link to see:


The code for crawling the bullet comments is as follows:

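import requests
import pandas as pd
from bs4 import BeautifulSoup

# Link to the bullet-comment XML file
url = ""
req = requests.get(url)
html = req.content
# Decode the response bytes as UTF-8
html_doc = str(html, "utf-8")
soup = BeautifulSoup(html_doc, "lxml")
# Each bullet comment is the text of a <d> element
results = soup.find_all('d')
contents = [x.text for x in results]
# Save the results
dic = {"contents": contents}
df = pd.DataFrame(dic)
df["contents"].to_csv("bili.csv", encoding="utf-8", index=False)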
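
To make the parsing step easier to follow, here is a minimal sketch that extracts comment text from a small, hand-written XML sample using only the standard library. The sample string and the `extract_comments` helper are assumptions made for illustration. The sketch only relies on what the code above already assumes, namely that each comment is the text of a `d` element.

```python
import xml.etree.ElementTree as ET

# Hand-written sample for illustration only; the real file's attributes may differ.
SAMPLE_XML = """<i>
  <d p="1">first comment</d>
  <d p="2">second comment</d>
  <d p="3"></d>
</i>"""


def extract_comments(xml_text):
    """Return the non-empty text of every <d> element in the document."""
    root = ET.fromstring(xml_text)
    return [d.text.strip() for d in root.iter("d") if d.text and d.text.strip()]


comments = extract_comments(SAMPLE_XML)
print(comments)
```

Skipping empty elements here is a design choice of the sketch, so that blank comments do not end up as empty rows in the saved CSV.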


Now that we have the bullet-comment data, we display it as a word cloud. The code is as follows:

import jieba

def jieba_():
    global word_cloud
    # Open the comment data file
    content = open("bili.csv", "r", encoding="utf-8").read()
    # Segment the text with jieba
    word_list = jieba.cut(content)
    words = []
    # Stop words to filter out
    stopwords = open("stopwords.txt", "r", encoding="utf-8").read().split("\n")[:-1]
    for word in word_list:
        if word not in stopwords:
            words.append(word)
    # Join the words with commas
    word_cloud = ','.join(words)

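import numpy as np
from PIL import Image
from wordcloud import WordCloud

def cloud():
    global word_cloud
    # Open the background image of the word cloud
    cloud_mask = np.array(Image.open("bg.png"))
    # Define some attributes of the word cloud
    # (values marked "placeholder" are assumptions, not the author's settings)
    wc = WordCloud(
        # Use white as the background color
        background_color="white",
        # Background pattern
        mask=cloud_mask,
        # Maximum number of words to display (placeholder)
        max_words=500,
        # Font that can display Chinese (placeholder path)
        font_path="simhei.ttf",
        # Maximum font size (placeholder)
        max_font_size=150
    )
    # Word cloud function
    x = wc.generate(word_cloud)
    # Generate the word cloud image
    image = x.to_image()
    # Show the word cloud image
    image.show()
    # Save the word cloud image (placeholder file name)
    image.save("bili.png")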
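
The stop-word filtering step can also be written without a global variable, which makes it easier to check on its own. The following is a minimal sketch under stated assumptions: the helper names `filter_tokens` and `top_words` and the sample tokens are hypothetical, and the tokens are pre-segmented by hand so the example runs without jieba. In the article's flow they would come from `jieba.cut()`.

```python
from collections import Counter


def filter_tokens(tokens, stopwords):
    """Drop stop words and whitespace-only tokens, keeping the original order."""
    stop = set(stopwords)  # set membership tests are faster than list lookups
    return [t for t in tokens if t.strip() and t not in stop]


def top_words(tokens, n=3):
    """Return the n most frequent tokens with their counts."""
    return Counter(tokens).most_common(n)


# Hypothetical pre-segmented tokens, standing in for the output of jieba.cut().
tokens = ["后浪", "，", "加油", "后浪", " ", "的", "奔涌", "后浪", "加油"]
stopwords = ["，", "的"]

kept = filter_tokens(tokens, stopwords)
word_cloud_text = ",".join(kept)
print(word_cloud_text)
print(top_words(kept, 2))
```

Looking at the most frequent words before drawing is a quick way to spot stop words that are still missing from `stopwords.txt`. The wordcloud library also provides `generate_from_frequencies`, which accepts a word-to-count mapping, so the counts could be passed in directly instead of the joined string.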


Take a look at the result: