What does a dynamic word cloud of Mr. Ma's Lightning Five-Whip Combo look like? Every bit as good as street dance

Time:2021-11-30


The following article is from farnast, author F


 

Preface

The headlines in November belong to Ma Baoguo.

Young people with no martial ethics sneak-attacked a 69-year-old comrade.


 

Look at those who bully old comrades

Mr. Ma, though, is a man of benevolence and righteousness: one flick of the hand and out comes a five-whip combo.


 

Hahaha. So in this issue, we will use Python to make a dynamic word cloud of the Lightning Five-Whip Combo for Mr. Ma Baoguo.

The word cloud data comes from Bilibili (Station B), and the clouds are drawn with the stylecloud word cloud library.


 

The approach mainly follows an open source project on Baidu AI Studio and uses PaddleSeg to segment the portrait.

Young F has no martial ethics, pulling a stunt like this. Mouse tail juice.


 

Danmaku (bullet comment) data acquisition

Instead of crawling Bilibili directly, the third-party library bilibili_api is used.

This is a library written in Python that wraps various Bilibili APIs, covering videos, audio, live streams, dynamics (feed posts), columns, users, bangumi (anime series), and more.

Address: https://passkou.com/bilibili_api/docs/

 

Two functions in the video module, get_history_danmaku_index and get_danmaku, are enough to fetch a video's danmaku for every day in November.


 

First, you need the values of SESSDATA and the CSRF token (bili_jct).

In Chrome, both can be found among the cookies stored for the domain bilibili.com.
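
If you copy the whole Cookie request header from the browser's developer tools instead of picking the two values out by hand, a few lines of Python can extract them. This is only a minimal sketch: the helper name extract_bilibili_credentials and the sample cookie string are made up for illustration and are not part of bilibili_api.

```python
def extract_bilibili_credentials(cookie_header):
    """Return (SESSDATA, bili_jct) parsed from a raw Cookie header string."""
    cookies = {}
    for pair in cookie_header.split(';'):
        if '=' not in pair:
            continue  # skip empty or malformed fragments
        name, value = pair.strip().split('=', 1)
        cookies[name] = value
    return cookies.get('SESSDATA'), cookies.get('bili_jct')


# Hypothetical cookie string; real values come from your own logged-in session
sample = "buvid3=xyz; SESSDATA=abc123; bili_jct=def456"
sessdata, csrf = extract_bilibili_credentials(sample)
print(sessdata, csrf)
```

The two returned strings are what the code further down passes to Verify.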


 

Sorting by view count, pick the top-ranked video and fetch its danmaku. I did not expect Mr. Ma to have been popular for so long. Mouse tail juice.


 

Open the top-ranked video and read its BV number, BV1HJ411L7DP, from the browser's address bar.

The code for fetching the danmaku is as follows.

from bilibili_api import video, Verify
import datetime

# Credentials: fill in your own cookie values
verify = Verify("your SESSDATA value", "your bili_jct value")

# Get the list of dates that have historical danmaku
days = video.get_history_danmaku_index(bvid="BV1HJ411L7DP", verify=verify)
print(days)

# Fetch the danmaku for each date and append the text to danmu.txt
with open('danmu.txt', 'a', encoding='utf-8') as f:
    for day in days:
        # day is a 'YYYY-MM-DD' string; convert it to a datetime.date
        date = datetime.date(*map(int, day.split('-')))
        danmus = video.get_danmaku(bvid="BV1HJ411L7DP", verify=verify, date=date)
        for danmu in danmus:
            print(danmu)
            f.write(danmu.text + '\n')
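
The date conversion inside the loop is the least obvious step, so here it is on its own. This is a small sketch that assumes the dates come back as 'YYYY-MM-DD' strings, which is what the split on '-' implies; the helper name parse_day and the sample dates are invented for illustration.

```python
import datetime


def parse_day(day):
    """Convert a 'YYYY-MM-DD' string into a datetime.date."""
    year, month, day_of_month = map(int, day.split('-'))
    return datetime.date(year, month, day_of_month)


# Hypothetical sample; the real list comes from get_history_danmaku_index
sample_days = ['2020-11-01', '2020-11-02']
print([parse_day(d) for d in sample_days])
```

On Python 3.7 or later, datetime.date.fromisoformat(day) does the same conversion in a single call.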

 

The results are saved to danmu.txt.


 

I was careless. I did not dodge.

jieba is used to segment the danmaku text into words.

import re
import jieba


def get_text_content(text_file_path):
    '''
    Read the danmaku text, clean it, and segment it into words
    '''
    with open(text_file_path, encoding='utf-8') as file:
        text_content = file.read()
    # Data cleaning: keep only Chinese characters, letters, and digits
    text_content_find = re.findall('[\u4e00-\u9fa5a-zA-Z0-9]+', text_content)
    # Segment with jieba in precise mode and drop whitespace-only tokens
    words = jieba.cut(' '.join(text_content_find), cut_all=False)
    text_content = ' '.join(word for word in words if word.strip())
    print(text_content)
    return text_content


text_content = get_text_content('danmu.txt')
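
To see what the cleaning step keeps and drops before jieba ever runs, the regular expression can be tried on its own. This is a standalone sketch using the same pattern; the helper name clean_danmaku and the sample line are invented for illustration and do not come from the real danmu.txt.

```python
import re

# Keep runs of Chinese characters, ASCII letters, and digits; drop the rest
KEEP_PATTERN = re.compile('[\u4e00-\u9fa5a-zA-Z0-9]+')


def clean_danmaku(text):
    """Return the fragments of text that survive the cleaning step."""
    return KEEP_PATTERN.findall(text)


# Hypothetical sample line
sample = '年轻人不讲武德!!!666~~ haha'
print(clean_danmaku(sample))
```

Punctuation, whitespace, and symbols act as separators, so each surviving fragment is a run of characters that can be handed to the segmenter.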

 

For the source footage, use the original Ma Baoguo video; an HD version is available on Bilibili.

Address: https://www.bilibili.com/video/BV1JV41117hq

 

Based on information found online, the following code downloads the video from Bilibili.

from bilibili_api import video, Verify
import requests
import urllib3

# Credentials: fill in your own cookie values
verify = Verify("your SESSDATA value", "your bili_jct value")

# Get the download addresses
download_url = video.get_download_url(bvid="BV1JV41117hq", verify=verify)
print(download_url["dash"]["video"][0]['baseUrl'])

baseurl = 'https://www.bilibili.com/video/BV1JV41117hq'
title = 'ma Baoguo'

CHUNK_SIZE = 1024 * 1024  # bytes requested per Range request (1 MiB)


def download_stream(url, filename, headers):
    '''
    Download url to filename in CHUNK_SIZE pieces using HTTP Range requests
    '''
    begin = 0
    with open(filename, 'wb') as fp:
        while True:
            end = begin + CHUNK_SIZE - 1
            headers.update({'Range': 'bytes=' + str(begin) + '-' + str(end)})
            res = requests.get(url=url, headers=headers, verify=False)
            if res.status_code == 416:
                # The requested range starts past the end of the file
                break
            res.raise_for_status()
            fp.write(res.content)
            if res.status_code != 206 or len(res.content) < CHUNK_SIZE:
                # Whole body received, or this was the last (short) chunk
                break
            begin = end + 1


def get_video():
    urllib3.disable_warnings()

    headers = {
        'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
        'Accept-Encoding': 'gzip, deflate, br',
        'Accept-Language': 'zh-CN,zh;q=0.9,en;q=0.8'
    }
    headers.update({'Referer': baseurl})

    # Video stream
    download_stream(download_url["dash"]["video"][0]['baseUrl'], "./" + title + ".flv", headers)
    print('--------------------------------------------')
    print('video download completed')

    # Audio stream
    download_stream(download_url["dash"]["audio"][0]['baseUrl'], "./" + title + ".mp3", headers)
    print('audio download completed')


get_video()
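
The download code above fetches the file 1 MiB at a time through the HTTP Range header. The range arithmetic is easy to get wrong by one byte, so here is a small sketch of it in isolation. It assumes the total file size is already known, which the code above does not rely on; the helper names byte_ranges and range_header and the 2.5 MiB size are made up for illustration.

```python
def byte_ranges(total_size, chunk_size=1024 * 1024):
    """Yield inclusive (begin, end) pairs that together cover total_size bytes."""
    begin = 0
    while begin < total_size:
        end = min(begin + chunk_size, total_size) - 1
        yield begin, end
        begin = end + 1


def range_header(begin, end):
    """Format an HTTP Range header value for an inclusive byte range."""
    return 'bytes=%d-%d' % (begin, end)


# A made-up 2.5 MiB file split into 1 MiB chunks
for begin, end in byte_ranges(2621440):
    print(range_header(begin, end))
```

Both ends of an HTTP byte range are inclusive, which is why each chunk ends at begin + chunk_size - 1 and the next one starts at end + 1.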

 

Remember to fill in your own SESSDATA and CSRF (bili_jct) values.

 

PaddleSeg portrait segmentation

This part is based on a Baidu AI Studio project, available at:

https://aistudio.baidu.com/aistudio/projectdetail/1176398

 

First, download PaddleSeg and install its dependencies.

#Download paddleseg
git clone https://hub.fastgit.org/PaddlePaddle/PaddleSeg.git

cd PaddleSeg/

#Install required dependencies
pip install -r requirements.txt

 

Downloading from GitHub is usually slow, so you can use a mirror link to speed it up.

By going through fastgit.org here, the download speed can jump from tens of kilobytes to a few megabytes per second.

# Create folders (-p also creates the parent folder work if it is missing)
mkdir -p work/videos
mkdir -p work/texts
mkdir -p work/mp4_img
mkdir -p work/mp4_img_mask
mkdir -p work/mp4_img_analysis

 

Create new folders for storing related files.

Place the previously downloaded video and audio in work/videos.

First, split the source video into frames, that is, save every frame of the video as an image.

import os
import cv2


def transform_video_to_image(video_file_path, img_path):
    '''
    Save every frame in the video as a picture
    '''
    video_capture = cv2.VideoCapture(video_file_path)
    fps = video_capture.get(cv2.CAP_PROP_FPS)
    count = 0
    while True:
        ret, frame = video_capture.read()
        if ret:
            # Frames are named 0.jpg, 1.jpg, ... in playback order
            cv2.imwrite(img_path + '%d.jpg' % count, frame)
            count += 1
        else:
            break
    video_capture.release()

    # Record the names of the saved frames in img_list.txt
    filename_list = os.listdir(img_path)
    with open(os.path.join(img_path, 'img_list.txt'), 'w', encoding='utf-8') as file:
        file.writelines('\n'.join(filename_list))

    print('Video frames saved successfully, %d in total' % count)
    return fps


input_video = 'work/videos/Master_Ma.mp4'
fps = transform_video_to_image(input_video, 'work/mp4_img/')
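
The frames are named 0.jpg, 1.jpg, 2.jpg and so on, while os.listdir returns names in arbitrary order and a plain string sort would place 10.jpg before 2.jpg. If you need the frames back in playback order, sort by the numeric part of the name. A minimal sketch follows; the helper name sort_frames and the sample listing are invented for illustration.

```python
import os


def sort_frames(filenames):
    """Sort frame files such as '10.jpg' by frame number, ignoring other files."""
    frames = [name for name in filenames
              if os.path.splitext(name)[0].isdigit()]
    return sorted(frames, key=lambda name: int(os.path.splitext(name)[0]))


# Hypothetical directory listing in arbitrary order
listing = ['10.jpg', '2.jpg', 'img_list.txt', '0.jpg', '1.jpg']
print(sort_frames(listing))
```

Files whose name is not a plain number, such as img_list.txt, are filtered out before sorting.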

 

A total of 564 pictures were obtained.


 

Then use paddleseg to segment all the video images and generate mask images.

# Generate the mask images
python <your path>/PaddleSeg/pdseg/vis.py \
       --cfg <your path>/work/humanseg.yaml \
       --vis_dir <your path>/work/mp4_img_mask

 

This runs the model for prediction. The humanseg.yaml file is provided by the author and configures the portrait segmentation.

The pretrained model is deeplabv3p_xception65_humanseg. Download it, unzip it, and place it in PaddleSeg/pretrained_model.

Because the pretrained model is large, it is not included on the network disk. You can download it directly from the following link.

# Download the pretrained model deeplabv3p_xception65_humanseg
https://paddleseg.bj.bcebos.com/models/deeplabv3p_xception65_humanseg.tgz

 

Remember to change the path information in the humanseg.yaml file to your own path.


 

Run the three-line command above, and 564 mask files will be generated.


 

Word cloud generation

Use the stylecloud library to generate the word clouds, with the font Founder Lanting Kan Hei (方正兰亭刊黑).

import os
import stylecloud


def create_wordcloud():
    for i in range(564):
        # Mask image generated by PaddleSeg for frame i
        file_name = os.path.join("work/mp4_img_mask/", str(i) + '.png')
        # print(file_name)
        result = os.path.join("work/mp4_img_analysis/", 'result' + str(i) + '.png')
        # print(result)
        # mask_img is not a stock stylecloud parameter; it comes from the modified copy described below
        stylecloud.gen_stylecloud(text=text_content,
                                  font_path='方正兰亭刊黑.TTF',
                                  output_name=result,
                                  background_color="black",
                                  mask_img=file_name)

 

Because the stylecloud library does not accept a custom mask image, Little F modified its code.

A mask_img parameter was added to gen_stylecloud, and it is ultimately passed on to the gen_mask_array function.


 

In this way, the mask picture can be transformed into a word cloud picture!


 

Combine these word cloud images into a video.

import os
import cv2


def combine_image_to_video(comb_path, output_file_path, fps=30, is_print=False):
    '''
        Merge images to video
    '''
    fourcc = cv2.VideoWriter_fourcc(*'mp4v')

    file_items = [item for item in os.listdir(comb_path) if item.endswith('.png')]
    file_len = len(file_items)
    # print(comb_path, file_items)
    if file_len > 0:
        print(file_len)
        temp_img = cv2.imread(os.path.join(comb_path, file_items[0]))
        img_height, img_width, _ = temp_img.shape

        out = cv2.VideoWriter(output_file_path, fourcc, fps, (img_width, img_height))

        for i in range(file_len):
            pic_name = os.path.join(comb_path, 'result' + str(i) + ".png")
            print(pic_name)
            if is_print:
                print(i + 1, '/', file_len, ' ', pic_name)
            img = cv2.imread(pic_name)
            out.write(img)
        out.release()


combine_image_to_video('work/mp4_img_analysis/', 'work/mp4_analysis.mp4', 30)

 

Use ffmpeg to further process the video: crop it, then overlay the original footage.

# Crop the video
ffmpeg -i mp4_analysis_result.mp4 -vf crop=iw:ih/2:0:ih/5 output.mp4

# Overlay the original video on the word cloud video
ffmpeg -i output.mp4 -i videos/Master_Ma.mp4 -filter_complex "[1:v]scale=500:270[v1];[0:v][v1]overlay=1490:10" -s 1920x1080 -c:v libx264 merge.mp4

# Add audio
ffmpeg -i merge.mp4 -i videos/Master_Ma.mp4 -c:v copy -c:a copy work/mp4_analysis_result2.mp4 -y

# Generate a GIF
ffmpeg -ss 00:00:22 -t 3 -i merge.mp4 -r 15 a.gif
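
The crop filter takes its arguments in the order out_w:out_h:x:y, where x and y are the top-left corner of the region that is kept. So crop=iw:ih/2:0:ih/5 keeps the full width and half the height, starting one fifth of the way down. The sketch below works the numbers out for an assumed 1920x1080 input; the real size of the word cloud video depends on your mask images, and the helper name crop_rect is made up for illustration.

```python
def crop_rect(in_w, in_h):
    """Evaluate crop=iw:ih/2:0:ih/5 for a given input size.

    Returns (out_w, out_h, x, y) in pixels, using integer division.
    """
    out_w = in_w        # iw: keep the full width
    out_h = in_h // 2   # ih/2: keep half the height
    x = 0               # start at the left edge
    y = in_h // 5       # ih/5: start one fifth of the way down
    return out_w, out_h, x, y


# Assumed 1920x1080 input, chosen only for illustration
print(crop_rect(1920, 1080))
```

For sizes that do not divide evenly, ffmpeg's own rounding may differ slightly from this integer division.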

 

As for installing and using ffmpeg, you can look that up yourself ~