Collecting Kuaishou videos with Python: 1080p HD without watermark, complete data source analysis + complete code

Time:2022-1-22

Knowledge points

  • Dynamic packet capture
  • Dynamic page analysis
  • Sending requests with parameters using the requests module
  • JSON data parsing

Development environment

  • Python 3.8: a recent, stable version for running the code
  • PyCharm 2021.2: editor for writing the code
  • requests: third-party module

I. Data source analysis (working out the approach)

1. Open the developer tools and refresh the web page

  • Right-click and choose Inspect, or press F12, to open them

  • Select the Network tab and refresh the web page

  • Click to open a video

  • Click to find the content

  • Expand the response step by step to find the video address we need

2. Determine the URL, request method, request parameters and request headers

  • Request header parameters

  • Request parameters

3. Summary

  • Request method: POST
  • Request headers (to disguise the request as coming from a browser):
headers = {
'content-type': 'application/json',
'cookie': 'your own cookie',
'Host': 'www.kuaishou.com',
'Origin': 'https://www.kuaishou.com',
'Referer': 'https://www.kuaishou.com/profile/3xv78fxycm35nn4',
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.69 Safari/537.36'
}
  • Request parameters:
data = {
'operationName': "visionProfilePhotoList",
'query': "query visionProfilePhotoList($pcursor: String, $userId: String, $page: String, $webPageArea: String) {\n  visionProfilePhotoList(pcursor: $pcursor, userId: $userId, page: $page, webPageArea: $webPageArea) {\n    result\n    llsid\n    webPageArea\n    feeds {\n      type\n      author {\n        id\n        name\n        following\n        headerUrl\n        headerUrls {\n          cdn\n          url\n          __typename\n        }\n        __typename\n      }\n      tags {\n        type\n        name\n        __typename\n      }\n      photo {\n        id\n        duration\n        caption\n        likeCount\n        realLikeCount\n        coverUrl\n        coverUrls {\n          cdn\n          url\n          __typename\n        }\n        photoUrls {\n          cdn\n          url\n          __typename\n        }\n        photoUrl\n        liked\n        timestamp\n        expTag\n        animatedCoverUrl\n        stereoType\n        videoRatio\n        profileUserTopPhoto\n        __typename\n      }\n      canAddComment\n      currentPcursor\n      llsid\n      status\n      __typename\n    }\n    hostName\n    pcursor\n    __typename\n  }\n}\n",
'variables': {'userId': "3x9dquvtb9n9fps", 'pcursor': "", 'page': "profile"}
}
  • To crawl the following pages, send the pcursor value returned by each response with the next request; the code below does this recursively
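Between pages only the `pcursor` value inside `variables` changes, so the request body can be produced by a small helper. The following is a minimal sketch: `build_payload` and its parameter names are hypothetical, and `QUERY` is a placeholder that stands for the full query string listed above.

```python
# Placeholder: paste the full 'query' string from the summary here.
QUERY = "query visionProfilePhotoList(...) { ... }"


def build_payload(user_id, pcursor="", query=QUERY):
    """Return the JSON body for one page of a user's video list (hypothetical helper)."""
    return {
        "operationName": "visionProfilePhotoList",
        "query": query,
        # An empty pcursor requests the first page.
        "variables": {"userId": user_id, "pcursor": pcursor, "page": "profile"},
    }


first_page = build_payload("3x9dquvtb9n9fps")
print(first_page["variables"])
```

The returned dictionary is meant to be passed as the `json=` argument of `requests.post`, exactly like `data` in the code implementation.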

II. Code implementation

1. Send a request to visit the website

import requests

url = 'https://www.kuaishou.com/graphql'
# Disguise the request as coming from a browser
headers = {
    # Tells the server that the request body is a JSON string
    'content-type': 'application/json',
    'Cookie': 'kpf=PC_WEB; kpn=KUAISHOU_VISION; clientid=3; did=web_ea128125517a46bd491ae9ccb255e242; client_key=65890b29; userId=270932146; kuaishou.server.web_st=ChZrdWFpc2hvdS5zZXJ2ZXIud2ViLnN0EqABnjkpJPZ-QanEQnI0XWMVZxXtIqPj-hwjsXBn9DHaTzispQcLjGR-5Xr-rY4VFaIC-egxv508oQoRYdgafhxSBpZYqLnApsaeuAaoLj2xMbRoytYGCrTLF6vVWJvzz3nzBVzNSyrXyhz-RTlRJP4xe1VjSp7XLNLRnVFVEtGPuBz0xkOnemy7-1-k6FEwoPIbOau9qgO5mukNg0qQ2NLz_xoSKS0sDuL1vMmNDXbwL4KX-qDmIiCWJ_fVUQoL5jjg3553H5iUdvpNxx97u6I6MkKEzwOaSigFMAE; kuaishou.server.web_ph=b282f9af819333f3d13e9c45765ed62560a1',
    'Host': 'www.kuaishou.com',
    'Origin': 'https://www.kuaishou.com',
    'Referer': 'https://www.kuaishou.com/profile/3xauthkq46ftgkg',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36',
}
# data is the request parameter dictionary from the summary above
# Printing response shows <Response [200]> when the request succeeded
response = requests.post(url=url, headers=headers, json=data)

2. Obtain data

json_data = response.json()
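Before indexing into `json_data` it can help to confirm that the response really contains the expected list, since an expired cookie or a blocked request may yield a body without it. A minimal sketch, assuming the response shape shown in this article: `get_photo_list` is a hypothetical helper, and the check for a top-level `errors` key follows the general GraphQL convention rather than anything confirmed for this site.

```python
def get_photo_list(json_data):
    """Return json_data['data']['visionProfilePhotoList'] or raise a readable error."""
    if not isinstance(json_data, dict):
        raise ValueError("response body is not a JSON object")
    # GraphQL servers conventionally report failures under a top-level "errors" key.
    if json_data.get("errors"):
        raise ValueError(f"server reported errors: {json_data['errors']}")
    data = json_data.get("data") or {}
    photo_list = data.get("visionProfilePhotoList")
    if not photo_list:
        raise ValueError("no visionProfilePhotoList in response; check cookie and headers")
    return photo_list


# Offline sample with the same nesting as the real response.
sample = {"data": {"visionProfilePhotoList": {"feeds": [], "pcursor": None}}}
print(get_photo_list(sample))
```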

3. Parse the data and extract the content we need

import re

feeds = json_data['data']['visionProfilePhotoList']['feeds']
# Cursor needed to request the next page
pcursor = json_data['data']['visionProfilePhotoList']['pcursor']
# print(pcursor)
for feed in feeds:
    caption = feed['photo']['caption']    # title
    photoUrl = feed['photo']['photoUrl']  # video link
    # The data is JSON, so key lookups are enough; CSS or XPath selectors
    # are only needed when the response is web page source code
    # \ is the escape character, so a single \ in the pattern cannot match a backslash;
    # \\ in a raw string matches one literal backslash
    caption = re.sub(r'[\\/:*?"<>|\n\t]', '', caption)
    print(caption, photoUrl)
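A caption can end up empty after cleaning, or be very long, and either case can make the later `open()` call fail. A minimal sketch of a more defensive cleaner: `safe_filename`, the `fallback` name and the 80-character limit are illustrative assumptions, not part of the original code.

```python
import re

# Characters that Windows does not allow in file names, plus newline and tab.
_INVALID = re.compile(r'[\\/:*?"<>|\n\t]')


def safe_filename(caption, fallback="untitled", max_len=80):
    """Strip characters that are invalid in file names and cap the length."""
    cleaned = _INVALID.sub("", caption).strip()
    # Fall back to a fixed name when nothing usable is left.
    return cleaned[:max_len] or fallback


print(safe_filename('a/b\\c: "demo"?\n'))  # -> abc demo
print(safe_filename("???"))                # -> untitled
```

Two videos with the same caption would still overwrite each other; appending the `id` field from `feed['photo']` to the name is one way to avoid that.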

5. Request the video link; the response content is the binary video data

video_data = requests.get(url=photoUrl).content

6. Save the video in binary mode

# The video/ folder must already exist, otherwise open() raises FileNotFoundError
with open(f'video/{caption}.mp4', mode='wb') as f:
    f.write(video_data)
print(caption, 'download complete!')
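`response.content` loads the whole video into memory before anything is written. For large files, requests can stream the body instead. The following is a sketch under that assumption: `save_chunks` is a hypothetical helper that accepts any iterable of byte chunks, such as the one returned by `response.iter_content(chunk_size=...)`.

```python
import os


def save_chunks(chunks, path):
    """Write an iterable of byte chunks to path and return the number of bytes written."""
    # Create the target folder (for example video/) if it does not exist yet.
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    written = 0
    with open(path, mode="wb") as f:
        for chunk in chunks:
            if chunk:  # skip empty chunks
                f.write(chunk)
                written += len(chunk)
    return written


# Intended use with requests (not executed here):
# with requests.get(photoUrl, stream=True) as response:
#     response.raise_for_status()
#     save_chunks(response.iter_content(chunk_size=64 * 1024), f'video/{caption}.mp4')
```

Because the helper only depends on an iterable of bytes, it can be tried out with a plain list such as `[b"ab", b"cde"]` before pointing it at a real download.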

Crawling multiple pages

def get_page(pcursor):
    # The request body must state which user and which page (pcursor) to fetch
    # Recursion: the function calls itself for the next page and stops when pcursor is None
    data = {
        'operationName': "visionProfilePhotoList",
        'query': "query visionProfilePhotoList($pcursor: String, $userId: String, $page: String, $webPageArea: String) {\n  visionProfilePhotoList(pcursor: $pcursor, userId: $userId, page: $page, webPageArea: $webPageArea) {\n    result\n    llsid\n    webPageArea\n    feeds {\n      type\n      author {\n        id\n        name\n        following\n        headerUrl\n        headerUrls {\n          cdn\n          url\n          __typename\n        }\n        __typename\n      }\n      tags {\n        type\n        name\n        __typename\n      }\n      photo {\n        id\n        duration\n        caption\n        likeCount\n        realLikeCount\n        coverUrl\n        coverUrls {\n          cdn\n          url\n          __typename\n        }\n        photoUrls {\n          cdn\n          url\n          __typename\n        }\n        photoUrl\n        liked\n        timestamp\n        expTag\n        animatedCoverUrl\n        stereoType\n        videoRatio\n        profileUserTopPhoto\n        __typename\n      }\n      canAddComment\n      currentPcursor\n      llsid\n      status\n      __typename\n    }\n    hostName\n    pcursor\n    __typename\n  }\n}\n",
        'variables': {'userId': "3xauthkq46ftgkg", 'pcursor': pcursor, 'page': "profile"}
    }
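    if pcursor is None:
        print('download complete')
        return 0
    # Send the request as in step 1 (url and headers are defined there)
    json_data = requests.post(url=url, headers=headers, json=data).json()
    # Steps 3 to 6 (parse the feeds, download and save each video) go here
    # Take the cursor for the next page from this response, then recurse
    pcursor = json_data['data']['visionProfilePhotoList']['pcursor']
    get_page(pcursor)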

get_page('')
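The same paging logic can also be written as a loop, which avoids running into Python's recursion limit on accounts with many pages. A minimal sketch: `iter_feeds` and `fetch_page` are hypothetical names, `fetch_page(pcursor)` is assumed to send the POST request from step 1 and return the parsed JSON, and the `max_pages` guard is an added safety assumption. The stop condition follows the article's own check (`None`); adjust it if the site signals the last page differently.

```python
def iter_feeds(fetch_page, pcursor="", max_pages=1000):
    """Yield every feed, following pcursor from page to page."""
    for _ in range(max_pages):
        if pcursor is None:  # the article's stop condition
            return
        photo_list = fetch_page(pcursor)["data"]["visionProfilePhotoList"]
        yield from photo_list["feeds"]
        # The response tells us which cursor to send next.
        pcursor = photo_list["pcursor"]


# Stand-in for the real request so the sketch runs offline.
fake_pages = {
    "": {"feeds": [{"photo": {"caption": "first"}}], "pcursor": "p2"},
    "p2": {"feeds": [{"photo": {"caption": "second"}}], "pcursor": None},
}


def fake_fetch(pcursor):
    return {"data": {"visionProfilePhotoList": fake_pages[pcursor]}}


for feed in iter_feeds(fake_fetch):
    print(feed["photo"]["caption"])
```

In real use, `fake_fetch` would be replaced by a function that builds the request body for the given `pcursor` and calls `requests.post(url=url, headers=headers, json=data).json()`.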

