Node series – 006 – Puppeteer

Time: 2021-10-19

After the groundwork laid in the previous five articles, it’s time to hook up some more down-to-earth services.

Starting with this article, we will work on front-end multilingual support.

I don’t know whether you have worked with multilingual features before. This time, jsliang will take the multilingual processing from a real project, desensitize it (yes, only the processing, not how to configure multilingual support) and share it.

The data used in this article is fictitious and only modeled on real projects. After all, this is just a [tool library], not a project that itself supports multiple languages. Still, with some modification this toolset can be used elsewhere, so it has reference value.

In this article, we will explain how to use Puppeteer to control Chrome / Chromium to download files.

I. Preface

Puppeteer is a Node library that provides a high-level API to control Chromium or Chrome over the DevTools Protocol.

As its GitHub introduction puts it: most things that you can do manually in the browser can be done using Puppeteer!

  • Capture page screenshots
  • Generate PDFs of pages
  • Automate DOM operations on a page
  • ……

You can look through the GitHub repository or the Chinese documentation listed in the references below. I won’t list examples one by one here, so that this doesn’t turn into a copy of the README.md.

II. Puppeteer

  • Installation: npm i puppeteer

jsliang hit an error during installation:

  • (node:7584) ExperimentalWarning: The fs.promises API is experimental

My Node.js version is [email protected], so Node.js needs to be upgraded.

After looking into it, there are two ways to upgrade: download the latest version and install it over the old one, or manage versions through nvm / nvmw.

jsliang’s network is decent, so I downloaded the latest version directly from the Node.js official website.

Check the version after installation:

  • node -v: v14.17.1
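If the tool should refuse to run on an outdated Node.js instead of surfacing warnings like the one above, the same check can be done in code. This is a minimal sketch: `isAtLeast` is a hypothetical helper, and any minimum version passed to it is your own choice, not a requirement stated in this series.

```typescript
// Compare dotted version strings such as "v14.17.1" numerically.
// `isAtLeast` is a hypothetical helper, not part of Node.js or Puppeteer.
function isAtLeast(current: string, minimum: string): boolean {
  // Strip a leading "v" and turn "14.17.1" into [14, 17, 1].
  const parse = (v: string): number[] =>
    v.replace(/^v/, '').split('.').map((part) => parseInt(part, 10) || 0);

  const a = parse(current);
  const b = parse(minimum);
  for (let i = 0; i < Math.max(a.length, b.length); i++) {
    const x = a[i] ?? 0; // missing segments count as 0
    const y = b[i] ?? 0;
    if (x !== y) {
      return x > y;
    }
  }
  return true; // identical versions
}
```

At startup this could be called as isAtLeast(process.version, '12.0.0'), where '12.0.0' is only a placeholder for whatever minimum you decide on.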

Installing Puppeteer again now succeeds, and package.json shows: "puppeteer": "^10.0.0"

Various errors may occur while installing Puppeteer. This is where your network speed gets tested.

Once it’s installed, let’s get started~

2.1 Snapshot capture

Let’s take a snapshot of a page as a simple example:

src/index.ts

import program from 'commander';
import common from './common';
import './base/console';
import puppeteer from 'puppeteer';

program
  .version('0.0.1')
  .description('Tool library');

program
  .command('jsliang')
  .description('jsliang help instructions')
  .action(() => {
    common();
  });

program
  .command('test')
  .description('Test channel')
  .action(async () => {
    // Launch the browser
    const browser = await puppeteer.launch({
      headless: false, // open a visible (non-headless) browser window
    });

    // Create a new tab and open the page
    const page = await browser.newPage();
    await page.goto('https://www.baidu.com/s?wd=jsliang');

    // Take a screenshot and store it locally
    await page.screenshot({
      path: './src/baidu.png',
    });

    // Close the browser
    await browser.close();
  });

program.parse(process.argv);

After running npm run test, an image file baidu.png appears in the src folder. Opening it shows the following:

[Image: the captured screenshot baidu.png]

In my testing, VPN/proxy tools and 360 Security Guard interfere with this operation. To keep your blood pressure from soaring, make sure such software is turned off.

That gives us a first impression of Puppeteer. It can of course also export PDFs and more; see the [References] below to learn more about Puppeteer.

2.2 Downloading files

Since we can take screenshots, it’s no surprise that we can operate the DOM too. Let’s bring an online file down to local disk!

Take Jinshan Docs as an example. Let’s create an Excel file first:

[Image: creating an Excel file in Jinshan Docs]

How you create it is up to you, so I won’t explain it here. Jinshan Docs address: https://www.kdocs.cn/

The next step is to download the Excel file (assume someone has already been invited to do the translation work). This is the Excel file:

[Image: the Excel file with translations]

The picture comes from the Internet and is shared here for reference only. It will be removed in case of infringement.

Then let’s make a simple one:

[Image: the simple Excel file made for this demo]

What the multilingual content looks like doesn’t matter. Our goal is to drive Puppeteer to fetch this Excel file.

OK, now that we have the file, how do we download it? The situation is:

  • If we open it through Puppeteer, the browser behaves almost like incognito mode. With a normal login flow we would have to log in again, open the link, and only then click the button to download.

Therefore, the login-free link of Jinshan Docs is used here:

[Image: the login-free link setting in Jinshan Docs]

As we all know, login-free means you don’t need to log in. The explanation sounds silly, but I feel it’s worth spelling out.

The demo address is provided below so you can practice with it, but I can’t promise this link won’t be deleted someday, so it’s best to set up your own by following the steps above!

  • [Jinshan Docs: Excel trial file.xlsx]: https://www.kdocs.cn/l/sdwvJUKBzkK2

OK, with all those preconditions covered, let’s get to the point: how to download the online file.

  1. Have the browser open https://www.kdocs.cn/l/sdwvJUKBzkK2
  2. Sleep 6.66 s (make sure the browser has opened the link and loaded the page)
  3. Click the [more menu] button
  4. Sleep 2 s (make sure the more menu has opened)
  5. Set the download path (fix the download location in advance, otherwise the pop-up window is hard to handle)
  6. Finally, click the [Download] button
  7. Sleep 10 s (make sure the file has finished downloading)
  8. Close the window

The only step that needs attention is step 5: on Windows, clicking download brings up a pop-up window instead of saving to the default location, so we need to set the download path in advance (as reflected in the code).

[Image]
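The fixed 10 s sleep in step 7 is either wasted time or too short on a slow network. One alternative, sketched here under assumptions, is to poll the download folder until a finished file shows up. `waitForDownload` is a hypothetical helper, and it relies on Chrome / Chromium naming unfinished downloads with a .crdownload suffix.

```typescript
import * as fs from 'fs';

// Poll `dir` until it contains a file that is not a partial download.
// Resolves with the file name, or null if `timeoutMs` elapses first.
// Assumption: in-progress downloads carry a ".crdownload" suffix.
async function waitForDownload(
  dir: string,
  timeoutMs = 10000,
  pollMs = 200,
): Promise<string | null> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const finished = fs.existsSync(dir)
      ? fs.readdirSync(dir).filter((name) => !name.endsWith('.crdownload'))
      : [];
    const first = finished[0];
    if (first !== undefined) {
      return first;
    }
    // Nothing finished yet: wait one polling interval and look again.
    await new Promise((resolve) => setTimeout(resolve, pollMs));
  }
  return null;
}
```

This sketch assumes the folder starts out empty; if it already holds older files, the first of those is returned immediately.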

So, the code!

src/common/index.ts

import { inquirer } from '../base/inquirer';
import { Result } from '../base/interface';
import { sortCatalog } from './sortCatalog';
import { downLoadExcel } from './downLoadExcel';

const common = (): void => {
  // Question route: see questionList.ts
  const questionList = [
    // q0
    {
      type: 'list',
      message: 'What service do you need?',
      choices: ['public service', 'file management'],
    },
    // q1
    {
      type: 'list',
      message: 'Current public services are:',
      choices: ['file sorting'],
    },
    // q2
    {
      type: 'input',
      message: 'Which folder should be sorted? (absolute path)',
    },
    // q3
    {
      type: 'list',
      message: 'What support do you need?',
      choices: ['multilingual', 'markdown to word'],
    },
    // q4
    {
      type: 'list',
      message: 'What support do you need?',
      choices: [
        'download multilingual resources',
        'import multilingual resources',
        'export multilingual resources',
      ],
    },
    // q5
    {
      type: 'input',
      message: 'Resource download address (HTTP)?',
      default: 'https://www.kdocs.cn/l/sdwvJUKBzkK2',
    },
  ];

  const answerList = [
    // q0
    async (result: Result, questions: any) => {
      if (result.answer === 'public service') {
        questions[1]();
      } else if (result.answer === 'file management') {
        questions[3]();
      }
    },
    // q1
    async (result: Result, questions: any) => {
      if (result.answer === 'file sorting') {
        questions[2]();
      }
    },
    // q2
    async (result: Result, _questions: any, prompts: any) => {
      const sortResult = await sortCatalog(result.answer);
      if (sortResult) {
        console.log('Sorting succeeded!');
        prompts.complete();
      }
    },
    // q3
    async (result: Result, questions: any) => {
      if (result.answer === 'multilingual') {
        questions[4]();
      }
    },
    // q4
    async (result: Result, questions: any) => {
      if (result.answer === 'download multilingual resources') {
        questions[5]();
      }
    },
    // q5
    async (result: Result, _questions: any, prompts: any) => {
      if (result.answer) {
        const downloadResult = await downLoadExcel(result.answer);
        if (downloadResult) {
          console.log('Download succeeded!');
          prompts.complete();
        }
      }
    },
  ];

  inquirer(questionList, answerList);
};

export default common;

Looking at the code above, I regret it. Why did I make inquirer.ts so ugly that jsliang now has to write a separate file just to document the question order? Let’s straighten out the question order:

src/common/questionList.ts

// Question route of the common module
export const questionList = {
  'public service': { // q0
    'file sorting': { // q1
      'folders to be sorted': 'work', // q2
    },
  },
  'file management': { // q0
    'multilingual': { // q3
      'download multilingual resources': { // q4
        'download address': 'work', // q5
      },
      'import multilingual resources': { // q4
        'import address': 'work',
      },
      'export multilingual resources': { // q4
        'export full resources': 'work',
        'export single door resource': 'work',
      },
    },
    'markdown to word': 'not supported yet', // q3
  },
};
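Because the route is a plain nested object, it can also be walked in code, for example to print every path from the first question to the final action. This is a small sketch: `RouteTree`, `listRoutes` and the trimmed-down `sample` tree are illustrative names and data, not part of the original tool.

```typescript
// A route node is either a nested menu or a leaf status string such as 'work'.
type RouteTree = { [choice: string]: RouteTree | string };

// Collect every path through the tree as "a > b > c: status".
function listRoutes(tree: RouteTree, prefix: string[] = []): string[] {
  const routes: string[] = [];
  for (const choice of Object.keys(tree)) {
    const next = tree[choice];
    const trail = prefix.concat(choice);
    if (typeof next === 'string') {
      routes.push(`${trail.join(' > ')}: ${next}`);
    } else if (next !== undefined) {
      routes.push(...listRoutes(next, trail));
    }
  }
  return routes;
}

// Trimmed-down copy of the route, for illustration only.
const sample: RouteTree = {
  'public service': {
    'file sorting': { 'folders to be sorted': 'work' },
  },
  'file management': {
    'markdown to word': 'not supported yet',
  },
};

console.log(listRoutes(sample).join('\n'));
```

Printing the routes this way gives a quick check that every branch of the question list ends in a defined action.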

With that written down, let’s move on to writing the feature:

src/common/downLoadExcel.ts

import puppeteer from 'puppeteer';
import path from 'path';
import fs from 'fs';

export const downLoadExcel = async (link: string): Promise<boolean> => {
  // Launch the browser
  const browser = await puppeteer.launch({
    headless: false, // open a visible (non-headless) browser window
    devtools: true, // open DevTools for debugging
  });

  // 1. Create a new tab and open the link
  const page = await browser.newPage();
  await page.goto(link);

  // 2. Sleep 6.66 s - make sure the page has loaded
  await page.waitForTimeout(6666);

  // 3. Click the [more menu] button
  const moreBtn = await page.$('.header-more-btn');
  await moreBtn?.click();

  // 4. Sleep 2 s - make sure the menu has opened
  await page.waitForTimeout(2000);

  // 5. Set the download path (creating the folder if needed)
  const dist = path.join(__dirname, './dist');
  if (!fs.existsSync(dist)) {
    fs.mkdirSync(dist);
  }
  // Note: _client is a private Puppeteer property, hence the cast to any
  await (page as any)._client?.send('Page.setDownloadBehavior', {
    behavior: 'allow',
    downloadPath: dist,
  });

  // 6. Click the [Download] button (the 9th menu item on this page)
  const elements = await page.$$('.header-menu-item');
  const downloadBtn = elements[8];
  if (!downloadBtn) {
    console.error('Download button not found');
    await browser.close();
    return false; // stop here: the page is gone once the browser closes
  }
  await downloadBtn.click();

  // 7. Sleep 10 s - make sure the file has finished downloading
  await page.waitForTimeout(10000);

  // 8. Close the browser
  await browser.close();

  return true;
};
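Picking the download entry with elements[8] depends on the menu order never changing. A hedged alternative is to match on the visible label instead. In this sketch `MenuItem`, `findByLabel` and any label passed to it are hypothetical; with Puppeteer, an element handle could be adapted to this shape by reading its textContent through handle.evaluate and forwarding click.

```typescript
// Minimal shape this sketch needs; a hypothetical adapter, not a Puppeteer type.
interface MenuItem {
  getText(): Promise<string>;
  click(): Promise<void>;
}

// Return the first item whose trimmed label equals `label`, or null if none does.
async function findByLabel(
  items: MenuItem[],
  label: string,
): Promise<MenuItem | null> {
  for (const item of items) {
    const text = (await item.getText()).trim();
    if (text === label) {
      return item;
    }
  }
  return null;
}
```

The trade-off is that the lookup now depends on the menu text, so it has to be updated if the page wording or language changes.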

After running this, if the console reports no errors, VS Code shows:

[Image: VS Code file tree after the download]

As you can see, dist/Excel trial file.xlsx now exists under the common directory. With that in place, we can next bring in the node-xlsx library to work with the Excel file~

See you next time!

III. References

  • GitHub: Puppeteer
  • Puppeteer
  • Puppeteer: a front-end power tool
  • Introduction to crawling with Puppeteer

jsliang’s document library by Liang Junrong is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Based on work at https://github.com/LiangJunrong/document-library.
Permissions beyond the scope of this license may be available at https://creativecommons.org/licenses/by-nc-sa/2.5/cn/.