PHP pthreads v3 multithreading example: capturing Sina News information

Time:2021-6-22

This article shows how to use pthreads v3 multithreading in PHP to capture Sina News listings. The details are as follows:

We use pthreads to write a small multithreaded page scraper and store the results in a database.

The data table structure is as follows:

CREATE TABLE `tb_sina` (
 `id` int(11) unsigned NOT NULL AUTO_INCREMENT COMMENT 'ID',
 `url` varchar(256) DEFAULT '' COMMENT 'URL address',
 `title` varchar(128) DEFAULT '' COMMENT 'title',
 `time` datetime DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP COMMENT 'time',
 PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='sina news';

The code is as follows:

<?php

class DB extends Worker
{
  private static $db;
  private $dsn;
  private $root;
  private $pwd;

  public function __construct($dsn, $root, $pwd)
  {
    $this->dsn = $dsn;
    $this->root = $root;
    $this->pwd = $pwd;
  }

  public function run()
  {
    //Create the PDO connection inside the worker; in pthreads a static property acts as thread-local storage
    self::$db = new PDO($this->dsn, $this->root, $this->pwd);

    //Load the autoloader in the worker thread, not in the main thread; otherwise the QueryList class will not be found
    require './vendor/autoload.php';
  }

  //Return the PDO connection of the current worker
  public function getConn()
  {
    return self::$db;
  }
}

class Sina extends Thread
{
  private $name;
  private $url;

  public function __construct($name, $url)
  {
    $this->name = $name;
    $this->url = $url;
  }

  public function run()
  {
    $db = $this->worker->getConn();

    if (empty($db) || empty($this->url)) {
      return false;
    }

    $content = file_get_contents($this->url);
    if (!empty($content)) {
      //Get title, address, time
      $data = QL\QueryList::Query($content, [
        'tit' => ['.c_tit > a', 'text'],
        'url' => ['.c_tit > a', 'href'],
        'time' => ['.c_time', 'text'],
      ], '', 'UTF-8', 'GB2312')->getData();

      //Insert the acquired data into the database
      if (!empty($data)) {
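        $sql = 'INSERT INTO tb_sina(`url`, `title`, `time`) VALUES';
        foreach ($data as $row) {
          //Sina's time format is "04-23 15:30": prepend the year and append the seconds
          $time = date('Y') . '-' . $row['time'] . ':00';
          //Quote every value so that a quote character in a title cannot break the statement
          $sql .= '(' . $db->quote($row['url']) . ', ' . $db->quote($row['tit']) . ', ' . $db->quote($time) . '),';
        }
        $sql = rtrim($sql, ',');
        $ret = $db->exec($sql);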

        if ($ret !== false) {
          echo "Thread {$this->name} successfully inserted {$ret} rows\n";
        } else {
          var_dump($db->errorInfo());
        }
      }
    }
  }
}

//Address of the list page to grab; the page number is appended
$url = 'http://roll.news.sina.com.cn/s/channel.php?ch=01#col=89&spec=&type=&ch=01&k=&offset_page=0&offset_num=0&num=60&asc=&page=';
//Create pool
$pool = new Pool(5, 'DB', ['mysql:dbname=test;host=192.168.33.226', 'root', '']);

//Submit 100 list pages as tasks
for ($ix = 1; $ix <= 100; $ix++) {
  $pool->submit(new Sina($ix, $url . $ix));
}

//Collect finished tasks in a loop; this blocks the main thread until the child threads are done
while ($pool->collect()) ;
$pool->shutdown();
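
Sina::run() above assembles the INSERT statement as one string. As a minimal sketch, and not the author's code, the same rows could instead be turned into a statement with placeholders and handed to PDO::prepare() and execute(). The helper names normalizeSinaTime() and buildInsert() are hypothetical, and the year is passed in explicitly (an assumption) so that the result does not depend on the current date.

```php
<?php

// Hypothetical helper: turn Sina's "04-23 15:30" into "YYYY-04-23 15:30:00".
// Returns null when the input does not match the expected format.
function normalizeSinaTime(string $raw, int $year): ?string
{
    if (!preg_match('/^(\d{2})-(\d{2}) (\d{2}):(\d{2})$/', trim($raw), $m)) {
        return null;
    }
    return sprintf('%04d-%s-%s %s:%s:00', $year, $m[1], $m[2], $m[3], $m[4]);
}

// Hypothetical helper: build a multi-row INSERT with "?" placeholders
// and the flat parameter list that goes with it.
// The caller is expected to pass a non-empty $rows array.
function buildInsert(array $rows, int $year): array
{
    $placeholders = [];
    $params = [];
    foreach ($rows as $row) {
        $placeholders[] = '(?, ?, ?)';
        $params[] = $row['url'];
        $params[] = $row['tit'];
        $params[] = normalizeSinaTime($row['time'], $year);
    }
    $sql = 'INSERT INTO tb_sina(`url`, `title`, `time`) VALUES '
         . implode(', ', $placeholders);
    return [$sql, $params];
}

// Possible use inside a task, assuming $db is the PDO object from the worker:
// [$sql, $params] = buildInsert($data, (int) date('Y'));
// $stmt = $db->prepare($sql);
// $stmt->execute($params);
```

With placeholders the driver handles escaping, so a title containing a quote character does not need special treatment.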

The code depends on QueryList, which can be installed with Composer:

composer require jaeger/querylist

However, the version installed is 3.2, which causes problems on my PHP 7.2: each() is deprecated as of PHP 7.2, so I modified the library source and replaced each() with foreach.
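
The replacement is mechanical. As an illustration only (this is not QueryList's actual source; the function name and the sample array are made up), a loop written with each() and its foreach equivalent look like this:

```php
<?php

// Old style, deprecated since PHP 7.2 and removed in PHP 8.0:
//   reset($rules);
//   while (list($key, $value) = each($rules)) { ... }

// Equivalent loop using foreach; it visits the same key/value pairs in order.
function collectPairs(array $rules): array
{
    $pairs = [];
    foreach ($rules as $key => $value) {
        $pairs[] = $key . '=' . $value;
    }
    return $pairs;
}

print_r(collectPairs(['tit' => 'text', 'url' => 'href']));
```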

Running the script prints one line per task with the number of rows it inserted, and the data is stored in the database.

Of course, you can also fetch each stored URL again to get the full content of the page. We don't demonstrate that here; interested readers can implement it themselves.
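
As a rough starting point for that follow-up step, the sketch below pulls the text of one element out of an HTML string using PHP's built-in DOM extension instead of QueryList. The helper name extractById() and the element id in the sample are hypothetical; the real id or selector on Sina's article pages is not known here and would have to be checked, and character-set conversion (the list page is decoded from GB2312 above) is left out.

```php
<?php

// Hypothetical helper: return the trimmed text of the element with the
// given id, or null if the page has no such element.
function extractById(string $html, string $id): ?string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // real-world HTML is rarely valid
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query('//*[@id="' . $id . '"]');
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->textContent);
}

// Sample input; in the scraper the HTML would come from file_get_contents($url).
$html = '<html><body><div id="content"> Hello </div></body></html>';
var_dump(extractById($html, 'content'));
```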


I hope this article helps with your PHP programming.
