This article shows how to use the pthreads v3 extension in PHP to crawl Sina News with multiple threads. The details are as follows:
We use pthreads to write a small multithreaded page crawler and store the results in a database.
The table structure is as follows:
CREATE TABLE `tb_sina` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT COMMENT 'ID',
`url` varchar(256) DEFAULT '' COMMENT 'URL address',
`title` varchar(128) DEFAULT '' COMMENT 'title',
`time` datetime DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP COMMENT 'time',
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='sina news';
The code is as follows:
<?php
class DB extends Worker
{
private static $db;
private $dsn;
private $root;
private $pwd;
public function __construct($dsn, $root, $pwd)
{
$this->dsn = $dsn;
$this->root = $root;
$this->pwd = $pwd;
}
public function run()
{
//Create connection object
self::$db = new PDO($this->dsn, $this->root, $this->pwd);
//Load the autoloader inside the worker thread, not in the main thread; otherwise the classes will not be found in the worker and an error is reported
require './vendor/autoload.php';
}
//Return the PDO connection object created in run()
public function getConn()
{
return self::$db;
}
}
class Sina extends Thread
{
private $name;
private $url;
public function __construct($name, $url)
{
$this->name = $name;
$this->url = $url;
}
public function run()
{
$db = $this->worker->getConn();
if (empty($db) || empty($this->url)) {
return false;
}
$content = file_get_contents($this->url);
if (!empty($content)) {
//Get title, address, time
$data = QL\QueryList::Query($content, [
'tit' => ['.c_tit > a', 'text'],
'url' => ['.c_tit > a', 'href'],
'time' => ['.c_time', 'text'],
], '', 'UTF-8', 'GB2312')->getData();
//Insert the acquired data into the database
if (!empty($data)) {
$sql = 'INSERT INTO tb_sina(`url`, `title`, `time`) VALUES';
foreach ($data as $row) {
//Modify the time. Sina's time format is 04-23 15:30
$time = date('Y') . '-' . $row['time'] . ':00';
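//Quote each value so that a quote character in a title or URL cannot break the statement
$sql .= '(' . $db->quote($row['url']) . ', ' . $db->quote($row['tit']) . ', ' . $db->quote($time) . '),';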
}
$sql = rtrim($sql, ',');
$ret = $db->exec($sql);
if ($ret !== false) {
echo "thread {$this->name} successfully inserted {$ret} pieces of data\n";
} else {
var_dump($db->errorInfo());
}
}
}
}
}
//Grab page address
$url = 'http://roll.news.sina.com.cn/s/channel.php?ch=01#col=89&spec=&type=&ch=01&k=&offset_page=0&offset_num=0&num=60&asc=&page=';
//Create pool
$pool = new Pool(5, 'DB', ['mysql:dbname=test;host=192.168.33.226', 'root', '']);
//Get 100 paging data
for ($ix = 1; $ix <= 100; $ix++) {
$pool->submit(new Sina($ix, $url . $ix));
}
//Collect finished tasks in a loop, blocking the main thread until the child threads have finished
while ($pool->collect()) ;
$pool->shutdown();
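An alternative to assembling the value list by string concatenation is a single multi-row INSERT with placeholders. The following is only a minimal sketch, assuming the same tb_sina columns and the same row keys (url, tit, time) as above; normalizeSinaTime() and buildInsert() are hypothetical helper names introduced for illustration, and the sample rows and the year '2019' are made-up values.

```php
<?php
// Hypothetical helper: turn Sina's "04-23 15:30" into "YYYY-04-23 15:30:00".
function normalizeSinaTime(string $raw, string $year): string
{
    return $year . '-' . trim($raw) . ':00';
}

// Hypothetical helper: build one multi-row INSERT with "?" placeholders
// and the flat list of values to bind to them.
function buildInsert(array $rows, string $year): array
{
    $placeholders = [];
    $params = [];
    foreach ($rows as $row) {
        $placeholders[] = '(?, ?, ?)';
        $params[] = $row['url'];
        $params[] = $row['tit'];
        $params[] = normalizeSinaTime($row['time'], $year);
    }
    $sql = 'INSERT INTO tb_sina(`url`, `title`, `time`) VALUES '
        . implode(', ', $placeholders);
    return [$sql, $params];
}

// Sample rows in the shape returned by the scraping step (values are made up).
$rows = [
    ['url' => 'http://example.com/a', 'tit' => "It's a title", 'time' => '04-23 15:30'],
    ['url' => 'http://example.com/b', 'tit' => 'Another', 'time' => '04-23 15:31'],
];
list($sql, $params) = buildInsert($rows, '2019');
echo $sql, "\n";
```

Inside Sina::run() the result could then be executed with $db->prepare($sql)->execute($params), passing date('Y') as the year, so that the driver handles the escaping of titles and URLs.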
The script depends on QueryList, which can be installed through Composer:
composer require jaeger/querylist
However, the version installed this way was 3.2, which causes problems under PHP 7.2. Because each() is deprecated as of PHP 7.2, I modified the library source and replaced the each() calls with foreach loops.
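The each() to foreach change usually looks like the following. This is a minimal sketch of the general pattern, not the actual QueryList source; the $rules array and its contents are illustrative assumptions.

```php
<?php
// Illustrative data only; not taken from the QueryList source.
$rules = ['tit' => '.c_tit > a', 'time' => '.c_time'];

// Old pattern (deprecated in PHP 7.2, removed in PHP 8.0):
// reset($rules);
// while (list($key, $selector) = each($rules)) { ... }

// Equivalent foreach loop over keys and values:
$seen = [];
foreach ($rules as $key => $selector) {
    $seen[] = $key . '=' . $selector;
}
echo implode("\n", $seen), "\n";
```

The rewrite is a drop-in replacement as long as the loop body does not depend on the array's internal pointer, which foreach does not advance.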
When the script runs, each thread prints how many rows it inserted, and the data is stored in the tb_sina table.
Of course, you can also fetch the full content of each article by requesting its URL again. That is not demonstrated here; interested readers can implement it themselves.
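As a starting point for that second step, here is a rough sketch that pulls the text out of one container element with PHP's built-in DOM extension. The article does not show the markup of Sina's article pages, so the container id "article", the sample HTML, and the helper name extractArticleText() are all assumptions made for illustration.

```php
<?php
// Hypothetical helper: return the text of the first element with the given id,
// or null when no such element exists. $containerId is expected to be a
// trusted constant, since it is placed directly into the XPath expression.
function extractArticleText(string $html, string $containerId): ?string
{
    $doc = new DOMDocument();
    // Real pages are rarely valid HTML, so collect parser warnings silently.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query('//*[@id="' . $containerId . '"]');
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->textContent);
}

// Made-up sample markup standing in for a downloaded article page.
$sample = '<html><body><div id="article"><p>First paragraph.</p><p>Second.</p></div></body></html>';
echo extractArticleText($sample, 'article'), "\n";
```

In the crawler, the HTML would come from file_get_contents($row['url']) inside another Thread task, and pages served as GB2312 would need converting to UTF-8 before parsing.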
I hope this article helps you with PHP programming.