How to Use Proxy IPs to Scrape Data, with a PHP Crawler for Amazon Product Data

Time:2019-11-17

 

What is a proxy? When are proxy IPs used?

A proxy server obtains network information on behalf of users and then returns it to them. Figuratively speaking, it is a transit station for network information. Visiting the target site through a proxy IP hides the user's real IP.

For example, suppose you want to scrape a website that has 1,000,000 items and enforces a per-IP limit of 1,000 items per hour. With a single IP, that limit means the collection takes about 40 days. If you use proxy IPs and keep switching between them, you can get past the 1,000-per-hour limit and improve efficiency.
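The arithmetic behind the example can be checked with a minimal sketch. The item count and hourly limit come from the text; the function name `crawlDays` and the pool size of 50 IPs are illustrative assumptions, and the sketch assumes the limit applies to each IP independently.

```php
<?php
// Figures from the example in the text
$totalItems   = 1000000; // items on the site
$perIpPerHour = 1000;    // per-IP hourly limit

// Days needed when $ipCount IPs each collect at the full hourly limit.
// Hypothetical helper, not part of any library.
function crawlDays(int $totalItems, int $perIpPerHour, int $ipCount): float
{
    $hours = $totalItems / ($perIpPerHour * $ipCount);
    return $hours / 24;
}

printf("1 IP:   %.1f days\n", crawlDays($totalItems, $perIpPerHour, 1));  // 41.7 days
printf("50 IPs: %.1f days\n", crawlDays($totalItems, $perIpPerHour, 50)); // 0.8 days
```

A single IP needs 1,000 hours, which is where the figure of about 40 days comes from.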

 

Proxy IPs are also used in other scenarios that call for switching IPs or hiding identity, such as SEO.

 

Proxy IPs come as open proxies and private proxies. Open proxies are found by scanning the whole network; they are unstable and not suitable for crawlers, though they are fine for casual use. For capturing data with a crawler, private proxies are the better choice. There are many private proxy providers, and their stability varies. Our company currently uses the private proxy provided by "yiniu cloud".

Our company has a project that captures Amazon data to analyze sales volume, comments, and so on. When scraping Amazon with PHP, pay special attention to the request headers; otherwise the returned data is empty. We used to use other providers' proxies in API mode, but managing the IP pool ourselves was very troublesome. So we chose the crawler proxy provided by yiniu cloud, which works in dynamic forwarding mode: we do not need to manage an IP pool ourselves and can collect data directly, which is very convenient and saves a lot of time.
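As a rough sketch of the header point, browser-like headers can be assembled into the CRLF-separated string that PHP's http stream context accepts in its "header" option. The helper name `buildHeaderString` and the header values are assumptions for illustration; the text does not say which headers Amazon actually checks.

```php
<?php
// Hypothetical helper: join header name/value pairs into the CRLF-separated
// string accepted by the "header" option of the http stream context.
function buildHeaderString(array $headers): string
{
    $lines = [];
    foreach ($headers as $name => $value) {
        $lines[] = "{$name}: {$value}";
    }
    return implode("\r\n", $lines);
}

// Assumed example values, not taken from the original article.
$headerString = buildHeaderString([
    "User-Agent"      => "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Accept-Language" => "en-US,en;q=0.9",
]);

echo $headerString, "\n";
```

The resulting string can be combined with the proxy headers before being passed to `stream_context_create`.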

 

      

        <?php
        // Target pages: an Amazon product page, and an IP echo service
        // for checking which exit IP the proxy used
        $url = "https://www.amazon.com/dp/B01H2S9F6C";
        $urls = "https://httpbin.org/ip";

        // Proxy server
        define("PROXY_SERVER", "tcp://t.16yun.cn:31111");

        // Proxy credentials
        define("PROXY_USER", "16YUN123");
        define("PROXY_PASS", "123456");

        $proxyAuth = base64_encode(PROXY_USER . ":" . PROXY_PASS);

        // Random tunnel number sent in the Proxy-Tunnel header
        $tunnel = rand(1, 10000);

        $headers = implode("\r\n", [
            "Proxy-Authorization: Basic {$proxyAuth}",
            "Proxy-Tunnel: {$tunnel}",
        ]);

        // Build a stream context for one target URL, so that the SNI
        // server name always matches the host actually being requested
        function buildContext($targetUrl, $headers)
        {
            $options = [
                "http" => [
                    "proxy"  => PROXY_SERVER,
                    "header" => $headers,
                    "method" => "GET",
                    "request_fulluri" => true,
                ],
                "ssl" => [
                    "SNI_enabled" => true, // send SNI for HTTPS over an HTTP proxy
                    "SNI_server_name" => parse_url($targetUrl, PHP_URL_HOST),
                ],
            ];
            return stream_context_create($options);
        }

        print($url);
        $result = file_get_contents($url, false, buildContext($url, $headers));
        var_dump($result);

        print($urls);
        $result = file_get_contents($urls, false, buildContext($urls, $headers));
        var_dump($result);
        ?>
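Since an empty result is the failure the text warns about, it may help to check each response before using it. The sketch below is one possible check, not the provider's method: `statusCode` and `isUsable` are hypothetical helpers, and the sample header lines and body are made-up stand-ins for `$result` and the `$http_response_header` array that PHP's HTTP wrapper fills in after `file_get_contents`.

```php
<?php
// Hypothetical helper: pull the status code out of the response header lines.
function statusCode(array $responseHeaders): ?int
{
    $code = null;
    foreach ($responseHeaders as $line) {
        // Status lines look like "HTTP/1.1 200 OK"; keep the last one seen,
        // since redirects add one status line per hop.
        if (preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            $code = (int) $m[1];
        }
    }
    return $code;
}

// Hypothetical helper: a response counts as usable only if the request
// succeeded, the status is 200, and the body is not empty.
function isUsable($body, array $responseHeaders): bool
{
    return $body !== false
        && statusCode($responseHeaders) === 200
        && trim($body) !== "";
}

// Made-up sample data standing in for $result and $http_response_header.
$sampleHeaders = ["HTTP/1.1 200 OK", "Content-Type: application/json"];
$sampleBody    = '{"origin": "203.0.113.7"}';

var_dump(isUsable($sampleBody, $sampleHeaders));    // bool(true)
var_dump(isUsable(false, []));                      // bool(false)
var_dump(json_decode($sampleBody, true)["origin"]); // string(11) "203.0.113.7"
```

The last line reads the exit IP from the https://httpbin.org/ip response format, which is a quick way to confirm that the request went out through the proxy rather than from the local IP.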