Basic use of curl for sending simulated POST requests in PHP

Time:2022-7-28
Contents
  • Basic usage steps of curl
  • Common settings of curl
    • Set basic information:
    • Set post data information:
    • Set authentication information:
    • Set reinforcement information:
  • Basic use of curl batch processing function
    • The problem of too much memory in curl batch processing
      • Memory optimization scheme for curl batch processing
        • Memory optimization results of curl batch processing

          Basic usage steps of curl

          First, let’s introduce curl:

          curl simulates the way a browser transfers data by sending requests with the HTTP headers you specify. It supports FTP, FTPS, HTTP, HTTPS, DICT, FILE and other protocols, and it handles HTTPS certificate verification, HTTP POST, HTTP PUT, FTP upload, HTTP form upload, proxy servers, cookies, username/password authentication and more. This makes curl a handy tool for fetching web pages and posting data.

          Using curl involves four main steps:

          1. Initialize curl.

          2. Set the curl options. This is the core of curl: every extended feature depends on this step.

          3. Execute curl and get the result.

          4. Close the connection and release resources.

          
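          $ch = curl_init(); // 1. initialize
          
          curl_setopt($ch, CURLOPT_URL, "http://localhost"); // 2. set options
          
          $output = curl_exec($ch); // 3. execute; without CURLOPT_RETURNTRANSFER the response is printed and true is returned
          
          curl_close($ch); // 4. close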
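The four steps above can be wrapped in a small helper. The following is only a minimal sketch, not the article's own code: the function name `http_get` is hypothetical, and it assumes CURLOPT_RETURNTRANSFER is set (this option is described in the settings section) so that curl_exec() returns the body as a string. It also checks curl_errno() and curl_error() so that failures are not silently ignored.

```php
<?php
// Hypothetical helper: fetch a URL and return the response body as a string.
function http_get(string $url): string
{
    $ch = curl_init();                               // 1. initialize
    curl_setopt($ch, CURLOPT_URL, $url);             // 2. set options
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  //    return the body instead of printing it
    $output = curl_exec($ch);                        // 3. execute
    $errno  = curl_errno($ch);                       //    0 means no error
    $error  = curl_error($ch);
    curl_close($ch);                                 // 4. close and release resources

    if ($output === false) {
        throw new RuntimeException("cURL error $errno: $error");
    }
    return $output;
}
```

A call such as `echo http_get("http://localhost");` would then print the page, or throw if the request fails.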

          In addition, calling $info = curl_getinfo($ch) after execution returns an array of information about the transfer.

          The $info array includes the following keys:

          • “url” // URL of the resource
          • “content_type” // Content-Type of the response
          • “http_code” // HTTP status code
          • “filetime” // remote modification time of the document (-1 if unknown)
          • “total_time” // total transfer time in seconds
          • “size_upload” // size of the uploaded data
          • “size_download” // size of the downloaded data
          • “speed_download” // average download speed
          • “speed_upload” // average upload speed
          • “download_content_length” // length of the downloaded content, read from the Content-Length header
          • “upload_content_length” // specified length of the uploaded content
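As a rough illustration of how this array might be used, the sketch below formats a few of its keys into one log line. Both function names (`summarize_info`, `fetch_info`) are hypothetical and chosen only for this example. The array keys are the lowercase names that curl_getinfo() actually returns.

```php
<?php
// Hypothetical helper: turn the array returned by curl_getinfo() into one log line.
function summarize_info(array $info): string
{
    return sprintf(
        '%s -> HTTP %d, %d bytes in %.3f s',
        $info['url'] ?? '(unknown)',
        $info['http_code'] ?? 0,
        $info['size_download'] ?? 0,
        $info['total_time'] ?? 0.0
    );
}

// Usage sketch: curl_getinfo() is called after curl_exec() and before curl_close().
function fetch_info(string $url): array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_exec($ch);
    $info = curl_getinfo($ch);   // associative array of transfer details
    curl_close($ch);
    return $info;
}
```

For example, `echo summarize_info(fetch_info("http://localhost"));` would print a one-line summary of the request.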

          Common settings of curl

          The following describes the options commonly set in the second step. Set them according to your needs.

          Set basic information:

          curl_setopt($ch,CURLOPT_URL,$string);// Set the URL to request

          curl_setopt($ch,CURLOPT_PORT,$port);// Set the connection port. Usually left unset, so the protocol default is used (80 for HTTP)

          curl_setopt($ch,CURLOPT_RETURNTRANSFER,1);// Return the response from curl_exec() as a string instead of printing it. This is usually set so that the fetched data can be processed afterwards.

          Set post data information:

          curl_setopt($ch,CURLOPT_POST,1);// Send the request with the POST method

          curl_setopt($ch,CURLOPT_POSTFIELDS,$string);// Set the data to send in the request body
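Putting the two POST options together might look like the sketch below. The helper names `build_post_options` and `post_form` are hypothetical, and any URL or field names passed in are placeholders. The fields are encoded with http_build_query() so that the body is sent as application/x-www-form-urlencoded.

```php
<?php
// Hypothetical helper: build the option array for a form-encoded POST request.
function build_post_options(string $url, array $fields): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_POST           => true,
        // A string body is sent as application/x-www-form-urlencoded;
        // passing the array itself would be sent as multipart/form-data.
        CURLOPT_POSTFIELDS     => http_build_query($fields),
        CURLOPT_RETURNTRANSFER => true,
    ];
}

// Usage sketch: apply all options at once and execute the request.
function post_form(string $url, array $fields)
{
    $ch = curl_init();
    curl_setopt_array($ch, build_post_options($url, $fields));
    $response = curl_exec($ch);   // response body, or false on failure
    curl_close($ch);
    return $response;
}
```

Keeping the option array separate from the request makes it easy to inspect what will be sent before any network traffic happens.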

          Set authentication information:

          curl_setopt($ch,CURLOPT_COOKIE,$string);// Set the cookies sent with the request

          curl_setopt($ch,CURLOPT_USERAGENT,$string);// Set the User-Agent header, i.e. the browser that curl pretends to be

          curl_setopt($ch,CURLOPT_REFERER,$string);// Set the Referer header, which some sites check for hotlink protection

          curl_setopt($ch,CURLOPT_USERPWD,$string);// Pass the user name and password required for the connection, in the format "[username]:[password]"

          curl_setopt($ch,CURLOPT_FOLLOWLOCATION,1);// Follow redirects sent by the server
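The cookie string for CURLOPT_COOKIE uses the form "name=value; name2=value2". The sketch below builds such a string and combines it with the other identity options. The function names are hypothetical, and the cookie values are assumed to be already in a cookie-safe form, so no extra encoding is applied.

```php
<?php
// Hypothetical helper: format cookies as "name=value; name2=value2" for CURLOPT_COOKIE.
// Assumption: the values are already cookie-safe, so they are not encoded here.
function build_cookie_string(array $cookies): string
{
    $pairs = [];
    foreach ($cookies as $name => $value) {
        $pairs[] = $name . '=' . $value;
    }
    return implode('; ', $pairs);
}

// Hypothetical helper: options that make the request resemble a browser session.
function build_identity_options(array $cookies, string $userAgent, string $referer): array
{
    return [
        CURLOPT_COOKIE         => build_cookie_string($cookies),
        CURLOPT_USERAGENT      => $userAgent,
        CURLOPT_REFERER        => $referer,
        CURLOPT_FOLLOWLOCATION => true,   // follow "Location:" redirects
        CURLOPT_RETURNTRANSFER => true,
    ];
}
```

The resulting array can be passed to curl_setopt_array($ch, ...) before calling curl_exec().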

          Set reinforcement information:

          curl_setopt($ch,CURLOPT_NOBODY,1);// Do not include the response body in the output. When only header-level information is needed, this makes the request much faster

          curl_setopt($ch,CURLOPT_TIMEOUT,$int);// Set the maximum number of seconds the request may take (timeout). With a small value, curl gives up on pages that take too long

          curl_setopt($ch,CURLOPT_HEADER,1);// Include the response headers in the output
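These three options are often combined to fetch only the response headers. The sketch below shows one way this could be done. The names `parse_header_block` and `fetch_headers` are hypothetical, and the 5-second default timeout is an arbitrary choice for illustration.

```php
<?php
// Hypothetical helper: parse a raw header block ("Name: value" lines) into an array.
function parse_header_block(string $raw): array
{
    $headers = [];
    foreach (preg_split('/\r\n|\n/', trim($raw)) as $line) {
        if (strpos($line, ':') === false) {
            continue;   // skip lines without a colon, e.g. the status line "HTTP/1.1 200 OK"
        }
        [$name, $value] = explode(':', $line, 2);
        $headers[strtolower(trim($name))] = trim($value);
    }
    return $headers;
}

// Usage sketch: fetch only the headers, giving up after $timeout seconds.
function fetch_headers(string $url, int $timeout = 5): array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_NOBODY, true);          // no response body
    curl_setopt($ch, CURLOPT_HEADER, true);          // include headers in the output
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);     // overall time limit in seconds
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the output as a string
    $raw = curl_exec($ch);
    curl_close($ch);
    return $raw === false ? [] : parse_header_block($raw);
}
```

Header names are lowercased here because HTTP header names are case-insensitive, which makes lookups such as `$headers['content-type']` predictable.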

          Basic use of curl batch processing function

          Of course, curl can do more than this; the manual lists many more options. One of curl's most powerful features is batch processing, which runs several requests at once.

          Batch processing is also easy to understand. The general steps are:

          1. $mh = curl_multi_init(); // Initialize a batch handle.

          2. curl_multi_add_handle($mh, $ch); // Add a configured $ch handle to the batch handle.

          3. curl_multi_exec($mh, $running); // Execute the $mh handle and write the number of handles still running to $running.

          4. While $running is greater than 0, keep calling curl_multi_exec() in a loop.

          5. After the loop ends, iterate over the $ch handles and use curl_multi_getcontent() to get the response of each one.

          6. Use curl_multi_remove_handle() to remove each handle from $mh.

          7. Use curl_multi_close() to close the $mh batch handle.

          The code is as follows:

          
          <?php

          // Create 50 handles that all request the same page.
          $chArr = [];

          for ($i = 0; $i < 50; $i++) {
              $chArr[$i] = curl_init("http://www.baidu.com");
              curl_setopt($chArr[$i], CURLOPT_RETURNTRANSFER, 1);
          }

          $mh = curl_multi_init(); //1

          foreach ($chArr as $k => $ch) {
              curl_multi_add_handle($mh, $ch); //2
          }

          $running = null;

          do {
              curl_multi_exec($mh, $running); //3
          } while ($running > 0); //4

          $result = [];

          foreach ($chArr as $k => $ch) {
              $result[$k] = curl_multi_getcontent($ch); //5
              curl_multi_remove_handle($mh, $ch); //6
          }

          curl_multi_close($mh); //7

          ?>

          The problem of too much memory in curl batch processing

          However, when a large number of handles are executed, a serious problem appears: CPU usage climbs to almost 100% and the system nearly hangs. The reason is that while $running > 0, that is, while the batch has not finished, the loop calls curl_multi_exec($mh, $running) over and over without pausing. An experiment shows this:

          In the loop, add an echo "a"; statement before the curl_multi_exec($mh, $running) call. The script makes 50 requests to Baidu, and then we look at the output.

          In the original screenshot the scroll bar shrinks to its minimum size, so we can roughly estimate that "a" is printed more than 500 times. That is the culprit behind the CPU usage.
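The same experiment can be reproduced with a counter instead of echo, which gives an exact number rather than an estimate from the scroll bar. This is only a sketch: the function name `count_busy_exec_calls` is hypothetical, and the count will vary with network speed and the number of URLs.

```php
<?php
// Hypothetical helper: run the busy-wait loop and count the curl_multi_exec() calls.
function count_busy_exec_calls(array $urls): int
{
    $mh = curl_multi_init();
    $handles = [];
    foreach ($urls as $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_multi_add_handle($mh, $ch);
        $handles[] = $ch;
    }

    $calls = 0;
    $running = null;
    do {
        $calls++;                           // stands in for echo "a";
        curl_multi_exec($mh, $running);
    } while ($running > 0);                 // spins without pausing until all transfers finish

    foreach ($handles as $ch) {
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
    curl_multi_close($mh);
    return $calls;
}
```

Calling it with 50 copies of the same URL, for example `count_busy_exec_calls(array_fill(0, 50, "http://www.baidu.com"))`, corresponds to the experiment described above.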

          Memory optimization scheme for curl batch processing

          The fix is to use the curl_multi_select() function from the curl library. Its prototype is:

          int curl_multi_select(resource $mh [, float $timeout = 1.0])

          It blocks until there is activity on any of the connections in the curl batch. On success it returns the number of descriptors contained in the descriptor sets, which may be 0 if there was no activity on any of them. On failure it returns -1 (from the underlying select system call).

          We use curl_multi_select() to block the program while there is nothing to read or write.

          We optimize the third and fourth steps of batch processing so that the concurrent requests run without busy-waiting. (curl_multi drives the transfers concurrently within one thread; it is not real multithreading.)

          Many readers are puzzled by the code given in the manual (I was too at first). The code and an explanation follow.

          $running = null;
          
          do {
          
              $mrc = curl_multi_exec($mh, $running);
          
          } while ($mrc == CURLM_CALL_MULTI_PERFORM);
          
          // This loop processes the $ch handles in the $mh batch for the first time and writes the number of handles still running to $running. While the return value equals CURLM_CALL_MULTI_PERFORM, there is still data to write or read immediately, so the loop repeats. Once the first round of reading or writing has finished, the return value becomes CURLM_OK, and execution leaves this loop and enters the main loop below.
          
          // $running is non-zero, i.e. there are still $ch handles pending in the $mh batch; $mrc == CURLM_OK, i.e. the previous read or write on the $ch handles completed.
          
          while ($running && $mrc == CURLM_OK) {
          
              if (curl_multi_select($mh) != -1) { // curl_multi_select() blocks until a $ch handle in $mh has activity; a return value other than -1 means the wait ended without error.
          
                  do { // Continue processing the $ch handles that need attention.
          
                      $mrc = curl_multi_exec($mh, $running);
          
                  } while ($mrc == CURLM_CALL_MULTI_PERFORM);
          
              }
          
          }

          The advantage of this approach is that after the $ch handles in the $mh batch have read or written data ($mrc == CURLM_OK), the script waits in curl_multi_select($mh) instead of calling curl_multi_exec() continuously for the whole duration of the batch, which would waste CPU resources.

          Memory optimization results of curl batch processing

          The complete code is as follows:

          
          <?php

          // Create 50 handles that all request the same page.
          $chArr = [];

          for ($i = 0; $i < 50; $i++) {
              $chArr[$i] = curl_init("http://www.baidu.com");
              curl_setopt($chArr[$i], CURLOPT_RETURNTRANSFER, 1);
          }

          $mh = curl_multi_init();

          foreach ($chArr as $k => $ch) {
              curl_multi_add_handle($mh, $ch);
          }

          $running = null;

          // First pass: start all transfers.
          do {
              $mrc = curl_multi_exec($mh, $running);
          } while ($mrc == CURLM_CALL_MULTI_PERFORM);

          // Main loop: wait for activity, then process it.
          while ($running && $mrc == CURLM_OK) {
              if (curl_multi_select($mh) != -1) {
                  do {
                      $mrc = curl_multi_exec($mh, $running);
                  } while ($mrc == CURLM_CALL_MULTI_PERFORM);
              }
          }

          $result = [];

          foreach ($chArr as $k => $ch) {
              $result[$k] = curl_multi_getcontent($ch);
              curl_multi_remove_handle($mh, $ch);
          }

          curl_multi_close($mh);

          ?>

          Once again, we add echo "a"; before the $mrc = curl_multi_exec($mh, $running) statement and look at the result.

          Although "a" is still printed more than 50 times, CPU usage is far lower than before the optimization.
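The select-based approach can also be packaged as a reusable function with a more compact loop. The sketch below is one possible arrangement, not the article's code: the name `multi_fetch` is hypothetical, and the short usleep() after a -1 return from curl_multi_select() is an added precaution (an assumption of this example) so that a failing select does not turn back into a busy loop.

```php
<?php
// Hypothetical helper: fetch several URLs concurrently without busy-waiting.
// Returns the response bodies under the same keys as the input array.
function multi_fetch(array $urls, float $selectTimeout = 1.0): array
{
    $mh = curl_multi_init();
    $handles = [];
    foreach ($urls as $key => $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_multi_add_handle($mh, $ch);
        $handles[$key] = $ch;
    }

    $running = null;
    do {
        $status = curl_multi_exec($mh, $running);
        // Wait for activity; if select itself fails (-1), pause briefly instead of spinning.
        if ($running && curl_multi_select($mh, $selectTimeout) === -1) {
            usleep(1000);
        }
    } while ($running && $status === CURLM_OK);

    $results = [];
    foreach ($handles as $key => $ch) {
        $results[$key] = curl_multi_getcontent($ch);
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
    curl_multi_close($mh);
    return $results;
}
```

For the experiment in this article, the call would be `multi_fetch(array_fill(0, 50, "http://www.baidu.com"))`.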

          Although curl is very powerful, there are other functions that can also send POST requests, and looking at them helps us understand curl from a lower level. For that reason, this series also devotes a lot of space to those other functions.

