CentOS 6.x: a crawler environment with headless Chrome + ChromeDriver + Selenium

Time:2021-4-6

[Please credit the source when reprinting]: https://blog.csdn.net/huahao1989/article/details/107890747

The official Chrome website states that Chrome no longer supports CentOS 6.x and requires at least CentOS 7. In practice, though, a server's OS often cannot be upgraded at will, so we have to keep working with the old release. Installing on CentOS 6.x is difficult, but with some extra effort it can be done.

What is headless Chrome?

Headless Chrome is the Chrome browser running without a user interface. It lets your program use every feature Chrome supports without opening a browser window. Compared with a regular browser, headless Chrome is more convenient for testing web applications, taking screenshots of websites, and crawling pages for information. Compared with earlier tools such as PhantomJS and SlimerJS, headless Chrome is closer to a real browser environment.

CentOS version

lsb_release -a


Installing the latest version of Google Chrome

On CentOS / RedHat 7 and above, Google Chrome can be installed by following https://intoli.com/blog/insta… directly (this does not apply to version 6 and below).

Specify Yum source

The server should point to a suitable Yum source so that no dependencies go missing.
Replace /etc/yum.repos.d/CentOS-Base.repo; for example, you can use Alibaba's Yum repo:

wget -O /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-6.repo
yum clean all

Install Google Chrome

Following https://intoli.com/blog/insta…, run the command:

curl https://intoli.com/install-google-chrome.sh | bash

The script automatically detects and downloads the dependency packages that the current version of Chrome is missing.

Check for missing dependencies:

ldd /opt/google/chrome/chrome | grep "not found"

If the output is empty, Chrome's dependency problems on CentOS are basically solved.
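
The same check can be scripted when several servers need to be verified. The following is a minimal sketch, assuming Python 3; the function name `missing_libraries` and the sample `ldd` output are made up for illustration and are not from the original article.

```python
def missing_libraries(ldd_output):
    """Return the names of shared libraries that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # A missing dependency looks like: "libfoo.so.1 => not found"
        if "not found" in line:
            missing.append(line.split("=>")[0].strip())
    return missing


# Hypothetical ldd output, for illustration only.
sample = (
    "\tlinux-vdso.so.1 =>  (0x00007ffd4a5f2000)\n"
    "\tlibnss3.so => not found\n"
    "\tlibc.so.6 => /lib64/libc.so.6 (0x00007f3a1c000000)\n"
    "\tlibatk-bridge-2.0.so.0 => not found\n"
)

print(missing_libraries(sample))
```

To use it on a real machine, feed it the text printed by ldd /opt/google/chrome/chrome; an empty list corresponds to the empty grep output described above.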

Run Chrome

Run:

google-chrome-stable --no-sandbox --headless --disable-gpu --screenshot https://www.suning.com/

If the page loads successfully, a screenshot named screenshot.png is written to the current directory. If this error is reported:

[0100/000000.311368:ERROR:broker_posix.cc(43)] Invalid node channel message

you need to install the following dependency packages:

yum install  \
 ipa-gothic-fonts \
 xorg-x11-fonts-100dpi \
 xorg-x11-fonts-75dpi \
 xorg-x11-utils \
 xorg-x11-fonts-cyrillic \
 xorg-x11-fonts-Type1 \
 xorg-x11-fonts-misc -y 
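
Once the fonts are installed, the screenshot command from the "Run Chrome" step can also be driven from a script. The following is a rough sketch, assuming Python 3.7 or later; the helper names `screenshot_command` and `take_screenshot` and the 60-second timeout are illustrative choices, while the Chrome switches are the ones used in the command above.

```python
import shutil
import subprocess


def screenshot_command(url, binary="google-chrome-stable"):
    """Build the headless screenshot command used earlier in this article."""
    return [binary, "--no-sandbox", "--headless", "--disable-gpu", "--screenshot", url]


def take_screenshot(url):
    """Run Chrome and return its exit code, or None if Chrome is not on PATH."""
    cmd = screenshot_command(url)
    if shutil.which(cmd[0]) is None:
        return None
    # screenshot.png is written to the current working directory.
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=60)
    return result.returncode


# Only print the command here; call take_screenshot(...) on a machine with Chrome installed.
print(" ".join(screenshot_command("https://www.suning.com/")))
```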

Installing the latest version of ChromeDriver

The installed Chrome version is google-chrome-stable-72.0.3626.109-1.x86_64. The official ChromeDriver website is https://sites.google.com/a/ch…
Download address: https://chromedriver.storage….
Alternatively, download from the Taobao mirror: http://npm.taobao.org/mirrors…
Mirror download address: http://npm.taobao.org/mirrors…

After extracting the archive, place chromedriver in the /opt/drivers directory and try running it:

./chromedriver 
Starting ChromeDriver 72.0.3626.7 (efcef9a3ecda02b2132af215116a03852d08b9cb) on port 9515
Only local connections are allowed.
[1550143530.011][SEVERE]: CreatePlatformSocket() returned an error, errno=0: Address family not supported by protocol (97)
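
The ChromeDriver build shown here (72.0.3626.7) shares its major version with the installed Chrome (72.0.3626.109). A small sketch of that comparison, assuming Python 3 and assuming that matching major versions is the compatibility rule you want to enforce; the function names and the "71.0.0.0" value are hypothetical.

```python
def major_version(version):
    """Return the leading number of a dotted version string such as '72.0.3626.109'."""
    return int(version.split(".")[0])


def versions_compatible(chrome_version, driver_version):
    """Treat Chrome and ChromeDriver as compatible when their major versions match."""
    return major_version(chrome_version) == major_version(driver_version)


# Versions taken from this article.
print(versions_compatible("72.0.3626.109", "72.0.3626.7"))  # True
# A made-up mismatching driver version, for illustration only.
print(versions_compatible("72.0.3626.109", "71.0.0.0"))     # False
```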

In addition, edit /etc/hosts so that localhost is bound to 127.0.0.1. Otherwise, when ChromeDriver is driven from Java Selenium, it may throw a timeout exception because localhost cannot be resolved.
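
Whether the binding is present can be checked with a few lines of code. This is a minimal sketch, assuming Python 3 and the usual hosts-file layout (an address followed by one or more names, with # starting a comment); the function name and the sample file contents are illustrative.

```python
def localhost_is_mapped(hosts_text):
    """Return True if some line maps the name 'localhost' to 127.0.0.1."""
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0]  # drop comments
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "127.0.0.1" and "localhost" in fields[1:]:
            return True
    return False


# Hypothetical file contents, for illustration only.
good = "127.0.0.1   localhost localhost.localdomain\n::1 localhost\n"
bad = "# 127.0.0.1 localhost\n10.0.0.5 myserver\n"

print(localhost_is_mapped(good))  # True
print(localhost_is_mapped(bad))   # False
```

On the server itself, pass in the contents of /etc/hosts, for example open("/etc/hosts").read().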

Install Selenium

  • Install Python and configure the environment variables
    In a shell, run python -V. If the version number is printed, the installation succeeded.
  • Install pip
    Python ships with pip by default, in the Scripts directory under the installation directory; add that directory to your PATH yourself. Then run pip -V. If the version number is printed, the installation succeeded.
  • Install Selenium
    In a shell, run pip install selenium. The message "Successfully installed selenium" means the installation succeeded.

python
>>> from selenium import webdriver
>>> driver = webdriver.Chrome()
>>> driver.get('https://www.baidu.com')

If this runs without errors, the environment works, and you can write Python scripts as usual.
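
On a server without a display, a script would normally pass the same headless switches that were used on the command line earlier. The following is a sketch under stated assumptions, not the author's script: it assumes Python 3 and the Selenium 4 API (with Selenium 3, the driver path is passed as executable_path instead of through a Service object), and it assumes chromedriver was placed at /opt/drivers/chromedriver. The names `headless_arguments` and `fetch_title` are made up for illustration.

```python
def headless_arguments():
    """Chrome switches, matching the google-chrome-stable command used earlier."""
    return ["--headless", "--no-sandbox", "--disable-gpu"]


def fetch_title(url, driver_path="/opt/drivers/chromedriver"):
    """Open url in headless Chrome and return the page title."""
    # Imported here so the module can be loaded even where Selenium is not installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    options = webdriver.ChromeOptions()
    for arg in headless_arguments():
        options.add_argument(arg)

    driver = webdriver.Chrome(service=Service(driver_path), options=options)
    try:
        driver.get(url)
        return driver.title
    finally:
        # Always shut down the browser and the chromedriver process.
        driver.quit()


print(headless_arguments())
```

Calling fetch_title('https://www.baidu.com') on the configured server should then return the title of the page.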

Problems encountered while building the environment

1. /lib64/libc.so.6: version `GLIBC_2.14' not found (required by ./chromedriver)

# View the system version
cat /etc/redhat-release
# View the glibc versions supported by the system
strings /lib64/libc.so.6 | grep GLIBC_
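
The output of the strings command can be long, so it may help to reduce it to the answer you need: is the version demanded by chromedriver present? A minimal sketch, assuming Python 3; the function names and the sample output are illustrative and not taken from a real system.

```python
import re


def glibc_versions(strings_output):
    """Collect the GLIBC_x.y[.z] version tags as sorted tuples of integers."""
    found = set()
    for match in re.finditer(r"GLIBC_(\d+(?:\.\d+)+)", strings_output):
        found.add(tuple(int(part) for part in match.group(1).split(".")))
    return sorted(found)


def supports(strings_output, required):
    """Return True if the required version tag (e.g. '2.14') is listed."""
    needed = tuple(int(part) for part in required.split("."))
    return needed in glibc_versions(strings_output)


# Hypothetical excerpt of `strings /lib64/libc.so.6 | grep GLIBC_` output.
sample = "GLIBC_2.2.5\nGLIBC_2.3\nGLIBC_2.11\nGLIBC_2.12\nGLIBC_PRIVATE\n"

highest = glibc_versions(sample)[-1]
print(".".join(str(part) for part in highest))  # 2.12
print(supports(sample, "2.14"))                 # False
```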

wget http://ftp.gnu.org/gnu/glibc/glibc-2.14.tar.gz 
wget http://ftp.gnu.org/gnu/glibc/glibc-ports-2.14.tar.gz 
tar -xvf  glibc-2.14.tar.gz 
tar -xvf  glibc-ports-2.14.tar.gz
mv glibc-ports-2.14 glibc-2.14/ports
mkdir glibc-2.14/build
cd glibc-2.14/build 
../configure  --prefix=/usr --disable-profile --enable-add-ons --with-headers=/usr/include --with-binutils=/usr/bin
make
make install

Note the following points during compilation and installation:

  • Unpack glibc-ports into the glibc directory
  • Do not run configure inside the glibc source directory itself; use a separate build directory
  • Set the optimization flags with export CFLAGS="-g -O2 -march=i486", otherwise errors will occur
  • During make install, an error may report that nss_test1 (libnss_test1.so.2) cannot be loaded. Run grep "nss_test1" . -nr in the glibc directory to find where it is loaded; there are not many places (this is only a library for testing NSS, so you can do without it)

2. /lib64/libc.so.6: version `GLIBC_2.16' not found (required by ./chromedriver)

wget http://ftp.gnu.org/gnu/glibc/glibc-2.16.0.tar.gz 
wget http://ftp.gnu.org/gnu/glibc/glibc-ports-2.16.0.tar.gz 
tar -xvf  glibc-2.16.0.tar.gz 
tar -xvf  glibc-ports-2.16.0.tar.gz
mv glibc-ports-2.16.0 glibc-2.16.0/ports
mkdir glibc-2.16.0/build
cd glibc-2.16.0/build 
../configure  --prefix=/usr --disable-profile --enable-add-ons --with-headers=/usr/include --with-binutils=/usr/bin
make
make install 

This reports an error:

Unmatched ( in regex; marked by  HERE in m/$( <-- HERE if $(abi-64-ld-soname),$(abi-64-ld-soname),ld/ at scripts/test-installation.pl line

For the fix, see https://sourceware.org/bugzil…

  • glibc-2.16.0/Makefile
ifeq (,$(install_root))
      CC="$(CC)" $(PERL) scripts/test-installation.pl $(common-objpfx)
endif

Change it to:

ifeq (,$(install_root))
     LD_SO=$(ld.so-version) CC="$(CC)" $(PERL) scripts/test-installation.pl $(common-objpfx)
endif
  • glibc-2.16.0/scripts/test-installation.pl
sub usage {
    print "Usage: test-installation [soversions.mk]\n";

Add the LD_SO handling above it, so that it reads:

if ($ENV{LD_SO}) {
  $LD_SO = $ENV{LD_SO};
} else {
  $LD_SO= "";
}

sub usage {
    print "Usage: test-installation [soversions.mk]\n";

Then find:

} else {
  if (/^ld\.so/) {
     ($ld_so_name, $ld_so_version)= /=(.*)\.so\.(.*)$/;

Add an elsif branch above it, so that it reads:

} elsif ($LD_SO ne "") {
    ($ld_so_name, $ld_so_version) = split ('\.so\.', $LD_SO);
} else {
  if (/^ld\.so/) {
     ($ld_so_name, $ld_so_version)= /=(.*)\.so\.(.*)$/;
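
To make the new elsif branch easier to follow, here is what its split does, rewritten in Python as an illustration only (the actual patch stays in Perl). The value "ld-linux-x86-64.so.2" is an assumption about what LD_SO typically holds on a 64-bit system, and the function name is made up.

```python
import re


def split_ld_so(ld_so):
    """Mimic Perl's split('\\.so\\.', $LD_SO): separate the loader name from its version."""
    parts = re.split(r"\.so\.", ld_so)
    # Like the Perl list assignment, only the first two pieces are used.
    return parts[0], parts[1]


# Assumed example value; the real one comes from $(ld.so-version) in the Makefile.
print(split_ld_so("ld-linux-x86-64.so.2"))  # ('ld-linux-x86-64', '2')
```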

