Analysis of the compilation and installation of Greenplum with Orca optimizer

Time: 2020-04-08

Orca is an open-source query optimizer for Postgres and Greenplum. Compared with the built-in optimizers of Greenplum and Postgres, Orca performs considerably better on complex queries, partitioned tables, and similar workloads. This article shows how to build Greenplum with the Orca optimizer enabled and how to run Greenplum's installcheck-world test target.

Setting up the development environment

Before you start, install the build-time and run-time dependencies of both Greenplum and Orca. The build environment used here is CentOS 7. First, install the system packages with the following commands:

sudo yum -y groupinstall "Development Tools"
sudo yum -y install readline-devel zlib-devel curl-devel apr-devel libevent-devel libxml2-devel bzip2-devel python-devel openssl-devel which iproute net-tools perl-Env wget
sudo yum install -y epel-release centos-release-scl
sudo yum install -y python-pip python-psutil cmake3
sudo yum install -y devtoolset-6-toolchain
sudo yum install -y xerces-c-devel

README.CentOS.bash in the Greenplum source tree lists the required dependency packages. The commands above add a few dependencies beyond those in README.CentOS.bash. Next, install the Python dependencies:

sudo pip install --upgrade pip
sudo pip install --no-cache-dir lockfile paramiko setuptools psutil conan

Then install Ninja, the build tool that Orca relies on. Version 1.8.2 is used here because the newer 1.9 release depends on a more recent C++ runtime library and fails to run on CentOS 7.

wget https://github.com/ninja-build/ninja/releases/download/v1.8.2/ninja-linux.zip
unzip ninja-linux.zip
sudo mv ninja /usr/local/bin/

Finally, configure additional environment settings:

sudo mkdir /usr/local/gpdb
sudo chown -R `whoami` /usr/local/gpdb
source scl_source enable devtoolset-6
sudo ln -sf /usr/bin/cmake3 /usr/bin/cmake

Now we have finished all the preparations and can start compiling.

Configure ORCA

Prepare the Xerces dependency library (optional)

Before compiling Orca, you need its dependency Xerces, which Orca uses to read and write XML. Building Xerces yourself is optional, because you can use either the system-installed version or the Greenplum-patched version (https://github.com/greenplum-db/gp-xerces). To use the Greenplum-patched version, run the following commands:

git clone https://github.com/greenplum-db/gp-xerces.git
cd gp-xerces/
mkdir build && cd build
../configure --prefix=/usr/local/gpdb
make && make install

Compile ORCA

Compiling Orca requires cmake3 and gcc 6, both of which were set up during the environment preparation above. Before starting, confirm the versions with the following commands:

gcc --version
cmake --version

If the version is incorrect or the command is not found, make sure that the following two operations are performed correctly:

source scl_source enable devtoolset-6
sudo ln -sf /usr/bin/cmake3 /usr/bin/cmake

The command 'source scl_source enable devtoolset-6' switches the active GCC version. It can be added to the login script so that it runs automatically:

echo 'source scl_source enable devtoolset-6' >> ~/.bashrc
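
The version check described above can also be scripted. The following is a minimal sketch, assuming the minimum major versions named in this section (gcc 6 and cmake 3). The helper names major_version and require_major are hypothetical and are not part of Greenplum or Orca.

```shell
# Sketch only: major_version and require_major are hypothetical helpers.

# Print the major component of the first dotted version number read from stdin.
major_version() {
    grep -oE '[0-9]+(\.[0-9]+)+' | head -n 1 | cut -d. -f1
}

# require_major NAME MINIMUM VERSION_TEXT
# Return 0 if the major version found in VERSION_TEXT is at least MINIMUM.
require_major() {
    local name minimum found
    name=$1
    minimum=$2
    found=$(printf '%s\n' "$3" | major_version)
    if [ -z "$found" ] || [ "$found" -lt "$minimum" ]; then
        echo "$name: need major version >= $minimum, found '${found:-none}'" >&2
        return 1
    fi
    echo "$name: major version $found is OK"
}
```

It could be used like this:

require_major gcc 6 "$(gcc --version)"
require_major cmake 3 "$(cmake --version)"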

Because the Greenplum and Orca versions must match, we build the Orca version that ships with the Greenplum source code. The steps are as follows:

git clone https://github.com/greenplum-db/gpdb.git --branch 6X_STABLE --single-branch --depth 1 6X_STABLE
cd 6X_STABLE/depends
CFLAGS="-L/usr/local/gpdb/lib/" ./configure --prefix=/usr/local/gpdb
make
make install_local

At this point, Orca is installed in the directory /usr/local/gpdb.

Compiling Greenplum

Greenplum has many optional features that can be switched on or off on the configure command line. The focus here is on building with Orca, so the configuration below turns off several other options. Run the following commands in the Greenplum source root directory:

export LD_LIBRARY_PATH=/usr/local/gpdb/lib
CFLAGS="-I/usr/local/gpdb/include" LDFLAGS="-L/usr/local/gpdb/lib/" ./configure --enable-orca --without-perl --without-python --with-libxml --without-gssapi --disable-pxf --without-zstd --without-openssl
make -j4 && make install
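
The export above replaces any existing LD_LIBRARY_PATH value. If other library directories are already set in your shell, you may prefer to prepend instead. The following is a minimal sketch of that alternative; prepend_path is a hypothetical helper name.

```shell
# Sketch only: prepend_path is a hypothetical helper.

# prepend_path DIR CURRENT
# Print CURRENT with DIR placed in front, unless DIR is already one of its
# colon-separated entries, in which case CURRENT is printed unchanged.
prepend_path() {
    local dir current
    dir=$1
    current=$2
    case ":$current:" in
        *":$dir:"*) printf '%s\n' "$current" ;;
        *) printf '%s\n' "${dir}${current:+:$current}" ;;
    esac
}
```

It could be used like this before running configure:

export LD_LIBRARY_PATH=$(prepend_path /usr/local/gpdb/lib "${LD_LIBRARY_PATH:-}")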

Once these commands succeed, congratulations: your own Greenplum build is ready. The compiled Greenplum is installed in the /usr/local/gpdb directory. The directory can be packaged as a whole and deployed to the same location on other machines for cluster testing. Its contents look roughly like this:

$ ls /usr/local/gpdb
bin  docs  etc  greenplum_path.sh  include  lib  sbin  share
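
Packaging the install directory for other machines could look like the sketch below. It assumes GNU tar and absolute tarball paths. The helper names pack_install and unpack_install are hypothetical.

```shell
# Sketch only: pack_install and unpack_install are hypothetical helpers.

# pack_install INSTALL_DIR TARBALL
# Archive INSTALL_DIR relative to its parent, so the archive contains a single
# top-level directory (for example "gpdb"). Use an absolute TARBALL path.
pack_install() {
    local install_dir tarball
    install_dir=$1
    tarball=$2
    tar -C "$(dirname "$install_dir")" -czf "$tarball" "$(basename "$install_dir")"
}

# unpack_install TARBALL PARENT_DIR
# Extract the archive below PARENT_DIR (for example /usr/local).
unpack_install() {
    tar -C "$2" -xzf "$1"
}
```

For example, run pack_install /usr/local/gpdb /tmp/gpdb.tar.gz on the build machine, copy the tarball to the target machine, and run unpack_install /tmp/gpdb.tar.gz /usr/local there (with sufficient permissions), so that the files land in the same /usr/local/gpdb location.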

Run test cases

After Greenplum is compiled, we need to make sure that all relevant tests pass. Greenplum inherits the Postgres test framework and provides its own test target, installcheck-world. Greenplum also includes the configuration for creating a test cluster, so the goal is to run installcheck-world against that test cluster.

Prepare system configuration

To ensure that the tests run normally, it is strongly recommended to set up the system configuration files as described in the official Greenplum documentation, including /etc/security/limits.conf and /etc/sysctl.conf. These settings are also described in README.linux.md in the source directory.
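
After editing limits.conf and logging in again, the effective limits of the current shell can be checked with a short script. This is a minimal sketch for bash. The thresholds 65536 (open files) and 131072 (processes) are assumed values for illustration only, so take the real ones from the Greenplum documentation. The helper names limit_at_least and check_limits are hypothetical.

```shell
# Sketch only: limit_at_least and check_limits are hypothetical helpers.

# limit_at_least CURRENT REQUIRED
# Return 0 if CURRENT (a number or the word "unlimited") is at least REQUIRED.
limit_at_least() {
    [ "$1" = "unlimited" ] && return 0
    [ "$1" -ge "$2" ]
}

# check_limits [MIN_NOFILE] [MIN_NPROC]
# Compare the shell's open-file and process limits against the thresholds.
# The defaults below are assumptions, not values taken from this article.
check_limits() {
    local min_nofile min_nproc status
    min_nofile=${1:-65536}
    min_nproc=${2:-131072}
    status=0
    limit_at_least "$(ulimit -n)" "$min_nofile" ||
        { echo "open files limit too low: $(ulimit -n)" >&2; status=1; }
    limit_at_least "$(ulimit -u)" "$min_nproc" ||
        { echo "process limit too low: $(ulimit -u)" >&2; status=1; }
    return "$status"
}
```

Calling check_limits without arguments uses the assumed defaults. Pass your own thresholds as the first and second argument to override them.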

In addition, Greenplum uses SSH to execute commands even on a single machine, so passwordless SSH access needs to be configured:

ssh-keygen
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
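
Running the cat command above more than once appends the same key repeatedly. The following is a minimal sketch of an idempotent variant; authorize_key is a hypothetical helper name.

```shell
# Sketch only: authorize_key is a hypothetical helper.

# authorize_key PUBKEY_FILE AUTH_FILE
# Append the public key to AUTH_FILE only if that exact line is not already
# present, then restrict the file's permissions to the owner.
authorize_key() {
    local pubkey auth_file
    pubkey=$1
    auth_file=$2
    mkdir -p "$(dirname "$auth_file")"
    touch "$auth_file"
    grep -qxF -- "$(cat "$pubkey")" "$auth_file" || cat "$pubkey" >> "$auth_file"
    chmod 600 "$auth_file"
}
```

It could be called as authorize_key ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys. To confirm that login works without a password prompt, run ssh -o BatchMode=yes localhost true, which should exit without asking for input.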

Create test cluster

A Greenplum cluster with three primaries and three mirrors can be created with the following commands:

source /usr/local/gpdb/greenplum_path.sh
make create-demo-cluster

When you see the following information, the demo cluster configuration is successful.

 optimizer
-----------
 on
(1 row)

                gp_opt_version
----------------------------------------------
 GPOPT version: 3.48.0, Xerces version: 3.1.2
(1 row)

If an error occurs, fix it according to the error message. After the fix, you can force any incomplete operation to end with the command

pkill postgres

and then create the cluster again.
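
To re-check the Orca version on a running cluster later, the gp_opt_version output shown above can be parsed in a script. This is a minimal sketch that assumes the output keeps the "GPOPT version: X.Y.Z" form shown in this article. gpopt_version is a hypothetical helper name.

```shell
# Sketch only: gpopt_version is a hypothetical helper.

# gpopt_version
# Read psql output on stdin and print the GPOPT version number, if any.
gpopt_version() {
    sed -n 's/.*GPOPT version: \([0-9][0-9.]*\).*/\1/p' | head -n 1
}
```

With PGPORT pointing at the demo cluster, it could be used as:

psql -At -d postgres -c 'select gp_opt_version();' | gpopt_version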

Run the tests

After creating the test cluster, you can run the test. Execute the following command in the Greenplum source root directory:

PGPORT=15432 make installcheck-world

All tests run with Orca enabled. Although we use the stable branch, some tests may still fail. You are welcome to provide patches or report bugs.
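
When a test suite fails, the Postgres regression framework writes the differences to a file named regression.diffs in that suite's directory. The following minimal sketch collects those files after a run; list_regression_diffs is a hypothetical helper name.

```shell
# Sketch only: list_regression_diffs is a hypothetical helper.

# list_regression_diffs [ROOT]
# Print every non-empty regression.diffs file below ROOT (default: current
# directory), one path per line, in sorted order.
list_regression_diffs() {
    find "${1:-.}" -type f -name regression.diffs -size +0 | sort
}
```

Running list_regression_diffs in the Greenplum source root after the test run shows which suites to inspect before reporting a bug.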

For more in-depth Greenplum technical articles, please visit the Greenplum Chinese community website.
