Presto on Alluxio via Alluxio SDS: Single-Node Setup

Date: 2022-11-24

Overall architecture

If you are impatient and want to get hands-on right away, skip this chapter and jump to the practical steps below; get the environment running first, then come back for the principles. The architecture of Presto is shown in the figure below. A client request is submitted to the Coordinator for processing, while metadata is managed by the Hive MetaStore (HMS), which also stores the location information of each table and partition. Therefore, if you want to place the data of a table or partition in another storage system, you must modify the information in HMS. This increases the maintenance cost of HMS, and since HMS is a globally shared service, modifying it would break access to the original paths for other compute frameworks.

[Figure: Presto architecture — client requests go to the Coordinator; table and partition locations are served by HMS]
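To see the location information HMS actually stores, you can inspect a table's metadata from the Hive CLI (a quick check using the test.person table created later in this article; the output layout varies by Hive version):

bin/hive -e "describe formatted test.person;" | grep -i location
# Prints the table location that HMS hands out, e.g. file:/root/testdb/person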

Alluxio Structured Data Service (SDS) provides a service that sits between Presto and the underlying HMS. Presto's hive-hadoop2 connector plug-in can use the Alluxio master as its metadata service; the SDS module in the Alluxio master communicates with the underlying HMS to obtain the underlying metadata, applies some processing, and returns the processed result to Presto. If the location information Presto receives is an Alluxio address, Presto reads the data through Alluxio. In this way, Presto's access can be switched over to Alluxio without modifying HMS.

[Figure: Alluxio SDS sits between Presto and HMS, rewriting table locations to Alluxio addresses]
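Concretely, SDS rewrites the location returned by HMS into an Alluxio URI. With the paths and ports used later in this article, the mapping looks roughly like this (the Alluxio-side path is managed by SDS and its exact layout varies by version):

# Location stored in HMS (unchanged):
file:///root/testdb/person
# Location returned to Presto by Alluxio SDS (illustrative):
alluxio://localhost:19998/catalog/test/tables/person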

Build process

This article uses the following software environment. Since Hive, Presto, and Alluxio all access storage through the Hadoop-compatible file system API, the underlying storage can be local or HDFS. The focus of this article is not the storage system, so the file:// scheme is used, with local storage as the underlying storage. If you want to build on HDFS instead, see the "Optional" section.

[Figure: software versions used — Hadoop 2.8.5, JDK 1.8.0_291, Hive 2.3.5, Presto 0.252, MySQL 5.7]

Configure environment variables

export HADOOP_HOME=/softwares/hadoop-2.8.5
export JAVA_HOME=/usr/java/jdk1.8.0_291-amd64/
export HIVE_CONF_DIR=/softwares/apache-hive-2.3.5-bin/conf
export HIVE_AUX_JARS_PATH=/softwares/apache-hive-2.3.5-bin/lib
export HIVE_HOME=/softwares/apache-hive-2.3.5-bin


Build MySQL

# Run MySQL in Docker (use the host network, or publish port 3306 as below)
docker run -p 3306:3306 --name mysql -e MYSQL_ROOT_PASSWORD=123456 -d mysql:5.7
# In the MySQL client: create the metastore database and a hive user with password hive
create database metastore;
grant all on metastore.* to 'hive'@'%' identified by 'hive';
grant all on metastore.* to 'hive'@'localhost' identified by 'hive';
flush privileges;
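The SQL statements above are run inside the MySQL client; with the container started as above, you can open one like this:

docker exec -it mysql mysql -uroot -p123456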

Install Hive and the MySQL connector

wget https://archive.apache.org/dist/hive/hive-2.3.5/apache-hive-2.3.5-bin.tar.gz
tar -xzvf apache-hive-2.3.5-bin.tar.gz
mv apache-hive-2.3.5-bin /softwares/
# Download the MySQL JDBC driver (available from Maven Central) and put it on Hive's classpath
wget https://repo1.maven.org/maven2/mysql/mysql-connector-java/5.1.38/mysql-connector-java-5.1.38.jar
mv mysql-connector-java-5.1.38.jar /softwares/apache-hive-2.3.5-bin/lib

Below are the relevant configuration file settings.
conf/hive-env.sh

export METASTORE_PORT=9083

hive-site.xml

<configuration>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://127.0.0.1:3306/metastore?createDatabaseIfNotExist=true&amp;characterEncoding=UTF-8&amp;useSSL=false</value>
        <description>JDBC connection string used by Hive Metastore</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
        <description>JDBC Driver class</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>hive</value>
        <description>Metastore database user name</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>hive</value>
        <description>Metastore database password</description>
    </property>
    <property>
        <name>hive.metastore.uris</name>
        <value>thrift://127.0.0.1:9083</value>
        <description>Thrift server hostname and port (must match the metastore port, 9083)</description>
    </property>
</configuration>

Start MetaStore

bin/schematool -dbType mysql -initSchema -userName hive -passWord hive
bin/hive --service metastore -p 9083
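As a quick sanity check (assuming the ss tool from iproute2 is available), confirm the metastore is listening on its Thrift port:

# The Thrift metastore should be listening on 9083
ss -lnt | grep 9083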

Create a schema and table in Hive

Contents of the /root/testdb/person/person.csv file:

mary 18 1000
john 19 1001
jack 16 1002
luna 17 1003

Run the following in the Hive CLI:

create schema test;
create external table test.person(name string, age int, id int) row format delimited fields terminated by ' ' location 'file:///root/testdb/person';
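Before moving on to Presto, you can verify the table directly from the Hive CLI (a quick check; it should print the four rows above):

bin/hive -e "select * from test.person;"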

Build Presto

Install a newer version of Java (JDK 8)

# download jdk rpm package
yum localinstall jdk-8u291-linux-x64.rpm
alternatives --config java

Install Presto

wget https://repo1.maven.org/maven2/com/facebook/presto/presto-server/0.252/presto-server-0.252.tar.gz
tar -xzvf presto-server-0.252.tar.gz
mv presto-server-0.252 /softwares/
mkdir -p /softwares/presto-server-0.252/etc/catalog
# The following configuration files need to be created and configured
tree /softwares/presto-server-0.252/etc
/softwares/presto-server-0.252/etc
├── catalog
│   ├── hive.properties
│   └── jmx.properties
├── config.properties
├── jvm.config
├── log.properties
└── node.properties

Prepare configuration files

node.properties

node.environment=production
node.id=node01
node.data-dir=/softwares/presto-server-0.252/var/presto/data

config.properties

coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8080
query.max-memory=2GB
query.max-memory-per-node=1GB
discovery-server.enabled=true
discovery.uri=http://localhost:8080

jvm.config

-server
-Xmx4G
-XX:+UseConcMarkSweepGC
-XX:+ExplicitGCInvokesConcurrent
-XX:+CMSClassUnloadingEnabled
-XX:+AggressiveOpts
-XX:+HeapDumpOnOutOfMemoryError
-XX:OnOutOfMemoryError=kill -9 %p
-XX:ReservedCodeCacheSize=150M

log.properties

com.facebook.presto=INFO

hive.properties

connector.name=hive-hadoop2
hive.metastore.uri=thrift://localhost:9083

Run Presto Server

bin/launcher start
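You can confirm the server came up with the standard launcher status command; the web UI also becomes available on the HTTP port configured above:

bin/launcher status
# The coordinator UI should now be reachable at http://localhost:8080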

Run presto-cli

wget https://repo1.maven.org/maven2/com/facebook/presto/presto-cli/0.252/presto-cli-0.252-executable.jar
chmod +x presto-cli-0.252-executable.jar
mv presto-cli-0.252-executable.jar /softwares/presto-server-0.252/
./presto-cli-0.252-executable.jar --catalog hive --schema test
show schemas from hive;
show tables from hive.test;
select * from hive.test.person;
select count(*) from person;
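With the person.csv data from earlier, these queries should return the four rows and a count of 4; for example:

presto:test> select count(*) from person;
 _col0
-------
     4
(1 row)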

Build Alluxio

Install Alluxio [omitted]

alluxio-site.properties:

alluxio.master.hostname=localhost

Run Alluxio [omitted]:

bin/alluxio-start.sh master
# Attach the Hive database "test" to Alluxio SDS, pointing at the metastore started earlier
bin/alluxio table attachdb hive thrift://localhost:9083 test
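After attaching, you can list what SDS now exposes (table ls is part of the SDS CLI in Alluxio 2.x; the location it reports should be an alluxio:// URI rather than the original file:// path):

bin/alluxio table ls hive
bin/alluxio table ls hive person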

Reconfigure Presto to use Alluxio SDS

Modify catalog configuration

etc/catalog/hive.properties

# The connector is still hive-hadoop2, because Presto's hive-hadoop2 plug-in already supports Alluxio as its metastore
connector.name=hive-hadoop2
hive.metastore=alluxio
hive.metastore.alluxio.master.address=localhost:19998

Restart Presto Server

bin/launcher stop
bin/launcher start

Run presto-cli

./presto-cli-0.252-executable.jar --catalog hive --schema test
show schemas from hive;
show tables from hive.test;
select * from hive.test.person;
select count(*) from person;

After running the SQL above, observe that the corresponding person.csv file has been fully loaded into Alluxio.
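One way to observe this is with the Alluxio CLI (a quick check; SDS-managed paths live under /catalog, though the exact layout can vary by version). The listing shows the persistence state and how much of each file is in Alluxio:

bin/alluxio fs ls -R /catalog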

Optional

Build HDFS (if needed)

If you want to store the data in HDFS instead, set up HDFS, load the data file into it, and create a table whose location points at it.
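The data file must exist in HDFS before the table is queried; a minimal sketch, assuming a NameNode running at localhost:9000 and the person.csv from earlier:

hdfs dfs -mkdir -p /root/testdb/person
hdfs dfs -put /root/testdb/person/person.csv /root/testdb/person/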

create schema test_hdfs;
create external table test_hdfs.person(name string, age int, id int) row format delimited fields terminated by ' ' location 'hdfs://localhost:9000/root/testdb/person';

./presto-cli-0.252-executable.jar --catalog hive --schema test_hdfs
show schemas from hive;
select * from hive.test_hdfs.person;

Summary

With Alluxio SDS, the locations of tables and partitions in the underlying HMS do not need to be modified: nothing changes in HMS, and nothing changes for other compute engines. Meanwhile, Presto can get customized views of the metadata through the service provided by Alluxio SDS; for example, selected partitions or tables can keep returning their original location information so that they are not accessed through Alluxio.

Outlook

Alluxio SDS builds a catalog proxy service between Presto and HMS. On top of this, since Alluxio understands the data format, it can perform data format conversions, such as converting CSV to Parquet or merging small files. If you have other needs and good ideas, the service can be extended further. In addition, following this article, an all-in-one Docker image could be built so that more companies can try out Alluxio SDS and more developers can help build out this promising feature.
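Alluxio 2.x already shipped a first step in this direction: the SDS CLI includes a table transform subcommand for rewriting an attached table (for example, coalescing small files or converting the format). A hedged sketch of the invocation; the transformation-definition syntax is version-specific, so check the Alluxio SDS docs for your release:

# Transform the attached table (e.g. coalesce small files / convert format)
bin/alluxio table transform hive person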

