This article takes you to understand the installation process of data warehouse hive

Time: 2020-11-22

Hive is a data warehouse built on the Hadoop framework. It maps structured data files stored on HDFS onto database tables and lets you process (query) that structured data with SQL-like statements: the structured data is treated as a table, much as in MySQL, and queried with SQL.

Structured data is row-oriented data that can be represented in a two-dimensional table. Unstructured data cannot be represented that way; it includes office documents of all formats, plain text, pictures, XML, HTML, assorted reports, and image and audio/video content.

In essence, Hive converts SQL statements into MapReduce jobs, so that users unfamiliar with MapReduce can use HQL to process and compute over structured data on HDFS. It is well suited to offline batch computation.
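For example, a query like the one below (the `page_views` table and its columns are invented for illustration) is not executed row by row by a database engine; Hive compiles the aggregation into a MapReduce job:

```sql
-- Hypothetical table; Hive turns this GROUP BY into a map/shuffle/reduce job.
SELECT user_id, COUNT(*) AS visits
FROM page_views
WHERE view_date = '2020-11-22'
GROUP BY user_id;
```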

Hive official website

Introduction to the official website:

The Apache Hive ™ data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storage using SQL. Structure can be projected onto data already in storage. A command line tool and JDBC driver are provided to connect users to Hive.

1 MySQL installation

By default, Hive stores its metadata in an embedded Derby database. Because Derby allows only one session connection at a time, it is unsuitable for a real production environment, so this article uses MySQL to store Hive's metadata.
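For reference, these are the out-of-the-box metastore settings (embedded Derby) that the MySQL configuration later in this article overrides; the values match those shipped in hive-default.xml.template:

```xml
<!-- Hive's default metastore connection: an embedded Derby database,
     which supports only a single session at a time. -->
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:derby:;databaseName=metastore_db;create=true</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>org.apache.derby.jdbc.EmbeddedDriver</value>
</property>
```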

MySQL download address

Install MySQL using Yum tools

#Download mysql80-community-release-el8-1.noarch.rpm
wget https://dev.mysql.com/get/mysql80-community-release-el8-1.noarch.rpm
#Install RPM package
yum localinstall mysql80-community-release-el8-1.noarch.rpm
#Install MySQL server
yum install mysql-community-server
#Start the MySQL server and set the boot
systemctl start mysqld
systemctl enable mysqld
systemctl daemon-reload
#Modify the default password of the root account before logging in to the MySQL client
#Check the default password of MySQL root user in the log file
grep 'temporary password' /var/log/mysqld.log
#Log in with the MySQL client, entering the password extracted by the command above
mysql -uroot -p
#Change the password. MySQL 8's default policy requires upper and lower case letters, a digit, and a special character.
ALTER USER 'root'@'localhost' IDENTIFIED BY 'Pass-9999';
#After changing the password, restart the server.
systemctl restart mysqld
#Log in again (no space between -p and the password, or omit the password to be prompted)
mysql -uroot -pPass-9999
#Grant root remote access so MySQL can be reached from other hosts.
#Note: MySQL 8 no longer accepts IDENTIFIED BY inside GRANT, so create the remote user first.
CREATE USER 'root'@'%' IDENTIFIED BY 'Pass-9999';
GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' WITH GRANT OPTION;
#Refresh MySQL system permissions
FLUSH PRIVILEGES;
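The `grep` step above prints the entire log line; here is a small sketch (the sample line and password are made up) of cutting out just the password with shell parameter expansion:

```shell
# A sample log line in the format mysqld writes for the generated root password.
line='2020-11-22T10:00:00.000000Z 6 [Note] [MY-010454] [Server] A temporary password is generated for root@localhost: Abc123!xyz'
# Strip everything up to and including "root@localhost: ".
password="${line##*root@localhost: }"
echo "$password"
```

The same expansion works on the real output of `grep 'temporary password' /var/log/mysqld.log`.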

MySQL now stores Hive's metadata. Which tables hold that metadata, and what does each one mean?


SELECT * FROM `VERSION`
SELECT * FROM `DBS`
SELECT * FROM `TBLS`
  • VERSION table: Hive version information
  • DBS table: metadata about Hive databases
  • TBLS table: metadata about Hive tables and views
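As an illustration of how these tables relate, they can be joined to list every table with its database and HDFS location. (The SDS table, which stores each table's storage descriptor, is joined in as well; the column names below follow the standard Hive metastore schema.)

```sql
-- DB_ID links TBLS to DBS; SD_ID links TBLS to SDS, which holds the HDFS location.
SELECT d.NAME AS db_name, t.TBL_NAME, t.TBL_TYPE, s.LOCATION
FROM TBLS t
JOIN DBS d ON t.DB_ID = d.DB_ID
JOIN SDS s ON t.SD_ID = s.SD_ID;
```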

2 Hive installation

The package downloaded here is apache-hive-3.1.2-bin.tar.gz; the download address can be found in the earlier article on building a Hadoop cluster.

#Unzip to /usr/local
tar -zxvf apache-hive-3.1.2-bin.tar.gz -C /usr/local
#Rename
cd /usr/local
mv apache-hive-3.1.2-bin hive-3.1.2
#Configure environment variables
vi /etc/profile
#Add the following configuration at the end of the document
export HIVE_HOME=/usr/local/hive-3.1.2
export HIVE_CONF_DIR=$HIVE_HOME/conf
export PATH=$HIVE_HOME/bin:$PATH
#Immediate effect environment variable
source /etc/profile
#Change to the configuration directory
cd /usr/local/hive-3.1.2/conf
#Copy the template to hive-site.xml
cp hive-default.xml.template hive-site.xml
#Empty hive-site.xml and add the following
<configuration>
    <!-- Database connection URL: use MySQL to store metadata, creating the hive database if it does not exist -->
    <property>
      <name>javax.jdo.option.ConnectionURL</name>
      <value>jdbc:mysql://master:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false</value>
    </property>
    <!-- JDBC driver class -->
    <property>
      <name>javax.jdo.option.ConnectionDriverName</name>
      <value>com.mysql.jdbc.Driver</value>
    </property>
    <!-- Database user name -->
    <property>
      <name>javax.jdo.option.ConnectionUserName</name>
      <value>root</value>
      <description>Username to use against metastore database</description>
    </property>
    <!-- Database password -->
    <property>
      <name>javax.jdo.option.ConnectionPassword</name>
      <value>Pass-9999</value>
      <description>password to use against metastore database</description>
    </property>
    <!-- Local scratch directory for execution plans of the map/reduce stages and their intermediate output -->
    <property>
      <name>hive.exec.local.scratchdir</name>
      <value>/tmp/hive</value>
    </property>
    <!-- Directory for Hive query logs -->
    <property>
      <name>hive.querylog.location</name>
      <value>/tmp/logs</value>
    </property>
    <!-- Default warehouse location for tables, an HDFS path -->
    <property>
      <name>hive.metastore.warehouse.dir</name>
      <value>/user/hive/warehouse</value>
    </property>
    <!-- Use the local metastore mode (one of the three metastore startup modes) -->
    <property>
      <name>hive.metastore.local</name>
      <value>true</value>
    </property>
    <!-- Operation logs for HiveServer2 -->
    <property>
      <name>hive.server2.logging.operation.log.location</name>
      <value>/tmp/logs</value>
    </property>
    <!-- Directory for resources downloaded during a session -->
    <property>
      <name>hive.downloaded.resources.dir</name>
      <value>/tmp/hive${hive.session.id}_resources</value>
    </property>
</configuration>

#Modify the Hive startup configuration
#Change to the bin directory
cd /usr/local/hive-3.1.2/bin
#Add the following configuration
export HADOOP_HEAPSIZE=${HADOOP_HEAPSIZE:-256}
export JAVA_HOME=/usr/local/jdk1.8.0_261
export HADOOP_HOME=/usr/local/hadoop-3.2.1
export HIVE_HOME=/usr/local/hive-3.1.2
#Add the MySQL JDBC driver
cd /usr/local/hive-3.1.2/lib
#Place the connector jar, downloaded separately, in the lib directory
mysql-connector-java-5.1.49-bin.jar
#Initialize Hive and start Hive
#Change to the bin directory and initialize Hive; this creates Hive's metadata tables in MySQL
cd /usr/local/hive-3.1.2/bin
schematool -initSchema -dbType mysql
#Start Hive by simply running the hive command
hive
#View database and table
show databases;
show tables;
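As a quick smoke test (the database and table names below are arbitrary), create a table and insert a row; the table's metadata should then appear in MySQL's TBLS table, and its data under /user/hive/warehouse on HDFS:

```sql
CREATE DATABASE IF NOT EXISTS demo;
USE demo;
CREATE TABLE t_user (id INT, name STRING)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
INSERT INTO t_user VALUES (1, 'alice');
SELECT * FROM t_user;
```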


This article is continuously updated. Search "big data analyst knowledge sharing" on WeChat to read new posts first, and reply [666] to receive related big data materials.
