Palo Doris five minute quick start!

Time:2021-11-25

This article is reproduced from Baidu Developer Centerhttps://developer.baidu.com/article/detail.html?id=294225

In this tutorial section, I will introduce you to the operation process of using Palo UI to quickly experience and use Palo query.

  • For public cloud users, please refer to the documentation to create a Palo cluster.
  • Open source users need to configure enable in fe.conf_ http_ server_ After V2 = true, restart the Fe node.

Palo UI is a Web UI environment that Palo provides for you to quickly execute query requests and perform some management operations.

The demo data and query examples used in this document are from star schema benchmark. Users can click to obtain sample data and SQL statements.

Enter Palo UI

In the cluster details page of Palo, a shortcut to the Palo UI is provided.

Palo Doris five minute quick start!

Click the Palo UI button to enter the Web UI environment. Enter the user name and password on the login page. The user name is “admin”, and the password is the password filled in by the user when creating the cluster.

Open source users can open the HTTP of Fe node in the browser_ Port (8030 by default).

Palo Doris five minute quick start!

After clicking login, you can enter the main page of Palo UI. The default page is Palo query page (playground).

Palo Doris five minute quick start!

The Palo quick query page is mainly divided into three areas. The table management area is on the left, including system library tables and tables created by users. The upper right area is the SQL execution area, and the lower right area is the table preview, data import and execution result area. Next, we will show the main steps from database creation, table creation, data import and query on this page to help users who use Palo for the first time experience a complete use process.

Database and table building

In the editor area, we enter an SQL statement to create an example_ DB library. After clicking execute, you can see the execution results below. After successful execution, refresh the table area on the left, and you can see the newly created example in the table management area_ DB library.

CREATE DATABASE example_db;
Palo Doris five minute quick start!

Our demo data contains five sales related data samples such as order, date and customer information, so we need to create five corresponding tables.

First, we are in example_ Create a table named lineorder in the DB library. Palo uses the distributed keyword to set the bucket column. The bucket column is used to divide the data horizontally. Generally, we choose a column that can help the data to be evenly divided as the bucket column. Here we use lo_ Orderkey is used as the bucket column. Here, we also set the number of replicas to 1, because Palo defaults to three replicas. If our cluster only purchases one compute node, we need to manually set the number of replicas to 1.

This table mainly defines the order number, order time, profit, tax and other order master information.

CREATE TABLE lineorder (
    lo_orderkey BIGINT,
    lo_linenumber BIGINT,
    lo_custkey INT,
    lo_partkey INT,
    lo_suppkey INT,
    lo_orderdate INT,
    lo_orderpriotity VARCHAR(16),
    lo_shippriotity INT,
    lo_quantity BIGINT,
    lo_extendedprice BIGINT,
    lo_ordtotalprice BIGINT,
    lo_discount BIGINT,
    lo_revenue BIGINT,
    lo_supplycost BIGINT,
    lo_tax BIGINT,
    lo_commitdate BIGINT,
    lo_shipmode VARCHAR(11)
)
DISTRIBUTED BY HASH(lo_orderkey)
PROPERTIES ("replication_num"="1");

Then we create a date table. Use d_ Datekey is used as the bucket column, and the number of copies is set to 1. This table defines more detailed order date information.

CREATE TABLE date (

    d_datekey INT,

    d_date VARCHAR(20),
    d_dayofweek VARCHAR(10),
    d_month VARCHAR(11),
    d_year INT,
    d_yearmonthnum INT,
    d_yearmonth VARCHAR(9),
    d_daynuminweek INT,
    d_daynuminmonth INT,
    d_daynuminyear INT,
    d_monthnuminyear INT,
    d_weeknuminyear INT,
    d_sellingseason VARCHAR(14),
    d_lastdayinweekfl INT,
    d_lastdayinmonthfl INT,
    d_holidayfl INT,
    d_weekdayfl INT
) DISTRIBUTED BY hash(d_datekey) PROPERTIES ( 
  "storage_type"="column", 
  "replication_num"="1");

Next, we create the remaining three tables, customer, part and supplier, which record the details of customers, goods and suppliers respectively.

 CREATE TABLE customer (

    c_custkey INT,

    c_name VARCHAR(26),
    c_address VARCHAR(41),
    c_city VARCHAR(11),
    c_nation VARCHAR(16),
    c_region VARCHAR(13),
    c_phone VARCHAR(16),
    c_mktsegment VARCHAR(11) )
DISTRIBUTED BY hash(c_custkey) 
PROPERTIES ( 
  "storage_type"="column", 
  "replication_num"="1");

CREATE TABLE part (
    p_partkey INT,
    p_name VARCHAR(23),
    p_mfgr VARCHAR(7),
    p_category VARCHAR(8),
    p_brand VARCHAR(10),
    p_color VARCHAR(12),
    p_type VARCHAR(26),
    p_size INT,
    p_container VARCHAR(11) )
DISTRIBUTED BY hash(p_partkey) 
PROPERTIES ( 
  "storage_type"="column", 
  "replication_num"="1");

CREATE TABLE supplier (
    s_suppkey INT,
    s_name VARCHAR(26),
    s_address VARCHAR(26),
    s_city VARCHAR(11),
    s_nation VARCHAR(16),
    s_region VARCHAR(13),
    s_phone VARCHAR(16) )
DISTRIBUTED BY hash(s_suppkey) 
PROPERTIES ( 
  "storage_type"="column", 
  "replication_num"="1");

After the table is created, you can view example_ Table information in DB:

Palo Doris five minute quick start!

Import data

Palo supports a variety of data import methods. See the data import document for details. Here we use the web method to easily import data as an example.

First click to select the table to import data

Palo Doris five minute quick start!

Then click data import to enter the data import page

Palo Doris five minute quick start!

Click next. After that, first select the column separator of the imported file. Here we are \ t segmentation. Then select the data file to import

Wait until the file upload is completed, click to select the uploaded file below. At this time, you will also see the preview data of the file divided according to the specified separator (the first 10 lines).

Palo Doris five minute quick start!

Click “next” to enter the data import configuration page. We take “lineorder_data” as the label, and then click the “import” button.

Palo Doris five minute quick start!

Wait a moment and you can see the results of data import. The status of status is success, which means that the import is successful. After clicking OK, the data import is completed.

Palo Doris five minute quick start!

Due to the limited data size supported by the Web UI, we have divided the complete lineorder data into six parts. Here we only import one data as a demonstration. We have placed the complete data samples in the demo file, and the user can add and import all the data according to the test requirements.

We import the data corresponding to the table date, customer, part and supplier in the same way.

Data query

Simple query

After the data import is completed, we can execute some query statements to view the status of the data.

You can preview part of the data in the table.

SELECT * FROM lineorder limit 10
Or count the number of records in the query table.

SELECT COUNT(*) FROM lineorder

Analysis query

Then you can perform query operations and obtain query results according to our analysis requirements.

SELECT SUM(LO_EXTENDEDPRICE*LO_DISCOUNT) AS
REVENUE
FROM  lineorder, date
WHERE  LO_ORDERDATE = D_DATEKEY
AND D_YEAR = 1993
AND LO_DISCOUNT BETWEEN 1 AND 3
AND LO_QUANTITY < 25;

Through the Web UI, we execute SQL on the page and quickly obtain query results.

You can also execute complex queries with multiple tables

SELECT C_CITY, S_CITY, D_YEAR, SUM(LO_REVENUE)

AS  REVENUE

FROM customer, lineorder, supplier, date
WHERE  LO_CUSTKEY = C_CUSTKEY
AND LO_SUPPKEY = S_SUPPKEY
AND  LO_ORDERDATE = D_DATEKEY
AND C_NATION = 'UNITED STATES'
AND S_NATION = 'UNITED STATES'
AND D_YEAR >= 1992 AND D_YEAR <= 1997
GROUP BY C_CITY, S_CITY, D_YEAR
ORDER BY D_YEAR ASC,  REVENUE DESC;

Execution time and results of page query:

Palo Doris five minute quick start!

The execution time is the actual execution time of SQL on the server. Because the UI interface passes through multiple agents, the user perceived query delay is slightly slower than the actual SQL execution time.

 so far, we have completed a complete process of database creation, table creation, data import and query through the UI interface.

Source: Data cornerstone
Author: Mao Kaimin

Click to enter for more technical information~~

Recommended Today

Apache sqoop

Source: dark horse big data 1.png From the standpoint of Apache, data flow can be divided into data import and export: Import: data import. RDBMS—–>Hadoop Export: data export. Hadoop—->RDBMS 1.2 sqoop installation The prerequisite for installing sqoop is that you already have a Java and Hadoop environment. Latest stable version: 1.4.6 Download the sqoop installation […]