Common functions of hivesql / sparksql


1、 Get the current time

  1. current_ Date gets the current date
  2. current_ Timestamp / now() gets the current time
    2018-04-09 15:20:49.247

2、 Extract field from date time

  1. year,month,day/dayofmonth,hour,minute,second
 > SELECT day('2009-07-30');

*   1
*   2

  1. dayofweek (1 = Sunday, 2 = Monday, …, 7 = Saturday),dayofyear
 > SELECT dayofweek('2009-07-30');  

*   1
*   2

  1. weekofyear
    weekofyear(date) – Returns the week of the year of the given date. A week is considered to start on a Monday and week 1 is the first week with >3 days.
 > SELECT weekofyear('2008-02-20');  

*   1
*   2

  1. TRUNC intercepts the date of a part, and the other parts are 01 by default
    The second parameter [“year”, “yyyy”, “YY”, “mon”, “month”, “mm”]
 > SELECT trunc('2009-02-12', 'MM');
    > SELECT trunc('2015-10-27', 'YEAR');

*   1
*   2
*   3
*   4

  1. date_trunc [“YEAR”, “YYYY”, “YY”, “MON”, “MONTH”, “MM”, “DAY”, “DD”, “HOUR”, “MINUTE”, “SECOND”, “WEEK”, “QUARTER”]
 > SELECT date_trunc('2015-03-05T09:32:05.359', 'HOUR');  

*   1
*   2

  1. date_ Format converts time to a string of some format
 > SELECT date_format('2016-04-08', 'y');    

*   1
*   2

3、 Date time conversion

  1. unix_ Timestamp returns the UNIX timestamp of the current time
 > SELECT unix_timestamp();  1476884637
    > SELECT unix_timestamp('2016-04-08', 'yyyy-MM-dd');   1460041200` 

*   1
*   2

  1. from_ Unixtime converts the timestamp to the current time, to_ unix_ Timestamp converts time to timestamp
 > SELECT from_unixtime(0, 'yyyy-MM-dd HH:mm:ss');  1970-01-01 00:00:00
    > SELECT to_unix_timestamp('2016-04-08', 'yyyy-MM-dd');  1460041200` 

*   1
*   2

  1. to_ Date / date converts a string to a date format, to_ timestamp(Since: 2.2.0)
 > SELECT to_date('2009-07-30 04:17:52');  2009-07-30
    > SELECT to_date('2016-12-31', 'yyyy-MM-dd');   2016-12-31
    > SELECT to_timestamp('2016-12-31 00:12:00');   2016-12-31 00:12:00` 

*   1
*   2
*   3

  1. Quarter will divide a year into four equal parts (range 1 to 4)
 > SELECT quarter('2016-08-31'); 

*   1
*   2

4、 Date and time calculation

  1. months_ Number of months between two dates
    months_between(timestamp1, timestamp2) – Returns number of months between timestamp1 and timestamp2.
 > SELECT months_between('1997-02-28 10:30:00', '1996-10-30');  

*   1
*   2

  1. add_ Months returns the date n months after the date
 > SELECT add_months('2016-08-31', 1);  

*   1
*   2
*   3

  1. last_day(date),next_day(start_date, day_of_week)
 > SELECT last_day('2009-01-12');  2009-01-31
    > SELECT next_day('2015-01-14', 'TU');  2015-01-20` 

*   1
*   2

  1. date_ add,date_ Sub (minus)
    date_add(start_date, num_days) – Returns the date that is num_days after start_date.
 > SELECT date_add('2016-07-30', 1);  2016-07-31` 

*   1

  1. DateDiff (number of days between two dates)
    datediff(endDate, startDate) – Returns the number of days from startDate to endDate.
 > SELECT datediff('2009-07-31', '2009-07-30');

*   1
*   2

  1. About UTC time
  • to_utc_timestamp
    to_utc_timestamp(timestamp, timezone) – Given a timestamp like ‘2017-07-14 02:40:00.0’, interprets it as a time in the given time zone, and renders that time as a timestamp in UTC. For example, ‘GMT+1’ would yield ‘2017-07-14 01:40:00.0’.
 > SELECT to_utc_timestamp('2016-08-31', 'Asia/Seoul');  
    2016-08-30 15:00:00` 

*   1
*   2

  • from_utc_timestamp
    from_utc_timestamp(timestamp, timezone) – Given a timestamp like ‘2017-07-14 02:40:00.0’, interprets it as a time in UTC, and renders that time as a timestamp in the given time zone. For example, ‘GMT+1’ would yield ‘2017-07-14 03:40:00.0’.
 > SELECT from_utc_timestamp('2016-08-31', 'Asia/Seoul');  
    2016-08-31 09:00:00

5、 Hive common functions

1. Mathematical function

Round (double D): returns the approximate value of bigint type of D of double type
Round (double D, int): returns the approximate value of double type D with N decimal places
Floor (double D): D is of double type and returns the largest bigint value of < = D
Ceil (double D): D is of double type and returns the smallest bigint value of > = D
Rand() rand (int seed): each row returns a double type random number, and the integer seed is a random factor
Exp (double D): returns the D power of E
Ln (double D): logarithm of D Based on natural number
Log10 (double D): logarithm of D with base 10
Log2 (double D): the logarithm of D with the base of 2
Log (double base, double D)
Pow (double D, double P) power (double D, double P): calculate the P power of D
Sqrt (double D): the square root of D
Bin (double I): calculates the string type value of binary value I
Hex (bigint I): calculates the string type value of hexadecimal value I
Inverse method of unhex (string I): hex
conv(STRING num,INT from_ base,INT to_ Base): change num of string type from from_ Base conversion to to_ Base system
ABS (double D): calculate the absolute value of D
PMOD (int I1, int I2): I1 module I2
Sin (double D): returns the sine value of D
Cos (double D): returns the cosine value of D
Asin (double D): returns the arcsine value of D
ACOS (double D): returns the arccosine of D
Tan (double D): returns the tangent of D
Atan (double D): returns the arctangent of D
Degrees (double D): converts the radian value D to the angle value
Radians (double D): converts the angle value D to the radian value
Positive (double D): returns + D
Negative (double D): returns - D
Sign (double D): if D is positive, it returns + 1.0; if D is negative, it returns - 1.0; otherwise, it is 0

2. Date function

to_ Date (string timestamp): returns the date part of the time string, such as to_ date('1970-01-01 00:00:00')='1970-01-01'
current_ Date: returns the current date
Year (date): returns the year of date, and the type is int, for example, year ('2019-01-01 ') = 2019
Month (date): returns the month of date, the type is int, for example, month ('2019-01-01 ') = 1
Day (date): returns the day of date, the type is int, for example, day ('2019-01-01 ') = 1
Week of year (date1): returns the date date1 is in the week of the year. For example, weekofyear ('2019-03-06 ') = 10
DateDiff (date1, date2): returns the number of days between date1 and date2, such as DateDiff ('2019-03-06 ','2019-03-05') = 1
date_ Add (date1, INT1): returns the date of date1 plus INT1, such as date_ add('2019-03-06',1)='2019-03-07'
date_ Sub (date1, INT1): returns the date of date1 minus INT1, such as date_ sub('2019-03-06',1)='2019-03-05'
months_ Between (date1, date2): returns the month of difference between date1 and date2, such as months_ between('2019-03-06','2019-01-01')=2
add_ Months (date1, INT1): returns the date of date1 plus INT1 month, INT1 can be negative. Such as add_ months('2019-02-11',-1)='2019-01-11'
last_ Day (date1): returns the last day of the month of date1. Such as last_ day('2019-02-01')='2019-02-28'
next_ Day (date1, Day1): returns the date of Day1 next week of date1. Day1 is the first two letters of week x, such as next_ Day ('2019-03-06 ','mo') returns to '2019-03-11'
TRUNC (date1, string1): returns the beginning year or month of the date. String1 can be year (yyyy / YY / year) or month (month / mon / mm). For example, TRUNC ('2019-03-06 ','mm') ='2019-03-01 ', TRUNC ('2019-03-06','yyyy ') ='2019-01-01'
unix_ Timestamp(): returns the UNIX timestamp of the current time. You can specify the date format. Such as UNIX_ timestamp('2019-03-06','yyyy-mm-dd')=1546704180
from_ Unixtime(): returns the date of the UNIX timestamp in a specified format. For example, select from_ unixtime(unix_ timestamp('2019-03-06','yyyy-mm-dd'),'yyyymmdd')='20190306'

3. Conditional function

If (Boolean, T1, T2): if the Boolean value holds, T1 is returned, and T2 is returned anyway. If (1 > 2100200), return 200
Case when Boolean then T1 else T2 end: if the Boolean value is true, then T1, otherwise T2, how much judgment can be added
Coalesce (V0, V1, V2): returns the first non null value in the parameter. If all values are null, null is returned. For example, coalesce (null, 1,2) returns 1
Isnull (a): returns true if a is null, otherwise returns false

4. String function

Length (string1): returns the length of the string
Concat (string1, string2): returns the string after splicing string1 and string2
concat_ WS (SEP, string1, string2): returns a string concatenated by a specified separator
Lower (string1): returns a lowercase string, the same as lcase (string1). Upper() / ucase(): returns an uppercase string
Trim (string1): remove the space around the string, ltrim (string1): remove the space around the string. Rtrim (string1): remove the right space of the string
Repeat (string1, INT1): returns the string after repeating string1 string INT1 times
Reverse (string1): returns the inverted string of string1. For example, reverse ('abc ') Returns' CBA'
Rpad (string1, len1, Pad1): fill the string of string1 to the right of Pad1 to the length of len1. For example, rpad ('abc ', 5,' 1 ') Returns' abc11'. Lpad(): left padding
Split (string1, pat1): the string string1 is separated by the regular pat1, and the array is returned. For example, split ('a, B, C ',') returns ["a", "B", "C"]
Substr (string1, index1, INT1): intercepts INT1 characters from the index position. For example, substr ('abcde ', 1,2) Returns' ab'

5. Aggregate function

Count (): count the number of rows
Sum (col1): count the specified column and
AVG (col1): statistics the average value of the specified column
Min (col1): returns the minimum value of the specified column
Max (col1): returns the maximum value of the specified column

6. Table generating function

Expand (array): returns the corresponding elements in a multi row array. For example, expand (array ('a ',' B ',' C ')) Returns 
Expand (map): returns the corresponding element of the key value pair of multi line map. For example, expand (map (1, 'a', 2, 'B', 3, 'C')) Returns

7. Window function

row_ Number () over (partition by.. order by...): sort according to partition, take different serial numbers for the same value, and there is no serial number jump
Rank() over (partition by.. order by.): sort according to partition, take the same sequence number for the same value, and there is sequence number jump
dense_ Rank() over (partition by.. order by...): sort by partition, take the same sequence number for the same value, and there is no sequence number jump
sum() over(partition by .. order by ..)
count() over(partition by .. order by ..)
Lag (col, n) over (partition by.. order by.): view the nth line above the current line
Lead (col, n) over (partition by.. order by...): view the next nth line of the current line
first_ Value () over (partition by.. order by.): the first value satisfying partition and sorting
last_ Value () over (partition by.. order by...): the last value satisfying partition and sorting
Ntile (n) over (partition by.. order by...): the data satisfying partition and sorting is divided into N parts

For more detailed partition in partition, the windows clause can be used. The Common clauses are:
Ahead: ahead
Following: back
Current row: current row
Unbounded: starting point, unbounded preceding: starting point from the front, unbounded following: ending point from the back
For example:
Sum (Col) over (partition by.. order by.. rows between 1 preceding and current row): the current row is aggregated with the previous row

8. Row and column conversion

concat_ ws(sep, collect_ Set (col1)): different rows in the same group are merged into one column and separated by SEP separator. collect_ Set can also be collected without repetition_ List () instead. collect_ Set() de duplication, collect_ List () does not duplicate
Lateral view expand (split (col1, ''): data in the same group and column are split into multiple rows, which are distinguished by SEP separator