Analytic function

Time:2021-7-6

concept

Official definition:

Analytic functions compute an aggregate value based on a group of rows. They differ from aggregate functions in that they return multiple rows for each group. The group of rows is called a window and is defined by the analytic_clause. For each row, a sliding window of rows is defined. The window determines the range of rows used to perform the calculations for the current row. Window sizes can be based on either a physical number of rows or a logical interval such as time.

Analytic functions are the last set of operations performed in a query except for the final ORDER BY clause. All joins and all WHERE, GROUP BY, and HAVINGclauses are completed before the analytic functions are processed. Therefore, analytic functions can appear only in the select list or ORDER BY clause.

Analytic functions are commonly used to compute cumulative, moving, centered, and reporting aggregates.

There are several key points

  • Calculates a set of data and returns multiple rows
  • There is no need to combine multiple tables to improve performance
  • After all table joins and all where, group by and having sentences, before the order by clause
  • Can only be in select or order by clauses

grammar

Analytic function

  • Commonly usedanalytic_function

    • AVG,MAX,MIN,SUM,COUNT
    • DENSE_RANK,RANK,ROW_NUMBER, CUME_DIST
    • LAG,LEAD
    • FIRST,LAST
    • NTILE
    • FIRST_VALUE/LAST_VALUE
    • LISTAGG
    • RATIO_TO_REPORT
  • argumentsNumber: 0 ~ 3
  • argumentsType: numeric type or non numeric type that can be implicitly converted to numeric type
  • analytic_clause

    Analytic function

    • It is evaluated after the from, where, group by and having clauses
    • Specify the order by clause in the select and order by clausesanalytic_clauseThe analytic function of
    • query_partition_clause

      Analytic function

      • according toexprGroup query results
      • If the statement is ignored, the query result is a group
      • The parse function uses the above branch without parentheses
      • ExprCan be constant, field, non analytic function, function expression
    • order_by_clause

      Analytic function

      • Specifies how the data in the partition is sorted
      • When the sorting result has the same value:

        • DENSE_ Rank, rank returns the same value
        • ROW_ Number returns different values, sorted according to the order of processing rows
      • limit

        • Can only be used in analytic functionsexprpositionandc_aliasinvalid
        • Use the range keyword in the analysis function and use the following window to use multiple sort keys

          • RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW(RANGE UNBOUNDED PRECEDING)
          • RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
          • RANGE BETWEEN CURRENT ROW AND CURRENT ROW
          • RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    • windowing_clause

      Analytic function

      • supportwindowing_clauseAnalysis function: AVG, Max, min, sum, count
      • ROWS | RANGE

        • Define a window for each row to calculate the function result
        • Rows: Specifies the window in rows
        • Range: Specifies the window at a logical offset
      • BETWEEN … AND

        • Specifies the start and end points of the window
        • Omit between, then the specified point is the starting point, and the ending point is the current row by default
      • Only specifiedorder_by_clauseTo usewindowing_clause
      • If omittedwindowing_clause, the default is range between unbounded pressing and current row
      • Unbounded predicting: starting from the first row of the partition
      • Unbounded following: to the end of the last line of the partition
      • CURRENT ROW

        • As the starting point, current row specifies that the window starts from the current row or a value (depending on whether row or range is used). In this case, the ending point cannot bevalue_expr PRECEDING。
        • As the end point, current row specifies that the window ends at the current row or a value (depending on whether row or range is used). In this case, the start point cannot bevalue_expr FOLLOWING。
      • value_expr PRECEDING or value_expr FOLLOWING

        • For range or row

          • If the starting point isvalue_exprFollowing, the ending point must bevalue_expr FOLLOWING
          • If the end point isvalue_exprThe starting point must bevalue_expr PRECEDING
        • If rows is specified

          • value_exprIt’s a physical offset. Must be a constant or expression and must evaluate to a positive value
          • Ifvalue_exprIs part of the starting point, it must be in the row before the ending point
        • If range is specified

          • value_exprIs a logical offset. Must be a constant or expression that evaluates to a positive value or interval text
          • stayorder_by_clauseOnly one sort key can be used
          • Ifvalue_exprIs a numerical value, then order byexprMust be numeric or date type
          • Ifvalue_exprIs the interval value, then order byexprMust be of date type

classification

Type Used For
Reporting Calculating shares, for example, market share. Works with these functions: SUM, AVG, MIN, MAX, COUNT (with/without DISTINCT), VARIANCE, STDDEV, RATIO_TO_REPORT, and new statistical functions. Note that the DISTINCT keyword may be used in those reporting functions that support DISTINCT in aggregate mode.
Windowing Calculating cumulative and moving aggregates. Works with these functions: SUM, AVG, MIN, MAX, COUNT, VARIANCE, STDDEV, FIRST_VALUE, LAST_VALUE, and new statistical functions. Note that the DISTINCTkeyword is not supported in windowing functions except for MAX and MIN.
Ranking Calculating ranks, percentiles, and n-tiles of the values in a result set.
LAG/LEAD Finding a value in a row a specified number of rows from a current row.
FIRST/LAST First or last value in an ordered group.
Hypothetical Rank and Distribution The rank or percentile that a row would have if inserted into a specified data set.

Reporting

  • Query personnel information and company average salary, minimum salary, maximum salary, total salary and number of employees
select employee_id,last_name,department_id,salary,
avg(salary) over () avg_sal,
max(salary) over () max_sal,
min(salary) over () min_sal,
sum(salary) over () sum_sal,
count(salary) over () count_sal
from employees order by department_id;

EMPLOYEE_ID LAST_NAME       DEPARTMENT_ID     SALARY    AVG_SAL    MAX_SAL    MIN_SAL    SUM_SAL  COUNT_SAL
----------- --------------- ------------- ---------- ---------- ---------- ---------- ---------- ----------
        200 Whalen                     10       4400 6461.83178      24000       2100     691416        107
        201 Hartstein                  20      13000 6461.83178      24000       2100     691416        107
        202 Fay                        20       6000 6461.83178      24000       2100     691416        107
        114 Raphaely                   30      11000 6461.83178      24000       2100     691416        107
        119 Colmenares                 30       2500 6461.83178      24000       2100     691416        107
        115 Khoo                       30       3100 6461.83178      24000       2100     691416        107
        116 Baida                      30       2900 6461.83178      24000       2100     691416        107
        117 Tobias                     30       2800 6461.83178      24000       2100     691416        107
        118 Himuro                     30       2600 6461.83178      24000       2100     691416        107
        203 Mavris                     40       6500 6461.83178      24000       2100     691416        107
        198 OConnell                   50       2600 6461.83178      24000       2100     691416        107
        ......
  • Query personnel information and average salary, minimum salary, maximum salary, total salary and number of employees in each department
select employee_id,last_name,department_id,salary,
avg(salary) over (partition by department_id) avg_sal,
max(salary) over (partition by department_id) max_sal,
min(salary) over (partition by department_id) min_sal,
sum(salary) over (partition by department_id) sum_sal,
count(salary) over (partition by department_id) count_sal
from employees order by department_id;

EMPLOYEE_ID LAST_NAME       DEPARTMENT_ID     SALARY    AVG_SAL    MAX_SAL    MIN_SAL    SUM_SAL  COUNT_SAL
----------- --------------- ------------- ---------- ---------- ---------- ---------- ---------- ----------
        200 Whalen                     10       4400       4400       4400       4400       4400          1
        201 Hartstein                  20      13000       9500      13000       6000      19000          2
        202 Fay                        20       6000       9500      13000       6000      19000          2
        114 Raphaely                   30      11000       4150      11000       2500      24900          6
        119 Colmenares                 30       2500       4150      11000       2500      24900          6
        115 Khoo                       30       3100       4150      11000       2500      24900          6
        116 Baida                      30       2900       4150      11000       2500      24900          6
        117 Tobias                     30       2800       4150      11000       2500      24900          6
        118 Himuro                     30       2600       4150      11000       2500      24900          6
        203 Mavris                     40       6500       6500       6500       6500       6500          1
        198 OConnell                   50       2600 3475.55556       8200       2100     156400         45
        ......
  • Query the employee information of the Department’s highest salary (without using analysis function)
select employee_id,last_name,e1.department_id,job_id,salary
from employees e1
where e1.salary=(select max(salary) from employees e2 where e1.department_id=e2.department_id)
order by department_id;

EMPLOYEE_ID LAST_NAME       DEPARTMENT_ID JOB_ID         SALARY
----------- --------------- ------------- ---------- ----------
        200 Whalen                     10 AD_ASST          4400
        201 Hartstein                  20 MK_MAN          13000
        114 Raphaely                   30 PU_MAN          11000
        203 Mavris                     40 HR_REP           6500
        121 Fripp                      50 ST_MAN           8200
        103 Hunold                     60 IT_PROG          9000
        204 Baer                       70 PR_REP          10000
        145 Russell                    80 SA_MAN          14000
        100 King                       90 AD_PRES         24000
        108 Greenberg                 100 FI_MGR          12008
        205 Higgins                   110 AC_MGR          12008

11 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 298340369

---------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                   |     1 |    44 |     5  (20)| 00:00:01 |
|   1 |  SORT ORDER BY                |                   |     1 |    44 |     5  (20)| 00:00:01 |
|   2 |   NESTED LOOPS                |                   |     1 |    44 |     5  (20)| 00:00:01 |
|   3 |    NESTED LOOPS               |                   |    10 |    44 |     5  (20)| 00:00:01 |
|   4 |     VIEW                      | VW_SQ_1           |     1 |    16 |     4  (25)| 00:00:01 |
|*  5 |      FILTER                   |                   |       |       |            |          |
|   6 |       HASH GROUP BY           |                   |     1 |     7 |     4  (25)| 00:00:01 |
|   7 |        TABLE ACCESS FULL      | EMPLOYEES         |   107 |   749 |     3   (0)| 00:00:01 |
|*  8 |     INDEX RANGE SCAN          | EMP_DEPARTMENT_IX |    10 |       |     0   (0)| 00:00:01 |
|*  9 |    TABLE ACCESS BY INDEX ROWID| EMPLOYEES         |     1 |    28 |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   5 - filter(MAX("SALARY")>0)
   8 - access("E1"."DEPARTMENT_ID"="ITEM_1")
   9 - filter("E1"."SALARY"="MAX(SALARY)")


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         18  consistent gets
          0  physical reads
          0  redo size
       1178  bytes sent via SQL*Net to client
        520  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
         11  rows processed
  • Query the employee information of the Department’s highest salary (using the analysis function)
select emp.*
from (select employee_id,last_name,department_id,job_id,salary,
max(salary) over (partition by department_id) max_sal
from employees
order by department_id) emp
where salary=max_sal
order by department_id;

EMPLOYEE_ID LAST_NAME       DEPARTMENT_ID JOB_ID         SALARY    MAX_SAL
----------- --------------- ------------- ---------- ---------- ----------
        200 Whalen                     10 AD_ASST          4400       4400
        201 Hartstein                  20 MK_MAN          13000      13000
        114 Raphaely                   30 PU_MAN          11000      11000
        203 Mavris                     40 HR_REP           6500       6500
        121 Fripp                      50 ST_MAN           8200       8200
        103 Hunold                     60 IT_PROG          9000       9000
        204 Baer                       70 PR_REP          10000      10000
        145 Russell                    80 SA_MAN          14000      14000
        100 King                       90 AD_PRES         24000      24000
        108 Greenberg                 100 FI_MGR          12008      12008
        205 Higgins                   110 AC_MGR          12008      12008

EMPLOYEE_ID LAST_NAME       DEPARTMENT_ID JOB_ID         SALARY    MAX_SAL
----------- --------------- ------------- ---------- ---------- ----------
        178 Grant                         SA_REP           7000       7000

12 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 720055818

---------------------------------------------------------------------------------
| Id  | Operation           | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |           |   107 |  6848 |     3   (0)| 00:00:01 |
|*  1 |  VIEW               |           |   107 |  6848 |     3   (0)| 00:00:01 |
|   2 |   WINDOW SORT       |           |   107 |  2996 |     3   (0)| 00:00:01 |
|   3 |    TABLE ACCESS FULL| EMPLOYEES |   107 |  2996 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("SALARY"="MAX_SAL")


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
          6  consistent gets
          0  physical reads
          0  redo size
       1312  bytes sent via SQL*Net to client
        520  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
         12  rows processed

We can see that the performance of SQL using analysis function has been improved to a certain extent.

  • Query the personnel information, the total salary of each department and position, and the total salary of each department
select employee_id,last_name,department_id,job_id,salary,
sum(salary) over (partition by department_id,job_id) job_sal1,
sum(salary) over (partition by department_id) dept_sal2
from employees
order by department_id;

EMPLOYEE_ID LAST_NAME       DEPARTMENT_ID JOB_ID         SALARY   JOB_SAL1  DEPT_SAL2
----------- --------------- ------------- ---------- ---------- ---------- ----------
        200 Whalen                     10 AD_ASST          4400       4400       4400
        201 Hartstein                  20 MK_MAN          13000      13000      19000
        202 Fay                        20 MK_REP           6000       6000      19000
        118 Himuro                     30 PU_CLERK         2600      13900      24900
        119 Colmenares                 30 PU_CLERK         2500      13900      24900
        115 Khoo                       30 PU_CLERK         3100      13900      24900
        116 Baida                      30 PU_CLERK         2900      13900      24900
        117 Tobias                     30 PU_CLERK         2800      13900      24900
        114 Raphaely                   30 PU_MAN          11000      11000      24900
        203 Mavris                     40 HR_REP           6500       6500       6500
        198 OConnell                   50 SH_CLERK         2600      64300     156400
        ......
  • Query the total salary of each position in each department and the total salary of each department
select department_id,job_id,
sum(salary) job_sal1,
sum(sum(salary)) over (partition by department_id) dept_sal2
from employees
group by department_id,job_id
order by department_id;

DEPARTMENT_ID JOB_ID       JOB_SAL1  DEPT_SAL2
------------- ---------- ---------- ----------
           10 AD_ASST          4400       4400
           20 MK_MAN          13000      19000
           20 MK_REP           6000      19000
           30 PU_CLERK        13900      24900
           30 PU_MAN          11000      24900
           40 HR_REP           6500       6500
           50 SH_CLERK        64300     156400
           50 ST_CLERK        55700     156400
           50 ST_MAN          36400     156400
           60 IT_PROG         28800      28800
           70 PR_REP          10000      10000

DEPARTMENT_ID JOB_ID       JOB_SAL1  DEPT_SAL2
------------- ---------- ---------- ----------
           80 SA_MAN          61000     304500
           80 SA_REP         243500     304500
           90 AD_PRES         24000      58000
           90 AD_VP           34000      58000
          100 FI_ACCOUNT      39600      51608
          100 FI_MGR          12008      51608
          110 AC_ACCOUNT       8300      20308
          110 AC_MGR          12008      20308
              SA_REP           7000       7000

20 rows selected.
  • Query the positions whose total salary accounts for more than 50% of the total salary of the Department
select emp.*,100 * round(job_sal1/dept_sal2, 2)||'%' Percent
from (select department_id,job_id,
sum(salary) job_sal1,
sum(sum(salary)) over (partition by department_id) dept_sal2
from employees
group by department_id,job_id) emp 
where job_sal1>dept_sal2*0.5;

DEPARTMENT_ID JOB_ID       JOB_SAL1  DEPT_SAL2 PERCENT
------------- ---------- ---------- ---------- -----------------------------------------
           10 AD_ASST          4400       4400 100%
           20 MK_MAN          13000      19000 68%
           30 PU_CLERK        13900      24900 56%
           40 HR_REP           6500       6500 100%
           60 IT_PROG         28800      28800 100%
           70 PR_REP          10000      10000 100%
           80 SA_REP         243500     304500 80%
           90 AD_VP           34000      58000 59%
          100 FI_ACCOUNT      39600      51608 77%
          110 AC_MGR          12008      20308 59%
              SA_REP           7000       7000 100%

11 rows selected.
  • Query the positions whose total salary accounts for more than 50% of the total salary of the Department (using ratio)_ to_ Report function)
select emp.*
from (select department_id,job_id,
sum(salary) job_sal1,
sum(sum(salary)) over (partition by department_id) dept_sal2,
ratio_to_report(sum(salary)) over (partition by department_id) job_to_dept_sal3
from employees
group by department_id,job_id) emp
where job_to_dept_sal3>0.5;

DEPARTMENT_ID JOB_ID       JOB_SAL1  DEPT_SAL2 JOB_TO_DEPT_SAL3
------------- ---------- ---------- ---------- ----------------
           10 AD_ASST          4400       4400                1
           20 MK_MAN          13000      19000       .684210526
           30 PU_CLERK        13900      24900       .558232932
           40 HR_REP           6500       6500                1
           60 IT_PROG         28800      28800                1
           70 PR_REP          10000      10000                1
           80 SA_REP         243500     304500       .799671593
           90 AD_VP           34000      58000       .586206897
          100 FI_ACCOUNT      39600      51608       .767322896
          110 AC_MGR          12008      20308       .591294071
              SA_REP           7000       7000                1

11 rows selected.
  • Query the percentage of each person’s salary in the total Department salary and the total company salary (use ratio)_ to_ Report function)
select employee_id,last_name,department_id,hire_date,salary,
ratio_to_report(salary) over(partition by department_id) as pct1,
ratio_to_report(salary) over() as pct2
from employees;

EMPLOYEE_ID LAST_NAME       DEPARTMENT_ID HIRE_DATE              SALARY       PCT1       PCT2
----------- --------------- ------------- ------------------ ---------- ---------- ----------
        200 Whalen                     10 17-SEP-03                4400          1 .006363752
        201 Hartstein                  20 17-FEB-04               13000 .684210526 .018801995
        202 Fay                        20 17-AUG-05                6000 .315789474 .008677844
        114 Raphaely                   30 07-DEC-02               11000 .441767068  .01590938
        119 Colmenares                 30 10-AUG-07                2500 .100401606 .003615768
        115 Khoo                       30 18-MAY-03                3100 .124497992 .004483553
        116 Baida                      30 24-DEC-05                2900 .116465863 .004194291
        117 Tobias                     30 24-JUL-05                2800 .112449799  .00404966
        118 Himuro                     30 15-NOV-06                2600 .104417671 .003760399
        203 Mavris                     40 07-JUN-02                6500          1 .009400997
        198 OConnell                   50 21-JUN-07                2600 .016624041 .003760399
        ......

Windowing

  • Cumulative Aggregate Function

    • Query salary totals by department and company
    select employee_id,last_name,department_id,salary,
    sum(salary) over (partition by department_id order by department_id ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) dept_sal_cum1,
    sum(salary) over (order by department_id ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) dept_sal_cum2
    from employees;
    
    EMPLOYEE_ID LAST_NAME       DEPARTMENT_ID     SALARY DEPT_SAL_CUM1 DEPT_SAL_CUM2
    ----------- --------------- ------------- ---------- ------------- -------------
            200 Whalen                     10       4400          4400        691416
            201 Hartstein                  20      13000         19000        691416
            202 Fay                        20       6000         19000        691416
            114 Raphaely                   30      11000         24900        691416
            119 Colmenares                 30       2500         24900        691416
            115 Khoo                       30       3100         24900        691416
            116 Baida                      30       2900         24900        691416
            117 Tobias                     30       2800         24900        691416
            118 Himuro                     30       2600         24900        691416
            203 Mavris                     40       6500          6500        691416
            198 OConnell                   50       2600        156400        691416
            ......

    Equivalent to the following SQL:

    select employee_id,last_name,department_id,salary,
    sum(salary) over (partition by department_id) dept_sal_cum1,
    sum(salary) over () dept_sal_cum2
    from employees;
    • Query salary accumulation by department and not by department
    select employee_id,last_name,department_id,salary,
    sum(salary) over (partition by department_id order by department_id ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) dept_sal_cum1,
    sum(salary) over (order by department_id ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) dept_sal_cum2
    from employees;
    
    EMPLOYEE_ID LAST_NAME       DEPARTMENT_ID     SALARY DEPT_SAL_CUM1 DEPT_SAL_CUM2
    ----------- --------------- ------------- ---------- ------------- -------------
            200 Whalen                     10       4400          4400          4400
            201 Hartstein                  20      13000         13000         17400
            202 Fay                        20       6000         19000         23400
            114 Raphaely                   30      11000         11000         34400
            119 Colmenares                 30       2500         13500         36900
            115 Khoo                       30       3100         16600         40000
            116 Baida                      30       2900         19500         42900
            117 Tobias                     30       2800         22300         45700
            118 Himuro                     30       2600         24900         48300
            203 Mavris                     40       6500          6500         54800
            198 OConnell                   50       2600          2600         57400
            ......

    Equivalent to the following SQL:

    select employee_id,last_name,department_id,salary,
    sum(salary) over (partition by department_id order by department_id ROWS UNBOUNDED PRECEDING) dept_sal_cum1,
    sum(salary) over (order by department_id ROWS UNBOUNDED PRECEDING) dept_sal_cum2
    from employees;
    • Query the cumulative amount from the first row of the partition to the previous row of this line and the cumulative amount to the next row of this line by department partition
    select employee_id,last_name,department_id,salary,
    sum(salary) over (partition by department_id order by department_id ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) dept_sal_cum1,
    sum(salary) over (partition by department_id order by department_id ROWS BETWEEN UNBOUNDED PRECEDING AND 1 FOLLOWING) dept_sal_cum2
    from employees;
    
    EMPLOYEE_ID LAST_NAME       DEPARTMENT_ID     SALARY DEPT_SAL_CUM1 DEPT_SAL_CUM2
    ----------- --------------- ------------- ---------- ------------- -------------
            200 Whalen                     10       4400                        4400
            201 Hartstein                  20      13000                       19000
            202 Fay                        20       6000         13000         19000
            114 Raphaely                   30      11000                       13500
            119 Colmenares                 30       2500         11000         16600
            115 Khoo                       30       3100         13500         19500
            116 Baida                      30       2900         16600         22300
            117 Tobias                     30       2800         19500         24900
            118 Himuro                     30       2600         22300         24900
            203 Mavris                     40       6500                        6500
            198 OConnell                   50       2600                        5200
            ......
  • Moving Aggregate Function

    • Query the cumulative amount from the previous row to the current row and from the current row to the next row by department partition
    select employee_id,last_name,department_id,salary,
    sum(salary) over (partition by department_id order by department_id ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) dept_sal_cum1,
    sum(salary) over (partition by department_id order by department_id ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING) dept_sal_cum2
    from employees;
    
    EMPLOYEE_ID LAST_NAME       DEPARTMENT_ID     SALARY DEPT_SAL_CUM1 DEPT_SAL_CUM2
    ----------- --------------- ------------- ---------- ------------- -------------
            200 Whalen                     10       4400          4400          4400
            201 Hartstein                  20      13000         13000         19000
            202 Fay                        20       6000         19000          6000
            114 Raphaely                   30      11000         11000         13500
            119 Colmenares                 30       2500         13500          5600
            115 Khoo                       30       3100          5600          6000
            116 Baida                      30       2900          6000          5700
            117 Tobias                     30       2800          5700          5400
            118 Himuro                     30       2600          5400          2600
            203 Mavris                     40       6500          6500          6500
            198 OConnell                   50       2600          2600          5200
            ......
  • Centered Aggregate

    • Query the total salary grouped by the entry date and the average salary of the personnel on the day adjacent to the entry date
    SELECT hire_date, SUM(salary) AS sum_sal1, 
    AVG(SUM(salary)) OVER (ORDER BY hire_date RANGE BETWEEN INTERVAL '1' DAY PRECEDING AND INTERVAL '1' DAY FOLLOWING) AS CENTERED_1_DAY_AVG
    FROM employees
    GROUP BY hire_date;
    
    HIRE_DATE            SUM_SAL1 CENTERED_1_DAY_AVG
    ------------------ ---------- ------------------
    13-JAN-01               17000              17000
    07-JUN-02               36808              36808
    16-AUG-02                9000              10504
    17-AUG-02               12008              10504
    07-DEC-02               11000              11000
    01-MAY-03                7900               7900
    18-MAY-03                3100               3100
    17-JUN-03               24000              24000
    14-JUL-03                3600               3600
    17-SEP-03                4400               4400
    17-OCT-03                3500               3500
    ......

Ranking

  • Rank: returns a unique value. Unless the same data is encountered, the ranking of all the same data is the same. At the same time, the ranking will be vacated between the ranking of the last same record and the next different record
  • DENSE_ Rank: returns a unique value, unless the rank of all the same data is the same when the same data is encountered.
  • ROW_ Number: returns a unique value. When encountering the same data, the ranking will increase in turn according to the order of records in the record set.
  • Query the salary information of employees ranked from low to high by Department
select employee_id,last_name,department_id,salary,
RANK() over (partition by department_id order by salary) rank,
DENSE_RANK() over (partition by department_id order by salary) dense_rank,
ROW_NUMBER() over (partition by department_id order by salary) row_number
from employees where department_id=50;


EMPLOYEE_ID LAST_NAME       DEPARTMENT_ID     SALARY       RANK DENSE_RANK ROW_NUMBER
----------- --------------- ------------- ---------- ---------- ---------- ----------
        132 Olson                      50       2100          1          1          1
        128 Markle                     50       2200          2          2          2
        136 Philtanker                 50       2200          2          2          3
        135 Gee                        50       2400          4          3          4
        127 Landry                     50       2400          4          3          5
        131 Marlow                     50       2500          6          4          6
        144 Vargas                     50       2500          6          4          7
        182 Sullivan                   50       2500          6          4          8
        191 Perkins                    50       2500          6          4          9
        140 Patel                      50       2500          6          4         10
        198 OConnell                   50       2600         11          5         11
        ......
  • Query the salary information of the top three employees in each department
select e.*
from (select employee_id,last_name,department_id,salary,
DENSE_RANK() over (partition by department_id order by salary desc) dense_rank
from employees) e
where dense_rank<=3;


EMPLOYEE_ID LAST_NAME       DEPARTMENT_ID     SALARY DENSE_RANK
----------- --------------- ------------- ---------- ----------
        200 Whalen                     10       4400          1
        201 Hartstein                  20      13000          1
        202 Fay                        20       6000          2
        114 Raphaely                   30      11000          1
        115 Khoo                       30       3100          2
        116 Baida                      30       2900          3
        203 Mavris                     40       6500          1
        121 Fripp                      50       8200          1
        120 Weiss                      50       8000          2
        122 Kaufling                   50       7900          3
        103 Hunold                     60       9000          1
        ......

LAG/LEAD

  • grammar

{LAG | LEAD} ( value_expr [, offset] [, default] ) [RESPECT NULLS|IGNORE NULLS] OVER ( [query_partition_clause] order_by_clause )

The lag and lead functions can obtain a column of a row of several adjacent offsets of the current row arranged in a certain order in the result set (without the self association of the result set); Lag and lead are forward and backward respectively; Lag and lead have three parameters. The first parameter is the column name, the second parameter is the offset, and the third parameter is the default value when it exceeds the record window). The lag (expression <, offset > <, default >) function can access the row before the current line in the group, while the lead (expression <, offset > <, default >) function can access the row after the current line in the group. Where, offset is a positive integer, and the default value is 1. Because the first record in the group has no previous line, and the last record has no subsequent line, default is used to process such information, and the default value is empty. Note: the order by sentence must be specified in these two functions.

  • Query the salary of an employee, the salary of the previous employee and the salary of the subsequent employee
SELECT hire_date, last_name, salary,
LAG(salary, 1, 0 ) OVER (ORDER BY hire_date) AS prev_sal,
LEAD(salary, 1, 0 ) OVER (ORDER BY hire_date) AS next_sal
FROM employees
WHERE job_id = 'PU_CLERK'
ORDER BY hire_date;

HIRE_DATE          LAST_NAME           SALARY   PREV_SAL   NEXT_SAL
------------------ --------------- ---------- ---------- ----------
18-MAY-03          Khoo                  3100          0       2800
24-JUL-05          Tobias                2800       3100       2900
24-DEC-05          Baida                 2900       2800       2600
15-NOV-06          Himuro                2600       2900       2500
10-AUG-07          Colmenares            2500       2600          0

FIRST/LAST

  • grammar

aggregate_function KEEP ( DENSE_RANK LAST ORDER BY expr [ DESC | ASC ] [NULLS { FIRST | LAST }] [, expr [ DESC | ASC ] [NULLS { FIRST | LAST }]]…) [OVER query_partitioning_clause]

The first / last function allows us to sort a data set and process the first record and the last record of the sorted result. After the first or last record is found, we need to apply an aggregate function to deal with a specific column. This is to ensure the uniqueness of the returned result, because there may be multiple records ranking first and last. Using the first / last function can avoid self join or sub query, so it can improve the processing efficiency.

  • instructions

    • If the first and last functions have an over clause, they are analytic functions. If they don’t have an over clause, they are aggregate functions.
    • The parameter of the function must be of numeric type (or other types can be converted to numeric type), and return the same type
    • aggregate_ Function can be min, Max, sum, AVG, count, variance, StdDev
  • Inquire about the minimum and maximum salary of personnel and their departments
SELECT employee_id, last_name, department_id, salary,
MIN(salary) KEEP (DENSE_RANK FIRST ORDER BY salary) OVER (PARTITION BY department_id) "Worst",
MAX(salary) KEEP (DENSE_RANK LAST ORDER BY salary) OVER (PARTITION BY department_id) "Best"
FROM employees
ORDER BY department_id, salary, last_name;

EMPLOYEE_ID LAST_NAME       DEPARTMENT_ID     SALARY      Worst       Best
----------- --------------- ------------- ---------- ---------- ----------
        200 Whalen                     10       4400       4400       4400
        202 Fay                        20       6000       6000      13000
        201 Hartstein                  20      13000       6000      13000
        119 Colmenares                 30       2500       2500      11000
        118 Himuro                     30       2600       2500      11000
        117 Tobias                     30       2800       2500      11000
        116 Baida                      30       2900       2500      11000
        115 Khoo                       30       3100       2500      11000
        114 Raphaely                   30      11000       2500      11000
        203 Mavris                     40       6500       6500       6500
        132 Olson                      50       2100       2100       8200

NTILE

  • grammar

NTILE (expr) OVER ([query_partition_clause] order_by_clause)

  • Query the personnel information and its corresponding salary level, and divide the salary into five levels
SELECT employee_id,last_name,salary,
NTILE(5) OVER (ORDER BY salary DESC) AS quartile
FROM employees
WHERE department_id=30;

EMPLOYEE_ID LAST_NAME                     SALARY   QUARTILE
----------- ------------------------- ---------- ----------
        114 Raphaely                       11000          1
        115 Khoo                            3100          1
        116 Baida                           2900          2
        117 Tobias                          2800          3
        118 Himuro                          2600          4
        119 Colmenares                      2500          5

FIRST_VALUE/LAST_VALUE

  • grammar

FIRST_VALUE|LAST_VALUE ( <expr> ) [RESPECT NULLS|IGNORE NULLS] OVER (analytic clause );

  • Query the personnel information and the name of the lowest salary and highest salary personnel in the Department
SELECT employee_id,last_name,department_id,salary,
FIRST_VALUE(last_name) OVER (PARTITION BY department_id ORDER BY salary) AS worst,
LAST_VALUE(last_name) OVER (PARTITION BY department_id ORDER BY salary ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS best
FROM employees order by department_id,salary;

EMPLOYEE_ID LAST_NAME            DEPARTMENT_ID     SALARY WORST                BEST
----------- -------------------- ------------- ---------- -------------------- --------------------
        200 Whalen                          10       4400 Whalen               Whalen
        202 Fay                             20       6000 Fay                  Hartstein
        201 Hartstein                       20      13000 Fay                  Hartstein
        119 Colmenares                      30       2500 Colmenares           Raphaely
        118 Himuro                          30       2600 Colmenares           Raphaely
        117 Tobias                          30       2800 Colmenares           Raphaely
        116 Baida                           30       2900 Colmenares           Raphaely
        115 Khoo                            30       3100 Colmenares           Raphaely
        114 Raphaely                        30      11000 Colmenares           Raphaely
        203 Mavris                          40       6500 Mavris               Mavris
        132 Olson                           50       2100 Olson                Fripp
        ......

LISTAGG

  • grammar

LISTAGG (<expr> [, <delimiter>) WITHIN GROUP (ORDER BY <oby_expression_list>)

  • Query the names of all personnel in each department and sort them from low to high salary
select department_id,
listagg(last_name,',') within group (order by salary) name 
from employees where department_id in (10,20,30) group by department_id;

DEPARTMENT_ID NAME
------------- --------------------------------------------------
           10 Whalen
           20 Fay,Hartstein
           30 Colmenares,Himuro,Tobias,Baida,Khoo,Raphaely
select department_id,last_name,salary,
listagg(last_name,',') within group (order by salary) over (partition by department_id) name 
from employees where department_id in (10,20,30);


DEPARTMENT_ID LAST_NAME                SALARY NAME
------------- -------------------- ---------- --------------------------------------------------
           10 Whalen                     4400 Whalen
           20 Fay                        6000 Fay,Hartstein
           20 Hartstein                 13000 Fay,Hartstein
           30 Colmenares                 2500 Colmenares,Himuro,Tobias,Baida,Khoo,Raphaely
           30 Himuro                     2600 Colmenares,Himuro,Tobias,Baida,Khoo,Raphaely
           30 Tobias                     2800 Colmenares,Himuro,Tobias,Baida,Khoo,Raphaely
           30 Baida                      2900 Colmenares,Himuro,Tobias,Baida,Khoo,Raphaely
           30 Khoo                       3100 Colmenares,Himuro,Tobias,Baida,Khoo,Raphaely
           30 Raphaely                  11000 Colmenares,Himuro,Tobias,Baida,Khoo,Raphaely

CUME_DIST

  • grammar

CUME_DIST ( ) OVER ( [query_partition_clause] order_by_clause )

  • Calculate the relative position of each person in the Department by salary
SELECT employee_id,last_name,department_id,salary,
CUME_DIST() OVER (PARTITION BY department_id ORDER BY salary) AS cume_dist 
FROM employees
WHERE department_id=30;


EMPLOYEE_ID LAST_NAME            DEPARTMENT_ID     SALARY  CUME_DIST
----------- -------------------- ------------- ---------- ----------
        119 Colmenares                      30       2500 .166666667
        118 Himuro                          30       2600 .333333333
        117 Tobias                          30       2800         .5
        116 Baida                           30       2900 .666666667
        115 Khoo                            30       3100 .833333333
        114 Raphaely                        30      11000          1

PERCENT_RANK

  • grammar

PERCENT_RANK () OVER ([query_partition_clause] order_by_clause)

  • Calculate the relative position of each person in the Department by salary
SELECT department_id,last_name,salary,
PERCENT_RANK() OVER (PARTITION BY department_id ORDER BY salary) AS pr 
FROM employees
WHERE department_id=30;

DEPARTMENT_ID LAST_NAME                SALARY         PR
------------- -------------------- ---------- ----------
           30 Colmenares                 2500          0
           30 Himuro                     2600         .2
           30 Tobias                     2800         .4
           30 Baida                      2900         .6
           30 Khoo                       3100         .8
           30 Raphaely                  11000          1

Hypothetical Rank

  • grammar

[RANK | DENSE_RANK | PERCENT_RANK | CUME_DIST]( constant expression [, …] ) WITHIN GROUP ( ORDER BY order by expression [ASC|DESC] [NULLS FIRST|NULLS LAST][, …] )

  • If there is a new employee with a salary of 4000 in department 50, calculate the salary position of the employee in department 50
select 
RANK(50,4000) within group (order by department_id, salary) rank,
DENSE_RANK(50,4000) within group (order by department_id, salary) dense_rank,
PERCENT_RANK(50,4000) within group (order by department_id, salary) percent_rank,
cume_dist(50,4000) within group (order by department_id, salary) cume_dist 
from employees where department_id=50;

      RANK DENSE_RANK PERCENT_RANK  CUME_DIST
---------- ---------- ------------ ----------
        38         18   .822222222 .847826087

Welcome to my official account and study together.

Analytic function