James Madison
SQL Analytics
by James Madison

Home
Home


Other Pages:
Technical






James Madison

SQL for analysis

Analytic functions compute an aggregate value based on a group of rows.  They differ from aggregate functions in that they return multiple rows for each group. 

Aggregate functions you are probably familiar with are SUM, AVG, MIN, MAX, etc.  Notice how these return just one row. 

The group of rows is called a “window”.  For each row, a sliding window of rows is defined.  The window determines the range of rows used to perform the calculations for the current row.  Window sizes can be based on either a physical number of rows or a logical interval such as time. 

Analytic functions are the last set of operations performed in a query except for the final ORDER BY clause.  All joins and all WHERE, GROUP BY, and HAVING clauses are completed before the analytic functions are processed.  Therefore, analytic functions can appear only in the select list or ORDER BY clause.

Syntax

analytic_function :: =

Description of analytic_function.gif follows

 

analytic_clause :: =

            Description of analytic_clause.gif follows

 

query_partition_clause :: =

            Description of query_partition_clause.gif follows

 

order_by_clause :: =

            Description of order_by_clause.gif follows

 

windowing_clause :: =

            Description of windowing_clause.gif follows

 

Execution order

The parts of the analytic operation are executed in a certain order.  Knowing this order is critical to understanding how the analytical function works.  The order is:

Group by – Must go first as it heavily influences the structure of the data before any part of an analytic function.

Partition – Breaks the data set from the main SELECT…GROUP BY into parts that we define.

Order – Orders the partition for two reasons: for presentation just like ORDER BY, or so the window can slide over it.

Window – Slide a window along the ordered partition so only a portion is used as the window slides.

Analytic – Apply the analytic to the window moving across the ordered partition.

This can be thought of as the pipe and filter design pattern.

Colorization

The annotation used in this document clarifies the following important constructs of the analytic query:

Query partition clause

Order by clause

Windowing clause

Analytic function

Arguments to the analytic function

GROUP BY clause

 

Five colors are used are as follows:

       analytic ( [ arguments ] ) OVER ( [ partition ] [ order [ window ] ] )

 

Additional colors are used to highlight other specific features as needed.

GROUP BY columns are both isolated in the SELECT clause, then underlined within the GROUP BY clause:

SELECT column_one, column_two,

...

GROUP BY column_one, column_two

 

Then underlined again in the result set:

COLUMN_ONE   COLUMN_TWO

------------ ------------

 

And are listed as the first columns in the SELECT clause, separated by a comment line:

SELECT column_one, column_two,

--

Printing Issues

All formatting on this page seems to render correctly in all major browsers.  How formatting comes out in printing varies by browser.  Known issues are:

·         Internet Explorer fails to print the syntax diagrams.

·         All browsers fail to print highlighting colors.

Ranking functions

A ranking function computes the rank of a record compared to other records in the data set based on the values of a set of measures.

Ranking on a single expression: rank() over (order by agg)

User: “What is the ranking of the total sales within a channel for the last few months in the U.S.?”

SELECT channel_desc,

       --

       TO_CHAR(SUM(amount_sold), '9,999,999,999') SALES$,

       --

       RANK() OVER (

           ORDER BY SUM(amount_sold)

       ) AS default_rank,

       --

       RANK() OVER (

           ORDER BY SUM(amount_sold) DESC NULLS LAST

       ) AS custom_rank

       --

  FROM sales, products, customers, times, channels, countries

       --

 WHERE sales.prod_id = products.prod_id

   AND sales.cust_id = customers.cust_id

   AND customers.country_id = countries.country_id

   AND sales.time_id = times.time_id

   AND sales.channel_id = channels.channel_id

   AND times.calendar_month_desc IN ('2000-09', '2000-10')

   AND country_iso_code = 'US'

       --

 GROUP BY channel_desc

 ORDER BY 2 ASC;

 

CHANNEL_DESC         SALES$         DEFAULT_RANK CUSTOM_RANK

-------------------- -------------- ------------ -----------

Direct Sales              1,320,497            3           1

Partners                    800,871            2           2

Internet                    261,278            1           3

Dense rank: rank() over (order by agg)

User: “What is the ranking of the total sales within a channel for the last few months?  Show me different ways to break ties.”

SELECT channel_desc, calendar_month_desc,

       --

       TO_CHAR(TRUNC(SUM(amount_sold),-4), '9,999,999,999') SALES$,

       --

       RANK() OVER (

           ORDER BY TRUNC(SUM(amount_sold),-4) DESC

       ) AS RANK,

       --

       DENSE_RANK() OVER (

           ORDER BY TRUNC(SUM(amount_sold),-4) DESC

       ) AS DENSE_RANK

       --

  FROM sales, products, customers, times, channels

       --

 WHERE sales.prod_id = products.prod_id

   AND sales.cust_id = customers.cust_id

   AND sales.time_id = times.time_id

   AND sales.channel_id = channels.channel_id

   AND times.calendar_month_desc IN ('2000-09', '2000-10')

   AND channels.channel_desc <> 'Tele Sales'

       --

 GROUP BY channel_desc, calendar_month_desc;

 

CHANNEL_DESC         CALENDAR SALES$               RANK DENSE_RANK

-------------------- -------- -------------- ---------- ----------

Direct Sales         2000-09       1,200,000          1          1

Direct Sales         2000-10       1,200,000          1          1

Partners             2000-09         600,000          3          2

Partners             2000-10         600,000          3          2

Internet             2000-09         200,000          5          3

Internet             2000-10         200,000          5          3

Ranking on multiple expressions: rank() over (order by agg, agg)

User: “What is the ranking of the total sales within a channel for the last few months, breaking any ties with number of sales?”

SELECT channel_desc, calendar_month_desc,

       --

       TO_CHAR(TRUNC(SUM(amount_sold),-5),'9,999,999,999') SALES$,

       TO_CHAR(SUM(quantity_sold), '9,999,999,999') SALES_Count,

       --

       RANK() OVER (

           ORDER BY TRUNC(SUM(amount_sold), -5) DESC,

           SUM(quantity_sold) DESC

       ) AS col_rank

       --

  FROM sales, products, customers, times, channels

       --

 WHERE sales.prod_id = products.prod_id

   AND sales.cust_id = customers.cust_id

   AND sales.time_id = times.time_id

   AND sales.channel_id = channels.channel_id

   AND times.calendar_month_desc IN ('2000-09', '2000-10')

   AND channels.channel_desc <> 'Tele Sales'

       --

 GROUP BY channel_desc, calendar_month_desc;

 

CHANNEL_DESC         CALENDAR SALES$         SALES_COUNT      COL_RANK

-------------------- -------- -------------- -------------- ----------

Direct Sales         2000-10       1,200,000         12,584          1

Direct Sales         2000-09       1,200,000         11,995          2

Partners             2000-10         600,000          7,508          3

Partners             2000-09         600,000          6,165          4

Internet             2000-09         200,000          1,887          5

Internet             2000-10         200,000          1,450          6

Per group ranking: rank() over (partition by c order by agg)

User: “Actually, we award for the best in any month regardless of channel and the best in channel for the last four months.”

SELECT channel_desc, calendar_month_desc,

       --

       TO_CHAR(SUM(amount_sold), '9,999,999,999') SALES$,

       --

       RANK() OVER (

           PARTITION BY calendar_month_desc

           ORDER BY SUM(amount_sold) DESC

       ) AS RANK_WITHIN_MONTH,

       --

       RANK() OVER (

           PARTITION BY channel_desc

           ORDER BY SUM(amount_sold) DESC

       ) AS RANK_WITHIN_CHANNEL

       --

 FROM  sales, products, customers, times, channels, countries

       --

 WHERE sales.prod_id = products.prod_id

   AND sales.cust_id = customers.cust_id

   AND customers.country_id = countries.country_id

   AND sales.time_id = times.time_id

   AND sales.channel_id = channels.channel_id

   AND times.calendar_month_desc IN ('2000-08', '2000-09', '2000-10', '2000-11')

   AND channels.channel_desc IN ('Direct Sales', 'Internet')

       --

 GROUP BY channel_desc, calendar_month_desc;

 

CHANNEL_DESC  CALENDAR    SALES$     RANK_WITHIN_MONTH  RANK_WITHIN_CHANNEL

------------- --------    ---------  -----------------  -------------------

Direct Sales   2000-08    1,236,104                  1                    1

Internet       2000-08      215,107                  2                    4

Direct Sales   2000-09    1,217,808                  1                    3

Internet       2000-09      228,241                  2                    3

Direct Sales   2000-10    1,225,584                  1                    2

Internet       2000-10      239,236                  2                    2

Direct Sales   2000-11    1,115,239                  1                    4

Internet       2000-11      284,742                  2                    1

Per cube and rollup group ranking: rank() over (partition by c order by agg) ... group by rollup|cube

User: “I like the sum-of-every-combination thing you showed me last week, but chunk it by channel and nation then rank it.”

SELECT channel_desc, country_iso_code,

       --

       TO_CHAR(SUM(amount_sold), '9,999,999,999') SALES$,

       --

       RANK() OVER (

           PARTITION BY GROUPING_ID(channel_desc, country_iso_code) -- observe purple carefully

           ORDER BY SUM(amount_sold) DESC

       ) AS RANK_PER_GROUP,

       --

       GROUPING_ID(channel_desc, country_iso_code) GROUPING_ID

       --

  FROM sales, customers, times, channels, countries

       --

 WHERE sales.time_id = times.time_id

   AND sales.cust_id = customers.cust_id

   AND sales.channel_id = channels.channel_id

   AND channels.channel_desc IN ('Direct Sales', 'Internet')

   AND times.calendar_month_desc = '2000-09'

   AND country_iso_code IN ('GB', 'US', 'JP')

       --

 GROUP BY CUBE(channel_desc, country_iso_code);

 

CHANNEL_DESC         CO SALES$         RANK_PER_GROUP GROUPING_ID -- observe purple carefully

-------------------- -- -------------- -------------- -----------

Direct Sales         GB      1,217,808              1           0

Direct Sales         JP      1,217,808              1           0

Direct Sales         US      1,217,808              1           0

Internet             GB        228,241              4           0

Internet             JP        228,241              4           0

Internet             US        228,241              4           0

Direct Sales                 3,653,423              1           1

Internet                       684,724              2           1

                     GB      1,446,049              1           2

                     JP      1,446,049              1           2

                     US      1,446,049              1           2

                             4,338,147              1           3

Treatment of NULLs: agg() over (order by c asc|desc nulls first|last)

User: “The operational system gives us lots of nulls, and I’m not sure how to rank them…can you show me the options?”

SELECT times.time_id time, 

       --

       sold,

       --

       RANK() OVER ( ORDER BY (sold) DESC NULLS LAST ) AS NLAST_DESC,

       RANK() OVER ( ORDER BY (sold) DESC NULLS FIRST ) AS NFIRST_DESC,

       RANK() OVER ( ORDER BY (sold) ASC NULLS FIRST ) AS NFIRST,

       RANK() OVER ( ORDER BY (sold) ASC NULLS LAST ) AS NLAST

       --

  FROM (

           SELECT time_id, SUM(sales.amount_sold) sold

             FROM sales, products, customers, countries

            WHERE sales.prod_id = products.prod_id

              AND customers.country_id = countries.country_id

              AND sales.cust_id = customers.cust_id

              AND prod_name IN ('Envoy Ambassador', 'Mouse Pad') AND country_iso_code = 'GB'

            GROUP BY time_id

       ) v, times

       --

 WHERE v.time_id(+) = times.time_id

   AND calendar_year = 1999

   AND calendar_month_number = 1

       --

 ORDER BY sold DESC NULLS LAST;

 

TIME            SOLD NLAST_DESC NFIRST_DESC     NFIRST      NLAST

--------- ---------- ---------- ----------- ---------- ----------

25-JAN-99    3097.32          1          18         31         14

17-JAN-99    1791.77          2          19         30         13

30-JAN-99     127.69          3          20         29         12

28-JAN-99     120.34          4          21         28         11

23-JAN-99      86.12          5          22         27         10

20-JAN-99      79.07          6          23         26          9

13-JAN-99       56.1          7          24         25          8

07-JAN-99      42.97          8          25         24          7

08-JAN-99      33.81          9          26         23          6

10-JAN-99      22.76         10          27         21          4

02-JAN-99      22.76         10          27         21          4

26-JAN-99      19.84         12          29         20          3

16-JAN-99      11.27         13          30         19          2

14-JAN-99       9.52         14          31         18          1

09-JAN-99                    15           1          1         15

12-JAN-99                    15           1          1         15

31-JAN-99                    15           1          1         15

11-JAN-99                    15           1          1         15

19-JAN-99                    15           1          1         15

03-JAN-99                    15           1          1         15

15-JAN-99                    15           1          1         15

21-JAN-99                    15           1          1         15

24-JAN-99                    15           1          1         15

04-JAN-99                    15           1          1         15

06-JAN-99                    15           1          1         15

27-JAN-99                    15           1          1         15

18-JAN-99                    15           1          1         15

01-JAN-99                    15           1          1         15

22-JAN-99                    15           1          1         15

29-JAN-99                    15           1          1         15

05-JAN-99                    15           1          1         15

Cumulative distribution function: cume_dist() over (partition by c order by agg)

User: “I need to see sales by month and channel along with the relative position of the other sales in that month and channel.”

SELECT calendar_month_desc AS MONTH, channel_desc,

       --

       TO_CHAR(SUM(amount_sold) , '9,999,999,999') SALES$,

       --

       CUME_DIST() OVER (

           PARTITION BY calendar_month_desc

           ORDER BY SUM(amount_sold)

       ) AS CUME_DIST_BY_CHANNEL

       --

  FROM sales, products, customers, times, channels

       --

 WHERE sales.prod_id = products.prod_id

   AND sales.cust_id = customers.cust_id

   AND sales.time_id = times.time_id

   AND sales.channel_id = channels.channel_id

   AND times.calendar_month_desc IN ('2000-09', '2000-07','2000-08')

       --

 GROUP BY calendar_month_desc, channel_desc;

 

MONTH    CHANNEL_DESC         SALES$         CUME_DIST_BY_CHANNEL

-------- -------------------- -------------- --------------------

2000-07  Internet                    140,423           .333333333

2000-07  Partners                    611,064           .666666667

2000-07  Direct Sales              1,145,275                    1

2000-08  Internet                    215,107           .333333333

2000-08  Partners                    661,045           .666666667

2000-08  Direct Sales              1,236,104                    1

2000-09  Internet                    228,241           .333333333

2000-09  Partners                    666,172           .666666667

2000-09  Direct Sales              1,217,808                    1

 

Try changing the SELECT and GROUP BY clauses to these for another perspective:

SELECT calendar_month_desc AS MONTH, channel_desc, prod_weight_class,

 GROUP BY calendar_month_desc, channel_desc, prod_weight_class;

Percent rank function: percent_rank() over ( partition by col order by agg ( col ) )

User: “I need to see sales by month and channel along with the relative position of the other sales in that month and channel.”

SELECT calendar_month_desc AS MONTH, channel_desc,

       --

       TO_CHAR(SUM(amount_sold) , '9,999,999,999') SALES$,

       --

       PERCENT_RANK() OVER (

           PARTITION BY calendar_month_desc

           ORDER BY SUM(amount_sold)

       ) AS PERCENT_RANK_BY_CHANNEL

       --

  FROM sales, products, customers, times, channels

       --

 WHERE sales.prod_id = products.prod_id

   AND sales.cust_id = customers.cust_id

   AND sales.time_id = times.time_id

   AND sales.channel_id = channels.channel_id

   AND times.calendar_month_desc IN ('2000-09', '2000-07','2000-08')

       --

 GROUP BY calendar_month_desc, channel_desc;

 

MONTH    CHANNEL_DESC         SALES$         PERCENT_RANK_BY_CHANNEL

-------- -------------------- -------------- -----------------------

2000-07  Internet                    140,423                       0

2000-07  Partners                    611,064                      .5

2000-07  Direct Sales              1,145,275                       1

2000-08  Internet                    215,107                       0

2000-08  Partners                    661,045                      .5

2000-08  Direct Sales              1,236,104                       1

2000-09  Internet                    228,241                       0

2000-09  Partners                    666,172                      .5

2000-09  Direct Sales              1,217,808                       1

N-tile function: ntile (n) over (order by agg)

User: “What were the monthly sales by quartile?”

SELECT calendar_month_desc AS MONTH,

       --

       TO_CHAR(SUM(amount_sold), '9,999,999,999') SALES$,

       --

       NTILE( 4 ) OVER (

           ORDER BY SUM(amount_sold)

       ) AS TILE4

       --

  FROM sales, products, customers, times, channels

       --

 WHERE sales.prod_id = products.prod_id

   AND sales.cust_id = customers.cust_id

   AND sales.time_id = times.time_id

   AND sales.channel_id = channels.channel_id

   AND times.calendar_year = 2000

   AND prod_category = 'Electronics'

       --

 GROUP BY calendar_month_desc;

 

MONTH    SALES$              TILE4

-------- -------------- ----------

2000-02         242,416          1

2000-01         257,286          1

2000-03         280,011          1

2000-06         315,951          2

2000-05         316,824          2

2000-04         318,106          2

2000-07         433,824          3

2000-08         477,833          3

2000-12         553,534          3

2000-10         652,225          4

2000-11         661,147          4

2000-09         691,449          4

Row number function: row_number () over (order by agg)

Users: “I don’t like how rank handles ties…can you just give me straight row numbers so I have an order to work with in Excel?”

SELECT channel_desc, calendar_month_desc,

       --

       TO_CHAR(TRUNC(SUM(amount_sold), -5), '9,999,999,999') SALES$,

       --

       ROW_NUMBER() OVER (

           PARTITION BY channel_desc

           ORDER BY TRUNC(SUM(amount_sold), -6) DESC

       ) AS ROW_NUMBER

       --

  FROM sales, products, customers, times, channels

       --

 WHERE sales.prod_id = products.prod_id

   AND sales.cust_id = customers.cust_id

   AND sales.time_id = times.time_id

   AND sales.channel_id = channels.channel_id

   AND times.calendar_month_desc IN ('2001-09', '2001-10')

       --

 GROUP BY channel_desc, calendar_month_desc;

 

CHANNEL_DESC         CALENDAR SALES$         ROW_NUMBER

-------------------- -------- -------------- ----------

Direct Sales         2001-10       1,000,000          1

Direct Sales         2001-09       1,100,000          2

Internet             2001-09         500,000          1

Internet             2001-10         700,000          2

Partners             2001-09         600,000          1

Partners             2001-10         600,000          2

Windowing aggregate functions

Windowing functions can be used to compute cumulative, moving, and centered aggregates.

They return a value for each row in the table, which depends on other rows in the corresponding window.

With windowing aggregate functions, you can calculate moving and cumulative versions of SUM, AVERAGE, COUNT, MAX, MIN, and many more functions.

They can be used only in the SELECT and ORDER BY clauses of the query.

Windowing aggregate functions include the convenient FIRST_VALUE, which returns the first value in the window; and LAST_VALUE, which returns the last value in the window.

These functions provide access to more than one row of a table without a self-join.

Cumulative aggregate function: agg(agg) over (partition by c order by c rows unbounded preceding)

User: “I need a running total of sales within customer and month.”

SELECT c.cust_id, t.calendar_quarter_desc,

       --

       TO_CHAR (SUM(amount_sold), '9,999,999,999.99') AS Q_SALES,

       --

       TO_CHAR(

           SUM ( SUM(amount_sold) ) OVER (

               PARTITION BY c.cust_id

               ORDER BY c.cust_id, t.calendar_quarter_desc

               ROWS UNBOUNDED PRECEDING

           ),

          '9,999,999,999.99'

       ) AS CUM_SALES

       --

  FROM sales s, times t, customers c

       --

 WHERE s.time_id = t.time_id

   AND s.cust_id = c.cust_id

   AND t.calendar_year = 2000

   AND c.cust_id IN (2595, 9646, 11111)

       --

 GROUP BY c.cust_id, t.calendar_quarter_desc

       --

 ORDER BY c.cust_id, t.calendar_quarter_desc;

 

   CUST_ID CALENDA Q_SALES           CUM_SALES

---------- ------- ----------------- -----------------

      2595 2000-01            659.92            659.92

      2595 2000-02            224.79            884.71

      2595 2000-03            313.90          1,198.61

      2595 2000-04          6,015.08          7,213.69

      9646 2000-01          1,337.09          1,337.09

      9646 2000-02            185.67          1,522.76

      9646 2000-03            203.86          1,726.62

      9646 2000-04            458.29          2,184.91

     11111 2000-01             43.18             43.18

     11111 2000-02             33.33             76.51

     11111 2000-03            579.73            656.24

     11111 2000-04            307.58            963.82

Moving aggregate function: agg(agg) over (partition by c order by c rows 2 preceding)

User: “I need a three month moving average of sales for each month in 1999 for a given customer.”

SELECT c.cust_id, t.calendar_month_desc,

       --

       TO_CHAR (SUM(amount_sold), '9,999,999,999.99') AS SALES,

       --

       TO_CHAR(

           AVG( SUM(amount_sold) ) OVER (

               ORDER BY c.cust_id, t.calendar_month_desc

               ROWS 2 PRECEDING -- includes current for a total of 3 rows

           ),

           '9,999,999,999.99'

       ) AS MOVING_3_MONTH_AVG

       --

  FROM sales s, times t, customers c

       --

 WHERE s.time_id = t.time_id

   AND s.cust_id = c.cust_id

   AND t.calendar_year = 1999

   AND c.cust_id IN (6510)

       --

 GROUP BY c.cust_id, t.calendar_month_desc

       --

 ORDER BY c.cust_id, t.calendar_month_desc;

 

   CUST_ID CALENDAR SALES             MOVING_3_MONTH_AV

---------- -------- ----------------- -----------------

      6510 1999-04             124.69            124.69

      6510 1999-05           3,395.39          1,760.04

      6510 1999-06           4,080.31          2,533.46

      6510 1999-07           6,434.63          4,636.78

      6510 1999-08           5,104.73          5,206.56

      6510 1999-09           4,676.23          5,405.20

      6510 1999-10           5,108.68          4,963.21

      6510 1999-11             801.64          3,528.85

Centered aggregate function: agg(agg) over (partition by c order by c range between … preceding  and … following)

User: “I need to see each day’s sales averaged with the day before and after it by week for the year.”

SELECT t.time_id,

       --

       TO_CHAR (SUM(amount_sold), '9,999,999,999.99') AS SALES,

       --

       TO_CHAR(

           AVG( SUM(amount_sold) ) OVER (

               ORDER BY t.time_id

               RANGE BETWEEN INTERVAL '1' DAY PRECEDING AND INTERVAL '1' DAY FOLLOWING

           ),

           '9,999,999,999.99'

       ) AS CENTERED_3_DAY_AVG

       --

  FROM sales s, times t

       --

 WHERE s.time_id = t.time_id

   AND t.calendar_week_number IN (51)

   AND calendar_year = 1999

       --

 GROUP BY t.time_id

       --

 ORDER BY t.time_id;

 

TIME_ID           SALES             CENTERED_3_DAY_AV

----------------- ----------------- -----------------

1999:1220:0000:00        134,336.84        106,675.93

1999:1221:0000:00         79,015.02        102,538.71

1999:1222:0000:00         94,264.28         85,341.75

1999:1223:0000:00         82,745.96         93,322.31

1999:1224:0000:00        102,956.68         82,936.70

1999:1225:0000:00         63,107.46         87,062.22

1999:1226:0000:00         95,122.51         79,114.99

Windowing aggregate functions with logical offsets: agg over (order by c range between … preceding and current row)

User: “The moving average was fine if there are no gaps, but I have gaps.  Can it go back based on time not row count?”

SELECT time_id,

       --

       daily_sum,

       SUM( daily_sum ) OVER (

           ORDER BY time_id

           RANGE BETWEEN INTERVAL '5' DAY PRECEDING AND CURRENT ROW

       ) AS CURRENT_GROUP_SUM

       --

  FROM (

           SELECT time_id, channel_id, SUM(s.quantity_sold) AS daily_sum

             FROM customers c, sales s, countries

            WHERE c.cust_id = s.cust_id

              AND c.country_id = countries.country_id

              AND s.cust_id IN (638, 634, 753, 440 )

              AND s.time_id BETWEEN '20000501' AND '20000513'

            GROUP BY time_id, channel_id

       );

 

TIME_ID            DAILY_SUM CURRENT_GROUP_SUM

----------------- ---------- -----------------

2000:0506:0000:00          7                 7

2000:0510:0000:00          1                 9

2000:0510:0000:00          1                 9

2000:0511:0000:00          2                15

2000:0511:0000:00          4                15 (11, 10,  9,  8,  7,  6) = 5 days plus current

2000:0512:0000:00          1                 9 (12, 11, 10,  9,  8,  7) = 5 days plus current

2000:0513:0000:00          2                16 (12, 11, 10,  9,  8,  7) = 5 days plus current

2000:0513:0000:00          5                16 (12, 11, 10,  9,  8,  7) = 5 days plus current

 

-------------------------------------------------------------------------------------------------

 

SELECT time_id, channel_id, SUM(s.quantity_sold) AS daily_sum

  FROM customers c, sales s, countries

 WHERE c.cust_id = s.cust_id

   AND c.country_id = countries.country_id

   AND s.cust_id IN (638, 634, 753, 440 )

   AND s.time_id BETWEEN '20000501' AND '20000513'

 GROUP BY time_id, channel_id

 ORDER BY 1, 2;

 

TIME_ID           CHANNEL_ID  DAILY_SUM

----------------- ---------- ----------

2000:0506:0000:00          3          7

2000:0510:0000:00          3          1

2000:0510:0000:00          4          1

2000:0511:0000:00          2          2

2000:0511:0000:00          3          4

2000:0512:0000:00          2          1

2000:0513:0000:00          2          2

2000:0513:0000:00          3          5

Windowing aggregate functions with physical offsets: agg over (partition by c order by c rows unbounded preceding)

User: “Just keep summing up sales across all the rows within a given day.”

SELECT t.time_id,

       --

       TO_CHAR(amount_sold, '9,999,999,999.99') AS INDIV_SALE,

       --

       TO_CHAR(

           SUM( amount_sold ) OVER (

               PARTITION BY t.time_id

               ORDER BY t.time_id, s.amount_sold

               ROWS UNBOUNDED PRECEDING

           ),

           '9,999,999,999.99'

       ) AS CUM_SALES

       --

  FROM sales s, times t, customers c

       --

 WHERE s.time_id = t.time_id

   AND s.cust_id = c.cust_id

   AND t.time_id BETWEEN TO_DATE('19991228') AND TO_DATE('19991231')

   AND c.cust_id BETWEEN 6500 AND 6600

       --

 ORDER BY t.time_id, s.amount_sold;

 

TIME_ID           INDIV_SALE        CUM_SALES

----------------- ----------------- -----------------

1999:1228:0000:00              9.14              9.14

1999:1228:0000:00             10.15             19.29

1999:1228:0000:00             10.24             29.53

1999:1228:0000:00             11.38             40.91

1999:1228:0000:00             35.55             76.46

1999:1228:0000:00             39.47            115.93

1999:1228:0000:00             40.62            156.55

1999:1228:0000:00             45.71            202.26

1999:1228:0000:00             45.71            247.97

1999:1228:0000:00             45.74            293.71

1999:1228:0000:00             47.57            341.28

1999:1228:0000:00             51.37            392.65

1999:1228:0000:00             51.96            444.61

1999:1228:0000:00             51.96            496.57

1999:1228:0000:00             51.96            548.53

1999:1228:0000:00             52.69            601.22

1999:1228:0000:00             64.61            665.83

1999:1230:0000:00             29.27             29.27

1999:1230:0000:00             51.65             80.92

1999:1230:0000:00             55.87            136.79

Reporting Aggregate Functions

Per the documentation: After a query has been processed, aggregate values like the number of resulting rows or an average value in a column can be easily computed within a partition and made available to other reporting functions.

Reporting aggregate functions return the same aggregate value for every row in a partition. Their behavior with respect to NULLs is the same as the SQL aggregate functions.

More simply: The aggregate functions from regular SQL and that you have put in SELECT clauses many times also work in partitions too.

Reporting aggregate: agg(agg) over (partition by c)

User: “For each product category, what was the region in which it had maximum sales.”

CREATE OR REPLACE VIEW MAX_REG_SALES_VIEW AS

       --

SELECT SUBSTR(p.prod_category,1,8) AS prod_category, co.country_region,

       --

       SUM(amount_sold) AS sales,

       --

       MAX( SUM( amount_sold ) ) OVER (

           PARTITION BY prod_category

       ) AS MAX_REG_SALES

       --

  FROM sales s, customers c, countries co, products p

       --

 WHERE s.cust_id = c.cust_id

   AND c.country_id = co.country_id

   AND s.prod_id = p.prod_id

   AND s.time_id = TO_DATE('20011011')

       --

 GROUP BY prod_category, country_region;

 

SELECT prod_category,

       country_region,

       to_char(sales,’999,999.99’) sales,

       to_char(max_reg_sales,’999,999.99’) max_reg_sales

  FROM max_reg_sales_view;

 

 

PROD_CAT COUNTRY_REGION       SALES       MAX_REG_SAL

-------- -------------------- ----------- -----------

Electron Americas                  581.92      581.92

Hardware Americas                  925.93      925.93

Peripher Americas                3,084.48    4,290.38

Peripher Asia                    2,616.51    4,290.38

Peripher Europe                  4,290.38    4,290.38

Peripher Oceania                   940.43    4,290.38

Software Americas                4,445.70    4,445.70

Software Asia                    1,408.19    4,445.70

Software Europe                  3,288.83    4,445.70

Software Oceania                   890.25    4,445.70

 

-------------------------------------------------------------------------------------------------

 

SELECT prod_category,

       country_region,

       to_char(sales,’999,999.99’) sales,

       to_char(max_reg_sales,’999,999.99’) max_reg_sales

  FROM max_reg_sales_view

 WHERE sales = MAX_REG_SALES;

 

 

PROD_CAT COUNTRY_REGION       SALES       MAX_REG_SAL

-------- -------------------- ----------- -----------

Electron Americas                  581.92      581.92

Hardware Americas                  925.93      925.93

Peripher Europe                  4,290.38    4,290.38

Software Americas                4,445.70    4,445.70

Ratio to report: ratio_to_report(agg) over ()

User: “I need the sales by channel, but I want to see the total sales, and what percentage each channel is compared to the total.”

SELECT ch.channel_desc,

       --

       TO_CHAR(SUM(amount_sold),'9,999,999') AS SALES,

       --

       TO_CHAR(

           SUM( SUM(amount_sold) ) OVER (

           ),

           '9,999,999'

       ) AS TOTAL_SALES,

       --

       TO_CHAR(

           RATIO_TO_REPORT( SUM(amount_sold) ) OVER (

           ),

           '9.999'

       ) AS RATIO_TO_REPORT

       --

  FROM sales s, channels ch

       --

 WHERE s.channel_id = ch.channel_id

   AND s.time_id = TO_DATE('20001011')

       --

 GROUP BY ch.channel_desc;

 

CHANNEL_DESC         SALES      TOTAL_SALE RATIO_

-------------------- ---------- ---------- ------

Direct Sales             14,447     23,183   .623

Internet                    345     23,183   .015

Partners                  8,391     23,183   .362

LAG/LEAD functions

Useful for comparing values when the relative positions of rows can be known reliably.  Notice that you can frequently induce this using other SQL constructs.  For example, group by date and you know the previous date, etc.  Because the functions provide access to more than one row of a table at the same time without a self-join, they can enhance processing speed.

Lag/Lead: lag|lead ( sum ( col ), n ) over ( order by col )

User: “Show me how much sales have changed from day to day”.

CREATE OR REPLACE VIEW SALES_OFFSET AS

SELECT time_id,

       --

       SUM(amount_sold) AS SALES,

       --

       LAG( SUM(amount_sold),1 ) OVER (

           ORDER BY time_id

       ) AS LAG1,

       --

       LEAD( SUM(amount_sold),1 ) OVER (

           ORDER BY time_id

       ) AS LEAD1

       --

  FROM sales

       --

 WHERE time_id > = TO_DATE('20001010') AND time_id < = TO_DATE('20001014')

       --

 GROUP BY time_id;

 

SELECT * FROM SALES_OFFSET;

 

 

TIME_ID                SALES      LAG1       LEAD1

----------------- ---------- ---------- ----------

2000:1010:0000:00  238479.49              23183.45

2000:1011:0000:00   23183.45  238479.49   24616.04

2000:1012:0000:00   24616.04   23183.45   76515.61

2000:1013:0000:00   76515.61   24616.04   29794.78

2000:1014:0000:00   29794.78   76515.61

 

-------------------------------------------------------------------------------------------------

 

SELECT time_id,

       sales,

       sales – lag1 AS CHANGE_SINCE_YESTERDAY,

       lead1 – sales AS CHANGE_BY_TOMORROW

  FROM SALES_OFFSET

       --

 ORDER BY 1;

 

 

TIME_ID                SALES CHANGE_SINCE_YESTERDAY CHANGE_BY_TOMORROW

----------------- ---------- ---------------------- ------------------

2000:1010:0000:00  238479.49                                -215296.04

2000:1011:0000:00   23183.45             -215296.04            1432.59

2000:1012:0000:00   24616.04                1432.59           51899.57

2000:1013:0000:00   76515.61               51899.57          -46720.83

2000:1014:0000:00   29794.78              -46720.83

Citations

Much of the content for this page is based on Oracle Database SQL Language Reference, 11g Release 1 (11.1), About SQL Functions:

http://download.oracle.com/docs/cd/B28359_01/server.111/b28286/functions001.htm

 


This site does not require or capture any of your personal or financial information.
http://www.qa76.net  |  © 2019 James Madison  |  qa76.net@gmail.com