|
|
SQL for analysis
Analytic functions compute an aggregate value based on a group of rows. They differ from aggregate functions in that they return multiple rows for each group. Aggregate functions you are probably familiar with are SUM, AVG, MIN, MAX, etc. Notice how these return just one row. The group of rows is called a “window”. For each row, a sliding window of rows is defined. The window determines the range of rows used to perform the calculations for the current row. Window sizes can be based on either a physical number of rows or a logical interval such as time. Analytic functions are the last set of operations performed in a query except for the final ORDER BY clause. All joins and all WHERE, GROUP BY, and HAVING clauses are completed before the analytic functions are processed. Therefore, analytic functions can appear only in the select list or ORDER BY clause. Syntaxanalytic_function :: = analytic_clause :: = query_partition_clause :: = order_by_clause :: = windowing_clause :: = Execution orderThe parts of the analytic operation are executed in a certain order. Knowing this order is critical to understanding how the analytical function works. The order is: Group by – Must go first as it heavily influences the structure of the data before any part of an analytic function. Partition – Breaks the data set from the main SELECT…GROUP BY into parts that we define. Order – Orders the partition for two reasons: for presentation just like ORDER BY, or so the window can slide over it. Window – Slide a window along the ordered partition so only a portion is used as the window slides. Analytic – Apply the analytic to the window moving across the ordered partition. This can be thought of as the pipe and filter design pattern. ColorizationThe annotation used in this document clarifies the following important constructs of the analytic query: Query partition clause Order by clause Windowing clause Analytic function Arguments to the analytic function GROUP BY clause Five colors are used are as follows: analytic ( [ arguments ] ) OVER ( [ partition ] [ order [ window ] ] ) Additional colors are used to highlight other specific features as needed. GROUP BY columns are both isolated in the SELECT clause, then underlined within the GROUP BY clause: SELECT column_one, column_two, ... GROUP BY column_one, column_two Then underlined again in the result set: COLUMN_ONE COLUMN_TWO ------------ ------------ And are listed as the first columns in the SELECT clause, separated by a comment line: SELECT column_one, column_two, -- Printing IssuesAll formatting on this page seems to render correctly in all major browsers. How formatting comes out in printing varies by browser. Known issues are: · Internet Explorer fails to print the syntax diagrams. · All browsers fail to print highlighting colors. Ranking functionsA ranking function computes the rank of a record compared to other records in the data set based on the values of a set of measures. Ranking on a single expression: rank() over (order by agg)User: “What is the ranking of the total sales within a channel for the last few months in the U.S.?” SELECT channel_desc, -- TO_CHAR(SUM(amount_sold), '9,999,999,999') SALES$, -- RANK() OVER ( ORDER BY SUM(amount_sold) ) AS default_rank, -- RANK() OVER ( ORDER BY SUM(amount_sold) DESC NULLS LAST ) AS custom_rank -- FROM sales, products, customers, times, channels, countries -- WHERE sales.prod_id = products.prod_id AND sales.cust_id = customers.cust_id AND customers.country_id = countries.country_id AND sales.time_id = times.time_id AND sales.channel_id = channels.channel_id AND times.calendar_month_desc IN ('2000-09', '2000-10') AND country_iso_code = 'US' -- GROUP BY channel_desc ORDER BY 2 ASC; CHANNEL_DESC SALES$ DEFAULT_RANK CUSTOM_RANK -------------------- -------------- ------------ ----------- Direct Sales 1,320,497 3 1 Partners 800,871 2 2 Internet 261,278 1 3 Dense rank: rank() over (order by agg)User: “What is the ranking of the total sales within a channel for the last few months? Show me different ways to break ties.” SELECT channel_desc, calendar_month_desc, -- TO_CHAR(TRUNC(SUM(amount_sold),-4), '9,999,999,999') SALES$, -- RANK() OVER ( ORDER BY TRUNC(SUM(amount_sold),-4) DESC ) AS RANK, -- DENSE_RANK() OVER ( ORDER BY TRUNC(SUM(amount_sold),-4) DESC ) AS DENSE_RANK -- FROM sales, products, customers, times, channels -- WHERE sales.prod_id = products.prod_id AND sales.cust_id = customers.cust_id AND sales.time_id = times.time_id AND sales.channel_id = channels.channel_id AND times.calendar_month_desc IN ('2000-09', '2000-10') AND channels.channel_desc <> 'Tele Sales' -- GROUP BY channel_desc, calendar_month_desc; CHANNEL_DESC CALENDAR SALES$ RANK DENSE_RANK -------------------- -------- -------------- ---------- ---------- Direct Sales 2000-09 1,200,000 1 1 Direct Sales 2000-10 1,200,000 1 1 Partners 2000-09 600,000 3 2 Partners 2000-10 600,000 3 2 Internet 2000-09 200,000 5 3 Internet 2000-10 200,000 5 3 Ranking on multiple expressions: rank() over (order by agg, agg)User: “What is the ranking of the total sales within a channel for the last few months, breaking any ties with number of sales?” SELECT channel_desc, calendar_month_desc, -- TO_CHAR(TRUNC(SUM(amount_sold),-5),'9,999,999,999') SALES$, TO_CHAR(SUM(quantity_sold), '9,999,999,999') SALES_Count, -- RANK() OVER ( ORDER BY TRUNC(SUM(amount_sold), -5) DESC, SUM(quantity_sold) DESC ) AS col_rank -- FROM sales, products, customers, times, channels -- WHERE sales.prod_id = products.prod_id AND sales.cust_id = customers.cust_id AND sales.time_id = times.time_id AND sales.channel_id = channels.channel_id AND times.calendar_month_desc IN ('2000-09', '2000-10') AND channels.channel_desc <> 'Tele Sales' -- GROUP BY channel_desc, calendar_month_desc; CHANNEL_DESC
CALENDAR SALES$ SALES_COUNT
COL_RANK -------------------- -------- -------------- -------------- ---------- Direct Sales 2000-10 1,200,000 12,584 1 Direct Sales 2000-09 1,200,000 11,995 2 Partners 2000-10 600,000 7,508 3 Partners 2000-09 600,000 6,165 4 Internet 2000-09 200,000 1,887 5 Internet 2000-10 200,000 1,450 6 Per group ranking: rank() over (partition by c order by agg)User: “Actually, we award for the best in any month regardless of channel and the best in channel for the last four months.” SELECT channel_desc, calendar_month_desc, -- TO_CHAR(SUM(amount_sold), '9,999,999,999') SALES$, -- RANK() OVER ( PARTITION BY calendar_month_desc ORDER BY SUM(amount_sold) DESC ) AS RANK_WITHIN_MONTH, -- RANK() OVER ( PARTITION BY channel_desc ORDER BY SUM(amount_sold) DESC ) AS RANK_WITHIN_CHANNEL -- FROM sales, products, customers, times, channels, countries -- WHERE sales.prod_id = products.prod_id AND sales.cust_id = customers.cust_id AND customers.country_id = countries.country_id AND sales.time_id = times.time_id AND sales.channel_id = channels.channel_id AND times.calendar_month_desc IN ('2000-08', '2000-09', '2000-10', '2000-11') AND channels.channel_desc IN ('Direct Sales', 'Internet') -- GROUP BY channel_desc, calendar_month_desc; CHANNEL_DESC CALENDAR SALES$ RANK_WITHIN_MONTH RANK_WITHIN_CHANNEL ------------- -------- --------- ----------------- ------------------- Direct Sales 2000-08 1,236,104 1 1 Internet 2000-08 215,107 2 4 Direct Sales 2000-09 1,217,808 1 3 Internet 2000-09 228,241 2 3 Direct Sales 2000-10 1,225,584 1 2 Internet 2000-10 239,236 2 2 Direct Sales 2000-11 1,115,239 1 4 Internet 2000-11 284,742 2 1 Per cube and rollup group ranking: rank() over (partition by c order by agg) ... group by rollup|cubeUser: “I like the sum-of-every-combination thing you showed me last week, but chunk it by channel and nation then rank it.” SELECT channel_desc, country_iso_code, -- TO_CHAR(SUM(amount_sold), '9,999,999,999') SALES$, -- RANK() OVER ( PARTITION BY GROUPING_ID(channel_desc, country_iso_code) -- observe purple carefully ORDER BY SUM(amount_sold) DESC ) AS RANK_PER_GROUP, -- GROUPING_ID(channel_desc, country_iso_code) GROUPING_ID -- FROM sales, customers, times, channels, countries -- WHERE sales.time_id = times.time_id AND sales.cust_id = customers.cust_id AND sales.channel_id = channels.channel_id AND channels.channel_desc IN ('Direct Sales', 'Internet') AND times.calendar_month_desc = '2000-09' AND country_iso_code IN ('GB', 'US', 'JP') -- GROUP BY CUBE(channel_desc, country_iso_code); CHANNEL_DESC CO SALES$ RANK_PER_GROUP GROUPING_ID -- observe purple carefully -------------------- -- -------------- -------------- ----------- Direct Sales GB 1,217,808 1 0 Direct Sales JP 1,217,808 1 0 Direct Sales US 1,217,808 1 0 Internet GB 228,241 4 0 Internet JP 228,241 4 0 Internet US 228,241 4 0 Direct Sales 3,653,423 1 1 Internet 684,724 2 1 GB 1,446,049 1 2 JP 1,446,049 1 2 US 1,446,049 1 2 4,338,147 1 3 Treatment of NULLs: agg() over (order by c asc|desc nulls first|last)User: “The operational system gives us lots of nulls, and I’m not sure how to rank them…can you show me the options?” SELECT times.time_id time, -- sold, -- RANK() OVER ( ORDER BY (sold) DESC NULLS LAST ) AS NLAST_DESC, RANK() OVER ( ORDER BY (sold) DESC NULLS FIRST ) AS NFIRST_DESC, RANK() OVER ( ORDER BY (sold) ASC NULLS FIRST ) AS NFIRST, RANK() OVER ( ORDER BY (sold) ASC NULLS LAST ) AS NLAST -- FROM ( SELECT time_id, SUM(sales.amount_sold) sold FROM sales, products, customers, countries WHERE sales.prod_id = products.prod_id AND customers.country_id = countries.country_id AND sales.cust_id = customers.cust_id AND prod_name IN ('Envoy Ambassador', 'Mouse Pad') AND country_iso_code = 'GB' GROUP BY time_id ) v, times -- WHERE v.time_id(+) = times.time_id AND calendar_year = 1999 AND calendar_month_number = 1 -- ORDER BY sold DESC NULLS LAST; TIME SOLD NLAST_DESC NFIRST_DESC NFIRST NLAST ---------
---------- ---------- ----------- ---------- ---------- 25-JAN-99 3097.32 1 18 31 14 17-JAN-99 1791.77 2 19 30 13 30-JAN-99 127.69 3 20 29 12 28-JAN-99 120.34 4 21 28 11 23-JAN-99 86.12 5 22 27 10 20-JAN-99 79.07 6 23 26 9 13-JAN-99 56.1 7 24 25 8 07-JAN-99 42.97 8 25 24 7 08-JAN-99 33.81 9 26 23 6 10-JAN-99 22.76 10 27 21 4 02-JAN-99 22.76 10 27 21 4 26-JAN-99 19.84 12 29 20 3 16-JAN-99 11.27 13 30 19 2 14-JAN-99 9.52 14 31 18
1 09-JAN-99 15 1 1 15 12-JAN-99 15 1 1 15 31-JAN-99 15 1 1 15 11-JAN-99 15 1
1 15 19-JAN-99 15 1 1 15 03-JAN-99 15 1 1 15 15-JAN-99 15 1 1 15 21-JAN-99 15 1
1 15 24-JAN-99 15 1 1 15 04-JAN-99 15 1 1 15 06-JAN-99 15 1 1 15 27-JAN-99 15 1 1 15 18-JAN-99 15 1 1 15 01-JAN-99 15 1 1 15 22-JAN-99 15 1 1 15 29-JAN-99 15 1 1 15 05-JAN-99 15 1 1 15 Cumulative distribution function: cume_dist() over (partition by c order by agg)User: “I need to see sales by month and channel along with the relative position of the other sales in that month and channel.” SELECT calendar_month_desc AS MONTH, channel_desc, -- TO_CHAR(SUM(amount_sold) , '9,999,999,999') SALES$, -- CUME_DIST() OVER ( PARTITION BY calendar_month_desc ORDER BY SUM(amount_sold) ) AS CUME_DIST_BY_CHANNEL -- FROM sales, products, customers, times, channels -- WHERE sales.prod_id = products.prod_id AND sales.cust_id = customers.cust_id AND sales.time_id = times.time_id AND sales.channel_id = channels.channel_id AND times.calendar_month_desc IN ('2000-09', '2000-07','2000-08') -- GROUP BY calendar_month_desc, channel_desc; MONTH CHANNEL_DESC SALES$ CUME_DIST_BY_CHANNEL -------- -------------------- -------------- -------------------- 2000-07 Internet 140,423 .333333333 2000-07 Partners 611,064 .666666667 2000-07 Direct Sales 1,145,275 1 2000-08 Internet 215,107 .333333333 2000-08 Partners 661,045 .666666667 2000-08 Direct Sales 1,236,104 1 2000-09 Internet 228,241 .333333333 2000-09 Partners 666,172 .666666667 2000-09 Direct Sales 1,217,808 1 Try changing the SELECT and GROUP BY clauses to these for another perspective: SELECT calendar_month_desc AS MONTH, channel_desc, prod_weight_class, GROUP BY calendar_month_desc, channel_desc, prod_weight_class; Percent rank function: percent_rank() over ( partition by col order by agg ( col ) )User: “I need to see sales by month and channel along with the relative position of the other sales in that month and channel.” SELECT calendar_month_desc AS MONTH, channel_desc, -- TO_CHAR(SUM(amount_sold) , '9,999,999,999') SALES$, -- PERCENT_RANK() OVER ( PARTITION BY calendar_month_desc ORDER BY SUM(amount_sold) ) AS PERCENT_RANK_BY_CHANNEL -- FROM sales, products, customers, times, channels -- WHERE sales.prod_id = products.prod_id AND sales.cust_id = customers.cust_id AND sales.time_id = times.time_id AND sales.channel_id = channels.channel_id AND times.calendar_month_desc IN ('2000-09', '2000-07','2000-08') -- GROUP BY calendar_month_desc, channel_desc; MONTH CHANNEL_DESC SALES$ PERCENT_RANK_BY_CHANNEL -------- -------------------- -------------- ----------------------- 2000-07 Internet 140,423 0 2000-07 Partners 611,064 .5 2000-07 Direct Sales 1,145,275 1 2000-08 Internet 215,107 0 2000-08 Partners 661,045 .5 2000-08 Direct Sales 1,236,104 1 2000-09 Internet 228,241 0 2000-09 Partners 666,172 .5 2000-09 Direct Sales 1,217,808 1 N-tile function: ntile (n) over (order by agg)User: “What were the monthly sales by quartile?” SELECT calendar_month_desc AS MONTH, -- TO_CHAR(SUM(amount_sold), '9,999,999,999') SALES$, -- NTILE( 4 ) OVER ( ORDER BY SUM(amount_sold) ) AS TILE4 -- FROM sales, products, customers, times, channels -- WHERE sales.prod_id = products.prod_id AND sales.cust_id = customers.cust_id AND sales.time_id = times.time_id AND sales.channel_id = channels.channel_id AND times.calendar_year = 2000 AND prod_category = 'Electronics' -- GROUP BY calendar_month_desc; MONTH SALES$ TILE4 -------- -------------- ---------- 2000-02 242,416 1 2000-01 257,286 1 2000-03 280,011 1 2000-06 315,951 2 2000-05 316,824 2 2000-04 318,106 2 2000-07 433,824 3 2000-08 477,833 3 2000-12 553,534 3 2000-10 652,225 4 2000-11 661,147 4 2000-09 691,449 4 Row number function: row_number () over (order by agg)Users: “I don’t like how rank handles ties…can you just give me straight row numbers so I have an order to work with in Excel?” SELECT channel_desc, calendar_month_desc, -- TO_CHAR(TRUNC(SUM(amount_sold), -5), '9,999,999,999') SALES$, -- ROW_NUMBER() OVER ( PARTITION BY channel_desc ORDER BY TRUNC(SUM(amount_sold), -6) DESC ) AS ROW_NUMBER -- FROM sales, products, customers, times, channels -- WHERE sales.prod_id = products.prod_id AND sales.cust_id = customers.cust_id AND sales.time_id = times.time_id AND sales.channel_id = channels.channel_id AND times.calendar_month_desc IN ('2001-09', '2001-10') -- GROUP BY channel_desc, calendar_month_desc; CHANNEL_DESC CALENDAR SALES$ ROW_NUMBER -------------------- -------- -------------- ---------- Direct Sales 2001-10 1,000,000 1 Direct Sales 2001-09 1,100,000 2 Internet 2001-09 500,000 1 Internet 2001-10 700,000 2 Partners 2001-09 600,000 1 Partners 2001-10 600,000 2 Windowing aggregate functionsWindowing functions can be used to compute cumulative, moving, and centered aggregates. They return a value for each row in the table, which depends on other rows in the corresponding window. With windowing aggregate functions, you can calculate moving and cumulative versions of SUM, AVERAGE, COUNT, MAX, MIN, and many more functions. They can be used only in the SELECT and ORDER BY clauses of the query. Windowing aggregate functions include the convenient FIRST_VALUE, which returns the first value in the window; and LAST_VALUE, which returns the last value in the window. These functions provide access to more than one row of a table without a self-join. Cumulative aggregate function: agg(agg) over (partition by c order by c rows unbounded preceding)User: “I need a running total of sales within customer and month.” SELECT c.cust_id, t.calendar_quarter_desc, -- TO_CHAR (SUM(amount_sold), '9,999,999,999.99') AS Q_SALES, -- TO_CHAR( SUM ( SUM(amount_sold) ) OVER ( PARTITION BY c.cust_id ORDER BY c.cust_id, t.calendar_quarter_desc ROWS UNBOUNDED PRECEDING ), '9,999,999,999.99' ) AS CUM_SALES -- FROM sales s, times t, customers c -- WHERE s.time_id = t.time_id AND s.cust_id = c.cust_id AND t.calendar_year = 2000 AND c.cust_id IN (2595, 9646, 11111) -- GROUP BY c.cust_id, t.calendar_quarter_desc -- ORDER BY c.cust_id, t.calendar_quarter_desc; CUST_ID CALENDA Q_SALES CUM_SALES ---------- ------- ----------------- ----------------- 2595 2000-01 659.92 659.92 2595 2000-02 224.79 884.71 2595 2000-03 313.90 1,198.61 2595 2000-04 6,015.08 7,213.69 9646 2000-01 1,337.09 1,337.09 9646 2000-02 185.67 1,522.76 9646 2000-03 203.86 1,726.62 9646 2000-04 458.29 2,184.91 11111 2000-01 43.18 43.18 11111 2000-02 33.33 76.51 11111 2000-03 579.73 656.24 11111 2000-04 307.58 963.82 Moving aggregate function: agg(agg) over (partition by c order by c rows 2 preceding)User: “I need a three month moving average of sales for each month in 1999 for a given customer.” SELECT c.cust_id, t.calendar_month_desc, -- TO_CHAR (SUM(amount_sold), '9,999,999,999.99') AS SALES, -- TO_CHAR( AVG( SUM(amount_sold) ) OVER ( ORDER BY c.cust_id, t.calendar_month_desc ROWS 2 PRECEDING -- includes current for a total of 3 rows ), '9,999,999,999.99' ) AS MOVING_3_MONTH_AVG -- FROM sales s, times t, customers c -- WHERE s.time_id = t.time_id AND s.cust_id = c.cust_id AND t.calendar_year = 1999 AND c.cust_id IN (6510) -- GROUP BY c.cust_id, t.calendar_month_desc -- ORDER BY c.cust_id, t.calendar_month_desc; CUST_ID CALENDAR SALES MOVING_3_MONTH_AV ---------- -------- ----------------- ----------------- 6510 1999-04 124.69 124.69 6510 1999-05 3,395.39 1,760.04 6510 1999-06 4,080.31 2,533.46 6510 1999-07 6,434.63 4,636.78 6510 1999-08 5,104.73 5,206.56 6510 1999-09 4,676.23 5,405.20 6510 1999-10 5,108.68 4,963.21 6510 1999-11 801.64 3,528.85 Centered aggregate function: agg(agg) over (partition by c order by c range between … preceding and … following)User: “I need to see each day’s sales averaged with the day before and after it by week for the year.” SELECT t.time_id, -- TO_CHAR (SUM(amount_sold), '9,999,999,999.99') AS SALES, -- TO_CHAR( AVG( SUM(amount_sold) ) OVER ( ORDER BY t.time_id RANGE BETWEEN INTERVAL '1' DAY PRECEDING AND INTERVAL '1' DAY FOLLOWING ), '9,999,999,999.99' ) AS CENTERED_3_DAY_AVG -- FROM sales s, times t -- WHERE s.time_id = t.time_id AND t.calendar_week_number IN (51) AND calendar_year = 1999 -- GROUP BY t.time_id -- ORDER BY t.time_id; TIME_ID SALES CENTERED_3_DAY_AV ----------------- ----------------- ----------------- 1999:1220:0000:00 134,336.84 106,675.93 1999:1221:0000:00 79,015.02 102,538.71 1999:1222:0000:00 94,264.28 85,341.75 1999:1223:0000:00 82,745.96 93,322.31 1999:1224:0000:00 102,956.68 82,936.70 1999:1225:0000:00 63,107.46 87,062.22 1999:1226:0000:00 95,122.51 79,114.99 Windowing aggregate functions with logical offsets: agg over (order by c range between … preceding and current row)User: “The moving average was fine if there are no gaps, but I have gaps. Can it go back based on time not row count?” SELECT time_id,
-- daily_sum, SUM( daily_sum )
OVER ( ORDER BY time_id RANGE BETWEEN INTERVAL '5' DAY PRECEDING AND CURRENT ROW ) AS CURRENT_GROUP_SUM -- FROM ( SELECT time_id, channel_id,
SUM(s.quantity_sold) AS daily_sum FROM customers c, sales s,
countries WHERE c.cust_id = s.cust_id AND c.country_id =
countries.country_id AND s.cust_id IN (638, 634, 753,
440 ) AND s.time_id BETWEEN '20000501'
AND '20000513' GROUP BY time_id, channel_id ); TIME_ID DAILY_SUM CURRENT_GROUP_SUM -----------------
---------- ----------------- 2000:0506:0000:00 7 7 2000:0510:0000:00 1 9 2000:0510:0000:00 1 9 2000:0511:0000:00 2 15 2000:0511:0000:00 4 15 (11, 10, 9,
8, 7, 6) = 5 days plus current 2000:0512:0000:00 1 9 (12, 11, 10, 9,
8, 7) = 5 days plus current 2000:0513:0000:00 2 16 (12, 11, 10, 9,
8, 7) = 5 days plus current 2000:0513:0000:00 5 16 (12, 11, 10, 9,
8, 7) = 5 days plus current ------------------------------------------------------------------------------------------------- SELECT time_id,
channel_id, SUM(s.quantity_sold) AS daily_sum FROM customers c, sales s, countries WHERE c.cust_id = s.cust_id AND c.country_id = countries.country_id AND s.cust_id IN (638, 634, 753, 440 ) AND s.time_id BETWEEN '20000501' AND
'20000513' GROUP BY time_id, channel_id ORDER BY 1, 2; TIME_ID
CHANNEL_ID DAILY_SUM -----------------
---------- ---------- 2000:0506:0000:00 3 7 2000:0510:0000:00 3 1 2000:0510:0000:00 4 1 2000:0511:0000:00 2 2 2000:0511:0000:00 3 4 2000:0512:0000:00 2 1 2000:0513:0000:00 2 2 2000:0513:0000:00 3 5 Windowing aggregate functions with physical offsets: agg over (partition by c order by c rows unbounded preceding)User: “Just keep summing up sales across all the rows within a given day.” SELECT t.time_id, -- TO_CHAR(amount_sold, '9,999,999,999.99') AS INDIV_SALE, -- TO_CHAR( SUM( amount_sold ) OVER ( PARTITION BY t.time_id ORDER BY t.time_id, s.amount_sold ROWS UNBOUNDED PRECEDING ), '9,999,999,999.99' ) AS CUM_SALES -- FROM sales s, times t, customers c -- WHERE s.time_id = t.time_id AND s.cust_id = c.cust_id AND t.time_id BETWEEN TO_DATE('19991228') AND TO_DATE('19991231') AND c.cust_id BETWEEN 6500 AND 6600 -- ORDER BY t.time_id, s.amount_sold; TIME_ID INDIV_SALE CUM_SALES ----------------- ----------------- ----------------- 1999:1228:0000:00 9.14 9.14 1999:1228:0000:00 10.15 19.29 1999:1228:0000:00 10.24 29.53 1999:1228:0000:00 11.38 40.91 1999:1228:0000:00 35.55 76.46 1999:1228:0000:00 39.47 115.93 1999:1228:0000:00 40.62 156.55 1999:1228:0000:00 45.71 202.26 1999:1228:0000:00 45.71 247.97 1999:1228:0000:00 45.74 293.71 1999:1228:0000:00 47.57 341.28 1999:1228:0000:00 51.37 392.65 1999:1228:0000:00 51.96 444.61 1999:1228:0000:00 51.96 496.57 1999:1228:0000:00 51.96 548.53 1999:1228:0000:00 52.69 601.22 1999:1228:0000:00 64.61 665.83 1999:1230:0000:00 29.27 29.27 1999:1230:0000:00 51.65 80.92 1999:1230:0000:00 55.87 136.79 Reporting Aggregate FunctionsPer the documentation: After a query has been processed, aggregate values like the number of resulting rows or an average value in a column can be easily computed within a partition and made available to other reporting functions. Reporting aggregate functions return the same aggregate value for every row in a partition. Their behavior with respect to NULLs is the same as the SQL aggregate functions. More simply: The aggregate functions from regular SQL and that you have put in SELECT clauses many times also work in partitions too. Reporting aggregate: agg(agg) over (partition by c)User: “For each product category, what was the region in which it had maximum sales.” CREATE OR REPLACE VIEW MAX_REG_SALES_VIEW AS -- SELECT SUBSTR(p.prod_category,1,8) AS prod_category, co.country_region, -- SUM(amount_sold) AS sales, -- MAX( SUM( amount_sold ) ) OVER ( PARTITION BY prod_category ) AS MAX_REG_SALES -- FROM sales s, customers c, countries co, products p -- WHERE s.cust_id = c.cust_id AND c.country_id = co.country_id AND s.prod_id = p.prod_id AND s.time_id = TO_DATE('20011011') -- GROUP BY prod_category, country_region; SELECT prod_category, country_region, to_char(sales,’999,999.99’) sales, to_char(max_reg_sales,’999,999.99’) max_reg_sales FROM max_reg_sales_view; PROD_CAT COUNTRY_REGION SALES MAX_REG_SAL -------- -------------------- ----------- ----------- Electron Americas 581.92 581.92 Hardware Americas 925.93 925.93 Peripher Americas 3,084.48 4,290.38 Peripher Asia 2,616.51 4,290.38 Peripher Europe 4,290.38 4,290.38 Peripher Oceania 940.43 4,290.38 Software Americas 4,445.70 4,445.70 Software Asia 1,408.19 4,445.70 Software Europe 3,288.83 4,445.70 Software Oceania 890.25 4,445.70 ------------------------------------------------------------------------------------------------- SELECT prod_category, country_region, to_char(sales,’999,999.99’) sales, to_char(max_reg_sales,’999,999.99’) max_reg_sales FROM max_reg_sales_view WHERE sales = MAX_REG_SALES; PROD_CAT COUNTRY_REGION SALES MAX_REG_SAL -------- -------------------- ----------- ----------- Electron Americas 581.92 581.92 Hardware Americas 925.93 925.93 Peripher Europe 4,290.38 4,290.38 Software Americas 4,445.70 4,445.70 Ratio to report: ratio_to_report(agg) over ()User: “I need the sales by channel, but I want to see the total sales, and what percentage each channel is compared to the total.” SELECT ch.channel_desc, -- TO_CHAR(SUM(amount_sold),'9,999,999') AS SALES, -- TO_CHAR( SUM( SUM(amount_sold) ) OVER ( ), '9,999,999' ) AS TOTAL_SALES, -- TO_CHAR( RATIO_TO_REPORT( SUM(amount_sold) ) OVER ( ), '9.999' ) AS RATIO_TO_REPORT -- FROM sales s, channels ch -- WHERE s.channel_id = ch.channel_id AND s.time_id = TO_DATE('20001011') -- GROUP BY ch.channel_desc; CHANNEL_DESC SALES TOTAL_SALE RATIO_ -------------------- ---------- ---------- ------ Direct Sales 14,447 23,183 .623 Internet 345 23,183 .015 Partners 8,391 23,183 .362 LAG/LEAD functionsUseful for comparing values when the relative positions of rows can be known reliably. Notice that you can frequently induce this using other SQL constructs. For example, group by date and you know the previous date, etc. Because the functions provide access to more than one row of a table at the same time without a self-join, they can enhance processing speed. Lag/Lead: lag|lead ( sum ( col ), n ) over ( order by col )User: “Show me how much sales have changed from day to day”. CREATE OR REPLACE VIEW SALES_OFFSET AS SELECT time_id, -- SUM(amount_sold) AS SALES, -- LAG( SUM(amount_sold),1 ) OVER ( ORDER BY time_id ) AS LAG1, -- LEAD( SUM(amount_sold),1 ) OVER ( ORDER BY time_id ) AS LEAD1 -- FROM sales -- WHERE time_id > = TO_DATE('20001010') AND time_id < = TO_DATE('20001014') -- GROUP BY time_id; SELECT * FROM SALES_OFFSET; TIME_ID SALES LAG1 LEAD1 ----------------- ---------- ---------- ---------- 2000:1010:0000:00 238479.49 23183.45 2000:1011:0000:00 23183.45 238479.49 24616.04 2000:1012:0000:00 24616.04 23183.45 76515.61 2000:1013:0000:00 76515.61 24616.04 29794.78 2000:1014:0000:00 29794.78 76515.61 ------------------------------------------------------------------------------------------------- SELECT time_id, sales, sales – lag1 AS CHANGE_SINCE_YESTERDAY, lead1 – sales AS CHANGE_BY_TOMORROW FROM SALES_OFFSET -- ORDER BY 1; TIME_ID SALES CHANGE_SINCE_YESTERDAY CHANGE_BY_TOMORROW ----------------- ---------- ---------------------- ------------------ 2000:1010:0000:00 238479.49 -215296.04 2000:1011:0000:00 23183.45 -215296.04 1432.59 2000:1012:0000:00 24616.04 1432.59 51899.57 2000:1013:0000:00 76515.61 51899.57 -46720.83 2000:1014:0000:00 29794.78 -46720.83 CitationsMuch of the content for this page is based on Oracle Database SQL Language Reference, 11g Release 1 (11.1), About SQL Functions: http://download.oracle.com/docs/cd/B28359_01/server.111/b28286/functions001.htm |