Windowing functiions

from the Artful MySQL Tips List


Added in MySQL 8 and MariaDB 10, SQL window-analytic functions extend and refine the concept of aggregation. The windowing function OVER() works with all aggregate functions except GROUP_CONCAT()—essentially making them window-aware—and works also with window-analytic functions:
CUME_DIST() Cumulative distribution value
DENSE_RANK() Rank of current row in its window without gaps
FIRST_VALUE() Argument value from first row of window
LAG() Argument value from row lagging current row in window
LAST_VALUE() Argument value from last row of window
LEAD() Argument value from row leading current row in window
NTH_VALUE() Argument value from Nth row of window
NTILE() Bucket number of current row in its window
PERCENT_RANK() Cumulative percent rank of argument value
RANK() Current row rank within its window, with gaps
ROW_NUMBER() Current row number in its window
This is the toy table from http://mysqlserverteam.com/mysql-8-0-2-introducing-window-functions:
CREATE TABLE sales(employee VARCHAR(8), `date` DATE, sale INT);
INSERT INTO sales VALUES 
  ('odin', '2025-03-01', 200),('odin', '2024-04-01', 300),
  ('odin', '2024-05-01', 400),('thor', '2025-03-01', 400),
  ('thor', '2024-04-01', 300),('thor', '2024-05-01', 500);
Aggregate functions suppress display of the rows they aggregate ...
SELECT SUM(sale) AS sum FROM sales;
+------+
| sum  |
+------+
| 2100 |
+------+
... but adding in the OVER() windowing function undoes that row suppression, so individual rows that have been aggregated appear in the result ...
SELECT employee, SUM(sale) OVER() AS sum FROM sales;
+----------+------+
| employee | sum  |
+----------+------+
| odin     | 2100 |
| odin     | 2100 |
| odin     | 2100 |
| thor     | 2100 |
| thor     | 2100 |
| thor     | 2100 |
+----------+------+
OVER() accepts two optional arguments, a PARTITION BY clause specifying how to partition row groups into windows, and an ORDER BY clause which orders window rows and makes their aggregate results row-by-row cumulative. OVER() with only a PARTITION BY clause partitions aggregate results, but each row shows the same aggregate result for its window:
 
SELECT 
  employee, date, sale, 
  SUM(sale) OVER(PARTITION BY employee) AS sum
FROM sales;
+----------+------------+------+------+
| employee | date       | sale | sum  |
+----------+------------+------+------+
| odin     | 2025-03-01 |  200 |  900 |
| odin     | 2024-04-01 |  300 |  900 |
| odin     | 2024-05-01 |  400 |  900 |
| thor     | 2025-03-01 |  400 | 1200 |
| thor     | 2024-04-01 |  300 | 1200 |
| thor     | 2024-05-01 |  500 | 1200 |
+----------+------------+------+------+
As you'd expect, OVER() with only an ORDER BY clause orders by the ORDER BY clause, and accumulates by it:
SELECT employee, date, sale, SUM(sale) OVER(ORDER BY date) AS sum
FROM sales;
+----------+------------+------+------+
| employee | date       | sale | sum  |
+----------+------------+------+------+
| odin     | 2025-03-01 |  200 |  600 | (sum for 2025-03-01)
| thor     | 2025-03-01 |  400 |  600 |
| odin     | 2024-04-01 |  300 | 1200 | (sum for 2024-04-01)
| thor     | 2024-04-01 |  300 | 1200 |
| odin     | 2024-05-01 |  400 | 2100 | (sum for 2024-05-01)
| thor     | 2024-05-01 |  500 | 2100 |
+----------+------------+------+------+
As you might not expect, when there is no PARTITION BY clause, aggregate results repeat within each window. Why? ORDER BY accepts two kinds of ordering, ROWS which means by physical row, and RANGE which means by value. RANGE is the default. Therefore ORDER BY x with no modifier, as in the above query, means ORDER BY x RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. (How did SQL designers come to take UNBOUNDED to mean "begin with the first item found"? I'd give worlds to know.) To get ORDER BY x to accumulate by each row value of x instead of by each distinct value found, specify the ROWS option instead:
SELECT 
  employee, sale, date,
  SUM(sale) OVER(ORDER BY date ROWS
                 BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
                ) AS cum_sales
FROM sales;
+----------+------+------------+-----------+
| employee | sale | date       | cum_sales |
+----------+------+------------+-----------+
| odin     |  200 | 2025-03-01 |       200 |
| thor     |  400 | 2025-03-01 |       600 |
| odin     |  300 | 2024-04-01 |       900 |
| thor     |  300 | 2024-04-01 |      1200 |
| odin     |  400 | 2024-05-01 |      1600 |
| thor     |  500 | 2024-05-01 |      2100 |
+----------+------+------------+-----------+
We get the full potential of OVER() by giving it both PARTITION BY and ORDER BY arguments:
SELECT 
  employee, date, sale, 
  SUM(sale) OVER(PARTITION BY employee ORDER BY date) AS sum 
FROM sales;
+----------+------------+------+------+
| employee | date       | sale | sum  |
+----------+------------+------+------+
| odin     | 2025-03-01 |  200 |  200 | (odin sum starts to accumulate)
| odin     | 2024-04-01 |  300 |  500 |
| odin     | 2024-05-01 |  400 |  900 |
| thor     | 2025-03-01 |  400 |  400 | (thor sum starts to accumulate)
| thor     | 2024-04-01 |  300 |  700 |
| thor     | 2024-05-01 |  500 | 1200 |
+----------+------------+------+------+
Windowing functions OVER() works with all aggregate functions except GROUP_CONCAT(), also with the windowing functions implemented since MySQL 8 and MariaDB 10. Mostly, these functions compute row ranking statistics, either across the whole resultset or within windows created by OVER(). Adding windowing functionality to a query involves adding at least two elements: 1. A call to any standard aggregating function except GROUP_CONCAT(), or to one of the windowing functions (listed in the table near the top of this article), and 2. A call to OVER(), specifying the rowset (or partition or window) to which the above aggregating or windowing function should be applied. See "Calling window functions" below for info on calling these functions, and ample queries using them. The article "Within-group quotas (Top N per group)" illustrates how to use ROW_NUMBER() to rank rows within bands or rowsets. Given the toy table ...
drop table if exists tbl;
create table tbl(foo char(1), bar int);
insert into tbl values
  ('a',7),('b',5),('c',1),('d',9),('e',6),
  ('f',3),('g',2),('h',8),('i',5),('j',0);
... this query illustrates some of the row-to-row logic of windowing functions, applied to the whole resultset code>win ...
SELECT
  bar as Val,
  ROW_NUMBER() OVER win AS 'RowNo',
  LAG(bar) OVER win AS 'Lag',
  LAST_VALUE(bar) OVER win AS 'LastVal',
  CUME_DIST() OVER win AS 'CumeDist',
  100*ROUND( PERCENT_RANK() OVER win, 1 ) AS 'PctRank',
  DENSE_RANK() OVER win AS 'DenseRank'
FROM tbl
WINDOW win AS (ORDER BY bar);
+-----+-------+------+---------+----------+---------+-----------+
| Val | RowNo |  Lag | LastVal | CumeDist | PctRank | DenseRank |
+-----+-------+------+---------+----------+---------+-----------+
|   0 |     1 | NULL |       0 |      0.1 |     0.0 |         2 |
|   1 |     2 |    0 |       1 |      0.2 |    10.0 |         3 |
|   2 |     3 |    1 |       2 |      0.3 |    20.0 |         4 |
|   3 |     4 |    2 |       3 |      0.4 |    30.0 |         5 |
|   5 |     5 |    3 |       5 |      0.6 |    40.0 |         6 |
|   5 |     6 |    5 |       5 |      0.6 |    40.0 |         6 |
|   6 |     7 |    5 |       6 |      0.7 |    70.0 |         7 |
|   7 |     8 |    6 |       7 |      0.8 |    80.0 |         8 |
|   8 |     9 |    7 |       8 |      0.9 |    90.0 |         9 |
|   9 |    10 |    8 |       9 |        1 |   100.0 |        10 |
+-----+-------+------+---------+----------+--------+------------+
Here's another banding example: the toy table c has multiple values of pid per value of id:
create table c ( id smallint primary key auto_increment, pid smallint );
insert into c values(1,1),(2,1),(3,2),(4,2),(5,2),(6,3);
A combination of aggregation with windowing functions retrieves the count, the first id value per pid, and the last id value per pid, without GROUP BY:
select distinct 
  pid, 
  count(*)        over (partition by pid) as N, 
  first_value(id) over (partition by pid) as 'first', 
  last_value(id)  over (partition by pid) as 'last' 
from c 
order by pid;
+------+---+-------+------+
| pid  | N | first | last |
+------+---+-------+------+
|    1 | 2 |     1 |    2 |
|    2 | 3 |     3 |    5 |
|    3 | 1 |     6 |    6 |
+------+---+-------+------+
Calling analytic functions Many of these function take no arguments. The few that do mostly accept an expr argument, a column expression that evaluates to a scalar; other arguments are as shown, brackets indicate the argument is optional. CUME_DIST(): Returns cumulative distribution between 0 and 1 DENSE_RANK(): Returns 1 + count of distinct ranks before current row FIRST_VALUE(): Returns argument value from first row of window LAG(expr[, offset[, nullexpr]]): Returns value of expr in row lagging current window row by offset rows; show nullexpr for nulls LAST_VALUE(): Returns argument value from last row of window LEAD(expr[, offset[, nullexpr]]): Returns value of expr in row leading current window row by offset rows; show nullexpr for nulls NTH_VALUE(expr, n): Returns value of expr in nth window row NTILE(n): Returns bucket count of current window row if there are n buckets PERCENT_RANK(): Returns cumulative percent argument rank value RANK(): Returns current row rank within its window, with gaps ROW_NUMBER(): Returns current row number in its window This query on the sales table illustrates some basics of their use:
SELECT 
  employee as name, date, sale, 
  ROW_NUMBER()              OVER(PARTITION BY employee ORDER BY date) AS 'Row', 
  NTILE(2)                  OVER(PARTITION BY employee ORDER BY date) AS 'Tile', 
  FIRST_VALUE(sale)         OVER(PARTITION BY employee ORDER BY date) AS '1stVal', 
  NTH_VALUE(sale,1)         OVER(PARTITION BY employee ORDER BY date) AS '2ndVal', 
  LAST_VALUE(sale)          OVER(PARTITION BY employee ORDER BY date) AS 'LastVal', 
  LAG(sale,1,'None')        OVER(PARTITION BY employee ORDER BY date) AS 'Lag', 
  LEAD(sale,1,'None')       OVER(PARTITION BY employee ORDER BY date) AS 'Lead', 
  ROUND( CUME_DIST()        OVER(PARTITION BY employee ORDER BY date), 1 ) AS 'CumDist', 
  100*ROUND( PERCENT_RANK() OVER(PARTITION BY employee ORDER BY date), 1 ) AS 'PctRnk', 
  DENSE_RANK()              OVER(PARTITION BY employee ORDER BY date) AS 'DRnk' 
FROM sales;
+------+------------+------+-----+------+--------+--------+---------+------+------+---------+--------+------+
| name | date       | sale | Row | Tile | 1stVal | 2ndVal | LastVal | Lag  | Lead | CumDist | PctRnk | DRnk |
+------+------------+------+-----+------+--------+--------+---------+------+------+---------+--------+------+
| odin | 2025-03-01 |  200 |   1 |    1 |    200 |    200 |     200 | None | 300  |     0.3 |    0.0 |    1 |
| odin | 2024-04-01 |  300 |   2 |    1 |    200 |    200 |     300 | 200  | 400  |     0.7 |   50.0 |    2 |
| odin | 2024-05-01 |  400 |   3 |    2 |    200 |    200 |     400 | 300  | None |     1.0 |  100.0 |    3 |
| thor | 2025-03-01 |  400 |   1 |    1 |    400 |    400 |     400 | None | 300  |     0.3 |    0.0 |    1 |
| thor | 2024-04-01 |  300 |   2 |    1 |    400 |    400 |     300 | 400  | 500  |     0.7 |   50.0 |    2 |
| thor | 2024-05-01 |  500 |   3 |    2 |    400 |    400 |     500 | 300  | None |     1.0 |  100.0 |    3 |
+------+------------+------+-----+------+--------+--------+---------+------+------+---------+--------+------+
For more on windowing functions see ... https://dev.mysql.com/doc/refman/8.0/en/window-functions.html https://www.slideshare.net/DagHWanvik/sql-window-functions-for-mysql

Last updated 6 Feb 2025


Return to the Artful MySQL Tips page