ISO SQL defines ROW_NUMBER() OVER as a "windowing" or "analytic" function with an optional PARTITION clause for generating a derived row number column in a resultset. Several RDBMSs—including DB2, Oracle and SQL Server—have long implemented it. It came into MariaDB with version 10.2, into MySQL with version 8.0.2.
The whole list of windowing/analytic functions:
CUME_DIST()
Cumulative distribution value
DENSE_RANK()
Rank of current row in its partition without gaps
FIRST_VALUE()
Argument value from first row of window frame
LAG()
Argument value from row lagging current row in partition
LAST_VALUE()
Argument value from last row of window frame
LEAD()
Argument value from row leading current row within partition
NTH_VALUE()
Argument value from Nth row of window frame
NTILE()
Bucket number of current row in its partition
PERCENT_RANK()
Percentage rank value
RANK()
Current row rank within its partition, with gaps
ROW_NUMBER()
Current row number in its partition
Here is the simplest possible Row_Number() example. Given a table with two columns i and j, generate a resultset that has a derived sequential row_number column taking the values 1,2,3,... for a defined ordering of j which resets to 1 when the value of i changes:
DROP TABLE IF EXISTS test;
CREATE TABLE test(i int,j int);
INSERT INTO test VALUES
(3,31),(1,11),(4,14),(1,13),(2,21),(1,12),(2,22),(3,32),(2,23),(3,33);
The SQL Server query that gives the correct answer is:
SELECT
i, j,
ROW_NUMBER() OVER ( PARTITION BY i ORDER BY j ) AS row_number
FROM test
ORDER BY i,j;
This works in MariaDB 10.2 and MySQL 8.0.2. Are you stuck if you can't use those versions? No. Here are two ways to get the desired result without Row_Number().
First, with user variables:
SET @iprev=0, @jprev=0;
SELECT i, j, row_number
FROM (
SELECT j, @jprev := if(@iprev = i, @jprev+1, 1) AS row_number, @iprev := i AS i
FROM test
ORDER BY i,j
) AS tmp;
The second method uses a join and aggregation, but is correct only if there are no duplicate values of j
SELECT a.i, a.j, count(*) as row_number
FROM test a
JOIN test b ON a.i=b.i AND a.j >= b.j
GROUP BY a.i, a.j ;
To simplify generalising these query patterns to multiple ordering columns, use self-documenting column names:
DROP TABLE IF EXISTS test;
CREATE TABLE test(partition int, ord1 int, ord2 int);
INSERT INTO test
VALUES (3,31,55),(1,11,19),(4,14,23),(2,22,8),(2,22,42),(1,13,56),
(2,21,77),(2,21,7),(1,12,17),(2,23,92),(3,32,24),(3,33,62);
The user variable solution easily handles multiple ordering columns: just add the ordering column names to the outer query's SELECT list and the inner query's ORDER BY list:
In the aggregating solution for multiple columns, the theta join gets more complex as ordering columns are added:
SELECT a.partition, a.ord1, a.ord2, count(*) as row_number
FROM test a
JOIN test b
ON a.partition=b.partition
AND (a.ord1>b.ord1 OR (a.ord1=b.ord1 AND a.ord2>=b.ord2))
GROUP BY a.partition, a.ord1, a.ord2 ;