Emulate Row_Number()
ISO SQL defines a ROW_NUMBER() OVER
function with an optional PARTITION
clause for generating a derived row number column in a resultset. Several RDBMSs—including DB2, Oracle and SQL Server—implement it. Here is the simplest possible example. Given a table with two columns i and j, generate a resultset that has a derived sequential row_number column taking the values 1,2,3,... for a defined ordering of j which resets to 1 when the value of i changes:
DROP TABLE IF EXISTS test;
CREATE TABLE test(i int,j int);
INSERT INTO test
VALUES (3,31),(1,11),(4,14),(1,13),(2,21),(1,12),(2,22),(3,32),(2,23),(3,33);
The result must look like this:
+------+------+------------+
| i | j | row_number |
+------+------+------------+
| 1 | 11 | 1 |
| 1 | 12 | 2 |
| 1 | 13 | 3 |
| 2 | 21 | 1 |
| 2 | 22 | 2 |
| 2 | 23 | 3 |
| 3 | 31 | 1 |
| 3 | 32 | 2 |
| 3 | 33 | 3 |
| 4 | 14 | 1 |
+------+------+------------+
The SQL Server query for this is:
SELECT
i, j,
ROW_NUMBER() OVER ( PARTITION BY i ORDER BY j ) AS row_number
FROM test;
Can we have this result in MySQL? Here are two ways to get it. First, with user variables:
SET @iprev=0, @jprev=0;
SELECT i, j, row_number
FROM (
SELECT j, @jprev := if(@iprev = i, @jprev+1, 1) AS row_number, @iprev := i AS i
FROM test
ORDER BY i,j
) AS tmp;
The second method uses a join and aggregation:
SELECT a.i, a.j, count(*) as row_number
FROM test a
JOIN test b ON a.i=b.i AND a.j >= b.j
GROUP BY a.i, a.j ;
To simplify generalising these query patterns to multiple ordering columns, use self-documenting column names:
DROP TABLE IF EXISTS test;
CREATE TABLE test(partition int, ord1 int, ord2 int);
INSERT INTO test
VALUES (3,31,55),(1,11,19),(4,14,23),(2,22,8),(2,22,42),(1,13,56),
(2,21,77),(2,21,7),(1,12,17),(2,23,92),(3,32,24),(3,33,62);
The user variable solution easily handles multiple ordering columns: just add the ordering column names to the outer query's SELECT
list and the inner query's ORDER BY
list:
SET @partitionPrev=0, @ordPrev=0;
SELECT partition, ord1, ord2, row_number
FROM (
SELECT
ord1, ord2,
@ordPrev := If(@partitionPrev = partition, @ordPrev+1, 1) as row_number,
@partitionPrev := partition AS partition
FROM test
ORDER BY partition, ord1, ord2
) AS tmp;
+-----------+------+------+------------+
| partition | ord1 | ord2 | row_number |
+-----------+------+------+------------+
| 1 | 11 | 19 | 1 |
| 1 | 12 | 17 | 2 |
| 1 | 13 | 56 | 3 |
| 2 | 21 | 7 | 1 |
| 2 | 21 | 77 | 2 |
| 2 | 22 | 8 | 3 |
| 2 | 22 | 42 | 4 |
| 2 | 23 | 92 | 5 |
| 3 | 31 | 55 | 1 |
| 3 | 32 | 24 | 2 |
| 3 | 33 | 62 | 3 |
| 4 | 14 | 23 | 1 |
+-----------+------+------+------------+
In the aggregating solution for multiple columns, the theta join gets more complex as ordering columns are added:
SELECT a.partition, a.ord1, a.ord2, count(*) as row_number
FROM test a
JOIN test b
ON a.partition=b.partition
AND (a.ord1>b.ord1 OR (a.ord1=b.ord1 AND a.ord2>=b.ord2))
GROUP BY a.partition, a.ord1, a.ord2 ;