In MySQL since version 8.0.2 and MariaDB since version 10.2.2, windowing with Row_Number() allows this straightforward solution:
SELECT id, entrydate, `rank`
FROM (
  SELECT
    id, entrydate,
    ROW_NUMBER()
      OVER( PARTITION BY id
            ORDER BY entrydate
          ) AS `rank`
  FROM test
) AS tmp
WHERE tmp.`rank` <= 2
ORDER BY id, entrydate;
+------+------------+------+
| id   | entrydate  | rank |
+------+------------+------+
|    1 | 2024-05-01 |    1 |
|    1 | 2024-05-02 |    2 |
|    2 | 2024-06-01 |    1 |
|    2 | 2024-06-02 |    2 |
|    3 | 2024-07-01 |    1 |
|    3 | 2024-07-02 |    2 |
+------+------------+------+
Note that because MySQL 8.0.2 also introduced a RANK() function, rank is now a reserved word in MySQL and needs backticks around it.
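Row_Number() numbers rows with equal dates in an arbitrary order, whereas Rank() and Dense_Rank() give tied rows the same number. The following is a minimal sketch for comparing the three on the same test table; the aliases row_num, rnk and dense_rnk are illustrative names, not part of the query above:

```sql
-- Sketch: number the rows of each id three ways over one named window.
-- Rows of an id that share an entrydate get equal rnk and dense_rnk values
-- but distinct row_num values, assigned in no guaranteed order.
SELECT
  id, entrydate,
  ROW_NUMBER() OVER w AS row_num,
  RANK()       OVER w AS rnk,
  DENSE_RANK() OVER w AS dense_rnk
FROM test
WINDOW w AS (PARTITION BY id ORDER BY entrydate)
ORDER BY id, entrydate;
```

If tied rows should all count toward the quota, filtering on rnk rather than row_num is one way to keep them together.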
What if you can't yet get to MySQL 8 or MariaDB 10.2? One approach is to rank rows with user variables and pick off the top two for each key in the WHERE clause:
SELECT id, entrydate, `rank`
FROM (
  SELECT
    id, entrydate,
    IF( @prev <> id,
        @rownum := 1,
        @rownum := @rownum+1
      ) AS `rank`,
    @prev := id
  FROM test
  JOIN (SELECT @rownum:=0,@prev:=0) AS r
  ORDER BY id, entrydate
) AS tmp
WHERE tmp.`rank` <= 2
ORDER BY id, entrydate;
This is pretty much the same query pattern as the user variable Row_Number() emulation method described in the "Row_Number()" entry. The join in the subquery is just a device for initialising the variables before any rows are read.
How do Row_Number() and the user variable method compare in performance? They both return correct results for 1,000 rows in less than a hundredth of a second on a modest machine; Row_Number() is slightly faster but you'll need to be reading millions of rows to notice the difference.
If the groups are fairly small, another feasible approach is to self-join and count. With appropriate ordering, the first two rows per ID are the rows which, for a given ID, have at most two rows with the same or an earlier date, counting the row itself. If we use an inequality join with the COUNT(*) function to find the earlier rows per ID ...
... then we get our result by removing rows where the 'earlier' count exceeds 2:
SELECT t1.id, t1.entrydate, COUNT(*) AS earlier
FROM test AS t1
JOIN test AS t2
  ON t1.id=t2.id
  AND t1.entrydate >= t2.entrydate
GROUP BY t1.id, t1.entrydate
HAVING earlier <= 2;
+------+------------+---------+
| id   | entrydate  | earlier |
+------+------------+---------+
|    1 | 2024-05-01 |       1 |
|    1 | 2024-05-02 |       2 |
|    2 | 2024-06-01 |       1 |
|    2 | 2024-06-02 |       2 |
|    3 | 2024-07-01 |       1 |
|    3 | 2024-07-02 |       2 |
+------+------------+---------+
This is about as efficient as the first method with a small table, but it compares every within-group row to every other within-group row. As the size N of a group increases, execution time grows in proportion to N². If the query takes one minute for groups of 1,000, it will take 16 minutes for groups of 4,000, and more than four hours for groups of 16,000. The solution does not scale.
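To judge whether your own data is at risk, a rough estimate of the self-join's work per group can be read straight from the group sizes. This is only a sketch: the alias pairs_compared is a hypothetical name, and the figure is an upper bound on row pairs, not a measured cost:

```sql
-- Sketch: estimate the self-join work per group.
-- n is the group size; pairs_compared is n squared, the number of
-- same-id row pairs the join has to consider for that group.
SELECT
  id,
  COUNT(*)            AS n,
  COUNT(*) * COUNT(*) AS pairs_compared
FROM test
GROUP BY id
ORDER BY pairs_compared DESC;
```

Quadrupling n multiplies pairs_compared by 16, which is where the jump from one minute to 16 minutes in the estimate above comes from.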
What to do? Forget GROUP BY! Manually assemble the desired query results in a temporary table from simple indexed queries, in this case, two rows per ID:
DROP TEMPORARY TABLE IF EXISTS earliers;
CREATE TEMPORARY TABLE earliers( id INT, entrydate DATE);
INSERT INTO earliers
SELECT id,entrydate
FROM test
WHERE id=1
ORDER BY entrydate LIMIT 2;
INSERT INTO earliers
SELECT id,entrydate
FROM test
WHERE id=2
ORDER BY entrydate LIMIT 2;
INSERT INTO earliers
SELECT id,entrydate
FROM test
WHERE id=3
ORDER BY entrydate LIMIT 2;
You need one INSERT statement per grouping value. To print the result, just query the earliers table:
SELECT * FROM earliers
ORDER BY id, entrydate;
+------+------------+
| id   | entrydate  |
+------+------------+
|    1 | 2024-05-01 |
|    1 | 2024-05-02 |
|    2 | 2024-06-01 |
|    2 | 2024-06-02 |
|    3 | 2024-07-01 |
|    3 | 2024-07-02 |
+------+------------+
DROP TEMPORARY TABLE earliers;
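Each of these per-ID queries is cheap only if the server can find an ID's earliest rows through an index. Here is a minimal sketch, assuming test does not yet have a suitable index; the index name idx_test_id_entrydate is hypothetical:

```sql
-- Sketch: a composite index on (id, entrydate) lets
-- WHERE id=... ORDER BY entrydate LIMIT 2 read the first two index
-- entries for the id instead of sorting the whole group.
CREATE INDEX idx_test_id_entrydate ON test (id, entrydate);

-- Check the plan; ideally the key column names the new index.
EXPLAIN
SELECT id, entrydate
FROM test
WHERE id=1
ORDER BY entrydate LIMIT 2;
```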
Most useful reports run again and again. If that's the case for yours, automate it in a stored procedure: using a cursor and a prepared statement, auto-generate an INSERT statement for every grouping value, and return the result:
DROP PROCEDURE IF EXISTS listearliers;
DELIMITER go
CREATE PROCEDURE listearliers()
BEGIN
  DECLARE curdone, vid INT DEFAULT 0;
  -- the cursor returns one row per grouping value
  DECLARE idcur CURSOR FOR
    SELECT DISTINCT id FROM test;
  -- SQLSTATE '02000' (no data) signals that the cursor is exhausted
  DECLARE CONTINUE HANDLER
    FOR SQLSTATE '02000' SET curdone = 1;
  DROP TEMPORARY TABLE IF EXISTS earliers;
  CREATE TEMPORARY TABLE earliers(
    id INT, entrydate DATE);
  SET @sql =
    'INSERT INTO earliers
     SELECT id,entrydate
     FROM test
     WHERE id=? ORDER BY entrydate LIMIT 2';
  -- prepare the statement once, then execute it once per id
  PREPARE stmt FROM @sql;
  OPEN idcur;
  REPEAT
    FETCH idcur INTO vid;
    IF NOT curdone THEN
      SET @vid = vid;
      EXECUTE stmt USING @vid;
    END IF;
  UNTIL curdone END REPEAT;
  CLOSE idcur;
  DROP PREPARE stmt;
  SELECT * FROM earliers
  ORDER BY id,entrydate;
  DROP TEMPORARY TABLE earliers;
END;
go
DELIMITER ;
CALL listearliers();
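If the quota itself needs to vary, one option is to make the row limit a second placeholder, since MySQL accepts ? for LIMIT values in prepared statements. This is a minimal sketch against the same test table, run for a single ID outside the procedure; the @quota variable and the values assigned to it are hypothetical:

```sql
-- Sketch: fetch the first @quota rows for one id with a parameterised LIMIT.
DROP TEMPORARY TABLE IF EXISTS earliers;
CREATE TEMPORARY TABLE earliers( id INT, entrydate DATE );
SET @sql =
  'INSERT INTO earliers
   SELECT id, entrydate
   FROM test
   WHERE id=? ORDER BY entrydate LIMIT ?';
PREPARE stmt FROM @sql;
SET @vid = 1, @quota = 3;   -- hypothetical values: id 1, three rows per group
EXECUTE stmt USING @vid, @quota;
DEALLOCATE PREPARE stmt;
SELECT * FROM earliers ORDER BY id, entrydate;
DROP TEMPORARY TABLE earliers;
```

The same change could be carried into the procedure by adding an IN parameter for the quota and copying it to a user variable before the loop.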