You have a table containing a sequence of IDs, states and dates. The query requirement is to list state changes and their dates ordered by date, and to count the rows in each run of consecutive identical states. The data come from a blog entry by a Russian MySQL writer calling himself Quassnoi. This input ...
+----+-------+------------------+
| ID | State | Datetime         |
+----+-------+------------------+
| 12 | 1     | 2024-07-16 10:00 |
| 45 | 2     | 2024-07-16 13:00 |
| 67 | 2     | 2024-07-16 14:40 |
| 77 | 1     | 2024-07-16 15:00 |
| 89 | 1     | 2024-07-16 15:30 |
| 99 | 1     | 2024-07-16 16:00 |
+----+-------+------------------+

should produce this result ...

+----+-------+------------------+-------+
| ID | State | Datetime         | Count |
+----+-------+------------------+-------+
| 12 | 1     | 2024-07-16 10:00 | 1     |
| 45 | 2     | 2024-07-16 13:00 | 2     |
| 77 | 1     | 2024-07-16 15:00 | 3     |
+----+-------+------------------+-------+

To assess query efficiency we need more than six rows. To generate sequential test data, a utility table of ints from 0 through 9 is helpful. We keep general-purpose database objects in a system database available to all developers on the server:
create database if not exists system;
use system;
drop table if exists ints;   -- (See "Make a table of sequential ints")
create table ints(i int);
insert into ints values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);

Quassnoi's test data was 10k rows tracking ID, state, date and value, with state oscillating between 1 and 2 every one thousand rows. The primary key is ID. There is a covering (date, state) index. Dates are in descending order. To simplify comparative testing, parameterise the state-change interval:

set @state_interval = 1000;
drop table if exists history;
create table history (
  id int primary key,
  state int not null,
  date datetime not null,
  value varchar(16) not null,
  key(date, state)
) engine=innodb;
insert into history
select
  id,
  1 + floor(((id-1)/@state_interval) mod 2),
  '2024-07-24' - interval id minute,
  concat('value ', id)
from (
  select 1 + ( t.i*1000 + u.i*100 + v.i*10 + w.i ) as id
  from system.ints t
  join system.ints u
  join system.ints v
  join system.ints w
) tmp;

For clarity you might want to add a column reporting the value associated with each state change.
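Before timing anything, it helps to confirm the generated data look right. The following aggregate is a suggested sanity check, not part of the original article; with @state_interval = 1000 and ids 1 through 10000, it should report 5,000 rows for each state:

```sql
-- Suggested sanity check (not from the original article):
-- ids 1..10000 should split evenly between states 1 and 2.
SELECT state,
       COUNT(*) AS rows_per_state,
       MIN(date) AS earliest,
       MAX(date) AS latest
FROM history
GROUP BY state
ORDER BY state;
```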
As you'll see, this story begins with aggregating queries and ends with non-aggregating queries. Quassnoi found a set-based aggregating solution, but it's painfully slow ...

SELECT MIN(id) AS id, MIN(date) AS date, MIN(state) AS state, value, COUNT(*) cnt
FROM (
  SELECT id, date, state, value,
    ( SELECT id
      FROM history a
      WHERE (a.date, a.state, a.id) < (b.date, b.state, b.id)
        AND a.state != b.state
      ORDER BY date DESC, state DESC
      LIMIT 1
    ) AS prev
  FROM history b
) q
GROUP BY prev
ORDER BY date;

+------+---------------------+-------+-------------+------+
| id   | date                | state | value       | cnt  |
+------+---------------------+-------+-------------+------+
| 9001 | 2024-07-17 01:20:00 | 2     | value 10000 | 1000 |
| 8001 | 2024-07-17 18:00:00 | 1     | value 9000  | 1000 |
| 7001 | 2024-07-18 10:40:00 | 2     | value 8000  | 1000 |
| 6001 | 2024-07-19 03:20:00 | 1     | value 7000  | 1000 |
| 5001 | 2024-07-19 20:00:00 | 2     | value 6000  | 1000 |
| 4001 | 2024-07-20 12:40:00 | 1     | value 5000  | 1000 |
| 3001 | 2024-07-21 05:20:00 | 2     | value 4000  | 1000 |
| 2001 | 2024-07-21 22:00:00 | 1     | value 3000  | 1000 |
| 1001 | 2024-07-22 14:40:00 | 2     | value 2000  | 1000 |
|    1 | 2024-07-23 07:20:00 | 1     | value 1000  | 1000 |
+------+---------------------+-------+-------------+------+

30.8 seconds to walk 10k rows on a reasonable machine in MySQL 5.6, 18.6 secs in MySQL 8.0. The obvious culprit is the row-by-row correlated subquery in the subquery's SELECT clause. We can cut execution time by a third or more by dropping the aggregation and moving that subquery to the Where clause ...

SELECT a.*
FROM history AS a
WHERE a.state <> (
  SELECT b.state
  FROM history AS b
  WHERE (a.date, a.state) < (b.date, b.state)
  ORDER BY b.date
  LIMIT 1
);

That takes 22.7 seconds in 5.6, 12.7 secs in 8.0. Better, but still slow. Removing columns not covered by the index from the Select list makes no difference.
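To see whether the optimiser actually uses the (date, state) index for the correlated comparison, EXPLAIN the faster query. The plan differs across MySQL versions, so no output is reproduced here:

```sql
-- EXPLAIN the Where-clause variant; look for the (date, state) key
-- in the possible_keys and key columns of the plan.
EXPLAIN
SELECT a.*
FROM history AS a
WHERE a.state <> (
  SELECT b.state
  FROM history AS b
  WHERE (a.date, a.state) < (b.date, b.state)
  ORDER BY b.date
  LIMIT 1
);
```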
A non-aggregating row-based solution makes a radical difference: track state changes with user variables in a derived table ordered by date and state ...

SELECT MIN(id) AS id, MIN(date) AS date, MIN(state) AS state, value, COUNT(*) cnt
FROM (
  SELECT
    @r := @r + (@state != state OR @state IS NULL) AS state_change,
    @state := state AS current,
    h.id, h.date, h.state, h.value
  FROM (
    -- one-row virtual table
    SELECT @r := 0, @state := NULL
  ) as vars
  CROSS JOIN history as h
  -- order to track state changes
  ORDER BY date, state
) q
GROUP BY state_change
ORDER BY date;

0.05 seconds from MySQL 5.6 through 8.0. If how this works puzzles you, run the inner derived-table query on its own. MySQL has announced that support for such use of user variables will be removed in a future release. Read on ...

Solution for MySQL 8.0

8.0 introduced Common Table Expressions (CTEs) and Windowing functions. The basic algorithm of the user variable solution was ...
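To watch the mechanism at work, here is a stripped-down version of that inner query (an illustrative sketch, not from the original article). The running counter stays constant within a run of identical states and increments when the state flips:

```sql
-- Run the derived table alone: state_change labels each run of
-- consecutive identical states with its own number.
SELECT
  @r := @r + (@state != state OR @state IS NULL) AS state_change,
  @state := state AS current_state,
  h.id, h.state, h.date
FROM ( SELECT @r := 0, @state := NULL ) AS vars
CROSS JOIN history AS h
ORDER BY date, state
LIMIT 10;
```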
1. Create a derived table that identifies rows with state changes.
2. Pick out those rows for display.

Dead simple with a CTE and the LEAD() window function ...

WITH cte AS (
  SELECT id, state, date,
    CASE WHEN LEAD(state) OVER (ORDER BY date) != state THEN 1 END AS Filter
  FROM history
)
SELECT id, state, date
FROM cte
WHERE Filter = 1
ORDER BY id;

+------+-------+---------------------+
| id   | state | date                |
+------+-------+---------------------+
| 1001 | 2     | 2024-07-23 07:19:00 |
| 2001 | 1     | 2024-07-22 14:39:00 |
| 3001 | 2     | 2024-07-21 21:59:00 |
| 4001 | 1     | 2024-07-21 05:19:00 |
| 5001 | 2     | 2024-07-20 12:39:00 |
| 6001 | 1     | 2024-07-19 19:59:00 |
| 7001 | 2     | 2024-07-19 03:19:00 |
| 8001 | 1     | 2024-07-18 10:39:00 |
| 9001 | 2     | 2024-07-17 17:59:00 |
+------+-------+---------------------+

The user variable solution with 10k rows took 0.05 secs; this takes 0.07 secs. User variable value assignments in MySQL queries are going away? No worries.

Last updated 11 May 2024
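The CTE query above omits the Count column from the original requirement. One way to restore it with standard 8.0 window functions (a sketch under the same schema, not from the original article) is to flag the first row of each run with LAG(), number the runs with a running SUM() of those flags, then aggregate per run:

```sql
-- Sketch (not from the original article): restore the Count column.
-- LAG() marks run starts; the null-safe <=> makes the very first row
-- (where LAG is NULL) count as a run start too.
WITH marked AS (
  SELECT id, state, date,
         CASE WHEN LAG(state) OVER (ORDER BY date) <=> state
              THEN 0 ELSE 1 END AS run_start
  FROM history
),
runs AS (
  SELECT id, state, date,
         SUM(run_start) OVER (ORDER BY date) AS run_no
  FROM marked
)
SELECT MIN(id) AS id, MIN(state) AS state, MIN(date) AS date, COUNT(*) AS cnt
FROM runs
GROUP BY run_no
ORDER BY date;
```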