Sum for time periods
A table tracks attendance at some location:
drop table if exists t;
create table t(interval_id int,start datetime,end datetime, att int);
insert into t values
(1,'2007-01-01 08:00:00','2007-01-01 12:00:00',5 ),
(2,'2007-01-01 13:00:00','2007-01-01 17:00:00',10),
(3,'2007-01-01 10:00:00','2007-01-01 15:00:00',15),
(4,'2007-01-01 14:00:00','2007-03-07 19:00:00',20);
select * from t;
+-------------+---------------------+---------------------+------+
| interval_id | start | end | att |
+-------------+---------------------+---------------------+------+
| 1 | 2007-01-01 08:00:00 | 2007-01-01 12:00:00 | 5 |
| 2 | 2007-01-01 13:00:00 | 2007-01-01 17:00:00 | 10 |
| 3 | 2007-01-01 10:00:00 | 2007-01-01 15:00:00 | 15 |
| 4 | 2007-01-01 14:00:00 | 2007-03-07 19:00:00 | 20 |
+-------------+---------------------+---------------------+------+
In this table, att
is a delta: it tracks entrances. Actual attendance is SUM(att)
at any given moment. For example, if the attendance sum is x
at a given moment, then after we add a row with att=y
and start/end datetimes embracing that moment, attendance will be x+y
. So to retrieve total attendance at 01330h on 1 Jan 2007, we write:
SELECT SUM(att)
FROM t
WHERE t.start <= '2007-01-01 13:30:00' AND t.end >= '2007-01-01 13:30:00';
+----------+
| SUM(att) |
+----------+
| 25 |
+----------+
Then how would we extract maximum attendance during a given period, for example, maximum attendance between 1300h and 1700h?
SQL does not deal efficiently with time. Some SQL dialects offer time series enhancements to the language; MySQL does not.
And, querying time series data for aggregate statistics gets complicated very quickly.
It gets a bit simpler with a calendar table that has a row for every possible datetime value. For our example, assume a granularity of one hour and a query period of one day. Naturally a real system would require a range of dates and perhaps a finer time granularity:
create table cal(id int,dt datetime);
insert into cal values(1,'2007-1-1 01:00:00');
insert into cal values(2,'2007-1-1 02:00:00');
insert into cal values(3,'2007-1-1 03:00:00');
insert into cal values(4,'2007-1-1 04:00:00');
insert into cal values(5,'2007-1-1 05:00:00');
insert into cal values(6,'2007-1-1 06:00:00');
insert into cal values(7,'2007-1-1 07:00:00');
insert into cal values(8,'2007-1-1 08:00:00');
insert into cal values(9,'2007-1-1 09:00:00');
insert into cal values(10,'2007-1-1 10:00:00');
insert into cal values(11,'2007-1-1 11:00:00');
insert into cal values(12,'2007-1-1 12:00:00');
insert into cal values(13,'2007-1-1 13:00:00');
insert into cal values(14,'2007-1-1 14:00:00');
insert into cal values(15,'2007-1-1 15:00:00');
insert into cal values(16,'2007-1-1 16:00:00');
insert into cal values(17,'2007-1-1 17:00:00');
insert into cal values(18,'2007-1-1 18:00:00');
insert into cal values(19,'2007-1-1 19:00:00');
insert into cal values(20,'2007-1-1 20:00:00');
insert into cal values(21,'2007-1-1 21:00:00');
insert into cal values(22,'2007-1-1 22:00:00');
insert into cal values(23,'2007-1-1 23:00:00');
insert into cal values(24,'2007-1-1 24:00:00');
To accumulate the maximum attendance sum, collect target values for defined periods in an inner query, and sum them from the outer query:
SELECT SUM( att )
FROM (
SELECT
t.start AS PeriodStart,
t.end AS PeriodEnd,
MIN(cal.dt) + INTERVAL 1 HOUR AS CountBegin,
MAX(cal.dt) AS CountEnd,
t.att
FROM t
JOIN cal ON cal.dt >= t.start AND cal.dt < t.end
GROUP BY PeriodStart, PeriodEnd
HAVING CountBegin < '2007-01-01 17:00:00' AND CountEnd > '2007-01-01 11:00:00'
) AS periods;
+------------+
| SUM( att ) |
+------------+
| 45 |
+------------+
If the data is more complicated, eg if we also need to track exits, the period logic needs refinement but the principle remains the same.