Straight SQL isn't designed for row-to-row calculations---they can be done, but they're seriously inefficient, e.g., given a chequebook-like table tbl( id int primary key,d date,value decimal(10,2), key(date) ) , this query will return a running balance ...
select
d, value,
(select sum(value) as balance
from tbl as running
where running.d <= tbl.d
) as balance
from tbl
order by d;
... but at the cost of recomputing the running sum from the top for every row. On a big table, even with a date index, that'll be slow.
Originally cursors were designed to remedy the awkwardness and slowness of SQL with row-to-row computations, but cursors are much slower than straight SQL.
A user variable can sidestep these inefficiencies by maintaining the running total in memory. Initialise the variable, then adjust its value as desired in the appropriate SELECT expression:
SET @balancel=0;
SELECT d, value, @balance:=@balance+value AS RunningSum
FROM tbl
ORDER BY d;
This example tracks the running balance and how much money from the most recent deposit remains ...
DROP TABLE IF EXISTS chequebook;
CREATE TABLE chequebook (
entry_date timestamp default now() PRIMARY KEY,
entry_item varchar(48) NOT NULL DEFAULT '',
entry_amount decimal(10,2) NOT NULL DEFAULT 0.00
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
INSERT INTO chequebook (entry_date,entry_item,entry_amount) VALUES
('2010-01-02 12:34:00','Deposit A',215.56),
('2010-01-02 21:44:00','Withdrawal A' ,-23.34),
('2010-01-03 10:44:00','Withdrawal B',-150.15),
('2010-01-03 15:44:00','Deposit B',154.67),
('2010-01-04 18:44:00','Withdrawal C',-65.09),
('2010-01-05 08:44:00','Withdrawal D',-74.23),
('2010-01-06 14:44:00','Deposit C',325.12),
('2010-01-06 20:44:00','Withdrawal E',-80.12),
('2010-01-07 04:44:00','Withdrawal F',-110.34),
('2010-01-07 16:44:00','Withdrawal G',-150.25),
('2010-01-08 16:44:00','Withdrawal H',-23.90),
('2010-01-08 21:44:00','Withdrawal I',-75.66),
('2010-01-08 22:44:00','Deposit C',275.78),
('2010-01-09 11:44:00','Withdrawal K',-85.99),
('2010-01-09 21:44:00','Withdrawal J',-100.00);
set @depos=0, @total=0;
select
entry_date,
entry_item,
entry_amount,
if( entry_amount>0, @depos:=entry_amount, @depos:=@depos+entry_amount ) as depos_bal,
@total:=@total+entry_amount as net_bal
from chequebook
order by entry_date;
+---------------------+--------------+--------------+-----------+---------+
| entry_date | entry_item | entry_amount | depos_bal | net_bal |
+---------------------+--------------+--------------+-----------+---------+
| 2010-01-02 12:34:00 | Deposit A | 215.56 | 215.56 | 215.56 |
| 2010-01-02 21:44:00 | Withdrawal A | -23.34 | 192.22 | 192.22 |
| 2010-01-03 10:44:00 | Withdrawal B | -150.15 | 42.07 | 42.07 |
| 2010-01-03 15:44:00 | Deposit B | 154.67 | 154.67 | 196.74 |
| 2010-01-04 18:44:00 | Withdrawal C | -65.09 | 89.58 | 131.65 |
| 2010-01-05 08:44:00 | Withdrawal D | -74.23 | 15.35 | 57.42 |
| 2010-01-06 14:44:00 | Deposit C | 325.12 | 325.12 | 382.54 |
| 2010-01-06 20:44:00 | Withdrawal E | -80.12 | 245.00 | 302.42 |
| 2010-01-07 04:44:00 | Withdrawal F | -110.34 | 134.66 | 192.08 |
| 2010-01-07 16:44:00 | Withdrawal G | -150.25 | -15.59 | 41.83 |
| 2010-01-08 16:44:00 | Withdrawal H | -23.90 | -39.49 | 17.93 |
| 2010-01-08 21:44:00 | Withdrawal I | -75.66 | -115.15 | -57.73 |
| 2010-01-08 22:44:00 | Deposit C | 275.78 | 275.78 | 218.05 |
| 2010-01-09 11:44:00 | Withdrawal K | -85.99 | 189.79 | 132.06 |
| 2010-01-09 21:44:00 | Withdrawal J | -100.00 | 89.79 | 32.06 |
+---------------------+--------------+--------------+-----------+---------+
If your platform does not permit multiple queries per connection, and if you can tolerate the O(N2) inefficiency, a self-join does the same job as an accumulating user variable ...
SELECT c.id, c.value, d.RunningSum
FROM tbl c
JOIN (
SELECT a.id, SUM(b.value) AS RunningSum
FROM tbl a
LEFT JOIN tbl b ON b.id <= a.id
GROUP BY a.id
) d USING (id);
Accumulating a sum and resetting it (ie subgrouping it) on another column value is a little harder ...
drop table if exists t;
create table t( id smallint primary key, a char(1), v int );
insert into t values(1,'a',10),(2,'b',20),(3,'b',30),(4,'c',40);
select x.a, x.v, sum(x.v) as cum
from t x
join t y on y.a = x.a and y.id <= x.id
group by x.id
order by x.a, x.id;
+------+------+------+
| a | v | cum |
+------+------+------+
| a | 10 | 10 |
| b | 20 | 20 |
| b | 30 | 60 |
| c | 40 | 40 |
+------+------+------+
Since MySQL 8.0.12 we can do row-to-row calculations with using any of the above algorithms; here is a running sum CTE using the Row_Number() function to generate a sequential row id and the join algorithm for accumulating a sum...
with recursive -- so `x` can self-reference
y as ( -- sequential row id from row_number()
select d, value, row_number() over(order by d) as rn
from tbl
),
x as ( -- cumulative sum
select d, rn, value, value as balance
from y
where rn=1 -- first 'seed' row
union all
select y.d, y.rn, y.value, x.balance + y.value as balance
from x
join y on y.rn=x.rn+1 -- `x` walks itself
)
select d, value, balance -- display result
from x
order by x.rn;
|
|