Aggregates from bands of values

from the Artful Common Queries page


Aggregating by bands of values can clarify data patterns. The banding trick is a simple transformation on the banding column:
 * Floor( banding column / band width )
so to count and average scores in bands of 10, ie 0-9,10-19 and so on ...
create table scores(score int);
insert into scores values(5),(15),(25),(35);
SELECT 10 * FLOOR( score / 10  ) AS  Bottom,
       10 * FLOOR( score / 10  ) + 9 AS Top, 
       Count( score ),
       Avg( score ) 
FROM scores
GROUP BY 10  * FLOOR( score / 10  );
+--------+------+----------------+--------------+
| Bottom | Top  | Count( score ) | Avg( score ) |
+--------+------+----------------+--------------+
|      0 |    9 |              1 |       5.0000 |
|     10 |   19 |              1 |      15.0000 |
|     20 |   29 |              1 |      25.0000 |
|     30 |   39 |              1 |      35.0000 |
+--------+------+----------------+--------------+
Here's another example, from the world database, grouping UK cities in bands of 100,000:
select 
  floor(population/100000)*100000 as 'Population Band', 
  group_concat(name) as Cities
from world.city
where countrycode='GBR'
group by  1
order by  1 desc limit 5
+-----------------+--------------------------------------------------------+
| Population Band | Cities                                                 |
+-----------------+--------------------------------------------------------+
|         7200000 | London                                                 |
|         1000000 | Birmingham                                             |
|          600000 | Glasgow                                                |
|          400000 | Bristol,Leeds,Manchester,Sheffield,Edinburgh,Liverpool |
|          300000 | Cardiff,Coventry                                       |
+-----------------+--------------------------------------------------------+
Here is an example from a MySQL forum. You have a table of IDs and point scores, and you wish to show the distribution of counts across ranges. A traditional way of doing this is to build a table of ranges, then join from ranges to scores:
drop table if exists points,ranges;
create table points (
  id int not null auto_increment primary key,
  points int not null
);
create table ranges (
  low int not null,
  high int not null
);
insert into points (points) values 
  (456),(401),(543),(234),(303),(521),(478),(643),(575),(456),(432),
  (312),(564),(423),(411),(395),(543);
insert into ranges (low, high) values
  (0,50),(51,100),(101,150),(151,200),(201,250),(251,300),(301,350),
  (351,400),(401,450),(451,500),(501,550),(551,600),(601,650),(651,700);
select r.low, r.high, count(p.id)
from ranges r
left join points p on p.points between r.low and r.high
group by r.low, r.high;
That's fine, but the ranges should be consistent, eg 0-49,50-99,100-149..., and if you keep a simple utility table of ints 0..9, you never need to build another range table:
drop table if exists ints,ranges;
create table ints(i tinyint); /* See "Make a table of sequential ints" */
insert into ints values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
select ranges.low, ranges.high, count(p.id)
from (
  select ( t.i*100 + u.i*10 + v.i ) as low,( t.i*100 + u.i*10 + v.i ) + 49 as high
  from ints t
  join ints u
  join ints v
  where (t.i*100 + u.i*10 + v.i) mod 50 = 0 and (t.i*100 + u.i*10 + v.i) < 701
) ranges
left join points p on p.points between ranges.low and ranges.high
group by ranges.low;
+-----+------+-------------+
| low | high | count(p.id) |
+-----+------+-------------+
|   0 |   50 |           0 |
|  51 |  100 |           0 |
| 101 |  150 |           0 |
| 151 |  200 |           0 |
| 201 |  250 |           1 |
| 251 |  300 |           0 |
| 301 |  350 |           2 |
| 351 |  400 |           1 |
| 401 |  450 |           4 |
| 451 |  500 |           3 |
| 501 |  550 |           3 |
| 551 |  600 |           2 |
| 601 |  650 |           1 |
| 651 |  700 |           0 |
+-----+------+-------------+
If you pass banding query results like the above to the Google Visualisation Line Chart or Histogram widget, eg with TheUsual, you get a nice graphic of the result. Another example (from the MySQL Newbie Forum). > you have a table tbl(id,temp). Every 5 secs a new row is added. You want hourly average temps from the last 17280 rows. It's a 3-step ... (i) assemble a rowset of the last 17280 rows. One way, assuming there are no missing ids ...
select id 
from tbl
where id >= (select count(*) from tbl)-17280
order by id ;
Or if there may be missing ids ...
set @n = (select count(*) from tbl)-17280;
select id 
from tbl
order by i limit @n,17280 ;
(ii) Compute range groups with the expression id div 720. We'll group by that. (iii) take the count and average in each group. So we have ...
set @n = (select count(*) from tbl)-17280;
select i div 720 as grp, count(*) as N, avg(temp) as avgtemp
from (
  select id, temp 
  from tbl
  order by i limit @n,17280 ;
) temps
group by grp
order by grp;
One of the windowing functions available in MySQL since version 8.0 takes banding analysis to another level. Suppose you wish to see order sums partitioned into deciles. In the Northwind sample database, all the info you need for this is in the orderdetails table ...
 
create table orderdetails (
  orderid int not null,
  productid int not null,
  unitprice float not null default '0',
  quantity smallint not null default '1',
  discount float not null default '0',
  timestamp timestamp null default null,
  primary key (orderid,productid)
An order total is quantoty*proce-discount) summed over orderid ...
  select 
    orderid,
    round( sum(unitprice*quantity-discount),2 ) as amt,
    ntile(10) over(
      order by sum(unitprice*quantity-discount) desc)
    as decile
    from orderdetails
    group by orderid;
but that query will return a row for each of 830 orders. The windowing function NTILE(n) partitions the order sums into n bands ...
select 
  decile, 
  min(amt) as min, max(amt) as max,
  round( sum(amt), 2 ) as dectotal,
  round( avg(amt), 2 ) as decavg
from (
  select 
    orderid,
    round( sum(unitprice*quantity-discount),2 ) as amt,
    ntile(10) over(
      order by sum(unitprice*quantity-discount) desc
    ) as decile
    from orderdetails
    group by orderid
) o
group by decile
order by decile;
+--------+---------+----------+-----------+---------+
| decile | min     | max      | dectotal  | decavg  |
+--------+---------+----------+-----------+---------+
|      1 | 3584.00 | 17249.90 | 518445.82 | 6246.34 |
|      2 | 2301.75 |  3523.40 | 230980.43 | 2782.90 |
|      3 | 1819.30 |  2300.00 | 168728.37 | 2032.87 |
|      4 | 1414.80 |  1814.80 | 133272.14 | 1605.69 |
|      5 | 1015.70 |  1412.40 | 100496.42 | 1210.80 |
|      6 |  756.00 |  1015.50 |  74133.58 |  893.18 |
|      7 |  565.50 |   755.00 |  54590.20 |  657.71 |
|      8 |  391.33 |   560.00 |  39638.53 |  477.57 |
|      9 |  200.00 |   390.00 |  24989.31 |  301.08 |
|     10 |   12.50 |   200.00 |   9062.75 |  109.19 |
+--------+---------+----------+-----------+---------+

Last updated 27 May 2024