## Median

### from the Artful Common Queries page

 Statistically, the median is the middle value--the value that is smaller than that found in half of all remaining rows, and larger than that found in the other half: ``` SELECT a.hours As Median FROM BulbLife As a, bulbLife AS b GROUP BY a.Hours Having Sum( Sign( 1-Sign(b.Hours-a.Hours ))) = Floor(( Count(*)+1)/2)  ; ``` This formula works, but it doesn't scale—it's O(n*n)—and it's awkward to use. So we posted a MySQL implementation of Torben Mogenson's algorithm for calculating the median (http://ndevilla.free.fr/median/median/node20.html). It's said to be amongst the fastest, but it proved too slow. Then Joe Wynne offered an algorithm which looks correct and does scale. Here it is as a MySQL stored procedure: ``` DROP PROCEDURE IF EXISTS Median; DELIMITER | CREATE PROCEDURE Median( tbl CHAR(64), col CHAR(64), OUT res DOUBLE ) BEGIN   DECLARE arg CHAR(64);   SET @sql = CONCAT( 'SELECT ((COUNT(*))/2) INTO @c FROM ', tbl );   PREPARE stmt FROM @sql;   EXECUTE stmt;   DROP PREPARE stmt;   SET @a = CONVERT(FLOOR(@c), SIGNED);   IF @a = @c THEN      BEGIN       SET @a = @a-1;       SET @b = 2;       SET arg = CONCAT( 'AVG(', col, ')' );     END;   ELSE     BEGIN       SET @b = 1;       SET arg = col;     END;   END IF;   SET @sql = CONCAT('SELECT ', arg, ' INTO @res FROM (SELECT ', col, ' FROM ', tbl,                      ' ORDER BY ', col, ' LIMIT ?,?) as tmp');   PREPARE stmt FROM @sql;   EXECUTE stmt USING @a, @b;   DROP PREPARE stmt;   SET res=@res; END; | DELIMITER ; ``` What we really need is a perfectly general function that scales and can be dropped into queries. As it turns out, we can have two of those three requirements—scalability and drop-in-ability. We can't have generality because MySQL functions can't invoke `PREPARE`. Still, two of three ain't bad. Stay tuned.