Find and remove rows with multiple IDs for the same info

from the Artful MySQL Tips List


It happens a lot. Jane Smith gets added to the user table multiple times, so she has multiple IDs.
This example illustrates the general pattern. Table t has three rows where j=2 and k=3:

drop table if exists t;
create table t(id int,j int,k int);
insert into t values(1,2,3),(2,2,3),(3,2,3),(4,5,6);

You need a rule for deciding which ID to keep. Suppose you want to keep the smallest of each dupe set. This query finds and displays dupes on j and k, their smallest id values (id_to_keep), and their duplicate id values (id_to_lose):

select a.id_to_keep, a.j, a.k, b.id_to_lose, b.j, b.k
from (            -- subquery to find min ids
  select j,k,min(id) as id_to_keep
  from t 
  group by j,k
) a
join (            -- subquery to find all ids joining on j,k
  select id as id_to_lose,j,k
  from t
) b using(j,k)
where a.id_to_keep <> b.id_to_lose;
+------------+------+------+------------+------+------+
| id_to_keep | j    | k    | id_to_lose | j    | k    |
+------------+------+------+------------+------+------+
|          1 |    2 |    3 |          2 |    2 |    3 |
|          1 |    2 |    3 |          3 |    2 |    3 |
+------------+------+------+------------+------+------+

The IDs to delete are in the id_to_lose column. To build the deletion query, just strip out all columns except id_to_lose from the Select list in the above query, then prepend deletion syntax:

delete t
from t
join (
  select b.id_to_lose
  from (
    select j,k,min(id) as id_to_keep
    from t 
    group by j,k
  ) a
  join (
    select id as id_to_lose,j,k
    from t
  ) b using(j,k)
  where a.id_to_keep <> b.id_to_lose
) lose on t.id=lose.id_to_lose;


Return to the Artful MySQL Tips page