It happens a lot. Jane Smith gets added to the user table multiple times, so she has multiple IDs.
This example illustrates the general pattern. Table t has three rows where j=2 and k=3:
drop table if exists t;
create table t(id int,j int,k int);
insert into t values(1,2,3),(2,2,3),(3,2,3),(4,5,6);
You need a rule for deciding which ID to keep. Suppose you want to keep the smallest of each dupe set. This query finds and displays dupes on j and k, their smallest id values (id_to_keep ), and their duplicate id values (id_to_lose ):
select a.id_to_keep, a.j, a.k, b.id_to_lose, b.j, b.k
from ( -- subquery to find min ids
select j,k,min(id) as id_to_keep
from t
group by j,k
) a
join ( -- subquery to find all ids joining on j,k
select id as id_to_lose,j,k
from t
) b using(j,k)
where a.id_to_keep <> b.id_to_lose;
+------------+------+------+------------+------+------+
| id_to_keep | j | k | id_to_lose | j | k |
+------------+------+------+------------+------+------+
| 1 | 2 | 3 | 2 | 2 | 3 |
| 1 | 2 | 3 | 3 | 2 | 3 |
+------------+------+------+------------+------+------+
The IDs to delete are in the id_to_lose column. To build the deletion query, just strip out all columns except id_to_lose from the Select list in the above query, then prepend deletion syntax:
delete t
from t
join (
select b.id_to_lose
from (
select j,k,min(id) as id_to_keep
from t
group by j,k
) a
join (
select id as id_to_lose,j,k
from t
) b using(j,k)
where a.id_to_keep <> b.id_to_lose
) lose on t.id=lose.id_to_lose;
|
|