MySQL删除重复记录但保持最新
我有唯一的 id
和 email
字段.电子邮件被复制.我只想保留所有重复项的一个电子邮件地址,但使用最新的 id
(最后插入的记录).
I have unique id
and email
fields. Emails get duplicated. I only want to keep one Email address of all the duplicates but with the latest id
(the last inserted record).
我怎样才能做到这一点?
How can I achieve this?
推荐答案
想象你的表 test
包含以下数据:
Imagine your table test
contains the following data:
select id, email
from test;
ID EMAIL
---------------------- --------------------
1 aaa
2 bbb
3 ccc
4 bbb
5 ddd
6 eee
7 aaa
8 aaa
9 eee
所以,我们需要找到所有重复的邮件并删除所有邮件,但要删除最新的 id.
在这种情况下,aaa
、bbb
和eee
是重复的,所以我们要删除 ID 1、7、2 和 6.
So, we need to find all repeated emails and delete all of them, but the latest id.
In this case, aaa
, bbb
and eee
are repeated, so we want to delete IDs 1, 7, 2 and 6.
要做到这一点,首先我们需要找到所有重复的电子邮件:
To accomplish this, first we need to find all the repeated emails:
select email
from test
group by email
having count(*) > 1;
EMAIL
--------------------
aaa
bbb
eee
然后,从这个数据集中,我们需要为这些重复的电子邮件中的每一个找到最新的 id:
Then, from this dataset, we need to find the latest id for each one of these repeated emails:
select max(id) as lastId, email
from test
where email in (
select email
from test
group by email
having count(*) > 1
)
group by email;
LASTID EMAIL
---------------------- --------------------
8 aaa
4 bbb
9 eee
最后,我们现在可以删除所有这些 Id 小于 LASTID 的电子邮件.所以解决方案是:
Finally we can now delete all of these emails with an Id smaller than LASTID. So the solution is:
delete test
from test
inner join (
select max(id) as lastId, email
from test
where email in (
select email
from test
group by email
having count(*) > 1
)
group by email
) duplic on duplic.email = test.email
where test.id < duplic.lastId;
我现在没有在这台机器上安装 mySql,但应该可以工作
I don't have mySql installed on this machine right now, but should work
上面的删除有效,但我发现了一个更优化的版本:
The above delete works, but I found a more optimized version:
delete test
from test
inner join (
select max(id) as lastId, email
from test
group by email
having count(*) > 1) duplic on duplic.email = test.email
where test.id < duplic.lastId;
您可以看到它删除了最旧的重复项,即 1、7、2、6:
You can see that it deletes the oldest duplicates, i.e. 1, 7, 2, 6:
select * from test;
+----+-------+
| id | email |
+----+-------+
| 3 | ccc |
| 4 | bbb |
| 5 | ddd |
| 8 | aaa |
| 9 | eee |
+----+-------+
另一个版本,是由 Rene Limon
delete from test
where id not in (
select max(id)
from test
group by email)
相关文章