How can I remove duplicate rows?
What is the best way to remove duplicate rows from a fairly large SQL Server table (i.e. 300,000+ rows)?
The rows, of course, will not be perfect duplicates because of the existence of the RowID identity field.
MyTable
RowID int not null identity(1,1) primary key,
Col1 varchar(20) not null,
Col2 varchar(2048) not null,
Col3 tinyint not null
Recommended answer
Assuming no nulls, you GROUP BY the unique columns and SELECT the MIN (or MAX) RowId as the row to keep. Then delete every row whose RowId is not among the ones kept:
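-- T-SQL needs the delete target named before FROM when joining
DELETE MyTable
FROM MyTable
LEFT OUTER JOIN (
    -- one surviving RowId per distinct (Col1, Col2, Col3) combination
    SELECT MIN(RowId) as RowId, Col1, Col2, Col3
    FROM MyTable
    GROUP BY Col1, Col2, Col3
) as KeepRows ON
    MyTable.RowId = KeepRows.RowId
WHERE
    -- rows with no match in KeepRows are the duplicates
    KeepRows.RowId IS NULL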
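To try the keep-the-MIN-RowId idea on a small scale before running it against a real table, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for SQL Server. This is an assumption made for illustration only: SQLite does not accept the DELETE ... JOIN form, so the sketch uses an equivalent NOT IN subquery, and the helper names make_sample_db and remove_duplicates and the sample rows are hypothetical.

```python
import sqlite3


def make_sample_db():
    """Build an in-memory table shaped like MyTable, with a few duplicates."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE MyTable (
            RowID INTEGER PRIMARY KEY AUTOINCREMENT,
            Col1 TEXT NOT NULL,
            Col2 TEXT NOT NULL,
            Col3 INTEGER NOT NULL
        )
        """
    )
    # Hypothetical sample data: rows 2 and 4 duplicate row 1.
    conn.executemany(
        "INSERT INTO MyTable (Col1, Col2, Col3) VALUES (?, ?, ?)",
        [("a", "x", 1), ("a", "x", 1), ("b", "y", 2), ("a", "x", 1), ("b", "y", 3)],
    )
    conn.commit()
    return conn


def remove_duplicates(conn):
    """Keep the lowest RowID per (Col1, Col2, Col3) group; delete the rest.

    Returns the number of rows deleted.
    """
    cur = conn.execute(
        """
        DELETE FROM MyTable
        WHERE RowID NOT IN (
            SELECT MIN(RowID)
            FROM MyTable
            GROUP BY Col1, Col2, Col3
        )
        """
    )
    conn.commit()
    return cur.rowcount


if __name__ == "__main__":
    conn = make_sample_db()
    print("deleted:", remove_duplicates(conn))
    for row in conn.execute("SELECT RowID, Col1, Col2, Col3 FROM MyTable ORDER BY RowID"):
        print(row)
```

Running the function a second time deletes nothing, which is a quick way to confirm that only one row per group remains.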
In case you have a GUID instead of an integer, you can replace
MIN(RowId)
with
CONVERT(uniqueidentifier, MIN(CONVERT(char(36), MyGuidColumn)))