SQL Server duplicate records

2021-09-10 00:00:00 sql tsql sql-server-2008 sql-server


Hello, I have written the following query:

UPDATE [dbo].[TestData]
SET Duplicate = 'Duplicate within'
WHERE exists 
(SELECT telephone, COUNT(telephone)
FROM [dbo].[TestData]
GROUP BY telephone
HAVING (COUNT (telephone)>1))

In that table there are actually 9 duplicate telephone records.

The query stamps the Duplicate column of every row as 'Duplicate within' instead of only the 9 duplicate records.

I have also developed the following query, which is meant to unstamp the 18 duplicate records down to 9.

UPDATE [dbo].[TestData]
SET Duplicate = 'NO'
WHERE ID IN (SELECT MIN(ID) FROM [dbo].[TestData] GROUP BY telephone)


This query is not working either. Could anyone please guide me on where I am going wrong?


You could do this using where exists, but it's easier to write/read this way and the performance difference is most likely minimal.

update TestData set 
    Duplicate = 'Duplicate within'
where Telephone in (
        select Telephone
        from TestData
        group by Telephone
        having count(*) > 1
    )
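
For comparison, here is a rough sketch of the where exists variant. The query in the question marks every row because its subquery never refers to the outer row: as long as any telephone number in the table is duplicated, exists is true for all rows. The sketch below correlates the subquery with the outer row instead. It assumes the table has a unique ID column (the MIN(ID) query in the question suggests it does); the aliases t and d are only illustrative.

```sql
-- Sketch only: assumes dbo.TestData has a unique ID column.
update t
set Duplicate = 'Duplicate within'
from dbo.TestData as t
where exists (
    select 1
    from dbo.TestData as d
    where d.Telephone = t.Telephone  -- same number as the outer row
      and d.ID <> t.ID               -- but a different row
);
```

Like the in version, this marks all rows that share a telephone number, including the first one.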

To leave the first record with each telephone number alone and mark only the subsequent records with the same telephone number, use a CTE as follows:

;with NumberedDupes as (
    select
        Duplicate,
        row_number() over (partition by Telephone order by Telephone) seq
    from TestData
)
update NumberedDupes set Duplicate = 'Duplicate within' where seq > 1
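
One caveat: ordering by Telephone inside a partition on Telephone leaves every row in a group tied, so which row gets seq = 1 is not guaranteed. If it matters which record stays unmarked, a variation along these lines might help. It assumes the ID column from the question exists and that the lowest ID counts as the "first" record.

```sql
-- Sketch only: assumes dbo.TestData has an ID column that defines
-- which record counts as the "first" one for each telephone number.
;with NumberedDupes as (
    select
        Duplicate,
        row_number() over (partition by Telephone order by ID) as seq
    from dbo.TestData
)
update NumberedDupes
set Duplicate = 'Duplicate within'
where seq > 1;  -- seq = 1 (lowest ID per number) is left untouched
```

With this ordering, the row that stays unmarked in each group is the same one the MIN(ID) query in the question selects.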
