SQL Server 2008 中的 COUNT (DISTINCT column_name) 与 COUNT (column_name) 的差异?

2021-09-10 00:00:00 sql tsql sql-server-2008 sql-server

我遇到了一个让我发疯的问题.运行下面的查询时,我得到的计数为 233,769

I'm running into a problem that's driving me nuts. When running the query below, I get a count of 233,769

 SELECT COUNT(distinct  Member_List_Link.UserID)  
 FROM Member_List_Link  with (nolock)   
 INNER JOIN MasterMembers with (nolock)  
     ON Member_List_Link.UserID = MasterMembers.UserID   
  WHERE MasterMembers.Active = 1 And
        Member_List_Link.GroupID = 5 AND 
        MasterMembers.ValidUsers = 1 AND 
        Member_List_Link.Status = 1

但是如果我运行相同的查询没有不同的关键字,我会得到233,748

But if I run the same query without the distinct keyword, I get a count of 233,748

 SELECT COUNT(Member_List_Link.UserID)  
 FROM Member_List_Link  with (nolock)   
 INNER JOIN MasterMembers with (nolock)
   ON Member_List_Link.UserID = MasterMembers.UserID   
 WHERE MasterMembers.Active = 1 And Member_List_Link.GroupID = 5 
  AND MasterMembers.ValidUsers = 1 AND Member_List_Link.Status = 1

为了测试,我重新创建了所有表并将它们放入临时表中,然后再次运行查询:

To test, I recreated all the tables and place them into temp tables and ran the queries again:

  SELECT COUNT(distinct  #Temp_Member_List_Link.UserID)  
  FROM #Temp_Member_List_Link  with (nolock)   
  INNER JOIN #Temp_MasterMembers with (nolock)
    ON #Temp_Member_List_Link.UserID = #Temp_MasterMembers.UserID   
  WHERE #Temp_MasterMembers.Active = 1 And 
        #Temp_Member_List_Link.GroupID = 5 AND 
        #Temp_MasterMembers.ValidUsers = 1 AND 
        #Temp_Member_List_Link.Status = 1

并且没有不同的关键字

  SELECT COUNT(#Temp_Member_List_Link.UserID)  
  FROM #Temp_Member_List_Link  with (nolock)   
  INNER JOIN #Temp_MasterMembers with (nolock)
    ON #Temp_Member_List_Link.UserID = #Temp_MasterMembers.UserID   
  WHERE #Temp_MasterMembers.Active = 1 And 
        #Temp_Member_List_Link.GroupID = 5 AND 
        #Temp_MasterMembers.ValidUsers = 1 AND 
        #Temp_Member_List_Link.Status = 1

顺便说一下,我通过简单地运行 (select * from Member_List_Link into #temp...) 重新创建了临时表

On a side note, I recreated the temp tables by simply running (select * from Member_List_Link into #temp...)

现在,当我使用这些临时表检查 COUNT(column) 与 COUNT(distinct column) 之间的差异时,我什么也没看到!

And now when I check to see the difference between COUNT(column) vs. COUNT(distinct column) with these temp tables, I don't see any!

那么为什么与原始表格存在差异?

So why is there a discrepancy with the original tables?

我运行的是 SQL Server 2008(开发版).

I'm running SQL Server 2008 (Dev Edition).

更新 - 包括统计资料

UPDATE - Including statistics profile

PhysicalOp 列仅用于第一个查询(无不同)

PhysicalOp column only for the first query (without distinct)

NULL
Compute Scalar
Stream Aggregate
Clustered Index Seek

PhysicalOp 列仅用于第一个查询(具有不同的)

PhysicalOp column only for the first query (with distinct)

NULL
Compute Scalar
Stream Aggregate
Parallelism
Stream Aggregate
Hash Match
Hash Match
Bitmap
Parallelism
Index Seek
Parallelism
Clustered Index Scan

第一个查询的行数和执行数(没有不同的)

Rows and Executes for the 1st query (without distinct)

1   1
0   0
1   1
1   1

第二个查询的行数和执行数(具有不同的)

Rows and Executes for the 2nd query (with distinct)

Rows    Executes
1   1
0   0
1   1
16  1
16  16
233767  16
233767  16
281901  16
281901  16
281901  16
234787  16
234787  16

将 OPTION(MAXDOP 1) 添加到第二个查询(具有不同的)

Adding OPTION(MAXDOP 1) to the 2nd query (with distinct)

Rows Executes

1           1
0           0
1           1
233767          1
233767          1
281901          1
548396          1

以及由此产生的 PhysicalOp

And the resulting PhysicalOp

NULL
Compute Scalar
Stream Aggregate
Hash Match
Hash Match
Index Seek
Clustered Index Scan

推荐答案

FROM http://msdn.microsoft.com/en-us/library/ms187373.aspxNOLOCK 相当于 READUNCOMMITTED.有关详细信息,请参阅本主题后面的 READUNCOMMITTED.

FROM http://msdn.microsoft.com/en-us/library/ms187373.aspx NOLOCK Is equivalent to READUNCOMMITTED. For more information, see READUNCOMMITTED later in this topic.

READUNCOMMITED 将读取行两次,如果它们是事务的主题 - 因为在事务处理过程中,前滚行和回滚行都存在于数据库中.

READUNCOMMITED will read rows twice if they are the subject of a transation- since both the roll foward and roll back rows exist within the database when the transaction is IN process.

默认情况下所有查询都是读提交的,不包括未提交的行

By default all queries are read committed which excludes uncommitted rows

当您插入临时表时,选择只会给您提交的行 - 我相信这涵盖了您试图解释的所有症状

When you insert into a temp table the select will give you only committed rows - I believe this covers all the symptoms you are trying to explain

相关文章