SQL Server 2008 R2(.Net 2.0 应用程序)中的死锁问题

有问题的 Sql Server 2008 R2 实例是一个重负载 OLTP 生产服务器.几天前出现了僵局问题,但仍未解决.我们收到了 Xml 死锁报告,其中列出了死锁中涉及的存储过程和一些其他细节.我将首先尝试列出此 xml 中的事实:

The Sql Server 2008 R2 instance in question is a heavy load OLTP production server. The deadlock issue came up a few days back and is still unresolved. We received the Xml deadlock report that listed the stored procedures involved in the deadlock and some other details. I'll try to list down the facts from this xml first:

死锁涉及两个存储过程,例如 SP1 和 SP2.根据报告,SP1 运行在隔离级别Serializable",SP2 运行在ReadCommitted".

Two stored procedures are involved in the deadlock, say SP1 and SP2. According to the report SP1 was running in Isolation level "Serializable" and SP2 was running in "ReadCommitted".

我们调查了以下内容:

  • 我们是否设置了 SP1 的 IsolationLevel在 SP 或在可序列化"代码?- 没有.

  • Are we setting IsolationLevel of SP1 to "Serializable" inside SP or in Code? - No.

是任何其他 SP 的 IsolationLevel是可序列化"调用 SP1 吗?- 没有.

Is any other SP whose IsolationLevel is "Serializable" calling SP1? - No.

SP1 使用的表是由任何其他具有隔离功能的 SP级别为可序列化"?- 是的.有些 SP 具有隔离功能级别设置为可序列化"和访问与 SP1 相同的表,但我们不知道他们是否当时正在运行死锁还是不死锁
报告仅显示 SP1 和 SP2.

Are the table used by SP1 called by any other SP that has Isolation Level as "Serializable"? - Yes. There are SPs that have Isolation Level set to "Serializable" and access the same tables as SP1, but we don't know whether they were running at the time of deadlock or not as the deadlock
report only showed SP1 and SP2.

思路:
我们考虑了以下可能的原因:

Lines of thought:
We have considered the following possible causes:

  • 发生死锁是因为 SP1 是作为可序列化"运行.- 为什么是这个 SP 运行在 Serializable 时我没设置?是隔离级别升级(就像锁一样)?如果我们解决了这个问题并让它运行为ReadCommitted,问题会不会是解决了吗?

  • Deadlock is occurring because SP1 is running as "Serializable". - Why is this SP running in Serializable when I haven't set it? Is the Isolation level escalating (like locks do)? If we figure this out and make it run as ReadCommitted, will the issue be resolved?

任何其他 SP 正在运行,锁定SP1 使用的表并导致SP1 和 SP2 之间的死锁.-这个SP不会列在死锁报告?能否陷入僵局报告错过了这样的依赖?如果是那么我们可能只会得到部分信息.这个还是不行解决 SP1 的运行方式不过可序列化.

Any other SP is running, locking the table used by SP1 and causes a deadlock between SP1 and SP2. - Wouldn't this SP be listed in the deadlock report? Can the deadlock report miss such a dependency? If yes then we might only be getting partial information. This still doesn't resolve how SP1 is running in Serializable, though.

建议:

  • 如果这些信息不充分在解决问题时,我该如何从 SQL 获取更多信息服务器为我的目的和什么我应该尝试收集信息吗?

  • If this information is not sufficient in resolving the problem, how can I obtain more information from SQL Server for my purpose and what information should I try to collect?

你会想到的任何其他思路追求解决这个问题?

Any other Line of Thought that you'd pursue in solving this issue?

更新:
这是死锁的跟踪日志信息.我已经更改了 SP 等的名称,但已检查并确认更改不会遗漏任何相关信息.查看代码后面的注释以获取有关表格等的更多信息.

Update:
This is the trace log information for the deadlock. I've changed the names of SPs etc. but have checked and verified that the changes don't miss out any relevant information. Check the notes succeeding the code for more info on tables etc.

?<EVENT_INSTANCE>
  <EventType>DEADLOCK_GRAPH</EventType>
  <PostTime>2010-09-07T11:27:47.870</PostTime>
  <SPID>16</SPID>
  <TextData>
    <deadlock-list>
      <deadlock victim="process5827708">
        <process-list>
          <process id="process5827708" taskpriority="0" logused="0" waitresource="KEY: 7:72057594228441088 (8d008a861f4f)"
                   waittime="5190" ownerId="1661518243" transactionname="SELECT" lasttranstarted="2010-09-07T11:27:42.657"
                   XDES="0x80bf3b50" lockMode="RangeS-S" schedulerid="4" kpid="2228" status="suspended" spid="76" sbid="0"
                   ecid="0" priority="0" trancount="0" lastbatchstarted="2010-09-07T11:27:42.657"
                   lastbatchcompleted="2010-09-07T11:27:42.657" clientapp=".Net SqlClient Data Provider"
                   hostname="xxx" hostpid="5988" loginname="xxx" isolationlevel="serializable (4)"
                   xactid="1661518243" currentdb="7" lockTimeout="4294967295" clientoption1="673185824" clientoption2="128056">
            <executionStack>
              <frame procname="SP1" line="12" stmtstart="450" stmtend="6536"
                     sqlhandle="0x0300070090cbdc7742720c00e99d00000100000000000000">
                Select ... from Table1, Table2, Table4, Table5
              </frame>
            </executionStack>
            <inputbuf>
              Proc [Database Id = 7 Object Id = 2010958736]
            </inputbuf>
          </process>
          <process id="process5844bc8" taskpriority="0" logused="1873648" waitresource="KEY: 7:72057594228441088 (0e00ce038ed0)"
                   waittime="4514" ownerId="1661509575" transactionname="user_transaction" lasttranstarted="2010-09-07T11:27:40.423"
                   XDES="0x37979ae90" lockMode="X" schedulerid="7" kpid="3260" status="suspended" spid="104" sbid="0" ecid="0"
                   priority="0" trancount="2" lastbatchstarted="2010-09-07T11:27:43.350" lastbatchcompleted="2010-09-07T11:27:43.350"
                   clientapp=".Net SqlClient Data Provider" hostname="xxx" hostpid="5988" loginname="xxx"
                   isolationlevel="read committed (2)" xactid="1661509575" currentdb="7" lockTimeout="4294967295"
                   clientoption1="673185824" clientoption2="128056">
            <executionStack>
              <frame procname="SP2" line="68" stmtstart="5272" stmtend="5598"
                     sqlhandle="0x030007003432350f109a0c00e99d00000100000000000000">
                UPDATE Table1 ...
              </frame>
            </executionStack>
            <inputbuf>
              Proc [Database Id = 7 Object Id = 255144500]
            </inputbuf>
          </process>
        </process-list>
        <resource-list>
          <keylock hobtid="72057594228441088" dbid="7" objectname="Table1" indexname="Index1"
                   id="lock448e2c580" mode="X" associatedObjectId="72057594228441088">
            <owner-list>
              <owner id="process5844bc8" mode="X" />
            </owner-list>
            <waiter-list>
              <waiter id="process5827708" mode="RangeS-S" requestType="wait" />
            </waiter-list>
          </keylock>
          <keylock hobtid="72057594228441088" dbid="7" objectname="Table1" indexname="Index1"
                   id="lock2ba335880" mode="RangeS-S" associatedObjectId="72057594228441088">
            <owner-list>
              <owner id="process5827708" mode="RangeS-S" />
            </owner-list>
            <waiter-list>
              <waiter id="process5844bc8" mode="X" requestType="wait" />
            </waiter-list>
          </keylock>
        </resource-list>
      </deadlock>
    </deadlock-list>
  </TextData>
  <TransactionID />
  <LoginName>xx</LoginName>
  <StartTime>2010-09-07T11:27:47.867</StartTime>
  <ServerName>xxx</ServerName>
  <LoginSid>xxx</LoginSid>
  <EventSequence>116538375</EventSequence>
  <IsSystem>1</IsSystem>
  <SessionLoginName />
</EVENT_INSTANCE>

SP1 正在执行从 5 个不同表(表 1 到表 5)获取数据的选择(使用内部查询等).SP2 对表 1 执行更新.
有趣的是,SP2 更新的列之一是 Table1 中的外键字段和 Table2 的主键,而 Table1 和 Table2 都是 SP1 的 select 语句的一部分,不确定这是否相关但不想错过任何东西.

SP1 is performing a select that takes data from 5 different tables (Table1 to Table5) (uses inner query etc.) SP2 performs an update on Table1.
An interesting thing is one of the columns that SP2 updates is a foreign key field in Table1 and primary key of Table2 while both Table1 and Table2 are part of the select statement of SP1, not sure this is relevant but didn't want to miss out anything.

注意:indexname="Index1"(在上面的死锁图中)——Index1 与 Table1 中的外键和 Table2 的主键位于同一列.

推荐答案

检查这个 MSDN 文章 指出:

隔离级别具有连接范围的范围,并且一旦设置为与 SET TRANSACTION ISOLATION LEVEL 语句连接,它在连接关闭或其他隔离之前一直有效级别已设置.当连接关闭并返回池时,上一个 SET TRANSACTION ISOLATION LEVEL 的隔离级别声明被保留.重用池的后续连接连接使用当时有效的隔离级别连接是池化的.

An isolation level has connection-wide scope, and once set for a connection with the SET TRANSACTION ISOLATION LEVEL statement, it remains in effect until the connection is closed or another isolation level is set. When a connection is closed and returned to the pool, the isolation level from the last SET TRANSACTION ISOLATION LEVEL statement is retained. Subsequent connections reusing a pooled connection use the isolation level that was in effect at the time the connection is pooled.

问题是使用 Serializable 隔离级别打开连接;关联的事务被释放,连接也是如此,但连接没有被破坏并进入连接池.下一次(使用相同的连接字符串)发出连接请求时,该连接将被返回,并且由于查询未指定任何隔离级别,因此它在 Serializable 隔离级别中执行.

The issue was that a connection gets opened with Serializable isolation level; the associated transaction was disposed and so was the connection but the connection was not destroyed and went to the connection pool. Next time when a request for a connection was made (with same connection string) this very connection was getting returned and since the query did not specify any isolation level, it was executing in the Serializable isolation level.

基本上,如果您有一个连接池并以特定的隔离级别打开一个连接,比如说 Serializable,那么该连接将返回到隔离级别设置为 Serializable 的池.下次您请求连接时,您无法确定不会返回此连接,因此即使默认隔离级别为 ReadCommitted,您也可能会获得这些可序列化"连接之一.

Basically, if you have a connection pool and open a connection in a particular isolation level, let's say Serializable, then the connection will go back to the pool with the isolation level set to Serializable. Next time you request a connection, you can't be sure that this connection will not be returned so even through the default isolation level is ReadCommitted you may get one of these "Serializable" connections.

另一个需要注意的是,每次将隔离级别设置为可序列化(或其他任何东西)时,您可能会选择不同的连接,并且通过将隔离级别设置为可序列化,您可能会慢慢污染连接池中越来越多的连接(或任何你设置的).

Another caveat is that each time you set the isolation level to Serializable (or anything else for that matter) you may be picking different connections and slowly you may pollute more and more connections in the connection pool by setting their isolation level to Serializable (or whatever you set).

我没有找到任何机制来重置处理连接(当它在执行我的查询后返回连接池时).一种解决方法是显式重置每个连接的隔离级别.但这很乏味.

I didn't find any mechanism to reset the disposing connection (when it was going back to the connection pool after executing my query). One workaround is to explicitly reset the isolation level for each connection. But this is tedious.

所以最好的选择是为不同的隔离级别创建单独的连接池.

相关文章