SQL Server 2005 中的原子 UPSERT

2021-11-30 00:00:00 atomic sql-server-2005 upsert sql-server

在 SQL Server 2005 中执行原子UPSERT"(存在时更新，否则插入)的正确模式是什么?

What is the correct pattern for doing an atomic "UPSERT" (UPDATE where exists, INSERT otherwise) in SQL Server 2005?

我在 SO 上看到很多代码(例如，请参阅检查是否一行存在，否则插入) 具有以下两部分模式:

I see a lot of code on SO (e.g. see Check if a row exists, otherwise insert) with the following two-part pattern:

UPDATE ... FROM ... WHERE <condition> -- race condition risk here IF @@ROWCOUNT = 0 INSERT ...

或

IF (SELECT COUNT(*) FROM ... WHERE <condition>) = 0 -- race condition risk here INSERT ... ELSE UPDATE ...

哪里<条件 > 将是对自然键的评估.上述方法似乎都不能很好地处理并发.如果我不能有两行具有相同的自然键，那么在竞争条件场景中，上述所有的行似乎都有插入具有相同自然键的行的风险.

where < condition > will be an evaluation of natural keys. None of the above approaches seem to deal well with concurrency. If I cannot have two rows with the same natural key, it seems like all of the above risk inserting rows with the same natural keys in race condition scenarios.

我一直在使用以下方法，但我很惊讶在人们的回复中没有看到它，所以我想知道它有什么问题:

I have been using the following approach but I'm surprised not to see it anywhere in people's responses so I'm wondering what is wrong with it:

INSERT INTO <table> SELECT <natural keys>, <other stuff...> FROM <table> WHERE NOT EXISTS -- race condition risk here? ( SELECT 1 FROM <table> WHERE <natural keys> ) UPDATE ... WHERE <natural keys>

请注意，这里提到的竞争条件与早期代码中的竞争条件不同.在较早的代码中，问题是幻读(在 UPDATE/IF 之间或另一个会话的 SELECT/INSERT 之间插入行).在上面的代码中，竞争条件与 DELETE 相关.在 (WHERE NOT EXISTS) 执行之后但在 INSERT 执行之前，另一个会话是否可以删除匹配的行?不清楚 WHERE NOT EXISTS 在哪里锁定与 UPDATE 相关的任何内容.

Note that the race condition mentioned here is a different one from the ones in the earlier code. In the earlier code, the issue was phantom reads (rows being inserted between the UPDATE/IF or between the SELECT/INSERT by another session). In the above code, the race condition has to do with DELETEs. Is it possible for a matching row to be deleted by another session AFTER the (WHERE NOT EXISTS) executes but before the INSERT executes? It's not clear where the WHERE NOT EXISTS puts a lock on anything in conjunction with the UPDATE.

这是原子的吗?我找不到在 SQL Server 文档中记录的位置.

Is this atomic? I can't locate where this would be documented in SQL Server documentation.

我意识到这可以通过事务来完成，但我认为我需要将事务级别设置为 SERIALIZABLE 以避免幻读问题?对于这样一个常见的问题，这肯定是矫枉过正吗?

I realise this could be done with transactions, but I think I would need to set the transaction level to SERIALIZABLE to avoid the phantom read problem? Surely that is overkill for such a common problem?

推荐答案

INSERT INTO <table> SELECT <natural keys>, <other stuff...> FROM <table> WHERE NOT EXISTS -- race condition risk here? ( SELECT 1 FROM <table> WHERE <natural keys> ) UPDATE ... WHERE <natural keys>

在第一个 INSERT 中存在竞争条件.该键在内部查询 SELECT 期间可能不存在，但在 INSERT 时确实存在，从而导致键冲突.
在 INSERT 和 UPDATE 之间存在竞争条件.在 INSERT 的内部查询中检查时，该键可能存在，但在 UPDATE 运行时已消失.

对于第二个竞争条件，人们可能会争辩说该键无论如何都会被并发线程删除，所以这并不是真正的丢失更新.

For the second race condition one could argue that the key would have been deleted anyway by the concurrent thread, so it's not really a lost update.

最佳解决方案通常是尝试最可能的情况，并在失败时处理错误(当然是在事务中):

The optimal solution is usually to try the most likely case, and handle the error if it fails (inside a transaction, of course):

如果密钥可能丢失，请始终先插入.处理违反唯一约束的情况，回退更新.
如果密钥可能存在，请始终先更新.如果未找到行，则插入.处理可能违反唯一约束的情况，回退更新.

除了正确性之外，这种模式也是速度的最佳选择:尝试插入和处理异常比进行虚假锁定更有效.锁定意味着逻辑页读取(可能意味着物理页读取)，IO(甚至逻辑)比 SEH 更昂贵.

Besides correctness, this pattern is also optimal for speed: is more efficient to try to insert and handle the exception than to do spurious lockups. Lockups mean logical page reads (which may mean physical page reads), and IO (even logical) is more expensive than SEH.

更新@Peter

为什么单个语句不是原子的"?假设我们有一个简单的表格:

Why isn't a single statement 'atomic'? Let's say we have a trivial table:

create table Test (id int primary key);

现在，如果我从两个线程运行这个单一的语句，在一个循环中，它将是原子的"，正如您所说，可以存在无竞争条件:

Now if I'd run this single statement from two threads, in a loop, it would be 'atomic', as you say, an no race condition can exist:

insert into Test (id) select top (1) id from Numbers n where not exists (select id from Test where id = n.id);

然而仅仅几秒钟，就发生了主键冲突:

Yet in only a couple of seconds, a primary key violation occurs:

消息 2627，级别 14，状态 1，第 4 行
违反 PRIMARY KEY 约束PK__Test__24927208".无法在对象dbo.Test"中插入重复键.

Msg 2627, Level 14, State 1, Line 4
Violation of PRIMARY KEY constraint 'PK__Test__24927208'. Cannot insert duplicate key in object 'dbo.Test'.

这是为什么?您是正确的，因为 SQL 查询计划将在 DELETE ... FROM ... JOIN、WITH cte AS (SELECT...FROM) DELETE FROM cte 上做正确的事" 以及许多其他情况.但是在这些情况下有一个关键的区别:子查询"指的是更新或删除操作的目标.对于这种情况，查询计划确实会使用适当的锁，实际上我这种行为在某些情况下很关键，例如在实现队列时使用表作为队列.

Why is that? You are correct in that SQL query plan will do the 'right thing' on DELETE ... FROM ... JOIN, on WITH cte AS (SELECT...FROM ) DELETE FROM cte and in many other cases. But there is a crucial difference in these cases: the 'subquery' refers to the target of an update or delete operation. For such cases the query plan will indeed use an appropriate lock, in fact I this behavior is critical on certain cases, like when implementing queues Using tables as Queues.

但是在原始问题以及我的示例中，查询优化器将子查询视为查询中的子查询，而不是需要特殊锁保护的特殊扫描更新"类型查询.结果是子查询查找的执行可以被并发观察者观察为不同的操作，从而破坏了语句的原子"行为.除非采取特殊的预防措施，否则多个线程可以尝试插入相同的值，同时确信它们已经检查过并且该值不存在.只有一个可以成功，另一个会打到PK违规.QED.

But in the original question, as well as in my example, the subquery is seen by the query optimizer just as a subquery in a query, not as some special 'scan for update' type query that needs special lock protection. The result is that the execution of the subquery lookup can be observed as a distinct operation by a concurent observerver, thus breaking the 'atomic' behavior of the statement. Unless special precaution is taken, multiple threads can attempt to insert the same value, both convinced they had checked and the value doesn't already exists. Only one can succeed, the other will hit the PK violation. QED.

相关文章