Using SQL Server as a DB queue with multiple clients

2021-12-28 00:00:00 sql database concurrency sql-server

Given a table that is acting as a queue, how can I best configure the table and queries so that multiple clients can process from the queue concurrently?

For example, the table below lists commands that a worker must process. When a worker is done with a command, it sets that row's PROCESSED value to true.

| ID | COMMAND | PROCESSED |
|  1 | ...     | true      |
|  2 | ...     | false     |
|  3 | ...     | false     |

The clients might obtain one command to work on like so:

-- PROCESSED is assumed to be a BIT column (0 = false); T-SQL has no FALSE literal
select top 1 COMMAND
from EXAMPLE_TABLE
with (UPDLOCK, ROWLOCK)
where PROCESSED = 0;

However, if there are multiple workers, each one tries to get the row with ID=2. Only the first gets the pessimistic lock; the rest wait. Then one of them gets row 3, and so on.

What query or configuration would allow each worker client to get a different row and work on them concurrently?

Several answers suggest variations on using the table itself to record an in-process state. I thought that this would not be possible within a single transaction (i.e., what's the point of updating the state if no other worker will see it until the transaction is committed?). Perhaps the suggestion is:

# start transaction
update to 'processing'
# end transaction
# start transaction
process the command
update to 'processed'
# end transaction

Is this the way people usually approach this problem? It seems to me that the problem would be better handled by the DB, if possible.

Recommended answer

I recommend you go over Using Tables as Queues. Properly implemented queues can handle thousands of concurrent users and service as many as half a million enqueue/dequeue operations per minute. Until SQL Server 2005 the solution was cumbersome: it involved mixing a SELECT and an UPDATE in a single transaction and giving just the right mix of lock hints, as in the article linked by gbn. Luckily, since SQL Server 2005 and the advent of the OUTPUT clause, a much more elegant solution is available, and MSDN now recommends using the OUTPUT clause:

You can use OUTPUT in applications that use tables as queues, or to hold intermediate result sets. That is, the application is constantly adding or removing rows from the table.
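To illustrate the "removing rows" case in the quote, a destructive dequeue can be sketched with DELETE ... OUTPUT. This is only a sketch under assumptions: it reuses EXAMPLE_TABLE and its columns from the question, and it assumes a command may be discarded as soon as a worker has claimed it, which the question does not state.

```sql
-- Sketch only: removes one unlocked row and returns it to the caller
-- in a single atomic statement.
-- Assumption: deleting the row (instead of flagging it as processed)
-- is acceptable for this queue.
DELETE TOP (1)
FROM EXAMPLE_TABLE WITH (ROWLOCK, READPAST)
OUTPUT DELETED.ID, DELETED.COMMAND;
```

If the processed rows have to be kept, as in the question, the UPDATE-based form is the one to use.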

Basically, there are three parts of the puzzle you need to get right for this to work in a highly concurrent manner:

  1. You need to dequeue atomically. You have to find the row, skip any locked rows, and mark it as 'dequeued' in a single, atomic operation. This is where the OUTPUT clause comes into play:

    -- EXAMPLE_TABLE is the table from the question (TABLE is a reserved word)
    WITH CTE AS (
      SELECT TOP(1) COMMAND, PROCESSED
      FROM EXAMPLE_TABLE WITH (READPAST)
      WHERE PROCESSED = 0)
    UPDATE CTE
      SET PROCESSED = 1
      OUTPUT INSERTED.*;

  2. You must structure your table with the PROCESSED column as the leftmost clustered index key. If ID was used as the primary key, move it to be the second column in the clustered key. Whether to keep a non-clustered key on the ID column is open to debate, but I strongly favor not having any secondary non-clustered indexes on queues:

    CREATE CLUSTERED INDEX cdxTable ON EXAMPLE_TABLE (PROCESSED, ID);

  3. You must not query this table by any means other than Dequeue. Trying to do Peek operations, or trying to use the table both as a queue and as a store, will very likely lead to deadlocks and will slow down throughput dramatically.
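Putting the three parts together, a minimal end-to-end sketch might look like the following. The column types, the DEFAULT constraint, the procedure name `dbo.usp_Dequeue`, and the `ORDER BY ID` (added to make the dequeue order explicit) are assumptions for illustration and are not part of the original answer. `GO` is the batch separator used by tools such as SSMS and sqlcmd, not a T-SQL statement.

```sql
-- Sketch only; names and types not in the original are hypothetical.
CREATE TABLE EXAMPLE_TABLE (
    ID        BIGINT IDENTITY(1,1) NOT NULL,
    COMMAND   NVARCHAR(4000)       NOT NULL,
    PROCESSED BIT                  NOT NULL DEFAULT 0
);
GO

-- Part 2: PROCESSED is the leftmost clustered key, ID comes second.
CREATE CLUSTERED INDEX cdxTable ON EXAMPLE_TABLE (PROCESSED, ID);
GO

-- Part 1: atomic dequeue that skips rows locked by other workers.
CREATE PROCEDURE dbo.usp_Dequeue
AS
BEGIN
    SET NOCOUNT ON;
    WITH CTE AS (
        SELECT TOP (1) ID, COMMAND, PROCESSED
        FROM EXAMPLE_TABLE WITH (READPAST)
        WHERE PROCESSED = 0
        ORDER BY ID)               -- assumption: oldest ID first
    UPDATE CTE
        SET PROCESSED = 1
        OUTPUT INSERTED.ID, INSERTED.COMMAND;
END;
GO

-- Enqueue is a plain insert; PROCESSED defaults to 0.
INSERT INTO EXAMPLE_TABLE (COMMAND) VALUES (N'first command');

-- Part 3: workers touch the table only through the dequeue procedure.
EXEC dbo.usp_Dequeue;
```

Wrapping the statement in a procedure is one way to make sure that dequeue is the only access path workers use.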

The combination of an atomic dequeue, the READPAST hint when searching for elements to dequeue, and the processing bit as the leftmost key of the clustered index ensures very high throughput under a highly concurrent load.
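One way to observe the effect of READPAST is to run the dequeue from two connections at once. The sketch below assumes EXAMPLE_TABLE holds at least two unprocessed rows, as in the question, and that the two parts are run by hand in separate sessions; which row each session receives is not guaranteed.

```sql
-- Sketch only: run each part in its own connection (session).

-- Session 1: dequeue inside an open transaction, so the lock on the
-- claimed row is held until COMMIT.
BEGIN TRANSACTION;
WITH CTE AS (
    SELECT TOP (1) ID, COMMAND, PROCESSED
    FROM EXAMPLE_TABLE WITH (READPAST)
    WHERE PROCESSED = 0)
UPDATE CTE
    SET PROCESSED = 1
    OUTPUT INSERTED.ID, INSERTED.COMMAND;
-- The transaction is deliberately left open here.

-- Session 2: the same statement. READPAST skips the row locked by
-- session 1, so this should return a different row without waiting.
WITH CTE AS (
    SELECT TOP (1) ID, COMMAND, PROCESSED
    FROM EXAMPLE_TABLE WITH (READPAST)
    WHERE PROCESSED = 0)
UPDATE CTE
    SET PROCESSED = 1
    OUTPUT INSERTED.ID, INSERTED.COMMAND;

-- Session 1: release the lock.
COMMIT TRANSACTION;
```

Without the READPAST hint, session 2 would be expected to wait for session 1 to commit, which is the blocking behavior described in the question.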
