TSQL:有没有办法限制返回的行数并计算没有限制返回的总数(不将其添加到每一行)?

2021-09-10 00:00:00 sql tsql sql-server-2008 sql-server sqlclr

我正在更新一个当前最多选择 n 行的存储过程,如果返回的行数 = n,则执行没有限制的选择计数,然后返回原始选择和受影响的总行数.

I'm working to update a stored procedure that current selects up to n rows, if the rows returned = n, does a select count without the limit, and then returns the original select and the total impacted rows.

有点像:

SELECT TOP (@rowsToReturn)
    A.data1,
    A.data2
FROM
    mytable A

SET @maxRows = @@ROWCOUNT
IF @rowsToReturn = @@ROWCOUNT
BEGIN
  SET @maxRows = (SELECT COUNT(1) FROM mytableA)
END    

我想将其简化为单个选择语句.基于这个问题,COUNT(*) OVER() 允许这样做,但它被放在每一行而不是输出参数中.可能类似于 MYSQL 中的 FOUND_ROWS(),例如 @@TOTALROWCOUNT 等.

I'm wanting reduce this to a single select statement. Based on this question, COUNT(*) OVER() allows this, but it is put on every single row instead of in an output parameter. Maybe something like FOUND_ROWS() in MYSQL, such as a @@TOTALROWCOUNT or such.

作为旁注,由于实际的 select 有一个 order by,数据库将需要遍历整个集合(以确保它获得正确的前 n 个有序记录),因此数据库应该已经有了这个数一数.

As a side note, since the actual select has an order by, the data base will need to already traverse the entire set (to make sure that it gets the correct first n ordered records), so the database should already have this count somewhere.

推荐答案

正如@MartinSmith 在对这个问题的评论中提到的,没有直接(即纯 T-SQL)方法来获得总行数返回,同时限制它.过去我做过的方法是:

As @MartinSmith mentioned in a comment on this question, there is no direct (i.e. pure T-SQL) way of getting the total numbers of rows that would be returned while at the same time limiting it. In the past I have done the method of:

  • 将查询转储到临时表以获取 @@ROWCOUNT(总集)
  • 对主查询的排序结果使用ROW_NUBMER() AS [ResultID]
  • SELECT TOP (n) FROM #Temp ORDER BY [ResultID] 或类似的东西
  • dump the query to a temp table to grab @@ROWCOUNT (the total set)
  • use ROW_NUBMER() AS [ResultID] on the ordered results of the main query
  • SELECT TOP (n) FROM #Temp ORDER BY [ResultID] or something similar

当然,这里的缺点是将这些记录放入临时表时会产生磁盘 I/O 成本.将 [tempdb] 放在 SSD 上?:)

Of course, the downside here is that you have the disk I/O cost of getting those records into the temp table. Put [tempdb] on SSD? :)


我也经历过先用相同的查询其余部分运行 COUNT(*),然后运行常规 SELECT"方法(如@Blam 所提倡的),这不是免费"重新运行查询:

I have also experienced the "run COUNT(*) with the same rest of the query first, then run the regular SELECT" method (as advocated by @Blam), and it is not a "free" re-run of the query:

  • 在许多情况下,这是一次完整的重新运行.问题是在执行 COUNT(*) 时(因此不返回任何字段),优化器只需要考虑 JOIN、WHERE、GROUP BY、ORDER BY 子句方面的索引.但是,当您想要返回一些实际数据时,可能 会大大改变执行计划,尤其是当用于获取 COUNT(*) 的索引没有覆盖"SELECT 列表中的字段时.
  • 另一个问题是,即使索引都相同,因此所有数据页仍在缓存中,这也只是避免了物理读取.但是您仍然可以进行逻辑读取.
  • It is a full re-run in many cases. The issue is that when doing COUNT(*) (hence not returning any fields), the optimizer only needs to worry about indexes in terms of the JOIN, WHERE, GROUP BY, ORDER BY clauses. But when you want some actual data back, that could change the execution plan quite a bit, especially if the indexes used to get the COUNT(*) are not "covering" for the fields in the SELECT list.
  • The other issue is that even if the indexes are all the same and hence all of the data pages are still in cache, that just saves you from the physical reads. But you still have the logical reads.

我并不是说这种方法不起作用,但我认为问题中仅有条件地执行 COUNT(*) 的方法对系统的压力要小得多.

I'm not saying this method doesn't work, but I think the method in the Question that only does the COUNT(*) conditionally is far less stressful on the system.


@Gordon 提倡的方法实际上在功能上与我上面描述的临时表方法非常相似:它将完整的结果集转储到 [tempdb](INSERTED 表在 [tempdb] 中)得到完整的 @@ROWCOUNT 然后它得到一个子集.不利的一面是,INSTEAD OF TRIGGER 方法是:

The method advocated by @Gordon is actually functionally very similar to the temp table method I described above: it dumps the full result set to [tempdb] (the INSERTED table is in [tempdb]) to get the full @@ROWCOUNT and then it gets a subset. On the downside, the INSTEAD OF TRIGGER method is:

  • 需要设置更多的工作(如 10 倍 - 20 倍):您需要一个真实的表来表示每个不同的结果集,您需要一个触发器,触发器需要要么动态构建,要么从某个配置表中获取要返回的行数,或者我想它可以从 CONTEXT_INFO() 或临时表中获取.尽管如此,整个过程还是相当复杂的步骤.

  • a lot more work to set up (as in 10x - 20x more): you need a real table to represent each distinct result set, you need a trigger, the trigger needs to either be built dynamically, or get the number of rows to return from some config table, or I suppose it could get it from CONTEXT_INFO() or a temp table. Still, the whole process is quite a few steps and convoluted.

非常效率低下:首先,它做了相同数量的工作,将完整的结果集转储到一个表中(即到 INSERTED 表中——它位于[tempdb]) 但随后它执行了一个额外的步骤来选择所需的记录子集(这不是真正的问题,因为这应该仍然在缓冲池中)以返回到真正的表中.更糟糕的是,第二步实际上是双 I/O,因为操作也在该真实表所在的数据库的事务日志中表示.但是等等,还有更多:查询的下一次运行怎么样?你需要清除这张真实的桌子.无论是通过DELETE还是TRUNCATE TABLE,它都是在事务日志中显示的另一个操作(基于使用这两个操作中的哪一个的表示量),加上是额外操作花费的额外时间.并且,我们不要忘记从 INSERTED 中选择子集到真实表中的步骤:它没有机会使用索引,因为您无法索引 INSERTEDDELETED 表.并不是说您总是想向临时表添加索引,但有时它会有所帮助(取决于具体情况)并且您至少可以选择.

very inefficient: first it does the same amount of work dumping the full result set to a table (i.e. into the INSERTED table--which lives in [tempdb]) but then it does an additional step of selecting the desired subset of records (not really a problem as this should still be in the buffer pool) to go back into the real table. What's worse is that second step is actually double I/O as the operation is also represented in the transaction log for the database where that real table exists. But wait, there's more: what about the next run of the query? You need to clear out this real table. Whether via DELETE or TRUNCATE TABLE, it is another operation that shows up (the amount of representation based on which of those two operations is used) in the transaction log, plus is additional time spent on the additional operation. AND, let's not forget about the step that selects the subset out of INSERTED into the real table: it doesn't have the opportunity to use an index since you can't index the INSERTED and DELETED tables. Not that you always would want to add an index to the temp table, but sometimes it helps (depending on the situation) and you at least have that choice.

过于复杂:当两个进程需要同时运行查询时会发生什么?如果它们共享相同的真实表以转储到最终输出中,然后从中选择,则需要添加另一列以区分 SPID.它可以是 @@SPID.或者它可以是在调用初始 INSERT 到实际表之前创建的 GUID(以便它可以通过 CONTEXT_INFO() 传递给 INSTEAD OF 触发器 或临时表).无论值是多少,一旦选择了最终输出,它将用于执行 DELETE 操作.如果不明显,这部分会影响之前项目符号中提出的性能问题:TRUNCATE TABLE 不能使用,因为它会清除整个表,留下 DELETE FROM dbo.RealTable WHERE ProcessID = @WhatID; 作为唯一的选项.

overly complicated: what happens when two processes need to run the query at the same time? If they are sharing the same real table to dump into and then select out of for the final output, then there needs to be another column added to distinguish between the SPIDs. It could be @@SPID. Or it could be a GUID created before the initial INSERT into the real table is called (so that it can be passed to the INSTEAD OF trigger via CONTEXT_INFO() or a temp table). Whatever the value is, it would then be used to do the DELETE operation once the final output has been selected. And if not obvious, this part influences a performance issue brought up in the prior bullet: TRUNCATE TABLE cannot be used as it clears the entire table, leaving DELETE FROM dbo.RealTable WHERE ProcessID = @WhateverID; as the only option.

现在,公平地说,可能从触发器本身内部执行最终的 SELECT.这将减少一些低效率,因为数据永远不会进入真正的表,然后也永远不需要删除.它还减少了过度复杂化,因为不需要通过 SPID 分离数据.然而,这是一个非常有时间限制的解决方案,因为在 SQL Server 的下一个版本中,从触发器中返回结果的能力将被淘汰,禁止来自触发器服务器配置选项的结果:

Now, to be fair, it is possible to do the final SELECT from within the trigger itself. This would reduce some of the inefficiency as the data never makes it into the real table and then also never needs to be deleted. It also reduces the over-complication as there should be no need to separate the data by SPID. However, this is a very time-limited solution as the ability to return results from within a trigger is going bye-bye in the next release of SQL Server, so sayeth the MSDN page for the disallow results from triggers Server Configuration Option:

此功能将在 Microsoft SQL Server 的下一版本中删除.请勿在新的开发工作中使用此功能,并尽快修改当前使用此功能的应用程序.我们建议您将此值设置为 1.

This feature will be removed in the next version of Microsoft SQL Server. Do not use this feature in new development work, and modify applications that currently use this feature as soon as possible. We recommend that you set this value to 1.


唯一可行的方法:

  • 查询一次
  • 获取行的子集
  • 仍然获得完整结果集的总行数

是使用.Net.如果 procs 是从应用程序代码调用的,请参阅底部的编辑 2".如果您希望能够通过即席查询随机运行各种存储过程,那么它必须是一个 SQLCLR 存储过程,以便它可以是通用的并且适用于任何查询,因为存储过程可以返回动态结果集而函数则不能.proc 至少需要 3 个参数:

is to use .Net. If the procs are being called from app code, please see "EDIT 2" at the bottom. If you want to be able to randomly run various stored procedures via ad hoc queries, then it would have to be a SQLCLR stored procedure so that it could be generic and work for any query as stored procedures can return dynamic result sets and functions cannot. The proc would need at least 3 parameters:

  • @QueryToExec NVARCHAR(MAX)
  • @RowsToReturn INT
  • @TotalRows INT 输出

想法是使用Context Connection = true;"利用内部/进程内连接.然后执行以下基本步骤:

The idea is to use "Context Connection = true;" to make use of the internal / in-process connection. You then do these basic steps:

  1. 调用ExecuteDataReader()
  2. 在读取任何行之前,先执行一个 GetSchemaTable()
  3. 从 SchemaTable 获得结果集字段名称和数据类型
  4. 从结果集结构中构造一个 SqlDataRecord
  5. 用那个 SqlDataRecord 你调用 SqlContext.Pipe.SendResultsStart(_DataRecord)
  6. 现在你开始调用 Reader.Read()
  7. 对于您调用的每一行:
  1. call ExecuteDataReader()
  2. before you read any rows, do a GetSchemaTable()
  3. from the SchemaTable you get the result set field names and datatypes
  4. from the result set structure you construct a SqlDataRecord
  5. with that SqlDataRecord you call SqlContext.Pipe.SendResultsStart(_DataRecord)
  6. now you start calling Reader.Read()
  7. for each row you call:
  1. Reader.GetValues()
  2. DataRecord.SetValues()
  3. SqlContext.Pipe.SendResultRow(_DataRecord)
  4. RowCounter++

  • 不是执行典型的while (Reader.Read())",而是包含@RowsToReturn 参数:while(Reader.Read() && RowCounter < RowsToReturn.Value)
  • 在while循环之后,调用SqlContext.Pipe.SendResultsEnd()关闭结果集(你正在发送的,不是你正在阅读的)
  • 然后执行第二个 while 循环,循环遍历结果的其余部分,但永远不会获取任何字段:而 (Reader.Read()){行计数器++;}
  • 然后只需设置 TotalRows = RowCounter; 它将传回完整结果集的行数,即使您只返回了它的前 n 行:)
  • Rather than doing the typical "while (Reader.Read())", you instead include the @RowsToReturn param: while(Reader.Read() && RowCounter < RowsToReturn.Value)
  • After that while loop, call SqlContext.Pipe.SendResultsEnd() to close the result set (the one that you are sending, not the one you are reading)
  • then do a second while loop that cycles through the rest of the result, but never gets any of the fields: while (Reader.Read()) { RowCounter++; }
  • then just set TotalRows = RowCounter; which will pass back the number of rows for the full result set, even though you only returned the top n rows of it :)
  • 不确定这对临时表方法、双重调用方法,甚至@M.Ali 的方法(我也尝试过并且有点像,但问题特定于not将值作为列发送),但它应该没问题并且确实按要求完成了任务.

    Not sure how this performs against the temp table method, the dual call method, or even @M.Ali's method (which I have also tried and kinda like, but the question was specific to not sending the value as a column), but it should be fine and does accomplish the task as requested.


    甚至更好!另一种选择(上述 C# 建议的变体)是使用 T-SQL 存储过程中的 @@ROWCOUNT,作为 OUTPUT 参数发送,而不是循环SqlDataReader 中的其余行.所以存储过程类似于:


    Even better! Another option (a variation on the above C# suggestion) is to use the @@ROWCOUNT from the T-SQL stored procedure, sent as an OUTPUT parameter, rather than cycling through the rest of the rows in the SqlDataReader. So the stored procedure would be similar to:

    CREATE PROCEDURE SchemaName.ProcName
    (
       @Param1 INT,
       @Param2 VARCHAR(05),
       @RowCount INT OUTPUT = -1 -- default so it doesn't have to be passed in
    )
    AS
    SET NOCOUNT ON;
    
    {any ol' query}
    
    SET @RowCount = @@ROWCOUNT;
    

    然后,在应用程序代码中,为@RowCount"创建一个新的 SqlParameter,Direction = Output.上面的编号步骤保持不变,除了最后两个(10 和 11)更改为:

    Then, in the app code, create a new SqlParameter, Direction = Output, for "@RowCount". The numbered steps above stay the same, except the last two (10 and 11), which change to:

    1. 只需调用 Reader.Close()
    2. 而不是第二个 while 循环
    3. 不要使用 RowCounter 变量,而是设置 TotalRows = (int)RowCountOutputParam.Value;

    我已经试过了,它确实有效.但到目前为止,我还没有时间测试其他方法的性能.

    I have tried this and it does work. But so far I have not had time to test the performance against the other methods.

    编辑 2:
    如果从应用层调用 T-SQL 存储过程(即不需要临时执行),那么这实际上是上述 C# 方法的更简单的变体.在这种情况下,您无需担心 SqlDataRecordSqlContext.Pipe 方法.假设您已经设置了一个 SqlDataReader 来拉回结果,您只需要:

    EDIT 2:
    If the T-SQL stored procs are being called from the app layer (i.e. no need for ad hoc execution) then this is actually a much simpler variation of the above C# methods. In this case you don't need to worry about the SqlDataRecord or the SqlContext.Pipe methods. Assuming you already have a SqlDataReader set up to pull back the results, you just need to:

    1. 确保 T-SQL 存储过程具有 @RowCount INT OUTPUT = -1 参数
    2. 确保在查询后立即SET @RowCount = @@ROWCOUNT;
    3. 将 OUTPUT 参数注册为具有方向 = 输出的 SqlParameter
    4. 使用类似于:while(Reader.Read() && RowCounter < RowsToReturn) 的循环,这样您就可以在撤回所需数量后停止检索结果.立>
    5. 记住不要限制存储过程中的结果(即没有TOP (n))
    1. Make sure the T-SQL stored proc has a @RowCount INT OUTPUT = -1 parameter
    2. Make sure to SET @RowCount = @@ROWCOUNT; immediately after the query
    3. Register the OUTPUT param as a SqlParameter having Direction = Output
    4. Use a loop similar to: while(Reader.Read() && RowCounter < RowsToReturn) so that you can stop retrieving results once you have pulled back the desired amount.
    5. Remember to not limit the result in the stored proc (i.e. no TOP (n))

    此时,就像上面第一个编辑"中提到的那样,只需关闭 SqlDataReader 并获取 OUTPUT 参数的 .Value :).

    At that point, just like what was mentioned in the first "EDIT" above, just close the SqlDataReader and grab the .Value of the OUTPUT param :).

    相关文章