SQL Server 中用户定义的表类型的性能

我们一直在使用 用户定义的表类型 将整数列表传递给我们的存储过程.

We have been using User-Defined Table Types to pass a list of integers to our stored procedures.

然后我们使用这些来连接到存储过程查询中的其他表.

We then use these to join to other tables in our stored proc queries.

例如:

CREATE PROCEDURE [dbo].[sp_Name]
(
    @Ids [dbo].[OurTableType] READONLY  
)
AS
    SET Nocount ON

    SELECT
        *
    FROM
        SOMETABLE
        INNER JOIN @Ids [OurTableType] ON [OurTableType].Id = SOMETABLE.Id

我们发现在使用较大的数据集时性能非常差.

We have seen very poor performance from this when using larger datasets.

我们用来加快速度的一种方法是将内容转储到临时表中,然后将其连接起来.

One approach we've used to speed things up, is the dump the contents into a temp table and join off that instead.

例如:

CREATE PROCEDURE [dbo].[sp_Name]
(
    @Ids [dbo].[OurTableType] READONLY  
)
AS
    SET Nocount ON
    CREATE TABLE #TempTable(Id INT)
    INSERT INTO #TempTable
    SELECT Id from @Ids

    SELECT
        *
    FROM
        SOMETABLE
        INNER JOIN #TempTable ON #TempTable.Id = SOMETABLE.Id

    DROP TABLE #TempTable

这确实显着提高了性能,但我想就这种方法以及我们尚未考虑的任何其他后果获得一些意见.此外,解释为什么这会提高性能也可能很有用.

This does improve performance significantly, but I wanted to get some opinions on this approach and any other consequences we haven't considered. Also an explanation as to why this improves performance may also be useful.

注意有时我们可能需要传入的不仅仅是一个整数,因此我们为什么不使用逗号分隔的列表或类似的东西.

推荐答案

SQL Server 2019 和 SQL Azure

Microsoft 实施了一项名为 表变量延迟编译,很大程度上解决了SQL Server以前版本中表变量的性能问题:

Microsoft has implemented a new feature called Table Variable Deferred Compilation that largely resolves the performance issues with table variables in previous versions of SQL Server:

使用表变量延迟编译,引用表变量的语句的编译将延迟到该语句的第一次实际执行.这与临时表的行为相同,并且此更改导致使用实际基数而不是原始的单行猜测.

With table variable deferred compilation, compilation of a statement that references a table variable is deferred until the first actual execution of the statement. This is identical to the behavior of temporary tables, and this change results in the use of actual cardinality instead of the original one-row guess.

此行为开箱即用,无需选择加入.不幸的是,它仍然会受到 参数嗅探问题,但总的来说这是一个巨大的改进.

This behaviour is available and enabled out-of-the-box and requires no opt-in. Unfortunately it can still suffer from parameter sniffing issues, but overall it's a massive improvement.

SQL Server 2017 及更早版本

JOIN性能不佳的主要原因是表值参数(TVP)是一个表变量.表变量不保留统计信息,并且在查询优化器看来只有 1 行.因此,他们可以执行类似 INSERT INTO Table (column_list) SELECT column_list FROM @TVP; 但不是 JOIN 之类的操作.

The primary reason for the poor performance of the JOIN is that the Table-Valued Parameter (TVP) is a Table Variable. Table Variables do not keep statistics and appear to the Query Optimizer to only have 1 row. Hence they are just fine to do something like INSERT INTO Table (column_list) SELECT column_list FROM @TVP; but not a JOIN.

有一些事情可以尝试解决这个问题:

There are a few things to try to get around this:

  1. 将所有内容转储到本地临时表(您已经在这样做了).这里的一个技术缺点是您正在复制传递到 tempdb 中的 TVP 的数据(TVP 和 temp 表都存储它们的数据).

  1. Dump everything to a local temporary table (you are already doing this). A technical downside here is that you are duplicating the data passed into the TVP in tempdb (where both the TVP and temp table store their data).

尝试将用户定义的表类型定义为具有集群主键.您可以在 [Id] 字段中内联执行此操作:

Try defining the User-Defined Table Type to have a Clustered Primary Key. You can do this inline on the [Id] field:

[ID] INT NOT NULL PRIMARY KEY

不确定这对性能有多大帮助,但值得一试.

Not sure how much this helps performance, but worth a try.

OPTION (RECOMPILE) 添加到查询中.这是一种让查询优化器查看表变量中有多少行的方法,以便它可以进行正确的估计.

Add OPTION (RECOMPILE) to the query. This is a way of getting the Query Optimizer to see how many rows are in a Table Variable so that it can have proper estimates.

SELECT column_list
FROM   SOMETABLE
INNER JOIN @Ids [OurTableType]
        ON [OurTableType].Id = SOMETABLE.Id
OPTION (RECOMPILE);

这里的缺点是你有一个 RECOMPILE,每次调用这个 proc 时都会花费额外的时间.但这可能是整体净收益.

The downside here is that you have a RECOMPILE which takes additional time each time this proc is called. But that might be an overall net gain.

从 SQL Server 2014 开始,您可以利用内存中 OLTP 并为用户定义的表类型指定 WITH (MEMORY_OPTIMIZED = ON).请参阅 场景:表变量可以 MEMORY_OPTIMIZED=ON 了解详情.我听说这肯定有帮助.不幸的是,在 SQL Server 2014 和 SQL Server 2016 RTM 中,此功能仅在 64 位企业版中可用.但是,从 SQL Server 2016 SP1 开始,此功能可用于所有版本(SQL Server Express LocalDB 可能例外).

Starting in SQL Server 2014, you can take advantage of In-Memory OLTP and specify WITH (MEMORY_OPTIMIZED = ON) for the User-Defined Table Type. Please see Scenario: Table variable can be MEMORY_OPTIMIZED=ON for details. I have heard that this definitely helps. Unfortunately, in SQL Server 2014 and SQL Server 2016 RTM this feature is only available in 64-bit Enterprise Edition. But, starting with SQL Server 2016 SP1, this feature was made available to all editions (possible exception being SQL Server Express LocalDB).

PS.不要做 SELECT *.始终指定列列表.除非做类似 IF EXIST(SELECT * FROM)....

PS. Don't do SELECT *. Always specify a column list. Unless doing something like an IF EXIST(SELECT * FROM)....

相关文章