我什么时候应该使用 CROSS APPLY 而不是 INNER JOIN?

2021-12-01 00:00:00 performance sql tsql sql-server cross-apply

使用CROSS APPLY的主要目的是什么?>

我已经读到(模糊地,通过 Internet 上的帖子)cross apply 在选择大型数据集时会更有效,如果您进行分区.(想到分页)

我也知道 CROSS APPLY 不需要 UDF 作为右表.

在大多数 INNER JOIN 查询(一对多关系)中,我可以重写它们以使用 CROSS APPLY,但它们总是给我等效的执行计划.

谁能给我一个很好的例子,说明 CROSS APPLY 在哪些情况下 INNER JOIN 也能起作用?

<小时>

这是一个简单的例子,其中的执行计划完全相同.(告诉我它们的不同之处以及 cross apply 更快/更有效的地方)

创建表公司(companyId int 身份(1,1), 公司名称 varchar(100), 邮政编码 varchar(10), 约束 PK_Company 主键 (companyId))走创建表人(personId int 身份(1,1), personName varchar(100), companyId int, 约束 FK_Person_CompanyId 外键 (companyId) 引用 dbo.Company(companyId), 约束PK_Person主键(personId))走插入公司选择ABC 公司"、19808"工会选择XYZ 公司"、08534"工会选择123 公司"、10016"插入人选择艾伦",1 个联合选择鲍比",1 个工会选择克里斯",1 个工会选择 'Xavier', 2 union选择 'Yoshi', 2 union选择 'Zambrano', 2 union选择玩家 1"、3 联盟选择'玩家 2', 3 联盟选择'玩家 3', 3/* 使用交叉应用 */选择 *来自人 p交叉申请(选择 *来自 c 公司其中 p.companyid = c.companyId) 压缩包/* 使用 INNER JOIN 的等效查询 */选择 *来自人 pp.companyid = c.companyId 上的内部连接公司 c

解决方案

任何人都可以举一个很好的例子来说明当 CROSS APPLY 在那些 INNER JOIN 也可以工作的情况下有所作为时吗?

详细性能对比见我博客文章:

  • 内连接CROSS APPLY

CROSS APPLY 在没有简单 JOIN 条件的情况下效果更好.

这个从t2中为t1中的每条记录选择3最后一条记录:

SELECT t1.*, t2o.*从 t1交叉申请(选择前 3 个 *从 t2哪里 t2.t1_id = t1.id订购者t2.rank DESC) t2o

它不能用 INNER JOIN 条件轻松表述.

您可能可以使用 CTE 和窗口函数来做类似的事情:

with t2o AS(SELECT t2.*, ROW_NUMBER() OVER (PARTITION BY t1_id ORDER BY rank) AS rn从 t2)选择 t1.*, t2o.*从 t1内部联接t2oON t2o.t1_id = t1.idAND t2o.rn <= 3

,但这样可读性较差,效率也可能较低.

更新:

刚刚检查过.

master 是一个包含 20,000,000 条记录的表格,PRIMARY KEY 位于 id.

这个查询:

WITH q AS(SELECT *, ROW_NUMBER() OVER (ORDER BY id) AS rn从主),吨AS(选择 1 作为 ID联合所有选择 2)选择  *从T加入qON q.rn <= t.id

运行几乎 30 秒,而这个:

WITH t AS(选择 1 作为 ID联合所有选择 2)选择  *从T交叉申请(SELECT TOP (t.id) m.*发件人订购者ID) q

是即时的.

What is the main purpose of using CROSS APPLY?

I have read (vaguely, through posts on the Internet) that cross apply can be more efficient when selecting over large data sets if you are partitioning. (Paging comes to mind)

I also know that CROSS APPLY doesn't require a UDF as the right-table.

In most INNER JOIN queries (one-to-many relationships), I could rewrite them to use CROSS APPLY, but they always give me equivalent execution plans.

Can anyone give me a good example of when CROSS APPLY makes a difference in those cases where INNER JOIN will work as well?


Edit:

Here's a trivial example, where the execution plans are exactly the same. (Show me one where they differ and where cross apply is faster/more efficient)

create table Company (
    companyId int identity(1,1)
,   companyName varchar(100)
,   zipcode varchar(10) 
,   constraint PK_Company primary key (companyId)
)
GO

create table Person (
    personId int identity(1,1)
,   personName varchar(100)
,   companyId int
,   constraint FK_Person_CompanyId foreign key (companyId) references dbo.Company(companyId)
,   constraint PK_Person primary key (personId)
)
GO

insert Company
select 'ABC Company', '19808' union
select 'XYZ Company', '08534' union
select '123 Company', '10016'


insert Person
select 'Alan', 1 union
select 'Bobby', 1 union
select 'Chris', 1 union
select 'Xavier', 2 union
select 'Yoshi', 2 union
select 'Zambrano', 2 union
select 'Player 1', 3 union
select 'Player 2', 3 union
select 'Player 3', 3 


/* using CROSS APPLY */
select *
from Person p
cross apply (
    select *
    from Company c
    where p.companyid = c.companyId
) Czip

/* the equivalent query using INNER JOIN */
select *
from Person p
inner join Company c on p.companyid = c.companyId

解决方案

Can anyone give me a good example of when CROSS APPLY makes a difference in those cases where INNER JOIN will work as well?

See the article in my blog for detailed performance comparison:

  • INNER JOIN vs. CROSS APPLY

CROSS APPLY works better on things that have no simple JOIN condition.

This one selects 3 last records from t2 for each record from t1:

SELECT  t1.*, t2o.*
FROM    t1
CROSS APPLY
        (
        SELECT  TOP 3 *
        FROM    t2
        WHERE   t2.t1_id = t1.id
        ORDER BY
                t2.rank DESC
        ) t2o

It cannot be easily formulated with an INNER JOIN condition.

You could probably do something like that using CTE's and window function:

WITH    t2o AS
        (
        SELECT  t2.*, ROW_NUMBER() OVER (PARTITION BY t1_id ORDER BY rank) AS rn
        FROM    t2
        )
SELECT  t1.*, t2o.*
FROM    t1
INNER JOIN
        t2o
ON      t2o.t1_id = t1.id
        AND t2o.rn <= 3

, but this is less readable and probably less efficient.

Update:

Just checked.

master is a table of about 20,000,000 records with a PRIMARY KEY on id.

This query:

WITH    q AS
        (
        SELECT  *, ROW_NUMBER() OVER (ORDER BY id) AS rn
        FROM    master
        ),
        t AS 
        (
        SELECT  1 AS id
        UNION ALL
        SELECT  2
        )
SELECT  *
FROM    t
JOIN    q
ON      q.rn <= t.id

runs for almost 30 seconds, while this one:

WITH    t AS 
        (
        SELECT  1 AS id
        UNION ALL
        SELECT  2
        )
SELECT  *
FROM    t
CROSS APPLY
        (
        SELECT  TOP (t.id) m.*
        FROM    master m
        ORDER BY
                id
        ) q

is instant.

相关文章