TSQL: Using the replace function in a select with join

2021-09-10 00:00:00 sql select tsql sql-server

Background. I'm using SQL Server. I have two tables in the database:

Vendors(Id, Name, Description)
Products(Id, VendorId, Name, Description)

Values in the Id column of the Vendors table are formatted with the prefix 'ID_'.

Values in the VendorId column of the Products table are formatted with the prefix 'VE_'.

For example, 'VE_001245' in Products refers to 'ID_001245' in Vendors.

(Please do not propose changing this concept, do not worry about the database schema, and do not suggest adding a foreign key. All of this is just for illustration.)

Question: which of the following queries performs best, and why?

Using the replace function in the inner select:

select v.*
from Vendors v
inner join (
    select distinct replace(VendorId, 'VE_', 'ID_') as Id
    from Products
) list on v.Id = list.Id

Using the replace function in the on clause:

select v.*
from Vendors v
inner join (
    select distinct VendorId as Id
    from Products
) list on v.Id = replace(list.Id, 'VE_', 'ID_')

Edit. There is only a clustered index on each table (on the Id column). Each table can contain millions of rows.

Recommended answer

The two queries are almost the same in terms of performance. In the first query, sorting is done twice: once when selecting the distinct records and again when performing the inner join, after which a merge join produces the final result set. In the second query, sorting is done only once, but a hash join is performed, which is more expensive than a merge join. So the two queries perform about the same in the scenario where you don't have any index on the table.
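The plan shapes described above depend on the data and on the SQL Server version, so it may be worth measuring both queries on your own tables. The following is a minimal sketch, assuming only the Vendors and Products tables from the question; it turns on the I/O and timing statistics and runs the two queries back to back.

```sql
-- Sketch only: compare the two queries from the question on your own data.
-- Table and column names come from the question; nothing else is assumed.
SET STATISTICS IO ON;    -- report logical and physical reads per table
SET STATISTICS TIME ON;  -- report CPU and elapsed time per statement

-- Query 1: replace applied inside the derived table
select v.*
from Vendors v
inner join (
    select distinct replace(VendorId, 'VE_', 'ID_') as Id
    from Products
) list on v.Id = list.Id;

-- Query 2: replace applied in the join condition
select v.*
from Vendors v
inner join (
    select distinct VendorId as Id
    from Products
) list on v.Id = replace(list.Id, 'VE_', 'ID_');

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

The read counts and timings appear in the Messages tab of SQL Server Management Studio. Enabling "Include Actual Execution Plan" before running the batch should also show whether each query uses Sort, Merge Join, or Hash Match operators on your data.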
