What advantages does a query that uses a derived table have over one that doesn't?

2021-09-10 00:00:00 sql tsql sql-server

I know how derived tables are used, but I still can't see any real advantage to using them.

For example, in the article http://techahead.wordpress.com/2007/10/01/sql-derived-tables/ the author tries to show the benefits of a query using a derived table over one without, using an example where we want a report showing the total number of orders each customer placed in 1996. The result set should include all customers, including those who placed no orders that year and those who have never placed any orders at all (he's using the Northwind database).

But when I compare the two queries, I fail to see any advantage in the one using a derived table (if nothing else, the derived table doesn't appear to simplify the code, at least not in this example):

Regular query:

SELECT C.CustomerID, C.CompanyName, COUNT(O.OrderID) AS TotalOrders
FROM Customers C LEFT OUTER JOIN Orders O ON
       C.CustomerID = O.CustomerID AND YEAR(O.OrderDate) = 1996
GROUP BY C.CustomerID, C.CompanyName

Query using a derived table:

SELECT C.CustomerID, C.CompanyName, COUNT(dOrders.OrderID) AS TotalOrders
FROM Customers C LEFT OUTER JOIN
        (SELECT * FROM Orders WHERE YEAR(Orders.OrderDate) = 1996) AS dOrders
     ON
        C.CustomerID = dOrders.CustomerID
GROUP BY C.CustomerID, C.CompanyName

Perhaps this just wasn't a good example, so could you show me one where the benefits of a derived table are more obvious?

Thanks

Reply to GBN:

In this case, you couldn't capture both products and order aggregates if there is no relation between Customers and Products.

Could you elaborate on what exactly you mean? Wouldn't the following query produce the same result set as your query:

SELECT 
     C.CustomerID, C.CompanyName,
     COUNT(O.OrderID) AS TotalOrders,
     COUNT(DISTINCT P.ProductID) AS DifferentProducts 
FROM Customers C LEFT OUTER JOIN Orders O ON
       C.CustomerID = O.CustomerID AND YEAR(O.OrderDate) = 1996
   LEFT OUTER JOIN Products P ON 
       O.somethingID = P.somethingID  
GROUP BY C.CustomerID, C.CompanyName

Reply to Cade Roux:

In addition, if expressions are used to derive columns from derived columns with a lot of shared intermediate calculations, a set of nested derived tables or stacked CTEs is the only way to do it:

SELECT x, y, z1, z2
FROM (
    SELECT *
           ,x + y AS z1
           ,x - y AS z2
    FROM (
        SELECT x, x * 2 AS y  -- x must be exposed for the outer levels to use it
        FROM A
    ) AS A
) AS A

Wouldn't the following query produce the same result as your query above:

SELECT x, x * 2 AS y, x + x*2 AS z1, x - x*2 AS z2
FROM A

Recommended answer

In your examples, the derived table is not strictly necessary. There are numerous cases where you might need to join to an aggregate or similar, and a derived table is really the only way to handle that:

SELECT *
FROM A
LEFT JOIN (
    SELECT x, SUM(y) AS total_y  -- every derived-table column needs a name
    FROM B
    GROUP BY x
) AS B
    ON B.x = A.x
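
To make the "join to an aggregate" case concrete, here is a minimal sketch of a situation where pre-aggregating in a derived table changes the answer rather than just the layout. It uses Python's built-in sqlite3 module as a stand-in for SQL Server, and the tables A, B, C, their columns, and the sample rows are all hypothetical. When A is joined to two independent child tables in one flat query, the rows multiply before SUM runs, so the totals are inflated; joining to one aggregated derived table per child avoids that.

```python
import sqlite3

# Hypothetical sample data: x = 1 has two rows in B and three rows in C,
# x = 2 has no child rows at all.
SETUP = """
CREATE TABLE A (x INTEGER PRIMARY KEY);
CREATE TABLE B (x INTEGER, y INTEGER);
CREATE TABLE C (x INTEGER, w INTEGER);
INSERT INTO A VALUES (1), (2);
INSERT INTO B VALUES (1, 10), (1, 20);
INSERT INTO C VALUES (1, 5), (1, 7), (1, 9);
"""

# Flat version: B and C rows are cross-multiplied per x before aggregation.
FLAT = """
SELECT A.x, SUM(B.y) AS total_y, SUM(C.w) AS total_w
FROM A
LEFT JOIN B ON B.x = A.x
LEFT JOIN C ON C.x = A.x
GROUP BY A.x
ORDER BY A.x
"""

# Derived-table version: each child is reduced to one row per x first.
DERIVED = """
SELECT A.x, B_tot.total_y, C_tot.total_w
FROM A
LEFT JOIN (SELECT x, SUM(y) AS total_y FROM B GROUP BY x) AS B_tot
    ON B_tot.x = A.x
LEFT JOIN (SELECT x, SUM(w) AS total_w FROM C GROUP BY x) AS C_tot
    ON C_tot.x = A.x
ORDER BY A.x
"""


def run(sql):
    """Run one query against a fresh in-memory copy of the sample data."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SETUP)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(run(FLAT))     # [(1, 90, 42), (2, None, None)]  inflated totals
    print(run(DERIVED))  # [(1, 30, 21), (2, None, None)]  correct totals
```

With this data the flat query reports 90 and 42 for x = 1, because each B row is repeated three times and each C row twice, while the derived-table query reports the true sums 30 and 21.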

In addition, if expressions are used to derive columns from derived columns with a lot of shared intermediate calculations, a set of nested derived tables or stacked CTEs is the only way to do it:

SELECT x, y, z1, z2
FROM (
    SELECT *
           ,x + y AS z1
           ,x - y AS z2
    FROM (
        SELECT x, x * 2 AS y  -- x must be exposed for the outer levels to use it
        FROM A
    ) AS A
) AS A

As far as maintainability goes, stacked CTEs or derived tables (they are basically equivalent) can make for more readable and maintainable code, and they make cut-and-paste reuse and refactoring easier. The optimizer can typically flatten them very easily.

I typically use stacked CTEs instead of nesting for slightly better readability (same two examples):

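WITH B_totals AS (                  -- named differently from table B so the CTE does not reference itself
    SELECT x, SUM(y) AS total_y     -- every CTE column needs a name
    FROM B
    GROUP BY x
)
SELECT *
FROM A
LEFT JOIN B_totals AS B
    ON B.x = A.x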

WITH A1 AS (
    SELECT x, x * 2 AS y  -- x must be exposed for A2 to use it
    FROM A
)
,A2 AS (
    SELECT *
           ,x + y AS z1
           ,x - y AS z2
    FROM A1
)
SELECT x, y, z1, z2
FROM A2

Regarding your question:

SELECT x, x * 2 AS y, x + x*2 AS z1, x - x*2 AS z2 
FROM A 

This has the x * 2 code repeated 3 times. If this business rule needs to change, it has to change in 3 places, which is a recipe for introducing defects. The problem gets compounded any time you have intermediate calculations that need to be consistent and defined in only one place.
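
The point about defining the rule in one place can be sketched as follows. This is an illustration only: it uses Python's built-in sqlite3 module as a stand-in for SQL Server, and the sample values in table A and the helper name `run` are hypothetical. Both queries return the same rows, but the stacked version contains the x * 2 rule once and the flat version contains it three times.

```python
import sqlite3

# Stacked CTEs: the intermediate value y is defined once and reused.
STACKED = """
WITH A1 AS (
    SELECT x, x * 2 AS y      -- the business rule lives only here
    FROM A
),
A2 AS (
    SELECT x, y, x + y AS z1, x - y AS z2
    FROM A1
)
SELECT x, y, z1, z2
FROM A2
ORDER BY x
"""

# Flat query: the same rule is spelled out three times.
FLAT = """
SELECT x, x * 2 AS y, x + x * 2 AS z1, x - x * 2 AS z2
FROM A
ORDER BY x
"""


def run(sql, values=(1, 2, 3)):
    """Run one query against a fresh in-memory table A holding `values`."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE A (x INTEGER)")
        conn.executemany("INSERT INTO A VALUES (?)", [(v,) for v in values])
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(run(STACKED))  # [(1, 2, 3, -1), (2, 4, 6, -2), (3, 6, 9, -3)]
    print(run(FLAT))     # same rows
    # Changing the rule touches one spot in the stacked query, three in the flat one.
    print(STACKED.count("x * 2"), FLAT.count("x * 2"))  # 1 3
```

Changing the rule to, say, x * 3 means editing a single expression in the stacked query, and z1 and z2 follow automatically; in the flat query all three occurrences have to be found and kept in sync.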

This would not be as much of a problem if SQL Server's scalar user-defined functions could be inlined (or if they performed acceptably): you could simply build your UDFs to stack your results, and the optimizer would eliminate redundant calls. Unfortunately, SQL Server's scalar UDF implementation does not handle large sets of rows well.
