Multiple SELECT queries in a single table using a WHILE loop: is it possible?

2021-09-24 · Tags: while-loop, with-statement, sql-server

I have two tables. Table A (DailyBookFile) has Date, ISBN (for the book), and Demand (the demand on that date). Table B (SalesRank) has Date, ISBN (for the book), and SalesRank.

DailyBookFile has about 150k records for each date going back to 2010, i.e. roughly 150k * 365 days * 8 years (about 438 million) rows. SalesRank is similar, with about 500k records for each date. Sample data:

DailyBookFile
Date        Isbn13          CurrentModifiedDemandTotal
20180122    9780955153075   13
20180122    9780805863567   9
20180122    9781138779396   1
20180122    9780029001516   9
20180122    9780470614150   42

SalesRank
importdate  ISBN13          SalesRank
20180122    9780029001516   69499
20180122    9780470614150   52879
20180122    9780805863567   832429
20180122    9780955153075   44528
20180122    9781138779396   926435

Required Output
Date        Avg_Rank    Book_Group
20180122    385154      Elite
20180121    351545      Elite
20180120    201545      Elite

For each day, I want the top 200 books by CurrentModifiedDemandTotal and the average SalesRank of those books.

I am unable to work out a solution as I am new to SQL.

I started by taking yesterday's top 200 books by CurrentModifiedDemandTotal and computing their average rank for each day over the last year:

SELECT DBF.Filedate AS [Date],
       AVG(AMA.SalesRank) AS Avg_Rank,
       'Elite' AS Book_Group 
FROM [ODS].[wholesale].[DailyBookFile] AS DBF
INNER JOIN [ODS].[MarketplaceMonitor].[SalesRank] AS AMA ON (DBF.Isbn13 = AMA.ISBN13
                                                        AND DBF.FileDate = AMA.importdate)
WHERE DBF.Isbn13 IN (SELECT TOP 200 Isbn13
                     FROM [ODS].[wholesale].[DailyBookFile]
                     WHERE FileDate = 20180122
                       AND CAST(CurrentModifiedDemandTotal AS int) > 200)
  AND DBF.Filedate > 20170101
GROUP BY DBF.Filedate;

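But the result is not what I want: that query follows one fixed list of 200 ISBNs, taken from a single date, across every day. What I need is the top 200 ISBNs by CurrentModifiedDemandTotal for each day, and their average rank on that day. I tried this: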

DECLARE @i int;
SET @i = 20180122;
WHILE (SELECT DISTINCT(DBF.Filedate)
       FROM [ODS].[wholesale].[DailyBookFile] AS DBF
       WHERE DBF.Filedate = @i) IS NOT NULL
BEGIN

    SELECT DBF.Filedate AS [Date],
           AVG(AMA.SalesRank) AS Avg_Rank,
           'Elite' AS Book_Group 
    FROM [ODS].[wholesale].[DailyBookFile] AS DBF
    INNER JOIN [ODS].[MarketplaceMonitor].[SalesRank] as AMA ON DBF.Isbn13 = AMA.ISBN13
                                                            AND DBF.FileDate = AMA.importdate
    WHERE DBF.Isbn13 in (SELECT TOP 200 Isbn13
                         FROM [ODS].[wholesale].[DailyBookFile]
                         WHERE FileDate = @i
                           AND CAST (CurrentModifiedDemandTotal AS int) > 500)
      AND DBF.Filedate = @i
    GROUP BY DBF.Filedate;

    SET @i = @i+1;

END

This returns a separate result set (one results window) for every day. Is there any way to get all the results in a single table?

P.S. The list of top 200 books changes every day according to CurrentModifiedDemandTotal. For each day, I want the average sales rank of that day's top 200 books.

Recommended answer

Instead of selecting in each iteration of the loop, you can insert the rows into a temp table (or a table variable) and select everything once the loop finishes:

IF OBJECT_ID('tempdb..#books') IS NOT NULL
BEGIN
    DROP TABLE #books;
END

CREATE TABLE #books (
    [Date] INT,
    [Avg_Rank] FLOAT,
    [Book_Group] VARCHAR(512)
);

DECLARE @i int;
SET @i = 20180122;  -- first FileDate to process (yyyymmdd)

BEGIN TRY
WHILE EXISTS (SELECT 1
              FROM [ODS].[wholesale].[DailyBookFile] AS DBF
              WHERE DBF.Filedate = @i)
BEGIN

    INSERT INTO #books (
        [Date],
        [Avg_Rank],
        [Book_Group]
    )
    SELECT DBF.Filedate AS [Date],
        AVG(AMA.SalesRank) AS Avg_Rank,
        'Elite' AS Book_Group
    FROM [ODS].[wholesale].[DailyBookFile] AS DBF
    INNER JOIN [ODS].[MarketplaceMonitor].[SalesRank] AS AMA ON DBF.Isbn13 = AMA.ISBN13
                                                            AND DBF.FileDate = AMA.importdate
    WHERE DBF.Isbn13 IN (SELECT TOP 200 Isbn13
                         FROM [ODS].[wholesale].[DailyBookFile]
                         WHERE FileDate = @i
                           AND CAST(CurrentModifiedDemandTotal AS int) > 500
                         -- without ORDER BY, TOP 200 returns an arbitrary 200 rows
                         ORDER BY CAST(CurrentModifiedDemandTotal AS int) DESC)
      AND DBF.Filedate = @i
    GROUP BY DBF.Filedate;

    -- Step one calendar day; @i + 1 would jump from 20180131 to 20180132.
    -- Use -1 instead of 1 in DATEADD to walk backwards through earlier dates.
    SET @i = CONVERT(int, CONVERT(char(8), DATEADD(day, 1, CONVERT(date, CONVERT(char(8), @i))), 112));

END
END TRY
BEGIN CATCH
    IF OBJECT_ID('tempdb..#books') IS NOT NULL
        DROP TABLE #books;
    THROW;  -- re-raise the original error instead of swallowing it (SQL Server 2012+)
END CATCH

SELECT *
FROM #books
ORDER BY [Date] DESC;

DROP TABLE #books;
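Not part of the original answer: the per-day top N can also be computed without any loop, in one set-based query that numbers each day's books with ROW_NUMBER() OVER (PARTITION BY FileDate ...) and averages the ranks of the rows numbered 1 to N. The sketch below is self-contained so it can be run as is. The table variables @DailyBookFile and @SalesRank are hypothetical stand-ins for the question's tables, loaded with the question's sample rows; the VARCHAR type of CurrentModifiedDemandTotal is an assumption based on the question's CAST; and @TopN is 2 instead of 200 so the five-row sample is meaningful.

```sql
-- Hypothetical stand-ins for the question's tables, loaded with its sample rows.
DECLARE @DailyBookFile TABLE (
    FileDate INT,
    Isbn13 VARCHAR(13),
    CurrentModifiedDemandTotal VARCHAR(20)  -- assumed non-numeric type, hence the CAST below
);
DECLARE @SalesRank TABLE (
    importdate INT,
    ISBN13 VARCHAR(13),
    SalesRank INT
);

INSERT INTO @DailyBookFile (FileDate, Isbn13, CurrentModifiedDemandTotal) VALUES
    (20180122, '9780955153075', '13'),
    (20180122, '9780805863567', '9'),
    (20180122, '9781138779396', '1'),
    (20180122, '9780029001516', '9'),
    (20180122, '9780470614150', '42');

INSERT INTO @SalesRank (importdate, ISBN13, SalesRank) VALUES
    (20180122, '9780029001516', 69499),
    (20180122, '9780470614150', 52879),
    (20180122, '9780805863567', 832429),
    (20180122, '9780955153075', 44528),
    (20180122, '9781138779396', 926435);

DECLARE @TopN INT = 2;  -- 200 in the question; 2 keeps the small sample meaningful

WITH RankedDemand AS (
    -- Number each day's books from highest to lowest demand; numbering restarts per day.
    SELECT FileDate,
           Isbn13,
           ROW_NUMBER() OVER (PARTITION BY FileDate
                              ORDER BY CAST(CurrentModifiedDemandTotal AS INT) DESC) AS rn
    FROM @DailyBookFile
)
SELECT d.FileDate AS [Date],
       AVG(CAST(s.SalesRank AS FLOAT)) AS Avg_Rank,  -- FLOAT avoids truncating integer AVG
       'Elite' AS Book_Group
FROM RankedDemand AS d
INNER JOIN @SalesRank AS s
        ON s.ISBN13 = d.Isbn13
       AND s.importdate = d.FileDate
WHERE d.rn <= @TopN  -- keep only that day's top N books
GROUP BY d.FileDate
ORDER BY d.FileDate DESC;
```

With the sample rows this returns one row for 20180122 with Avg_Rank 48703.5, the mean of 52879 and 44528 (the ranks of the two highest-demand books). ROW_NUMBER() breaks ties in demand arbitrarily; RANK() would keep every book tied at the cut-off, so a day could then contribute more than N books. On the real tables, a FileDate filter inside the CTE (such as the question's FileDate > 20170101) limits how much data gets numbered.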

Using a table variable would give simpler code, but when storing large amounts of data, table variables tend to perform worse than temp tables. I'm not sure where the cut-off is, but in my experience switching from a table variable to a temp table gave significant performance gains at 10,000+ rows. For small row counts the opposite may apply.
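A minimal sketch of the table-variable variant mentioned above, again self-contained: @DailyBookFile and @SalesRank are hypothetical stand-ins loaded with the question's sample rows, @TopN is 2 instead of 200, and the VARCHAR type of CurrentModifiedDemandTotal is an assumption. Because @books is scoped to the batch, there is no existence check, no DROP TABLE, and no cleanup in a CATCH block.

```sql
-- Hypothetical stand-ins for the question's tables, loaded with its sample rows.
DECLARE @DailyBookFile TABLE (FileDate INT, Isbn13 VARCHAR(13), CurrentModifiedDemandTotal VARCHAR(20));
DECLARE @SalesRank TABLE (importdate INT, ISBN13 VARCHAR(13), SalesRank INT);

INSERT INTO @DailyBookFile VALUES
    (20180122, '9780955153075', '13'), (20180122, '9780805863567', '9'),
    (20180122, '9781138779396', '1'),  (20180122, '9780029001516', '9'),
    (20180122, '9780470614150', '42');
INSERT INTO @SalesRank VALUES
    (20180122, '9780029001516', 69499),  (20180122, '9780470614150', 52879),
    (20180122, '9780805863567', 832429), (20180122, '9780955153075', 44528),
    (20180122, '9781138779396', 926435);

-- Result table: goes out of scope at the end of the batch, so no DROP is needed.
DECLARE @books TABLE ([Date] INT, Avg_Rank FLOAT, Book_Group VARCHAR(512));

DECLARE @TopN INT = 2;      -- 200 in the question
DECLARE @i INT = 20180122;  -- first date to process (yyyymmdd)

WHILE EXISTS (SELECT 1 FROM @DailyBookFile WHERE FileDate = @i)
BEGIN
    INSERT INTO @books ([Date], Avg_Rank, Book_Group)
    SELECT DBF.FileDate, AVG(CAST(AMA.SalesRank AS FLOAT)), 'Elite'
    FROM @DailyBookFile AS DBF
    INNER JOIN @SalesRank AS AMA
            ON DBF.Isbn13 = AMA.ISBN13
           AND DBF.FileDate = AMA.importdate
    WHERE DBF.FileDate = @i
      AND DBF.Isbn13 IN (SELECT TOP (@TopN) Isbn13
                         FROM @DailyBookFile
                         WHERE FileDate = @i
                         ORDER BY CAST(CurrentModifiedDemandTotal AS INT) DESC)
    GROUP BY DBF.FileDate;

    -- Step one calendar day (plain @i + 1 would produce e.g. 20180132).
    SET @i = CONVERT(INT, CONVERT(CHAR(8), DATEADD(DAY, 1, CONVERT(DATE, CONVERT(CHAR(8), @i))), 112));
END

SELECT [Date], Avg_Rank, Book_Group FROM @books ORDER BY [Date] DESC;
```

With the single sample date the loop runs once, and @books ends up with one row: 20180122, 48703.5, 'Elite'. The performance caveat above still applies when the loop covers years of data.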
