带游标优化的 SQL 查询

2021-09-10 00:00:00 sql tsql sql-server

我有一个查询,我遍历一个表 -> 对于我遍历另一个表的每个条目,然后计算一些结果.我使用游标来遍历表.此查询需要很长时间才能完成.总是超过3分钟.如果我在 C# 中做类似的事情,其中​​表是数组或字典,它甚至不需要一秒钟.我做错了什么,如何提高效率?

DELETE FROM [QueryScores]走插入 [QueryScores] (Id)从 [文档] 中选择 ID声明@Id NVARCHAR(50)DECLARE myCursor CURSOR LOCAL FAST_FORWARD FORSELECT [Id] FROM [QueryScores]打开我的光标从 myCursor 中提取下一个到 @Id而@@FETCH_STATUS = 0开始声明@Score FLOAT = 0.0DECLARE @CounterMax INT = (SELECT COUNT(*) FROM [Query])声明@Counter INT = 0PRINT '文档:' + CAST(@Id AS VARCHAR)PRINT '分数:' + CAST(@Score AS VARCHAR)而@Counter <@CounterMax开始DECLARE @StemId INT = (SELECT [Query].[StemId] FROM [Query] WHERE [Query].[Id] = @Counter)DECLARE @Weight FLOAT = (SELECT [tfidf].[Weight] FROM [TfidfWeights] AS [tfidf] WHERE [tfidf].[StemId] = @StemId AND [tfidf].[DocumentId] = @Id)打印重量:"+ CAST(@Weight AS VARCHAR)IF(@Weight > 0.0)开始DECLARE @QWeight FLOAT = (SELECT [Query].[Weight] FROM [Query] WHERE [Query].[StemId] = @StemId)SET @Score = @Score + (@QWeight * @Weight)PRINT '分数:' + CAST(@Score AS VARCHAR)结尾SET @Counter = @Counter + 1结尾更新 [QueryScores] SET Score = @Score WHERE Id = @Id从 myCursor 中提取下一个到 @Id结尾关闭我的光标释放我的光标

逻辑是我有一个文档列表.我有一个问题/疑问.我遍历每个文档,然后通过查询术语/单词进行嵌套迭代,以查找文档是否包含这些术语.如果是,那么我将预先计算的分数相加/相乘.

解决方案

问题是您正在尝试使用基于集合的语言来迭代诸如过程语言之类的东西.SQL 需要不同的思维方式.您几乎不应该考虑 SQL 中的循环.

从我从您的代码中收集的信息来看,这应该可以在所有这些循环中执行您要执行的操作,但是它以基于集合的方式在单个语句中执行此操作,这正是 SQL 擅长的.

INSERT INTO QueryScores (id, score)选择做过,SUM(CASE WHEN W.[Weight] > 0 THEN W.[Weight] * Q.[Weight] ELSE NULL END)从文件 D交叉连接查询QLEFT OUTER JOIN TfidfWeights W ON W.StemId = Q.StemId AND W.DocumentId = D.id通过...分组做过

当然,如果没有描述您的要求或带有预期输出的示例数据,我不知道这是否真的是您想要的,但鉴于您的代码,这是我最好的猜测.

您应该阅读:https://stackoverflow.com/help/how-to-ask

I have a query where I iterate through a table -> for each entry I iterate through another table and then compute some results. I use a cursor for iterating through the table. This query takes ages to complete. Always more than 3 minutes. If I do something similar in C# where the tables are arrays or dictionaries it doesn't even take a second. What am I doing wrong and how can I improve the efficiency?

DELETE FROM [QueryScores]
GO

INSERT INTO [QueryScores] (Id)
SELECT Id FROM [Documents]

DECLARE @Id NVARCHAR(50)

DECLARE myCursor CURSOR LOCAL FAST_FORWARD FOR
SELECT [Id] FROM [QueryScores]

OPEN myCursor

FETCH NEXT FROM myCursor INTO @Id

WHILE @@FETCH_STATUS = 0
    BEGIN
        DECLARE @Score FLOAT = 0.0

        DECLARE @CounterMax INT = (SELECT COUNT(*) FROM [Query])
        DECLARE @Counter INT = 0

        PRINT 'Document: ' + CAST(@Id AS VARCHAR)
        PRINT 'Score: ' + CAST(@Score AS VARCHAR)

        WHILE @Counter < @CounterMax
            BEGIN

            DECLARE @StemId INT = (SELECT [Query].[StemId] FROM [Query] WHERE [Query].[Id] = @Counter)

            DECLARE @Weight FLOAT = (SELECT [tfidf].[Weight] FROM [TfidfWeights] AS [tfidf] WHERE [tfidf].[StemId] = @StemId AND [tfidf].[DocumentId] = @Id)

            PRINT 'WEIGHT: ' + CAST(@Weight AS VARCHAR)

            IF(@Weight > 0.0)
                BEGIN
                DECLARE @QWeight FLOAT = (SELECT [Query].[Weight] FROM [Query] WHERE [Query].[StemId] = @StemId)
                SET @Score = @Score + (@QWeight * @Weight)
                PRINT 'Score: ' + CAST(@Score AS VARCHAR)
                END

            SET @Counter = @Counter + 1
            END 

        UPDATE [QueryScores] SET Score = @Score WHERE Id = @Id 

        FETCH NEXT FROM myCursor INTO @Id
    END

CLOSE myCursor
DEALLOCATE myCursor 

The logic is that i have a list of docs. And I have a question/query. I iterate through each and every doc and then have a nested iteration through the query terms/words to find if the doc contains these terms. If it does then I add/multiply pre-calculated scores.

解决方案

The problem is that you're trying to use a set-based language to iterate through things like a procedural language. SQL requires a different mindset. You should almost never be thinking in terms of loops in SQL.

From what I can gather from your code, this should do what you're trying to do in all of those loops, but it does it in a single statement in a set-based manner, which is what SQL is good at.

INSERT INTO QueryScores (id, score)
SELECT
    D.id,
    SUM(CASE WHEN W.[Weight] > 0 THEN W.[Weight] * Q.[Weight] ELSE NULL END)
FROM
    Documents D
CROSS JOIN Query Q
LEFT OUTER JOIN TfidfWeights W ON W.StemId = Q.StemId AND W.DocumentId = D.id
GROUP BY
    D.id

Of course, without a description of your requirements or sample data with expected output I don't know if this is actually what you're looking to get, but it's my best guess given your code.

You should read: https://stackoverflow.com/help/how-to-ask

相关文章