Is it possible to index a running total to speed up window functions?

2022-06-25 00:00:00 sql mariadb mysql

I want to run:

    SELECT cust_id, SUM(cost)
    FROM purchases
    WHERE purchase_time BETWEEN '2018-01-01' AND '2019-01-01'
    GROUP BY cust_id

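With a large number of rows, though, I expect every matching record has to be loaded in order to aggregate its sum. What I would like is an index along these lines: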

    CREATE INDEX purchase_sum 
    ON purchases(cust_id, purchase_time, 
    SUM(cost) OVER (PARTITION BY cust_id 
    ORDER BY purchase_time) AS rolling_cost)

That is, I want an index that looks like this:

 cust_id    time    rolling_cost
--------   ------  --------------
   1        Jan 1       5
   1        Jan 2       12
   1        Jan 3       14
   1        Jan 4       20
   1        Jan 5       24
   2        Jan 1       1
   2        Jan 2       7
   2        Jan 3       11
   2        Jan 4       14
   2        Jan 5       19
   3        Jan 1       4
   3        Jan 2       5
   3        Jan 3       10
   3        Jan 4       21
   3        Jan 5       45

From this, my original query could be answered by subtracting two known values per cust_id, roughly cost_in_window = rolling_cost('2019-01-01') - rolling_cost('2018-01-01'), without loading anything from the source table.

Is this feasible as an index? If not, is there another way to achieve the same goal?

Solution

You might find this faster:

select c.cust_id,
       (select sum(p.cost)
        from purchases p
        where p.cust_id = c.cust_id and
              p.purchase_time >= '2018-01-01' and
              p.purchase_time < '2019-01-01'
       ) as total_cost
from customers c
having total_cost is not null;

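This can make use of an index on purchases(cust_id, purchase_time, cost). Only the index is needed to compute the sum, so the table rows themselves are never read. That is one saving. More importantly, there is no overall aggregation, and avoiding it may be the bigger saving when the calculation runs over all customers.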
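As a minimal sketch, the index described above could be created like this. The index name `idx_purchases_cust_time_cost` is made up for illustration; only the table and column names come from the question.

```sql
-- Hypothetical index name. Column order matters here:
-- the equality column (cust_id) first, the range column (purchase_time) second,
-- and the summed column (cost) last so the index covers the subquery.
CREATE INDEX idx_purchases_cust_time_cost
    ON purchases (cust_id, purchase_time, cost);
```

If the optimizer picks it up, running `EXPLAIN` on the query should typically show `Using index` in the `Extra` column for `purchases`, which indicates the subquery is answered from the index alone.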

With the same index, however, this might be a bit better:

select c.cust_id,
       (select sum(p.cost)
        from purchases p
        where p.cust_id = c.cust_id and
              p.purchase_time >= '2018-01-01' and
              p.purchase_time < '2019-01-01'
       ) as total_cost
from customers c
where exists (select 1
              from purchases p
              where p.cust_id = c.cust_id and
                    p.purchase_time >= '2018-01-01' and
                    p.purchase_time < '2019-01-01' 
             );

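Edit:

The only way to get exactly what you describe is to store a cumulative-sum column explicitly in the data. That means restructuring the query (to do the subtraction you want) and using triggers to maintain the values.

If historical data never changes, this can be a reasonable approach. Updating or inserting earlier rows, however, could become very expensive, because every later running total for that customer has to be recomputed.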
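A rough sketch of that approach, for the append-only case only. Everything new here is an assumption made for illustration: the `rolling_cost` column and its `DECIMAL(18,2)` type, the trigger and index names, rows arriving in `purchase_time` order, and `purchase_time` being unique per customer. Existing rows would need a one-off backfill, and updates, deletes, and out-of-order inserts are not handled.

```sql
-- Hypothetical column holding the per-customer running total.
ALTER TABLE purchases
    ADD COLUMN rolling_cost DECIMAL(18,2) NOT NULL DEFAULT 0;

-- Hypothetical index so each lookup below is a single index seek.
CREATE INDEX idx_purchases_rolling
    ON purchases (cust_id, purchase_time, rolling_cost);

-- DELIMITER is a mysql/mariadb command-line client directive, not SQL.
DELIMITER //
CREATE TRIGGER purchases_set_rolling_cost
BEFORE INSERT ON purchases
FOR EACH ROW
BEGIN
    -- New total = latest stored total for this customer (0 if none) + new cost.
    SET NEW.rolling_cost = NEW.cost + COALESCE(
        (SELECT p.rolling_cost
         FROM purchases p
         WHERE p.cust_id = NEW.cust_id
         ORDER BY p.purchase_time DESC
         LIMIT 1), 0);
END//
DELIMITER ;

-- Sum for [2018-01-01, 2019-01-01): the running total just before the end
-- of the window minus the running total just before its start.
SELECT c.cust_id,
       COALESCE((SELECT p.rolling_cost
                 FROM purchases p
                 WHERE p.cust_id = c.cust_id
                   AND p.purchase_time < '2019-01-01'
                 ORDER BY p.purchase_time DESC
                 LIMIT 1), 0)
     - COALESCE((SELECT p.rolling_cost
                 FROM purchases p
                 WHERE p.cust_id = c.cust_id
                   AND p.purchase_time < '2018-01-01'
                 ORDER BY p.purchase_time DESC
                 LIMIT 1), 0) AS total_cost
FROM customers c;
```

Note that the subtraction uses the last total strictly before the window start, not the total at the start date, so that a purchase made on the first day of the window is still counted. Unlike the queries above, this version returns 0 rather than omitting customers with no purchases in the window.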
