如何在 SQLAlchemy 中使用子查询来生成移动平均线?

2022-01-23 00:00:00 python postgresql sqlalchemy subquery

问题描述

我的问题是我想检索测量列表以及这些测量的移动平均值.我可以用这个 SQL 语句(postgresql 间隔语法)做到这一点:

My problem is that I want to retrieve both a list of measurements along with a moving average of those measurements. I can do that with this SQL statement (postgresql interval syntax):

SELECT time, value, ( SELECT AVG(t2.value) FROM measurements t2 WHERE t2.time BETWEEN t1.time - interval '5 days' AND t1.time ) moving_average FROM measurements t1 ORDER BY t1.time;

我想让 SQLAlchemy 代码产生一个类似的语句来达到这个效果.我目前有这个 Python 代码:

I want to have the SQLAlchemy code to produce a similar statement to this effect. I currently have this Python code:

moving_average_days = # configureable value, defaulting to 5 t1 = Measurements.alias('t1') t2 = Measurements.alias('t2') query = select([t1.c.time, t1.c.value, select([func.avg(t2.c.value)], t2.c.time.between(t1.c.time - datetime.timedelta(moving_average_days), t1.c.time))], t1.c.time > (datetime.datetime.utcnow() - datetime.timedelta(ndays))). order_by(Measurements.c.time)

然而，生成这个 SQL:

That however, generates this SQL:

SELECT t1.time, t1.value, avg_1 FROM measurements AS t1, ( SELECT avg(t2.value) AS avg_1 FROM measurements AS t2 WHERE t2.time BETWEEN t1.time - %(time_1)s AND t1.time ) WHERE t1.time > %(time_2)s ORDER BY t1.time;

该 SQL 将子查询作为 FROM 子句的一部分，在该子句中它不能对顶级值的列值进行标量访问，即它会导致 PostgreSQL 吐出此错误:

That SQL has the subquery as part of the FROM clause where it cannot have scalar access to the column values of the top-level values, i.e. it causes PostgreSQL to spit out this error:

ERROR: subquery in FROM cannot refer to other relations of same query level LINE 6: WHERE t2.time BETWEEN t1.time - interval '5 days' AN...

因此我想知道的是:如何让 SQLAlchemy 将子查询移动到 SELECT 子句?

What I would thus like to know is: how do I get SQLAlchemy to move the subquery to the SELECT clause?

另一种获得移动平均线的方法(无需对每个(时间、值)对执行查询)也是一种选择.

Alternatively another way to get a moving average (without performing a query for each (time,value) pair) would be an option.

解决方案

好吧，显然我需要的是使用所谓的标量选择.通过使用这些，我得到了这个 python 代码，它实际上可以按照我的意愿工作(生成与我的目标中的第一个问题等效的 SQL):

Right, apparently what I needed was the use of a so-called scalar select. With the use of those I get this python code, which actually works as I want it to (generates the equivalent SQL to that of the first in my question which was my goal):

moving_average_days = # configurable value, defaulting to 5 ndays = # configurable value, defaulting to 90 t1 = Measurements.alias('t1') ###### t2 = Measurements.alias('t2') query = select([t1.c.time, t1.c.value, select([func.avg(t2.c.value)], t2.c.time.between(t1.c.time - datetime.timedelta(moving_average_days), t1.c.time)).label('moving_average')], t1.c.time > (datetime.datetime.utcnow() - datetime.timedelta(ndays))). order_by(t1.c.time)

这给出了这个 SQL:

This gives this SQL:

SELECT t1.time, t1.value, ( SELECT avg(t2.value) AS avg_1 FROM measurements AS t2 WHERE t2.time BETWEEN t1.time - :time_1 AND t1.time ) AS moving_average FROM measurements AS t1 WHERE t1.time > :time_2 ORDER BY t1.time;

相关文章