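GroupBy and Sum in SQLAlchemy?
Question
I'm trying to group a few fields in a table and then sum each group, but the values are being counted more than once.
My models are as follows: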
class CostCenter(db.Model):
    __tablename__ = 'costcenter'
    id = db.Column(db.Integer, primary_key=True, autoincrement=True)
    name = db.Column(db.String)
    number = db.Column(db.Integer)

class Expense(db.Model):
    __tablename__ = 'expense'
    id = db.Column(db.Integer, primary_key=True, autoincrement=True)
    glitem_id = db.Column(db.Integer, db.ForeignKey('glitem.id'))
    glitem = db.relationship('GlItem')
    costcenter_id = db.Column(db.Integer, db.ForeignKey('costcenter.id'))
    costcenter = db.relationship('CostCenter')
    value = db.Column(db.Float)
    date = db.Column(db.Date)
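I have been using:
expenses = db.session.query(Expense, func.sum(Expense.value)).group_by(Expense.date).filter(CostCenter.id.in_([1, 2, 3]))
When I print expenses, it shows the SQL statement below. It looks correct to me, but I'm not very familiar with SQL. The problem is that the value it outputs as sum_1 is counted multiple times. If I have [1] in the IN clause, it sums all three items. If I have [1, 2], it sums all three and then doubles the result, and if I have [1, 2, 3], it sums all three and triples it. I'm not sure why it counts them multiple times. How do I fix this?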
SELECT expense.id AS expense_id, expense.glitem_id AS expense_glitem_id, expense.costcenter_id AS expense_costcenter_id, expense.value AS expense_value, expense.date AS expense_date, sum(expense.value) AS sum_1
FROM expense, costcenter
WHERE costcenter.id IN (:id_1, :id_2, :id_3) GROUP BY expense.date
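Thanks!
Answer
There are a few problems here; you don't seem to be querying the right thing. It makes no sense to select the whole Expense object when grouping by Expense.date. There also needs to be a join condition between CostCenter and Expense. Without one, the query pairs every expense row with every matching cost center, so each expense is counted once per cost center even though the two rows are unrelated.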
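To see why the missing join condition multiplies the total, here is a minimal sketch that uses Python's standard-library sqlite3 module instead of SQLAlchemy. The three sample expenses (10, 20 and 30, one per cost center) are made-up illustration data, and the helper name total is hypothetical. The first statement mirrors the FROM expense, costcenter form from the question; the second adds the join condition.

```python
import sqlite3

# In-memory database with the two tables from the question (simplified columns).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE costcenter (id INTEGER PRIMARY KEY);
    CREATE TABLE expense (
        id INTEGER PRIMARY KEY,
        costcenter_id INTEGER REFERENCES costcenter(id),
        value INTEGER
    );
    INSERT INTO costcenter (id) VALUES (1), (2), (3);
    INSERT INTO expense (costcenter_id, value) VALUES (1, 10), (2, 20), (3, 30);
""")

def total(sql, ids):
    """Run `sql` with one '?' placeholder per id and return the single sum."""
    placeholders = ", ".join("?" for _ in ids)
    return conn.execute(sql.format(placeholders), ids).fetchone()[0]

# No join condition: every expense row is paired with every selected cost center.
CROSS = ("SELECT sum(expense.value) FROM expense, costcenter "
         "WHERE costcenter.id IN ({})")

# Join condition: each expense row is paired only with its own cost center.
JOINED = ("SELECT sum(expense.value) FROM expense "
          "JOIN costcenter ON costcenter.id = expense.costcenter_id "
          "WHERE costcenter.id IN ({})")

id_lists = ([1], [1, 2], [1, 2, 3])
cross_totals = [total(CROSS, ids) for ids in id_lists]
joined_totals = [total(JOINED, ids) for ids in id_lists]

print(cross_totals)   # [60, 120, 180]: the full sum, times the number of ids
print(joined_totals)  # [10, 30, 60]: only expenses of the selected centers
```

The unjoined totals reproduce the pattern described in the question: the sum of all expenses, multiplied by the number of cost centers in the IN list.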
Your query should look like this:
session.query(
    Expense.date,
    func.sum(Expense.value).label('total')
).join(Expense.cost_center
).filter(CostCenter.id.in_([2, 3])
).group_by(Expense.date
).all()
This produces the following SQL:
SELECT expense.date AS expense_date, sum(expense.value) AS total
FROM expense JOIN cost_center ON cost_center.id = expense.cost_center_id
WHERE cost_center.id IN (?, ?) GROUP BY expense.date
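As a quick check of that statement on its own, the sketch below runs it directly through Python's standard-library sqlite3 module. The sample rows are an assumption for illustration (seven expenses spread over three cost centers), the date column is stored as plain text for simplicity, and an ORDER BY is added only to make the row order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cost_center (id INTEGER PRIMARY KEY);
    CREATE TABLE expense (
        id INTEGER PRIMARY KEY,
        cost_center_id INTEGER NOT NULL REFERENCES cost_center(id),
        value INTEGER NOT NULL,
        date TEXT NOT NULL
    );
    INSERT INTO cost_center (id) VALUES (1), (2), (3);
    INSERT INTO expense (cost_center_id, value, date) VALUES
        (1, 10, '2014-08-01'), (1, 20, '2014-08-01'), (1, 15, '2014-09-01'),
        (2, 45, '2014-08-01'), (2, 40, '2014-09-01'), (2, 40, '2014-09-01'),
        (3, 42, '2014-07-01');
""")

# Same statement as above, with cost centers 2 and 3 bound to the placeholders.
rows = conn.execute("""
    SELECT expense.date AS expense_date, sum(expense.value) AS total
    FROM expense JOIN cost_center ON cost_center.id = expense.cost_center_id
    WHERE cost_center.id IN (?, ?)
    GROUP BY expense.date
    ORDER BY expense.date
""", (2, 3)).fetchall()

for expense_date, total in rows:
    print(expense_date, total)
# 2014-07-01 42
# 2014-08-01 45
# 2014-09-01 80
```

Each date appears once, and its total includes only the expenses that belong to cost centers 2 and 3.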
Here is a simple runnable example:
from datetime import datetime
from sqlalchemy import create_engine, Column, Integer, ForeignKey, Numeric, DateTime, func
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import Session, relationship

# in-memory SQLite database; echo=True logs the generated SQL
engine = create_engine('sqlite://', echo=True)
session = Session(bind=engine)
Base = declarative_base()

class CostCenter(Base):
    __tablename__ = 'cost_center'
    id = Column(Integer, primary_key=True)

class Expense(Base):
    __tablename__ = 'expense'
    id = Column(Integer, primary_key=True)
    cost_center_id = Column(Integer, ForeignKey(CostCenter.id), nullable=False)
    value = Column(Numeric(8, 2), nullable=False, default=0)
    date = Column(DateTime, nullable=False)
    cost_center = relationship(CostCenter, backref='expenses')

# pass the engine explicitly; binding it to the declarative base
# is not supported in newer SQLAlchemy releases
Base.metadata.create_all(engine)

session.add_all([
    CostCenter(expenses=[
        Expense(value=10, date=datetime(2014, 8, 1)),
        Expense(value=20, date=datetime(2014, 8, 1)),
        Expense(value=15, date=datetime(2014, 9, 1)),
    ]),
    CostCenter(expenses=[
        Expense(value=45, date=datetime(2014, 8, 1)),
        Expense(value=40, date=datetime(2014, 9, 1)),
        Expense(value=40, date=datetime(2014, 9, 1)),
    ]),
    CostCenter(expenses=[
        Expense(value=42, date=datetime(2014, 7, 1)),
    ]),
])
session.commit()

# the join ties each expense to its own cost center
base_query = session.query(
    Expense.date,
    func.sum(Expense.value).label('total')
).join(Expense.cost_center
).group_by(Expense.date)

# first query considers center 1, output:
# 2014-08-01: 30.00
# 2014-09-01: 15.00
for row in base_query.filter(CostCenter.id.in_([1])).all():
    print('{}: {}'.format(row.date.date(), row.total))

# second query considers centers 1, 2, and 3, output:
# 2014-07-01: 42.00
# 2014-08-01: 75.00
# 2014-09-01: 95.00
for row in base_query.filter(CostCenter.id.in_([1, 2, 3])).all():
    print('{}: {}'.format(row.date.date(), row.total))