需要指导:后端 SQL 逻辑,用于前端用户动态选择字段

我有一个数据集如下:

Country,Commodity,Year,Type,Amount
US,Vegetable,2010,Harvested,2.44
US,Vegetable,2010,Yield,15.8
US,Vegetable,2010,Production,6.48
US,Vegetable,2011,Harvested,6
US,Vegetable,2011,Yield,18
US,Vegetable,2011,Production,3
Argentina,Vegetable,2010,Harvested,15.2
Argentina,Vegetable,2010,Yield,40.5
Argentina,Vegetable,2010,Production,2.66
Argentina,Vegetable,2011,Harvested,15.2
Argentina,Vegetable,2011,Yield,40.5
Argentina,Vegetable,2011,Production,2.66
Bhutan,Vegetable,2010,Harvested,7
Bhutan,Vegetable,2010,Yield,35
Bhutan,Vegetable,2010,Production,5
Bhutan,Vegetable,2011,Harvested,2
Bhutan,Vegetable,2011,Yield,6
Bhutan,Vegetable,2011,Production,3

鉴于:

  1. 如果任何一个国家在数据中有 n 年,所有其他国家也应有相同的 n 年.例如:如果美国有 2011 年和 2012 年的数据,那么所有其他国家/地区都将有 2011 年和 2012 年的数据.

条件:

  1. 聚合仅在多国选择时发生.将按商品和年份分组.

例如:如果前端工具中的用户选择美国和阿根廷,我们必须显示 -

Eg: If a user in the frontend tool selects US and Argentina, we have to show -

衍生产量的数量 =(美国收获量 + 阿根廷收获量)/(美国产量 + 阿根廷产量),即 (2.44+15.2)/(6.48+2.66),同样,对于三个国家,它将是三个收获值的相加除以三个产值的相加,依此类推.必须将其填充到新行中.

The Amount for Derived Yield = (Harvested of US + Harvested of Argentina)/(Production of US + Production of Argentina), i.e., (2.44+15.2)/(6.48+2.66), similarly for three countries it will be addition of three harvested value divided by addition of three production value and so on. That has to be populated in a new row.

注意:前端用户可以选择任意国家/地区组合.在后端执行而不是在前端动态执行的唯一目的是因为 AWS QuickSight(我们的可视化工具)虽然可以在选定的列过滤器上填充总和,但尚不支持对那些衍生的求和字段进行计算.因此,必须预先填充所有国家/地区组合的整个计算(非常幼稚的方法),以便在报告中提供.

Note: The users in the frontend can select any combination of countries. The sole purpose of doing it in the backend rather than dynamically doing it in the frontend is because AWS QuickSight (our visualisation tool), even though can populate sum on selected column filters but doesn't yet support calculation on those derived summed fields. Hence, the entire calculation of all combination of countries has to be pre-populated (very naive approach) in order to make it available in report.

我向所有 SQL 专家提出的两个问题是:

Two of my question to all SQL experts is:

  • 如何填充按年份和商品分组的所有国家/地区组合的行,以便包含所有可能组合的数据.
  • 鉴于我可以填充所有行组合,报告工具将如何根据用户选择的国家/地区了解选择哪个派生行,因为该行标记为美国 + 阿根廷,该行为美国 + 不丹,等

非常欢迎任何解决方案.

Any solution is extremely welcome.

首选 SQL 工具:Spark SQL 或 Athena SQL(在 Presto 上运行)或 HiveQL.次选:Oracle、PGSQL

SQL Tool preferred: Spark SQL or Athena SQL (runs on Presto) or HiveQL. Less preferred: Oracle, PGSQL

注意 2:发布这个问题的唯一目的,即使我已经在另一个问题中详细阐述过,也是因为我不想将我的天真方法强加给试图解决问题的人问题,所以在这里,与寻求解决方案的帮助相比,我更清楚地定义了问题.而在另一个问题中,我已经给出了预期结果的方法.如果您想查看其他问题,这里是.

Note 2: The sole purpose of posting this question, even though I've elaborated the same in another one is because I don't want to impose my naive approach on somebody trying to solve the problem, so here, I've defined the problem with more clarity than asking for help in solution. Whereas, in the other question I have given my approach for the expected result. In case if you want to see the other question, here it is.

推荐答案

你可以从这样的事情开始:

you can start with something like this:

select * from
(
    select c.Country, y.Year
    from
    (select distinct Country from table) as c,
    (select distinct Year from table) as y
) as cy
left join table as t on t.Country = cy.Country and t.Year = cy.Year

这将为您提供包含国家/年所有组合的所有行以及主表中的可选数据,因此您现在可以添加过滤器/分组

this will give you all rows with all combinations of Country/Year and optionally data from main table, so you can now add filter/grouping

相关文章