MongoDB: summing each key of a subdocument
I have multiple documents with this schema, one document per product per day:
    {
        _id:{},
        app_id:'DHJFK67JDSJjdasj909',
        date:'2014-08-07',
        event_count:32423,
        event_count_per_type: {
            0:322,
            10:4234,
            20:653,
            30:7562
        }
    }
I would like to get the sum of each event_type for a particular date range.
This is the output I am looking for, where each event type has been summed across all the documents. The keys of event_count_per_type can be anything, so I need something that can loop through each of them rather than having to name them explicitly.
    {
        app_id:'DHJFK67JDSJjdasj909',
        event_count:324236456,
        event_count_per_type: {
            0:34234222,
            10:242354,
            20:456476,
            30:56756
        }
    }
I have tried several queries so far. This is the best I have got, but the subdocument values are not summed:
    db.events.aggregate(
        {
            $match: {app_id:'DHJFK67JDSJjdasj909'}
        },
        {
            $group: {
                _id: {
                    app_id:'$app_id',
                },
                event_count: {$sum:'$event_count'},
                event_count_per_type: {$sum:'$event_count_per_type'}
            }
        },
        {
            $project: {
                _id:0,
                app_id:'$_id.app_id',
                event_count:1,
                event_count_per_type:1
            }
        }
    )
The output I am seeing is a value of 0 for the event_count_per_type key instead of an object. I could modify the schema so the keys are at the top level of the document, but I would still need an entry in the group statement for each key, which I cannot write because I do not know what the key names will be.
Any help would be appreciated. I am willing to change my schema if need be and also to try mapReduce (although from the documentation it seems like the performance is bad).
Recommended answer
As stated, processing documents like this is not possible with the aggregation framework unless you supply every key explicitly, such as:
    db.events.aggregate([
        { "$group": {
            "_id": "$app_id",
            "event_count": { "$sum": "$event_count" },
            // One accumulator per known key, addressed by its exact path
            "0": { "$sum": "$event_count_per_type.0" },
            "10": { "$sum": "$event_count_per_type.10" },
            "20": { "$sum": "$event_count_per_type.20" },
            "30": { "$sum": "$event_count_per_type.30" }
        }}
    ])
But you do of course have to explicitly specify every key you wish to work on. This is true of both the aggregation framework and general query operations in MongoDB: to access elements notated in this "sub-document" form, you need to specify the "exact path" to the element in order to do anything with it.
The aggregation framework and general queries have no concept of "traversal", which means they cannot process "each key" of a document. That requires a language construct, which these interfaces do not provide.
Generally speaking though, using a "key name" as a data point, where its name actually represents a "value", is a bit of an "anti-pattern". A better way to model this would be to use an array and represent your "type" as a value by itself:
    {
        "app_id": "DHJFK67JDSJjdasj909",
        "date": ISODate("2014-08-07T00:00:00.000Z"),
        "event_count": 32423,
        "events": [
            { "type": 0, "value": 322 },
            { "type": 10, "value": 4234 },
            { "type": 20, "value": 653 },
            { "type": 30, "value": 7562 }
        ]
    }
Note also that the "date" is now a proper date object rather than a string, which is good practice as well. This sort of data is easy to process with the aggregation framework:
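    db.events.aggregate([
        // One document per array element
        { "$unwind": "$events" },
        // Stage 1: total per app_id and type
        { "$group": {
            "_id": {
                "app_id": "$app_id",
                "type": "$events.type"
            },
            // Note: after $unwind, event_count repeats once per array element,
            // so the final event_count is inflated by the entries per document
            "event_count": { "$sum": "$event_count" },
            "value": { "$sum": "$events.value" }
        }},
        // Stage 2: back to one document per app_id
        { "$group": {
            "_id": "$_id.app_id",
            "event_count": { "$sum": "$event_count" },
            "events": { "$push": { "type": "$_id.type", "value": "$value" } }
        }}
    ])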
That shows a two-stage grouping. The first stage gets the totals per "type" without specifying each "key", since you no longer have to. The second returns a single document per "app_id" with the results in an array, as they were originally stored. This data form is generally much more flexible for looking at certain "types" or even the "values" within a certain range.
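If existing documents need to be moved to the array form, the keyed subdocument can be converted in client code. The following is only a minimal sketch in plain JavaScript: the helper name `toEventsArray` is hypothetical, and it assumes every key is numeric, as in the sample document. Converting the date string and writing the result back to the collection are left out.

```javascript
// Hypothetical helper (not part of MongoDB): converts the keyed
// subdocument into the array form suggested above.
// Assumption: every key is numeric, as in the sample document.
function toEventsArray(eventCountPerType) {
    return Object.keys(eventCountPerType).map(function (key) {
        return { type: Number(key), value: eventCountPerType[key] };
    });
}

// Sample document taken from the question (without _id and date)
var doc = {
    app_id: "DHJFK67JDSJjdasj909",
    event_count: 32423,
    event_count_per_type: { 0: 322, 10: 4234, 20: 653, 30: 7562 }
};

// Same document in the array-based shape
var converted = {
    app_id: doc.app_id,
    event_count: doc.event_count,
    events: toEventsArray(doc.event_count_per_type)
};

console.log(JSON.stringify(converted.events));
```

If the keys are not guaranteed to be numeric, the `Number(key)` conversion should be dropped and the key kept as a string.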
Where you cannot change the structure, your only option is mapReduce. This allows you to "code" the traversal of the keys, but since it requires JavaScript interpretation and execution, it is not as fast as the aggregation framework:
    db.events.mapReduce(
        // Map: emit the counts keyed by app_id
        function() {
            emit(
                this.app_id,
                {
                    "event_count": this.event_count,
                    "event_count_per_type": this.event_count_per_type
                }
            );
        },
        // Reduce: sum the totals and every key found in the subdocuments
        function(key, values) {
            // Same shape as the emitted value, so the result can be re-reduced
            var reduced = { "event_count": 0, "event_count_per_type": {} };
            values.forEach(function(value) {
                for ( var k in value.event_count_per_type ) {
                    if ( !reduced.event_count_per_type.hasOwnProperty(k) )
                        reduced.event_count_per_type[k] = 0;
                    reduced.event_count_per_type[k] += value.event_count_per_type[k];
                }
                reduced.event_count += value.event_count;
            });
            return reduced;
        },
        {
            "out": { "inline": 1 }
        }
    )
That will essentially traverse and combine the "keys" and sum up the values for each one found.
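The per-key summing can be checked outside MongoDB before running it against a collection. The following is a sketch in plain JavaScript of the intended reduce logic. The function name `sumEventCounts` and the sample values are made up for illustration and do not come from the question's data.

```javascript
// Hypothetical standalone version of the reduce logic: sums event_count
// and every key found in event_count_per_type across all values.
function sumEventCounts(values) {
    var reduced = { event_count: 0, event_count_per_type: {} };
    values.forEach(function (value) {
        Object.keys(value.event_count_per_type).forEach(function (k) {
            // First time this key is seen: start its total at zero
            if (!reduced.event_count_per_type.hasOwnProperty(k)) {
                reduced.event_count_per_type[k] = 0;
            }
            reduced.event_count_per_type[k] += value.event_count_per_type[k];
        });
        reduced.event_count += value.event_count;
    });
    return reduced;
}

// Illustrative input only: two emitted values with partly different keys
var total = sumEventCounts([
    { event_count: 10, event_count_per_type: { 0: 4, 10: 6 } },
    { event_count: 7, event_count_per_type: { 10: 2, 20: 5 } }
]);

console.log(JSON.stringify(total));
// {"event_count":17,"event_count_per_type":{"0":4,"10":8,"20":5}}
```

The result has the same shape as each input value. That matters because MongoDB may call the reduce function more than once for the same key, feeding earlier results back in as values.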
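So your options are:
- Change the structure and use standard queries and aggregation.
- Keep the structure as it is, which requires JavaScript processing and mapReduce.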
It depends on your actual needs, but in most cases restructuring yields benefits.