Percentage of OR conditions matched in MongoDB
I have my data in the following format:
{
    "_id" : ObjectId("534fd4662d22a05415000000"),
    "product_id" : "50862224",
    "ean" : "8808992479390",
    "brand" : "LG",
    "model" : "37LH3000",
    "features" : [
        {
            "key" : "Screen Format",
            "value" : "16:9"
        }, {
            "key" : "DVD Player / Recorder",
            "value" : "No"
        }, {
            "key" : "Weight in kg",
            "value" : "12.6"
        }
        ... so on
    ]
}
I need to compare the features of one product with those of other products and sort the results into categories (100% match, 50-99% match) according to the percentage of features that match.
My initial thought was to build a dynamic query with an $or condition for each feature and do the percentage calculation in PHP. But that means MongoDB will also return products that have only one matching feature. I think nearly all products in a category have some feature in common, so I fear I would end up processing a lot of products in PHP.
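I basically have two questions.
- Is there an alternative approach?
- Is the data structure I am using adequate for the functionality I am looking for, or should I consider changing it?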
Recommended answer
Your solution really should be MongoDB-specific; otherwise you end up doing the calculations, and possibly the matching, on the client side, and that is not going to be good for performance.
What you really want is a way to do that processing on the server side:
db.products.aggregate([
    // Match the documents that meet at least one of your conditions
    { "$match": {
        "$or": [
            {
                "features": {
                    "$elemMatch": {
                        "key": "Screen Format",
                        "value": "16:9"
                    }
                }
            },
            {
                "features": {
                    "$elemMatch": {
                        "key": "Weight in kg",
                        // Values are stored as strings, so this range is
                        // compared lexicographically, not numerically
                        "value": { "$gt": "5", "$lt": "8" }
                    }
                }
            }
        ]
    }},
    // Keep the document and a copy of the features array
    { "$project": {
        "_id": {
            "_id": "$_id",
            "product_id": "$product_id",
            "ean": "$ean",
            "brand": "$brand",
            "model": "$model",
            "features": "$features"
        },
        "features": 1
    }},
    // Unwind the array so each feature becomes its own document
    { "$unwind": "$features" },
    // Find the actual elements that match the conditions
    { "$match": {
        "$or": [
            {
                "features.key": "Screen Format",
                "features.value": "16:9"
            },
            {
                "features.key": "Weight in kg",
                "features.value": { "$gt": "5", "$lt": "8" }
            }
        ]
    }},
    // Count those matched elements per product
    { "$group": {
        "_id": "$_id",
        "count": { "$sum": 1 }
    }},
    // Restore the document and divide the matched elements by the
    // number of elements in the "or" condition
    { "$project": {
        "_id": "$_id._id",
        "product_id": "$_id.product_id",
        "ean": "$_id.ean",
        "brand": "$_id.brand",
        "model": "$_id.model",
        "features": "$_id.features",
        "matched": { "$divide": [ "$count", 2 ] }
    }},
    // Sort by the matched fraction, best matches first
    { "$sort": { "matched": -1 } }
])
Since you know the "length" of the $or condition being applied, you simply need to find out how many elements of the "features" array match those conditions. That is what the second $match in the pipeline is for: after $unwind, each array element is its own document, so the $group stage that follows counts the matching elements per product.
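Because the divisor has to equal the number of $or conditions, it can help to generate both from the same list. The following is only a sketch under that assumption: `buildMatchPipeline` and its `wanted` parameter are hypothetical names, not part of MongoDB, and the stages simply mirror the pipeline shown above.

```javascript
// Hypothetical helper: builds the pipeline from a list of { key, value } pairs.
// `value` may be a literal or a query-operator document such as { $gt: "5" }.
function buildMatchPipeline(wanted) {
  if (wanted.length === 0) {
    throw new Error("at least one feature condition is required");
  }

  // Whole-document form, used by the first $match on the array field.
  const docConditions = wanted.map(({ key, value }) => ({
    features: { $elemMatch: { key, value } }
  }));

  // Per-element form, used by the second $match after $unwind.
  const elementConditions = wanted.map(({ key, value }) => ({
    "features.key": key,
    "features.value": value
  }));

  return [
    { $match: { $or: docConditions } },
    { $project: {
      _id: {
        _id: "$_id", product_id: "$product_id", ean: "$ean",
        brand: "$brand", model: "$model", features: "$features"
      },
      features: 1
    }},
    { $unwind: "$features" },
    { $match: { $or: elementConditions } },
    { $group: { _id: "$_id", count: { $sum: 1 } } },
    { $project: {
      _id: "$_id._id",
      product_id: "$_id.product_id",
      ean: "$_id.ean",
      brand: "$_id.brand",
      model: "$_id.model",
      features: "$_id.features",
      // The divisor comes from the same list as the $or conditions,
      // so the two cannot drift apart.
      matched: { $divide: ["$count", wanted.length] }
    }},
    { $sort: { matched: -1 } }
  ];
}

const pipeline = buildMatchPipeline([
  { key: "Screen Format", value: "16:9" },
  { key: "DVD Player / Recorder", value: "No" },
  { key: "Weight in kg", value: "12.6" }
]);
// In the mongo shell: db.products.aggregate(pipeline)
```

Passing the features of the product being compared as `wanted` gives every other product a `matched` value between 0 and 1, provided no product repeats the same key and value pair in its array.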
Once you have that count, you divide it by the number of conditions that were passed in as your $or. The beauty here is that you can now do something useful with the result, such as sorting by relevance and even "paging" the results on the server side.
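Server-side paging can be sketched by appending $skip and $limit stages after the sort. The helper name `withPaging` and its 1-based `page` parameter are assumptions made for illustration; only the $skip and $limit stages themselves come from MongoDB.

```javascript
// Hypothetical helper: appends paging stages to an already sorted pipeline.
// `page` is 1-based; both parameter names are assumptions for this sketch.
function withPaging(sortedPipeline, page, pageSize) {
  if (!Number.isInteger(page) || page < 1) {
    throw new RangeError("page must be an integer >= 1");
  }
  if (!Number.isInteger(pageSize) || pageSize < 1) {
    throw new RangeError("pageSize must be an integer >= 1");
  }
  // Return a new array so the caller's pipeline is left untouched.
  return [
    ...sortedPipeline,
    { $skip: (page - 1) * pageSize },
    { $limit: pageSize }
  ];
}

// Stand-in for the tail of the pipeline above; only the sort matters here.
const sortedStages = [{ $sort: { matched: -1 } }];
const thirdPage = withPaging(sortedStages, 3, 20);
// thirdPage ends with { $skip: 40 } followed by { $limit: 20 }
```

Many products are likely to share the same `matched` value, so adding a unique field such as `_id` to the sort keys is probably worthwhile to keep page boundaries stable.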
If you want some additional "categorization", all you need to do is add another $project stage to the end of the pipeline:
{ "$project": {
    "product_id": 1,
    "ean": 1,
    "brand": 1,
    "model": 1,
    "features": 1,
    "matched": 1,
    // Nested $cond expressions are tested from the highest bucket down
    "category": { "$cond": [
        { "$eq": [ "$matched", 1 ] },
        "100",
        { "$cond": [
            { "$gte": [ "$matched", .7 ] },
            "70-99",
            { "$cond": [
                { "$gte": [ "$matched", .4 ] },
                "40-69",
                "under 40"
            ]}
        ]}
    ]}
}}
Or something similar; the $cond operator can help you here.
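Hand-nesting $cond expressions is easy to get wrong, so one option is to generate the nesting from a list of thresholds. This is a sketch only: `buildCategoryExpr` and the `min`/`label` bucket shape are hypothetical, and it uses $gte for the top bucket where the stage above uses $eq, which is equivalent as long as `matched` never exceeds 1.

```javascript
// Hypothetical helper: turns a list of { min, label } buckets, sorted from the
// highest `min` down, into a nested $cond expression on "$matched".
function buildCategoryExpr(buckets, fallbackLabel) {
  // Fold from the lowest bucket up so the highest threshold is tested first.
  return buckets.reduceRight(
    (otherwise, { min, label }) => ({
      $cond: [{ $gte: ["$matched", min] }, label, otherwise]
    }),
    fallbackLabel
  );
}

const categoryExpr = buildCategoryExpr(
  [
    { min: 1, label: "100" },
    { min: 0.7, label: "70-99" },
    { min: 0.4, label: "40-69" }
  ],
  "under 40"
);
// In the $project stage: "category": categoryExpr
```

Changing the buckets, for example to the 50-99 range from the question, then only means editing the list.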
The data structure should be fine as you have it: you can create a compound index on the "key" and "value" of the entries in your features array, and this should scale well for queries.
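As a minimal sketch, assuming the collection is called `products` as in the pipeline above, such an index could be declared like this. The variable name `featureIndexSpec` is made up for the example, and older shells use `ensureIndex` in place of `createIndex`.

```javascript
// Compound multikey index over the embedded "key" and "value" fields.
const featureIndexSpec = { "features.key": 1, "features.value": 1 };

// `db` only exists inside the mongo shell, so guard the call to keep this
// snippet loadable elsewhere.
if (typeof db !== "undefined") {
  db.products.createIndex(featureIndexSpec);
}
```

The first $match stage, which uses $elemMatch on both fields, is the part of the pipeline that can benefit from this index.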
If you need more than that, such as faceted search and results, you can look at solutions like Solr or Elasticsearch. The full implementation of that would be a bit lengthy for here.