Reshaping documents by splitting a field value
Suppose we have a collection of raw data:
{ "person": "David, age 102"}
{ "person": "Max, age 8" }
and we'd like to transform that collection to:
{ "age": 102 }
{ "age": 8 }
using only the mongo(d) engine. (If all person names or ages had equal lengths, $substr could do the job.) Is it possible?
Suppose the regex is trivial: /\d+/
Recommended answer
The optimal way, in MongoDB version 3.4.
This version of mongod provides the $split operator, which splits a string into an array of substrings on a delimiter.
We then assign the newly computed value to a variable using the $let variable operator. The new value can then be used in the "in" expression to return the "name" and "age" values, using the $arrayElemAt operator to return the element at a specified index: 0 for the first element and -1 for the last element.
Note that in the "in" expression we need to split the last element a second time, on a space, in order to return the integer as a string.
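The two split steps can be checked outside the database. The following is a minimal sketch in plain JavaScript, not MongoDB code: splitPerson is a hypothetical helper name, and it assumes that String.prototype.split behaves like $split for these simple delimiters.

```javascript
// Plain-JavaScript mirror of the two $split steps (illustration only).
// splitPerson is a hypothetical helper, not part of any MongoDB API.
function splitPerson(person) {
  const infos = person.split(",");       // like { $split: ["$person", ","] }
  const name = infos[0];                 // like $arrayElemAt with index 0
  const tail = infos[infos.length - 1];  // like $arrayElemAt with index -1, e.g. " age 102"
  const words = tail.split(" ");         // second split: ["", "age", "102"]
  const age = words[words.length - 1];   // last element, still a string
  return { name, age };
}

console.log(splitPerson("David, age 102")); // { name: 'David', age: '102' }
```

The age comes back as a string, which is why a numeric cast is still needed afterwards.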
Finally we need to iterate the Cursor object, convert the integer string to a number using Number or parseInt, and use bulk operations with the bulkWrite() method to $set the values of those fields for maximum efficiency.
let requests = [];
db.coll.aggregate(
    [
        { "$project": {
            "person": {
                "$let": {
                    "vars": {
                        "infos": { "$split": [ "$person", "," ] }
                    },
                    "in": {
                        "name": { "$arrayElemAt": [ "$$infos", 0 ] },
                        "age": {
                            "$arrayElemAt": [
                                { "$split": [
                                    { "$arrayElemAt": [ "$$infos", -1 ] },
                                    " "
                                ]},
                                -1
                            ]
                        }
                    }
                }
            }
        }}
    ]
).forEach(document => {
    requests.push({
        "updateOne": {
            "filter": { "_id": document._id },
            "update": {
                "$set": {
                    "name": document.person.name,
                    "age": Number(document.person.age)
                },
                "$unset": { "person": " " }
            }
        }
    });
    if ( requests.length === 500 ) {
        // Execute per 500 ops and re-init
        db.coll.bulkWrite(requests);
        requests = [];
    }
});
// Clean up queues
if (requests.length > 0) {
    db.coll.bulkWrite(requests);
}
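The batching logic in the loop above can also be tried without a database connection. This is a minimal sketch under stated assumptions: buildBatches is a hypothetical helper name, and the sample documents (including the _id values and the third person) are made up to look like the $project output.

```javascript
// Hypothetical helper: turn projected documents into updateOne requests,
// grouped into batches of at most batchSize operations each.
function buildBatches(documents, batchSize) {
  const batches = [];
  let current = [];
  for (const doc of documents) {
    current.push({
      updateOne: {
        filter: { _id: doc._id },
        update: {
          $set: { name: doc.person.name, age: Number(doc.person.age) },
          $unset: { person: "" }
        }
      }
    });
    if (current.length === batchSize) {
      // Batch is full: store it and start a new one
      batches.push(current);
      current = [];
    }
  }
  // Keep the final, partially filled batch if there is one
  if (current.length > 0) batches.push(current);
  return batches;
}

// Made-up sample input shaped like the $project output
const projected = [
  { _id: 1, person: { name: "David", age: "102" } },
  { _id: 2, person: { name: "Max", age: "8" } },
  { _id: 3, person: { name: "Ann", age: "35" } }
];
const batches = buildBatches(projected, 2);
console.log(batches.length); // 2
```

In the shell, each batch would then be passed to db.coll.bulkWrite(batch), which is the call the answer's code makes every 500 operations.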
MongoDB 3.2 or newer.
MongoDB 3.2 deprecates the old Bulk() API and its associated methods and provides the bulkWrite() method, but it doesn't provide the $split operator, so the only option we have here is to use the mapReduce() method to transform our data and then update the collection using bulk operations.
var mapFunction = function() {
    var person = {},
        // Split on runs of commas and whitespace: "David, age 102" -> ["David", "age", "102"]
        infos = this.person.split(/[,\s]+/);
    person["name"] = infos[0];
    person["age"] = infos[2];
    emit(this._id, person);
};
// Every key is a unique _id, so the reduce function is never called
var results = db.coll.mapReduce(
    mapFunction,
    function(key, val) {},
    { "out": { "inline": 1 } }
)["results"];
var requests = [];
results.forEach(document => {
    requests.push({
        "updateOne": {
            "filter": { "_id": document._id },
            "update": {
                "$set": {
                    "name": document.value.name,
                    "age": Number(document.value.age)
                },
                "$unset": { "person": " " }
            }
        }
    });
    if ( requests.length === 500 ) {
        // Execute per 500 operations and re-init
        db.coll.bulkWrite(requests);
        requests = [];
    }
});
// Clean up queues
if (requests.length > 0) {
    db.coll.bulkWrite(requests);
}
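Because the map function is ordinary JavaScript, it can be dry-run before it is handed to mapReduce(). The sketch below assumes a hand-written emit stub and made-up sample documents; it uses the whitespace-aware delimiter pattern /[,\s]+/ and collects the results in the same { _id, value } shape that inline mapReduce output uses.

```javascript
// Stand-in for the emit() that mapReduce provides on the server (assumption:
// here it only records what the map function produces).
const emitted = [];
function emit(key, value) {
  emitted.push({ _id: key, value: value });
}

// Same logic as the answer's map function, using /[,\s]+/ as the delimiter
function mapFunction() {
  const infos = this.person.split(/[,\s]+/); // ["David", "age", "102"]
  emit(this._id, { name: infos[0], age: infos[2] });
}

// Made-up sample documents; mapReduce binds each document to `this`
const sample = [
  { _id: 1, person: "David, age 102" },
  { _id: 2, person: "Max, age 8" }
];
sample.forEach(doc => mapFunction.call(doc));

console.log(emitted[0].value); // { name: 'David', age: '102' }
```

A dry run like this catches a wrong delimiter pattern or a wrong array index before any document is updated.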
MongoDB version 2.6 or 3.0.
We need to use the now-deprecated Bulk API.
var bulkOp = db.coll.initializeUnorderedBulkOp();
var count = 0;
// `results` is the inline mapReduce output from the previous section
results.forEach(function(document) {
    bulkOp.find({ "_id": document._id }).updateOne(
        {
            "$set": {
                "name": document.value.name,
                "age": Number(document.value.age)
            },
            "$unset": { "person": " " }
        }
    );
    count++;
    if (count % 500 === 0) {
        // Execute per 500 operations and re-init
        bulkOp.execute();
        bulkOp = db.coll.initializeUnorderedBulkOp();
    }
});
// Clean up queues: flush only if operations remain since the last execute
if (count % 500 !== 0) {
    bulkOp.execute();
}