Python MongoDB: How do you remove duplicate documents from a collection?
To remove duplicate documents from a MongoDB collection, follow these steps:
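- Connect to the MongoDB server and get a handle to the collection:
import pymongo

# Connect to a local MongoDB server (adjust the URI for your deployment)
client = pymongo.MongoClient("mongodb://localhost:27017/")
db = client["mydatabase"]
collection = db["mycollection"]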
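- Use aggregate() to build an aggregation query: the "$group" stage groups the documents by their field1 value, "$addToSet" collects the _id of every document in each group, and "$match" keeps only the groups that contain more than one document:
pipeline = [
    # Group by field1, collecting the _ids and counting the documents in each group
    {"$group": {"_id": "$field1", "unique_ids": {"$addToSet": "$_id"}, "count": {"$sum": 1}}},
    # Keep only the groups that contain duplicates
    {"$match": {"count": {"$gt": 1}}},
]
duplicates = list(collection.aggregate(pipeline))
This returns a list with one entry per duplicated field1 value: "_id" holds that value, "unique_ids" holds the _ids of the documents that share it, and "count" is the number of such documents.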
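To make the shape of that result concrete, the following sketch reproduces the grouping in plain Python, without a database. The sample documents and the helper name find_duplicate_groups are assumptions made for illustration; this is not how MongoDB executes the pipeline, only an equivalent computation on a small in-memory list.

```python
from collections import defaultdict


def find_duplicate_groups(docs, field):
    """Mimic the $group + $match stages on an in-memory list of dicts."""
    ids_by_value = defaultdict(list)
    for doc in docs:
        # Like "$group" keyed on the field; a missing field is grouped under None
        ids_by_value[doc.get(field)].append(doc["_id"])
    # Like "$match" with count > 1: keep only values shared by several documents
    return [
        {"_id": value, "unique_ids": ids, "count": len(ids)}
        for value, ids in ids_by_value.items()
        if len(ids) > 1
    ]


# Hypothetical sample data
sample_docs = [
    {"_id": 1, "field1": "hello"},
    {"_id": 2, "field1": "world"},
    {"_id": 3, "field1": "hello"},
]
groups = find_duplicate_groups(sample_docs, "field1")
print(groups)  # [{'_id': 'hello', 'unique_ids': [1, 3], 'count': 2}]
```

Only "hello" appears, because it is the only value shared by more than one document. Note that the real "$addToSet" operator does not guarantee the order of the collected _ids, whereas this sketch happens to keep insertion order.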
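- Iterate over the list of duplicates and, in each group, delete every document except the one with the smallest _id:
for duplicate in duplicates:
    # $addToSet does not guarantee element order, so sort the _ids
    # to make the choice of the kept document deterministic
    ids = sorted(duplicate["unique_ids"])
    for doc_id in ids[1:]:
        collection.delete_one({"_id": doc_id})
This removes the duplicates from the collection, keeping only the document with the smallest _id in each group.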
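Calling delete_one once per document costs one round trip per duplicate. A possible variation, sketched below, removes each group's surplus documents with a single delete_many call and an "$in" filter. The function name remove_duplicates and its parameters are hypothetical; aggregate, delete_many, and deleted_count are the PyMongo names the sketch relies on, and it assumes the _id values within a group can be compared with each other.

```python
def remove_duplicates(collection, field):
    """Delete all but the smallest-_id document for each duplicated value of `field`.

    `collection` is expected to behave like a pymongo Collection.
    Returns the number of documents deleted.
    """
    pipeline = [
        {"$group": {"_id": "$" + field, "unique_ids": {"$addToSet": "$_id"}, "count": {"$sum": 1}}},
        {"$match": {"count": {"$gt": 1}}},
    ]
    deleted = 0
    for group in collection.aggregate(pipeline):
        # Sort so the kept document does not depend on $addToSet's unspecified order
        surplus_ids = sorted(group["unique_ids"])[1:]
        result = collection.delete_many({"_id": {"$in": surplus_ids}})
        deleted += result.deleted_count
    return deleted
```

With the collection handle from the first step, this would be called as remove_duplicates(collection, "field1").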
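Complete code example:
import pymongo

# Connect to a local MongoDB server and select the collection
client = pymongo.MongoClient("mongodb://localhost:27017/")
db = client["mydatabase"]
collection = db["mycollection"]

# Find the field1 values that are shared by more than one document
pipeline = [
    {"$group": {"_id": "$field1", "unique_ids": {"$addToSet": "$_id"}, "count": {"$sum": 1}}},
    {"$match": {"count": {"$gt": 1}}},
]
duplicates = list(collection.aggregate(pipeline))

# In each group, keep the document with the smallest _id and delete the rest
for duplicate in duplicates:
    # $addToSet does not guarantee element order, so sort the _ids first
    ids = sorted(duplicate["unique_ids"])
    for doc_id in ids[1:]:
        collection.delete_one({"_id": doc_id})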
For example, suppose we have the following documents, in which field1 contains a duplicate value ("hello" appears in the documents with _id 1 and 3):
[
    {"_id": 1, "field1": "hello", "field2": "world"},
    {"_id": 2, "field1": "world", "field2": "hello"},
    {"_id": 3, "field1": "hello", "field2": "foo"},
    {"_id": 4, "field1": "foo", "field2": "bar"},
    {"_id": 5, "field1": "bar", "field2": "baz"},
    {"_id": 6, "field1": "baz", "field2": "pidancode.com"}
]
After running the code above, only the following documents remain:
[
    {"_id": 1, "field1": "hello", "field2": "world"},
    {"_id": 2, "field1": "world", "field2": "hello"},
    {"_id": 4, "field1": "foo", "field2": "bar"},
    {"_id": 5, "field1": "bar", "field2": "baz"},
    {"_id": 6, "field1": "baz", "field2": "pidancode.com"}
]