Equivalent of mongo's out:reduce option in Hadoop

2022-01-13 00:00:00 mongodb hadoop mapreduce java

I'm rewriting a MongoDB map reduce job to use Hadoop instead (using the mongo-hadoop connector), but when I map two datasets to the same collection, it overwrites the existing values instead of reducing them together with the new ones.

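{ reduce : "collectionName" } - If documents exist for a given key in the result set and in the old collection, then a reduce operation (using the specified reduce function) will be performed on the two values and the result will be written to the output collection. If a finalize function was provided, this will be run after the reduce as well.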
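To make the behaviour described in that quote concrete, here is a minimal plain-Java sketch of the merge semantics. It does not use MongoDB or mongo-hadoop at all: the class `OutReduceSketch`, the method `mergeWithReduce`, and the sample keys and values are hypothetical names made up for illustration, and in-memory maps stand in for the output collection and the new result set.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BinaryOperator;

// Hypothetical illustration only: in-memory maps stand in for collections.
class OutReduceSketch {

    // Folds each new result into the existing output.
    // If the key already exists, the reduce function combines old and new values;
    // otherwise the new value is simply stored.
    static <K, V> void mergeWithReduce(Map<K, V> existing,
                                       Map<K, V> results,
                                       BinaryOperator<V> reduce) {
        for (Map.Entry<K, V> entry : results.entrySet()) {
            existing.merge(entry.getKey(), entry.getValue(), reduce);
        }
    }

    public static void main(String[] args) {
        // Output of the first job (the "old collection").
        Map<String, Integer> output = new TreeMap<>();
        output.put("a", 1);
        output.put("b", 2);

        // Output of the second job (the new "result set").
        Map<String, Integer> second = new TreeMap<>();
        second.put("b", 10);
        second.put("c", 5);

        // Summation plays the role of the reduce function here.
        mergeWithReduce(output, second, Integer::sum);

        System.out.println(output); // prints {a=1, b=12, c=5}
    }
}
```

A plain overwrite, which is what I'm seeing, would leave `b` at 10 rather than 12.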

How is this done using mongo-hadoop?



To anyone else looking for this, support for multiple input is coming soon.


The branch with the change is located here. It's pretty well done; we're using it in production.
