Hadoop options are not having any effect (mapreduce.input.lineinputformat.linespermap, mapred.max.map.failures.percent)

2022-01-13 00:00:00 hadoop mapreduce java

I am trying to implement a MapReduce job in which each mapper takes 150 lines of the text file and all the mappers run simultaneously; also, the job should not fail, no matter how many map tasks fail.

Here's the configuration part:

    JobConf conf = new JobConf(Main.class);
    conf.setJobName("My mapreduce");
    conf.set("mapreduce.input.lineinputformat.linespermap", "150");
    conf.set("mapred.max.map.failures.percent", "100");
    conf.setInputFormat(NLineInputFormat.class);
    FileInputFormat.addInputPath(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));

The problem is that Hadoop creates a mapper for every single line of text, the mappers seem to run sequentially, and if a single one fails, the job fails.

From this I deduce that the settings I've applied do not have any effect.

What am I doing wrong?

Recommended answer

If you want to quickly find the correct names of the options for Hadoop's new API, use this link: http://pydoop.sourceforge.net/docs/examples/intro.html#hadoop-0-21-0-notes .
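To illustrate the naming issue, here is a minimal sketch, assuming the settings are being ignored because the installed Hadoop version expects the other spelling of each option. The two old/new name pairs are the ones given in Hadoop's deprecated-properties table; the class HadoopOptionNames and the method withBothNames are hypothetical helpers written for this example and are not part of Hadoop.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of Hadoop): expands a set of job settings so
// that each known option is present under both its old and its new name.
public class HadoopOptionNames {

    // Old (mapred.*) name -> new (mapreduce.*) name, as listed in Hadoop's
    // deprecated-properties table.
    static final Map<String, String> OLD_TO_NEW = new LinkedHashMap<>();
    static {
        OLD_TO_NEW.put("mapred.line.input.format.linespermap",
                       "mapreduce.input.lineinputformat.linespermap");
        OLD_TO_NEW.put("mapred.max.map.failures.percent",
                       "mapreduce.map.failures.maxpercent");
    }

    // Returns a copy of the settings in which every option known to
    // OLD_TO_NEW appears under both spellings with the same value.
    static Map<String, String> withBothNames(Map<String, String> settings) {
        Map<String, String> out = new LinkedHashMap<>(settings);
        for (Map.Entry<String, String> pair : OLD_TO_NEW.entrySet()) {
            String oldName = pair.getKey();
            String newName = pair.getValue();
            if (settings.containsKey(oldName)) {
                out.putIfAbsent(newName, settings.get(oldName));
            }
            if (settings.containsKey(newName)) {
                out.putIfAbsent(oldName, settings.get(newName));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // The two settings exactly as written in the question.
        Map<String, String> asked = new LinkedHashMap<>();
        asked.put("mapreduce.input.lineinputformat.linespermap", "150");
        asked.put("mapred.max.map.failures.percent", "100");

        // Print every name/value pair that would be applied to the job.
        withBothNames(asked).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

In the job setup, each resulting pair could then be applied with conf.set(name, value). Whether this resolves the behaviour described in the question depends on the Hadoop version in use, so treat it as a way to rule out a name mismatch rather than a confirmed fix.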
