Hadoop options have no effect (mapreduce.input.lineinputformat.linespermap, mapred.max.map.failures.percent)

2022-01-13 00:00:00 hadoop mapreduce java

I am trying to implement a MapReduce job in which each mapper takes 150 lines of the text file and all the mappers run simultaneously. Also, the job should not fail, no matter how many map tasks fail.

Here is the configuration part:

        JobConf conf = new JobConf(Main.class);
        conf.setJobName("My mapreduce");

        conf.set("mapreduce.input.lineinputformat.linespermap", "150");
        conf.set("mapred.max.map.failures.percent","100");

        conf.setInputFormat(NLineInputFormat.class);

        FileInputFormat.addInputPath(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

The problem is that Hadoop creates a mapper for every single line of text, the mappers seem to run sequentially, and if a single one fails, the whole job fails.

From this I conclude that the settings I've applied have no effect.

What am I doing wrong?

Recommended answer

The option names in your configuration are probably not the ones your Hadoop version reads. To quickly find the correct property names for Hadoop's new API, use this link: http://pydoop.sourceforge.net/docs/examples/intro.html#hadoop-0-21-0-notes .
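For context: NLineInputFormat falls back to one line per split when it finds no value under the key it reads, which is consistent with the one-mapper-per-line behavior described in the question. One version-agnostic workaround (a sketch, not taken from the answer) is to set each option under both its pre-0.21 name and its 0.21+ name, so whichever name your Hadoop version reads is present. The class `JobProperties` and method `build` are hypothetical, and the old/new name pairs are the ones I believe Hadoop's deprecated-properties mapping uses; check them against the table linked above for your version. The sketch is plain Java so it runs without Hadoop on the classpath.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical helper: collects job properties under both the pre-0.21
 * ("mapred.*") names and the 0.21+ ("mapreduce.*") names, so the setting
 * is found whichever name the running Hadoop version reads.
 */
public class JobProperties {

    // Returns key/value pairs meant to be passed to JobConf.set(key, value).
    static Map<String, String> build(int linesPerMap, int maxMapFailurePercent) {
        Map<String, String> props = new LinkedHashMap<>();
        String lines = Integer.toString(linesPerMap);
        String failures = Integer.toString(maxMapFailurePercent);

        // Number of input lines per split (and thus per mapper) for NLineInputFormat.
        props.put("mapred.line.input.format.linespermap", lines);        // assumed pre-0.21 name
        props.put("mapreduce.input.lineinputformat.linespermap", lines); // 0.21+ name

        // Percentage of map tasks allowed to fail before the whole job fails.
        props.put("mapred.max.map.failures.percent", failures);          // pre-0.21 name
        props.put("mapreduce.map.failures.maxpercent", failures);        // assumed 0.21+ name
        return props;
    }

    public static void main(String[] args) {
        // Print each property as key=value, in insertion order.
        build(150, 100).forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

With a real `JobConf`, you would loop over `build(150, 100).entrySet()` and call `conf.set(e.getKey(), e.getValue())` for each entry. For the failure threshold specifically, the old API's `JobConf.setMaxMapTaskFailuresPercent(int)` sets the property under whatever name that Hadoop version expects, which avoids guessing names.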
