What do job.setOutputKeyClass and job.setOutputReduceClass refer to?

2022-01-13 00:00:00 hadoop mapreduce java

I thought that they referred to the Reducer, but in my program I have

public static class MyMapper extends Mapper

public static class MyReducer extends Reducer<Text, Text, NullWritable, Text>

If I have

job.setOutputKeyClass(NullWritable.class);

job.setOutputValueClass(Text.class);

I get the following exception

Type mismatch in key from map: expected org.apache.hadoop.io.NullWritable, received org.apache.hadoop.io.Text

But if I have

job.setOutputKeyClass(Text.class);

there is no problem.

Is there something wrong with my code, or does this happen because of NullWritable or something else?

Also, do I have to use job.setInputFormatClass and job.setOutputFormatClass? My programs run correctly without them.

Recommended answer

Calling job.setOutputKeyClass(NullWritable.class); will set the types expected as output from both the map and reduce phases.
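
To make the exception in the question easier to follow, here is a minimal, Hadoop-free sketch of the behaviour described above. It assumes that the map output key class falls back to the job-wide output key class when it is not set separately, and that map output keys are checked against that class at runtime. The class `MapKeyCheckSketch` and the stand-in types `TextKey` and `NullKey` are hypothetical names for illustration, not Hadoop classes.

```java
import java.io.IOException;

/** Simplified, Hadoop-free model of how the map output key class is resolved and checked. */
final class MapKeyCheckSketch {

    // Hypothetical stand-ins for org.apache.hadoop.io.Text and NullWritable.
    static final class TextKey {}
    static final class NullKey {}

    private Class<?> outputKeyClass = Object.class; // job-wide output key class
    private Class<?> mapOutputKeyClass;             // null until set explicitly

    void setOutputKeyClass(Class<?> keyClass) {
        this.outputKeyClass = keyClass;
    }

    void setMapOutputKeyClass(Class<?> keyClass) {
        this.mapOutputKeyClass = keyClass;
    }

    /** Falls back to the job-wide output key class when no map-specific class is set. */
    Class<?> getMapOutputKeyClass() {
        return mapOutputKeyClass != null ? mapOutputKeyClass : outputKeyClass;
    }

    /** Rejects a map output key whose runtime class differs from the expected one. */
    void collectFromMap(Object key) throws IOException {
        Class<?> expected = getMapOutputKeyClass();
        if (key.getClass() != expected) {
            throw new IOException("Type mismatch in key from map: expected "
                    + expected.getName() + ", received " + key.getClass().getName());
        }
    }

    public static void main(String[] args) {
        MapKeyCheckSketch job = new MapKeyCheckSketch();

        // Only the job-wide key class is set, as in the question.
        job.setOutputKeyClass(NullKey.class);
        try {
            job.collectFromMap(new TextKey());
        } catch (IOException e) {
            System.out.println(e.getMessage()); // the mismatch is reported here
        }

        // Declaring the map output key class separately removes the mismatch.
        job.setMapOutputKeyClass(TextKey.class);
        try {
            job.collectFromMap(new TextKey());
            System.out.println("map key accepted");
        } catch (IOException e) {
            System.out.println("unexpected: " + e.getMessage());
        }
    }
}
```

In this model, setting only the job-wide key class to the reducer's type makes the mapper's key fail the check, which mirrors the "Type mismatch in key from map" message in the question.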

If your Mapper emits different types than the Reducer, you can set the types emitted by the mapper with the JobConf's setMapOutputKeyClass() and setMapOutputValueClass() methods. These implicitly set the input types expected by the Reducer.

(source: Yahoo Developer Tutorial)
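
Applied to the question, a driver might look like the sketch below. It uses the newer `org.apache.hadoop.mapreduce.Job` class, which offers the same `setMapOutputKeyClass` and `setMapOutputValueClass` methods. The mapper's type parameters are an assumption (the question does not show them): `Text` keys and values are inferred from the reducer's input types and from the exception message, and `LongWritable`/`Text` input from the default input format. The mapper body and the class name `OutputTypesDriver` are placeholders, and the block needs the Hadoop libraries on the classpath.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class OutputTypesDriver {

    // Assumed signature: input (LongWritable, Text), output (Text, Text).
    public static class MyMapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            // Placeholder logic: emit the line as both key and value.
            context.write(line, line);
        }
    }

    public static class MyReducer extends Reducer<Text, Text, NullWritable, Text> {
        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            // Drop the key and write each value on its own.
            for (Text value : values) {
                context.write(NullWritable.get(), value);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "output types sketch");
        job.setJarByClass(OutputTypesDriver.class);
        job.setMapperClass(MyMapper.class);
        job.setReducerClass(MyReducer.class);

        // Types emitted by the mapper, which are also the reducer's input types.
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);

        // Types emitted by the reducer, i.e. the job's final output.
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

With the map output types declared separately, `setOutputKeyClass(NullWritable.class)` describes only what the reducer writes, so the mapper's `Text` keys no longer conflict with it.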

Regarding your second question, the default InputFormat is TextInputFormat. This treats each line of each input file as a separate record and performs no parsing. You can call these methods if you need to process your input in a different format. Here are some examples:

InputFormat             | Description                                      | Key                                      | Value
--------------------------------------------------------------------------------------------------------------------------------------------------------
TextInputFormat         | Default format; reads lines of text files        | The byte offset of the line              | The line contents
KeyValueInputFormat     | Parses lines into key, val pairs                 | Everything up to the first tab character | The remainder of the line
SequenceFileInputFormat | A Hadoop-specific high-performance binary format | user-defined                             | user-defined
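
The first two rows of the table can be illustrated with a small, Hadoop-free sketch of how a line turns into a (key, value) record under each format. `RecordSplitSketch` and `Pair` are hypothetical names. The handling of a line without a tab (whole line as key, empty value) is an assumption about the key/value format, and keys are kept as strings here for simplicity.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Simplified model of record creation; not the Hadoop implementation. */
final class RecordSplitSketch {

    /** Hypothetical holder for one (key, value) record. */
    static final class Pair {
        final String key;
        final String value;

        Pair(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    /** Text-style records: key is the byte offset of the line, value is the line itself. */
    static List<Pair> asTextRecords(String content) {
        List<Pair> records = new ArrayList<>();
        long offset = 0;
        for (String line : content.split("\n")) {
            records.add(new Pair(Long.toString(offset), line));
            // Advance past the line's bytes plus the newline that ended it.
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return records;
    }

    /** Key/value-style record: split at the first tab character. */
    static Pair asKeyValueRecord(String line) {
        int tab = line.indexOf('\t');
        if (tab < 0) {
            // Assumption: without a tab, the whole line is the key and the value is empty.
            return new Pair(line, "");
        }
        return new Pair(line.substring(0, tab), line.substring(tab + 1));
    }

    public static void main(String[] args) {
        for (Pair p : asTextRecords("a\tb\ncd\n")) {
            System.out.println("[" + p.key + "] [" + p.value + "]");
        }
        Pair kv = asKeyValueRecord("a\tb\tc");
        System.out.println("[" + kv.key + "] [" + kv.value + "]");
    }
}
```

In a real job the choice between these behaviours is made with `job.setInputFormatClass(...)`. In current Hadoop releases the key/value format is, as far as I know, the class `KeyValueTextInputFormat`.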

The default instance of OutputFormat is TextOutputFormat, which writes (key, value) pairs on individual lines of a text file. Some examples are below:

OutputFormat             | Description
---------------------------------------------------------------------------------------------------------
TextOutputFormat         | Default; writes lines in "key \t value" form
SequenceFileOutputFormat | Writes binary files suitable for reading into subsequent MapReduce jobs
NullOutputFormat         | Disregards its inputs

(source: another Yahoo Developer Tutorial)
