How do I unit test custom RecordReader and InputFormat classes?

2022-01-13 | tags: unit-testing, hadoop, mapreduce, java

I have developed a MapReduce program and written custom RecordReader and InputFormat classes for it.

I am using MRUnit and Mockito to unit test the mapper and reducer.

How should I unit test the custom RecordReader and InputFormat classes? What is the preferred way to test them?

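
One approach is to drive the classes directly, without MRUnit: point a FileSplit at a small local test file, build a TaskAttemptContext from a Configuration, ask the InputFormat for its RecordReader, and call initialize() yourself, since the framework normally does that. The example below assumes Hadoop 2.x or later (where TaskAttemptContextImpl lives in org.apache.hadoop.mapreduce.task) and JUnit 4; "path/to/file" is a placeholder for your test input.

```java
import java.io.File;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.InputFormat;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.TaskAttemptID;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl;
import org.apache.hadoop.util.ReflectionUtils;
import org.junit.Test;

public class MyInputFormatTest {

    @Test
    public void readsRecordsFromLocalFile() throws Exception {
        // Run against the local file system; no HDFS or cluster is needed.
        // (fs.defaultFS replaces the deprecated fs.default.name key.)
        Configuration conf = new Configuration(false);
        conf.set("fs.defaultFS", "file:///");

        // One split covering the whole test input file.
        File testFile = new File("path/to/file");
        Path path = new Path(testFile.getAbsoluteFile().toURI());
        FileSplit split = new FileSplit(path, 0, testFile.length(), null);

        InputFormat<?, ?> inputFormat = ReflectionUtils.newInstance(MyInputFormat.class, conf);
        TaskAttemptContext context = new TaskAttemptContextImpl(conf, new TaskAttemptID());
        RecordReader<?, ?> reader = inputFormat.createRecordReader(split, context);

        // The framework normally calls initialize(); a test has to call it explicitly.
        reader.initialize(split, context);
        try {
            while (reader.nextKeyValue()) {
                // Assert on reader.getCurrentKey() and reader.getCurrentValue() here.
            }
        } finally {
            reader.close();
        }
    }
}
```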
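To cover the InputFormat side as well (its getSplits() logic) and to check that the RecordReader neither drops nor duplicates records at split boundaries, a sketch along these lines may help. It writes a small temp file, forces several splits with FileInputFormat.setMaxInputSplitSize, reads every split with a fresh reader, and collects the values. This is an illustration under assumptions, not a prescribed method: Hadoop's own TextInputFormat stands in for the custom MyInputFormat, the names InputFormatHarness and readAllValues are hypothetical, a Hadoop 2.x or later client is assumed on the classpath, and values are compared through toString(), so the custom value type needs a meaningful one.

```java
import java.io.File;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.InputFormat;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.TaskAttemptID;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl;
import org.apache.hadoop.util.ReflectionUtils;

// Hypothetical helper class; not part of Hadoop or of the original question.
public class InputFormatHarness {

    /** Runs getSplits() and then reads every split, returning all values in order. */
    public static List<String> readAllValues(Class<? extends InputFormat<?, ?>> formatClass,
                                             File input, long maxSplitSize) throws Exception {
        Configuration conf = new Configuration(false);
        conf.set("fs.defaultFS", "file:///");

        // getSplits() takes a JobContext; Job implements it and holds the input settings.
        Job job = Job.getInstance(conf);
        FileInputFormat.setInputPaths(job, new Path(input.getAbsoluteFile().toURI()));
        // A small max split size forces several splits, exercising boundary handling.
        FileInputFormat.setMaxInputSplitSize(job, maxSplitSize);

        InputFormat<?, ?> format = ReflectionUtils.newInstance(formatClass, job.getConfiguration());
        List<String> values = new ArrayList<>();
        for (InputSplit split : format.getSplits(job)) {
            // Each split gets its own context and reader, as in a real map task.
            TaskAttemptContext ctx =
                    new TaskAttemptContextImpl(job.getConfiguration(), new TaskAttemptID());
            RecordReader<?, ?> reader = format.createRecordReader(split, ctx);
            reader.initialize(split, ctx);
            try {
                while (reader.nextKeyValue()) {
                    values.add(String.valueOf(reader.getCurrentValue()));
                }
            } finally {
                reader.close();
            }
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        File input = File.createTempFile("records", ".txt");
        input.deleteOnExit();
        Files.write(input.toPath(), "alpha\nbeta\ngamma\n".getBytes(StandardCharsets.UTF_8));

        // TextInputFormat stands in for the custom MyInputFormat (assumption).
        List<String> manySplits = readAllValues(TextInputFormat.class, input, 6);
        List<String> oneSplit = readAllValues(TextInputFormat.class, input, Long.MAX_VALUE);
        System.out.println(manySplits + " vs " + oneSplit);
    }
}
```

Comparing the multi-split result with a single-split read of the same file is a cheap way to catch records that straddle a split boundary being read twice or skipped by a custom RecordReader.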
