Hadoop FileSystem closed exception when doing BufferedReader.close()
From within the Reduce setup method, I am trying to close a BufferedReader object and getting a FileSystem closed exception. It does not happen all the time. This is the piece of code I used to create the BufferedReader:
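import java.io.BufferedReader;
import java.io.InputStreamReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

String fileName = "<some HDFS file path>";
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf); // returns a cached instance shared with other callers
Path hdfsPath = new Path(fileName);
FSDataInputStream in = fs.open(hdfsPath);
InputStreamReader inputStreamReader = new InputStreamReader(in);
BufferedReader bufferedReader = new BufferedReader(inputStreamReader);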
I read contents from the bufferedReader and once all the reading is done, I close it.
This is the piece of code that reads it:
String line;
while ((line = bufferedReader.readLine()) != null) {
    // Do something with each line
}
This is the piece of code that closes the reader:
if (bufferedReader != null) {
bufferedReader.close();
}
This is the stack trace for the exception that happens when I do a bufferedReader.close():
I, [2013-11-18T04:56:51.601135 #25683] INFO -- : attempt_201310111840_142285_r_000009_0: at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:565)
I, [2013-11-18T04:56:51.601168 #25683] INFO -- : attempt_201310111840_142285_r_000009_0: at org.apache.hadoop.hdfs.DFSInputStream.close(DFSInputStream.java:522)
I, [2013-11-18T04:56:51.601199 #25683] INFO -- : attempt_201310111840_142285_r_000009_0: at java.io.FilterInputStream.close(FilterInputStream.java:155)
I, [2013-11-18T04:56:51.601230 #25683] INFO -- : attempt_201310111840_142285_r_000009_0: at sun.nio.cs.StreamDecoder.implClose(StreamDecoder.java:358)
I, [2013-11-18T04:56:51.601263 #25683] INFO -- : attempt_201310111840_142285_r_000009_0: at sun.nio.cs.StreamDecoder.close(StreamDecoder.java:173)
I, [2013-11-18T04:56:51.601356 #25683] INFO -- : attempt_201310111840_142285_r_000009_0: at java.io.InputStreamReader.close(InputStreamReader.java:182)
I, [2013-11-18T04:56:51.601395 #25683] INFO -- : attempt_201310111840_142285_r_000009_0: at java.io.BufferedReader.close(BufferedReader.java:497)
I am not sure why this exception is happening. This is not multithreaded, so I do not expect a race condition of any sort. Can you please help me understand?
Thanks,
文克
Recommended answer
There is a little-known gotcha with the Hadoop FileSystem API: FileSystem.get returns the same cached object for every invocation with the same filesystem (the cache is keyed by URI scheme, authority, and user). So if one is closed anywhere, they are all closed. You could debate the merits of this decision, but that's the way it is.
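The effect can be reproduced without a cluster. The following is a minimal, stdlib-only Java model, not Hadoop code: `ModelFileSystem`, its `get`, `open`, and `checkOpen` methods, and the key `"hdfs://namenode"` are all hypothetical stand-ins. It assumes the behaviour described above (one cached instance per key) and imitates the top frames of the question's stack trace, where closing the reader reaches `DFSInputStream.close()`, which calls `DFSClient.checkOpen()`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, stdlib-only model of Hadoop's FileSystem cache; NOT the Hadoop API.
public class SharedFsDemo {

    static final class ModelFileSystem {
        // One shared instance per key, imitating the cache behind FileSystem.get
        private static final Map<String, ModelFileSystem> CACHE = new HashMap<>();
        private final String key;
        private boolean closed = false;

        private ModelFileSystem(String key) { this.key = key; }

        static ModelFileSystem get(String key) {
            return CACHE.computeIfAbsent(key, ModelFileSystem::new);
        }

        // Stand-in for DFSClient.checkOpen(); message modeled on DFSClient's
        void checkOpen() throws IOException {
            if (closed) throw new IOException("Filesystem closed");
        }

        // Closing through ANY handle closes the single shared instance and evicts it
        void close() {
            closed = true;
            CACHE.remove(key, this);
        }

        // Empty stream whose close() checks the filesystem first, like DFSInputStream.close()
        InputStream open() throws IOException {
            checkOpen();
            return new InputStream() {
                @Override public int read() { return -1; }
                @Override public void close() throws IOException { checkOpen(); }
            };
        }
    }

    // Opens a reader, lets "other" code close its handle, then closes the reader.
    static String run() {
        ModelFileSystem mine = ModelFileSystem.get("hdfs://namenode");
        ModelFileSystem other = ModelFileSystem.get("hdfs://namenode"); // same object as mine
        try {
            BufferedReader reader = new BufferedReader(new InputStreamReader(mine.open()));
            other.close();  // e.g. a helper elsewhere that calls fs.close()
            reader.close(); // propagates down to the stream, which calls checkOpen()
            return "ok";
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(ModelFileSystem.get("a") == ModelFileSystem.get("a"));
        System.out.println(run());
    }
}
```

Running `main` prints `true` (both calls return the same object) and then `Filesystem closed`: the reader was never closed by its owner, yet its `close()` fails because another handle to the same instance was closed first.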
So when you close your BufferedReader, the close propagates down to the underlying DFSInputStream, which checks that its DFSClient is still open (DFSClient.checkOpen at the top of the stack trace). If the shared FileSystem has already been closed, that check fails and you get this error. Check your code for any other places where you close a FileSystem object, and look for race conditions. Also, I believe Hadoop itself will at some point close the FileSystem (the FileSystem cache closes all cached instances in a JVM shutdown hook), so to be safe, you should probably only access it from within the Reducer's setup, reduce, or cleanup methods (or configure, reduce, and close, depending on which API you're using).
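One way to stay out of this trap is to work on an instance that no other code can close. In real Hadoop code that would be `FileSystem.newInstance(conf)`, which bypasses the cache and returns an instance you must close yourself (setting `fs.hdfs.impl.disable.cache` to `true` has a similar effect for every `FileSystem.get`). The sketch below is a hypothetical, stdlib-only model of that idea, not Hadoop code: `ModelFs`, `get`, `newInstance`, and `open` only echo Hadoop's names. It also relies on try-with-resources closing resources in reverse order, so the reader is closed while the filesystem it depends on is still open.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, stdlib-only model; names echo Hadoop's but this is NOT the Hadoop API.
public class PrivateFsDemo {

    static final class ModelFs implements AutoCloseable {
        private static final Map<String, ModelFs> CACHE = new HashMap<>();
        private boolean closed = false;

        // Shared, cached instance (the role FileSystem.get plays)
        static ModelFs get(String key) {
            return CACHE.computeIfAbsent(key, k -> new ModelFs());
        }

        // Uncached instance owned by the caller (the role FileSystem.newInstance plays)
        static ModelFs newInstance() {
            return new ModelFs();
        }

        @Override
        public void close() {
            closed = true;
        }

        // In-memory "file" whose close() fails once this instance has been closed
        InputStream open(String contents) {
            return new ByteArrayInputStream(contents.getBytes(StandardCharsets.UTF_8)) {
                @Override
                public void close() throws IOException {
                    if (closed) throw new IOException("Filesystem closed");
                }
            };
        }
    }

    static String readWithPrivateInstance() {
        // Some other code closes the shared instance while we are working.
        ModelFs.get("hdfs://namenode").close();
        // Resources close in reverse order: reader first, then the private fs.
        try (ModelFs fs = ModelFs.newInstance();
             BufferedReader reader = new BufferedReader(
                     new InputStreamReader(fs.open("key\tvalue"), StandardCharsets.UTF_8))) {
            return reader.readLine();
        } catch (IOException e) {
            return "failed: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(readWithPrivateInstance());
    }
}
```

The read succeeds even though the shared instance was closed, because the reader only depends on the private instance. The trade-off is that each uncached instance holds its own client resources, so it should be closed once it is no longer needed, for example in the Reducer's cleanup method.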