JVM crashes in Lucene DataInput.readVInt

My JVM (1.6.0_29) keeps crashing on intensive use when indexing documents with Lucene. I get:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00002b6b196d767c, pid=26417, tid=1183217984
#
# JRE version: 6.0_29-b11
# Java VM: Java HotSpot(TM) 64-Bit Server VM (20.4-b02 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# J  org.apache.lucene.store.DataInput.readVInt()I
#
# If you would like to submit a bug report, please visit:
#   http://java.sun.com/webapps/bugreport/crash.jsp
#

Environment:

JDK: 1.6u29 (same issue with 1.6_02)

Lucene version: 3.4.0

vm_info: Java HotSpot(TM) 64-Bit Server VM (20.4-b02) for linux-amd64 JRE (1.6.0_29-b11), built on Oct 3 2011 01:19:20 by "java_re" with gcc 3.2.2 (SuSE Linux)

OS: CentOS release 5.0 (Final)

jvm_args: -Dcatalina.home=/var/local/tomcat-8081 -Dcatalina.base=/var/local/tomcat-8081 -Djava.io.tmpdir=/var/tmp -Dfile.encoding=UTF-8 -Xmx1024M -XX:MaxPermSize=96m

It seems to be a JDK issue that was fixed in JDK 1.7, although other issues were introduced there. See https://issues.apache.org/jira/browse/LUCENE-3335: "Java 7 contains a fix to the readVInt issue since 1.6.0_21 (approx, LUCENE-2975)"

So, how can I fix this issue while staying on JDK 1.6? Or should I upgrade to JDK 1.7?

Recommended answer

These JDK issues are also fixed in 1.6.0_29 (not only in 1.7.0u1), so readVInt can no longer fail because of them. Your crash is therefore not related to any of the "famous Java 6/7 bugs". The vint bug never crashed the JVM in the first place: it only corrupted the index by returning wrong values, and that one has definitely been fixed since Lucene 3.1.

But there is another way you can crash your JVM. You are on a 64-bit platform (Linux), so the default directory implementation is MMapDirectory. Lucene uses a hack to unmap mapped files from the virtual address space. The JVM itself does not allow explicit unmapping; it leaves unmapping to the garbage collector, which is a problem for Lucene. By default, MMapDirectory unmaps the files as soon as the IndexInputs are closed. MMapDirectory is not synchronized at all, so when another thread tries to access an IndexInput after it has been unmapped, it reads from an unmapped address and the JVM dies with SIGSEGV.
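
To make the mechanism concrete, here is a minimal sketch using only the JDK (no Lucene classes). It maps a small temporary file and decodes one variable-length integer from the mapped memory, the kind of read that readVInt performs on a mapped index file. The class name `MappedVIntSketch`, the helper `roundTrip` and the sample value 300 are made up for illustration; the decoding loop follows the documented VInt layout (7 data bits per byte, low-order group first, high bit set when another byte follows) and is not Lucene's actual source.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

// Illustrative only: not Lucene code.
class MappedVIntSketch {

    // Decodes one VInt: 7 data bits per byte, low-order group first;
    // a set high bit means that another byte follows.
    static int readVInt(MappedByteBuffer in) {
        byte b = in.get();
        int value = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = in.get();
            value |= (b & 0x7F) << shift;
        }
        return value;
    }

    // Writes the VInt encoding of 300 to a temp file, maps it, and decodes it again.
    static int roundTrip() {
        try {
            File file = File.createTempFile("vint", ".bin");
            file.deleteOnExit();
            try (FileOutputStream out = new FileOutputStream(file)) {
                // 300 = (2 << 7) | 0x2C, so the bytes are 0x2C|0x80 = 0xAC, then 0x02.
                out.write(new byte[] {(byte) 0xAC, 0x02});
            }
            try (RandomAccessFile raf = new RandomAccessFile(file, "r");
                 FileChannel channel = raf.getChannel()) {
                MappedByteBuffer mapped =
                        channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
                // Each get() is a plain memory read from the mapping. MappedByteBuffer
                // has no public unmap method; the mapping normally stays valid until
                // the buffer is garbage collected, even after the channel is closed.
                return readVInt(mapped);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip());
    }
}
```

In plain Java this read is safe because the mapping outlives the channel. Once a mapping is forcibly removed while another thread still holds the buffer, the same `get()` call touches an address that no longer exists, which is the SIGSEGV scenario described above. The sketch uses try-with-resources and `UncheckedIOException`, so it needs a newer JDK than the 1.6 in the question.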

If your code is correct, this cannot happen, but it looks like you are using an already closed IndexReader/IndexWriter to access the index. Before Lucene 3.5 (which will come out soon), checks are missing in IndexReader, so an already closed IndexReader, with all of its IndexInputs closed (and unmapped), can still try to access index data and segfault.

In 3.5 we added additional safety checks to prevent this illegal access, but they are not 100% reliable (because synchronization is missing). I would review your code and check that nothing accesses a closed index.
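
One way to make such a review easier is to route every access to a shared reader through a small guard that refuses to hand the reader out once it has been closed. The following is only a sketch of that idea with plain JDK classes: `GuardedResource`, `acquire`, `release` and the demo class are hypothetical names, not Lucene API. Lucene's IndexReader has its own incRef()/decRef() reference counting that serves a similar purpose.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical holder, not part of Lucene: hands out a shared resource only while it is open.
class GuardedResource<T extends Closeable> {
    private final T resource;
    // Starts at 1 (the owner's reference); 0 means the resource has been closed.
    private final AtomicInteger refCount = new AtomicInteger(1);

    GuardedResource(T resource) {
        this.resource = resource;
    }

    // Takes a reference, or throws if the resource was already closed.
    T acquire() {
        while (true) {
            int count = refCount.get();
            if (count <= 0) {
                throw new IllegalStateException("resource already closed");
            }
            if (refCount.compareAndSet(count, count + 1)) {
                return resource;
            }
        }
    }

    // Drops one reference; whoever drops the last one closes the resource.
    void release() {
        int count = refCount.decrementAndGet();
        if (count < 0) {
            throw new IllegalStateException("released more often than acquired");
        }
        if (count == 0) {
            try {
                resource.close();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }
}

class GuardedResourceDemo {
    // Returns true if the guard kept the resource open while in use and rejected later access.
    static boolean demo() {
        final boolean[] closed = {false};
        GuardedResource<Closeable> guard =
                new GuardedResource<Closeable>(() -> closed[0] = true);

        guard.acquire();                      // a search takes a reference (count 2)
        guard.release();                      // the owner "closes" (count 1): still open
        boolean openWhileInUse = !closed[0];
        guard.release();                      // the search finishes (count 0): really closed

        boolean rejected = false;
        try {
            guard.acquire();                  // use after close is refused with an exception
        } catch (IllegalStateException e) {
            rejected = true;
        }
        return openWhileInUse && closed[0] && rejected;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The point of the design is that a late caller gets an exception instead of a reference to something that is already closed, and that closing is deferred until the last user has finished. Each caller has to pair `acquire()` with `release()`, typically in a try/finally block.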

A simple way to check whether this is your issue is to use NIOFSDirectory (slower on Linux) instead of MMapDirectory. If the JVM no longer crashes and you possibly see AlreadyClosedException being thrown instead, the bug is that your code accesses a closed index.
