How do I debug segfaults when the JVM runs my code?

2022-01-12 00:00:00 segmentation-fault java

My Java application has started to crash regularly with a SIGSEGV, leaving a dump of stack data and a load of other information in a text file.

I have debugged C programs in gdb, and I have debugged Java code from my IDE. I'm not sure how to approach C-like crashes in a running Java program.

I'm assuming I'm not looking at a JVM bug here. Other Java programs run just fine, and the JVM from Sun is probably more stable than my code. However, I have no idea how I could even cause segfaults with Java code. There is definitely enough memory available: when I last checked in the profiler, heap usage was around 50% with occasional spikes to around 80%. Are there any startup parameters I could investigate? What is a good checklist when approaching a bug like this?

Though I have not been able to reproduce the event reliably so far, it does not seem to occur entirely at random either, so testing is not completely impossible.

ETA (edited to add): some gory details

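(I am looking for a general approach, since the actual problem might be very specific. Still, I have already collected some information that may be of some value.)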

A while ago, I had similar-looking trouble after upgrading my CI server (see here for more details), but the fix that worked then (setting -XX:MaxPermSize) did not help this time.

Further investigation revealed that in the crash log files, the thread marked as "current thread" is never one of mine. It is either one called "VMThread" or one called "GCTaskThread". If it's the latter, it is additionally marked with the comment "(exited)"; if it's the former, the GCTaskThread is not in the list. This makes me suspect that the problem occurs around the end of a GC operation.
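To compare several crash logs quickly, something like the following could pull out the lines that name the crashing thread and frame. This is only a sketch: the class name CrashLogSummary is made up for illustration, and the prefixes "Current thread" and "# Problematic frame:" are assumptions about the log layout that may differ between JVM versions.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any JVM tooling.
final class CrashLogSummary {

    /** Returns the lines that usually identify the crashing thread and frame. */
    static List<String> summarize(List<String> logLines) {
        List<String> summary = new ArrayList<>();
        for (int i = 0; i < logLines.size(); i++) {
            String line = logLines.get(i).trim();
            if (line.startsWith("Current thread")) {
                // Assumed layout: the thread name follows on the same line.
                summary.add(line);
            } else if (line.startsWith("# Problematic frame:") && i + 1 < logLines.size()) {
                // Assumed layout: the frame itself is printed on the next line.
                summary.add(logLines.get(i + 1).trim());
            }
        }
        return summary;
    }

    public static void main(String[] args) throws IOException {
        // ISO-8859-1 maps every byte to a character, so stray bytes in the log cannot abort the read.
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.ISO_8859_1);
        for (String line : summarize(lines)) {
            System.out.println(line);
        }
    }
}
```

Running it with the path of one crash log as the only argument prints a two-line summary per log, which makes it easier to see whether the same thread and frame show up every time.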

Recommended answer

I'm assuming I'm not looking at a JVM bug here. Other Java programs run just fine, and the JVM from Sun is probably more stable than my code.

I don't think you should make that assumption. Without using JNI, you should not be able to write Java code that causes a SIGSEGV (although we know it happens). My point is, when it happens, it is either a bug in the JVM (not unheard of) or a bug in some JNI code. Even if you don't have any JNI in your own code, you may be using a library that does, so look for that. When I have seen this kind of problem before, it was in an image manipulation library. If the culprit isn't in your own JNI code, you probably won't be able to 'fix' the bug, but you may still be able to work around it.
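One way to look for libraries that bring their own native code is to scan the jars on the classpath for bundled shared libraries. The following is a rough sketch, not a complete detector: the class name NativeLibScanner and the suffix list are my own choices, and it will miss libraries that load native code installed elsewhere on the system or that use versioned names such as libfoo.so.1.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.Locale;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper: lists jar entries that look like bundled native libraries.
final class NativeLibScanner {

    // Common shared-library suffixes; an assumption, extend as needed.
    private static final String[] NATIVE_SUFFIXES = {".so", ".dll", ".dylib", ".jnilib"};

    static boolean looksNative(String entryName) {
        String lower = entryName.toLowerCase(Locale.ROOT);
        for (String suffix : NATIVE_SUFFIXES) {
            if (lower.endsWith(suffix)) {
                return true;
            }
        }
        return false;
    }

    /** Returns the names of all entries in the jar that look like native libraries. */
    static List<String> nativeEntries(File jar) throws IOException {
        List<String> found = new ArrayList<>();
        try (JarFile jarFile = new JarFile(jar)) {
            Enumeration<JarEntry> entries = jarFile.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                if (looksNative(name)) {
                    found.add(name);
                }
            }
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        // Walk the classpath of this process and report suspicious jar entries.
        for (String path : System.getProperty("java.class.path").split(File.pathSeparator)) {
            File file = new File(path);
            if (file.isFile() && path.toLowerCase(Locale.ROOT).endsWith(".jar")) {
                for (String entry : nativeEntries(file)) {
                    System.out.println(path + " -> " + entry);
                }
            }
        }
    }
}
```

Run it with the same classpath as the crashing application. Any jar it reports is a candidate for closer inspection, though an empty result does not prove that no native code is involved.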

First, you should get an alternate JVM on the same platform and try to reproduce the crash there. You can try one of these alternatives.
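When switching JVMs, it is worth confirming which one the application actually runs on, since a launcher script or an environment variable may still point at the old one. A minimal sketch follows; the class name JvmInfo is made up, while the system properties it reads are standard ones.

```java
// Hypothetical helper: reports which JVM this process is running on.
final class JvmInfo {

    /** Builds a one-line description from standard system properties. */
    static String describe() {
        return System.getProperty("java.vm.name") + " "
                + System.getProperty("java.vm.version") + " ("
                + System.getProperty("java.vm.vendor") + ")";
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Logging this line at application startup makes it possible to match each crash log to the JVM that produced it.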

If you cannot reproduce it, it is likely a JVM bug. From there, you can either mandate a particular JVM or search the bug database, using what you know about how to reproduce the crash, and maybe find suggested workarounds. (Even if you can reproduce it, many JVM implementations are just tweaks on Oracle's Hotspot implementation, so it might still be a JVM bug.)

If you can reproduce it with an alternative JVM, the fault might be a JNI bug. Look at what libraries you are using and what native calls they might be making. Sometimes there are alternative "pure Java" configurations or jar files for the same library, or alternative libraries that do almost the same thing.

Good luck!
