Tomcat just died suddenly
I'm trying to diagnose some bizarre Tomcat (7.0.21) and/or JVM errors on a 64-bit Linux (CentOS) machine.
I'm load testing our server application and tried hitting it with 100K messages. I launched jvisualvm and kept my eye on the heap the whole time. Everything was looking great* (see below) until I got to about 93K processed messages, and then Tomcat just died. I ran ps on Tomcat's PID to confirm it was dead.
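*Leading up to this crash:
- The load test had been running for about 90 minutes (it should have finished soon, since we were at 93K/100K)
- CPU held steady at around 45%
- Used heap was around 2GB (give or take a bunch after GCs), but the heap size grew from 4GB to MAX_HEAP after about 30 minutes
- Class loading/unloading was cycling normally
- Thread dumps were normal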
Nowhere in the server code are there any calls to System.exit(), so we can rule that right out (and yes, I've double-checked!).
I'm not sure whether this is Tomcat crashing or the JVM (how do I tell?). And even if I did know, I can't seem to find any indication of what went wrong:
- All of the server app's logs just stop without any ERROR messages (even though we have logging universally set to DEBUG and higher)
- Tomcat's catalina.out and the respective localhost_access_* files just stop without any info
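One way to start on the "Tomcat or JVM?" question, offered as a rough sketch rather than a definitive procedure: when the HotSpot JVM itself crashes, it normally writes a fatal error log named hs_err_pid<pid>.log into the process's working directory (unless -XX:ErrorFile points somewhere else). The helper names below are hypothetical, and the directory to search is an assumption; it should be whatever directory Tomcat was started from.

```python
import os
import re

# HotSpot's default fatal error log name is hs_err_pid<pid>.log.
HS_ERR_PATTERN = re.compile(r"^hs_err_pid(\d+)\.log$")


def crash_logs_for(pid, filenames):
    """Return the names in `filenames` that look like HotSpot crash logs for `pid`."""
    matches = []
    for name in filenames:
        m = HS_ERR_PATTERN.match(name)
        if m and int(m.group(1)) == pid:
            matches.append(name)
    return matches


def find_crash_logs(pid, directory):
    """Search `directory` (assumed to be the JVM's working directory) for crash logs."""
    names = crash_logs_for(pid, os.listdir(directory))
    return [os.path.join(directory, name) for name in names]
```

If this finds a file for the dead PID, the JVM crashed and the log usually names the failing thread and native frame. If it finds nothing, the process was more likely killed from outside or exited some other way, and the kernel log is worth a look next.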
I've heard it is possible to have Tomcat log a core dump when it dies, but I'm not sure how to do that, and the online examples aren't helping much.
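On the core dump point, a minimal sketch under stated assumptions: on Linux, whether a crashing process leaves a core file depends first on its core file size limit (what ulimit -c reports in the shell that started Tomcat), and a soft limit of 0 means no core file is written. The limit of a running process can be read from /proc/<pid>/limits. The function names here are hypothetical.

```python
def core_limit_from_limits_text(text):
    """Return the (soft, hard) 'Max core file size' values from the text
    of /proc/<pid>/limits, or None if that row is not present."""
    label = "Max core file size"
    for line in text.splitlines():
        if line.startswith(label):
            fields = line[len(label):].split()
            return fields[0], fields[1]
    return None


def core_dumps_enabled(pid):
    """True if the running process `pid` has a non-zero soft core size limit."""
    with open(f"/proc/{pid}/limits") as f:
        limit = core_limit_from_limits_text(f.read())
    return limit is not None and limit[0] != "0"
```

Calling core_dumps_enabled with Tomcat's PID while it is still running shows whether a crash could produce a core file at all. If it returns False, raising the limit (for example with ulimit -c unlimited in the startup shell) before the next load test is the usual first step.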
How would SO go about diagnosing this? What steps should I take to start ruling out all of the possible factors?
Thanks in advance!
Recommended answer
Sorry, I had to remove the green check from @erickson. I finally figured out what was killing Tomcat.
It looks like a profiler plugin was not configured correctly in VisualVM, and attempting to profile the Tomcat process killed it.
I'm investigating why right now and will update this answer once I know more.