Java blocking issue: Why would a JVM block threads in many different classes/methods?

Update: This looks like a memory issue. A 3.8 GB hprof file indicates that the JVM was dumping its heap when this "blocking" occurred. Our operations team saw that the site wasn't responding, took a stack trace, and then shut down the instance. I believe they shut the site down before the heap dump finished. The log contained no errors, exceptions, or other evidence of a problem, probably because the JVM was killed before it could write an error message.
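To make sure a dump survives next time even if the instance gets killed, the JVM can be started with -XX:+HeapDumpOnOutOfMemoryError, or a dump can be captured on demand via the standard HotSpot diagnostic MBean. A minimal sketch; the class name and output path are assumptions for illustration:

    import java.lang.management.ManagementFactory;
    import com.sun.management.HotSpotDiagnosticMXBean;

    // Minimal sketch: capture an .hprof on demand via the HotSpot diagnostic
    // MBean, so a usable dump exists even if the instance is killed later.
    // The output path below is an assumption for illustration.
    public class HeapDumper {
        public static void main(String[] args) throws Exception {
            HotSpotDiagnosticMXBean diag = ManagementFactory.newPlatformMXBeanProxy(
                    ManagementFactory.getPlatformMBeanServer(),
                    "com.sun.management:type=HotSpotDiagnostic",
                    HotSpotDiagnosticMXBean.class);
            diag.dumpHeap("/tmp/app-heap.hprof", true); // true = live objects only
        }
    }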

Original Question: We had a recent situation where the application appeared, to the end user, to hang. We got a stack trace before the application was restarted, and I found some surprising results: of 527 threads, 463 were in thread state BLOCKED.
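For what it's worth, the same per-state tally can be taken from inside the running JVM with the standard management API. A minimal sketch (the class name is invented for illustration):

    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadInfo;
    import java.lang.management.ThreadMXBean;
    import java.util.EnumMap;

    // Minimal sketch: tally thread states in-process, the same count done
    // by hand on the jstack output (463 of 527 BLOCKED).
    public class ThreadStateCounter {
        public static void main(String[] args) {
            ThreadMXBean mx = ManagementFactory.getThreadMXBean();
            EnumMap<Thread.State, Integer> counts =
                    new EnumMap<Thread.State, Integer>(Thread.State.class);
            for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
                Integer n = counts.get(info.getThreadState());
                counts.put(info.getThreadState(), n == null ? 1 : n + 1);
            }
            System.out.println(counts); // e.g. {BLOCKED=463, RUNNABLE=40, ...}
        }
    }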

In the Past: In the past, blocked threads usually pointed to one of two things: 1) some obvious bottleneck, e.g. a database record lock or a file-system lock problem that caused other threads to wait; 2) all blocked threads blocking on the same class/method (e.g. the JDBC or file-system classes).

Unusual Data: In this case, I see all sorts of classes/methods blocked, including JVM internal classes, JBoss classes, log4j, etc., in addition to application classes (including JDBC and Lucene calls).

The Question: What would cause a JVM to block threads in log4j.Hierarchy.getLogger or java.lang.reflect.Constructor.newInstance? Obviously some resource is scarce, but which one?
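One pattern consistent with this picture: in log4j 1.x, Hierarchy.getLogger synchronizes on an internal table, and class loading and reflective instantiation take class-loader locks, so a single slow or contended critical section can leave many unrelated-looking call sites BLOCKED at once. A minimal, hypothetical demo of the effect (all names invented):

    // Minimal sketch: one contended monitor makes every caller show up as
    // "BLOCKED (on object monitor)" in a thread dump, regardless of which
    // class the call came from.
    public class MonitorContention {
        private static final Object LOCK = new Object();

        static void slowCriticalSection() {
            synchronized (LOCK) {
                try { Thread.sleep(10000); } catch (InterruptedException ignored) { }
            }
        }

        public static void main(String[] args) {
            for (int i = 0; i < 50; i++) {
                new Thread(new Runnable() {
                    public void run() { slowCriticalSection(); }
                }, "worker-" + i).start();
            }
            // jstack on this process shows 49 threads waiting to lock the
            // same monitor address, with one thread holding it.
        }
    }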

Thanks

Stack trace excerpts

http-0.0.0.0-80-417" daemon prio=6 tid=0x000000000f6f1800 nid=0x1a00 waiting for monitor entry [0x000000002dd5d000]
   java.lang.Thread.State: BLOCKED (on object monitor)
                at sun.reflect.GeneratedConstructorAccessor68.newInstance(Unknown Source)
                at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
                at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
                at java.lang.Class.newInstance0(Class.java:355)
                at java.lang.Class.newInstance(Class.java:308)
                at org.jboss.ejb.Container.createBeanClassInstance(Container.java:630)

http-0.0.0.0-80-451" daemon prio=6 tid=0x000000000f184800 nid=0x14d4 waiting for monitor entry [0x000000003843d000]
   java.lang.Thread.State: BLOCKED (on object monitor)
                at java.lang.Class.getDeclaredMethods0(Native Method)
                at java.lang.Class.privateGetDeclaredMethods(Class.java:2427)
                at java.lang.Class.getMethod0(Class.java:2670)

"http-0.0.0.0-80-449" daemon prio=6 tid=0x000000000f17d000 nid=0x2240 waiting for monitor entry [0x000000002fa5f000]
   java.lang.Thread.State: BLOCKED (on object monitor)
                at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.register(Http11Protocol.java:638)
                - waiting to lock <0x00000007067515e8> (a org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler)
                at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.createProcessor(Http11Protocol.java:630)


"http-0.0.0.0-80-439" daemon prio=6 tid=0x000000000f701800 nid=0x1ed8 waiting for monitor entry [0x000000002f35b000]
   java.lang.Thread.State: BLOCKED (on object monitor)
                at org.apache.log4j.Hierarchy.getLogger(Hierarchy.java:261)
                at org.apache.log4j.Hierarchy.getLogger(Hierarchy.java:242)
                at org.apache.log4j.LogManager.getLogger(LogManager.java:198)
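Each BLOCKED frame in a dump also carries a "waiting to lock <0x...>" line, as in the second excerpt above; matching that address against the thread showing "locked <0x...>" identifies the monitor's owner. The same correlation can be done programmatically. A minimal sketch using java.lang.management (class name invented):

    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadInfo;
    import java.lang.management.ThreadMXBean;

    // Minimal sketch: for each BLOCKED thread, print the monitor it waits on
    // and the thread that owns it, i.e. the programmatic version of matching
    // "waiting to lock <0x...>" against "locked <0x...>" by hand.
    public class BlockedThreadReport {
        public static void main(String[] args) {
            ThreadMXBean mx = ManagementFactory.getThreadMXBean();
            for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
                if (info.getThreadState() == Thread.State.BLOCKED) {
                    System.out.println(info.getThreadName()
                            + " waiting on " + info.getLockName()
                            + " held by " + info.getLockOwnerName());
                }
            }
        }
    }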

Answer

These are listed roughly in the order I would try them, depending on the evidence collected:

  • Have you looked at GC behavior? Are you under memory pressure? That could cause newInstance() and a few of the others above to block. Run your VM with -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -verbose:gc and log the output. Are you seeing excessive GC times near the time of the failure/lockup? (A small in-process monitoring sketch follows this list.)
    • Is the condition repeatable? If so, try varying heap sizes in the JVM (-Xmx) and see whether the behavior changes substantially. If it does, look for memory leaks or size the heap properly for your app.
    • If the previous step is tough and you're not getting an OutOfMemoryError when you should, you can adjust the GC tunables... see the JDK 6.0 XX options or the JDK 6.0 GC Tuning Whitepaper. Look specifically at -XX:+UseGCOverheadLimit and -XX:GCTimeLimit and related options. (Note that these are not well documented, but they may be useful.)
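As a complement to the -verbose:gc log mentioned in the first bullet, cumulative GC time can also be watched from inside the JVM: a sharp jump around the lockup means the threads were stalled by the collector rather than by application-level monitors. A minimal sketch (the 10-second interval is an arbitrary choice):

    import java.lang.management.GarbageCollectorMXBean;
    import java.lang.management.ManagementFactory;

    // Minimal sketch: periodically print cumulative GC counts and times.
    // If these spike around a lockup, suspect the collector, not app locks.
    public class GcWatcher {
        public static void main(String[] args) throws InterruptedException {
            while (true) {
                for (GarbageCollectorMXBean gc
                        : ManagementFactory.getGarbageCollectorMXBeans()) {
                    System.out.println(gc.getName()
                            + ": collections=" + gc.getCollectionCount()
                            + ", totalMs=" + gc.getCollectionTime());
                }
                Thread.sleep(10000);
            }
        }
    }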
