Java 1.8 safepoint timeout
I seem to be running into a scenario where the JVM gets stuck indefinitely trying to reach a safepoint after a few hours. However, if I run jstack with the -F option, it seems to get out of that wait and continue execution.
jdk1.8.0_45/bin/jstack -F 39924 >a.out
I am using jdk1.8.0_45 on CentOS.
My questions are:
i) It seems that the JVM can come out of that indefinite safepoint wait when jstack sends it an interrupt. Why doesn't it come out on its own, without jstack? Is there some JVM option I can use to avoid the indefinite wait?
ii) Can I get a more definite thread dump of the thread that is causing the issue? The output from the safepoint log seems imprecise.
The options I am using are:
-server
-XX:+AggressiveOpts
-XX:+UseG1GC
-XX:+UnlockExperimentalVMOptions
-XX:G1MixedGCLiveThresholdPercent=85
-XX:InitiatingHeapOccupancyPercent=30
-XX:G1HeapWastePercent=5
-XX:MaxGCPauseMillis=1000
-XX:G1HeapRegionSize=4M
-XX:+PrintGC
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-XX:+PrintGCDateStamps
-XX:+UnlockExperimentalVMOptions
-XX:G1LogLevel=finest
-Xmx6000m
-Xdebug
-Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=999
-XX:+SafepointTimeout
-XX:+UnlockDiagnosticVMOptions
-XX:SafepointTimeoutDelay=20000
-XX:+PrintSafepointStatistics
-XX:PrintSafepointStatisticsCount=1
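For reference, here is a minimal sketch of how the safepoint-diagnostic subset of these flags fits together on a launch command line (the jar name is a placeholder, not from the original post):

java -XX:+UnlockDiagnosticVMOptions -XX:+SafepointTimeout -XX:SafepointTimeoutDelay=20000 -XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1 -jar app.jar

With this combination the VM prints one statistics block per safepoint, and it emits the "Timeout detected" message whenever reaching a safepoint takes longer than 20000 ms, which is what the log below shows.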
Safepoint log:
vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
17771.115: G1IncCollectionPause [ 170 0 0 ] [ 0 0 0 0 8 ] 0
vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
17771.125: RevokeBias [ 170 1 2 ] [ 0 0 0 0 0 ] 0
vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
17771.127: RevokeBias [ 170 1 1 ] [ 0 0 0 0 0 ] 0
vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
17771.131: RevokeBias [ 170 1 2 ] [ 0 0 0 0 0 ] 0
vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
17771.955: RevokeBias [ 169 0 2 ] [ 0 0 0 0 0 ] 0
vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
17772.160: BulkRevokeBias [ 171 0 2 ] [ 0 0 0 0 0 ] 0
vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
17772.352: RevokeBias [ 170 1 3 ] [ 0 0 0 0 0 ] 0
vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
17773.596: RevokeBias [ 169 0 1 ] [ 0 0 0 0 0 ] 0
# SafepointSynchronize::begin: Timeout detected:
# SafepointSynchronize::begin: Timed out while spinning to reach a safepoint.
# SafepointSynchronize::begin: Threads which did not reach the safepoint:
# "Thread-14" #115 prio=5 os_prio=0 tid=0x00007f20c8029000 nid=0x9cd0 runnable [0x0000000000000000] java.lang.Thread.State: RUNNABLE
# SafepointSynchronize::begin: (End of list)
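The thread reported above is RUNNABLE yet never reaches the safepoint. One classic cause of that pattern (offered here only as an illustration, not as a diagnosis of Thread-14) is a long-running, int-indexed "counted" loop: the JIT compiler may emit no safepoint poll inside the body of such a loop, so the thread cannot stop until the loop exits. A minimal sketch:

    // Illustrative only: once JIT-compiled, this counted loop may contain
    // no safepoint poll, so the thread cannot park at a safepoint until
    // the loop finishes.
    public class CountedLoopExample {
        public static void main(String[] args) {
            long sum = 0;
            for (int i = 0; i < Integer.MAX_VALUE; i++) { // int-indexed "counted" loop
                sum += i;
            }
            System.out.println(sum);
        }
    }

A thread stuck this way shows up in the timeout report much like the one above: runnable, while the VM operation's spin and sync times keep growing.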
After the jstack interrupt, this is what I see in the safepoint log:
vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
17779.826: G1IncCollectionPause [ 169 1 1 ] [ 3315603 0 3315603 0 8 ] 1
vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
21095.439: RevokeBias [ 169 2 13 ] [ 0 0 0 0 0 ] 0
vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
21095.439: RevokeBias [ 169 1 2 ] [ 0 0 0 0 0 ] 0
vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
21095.441: RevokeBias [ 184 3 4 ] [ 0 0 3 0 1 ] 0
vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
21095.447: RevokeBias [ 190 0 2 ] [ 0 0 4 0 2 ] 0
Answer
Since you are able to remedy the problem by interrupting the VM, and you are on CentOS, the problem reminds me of this kernel bug.
The thread lists the following affected versions (assuming standard kernels):
- RHEL 6 (and CentOS 6, and SL 6): 6.0-6.5 are fine. 6.6 is bad. 6.6.z is fine.
- RHEL 7 (and CentOS 7, and SL 7): 7.1 is bad. As of yesterday, there does not appear to be a 7.x fix yet.
- RHEL 5 (and CentOS 5, and SL 5): all versions are fine (including 5.11).
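To see which of these ranges a given host falls into, check the running kernel version (standard command, not part of the original answer):

uname -r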