Duration of "excessive GC time" in "java.lang.OutOfMemoryError: GC overhead limit exceeded"

2022-01-16 00:00:00 out-of-memory garbage-collection java

Occasionally, somewhere between once every 2 days and once every 2 weeks, my application crashes at a seemingly random location in the code with: java.lang.OutOfMemoryError: GC overhead limit exceeded. If I google this error I come to this SO question, and that led me to this piece of Sun documentation, which explains:

The parallel collector will throw an OutOfMemoryError if too much time is being spent in garbage collection: if more than 98% of the total time is spent in garbage collection and less than 2% of the heap is recovered, an OutOfMemoryError will be thrown. This feature is designed to prevent applications from running for an extended period of time while making little or no progress because the heap is too small. If necessary, this feature can be disabled by adding the option -XX:-UseGCOverheadLimit to the command line.

Which tells me that my application is apparently spending 98% of the total time in garbage collection to recover only 2% of the heap.

But 98% of what time? 98% of the entire two weeks the application has been running? 98% of the last millisecond?

I'm trying to determine a best approach to actually solving this issue rather than just using -XX:-UseGCOverheadLimit but I feel a need to better understand the issue I'm solving.

Recommended answer

I'm trying to determine a best approach to actually solving this issue rather than just using -XX:-UseGCOverheadLimit but I feel a need to better understand the issue I'm solving.

Well, you're using too much memory - and from the sound of it, it's probably because of a slow memory leak.

You can try increasing the heap size with -Xmx, which would help if this isn't a memory leak but a sign that your app actually needs a lot of heap and the setting you currently have is slightly too low. If it is a memory leak, this will just postpone the inevitable.
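Before changing -Xmx it can help to confirm what limit the JVM is actually running with. The following is only a minimal sketch using the standard java.lang.management API; the class name HeapSettings and its helper methods are made up for illustration and are not part of the original answer.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper class: reports the heap limit the running JVM sees.
class HeapSettings {

    /** Formats a byte count as whole mebibytes (rounded down). */
    static String toMiB(long bytes) {
        return (bytes / (1024L * 1024L)) + " MiB";
    }

    /** Fraction of the maximum heap in use, or -1.0 if the maximum is undefined. */
    static double usedFraction(MemoryUsage heap) {
        long max = heap.getMax(); // -1 when the JVM reports no defined maximum
        return max <= 0 ? -1.0 : (double) heap.getUsed() / max;
    }

    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        // maxMemory() roughly corresponds to the effective -Xmx value.
        System.out.println("max heap:  " + toMiB(Runtime.getRuntime().maxMemory()));
        System.out.println("used heap: " + toMiB(heap.getUsed()));
        System.out.println("used fraction: " + usedFraction(heap));
    }
}
```

Running it once with the current settings and once with a larger -Xmx (for example `java -Xmx2g HeapSettings`) shows whether the new limit was picked up. A used fraction that stays close to 1.0 even after raising the limit points more toward a leak than toward an undersized heap.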

To investigate if it is a memory leak, instruct the VM to dump heap on OOM using the -XX:+HeapDumpOnOutOfMemoryError switch, and then analyze the heap dump to see if there are more objects of some kind than there should be. http://blogs.oracle.com/alanb/entry/heap_dumps_are_back_with is a pretty good place to start.
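Since the crash only happens every few days, it is worth verifying that the heap-dump switch is really active in the running process. Here is a small sketch that reads the flag back through HotSpotDiagnosticMXBean; it assumes a HotSpot-based JVM (the com.sun.management API is not guaranteed on other VMs), and the class name HeapDumpCheck is hypothetical.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper class; assumes a HotSpot-based JVM.
class HeapDumpCheck {

    /** Returns the current value of a HotSpot VM option as a string. */
    static String vmOption(String name) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // Throws IllegalArgumentException if the option does not exist.
        return bean.getVMOption(name).getValue();
    }

    public static void main(String[] args) {
        // "true" when the JVM was started with -XX:+HeapDumpOnOutOfMemoryError.
        System.out.println("HeapDumpOnOutOfMemoryError = " + vmOption("HeapDumpOnOutOfMemoryError"));
        // Empty unless -XX:HeapDumpPath=... was given.
        System.out.println("HeapDumpPath = " + vmOption("HeapDumpPath"));
    }
}
```

Logging these two values at application startup gives some assurance that a dump will be written the next time the error occurs.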

As fate would have it, I happened to run into this problem myself just a day after this question was asked, in a batch-style app. This was not caused by a memory leak, and increasing heap size didn't help, either. What I did was actually to decrease heap size (from 1GB to 256MB) to make full GCs faster (though somewhat more frequent). YMMV, but it's worth a shot.
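To compare heap sizes or collectors with numbers rather than by feel, you can sample how much time the JVM reports spending in GC over a window you choose. The sketch below uses the standard GarbageCollectorMXBean API. It does not reproduce the JVM's internal overhead-limit accounting; it is only an approximation, and the class name GcOverheadSampler and the 5-second window are assumptions for illustration.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper class: approximates GC overhead over a sampling window.
class GcOverheadSampler {

    /** Sums the accumulated collection time (ms) reported by all collectors. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** GC time as a fraction of wall-clock time; 0.0 for an empty window. */
    static double overhead(long gcMillis, long wallMillis) {
        return wallMillis <= 0 ? 0.0 : (double) gcMillis / wallMillis;
    }

    public static void main(String[] args) throws InterruptedException {
        long windowMillis = 5_000; // assumed window length; adjust as needed
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println("collector: " + gc.getName());
        }
        long gcBefore = totalGcMillis();
        long start = System.nanoTime();
        Thread.sleep(windowMillis);
        long elapsed = (System.nanoTime() - start) / 1_000_000;
        long gcDelta = totalGcMillis() - gcBefore;
        System.out.printf("GC overhead over %d ms: %.1f%%%n",
                elapsed, 100 * overhead(gcDelta, elapsed));
    }
}
```

In a real application the sampling would run on a background thread inside the same JVM as the workload. The collector names printed at the start also confirm which collector is in use after switching flags.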

Edit 2: Not all problems were solved by the smaller heap... the next step was enabling the G1 garbage collector, which seems to do a better job than CMS.
