Setting -XX:+DisableExplicitGC in production: what could go wrong?

2022-01-16 00:00:00 garbage-collection java tomcat

We just had a meeting to address some performance issues in a web application that is used to calculate insurance rates. The calculations are implemented in a C/C++ module that is used in other software packages as well. To make it available as a web service, a Java wrapper was implemented that exposes an XML-based interface and calls the C/C++ module via JNI.

Measurements showed that several seconds of each calculation were spent inside the Java part. So my first recommendation was to enable garbage collection logging in the VM. We could see at once that many stop-the-world full GCs were being performed. When we brought this up, the developer of the Java part told us they called System.gc() on several occasions "to make sure the memory is released after use".

OK, I won't elaborate on that statement any further... ;-)
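To make the effect of those calls concrete, here is a minimal sketch (not the actual wrapper code, which is not shown here) that counts how many collections the JVM reports while a loop calls System.gc(). The class name `ExplicitGcDemo`, the method names, and the 1 MiB scratch buffer are assumptions chosen for illustration; only `System.gc()` and the `java.lang.management` calls are standard API.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class ExplicitGcDemo {

    // Sums the collection counts reported by all collectors of this JVM.
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 if undefined for this collector
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    // Hypothetical stand-in for the wrapper: do some work, then request a GC.
    static long collectionsCausedBy(int explicitCalls) {
        long before = totalCollections();
        for (int i = 0; i < explicitCalls; i++) {
            byte[] scratch = new byte[1 << 20]; // pretend per-request working memory
            scratch[0] = 1;
            System.gc(); // on HotSpot, -XX:+DisableExplicitGC turns this into a no-op
        }
        return totalCollections() - before;
    }

    public static void main(String[] args) {
        System.out.println("collections during 5 explicit calls: " + collectionsCausedBy(5));
    }
}
```

Running this once without the flag and once with `java -XX:+DisableExplicitGC ExplicitGcDemo` should show the difference in reported collections, although the exact numbers depend on the collector and JVM version in use.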

We then added the above-mentioned -XX:+DisableExplicitGC to the VM's arguments and reran the tests. This saved about 5 seconds per calculation.

Since we cannot change the code by stripping out all those System.gc() calls at this point in our release process, we are thinking about adding -XX:+DisableExplicitGC in production until a new Jar can be created.

Now the question is: could there be any risk in doing so? About the only thing I can think of is Tomcat using System.gc() internally when redeploying, but that's just a guess. Are there any other hazards ahead?

Recommended answer

You are not alone in fixing stop-the-world GC events by setting the -XX:+DisableExplicitGC flag. Unfortunately (and in spite of the disclaimers in the documentation), many developers decide they know better than the JVM when to collect memory, and introduce exactly this type of issue.

I'm aware of many instances where -XX:+DisableExplicitGC improved the production environment, and zero instances where there were any negative side effects.

The safe thing to do is to run your current production code, under load, with that flag set in a stress test environment and perform a normal QA cycle.
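When running such a test, it may be worth confirming that the flag actually reached the JVM (for example, that it was not lost in a startup script). The following is a minimal sketch under a few assumptions: the class and method names are hypothetical, it only inspects the arguments reported by `RuntimeMXBean.getInputArguments()`, and it assumes that when the flag is given more than once the last occurrence wins.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

public class ExplicitGcFlagCheck {

    // Scans JVM arguments for the flag; a later +/- setting overrides an earlier one.
    static boolean explicitGcDisabled(List<String> jvmArgs) {
        boolean disabled = false; // HotSpot default: explicit GC requests are honoured
        for (String arg : jvmArgs) {
            if (arg.equals("-XX:+DisableExplicitGC")) {
                disabled = true;
            } else if (arg.equals("-XX:-DisableExplicitGC")) {
                disabled = false;
            }
        }
        return disabled;
    }

    public static void main(String[] args) {
        // Arguments the running JVM was started with (excluding the main class and its args).
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("DisableExplicitGC active: " + explicitGcDisabled(jvmArgs));
    }
}
```

The same check could be logged once at application startup in the test environment, so the QA results can be tied to a run where the flag was known to be set.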

If you cannot do that, I would suggest that, in most cases, the risk of setting the flag is less than the cost of not setting it.
