Why does performance degrade after ~6 hours on Java 9 with G1, with no actual increase in load?
I switched 1 instance (2 vCPU, 2GB RAM, load ~4k req/sec) to Java 9 (from the latest Java 8). For a while, everything was fine and CPU usage was the same as before. However, after ~6 hours, CPU consumption increased by 4 percentage points (from 21% to 25%) for no apparent reason. There were no traffic spikes, no increase in memory consumption and no metric changes (I have counters for every method in the code). Nothing.
I left this instance untouched for ~12 hours, expecting it to recover. But nothing changed; it simply kept consuming more CPU.
The top command showed more CPU spikes than usual for the Java server process. I had recently read that G1 is not suitable for high-throughput workloads, so I concluded that G1 could be the cause.
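Before blaming the collector, one rough check is to compare the time the JVM itself reports as spent in GC against its uptime. The sketch below uses only the standard GarbageCollectorMXBean and RuntimeMXBean APIs; the class name GcOverhead and its methods are hypothetical. It is an approximation under one important assumption: the reported collection time mostly covers stop-the-world pauses, so concurrent G1 background work (marking, refinement) is generally not included and the real CPU cost of G1 can be higher than this number suggests.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical diagnostic helper (not part of the original post).
// Estimates how much of the JVM's uptime was spent in the collection time
// reported by the GarbageCollectorMXBeans. Caveat: concurrent background GC
// work (e.g. G1 concurrent marking and refinement threads) is generally not
// included, so this can understate G1's real CPU cost.
final class GcOverhead {

    // Sum of accumulated collection time, in milliseconds, over all collectors.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if this collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    // Total number of collections over all collectors (same -1 convention).
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long c = gc.getCollectionCount();
            if (c > 0) {
                total += c;
            }
        }
        return total;
    }

    // Reported GC time divided by JVM uptime (both in milliseconds).
    static double gcFractionOfUptime() {
        long uptimeMillis = ManagementFactory.getRuntimeMXBean().getUptime();
        return uptimeMillis <= 0 ? 0.0 : (double) totalGcMillis() / uptimeMillis;
    }

    public static void main(String[] args) {
        System.out.printf("collections=%d, gcTime=%d ms, fraction of uptime=%.4f%n",
                totalGcCount(), totalGcMillis(), gcFractionOfUptime());
    }
}
```

Logging these values every few minutes from inside the server would show whether the GC share grows around the 6-hour mark; if it stays flat while CPU rises, the extra load is more likely in concurrent GC threads or elsewhere.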
I restarted the instance with:
java -XX:+UseParallelGC -jar server-0.28.0.jar
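After restarting with a different collector flag, it can help to confirm which collector the JVM actually picked. A minimal sketch, assuming HotSpot's usual MXBean names ("PS Scavenge"/"PS MarkSweep" for the Parallel collector, "G1 Young Generation"/"G1 Old Generation" for G1); those names are an implementation detail rather than a guaranteed API contract, and the class GcInfo is hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists the garbage collectors the running JVM reports,
// e.g. to confirm that -XX:+UseParallelGC actually took effect.
final class GcInfo {

    // Names of the collector MXBeans registered in this JVM.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    // Rough guess at the collector family, based on HotSpot's usual bean names.
    static String collectorFamily() {
        List<String> names = collectorNames();
        if (names.stream().anyMatch(n -> n.startsWith("G1"))) {
            return "G1";
        }
        if (names.stream().anyMatch(n -> n.startsWith("PS "))) {
            return "Parallel";
        }
        return "other";
    }

    public static void main(String[] args) {
        System.out.println("Collectors: " + collectorNames());
        System.out.println("Family: " + collectorFamily());
    }
}
```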
After ~20 hours of monitoring, everything is fine again. CPU consumption is back at the 21% level it had held for many days before.
CPU usage right after Java 9 deployment (6h scale):
CPU increase after 7 hours + 12 hours "untouched" (7d scale):
CPU after -XX:+UseParallelGC (24h scale):
So my question is: is this expected behavior for G1? Has anyone else seen something similar?
Ubuntu 16.04 x64
java version "9"
Java(TM) SE Runtime Environment (build 9+181)
Java HotSpot(TM) 64-Bit Server VM (build 9+181, mixed mode)
Edit 03.01.2019
I tried running the same server with G1 on Java 10.0.2:
java version "10.0.2" 2018-07-17
Java(TM) SE Runtime Environment 18.3 (build 10.0.2+13)
Java HotSpot(TM) 64-Bit Server VM 18.3 (build 10.0.2+13, mixed mode)
G1 consumes 40% more CPU than UseParallelGC right after the server restart.
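To compare the two collectors on the same box under the same traffic, the process can also sample its own CPU usage instead of relying only on host-level graphs. A minimal sketch, assuming a HotSpot/OpenJDK runtime (it uses the JDK-specific com.sun.management.OperatingSystemMXBean); the class ProcessCpuSampler and its methods are hypothetical. It is meant to be started from inside the server, since running main on its own only measures an otherwise idle JVM.

```java
import java.lang.management.ManagementFactory;

// Hypothetical helper (not from the original post): samples this JVM's own
// CPU usage so that runs with -XX:+UseG1GC and -XX:+UseParallelGC can be
// compared. Relies on the HotSpot/OpenJDK-specific
// com.sun.management.OperatingSystemMXBean.
final class ProcessCpuSampler {

    // Average process CPU usage over the window, as a fraction of all
    // available processors (0.25 means 25% of the machine). Returns -1 if
    // process CPU time is unsupported or the sleep was interrupted.
    static double sampleProcessCpu(long windowMillis) {
        com.sun.management.OperatingSystemMXBean os =
                ManagementFactory.getPlatformMXBean(com.sun.management.OperatingSystemMXBean.class);
        int cpus = Runtime.getRuntime().availableProcessors();

        long cpuStart = os.getProcessCpuTime(); // nanoseconds, -1 if unsupported
        long wallStart = System.nanoTime();
        try {
            Thread.sleep(windowMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return -1;
        }
        long cpuEnd = os.getProcessCpuTime();
        long wallEnd = System.nanoTime();

        if (cpuStart < 0 || cpuEnd < 0 || wallEnd <= wallStart) {
            return -1;
        }
        return (double) (cpuEnd - cpuStart) / ((double) (wallEnd - wallStart) * cpus);
    }

    // Starts a daemon thread that prints one sample per period; intended to be
    // called once from the server's startup code (hypothetical integration point).
    static Thread startReporter(long periodMillis) {
        Thread t = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                double load = sampleProcessCpu(periodMillis);
                if (load < 0) {
                    break; // unsupported or interrupted
                }
                System.out.printf("process CPU: %.1f%%%n", load * 100);
            }
        }, "cpu-reporter");
        t.setDaemon(true);
        t.start();
        return t;
    }

    public static void main(String[] args) {
        double load = sampleProcessCpu(1_000);
        if (load >= 0) {
            System.out.printf("process CPU: %.1f%%%n", load * 100);
        } else {
            System.out.println("process CPU time not available");
        }
    }
}
```

Enabling GC logging (on JDK 9 and later, for example -Xlog:gc*:file=gc.log) alongside such samples would show whether CPU increases line up with particular GC phases.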
Recommended answer
(Note that GC tuning is extremely dependent on the environment, so there is no magic recipe.)
I had a very similar issue with G1. By default, it seems rather poorly suited to REST endpoints (again, this is only what I have experienced in my immediate surroundings). What helped me was experimenting with the GC flags, as described here.
For us, the biggest improvements came from -XX:G1NewSizePercent=25 and -XX:MaxGCPauseMillis=50. G1 also tunes itself adaptively over time, so the maximum GC pause target has a significant effect on all other parameters.
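One practical detail: on the JDK 8 to 10 releases discussed here, G1NewSizePercent is an experimental flag, so the JVM refuses to start unless -XX:+UnlockExperimentalVMOptions comes before it, e.g. java -XX:+UseG1GC -XX:+UnlockExperimentalVMOptions -XX:G1NewSizePercent=25 -XX:MaxGCPauseMillis=50 -jar server-0.28.0.jar. To check which values are actually in effect, a small sketch using the JDK's HotSpotDiagnosticMXBean might look like this; the class GcFlags is hypothetical, and it assumes a HotSpot/OpenJDK runtime.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;

// Hypothetical helper: prints the effective values of the G1 flags mentioned
// above, together with where each value came from (DEFAULT, VM_CREATION, ...).
final class GcFlags {

    // Returns "value (origin)" for a HotSpot flag, or "<not available>" if the
    // JVM does not expose it. Locked experimental flags such as
    // G1NewSizePercent are typically only visible once
    // -XX:+UnlockExperimentalVMOptions is set.
    static String describe(String flagName) {
        HotSpotDiagnosticMXBean hotspot =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        try {
            VMOption option = hotspot.getVMOption(flagName);
            return option.getValue() + " (" + option.getOrigin() + ")";
        } catch (IllegalArgumentException e) {
            return "<not available>";
        }
    }

    public static void main(String[] args) {
        for (String flag : new String[] {"UseG1GC", "MaxGCPauseMillis", "G1NewSizePercent"}) {
            System.out.println(flag + " = " + describe(flag));
        }
    }
}
```

Printing this at startup makes it easy to tell a default value from one set on the command line when comparing tuning runs.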