Tuning garbage collection for low latency
I'm looking for arguments as to how best to size the young generation (relative to the old generation) in an environment where low latency is critical.
My own testing tends to show that latency is lowest when the young generation is fairly large (e.g. -XX:NewRatio < 3). However, I cannot reconcile this with the intuition that the larger the young generation, the longer it should take to garbage collect.
The application runs on 64-bit Linux, JDK 6.
Memory usage is about 50 megabytes of long-lived objects loaded at startup (the data cache); from then on, only (many) very short-lived objects are created (average lifespan < 1 millisecond).
Some garbage collection cycles take more than 10 milliseconds to run, which looks really disproportionate compared with the application latency, which is itself a few milliseconds at most.
Recommended answer
For an application that generates lots of short-lived garbage and little that is long-lived, one approach that can work is a big heap with nearly all of it young gen and nearly all of that eden, tenuring anything that survives a YG collection more than once.
For example (let's say you had a 32-bit JVM):
- 3072M heap (Xms and Xmx)
- 128M tenured (i.e. Xmn 2944m)
- MaxTenuringThreshold=1
- SurvivorRatio=190 (i.e. each survivor space is 1/192 of the YG)
- TargetSurvivorRatio=90 (i.e. fill those survivors as much as possible)
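To make the arithmetic behind those settings concrete, here is a small sketch that derives the region sizes from the example values. The class and method names are made up for illustration, and the real JVM aligns and rounds these sizes, so treat the output as approximate.

```java
// Hypothetical helper (not part of any JVM API): works out the region sizes
// implied by the example flags. A matching command line would look roughly like:
//   java -Xms3072m -Xmx3072m -Xmn2944m -XX:MaxTenuringThreshold=1 \
//        -XX:SurvivorRatio=190 -XX:TargetSurvivorRatio=90 ...
class YoungGenSizing {

    /** Tenured size is whatever is left of the heap after the young generation. */
    static double tenuredMb(double heapMb, double youngMb) {
        return heapMb - youngMb;
    }

    /**
     * SurvivorRatio is eden divided by ONE survivor space, and there are two
     * survivor spaces, so the young gen is (ratio + 2) survivor-sized units.
     */
    static double survivorMb(double youngMb, int survivorRatio) {
        return youngMb / (survivorRatio + 2);
    }

    /** Eden is the young generation minus both survivor spaces. */
    static double edenMb(double youngMb, int survivorRatio) {
        return youngMb - 2 * survivorMb(youngMb, survivorRatio);
    }

    /** TargetSurvivorRatio is a percentage of one survivor space. */
    static double survivorTargetMb(double youngMb, int survivorRatio, int targetPct) {
        return survivorMb(youngMb, survivorRatio) * targetPct / 100.0;
    }

    public static void main(String[] args) {
        double heap = 3072, young = 2944;
        int ratio = 190, target = 90;
        System.out.printf("tenured  = %.2f MB%n", tenuredMb(heap, young));
        System.out.printf("survivor = %.2f MB each%n", survivorMb(young, ratio));
        System.out.printf("eden     = %.2f MB%n", edenMb(young, ratio));
        System.out.printf("survivor fill target = %.2f MB%n",
                survivorTargetMb(young, ratio, target));
    }
}
```

With these numbers tenured comes out at 128 MB, which leaves room for the roughly 50 MB of long-lived cache data described in the question, and each survivor space is about 15 MB.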
The exact params you would use for this setup depend on the steady-state size of your working set (i.e. how much is alive at the time of each collection). The thinking here obviously goes against the normal heap sizing rules, but then you don't have an app that behaves in the normal way. The app is mostly very short-lived garbage plus a bit of static data, so set the JVM up so that the static data gets into tenured quickly, and then have a YG big enough that it doesn't get collected very often, thus minimising the frequency of the pauses. You'd need to twiddle the knobs repeatedly to work out what a good size is for you and how that balances against the length of the pause you get per collection. You might find that shorter but more frequent YG pauses are achievable, for example.
You don't say how long your app runs for, but the target here is to have no tenured collections at all for the life of the app. This may be impossible, of course, but it's worth aiming for.
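One way to check whether that target is being met is to read the collector counters from inside the running app. The sketch below uses the standard java.lang.management API; the class name GcCounters is just illustrative, and the collector names it prints depend on which collector the JVM is running.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative helper: reports how often each collector has run so far.
class GcCounters {

    /** Returns collector name -> { collection count, accumulated collection time in ms }. */
    static Map<String, long[]> snapshot() {
        Map<String, long[]> result = new LinkedHashMap<String, long[]>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both values may be -1 if the collector does not report them.
            result.put(gc.getName(),
                    new long[] { gc.getCollectionCount(), gc.getCollectionTime() });
        }
        return result;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, long[]> e : snapshot().entrySet()) {
            System.out.printf("%s: count=%d, time=%d ms%n",
                    e.getKey(), e.getValue()[0], e.getValue()[1]);
        }
    }
}
```

If the count for the old generation collector stays at zero after the startup data has loaded, the sizing is doing what it was meant to do. Dividing the young collector's accumulated time by its count gives a rough average pause to compare across settings.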
However, it's not just the collection algorithm that is important in your case; it is also where the memory is allocated. The NUMA collector (only compatible with the throughput collector and activated with the UseNUMA switch) makes use of the observation that an object is often used purely by the thread that created it, and allocates memory accordingly. I'm not sure what it is based on in Linux, but it uses MPO (memory placement optimisation) on Solaris; there are some details on one of the GC guys' blogs.
Since you're using a 64-bit JVM, make sure you're using CompressedOops as well.
Given that rate of object allocation (possibly some sort of science lib?) and those lifetimes, you should give some consideration to object reuse. One example of a lib doing this is the Javolution StackContext.
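As a rough illustration of the object reuse idea, here is a minimal hand-rolled pool. It is not Javolution's API; BufferPool and its methods are hypothetical names, and the sketch assumes each pool is only ever used by a single thread.

```java
import java.util.ArrayDeque;
import java.util.Arrays;

// Hypothetical minimal pool of fixed-length double[] buffers.
// Not thread-safe: intended as one pool per thread.
class BufferPool {
    private final ArrayDeque<double[]> free = new ArrayDeque<double[]>();
    private final int length;
    private int allocated; // how many arrays were actually created with `new`

    BufferPool(int length) {
        this.length = length;
    }

    /** Hands out a recycled buffer if one is available, otherwise allocates one. */
    double[] acquire() {
        double[] buf = free.poll(); // null when the pool is empty
        if (buf == null) {
            buf = new double[length];
            allocated++;
        }
        return buf;
    }

    /** Returns a buffer to the pool, zeroing it so stale data cannot leak out. */
    void release(double[] buf) {
        if (buf.length != length) {
            throw new IllegalArgumentException("buffer has the wrong length");
        }
        Arrays.fill(buf, 0.0);
        free.push(buf);
    }

    int allocatedCount() {
        return allocated;
    }

    public static void main(String[] args) {
        BufferPool pool = new BufferPool(1024);
        for (int i = 0; i < 1000000; i++) {
            double[] scratch = pool.acquire();
            scratch[0] = i; // stand-in for the real per-request work
            pool.release(scratch);
        }
        System.out.println("arrays allocated: " + pool.allocatedCount());
    }
}
```

Pooled objects live long enough to end up in tenured, so whether this beats plain allocation of very short-lived objects is something to measure rather than assume.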
Finally, it's worth noting that GC pauses are not the only STW pauses. You could run with the 6u21 early access build, which has some fixes to the PrintGCApplicationStoppedTime and PrintGCApplicationConcurrentTime switches (these effectively print the time spent at a global safepoint and the time between those safepoints). You can use the safepoint statistics flag (-XX:+PrintSafepointStatistics) to get some idea of what is causing the JVM to need a safepoint (i.e. a point at which no byte code is being executed by any thread).
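To turn that stopped-time output into numbers you can compare against your latency budget, something like the following sketch could be used. It assumes the log lines look like "Total time for which application threads were stopped: 0.0105000 seconds", which is what HotSpot typically prints for -XX:+PrintGCApplicationStoppedTime; the exact wording can vary between builds, and StoppedTimeParser is a made-up name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative parser for stopped-time log lines.
// ASSUMPTION: the line format below matches your JVM build; adjust the regex if not.
class StoppedTimeParser {
    private static final Pattern STOPPED = Pattern.compile(
            "Total time for which application threads were stopped: "
            + "([0-9]+\\.[0-9]+) seconds");

    /** Returns the stopped time in milliseconds, or -1 if the line does not match. */
    static double stoppedMillis(String line) {
        Matcher m = STOPPED.matcher(line);
        return m.find() ? Double.parseDouble(m.group(1)) * 1000.0 : -1.0;
    }

    /** Returns the longest stopped time in milliseconds, or 0 if no line matches. */
    static double maxStoppedMillis(String[] lines) {
        double max = 0.0;
        for (String line : lines) {
            max = Math.max(max, stoppedMillis(line));
        }
        return max;
    }

    public static void main(String[] args) {
        String[] sample = {
            "Total time for which application threads were stopped: 0.0003000 seconds",
            "Application time: 1.2500000 seconds",
            "Total time for which application threads were stopped: 0.0105000 seconds"
        };
        System.out.printf("longest stop: %.3f ms%n", maxStoppedMillis(sample));
    }
}
```

Any stop that shows up here without a matching GC log entry is a safepoint taken for some other reason, which is where the safepoint statistics output helps.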