Java very large heap sizes

Does anyone have experience with using very large heaps, 12 GB or higher, in Java?

  • Does the GC make the program unusable?
  • What GC parameters do you use?
  • Which JVM is better suited for this, Sun's or BEA's?
  • Which platform performs better in this scenario, Linux or Windows?
  • In the Windows case, is there a performance difference between 64-bit Vista and XP under such a high memory load?

Recommended Answer

If your application is not interactive, and GC pauses are not an issue for you, there shouldn't be any problem for 64-bit Java to handle very large heaps, even in hundreds of GBs. We also haven't noticed any stability issues on either Windows or Linux.
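
For reference, launching such a heap needs nothing exotic. A minimal sketch, assuming a 64-bit JVM and a machine that actually has the RAM (the heap size and jar name here are made up):

    # Hypothetical example: a non-interactive app on a very large heap; only -Xms/-Xmx are set.
    java -Xms100g -Xmx100g -jar your-app.jar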

However, when you need to keep GC pauses low, things get really nasty:

  1. Forget the default throughput, stop-the-world GC. It will pause your application for several tens of seconds for moderate heaps (< ~30 GB) and several minutes for large ones (> ~30 GB). And buying faster DIMMs won't help.
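
To actually see those pauses, turn on GC logging. A rough sketch, with an illustrative 16 GB heap and standard HotSpot flags of that era:

    # Hypothetical example: the throughput (parallel, stop-the-world) collector with GC logging,
    # so the length of every stop-the-world pause shows up in the log.
    java -Xms16g -Xmx16g \
         -XX:+UseParallelGC -XX:+UseParallelOldGC \
         -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
         -jar your-app.jar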

The best bet is probably the CMS collector, enabled by -XX:+UseConcMarkSweepGC. The CMS garbage collector stops the application only during the initial mark and remark phases. For very small heaps, say < 4 GB, this is usually not a problem, but for an application that creates a lot of garbage on a large heap, the remark phase can take quite a long time - usually much shorter than a full stop-the-world collection, but it can still be a problem for very large heaps.
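
A rough sketch of enabling it (heap size is illustrative; -XX:+UseParNewGC and -XX:+CMSParallelRemarkEnabled are standard companion flags that parallelize the young collections and the remark pause mentioned above):

    # Hypothetical example: CMS for the old generation, parallel collector for the young
    # generation, and parallel remarking to keep the remark pause shorter on a big heap.
    java -Xms16g -Xmx16g \
         -XX:+UseConcMarkSweepGC -XX:+UseParNewGC \
         -XX:+CMSParallelRemarkEnabled \
         -jar your-app.jar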

When the CMS garbage collector is not fast enough to finish its work before the tenured generation fills up, it falls back to the standard stop-the-world GC. Expect pauses of roughly 30 seconds or more for a 16 GB heap. You can try to avoid this by keeping your application's rate of long-lived garbage production as low as possible. Note that the more cores your application runs on, the bigger this problem gets, because CMS utilizes only one core. And obviously, there is no guarantee that CMS will not fall back to the STW collector. When it does, it usually happens at peak load, and your application is dead for several seconds. You probably don't want to sign an SLA for such a configuration.
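
One common mitigation, sketched here with illustrative numbers, is to start the concurrent cycle earlier so CMS is less likely to lose the race against the application; this reduces, but does not eliminate, the chance of the fallback:

    # Hypothetical example: kick off the CMS cycle at 60% old-generation occupancy instead of
    # letting the JVM pick the threshold, trading some throughput for fewer fallbacks.
    java -Xms16g -Xmx16g \
         -XX:+UseConcMarkSweepGC -XX:+UseParNewGC \
         -XX:CMSInitiatingOccupancyFraction=60 \
         -XX:+UseCMSInitiatingOccupancyOnly \
         -jar your-app.jar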

Well, there is that new G1 thing. It is theoretically designed to avoid the problems with CMS, but we have tried it (enabled roughly as in the sketch after this list) and observed that:

  • Its throughput is worse than that of CMS.
  • In theory, it should avoid collecting the popular blocks of memory first; in practice it quickly reaches a state where almost all blocks are "popular", and the assumptions it is based on simply stop working.
  • Finally, the stop-the-world fallback still exists in G1; ask Oracle when that code is supposed to run. If they say "never", ask them why the code is there. So IMHO, G1 does not really make the huge heap problem of Java go away, it only makes it (arguably) somewhat smaller.
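
For completeness, a sketch of how G1 is typically enabled (pause target and heap size are illustrative; on the Java 6 builds where G1 first appeared, -XX:+UnlockExperimentalVMOptions was also required):

    # Hypothetical example: G1 with a soft pause-time goal of 200 ms. The goal is a hint, not a
    # guarantee - the stop-the-world fallback mentioned above is still there.
    java -Xms16g -Xmx16g \
         -XX:+UseG1GC -XX:MaxGCPauseMillis=200 \
         -jar your-app.jar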

If you have the bucks for a big server with big memory, you probably also have the bucks for a good, commercial, hardware-accelerated, pauseless GC technology, like the one offered by Azul. We have one of their servers with 384 GB RAM and it really works fine - no pauses, zero lines of stop-the-world code in the GC.

Write the damn part of your application that requires lots of memory in C++, like LinkedIn did with social graph processing. You still won't avoid all the problems this way (e.g. heap fragmentation), but it will definitely be easier to keep the pauses low.
