Why does the Sun JVM continue to consume ever more RSS memory even when the heap, etc. sizes are stable?

2022-01-16 00:00:00 performance memory jvm java sun

Over the past year I've made huge improvements in my application's Java heap usage, a solid 66% reduction. In pursuit of that, I've been monitoring various metrics via SNMP, such as Java heap size, CPU, Java non-heap, etc.

Recently, I've been monitoring how much real memory (RSS, resident set size) is used by the JVM, and I am somewhat surprised. The real memory consumed by the JVM seems totally independent of my application's heap size, non-heap, eden space, thread count, etc.

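Heap size as measured by Java SNMP: Java Heap Usage Graph http://lanai.dietpizza.ch/images/jvm-heap-used.png

Real memory in KB. (E.g.: 1 MB of KB = 1 GB) Java Heap Usage Graph http://lanai.dietpizza.ch/images/jvm-rss.png

(The three dips in the heap graph correspond to application updates/restarts.)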

This is a problem for me because all that extra memory the JVM is consuming is "stealing" memory that could be used by the OS for file caching. In fact, once the RSS value reaches ~2.5-3GB, I start to see slower response times and higher CPU utilization from my application, mostly due to IO wait. At some point paging to the swap partition kicks in. This is all very undesirable.

So, my questions:

  • Why is this happening? What is going on "under the hood"?
  • What can I do to keep the JVM's real memory consumption in check?

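The gory details:

  • RHEL4 64-bit (Linux - 2.6.9-78.0.5.ELsmp #1 SMP Wed Sep 24 ... 2008 x86_64 ... GNU/Linux)
  • Java 6 (build 1.6.0_07-b06)
  • Tomcat 6
  • Application (on-demand HTTP video streaming)
    • High I/O via java.nio FileChannels
    • Hundreds to low thousands of threads
    • Low database use
    • Spring, Hibernate

    Relevant JVM parameters: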

    -Xms128m  
    -Xmx640m  
    -XX:+UseConcMarkSweepGC  
    -XX:+AlwaysActAsServerClassMachine  
    -XX:+CMSIncrementalMode    
    
    -XX:+PrintGCDetails 
    -XX:+PrintGCTimeStamps  
    -XX:+PrintGCApplicationStoppedTime  
    -XX:+CMSLoopWarn  
    -XX:+HeapDumpOnOutOfMemoryError 
    

    How I measure RSS:

    ps x -o command,rss | grep java | grep latest | cut -b 17-
    

    This goes into a text file and is read into an RRD database by the monitoring system at regular intervals. Note that ps reports RSS in kilobytes.
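For anyone who would rather sample RSS from inside the JVM than shell out to ps, here is a minimal sketch. It assumes a Linux kernel that exposes a `VmRSS` line in `/proc/self/status`; the class name `RssProbe` and its methods are made up for illustration and are not part of the original monitoring setup.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper: reads the resident set size of the current process
// from Linux's /proc/self/status.
final class RssProbe {

    // Extracts the VmRSS value in kB from the lines of a /proc/<pid>/status
    // file; returns -1 if no VmRSS line is present.
    static long parseVmRssKb(List<String> statusLines) {
        for (String line : statusLines) {
            if (line.startsWith("VmRSS:")) {
                // Typical line: "VmRSS:\t  123456 kB"
                String[] parts = line.substring("VmRSS:".length()).trim().split("\\s+");
                return Long.parseLong(parts[0]);
            }
        }
        return -1;
    }

    // Linux only: /proc/self/status does not exist on other platforms.
    static long currentRssKb() throws IOException {
        return parseVmRssKb(
                Files.readAllLines(Paths.get("/proc/self/status"), StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        System.out.println("RSS (kB): " + currentRssKb());
    }
}
```

Like ps, this reports kilobytes, so the value can feed the same RRD data source without rescaling.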

    While in the end it was ATorras's answer that proved correct, it was kdgregory who guided me to the right diagnostic path with the use of pmap. (Go vote up both their answers!) Here is what was happening:

    Things I know for sure:

    1. My application records and displays data with JRobin 1.4, something I coded into my app over three years ago.
    2. The busiest instance of the application currently creates
        1. over 1000 new JRobin database files (at about 1.3MB each) within an hour of starting up
        2. ~100+ more each day after start-up
  • The app updates these JRobin database objects once every 15s, if there is something to write.
  • In the default configuration JRobin:

    1. uses a java.nio-based file access back-end. This back-end maps MappedByteBuffers to the files themselves.
    2. once every five minutes a JRobin daemon thread calls MappedByteBuffer.force() on the MBB underlying every JRobin database
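To make the mechanism concrete, here is a rough sketch of what a mapped-file back-end does in plain java.nio terms. This is not JRobin's code; the class `MappedFileSketch` and its method are hypothetical and only use standard `FileChannel.map` and `MappedByteBuffer.force()` calls.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

// Hypothetical illustration of a memory-mapped file back-end.
final class MappedFileSketch {

    // Maps the first `size` bytes of `path` read/write, stores one long at
    // offset 0, and flushes the change back to the file with force().
    static long writeAndReadBack(String path, int size, long value) throws IOException {
        RandomAccessFile raf = new RandomAccessFile(path, "rw");
        try {
            raf.setLength(size); // make sure the file covers the mapped region
            FileChannel channel = raf.getChannel();
            MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_WRITE, 0, size);
            buffer.putLong(0, value); // touching a page makes it resident in memory
            buffer.force();           // asks the OS to write changed pages to disk
            return buffer.getLong(0);
        } finally {
            // Closing the file does NOT remove the mapping; it stays valid
            // until the MappedByteBuffer itself is garbage collected.
            raf.close();
        }
    }
}
```

The pages backing such a buffer live outside the Java heap, so they never appear in heap or non-heap metrics, but the ones that have been touched do count toward the process's RSS.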

  • pmap listed:

    1. 6500 mappings
    2. 5500 of which were 1.3MB JRobin database files, which works out to ~7.1GB

  • That last point was my "Eureka!" moment.
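The pmap arithmetic above (5500 files × 1.3MB ≈ 7.1GB) can also be checked programmatically. The sketch below is an assumption-laden alternative to eyeballing pmap output: it sums mapping sizes from lines in the Linux `/proc/<pid>/maps` format. The class name `MappingTotals` and the `.jrb` file suffix used in the usage note are hypothetical.

```java
import java.util.List;

// Hypothetical helper: totals the size of file mappings by path suffix.
final class MappingTotals {

    // Sums the sizes (in bytes) of all mappings whose path ends with `suffix`.
    // Expects lines in /proc/<pid>/maps format:
    //   "start-end perms offset dev inode   path"
    static long totalMappedBytes(List<String> mapsLines, String suffix) {
        long total = 0;
        for (String line : mapsLines) {
            String[] fields = line.trim().split("\\s+", 6);
            if (fields.length < 6 || !fields[5].endsWith(suffix)) {
                continue; // anonymous mapping, or a different file
            }
            String[] range = fields[0].split("-");
            long start = Long.parseUnsignedLong(range[0], 16);
            long end = Long.parseUnsignedLong(range[1], 16);
            total += end - start;
        }
        return total;
    }
}
```

Feeding it the lines of `/proc/<pid>/maps` with a suffix such as `.jrb` (assuming that is how the database files are named) gives the total mapped size, which can then be compared against the RSS figure.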

    My corrective actions:

    1. Consider updating to the latest JRobinLite 1.5.2, which is apparently better.
    2. Implement proper resource handling on JRobin databases. At the moment, my application creates a database and then never dumps it once the database is no longer actively used.
    3. Experiment with moving the MappedByteBuffer.force() call to database update events instead of a periodic timer. Will the problem magically go away?
    4. Immediately, change the JRobin back-end to the java.io implementation, a one-line change. This will be slower, but that is possibly not an issue. Here is a graph showing the immediate impact of this change.

    Java RSS memory usage graph http://lanai.dietpizza.ch/images/stackoverflow-rss-problem-fixed.png
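For contrast with the mapped approach, this is roughly what a java.io-style back-end amounts to, combined with the explicit resource handling from action 2. It is only a sketch under my own assumptions, not JRobin's actual java.io back-end; `PlainFileStore` and its methods are hypothetical names.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical illustration of plain java.io file access with explicit close.
final class PlainFileStore {

    // Writes `value` at byte `offset` using ordinary java.io calls and closes
    // the file right away, so no long-lived mapping is left behind.
    static void writeDouble(String path, long offset, double value) throws IOException {
        RandomAccessFile raf = new RandomAccessFile(path, "rw");
        try {
            raf.seek(offset);
            raf.writeDouble(value);
        } finally {
            raf.close(); // releases the file descriptor deterministically
        }
    }

    // Reads back the double stored at byte `offset`.
    static double readDouble(String path, long offset) throws IOException {
        RandomAccessFile raf = new RandomAccessFile(path, "r");
        try {
            raf.seek(offset);
            return raf.readDouble();
        } finally {
            raf.close();
        }
    }
}
```

Each call costs a system call or two instead of a memory write, which is where the expected slowdown comes from, but the data only passes through the OS file cache and is not pinned into the process's address space as thousands of mappings.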

    Questions that I may or may not have time to figure out:

    • What is going on inside the JVM with MappedByteBuffer.force()? If nothing has changed, does it still write the entire file? Part of the file? Does it load it first?
    • Is a certain amount of the MBB always in RSS? (RSS was roughly half the total allocated MBB sizes. Coincidence? I suspect not.)
    • If I move the MappedByteBuffer.force() call to database update events instead of a periodic timer, will the problem magically go away?
    • Why was the RSS slope so regular? It does not correlate with any of the application load metrics.

    Recommended answer

    Just an idea: NIO buffers are placed outside the JVM.
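One way to see this from inside the process, assuming Java 7 or later (the `BufferPoolMXBean` API did not exist in the Java 6 build used in the question), is to compare heap usage with the NIO buffer pools. The class `OffHeapReport` below is a hypothetical sketch; the pool names "direct" and "mapped" are the ones HotSpot reports.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper: reports heap usage next to NIO buffer pool usage.
final class OffHeapReport {

    // Returns the bytes currently held by the named NIO buffer pool
    // ("direct" or "mapped"), or -1 if the JVM exposes no such pool.
    static long poolBytes(String poolName) {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if (pool.getName().equals(poolName)) {
                return pool.getMemoryUsed();
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        System.out.println("heap used (bytes):   " + heap.getUsed());
        System.out.println("mapped pool (bytes): " + poolBytes("mapped"));
        System.out.println("direct pool (bytes): " + poolBytes("direct"));
    }
}
```

A "mapped" figure that dwarfs the heap figure would point at the same culprit that pmap revealed.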

    As of 2016, it's worth considering @Lari Hotari's comment [ Why does the Sun JVM continue to consume ever more RSS memory even when the heap, etc sizes are stable? ] because back in 2009, RHEL4 had glibc < 2.10 (~2.3)

    Regards.
