How to iterate over all malloc chunks (glibc)
I'm trying to iterate over every malloc_chunk in every arena. (This is core-file debugging, for investigating memory leaks and memory corruption.)
As far as I know, each arena has a top pointer (top_chunk) that points to the top chunk of that arena. The top chunk has prev_size and size fields, so, based on the code in glibc/malloc/malloc.c, I can use them to reach the previous contiguous chunk and then loop over all the chunks in one arena. (That would let me compute statistics on chunk sizes and counts, like WinDBG's !heap -stat -h.) Using prev_size and size, I can also check whether a chunk is corrupt.
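For reference, the size field of a glibc chunk header also carries three flag bits in its low bits, so it has to be masked before it is used as a length. A minimal Python sketch of that decoding follows; the helper name decode_size_field is made up for illustration, while the flag values are the ones defined in glibc's malloc.c.

```python
# Flag bits kept in the low bits of malloc_chunk.size (values from glibc malloc.c).
PREV_INUSE = 0x1      # the previous adjacent chunk is in use
IS_MMAPPED = 0x2      # the chunk was obtained directly via mmap
NON_MAIN_ARENA = 0x4  # the chunk belongs to an arena other than main_arena
SIZE_BITS = PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA


def decode_size_field(raw_size):
    """Split a raw size field into the real chunk size and its flags.

    Hypothetical helper for illustration only; it is not part of any tool.
    """
    return {
        "size": raw_size & ~SIZE_BITS,
        "prev_inuse": bool(raw_size & PREV_INUSE),
        "is_mmapped": bool(raw_size & IS_MMAPPED),
        "non_main_arena": bool(raw_size & NON_MAIN_ARENA),
    }


# Example: a raw value of 0x91 is a 0x90-byte chunk whose predecessor is in use.
print(decode_size_field(0x91))
```

Note that prev_size is only meaningful when the PREV_INUSE bit of the same chunk is clear, which is why the bit has to be checked before stepping backward.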
The arena structure (malloc_state) has a member variable, next, that points to the next arena, so I can then loop over the chunks of every arena.
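The arena list in glibc is circular (main_arena's next pointer initially points back to main_arena itself), so a walk has to stop when it returns to an arena it has already visited. The sketch below shows only that termination logic. The arenas dictionary is a stand-in for reading malloc_state structures out of a core file, and both it and the function name iter_arenas are assumptions of this example.

```python
def iter_arenas(arenas, main_arena_addr):
    """Yield each arena address once, following the `next` links.

    `arenas` maps an arena address to a dict with a "next" key. It is a
    hypothetical model of memory; a real tool would read each malloc_state
    from the core file instead.
    """
    addr = main_arena_addr
    seen = set()
    # Stop on the first repeated address: normally that is main_arena again,
    # but the same check also ends the walk if a corrupted link forms a loop.
    while addr not in seen:
        seen.add(addr)
        yield addr
        addr = arenas[addr]["next"]


# Made-up addresses: three arenas linked in a ring.
demo_arenas = {
    0x1000: {"next": 0x2000},
    0x2000: {"next": 0x3000},
    0x3000: {"next": 0x1000},
}
print([hex(a) for a in iter_arenas(demo_arenas, 0x1000)])
```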
But I ran into a problem: if a chunk is not allocated, its prev_size is invalid, so how do I get to the previous malloc_chunk? Or is this approach simply wrong?
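One alternative to stepping backward is to walk forward from the start of a heap region, because the size field of a chunk is always present, whereas prev_size is only meaningful when the chunk's PREV_INUSE bit is clear. The following is a minimal sketch of that idea on a hand-built byte buffer, not on a real core file. It assumes a 64-bit little-endian layout and that the region's start and top-chunk offsets are already known; walk_chunks and build_demo_heap are hypothetical names.

```python
import struct

PREV_INUSE = 0x1
SIZE_BITS = 0x7
# prev_size and size as two 8-byte fields (64-bit little-endian assumed).
HEADER = struct.Struct("<QQ")


def walk_chunks(heap, start, top):
    """Walk forward from offset `start` up to the top chunk at offset `top`.

    Yields (offset, size, in_use) per chunk and raises ValueError when the
    size or prev_size fields are inconsistent.
    """
    off = start
    while off < top:
        _, raw = HEADER.unpack_from(heap, off)
        size = raw & ~SIZE_BITS
        if size < HEADER.size or off + size > top:
            raise ValueError(f"bad size {size:#x} at offset {off:#x}")
        nxt = off + size
        next_prev_size, next_raw = HEADER.unpack_from(heap, nxt)
        # Whether this chunk is in use is recorded in the NEXT chunk's header.
        in_use = bool(next_raw & PREV_INUSE)
        if not in_use and next_prev_size != size:
            raise ValueError(f"prev_size mismatch at offset {nxt:#x}")
        yield off, size, in_use
        off = nxt


def build_demo_heap():
    """Build a tiny fake heap: chunk A (used), B (free), C (used), then top."""
    heap = bytearray(0x80)
    HEADER.pack_into(heap, 0x00, 0, 0x20 | PREV_INUSE)  # A
    HEADER.pack_into(heap, 0x20, 0, 0x30 | PREV_INUSE)  # B (A is in use)
    HEADER.pack_into(heap, 0x50, 0x30, 0x20)            # C (B is free)
    HEADER.pack_into(heap, 0x70, 0, 0x10 | PREV_INUSE)  # top (C is in use)
    return bytes(heap), 0x00, 0x70


demo_heap, demo_start, demo_top = build_demo_heap()
for offset, size, in_use in walk_chunks(demo_heap, demo_start, demo_top):
    print(hex(offset), hex(size), "used" if in_use else "free")
```

This sketch treats the PREV_INUSE bit as the only indicator of a free chunk. In real glibc heaps, chunks sitting in fastbins or the tcache keep that bit set in the following chunk, so a real tool also has to consult those lists before counting a chunk as used.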
Background:
The bug we have is a memory leak reported on several online data nodes (our project is a distributed storage cluster).
What we did, and the results:
We used Valgrind to try to reproduce the bug in a test cluster, but unfortunately it reported nothing.
I tried to investigate the heap further by analyzing the heap chunks, following the approach I had used before in WinDBG (which has very useful heap commands for digging into memory leaks and memory corruption), but I was blocked by the question I asked above.
We used Valgrind's Massif tool to analyze the allocations (I find it very detailed and interesting; it shows which allocations take how much memory). Massif gave us several clues; we followed them, checked the code, and finally found the leak: a map had grown very large because it was used improperly. Its entries are erased in the holder class's destructor, which is why Valgrind did not report it as a leak.
I'll dig further into the gdb-heap source code to learn more about glibc's malloc structures.
Recommended answer
The free, open-source program chap (https://github.com/vmware/chap) does what you want here for glibc malloc. Just grab a core, either because the process crashed or by taking a live core with gcore or with the generate-core-file command from within gdb. Then open the core by running:
chap yourCoreFileName
Once you get to the chap prompt, you can iterate through all the chunks, both free and in use. Keep in mind that an "allocation" in chap does not include the chunk header; it starts at the address returned by malloc. Depending on how verbose you want the output to be, run any of the following:
count allocations
summarize allocations
describe allocations
show allocations
If you only care about allocations that are currently in use, try any of the following:
count used
summarize used
describe used
show used
If you only care about allocations that have leaked, try any of the following:
count leaked
summarize leaked
describe leaked
show leaked
More details are available in the documentation at the GitHub URL mentioned above.
As for corruption, chap does some checking at startup and reports many kinds of corruption, although the output can be a bit cryptic at times.