How to implement garbage collection in C++
I saw some posts about implementing GC in C, and some people said it is impossible because C is weakly typed. I want to know how to implement GC in C++.
I'd like a general idea of how to do it. Thank you very much!
This is a Bloomberg interview question my friend told me about. He did badly on it at the time. We would like to hear your ideas about it.
Recommended answer
Garbage collection is a difficult topic in both C and C++, for a few reasons:
Pointers can be typecast to integers and vice versa. This means that I could have a block of memory that is reachable only by taking an integer, typecasting it to a pointer, and then dereferencing it. A garbage collector has to be careful not to conclude that a block is unreachable when it can in fact still be reached.
Pointers are not opaque. Many garbage collectors, like stop-and-copy collectors, like to move blocks of memory around or compact them to save space. Since you can explicitly look at pointer values in C and C++, this can be difficult to implement correctly. You would have to be sure that, if someone was doing something tricky with casts to integers, the collector correctly updated that integer when it moved a block of memory.
Memory management can be done explicitly. Any garbage collector will need to take into account that the user is able to explicitly free blocks of memory at any time.
In C++, there is a separation between allocation/deallocation and object construction/destruction. A block of memory can be allocated with sufficient space to hold an object without any object actually being constructed there. A good garbage collector would need to know, when it reclaims memory, whether or not to call the destructor for any objects that might have been constructed there. This matters especially for the standard library containers, which often use this trick, via std::allocator, for efficiency reasons.
Memory can be allocated from different areas. C and C++ can get memory from the built-in free store (malloc/free or new/delete), from the OS via mmap or other system calls, and, in the case of C++, from get_temporary_buffer (which is paired with return_temporary_buffer to release it). Programs might also get memory from some third-party library. A good garbage collector needs to be able to track references to memory in these other pools and (possibly) would have to be responsible for cleaning them up.
Pointers can point into the middle of objects or arrays. In many garbage-collected languages like Java, object references always point to the start of the object. In C and C++, pointers can point into the middle of arrays, and in C++ into the middle of objects (for example, to a base-class subobject when multiple inheritance is used). This can greatly complicate the logic for detecting what is still reachable.
So, in short, it's extremely hard to build a garbage collector for C or C++. Most libraries that do garbage collection in C and C++ are extremely conservative in their approach and are technically unsound: they assume, for example, that you won't take a pointer, cast it to an integer, write it to disk, and then load it back in at some later time. They also assume that any pointer-sized value in memory could be a pointer, and so sometimes refuse to free unreachable memory because some value happens to look like a pointer to it.
As others have pointed out, the Boehm GC does do garbage collection for C and C++, but subject to the aforementioned restrictions.
Interestingly, C++11 includes some new library functions (such as std::declare_reachable and std::declare_no_pointers in <memory>) that allow the programmer to declare objects as reachable, or regions of memory as containing no pointers, in anticipation of future garbage collection efforts. It may be possible in the future to build a really good C++11 garbage collector with this sort of information, although these functions were later removed from the standard in C++23. In the meantime, you'll need to be extremely careful not to break any of the above rules.