卡桌和作家屏障实际上是​​如何工作的?

2022-01-16 00:00:00 garbage-collection java

我正在阅读一些关于 Java 垃圾收集的资料,以便更深入地了解 GC 过程中真正发生的事情.

I am reading some materials about garbage collection in Java in order to get to know more deeply what really happens in GC process.

我遇到了一种叫做牌桌"的机制.我已经用谷歌搜索了它,但没有找到全面的信息.大多数解释都很肤浅,描述得像魔术一样.

I came across on the mechanism called "card table". I've Googled for it and haven't found comprehensive information. Most of explanations are rather shallow and describes it like some magic.

我的问题是:卡片表和写屏障如何工作?牌桌上有什么标记?那么垃圾收集器如何知道该特定对象被老一代中持久存在的另一个对象引用.

My question is: How card table and write barrier works? What is marked in card tables? How then garbage collector knows that particular object is referenced by another object persisted in older generation.

我想对那个机制有一些实际的想象,就像我应该准备一些模拟一样.

I would like to have some practical imagination about that mechanism, like I was supposed to prepare some simulation.

推荐答案

不知道你是不是发现了一些特别糟糕的描述,还是你期望的细节太多,我对 我看到的解释.如果描述简短且听起来简单,那是因为它确实是一个相当简单的机制.

I don't know whether you found some exceptionally bad description or whether you expect too many details, I've been quite satisfied with the explanations I've seen. If the descriptions are brief and sound simplistic, that's because it really is a rather simple mechanism.

您显然已经知道,分代垃圾收集器需要能够枚举引用年轻对象的旧对象.扫描所有旧对象是正确的,但这会破坏分代方法的优势,因此您必须缩小范围.不管你怎么做,你都需要一个写屏障——每当分配/写入成员变量(引用类型)时执行的一段代码.如果新的引用指向一个年轻的对象并且它被存储在一个旧的对象中,那么写屏障会为垃圾收集记录这个事实.不同之处在于它的记录方式.有使用所谓的记忆集的精确方案,每个旧对象的集合,这些旧对象具有(在某些时候)对年轻对象的引用.可以想象,这需要相当多的空间.

As you apparently already know, a generational garbage collector needs to be able to enumerate old objects that refer to young objects. It would be correct to scan all old objects, but that destroys the advantages of the generational approach, so you have to narrow it down. Regardless of how you do that, you need a write barrier - a piece of code executed whenever a member variable (of a reference type) is assigned/written to. If the new reference points to a young object and it's stored in an old object, the write barrier records that fact for the garbage collect. The difference lies in how it's recorded. There are exact schemes using so-called remembered sets, a collection of every old object that has (had at some point) a reference to a young object. As you can imagine, this takes quite a bit of space.

卡片表是一种权衡:它不会告诉您哪些对象确切包含年轻指针(或至少在某些时候确实如此),而是将对象分组到固定大小的桶中并跟踪哪些对象桶包含带有年轻指针的对象.当然,这减少了空间使用.为了正确起见,您如何存储对象并不重要,只要您对此保持一致即可.为了提高效率,您只需将它们按内存地址分组(因为您可以免费使用),然后除以 2 的较大幂(以使除法成为一种廉价的按位运算).

The card table is a trade-off: Instead of telling you which objects exactly contains young pointers (or at least did at some point), it groups objects into fixed-sized buckets and tracks which buckets contain objects with young pointers. This, of course, reduces space usage. For correctness, it doesn't really matter how you bucket the objects, as long as you're consistent about it. For efficiency, you just group them by their memory address (because you have that available for free), divided by some larger power of two (to make the division a cheap bitwise operation).

此外,您无需维护明确的存储桶列表,而是预先为每个可能的存储桶预留一些空间.具体来说,有一个由 N 位或字节组成的数组,其中 N 是桶的数量,因此如果第 i 桶不包含年轻的,那么第 i 的值为 0指针,如果确实包含年轻指针,则为 1.这是正确的牌桌.通常,该空间与用作堆(一部分)的大块内存一起分配和释放.如果不需要增长,它甚至可以嵌入到内存块的开头.除非将整个地址空间用作堆(这种情况非常少见),否则上述公式给出的数字是从 start_of_memory_region >> 开始的.K 而不是 0,因此要获得卡表的索引,您必须减去堆的起始地址的开头.

Also, instead of maintaining an explicit list of buckets, you reserve some space for each possible bucket up-front. Specifically, there is an array of N bits or bytes, where N is the number of buckets, so that the ith value is 0 if the ith bucket contains no young pointers, or 1 if it does contain young pointers. This is the card table proper. Typically this space is allocated and freed along with a large block of memory used as (part of) the heap. It may even be embedded in the start of the memory block, if it doesn't need to grow. Unless the entire address space is used as heap (which is very rare), the above formula gives numbers starting from start_of_memory_region >> K instead of 0, so to get an index into the card table you have to subtract the start of the start address of the heap.

综上所述,当写屏障发现语句some_obj.field = other_obj;将年轻指针存储在旧对象中时,它会这样做:

In summary, when the write barrier finds that the statement some_obj.field = other_obj; stores a young pointer in an old object, it does this:

card_table[(&old_obj - start_of_heap) >> K] = 1;

其中 &old_obj 是现在具有年轻指针的对象的地址(它已经在寄存器中,因为它刚刚确定要引用旧对象).在 Minor GC 期间,垃圾收集器查看卡表以确定要扫描哪些堆区域以查找年轻指针:

Where &old_obj is the address of the object that now has a young pointer (which is already in a register because it was just determined to refer to an old object). During minor GC, the garbage collector looks at the card table to determine which heap regions to scan for young pointers:

for i from 0 to (heap_size >> K):
    if card_table[i]:
        scan heap[i << K .. (i + 1) << K] for young pointers

相关文章