Java "for" statement implementation prevents garbage collection

UPD 21.11.2017: the bug has been fixed in the JDK; see the comment from Vicente Romero.

Summary:

If a for statement is used with any Iterable implementation, the collection remains in heap memory until the end of the current scope (method, statement body) and won't be garbage collected, even if you hold no other references to the collection and the application needs to allocate new memory.

http://bugs.java.com/bugdatabase/view_bug.do?bug_id=JDK-8175883

https://bugs.openjdk.java.net/browse/JDK-8175883

Example:

Suppose I have the following code, which allocates a list of large strings with random content:

import java.util.ArrayList;
public class IteratorAndGc {
    
    // number of strings and the size of every string
    static final int N = 7500;

    public static void main(String[] args) {
        System.gc();

        gcInMethod();

        System.gc();
        showMemoryUsage("GC after the method body");

        ArrayList<String> strings2 = generateLargeStringsArray(N);
        showMemoryUsage("Third allocation outside the method is always successful");
    }

    // main testable method
    public static void gcInMethod() {

        showMemoryUsage("Before first memory allocating");
        ArrayList<String> strings = generateLargeStringsArray(N);
        showMemoryUsage("After first memory allocation");


        // this is the only difference: once the iterator is created, the memory won't be collected until the end of this method
        for (String string : strings);
        showMemoryUsage("After iteration");

        strings = null; // discard the reference to the array

        // some say this doesn't guarantee garbage collection;
        // Oracle says "the Java Virtual Machine has made a best effort to reclaim space from all discarded objects".
        // Either way, the program behaves the same with or without this line. You may remove it and test.
        System.gc();

        showMemoryUsage("After force GC in the method body");

        try {
            System.out.println("Try to allocate memory in the method body again:");
            ArrayList<String> strings2 = generateLargeStringsArray(N);
            showMemoryUsage("After secondary memory allocation");
        } catch (OutOfMemoryError e) {
            showMemoryUsage("!!!! Out of memory error !!!!");
            System.out.println();
        }
    }
    
    // function to allocate and return a reference to a lot of memory
    private static ArrayList<String> generateLargeStringsArray(int N) {
        ArrayList<String> strings = new ArrayList<>(N);
        for (int i = 0; i < N; i++) {
            StringBuilder sb = new StringBuilder(N);
            for (int j = 0; j < N; j++) {
                sb.append((char)Math.round(Math.random() * 0xFFFF));
            }
            strings.add(sb.toString());
        }

        return strings;
    }

    // helper method to display current memory status
    public static void showMemoryUsage(String action) {
        long free = Runtime.getRuntime().freeMemory();
        long total = Runtime.getRuntime().totalMemory();
        long max = Runtime.getRuntime().maxMemory();
        long used = total - free;
        System.out.printf("	%40s: %10dk of max %10dk%n", action, used / 1024, max / 1024);
    }
}

Compile and run it with limited memory, like this (180 MB):

javac IteratorAndGc.java   &&   java -Xms180m -Xmx180m IteratorAndGc

At runtime I get:

Before first memory allocating: 1251k of max 176640k
After first memory allocation: 131426k of max 176640k
After iteration: 131426k of max 176640k
After force GC in the method body: 110682k of max 176640k (almost nothing collected)
Try to allocate memory in the method body again:
     !!!! Out of memory error !!!!:     168948k of max     176640k
GC after the method body: 459k of max 176640k (the garbage is collected!)
Third allocation outside the method is always successful: 117740k of max 163840k

So, inside gcInMethod() I allocated the list, iterated over it, discarded the reference to the list, (optionally) forced garbage collection and tried to allocate a similar list again. But I can't allocate the second list because of a lack of memory.

At the same time, outside the function body I can successfully force garbage collection (optional) and allocate a list of the same size again!

To avoid this OutOfMemoryError inside the function body, it's enough to remove or comment out just this one line:

for (String string : strings); <-- this is the evil!!!

Then the output looks like this:

Before first memory allocating: 1251k of max 176640k
After first memory allocation: 131409k of max 176640k
After iteration: 131409k of max 176640k
After force GC in the method body: 497k of max 176640k (the garbage is collected!)
Try to allocate memory in the method body again:
After secondary memory allocation: 115541k of max 163840k
GC after the method body: 493k of max 163840k (the garbage is collected!)
Third allocation outside the method is always successful: 121300k of max 163840k

So, without the for iteration, the garbage is successfully collected after the reference to the strings is discarded, and the list can be allocated a second time (inside the function body) and a third time (outside the method).

My supposition:

The for syntax construct is compiled to:

Iterator iter = strings.iterator();
while(iter.hasNext()){
    iter.next();
}

(I checked this by disassembling with javap -c IteratorAndGc.class)

It looks like this iter reference stays in scope until the end. You don't have access to the reference to nullify it, so the GC can't collect the list.
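If that supposition holds, one possible workaround on an affected javac is to move the loop into its own method, so that the hidden iterator variable goes away together with that method's stack frame. The following is only a minimal sketch under that assumption; the class LoopInHelper and the helper countChars are hypothetical names, not part of the original test.

```java
import java.util.ArrayList;
import java.util.List;

public class LoopInHelper {

    // The enhanced for loop, and therefore the compiler-generated iterator
    // variable, exists only in this method's frame.
    public static long countChars(Iterable<String> items) {
        long total = 0;
        for (String s : items) {
            total += s.length();
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> strings = new ArrayList<>();
        strings.add("ab");
        strings.add("cde");

        long total = countChars(strings);

        // After this assignment the caller's frame holds no reference to the
        // list: neither the named variable nor a hidden iterator.
        strings = null;

        System.out.println(total);
    }
}
```

Whether the list is actually reclaimed at that point still depends on the JVM and the collector; the sketch only removes the hidden reference from the caller's frame.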

Maybe this is normal behavior (maybe it's even specified for javac, but I haven't found where), but IMHO if the compiler creates some instances, it should take care of discarding them from the scope after use.

This is how I would expect the for statement to be implemented:

Iterator iter = strings.iterator();
while(iter.hasNext()){
    iter.next();
}
iter = null; // <--- flush the water!
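Written by hand, the same loop makes the iterator an ordinary local variable that can be cleared explicitly. The sketch below is a self-contained illustration of that idea, with hypothetical names (ManualIteration, iterateThenRelease); it is not what javac emits.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class ManualIteration {

    // Builds a list, walks it with an explicit iterator, then drops every
    // reference this frame holds to the list before doing anything else.
    public static int iterateThenRelease(int n) {
        List<String> strings = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            strings.add("item-" + i);
        }

        int count = 0;
        Iterator<String> iter = strings.iterator();
        while (iter.hasNext()) {
            iter.next();
            count++;
        }

        iter = null;    // the iterator is a named local, so it can be cleared
        strings = null; // discard the list reference as well

        // From here on, nothing in this frame refers to the list, so a later
        // large allocation in this method would not be blocked by it.
        return count;
    }

    public static void main(String[] args) {
        System.out.println(iterateThenRelease(3));
    }
}
```

Wrapping the loop in a nested block instead of assigning null would probably not be enough: as far as I understand, the local variable slot keeps its old value until it is overwritten, which is the same effect the question describes.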

Java compiler and runtime versions used:

javac 1.8.0_111

java version "1.8.0_111"
Java(TM) SE Runtime Environment (build 1.8.0_111-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.111-b14, mixed mode)

Note:

  • the question is not about programming style, best practices, conventions and so on; it is about the efficiency of the Java platform.

  • the question is not about System.gc() behavior (you may remove all gc calls from the example): during the second strings allocation the JVM must release the discarded memory.

Reference to the test java class, Online compiler to test (but this resource has only 50 MB of heap, so use N = 5000)

Recommended answer

Thanks for the bug report. We have fixed this bug, see JDK-8175883. As commented here, in the case of the enhanced for, javac was generating synthetic variables, so for code like:

void foo(String[] data) {
    for (String s : data);
}

javac was generating approximately:

for (String[] arr$ = data, len$ = arr$.length, i$ = 0; i$ < len$; ++i$) {
    String s = arr$[i$];
}

As mentioned above, this translation approach means that the synthetic variable arr$ holds a reference to the array data, which prevents the GC from collecting the array once it is no longer referenced inside the method. This bug has been fixed by generating this code instead:

String[] arr$ = data;
String s;
for (int len$ = arr$.length, i$ = 0; i$ < len$; ++i$) {
    s = arr$[i$];
}
arr$ = null;
s = null;

The idea is to set to null any synthetic variable of a reference type that javac creates to translate the loop. If we were talking about an array of a primitive type, then the last assignment to null is not generated by the compiler. The bug has been fixed in the JDK repo.
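One way to probe whether the array is still held after the loop is a WeakReference. The sketch below is only an illustration with hypothetical names (ReachabilityProbe, isCleared), and it assumes that System.gc() actually triggers a collection, which the JVM does not guarantee. The printed result can therefore differ between javac versions, JVMs and JIT states.

```java
import java.lang.ref.WeakReference;

public class ReachabilityProbe {

    // Requests a GC and reports whether the referent has been cleared.
    // System.gc() is only a request, so a false result is not proof that
    // the object is still strongly reachable.
    public static boolean isCleared(WeakReference<?> ref) {
        System.gc();
        return ref.get() == null;
    }

    public static void main(String[] args) {
        String[] data = new String[1000];
        WeakReference<String[]> probe = new WeakReference<>(data);

        for (String s : data);   // the loop under test

        data = null;             // drop the only named reference

        // If a synthetic variable still points at the array, the weak
        // reference cannot be cleared here.
        System.out.println("array cleared after loop: " + isCleared(probe));
    }
}
```

Compiling this with a javac from before and after the fix and comparing the output is one way to see the difference, with the caveat above about System.gc().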
