Java is not garbage collecting memory

I am reading a very large file and extracting a small portion of text from each line. At the end of the operation, however, I am left with very little memory to work with. It seems that the garbage collector fails to free the memory after the file has been read.

My question is: is there any way to free this memory, or is this a JVM bug?

I created an SSCCE to demonstrate this. It reads in a 1 MB file (2 MB in Java because of the 16-bit character encoding) and extracts one character from each line (~4000 lines, so the result should be about 8 kB). At the end of the test, the full 2 MB is still in use!

Initial memory usage:

Allocated: 93847.55 kb
Free: 93357.23 kb

Immediately after reading in the file (before any manual garbage collection):

Allocated: 93847.55 kb
Free: 77613.45 kb (~16mb used)

This is to be expected, since the program uses a lot of resources while reading the file.

I then run the garbage collector, but not all of the memory is freed:

Allocated: 93847.55 kb
Free: 91214.78 kb (~2 mb used! That's the entire file!)

I know that manually calling the garbage collector gives no guarantees (in some cases it is lazy). However, this was happening in my larger application, where the file eats up almost all available memory and causes the rest of the program to run out of memory even though that memory is needed elsewhere. This example confirms my suspicion that the excess data read from the file is not freed.

Here is the SSCCE that produces the output above:

import java.io.*;
import java.util.*;

public class Test {
    public static void main(String[] args) throws Throwable {
        Runtime rt = Runtime.getRuntime();

        double alloc = rt.totalMemory()/1000.0;
        double free = rt.freeMemory()/1000.0;

        System.out.printf("Allocated: %.2f kb\nFree: %.2f kb\n\n",alloc,free);

        Scanner in = new Scanner(new File("my_file.txt"));
        ArrayList<String> al = new ArrayList<String>();

        while(in.hasNextLine()) {
            String s = in.nextLine();
            al.add(s.substring(0,1)); // extracts first 1 character
        }

        alloc = rt.totalMemory()/1000.0;
        free = rt.freeMemory()/1000.0;
        System.out.printf("Allocated: %.2f kb\nFree: %.2f kb\n\n",alloc,free);

        in.close();
        System.gc();

        alloc = rt.totalMemory()/1000.0;
        free = rt.freeMemory()/1000.0;
        System.out.printf("Allocated: %.2f kb\nFree: %.2f kb\n\n",alloc,free);
    }
}
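The "used" figures quoted next to the output above are simply allocated minus free. A minimal sketch of a helper that prints that difference directly is shown below; the class name MemoryReport and its methods are hypothetical and are not part of the original test.

```java
public class MemoryReport {

    /** Heap currently in use, in kB (1 kB = 1000 bytes, matching the test above). */
    static double usedKb(Runtime rt) {
        return (rt.totalMemory() - rt.freeMemory()) / 1000.0;
    }

    /** Formats allocated, free and used heap in the same style as the test output. */
    static String report(Runtime rt) {
        double alloc = rt.totalMemory() / 1000.0;
        double free = rt.freeMemory() / 1000.0;
        return String.format("Allocated: %.2f kb%nFree: %.2f kb%nUsed: %.2f kb%n",
                alloc, free, usedKb(rt));
    }

    public static void main(String[] args) {
        System.out.print(report(Runtime.getRuntime()));
    }
}
```

The three values are read at slightly different moments, so the printed "Used" line is only an approximation of allocated minus free.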

Recommended answer

When you create a substring, the substring keeps a reference to the char array of the original string (this optimization makes handling many substrings of one string very fast). So, because you keep your substrings in the al list, you are keeping the whole file in memory. To avoid this, create a new String using the constructor that takes a String as its argument.

So basically I'd suggest you do:

    while(in.hasNextLine()) {
        String s = in.nextLine();
        al.add(new String(s.substring(0,1))); // extracts first 1 character
    }
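A self-contained version of this loop could look like the sketch below. The class FirstChars and the method firstChars are hypothetical names, and skipping empty lines is an added assumption (the original loop would throw on an empty line); the copying step itself is the one suggested above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class FirstChars {

    /** Collects the first character of every non-empty line (hypothetical helper). */
    static List<String> firstChars(Scanner in) {
        List<String> result = new ArrayList<>();
        while (in.hasNextLine()) {
            String s = in.nextLine();
            if (s.isEmpty()) {
                continue; // assumption: skip empty lines, where substring(0, 1) would throw
            }
            // new String(...) detaches the one-character result from the line's char array
            result.add(new String(s.substring(0, 1)));
        }
        return result;
    }

    public static void main(String[] args) {
        // A Scanner over an in-memory string stands in for the file here.
        Scanner in = new Scanner("alpha\nbeta\n\ngamma\n");
        System.out.println(firstChars(in)); // prints [a, b, g]
    }
}
```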

The source code of the String(String) constructor explicitly states that one of its uses is to trim "the baggage":

    public String(String original) {
        int size = original.count;
        char[] originalValue = original.value;
        char[] v;
        if (originalValue.length > size) {
            // The array representing the String is bigger than the new
            // String itself.  Perhaps this constructor is being called
            // in order to trim the baggage, so make a copy of the array.
            int off = original.offset;
            v = Arrays.copyOfRange(originalValue, off, off+size);
        } else {
            // The array representing the String is the same
            // size as the String, so no point in making a copy.
            v = originalValue;
        }
        this.offset = 0;
        this.count = size;
        this.value = v;
    }
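To make the sharing concrete, here is a small model of that layout. It is only a sketch: Slice is a hypothetical stand-in for the old String class (a char array plus offset and count), not the JDK code itself, and retainedChars is an illustrative helper.

```java
import java.util.Arrays;

public class SharedSliceDemo {

    /** Hypothetical model of the old String layout: shared char[] plus offset and count. */
    static final class Slice {
        final char[] value; // backing array, possibly shared with other slices
        final int offset;   // index of the first character of this slice
        final int count;    // number of characters in this slice

        Slice(char[] value, int offset, int count) {
            this.value = value;
            this.offset = offset;
            this.count = count;
        }

        /** Like the old substring: reuses the backing array, nothing is copied. */
        Slice substring(int begin, int end) {
            return new Slice(value, offset + begin, end - begin);
        }

        /** Like new String(String): copies only the characters actually used. */
        Slice trimmed() {
            if (value.length > count) {
                return new Slice(Arrays.copyOfRange(value, offset, offset + count), 0, count);
            }
            return this;
        }

        /** Number of chars this slice keeps reachable. */
        int retainedChars() {
            return value.length;
        }

        @Override
        public String toString() {
            return new String(value, offset, count);
        }
    }

    public static void main(String[] args) {
        char[] line = new char[500]; // stands in for one long line of the file
        Arrays.fill(line, 'x');
        Slice whole = new Slice(line, 0, line.length);

        Slice first = whole.substring(0, 1); // one character, but shares all 500
        Slice copy = first.trimmed();        // one character, own 1-element array

        System.out.println(first.retainedChars()); // prints 500
        System.out.println(copy.retainedChars());  // prints 1
    }
}
```

In the model, a list of untrimmed one-character slices keeps every full line reachable, which matches the behaviour seen in the question.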

Update: this problem is gone as of OpenJDK 7, Update 6, where substring copies the characters it needs instead of sharing the original array. People running a more recent version do not have this problem.
