Java threads slow down towards the end of processing
I have a Java program that takes in a text file containing a list of text files and processes each line separately. To speed up the processing, I make use of threads using an ExecutorService with a FixedThreadPool with 24 threads. The machine has 24 cores and 48GB of RAM.
The text file that I'm processing has 2.5 million lines. For the first 2.3 million lines or so, things run very well with high CPU utilization. However, beyond some point (at around the 2.3 million line mark), performance degenerates: only a single CPU is utilized and my program pretty much grinds to a halt.
I've investigated a number of causes, made sure all my file handles are closed, and increased the amount of memory supplied to the JVM. However, regardless of what I change, performance always degrades towards the end. I've even tried on text files containing fewer lines and once again performance decreases towards the end of processing the file.
In addition to the standard Java concurrency libraries, the code also makes use of Lucene libraries for text processing and analysis.
When I don't thread this code, the performance is constant and doesn't degenerate towards the end. I know this is a shot in the dark and it's hard to describe what is going on, but I thought I would just see if anyone has any ideas as to what might be causing this degeneration in performance towards the end.
Edit
After the comments I've received, I've pasted a stack trace here. As you can see, it doesn't appear as if any of the threads are blocking. Also, when profiling, the GC was not at 100% when things slowed down. In fact, both CPU and GC utilization were at 0% most of the time, with the CPU spiking occasionally to process a few files and then stopping again.
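Code that runs the threads
// Imports needed: java.io.BufferedReader, java.io.FileReader,
// java.util.concurrent.ExecutorService, java.util.concurrent.Executors
ExecutorService executor = Executors.newFixedThreadPool(NTHREADS);
// try-with-resources closes the reader even if readLine() throws
try (BufferedReader read = new BufferedReader(new FileReader(inputFile))) {
    String line;
    while ((line = read.readLine()) != null) { // index each line
        // One task per line; the fixed pool's work queue is unbounded,
        // so lines are queued as fast as they can be read
        Runnable worker = new CharikarHashThreader(line, bits, minTokens);
        executor.execute(worker);
    }
}
// Stop accepting new tasks (if not already done elsewhere); queued tasks still run
executor.shutdown();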
Recommended answer
This sounds a lot like a garbage collection / memory issue.
When the garbage collector runs, it pauses all application threads so that the GC thread can do its "is this collectable garbage?" analysis without things changing underneath it. While the GC is running, you'll see exactly one thread at 100% while the other threads are stuck at 0%.
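One way to check whether the pauses described above account for the slowdown is to compare wall-clock time with the time the JVM reports spending in collection. The following is only a minimal sketch: the class name `GcTimeProbe` and the allocation loop are hypothetical stand-ins for the real workload, and the only JVM API it relies on is the standard `java.lang.management` beans.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcTimeProbe {
    /** Approximate total milliseconds spent in GC so far, summed over all collectors. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 means the collector does not report a time
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long gcBefore = totalGcMillis();
        long wallStart = System.nanoTime();

        // Hypothetical stand-in workload that creates short-lived garbage.
        long checksum = 0;
        for (int i = 0; i < 2_000_000; i++) {
            checksum += new byte[256].length;
        }

        long wallMillis = (System.nanoTime() - wallStart) / 1_000_000;
        long gcMillis = totalGcMillis() - gcBefore;
        System.out.println("checksum=" + checksum
                + " wall=" + wallMillis + "ms gc=" + gcMillis + "ms");
    }
}
```

If the GC figure stays near zero while the program is crawling, as the edit to the question suggests, the stall probably lies elsewhere and memory is not the whole story.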
I would consider adding a few Runtime.getRuntime().freeMemory() calls (or using a profiler) to see whether the "grind to a halt" occurs during GC.
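A rough sketch of what such logging might look like follows. The class name `MemoryLog`, the `snapshot` helper, and the reporting interval are assumptions made for illustration; only the `Runtime` methods themselves come from the JDK.

```java
import java.util.Locale;

public class MemoryLog {
    private static final long MB = 1024L * 1024L;

    /** Heap bytes currently in use: allocated heap minus the free part of it. */
    static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** One log line describing heap state after a given number of processed lines. */
    static String snapshot(long linesDone) {
        Runtime rt = Runtime.getRuntime();
        return String.format(Locale.ROOT, "lines=%d used=%dMB free=%dMB max=%dMB",
                linesDone, usedBytes() / MB, rt.freeMemory() / MB, rt.maxMemory() / MB);
    }

    public static void main(String[] args) {
        // Hypothetical loop standing in for the line-reading loop in the question.
        for (long lines = 1; lines <= 500_000; lines++) {
            if (lines % 100_000 == 0) { // report every 100k lines (arbitrary interval)
                System.out.println(snapshot(lines));
            }
        }
    }
}
```

If the used figure climbs steadily towards the max value as the line count approaches the point where things stall, that supports the memory explanation.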
I'd also try running your program on just the first 10k lines of your file to see if that works.
I'd also look to see if your program is building too many intermediate Strings when it should be using StringBuilders.
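To illustrate the difference, here is a small sketch. The class `JoinTokens` and both methods are hypothetical and not taken from the asker's code. Concatenating with `+=` in a loop creates a new String on every iteration, whereas a single StringBuilder appends into one growing buffer.

```java
import java.util.Arrays;
import java.util.List;

public class JoinTokens {
    /** Builds a new intermediate String on every iteration. */
    static String joinWithConcat(List<String> tokens) {
        String out = "";
        for (String t : tokens) {
            out += t + " "; // copies everything accumulated so far
        }
        return out;
    }

    /** Appends into one buffer and creates the final String once. */
    static String joinWithBuilder(List<String> tokens) {
        StringBuilder sb = new StringBuilder();
        for (String t : tokens) {
            sb.append(t).append(' ');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("a", "b", "c");
        System.out.println(joinWithConcat(tokens).equals(joinWithBuilder(tokens)));
    }
}
```

Both methods return the same text; the difference is only in how much short-lived garbage they leave for the collector.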
It sounds to me like you need to profile your memory usage.