Database connections and OutOfMemoryError: Java heap space
Last summer, I made a Java application that would parse some PDF files and store the information they contain in a SQLite database.
Everything was fine and I kept adding new files to the database every week or so without any problems.
Now, I'm trying to improve my application's speed and I wanted to see how it would fare if I parsed all the files I have from the last two years in a new database. That's when I started getting this error: OutOfMemoryError: Java Heap Space. I didn't get it before because I was only parsing about 25 new files per week, but it seems like parsing 1000+ files one after the other is a lot more demanding.
I partially solved the problem: I made sure to close my connection after every call to the database and the error went away, but at a huge cost. Parsing the files is now unbearably slow. As for my ResultSets and Statements / PreparedStatements, I'm already closing them after every call.
I guess there's something I don't understand about when I should close my connection and when I should keep re-using the same one. I thought that since auto-commit is on, it commits after every transaction (select, update, insert, etc.) and the connection releases the extra memory it was using. I'm probably wrong, since when I parse too many files I end up getting the error I mentioned.
An easy solution would be to close it after every x calls, but then I still wouldn't understand why, and I'd probably run into the same error later on. Can anyone explain when I should be closing my connections (if at all, other than when I'm done)? If I'm only supposed to do it when I'm done, then can someone explain how I'm supposed to avoid this error?
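To make that concrete, here is roughly the pattern I have in mind: one connection that stays open for the whole run, while every Statement and ResultSet is closed in a try-with-resources block. The PlayersDao class and the table and column names below are just placeholders for illustration, not my real schema:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class PlayersDao {

    // One connection kept open and reused for every call
    private final Connection connection;

    public PlayersDao(String jdbcUrl) throws SQLException {
        this.connection = DriverManager.getConnection(jdbcUrl);
    }

    // The Statement and ResultSet are closed automatically when the
    // try blocks exit, but the connection itself stays open.
    public int findPlayerId(String name) throws SQLException {
        String sql = "SELECT id FROM players WHERE name = ?"; // placeholder schema
        try (PreparedStatement ps = connection.prepareStatement(sql)) {
            ps.setString(1, name);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getInt("id") : -1;
            }
        }
    }

    // Called once, after all the files have been parsed
    public void close() throws SQLException {
        connection.close();
    }
}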
By the way, I didn't tag this as SQLite because I got the same error when I tried running my program on my online MySQL database.
Edit: As Deco and Mavrav have pointed out, maybe the problem isn't my Connection. Maybe it's the files, so here is the code I use to call the function that parses the files one by one:
public static void visitAllDirsAndFiles(File dir){
    if (dir.isDirectory()){
        // Recurse into every entry of the directory
        String[] children = dir.list();
        for (int i = 0; i < children.length; i++){
            visitAllDirsAndFiles(new File(dir, children[i]));
        }
    }
    else{
        try{
            // Parse a single file and store its contents in the database
            // System.out.println("File: " + dir);
            BowlingFilesReader.readFile(dir, playersDatabase);
        }
        catch (Exception exc){
            System.out.println("Other exception in file: " + dir);
        }
    }
}
So if I call the method using a directory, it recursively calls the function again using the File object I just created. My method then detects that it's a file and calls BowlingFilesReader.readFile(dir, playersDatabase);
I would think the memory should be released once the method is done?
Recommended Answer
Your first instinct about open ResultSets and connections was good, though it's maybe not the whole cause. Let's start with your database connection.
Try using a database connection pooling library, such as Apache Commons DBCP (BasicDataSource is a good place to start): http://commons.apache.org/dbcp/. You will still need to close your database objects, but this will keep things running smoothly on the database front.
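As a rough sketch of what that might look like (this assumes the DBCP 2 package names and the Xerial SQLite JDBC driver class; adjust both to your actual setup):

import java.sql.Connection;
import java.sql.SQLException;
import org.apache.commons.dbcp2.BasicDataSource;

public class Database {

    private static final BasicDataSource DATA_SOURCE = new BasicDataSource();

    static {
        DATA_SOURCE.setDriverClassName("org.sqlite.JDBC"); // SQLite JDBC driver class
        DATA_SOURCE.setUrl("jdbc:sqlite:players.db");      // example database path
        DATA_SOURCE.setMaxTotal(8);                        // upper bound on pooled connections
    }

    // Borrow a connection from the pool; calling close() on it returns it
    // to the pool instead of tearing it down, so closing after every call
    // stays cheap.
    public static Connection getConnection() throws SQLException {
        return DATA_SOURCE.getConnection();
    }
}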
Increase the amount of memory you give to the JVM. You can do so by adding -Xmx followed by an amount of memory, for example:
- -Xmx64m <- this gives the JVM 64 megabytes of memory to work with
- -Xmx512m <- 512 megabytes
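On the command line the flag goes right after the java command, for example, assuming your application is packaged as an executable jar (the jar name here is just a placeholder):

java -Xmx512m -jar pdf-parser.jar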
Be careful with your numbers, though: throwing more memory at the JVM will not fix memory leaks. You can use something like JConsole or JVisualVM (included in your JDK's bin/ folder) to observe how much memory you are using.
You may increase the speed of your operations by threading them out, assuming the operation you are performing to parse these records is threadable. But more information might be necessary to answer that question.
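If the parsing does turn out to be safe to run concurrently (you would need to verify that BowlingFilesReader.readFile and your database access are thread-safe, and SQLite in particular only allows one writer at a time), a minimal sketch of the idea could look like this; the FileHandler interface is just a stand-in for your own parsing call:

import java.io.File;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ParallelParser {

    // Stand-in for the call to BowlingFilesReader.readFile(file, playersDatabase)
    public interface FileHandler {
        void handle(File file) throws Exception;
    }

    public static void parseAll(List<File> files, final FileHandler handler, int threads)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (final File file : files) {
            pool.submit(new Runnable() {
                @Override
                public void run() {
                    try {
                        handler.handle(file);
                    } catch (Exception exc) {
                        System.out.println("Other exception in file: " + file);
                    }
                }
            });
        }
        // Stop accepting new tasks and wait for the submitted ones to finish
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
    }
}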
Hope this helps.