Multiple searches cause OutOfMemoryError

2022-01-14 00:00:00 out-of-memory db2 jakarta-ee java Hibernate

I have a classic Java EE system: a Web tier with JSF, EJB 3 for the business logic, and Hibernate 3 doing the data access to a DB2 database. I am struggling with the following scenario. A user initiates a process that retrieves a large data set from the database. The retrieval takes some time, so the user does not get an immediate response, becomes impatient, opens a new browser and initiates the retrieval again, sometimes several times. The EJB container has no way of knowing that the earlier retrievals are no longer relevant, so when the database returns each result set, Hibernate starts populating a set of POJOs that take up vast amounts of memory, eventually causing an OutOfMemoryError.

One potential solution I thought of was to use the Hibernate Session's cancelQuery method. However, cancelQuery only works before the database returns a result set. Once the database has returned a result set and Hibernate begins populating the POJOs, cancelQuery no longer has any effect. In this case the database queries themselves return rather quickly, and the bulk of the performance overhead seems to lie in populating the POJOs, at which point calling cancelQuery is too late.

Recommended answer

The solution we ended up implementing looks like this:

The general idea was to maintain a map from the HttpSession of each user to the Hibernate sessions currently running queries on that user's behalf, so that when the user closes the browser we can kill the running queries.

There were two main challenges to overcome. The first was propagating the HTTP session-id from the web tier to the EJB tier without touching every method call along the way, i.e. without tampering with existing code in the system. The second was figuring out how to cancel a query once the database had already started returning results and Hibernate was populating objects with them.

We overcame the first problem by realizing that all the methods called along the stack were handled by the same thread. This makes sense, as our application lives entirely within one container and makes no remote calls. Given that, we created a Servlet Filter that intercepts every call to the application and stores the current HTTP session-id in a ThreadLocal variable. That way the HTTP session-id is available to every method call further down the stack.
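The ThreadLocal idea described above can be sketched without the servlet API. This is only an illustration under stated assumptions: the names SessionIdHolder, callWithSessionId and describeCaller are hypothetical, and callWithSessionId stands in for the body of the real filter's doFilter method, which would obtain the ID from the request's HttpSession.

```java
import java.util.function.Supplier;

/** Holds the HTTP session-id for the thread currently handling a request. */
final class SessionIdHolder {
    private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    private SessionIdHolder() {}

    /** Returns the session-id bound to the calling thread, or null if none is bound. */
    static String get() {
        return CURRENT.get();
    }

    /**
     * Stand-in for Filter.doFilter(): bind the ID, run the rest of the request,
     * and always unbind afterwards, because containers reuse pooled threads.
     */
    static <T> T callWithSessionId(String sessionId, Supplier<T> restOfRequest) {
        CURRENT.set(sessionId);
        try {
            return restOfRequest.get();
        } finally {
            CURRENT.remove();
        }
    }
}

class SessionIdHolderDemo {
    /** Hypothetical method deep in the call stack; note it takes no session-id parameter. */
    static String describeCaller() {
        return "query started by " + SessionIdHolder.get();
    }

    public static void main(String[] args) {
        // Prints "query started by ABC123"
        System.out.println(SessionIdHolder.callWithSessionId("ABC123", SessionIdHolderDemo::describeCaller));
        // Prints "null": the binding was removed when the "request" finished
        System.out.println(SessionIdHolder.get());
    }
}
```

Removing the value in a finally block matters in a container: without it, a pooled thread could carry one user's session-id into another user's request.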

The second challenge was a little stickier. We discovered that the Hibernate method responsible for running the queries and subsequently populating the POJOs is called doQuery and lives in the org.hibernate.loader.Loader class. (We happen to be using Hibernate 3.5.3, but the same holds true for newer versions of Hibernate.)

private List doQuery(
        final SessionImplementor session,
        final QueryParameters queryParameters,
        final boolean returnProxies) throws SQLException, HibernateException {

    final RowSelection selection = queryParameters.getRowSelection();
    final int maxRows = hasMaxRows( selection ) ?
            selection.getMaxRows().intValue() :
            Integer.MAX_VALUE;

    final int entitySpan = getEntityPersisters().length;

    final ArrayList hydratedObjects = entitySpan == 0 ? null : new ArrayList( entitySpan * 10 );
    final PreparedStatement st = prepareQueryStatement( queryParameters, false, session );
    final ResultSet rs = getResultSet( st, queryParameters.hasAutoDiscoverScalarTypes(), queryParameters.isCallable(), selection, session );

    final EntityKey optionalObjectKey = getOptionalObjectKey( queryParameters, session );
    final LockMode[] lockModesArray = getLockModes( queryParameters.getLockOptions() );
    final boolean createSubselects = isSubselectLoadingEnabled();
    final List subselectResultKeys = createSubselects ? new ArrayList() : null;
    final List results = new ArrayList();

    try {

        handleEmptyCollections( queryParameters.getCollectionKeys(), rs, session );

        EntityKey[] keys = new EntityKey[entitySpan]; //we can reuse it for each row

        if ( log.isTraceEnabled() ) log.trace( "processing result set" );

        int count;
        for ( count = 0; count < maxRows && rs.next(); count++ ) {

            if ( log.isTraceEnabled() ) log.debug("result set row: " + count);

            Object result = getRowFromResultSet( 
                    rs,
                    session,
                    queryParameters,
                    lockModesArray,
                    optionalObjectKey,
                    hydratedObjects,
                    keys,
                    returnProxies 
            );
            results.add( result );

            if ( createSubselects ) {
                subselectResultKeys.add(keys);
                keys = new EntityKey[entitySpan]; //can't reuse in this case
            }

        }

        if ( log.isTraceEnabled() ) {
            log.trace( "done processing result set (" + count + " rows)" );
        }

    }
    finally {
        session.getBatcher().closeQueryStatement( st, rs );
    }

    initializeEntitiesAndCollections( hydratedObjects, rs, session, queryParameters.isReadOnly( session ) );

    if ( createSubselects ) createSubselects( subselectResultKeys, queryParameters, session );

    return results; //getResultList(results);

}

In this method you can see that the results are first fetched from the database as a good old-fashioned java.sql.ResultSet, after which a loop runs over each row and creates an object from it. Some additional initialization is performed in the initializeEntitiesAndCollections() method, which is called after the loop. After a little debugging we discovered that the bulk of the performance overhead was in these two sections of the method, not in the part that gets the java.sql.ResultSet from the database, yet cancelQuery is only effective on that first part. The solution was therefore to add a condition to the for loop that checks whether the thread has been interrupted, like this:

for ( count = 0; count < maxRows && rs.next() && !Thread.currentThread().isInterrupted(); count++ ) {
    // ... loop body unchanged ...
}

and to perform the same check before calling the initializeEntitiesAndCollections() method:

if (!Thread.interrupted()) {

    initializeEntitiesAndCollections(hydratedObjects, rs, session,
                queryParameters.isReadOnly(session));
    if (createSubselects) {

        createSubselects(subselectResultKeys, queryParameters, session);
    }
}
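The difference between the two checks can be seen in a small stand-alone sketch. It does not use Hibernate: InterruptibleLoad, load and rowsInterruptingAfter are hypothetical names, the iterator stands in for the ResultSet, and the "<initialized>" marker stands in for the work done by initializeEntitiesAndCollections().

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class InterruptibleLoad {

    /** Copies rows until the source is exhausted or the calling thread is interrupted. */
    static List<String> load(Iterator<String> rows) {
        List<String> results = new ArrayList<>();
        // isInterrupted() only reads the flag, so the flag is still set after the loop exits.
        while (!Thread.currentThread().isInterrupted() && rows.hasNext()) {
            results.add(rows.next());
        }
        // Thread.interrupted() reads the flag AND clears it.
        if (!Thread.interrupted()) {
            results.add("<initialized>"); // placeholder for the post-loop initialization
        }
        return results;
    }

    /** Endless row source that simulates a cancel request while producing row n. */
    static Iterator<String> rowsInterruptingAfter(final int n) {
        return new Iterator<String>() {
            private int i = 0;

            @Override
            public boolean hasNext() {
                return true;
            }

            @Override
            public String next() {
                i++;
                if (i == n) {
                    Thread.currentThread().interrupt();
                }
                return "row" + i;
            }
        };
    }

    public static void main(String[] args) {
        // Prints [row1, row2, row3]: the loop stopped early and initialization was skipped
        System.out.println(load(rowsInterruptingAfter(3)));
        // Prints false: the flag was cleared by Thread.interrupted()
        System.out.println(Thread.currentThread().isInterrupted());
    }
}
```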

Calling Thread.interrupted() in the second check also clears the interrupt flag, so it does not affect anything the thread does afterwards. Now, when a query is to be canceled, the canceling method looks up the Hibernate session and thread stored in a map keyed by the HTTP session-id, calls cancelQuery on the session, and calls interrupt on the thread.
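A minimal sketch of such a registry follows. It is an illustration rather than the code we deployed: QueryRegistry, RunningQuery, register and cancel are hypothetical names, a Runnable hook stands in for the call to cancelQuery on the Hibernate session, and keeping a single entry per HTTP session-id is a simplification.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

class QueryRegistry {

    /** A running query: the worker thread plus a hook that cancels the statement. */
    static final class RunningQuery {
        final Thread thread;
        final Runnable cancelHook; // in the real system this would call cancelQuery on the session

        RunningQuery(Thread thread, Runnable cancelHook) {
            this.thread = thread;
            this.cancelHook = cancelHook;
        }
    }

    private static final Map<String, RunningQuery> RUNNING = new ConcurrentHashMap<>();

    /** Called by the thread that is about to run a query. */
    static void register(String httpSessionId, Runnable cancelHook) {
        RUNNING.put(httpSessionId, new RunningQuery(Thread.currentThread(), cancelHook));
    }

    /** Called when a query finishes normally. */
    static void unregister(String httpSessionId) {
        RUNNING.remove(httpSessionId);
    }

    /** Cancels the query for this HTTP session, if any; returns whether one was found. */
    static boolean cancel(String httpSessionId) {
        RunningQuery query = RUNNING.remove(httpSessionId);
        if (query == null) {
            return false;
        }
        query.cancelHook.run();   // effective while the database is still executing
        query.thread.interrupt(); // effective once row processing has started
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch started = new CountDownLatch(1);
        Thread worker = new Thread(() -> {
            register("S1", () -> System.out.println("cancel hook invoked"));
            started.countDown();
            while (!Thread.currentThread().isInterrupted()) {
                Thread.yield(); // stands in for the row-processing loop
            }
            System.out.println("worker stopped");
        });
        worker.start();
        started.await();
        cancel("S1");
        worker.join();
    }
}
```

In a web application, cancel would typically be triggered from wherever the end of the user's HTTP session is detected, with the session-id taken from that event.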
