SQLITE_ERROR: Connection is closed when connecting from Spark via JDBC to SQLite database

I am using Apache Spark 1.5.1 and trying to connect to a local SQLite database named clinton.db. Creating a data frame from a table of the database works fine but when I do some operations on the created object, I get the error below which says "SQL error or missing database (Connection is closed)". Funny thing is that I get the result of the operation nevertheless. Any idea what I can do to solve the problem, i.e., avoid the error?

Start command for spark-shell:

../spark/bin/spark-shell --master local[8] --jars ../libraries/sqlite-jdbc-3.8.11.1.jar --classpath ../libraries/sqlite-jdbc-3.8.11.1.jar

Reading from the database:

val emails = sqlContext.read.format("jdbc").options(Map("url" -> "jdbc:sqlite:../data/clinton.sqlite", "dbtable" -> "Emails")).load()

Simple count (fails):

emails.count

Error:

15/09/30 09:06:39 WARN JDBCRDD: Exception closing statement
java.sql.SQLException: [SQLITE_ERROR] SQL error or missing database (Connection is closed)
    at org.sqlite.core.DB.newSQLException(DB.java:890)
    at org.sqlite.core.CoreStatement.internalClose(CoreStatement.java:109)
    at org.sqlite.jdbc3.JDBC3Statement.close(JDBC3Statement.java:35)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anon$1.org$apache$spark$sql$execution$datasources$jdbc$JDBCRDD$$anon$$close(JDBCRDD.scala:454)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anon$1$$anonfun$8.apply(JDBCRDD.scala:358)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anon$1$$anonfun$8.apply(JDBCRDD.scala:358)
    at org.apache.spark.TaskContextImpl$$anon$1.onTaskCompletion(TaskContextImpl.scala:60)
    at org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:79)
    at org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:77)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:77)
    at org.apache.spark.scheduler.Task.run(Task.scala:90)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
res1: Long = 7945

Recommended answer

I got the same error today, and the important line is just before the exception:

15/11/30 12:13:02 INFO jdbc.JDBCRDD: closed connection

15/11/30 12:13:02 WARN jdbc.JDBCRDD: Exception closing statement
java.sql.SQLException: [SQLITE_ERROR] SQL error or missing database (Connection is closed)
    at org.sqlite.core.DB.newSQLException(DB.java:890)
    at org.sqlite.core.CoreStatement.internalClose(CoreStatement.java:109)
    at org.sqlite.jdbc3.JDBC3Statement.close(JDBC3Statement.java:35)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anon$1.org$apache$spark$sql$execution$datasources$jdbc$JDBCRDD$$anon$$close(JDBCRDD.scala:454)

So Spark succeeds in closing the JDBC connection, and then fails to close the JDBC statement.

Looking at the source, close() is called twice:

Line 358 (org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD, Spark 1.5.1)

context.addTaskCompletionListener{ context => close() }

Line 469

override def hasNext: Boolean = {
  if (!finished) {
    if (!gotNext) {
      nextValue = getNext()
      if (finished) {
        close()
      }
      gotNext = true
    }
  }
  !finished
}

If you look at the close() method (line 443)

def close() {
  if (closed) return

you can see that it checks the variable closed, but that value is never set to true.
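To illustrate the guard the answer describes, here is a minimal sketch of an idempotent close(). It is not Spark's code and not the fix that was applied upstream. FakeResource, GuardedCloser, closeAttempts and DoubleCloseDemo are hypothetical names made up for this example, and the resources are stand-ins rather than real JDBC objects.

```scala
// Hypothetical stand-in for a JDBC statement or connection.
// It throws when closed a second time, mimicking the driver's complaint.
final class FakeResource(name: String) {
  private var open = true
  def close(): Unit = {
    if (!open) throw new IllegalStateException(s"$name is already closed")
    open = false
  }
}

// Sketch of a close() whose guard actually works: the flag is set on the
// first call, so every later call returns early.
final class GuardedCloser(stmt: FakeResource, conn: FakeResource) {
  private var closed = false
  var closeAttempts = 0 // how often the resources were really released

  def close(): Unit = {
    if (closed) return
    closed = true // the assignment the answer reports as missing
    closeAttempts += 1
    stmt.close()
    conn.close()
  }
}

object DoubleCloseDemo {
  def main(args: Array[String]): Unit = {
    val closer = new GuardedCloser(
      new FakeResource("statement"),
      new FakeResource("connection"))
    closer.close() // first call, e.g. from hasNext once the rows are exhausted
    closer.close() // second call, e.g. from the task completion listener
    println(s"resources released ${closer.closeAttempts} time(s)")
  }
}
```

Without the `closed = true` line, the second call would reach `stmt.close()` again and throw, which corresponds to the warning in the log above.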

If I read it correctly, this bug is still in master. I have filed a bug report.

  • Source: JDBCRDD.scala (line numbers differ slightly)
