SQLITE_ERROR: Connection is closed when connecting from Spark via JDBC to SQLite database
I am using Apache Spark 1.5.1 and trying to connect to a local SQLite database named clinton.db. Creating a data frame from a table of the database works fine, but when I run operations on the created object, I get the error below, which says "SQL error or missing database (Connection is closed)". The odd thing is that I still get the result of the operation. Any idea what I can do to solve the problem, i.e., avoid the error?
Start command for spark-shell:
../spark/bin/spark-shell --master local[8] --jars ../libraries/sqlite-jdbc-3.8.11.1.jar --classpath ../libraries/sqlite-jdbc-3.8.11.1.jar
Reading from the database:
val emails = sqlContext.read.format("jdbc").options(Map("url" -> "jdbc:sqlite:../data/clinton.sqlite", "dbtable" -> "Emails")).load()
Simple count (fails):
emails.count
Error:
15/09/30 09:06:39 WARN JDBCRDD: Exception closing statement
java.sql.SQLException: [SQLITE_ERROR] SQL error or missing database (Connection is closed)
    at org.sqlite.core.DB.newSQLException(DB.java:890)
    at org.sqlite.core.CoreStatement.internalClose(CoreStatement.java:109)
    at org.sqlite.jdbc3.JDBC3Statement.close(JDBC3Statement.java:35)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anon$1.org$apache$spark$sql$execution$datasources$jdbc$JDBCRDD$$anon$$close(JDBCRDD.scala:454)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anon$1$$anonfun$8.apply(JDBCRDD.scala:358)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anon$1$$anonfun$8.apply(JDBCRDD.scala:358)
    at org.apache.spark.TaskContextImpl$$anon$1.onTaskCompletion(TaskContextImpl.scala:60)
    at org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:79)
    at org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:77)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:77)
    at org.apache.spark.scheduler.Task.run(Task.scala:90)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
res1: Long = 7945
Recommended answer
I got the same error today, and the important line is just before the exception:
15/11/30 12:13:02 INFO jdbc.JDBCRDD: closed connection
15/11/30 12:13:02 WARN jdbc.JDBCRDD: Exception closing statement
java.sql.SQLException: [SQLITE_ERROR] SQL error or missing database (Connection is closed)
    at org.sqlite.core.DB.newSQLException(DB.java:890)
    at org.sqlite.core.CoreStatement.internalClose(CoreStatement.java:109)
    at org.sqlite.jdbc3.JDBC3Statement.close(JDBC3Statement.java:35)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anon$1.org$apache$spark$sql$execution$datasources$jdbc$JDBCRDD$$anon$$close(JDBCRDD.scala:454)
So Spark succeeds in closing the JDBC connection, and then fails to close the JDBC statement.
Looking at the source, close() is called twice:
Line 358 (org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD, Spark 1.5.1)
context.addTaskCompletionListener{ context => close() }
Line 469
override def hasNext: Boolean = {
  if (!finished) {
    if (!gotNext) {
      nextValue = getNext()
      if (finished) {
        close()
      }
      gotNext = true
    }
  }
  !finished
}
If you look at the close() method (line 443)
def close() {
if (closed) return
you can see that it checks the variable closed, but that value is never set to true.
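To illustrate why the missing assignment matters, here is a minimal, self-contained sketch of an idempotent close() guard. It is not Spark's code: FakeStatement, GuardedIterator and IdempotentCloseDemo are hypothetical names, and the fake statement simply throws on a second close, which is a simplification of the SQLite driver failing once the connection is already gone.

```scala
// Hypothetical stand-in for a JDBC statement: a second close() throws,
// loosely mimicking the driver error seen after the connection is closed.
final class FakeStatement {
  var closeCalls = 0
  def close(): Unit = {
    closeCalls += 1
    if (closeCalls > 1) throw new IllegalStateException("Connection is closed")
  }
}

// Iterator whose close() can be reached from two places: when the rows
// are exhausted (inside hasNext) and from an external completion callback.
final class GuardedIterator(rows: Iterator[Int], stmt: FakeStatement) extends Iterator[Int] {
  private var closed = false

  def close(): Unit = {
    if (!closed) {
      closed = true // record that cleanup already ran, so later calls are no-ops
      stmt.close()
    }
  }

  override def hasNext: Boolean = {
    val more = rows.hasNext
    if (!more) close() // first close: data exhausted
    more
  }

  override def next(): Int = rows.next()
}

object IdempotentCloseDemo {
  def main(args: Array[String]): Unit = {
    val stmt = new FakeStatement
    val it = new GuardedIterator(Iterator(1, 2, 3), stmt)
    val total = it.sum // draining the iterator triggers the first close()
    it.close()         // simulates the task-completion listener calling close() again
    println(s"sum=$total, underlying close calls=${stmt.closeCalls}")
  }
}
```

With the flag set inside close(), the second call returns without touching the statement. Removing the `closed = true` line makes the second call reach FakeStatement.close() again and throw, which mirrors the pattern described above.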
If I see it correctly, this bug is still in master. I have filed a bug report.
- Source: JDBCRDD.scala (line numbers differ slightly)