将 SQLite 文件合并为一个 db 文件，以及“开始/提交"问题

2021-12-27 00:00:00 python merge sqlite

这篇文章指的是这个页面，用于合并 SQLite 数据库.

顺序如下.假设我想合并 a.db 和 b.db.在命令行中，我执行以下操作.

sqlite3 a.db
附加'b.db'作为toM；
开始；<--
插入基准 select * from toM.benchmark;
提交；<--
将数据库分离到M；

它运行良好，但在引用的站点中，提问者询问加速，答案是使用开始"和提交"命令.

然后，我想出了以下 python 代码来做完全相同的事情.我使用 SQLiteDB 抽象了 SQLite 函数调用，其中一种方法是 runCommand().即使我删除了 self.connector.commit()，我也遇到了同样的错误.

# 运行命令def runCommand(self, command):self.cursor.execute(命令)self.connector.commit() # 即使我删除了这一行，也会出现同样的错误db = SQLiteDB('a.db')cmd = "attach "%s" as toMerge" % "b.db"打印命令db.runCommand(cmd)cmd = "开始"db.runCommand(cmd)cmd = "插入基准选择 * from toMerge.benchmark"db.runCommand(cmd)cmd = "提交"db.runCommand(cmd)cmd = "分离数据库 toMerge"db.runCommand(cmd)

但是，我收到以下错误.

OperationalError: 无法提交 - 没有活动的事务

即使出现错误，结果数据库也很好地合并了.没有开始/提交，根本没有错误.

为什么我不能运行开始/提交命令?
是否绝对有必要运行 begin/commit 来安全地合并 db 文件?帖子说开始/提交的目的是为了加速.那么，就加速而言，使用和不使用 begin/commit 命令有什么区别?

解决方案

显然，Cursor.execute 不支持提交"命令.它确实支持开始"命令，但这是多余的，因为 sqlite3 以某种方式开始它们:

<预><代码>>>>导入 sqlite3>>>conn = sqlite3.connect(':memory:')>>>cur = conn.cursor()>>>cur.execute('开始')<sqlite3.Cursor 对象在 0x0104B020>>>>cur.execute('CREATE TABLE test (id INTEGER)')<sqlite3.Cursor 对象在 0x0104B020>>>>cur.execute('插入测试值(1)')<sqlite3.Cursor 对象在 0x0104B020>>>>cur.execute('提交')回溯(最近一次调用最后一次):文件<pyshell#10>"，第 1 行，在 <module> 中cur.execute('提交')OperationalError: 无法提交 - 没有活动的事务>>>

只需在您的 Connection 对象上使用 commit 方法.

至于你的第二个问题，在合并文件时调用 begin/commit 并不是绝对必要的:只要确保绝对没有磁盘错误、数据库的修改或人们在计算机上以错误的方式看它时正在发生.所以开始/提交可能是一个好主意.当然，如果原始数据库没有被修改(老实说我没有看过)，那么甚至不需要.如果出现错误，您可以废弃部分输出并重新开始.

它还提供了加速，因为每个更改都不必在发生时写入磁盘.它们可以存储在内存中并批量写入.但正如前面提到的 sqlite3 为你处理这个.

另外，值得一提的是

cmd = "attach "%s" as toMerge" % "b.db"

在它被弃用的意义上是错误的.如果你想正确地做错事，那就是

cmd = 'attach "{0}" as toMerge'.format("b.db") #为什么不只是一个字符串?

这与较新版本的 python 向前兼容，这将使移植代码更容易.

如果你想做正确的事情，那就是

cmd = "attach ? as toMerge"cursor.execute(cmd, ('b.db', ))

这避免了 sql 注入，而且显然速度稍快，因此是双赢的.

你可以修改你的 runCommand 方法如下:

def runCommand(self, sql, params=(), commit=True):self.cursor.execute(sql, params)如果提交:self.connector.commit()

现在，当您不需要提交时，您无法通过传递 commit=False 在每个命令之后提交.这保留了交易的概念.

This post refers to this page for merging SQLite databases.

The sequence is as follows. Let's say I want to merge a.db and b.db. In command line I do the following.

sqlite3 a.db
attach 'b.db' as toM;
begin; <--
insert into benchmark select * from toM.benchmark;
commit; <--
detach database toM;

It works well, but in the referred site, the questioner asks about speeding up, and the answer is to use the 'begin' and 'commit' command.

Then, I came up with the following python code to do the exactly same thing. I abstract the SQLite function calls with SQLiteDB, and one of it's method is runCommand(). I got the same error even though I delete the self.connector.commit().

# run command
def runCommand(self, command):
    self.cursor.execute(command)
    self.connector.commit() # same error even though I delete this line

db = SQLiteDB('a.db')
cmd = "attach "%s" as toMerge" % "b.db"
print cmd
db.runCommand(cmd)
cmd = "begin"
db.runCommand(cmd)
cmd = "insert into benchmark select * from toMerge.benchmark"
db.runCommand(cmd)
cmd = "commit"
db.runCommand(cmd)
cmd = "detach database toMerge"
db.runCommand(cmd)

But, I got the following error.

OperationalError: cannot commit - no transaction is active

Even though the error, the result db is well merged. And without the begin/commit, there's no error at all.

Why can't I run the begin/commit command?
Is it absolutely necessary to run begin/commit to safely merge the db files? The post says that the purpose of begin/commit is for speedup. Then, what's the difference between using and not using the begin/commit command in terms of speedup?

解决方案

Apparently, Cursor.execute doesn't support the 'commit' command. It does support the 'begin' command but this is redundant because sqlite3 begins them for you anway:

>>> import sqlite3
>>> conn = sqlite3.connect(':memory:')
>>> cur = conn.cursor()
>>> cur.execute('begin')
<sqlite3.Cursor object at 0x0104B020>
>>> cur.execute('CREATE TABLE test (id INTEGER)')
<sqlite3.Cursor object at 0x0104B020>
>>> cur.execute('INSERT INTO test VALUES (1)')
<sqlite3.Cursor object at 0x0104B020>
>>> cur.execute('commit')

Traceback (most recent call last):
  File "<pyshell#10>", line 1, in <module>
    cur.execute('commit')
OperationalError: cannot commit - no transaction is active
>>>

just use the commit method on your Connection object.

As for your second question, it is not absolutely necessary to call begin/commit when merging the files: just be sure that there is absolutely no disk error, modifications to the db's or people looking at the computer the wrong way while it is happening. So begin/commit is probably a good idea. Of course, if the original db's aren't being modified (I honestly haven't looked) then there is no need for that even. If there is an error, you can just scrap the partial output and start over.

It also provides a speedup because every change doesn't have to be written to disk as it occurs. They can be stored in memory and written in bulk. But as mentioned sqlite3 handles this for you.

Also, it's worth mentioning that

cmd = "attach "%s" as toMerge" % "b.db"

is wrong in the sense that it's depracated. If you want to do the wrong thing correctly, it's

cmd = 'attach "{0}" as toMerge'.format("b.db") #why not just one string though?

This is forward compatible with newer versions of python which will make porting code easier.

if you want to do the right thing, it's

cmd = "attach ? as toMerge"
cursor.execute(cmd, ('b.db', ))

This avoids sql injection and is, apparently, slightly faster so it's win-win.

You could modify your runCommand method as follows:

def runCommand(self, sql, params=(), commit=True):
    self.cursor.execute(sql, params)
    if commit:
        self.connector.commit()

now you can not commit after every single command by passing commit=False when you don't need a commit. This preserves the notion of transaction.

相关文章