Deleting / Inserting Data in mmap'ed Files

2022-01-09 00:00:00 python mmap insert

Problem description

I am working on a script in Python that maps a file for processing using mmap().

The task requires me to change the file's contents by:

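  1. Replacing data
  2. Adding data into the file at an offset
  3. Removing data from within the file (so the file actually shrinks, rather than just blanking the bytes out)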

Replacing data works great as long as the old data and the new data have the same number of bytes:

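import mmap

f = open("data", "r+b")  # binary mode: mmap works with bytes, not str
VDATA = mmap.mmap(f.fileno(), 0)
start = 10
end = 20
VDATA[start:end] = b"0123456789"  # 10 bytes replace 10 bytes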

However, when I try to remove data (replacing the range with "") or insert data (replacing the range with contents longer than the range), I receive the error message:

IndexError: mmap slice assignment is wrong size

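This makes sense: an mmap object has a fixed length, so a slice assignment can overwrite bytes but cannot grow or shrink the mapping.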
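The following sketch reproduces this behaviour on a small temporary file (its contents are made up purely for illustration): a same-length slice assignment succeeds, while a shorter replacement raises the IndexError quoted above and leaves the mapping's size unchanged.

```python
import mmap
import tempfile

def slice_assignment_demo():
    """Show that mmap slice assignment must preserve the mapping's length."""
    # Hypothetical 20-byte file used only for this illustration
    with tempfile.TemporaryFile() as tmp:
        tmp.write(b"00001111222233334444")
        tmp.flush()
        with mmap.mmap(tmp.fileno(), 0) as m:
            m[4:8] = b"AAAA"      # same length as the slice: allowed
            error = None
            try:
                m[4:8] = b""      # shorter: the mapping cannot shrink
            except IndexError as exc:
                error = str(exc)
            return m[:], len(m), error

contents, size, error = slice_assignment_demo()
```

After the call, `contents` holds the file with `1111` replaced by `AAAA`, `size` is still 20, and `error` carries the "wrong size" message.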

The question now is, how can I insert and delete data from the mmap'ed file? From reading the documentation, it seems I can move the file's entire contents back and forth using a chain of low-level operations, but I'd rather avoid this if there is an easier solution.
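The main low-level building block here is `mmap.move(dest, src, count)`, which copies `count` bytes from offset `src` to offset `dest` inside the mapping and handles overlapping ranges. A minimal sketch on an anonymous 10-byte map (no file involved, chosen only for illustration) shows that shifting the tail left leaves stale bytes at the end, which is why a real delete also has to shrink the file, and that shifting right opens a gap for an insert:

```python
import mmap

# Anonymous 10-byte mapping, used only to illustrate move()
m = mmap.mmap(-1, 10)
m[:] = b"0123456789"

# "Delete" bytes 0..3 by shifting bytes 4..9 left to offset 0.
# The mapping keeps its size, so the last 4 bytes are left over.
m.move(0, 4, 6)
after_shift_left = m[:]     # b"4567896789"

# Shift bytes 0..5 right by 4 to reopen a 4-byte gap at offset 4.
m.move(4, 0, 6)
after_shift_right = m[:]    # b"4567456789"
m.close()
```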


Solution

Lacking an alternative, I went ahead and wrote two helper functions, deleteFromMmap() and insertIntoMmap(), to handle the low-level file operations and simplify development.

I close and reopen the mmap instead of calling resize() because of a bug in Python on Unix derivatives that causes resize() to fail (http://mail.python.org/pipermail/python-bugs-list/2003-May/017446.html).

The functions are included in a complete example. The global variable is only used because of the structure of the main project; you can easily adapt the code to match your own coding standards.

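import mmap

# The file "data" initially contains "0000111122223333444455556666777788889999"

# Open in binary read/write mode: mmap works with bytes, not str
f = open("data", "r+b")
VDATA = mmap.mmap(f.fileno(), 0)

def deleteFromMmap(start, end):
    """Remove the bytes in [start, end) and shrink the file."""
    global VDATA
    length = end - start
    size = len(VDATA)
    newsize = size - length

    # Shift everything after `end` left so it starts at `start`
    VDATA.move(start, end, size - end)
    VDATA.flush()
    # Unmap before shrinking the file, then map it again at the new size
    VDATA.close()
    f.truncate(newsize)
    VDATA = mmap.mmap(f.fileno(), 0)

def insertIntoMmap(offset, data):
    """Insert the bytes `data` at `offset` and grow the file."""
    global VDATA
    length = len(data)
    size = len(VDATA)

    # Unmap, append `length` placeholder bytes to the file, then map it again
    VDATA.flush()
    VDATA.close()
    f.seek(size)
    f.write(b"A" * length)
    f.flush()
    VDATA = mmap.mmap(f.fileno(), 0)

    # Shift the tail right to open a gap, then write the new data into it
    VDATA.move(offset + length, offset, size - offset)
    VDATA.seek(offset)
    VDATA.write(data)
    VDATA.flush()

deleteFromMmap(4, 8)

# -> 000022223333444455556666777788889999

insertIntoMmap(4, b"AAAA")

# -> 0000AAAA22223333444455556666777788889999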
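Since the global is only there to fit the main project, here is one possible adaptation, sketched under the assumption that you are happy to pass the open file and the current map around explicitly. The names `delete_from_mmap` and `insert_into_mmap` are hypothetical; each helper closes the old map and returns a new one, using the same move/close/truncate-or-extend/remap approach as the helpers above. It runs against a temporary file so it is self-contained.

```python
import mmap
import tempfile

def delete_from_mmap(f, m, start, end):
    """Remove bytes [start, end); return a new mmap of the shrunk file."""
    size = len(m)
    m.move(start, end, size - end)        # shift the tail left over the gap
    m.flush()
    m.close()                             # unmap before truncating
    f.truncate(size - (end - start))
    return mmap.mmap(f.fileno(), 0)

def insert_into_mmap(f, m, offset, data):
    """Insert bytes `data` at `offset`; return a new mmap of the grown file."""
    size = len(m)
    m.flush()
    m.close()                             # unmap before growing the file
    f.seek(size)
    f.write(b"\0" * len(data))            # placeholder bytes at the end
    f.flush()
    m = mmap.mmap(f.fileno(), 0)
    m.move(offset + len(data), offset, size - offset)  # open a gap
    m[offset:offset + len(data)] = data   # same-length slice assignment
    m.flush()
    return m

with tempfile.TemporaryFile() as f:
    f.write(b"0000111122223333444455556666777788889999")
    f.flush()
    m = mmap.mmap(f.fileno(), 0)
    m = delete_from_mmap(f, m, 4, 8)
    after_delete = m[:]
    m = insert_into_mmap(f, m, 4, b"AAAA")
    after_insert = m[:]
    m.close()
```

Returning the new map instead of rebinding a global makes the remapping visible at the call site: the old mmap object passed in is closed, so any other references to it must not be used afterwards. Note also that mapping a file that has been shrunk to zero bytes fails, so deleting the entire contents would need a special case.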
