Reading an appengine backup_info file gives EOFError

Problem description

I'm trying to inspect my appengine backup files to work out when a data corruption occurred. I used gsutil to locate and download the file:

gsutil ls -l gs://my_backup/ > my_backup.txt
gsutil cp gs://my_backup/LongAlphaString.Mymodel.backup_info file://1.backup_info

I then created a small python program that attempts to read the file and parse it using the appengine libraries.

#!/usr/bin/python

APPENGINE_PATH='/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/'
ADDITIONAL_LIBS = [
'lib/yaml/lib'
]
import sys
sys.path.append(APPENGINE_PATH)
for l in ADDITIONAL_LIBS:
  sys.path.append(APPENGINE_PATH+l)

import logging
from google.appengine.api import datastore
from google.appengine.api.files import records
import cStringIO

def parse_backup_info_file(content):
  """Returns entities iterator from a backup_info file content."""
  reader = records.RecordsReader(cStringIO.StringIO(content))
  version = reader.read()
  if version != '1':
    raise IOError('Unsupported version')
  return (datastore.Entity.FromPb(record) for record in reader)


INPUT_FILE_NAME='1.backup_info'

f=open(INPUT_FILE_NAME, 'rb')
f.seek(0)
content=f.read()
records = parse_backup_info_file(content)
for r in records:
  logging.info(r)

f.close()

The code for parse_backup_info_file is copied from backup_handler.py.

When I run the program, I get the following output:

./view_record.py 
Traceback (most recent call last):
  File "./view_record.py", line 30, in <module>
    records = parse_backup_info_file(content)
  File "./view_record.py", line 19, in parse_backup_info_file
    version = reader.read()
  File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/api/files/records.py", line 335, in read
    (chunk, record_type) = self.__try_read_record()
  File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/api/files/records.py", line 307, in __try_read_record
    (length, len(data)))
EOFError: Not enough data read. Expected: 24898 but got 2112

I've tried half a dozen different backup_info files, and they all show the same error (with different numbers). At first I thought they all had the same expected length, but I was reviewing different versions of the same model when I made that observation; it's not true when I view the backup files of other modules.

EOFError: Not enough data read. Expected: 24932 but got 911
EOFError: Not enough data read. Expected: 25409 but got 2220

Is there anything obviously wrong with my approach?

I guess the other option is that the appengine backup utility is not creating valid backup files. Anything else you can suggest would be very welcome. Thanks in advance.


Solution

There are multiple metadata files created when an AppEngine Datastore backup is run:

LongAlphaString.backup_info is created once. This contains metadata about all of the entity types and backup files that were created in the datastore backup.

LongAlphaString.[EntityType].backup_info is created once per entity type. This contains metadata about the specific backup files created for [EntityType] along with schema information for the [EntityType].
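Since the two file types differ only in their names, it can help to check which one you have before parsing. Here is a minimal sketch; backup_info_kind is a hypothetical helper, not part of the SDK, and it assumes the LongAlphaString prefix never contains a dot:

```python
import os

SUFFIX = '.backup_info'


def backup_info_kind(path):
    """Returns None for the overall backup_info file, else the entity kind.

    Hypothetical helper: assumes the backup id prefix contains no dots, so
    anything between the first dot and the suffix is the entity kind.
    """
    name = os.path.basename(path)
    if not name.endswith(SUFFIX):
        raise ValueError('Not a backup_info file: %s' % name)
    stem = name[:-len(SUFFIX)]
    _prefix, sep, kind = stem.partition('.')
    return kind if sep else None


print(backup_info_kind('LongAlphaString.backup_info'))
print(backup_info_kind('LongAlphaString.Mymodel.backup_info'))
```

The first call returns None (overall file) and the second returns 'Mymodel', which is the type of file downloaded in the question.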

Your code works for interrogating the file contents of LongAlphaString.backup_info; however, it seems that you are trying to interrogate the file contents of LongAlphaString.[EntityType].backup_info, which has to be parsed as a backup_pb2.Backup message instead. Here's a script that will print the contents in a human-readable format for each file type:

import cStringIO
import os
import sys

sys.path.append('/usr/local/google_appengine')
from google.appengine.api import datastore
from google.appengine.api.files import records
from google.appengine.ext.datastore_admin import backup_pb2

ALL_BACKUP_INFO = 'long_string.backup_info'
ENTITY_KINDS = ['long_string.entity_kind.backup_info']


def parse_backup_info_file(content):
    """Returns entities iterator from a backup_info file content."""
    reader = records.RecordsReader(cStringIO.StringIO(content))
    version = reader.read()
    if version != '1':
        raise IOError('Unsupported version')
    return (datastore.Entity.FromPb(record) for record in reader)


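# The overall backup_info file is a records stream of datastore entities.
print "*****" + ALL_BACKUP_INFO + "*****"
with open(ALL_BACKUP_INFO, 'rb') as myfile:  # binary mode: the data is not text
    parsed = parse_backup_info_file(myfile.read())
    for record in parsed:
        print record

# Each per-kind backup_info file is a serialized backup_pb2.Backup message.
for entity_kind in ENTITY_KINDS:
    print os.linesep + "*****" + entity_kind + "*****"
    with open(entity_kind, 'rb') as myfile:
        backup = backup_pb2.Backup()
        backup.ParseFromString(myfile.read())
        print backup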
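As for where the odd "Expected" numbers in the EOFError come from: RecordsReader treats the first bytes of the file as a record header and trusts the length field in it. The sketch below is only an illustration and assumes the header layout I recall from records.py (4-byte CRC, 2-byte little-endian length, 1-byte type); the sample bytes are made up, not taken from a real backup file.

```python
import struct

# Assumed record header layout: CRC (uint32), length (uint16), type (uint8),
# little-endian with no padding.
HEADER_FORMAT = '<IHB'
HEADER_LENGTH = struct.calcsize(HEADER_FORMAT)


def decode_header(data):
    """Interprets the first bytes of data as a record header."""
    return struct.unpack(HEADER_FORMAT, data[:HEADER_LENGTH])


# Hypothetical start of a file that is NOT in records format: bytes 4 and 5
# happen to be the ASCII letters 'B' and 'a'.
sample = b'\x0a\x10LoBackup'
crc, length, record_type = decode_header(sample)
print(length)  # 24898
```

Read as a little-endian 16-bit integer, 'Ba' is 24898, and the other two values in the question (24932 and 25409) are 'da' and 'Ac'. That would be consistent with the reader picking up text bytes from a file that is not a records stream and then failing because the file is shorter than the bogus length.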
