Binary file IO in Python, where to start?

2022-01-09 00:00:00 python binary io epub mobipocket

Problem description

As a self-taught Python hobbyist, how would I go about learning to import and export binary files using standard formats?

I'd like to implement a script that takes an ePub ebook (XHTML + CSS in a zip) and converts it to the Mobipocket (PalmDOC) format so that the Amazon Kindle can read it (as part of a larger project that I'm working on).

There is already an awesome open-source project for managing ebook libraries: Calibre. I wanted to try implementing this on my own as a learning/self-teaching exercise. I started looking at their Python source code and realized that I have no idea what is going on. Of course, the big danger in being self-taught at anything is not knowing what you don't know.

In this case, I know that I don't know much about these binary files and how to work with them in Python code (struct?). But I think I'm probably missing a lot of knowledge about binary files in general, and I'd like some help understanding how to work with them. Here is a detailed overview of the mobi/palmdoc headers. Thanks!

No question, good point! Do you have any tips on how to gain a basic knowledge of working with binary files? Python-specific tips would be helpful, but other approaches could also be useful.

TOM: Edited as question, added intro / better title


Solution

You should probably start with the struct module, as you pointed to in your question, and of course open the file in binary mode.
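As a rough illustration of that starting point, here is a minimal sketch that writes a tiny header with struct and reads it back from a file opened in binary mode. The 8-byte layout (a 4-byte tag, a 2-byte version, a 2-byte count) and the file name demo.bin are made up for the example; they are not the mobi/palmdoc layout.

```python
import os
import struct
import tempfile

# Hypothetical 8-byte header, invented for this example:
# a 4-byte tag, a 2-byte version, a 2-byte record count.
# ">" means big-endian with no padding between fields.
HEADER = struct.Struct(">4sHH")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.bin")

    # "wb" / "rb" open the file in binary mode, so read() returns bytes.
    with open(path, "wb") as f:
        f.write(HEADER.pack(b"DEMO", 1, 3))

    with open(path, "rb") as f:
        raw = f.read(HEADER.size)

# unpack() turns the raw bytes back into Python values.
tag, version, count = HEADER.unpack(raw)
print(tag, version, count)  # prints: b'DEMO' 1 3
```

The format string is where most of the thinking goes: byte order first, then one code per field. For a real format you would take the field sizes and byte order from its specification.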

Basically, you just start at the beginning of the file and pick it apart piece by piece. It's a hassle, but not a huge problem. If the files are compressed or encrypted, things can get more difficult. It helps to start with a file whose contents you already know, so you're not guessing all the time.
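To make "piece by piece" concrete, here is a sketch that builds a sample with known contents and then walks through it from the first byte: a fixed header, then one length-prefixed record after another. The toy format (the magic tag TOY1, the 2-byte length prefix) and the helper names build_sample, read_exact and parse are all invented for illustration, and io.BytesIO stands in for a real file opened with "rb".

```python
import io
import struct

# Toy format, invented for this example:
#   header: 4-byte magic b"TOY1" + 2-byte record count (big-endian)
#   then, per record: 2-byte length + that many payload bytes
HEADER = struct.Struct(">4sH")
LENGTH = struct.Struct(">H")


def build_sample(records):
    """Serialize a list of bytes objects so the contents are known."""
    parts = [HEADER.pack(b"TOY1", len(records))]
    for payload in records:
        parts.append(LENGTH.pack(len(payload)))
        parts.append(payload)
    return b"".join(parts)


def read_exact(stream, size):
    """Read exactly size bytes, or fail loudly on a truncated file."""
    data = stream.read(size)
    if len(data) != size:
        raise ValueError(f"expected {size} bytes, got {len(data)}")
    return data


def parse(stream):
    """Walk the stream from the start: header first, then each record."""
    magic, count = HEADER.unpack(read_exact(stream, HEADER.size))
    if magic != b"TOY1":
        raise ValueError(f"unexpected magic: {magic!r}")
    records = []
    for _ in range(count):
        (length,) = LENGTH.unpack(read_exact(stream, LENGTH.size))
        records.append(read_exact(stream, length))
    return records


known = [b"hello", b"", b"binary \x00 data"]
sample = build_sample(known)
parsed = parse(io.BytesIO(sample))
print(parsed == known)  # prints: True
```

Because the input is known, any mismatch points at the parser rather than at the data, which is the main benefit of starting from a file you built or fully understand.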

Try it for a bit, and maybe you'll come up with more specific questions.
