How do I write each page of a Docx file to its own separate Docx file?

2022-04-16 00:00:00 python python-3.x xml python-docx openxml

Problem description

I have an MS Word document that is several hundred pages long.

Every page is identical except for a person's name, which is unique to each page (one page represents one user).

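I want to take this Word document and automate saving each page separately, so that instead of one document containing everyone I end up with hundreds of Word documents, one per person, which I can then distribute to the individual people.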

I have been using the python-docx module from here: https://python-docx.readthedocs.io/en/latest/

I'm struggling to work out how to accomplish this.

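From what I've researched, looping over each page isn't possible, because pages are not defined in the .docx file itself; they are produced by the program that renders it (i.e. Microsoft Word).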
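That is true for page numbers, but WordprocessingML does store some explicit break markers: hard page breaks (`<w:br w:type="page"/>`), section breaks (a `w:sectPr` inside a paragraph's `w:pPr`), and `w:lastRenderedPageBreak` elements, which Word writes as a cache of where pages fell in its last layout and which can be stale. If each person's page ends with a hard page break or a "next page" section break, those markers may be enough to find the boundaries. The following is a minimal, read-only sketch using only the standard library; the function name `stored_break_counts` is hypothetical, and it only reports what is stored rather than reproducing Word's pagination.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, in ElementTree's {uri}local form
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def stored_break_counts(docx_path):
    """Count the page-related markers actually saved in word/document.xml (hypothetical helper)."""
    with zipfile.ZipFile(docx_path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    # Explicit page breaks inserted by the author (Ctrl+Enter in Word)
    hard = sum(1 for br in root.iter(W + "br") if br.get(W + "type") == "page")
    # A sectPr inside a paragraph's properties ends a section; the body-level sectPr is not counted
    sections = sum(1 for ppr in root.iter(W + "pPr") if ppr.find(W + "sectPr") is not None)
    # Layout hints cached by Word at its last save; informative only
    rendered = sum(1 for _ in root.iter(W + "lastRenderedPageBreak"))
    return {
        "hard_page_breaks": hard,
        "section_breaks": sections,
        "last_rendered_page_breaks": rendered,
    }
```

If `hard_page_breaks` or `section_breaks` comes out at roughly one less than the number of people, splitting on that marker is likely more reliable than splitting on text.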

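However, python-docx can read the text, and since every page is identical, couldn't I tell Python: "when you see this text (the last piece of text on a given page), that is the end of the page, and anything after this point belongs to a new page"?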

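Ideally, I could write a loop that finds such a point, creates a document from everything before it, and repeats this for every page. The result also needs to keep all formatting and images.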

I'm open to other approaches, for example converting to PDF first, if that works.

Any ideas?

Solution

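I suggest using a different package, aspose-words-cloud, to split the Word document into separate pages. It currently works with files in cloud storage (Aspose Cloud Storage, Amazon S3, Dropbox, Google Drive Storage, Google Cloud Storage, Windows Azure Storage and FTP Storage). In the near future, however, it will also support processing files sent in the request body (streams).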

P.S. I am a developer at Aspose.

# For complete examples and data files, please go to https://github.com/aspose-words-cloud/aspose-words-cloud-python
import os
from shutil import copyfile

import asposewordscloud
import asposewordscloud.models.requests


# Please get your Client ID and Secret from https://dashboard.aspose.cloud.
client_id = 'xxxxx-xxxxx-xxxx-xxxxx-xxxxxxxxxxx'
client_secret = 'xxxxxxxxxxxxxxxxxx'

words_api = asposewordscloud.WordsApi(client_id, client_secret)
words_api.api_client.configuration.host = 'https://api.aspose.cloud'

remoteFolder = 'Temp'
localFolder = 'C:/Temp'
localFileName = '02_pages.docx'
remoteFileName = '02_pages.docx'

# Upload the source document to cloud storage; the with-block closes the local file afterwards
with open(os.path.join(localFolder, localFileName), 'rb') as source_file:
    words_api.upload_file(asposewordscloud.models.requests.UploadFileRequest(source_file, remoteFolder + '/' + remoteFileName))

# Split the document into one DOCX per page and return the pages as a single zip archive
request = asposewordscloud.models.requests.SplitDocumentRequest(name=remoteFileName, format='docx', folder=remoteFolder, zip_output=True)
result = words_api.split_document(request)
zipHref = result.split_result.zipped_pages.href
print("Result {}".format(zipHref))

# Download the archive; as in the original example, download_file is treated as returning a local temp-file path
request_download = asposewordscloud.models.requests.DownloadFileRequest(zipHref)
response_download = words_api.download_file(request_download)
# Save the zip next to the source document under the archive's own file name
copyfile(response_download, os.path.join(localFolder, os.path.basename(zipHref)))
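The downloaded file is a zip archive. Assuming it holds one .docx per page (the split output), a short standard-library sketch to unpack it and list the per-page documents might look like this; the function name `extract_pages` and the destination folder are hypothetical, not part of the Aspose SDK.

```python
import os
import zipfile


def extract_pages(zip_path, dest_dir):
    """Unpack the archive into dest_dir and return the paths of the .docx files it contained."""
    with zipfile.ZipFile(zip_path) as zf:
        # extractall creates dest_dir (and any subfolders) as needed
        zf.extractall(dest_dir)
        names = [n for n in zf.namelist() if n.lower().endswith(".docx")]
    # Plain string sort; unpadded numbers (e.g. "page10" vs "page2") would need a natural sort
    return sorted(os.path.join(dest_dir, n) for n in names)
```

For example, `extract_pages('C:/Temp/02_pages.zip', 'C:/Temp/pages')` would return the paths of the extracted page documents, which could then be renamed per person before distribution.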
