Naming a file when downloading with Selenium Webdriver

Problem description

I see that you can set where to download a file to through Webdriver, as follows:

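from os import getcwd
from selenium import webdriver

fp = webdriver.FirefoxProfile()
# 2 = save to the custom directory given in browser.download.dir
fp.set_preference("browser.download.folderList", 2)
fp.set_preference("browser.download.manager.showWhenStarting", False)
fp.set_preference("browser.download.dir", getcwd())
# Save CSV files without showing the download dialog
fp.set_preference("browser.helperApps.neverAsk.saveToDisk", "text/csv")

browser = webdriver.Firefox(firefox_profile=fp)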

However, I was wondering whether there is a similar way to give the file a name when it is downloaded. Preferably it would not be tied to the profile, as I will be downloading ~6000 files through one browser instance and do not want to reinitialize the driver for each download.

Solution with code as suggested by the selected answer. Rename the file after each one is downloaded.

import os

# SAVE_TO_DIRECTORY and docName are defined elsewhere in the script
paths = [os.path.join(SAVE_TO_DIRECTORY, f) for f in os.listdir(SAVE_TO_DIRECTORY)]
files = [p for p in paths if os.path.isfile(p)]  # keep regular files only
files.sort(key=os.path.getmtime)  # oldest first, newest last
newest_file = files[-1]
os.rename(newest_file, os.path.join(SAVE_TO_DIRECTORY, docName + ".pdf"))


Solution

I do not know if there is a pure Selenium handler for this, but here is what I have done when I needed to do something with the downloaded file.

  1. Set up a loop that polls your download directory for the latest file that does not have a .part extension (that extension indicates a partial download and would occasionally trip things up if not accounted for). Put a timer on the loop so that you don't end up in an infinite loop if a timeout or some other error keeps the download from completing. I used the output of the ls -t <dirname> command on Linux (my old code uses the commands module, which is deprecated, so I won't show it here :) ) and got the first file by using

# result = output of ls -t
result = result.split('\n')[1].split(' ')[-1]

  • If the while loop exits successfully, the topmost file in the directory will be your file, which you can then modify using os.rename (or anything else you like).

    Probably not the answer you were looking for, but hopefully it points you in the right direction.
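The polling approach described above can be sketched in pure Python, without shelling out to ls. This is only an illustration under some assumptions: the helper names wait_for_download and rename_download, the timeout and poll_interval parameters, and the idea of passing in the set of file names that existed before the download started are all hypothetical and not part of the original answer. It also assumes that an in-progress Firefox download can be recognized by a .part file sitting in the download directory.

```python
import os
import time


def wait_for_download(directory, known_files, timeout=60.0, poll_interval=0.5):
    """Return the path of a new, completed file in `directory`.

    `known_files` is the set of file names that were present before the
    download was triggered. Raises TimeoutError if no completed new file
    appears within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        names = os.listdir(directory)
        # Assumption: a ".part" file means a download is still being written.
        in_progress = any(name.endswith(".part") for name in names)
        new_files = [
            os.path.join(directory, name)
            for name in names
            if name not in known_files
            and not name.endswith(".part")
            and os.path.isfile(os.path.join(directory, name))
        ]
        if new_files and not in_progress:
            # If several new files exist, take the most recently modified one.
            return max(new_files, key=os.path.getmtime)
        if time.monotonic() >= deadline:
            raise TimeoutError("no completed download found in " + directory)
        time.sleep(poll_interval)


def rename_download(path, new_name):
    """Rename the downloaded file in place and return its new path."""
    target = os.path.join(os.path.dirname(path), new_name)
    os.rename(path, target)
    return target
```

In a download loop, one would take a snapshot with set(os.listdir(directory)) just before triggering each download, call wait_for_download with that snapshot, and then pass the returned path to rename_download. Comparing against a snapshot rather than simply taking the newest file avoids picking up a file from an earlier iteration when a download fails silently.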
