Downloading a file via a hyperlink with Selenium and PhantomJS

Question

I am using Selenium to click a hyperlink that is loaded on a certain page. The script works in Google Chrome but not in PhantomJS. Why is this not working?

from selenium import webdriver

driver = webdriver.Chrome()   
#driver = webdriver.PhantomJS(executable_path = "/Users/jameslemieux/PythonProjects/phantomjs-1.9.8-macosx/bin/phantomjs")

driver.get("http://www.youtube-mp3.org/?e=t_exp&r=true#v=hC-T0rC6m7I")

elem = driver.find_element_by_link_text('Download')
elem.click()


driver.save_screenshot('/Users/jameslemieux/Desktop/Misc./test_image.png')

driver.quit()

This works in Chrome, but it always opens a new Chrome window to complete the task. I read that I should use PhantomJS to run it in the background, but when I switch the driver to PhantomJS, the download does not seem to go through. The screenshot is captured and shows the right page, and the 'Download' link is definitely there. So the

elem.click()

is either not doing what it should, or it IS clicking but PhantomJS doesn't know how to handle a direct download link. Please help, I've been at this for hours.


Answer

Since PhantomJS never proceeds with a download request, clicking the link saves nothing; we need to download the file manually.
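Downloading "manually" simply means fetching the file's URL with an ordinary HTTP client instead of through the browser. Below is a minimal Python 3 sketch using only the standard library; the helper name `download_file` is hypothetical, and it assumes the link is publicly downloadable, since it sends none of the Selenium session's cookies or headers.

```python
import shutil
from urllib.request import urlopen


def download_file(url, path, chunk_size=64 * 1024):
    """Stream the resource at `url` into the local file `path`.

    Hypothetical helper: no cookies or headers from the browser session
    are forwarded, so it only works for links that need no login/session.
    """
    with urlopen(url) as response, open(path, "wb") as out:
        # copy in fixed-size chunks so large files are never held in memory
        shutil.copyfileobj(response, out, chunk_size)
    return path
```

Streaming with `shutil.copyfileobj` keeps memory use flat for large mp3 files; `urlretrieve` does essentially the same job but is documented as a legacy interface in Python 3.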

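The idea here is to click the "Convert" button, wait for the "Download" link to appear, read its href attribute (which contains the link to the generated mp3 file), and download that file with urlretrieve() from the standard library (urllib.urlretrieve in Python 2, urllib.request.urlretrieve in Python 3):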

try:
    from urllib.request import urlretrieve  # Python 3
    from urllib.parse import urljoin
except ImportError:
    from urllib import urlretrieve  # Python 2
    from urlparse import urljoin

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

base_url = 'http://www.youtube-mp3.org/'

# webdriver.PhantomJS and find_element_by_* require Selenium 3.x or older
driver = webdriver.PhantomJS()
driver.get("http://www.youtube-mp3.org/?e=t_exp&r=true#v=hC-T0rC6m7I")

# convert the video to mp3
driver.find_element_by_id('submit').click()

# wait up to 10 seconds for the download link to appear
element = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.LINK_TEXT, "Download")))
link = element.get_attribute('href')
# resolve the href against the site root in case it is relative
url = urljoin(base_url, link)

# download the song outside the browser
urlretrieve(url, 'song.mp3')

driver.quit()

# enjoy the great song
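The urljoin() call is a safeguard in case the Download link's href is relative. A quick Python 3 illustration of its behaviour; the example hrefs are made up and are not the site's real links.

```python
from urllib.parse import urljoin

base_url = "http://www.youtube-mp3.org/"

# A relative href is resolved against the base URL...
print(urljoin(base_url, "/get?id=abc"))  # http://www.youtube-mp3.org/get?id=abc

# ...while an absolute href comes back unchanged.
print(urljoin(base_url, "http://cdn.example.com/song.mp3"))  # http://cdn.example.com/song.mp3
```

Because an absolute URL passes through untouched, calling urljoin() is harmless even when the href is already complete.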

相关文章