Looping through links with Selenium WebDriver (Python)
Problem description
Afternoon all. Currently trying to use Selenium webdriver to loop through a list of links on a page. Specifically, it's clicking a link, grabbing a line of text off said page to write to a file, going back, and clicking the next link in a list. The following is what I have:
def test_text_saver(self):
    driver = self.driver
    textsave = open("textsave.txt", "w")
    list_of_links = driver.find_elements_by_xpath('//*[@id="learn-sub"]/div[4]/div/div/div/div[1]/div[2]/div/div/ul/li')
    """Initializing Link Count:"""
    link_count = len(list_of_links)
    while x <= link_count:
        print x
        driver.find_element_by_xpath('//*[@id="learn-sub"]/div[4]/div/div/div/div[1]/div[2]/div/div/ul/li[' + str(x) + ']/a').click()
        text = driver.find_element_by_xpath('//*[@id="learn-sub"]/div[4]/div/div/div/div[1]/div[1]/div[1]/h1').text
        textsave.write(text + "\n")
        driver.implicitly_wait(5000)
        driver.back()
        x += 1
    textsave.close()
When run, it goes to the initial page, and...goes back to the main page, rather than the subpage that it's supposed to. Printing x, I can see it's incrementing three times rather than one. It also crashes after that. I've checked all my xpaths and such, and also confirmed that it's getting the correct count for the number of links in the list.
Any input's hugely appreciated--this is really just to flex my python/automation, since I'm just getting into both. Thanks in advance!!
Solution
I'm not sure if this will fix the problem, but in general it is better to use WebDriverWait rather than implicitly_wait, since WebDriverWait.until will keep calling the supplied function (e.g. driver.find_element_by_xpath) until the returned value is not False-ish or the timeout (e.g. 5000 seconds) is reached, at which point it raises a selenium.common.exceptions.TimeoutException.
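To make the polling behaviour concrete, here is a rough stand-alone sketch of what an until-style wait does. It is not Selenium's actual implementation: poll_until, PollTimeout and poll_interval are names made up for this illustration, and the real WebDriverWait.until additionally passes the driver to the function and, by default, swallows NoSuchElementException between polls.

```python
import time


class PollTimeout(Exception):
    """Raised when the condition never returns a truthy value in time."""


def poll_until(condition, timeout, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises PollTimeout once `timeout` seconds pass.
    (Hypothetical helper; only mimics the idea behind WebDriverWait.until.)
    """
    deadline = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            # Anything truthy (a non-empty string, an element, True) ends the wait.
            return value
        if time.monotonic() >= deadline:
            raise PollTimeout("condition not met within %s seconds" % timeout)
        time.sleep(poll_interval)
```

The point of the design is that the wait ends as soon as the condition holds, so a generous timeout only costs time when something is actually wrong.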
import selenium.webdriver.support.ui as UI

def test_text_saver(self):
    driver = self.driver
    wait = UI.WebDriverWait(driver, 5000)
    with open("textsave.txt", "w") as textsave:
        list_of_links = driver.find_elements_by_xpath('//*[@id="learn-sub"]/div[4]/div/div/div/div[1]/div[2]/div/div/ul/li/a')
        for link in list_of_links:  # 2
            link.click()  # 1
            text = wait.until(
                lambda driver: driver.find_element_by_xpath('//*[@id="learn-sub"]/div[4]/div/div/div/div[1]/div[1]/div[1]/h1').text)
            textsave.write(text + "\n")
            driver.back()
1. After you click the link, you should wait until the linked URL is loaded. So the call to wait.until is placed directly after link.click().
2. Instead of using
while x <= link_count:
    ...
    x += 1
it is better to use
for link in list_of_links:
For one thing, it improves readability. Moreover, you don't need to care about the number x; all you really care about is looping over the links, which is what the for-loop does.
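One thing the for-loop version may still trip over: once the browser leaves the list page, the link elements found earlier can go stale, and clicking the second one may raise a StaleElementReferenceException. A possible workaround, sketched here under assumptions, is to read each link's href first and then visit the URLs directly. save_headings and its parameters are names made up for this illustration, and it assumes every link carries a usable href. It uses the same find_element(s)_by_xpath calls as the code above, which newer Selenium releases replace with find_element(s)(By.XPATH, ...).

```python
def save_headings(driver, wait, links_xpath, heading_xpath, out):
    """Visit every link matched by links_xpath and write each page's heading to out.

    driver and wait are assumed to be a Selenium WebDriver and a WebDriverWait
    built around it; out is any writable text file object.
    (Hypothetical helper, not part of Selenium.)
    """
    # Read the hrefs up front: plain strings stay valid after navigation,
    # whereas element references found on the list page may go stale.
    urls = [a.get_attribute("href")
            for a in driver.find_elements_by_xpath(links_xpath)]
    for url in urls:
        driver.get(url)
        # Poll until the heading can be found and has non-empty text.
        text = wait.until(
            lambda d: d.find_element_by_xpath(heading_xpath).text)
        out.write(text + "\n")
    return len(urls)
```

Inside the test it could be called as save_headings(driver, UI.WebDriverWait(driver, 10), links_xpath, heading_xpath, textsave), where the two XPath variables hold the expressions from the question and the 10-second timeout is an arbitrary choice.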