Looping Through a List of Links with Selenium Webdriver (Python)

2022-01-16 00:00:00 python selenium webdriver

Problem Description


Afternoon all. Currently trying to use Selenium webdriver to loop through a list of links on a page. Specifically, it's clicking a link, grabbing a line of text off said page to write to a file, going back, and clicking the next link in a list. The following is what I have:

    def test_text_saver(self):
        driver = self.driver
        textsave = open("textsave.txt", "w")
        list_of_links = driver.find_elements_by_xpath("//*[@id='learn-sub']/div[4]/div/div/div/div[1]/div[2]/div/div/ul/li")
        """Initializing Link Count:"""
        link_count = len(list_of_links)
        x = 1
        while x <= link_count:
            print x
            driver.find_element_by_xpath("//*[@id='learn-sub']/div[4]/div/div/div/div[1]/div[2]/div/div/ul/li[" + str(x) + "]/a").click()
            text = driver.find_element_by_xpath("//*[@id='learn-sub']/div[4]/div/div/div/div[1]/div[1]/div[1]/h1").text
            textsave.write(text + "\n\n")
            driver.implicitly_wait(5000)
            driver.back()
            x += 1
        textsave.close()


When run, it goes to the initial page, and...goes back to the main page, rather than the subpage that it's supposed to. Printing x, I can see it's incrementing three times rather than one. It also crashes after that. I've checked all my xpaths and such, and also confirmed that it's getting the correct count for the number of links in the list.


Any input's hugely appreciated--this is really just to flex my python/automation, since I'm just getting into both. Thanks in advance!!


Solution


I'm not sure if this will fix the problem, but in general it is better to use WebDriverWait rather than implicitly_wait, since WebDriverWait.until will keep calling the supplied function (e.g. driver.find_element_by_xpath) until the returned value is not False-ish or the timeout (e.g. 5000 seconds) is reached -- at which point it raises a selenium.common.exceptions.TimeoutException.
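That polling behaviour can be sketched in plain Python. This is an illustrative reimplementation, not Selenium's actual code; the class name SimpleWait and the poll parameter are made up for the sketch:

```python
import time

class SimpleWait:
    """Illustrative sketch of WebDriverWait.until's polling loop --
    not Selenium's real implementation (which raises TimeoutException
    from selenium.common.exceptions rather than TimeoutError)."""

    def __init__(self, driver, timeout, poll=0.5):
        self.driver = driver
        self.timeout = timeout  # seconds
        self.poll = poll        # seconds between retries

    def until(self, condition):
        # Keep calling condition(driver) until it returns something
        # truthy, or raise once the timeout is exceeded.
        deadline = time.monotonic() + self.timeout
        while True:
            value = condition(self.driver)
            if value:
                return value
            if time.monotonic() > deadline:
                raise TimeoutError("condition never became truthy")
            time.sleep(self.poll)
```

Passing a lambda such as `lambda driver: driver.find_element_by_xpath(...).text` to `until` therefore retries the lookup until the element exists and its text is non-empty.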

import selenium.webdriver.support.ui as UI

def test_text_saver(self):
    driver = self.driver
    wait = UI.WebDriverWait(driver, 5000)
    with open("textsave.txt","w") as textsave:
        list_of_links = driver.find_elements_by_xpath("//*[@id='learn-sub']/div[4]/div/div/div/div[1]/div[2]/div/div/ul/li/a")
        for link in list_of_links:  # 2
            link.click()   # 1
            text = wait.until(
                lambda driver: driver.find_element_by_xpath("//*[@id='learn-sub']/div[4]/div/div/div/div[1]/div[1]/div[1]/h1").text)
            textsave.write(text + "\n\n")
            driver.back()
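One caveat worth knowing about: after driver.back() the page is re-rendered, so the WebElements collected before the first click can go stale and raise StaleElementReferenceException on the next iteration. A common workaround is to re-locate the list on every pass and click by index. The sketch below uses the XPaths from the question but a made-up helper name (collect_headings); it returns the texts instead of writing a file so the loop logic stands alone:

```python
LINKS_XPATH = ("//*[@id='learn-sub']/div[4]/div/div/div"
               "/div[1]/div[2]/div/div/ul/li/a")
HEADING_XPATH = ("//*[@id='learn-sub']/div[4]/div/div/div"
                 "/div[1]/div[1]/div[1]/h1")

def collect_headings(driver, wait):
    """Click each link in turn, re-locating the link list on every
    pass so no stale element reference is reused; return the
    heading text harvested from each subpage."""
    headings = []
    link_count = len(driver.find_elements_by_xpath(LINKS_XPATH))
    for i in range(link_count):
        # Fresh lookup each iteration: elements found before
        # navigating away may be stale by now.
        driver.find_elements_by_xpath(LINKS_XPATH)[i].click()
        text = wait.until(
            lambda d: d.find_element_by_xpath(HEADING_XPATH).text)
        headings.append(text)
        driver.back()
    return headings
```

The trade-off is one extra find_elements call per iteration, which is cheap compared to the page loads themselves.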


  1. After you click the link, you should wait until the linked url is loaded. So the call to wait.until is placed directly after link.click()
  2. Instead of using

while x <= link_count:
    ...
    x += 1

it is better to use

for link in list_of_links: 


For one thing, it improves readability. Moreover, you don't really need to care about the number x; all you really care about is looping over the links, which is exactly what the for-loop does.
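And if the numeric index is still wanted (say, for logging progress), enumerate supplies it alongside each element without any manual bookkeeping. The list of strings below just stands in for the real WebElements:

```python
links = ["intro", "setup", "api"]  # stand-ins for the WebElements
progress = []
for i, link in enumerate(links, start=1):
    # i counts 1, 2, 3 ... with no separate x variable to maintain
    progress.append("{}/{}: {}".format(i, len(links), link))
# progress is now ["1/3: intro", "2/3: setup", "3/3: api"]
```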
