Locating a specific instance of a class with XPath and Selenium

Problem description

I am trying, using Selenium, to click the PDF icon (shown in screenshot 2) for each element (each of the containers shown in screenshot 1).

The problem is that the PDF icons have few usable identifiers, so I can only locate them with an XPath expression by class. On each iteration of the for elem in issues_numb: loop, the script clicks the first PDF icon on the page, because that is the first element matching the XPath given to the script.

Is there a way to create a nested loop so that, for each instance of one class (the article titles), the script clicks the instance of the other class (the PDF icon) associated with it? That is, for the first article, click the first PDF icon, and so on.

HTML code:

<section aria-label="Metadata for Whole-genome sequence and broad-spectrum antibacterial activity of Chryseobacterium cucumeris strain MW-6 isolated from the Arabian Sea" class="article-list-item-content-block ">
    <div class="title " data-ember-action="" data-ember-action-1069="1069">
        <div id="ember1070" class="ember-view"><a target="_blank" href="/libraries/1374/articles/504204400" id="ember1071" class="ember-view" tabindex="0"> Whole-genome sequence and broad-spectrum antibacterial activity of Chryseobacterium cucumeris strain MW-6 isolated from the Arabian Sea
            </a>
        </div>
    </div>

    <!---->

    <div class="metadata">

        <!---->

        <span tabindex="0" class="pages ">
            p. 489
        </span>

        <!---->

        <span class="authors" data-ember-action="" data-ember-action-1082="1082">
            <span tabindex="0" class="preview tabindex">
                Iqbal, Sajid; Vohra, Muhammad Sufyan; Janjua, Hussnain Ahmed
            </span>
        </span>

        <div class="abstract" data-ember-action="" data-ember-action-1083="1083">
            <div tabindex="0" class="preview tabindex">
                <div id="ember1088" class="ember-view">
                    <span class="lt-line-clamp__line">In the current study, strain MW-6 isolated from Arabian seawater exhibited broad-spectrum antibacterial activity</span>
                   <span class="lt-line-clamp__line">against indicator bacterial pathogens. The partially extracted antibacterial metabolites with ethyl acetate revealed</span>
                   <span class="lt-line-clamp__line lt-line-clamp__line--last">
                       promising activity against, and. The minimum inhibitory concentrations (MICs) were determined against indicator stra<span class="lt-line-clamp__ellipsis"><div class="lt-line-clamp__dummy-element">…</div>

                       <!---->
                    </span></span>

                    <!----><span class="lt-line-clamp__ellipsis lt-line-clamp__ellipsis--dummy">…</span></div>
                    </div>
                </div>
            </div>

            <!---->

            <div class="content-overflow " data-ember-action="" data-ember-action-1089="1089">
                <span class="chevron icon flaticon solid down-2"></span>
            </div>

            <div class="tools ">
              <div class="buttons noselect">
                    <div class="button invisible download-pdf" data-ember-action="" data-ember-action-1090="1090">
                        <div id="ember1091" class="ember-view"><a aria-label="Download PDF" target="_blank" href="/libraries/1374/articles/504204400/pdf" id="ember1092" class="tooltip ember-view" tabindex="0">
                            <span aria-hidden="true" class="icon fal fa-file-pdf"></span>
                            <span class="aria-hidden">Download PDF - Whole-genome sequence and broad-spectrum antibacterial activity of Chryseobacterium cucumeris strain MW-6 isolated from the Arabian Sea</span>
                        </a>
                    </div>
                </div>

                <div class="button invisible read-full-text" data-ember-action="" data-ember-action-1097="1097">
                    <div id="ember1098" class="ember-view"><a aria-label="Link to Article" target="_blank" href="/libraries/1374/articles/504204400" id="ember1099" class="tooltip ember-view" tabindex="0">
                        <span aria-hidden="true" class="icon fal fa-link"></span>
                        <span class="aria-hidden">Link to Article - Whole-genome sequence and broad-spectrum antibacterial activity of Chryseobacterium cucumeris strain MW-6 isolated from the Arabian Sea</span>
                    </a>
                </div>
            </div>

            <div class="button invisible add-to-my-articles" data-ember-action="" data-ember-action-1100="1100">
              <a aria-label="Save to My Articles" class="tabindex tooltip" tabindex="0">
                <span aria-hidden="true" class="icon fal fa-folder"></span>
                <span class="aria-hidden">Save to My Articles - Whole-genome sequence and broad-spectrum antibacterial activity of Chryseobacterium cucumeris strain MW-6 isolated from the Arabian Sea</span>
              </a>
            </div>

            <div class="button invisible citation-services" data-ember-action="" data-ember-action-2165="2165">
              <a tabindex="0" aria-label="Export Citation" class="tabindex tooltip">
                <span aria-hidden="true" class="icon fal fa-graduation-cap"></span>
                <span class="aria-hidden">Export Citation - Whole-genome sequence and broad-spectrum antibacterial activity of Chryseobacterium cucumeris strain MW-6 isolated from the Arabian Sea</span>
              </a>
            </div>

            <div class="button invisible social-media-services" data-ember-action="" data-ember-action-2166="2166">
              <a tabindex="0" aria-label="Share" class="tabindex tooltip">
                <span aria-hidden="true" class="icon fal fa-share-alt"></span>
                <span class="aria-hidden">Share - Whole-genome sequence and broad-spectrum antibacterial activity of Chryseobacterium cucumeris strain MW-6 isolated from the Arabian Sea</span>
              </a>
            </div>
        </div>
    </div>
</section>

My code:

issues_numb = driver.find_elements(By.XPATH, "//section[@class='article-list-item-content-block ']")
parent_tab = driver.current_window_handle


for elem in issues_numb:
    title_article = elem.get_attribute("aria-label")
    print(title_article[13:])
    try:
        check_buttons = driver.find_element(By.XPATH, ".//span[@class='icon fal fa-file-pdf']")
        print("pdf object found for", str(elem))
        checking_size_buttons = len(str(check_buttons))
        if checking_size_buttons > 0:
            pdf_icon = driver.find_element(By.XPATH, ".//span[@class='icon fal fa-file-pdf']")
            click_pdf = ActionChains(driver).move_to_element(pdf_icon).click(pdf_icon).perform()
            WebDriverWait(driver, timeout).until(element_present)
            check_need_to_sign_in()
            driver.switch_to.window(parent_tab)
        else:
            print("No PDF available")
    except NoSuchElementException:
        get_article_name()

The issues_numb variable refers to this element:

The tools_box variable refers to this element:

Solution

The way to solve a situation like this, i.e., having access only to an identifier that is shared by multiple elements (in my case, a class name shared by multiple PDF icons), is to specify the context in which to look.

This way, the driver only searches the part of the HTML that belongs to the specific area you're after: call find_element on the container element (elem) rather than on driver, and start the XPath with "." so that it is evaluated relative to that element. An XPath that starts with "//" searches the whole document, even when it is called on an element. More on this here, and here too, although Selenium's syntax has changed since then. This is the updated syntax:

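from selenium.webdriver.common.by import By

elements = driver.find_elements(By.XPATH, "//tag[@class='targeted_context']")
for elem in elements:
    # elem.find_element (not driver.find_element) + leading "." = search only inside elem
    targeted_element = elem.find_element(By.XPATH, ".//tag[@class='targeted_class']")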
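
To see why the scoping matters without starting a browser, here is an offline sketch using Python's standard-library xml.etree.ElementTree, which supports a limited subset of XPath. It is not Selenium: root.find stands in for driver.find_element, and section.find stands in for elem.find_element. The markup is a simplified, hypothetical stand-in for the page above; the id attributes are invented only so the two icons can be told apart.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical stand-in for the page: two article containers,
# each holding one PDF icon. The ids exist only so we can tell the icons apart.
PAGE = """
<html><body>
  <section class="article-list-item-content-block " aria-label="Metadata for Article 1">
    <a href="/articles/1/pdf"><span class="icon fal fa-file-pdf" id="pdf-1"/></a>
  </section>
  <section class="article-list-item-content-block " aria-label="Metadata for Article 2">
    <a href="/articles/2/pdf"><span class="icon fal fa-file-pdf" id="pdf-2"/></a>
  </section>
</body></html>
"""

PDF_ICON = ".//span[@class='icon fal fa-file-pdf']"

root = ET.fromstring(PAGE.strip())
sections = root.findall(".//section[@class='article-list-item-content-block ']")

# Unscoped: the search starts from the document root on every iteration,
# so it always returns the first icon on the page (the behaviour in the question).
unscoped_ids = [root.find(PDF_ICON).get("id") for _ in sections]

# Scoped: the search starts from each section, so each article gets its own icon.
scoped_ids = [section.find(PDF_ICON).get("id") for section in sections]

print(unscoped_ids)  # ['pdf-1', 'pdf-1']
print(scoped_ids)    # ['pdf-1', 'pdf-2']
```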

(Answer by @AbdulAzizBarkat in comments.)
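
Applied to the loop from the question, a minimal sketch might pair each article container with the PDF icon found inside it. It assumes Selenium 4, where find_elements (plural) returns an empty list instead of raising NoSuchElementException, and it passes the string "xpath", which is the value of By.XPATH, so the function can be exercised without a browser. The helper name pdf_icons_per_article is hypothetical.

```python
# Hypothetical helper: pair each article container with the PDF icon inside it.
# It only uses find_elements() and get_attribute(), so it accepts Selenium
# WebElements (or any object offering the same two methods).
PDF_ICON_XPATH = ".//span[@class='icon fal fa-file-pdf']"  # leading "." = relative to the container


def pdf_icons_per_article(containers):
    """Return (title, icon) pairs; icon is None when an article has no PDF icon."""
    pairs = []
    for elem in containers:
        # Same idea as title_article[13:] in the question, without the magic number
        # (str.removeprefix needs Python 3.9+).
        title = (elem.get_attribute("aria-label") or "").removeprefix("Metadata for ")
        # Search inside elem, not driver. "xpath" is the string value of By.XPATH;
        # find_elements returns [] instead of raising NoSuchElementException.
        icons = elem.find_elements("xpath", PDF_ICON_XPATH)
        pairs.append((title, icons[0] if icons else None))
    return pairs
```

In the script, one might pass it driver.find_elements(By.XPATH, "//section[@class='article-list-item-content-block ']") and, for each pair whose icon is not None, click it with the same ActionChains(driver).move_to_element(icon).click(icon).perform() call used in the question; pairs whose icon is None correspond to the "No PDF available" branch.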
