Python, Selenium and Chromedriver - infinite loop using find_element_by_id causes CPU problem

Problem description

Good day to all! I've been experiencing this problem for a week now, but I don't think I can solve it, and I have not found any solution in articles online. Hopefully someone can help me here...

My scenario: I need to monitor prices from 6 different tables on one page that change almost every second. At the end of the day, I close the browser (by pressing the X button) and terminate the script (by pressing Control+C), then run it again in the morning and let it run throughout the day. The script is written in Python and uses Selenium to read the prices. The browser I use is Chrome. My OS is Windows 2008 R2; the Selenium version is 3.14.1.

Here is part of the code. It simply reads the prices within the tables using find_elements_by_id inside an infinite loop with a 1 second interval.

import time

# `browser` is the already-created Chrome WebDriver instance (setup not shown).
while True:
    close1 = float(browser.find_element_by_id('bnaBox1').find_elements_by_id('lastprc1')[0].text.encode('ascii','ignore'))
    close2 = float(browser.find_element_by_id('bnaBox2').find_elements_by_id('lastprc2')[0].text.encode('ascii','ignore'))
    close3 = float(browser.find_element_by_id('bnaBox3').find_elements_by_id('lastprc3')[0].text.encode('ascii','ignore'))
    close4 = float(browser.find_element_by_id('bnaBox4').find_elements_by_id('lastprc4')[0].text.encode('ascii','ignore'))
    close5 = float(browser.find_element_by_id('bnaBox5').find_elements_by_id('lastprc5')[0].text.encode('ascii','ignore'))
    close6 = float(browser.find_element_by_id('bnaBox6').find_elements_by_id('lastprc6')[0].text.encode('ascii','ignore'))
    time.sleep(1)  # wait one second before the next read
...

During the first few minutes of the run, the script consumes a minimal amount of CPU (approx. 20~30 percent), but after a few more minutes consumption slowly climbs to 100%! No other processes are running on the machine besides the script.

Troubleshooting I've done so far (none of it solved my issue):

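  • Upgraded Chrome to the latest version (v71) and ChromeDriver to 2.44
  • Rolled Chrome back to previous versions (v62, v68, v69, v70)
  • Rolled ChromeDriver back to versions 2.42 and 2.43
  • Cleared my %TEMP% files
  • Rebooted the machine (multiple times)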

The program only gets values from the tables, but I suspect that somewhere in the background, as the script runs, unnecessary data is piling up, which causes the CPU to hit the ceiling.

I'm hoping that someone can help me figure out what causes this CPU problem and resolve the issue.


Solution

It would be tough to guess the exact reason for the 100% CPU usage without any visibility into your code blocks, specifically the WebDriver configuration. So the answer will be based on generic guidelines, as follows:

  • Never close the browser by pressing the X button. Always invoke driver.quit() within the tearDown(){} method to close and destroy the WebDriver and Web Client instances gracefully.
    • You can find a detailed discussion in "PhantomJS web driver stays in memory"
    • You can find a detailed discussion in "Selenium : How to stop geckodriver process impacting PC memory, without calling driver.quit()?"
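
For a Python script like the one in the question, the driver.quit() advice could be sketched roughly as follows. The names run_monitor, read_prices and max_iterations are hypothetical and not part of Selenium; the only thing assumed about driver is that it has a quit() method, which a Selenium WebDriver does.

```python
import time


def run_monitor(driver, read_prices, interval=1.0, max_iterations=None):
    """Call read_prices(driver) repeatedly, then always quit the driver.

    driver: any object with a quit() method (e.g. a Selenium WebDriver).
    read_prices: hypothetical callable that reads the current prices.
    max_iterations: optional cap on the number of reads (None = run forever).
    Returns the number of completed reads.
    """
    count = 0
    try:
        while max_iterations is None or count < max_iterations:
            read_prices(driver)
            count += 1
            time.sleep(interval)
    except KeyboardInterrupt:
        # Control+C ends the loop; cleanup still runs in the finally block.
        pass
    finally:
        # Ask the driver to shut down the browser session it started.
        driver.quit()
    return count
```

With this shape, stopping the script with Control+C still reaches driver.quit(), so the browser window does not need to be closed by hand.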

  • A couple of useful ChromeOptions() and their usage are as follows:

      options.addArguments("start-maximized"); // open Browser in maximized mode
      options.addArguments("disable-infobars"); // disabling infobars
      options.addArguments("--disable-extensions"); // disabling extensions
      options.addArguments("--disable-gpu"); // applicable to windows os only
      options.addArguments("--disable-dev-shm-usage"); // overcome limited resource problems
      options.addArguments("--no-sandbox"); // Bypass OS security model
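
The snippet above uses the Java binding. Since the question is in Python, a rough equivalent might look like the sketch below. CHROME_ARGUMENTS and apply_arguments are hypothetical helper names; the only Selenium call assumed is add_argument() on a ChromeOptions object, and the commented usage lines assume Selenium is installed.

```python
# The same flags as the Java snippet, kept as plain strings.
CHROME_ARGUMENTS = [
    "start-maximized",          # open the browser maximized
    "disable-infobars",         # disable infobars
    "--disable-extensions",     # disable extensions
    "--disable-gpu",            # described above as Windows-only
    "--disable-dev-shm-usage",  # overcome limited resource problems
    "--no-sandbox",             # bypass the OS security model
]


def apply_arguments(options, arguments=CHROME_ARGUMENTS):
    """Add each flag to an options object that exposes add_argument()."""
    for argument in arguments:
        options.add_argument(argument)
    return options


# Assumed usage with Selenium installed:
#   from selenium import webdriver
#   options = apply_arguments(webdriver.ChromeOptions())
#   browser = webdriver.Chrome(options=options)
```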
      

  • Using hardcoded sleeps in the form of time.sleep(1) is a big no.
    • You can find a detailed discussion in "How to sleep webdriver in python for milliseconds"
    • You can find a detailed discussion in "Limit chrome headless CPU and memory usage"
  • Upgrade ChromeDriver to the current ChromeDriver v2.44 level.
  • Keep the Chrome version between Chrome v69 and v71 (as per the ChromeDriver v2.44 release notes).
  • Clean your project workspace through your IDE and rebuild your project with the required dependencies only.
  • If your base Web Client version is too old, uninstall it through Revo Uninstaller and install a recent GA and released version of the Web Client.
  • Take a system reboot.
  • Execute your @Test.
  • (Windows only) Use the CCleaner tool to wipe off all the OS chores before and after the execution of your Test Suite.
  • (Linux only) Free up and release the unused/cached memory in Ubuntu/Linux Mint before and after the execution of your Test Suite.
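
On the hardcoded-sleep point, the usual alternative is to wait for a condition with a timeout, which is the idea behind Selenium's WebDriverWait. The following is only a plain-Python sketch of that polling pattern, assuming Python 3; wait_until and its parameters are hypothetical names, not a Selenium API.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Poll condition() until it returns a truthy value or timeout expires.

    Returns the truthy value, or raises TimeoutError once the deadline
    has passed without the condition being met.
    """
    deadline = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(poll_interval)
```

In the question's loop, condition could be a small function that returns the text of a price cell once it is non-empty, so the script waits only as long as the page needs.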
