How can I bypass the Google CAPTCHA using Selenium and Python?
Problem description
How can I bypass the Google CAPTCHA using Selenium and Python?
When I try to scrape something, Google gives me a CAPTCHA. Can I bypass the Google CAPTCHA with Selenium and Python?
For example, it's Google reCAPTCHA. You can see this CAPTCHA at this link: https://www.google.com/recaptcha/api2/demo
Solution
To begin with, when using Selenium's Python client you should avoid trying to solve or bypass the Google CAPTCHA.
Selenium automates browsers. What you do with that power is entirely up to you, but it is primarily meant for automating web applications through browser clients for testing purposes, and of course it is not limited to that.
On the other hand, CAPTCHA (an acronym for "Completely Automated Public Turing test to tell Computers and Humans Apart") is a type of challenge-response test used in computing to determine whether the user is human.
So Selenium and CAPTCHA serve two completely different purposes and ideally shouldn't be used to achieve any interrelated tasks.
Having said that, reCAPTCHA can easily detect the network traffic and identify your program as a Selenium-driven bot.
However, there are some generic approaches to avoid getting detected while web scraping:
- The first and foremost attribute by which a website can identify your script/program is your monitor size, so it is recommended not to use the conventional viewport.
- If you need to send multiple requests to a website, keep changing the User-Agent on each request. You can find a detailed discussion in "Way to change Google Chrome user agent in Selenium?"
- To simulate human-like behavior, you may need to slow down the script execution even beyond WebDriverWait and expected_conditions by inducing time.sleep(secs). You can find a detailed discussion in "How to sleep Selenium WebDriver in Python for milliseconds".
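The three suggestions above can be sketched roughly as follows. This is a minimal illustration rather than a guaranteed way to avoid detection: the user-agent strings, the window sizes, and the helper names human_pause, build_driver, and main are assumptions made up for this example, and it assumes Chrome with a matching driver is available.

```python
import random
import time

# Hypothetical pools; the values are illustrative, not recommendations.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
]
WINDOW_SIZES = [(1366, 768), (1440, 900), (1536, 864)]


def human_pause(min_secs=0.5, max_secs=2.0, sleep=time.sleep):
    """Sleep for a random interval and return the chosen delay in seconds."""
    delay = random.uniform(min_secs, max_secs)
    sleep(delay)
    return delay


def build_driver():
    """Start Chrome with a randomly chosen user agent and window size."""
    # Imported lazily so the helpers above work without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument(f"--user-agent={random.choice(USER_AGENTS)}")
    width, height = random.choice(WINDOW_SIZES)
    options.add_argument(f"--window-size={width},{height}")
    return webdriver.Chrome(options=options)


def main():
    """Open the demo page from the question, pause, then close the browser."""
    driver = build_driver()
    try:
        driver.get("https://www.google.com/recaptcha/api2/demo")
        human_pause()  # randomized delay on top of any explicit waits
    finally:
        driver.quit()
```

Calling main() opens one browser session. Because the user agent is set through Chrome options, it is fixed for the lifetime of a driver, so changing it between requests in this sketch means starting a new driver.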
However, in a couple of use cases we were able to interact with the reCAPTCHA using Selenium. You can find more details in the following discussions:
- How to click on the reCAPTCHA using Selenium and Java
- CSS selector for the reCAPTCHA checkbox using Selenium and VBA Excel
- Find the reCAPTCHA element and click on it: Python + Selenium
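As a rough idea of what interacting with the reCAPTCHA widget involves in Python, the sketch below switches into the widget's iframe and clicks the checkbox. The two CSS selectors are assumptions about the demo page's markup and may not match the current page, and click_recaptcha_checkbox is a hypothetical helper name. Clicking the checkbox does not solve any image challenge that reCAPTCHA may present afterwards.

```python
# Assumed locators for the reCAPTCHA widget; they may change without notice.
RECAPTCHA_IFRAME = "iframe[src^='https://www.google.com/recaptcha/api2/anchor']"
RECAPTCHA_CHECKBOX = "span#recaptcha-anchor"


def click_recaptcha_checkbox(driver, timeout=10):
    """Switch into the reCAPTCHA iframe, click the checkbox, then switch back."""
    # Imported lazily; requires the selenium package to be installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    wait = WebDriverWait(driver, timeout)
    # The checkbox lives inside an iframe, so the driver must enter it first.
    wait.until(
        EC.frame_to_be_available_and_switch_to_it(
            (By.CSS_SELECTOR, RECAPTCHA_IFRAME)
        )
    )
    try:
        wait.until(
            EC.element_to_be_clickable((By.CSS_SELECTOR, RECAPTCHA_CHECKBOX))
        ).click()
    finally:
        # Return to the top-level document regardless of the outcome.
        driver.switch_to.default_content()
```

The function expects an already-open WebDriver that has navigated to a page containing the widget, for example the demo URL from the question.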
You can find a couple of related discussions in:
- How to make a Selenium script undetectable using GeckoDriver and Firefox through Python?
- Is there a version of Selenium WebDriver that is not detectable?
- How does reCAPTCHA 3 know I'm using Selenium/chromedriver?