Google Translate API blocks IP address for too many requests

2022-04-14 00:00:00 python django python-requests google-translation-api batch-processing

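Question

I am setting up a Django view that requests product data from an API, parses it with BeautifulSoup, applies the googletrans module, and saves the response to my PostgreSQL database.

Everything worked fine yesterday until Google suddenly blocked my IP address for making too many requests at once.

I just switched to LTE to change my IP address, and it worked.

But now, to make sure this doesn't happen again with this IP address, I need to find a way to call the googletrans API in batches, or any other solution that keeps me from being blocked again.

Here is my view: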

from bs4 import BeautifulSoup
from django.shortcuts import render
from googletrans import Translator
import requests
import json


def api_data(request):
    if request.GET.get('mybtn'):  # to improve, == 'something':
        resp_1 = requests.get(
            "https://www.headout.com/api/public/v1/product/listing/list-by/city?language=fr&cityCode=PARIS&limit=5000&currencyCode=CAD",
            headers={
                "Headout-Auth": HEADOUT_PRODUCTION_API_KEY
            })
        resp_1_data = resp_1.json()
        base_url_2 = "https://www.headout.com/api/public/v1/product/get/"

        translator = Translator()

        for item in resp_1_data['items']:
            print('translating item {}'.format(item['id']))
            # concat ID to the URL string
            url = '{}{}?language=fr'.format(base_url_2, item['id'])

            # make the HTTP request
            resp_2 = requests.get(
                url,
                headers={
                    "Headout-Auth": HEADOUT_PRODUCTION_API_KEY
                })
            resp_2_data = resp_2.json()

            descriptiontxt = resp_2_data['contentListHtml'][0]['html'][0:2040] + ' ...'

            #Parsing work
            soup = BeautifulSoup(descriptiontxt, 'lxml')
            parsed = soup.find('p').text

            #Translation doesn't work
            translation = translator.translate(parsed, dest='fr')

            titlename = item['name']
            titlefr = translator.translate(titlename, dest='fr')

            destinationname = item['city']['name']
            destinationfr = translator.translate(destinationname, dest='fr')

            Product.objects.get_or_create(
                title=titlefr.text,
                destination=destinationfr.text,
                description=translation.text,
                link=item['canonicalUrl'],
                image=item['image']['url']
            )

    return render(request, "form.html")

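How can I call the Google Translate API in batches? Or is there another solution?

Please help.

EDIT

Based on @ddor254's suggestion, where should I put time.sleep(2)?

This is what I came up with. Is this right?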

  Product.objects.get_or_create(
      title=titlefr.text,
      destination=destinationfr.text,
      description=translation.text,
      link=item['canonicalUrl'],
      image=item['image']['url']
  )
  time.sleep(2) #here

Or something like:

resp_1 = requests.get(
            "https://www.headout.com/api/public/v1/product/listing/list-by/city?language=fr&cityCode=PARIS&limit=5000&currencyCode=CAD",
            headers={
                "Headout-Auth": HEADOUT_PRODUCTION_API_KEY
            }, time.sleep(2)) #here

Before I risk getting this new IP blocked, I just want to make sure this is the right way to do it.

Answer

I suggest you read this article from MDN: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/429

If this is the response you are getting, check the Retry-After header in the response object.
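For reference, Retry-After can carry either a number of seconds or an HTTP date. Below is a minimal sketch of turning either form into a wait time using only the standard library. The helper name parse_retry_after is made up for illustration and is not part of requests or googletrans.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_retry_after(value, now=None):
    """Return the number of seconds to wait, or None if the value is unusable.

    Hypothetical helper: `value` is the raw Retry-After header string,
    e.g. response.headers.get("Retry-After") on a requests.Response.
    """
    if value is None:
        return None
    value = value.strip()
    # Form 1: a plain number of seconds, e.g. "120".
    if value.isdigit():
        return float(value)
    # Form 2: an HTTP date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
    try:
        retry_at = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if retry_at.tzinfo is None:
        # Treat a date without zone information as UTC.
        retry_at = retry_at.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    # A date in the past means there is nothing left to wait for.
    return max(0.0, (retry_at - now).total_seconds())
```

A None result means the header was missing or unreadable, so the caller has to pick its own fallback delay.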

Adding a sleep or some other delay based on the value of that header may then solve your problem.
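As a rough sketch of that idea, the wrapper below retries a call whenever the response status is 429 and sleeps for the number of seconds given in Retry-After, falling back to a fixed delay when the header is absent or not a plain number. The name call_with_retry, the default of 2 seconds, and the limit of 5 attempts are assumptions chosen for illustration. It expects a callable that returns an object with status_code and headers, which is the shape of a requests.Response.

```python
import time


def call_with_retry(send, max_attempts=5, default_wait=2.0, sleep=time.sleep):
    """Call send() until it returns a non-429 response or attempts run out.

    Hypothetical helper: `send` is any zero-argument callable returning an
    object with `.status_code` and `.headers` (for example a requests.Response).
    """
    response = None
    for attempt in range(max_attempts):
        response = send()
        if response.status_code != 429:
            return response
        if attempt == max_attempts - 1:
            break  # no point sleeping after the final attempt
        header = response.headers.get("Retry-After", "")
        # Only the "number of seconds" form is handled here; anything else
        # falls back to the fixed default delay.
        wait = float(header) if header.strip().isdigit() else default_wait
        sleep(wait)
    return response
```

With the view from the question, the Headout request could be wrapped as call_with_retry(lambda: requests.get(url, headers={"Headout-Auth": HEADOUT_PRODUCTION_API_KEY})). googletrans makes its HTTP calls internally, so the 429 response and its headers may not be reachable from translator.translate; in that case a plain time.sleep(2) as its own statement inside the for loop is the simpler fallback.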
