Fixing AttributeError: 'Response' object has no attribute 'body_as_unicode' when Scrapy crawls certain websites

2022-05-03 00:00:00 attributeerror web scraping

AttributeError: 'Response' object has no attribute 'body_as_unicode'
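This error mostly shows up when the site's response headers have no Content-Type field. Scrapy then cannot tell what kind of page it fetched, so the callback receives a plain Response instead of a TextResponse, and Selector(response) fails when it tries to call body_as_unicode on it. The fix is simple.
A small change to the parse method is enough. The original version: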

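from scrapy import Request, Selector

# Original version (a method of the spider class); fails on a plain Response.
def parse(self, response):
    hxs = Selector(response)
    detail_url_list = hxs.xpath('//li[@class="good-list"]/@href').extract()
    for url in detail_url_list:
        # Only follow links that point to a goods detail page.
        if 'goods' in url:
            yield Request(url, callback=self.parse_detail)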

Rewrite it as follows:

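def parse(self, response):
    # Build the Selector from the body text instead of the Response object,
    # so body_as_unicode is never called. On Python 3 response.body is bytes
    # and Selector(text=...) needs str, hence the decode (UTF-8 is assumed here).
    hxs = Selector(text=response.body.decode('utf-8'))
    detail_url_list = hxs.xpath('//li[@class="good-list"]/@href').extract()
    for url in detail_url_list:
        if 'goods' in url:
            yield Request(url, callback=self.parse_detail)

The key change is this line:

hxs = Selector(text=response.body.decode('utf-8'))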
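
On Python 3, response.body is raw bytes, while Selector(text=...) expects a str, so the body may need decoding before it is passed in. The sketch below is one hedged way to do that and is not part of Scrapy: response_to_text and fallback_encoding are hypothetical names, and the stand-in object only mimics a plain Response so the snippet runs without Scrapy installed.

```python
from types import SimpleNamespace


def response_to_text(response, fallback_encoding="utf-8"):
    """Return the response body as str, whether or not it was already decoded.

    `response_to_text` and `fallback_encoding` are hypothetical names used
    for this sketch; they are not Scrapy APIs.
    """
    # Text responses expose a decoded `.text`; a plain Response does not,
    # so getattr falls back to None in that case.
    text = getattr(response, "text", None)
    if isinstance(text, str):
        return text
    # Plain Response: only raw bytes are available, so decode them here.
    # errors="replace" avoids a crash when the assumed encoding is wrong.
    return response.body.decode(fallback_encoding, errors="replace")


# Stand-in object mimicking a plain Response (bytes body, no `.text`).
plain = SimpleNamespace(body="<li class='good-list'>商品</li>".encode("utf-8"))
page_text = response_to_text(plain)
```

Inside the spider, the result would be passed on as Selector(text=response_to_text(response)). If the site is known to use another encoding such as GBK, pass it as fallback_encoding instead of relying on the UTF-8 default.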
