Python: requests, a powerful tool for web scraping
requests is not a built-in module; it is a third-party library and has to be installed before it can be used.
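Install it from PyPI with pip:
pip install requests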
Enough talk; let's get straight to the code.
A quick look at the basics:
import requests

# use a Session so cookies and settings persist; don't shadow the module name
session = requests.session()
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:57.0) Gecko/20100101 Firefox/57.0'
}
url = "http://httpbin.org/get"
response = session.get(url, headers=headers, timeout=None)
print(response.text)                      # response body decoded to a string
print(response.cookies)                   # cookies set by the server
print(response.content)                   # raw response body as bytes
print(response.content.decode("utf-8"))  # manual decode of the raw bytes
print(response.json())                    # parse the body as JSON (httpbin's /get returns JSON)
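A quick sketch of why the Session is useful: cookies set by one request are sent back automatically on later requests made through the same Session (httpbin's cookie endpoints are used here purely for illustration):
import requests

session = requests.session()
# the first request sets a cookie; the Session stores it
session.get("http://httpbin.org/cookies/set/sessioncookie/123456789")
# the second request sends the stored cookie back automatically
response = session.get("http://httpbin.org/cookies")
print(response.text)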
A basic POST request:
data = {
    "name": "zhaofan",
    "age": 23
}
response = requests.post("http://httpbin.org/post",data=data)
print(response.text)
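requests can also send a JSON body directly via the json parameter, which serializes the dict and sets the Content-Type header for you; a short sketch against the same httpbin endpoint:
import requests

payload = {"name": "zhaofan", "age": 23}
# json= sends the dict as a JSON request body
response = requests.post("http://httpbin.org/post", json=payload)
print(response.json())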
Requesting a site whose SSL certificate is invalid:
import requests
import urllib3

# suppress the InsecureRequestWarning that verify=False triggers
urllib3.disable_warnings()
response = requests.get("https://www.12306.cn", verify=False)
print(response.status_code)
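If you have the server's certificate chain, verify can instead point to a CA bundle file so verification stays enabled; a minimal sketch, where the URL and bundle path are just placeholders:
import requests

# the path below is a placeholder; point it at your actual CA bundle
response = requests.get("https://example.com", verify="/path/to/ca_bundle.pem")
print(response.status_code)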
Proxy settings:
import requests
proxies = {
    "http": "http://127.0.0.1:9999",
    "https": "http://127.0.0.1:8888"
}
response = requests.get("https://www.baidu.com",proxies=proxies)
print(response.text)
If the proxy requires a username and password, just change the dictionary to the following:
proxies = {
    "http": "http://user:password@127.0.0.1:9999"
}
If your proxy uses SOCKS, you first need to pip install "requests[socks]", then:
proxies = {
    "http": "socks5://127.0.0.1:9999",
    "https": "socks5://127.0.0.1:8888"
}
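Putting it together, a runnable sketch of the SOCKS setup, assuming a SOCKS5 proxy is actually listening on these local ports:
import requests

# requires: pip install "requests[socks]"
proxies = {
    "http": "socks5://127.0.0.1:9999",
    "https": "socks5://127.0.0.1:8888"
}
response = requests.get("https://www.baidu.com", proxies=proxies)
print(response.status_code)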
Timeout settings
The timeout parameter controls how long requests will wait for the server.
Passing None means no timeout at all; the request waits indefinitely:
timeout=None
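More commonly you give a finite timeout in seconds, or a (connect, read) tuple to limit the two phases separately; a short sketch:
import requests

# wait at most 5 seconds for the response
response = requests.get("http://httpbin.org/get", timeout=5)
# or limit the connect and read phases separately
response = requests.get("http://httpbin.org/get", timeout=(3, 7))
print(response.status_code)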
Exception handling:
import requests
from requests.exceptions import ReadTimeout, ConnectionError, RequestException

try:
    # a very small timeout makes the request likely to fail, so the handlers fire
    response = requests.get("http://httpbin.org/get", timeout=0.1)
    print(response.status_code)
except ReadTimeout:
    print("timeout")
except ConnectionError:
    print("connection error")
except RequestException:
    print("error")