Python BeautifulSoup
Use the get_text method of the BeautifulSoup library to extract the text of a web page:
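#!/usr/bin/env python
# coding=utf-8
# Extract the text content from an HTML page
import requests
from bs4 import BeautifulSoup

url = 'http://www.baidu.com'
response = requests.get(url)
# Name the parser explicitly; otherwise bs4 picks one and emits a warning
soup = BeautifulSoup(response.text, 'html.parser')
print(soup.get_text())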
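The text returned by get_text can include a lot of stray whitespace, and depending on the bs4 version it may also include the contents of script and style tags. The following is a minimal sketch of one way to tidy it up, not a definitive approach. It assumes a small inline sample page (SAMPLE_HTML, hypothetical) instead of a live request so that it runs without network access, and the helper name extract_text is made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page, used in place of a live request
# so the example is self-contained.
SAMPLE_HTML = """
<html><head><title>Demo</title><style>p { color: red; }</style></head>
<body><script>var x = 1;</script>
<p>First paragraph.</p>
<p>Second paragraph.</p>
</body></html>
"""


def extract_text(html):
    """Return the visible text of an HTML string, one fragment per line."""
    soup = BeautifulSoup(html, "html.parser")
    # Remove script and style elements so their contents never reach the output.
    for tag in soup(["script", "style"]):
        tag.decompose()
    # separator="\n" joins text fragments with newlines;
    # strip=True trims each fragment and drops whitespace-only ones.
    return soup.get_text(separator="\n", strip=True)


text = extract_text(SAMPLE_HTML)
print(text)
```

To apply this to a real page, pass response.text from requests.get(url) to extract_text instead of the sample string.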