Scraping Web Data with BeautifulSoup in Python

2023-04-17 00:00:00 python data web-pages

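The BeautifulSoup library makes it easy to extract data from web pages. It lets you locate specific data by attributes such as the tag name, the class name, or the id.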
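As a minimal sketch of locating data by tag name, class, and id, the snippet below parses a small inline HTML string instead of a live page, so it runs without network access. The markup, ids, and class names are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, used only for illustration
html = """
<div id="main">
  <h2 class="post-title">First post</h2>
  <p class="summary">Summary one</p>
  <h2 class="post-title">Second post</h2>
  <p class="summary">Summary two</p>
</div>
<div id="footer"><p>Footer text</p></div>
"""

soup = BeautifulSoup(html, 'html.parser')

# By tag name: every <h2> in the document
h2_texts = [h2.get_text() for h2 in soup.find_all('h2')]

# By class: class_ has a trailing underscore because "class" is a Python keyword
summary_texts = [p.get_text() for p in soup.find_all('p', class_='summary')]

# By id: find() returns the first matching element (or None if nothing matches)
main = soup.find(id='main')
footer_text = soup.find(id='footer').get_text(strip=True)

# Searches can be chained: only the <p> tags inside the #main element
main_paragraphs = [p.get_text() for p in main.find_all('p')]

print(h2_texts)         # ['First post', 'Second post']
print(summary_texts)    # ['Summary one', 'Summary two']
print(footer_text)      # Footer text
print(main_paragraphs)  # ['Summary one', 'Summary two']
```

find() is the single-result counterpart of find_all(); check its result for None before calling methods on it when the element may be absent.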

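Installing BeautifulSoup:

It can be installed with pip. The example below also uses the requests library to download the page, so install both, for example:

pip install beautifulsoup4 requests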

Here is a concrete code example:

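import requests
from bs4 import BeautifulSoup

url = 'https://pidancode.com'

# Send the request; the timeout keeps the script from hanging indefinitely
response = requests.get(url, timeout=10)
response.raise_for_status()  # raise an exception on 4xx/5xx status codes

# Parse the HTML content
soup = BeautifulSoup(response.text, 'html.parser')

# Get all titles (the tag and class names depend on the target site's markup)
titles = soup.find_all('h2', class_='post-title')

# Print the titles, with surrounding whitespace stripped
for title in titles:
    print(title.get_text(strip=True))

# Get all article summaries
summaries = soup.find_all('div', class_='post-content')

# Print the article summaries
for summary in summaries:
    print(summary.get_text(strip=True))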

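How the code works:

First, the requests and BeautifulSoup libraries are imported.
Next, the URL to scrape is specified, and a request is sent to fetch the response content.
The content is then parsed with BeautifulSoup, using html.parser (the parser built into Python's standard library).
The find_all method retrieves all title and article-summary elements. Here, h2 and div are the tag names, and the class_ parameter specifies the CSS class to match (it ends in an underscore because class is a reserved word in Python).
Finally, loops print the extracted data.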
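One caveat with two separate find_all calls is that titles and summaries come back as independent lists, so they only line up if every article has exactly one of each. The sketch below instead loops over each article container and looks up the title and summary inside it. The `<article class="post">` wrapper, the sample markup, and the helper name extract_posts are assumptions for illustration; pidancode.com's real structure may differ. It also shows that class_ matches an element whose class attribute lists several classes.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: each post is wrapped in its own <article> element
SAMPLE_HTML = """
<article class="post">
  <h2 class="post-title entry-title"> Hello BeautifulSoup </h2>
  <div class="post-content"><p>An intro to parsing.</p></div>
</article>
<article class="post">
  <h2 class="post-title">Post without summary</h2>
</article>
"""


def extract_posts(html):
    """Return a list of (title, summary) tuples; a missing part becomes None."""
    soup = BeautifulSoup(html, 'html.parser')
    posts = []
    for article in soup.find_all('article', class_='post'):
        # class_='post-title' also matches class="post-title entry-title",
        # because BeautifulSoup treats class as a multi-valued attribute
        title_tag = article.find('h2', class_='post-title')
        summary_tag = article.find('div', class_='post-content')
        title = title_tag.get_text(strip=True) if title_tag else None
        summary = summary_tag.get_text(strip=True) if summary_tag else None
        posts.append((title, summary))
    return posts


for title, summary in extract_posts(SAMPLE_HTML):
    print(title, '|', summary)
```

With a live page, the same function can be called as extract_posts(response.text). Keeping parsing separate from downloading also makes it easy to test against saved HTML.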

That is the example code for scraping a web page with BeautifulSoup. Adapt the URL, tag names, and class names to the site you are scraping.
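As one possible modification, the same lookups can be written as CSS selectors with soup.select() and soup.select_one(), which beautifulsoup4 supports through its soupsieve dependency. The selectors simply mirror the tag/class pairs from the example above; the sample HTML is made up.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, used only for illustration
html = """
<h2 class="post-title">Title A</h2>
<div class="post-content">Body A</div>
<h2 class="post-title">Title B</h2>
<div class="post-content">Body B</div>
<h2 class="sidebar-title">Not a post</h2>
"""

soup = BeautifulSoup(html, 'html.parser')

# 'h2.post-title' selects the same elements as find_all('h2', class_='post-title')
titles = [t.get_text(strip=True) for t in soup.select('h2.post-title')]

# select_one() returns the first match (or None), like find()
first_summary = soup.select_one('div.post-content').get_text(strip=True)

print(titles)         # ['Title A', 'Title B']
print(first_summary)  # Body A
```

Selectors are convenient when the target is nested, e.g. 'article.post > h2', whereas find_all keyword arguments can read more clearly for simple tag/class lookups.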
