Processing Web Page Table Data with BeautifulSoup in Python
Processing table data from a web page with BeautifulSoup takes the following steps:
- Import the required libraries
    from bs4 import BeautifulSoup
    import requests
- Fetch the page content
    url = "https://www.pidancode.com"
    response = requests.get(url)
    html_content = response.text
- Parse the page content
    soup = BeautifulSoup(html_content, 'html.parser')
- Locate the table
    table = soup.find('table')   # first <table> on the page, or None if there is none
    rows = table.find_all('tr')  # every row of that table
- Extract the table data
    for row in rows:
        cols = row.find_all('td')
        cols = [col.text.strip() for col in cols]
        print(cols)
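The parse, locate, and extract steps above can be wrapped in one reusable function. The following is a minimal sketch, not the article's own code: `table_to_rows` and `SAMPLE_HTML` are hypothetical names, and the inline markup stands in for a fetched page so the example runs without network access. It also covers two cases the steps leave open: a page with no table, and header cells written as `<th>` rather than `<td>`.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, used here instead of a live request.
SAMPLE_HTML = """
<table>
  <tr><th>ID</th><th>Name</th><th>Score</th></tr>
  <tr><td>001</td><td>Zhang San</td><td>80</td></tr>
  <tr><td>002</td><td>Li Si</td><td>90</td></tr>
</table>
"""


def table_to_rows(html):
    """Return the first table in `html` as a list of rows of cell strings."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:  # find() returns None when nothing matches
        return []
    rows = []
    for tr in table.find_all("tr"):
        # Passing a list matches either tag, so header cells are kept too.
        cells = tr.find_all(["th", "td"])
        rows.append([cell.get_text(strip=True) for cell in cells])
    return rows


if __name__ == "__main__":
    for row in table_to_rows(SAMPLE_HTML):
        print(row)
```

Returning an empty list for a missing table is one possible design choice; raising an exception may suit callers that treat a missing table as an error.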
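Complete example:
    from bs4 import BeautifulSoup
    import requests

    # Fetch the page
    url = "https://www.pidancode.com"
    response = requests.get(url)
    html_content = response.text

    # Parse the HTML, then locate the first table and its rows
    soup = BeautifulSoup(html_content, 'html.parser')
    table = soup.find('table')  # None if the page contains no <table>
    rows = table.find_all('tr')

    # Print the stripped text of the <td> cells in each row
    for row in rows:
        cols = row.find_all('td')
        cols = [col.text.strip() for col in cols]
        print(cols)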
Output (the first row holds the column headers: student ID, name, score):
    ['学号', '姓名', '成绩']
    ['001', '张三', '80']
    ['002', '李四', '90']
    ['003', '王五', '85']
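Rows in this shape are often easier to work with as records keyed by the header row. The sketch below assumes the first row is the header, as in the output above; the `rows` literal is copied from that output, and the names `records` and `average` are illustrative only.

```python
# Rows as printed in the output above: a header row followed by data rows.
rows = [
    ['学号', '姓名', '成绩'],
    ['001', '张三', '80'],
    ['002', '李四', '90'],
    ['003', '王五', '85'],
]

# Split off the header, then pair each data cell with its column name.
header, *data = rows
records = [dict(zip(header, row)) for row in data]

# Cell text is always a string; convert before doing arithmetic.
average = sum(int(record['成绩']) for record in records) / len(records)

print(records[0])
print(average)
```

Keeping the student ID as a string preserves its leading zeros, which `int()` would drop.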
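Note: the table data in the example above was written by hand for illustration. In real use, the code that locates and extracts the table must be adapted to the structure of the specific page.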
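As one illustration of such an adjustment, a page may contain several tables, in which case `soup.find('table')` returns whichever comes first. The sketch below assumes a hypothetical page whose data table carries `id="scores"` and uses `<thead>` and `<tbody>`; the markup, the id, and the variable names are all invented for this example.

```python
from bs4 import BeautifulSoup

# Hypothetical page with two tables; only the second holds the data we want.
SAMPLE_HTML = """
<table class="nav"><tr><td>Home</td><td>About</td></tr></table>
<table id="scores">
  <thead><tr><th>ID</th><th>Name</th><th>Score</th></tr></thead>
  <tbody>
    <tr><td>001</td><td>Zhang San</td><td>80</td></tr>
    <tr><td>002</td><td>Li Si</td><td>90</td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# soup.find('table') would return the navigation table here,
# so select the target by its (assumed) id with a CSS selector.
table = soup.select_one("table#scores")

# Read header cells and body rows separately.
headers = [th.get_text(strip=True) for th in table.select("thead th")]
body_rows = [
    [td.get_text(strip=True) for td in tr.find_all("td")]
    for tr in table.select("tbody tr")
]

print(headers)
print(body_rows)
```

On a real page, the selector would come from inspecting the page's HTML in the browser's developer tools.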