Processing and Optimizing Web Page Images in Python with BeautifulSoup and Pillow
- Import the required modules
```python
from bs4 import BeautifulSoup
import requests
from io import BytesIO
from PIL import Image
import base64  # needed later to encode the optimized images
```
- Request the page and parse the HTML
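```python
url = "https://pidancode.com/"
response = requests.get(url, timeout=10)
response.raise_for_status()  # stop early on HTTP errors such as 404 or 500
soup = BeautifulSoup(response.content, "html.parser")
```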
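After the page is parsed, keep in mind that `src` values are often relative paths such as `/static/logo.png`, and `requests` cannot fetch those directly. The sketch below resolves them against the page URL. It runs offline because it parses a hard-coded HTML string instead of the live page. The helper name `image_urls` and the sample paths are hypothetical.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def image_urls(html: str, base_url: str) -> list[str]:
    """Return absolute URLs for every <img> with a fetchable src (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    for tag in soup.find_all("img"):
        src = tag.get("src")
        # Skip tags with no src and images that are already inlined as data URIs
        if not src or src.startswith("data:"):
            continue
        # Resolve relative paths like "/static/logo.png" or "img/photo.jpg"
        urls.append(urljoin(base_url, src))
    return urls


html = """
<p><img src="/static/logo.png"><img src="img/photo.jpg">
<img alt="no source"><img src="https://cdn.example.com/x.gif">
<img src="data:image/png;base64,AAAA"></p>
"""
print(image_urls(html, "https://pidancode.com/"))
```

The function uses `tag.get("src")` instead of `tag["src"]`, so an `<img>` without a `src` attribute is skipped and does not raise `KeyError`.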
- Find the images on the page and optimize them
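```python
from urllib.parse import urljoin  # resolves relative image paths against the page URL

images = soup.find_all("img")
for image in images:
    # Get the image address; skip missing src values and images that are already inline
    image_src = image.get("src")
    if not image_src or image_src.startswith("data:"):
        continue
    image_url = urljoin(url, image_src)
    # Download the image and load it as a Pillow Image object
    try:
        img_response = requests.get(image_url, timeout=10)
        img_response.raise_for_status()
        img = Image.open(BytesIO(img_response.content))
    except (requests.RequestException, OSError):
        continue  # skip failed downloads and formats Pillow cannot read (e.g. SVG)
    # Optimize: convert to RGB (JPEG has no alpha channel) and halve both dimensions
    img = img.convert("RGB")
    img = img.resize((max(1, img.width // 2), max(1, img.height // 2)))
    # Re-encode as JPEG and embed it as a base64 data URI
    img_data = BytesIO()
    img.save(img_data, format="JPEG")
    img_data = img_data.getvalue()
    image["src"] = "data:image/jpeg;base64," + base64.b64encode(img_data).decode()
```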
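You can also pull the optimization step out into a pure function that takes raw bytes and returns a data URI. This makes it easy to test without any network access. The sketch below assumes a hypothetical helper `to_jpeg_data_uri`. It also passes `quality` and `optimize`, which are standard Pillow JPEG save options but do not appear in the original code. The 200x100 PNG is generated in memory so that no image has to be downloaded.

```python
import base64
from io import BytesIO

from PIL import Image


def to_jpeg_data_uri(raw: bytes, scale: float = 0.5, quality: int = 85) -> str:
    """Downscale an image and return it as a base64 JPEG data URI (hypothetical helper)."""
    img = Image.open(BytesIO(raw))
    # JPEG has no alpha channel: convert RGBA/palette images to RGB (alpha is discarded)
    img = img.convert("RGB")
    # Scale both sides, never letting a dimension drop to 0
    new_size = (max(1, int(img.width * scale)), max(1, int(img.height * scale)))
    img = img.resize(new_size)
    buf = BytesIO()
    img.save(buf, format="JPEG", quality=quality, optimize=True)
    return "data:image/jpeg;base64," + base64.b64encode(buf.getvalue()).decode("ascii")


# Usage: a semi-transparent 200x100 PNG built in memory stands in for a downloaded image
png = BytesIO()
Image.new("RGBA", (200, 100), (255, 0, 0, 128)).save(png, format="PNG")
uri = to_jpeg_data_uri(png.getvalue())
print(uri[:40])
```

Every JPEG file begins with the bytes `FF D8 FF`, so the base64 payload always starts with `/9j/`. The same prefix appears in the output below.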
- Print the modified HTML
```python
print(soup.prettify())
```
Complete code example:
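```python
from bs4 import BeautifulSoup
import requests
from io import BytesIO
from urllib.parse import urljoin
from PIL import Image
import base64

url = "https://pidancode.com/"
response = requests.get(url, timeout=10)
response.raise_for_status()
soup = BeautifulSoup(response.content, "html.parser")

images = soup.find_all("img")
for image in images:
    # Get the image address; skip missing src values and images that are already inline
    image_src = image.get("src")
    if not image_src or image_src.startswith("data:"):
        continue
    image_url = urljoin(url, image_src)  # resolve relative paths against the page URL

    # Download the image and load it as a Pillow Image object
    try:
        img_response = requests.get(image_url, timeout=10)
        img_response.raise_for_status()
        img = Image.open(BytesIO(img_response.content))
    except (requests.RequestException, OSError):
        continue  # skip failed downloads and formats Pillow cannot read (e.g. SVG)

    # Optimize: convert to RGB (JPEG has no alpha channel) and halve both dimensions
    img = img.convert("RGB")
    img = img.resize((max(1, img.width // 2), max(1, img.height // 2)))

    # Re-encode as JPEG and embed it as a base64 data URI
    img_data = BytesIO()
    img.save(img_data, format="JPEG")
    img_data = img_data.getvalue()
    image["src"] = "data:image/jpeg;base64," + base64.b64encode(img_data).decode()

print(soup.prettify())
```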
Output:
```html
<!DOCTYPE html>
<html lang="en">
 <head>
  <meta charset="utf-8"/>
  <meta content="width=device-width, initial-scale=1" name="viewport"/>
  <meta content="A simple blog where I share my programming notes and thoughts." name="description"/>
  <title>
   皮蛋编程 - 记录编程笔记与思考的个人博客
  </title>
  ...
  <img alt="皮蛋编程" class="custom-logo" height="50" sizes="(max-width: 50px) 100vw, 50px" src="data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAA..." srcset="data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAA..."/>
  ...
 </body>
</html>
```
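As the output shows, the `src` attribute of each image now holds a base64-encoded JPEG data URI in place of a link to the original file. The image data is smaller because each image was halved in both dimensions and re-encoded as JPEG. Base64 encoding itself adds about one third to the byte count, so the net saving depends on the source images.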
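To check what was embedded, you can split a data URI at its first comma and decode the payload again. The sketch below uses only the standard library. `decode_data_uri` is a hypothetical helper, and the payload is a made-up byte string rather than a real image. The example also shows the base64 overhead: every 3 input bytes become 4 output characters.

```python
import base64


def decode_data_uri(uri: str) -> tuple[str, bytes]:
    """Split a base64 data URI into its MIME type and raw bytes (hypothetical helper)."""
    header, _, payload = uri.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("not a base64 data URI")
    mime = header[len("data:"):-len(";base64")]
    return mime, base64.b64decode(payload)


raw = bytes(range(256)) * 3  # 768 bytes of sample data
uri = "data:image/jpeg;base64," + base64.b64encode(raw).decode("ascii")
mime, data = decode_data_uri(uri)
print(mime, len(raw), len(uri.split(",", 1)[1]))  # image/jpeg 768 1024
```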