Processing and Optimizing Web Page Images in Python with BeautifulSoup and Pillow

2023-04-17 00:00:00 optimization, web pages, images
  1. Import the required modules
from bs4 import BeautifulSoup
import requests
from io import BytesIO
from PIL import Image
import base64
  2. Request the page and parse the HTML
url = "https://pidancode.com/"
response = requests.get(url)
soup = BeautifulSoup(response.content, "html.parser")
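  3. Collect the images on the page and optimize them
images = soup.find_all("img")
for image in images:
    # Get the image URL; skip tags with no src, a relative path, or a data URI
    image_src = image.get("src", "")
    if not image_src.startswith(("http://", "https://")):
        continue
    # Download the image and open it as a Pillow image object
    response = requests.get(image_src)
    img = Image.open(BytesIO(response.content))
    # Optimize: convert to RGB (JPEG has no alpha channel) and halve each dimension
    img = img.convert("RGB")
    img = img.resize((max(1, img.width // 2), max(1, img.height // 2)))
    # Re-encode the optimized image as JPEG and embed it as a base64 data URI
    img_data = BytesIO()
    img.save(img_data, format="JPEG")
    img_data = img_data.getvalue()
    image["src"] = "data:image/jpeg;base64," + base64.b64encode(img_data).decode()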
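Many pages write the src attribute as a relative or protocol-relative path, which requests.get cannot fetch as-is. A minimal sketch of resolving such paths against the page URL with the standard library's urljoin follows. The helper name resolve_image_url and the example paths are hypothetical, chosen only for illustration:

```python
from urllib.parse import urljoin


def resolve_image_url(page_url, src):
    """Return an absolute http(s) URL for src, or None if it cannot be fetched."""
    if not src or src.startswith("data:"):
        return None  # missing attribute or an already-inlined image
    absolute = urljoin(page_url, src)
    if not absolute.startswith(("http://", "https://")):
        return None  # some other scheme that requests cannot download
    return absolute


# Hypothetical page path and image paths, for illustration only
page_url = "https://pidancode.com/blog/post.html"
print(resolve_image_url(page_url, "/static/logo.png"))
print(resolve_image_url(page_url, "images/a.jpg"))
print(resolve_image_url(page_url, "//cdn.example.com/b.png"))
```

Inside the loop, the result could be passed to requests.get, with None treated as a signal to skip the tag.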
  4. Print the modified HTML
print(soup.prettify())

Complete code example:

from bs4 import BeautifulSoup
import requests
from io import BytesIO
from PIL import Image
import base64

url = "https://pidancode.com/"
response = requests.get(url)
soup = BeautifulSoup(response.content, "html.parser")

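images = soup.find_all("img")
for image in images:
    # Get the image URL; skip tags with no src, a relative path, or a data URI
    image_src = image.get("src", "")
    if not image_src.startswith(("http://", "https://")):
        continue
    # Download the image and open it as a Pillow image object
    response = requests.get(image_src)
    img = Image.open(BytesIO(response.content))
    # Optimize: convert to RGB (JPEG has no alpha channel) and halve each dimension
    img = img.convert("RGB")
    img = img.resize((max(1, img.width // 2), max(1, img.height // 2)))
    # Re-encode the optimized image as JPEG and embed it as a base64 data URI
    img_data = BytesIO()
    img.save(img_data, format="JPEG")
    img_data = img_data.getvalue()
    image["src"] = "data:image/jpeg;base64," + base64.b64encode(img_data).decode()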

print(soup.prettify())
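Halving every image is a blunt rule: small icons shrink needlessly while very large photos may stay large. As one possible variant (not the approach used in the script above), the sketch below caps the longest side with Pillow's thumbnail and sets an explicit JPEG quality. The function name image_to_data_uri and the defaults max_side=800 and quality=75 are assumptions chosen for illustration:

```python
import base64
from io import BytesIO

from PIL import Image


def image_to_data_uri(img, max_side=800, quality=75):
    """Shrink img to fit within max_side pixels and return a JPEG data URI."""
    img = img.convert("RGB")             # returns a copy; JPEG cannot store alpha
    img.thumbnail((max_side, max_side))  # in place, keeps aspect ratio, never enlarges
    buffer = BytesIO()
    img.save(buffer, format="JPEG", quality=quality, optimize=True)
    encoded = base64.b64encode(buffer.getvalue()).decode("ascii")
    return "data:image/jpeg;base64," + encoded


# Usage with a synthetic image, so no network access is needed
sample = Image.new("RGBA", (1600, 900), (200, 30, 30, 255))
uri = image_to_data_uri(sample)
print(uri[:40])
```

Because convert returns a new image, the caller's original image is left untouched.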

Output:

<!DOCTYPE html>
<html lang="en">
 <head>
  <meta charset="utf-8"/>
  <meta content="width=device-width, initial-scale=1" name="viewport"/>
  <meta content="A simple blog where I share my programming notes and thoughts." name="description"/>
  <title>
   皮蛋编程 - 记录编程笔记与思考的个人博客
  </title>
  ...
  <img alt="皮蛋编程" class="custom-logo" height="50" sizes="(max-width: 50px) 100vw, 50px" src="data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAA..." srcset="data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAA..."/>
  ...
 </body>
</html>

As the output shows, the images in the original page have been optimized and converted to base64-encoded strings.
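One trade-off to keep in mind: base64 writes every 3 bytes of input as 4 ASCII characters, so an inlined image takes roughly a third more space in the HTML than the raw file. The sketch below estimates the length of a data URI from the size of the image bytes and compares it with a real encoding. The helper data_uri_length and the stand-in bytes are hypothetical, used only to illustrate the arithmetic:

```python
import base64
import math


def data_uri_length(payload_size, mime="image/jpeg"):
    """Length in characters of a base64 data URI for payload_size raw bytes."""
    prefix = f"data:{mime};base64,"
    # Padded base64 emits 4 characters per started group of 3 bytes
    return len(prefix) + 4 * math.ceil(payload_size / 3)


raw = bytes(range(256)) * 40  # stand-in for the bytes of an optimized JPEG
uri = "data:image/jpeg;base64," + base64.b64encode(raw).decode("ascii")
print(len(raw), len(uri), data_uri_length(len(raw)))
```

If the estimate is large relative to the page, it may be better to keep the image as a separate file rather than inline it.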
