Python BeautifulSoup find_all() Method Explained

2023-04-17 00:00:00 python beautifulsoup find

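find_all() is a method of the BeautifulSoup object (and of every Tag inside it). It searches for all elements that match the given criteria and returns them as a list.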

find_all() supports several kinds of search criteria, including tag name, CSS class, and HTML attributes. Each is described below.

Searching by tag name

Pass a tag name as the search criterion:

soup.find_all('tag_name')

Here tag_name is the name of the tag, such as 'div' or 'p'.

For example, to find all p tags in the HTML:

from bs4 import BeautifulSoup

html = '<html><body><p>pidancode.com</p><p>皮蛋编程</p></body></html>'
soup = BeautifulSoup(html, 'html.parser')

p_tags = soup.find_all('p')
print(p_tags)

Output:

[<p>pidancode.com</p>, <p>皮蛋编程</p>]
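Besides a single string, the tag-name argument also accepts a list of names, and the limit parameter caps how many matches are returned. A small sketch of both; the h1 element is made up for illustration:

```python
from bs4 import BeautifulSoup

# Hypothetical HTML: the <h1> is added only to show list matching
html = '<html><body><h1>Title</h1><p>pidancode.com</p><p>皮蛋编程</p></body></html>'
soup = BeautifulSoup(html, 'html.parser')

# A list matches any of the given tag names, returned in document order
headings_and_paragraphs = soup.find_all(['h1', 'p'])
print([tag.name for tag in headings_and_paragraphs])  # ['h1', 'p', 'p']

# limit stops the search after the first N matches
first_p = soup.find_all('p', limit=1)
print(first_p)  # [<p>pidancode.com</p>]
```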

Searching by CSS class

Pass a CSS class name as the search criterion:

soup.find_all(class_='class_name')

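Here the class_ keyword filters on the CSS class. The trailing underscore is required because class is a reserved keyword in Python. class_name is the class value, such as 'header' or 'content'.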

For example, to find all elements whose class is 'name':

from bs4 import BeautifulSoup

html = '<html><body><p class="name">pidancode.com</p><p class="name">皮蛋编程</p></body></html>'
soup = BeautifulSoup(html, 'html.parser')

name_tags = soup.find_all(class_='name')
print(name_tags)

Output:

[<p class="name">pidancode.com</p>, <p class="name">皮蛋编程</p>]
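Because class can hold several space-separated values, class_ matches a tag when any one of its classes equals the given value; passing the full string matches that exact attribute value. A sketch, where the extra "bold" class is hypothetical:

```python
from bs4 import BeautifulSoup

# The second <p> has two classes (the "bold" class is made up); <span> has none
html = ('<html><body><p class="name">pidancode.com</p>'
        '<p class="name bold">皮蛋编程</p><span>logo</span></body></html>')
soup = BeautifulSoup(html, 'html.parser')

# class_ matches a tag if ANY of its classes equals the given value
matches = soup.find_all(class_='name')
print(len(matches))  # 2

# The exact full attribute string also works (class order matters here)
exact = soup.find_all(class_='name bold')
print([tag.get_text() for tag in exact])  # ['皮蛋编程']
```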

Searching by attribute

Pass a dictionary of HTML attributes as the search criterion:

soup.find_all(attrs={'attr_name': 'attr_value'})

Here attrs is a dictionary of attributes, attr_name is the attribute name, and attr_value is the required value.

For example, to find all elements whose href attribute is 'https://www.pidancode.com' (in this HTML, a single a tag):

from bs4 import BeautifulSoup

html = '<html><body><a href="https://www.pidancode.com">pidancode.com</a><a href="https://www.baidu.com">baidu.com</a></body></html>'
soup = BeautifulSoup(html, 'html.parser')

a_tags = soup.find_all(attrs={'href': 'https://www.pidancode.com'})
print(a_tags)

Output:

[<a href="https://www.pidancode.com">pidancode.com</a>]
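For ordinary attribute names, a keyword argument such as href=... works as well as attrs. The attrs dictionary is needed when the name is not a valid Python identifier, as with data-* attributes, and a value of True matches any tag that has the attribute at all. A sketch; the data-id attribute is hypothetical:

```python
from bs4 import BeautifulSoup

# data-id="1" is a made-up attribute for illustration
html = ('<html><body><a href="https://www.pidancode.com" data-id="1">pidancode.com</a>'
        '<a href="https://www.baidu.com">baidu.com</a></body></html>')
soup = BeautifulSoup(html, 'html.parser')

# Ordinary attribute names can be passed as keyword arguments
by_keyword = soup.find_all(href='https://www.pidancode.com')

# "data-id" is not a valid Python identifier, so it must go through attrs
by_data_id = soup.find_all(attrs={'data-id': '1'})

# True matches every <a> that has an href, whatever its value
with_href = soup.find_all('a', href=True)
print(len(by_keyword), len(by_data_id), len(with_href))  # 1 1 2
```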

Beyond these three approaches, find_all() can combine several criteria in one call. A tag must satisfy all of them to match:

soup.find_all('tag_name', class_='class_name', attrs={'attr_name': 'attr_value'})
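A runnable version of the combined call above, using made-up HTML in which only one tag passes all three filters (tag name, class, and href value):

```python
from bs4 import BeautifulSoup

# Hypothetical HTML: three tags share class "link", but only one is an <a>
# pointing at pidancode.com
html = ('<html><body>'
        '<a class="link" href="https://www.pidancode.com">pidancode.com</a>'
        '<a class="link" href="https://www.baidu.com">baidu.com</a>'
        '<p class="link">not a link tag</p>'
        '</body></html>')
soup = BeautifulSoup(html, 'html.parser')

# All conditions must hold: tag name AND class AND attribute value
result = soup.find_all('a', class_='link', attrs={'href': 'https://www.pidancode.com'})
print(result)
```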

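Note that find_all() does not accept operators such as '>', '+', '=', '!=', '^=', '$=' or '*='. These are CSS selector syntax and are handled by BeautifulSoup's separate select() method. Within find_all(), finer-grained filtering is done by passing a function as the filter: find_all() calls it on each tag and keeps the tags for which it returns True.

For example, to find every p tag with class 'name' whose next sibling element is a span tag:

from bs4 import BeautifulSoup

html = '<html><body><p class="name">pidancode.com</p><span>logo</span><p class="name"><span>皮蛋编程</span></p></body></html>'
soup = BeautifulSoup(html, 'html.parser')

def is_name_p_before_span(tag):
    # Keep <p class="name"> tags whose next sibling element is a <span>
    sibling = tag.find_next_sibling()
    return (tag.name == 'p' and 'name' in tag.get('class', [])
            and sibling is not None and sibling.name == 'span')

name_plus_span_tags = soup.find_all(is_name_p_before_span)
print(name_plus_span_tags)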

Output:

[<p class="name">pidancode.com</p>]
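Operators such as '>', '+', '^=', '$=' and '*=' belong to CSS selector syntax, which BeautifulSoup exposes through select() (backed by the soupsieve package, a dependency of recent bs4 releases). A sketch of the p-followed-by-span search written as a selector, plus the attribute operators on a second, made-up set of links:

```python
from bs4 import BeautifulSoup

html = '<html><body><p class="name">pidancode.com</p><span>logo</span><p class="name"><span>皮蛋编程</span></p></body></html>'
soup = BeautifulSoup(html, 'html.parser')

# p.name:has(+ span): a p.name whose immediately following sibling is a <span>
followed_by_span = soup.select('p.name:has(+ span)')

# '+' alone selects the <span> that directly follows a p.name
adjacent_span = soup.select('p.name + span')

# '>' is the child combinator: <span> elements that are direct children of p.name
child_span = soup.select('p.name > span')

# Attribute operators, on hypothetical links
links = BeautifulSoup('<a href="https://www.pidancode.com">a</a>'
                      '<a href="http://example.com/x.pdf">b</a>', 'html.parser')
starts_https = links.select('a[href^="https://"]')  # value starts with
ends_pdf = links.select('a[href$=".pdf"]')          # value ends with
has_pidan = links.select('a[href*="pidancode"]')    # value contains

print(followed_by_span, adjacent_span, child_span)
```

Note that select() returns the element matched by the last part of the selector, which is why the :has() form is needed to get the p tags themselves rather than the span tags.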

As all the examples above show, find_all() always returns a list, and each item is a matching element. If nothing matches, it returns an empty list [].
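Since an empty result is an empty list rather than None, a plain truth test is enough to detect "no match", and each item in a non-empty result can be used directly as a Tag. A minimal sketch:

```python
from bs4 import BeautifulSoup

html = '<html><body><p>pidancode.com</p></body></html>'
soup = BeautifulSoup(html, 'html.parser')

# No <div> exists, so the result is an empty list rather than None
divs = soup.find_all('div')
print(divs)  # []

# An empty list is falsy
if not divs:
    print('no div found')

# Each item is a Tag, so its text can be read directly
texts = [p.get_text() for p in soup.find_all('p')]
print(texts)  # ['pidancode.com']
```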
