Matching URL Formats with Regular Expressions in Python

2023-04-03 00:00:00 URL format matching

The following example shows how to match URL formats with regular expressions in Python:

import re

# Match URLs that start with http:// or https://
url_pattern = re.compile(r'https?://[\w.-]+(?:\.[\w]+)+')
# Match URLs that start with www.
url_pattern_www = re.compile(r'www\.[\w.-]+(?:\.[\w]+)+')

# Test string
test_str = "Welcome to pidancode.com, the website for 皮蛋编程! Check out our blog at https://pidancode.com/blog."

# Find every substring that matches either pattern and merge the two result lists
urls = url_pattern.findall(test_str) + url_pattern_www.findall(test_str)
print(urls)

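Output:

['https://pidancode.com']

Only one URL is found. The / character is not in the character class [\w.-], so the match stops at the host name and /blog is left out. The bare pidancode.com earlier in the string has neither an http:// or https:// scheme nor a www. prefix, so neither pattern matches it.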
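
To keep a path such as /blog in the match, one option is to add an optional path part after the host name. The sketch below is a minimal illustration, not a full URL parser: the name url_with_path and the choice to trim trailing sentence punctuation are assumptions made for this example.

```python
import re

# Hypothetical extension of the host-only pattern: after the host name,
# optionally consume "/" followed by any run of non-whitespace characters.
url_with_path = re.compile(r'https?://[\w.-]+(?:\.\w+)+(?:/\S*)?')

test_str = ("Welcome to pidancode.com, the website for 皮蛋编程! "
            "Check out our blog at https://pidancode.com/blog.")

# The greedy path part also swallows the sentence-ending period,
# so trim common trailing punctuation from each match (an assumption
# about what should count as part of the URL).
urls = [match.rstrip('.,;:!?') for match in url_with_path.findall(test_str)]
print(urls)  # ['https://pidancode.com/blog']
```

Trimming punctuation after the match keeps the pattern itself short. The trade-off is that a URL that really ends in one of those characters would be shortened as well.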

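The example code uses two regular expressions, url_pattern and url_pattern_www: the first matches URLs that begin with http:// or https://, the second matches URLs that begin with www. In both, \w matches any letter, digit, or underscore, and + means one or more repetitions. Inside the character class [\w.-], the . and - are literal characters rather than wildcards. Outside a character class an unescaped . would match any character, which is why the patterns write \. to match a real dot. (?:...) is a non-capturing group: it groups part of the pattern without creating a capture. Because neither pattern contains a capturing group, findall() returns the full text of each match, and the two result lists are concatenated into one list and printed.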
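
The role of the non-capturing group is easiest to see by comparing it with a capturing one. In the sketch below, the sample text, the example.org address, and the variable names are assumptions for illustration. The only difference between the two patterns is (?:...) versus (...).

```python
import re

# Sample text made up for this comparison.
text = "Docs at https://pidancode.com and https://www.example.org today."

# Same pattern twice; only the group type differs.
non_capturing = re.compile(r'https?://[\w.-]+(?:\.\w+)+')
capturing = re.compile(r'https?://[\w.-]+(\.\w+)+')

# With no capturing group, findall() returns each whole match.
full_matches = non_capturing.findall(text)
# With one capturing group, findall() returns only that group's text
# (for a repeated group, the last repetition).
last_groups = capturing.findall(text)

print(full_matches)  # ['https://pidancode.com', 'https://www.example.org']
print(last_groups)   # ['.com', '.org']
```

If a capturing group is needed for other reasons, re.finditer() with match.group(0) is one way to still get the whole match.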
