Python 2.6+ str.format() and regular expressions

2022-01-15 00:00:00 python regex format string-formatting

Question

str.format() is the new standard for formatting strings in Python 2.6 and Python 3. I've run into an issue when using str.format() with regular expressions.

I've written a regular expression that matches every domain exactly one level below a specified domain, plus any domain two levels below it when the extra leading label is www.

Assuming the specified domain is delivery.com, my regex should match a.delivery.com, b.delivery.com, www.c.delivery.com ... but it should not match x.a.delivery.com.

import re

str1 = "www.pizza.delivery.com"
str2 = "w.pizza.delivery.com"
str3 = "pizza.delivery.com"

if re.match(r'^(w{3}\.)?([0-9A-Za-z-]+\.){1}delivery.com$', str1): print('String 1 matches!')
if re.match(r'^(w{3}\.)?([0-9A-Za-z-]+\.){1}delivery.com$', str2): print('String 2 matches!')
if re.match(r'^(w{3}\.)?([0-9A-Za-z-]+\.){1}delivery.com$', str3): print('String 3 matches!')

Running this should give the result:

String 1 matches!
String 3 matches!

Now, the problem appears when I try to substitute delivery.com dynamically using str.format...

if re.match(r'^(w{3}\.)?([0-9A-Za-z-]+\.){1}{domainName}$'.format(domainName='delivery.com'), str1): print('String 1 matches!')

This seems to fail because str.format() expects {3} and {1} to be replacement fields, that is, references to arguments passed to the function. (I'm assuming.)
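The assumption can be checked directly. The snippet below is a minimal sketch, not part of the original question: it calls str.format() on the unescaped pattern and records the exception. The names template and error are illustrative only.

```python
# Pattern with regex quantifiers {3} and {1} left unescaped
template = r'^(w{3}\.)?([0-9A-Za-z-]+\.){1}{domainName}$'

try:
    template.format(domainName='delivery.com')
    error = None
except IndexError as exc:
    # {3} is read as "positional argument number 3", which was never supplied
    error = exc

print(type(error).__name__)
```

Only a keyword argument is passed, so the first numbered field that str.format() meets has no positional argument to bind to.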

I could concatenate the string using the + operator:

r'^(w{3}\.)?([0-9A-Za-z-]+\.){1}' + domainName + '$'

The question comes down to this: is it possible to use str.format() when the string (usually a regex) has "{n}" within it?


Solution

You first need to format the string and then use the result as the regex. It really isn't worth putting everything on a single line. Literal curly braces are escaped by doubling them:

>>> pat = r'^(w{{3}}\.)?([0-9A-Za-z-]+\.){{1}}{domainName}$'.format(domainName='delivery.com')
>>> pat
'^(w{3}\\.)?([0-9A-Za-z-]+\\.){1}delivery.com$'
>>> re.match(pat, str1)
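The same idea can be wrapped in a small helper. This is only a sketch: build_pattern and hosts are hypothetical names, and the re.escape() call is an addition that the answer above does not use. It is included so that the dots in the domain name are matched literally.

```python
import re

def build_pattern(domain_name):
    # Doubled braces come out of str.format() as literal { and },
    # so the regex quantifiers {3} and {1} are preserved.
    template = r'^(w{{3}}\.)?([0-9A-Za-z-]+\.){{1}}{domainName}$'
    # Assumption: escape the domain so '.' only matches a literal dot.
    return template.format(domainName=re.escape(domain_name))

pattern = build_pattern('delivery.com')
hosts = ['www.pizza.delivery.com', 'w.pizza.delivery.com', 'pizza.delivery.com']
matches = [h for h in hosts if re.match(pattern, h)]
print(matches)  # ['www.pizza.delivery.com', 'pizza.delivery.com']
```

Formatting first and matching second keeps the pattern easy to print and inspect when a match unexpectedly fails.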

Also, re.match only matches at the beginning of the string, so you don't have to include ^ when you use re.match. You do need ^ if you use re.search, however.
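A short illustration of that difference, using a deliberately unanchored pattern. The pattern and variable names here are made up for the example.

```python
import re

pat = r'pizza\.delivery\.com$'  # no leading ^
host = 'www.pizza.delivery.com'

# re.match only tries at position 0, so the leading 'www.' makes it fail
match_result = re.match(pat, host)

# re.search scans the string and finds the pattern starting at index 4
search_result = re.search(pat, host)

# adding ^ makes re.search behave like re.match for this single-line input
anchored_search = re.search('^' + pat, host)

print(match_result, search_result.start(), anchored_search)  # None 4 None
```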

Please note that the {1} in the regex is redundant: a group matches exactly once by default.
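As a quick check of that claim, the sketch below compares the pattern with and without {1} on a few sample host names. The sample list and variable names are illustrative, and the dots in the domain are escaped here for strictness.

```python
import re

with_quantifier = r'^(w{3}\.)?([0-9A-Za-z-]+\.){1}delivery\.com$'
without_quantifier = r'^(w{3}\.)?([0-9A-Za-z-]+\.)delivery\.com$'

hosts = [
    'www.pizza.delivery.com',
    'w.pizza.delivery.com',
    'pizza.delivery.com',
    'x.a.delivery.com',
]

# Evaluate both patterns against every sample host
results_with = [bool(re.match(with_quantifier, h)) for h in hosts]
results_without = [bool(re.match(without_quantifier, h)) for h in hosts]

print(results_with == results_without)  # True
```

Dropping {1} also removes one pair of braces that would otherwise need doubling for str.format().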
