RegExp: Remove the last period from a string that can contain other periods (dig output)
Problem description
I am trying to parse the output of the Linux dig command and do several things in one shot with regular expressions.
Let's say I dig the host mail.yahoo.com:
/usr/bin/dig +nocomments +noquestion +noauthority +noadditional +nostats +nocmd mail.yahoo.com A
This command outputs:
mail.yahoo.com. 0 IN CNAME login.yahoo.com.
login.yahoo.com. 0 IN CNAME ats.login.lgg1.b.yahoo.com.
ats.login.lgg1.b.yahoo.com. 0 IN CNAME ats.member.g02.yahoodns.net.
ats.member.g02.yahoodns.net. 0 IN CNAME any-ats.member.a02.yahoodns.net.
any-ats.member.a02.yahoodns.net. 12 IN A 98.139.21.169
What I'd like to do is find all the <host>, <record_type> and <resolved_name> parts, without the final period, using only one regular expression.
For this particular example with mail.yahoo.com, it'd be:
[
('mail.yahoo.com', 'CNAME', 'login.yahoo.com'),
('login.yahoo.com', 'CNAME', 'ats.login.lgg1.b.yahoo.com'),
('ats.login.lgg1.b.yahoo.com', 'CNAME', 'ats.member.g02.yahoodns.net'),
('ats.member.g02.yahoodns.net', 'CNAME', 'any-ats.member.a02.yahoodns.net'),
('any-ats.member.a02.yahoodns.net', 'A', '98.139.21.169'),
]
But it turns out that the dig command shows a period at the end of the name:
mail.yahoo.com.
    ^     ^   ^
    |     |   |
Good dot  |   |
          |   |
    Good dot  |
              |
     (!) Baaaad dot
Writing a regular expression that splits dig's output and returns the name with the final period is fairly straightforward:
regex = re.compile(r"^(\S+).+IN\s+([A-Z]+)\s+(\S+).*\s*$", re.MULTILINE)
But calling .findall with that regex does return the final period in the host, because \S+ matches the last period as well:
[
('mail.yahoo.com.', 'CNAME', 'login.yahoo.com.'),
('login.yahoo.com.', 'CNAME', 'ats.login.lgg1.b.yahoo.com.'),
('ats.login.lgg1.b.yahoo.com.', 'CNAME', 'ats.member.g02.yahoodns.net.'),
('ats.member.g02.yahoodns.net.', 'CNAME', 'any-ats.member.a02.yahoodns.net.'),
('any-ats.member.a02.yahoodns.net.', 'A', '98.139.21.169'),
]
So I'd need something that matches all non-space characters (\S) except a period that is followed by whitespace.
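The condition described here (any non-space character, unless it is a period followed by whitespace) can be written as a negative lookahead placed in front of \S. The following is only a minimal sketch, assuming Python's re module and the sample output from this question; the names NAME, RECORD and dig_output are made up for illustration, and end of line is treated like whitespace so that the last column is covered too:

```python
import re

# One "name" character: any non-space character, unless it is a period
# that is followed by whitespace or by the end of the line.
NAME = r"(?:(?!\.(?:\s|$))\S)+"

# host, TTL, "IN", record type, resolved name; the optional trailing
# period is matched outside the capturing groups.
RECORD = re.compile(
    rf"^({NAME})\.?[ \t]+\d+[ \t]+IN[ \t]+([A-Z]+)[ \t]+({NAME})\.?[ \t]*$",
    re.MULTILINE,
)

dig_output = """\
mail.yahoo.com. 0 IN CNAME login.yahoo.com.
login.yahoo.com. 0 IN CNAME ats.login.lgg1.b.yahoo.com.
ats.login.lgg1.b.yahoo.com. 0 IN CNAME ats.member.g02.yahoodns.net.
ats.member.g02.yahoodns.net. 0 IN CNAME any-ats.member.a02.yahoodns.net.
any-ats.member.a02.yahoodns.net. 12 IN A 98.139.21.169
"""

lookahead_records = RECORD.findall(dig_output)
for record in lookahead_records:
    print(record)
```

The periods inside a name or an IP address are kept because each of them is followed by a non-space character, so the lookahead does not reject them.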
I've made endless attempts, and I haven't been able to come up with a decent solution.
Thank you in advance!
PS: I know I can always use the "easy" regular expression and (on a second pass) remove the last dot of each string found, but I'm curious whether this can be done with a regular expression in one shot.
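For comparison, the two-pass approach mentioned in the PS could look roughly like this. It is a sketch only: strip_final_dot and sample are hypothetical names, and the sample text is the output quoted earlier in the question.

```python
import re

# The "easy" pattern from the question: it keeps the trailing period.
easy = re.compile(r"^(\S+).+IN\s+([A-Z]+)\s+(\S+).*\s*$", re.MULTILINE)

def strip_final_dot(field):
    """Drop exactly one trailing period, if there is one."""
    return field[:-1] if field.endswith(".") else field

sample = """\
mail.yahoo.com. 0 IN CNAME login.yahoo.com.
login.yahoo.com. 0 IN CNAME ats.login.lgg1.b.yahoo.com.
ats.login.lgg1.b.yahoo.com. 0 IN CNAME ats.member.g02.yahoodns.net.
ats.member.g02.yahoodns.net. 0 IN CNAME any-ats.member.a02.yahoodns.net.
any-ats.member.a02.yahoodns.net. 12 IN A 98.139.21.169
"""

# First pass: match with the period included.
# Second pass: remove the final period from every captured field.
two_pass_records = [
    tuple(strip_final_dot(field) for field in match)
    for match in easy.findall(sample)
]
for record in two_pass_records:
    print(record)
```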
Solution
You can use this pattern with the multiline modifier:
^([^ ]+)(?<!\.)\.?[ ]+[0-9]+[ ]+IN[ ]+([^ ]+)[ ]+(.+(?<!\.))\.?$
The groups are stored in $1, $2 and $3.
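As a rough illustration of how the pattern could be used from Python (the language of the question), here is a sketch that applies it with re.MULTILINE to the sample output. The names pattern and text are made up for the example. It assumes the columns are separated by spaces, as in the output quoted in the question; if your dig output separates columns with tabs, the [ ] classes would need to include \t as well.

```python
import re

# The lookbehind (?<!\.) forces each captured name to end on a character
# other than a period; the optional \. after it then consumes the
# trailing period outside the capturing group.
pattern = re.compile(
    r"^([^ ]+)(?<!\.)\.?[ ]+[0-9]+[ ]+IN[ ]+([^ ]+)[ ]+(.+(?<!\.))\.?$",
    re.MULTILINE,
)

text = """\
mail.yahoo.com. 0 IN CNAME login.yahoo.com.
login.yahoo.com. 0 IN CNAME ats.login.lgg1.b.yahoo.com.
ats.login.lgg1.b.yahoo.com. 0 IN CNAME ats.member.g02.yahoodns.net.
ats.member.g02.yahoodns.net. 0 IN CNAME any-ats.member.a02.yahoodns.net.
any-ats.member.a02.yahoodns.net. 12 IN A 98.139.21.169
"""

# findall returns one (host, record_type, resolved_name) tuple per line.
records = pattern.findall(text)
for host, record_type, resolved_name in records:
    print(host, record_type, resolved_name)
```

In Python the three groups come back as the elements of each tuple rather than as $1, $2 and $3.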