Regex to conditionally replace Twitter hashtags with hyperlinks
I'm writing a small PHP script to grab the latest half dozen Twitter status updates from a user feed and format them for display on a webpage. As part of this I need a regex replace to rewrite hashtags as hyperlinks to search.twitter.com. Initially I tried to use:
<?php
$strTweet = preg_replace('/(^|\s)#(\w+)/', '\1#<a href="http://search.twitter.com/search?q=%23\2">\2</a>', $strTweet);
?>
(taken from https://gist.github.com/445729)
In the course of testing I discovered that #test is converted into a link on the Twitter website, however #123 is not. After a bit of checking on the internet and playing around with various tags I came to the conclusion that a hashtag must contain alphabetic characters or an underscore in it somewhere to constitute a link; tags with only numeric characters are ignored (presumably to stop things like "Good presentation Bob, slide #3 was my favourite!" from being linked). This makes the above code incorrect, as it will happily convert #123 into a link.
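To make the problem concrete, here is a minimal sketch that runs the gist's pattern on the slide example above. It assumes the pattern is written with its backslashes intact (`\s`, `\w`, `\1`, `\2`); the variable name `$strLinked` is only illustrative.

```php
<?php
// The gist's pattern: any run of word characters after '#' counts as a tag,
// so a purely numeric tag such as #3 is linked as well.
$strTweet = 'Good presentation Bob, slide #3 was my favourite!';
$strLinked = preg_replace(
    '/(^|\s)#(\w+)/',
    '\1#<a href="http://search.twitter.com/search?q=%23\2">\2</a>',
    $strTweet
);
echo $strLinked, "\n";
```

The "3" ends up wrapped in a link, which is exactly the behaviour Twitter itself avoids.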
I haven't done much regex in a while, so in my rustiness I came up with the following PHP solution:
<?php
$test = 'This is a test tweet to see if #123 and #4 are not encoded but #test, #l33t and #8oo8s are.';
// Get all hashtags out into an array
if (preg_match_all('/(^|\s)(#\w+)/', $test, $arrHashtags) > 0) {
    foreach ($arrHashtags[2] as $strHashtag) {
        // Check each tag to see if there are letters or an underscore in there somewhere
        if (preg_match('/#\d*[a-z_]+/i', $strHashtag)) {
            $test = str_replace($strHashtag, '<a href="http://search.twitter.com/search?q=%23'.substr($strHashtag, 1).'">'.$strHashtag.'</a>', $test);
        }
    }
}
echo $test;
?>
It works; but it seems fairly long-winded for what it does. My question is, is there a single preg_replace similar to the one I got from gist.github that will conditionally rewrite hashtags into hyperlinks ONLY if they DO NOT contain just numbers?
Solution: (^|\s)#(\w*[a-zA-Z_]+\w*)
PHP
$strTweet = preg_replace('/(^|\s)#(\w*[a-zA-Z_]+\w*)/', '\1#<a href="http://twitter.com/search?q=%23\2">\2</a>', $strTweet);
This regular expression matches a # followed by zero or more word characters ([a-zA-Z0-9_]), then one or more letters or underscores, then zero or more word characters. A tag made up only of digits never satisfies the middle part, so it is left alone.
http://rubular.com/r/opNX6qC4sG <- test it here.
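As a quick check, here is a sketch that applies the same pattern to the test tweet from the question. The wrapper function `linkHashtags()` is a hypothetical name introduced only for this example; the pattern and replacement string are the ones given above.

```php
<?php
// linkHashtags() is a hypothetical helper name, used only for this sketch.
function linkHashtags(string $strTweet): string
{
    // Group 1: start of string or a whitespace character.
    // Group 2: the tag text, which must contain at least one letter or underscore.
    return preg_replace(
        '/(^|\s)#(\w*[a-zA-Z_]+\w*)/',
        '\1#<a href="http://twitter.com/search?q=%23\2">\2</a>',
        $strTweet
    );
}

$test = 'This is a test tweet to see if #123 and #4 are not encoded but #test, #l33t and #8oo8s are.';
$result = linkHashtags($test);
echo $result, "\n";
```

Here #123 and #4 are left as plain text, while #test, #l33t and #8oo8s are linked. Note that the "#" itself stays outside the anchor, as in the original gist replacement.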