What is the best way to clean a string for placement in a URL, like the question name on SO?

2021-12-21 00:00:00 url slug php mod-rewrite

I'm looking to create a URL string like the one SO uses for the links to the questions. I am not looking at rewriting the url (mod_rewrite). I am looking at generating the link on the page.

Example: the question name is:

Is it better to use ob_get_contents() or $text .= 'test';

The URL ends up being:

http://stackoverflow.com/questions/292068/is-it-better-to-use-obgetcontents-or-text-test

The part I'm interested in is:

is-it-better-to-use-obgetcontents-or-text-test

So basically I'm looking to clean out anything that is not alphanumeric while still keeping the URL readable. I have the following created, but I'm not sure if it's the best way or if it covers all the possibilities:

$str = urlencode(
    strtolower(
    str_replace('--', '-', 
    preg_replace(array('/[^a-z0-9 ]/i', '/[^a-z0-9]/i'), array('', '-'), 
    trim($urlPart)))));

So basically:

  1. trim
  2. replace any non alphanumeric plus the space with nothing
  3. then replace everything not alphanumeric with a dash
  4. replace -- with -.
  5. strtolower()
  6. urlencode() -- probably not needed, but just for good measure.

Recommended answer

As you pointed out already, urlencode() is not needed in this case and neither is trim(). If I understand correctly, step 4 is meant to avoid multiple dashes in a row, but a single str_replace() pass does not fully collapse runs of three or more dashes (for example, "---" becomes "--"). On the other hand, dashes connecting two words (as in "large-scale") will be removed by your solution, while they seem to be preserved on SO.
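
Both observations can be reproduced with a short sketch. The function name slug_original() is a hypothetical wrapper added here for illustration; its body is the expression from the question, unchanged.

```php
<?php
// Hypothetical wrapper around the question's original expression.
function slug_original(string $urlPart): string {
    return urlencode(
        strtolower(
        str_replace('--', '-',
        preg_replace(array('/[^a-z0-9 ]/i', '/[^a-z0-9]/i'), array('', '-'),
        trim($urlPart)))));
}

// Three spaces become three dashes; one str_replace() pass leaves two.
echo slug_original('a   b'), "\n";            // a--b

// The first pattern strips the dash before spaces are converted.
echo slug_original('large-scale test'), "\n"; // largescale-test
```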

I'm not sure that this is really the best way to do it, but here's my suggestion:

$str = strtolower( 
  preg_replace( array('/[^a-z0-9- ]/i', '/[ -]+/'), array('', '-'), 
  $urlPart ) );

So:

  1. Remove any character that is not a space, a dash, or alphanumeric
  2. Replace any run of consecutive spaces or dashes with a single dash
  3. strtolower()
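
The three steps above can be sketched as a small helper. The name slugify() is hypothetical, and the final trim($slug, '-') is an addition that is not part of the suggestion: without it, input with leading or trailing spaces would produce a leading or trailing dash.

```php
<?php
// Hypothetical helper wrapping the suggested expression.
function slugify(string $urlPart): string {
    // Steps 1 and 2: drop disallowed characters, then collapse runs of
    // spaces and dashes into a single dash.
    $slug = preg_replace(array('/[^a-z0-9- ]/i', '/[ -]+/'), array('', '-'), $urlPart);
    // Step 3: lowercase.
    $slug = strtolower($slug);
    // Assumption, not in the original suggestion: strip edge dashes.
    return trim($slug, '-');
}

echo slugify("Is it better to use ob_get_contents() or \$text .= 'test';"), "\n";
// is-it-better-to-use-obgetcontents-or-text-test

echo slugify('large-scale   data'), "\n"; // large-scale-data
echo slugify('  Hello World!  '), "\n";   // hello-world
```

Because the character class only allows ASCII letters and digits, accented or non-Latin characters are dropped rather than transliterated; whether that is acceptable depends on the titles you expect.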
