What is the best way to clean a string to be placed in a URL, like the question name on SO?

2021-12-21 00:00:00 url slug php mod-rewrite

I'm looking to create a URL string like the one SO uses for the links to the questions. I am not looking at rewriting the url (mod_rewrite). I am looking at generating the link on the page.


Example question name: Is it better to use ob_get_contents() or $text .= ‘test’;

The URL ends up being:




So basically I'm looking to clean out anything that is not alphanumeric while still keeping the URL readable. I have the following created, but I'm not sure if it's the best way or if it covers all the possibilities:

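$str = urlencode(
    str_replace('--', '-',
    preg_replace(array('/[^a-z0-9 ]/i', '/[^a-z0-9]/i'), array('', '-'),
    // innermost call assumed from steps 1 and 5 below; $urlPart is the raw title
    strtolower(trim($urlPart)))));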


  1. trim
  2. replace any non alphanumeric plus the space with nothing
  3. then replace everything not alphanumeric with a dash
  4. replace -- with -.
  5. strtolower()
  6. urlencode() -- probably not needed, but just for good measure.


As you pointed out already, urlencode() is not needed in this case and neither is trim(). If I understand correctly, step 4 is to avoid multiple dashes in a row, but it will not prevent more than two dashes. On the other hand, dashes connecting two words (like in "large-scale") will be removed by your solution while they seem to be preserved on SO.
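
Both problems can be reproduced with a short script. This is only a sketch: the function name `slug_question` is hypothetical, and the innermost `strtolower(trim($urlPart))` call is an assumption based on the steps listed in the question.

```php
<?php
// Hypothetical wrapper around the pipeline described in the question.
// Assumption: the innermost call is strtolower(trim($urlPart)).
function slug_question(string $urlPart): string
{
    return urlencode(
        str_replace('--', '-',
        preg_replace(array('/[^a-z0-9 ]/i', '/[^a-z0-9]/i'), array('', '-'),
        strtolower(trim($urlPart)))));
}

// The hyphen in "large-scale" is stripped by the first pattern, and the
// three spaces become "---", which str_replace() only shortens to "--".
echo slug_question('large-scale   systems'), "\n"; // largescale--systems
```

A single pass of str_replace('--', '-') halves runs of dashes rather than collapsing them, which is why a `+` quantifier in a regular expression is the more reliable tool here.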


I'm not sure that this is really the best way to do it, but here's my suggestion:

$str = strtolower( 
  preg_replace( array('/[^a-z0-9- ]/i', '/[ -]+/'), array('', '-'), 
  $urlPart ) );


  1. remove any character that is neither a space, a dash, nor alphanumeric
  2. replace any number of consecutive spaces or dashes with a single dash
  3. strtolower()
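
To try the suggestion out, it can be wrapped in a function. This is a minimal sketch: the name `slugify` is hypothetical, and the body is the expression from the suggestion, unchanged.

```php
<?php
// Hypothetical wrapper name; the body is the suggested expression.
function slugify(string $urlPart): string
{
    return strtolower(
        preg_replace(
            array('/[^a-z0-9- ]/i', '/[ -]+/'), // 1. strip, 2. collapse
            array('', '-'),
            $urlPart
        )
    );
}

// Existing hyphens survive, and any run of spaces/dashes becomes one dash.
echo slugify('Large-scale   URL -- cleaning!'), "\n"; // large-scale-url-cleaning
```

In this sketch, a leading or trailing space or dash in the input still produces a leading or trailing dash in the result. If that matters, wrapping the result in trim($slug, '-') is one way to drop it.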
