Going where PHP parse_url() doesn't - Parsing only the domain

2022-01-04 00:00:00 php dns

PHP's parse_url() returns a host field that contains the full host name, subdomains included. I'm looking for the most reliable (and least costly) way to return only the domain and TLD.

For example:

  • http://www.google.com/foo, parse_url() returns www.google.com for host
  • http://www.google.co.uk/foo, parse_url() returns www.google.co.uk for host

I want only google.com or google.co.uk. I have contemplated keeping a table of valid TLDs/suffixes and allowing only one of those plus one word in front of it. Would you do it any other way? Does anyone know of a pre-canned, valid regex for this sort of thing?

Recommended answer

How about something like this?

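function getDomain($url) {
  // parse_url() returns false for seriously malformed URLs; a missing host becomes ''.
  $pieces = parse_url($url);
  $domain = isset($pieces['host']) ? $pieces['host'] : '';
  // Match one label followed by a 2-6 character suffix such as "com" or "co.uk".
  // The separating dot is escaped so that it only matches a literal ".".
  if (preg_match('/(?P<domain>[a-z0-9][a-z0-9-]{1,63}\.[a-z.]{2,6})$/i', $domain, $regs)) {
    return $regs['domain'];
  }
  return false;
}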

This extracts the host name with the classic parse_url() and then looks for a valid domain without any subdomain (www being a subdomain). It won't work on things like 'localhost', and it returns false if nothing matches.

Try it out:

echo getDomain('http://www.google.com/test.html') . '<br/>';
echo getDomain('https://news.google.co.uk/?id=12345') . '<br/>';
echo getDomain('http://my.subdomain.google.com/directory1/page.php?id=abc') . '<br/>';
echo getDomain('https://testing.multiple.subdomain.google.co.uk/') . '<br/>';
echo getDomain('http://nothingelsethan.com') . '<br/>';

It should return:

google.com
google.co.uk
google.com
google.co.uk
nothingelsethan.com

Of course, it returns false if the input doesn't get through parse_url(), so make sure it's a well-formed URL.
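
To illustrate the well-formed-URL caveat: parse_url() only reports a host when the input has a scheme (or starts with //); otherwise the whole string is treated as a path. The sketch below shows that, together with a hypothetical helper, hostFromLooseUrl(), which is not part of the answer above. It assumes scheme-less input is meant to be a web address and prepends http:// before parsing, and it assumes PHP 7.1+ for the nullable return type.

```php
<?php
// Hypothetical helper (not from the original answer): returns the host
// part of a URL, tolerating input that lacks a scheme.
function hostFromLooseUrl(string $url): ?string {
    // Without a scheme, parse_url() sees "www.google.com/foo" as a path,
    // so there is no host. Prepending "http://" is an assumption that the
    // input is meant to be a web address.
    if (!preg_match('~^[a-z][a-z0-9+.-]*://~i', $url)) {
        $url = 'http://' . ltrim($url, '/');
    }
    $host = parse_url($url, PHP_URL_HOST);
    // parse_url() yields false (malformed) or null (no host) on failure.
    return is_string($host) && $host !== '' ? strtolower($host) : null;
}

var_dump(parse_url('www.google.com/foo', PHP_URL_HOST)); // NULL: no host found
echo hostFromLooseUrl('www.google.com/foo'), "\n";
echo hostFromLooseUrl('//www.google.co.uk/foo'), "\n";
```

Whether guessing a scheme is acceptable depends on where the input comes from; for untrusted input it may be safer to reject scheme-less strings instead.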

Addendum:

Alnitak is right. The solution presented above will work in most cases but not necessarily all, and it needs to be maintained to make sure, for example, that there aren't new TLDs with .morethan6characters and so on. The only reliable way of extracting the domain is to use a maintained list such as http://publicsuffix.org/. It's more painful at first but easier and more robust in the long term. You need to make sure you understand the pros and cons of each method and how it fits with your project.
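
As a rough illustration of the list-based approach, here is a minimal sketch. The function name registrableDomain() and the SAMPLE_SUFFIXES constant are made up for this example, and the five suffixes are only a tiny hand-picked subset. A real implementation would load the full list from publicsuffix.org, keep it up to date, and handle the wildcard and exception rules that this sketch ignores. PHP 7.1+ is assumed.

```php
<?php
// Illustrative subset only (an assumption for this sketch); the real
// Public Suffix List is far longer and changes over time.
const SAMPLE_SUFFIXES = ['com', 'org', 'uk', 'co.uk', 'photography'];

// Hypothetical helper: returns one label plus the longest matching
// suffix, or null if the host has no known suffix or is itself a suffix.
function registrableDomain(string $host, array $suffixes = SAMPLE_SUFFIXES): ?string {
    $labels = explode('.', strtolower(trim($host, '.')));
    // A bare suffix such as "co.uk" has no registrable domain.
    if (in_array(implode('.', $labels), $suffixes, true)) {
        return null;
    }
    // Start at index 1 so at least one label stays in front of the suffix;
    // candidates shrink from the left, so the first hit is the longest.
    for ($i = 1, $count = count($labels); $i < $count; $i++) {
        $suffix = implode('.', array_slice($labels, $i));
        if (in_array($suffix, $suffixes, true)) {
            return $labels[$i - 1] . '.' . $suffix;
        }
    }
    return null;
}

echo registrableDomain('news.google.co.uk'), "\n";       // google.co.uk
echo registrableDomain('www.example.photography'), "\n"; // example.photography
var_dump(registrableDomain('localhost'));                 // NULL
```

Unlike the regex, this has no length limit on the suffix, so a TLD longer than six characters is handled as long as it appears in the list.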
