Extract the domain from a URL (including the difficult ones)

2022-01-04 00:00:00 php dns subdomain

I'm trying to write (or just find an existing) PHP method that can take a link and extract the domain. The trick is, it needs to hold up under strange-looking domains like:

www.champa.kku.ac.th 

Looking at this one myself with human eyes, I still guessed wrong: I thought the domain would be kku.ac.th, but visiting that gives a DNS error.

So does anyone know of a good way to reliably extract the domain from URLs like these:

http://site.com/hello.php
http://site.com.uk/hello.php
http://subdomain.site.com/hello.php
http://subdomain.site.com.uk/hello.php
http://www.champa.kku.ac.th/hello.php // and even the one I couldn't tell

Recommended answer

PHP has the parse_url() function, which will help you do the basic splitting into scheme (protocol), host, port, and so on.
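As a minimal sketch, asking parse_url() for only the PHP_URL_HOST component gives you the host name. The helper name getHost() is hypothetical. Note that this returns the full host (e.g. www.champa.kku.ac.th); it does not decide which part of it is the domain.

```php
<?php
// Hypothetical helper: return the lowercased host part of a URL, or null if there is none.
function getHost(string $url): ?string
{
    // PHP_URL_HOST asks parse_url() for the host component only.
    $host = parse_url($url, PHP_URL_HOST);
    // parse_url() returns null when the component is missing and false for seriously malformed URLs.
    if (!is_string($host) || $host === '') {
        return null;
    }
    // Host names are case-insensitive, so normalise them.
    return strtolower($host);
}

echo getHost('http://www.champa.kku.ac.th/hello.php'), PHP_EOL; // www.champa.kku.ac.th
```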

As for extracting the "right" domain in uncertain cases, this is extremely hard to tell, because sometimes "two-part TLDs" are set up by the TLD authority (e.g. in the UK) and sometimes they are private enterprises (e.g. .uk.com). I don't think you can get around maintaining a list of the top-level domains that have two parts, such as

  • .co.uk
  • .ac.uk
  • .ac.th

Those endings would be treated like TLDs (top-level domains), swallowing the second part.

This is the only reliable way to tell a host under a "two-part TLD" such as .co.uk, like server1.ibm.co.uk (where the two-part .co.uk must be stripped to determine the domain itself), apart from a regular subdomain like server1.ibm.com (where only .com needs to be stripped).
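The list-based approach described above can be sketched like this. It is an illustration under stated assumptions, not a complete solution: the hypothetical getRegistrableDomain() helper checks only whether the host ends in a listed two-part TLD, and the TWO_PART_TLDS constant holds just the three examples from this answer, while a real list is much longer.

```php
<?php
// Hypothetical list: only the examples from the answer; a real list is far longer.
const TWO_PART_TLDS = ['co.uk', 'ac.uk', 'ac.th'];

// Hypothetical helper: derive the domain from a host name using the list above.
function getRegistrableDomain(string $host, array $twoPartTlds = TWO_PART_TLDS): string
{
    // Split "server1.ibm.co.uk" into ['server1', 'ibm', 'co', 'uk'] (a trailing dot is ignored).
    $labels = explode('.', strtolower(rtrim($host, '.')));
    $count = count($labels);

    // A host with two or fewer labels cannot be shortened any further.
    if ($count <= 2) {
        return implode('.', $labels);
    }

    // If the last two labels form a listed two-part TLD, treat them as the TLD
    // and keep one more label; otherwise keep only the last two labels.
    $lastTwo = $labels[$count - 2] . '.' . $labels[$count - 1];
    $keep = in_array($lastTwo, $twoPartTlds, true) ? 3 : 2;

    return implode('.', array_slice($labels, -$keep));
}

echo getRegistrableDomain('server1.ibm.co.uk'), PHP_EOL;    // ibm.co.uk
echo getRegistrableDomain('server1.ibm.com'), PHP_EOL;      // ibm.com
echo getRegistrableDomain('www.champa.kku.ac.th'), PHP_EOL; // kku.ac.th
```

With this short list, subdomain.site.com.uk comes out as com.uk, because com.uk is not listed. That is exactly why the list has to be maintained and extended for every two-part ending you care about.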

A good starting point for a list of many important "two-part TLDs" is the domain search at speednames.com (select "all" under countries). A more complete list can be found as part of the Ruby domainatrix library.
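If you keep such a list in a plain-text file, loading it is straightforward. This is a sketch under an assumed file format (one suffix per line, such as co.uk; blank lines and lines starting with // are skipped), and the loadSuffixList() name is hypothetical. It does not parse the format of any particular library's list file.

```php
<?php
// Hypothetical loader for a hand-maintained suffix list.
// Assumed format: one suffix per line; blank lines and "//" comment lines are ignored.
function loadSuffixList(string $path): array
{
    $lines = file($path, FILE_IGNORE_NEW_LINES);
    if ($lines === false) {
        throw new RuntimeException("Cannot read suffix list: $path");
    }

    $suffixes = [];
    foreach ($lines as $line) {
        $line = strtolower(trim($line));
        // Skip blank lines and comment lines.
        if ($line === '' || substr($line, 0, 2) === '//') {
            continue;
        }
        // Accept both ".co.uk" and "co.uk".
        $suffixes[] = ltrim($line, '.');
    }
    return $suffixes;
}
```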
