extracting the root domain from a URL [message #171642] |
Thu, 13 January 2011 21:50 |
Mike
Messages: 18 Registered: December 2010
Junior Member
Given any valid URL, I'd like to extract the root domain like this:
http://www.site.com = site.com
http://xxx.yyy.site.com = site.com
http://subdomain.site.com = site.com
http://www.site.com.tw = site.com.tw
http://xxx.yyy.site.com.asia = site.com.asia
http://subdomain.site.com.af = site.com.af
I've written some code (below) which works on the examples, but falls
apart when the domain name itself is three characters or fewer (e.g.,
www.ibm.com comes back as www.ibm.com instead of ibm.com).
Does someone know of a way to do this even with three-letter domain
names?
Here's my current code:
function getRootDomain($url)
{
    // Get rid of junk
    if (!isValidUrl($url)) { return false; }
    // Parse the URL to get the host name
    $parsedUrl = parse_url($url);
    // Break it apart by the '.' and flip it around
    $parts = array_reverse(explode('.', $parsedUrl['host']));
    // Remove all but the last three parts (e.g., 'www.site.com' or
    // 'site.com.tw', or 'site.com' if there are only two)
    while (count($parts) > 3)
    {
        array_pop($parts);
    }
    // If there are three parts and the middle part is more than 3
    // characters, then ditch the first part.
    // Example: www.site.com - 'site' > 3, so ditch the 'www';
    // site.com.tw - 'com' isn't > 3, so keep the 'site'
    if (isset($parts[2]) && strlen($parts[1]) > 3) { unset($parts[2]); }
    // Pass back the reassembled root domain name
    return implode('.', array_reverse($parts));
}
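
To make the failure concrete, here is a minimal sketch of the same length heuristic run on bare host names. The helper name rootByLengthHeuristic() is made up for this illustration, and the isValidUrl() and parse_url() steps are left out so it can be run on its own:

```php
<?php
// Hypothetical helper: same length heuristic as getRootDomain() above,
// but taking a host name directly instead of a full URL.
function rootByLengthHeuristic($host)
{
    // Split on '.' and reverse, so the TLD comes first
    $parts = array_reverse(explode('.', $host));
    // Keep only the last three labels of the host
    while (count($parts) > 3) {
        array_pop($parts);
    }
    // Drop the third label only if the middle one is longer than 3 characters
    if (isset($parts[2]) && strlen($parts[1]) > 3) {
        unset($parts[2]);
    }
    return implode('.', array_reverse($parts));
}

echo rootByLengthHeuristic('www.site.com'), "\n";     // site.com
echo rootByLengthHeuristic('www.site.com.tw'), "\n";  // site.com.tw
echo rootByLengthHeuristic('www.ibm.com'), "\n";      // www.ibm.com (wanted: ibm.com)
```

The last call shows the problem: 'ibm' is only three characters, so the check treats it the same way as the 'com' in site.com.tw and keeps the 'www'.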