PHP: Get Domain (without subdomain) of any Address available as String
Recently a question was asked about how to get the domain of a URL that is available as a string.
Unfortunately, that question has been closed, and so far the linked answers only point to solutions using regex (which fail for special cases like .co.uk) or to static solutions that rely on a fixed list of exceptions (which, of course, might change over time).
So I was searching for a generic solution to this question, one that keeps working over time, and I found one. (At least a couple of tests were positive.)
If you find a domain for which the attempted solution does not work, feel free to mention it, and I'll try to improve the snippet to cover that case as well.
To find the domain of any given string, a three-step solution seems to work best:
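To illustrate why the regex route falls short, here is a minimal sketch of the kind of "take the last two labels" pattern such answers tend to use. The function name naiveDomain is made up for this illustration and is not taken from any of the linked answers.

```php
<?php
// Hypothetical naive approach: keep only the last two dot-separated labels.
function naiveDomain(string $host): ?string
{
    return preg_match('/[^.]+\.[^.]+$/', $host, $m) === 1 ? $m[0] : null;
}

echo naiveDomain('www.stackoverflow.com'), "\n"; // stackoverflow.com
echo naiveDomain('www.google.co.uk'), "\n";      // co.uk, not google.co.uk
```

The pattern has no way of knowing that co.uk is a registry suffix rather than a registrable domain, which is exactly the special case mentioned above.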
- First, get the actual hostname using parse_url (http://php.net/manual/en/function.parse-url.php).
- Second, query the DNS server for the "top-most" A record available. (I used checkdnsrr for this purpose: http://php.net/manual/en/function.checkdnsrr.php)
- Last but not least: perform some validations to make sure you are not running into a "default response".
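The first two steps can be sketched without touching DNS at all, which makes the order of the lookups easier to see. The helper names hostOf and candidateDomains are assumptions made for this sketch; only parse_url is taken from the steps above.

```php
<?php
// Step 1: extract the hostname; returns null if the URL has no host part.
function hostOf(string $url): ?string
{
    $host = parse_url($url, PHP_URL_HOST);
    return is_string($host) ? $host : null;
}

// Step 2, minus the DNS query: list the candidates from the shortest
// suffix (the TLD) up to the full hostname.
function candidateDomains(string $host): array
{
    $parts = explode('.', $host);
    $candidates = [];
    for ($i = count($parts) - 1; $i >= 0; $i--) {
        $candidates[] = implode('.', array_slice($parts, $i));
    }
    return $candidates;
}

print_r(candidateDomains(hostOf('http://www.google.co.uk/path')));
```

For www.google.co.uk the candidates come out as uk, co.uk, google.co.uk and www.google.co.uk, in that order. The first candidate that has a usable A record is then treated as the domain.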
I performed some tests, and the results seem to be as expected. The method directly generates output, but it can be modified to return the domain name instead:
<?php
getdomain("http://www.stackoverflow.com");
getdomain("http://www.google.co.uk");
getdomain("http://books.google.co.uk");
getdomain("http://a.b.c.google.co.uk");
getdomain("http://www.nominet.org.uk/intelligence/statistics/registration/");
getdomain("http://invalid.fail.pooo");
getdomain("http://anotheronethatshouldfail.com");

function getdomain($url){
    echo "searching domain '".$url."': ";

    //step 1: get the actual hostname
    $url = parse_url($url);
    if (empty($url["host"])){
        //no host part at all (malformed or relative url)
        echo " not resolved.<br />";
        return;
    }
    $actualhostname = $url["host"];

    //step 2: top-down approach: check dns records for the first valid a-record.
    //re-assemble the url step-by-step, i.e. for www.google.co.uk, check:
    // - uk
    // - co.uk
    // - google.co.uk (will match here)
    // - www.google.co.uk (will be skipped)
    $domainparts = explode(".", $actualhostname);
    for ($i = count($domainparts)-1; $i>=0; $i--){
        $domain = "";
        $currentcountry = null;
        for ($j = count($domainparts)-1; $j>=$i; $j--){
            $domain = $domainparts[$j] . "." . $domain;
            if ($currentcountry == null){
                $currentcountry = $domainparts[$j];
            }
        }
        $domain = trim($domain, ".");
        $validrecord = checkdnsrr($domain, "A"); //looking for an a-record

        if ($validrecord){
            //if the host can be resolved to an ip, it seems valid.
            //if the hostname itself is returned, it is invalid.
            $hostip = gethostbyname($domain);
            $validrecord &= ($hostip != $domain);

            if ($validrecord){
                //last check: the dns server might answer with one of the isp's default server ips for invalid domains.
                //perform a test by querying a domain of the same "country" that is invalid for sure, to obtain the
                //ip list of the isp's default servers. then compare it with the response for the current $domain.
                //gethostbynamel() returns false on failure, so fall back to an empty list.
                $defaultips = gethostbynamel("iiiiiiiiiiiiiiiiiinvaliddomain." . $currentcountry) ?: array();
                $validrecord &= !(in_array($hostip, $defaultips));
            }
        }

        //valid record?
        if ($validrecord){
            //return $domain;
            echo $domain."<br />";
            return;
        }
    }

    //return null;
    echo " not resolved.<br />";
}
?>
Output of the example above:
searching domain 'http://www.stackoverflow.com': stackoverflow.com
searching domain 'http://www.google.co.uk': google.co.uk
searching domain 'http://books.google.co.uk': google.co.uk
searching domain 'http://a.b.c.google.co.uk': google.co.uk
searching domain 'http://www.nominet.org.uk/intelligence/statistics/registration/': nominet.org.uk
searching domain 'http://invalid.fail.pooo': not resolved.
searching domain 'http://anotheronethatshouldfail.com': not resolved.
This is only a very limited set of test cases, but I cannot imagine a case where a domain has no A record.
As a nice side effect, this also validates URLs and does not rely on theoretically valid formats, as the last two examples show.
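For completeness, here is a rough sketch of what a returning variant could look like. It is not the tested code from above: the name findDomain and the two injectable callables are assumptions added so the logic can be tried without network access, it needs PHP 7.4 or later for the arrow functions, and it compares all resolved IPs against the "default response" list instead of only the first one.

```php
<?php
// Hypothetical return-based variant. $hasARecord and $resolve default to
// thin wrappers around checkdnsrr() and gethostbynamel().
function findDomain(string $url, ?callable $hasARecord = null, ?callable $resolve = null): ?string
{
    $hasARecord = $hasARecord ?? fn(string $d): bool => checkdnsrr($d, 'A');
    $resolve = $resolve ?? fn(string $d): array => gethostbynamel($d) ?: [];

    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return null;
    }
    $parts = explode('.', $host);
    $tld = end($parts);
    // IPs handed out for a name that should not exist (the "default response").
    $defaultIps = $resolve('iiiiiiiiiiiiiiiiiinvaliddomain.' . $tld);

    // Walk from the shortest suffix to the full hostname.
    for ($i = count($parts) - 1; $i >= 0; $i--) {
        $domain = implode('.', array_slice($parts, $i));
        if (!$hasARecord($domain)) {
            continue;
        }
        $ips = $resolve($domain);
        if ($ips !== [] && array_intersect($ips, $defaultIps) === []) {
            return $domain;
        }
    }
    return null;
}

// Usage with stubbed lookups (made-up records, documentation-range IPs):
$records = ['google.co.uk' => ['203.0.113.10'], 'www.google.co.uk' => ['203.0.113.10']];
$has = fn(string $d): bool => isset($records[$d]);
$res = fn(string $d): array => $records[$d] ?? [];
echo findDomain('http://www.google.co.uk', $has, $res), "\n"; // google.co.uk
```

Calling findDomain($url) without the extra arguments performs real DNS lookups, so the result then depends on the resolver in use, just like the original function.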
Best, dognose