Wednesday, January 23, 2008 - 11:01
It is widely known that Alexa's visitor numbers are far from accurate, but Alexa also provides an XML feed that exposes all of the data it holds on a site, which is more than just visitor numbers. By passing the correct parameters to this feed you can find out related links, contact and domain information, the Alexa rank, associated keywords and Dmoz listings.
As an example, here is the feed URL for getting information about bbc.co.uk.
http://xml.alexa.com/data?cli=10&dat=nsa&ver=quirk-searchstatus&uid=19700101000000&userip=127.0.0.1&url=www.bbc.co.uk
So to get information about any site, all you have to do is pass its address in the url parameter at the end of this URL.
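As a rough sketch, the feed URL can be assembled from its parts rather than typed out by hand. The function name buildAlexaFeedUrl is made up for this example, and the parameter values are simply copied from the example URL; what each of them means to Alexa is not documented here.

```php
<?php
// Build the Alexa feed URL for a given site.
// buildAlexaFeedUrl() is a hypothetical helper; the parameter values are
// copied as-is from the example URL in the article.
function buildAlexaFeedUrl(string $site): string
{
    $params = [
        'cli'    => 10,
        'dat'    => 'nsa',
        'ver'    => 'quirk-searchstatus',
        'uid'    => '19700101000000',
        'userip' => '127.0.0.1',
        'url'    => $site,
    ];

    // http_build_query() URL-encodes every value and joins the pairs with "&".
    return 'http://xml.alexa.com/data?' . http_build_query($params, '', '&');
}

echo buildAlexaFeedUrl('www.bbc.co.uk'), "\n";
```

For www.bbc.co.uk this produces the same address as the example URL, and any other site can be swapped in without worrying about encoding it by hand.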
To get this information in a usable form with PHP you can use the curl functions. To download the Alexa feed into PHP use the following code:
    $url = 'www.bbc.co.uk';
    $querystring = 'http://xml.alexa.com/data?cli=10&dat=nsa&ver=quirk-searchstatus&uid=19700101000000&userip=127.0.0.1&url=' . urlencode($url);
    $user_agent = 'Mozilla/4.0';

    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $querystring);
    curl_setopt($ch, CURLOPT_USERAGENT, $user_agent);
    // Leave the HTTP headers out so that $alexaXml holds only the XML body.
    curl_setopt($ch, CURLOPT_HEADER, 0);
    // Return the response as a string instead of printing it.
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_TIMEOUT, 120);
    $alexaXml = curl_exec($ch);
    curl_close($ch);

You now have a variable called $alexaXml that contains all of the information you need. You could use some of the XML parsing options within PHP, but a simpler method is to extract the information you need using regular expressions. Here are a few examples.
To get the Alexa popularity.
    // No U modifier here: it would turn the lazy (.*?) groups greedy.
    preg_match('/<POPULARITY URL="(.*?)" TEXT="(.*?)"\/>/i', $alexaXml, $match);
    echo "<p>Popularity: ";
    if (count($match) > 0) {
        echo $match[2];
    } else {
        echo 0;
    }
    echo '</p>';
To get the Alexa links.
    // No U modifier here: it would turn the lazy (.*?) group greedy.
    preg_match('/LINKSIN NUM="(.*?)"/i', $alexaXml, $match);
    echo "<p>Links: ";
    if (count($match) > 0) {
        echo $match[1];
    } else {
        echo 0;
    }
    echo '</p>';
To get the Dmoz categories.
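    // The U modifier makes (.*) lazy, so each match stops at the first closing quote.
    preg_match_all('/CAT\sID="(.*)"/U', $alexaXml, $match);
    echo "<p>Dmoz cats: ";
    if (count($match[1])) {
        echo '<pre>' . print_r($match[1], true) . '</pre>';
    }
    echo '</p>';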
You can also see the data directly by printing off a couple of links.
    echo '<a href="http://www.alexa.com/data/ds/linksin?q=link%3A'.urlencode($url).'&url=http%3A//'.urlencode($url).'/" title="Alexa Links">Links</a>';
    echo '<br />';
    echo '<a href="http://www.alexa.com/data/details/traffic_details/'.urlencode($url).'" title="Alexa Data">Data</a>';
There is more information available than this. To see everything that you can extract just copy the URL at the top into a browser window and view the output directly. I suggest doing this in Firefox because of the nice way in which it displays XML.
Comments
I know about XML parsing in PHP. Here is how to do the same sort of thing with the XML engine in PHP.
Just expand the startElement function to include whatever you are looking for.
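A minimal sketch of that approach, assuming "the XML engine" means PHP's event-based xml_parser functions. The sample XML is made up and only uses the element and attribute names that the regular expressions in the post look for; the function name parseAlexaXml and the keys of the returned array are also invented for this example.

```php
<?php
// Made-up sample; the real feed contains far more than this.
$sampleXml = '<?xml version="1.0" encoding="UTF-8"?>'
    . '<ALEXA><SD><LINKSIN NUM="1234"/><POPULARITY URL="example.com/" TEXT="42"/></SD>'
    . '<DMOZ><CATS><CAT ID="Top/News"/><CAT ID="Top/Arts"/></CATS></DMOZ></ALEXA>';

// parseAlexaXml() is a hypothetical helper that collects the same three
// pieces of data as the regular expressions in the post.
function parseAlexaXml(string $xml): array
{
    $data = ['popularity' => 0, 'links' => 0, 'categories' => []];

    // Called for every opening tag. Names arrive upper-cased because
    // case folding is switched on by default.
    $startElement = function ($parser, string $name, array $attrs) use (&$data): void {
        switch ($name) {
            case 'POPULARITY':
                $data['popularity'] = (int) ($attrs['TEXT'] ?? 0);
                break;
            case 'LINKSIN':
                $data['links'] = (int) ($attrs['NUM'] ?? 0);
                break;
            case 'CAT':
                if (isset($attrs['ID'])) {
                    $data['categories'][] = $attrs['ID'];
                }
                break;
        }
    };

    // Closing tags carry nothing of interest here.
    $endElement = function ($parser, string $name): void {
    };

    $parser = xml_parser_create();
    xml_set_element_handler($parser, $startElement, $endElement);
    if (!xml_parse($parser, $xml, true)) {
        throw new RuntimeException('The feed could not be parsed as XML');
    }

    return $data;
}

print_r(parseAlexaXml($sampleXml));
```

Adding another case to the switch is all it takes to pick up a further element. Note that the parser needs the XML body only, so any HTTP headers have to be stripped from the downloaded string first.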
Just out of interest I did some timing of the two different ways of getting at the data: XML parsing or regular expressions. To ensure that nothing odd was going on I only downloaded the data once at the top of the script and then ran the two ways 10,000 times each on the same data. I didn't print anything off just in case that affected things, but I left in all of the if statements. Here are the average times:
There isn't much in it but it looks like regular expressions are a bit quicker than XML parsing. Unless I have done something wrong along the way?
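For anyone wanting to repeat the test, a comparison along these lines could be set up roughly as follows. This is only a sketch: the sample XML is made up, all three function names are invented for the example, and only the popularity value is extracted to keep it short. The numbers it prints depend entirely on the machine and PHP version.

```php
<?php
// Made-up sample; in a real test this would be the downloaded feed.
$sampleXml = '<ALEXA><SD><POPULARITY URL="example.com/" TEXT="42"/></SD></ALEXA>';

// Runs $fn the given number of times and returns the average seconds per run.
function averageSeconds(callable $fn, int $iterations): float
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return (microtime(true) - $start) / $iterations;
}

// Regular expression approach, as in the post.
function popularityViaRegex(string $xml): int
{
    if (preg_match('/<POPULARITY URL="(.*?)" TEXT="(.*?)"\/>/i', $xml, $match)) {
        return (int) $match[2];
    }
    return 0;
}

// Event-based XML parser approach.
function popularityViaXmlParser(string $xml): int
{
    $popularity = 0;
    $parser = xml_parser_create();
    xml_set_element_handler(
        $parser,
        function ($parser, string $name, array $attrs) use (&$popularity): void {
            if ($name === 'POPULARITY') {
                $popularity = (int) ($attrs['TEXT'] ?? 0);
            }
        },
        function ($parser, string $name): void {
            // Nothing to do on closing tags.
        }
    );
    xml_parse($parser, $xml, true);
    return $popularity;
}

$runs = 10000;
printf("regex: %.8f s per run\n", averageSeconds(function () use ($sampleXml) {
    popularityViaRegex($sampleXml);
}, $runs));
printf("xml:   %.8f s per run\n", averageSeconds(function () use ($sampleXml) {
    popularityViaXmlParser($sampleXml);
}, $runs));
```

Both approaches work on the same string and nothing is printed inside the loop, which matches the conditions described above.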
Submitted by Indy (not verified) on Sun, 02/17/2008 - 13:25 Permalink
Submitted by Marty Martin (not verified) on Tue, 03/25/2008 - 15:52 Permalink
http://www.alexa.com/data/details/traffic_details/{url}
So for hashbangcode.com this would be
http://www.alexa.com/data/details/traffic_details/hashbangcode.com
However, I don't see a way of getting to the traffic data itself. Not that the data is of much use anyway!
Submitted by Mohamed Mahmoud (not verified) on Wed, 03/03/2010 - 18:13 Permalink
Just create a script like I have above (either via regex or xml parsing) but save the information into a database table or something.
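As a rough illustration of the "or something" option, the sketch below appends one dated row per run to a CSV file instead of a database table. The function name recordAlexaStats, the file name and the column order are all made up for this example, and the two numbers are placeholders for whatever the regular expression or XML parsing code returns.

```php
<?php
// Appends one dated row of figures to a CSV file and returns the row written.
// recordAlexaStats() and its column order are hypothetical.
function recordAlexaStats(string $file, string $site, int $popularity, int $links): array
{
    $row = [date('Y-m-d'), $site, $popularity, $links];

    // Mode "a" creates the file if needed and always writes at the end.
    $handle = fopen($file, 'a');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $file for writing");
    }
    fputcsv($handle, $row, ',', '"', '\\');
    fclose($handle);

    return $row;
}

// Placeholder values stand in for the figures extracted from the feed.
recordAlexaStats(sys_get_temp_dir() . '/alexa-stats.csv', 'www.bbc.co.uk', 42, 1234);
```

Because every run adds a new line, the file builds up a day-by-day history that can be imported into a database or spreadsheet later.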
All you need to do then is create a cron job that will run the script once a day. Something like the following would do this:
0 23 * * * php /path/to/script.php
Submitted by rhoula (not verified) on Wed, 08/15/2012 - 19:00 Permalink
I'm looking for a way to scrape data on a daily basis. In other words, I need a script that will go to Alexa every day, scrape some information and then save it to either a file or a database. How can I do that?
Thank you