Downloading Alexa Data With PHP

23rd January 2008

It is widely known that Alexa's visitor numbers are far from accurate, but Alexa also offers an XML feed that exposes all of the data it holds about a site, which is more than just visitor numbers. By passing the correct parameters to this feed you can find out related links, contact and domain information, the Alexa rank, associated keywords and Dmoz listings.

As an example, here is the feed URL for getting information about www.bbc.co.uk.

http://xml.alexa.com/data?cli=10&dat=nsa&ver=quirk-searchstatus&uid=19700101000000&userip=127.0.0.1&url=www.bbc.co.uk

So to get information about any site, all you have to do is change the url parameter at the end of this address.
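If you want to query several sites, it can be tidier to build the feed address from an array of parameters. The following is a minimal sketch, assuming the feed accepts exactly the parameters shown in the example URL above; the function name alexaFeedUrl() is hypothetical, and http_build_query() takes care of URL-encoding the site address.

```php
<?php
// A minimal sketch: build the feed URL for any site from an array of parameters.
// alexaFeedUrl() is a hypothetical helper; the parameter values are copied from
// the bbc.co.uk example URL above.
function alexaFeedUrl(string $site): string
{
    $params = [
        'cli'    => '10',
        'dat'    => 'nsa',
        'ver'    => 'quirk-searchstatus',
        'uid'    => '19700101000000',
        'userip' => '127.0.0.1',
        'url'    => $site,
    ];
    // http_build_query() URL-encodes every value and joins the pairs with '&'.
    return 'http://xml.alexa.com/data?' . http_build_query($params, '', '&');
}

echo alexaFeedUrl('www.bbc.co.uk'), "\n";
```

Calling alexaFeedUrl('www.bbc.co.uk') produces the same address as the example above.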

To get this information into PHP you can use the cURL functions. The following code downloads the Alexa feed:

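$url = 'www.bbc.co.uk';
// urlencode() escapes any special characters in the site address.
$querystring = 'http://xml.alexa.com/data?cli=10&dat=nsa&ver=quirk-searchstatus&uid=19700101000000&userip=127.0.0.1&url='.urlencode($url);
$ch = curl_init();
$user_agent = "Mozilla/4.0";
curl_setopt ($ch, CURLOPT_URL, $querystring);
curl_setopt ($ch, CURLOPT_USERAGENT, $user_agent);
// Leave the HTTP headers out of the result so that $alexaXml holds only the XML.
curl_setopt ($ch, CURLOPT_HEADER, 0);
// Return the response as a string instead of printing it.
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
// Follow redirects, and give up after 120 seconds.
curl_setopt ($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt ($ch, CURLOPT_TIMEOUT, 120);
$alexaXml = curl_exec($ch);
// curl_exec() returns false if the request fails (timeout, DNS error and so on).
if ($alexaXml === false) {
  echo 'Download failed: '.curl_error($ch);
}
curl_close($ch);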

You now have a variable called $alexaXml that contains all of the information you need. You could use one of PHP's XML parsers, but a simpler method is to extract the values you need using regular expressions. Here are a few examples.
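For comparison, here is a minimal sketch of the XML-parsing route using SimpleXML, one of PHP's built-in XML extensions. alexaValuesSimpleXml() is a hypothetical helper, and it parses a small made-up string so that it runs without a live feed. That string only uses the element and attribute names matched by the regular expressions in this post; the real feed's nesting may differ, which is why the XPath queries use // to search at any depth.

```php
<?php
// A minimal sketch: read the popularity, links and Dmoz values with SimpleXML.
// alexaValuesSimpleXml() is a hypothetical helper; $sampleXml is made-up data.
function alexaValuesSimpleXml(string $xml): array
{
    $doc = simplexml_load_string($xml);
    if ($doc === false) {
        return []; // the string was not well-formed XML
    }
    // '//NAME' finds matching elements at any depth in the document.
    $popularity = $doc->xpath('//POPULARITY') ?: [];
    $links      = $doc->xpath('//LINKSIN') ?: [];
    $catNodes   = $doc->xpath('//CAT') ?: [];
    $cats       = [];
    foreach ($catNodes as $cat) {
        $cats[] = (string) $cat['ID'];
    }
    return [
        'popularity' => $popularity ? (string) $popularity[0]['TEXT'] : '0',
        'links'      => $links ? (string) $links[0]['NUM'] : '0',
        'dmoz'       => $cats,
    ];
}

$sampleXml = '<ROOT><POPULARITY URL="example.com/" TEXT="1234"/><LINKSIN NUM="567"/>'
    . '<CAT ID="Top/Example/One"/><CAT ID="Top/Example/Two"/></ROOT>';
print_r(alexaValuesSimpleXml($sampleXml));
```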

To get the Alexa popularity.

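echo "<p>Popularity: ";
// [^"]* matches up to the closing quote. The U modifier is dropped because,
// combined with .*?, it made the match greedy rather than lazy.
// preg_match() returns 1 on a match, and $match[2] holds the TEXT value.
if(preg_match('/<POPULARITY URL="([^"]*)" TEXT="([^"]*)"/i',$alexaXml,$match)){
  echo $match[2];
}else{
  echo 0;
}
echo '</p>';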

To get the Alexa links.

echo "<p>Links: ";
// [^"]* stops at the closing quote of the NUM attribute.
if(preg_match('/<LINKSIN NUM="([^"]*)"/i',$alexaXml,$match)){
  echo $match[1];
}else{
  echo 0;
}
echo '</p>';
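Since the popularity and links lookups differ only in the element and attribute names, they could share one helper. This is a minimal sketch: alexaAttribute() is a hypothetical name, the sample string is made up, and the pattern allows other attributes to come before the one you want.

```php
<?php
// A minimal sketch: return the first value of $attribute on $element, or null.
// alexaAttribute() is a hypothetical helper. [^>]*? skips any earlier attributes
// inside the same tag, and [^"]* stops at the closing quote of the value.
function alexaAttribute(string $xml, string $element, string $attribute): ?string
{
    $pattern = '/<' . preg_quote($element, '/') . '\b[^>]*?\s'
             . preg_quote($attribute, '/') . '="([^"]*)"/i';
    return preg_match($pattern, $xml, $match) ? $match[1] : null;
}

$sampleXml = '<POPULARITY URL="example.com/" TEXT="1234"/><LINKSIN NUM="567"/>'; // made-up data
echo '<p>Popularity: ' . (alexaAttribute($sampleXml, 'POPULARITY', 'TEXT') ?? 0) . "</p>\n";
echo '<p>Links: ' . (alexaAttribute($sampleXml, 'LINKSIN', 'NUM') ?? 0) . "</p>\n";
```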

To get the Dmoz categories.

preg_match_all('/CAT\sID="(.*)"/U',$alexaXml,$match);
echo "<p>Dmoz cats: ";
if(count($match[1])){
  echo '<pre>'.print_r($match[1],true).'</pre>';
}else{
  echo 0;
}
echo '</p>';
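If you want the categories as an array you can reuse, or as an HTML list rather than print_r() output, a minimal sketch follows. dmozCategories() is a hypothetical helper and the sample string is made up; the pattern uses [^"]* instead of .* with the U modifier, which stops at the closing quote in the same way but reads more plainly.

```php
<?php
// A minimal sketch: collect every Dmoz category ID and print them as an HTML list.
// dmozCategories() is a hypothetical helper.
function dmozCategories(string $xml): array
{
    preg_match_all('/<CAT\s+ID="([^"]*)"/', $xml, $match);
    return $match[1]; // an empty array when nothing matched
}

$sampleXml = '<CAT ID="Top/Example/One"/><CAT ID="Top/Example/Two"/>'; // made-up data
$cats = dmozCategories($sampleXml);
echo '<p>Dmoz cats: ' . count($cats) . "</p>\n";
if ($cats) {
    echo "<ul>\n";
    foreach ($cats as $cat) {
        // htmlspecialchars() escapes any &, < or > in the category text.
        echo '<li>' . htmlspecialchars($cat) . "</li>\n";
    }
    echo "</ul>\n";
}
```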

You can also view the data on the Alexa site itself by printing a couple of links.

echo  '<a href="http://www.alexa.com/data/ds/linksin?q=link%3A'.urlencode($url).'&url=http%3A//'.urlencode($url).'/" title="Alexa Links">Links</a>';
echo '<br />';
echo  '<a href="http://www.alexa.com/data/details/traffic_details/'.urlencode($url).'" title="Alexa Data">Data</a>';

There is more information available than this. To see everything you can extract, copy the feed URL given above into a browser window and view the output directly. I suggest doing this in Firefox because of the nice way it displays XML.
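If you would rather stay in PHP, the following minimal sketch lists every element and attribute name in a response, which is a quick way to see what else could be extracted. listXmlFields() is a hypothetical helper built on PHP's DOM extension, and the sample string is made up.

```php
<?php
// A minimal sketch: list each element name, or ELEMENT@ATTRIBUTE pair, once.
// listXmlFields() is a hypothetical helper.
function listXmlFields(string $xml): array
{
    $doc = new DOMDocument();
    if (!$doc->loadXML($xml)) {
        return []; // not well-formed XML
    }
    $fields = [];
    // getElementsByTagName('*') returns every element in document order.
    foreach ($doc->getElementsByTagName('*') as $element) {
        if (!$element->hasAttributes()) {
            $fields[$element->nodeName] = true;
            continue;
        }
        foreach ($element->attributes as $attr) {
            $fields[$element->nodeName . '@' . $attr->nodeName] = true;
        }
    }
    return array_keys($fields); // using array keys removes duplicates
}

$sampleXml = '<ROOT><POPULARITY URL="example.com/" TEXT="1234"/><LINKSIN NUM="567"/></ROOT>'; // made-up data
print_r(listXmlFields($sampleXml));
```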

Comments

You really need to learn about XML parsing in PHP; it's much more efficient.

Indy (Sun, 02/17/2008 - 12:25)


I know about XML parsing in PHP. Here is how to do the same sort of thing with PHP's XML parser.

$url = 'www.bbc.co.uk';
$querystring = 'http://xml.alexa.com/data?cli=10&dat=nsa&ver=quirk-searchstatus&uid=19700101000000&userip=127.0.0.1&url='.urlencode($url);
$ch = curl_init();
$user_agent = 'Mozilla/4.0';
curl_setopt ($ch, CURLOPT_URL, $querystring);
curl_setopt ($ch, CURLOPT_USERAGENT, $user_agent);
curl_setopt ($ch, CURLOPT_HEADER, 0);
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt ($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt ($ch, CURLOPT_TIMEOUT, 120);
$alexaXml = curl_exec($ch);
curl_close($ch);
 
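$xml_parser = xml_parser_create();
// Keep element and attribute names exactly as they appear in the feed.
xml_parser_set_option($xml_parser, XML_OPTION_CASE_FOLDING, false);
// startElement() runs for every opening tag, endElement() for every closing tag.
xml_set_element_handler($xml_parser,'startElement','endElement');
// The third argument (true) marks this as the final (and only) chunk of data.
if(!xml_parse($xml_parser,$alexaXml,true)){
 echo 'XML error: '.xml_error_string(xml_get_error_code($xml_parser)).'<br />';
}
xml_parser_free($xml_parser);

function startElement($xmlParser,$name,$attribs){
 if($name=='CAT'){
  if(isset($attribs['ID'])){
   echo 'DMOZ Cat = '.$attribs['ID'].'<br />';
  }
 }elseif($name=='LINKSIN'){
  if(isset($attribs['NUM'])){
   echo 'Alexa Links = '.$attribs['NUM'].'<br />';
  }
 }elseif($name=='POPULARITY'){
  if(isset($attribs['TEXT'])){
   echo 'Alexa Rank = '.$attribs['TEXT'].'<br />';
  }
 }
}
// Nothing needs to happen when an element closes.
function endElement($xmlParser,$name){
}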

Just expand the startElement function to include whatever you are looking for.
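One way to make that expansion easier is to drive the handler from a table of element and attribute names, so that adding a value means adding a row rather than another elseif. This is a minimal sketch: parseAlexaFields() is a hypothetical helper that collects values in an array instead of echoing them, using closures as handlers, and the sample string is made up.

```php
<?php
// A minimal sketch: collect the attribute listed in $wanted for each element.
// parseAlexaFields() is a hypothetical helper.
function parseAlexaFields(string $xml, array $wanted): array
{
    $found = [];
    $parser = xml_parser_create();
    // Keep element and attribute names exactly as they appear in the XML.
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);
    xml_set_element_handler(
        $parser,
        // Called for each opening tag: record the wanted attribute if present.
        function ($p, string $name, array $attribs) use ($wanted, &$found) {
            if (isset($wanted[$name], $attribs[$wanted[$name]])) {
                $found[$name][] = $attribs[$wanted[$name]];
            }
        },
        // Called for each closing tag: nothing to do.
        function ($p, string $name) {
        }
    );
    // true marks this string as the final chunk; xml_parse() returns 1 on success.
    $ok = xml_parse($parser, $xml, true);
    return $ok ? $found : [];
}

$sampleXml = '<ROOT><POPULARITY URL="example.com/" TEXT="1234"/><LINKSIN NUM="567"/>'
    . '<CAT ID="Top/Example/One"/><CAT ID="Top/Example/Two"/></ROOT>'; // made-up data
print_r(parseAlexaFields($sampleXml, ['POPULARITY' => 'TEXT', 'LINKSIN' => 'NUM', 'CAT' => 'ID']));
```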

Just out of interest, I timed the two different ways of getting at the data: XML parsing and regular expressions. To make sure nothing odd was going on, I downloaded the data only once at the top of the script and then ran each method 10,000 times on the same data. I didn't print anything out in case that affected things, but I left in all of the if statements. Here are the average times:

Average regular expression time: 0.00013191080093384
Average XML parse time: 0.00088728330135345

There isn't much in it in absolute terms, but on these figures the regular expressions are roughly six to seven times quicker than XML parsing. Unless I have done something wrong along the way?
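The timing code itself isn't shown here, so the following is a minimal sketch of how such a comparison could be set up, not the code that produced the figures above. It uses hrtime() (PHP 7.3 or later) as the clock, a made-up sample string, and hypothetical names averageSeconds(), $regexParse and $xmlParse; results will vary from machine to machine.

```php
<?php
// A minimal sketch of a timing harness: run a method many times on the same,
// already-downloaded string and return the mean time per run in seconds.
// averageSeconds() is a hypothetical helper.
function averageSeconds(callable $fn, int $runs = 10000): float
{
    $start = hrtime(true); // monotonic clock, in nanoseconds
    for ($i = 0; $i < $runs; $i++) {
        $fn();
    }
    return (hrtime(true) - $start) / 1e9 / $runs;
}

$xml = '<ROOT><LINKSIN NUM="567"/></ROOT>'; // made-up sample data

// Method 1: regular expression.
$regexParse = function () use ($xml) {
    return preg_match('/<LINKSIN NUM="([^"]*)"/i', $xml, $m) ? $m[1] : '0';
};

// Method 2: xml_* parser (default case folding upper-cases the names).
$xmlParse = function () use ($xml) {
    $num = '0';
    $xp = xml_parser_create();
    xml_set_element_handler(
        $xp,
        function ($parser, $name, $attribs) use (&$num) {
            if ($name === 'LINKSIN' && isset($attribs['NUM'])) {
                $num = $attribs['NUM'];
            }
        },
        function ($parser, $name) {
        }
    );
    xml_parse($xp, $xml, true);
    return $num;
};

printf("Average regular expression time %.10f\n", averageSeconds($regexParse));
printf("Average XML parse time %.10f\n", averageSeconds($xmlParse));
```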

Is it possible to get the web traffic data from Alexa into an XML file through one of these requests?

Marty Martin (Tue, 03/25/2008 - 14:52)

Not that I know of. You can look at the traffic details for any site by going to
http://www.alexa.com/data/details/traffic_details/{url}
So for hashbangcode.com this would be
http://www.alexa.com/data/details/traffic_details/hashbangcode.com
However, I don't see a way of getting to the traffic data itself. Not that the data is of much use anyway!

Can we get the keywords that are displayed for each site from data.alexa.com or xml.alexa.com?

Mohamed Mahmoud (Wed, 03/03/2010 - 17:13)

Not that I can see, although some keywords appear on the traffic details page, so it must be possible to get hold of them. Let me know if you find the API call you need. :)

I'm looking for a way to scrape data on a daily basis. In other words, I need a script that will go to Alexa every day, scrape some information and then save it to either a file or a database. How can I do that? Thank you.

rhoula (Wed, 08/15/2012 - 18:00)


Just create a script like the one above (using either regular expressions or XML parsing), but save the information into a database table or a file instead of printing it.
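As a minimal sketch of the saving step, assuming the pdo_sqlite extension is available: saveAlexaFigures() and the alexa_daily table are hypothetical names, and the values passed in the usage lines are made-up placeholders for whatever the regex or XML code extracted. Other PDO drivers such as MySQL would need only small changes to the SQL.

```php
<?php
// A minimal sketch: save one day's figures to a database table instead of
// printing them. saveAlexaFigures() and alexa_daily are hypothetical names.
function saveAlexaFigures(PDO $db, string $site, string $popularity, string $links): void
{
    $db->exec('CREATE TABLE IF NOT EXISTS alexa_daily (
        day        TEXT NOT NULL,
        site       TEXT NOT NULL,
        popularity TEXT,
        links      TEXT,
        PRIMARY KEY (day, site)
    )');
    // REPLACE overwrites the row for the same day and site, so a re-run is harmless.
    // The prepared statement keeps the scraped values out of the SQL text.
    $stmt = $db->prepare('REPLACE INTO alexa_daily (day, site, popularity, links) VALUES (?, ?, ?, ?)');
    $stmt->execute([date('Y-m-d'), $site, $popularity, $links]);
}

// An in-memory database keeps the example self-contained; a daily job would use
// a file instead, e.g. new PDO('sqlite:/path/to/alexa.sqlite').
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
// Made-up values standing in for the popularity and links extracted earlier.
saveAlexaFigures($db, 'www.bbc.co.uk', '1234', '567');
```

Run once a day, this adds one row per site per day.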

All you need to do then is create a cron job that runs the script once a day. For example, the following crontab entry runs it at 23:00 every day:

0 23 * * * php /path/to/script.php
