Downloading Alexa Data With PHP

Wednesday, January 23, 2008 - 11:01

It is widely known that Alexa's visitor numbers are far from accurate, but Alexa does offer an XML feed that exposes all of the data it holds on a site, which is more than just visitor numbers. By passing the correct parameters to this feed you can find out related links, contact and domain information, the Alexa rank, associated keywords and Dmoz listings.

As an example here is a feed URL for getting information about the bbc.co.uk page.

http://xml.alexa.com/data?cli=10&dat=nsa&ver=quirk-searchstatus&uid=19700101000000&userip=127.0.0.1&url=www.bbc.co.uk

So to get information about any site all you have to do is pass that site's address in the url parameter at the end of this URL.
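
As a rough sketch of that idea, the feed URL can be assembled from its parts with http_build_query(). The function name alexa_feed_url() is made up for this illustration, and the fixed parameters are simply copied from the example URL above; I am not claiming to know what each of them means to Alexa.

```php
<?php
// Hypothetical helper (not part of any Alexa library): builds the feed
// URL for a given site. The fixed parameters are copied verbatim from
// the example URL in this post.
function alexa_feed_url(string $site): string
{
    $params = [
        'cli'    => 10,
        'dat'    => 'nsa',
        'ver'    => 'quirk-searchstatus',
        'uid'    => '19700101000000',
        'userip' => '127.0.0.1',
        'url'    => $site, // http_build_query() URL-encodes each value
    ];

    // Pass '&' explicitly so the result does not depend on the
    // arg_separator.output setting in php.ini.
    return 'http://xml.alexa.com/data?' . http_build_query($params, '', '&');
}

echo alexa_feed_url('www.bbc.co.uk'), "\n";
```

For www.bbc.co.uk this should produce the same URL as the one shown above, and swapping in another site only means changing the argument.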

To get this information in a usable form with PHP you can use the curl functions. To download the Alexa feed into PHP use the following code:

// The site to look up, and the feed URL with the site appended.
$url = 'www.bbc.co.uk';
$querystring = 'http://xml.alexa.com/data?cli=10&dat=nsa&ver=quirk-searchstatus&uid=19700101000000&userip=127.0.0.1&url='.urlencode($url);
$user_agent = "Mozilla/4.0";

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $querystring);
curl_setopt($ch, CURLOPT_USERAGENT, $user_agent);
// Leave the HTTP response headers out so that $alexaXml holds only the XML.
curl_setopt($ch, CURLOPT_HEADER, 0);
// Return the response as a string instead of printing it.
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
// Give up after 120 seconds.
curl_setopt($ch, CURLOPT_TIMEOUT, 120);
// curl_exec() returns false if the request fails.
$alexaXml = curl_exec($ch);
curl_close($ch);

You now have a variable called $alexaXml that contains all of the information you need. You could use one of the XML parsers built into PHP, but a simpler method is to extract the values you need using regular expressions. Here are a few examples.

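To get the Alexa popularity (the site's rank):

// Capture the URL and TEXT attributes of the POPULARITY element.
preg_match('/<POPULARITY URL="([^"]*)" TEXT="([^"]*)"\s*\/>/i',$alexaXml,$match);
echo "<p>Popularity: ";
if(count($match)>0){
  // $match[2] is the TEXT attribute, which holds the rank.
  echo $match[2];
}else{
  echo 0;
}
echo '</p>';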

To get the number of Alexa links:

// Capture the NUM attribute of the LINKSIN element.
preg_match('/LINKSIN NUM="([^"]*)"/i',$alexaXml,$match);
echo "<p>Links: ";
if(count($match)>0){
  echo $match[1];
}else{
  echo 0;
}
echo '</p>';

To get the Dmoz categories:

// Collect the ID attribute of every CAT element.
preg_match_all('/CAT\sID="(.*)"/U',$alexaXml,$match);
echo "<p>Dmoz cats: ";
if(count($match[1])){
  echo '<pre>'.print_r($match[1],true).'</pre>';
}else{
  echo 0;
}
echo '</p>';

You can also see the data directly by printing off a couple of links.

echo  '<a href="http://www.alexa.com/data/ds/linksin?q=link%3A'.urlencode($url).'&url=http%3A//'.urlencode($url).'/" title="Alexa Links">Links</a>';
echo '<br />';
echo  '<a href="http://www.alexa.com/data/details/traffic_details/'.urlencode($url).'" title="Alexa Data">Data</a>';

There is more information available than this. To see everything that you can extract just copy the URL at the top into a browser window and view the output directly. I suggest doing this in Firefox because of the nice way in which it displays XML.

Philip Norton

Phil is the founder and administrator of #! code and is an IT professional working in the North West of the UK.

Comments

Submitted by philipnorton42 on Wed, 02/20/2008 - 16:04

I know about XML parsing in PHP. Here is how to do the same sort of thing with the XML engine in PHP.

$url = 'www.bbc.co.uk';
$querystring = 'http://xml.alexa.com/data?cli=10&dat=nsa&ver=quirk-searchstatus&uid=19700101000000&userip=127.0.0.1&url='.urlencode($url);
$ch = curl_init();
$user_agent = 'Mozilla/4.0';
curl_setopt ($ch, CURLOPT_URL, $querystring);
curl_setopt ($ch, CURLOPT_USERAGENT, $user_agent);
curl_setopt ($ch, CURLOPT_HEADER, 0);
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt ($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt ($ch, CURLOPT_TIMEOUT, 120);
$alexaXml = curl_exec($ch);
curl_close($ch);
 
$xml_parser = xml_parser_create();
xml_parser_set_option($xml_parser, XML_OPTION_CASE_FOLDING, false);
xml_set_element_handler($xml_parser,'startElement','endElement');
xml_parse($xml_parser,$alexaXml);
xml_parser_free($xml_parser);
 
// Called for every opening tag; $attribs holds that tag's attributes.
function startElement($xmlParser,$name,$attribs){
 if($name=='CAT'){
  if(isset($attribs['ID'])){
   echo 'DMOZ Cat = '.$attribs['ID'].'<br />';
  }
 }elseif($name=='LINKSIN'){
  if(isset($attribs['NUM'])){
   echo 'Alexa Links = '.$attribs['NUM'].'<br />';
  }
 }elseif($name=='POPULARITY'){
  if(isset($attribs['TEXT'])){
   echo 'Alexa Rank = '.$attribs['TEXT'].'<br />';
  }
 }
}
// Nothing needs to happen on closing tags.
function endElement($xmlParser,$name){
}

Just expand the startElement function to include whatever you are looking for.
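
One possible way of expanding it, sketched here rather than taken from the script above, is to drive the handler from a list of element and attribute names and collect the values into an array instead of echoing them. The function name alexa_extract() and the sample XML are made up for this illustration; only the element and attribute names come from the examples in this post.

```php
<?php
// Made-up sample standing in for the downloaded feed. Only the element
// and attribute names are taken from the examples in this post.
$alexaXml = <<<'XML'
<ALEXA>
  <SD>
    <LINKSIN NUM="1234"/>
    <POPULARITY URL="example.com/" TEXT="42"/>
  </SD>
  <DMOZ>
    <CAT ID="Top/Example/One"/>
    <CAT ID="Top/Example/Two"/>
  </DMOZ>
</ALEXA>
XML;

// Hypothetical helper: $wanted maps an element name to the attribute to
// collect from it. Returns the collected values grouped by element name.
function alexa_extract(string $alexaXml, array $wanted): array
{
    $found = [];
    $parser = xml_parser_create();
    // Keep element names exactly as they appear in the document.
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);
    xml_set_element_handler(
        $parser,
        // Start handler: store the wanted attribute if the tag has it.
        function ($parser, $name, $attribs) use ($wanted, &$found) {
            if (isset($wanted[$name], $attribs[$wanted[$name]])) {
                $found[$name][] = $attribs[$wanted[$name]];
            }
        },
        // End handler: nothing to do on closing tags.
        function ($parser, $name) {
        }
    );
    xml_parse($parser, $alexaXml, true);
    return $found;
}

$found = alexa_extract($alexaXml, [
    'CAT'        => 'ID',
    'LINKSIN'    => 'NUM',
    'POPULARITY' => 'TEXT',
]);
print_r($found);
```

Adding another value then only means adding another entry to the array passed in, rather than another elseif branch.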

Just out of interest I timed the two different ways of getting at the data: XML parsing and regular expressions. To ensure that nothing odd was going on I downloaded the data only once at the top of the script and then ran each method 10,000 times on the same data. I didn't print anything, just in case that affected things, but I left in all of the if statements. Here are the average times:

Average regular expression time: 0.00013191080093384
Average XML parse time: 0.00088728330135345

In absolute terms there isn't much in it, but regular expressions come out roughly 6.7 times quicker than XML parsing. Unless I have done something wrong along the way?
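
For anyone who wants to repeat the comparison, here is a minimal sketch of how such a timing run might be set up. It is not the script I used for the numbers above: the sample XML is made up, average_time() is a hypothetical helper, and each task only extracts the rank, so the figures it prints will differ.

```php
<?php
// Made-up sample standing in for the feed, so nothing is downloaded
// inside the timed loops.
$alexaXml = '<ALEXA><SD><LINKSIN NUM="1234"/>'
          . '<POPULARITY URL="example.com/" TEXT="42"/></SD></ALEXA>';

// Hypothetical helper: runs $task $runs times and returns the average
// time per run in seconds.
function average_time(callable $task, int $runs): float
{
    $start = microtime(true);
    for ($i = 0; $i < $runs; $i++) {
        $task();
    }
    return (microtime(true) - $start) / $runs;
}

// Regular expression version: returns the rank without printing it.
$regexTask = function () use ($alexaXml) {
    preg_match('/<POPULARITY URL="([^"]*)" TEXT="([^"]*)"/i', $alexaXml, $match);
    return $match[2] ?? 0;
};

// XML parser version: creates a fresh parser on every run.
$xmlTask = function () use ($alexaXml) {
    $rank = 0;
    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);
    xml_set_element_handler(
        $parser,
        function ($parser, $name, $attribs) use (&$rank) {
            if ($name === 'POPULARITY' && isset($attribs['TEXT'])) {
                $rank = $attribs['TEXT'];
            }
        },
        function ($parser, $name) {
        }
    );
    xml_parse($parser, $alexaXml, true);
    return $rank;
};

printf("Average regular expression time %.10f\n", average_time($regexTask, 10000));
printf("Average XML parse time %.10f\n", average_time($xmlTask, 10000));
```

Note that the XML version pays for creating a parser on every run, which is part of the cost of that approach but worth bearing in mind when reading the results.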

Is it possible to get the web traffic data from Alexa into an XML file through one of these requests?
Submitted by philipnorton42 on Tue, 03/25/2008 - 21:01

Not that I know of. You can look at the traffic details for any site by going to http://www.alexa.com/data/details/traffic_details/{url}, so for hashbangcode.com this would be http://www.alexa.com/data/details/traffic_details/hashbangcode.com. However, I don't see a way of getting to the traffic data itself. Not that the data is of much use anyway!
Submitted by philipnorton42 on Thu, 03/04/2010 - 10:00

Not that I can see, although some keywords appear on the traffic details page so it must be possible to get hold of them. Let me know if you find the API call you need. :)
Submitted by philipnorton42 on Thu, 08/16/2012 - 10:08

Just create a script like I have above (either via regex or xml parsing) but save the information into a database table or something.

All you need to do then is create a cron job that will run the script once a day. Something like the following would do this:

0 23 * * * php /path/to/script.php
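
The "save the information" part could be as simple as appending one row per run to a CSV file. This is only a sketch: save_alexa_rank() is a made-up name, the file location is an arbitrary choice, and the rank is a placeholder for whatever value the regex or XML parsing code extracted. It also assumes none of the values contain a comma.

```php
<?php
// Hypothetical helper: appends one "date,site,rank" row to a CSV file.
// Returns true if the row was written.
function save_alexa_rank(string $file, string $date, string $site, string $rank): bool
{
    $row = implode(',', [$date, $site, $rank]) . "\n";
    // FILE_APPEND keeps earlier rows; LOCK_EX guards against two runs
    // writing at the same moment.
    return file_put_contents($file, $row, FILE_APPEND | LOCK_EX) !== false;
}

// '42' is a placeholder; in the real script this would be the value
// pulled out of the Alexa feed.
$file = sys_get_temp_dir() . '/alexa_rank.csv';
save_alexa_rank($file, date('Y-m-d'), 'www.bbc.co.uk', '42');
```

Run once a day, this builds up a file with one line per day that can be opened in a spreadsheet or imported into a database table later.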

I'm looking for a way to scrape data on a daily basis. In other words, I need a script that will go to Alexa every day, scrape some information and then save it to either a file or a database. How can I do that?

Thank you
