Quickest Way To Download A Web Page With PHP

Monday, July 7, 2008 - 11:55

There are lots of different ways to download a web page using PHP, but which is the fastest? In this post I will go through as many different methods of downloading a web page as I can and test them to see which is the quickest.

Here is a list of the different methods.

  • The PHP curl library.
  • Snoopy the PHP web browser. Basically a wrapper for fsockopen.
  • fsockopen().
  • fopen() with feof().
  • fopen() with stream_get_contents().
  • file() and then implode().
  • file_get_contents() function.

Each method will retrieve the contents of a web page 50 times in order to get a decent spread of times. On each run the time taken will be recorded in an array, and this array will then be used at the end to calculate some statistics.
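
The run-and-record pattern described here can be sketched as a small helper. This is only an illustration, assuming PHP 7 or later; the function name benchmark() is hypothetical and is not part of the test script in this post.

```php
<?php
// Hypothetical helper: call $fn $runs times and return the time taken by each call.
function benchmark(callable $fn, int $runs = 50): array
{
    $times = array();
    for ($i = 0; $i < $runs; ++$i) {
        $start = microtime(true);            // float seconds since the Unix epoch
        $fn();
        $times[] = microtime(true) - $start; // time taken by this run
    }
    return $times;
}

// Usage: an empty closure measures the overhead of the timing code itself.
$times = array('nothing' => benchmark(function () {}, 50));
echo count($times['nothing']), " runs recorded\n";
```

Passing each download method in as a closure would keep the timing code in one place, at the cost of a small function-call overhead on every run.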

Part of the test will be to do nothing to see how much time PHP spends running the benchmarking functions.

Also, I will run the functions against two different types of web page: one with lots of content and one with no content, as the latter will show the base speed of each function. The hashbangcode.com server will be used as part of the test, but I will also use the php.net site to see what effect a site with high bandwidth has on the results.

The code starts by setting up the variables for the rest of the test.

<?php
// increase the amount of time available for the functions to run.
set_time_limit(1234);
 
// include the Snoopy class
include 'Snoopy.class.php';
 
// initialise variables
$contents = '';
$times = array();
 
// set URL
$url = 'http://www.hashbangcode.com/';
 
// set domain - used for fsockopen()
$domain = 'www.hashbangcode.com';
 
// generate microtime value
function getmicrotime($t){
  list($usec, $sec) = explode(" ",$t);
  return ((float)$usec + (float)$sec);
}
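
As an aside, microtime() can also return the float directly when it is passed true, which would make the parsing helper optional. The sketch below compares the two forms; the variable names are only for illustration.

```php
<?php
// Same helper as in the setup code: parses the "usec sec" string form of microtime().
function getmicrotime($t)
{
    list($usec, $sec) = explode(" ", $t);
    return ((float)$usec + (float)$sec);
}

$asString = getmicrotime(microtime()); // parse the string form by hand
$asFloat  = microtime(true);           // ask PHP for the float directly

// Both are seconds since the Unix epoch, taken a moment apart.
printf("difference: %.6f seconds\n", $asFloat - $asString);
```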

Now that everything is set up we can run through each of the methods and record the times.

// run every method 50 times
for($i=0;$i<50;++$i){
 ////////////////////////////////////////////////////////////////////////////////////
 // doing nothing
 $start = microtime();
 $end = microtime();
 $times['nothing'][] = (getmicrotime($end) - getmicrotime($start));
 
 ////////////////////////////////////////////////////////////////////////////////////
 //curl
 $start = microtime();
 
 $ch = curl_init();
 $user_agent = "Mozilla/4.0";
 curl_setopt ($ch, CURLOPT_URL, $url);
 curl_setopt ($ch, CURLOPT_USERAGENT, $user_agent);
 curl_setopt ($ch, CURLOPT_HEADER, 1);
 curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
 curl_setopt ($ch, CURLOPT_FOLLOWLOCATION, 1);
 curl_setopt ($ch, CURLOPT_TIMEOUT, 120);
 $contents = curl_exec($ch);
 curl_close($ch);
 
 $end = microtime();
 $times['curl'][] = (getmicrotime($end) - getmicrotime($start));
 
 ////////////////////////////////////////////////////////////////////////////////////
 //snoopy
 $start = microtime();
 
 $snoopy = new Snoopy();
 $snoopy->fetch($url);
 $contents = $snoopy->results;
 
 $end = microtime();
 $times['snoopy'][] = (getmicrotime($end) - getmicrotime($start));
 
 ////////////////////////////////////////////////////////////////////////////////////
 //fopen
 $contents = ''; // reset so the previous method's output is not appended to
 $start = microtime();
 
 if($handle = @fopen($url, "r")){
  while(!feof($handle)){
   $contents .= fread($handle, 4096);
  }
  fclose($handle);
 }
 
 $end = microtime();
 $times['fopen'][] = (getmicrotime($end) - getmicrotime($start));
 
 ////////////////////////////////////////////////////////////////////////////////////
 //fopen with stream_get_contents
 $start = microtime();
 
 if($stream = fopen($url, 'r')){
  $contents = stream_get_contents($stream);
  fclose($stream);
 }
 
 $end = microtime();
 $times['stream'][] = (getmicrotime($end) - getmicrotime($start));
 
 ////////////////////////////////////////////////////////////////////////////////////
 //file
 $start = microtime();
 
 $html = implode('', file($url));
 
 $end = microtime();
 $times['file'][] = (getmicrotime($end) - getmicrotime($start));
 
 ////////////////////////////////////////////////////////////////////////////////////
 // fsockopen
 $contents = ''; // reset so the previous method's output is not appended to
 $start = microtime();
 
 $fp = fsockopen($domain, 80, $errno, $errstr, 30);
 if(!$fp){
  $contents .= $errstr.' ('.$errno.')<br />';
 }else{
  // send headers: the request line takes the path, the Host header takes the domain
  $out = "GET ".str_replace('http://'.$domain,'',$url)." HTTP/1.1\r\n";
  $out .= "Host: ".$domain."\r\n";
  $out .= "User-Agent: FSOCKOPEN\r\n";
  $out .= "Connection: Close\r\n\r\n";
  fwrite($fp, $out);
  while(!feof($fp)){
   $contents .= fgets($fp, 4096);
  };
  fclose($fp);
 };
 $end = microtime();
 $times['fsockopen'][] = (getmicrotime($end) - getmicrotime($start));
 
 ////////////////////////////////////////////////////////////////////////////////////
 //file_get_contents
 $start = microtime();
 
 $contents = file_get_contents($url);
 
 $end = microtime();
 $times['file_get_contents'][] = (getmicrotime($end) - getmicrotime($start));
}
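
One thing to keep in mind when comparing the methods is that they do not all store the same data. The fsockopen() code reads the raw response, and the curl code sets CURLOPT_HEADER, so both keep the HTTP headers along with the page, whereas the fopen() based methods return only the body. Here is a minimal sketch of separating the two; split_http_response() is a hypothetical helper and the response string is made up.

```php
<?php
// Hypothetical helper: split a raw HTTP response into status line, headers and body.
function split_http_response(string $raw): array
{
    // The header block ends at the first blank line (CRLF CRLF).
    $parts = explode("\r\n\r\n", $raw, 2);
    $headers = explode("\r\n", $parts[0]);
    return array(
        'status'  => $headers[0],
        'headers' => array_slice($headers, 1),
        'body'    => isset($parts[1]) ? $parts[1] : '',
    );
}

// A made-up response, standing in for what the fsockopen() loop reads from the socket.
$raw = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\nConnection: close\r\n\r\n<html></html>";
$response = split_http_response($raw);
echo $response['status'], "\n";       // HTTP/1.1 200 OK
echo strlen($response['body']), "\n"; // 13
```

This sketch does not decode chunked transfer encoding, which an HTTP/1.1 server may use, and it assumes a single header block, so it would need more work when redirects are followed.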

Now that the test is complete we can do something with the times. First I sorted the times and then worked out simple things like the average, minimum and maximum time taken.

// sort the times
sort($times['nothing']);
sort($times['curl']);
sort($times['snoopy']);
sort($times['fopen']);
sort($times['stream']);
sort($times['file']);
sort($times['fsockopen']);
sort($times['file_get_contents']);
 
// print out the times
echo '<pre>'.print_r($times,true).'</pre>';
 
// calculate stats for times
foreach($times as $method=>$time){
 echo '<p>'.$method.' average = '.(array_sum($time)/count($time)).'<br />min = '.$time[0].'<br />max = '.$time[count($time)-1].'</p>';
}
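
The same statistics could be wrapped in a function. The sketch below also adds a median, which is my own addition rather than something the test above calculates; it can be useful because a single slow run pulls the average up. The name summarise() is hypothetical, the timings are made up, and PHP 7 or later is assumed.

```php
<?php
// Hypothetical helper: average, minimum, maximum and median of one method's timings.
function summarise(array $time): array
{
    sort($time);
    $n = count($time);
    $mid = intdiv($n, 2);
    // Odd count: the middle value. Even count: the mean of the two middle values.
    $median = ($n % 2 === 1) ? $time[$mid] : ($time[$mid - 1] + $time[$mid]) / 2;
    return array(
        'average' => array_sum($time) / $n,
        'min'     => $time[0],
        'max'     => $time[$n - 1],
        'median'  => $median,
    );
}

// Made-up timings in seconds (one slow run), not taken from the results in this post.
$stats = summarise(array(0.25, 1.25, 0.25, 0.25));
echo 'average = '.$stats['average'].', median = '.$stats['median']."\n";
```

With these made-up numbers the average is 0.5 but the median is 0.25, which shows how one outlier can skew the average.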

Here are the results for downloading a blank web page. I won't print out all of the times as this would take up a lot of space and be utterly pointless.

nothing average = 7.320852840648E-6
min = 5.0067901611328E-6
max = 1.6927719116211E-5
 
curl average = 0.083804859834559
min = 0.073469161987305
max = 0.16210603713989
 
snoopy average = 0.089839958677105
min = 0.074288129806519
max = 0.14481902122498
 
fsockopen average = 0.13405424005845
min = 0.10800695419312
max = 0.39040613174438
 
fopen average = 0.12635245042689
min = 0.10596394538879
max = 0.19953107833862
 
stream average = 0.1248401520299
min = 0.10655999183655
max = 0.17397904396057
 
file average = 0.12469244470783
min = 0.10594987869263
max = 0.19219899177551
 
file_get_contents average = 0.12691019095627
min = 0.10590195655823
max = 0.17201805114746

Here are the results for accessing the front page of the hashbangcode.com site, which is 27.17 kB.

nothing average = 7.9659854664522E-6
min = 5.0067901611328E-6
max = 4.2915344238281E-5
 
curl average = 0.40228473438936
min = 0.34250092506409
max = 1.2593679428101
 
snoopy average = 0.37644760748919
min = 0.34368300437927
max = 0.60013008117676
 
fsockopen average = 0.12776509920756
min = 0.10803699493408
max = 0.22104907035828
 
fopen average = 0.41444192213171
min = 0.37545895576477
max = 0.77188587188721
 
stream average = 0.41173196306416
min = 0.37820982933044
max = 0.62948799133301
 
file average = 0.40511836725123
min = 0.3781590461731
max = 0.69333410263062
 
file_get_contents average = 0.41732146693211
min = 0.37894988059998
max = 0.68813395500183

Here are the results for accessing the front page of the php.net site, which is 31.13 kB.

nothing average = 7.9566357182521E-6
min = 5.0067901611328E-6
max = 4.1961669921875E-5
 
curl average = 1.3938542253831
min = 1.0989301204681
max = 1.7416069507599
 
snoopy average = 1.3927017473707
min = 1.1879198551178
max = 1.6753461360931
 
fsockopen average = 0.61519114176432
min = 0.56387495994568
max = 1.0507898330688
 
fopen average = 1.7791449415917
min = 1.5167849063873
max = 5.4117441177368
 
stream average = 1.6853758213567
min = 1.4832580089569
max = 2.3599371910095
 
file average = 1.7028319742165
min = 1.4881958961487
max = 2.3575241565704
 
file_get_contents average = 1.7295382069606
min = 1.5339889526367
max = 2.5368640422821

Here is a graph of the results.

The results show that if the page you are downloading has any content then the quickest way to download it is to use fsockopen(). If the page has little or no content then you are best off using curl, as fsockopen() seems to perform worst in this case.

So fsockopen() is the quickest way to get a web page. However, it is probably best to use Snoopy to do all of your fsockopen() calls as there is no point in reinventing the wheel to accomplish a simple task. If you really need the speed and are sure that you will always use the same page then use your own fsockopen() function. Snoopy does add some overhead, but I think the benefits of using Snoopy outweigh the drop in speed.
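
If you do go down the route of writing your own fsockopen() function, it might look something like the sketch below. Both function names are hypothetical, only plain http:// URLs on port 80 are handled, and PHP 7 or later is assumed.

```php
<?php
// Hypothetical helper: build a minimal HTTP/1.1 GET request for a URL.
function build_get_request(string $url): string
{
    $parts = parse_url($url);
    $path = isset($parts['path']) ? $parts['path'] : '/';
    if (isset($parts['query'])) {
        $path .= '?'.$parts['query'];
    }
    $out  = "GET ".$path." HTTP/1.1\r\n";
    $out .= "Host: ".$parts['host']."\r\n";
    $out .= "User-Agent: FSOCKOPEN\r\n";
    $out .= "Connection: Close\r\n\r\n";
    return $out;
}

// Hypothetical helper: fetch a URL and return the raw response (status line,
// headers and body), or false if the connection could not be opened.
function fetch_with_fsockopen(string $url, int $timeout = 30)
{
    $host = parse_url($url, PHP_URL_HOST);
    $fp = @fsockopen($host, 80, $errno, $errstr, $timeout);
    if (!$fp) {
        return false;
    }
    fwrite($fp, build_get_request($url));
    $contents = '';
    while (!feof($fp)) {
        $contents .= fgets($fp, 4096);
    }
    fclose($fp);
    return $contents;
}

// Building the request needs no network access, so it is safe to print here.
echo build_get_request('http://www.hashbangcode.com/');
```

Calling fetch_with_fsockopen($url) would then replace the inline socket code, but there is no handling of redirects, HTTPS or chunked responses here, which is the sort of thing Snoopy takes care of for you.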

What was odd with the data was the difference in time between downloading the home page of hashbangcode.com and that of php.net. I expected the php.net server to take less time, and although fsockopen() was far quicker than any other method, the slowest method took more than a second longer to download the php.net page than the hashbangcode.com page. There is a page size difference of about 4 kB, but this is not enough to affect the result like this. The only thing I can think of is that the distance to the php.net server is causing this speed difference. I ran the tests on my own computer in the UK, and the php.net server is hosted in the USA.
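
One way to check the distance theory would be to look at where the time goes within a single request. After curl_exec(), curl_getinfo($ch) returns an array of cumulative timers, and the sketch below turns those into per-phase durations. I did not do this in the test above; timing_phases() is a hypothetical helper and the numbers are made up.

```php
<?php
// Hypothetical helper: turn curl's cumulative timers into per-phase durations.
// $info is the array returned by curl_getinfo($ch) after curl_exec($ch).
function timing_phases(array $info): array
{
    return array(
        'dns'        => $info['namelookup_time'],
        'connect'    => $info['connect_time'] - $info['namelookup_time'],
        'first_byte' => $info['starttransfer_time'] - $info['connect_time'],
        'download'   => $info['total_time'] - $info['starttransfer_time'],
    );
}

// Made-up timers in seconds, for illustration only. Real values would come from
// curl_getinfo($ch) in the curl section of the test loop, before curl_close($ch).
$info = array(
    'namelookup_time'    => 0.125,
    'connect_time'       => 0.25,
    'starttransfer_time' => 0.5,
    'total_time'         => 1.0,
);
print_r(timing_phases($info));
```

If distance is the cause then I would expect the connect and first_byte phases to be noticeably longer for php.net, since each of them involves at least one round trip to the server. Bear in mind that the timers are harder to interpret when redirects are followed.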

Doing nothing takes about 0.000008 seconds in all cases, which is no time at all really and shows that the benchmarking function contributes a tiny amount of time to the overall time taken. This time could be taken away from the time taken for each method to give a more realistic result.
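
Taking the benchmarking time away could be done along these lines. The averages are copied from the hashbangcode.com front page results above; the variable names are only for illustration.

```php
<?php
// Average times in seconds, copied from the hashbangcode.com front page results.
$averages = array(
    'nothing'   => 7.9659854664522E-6,
    'curl'      => 0.40228473438936,
    'fsockopen' => 0.12776509920756,
);

// Remove the cost of the two microtime() calls from every other method.
$baseline = $averages['nothing'];
$adjusted = array();
foreach ($averages as $method => $average) {
    if ($method === 'nothing') {
        continue; // the baseline itself is not a download method
    }
    $adjusted[$method] = $average - $baseline;
}
print_r($adjusted);
```

The adjustment only changes the sixth decimal place, so it makes no difference to the ranking of the methods.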

If you know of any other method that I have missed then post a comment and let me know. Also, it would be interesting if people ran this on their own computers/hosts in order to find a global average time, as there are other factors involved in altering these times. If you run this test then please send me the results so that I can collate them.


Philip Norton

Phil is the founder and administrator of #! code and is an IT professional working in the North West of the UK.

Comments

 

Waow, Interesting .

I use curl ... 'm thinking to change

Excellent post on the speed.

However, there is another side to this question and that is availability. Many web servers do not allow fsockopen(), fopen(), file_get_contents(), etc. So while fsockopen() is the fastest, you may be forced to use a different method just to get the data.

Excellent blog post. I absolutely appreciate this site.
Stick with it!
