Quickest Way To Download A Web Page With PHP

7th July 2008 - 9 minutes read time

There are lots of different ways to download a web page using PHP, but which is the fastest? In this post I will go through as many different methods of downloading a web page as I can and test them to see which is the quickest.

Here is a list of the different methods.

  • The PHP curl library.
  • Snoopy the PHP web browser. Basically a wrapper around fsockopen().
  • fsockopen().
  • fopen() with feof().
  • fopen() with stream_get_contents().
  • file() and then implode().
  • file_get_contents() function.

Each method will retrieve the contents of a web page 50 times in order to get a decent spread of times. On each run the time taken will be recorded into an array, and this array will then be used at the end to calculate some statistics.

Part of the test will be to do nothing to see how much time PHP spends running the benchmarking functions.

Also, I will run the functions against two different types of web page, one with lots of content and one with no content, as this will show the base speed of each function. The hashbangcode.com server will be used as part of the test, but I will also use the php.net site to see what effect a site with high bandwidth has on the results.

The start of the code will be setting up the variables for the rest of the test.

    <?php
    // Increase the amount of time available for the functions to run.
    set_time_limit(1234);

    // Include the Snoopy class.
    include 'Snoopy.class.php';

    // Initialise variables.
    $contents = '';
    $times = array();

    // Set the URL to be downloaded.
    $url = 'http://www.hashbangcode.com/';

    // Set the domain - used for fsockopen().
    $domain = 'www.hashbangcode.com';

    // Convert the "msec sec" string returned by microtime() into a float.
    function getmicrotime($t) {
        list($usec, $sec) = explode(" ", $t);
        return ((float)$usec + (float)$sec);
    }
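
Note that the getmicrotime() helper is only needed because microtime() returns a "msec sec" string by default. On PHP 5 and later you could skip the helper entirely by passing true to microtime(), along these lines:

    // PHP 5+ only: microtime(true) returns a float directly, so no
    // string parsing is needed.
    $start = microtime(true);
    // ... do the work to be timed ...
    $end = microtime(true);
    $elapsed = $end - $start;

I have kept the string version in the test code below so it matches what was actually benchmarked.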

Now that is all set up we can run through each of the methods and record the times.

    for ($i = 0; $i < 50; ++$i) {
        ////////////////////////////////////////////////////////////////////////////////////
        // Doing nothing - a baseline for the cost of the benchmarking calls themselves.
        $start = microtime();
        $end = microtime();
        $times['nothing'][] = (getmicrotime($end) - getmicrotime($start));

        ////////////////////////////////////////////////////////////////////////////////////
        // curl
        $start = microtime();

        $ch = curl_init();
        $user_agent = "Mozilla/4.0";
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_USERAGENT, $user_agent);
        curl_setopt($ch, CURLOPT_HEADER, 1);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
        curl_setopt($ch, CURLOPT_TIMEOUT, 120);
        $contents = curl_exec($ch);
        curl_close($ch);

        $end = microtime();
        $times['curl'][] = (getmicrotime($end) - getmicrotime($start));

        ////////////////////////////////////////////////////////////////////////////////////
        // Snoopy
        $start = microtime();

        $snoopy = new Snoopy();
        $snoopy->fetch($url);
        $contents = $snoopy->results;

        $end = microtime();
        $times['snoopy'][] = (getmicrotime($end) - getmicrotime($start));

        ////////////////////////////////////////////////////////////////////////////////////
        // fopen() with feof()
        $start = microtime();

        $contents = '';
        if ($handle = @fopen($url, "r")) {
            while (!feof($handle)) {
                $contents .= fread($handle, 4096);
            }
            fclose($handle);
        }

        $end = microtime();
        $times['fopen'][] = (getmicrotime($end) - getmicrotime($start));

        ////////////////////////////////////////////////////////////////////////////////////
        // fopen() with stream_get_contents()
        $start = microtime();

        if ($stream = fopen($url, 'r')) {
            $contents = stream_get_contents($stream);
            fclose($stream);
        }

        $end = microtime();
        $times['stream'][] = (getmicrotime($end) - getmicrotime($start));

        ////////////////////////////////////////////////////////////////////////////////////
        // file() and implode()
        $start = microtime();

        $contents = implode('', file($url));

        $end = microtime();
        $times['file'][] = (getmicrotime($end) - getmicrotime($start));

        ////////////////////////////////////////////////////////////////////////////////////
        // fsockopen()
        $start = microtime();

        $contents = '';
        $fp = fsockopen($domain, 80, $errno, $errstr, 30);
        if (!$fp) {
            $contents .= $errstr.' ('.$errno.')<br />';
        } else {
            // Build and send a minimal HTTP request by hand.
            $out  = "GET ".str_replace('http://'.$domain, '', $url)." HTTP/1.1\r\n";
            $out .= "Host: ".$domain."\r\n";
            $out .= "User-Agent: FSOCKOPEN\r\n";
            $out .= "Connection: Close\r\n\r\n";
            fwrite($fp, $out);
            while (!feof($fp)) {
                $contents .= fgets($fp, 4096);
            }
            fclose($fp);
        }
        $end = microtime();
        $times['fsockopen'][] = (getmicrotime($end) - getmicrotime($start));

        ////////////////////////////////////////////////////////////////////////////////////
        // file_get_contents()
        $start = microtime();

        $contents = file_get_contents($url);

        $end = microtime();
        $times['file_get_contents'][] = (getmicrotime($end) - getmicrotime($start));
    }

Now that is all complete we can do something with the times. First I sorted the times and then worked out simple statistics like the average, minimum and maximum time taken.

    // Sort the times.
    sort($times['nothing']);
    sort($times['curl']);
    sort($times['snoopy']);
    sort($times['fopen']);
    sort($times['stream']);
    sort($times['file']);
    sort($times['fsockopen']);
    sort($times['file_get_contents']);

    // Print out the times.
    echo '<pre>'.print_r($times, true).'</pre>';

    // Calculate stats for the times.
    foreach ($times as $method => $time) {
        echo '<p>'.$method.' average = '.(array_sum($time) / count($time)).'<br />min = '.$time[0].'<br />max = '.$time[count($time) - 1].'</p>';
    }
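
As an aside, the averages can be pulled up by one or two unusually slow requests, so a median would be less sensitive to such outliers. Here is a minimal sketch of how that could be added (the median() helper is my own illustration rather than part of the test above), reusing the already sorted $times arrays:

    // Return the median of an already sorted array of times.
    function median($sorted) {
        $count = count($sorted);
        $middle = (int) floor($count / 2);
        if ($count % 2 == 0) {
            return ($sorted[$middle - 1] + $sorted[$middle]) / 2;
        }
        return $sorted[$middle];
    }

    foreach ($times as $method => $time) {
        echo '<p>'.$method.' median = '.median($time).'</p>';
    }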

Here are the results for downloading a blank web page. I won't print out all of the times as this would take up a lot of space and be utterly pointless.

    nothing average = 7.320852840648E-6
    min = 5.0067901611328E-6
    max = 1.6927719116211E-5

    curl average = 0.083804859834559
    min = 0.073469161987305
    max = 0.16210603713989

    snoopy average = 0.089839958677105
    min = 0.074288129806519
    max = 0.14481902122498

    fsockopen average = 0.13405424005845
    min = 0.10800695419312
    max = 0.39040613174438

    fopen average = 0.12635245042689
    min = 0.10596394538879
    max = 0.19953107833862

    stream average = 0.1248401520299
    min = 0.10655999183655
    max = 0.17397904396057

    file average = 0.12469244470783
    min = 0.10594987869263
    max = 0.19219899177551

    file_get_contents average = 0.12691019095627
    min = 0.10590195655823
    max = 0.17201805114746

Here are the results for accessing the front page of the hashbangcode.com site, which is 27.17 kB.

    nothing average = 7.9659854664522E-6
    min = 5.0067901611328E-6
    max = 4.2915344238281E-5

    curl average = 0.40228473438936
    min = 0.34250092506409
    max = 1.2593679428101

    snoopy average = 0.37644760748919
    min = 0.34368300437927
    max = 0.60013008117676

    fsockopen average = 0.12776509920756
    min = 0.10803699493408
    max = 0.22104907035828

    fopen average = 0.41444192213171
    min = 0.37545895576477
    max = 0.77188587188721

    stream average = 0.41173196306416
    min = 0.37820982933044
    max = 0.62948799133301

    file average = 0.40511836725123
    min = 0.3781590461731
    max = 0.69333410263062

    file_get_contents average = 0.41732146693211
    min = 0.37894988059998
    max = 0.68813395500183

Here are the results for accessing the front page of the php.net site, which is 31.13 kB.

    nothing average = 7.9566357182521E-6
    min = 5.0067901611328E-6
    max = 4.1961669921875E-5

    curl average = 1.3938542253831
    min = 1.0989301204681
    max = 1.7416069507599

    snoopy average = 1.3927017473707
    min = 1.1879198551178
    max = 1.6753461360931

    fsockopen average = 0.61519114176432
    min = 0.56387495994568
    max = 1.0507898330688

    fopen average = 1.7791449415917
    min = 1.5167849063873
    max = 5.4117441177368

    stream average = 1.6853758213567
    min = 1.4832580089569
    max = 2.3599371910095

    file average = 1.7028319742165
    min = 1.4881958961487
    max = 2.3575241565704

    file_get_contents average = 1.7295382069606
    min = 1.5339889526367
    max = 2.5368640422821

Here is a graph of the results.

[Graph image: download times for each method, not reproduced here.]

The results show that if the page you are downloading has any real content then the quickest way to download it is to use fsockopen(). If the page has little or no content then you are better off using curl, as fsockopen() seems to perform worst in that case.

So fsockopen() is the quickest way to get a web page. However, it is probably best to use Snoopy to do all of your fsockopen() calls as there is no point in reinventing the wheel to accomplish a simple task. If you really need the speed and are sure that you will always be fetching the same page then use your own fsockopen() function. Snoopy does add some overhead, but I think the benefits of using Snoopy outweigh the drop in speed.
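
If you do go down the hand-rolled route, the fsockopen() code from the benchmark could be wrapped up into a small helper. This is only a sketch, and the fetch_page() name is just for illustration; it returns the raw response, headers included, just like the benchmark code does.

    // Fetch a page over a raw socket. Returns the full HTTP response
    // (headers and body) as a string, or false if the connection fails.
    function fetch_page($domain, $path = '/', $timeout = 30) {
        $fp = fsockopen($domain, 80, $errno, $errstr, $timeout);
        if (!$fp) {
            return false;
        }
        $out  = "GET ".$path." HTTP/1.1\r\n";
        $out .= "Host: ".$domain."\r\n";
        $out .= "User-Agent: FSOCKOPEN\r\n";
        $out .= "Connection: Close\r\n\r\n";
        fwrite($fp, $out);
        $contents = '';
        while (!feof($fp)) {
            $contents .= fgets($fp, 4096);
        }
        fclose($fp);
        return $contents;
    }

    // Example usage.
    $page = fetch_page('www.hashbangcode.com', '/');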

What was odd about the data was the difference in time between downloading the home page of hashbangcode.com and that of php.net. I expected the php.net server to take less time, but although fsockopen() was still far quicker than any other method, the slowest methods took more than a second longer to download the php.net page than the hashbangcode.com page. There is a page size difference of about 3 kB, but that is not enough to affect the result like this. The only thing I can think of is that the distance to the php.net server is causing this speed difference: I ran the tests on my own computer in the UK, and the php.net server is hosted in the USA.

Doing nothing takes about 0.000008 seconds in all cases, which is next to no time at all and shows that the benchmarking functions only contribute a tiny amount to the overall time taken. This time could be subtracted from the time taken by each method to give a slightly more realistic result.
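
As a rough sketch of that correction (assuming the $times arrays from the test are still available), the 'nothing' average can simply be subtracted from each method's average:

    // Subtract the 'doing nothing' baseline from each method's average to
    // estimate the time spent on the download itself.
    $baseline = array_sum($times['nothing']) / count($times['nothing']);

    foreach ($times as $method => $time) {
        if ($method == 'nothing') {
            continue;
        }
        $average = array_sum($time) / count($time);
        echo '<p>'.$method.' corrected average = '.($average - $baseline).'</p>';
    }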

If there is any other method that I have missed then post a comment and let me know. It would also be interesting if people ran this on their own computers/hosts so that a wider average could be worked out, as there are other factors involved in altering these times. If you run this test then please send me the results so that I can collate them.

Comments

Waow, Interesting .

I use curl ... 'm thinking to change

onymous (Mon, 09/12/2011 - 21:23)

Excellent post on the speed. However there is another side to this question and that is availability. Many webserver do not allow fsockopen(), fopen(), file_get_contents(), etc. So while fsockopen() is the fastest, you may be forced to use a different method just to get the data.

LostSteak (Mon, 07/23/2012 - 00:44)

Excellent blog post. I absolutely appreciate this site. Stick with it!

http://tvjourn… (Mon, 05/13/2013 - 11:54)
