Netscape HTTP Cookie File Parser In PHP

30th June 2011

I recently needed to create a function that would read and extract cookies from a Netscape HTTP cookie file. This file is written by libcurl, the library behind PHP's cURL functions, when the CURLOPT_COOKIEJAR option is set, and it can be passed to subsequent cURL calls. It can also be read once cURL has finished to see which cookies were created. As an example, this is the sort of file that might be created during a typical cURL call.

    # Netscape HTTP Cookie File
    # http://curl.haxx.se/rfc/cookie_spec.html
    # This file was generated by libcurl! Edit at your own risk.

    www.example.com FALSE / FALSE 1338534278 cookiename value

The first few lines are comments and can therefore be ignored. Each cookie is stored on a line of its own as seven fields separated by tab characters, in the following order:

  • domain - The domain that created and that can read the variable.
  • flag - A TRUE/FALSE value indicating if all machines within a given domain (i.e. its subdomains) can access the variable. This value is set automatically by the client (libcurl here), depending on the value set for the domain.
  • path - The path within the domain that the variable is valid for.
  • secure - A TRUE/FALSE value indicating if a secure connection with the domain is needed to access the variable.
  • expiration - The UNIX time that the variable will expire on.
  • name - The name of the variable.
  • value - The value of the variable.
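Because the seven fields always appear in this order, a single line can be turned into a named array by pairing a list of field names with the tab-separated values. The sketch below does that with array_combine(); the helper name parseCookieLine() is hypothetical and not part of the original code, and the sample line is the one from the example file above with its separators written as real tabs.

```php
<?php
// Hypothetical helper: map one tab-separated cookie line onto the seven
// field names, in the order they appear in the Netscape cookie file.
function parseCookieLine(string $line): ?array
{
    $names = ['domain', 'flag', 'path', 'secure', 'expiration', 'name', 'value'];

    // Drop the line ending (including "\r" from CRLF files) before splitting.
    $tokens = explode("\t", rtrim($line, "\r\n"));

    // Anything that does not have exactly seven fields is not a cookie line.
    if (count($tokens) !== count($names)) {
        return null;
    }

    return array_combine($names, $tokens);
}

$cookie = parseCookieLine("www.example.com\tFALSE\t/\tFALSE\t1338534278\tcookiename\tvalue\n");
print_r($cookie);
```

Note that every value comes back as a string, including the expiration timestamp.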

The function used to extract this information is shown below. It works in a fairly straightforward way and returns an array of the cookies found, if any. I originally tried to treat a leading hash character as the start of a comment line and extract everything else that had content. It turns out, however, that some cookie lines also start with a hash character (yes, even in the domain field); libcurl, for example, writes HttpOnly cookies with the domain prefixed by #HttpOnly_. So it is safer to detect a cookie line by checking that it contains exactly six tab characters. Such a line is then exploded on the tab character into an array of seven data items.
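The difference between the two detection strategies is easy to see on a few sample lines. In this sketch, isCookieLineByHash() and isCookieLine() are hypothetical names: the first is the naive "skip anything starting with #" check, the second is the six-tab test used by the function below. The #HttpOnly_ line reuses the example reported in the comments on this post.

```php
<?php
// Naive check: treat every line starting with '#' as a comment.
function isCookieLineByHash(string $line): bool
{
    return $line !== '' && $line[0] !== '#';
}

// Six-tab check: a cookie line has exactly six tabs, i.e. seven fields.
function isCookieLine(string $line): bool
{
    return substr_count($line, "\t") === 6;
}

$lines = [
    "# Netscape HTTP Cookie File",
    "#HttpOnly_.twitter.com\tTRUE\t/\tTRUE\t1711001654\tauth_token\tfd62834f676",
    "www.example.com\tFALSE\t/\tFALSE\t1338534278\tcookiename\tvalue",
];

// Print both verdicts next to the first field of each line.
foreach ($lines as $line) {
    printf("hash check: %-5s  tab check: %-5s  %s\n",
        var_export(isCookieLineByHash($line), true),
        var_export(isCookieLine($line), true),
        strtok($line, "\t"));
}
```

The hash check wrongly discards the HttpOnly cookie, while the tab check accepts both real cookie lines and still rejects the header comment.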

    /**
     * Extract any cookies found in the cookie file. This function expects a
     * string containing the contents of the cookie file, from which it will
     * attempt to extract and return any cookies found.
     *
     * @param string $string The contents of the cookie file.
     *
     * @return array The array of cookies as extracted from the string.
     */
    function extractCookies($string) {
        $cookies = array();

        $lines = explode("\n", $string);

        // iterate over lines
        foreach ($lines as $line) {

            // we only care for valid cookie lines: exactly six tabs (seven fields)
            if (isset($line[0]) && substr_count($line, "\t") == 6) {

                // get tokens in an array
                $tokens = explode("\t", $line);

                // trim the tokens (this also removes a trailing "\r" from CRLF files)
                $tokens = array_map('trim', $tokens);

                $cookie = array();

                // Extract the data
                $cookie['domain'] = $tokens[0];
                $cookie['flag'] = $tokens[1];
                $cookie['path'] = $tokens[2];
                $cookie['secure'] = $tokens[3];

                // Convert the UNIX timestamp to a readable date
                // ('H' is the 24-hour clock; 'h' would lose AM/PM)
                $cookie['expiration'] = date('Y-m-d H:i:s', (int) $tokens[4]);

                $cookie['name'] = $tokens[5];
                $cookie['value'] = $tokens[6];

                // Record the cookie.
                $cookies[] = $cookie;
            }
        }

        return $cookies;
    }
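libcurl writes HttpOnly cookies with the domain prefixed by #HttpOnly_. The function above still picks these lines up (they have six tabs), but the prefix ends up in the domain field. The following is a sketch of a variant that strips the prefix and records it as a separate flag; the name extractCookiesHttpOnly() and the httponly key are assumptions of this sketch rather than part of the original code, and it leaves the expiration as a raw integer timestamp.

```php
<?php
// Hypothetical variant of extractCookies() that understands the "#HttpOnly_"
// prefix libcurl puts in front of the domain of HttpOnly cookies.
function extractCookiesHttpOnly(string $string): array
{
    $prefix = '#HttpOnly_';
    $cookies = [];

    foreach (explode("\n", $string) as $line) {
        // A cookie line has exactly six tabs, i.e. seven fields.
        if (substr_count($line, "\t") !== 6) {
            continue;
        }

        $tokens = array_map('trim', explode("\t", $line));

        // Detect the HttpOnly marker and strip it from the domain field.
        $httpOnly = strncmp($tokens[0], $prefix, strlen($prefix)) === 0;
        if ($httpOnly) {
            $tokens[0] = substr($tokens[0], strlen($prefix));
        }

        $cookies[] = [
            'domain'     => $tokens[0],
            'flag'       => $tokens[1],
            'path'       => $tokens[2],
            'secure'     => $tokens[3],
            'expiration' => (int) $tokens[4], // raw UNIX timestamp
            'name'       => $tokens[5],
            'value'      => $tokens[6],
            'httponly'   => $httpOnly,
        ];
    }

    return $cookies;
}

$jar = "# Netscape HTTP Cookie File\n"
     . "#HttpOnly_.twitter.com\tTRUE\t/\tTRUE\t1711001654\tauth_token\tfd62834f676\n";
print_r(extractCookiesHttpOnly($jar));
```

strncmp() is used instead of str_starts_with() so the sketch also runs on PHP 7.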

To test this function I used the following code. It takes a URL (google.com in this case) and sets the cURL options so that, when the page is downloaded, any cookies received are written to a cookie file. That file is then parsed with the above function to see which cookies it contains.

    // URL to extract cookies from
    $url = 'http://www.google.com/';

    // Create a cookie jar file
    $cookiefile = tempnam(sys_get_temp_dir(), "CURLCOOKIE");

    // create a new cURL resource
    $curl = curl_init();

    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);

    // Set user agent
    curl_setopt($curl, CURLOPT_USERAGENT, "Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.3) Gecko/20090910 Ubuntu/9.04 (jaunty) Shiretoko/3.5.3");

    // set URL and other appropriate options
    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_HEADER, true);

    // write any cookies received to the cookie jar file
    curl_setopt($curl, CURLOPT_COOKIEJAR, $cookiefile);

    $data = curl_exec($curl);

    // close cURL resource, and free up system resources. libcurl writes the
    // cookie jar when the handle is destroyed; on PHP 8+ curl_close() has no
    // effect, so unset() the handle to make sure the file is written.
    curl_close($curl);
    unset($curl);

    // Extract and print any cookies found
    print_r(extractCookies(file_get_contents($cookiefile)));

When run, this script produced the following output.

    Array
    (
        [0] => Array
            (
                [domain] => .google.com
                [flag] => TRUE
                [path] => /
                [secure] => FALSE
                [expiration] => 2013-06-29 10:00:01
                [name] => PREF
                [value] => ID=051f529ee8937fc5:FF=0:TM=1309424401:LM=1309424401:S=4rhYyPL_bW9KxVHI
            )

    )
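The expiration shown above is formatted with date(), which uses the server's default timezone, so the same cookie can print a different time on a machine set to another zone. If a timezone-independent value is preferred, a sketch like the following formats the timestamp in UTC instead. formatExpiry() is a hypothetical helper, and it assumes libcurl's convention of writing 0 in the expiration field of a session cookie.

```php
<?php
// Hypothetical helper: render a cookie expiry timestamp in UTC.
function formatExpiry(int $timestamp): string
{
    // libcurl writes 0 in the expiration field for session cookies.
    if ($timestamp === 0) {
        return 'session';
    }

    // gmdate() always formats in UTC; 'H' gives a 24-hour clock.
    return gmdate('Y-m-d H:i:s', $timestamp) . ' UTC';
}

echo formatExpiry(1338534278), "\n"; // 2012-06-01 07:04:38 UTC
echo formatExpiry(0), "\n";          // session
```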

 

Comments

It's good, but when a cookie line contains the string "#HttpOnly", it goes wrong, for example: ... #HttpOnly_.twitter.com TRUE / TRUE 1711001654 auth_token fd62834f676 ...

yaoruisheng (Mon, 03/24/2014 - 07:37)
