Tidy Up A URL With PHP

4th August 2008

Lots of applications require a user to input a URL, and lots of problems occur as a result. I was recently looking for something that would take a URL as input and make sure that it was formatted properly. I couldn't find anything that did this, so I decided to write it myself.

The following function takes a URL as a string and tries to clean it up. It does this by splitting the URL into its parts with the parse_url() function and then putting it back together again. For parse_url() to find the host, the URL needs a scheme (such as http://) at the front, so the first thing the function does (after trimming the string) is check that a scheme exists. If it doesn't, the function adds http:// to the start of the URL.
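To see why the scheme matters, here is a quick sketch, assuming PHP 7 or later, that compares parse_url() output with and without one. The URLs are arbitrary examples, and hasScheme() is a hypothetical helper written for illustration, not part of PHP. It also shows why checking for the whole "scheme://" prefix is safer than testing whether the string merely starts with "ftp" or "http".

```php
<?php
// Without a scheme, parse_url() has no "//" to tell it where the host starts,
// so the domain ends up in 'path' and there is no 'host' key.
$noScheme = parse_url('www.example.com/page.php?id=1');

// With a scheme, the host, path and query are separated correctly.
$withScheme = parse_url('http://www.example.com/page.php?id=1');

print_r($noScheme);
print_r($withScheme);

// Hypothetical helper: treats the string as having a scheme only if it
// starts with something like "http://", "https://" or "ftp://".
function hasScheme(string $url): bool
{
    return preg_match('#^[a-z][a-z0-9+.-]*://#i', $url) === 1;
}

// A bare host that merely begins with "ftp" has no scheme,
// even though a substr() test for the letters "ftp" would say it does.
$bareFtpHost = hasScheme('ftp.example.com');   // false
$realFtpUrl  = hasScheme('ftp://example.com'); // true
```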

    function tidyUrl($url)
    {
        // trim surrounding whitespace
        $url = trim($url);

        // check for a scheme (e.g. "http://") and if there isn't one then add it;
        // matching the whole "scheme://" prefix means hosts such as
        // "ftp.example.com" still get a scheme added
        if (!preg_match('#^[a-z][a-z0-9+.-]*://#i', $url)) {
            $url = 'http://' . $url;
        }

        // parse the URL; parse_url() returns false if it is seriously malformed
        $parsed = parse_url($url);
        if (!is_array($parsed)) {
            return false;
        }

        // rebuild the URL: scheme, optional user/password, host and port
        $url = $parsed['scheme'] . '://';
        $url .= isset($parsed['user']) ? $parsed['user'] . (isset($parsed['pass']) ? ':' . $parsed['pass'] : '') . '@' : '';
        $url .= isset($parsed['host']) ? $parsed['host'] : '';
        $url .= isset($parsed['port']) ? ':' . $parsed['port'] : '';

        // if no path exists then add a slash
        if (isset($parsed['path'])) {
            $url .= (substr($parsed['path'], 0, 1) == '/') ? $parsed['path'] : ('/' . $parsed['path']);
        } else {
            $url .= '/';
        }

        // append the query string and fragment, if present
        $url .= isset($parsed['query']) ? '?' . $parsed['query'] : '';
        $url .= isset($parsed['fragment']) ? '#' . $parsed['fragment'] : '';

        // return the URL string
        return $url;
    }

The parse_url() function returns an array if it succeeds and false if the URL is seriously malformed. The function checks for this and, if parsing failed, returns false itself.
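A minimal sketch of what that failure looks like, assuming PHP 7 or later: a URL with an empty host, such as http:///example.com, is rejected by parse_url(). Both URLs here are arbitrary examples.

```php
<?php
// An empty host makes parse_url() give up and return false instead of an array.
$bad  = parse_url('http:///example.com');
$good = parse_url('http://example.com/page.php');

// is_array() is the same test tidyUrl() uses to detect a failed parse.
foreach (['bad' => $bad, 'good' => $good] as $label => $result) {
    echo $label, ': ', is_array($result) ? 'parsed' : 'rejected', PHP_EOL;
}
```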

This function is also useful if you want to store URLs in a standard format. To make this easier in the long term, store domain URLs with a trailing slash (http://example.com/ rather than http://example.com). If the user doesn't include one, the function adds it.
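The sketch below (PHP 7 or later; the URLs are arbitrary examples) shows why the slash matters: parse_url() reports no path at all for a bare domain, so without a default the two spellings of the same site would be stored as different strings. It applies only the slash rule and is not a replacement for the full function.

```php
<?php
// Two ways a user might enter the same site.
$typed  = parse_url('http://example.com');
$stored = parse_url('http://example.com/');

// A bare domain has no 'path' key at all, while the slashed form has '/'.
// Defaulting a missing path to '/' makes both rebuild to the same string.
$typedPath  = $typed['path'] ?? '/';
$storedPath = $stored['path'] ?? '/';

$typedUrl  = $typed['scheme'] . '://' . $typed['host'] . $typedPath;
$storedUrl = $stored['scheme'] . '://' . $stored['host'] . $storedPath;

var_dump($typedUrl === $storedUrl); // bool(true)
```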
