Avoiding URL Canonicalisation With mod_rewrite And Apache

22nd February 2008

URL canonicalisation becomes a problem when a website serves the same content at several different URLs. When search engine spiders find identical content at each address, they can get confused as to which URL to display in search engine result pages. The following URLs, although they are different, all produce the same content.

  1. http://www.example.com
  2. http://example.com
  3. http://www.example.com/
  4. http://www.example.com/index.html

The way to solve this issue is to redirect all of these requests to a single URL using mod_rewrite. Add a .htaccess file to your root directory and include the following line to turn on the rewrite engine.

RewriteEngine On

The following rule will redirect the non-www host to the www host with a permanent (301) redirect, keeping the requested path.

# Redirecting non-www to www.domain.com:
RewriteCond %{HTTP_HOST} ^domain\.com$ [NC]
RewriteRule ^(.*)$ http://www.domain.com/$1 [R=301,L]
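The rule above sends requests for domain.com to www.domain.com. If you would rather treat the non-www host as the canonical one, the inverse can be sketched as follows. This is only an illustration that assumes the same placeholder domain; use one direction or the other, never both, or the two rules will redirect to each other forever.

```apache
# Redirecting www.domain.com to domain.com (the inverse of the rule above).
# Assumption: domain.com is a placeholder for your own host name.
RewriteCond %{HTTP_HOST} ^www\.domain\.com$ [NC]
RewriteRule ^(.*)$ http://domain.com/$1 [R=301,L]
```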

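Use the following rule to redirect from the index.html page to the directory name. The condition tests %{THE_REQUEST}, the request line exactly as the browser sent it, so the rule only fires when a visitor explicitly asks for /index.html. Without it, Apache's own internal mapping of / to index.html could trigger the rule again and cause a redirect loop.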

# Redirecting /index.html to /:
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.html
RewriteRule ^index\.html$ http://www.domain.com/ [R=301,L]
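The rule above only covers index.html in the root directory. If the same duplication exists in subdirectories (for example /blog/index.html and /blog/), a possible extension is sketched below. It assumes the .htaccess file sits in the document root and that every directory uses index.html as its index page; test it on your own setup before relying on it.

```apache
# Redirecting /any/path/index.html to /any/path/ (sketch, not from the original rules).
# ([^/]+/)* matches zero or more directory segments before index.html.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.html
# $1 holds the directory part (empty for the root), so it is kept in the target.
RewriteRule ^(.*/)?index\.html$ http://www.domain.com/$1 [R=301,L]
```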

If you want the rules to apply only when mod_rewrite is available, you can wrap all of the previous lines in an IfModule block like this.

<IfModule mod_rewrite.c>
RewriteEngine On

# Redirecting non-www to www.domain.com:
RewriteCond %{HTTP_HOST} ^domain\.com$ [NC]
RewriteRule ^(.*)$ http://www.domain.com/$1 [R=301,L]

# Redirecting /index.html to /:
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.html
RewriteRule ^index\.html$ http://www.domain.com/ [R=301,L]
</IfModule>
