robots.txt


Drupal 9: Customise Your Robots.txt File

9th May 2021 - 7 minutes read time
A robots.txt file tells search engine spiders which pages or files they should or shouldn't request from your site. It is more a way of preventing your site from being overloaded by requests than a secure mechanism for preventing access. It shouldn't be relied on to block access to your site, as the chances are that some search engine spiders will access the site anyway. If you need to keep a page out of search results then think about using noindex directives within the page itself, and if you need to prevent access altogether then password protect the page.

A Look At robots.txt Files

18th May 2009 - 5 minutes read time

A robots.txt file is a simple, static file that you can add to your site in order to stop search engines from crawling the content of certain pages or directories. You can even prevent certain user agents from crawling certain areas of your site.
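As a rough illustration of how such rules are read, here is a minimal sketch using Python's standard `urllib.robotparser` module. The paths, the `www.example.com` host and the `ExampleBot` and `BadBot` user agent names are all made up for this example and are not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: every path and agent name here is invented for illustration.
RULES = """\
User-agent: *
Disallow: /private/
Disallow: /drafts/page.html

User-agent: BadBot
Disallow: /
"""


def build_parser(rules: str) -> RobotFileParser:
    """Parse robots.txt content from a string rather than fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser


parser = build_parser(RULES)

# A general crawler may fetch normal pages, but not the disallowed directory.
print(parser.can_fetch("ExampleBot", "https://www.example.com/index.html"))           # True
print(parser.can_fetch("ExampleBot", "https://www.example.com/private/report.html"))  # False

# The named user agent is blocked from the whole site.
print(parser.can_fetch("BadBot", "https://www.example.com/index.html"))               # False
```

Bear in mind that this only shows what a well-behaved crawler would decide for itself. Nothing in the file stops a crawler that chooses to ignore it.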

Let's take a real-world example and look at what you would do if you decided to set up a Feedburner feed in place of your normal RSS feed. I won't go into much detail about why you would do this, other than to say that you get some nice usage statistics and it can save some processing power on your server, as your feed is only requested when Feedburner updates. Once you have set up your blog to issue the Feedburner feed instead of your normal feed, you then need to stop search engines from indexing the old feed. This stops it appearing in search indexes, so that your users grab the Feedburner feed and not your local feed. You would then put a robots.txt file in place with the following content.