A guide to Robots.txt usage

Author: Stefan Vervoort | Published: 06 April 2008

This guide has all the information, examples and explanation you need about Robots.txt files.

What is a Robots.txt file

As the name suggests, Robots.txt is a plain text file. It tells search engine ‘robots’ (crawlers) which pages of your website they may visit. That is all a Robots.txt file does. For example, if you have a private weblog for your colleagues that people outside your work group should not find through a search engine, or a site displaying timetables for a high school, a Robots.txt file is an option.

Content in a Robots.txt file

The Robots.txt format is not very extensive. It offers only a handful of features, and using it comes with some disadvantages, which I cover later in this guide. Let’s go through the features first.

If you would like to use any of these features on your website, simply open Notepad (or any other plain text editor) and save the file as robots.txt, in lowercase, because that is the exact name crawlers request. Always upload this file to the root of your website.
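Crawlers look for the file at one fixed address per host, so a quick way to check your upload is to work out that address from any page on the site. Here is a minimal Python sketch using only the standard library. The helper name `robots_url` and the example page path are my own, not something the format prescribes.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the address where a crawler would look for the robots file."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# The page path below is a made-up example.
print(robots_url("http://blog.divitodesign.com/2008/04/some-post/?page=2"))
# prints http://blog.divitodesign.com/robots.txt
```

If opening the printed address in your browser shows your rules, the file is in the right place.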

Allow all robots on all files

User-agent: *
Disallow:

Block all robots

User-agent: *
Disallow: /
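The two rule sets above differ by a single slash, yet they do opposite things: an empty Disallow blocks nothing, while "Disallow: /" blocks everything. You can confirm this with Python's standard `urllib.robotparser` module. This is only a sketch: the helper `can_crawl`, the robot name "AnyBot" and the example.com address are placeholders I made up.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt, agent, url):
    """Parse robots rules given as text and report whether agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


EMPTY_DISALLOW = "User-agent: *\nDisallow:\n"
DISALLOW_ROOT = "User-agent: *\nDisallow: /\n"

page = "http://example.com/index.html"  # placeholder address

print(can_crawl(EMPTY_DISALLOW, "AnyBot", page))  # True: nothing is blocked
print(can_crawl(DISALLOW_ROOT, "AnyBot", page))   # False: everything is blocked
```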

Block all robots on specific directories

User-agent: *
Disallow: /admin/
Disallow: /private/
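Directory rules work as path prefixes, which is easy to get subtly wrong. The sketch below feeds the rules above to Python's standard `urllib.robotparser` and tries a few paths. The host example.com, the robot name "AnyBot" and the file names are invented for the test, and other crawlers may interpret edge cases differently from Python's parser.

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: *
Disallow: /admin/
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())


def allowed(path, agent="AnyBot"):
    """Check a path on the placeholder host example.com against RULES."""
    return parser.can_fetch(agent, "http://example.com" + path)


for path in ("/admin/login.php", "/private/notes.txt",
             "/blog/post.html", "/administrator/"):
    print(path, allowed(path))
```

The first two paths come back as False and the last two as True. Note that /administrator/ is not blocked: the trailing slash in "Disallow: /admin/" means only paths starting with exactly /admin/ match.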

Block a Bad Crawler

User-agent: Bad Crawler
Disallow: /
 
# In a Robots.txt file, you can add comments. Add a '#' and type your comment.
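To see that a rule for one named crawler leaves every other robot alone, and that comment lines are ignored, here is a small check with Python's standard `urllib.robotparser`. The names "BadCrawler" and "FriendlyBot" and the example.com address are stand-ins I chose for the demonstration.

```python
from urllib.robotparser import RobotFileParser

RULES = """\
# Keep one misbehaving crawler out of the whole site.
User-agent: BadCrawler
Disallow: /
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

url = "http://example.com/index.html"  # placeholder address

bad_allowed = parser.can_fetch("BadCrawler", url)
friendly_allowed = parser.can_fetch("FriendlyBot", url)

print(bad_allowed)       # False: the named crawler is blocked
print(friendly_allowed)  # True: no rule applies to other robots
```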

What you should know about Robots.txt

As I mentioned earlier, there are some shortcomings and disadvantages to using these files. In this part of the guide I will add a couple of tips.

1. Always place your Robots.txt file in the root of your website. A subdomain counts as its own root: http://divitodesign.com and http://blog.divitodesign.com both fit the criteria, and each needs its own file.

2. Blocking bad robots usually does not work. Those robots are mostly spam bots, and they will not even look for a Robots.txt file. They simply ignore it.

3. The Robots.txt file is a public file. Anyone can open it and see which parts of the site are blocked. Those parts are blocked for robots, but not for users! Do not forget this.

4. Nobody is perfect, so typos and syntax errors are possible. Fortunately, there are some Robots.txt syntax checkers out there.

5. If you use Google’s Webmaster Tools, you can use Google’s Robots.txt generator to create your Robots.txt file.

6. WordPress blogs could get a penalty for duplicate content (for example, the same content appearing on the “Categories” and “Archives” pages). You can use a Robots.txt file to exclude those archive and category pages from the search results.

7. To block a specific robot you need its exact name. Here is a database of those robots with a description of each robot.
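To give an idea of what the syntax checkers from tip 4 look for, here is a very small checker in Python. It is a rough sketch, not a replacement for a real tool: the function name `lint_robots` is mine, and the list of accepted fields is an assumption that includes a few common extensions (Allow, Sitemap, Crawl-delay) which this guide does not cover.

```python
# Fields this sketch accepts; everything beyond User-agent and Disallow
# is an assumption about commonly used extensions.
KNOWN_FIELDS = {"user-agent", "disallow", "allow", "sitemap", "crawl-delay"}


def lint_robots(text):
    """Return a list of (line_number, message) for suspicious lines."""
    problems = []
    seen_agent = False
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if ":" not in line:
            problems.append((number, "missing ':' separator"))
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field not in KNOWN_FIELDS:
            problems.append((number, "unknown field '%s'" % field))
        elif field == "user-agent":
            seen_agent = True
            if not value:
                problems.append((number, "User-agent needs a name or '*'"))
        elif field in ("disallow", "allow") and not seen_agent:
            problems.append((number, "rule appears before any User-agent line"))
    return problems


# Two deliberate mistakes: a misspelled field and a missing colon.
sample = "User-agent: *\nDisalow: /admin/\nDisallow /private/\n"
for number, message in lint_robots(sample):
    print(number, message)
```

For the sample above it reports the misspelled "Disalow" on line 2 and the missing colon on line 3.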

Do you need more information?

If you have any questions, please post a comment or contact me. If you need even more information, check out Robotstxt.org and the Wikipedia page.
