Robots.txt

A robots.txt file is used to restrict the areas of your website that robots are permitted to crawl. If you are happy for search engines and other robots to crawl the whole of your site, there is no need to include a robots.txt file in your website's root directory. However, most site owners find the file useful for SEO purposes.

Why would you not want your entire site read by a robot?

Firstly, your website may contain duplicate content, for example a ‘printer-friendly’ version of a document alongside the original document. In terms of SEO, duplicate content can have a negative effect and may result in Google penalties; this is why you would not want the Google robot to crawl the ‘printer-friendly’ version as well as the original.

Also, there may be parts of a website that are not important for SEO purposes and that you do not want or need to appear in Google search results. Examples include admin login pages and shopping cart pages. These add no value to your SEO campaign, so there is no need for Googlebot to crawl and index them.
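As a rough illustration of the two cases above, the sketch below parses a small set of robots.txt rules with Python's standard urllib.robotparser module and checks which paths a compliant robot would be allowed to crawl. The paths /print/, /admin/ and /cart/ are assumptions made up for the example; substitute whatever paths your own site uses.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: the paths below are illustrative only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /print/
Disallow: /admin/
Disallow: /cart/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text held in memory (no network access needed)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Report, for each path, whether a compliant robot may crawl it.
for path in ("/articles/guide.html", "/print/guide.html", "/admin/login", "/cart/"):
    print(path, parser.can_fetch("Googlebot", path))
```

Only the first path is reported as crawlable; the printer-friendly, admin and cart paths are all disallowed for every user agent.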

Please be advised that Google will not crawl pages that are disallowed in robots.txt, but it may still index the address of such a page if there is a link to it from a page that is allowed to be crawled.

To be recognised, the robots.txt file must be placed in your website's root directory, e.g. www.exampledomain.co.uk/robots.txt. This is because search engines look only in the root directory for the robots.txt file; if it is not there, they will assume that the site has no robots.txt file and crawl the whole site, even if a copy exists in a subdirectory.
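The root-directory rule can be sketched in a few lines of Python: whatever page a robot starts from, it looks for the file at the same fixed location. The helper name robots_txt_url and the sample page address are assumptions for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the root-level robots.txt address for the site serving page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host; replace the path and drop any query or fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical deep page on the example domain used in this article.
print(robots_txt_url("https://www.exampledomain.co.uk/shop/cart/index.html?item=3"))
# prints "https://www.exampledomain.co.uk/robots.txt"
```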

It is worth noting that not all robots serve search engines or other positive purposes; many have been created for negative purposes, e.g. to collect email addresses and data in order to send spam emails. A robots.txt file will not necessarily keep these robots out of your files. This is because the file does not block robots from accessing anything; it merely states that the website owner would rather not have some pages crawled. Search engine robots generally heed the wishes of the webmaster and do not crawl the designated pages, whilst malicious robots generally ignore the robots.txt rules and crawl the whole site regardless.
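The voluntary nature of robots.txt is easiest to see from the robot's side. In the minimal sketch below, a well-behaved crawler filters its list of addresses through the rules before requesting anything, whereas a malicious one simply skips that step. The function name allowed_urls, the user agent "ExampleBot" and the /admin/ rule are all assumptions for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for the example domain.
RULES = """\
User-agent: *
Disallow: /admin/
"""


def allowed_urls(robots_txt: str, user_agent: str, urls: list[str]) -> list[str]:
    """Return only the addresses a compliant robot would agree to crawl."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(user_agent, url)]


candidates = [
    "https://www.exampledomain.co.uk/",
    "https://www.exampledomain.co.uk/admin/login",
]

# A polite robot performs this check itself; nothing on the server enforces it.
polite = allowed_urls(RULES, "ExampleBot", candidates)

# A robot that ignores robots.txt just requests every address it knows about.
impolite = list(candidates)

print(polite)
print(impolite)
```

Because the check happens entirely inside the robot, sensitive areas such as admin pages still need real access controls on the server.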

Overall, then, robots.txt is a valuable tool in SEO practice and should be used carefully for this purpose. It does not, however, protect sites from malicious robots, so ensure that you have adequate security and protection against these too.


Chameleon Web Services