Magento Robots.txt & SEO

Posted on: 15th Nov 2010 By: Robert Kent 18 Comments

Today I’m going to talk about robots.txt files and their impact on a Magento website. Thanks to a current customer of ours, who stumbled across a robots.txt thread in the Magento forums, we have the basis for a comprehensive robots.txt file.

First, a little history.

Imagine your website as a stately manor – a bit like that series “Downton Abbey” that was on TV recently. A big, expansive and exquisitely designed piece of magnificence. You are lord and lady of this manor; you know where things are and where they need to be. However, you are nervous. You pace the corridors with anxiety; you check the positions of chairs, sofas, candelabras. Anything out of place you put back; anything now put back that looks out of place you put away. You slide your finger along the surfaces checking for dust; you get on your hands and knees inspecting the carpet.

You know you should leave these jobs to your servants, but can you really trust them to do it right? Do they have as much at stake in this as you do? Of course not. They do not understand the significance of this visit. It all comes down to this: tonight will perhaps, just perhaps, hold the key to your entire future. The money, the fame, the glory! It all rests on this.

For tonight you dine with Mr G.

You feel a little more confident in your presentation: the hairs are plucked and the spots are squeezed. However, you know that Mr G is meticulous and misses nothing. You swallow hard. A bellboy runs forward and informs you a carriage is approaching, and you feel the first tendrils of real panic setting in. Your mind races.

What have I forgotten? What have I missed? Nothing, the path is clear, he will enter here, he will walk there and he will dine there and leave the same way. Oh God but what if he doesn’t? What if he wants to have a look around?

You can’t hold him by the hand – you are a Lord – you need to be presented at the dinner table, not ushering Mr G down halls and stairs. Your eyes scan the entry room. Servants are lining up as they have been told; everyone knows what they are doing and you are confident they will do it well. Who can you spare? Who can act as guide and guardian of Mr G? Your gaze settles on Master R Botts. He is a plain young man and there’s nothing remarkable about him. However, you know that whatever you ask of him he will do, and he will do it well.

“Master Botts!” you call. “Mr G will be here presently. When he arrives you will be his guide. There are places in the house that have not had the greatest of care taken over them, and I would prefer it if he were not to witness such things. You must keep him away from the following:” You proceed to rattle off a list of rooms and hallways that are off limits during Mr G’s visit. “That’s about it, Master Botts. This is a splendid house and I know he will appreciate it. But just keep him from those areas.”

Master Botts simply nods his head. It will be done.

With a weight much like that of your solid oak doors suddenly taken from your shoulders, you hurry to the dining hall where you will await Mr G and his all-important verdict…

Back to the Present

I felt that it was high time that a new description of “what a robots.txt file is for” was produced. Sorry for the lengthy passage; I simply got carried away.

Anyway, a robots.txt file is very important, and the moral of the above story is this (or thereabouts): “When Mr G (Google) appears he will want to see everything about your site. In almost all cases it is not wise to let him, so you should always have somebody at hand (it cannot be yourself) to steer Mr G away from trouble. Robots.txt files (Master Botts) are the perfect tool for this situation. Master Botts will be the first to greet Mr G, and will not leave his side until Mr G leaves trouble-free.”

How can we apply this to Magento? Well, what we need to do is make sure that all areas of a Magento website that are not for the public are disallowed in your robots.txt file.

A good example of this can be seen in the following file. You will notice that many areas are included. You will also notice that the path to the admin login is deliberately left out: robots.txt is publicly readable, so listing the admin path there would be a sure-fire giveaway for hackers and crackers alike.

# $Id: robots.txt,v magento-specific 2010/28/01 18:24:19 goba Exp $
#
# robots.txt
#
# This file is to prevent the crawling and indexing of certain parts
# of your site by web crawlers and spiders run by sites like Yahoo!
# and Google. By telling these "robots" where not to go on your site,
# you save bandwidth and server resources.
#
# This file will be ignored unless it is at the root of your host:
# Used:    http://example.com/robots.txt
# Ignored: http://example.com/site/robots.txt
#
# For more information about the robots.txt standard, see:
# http://www.robotstxt.org/wc/robots.html
#
# For syntax checking, see:
# http://www.sxw.org.uk/computing/robots/check.html

# Website Sitemap
Sitemap: http://www.mydomain.com/sitemap.xml

# Crawlers Setup
User-agent: *
Crawl-delay: 10

# Allowable Index
Allow: /*?p=
Allow: /index.php/blog/
Allow: /catalog/seo_sitemap/category/
Allow: /catalogsearch/result/

# Directories
Disallow: /404/
Disallow: /app/
Disallow: /cgi-bin/
Disallow: /downloader/
Disallow: /includes/
Disallow: /js/
Disallow: /lib/
Disallow: /magento/
# Disallow: /media/ // I would personally allow this folder for google product caching
Disallow: /pkginfo/
Disallow: /report/
Disallow: /skin/
Disallow: /stats/
Disallow: /var/

# Paths (clean URLs)
Disallow: /index.php/
Disallow: /catalog/product_compare/
Disallow: /catalog/category/view/
Disallow: /catalog/product/view/
Disallow: /catalogsearch/
Disallow: /checkout/
Disallow: /control/
Disallow: /contacts/
Disallow: /customer/
Disallow: /customize/
Disallow: /newsletter/
Disallow: /poll/
Disallow: /review/
Disallow: /sendfriend/
Disallow: /tag/
Disallow: /wishlist/

# Files
Disallow: /cron.php
Disallow: /cron.sh
Disallow: /error_log
Disallow: /install.php
Disallow: /LICENSE.html
Disallow: /LICENSE.txt
Disallow: /LICENSE_AFL.txt
Disallow: /STATUS.txt

# Paths (no clean URLs)
Disallow: /*.js$
Disallow: /*.css$
Disallow: /*.php$
Disallow: /*?p=*&
Disallow: /*?SID=

All credit for the above lies with the Magento Forum at this thread.
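
A rough way to see what a rule set like this actually blocks is to run a handful of paths through a small matcher. The Python sketch below is an illustration only, not part of Magento or of any crawler. It assumes the longest-match precedence that Google documents (the longest matching pattern wins, and Allow wins a tie), handles `*` and a trailing `$`, and ignores user-agent groups entirely. The names `RULES`, `pattern_to_regex` and `is_allowed` are made up for this example, and other crawlers may resolve conflicts differently.

```python
import re

# A few rules copied from the robots.txt above; extend the list to test more.
RULES = [
    ("allow", "/catalogsearch/result/"),
    ("disallow", "/catalogsearch/"),
    ("disallow", "/checkout/"),
    ("disallow", "/skin/"),
    ("disallow", "/*.css$"),
    ("disallow", "/*?SID="),
]


def pattern_to_regex(pattern):
    """Translate a robots.txt pattern: '*' matches any run of characters,
    a trailing '$' anchors the end of the URL path."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(path, rules=RULES):
    """Return True if `path` may be crawled (assumed longest-match precedence)."""
    best_len, allowed = -1, True  # no matching rule means the path is allowed
    for kind, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            length = len(pattern)
            # The longer pattern wins; on a tie, Allow beats Disallow.
            if length > best_len or (length == best_len and kind == "allow"):
                best_len, allowed = length, kind == "allow"
    return allowed


# Hypothetical paths, chosen only to exercise the rules above.
for path in [
    "/catalogsearch/result/?q=shoes",
    "/checkout/cart/",
    "/media/catalog/product/a.jpg",
    "/media/css/print.css",
]:
    print(path, "->", "allowed" if is_allowed(path) else "blocked")
```

Note that under these assumptions a stylesheet under /media/ is still blocked by the `/*.css$` rule even though /media/ itself is not disallowed, so it is worth testing the paths you care about rather than reading the list by eye.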

My role in this was perhaps to remind you of the importance of a robots.txt file and what it can actually do. Oh, and one more thing: make sure you link your sitemap.xml file into your robots.txt. I should have added something into the story such as “showing Mr G your fabulous family portrait wall” or something similar, but hey-ho, there you go.
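
If you want to confirm that the sitemap link is really being picked up, Python's standard library can parse the file for you. This is a minimal sketch, assuming Python 3.8 or later (where `site_maps()` is available); the `ROBOTS_TXT` string simply repeats the sitemap and crawler lines from the file above, with the same placeholder domain.

```python
from urllib.robotparser import RobotFileParser

# The sitemap and crawler lines from the file above (placeholder domain).
ROBOTS_TXT = """\
Sitemap: http://www.mydomain.com/sitemap.xml

User-agent: *
Crawl-delay: 10
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# site_maps() returns None when the file has no Sitemap line.
sitemaps = parser.site_maps() or []
# crawl_delay() returns None when no Crawl-delay applies to the agent.
delay = parser.crawl_delay("*")

print("Sitemaps:", sitemaps)
print("Crawl-delay for *:", delay)
```

An empty list here would mean the Sitemap line is missing or misspelled. To check a live site instead, `set_url()` followed by `read()` fetches the file over HTTP.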

Thanks for visiting e-commerce web design, home of the magento fox.

18 Responses to “Magento Robots.txt & SEO”

  1. Sulman
    #1 | 16th November 2010

    What a great post! Loving the start!

    Can this robots.txt file be dropped straight in to the root without problems (other than changing the domain)?

    If so is it ok for all versions of Magento?

    Thanks again!

  2. Adam Moss
    #2 | 17th November 2010

    Yes just drop it in the root, it has no effect on the actual site itself – just the robots that crawl it.

  3. Sulman
    #3 | 17th November 2010

    Excellent! Thanks very much for sharing.

  4. Mike305
    #4 | 21st November 2010

    Thank You! .. I added this to my site today!

  5. Bill
    #5 | 24th November 2010

    Just wondering how do you stop a bot from reading SSL? or content encrypted within https?

  6. louboutin 2011
    #6 | 6th April 2011

    I read mostly all the post of your blog, the posts were really interesting and I came to know about lot of things. Thank you so much for your post.

  7. Sergio Iacobucci
    #7 | 13th April 2011

    All of those URLs, do they need to be changed slightly to be specific to my site? Or can I just use that exact format because it is Magento?

    Many thanks, very helpful post and I love the start,

  8. michel
    #8 | 20th May 2011

    Thank you for sharing. I am wondering how it would look for multiple websites running under one installation. Each underlying website is placed in a separate folder under the root. My question: do I have to replicate the robots.txt (and/or which part of it) and place a copy in each folder (website)? Thank you for your reply.

  9. Penny Magas
    #9 | 2nd June 2011

    Thank you very much! I wonder if you could offer me some guidance as it relates to my site and the sitemaps? You mention that the sitemap needs to be in the root. You’re the first one I’ve found who’s mentioned that and it could well be the source of my problem with poor search results.

    I have multiple websites running under one installation and the sites all go to one cart/one SSL. (I only mention that because it had a major impact on my directory structure.)

    I can put the robots.txt in the root of each site. But the sitemap that magento prepares daily is not in the root – it’s in the sitemap folder. On the magento configuration for google sitemap generation, I point it to the sitemap folder and have a symbolic link to get it to the proper site’s folder. (When I specify the root there, it puts all of the sitemaps in the root of the folder for the “main” site – the one that has the SSL.)

    If I create a symbolic link in the root of each site for just the file in the sitemap folder, will it be crawled or will the link cause the crawlers and such to overlook it?

    Alternatively, I can (manually) copy the sitemap file to the root periodically. It won’t be as effective because it won’t get done daily, but it might be better than nothing?

    I do hope this makes sense. Thank you in advance for your help with this!

  10. michel
    #10 | 3rd June 2011

    Hello, I am running a single installation of Magento with multiple websites. Since the new version 1.5.0.1, underlying folders for the website(s) are no longer needed; everything is now simply implemented in the index.php in the root. So you only need one robots.txt, and at the end of it you place the links to your different sitemaps like this:
    Sitemap: http://www.yoursite.de/de_sitemap.xml
    Sitemap: http://www.yoursite.es/es_sitemap.xml
    Sitemap: http://www.yoursite.com/en_sitemap.xml
    Sitemap: http://www.yoursite.eu/sitemap.xml

    etc…
    All these sitemaps should (can) be placed in the root of your site.

    The most important part is the possibility now to switch directly from only one piece of code in the index file to each website without any subfolder. Hope this helps further.
    Michel

  11. Neil Bradley
    #11 | 17th June 2011

    Has anyone had problems with Bing Webmaster Tools by using this? Google Webmaster has crawled my site, but Bing reports that it is being excluded by the robots.txt. :S

  12. Yashar
    #12 | 1st July 2011

    Why do you recommend “Crawl-delay: 10”? Does that not affect the SEO in a bad way?

  13. Robert Kent
    #13 | 4th July 2011

    Hi Yashar,

    That’s a good point. In fact Google completely disregards that line, and many other robots do too, so feel free to remove it if you wish. The only reason for the crawl delay is to let the server catch up with the requests. Back in the day most connections and servers were slower; that is no longer the case, so it’s pretty much redundant.

  14. Shoetopia
    #14 | 19th September 2011

    Awesomeeeeeee! Really appreciate this ;) . Hope it works, started getting annoying with my random folders getting indexed.

  15. Hiren Modi
    #15 | 18th October 2011

    I am really happy to read a blog post so closely related to robots.txt for Magento. I am going to move my entire eCommerce website to a new platform [osCommerce to Magento].

    http://www.vistastores.com/robots.txt
    This is my current robots.txt on the osCommerce platform. But I am quite confused about how to define directories, paths and files for Magento. You have given such a great example of a robots.txt file, so can I implement it for my Magento website? Is there any default robots.txt file that would help me disallow the right folders?

  16. modra ideja
    #16 | 27th October 2011

    Excellent! Thanks for this excellent post. I will upload this robots.txt file to all my Magento stores. Keep up the good work!

  17. Mike
    #17 | 2nd November 2011

    When using this robots.txt file I am getting errors in both webmaster tools:

    “Some important page is blocked by robots.txt. More Details”

    and Google Base:

    “Roboted images (592 items)
    The submitted image URLs seem to be blocked by robots.txt. Google will not be able to display these images together with the products. Please change your robots.txt file to allow Google to download the image.”

    Anyone else having these issues?

  18. Wally
    #18 | 17th November 2011

    Thank you for the robots.txt – however I have to agree with MIKE as I am getting a “Some important page is blocked by robots.txt. More Details” by google webmaster tools as well.

    Is there any fix?
