robots.txt – What not to do

Warning – the “robots.txt” file can destroy a whole website’s search rankings!

The “robots.txt” file is a small text file in a website’s root directory that tells search engine spiders how to interact with the site.  Our “robots.txt” file allows any search engine spider (from Google, Bing, AOL, Yahoo etc.) to crawl every webpage of our site and use the pages in the search engine results.  Just as importantly, the file tells spiders where to find the website’s XML Sitemap. This is a list of all the URLs on a site, and it helps make sure no page is missed when the site is added to search engine indexes.  The XML Sitemap also tells search engines about any new webpages that have been created. If you have a WordPress blog, make sure you have an XML Sitemap generator plugin installed for this purpose.
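As a rough illustration of the kind of file described above, here is a minimal Python sketch that parses an "allow everything" robots.txt with the standard library's urllib.robotparser. The example.com domain, the sitemap URL and the page path are made up for the example, and site_maps() needs Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: every spider may crawl everything,
# and the XML Sitemap location is advertised at the bottom.
ROBOTS_TXT = """\
User-agent: *
Disallow:

Sitemap: https://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# An empty "Disallow:" blocks nothing, so any page is crawlable.
allowed = parser.can_fetch("Googlebot", "https://www.example.com/any/page.html")

# Sitemap lines are collected separately from the crawl rules.
sitemaps = parser.site_maps()

print(allowed)
print(sitemaps)
```

The empty "Disallow:" line is what makes this an open-door file; the Sitemap line sits outside the User-agent group and applies to every spider.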

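Another feature of robots.txt is to tell search engine spiders which pages to ignore, which usually keeps them out of the search engine results.  This is similar in aim to adding the “NOINDEX” value to a robots meta tag, but it is not identical: robots.txt stops a spider from crawling a page, while NOINDEX lets the page be crawled and asks for it to be left out of the index.  A page blocked only by robots.txt can still show up as a bare URL if other sites link to it.  Examples of using this could be on duplicate pages, pages with sensitive information such as names/addresses, or sales pages full of repeated keywords.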
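To make the idea of skipping selected pages concrete, the sketch below blocks one folder and one single page and then asks Python's urllib.robotparser which URLs a well-behaved spider may fetch. The folder name, page name and domain are hypothetical and only there for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: block a folder of duplicate pages and one sales page.
ROBOTS_TXT = """\
User-agent: *
Disallow: /print-versions/
Disallow: /sales/landing.html
"""

def may_crawl(url, agent="*"):
    """Return True if the rules above let `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)

for url in (
    "https://www.example.com/blog/post.html",
    "https://www.example.com/print-versions/page1.html",
    "https://www.example.com/sales/landing.html",
    "https://www.example.com/sales/other.html",
):
    print(url, may_crawl(url))
```

Each Disallow line is matched against the start of the URL path, so a trailing slash blocks a whole folder while a full file name blocks just that page.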

Unfortunately it’s rather easy to set robots.txt to block the entire website!
Just changing the line “Disallow:” to “Disallow: /” is all it takes to bring a whole website out of Google and all the other search engines.  The effect takes about 1-2 weeks to show, but it has devastated many sites in the past, sometimes even deliberately.
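The one-character difference is easy to check before a file goes live. This is a small sketch, again using Python's standard urllib.robotparser, that compares the two versions of the line; the is_allowed helper and the example.com URLs are assumptions made for the example, not part of any tool.

```python
from urllib.robotparser import RobotFileParser

def is_allowed(robots_txt, url, agent="Googlebot"):
    """Parse robots.txt text and report whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)

SAFE = "User-agent: *\nDisallow:\n"        # empty value: nothing is blocked
DANGEROUS = "User-agent: *\nDisallow: /\n"  # single slash: everything is blocked

homepage = "https://www.example.com/"
deep_page = "https://www.example.com/products/widget.html"

print(is_allowed(SAFE, homepage), is_allowed(SAFE, deep_page))
print(is_allowed(DANGEROUS, homepage), is_allowed(DANGEROUS, deep_page))
```

Running a check like this against the homepage after every robots.txt change is a cheap way to catch a stray slash before the search engines do.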

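WordPress blogs also sometimes have the NOINDEX attribute switched on when they are installed.  Check that WordPress isn’t blocking your website from search engines by going to Settings > Reading and making sure “Discourage search engines from indexing this site” is unticked (older versions of WordPress keep this option under Settings > Privacy).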
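You can also look for the NOINDEX setting in the page source itself. The sketch below scans HTML for a robots meta tag using Python's built-in html.parser; the RobotsMetaParser class, the has_noindex helper and the sample HTML snippets are illustrative assumptions, and the exact tag a given WordPress version prints may differ slightly.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collect the directives from any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None when no value is given.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            content = attr.get("content", "")
            self.directives += [d.strip().lower() for d in content.split(",")]

def has_noindex(html):
    """Return True if the HTML asks search engines not to index the page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives

# Hypothetical page sources for illustration.
blocked_page = "<html><head><meta name='robots' content='noindex, nofollow' /></head></html>"
normal_page = "<html><head><meta name='robots' content='index, follow'></head></html>"

print(has_noindex(blocked_page))
print(has_noindex(normal_page))
```

Feeding it the saved source of your own homepage gives a quick yes/no answer on whether the setting has been left on.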

If you don’t know what to do with a robots.txt file then it’s best to leave it out altogether. Googlebot and other search engine spiders will happily go through the whole site if they don’t find one.  Also, robots.txt cannot guarantee that no web spider will ever crawl a page, even when the site is completely blocked, because badly behaved spiders can simply ignore the file. If you have information that you do not wish to be seen, then use passwords or don’t publish it at all!


