But before we open Robots.txt and start working on it, let's take a brief look at why it matters:
Warning! Use with caution. Incorrect use of these features can result in your blog being ignored by search engines.
What is Robots.txt?
Blogger automatically generates a Robots.txt file for every blog you create. The purpose of this file is to tell incoming robots (spiders, crawlers and so on, sent by search engines such as Google and Yahoo) about your blog and its structure, and whether or not to crawl its pages. As a blogger, you will want certain pages of your site to be crawled and indexed by search engines, while you might prefer others, such as a label page, a demo page or any other irrelevant page, not to be indexed.
How do they see Robots.txt?
Well, Robots.txt is the first thing these spiders look at as soon as they reach your site. Your Robots.txt is like a flight attendant who directs you to your seat and keeps checking that you don't wander into private areas. Well-behaved spiders therefore crawl only the pages that Robots.txt allows and leave the others alone.
Where is Robots.txt located?
You can easily view your Robots.txt file either in your browser, by adding /robots.txt to your blog address (for example http://myblog.blogspot.com/robots.txt), or by signing in to your blog, choosing Settings > Search Preferences > Crawlers and indexing, and selecting Edit next to Custom robots.txt.
What does Robots.txt look like?
If you haven't touched your robots.txt file yet, it should look something like this:
User-agent: Mediapartners-Google
Disallow:
User-agent: *
Disallow: /search
Allow: /
Sitemap: http://myblog.blogspot.com/feeds/posts/default?orderby=UPDATED
Don't worry if yours isn't colored or shows no line breaks; the line breaks here simply make it easier to see what each of these words means.
User-agent: Mediapartners-Google
Mediapartners-Google is Google's AdSense robot, which crawls your site to find relevant ads to serve on your blog. The empty Disallow: line beneath it means this robot may crawl everything. If you disallow posts or pages for it, AdSense won't be able to crawl them and may show less relevant ads, or none at all, on those posts or pages. If you are not using Google AdSense on your site, simply remove both of these lines.
User-agent: *
Those of you with some programming experience will have guessed that the asterisk '*' is a wildcard. For everyone else: it means that this group (the lines beneath it) applies to all incoming spiders, robots and crawlers that don't have a group of their own, such as Mediapartners-Google above.
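If you want to check which group a given robot follows, Python's standard-library urllib.robotparser can read the default file shown above. This is a minimal sketch: myblog is the placeholder address used throughout, the label URL is made up, and robotparser's matching is a simplified version of what Google's own crawlers do.

```python
from urllib.robotparser import RobotFileParser

# The default Blogger file from above ("myblog" is a placeholder blog address).
DEFAULT_ROBOTS = """\
User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /search
Allow: /

Sitemap: http://myblog.blogspot.com/feeds/posts/default?orderby=UPDATED
"""

parser = RobotFileParser()
parser.parse(DEFAULT_ROBOTS.splitlines())

label_page = "http://myblog.blogspot.com/search/label/mylabel"

# Mediapartners-Google has its own group with an empty Disallow, so nothing is blocked for it.
adsense_ok = parser.can_fetch("Mediapartners-Google", label_page)
# Googlebot has no group of its own here, so it falls back to the "User-agent: *" group.
other_ok = parser.can_fetch("Googlebot", label_page)

print(adsense_ok, other_ok)  # True False
```

The same label page is open to the AdSense robot but closed to every other robot, purely because of which User-agent group each one lands in.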
Disallow: /search
The Disallow keyword specifies what robots should not crawl on your blog. Adding /search after it tells robots not to crawl your blog's search and label pages, that is, any URL whose path starts with /search. Therefore, a page like http://myblog.blogspot.com/search/label/mylabel will not be crawled.
Allow: /
The Allow keyword specifies what robots may crawl on your blog. Adding '/' lets them crawl every URL on your blog, starting with your homepage, except for the paths you have disallowed (here, /search).
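To see Disallow and Allow working together, here is a small sketch that asks urllib.robotparser about a few URLs. The blog address is the usual placeholder, the paths are made-up examples, and Googlebot is just a sample robot name.

```python
from urllib.robotparser import RobotFileParser

# Just the "User-agent: *" group of the default Blogger file.
RULES = """\
User-agent: *
Disallow: /search
Allow: /
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

blog = "http://myblog.blogspot.com"  # placeholder blog address
# Example paths: homepage, a post, a label page and a search-results page.
paths = ["/", "/2024/05/my-post.html", "/search/label/mylabel", "/search?q=robots"]
results = {path: parser.can_fetch("Googlebot", blog + path) for path in paths}

for path, allowed in results.items():
    print(f"{path:28} {'crawl' if allowed else 'skip'}")
```

One design note: robotparser applies the first rule that matches, in file order, while Google's crawlers use the most specific (longest) matching rule. For this file both readings give the same answers: the homepage and the post are crawled, and both /search URLs are skipped.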
Sitemap:
The Sitemap keyword points robots to your blog's sitemap; the link given here is your blog's post feed, ordered by the most recently updated posts, so robots can find your new posts. By including this link, you help incoming robots crawl efficiently: they can follow it to the links of your blog posts, which helps ensure that none of your published posts are left out from an SEO perspective.
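A robot picks up the Sitemap line separately from the User-agent groups. Here is a minimal sketch using Python's urllib.robotparser (site_maps() needs Python 3.8 or later), again with the placeholder myblog address:

```python
from urllib.robotparser import RobotFileParser

ROBOTS = """\
User-agent: *
Disallow: /search
Allow: /

Sitemap: http://myblog.blogspot.com/feeds/posts/default?orderby=UPDATED
"""

parser = RobotFileParser()
parser.parse(ROBOTS.splitlines())

# Sitemap lines belong to no User-agent group, so every robot can pick them up.
# site_maps() returns them in file order, or None if the file has none.
sitemaps = parser.site_maps()
print(sitemaps)  # ['http://myblog.blogspot.com/feeds/posts/default?orderby=UPDATED']
```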
However, by default this feed lists only 25 posts, so if you want robots to find more of them, replace the sitemap link with this one:
Sitemap: http://myblog.blogspot.com/atom.xml?redirect=false&start-index=1&max-results=500
And if you have more than 500 published posts, you can use two sitemaps like this, the first covering posts 1 to 500 in the feed and the second covering posts 501 to 1000:
Sitemap: http://myblog.blogspot.com/atom.xml?redirect=false&start-index=1&max-results=500
Sitemap: http://myblog.blogspot.com/atom.xml?redirect=false&start-index=501&max-results=500
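If your post count keeps growing, the same pattern extends to one Sitemap line per block of 500 posts. Below is a small sketch that writes those lines for you. The helper name sitemap_lines and the 500-post block size are my own assumptions (the block size mirrors max-results=500); start-index counts from 1, so the blocks start at 1, 501, 1001 and so on and never overlap.

```python
import math
from typing import List

BLOG = "http://myblog.blogspot.com"  # placeholder blog address
PAGE_SIZE = 500  # posts per sitemap, mirroring max-results=500


def sitemap_lines(total_posts: int, blog: str = BLOG, page_size: int = PAGE_SIZE) -> List[str]:
    """Hypothetical helper: one Sitemap line for each block of `page_size` posts."""
    pages = max(1, math.ceil(total_posts / page_size))  # always emit at least one line
    lines = []
    for page in range(pages):
        start = page * page_size + 1  # start-index is 1-based
        lines.append(
            f"Sitemap: {blog}/atom.xml?redirect=false"
            f"&start-index={start}&max-results={page_size}"
        )
    return lines


for line in sitemap_lines(1200):
    print(line)
```

For 1200 posts this prints three Sitemap lines, starting at 1, 501 and 1001, which you can paste into your custom robots.txt.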
How to prevent posts/pages from being indexed and crawled?
In case you haven't discovered it yourself yet, here is how to stop spiders from crawling and indexing particular pages or posts:
Disallow Particular Post
Disallow: /yyyy/mm/post-url.html
The /yyyy/mm part is your post's publishing year and month, and /post-url.html is the page you don't want robots to crawl. To prevent a post from being crawled and indexed, simply copy the URL of the post you want to exclude and remove the blog address from the beginning.
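The "copy the URL and remove the blog address" step is easy to get slightly wrong by hand, so here is a tiny sketch that does it with Python's urllib.parse. The helper name disallow_line is hypothetical, and the post URL is a made-up example on the placeholder blog.

```python
from urllib.parse import urlparse


def disallow_line(url: str) -> str:
    """Hypothetical helper: turn a full post or page URL into a Disallow line.

    Only the path is kept, so the blog address at the front is removed.
    Any ?query (such as ?m=1) or #fragment is dropped as well.
    """
    path = urlparse(url).path or "/"
    return f"Disallow: {path}"


print(disallow_line("http://myblog.blogspot.com/2024/05/my-post.html"))
# Disallow: /2024/05/my-post.html
```

The same helper works for static pages, since their URLs differ only in the path.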
Disallow Particular Page
To disallow a particular page, use the same method as above: copy the page URL and remove your blog address from it, so that the line looks something like this:
Disallow: /p/page-url.html
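Before saving, you can confirm that the extra Disallow lines block only what you intended by feeding a draft file to Python's urllib.robotparser. In this sketch the post and page URLs are placeholders and Googlebot is just a sample robot name.

```python
from urllib.robotparser import RobotFileParser

# The default "*" group plus one excluded post and one excluded page (placeholder URLs).
CUSTOM_ROBOTS = """\
User-agent: *
Disallow: /search
Disallow: /2024/05/post-url.html
Disallow: /p/page-url.html
Allow: /
"""

parser = RobotFileParser()
parser.parse(CUSTOM_ROBOTS.splitlines())

blog = "http://myblog.blogspot.com"
paths = ["/p/page-url.html", "/2024/05/post-url.html", "/p/contact.html", "/2024/05/other-post.html"]
checks = {path: parser.can_fetch("Googlebot", blog + path) for path in paths}
print(checks)
```

The two excluded URLs come back False and the others True. I placed the new Disallow lines above Allow: / on purpose: robotparser stops at the first rule that matches, while Google picks the most specific one, so this order gives the same result under either reading.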
Adding Custom Robots.txt to Blogger
Now let's see exactly how you can add a custom Robots.txt file in Blogger:
1. Sign in to your Blogger account and click on your blog.
2. Go to Settings > Search Preferences > Crawlers and indexing.
3. Select 'Edit' next to Custom robots.txt and check the 'Yes' check box.
4. Paste your code or make changes as per your needs.
5. Once you are done, press the Save Changes button.
6. And congratulations, you are done!
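Since a single typo in step 4 can quietly change what robots do, a quick check before you press Save can help. Here is a minimal sketch of such a check in Python. Everything in it is hypothetical: the function lint_robots is my own, and it only knows the four directives used in this tutorial, so it is not a full robots.txt validator.

```python
from typing import List, Tuple

# Directives used in this tutorial; anything else is flagged (a deliberately narrow check).
KNOWN_FIELDS = {"user-agent", "disallow", "allow", "sitemap"}


def lint_robots(text: str) -> List[Tuple[int, str]]:
    """Hypothetical pre-save check: return (line number, problem) pairs, empty if none found."""
    problems = []
    seen_user_agent = False
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding spaces
        if not line:
            continue
        if ":" not in line:
            problems.append((number, "missing ':' after the directive name"))
            continue
        field, value = line.split(":", 1)
        field, value = field.strip().lower(), value.strip()
        if field not in KNOWN_FIELDS:
            problems.append((number, f"unknown directive {field!r}"))
        elif field == "user-agent":
            seen_user_agent = True
            if not value:
                problems.append((number, "User-agent needs a robot name or *"))
        elif field in ("allow", "disallow"):
            if not seen_user_agent:
                problems.append((number, f"{field} comes before any User-agent line"))
            if value and not value.startswith(("/", "*")):
                problems.append((number, f"{field} path should start with '/'"))
    return problems


draft = """\
User-agent: *
Disallow search
Disalow: /p/page-url.html
Allow: /
"""

for number, message in lint_robots(draft):
    print(f"line {number}: {message}")
```

For this draft it reports line 2 (missing colon) and line 3 (the misspelled "Disalow"), both of which a search engine would otherwise silently ignore.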
How can you check that your changes were made to Robots.txt?
As explained above, simply type your blog address into your browser's address bar and add /robots.txt at the end, as in this example:
http://helplogger.blogspot.com/robots.txt
Once you open the robots.txt file, you will see the code from your custom robots.txt file. See the screenshot below:
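If you would rather check from a script than from the browser, here is a small sketch using Python's standard library. The function names are hypothetical, fetch_robots needs network access, and same_rules ignores blank lines and surrounding spaces because Blogger may not serve your file byte for byte as you typed it.

```python
from urllib.parse import urljoin
from urllib.request import urlopen


def robots_url(blog_address: str) -> str:
    """Build the robots.txt address, e.g. http://myblog.blogspot.com/robots.txt."""
    return urljoin(blog_address, "/robots.txt")


def fetch_robots(blog_address: str, timeout: float = 10.0) -> str:
    """Download the robots.txt your blog is currently serving (needs network access)."""
    with urlopen(robots_url(blog_address), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def same_rules(saved: str, live: str) -> bool:
    """Compare two robots.txt texts, ignoring blank lines and surrounding whitespace."""
    def clean(text: str) -> list:
        return [line.strip() for line in text.splitlines() if line.strip()]
    return clean(saved) == clean(live)


# Usage (replace the placeholder with your own blog address):
# live = fetch_robots("http://myblog.blogspot.com")
# print(same_rules(my_saved_text, live))
```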
Final Words:
Are we through, then, bloggers? Are you done adding your custom Robots.txt in Blogger? It was easy once you knew what those code words meant. If you didn't get it the first time, just go through the tutorial again, and before long you will be customizing your friends' robots.txt files.
In any case, from an SEO and site-ranking point of view, it's important to make those small changes to your robots.txt file, so don't be a sloth. Learning is fun, as long as it's free, isn't it?