One objective most website owners share is having their site crawled effectively by search engines. What they often miss is a common pitfall: search engines can be too efficient at crawling. “What do you mean, Les?”
In this post, I will teach you how to determine which pages on your site have been crawled, and what you can do to keep certain things from being crawled and indexed.
I have never seen a website where 100% of the pages and content needed to be crawled. In fact, letting the search engines do too good a job can actually hurt your business, because pages you do not want appearing in Google (or Bing and Yahoo) results will show up there. Search engines are simply indiscriminate. Let them see it and they will crawl and index it (sooner or later).
Determining What Parts of Your Site Have Been Crawled and Indexed
There are a number of ways to do this:
- Let Google do the work. Simply go to www.google.com (or .ca or whatever...), type site:mysitename.com and hit the search button. Here’s a real-life example from my Cycling Blog:
Oh oh, I have a snippet to fix!
- I am assuming that you are using an analytics application to track your website’s statistics... Right? If not, get to it!
The data from your program will show you traffic to your site. If Google or whoever has indexed an area of your website that you do not want traffic to, any hits will show here. Of course this is not foolproof: just because a page has been indexed does not mean that someone will click through to it. So take this option as a check and balance. That’s all.
- Sign up for an application like Google Webmaster Tools. As with most things Google, it’s free. I won’t get into many details, but you can be sure to find some handy goodies to help maximize the efficiency of your web presence. It’s actually very cool. And fiddling with your robots.txt file is only the tip of the iceberg!
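The analytics cross-check described above can be roughed out in a few lines of Python. This is only a sketch: it assumes you have exported a list of landing-page paths from your analytics program, and the function name, the sample paths, and the "unwanted" prefixes are all made up for illustration.

```python
def unwanted_hits(landing_pages, unwanted_prefixes):
    """Return landing pages that sit under a prefix you would rather keep out of search results."""
    return sorted(
        path
        for path in landing_pages
        if any(path.startswith(prefix) for prefix in unwanted_prefixes)
    )


# Hypothetical export of landing-page paths from an analytics program.
pages = ["/", "/blog/spring-ride", "/admin/login", "/tmp/draft-post", "/about"]

# Hypothetical areas of the site that should not be getting search traffic.
flagged = unwanted_hits(pages, ["/admin/", "/tmp/"])
print(flagged)
```

Anything the script flags is worth a closer look, but remember the caveat: an empty list does not prove those areas are not indexed, only that nobody has clicked through to them.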
Making Sure What You Want Gets Crawled... And What You Don’t Want Doesn’t
By now you have a few ways of seeing what has been crawled on your website, and an idea of why this may or may not be good. So how about having a look-see at a robots.txt file? Exciting, eh!?!
Now this is interesting! What do you see?
- They have opened themselves up to all search engines
- Man, do they ever have a lot of ad partners! (note the blocked directories)
So now you need to learn how to do this yourself (Caution: If you are unsure of yourself, hire somebody who knows what they are doing).
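Before you touch a live file, it can help to test your rules offline. Here is a minimal sketch using Python's standard urllib.robotparser module. The robots.txt content, the directory names, and the helper function are hypothetical; they are only meant to mirror the pattern described above (open to all search engines, with a couple of blocked directories).

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: open to every crawler, with two blocked directories.
ROBOTS_TXT = """\
User-agent: *
Disallow: /ads/
Disallow: /private/
"""


def check_urls(robots_txt, urls, user_agent="Googlebot"):
    """Map each URL to True if the rules allow user_agent to crawl it, else False."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {url: parser.can_fetch(user_agent, url) for url in urls}


results = check_urls(ROBOTS_TXT, [
    "http://mysitename.com/",
    "http://mysitename.com/ads/banner.html",
    "http://mysitename.com/private/notes.html",
])

for url, allowed in results.items():
    print("crawl  " if allowed else "blocked", url)
```

Keep in mind that robots.txt only asks crawlers to stay out. Well-behaved search engines respect it, but it is not a security measure.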
Here’s a good resource to get you started.
And -- almost forgot the source
And, to cap things off, an interesting video from Google’s own Matt Cutts that speaks to how Google indexes websites and the impact of robots.txt. Four and a half minutes of time well spent!



