One thing you have to remember about search engines is that they aren’t human. A web crawler sees your page completely differently than you do. In fact, it doesn’t really “see” it at all. Rather, it systematically analyzes the code on the page and, through an extraordinarily complex set of algorithms, attempts to interpret what the site is about.
While Google’s ability to “see” a site like a human gets better every single day, there are still some inherent weaknesses in artificial intelligence holding it back. For that reason, it’s a good idea to look at your site the way the crawlers do every once in a while.
What does it do?
First and foremost, it’s a great way to find crawl errors. Google can only find a webpage by following a link to that page. If the spiders can’t reach your site, or a specific URL on that site, that site or URL won’t be included in Google’s index. That means you don’t get credit for what you’ve created and won’t show up for any search, no matter how relevant your site may be. Conversely, if you wanted to block a portion of your site from being crawled (to prevent duplicate-content issues, conserve bandwidth, etc.), it’s nice to confirm that the block is working correctly.
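If you do want to verify that a blocked section really is off-limits, one quick sanity check is to run your own robots.txt through a parser. Below is a minimal Python sketch using the standard library’s urllib.robotparser; the site address and paths are placeholders you would swap for your own.

```python
# Minimal sketch: check whether Googlebot is allowed to fetch a few URLs,
# according to the rules in your live robots.txt.
from urllib import robotparser

SITE = "https://www.example.com"              # placeholder: your site
PATHS = ["/", "/blog/", "/private-section/"]  # placeholder paths to test

parser = robotparser.RobotFileParser()
parser.set_url(SITE + "/robots.txt")
parser.read()  # downloads and parses robots.txt

for path in PATHS:
    allowed = parser.can_fetch("Googlebot", SITE + path)
    print(f"Googlebot {'may' if allowed else 'may NOT'} fetch {path}")
```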
If your site is designed so that users see one thing and spiders see another, you could be in trouble. This practice is called cloaking (just like in Star Trek), and whether you did it on purpose or not, you could face some serious penalties if Google deems it manipulative. Once again, it’s worth double-checking for that kind of thing every so often.
How do I do it?
There are a number of ways to make yourself a bit more robotic. Each of the major browsers has a plethora of user-agent switching tools available. Google also provides a great starting point with the Crawl Errors page in Webmaster Tools, which lists the problems Google has come across when attempting to crawl your site.
It’s a good idea to compare what your regular user agent sees with what Googlebot sees. If there are cloaked portions of your site, that is a sure way to find them. Of course, many sites are much too large to compare effectively this way. If you are working on a large site that may have employed some of these tactics in the past, you might want to try one of the cloak-detector tools available, although your best bet is simply to get to know the site, especially any JavaScript or other elements the spiders can’t read.
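If you’re comfortable with a little scripting, you can automate that spot check. The sketch below (standard-library Python; the URL and user-agent strings are only examples) requests the same page once with a browser-style user agent and once with a Googlebot-style one, then compares the two responses.

```python
# Minimal sketch: fetch one page as a "browser" and as "Googlebot",
# then compare the responses to spot content served differently.
import hashlib
import urllib.request

URL = "https://www.example.com/some-page"  # placeholder: a page on your site

USER_AGENTS = {
    "browser":   "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "googlebot": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
}

def fetch(url, user_agent):
    """Return the response body for url, sent with the given User-Agent."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req) as resp:
        return resp.read()

bodies = {name: fetch(URL, ua) for name, ua in USER_AGENTS.items()}

# Identical hashes mean both "visitors" received the same HTML.
for name, body in bodies.items():
    print(name, len(body), "bytes", hashlib.sha256(body).hexdigest())
```

Keep in mind that rotating ads, timestamps, or session tokens will produce harmless mismatches, so treat a difference as a prompt to look closer rather than proof of cloaking.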
In the end, if the robots can’t find you, people won’t either, so it pays to make sure they know where to look.