I’ve gone on and on lately about taking away some comforts for the sake of effective site analysis. There are some aspects of a good site analysis that you just can’t accomplish in a browser’s naturally cheery and helpful state. Today I’d like to present the last step toward making your browser as stiff and humorless as a web crawler.

Googlebots aren't great at reading JavaScript... Yet

Google has steadily improved its algorithms over the years, and is constantly getting better at “seeing” a site the way a human’s puny optical nerves do. Similarly, the way a bot understands a web page has gone from finding keyword occurrences in the right places to something much more closely resembling a human’s puny cerebral processes. With Latent Semantic Indexing and other word-relationship technologies, Google can determine what a site is about in a much more “natural language” sort of way.

Even with all these advancements though, Google is still a long way from Skynet. (Or at least that’s what they want us to believe…)

One particular Google weakness is JavaScript. While Google has improved drastically, it’s still not great at deciphering it. It’s not so much that the spiders are slow learners; it’s just that they weren’t designed for that. See, spiders were created to perform one task: crawl about on the world wide webs and keep track of what they find there. With good old standard HTML, that’s just a matter of reading: HTML is a markup language, meaning everything it’s meant to do is written out explicitly in the site’s code, not programmed.

JavaScript, on the other hand, is not so straightforward. JavaScript is a programming language: in order to utilize it, the browser must read the code and run the program. That’s where it gets tricky for spiders. It’s not easy for one computer program to run and analyze another. Any step in that direction is a pretty major breakthrough, because we’re up against mathematical problems (the halting problem, for one) that have been around for a very long time. In fact, programs that can execute and understand other programs are a pretty big step toward the Singularity!
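To make the difference concrete, here’s a tiny made-up example (the product copy and element names are invented for illustration). The heading is plain markup a spider can read straight off the page, while the paragraph only exists once a browser actually runs the script:

    <!-- Plain markup: a spider can read this heading without running anything. -->
    <h1>Our Products</h1>

    <!-- This paragraph only exists after the script runs; a crawler that
         doesn't execute JavaScript never sees it. -->
    <div id="details"></div>
    <script>
      document.getElementById('details').innerHTML =
        '<p>Hand-crafted widgets, shipped worldwide.</p>';
    </script>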

But, putting the robot uprising and subsequent human apocalypse aside for now, the practical problem is this: if Google doesn’t understand your JavaScript, you may be inadvertently locking their spiders out of your content, and thus keeping that content out of Google’s index! So, once in a while, it’s wise to take a look at your site with JavaScript disabled, to make sure every important page element still shows up and every link still works. This is especially important for fancy JavaScript-based navigation, because if Google gets confused there, it may not make it to the rest of your site.
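As a rough sketch of why navigation matters here (the URL is made up), a link written as a plain anchor is easy for a spider to follow, while one that only goes anywhere when a click handler runs can leave the rest of your site invisible to a crawler that doesn’t execute JavaScript:

    <!-- Crawlable: the destination is spelled out right in the markup. -->
    <a href="/products.html">Products</a>

    <!-- Risky: the destination only exists when JavaScript runs, so a spider
         that skips the script has no idea this leads anywhere. -->
    <span onclick="window.location = '/products.html'">Products</span>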

And in your spare time, make sure you’re preparing to do your part in the human resistance. Robots are ruthless oppressors, you know.

[Image: Shiny Cylons. Ruthless and shiny!]