Comparing web traffic stats: Google Analytics, Google Urchin, and WebTrends

The best way to take web analytics data is with a healthy dose of salt. While analytics software packages have improved over the years, they are still best seen as providing an indicator of patterns over time, rather than representing accurate absolute numbers.

As I currently have access to traffic stats for the same website from three different analytics solutions, I thought I would share them.

Table: Visits by month, by analytics package

Analytics package    June      July      August    September
WebTrends            31,581    28,333    29,582    29,374
Google Analytics      5,821     5,539     4,525     6,683
Google Urchin        27,920    25,082    25,022    23,839

As you can see, WebTrends and Google Urchin are roughly in line (not surprising, considering they both work via logfile analysis), while Google Analytics reports roughly one-fifth the visits of the other two.

Another interesting comparison was on file downloads, as recorded by Urchin and WebTrends. With my limited sample, I found that Google Urchin undershot WebTrends in reported PDF download numbers by a factor of anywhere between 2 and 10. That is to say, WebTrends might report 28,000 downloads where Urchin reported 2,500, or WebTrends would say 5,000 and Urchin would list 2,300.

Seemingly random, and confusing. I haven't tried any of the JavaScript hacks that get Google Analytics to track file downloads.
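For reference, the usual hack is only a few lines of JavaScript. The sketch below is untested and assumes the classic ga.js setup, i.e. that Google's standard tracking snippet has already created a tracker object named pageTracker; it records each PDF link click as a virtual pageview under a made-up /downloads/ prefix so the downloads at least show up somewhere in the reports.

    // Minimal sketch, assuming the standard ga.js snippet has already run:
    //   var pageTracker = _gat._getTracker("UA-XXXXXX-X");
    //   pageTracker._trackPageview();
    // Attach a click handler to every PDF link and record the download as a
    // virtual pageview (the /downloads/ prefix is arbitrary).
    window.onload = function () {
      var links = document.getElementsByTagName("a");
      for (var i = 0; i < links.length; i++) {
        if (/\.pdf$/i.test(links[i].href)) {
          links[i].onclick = (function (href) {
            return function () {
              pageTracker._trackPageview("/downloads/" + href.split("/").pop());
            };
          })(links[i].href);
        }
      }
    };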

Google’s confusing and questionable advice on URL rewriting

Kudos to the Google Webmaster Central Blog for what seems like a conscious effort to address head-on the concerns of the community over common problem areas like duplicate content and 404s. But their most recent effort, which examines the pros and cons of static vs. dynamic URLs from Google's point of view, seems to have resulted in more heat than light, with a number of confused commenters responding to the post.

The gist of the entry was that you should probably leave your dynamic URLs as-is, because Google has no problem crawling them; rewriting them, if done improperly, can make your site harder to crawl, and the rewrites are also a maintenance burden.

Google seems to think that URL rewriting is more difficult than it actually is. I would like to think that most websites either use a CMS that handles this adequately (as WordPress, Drupal and Joomla! do, for instance), or are run by someone with the technical expertise to do it properly and without much fuss.
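For what it's worth, the rewrite in question is typically a line or two of Apache configuration. Here is a rough .htaccess sketch, not the rules shipped by any particular CMS; the path pattern, script name and query parameters are invented for illustration:

    # Hypothetical mod_rewrite rule: serve the clean URL /articles/some-slug
    # from the underlying dynamic script, without exposing the query string.
    RewriteEngine On
    RewriteRule ^articles/([a-z0-9-]+)/?$ /index.php?section=article&slug=$1 [L,QSA]

If the underlying platform changes later, only the rule has to change; the public URL stays put.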

But even if that isn't the case, Google's advice here runs counter to the more reasonable advice provided in Tim Berners-Lee's W3C Style article, Cool URIs Don't Change.

There are many reasons for using "cool URIs", including the fact that they are easier for people to type, recognize and remember. One of the best reasons offered in the article, though, is that if you have a bunch of technology-dependent cruft in your URL (a .cgi or .php extension, say) and then decide to switch the underlying technology, you end up with an entirely different URL structure, breaking every bookmark and link that has ever been made to your site.

I think the advantages of cool URIs outweigh the risks associated with mapping your dynamic URLs to static URLs, and it is kind of narrow-minded for Google to look at this only as a search engine crawling problem, rather than seeing it in a larger context.

UPDATE

There’s a good post over at SEOmoz on the same topic that lists a bunch of other reasons why, on balance, rewriting your dynamic URLs is still a good idea.

Consequences of bot-mediated reality

I have a lot of catch-up listening to do on The Long Now Foundation's excellent Seminars About Long-term Thinking (SALT) lecture and podcast series. I'm a charter member of the Foundation, which gets you a sweet membership card and access to video of the lectures, among other less tangible things, like knowing you're helping inject some much-needed awareness of long-term thinking and planning into public discourse.

One of the lectures I’m particularly looking forward to downloading is the recent Daemon: Bot-Mediated Reality by Daniel Suarez, which I think has particular relevance given the recent and rather large f-up in which Google’s news crawler inadvertently “evaporated $1.14B USD”.

Unfortunately, I think that in the near future, as more and more processes are automated, we will see more screw-ups on this scale. I can't help but think that this one might have been avoidable, though, if the indexing engine had been able to take advantage of semantic data rather than relying on scraping and evaluating natural language.
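To illustrate what I mean by semantic data, here is a hypothetical hAtom-style microformat snippet (a sketch, not markup that any of the sites involved actually publish): the publication date is exposed in machine-readable form, so a crawler can read it directly instead of guessing it from surrounding text or from when the page was fetched.

    <!-- Hypothetical hAtom-style markup: the "published" timestamp is
         machine-readable, so an indexer never has to infer the story's date. -->
    <div class="hentry">
      <h2 class="entry-title">An old story from the archive</h2>
      <abbr class="published" title="2002-01-15T00:00:00Z">January 15, 2002</abbr>
      <div class="entry-content">...</div>
    </div>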

Google Chrome: speed at the price of bloat?

Yes, Virginia, Google Chrome is fast. I installed it on my Windows XP work machine to give it a whirl, and it certainly rivals, if not outperforms, Firefox 3 in the speed department (though I haven't found it to be as fast as some reports suggest).

Hardware manufacturers everywhere must be thanking Google for this development. As this Slashdot article points out, both Internet Explorer 8 and Google Chrome aim to overcome the limitations that a single-process approach imposes on traditional browsers (more prone to irrecoverable crashes, less robust when running multiple web apps) by splitting tabs across multiple processes. What this means is that both IE8 and Chrome will be more resource-intensive. Sure, you can have Flickr, Google Maps, and Photosynth all running at the same time in different tabs, but it will cost you in extra RAM and CPU overhead (not that this is surprising).

So, even if all our applications do eventually get moved off our desktops into “the cloud”, as some people predict/hope will happen, we will still need faster and larger computers to be able to run these increasingly complicated, JavaScript-laden and Ajax-y web apps. The days of the dumb terminal are over and aren’t coming back.

In any case, I’m sticking with Firefox 3 at least until Chrome is available for Mac and gets a comparable set of plugins.