
January 22, 2008

Tags vs Logs: The big fight

There are many in the web analytics industry who could say (with some justification) that the tussle over whether to use client-side JavaScript tags or web server logs as your source of web analytics data has already been settled, with tags declared the winner by a knockout. Certainly with Gatineau we've decided to place ourselves firmly in the tags corner (if you want to provide a hosted web analytics solution, and collect the data centrally, you really don't have any other option).

But logs aren't beat yet. Many vendors - Google, Webtrends, Clicktracks, WebAbacus, Site Intelligence to name a few - still offer the option to use logs as the primary data source. How come? Let's take a look at how this battle plays out.

 

Round 1: Convenience

Say what you like about accuracy (and you will, I'm sure), but you can't beat server logs for convenience. If you have the logs to hand, once you've installed your web analytics product, you simply point it at the logs, press the button, and sit back and wait for your data. There are wrinkles to be dealt with, for sure - you might have non-standard logs; you might have multiple web servers; or it might be difficult to gain access to the logs on your network (the three letters that strike fear into my heart? FTP), but most decent analytics tools can take these things in their stride.

Tag-based systems, by contrast, won't yield up a scrap of data until you've made code changes to your website and cut them live to your server. Then there's the hassle of ensuring that all the pages are tagged, and that pages don't become untagged at some later date when some developer looks at the code and thinks "what's this muck?" and removes it.

Round 1 winner: Logs

 

Round 2: Historical data

Straight back out of its corner after the success of round 1, logs delivers a second blow to tags: historical data. If you've been keeping your raw log files, a logs-based web analytics tool will be able to process that set of historical data and give you an instant picture of weeks, months or even years of activity on your site.
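
As a rough illustration of what processing a set of historical logs involves, here is a minimal Python sketch that counts successful page requests per day from archived log lines. It assumes the logs are in the Apache-style Common/Combined Log Format; the sample lines and the `count_requests_per_day` name are made up for illustration, and a real tool would do far more (filtering robots, images, redirects and so on).

```python
import re
from collections import Counter
from datetime import datetime

# Assumed format: Apache-style Common/Combined Log Format.
LOG_LINE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)

def count_requests_per_day(lines):
    """Count successful GET requests per calendar day."""
    per_day = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that do not fit the assumed format
        if match["method"] != "GET" or match["status"] != "200":
            continue  # ignore non-GET requests and anything but a 200
        when = datetime.strptime(match["time"], "%d/%b/%Y:%H:%M:%S %z")
        per_day[when.date().isoformat()] += 1
    return per_day

# Hypothetical sample lines standing in for years of archived logs.
sample = [
    '10.0.0.1 - - [14/Mar/2006:09:15:02 +0000] "GET /index.html HTTP/1.1" 200 5120',
    '10.0.0.2 - - [14/Mar/2006:09:16:40 +0000] "GET /products.html HTTP/1.1" 200 2048',
    '10.0.0.1 - - [15/Mar/2006:11:01:13 +0000] "GET /missing.html HTTP/1.1" 404 312',
    '10.0.0.3 - - [15/Mar/2006:12:30:00 +0000] "GET /index.html HTTP/1.1" 200 5120',
]
daily_counts = count_requests_per_day(sample)
```

The point is simply that the raw material is already sitting on disk: the same function can be pointed at last week's files or at files from two years ago.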

Tags just can't match this - the data only starts to be collected on the day you implement the tags, so you can't get a historical picture, by definition. This also makes it more challenging to move from one web analytics tool to another, since in the new tool you can't get a historical picture to ease the transition. It means that many companies leave their old tool in place for months whilst the new tool builds up a base of data - costly if you're paying for one or both tools.

Round 2 winner: Logs

 

Round 3: Visit and visitor counts

After its easy victories in the first two rounds, logs comes out with a swagger to square up on visit and visitor counts. But this time, tags is more than a match. Pretty much every tag-based analytics system serves up a persistent cookie with the tag, and uses this cookie to sessionize the data (that is, build visits, by identifying page requests from the same user) and generate counts of unique users over longer periods of time. Once you've gone through the pain of instrumentation, this stuff comes pretty much for free, and is a great benefit.
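
To make "sessionize" a little more concrete, here is a minimal Python sketch of building visits from cookie-identified page requests. The 30-minute inactivity timeout is an assumption (a common convention, not something every tool is guaranteed to use), and the `sessionize` function and sample hits are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumption: a visit ends after 30 minutes of inactivity.
SESSION_TIMEOUT = timedelta(minutes=30)

def sessionize(hits):
    """Group (cookie_id, timestamp, page) hits into visits per cookie."""
    visits = defaultdict(list)  # cookie_id -> list of visits (each a list of pages)
    last_seen = {}              # cookie_id -> timestamp of that cookie's previous hit
    for cookie_id, when, page in sorted(hits, key=lambda h: (h[0], h[1])):
        previous = last_seen.get(cookie_id)
        if previous is None or when - previous > SESSION_TIMEOUT:
            visits[cookie_id].append([])  # start a new visit
        visits[cookie_id][-1].append(page)
        last_seen[cookie_id] = when
    return visits

# Hypothetical hits: cookie "abc" comes back in the afternoon, "xyz" visits once.
hits = [
    ("abc", datetime(2008, 1, 22, 9, 0), "/home"),
    ("abc", datetime(2008, 1, 22, 9, 5), "/products"),
    ("xyz", datetime(2008, 1, 22, 9, 7), "/home"),
    ("abc", datetime(2008, 1, 22, 14, 0), "/home"),
]
visits = sessionize(hits)
visit_count = sum(len(v) for v in visits.values())  # 3 visits
unique_visitors = len(visits)                       # 2 distinct cookies
```

Without a reliable cookie in the data, the grouping key has to fall back to something much weaker, which is why the cookie matters so much here.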

It's perfectly possible to use cookies as user identifiers in a logs-based system; but firstly the site has to issue a cookie, and secondly that cookie has to be persistent and pervasive (i.e. every page should issue it if it isn't already present in the browser). This can be a royal pain to set up.
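
The "persistent and pervasive" requirement can be sketched as a small piece of logic that every page request would have to pass through. This is only an illustration under stated assumptions: the cookie name `visitor_id` and the one-year lifetime are arbitrary, the `ensure_visitor_cookie` function is hypothetical, and the web server would also need to be configured to write the cookie value into its logs.

```python
import uuid
from http.cookies import SimpleCookie

COOKIE_NAME = "visitor_id"      # hypothetical cookie name
ONE_YEAR = 365 * 24 * 60 * 60   # lifetime in seconds, an arbitrary choice

def ensure_visitor_cookie(cookie_header):
    """Return (visitor_id, set_cookie_value_or_None) for one request."""
    incoming = SimpleCookie()
    incoming.load(cookie_header or "")
    if COOKIE_NAME in incoming:
        # Already present in the browser: nothing to issue.
        return incoming[COOKIE_NAME].value, None
    visitor_id = uuid.uuid4().hex
    outgoing = SimpleCookie()
    outgoing[COOKIE_NAME] = visitor_id
    outgoing[COOKIE_NAME]["max-age"] = ONE_YEAR  # persistent: survives browser restarts
    outgoing[COOKIE_NAME]["path"] = "/"          # pervasive: sent back for every page
    return visitor_id, outgoing[COOKIE_NAME].OutputString()

# A first-time visitor gets a cookie; a returning one does not get a new one.
new_id, set_cookie = ensure_visitor_cookie("")
same_id, nothing = ensure_visitor_cookie(f"{COOKIE_NAME}={new_id}")
```

The pain is less in the logic itself than in making sure it runs on every page, on every server, for every kind of content the site serves.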

Round 3 winner: Tags

 

Round 4: Accuracy

With a win under their belt, the team in the tags corner is starting to feel a little more bullish. And, sure enough, when it comes to accuracy, tags gives logs a run for its money. The main reason for this is that the actual tag request made by the JavaScript in a tag-based system cannot be cached; so every request made by a visitor ends up being recorded by the system that's listening out for the tag requests, resulting in pretty good accuracy at the page impression level.
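
One common way of keeping a tag request out of caches is to make every request URL unique and to have the collection server mark its response as uncacheable. The sketch below shows the idea in Python purely for illustration; a real tag does the equivalent in JavaScript in the browser. The collector URL, parameter names and `beacon_url` function are all hypothetical, not any vendor's actual tag.

```python
import random
import time
from urllib.parse import urlencode

COLLECTOR = "https://collector.example.com/hit.gif"  # hypothetical endpoint

def beacon_url(page, visitor_id):
    """Build a tag request URL that differs on every call."""
    params = {
        "page": page,
        "vid": visitor_id,
        "t": int(time.time() * 1000),   # millisecond timestamp
        "r": random.randrange(10**9),   # random cache-busting value
    }
    return f"{COLLECTOR}?{urlencode(params)}"

# Response headers the (hypothetical) collection server could send
# so that intermediate proxies do not store the beacon response.
NO_CACHE_HEADERS = {
    "Cache-Control": "no-store, no-cache, must-revalidate",
    "Pragma": "no-cache",
    "Expires": "0",
}

url = beacon_url("/home", "abc")
```

Because no two requests share a URL, a proxy has nothing it can serve from cache, and the request goes all the way to the collection server.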

Log-based systems, on the other hand, are at the mercy of intermediate caches on the Internet - if a particular page (say, the home page) is relatively static and popular, a big subset of users will never hit the actual site's web server when they request that page - they'll be served a cached copy from a proxy somewhere between them and the site's server (probably at their ISP, or their corporate firewall). So a log-based system can under-report page impressions by as much as 80% (though 40-50% is a more common figure). Worse still, the pages in a web site are not evenly cached, so a home page will be served from cache much more often than a deep page or a checkout page. This means that the shape of funnels can look screwy, and it is very difficult to determine anything other than broad traffic patterns.
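
To see how uneven caching distorts a funnel built from server logs, here is a small worked example in Python. Every number in it is invented for illustration: the true impression counts and the per-page cache rates are assumptions, chosen only so that the home page is cached far more often than the checkout page.

```python
# Hypothetical true page impressions for a three-step funnel.
true_impressions = {"home": 10_000, "product": 4_000, "checkout": 1_000}

# Hypothetical share of requests answered by an intermediate cache,
# which therefore never reach the site's web server log.
cache_rate = {"home": 0.60, "product": 0.25, "checkout": 0.0}

# What the web server log actually records for each page.
logged = {
    page: round(count * (1 - cache_rate[page]))
    for page, count in true_impressions.items()
}

def step_rates(counts):
    """Conversion rate from each funnel step to the next."""
    pages = list(counts)
    return {f"{a}->{b}": counts[b] / counts[a] for a, b in zip(pages, pages[1:])}

true_rates = step_rates(true_impressions)  # home->product 0.40, product->checkout 0.25
logged_rates = step_rates(logged)          # home->product 0.75, product->checkout about 0.33
```

In this made-up case the logs record 4,000 home page impressions instead of 10,000, so the first funnel step appears to convert at 75% rather than 40%. The absolute counts are wrong, but more importantly the shape is wrong.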

Round 4 winner: Tags

 

Round 5: Non-HTML content

Not every web site is made up entirely of HTML. Come to that, not every transaction-based system that you might want to analyze the usage of is HTML based - for example, call center or IVR system usage. In these situations, log-based systems come into their own; many log-based analytics systems can turn their hand to a surprising number of analytics tasks, as long as the system they're analyzing the usage of can generate a log of its usage.

It used to be the case that this was a sucker punch for logs for non-HTML content on web sites too - but recently tag-based systems have got more adept at finding ways to track the usage of PDF files and other non-HTML content. Both Google Analytics and Gatineau have this functionality, for example.

Round 5 winner: A draw

 

Round 6: Sub-page events

Another knock-down for tags in this round. Sites which refresh content (manually or automatically) without executing a full page refresh present a particular challenge for web analytics tools of all stripes; but tag-based systems rise to the challenge much better than log-based ones. Increasingly, tag-based analytics tools offer the ability to attach a JavaScript event call to sub-page events, and track them as a separate kind of interaction (i.e. not a full-fledged page impression, but something worth counting nonetheless).

To pull this off with a log-based system, you'd have to modify your site code to generate a dummy log entry on your web server (perhaps by requesting a non-existent HTML file), and then, whilst processing the data, treat this HTML file and others like it as a special case, ensuring the analytics system doesn't accidentally count it as a page impression. It's doable, but gnarly, gnarly, gnarly. And I don't know of any log-based analytics system which implements a sub-page event model (perhaps someone can enlighten me via the comments box).
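
The special-case handling described above might look something like the following Python sketch. The `/_event/` path convention and the `split_hits` function are hypothetical; the only point is that dummy requests have to be recognised and kept out of the page impression counts during processing.

```python
# Hypothetical convention: sub-page events are logged as requests to
# non-existent files under /_event/, e.g. /_event/video-play.html
EVENT_PREFIX = "/_event/"

def split_hits(paths):
    """Separate real page impressions from dummy sub-page event requests."""
    pages, events = [], []
    for path in paths:
        if path.startswith(EVENT_PREFIX):
            # Strip the prefix and the dummy extension to get the event name.
            name = path[len(EVENT_PREFIX):].rsplit(".", 1)[0]
            events.append(name)
        else:
            pages.append(path)
    return pages, events

# Hypothetical request paths as they might appear in a server log.
requested = [
    "/home.html",
    "/_event/video-play.html",
    "/products.html",
    "/_event/tab-switch.html",
]
pages, events = split_hits(requested)
```

Note that the dummy requests will typically show up in the log as 404s, so any processing step that discards error responses would need to make an exception for them too.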

Round 6 winner: Tags

 

Round 7: Data integration

The team in the tags corner cries foul at this point, pointing out that data integration is more a function of whether you run your analytics system in-house or have it hosted as a third-party service; and that there are plenty of web analytics tools which can combine tag-based data collection with an in-house service. But there's a strong correlation between logs/tags and in-house/hosted, so the referee allows the fight to continue.

In-house systems do make data integration easier. A log-based analytics system will capture all the user identifiers (in cookies, typically), including those used by the site's own CMS, and a half-way decent web analytics tool will allow these identifiers to be extracted and then used as a key for the import of related data (for example, the purchase history of a known customer).
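
As an illustration of using a logged identifier as a key for importing related data, here is a minimal Python sketch. It assumes the web server writes the request's Cookie header into the log and that the CMS stores a customer number in a cookie called `cust`; both are assumptions, as are the sample hits, the purchase history and the `enrich` function.

```python
from http.cookies import SimpleCookie

# Hypothetical parsed log entries, each carrying the logged Cookie header.
logged_hits = [
    {"path": "/home.html", "cookie": "vid=a1; cust=1001"},
    {"path": "/offers.html", "cookie": "vid=b2"},
    {"path": "/basket.html", "cookie": "vid=a1; cust=1001"},
]

# Hypothetical data imported from an in-house customer system.
purchase_history = {"1001": {"orders": 7, "lifetime_value": 412.50}}

def enrich(hits, history, key="cust"):
    """Attach related customer data to each hit that carries the identifier."""
    enriched = []
    for hit in hits:
        cookies = SimpleCookie()
        cookies.load(hit["cookie"])
        customer_id = cookies[key].value if key in cookies else None
        enriched.append(
            {**hit, "customer_id": customer_id, "customer": history.get(customer_id)}
        )
    return enriched

rows = enrich(logged_hits, purchase_history)
```

Hits from anonymous visitors simply carry no customer data, while hits from known customers can be segmented by anything the in-house system knows about them.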

Because tag-based systems tend to send their tag request to a third-party server (the web analytics provider's data collection server), these cookies are not automatically captured. You can modify or customize the tag script for some tools to capture identity cookie values as variables, but then you're still left with the challenge of importing potentially sensitive customer data across the Internet. Data protection laws in the EU and US state that in order to use customer data for this "secondary use" and transfer it to a third party, you have to get the customer's explicit permission - something that most site owners are reluctant to do, for obvious reasons.

Round 7 winner: Logs (kinda)

 

The final score

Finally the competitors stagger back to their corners, bloody but unbowed. After some debate, the judges declare the final score to be:

Tags: 3, Logs: 2½

So, a closer result than you might think. Tagging wins out (just) because of the better quality of the data it yields up; although it's a pain to instrument a site, you immediately get access to pretty good-quality, well sessionized data that you can start to build reports around. Logs are much more of a struggle to get set up to deliver good quality data, but once you're there you have as much flexibility as with a tag-based system, and more in some respects (for example, in the area of data integration).
