With another E-metrics Summit over (sans me, sadly), it’s clear that interest in web analytics and online measurement remains high, even (or especially) in these troubled times. But as the technology sets for online advertising and web analytics continue to merge and overlap, one urgent question remains unanswered: what are we going to do about data collection?
You only have to talk to any medium-sized web agency, or marketing manager for an e-commerce site, to understand that online behavior data collection is deeply broken right now – ad servers and web analytics products still collect their data entirely separately, leading to misery for webmasters as they struggle to maintain two (or three, or four…) tracking tags on each page of a site, and misery for analysts as they struggle to reconcile differing numbers from different systems. If you throw ad tags (that is, the snippets of code that actually cause ads to be displayed on a page, such as the AdSense code) into the mix, things become even more complicated.
How we as an industry go about fixing this problem depends on who we care about more: webmasters (I use that term loosely to refer to the gaggle of unfortunates who are charged with maintaining and updating a website), or marketers; or whether we decide that we care about them both. Here are some ideas (none of them new) about how to approach the problem, together with “feel the love” rankings for marketers and webmasters. Feel free to add your own ideas in the comments.
Idea 1: Merge the back-end data
Marketers: ♥♥♥ (out of 5)
Webmasters: ♥ (out of 5)
It’s not uncommon for a site to be using multiple tags from the same vendor, such as Google (which has separate tags for AdWords, AdSense, GA and DFA/DFP) or our good selves (adCenter, adCenter Analytics, Atlas and others). If this is the case, then the vendor has the opportunity – some would say the responsibility – to join together the data it collects at the back-end to provide a more joined-up and consistent set of reports for marketers.
Google has just taken another decent step in this direction with its inclusion of AdSense clickthrough and CPC data in GA reports. I don’t actually have detail on exactly how they’re doing this, but my best guess is that they’re merging the click data from AdSense with the impression data from Analytics.
You can generalize this approach to a situation where two or more vendors might group together to pool the data they have to provide a consolidated set of reports. This is (sort of) the approach used by Omniture and DoubleClick, where you can use an Omniture tag in place of DoubleClick spotlight tags for conversion tracking.
The crucial pre-requisite is that the different sources of data need to be mergeable; and that means a couple of things. First, the visitor ID needs to be shared between the data sets. This is fairly easy for a single vendor to achieve, but trickier for vendors working together.
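To make the shared visitor ID point concrete, here is a minimal sketch of a back-end merge. The field names (`visitor_id`, `page_views`, `ad_clicks`) and the sample rows are my own assumptions for illustration, not any vendor’s actual export format.

```python
from collections import defaultdict


def merge_by_visitor(analytics_rows, ad_rows):
    """Join two data sets on a shared visitor ID (hypothetical field names)."""
    merged = defaultdict(lambda: {"page_views": 0, "ad_clicks": 0})
    # Accumulate the analytics product's numbers per visitor.
    for row in analytics_rows:
        merged[row["visitor_id"]]["page_views"] += row["page_views"]
    # Accumulate the ad server's numbers against the same visitor ID.
    for row in ad_rows:
        merged[row["visitor_id"]]["ad_clicks"] += row["ad_clicks"]
    return dict(merged)


# Made-up sample data, purely for illustration.
analytics_rows = [
    {"visitor_id": "v1", "page_views": 5},
    {"visitor_id": "v2", "page_views": 2},
]
ad_rows = [{"visitor_id": "v1", "ad_clicks": 1}]

report = merge_by_visitor(analytics_rows, ad_rows)
```

If the two systems issue their own visitor IDs, the join key simply doesn’t exist, which is why this is so much harder across vendors than within one.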
The other implication is that it needs to be possible to de-duplicate individual transactions. If you have two tags on your page, one for a web analytics product, and one for an ad server’s conversion tracking, it can actually be pretty challenging to ensure that when a user requests a page, you don’t count the page impression twice. Either you ignore one source of data completely (which is sort of what Google seems to do with AdSense/GA), or you have to employ various heuristics to decide when to throw something away – for example, if you register two identical page requests within a fraction of a second of one another, you can be confident (though not certain) that they are duplicates.
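The “fraction of a second” heuristic might look something like the sketch below. The half-second window, the field names and the sample hits are all assumptions on my part; a real system would need to tune the window and would still get some cases wrong.

```python
def deduplicate(hits, window_seconds=0.5):
    """Keep a hit unless the same visitor requested the same URL
    within `window_seconds` of the last hit we kept for that pair."""
    last_kept = {}  # (visitor_id, url) -> timestamp of the last kept hit
    kept = []
    for hit in sorted(hits, key=lambda h: h["timestamp"]):
        key = (hit["visitor_id"], hit["url"])
        previous = last_kept.get(key)
        if previous is None or hit["timestamp"] - previous > window_seconds:
            kept.append(hit)
            last_kept[key] = hit["timestamp"]
    return kept


# Made-up hits: two tags firing for one page load, plus a genuine reload later.
hits = [
    {"visitor_id": "v1", "url": "/home", "timestamp": 10.00, "source": "analytics"},
    {"visitor_id": "v1", "url": "/home", "timestamp": 10.12, "source": "adserver"},
    {"visitor_id": "v2", "url": "/home", "timestamp": 10.05, "source": "analytics"},
    {"visitor_id": "v1", "url": "/home", "timestamp": 45.00, "source": "analytics"},
]

unique_hits = deduplicate(hits)
```

Note that this keeps whichever tag happened to fire first, so the surviving record may come from either system; that is one of the hidden “seams” an analyst would need to know about.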
As for the customers? The marketer gets a decent benefit from this approach; they’ll see merged data, though the quality of the data may still leave something to be desired (hidden ‘seams’ where the data has been stitched together can trip up the unwary analyst). The webmaster, on the other hand, sees little benefit – they still have to maintain both tags, especially if each tag has its own unique capability. So this solution is really more of a stepping-stone to a more complete approach than a destination in its own right.
Idea 2: A “tag management” system
Even if a single vendor or pair of vendors can join forces to combine the data from a couple of tags, most sites are still going to be using multiple tags from multiple vendors, some of whom (by their very nature) are never likely to co-operate on data. Given this state of affairs, one obvious approach is to provide some more technology to the webmaster to help them manage the plethora of tags.
Such a system would be, essentially, a content management system for tagging, enabling the webmaster to define which tags from which vendors should appear in which places on their site. Such a system could come from a vendor, or a sufficiently motivated site owner could create it themselves.
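At its simplest, the “which tags in which places” part could be a table of rules matched against the page path. The sketch below is just that: the tag names and URL patterns are invented, and it says nothing about how any real tag management product works.

```python
from fnmatch import fnmatchcase

# Hypothetical rules: each tag is attached to a shell-style path pattern.
TAG_RULES = [
    {"tag": "analytics_vendor_tag", "pattern": "*"},
    {"tag": "ad_conversion_tag", "pattern": "/checkout/*"},
    {"tag": "ad_display_tag", "pattern": "/articles/*"},
]


def tags_for_page(path, rules=TAG_RULES):
    """Return the tags whose pattern matches the given page path, in rule order."""
    return [rule["tag"] for rule in rules if fnmatchcase(path, rule["pattern"])]


checkout_tags = tags_for_page("/checkout/thanks")
about_tags = tags_for_page("/about")
```

Swapping vendors then becomes a matter of editing one rule rather than touching every page template.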
A webmaster using such a system would see a dramatic reduction in the overhead associated with managing multiple tags (once they’d gone through the pain of implementing the tag management system’s tags, that is). Furthermore, a well-implemented tag management system would make it easier for the webmaster to introduce (and remove) tags, reducing some of the friction associated with moving from one analytics or ad serving vendor to another.
The big sticking point with a system like this, however, is custom tagging. If you speak to a site owner about the pain of tag management, having to insert a JS file into the page is only a small part of the task – and that step is made much easier by modern content management systems. No, it’s the definition of custom variables, and integrating them with the data coming from the site, that is the challenging and time-consuming step. Publishers (who are implementing ad server tag code to host ads on their site) also have the overhead of defining page groups for their content, which is a major task compared to the actual tagging itself.
So in order for such a system to be really useful, it would need to provide a standardized interface between the data coming from the site and the tags – essentially, its own custom variable schema with a defined set of mappings to Omniture, GA, Atlas AdManager, etc.
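Here is a rough sketch of what such a schema and its mappings might look like. The neutral variable names, the vendor labels and the vendor-side parameter names are all placeholders I’ve made up; they are not the real variable names used by Omniture, GA or Atlas.

```python
# Hypothetical mappings from a site's neutral schema to each vendor's
# parameter names. None of these names come from a real product.
VENDOR_MAPPINGS = {
    "vendor_a": {"page_group": "channel", "order_value": "revenue", "product_id": "sku"},
    "vendor_b": {"page_group": "section", "order_value": "amount"},
}


def translate(page_data, vendor):
    """Rename the site's variables to one vendor's names, dropping any
    variable that vendor has no mapping for."""
    mapping = VENDOR_MAPPINGS[vendor]
    return {mapping[name]: value for name, value in page_data.items() if name in mapping}


# The site defines its data once, in its own terms.
page_data = {"page_group": "checkout", "order_value": 59.99, "product_id": "A-100"}

vendor_a_params = translate(page_data, "vendor_a")
vendor_b_params = translate(page_data, "vendor_b")
```

The site integration work is done once against the neutral schema; adding a vendor means adding a mapping, not re-instrumenting the site.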
A company called Positive Feedback (based in London, which means they must be geniuses) has taken a stab at providing a solution here with their TagMan offering. And Tealium is looking to address the custom variables problem with their solution, TrackEvent.
Idea 3: A universal tag
Ah, the universal tag. The holy grail of web analytics (at least, according to some). The idea here is that a group of vendors (perhaps under the auspices of the Web Analytics Association) come together to create a universal piece of tag code that can capture data for any of their services. The upshot is that the webmaster only has to place this single tag on their site, and then configure the tag for whichever vendor solutions they’re using. A side benefit of the “universal tag” is that it can direct beacon requests to the customer’s own data collection systems as well as a third party’s – avoiding the problem of data ownership.
The key challenge with this approach is that, despite warm words on the topic from web analytics vendors, there’s little real incentive to put a bunch of effort into doing something like this. All the vendors get is a potentially more complicated implementation, and more client mobility. What we may find happening instead is vendors supporting other vendors’ custom variables and event calls – so vendor A could come in and say “simply switch out your JS file reference (or add ours), and we’ll start capturing the same data you’re already getting”. It would be interesting to see if any vendors complained that their IP was being infringed by this approach.
A variant of this idea is where a vendor creates a tag architecture and then works with partners to encourage them to abandon or supplement their own data collection with the vendor’s – thus making the vendor’s tag the universal tag. This is Omniture’s approach with Genesis. This approach strikes me as more likely to succeed, since the incentives work differently; it’s in Omniture’s interest to push continued Genesis tracking adoption.
The asymmetry of Omniture’s approach also makes a more general point about the universal tag idea: the vendor that already has the most well-established tagging relationship with a client will probably be able to leverage that to get other systems’ data collection needs met within the framework of its own tag. This is likely to be the web analytics vendor, so we should look to those organizations (rather than, say, ad serving companies) to lead on a solution like this.
Idea 4: A universal data collection service
If you continue the thought process around universal tagging, and vendors looking to provide more and more help to customers with data collection, then you end up with the idea of a vendor providing a fully-fledged data collection service.
I’ve blogged about this idea before, as it happens. The core idea here is that some kindly organization (which has access to a large pool of cheap processing and data storage) takes it upon itself to offer a data collection service that is so flexible, reliable and cheap that many other vendors abandon their own data collection and use the common service.
Part of the service is a “universal tag” which can be configured to capture the data that each analytics/ad serving service needs. But the difference is that the universal tag doesn’t try to generate beacon calls in the correct formats for the individual services, or even send that data to those services’ data collection servers – it just gathers the data to a centralized repository and the other services access this data programmatically.
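To illustrate the difference from a plain universal tag, here is a toy in-memory version of the idea: events land in one repository, and each downstream service pulls only the fields it cares about. The class, method and field names are invented for the example, and a real service would obviously sit behind a network API with proper storage.

```python
class CollectionService:
    """Toy central repository: collects raw events once and lets
    downstream services read them programmatically."""

    def __init__(self):
        self._events = []

    def collect(self, event):
        # Store a copy so later changes by the caller don't alter history.
        self._events.append(dict(event))

    def query(self, fields, event_type=None):
        """Return the requested fields for every stored event,
        optionally restricted to one event type."""
        rows = []
        for event in self._events:
            if event_type is not None and event.get("type") != event_type:
                continue
            rows.append({field: event.get(field) for field in fields})
        return rows


service = CollectionService()
service.collect({"type": "page_view", "visitor_id": "v1", "url": "/home"})
service.collect({"type": "ad_click", "visitor_id": "v1", "campaign": "spring"})

# An analytics product and an ad server read from the same underlying data.
analytics_view = service.query(["visitor_id", "url"], event_type="page_view")
ad_view = service.query(["visitor_id", "campaign"], event_type="ad_click")
```

Because both services read from the same collected events, there is no second count of the same page request to reconcile.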
This approach combines some of the benefits of the two preceding ideas – for webmasters, the tag management process is radically simplified because one tag can do multiple things. Marketers like it because it would finally deliver numbers which match up. However, the approach wouldn’t work for certain things, such as ad serving tags – unless that system was merged together with the data collection service.
Of course, another obstacle to this kind of approach taking root is vendors’ reluctance to entrust their (or their customers’) data to a third party. This reluctance is liable to increase in proportion to the size of the vendor. So whilst Omniture would likely balk at using a data collection service from Google or Microsoft in place of its own, a small vendor (such as our plucky little friends at Woopra) may find such a service invaluable in allowing them to focus on analytics rather than data collection.
So those are my ideas – what are yours? And which one(s) of the above ideas do you think are most likely to gain traction?