October 24, 2011

Wading into the Google Secure Search fray

There’s been quite the hullabaloo since Google announced last week that it was going to send signed-in users to Google Secure Search by default. Back when Google first announced Secure Search in May, there was some commentary about how it would reduce the amount of data available to web analytics tools. This is because browsers do not make page referrer information available in the HTTP Referer header or in the page Document Object Model (accessible via JavaScript) when a user clicks a link from an SSL-secured page through to a non-secure page. This in turn means that a web analytics tool pointed at the destination site is unable to see the referring URLs for any SSL-secured pages that visitors arrived from.
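To make the mechanics concrete, here’s roughly what a web analytics tool is doing when it mines referrers for keywords – a Python sketch (the function name is mine, not any vendor’s):

```python
from urllib.parse import urlparse, parse_qs

def extract_search_keyword(referrer):
    """Pull the search phrase out of a Google referrer URL, if present.

    Returns None when the browser sent no referrer at all -- which is
    exactly what happens on an HTTPS -> HTTP click-through.
    """
    if not referrer:
        return None  # secure-to-insecure referral: header was suppressed
    query = parse_qs(urlparse(referrer).query)
    return query.get("q", [None])[0]

# Normal (non-secure) search referral: the keyword is visible
print(extract_search_keyword("http://www.google.com/search?q=flowers"))  # flowers

# Secure search referral: the browser sends no Referer header at all
print(extract_search_keyword(""))  # None
```

When the click comes from an SSL page there is simply nothing to parse – the header never arrives.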

This is all desired behavior, of course, because if you’ve been doing something super-secret on a secure website, you don’t want to suddenly pass info about what you’ve been doing to any old non-secure site when you click an off-site link (though shame on the web developer who places sensitive information in the URL of a site, even if the URL is encrypted).

At the time, the web analytics industry’s concerns were allayed by the expectation that relatively few users would proactively choose to search on Google’s secure site, and that consequently the data impact would be minimal. But the impact will jump significantly once the choice becomes a default.

One curious quirk of Google’s announcement is this sentence (my highlighting):

When you search from https://www.google.com, websites you visit from our organic search listings will still know that you came from Google, but won't receive information about each individual query.

This sentence caused me to waste my morning running tests of exactly what referrer information is made available by different browsers in a secure-to-insecure referral situation. The answer (as I expected) is absolutely nothing – no domain data, and certainly no URL parameter (keyword) data is available. So I am left wondering whether the sentence above is just an inaccuracy on Google’s part – when you click through from Google Secure Search, sites will not know that you came from Google. Am I missing something here? [Update: Seems I am. See bottom of the post for more details]
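If you want to reproduce the morning’s tests yourself, all it takes is a page on a plain-HTTP server that logs whatever Referer header arrives – a minimal Python sketch (link to it from an HTTPS page, click through, and watch nothing appear in the log):

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

class ReferrerLogger(BaseHTTPRequestHandler):
    def do_GET(self):
        # Note: the header really is spelled "Referer" (a historical typo)
        referrer = self.headers.get("Referer")
        print(f"path={self.path} referer={referrer!r}")
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"logged")

    def log_message(self, *args):
        pass  # silence the default access log; we print our own line

def run(port=8000):
    """Serve forever on the given port; point a browser link here."""
    HTTPServer(("", port), ReferrerLogger).serve_forever()
```

Click through from any HTTPS page and `referer=None` is all you’ll ever see.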

I should say that I generally applaud Google’s commitment to protecting privacy online in this way – despite the fact that it has been demonstrated many times that an individual’s keyword history is a valuable asset for online identity thieves, most users would not bother to secure their searches when left to their own devices. On the other hand, this move does come with a fair amount of collateral damage for anyone engaged in SEO work. Google’s hope seems to be that over time more and more sites will adopt SSL as the default, which would enable sites to capture the referring information again – but that seems like a long way off.

It seems like Google Analytics is as affected by this change as any other web analytics tool. Interestingly, though, if Google chose to, it could make the click-through information available to GA, since it captures this information via the redirect it uses on the outbound links from the Search Results page. But if it were to do this, I think there would be something of an outcry, unless Google provided a way of making that same data available to other tools, perhaps via an API.

So for the time being the industry is going to have to adjust to incomplete referrer information from Google, and indeed from other search engines (such as Bing) that follow suit. Always seems to be two steps forward, one step back for the web analytics industry. Ah well, plus ça change…

Update, 10/25: Thanks to commenter Anthony below for pointing me to this post on Google+ (of course). In the comments, Eric Wu nails what is actually happening that enables Google to say that it will still be passing its domain over when users click to non-secure sites. It seems that Google will be using a non-secure redirect that has the query parameter value removed from the redirect URL. Because the redirect is non-secure, its URL will appear in the referrer logs of the destination site, but without the actual keyword. As Eric points out, this has the further unfortunate side-effect of ensuring that destination sites will not receive query information, even if they themselves set SSL as their default (though it’s not clear to me how one can force Google to link to the SSL version of a site by default). The plot thickens…
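In code terms, Eric’s explanation amounts to something like the following Python sketch – the idea only, with illustrative parameter names rather than Google’s actual scheme:

```python
from urllib.parse import urlencode, urlparse, parse_qsl

def sanitized_redirect(search_url, destination):
    """Build a non-secure /url? redirect that preserves the google.com
    domain in the destination site's referrer logs but drops the keyword.
    (Parameter handling is illustrative, not Google's real implementation.)"""
    # Keep everything from the secure search URL except the query itself
    params = [(k, v) for k, v in parse_qsl(urlparse(search_url).query) if k != "q"]
    params.append(("url", destination))
    return "http://www.google.com/url?" + urlencode(params)

redirect = sanitized_redirect(
    "https://www.google.com/search?q=secret+medical+condition&hl=en",
    "http://example.com/page",
)
# The destination sees google.com as the referrer, but no keyword
print(redirect)
```

Because the redirect itself is plain HTTP, its URL lands in the destination’s logs – minus the one parameter anyone actually wants.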


May 13, 2009

Does Display help Search? Or does Search help Display?

One of the topics that we didn’t get quite enough time to cover in detail in my face-off with Avinash Kaushik at last week’s eMetrics Summit (of which more in another post) was the thorny issue of conversion attribution. When I asked Avinash about it, he made the sensible point that trying to correctly “attribute” a conversion to a mix of the interactions that preceded it ends up being a very subjective process, and that adopting a more experimental approach – tweaking aspects of a campaign and seeing which tweaks result in higher conversion rates – is more sound.

I asked the question in part because conversion attribution is conspicuously absent from Google Analytics – a fact which raises an interesting question about whether it’s in Google’s interest to include a feature like this, since it may stand to lose more than it gains by doing so (since the effective ROI of search will almost certainly go down when other channels are mixed into an attribution model).

Our own Atlas Institute is quite vocal on this topic, and has published a number of white papers such as this one [PDF] about the consideration/conversion funnel, and this one [PDF], on which channels are winners and losers in the new world of Engagement Mapping (our term for multi-channel conversion attribution).

The Atlas Institute has also opined about how adding display to a search campaign can raise the effectiveness of that campaign by 22% compared to search alone – in other words, how display helps search to be better.

However, a recent study from iProspect throws some new light on this discussion. The study – a survey of 1,575 web consumers – attempted to discover how people respond to display advertising. And one of the most interesting findings from the study is that, whilst 31% of users claim to have clicked on a display ad in the last 6 months, almost as many – 27% – claimed that they responded to the ad by searching for that product or brand.


This raises the interesting idea that search can actually help display be better, by providing a response mechanism that differs from the traditional ad click behavior that we expect. Of course, this still doesn’t mean that search should get 100% of the credit for a conversion in this kind of scenario – in fact, it makes a stronger case for “view-through” attribution of display campaigns – something that ad networks (like, er, our own Microsoft Media Network) are keen to encourage people to do, to make performance-based campaigns look better.

All this really means that, of course, it’s not a case of display vs. search, but display and search (and a whole lot of other ways of reaching consumers). Whether you take the view that it’s your display campaign that helps your search to be more effective, or your search keywords that help your display campaign to drive more response, multi-channel online marketing – and the complexity that goes with measuring it – looks set for the big time. And by “big time”, I mean the army of small advertisers currently using systems like Google’s AdWords, or our own adCenter. So maybe we’ll see multi-channel conversion attribution in Google Analytics before long.


April 30, 2009

What would you like to ask Avinash Kaushik?

The gloves will be tied tight. Brightly colored silk dressing gowns will be shrugged to the floor; gum-shields inserted. In the blue corner: yours truly. In the red (and blue, yellow and green) corner, web analytics heavyweight, Avinash Kaushik. As the crowd bays for blood, battle will be joined. The Garden never saw anything like this.

Well, ok, it’ll probably be a bit more civilized (well, a lot more civilized) than that. But at next week’s eMetrics Summit in San Jose, Avinash and I will indeed be going head to head in the “Rules for Analytics Revolutionaries” session on Wednesday May 6 at 3:25. In that session, I’ll be asking Avinash some genuinely tricky questions to get to the heart of some of the thorniest issues around web analytics today, such as campaign attribution, free versus paid tools, and what, really, the point of all this electronic navel-gazing is.

But I could use your help. In my comments box below, or via e-mail, suggest the question(s) you’d most like me to ask Avinash next week. This is your big chance to ask Avinash the question you’re too embarrassed/polite/nervous to ask him in person. If you’re going to be at the Summit, then be sure to come to the session to see if your question gets asked; if not, I’ll post a follow-up post here after the event and shall be sure to include Avinash’s answers to any questions from the blog.

So come on – what have you got to lose? It’s not like it’s you who’s going to be picking a fight with one of the industry’s most revered and respected advocates, is it? Leave that to old numb-knuckles here.


April 21, 2009

Google adds rank information to referral URLs

An interesting post on the official Google Analytics blog from Brett Crosby appeared last week, in which he announced that Google is to start introducing a new URL format in its referring click-through URLs for organic (i.e. non-paid) results. From Brett’s post:

Starting this week, you may start seeing a new referring URL format for visitors coming from Google search result pages. Up to now, the usual referrer for clicks on search results for the term "flowers", for example, would be something like this:

http://www.google.com/search?q=flowers&…

Now you will start seeing some referrer strings that look like this:

http://www.google.com/url?source=web&cd=7&q=flowers&url=http%3A%2F%2Fexample.com%2F&…
Brett points out that the referring URL now starts with /url? rather than /search? (which is interesting in itself in its implication for the way Google is starting to think about its search engine as a dynamic content generation engine); but the really interesting thing, which Brett doesn’t call out but which was confirmed by Jason Burby in his ClickZ column today, is the appearance of the cd parameter in the revised URL, which indicates the position of the result in the search results page (SRP). So in the example above, where cd=7, the link that was clicked was 7th in the list.

As Jason points out, this new information is highly useful for SEO companies, who can use it to analyze where in the SRPs their clients’ sites are appearing for given terms. Assuming, of course, that web analytics vendors make the necessary changes to their software to extract the new parameter and make it available for reporting (or, alternatively, you use a web analytics package that is flexible enough to enable you to make this configuration change yourself).
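For those with a flexible enough package, pulling the rank out of the new referrer format is a small configuration job – a Python sketch of the extraction (the function name and fallback behavior are mine):

```python
from urllib.parse import urlparse, parse_qs

def rank_from_referrer(referrer):
    """Extract the keyword and organic rank (the cd parameter) from the
    new /url? style Google referrer. Returns (keyword, rank), or
    (None, None) for referrers that don't match the new format."""
    parts = urlparse(referrer)
    if parts.netloc.endswith("google.com") and parts.path == "/url":
        query = parse_qs(parts.query)
        keyword = query.get("q", [None])[0]
        rank = query.get("cd", [None])[0]
        return keyword, int(rank) if rank else None
    return None, None

keyword, rank = rank_from_referrer(
    "http://www.google.com/url?sa=t&source=web&ct=res&cd=7&q=flowers"
)
print(keyword, rank)  # flowers 7
```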

As you can see from the example above, there are various other new parameters that are included in the new referring URL, which may prove useful from an analytics perspective (such as the source parameter). It’s also worth noting that whereas the old referring URL is the URL of the search results page itself, the new URL is inserted by some kind of redirection (this must be the case, since it includes the URL of the click destination page).

Using a redirect in this way means that as well as providing more information to you, Google is now also capturing more information about user click behavior, since the redirect can be logged and analyzed. Crafty, huh?


February 10, 2009

Googopoly vs Micropoly

The excellent Tom Slee (aka. Whimsley) has been on my blogroll for some time. Tom writes longish posts about a range of interesting topics – at the moment he’s in the middle of a dissertation about whether Amazon’s recommendation service really does help unknown authors to find an audience and side-step the ‘evil’ publishing business (I have a suspicion about how the answer will turn out).

Part of Whimsley’s appeal is that he seems to toil away in the same kind of semi-obscurity that I do [violins]; but recently he’s had a burst of traffic to his site thanks to a link from Jeff Atwood, who referenced an excellent Whimsley post about the role of Google in shaping the Internet. The short version of Whimsley’s thesis is that, because of the way Google’s ranking algorithm (especially the PageRank part) works, Google is as much a driver of what’s popular on the Internet as a reflector of it. You should read the whole post to get the full story.

But the interesting point raised by Jeff Atwood is: is Google a monopoly? The comments to Jeff’s post are full of people echoing Google’s official response to the accusation of monopolism – which is that users are free to switch their search engine at any time. But Whimsley makes the point that it’s extremely difficult for an advertiser to switch away from Google. This is not because Google makes it difficult, but because Google provides such an essential service – AdWords – to advertisers that it’s almost impossible to run certain kinds of businesses without it. And, of course, it’s the advertisers that provide Google’s income, not the end users executing searches.

Now, I’m aware that I would be on extremely thin ice, both morally and legally, if I were to speculate on whether Google’s market position constitutes a monopoly. And, just to be clear, I’m not. I’m also not speculating on how it was that Microsoft ended up in trouble with the DOJ (I in fact have very little knowledge of the specifics of the case). All I would say is that Google and Microsoft have a lot more in common than people might like to believe. Neither is staffed by moustache-twirling evildoers (to borrow Whimsley’s excellent phrase); both are trying to develop their business and improve the services that they provide to customers. The challenge is to do so from a strong market position without ending up behaving in a monopolistic fashion.


November 21, 2008

Brandt Dainow gets over-excited again

After his breathless article last year, proclaiming Google Analytics to be something like a cross between the second coming and Barack Obama, Brandt Dainow seems to have soured on the big G, proclaiming this week that GA contains ‘disturbing inaccuracies’:

Google Analytics is different from other products in that it has been intentionally designed by Google to be inaccurate over and above the normal inaccuracies that are inevitable. These inaccuracies are so glaring that most people are getting a very false picture of what is happening in their sites.

Dainow’s main beef with GA is two-fold:

  • It treats single-page visits as valid visits (i.e. it doesn’t remove them from visit counts or other related measures)
  • It includes single-page visits in average visit duration calculations

He also remarks that Google did in fact change the way that GA calculated average visit duration last year, but then changed the calculation back in the face of user pressure:

Google intentionally rolled Google Analytics back so that it produced an incorrect average duration…It's been that way ever since -- Google is intentionally and knowingly providing inaccurate numbers because a few people preferred neatness to truth.

Brandt then proposes two alternative measures - ‘retained visits’ (the count of visits with more than one page impression) and ‘true average duration’ (the average duration of retained visits). These metrics are not without some merit – it’s useful to know how many visits contained more than one page view, and the average duration of these visits. But Brandt goes on to assert that these two metrics should replace the standard measurements of visits and average duration in GA and (presumably) other tools. This suggestion is ridiculous, for the following reasons:

  • Contrary to Brandt’s assertions, there are a host of scenarios where a single-page visit is a perfectly valid visit, including, for example, this blog, for crying out loud, which has a high proportion of single-page visits because readers either just read the homepage and leave, or click through to an article from their RSS reader. So chucking all these kinds of visits out is crazy.
  • Whilst the inaccuracy of including single-page visits in average visit duration calculations is known to be a problem, removing these visits from the calculation doesn’t yield a magically ‘accurate’ number, it just yields one that is inaccurate in a different way. You still have no idea how long people looked at the final page of their visit for, and with a two-page visit this can introduce a huge potential inaccuracy.
  • Such standard metrics as exist in the web analytics industry are the result of long and arduous wrangling. There are no sacred cows, but you need a really good reason to exchange a simple and easy-to-understand metric for one which is more complex and offers no discernible benefit.
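To see how the two calculations diverge – and why neither is “the truth” – here’s a toy example in Python (the visit data is invented):

```python
# Visit durations as an analytics tool sees them: last page view timestamp
# minus first. A single-page visit therefore measures 0 seconds, and the
# time spent on the *final* page of any visit is always unknown.
visits = [
    {"pages": 1, "duration_secs": 0},    # single-page visit: no second timestamp
    {"pages": 3, "duration_secs": 120},
    {"pages": 2, "duration_secs": 30},
    {"pages": 5, "duration_secs": 300},
]

# Standard average duration: single-page visits count, contributing 0
standard_avg = sum(v["duration_secs"] for v in visits) / len(visits)

# Brandt's "true average duration": only "retained" (multi-page) visits
retained = [v for v in visits if v["pages"] > 1]
true_avg = sum(v["duration_secs"] for v in retained) / len(retained)

print(standard_avg)  # 112.5
print(true_avg)      # 150.0
```

Both numbers omit the unmeasurable time spent on each visit’s last page; excluding bounces just trades one distortion for another.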

Whilst I can understand Brandt’s motivations for posting these ideas (which, I imagine, lie somewhere on a spectrum between a genuine desire to spark debate and a desire to generate a lot of traffic to his blog, in which regard I am obliging him), his remarks do irk me a bit (can you tell?), principally because he commits the unpardonable sin of absolutism when talking about web analytics, bandying about words like “truth” and “wrong” when really he is just presenting his own preferences.

When, as an industry, we can’t even agree what constitutes a visit, it’s pretty rich to start decrying one tool or another as ‘inaccurate’ simply because it takes an approach to data that you don’t believe in. And besides, as Brandt surely knows, Google Analytics now has the capability (via its custom segmentation) to calculate the metrics he seeks.

Finally, as every half-experienced web practitioner (of whom Brandt seems to have a low opinion also) knows, the key to success in web analytics is to pick your metrics, stick to them, and measure them continuously as you make changes to your site and your marketing, to see what is working. If you’re looking to increase engagement, and have decided that visit duration is a good measure of this (a debatable point, as it happens), then it doesn’t matter whether you include single-page visits in your duration calculation – if your visit durations are going up, you’re happy. And if your visit durations suddenly jump because your web analytics vendor has changed the way they calculate the metric, this could in fact cause more pain than benefit, perhaps causing you to go to said vendor and say, “Oi! Change it back to how it was!”.

So feel free to read the article, but be warned: it’s not very accurate.


October 29, 2008

Whence the universal tag?

With another eMetrics Summit over (sans me, sadly), it’s clear that interest in web analytics and online measurement remains high, even (or especially) in these troubled times. But as the technology sets for online advertising and web analytics continue to merge and overlap, one urgent question remains unanswered: what are we going to do about data collection?

You only have to talk to any medium-sized web agency, or marketing manager for an e-commerce site, to understand that online behavior data collection is deeply broken right now – ad servers and web analytics products still collect their data entirely separately, leading to misery for webmasters as they struggle to maintain two (or three, or four…) tracking tags on each page of a site, and misery for analysts as they struggle to reconcile differing numbers from different systems. If you throw ad tags (that is, the snippets of code that actually cause ads to be displayed on a page, such as the AdSense code) into the mix, things become even more complicated.

How we as an industry go about fixing this problem depends on who we care about more: webmasters (I use that term loosely to refer to the gaggle of unfortunates who are charged with maintaining and updating a website), or marketers; or whether we decide that we care about them both. Here are some ideas (none of them new) about how to approach the problem, together with “feel the love” rankings for marketers and webmasters. Feel free to add your own ideas in the comments.


Idea 1: Merge the back-end data

Marketers: ♥♥♥ (out of 5)
Webmasters: ♥ (out of 5)

It’s not uncommon for a site to be using multiple tags from the same vendor, such as Google (which has separate tags for AdWords, AdSense, GA and DFA/DFP) or our good selves (adCenter, adCenter Analytics, Atlas and others). If this is the case, then the vendor has the opportunity – some would say the responsibility – to join together the data it collects at the back-end to provide a more joined-up and consistent set of reports for marketers.

Google has just taken another decent step in this direction with its inclusion of AdSense clickthrough and CPC data in GA reports. I don’t actually have detail on exactly how they’re doing this, but my best guess is that they’re merging the click data from AdSense with the impression data from Analytics.

You can generalize this approach to a situation where two or more vendors might group together to pool the data they have to provide a consolidated set of reports. This is (sort of) the approach used by Omniture and DoubleClick, where you can use an Omniture tag in place of DoubleClick spotlight tags for conversion tracking.

The crucial pre-requisite is that the different sources of data need to be mergeable; and that means a couple of things. First, the visitor ID needs to be shared between the data sets. This is fairly easy for a single vendor to achieve, but trickier for vendors working together.

The other implication is that it needs to be possible to de-duplicate individual transactions. If you have two tags on your page, one for a web analytics product, and one for an ad server’s conversion tracking, it can actually be pretty challenging to ensure that when a user requests a page, you don’t count the page impression twice. Either you ignore one source of data completely (which is sort of what Google seems to do with AdSense/GA), or you have to employ various heuristics to decide when to throw something away – for example, if you register two identical page requests within a fraction of a second of one another, you can be confident (though not certain) that they are duplicates.
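A sketch of such a de-duplication heuristic, in Python (the time window and field names are invented for illustration):

```python
from datetime import datetime, timedelta

def deduplicate(requests, window_ms=200):
    """Drop a page request when an identical one (same visitor, same URL)
    arrived within window_ms. A heuristic: we can be confident, though
    not certain, that such near-simultaneous twins are duplicates."""
    seen = {}   # (visitor_id, url) -> timestamp of the last request seen
    kept = []
    for req in sorted(requests, key=lambda r: r["ts"]):
        key = (req["visitor_id"], req["url"])
        last = seen.get(key)
        if last is None or req["ts"] - last > timedelta(milliseconds=window_ms):
            kept.append(req)
        seen[key] = req["ts"]
    return kept

t0 = datetime(2008, 10, 29, 12, 0, 0)
requests = [
    {"visitor_id": "v1", "url": "/home", "ts": t0},                               # web analytics tag fires
    {"visitor_id": "v1", "url": "/home", "ts": t0 + timedelta(milliseconds=50)},  # ad server tag: duplicate
    {"visitor_id": "v1", "url": "/home", "ts": t0 + timedelta(seconds=30)},       # genuine revisit: keep
]
print(len(deduplicate(requests)))  # 2
```

Like any heuristic it has false positives (a genuine rapid double pageview gets swallowed), which is part of why merged data has those hidden ‘seams’.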

As for the customers? The marketer gets a decent benefit from this approach; they’ll see merged data, though the quality of the data may still leave something to be desired (hidden ‘seams’ where the data has been stitched together can trip up the unwary analyst). The webmaster, on the other hand, sees little benefit – they still have to maintain both tags, especially if each tag has its own unique capability. So this solution is really more of a stepping-stone to a more complete approach than a destination in its own right.


Idea 2: A “tag management” system

Marketers: ♥♥
Webmasters: ♥♥♥♥

Even if a single vendor or pair of vendors can join forces to combine the data from a couple of tags, most sites are still going to be using multiple tags from multiple vendors, some of whom (by their very nature) are never likely to co-operate on data. Given this state of affairs, one obvious approach is to provide some more technology to the webmaster to help them manage the plethora of tags.

Such a system would be, essentially, a content management system for tagging, enabling the webmaster to define which tags from which vendors should appear in which places on their site. Such a system could come from a vendor, or a sufficiently motivated site owner could create it themselves.

A webmaster using such a system would see a dramatic reduction in the overhead associated with managing multiple tags (once they’d gone through the pain of implementing the tag management system’s tags, that is). Furthermore, a well-implemented tag management system would make it easier for the webmaster to introduce (and remove) tags, reducing some of the friction associated with moving from one analytics or ad serving vendor to another.

The big sticking point, however, with a system like this, is custom tagging. If you actually speak to a site owner about the pain of tag management, having to actually insert a JS file into the page is only a small part of the task – and that step is made much easier by modern content management systems. No, it’s the definition of custom variables, and integrating them with the data coming from the site, that is the challenging and time-consuming step. Publishers (who are implementing ad server tag code to host ads on their site) also have the overhead of defining page groups for their content, which is a major task compared to the actual tagging itself.

So in order for such a system to be really useful, it would need to provide a standardized interface between the data coming from the site and the tags – essentially, its own custom variable schema with a defined set of mappings to Omniture, GA, Atlas AdManager, etc.
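Such a schema might look something like this Python sketch (every name here – the site-side variables and the vendor-side mappings alike – is invented for illustration, not taken from any real vendor’s API):

```python
# The site sets its variables once, in a neutral schema; the tag manager
# translates them into each vendor's own custom-variable convention.
SITE_DATA = {"page_group": "checkout", "order_value": 59.99}

VENDOR_MAPPINGS = {
    "ga":       {"page_group": "utm_content_group", "order_value": "utm_order_total"},
    "omniture": {"page_group": "s.channel", "order_value": "s.purchase_total"},
}

def to_vendor(vendor, site_data):
    """Render the site's neutral variables in one vendor's vocabulary."""
    mapping = VENDOR_MAPPINGS[vendor]
    return {mapping[k]: v for k, v in site_data.items() if k in mapping}

print(to_vendor("ga", SITE_DATA))
```

The win is that the webmaster defines `page_group` and friends exactly once, however many vendors come and go.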

A company called Positive Feedback (based in London, which means they must be geniuses) has taken a stab at providing a solution here with their TagMan offering. And Tealium is looking to address the custom variables problem with their solution, TrackEvent.


Idea 3: A universal tag

Marketers: ♥♥♥
Webmasters: ♥♥♥

Ah, the universal tag. The holy grail of web analytics (at least, according to some). The idea here is that a group of vendors (perhaps under the auspices of the Web Analytics Association) come together to create a universal piece of tag code that can capture data for any of their services. The upshot is that the webmaster only has to place this single tag on their site, and then configure the tag for whichever vendor solutions they’re using. A side benefit of the “universal tag” is that it can direct beacon requests to the customer’s own data collection systems as well as a third-party’s – avoiding the problem of data ownership.

The key challenge with this approach is that, despite warm words on the topic from web analytics vendors, there’s little real incentive to put a bunch of effort into doing something like this. All the vendors get is a potentially more complicated implementation, and more client mobility. What we may find happening instead is vendors supporting other vendors’ custom variables and event calls – so vendor A could come in and say “simply switch out your JS file reference (or add ours), and we’ll start capturing the same data you’re already getting”. It would be interesting to see if any vendors complained that their IP was being infringed by this approach.

A variant of this idea is where a vendor creates a tag architecture and then works with partners to encourage them to abandon or supplement their own data collection with the vendor’s – thus making the vendor’s tag the universal tag. This is Omniture’s approach with Genesis. This approach strikes me as more likely to succeed, since the incentives work differently; it’s in Omniture’s interest to push continued Genesis tracking adoption.

The asymmetry of Omniture’s approach also makes a more general point about the universal tag idea – which is that it seems likely that the vendor who already has the most well-established tagging relationship with a client will be able to leverage that to get other systems’ data collection needs met within the framework of their tag. This is likely to be the web analytics vendor, so we should look to those organizations (rather than, say ad serving companies) to lead on a solution like this.


Idea 4: A universal data collection service

Marketers: ♥♥♥♥
Webmasters: ♥♥♥

If you continue the thought process around universal tagging, and vendors looking to provide more and more help to customers with data collection, then you end up with the idea of a vendor providing a fully-fledged data collection service.

I’ve blogged about this idea before, as it happens. The core idea here is that some kindly organization (which has access to a large pool of cheap processing and data storage) takes it upon itself to offer a data collection service that is so flexible, reliable and cheap that many other vendors abandon their own data collection and use the common service.

Part of the service is a “universal tag” which can be configured to capture the data that each analytics/ad serving service needs. But the difference is that the universal tag doesn’t try to generate beacon calls in the correct formats for the individual services, or even send that data to those services’ data collection servers – it just gathers the data to a centralized repository and the other services access this data programmatically.
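The shape of the “collect once, serve many” idea, as a Python sketch (all class and field names here are invented for illustration):

```python
class CollectionService:
    """A shared repository: one beacon writes raw hits, and each
    analytics/ad-serving service reads from the store instead of
    running its own data collection."""

    def __init__(self):
        self._hits = []

    def record_beacon(self, hit):
        """Called once per page view by the universal tag."""
        self._hits.append(hit)

    def query(self, predicate):
        """A subscribing vendor pulls just the raw hits it cares about."""
        return [h for h in self._hits if predicate(h)]

service = CollectionService()
service.record_beacon({"visitor": "v1", "url": "/home", "campaign": None})
service.record_beacon({"visitor": "v2", "url": "/landing", "campaign": "cpc-123"})

# A web analytics vendor wants every hit; an ad server only campaign hits
analytics_view = service.query(lambda h: True)
adserver_view = service.query(lambda h: h["campaign"] is not None)
print(len(analytics_view), len(adserver_view))  # 2 1
```

Because every vendor reads from the same pool of hits, the numbers finally have a chance of matching up.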

This approach combines some of the benefits of the two preceding ideas – for webmasters, the tag management process is radically simplified because one tag can do multiple things. Marketers like it because it would finally deliver numbers which match up. However, the approach wouldn’t work for certain things, such as adserving tags – unless that system was merged together with the data collection service.

Of course, another obstacle to this kind of approach taking root is vendors’ reluctance to entrust their (or their customers’) data to a third-party. This reluctance is liable to increase in proportion to the size of the vendor. So whilst Omniture would likely balk at using a data collection service from Google or Microsoft in place of its own, a small vendor (such as our plucky little friends at Woopra) may find such a service invaluable in allowing them to focus on analytics rather than data collection.


So those are my ideas – what are yours? And which one(s) of the above ideas do you think are most likely to gain traction?


August 08, 2008

Google integrates DoubleClick with AdSense

In a post yesterday on the company blog, Google has announced that it's going to be introducing some DoubleClick-like features into the Google Content network (that is, the group of sites that use AdSense to serve contextual ads). The new functionality includes:

  • Frequency capping and reporting
  • Improved ad quality
  • View-through conversions

These new capabilities are interesting because they are the kinds of functionality that brand (as opposed to direct response) advertisers are likely to be most interested in, and indicate that Google is trying to broaden the appeal of its Content Network inventory in these areas (Google already offers CPM pricing for ads placed on the Content Network).

An interesting detail of the announcement is that Google is now serving a DoubleClick cookie with AdSense ads. The touted benefit to users is that they can now opt out of DoubleClick and AdSense ad targeting with a single click, whilst integration for existing DoubleClick advertisers and publishers will be simplified. The benefit to Google, of course, is that it can start using the behavioral data from the Content Network (which is huge) to be able to sell more targeted ads to their DFA (DART for Advertisers) customers. DoubleClick previously dallied with this kind of functionality in the early part of the decade, but jettisoned the technology back in 2002 in the wake of a bunch of class-action lawsuits accusing it of infringing users' privacy.


June 30, 2008

AdSense becomes a content delivery network

In a move that I predicted on this blog 18 months ago (ok, really I just called out that a colleague had predicted it, but I shall take my prognostication credits where I can), Google has announced that it's going to be using the AdSense network to distribute new episodes (I suppose you could use that terrible neologism, webisodes) of a web-only cartoon from Seth MacFarlane (he of Family Guy fame), called "Seth MacFarlane's Cavalcade of Cartoon Comedy".

The new program gives a different spin to the Google Content Network, the name that Google has used up until now to describe the AdSense ad network. Monetization will come from in-stream ads, but also from customized animated ads for brands themselves, presumably infused with MacFarlane's trademark dark/smutty humor.

The New York Times (a member of the Google Content Network) gushes about the new development, describing it as "a bold step into the distribution business, one that, if successful, will surely send shock waves through the entertainment business". But it has a point - by turning AdSense units into real content (albeit content that is designed to generate clicks), Google is in one sense going into competition with its own AdSense content partners - the thousands of websites which host Google ads, and make money from clicks on those ads.

Any publisher who runs ads on their site has to navigate the fine line between making the site's content successful (which will draw users back in future, and keep them clicking around the site), and making the site's ads successful (which pay the bills, but carry users away from the site). This balance is challenging enough when the ads are obviously ads, but when the ad units start to carry compelling content from people like Seth MacFarlane, it could detract from the site's own content. The short-term payoff for the publisher might be elevated click revenue from these webisodes (perhaps we should call them "adisodes"?), but the long-term effect may be decreased engagement with the site's own content, and dissatisfaction from advertisers that the publisher is working with. Only time will tell.

[By the way, my favorite Family Guy character is Stewie, of course. Is there any other choice?]


May 20, 2008

What will Google do next with Google Analytics?

So I'm a little late with my obligatory post-E-metrics blog post; my excuse is that I flew straight from San Francisco to Mexico for a vacation, and have just made it back.

A fixed presence at E-metrics summits these days is our good friend, Google - in fact, this year, both Google Analytics and Google Website Optimizer were sponsoring the show (possibly a somewhat inefficient use of marketing dollars, but there you go). In terms of sheer numbers of customers, Google Analytics is the 500 lb gorilla of web analytics, as anyone reading this blog will doubtless know. But where next for the two-and-a-half-year-old wunderkind of web analytics?

One of my favorite things to do at E-metrics is to catch up with friends from the industry, and Google evangelist Avinash Kaushik and I had a very pleasant coffee where we discussed just this topic. Just to be clear, Avinash didn't reveal anything about Google's future plans for GA, but it became clear to me (from that discussion and others) that Google is scratching its head a little about how (or whether) to provide GA to enterprise clients (i.e. big companies). A lot of people are expecting it to, but for all GA's success, it remains a relatively simple analytics package, incapable of the detailed reports that you can pull with Omniture or even Webtrends. And it doesn't really seem to be in Google's DNA to provide the kind of feature-rich application stack that these other companies provide. In many ways, they're the Microsoft Office to Google's Docs & Spreadsheets. So how to square the circle?

Whilst I was sunning myself in Mexico, I had a chance to reflect on how Google could address this challenge and here are my thoughts. As you know, I like to make predictions, just for the fun of sparking a bit of debate, so feel free to use the comments box to let me know precisely what banned substance it is that I'm smoking.


Prediction 1: Google will release a comprehensive mid-tier API for GA

I'm hardly going out on a limb with this prediction - it's something that Google-watchers have long been crying out for (and some people have taken unilateral action to fix). But most talk about APIs has been about providing a programmatic way of pulling existing GA reports - i.e. a "front-end" API. What I'm talking about here (hence the use of the term "mid-tier") is an API into GA's data store that allows pretty much any data set to be extracted to a third-party system and then processed into a report.

Google would have to be very careful not to overwhelm its systems by providing such an API, of course; it would be all too easy to write a call that asked for all the data for a very busy site, but those eventualities could be predicted and prevented fairly easily. Note that I'm also not saying that Google will provide this API for free; there's no reason it might not choose to charge for access to such a comprehensive data service.
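To make the worry about runaway queries concrete, here's one way such an API might bound each request before accepting it. Everything in this sketch is hypothetical: the field names, limits, and query shape are invented for illustration, since no such Google API exists.

```python
from datetime import date

# Invented limits for the sake of the sketch.
MAX_ROWS = 10_000
MAX_RANGE_DAYS = 31

def build_query(profile_id, dimensions, metrics, start, end, max_rows=1000):
    """Validate and assemble a raw-data query, refusing requests that
    would pull an unbounded amount of data from a busy site."""
    if (end - start).days > MAX_RANGE_DAYS:
        raise ValueError("date range too large; split into smaller requests")
    if max_rows > MAX_ROWS:
        raise ValueError(f"max_rows is capped at {MAX_ROWS}")
    return {
        "profile": profile_id,
        "dimensions": dimensions,   # e.g. ["page", "referrer"]
        "metrics": metrics,         # e.g. ["visits", "pageviews"]
        "start": start.isoformat(),
        "end": end.isoformat(),
        "max_rows": max_rows,
    }

q = build_query("UA-12345-1", ["page"], ["visits"],
                date(2008, 5, 1), date(2008, 5, 7))
```

Server-side quotas and metered billing would sit behind the same validation layer, which is where the "charge for access" option would come in.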

You might be thinking, why would Google release an Analytics API at all? After all, isn't the point of GA to encourage people to use Google's tools to optimize their campaigns, and therefore spend more money with Google? Well, only partially. The main benefit to Google in the deployment of GA is the huge amount of data that it gives them access to. In an API scenario, Google would still control instrumentation of the site and collection of the data, and would therefore still accrue the same benefits from it as it does currently.

Prediction 1a: Related Google products will use the Google Behavior Data API

I've decided to give the new API a name - the Google Behavior Data API - to distinguish it from Google Analytics itself.

If they don't already, Google's various behavior data-consuming products (principally, Analytics and Website Optimizer) will use the same API for data access. You probably won't see any visible change in the products as a result of this. This might already have happened behind the scenes.

Prediction 1b: The Google data collection .js tag will become a "universal" tag

If Google opens up the mid-tier of their system, they'll also (eventually) need to open up the data collection part, making it possible to collect any custom variable or event you want, and subsequently being able to access this through the API. This will require new functionality in the JavaScript tag to support customizable data collection. The importance of this ability will become clear in predictions 2 and 3, below.
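As a sketch of what "customizable data collection" might look like on the wire, here's a hypothetical beacon URL builder. The endpoint and parameter names are invented, and Python stands in for the browser-side JavaScript that would actually assemble the request.

```python
# Illustrative only: a "universal" collection beacon might let the page
# attach arbitrary site-defined custom variables to each hit. The
# endpoint and parameter names here are invented, not Google's.
from urllib.parse import urlencode

def beacon_url(endpoint, page, custom_vars):
    """Build a tracking-pixel URL carrying the page path plus any
    custom variables the site chooses to record."""
    params = {"dl": page}
    for i, (name, value) in enumerate(sorted(custom_vars.items()), start=1):
        params[f"cv{i}"] = f"{name}={value}"
    return f"{endpoint}?{urlencode(params)}"

url = beacon_url("https://collect.example.com/__beacon.gif",
                 "/checkout", {"cart_value": "59.99", "member": "gold"})
```

Because the custom variables are just opaque name/value pairs to the collection tier, the same beacon could serve GA, Website Optimizer, or a third-party tool reading the data back out through the API.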


Prediction 2: The Google Behavior Data API will create a new industry of "third-party" web analytics tools

I've railed before against new entrants to the web analytics business, asking what value they can possibly add at this stage in the game. But one of the reasons I've been so skeptical in the past is that most of the folks building these new tools just don't appreciate how much effort has to go into collecting and storing the data in a format that makes it easy to deliver reports, and easy to expand functionality in the future.

A mid-tier Data API would mean that such companies could rely on Google for all the basic data collection, primary processing and warehousing, and just focus on developing interesting new reports. As long as the underlying platform is flexible, this frees up these companies to innovate at the front-end without having to worry about the back-end.

The upshot of this is that you may see web analytics functionality popping up in all sorts of places where it otherwise wouldn't, especially in the SMB market: CMS/blog tools, e-commerce systems, sales automation systems and the like. Some of these systems already provide integration with Google Checkout, for example, so using Google's Data API for reporting & analytics would be a logical next step.
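To illustrate the division of labor, a third-party tool built on such an API would run no collection infrastructure of its own; it would simply pull raw rows and aggregate them into its own reports. A toy sketch, with an invented row format:

```python
# Sketch: a hypothetical third-party tool aggregates raw rows pulled
# from the data API into a simple "top pages" report. The row schema
# is made up for illustration.
from collections import Counter

def top_pages(raw_rows, n=3):
    """Roll raw page-level rows up into a top-pages report."""
    visits = Counter()
    for row in raw_rows:
        visits[row["page"]] += row["visits"]
    return visits.most_common(n)

rows = [
    {"page": "/home", "visits": 120},
    {"page": "/blog", "visits": 80},
    {"page": "/home", "visits": 40},
]
report = top_pages(rows)
# report == [("/home", 160), ("/blog", 80)]
```

All of the innovation budget goes into the report itself; collection, primary processing and warehousing are someone else's problem.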


Prediction 3: Eventually, even the big guys will use the Behavior Data API

This is the big one, of course, and the most contentious. Why would a company like Omniture, Webtrends, or Coremetrics hand over data collection to Google? Omniture, for example, has put a lot of effort behind its Universal Tag architecture, and data is as useful to these companies (or will be, ultimately) as it is to Google.

One chief reason is switching costs. If an Enterprise web analytics vendor wants to convert a GA customer onto its platform, then offering a "no reinstrumentation" proposition is going to be attractive. It is true that the beacon code provided by different vendors captures different (unique) things, but there would be value in being able to say to a customer "just give us your API key and we'll do the rest", even if it did mean offering a reduced set of reports (although a universal tag with custom variables would offset this issue).

Another reason, however, is cost. Hosting the servers for data capture and initial processing costs web analytics vendors a lot of money, and is one of the things that contributes to the rose-tinged bottom lines of these companies. It costs Google money too, of course, but Google can probably provision servers more cheaply than anyone else on the planet (except, perhaps, our good selves), and is able to leverage the benefit of having access to the data to offset the cost.

This eventuality also neatly solves the problem of "GA in the enterprise". With the API in place, Google is free to reach agreements with the bigger web analytics vendors that preserve those vendors' positions with their customers whilst allowing GA to get in and access the data. "Maverick" implementation of GA by outlying departments of big companies is an increasing problem faced by the major Enterprise vendors. Being able to consume this data in their own tools would decrease the vendors' need to charge for every last byte of data they're collecting, and would enable them to say "Sure! Instrument with GA, and you'll see the aggregate numbers in our tool, too".


So, those are my predictions. Feel free to add yours below.


