Dogfood

June 30, 2009

My face, on the Internet

I have just noticed (rather belatedly, to say the least) that Laura Lee Dooley has posted a complete video of my encounter with Avinash Kaushik at the May E-metrics Summit in San Jose on Vimeo. The sound quality is a little poor, but you can more or less follow the thread of the conversation.

I come across as a cross between Prince Charles, Alastair Campbell and my Dad. Avinash does rather better, particularly around the 26 minute mark. Anyway, watch it for yourself and see who comes out on top.


January 21, 2009

Omniture stumbles

Chatter is building on the interwebs about Omniture’s recent (and ongoing) latency woes. Looks like both SiteCatalyst and Discover are days behind in processing data (according to messages on Twitter, up to around 5 – 7 days in some cases). And it looks like the situation is still getting worse, rather than better.

I have no insight into the cause of Omniture’s difficulties, or how widespread they are. It may be that they’re related to the December release of SiteCatalyst 14.3, which seems to contain a number of new features which are fairly broad in scope, and which may have had an impact on the platform’s ETL stability. Behind the scenes, Omniture may have made some changes to start integrating HBX’s feature set (especially its Active Segmentation) into SiteCatalyst as a prelude to a final migration push for the remaining HBX customers. Omniture’s certainly not saying – they’ve been conspicuously silent since the start of these problems.

Whatever the cause, I can certainly empathize with this kind of situation – we had all sorts of difficulty dealing with latency issues in my WebAbacus days. And we can be confident that Omniture will (eventually) fix these problems, and will probably not lose very many customers as a result (though, in the teeth of a recession, it can’t be great for attracting new customers).

But do these problems tell us something more about Omniture’s (or any other web analytics company’s) ability to run a viable business? Infrastructure costs are a big part of a web analytics firm’s cost base (at least, those with a hosted offering, which is all of them). And unfortunately, these costs don’t really scale linearly with the charging method that most Enterprise vendors use – charging by page views captured. Factors like the amount a tool is used, and the complexity of the reports that are being called upon, have a big impact on the load placed on a web analytics system, and the resulting infrastructure cost. It’s tricky for a vendor to recoup this cost without seeming avaricious.

As Omniture’s business grows, it has a constant need to invest in its infrastructure to keep pace with the demand for its services. But as the economy has worsened, it must be terribly tempting to see if a little more juice can be squeezed out of the existing kit, especially with its 2008 earnings due later this month. This will be as true for any other vendor (such as Webtrends or Coremetrics) as it is for Omniture, and these remarks shouldn’t be seen as a pop at our friends in Orem. But the nub is, can Enterprise web analytics pay the bills for its own infrastructure cost? Or will all web analytics ultimately need to be subsidized by something else (such as, oh, I don’t know, advertising)?

Your thoughts, please.


November 21, 2008

Brandt Dainow gets over-excited again

After his breathless article last year, proclaiming Google Analytics to be something like a cross between the second coming and Barack Obama, Brandt Dainow seems to have soured on the big G, proclaiming this week that GA contains ‘disturbing inaccuracies’:

Google Analytics is different from other products in that it has been intentionally designed by Google to be inaccurate over and above the normal inaccuracies that are inevitable. These inaccuracies are so glaring that most people are getting a very false picture of what is happening in their sites.

Dainow’s main beef with GA is two-fold:

  • It treats single-page visits as valid visits (i.e. it doesn’t remove them from visit counts or other related measures)
  • It includes single-page visits in average visit duration calculations

He also remarks that Google did in fact change the way that GA calculated average visit duration last year, but then changed the calculation back in the face of user pressure:

Google intentionally rolled Google Analytics back so that it produced an incorrect average duration…It's been that way ever since -- Google is intentionally and knowingly providing inaccurate numbers because a few people preferred neatness to truth.

Brandt then proposes two alternative measures - ‘retained visits’ (the count of visits with more than one page impression) and ‘true average duration’ (the average duration of retained visits). These metrics are not without some merit – it’s useful to know how many visits contained more than one page view, and the average duration of these visits. But Brandt goes on to assert that these two metrics should replace the standard measurements of visits and average duration in GA and (presumably) other tools. This suggestion is ridiculous, for the following reasons:

  • Contrary to Brandt’s assertions, there are a host of scenarios where a single-page visit is a perfectly valid visit, including, for example, this blog, for crying out loud, which has a high proportion of single-page visits because readers either just read the homepage and leave, or click through to an article from their RSS reader. So chucking all these kinds of visits out is crazy.
  • Whilst the inaccuracy of including single-page visits in average visit duration calculations is known to be a problem, removing these visits from the calculation doesn’t yield a magically ‘accurate’ number, it just yields one that is inaccurate in a different way. You still have no idea how long people looked at the final page of their visit for, and with a two-page visit this can introduce a huge potential inaccuracy.
  • Such standard metrics as exist in the web analytics industry are the result of long and arduous wrangling. There are no sacred cows, but you need a really good reason to exchange a simple and easy-to-understand metric for one which is more complex and offers no discernible benefit.
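
To make the second point concrete, here is a minimal sketch of the two calculations. It assumes that a tool measures visit duration as the gap between the first and last page-view timestamps (so a single-page visit registers as zero); the visit data and function names are invented for illustration, and this is not a description of how GA itself computes anything.

```python
from statistics import mean

# Hypothetical visits: each is a list of page-view timestamps in seconds.
visits = [
    [0],          # single-page visit: no second timestamp, so 0s is measured
    [0, 40],      # two pages: 40s measured, time on the final page unknown
    [0, 30, 90],  # three pages: 90s measured, time on the final page unknown
    [0],          # another single-page visit
]

def measured_duration(timestamps):
    """Gap between first and last page view; zero for a single-page visit."""
    return timestamps[-1] - timestamps[0]

def average_duration(visits):
    """Standard-style average: every visit counts, single-page ones as zero."""
    return mean(measured_duration(v) for v in visits)

def retained_visits(visits):
    """Visits with more than one page view ('retained visits')."""
    return [v for v in visits if len(v) > 1]

def true_average_duration(visits):
    """Average over retained visits only ('true average duration')."""
    return mean(measured_duration(v) for v in retained_visits(visits))

print(average_duration(visits), true_average_duration(visits))
```

Both numbers leave out the time spent on the last page of every visit, which is why neither can claim to be the "accurate" one; they are just different conventions.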

Whilst I can understand Brandt’s motivations for posting these ideas (which, I imagine, lie somewhere on a spectrum between a genuine desire to spark debate and a desire to generate a lot of traffic to his blog, in which regard I am obliging him), his remarks do irk me a bit (can you tell?), principally because he commits the unpardonable sin of absolutism when talking about web analytics, bandying about words like “truth” and “wrong” when really he is just presenting his own preferences.

When, as an industry, we can’t even agree what constitutes a visit, it’s pretty rich to start decrying one tool or another as ‘inaccurate’ simply because it takes an approach to data that you don’t believe in. And besides, as Brandt surely knows, Google Analytics now has the capability (via its custom segmentation) to calculate the metrics he seeks.

Finally, as every half-experienced web practitioner (of whom Brandt seems to have a low opinion also) knows, the key to success in web analytics is to pick your metrics, stick to them, and measure them continuously as you make changes to your site and your marketing, to see what is working. If you’re looking to increase engagement, and have decided that visit duration is a good measure of this (a debatable point, as it happens), then it doesn’t matter whether you include single-page visits in your duration calculation – if your visit durations are going up, you’re happy. And if your visit durations suddenly jump because your web analytics vendor has changed the way they calculate the metric, this could in fact cause more pain than benefit, perhaps causing you to go to said vendor and say, “Oi! Change it back to how it was!”.

So feel free to read the article, but be warned: it’s not very accurate.


November 07, 2008

Applied Insights falls into the gaping Foviance maw

Ok, it’s not quite Omniture acquiring TouchClarity, but I was delighted to read yesterday that my old pals at Foviance have secured the services of none other than Neil Mason through the acquisition of his company, Applied Insights. Neil (whose ClickZ column you should read) has been flying the flag for web analytics – especially from a marketing-effectiveness point of view – for many years, and he’ll be a great asset to the Foviance team. Congratulations Neil and Paul. Neil’s partner and co-founder of Applied Insights, John McConnell, will leave to pursue independent interests.

The only note of sadness for me in this announcement is that the background to the acquisition is the change in the focus of Foviance’s web analytics efforts away from a WebAbacus-oriented technology/consulting solution to a services-only offering based around Omniture, WebTrends and the like. Of course, this is precisely the right thing for Foviance to do, since the web analytics market is now firmly consolidating around the major players; but I’m sad because WebAbacus (which I spent so many years on) isn’t one of them. But I like to think that its influence in the industry lives on.


October 29, 2008

Whence the universal tag?

With another E-metrics Summit over (sans me, sadly), it’s clear that interest in web analytics and online measurement remains high, even (or especially) in these troubled times. But as the technology sets for online advertising and web analytics continue to merge and overlap, one urgent question remains unanswered: what are we going to do about data collection?

You only have to talk to any medium-sized web agency, or marketing manager for an e-commerce site, to understand that online behavior data collection is deeply broken right now – ad servers and web analytics products still collect their data entirely separately, leading to misery for webmasters as they struggle to maintain two (or three, or four…) tracking tags on each page of a site, and misery for analysts as they struggle to reconcile differing numbers from different systems. If you throw ad tags (that is, the snippets of code that actually cause ads to be displayed on a page, such as the AdSense code) into the mix, things become even more complicated.

How we as an industry go about fixing this problem depends on who we care about more: webmasters (I use that term loosely to refer to the gaggle of unfortunates who are charged with maintaining and updating a website), or marketers; or whether we decide that we care about them both. Here are some ideas (none of them new) about how to approach the problem, together with “feel the love” rankings for marketers and webmasters. Feel free to add your own ideas in the comments.

 

Idea 1: Merge the back-end data

Marketers: ♥♥♥ (out of 5)
Webmasters: ♥ (out of 5)

It’s not uncommon for a site to be using multiple pieces of tag code from the same vendor, such as Google (which has separate tags for Adwords, AdSense, GA and DFA/DFP) or our good selves (adCenter, adCenter Analytics, Atlas and others). If this is the case, then the vendor has the opportunity – some would say the responsibility – to join together the data it collects at the back-end to provide a more joined-up and consistent set of reports for marketers.

Google has just taken another decent step in this direction with its inclusion of AdSense clickthrough and CPC data in GA reports. I don’t actually have detail on exactly how they’re doing this, but my best guess is that they’re merging the click data from AdSense with the impression data from Analytics.

You can generalize this approach to a situation where two or more vendors might group together to pool the data they have to provide a consolidated set of reports. This is (sort of) the approach used by Omniture and DoubleClick, where you can use an Omniture tag in place of DoubleClick spotlight tags for conversion tracking.

The crucial pre-requisite is that the different sources of data need to be mergeable; and that means a couple of things. First, the visitor ID needs to be shared between the data sets. This is fairly easy for a single vendor to achieve, but trickier for vendors working together.

The other implication is that it needs to be possible to de-duplicate individual transactions. If you have two tags on your page, one for a web analytics product, and one for an ad server’s conversion tracking, it can actually be pretty challenging to ensure that when a user requests a page, you don’t count the page impression twice. Either you ignore one source of data completely (which is sort of what Google seems to do with AdSense/GA), or you have to employ various heuristics to decide when to throw something away – for example, if you register two identical page requests within a fraction of a second of one another, you can be confident (though not certain) that they are duplicates.
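
As a rough illustration of that last heuristic, here is a minimal sketch. It assumes the two data sources already share a visitor ID (the first pre-requisite above); the `Hit` record, its field names and the half-second window are all invented for the example rather than taken from any vendor's system.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Hit:
    visitor_id: str   # shared visitor ID (assumed to be common to both sources)
    url: str
    timestamp: float  # seconds since some common epoch
    source: str       # e.g. "analytics" or "adserver" (hypothetical labels)

def deduplicate(hits, window=0.5):
    """Keep a hit unless the same visitor requested the same URL less than
    `window` seconds after the last hit we kept for that visitor and URL."""
    kept = []
    last_kept = {}  # (visitor_id, url) -> timestamp of the last kept hit
    for hit in sorted(hits, key=lambda h: h.timestamp):
        key = (hit.visitor_id, hit.url)
        previous = last_kept.get(key)
        if previous is not None and hit.timestamp - previous < window:
            continue  # probably the second tag firing on the same page load
        last_kept[key] = hit.timestamp
        kept.append(hit)
    return kept

hits = [
    Hit("v1", "/home", 10.00, "analytics"),
    Hit("v1", "/home", 10.12, "adserver"),   # likely duplicate of the first hit
    Hit("v2", "/home", 10.05, "adserver"),   # different visitor, so kept
    Hit("v1", "/home", 45.00, "analytics"),  # genuine reload later on
]
print(len(deduplicate(hits)))
```

The window is the judgment call: too short and duplicates slip through on slow pages, too long and a genuine quick reload gets thrown away, which is why the result is only ever "confident, though not certain".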

As for the customers? The marketer gets a decent benefit from this approach; they’ll see merged data, though the quality of the data may still leave something to be desired (hidden ‘seams’ where the data has been stitched together can trip up the unwary analyst). The webmaster, on the other hand, sees little benefit – they still have to maintain both tags, especially if each tag has its own unique capability. So this solution is really more of a stepping-stone to a more complete approach than a destination in its own right.

 

Idea 2: A “tag management” system

Marketers: ♥♥
Webmasters: ♥♥♥♥

Even if a single vendor or pair of vendors can join forces to combine the data from a couple of tags, most sites are still going to be using multiple tags from multiple vendors, some of whom (by their very nature) are never likely to co-operate on data. Given this state of affairs, one obvious approach is to provide some more technology to the webmaster to help them manage the plethora of tags.

Such a system would be, essentially, a content management system for tagging, enabling the webmaster to define which tags from which vendors should appear in which places on their site. Such a system could come from a vendor, or a sufficiently motivated site owner could create it themselves.

A webmaster using such a system would see a dramatic reduction in the overhead associated with managing multiple tags (once they’d gone through the pain of implementing the tag management system’s tags, that is). Furthermore, a well-implemented tag management system would make it easier for the webmaster to introduce (and remove) tags, reducing some of the friction associated with moving from one analytics or ad serving vendor to another.

The big sticking point, however, with a system like this, is custom tagging. If you actually speak to a site owner about the pain of tag management, having to actually insert a JS file into the page is only a small part of the task – and that step is made much easier by modern content management systems. No, it’s the definition of custom variables, and integrating them with the data coming from the site, that is the challenging and time-consuming step. Publishers (who are implementing ad server tag code to host ads on their site) also have the overhead of defining page groups for their content, which is a major task compared to the actual tagging itself.

So in order for such a system to be really useful, it would need to provide a standardized interface between the data coming from the site and the tags – essentially, its own custom variable schema with a defined set of mappings to Omniture, GA, Atlas AdManager, etc.
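
Here is a minimal sketch of what such a schema-plus-mappings layer might look like. Everything in it is hypothetical: the neutral variable names, the vendor labels and the per-vendor parameter names are invented for illustration and do not correspond to the real parameters of Omniture, GA, Atlas or any other product.

```python
# Neutral, site-side variables that the site populates once per page
# (hypothetical schema).
page_data = {
    "page_name": "checkout/confirm",
    "order_value": "59.90",
    "campaign": "spring_sale",
}

# Hypothetical per-vendor mappings from the neutral schema to each vendor's
# own parameter names. A vendor only lists the variables it understands.
VENDOR_MAPPINGS = {
    "vendor_a": {"page_name": "pn", "order_value": "rev", "campaign": "cmp"},
    "vendor_b": {"page_name": "page", "order_value": "value"},
}

def translate(page_data, vendor):
    """Rename neutral variables to one vendor's parameter names, dropping
    any variable that the vendor has no mapping for."""
    mapping = VENDOR_MAPPINGS[vendor]
    return {mapping[name]: value
            for name, value in page_data.items()
            if name in mapping}

for vendor in VENDOR_MAPPINGS:
    print(vendor, translate(page_data, vendor))
```

The point of the design is that the site only ever populates the neutral variables; adding or dropping a vendor then means editing a mapping table rather than re-tagging pages.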

A company called Positive Feedback (based in London, which means they must be geniuses) has taken a stab at providing a solution here with their TagMan offering. And Tealium is looking to address the custom variables problem with their solution, TrackEvent.

 

Idea 3: A universal tag

Marketers: ♥♥♥
Webmasters: ♥♥♥

Ah, the universal tag. The holy grail of web analytics (at least, according to some). The idea here is that a group of vendors (perhaps under the auspices of the Web Analytics Association) come together to create a universal piece of tag code that can capture data for any of their services. The upshot is that the webmaster only has to place this single tag on their site, and then configure the tag for whichever vendor solutions they’re using. A side benefit of the “universal tag” is that it can direct beacon requests to the customer’s own data collection systems as well as a third-party’s – avoiding the problem of data ownership.

The key challenge with this approach is that, despite warm words on the topic from web analytics vendors, there’s little real incentive to put a bunch of effort into doing something like this. All the vendors get is a potentially more complicated implementation, and more client mobility. What we may find happening instead is vendors supporting other vendors’ custom variables and event calls - so vendor A could come in and say “simply switch out your JS file reference (or add ours), and we’ll start capturing the same data you’re already getting”. It would be interesting to see if any vendors complained that their IP was being infringed by this approach.

A variant of this idea is where a vendor creates a tag architecture and then works with partners to encourage them to abandon or supplement their own data collection with the vendor’s – thus making the vendor’s tag the universal tag. This is Omniture’s approach with Genesis. This approach strikes me as more likely to succeed, since the incentives work differently; it’s in Omniture’s interest to push continued Genesis tracking adoption.

The asymmetry of Omniture’s approach also makes a more general point about the universal tag idea – which is that it seems likely that the vendor who already has the most well-established tagging relationship with a client will be able to leverage that to get other systems’ data collection needs met within the framework of their tag. This is likely to be the web analytics vendor, so we should look to those organizations (rather than, say ad serving companies) to lead on a solution like this.

 

Idea 4: A universal data collection service

Marketers: ♥♥♥♥
Webmasters: ♥♥♥

If you continue the thought process around universal tagging, and vendors looking to provide more and more help to customers with data collection, then you end up with the idea of a vendor providing a fully-fledged data collection service.

I’ve blogged about this idea before, as it happens. The core idea here is that some kindly organization (which has access to a large pool of cheap processing and data storage) takes it upon itself to offer a data collection service that is so flexible, reliable and cheap that many other vendors abandon their own data collection and use the common service.

Part of the service is a “universal tag” which can be configured to capture the data that each analytics/ad serving service needs. But the difference is that the universal tag doesn’t try to generate beacon calls in the correct formats for the individual services, or even send that data to those services’ data collection servers – it just gathers the data to a centralized repository and the other services access this data programmatically.

This approach combines some of the benefits of the two preceding ideas – for webmasters, the tag management process is radically simplified because one tag can do multiple things. Marketers like it because it would finally deliver numbers which match up. However, the approach wouldn’t work for certain things, such as adserving tags – unless that system was merged together with the data collection service.

Of course, another obstacle to this kind of approach taking root is vendors’ reluctance to entrust their (or their customers’) data to a third-party. This reluctance is liable to increase in proportion to the size of the vendor. So whilst Omniture would likely balk at using a data collection service from Google or Microsoft in place of its own, a small vendor (such as our plucky little friends at Woopra) may find such a service invaluable in allowing them to focus on analytics rather than data collection.

 

So those are my ideas – what are yours? And which one(s) of the above ideas do you think are most likely to gain traction?


September 16, 2008

Phorm gets the all-clear from the UK Government (kinda)

[Update 10/1/08: BT has announced that it will commence a new trial with Phorm to start September 30 in the UK. The trial, in accordance with the conditions below, is opt-in]

 

Beleaguered behavioral targeting outfit Phorm appears finally to have caught a bit of a lucky break - the UK Government has (belatedly) responded to the EU's queries about Phorm's business practices by saying that Phorm does not break EU data collection/retention laws. But the Department for Business, Enterprise and Regulatory Reform (BERR) - the Government department tasked with assessing Phorm's business and responding to the EU - has placed the following conditions on its approval (from an excerpt of the full letter sent to the EU which is reproduced on The Register - my highlighting added):

  • The user profiling occurs with the knowledge and agreement of the customer.
  • The profile is based on a unique ID allocated at random which means that there is no need to know the identity of the individual users.
  • Phorm does not keep a record of the actual sites visited.
  • Search terms used by the user and the advertising categories exclude certain sensitive terms and have been widely drawn so as not to reveal the identity of the user.
  • Phorm does not have nor want information which would enable it to link a user ID and profile to a living individual.
  • Users will be presented with an unavoidable statement about the product and asked to exercise a choice about whether to be involved.
  • Users will be able to easily access information on how to change their mind at any point and are free to opt in or out of the scheme.

The two key bullets here are the last two - Phorm will be required to operate this service as an opt-in service only, with clear language and functionality enabling even opted-in users to opt out at any time. And BERR states that it will be keeping a close eye on Phorm to ensure that it continues to comply with these conditions.

The news may do a little to shore up Phorm's deflating stock price, which has lost about 80% of its value since the heady days of March. But it's hard to imagine Phorm building much of a sustainable business on the back of an opt-in only system - it's going to be an incredibly hard sell for the ISPs that Phorm partners with (BT, TalkTalk and Virgin Media being the only ones mentioned so far). The only model I can think of is that the ISPs offer reduced rates in exchange for opting into the targeting system; but that negates the very purpose of implementing the system in the first place - to shore up sagging ISP revenues in the wake of the last few years' broadband price wars. I fear that Phorm is not out of the woods yet - especially if the recent happenings at its competitor NebuAd are anything to go by.


September 11, 2008

Yahoo updates IndexTools terms & conditions

Yahoo is not letting the grass grow under its feet with its integration of IndexTools. Today IndexTools partners received an e-mail from Yahoo informing them of a change to the terms & conditions of the service, which need to be agreed to by October 15 in order to retain access to IndexTools.

The e-mail calls out a change to the Ts & Cs which requires IndexTools partner customers (i.e. the site owners themselves) to place the following (or equivalent) language on their websites (my highlighting):

“Third-Party Web Beacons: We use third-party web beacons from Yahoo! to help analyze where visitors go and what they do while visiting our website. Yahoo! may also use anonymous information about your visits to this and other websites in order to improve its products and services and provide advertisements about goods and services of interest to you. If you would like more information about this practice and to know your choices about not having this information used by Yahoo!, click here.”

Yahoo goes on to say that it will be auditing client sites and will disable accounts where this verbiage has not been included on the site (I wonder how effective this will be in practice - it may just be sabre-rattling). Partners and client sites have until October 15 to comply.

The comment from the IndexTools partner who forwarded on this information was that it would be a challenge for their clients to implement this - from a logistical perspective, if nothing else. But I can understand Yahoo's move here - part of the benefit of a company like Yahoo (or Microsoft, or Google) offering a web analytics service is the secondary use of the resulting data for ad targeting purposes (something that Yahoo is very good at).

For comparison, here is (a shortened version of) the paragraph that Google requests its customers insert onto their sites:

“[...]  Google Analytics uses “cookies”, which are text files placed on your computer, to help the website analyze how users use the site. [...] Google will use this information for the purpose of evaluating your use of the website, compiling reports on website activity for website operators and providing other services relating to website activity and Internet usage.  Google may also transfer this information to third parties where required to do so by law, or where such third parties process the information on Google's behalf. Google will not associate your IP address with any other data held by Google. [...]  By using this website, you consent to the processing of data about you by Google in the manner and for the purposes set out above.”

This wording does not seem to imply that Google will reuse the data for other purposes, including ad targeting (IANAL, however); though Google did introduce some reuse of data (and some options for controlling it) with their data sharing feature that they launched back in March.

The corresponding paragraph from adCenter Analytics is:

Microsoft may retain and use user data subject to the terms of the Microsoft privacy statement and publish in aggregate or average form such information in combination with information collected from others’ use of adCenter Analytics except that Microsoft will not disclose to any third parties any user data collected by adCenter Analytics from your websites in a manner that (i) contains or reveals any personally-identifiable information or (ii) is specifically attributable to you or your websites.

The Microsoft privacy statement does say that we may use the information we collect to deliver services, "including personalized content and advertising".

So Yahoo is not doing anything here that hasn't been done before; and, as I've said several times before, you can't expect a company to provide a free web analytics service of the quality of IndexTools and not attempt to monetize it in some way. What is a little different about Yahoo's approach, though, is that it's taking a sterner line on actual implementation of the data reuse language, and actually threatening to disable accounts where the wording hasn't been added. This implies that Yahoo anticipates that it may need to defend its usage of this data (at least from a PR perspective), and wants to ensure that it can point to this wording on any site that uses IndexTools, so that users can't complain that their behavior data is being reused without their consent.

[Update 9/11/08: Added a reference to Google data sharing]

[Update 9/12/08: Corrected IndexTools' name - duh]


August 08, 2008

Google integrates DoubleClick with AdSense

In a post yesterday on the company blog, Google has announced that it's going to be introducing some DoubleClick-like features into the Google Content network (that is, the group of sites that use AdSense to serve contextual ads). The new functionality includes:

  • Frequency capping and reporting
  • Improved ad quality
  • View-through conversions

These new capabilities are interesting because they are the kinds of functionality that brand (as opposed to direct response) advertisers are likely to be most interested in, and indicate that Google is trying to broaden the appeal of its Content Network inventory in these areas (Google already offers CPM pricing for ads placed on the Content Network).

An interesting detail of the announcement is that Google is now serving a DoubleClick cookie with AdSense ads. The touted benefit to users is that they can now opt out of DoubleClick and AdSense ad targeting with a single click, whilst integration for existing DoubleClick advertisers and publishers will be simplified. The benefit to Google, of course, is that it can start using the behavioral data from the Content Network (which is huge) to be able to sell more targeted ads to its DFA (DART for Advertisers) customers. DoubleClick previously dallied with this kind of functionality in the early part of the decade, but jettisoned the technology back in 2002 in the wake of a bunch of class-action lawsuits accusing it of infringing users' privacy.


July 18, 2008

Online Ad Business 101, Part III - Ad Networks

So far in my nascent Online Ad Business 101 series, I've covered the overall advertising value chain, and looked at a superficial level at how an ad 'call' is actually handled. This installment brings together themes from those two first posts, by taking a look at ad networks.

As I have mentioned before, ad networks are in the media representation business. Even the biggest publishers don't typically have the resources to sell every last scrap of their available inventory day in, day out, so they hand over a portion of their inventory - the remnant inventory - to ad networks. Small publishers, on the other hand, have no resources of their own to sell their inventory, so they have to go to the market via networks. The networks aggregate all the inventory that they have available and then sell this inventory to advertisers.

Ad networks make money by selling the inventory for a higher price than they buy it. They can achieve this in a number of ways, which I shall list in broad order of sophistication/difficulty (with the easiest first):

  • Simple arbitrage: The network buys from the publisher at a rock-bottom price (because the publisher would literally make nothing from the inventory otherwise) and sells the inventory on in larger aggregated blocks at a slightly higher price. The "value add" is small - the network is simply allowing the advertiser to soak up some remaining part of their budget without having to go to lots of individual publishers.
  • Vertical aggregation: The network buys lots of small parcels of inventory in specific verticals (e.g. travel). It then aggregates the inventory for sale according to these segments, enabling it to charge a bit more. The advertiser is able to extend the reach of their campaign in a target audience without having to deal with lots of publishers.
  • Price model arbitrage: The network buys inventory on a CPM (cost-per-thousand impressions) basis, providing the publishers with a nice, reliable revenue stream. But it sells the inventory on a CPC (cost-per-click) or CPA (cost-per-acquisition) basis, reducing the risk of the inventory for advertisers (who are only paying for success), and absorbing the associated risk itself. The network makes money on the difference between the CPM it pays publishers and the "effective CPM" (eCPM) it charges advertisers.
  • Platform specialization: Advertising on emerging-media platforms such as video and mobile still requires quite a lot of specialized technology, forcing Rich Media vendors to build close relationships with the publishers that they deal with. Over time, many of the vendors in this space have gone the extra mile for their advertiser customers and turned themselves into networks, making it easier for advertisers to buy ads in these new formats across a range of publishers.
  • Behavioral targeting: The network buys inventory from publishers, and when the ad call is passed over to the network, it drops a third-party cookie. By doing this across all its publisher clients, the network can build up a profile of users by cookie ID - knowing, for example, that cookie ID XYZ123 has visited ten sites about watersports in the past week. The network can then use this information to add value to the inventory it's reselling, enabling advertisers to buy "active surfer dudes" and the like.
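To make the price model arbitrage case above concrete, here is a minimal sketch of the arithmetic. All the figures (impression counts, prices, click volumes) and the function names are made up for illustration; real networks will have far more moving parts.

```python
def ecpm(revenue: float, impressions: int) -> float:
    """Effective CPM: revenue per thousand impressions, whatever the pricing model."""
    return revenue * 1000 / impressions


def price_model_arbitrage(impressions: int, cpm_paid: float,
                          clicks: int, cpc_charged: float) -> dict:
    """Compare what the network pays publishers (CPM) with what it earns (CPC)."""
    cost = impressions / 1000 * cpm_paid   # paid to publishers regardless of clicks
    revenue = clicks * cpc_charged         # earned only when users click
    return {
        "cost": cost,
        "revenue": revenue,
        "ecpm": ecpm(revenue, impressions),
        "margin": revenue - cost,
    }


# Hypothetical campaign: 1M impressions bought at $0.50 CPM, sold at $0.25 per click.
good = price_model_arbitrage(1_000_000, cpm_paid=0.50, clicks=3_000, cpc_charged=0.25)
bad = price_model_arbitrage(1_000_000, cpm_paid=0.50, clicks=1_000, cpc_charged=0.25)
print(good)
print(bad)
```

In the first case the eCPM ($0.75) is above the CPM paid ($0.50), so the network makes a margin; in the second the clicks don't materialize, the eCPM drops to $0.25, and the network absorbs the loss. That is the risk it has taken on for the advertiser.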
 
Can you give me some examples?

Sure. Here are some examples of ad networks which (roughly) map to the types above. In practice, of course, most ad networks employ a combination of the above techniques to maximize the margin on the media they represent.

Simple arbitrage: Advertising.com
No doubt my description of Advertising.com as a "simple arbitrage" network will generate howls of protest from AOL (Advertising.com's parent company). But one of Advertising.com's main value propositions is the breadth of sites and audience it can deliver. Because Advertising.com deals with so many publishers, advertisers can almost always find some inventory that maps onto the audience they're looking for, and are happy to pay a (relatively) modest fee for the privilege.

Simple arbitrage 2: Google Content Network (AdSense)
No discussion of networks would be complete without a mention of Google AdSense. AdSense provides a way for lots of small publishers to make inventory available to the pool of advertisers that use Google AdWords. In addition to appearing next to Google's search results, these advertisers' ads can also appear on the small publishers' sites. The ads are matched with the sites on a contextual basis: the content of the site is crawled to extract keywords, which then stand in for the keywords that advertisers normally bid against for paid search results.

A crucial feature of this system is that the publisher is paid on a cost-per-click basis, so assumes a big chunk of the risk - if no one clicks, the publisher doesn't get paid. Google makes its money on the margin between the cost-per-click they pay the publisher, and the cost-per-click they charge the advertiser. The value proposition lies in connecting lots of small (and large) advertisers to lots of small publishers who are running sites which have a really good content match to the advertiser's offering. In other words, if you manufacture Mongolian nose-flutes, AdSense allows you to get your ads onto all the Mongolian nose-flute fansites out there, with very little effort.
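The contextual matching described above can be sketched very roughly as follows. This is not how Google actually does it; the stopword list, the keyword extraction and the "highest bid wins" rule are all simplifying assumptions, and the ads and bids are invented.

```python
import re
from collections import Counter

# A tiny, illustrative stopword list; a real system would use something far richer.
STOPWORDS = {"the", "and", "for", "with", "that", "this", "are", "you"}


def extract_keywords(page_text: str, top_n: int = 5) -> set:
    """Pick the most frequent non-trivial words on the page as its keywords."""
    words = re.findall(r"[a-z]+", page_text.lower())
    counts = Counter(w for w in words if len(w) > 2 and w not in STOPWORDS)
    return {word for word, _ in counts.most_common(top_n)}


def pick_ad(page_text: str, bids: list):
    """bids is a list of (ad_name, keyword, cpc_bid); the highest matching bid wins."""
    keywords = extract_keywords(page_text)
    matching = [bid for bid in bids if bid[1] in keywords]
    if not matching:
        return None
    return max(matching, key=lambda bid: bid[2])[0]


page = "Mongolian nose flute tunes. Nose flute care and nose flute history."
bids = [
    ("Flute Shop", "flute", 0.30),
    ("Nose Spray", "nose", 0.10),
    ("Car Loans", "loans", 2.00),
]
print(pick_ad(page, bids))
```

Note that the car loan advertiser has the highest bid by far, but never gets a look-in on the nose-flute fansite because its keyword doesn't match the page.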

Vertical aggregation: Martha's Circle
Martha's Circle is the (rather winsome) name for the ad network run by Martha Stewart Living Omnimedia. It's a classic example of a publisher/media owner extending their brand (and saleable audience) by signing up sites in the same sector (in this case, lifestyle) and creating a niche network. For an advertiser wanting to reach thirty-something women with an interest in the home, this kind of network is a no-brainer when building a media plan. Glam.com is another good example, as is Fox Interactive Media.

Price Model Arbitrage: DRIVEpm
DRIVEpm is Microsoft's own advertising network, which came with its acquisition of aQuantive last year. DRIVEpm styles itself as a "performance" network, meaning that it uses a variety of techniques (amongst them, price model arbitrage) to enable advertisers to buy inventory on a cost-per-performance basis, whilst still paying publishers on a cost-per-thousand basis. Scott Howe, former GM for DRIVEpm and now VP for the Microsoft Advertising business unit, wrote a great article back in 2005 about some of the dynamics in a performance network from the perspective of a media buyer looking to get the best ROI. Well worth a read.

Platform specialization: VideoEgg
VideoEgg is a video advertising network (the clue's in the name, I guess). Its offering is a classic mix of innovative ad unit technology (their latest offering is something called "AdFrames") with a network attached. Another feature of VideoEgg is that it offers advertisers a CPE (cost-per-engagement) model for buying video advertising, performing the same kind of price model arbitrage that DRIVEpm is doing. Their publisher audience is widget & app developers for social media environments such as Facebook and MySpace, ensuring that their value proposition to advertisers is further differentiated (essential as the online video market becomes more crowded).

Behavioral targeting: Tacoda
Tacoda is also part of AOL's Platform A unit, and markets itself as the world's "first" behaviorally-targeted ad network (a hard claim to substantiate, but equally hard to refute). Tacoda tracks behaviors of the visitors to its network of over 4,000 sites and uses this information to associate behavioral profiles with those users. It then sells inventory on these sites on a user-target-group basis, rather than by group of site or content area. These "audience segments" have names like "Family Chef" and "Photo Bug".

 

How does it actually work?

Understanding how ad networks actually serve their ads is essential in understanding how some of the above business models (especially targeting) work. I'll cover two scenarios - a small publisher/small advertiser scenario, and a large publisher/large advertiser scenario.

Small publisher/small advertiser
A small publisher will insert their ad network's ad code directly onto their site - in many cases, this is the only ad code the publisher is using, and is serving 100% of that publisher's ads. On the other side of the fence, the ad network may provide a web UI to enable advertisers to create or upload ads, and (possibly) allow the advertiser to choose which sites (or groups of site) those ads will appear on. The diagram below summarizes this (thanks to Right Media for the advertiser & publisher people icons):

image

Examples of this kind of system are Google AdSense and the Yahoo Publisher Network (these are often called "self-service" ad networks). The actual ad delivery model is pretty simple - the same ad server (the network ad server) functions as both publisher and advertiser ad server (the ad call path is on the left side of the diagram above).

Large publisher/large advertiser
When it comes to large publishers using ad networks to deliver inventory to large advertisers, things get more complicated. In this scenario, both the publisher and the advertiser will likely have their own ad servers. The publisher will configure its ad server to "hand off" a certain block of inventory to the network, whilst on the advertiser side, the advertiser ad server will be configured to buy a certain portion of a campaign from a network (or networks). So the ad call has to be passed from the publisher to the advertiser via the network:

image

 

It is when the ad call passes through the network ad server that the network is able to drop a cookie on the user's machine, enabling behavioral tracking and targeting (assuming, of course, that the user doesn't delete their cookies in the meantime).
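As a rough illustration of what the network can do with that cookie, here is a minimal sketch of profile-building by cookie ID. The class, the site names, the category mapping and the "three visits" threshold are all hypothetical; a real network's segmentation logic will be considerably more involved.

```python
from collections import Counter, defaultdict


class BehavioralProfiler:
    """Counts, per cookie ID, how often a user is seen on each category of site."""

    def __init__(self, site_categories: dict):
        self.site_categories = site_categories   # site -> content category
        self.profiles = defaultdict(Counter)     # cookie ID -> category counts

    def record_ad_call(self, cookie_id: str, site: str) -> None:
        """Called each time an ad call for this cookie passes through the network."""
        category = self.site_categories.get(site)
        if category is not None:
            self.profiles[cookie_id][category] += 1

    def segments(self, cookie_id: str, min_visits: int = 3) -> list:
        """Categories the user has been seen in often enough to be worth selling."""
        counts = self.profiles.get(cookie_id, Counter())
        return sorted(c for c, n in counts.items() if n >= min_visits)


profiler = BehavioralProfiler({
    "surfshop.example": "watersports",
    "kayaknews.example": "watersports",
    "recipes.example": "cooking",
})
for site in ["surfshop.example", "kayaknews.example", "surfshop.example", "recipes.example"]:
    profiler.record_ad_call("XYZ123", site)

print(profiler.segments("XYZ123"))
```

Here cookie XYZ123 has been seen three times on watersports sites and only once on a cooking site, so it qualifies for the watersports segment only. A cookie the network has never seen simply has no segments.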

Of course, hybrids of the two models above also exist: large publishers will sometimes hand over some of their inventory to a self-serve network, in particular, in which case the publisher's ad server calls the network ad server, which serves the ad itself.

This picture also becomes more complicated when you consider that many ad networks will pass the ad request on to another ad network if they themselves can't fulfil it (or fulfil it economically). So, for example, a targeted ad network may receive an ad call from a user it has no information about. Rather than serve an ad for that user at a low cost (and thereby prevent that ad impression from being served to another user at a higher cost), the ad network passes the ad call on to a "value" (read: cheap) network. So in the picture above you can have two, or even three or four, ad networks passing the ad call around like a hot potato.
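The hand-off between networks can be sketched as a simple chain, in which each network either serves an ad or passes the call along. The network names, the decision functions and the hop limit below are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Network:
    name: str
    # Returns an ad to serve, or None to pass the ad call on to the next network.
    choose_ad: Callable[[str], Optional[str]]


def resolve_ad_call(cookie_id: str, chain: List[Network],
                    max_hops: int = 4) -> Tuple[Optional[str], List[str]]:
    """Walk the chain until some network serves an ad; record every hop taken."""
    hops = []
    for network in chain[:max_hops]:
        hops.append(network.name)
        ad = network.choose_ad(cookie_id)
        if ad is not None:
            return ad, hops
    return None, hops


known_users = {"XYZ123": "surfboard ad"}
chain = [
    # The targeted network only serves users it has a profile for.
    Network("targeted", lambda cookie: known_users.get(cookie)),
    # The "value" network will serve anybody.
    Network("value", lambda cookie: "cheap generic ad"),
]

print(resolve_ad_call("XYZ123", chain))
print(resolve_ad_call("ABC999", chain))
```

The known user gets an ad after one hop; the unknown user costs an extra hop before anything is served. Each additional hop is another round trip the user's browser has to make.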

This game of pass-the-parcel isn't really very good for the user, who has to wait a long time to see the ad (which really hurts the advertiser most, since a slow-loading ad might as well not render at all). It's also not great from a security point of view, because the publisher is ceding control of a portion of their site's screen real-estate to an unknown network and an even more unknown advertiser. This is why ad exchanges are emerging: they provide a centralized clearing-house for inventory, thus dispensing with the round-robin approach described above.

Online Advertising Business 101 - Index of all posts


July 06, 2008

Online advertising's dirty secret: Malvertising

There's been a lot of chatter recently about the "dark side" of online advertising, in particular, the activities of companies like NebuAd and Phorm using somewhat shady techniques to gather behavioral data about users and using this data to target ads. I've even blogged about it myself. And click fraud remains a significant challenge to confidence in online advertising.

But whilst the term "click fraud" generates about 25 million results on the world's best search engine, the term "malvertising" generates only 2,170. Since you may not be familiar with the term, I'll offer you the definition I found on urbandictionary.com (sadly, there's no Wikipedia entry for Malvertising):

Malvertising:
An Internet-based criminal method for the installation of unwanted or malicious software through the use of Internet advertising media networks and exchanges.

So Malvertising = malware + advertising. See? Clever (if ugly). But despite its goofy name and low profile, malvertising arguably represents a greater threat to the online advertising industry than either unscrupulous behavioral targeting or click fraud.

Malvertising can take a number of forms, typically along the following lines:

  • Ads that try to trick you into going to a site, where malware is installed (e.g. those "Your PC is infected! Click here to install our anti-virus software NOW!" ads)
  • Hijacking legitimate ad clicks and redirecting users to sites which encourage them to install malware
  • Malware disguised as ads, that exploit security vulnerabilities in web client software (such as this one in Adobe Flash), either to install further malware, or to scrape PII from the browser

The enormous reach of modern ad networks, plus the ability to place malicious code on thousands of otherwise innocent sites, makes distributing malware via advertising networks a very attractive proposition.

The malware itself is usually focused on stealing users' personal data (e.g. login details for broker accounts), taking control of the user's machine for distributed denial-of-service attacks (turning it into a zombie), or convincing the user to spend their own money buying malware "removal" software after they have been "infected".

But it's not just the end user that suffers. The publisher who has unwittingly hosted the malvertising can find themselves besieged by angry users demanding to know why they've been served malware from their site. If the ad was served via an ad network, the publisher will possibly cancel their contract, depriving the ad network of their business (ESPN has already ditched ad networks altogether, although not ostensibly for this reason). And advertisers who want to use increasingly sophisticated ads with high levels of interaction may find that they are unable to because these ads are some of the ones most likely to contain malware, and so are blocked by the ad networks and publishers the advertiser wants to deal with.

Furthermore, if end users lose confidence in the ads they're being shown, either in terms of where a click will lead, or whether the ad itself is malicious, this will drive down ad clicks and drive up the installation of ad blocking software - both of which will have a disastrous effect on the industry.

 

What can be done?

The malvertising problem is not insoluble, but it will demand a concerted effort from all industry participants to fix (or, at least, contain) it. I'll blog about these topics again in more detail, but the main areas of attention will need to be:

Creative/URL scanning: Ad networks and third-party ad servers will need to start scanning creatives and destination URLs as a matter of course. The technical challenge of scanning Flash or Silverlight-based creatives is considerable, since malicious ads will take steps to cover their tracks, such as obfuscating code, and behaving normally if they detect they're being scanned. Ultimately, the co-operation of Adobe and Microsoft may be required to put in place more robust systems for determining an ad's provenance.

URL scanning is a more manageable problem - all ad networks should ensure that ad click destinations do not lead to sites which are known to host malware.
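A minimal sketch of that kind of destination check might look like the following. It assumes the network has some list of known-bad hosts to check against (where that list comes from is outside the scope of this sketch), and the function name and example hosts are made up.

```python
from urllib.parse import urlsplit


def is_blocked_destination(click_url: str, blocked_hosts: set) -> bool:
    """True if the click URL's host, or any parent domain of it, is on the blocklist."""
    host = (urlsplit(click_url).hostname or "").lower()
    labels = host.split(".")
    # For "a.b.example" check "a.b.example", "b.example" and "example".
    candidates = {".".join(labels[i:]) for i in range(len(labels))}
    return not candidates.isdisjoint(blocked_hosts)


# Hypothetical blocklist; a real one would come from a maintained malware feed.
blocked = {"malware.example", "fake-antivirus.example"}

print(is_blocked_destination("http://downloads.malware.example/scan.exe", blocked))
print(is_blocked_destination("http://www.advertiser.example/landing", blocked))
```

Checking parent domains as well as the exact host means that a bad actor can't dodge the list simply by spinning up a new subdomain. It does nothing, of course, about destinations that redirect to a bad site after the check has been made.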

Creative template quality: Malware has been known to sneak into ads through sloppy management of creative templates - if an agency uses an infected template, then of course all ads created using that template will be infected. This problem will grow as larger numbers of smaller advertisers start to use online services which provide Flash templates that are customized to order - the advertisers will not have the technical sophistication to determine whether the resulting ads are safe or not. Some kind of 'quality seal' may be required for these services, though that will not stop bogus ones springing up.

Outlawing redirect-based tracking: At the moment, many ad networks use redirects to track ad clicks, meaning that a single ad click can be passed around many ad networks before the user is finally deposited at the advertiser site. This system is open to abuse via "click hijacking", where a bogus network sends some clicks for legitimate ads to malware sites. Publishers should inform ad networks that redirects for tracking are unacceptable, which will mitigate this problem.

Ad isolation: At the moment, an ad which is served with a page (rather than via an iframe) has access to that page's DOM, which means that if the ad is malicious, it can crawl the DOM, looking for user PII (such as usernames and passwords for the site the ad is on, or credit card details). Microsoft is working on some technology to isolate ads that are served on its network, so that even if they're served in a first-party context (i.e. not via an iframe or redirect), they are unable to access the page DOM. Other publishers & networks should consider doing the same.

Industry co-operation: Currently, very little specific information about malware is shared within the industry, partly for noble reasons (it can be difficult to be specific about a malware instance without revealing user PII) but mostly for ignoble ones (no ad network wants to advertise the fact that they've been subject to a malware attack). This must change - the industry needs to find a way to share this kind of data without an individual network or publisher having to step into the firing line.

 

As I said, I'll return to this subject with some more thoughts on some of the above issues. In the meantime, a great resource for information on malvertising is Spyware Sucks, a blog run by Microsoft MVP Sandi Hardmeier, who tirelessly chronicles various malvertising outbreaks. It makes for sobering reading.

