Dogfood

March 12, 2008

Phorm over function

There's been plenty of buzz (more of the angry hornet variety than the just-inhaled-a-lungful-of-dope variety) about Phorm of late, precipitated by a press release that the company put out on Feb 14 in the UK, announcing partnerships with three major UK ISPs to provide a system "...which ensures fewer irrelevant adverts and additional protection against malicious websites". Critics of the system (led by noted UK cage-rattler, The Register) claim that the technology is little more than spyware by another name. The negative press around Phorm's announcement has caused at least one of its ISP partners to back away from the deal, and has caused its stock to plummet by more than 30%. It looks like this could be the latest in an increasingly long line of bungled targeting announcements from the industry (Beacon, anyone?). But what went wrong?

What is Phorm?

Phorm as a company is the new name for 121Media, a UK AIM-listed company that started out producing a browser toolbar which tracked your page usage to provide a social media environment, connecting you with other people who were looking at the same page. Ad-funded, the toolbar quickly picked up a reputation for being spyware (even though I agree with Phorm's protestations that it was really adware, which is better, but still tarred with the same brush), so it was dropped and the company renamed Phorm.

The new service Phorm has launched is called Webwise (not to be confused with the BBC site of the same name). Essentially it is technology that ISPs install at their data centers which analyzes the URL and textual content of web pages being served and uses this information to place users into interest categories so that they can be served behaviorally-targeted ads. The technology does this by intercepting the page request and sending a copy of it to a "Profiling" server which extracts keywords and uses this information to assign users to interest groups:

[Image: Phorm slide showing the profiling process]

The same technology has a function to alert the user to phishing web sites; since the URL and content are being examined, phishing sites can be spotted and blocked. This functionality forms a core part of Webwise's value proposition to users.

The other part of the alleged value to users is that this profiling process does not permit the ISP to associate a user's profile with their IP address; that means that the ISP (and any government agency that subpoenaed the ISP's records) could not re-associate the Phorm data with a customer record (ISPs can tell which IP address was assigned to which customer at a particular time). The Phorm system also does not store any of the page information or extracted keywords; once the interest "channel" has been arrived at, all the rest of the data is deleted.
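To make the profiling idea a bit more concrete, here's a minimal sketch in Python. To be clear, this is my own illustration and not Phorm's code: the channel names, the keyword lists, the `assign_channel` function and the `Profile` class are all hypothetical, and I'm assuming a simple keyword count stands in for whatever the real "Profiling" server does. The point it shows is the claimed data-retention behavior: only the interest channel is kept against a random ID, and the page text and keywords are thrown away.

```python
import re
from collections import Counter

# Hypothetical keyword-to-channel map (an assumption for illustration only).
CHANNELS = {
    "travel": {"flight", "hotel", "holiday"},
    "autos": {"car", "dealership", "engine"},
}


def assign_channel(page_text):
    """Return the best-matching channel name, or None if nothing matches."""
    words = Counter(re.findall(r"[a-z]+", page_text.lower()))
    scores = {
        name: sum(words[keyword] for keyword in keywords)
        for name, keywords in CHANNELS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


class Profile:
    """Holds only a random ID and a set of channels: no URLs, text or IP address."""

    def __init__(self, random_id):
        self.random_id = random_id
        self.channels = set()

    def observe(self, page_text):
        # The page text is used once to pick a channel and is never stored.
        channel = assign_channel(page_text)
        if channel is not None:
            self.channels.add(channel)


profile = Profile("abc123")
profile.observe("Cheap flight and hotel deals for your holiday")
print(profile.channels)
```

Whether the real system behaves this cleanly is exactly the kind of thing the unanswered technical questions are about.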

So Phorm claims that its system is a real step forward for user privacy on the Internet, whilst at the same time enabling advertisers to reach their audience more effectively. But the industry (and the public) haven't really seen it like this.

Why all the fuss?

Phorm's announcement was always bound to generate a certain amount of controversy, because it's in the sensitive area of behavioral profiling & targeting. But there has been a particularly virulent reaction in the UK, which, whilst started by sites like The Register, has now spread to the "mainstream" media.

Some of the reasons for the fuss are (comparatively) silly things - for example, the renaming of the company from 121Media, which has just made people nervous, especially given the previous company's adware history, or the fact that the company operates out of serviced offices in the UK and doesn't really have a physical address in the US.

A more serious blunder on Phorm's part is their failure to anticipate the scrutiny that this kind of system would be placed under. In this kind of environment, given the firm's history, absolute transparency is essential, and Phorm hasn't provided this. There are still unanswered technical questions about Phorm's system, such as how it manages the opt-out (does data still get collected, or not?), and there have been inconsistencies in the claims that Phorm has made about third-party privacy audits of their software.

Phorm has also made the mistake of launching prematurely, with many of their partnerships still only half-baked. At the moment there is no benefit to users being delivered, because none of the systems that Phorm has announced are actually live within ISPs, and so all the focus is on the downside. Phorm would have done much better to wait until the service was fully baked with at least one of their partners and they had some real users onboard who could testify to the increased relevance of ads and how comfortable they were with their privacy with Phorm, before making a big splash. The press release looks like the product of an over-zealous PR agency looking to ensure their monthly coverage targets were being hit. Well, they've certainly done that.

What can we learn?

The main problem here is a poorly thought-out balance of benefits against 'costs' in this offer. Phorm has claimed that this system protects user privacy, but it doesn't really; it's just an ad targeting system with a better-than-average approach to protecting privacy. Users who are opted into Phorm will still receive cookies and targeted ads from other ad networks, and their behavior will still be tracked by those other networks.

Apart from the phishing protection (which is already baked into IE7 and Firefox anyway, and turned on by default), there's nothing in the Phorm system which provides users with protection of their personal data across the Internet. The only way that Phorm's entry into this market can elevate user privacy overall is if other providers of targeted ads who are storing more data decide to pack up and go home - which I doubt will happen.

The furore also highlights the challenges of partnering with ISPs for this kind of service. Because ISPs are the gatekeepers of the Internet (and because, for many people, switching ISPs is a pain in the a**), users are very sensitive to any perceived exploitation of this relationship by the ISPs. In the UK, ISPs are some of the best-known Internet brands, but also some of the least liked. Ironically the cause of this dislike (poor customer service) is a direct result of the price war that has precipitated ISPs' interest in this kind of service, as they are receiving a cut of the revenues, of course.

Ultimately the tale makes clear how careful any company has to be in launching a service like this - the balance of benefits has to be clearly stacked in favor of the user. As Chris Williams of The Register said during an interview with Phorm's CEO, Kent Ertugrul:

"a big difference I see between what you're doing and what Google does is that people feel that they're getting a service from Google. I don't think people feel they'll be getting a service from you"

It will be interesting to see how the Phorm saga plays out. Perhaps one day it'll find its way onto an online marketing MBA module syllabus.

March 07, 2008

News, news, news...

Sigh. Blog post topics seem to be like buses - you wait ages for one to come along, and then three come along all at once. Actually, I've got four things to post about, but I'm going to leave two until after the weekend. Here are the other two. Funnily enough, they're related - both are about benchmark data.

1. Compete.com cashes in

Online traffic benchmarking service Compete.com has been bought by UK-based market research firm TNS (Taylor Nelson Sofres). This is a good result for the folks at Compete, who have been waging a four-way battle with Quantcast, Alexa, and Comscore. Funnily enough the deal isn't stellar, despite the significant attention that Compete (and benchmarking services in general) has been getting recently - it's only a guaranteed $75m, with another $75m payable on achievement of revenue targets. Compete Inc has accepted about $43m in investment since it started in 2000, so I guess the investors are pleased but not delighted.

The rest of TNS's business is pretty traditional market research stuff, so it'll be interesting to see how they integrate/exploit Compete's capabilities. Moving the footprint outside of the US seems like one obvious goal they may look to achieve in the not-too-distant future.

2. Google Analytics rolls out new data sharing feature

Logging onto Google Analytics this week, I was interested to see the new data sharing options that the product is making available:

[Image: Google Analytics data sharing options]

So the key option in the above list is #2 - allowing GA to share your data with its "benchmarking service", where data from sites in a similar industry will be aggregated together for benchmark reports, like the sample below:

[Image: sample Google Analytics benchmark report]

This is a smart thing for Google to do, as it provides an incentive for GA users to share their data by providing them with a solid benefit in return. It will be interesting to see how GA determines which industry a site is in; I guess they will mine the search index for those sites and use some behavioral targeting-type techniques to drop a site into a category based upon the words that appear on the site's pages. I have no idea how they'll categorize my site - they'll probably drop it into a "blogs" industry segment, since Google already knows that my site is a blog.

The other smart part of this move is to make it easy to turn off data sharing altogether. I presume that this means that no GA data will be used to inform decisions about, for example, keyword ranking in Adwords, though GA's terms of use are still a little vague on this point. As I was discussing with Brian Clifton a couple of weeks ago in London, our part of the web analytics industry (companies that offer services for free, and monetize the service indirectly) needs to be super-clear about how the data is going to be used.

November 06, 2007

What's in an ANID?

I promised some time ago that I'd post more information on the process by which we get the demographic data into Gatineau. As I mentioned before, this information comes ultimately from data that people provide when they sign up for a "Live ID" to access one of Microsoft's online services, such as Messenger or Hotmail (this ID was previously known as a "Passport" ID).

I also mentioned that we are careful (to say the least) to anonymize the data before we pass it over to Gatineau. The anonymization process relies on the creation of an intermediate "Anonymous ID", or ('cos we just love acronyms), the "ANID". But how does this process work? Well, my colleagues over at Microsoft's Trustworthy Computing Initiative have posted an excellent white paper which explains how the ANID works and where it fits into the overall schema of the IDs and cookies you'll get when you use our online properties. The paper's here (PDF format):

Privacy Protections in Microsoft’s Ad Serving System and the Process of “De-identification”

I have only one beef with this white paper, and that's its rather lengthy title. Read it for a clear view of how we go about protecting the privacy of our online users, whilst at the same time using behavioral and demographic data to add value to the advertising inventory that we sell on our network.

June 21, 2007

More on the Comscore cookie deletion study

A bit late to the punch with this one (as ever), but Comscore have published a white paper which contains more detail about the methodology behind their recent study which showed that both first- and third-party cookies are deleted more than we thought (or at least more than we hoped). It clears a few things up about their claims, and raises some interesting new questions.

Log-in vs 'passive' cookies

The Comscore study used 'passive' (non-login) cookies from Yahoo! for its first-party cookie. As per my previous post on this topic, the high deletion rate of this cookie (31% at least once a month, according to Comscore) reiterates the need to use login information (i.e. login cookies) for UU counting where possible.

Cookie 'flip-flop' vs cookie deletion

Buried in the numbers on page 8 of the report is a footnote about 'preserved' cookies. According to Comscore:

"Preserved designation includes PCs where two or more distinct cookie values were observed alternating throughout the observation period. Such oscillating patterns reflect the use of multiple browsers, or multiple accounts on a PC, and do not reflect reset events."

From the study data, 69.3% of first-party cookies were either constant for the month of the study, or 'preserved' according to the above definition. Another 16.1% were reset once (meaning that there were two distinct cookie values during the period). But in order to tell whether a cookie value change is due to deletion or multiple accounts, you have to have at least three cookie values: A, B, A. I'm going to call this a 'recurring' cookie value. So there's a good chance that a proportion of the "1 reset" group is actually recurring cookies where there was only an A, B pattern (i.e. the recurring cookie didn't come back again before the month was up).

The report isn't clear about whether the study stripped out recurring cookies from the reset counts in higher groups. For example, if they saw the following cookie values through the month:

A B C D

then that is clearly three resets. But if they saw the following values:

A B A D

then is that three resets, or did they strip out the extra A and call it two resets?

This is important because one of the banner headlines from the study is that 7.1% of computers contribute 36.3% of the unique cookie values, resulting in an average overstatement of UUs of 150%. If actually these numbers haven't had recurring cookies stripped out, then the overstatement wouldn't be that high.
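The two readings of the A B A D sequence above can be sketched in a few lines of Python. This is purely my own illustration of the ambiguity (Comscore doesn't publish its counting code), and both function names are made up: the first counts every change of cookie value as a reset, while the second only counts a change to a value that hasn't been seen on that PC before.

```python
def count_resets_naive(values):
    """Count every change of cookie value as a reset."""
    return sum(1 for prev, cur in zip(values, values[1:]) if prev != cur)


def count_resets_ignoring_recurrence(values):
    """Count only changes to a cookie value not previously seen on this PC."""
    if not values:
        return 0
    seen = {values[0]}
    resets = 0
    for prev, cur in zip(values, values[1:]):
        if cur != prev and cur not in seen:
            resets += 1
        seen.add(cur)
    return resets


for sequence in (list("ABCD"), list("ABAD"), list("AB")):
    print(
        "".join(sequence),
        count_resets_naive(sequence),
        count_resets_ignoring_recurrence(sequence),
    )
```

Note that the two functions agree on A B C D and on a bare A B pattern, which is the problem with the "1 reset" group: with only two values observed, neither approach can tell a deletion from a second browser or account.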

Cookie awareness

One interesting aspect of the study which wasn't in the original press release was the results of some survey questions that Comscore asked. The most interesting one was "Do you know the difference between a first-party and third-party cookie?" An astonishing 29.8% said they did - I'm not sure even that many people here know the difference. Only 4.2% claimed to selectively delete third-party (but not first-party) cookies, however, which is unsurprising (or surprisingly high), since it's basically impossible to distinguish between first- and third-party cookies once they're on your system, unless you keep a database of the sites which are known to issue third-party cookies.

Overall, the study makes for interesting reading, and seems to have been undertaken with some care. However, towards the end Comscore throws in a bunch of other reasons (rotating IP address, accidentally including international numbers when comparing with domestic panel data) which also inflate server-based counting methods. The addition of this extra material simply has the effect (for me, at least) of making the report seem even more self-serving - it seems disingenuous to release a paper (written in however scholarly a fashion) which basically just bashes server-side measurement when Comscore's motives are so easy to see. It contributes to the debate, and I welcome it, but it leaves a nasty taste in my mouth. Not something that can be said for the splendid Father's Day gift that I received on Sunday (my family know me too well).

April 17, 2007

Cookies are evil! Burn them!

There's a lot of chatter on the wires here (ooh, I make it sound so glamorous and newsroom-y - it would be more accurate to say there are a lot of e-mails going back and forth) about Comscore's press release about cookie deletion. It makes for somewhat alarming reading - according to the report, 31% of Internet users delete their first-party cookies at least once a month, with 7% deleting them more than four times. Comscore estimates that this means that a cookie-based count of unique users would be overstated by a factor of 2.5, or 150%.
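In case the "factor of 2.5, or 150%" arithmetic looks odd, here's the relationship spelled out in Python. The function name and the 250/100 figures are just mine for illustration; the only thing taken from Comscore is the 2.5 factor.

```python
def overstatement(cookie_count, true_users):
    """Return (factor, percentage overstatement) of a cookie-based UU count."""
    factor = cookie_count / true_users
    return factor, (factor - 1) * 100


# 250 distinct cookies seen for 100 real users: 2.5x, i.e. 150% too high.
print(overstatement(250, 100))
```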

Of course, Comscore is hardly likely to come out with a piece of research that provides a glowing endorsement of cookies, since their measurement methodology - panels (or, more accurately, sampling using a piece of client software that users install on their machines) - competes directly with regular web analytics solutions, which rely on cookies for user counts and persistence. But is this study as alarming as it seems?

One thing that caught my eye about the study is that the first-party counts excluded log-in cookies. I'm not quite sure what they meant by this (i.e. whether those were just session cookies), but a lot of sites' first-party cookies are login cookies. So the first-party cookies measured were 'non-essential' cookies; perhaps much more likely to be deleted.

Furthermore, if your site is issuing log-in cookies (assuming they're persistent), you can use these cookies to generate UU numbers, even if not all users have them (assuming your web analytics solution is sophisticated enough to do this). The great thing about a log-in cookie is that, even if the user clears their cookie, when they come back and log in again, their log-in cookie looks identical, even if it's not the same actual cookie. So you can have users delete their cookies 10 times a month with no problems, as long as the new cookie you give them looks the same as the old one.
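Here's a rough Python sketch of what that kind of UU counting might look like. It's an illustration under my own assumptions, not how any particular web analytics product does it: I'm assuming each hit arrives as a `(cookie_id, login_id)` pair, where `login_id` is the stable value carried by the log-in cookie and is `None` when the user isn't logged in. The function name is hypothetical.

```python
def count_unique_users(hits):
    """Count UUs, preferring the stable log-in value over the 'junk' cookie.

    hits: list of (cookie_id, login_id) tuples; login_id may be None.
    """
    # First pass: learn which junk cookies have ever been seen with a log-in.
    cookie_to_login = {}
    for cookie_id, login_id in hits:
        if login_id is not None:
            cookie_to_login[cookie_id] = login_id

    # Second pass: identify each hit by log-in where we can, cookie otherwise.
    identities = set()
    for cookie_id, login_id in hits:
        known_login = login_id or cookie_to_login.get(cookie_id)
        if known_login:
            identities.add(("login", known_login))
        else:
            identities.add(("cookie", cookie_id))
    return len(identities)


hits = [
    ("c1", "alice"),  # alice logs in
    ("c2", "alice"),  # alice cleared her cookies, then logged in again
    ("c2", None),     # same browser, not logged in this time
    ("c3", None),     # an anonymous visitor
]
print(len({cookie for cookie, _ in hits}), count_unique_users(hits))
```

A plain cookie count sees three users in this example; tying cookies back to the log-in brings it down to two.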

Which leads me onto the point I made in my previous post - sites need to work with their web analytics vendor to implement strategies to limit the impact of cookie deletion on their numbers. It's still very common to encounter a site which is issuing a high-quality cookie associated with a log-in, but is using a 'junk' first- (or even third-)party cookie for sessionizing, UU counts and persistence.

The other point I'd make is that debates about the absolute accuracy of web analytics have been raging for as long as the industry has existed, and the main answer to this kind of thing remains the same today as it always has - don't rely on your web analytics solution for absolute numbers. Instead, focus on trends and comparisons - which have a much higher chance of being accurate if your measurement technique is consistent across your audience and over time. If you want absolute numbers, use a panel.

March 22, 2007

How much are you worth?

I came across a very interesting article in last week's Economist Technology Quarterly the other day (which I was reading a week late, thanks to the efficiencies of the US Postal Service). The article mentioned a couple of sites such as AttentionTrust and Agloco which have sprung up to help users take ownership of their own online behavior data and sell this data to advertisers who want to target them with ads.

Both sites use a browser plug-in which captures browsing behavior and stores it online where it can be aggregated and sold on to advertisers. It's an interesting idea; since the user generates the valuable data about their own preferences, it seems fair that they should get a cut of the advertising revenues generated by this information (according to Agloco, up to 90%).

The only problem is that these users are already getting something for nothing - content. In the current model of ad-supported websites, publishers take money from advertisers who want to reach their readers, and use this money to pay for web hosting, design, maintenance, content authoring, editing and all the other myriad expenses associated with publishing on the web. As a "thank you" to their users, they offer their content for free (ironically, even the Economist is doing this now).

But if users start taking a big piece of the revenue pie just for the privilege of making their eyes available to be presented with ads, ad-supported publisher business models could collapse. The only way out of this bind is if these "attention" networks can take so much of the weight and expense of managing user profiles off the publishers that they (the publishers) can afford to give away such a big chunk of the ad revenues to the users themselves. And it will be a long time before a sufficiently large number of users are in such networks to make it worthwhile for publishers to abandon their own behavioral targeting efforts. And as a publisher I'm not sure I would want to have to deal with multiple attention networks - so consolidation around a single (ideally not-for-profit) network seems like another pre-requisite.

But the development is interesting, nevertheless. At the very least, Agloco's claimed 10 million users is testament to the fact that users are becoming much more savvy about their personal information and even their browsing behavior, and are looking to monetize themselves (what a great phrase that is: "Honey, I'm off to monetize myself for the day. I'll be back around 6.30"). How much are you worth?

February 15, 2007

One bad apple

A colleague brought to my attention the dubious practices of LogStats.de, a German provider of free web analytics. LogStats is a typical teeny-tiny provider of free web stats, using a JavaScript-based tag for data collection. Free web stats is a pretty thin business to be in these days, what with behemoths like Google and us charging about (or about to charge about, in our case) in the market - so how does LogStats pay the bills?

It turns out that the HTML code segment that LogStats distributes contains a little something extra. Can you spot what it is in the code below? (thanks to Google Blogoscoped for this code):

<!-- Logstats Counter Code -->
<script language="JavaScript" type="text/javascript" src="http://www.logstats.de/pphlogger.js.php?id=...">
</script>
<noscript>
<img src="http://www.logstats.de/pphlogger.php?id=...">
<a href="http://www.artelight.de">Leuchten</a>
</noscript>
<!-- Logstats Counter Code -->

Don't see anything unusual? Go to the back of the class. What, precisely, is that link on the word "Leuchten" (German for "Lamps") doing in the <noscript> section? Well, the website linked to - Artelight.de - is owned by the same guy, Marcin Nolte, who owns LogStats.de. So everyone who implements this tag code is giving Artelight a free link - on every page.
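If you'd rather not rely on eyeballing every tag a vendor hands you, a check like this is easy to automate. The sketch below is my own, using Python's standard `html.parser` module; the `NoscriptLinkFinder` class and `find_noscript_links` function are hypothetical names, and it assumes the parser's default behavior of parsing `<noscript>` content as ordinary markup. It simply lists any links hiding inside a `<noscript>` section.

```python
from html.parser import HTMLParser


class NoscriptLinkFinder(HTMLParser):
    """Collects the href of every <a> tag found inside a <noscript> element."""

    def __init__(self):
        super().__init__()
        self.in_noscript = False
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "noscript":
            self.in_noscript = True
        elif tag == "a" and self.in_noscript:
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "noscript":
            self.in_noscript = False


def find_noscript_links(tag_code):
    finder = NoscriptLinkFinder()
    finder.feed(tag_code)
    finder.close()
    return finder.links


TAG_CODE = """<!-- Logstats Counter Code -->
<script language="JavaScript" type="text/javascript" src="http://www.logstats.de/pphlogger.js.php?id=...">
</script>
<noscript>
<img src="http://www.logstats.de/pphlogger.php?id=...">
<a href="http://www.artelight.de">Leuchten</a>
</noscript>
<!-- Logstats Counter Code -->"""

print(find_noscript_links(TAG_CODE))
```

A legitimate tracking tag has little reason to put a link to an unrelated site in there, so anything this turns up deserves a second look.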

That's going to be pretty good for Artelight's Google rankings, and indeed they rank #1 in Germany for the terms "Leuchten" and "Lampen" (another word for "Lamps"). Logstats claims to have about 9,500 customers, so that's a lot of back-links. But it's pretty sneaky.

You could argue that Logstats/Artelight are doing nothing more evil than gaming Google's page rank algorithm, and all power to them. After all, apart from consuming a tiny amount of extra bandwidth on their clients' sites, neither their clients nor their customers are coming to any harm whatsoever. And you could argue that these companies need to get something back for providing a free web analytics package.

But in an era when web analytics and online marketing are viewed with considerable suspicion, this kind of behavior is unhelpful, to say the least. There are rumors that other small web analytics firms are engaged in this practice, too, which is also rather worrying (the only one I've been able to confirm is blogcounter.de which seems to do something similar). The problem with this kind of thing is that it is grist to the mill for anyone who wants to throw mud at the online marketing and web analytics industries and paint them as enemies of privacy. One bad apple spoils the whole damned barrel.

[Thanks again to Google Blogoscoped for much of the detail of this post]

January 22, 2007

The deleting-your-Google-cookies industry

I'm always amazed by the economic niches that grow up around the periphery of big companies and industries. It's a great demonstration of the Darwinian roots of capitalism. So I was delighted to discover (in a purely academic sense, of course) via GoogleWatch that a little industry has grown up around the business of managing (and deleting, if you want to) your Google cookies.

Of course, the effects of anti-spyware programs such as Adsgone on third-party cookies have been understood for some time, but this more recent development of utilities that specifically target Google is interesting - and more than a little worrying for those of us who use cookies for very similar purposes.

The reasons that Google (and Microsoft, and Yahoo!) set persistent cookies are broadly two-fold:

  1. To make it easier for you to log in the next time you come back to the site
  2. To recognize you the next time you come back, even if you don't log in

Of these, no. 2 is the most important for the search engine; if you can start building up a profile of people's search (and other) behavior, and tie this to some registration information that they may have provided, you gain the ability to offer much more targeted advertising to that person.

So, for example, perhaps I spend a day online searching for all things Chrysler-related - Chrysler dealerships, Chrysler reviews, etc. Then, a month later, I come back and search for "Auto repair shop Seattle". It might be useful if the first paid results shown were for auto shops which specialized in Chrysler cars, wouldn't it? The auto shop in question would probably pay a little more to get to the top of the results in this situation - and anything that drives up the price of ads is good - good for Google, good for us, good for Yahoo!.

Of course, this sort of second-guessing of people's preferences makes people nervous - what else is Google keeping about me? Hence the deleting-your-Google-cookies industry, and things like the recent FTC complaint against Microsoft (seems a little harsh to single us out, but I guess that's what you get for being a huge and not-particularly-loved target). But people need to remember that it's advertising revenues that fund the cool stuff they get for free; including Gatineau.

So there's a balance to be struck, and a lot of education still to do. And we need to be at the forefront of that education process, or this time next year I'll be blogging about the deleting-your-Microsoft-cookies industry.

October 27, 2006

Who moved my cookies?

As you probably know, Internet Explorer 7 is out in the wild. It's a big improvement over IE6, but I am a wee bit disappointed that it doesn't offer a bit more finesse in the cookie management department. There's a new feature to delete all your cookies (separately from other temporary files), but you can't delete the cookies from a specific website. I suppose the thinking was that this kind of functionality is more the kind of thing that only developers need (I came across this whilst trying to delete cookies so that I could test something relating to my wife's e-commerce site, mirror mirror - I wasn't being a 'normal' user), but it would have been nice to have something to compare with Firefox's functionality in this area.

What this means is that power users will be forced (as they were with IE6) to go to the Temporary Internet Files folder to find the cookies they want to delete. As with IE6, there's a button on the General tab of the Internet Options dialog to open up the relevant folder. But if you're running IE7 on Vista, there's a surprise in store. The folder that IE opens up for you is a variant on the following (depending on who you're logged in as):

C:\Users\[username]\AppData\Local\Microsoft\Windows\Temporary Internet Files

Notice that this is a different file path from the normal C:\Documents and Settings\... path that things are found in on Win XP. But that's not the problem - the problem is that your cookies are actually stored in the following folder:

C:\Users\[username]\AppData\Roaming\Microsoft\Windows\Cookies

So, you go hunting in the Temporary Internet Files folder (which, at least on my system, still contains some cookie files) and delete the cookies you want, only to discover that IE still has the cookie. In fact, it's a little more confusing even than that, because if you're running IE in protected mode (the default) on Vista, the cookie information is written to:

C:\Users\[username]\AppData\Roaming\Microsoft\Windows\Cookies\Low
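If you're writing a little script to help someone hunt down the right folder, the logic boils down to something like the Python sketch below. It only builds the paths described above (it doesn't touch the file system), the function name is my own invention, and it assumes the default C:\Users profile location.

```python
from pathlib import PureWindowsPath


def ie7_vista_cookie_dirs(username, protected_mode=True):
    """Return the folders to check for IE7 cookies on Vista, most likely first.

    Assumes user profiles live under C:\\Users, as in the paths quoted above.
    """
    cookies = (
        PureWindowsPath("C:/Users") / username
        / "AppData" / "Roaming" / "Microsoft" / "Windows" / "Cookies"
    )
    # In protected mode (the default) cookies are written to the Low subfolder.
    return [cookies / "Low", cookies] if protected_mode else [cookies]


for folder in ie7_vista_cookie_dirs("[username]"):
    print(folder)
```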

I feel like a bit of a heel criticizing IE 7 - my colleagues on the IE team have done a great job bringing it to the market, and you can really sense the excitement on their blog. But if you want to be able to delete cookies individually (or advise someone else how to), this information may be of use to you. Alternatively, download one of the add-ons for cookie removal from the IE Add-ons site. But be aware that some of these need updating to find the right cookies folder, too.
