The joys of cross-domain tracking


One of the dirtier little secrets of web analytics is tracking behavior across multiple domains. I got asked a question about this by a colleague today, so I thought I’d blog about it (the blogmaw gets fed for another day – hurrah).

The problem

Here’s the problem in a nutshell: to track a user’s behavior across two (or more) domains, you need a unique ID attached to that user’s page requests which is the same for requests on all the domains. There are a number of ways of doing this:

  1. Use the user’s IP address (or IP+User Agent) as the ID
  2. Store the ID in a third-party cookie
  3. Store the ID in a set of first-party cookies, one for each domain

Now, as most of you will no doubt know, using IP address for user tracking is pretty weak – many ISPs change the effective IP address of a user from click to click (because they dynamically route requests through a bank of proxies), whilst many users behind a corporate firewall will share the same apparent IP address. So one user can look like several, and several users can look like one. Let’s forget about that option right away. We need to use a cookie.

Cookies & domains

So that leaves cookies. What you need to know here is that cookies are tied to the sites (domains) that issue them. So if I get a cookie from a particular site (say, abc.com), any code running on that domain (e.g. web analytics tag JS code) can read the cookie; but as soon as I move to a new domain, the abc.com cookie becomes invisible.

Here’s an example. I have an identical piece of JavaScript code on two sites, abc.com and xyz.com, which execute the following logic when they run (this is very typical of the logic you’ll find in web analytics tag code):

if (cookie called “ID” exists) then
    record ID value from the cookie called “ID”
else
    set new cookie called “ID” with a random ID
    record ID value from the cookie called “ID”

When a new user comes to abc.com, they’ll get issued with a cookie called “ID” which contains a new random ID value (say, 12345). They click around the site, each time running this piece of code, which on subsequent runs doesn’t set a new cookie (because one already exists), but just records the “12345” ID value. If I analyze this data I’d see a whole clickstream from user #12345.

If the user then moves over onto xyz.com, the exact same code will not see the abc.com cookie, and so will issue its own “ID” cookie, with a new ID value in it (say, 56789). This value will be captured alongside every click on the xyz.com domain.

If I now analyze the clickstream data from both sites together, it’ll look like there were two users – #12345 (on abc.com) and #56789 (on xyz.com). But really, those users are the same person. So how to get round this? We have two options: use a third-party cookie, or use ‘cookie handover’.
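
To make the scenario concrete, here is a minimal sketch of that tag logic in JavaScript. It is a simulation rather than real tag code: each domain’s cookies are modeled as a plain `Map` (a real tag would read and write `document.cookie`), and the names `jarFor`, `newRandomId` and `getOrCreateId` are made up for illustration.

```javascript
// Simulated browser cookie storage: one jar (Map) per domain.
// Assumption: a real tag would use document.cookie instead.
const jars = new Map();

function jarFor(domain) {
  if (!jars.has(domain)) jars.set(domain, new Map());
  return jars.get(domain);
}

// Hypothetical ID generator; any sufficiently random value would do.
function newRandomId() {
  return Math.random().toString(36).slice(2, 12);
}

// The tag logic: reuse the "ID" cookie if present, otherwise issue one.
function getOrCreateId(domain) {
  const jar = jarFor(domain);
  if (!jar.has("ID")) jar.set("ID", newRandomId());
  return jar.get("ID");
}

const firstAbc = getOrCreateId("abc.com");
const secondAbc = getOrCreateId("abc.com"); // same user, next click
const firstXyz = getOrCreateId("xyz.com");  // cannot see the abc.com jar

console.log(firstAbc === secondAbc); // true: one clickstream on abc.com
console.log(firstAbc === firstXyz);  // false, barring a random collision
```

The same function runs on both domains, but because each domain has its own jar, the one visitor ends up with two unrelated IDs.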

Third-party cookies

Until cookie churn became a problem, using a third-party cookie was by far the easiest and most effective option for cross-domain tracking – that’s why many web analytics firms adopted it.

In this solution, a ‘third-party’ website issues the cookie on behalf of the two domains. Typically this third-party site is the web analytics provider’s own server(s) – this is also the way that ad servers work. So now when the user goes to abc.com, the tag there calls (say) wafirm.com, and the cookie they get comes from wafirm.com. When they go to xyz.com, the tag there calls wafirm.com too, so the browser sends the existing wafirm.com cookie along with the request; nothing looks for a cookie on xyz.com. Since a cookie already exists, the ID in it is logged, rather than a new cookie (and ID) being issued.

So by tracking the user’s behavior against the third-party cookie, you can join the activity on the two sites together into a single session. Hurrah!
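
As a rough illustration, the sketch below simulates this in JavaScript. It assumes a browser that stores cookies per issuing domain and attaches the tracker’s cookie to every request made to the tracker, whichever page triggered it. The function `trackPageView`, the `cookieJars` store and the counter-based ID are invented for the example, and wafirm.com is the stand-in tracker domain used above.

```javascript
// Simulated browser: cookies are stored per issuing domain.
const cookieJars = new Map();
let nextId = 12345; // deterministic stand-in for a random ID

// Simulates the tag on `pageDomain` calling the tracker at `trackerDomain`.
// The cookie is looked up in the tracker's jar, not the page's jar.
function trackPageView(pageDomain, trackerDomain, log) {
  if (!cookieJars.has(trackerDomain)) cookieJars.set(trackerDomain, new Map());
  const jar = cookieJars.get(trackerDomain);
  if (!jar.has("ID")) jar.set("ID", String(nextId++));
  log.push({ page: pageDomain, id: jar.get("ID") });
}

const log = [];
trackPageView("abc.com", "wafirm.com", log);
trackPageView("xyz.com", "wafirm.com", log);

// Both log entries carry the ID "12345", so the two visits can be joined.
console.log(log);
```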

The problem with this is that it’s invisible to the user that information about them (however anonymous) is being sent to a third-party website. Vendors of ‘anti-spyware’ software have deemed this kind of behavior to be ‘spyware’, and have added functionality to their products to remove third-party cookies automatically. (As an interesting aside, a cookie stored on your computer carries nothing that marks it as first- or third-party; the anti-spyware software just looks for cookies from known third-party issuing domains, such as webtrends.com, and deletes them.) The spread of anti-spyware software has meant that many users’ third-party cookies now get deleted automatically on a frequent basis.

First-party cookies and cookie handover

So now the most widely implemented solution to this problem is to use first-party cookies for user tracking, but to add code to the site which ‘hands over’ the cookie value to the other domain. You can do this because it’s not the cookie you need on the second domain, just the ID value from inside it. So you can have two cookies, fine, but they just need to have the same ID value inside.

The actual ‘handover’ is achieved by inserting the unique ID value as a special parameter into the ‘landing’ URL on the second domain, and then deploying tracking code on this domain that will look for that parameter and use the ID value within it to set the value of the tracking cookie rather than setting a new random value.
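
The two halves of the handover can be sketched with the standard `URL` API. This is only an illustration: the helper names `addIdToLink` and `readIdFromUrl` are made up, and the parameter is assumed to be called “ID” to match the cookie name used in this post.

```javascript
// Sending side: append the visitor's ID to a link pointing at the other domain.
// Assumption: the parameter is named "ID", matching the cookie name.
function addIdToLink(href, id) {
  const url = new URL(href);
  url.searchParams.set("ID", id);
  return url.toString();
}

// Receiving side: pull the ID back out of the landing URL.
// Returns null when the parameter is absent.
function readIdFromUrl(href) {
  return new URL(href).searchParams.get("ID");
}

const landing = addIdToLink("https://xyz.com/checkout?step=1", "12345");
console.log(landing);                // https://xyz.com/checkout?step=1&ID=12345
console.log(readIdFromUrl(landing)); // 12345
```

Using `searchParams.set` rather than string concatenation keeps any existing query parameters intact and takes care of URL encoding.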

The solution’s not perfect, because you have to recode every link between the domains to include the tracking ID. This isn’t feasible when you have two domains with lots of pages and lots of links between them, but it is useful when the links between the domains are few in number and sit within a structured process.

The best example of this is an e-commerce site which uses a third-party shopping cart or payment engine (for example, mirrormirror, which uses the Protx engine). In most cases, there are only one or two links from the main site to the payment engine (at the end of the purchase process), so it’s feasible to add in the ID information to these links. Even some quite large e-commerce sites use third-party payment providers, so this is a useful technique.

The steps to implementing this solution, then, are:

  1. Re-write all links from domain 1 to domain 2 (and back, if users are likely to start their visits at site 2) to include the value of the “ID” cookie. This requires the use of JavaScript.
  2. Implement tracking code with the following logic on both domains:

if (URL contains a parameter called “ID”) then
    set cookie called “ID” to the ID value from the “ID” parameter in the URL
else
    if (no cookie called “ID” exists) then
        set new cookie called “ID” with a random ID
record ID value from the cookie called “ID”

Note that this code overwrites the cookie value if there’s an “ID” parameter in the URL, even if there’s a pre-existing cookie. This logical flow is open to debate, but I’ve included it this way round because it takes care of the situation where someone arrives at both domain 1 and domain 2 independently (and has an “ID” cookie from each as a result) and then clicks through from one domain to the other. The way this logic is structured, their ID cookies will be synchronized.

This has the downside that whichever domain’s ID gets nuked in the process, the user’s earlier behavior on that domain now looks like the activity of a different user. But I think that’s a better outcome than not synchronizing the IDs when you get the chance.
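
Here is a minimal JavaScript sketch of that logic, including the synchronization case. As before, it is an illustration under assumptions: `cookies` is a `Map` standing in for the current domain’s cookie jar (a real tag would use `document.cookie`), and `resolveId` and `makeId` are hypothetical names.

```javascript
// Resolves the visitor ID for a page view on the current domain.
// `cookies` is a Map standing in for this domain's cookie jar.
// `makeId` is a caller-supplied function that returns a new random ID.
function resolveId(pageUrl, cookies, makeId) {
  const handedOver = new URL(pageUrl).searchParams.get("ID");
  if (handedOver) {
    // The URL parameter wins, even over an existing cookie.
    cookies.set("ID", handedOver);
  } else if (!cookies.has("ID")) {
    cookies.set("ID", makeId());
  }
  return cookies.get("ID");
}

// The user has visited both domains independently, so each jar has its own ID.
const abcCookies = new Map([["ID", "12345"]]);
const xyzCookies = new Map([["ID", "56789"]]);

// They now follow a link from abc.com to xyz.com that carries the abc.com ID.
const id = resolveId("https://xyz.com/cart?ID=12345", xyzCookies, () => "unused");

console.log(id);                   // 12345
console.log(xyzCookies.get("ID")); // 12345 (the old 56789 has been overwritten)
```

After the click-through both jars hold 12345, which is the synchronization described above; the cost is that the earlier 56789 activity on xyz.com stays orphaned.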

If you want a real-world example of code that does this, you can find it in the Google Analytics help.

2 thoughts on “The joys of cross-domain tracking”

  1. Glad to see you post about multi-site tracking techniques since I also ran into similar concerns a few months ago. One technique that you might be missing is using a “friendly 3rd party cookie”. You don’t want your cookie to come directly from your web analytics vendor because of anti-spyware tools, so you would typically use a 1st party cookie. Instead, you could create a DNS CNAME (an alias) of the 3rd party vendor tracking site and use your own domain name of choice. No playing around with URLs, no risks of missing code, no anti-spyware concerns. Check with your vendor to make sure they handle/allow this to work (at least Omniture works this way).
    I would be glad to hear your feedback about my post at:
    http://shamel.blogspot.com/2006/09/web-analytics-implementation-challenge.html

  2. Stephane,
    That’s true – I had forgotten that option. It’s an important finesse on the third-party cookie option. However, it does rely on the web analytics vendor supporting it, and, more importantly, on the site owner’s ability to create a CNAME DNS alias for the cookie to be issued from, which isn’t always possible.
    A further wrinkle with this is if the site (or part of it) uses SSL; in this case, any third-party cookie has to be issued by a server that also has an SSL certificate (otherwise you get that annoying ‘mix of secure and insecure content’ message in the browser); if you create a new CNAME for the cookie, you have to get a certificate, which can cost around $1,000. Since the majority of cross-domain tracking for smaller sites is to capture activity on cart pages (which are usually protected by SSL), this is a real problem.
    Ian
