An MVT buyer's guide - part 1 - Lies, Damned Lies...

I’m on the flight back from OMMA New York, where I’ve been speaking on a panel about the resurgence of e-mail as a performance-based marketing channel. It really has been a flying visit – I arrived yesterday (Monday) at 8pm and was back in a taxi to JFK at 3pm today. But at least 5 hours on a plane gives me the chance to write a blog post.

One of the things I did get to do in New York was go for a very pleasant lunch with the equally pleasant Bob Bergquist of Widemile. Widemile is an up-and-coming MVT vendor based in Seattle (so it was somewhat amusing that I had to travel 2,500 miles to catch up with Bob), and we had a lively conversation about the state of the market, particularly in the light of Offermatica’s recent acquisition. I surprised myself a little (though perhaps not you) by having quite a lot of opinions about the merits and demerits of the various players, so I thought I’d wrap some of them up into this post as a kind of MVT buyer’s guide. I’m not going to comment on specific players; instead I’ll offer some characteristics of MVT tools that you should look for if you’re planning to invest in one.

This is too much content for a single blog post, so I’m going to break it up into a number of parts. I’ll update this post with links to the other parts when I’m done.

MV-what?

Before we dive in, though, here’s a very quick primer on the essentials of MVT. MVT (multivariate testing) is a process in which you vary multiple elements of something (e.g., a web page) and measure how effective each combination is at achieving some goal (for example, a click, or a subsequent conversion). Its appeal is that you can test multiple ideas simultaneously and quickly optimize a page’s layout and content for your desired outcome. The different versions of each element are usually referred to as variants; the resulting page (containing one variant of each element under test) is called a treatment.

The tyranny of statistical significance

All quantitative tests (whether MVT tests of a web page or clinical trials of a new cancer drug) have to achieve statistical significance before you can base any decisions on the results. Statistical significance is a formal statistical term. At the conventional 95% level, it means that if there were really no difference between the things you are comparing, you would see a difference at least as large as the one you observed less than 5% of the time, purely by chance. That still means that roughly one in twenty tests of a change that makes no real difference will throw up a “significant” result that is just a fluke, which is a sobering thought.
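To make the 95% threshold concrete, here is a minimal sketch of one common way to check whether two conversion rates differ significantly: a two-sided, two-proportion z-test using the normal approximation. The conversion counts are hypothetical, and real MVT tools may use different (and more sophisticated) statistics.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates.

    Returns (z, p_value). Uses the pooled rate under the null hypothesis
    that both treatments convert equally well.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical data: 5.0% vs 5.6% conversion on 10,000 visitors each...
z1, p1 = two_proportion_z_test(500, 10_000, 560, 10_000)
# ...and exactly the same rates on twice as many visitors.
z2, p2 = two_proportion_z_test(1_000, 20_000, 1_120, 20_000)

for label, p in (("10k per group", p1), ("20k per group", p2)):
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"{label}: p = {p:.3f} ({verdict} at the 95% level)")
```

With these made-up numbers, the first comparison falls just short of significance, while the second, with identical conversion rates but twice the data, clears the bar comfortably.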

The friend of statistical significance is data: the more data points you have, the more reliable your results. In a web MVT test, the amount of data you have is a function of how busy your site is and how long you run the experiment. The less busy your site, the longer you’ll wait for results. It’s an immutable law, and you ignore it at your peril. Much of what differentiates MVT vendors is how they work around this constraint.
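How much data is “enough”? A rough sketch, assuming the standard normal-approximation sample-size formula for comparing two proportions, 95% significance and 80% power (the power figure and the conversion rates below are illustrative assumptions, not from any particular vendor):

```python
from math import ceil
from statistics import NormalDist


def visitors_per_group(baseline, target, alpha=0.05, power=0.8):
    """Approximate visitors needed in EACH group to detect a change in
    conversion rate from `baseline` to `target` with a two-sided test.

    Normal-approximation formula for comparing two proportions.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    variance = baseline * (1 - baseline) + target * (1 - target)
    return ceil((z_alpha + z_beta) ** 2 * variance / (target - baseline) ** 2)


# Hypothetical example: a 5% baseline conversion rate and shrinking lifts.
for target in (0.06, 0.055, 0.0525):
    n = visitors_per_group(0.05, target)
    print(f"5% -> {target:.2%}: about {n:,} visitors per group")
```

The key point is the denominator: because the required sample grows with the inverse square of the difference you want to detect, halving the lift you’re looking for roughly quadruples the traffic you need.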

Ok, you’re all primed. Let the unasked-for advice begin.

1. Experimental design

Ok, this perhaps isn’t the easiest MVT topic to kick off with, but I think it’s the most important, and so I wanted to get to it before you lost interest.

Imagine you’re running an MVT test on your homepage, and you’re testing three variants in each of four different parts of the page. The total number of possible page ‘treatments’ is 3 to the power of 4, or 81. Raise the number of variants per part to 4, and you’re dealing with 4^4, or 256 permutations. Raise it to 5 variants per part, and the permutation count goes up to 5^4, or 625.
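The arithmetic is simply the number of variants per area raised to the power of the number of areas. A small sketch that enumerates every treatment explicitly (variants are represented by hypothetical index numbers) confirms the counts, and shows how adding a fifth area instead pushes the total even higher:

```python
from itertools import product


def treatment_count(areas, variants_per_area):
    """Number of full-factorial treatments: variants ** areas."""
    return variants_per_area ** areas


def all_treatments(areas, variants_per_area):
    """Enumerate every treatment as a tuple of variant indices, one per area."""
    return list(product(range(variants_per_area), repeat=areas))


for areas, variants in ((4, 3), (4, 4), (4, 5), (5, 4)):
    treatments = all_treatments(areas, variants)
    assert len(treatments) == treatment_count(areas, variants)
    print(f"{areas} areas x {variants} variants each: {len(treatments):,} treatments")
```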

To test 625 treatments you have to break your audience into 625 groups, serve each treatment to its respective group, and measure the results. Unless your site is very busy, it’s going to take a long time to reach statistical significance across all the groups. And what if you want to find an optimized page for visitors coming from search versus those coming from e-mail or display ads? Now you have 625 x 3, or 1,875, groups that each need statistically significant results. You could be there until Christmas waiting for the result of the experiment to pop out.
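A back-of-the-envelope sketch shows why. It assumes traffic is split evenly across groups and uses two made-up figures: a requirement of 30,000 visitors per group and a site that gets 50,000 visitors a day.

```python
from math import ceil


def days_to_complete(groups, visitors_per_group, daily_visitors):
    """Days of traffic needed if every group must reach `visitors_per_group`.

    Assumes traffic is split evenly across groups and that every visitor
    counts toward exactly one group.
    """
    return ceil(groups * visitors_per_group / daily_visitors)


# Hypothetical figures: 30,000 visitors needed per group, 50,000 visitors a day.
PER_GROUP = 30_000
DAILY = 50_000
for groups in (81, 625, 625 * 3):
    print(f"{groups:,} groups: about {days_to_complete(groups, PER_GROUP, DAILY):,} days")
```

Under these assumptions, 81 groups take about seven weeks, 625 groups take about a year, and 1,875 groups take about three years, so Christmas may be optimistic.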

The answer to this problem is experimental design. This refers to the ability of the MVT tool to avoid testing every single possible treatment, either because you’ve explicitly said that certain combinations should be avoided (e.g. the headline that says “Get 30% off now!” together with the call-to-action button that says “Click here for 45% off!”), or (more cleverly) because the tool has the smarts to implement something called fractional factorial design. The alternative, testing every treatment, is called full factorial testing.
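The first, simpler approach (explicitly ruling out combinations) can be sketched as a filter over the full list of treatments. Only the conflicting headline and call-to-action pair come from the example above; the area names, the other variants and the rule format are hypothetical.

```python
from itertools import product

# Hypothetical page areas and variants; only the conflicting pair is from the text.
AREAS = {
    "headline": ["Get 30% off now!", "New season styles", "Free delivery"],
    "cta": ["Click here for 45% off!", "Shop now", "See the range"],
    "hero_image": ["model", "product", "lifestyle"],
    "layout": ["one column", "two columns", "grid"],
}

# Each rule returns True for a treatment that should never be served.
EXCLUSIONS = [
    lambda t: t["headline"] == "Get 30% off now!" and t["cta"] == "Click here for 45% off!",
]


def allowed_treatments(areas, exclusions):
    """Yield every full-factorial treatment that no exclusion rule rejects."""
    names = list(areas)
    for combo in product(*areas.values()):
        treatment = dict(zip(names, combo))
        if not any(rule(treatment) for rule in exclusions):
            yield treatment


treatments = list(allowed_treatments(AREAS, EXCLUSIONS))
print(len(treatments), "treatments to test instead of", 3 ** 4)
```

Ruling out one pair removes every treatment that contains it (here 9 of the 81), which helps, but only modestly; that is why the cleverer approach matters.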

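Fractional factorial experimentation is a clever mathematical trick: you test just a fraction (hence the name) of the possible treatments of a page, and then use statistical modeling to estimate the contribution of each variant and work out which treatment is likely to be the winner, even if it wasn’t explicitly tested. The trick relies on the assumption that most of the effect comes from the individual elements (and perhaps a few simple interactions between pairs of them), rather than from complex interactions among many elements at once. It can radically reduce the amount of time needed to get reliable results: for example, an experiment with 1,024 possible treatments can be run on a fractional basis with as few as 64 treatments.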
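Here is a minimal sketch of the idea, simplified to four hypothetical page areas (A to D) with just two variants each, coded -1 and +1; real tools handle three or more variants per area using more elaborate designs such as orthogonal arrays. It builds a half-fraction design that tests 8 of the 16 treatments by setting D equal to the product A*B*C, then estimates each area’s main effect from the tested treatments alone. The conversion rates are invented from an additive model with no noise and no interactions, which is exactly the assumption that makes the trick work.

```python
from itertools import product

FACTORS = ["A", "B", "C", "D"]


def half_fraction_design():
    """2^(4-1) fractional factorial: D is set to (aliased with) A*B*C."""
    return [(a, b, c, a * b * c) for a, b, c in product((-1, 1), repeat=3)]


def main_effects(runs, responses):
    """Effect of each factor: mean response at +1 minus mean response at -1."""
    effects = {}
    for i, name in enumerate(FACTORS):
        high = [y for run, y in zip(runs, responses) if run[i] == 1]
        low = [y for run, y in zip(runs, responses) if run[i] == -1]
        effects[name] = sum(high) / len(high) - sum(low) / len(low)
    return effects


def predicted_best(effects):
    """Pick, for each factor, whichever level has the higher estimated effect."""
    return tuple(1 if effects[name] > 0 else -1 for name in FACTORS)


def simulated_rate(run):
    """Hypothetical 'true' conversion rate: additive, no noise, no interactions."""
    a, b, c, d = run
    return 0.05 + 0.004 * a - 0.002 * b + 0.003 * c + 0.001 * d


runs = half_fraction_design()
rates = [simulated_rate(run) for run in runs]
effects = main_effects(runs, rates)
best = predicted_best(effects)
print(len(runs), "of", 2 ** len(FACTORS), "treatments tested")
print("predicted best:", best, "| was it tested?", best in runs)
```

Under these assumptions, the predicted winner is a treatment the experiment never served. The catch is that if A, B and C did interact, that interaction would be indistinguishable from D’s effect in this design; that is the price of testing fewer treatments.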

So when you’re talking to a prospective MVT vendor, ask: “Does your tool use fractional factorial design, or is it full factorial?” If the vendor says full factorial, or looks at you blankly, you may be in trouble, unless you only want to run very simple tests with a small number of treatments.

Smart delivery

As a bonus, the MVT tool you use should be able to spot treatments that are generating terrible results and automatically drop these from the test. As well as making the overall experiment shorter, it saves you from having to explain to your boss why revenues are down 10% this month because you were running an experiment with some really bad treatments in it.
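One simple rule a tool could use for this (a hypothetical illustration, not any particular vendor’s method) is to drop a treatment once the upper end of its 95% confidence interval falls below the lower end of the current leader’s interval, ignoring treatments that haven’t yet had enough visitors. Bear in mind that repeatedly peeking at interim results like this inflates the false-positive rate; production tools tend to use sequential tests or multi-armed bandit algorithms instead.

```python
import math


def conversion_interval(conversions, visitors, z=1.96):
    """Normal-approximation confidence interval for a conversion rate."""
    p = conversions / visitors
    half_width = z * math.sqrt(p * (1 - p) / visitors)
    return p - half_width, p + half_width


def treatments_to_drop(stats, min_visitors=1_000, z=1.96):
    """Return treatments whose upper bound sits below the leader's lower bound.

    `stats` maps treatment name -> (conversions, visitors). Treatments with
    fewer than `min_visitors` are left alone until they have more data.
    """
    eligible = {name: cv for name, cv in stats.items() if cv[1] >= min_visitors}
    if len(eligible) < 2:
        return []
    leader = max(eligible, key=lambda name: eligible[name][0] / eligible[name][1])
    leader_low, _ = conversion_interval(*eligible[leader], z=z)
    return sorted(
        name
        for name, (conversions, visitors) in eligible.items()
        if name != leader and conversion_interval(conversions, visitors, z=z)[1] < leader_low
    )


# Hypothetical running totals: (conversions, visitors).
stats = {"T1": (120, 2_000), "T2": (110, 2_000), "T3": (40, 2_000), "T4": (5, 300)}
print(treatments_to_drop(stats))
```

Here T3 is clearly underperforming and gets dropped, T2 is still in contention with the leader, and T4 looks bad but hasn’t seen enough traffic to judge yet.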

And finally, another delivery finesse to look for is the ability of your MVT tool to carve off a randomly selected subset of your audience to run tests against, leaving the rest of the audience seeing your default treatment. This lets you be a bit more adventurous with your experiments: if you are only subjecting (say) 10% of your audience to the experiment, the impact of any negative effect will be reduced by a factor of 10. But then, so will your experiment traffic, so the experiment will take roughly ten times as long, which is where fractional factorial testing comes in handy again.
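A common way to carve off a random slice is to hash each visitor’s ID into a fixed number of buckets, so that the same visitor always lands in the same place on repeat visits. The sketch below assumes a stable visitor ID (e.g. from a cookie); the 10% share, the experiment name and the salting scheme are hypothetical choices for illustration.

```python
import hashlib


def bucket(visitor_id, salt, buckets=10_000):
    """Map a visitor ID to a stable bucket in [0, buckets)."""
    digest = hashlib.sha256(f"{salt}:{visitor_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % buckets


def assign(visitor_id, treatments, test_share=0.10, experiment="homepage-mvt"):
    """Return a treatment for visitors in the test slice, or None for the default page.

    The same visitor always gets the same answer, so nobody flips between
    the default page and a treatment on repeat visits.
    """
    if bucket(visitor_id, experiment + ":slice") >= test_share * 10_000:
        return None  # outside the test slice: serve the default page
    # A separate salt keeps treatment choice independent of slice membership.
    return treatments[bucket(visitor_id, experiment + ":treatment") % len(treatments)]


treatments = [f"T{i}" for i in range(1, 9)]
ids = [f"visitor-{i}" for i in range(100_000)]
in_test = sum(assign(v, treatments) is not None for v in ids)
print(f"{in_test / len(ids):.1%} of visitors entered the experiment")
```

Because assignment is derived from the ID rather than stored, it needs no lookup table, and changing the experiment name reshuffles the audience for the next test.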

That’s it for part 1. Tune in again soon for parts 2 – 37, covering the following topics:

Analytical power
Segmented optimization
Results automation
People
Getting started

Update: Part 2 of this diatribe is now available.