When should you use multivariate testing, and when is A/B/n testing best?
The answer is both simple and complex.
Of course, A/B testing is the default for most people, as it is more common in optimization. But there is a time and a place for multivariate testing (MVT) as well, and it can add a lot of value.
Before we get into the nuances, let’s briefly go over the differences.
What is multivariate testing?
Multivariate testing is the process of testing more than one component on the website in a live environment.
This is the definition by Lars Nielsen of Sitecore, to which he also added:
Multivariate testing opposes the traditional scientific notion. Essentially, it can be described as running multiple A/B/n tests on the same page, at the same time.
Multivariate testing is, in a sense, a more complex form of testing than A/B testing. A/B testing is fairly straightforward: you split traffic between two versions of a page and measure which one performs better.
You can also measure the performance of three or more variations of a page with A/B/n tests. As Yaniv Navot of Dynamic Yield wrote, “High-traffic sites can use this testing method to evaluate performance of a much broader set of variations and to maximize test time with faster results.”
Conceptually, an A/B/C/D test works the same way, except that traffic is split among four versions of the page instead of two.
A/B testing usually involves fewer combinations with more extreme changes, whereas multivariate tests have a large number of variations that usually differ only subtly.
The case for A/B/n tests
Should you use MVT or A/B/n tests?
If you have enough traffic, use both. They serve different but important purposes. In general, though, A/B tests should be your default.
With A/B testing:
- You can test more dramatic design changes;
- Tests usually take far less time than MVTs;
- Advanced analytics can be installed and evaluated for each variation (e.g., mouse tracking, phone call tracking, analytics integration);
- Individual elements and interaction effects can still be isolated for learning and customer theory building;
- Gains are typically bigger (since you often test bigger changes).
A/B testing tends to get meaningful results faster. The changes between pages are more drastic, so it’s easier to tell which page is more effective.
So A/B testing harnesses the power of large changes, not just tweaking colors or headlines as is sometimes the case with MVT. Optimizers usually start all engagements with A/B testing, because that’s where the bigger gains are possible.
Yaniv Navot, Director of Online Marketing at Dynamic Yield, also noted that MVT is mainly used for smaller tweaks, and that A/B tests are better suited to multi-page and multi-scenario experiences.
Something else to worry about with MVT: the amount of traffic you get.
How much traffic do you get?
For example, a 3×2 test (three design elements with two versions each) has 2 × 2 × 2 = 2^3 = 8 combinations, so it requires about the same amount of traffic as an A/B/n test with eight variations. A test of this size is typical for MVT.
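To see where those numbers come from, here is a small Python sketch. It assumes a full factorial design in which every combination gets an equal share of traffic; the 100,000-visitor figure is a made-up example, not a benchmark.

```python
from math import prod


def full_factorial_combinations(levels_per_element):
    """Number of page versions a full factorial MVT must serve.

    levels_per_element lists how many versions you test for each element,
    e.g. [2, 2, 2] for three elements with two versions each.
    """
    return prod(levels_per_element)


def visitors_per_combination(total_visitors, levels_per_element):
    """Visitors each combination receives when traffic is split evenly."""
    return total_visitors / full_factorial_combinations(levels_per_element)


print(full_factorial_combinations([2, 2, 2]))        # 8 (three elements, two versions each)
print(full_factorial_combinations([3, 3]))           # 9 (two elements, three versions each)
print(visitors_per_combination(100_000, [2, 2, 2]))  # 12500.0
```

Every extra element or version multiplies the number of combinations, which is why the traffic each combination receives shrinks so quickly.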
In a full factorial multivariate test, your traffic is divided evenly among all variations, which multiplies the amount of traffic necessary for statistical significance. As Leonid Pekelis, statistician at Optimizely, said, this results in a longer test run:
Altogether, the main requirement becomes running your multivariate test long enough to get enough visitors to detect many, possibly nuanced interactions.
Claire Vo of Optimizely also said that MVT is more difficult to execute because of the extra traffic and resources it requires.
A rule of thumb: if your traffic is under 100,000 uniques/month, you’re probably better off doing A/B testing instead of MVT. The only exception would be the case where you have high-converting (10% to 30% conversion rate) lead gen pages.
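The exception for high-converting pages follows from standard sample size math. The sketch below uses the usual normal-approximation formula for a two-sided, two-proportion z-test; the 10% relative lift, 5% significance level, and 80% power are illustrative assumptions, not recommendations.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_variation(baseline_rate, relative_lift, alpha=0.05, power=0.80):
    """Approximate visitors needed per variation to detect a relative lift
    with a two-sided two-proportion z-test (normal approximation)."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p2 - p1) ** 2)


# Same 10% relative lift, very different baselines.
low_converting = sample_size_per_variation(0.02, 0.10)   # 2% baseline
high_converting = sample_size_per_variation(0.20, 0.10)  # 20% baseline
print(low_converting, high_converting)
```

Under these assumptions, a page converting at 2% needs more than ten times as many visitors per variation as a page converting at 20%. Multiply either figure by the number of combinations in an MVT and it becomes clear why low-traffic, low-converting pages are a poor fit.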
In addition, if you’re an early-stage startup and you’re still doing customer development, it’s too early for MVT. You may end up with the best performing page, but you won’t learn much. By doing everything at once, you miss out on the ups and downs of understanding the behavior of your audience.
That said, there are definitely some high-impact use cases for MVT.
When should you use a multivariate test?
The benefits of multivariate tests
MVT is awesome for follow-up optimization on the winner from an A/B test once you’ve narrowed the field.
While A/B testing doesn’t tell you anything about the interaction between variables on a single page, MVT does. This can help your redesign efforts by showing you where different page elements will have the most impact.
This is especially useful when designing landing page campaigns, for example, as the data about the impact of a certain element’s design can be applied to future campaigns, even if the context of the element has changed.
So a big goal of multivariate testing is to let you know which elements on your site play the biggest role in achieving your objectives.
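Here is what “interaction” means in numbers. This sketch computes the main effects and the interaction for a 2×2 test (two elements, two versions each) from the conversion rate of each combination. The element names and rates are hypothetical, and the interaction is defined as a difference-in-differences, which is one common convention (some texts divide it by two).

```python
def factorial_effects_2x2(rates):
    """Main effects and interaction for a 2x2 factorial test.

    rates[(h, i)] is the conversion rate observed for headline version h and
    image version i, where 0 is the original and 1 is the challenger.
    """
    r00, r01 = rates[(0, 0)], rates[(0, 1)]
    r10, r11 = rates[(1, 0)], rates[(1, 1)]
    # Main effect: change from switching one element, averaged over
    # both versions of the other element.
    headline_effect = ((r10 - r00) + (r11 - r01)) / 2
    image_effect = ((r01 - r00) + (r11 - r10)) / 2
    # Interaction: how much the headline's effect changes with the image shown.
    interaction = (r11 - r01) - (r10 - r00)
    return headline_effect, image_effect, interaction


# Hypothetical conversion rates for each headline/image combination.
observed = {(0, 0): 0.040, (1, 0): 0.050, (0, 1): 0.045, (1, 1): 0.040}
headline, image, interaction = factorial_effects_2x2(observed)
print(f"headline {headline:+.4f}, image {image:+.4f}, interaction {interaction:+.4f}")
```

With these made-up rates, the new headline lifts conversions by 1 point alongside the original image but lowers them by 0.5 points alongside the new image. A standalone A/B test of the headline would show only one of those results, depending on which image happened to be live; the nonzero interaction term is what flags the dependency.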
ANOVA? A quick definition
ANOVA (analysis of variance) is a “collection of statistical models used to analyze the differences among group means and their associated procedures.”
In simple terms, when comparing two samples, we can use the t-test—but ANOVA is used to compare the means of more than two samples.
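To make the definition concrete, here is a from-scratch sketch of the one-way ANOVA F statistic: the ratio of the variation between group means to the variation within groups. The sample values are arbitrary; in practice you would use a library routine such as SciPy’s `scipy.stats.f_oneway`, which also returns a p-value.

```python
from statistics import mean


def one_way_anova_f(*groups):
    """F statistic for a one-way ANOVA comparing the means of two or more groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    # Between-group sum of squares: distance of each group mean from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)      # k - 1 degrees of freedom
    ms_within = ss_within / (n_total - k)  # N - k degrees of freedom
    return ms_between / ms_within


# Arbitrary example values for three variations.
print(one_way_anova_f([1, 2, 3], [2, 3, 4], [5, 6, 7]))
```

For the three groups shown, F comes out at 13. With exactly two groups, F equals the square of the pooled two-sample t statistic, which is the sense in which ANOVA generalizes the t-test.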
If there are good use cases for multivariate tests, there are also right and wrong ways to execute them. What are the conditions and requirements for running a successful multivariate test?
Multivariate testing: How to do it right
The one big condition for running MVT: “Lots and lots of traffic,” according to Paras Chopra. Getting MVT right therefore comes down largely to understanding your traffic needs and avoiding false positives.
Common mistakes with running MVT
Many common MVT mistakes aren’t unique to it (they apply to A/B testing as well), but some are specific to multivariate methods. Either way, they’re pretty much what you’d expect:
- Not enough traffic;
- Not accounting for increased chance of false positives;
- Not using MVT as a learning tool;
- Not using MVT as a part of a systemized approach to optimization.
1. Not enough traffic
We already talked about traffic above, but to reiterate: MVT requires lots of traffic. Fractional factorial methods mitigate this, but their accuracy is debated.
The increased traffic requirement also raises the question of how long you should expect the test to run. This is especially true if you’re using MVT to throw things at the wall and see what sticks (an inefficient approach).
One thing you should definitely do is estimate the traffic needed for significant results, using a sample size calculator.
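Once you have a per-variation sample size, turning it into an expected test duration is simple arithmetic. The sketch below assumes traffic is split evenly and arrives at a steady daily rate; the 6,500 visitors per variation and 3,000 daily visitors are hypothetical inputs. It also treats each MVT combination as if it needed a full A/B-sized sample, which is a conservative simplification.

```python
from math import ceil


def estimated_test_days(sample_per_variation, n_variations, daily_visitors):
    """Days until every variation reaches the required sample size,
    assuming an even traffic split and steady daily traffic."""
    total_needed = sample_per_variation * n_variations
    return ceil(total_needed / daily_visitors)


# Hypothetical inputs: 6,500 visitors per variation (e.g., from a sample size
# calculator) and 3,000 visitors per day reaching the tested page.
ab_days = estimated_test_days(6_500, 2, 3_000)   # A/B test: 2 variations
mvt_days = estimated_test_days(6_500, 8, 3_000)  # 2x2x2 full factorial: 8 combinations
print(ab_days, mvt_days)  # 5 18
```

Under these assumptions the A/B test finishes in about five days, while the eight-combination MVT needs about eighteen.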
Leonid from Optimizely discussed ways to get around the need for crazy amounts of traffic, including the fractional factorial method (more on that below).
Matt Gershoff, CEO of Conductrics, argues, however, that it’s not necessarily true that an MVT requires more data than a related set of simple A/B tests would. In fact, he says, for the same number of treatments to be evaluated and the same independence assumptions that are implicitly made when running separate A/B tests, an MVT actually requires less data.
2. Not accounting for increased chance of false positives
According to Leonid, the most common mistake in running multivariate tests is not accounting for the increased chance of false positives: the more combinations you compare, the more likely it is that at least one of them looks like a winner by chance alone.
We’ve written about the multiple comparisons problem in more detail before.
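The size of the problem is easy to quantify. This sketch computes the family-wise error rate (the chance of at least one false positive) across several comparisons, and the Bonferroni-adjusted threshold that keeps that rate at or below your chosen alpha. The error-rate formula assumes independent comparisons, which is only an approximation when every combination is compared against the same control; Bonferroni is one simple, conservative correction among several.

```python
def family_wise_error_rate(alpha, n_comparisons):
    """Chance of at least one false positive across independent comparisons,
    each run at significance level alpha, when no real differences exist."""
    return 1 - (1 - alpha) ** n_comparisons


def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison significance level that caps the family-wise error rate at alpha."""
    return alpha / n_comparisons


# A 2x2x2 full factorial MVT: 7 challenger combinations compared against the control.
print(round(family_wise_error_rate(0.05, 7), 3))  # 0.302
print(round(bonferroni_alpha(0.05, 7), 4))        # 0.0071
```

In other words, testing seven combinations at the usual 95% confidence level gives you roughly a 30% chance of declaring at least one false winner.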
3. Not using MVT as a learning tool
As we mentioned in a previous article, optimization is really about “gathering information to inform decisions.” MVT is best used as a learning tool. Using it to drive incremental change and throw stuff at the wall is inefficient and takes time away from more impactful A/B tests. Andrew Anderson makes this point well in an article on his blog.
4. Not using MVT as a part of a systemized approach to optimization
Similarly, many MVT mistakes come from people not knowing what they plan to do, or not having a testing plan at all. Paras Chopra has stressed this as well.
Andrew Anderson puts it in perspective: if you’re using either A/B or MVT testing just to throw stuff against the wall or to validate hypotheses, you will only reach a personal optimum (i.e., ego fulfillment). He continues, saying that “tools used correctly to maximize results and maximize resource allocation for future efforts lead to organizational and global maximum.”
Now, I mentioned above that there were different statistical methods for MVT. There’s a bit of a debate between them. Does it matter?
Full factorial, fractional factorial… Does it matter?
There are a few different methods of multivariate testing:
- Full factorial;
- Fractional factorial;
- Taguchi.
There’s a bit of an ideological debate between the methods, as well.
Full factorial multivariate testing
A full factorial experiment is “an experiment whose design consists of two or more factors, each with discrete possible values or “levels,” and whose experimental units take on all possible combinations of these levels across all such factors.”
In other words, full factorial MVT tests all combinations with equal amounts of traffic. That means that it:
- Is more thorough, statistically;
- Requires a ton of traffic.
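A full factorial design is easy to enumerate. The sketch below lists every combination for a hypothetical three-element test and assigns visitors to combinations by hashing a visitor ID, which splits traffic roughly evenly and keeps returning visitors in the same bucket. The element names are made up, and hash-based bucketing is one common approach rather than a description of how any particular testing tool works.

```python
import hashlib
from itertools import product

# Hypothetical test: three page elements, two versions of each.
ELEMENTS = {
    "headline": ["original", "benefit-led"],
    "hero_image": ["product", "people"],
    "cta_text": ["Start trial", "Get started"],
}

# Full factorial: every combination of element versions (2 x 2 x 2 = 8).
COMBINATIONS = list(product(*ELEMENTS.values()))


def assign_combination(visitor_id: str) -> tuple:
    """Deterministically map a visitor to one combination.

    SHA-256 output is close to uniform, so each combination receives
    roughly 1/len(COMBINATIONS) of the traffic.
    """
    digest = hashlib.sha256(visitor_id.encode("utf-8")).hexdigest()
    return COMBINATIONS[int(digest, 16) % len(COMBINATIONS)]


for combo in COMBINATIONS:
    print(dict(zip(ELEMENTS, combo)))
print(assign_combination("visitor-42"))
```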
Paras Chopra wrote in Smashing Magazine a while ago:
“If there are 16 combinations, each one will receive one-sixteenth of all the website traffic. Because each combination gets the same amount of traffic, this method provides all of the data needed to determine which particular combination and section performed best. You might discover that a certain image had no effect on the conversion rate, while the headline was most influential. Because the full factorial method makes no assumptions with regard to statistics or the mathematics of testing, I recommend it for multivariate testing.”
Fractional factorial multivariate testing
Fractional factorial designs are “experimental designs consisting of a carefully chosen subset (fraction) of the experimental runs of a full factorial design.”
So fractional factorial experiments test only a subset of the possible combinations and, assuming some interactions between elements are negligible, use those results to estimate the effects of the rest. Because of that, they require less traffic.
That said, an Adobe blog post likened fractional factorial design to a barometer, noting that “a barometer measures atmospheric pressure, but its value is not so much in the precise measurement as the notification that there is a directional change in pressure.”
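As an illustration of a “carefully chosen subset,” here is the textbook half-fraction design for three two-level elements, written 2^(3-1). Levels are coded -1 (original) and +1 (challenger), and the third element’s level is set by the defining relation C = A × B. The sketch shows the trade-off behind the accuracy debate: with only four of the eight combinations, each element’s main effect is aliased with (indistinguishable from) the interaction of the other two, so the results are only clean if those interactions are negligible.

```python
from itertools import product

# Full factorial for three two-level elements A, B, C: 2^3 = 8 combinations.
FULL = list(product((-1, 1), repeat=3))


def half_fraction():
    """2^(3-1) fractional factorial: A and B take all four combinations and
    C is set by the defining relation C = A * B, giving 4 runs instead of 8."""
    return [(a, b, a * b) for a, b in product((-1, 1), repeat=2)]


HALF = half_fraction()
print(len(FULL), len(HALF))  # 8 4
for a, b, c in HALF:
    # Column C is identical to the product A*B, so the main effect of C
    # cannot be separated from the A x B interaction (they are aliased).
    print(f"A={a:+d} B={b:+d} C={c:+d}")
```

Each element still appears at each level in half of the runs, which is what lets the design estimate main effects with half the combinations.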
The same article then also said:
I question how valuable it is to spend five months running one single test for learnings that may no longer be applicable by the time the test has completed and the data pumped through analysis. Instead, why not take the winnings and learnings of your week-long fractional-factorial multivariate test and then run another test that builds off that new and improved baseline?
Taguchi multivariate testing
This is a bit more esoteric, so it’s best not to worry about it. As Paras wrote in Smashing Magazine:
It’s a set of heuristics, not a theoretically sound method. It was originally used in the manufacturing industry, where specific assumptions were made in order to decrease the number of combinations needing to be tested for QA and other experiments. These assumptions are not applicable to online testing, so you shouldn’t need to do any Taguchi testing. Stick to the other methods.
So does it matter?
As mentioned above, most of the debate lies in the murkier statistics of the fractional factorial method. Many of the optimizers I talked to said they recommend only full factorial. As Paras explains, “A lot of ‘fractional factorial’ methods out there are pseudo scientific, so unless the MVT method is properly explained and justified, I’d stick to full factorial.”
However, some, like Andrew Anderson, hold that these debates in general are misguided.
So does it really matter? I don’t know. If you have enough traffic, I think full factorial is harder to mess up. That said, you’re making time-critical business decisions, so if a full factorial test will take you six months to complete, the extra accuracy probably isn’t worth it.
If you have enough traffic, use both types of tests. Each has a different and specific impact on your optimization program, and used together, they can help you get the most out of your site. Here’s how:
- Use A/B testing to determine best layouts.
- Use MVT to polish the layouts to make sure all the elements interact with each other in the best possible way.
As I said before, you need to get a ton of traffic to the page you’re testing before even considering MVT.
Test major elements like value proposition emphasis, page layout (image vs. copy balance, etc.), copy length, and general eye flow via A/B testing; it will probably take you two to four test rounds to figure this out. Once you’ve determined the overall picture, you may want to test interaction effects using MVT.
However, make sure your priorities align with your testing program. Peep once said, “most top agencies that I’ve talked to about this run about 10 A/B tests for every one MVT.”