So you ran a test – and you ran it correctly, following A/B testing best practices – and you’ve reached inconclusive results.
A surprising number of tests end up inconclusive. According to Experiment Engine’s data, anywhere from 50% to 80% of test results are inconclusive, depending on the vertical and stage of the testing program. As they summarize, “you better get used to ties.”
Here they provide a histogram of the probability of an outcome between two values:
Other estimates put the rate of A/B test ‘failures’ (B wasn’t meaningfully different from A, so there was nothing to justify a new business tactic) anywhere from 80 to 90 percent. This can stall a testing program – as an HBR article put it, “For many managers, no action resulting from the tests equals no value to the test. So when the vast majority of tests “fail,” it is natural to wonder if testing is a waste of time and resources.”
Both VWO and Convert.com have published estimates concluding that only about 1 in 7 A/B tests is a winner. Convert also found that companies using optimization agencies generally see 1 out of 3 tests producing statistically valid results.
Since inconclusive results appear to be the norm rather than the exception, what do you do when you get them?
Segment The Data
The first thing you should do if your A/B test results are inconclusive is look into the segments.
Brian Massey from Conversion Sciences shared how looking at individual segments helped reveal clearer data on a well-planned, well-powered split test he ran, saying, “when lumped together, one segment’s behavior was canceling out gains by other segments.”
If you’re facing an inconclusive test, look at test performance across key segments like devices, traffic sources, and whatever else makes sense for your business. But heads up: each segment also needs a large enough sample size before you can treat its results as “conclusive.”
In Massey’s case, tests of video footage on an apparel site came up inconclusive. Though video typically drives increased conversions, his video variations performed the same as text-based pages.
Segmenting users revealed the following answers:
- New visitors preferred viewing long videos, while returning visitors engaged more with shorter clips.
- Visitors entering the site through product pages preferred different types of video than those entering through the home page.
- Existing subscribers converted higher than other segments when viewing videos that included product close-ups.
Taken together, the individual segments were canceling out one another’s gains. Slicing the traffic into discrete segments revealed the insight Massey needed to move forward.
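If you want to run this kind of segment breakdown yourself, here’s a minimal sketch in Python (pandas and statsmodels assumed; the file name and the `segment`, `variant`, and `converted` columns are hypothetical) that compares conversion rates per segment and flags segments that are too small to read anything into:

```python
import pandas as pd
from statsmodels.stats.proportion import proportions_ztest

# Hypothetical export of raw test data: one row per visitor, with the
# segment they belong to, the variant they saw, and whether they converted.
df = pd.read_csv("ab_test_visitors.csv")  # columns: segment, variant, converted

MIN_SAMPLE_PER_CELL = 1000  # rough floor before a segment is worth reading

for segment, grp in df.groupby("segment"):
    a = grp.loc[grp["variant"] == "A", "converted"]
    b = grp.loc[grp["variant"] == "B", "converted"]

    if len(a) < MIN_SAMPLE_PER_CELL or len(b) < MIN_SAMPLE_PER_CELL:
        print(f"{segment}: too small to call ({len(a)} vs. {len(b)} visitors)")
        continue

    # Two-proportion z-test on conversion rate, A vs. B, within this segment.
    _, p_value = proportions_ztest(count=[a.sum(), b.sum()], nobs=[len(a), len(b)])
    print(f"{segment}: A {a.mean():.2%} vs. B {b.mean():.2%} (p = {p_value:.3f})")
```

Keep in mind that the more segments you slice, the more likely one of them will look “significant” by chance, so treat a per-segment finding as a hypothesis to re-test rather than a conclusion.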
If you’ve run through your segments and found nothing of value, it’s time to ask whether to keep pushing on your hypothesis or move to the next item on your list…
Should You Stay or Should You Go?
Do you keep trying variations of the same hypothesis, or do you call it a day and move on with a new hypothesis entirely?
EJ Lawless from Experiment Engine mentioned in a blog post that testing velocity is a crucial trait of successful optimization teams. But if your test was based on validating an opinion or on some checklist you read on the internet, you’re probably best off dropping it and moving on to something real.
First, Don’t Test Dumb Things
In many cases, if changes are small and pointless, the results of your A/B test will come back inconclusive.
Take a look at these examples from GrooveHQ:
In the first instance, the color of the site’s CTA button – a commonly tested feature – wasn’t enough of a change to bring about a significant result.
While huge companies, like Amazon or Google – with their millions of visitors each day – have enough statistical power to measure significance on small cosmetic changes, smaller companies need to home in on the big wins instead.
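To make that concrete, here’s a rough sketch of the math (using Python and statsmodels; the 3% baseline conversion rate and the two lift sizes are made-up numbers for illustration) showing how many visitors per variation you’d need to reliably detect a tiny, cosmetic-level lift versus a bolder one:

```python
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

baseline = 0.03  # assumed 3% baseline conversion rate

# Visitors needed per variation at 95% confidence / 80% power
# for a tiny "button color" lift vs. a bolder, redesign-sized lift.
for relative_lift in (0.02, 0.20):  # +2% vs. +20% relative lift
    effect = proportion_effectsize(baseline * (1 + relative_lift), baseline)
    n = NormalIndPower().solve_power(effect_size=effect, alpha=0.05, power=0.8)
    print(f"{relative_lift:.0%} relative lift: ~{int(n):,} visitors per variation")
```

The required sample size scales roughly with the inverse square of the effect size, so a lift that’s ten times smaller needs on the order of a hundred times more visitors to detect. That’s why the Amazons of the world can afford to test button colors and most companies can’t.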
Another example comes from Alex Bashinsky, co-founder of Picreel. They were trying to maximize the impact of their social proof section, so they tested colored logos linking to their various media mentions versus grey logos without links.
Didn’t matter at all. Lesson learned: visitors don’t care about the logo color.
If your most recent test has come up inconclusive, it may be because the changes you made came from running down a list of “Test This Right Now” tactics rather than from the things that really matter to your visitors.
Rather, your optimization process should be, well, a process. There are many frameworks out there, but I suggest checking out our ResearchXL model to gather and prioritize insights.
Test Bolder Changes
In addition to basing your tests in qualitative and quantitative data (instead of mere opinion), test things that will actually make a difference to your visitors. Sometimes you’ve got to get bold.
Tests like the Groove examples above produce no results (or learnings, really) because they’re seemingly random, don’t address visitors’ actual issues, and are too small to detect without a TON of traffic.
Iterative Testing and When To Persist
If you’re following a process, and you reach an inconclusive test, there are times when you should, as EJ Lawless put it, “re-examine your hypothesis and see if the hypothesis makes sense and if you should test another variation around the same hypothesis.”
Peep gives the following example:
The key here is that you base your test on a strong hypothesis. And while we can never be 100% certain about a hypothesis (even if the test wins, we won’t know why it actually worked; we’ll just have some possible explanations), there are ways to be more confident (ResearchXL). So in many cases, iterative testing is the name of the game.
What’s Your Strategy?
And while inconclusive results aren’t as ‘fun’ as winners, you can still learn things from them…
What Are Inconclusive Results, Anyway?
Are you validating opinions or testing for discovery? Even if you didn’t get the results you hoped for, inconclusive results can tell you if something has little to no influence, which in itself is valuable.
Testing for discovery and increasing the number of variations you test can be more valuable than trying to validate an assumption. Andrew explains here:
Getting Value From Neutral Tests
Everyone loves winning. Getting an unexpectedly major lift on a split test is an exhilarating feeling, but unfortunately, it’s the exception – rather than the norm.
That doesn’t mean that inconclusive A/B tests aren’t worth your time, though. You can still learn a lot from inconclusive tests.
Grigoriy Kogan wrote on Optimizely’s blog about getting value from neutral tests. In the event of an inconclusive test, he suggests asking, “What hypothesis, if any, does the neutral result invalidate?”
“The problem might not be what you thought,” he says.
As an example, he showed an inconclusive test he ran:
Here’s how he explained it:
Another example, and a peculiar one at that, is pricing. If you test pricing and there is no difference between the variations, that provides a TON of value.
In fact, that was one of Groove’s failed tests. They tested tiny differences in their pricing. No difference:
But if there’s no difference, then charge the highest amount. We had a test where conversions were the same for something like $29, $35, and $39. So of course, in that case you charge $39.
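The arithmetic behind that call is simple. Here’s a tiny sketch (the flat 5% conversion rate is an assumed number; only the price points come from the example above):

```python
# If conversion rate is flat across price points, expected revenue per
# visitor scales directly with price, so the highest price wins.
conversion_rate = 0.05  # assumed; the actual rate in the test isn't published

for price in (29, 35, 39):
    print(f"${price}: ${conversion_rate * price:.2f} expected revenue per visitor")
```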
Look At Micro-Conversions
While you shouldn’t necessarily optimize for micro-conversions, you should, as Kyle Rush suggests, “measure more than just your primary goal.”
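As a loose sketch of what “measuring more than just your primary goal” might look like in practice, here’s some Python (pandas and statsmodels again; the metric column names are hypothetical) that runs the same A/B comparison across the primary goal and a few micro-conversions:

```python
import pandas as pd
from statsmodels.stats.proportion import proportions_ztest

# One row per visitor, with 0/1 flags for the primary goal ("purchased")
# and a few hypothetical micro-conversions tracked alongside it.
df = pd.read_csv("ab_test_visitors.csv")
metrics = ["purchased", "added_to_cart", "viewed_pricing", "signed_up_for_email"]

for metric in metrics:
    a = df.loc[df["variant"] == "A", metric]
    b = df.loc[df["variant"] == "B", metric]
    _, p = proportions_ztest(count=[a.sum(), b.sum()], nobs=[len(a), len(b)])
    print(f"{metric}: A {a.mean():.2%} vs. B {b.mean():.2%} (p = {p:.3f})")

# Caveat: checking several metrics inflates the odds of a false positive,
# so treat a "winning" micro-conversion as a lead to investigate, not a verdict.
```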
Looking at micro-conversions is also something that Justin Rondeau suggests:
So if a variation increases some correlative micro-conversion metrics, it may be okay to implement it. Otherwise…
When In Doubt, Favor The Control
Almost everyone I talked to for this article suggested that, when all wells turn up dry, you simply favor the control. Why? For one, to conserve resources…
There are exceptions to this, though. Maybe you’re testing something that is a legal requirement, or maybe it’s a shift in branding that will be good for the company in the long term. In those cases, you would likely deploy the treatment to 100% of traffic.
…Or Your Favorite Variation
If there’s truly no difference, then you could just do what you like best…
Paul Rouke agreed that, if nothing distinguishes the variations, rolling out the variation as the new control can be an option:
So you run the test and it comes up inconclusive. Most will say favor the control because of resources, as well as branding concerns and novelty effects. But if it’s a matter of politics, you can probably give way to the client’s (or the boss’s) opinion.
Everyone loves a winner, but industry data has shown that most tests aren’t winners. Many tests are simply inconclusive, which can produce friction in a budding optimization program.
In the case of an inconclusive test (and assuming you ran the thing right and understand variance), there are a few different solutions, recommended and vetted by optimization experts. These are all contextual recommendations and need to be implemented according to your own situation (digging into the segments won’t do anything if you don’t have adequate traffic or if you’re testing dumb stuff):
- Dig into the segments and learn or implement personalization rules.
- Iterate on your hypothesis.
- Maximize your beta of test options, and figure out whether it’s an execution problem or a lack of influence.
- Try something new (next item on the testing backlog).
- Try something more radical.
- Track micro-conversions, and if important correlative metrics increase in a given variation, implement it.
…or just stick with the control, or appease your stakeholders by implementing their favored variation. If you want to be disciplined and efficient, favoring the control is the way to go. If you want to play politics (sometimes necessary to fuel an optimization program), exercise discretion.
H/T to Alex Bashinsky for helping with research and production of this article.