How to Analyze A/B Test Results

You’ve identified a page that is leaking money. Thanks to qualitative research, heuristic analysis and other insights you’ve come up with a number of test hypotheses.

Now you create a treatment, and test it against control. Oh no, it loses! Or oh no, there’s no difference between the two variations! Now what?

You analyze the treatment, the results across segments, and improve your hypothesis. Then – you test again! Iterative testing is the name of the game. Be prepared to run many test rounds for each page. The odds of you solving all problems with the first test are slim! 10 tests to a sizable win is more like it.

Nobody knows which test is going to work. If we did, we wouldn’t need testing! So inevitably some tests are either not going to produce a lift, or even perform worse than control. What matters is that you analyze each test result, update your test hypothesis and test again.

Remember – specific execution matters. Let’s say you’re trying to optimize for clarity. You re-write the body copy, headlines and CTAs. But – no lift. “Clarity is not an issue here”, you might think. But there’s also a good chance that the changes you implemented were still not clear enough. So you might want to try another test, using voice of the customer in the copy – the actual language they used in the surveys.

Tests lose all the time, and that’s okay. It’s about learning.

Some say only 1 out of 8 tests wins; others claim 75% of their tests win. Convert.com ran a query on their data and found that 70% of the A/B tests performed by individuals without agencies don’t lead to any increase in conversion.

Ignore “market averages” for this kind of stuff, as your average tester has never done any conversion research and is likely to have their testing methodology wrong as well. The same Convert.com research also showed that using a marketing agency for A/B testing gives 235% more chance of a conversion increase. So competence clearly matters (of course, your average marketing agency is not very competent at CRO).

When you know that more than half of your tests are likely not to produce a lift, you will have new-found appreciation for learning. Always test a specific hypothesis! That way you never fully fail. With experience, you begin to realize that you sometimes learn even more from tests that did not perform as expected.

Matt Gershoff, Conductrics:

“Test is really data collection. Personally, I think the winner/loser vocabulary perhaps induces risk adversity.”

Some people indeed fear that a “losing” test would mean they did something wrong, and that their boss, client, etc. would not be happy with the performance. Doubt and uncertainty start to cloud your thought process. The best way to overcome this is to be on the same page with everyone. Before you even get into testing, get everyone together and agree that this is about learning.

The company profit is really just a by-product of successfully building on your customer theory.

Nazli Yuzak, Dell:

“There lies the reason why many tests fail: an incorrect initial hypothesis. From numerous tests, we’ve found that the hypothesis creation has a major impact on the way a test is run, what is tested, how long a test runs and just as important, who’s being tested?”

What happens if test results are inconclusive?

Inconclusive tests are very common. A test is inconclusive when your treatment does not beat control with statistical significance. This usually happens when your hypothesis is wrong, or your test wasn’t bold or brave enough in shifting away from the original design, particularly on lower-traffic sites.
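To make "no statistical significance" concrete, here is a minimal sketch of one common way to check it: a pooled two-proportion z-test. The function names, the 0.05 threshold and the visitor numbers are all assumptions for illustration, not something your testing tool is guaranteed to use. Many tools apply different statistics, so treat this as a sanity check rather than the final word.

```python
import math


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for a pooled two-proportion z-test.

    conv_a / n_a: conversions and visitors for control
    conv_b / n_b: conversions and visitors for treatment
    """
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no variation at all, so no detectable difference
    z = (conv_b / n_b - conv_a / n_a) / se
    # Two-sided tail probability of the standard normal distribution
    return math.erfc(abs(z) / math.sqrt(2))


def is_inconclusive(conv_a, n_a, conv_b, n_b, alpha=0.05):
    """True when the difference is not significant at level alpha (assumed 0.05)."""
    return two_proportion_p_value(conv_a, n_a, conv_b, n_b) >= alpha


# Hypothetical numbers: 10.0% vs 11.0% conversion on 1,000 visitors each
print(is_inconclusive(100, 1000, 110, 1000))
# Hypothetical numbers: 10.0% vs 15.0% conversion on 1,000 visitors each
print(is_inconclusive(100, 1000, 150, 1000))
```

A small difference on modest traffic will often come back inconclusive here, which is exactly why bolder changes tend to be easier to read on lower-traffic sites.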

So what do you do then? You analyze the segments.

Open up Google Analytics and see how the variations performed across different segments – new vs. returning visitors, different browsers and devices, different traffic sources, other behavioral identifiers.

Quite often you will find that one of the variations was a confident winner in a specific segment. That’s an insight you can build on! One or more segments may be over- or under-performing, or they may be cancelling each other out – the average is a lie. The segment-level performance will help you. (Note: in order to accurately assess performance across a segment, you again need a decent sample size!)
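Here is a rough sketch of how segments can cancel each other out. Everything in it is made up for illustration: the segment names, the visitor counts, and the min_visitors cutoff of 1,000 per variation, which is only a placeholder for whatever "decent sample size" means on your site.

```python
def segment_report(segments, min_visitors=1000):
    """Per-segment conversion rates and relative lift.

    segments maps a segment name to a tuple:
    (control visitors, control conversions,
     treatment visitors, treatment conversions)
    min_visitors is a hypothetical cutoff below which a segment
    is flagged as too small to draw conclusions from.
    """
    report = {}
    for name, (n_c, conv_c, n_t, conv_t) in segments.items():
        rate_c = conv_c / n_c
        rate_t = conv_t / n_t
        # Relative lift of treatment over control; undefined if control is 0
        lift = (rate_t - rate_c) / rate_c if rate_c > 0 else None
        report[name] = {
            "control_rate": rate_c,
            "treatment_rate": rate_t,
            "relative_lift": lift,
            "enough_data": min(n_c, n_t) >= min_visitors,
        }
    return report


# Hypothetical export from your analytics tool
segments = {
    "new_visitors": (4000, 200, 4000, 240),
    "returning_visitors": (2000, 200, 2000, 160),
    "tablet": (300, 12, 300, 18),
}

report = segment_report(segments)
for name, row in report.items():
    print(name, row)
```

In this invented data the treatment lifts new visitors and hurts returning visitors by about the same relative amount, so the overall numbers look nearly flat. The tablet segment shows a big swing but falls under the cutoff, so it should not be trusted on its own.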

If you genuinely have a test which failed to move any segments, it’s a crap test. Assess how you came to this hypothesis and revise your whole hypothesis list.

And finally – get testing again!

What happens if a test fails (control wins)?

In short: Learn from the failure.

If you can’t learn from the failure, you’ve designed a crap test. Next time you design, imagine all your stuff failing. What would you do?

If you don’t know or you’re not sure, get it changed so that a negative becomes useful. Failure itself at a creative or variable level should tell you something. On a failed test, always analyze the segmentation. One or more segments will be over- or under-performing – check for varied performance.

Now add the failure info to your customer theory. Look at it carefully – what does the failure tell you? Which element do you think drove the failure? If you know what failed (e.g. making the price bigger) then you have very useful information.

Perhaps you turned the handle the wrong way. Now look at all the data that you have, and brainstorm a new test.
