Here’s something scary: according to MECLabs & Magento’s 2014 eCommerce Benchmark, only 13% of those studied base their testing on extensive historical data.
The sad part is, the same study found that in almost every revenue group – with the exception of the $0-10k group – companies that test changes based on extensive historical data were proportionally the most likely to be more successful.
Yet, in our own qualitative survey, I frequently see, “I don’t know where to start testing.” “How do I do CRO the right way?” & “How do I know what’ll have the most impact?”
Please don’t tell me that you’re a part of the 87% (from this study at least) that have yet to embrace the “test changes based on actual data” approach.
To Test Based On Data, You Must First Understand Your Data
To be fair, there may be other forces at work. Maybe you haven’t sold your boss on CRO, or there’s bureaucracy, or your HiPPO doesn’t believe in using real analytics…
Indeed, over half of the companies surveyed in the MECLabs benchmark admitted to having little to no guidelines for measuring performance through analytics.
Or, maybe your company is ready to embrace the testing culture, but you genuinely don’t know what data to look at to know where you should start.
If that’s the case, I’d highly recommend checking out these three articles:
- Peep’s “10 Google Analytics Reports That Tell You Where Your Site Is Leaking Money”
- 7+ Underutilized Analytics Reports
- 10 Optimization Experts Share Their Favorite Google Analytics Reports
In order to prioritize any sort of meaningful test, you must first understand:
1. Where Gaps Actually Exist Based On Real Data, Not Guesses or Perceptions
Above is a screenshot of what the first run experience of Patrick McKenzie’s Bingo Card Creator looks like.
“We can observe that there is a small drop-off in funnel completion between Dashboard and Create list (95.9% of users having reached Dashboard will successfully create their word list), but there are significant leaks between “Create list”, “Customize” and “Schedule print”. This is where I focus my efforts as a UX designer, as it is likely I can achieve big wins there…”
Breaking the overall conversion (downloads) into a series of steps like this helped him to identify where the smaller leaks in the funnel were, allowing him to make incremental improvements to one step of the funnel at a time – that is, if he chose to run tests on this funnel first.
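If you want to do the same breakdown on your own funnel, here is a minimal sketch. The step names come from the quote above, but the visitor counts are hypothetical (only the 95.9% Dashboard to Create list rate is from the source), so swap in your own analytics numbers.

```python
# Hypothetical visitor counts per funnel step. Only the step names and the
# 95.9% first-step rate come from the quote; the rest is made up.
funnel = [
    ("Dashboard", 1000),
    ("Create list", 959),
    ("Customize", 600),
    ("Schedule print", 300),
]

def step_conversions(steps):
    """Return (from_step, to_step, rate) for each consecutive pair of steps."""
    rates = []
    for (prev_name, prev_count), (name, count) in zip(steps, steps[1:]):
        rates.append((prev_name, name, count / prev_count))
    return rates

rates = step_conversions(funnel)
for src, dst, rate in rates:
    print(f"{src} -> {dst}: {rate:.1%} continue, {1 - rate:.1%} drop off")

# The transition with the lowest continuation rate is the biggest leak.
worst = min(rates, key=lambda r: r[2])
print("Biggest leak:", worst[0], "->", worst[1])
```

With these made-up numbers the biggest leak is between Customize and Schedule print, which is where you would look first.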
Taking a “big picture” approach & looking at the rest of his data, he may have also found room for improvement in:
- Customer referrals
- High traffic/high bounce pages
- High traffic/low speed pages
- Specific countries
Knowing what to test first depends on how well you know your data & understand where you’ll get the biggest gains with the least amount of effort. If other areas have potential for bigger gains & take fewer resources, those areas should get higher priority.
This is also what I mean when I say “knowing your data”: it’s not just a matter of looking at everything from the surface level, but really striving to understand the different kinds of interactions that happen on your site.
Had Patrick said “I want to get more downloads, I’ll test my headlines!” he may have very easily broken things that were working for him and seen no actual gains.
And yes, it happens all the time. According to VWO, 30% of the first tests run through their platform are calls to action followed by 20% of tests being headlines.
This wouldn’t be so bad if there weren’t also an underwhelming amount of research being performed. The majority of tests only had one hour invested from the initial research to making the test go live.
The problem with so little time being invested in the research process is that, even if a test produces a significant outcome, there’s usually no understanding of why it was successful. We’re talking about persuading people & understanding the desires, needs & emotions that drive them to spend their money… It takes more than an hour, folks.
Tell me honestly, how much time do you think I should invest in understanding what makes you want to buy?
2. You Must Understand How Much Traffic & Time It Will Take For A Test To Achieve A Statistically Significant Outcome
Peep wrote an article that I think every marketer who wants to be taken seriously must read. In it, he explains that the #1 A/B testing mistake he sees is people calling their tests too early.
In it he says,
“You don’t want to make conclusions based on a small sample size. Go for at least 250 conversions per variation. It’ll be more accurate if it’s 350-400 conversions per variation. And if you want to segment results, you need thousands of conversions per variation in order to have 350+ conversions per variation within a segment.”
What I’ve always found interesting about this is that it’s not trying to get a winner as quickly as possible, but rather taking an approach where you can know with at least 95% confidence that the winner is actually a winner.
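To make “95% confidence” a bit more concrete, here is a rough sketch using a pooled two-proportion z-test. This is a textbook approximation I’m swapping in for illustration, not necessarily what Peep or your testing tool computes, and the function name and sample numbers are hypothetical. It returns one minus the one-sided p-value.

```python
import math

def confidence_b_beats_a(conv_a, n_a, conv_b, n_b):
    """One minus the one-sided p-value of a pooled two-proportion z-test.

    conv_* are conversion counts, n_* are visitor counts per variation.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Standard normal CDF evaluated at z.
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Hypothetical test: 250 vs 300 conversions on 10,000 visitors each.
result = confidence_b_beats_a(250, 10_000, 300, 10_000)
print(f"{result:.1%}")
```

The normal approximation behind this gets shaky with small samples, which is one more reason not to call a test early.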
Moreover, Peep also recommends running tests for at least 4 full weeks, and checking the Conversions Per Day Of The Week report in order to make sure you’re not skewing your results.
So check your analytics, and get a sense of how long it will take you to get at least 250 conversions per variation, then work backwards from there.
If for some reason getting to 250 conversions/variation is going to take too long, focus your attention on usability testing, gathering qualitative feedback, and understanding your visitors & their problems. You can still do conversion optimization with little traffic; you just need to focus differently.
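Working backwards from your analytics can be sketched in a few lines. This assumes an even traffic split and a steady conversion rate, and it defaults to the 250 conversions per variation and 4 full weeks mentioned above. The function name and the parameters are mine, so adjust them to whatever floor you settle on.

```python
import math

def days_to_reach(conversions_per_day, variations=2,
                  target_per_variation=250, min_days=28):
    """Estimate test length in days, rounded up to whole weeks.

    Assumes traffic is split evenly and conversions arrive at a steady rate.
    """
    total_needed = target_per_variation * variations
    raw_days = math.ceil(total_needed / conversions_per_day)
    days = max(raw_days, min_days)
    # Round up to full weeks so every weekday is represented equally.
    return math.ceil(days / 7) * 7

print(days_to_reach(10))  # 500 conversions at 10/day: 50 days, rounded to 56
print(days_to_reach(40))  # 13 days of traffic, but held to the 28-day minimum
```

If the answer comes back as several months, that is your cue to lean on usability testing and qualitative feedback instead.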
Now, assuming you understand your data & how long it’s going to take to get a significant result (95% chance to win), here are 4 frameworks you can use to prioritize your tests.
Method 1 – CXL’s PXL Framework
While there are many frameworks out there, they all have one problem: subjectivity in their scoring. We wanted a more objective, empirical way to prioritize tests. We wanted to force people to bring data to the table. That’s why we created the PXL Framework:
Grab your own copy of this spreadsheet template here. Just click File > Make a Copy to have your own customizable spreadsheet.
This framework brings 3 big benefits:
- It makes any “potential” or “impact” rating more objective
- It helps to foster a data-informed culture
- It makes “ease of implementation” rating more objective
Instead of guessing what the impact might be, this framework asks you a set of questions about it.
- Is the change above the fold? → Changes above the fold are noticed by more people, thus increasing the likelihood of the test having an impact
- Is the change noticeable in under 5 seconds? → Show a group of people control and then variation(s), can they tell the difference after seeing it for 5 seconds? If not, it’s likely to have less impact
- Does it add or remove anything? → Bigger changes like removing distractions or adding key information tend to have more impact
- Does the test run on high traffic pages? → Relative improvement on a high traffic page results in more absolute dollars.
We’ve seen the power of solid conversion research, so many of the variables specifically require you to bring data to the table to prioritize your hypotheses.
- Is it addressing an issue discovered via user testing?
- Is it addressing an issue discovered via qualitative feedback (surveys, polls, interviews)?
- Is the hypothesis supported by mouse tracking heat maps or eye tracking?
- Is it addressing insights found via digital analytics?
Having weekly discussions on tests, with these 4 questions asked of everyone, will quickly make people stop relying on just opinions.
Then we also put bounds on Ease of implementation by bracketing answers according to the estimated time. Ideally you’d have a test developer be part of prioritization discussions.
We made this under the assumption of a binary scale – you have to choose one or the other. So for most variables (unless otherwise noted), you choose either a 0 or a 1.
But we also wanted to weight certain variables because of their importance – how noticeable the change is, if something is added/removed, ease of implementation. So on these variables, we specifically say how things change. For instance, on the Noticeability of the Change variable, you either mark it a 2 or a 0.
We built this model with the belief that you can and should customize the variables based on what matters to your business.
For example, maybe you’re operating in tandem with a branding or user experience team, and it’s very important that the hypothesis conforms to brand guidelines. Add it as a variable.
Maybe you’re at a startup whose acquisition engine is fueled primarily by SEO. Maybe your funding depends on that stream of customers. So you could add a category like, “doesn’t interfere with SEO,” which might alter some headline or copy tests.
Point is, all organizations operate under different assumptions, but by customizing the template, you can account for them, and optimize your optimization program.
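If you’d rather prototype the scoring outside a spreadsheet, here is a rough sketch of the idea. The article only confirms that most variables are 0 or 1 and that noticeability is 2 or 0. The other weights, the time brackets, the variable names and the example ideas are all assumptions for illustration, not the actual template.

```python
# Hypothetical weights: only "most variables are 0/1" and "noticeability is
# 2 or 0" come from the article. Everything else here is an assumption.
WEIGHTS = {
    "above_fold": 1,
    "noticeable_in_5s": 2,
    "adds_or_removes": 2,  # assumed weight
    "high_traffic_page": 1,
    "user_testing": 1,
    "qualitative_feedback": 1,
    "heatmaps_or_eye_tracking": 1,
    "digital_analytics": 1,
}

def ease_score(estimated_hours):
    """Bracket implementation time into a score (brackets are made up)."""
    if estimated_hours < 4:
        return 3
    if estimated_hours <= 8:
        return 2
    if estimated_hours <= 16:
        return 1
    return 0

def pxl_score(answers, estimated_hours):
    """Sum the weight of every variable answered True, plus the ease score."""
    evidence = sum(w for name, w in WEIGHTS.items() if answers.get(name, False))
    return evidence + ease_score(estimated_hours)

# Hypothetical test ideas: (answers, estimated hours to implement).
ideas = {
    "Rewrite hero headline": ({"above_fold": True, "noticeable_in_5s": True,
                               "high_traffic_page": True,
                               "qualitative_feedback": True}, 2),
    "Tweak footer link color": ({"digital_analytics": True}, 1),
}
ranked = sorted(ideas, key=lambda k: pxl_score(*ideas[k]), reverse=True)
for name in ranked:
    print(pxl_score(*ideas[name]), name)
```

Customizing for your business is then just a matter of adding an entry to `WEIGHTS`, such as a hypothetical `"conforms_to_brand": 1`.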
Method 2 – Chris Goward’s P.I.E Framework
The WiderFunnel framework is about evaluating three different factors & assigning each a score from 1 to 10.
“How much improvement can be made on the pages?”
Chris recognizes that all pages have room for improvement, but that you should be prioritizing your worst performers, taking into account analytics, customer feedback & expert heuristic analysis of user scenarios.
So, for example, I may take a look at the content landing page report, and identify the high traffic pages that get few conversions.
Looking at the report, I can find the pages where the majority of the visitors:
- Don’t stay for very long
- Don’t convert very well
From here, I can begin to narrow down which pages become a better testing priority.
Chris asks, “how valuable is the traffic to these pages?” then explains “Your most important pages are the ones with the highest volume & the costliest traffic.” In other words, if the page is performing terribly, but there isn’t a high volume of traffic, or that traffic isn’t terribly expensive, then it’s not a testing priority.
But what if the majority of traffic is coming from organic search?
In this case, I would highly recommend assigning higher scores to the pages that are closely associated to what your core offering is. For example, we have articles that rank very well on the topics of value propositions, customer lifetime value, customer personas & social media marketing.
As an agency that focuses on clear, persuasive design, our content on value propositions & customer personas might take higher priority over social media marketing & customer lifetime value, as those aren’t what we specifically specialize in.
Later, we might choose to work out a partnership with another service provider, or do affiliate sales through other pages, but that would not come close to being a first priority.
“How complicated will the test be to implement on the page or template?
… A page that would be technically easy may have many stakeholders or vested interests that can cause barriers. I’m looking at you, home page.”
Internal politics kill conversions more often than anything else, so it’s absolutely critical to keep in mind on which pages you’ll meet the most resistance internally.
It’s also important to realize that not all changes are as easy as they seem. For instance, 99 high quality images from this site could end up costing you $1,980. Financially, that may not seem like much, but what happens when a single product page has 4-5 different images?
How much effort will it take to coordinate the shoot? Does the photographer come to you? Do you have to ship every product to them? What happens if you don’t like the photos?
These are all factors that go into the “Ease” score. You may find out that including a few testimonials on your product page is easier to implement & more cost effective – for now.
How to Rank Your Pages
With each piece of the prioritization framework – potential, importance, and ease – assign a score between 1-10 to help you understand which pages/elements are going to be the most beneficial & easiest to implement at a glance.
There are no hard & fast rules about the pages or tests to prioritize, but the idea is that you’ll be able to quickly identify where your low hanging fruits are in order to run successful tests & eventually get more internal buy-in as more of your tests are successful over time.
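Here is a small sketch of what that ranking could look like. Note the assumption: the section above doesn’t say how to combine the three scores, so I’m simply averaging them. The pages and ratings are hypothetical.

```python
def pie_score(potential, importance, ease):
    """Average three 1-10 ratings into one score (averaging is an assumption)."""
    for value in (potential, importance, ease):
        if not 1 <= value <= 10:
            raise ValueError("each rating must be between 1 and 10")
    return round((potential + importance + ease) / 3, 1)

# Hypothetical pages with (potential, importance, ease) ratings.
pages = {
    "Home page": (8, 10, 3),
    "Pricing page": (7, 8, 8),
    "Blog post template": (5, 4, 9),
}
ranked = sorted(pages.items(), key=lambda item: pie_score(*item[1]), reverse=True)
for name, ratings in ranked:
    print(f"{pie_score(*ratings):>4}  {name}")
```

Notice how the home page drops below the pricing page here despite its importance, because its low ease score (hello, stakeholders) drags the average down.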
Method 3 – Sean Ellis’ F.U.D.W.M ?
In this article on the Optimizely blog, Sean breaks down the process a little further.
He recommends mining your data to find your biggest opportunities. Start by digging up:
- Your top 5 highest bounce rate pages
- Your top 5 highest abandonment points in your funnel
- The top 5 most valuable pages to your business
Once you have these pages, you should move on to step 2 & understand your visitors needs. Try to answer the four following questions:
- Why did they come to your website?
- What stopped them from converting?
- Did they find what they were looking for?
- If they did convert, what almost stopped them?
As you might imagine, a good portion of this can be implemented as an automatic feedback loop, and many insights can easily be collected with a tool like Qualaroo – if you’re asking the right questions at the right points in the funnel.
Sean then recommends moving on to Step 3, where you pick the pages you’re going to test. Personally, I like using the P.I.E framework in addition to this, so I can select pages that aren’t going to ruffle anyone’s feathers right off.
He also recommends doing big tests that are going after “bold, targeted changes” rather than “meek tests” so you can test for impact.
Step 4 is about designing your first 10 tests. Sean recommends starting with:
- 4 Message Tests
- 4 “A-Ha” Moment Tests
- 2 Large Scale Design Tests
Being the host of Page Fights, I can honestly say there is never enough clarity in the message. If the data is suggesting the clarity isn’t there, then these might be good starting points for you too. Sean said that many users didn’t know about a free trial on a product he was selling, proving it happens to everyone.
“A-Ha” Moment tests are at the point in the conversion funnel where once the user takes the action, they’re much more likely to be a valuable customer.
“For Qualaroo, Sean mentioned that visitors seeing their first set of survey results is very important for user retention and satisfaction. This could be watching an intro video, or something else entirely. Focus on getting your visitors to these moments faster.”
Step 5 is about measuring the outcomes and planning new tests as a result of your new findings. It’s also critical to note here that just because the page itself produced a lift, does not always mean that the test was ultimately successful.
For example, you may get a big lift for free trial signups, but if 95% of those customers end up churning, the test was ultimately unsuccessful.
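A quick back-of-the-envelope sketch shows why. All of the numbers below are hypothetical, including the control’s churn rate. The point is only that the “winning” variation can leave you with fewer customers once churn is counted.

```python
def retained_customers(visitors, signup_rate, churn_rate):
    """Customers still around after churn, per batch of visitors."""
    return visitors * signup_rate * (1 - churn_rate)

# Hypothetical numbers: the variation doubles signups but 95% of them churn.
control = retained_customers(10_000, signup_rate=0.02, churn_rate=0.60)
variation = retained_customers(10_000, signup_rate=0.04, churn_rate=0.95)

print(f"control keeps {control:.0f} customers, variation keeps {variation:.0f}")
```

In this made-up case the variation doubles trial signups and still ends up with a quarter of the retained customers.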
For more details on Sean’s method, check out the slideshow below.
Method 4 – Bryan Eisenberg’s Plan, Measure, Improve Framework
Finally, there’s Bryan Eisenberg‘s approach, which I really enjoy because for some reason, it encourages me to dig more into the human element, rather than looking at our visitor feedback as “qualitative data”.
Bryan encourages you to answer these three questions before you run your test:
- Who are we trying to persuade?
- What action do we want them to take?
- What action do they want to take?
What’s interesting about looking at it like this is that oftentimes you’ll find that the action you want the visitor to take, and the action your visitor wants to take, aren’t necessarily the same.
http://t.co/4MWtYGdOa7 Key content requirements for product pages are: answer users’ questions, be direct, and help with product comparison.
— Peep Laja (@peeplaja) August 25, 2014
So like in the previous testing frameworks, the idea is not just to throw stuff out there, but rather to get as much feedback as possible from actual users, then respond with that feedback on whatever page or funnel you’re trying to improve.
The next part of the process is to design the test, and define its parameters for success. Bryan recommends asking three more questions:
- What action do we want them to take & how do we measure it?
- What page(s) do we test?
- Where/How do we judge success?
So, for example, you may have noticed in the past that many of your existing customers say they signed up because of your explainer video, so the goal of your test may be to get more people clicking to watch.
You’d set up click tracking as an event in Google Analytics, then create a segment of video watchers/converters to monitor for a lift.
To get more people watching, you may try testing the video’s thumbnails, and you’d judge success by whether or not there was a lift in the conversions that come from video watchers.
Once you have this data, the next step is to improve upon it. So for example, if there are more people watching the video and converting, what does the video cover that the page does not?
If that “A-Ha” moment in the video is due to a testimonial, or success guarantee, is that properly reflected on the page so it’s clear to the non-video watchers too?
Now that you have the data, what can you learn from it to make even bigger improvements?
Prioritizing Your Tests Using T.I.R
Time – How many calendar days, man hours, development hours etc. will be necessary for this test to have its maximum impact? “A score of 5 would be given to a project that takes the minimal amount of time to execute and to realize the impact.”
Impact – The amount of revenue (or reduced costs) that will change in the event of a successful test. Are you testing on the whole customer base, or just a segment? Are you looking at a 1% increase or 20%?
Resources – How much are the tools, people, and everything else associated with this test going to cost? “A score of 5 is given when resources needed are few and are available for the project.”
Bryan then recommends multiplying all of these scores together & starting work on the projects that have the highest scores, as those will have the most lift with the least amount of time & resources.
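The multiplication is easy to sketch. I’m assuming a 1 to 5 scale for each factor, since the quotes above only describe what a 5 means, and the projects and ratings are hypothetical.

```python
def tir_score(time, impact, resources):
    """Multiply three ratings; the 1-5 range is an assumption.

    Higher is better on every axis: faster, bigger impact, cheaper.
    """
    for value in (time, impact, resources):
        if not 1 <= value <= 5:
            raise ValueError("each rating must be between 1 and 5")
    return time * impact * resources

# Hypothetical projects with (time, impact, resources) ratings.
projects = {
    "Checkout redesign": (2, 5, 2),
    "Video thumbnail test": (5, 3, 4),
    "Pricing copy test": (4, 4, 4),
}
ranked = sorted(projects, key=lambda name: tir_score(*projects[name]), reverse=True)
for name in ranked:
    print(tir_score(*projects[name]), name)
```

Because the scores are multiplied rather than added, one very low rating sinks a project, which is why the high-impact checkout redesign lands last here.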
So, even though the MECLabs/Magento study shows that the majority of businesses don’t have any sort of analytics or testing process in place, I hope that you, my favorite CXL reader, are now in a better place to get a step ahead.
But, I’d also love to hear the challenges you face internally when it comes to having a testing process adopted. Let’s kick off the conversation in the comments & see if we can help each other out.