When it comes to conversion rate optimization (CRO), the difference between success and failure often comes down to one thing above all else: strategy.
Teams that have a clear, focused CRO strategy are able to build momentum and deliver strong return on investment (ROI) over the long term.
Teams that don’t – who focus on quick, tactical wins and low-hanging fruit – will find that their results begin to dry up pretty quickly.
In this article, we’re going to show you how to build a powerful strategy for your CRO program that will keep generating results for years to come.
Table of contents
- Part one: Some prerequisites for success
- Part two: The scientific method applied to CRO
- 1. Questions: Formulating research questions and matching them to the methodology.
- 2. Research and observations: Conduct your research and document your observations.
- 3. Hypothesis: Proposing an explanation or prediction.
- 4. Experiment: Conducting controlled tests to validate the hypothesis.
- 5. Analysis: Interpreting the data from experiments.
- 6. Conclusion: Drawing conclusions that support or refute the hypothesis.
- 7. Reporting: Sharing results
- Final thoughts: Measuring success
Part one: Some prerequisites for success
Conversion rate optimization is a systematic approach to improving a website’s performance through iterative experimentation.
But experimentation is, in fact, only one step in the scientific method. If you’re going to build a sound CRO strategy for your program, you need a plan that accounts for each step of that method, both individually and in coordination with the others.
As such, the bulk of this article is going to be about the scientific process, how it applies specifically to business experimentation, and how you can use it to drive consistent results for your program.
Before diving any deeper, though, we need to lay down a few foundations to ensure that your CRO program is optimally positioned for long-term success.
1. First, set your SMART goal
The very first thing you need to do, before you even think about running any experiments, is decide on a goal for your experimentation program. Your program’s goal should be two things:
- SMART: SMART is an acronym for specific, measurable, achievable, realistic, and timely. A SMART goal would be something like, “Increase revenue by 4% by the end of August 2025,” whereas “Increase website conversions” would not be.
- Ladder up to your business needs: Your business no doubt has some well-defined revenue or profit targets. Your program’s goal should ladder up to these overarching business targets.
By setting a clear goal for your program, you will have a much more focused experimentation program that stands a strong chance of achieving a good return on investment (ROI). This will ultimately mean that the business is more likely to invest in experimentation in the future.
Once you’ve set your goal, you can split your experimentation efforts between ‘earn’ and ‘learn’ experiments.
Your earn experiments will always ladder up to your SMART goal. Their role is to drive that ROI and get winners in the bag, increasing the likelihood of buy-in for learn experiments along the way.
Learn experiments may also ladder up to your SMART goal, but they are primarily there to drive innovation and answer big business questions. They are often things the business wants to do anyway but should test first – new products, prices, and plans. Their role is to get people invested in experimentation beyond ROI.
Now that you have your SMART goal and you’ve split out your experimentation program into learn and earn arms, it’s time to think about where you will store all your experiments and the insights you gain from them.
2. Setting up your experimentation database
A database for your experiments is important for two main reasons: it enables strategic prioritization of your experiments, and it future-proofs your experimentation program.
You should have a single tool that is easily accessible and that stores all of your experiment information and learnings. Once you’ve set this up, you should then utilize taxonomies to bring consistency to your data storage. Taxonomies are systems of categorization. They are the lens through which macro-level patterns in your data begin to emerge. To a large extent, the effectiveness of a repository is dependent upon the effectiveness of the taxonomies it uses.
Here are some of the most important taxonomies you should implement upfront:
- Industry [More for agencies than in-house]
- Experiment number
- Hypothesis
- Execution
- Area
- Risk profile
- Build size
- KPI
- Outcome
- Result
- Key learning
And if you want to get more advanced, you can add in:
- Psychological principles: We took the skeleton of our psychological principles taxonomy from the book ‘Smart Persuasion’. In essence, we define each experiment based on the psychological principle that is in play in that experiment.
- Our Levers Framework: A lever, as we define it, is any feature of the user experience that influences user behavior. We utilize levers to bring a cohesive understanding to our insights and observations and to the hypotheses we’re testing, and to give structure to the explore, exploit, and fold methodology. It’s a bigger subject than this post could capture, so here’s our whitepaper and webinar about the Levers Framework–feel free to take it and apply it to your own repository.
With a consistent database built on effective taxonomies, you will be able to categorize your experiments consistently. This will ultimately allow you to filter experiments and learnings to extract powerful macro insights that fuel your roadmap and drive outsized ROI. Eventually, you could also utilize this same database to run machine learning.
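To make this concrete, here’s a minimal sketch of what a single record in such a repository might look like, written as a Python data structure. The field names and example values are purely illustrative – the actual tool (a spreadsheet, Airtable, or a database) matters far less than the consistency of the taxonomy:

```python
from dataclasses import dataclass

# Hypothetical record structure mirroring the taxonomies listed above.
# Field names and allowed values are illustrative - adapt them to your own program.
@dataclass
class ExperimentRecord:
    experiment_number: str      # e.g. "EXP-042"
    hypothesis: str             # the full hypothesis statement
    execution: str              # what was actually built/changed
    area: str                   # e.g. "PDP", "PLP", "checkout", "homepage"
    risk_profile: str           # "low" | "medium" | "high"
    build_size: str             # "small" | "medium" | "large"
    kpi: str                    # primary KPI, e.g. "transaction rate"
    outcome: str                # "winner" | "loser" | "inconclusive"
    result: float               # measured uplift on the primary KPI, e.g. 0.04 = +4%
    key_learning: str           # the single most important takeaway
    psychological_principle: str = ""  # optional advanced taxonomy
    lever: str = ""                    # optional advanced taxonomy

example = ExperimentRecord(
    experiment_number="EXP-042",
    hypothesis="Adding Trustpilot reviews to landing pages will increase sales",
    execution="Review carousel added above the fold",
    area="landing page",
    risk_profile="low",
    build_size="small",
    kpi="transaction rate",
    outcome="winner",
    result=0.04,
    key_learning="Users lack trust in the brand; social proof addresses it",
    psychological_principle="social proof",
    lever="trust",
)
```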
By using these tags, you can look at the bigger questions surrounding your experimentation program, like:
Does building bigger experiments mean we have more winners?
Using your ‘build size’ and ‘outcome’ tags, you can answer this question pretty quickly. If the answer is no, you can readjust your concepts to focus on the Minimum Viable Experiment (MVE), reducing design and dev costs.
Which page is the most valuable to test on?
Using your ‘outcome,’ ‘result,’ and ‘area’ tags here will give you what you need. The area should tell you where the test happened. Your result will give you the uplift so that you can work out the average uplift for that page. Using the outcome, you can then work out the win rate for each page. Once you know this, you may choose to double down on a certain page to reap more rewards. Alternatively, you could also choose to fold an area that is not worth testing at this time.
Should we be taking more risks, or are low-risk iterative changes better?
Using your ‘risk profile’ tag coupled with your ‘outcome’ and ‘result’ tags, you can see whether high, medium, or low-risk experiments have the best win rate and which have the highest average uplift. This is key for pivoting your CRO strategy at the right time. If you are consistently running low-risk experiments and seeing flat results time and time again, it could be time to adjust your risk profile.
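As a rough illustration of how these tag-level questions can be answered, here’s a sketch assuming your repository can be exported as a CSV with columns matching the taxonomies above (the file and column names are hypothetical):

```python
import pandas as pd

# Assumes an export of your experiment repository with one row per experiment
# and columns matching your taxonomies (names here are hypothetical).
df = pd.read_csv("experiment_repository.csv")

df["is_winner"] = df["outcome"].eq("winner")

# 1. Does building bigger experiments mean we have more winners?
win_rate_by_build = df.groupby("build_size")["is_winner"].mean()

# 2. Which page (area) is the most valuable to test on?
by_area = df.groupby("area").agg(
    win_rate=("is_winner", "mean"),
    avg_uplift=("result", "mean"),
    experiments=("experiment_number", "count"),
)

# 3. Should we be taking more risks, or are low-risk iterative changes better?
by_risk = df.groupby("risk_profile").agg(
    win_rate=("is_winner", "mean"),
    avg_uplift=("result", "mean"),
)

print(win_rate_by_build, by_area, by_risk, sep="\n\n")
```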
Building an effective experiment repository is quite a complex task–but the long-term rewards are well worth it. We’ve only really been able to offer up a sketch of how to do this here, but if you’d like to dive into the weeds, you can read our in-depth blog post on the subject.
Part two: The scientific method applied to CRO
Developed and refined by a number of influential scientists over many years, “The scientific method” was initially intended as a means of unearthing truths about the natural world.
Since then, many of the world’s leading businesses–including Amazon, Netflix, Spotify, and more–have used this powerful methodology to unearth insights about their users and build world-class products and user experiences.
The scientific method follows a simple lifecycle:
- Questions: Formulating research questions and matching them to the methodology;
- Research and observations: Conduct your research and document your observations;
- Hypothesis: Proposing an explanation or prediction;
- Experiment: Conducting controlled tests to validate the hypothesis;
- Analysis: Interpreting the data from experiments;
- Conclusion & reporting: Drawing conclusions that support or refute the hypothesis and sharing results, then iterating.
Image created by Steph Le Prevost, 2024.
Throughout the remainder of this piece, we’ll show you how to apply this very same scientific thinking to your own experimentation program.
1. Questions: Formulating research questions and matching them to the methodology.
This is the research design phase of your scientific lifecycle. It is about gathering preliminary research that will eventually be used to inform your hypotheses and experiments.
Here, we recommend starting with the questions you want to answer and then mapping those questions to the best research methods.
For example, in the table below, we’ve mapped these initial research questions to various research types. Here you can see that several questions map to user testing, which immediately tells you that you should prioritize user testing as a research method.
| Question | Research type |
| --- | --- |
| What barriers do users experience on the signup flow? | User testing |
| How do users perceive the cost of the product? | Customer interviews or user testing |
| Why do users choose to purchase from us? | Survey |
| Do users understand our offering from the homepage? | 5-second test or user testing |
| What percentage of users drop off on the basket? | Analytics |
By coming up with your research questions first and mapping the appropriate methodologies to them, you will avoid the trap of data hunting in analytics–i.e., searching for data to support your hunches.
From here, you can conduct your research and write down your observations and insights.
2. Research and observations: Conduct your research and document your observations.
Just as with your experiment database, you should have a systematic approach to collecting and storing your research observations and insights. This is important as it will allow for easy formulation of future questions and hypotheses. The following taxonomy forms the basis for a methodical way of capturing and utilizing research data (a minimal record sketch follows the list):
- Learning: The learning is an insight that has arisen from one or more observations, e.g. if your observation was ‘50% of survey respondents said they did not trust the brand’, then your learning would be something like ‘users lack trust in the brand.’
- Observation: This is the specific thing you observed, e.g. users struggled to proceed through the flow; or else something like a direct quote from a survey or user testing participant.
- Barrier/Motivation: Does the observation and learning relate to something that is blocking the user from converting, or is it motivating them?
- Area: Which page on the website did you collect this data from? E.g. PDP, PLP, checkout, etc. This is important to collect as you can start to understand where most of your learnings are focused. For example, once you have gathered all your research learnings, you may observe that 70% of them are centered around the product page. This tells you immediately that there is an opportunity to optimize in this space.
- Source: What type of research did you use to generate this insight? E.g. competitor analysis or user testing. This matters because not all research methods are equal in the value they bring to an experimentation program.
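Here’s the minimal sketch mentioned above of what a single research record might look like, using the same hypothetical Python structure as the experiment repository (field names and values are illustrative):

```python
from dataclasses import dataclass

# Hypothetical structure for a single research observation, mirroring the
# taxonomy above. Field names and values are illustrative.
@dataclass
class ResearchObservation:
    observation: str            # the specific thing you observed
    learning: str               # the insight arising from it
    barrier_or_motivation: str  # "barrier" | "motivation"
    area: str                   # e.g. "PDP", "checkout", "homepage"
    source: str                 # e.g. "survey", "user testing", "analytics"

obs = ResearchObservation(
    observation="50% of survey respondents said they did not trust the brand",
    learning="Users lack trust in the brand",
    barrier_or_motivation="barrier",
    area="homepage",
    source="survey",
)
```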
3. Hypothesis: Proposing an explanation or prediction.
Once you’ve gathered your initial research data, the next step is to start forming your hypotheses for experimentation. A hypothesis is essentially a prediction, based on your research, that you are going to use an experiment to try and validate.
For example, let’s say a learning from your research was ‘users do not trust our brand.’ Then, a hypothesis you may want to test could be something along the lines of ‘by adding social proof to our landing pages in the form of Trustpilot reviews, we can increase the number of sales.’
Before charging ahead, though, it’s important to emphasize that you should always use a framework when writing your hypotheses–after all, the experiment is only as strong as the hypothesis, so getting this bit right is important. Everyone doing experimentation in your organization should be using the same framework to bring consistency and trust to your program.
Image created by Conversion.com, 2020.
For us, the above hypothesis framework works so well because it brings every element of the test together in one place. We’ll talk through each of the elements one by one, starting with research:
Quantitative and qualitative data is the first element in the framework. This will be centered around research we’ve collected in the previous steps in our cycle, and it will focus on why we are running the experiment in the first place. In essence: what data do we have that suggests this experiment is a good idea?
The next part separates the lever and concept. This distinction is crucial. A lever is the core theme for a test (e.g., “emphasizing urgency”), whereas the concept is the application of that lever to a specific area (e.g., “showing the number of available rooms on the hotel page”). It’s important to make the distinction as it affects what happens after a test is completed. If the experiment wins, you can apply the same lever to other areas, and look to iterate with bolder executions. If it loses, then it’s important to question whether the lever or the concept was at fault: was the concept misunderstood by users, or does the lever not resonate with them? More on this in the ‘Conclusion’ section.
The next part of the framework looks at selecting your success criteria upfront. The primary KPI of the experiment should be defined here. This is the KPI from which you calculate your Minimum Detectable Effect (MDE) and decide on your stopping protocol. The duration is then set by your MDE calculation, as sketched below. Reading in depth about statistics for A/B testing can help you establish the best stopping protocol.
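Here’s the sketch referred to above: a rough way to turn a baseline conversion rate, an MDE, and your traffic into a required sample size and duration, using the standard normal-approximation formula for a two-proportion test. All input numbers are hypothetical, and your experimentation tool’s own calculator may use slightly different assumptions:

```python
from math import ceil, sqrt
from scipy.stats import norm

def sample_size_per_variant(baseline_cr, mde_relative, alpha=0.05, power=0.8):
    """Approximate sample size per variant for a two-sided two-proportion z-test."""
    p1 = baseline_cr
    p2 = baseline_cr * (1 + mde_relative)
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p2 - p1) ** 2)

# Hypothetical inputs: 3% baseline conversion rate, 10% relative MDE,
# 8,000 eligible visitors per day split 50:50 across control and variant.
n_per_variant = sample_size_per_variant(baseline_cr=0.03, mde_relative=0.10)
daily_visitors_per_variant = 8_000 / 2
duration_days = ceil(n_per_variant / daily_visitors_per_variant)

print(f"Sample size per variant: {n_per_variant:,}")
print(f"Estimated duration: {duration_days} days")
```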
Between the hypothesis and go-live stages, you need to prioritize your experiments. There are a number of prioritization methodologies you can use to do this; the important part is that you remove as much bias as possible and stick to the same method (see the sketch below). Prioritization is an extremely important topic within experimentation, but, again, it’s slightly beyond the scope of this piece. If you want to read more about prioritization, you can do so here.
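The sketch below uses ICE scoring (impact × confidence × ease) purely as an example – it is one of several common frameworks, and the backlog items and scores are made up. The point is simply that every idea gets scored the same way, and the ranking, not gut feel, drives the order:

```python
# Hypothetical backlog scored with ICE (Impact, Confidence, Ease), each on a 1-10 scale.
# Any prioritization framework works; what matters is scoring every idea the same way.
backlog = [
    {"idea": "Add Trustpilot reviews to landing pages", "impact": 8, "confidence": 7, "ease": 6},
    {"idea": "Simplify signup flow to two steps",        "impact": 9, "confidence": 5, "ease": 3},
    {"idea": "Show delivery costs earlier in checkout",  "impact": 6, "confidence": 8, "ease": 8},
]

for item in backlog:
    item["ice_score"] = item["impact"] * item["confidence"] * item["ease"]

# Rank the backlog by score, highest first.
for item in sorted(backlog, key=lambda x: x["ice_score"], reverse=True):
    print(f'{item["ice_score"]:>4}  {item["idea"]}')
```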
4. Experiment: Conducting controlled tests to validate the hypothesis.
This is the part where you actually get to test the hypothesis! There is a lot we could talk about here, but mainly I want to call out a few things to watch out for while your experiment is running.
‘Peeking’ is when you look at your results and make a judgment call about them too early. Whatever testing methodology you are using, you should stick to your test plan. If you use the fixed-horizon method, then pause when you are due to. Pausing your test too early likely means your results will be undersampled, so your outcome will not be valid. If you are tempted to peek, a sequential testing methodology could be for you. Merrit Aho’s post “Peeking at your data? How to avoid false positives in sequential testing” from the CXL blog is worth a read.
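To see why peeking is so risky, here’s a small simulation sketch (all parameters are made up). It runs many A/A tests – where there is no real difference between the groups – and compares the false positive rate of a single fixed-horizon check against the rate you get by declaring a winner the first time any interim look reaches significance:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(42)

# Simulate A/A tests (no true difference) to show how interim "peeks" inflate
# the false positive rate. All parameters below are illustrative.
n_sims, n_per_arm, alpha = 2_000, 10_000, 0.05
peek_points = [2_000, 4_000, 6_000, 8_000, 10_000]

def two_prop_pvalue(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for a two-proportion z-test (pooled, normal approximation)."""
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = (p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b)) ** 0.5
    if se == 0:
        return 1.0
    z = (conv_a / n_a - conv_b / n_b) / se
    return 2 * (1 - norm.cdf(abs(z)))

fixed_fp = peeking_fp = 0
for _ in range(n_sims):
    a = rng.binomial(1, 0.05, n_per_arm)  # both arms convert at 5%
    b = rng.binomial(1, 0.05, n_per_arm)
    # Fixed horizon: a single look at the planned sample size.
    if two_prop_pvalue(a.sum(), n_per_arm, b.sum(), n_per_arm) < alpha:
        fixed_fp += 1
    # Peeking: stop as soon as any interim look is "significant".
    if any(two_prop_pvalue(a[:n].sum(), n, b[:n].sum(), n) < alpha for n in peek_points):
        peeking_fp += 1

print(f"False positive rate, fixed horizon: {fixed_fp / n_sims:.1%}")
print(f"False positive rate, with peeking:  {peeking_fp / n_sims:.1%}")
```

The fixed-horizon check should stay close to the nominal 5% false positive rate, while the peeking approach will produce noticeably more false ‘winners’ – despite there being no real difference to find.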
A sample ratio mismatch (SRM) happens when the actual sample sizes of your test groups differ significantly from the expected ratio. For example, you set your test to a 50:50 split between your control and variant but somehow end up with 70:30. An SRM can indicate an issue with your setup, the randomization process, data collection, or outside factors affecting the distribution of traffic. It can affect the validity of your test results, so it’s important to look out for. Most experimentation tools now have a check in place for this; however, if yours doesn’t, you can use the free online checker developed by Lukas Vermeer.
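If you want to check for SRM yourself, a chi-square goodness-of-fit test is one common way to do it – here’s a minimal sketch with hypothetical visitor counts:

```python
from scipy.stats import chisquare

# Hypothetical visitor counts for a test configured as a 50:50 split.
observed = [48_710, 50_312]               # control, variant
total = sum(observed)
expected = [total * 0.5, total * 0.5]     # what a true 50:50 split would give

stat, p_value = chisquare(f_obs=observed, f_exp=expected)
print(f"chi-square = {stat:.2f}, p = {p_value:.4f}")

# A very small p-value (p < 0.01 is a commonly used threshold for SRM checks)
# suggests the observed split is unlikely under the configured ratio -
# investigate the setup before trusting the results.
if p_value < 0.01:
    print("Possible sample ratio mismatch - check your setup.")
```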
5. Analysis: Interpreting the data from experiments.
You should have decided upon your stopping protocol before you launched your experiment, so you’ll already know whether your KPIs are statistically significant and if you have a loser, winner, or an inconclusive result.
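As a minimal sketch of that check for a conversion-rate KPI – assuming a simple fixed-horizon, two-proportion test with hypothetical counts; your own stopping protocol may differ:

```python
from statsmodels.stats.proportion import proportions_ztest

# Hypothetical results: conversions and visitors for control vs. variant.
conversions = [1_520, 1_640]
visitors = [30_100, 30_050]

z_stat, p_value = proportions_ztest(count=conversions, nobs=visitors)
alpha = 0.05  # set by your pre-registered significance threshold

print(f"z = {z_stat:.2f}, p = {p_value:.4f}")
print("Statistically significant" if p_value < alpha else "Inconclusive at this threshold")
```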
People often make the mistake of digging too deep into inconclusive results. Ultimately, they do not hold enough evidence for us to glean real insights from, and it’s a dangerous game to play if you rely on them too much. If your significance threshold is set at 90-99%, you are effectively accepting a false positive risk of 10% down to 1%. But if you then start reading into results at 30%, 40%, or even 80% significance, you open yourself up to much more risk of false results. Interpreting trends needs to be done carefully and with a clear understanding of statistical significance.
The next step is to draw a conclusion: did your test win, lose, or was it inconclusive?
6. Conclusion: Drawing conclusions that support or refute the hypothesis.
Given all of the preliminary work you’ve done setting KPIs and defining goals, drawing conclusions from your experiment data will often be the easiest part of the entire process.
What’s important here is that you apply an ‘explore, exploit, fold’ methodology.
When you first start experimenting in a new space, you are in the explore phase: you explore completely new hypotheses, you gather new data, and you gain a novel understanding of what works and what doesn’t.
But once you’ve gathered some initial experimental data, you can now move onto the exploit phase, which is where you start exploiting the learnings from your previous tests to drive impact now.
In the case of a winner, you may look to exploit by experimenting with the same concept in different areas or by moving to a bolder execution.
In the case of a loser, you may choose to iterate now that you understand what does not work and can pivot toward what does. Or, if you have consistently seen losers, you may decide to fold.
Inconclusive results, meanwhile, tend to happen for one of three reasons:
- The execution is at fault: The changes weren’t the right ones. If it’s the first attempt, it might simply be a matter of execution, so we should consider alternative approaches based on the experiment data.
- The hypothesis is not right: If repeated attempts at executing this hypothesis fail, then the hypothesis itself might be the problem. We should check whether we’ve chosen the right lever to solve the problem.
- There is a data issue: If we’ve tested a few different levers to tackle a problem identified through data, the data itself might be the issue. The problem might not be as big as we thought, or we may need more comprehensive research methods to verify our insights.
7. Reporting: Sharing results
It is important to share your experiment results and learnings widely with the business. Experimentation should not just be seen as a means to bring in revenue but as a way to answer big business questions and de-risk decisions.
Reporting is often a crucial means of gaining buy-in for your program–the kind of buy-in that you will need to maximize your program’s impact.
With this in mind, we thought we’d share some of our top tips on how to gain buy-in from your stakeholders:
- Listen to them. Just as you conduct user research to understand your users’ motivations and barriers, seek to understand your stakeholders. This will not only benefit your communication but could also give you more context on the wider business, which will aid experimentation.
- Be prepared to push back. You are the specialist; you have the knowledge and the research, so utilize it. Get ready to debate concepts, discuss risk, and use your data. Know when to fight your corner and when to concede until a later date. This is something that is really only learned by doing.
- Shout about success; shout louder about failure. Experiments that win will likely be perceived as foregone conclusions: ‘Well, of course it won; why would we even test that?’ Whereas there is true power in a losing experiment: we know exactly what does not work and why. We can disprove long-standing assumptions and blow ‘guaranteed winners’ out of the water.
- Ask why. A lot of businesses function in a mindset where something is done a certain way because it has always been that way. Experimentation, at its core, challenges this mindset, so as experimenters, we need to get comfortable with doing that too. Get comfortable asking why things are the way they are, and challenging them – with data to back you up. Push the experimentation mindset and empower people to test and learn.
By building your reporting processes and presentations around the priorities and pain points that arise during these discussions, you should hopefully be able to increase the relevance and visibility of your program within your organization.
“Experimentation can only succeed in an organization if everyone supports it. But it can be challenging to get this support – especially from the C-suite. They’re used to getting their own way, without having to justify it. Turning ideas and plans into measurable experiments that are visible for all to see can make people uncomfortable. That’s because, with experimentation, it doesn’t matter if you’re the CEO or the guy making the tea – your ideas will be measured objectively.”
(Stephen Pavlovich, founder of Conversion)
Final thoughts: Measuring success
We’ve now covered the entire experiment cycle, from questions and research all the way through to drawing conclusions and reporting. As will hopefully be clear, once you’ve reached the reporting phase, you’re then in a position to return to the question phase–only this time, you’ll have new experiment-backed questions that you can use your research to address.
As you repeat this cycle, you’ll find that your research questions become more refined, that your research itself becomes more valuable, that your hypotheses become better informed–and that the entire CRO strategy and program progressively builds momentum and generates more and more impact.
But before signing off, it’s worth making one final point:
How do you know if your experimentation program is running well?
Measuring the success of your experimentation program comes down to three vital signs: Velocity, Volume, and Value.
- Velocity is the speed at which you get experiments from concept to live;
- Volume is the number of experiments you are running;
- Value is not only the revenue impact but the learning side of the program as well.
Velocity isn’t about skipping steps to speed up the process; it’s about being methodical, organized, and agile. You need to deliver your experiment quickly to A) Avoid the control changing whilst you’re working on the test and B) Learn at a decent rate to support the volume of tests you want to run.
Volume is important because the more you test, the more you learn, and the more impact you can have in a shorter amount of time. But remember: don’t just test for the sake of it – make sure what you put out is research-backed and makes sense with your CRO strategy and goal.
Value is all about what you and your stakeholders see as value from experimentation. Yes, you have your SMART goal so that’s the first thing to measure this against. But the second is much more holistic: experimentation is a mindset – it’s a tool to be used to help businesses grow. The likes of Netflix, Spotify, Amazon, and Booking.com all run massive numbers of experiments every day. They live and breathe the test and learn mantra–where failing fast is really seen as a good thing, and that has helped them greatly in their journey to success.
Understanding the bottom line impact of your program is important–but what’s equally important is evaluating the intangible benefits that experimentation brings to your business as a whole. The most effective experimentation businesses are those that allow experiment insights to drive decision-making across every area and stratum of the business–from minor design and copy decisions all the way up to huge business-critical decisions at the C-suite.
“Our success at Amazon is a function of how many experiments we do per year, per month, per week, per day.”
(Jeff Bezos, founder of Amazon)
“We don’t experiment because we like running experiments, but because experimentation is a great way to make sure that when we think we’re fixing something, we’re actually fixing it. Change is constant, we have to keep updating our products to make them better, but we also have to make sure those changes really work.”
(Lukas Vermeer, Director of Experimentation, Booking.com)
Ready to Level Up Your CRO Game?
Whether you’re just starting out with conversion optimization and experimentation or you’re an experienced professional looking to master advanced strategies, CXL is your ultimate resource.
From beginner-friendly courses that cover the fundamentals to in-depth programs designed to elevate your expertise, CXL offers industry-leading training led by top CRO practitioners. Start building the skills you need to drive measurable results and scale your experimentation program with confidence.