Imagine you’ve been working on optimizing a site for a while now, say 3, 6 or even 12 months.
You’ve had solid winners each month, and you’re confident in the test results. These are not imaginary lifts. But now your conversion rate looks the same as when you started. How do you explain this to the boss/client?
Another scenario: you’ve been optimizing for 12 months and your revenue per customer has increased by 2%. Same question: how can you justify your contribution? How can you tell what caused that – optimization, SEM, seasonality, word-of-mouth, or something else?
How do you measure the ROI of your optimization efforts? The question is more complicated than it sounds.
Table of contents
- ROI is Difficult to Measure
- Time Period Comparison in Analytics (and Why It’s Wrong)
- Tests to Gauge Impact
- Are There Opportunity Costs?
ROI is Difficult to Measure
In fact, it’s easy to project the predicted ROI of optimization (click here to download a conversion optimization ROI calculator). It’s just really hard to measure it post-hoc.
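To see why projection is the easy part, here’s a minimal sketch of the inputs such a projection needs (this is an illustration, not the linked calculator; the numbers and function name are hypothetical):

```python
def projected_roi(monthly_revenue, expected_lift, program_cost, months=12):
    """Project optimization ROI: extra revenue from the lift vs. program cost.

    A deliberately simple illustration: it ignores compounding,
    seasonality, and the decay of winning tests over time.
    """
    extra_revenue = monthly_revenue * expected_lift * months
    return (extra_revenue - program_cost) / program_cost

# $100k/month site, 10% sustained lift, $50k program cost:
roi = projected_roi(100_000, 0.10, 50_000)
print(f"Projected ROI: {roi:.0%}")  # $120k extra revenue vs. $50k cost -> 140%
```

Measuring that lift after the fact, though, is where everything below gets complicated.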
In 2012, MarketingSherpa posed the question, “Did optimization or testing demonstrate ROI in 2011?” Here are the results:
Not surprising, really. Measuring the ROI of optimization is hard. If anything, I’m skeptical of the 38% that demonstrated positive ROI. How, exactly, did they demonstrate it?
There’s a quote in the article from Amelia Showalter, former Director of Digital Analytics for Obama for America, that explains how hard it is to track and measure everything, at least in the long term:
“When we’re working on the campaign, we’re actually working so hard to run all those tests that we didn’t always keep perfect track of exactly what results were long term. It’s hard to calculate this stuff out when we want to put all our resources into running more tests. So, we don’t actually ever have a perfect estimate of actually how much extra revenue was due to our testing, but I think that $200 million is a fairly reasonable estimate.”
The article also sums things up by saying, “You can also take heart that if you’re running valid tests, you are likely improving the bottom line.”
While that’s heartwarming, it’s not going to satisfy a neurotic boss or client. We’ve got to find a way to measure our impact. How can we possibly do that?
Time Period Comparison in Analytics (and Why It’s Wrong)
If asked to measure the improvement in conversion rate due to optimization efforts, most people would point to Google Analytics. They’d do a time period comparison: look back 6-12 months to when the campaign started, and compare that conversion rate with the one you have now (a linear analysis).
This won’t tell the full story for a few reasons, the big one being the variability of your traffic quality.
Several things can affect your traffic quantity and quality, including but not limited to:
- Press (positive or negative)
- Seasonality (holidays, sales cycles)
- Sudden referral spikes (e.g. hitting the front page of Hacker News)
Let’s say you’re at a 2% conversion rate with 100,000 monthly visitors to start. Over the course of a year, a lot can change the quality of your traffic. If you’re selling novelty gifts, the holidays might improve your conversion rate with negligible impact from your optimization efforts. Similarly, if you hit the front page of Hacker News, you’ll get a lot of traffic – but the quality might be really shitty, lowering your present average conversion rate.
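The arithmetic behind that is worth spelling out. In this hypothetical, the per-segment conversion rates never change, yet a low-quality traffic spike drags the blended rate down (all numbers are illustrative):

```python
# Hypothetical traffic segments: (visitors, conversion_rate).
normal_month = [(100_000, 0.02)]                    # baseline: 2% overall
spike_month = [(100_000, 0.02), (50_000, 0.001)]    # + Hacker News spike at 0.1%

def blended_rate(segments):
    """Overall conversion rate across traffic segments."""
    visitors = sum(v for v, _ in segments)
    conversions = sum(v * r for v, r in segments)
    return conversions / visitors

print(f"{blended_rate(normal_month):.2%}")  # 2.00%
print(f"{blended_rate(spike_month):.2%}")   # 1.37% -- looks like a drop,
                                            # though nothing on the site changed
```

The site got *better* at nothing and *worse* at nothing; only the traffic mix moved.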
Conversion Rates Are Non-Stationary Data
A stationary time series is one whose statistical properties (mean, variance, autocorrelation, etc) are constant over time. According to an article on Duke University’s website, “A stationarized series is relatively easy to predict: you simply predict that its statistical properties will be the same in the future as they have been in the past!”
But as Investopedia says, data points are often non-stationary:
“Non-stationary data, as a rule, are unpredictable and cannot be modeled or forecasted. The results obtained by using non-stationary time series may be spurious in that they may indicate a relationship between two variables where one does not exist.”
That’s essentially the nature of data. Whether because of seasonality, day of week, external factors, press, advertising, etc, data just fluctuates. Even if you didn’t change anything on your site for a month, you’re not going to get the same result every day. It will fluctuate – sometimes a little, sometimes a lot.
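You can see why this makes pre/post comparison treacherous with a quick simulation. Here a site’s underlying conversion rate drifts upward on its own (seasonality, traffic mix, whatever), nothing on the site changes, and a naive time period comparison still “finds” a lift:

```python
import random

random.seed(42)

# 60 days of a non-stationary conversion rate: the site never changes,
# but the underlying rate drifts from 2.0% toward 2.4%.
days = 60
visitors_per_day = 5_000
rates = [0.020 + 0.004 * d / days for d in range(days)]
conversions = [sum(random.random() < r for _ in range(visitors_per_day))
               for r in rates]

# Naive pre/post comparison: first 30 days vs. last 30 days.
pre = sum(conversions[:30]) / (30 * visitors_per_day)
post = sum(conversions[30:]) / (30 * visitors_per_day)
print(f"pre: {pre:.2%}, post: {post:.2%}, 'lift': {(post - pre) / pre:.1%}")
# Reports a clear "lift" despite zero changes to the site.
```

Run the same comparison on a drifting-downward series and your real wins look like losses. Either way, the pre/post number measures the drift, not your work.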
Andrew Anderson cites this as the reason time period comparison in analytics won’t work for accurately measuring your ROI, and he gives a great example below:
“You can be costing your company millions and think that everything is better by relying on pre/post. Because of this it is less useful than just flipping a coin. Both have nothing to do with measuring the outcome of a change, but at least with the coin you won’t convince yourself that the data means something.”
A Possible Exception
After talking to Craig Sullivan, I found out it is possible to do time period comparison. However, you have to have a predictable traffic stream (ie PPC) and even then it is rough. Craig explains it well:
If we were to assume that PPC traffic is “reliable”, we’d also have to assume that you haven’t changed your daily budget, your keywords, or your ad copy. Three, six or twelve months is a very long period, and there are too many variables. It’s not the same traffic anymore. In fact, the variables in AdWords are constantly changing, sometimes daily:
high-volume accounts see daily bid/budget adjustments and monthly ad tests. Underlying structure might change 30-40% over 3-6mos.
— Leonardo Saraceni (@leosaraceni) August 27, 2015
Also, you can’t draw broad conclusions from PPC data, because you can’t assume that all traffic sources behave similarly. What works for PPC traffic might not work for returning direct traffic, SEO traffic, and so on.
Tests to Gauge Impact
“It can be extremely difficult to explain results when it looks like things are flat or overall down. The fundamental problem is that people are using a linear correlative data set instead of the comparative data that a test provides, or in other words you are saying that you are X percent better, not necessarily X percent better of a specific number. All data is sinusoidal, it goes up and it goes down, despite test results.”
If time period comparison won’t work, what will? There are a few ways to measure impact. None of them are perfect – and there are pros and cons of each – but nonetheless, they’re better than nothing.
1. Retest old versions of the site later on
One of the easiest ways to measure ROI is to retest old versions of the site as part of larger tests later on. Basically, all the changes made during the testing period (combined into one variation) are tested against the old version.
Like most methods here, though, this one has pros and cons. According to Craig Sullivan, if you’ve been continuously improving and learning, it might not be worth the time to test an old version.
2. Weak Causal Analysis
Another method is weak causal analysis.
As Andrew Anderson said, “use weak causal analysis to get a read on estimated impact. In both cases (cause analysis and retesting old versions) you will often find that you are actually having a bigger impact than you imagine. It is important that you are doing this analysis without prompting and proactively giving others a full evaluation of the overall program.”
What is weak causal analysis? Basically this: draw a long-term trend line, with an estimated error rate, based on data from before the change. Then compare the actual outcome to the outcome the trend line predicted. Make sure you use independent variables (like users) as the basis, so you can get some read on where you would have been versus where you are.
“Anything that can approximate causal information is better than nothing but has a much higher chance of Type I or Type II errors (a ‘false positive’ and a ‘false negative,’ respectively),” according to Andrew.
Not perfect but better than nothing.
3. Measure Impact Through Various Stages of a Funnel
According to Chris Stucchio, another method is to attempt to “measure the effect of your optimizations on the various stages of a funnel.”
Say these are your stages:
- Step 1: click from email to site
- Step 2: add product to cart
- Step 3: go to checkout
- Step 4: buy
It’s possible you might not have enough data to actually measure a difference at step 4. But as Chris said:
“You can often infer data about step 4 from steps 1-3 (i.e. if you made a significant impact on the percentage of people reaching step 3, it is *likely* (though not guaranteed) that you increased conversions). There are rigorous ways to estimate this statistically, but they are again somewhat difficult to do.”
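A sketch of why that inference helps, with made-up funnel counts: a two-proportion z-test has plenty of power at step 3, but the same relative lift at step 4 is underpowered, because so few users reach the bottom of the funnel. (All counts are hypothetical.)

```python
import math

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """z statistic for a two-proportion test (pooled standard error)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p * (1 - p) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# Hypothetical counts per variant (control, variant), 20k visitors each:
visitors = 20_000                     # step 1: clicked from email
to_checkout = (2_000, 2_300)          # step 3: reached checkout (+15% relative)
purchases = (190, 218)                # step 4: bought (also ~+15% relative)

z_step3 = two_proportion_z(to_checkout[0], visitors, to_checkout[1], visitors)
z_step4 = two_proportion_z(purchases[0], visitors, purchases[1], visitors)
print(f"step 3 z = {z_step3:.2f}")  # well past 1.96: significant
print(f"step 4 z = {z_step4:.2f}")  # same relative lift, but inconclusive
```

The lift is identical in relative terms at both steps; only the sample size at the bottom of the funnel hides it. That’s the gap Stucchio’s inference is bridging.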
4. Send a small part of your traffic to a consistent base
Sending a small part of your total traffic, 5-10%, to a consistent control seems to be the most accurate way to track the impact of optimization. This is the method I heard most consistently from expert optimizers, Lukas Vermeer among them.
Chris Stucchio has explained how this works in detail. Of course, the question then becomes one of opportunity cost: if you’re not optimizing that 10%, you’re (maybe) missing out on increased revenue. You’re also dealing with less optimizable traffic, so tests will take longer to reach significance.
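The mechanics of a holdback are simple, as this sketch shows: hash each user ID into a stable bucket, keep roughly 10% on the original experience permanently, and compare the two populations over the long run. (The constants and function names here are hypothetical.)

```python
import hashlib

HOLDBACK_PCT = 10  # percent of traffic kept on the original experience

def bucket(user_id: str) -> str:
    """Deterministically assign a user to 'holdback' or 'optimized'.

    Hash-based assignment is stable: the same user always lands in the
    same bucket, so the holdback stays a clean long-term baseline.
    """
    h = int(hashlib.sha256(user_id.encode()).hexdigest(), 16)
    return "holdback" if h % 100 < HOLDBACK_PCT else "optimized"

# Roughly 10% of users land in the holdback:
users = [f"user-{i}" for i in range(10_000)]
share = sum(bucket(u) == "holdback" for u in users) / len(users)
print(f"holdback share: {share:.1%}")  # ~10%
```

The gap between the optimized population and the holdback, measured on the same dates with the same traffic mix, is about as direct a read on cumulative impact as you can get.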
Are There Opportunity Costs?
As Peep said in a previous article, “Testing something is an opportunity cost – means you can’t test something else. While I’m re-validating something here, I could be testing something else that gives me a lift (but of course, it’s not possible to know whether it would). It’s also questionable whether you should be re-testing it.”
Or as Joshua Kennon put it, “everything in life has an opportunity cost.”
This is a question of your specific goals and risk tolerance. Andrew Anderson argues that it’s always worth it to improve your performance, which might mean taking the time to test impact over the long term. Craig had his own take on the opportunity costs as well.
Then again, optimization is more than just A/B testing and lifts. Matt Gershoff, CEO of Conductrics, put it well, saying part of it is about “gathering information to inform decisions.” In other words, optimization is about reducing uncertainty, and therefore risk, in decision making. So you have to factor in everything else you gain from conversion optimization.
Craig also mentioned that conversion optimization isn’t just about the testing; it’s about the big picture.
Measuring ROI is hard. But there are a few ways to do it.
There are some statistically rigorous methods of estimating impact (GA Effect, weak causal analysis). Time period comparison is generally wrong (due to non-stationary data), though as Craig mentioned, there are a few exceptions where you can get a rough estimate: if you have stable, controllable traffic like PPC, and even then you can’t draw overall conclusions from it. Finally, the most common answer I found was to send a small, consistent holdback of traffic to the original version.
Keep in mind, too, that when done correctly, optimization and the insight you gain can be used in all of your marketing. It’s a process that leads to information that informs better decisions, so the return on investment compounds with the customer insight you gain.