Email Testing: Going Beyond Open Rate and Click Rate

Email is one of the few marketing channels that spans the full funnel. You use email to raise awareness pre-conversion. To stay connected with content subscribers. To nurture leads to customers. To encourage repeat purchases or combat churn. To upsell existing customers.

Getting the right email to the right person at the right time throughout the funnel is a massive undertaking that requires a lot of optimization and testing. Yet, even some mature email marketing programs remain fixated on questions like, “How can we increase the open rate?” Moar opens! Moar clicks!

What about the massive bottom-line impact email testing can have at every stage of the funnel? How do you create an email testing strategy for that? It starts by understanding where email testing is today.

What is email testing?
The current state of email testing
- Why email testing often falls flat
The step-by-step process to testing email journeys
Common pitfalls in email testing—and how to avoid them
Conclusion

What is email testing?

Email testing refers to sending different versions of your emails to check the impact of each variation, optimize your email journey, and ultimately get the right email to the right person at the right time.

The current state of email testing

According to the DMA, 99% of consumers check their email every single day. (Shocking, I know.)

In 2014, there were roughly 4.1 billion active email accounts worldwide. That number is expected to increase to nearly 5.6 billion before 2020. In 2019, email advertising spending is forecasted to reach $350 million in the United States alone.

Despite the fact that email continues to thrive over 40 years after its inception, marketers remain fixated on top-of-funnel engagement metrics.

According to research from AWeber:

434 is the average number of words in an email.
43.9 is the average number of characters in an email subject line.
6.9% of subject lines contain emojis.
60% of email marketers use sentence case in subject lines.

According to benchmarks from Mailchimp:

0.29% is the average email unsubscribe rate in the architecture and construction industry.
1.98% is the average email click rate in the computers and electronics industry.
0.07% is the average hard bounce rate in the daily deals and e-coupons industry.
20.7% is the average open rate for a company with 26–50 employees.

But why are these statistics the ones we collect? Why do blog posts and email marketing tools continue to prioritize surface-level testing, like subject lines (i.e. open rate) and button copy (i.e. click rate)?

Why email testing often falls flat

Those data points from AWeber and Mailchimp are perhaps interesting, but they have no real business value.

Knowing that the average click rate in the computers and electronics industry is 1.98% is not going to help you optimize your email marketing strategy, even if you’re in that industry.

Similarly, knowing that 434 is the average number of words in an email is not going to help you optimize your copy. That number is based on only 1,000 emails from 100 marketers. And, of course, there’s no causal link. Who’s to say length impacts the success of the emails studied?

For the sake of argument, though, let’s say reading that 60% of email marketers use sentence case in their subject lines inspired you to run a sentence case vs. title case subject-line test.

Congrats! Sentence case did in fact increase your open rate. But why? And what will you do with this information? And what does an open rate bump mean for your click rate, product milestone completion rates, on-site conversion rates, revenue, etc.?

A test is a test is a test. Regardless of whether it’s a landing page test, an in-product test, or an email test, it requires time and resources. Tests are expensive—literally and figuratively—to design, build, and run.

Focusing on top-of-funnel and engagement metrics (instead of performance metrics) is a costly mistake. Open rate to revenue is a mighty long causal chain.

If you’re struggling to connect email testing and optimization to performance marketing goals, it’s a sign that something is broken. Fortunately, there’s a step-by-step process you can follow to realign your email marketing with your conversion rate optimization goals.

The step-by-step process to testing email journeys

Whether you’re using GetResponse or ActiveCampaign, HubSpot or Salesforce, what really matters is that your email marketing tool is collecting and passing data properly.

Whenever you’re auditing data, ask yourself two questions:

Am I collecting all of the data I need to make informed decisions?
Can I trust the data I’m seeing?

To answer the first question, have your optimization and email teams brainstorm a list of questions they have about email performance. After all, email testing should be a collaboration between those two teams, whether an experimentation team is enabling the email team or a conversion rate optimization team is fueling the test pipeline.

Can your data, in its current state, answer questions from both sides? (Don’t have a dedicated experimentation or conversion rate optimization team? Email marketers can learn how to run tests, too.)

With email specifically, it’s important to have post-click tracking. How do recipients behave on-site or in-product after engaging with each email? Post-click tracking methods vary based on your data structure, but there are five parameters you can add to the URLs in your emails to collect data in Google Analytics:

utm_source;
utm_medium;
utm_campaign;
utm_term;
utm_content.

Learn how to use these parameters to track email to on-site or in-product behavior here.
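
If you're tagging links by hand, a small helper can keep those five parameters consistent across every email. Here's a minimal sketch in Python using only the standard library. The function name `add_utm`, the example URL, and the parameter values are all hypothetical, so swap in your own naming conventions.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def add_utm(url, source, medium, campaign, term=None, content=None):
    """Return `url` with UTM parameters appended (hypothetical helper).

    Existing query parameters are kept. Optional UTM values that are
    None are left out instead of being sent as empty strings.
    """
    parts = urlparse(url)
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    utm = {
        "utm_source": source,
        "utm_medium": medium,
        "utm_campaign": campaign,
        "utm_term": term,
        "utm_content": content,
    }
    query.update({key: value for key, value in utm.items() if value is not None})
    return urlunparse(parts._replace(query=urlencode(query)))


# Example values are made up for illustration.
print(add_utm(
    "https://example.com/pricing?ref=1",
    source="newsletter",
    medium="email",
    campaign="onboarding",
    content="cta_button",
))
```

Using a distinct `utm_content` value for each CTA in an email is one way to tell which link a recipient clicked once they land on-site.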

UTM parameters connect in-email behavior to on-site behavior.

The second issue—data integrity—is more complex and beyond the scope of this post. (Thankfully, we have another post that dives deep into that topic.)

Once you’re confident that you have the data you need and that the data is accurate, you can get started.

1. Mapping the current state

To move away from open rate and click rate as core metrics is to move toward journey-specific metrics, like:

Gross customer adds;
Marketing-qualified leads;
Revenue;
Time-to-close.

By focusing on the customer journey instead of an individual email, you can make more meaningful optimizations and run more impactful tests.

The goal at this stage is to document and visualize as much as you can about the current state of the email journey in question. Note any gaps in your data as well. What do you not know that you wish you did know?

You can use a tool like Whimsical to map the journey visually.

An example from Whimsical of how to map a user flow. While their example maps on-site behavior, a similar diagram works for email, too.

Be sure to include:

Audience data;
Subject line and preview text for each email;
Days between each email;
Data dependencies;
Automation rules;
Personalization points, and alternate creative and copy (if applicable);
Click rates for each call to action (CTA);
On-site destinations and their conversion rates (for email, specifically).

Really, anything that helps you achieve a deep understanding of who is receiving each email, what you’re asking them to do, and what they’re actually doing.

Take this email from Amazon Web Services (AWS), for example:

(Image: an example email from Amazon Web Services.)

There are a ton of different asks within this email. Tutorials, a resource center, three different product options, training and certification, a partner network. The list goes on.

Your current state map should show how recipients engage with each of those CTAs, where each CTA leads, how recipients behave on-site or in-product, etc. Does the next email in the sequence change if a recipient chooses “Launch a Virtual Machine” instead of “Host a Static Website” or “Start a Development Project,” for example?

Your current state map will help answer questions like:

How does the email creative and copy differ between segments?
Who receives each email and how is that decision made?
Which actions are recipients being asked to take?
Which actions do they take most often?
Which actions yield the highest business value?
How frequently are they asked to take each action and how quickly do they take it on average?
What other emails are these recipients likely receiving?
What on-site and in-product destinations are email recipients being funneled to?
What gaps exist between email messaging and the on-site or in-product messaging?
Where are the on-site holes in the funnel?
Can post-email, on-site, or in-product behavior tell us anything about our email strategy?

2. Mapping the ideal state

Once you know what’s true now, it’s time to find optimization opportunities, whether that’s an obvious fix (e.g. an email isn’t displaying properly on the iPhone 6) or a test idea (e.g. Would reducing the number of CTAs in the AWS email improve product milestone completion rates?).

There are two methods to find those optimization opportunities:

Quantitatively. Where are recipients falling out of the funnel, and which conversion paths are resulting in the highest customer lifetime value (CLTV)?
Qualitatively. Who are the recipients? What motivates them? What are their pain points? How do they perceive the value you provide? What objections and hesitations do they present?

The first method is fairly straightforward. Your current state map should present you with all of the data you need to identify holes and high-value conversion paths.
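
As a rough illustration of the quantitative method, here's a short Python sketch that computes step-to-step conversion through an email journey and flags the weakest step. The step names and counts are invented for the example, and `funnel_dropoff` and `biggest_hole` are hypothetical helper names. Your own journey map will have different stages.

```python
def funnel_dropoff(steps):
    """Given ordered (name, count) pairs, return (from, to, rate) tuples.

    `rate` is the share of people at one step who reached the next.
    """
    report = []
    for (prev_name, prev_count), (name, count) in zip(steps, steps[1:]):
        rate = count / prev_count if prev_count else 0.0
        report.append((prev_name, name, rate))
    return report


def biggest_hole(steps):
    """Return the transition with the lowest step-to-step conversion rate."""
    return min(funnel_dropoff(steps), key=lambda row: row[2])


# Hypothetical counts for one onboarding journey.
journey = [
    ("delivered", 10_000),
    ("opened", 2_100),
    ("clicked", 320),
    ("started trial", 48),
    ("paid", 12),
]

for start, end, rate in funnel_dropoff(journey):
    print(f"{start} -> {end}: {rate:.1%}")

print("Weakest step:", biggest_hole(journey)[:2])
```

A low rate only tells you where to look. Whether that step is worth fixing still depends on the business value of the path it sits on.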

The second method requires additional conversion research. (Read our comprehensive guide to conducting qualitative conversion research.)

Combined, these two methods will give you a clear idea of your ideal state of the email journey. As best you can, map that out visually as well.

How does your current state map compare to your ideal state map? They should be very different. It’s up to you to identify and sort those differences:

Insights. What did you learn during this entire journey mapping process that other marketers and teams will find useful?
Quick fixes. What needs to be fixed or implemented right away? This is a no-brainer that doesn’t require testing.
Test ideas. What needs to be tested before implementation? This could be in the form of a full hypothesis or simply a question.
Data gaps. What gaps exist in your measurement strategy? What’s not being tracked?

3. Designing, analyzing, and iterating

Now it’s time to design the tests, analyze the results, and iterate based on said results. Luckily, you’re reading this on the CXL blog, so there’s no shortage of in-depth resources to help you do just that:

How to create a meaningful hypothesis;
How to account for segments;
How to define stopping rules;
How to limit sample pollution and control variables;
How to analyze the results of the test properly and without bias;
How to communicate the results to stakeholders;
How to use the results to fuel the next iteration;
How to prioritize tests.

Short on time? Read through our start-to-finish post on A/B testing.

Common pitfalls in email testing—and how to avoid them

1. Testing the email vs. the journey

It’s easier to test the email than the journey. There’s less research required. The test is easier to implement. The analysis is more straightforward—especially when you consider that there’s no universal customer journey.

Sure, there’s the nice, neat funnel you wax poetic about during stakeholder meetings: session to subscriber, subscriber to lead, lead to customer; session to add to cart, cart to checkout, checkout to repeat purchase. But we know that linear, one-size-fits-all funnels are a simplified reality.

The best customer journeys are segmented and personalized, whether based on activation channel, landing page, or onboarding inputs. Campaign Monitor found that marketers who use segmented campaigns report as much as a 760% increase in revenue.

When presented with the choice of running a simple subject line A/B test in your email marketing tool or optimizing potentially thousands of personalized customer journeys, it’s unsurprising many marketers opt for the former.

But remember that email is just a channel. It’s easy to get sucked into optimizing for channel-level metrics and successes, to lose sight of what that channel’s role is in the overall customer journey.

Now, let’s say top-of-funnel engagement metrics are the only email metrics you can accurately measure (right now). You certainly wouldn’t be alone in that struggle. As marketing technology stacks expand, data becomes siloed, and it can be difficult to measure the end-to-end customer journey.

Is email testing still worth it, in that case?

It’s a question you have to ask yourself (and your data). Is there an inherent disadvantage to improving your open rate or click rate? No, of course not (unless you’re using dark patterns to game the metrics).

The question is: is the advantage big enough? Unless you have an excess of resources or are running out of conversion points to optimize (highly unlikely), your time will almost certainly be better spent elsewhere.

2. Optimizing for the wrong metrics

Optimization is only as useful as the metric you choose. Read that again.

All of the research and experimentation in the world won’t help you if you focus on the wrong metrics. That’s why it’s so important to go beyond boosting your open rate or click rate, for example.

It’s not that those metrics are worthless and won’t impact the bigger picture at all. It’s that they won’t impact the bigger picture enough to make the time and effort you invest worth it. (The exception being select large, mature programs.)

Val Geisler of Fix My Churn elaborates on how top-of-funnel email metrics are problematic:

Most people look at open rates, but those are notoriously inaccurate with image display settings and programs like Unroll.me affecting those numbers. So I always look at the goal of the individual email.

Is it to get them to watch a video? Great. Let’s make sure that video is hosted somewhere we can track views once the click happens. Is it to complete a task in the app? I want to set up action tracking in-app to see if that happens.

It’s one thing to get an email opened and even to see a click through, but the clicks only matter if the end goal was met.

You get the point. So, what’s a better way to approach email marketing metrics and optimization? By defining your overall evaluation criterion (OEC).

To start, ask yourself three questions:

What is the tangible business goal I’m trying to achieve with this email journey?
What is the most effective, accurate way to measure progress toward that goal?
What other metric will act as a “check and balance” for the metric from question two? (For example, a focus on gross customer adds without an understanding of net customer adds could lead to metric gaming and irresponsible optimization.)

In “Advanced Topics in Experimentation,” Ronny Kohavi of Microsoft explains how an experience at Amazon taught him that engagement metrics are easy to game:

The question is what OEC should be used for these programs? The initial OEC, or “fitness function,” as it was called at Amazon, gave credit to a program based on the revenue it generated from users clicking-through the e-mail.

There is a fundamental problem here: the metric is easy to game, as the metric is monotonically increasing: spam users more, and at least some will click through, so overall revenue will increase. This is likely true even if the revenue from the treatment of users who receive the e-mail is compared to a control group that doesn’t receive the e-mail.

Eventually, a focus on CLTV prevailed:

The key insight is that the click-through revenue OEC is optimizing for short-term revenue instead of customer lifetime value. Users that are annoyed will unsubscribe, and Amazon then loses the opportunity to target them in the future. A simple model was used to construct a lower bound on the lifetime opportunity loss when a user unsubscribes. The OEC was thus

$$\text{OEC} = \sum_{i} \text{Rev}_i - \sum_{j} \text{Rev}_j - s \times \text{unsubscribe\_lifetime\_loss}$$

Where 𝑖 ranges over e-mail recipients in Treatment, 𝑗 ranges over e-mail recipients in Control, and 𝑠 is the number of incremental unsubscribes, i.e., unsubscribes in Treatment minus Control (one could debate whether it should have a floor of zero, or whether it’s possible that the Treatment actually reduced unsubscribes), and unsubscribe_lifetime_loss was the estimated loss of not being able to e-mail a person for “life.”
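
To make the arithmetic concrete, here's a rough sketch in Python. It assumes the OEC is Treatment click-through revenue minus Control click-through revenue, minus the incremental unsubscribes multiplied by the estimated lifetime loss, as the quote describes. The function name, the argument names, and every number are hypothetical, not Amazon's actual figures.

```python
def email_oec(treatment_revenue, control_revenue,
              treatment_unsubs, control_unsubs,
              unsubscribe_lifetime_loss, floor_at_zero=False):
    """Sketch of a click-through revenue OEC penalized for unsubscribes.

    treatment_revenue / control_revenue: per-recipient revenue lists.
    floor_at_zero: the quote notes it is debatable whether incremental
    unsubscribes should be floored at zero, so it is left as an option.
    """
    incremental_unsubs = treatment_unsubs - control_unsubs
    if floor_at_zero:
        incremental_unsubs = max(incremental_unsubs, 0)
    return (sum(treatment_revenue)
            - sum(control_revenue)
            - incremental_unsubs * unsubscribe_lifetime_loss)


# Made-up numbers: the email lifts short-term revenue by 45,
# but two extra unsubscribes at an assumed lifetime loss of 30 each
# turn the program negative.
score = email_oec(
    treatment_revenue=[0, 25, 0, 40],
    control_revenue=[0, 20, 0, 0],
    treatment_unsubs=3,
    control_unsubs=1,
    unsubscribe_lifetime_loss=30,
)
print(score)  # -15
```

In this toy case, a program that looks like a win on click-through revenue alone comes out negative once unsubscribes carry a cost.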

Using the new OEC, Ronny and his team discovered that more than 50% of their email marketing programs were negative. All of the open-rate and click-rate experiments in the world wouldn’t have addressed the root issue in this case.

Instead, they experimented with a new unsubscribe page, which defaulted to unsubscribing recipients from a specific email program vs. all email communication, drastically reducing the cost of an unsubscribe.

Amazon learned that creating multiple lists (rather than a single “unsubscribe”) was key to increasing CLTV.

3. Skimping on rigor

Email marketing tools make it easy to think you’re running a proper test when you’re not.

Built-in email testing functions are the equivalent of on-site testing tools flashing a green “significant” icon next to a test to signal it’s done. (We know that’s not necessarily true.)

Email tests require the same amount of rigor and scientific integrity as any other test, if not more. Why? Because there are many little-known nuances to email as a channel that don’t exist on-site, for example.

Val sees companies calling (and acting upon) email tests too soon and allowing external validity threats to seep in:

Too many people jump to make changes too soon. Email should be tested for a while (every case varies, of course), and no other changes should be made during that test period.

I have people tell me they changed their pricing model or took away the free trial or did some other huge change in the midst of testing email campaigns. Well that changes everything! Test email by itself to know if it works before changing anything else.

Designing a test for a single-send email is different than designing a test for an always-on drip campaign. Designing a test for a personalized campaign is different than designing a test for a generic campaign.

To demonstrate the complexity of email testing, let’s say you’re experimenting with frequency. You’re sending the control group three emails and the treatment group five emails. Halfway through the test, you realize the treatment group’s unsubscribe rate increased because of the increased frequency. Suddenly, you don’t have a large enough sample size to call the test either way.

To run a valid test, you’ll need to account for potential email unsubscribes in your sample size. CXL has a free sample size calculator.
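
If you'd like to see the idea in code, here's a minimal Python sketch. It uses the textbook normal-approximation formula for comparing two proportions, which may differ from what any particular calculator uses, and then pads the result for the share of recipients you expect to unsubscribe before the test ends. The function names, conversion rates, and 10% unsubscribe figure are all assumptions for illustration.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(p_control, p_treatment, alpha=0.05, power=0.8):
    """Approximate recipients needed per group for a two-sided
    two-proportion test (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    effect = p_treatment - p_control
    return ceil((z_alpha + z_power) ** 2 * variance / effect ** 2)


def inflate_for_unsubscribes(n, expected_unsubscribe_rate):
    """Pad a sample size so that roughly `n` recipients remain
    after the expected share of them unsubscribes mid-test."""
    if not 0 <= expected_unsubscribe_rate < 1:
        raise ValueError("expected_unsubscribe_rate must be in [0, 1)")
    return ceil(n / (1 - expected_unsubscribe_rate))


# Hypothetical test: 5% baseline conversion, hoping to detect 6%.
base = sample_size_per_group(0.05, 0.06)
padded = inflate_for_unsubscribes(base, expected_unsubscribe_rate=0.10)
print(base, padded)
```

In a frequency test, the treatment group may lose more recipients than the control group, so it can be safer to pad using the higher of the two expected unsubscribe rates.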

Also consider that the email journey you’re testing is (likely) one of many. Even at a mid-sized company, you have to start controlling for automation. How do the other emails the test participants receive impact the test? How do the emails they receive differ from the emails the other recipients in their assignment (control or treatment) receive?

And we haven’t even touched on segmentation. Let’s say you market a heatmap tool and have one generic onboarding journey but want to test a personalized onboarding journey. You know different segments will respond differently, and that their brands of personalization will differ.

So, you segment once: people who switched from a competitive tool, people who have started their first heatmap, and people who have not started their first heatmap. And again: solopreneurs, agencies, and enterprise companies. Before you know it, you’re trying to design and build nine separate tests.

The point is not to scare you away from the complexity of email testing and optimization. It’s to remind you to invest the time upfront to properly design, build and run each test. What are the potential validity threats? What is your sample size and have you accounted for fluctuations in your unsubscribe rate? How will the effects of personalization impact your test results? Are you segmenting pre-test or post-test?

There’s nothing inherently wrong with post-test segmentation, but the decision to segment results can’t be made post-test. As Chad Sanderson of Microsoft explains:

Like anything else in CRO, constructing a segmentation methodology is a process, not something to be done on a whim once a test finishes.

Segmentation is a wonderful way to uncover hidden insights, but it’s easy to discover false positives and run into sample-size limitations when segmenting post-test. The famous line, “If you torture the data long enough, it will confess to anything,” comes to mind.
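
A quick back-of-the-envelope calculation shows why. The Python sketch below assumes each segment is checked independently at the same significance level and that there is no real effect anywhere. Those are simplifying assumptions, and the function names and the nine-segment example are purely illustrative.

```python
def family_wise_error_rate(num_segments, alpha=0.05):
    """Chance of at least one false positive across independent
    segment checks when no real effect exists."""
    return 1 - (1 - alpha) ** num_segments


def bonferroni_alpha(num_segments, alpha=0.05):
    """Stricter per-segment threshold (Bonferroni correction) that
    keeps the overall false-positive rate at or below `alpha`."""
    return alpha / num_segments


segments = 9  # e.g. three behavioral segments x three company types
print(f"{family_wise_error_rate(segments):.1%}")
print(f"{bonferroni_alpha(segments):.4f}")
```

Slicing one test into nine segments after the fact gives you roughly a 37% chance of at least one "significant" result by luck alone, which is why the segments and the correction need to be chosen before the test runs.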

Conclusion

If you want to develop your email marketing program beyond “more opens” and “more clicks,” you have to:

Align on a strategic overall evaluation criterion (OEC) that goes beyond open rate and click rate.
Map out the current state of your email journey.
Map out the ideal state of your email journey. How do they compare?
Extract relevant insights to share with other teams and stakeholders.
Implement quick fixes you spot along the way.
Use your journey maps to generate a list of test ideas. Then prioritize them, run them, analyze them, and iterate.

Focusing on the customer journey will help you make smarter email testing decisions and invest your limited resources in the highest value optimization opportunities. It will also serve as a catalyst for improved segmentation and landing pages, for example.
