The Psychology of a Successful Testing Program

All testing programs, no matter how great or awful, think they are doing pretty well and can get better.

Having spent the better part of 13 years helping testing programs of various sizes turn themselves around or reach levels they did not know were possible, I find that one thing stands out more than any other. It doesn’t matter what size the program is or what industry it is in: of the more than 300 programs I have worked with, almost all suffer from some very similar issues.

What you think of your testing program means little about its actual performance
Are you Ready for a Testing Program?
Deconstruction – The Tester’s Best Friend
The Need for Rational Decisioning
- Determine your single success metric before you test
- Don’t report on microconversions
Cognitive Dissonance – Your Greatest Enemy
- All ideas have to be treated the same, whether you think they will win or not
Helpful Rules
Conclusion

What you think of your testing program means little about its actual performance

In many ways the actual outcomes of the program have no correlation to how those in the programs view the value of their own program. This myopia can lead to a number of missed opportunities to improve or change the status quo. In many cases the steps needed to change or improve a program are easy, but the hang ups of those in charge can be hard to sway. That is why it is so vital that programs start with the right mentality and then grow from there.

With that in mind, let’s take a moment and start over fresh. Let’s look at the real building blocks, steps, tactics, and tools necessary to have a successful program. It is often easier to evaluate things from the ground up than to tackle small incremental changes one at a time. If we think of each program as starting from scratch, it is far easier to tackle the big things that might have been missed along the way.

Are you Ready for a Testing Program?

Right off the bat comes the fundamental question: is a testing program the right thing for you and your organization? This is not as simple as it seems, as far too many programs are actually a net negative to the bottom line of their organizations. Just because you want to test doesn’t mean that it is the right tactic for you; there are many other tactics that you can use to grow your business.

Do you have enough traffic?

Let’s start with the really basic and easy requirement. You need to have at least 500 conversions a month. We will discuss in a bit what those 500 are, but essentially you have to have 500 unique actions that lead directly to revenue. While this may seem to cut the legs out from under smaller organizations, the reality is that it just means you have to choose the tactics that make sense for where you are as a business.

You may not be there today, but there are many growth options for you to leverage. Once you have used those, you should be ready to at least move forward from a traffic standpoint. 500 is not a firm number, but you need enough traffic that you can avoid simple A vs. B set-ups and move on to the really important part of testing.

Do you have the right mindset?

The second requirement is by far the hardest one, and it is the one that completely changes the scale and consistency of your program. Most programs limit themselves greatly because they view testing as a way to choose between one thing and another, or as a tactic to prove their opinions or their analysis correct. The truth is that this is the least efficient form of optimization.

The real challenge is in getting yourself and your organization ready to accept one really simple truth: Being wrong is far more valuable than being right

What does that mean? Well, let’s think about the real core goal of a testing program: to provide the maximum amount of revenue gain for the resources you have available.

What that really means is that everything is about efficiency. A 10% lift that takes 6 months is good, one that takes 2 months is better and one that takes 2 weeks even better. Likewise a 10% lift to 30% of your traffic is good, 50% better and 100% the best. This means that everything you do becomes about one simple equation:

( Population * Influence ) / Cost

It also means that just knowing you got the 10% lift isn’t enough, because you have to compare it in order to understand the context and whether it was a good use or a bad use of resources. Coincidentally comparing more things at once is also extremely beneficial for both resource allocation as well as generating more consistent and greater lifts.

It means that you must always look at things as a way to discover the influence of something, and do it while managing cost. Cost is both a technical and tool cost and a political capital and opportunity cost. It is the holistic cost of running that test relative to other current or future opportunities, based on the resources that you have available to you.
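As a rough illustration of the equation above, here is a minimal Python sketch that ranks a few candidate tests by (Population * Influence) / Cost. The candidate names, the numbers, and the choice of units (population as share of traffic, influence as an estimated relative lift, cost as person-days) are all assumptions made up for this example. In practice influence is exactly the thing you do not know until you test, so treat the score as a way to compare options, not as a prediction.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    population: float  # share of traffic exposed to the change, 0.0 to 1.0 (assumed unit)
    influence: float   # estimated relative lift, e.g. 0.10 for 10% (assumed unit)
    cost: float        # holistic cost in person-days (assumed unit)


def efficiency(c: Candidate) -> float:
    """(Population * Influence) / Cost, following the equation above."""
    if c.cost <= 0:
        raise ValueError("cost must be positive")
    return (c.population * c.influence) / c.cost


# Hypothetical candidates; every number here is invented for illustration.
candidates = [
    Candidate("cart page redesign", population=0.30, influence=0.10, cost=20),
    Candidate("homepage layout swap", population=1.00, influence=0.10, cost=10),
    Candidate("checkout copy tweak", population=0.50, influence=0.10, cost=10),
]

# Highest efficiency first.
ranked = sorted(candidates, key=efficiency, reverse=True)
for c in ranked:
    print(f"{c.name}: {efficiency(c):.4f}")
```

With the same assumed 10% lift for every candidate, the ranking is driven entirely by how much traffic each change reaches and how much it costs, which is the point of the equation.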

So how does this impact the value of being wrong? You, like everyone else, have many opinions about what matters, how to figure out what matters, and how to improve things. The problem is that in many cases these opinions are based on conjecture or passed-down knowledge, and in all cases they just don’t matter.

Don’t let your ideas limit your testing

If I have an idea for how to improve a user experience, let’s say changing our shopping cart to remove some unneeded content, that is great, but only if I don’t let that limit what I test. If I am right and that is the most important thing to do, then the data at the end will come to that conclusion.

If something else I choose to do, like change the colors or change the order of the information fields, proves to be of greater value, then I make more money for my organization. By allowing the test the freedom to compare everything that was feasible and not pretending to know something I didn’t, I get a far better outcome.

If I were to limit my testing to popular opinion or only what I think will happen then I learn far less, reinforce useless convention, and I will limit the value of each action I take.

If I however force myself to assume I know nothing and compare as many things as possible, the worst case scenario is that what I thought would happen does happen (and it VERY rarely does).

In that case I was able to enrich that data and eliminate future opportunities. In all other cases something else outperformed what I expected to happen, granting us a higher lift, a better outcome, and the opportunity to learn something we would have cut ourselves off from.

Comparing your idea to alternative ideas is the basis of a successful testing program

This thought process is the very core of a really successful test program and yet is very rarely enacted. As you evaluate your program, really look and see whether you and the people you work with are ready for their world views to be challenged on a daily basis. There are many people out there who talk about wanting to help the business but, when push comes to shove, are far more interested in their own well-being than in that of the company.

How about yourself? Let me assure you that nothing will deflate your ego faster than seeing everything you think you know about users, analysis, and good user experience being torn to shreds. Not only are you going to be dealing with this on a continuous basis, but this will become by far the majority and most important work you do every day. If you are not ready for that fight then you and your program probably are not really ready to grow or to improve.

Deconstruction – The Tester’s Best Friend

Let’s go back, though, to what really matters with testing: discovering the influence and cost of actions. This completely changes what actions you take, how you prioritize actions, and even how you report on changes. Fundamentally, we know far less about the influence of changes than we wish we did. Finding out that groups or pages have different behavior is interesting, but it tells us nothing about our influence on those groups or the cost to achieve that influence.

You can have the perfect idea, but if the cost is too high or the population too small, then that action is not efficient enough to be worth doing. This is why deconstructing ideas is so vital. You are fundamentally taking pages or ideas and looking at how you can learn influence and how you can control cost.

Deconstruction means taking an idea and challenging the core assumptions in it.

You think you need to improve the messaging on your CTA? How influential is the CTA compared to the nav, or the headline, or the other parts of the page? You think you know, but do you really?

I can tell you that after all the tests I have done, the only consistent rule I have found is that spatial changes consistently have a higher beta than contextual ones (real estate is more important than copy). Or how about how influential your landing page is compared to your homepage, or your cart? I am sure you have opinions, but have you really broken it down?

Testing is not about proving your idea

Let’s take the example of the CTA. What if we take 2 versions of a changed CTA, but compare them to 2 versions of the main headline and to removing the other sections of the page? What if we do an MVT of the various sections and compare the influence? The best case scenario is that we find something else is a better use of our resources and we get a better outcome.

This is the core of good testing. People view testing as either shotgun or hypothesis driven, when in reality those are 2 of many options. The core tenet is getting results, not proving someone right or relying on blind luck. You can focus your testing on discovering the value of different types of changes and then use that to inform and apply resources going forward.

The original test idea, like all test ideas, was fungible; it was a starting point to get to the real action that does matter. When you start out you know very little about your ability to influence, which is why deconstructing the idea and focusing on discovery is so vital.

Real life example

Back to that example of the shopping cart I mentioned before. That is a real test that we just completed, and I can tell you that removing the side options turned out to be negative, but changing the page to high contrast, despite some reservations internally, had a significant impact on the bottom line. We would never have known that if we had limited ourselves to opinions, nor did we discover it via random chance.

We deconstructed the feasible alternatives of the page and tested them all against each other so that we could compare the efficiency of different changes. What we learned not only made an impact here but also will help us going forward. We also did not get just one winner but 5 out of 8 variations proved to be positive.

This is also important because one of the key things you are testing for is not a specific winner but a high beta (a large spectrum of outcomes) so that you can choose the best option. I can also tell you that the beta of the changes here was not as high as in other tests we have run, which has given some people internally some new information to think about.
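One simple way to put a number on that "spectrum of outcomes" is to compute each variation's lift over the control and then look at the gap between the best and the worst. The sketch below does that in Python. Treating beta as a plain max-minus-min range is an assumption of this example rather than a standard definition, and the variation names and figures are hypothetical; they are not the results of the shopping cart test.

```python
def lifts_vs_control(control_value: float, variant_values: dict) -> dict:
    """Relative lift of each variation over the control on the success metric."""
    if control_value <= 0:
        raise ValueError("control value must be positive")
    return {
        name: (value - control_value) / control_value
        for name, value in variant_values.items()
    }


def outcome_spread(lifts: dict) -> float:
    """Gap between best and worst lift: a simple stand-in for 'beta' (assumed definition)."""
    return max(lifts.values()) - min(lifts.values())


# Hypothetical success-metric values per variation; not real test data.
control = 2.00
variants = {"A": 2.10, "B": 1.90, "C": 2.30, "D": 2.04}

lifts = lifts_vs_control(control, variants)
spread = outcome_spread(lifts)
best = max(lifts, key=lifts.get)
positive = [name for name, lift in lifts.items() if lift > 0]

print(f"best variation: {best}, spread of outcomes: {spread:.2%}")
print(f"variations beating control: {positive}")
```

A wide spread means the thing you changed has a lot of influence, whichever direction individual variations moved; a narrow one suggests your resources may be better spent elsewhere.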

In almost all cases many opinions were wrong, including my own for the record, but because we focused on discovery and efficiency, we were able to get a great outcome and move the organization and the testing program forward.

The Need for Rational Decisioning

Another byproduct of this is the need to eliminate opinion and wiggle room for others. It means that you need to be far more forceful in what you allow or don’t allow, and that everything is focused on efficiency.

The number one way to maximize discovery, outside of how you construct tests, is to align people on a single success metric and to ensure that it doesn’t change from test to test. In almost all cases that metric is going to be RPV (Revenue per Visitor) or, if all actions are of equal value, conversion rate.

It is vital that this metric has nothing to do with the specific test, no matter whether you think you are influencing a specific action or not. Want to improve your internal search use? Great, but that is a means to an end and not the end itself. Increasing the efficiency of your site at generating revenue is the end goal of all for-profit businesses.
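To make the single-metric idea concrete, here is a small Python sketch that computes RPV for each variation and picks the winner on RPV alone. The field names and all of the figures are hypothetical. The search counts are included only to show that the variation with the most internal search use is not the one that made the most money per visitor.

```python
def revenue_per_visitor(total_revenue: float, visitors: int) -> float:
    """RPV: total revenue divided by the number of unique visitors."""
    if visitors <= 0:
        raise ValueError("visitors must be positive")
    return total_revenue / visitors


# Hypothetical results; every number is invented for illustration.
results = {
    "control":   {"visitors": 10_000, "revenue": 25_000.0, "searches": 1_200},
    "variant_a": {"visitors": 10_000, "revenue": 24_000.0, "searches": 1_900},
    "variant_b": {"visitors": 10_000, "revenue": 27_500.0, "searches": 1_100},
}

rpv = {
    name: revenue_per_visitor(r["revenue"], r["visitors"])
    for name, r in results.items()
}

# The decision uses the one success metric and nothing else.
winner = max(rpv, key=rpv.get)
most_searches = max(results, key=lambda name: results[name]["searches"])

print(f"winner on RPV: {winner} ({rpv[winner]:.2f} per visitor)")
print(f"most internal search use: {most_searches} ({rpv[most_searches]:.2f} per visitor)")
```

In this made-up data, variant_a drives the most search use but has the lowest RPV, so judging the test on search use would have picked the wrong option.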

Determine your single success metric before you test

One of the core tenets of my work is that success is determined before you launch a test, not after. If you have taken care of everything before you launch a test (and aligning people on a single success metric is probably the most vital part), then once a test launches it is a mechanical check of data and processing of a winner.

If you have not, then you have allowed a massive amount of inefficiency into your program. It can be incredibly difficult to get people to align on a metric, but it is essential to rational decision making and to allowing an evaluation of efficiency for all options. It doesn’t matter if one option was better at getting people to the cart and one was better at getting email signups; the only thing that matters is whether it impacted the bottom line of the business.

It is vital to the point that I will not run a test if there is not a single success metric that is universal and tied to the bottom line of the business, independent of the test we are running. It is better to hold up or do nothing than to allow this behavior going forward.

Don’t report on microconversions

The key to this is that you never report on anything but that single metric. The goal, as stated before, is the efficiency of feasible alternatives, which means the more things you compare and the more thought you put into the cost of each action, the better able you are to inform others about that efficiency.

Everything needs to be viewed as a measure of one action versus another, as all actions, even doing nothing, will have an outcome. You should instead focus on when you learn that people’s opinions were wrong and what the next steps from that data should be.

Doing this also helps eliminate false statements about causal direction, as those can be deadly to overall program efficiency. Just because more people used internal search or clicked on your banner and you made more money does not mean that you made more money because people took that other action.

At the end of the test you have exactly one data point, that both went up, or one went up and one went down, or both went down. One data point is not enough to establish a pattern, correlation, or especially causation. What will happen though is that your preconceived notions and opinions will naturally try to explain this relationship, which can lead to wasted resources and limited gains in the future. Focus on what you know, not what you want to know or think you can deduce.

Cognitive Dissonance – Your Greatest Enemy

The very nature of running a program this way means that you will consistently be proving people wrong. While this is the best thing for performance, it creates a constant state of cognitive dissonance that, if you are not prepared to deal with it, can blow up your program and lead people to create reasons not to have faith in your tests.

It is painful for people to believe that something they have done for years is not only not as valuable as they think, but in many cases not valuable at all and negative to the business. If you are not coaching people on this beforehand and are not prepared to deal with it, you are going to run into a large number of landmines.

The first and most vital step to dealing with this is to focus all discussions on the comparing of actions and not on validating opinions. It isn’t about if Tactic A or B works, it is how well does Tactic A or B or C or D and so on compare to each other.

All ideas have to be treated the same, whether you think they will win or not

It is about the various influences of each option and not about any individual idea or concept. All ideas get treated the same, whether you think they will win or not. By doing this you are taking the fight away from a me-versus-you attack and instead focusing on the system and the outcomes. It isn’t personal; it is about a holistic view of all possible outcomes.

The second tactic is what this entire article is about, discussing just how valuable being wrong is. If you have that discussion outside of test ideas and if you reinforce it in every conversation then you are opening up that door to hold the conversation when it really matters.

Cherish unexpected test results

It is even better if you are championing how great a result is to the rest of the organization when you find something that goes against conventional wisdom. Doing this the first few times prepares people for this being a consistent and good outcome of future tests. In the case of the shopping cart test I mentioned before, one of our senior executives threw up their hands and proclaimed how funny it is that they are constantly wrong on each test. They were prepared for it and allowed us to make the changes because we had been preparing them since day 1.

The third tactic is to simply ensure that each test has 1-2 variants that are there purely because they go against conventional wisdom and the thought process that led to the current status quo. By having things designed to break opinions, you can leverage the learning from those to build the case in the future when one wins. In our case, well over 50% of our tests have had one of those options win, and we are performing all the better for it.

The last main tactic is to have an education program consistently going within your organization. Meet regularly with each key group and inform them about what testing can do, past experiences and what to expect, and help them think about future efforts you can do to assist them.

Create openings for conversations about testing

You may not win that individual battle and get them to champion an outcome, but by doing this you are opening up a conversation and allowing them to hear about what you are trying to accomplish away from being in the heat of an argument. I would strongly suggest some sort of regular conversation at least once every other month in larger organizations and ongoing conversations in smaller organizations.

In all cases you have to choose your tactics based on the people and the place. By far the most important work I do is dealing with cognitive dissonance and helping grow an understanding of what you are trying to accomplish.

Trying to get people to think in terms of feasible alternatives, being wrong, and rational decision agents is a big deal and is not part of anyone’s day-to-day activities. We are wired and trained from an early age to please people and to try to get that gold star for being right. It takes a lot to realize that being right or being wrong is irrelevant; getting results in an impartial way and working together to make everyone better is what really matters.

Helpful Rules

I want to close with a few helpful rules that should make it easier to act on many of the concepts I discuss above and in the future. Even if you don’t completely understand everything that I have discussed, simply enforcing these rules will allow your program to grow and will enable future understanding.

Never test just 1 alternative – Try to shoot for a minimum of 4-5 and more is better if you have the traffic. I would cap it at 10 unless you have insane amounts of traffic.
Always have 1 success metric – This is for your site and independent of the specific test
Always discuss how you will act on a test before you launch it – Success is determined before you launch, not after
Never explain why something happened – It is impossible to tell from a test and detracts from the rational action that needs to follow
Don’t get too caught up on test ideas – ideas are fungible until you have hard causal proof of influence
Always challenge conventions – Try to get various opinions about what the best option is but also include options that challenge popular opinion
Deconstruct ideas – see if you can challenge assumptions with every test you do
Avoid downtime – It is far better to have a test live while you work on a bigger test than to hold things up for resources. Getting the most out of what a test costs means minimizing downtime.
Have clear rules of action – Don’t over focus on confidence or under focus on sampling bias or representative data
Educate, educate, educate – Successful programs are so different than what people expect from a testing program. Your main role is to help people grow and understand this. Tests are just actions to achieve a goal, not the only function of a program.
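Several of these rules can be checked mechanically before a test goes live. The sketch below is one possible pre-launch check in Python; the `ExperimentPlan` structure and its field names are hypothetical and do not come from any testing tool, and the thresholds simply mirror the numbers in the rules above.

```python
from dataclasses import dataclass


@dataclass
class ExperimentPlan:
    """Hypothetical description of a test before launch (not a real tool's schema)."""
    alternatives: list           # names of the variations, excluding the control
    success_metrics: list        # should hold exactly one site-wide metric
    action_agreed: bool          # has the team agreed how it will act on the result?
    challenges_convention: bool  # does at least one variation go against popular opinion?


def prelaunch_issues(plan: ExperimentPlan) -> list:
    """Return the rules the plan breaks; an empty list means it passes."""
    issues = []
    n = len(plan.alternatives)
    if n < 2:
        issues.append("never test just 1 alternative")
    elif n < 4:
        issues.append("aim for at least 4-5 alternatives if traffic allows")
    elif n > 10:
        issues.append("more than 10 alternatives needs very high traffic")
    if len(plan.success_metrics) != 1:
        issues.append("use exactly 1 success metric")
    if not plan.action_agreed:
        issues.append("agree how you will act before launch")
    if not plan.challenges_convention:
        issues.append("include an option that challenges convention")
    return issues


# Two made-up plans: one that breaks most rules and one that follows them.
weak_plan = ExperimentPlan(["new CTA copy"], ["RPV", "email signups"], False, False)
solid_plan = ExperimentPlan(
    ["high contrast", "no side options", "reordered fields", "new headline", "short form"],
    ["RPV"],
    True,
    True,
)

print(prelaunch_issues(weak_plan))
print(prelaunch_issues(solid_plan))
```

A check like this only covers the rules that can be counted. The rules about not explaining why something happened and about educating the organization still depend on people.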

Conclusion

Take a second to really think about the core focus of your program. It is so easy to fall into the trap of thinking of testing as a way to prove a point or validate a change on the site. There is value in this, but it is so little and so inefficient compared to what you can be doing. The first step of really getting results is to change how you view testing and optimization.

Focus on being wrong, focus on efficiency, and then design your program around that. Everything you do, from how you talk to your organization to what tests you run and how you leverage tools, is shaped by your fundamental understanding of what matters in testing.

Your own cognitive dissonance is the first hurdle to really changing how you run a program. Really evaluate what actions you take and what you are accomplishing with your program. Even if you don’t agree with everything I said in this article, you can follow the simple rules I stated, and that will help open up the program to much greater returns. Most importantly, avoid going off track, as it can be extremely difficult to get back on track the more you allow others to drive you towards less efficient outcomes.

“We are what we repeatedly do. Excellence, then, is not an act, but a habit.”

– Aristotle
