Are you going “all in” with your email testing strategies?
Posted by Rob Ropars on August 26th, 2011
We’ve all heard that if you’re in marketing, in particular email marketing, you should constantly be testing to maximize results. The most common test mentioned is the ubiquitous “A/B” split test, meaning a 50/50 list split to test one variable against another (graphics, copy, offer, layout, list, time of day, day of week, etc.).
But is an A/B test all you can or should do? If you have only a few thousand email addresses or fewer, an A/B test may be the only test you can run and still get statistically reliable results. If your list is too small, even an A/B test may not make sense. For example, with only a few hundred addresses, a single split test will tell you nothing statistically meaningful; at best it gives you directional information. Instead, you may need to replicate the test over time, aggregate the results, and analyze the combined data over a longer period.
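When you replicate a small test over several sends, aggregating comes down to summing sends and clicks for each version before calculating rates. Below is a minimal Python sketch. It assumes every replicate used the same random 50/50 split. The function name `pool_replicates` and the numbers are hypothetical.

```python
from typing import List, Tuple

def pool_replicates(replicates: List[Tuple[int, int]]) -> Tuple[int, int, float]:
    """Sum (sends, clicks) across replicates and return (sends, clicks, click_rate)."""
    total_sends = sum(sends for sends, _ in replicates)
    total_clicks = sum(clicks for _, clicks in replicates)
    rate = total_clicks / total_sends if total_sends else 0.0
    return total_sends, total_clicks, rate

# Hypothetical (sends, clicks) per replicate for each version; made-up numbers.
variant_a = [(300, 8), (280, 7), (310, 9)]
variant_b = [(300, 10), (290, 9), (305, 11)]

for name, reps in (("A", variant_a), ("B", variant_b)):
    sends, clicks, rate = pool_replicates(reps)
    print(f"{name}: {clicks}/{sends} clicks = {rate:.2%}")
```

Pooling this way assumes the audience and sending conditions are comparable from one send to the next. You can then enter the pooled totals into a significance calculator, just as you would for a single large test.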
The first consideration is how many email addresses you need in the test to get a representative sample and, more importantly, reliable results. There is a lot of math and science behind this topic, but fortunately many statistics sites offer free online calculators such as this one.
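A sample-size calculator of this kind usually applies the standard formula for comparing two proportions. Below is a minimal Python sketch. It assumes a two-sided test at 95% confidence and 80% statistical power. The post only mentions the confidence level, so the power figure is an assumption, and the function name is hypothetical. The expected click rates are guesses you supply before running the test.

```python
import math
from statistics import NormalDist

def sample_size_per_cell(p1: float, p2: float,
                         confidence: float = 0.95, power: float = 0.80) -> int:
    """Approximate addresses needed in EACH cell to distinguish rate p1 from p2
    with a two-sided two-proportion z-test (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # ~1.96 for 95%
    z_beta = NormalDist().inv_cdf(power)                      # ~0.84 for 80% power
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)

# Assumed baseline click rate of 2.7% against two hypothetical target rates.
print(sample_size_per_cell(0.027, 0.0285))
print(sample_size_per_cell(0.027, 0.0381))
```

Under these assumptions, reliably detecting a move from 2.7% to 2.85% takes roughly 188,000 addresses per cell. A jump to 3.81% needs only about 4,000 per cell, which shows how quickly the required list size grows as the expected difference shrinks.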
You must set up the test(s) correctly on the front end, with sufficient sample sizes and realistic assumed response rates, so that the results on the back end are reliable. Reliable means they reach a confidence level you're comfortable with; we recommend 95% where possible. Again, online resources such as this one can help. The key is to avoid the common mistake of simply eyeballing the results and declaring winners and losers based on response rates that merely look different.
Before testing, identify the goal or the question you're trying to answer. We recommend writing these down and then describing, as briefly and concisely as possible, the yardsticks you will use to determine the winner. Since form follows function, the test's goals and the way you will measure results should drive the copy, graphics, and/or layout, so that each message is properly structured and focused on the question you're trying to answer.
Let’s say your goal is a higher click rate, and after an A/B test you find that “A” has a 2.7% CTR and “B” has 2.85%. A common mistake is to simply subtract. You might declare “B” the winner, or you might note that “B” was only 0.15 percentage points higher and conclude that the result wasn’t significant (a virtual “tie”). Or maybe you routinely pick the higher percentage as the winner and run with it. A proper percent-change calculation, (2.85 - 2.7) / 2.7, shows that this is actually a 5.56% increase from “A” to “B.”
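The gap between simple subtraction and percent change is easy to show in code. Below is a minimal Python sketch using the 2.7% and 2.85% CTRs from the example. The function name `percent_change` is hypothetical.

```python
def percent_change(old_rate: float, new_rate: float) -> float:
    """Relative change from old_rate to new_rate, expressed in percent."""
    return (new_rate - old_rate) / old_rate * 100

ctr_a = 2.7   # "A" click-through rate, in percent
ctr_b = 2.85  # "B" click-through rate, in percent

point_difference = ctr_b - ctr_a              # simple subtraction: percentage points
relative_lift = percent_change(ctr_a, ctr_b)  # percent increase from A to B

print(f"Difference: {point_difference:.2f} percentage points")
print(f"Relative lift: {relative_lift:.2f}%")
```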
That lift may or may not be statistically significant, but as you can see, it is much larger than the subtraction suggested. To determine whether the result is statistically significant, use one of the online calculators. Plug in each version’s list size and its click rate (or open rate, conversion rate, etc., depending on the key metric you’re analyzing), and it will tell you whether the difference is large enough to be reliable at a 95% confidence level.
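Significance calculators for click or open rates typically run a two-proportion z-test. Below is a minimal Python sketch of the pooled version, assuming a two-sided test at 95% confidence. The function name is hypothetical. A particular online calculator may use a slightly different formula or apply rounding.

```python
import math
from statistics import NormalDist

def two_proportion_z_test(n_a: int, rate_a: float, n_b: int, rate_b: float) -> tuple:
    """Pooled two-proportion z-test. Returns (z, two_sided_p_value)."""
    clicks_a = rate_a * n_a
    clicks_b = rate_b * n_b
    pooled = (clicks_a + clicks_b) / (n_a + n_b)  # overall rate if A and B were identical
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided
    return z, p_value

# 2,000 recipients per version, 2.7% vs 2.85% click rate.
z, p = two_proportion_z_test(2000, 0.027, 2000, 0.0285)
print(f"z = {z:.2f}, p = {p:.3f}, significant at 95%: {p < 0.05}")
```

A p-value below 0.05 (equivalently, |z| of at least 1.96) corresponds to the 95% confidence level recommended above.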
In this example, let’s say I sent “A” and “B” each to a random 2,000 people. The calculation shows that the difference is not large enough to be statistically reliable. In fact, the “B” cell’s click rate would have needed to be at least 3.81% for the difference to be significant. Without analyzing the results properly, you wouldn’t know this.
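You can approximate the 3.81% threshold by searching for the smallest “B” rate that clears the 95% cutoff. Below is a minimal Python sketch. It assumes equal cells of 2,000, a baseline of 2.7%, and the pooled z-test with z ≥ 1.96. The helper names are hypothetical. Under these assumptions the search lands at about 3.8%, which agrees with the figure above; calculators differ slightly in formula and rounding.

```python
import math

def z_score(n: int, rate_a: float, rate_b: float) -> float:
    """Pooled two-proportion z statistic for two cells of equal size n."""
    pooled = (rate_a + rate_b) / 2  # equal cell sizes, so the pooled rate is the mean
    se = math.sqrt(pooled * (1 - pooled) * (2 / n))
    return (rate_b - rate_a) / se

def min_significant_rate(n: int, rate_a: float, z_crit: float = 1.96) -> float:
    """Smallest B rate above rate_a whose z-score reaches z_crit (bisection search)."""
    lo, hi = rate_a, 1.0  # z(lo) = 0 < z_crit, z(hi) is far above z_crit
    for _ in range(100):
        mid = (lo + hi) / 2
        if z_score(n, rate_a, mid) >= z_crit:
            hi = mid
        else:
            lo = mid
    return hi

threshold = min_significant_rate(2000, 0.027)
print(f"B needs at least {threshold:.2%} to beat 2.70% at 95% confidence")
```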
Another way to maximize your results is to avoid a full-scale A/B test altogether. If your database for an email marketing campaign is large enough (again, calculate the minimum sample size first), you can run a different kind of split test. First, randomly split your list 10%/90%. Then split the 10% group in half, so you have two small test splits of 5% each plus the remaining 90%.
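Below is a short Python sketch of the random 10%/90% split followed by halving the 10%. It assumes the list fits in memory as a Python list. The function name, the fixed seed, and the example addresses are hypothetical.

```python
import random
from typing import List, Tuple

def holdout_split(addresses: List[str], test_fraction: float = 0.10,
                  seed: int = 42) -> Tuple[List[str], List[str], List[str]]:
    """Randomly split addresses into two equal test cells and a remainder."""
    shuffled = addresses[:]               # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # seeded shuffle for a reproducible split
    n_test = int(len(shuffled) * test_fraction)
    half = n_test // 2
    cell_a = shuffled[:half]
    cell_b = shuffled[half:2 * half]
    remainder = shuffled[2 * half:]       # ~90%, plus one address if n_test is odd
    return cell_a, cell_b, remainder

emails = [f"user{i}@example.com" for i in range(1000)]
cell_a, cell_b, remainder = holdout_split(emails)
print(len(cell_a), len(cell_b), len(remainder))
```

Shuffling before slicing keeps each cell random. If the 10% group has an odd number of addresses, the extra address goes to the remainder, so the two test cells stay the same size.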
Deploy your test versions to the two 5% splits and allow as much time as possible for activity to occur (twenty-four hours if you can). Then analyze the results and deploy the winner to the remaining 90%. That way, you’ve done your best to maximize the campaign’s results without going “all in” on a typical full-file A/B split.
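The “analyze, then deploy the winner” step can be sketched as a small decision function. The sketch below makes several assumptions: click rate is the deciding metric, the two cells are compared with a pooled two-proportion z-test, and the name `pick_winner` is hypothetical. The counts are also hypothetical: two cells of 5,000, i.e. 10% of a 100,000-address list. The post doesn’t say what to do when the difference isn’t significant, so the function simply reports significance alongside the winner.

```python
import math
from typing import Tuple

def pick_winner(sends_a: int, clicks_a: int, sends_b: int, clicks_b: int,
                z_crit: float = 1.96) -> Tuple[str, bool]:
    """Return the version with the higher click rate and whether the difference
    clears a pooled two-proportion z-test at z_crit (1.96 ~ 95% confidence)."""
    rate_a = clicks_a / sends_a
    rate_b = clicks_b / sends_b
    winner = "B" if rate_b > rate_a else "A"  # ties default to "A"
    pooled = (clicks_a + clicks_b) / (sends_a + sends_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / sends_a + 1 / sends_b))
    if se == 0:  # no clicks (or all clicks) in both cells: nothing to test
        return winner, False
    z = (rate_b - rate_a) / se
    return winner, abs(z) >= z_crit

print(pick_winner(5000, 135, 5000, 190))  # 2.7% vs 3.8%
print(pick_winner(5000, 135, 5000, 142))  # 2.7% vs 2.84%
```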
As with gambling, learn the rules, do the math, analyze the data and place your bets. Do it right, and the odds will swing in your favor.
In 1594, Shakespeare wrote: “What’s in a name? That which we call a rose by any other name would smell as sweet…”. Although some words may be better than others to convey the meaning of a marketing term, resistance to change can keep lesser-qualified words in place. Take for example the word “open.”

