How Phrasee calculates your email split test sample size

Phrasee calculates your email split test sample sizes using actual statistics.

Split testing is the best strategy to optimise your email results. But: if your sample size is calculated incorrectly, you aren’t going to learn anything.

Phrasee automatically calculates sample sizes so that your split tests have enough statistical power to produce statistically significant results.

Why is your split test sample size important?

Email marketing has a huge amount of random variance. Want proof? Try this:

Do a split test – 50% to one group, 50% to the other. Use the same subject line, creative and everything for each split. Click launch.

If the world were perfectly predictable, the results would be the exact same – you’re sending out the exact same thing to the entire group. But – guess what – the results won’t be the same! Groups A and B will have different results.

This is because of random variance. In any given sample group, random things can happen. And if your sample size is too small, that random noise can look like a real difference between subject lines: a false positive.
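You can try the identical-email experiment without sending anything. The sketch below is a simulation, not Phrasee's engine: it assumes every recipient opens with the same made-up probability (20%) and draws two groups at several group sizes. The function name, the open rate and the group sizes are all illustrative assumptions.

```python
import random

def simulate_open_rate(group_size, true_open_rate, rng):
    """Return the observed open rate for one simulated group of recipients."""
    opens = sum(1 for _ in range(group_size) if rng.random() < true_open_rate)
    return opens / group_size

rng = random.Random(42)   # fixed seed so the run is repeatable
TRUE_OPEN_RATE = 0.20     # assumed: both groups receive the identical email

for group_size in (500, 5_000, 50_000):
    rate_a = simulate_open_rate(group_size, TRUE_OPEN_RATE, rng)
    rate_b = simulate_open_rate(group_size, TRUE_OPEN_RATE, rng)
    gap = abs(rate_a - rate_b)
    print(f"n={group_size:>6}: A={rate_a:.2%}  B={rate_b:.2%}  gap={gap:.2%}")
```

Groups A and B rarely match exactly, and the gap between them tends to be larger for the small groups than for the large ones.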

For example – we’ve heard of people running 15 splits to a sample size of 500 each… and thinking the “winner” is the true winner. If you’re doing this, you’re using bad statistics… and making questionable decisions.
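To see why many small splits are risky, here is a rough back-of-the-envelope calculation. It assumes each comparison is independent and uses a conventional 5% significance threshold; both are simplifying assumptions, and the function name is hypothetical.

```python
def chance_of_false_winner(num_comparisons, alpha=0.05):
    """Probability that at least one of several independent comparisons
    looks significant when no real difference exists."""
    return 1 - (1 - alpha) ** num_comparisons

# 15 splits read as 14 comparisons against one control (an assumption
# about how the test is read; real comparisons sharing a control are
# not fully independent, so treat this as a rough guide).
print(f"{chance_of_false_winner(14):.1%}")
```

Under these assumptions, the chance of crowning at least one false "winner" is roughly one in two.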

We calculate statistically robust email split test sample sizes. And here’s how.

How Phrasee calculates your sample sizes

There are several factors that impact the power of an analysis, such as:

  • Defining good hypotheses
  • Determining test variables
  • Controlling other sources of variance (where possible)

And of course, ensuring you’re using the correct sample size to learn as much as possible as quickly as possible.

You want to maximise the statistical power of your split tests… and Phrasee does this for you.

Determining your effect size

First, we need to estimate the effect size – or, how big a difference we would hypothetically consider a “success” versus a “failure” of a given subject line.

Having run thousands of split tests for customers, Phrasee knows how to do this.

First, we use the global Phrasee data set to predict the likely effect size. Then, we calculate the smallest effect size we consider to be relevant. Lastly, we insert a small level of randomness to control for experimental bias.
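The article does not say which effect-size measure Phrasee uses. One common choice for comparing two open rates is Cohen's h, sketched here as an illustration only. The open rates are invented numbers.

```python
import math

def cohens_h(rate_a, rate_b):
    """Cohen's h: a standard effect size for the gap between two proportions."""
    return 2 * math.asin(math.sqrt(rate_a)) - 2 * math.asin(math.sqrt(rate_b))

baseline_open_rate = 0.20   # hypothetical current open rate
smallest_win = 0.22         # hypothetical smallest uplift worth acting on

print(round(cohens_h(smallest_win, baseline_open_rate), 4))
```

The smaller the effect you want to be able to detect, the larger the sample each split needs.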

Calculating your sample size

Phrasee then creates a test family using the t-test and sample size estimation for correlation coefficients.

Then, we calculate the appropriate alpha level – that is, the probability of rejecting a null hypothesis that is actually true (a false positive). This ranges from 0.009 for intricate tests, to 0.1 for fundamental tests. Lastly, we set a statistical power level: the probability that the test detects an effect that really exists.
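To show how alpha and power drive sample size, here is a minimal sketch using a textbook two-proportion z-test approximation. This is a swapped-in technique, not the test family described above. The open rates, the 80% power default and the function name are assumptions for illustration.

```python
import math
from statistics import NormalDist

def sample_size_per_split(rate_a, rate_b, alpha=0.05, power=0.80):
    """Recipients needed per split to tell rate_a from rate_b using a
    two-sided two-proportion z-test (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # critical value for alpha
    z_power = NormalDist().inv_cdf(power)           # quantile for desired power
    variance = rate_a * (1 - rate_a) + rate_b * (1 - rate_b)
    n = (z_alpha + z_power) ** 2 * variance / (rate_a - rate_b) ** 2
    return math.ceil(n)

# The alpha range quoted above: 0.1 (fundamental) to 0.009 (intricate).
for alpha in (0.1, 0.009):
    print(alpha, sample_size_per_split(0.20, 0.22, alpha=alpha))
```

A stricter alpha or a higher power level both push the required sample size up, which is why there is no single "right" number for every test.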

This gives you your number of splits and sample size

We use a maximum of 30% of your overall email list as a test group. Sometimes we’ll use a lot, sometimes not so much: it all depends on what our statistical engine requires.

You then send your generated subject lines to samples of this size… and send whichever subject line wins to the remaining audience.
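The arithmetic behind the 30% cap is simple to check yourself. The sketch below uses a made-up list size, number of splits and per-split sample size; only the 30% cap comes from the text above, and the function and field names are hypothetical.

```python
def plan_test(list_size, num_splits, per_split, max_test_percent=30):
    """Check a test plan against a cap on the share of the list used for testing."""
    test_total = num_splits * per_split
    cap = list_size * max_test_percent // 100   # integer maths avoids float rounding
    return {
        "test_total": test_total,                        # recipients in all splits
        "cap": cap,                                      # most recipients allowed in the test
        "fits": test_total <= cap,                       # does the plan respect the cap?
        "remaining_for_winner": list_size - test_total,  # audience left for the winner
    }

print(plan_test(list_size=200_000, num_splits=5, per_split=6_500))
print(plan_test(list_size=10_000, num_splits=15, per_split=500))
```

The first plan fits comfortably inside the cap. The second, 15 splits of 500 on a 10,000-person list, does not.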
