For Digital Analytics, MDEs > Power
One of the most important concepts in designing and running experiments is statistical power – the probability that a test will reject the null hypothesis at a desired significance level, given that the effect is real. In more colloquial language, statistical power is “how likely am I to get a statistically significant finding for an effect, provided that the effect is real?”. That colloquial language is not particularly colloquial, which is a shame, because power is an incredibly useful tool for an experimenter to wield. It lets you figure out ahead of time which tests are worth doing and which are not, as well as calculate the minimum sample size required to run a test. The problem is that it’s incredibly difficult to explain the concept to people without a solid statistics background, and a good analytics practitioner is an interpreter and a salesperson as much as an analyst.
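As a sketch of what that calculation looks like in practice, here is a power computation for a two-sample proportion test under the usual normal approximation. The baseline rate, lift, sample size, and significance level are illustrative assumptions, not numbers from any real test:

```python
# A minimal sketch of a power calculation for a two-sample proportion
# test, using the normal approximation. All inputs are illustrative.
from statistics import NormalDist

def power_two_proportions(p_control, p_variant, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two proportions."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)  # critical value at the chosen alpha
    # Standard error of the difference between the two observed rates
    se = ((p_control * (1 - p_control)
           + p_variant * (1 - p_variant)) / n_per_arm) ** 0.5
    effect = abs(p_variant - p_control)
    # Probability the observed difference clears the critical threshold
    return z.cdf(effect / se - z_crit)

# e.g. a 20% -> 22% open-rate lift with 2,000 recipients per arm
print(round(power_two_proportions(0.20, 0.22, 2000), 3))  # ~0.342
```

Note that the true lift (20% to 22%) is an input here – which is exactly the problem, since it’s the one thing we don’t know before running the test.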
I’ve realized that a much better way to frame the importance of sample size is the minimum detectable effect, or MDE. The minimum detectable effect is, colloquially, “given a sample of size n, how large would an effect have to be in order to detect it reliably?”. There are a few virtues that make it particularly useful in the context of a Web business:
- It doesn’t rely on assumptions about effect size: Statistical power takes effect size as an input, but the problem is that we don’t know the effect of an experiment until we try it. Instead, the MDE works from what we already have – the natural rate of variation in what we’re measuring and the sample size. That points to another strength:
- Sample size is an input, so you can work with what you have: In many business contexts, the sample is fixed rather than something we can influence – the number of people visiting our homepage is out of our control, and we can’t just increase the number of people on our email list. The MDE lets you take the sample as it is and ask, “Given our existing constraints, how large would an effect have to be?”, which leads to the biggest strength:
- It is expressed in terms a practitioner can understand. We can take whatever sample we have, plug in the variance, and get out a result that’s comprehensible. “Well, we’ll probably only see an effect if the new creative bumps open rates by 20%” is a sentence that a marketer or product manager can understand, and in turn decide on their own whether that’s plausible.
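To make that concrete, here is a minimal sketch of the MDE calculation for a two-proportion test, again under the normal approximation. The 20% baseline open rate, 2,000-per-arm list size, 5% significance level, and 80% power target are illustrative assumptions, not prescriptions:

```python
# A minimal sketch of the MDE framing: given the sample we actually
# have, how large would a lift need to be to detect it reliably?
# All numeric inputs are illustrative assumptions.
from statistics import NormalDist

def minimum_detectable_effect(baseline_rate, n_per_arm, alpha=0.05, power=0.80):
    """Smallest absolute lift a two-proportion z-test would reliably detect."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = z.inv_cdf(power)          # quantile for the target power
    # Approximate both arms with the baseline variance
    se = (2 * baseline_rate * (1 - baseline_rate) / n_per_arm) ** 0.5
    return (z_alpha + z_power) * se

# e.g. a 20% baseline open rate with 2,000 recipients per arm
mde = minimum_detectable_effect(0.20, 2000)
print(f"absolute lift: {mde:.1%}, relative lift: {mde / 0.20:.0%}")
```

With those inputs, the test would only reliably detect an absolute lift of about 3.5 points – roughly an 18% relative bump in open rates – which is exactly the kind of sentence a marketer or product manager can sanity-check on their own.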
Analytics is ultimately a service for the rest of the organization, and designing good experiments is one of the most important things we can do. Making that message heard is a crucial part of the task, and I think MDEs should occupy a bigger part of the communication toolbox.