Statistical Significance: Are You Interpreting Correctly?
Know the null hypothesis!
A few days back, I was chatting with a friend about a paper. At one point, the conversation went like this:
Friend: (Referring to a result mentioned in the paper) The difference in the average outcome between the two groups is statistically significant.
Me: Yeah, it suggests that the difference in the average outcome between the two groups is highly likely to be different from 0 in the population; but this result alone tells us nothing about how big the actual difference is.
Friend: What!? 😲 I thought a statistically significant difference suggests a substantial difference in the outcome! 🤷‍♂️
Indeed, many of us share this misunderstanding, particularly at the beginning of our statistics education. And now that data is everywhere and statistical results are often shared with the general public, particularly in newspapers, magazines, and blogs, it is crucial not to mislead anyone.
Colloquially, when we say that something is significant, we mean that it is meaningful and important. If you have had no training in inferential statistics, or were poorly taught, you may intuitively think that a statistically significant finding means an important finding obtained through a statistical test.
Yes, a statistically significant finding may be really important in some cases, but it may not be important at all in many other cases. For example, a $5 difference in the average annual income between two groups of people may be statistically significant (i.e., highly likely to be different from 0 in the population of interest) in a large sample, but this difference may not have any real-world significance.
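To make the $5 example concrete, here is a minimal sketch in Python that uses only the standard library. It runs a two-sided, two-sample z-test on summary statistics. The within-group standard deviation of $1,000 and the two sample sizes are made-up numbers chosen purely for illustration, and the function name two_sample_z_pvalue is hypothetical.

```python
import math
from statistics import NormalDist


def two_sample_z_pvalue(diff, sd, n_per_group, null_diff=0.0):
    """Two-sided p-value for a difference in means.

    Assumes both groups have the same standard deviation and the same size.
    """
    se = sd * math.sqrt(2.0 / n_per_group)  # standard error of the difference
    z = (diff - null_diff) / se
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# Hypothetical inputs: a $5 observed difference, $1,000 sd in each group.
p_small = two_sample_z_pvalue(diff=5, sd=1_000, n_per_group=1_000)
p_large = two_sample_z_pvalue(diff=5, sd=1_000, n_per_group=1_000_000)

print(f"n = 1,000 per group:     p = {p_small:.3f}")
print(f"n = 1,000,000 per group: p = {p_large:.5f}")
```

With these assumed inputs, the same $5 difference is not significant at the 5% level with 1,000 people per group, but it is with 1,000,000 people per group. The difference itself never changed; only the sample size did.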
Here are a few things to note about statistical significance in the context of two-tailed t-tests/z-tests:
1. Statistical significance is always tied to a hypothesis test. Usually, we have:
i) a null hypothesis of no difference and
ii) an alternative hypothesis of a non-zero difference in the average outcome between the two groups in the population of interest
It should be noted that the null hypothesis can specify any value, not just 0. For example, in a two-sample t-test, we can test the null hypothesis of a $1000 difference (rather than a $0 difference) in the population-level average annual income between the two groups.
In many statistical packages, the default value of the difference in the null hypothesis is set to 0. This makes sense because we often test a null hypothesis of no difference in the population-level average outcome between two groups.
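The role of the null value can be sketched with a small z-test helper, assuming the observed difference and its standard error are already known. The numbers below ($1,200 observed difference, $100 standard error) and the function name z_test_pvalue are hypothetical; the null_diff parameter defaults to 0 to mirror the default described above.

```python
import math


def z_test_pvalue(observed_diff, std_error, null_diff=0.0):
    """Two-sided p-value for H0: population difference == null_diff."""
    z = (observed_diff - null_diff) / std_error
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) and stays accurate for large |z|.
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical summary statistics for the difference in average annual income.
observed_diff = 1_200.0
std_error = 100.0

p_null_0 = z_test_pvalue(observed_diff, std_error)                       # H0: difference = $0
p_null_1000 = z_test_pvalue(observed_diff, std_error, null_diff=1_000.0)  # H0: difference = $1000
p_null_1100 = z_test_pvalue(observed_diff, std_error, null_diff=1_100.0)  # H0: difference = $1100

for label, p in [("$0", p_null_0), ("$1000", p_null_1000), ("$1100", p_null_1100)]:
    print(f"H0: difference = {label:>5}  ->  p = {p:.4g}")
```

The same data can lead to rejecting one null hypothesis and failing to reject another, which is why a "significant" result cannot be read without knowing which null value was tested.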
2. We decide on a significance level (denoted by the Greek letter alpha) for the hypothesis test. Conventionally, we pick a 5% significance level, which means that, if the null hypothesis is actually true, we accept a 5% risk of erroneously rejecting it (a Type I error). In other words, among tests in which the null hypothesis is true, we expect to wrongly reject it about 5 out of 100 times.
Based on the significance level and the p-value of the hypothesis test (the probability of getting a test statistic at least as extreme as the one we got in our sample, given that the null hypothesis is true), we set a decision rule:
If the p-value is ≤ 0.05, we reject the null hypothesis in favor of the alternative hypothesis
If the p-value is > 0.05, we fail to reject the null hypothesis
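The decision rule, and what the 5% level implies, can be sketched as follows. This is an illustrative simulation rather than a formal proof: it draws z statistics from a standard normal distribution (that is, it assumes the null hypothesis is true in every test) and counts how often the rule rejects. The names decide and simulated_type_i_error_rate, the seed, and the number of simulated tests are all assumptions made for this example.

```python
import random
from statistics import NormalDist

ALPHA = 0.05  # conventional 5% significance level


def decide(p_value, alpha=ALPHA):
    """Apply the decision rule: reject H0 when p <= alpha."""
    return "reject H0" if p_value <= alpha else "fail to reject H0"


def simulated_type_i_error_rate(n_tests=20_000, alpha=ALPHA, seed=0):
    """Share of tests that wrongly reject H0 when H0 is true in every test."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    rejections = 0
    for _ in range(n_tests):
        z = rng.gauss(0.0, 1.0)  # z statistic drawn under a true null
        p = 2.0 * (1.0 - std_normal.cdf(abs(z)))  # two-sided p-value
        if decide(p, alpha) == "reject H0":
            rejections += 1
    return rejections / n_tests


print(decide(0.03))  # reject H0
print(decide(0.20))  # fail to reject H0
print(f"Simulated false-rejection rate: {simulated_type_i_error_rate():.3f}")
```

The simulated rate should land close to 0.05, which is the risk of rejecting a true null hypothesis that we accepted when choosing the significance level.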
3. When we reject a null hypothesis, we conclude that the difference in the population-level average outcome between the two groups is significantly different from the value stated in the null hypothesis. In this context, “significantly different” means that the sample provides enough evidence against the null hypothesis at the chosen significance level; it says nothing about how large the difference is.
For example, let’s consider a case in which the null hypothesis says that there is no difference in the population-level average annual income between two groups of people, and after conducting a t-test, we reject it.
By rejecting the null, we conclude that the difference in the population-level average annual income between the two groups is significantly different from 0.
Some would interpret the result as: “There is a significant difference in the average income between the two groups,” which is technically correct but can be misleading.
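One way to make such a statement less misleading is to report the estimated difference together with an interval around it, not just the verdict. The sketch below assumes large samples (so a normal approximation is reasonable) and uses made-up summary statistics; the function name diff_with_ci and all input values are hypothetical.

```python
import math
from statistics import NormalDist


def diff_with_ci(mean_a, mean_b, sd_a, sd_b, n_a, n_b, confidence=0.95):
    """Difference in means with a normal-approximation confidence interval."""
    diff = mean_a - mean_b
    se = math.sqrt(sd_a**2 / n_a + sd_b**2 / n_b)  # standard error of the difference
    z_crit = NormalDist().inv_cdf(0.5 + confidence / 2.0)  # about 1.96 for 95%
    return diff, diff - z_crit * se, diff + z_crit * se


# Hypothetical summary statistics: average incomes of $50,005 and $50,000.
diff, low, high = diff_with_ci(
    mean_a=50_005, mean_b=50_000,
    sd_a=1_000, sd_b=1_000,
    n_a=1_000_000, n_b=1_000_000,
)

print(f"Estimated difference: ${diff:.2f} (95% CI: ${low:.2f} to ${high:.2f})")
```

With these assumed inputs the interval excludes 0, so the difference is statistically significant, yet the interval also shows that the difference amounts to only a few dollars a year.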
To conclude, to interpret the meaning of a statistically significant result, we must first know the null hypothesis. Without knowing the null, it is difficult to judge the practical relevance of a statistically significant result.