Inspired by an article on phys.org, I decided to compile a list of the seven statistical sins. Statistics is a vital tool for understanding the patterns in the world around us, but our intuition often lets us down when it comes to interpreting those patterns.
1. Assuming small differences are meaningful
Examples include small fluctuations in the stock market, or polls in which one party is ahead by one or two points. Such differences often reflect chance rather than anything meaningful.
To avoid drawing false conclusions from this statistical noise, we must consider the margin of error associated with the numbers. If the difference is smaller than the margin of error, there is likely no meaningful difference, and the gap is probably due to random fluctuation.
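As a rough sketch (assuming a simple random sample and a 95% confidence level, with made-up poll numbers), the margin of error for a polled proportion can be computed like this:

```python
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a proportion p in a sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

# A hypothetical poll of 1,000 voters: party A on 41%, party B on 39%.
moe = margin_of_error(0.41, 1000)
lead = 0.41 - 0.39
print(f"lead: {lead:.1%}, margin of error: ±{moe:.1%}")
```

The two-point lead is smaller than the roughly three-point margin of error, so it may be nothing more than sampling noise.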
2. Equating statistical significance to real-world significance
Statistically significant group differences do not always translate into real-world generalisations. Stereotypically, for example, women are more nurturing while men are physically stronger. Yet if you pick two men at random, there is likely to be quite a lot of difference in their physical strength; and if you pick one man and one woman, they may be very similar in how nurturing they are, or the man may be the more nurturing of the two.
This error can be avoided by analysing the effect size of the difference between groups: a measure of how much the average of one group differs from the average of the other. If the effect size is small, the two groups are very similar. Even if the effect size is large, each group will still contain a lot of variation, so not all members of one group will differ from all members of the other (which is what gives rise to the error described above).
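One common effect-size measure is Cohen's d, which scales the difference between group means by their pooled standard deviation. A minimal sketch with made-up data:

```python
import statistics

def cohens_d(group_a: list, group_b: list) -> float:
    """Cohen's d: difference of group means divided by the pooled standard deviation."""
    na, nb = len(group_a), len(group_b)
    va, vb = statistics.variance(group_a), statistics.variance(group_b)
    pooled_sd = (((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)) ** 0.5
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd

# Two heavily overlapping groups: the averages differ, but most members don't.
d = cohens_d([5, 6, 7, 8, 9], [4, 5, 6, 7, 8])
print(f"effect size d = {d:.2f}")
```

The result is a respectable "medium" effect size, yet the two ranges overlap almost entirely: knowing the group averages differ tells you little about any two individuals.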
3. Neglecting to look at the extremes
This is relevant when looking at normally distributed data.
In such cases, a small change in the group's average has little effect on the typical person, but it changes the character of the extremes much more drastically. To avoid this error, we have to ask whether we are dealing with extreme cases or not. If we are, small differences can radically affect the conclusions.
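To see how sensitive the tails are, here is a sketch using Python's `statistics.NormalDist` (with arbitrary units): shifting the whole distribution up by just 0.2 standard deviations barely moves the typical person, but markedly increases the share of people beyond an extreme cutoff.

```python
from statistics import NormalDist

baseline = NormalDist(mu=0, sigma=1)
shifted = NormalDist(mu=0.2, sigma=1)   # the whole group improves by 0.2 SD

cutoff = 2.0  # an "extreme" threshold: two standard deviations above the old mean
p_before = 1 - baseline.cdf(cutoff)
p_after = 1 - shifted.cdf(cutoff)
print(f"beyond cutoff before: {p_before:.2%}, after: {p_after:.2%}, "
      f"ratio: {p_after / p_before:.2f}x")
```

The average person is almost unchanged, yet the population beyond the cutoff grows by more than half.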
4. Trusting coincidence
If we look hard enough, we can find patterns and correlations between the strangest things, and these may be due merely to coincidence. So, when analysing data, we have to ask ourselves how reliable the observed association is. Is it a one-off? Can future associations be predicted? If it has only been seen once, it is probably due to chance.
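This is easy to demonstrate by simulation (a sketch with entirely made-up random data): generate enough unrelated random series and at least one will correlate strongly with your data purely by chance.

```python
import random

random.seed(0)

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# One short "target" series, compared against 1,000 unrelated random series.
target = [random.gauss(0, 1) for _ in range(10)]
best = max(
    abs(pearson_r(target, [random.gauss(0, 1) for _ in range(10)]))
    for _ in range(1000)
)
print(f"strongest purely-coincidental correlation: |r| = {best:.2f}")
```

With enough candidates, a strong-looking correlation always turns up; only replication on new data separates a real signal from coincidence.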
5. Getting causation backwards
When we find a correlation between two things, for example unemployment and mental health, it may be tempting to see a causal path in one direction: mental health problems lead to unemployment. However, sometimes the causal path goes in the other direction: unemployment leads to mental health problems.
To get the direction of the causal path right, think about reverse causality whenever you see an association. Could it run in the other direction? Could it even run both ways (a feedback loop)?
6. Forgetting outside causes
Failing to consider a third factor that creates an association between two things can lead to an incorrect conclusion. For example, there may be an association between eating at restaurants and good cardiovascular health. However, this may simply be because those who can afford to eat at restaurants regularly are in a high socioeconomic bracket, which in turn means they can also afford better health care.
Therefore, it is crucial to think about possible third factors when you observe a correlation.
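The restaurant example above can be sketched as a simulation (with invented numbers) in which income, the hidden third factor, drives both behaviours while neither causes the other; a clear correlation between the two still appears:

```python
import random

random.seed(1)

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Hypothetical population: income influences both variables; they never touch.
incomes = [random.gauss(50, 15) for _ in range(5000)]
restaurant_visits = [0.1 * inc + random.gauss(0, 2) for inc in incomes]
cardio_health = [0.2 * inc + random.gauss(0, 4) for inc in incomes]

r = pearson_r(restaurant_visits, cardio_health)
print(f"correlation between the two effects of income: r = {r:.2f}")
```

The two outcome variables never influence one another, yet they correlate substantially through their shared cause; controlling for income would make the association vanish.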
7. Deceptive graphs
A lot of deception can arise from the way the axes (especially the vertical axis) are labelled on graphs. The labels should cover a meaningful range for the data shown. By choosing a narrower range, for example, a small difference can be made to look far more dramatic (and vice versa).
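A quick way to quantify the trick (with hypothetical numbers): compare how much taller one bar looks than another when the vertical axis starts at zero versus just below the data.

```python
def visual_ratio(a: float, b: float, axis_min: float) -> float:
    """How many times taller the bar for b looks than the bar for a
    when the vertical axis starts at axis_min."""
    return (b - axis_min) / (a - axis_min)

# Two nearly identical values: 50 vs 52 (a 4% real difference).
print(visual_ratio(50, 52, axis_min=0))   # axis from zero: bars look almost equal
print(visual_ratio(50, 52, axis_min=49))  # truncated axis: the 52 bar looks 3x taller
```

The underlying numbers never change; only the axis does, and with it the visual impression.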
In fact, check out this blog filled with bad graphs.