Dashboards and visualisations are fast becoming essential tools for helping decision makers digest large amounts of information at a glance. Their adoption has been accelerated by tools like Tableau and QlikView, which lower the barrier to creating complex graphs. As we increasingly count on hard facts and evidence to complement our experience and intuition when making business decisions, we must also sharpen our analytical lens and watch out for data traps.

With a robust database in place, I set out to look for correlations between two time series datasets from the Office for National Statistics and found a fairly high correlation (0.836) between expenditure by UK households on beer and the popularity of the name 'Harry' for newborn boys in England and Wales.

Without any background knowledge, the graph above may lead some to believe that there is a relationship between how much UK households spend on beer and the popularity of the name 'Harry' for newborn boys in England and Wales. The less data-savvy among us may even mistakenly believe that one causes the other: could we boost national sales of beer by discouraging new parents from naming their child Harry?

This is a rather contrived example, but it serves to highlight some pitfalls to avoid when we are interpreting data.

Correlation does not imply causation: Concurrent events do not always have a cause-and-effect relationship. Most statistical tests for correlation cannot pinpoint causal relationships. In this case, more baby boys named Harry could be depressing the sales of beer, or perhaps the nation's decreased appetite for beer makes the name Harry ever more appealing to parents. Or maybe there is a third reason, a common cause, that hasn't been considered in the analysis and is affecting both variables; perhaps Houdini has put a spell on the nation?
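
To make the common-cause idea concrete, here is a minimal sketch in Python using simulated data rather than the ONS figures. It assumes a hypothetical shared trend that feeds into two otherwise unrelated series, and the names (`pearson`, `differences`, `series_a`, `series_b`) are illustrative only. The levels of the two series come out strongly correlated, while their period-to-period changes, which strip out the steady trend, do not.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def differences(series):
    """Period-to-period changes, which strip out a steady trend."""
    return [b - a for a, b in zip(series, series[1:])]


rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical common cause: a steady trend that feeds into both series.
trend = [0.5 * t for t in range(200)]
series_a = [level + rng.gauss(0, 1) for level in trend]
series_b = [level + rng.gauss(0, 1) for level in trend]

r_levels = pearson(series_a, series_b)
r_changes = pearson(differences(series_a), differences(series_b))

print(f"correlation of levels:  {r_levels:.3f}")
print(f"correlation of changes: {r_changes:.3f}")
```

Comparing changes instead of levels is only one rough check, and it would not rule out every kind of hidden driver, but it is a quick way to see whether a headline correlation is mostly two series drifting in step.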

Spurious relationship: Two highly correlated variables sometimes have no connection at all. Spurious relationships can come from exhaustive data mining, where the eyes see a relationship on paper and the mind instantly treats the connection as significant. Most of us would rightly recognise that the popularity of baby names has no bearing on household spending, or vice versa, but will most of us remember to apply the same rigour to business data?
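
The effect of exhaustive data mining can be sketched with simulated data. The example below is an illustration under stated assumptions, not the analysis behind the graph above: it builds 40 random walks that are unrelated by construction, compares every pair, and reports the strongest correlation it finds. The helper names and the choice of 40 series of 30 points are arbitrary.

```python
import random
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def random_walk(rng, length):
    """A series whose steps are pure noise, so it has no real driver."""
    level, walk = 0.0, []
    for _ in range(length):
        level += rng.gauss(0, 1)
        walk.append(level)
    return walk


rng = random.Random(7)  # fixed seed so the sketch is repeatable
walks = [random_walk(rng, 30) for _ in range(40)]

# Compare every pair and keep the one with the strongest correlation.
best_r, best_pair = max(
    (abs(pearson(walks[i], walks[j])), (i, j))
    for i, j in combinations(range(len(walks)), 2)
)
pairs_checked = len(walks) * (len(walks) - 1) // 2

print(f"pairs checked: {pairs_checked}")
print(f"strongest |correlation|: {best_r:.3f} between walks {best_pair}")
```

With hundreds of pairs to choose from, a striking match turns up by chance alone, which is why a correlation found by trawling deserves more scepticism than one predicted in advance.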

Misleading y-axis: A truncated y-axis can sometimes highlight differences that aren't really that significant. In my example, expenditure on beer as a proportion of total domestic spending appears volatile (left), but recreating the graph with a more conventional y-axis, where the value starts at zero, shows a slightly different picture (right). Seeing dramatic rises and falls in business data may get us excited, but fully grasping the context will ensure we react to what the data says, rather than what we feel the visualisation says.
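
The arithmetic behind the effect is simple, and a short sketch makes it visible. The figures below are hypothetical and are not read from the graphs above; `share_of_axis` is an illustrative helper that measures how much of the plot's height a given move occupies under different axis ranges.

```python
def share_of_axis(low, high, axis_min, axis_max):
    """Fraction of the plot's height taken up by the move from low to high."""
    return (high - low) / (axis_max - axis_min)


# Hypothetical figures: a share of spending that moves between 1.6% and 1.9%.
low, high = 1.6, 1.9

truncated = share_of_axis(low, high, axis_min=1.5, axis_max=2.0)
zero_based = share_of_axis(low, high, axis_min=0.0, axis_max=2.0)

print(f"truncated axis:  {truncated:.0%} of the plot height")
print(f"zero-based axis: {zero_based:.0%} of the plot height")
```

In this made-up case the same 0.3 point move fills about 60% of the chart on the truncated axis and about 15% on the zero-based one, so the data are identical and only the impression changes.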

Data may be empirical, but if we don't learn to interpret it correctly, data can still lie.
