2023 Q4 | Edition 4 | Article 3


Skewed data shows us why we need humans

There is a concept that is fundamental to looking at data. It’s a pattern you first observed when you were in school, measuring the heights of your classmates and plotting them on a chart. It is the idea of the bell-shaped curve.

The bell-shaped curve goes by other technical names – the normal curve or the Gaussian curve. The top of the curve shows the mean, or average, height. On this chart, the mean height of men is 178cm, or around 5 foot 10 inches. The ends represent the minimum and maximum heights. For mathematicians, this shape matters: many standard statistical techniques assume that data is bell-shaped.

But data gets its meaning from its context, and for that, we need people who know about the part of the world where the data was collected. We don’t have a date on this chart, and if the data was collected in 1923, it would be useless as a predictor of male height today, because men have, on average, got a lot taller since then. We don’t know where these men lived, either. American men tend to be a lot taller on average than, say, men in Timor, and shorter than men in the Netherlands. We can’t use the same data to make sense of a new and different situation without expecting trouble.

Data is rarely perfectly symmetrical and balanced

There is another reason why we need human interpretation. Data is rarely a perfectly symmetrical mountain like the curve in the middle chart above. When it slopes off to one side, the implications can be serious, depending on the context.

In the positively skewed image, the mode – the most common value – is much lower than the mean.

  • If this was an investment, you’d be worried even if the mean return looked good. More than half of the people investing in this fund got less than the mean return.

In the negatively skewed image, the tail slopes off towards the lower end of the scale, and the mode, or most common value, is much higher than the mean.

  • If these were ambulance waiting times, the negative skew means that most ambulances took longer than the mean. In this case, that’s not good news, as we know that the faster ambulances arrive, the better. We might want to review the process further, though. Perhaps the call centre has an efficient way of prioritising the most urgent calls so that they get an extremely fast response, and the shape of the data is intentional, to ensure that the most urgent cases are dealt with first.
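For readers who like to see the numbers, the two skewed cases above can be sketched in a few lines of Python. The figures here are invented purely for illustration and are not taken from any real fund or ambulance service; the helper name `summarise` is also just a label chosen for this sketch.

```python
from statistics import mean, median, mode

# Invented figures for illustration only.
# Positive skew: most returns are low, a few are very high (long tail to the right).
fund_returns = [1, 2, 2, 2, 3, 3, 4, 5, 12, 26]  # percent

# Negative skew: most waits are long, a few are very short (long tail to the left).
ambulance_waits = [2, 3, 14, 15, 16, 16, 17, 17, 17, 18]  # minutes


def summarise(values):
    """Return the mode, median, mean and the share of values below the mean."""
    m = mean(values)
    below = sum(1 for v in values if v < m) / len(values)
    return {
        "mode": mode(values),
        "median": median(values),
        "mean": m,
        "share_below_mean": below,
    }


print("Fund returns:   ", summarise(fund_returns))
print("Ambulance waits:", summarise(ambulance_waits))
```

With these made-up numbers, the fund has a mean return of 6 per cent, yet the most common return is 2 per cent and eight in ten investors got less than the mean. The ambulance waits show the mirror image: the mean is 13.5 minutes, the most common wait is 17 minutes, and only two in ten calls were answered faster than the mean. The summary cannot say whether those two fast responses were the most urgent cases; that still takes someone who knows how the service works.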

The random shapes of data mountains show us that new information can only be useful if we have people able to make connections and fill in the gaps in our knowledge. Human-machine teaming is an essential requirement if we are to make the most of what machines can offer in solving problems.
