# How to use dispersion statistics to understand data

Dispersion statistics describe how spread out data is. The range (the difference between the highest and lowest values) is the simplest. Variance (the average squared distance of each value from the mean) and standard deviation (the square root of the variance, expressed in the same units as the data) are more complex. Amazon might want to know the range of incomes of their customers. By contrast, a question such as which song Spotify users downloaded most frequently is about the most common value, not about spread.

Range, variance, and standard deviation are three statistical measures that can be used to understand customer behavior. Each describes spread in a different way, so using them together gives a more complete picture than any one alone.

The range is the difference between the highest and lowest value in a set of data. It gives a quick sense of the spread of the data, and an unexpectedly large range can point to outliers. For example, if the range of the amount of money spent by customers on a particular product is \$20, it means that the highest amount spent is \$20 more than the lowest amount spent. Because the range depends only on the two extreme values, it says nothing about how the remaining customers are distributed between them, and a single unusual purchase can inflate it.

The variance measures how far the values in a data set lie from their mean. It is calculated by averaging the squared differences between each value and the mean, so it is expressed in squared units (squared dollars, for spending data). The higher the variance, the more spread out the data is. Variance can be used to understand how widely customers' spending on a particular product varies around the average amount.

The standard deviation is the square root of the variance. Because it is expressed in the same units as the data (dollars rather than squared dollars), it is easier to interpret: it describes how far a typical customer's spending falls from the average amount spent on a particular product. Customers whose spending lies more than one or two standard deviations from the mean are spending noticeably more or less than the average customer.
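As a minimal sketch of the three measures, the following Python computes them for a made-up list of purchase amounts. The `spend` values and the function names are illustrative assumptions, not data from any real retailer. The variance here divides by the number of values, matching the "average of the squared differences" definition; a sample variance would divide by one less than that.

```python
from math import sqrt

def value_range(values):
    """Difference between the highest and lowest value."""
    return max(values) - min(values)

def population_variance(values):
    """Average of the squared differences from the mean."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)

def population_std_dev(values):
    """Square root of the variance, in the same units as the data."""
    return sqrt(population_variance(values))

# Hypothetical amounts (in dollars) spent by five customers on one product.
spend = [10.0, 12.0, 15.0, 18.0, 30.0]

print(value_range(spend))                   # 20.0
print(population_variance(spend))           # 49.6 (squared dollars)
print(round(population_std_dev(spend), 2))  # 7.04 (dollars)
```

Python's standard `statistics` module provides `pvariance` and `pstdev` for the same calculations, along with `variance` and `stdev` for the sample versions.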

Here’s an example from Charles Wheelan’s popular book Naked Statistics. You’ve been feeling extra tired so your doctor orders some blood work. The result comes back with a count of 134 for a (fictitious) blood chemical. The internet tells you that the mean count for someone your age is 122. Do you worry? You call the doctor and her assistant tells you that there is natural variation for this chemical and the standard deviation for this measure is 18.

Plenty of people have counts higher or lower than the mean. The problem comes when the count is excessive. How do you know what’s excessive? In many distributions of data, most observations lie within one standard deviation of the mean. In a normal distribution, about 68% of observations fall in this range. This property is the “foundation on which much of statistics is built,” writes Wheelan. Your result is 12 above the mean, which is two-thirds of one standard deviation (18), so it falls comfortably within the usual range. You can relax, for now.
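The arithmetic behind that reassurance can be sketched in Python. The figures (134, 122, and 18) come from the blood-work example; the variable names are illustrative. The last step assumes the blood chemical is normally distributed, which the example implies but a real lab value may not satisfy.

```python
from statistics import NormalDist

result = 134   # your count
mean = 122     # mean count for someone your age
std_dev = 18   # standard deviation quoted by the doctor's assistant

# How many standard deviations the result sits from the mean (a z-score).
z_score = (result - mean) / std_dev
print(round(z_score, 2))  # 0.67

# Share of a normal distribution lying within one standard deviation of the mean.
standard_normal = NormalDist()  # mean 0, standard deviation 1
within_one_sd = standard_normal.cdf(1) - standard_normal.cdf(-1)
print(round(within_one_sd, 4))  # 0.6827
```

A z-score below 1 in absolute value places the result inside the band that holds roughly two-thirds of all observations under the normality assumption.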

These examples are simple, so we hope you can remember them as prompts when you first approach any data set. Even the largest, most complex data set can be described and its underlying patterns exposed.