We explain what the measures of dispersion are and which ones are most used, and we give several examples

**What are dispersion measures?**

The **measures of dispersion**, or of variation, are statistics that measure how far the data in a distribution lie from a central value, such as the mean or arithmetic average. Their value is positive, and it is 0 only when all the data are identical.

If a measure of dispersion yields a small value, the data are located very close to the average; if it is large, the data are more dispersed and therefore farther from the average.

Dispersion measures are very important from a statistical point of view, not only as arithmetic indicators of the variation in the data, but also as an invaluable aid for improving quality, both in the manufacturing of products and in the provision of services.

An example of this is the customer service lines at banks. The average waiting time when customers form a single line and are then distributed among the teller windows is the same as when they form individual lines in front of each window.

However, the dispersion is smaller with the single line, which means that the time each customer waits is very similar. Customers have stated that they feel more comfortable this way, even though the average time is the same in either arrangement.

The main measures of dispersion are: range, variance, standard deviation and coefficient of variation.

**Range**

The range $R$ of a data set is defined as the difference between the maximum value $x_{\max}$ and the minimum value $x_{\min}$ of the set:

$$R = \text{maximum value} - \text{minimum value} = x_{\max} - x_{\min}$$

The range is quick to calculate, but it is very sensitive to extreme values, and has the disadvantage of not taking intermediate values into account. For this reason, it is only used to have an initial, fairly approximate idea of the dispersion of the data.

**Range example**

This is a list of the number of hurricanes that have occurred in the Atlantic during the last 14 years:

8; 9; 7; 8; 15; 9; 6; 5; 8; 4; 12; 7; 8; 2

The maximum value is 15 and the minimum value is 2, therefore:

$$R = x_{\max} - x_{\min} = 15 - 2 = 13 \text{ hurricanes}$$
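The range calculation above can be sketched in a few lines of Python. The function name `data_range` is only an illustrative choice and does not come from the original text.

```python
# Hurricane counts from the example above (14 years).
hurricanes = [8, 9, 7, 8, 15, 9, 6, 5, 8, 4, 12, 7, 8, 2]


def data_range(values):
    """Return the range R = x_max - x_min of a non-empty data set."""
    return max(values) - min(values)


R = data_range(hurricanes)
print(R)  # 13
```

Only the two extreme values enter the result, which is why the range ignores everything in between.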

**Variance**

This measure compares each of the data values with the mean of the set. It is calculated by adding the squared differences between each value and the mean, and dividing by the total number of values.

Let:

- Mean: $\mu$

- Any value belonging to the data set: $x_i$

- Total number of observations: $N$

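Denoting the variance of a population as $\sigma^2$, the expression to calculate it is:

$$\sigma^2 = \frac{\sum_{i=1}^{N}(x_i - \mu)^2}{N}$$

And when taking a sample of size $n$ from a population, we prefer to calculate the variance like this:

$$s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}$$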

Here the sample variance has been denoted by $s^2$ and the sample mean by $\bar{x}$, to reserve the Greek letters for the population. The reason for dividing by $n - 1$ instead of $n$ is so that the sample variance does not underestimate the population variance, which is what happens on average when dividing by $n$.
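The effect of the divisor can be checked on a toy case. The sketch below assumes a hypothetical three-value population (not taken from the text), lists every possible sample of size 2 drawn with replacement, and averages the two versions of the sample variance over all of them.

```python
from itertools import product
import statistics

# Hypothetical toy population, chosen only for illustration.
population = [1.0, 2.0, 3.0]
n = 2  # sample size

# Population variance (divides by N).
sigma2 = statistics.pvariance(population)

# Every possible sample of size n drawn with replacement.
samples = list(product(population, repeat=n))


def variance_dividing_by_n(sample):
    """Variance of a sample computed with divisor n instead of n - 1."""
    m = sum(sample) / len(sample)
    return sum((x - m) ** 2 for x in sample) / len(sample)


# Average of each estimator over all possible samples.
avg_div_n = sum(variance_dividing_by_n(s) for s in samples) / len(samples)
avg_div_n_minus_1 = sum(statistics.variance(s) for s in samples) / len(samples)

print(round(sigma2, 4), round(avg_div_n, 4), round(avg_div_n_minus_1, 4))
```

In this toy case the average with divisor $n - 1$ matches the population variance, while the average with divisor $n$ comes out at half of it.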

On the other hand, the reason for squaring each difference between a data value and the mean is to prevent the sum from being 0: some differences are positive and others negative, and added together they cancel out exactly. The squares, in contrast, are never negative.
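A short derivation, added here as an illustration, shows why the unsquared differences cannot be used. With the population notation defined above and the definition of the mean, $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$, the differences add up to 0 for any data set:

$$\sum_{i=1}^{N}(x_i - \mu) = \sum_{i=1}^{N} x_i - N\mu = N\mu - N\mu = 0$$

The same cancellation happens in a sample, with $\bar{x}$ in place of $\mu$ and $n$ in place of $N$.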

Hence, the variance is never negative, even if the difference between $x_i$ and the mean is negative. Its main advantage is that it takes every data value into account.

But it has the drawback that its units are not the same as those of the data. For example, if the data are times measured in minutes, the variance of the set is given in minutes squared.

**Variance example**

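The calculation of the variance requires finding the mean. Taking the data on the number of hurricanes, the mean is:

$$\bar{x} = \frac{8 + 9 + 7 + 8 + 15 + 9 + 6 + 5 + 8 + 4 + 12 + 7 + 8 + 2}{14} = \frac{108}{14} \approx 7.7 \text{ hurricanes}$$

Therefore, treating the 14 years as a sample, the variance is:

$$s^2 = \frac{\sum_{i=1}^{14}(x_i - 7.7)^2}{14 - 1} \approx \frac{132.86}{13} \approx 10.2 \text{ hurricanes}^2$$

If the 14 values are treated as a whole population instead, dividing by $N = 14$ gives $\sigma^2 \approx 9.5 \text{ hurricanes}^2$.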
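As a check on the arithmetic, here is a minimal sketch using Python's standard `statistics` module. It computes both versions of the variance, since the text does not say whether the 14 years should be treated as a population or as a sample.

```python
import statistics

# Hurricane counts from the example above (14 years).
hurricanes = [8, 9, 7, 8, 15, 9, 6, 5, 8, 4, 12, 7, 8, 2]

mean = statistics.mean(hurricanes)            # 108 / 14
pop_var = statistics.pvariance(hurricanes)    # divides by N = 14
sample_var = statistics.variance(hurricanes)  # divides by n - 1 = 13

print(round(mean, 1), round(pop_var, 2), round(sample_var, 2))
# 7.7 9.49 10.22
```

Both variances are in hurricanes squared, which is the unit mismatch described above.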

**Standard deviation**

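To correct the mismatch between the units, the standard deviation $\sigma$ is defined as the square root of the variance:

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N}(x_i - \mu)^2}{N}}$$

And analogously, in the case of a sample:

$$s = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}}$$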

There is an empirical rule to estimate the value of the standard deviation of a set of sample data, starting from the range. By this rule, the standard deviation is about one fourth of R:

s ≈ R/4

It has the advantage of allowing a quick estimate of the standard deviation, since the operations are much simpler.

The standard deviation is by far the most widely used measure of dispersion, so it is worth highlighting its main characteristics:

- The standard deviation indicates how far the data are from the mean.

- It is never negative, and it is 0 only if all the data are identical.

- The higher the value of the standard deviation, the more spread out the data.

- The units of the standard deviation are the same as those of the variable under study.

- Its value changes rapidly when one or more data values are very different from the rest.

- The sample standard deviation $s$ is a biased estimator: the values of $s$ obtained from repeated samples are not centered on the population value $\sigma$, in contrast to the sample variance $s^2$, which is centered on $\sigma^2$ and is therefore unbiased.

**Standard Deviation Example**

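Continuing with the hurricane example, and treating the 14 years as a sample, the standard deviation is:

$$s = \sqrt{\frac{\sum_{i=1}^{14}(x_i - 7.7)^2}{14 - 1}} \approx \sqrt{10.2} \approx 3.2 \text{ hurricanes}$$

Or, if you prefer to approximate the standard deviation from the range, you get a fairly close value:

$$s \approx \frac{13}{4} = 3.25 \text{ hurricanes}$$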
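The comparison between the exact standard deviation and the range-based estimate can be reproduced with the standard `statistics` module. This is a sketch that computes both the sample and the population versions, since the text does not state which one is intended; the variable names are illustrative.

```python
import statistics

# Hurricane counts from the example above (14 years).
hurricanes = [8, 9, 7, 8, 15, 9, 6, 5, 8, 4, 12, 7, 8, 2]

s = statistics.stdev(hurricanes)       # sample version, divides by n - 1
sigma = statistics.pstdev(hurricanes)  # population version, divides by N

# Quick estimate from the range: s is roughly R / 4.
rule_of_thumb = (max(hurricanes) - min(hurricanes)) / 4

print(round(s, 2), round(sigma, 2), rule_of_thumb)
```

The sample value is about 3.20 and the population value about 3.08, so the estimate of 3.25 is close to either one for this data set.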

**Coefficient of variation**

The coefficient of variation is denoted by the initials CV (some texts use a different symbol). For both a population and a sample, it relates the standard deviation to the mean, as a percentage:

$$CV = \frac{\sigma}{\mu} \times 100\%$$

Or:

$$CV = \frac{s}{\bar{x}} \times 100\%$$

The equations are valid as long as the mean is different from 0.

As a general rule, the coefficient of variation is rounded to a single decimal place, and is used to compare data from two different populations.

**Coefficient of Variation Example**

Waiting times in seconds, for bank customers, are recorded in two situations: when they form a single line and when they form individual lines at the teller windows. The results are the following:

Both data sets can be compared through their respective coefficients of variation:

**Single line**

Average = 429 seconds

Standard deviation = 28.6 seconds

CV = (28.6/429) × 100 = 6.7%

**Individual lines**

Average = 429 seconds

Standard deviation = 109.3 seconds

CV = (109.3/429) × 100 = 25.5%

Since this last value is higher, it indicates that there is more variability in customer service times when they wait in individual lines than when they wait in a single queue, although the average time is the same in each case.
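The two coefficients can be recomputed from the summary figures given above. The helper `coefficient_of_variation` is a hypothetical name used only for this sketch; it rejects a mean of 0, for which the formula is not valid.

```python
def coefficient_of_variation(std_dev, mean):
    """Return the coefficient of variation as a percentage."""
    if mean == 0:
        raise ValueError("the coefficient of variation is undefined for a mean of 0")
    return std_dev / mean * 100


# Summary figures from the bank example (seconds).
cv_single = round(coefficient_of_variation(28.6, 429), 1)
cv_individual = round(coefficient_of_variation(109.3, 429), 1)

print(cv_single, cv_individual)  # 6.7 25.5
```

Because the means are equal here, the ratio of the two coefficients is the same as the ratio of the standard deviations; the coefficient becomes more useful when the means of the two data sets differ.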