Surfstat.australia: an online text in introductory Statistics
SUMMARISING AND PRESENTING DATA
MEASURES OF VARIABILITY
Comments
- The MINITAB command is STDEV C1; the standard deviation is also given,
along with other summary values, by DESCRIBE C1.
- Although (n-1) is most commonly used for the denominator, n is sometimes
used instead. Many calculators have both versions; usually the one with n
is (wrongly) called σ and the one with (n-1)
is called s. Some textbooks start with n and then change to (n-1) (e.g.
Staudte's "Seeing Through Statistics").
- An alternative version of the formula which is easier to use is
$$ s = \sqrt{\frac{\sum x_i^2 - \left(\sum x_i\right)^2 / n}{n-1}} $$
e.g. for data set A, n=7
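- For grouped data with values $x_1, x_2, \ldots, x_k$
and corresponding frequencies $f_1, f_2, \ldots, f_k$,
the formula is
$$ s = \sqrt{\frac{\sum f_i x_i^2 - \left(\sum f_i x_i\right)^2 / n}{n-1}} $$
where $n = \sum f_i$ is the total number of observations.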
EXAMPLE - for the student age data ($x_i$ = age of student,
$f_i$ = number of students with age $x_i$)
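The different versions of the standard deviation formula can be checked against each other numerically. The following is a minimal Python sketch, not MINITAB output. The function names and the data are invented for illustration (they are not data set A or the student age data), and the shortcut form used here is the usual computational formula, which is assumed to be the "alternative version" the comments refer to.

```python
import math
import statistics

def sd_definition(xs, denominator="n-1"):
    """Standard deviation from squared deviations about the mean."""
    n = len(xs)
    mean = sum(xs) / n
    sum_sq_dev = sum((x - mean) ** 2 for x in xs)
    divisor = n - 1 if denominator == "n-1" else n
    return math.sqrt(sum_sq_dev / divisor)

def sd_shortcut(xs):
    """Shortcut form: (sum of x^2 - (sum of x)^2 / n) / (n - 1)."""
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    return math.sqrt((sum_x2 - sum_x ** 2 / n) / (n - 1))

def sd_grouped(values, freqs):
    """Same shortcut form, with each value weighted by its frequency."""
    n = sum(freqs)
    sum_fx = sum(f * x for x, f in zip(values, freqs))
    sum_fx2 = sum(f * x * x for x, f in zip(values, freqs))
    return math.sqrt((sum_fx2 - sum_fx ** 2 / n) / (n - 1))

# Hypothetical ages, first as raw data and then as a frequency table.
raw = [17, 18, 18, 19, 19, 19, 20, 22]
values = [17, 18, 19, 20, 22]
freqs = [1, 2, 3, 1, 1]

print("denominator n-1:", sd_definition(raw, "n-1"))
print("denominator n:  ", sd_definition(raw, "n"))
print("shortcut form:  ", sd_shortcut(raw))
print("grouped form:   ", sd_grouped(values, freqs))
```

The first, third and fourth results should agree up to rounding, while the version with denominator n is always a little smaller. Python's own `statistics.stdev` uses (n-1) and `statistics.pstdev` uses n.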
Boxplots
- The standard deviation, like the mean, is strongly influenced by extreme
values, e.g. the Commodore prices.
The MINITAB command
The box contains the middle 50% of the values (from Q1 to Q3). The whiskers
show how far the remaining values are spread.
The MINITAB command is BOXPLOT C1
For all Commodore prices
Omitting the value $29,500
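To see how much a single extreme value can move the mean and standard deviation, here is a small Python sketch. Only the value $29,500 comes from the text; the other prices are made up for illustration and are not the actual Commodore data.

```python
import statistics

# Hypothetical prices in dollars; only 29500 appears in the text.
prices = [8900, 9500, 10200, 11000, 11500, 12300, 13000, 29500]
without_extreme = [p for p in prices if p != 29500]

for label, data in [("all prices", prices), ("omitting 29500", without_extreme)]:
    print(label)
    print("  mean   =", round(statistics.mean(data), 1))
    print("  s.d.   =", round(statistics.stdev(data), 1))   # (n-1) denominator
    print("  median =", statistics.median(data))
```

With these invented numbers, dropping the one extreme value changes the standard deviation and the mean far more than it changes the median.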
If an observation lies more than 3 x IQR (interquartile range) beyond
an end of the box, it is the MINITAB convention to regard it as an
"outlier" (possibly a mistake?), and it is marked as
o on a box plot. If an observation lies between 1.5 x IQR and 3 x
IQR beyond one end of the box, it is a possible outlier and is marked as
* on a box plot. The whiskers of the boxplot do not extend to the *'s
and o's.
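The marking rule can be sketched in Python as follows. This is an illustration under stated assumptions, not MINITAB's implementation: the data are hypothetical, the quartiles come from Python's `statistics.quantiles` (packages differ slightly in how they compute quartiles), and the treatment of values lying exactly on a boundary is a choice made here.

```python
import statistics

def outlier_mark(x, q1, q3):
    """Return 'o', '*' or '' for one observation, following the rule in the text."""
    iqr = q3 - q1
    # Distance beyond the nearer end of the box (0 if x is inside the box).
    distance = max(q1 - x, x - q3, 0)
    if distance > 3 * iqr:
        return "o"      # outlier
    if distance > 1.5 * iqr:
        return "*"      # possible outlier
    return ""           # not marked

# Hypothetical data set.
data = [10, 11, 12, 12, 13, 13, 14, 14, 15, 16, 17, 24, 40]
q1, _, q3 = statistics.quantiles(data, n=4)   # assumed quartile convention

marks = {x: outlier_mark(x, q1, q3) for x in data}

# The whiskers stop at the most extreme values that are not marked.
unmarked = [x for x in data if marks[x] == ""]
whisker_low, whisker_high = min(unmarked), max(unmarked)

print("Q1 =", q1, " Q3 =", q3, " IQR =", q3 - q1)
print("marked:", {x: m for x, m in marks.items() if m})
print("whiskers run from", whisker_low, "to", whisker_high)
```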
Five-number summary
The five-number summary of a distribution consists of the median M,
the quartiles Q1 and Q3 and the smallest and largest individual observations,
written in the order
minimum, Q1, M, Q3, maximum
This provides a quick overall description of a distribution.
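As a rough illustration, the five-number summary can be computed in Python as below. The function name and the data are hypothetical, and the quartile convention is that of `statistics.quantiles` with its default method, so other packages (including MINITAB) may give slightly different Q1 and Q3 for the same data.

```python
import statistics

def five_number_summary(data):
    """Return (minimum, Q1, M, Q3, maximum) for a list of numbers."""
    q1, median, q3 = statistics.quantiles(data, n=4)  # assumed quartile convention
    return (min(data), q1, median, q3, max(data))

# Hypothetical data set.
data = [10, 11, 12, 12, 13, 13, 14, 14, 15, 16, 17, 24, 40]
summary = five_number_summary(data)
print("minimum, Q1, M, Q3, maximum =", summary)
```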