Surfstat.australia: an online text in introductory Statistics

SUMMARISING AND PRESENTING DATA

MEASURES OF CENTRAL TENDENCY ("Averages")

A measure of central tendency is a number which indicates the middle of the distribution of data values. The three main measures are the median, the mode and the mean.

Median

The median is a number that is greater than or equal to half the data values and less than or equal to the other half. If there is an odd number of values, the median is the middle one when they are sorted in order of magnitude. If there is an even number of values, the median is the average of the two middle values.

Eg.1 The first five Commodore prices in \$,000 are:

6, 6.7, 3.8, 7, 5.8

Arranged in order of magnitude these are 3.8, 5.8, 6, 6.7, 7, so the median is the 3rd value, 6 (that is, \$6,000).

Eg.2 The first 6 Commodore prices in \$,000 are:

6, 6.7, 3.8, 7, 5.8, 9.975

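Arranged in order of magnitude these are 3.8, 5.8, 6, 6.7, 7, 9.975, so the median is the average of the 3rd and 4th values, (6 + 6.7)/2 = 6.35 (that is, \$6,350).

For all Commodores there are n = 38 values, so the middle ones are the 19th and 20th values (so that there are 18 on either side). The 19th and 20th values are both \$9,500, so the median is \$9,500.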
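The rule for the median can be sketched in a few lines of Python. This is only an illustration of the definition above (the text itself uses MINITAB), and the function name `median` is our own choice. It sorts the values, then picks the middle value for odd n or averages the two middle values for even n, reproducing Eg.1 and Eg.2.

```python
def median(values):
    """Return the median of a non-empty list of numbers."""
    ordered = sorted(values)          # arrange in order of magnitude
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd n: the single middle value
        return ordered[mid]
    # even n: the average of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

# Commodore prices in $,000 from Eg.1 and Eg.2
eg1 = [6, 6.7, 3.8, 7, 5.8]
eg2 = [6, 6.7, 3.8, 7, 5.8, 9.975]
print(median(eg1))   # 6
print(median(eg2))   # about 6.35
```

Python's standard library also provides `statistics.median`, which applies the same odd/even rule.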

The MINITAB command for obtaining the median of a data set stored in column C1 is

```
MTB> median c1

MEDIAN =      9500.0
```

Mode

The mode is the value or category that occurs most frequently. If several data values occur with the same maximal frequency, they are all modes. For example, for the grouped Commodore data the class interval [8,000 - 10,999] is the mode (the modal class).
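Because ties are allowed, a mode finder should return every value that reaches the maximal frequency. Below is a minimal Python sketch using `collections.Counter`. The function name `modes` and the sample data are hypothetical and for illustration only. It works equally well for numbers and for categories such as class labels.

```python
from collections import Counter

def modes(values):
    """Return every value (or category) occurring with the maximal frequency."""
    counts = Counter(values)          # frequency of each distinct value
    top = max(counts.values())        # the maximal frequency
    # keep all values tied at the maximal frequency, in first-seen order
    return [value for value, count in counts.items() if count == top]

# Hypothetical data, for illustration only
print(modes([2, 3, 3, 4, 5, 7, 8]))                     # [3]
print(modes(["red", "blue", "red", "blue", "green"]))   # ['red', 'blue']
```

Since Python 3.8 the standard library function `statistics.multimode` gives the same list of tied modes.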

Notation

Denote the data values by $x_1, x_2, \ldots, x_n$, where $n$ is the total number of values.

Mean

This is denoted by $\bar{x}$ (read as 'x bar') and defined as the arithmetic mean of all the data values:

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

e.g. for the n = 38 Commodore prices, $\bar{x}$ = \$10,080.
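The formula for $\bar{x}$ translates directly into code: add up the $x_i$ and divide by $n$. The Python sketch below is illustrative only, and the name `mean` is our own. It applies the formula to the five Eg.1 prices (in \$,000), because the full set of 38 prices is not listed here.

```python
def mean(values):
    """Arithmetic mean: x-bar = (x_1 + x_2 + ... + x_n) / n."""
    if not values:
        raise ValueError("mean of empty data")
    return sum(values) / len(values)

eg1 = [6, 6.7, 3.8, 7, 5.8]      # first five Commodore prices, $,000 (Eg.1)
print(round(mean(eg1), 2))       # 5.86
```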

The MINITAB command is MEAN C1. The mean and other summary values are also given by DESCRIBE C1.

Comparison of Mean and Median

Data set A: 2,3,3,4,5,7,8
Data set B: 2,3,3,4,5,8,20

Both have n = 7 values, so the median is the 4th value in order of magnitude: 4 for both data sets. The means differ: 32/7 ≈ 4.57 for A and 45/7 ≈ 6.43 for B.

It is necessary to sort the data in order of magnitude before you can find the median. For large data sets this can be time-consuming, which is why medians were not used much until computers became readily available.

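The median is not affected by extreme values, but the mean is: replacing the two largest values 7, 8 of data set A by 8, 20 leaves the median at 4 but raises the mean from about 4.57 to about 6.43 (compare the results for data sets A and B above).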
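The comparison can be checked with Python's standard `statistics` module. The sketch below computes both measures for data sets A and B. The median stays at 4 while the extreme value 20 pulls the mean upwards.

```python
from statistics import mean, median

data_a = [2, 3, 3, 4, 5, 7, 8]
data_b = [2, 3, 3, 4, 5, 8, 20]

for name, data in [("A", data_a), ("B", data_b)]:
    # median is resistant to the extreme value; mean is not
    print(f"{name}: median = {median(data)}, mean = {mean(data):.2f}")
# A: median = 4, mean = 4.57
# B: median = 4, mean = 6.43
```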

APPLET "Centres"

Click and drag below the line to add and move data. Drag points above the line to remove them. Which is the mean and which the median? Which is more responsive as you move (a) the middle point (b) an end point? In a real dataset, which (middle or end) is more likely to be a data error?

Median for Grouped Data

The median for grouped data is estimated as the midpoint of the class interval that comes closest to having half the values above and below it.

Example - Survey of 20 students in a STAT101 tutorial (or 18.75 + 0.5 years if we use the mid-point interval)

Mean for Grouped Data

For grouped data like the Commodore prices, take the x-values as the interval midpoints, e.g. for the interval 2000-4999 use the midpoint (roughly 3,500), and so on. Then

$$\bar{x} \approx \frac{\sum_i f_i x_i}{\sum_i f_i}$$

where $f_i$ is the number of values in class $i$ and $x_i$ is its midpoint. This gives a value close to the mean calculated from the individual values, 10080.
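A Python sketch of both grouped-data calculations follows. The class frequencies are hypothetical and are not the actual Commodore counts, which are not reproduced here. `grouped_mean` weights each class midpoint $x_i$ by its frequency $f_i$, following the formula for $\bar{x}$ above. `median_class` is one reading of the "closest to having half the values above and below" rule. It returns the class that contains the middle position n/2 of the data. When a class ends exactly at n/2, it picks the lower of the two candidate classes, which is a simplification.

```python
# Hypothetical class intervals (lower, upper, frequency): NOT the real Commodore counts
classes = [
    (2000, 4999, 3),
    (5000, 7999, 8),
    (8000, 10999, 14),
    (11000, 13999, 9),
    (14000, 16999, 4),
]

def grouped_mean(classes):
    """Approximate x-bar by treating every value as its class midpoint."""
    n = sum(f for _, _, f in classes)
    total = sum(f * (lo + hi) / 2 for lo, hi, f in classes)
    return total / n

def median_class(classes):
    """Return the class interval containing the middle position (n/2) of the data."""
    n = sum(f for _, _, f in classes)
    cumulative = 0
    for lo, hi, f in classes:
        cumulative += f                # number of values up to the end of this class
        if cumulative >= n / 2:
            return lo, hi
    return None                        # only reached for empty input

lo, hi = median_class(classes)
print((lo, hi), (lo + hi) / 2)         # (8000, 10999) 9499.5
print(round(grouped_mean(classes), 1)) # 9736.3
```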

Progress check

1. The median of a distribution is
2. It is easier to find the mean of a large set of data than the median because
3. The main advantage of the median over the mean is that
