Surfstat.australia: an online text in introductory Statistics
SUMMARISING AND PRESENTING DATA
PRESENTING CONTINUOUS DATA
The frequency distribution of a discrete quantitative variable
may be summarised in a bar chart, as explained in the last section. The
frequency distribution of a continuous quantitative variable can be
constructed in the same way by first grouping the observations.
That is, by choosing a set of contiguous non-overlapping intervals,
called class intervals, the observations can be grouped to form a discrete
variable from the continuous variable.
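As a minimal sketch of this grouping step, the Python function below maps an observation to the class interval that contains it. The function name `class_interval` is hypothetical, and the starting point of 2000 and interval length of 3000 are assumptions chosen to line up with the 20000 - 22999 style intervals used later in this section.

```python
def class_interval(value, start=2000, width=3000):
    """Return (lower, upper) for the interval [lower, upper) containing value.

    start and width are assumed values for illustration only.
    """
    index = (value - start) // width      # which interval, counting from 0
    lower = start + index * width
    return lower, lower + width

# A few of the Commodore prices, grouped into class intervals
sample_prices = [6000, 3800, 20000, 29500]
sample_intervals = [class_interval(p) for p in sample_prices]
print(sample_intervals)
```

Because each interval includes its lower end point and excludes its upper end point, a price such as 8000 falls in exactly one interval.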
The following data set will be used to illustrate various concepts.
Prices of n=38 second-hand Commodores from classified advertisements in
the Newcastle Herald on 23 February 1991 - all models and all years
were included but damaged vehicles were excluded.
6000   6700   3800   7000   5800   9975   10500  5990
20000  11990  16500  10750  9500   12995  12500  8000
9900   18000  9500   9400   7250   15000  4500   8900
9850   9000   5800   29500* 15000  9000   4250   4990
11000  9990   2200   4000   13500  14500
* This car was described as 'prototype, exclusive, group A look alike'
(i.e. similar to cars used for racing).
Charts which show the frequency distributions of continuous variables form an important class of
descriptive statistics and are called histograms. Unlike bar
charts, they are drawn without gaps between the bars because the x-axis
is used to represent the class intervals. They provide a useful visual
impression of the principal characteristics of the frequency distribution.
Example - Commodore prices
- Usually use from 5 to 15 intervals.
- Calculate the range and divide it by the chosen number of intervals
to get the approximate length for each interval.
- Define interval end points so that the intervals neither overlap nor leave gaps
(i.e. they are mutually exclusive and exhaustive). This ensures that
every observation belongs to exactly one interval.
- It is usually simpler to have all intervals of the same length.
- Count the number of values in each interval (the class frequency) -
go through the data once only and use tally marks to help counting.
- Usually relative frequencies or percentages are helpful to show the
distribution of data.
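The steps above can be sketched in Python for the Commodore prices. The choice of 10 intervals, the starting point of 2000 and the rounded interval length of 3000 are illustrative assumptions, not values prescribed by the text.

```python
prices = [
    6000, 6700, 3800, 7000, 5800, 9975, 10500, 5990,
    20000, 11990, 16500, 10750, 9500, 12995, 12500, 8000,
    9900, 18000, 9500, 9400, 7250, 15000, 4500, 8900,
    9850, 9000, 5800, 29500, 15000, 9000, 4250, 4990,
    11000, 9990, 2200, 4000, 13500, 14500,
]
n = len(prices)

# Range divided by the chosen number of intervals gives an approximate length
num_intervals = 10                          # assumed choice, within 5 to 15
data_range = max(prices) - min(prices)      # 29500 - 2200
approx_length = data_range / num_intervals

# Round to convenient end points (assumed): intervals [2000, 5000), [5000, 8000), ...
start, width = 2000, 3000

# One pass through the data, adding to the count for the matching interval
counts = [0] * num_intervals
for p in prices:
    counts[(p - start) // width] += 1

relative = [c / n for c in counts]

for i, c in enumerate(counts):
    lower = start + i * width
    print(f"{lower:>6} - {lower + width - 1:<6} {c:>3} {100 * relative[i]:5.1f}%")
```

Intervals with no observations are kept in the table so that the end points stay contiguous.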
Draw rectangles over each interval so that
area of rectangle = frequency (or relative frequency)
But area = length x height
So if all intervals are the same length, L, then height = frequency / L
Heights are directly proportional to the frequencies only if all intervals
are the same length.
For the histogram of the Commodore prices shown above, the
interval "20000 or more" was split into four intervals, 20000 - 22999,
23000 - 25999, 26000 - 28999 and 29000 - 31999, in order to have properly
defined end points and equal lengths for all intervals.
If intervals are not all of the same length then heights have to be
scaled so that each area is proportional to the frequency for that interval.
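A short Python sketch of this scaling follows. It assumes, purely for illustration, that the top of the Commodore price range is covered by a single long interval from 20000 up to (but not including) 32000, while the other intervals have length 3000. These end points are hypothetical.

```python
from bisect import bisect_right

prices = [
    6000, 6700, 3800, 7000, 5800, 9975, 10500, 5990,
    20000, 11990, 16500, 10750, 9500, 12995, 12500, 8000,
    9900, 18000, 9500, 9400, 7250, 15000, 4500, 8900,
    9850, 9000, 5800, 29500, 15000, 9000, 4250, 4990,
    11000, 9990, 2200, 4000, 13500, 14500,
]

# Assumed end points: six intervals of length 3000, then one of length 12000
edges = [2000, 5000, 8000, 11000, 14000, 17000, 20000, 32000]

# Count the observations in each interval [edges[i], edges[i + 1])
counts = [0] * (len(edges) - 1)
for p in prices:
    counts[bisect_right(edges, p) - 1] += 1

# area = length x height, so height = frequency / length
lengths = [hi - lo for lo, hi in zip(edges, edges[1:])]
heights = [c / length for c, length in zip(counts, lengths)]

for lo, hi, c, h in zip(edges, edges[1:], counts, heights):
    print(f"{lo:>6} - {hi - 1:<6} freq={c:>2} height={h:.6f}")
```

The last interval holds 2 cars but is four times as long as the others, so its bar is drawn much lower than a bar of height 2 would suggest; its area still represents a frequency of 2.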
In MINITAB the command is
Note that MINITAB rotates the histogram by 90 degrees.
A histogram groups the data into just a few intervals. A dotplot groups
the data as little as possible. Dotplots tend to be more useful with small
data sets. Dotplots are useful if you want to compare two or more sets
of data. For larger data sets, group the data and draw a histogram.
Example - The Commodore price data
MTB > DotPlot 'price'.
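For readers without MINITAB, a rough text dotplot can be sketched in Python. This is not MINITAB's algorithm: the plotting resolution of 500 is an assumption, prices are rounded down to that resolution, and the plot is drawn sideways with one row per plotted value.

```python
from collections import Counter

prices = [
    6000, 6700, 3800, 7000, 5800, 9975, 10500, 5990,
    20000, 11990, 16500, 10750, 9500, 12995, 12500, 8000,
    9900, 18000, 9500, 9400, 7250, 15000, 4500, 8900,
    9850, 9000, 5800, 29500, 15000, 9000, 4250, 4990,
    11000, 9990, 2200, 4000, 13500, 14500,
]

resolution = 500   # assumed plotting resolution; smaller values group less

# Round each price down to the nearest multiple of the resolution
dots = Counter((p // resolution) * resolution for p in prices)

# One row per plotted value, one dot per observation
for value in sorted(dots):
    print(f"{value:>6} {'.' * dots[value]}")
```

Each observation contributes exactly one dot, so the isolated dot at 29500 makes the unusual 'prototype' car easy to spot.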