Surfstat.australia: an online text in introductory Statistics

SUMMARISING AND PRESENTING DATA

PRESENTING CONTINUOUS DATA

The frequency distribution of a discrete quantitative variable may be summarised in a bar chart, as explained in the last section. The frequency distribution of a continuous quantitative variable can be constructed in the same way by first grouping the observations.

That is, by choosing a set of contiguous non-overlapping intervals, called class intervals, the observations can be grouped to form a discrete variable from the continuous variable.
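The grouping step can be sketched in a few lines of Python. This is only an illustration: the function name `class_interval`, the starting point 0, the interval length 10 and the sample values are assumptions made for the example, and each interval is taken to include its lower end point but not its upper one.

```python
import math

def class_interval(x, start=0.0, length=10.0):
    """Return (lower, upper) for the interval [lower, upper) that contains x."""
    k = math.floor((x - start) / length)   # index of the interval containing x
    lower = start + k * length
    return (lower, lower + length)

# Hypothetical continuous measurements, for illustration only.
values = [3.2, 17.8, 10.0, 9.99, 25.4]

# Each continuous value is replaced by its class interval,
# which gives a discrete variable with only a few possible values.
groups = [class_interval(v) for v in values]
print(groups)
```

Because the intervals are contiguous and do not overlap, every value falls in exactly one of them.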

The following data set will be used to illustrate various concepts and methods.

Commodore Data

Prices of n=38 second-hand Commodores from classified advertisements in the Newcastle Herald on 23 February 1991 - all models and all years were included but damaged vehicles were excluded.

 6000 6700 3800 7000 5800 9975 10500 5990 20000 11990 16500 10750 9500 12995 12500 8000 9900 18000 9500 9400 7250 15000 4500 8900 9850 9000 5800 29500* 15000 9000 4250 4990 11000 9990 2200 4000 13500 14500

* This car was described as 'prototype, exclusive, group A look alike' (i.e. similar to cars used for racing).

Histograms

Charts which show the frequency distributions of continuous variables form an important class of descriptive statistics and are called histograms. Unlike bar charts, they are drawn without gaps between the bars because the x-axis is used to represent the class intervals. They provide a useful visual impression of the principal characteristics of the frequency distribution.

Example - Commodore prices

• Usually use from 5 to 15 intervals.
• Calculate the range and divide it by the chosen number of intervals to get the approximate length for each interval.
• Define interval end points so that the intervals don't overlap or leave gaps (i.e. they are mutually exclusive and exhaustive). This ensures that every observation belongs to exactly one interval.
• It is usually simpler to have all intervals of the same length.
• Count the number of values in each interval (the class frequency): go through the data once only and use tally marks to help with counting.
• Relative frequencies or percentages are usually helpful for showing the distribution of the data.
• Draw a rectangle over each interval so that

area of rectangle = frequency (or relative frequency)

But area = length x height

So if all intervals are the same length, L,

height = frequency / L

Heights are directly proportional to the frequencies only if all intervals are the same length.
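The steps above can be sketched in Python for the Commodore prices. This is an illustration rather than the procedure used to produce the histogram in the text: the choice of 10 intervals, the starting point 2000 and the rounded interval length 3000 are assumptions, picked so that the end points are round numbers.

```python
from collections import Counter

# Commodore prices from the data set above (n = 38).
prices = [6000, 6700, 3800, 7000, 5800, 9975, 10500, 5990, 20000, 11990,
          16500, 10750, 9500, 12995, 12500, 8000, 9900, 18000, 9500, 9400,
          7250, 15000, 4500, 8900, 9850, 9000, 5800, 29500, 15000, 9000,
          4250, 4990, 11000, 9990, 2200, 4000, 13500, 14500]
n = len(prices)

data_range = max(prices) - min(prices)   # 29500 - 2200 = 27300
k = 10                                   # assumed number of intervals
approx_length = data_range / k           # 2730.0, rounded up to 3000 below

start, length = 2000, 3000               # assumed convenient end points

# One pass through the data: tally each price into its interval.
tally = Counter((p - start) // length for p in prices)
frequencies = [tally[i] for i in range(k)]

# Print interval, frequency, relative frequency and a row of tally marks.
for i, freq in enumerate(frequencies):
    lower = start + i * length
    upper = lower + length - 1
    print(f"{lower:>5} - {upper:>5}  {freq:>2}  {freq / n:5.1%}  {'*' * freq}")
```

Since every interval has the same length here, the rows of asterisks are proportional to the frequencies, just as the heights of the rectangles would be.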

For the histogram of the Commodore prices shown above, the interval "20000 or more" was replaced by four intervals (20000 - 22999, 23000 - 25999, 26000 - 28999 and 29000 - 31999) so that all intervals have properly defined end points and equal lengths.

If intervals are not all of the same length then heights have to be scaled so that each area is proportional to the frequency for that interval.
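As a rough sketch of this scaling, the Python below groups the Commodore prices into six intervals of length 3000 followed by one wide interval of length 12000. These end points are an assumption made for illustration and are not the ones used in the text's histogram. Each height is the frequency divided by the interval length, so that length x height equals the frequency.

```python
from bisect import bisect_right

# Commodore prices from the data set above (n = 38).
prices = [6000, 6700, 3800, 7000, 5800, 9975, 10500, 5990, 20000, 11990,
          16500, 10750, 9500, 12995, 12500, 8000, 9900, 18000, 9500, 9400,
          7250, 15000, 4500, 8900, 9850, 9000, 5800, 29500, 15000, 9000,
          4250, 4990, 11000, 9990, 2200, 4000, 13500, 14500]

# Assumed end points: six intervals of length 3000, then one interval
# of length 12000 covering 20000 up to (but not including) 32000.
edges = [2000, 5000, 8000, 11000, 14000, 17000, 20000, 32000]

# Count the prices falling in each interval [edges[i], edges[i + 1]).
freqs = [0] * (len(edges) - 1)
for p in prices:
    freqs[bisect_right(edges, p) - 1] += 1

# Scale each height so that the area of the rectangle equals the frequency.
lengths = [edges[i + 1] - edges[i] for i in range(len(freqs))]
heights = [f / length for f, length in zip(freqs, lengths)]

for i, (f, h) in enumerate(zip(freqs, heights)):
    print(f"{edges[i]:>5} - {edges[i + 1] - 1:>5}  frequency {f:>2}  "
          f"height per $1000 {1000 * h:.2f}")
```

With these assumed end points the wide interval holds 2 cars, twice as many as the interval 17000 - 19999, yet its rectangle is only half as tall because it is four times as long.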

MINITAB can draw the histogram. Note that MINITAB rotates the histogram 90 degrees.


Here is another histogram applet, by R. Webster West, Dept. of Statistics, Univ. of South Carolina.

Dotplots

A histogram groups the data into just a few intervals. A dotplot groups the data as little as possible. Dotplots tend to be more useful with small data sets. Dotplots are useful if you want to compare two or more sets of data. For larger data sets, group the data and draw a histogram.
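A text dotplot of the Commodore prices can be sketched in Python as follows. This is not MINITAB's output: the $500 step used to stack nearby prices is an assumption made for illustration, and the plot is printed sideways with one row per step and one dot per car.

```python
from collections import Counter

# Commodore prices from the data set introduced earlier (n = 38).
prices = [6000, 6700, 3800, 7000, 5800, 9975, 10500, 5990, 20000, 11990,
          16500, 10750, 9500, 12995, 12500, 8000, 9900, 18000, 9500, 9400,
          7250, 15000, 4500, 8900, 9850, 9000, 5800, 29500, 15000, 9000,
          4250, 4990, 11000, 9990, 2200, 4000, 13500, 14500]

# Assumption: round each price down to a multiple of $500, so the data
# are grouped only slightly and nearby values stack on the same row.
step = 500
counts = Counter(step * (p // step) for p in prices)

# One row per step from the smallest to the largest value, one dot per car.
for value in range(min(counts), max(counts) + step, step):
    print(f"{value:>6} {'.' * counts[value]}")
```

Compared with a histogram, much more of the detail in the data is kept, including the long gap below the single car at 29500.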

Example - The Commodore price data

```
MTB > DotPlot 'price'.
```

Boxplots

Progress check

1. A histogram is constructed for a large data set. Which of the following are true statements?
   (a) The median divides the area of the histogram into two equal parts.
   (b) The data are continuous numeric.
   (c) The mean is found under the tallest column.
2. In a histogram, what property of each rectangle represents the frequency of observations in the range it covers?
