Surfstat.australia: an online text in introductory Statistics
SUMMARISING AND PRESENTING DATA
PRESENTING DATA FOR TWO CONTINUOUS MEASUREMENTS
Properties of the correlation coefficient
- -1 ≤ r ≤ 1 always
- r = 1 when all points are on a line which slopes up
- r = -1 when all points are on a line which slopes down
- r = 0 (or close to 0 in a sample) if there is no linear association between X and Y
- r does not indicate the extent of nonlinear associations
- The value of r can be affected by outliers
- r is a number without units. Its value is not affected by
  - interchanging the variables X and Y
  - adding the same number to all values of one variable
  - multiplying all the values of one variable by the same positive number
- Because of the last two points, the correlation between two
variables is unaffected by normalising or standardising
them to have mean=0 and variance=1. If we do this, the formula for calculating the
correlation coefficient becomes much simpler: it is simply the
average of the products of the standardised X and Y values (taking the
average with the same divisor, n or n-1, that was used for the variance).
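The properties listed above can be checked numerically. The following is a minimal sketch in Python, not part of the original text: the data values are made up for illustration, the helper names `pearson_r` and `standardise` are hypothetical, and the standardisation is assumed to use the sample variance (divisor n-1).

```python
import math

def pearson_r(x, y):
    """Sample correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def standardise(values):
    """Rescale values to mean 0 and sample variance 1 (divisor n - 1)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]

# Made-up illustrative data with a strong upward linear trend.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 2.9, 4.2, 4.8, 6.3, 6.9]

r = pearson_r(x, y)

# r is unchanged by swapping the variables, by adding a constant to one
# variable, and by multiplying one variable by a positive number.
r_swapped = pearson_r(y, x)
r_shifted = pearson_r([v + 100 for v in x], y)
r_scaled = pearson_r(x, [2.54 * v for v in y])

# After standardising, r is the average product of the standardised
# values, using the same divisor (n - 1) as the variance.
zx, zy = standardise(x), standardise(y)
r_from_products = sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)

# A single outlier can change r substantially.
r_outlier = pearson_r(x + [20.0], y + [0.0])

# A perfect but nonlinear (quadratic) relationship can still give r = 0.
x_sym = [-3, -2, -1, 0, 1, 2, 3]
r_curve = pearson_r(x_sym, [v * v for v in x_sym])

print("r:", r)
print("swapped, shifted, scaled:", r_swapped, r_shifted, r_scaled)
print("from standardised products:", r_from_products)
print("with one outlier:", r_outlier)
print("quadratic relationship:", r_curve)
```

The first five printed correlations should agree up to rounding error. The outlier example and the quadratic example show why r should always be read alongside a scatterplot.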
Interpretation of correlation
- r measures the extent of linear association between two
continuous variables.
- Association does not imply causation - both variables may be
affected by a third variable.
Eg1. In Australia total alcohol consumption and the number of
ministers of religion have both increased over time and would be
positively correlated, but the increase in one has not caused the
increase in the other (both are related to the total population size).
Eg2. In Japanese schoolchildren, shoe size was reported to be
positively correlated with scores on a test of mathematical ability.
When the data were subdivided by age they looked like this:
[Figure: scatterplot of the data subdivided by age group]
i.e. within each age group there was no association. In this case
age was a confounding variable.
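The effect of a confounding variable can be imitated with simulated data. The sketch below is an illustration only and does not use the reported study data: the ages, the coefficients linking age to shoe size and to test score, the noise levels and the group size of 200 are all invented. Within each age group, shoe size and score are generated independently, so any correlation in the pooled data comes from age alone.

```python
import math
import random

def pearson_r(x, y):
    """Sample correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Invented model: both shoe size and test score increase with age,
# but within an age group they vary independently of each other.
shoe_by_age = {}
score_by_age = {}
for age in range(6, 12):
    shoe_by_age[age] = [20 + 2 * age + rng.gauss(0, 1) for _ in range(200)]
    score_by_age[age] = [30 + 8 * age + rng.gauss(0, 5) for _ in range(200)]

# Pool all age groups together, keeping each child's pair of values aligned.
all_shoes = [s for age in shoe_by_age for s in shoe_by_age[age]]
all_scores = [s for age in score_by_age for s in score_by_age[age]]
r_pooled = pearson_r(all_shoes, all_scores)

# Correlation computed separately within each age group.
r_within = {
    age: pearson_r(shoe_by_age[age], score_by_age[age]) for age in shoe_by_age
}

print("pooled r:", round(r_pooled, 2))
for age, r_age in r_within.items():
    print("age", age, "r:", round(r_age, 2))
```

Under these assumptions the pooled correlation is strongly positive while the within-age correlations stay close to zero, which is the same pattern as in Eg2.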