Surfstat.australia: an online text in introductory Statistics

SUMMARISING AND PRESENTING DATA

PRESENTING DATA FOR TWO CONTINUOUS MEASUREMENTS

Properties of the correlation coefficient

  1. -1 ≤ r ≤ 1 always
  2. r = 1 when all points lie exactly on a straight line which slopes up

  3. r = -1 when all points lie exactly on a straight line which slopes down

  4. r = 0 (in practice, close to 0 for sample data) if there is no association between X and Y

  5. r does not measure the strength of non-linear associations: points lying exactly on a symmetric U-shaped curve can give r = 0

  6. The value of r can be affected by outliers

  7. r is a number without units. Its value is not affected by
    • interchanging the variables X and Y
    • adding the same number to all values of one variable
    • multiplying all the values of one variable by the same positive number
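  8. Because of the previous point, the correlation between two variables is unaffected by standardising them to have mean 0 and variance 1, since standardising only subtracts a constant and divides by a positive number (the standard deviation). If we do this, the formula for the correlation coefficient becomes much simpler: r is just the average of the products of the standardised values, r = (zX1·zY1 + zX2·zY2 + ... + zXn·zYn) / n. This holds when the standard deviations are calculated with divisor n; if they are calculated with divisor n - 1, divide the sum of products by n - 1 instead.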
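The properties listed above can be checked numerically. The following is a minimal Python sketch, not part of the original text: the helper names `pearson_r`, `standardise` and `r_from_standardised` are hypothetical, and the data are simulated rather than real measurements. It computes r from the usual sum-of-products formula and then checks properties 1, 2, 3, 5, 6, 7 and 8 in turn.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient r of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, n >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def standardise(values):
    """Rescale to mean 0 and variance 1 (variance computed with divisor n)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]


def r_from_standardised(xs, ys):
    """Property 8: r as the average of the products of standardised values."""
    zx, zy = standardise(xs), standardise(ys)
    return sum(a * b for a, b in zip(zx, zy)) / len(zx)


# Simulated (hypothetical) data: x as heights in cm, y linearly related to x plus noise.
random.seed(1)
x = [random.gauss(170, 10) for _ in range(50)]
y = [0.5 * xi + random.gauss(0, 5) for xi in x]
r = pearson_r(x, y)

# Property 1: r always lies between -1 and 1.
assert -1 <= r <= 1
# Properties 2 and 3: points exactly on a straight line give r = +1 or -1.
assert math.isclose(pearson_r(x, [2 * xi + 3 for xi in x]), 1.0)
assert math.isclose(pearson_r(x, [-2 * xi + 3 for xi in x]), -1.0)
# Property 7: swapping X and Y, adding a constant, or dividing by a positive
# number (here cm -> inches) leaves r unchanged.
assert math.isclose(pearson_r(y, x), r)
assert math.isclose(pearson_r([xi + 100 for xi in x], y), r)
assert math.isclose(pearson_r([xi / 2.54 for xi in x], y), r)
# Property 8: the standardised-product formula gives the same value.
assert math.isclose(r_from_standardised(x, y), r)

# Property 5: an exact but curved relationship (v = u squared) gives r = 0.
u = [-3, -2, -1, 0, 1, 2, 3]
v = [ui ** 2 for ui in u]
print(pearson_r(u, v))  # 0.0

# Property 6: one outlier turns r = 0 into a strong positive correlation.
a, b = [1, 2, 3, 4], [2, 4, 4, 2]
print(pearson_r(a, b))                          # 0.0
print(round(pearson_r(a + [20], b + [20]), 2))  # 0.98
```

The last two lines show why scatterplots should be inspected before r is reported: four points with no linear trend plus a single extreme point produce r of about 0.98.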

Interpretation of correlation

  1. r measures the extent of linear association between two continuous variables.
  2. Association does not imply causation: both variables may be affected by a third variable.

Eg1. In Australia, total alcohol consumption and the number of ministers of religion have both increased over time and would be positively correlated, but the increase in one has not caused the increase in the other (both are related to the total population size).

Eg2. In a study of Japanese schoolchildren, shoe size was reported to be positively correlated with scores on a test of mathematical ability.

When the data were subdivided by age, they looked like this: [Figure: the same data plotted separately for each age group]

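That is, within each age group there was no association between shoe size and test score. In this case age was a confounding variable: older children tend to have both larger feet and higher test scores, and this produces the overall correlation.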
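The shoe-size example can be imitated with a small simulation. This is a sketch under stated assumptions, not the original survey data: the ages 6 to 11, the 200 children per age, and the way shoe size and test score depend on age are all invented for illustration, and `pearson_r` is a hypothetical helper. Within each age, shoe size and score are generated independently of each other, yet because both increase with age the pooled correlation comes out strongly positive while the within-age correlations stay close to 0.

```python
import math
import random
from collections import defaultdict


def pearson_r(xs, ys):
    """Pearson correlation coefficient r of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical simulated data, NOT the Japanese study: within each age,
# shoe size and score are independent, but both rise with age.
random.seed(42)
ages, shoe, score = [], [], []
for age in range(6, 12):  # ages 6 to 11, 200 children at each age
    for _ in range(200):
        ages.append(age)
        shoe.append(15 + age + random.gauss(0, 1))    # arbitrary shoe-size units
        score.append(10 * age + random.gauss(0, 10))  # arbitrary test-score units

# Correlation with all ages pooled together.
overall_r = pearson_r(shoe, score)

# Correlation computed separately within each age group.
by_age = defaultdict(lambda: ([], []))
for a, s, m in zip(ages, shoe, score):
    by_age[a][0].append(s)
    by_age[a][1].append(m)
within_r = {a: pearson_r(s, m) for a, (s, m) in sorted(by_age.items())}

print(f"all ages together: r = {overall_r:.2f}")
for a, r in within_r.items():
    print(f"age {a}: r = {r:.2f}")
```

Computing r separately within levels of a suspected confounder, as done here with age, is one simple way to check whether an overall correlation is being driven by a third variable.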
