Surfstat.australia: an online text in introductory Statistics

SUMMARISING AND PRESENTING DATA

PRESENTING DATA FOR TWO CONTINUOUS MEASUREMENTS

Notes

  1. The line $y = a + bx$ is called the regression line of y on x (the least squares regression line), where a is the estimated intercept and b is the estimated slope.
    • x is called the predictor or explanatory variable (sometimes the independent variable)
    • y is called the outcome or response variable (sometimes the dependent variable)

    The idea can be generalised to equations involving more than one predictor

  2. Some calculators do least squares regression. If yours does, use it to check the moisture equation.
  3. The MINITAB command for y = a + bx is

    More generally, e.g.

  4. Relationship between correlation and regression

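    Correlation coefficient is $r = S_{xy} / \sqrt{S_{xx} S_{yy}}$, where $S_{xx} = \sum (x_i - \bar{x})^2$, $S_{yy} = \sum (y_i - \bar{y})^2$ and $S_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y})$. The formula is unchanged if x and y are swapped, so x and y are interchangeable.

    Regression line is $y = a + bx$ where $a = \bar{y} - b\bar{x}$ and $b = S_{xy} / S_{xx}$, so x and y are not interchangeable.

    If you treat x as the response and y as the predictor you get a different line, $x = a' + b'y$ with $b' = S_{xy} / S_{yy}$, because you would minimise the sum of squared horizontal distances rather than the vertical distances.

    Multiplying the two slopes gives $b\,b' = S_{xy}^2 / (S_{xx} S_{yy}) = r^2$, i.e. the square of the correlation coefficient r is the product of the slopes of these two lines. If the linear relationship is perfect the two lines will be the same: then, because x and y are interchanged, each slope will be the reciprocal of the other, and their product will be 1.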

  5. Lines estimated from small data sets or widely scattered points (i.e. with low correlation) will be subject to error and predictions made from them may be unreliable.
  6. Always look at the scatter plot before fitting a regression line to see if a linear relationship is sensible.

    If the plot (or theory) suggests a non-linear relationship, it may be possible either to transform the data so that the relationship becomes linear and then fit a straight line, or to use a non-linear function of the predictor variables.
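
The link between correlation and the two regression lines (note 4) can be checked numerically. The following is a minimal Python sketch, not part of the original text. The data values are hypothetical and chosen only for illustration, and the helper name corrected_sums is an assumption made for this example.

```python
# Illustrative data only: these values are hypothetical, not taken from the text.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]


def corrected_sums(x, y):
    """Return (Sxx, Syy, Sxy): sums of squares and products about the means."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    return sxx, syy, sxy


sxx, syy, sxy = corrected_sums(x, y)

# Correlation coefficient: symmetric in x and y.
r = sxy / (sxx * syy) ** 0.5

# Slope of the regression of y on x, and of x on y: not symmetric.
slope_y_on_x = sxy / sxx
slope_x_on_y = sxy / syy

# The product of the two slopes should equal r squared.
print("r squared        :", r ** 2)
print("product of slopes:", slope_y_on_x * slope_x_on_y)
```

The two printed numbers should agree up to floating-point rounding, while the two slopes themselves differ.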

Example - American female runners

The table below shows speeds (in feet per second) and stride rates (number of steps per second) of some of the best American female runners.
Source: Moore and McCabe's "Introduction to the Practice of Statistics".
Speed      (x) 15.86  16.88  17.50  18.62  19.97  21.06
Stride Rate(y)  3.05   3.12   3.17   3.25   3.36   3.46

Data were entered into a MINITAB worksheet in columns C5 (speed) and C6 (stride).

MTB > Plot 'stride' 'speed';
SUBC> Symbol 'x'.
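
For readers without MINITAB, the least squares line for these data can be computed directly from the formulas for the slope and intercept. This is a minimal Python sketch, not the MINITAB procedure itself. The variable names are assumptions made for this example, and the data are the six pairs from the table above.

```python
# Runner data from the table: speed (feet per second), stride rate (steps per second).
speed = [15.86, 16.88, 17.50, 18.62, 19.97, 21.06]
stride = [3.05, 3.12, 3.17, 3.25, 3.36, 3.46]

n = len(speed)
xbar = sum(speed) / n
ybar = sum(stride) / n

# Corrected sum of squares of x and corrected sum of products of x and y.
sxx = sum((x - xbar) ** 2 for x in speed)
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(speed, stride))

# Least squares slope and intercept for stride = a + b * speed.
b = sxy / sxx
a = ybar - b * xbar

print(f"stride = {a:.5f} + {b:.6f} * speed")
```

The intercept and slope should agree with the coefficients 1.79677 and 0.078527 used in the MINITAB session further down.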

Analysis of Variance

SOURCE       DF          SS          MS         F        p
Regression    1     0.11789     0.11789   1807.69    0.000
Error         4     0.00026     0.00007
Total         5     0.11815
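
The sums of squares in the table can be reproduced from the fitted line. This is a minimal Python sketch under the assumption that the table comes from the simple regression of stride on speed. It does not compute the p-value, and the variable names are chosen only for this example.

```python
# Runner data: speed (feet per second), stride rate (steps per second).
speed = [15.86, 16.88, 17.50, 18.62, 19.97, 21.06]
stride = [3.05, 3.12, 3.17, 3.25, 3.36, 3.46]

n = len(speed)
xbar = sum(speed) / n
ybar = sum(stride) / n

# Least squares fit of stride = a + b * speed.
sxx = sum((x - xbar) ** 2 for x in speed)
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(speed, stride))
b = sxy / sxx
a = ybar - b * xbar
fitted = [a + b * x for x in speed]

# Total variation splits into a regression part and an error (residual) part.
ss_total = sum((y - ybar) ** 2 for y in stride)
ss_regression = sum((f - ybar) ** 2 for f in fitted)
ss_error = sum((y - f) ** 2 for y, f in zip(stride, fitted))

# Degrees of freedom: 1 for the single predictor, n - 2 for error.
df_regression = 1
df_error = n - 2

ms_regression = ss_regression / df_regression
ms_error = ss_error / df_error
f_ratio = ms_regression / ms_error

print(f"Regression  {df_regression}  {ss_regression:.5f}  {ms_regression:.5f}  F = {f_ratio:.2f}")
print(f"Error       {df_error}  {ss_error:.5f}  {ms_error:.5f}")
print(f"Total       {n - 1}  {ss_total:.5f}")
```

The F ratio is sensitive to rounding in the very small error mean square, so it may differ from the tabulated value in the last digits.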

MTB > let c3 = 1.79677 + 0.078527 * c5
MTB > let c4 = c6 - c3
MTB > name c3 'fit' c4 'residual'
MTB > print c5 c6 c3 c4

 ROW   speed  stride       fit    residual

   1   15.86    3.05   3.04221   0.0077918
   2   16.88    3.12   3.12231  -0.0023060
   3   17.50    3.17   3.17099  -0.0009923
   4   18.62    3.25   3.25894  -0.0089428
   5   19.97    3.36   3.36495  -0.0049543
   6   21.06    3.46   3.45055   0.0094514

MTB > plot c4 c5
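
The fitted values and residuals printed above can be reproduced outside MINITAB. This is a minimal Python sketch that mirrors the two `let` commands, using the same rounded coefficients. The variable names are assumptions made for this example.

```python
# Runner data: speed (feet per second), stride rate (steps per second).
speed = [15.86, 16.88, 17.50, 18.62, 19.97, 21.06]
stride = [3.05, 3.12, 3.17, 3.25, 3.36, 3.46]

# Rounded coefficients, as typed in the MINITAB session.
intercept = 1.79677
slope = 0.078527

# Fitted value = intercept + slope * speed; residual = observed - fitted.
fit = [intercept + slope * x for x in speed]
residual = [y - f for y, f in zip(stride, fit)]

print(" ROW   speed  stride       fit    residual")
for row, (x, y, f, e) in enumerate(zip(speed, stride, fit, residual), start=1):
    print(f"{row:4d}   {x:5.2f}    {y:4.2f}   {f:7.5f}  {e:10.7f}")
```

In this data set the residuals are positive for the slowest and fastest runners and negative in between, which is the kind of pattern a plot of residuals against speed is meant to reveal.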


