Data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$.
Plot y against x.
In calculating a correlation, x and y are treated interchangeably. Both measurements are assumed to be subject to random variation.
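| ROW | fht | fwt |
|---|---|---|
| 1 | 167 | 60 |
| 2 | 164 | 65 |
| 3 | 170 | 64 |
| 4 | 163 | 47 |
| 5 | 152 | 46 |
| 6 | 160 | 57 |
| 7 | 170 | 57 |
| 8 | 160 | 55 |
| 9 | 157 | 55 |
| 10 | 170 | 65 |
| 11 | 150 | 50 |
| 12 | 156 | 46 |
| 13 | 168 | 60 |
| 14 | 159 | 55 |
| 15 | 160 | 50 |
| 16 | 172 | 69 |
| 17 | 175 | 56 |
| 18 | 169 | 56 |
| 19 | 169 | 72 |
| 20 | 156 | 56 |

gplot c2 c1

correl c2 c1

Correlation of fwt and fht = 0.673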
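To see where the figure 0.673 comes from, the sketch below recomputes the Pearson correlation $r = S_{xy}/\sqrt{S_{xx}S_{yy}}$ from the 20 rows listed above, using plain Python rather than MINITAB. The function name `correlation` is just an illustrative choice. Swapping the arguments gives the same value, which is the sense in which $x$ and $y$ are treated interchangeably.

```python
import math

# The 20 (fht, fwt) pairs from the listing above.
fht = [167, 164, 170, 163, 152, 160, 170, 160, 157, 170,
       150, 156, 168, 159, 160, 172, 175, 169, 169, 156]
fwt = [60, 65, 64, 47, 46, 57, 57, 55, 55, 65,
       50, 46, 60, 55, 50, 69, 56, 56, 72, 56]


def correlation(x, y):
    """Pearson correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Corrected sums of products and squares about the means.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


r = correlation(fwt, fht)
print(f"Correlation of fwt and fht = {r:.3f}")
```

Calling `correlation(fht, fwt)` instead returns the same number.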
For regression, on the other hand, one variable $y$ is regarded as an outcome of, or response to, the other variable $x$. The two variables are not interchangeable.
We find the equation of a straight line, $y = a + bx$, which summarises the relationship between them. Here $y$ is the outcome, response or dependent variable (output), while $x$ is the predictor, explanatory or independent variable (input).
Often the values of $x$ are chosen by the investigator and are not random, whereas the $y$ values are measurements treated as random variables drawn from distributions that depend on $x$.
(Source: Aust Inst. of Health, "Health Expenditure", Info.Bull. No 6, May 1991)
| Year | 1983-4 | 84-5 | 85-6 | 86-7 | 87-8 | 88-9 |
|---|---|---|---|---|---|---|
| Growth | 4.6 | 2.7 | 4.1 | 2.3 | 1.8 | 2.0 |

Denote the years by $x = 1$ for 1983-4, $x = 2$ for 1984-5, and so on.
MTB> plot c4 c3
(Character plot: growth on the vertical axis, with ticks at 2.0, 3.0 and 4.0; year on the horizontal axis, 1.0 to 6.0.)
We fit a line $\hat{y} = \hat{a} + \hat{b}x$ using the method of least squares, choosing $\hat{a}$ and $\hat{b}$ so that the sum of squares of the vertical distances, $\sum_i (y_i - \hat{y}_i)^2$, is minimised.
These values are

$$\hat{b} = \frac{s_{xy}}{s_x^2}, \qquad \hat{a} = \bar{y} - \hat{b}\,\bar{x}$$

($s_{xy}$ and $s_x$ are defined above).
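As a worked check, here are these formulas applied to the health expenditure growth data, with $x = 1, \ldots, 6$. The symbols $S_{xx}$ and $S_{xy}$ (corrected sums of squares and products) are introduced only for this calculation; assuming, as is standard, that $s_{xy}$ and $s_x^2$ use the same divisor, that divisor cancels and $\hat{b} = S_{xy}/S_{xx}$.

$$
\begin{aligned}
\bar{x} &= \frac{1+2+3+4+5+6}{6} = 3.5, \qquad \bar{y} = \frac{4.6+2.7+4.1+2.3+1.8+2.0}{6} = \frac{17.5}{6} \approx 2.917,\\
S_{xx} &= \sum_i (x_i-\bar{x})^2 = 6.25+2.25+0.25+0.25+2.25+6.25 = 17.5,\\
S_{xy} &= \sum_i (x_i-\bar{x})(y_i-\bar{y}) = \sum_i (x_i-\bar{x})\,y_i \quad\text{(since } \textstyle\sum_i (x_i-\bar{x}) = 0\text{)}\\
&= (-2.5)(4.6)+(-1.5)(2.7)+(-0.5)(4.1)+(0.5)(2.3)+(1.5)(1.8)+(2.5)(2.0) = -8.75,\\
\hat{b} &= \frac{S_{xy}}{S_{xx}} = \frac{-8.75}{17.5} = -0.5,\\
\hat{a} &= \bar{y} - \hat{b}\,\bar{x} \approx 2.917 + 0.5 \times 3.5 \approx 4.667.
\end{aligned}
$$

So for these data the fitted line is approximately $\hat{y} = 4.667 - 0.5x$.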
You can calculate $\hat{a}$ and $\hat{b}$ and then predict the $y$ value for any given $x$ as $\hat{y} = \hat{a} + \hat{b}x$.
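The two steps (estimate $\hat{a}$ and $\hat{b}$, then predict) can be sketched in a few lines of Python for the growth data. The helper names `least_squares` and `predict` are hypothetical, chosen for illustration. The value $x = 7$ (which would correspond to 1989-90) is used only to show the prediction step; it lies outside the observed years, so it is an extrapolation and should be treated with caution.

```python
# Health expenditure growth data; x = 1 stands for 1983-4, x = 2 for 1984-5, ...
years = [1, 2, 3, 4, 5, 6]
growth = [4.6, 2.7, 4.1, 2.3, 1.8, 2.0]


def least_squares(x, y):
    """Return (a_hat, b_hat) for the least-squares line y_hat = a_hat + b_hat * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b_hat = sxy / sxx              # slope: s_xy / s_x^2 (common divisor cancels)
    a_hat = y_bar - b_hat * x_bar  # intercept: y_bar - b_hat * x_bar
    return a_hat, b_hat


def predict(a_hat, b_hat, x_new):
    """Predicted y value on the fitted line at x_new."""
    return a_hat + b_hat * x_new


a_hat, b_hat = least_squares(years, growth)
print(f"y_hat = {a_hat:.3f} + ({b_hat:.3f}) x")
# x = 7 is beyond the observed range (an extrapolation).
print(f"predicted growth at x = 7: {predict(a_hat, b_hat, 7):.3f}")
```

For these data the slope comes out at $-0.5$ and the intercept at about $4.667$, so the prediction at $x = 7$ is about $1.167$.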
The total (overall) variation among the measured $y$ values, ignoring the $x$'s, is $\sum_i (y_i - \bar{y})^2$.
The variation of the $y$ values about the line is smaller, and is given by $\sum_i (y_i - \hat{y}_i)^2$. The regression line is precisely the line that makes this second quantity as small as possible.
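It can be shown that

$$\sum_i (y_i - \bar{y})^2 = \sum_i (\hat{y}_i - \bar{y})^2 + \sum_i (y_i - \hat{y}_i)^2,$$

that is, Total variation = Explained variation + Unexplained variation.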
Sometimes (e.g. by MINITAB) these values are shown in an Analysis of Variance (ANOVA) table:
| Source of variation | Sum of Squares (SS) | Interpretation |
|---|---|---|
| Regression | $\sum_i (\hat{y}_i - \bar{y})^2$ | "explained variation" |
| Error | $\sum_i (y_i - \hat{y}_i)^2$ | "unexplained variation" |
| Total | $\sum_i (y_i - \bar{y})^2$ | |
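The three sums of squares in the ANOVA table can be computed directly from their definitions. The sketch below does this for the health expenditure growth data and checks that Regression SS + Error SS = Total SS; the function name `anova_sums_of_squares` is a hypothetical helper, not a MINITAB command.

```python
years = [1, 2, 3, 4, 5, 6]
growth = [4.6, 2.7, 4.1, 2.3, 1.8, 2.0]


def anova_sums_of_squares(x, y):
    """Return (regression SS, error SS, total SS) for the least-squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    b_hat = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    a_hat = y_bar - b_hat * x_bar
    fitted = [a_hat + b_hat * xi for xi in x]                  # y_hat_i
    ss_reg = sum((fi - y_bar) ** 2 for fi in fitted)           # explained
    ss_err = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # unexplained
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)                # total
    return ss_reg, ss_err, ss_tot


ss_reg, ss_err, ss_tot = anova_sums_of_squares(years, growth)
print(f"Regression SS      = {ss_reg:.4f}")
print(f"Error SS           = {ss_err:.4f}")
print(f"Total SS           = {ss_tot:.4f}")
print(f"Regression + Error = {ss_reg + ss_err:.4f}")
```

For these data the sums of squares come out at about 4.375 (regression), 2.373 (error) and 6.748 (total), and the first two add up to the third.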
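The coefficient of determination,

$$\frac{\sum_i (\hat{y}_i - \bar{y})^2}{\sum_i (y_i - \bar{y})^2} = \frac{\text{explained variation}}{\text{total variation}},$$

is a measure of how well the line fits the data.
It can be shown that it equals $r^2$, the square of the correlation coefficient between $x$ and $y$.
So $100r^2$ is the percentage of variation "explained" by the regression line.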
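A quick numerical check that Regression SS / Total SS equals $r^2$, again using the growth data. The sketch relies on the fact that $\hat{y}_i - \bar{y} = \hat{b}(x_i - \bar{x})$, so the regression SS equals $\hat{b}^2 S_{xx}$; the helper name `centred_sums` is hypothetical.

```python
import math

years = [1, 2, 3, 4, 5, 6]
growth = [4.6, 2.7, 4.1, 2.3, 1.8, 2.0]


def centred_sums(x, y):
    """Return (Sxx, Syy, Sxy), the corrected sums of squares and products."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxx, syy, sxy


sxx, syy, sxy = centred_sums(years, growth)
r = sxy / math.sqrt(sxx * syy)  # correlation coefficient
b_hat = sxy / sxx               # least-squares slope
ss_reg = b_hat ** 2 * sxx       # regression SS, since y_hat_i - y_bar = b_hat * (x_i - x_bar)
coef_det = ss_reg / syy         # regression SS / total SS (Syy is the total SS)

print(f"r = {r:.4f}, r^2 = {r * r:.4f}")
print(f"regression SS / total SS = {coef_det:.4f}")
print(f"{100 * r * r:.1f}% of the variation in growth is explained")
```

Here $r \approx -0.805$ (growth tends to fall over the years), and about 64.8% of the variation in growth is "explained" by the line.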
If the points all lie exactly on the line, i.e. if $y_i = \hat{y}_i$ for every $i$, then the "unexplained" (or error) variation is zero, so the coefficient of determination is one.
If $\hat{y}_i = \bar{y}$ for every $i$, so that the slope of the line is zero (i.e. no regression effect), then the coefficient of determination is zero.
In general: $0 \le \text{coefficient of determination} \le 1$.
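The two extreme cases can be illustrated with small made-up data sets (hypothetical, chosen only to hit the extremes): points lying exactly on $y = 2 + 3x$, and a symmetric pattern whose least-squares slope is exactly zero. The helper `coefficient_of_determination` is an illustrative name, computing Regression SS / Total SS from the definitions.

```python
def coefficient_of_determination(x, y):
    """Regression SS / total SS for the least-squares line of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    if sxx == 0 or ss_tot == 0:
        raise ValueError("x and y must each vary for the ratio to be defined")
    b_hat = sxy / sxx
    a_hat = y_bar - b_hat * x_bar
    ss_reg = sum((a_hat + b_hat * xi - y_bar) ** 2 for xi in x)
    return ss_reg / ss_tot


x = [1, 2, 3, 4]
on_line = [2 + 3 * xi for xi in x]  # every point lies exactly on y = 2 + 3x
flat = [1, 3, 3, 1]                 # symmetric pattern: least-squares slope is 0

print(coefficient_of_determination(x, on_line))  # prints 1.0
print(coefficient_of_determination(x, flat))     # prints 0.0
```

Any other data set gives a value between these two extremes.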
MTB> brief 3