Assume $\varepsilon_i \sim N(0, \sigma^2)$ with $\sigma^2$ the same for all $\varepsilon_i$'s. Assume the $\varepsilon_i$'s are independent.
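The error terms $\varepsilon_i$ are estimated by

$$\hat{\varepsilon}_i = y_i - \hat{\alpha} - \hat{\beta} x_i$$

where $y_i$ and $x_i$ are the data values and $\hat{\alpha}$ and $\hat{\beta}$ are the least squares estimates. The $\hat{\varepsilon}_i$'s are called the residuals. It can be shown that the mean of the residuals is zero, i.e. $\frac{1}{n}\sum_{i=1}^{n} \hat{\varepsilon}_i = 0$.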
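The residual definition can be checked numerically. The sketch below is only an illustration in Python (not part of the Minitab session); it uses the car data tabulated in this section, and the helper name `least_squares` is a hypothetical choice. It fits the least squares line and confirms that the residuals average to zero, up to floating-point rounding.

```python
# Car data from the table in this section: number of options (x), delivery time (y).
options = [3, 4, 4, 7, 7, 8, 9, 11, 12, 12, 14, 16, 17, 20, 23, 25]
delivery = [25, 32, 26, 38, 34, 41, 39, 46, 44, 51, 53, 58, 61, 64, 66, 70]


def least_squares(x, y):
    """Return (intercept, slope) of the least squares line y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


alpha_hat, beta_hat = least_squares(options, delivery)

# Residual for each observation: observed value minus fitted value.
residuals = [yi - alpha_hat - beta_hat * xi for xi, yi in zip(options, delivery)]
mean_residual = sum(residuals) / len(residuals)

print(f"intercept = {alpha_hat:.3f}, slope = {beta_hat:.4f}")
print(f"mean residual = {mean_residual:.2e}")  # zero apart from rounding error
```

The intercept and slope agree with the Constant and options coefficients (21.925 and 2.0687) in the Minitab printout.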
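The following two results are needed in order to make statistical inferences from regression lines.

1. An unbiased estimator of $\sigma^2$ is

$$s^2 = \frac{1}{n-2}\sum_{i=1}^{n} \hat{\varepsilon}_i^{\,2}.$$

Because two parameters ($\alpha$ and $\beta$) are estimated in forming the $\hat{\varepsilon}_i$ terms, the denominator must be $(n-2)$ to give an unbiased estimator.

2. The least squares estimate of the slope satisfies

$$\hat{\beta} \sim N\!\left(\beta,\ \frac{\sigma^2}{\sum_{i=1}^{n}(x_i-\bar{x})^2}\right),$$

so if $\sigma^2$ is estimated by $s^2$, then the t-statistic

$$t = \frac{\hat{\beta} - \beta}{s \big/ \sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}} \sim t_{n-2}$$

can be used to make inferences about the slope $\beta$ of the regression line. The inferences most commonly required are (i) tests of particular values of the parameters, in particular whether the slope could be zero, and (ii) confidence intervals for the parameters, especially the slope parameter $\beta$.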
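As a rough numerical check of these two results, the following Python sketch (an illustration, not the notes' own method) computes $s^2$ with the $(n-2)$ denominator, the estimated standard deviation of the slope, and the t-statistic for a slope of zero. It uses the car data tabulated in this section; all variable names are illustrative.

```python
import math

# Car data from the table in this section.
options = [3, 4, 4, 7, 7, 8, 9, 11, 12, 12, 14, 16, 17, 20, 23, 25]
delivery = [25, 32, 26, 38, 34, 41, 39, 46, 44, 51, 53, 58, 61, 64, 66, 70]

n = len(options)
x_bar = sum(options) / n
y_bar = sum(delivery) / n
sxx = sum((x - x_bar) ** 2 for x in options)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(options, delivery))

beta_hat = sxy / sxx
alpha_hat = y_bar - beta_hat * x_bar

# Result 1: residual sum of squares divided by (n - 2), not n.
rss = sum((y - alpha_hat - beta_hat * x) ** 2 for x, y in zip(options, delivery))
s2 = rss / (n - 2)

# Result 2: estimated standard deviation of the slope, and t for H0: slope = 0.
se_beta = math.sqrt(s2 / sxx)
t_stat = (beta_hat - 0.0) / se_beta

print(f"s = {math.sqrt(s2):.3f}")
print(f"st.dev of slope = {se_beta:.4f}")
print(f"t = {t_stat:.2f} on {n - 2} d.f.")
```

These values can be compared with s, the Stdev of the options coefficient, and its t-ratio in the Minitab printout.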
| car | number of options | delivery time (days) |
|---|---|---|
| 1 | 3 | 25 |
| 2 | 4 | 32 |
| 3 | 4 | 26 |
| 4 | 7 | 38 |
| 5 | 7 | 34 |
| 6 | 8 | 41 |
| 7 | 9 | 39 |
| 8 | 11 | 46 |
| 9 | 12 | 44 |
| 10 | 12 | 51 |
| 11 | 14 | 53 |
| 12 | 16 | 58 |
| 13 | 17 | 61 |
| 14 | 20 | 64 |
| 15 | 23 | 66 |
| 16 | 25 | 70 |
MTB> name c1 'options' c2 'delivery'
MTB> plot c2 c1
MTB> brief 3
MTB> regress c2 1 c1;
SUBC> predict 18;
SUBC> predict 10;
SUBC> residuals c3.
The regression equation is
delivery = 21.9 + 2.07 options
Predictor Coef Stdev t-ratio p
Constant 21.925 1.591 13.78 0.000
options 2.0687 0.1164 17.77 0.000
s = 3.045 R-sq = 95.8% R-sq(adj) = 95.5%
Analysis of Variance
SOURCE DF SS MS F p
Regression 1 2927.2 2927.2 315.80 0.000
Error 14 129.8 9.3
Total 15 3057.0
Obs. options delivery Fit Stdev.Fit Residual St.Resid
1 3.0 25.000 28.132 1.295 -3.132 -1.14
2 4.0 32.000 30.200 1.203 1.800 0.64
3 4.0 26.000 30.200 1.203 -4.200 -1.50
4 7.0 38.000 36.406 0.958 1.594 0.55
5 7.0 34.000 36.406 0.958 -2.406 -0.83
6 8.0 41.000 38.475 0.892 2.525 0.87
7 9.0 39.000 40.544 0.837 -1.544 -0.53
8 11.0 46.000 44.681 0.770 1.319 0.45
9 12.0 44.000 46.750 0.761 -2.750 -0.93
10 12.0 51.000 46.750 0.761 4.250 1.44
11 14.0 53.000 50.887 0.796 2.113 0.72
12 16.0 58.000 55.025 0.892 2.975 1.02
13 17.0 61.000 57.094 0.958 3.906 1.35
14 20.0 64.000 63.300 1.203 0.700 0.25
15 23.0 66.000 69.506 1.490 -3.506 -1.32
16 25.0 70.000 73.643 1.694 -3.643 -1.44
Fit Stdev.Fit 95% C.I. 95% P.I.
59.162 1.033 ( 56.946, 61.379) ( 52.265, 66.060)
42.613 0.796 ( 40.905, 44.320) ( 35.861, 49.364)
PLOT 'DELIVERY' 'OPTIONS' <- scatter plot
BRIEF 3 <- print full output
REGRESS 'DELIVERY' 1 'OPTIONS'; <- regress the response variable DELIVERY (y) against one predictor OPTIONS (x)
PREDICT 18; <- predict DELIVERY for OPTIONS = 18
PREDICT 10; <- predict DELIVERY for OPTIONS = 10
RESIDUALS C3. <- store the residuals $\hat{\varepsilon}_i = y_i - \hat{y}_i$ in column C3
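(i) Regression equation is
delivery = 21.9 + 2.07 options
which has the form $y = \hat{\alpha} + \hat{\beta} x$, with $y$ = delivery, intercept $\hat{\alpha} = 21.9$, slope $\hat{\beta} = 2.07$ and $x$ = options.
Interpretation - delivery time increases by about 2 days for each additional option ordered.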
- For the first data point $x_1 = 3$, $y_1 = 25$:
fitted value $\hat{y}_1 = 21.925 + 2.0687 \times 3 = 28.13$
residual $\hat{\varepsilon}_1 = 25 - 28.13 = -3.13$
- To test the hypothesis $\beta = 0$ (i.e. delivery time is not related to number of options) use

$$t = \frac{\hat{\beta}}{\text{st.dev}(\hat{\beta})} = \frac{2.0687}{0.1164} = 17.77$$

p-value = $P(t_{14} < -17.77 \text{ or } t_{14} > 17.77)$
As the p-value is very small (printed as 0.000) we reject the null hypothesis that $\beta = 0$, and conclude that delivery time is related to the number of options ordered.
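To see just how small this two-sided p-value is, it can be computed directly. The sketch below is one possible approach, not the routine Minitab uses: it relies on the standard identity $P(|t_\nu| > t) = I_x(\nu/2, 1/2)$ with $x = \nu/(\nu + t^2)$, where $I_x$ is the regularized incomplete beta function, and evaluates the integral by Simpson's rule. The function name and the step count are hypothetical choices.

```python
import math


def two_sided_t_p_value(t, df, steps=2000):
    """Approximate P(|T| > |t|) for T ~ t with df degrees of freedom.

    Uses P(|T| > t) = I_x(df/2, 1/2) with x = df / (df + t^2), integrating
    the beta density numerically. Assumes df >= 2 so that the integrand is
    finite at 0; steps must be even for Simpson's rule.
    """
    if t == 0:
        return 1.0
    a, b = df / 2.0, 0.5
    x = df / (df + t * t)

    def integrand(u):
        return u ** (a - 1.0) * (1.0 - u) ** (b - 1.0)

    h = x / steps
    total = integrand(0.0) + integrand(x)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * integrand(k * h)
    integral = total * h / 3.0

    # Complete beta function B(a, b) via gamma functions.
    beta_fn = math.gamma(a) * math.gamma(b) / math.gamma(a + b)
    return integral / beta_fn


p_slope = two_sided_t_p_value(17.77, 14)
p_check = two_sided_t_p_value(2.145, 14)  # should be close to 0.05

print(f"p-value for t = 17.77 on 14 d.f.: {p_slope:.1e}")
print(f"p-value for t = 2.145 on 14 d.f.: {p_check:.3f}")
```

The second call is a sanity check: 2.145 is the tabulated two-sided 5% point of $t_{14}$, so the result should be very close to 0.05.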
(ii) To obtain a 95% confidence interval (C.I.) for $\beta$ use

$$\hat{\beta} \pm t_{n-2} \times \text{st.dev}(\hat{\beta})$$

where $n = 16$ so $t_{n-2} = t_{14}$,
i.e. $2.0687 \pm 2.145 \times 0.1164$ which gives (1.82, 2.32)
This is the 95% C.I. for the rate at which delivery time increases with the number of options.
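The interval arithmetic can be reproduced in a few lines of Python. This is only a sketch: the estimate and its standard deviation are copied from the Minitab printout, 2.145 is the $t_{14}$ value used in the text, and the function name is a hypothetical choice.

```python
def slope_confidence_interval(estimate, std_dev, t_crit):
    """Return (lower, upper) for estimate +/- t_crit * std_dev."""
    half_width = t_crit * std_dev
    return estimate - half_width, estimate + half_width


# Values taken from the printout and the text.
beta_hat = 2.0687   # slope coefficient for 'options'
se_beta = 0.1164    # its Stdev
t_crit = 2.145      # two-sided 5% point of t with 14 d.f.

ci = slope_confidence_interval(beta_hat, se_beta, t_crit)
print(f"95% C.I. for the slope: ({ci[0]:.2f}, {ci[1]:.2f})")
# prints: 95% C.I. for the slope: (1.82, 2.32)
```

Since the interval excludes zero, it agrees with the t-test above in rejecting a zero slope at the 5% level.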
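- To interpret the Analysis of Variance table

| source | DF | SS | MS |
|---|---|---|---|
| regression | 1 | - | - |
| error | 14 | - | - |
| total | 15 | | |

MS = mean square = SS/DF
Error MS = Error SS/$(n-2)$ = $s^2$ (see result 1 above)
From the printout $s = 3.045$ so $s^2 = 9.272$ = error MS (printed as 9.3).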
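The entries of the Analysis of Variance table can be rebuilt from the raw data. The Python sketch below is illustrative only and uses the car data tabulated in this section; it computes the three sums of squares, the mean squares, and the F ratio, and shows that Regression SS and Error SS add up to Total SS.

```python
# Car data from the table in this section.
options = [3, 4, 4, 7, 7, 8, 9, 11, 12, 12, 14, 16, 17, 20, 23, 25]
delivery = [25, 32, 26, 38, 34, 41, 39, 46, 44, 51, 53, 58, 61, 64, 66, 70]

n = len(options)
x_bar = sum(options) / n
y_bar = sum(delivery) / n
sxx = sum((x - x_bar) ** 2 for x in options)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(options, delivery))
beta_hat = sxy / sxx
alpha_hat = y_bar - beta_hat * x_bar
fitted = [alpha_hat + beta_hat * x for x in options]

# Sums of squares: total variation, the part explained by the line, the rest.
total_ss = sum((y - y_bar) ** 2 for y in delivery)
regression_ss = sum((f - y_bar) ** 2 for f in fitted)
error_ss = sum((y - f) ** 2 for y, f in zip(delivery, fitted))

# Mean squares are SS / DF, with DF = 1 for regression and n - 2 for error.
regression_ms = regression_ss / 1
error_ms = error_ss / (n - 2)
f_ratio = regression_ms / error_ms

print(f"Regression  DF=1   SS={regression_ss:.1f}  MS={regression_ms:.1f}  F={f_ratio:.2f}")
print(f"Error       DF={n - 2}  SS={error_ss:.1f}  MS={error_ms:.1f}")
print(f"Total       DF={n - 1}  SS={total_ss:.1f}")
```

With one predictor the F ratio is the square of the slope's t-ratio, which is why the two tests give the same p-value.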
Coefficient of determination $R^2 = \dfrac{\text{Regression SS}}{\text{Total SS}} = \dfrac{2927.2}{3057.0} = 0.958$
so 95.8% of variation in delivery times is 'explained' by the number of options ordered.
$R^2$(adj) adjusts $R^2$ for the number of parameters estimated (via the d.f.'s). It gives an approximately unbiased estimate of the square of the population correlation coefficient and matters mainly when more than one predictor is used.
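Both quantities follow from the Analysis of Variance table. The sketch below, in Python, uses the sums of squares copied from the printout. The adjusted formula, $1 - \text{Error MS} / (\text{Total SS}/(n-1))$, is the usual textbook definition and is assumed here rather than quoted from these notes.

```python
# Sums of squares and sample size taken from the Minitab printout.
regression_ss = 2927.2
error_ss = 129.8
total_ss = 3057.0
n = 16
n_predictors = 1

# Proportion of the total variation explained by the regression.
r_sq = regression_ss / total_ss

# Adjusted version: compare mean squares (SS / DF) instead of raw SS.
error_df = n - n_predictors - 1
total_df = n - 1
r_sq_adj = 1.0 - (error_ss / error_df) / (total_ss / total_df)

print(f"R-sq      = {100 * r_sq:.1f}%")      # prints: R-sq      = 95.8%
print(f"R-sq(adj) = {100 * r_sq_adj:.1f}%")  # prints: R-sq(adj) = 95.5%
```

With a single predictor the adjustment is small; it grows as more predictors are added relative to the number of observations.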
(iii) Predictions, e.g. for $x = 18$ options the predicted (fitted) value for delivery time is
$\hat{y} = \hat{\alpha} + \hat{\beta} x = 21.925 + 2.0687 \times 18 = 59.16$
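The Fit, Stdev.Fit, 95% C.I. and 95% P.I. columns of the printout can be approximated with the Python sketch below. The standard-error formulas are the usual textbook ones for simple linear regression and are an assumption here, since they are not stated in this section; 2.145 is the $t_{14}$ value used earlier, and the function name `predict` is hypothetical.

```python
import math

# Car data from the table in this section.
options = [3, 4, 4, 7, 7, 8, 9, 11, 12, 12, 14, 16, 17, 20, 23, 25]
delivery = [25, 32, 26, 38, 34, 41, 39, 46, 44, 51, 53, 58, 61, 64, 66, 70]
T_CRIT_14 = 2.145  # two-sided 5% point of t with 14 d.f., as used in the text

n = len(options)
x_bar = sum(options) / n
y_bar = sum(delivery) / n
sxx = sum((x - x_bar) ** 2 for x in options)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(options, delivery))
beta_hat = sxy / sxx
alpha_hat = y_bar - beta_hat * x_bar
s2 = sum((y - alpha_hat - beta_hat * x) ** 2
         for x, y in zip(options, delivery)) / (n - 2)


def predict(x0):
    """Return (fit, sd_fit, ci, pi) for a new x value x0.

    ci is an interval for the mean response at x0; pi is an interval for a
    single new observation at x0 and is wider by the extra s2 term.
    """
    fit = alpha_hat + beta_hat * x0
    leverage = 1.0 / n + (x0 - x_bar) ** 2 / sxx
    sd_fit = math.sqrt(s2 * leverage)
    sd_new = math.sqrt(s2 * (1.0 + leverage))
    ci = (fit - T_CRIT_14 * sd_fit, fit + T_CRIT_14 * sd_fit)
    pi = (fit - T_CRIT_14 * sd_new, fit + T_CRIT_14 * sd_new)
    return fit, sd_fit, ci, pi


for x0 in (18, 10):
    fit, sd_fit, ci, pi = predict(x0)
    print(f"x = {x0}: fit = {fit:.3f}, sd = {sd_fit:.3f}, "
          f"C.I. = ({ci[0]:.2f}, {ci[1]:.2f}), P.I. = ({pi[0]:.2f}, {pi[1]:.2f})")
```

Small differences in the last decimal place from the printout are to be expected, because the critical value here is rounded to 2.145.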