Surfstat.australia: an online text in introductory Statistics

STATISTICAL INFERENCE

ONE CONTINUOUS VARIABLE

Interpretation of Confidence Intervals

Suppose you take several samples, each of size n, from the population and for each you calculate

$\bar{x} \pm 1.96 \, s/\sqrt{n}$

then, on average, 95% of the intervals will contain the true but unknown value µ and 5% will not.

If you plotted the intervals vertically they might look like this

Note: The intervals vary from sample to sample.

On average 95% of "95% confidence intervals" will contain µ.
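
The repeated-sampling interpretation can be checked by simulation. The following is a minimal Python sketch, not part of the original text. It assumes a normal population with µ = 165 and standard deviation 5 (values borrowed from the heights example purely for illustration), draws many samples of size n = 100, and counts how often the interval "sample mean ± 1.96 × standard deviation / √n" contains µ. The function name `coverage`, the number of repetitions and the random seed are arbitrary choices.

```python
import math
import random
import statistics


def coverage(mu, sigma, n, repetitions, seed=1):
    """Fraction of simulated 95% intervals that contain the true mean mu."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(repetitions):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.mean(sample)
        s = statistics.stdev(sample)  # sample standard deviation
        half_width = 1.96 * s / math.sqrt(n)
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / repetitions


# Assumed population values, for illustration only.
print(coverage(mu=165, sigma=5, n=100, repetitions=2000))
```

The printed proportion should be close to 0.95. Any single interval either contains µ or it does not; the 95% describes the long-run behaviour of the procedure.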

Comments

  1. The sample mean $\bar{x}$ provides a point estimate (i.e. single value approximation) for µ

    e.g. In the example about women's heights, $\bar{x}$ = 165 cms

    so µ is approximately 165 cms.

  2. Confidence limits provide an interval estimate together with a degree of confidence that the parameter is in the interval

    e.g. with 95% confidence the population mean height µ for these women is in the interval (164, 166) cms

  3. The width of the interval (i.e. precision of estimate) depends on sample size.

In the example of women's heights, the sample size was n = 100 and the standard deviation was s = 5, so the 95% confidence interval is

$165 \pm 1.96 \times 5/\sqrt{100}$

i.e. (164, 166) cms.

Suppose the sample size had been n = 40 but the mean and standard deviation were still $\bar{x}$ = 165 and s = 5. Then a 95% confidence interval for µ is

$165 \pm 1.96 \times 5/\sqrt{40} = 165 \pm 1.55$

which gives (163.5, 166.5).

Notice that increasing the sample size increases the precision of the estimate in proportion to $\sqrt{n}$, not n.

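e.g. width of 95% confidence interval

= U - L
= $(\bar{x} + 1.96 \, s/\sqrt{n}) - (\bar{x} - 1.96 \, s/\sqrt{n})$
= $2 \times 1.96 \, s/\sqrt{n}$

So if n = 100, width = $2 \times 1.96 \, s/\sqrt{100}$ = 0.392 s

or if n = 25, width = $2 \times 1.96 \, s/\sqrt{25}$ = 0.784 s.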

If you increase the sample size by a factor of 4 you halve the width of the confidence interval. Precision of the estimate depends on the term $\sqrt{n}$ in the standard error SE = $s/\sqrt{n}$.
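
The arithmetic in the heights example can be reproduced with a few lines of Python. This is an illustrative sketch only; the helper names `ci_95` and `width_95` are made up here, and the multiplier 1.96 is hard-coded as in the text.

```python
import math


def ci_95(xbar, s, n):
    """95% confidence interval xbar +/- 1.96 * s / sqrt(n)."""
    half_width = 1.96 * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width


def width_95(s, n):
    """Width U - L of the 95% interval, i.e. 2 * 1.96 * s / sqrt(n)."""
    return 2 * 1.96 * s / math.sqrt(n)


# Heights example: mean 165, standard deviation 5, two sample sizes.
for n in (100, 40):
    lower, upper = ci_95(165, 5, n)
    print(f"n = {n}: ({lower:.1f}, {upper:.1f})")

# Width in units of s: quadrupling n from 25 to 100 halves the width.
print(round(width_95(1, 100), 3), round(width_95(1, 25), 3))
```

Passing s = 1 to `width_95` gives the width as a multiple of s, which is how the 0.392 s and 0.784 s figures are expressed.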

Hypothesis Testing

To test a hypothesis that a population parameter has some specified value two approaches are commonly used. They are illustrated in the following example.

Example - Bolt production

A manufacturer produces bolts with a nominal length of 15 cms. The actual lengths vary slightly. The process is stable and the population standard deviation is known to be σ = 0.3 cms.

A sample of 50 bolts has a mean length of $\bar{x}$ = 14.85 cms. Does this suggest that the average length of all bolts is not 15 cms?

The sampling distribution of the sample mean is

$\bar{x} \sim N(\mu, \sigma^2/n)$

In this case σ = 0.3, n = 50, $\bar{x}$ = 14.85 and we want to know if µ = 15 is plausible.

The First Approach

The first approach is to find a confidence interval for µ. A 95% confidence interval is given by

$14.85 \pm 1.96 \times 0.3/\sqrt{50}$

i.e. (14.77, 14.93) cms

Interpretation - with 95% confidence the interval (14.77, 14.93) contains the population mean µ of all bolts produced by the process. As the interval does not contain 15.0, the data are not consistent with the hypothesis that µ = 15. That is, the data do not support the hypothesis that the average length of all bolts is 15cms.
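
As a rough check of the first approach, the interval and the comparison with the hypothesised value can be computed directly. This is a minimal Python sketch using the figures from the bolt example; the variable names (`mu0`, `consistent` and so on) are illustrative choices, not notation from the text.

```python
import math

xbar = 14.85   # sample mean (cms)
sigma = 0.3    # known population standard deviation (cms)
n = 50         # sample size
mu0 = 15.0     # hypothesised population mean (cms)

# 95% confidence interval: xbar +/- 1.96 * sigma / sqrt(n)
half_width = 1.96 * sigma / math.sqrt(n)
lower, upper = xbar - half_width, xbar + half_width

# The hypothesis is consistent with the data only if mu0 lies in the interval.
consistent = lower <= mu0 <= upper

print(f"95% CI: ({lower:.2f}, {upper:.2f})")
print("mu0 inside interval:", consistent)
```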

Remarks

  1. Confidence intervals with other probabilities, e.g. 0.9 or 0.99, can be calculated similarly.

    For the above example a 99% confidence interval is

    $14.85 \pm 2.576 \times 0.3/\sqrt{50}$

    i.e. (14.74, 14.96)

  2. If you have the raw data (e.g. the 50 lengths of bolts) you can use MINITAB to calculate the confidence interval

    e.g. ZINTERVAL 95 0.3 C1

    where 95 = required confidence level (as a percentage)
    0.3 = value for σ, the known population standard deviation
    C1 = column with data values
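
For readers without MINITAB, a similar calculation can be sketched in Python. This is not MINITAB's implementation, only an assumed equivalent: the function name `z_interval` and its arguments are hypothetical, the critical value comes from `statistics.NormalDist` in the Python standard library (Python 3.8 or later), and the data values shown are made up for illustration.

```python
import math
from statistics import NormalDist, mean


def z_interval(data, sigma, level=95):
    """Confidence interval for the mean when sigma is known.

    level is the confidence level as a percentage, e.g. 95 or 99.
    """
    n = len(data)
    xbar = mean(data)
    # Two-sided critical value, about 1.96 for 95% and 2.576 for 99%.
    z = NormalDist().inv_cdf(0.5 + level / 200)
    half_width = z * sigma / math.sqrt(n)
    return xbar - half_width, xbar + half_width


# Hypothetical measurements, made up for illustration (not the 50 bolts).
lengths = [14.8, 14.9, 15.0, 14.7, 14.9]
print(z_interval(lengths, sigma=0.3, level=95))
```

The argument order (level, σ, data) in the MINITAB command maps onto `level`, `sigma` and `data` here.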

The Second Approach

The second approach is to test the hypothesis (i.e. µ = 15cm) more directly as follows

  1. Assume µ = 15 (that is, assume the null hypothesis is true).
  2. Calculate the probability of getting a sample mean as far away as or further from the assumed population mean as was observed (i.e. $\bar{x}$ = 14.85)

    The shaded areas represent values as far away as or further from µ = 15 as the observed value $\bar{x}$ = 14.85

    This is called the "p-value". In this case

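    p-value = $P(\bar{x} \le 14.85 \text{ or } \bar{x} \ge 15.15)$

    If $\bar{x} \sim N(15, 0.3^2/50)$ then $Z = (\bar{x} - 15)/(0.3/\sqrt{50})$ has a standard normal distribution. Hence

    $P(\bar{x} \le 14.85 \text{ or } \bar{x} \ge 15.15)$
    = $P(Z \le (14.85 - 15)/(0.3/\sqrt{50}) \text{ or } Z \ge (15.15 - 15)/(0.3/\sqrt{50}))$
    = $P(Z \le -3.5 \text{ or } Z \ge 3.5)$ < 0.001 from tables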

  3. This probability, p-value < 0.001, is very small so we conclude that the sample data provide evidence against the assumption µ = 15. We reject the hypothesis that the average length of all bolts is 15 cms.
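
The p-value read from tables can also be computed numerically. The sketch below is one possible way to do it in Python, using `statistics.NormalDist` from the standard library for the standard normal distribution; the variable names are illustrative.

```python
import math
from statistics import NormalDist

xbar, mu0, sigma, n = 14.85, 15.0, 0.3, 50

# Standardise the observed mean under the assumption mu = 15.
z = (xbar - mu0) / (sigma / math.sqrt(n))

# Two-sided p-value: P(Z <= -|z|) + P(Z >= |z|) = 2 * P(Z <= -|z|).
p_value = 2 * NormalDist().cdf(-abs(z))

print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

The computed z is slightly beyond the rounded value of -3.5 used with the tables, and the resulting p-value is below 0.001, in agreement with the conclusion above.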

MINITAB can be used to calculate the p-value for this example as follows

ZTEST 15 0.3 C1

where 15 = hypothesised mean µ
0.3 = σ, the known population standard deviation
C1 = column of data values

Steps for hypothesis testing

In general the steps for statistical testing of hypotheses are as follows.
  1. Specify the hypothesis (often a parameter value) in such a way that you can calculate the probability (sampling) distribution for the sample statistic you are going to observe

    e.g. in the above example µ = 15 so that

    $\bar{x} \sim N(15, \sigma^2/50)$, taking σ = 0.3

  2. Calculate the statistic from the data (e.g. $\bar{x}$ = 14.85)
  3. Calculate the p-value

    = P(obtaining a value which is as extreme or more extreme than the calculated statistic if the hypothesis is true)

    e.g. in the above example

    p-value = $P(\bar{x} \le 14.85 \text{ or } \bar{x} \ge 15.15)$

  4. If the p-value is small (see below for the conventional definition of small) this means that the sample value is very unlikely to have occurred if the hypothesis were true, so we reject the hypothesis.
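
The four steps can be gathered into one small function. This is a sketch under stated assumptions rather than a definitive implementation: it covers only the case of a normal sample mean with known σ, the function name `z_test` is hypothetical, the data values are made up, and the cut-off `alpha = 0.05` for a "small" p-value is an assumed default that the caller can change.

```python
import math
from statistics import NormalDist, mean


def z_test(data, mu0, sigma, alpha=0.05):
    """Two-sided z-test of the hypothesis mu = mu0 with sigma known.

    alpha is an assumed cut-off for a "small" p-value.
    Returns (z, p_value, reject).
    """
    # Step 1: under the hypothesis, xbar ~ N(mu0, sigma^2 / n).
    n = len(data)
    standard_error = sigma / math.sqrt(n)
    # Step 2: calculate the statistic from the data.
    xbar = mean(data)
    z = (xbar - mu0) / standard_error
    # Step 3: p-value = P(a value as extreme or more extreme than observed).
    p_value = 2 * NormalDist().cdf(-abs(z))
    # Step 4: reject the hypothesis if the p-value is small.
    return z, p_value, p_value < alpha


# Hypothetical measurements, made up for illustration only.
lengths = [14.8, 14.9, 15.0, 14.7, 14.9]
print(z_test(lengths, mu0=15, sigma=0.3))
```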

