Surfstat.australia: an online text in introductory Statistics

# VARIATION AND PROBABILITY

## NORMAL APPROXIMATION TO THE BINOMIAL DISTRIBUTION

### Shape of the Binomial Distribution

The shape of the binomial distribution depends on the values of n and p.

### The Binomial and the Normal Distributions Compared

For large n (say n > 20) and p not too near 0 or 1 (say 0.05 < p < 0.95), the binomial distribution is approximately Normal.

This can be used to find binomial probabilities.

If X ~ binomial(n, p) where n > 20 and 0.05 < p < 0.95, then X has approximately the Normal distribution with mean E(X) = np and variance var(X) = npq, where q = 1 - p,

so $Z = \dfrac{X - np}{\sqrt{npq}}$ is approximately N(0,1).
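The standardisation step can be sketched in a few lines of Python. This is only an illustration, not part of the original MINITAB material: the function names `standardise` and `normal_cdf` and the values n = 30, p = 0.4 are assumptions chosen for the example.

```python
import math

def standardise(x, n, p):
    """Return z = (x - np) / sqrt(npq) for a count x from binomial(n, p)."""
    q = 1.0 - p
    return (x - n * p) / math.sqrt(n * p * q)

def normal_cdf(z):
    """Standard Normal probability P(Z <= z), computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical illustration: n = 30, p = 0.4, so np = 12 and npq = 7.2.
z = standardise(15, 30, 0.4)
print("z =", round(z, 3))
print("approximate P(X <= 15), no continuity correction:", round(normal_cdf(z), 4))
```

Once X is standardised, any table or routine for N(0,1) can be used, which is why the approximation is convenient for hand calculation.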

### The MINITAB MPLOT command

Use the MINITAB command MPLOT (or GMPLOT for high-resolution graphics) to compare a binomial distribution with the Normal distribution that has the same expected value and variance. First calculate probabilities for the binomial distribution with n = 16 and p = 0.5. Then calculate the corresponding values of the Normal density with µ = np = 16 x 0.5 = 8 and σ² = npq = 16 x 0.5 x 0.5 = 4, so that σ = 2. The graph shows that the two curves are very close together (the symbol 2 indicates that the values for the binomial (A) and the Normal (B) distributions were nearly equal).
```
MTB> set c1
DATA> 0:16
DATA> end
MTB> pdf c1 c2;
SUBC> binomial 16 0.5.
MTB> name c2 'binomial'
MTB> pdf c1 c3;
SUBC> normal 8 2.     # NOTE: mean np = 8, sd = sqrt(npq) = 2
MTB> name c3 'normal'
MTB> Gmplot c2 c1 c3 c1
```
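For readers without MINITAB, the same comparison can be sketched in Python. This is an assumed equivalent rather than a translation of MPLOT: it prints the two columns side by side instead of plotting them, and the helper names `binomial_pmf` and `normal_pdf` are illustrative.

```python
import math

def binomial_pmf(k, n, p):
    """Exact binomial probability P(X = k)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def normal_pdf(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

n, p = 16, 0.5
mu = n * p                          # np = 8
sigma = math.sqrt(n * p * (1 - p))  # sqrt(npq) = 2

# One row per outcome 0..16: (outcome, binomial probability, Normal density)
rows = [(k, binomial_pmf(k, n, p), normal_pdf(k, mu, sigma)) for k in range(n + 1)]
for k, b, g in rows:
    print(f"{k:2d}  binomial={b:.4f}  normal={g:.4f}")

# Largest vertical gap between the two columns
max_gap = max(abs(b - g) for _, b, g in rows)
print("largest difference:", round(max_gap, 4))
```

The Normal column holds density values rather than probabilities, but because each whole number spans an interval of width 1 the two are directly comparable.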

Hence, if X has the binomial distribution, i.e. X ~ binomial(n, p), and n is large, then X has approximately the Normal distribution with mean µ = np and standard deviation $\sigma = \sqrt{npq}$. This approximation is reasonably good when np > 10 and n(1-p) > 10.
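The rule of thumb is easy to check mechanically. The sketch below is a hypothetical helper (the name `normal_approx_ok` and the `threshold` parameter are assumptions, not from the text) that applies the np > 10 and n(1-p) > 10 conditions.

```python
def normal_approx_ok(n, p, threshold=10):
    """Return True when both np and n(1-p) exceed the threshold."""
    return n * p > threshold and n * (1 - p) > threshold

# n = 50, p = 0.4: np = 20 and n(1-p) = 30, so the rule is satisfied.
print(normal_approx_ok(50, 0.4))
# n = 100, p = 0.05: np = 5, so the rule is not satisfied.
print(normal_approx_ok(100, 0.05))
```

This rule is stricter than the earlier guide of n > 20 with 0.05 < p < 0.95; either should be read as a rough guide rather than a sharp boundary.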

### Continuity Correction and Accuracy

For accurate binomial probabilities, either use computer software to do the exact calculation or, if n is not very large, improve the Normal approximation by using the continuity correction. This method treats each whole number as occupying the interval from 0.5 below it to 0.5 above it. When an outcome X needs to be included in the probability calculation, the Normal approximation uses the interval from (X - 0.5) to (X + 0.5). This is illustrated in the following example.
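The interval idea can be sketched in Python before working through an example. The function names below are illustrative assumptions; the code approximates P(a <= X <= b) by the Normal area from a - 0.5 to b + 0.5 and compares it with the exact binomial sum.

```python
import math

def normal_cdf(z):
    """Standard Normal probability P(Z <= z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def approx_prob_between(a, b, n, p):
    """Normal approximation to P(a <= X <= b) with continuity correction."""
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    upper = (b + 0.5 - mu) / sigma   # include all of outcome b
    lower = (a - 0.5 - mu) / sigma   # include all of outcome a
    return normal_cdf(upper) - normal_cdf(lower)

def exact_prob_between(a, b, n, p):
    """Exact binomial P(a <= X <= b)."""
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(a, b + 1))

# Single outcome X = 8 for n = 16, p = 0.5: the interval is 7.5 to 8.5.
approx = approx_prob_between(8, 8, 16, 0.5)
exact = exact_prob_between(8, 8, 16, 0.5)
print(round(approx, 4), round(exact, 4))
```

Without the half-unit adjustment a single outcome would receive zero area under the Normal curve, which is why the correction matters most when only a few outcomes are involved.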

#### Example - Gender in a particular faculty

In a particular faculty 60% of students are men and 40% are women. In a random sample of 50 students what is the probability that more than half are women?

Let the random variable X = number of women in the sample.

Assume X has the binomial distribution with

n = 50 and p = 0.4.

Then E(X) = np = 50 x 0.4 = 20

var(X) = npq = 50 x 0.4 x 0.6 = 12

so approximately X ~ N(20,12).

We need to find P(X > 25). Note - not P(X >= 25).

Standardising 25 with mean 20 and variance 12 gives $z = \dfrac{25 - 20}{\sqrt{12}} = 1.44$,

so

P(X > 25) = P(Z > 1.44)
= 1 - P(Z < 1.44)
= 1 - 0.9251
= 0.075

The exact answer, calculated from binomial probabilities, is

P(X > 25) = P(X = 26) + P(X = 27) + ... + P(X = 50) = 0.0573.

The approximate probability, using the continuity correction, is

$P(X > 25.5) = P\left(Z > \dfrac{25.5 - 20}{\sqrt{12}}\right) = P(Z > 1.588)$

= 0.0562, which is a much better approximation to the exact value of 0.0573.

(The value 25.5 was chosen because the outcome 25 was not to be included, but the outcomes 26, 27, ..., 50 were to be included in the calculation.)
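The three figures in this example can be checked with a short Python sketch. The variable names are illustrative assumptions; the uncorrected value differs slightly from 0.075 in the fourth decimal place because the code does not round z to 1.44 before looking up the probability.

```python
import math

def normal_cdf(z):
    """Standard Normal probability P(Z <= z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

n, p = 50, 0.4
mu = n * p                          # 20
sigma = math.sqrt(n * p * (1 - p))  # sqrt(12)

# Exact: sum the binomial probabilities for 26, 27, ..., 50
exact = sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(26, n + 1))

# Normal approximation without and with the continuity correction
uncorrected = 1.0 - normal_cdf((25 - mu) / sigma)
corrected = 1.0 - normal_cdf((25.5 - mu) / sigma)

print("exact       :", round(exact, 4))
print("uncorrected :", round(uncorrected, 4))
print("corrected   :", round(corrected, 4))
```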

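Similarly, if the example required the probability that fewer than 18 students were women, the outcome 17 would be included and 18 excluded, so the continuity correction would require the calculation

$P(X < 17.5) = P\left(Z < \dfrac{17.5 - 20}{\sqrt{12}}\right)$.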
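A lower-tail case can be checked in the same way. The sketch below is illustrative only (the names are assumptions): "fewer than 18" means the outcomes 0 to 17, so the corrected boundary is 17.5.

```python
import math

def normal_cdf(z):
    """Standard Normal probability P(Z <= z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

n, p = 50, 0.4
mu = n * p
sigma = math.sqrt(n * p * (1 - p))

# Exact P(X < 18) = P(X <= 17)
exact_lower = sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(0, 18))

# Normal approximation with the boundary moved from 18 down to 17.5
corrected_lower = normal_cdf((17.5 - mu) / sigma)

print(round(exact_lower, 4), round(corrected_lower, 4))
```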
