Surfstat.*australia*: an online text in introductory Statistics

# VARIATION AND PROBABILITY

## NORMAL APPROXIMATION TO THE BINOMIAL DISTRIBUTION

### Shape of the Binomial Distribution

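The shape of the binomial distribution depends on the values of *n* and
*p*. When p = 0.5 the distribution is symmetric; when *p* is close to 0
or 1 it is skewed, and the skewness becomes less pronounced as *n*
increases.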

### The Binomial and the Normal Distributions Compared

For large *n* (say n > 20) and *p* not too near 0 or 1
(say 0.05 < p < 0.95), the binomial distribution approximately follows
the Normal distribution.

This approximation can be used to find binomial probabilities.

If X ~ binomial(n, p) where n > 20 and 0.05 < p < 0.95,
then X has approximately the Normal distribution with mean E(X) = np
and variance var(X) = npq, where q = 1 - p, so

$$Z = \frac{X - np}{\sqrt{npq}}$$

is approximately N(0,1).
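The standardization above can be sketched in Python (an illustration only; the text itself uses MINITAB). The helper names `std_normal_cdf` and `binomial_z` are hypothetical, and the standard Normal probability is computed from `math.erf` rather than read from a printed table.

```python
import math

def std_normal_cdf(z: float) -> float:
    """P(Z <= z) for Z ~ N(0, 1), computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def binomial_z(x: float, n: int, p: float) -> float:
    """Standardize x using the binomial mean np and standard deviation sqrt(npq)."""
    q = 1.0 - p
    return (x - n * p) / math.sqrt(n * p * q)

# Example values: n = 50, p = 0.4, x = 25
z = binomial_z(25, 50, 0.4)
print(round(z, 2))                     # 1.44
print(round(1 - std_normal_cdf(z), 4)) # approximate P(X > 25), no continuity correction
```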

### The MINITAB MPLOT command

Use the MINITAB command MPLOT (or GMPLOT for high-resolution graphics)
to compare a binomial distribution with the Normal distribution that has
the same expected value and variance. First calculate probabilities
for the binomial distribution with n = 16 and p = 0.5. Then calculate
values of the Normal density with $\mu = np = 16 \times 0.5 = 8$
and $\sigma^2 = npq = 16 \times 0.5 \times 0.5 = 4$, so that
$\sigma = 2$. The graph shows that the two curves
are very close together (in the MPLOT character plot, the symbol 2 indicates that the values for the
binomial (A) and the Normal (B) distributions were nearly equal).
```
MTB> set c1
DATA> 0:16
DATA> end
MTB> pdf c1 c2;
SUBC> binomial 16 0.5.
MTB> name c2 'binomial'
MTB> pdf c1 c3;
SUBC> normal 8 2.   # mean np = 8, standard deviation sqrt(npq) = 2
MTB> name c3 'normal'
MTB> Gmplot c2 c1 c3 c1
```
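For readers without MINITAB, the following Python sketch is one way to make the same comparison, printing a table instead of a plot. It assumes Python 3.8+ (for `math.comb`); `binom_pmf` and `normal_pdf` are hypothetical helper names. As in the MINITAB session, the Normal density is evaluated at each whole number, which is comparable with the binomial probability because each outcome occupies an interval of width 1.

```python
import math

n, p = 16, 0.5
mu = n * p                           # np = 8
sigma = math.sqrt(n * p * (1 - p))   # sqrt(npq) = 2

def binom_pmf(k: int, n: int, p: float) -> float:
    """Exact binomial probability P(X = k)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Normal density at x with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# One row per outcome k = 0, 1, ..., 16
rows = [(k, binom_pmf(k, n, p), normal_pdf(k, mu, sigma)) for k in range(n + 1)]
print(" k  binomial   normal")
for k, b, f in rows:
    print(f"{k:2d}  {b:.4f}    {f:.4f}")

max_gap = max(abs(b - f) for _, b, f in rows)
print(f"largest difference: {max_gap:.4f}")
```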

Hence, if X has the binomial distribution, i.e. X ~ binomial(n, p), and
*n* is large, then X has approximately the Normal distribution with
mean $\mu = np$ and standard deviation $\sigma = \sqrt{np(1-p)}$. This
approximation is reasonably good when np > 10 and n(1 - p) > 10.
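The rule of thumb can be written as a one-line check. The function name `normal_approx_reasonable` and its `threshold` parameter are hypothetical. Note that the n = 16, p = 0.5 example used with MPLOT gives np = 8, so it falls short of this guideline even though the plotted curves are close; the rule is a guide rather than a sharp boundary.

```python
def normal_approx_reasonable(n: int, p: float, threshold: float = 10) -> bool:
    """Rule of thumb from the text: np > 10 and n(1 - p) > 10."""
    return n * p > threshold and n * (1 - p) > threshold

print(normal_approx_reasonable(50, 0.4))   # True  (np = 20, n(1 - p) = 30)
print(normal_approx_reasonable(16, 0.5))   # False (np = 8)
```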

### Continuity Correction and Accuracy

For accurate binomial probabilities, either use computer
software to do exact calculations or, if *n* is not very large, improve
the probability calculation by using the **continuity
correction**. This method treats each whole number k as occupying
the interval from k - 0.5 to k + 0.5. When an outcome x is to
be included in the probability calculation, the normal approximation
uses the interval from x - 0.5 to x + 0.5. This is illustrated in the
following example.
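A minimal sketch of the continuity correction for an event a ≤ X ≤ b, assuming Python and hypothetical helper names (`std_normal_cdf`, `binom_interval_cc`): the whole numbers a, ..., b are replaced by the Normal interval from a - 0.5 to b + 0.5.

```python
import math

def std_normal_cdf(z: float) -> float:
    """P(Z <= z) for Z ~ N(0, 1)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def binom_interval_cc(a: int, b: int, n: int, p: float) -> float:
    """Approximate P(a <= X <= b) for X ~ binomial(n, p).

    Each whole number k is treated as the interval [k - 0.5, k + 0.5],
    so the outcomes a..b become the Normal interval [a - 0.5, b + 0.5].
    """
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    upper = std_normal_cdf((b + 0.5 - mu) / sigma)
    lower = std_normal_cdf((a - 0.5 - mu) / sigma)
    return upper - lower

# P(X = 8) for n = 16, p = 0.5: the single outcome 8 covers 7.5 to 8.5
print(round(binom_interval_cc(8, 8, 16, 0.5), 4))
```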


#### Example - Gender in a particular faculty

In a particular faculty 60% of students are men and 40% are women. In
a random sample of 50 students what is the probability that more than half
are women?

Let the random variable X be the number of women in the sample, and
assume X has the binomial distribution with n = 50 and p = 0.4.

Then E(X) = np = 50 × 0.4 = 20

var(X) = npq = 50 × 0.4 × 0.6 = 12

so approximately X ~ N(20, 12), i.e. Normal with mean 20 and variance 12.

We need to find P(X > 25), **not** P(X ≥ 25), because 25 women would be
exactly half of the sample, not more than half.

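Standardizing with $\sigma = \sqrt{12} \approx 3.464$:

$$
\begin{aligned}
P(X > 25) &\approx P\left(Z > \frac{25 - 20}{\sqrt{12}}\right) = P(Z > 1.44) \\
&= 1 - P(Z < 1.44) \\
&= 1 - 0.9251 \\
&= 0.0749 \approx 0.075
\end{aligned}
$$

The exact answer, calculated from binomial probabilities, is
P(X > 25) = P(X = 26) + P(X = 27) + ... + P(X = 50) = 0.0573.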

The approximate probability using the **continuity correction** is

$$
P(X > 25) \approx P\left(Z > \frac{25.5 - 20}{\sqrt{12}}\right) = P(Z > 1.588) \approx 0.0562,
$$

which is a much better approximation to the exact value of 0.0573.
(The value 25.5 was chosen because the outcome 25 was **not** to be
included, but the outcomes 26, 27, ..., 50 **were** to be included in the
calculation.)
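The three numbers in this example (exact, uncorrected and corrected) can be reproduced with a short Python sketch using only the standard library. The helper names are hypothetical, and the uncorrected value will differ slightly from 0.075 because z is not rounded to 1.44 before Φ is evaluated.

```python
import math

def std_normal_cdf(z: float) -> float:
    """P(Z <= z) for Z ~ N(0, 1)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def binom_pmf(k: int, n: int, p: float) -> float:
    """Exact binomial probability P(X = k)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 50, 0.4
mu = n * p                           # E(X) = 20
sigma = math.sqrt(n * p * (1 - p))   # sqrt(12)

exact = sum(binom_pmf(k, n, p) for k in range(26, n + 1))  # P(X = 26) + ... + P(X = 50)
no_cc = 1 - std_normal_cdf((25 - mu) / sigma)               # boundary at 25
with_cc = 1 - std_normal_cdf((25.5 - mu) / sigma)           # boundary moved to 25.5

print(f"exact             {exact:.4f}")
print(f"normal, no corr.  {no_cc:.4f}")
print(f"normal, corrected {with_cc:.4f}")
```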

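Similarly, if the example required the probability that fewer than 18
students were women, i.e. P(X < 18) = P(X ≤ 17), the continuity
correction would require the calculation

$$
P(X \le 17) \approx P\left(Z < \frac{17.5 - 20}{\sqrt{12}}\right) = P(Z < -0.72).
$$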
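To check the "fewer than 18 women" case numerically, here is a Python sketch under the same assumptions (hypothetical helper names, standard library only). Since P(X < 18) = P(X ≤ 17), the corrected Normal boundary is 17.5.

```python
import math

def std_normal_cdf(z: float) -> float:
    """P(Z <= z) for Z ~ N(0, 1)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def binom_pmf(k: int, n: int, p: float) -> float:
    """Exact binomial probability P(X = k)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 50, 0.4
mu, sigma = n * p, math.sqrt(n * p * (1 - p))

exact_le_17 = sum(binom_pmf(k, n, p) for k in range(0, 18))  # P(X = 0) + ... + P(X = 17)
approx_le_17 = std_normal_cdf((17.5 - mu) / sigma)           # outcome 17 extends up to 17.5

print(f"exact     {exact_le_17:.4f}")
print(f"corrected {approx_le_17:.4f}")
```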