For these data, test the null hypothesis
H0 : P(girl) = ½
against the alternative hypothesis
H1 : P(girl) ≠ ½
Let X = number of girls. To test the hypothesis, first assume the null hypothesis is true and then calculate the p-value associated with the observed sample value.
i.e. assume X ~ Binomial(7, ½) (n = 7 babies, P(girl) = ½).
Hence E(X) = 7 × ½ = 3.5. The observed value of X was 6, which is 2.5 above the expected value 3.5. Values as extreme as 6, or more extreme, in either direction are those at least 2.5 away from 3.5: X = 6 or X = 7, or X = 1 or X = 0. So
p-value = P(X = 6 or 7 or 1 or 0) = (7 + 1 + 7 + 1)/128 = 16/128 = 0.125.
As the p-value is greater than 0.05, this is not statistically significant, so we conclude that the data are consistent with
H0 : P(girl) = P(boy) = 1/2
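The exact two-sided binomial p-value above can be checked with a short Python sketch. It assumes that "as extreme" means at least as far from the expected value np as the observed x, which is the rule used in the text; the function name `two_sided_binomial_p` is hypothetical.

```python
from fractions import Fraction
from math import comb

def two_sided_binomial_p(x: int, n: int, p: Fraction = Fraction(1, 2)) -> Fraction:
    """Exact p-value P(|X - np| >= |x - np|) for X ~ Binomial(n, p)."""
    mean = n * p
    distance = abs(x - mean)
    total = Fraction(0)
    for k in range(n + 1):
        if abs(k - mean) >= distance:
            # Add the binomial probability P(X = k)
            total += comb(n, k) * p**k * (1 - p) ** (n - k)
    return total

# Example from the text: n = 7 babies, x = 6 girls
p_value = two_sided_binomial_p(6, 7)
print(p_value, float(p_value))  # 1/8 0.125
```

Using `Fraction` keeps the comparison with the distance from np exact, so boundary values such as X = 1 and X = 6 are not lost to floating-point rounding.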
Let X = number of girls. Assume X ~ Binomial(21, ½).
In this case
p-value = P(X ≤ 3 or X ≥ 18) ≈ 0.0015 (by calculator).
This is statistically significant (because the p-value is much less than 0.05) so we reject H0 and conclude the sex ratio differed from 1:1.
Notice that the observed proportion of boys, 1/7, was the same for both the above examples. The p-value was affected by the total number, n, of babies or equivalently, by the expected numbers of boys and girls, np and n(1-p).
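To see the effect of n described above, the sketch below keeps the observed proportion of boys fixed at 1/7 and increases the number of babies, using the same "at least as far from np" rule for extreme values. The helper name is hypothetical, and n = 14 is an extra illustrative case that does not appear in the text.

```python
from fractions import Fraction
from math import comb

def two_sided_binomial_p(x: int, n: int, p: Fraction = Fraction(1, 2)) -> Fraction:
    """Exact P(|X - np| >= |x - np|) for X ~ Binomial(n, p)."""
    mean = n * p
    return sum(
        (comb(n, k) * p**k * (1 - p) ** (n - k)
         for k in range(n + 1) if abs(k - mean) >= abs(x - mean)),
        Fraction(0),
    )

# Same observed proportion of boys (1/7), so the number of girls is 6n/7
for n in (7, 14, 21):
    girls = 6 * n // 7
    print(n, girls, round(float(two_sided_binomial_p(girls, n)), 4))
# p-values: 0.125, 0.0129, 0.0015
```

The proportion never changes, yet the p-value falls sharply as n grows, because the same proportion corresponds to a larger absolute departure from np.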
Let X = number of heads. Assume X ~ Binomial(50, ½).
In this case, the expected number of heads, assuming the coin is balanced, is 50 × ½ = 25. The p-value is the probability of getting the observed value of 20 or a more extreme value (≤ 20 heads) in either direction (i.e. ≥ 30 heads):
p-value = P(X ≤ 20 or X ≥ 30), which is tedious to calculate directly.
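Instead, since n is large, we can use the Normal approximation to the binomial distribution,
i.e. if X ~ Binomial(n, p) then approximately X ~ Normal(np, npq), with mean np, variance npq and q = 1 - p.
For n = 50 and p = ½, np = 25 and npq = 12.5, so
$$Z = \frac{X - 25}{\sqrt{12.5}}$$
is approximately Normal(0, 1).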
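p-value = P(X ≤ 20 or X ≥ 30), and using the continuity correction
$$
\begin{aligned}
\text{p-value} &= P\left(Z \le \frac{20.5 - 25}{\sqrt{12.5}} \ \text{or}\ Z \ge \frac{29.5 - 25}{\sqrt{12.5}}\right) \\
&\approx P(Z \le -1.273 \ \text{or}\ Z \ge 1.273) \\
&= 2 \times (1 - 0.898) \\
&\approx 0.20
\end{aligned}
$$
using P(Z ≤ 1.273) ≈ 0.898 from the Normal table.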
As this p-value is not statistically significant we conclude that the data are consistent with the hypothesis that P(head) = ½ i.e. that the coin may be balanced.
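The Normal approximation above can be repeated in Python and, for comparison, set against the exact binomial tail that the text calls tedious to compute by hand. This sketch uses only the standard library (`math.erf` for the Normal CDF, `math.comb` for binomial coefficients); the function names are hypothetical.

```python
import math

def normal_cdf(z: float) -> float:
    """Standard Normal CDF, Phi(z), written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def approx_two_sided_p(x_low: int, x_high: int, n: int, p: float) -> float:
    """Normal approximation to P(X <= x_low or X >= x_high), with continuity correction."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    z_low = (x_low + 0.5 - mean) / sd    # P(X <= 20) approximated by P(Y <= 20.5), Y Normal
    z_high = (x_high - 0.5 - mean) / sd  # P(X >= 30) approximated by P(Y >= 29.5)
    return normal_cdf(z_low) + (1.0 - normal_cdf(z_high))

def exact_two_sided_p(x_low: int, x_high: int, n: int, p: float) -> float:
    """Exact binomial P(X <= x_low or X >= x_high)."""
    lower = sum(binomial_pmf(k, n, p) for k in range(0, x_low + 1))
    upper = sum(binomial_pmf(k, n, p) for k in range(x_high, n + 1))
    return lower + upper

# Coin example: n = 50 tosses, 20 heads, so "as extreme" means X <= 20 or X >= 30
approx = approx_two_sided_p(20, 30, 50, 0.5)
exact = exact_two_sided_p(20, 30, 50, 0.5)
print(f"Normal approximation: {approx:.3f}, exact binomial: {exact:.3f}")
```

With n = 50 the two values should agree closely, which is why the approximation is acceptable here.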
| Categories | : | Heads | Tails |
|---|---|---|---|
| Observed frequencies | : | o1 = x | o2 = n - x |
| Expected frequencies | : | e1 = np | e2 = n(1 - p) |
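For these two categories, calculate
$$\chi^2 = \frac{(o_1 - e_1)^2}{e_1} + \frac{(o_2 - e_2)^2}{e_2} = \frac{(x - np)^2}{np} + \frac{(x - np)^2}{n(1 - p)} = \frac{(x - np)^2}{npq} = Z^2,$$
using o2 - e2 = (n - x) - n(1 - p) = -(x - np) and 1/(np) + 1/(nq) = 1/(npq), where q = 1 - p and Z = (x - np)/√(npq) is the standardised value used in the Normal approximation. χ is the Greek letter chi (pronounced "ki").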
Hence, when there are just two categories and the total frequency is constrained to sum to a fixed number n (so that there is only one random variable, X), χ² = Z².
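A quick numerical check of χ² = Z² for the coin data (n = 50, x = 20, p = ½): the χ² sum over the two categories and the square of the standardised value should agree. Here Z is computed without the continuity correction, matching the definition Z = (x - np)/√(npq).

```python
import math

n, x, p = 50, 20, 0.5
q = 1 - p

# Observed and expected frequencies for the two categories (heads, tails)
observed = [x, n - x]
expected = [n * p, n * q]
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Standardised value, no continuity correction
z = (x - n * p) / math.sqrt(n * p * q)

print(chi_sq, z**2, math.isclose(chi_sq, z**2))
```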
To calculate p-values we need the probability distribution of χ².
This is the chi-squared distribution with 1 degree of freedom, written χ²₁ (see Table 3).
For example, P(Z < -1.96 or Z > 1.96) = 0.05.
This is equivalent to
P(Z² > 1.96²)
= P(Z² > 3.84)
= P(χ²₁ > 3.84)
= 0.05 from Table 3 (first row).
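The same equivalence can be checked numerically. For one degree of freedom, P(χ²₁ > c) = P(|Z| > √c), and the standard library's complementary error function gives this as erfc(√(c/2)); this identity is a standard result, not something read from Table 3. The function names are hypothetical.

```python
import math

def chi2_1_upper_tail(c: float) -> float:
    """P(chi-squared with 1 df > c) = P(|Z| > sqrt(c)) = erfc(sqrt(c / 2))."""
    return math.erfc(math.sqrt(c / 2.0))

def two_sided_normal_tail(z: float) -> float:
    """P(Z < -z or Z > z) for a standard Normal Z."""
    return math.erfc(z / math.sqrt(2.0))

print(round(two_sided_normal_tail(1.96), 4))   # P(|Z| > 1.96)
print(round(chi2_1_upper_tail(1.96 ** 2), 4))  # P(chi2_1 > 1.96^2), same value
print(round(chi2_1_upper_tail(3.84), 4))       # 3.84 is 1.96^2 rounded
```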
More generally, consider a frequency table for k categories:

| categories | : | A1 | A2 | ... | Ak |
|---|---|---|---|---|---|
| observed frequencies | : | o1 | o2 | ... | ok |
| expected frequencies | : | e1 | e2 | ... | ek |
To see how well the data (observed frequencies) correspond to the hypothesis (and hence to the expected frequencies), calculate
$$\chi^2 = \sum_{i=1}^{k} \frac{(o_i - e_i)^2}{e_i}$$
If the data correspond closely to the hypothesis, then (oi - ei) will be small for every category, so χ² will be small.
If the data are not consistent with the hypothesis, some or all of the (oi - ei) values will be large, so χ² will be large.
To determine how large χ² could be just by chance if the hypothesis were true, we need to know its sampling distribution. This is (approximately) the chi-squared distribution with (k - 1) degrees of freedom. So for k = 3 categories, use 3 - 1 = 2 degrees of freedom.
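The χ² statistic and its k - 1 degrees of freedom translate directly into code. In this sketch the function name is hypothetical, and the data are the die frequencies from the example that follows (expected frequency 10 in every category).

```python
def chi_squared_statistic(observed, expected):
    """Return (chi2, df): the sum of (o_i - e_i)^2 / e_i, and k - 1 degrees of freedom."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of categories")
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1  # k categories give k - 1 degrees of freedom
    return chi2, df

observed = [12, 7, 10, 17, 5, 9]
expected = [10] * 6
chi2, df = chi_squared_statistic(observed, expected)
print(round(chi2, 2), df)
```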
Table 3 gives probabilities of observing a value of χ² greater than various tabulated values. Its use is illustrated in the following example.
| Number of spots, xi : | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| Observed frequency, oi : | 12 | 7 | 10 | 17 | 5 | 9 |
Null hypothesis: the die is fair, so the probability of each value is 1/6.
If this is true, for n = 60 tosses the expected frequency for each xi is 60 × 1/6 = 10.
| spots xi : | 1 | 2 | 3 | 4 | 5 | 6 | Total |
|---|---|---|---|---|---|---|---|
| observed freq. oi : | 12 | 7 | 10 | 17 | 5 | 9 | 60 |
| expected freq. ei : | 10 | 10 | 10 | 10 | 10 | 10 | 60 |
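Here
$$\chi^2 = \frac{(12-10)^2 + (7-10)^2 + (10-10)^2 + (17-10)^2 + (5-10)^2 + (9-10)^2}{10} = \frac{4 + 9 + 0 + 49 + 25 + 1}{10} = 8.8$$
If the die is fair, χ² will be small. If the die is not fair, χ² will be large. How do you decide whether χ² = 8.8 is large or small?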
There are k = 6 categories so use the chi-squared distribution with (k-1) = 5 degrees of freedom.
Large values of χ² provide evidence against the hypothesis, so "extreme" values are those greater than or equal to the calculated χ² value. Hence
p-value = P(χ²₅ ≥ 8.8).
From the 5th row of Table 3, the required probability is between 0.1 and 0.15, so p-value > 0.1. Since this p-value is not small, we do not reject H0.
We conclude the data are consistent with the hypothesis that the die is fair.
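Instead of bracketing the p-value between two table entries, it can be computed directly. For an integer number of degrees of freedom the chi-squared upper tail has a closed form (a finite sum for even df; an erfc term plus a finite sum for odd df), so the standard library is enough. The function name `chi2_sf` is hypothetical; if SciPy is available, `scipy.stats.chi2.sf(8.8, 5)` should give the same value.

```python
import math

def chi2_sf(x: float, df: int) -> float:
    """Upper tail P(chi-squared with df degrees of freedom >= x), for integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= (x / 2) / j
            total += term
        return math.exp(-x / 2) * total
    # Odd df: P(chi2_1 >= x) = erfc(sqrt(x/2)), plus
    # 2 * phi(sqrt(x)) * sum_{j=1}^{(df-1)/2} x^((2j-1)/2) / (1*3*...*(2j-1))
    phi = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    term, series = math.sqrt(x), 0.0
    for j in range(1, (df - 1) // 2 + 1):
        if j > 1:
            term *= x / (2 * j - 1)
        series += term
    return math.erfc(math.sqrt(x / 2)) + 2 * phi * series

# Die example: 60 tosses, expected frequency 10 for each face
observed = [12, 7, 10, 17, 5, 9]
chi2 = sum((o - 10) ** 2 / 10 for o in observed)
p_value = chi2_sf(chi2, df=len(observed) - 1)
print(round(chi2, 2), round(p_value, 3))
```

The computed p-value should fall between 0.10 and 0.15, in agreement with the Table 3 reading.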
Large χ² values show that the data are inconsistent with H0, and if the p-value is very small you would reject H0.
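The chi-squared distribution is only a good approximation to the distribution of χ² when the expected frequencies are not too small (usually every ei ≥ 5). If the ei's are too small, you can combine adjacent categories until the expected frequencies are ≥ 5.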
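Combining adjacent categories can be automated. The sketch below uses one simple rule, assumed here rather than taken from the text: scan the categories left to right, add each one to a running group until that group's expected frequency reaches 5, and fold any short leftover tail into the last group. The function name and the frequencies are hypothetical, for illustration only. After combining, the degrees of freedom become (number of combined categories - 1).

```python
def combine_adjacent(observed, expected, min_expected=5.0):
    """Merge neighbouring categories, left to right, until each group's expected
    frequency is at least min_expected. Totals are preserved."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    obs_out, exp_out = [], []
    o_acc, e_acc = 0, 0.0
    for o, e in zip(observed, expected):
        o_acc += o
        e_acc += e
        if e_acc >= min_expected:
            # Current group is large enough: close it and start a new one
            obs_out.append(o_acc)
            exp_out.append(e_acc)
            o_acc, e_acc = 0, 0.0
    if o_acc or e_acc:
        if exp_out:
            # Leftover tail is too small on its own: fold it into the last group
            obs_out[-1] += o_acc
            exp_out[-1] += e_acc
        else:
            # No group reached min_expected: everything becomes one group
            obs_out.append(o_acc)
            exp_out.append(e_acc)
    return obs_out, exp_out

# Hypothetical frequencies (not from the text) with some small expected values
obs_c, exp_c = combine_adjacent([1, 4, 10, 3, 2], [2.0, 3.0, 9.0, 4.0, 2.0])
print(obs_c, exp_c, "df =", len(obs_c) - 1)  # [5, 10, 5] [5.0, 9.0, 6.0] df = 2
```

Merging only neighbouring categories makes most sense when the categories have a natural order, as with counts.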