Surfstat.australia: an online text in introductory Statistics

STATISTICAL INFERENCE

INFERENCE FOR COUNT DATA

Hypothesis Testing for Categorical Data

Example - Test-tube babies in Australia

Six of the first 7 "test-tube" babies born in Australia were girls. This led to speculation that in-vitro fertilization (IVF) somehow affected the sex of the baby.

For these data, test the null hypothesis

H0 : P(girl) = ½

against the alternative hypothesis

H1 : P(girl) ≠ ½

Let X = number of girls. To test the hypothesis, firstly assume the null hypothesis is true and then calculate the p-value associated with the sample value obtained.

i.e. assume X ~ Binomial(7, ½) (n = 7 babies, P(girl) = ½ under H0)

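Hence E(X) = 7 × ½ = 3.5. The observed value X = 6 lies 2.5 above this expected value, so the outcomes as extreme as 6 or more extreme, in either direction, are those at least 2.5 away from 3.5: X = 6 or X = 7 (above) and X = 1 or X = 0 (below). Since each P(X = k) = C(7, k)/2⁷ = C(7, k)/128,

p-value = P(X = 0, 1, 6 or 7) = (1 + 7 + 7 + 1)/128 = 16/128 = 0.125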

As the p-value is greater than 0.05, the result is not statistically significant, so we conclude that the data are consistent with

H0 : P(girl) = P(boy) = 1/2
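The exact p-value above can be checked with a short Python sketch that uses only the standard library. It follows the rule used in the text: an outcome counts as "at least as extreme" when it lies at least as far from E(X) = np as the observed value. The function name `two_sided_binomial_p` is our own, chosen for illustration; it is not taken from any particular package.

```python
from math import comb

def two_sided_binomial_p(x: int, n: int, p: float = 0.5) -> float:
    """Two-sided p-value for an observed count x when X ~ Binomial(n, p).

    Adds P(X = k) for every k at least as far from the mean n*p as x is,
    i.e. every outcome "as extreme or more extreme" in either direction.
    """
    mean = n * p
    distance = abs(x - mean)
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n + 1)
        if abs(k - mean) >= distance
    )

# Test-tube babies: 6 girls out of 7 under H0: P(girl) = 1/2
print(two_sided_binomial_p(6, 7))  # 0.125, i.e. 16/128
```

With p = ½ the binomial distribution is symmetric about n/2, so this "distance from the mean" rule picks out the same outcomes as taking both tails.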

Example - 3 boys out of 21 babies

Suppose only 3 of the first 21 babies had been boys. Would that have suggested that the sex ratio for IVF was different from 1:1?

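Let X = number of girls, so the observed value is X = 21 − 3 = 18. Assume X ~ Binomial(21, ½), so E(X) = 10.5. The observed value 18 lies 7.5 above E(X), and the value equally far below is 3.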

In this case

p-value = P(X ≤ 3 or X ≥ 18)

≈ 0.0015 (by calculator)

This is statistically significant (because the p-value is much less than 0.05) so we reject H0 and conclude the sex ratio differed from 1:1.

Notice that the observed proportion of boys, 1/7, was the same for both the above examples. The p-value was affected by the total number, n, of babies or equivalently, by the expected numbers of boys and girls, np and n(1-p).
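To see the effect of n described above, the following sketch computes the two-sided p-values for both examples, which share the same observed proportion of boys (1/7). It relies on the symmetry of Binomial(n, ½): the p-value is twice the upper tail P(X ≥ observed girls), capped at 1. The helper name `two_sided_p_half` is hypothetical, and it assumes the observed count of girls is at least n/2.

```python
from math import comb

def two_sided_p_half(girls: int, n: int) -> float:
    """Two-sided p-value for X ~ Binomial(n, 1/2) with observed X = girls.

    With p = 1/2 the distribution is symmetric, so the p-value is twice the
    upper tail P(X >= girls), capped at 1. Assumes girls >= n/2.
    """
    upper_tail = sum(comb(n, k) for k in range(girls, n + 1)) / 2**n
    return min(1.0, 2 * upper_tail)

# Same observed proportion of boys (1/7), different n
for boys, n in [(1, 7), (3, 21)]:
    girls = n - boys
    print(f"n = {n:2d}: {boys}/{n} boys, p-value = {two_sided_p_half(girls, n):.4f}")
# n =  7: 1/7 boys, p-value = 0.1250
# n = 21: 3/21 boys, p-value = 0.0015
```

The tail sum uses exact integer binomial coefficients before the single division by 2ⁿ, so the only rounding is in that last step.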

Example - 50 coin tosses

Suppose in the first 50 tosses of a coin there were 20 heads and 30 tails. Would this suggest that the coin was not balanced?

Let X = number of heads. Assume X ~ Binomial (50,½).

In this case, the expected number of heads, assuming the coin is balanced, is 50 × ½ = 25. The p-value is the probability of getting the observed value of 20 or a more extreme value in either direction, i.e. X ≤ 20 heads or X ≥ 30 heads (30 is as far above 25 as 20 is below it).

p-value = P(X ≤ 20 or X ≥ 30), which is tedious to calculate directly.

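Instead we can use the Normal approximation to the binomial distribution, since n is large,

i.e. if X ~ Binomial(n, p) then approximately X ~ Normal(np, npq), where q = 1 − p.

For n = 50 and p = ½, np = 25, npq = 12.5 and

$$Z = \frac{X - np}{\sqrt{npq}} = \frac{X - 25}{\sqrt{12.5}}$$

is approximately Normal(0, 1).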

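p-value = P(X ≤ 20 or X ≥ 30), and using the continuity correction (X ≤ 20 becomes X < 20.5, and X ≥ 30 becomes X > 29.5)

$$= P\left(Z < \frac{20.5 - 25}{\sqrt{12.5}} \text{ or } Z > \frac{29.5 - 25}{\sqrt{12.5}}\right)$$

≈ P(Z < −1.273 or Z > 1.273)

= 2 × (1 − 0.898)

≈ 0.20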

As this p-value is not statistically significant we conclude that the data are consistent with the hypothesis that P(head) = ½ i.e. that the coin may be balanced.
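A short Python sketch (standard library only) reproduces the normal-approximation calculation with the continuity correction. For comparison it also computes the exact binomial p-value that is tedious by hand. The Normal cumulative probability is built from `math.erf`. The variable names are our own, and the exact calculation as written applies only to p = ½.

```python
from math import comb, erf, sqrt

def normal_cdf(z: float) -> float:
    """Standard Normal cumulative probability P(Z <= z)."""
    return 0.5 * (1 + erf(z / sqrt(2)))

n, p, heads = 50, 0.5, 20
mean, sd = n * p, sqrt(n * p * (1 - p))   # np = 25, sqrt(npq) = sqrt(12.5)

# Continuity correction: P(X <= 20) ~ P(Z <= (20.5 - 25)/sd) and
# P(X >= 30) ~ P(Z >= (29.5 - 25)/sd); the two tails are equal by symmetry.
z = (heads + 0.5 - mean) / sd             # about -1.273
approx_p = 2 * normal_cdf(z)

# Exact binomial p-value: with p = 1/2, P(X >= 30) = P(X <= 20)
exact_p = 2 * sum(comb(n, k) for k in range(heads + 1)) / 2**n

print(f"z = {z:.3f}, normal approx p = {approx_p:.3f}, exact p = {exact_p:.3f}")
```

Both values come out close to the 0.20 found above, which suggests the continuity-corrected approximation works well here.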

Testing more than 2 categories

We want to generalize this test to more than two categories. The following approach to the two category case is equivalent to the method used in the last example but it can be extended to more categories.

Categories:              Heads       Tails
Observed frequencies:    o₁ = x      o₂ = n − x
Expected frequencies:    e₁ = np     e₂ = n(1 − p)

Calculate

$$\chi^2 = \frac{(o_1 - e_1)^2}{e_1} + \frac{(o_2 - e_2)^2}{e_2}$$

χ is the Greek letter chi (pronounced ki).

Remarks

  1. This takes into consideration differences between the observed frequencies for the categories and the expected frequencies if the null hypothesis were true.
  2. However it ignores the direction of the differences (because they are squared) so it corresponds to a two-tailed test.
  3. It adjusts for the sizes of the expected values - as suggested by the first two examples in this section.
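  4. In the two-category case this method is identical to the normal approximation to the binomial distribution (without the continuity correction). Writing q = 1 − p, we have o₂ − e₂ = (n − x) − nq = −(x − np), so

    $$\chi^2 = \frac{(x - np)^2}{np} + \frac{(x - np)^2}{nq} = (x - np)^2 \, \frac{q + p}{npq} = \frac{(x - np)^2}{npq} = Z^2$$

    Hence, when there are just two categories and the total frequency is constrained to sum to a fixed number n (so that there is only one r.v., X), χ² = Z².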
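The identity χ² = Z² in remark 4 can be checked numerically on the coin data (20 heads in 50 tosses). This sketch uses Z without the continuity correction, since the identity holds for the uncorrected statistic. The helper `chi_squared` is a hypothetical name for the sum of (o − e)²/e.

```python
from math import isclose, sqrt

def chi_squared(observed, expected):
    """Sum over categories of (o - e)^2 / e."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Coin example: x = 20 heads in n = 50 tosses, H0: p = 1/2
n, p, x = 50, 0.5, 20
q = 1 - p
observed = [x, n - x]          # o1 = x, o2 = n - x
expected = [n * p, n * q]      # e1 = np, e2 = nq

chi2 = chi_squared(observed, expected)
z = (x - n * p) / sqrt(n * p * q)   # Z without the continuity correction

print(f"chi2 = {chi2:.4f}, Z^2 = {z ** 2:.4f}")   # both 2.0000
print(isclose(chi2, z ** 2))                    # True
```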

P-values

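To calculate p-values we need the probability distribution of χ² when H0 is true.

In the two-category case χ² = Z², and since Z is approximately Normal(0, 1), χ² has approximately the chi-squared distribution with 1 degree of freedom, written χ²₁ (see Table 3).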

e.g. P(Z < - 1.96 or Z > 1.96) = 0.05

This is equivalent to

P(Z² > 1.96²)

= P(Z² > 3.84)

= P(χ²₁ > 3.84)

= 0.05 from Table 3 (first row)
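The equivalence above can be checked with Python's standard library. Since P(|Z| > z) = erfc(z/√2), the upper tail of χ²₁ at x is P(|Z| > √x) = erfc(√(x/2)). The function name below is our own.

```python
from math import erfc, sqrt

def chi2_1_upper_tail(x: float) -> float:
    """P(chi-squared with 1 df > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))."""
    return erfc(sqrt(x / 2))

print(f"{chi2_1_upper_tail(1.96 ** 2):.4f}")  # same as P(Z < -1.96 or Z > 1.96): 0.0500
print(f"{chi2_1_upper_tail(3.84):.4f}")       # 0.0500
```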

More generally consider a frequency table for k categories

Categories:              A₁    A₂    ...    Aₖ
Observed frequencies:    o₁    o₂    ...    oₖ
Expected frequencies:    e₁    e₂    ...    eₖ

Let e₁, e₂, ..., eₖ be the expected frequencies predicted by some (null) hypothesis or model.

To see how well the data (observed frequencies) correspond to the hypothesis (and hence the expected frequencies), calculate

$$\chi^2 = \sum_{i=1}^{k} \frac{(o_i - e_i)^2}{e_i}$$

If the data correspond closely to the hypothesis then (oᵢ − eᵢ) will be small for every category, so χ² will be small.

If the data are not consistent with the hypothesis, some or all of the (oᵢ − eᵢ) values will be large, so χ² will be large.

To determine how large χ² could be (just by chance) if the hypothesis were true, we need to know its sampling distribution. This is approximately the chi-squared distribution with (k − 1) degrees of freedom. So for k = 3 categories, use 3 − 1 = 2 degrees of freedom.

Table 3 gives probabilities of observing a value of χ² greater than various tabulated values. Its use is illustrated in the following example.

Example - Testing whether a die is fair (60 throws)

To test whether a die is fair, it is tossed 60 times. The results are as follows.
Number of spots, xᵢ:      1    2    3    4    5    6
Observed frequency, oᵢ:  12    7   10   17    5    9
Do these data suggest the die is 'fair'?

Null Hypothesis: the die is fair, so the probability for each value is 1/6.

If this is true, for n = 60 tosses the expected frequency for each xᵢ is 60 × 1/6 = 10.

Spots, xᵢ:               1    2    3    4    5    6    Total
Observed freq., oᵢ:     12    7   10   17    5    9       60
Expected freq., eᵢ:     10   10   10   10   10   10       60

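$$\chi^2 = \sum_{i=1}^{6} \frac{(o_i - e_i)^2}{e_i} = \frac{2^2 + (-3)^2 + 0^2 + 7^2 + (-5)^2 + (-1)^2}{10} = \frac{4 + 9 + 0 + 49 + 25 + 1}{10} = 8.8$$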
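The arithmetic for the die example can be reproduced in a few lines of Python. Listing each category's contribution (oᵢ − eᵢ)²/eᵢ shows that the face with 4 spots (17 observed against 10 expected) accounts for over half of the total. The variable names are our own.

```python
# Die example: observed counts for 1..6 spots in 60 throws
observed = [12, 7, 10, 17, 5, 9]
n, k = sum(observed), len(observed)
expected = [n / k] * k              # H0 (fair die): 60 * 1/6 = 10 per face

# Each category's contribution (o - e)^2 / e to the chi-squared statistic
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi2 = sum(contributions)

print([round(c, 1) for c in contributions])  # [0.4, 0.9, 0.0, 4.9, 2.5, 0.1]
print(round(chi2, 1))                         # 8.8
```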

If the die is fair, χ² will be small. If the die is not fair, χ² will be large. How do you decide whether χ² = 8.8 is large or small?

There are k = 6 categories so use the chi-squared distribution with (k − 1) = 5 degrees of freedom.

Large values of χ² provide evidence against the hypothesis, so "extreme" values are those greater than or equal to the calculated χ² value. Hence

p-value = P(χ²₅ ≥ 8.8)

From the 5th row of Table 3, the required probability is between 0.1 and 0.15, so p-value > 0.1. Since this p-value is not small, we do not reject H0.

We conclude the data are consistent with the hypothesis that the die is fair.
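Table 3 only brackets this p-value (between 0.1 and 0.15). As a cross-check, the sketch below computes upper-tail chi-squared probabilities for whole-number degrees of freedom using the standard closed-form series for integer df. It is not necessarily how Table 3 was produced, and the function name `chi2_upper_tail` is our own. If SciPy is available, `scipy.stats.chi2.sf(x, df)` should give the same values.

```python
from math import erfc, exp, pi, sqrt

def chi2_upper_tail(x: float, df: int) -> float:
    """P(chi-squared with df degrees of freedom > x), for integer df >= 1,
    using the closed-form series for integer degrees of freedom."""
    if x <= 0:
        return 1.0
    half = x / 2
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return exp(-half) * total
    # Odd df: start from the 1-df tail erfc(sqrt(x/2)), then add
    # sqrt(2/pi) * exp(-x/2) * sum_{j=1}^{(df-1)/2} x^((2j-1)/2) / (1*3*...*(2j-1))
    total = erfc(sqrt(half))
    term, series = sqrt(x), 0.0
    for j in range(1, (df - 1) // 2 + 1):
        if j > 1:
            term *= x / (2 * j - 1)
        series += term
    return total + sqrt(2 / pi) * exp(-half) * series

p_value = chi2_upper_tail(8.8, df=5)   # die example: k - 1 = 5 degrees of freedom
print(f"p-value = {p_value:.3f}")       # p-value = 0.117
```

The result lies inside the 0.1 to 0.15 bracket read from Table 3, so the conclusion that the data are consistent with a fair die is unchanged.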

Comments

  1. Small χ² values show that the data are consistent with H0.

    Large χ² values show that the data are inconsistent with H0, and if the p-value is very small you would reject H0.

  2. The sampling distribution of χ² is approximately the chi-squared distribution provided none of the expected values is too small (it is usual to require eᵢ ≥ 5). If some eᵢ are too small, you can combine adjacent categories until all expected frequencies are ≥ 5.
  3. If there are k categories and the expected frequencies are completely specified by the hypothesis H0 (i.e. no parameters need to be estimated from the data), then χ² has (k − 1) degrees of freedom.
  4. If the expected frequencies are not completely specified by H0 and m parameters have to be estimated from the data, then χ² has (k − m − 1) degrees of freedom.
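Comment 2 leaves the choice of which adjacent categories to combine to the analyst. One simple rule, sketched below as an assumption rather than a prescribed procedure, is to sweep from left to right, pooling categories until the pooled expected frequency reaches 5, and to fold any small leftover group into the group before it. The function name and the frequencies in the usage example are hypothetical, chosen only to exercise the code.

```python
def combine_small_categories(observed, expected, minimum=5.0):
    """Pool adjacent categories, left to right, until every pooled expected
    frequency is at least `minimum`. A final group that is still too small
    is merged into the group before it. Returns (pooled_obs, pooled_exp)."""
    pooled_obs, pooled_exp = [], []
    acc_o, acc_e = 0, 0.0
    for o, e in zip(observed, expected):
        acc_o += o
        acc_e += e
        if acc_e >= minimum:
            pooled_obs.append(acc_o)
            pooled_exp.append(acc_e)
            acc_o, acc_e = 0, 0.0
    if acc_o or acc_e:              # leftover categories below the minimum
        if pooled_exp:
            pooled_obs[-1] += acc_o
            pooled_exp[-1] += acc_e
        else:
            pooled_obs.append(acc_o)
            pooled_exp.append(acc_e)
    return pooled_obs, pooled_exp

# Hypothetical frequencies for 7 ordered categories (illustration only)
obs, exp_freq = combine_small_categories([1, 4, 9, 11, 8, 3, 1],
                                         [2.0, 4.0, 8.0, 10.0, 7.0, 4.0, 2.0])
print(obs, exp_freq)        # [5, 9, 11, 8, 4] [6.0, 8.0, 10.0, 7.0, 6.0]
print(len(exp_freq) - 1)    # degrees of freedom if no parameters are estimated: 4
```

After pooling, k in comments 3 and 4 is the number of pooled categories, so the degrees of freedom drop accordingly.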

