In this section we will consider ways of presenting data on two or more variables in tables.
The table below shows the number of applicants for admission to the University of California at Berkeley, classified by gender and major (equivalent to Faculty in Australia).
How can you discover and present the main "messages" in the data?
| Major | Men | Women | Total |
|---|---|---|---|
| A | 825 | 108 | 933 |
| B | 560 | 25 | 585 |
| C | 325 | 593 | 918 |
| D | 417 | 375 | 792 |
| E | 191 | 393 | 584 |
| F | 373 | 341 | 714 |
| TOTAL | 2691 | 1835 | 4526 |
To compare preferences for major (the outcome variable) between men and women (the predictor variable), we might calculate the percentage of each gender's applicants who applied to each major (i.e. column percentages). To do this, multiply each number in the "Men" column by 100/2691 and each number in the "Women" column by 100/1835. For example, for men in major A, 825 × 100/2691 is approximately 31.
| Major | Men | Women |
|---|---|---|
| A | 31 | 6 |
| B | 21 | 1 |
| C | 12 | 32 |
| D | 15 | 20 |
| E | 7 | 21 |
| F | 14 | 19 |
| TOTAL | 100 | 99* |
(*) The women's percentages sum to 99 rather than 100 because of rounding.
Conclusion - men preferred majors A or B, while women preferred majors C, D, E or F.
You could show the column percentages on a bar graph or a dot chart.
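The column-percentage calculation can also be sketched in a few lines of Python. This is only an illustration and not part of the original notes; the function name `column_percentages` and the dictionary layout are assumptions made for the example.

```python
# Applicant counts by major, copied from the table above: (men, women).
counts = {
    "A": (825, 108),
    "B": (560, 25),
    "C": (325, 593),
    "D": (417, 375),
    "E": (191, 393),
    "F": (373, 341),
}

def column_percentages(table):
    """Return each cell as a percentage of its column (gender) total."""
    n_cols = len(next(iter(table.values())))
    totals = [sum(row[j] for row in table.values()) for j in range(n_cols)]
    return {
        label: tuple(100 * row[j] / totals[j] for j in range(n_cols))
        for label, row in table.items()
    }

col_pct = column_percentages(counts)
for major, (men, women) in col_pct.items():
    print(f"{major}: men {men:.0f}%, women {women:.0f}%")
```

Each column of unrounded percentages sums to 100; rounding to whole numbers is what produces the total of 99 in the "Women" column.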
To compare gender distributions (outcome) across majors (predictor), calculate the percentages of men and women within each major (i.e. row percentages). For example, for men in major A, 825 × 100/933 is approximately 88.
| Major | Men | Women | Total |
|---|---|---|---|
| A | 88 | 12 | 100 |
| B | 96 | 4 | 100 |
| C | 35 | 65 | 100 |
| D | 53 | 47 | 100 |
| E | 33 | 67 | 100 |
| F | 52 | 48 | 100 |
Conclusion - Majors A and B were predominantly men, C and E were predominantly women, and majors D and F were approximately balanced between men and women.
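Row percentages divide by the row total instead of the column total. The following Python sketch illustrates this for the same applicant counts; the name `row_percentages` is an assumption made for the example, not something from the original notes.

```python
# Applicant counts by major, copied from the first table: (men, women).
counts = {
    "A": (825, 108),
    "B": (560, 25),
    "C": (325, 593),
    "D": (417, 375),
    "E": (191, 393),
    "F": (373, 341),
}

def row_percentages(table):
    """Return each cell as a percentage of its row (major) total."""
    return {
        label: tuple(100 * value / sum(row) for value in row)
        for label, row in table.items()
    }

row_pct = row_percentages(counts)
for major, (men, women) in row_pct.items():
    print(f"{major}: men {men:.0f}%, women {women:.0f}%")
```

Because each row is divided by its own total, every row sums to 100 regardless of how many people applied to that major.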
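The variable OWNER takes 3 values, coded 1, 2 and 3, and the variable SIZE is also coded 1 to 3. The codes for the 20 restaurants in the data set are: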
| Owner | Size | Owner | Size |
|---|---|---|---|
| 3 | 3 | 1 | 2 |
| 1 | 2 | 3 | 2 |
| 1 | 1 | 3 | 2 |
| 3 | 1 | 3 | 3 |
| 1 | 1 | 1 | 2 |
| 3 | 1 | 3 | 3 |
| 1 | 2 | 3 | 1 |
| 1 | 1 | 3 | 2 |
| 2 | 1 | 3 | 1 |
| 2 | 3 | 2 | 1 |
For example, the first restaurant in the data set was owned by a corporation (code 3) and had more than 20 employees (code 3).
The data were entered into a MINITAB worksheet by the following commands:
MTB> read c1 c2
DATA> 3 3
DATA> 1 2
DATA> 1 1
DATA> 3 1
....
....
....
DATA> end
20 ROWS READ
MTB> name c1 'owner' c2 'size'
The command TABLE C1 C2 will result in a table in which the rows are OWNER, the columns are SIZE and the numbers in the cells of the table are the counts (frequencies).
A table is often easier to interpret if the counts are converted to percentages.
The subcommand COLPERCENT calculates column percentages and the subcommand ROWPERCENT calculates row percentages.
MTB> table c1 c2
ROWS: owner COLUMNS: size
| owner | 1 | 2 | 3 | ALL |
|---|---|---|---|---|
| 1 | 3 | 4 | 0 | 7 |
| 2 | 2 | 0 | 1 | 3 |
| 3 | 4 | 3 | 3 | 10 |
| ALL | 9 | 7 | 4 | 20 |
CELL CONTENTS --
COUNT
MTB> table c1 c2;
SUBC> colpercent.
ROWS: owner COLUMNS: size
| owner | 1 | 2 | 3 | ALL |
|---|---|---|---|---|
| 1 | 33.33 | 57.14 | -- | 35.00 |
| 2 | 22.22 | -- | 25.00 | 15.00 |
| 3 | 44.44 | 42.86 | 75.00 | 50.00 |
| ALL | 100.00 | 100.00 | 100.00 | 100.00 |
CELL CONTENTS --
% OF COL
MTB> table c1 c2;
SUBC> rowpercent.
ROWS: owner COLUMNS: size
| owner | 1 | 2 | 3 | ALL |
|---|---|---|---|---|
| 1 | 42.86 | 57.14 | -- | 100.00 |
| 2 | 66.67 | -- | 33.33 | 100.00 |
| 3 | 40.00 | 30.00 | 30.00 | 100.00 |
| ALL | 45.00 | 35.00 | 20.00 | 100.00 |
CELL CONTENTS --
% OF ROW
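For readers without MINITAB, the same cross-tabulation can be sketched in Python using only the standard library. This is an illustrative equivalent, not the original method: the function name `crosstab` and its `kind` argument are assumptions made for the example, and the 20 (owner, size) pairs are copied from the data table above.

```python
from collections import Counter

# (owner, size) codes for the 20 restaurants, copied from the data table.
restaurants = [
    (3, 3), (1, 2), (1, 1), (3, 1), (1, 1), (3, 1), (1, 2), (1, 1), (2, 1), (2, 3),
    (1, 2), (3, 2), (3, 2), (3, 3), (1, 2), (3, 3), (3, 1), (3, 2), (3, 1), (2, 1),
]

cell_counts = Counter(restaurants)  # frequency of each (owner, size) pair
owners = sorted({owner for owner, _ in restaurants})
sizes = sorted({size for _, size in restaurants})

def crosstab(kind="count"):
    """Return {owner: [one cell per size]} as counts, row % or column %."""
    row_totals = {o: sum(cell_counts[(o, s)] for s in sizes) for o in owners}
    col_totals = {s: sum(cell_counts[(o, s)] for o in owners) for s in sizes}
    table = {}
    for o in owners:
        cells = []
        for s in sizes:
            n = cell_counts[(o, s)]  # Counter gives 0 for an empty cell
            if kind == "row":
                cells.append(100 * n / row_totals[o])
            elif kind == "col":
                cells.append(100 * n / col_totals[s])
            else:
                cells.append(n)
        table[o] = cells
    return table

for kind in ("count", "col", "row"):
    print(kind, crosstab(kind))
```

Unlike the MINITAB percentage tables, which print `--` for an empty cell, this sketch reports an empty cell as 0.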
Which category of OWNER had the highest frequency? What percentage of all restaurants were in that category? Which category of SIZE had the highest frequency?
The table below shows the numbers of deaths in Australia in 1995 for people aged 15-24 years (Source: Australian Bureau of Statistics, 3303.0, pp.33-35):
| Cause of death | Males | Females | Total |
|---|---|---|---|
| Motor vehicle accident | 448 | 146 | 594 |
| Suicide | 350 | 84 | 434 |
| Other accident | 257 | 74 | 331 |
| Malignant cancer | 86 | 50 | 136 |
| Other diseases | 267 | 153 | 420 |
| Total | 1,408 | 507 | 1,915 |
Each person who died was categorised by sex (M or F) and by cause of death. A cross-classified table is sometimes called a contingency table.
Do males and females in this age group die from the same causes?
To compare patterns of cause of death you need to consider relative frequencies or percentages because the total numbers of deaths are not the same for males and females.
| | Males | Females | Total |
|---|---|---|---|
| Numbers | 1408 | 507 | 1915 |
| Relative Frequency | 0.74 | 0.26 | 1.00 |
(e.g. 1408/1915 is approximately 0.74)
Conclusion - In this age group there are nearly 3 times as many male deaths as female deaths (1408/507 is approximately 2.8).
| Cause | Number | Relative frequency |
|---|---|---|
| Motor vehicle accident | 594 | 0.31 |
| Suicide | 434 | 0.23 |
| Other accidents | 331 | 0.17 |
| Malignant neoplasms | 136 | 0.07 |
| Other diseases | 420 | 0.22 |
| Total | 1915 | 1.00 |
This table is obtained by collapsing the original table over the factor 'sex'. Conclusion - The main causes of death in this age group are motor vehicle accidents, which cause 31% of all deaths, and suicides, which account for 23%.
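Collapsing a two-way table over one factor simply means adding across that factor. The Python sketch below illustrates this for the deaths table; the variable names are assumptions chosen for the example.

```python
# Deaths in Australia in 1995, ages 15-24, copied from the table: (males, females).
deaths = {
    "Motor vehicle accident": (448, 146),
    "Suicide": (350, 84),
    "Other accident": (257, 74),
    "Malignant cancer": (86, 50),
    "Other diseases": (267, 153),
}

# Collapse over sex: add the male and female counts for each cause.
by_cause = {cause: males + females for cause, (males, females) in deaths.items()}
grand_total = sum(by_cause.values())

# Relative frequency of each cause = count / grand total.
rel_freq = {cause: n / grand_total for cause, n in by_cause.items()}

# Collapse over cause instead: total deaths for each sex.
male_total = sum(males for males, _ in deaths.values())
female_total = sum(females for _, females in deaths.values())

for cause, n in by_cause.items():
    print(f"{cause}: {n} ({rel_freq[cause]:.2f})")
print(f"Males {male_total / grand_total:.2f}, females {female_total / grand_total:.2f}")
```

Collapsing over cause gives the male and female totals used earlier, and collapsing over sex gives the cause-of-death totals in the table above.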
| Cause | Males: No. | Males: % | Females: No. | Females: % |
|---|---|---|---|---|
| Motor vehicle accidents | 448 | 32 | 146 | 29 |
| Suicide | 350 | 25 | 84 | 17 |
| Other accidents | 257 | 18 | 74 | 15 |
| Cancer | 86 | 6 | 50 | 10 |
| Other diseases | 267 | 19 | 153 | 30 |
| Total | 1408 | 100% | 507 | 100% |
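Conclusion - Motor vehicle accidents were the leading single cause of death for both males and females in the age group 15-24 years, accounting for about 30% of deaths in each sex (for females, only the combined category "Other diseases" was slightly larger). Suicides accounted for a larger share of male deaths (25%) than of female deaths (17%).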
| Area | Neonatal deaths | Live births | Total births | Death rate (%) |
|---|---|---|---|---|
| Lake Macquarie | 24 | 2304 | 2328 | 1.03 |
| Newcastle | 26 | 1835 | 1861 | 1.40 |
| Maitland | 5 | 814 | 819 | 0.61 |
| Cessnock | 7 | 725 | 732 | 0.96 |
| Port Stephens | 10 | 631 | 641 | 1.56 |
| Muswellbrook | 2 | 295 | 297 | 0.67 |
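You cannot directly compare the numbers of deaths in each area because these depend on the number of births. Instead, convert the numbers of deaths to death rates, i.e. deaths as a percentage of total births, as in the last column of the table. For example, for Lake Macquarie the rate is 24/2328 × 100, which is approximately 1.03%.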
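The difference between comparing counts and comparing rates can be illustrated with a short Python sketch. It assumes, as the table's figures imply, that the death rate is neonatal deaths divided by total births, times 100; the variable names are chosen for the example only.

```python
# (neonatal deaths, total births) for each area, copied from the table.
areas = {
    "Lake Macquarie": (24, 2328),
    "Newcastle": (26, 1861),
    "Maitland": (5, 819),
    "Cessnock": (7, 732),
    "Port Stephens": (10, 641),
    "Muswellbrook": (2, 297),
}

# Death rate (%) = 100 * deaths / total births.
death_rate = {
    area: 100 * deaths / births for area, (deaths, births) in areas.items()
}

# Rank areas by raw count and by rate to see how the ordering changes.
most_deaths = max(areas, key=lambda area: areas[area][0])
ranked_by_rate = sorted(death_rate, key=death_rate.get, reverse=True)

print("Most deaths:", most_deaths)
for area in ranked_by_rate:
    print(f"{area}: {death_rate[area]:.2f}%")
```

Newcastle has the largest number of deaths, but Port Stephens has the highest death rate, which is why rates rather than counts should be compared.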