Surfstat.australia: an online text in introductory Statistics

# SUMMARISING AND PRESENTING DATA

## EXPLORING DATA IN TABLES

In this section we will consider ways of presenting data on two or more variables in tables.

#### Example - Number of applicants to UCB

Source: Freedman, Pisani & Purves, "Statistics", Norton 1978.

A table containing the number of applicants for admission to the University of California at Berkeley according to gender and major (equivalent to Faculty in Australia) is shown below.

How can you discover and present the main "messages" in the data?

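No. of applicants

| Major | Men | Women | Total |
|-------|----:|------:|------:|
| A     | 825 |   108 |   933 |
| B     | 560 |    25 |   585 |
| C     | 325 |   593 |   918 |
| D     | 417 |   375 |   792 |
| E     | 191 |   393 |   584 |
| F     | 373 |   341 |   714 |
| TOTAL | 2691 | 1835 |  4526 |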

To compare preferences for major (outcome variable) between men and women (predictor variable) we might calculate, for each gender, the percentage of that gender's applicants who applied to each major (i.e. column percentages). To do this, multiply each number in the "Men" column by 100/2691 and each number in the "Women" column by 100/1835.

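| Major | Men (%) | Women (%) |
|-------|--------:|----------:|
| A     |      31 |         6 |
| B     |      21 |         1 |
| C     |      12 |        32 |
| D     |      15 |        20 |
| E     |       7 |        21 |
| F     |      14 |        19 |
| TOTAL |     100 |       99* |

*Does not add to 100% due to rounding.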

Conclusion - men preferred majors A or B, while women preferred majors C, D, E or F.
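The column percentages can also be computed programmatically. The following is a minimal sketch in Python, which is not the package used in this text; the names `applicants` and `column_percentages` are made up for illustration.

```python
# Applicant counts from the table above: major -> (men, women).
applicants = {
    "A": (825, 108),
    "B": (560, 25),
    "C": (325, 593),
    "D": (417, 375),
    "E": (191, 393),
    "F": (373, 341),
}


def column_percentages(table):
    """Express each count as a percentage of its column total."""
    n_cols = len(next(iter(table.values())))
    totals = [sum(row[j] for row in table.values()) for j in range(n_cols)]
    return {
        key: tuple(100 * row[j] / totals[j] for j in range(n_cols))
        for key, row in table.items()
    }


col_pct = column_percentages(applicants)
for major, (men, women) in col_pct.items():
    print(f"{major}  men {men:5.1f}%  women {women:5.1f}%")
```

Keeping one decimal place in the printout makes it easier to see why the rounded "Women" column sums to 99 rather than 100.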

### Dot Chart

You could show the column %s on a bar graph or a dot chart.

To compare gender distributions (outcome) for majors (predictor), calculate the percentages of men and women for each major (i.e. row percentages).

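| Major | Men (%) | Women (%) | Total (%) |
|-------|--------:|----------:|----------:|
| A     |      88 |        12 |       100 |
| B     |      96 |         4 |       100 |
| C     |      35 |        65 |       100 |
| D     |      53 |        47 |       100 |
| E     |      33 |        67 |       100 |
| F     |      52 |        48 |       100 |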

Conclusion - Majors A and B were predominantly men, C and E were predominantly women, and majors D and F were approximately balanced between men and women.
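Row percentages divide each count by its row total instead of its column total. A short Python sketch, again with illustrative names (`applicants`, `row_percentages`) that do not come from the original text:

```python
# Applicant counts by major: major -> (men, women).
applicants = {
    "A": (825, 108),
    "B": (560, 25),
    "C": (325, 593),
    "D": (417, 375),
    "E": (191, 393),
    "F": (373, 341),
}


def row_percentages(table):
    """Express each count as a percentage of its row total."""
    return {
        key: tuple(100 * count / sum(row) for count in row)
        for key, row in table.items()
    }


row_pct = row_percentages(applicants)
for major, (men, women) in row_pct.items():
    print(f"{major}  men {men:.0f}%  women {women:.0f}%")
```

Each row now sums to 100%, so the rows can be compared with each other even though the majors attracted very different numbers of applicants.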

#### Example - Newcastle restaurant survey

A Newcastle Restaurant Survey was conducted in 1990. Restaurants were classified by type of ownership and size.

The variable OWNER takes 3 values:

1. sole proprietorship
2. partnership
3. corporation

The variable SIZE also takes 3 values:

1. fewer than 5 employees
2. between 5 and 20 employees
3. more than 20 employees

The raw data for 20 restaurants are as follows.

| Owner | Size | Owner | Size |
|------:|-----:|------:|-----:|
|     3 |    3 |     1 |    2 |
|     1 |    2 |     3 |    2 |
|     1 |    1 |     3 |    2 |
|     3 |    1 |     3 |    3 |
|     1 |    1 |     1 |    2 |
|     3 |    1 |     3 |    3 |
|     1 |    2 |     3 |    1 |
|     1 |    1 |     3 |    2 |
|     2 |    1 |     3 |    1 |
|     2 |    3 |     2 |    1 |

For example, the first restaurant in the data set was owned by a corporation (code 3) and had more than 20 employees (code 3).

The data were entered into a MINITAB worksheet by the following commands:

```
MTB> read c1 c2
DATA> 3 3
DATA> 1 2
DATA> 1 1
DATA> 3 1
....
....
....
DATA> end
MTB> name c1 'owner' c2 'size'
```

The command TABLE C1 C2 will result in a table in which the rows are OWNER, the columns are SIZE and the numbers in the cells of the table are the counts (frequencies).

A table is often easier to interpret if the counts are converted to percentages.

The subcommand COLPERCENT calculates column percentages and the subcommand ROWPERCENT calculates row percentages.

```
MTB> table c1 c2

 ROWS: owner     COLUMNS: size

            1        2        3      ALL

   1        3        4        0        7
   2        2        0        1        3
   3        4        3        3       10
 ALL        9        7        4       20

 CELL CONTENTS --
 COUNT

MTB> table c1 c2;
SUBC> colpercent.

 ROWS: owner     COLUMNS: size

            1        2        3      ALL

   1    33.33    57.14       --    35.00
   2    22.22       --    25.00    15.00
   3    44.44    42.86    75.00    50.00
 ALL   100.00   100.00   100.00   100.00

 CELL CONTENTS --
 % OF COL

MTB> table c1 c2;
SUBC> rowpercent.

 ROWS: owner     COLUMNS: size

            1        2        3      ALL

   1    42.86    57.14       --   100.00
   2    66.67       --    33.33   100.00
   3    40.00    30.00    30.00   100.00
 ALL    45.00    35.00    20.00   100.00

 CELL CONTENTS --
 % OF ROW
```
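For readers without MINITAB, the same cross-tabulation can be sketched in Python using only the standard library. This is an illustration rather than a substitute for the MINITAB session above; the variable names are invented, and the 20 (owner, size) pairs are typed in from the raw data table (their order does not affect the counts).

```python
from collections import Counter

# (owner, size) codes for the 20 restaurants in the raw data table.
restaurants = [
    (3, 3), (1, 2), (1, 1), (3, 1), (1, 1), (3, 1), (1, 2), (1, 1), (2, 1), (2, 3),
    (1, 2), (3, 2), (3, 2), (3, 3), (1, 2), (3, 3), (3, 1), (3, 2), (3, 1), (2, 1),
]

counts = Counter(restaurants)                       # cell counts keyed by (owner, size)
owner_totals = Counter(o for o, _ in restaurants)   # row totals
size_totals = Counter(s for _, s in restaurants)    # column totals

# Print the table of counts with owner as rows and size as columns.
print("owner  size1 size2 size3  ALL")
for owner in (1, 2, 3):
    cells = [counts[(owner, size)] for size in (1, 2, 3)]
    row = " ".join(f"{c:5d}" for c in cells)
    print(f"{owner:>5}  {row}  {owner_totals[owner]:3d}")

# Percentages for one cell: corporations (owner 3) with fewer than 5 employees (size 1).
col_pct = 100 * counts[(3, 1)] / size_totals[1]   # share of size-1 restaurants
row_pct = 100 * counts[(3, 1)] / owner_totals[3]  # share of corporations
print(f"column %: {col_pct:.2f}   row %: {row_pct:.2f}")
```

The same cell gives two different percentages depending on whether it is divided by its column total or its row total, which is the distinction between COLPERCENT and ROWPERCENT.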

Which category of OWNER had the highest frequency? What percentage of all restaurants were in that category? Which category of SIZE had the highest frequency?

### Contingency Tables

#### Example - Number of deaths in Australia (1989)

The table below shows the numbers of deaths in Australia in 1995 for people aged 15-24 years (Source: Australian Bureau of Statistics, 3303.0, pp.33-35):

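| Cause of death         | Males | Females | Total |
|------------------------|------:|--------:|------:|
| Motor vehicle accident |   448 |     146 |   594 |
| Suicide                |   350 |      84 |   434 |
| Other accident         |   257 |      74 |   331 |
| Malignant cancer       |    86 |      50 |   136 |
| Other diseases         |   267 |     153 |   420 |
| Total                  | 1,408 |     507 | 1,915 |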

Each person who died was categorised by sex (M or F) and by cause of death. A cross-classified table is sometimes called a contingency table.

Do males and females in this age group die from the same causes?

To compare patterns of cause of death you need to consider relative frequencies or percentages because the total numbers of deaths are not the same for males and females.

|                    | Males | Females | Totals |
|--------------------|------:|--------:|-------:|
| Numbers            |  1408 |     507 |   1915 |
| Relative Frequency |  0.74 |    0.26 |   1.00 |

(e.g. 1408/1915 is approximately 0.74)

Conclusion - In this age group there were about 3 times as many male deaths as female deaths.

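| Cause                  | Number | Relative frequency |
|------------------------|-------:|-------------------:|
| Motor vehicle accident |    594 |               0.31 |
| Suicide                |    434 |               0.23 |
| Other accidents        |    331 |               0.17 |
| Malignant neoplasms    |    136 |               0.07 |
| Other diseases         |    420 |               0.22 |
| Total                  |   1915 |               1.00 |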

This table is obtained by collapsing the original table over the factor 'sex'.

Conclusion - The main causes of death in this age group are motor vehicle accidents, which cause 31% of all deaths, and suicides, which account for 23%.
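Collapsing a contingency table over a factor simply means adding the counts across the levels of that factor. The sketch below, in Python with invented names (`deaths`, `by_cause`, `by_sex`, `relative_frequencies`), collapses the deaths table over sex and over cause, then converts each margin to relative frequencies.

```python
# Deaths by cause: cause -> (males, females), from the contingency table above.
deaths = {
    "Motor vehicle accident": (448, 146),
    "Suicide": (350, 84),
    "Other accident": (257, 74),
    "Malignant cancer": (86, 50),
    "Other diseases": (267, 153),
}

# Collapse over sex: add the male and female counts within each cause.
by_cause = {cause: males + females for cause, (males, females) in deaths.items()}

# Collapse over cause: add down each column.
by_sex = {
    "Males": sum(males for males, _ in deaths.values()),
    "Females": sum(females for _, females in deaths.values()),
}

grand_total = sum(by_cause.values())


def relative_frequencies(margin):
    """Divide each marginal count by the total of the margin."""
    total = sum(margin.values())
    return {key: count / total for key, count in margin.items()}


rel_cause = relative_frequencies(by_cause)
rel_sex = relative_frequencies(by_sex)

for cause, freq in rel_cause.items():
    print(f"{cause:<24} {by_cause[cause]:5d}  {freq:.2f}")
```

Both margins add up to the same grand total, which is a useful check when a table has been typed in by hand.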

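Conditional frequency distribution

| Cause                   | Males No. | Males % | Females No. | Females % |
|-------------------------|----------:|--------:|------------:|----------:|
| Motor vehicle accidents |       448 |      32 |         146 |        29 |
| Suicide                 |       350 |      25 |          84 |        17 |
| Other accidents         |       257 |      18 |          74 |        15 |
| Cancer                  |        86 |       6 |          50 |        10 |
| Other diseases          |       267 |      19 |         153 |        30 |
| Total                   |      1408 |    100% |         507 |      100% |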

Conclusion - Motor vehicle accidents were the major cause of death for both males and females in the age group 15-24 years, accounting for about 30% of deaths. Suicides were more common for males than for females.

### The need to consider group size

In order to obtain valid comparisons among groups it is necessary to consider the sizes of the groups and to report the results similarly for all groups.

#### Example - Infant deaths in 1989

Numbers of babies who died before or just after birth in the Hunter Region in 1989.

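| Area           | Neonatal deaths | Live births | Total births | Death rate (%) |
|----------------|----------------:|------------:|-------------:|---------------:|
| Lake Macquarie |              24 |        2304 |         2328 |           1.03 |
| Newcastle      |              26 |        1835 |         1861 |           1.40 |
| Maitland       |               5 |         814 |          819 |           0.61 |
| Cessnock       |               7 |         725 |          732 |           0.96 |
| Port Stephens  |              10 |         631 |          641 |           1.56 |
| Muswellbrook   |               2 |         295 |          297 |           0.67 |
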
Source: Hunter Health Statistics Unit

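You cannot directly compare the numbers of deaths in each area because these depend on the number of births. You need to convert the numbers of deaths to death rates, i.e. deaths as a percentage of total births (e.g. for Lake Macquarie, 100 x 24/2328 is approximately 1.03).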
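The conversion from counts to rates can be sketched in Python as follows. The names `births` and `death_rate` are illustrative, and the denominator (deaths plus live births) is an assumption inferred from the "Total births" column of the table rather than a definition stated in the text.

```python
# Area -> (neonatal deaths, live births), from the Hunter Region table.
births = {
    "Lake Macquarie": (24, 2304),
    "Newcastle": (26, 1835),
    "Maitland": (5, 814),
    "Cessnock": (7, 725),
    "Port Stephens": (10, 631),
    "Muswellbrook": (2, 295),
}


def death_rate(deaths, live_births):
    """Deaths as a percentage of total births (assumed to be deaths + live births)."""
    return 100 * deaths / (deaths + live_births)


rates = {area: death_rate(d, lb) for area, (d, lb) in births.items()}

# List areas from the highest rate to the lowest.
for area, rate in sorted(rates.items(), key=lambda item: item[1], reverse=True):
    print(f"{area:<15} {rate:.2f}%")
```

Sorting by rate gives a different ordering from sorting by the raw number of deaths: Newcastle had the most deaths, but Port Stephens had the highest death rate.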

### Rules for presentation of tables

#### Progress check

1. A column of numbers has total T. To convert to percentages you can
2. A contingency table classified by two factors A and B is collapsed over B. If A has three levels and B has four levels, how many cells are there in the marginal table, not counting the grand total?
3. In designing tables of statistical data, it is helpful
