Because qualitative data always have a limited number of alternative values, such variables are also described as discrete. All qualitative data are discrete, while some numeric data are discrete and some are continuous.

For statistical analysis, qualitative data can be converted into
discrete numeric data by simply __counting__ the different values
that appear.
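As a minimal sketch of this counting step, the snippet below tallies how often each category occurs. The list `eye_colours` is made-up illustrative data, not data from this page, and Python's standard `collections.Counter` is just one convenient way to do the tally.

```python
from collections import Counter

# Hypothetical observations of a qualitative variable (illustration only).
eye_colours = ["blue", "brown", "green", "brown", "blue", "brown"]

# Counting the values turns the categories into discrete numeric data.
counts = Counter(eye_colours)

for colour, n in sorted(counts.items()):
    print(colour, n)
```

The resulting counts (here 2 blue, 3 brown and 1 green) are whole numbers, so they can be analysed as discrete numeric data.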

Note: the word "variable" is used in two senses. It can mean an
item of data collected on each sampling unit, and it can mean "random
variable". A random variable is a variable in the mathematical sense,
but one that takes different values according to a probability
distribution. The word "variate" is also sometimes used to mean
random variable. In statistics, we use
random variables to build probability models for data variables. This
makes sense because when data are collected on observational units
sampled at random, the values recorded for the *data* variables
can be regarded as realisations of mathematical *random*
variables.

Qualitative data arise when the observations fall into separate distinct categories.

Examples are:

Colour of eyes : blue, green, brown etc

Exam result : pass or fail

Socio-economic status : low, middle or high.

Such data are inherently discrete, in that there is a finite number of possible categories into which each observation may fall.

Qualitative data are classified as:

**nominal** if there is no natural order between the categories (e.g. eye colour), or

**ordinal** if an ordering exists (e.g. exam results, socio-economic status).

Quantitative or numerical data arise when the observations are
counts or measurements. The data are said to be **discrete** if the
values are integers (e.g. number of people in a household, number
of cigarettes smoked per day) and **continuous** if the measurements
can take any value, usually within some range (e.g. weight).

Quantities such as sex and weight are called **variables**, because their
values vary from one observation to another. Numbers
calculated to describe important features of the data are called **statistics**. For example, (i) the
proportion of females, and (ii) the average age of unemployed persons,
in a sample of residents of a town
are statistics.

The table below shows part of some (hypothetical) data on a group
of 48 subjects.

'Age' and 'income' are continuous numeric variables,

'age group' is an ordinal qualitative variable,

and 'sex' is a nominal qualitative variable.

The ordinal variable 'age group'
is created from the continuous
variable 'age' using five categories:

age group = 1 if age is less than 20;

age group = 2 if age is 20 to 29;

age group = 3 if age is 30 to 39;

age group = 4 if age is 40 to 49;

age group = 5 if age is 50 or more
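The coding rule above can be sketched in a few lines of Python. The function name `age_group` and the list `CUT_POINTS` are hypothetical names chosen for this illustration, and the sketch assumes that "20 to 29" means ages from 20 up to, but not including, 30 (and similarly for the other groups).

```python
from bisect import bisect_right

# Lower limits of groups 2 to 5; ages below 20 fall into group 1.
CUT_POINTS = [20, 30, 40, 50]

def age_group(age):
    """Return the ordinal age group (1 to 5) for an age in years."""
    # bisect_right counts how many cut points are <= age.
    return bisect_right(CUT_POINTS, age) + 1

# Ages of the subjects listed in the table on this page.
for age in [32, 20, 45, 19]:
    print(age, age_group(age))
```

For these ages the function gives groups 3, 2, 4 and 1, matching the 'Age Group' column of the table.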

Subject No | Age (years) | Age Group | Annual Income (x $10,000) | Sex |
---|---|---|---|---|
1 | 32 | 3 | 4.1 | F |
2 | 20 | 2 | 1.5 | M |
3 | 45 | 4 | 2.3 | F |
. | . | . | . | . |
. | . | . | . | . |
47 | 19 | 1 | 0.5 | F |
48 | 32 | 3 | 1.9 | F |
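To illustrate what is meant by a statistic, the sketch below computes the proportion of females and the average age for the five subjects whose rows are shown in the table. The other 43 rows are not shown, so these values describe only the listed rows, not the full group of 48. The variable names are hypothetical.

```python
from statistics import mean

# The five listed subjects: (subject number, age in years, sex).
subjects = [
    (1, 32, "F"),
    (2, 20, "M"),
    (3, 45, "F"),
    (47, 19, "F"),
    (48, 32, "F"),
]

# Statistic (i): proportion of females among the listed subjects.
proportion_female = sum(1 for _, _, sex in subjects if sex == "F") / len(subjects)

# Statistic (ii): average age of the listed subjects.
mean_age = mean(age for _, age, _ in subjects)

print(proportion_female)
print(mean_age)
```

For these five rows the proportion of females is 0.8 and the average age is 29.6 years.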

## Progress check
