Because qualitative data always have a limited number of alternative values, such variables are also described as discrete. All qualitative data are discrete, while some numeric data are discrete and some are continuous.
For statistical analysis, qualitative data can be converted into discrete numeric data by simply counting the different values that appear.
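Here is a minimal Python sketch of this conversion. It uses a small made-up list of exam results, so the values are hypothetical and for illustration only. The standard-library `collections.Counter` tallies how many times each category appears, which gives one discrete count per category.

```python
from collections import Counter

# Hypothetical exam results for eight students (illustrative only).
exam_results = ["pass", "fail", "pass", "pass", "fail", "pass", "pass", "pass"]

# Counting how often each category appears turns the qualitative
# variable into discrete numeric data: one count per category.
counts = Counter(exam_results)

for category, count in counts.items():
    print(f"{category}: {count}")
```

In this sketch the qualitative variable with values "pass" and "fail" becomes the counts 6 and 2.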
Note: the word "variable" is used in two senses. It can mean an item of data collected on each sampling unit, and it can mean "random variable". A random variable is a variable in the mathematical sense, but one that takes different values according to a probability distribution. The word "variate" is also sometimes used to mean random variable. In Statistics, we use random variables to build probability models for data variables. This makes sense because when data are collected on observational units sampled at random, the values recorded for the data variables can be regarded as realisations of mathematical random variables.
Qualitative data arise when the observations fall into separate, distinct categories. Examples are:
- Colour of eyes: blue, green, brown, etc.
- Exam result: pass or fail
- Socio-economic status: low, middle or high
Such data are inherently discrete, in that there is a finite number of possible categories into which each observation may fall.
Qualitative data are classified as:
- nominal, if there is no natural order between the categories (eg eye colour), or
- ordinal, if an ordering exists (eg exam results, socio-economic status).
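The following is one way to show the difference in code. It is a sketch, not a standard representation. Members of Python's `enum.Enum` do not support `<` or `>`, which suits nominal categories. Members of `enum.IntEnum` compare by their integer values, which can encode the order of ordinal categories. The class names and integer codes below are hypothetical choices. The codes only record rank, so the gap between two codes carries no meaning.

```python
from enum import Enum, IntEnum

# Nominal: no natural order, so a plain Enum (no <, > support) fits.
class EyeColour(Enum):
    BLUE = "blue"
    GREEN = "green"
    BROWN = "brown"

# Ordinal: a natural order exists; IntEnum stores a rank as an integer,
# so comparisons and sorting follow that order (hypothetical codes).
class SocioEconomicStatus(IntEnum):
    LOW = 1
    MIDDLE = 2
    HIGH = 3

responses = [SocioEconomicStatus.HIGH, SocioEconomicStatus.LOW, SocioEconomicStatus.MIDDLE]
print([status.name for status in sorted(responses)])  # ['LOW', 'MIDDLE', 'HIGH']
print(SocioEconomicStatus.LOW < SocioEconomicStatus.HIGH)  # True

# Ordering nominal categories is not meaningful, and Enum refuses to do it.
try:
    _ = EyeColour.BLUE < EyeColour.BROWN
except TypeError:
    print("eye colours have no natural order")
```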
Quantitative or numerical data arise when the observations are counts or measurements. The data are said to be discrete if the measurements are integers (eg number of people in a household, number of cigarettes smoked per day) and continuous if the measurements can take on any value, usually within some range (eg weight).
Quantities such as sex and weight are called variables, because their values vary from one observation to another. Numbers calculated to describe important features of the data are called statistics. For example, (i) the proportion of females and (ii) the average age of unemployed persons in a sample of residents of a town are both statistics.
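The Python sketch below computes two such statistics. It uses only the five subjects that appear in the table that follows, so the results describe that partial listing and not the full group of 48. The table has no employment variable, so the average age is taken over all five listed subjects rather than over unemployed persons.

```python
from statistics import mean

# Ages and sexes of the five subjects shown in the table (subjects 1, 2, 3, 47, 48).
# The other 43 rows are not shown, so these are statistics of this subset only.
ages = [32, 20, 45, 19, 32]
sexes = ["F", "M", "F", "F", "F"]

# Statistic (i): proportion of females.
proportion_female = sexes.count("F") / len(sexes)

# Statistic (ii): average age over all listed subjects.
average_age = mean(ages)

print(f"proportion female: {proportion_female:.2f}")  # 0.80
print(f"average age: {average_age:.1f}")  # 29.6
```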
The following table shows part of some (hypothetical) data on a group of 48 subjects. 'Age' and 'income' are continuous numeric variables, 'age group' is an ordinal qualitative variable, and 'sex' is a nominal qualitative variable.
The ordinal variable 'age group' is created from the continuous variable 'age' using five categories:
- age group = 1 if age is less than 20;
- age group = 2 if age is 20 to 29;
- age group = 3 if age is 30 to 39;
- age group = 4 if age is 40 to 49;
- age group = 5 if age is 50 or more.
| Subject No | Age (years) | Age Group | Annual Income (× $10,000) | Sex |
|---|---|---|---|---|
| 1 | 32 | 3 | 4.1 | F |
| 2 | 20 | 2 | 1.5 | M |
| 3 | 45 | 4 | 2.3 | F |
| . | . | . | . | . |
| . | . | . | . | . |
| 47 | 19 | 1 | 0.5 | F |
| 48 | 32 | 3 | 1.9 | F |
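
The age-group coding rule above can be sketched as a small Python function. The text gives the boundaries in whole years. The sketch assumes that "20 to 29" means 20 up to but not including 30, and likewise for the other groups, which is one reasonable reading for a continuous age. The function name `age_group` is hypothetical. The check at the end reproduces the age groups recorded for the five subjects shown in the table.

```python
def age_group(age: float) -> int:
    """Map an age in years to the five ordinal age-group codes.

    Assumes each band is half-open, e.g. group 2 covers 20 <= age < 30.
    """
    if age < 20:
        return 1
    if age < 30:
        return 2
    if age < 40:
        return 3
    if age < 50:
        return 4
    return 5

# Check the rule against the rows shown in the table: subject -> (age, recorded group).
shown_rows = {1: (32, 3), 2: (20, 2), 3: (45, 4), 47: (19, 1), 48: (32, 3)}
for subject, (age, recorded_group) in shown_rows.items():
    assert age_group(age) == recorded_group, subject
print("age groups match the table")
```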