Most variables in a data set can be classified into one of two major types.

Numerical variables

The values of a **numerical** variable are numbers. They can
be further classified into **discrete** and **continuous**
variables.

- Discrete numerical variable
- A variable whose values are **whole numbers** (counts) is called discrete. For example, the number of items bought by a customer in a supermarket is discrete.
- Continuous numerical variable
- A variable that may contain **any value within some range** is called continuous. For example, the time that the customer spends in the supermarket is continuous.

Statistical methods that can be used for continuous variables are not always appropriate for discrete variables.

The distinction between discrete and continuous variables is important.
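One practical consequence of the distinction can be sketched in Python. The data values below are hypothetical, and the 10-minute interval width is an arbitrary choice made for illustration. Discrete counts repeat, so each distinct value can be tabulated directly, whereas continuous measurements rarely repeat and are usually grouped into intervals first.

```python
from collections import Counter

# Hypothetical illustrative data, not taken from any real survey.
items_bought = [3, 1, 4, 1, 5, 3, 1]                         # discrete: counts
minutes_in_store = [12.4, 7.9, 25.1, 8.3, 31.0, 14.6, 9.2]   # continuous: measurements

# Discrete values repeat, so a table of each distinct value is informative.
count_table = Counter(items_bought)


def interval_label(x, width=10):
    """Return the label of the interval [lower, lower + width) containing x."""
    lower = int(x // width) * width
    return f"{lower}-{lower + width}"


# Continuous values rarely repeat, so group them into intervals before counting.
interval_table = Counter(interval_label(x) for x in minutes_in_store)

print(sorted(count_table.items()))
print(sorted(interval_table.items()))
```

Tabulating `minutes_in_store` value by value would give seven categories with one observation each, which says little about the distribution.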

Categorical variables

The values of a **categorical** variable are selected from a
small group of categories. Examples are gender (male or female) and marital
status (never married, married, divorced or widowed).

Categorical variables can be further categorised into **ordinal**
and **nominal** variables.

- Ordinal categorical variable
- A categorical variable whose categories can be **meaningfully ordered** is called ordinal. For example, a student's grade in an exam (A, B, C or Fail) is ordinal.
- Nominal categorical variable
- A categorical variable whose categories have no meaningful order is called nominal. It does not matter which way the categories are ordered in tabular or graphical displays of the data; all orderings are equally meaningful. For example, a student's religion (Atheist, Christian, Muslim, Hindu, ...) is nominal.

Most statistical methods for categorical data can be applied to both ordinal and nominal variables.

We rarely distinguish between ordinal and nominal variables in CAST.
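Where the order of an ordinal variable does matter, it has to be stated explicitly, because software will not know it. The sketch below uses the exam grades from the example above with hypothetical results. The names `GRADE_ORDER` and `at_least` are introduced here purely for illustration.

```python
from collections import Counter

# Hypothetical exam results, for illustration only.
grades = ["B", "Fail", "A", "C", "B", "A", "B"]

# Ordinal: the categories have a meaningful order, stated explicitly (best first).
GRADE_ORDER = ["A", "B", "C", "Fail"]

counts = Counter(grades)

# Display the frequency table in the meaningful order rather than in
# order of first appearance in the data.
ordered_table = [(grade, counts[grade]) for grade in GRADE_ORDER]


def at_least(grade, threshold):
    """True if `grade` is as good as or better than `threshold`."""
    return GRADE_ORDER.index(grade) <= GRADE_ORDER.index(threshold)


# An ordered comparison only makes sense because the variable is ordinal.
n_b_or_better = sum(at_least(g, "B") for g in grades)

print(ordered_table)
print(n_b_or_better)
```

For a nominal variable such as religion there is no equivalent of `GRADE_ORDER`, so a comparison like `at_least` would have no meaning.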

Labels

In some data sets, each individual has a unique 'name' that can be used to
identify it. We call such a variable a **label variable**. The
labels may help us to identify unusual observations in the data set.

Warning!

Sometimes categorical variables are **coded** as numbers when
the data are recorded (e.g. gender may be coded as 0 for males and 1 for females).
The variable is still categorical, despite the use of numbers.

In a similar way, the individuals in a survey may be coded with a number that uniquely identifies them (perhaps to avoid storing names in the data for confidentiality). This is really a label variable and may be simply the row number in the data matrix.

When you see a column of numbers in your data matrix, do not assume that it is a numerical variable.
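A minimal sketch of this warning in Python follows, using the coding from the text (0 for males, 1 for females). The recorded codes and the ID numbers are hypothetical. Decoding the numbers to category names before summarising makes it harder to treat the column as numerical by mistake.

```python
from collections import Counter

# Hypothetical recorded codes, using the coding from the text: 0 = male, 1 = female.
sex_codes = [0, 1, 1, 0, 1]
SEX_LABELS = {0: "male", 1: "female"}

# Decode before summarising so that the variable is treated as categorical.
sex = [SEX_LABELS[code] for code in sex_codes]
frequencies = Counter(sex)

# Hypothetical ID numbers acting as a label variable: they identify
# individuals and are not summarised numerically.
ids = [101, 102, 103, 104, 105]
sex_by_id = dict(zip(ids, sex))

print(frequencies)
print(sex_by_id[103])
```

The ID numbers are only ever used as lookup keys here, which is what marks them as labels.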

Characteristics of employees

Consider the following data set that describes characteristics of the employees of a company.

Name | Sex | Age | Marital status | Number of children | Income | Smoking |
---|---|---|---|---|---|---|
John Smith | male | 24 | single | 0 | $25,000 | never smoked |
Mary Brown | female | 35 | married | 3 | $45,000 | current smoker |
Adam Jones | male | 42 | divorced | 1 | $40,000 | former smoker |
Jane Robertson | female | 29 | divorced | 0 | $42,000 | never smoked |
... | ... | ... | ... | ... | ... | ... |

*Name* is a label variable. *Sex*, *Marital status* and *Smoking* are nominal categorical variables. (However, if we regard 'former smoker' as being **between** 'never smoked' and 'current smoker', then *Smoking* could be treated as ordinal.) *Age* and *Income* are continuous numerical variables. (Although the recorded ages have been truncated to whole numbers, the concept of age is continuous.) *Number of children* is a discrete numerical variable (a count).
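The classification can be recorded alongside the data so that each variable is summarised in a way that suits its type. The sketch below holds the four rows shown in the table and omits the elided rows. The `VARIABLE_TYPES` mapping and the `summarise` helper are hypothetical names, and the choice of summary (a mean for continuous variables, a frequency table for categorical and discrete ones) is an assumption made for illustration rather than a rule from the text.

```python
from collections import Counter
from statistics import mean

# The four rows shown in the table; incomes are stored as plain numbers.
COLUMNS = ["Name", "Sex", "Age", "Marital status",
           "Number of children", "Income", "Smoking"]
ROWS = [
    ("John Smith", "male", 24, "single", 0, 25000, "never smoked"),
    ("Mary Brown", "female", 35, "married", 3, 45000, "current smoker"),
    ("Adam Jones", "male", 42, "divorced", 1, 40000, "former smoker"),
    ("Jane Robertson", "female", 29, "divorced", 0, 42000, "never smoked"),
]
employees = [dict(zip(COLUMNS, row)) for row in ROWS]

# Variable types as classified in the text.
VARIABLE_TYPES = {
    "Name": "label",
    "Sex": "nominal",
    "Age": "continuous",
    "Marital status": "nominal",
    "Number of children": "discrete",
    "Income": "continuous",
    "Smoking": "nominal",
}


def summarise(rows, variable):
    """Return a summary suited to the variable's type (illustrative choice)."""
    values = [row[variable] for row in rows]
    kind = VARIABLE_TYPES[variable]
    if kind == "label":
        return None              # labels identify individuals; nothing to summarise
    if kind == "continuous":
        return mean(values)      # an average is meaningful for measurements
    return Counter(values)       # frequency table for categorical and discrete


for name in COLUMNS:
    print(name, "->", summarise(employees, name))
```

Treating *Smoking* as ordinal instead would only require changing its entry in `VARIABLE_TYPES` and adding an explicit category order.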

European countries

The diagram below shows some data about countries in Europe.

Variable | Description |
---|---|
Membership of EU | Distinguishes between countries that joined the EU before 2000, those that joined between 2000 and 2005, countries that were candidates in 2005, and others. |
GDP per cap | Gross Domestic Product (GDP) per capita in 2003. |
Phones | Number of fixed-line and mobile phones per 1,000 in 2002. |
PCs | Personal computers per 1,000 in 2002. |
Energy | Energy use (kg of oil equivalent per capita) in 2002. |

*The data were obtained from the World
Bank (http://www.worldbank.org/data).*

The first of these variables is a nominal categorical variable and the others are continuous numerical ones. A map of Europe is coloured to represent the values of the variables.

Use the pop-up menu to select the variable to display on the map and investigate its distribution through Europe.

Click on a row of the data matrix or a country on the map to highlight it in both parts of the diagram.

Note that some values are unknown (shaded in grey on the map).