## 19 November 2010

### Statistics in UX

I thought I'd do a short series on statistics on this page after a recent thread on the IxDA discussion list showed some confusion about what qualitative data are. Stats offers a tremendous return on the time needed to learn it.

Let's start at the beginning: data types. There are only three types of data. This is fundamental to good stats practice because the type of data defines how the analysis is done.

1. Nominal (or categorical)
2. Ordinal
3. Interval

#### Nominal (categorical) data

These are data whose values are names or categories, each existing independently of the others. The classic example is sex / gender: people are (generally) male or female. Nominal data are firmly qualitative: it's impossible to argue that (for example) male is worth twice female, or vice versa, without being nonsensical or arbitrary.

Other examples include occupation (with exceptions, see ordinal below), nationality, preferred brand of coffee, and pass/fail. A UX designer is not worth 2 accountants.

#### Ordinal data

These are categories that have an order to them. The classic example is a Likert scale or Likert-type scale. So if you issued a survey and one question was, "What do you think of this website?" and the answers were "superb!", "good", "don't care", "dislike it", and "hate it", then there is an order of feeling that makes the resulting data ordinal. If the responses were just "like" and "dislike", then we're back to nominal data.
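As a rough illustration of what "having an order" buys you, here is a minimal Python sketch using the survey answers above. The names `LIKERT_ORDER`, `rank` and `is_more_positive`, and the sample responses, are made up for this example. The point is that responses can be sorted and compared, but the rank numbers are only positions: nothing here says "good" is twice as positive as "dislike it".

```python
# Answer labels from the survey example, listed from most to least positive.
LIKERT_ORDER = ["superb!", "good", "don't care", "dislike it", "hate it"]


def rank(response: str) -> int:
    """Position of a response in the ordered scale (0 = most positive)."""
    return LIKERT_ORDER.index(response)


def is_more_positive(a: str, b: str) -> bool:
    """True if response a expresses a more positive feeling than response b."""
    return rank(a) < rank(b)


# Hypothetical survey responses, invented for illustration.
responses = ["good", "hate it", "superb!", "don't care", "good"]

# Sorting is meaningful for ordinal data because the categories have an order.
sorted_responses = sorted(responses, key=rank)
print(sorted_responses)
print(is_more_positive("good", "dislike it"))
```

With only "like" and "dislike" as labels and no agreed ordering, there would be nothing to sort by, which is the nominal case.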

Some nominal data can be ordinal but this depends entirely upon the measurement scale. The occupation exception above could be ordinal if the question was something like, "How senior are you?" and the measurements were "junior", "mid-level" and "senior". These have an order to them in which one is regarded as the highest, another the lowest and the third in-between.

Ordinal data are (generally) qualitative. Nunnally argued that Likert scales with more than 11 points can be considered interval.

#### Interval data

These data are the classic quantitative data. Things like "reaction time in milliseconds", "number of alcohol units consumed", "age in years", "number of errors during task" or "income in dollars".

#### Watch out for...

Sometimes, interval data are collapsed into bands: for example, asking someone about their income might be done with 3 bands: "below \$25,000", "between \$25,000 and \$80,000" and "above \$80,000". The resulting measurement is ordinal.

It is possible to change the type of data. An example is summarising interval data into categories: so reaction time might instead be recorded as "fast" / "medium" / "slow" (ordinal) or "pass" / "fail" (nominal). This often affects what is referred to as the "granularity" of the data: interval data are seen as fine-grained, ordinal less so and nominal the coarsest.
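Here is a small Python sketch of that conversion, going from interval to ordinal to nominal. The cut-off values and the sample reaction times are invented for illustration; a real study would pick thresholds that suit the task.

```python
# Hypothetical cut-offs in milliseconds (assumptions, not recommended values).
FAST_BELOW_MS = 300
SLOW_FROM_MS = 600
PASS_BELOW_MS = 500


def to_ordinal(reaction_ms: float) -> str:
    """Collapse an interval measurement into an ordered category."""
    if reaction_ms < FAST_BELOW_MS:
        return "fast"
    if reaction_ms < SLOW_FROM_MS:
        return "medium"
    return "slow"


def to_nominal(reaction_ms: float) -> str:
    """Collapse an interval measurement into a pass/fail label."""
    return "pass" if reaction_ms < PASS_BELOW_MS else "fail"


# Made-up reaction times in milliseconds.
reaction_times_ms = [250, 480, 610, 300, 599]

ordinal = [to_ordinal(t) for t in reaction_times_ms]
nominal = [to_nominal(t) for t in reaction_times_ms]
print(ordinal)
print(nominal)
```

Notice that 300 ms and 599 ms both end up as "medium". That is the loss of granularity: once the data are recorded as categories, the original measurements cannot be recovered.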

But aren't some of these measures quantitative? For example, if I measure the gender of a sample and find 24 females and 24 males, aren't these quantitative?

No, the summary is just that: a summary. The underlying data are measured with 2 categories. Everything can be summarised by frequency but that doesn't make the data quantitative. The underlying data are qualitative.
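To make the distinction concrete, here is a minimal Python sketch using the standard library's `collections.Counter`. The `observations` list is a made-up sample matching the 24 / 24 example above.

```python
from collections import Counter

# Made-up sample: each observation is a label, not a number.
observations = ["female", "male"] * 24

# The frequency table is a numeric summary of the labels.
counts = Counter(observations)
print(counts["female"], counts["male"])

# The underlying data are still categories: every observation is a string.
print(all(isinstance(obs, str) for obs in observations))
```

The counts are numbers, but they describe how often each category occurred. Each individual measurement is still just "female" or "male".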

Next, I will talk about how to deal with these data.