18 January 2012

Statistics in UX - Part II

This is the second part of my article on statistics in user experience. Part I is here and discusses types of data. In this article, we begin to see what to do with it.

Descriptive statistics

Descriptive statistics are essential to analysing data, and they are what many people think of when they hear the word "statistics".

Descriptive statistics just describe the data. There are various measures that you can use (depending upon what research questions you have) to understand what happened when you collected the data.

But first think back to the last article: we discussed nominal, ordinal and interval data. The type of data you have will shape the kind of descriptive statistics you need. There are good reasons behind this, and statistics is far from being a purely academic discipline. It's based entirely upon understanding the world around us.

Measures of central tendency

These measures tell you where data tends to be centred. The catch-all word "average" covers a number of measures which you have most likely heard of at some point. Each of these measures has assumptions which have to be met before it can be used.

The mean, or more correctly the arithmetic mean, is a measure used for interval data. It's fairly meaningless for ordinal data and certainly for nominal data. Imagine you ran a survey with 100 respondents and 60 of them were women. If you coded men as 1 and women as 2, could you say that the mean sex of respondents was 1.6?

The mean is calculated as the sum of all values divided by the number of values. So a list of 4 response times [234ms, 265ms, 289ms, 198ms] would have a mean of:

(234 + 265 + 289 + 198) / 4

= 986 / 4

= 246.5ms.
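The same calculation can be sketched in a few lines of Python. This is only an illustration using the four response times from the example; the variable names are my own, and `statistics.mean` is from Python's standard library.

```python
from statistics import mean

# Response times in milliseconds, taken from the example above.
response_times_ms = [234, 265, 289, 198]

# The mean is the sum of all values divided by the number of values.
manual_mean = sum(response_times_ms) / len(response_times_ms)

# The standard library arrives at the same figure.
library_mean = mean(response_times_ms)

print(manual_mean)   # 246.5
print(library_mean)  # 246.5
```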

The mean seems to make more sense for ordinal scales like Likert scales, but there is a danger. The mean works well only when the data is roughly normally distributed (a bell curve). Even then, it can be hard to make sense of the result. A better measure to use for such scales is the median.

The median is another measure of central tendency that produces the middle value. Imagine if we got all values, sorted them into an order and found the central point. If there is an odd number of values, then 1 value is the median. If there is an even number, then the mean of the 2 central values is the median.

Say we have a 5-point Likert-type scale given to 11 people. Responses are [1,2,4,3,5,3,2,1,2,1,2]. Sorted, this list becomes [1,1,1,2,2,2,2,3,3,4,5]. There are 11 values, an odd number, so the median is the single central value: the 6th, which is 2. We can say that the median response is 2.

Compare this with the mean, which is 26 / 11 = 2.36 (to two decimal places): not vastly different but different enough.
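To make the procedure concrete, here is a small Python sketch that finds the median of the eleven responses listed above by sorting and picking the middle. The helper name `middle_value` is hypothetical (it exists only for this illustration); `statistics.median` and `statistics.mean` are standard-library functions.

```python
from statistics import mean, median

# The Likert responses listed above (eleven values).
responses = [1, 2, 4, 3, 5, 3, 2, 1, 2, 1, 2]

def middle_value(values):
    """Hypothetical helper: sort the values and return the middle one,
    or the mean of the two middle ones when the count is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(sorted(responses))          # [1, 1, 1, 2, 2, 2, 2, 3, 3, 4, 5]
print(middle_value(responses))    # 2
print(median(responses))          # 2
print(round(mean(responses), 2))  # 2.36
```

With an even number of values, such as [1, 2, 3, 4], the same helper returns the mean of the two central values, 2.5.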

The median is useful when the mean cannot be used: remember the assumption of a normal distribution? If you want to calculate the mean salary of a nation, you will probably not get a normal distribution. Rather, it will be skewed to the right (a positive skew) because most people earn relatively little and only a handful rake in the millions that we hear about (very extreme and uncommon values are called outliers). Those few huge salaries drag the mean upwards, so reporting the mean will be misleading.

The median protects somewhat against skewed data. By taking the central point, you get a more representative figure.
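A quick way to see this is to compare the two measures on a small skewed sample. The salaries below are invented purely for illustration (they are not real data): six modest salaries and one extreme outlier.

```python
from statistics import mean, median

# Made-up annual salaries: six modest earners and one outlier.
salaries = [18000, 21000, 23000, 25000, 27000, 30000, 1956000]

# The single outlier drags the mean far above what anyone typical earns.
print(mean(salaries))    # 300000

# The median stays with the bulk of the data.
print(median(salaries))  # 25000
```

Nobody in this sample earns anything close to the mean, whereas the median sits right in the middle of the six typical earners.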

The mode is controversial. Some statisticians say it is a measure of central tendency; others say it is not. The mode is the most commonly occurring value. For interval data (like response times), the mode doesn't make much sense. There may well be no mode because each response time happened only once. But for a Likert scale, it makes sense: with the above data, we can say the modal value is 2 because it occurs 4 times, which is more frequent than any other value.

It also makes sense to report modes for nominal data. With the above example of sex, we can say the modal value is female, but we'd still have to report the statistic, so something like "Of the 100 participants, 60 were female" is probably enough.
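The three cases above (Likert responses, nominal data and response times) can be sketched with Python's standard library. The variable names are my own, and the survey list assumes that the 40 respondents who were not women were men, which the example does not actually state.

```python
from collections import Counter
from statistics import multimode

# Likert responses from the earlier example: 2 occurs most often.
responses = [1, 2, 4, 3, 5, 3, 2, 1, 2, 1, 2]
print(Counter(responses).most_common(1))  # [(2, 4)]

# Nominal data: 60 women out of 100 respondents
# (assumption: the remaining 40 were men).
sexes = ["female"] * 60 + ["male"] * 40
print(multimode(sexes))  # ['female']

# Response times: every value occurs once, so there is no useful mode.
# multimode simply returns all of the values.
response_times_ms = [234, 265, 289, 198]
print(multimode(response_times_ms))  # [234, 265, 289, 198]
```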

The choice of what statistic to use depends upon your research question. If you're trying to find out the likely disposable income of users, the median will be best. If you're answering a question about tax income on a national level, then the mean is best.

There are other measures of central tendency (harmonic mean, geometric mean and so on) but they are rarely used in UX.

In the next article, I'll talk about measures of variance.
