50ms to rate a webpage!

This article was first published on 18 January 2006.
A recent study suggests that decisions about webpages may be made within the first 50 milliseconds of viewing. This has clear implications for website designers. In this article, we investigate that claim.
It’s been widely reported in the media that websites are evaluated in 50 milliseconds: that is all the time designers have to make a good impression. This article discusses these findings in terms of (i) the quality of the research itself and (ii), if the research is valid, the implications for designers.
The paper is by Lindgaard et al. (2006), and rather than believing everything I read in secondary sources, I decided to take a look at the actual paper itself. After all, there’s nothing like going to the primary source!
Human visual processing needs a certain amount of time to recognise objects (on the order of a few hundred milliseconds), but emotional judgements are made far more quickly, the authors contend. Further, later decisions about an object are based not so much on rational thought as on the principle of confirmation bias (cognitive constancy), in which aspects congruent with the initial decision are attended to and used to justify it. This means that the very basic elements of layout and design weigh heavily in judgements of a website’s quality, reducing the importance of other factors such as content and facilities.
This study is important because, if valid, it indicates that the aims of usability (to ensure that websites meet the “holy trinity” of being effective, efficient, and satisfying to the user; see ISO 9241) may matter less than a gut reaction to a site’s “coolness”. The very idea that cool sites might beat more usable but drab ones is of central concern to usability practitioners. After all, no amount of painstaking work will compensate for a site that just looks drab. (And if so, should Jakob Nielsen beware?)
Analysis
My reservations about the paper come from three different areas.
Measurement scale
The first is the measurement of the 50ms condition in the third experiment. The other two experiments presented the stimuli with a 500ms exposure. The third experiment correlated responses between a 50ms and a 500ms condition while also introducing a new measurement scale: earlier ratings were made on a computer-presented line anchored only by “very unattractive” at one end and “very attractive” at the other, whereas the third experiment used a nine-point scale. There is nothing strictly wrong with this, but I would prefer to introduce only one new element into a new experiment. Introducing more than one risks a complex interaction between the new elements; if all but one element is held constant, you can be confident about why any difference occurred.
New design
The third experiment also introduced a between-subjects design: participants saw web pages presented for either 50ms or 500ms, but not both. I felt that a within-subjects design would have offered more power to investigate the research question, because it would have allowed the same page to be rated by the same person under both conditions.
A respectable interval between the two testing phases would have reduced the probability of demand characteristics (i.e., participants “remembering” their scores from a previous exposure). If the interval were, say, a week, then I would feel more confident about assertions of intra-rater reliability.
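To make the suggestion concrete, here is a minimal sketch, in Python with entirely invented data, of how test-retest (intra-rater) reliability for such a design could be checked with a one-way intraclass correlation. The function, variable names, and noise levels are mine, not the paper’s:

```python
import numpy as np

def icc_oneway(ratings):
    """One-way random-effects ICC(1,1) for test-retest reliability.

    ratings: (n_pages, k_sessions) array; each row is one web page,
    each column one rating session by the same participant.
    """
    n, k = ratings.shape
    grand_mean = ratings.mean()
    row_means = ratings.mean(axis=1)
    # Mean squares from a one-way ANOVA with pages as the grouping factor
    ms_between = k * np.sum((row_means - grand_mean) ** 2) / (n - 1)
    ms_within = np.sum((ratings - row_means[:, None]) ** 2) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Hypothetical data: 20 pages rated twice by the same participant,
# once at 50 ms and once at 500 ms a week later.
rng = np.random.default_rng(0)
true_appeal = rng.uniform(1, 9, size=20)          # each page's "true" attractiveness
session_1 = true_appeal + rng.normal(0, 0.8, 20)  # 50 ms ratings (noisy)
session_2 = true_appeal + rng.normal(0, 0.8, 20)  # 500 ms ratings, a week later

print(f"ICC(1,1) = {icc_oneway(np.column_stack([session_1, session_2])):.2f}")
```

A high ICC on real ratings taken a week apart would license much stronger claims about 50ms judgements than a between-subjects correlation can.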
Can measurements be made in less than 500ms?
The analysis for this involved collapsing the attractiveness responses for each participant across all webpages and then correlating them. I don’t feel too confident about this, because collapsing scores needs to be done very carefully: it removes a great deal of variance from the analysis, making a Type I error more likely.
If scores are collapsed across pages, it is quite reasonable to expect them to lie close to the median or grand mean of all scores, and a correlation might be meaningless in that case. The alpha for inter-rater reliability is not reported. Had a within-subjects design been used, reliability could have been tested with an intraclass correlation, which would be more meaningful. As it stands, I suspect the association between the 500ms and 50ms scores is weaker than reported.
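A small simulation illustrates the worry. Everything below is invented for illustration, and the paper’s exact collapsing procedure may differ from my per-page averaging, but the statistical point survives: averaging strips out variance, so the collapsed correlation looks far stronger than the association between individual judgements:

```python
import numpy as np

rng = np.random.default_rng(42)
n_pages, n_raters = 50, 20

# Hypothetical model: each page has a modest shared "true" appeal,
# swamped by large rater-to-rater noise in the individual judgements.
true_appeal = rng.normal(0, 1, n_pages)
ratings_50ms = true_appeal[:, None] + rng.normal(0, 3, (n_pages, n_raters))
ratings_500ms = true_appeal[:, None] + rng.normal(0, 3, (n_pages, n_raters))

# Association at the level of individual raters: weak
r_individual = np.corrcoef(ratings_50ms[:, 0], ratings_500ms[:, 0])[0, 1]

# Association after collapsing to per-page means: much stronger
r_collapsed = np.corrcoef(ratings_50ms.mean(axis=1),
                          ratings_500ms.mean(axis=1))[0, 1]

print(f"individual raters: r = {r_individual:.2f}")  # around 0.1 here
print(f"collapsed means:   r = {r_collapsed:.2f}")   # around 0.7 here
```

The same mechanism could make a modest 50ms/500ms association look impressively strong once scores are collapsed.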
True generalisation?
The authors cite work by Zajonc claiming that judgements made at 500ms are reliable indicators of longer-term judgements. I would prefer to see specific evidence of such a halo effect on longer-term judgements where webpages are concerned, and it would be interesting to see what scores participants would give after long-term exposure. While only anecdotal, I have encountered websites that made me cringe on first visit, but whose content was rich enough to override any effect of confirmation bias.
Conclusion
I really liked the first two experiments and consider them valuable additions to the field. It is certainly important to realise that judgements about a website may often be made on the tiniest of exposures. However, I have concerns about the design of the third experiment that lead me to reject the claim that designers must build sites that impress within a 50ms exposure. These concerns could easily be wrapped up in a couple of follow-up experiments: use a within-subjects design and test intra-rater reliability with an intraclass correlation coefficient; test the relationship between short-term exposures (i.e., 500ms or less) and longer-term judgements made after at least some degree of interaction with the site; and keep the measurement scales constant between experiments.
On the whole though, a good read!
References
Lindgaard, G., Fernandes, G., Dudek, C., & Brown, J. (2006). Attention web designers: You have 50 milliseconds to make a good first impression! Behaviour & Information Technology, 25(2), 115-126.