Thought Into Design: Practical Reliability of measurements

This article was originally published on 8 November 2004

Practical Reliability of measurements - a Practical Guide

Cronbach, L. J., & Shavelson, R. J. (2004) My Current Thoughts on Coefficient Alpha and Successor Procedures. Educational and Psychological Measurement, 64 (3), 391-418.

Lee Cronbach recently published the article cited above on the alpha coefficient, which is commonly used to determine the reliability of a measurement scale [1]. This fits in well with Harwell’s [2] conception of testing human raters as measurement tools, a topic which has become very relevant to my current research.

Cronbach proposed that relying on the alpha coefficient alone to determine reliability may be a mistake: while alpha is a useful measure, the standard error of measurement (SEM) may be more informative, and there are other statistics worth noting when investigating reliability.

However, Cronbach’s article does tend to digress into historical forays at times, so here is a practical guide to using Cronbach’s article in reliability determination.

Organise the data into a matrix, with each participant (those being rated) in a row and each rating in a column.
Analyse the data using a within-subjects univariate ANOVA. Don’t worry about the F-ratio.
Take the sum of squares, together with its degrees of freedom, for the following terms:

Between participants (the person variance);
Between conditions (each independent rating);
The residual / error.

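Divide each sum of squares by its degrees of freedom to get a mean square (MS), then calculate the estimated variance for each aspect (person, test, residual). These describe variation around the universe mean, which you cannot calculate exactly. With n participants and k ratings:

Residual = MS residual, as is;
Person = (MS person - MS residual) / k;
Test = (MS test - MS residual) / n.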
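
As a rough illustration of the steps so far, the Python sketch below estimates the three variances for a small, made-up ratings matrix. It assumes a fully crossed design (every participant has every rating, with no missing values) and the standard mean-square estimates for that design. The function name variance_components and the example scores are hypothetical and are not taken from Cronbach's article.

```python
def variance_components(scores):
    """Estimate person, test and residual variances.

    `scores` holds one row per participant and one column per rating.
    Assumes a complete matrix with at least two rows and two columns.
    """
    n = len(scores)       # number of participants (rows)
    k = len(scores[0])    # number of ratings (columns)
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Sums of squares for persons, tests and the residual.
    ss_person = k * sum((m - grand) ** 2 for m in row_means)
    ss_test = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_residual = ss_total - ss_person - ss_test

    # Mean squares: each sum of squares divided by its degrees of freedom.
    ms_person = ss_person / (n - 1)
    ms_test = ss_test / (k - 1)
    ms_residual = ss_residual / ((n - 1) * (k - 1))

    return {
        "person": (ms_person - ms_residual) / k,
        "test": (ms_test - ms_residual) / n,
        "residual": ms_residual,
    }


# Made-up example: 4 participants, each rated 3 times.
example_scores = [
    [2, 4, 3],
    [5, 7, 6],
    [1, 3, 2],
    [4, 6, 8],
]
components = variance_components(example_scores)
print(components)
```

For these made-up scores the estimates come out at roughly 4.0 (person), 1.0 (test) and 0.75 (residual).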

Calculate the alpha coefficient as follows:

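alpha = person variance / (person variance + (residual / k’)).
k’ is most commonly the number of conditions, i.e., the number of tests. Adding the test variance to the error term, i.e., using (test variance + residual) / k’, gives the coefficient for absolute rather than relative decisions.
The SEM for a single rating is the square root of the residual; for the mean of k’ ratings, divide the residual by k’ before taking the square root.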
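
The calculation can be sketched in a few lines of Python. This is a minimal illustration rather than a definitive implementation: it treats only the residual as error, which reproduces the conventional alpha for a crossed design, and the input values (a person variance of 4.0, a residual of 0.75, three ratings) are invented for the example. The function name alpha_and_sem is likewise hypothetical.

```python
import math


def alpha_and_sem(person_var, residual_var, k_prime):
    """Return (alpha, SEM of a single rating, SEM of the mean of k_prime ratings)."""
    # Error variance for a score averaged over k_prime ratings.
    error_var = residual_var / k_prime
    alpha = person_var / (person_var + error_var)
    sem_single = math.sqrt(residual_var)
    sem_mean = math.sqrt(error_var)
    return alpha, sem_single, sem_mean


# Invented inputs, for illustration only.
alpha, sem_single, sem_mean = alpha_and_sem(4.0, 0.75, 3)
print(round(alpha, 3), round(sem_single, 3), round(sem_mean, 3))
```

Note that the SEM is expressed in the units of the original scores, whereas alpha is a unitless ratio.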

The resulting figure is the alpha coefficient, a measure of the internal consistency of the test.

However, Cronbach mentions that the other statistics are also of use: estimated variances calculated above can be useful for identifying where a test may be having problems. For example, if a strong residual interferes with the alpha calculation (i.e., lowers it), then the analyst may infer that there is an overly large interaction between the person and the test (i.e., people are being scored in different ways). For a test that shows no evidence of internal consistency, the full range of statistics here can be a useful diagnostic tool which enriches the data obtained when compared to running just an alpha coefficient.
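
One simple way to use the estimated variances diagnostically is to express each as a share of their total, so that a dominant residual or test variance stands out. The Python sketch below does this for invented values. The function name variance_shares, the dictionary keys and the choice to treat negative estimates as zero are assumptions made for the illustration, not something prescribed by Cronbach.

```python
def variance_shares(components):
    """Express each estimated variance as a percentage of their total."""
    # Estimates can come out negative by chance; treat those as zero here.
    clipped = {name: max(value, 0.0) for name, value in components.items()}
    total = sum(clipped.values())
    if total == 0:
        raise ValueError("all estimated variances are zero")
    return {name: 100.0 * value / total for name, value in clipped.items()}


# Invented estimates, for illustration only.
shares = variance_shares({"person": 4.0, "test": 1.0, "residual": 0.75})

# Which of the two non-person sources contributes more variance?
largest_error_source = max(("test", "residual"), key=shares.get)

for name, share in shares.items():
    print(f"{name}: {share:.1f}%")
print("largest non-person source:", largest_error_source)
```

A large residual share would point to the person-by-test interaction described above, while a large test share would suggest that the ratings differ systematically in how high or low they score people.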

References

[1] Cronbach, L. J. (1951) Coefficient alpha and the internal structure of tests. Psychometrika, 16(3), 297-334.

[2] Harwell, M. (1999) Evaluating the validity of educational rating data. Educational and Psychological Measurement, 59 (1), 25-37.

17 January 2012