17 January 2012

Review - Borsboom, Mellenbergh & van Heerden (2004) The concept of validity

This article was first published on 26 January 2005.

Borsboom, D., Mellenbergh, G.J., & van Heerden, J. (2004) The concept of validity. Psychological Review, 111(4), 1061-1071.

The article begins by proposing a simplified concept of validity, defined by two requirements: “a test is valid for measuring an attribute if (a) the attribute exists, and (b) variations in the attribute produce variation in the measurement outcomes” (from the abstract). The authors claim that their theory is not only simpler but also theoretically superior. However, research purists may be concerned that it reduces the scope and importance of the concept of validity.

The concept of construct validity was expounded in a classic article by Cronbach & Meehl (1955) and expanded upon by Messick (1989), and the authors note that researchers’ conceptions of this work commonly differ. Very often, researchers will substitute the concept of criterion validity for construct validity without being aware of the difference.

The authors’ main criticism of construct validity seems to be that research into the degree of a test’s validity is based on epistemological characteristics, whereas effective measurement procedures in other fields rest on ontological claims. Ignoring the fact that ontology defines the effectiveness of a measurement scale makes the process of validation difficult and tortuous (and it’s probably impossible to provide a clear answer).

Another objection to construct validity is its reliance on correlational measures, and the way these are traditionally held to indicate the degree of validity of a test by comparing it to existing tests. The presence of a correlation (no matter how high) does not, however, imply any causal link. There are three grounds for this objection:

  1. criterion validity (which infers validity from the correlation of a test with a criterion) implies that a test can measure many different things: if a series of universal characteristics affecting an experiment could be measured and compiled, there would be many correlations between those variables and the test. The authors quote Guilford (1946): “a test is valid for measuring many things”. This is a weakness of criterion validity.
  2. the size of the correlation is equated with the degree of validity. The authors illustrate the problem with the perfect correlation between thunder and lightning: this does not, though, allow lightning to be measured by measuring thunder. One response could be that both thunder and lightning arise from a common cause (an atmospheric event), which is something that the test does measure. But again, access to this event is not possible using associational measures.
  3. correlations are population-dependent statistics.
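
The third point is easy to demonstrate numerically. The following is a minimal sketch in Python and is not taken from the article: it assumes a hypothetical attribute that causes test scores, with independent noise added, and compares the correlation in the whole simulated population with the correlation in a subgroup restricted to high attribute values. The names (`simulate`, `pearson`), the noise model, and the cut-off of 1.0 are all illustrative assumptions.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate(n=20000, seed=0):
    """Hypothetical model: score = attribute + independent noise."""
    rng = random.Random(seed)
    attribute = [rng.gauss(0.0, 1.0) for _ in range(n)]
    scores = [a + rng.gauss(0.0, 1.0) for a in attribute]
    return attribute, scores


attribute, scores = simulate()

# Correlation in the full simulated population.
r_full = pearson(attribute, scores)

# Same test, same causal process, but only people high on the attribute
# (the cut-off of 1.0 is an arbitrary illustrative choice).
subgroup = [(a, s) for a, s in zip(attribute, scores) if a > 1.0]
r_restricted = pearson([a for a, _ in subgroup], [s for _, s in subgroup])

print(f"full population:     r = {r_full:.2f}")
print(f"restricted subgroup: r = {r_restricted:.2f}")
```

Under these assumptions the measurement process is identical in both groups, yet the correlation should come out noticeably lower in the restricted subgroup. If the size of the correlation were the degree of validity, the same test would be more or less valid depending on who happened to be sampled.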

The authors propose their own definition, which reduces the importance assigned to a test’s validity. One interesting benefit is that validity becomes a dichotomous measure: either an attribute exists and the test measures it, or it does not.

The authors also feel that what was termed experimental validity should be renamed, possibly as “overall quality”. This would consist of validity, reliability (which is no longer a subset of validity), predictive accuracy, and absence of bias, among others. One consequence is that validity would still be an essential component of research quality, but would be reduced in status.

Overall, the authors present what I feel is a compelling case. I feel that current practices in validation are not only obscure and unworkable, but also overly complex. While some purists may not like the reduction in the importance of validity, it does make sense.
