17 January 2012

Review - Downing (2003) Validity: on the meaningful interpretation of assessment data

This article was first published on 27 January 2005.


Downing, S.M. (2003) Validity: on the meaningful interpretation of assessment data. Medical Education, 37, 830-837.

This paper concentrates largely on the issue of validity and how it pertains to medical education. Downing argues that all validity in medical education assessment is construct validity, which can arise from five sources of evidence. Common misconceptions about validity are also discussed.

Validity should always be tested as a hypothesis, and is an essential component of research quality. Assessment tools are described as having degrees of validity rather than all-or-nothing properties. Curiously, this aspect of validity is somewhat at odds with the rest of the scientific method, where a significant difference allows the researcher to say that “x affected y”: a researcher can never say with confidence that their assessments are “valid” or not.

The description of validity is contained within the AERA Standards for Educational and Psychological Testing; instead of there being different kinds of validity, everything is now subsumed under a unitary concept, construct validity.

Construct validity arises from the postulation of theoretical constructs whose degree of relatedness to evidence is measured by empirical assessment.

Referring to the work of Kane, validity should be argued using an “evidentiary chain” with parallel means of investigation. This chain should relate the empirical evidence to the “network of theory, hypotheses and logic”. This process of analysis is an ongoing exercise.

Downing mentions that validity may be “typed” into five categories: content, response process, internal structure, relationship to other variables, and consequences. Within these sources are a number of specific examples, which Downing helpfully provides in a table.

In review, the concept of validity as discussed here is very much the modern interpretation of the traditional ideas of Cronbach, Messick and others. However, it seems a complex way to determine whether a “test measures what it is supposed to measure".

One problem I see with this concept (rather: collection of concepts) is that much of the assessment relies on correlational data: these cannot allow a researcher to make causal claims, for correlations only show associations. While this may not affect the results of an assessment in practical and immediate terms, one is still left with ambiguity about the cause of observed effects. A later article by Borsboom et al. explains all this in more detail, and also discusses the problem of the old, diminished concept of criterion validity substituting itself for construct validity. The entire concept of validity as discussed here relies upon the construction of a workable nomological network of theory into which the proposed constructs can be placed. Much research into validity doesn’t bother with this, and instead reviews validity post hoc rather than assessing it within the bounds of the scientific method.

In summary, this paper appears to offer a concise view of validity that updates the traditional concepts that are commonly referred to in assessment work. However, I feel that serious problems remain with the entire concept of what validity is, how it can be assessed, and its importance to the process of empirical enquiry.
