17 January 2012

Paper review: Validating interpretive arguments for licensure and certification examinations (Kane, 1994)

This article was first published on 22 January 2005.

Kane, M. T. (1994). Validating interpretive arguments for licensure and certification examinations. Evaluation and the Health Professions, 17(2), 133-159.

Firmly based in medical education (but generalisable to other research), this is an interesting paper on validity and how it bears upon medical assessment. Kane begins by drawing four questions from the concept of validity:

1. Are the results consistent over time? (reliability)
2. Are the results similar to other approaches assessing the same characteristic? (concurrent / predictive validity)
3. Are the explanations for good and bad performance plausible?
4. Are the standards for pass / fail reasonable?

From these four questions, Kane generalises: validity arises from the plausibility of the inferences and assumptions that lead from the scores to the conclusions and decisions (i.e., validity is an interpretive argument).


The American Educational Research Association (AERA) and the National Council on Measurement in Education (NCME) (1985) held that validity is the “appropriateness, meaningfulness, and usefulness of specific inferences made from test scores. Test validation is the process of accumulating evidence to support such inferences.”

My own interest was only in the first section of Kane’s article. The rest is about licensure and competence, which aren’t currently part of my research; I shall look at those sections in detail at some other time.

Validity: parallel and comprehensive

Kane, like Cronbach and others, holds that validity is a function of the use to which an assessment procedure is put, not of the assessment itself. Also, the degree of validity that is inferred applies only to the data examined. However, Kane calls for researchers to pursue parallel paths of validation: doing this increases the confidence with which the degree of validity can be assumed. It seems that providing evidence from several paths of inquiry and over several situations helps generate confidence that the use of the assessment tool is good enough for other purposes. In addition, competing explanations of the scores must be generated and falsified if the intended assertions are to be maintained.

For example, for the tool that I am examining, one competing hypothesis I have generated is that the scores may be due to skill with information technology rather than subject competence and effective collaborative problem-solving. If that hypothesis is false, comparison against a non-IT-based assessment should ideally show a high correlation.

However, I would expect a less than high (but still significant) correlation because the other assessments that we have are not collaborative. Indeed, we do expect IT skills to play a role in a student’s performance. However, we want this aspect to be minimised as much as possible, so that the tool concentrates upon subject knowledge and collaborative problem-solving. This is because the subject is rarely performed on the ’Net; rather, it is performed in person. Intentionally including IT skills may confound what we want to measure and reduce the possible validity of the assessment tool’s use.
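The comparison described above could be sketched roughly as follows. This is only an illustration, not the analysis used in my research: the three score lists are invented, the names `pearson_r`, `online_tool`, `written_exam` and `it_skill` are hypothetical, and a separate measure of IT skill is assumed to exist. The idea is to correlate the tool's scores with a non-IT assessment (the intended interpretation) and with the IT-skill measure (the competing explanation).

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squares of the deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical scores for six students (invented for illustration only).
online_tool = [62, 71, 55, 80, 68, 74]   # the collaborative, IT-based tool
written_exam = [58, 75, 50, 78, 70, 69]  # a non-IT, non-collaborative assessment
it_skill = [40, 65, 55, 60, 45, 70]      # an assumed separate measure of IT skill

r_convergent = pearson_r(online_tool, written_exam)  # supports the intended use
r_rival = pearson_r(online_tool, it_skill)           # the competing explanation

print(f"tool vs written exam: r = {r_convergent:.2f}")
print(f"tool vs IT skill:     r = {r_rival:.2f}")
```

If the tool's scores track the written exam more closely than they track IT skill, the competing explanation looks less plausible. With real data the sample would need to be far larger than six students before any claim about significance could be made, and, as noted above, the first correlation would not be expected to be very high because the written exam is not collaborative.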

Continuous process

However, validation is also an ongoing process which must be pursued with every new set of data. In some ways, this highlights a problem with the scientific method, which can only investigate these issues post hoc. For an educational course, this means that only after all the teaching has been done, the examinations sat, and the coursework marked can the faculty say (with any real confidence) that the course was worthwhile. If it wasn’t, then there is no guarantee about a future course until that one, too, has been completed.
