17 January 2012

Construct Validity

This article was published on 8 December 2004.

Currently, I’m reading Jum Nunnally’s book, Psychometric Theory. It’s a very fine read and well explained, and I’m finding it to be a solid bonus for my work in evaluating a particular educational assessment tool.

However, I have to ascertain the validity of the tool, and the only applicable method of validation is construct validity. Consider this brief article an outline of what construct validity is (and isn’t?).

To begin with, Nunnally says that construct validity is based upon constructs. These are things, but curiously, the scientist who engages in an examination of them will never know what they are. Or even whether the name he or she gives them is correct. Or even how they operate.

But the scientist will know how they relate to other constructs. That is extremely useful for it can be used, for example, to show whether one assessment tool has an association with another assessment tool. Like seeing if a test that claims to measure extraversion might be related to a tool that measures, um, extraversion. This helps us to understand whether a new tool is useful in measuring what it is supposed to be measuring. Wicked!
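For the curious, here is a minimal sketch of what "seeing if two tools are related" can look like in practice. It assumes the association is measured with a plain Pearson correlation, which is one common choice and not necessarily the one Nunnally would pick for a given case. The scores and the names `new_tool` and `old_tool` are invented purely for illustration.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares of the deviations.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return cov / sqrt(var_x * var_y)

# Made-up scores for the same six people on two hypothetical tools.
new_tool = [12, 15, 9, 20, 17, 11]
old_tool = [30, 34, 25, 41, 39, 27]

r = pearson_r(new_tool, old_tool)
print(f"r = {r:.2f}")
```

A value near 1 would suggest the two tools rank people in much the same way. It says nothing, of course, about what either of them is actually measuring.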


But what is the first tool measuring? Umm, well, the scientist doesn’t actually know that. That’s because it’s based on a construct, just like the new tool that he or she is messing with. So yes, this means that one tool which measures “something” is similar to (but not exactly the same as) another tool which measures “something”. And of course, those two somethings might not be related at all - it could all be chance or due to a third (or fourth! Or fifth!) factor that nobody has even suspected is in existence.
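The third-factor worry can be sketched in code too. This is a toy example, assuming the standard first-order partial correlation formula and assuming (unrealistically) that the third factor has actually been measured. The scores are rigged so that `tool_a` and `tool_b` are each just the hidden factor plus unrelated noise; all three names are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def partial_r(x, y, z):
    """Correlation between x and y after removing the linear effect of z."""
    r_xy = pearson_r(x, y)
    r_xz = pearson_r(x, z)
    r_yz = pearson_r(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Contrived data: each tool is the hidden factor plus its own unrelated noise.
hidden_factor = [4, 4, 4, 4, 6, 6, 6, 6]
tool_a = [10, 8, 10, 8, 12, 10, 12, 10]
tool_b = [20, 20, 18, 18, 22, 22, 20, 20]

raw = pearson_r(tool_a, tool_b)
controlled = partial_r(tool_a, tool_b, hidden_factor)
print(f"raw r = {raw:.2f}, partial r = {controlled:.2f}")
```

With these rigged numbers the raw correlation between the two tools is 0.5, but it drops to zero (give or take floating-point rounding) once the hidden factor is accounted for. Real data is never this tidy, and the factor that matters is usually the one nobody thought to measure.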

To recap: construct validity is useful to see if a tool that measures something we don’t know about is similar to another tool that measures something we don’t know about. Okay, got that?

But then of course, both tools can (if the scientist is willing and energetic enough) be compared to some other tools that may or may not measure the same thing. But even if they did, then they would probably be measuring something else at the same time, but the scientist doesn’t even know that the third thing exists, only that it might do.

Of course, in reality the situation is more bleak. There are probably several thousand factors all operating at the same time, some in concert, others in isolation, and the calculations needed to get at them would be hideously complex. However, there is no real way of knowing a) how many factors there are; b) what effect these factors have; c) whether these factors interact; d) to what extent they interact; e) whether the evidence for these factors is due to an artifact in the data; f) whether the existence of an artifact in the data is due to some amazingly serendipitous finding that will henceforth go ignored; and g) what I had for lunch.

Well, for lunch I had a pasty, so that’s one cleared up at least.


But the rest still linger. But still, ploughing on through using this technique in the real world, I have understood that something (of which I know nothing) could therefore be similar to something else (of which I know nothing), but probably isn’t because life is just too complicated.

But seriously folks…

…this method is actually quite good. I would recommend Nunnally’s book as it is well written and the author obviously lives in the real world (he mentions how analysis of construct validity should happen, and compares it to how it usually happens, giving advice along the way to harassed and perplexed researchers).

And construct validity itself? I have found it a useful technique because it has allowed me to better understand the tool. I have compared it to other measures of educational performance which seem to be good measures (they’ve been examined by quite a few domain experts) and found high levels of association. This implies that the new tool is measuring something of what it is supposed to be measuring, but also that it is measuring other things too. This was expected because it is a new tool using new ways of teaching and applying knowledge; if I had found a perfect or even a very high association, the tool would have been problematic (i.e., how can a multiple choice questionnaire produce the same pattern of results as a collaborative group exercise?).
