Measurement science has developed objective methods for evaluating cognitive skills and habits of mind that are more precise, more valid, and more reliable than writing samples.
Writing prompts that invite a person to give reasons and evidence to support their analyses, inferences, explanations, evaluations, and interpretations can be useful exercises for developing stronger critical thinking. That said, using writing samples is a less-than-ideal method of assessing critical thinking because of inherent issues: invalidity, a lack of precision, insufficient variance, unreliability, and misrepresentation.
Many people believe that writing samples provide a good test of critical thinking. We asked Dr. Peter Facione, senior researcher and author at Insight Assessment, to explain why they do not:
First, and most obviously, ratings of writing samples introduce a validity threat when evaluators conflate writing craft, or a facility with rhetorical devices, with critical thinking skills.
- This is in addition to the common human tendency to favor writing samples that present views with which the reader agrees over those that argue for positions with which the reader disagrees.
- There are other threats to validity as well. For example, all the ways that humans use critical thinking that do not involve writing are automatically excluded if the only evidence evaluated is what the person can express in written form. This severely restricts the range of applications and manifestations of critical thinking that can be assessed.
- Additionally, using writing samples typically does not provide the opportunity to test specific aspects of critical thinking, such as its application in different reasoning contexts (empirical, comparative, ideological, and quantitative). Nor does using writing typically permit more detailed scrutiny of the writer's ability to resist locking in prematurely to a given alternative, to recognize and avoid common reasoning errors, and to overcome the tendency to misapply cognitive heuristics.
A second area of concern is that writing-based assessments do not spread scores widely (they lack variance), and the scores they do yield are imprecise.
Written work is typically scored using four or five categories, much as a professor might assign letter grades to an essay. While there is a rank ordering to the grades, the intervals between them are not necessarily uniform. In measurement science we prefer uniform intervals between scores, which is why a large range of numerical scores, e.g. between 65 and 100, offers more precision and more variance. The more precision and variance a valid and reliable measurement tool permits, the better the tool.
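The precision lost in category-based grading can be sketched numerically. The snippet below uses invented latent scores for ten hypothetical test takers and a common (assumed, not from the source) percentage-to-letter cutoff scheme; it shows how binning into five grade categories collapses distinct performances and shrinks the spread of scores.

```python
import statistics

# Hypothetical latent critical-thinking scores for ten test takers
# on a 65-100 scale (illustrative numbers, not real data).
latent = [66, 71, 74, 78, 81, 84, 88, 91, 95, 99]

# A five-category rubric bins that range, e.g. A=4 ... F=0
# (a common, assumed cutoff scheme).
def letter_grade(score):
    if score >= 90: return 4  # A
    if score >= 80: return 3  # B
    if score >= 70: return 2  # C
    if score >= 60: return 1  # D
    return 0                  # F

grades = [letter_grade(s) for s in latent]

# Ten distinct performances collapse into four grade levels,
# and the spread of scores shrinks accordingly.
print(len(set(latent)), "distinct numeric scores")   # 10
print(len(set(grades)), "distinct grade levels")     # 4
print(statistics.variance(latent))                   # wide spread
print(statistics.variance(grades))                   # narrow spread
```

The units differ between the two scales, so the variances are not directly comparable; the sharper point is the loss of distinct score levels, i.e. test takers with genuinely different performances receive identical grades.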
- The reliability problems in the evaluation of writing samples are well documented. That is why so much training is needed for the evaluators. In our experience even well-trained human graders using rubrics can disagree about what score to apply to written work. We know too that human graders have difficulty reliably evaluating the writer’s critical thinking when the writer uses humor, irony, hyperbole, invective, or sarcasm.
- Computers can be programmed to be as reliable as trained human raters, but that is actually a rather low threshold. Currently available computer grading algorithms do not understand what is written. The machines are not considering the quality of the critical thinking process used; they look only for syntactic markers, such as sentence length, the frequency of specific words, and grammatical construction.
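A toy sketch can make the limitation concrete. The feature extractor below (names, word list, and samples are all invented for illustration) computes only surface properties of the kind described: sentence length, word counts, and the presence of transition words. A logically sound argument and a circular one produce identical features, so no score built from such markers can distinguish them.

```python
import re

# Invented list of "reasoning" transition words a surface scorer might count.
TRANSITION_WORDS = {"therefore", "however", "because", "consequently"}

def surface_features(text):
    """Extract purely syntactic markers; says nothing about reasoning quality."""
    words = re.findall(r"[a-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return {
        "avg_sentence_length": len(words) / max(len(sentences), 1),
        "transition_word_count": sum(w in TRANSITION_WORDS for w in words),
        "word_count": len(words),
    }

# A valid syllogism and a circular argument with matching surface structure:
sound = "Socrates is a mortal. All mortals are human. Therefore Socrates is mortal."
circular = "Socrates is a mortal. All mortals are Socrates. Therefore Socrates is mortal."

print(surface_features(sound))
print(surface_features(circular))  # identical features, very different reasoning
```

Because the extractor never models the argument's logical structure, the two samples are indistinguishable to it, which is the core of the reliability-versus-validity point above.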
But perhaps the most important consideration comes from the very nature of what a writing sample represents: inevitably, a writing sample is a person's reconstruction of their own thinking, not the thinking itself.
Writers do not record and report their thought process as an entirely unedited stream of consciousness. Writing samples are, in most cases, fabrications offered to make the writer appear more thoughtful and more reasonable than was actually the case.
Thanks to Dr. Peter Facione, a senior researcher at Insight Assessment and principal at Measured Reasons LLC, a Los Angeles-based research and consulting firm supporting excellence in strategic thinking and leadership decision making. Dr. Facione is the developer and author of the California Critical Thinking Skills Test family of measurement tools; his latest book is Think Critically (Facione & Gittens, 2016).
Insight Assessment offers a comprehensive array of validated, reliable and objective critical thinking assessments calibrated for students and professionals. Contact us to discuss your assessment needs.