Grading essays: Human vs. machine

by Jordan Bienstock, CNN

(CNN) No one thinks twice about using machines to grade multiple-choice tests. For decades, teachers – and students – have trusted technology to accurately decipher which bubble was filled in on a Scantron form.
But can a machine take on the task of evaluating the written word?
A recent study conducted by the College of Education at the University of Akron collected 16,000 middle and high school test essays from six states, each of which had previously been graded by a human reader. The essays were then fed into a computer scoring program.
According to the researchers, the robo-graders “achieved virtually identical levels of accuracy, with the software in some cases proving to be more reliable.”
So the simple answer to whether machines can grade essays would appear to be yes. However, the situation is anything but simple.
The grading software looks for elements of good writing, such as strong vocabulary and good grammar.
What it isn’t able to do is distinguish nuance, or even truth.
Les Perelman, a director of writing at the Massachusetts Institute of Technology, is a critic of these robo-graders. He’s had a chance to study how some of the programs work, and says they can be gamed by anyone who figures out which features the scoring algorithms reward.
For example, Perelman said in a New York Times article that the machines focus on composition but have no concern for accuracy. According to Perelman, “any fact will do as long as it is incorporated into a well-structured sentence.”
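To see why that matters, consider a toy scorer built only on surface features. The short Python sketch below is a hypothetical illustration, not the software from the Akron study: it rewards vocabulary and sentence structure, and because it never examines meaning, it awards a fluent falsehood the same score as a fluent fact.

import re

def score_essay(text):
    # Hypothetical surface-feature scorer: rewards vocabulary and
    # sentence structure, never checks whether a claim is true.
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        return 0.0
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    avg_word_len = sum(len(w) for w in words) / len(words)         # vocabulary proxy
    vocab_diversity = len({w.lower() for w in words}) / len(words)
    avg_sentence_len = len(words) / max(len(sentences), 1)         # structure proxy
    # Arbitrary weights, chosen for illustration only.
    return round(2.0 * avg_word_len + 10.0 * vocab_diversity + 0.5 * avg_sentence_len, 1)

true_essay = "The American Civil War ended in 1865 after four devastating years."
false_essay = "The American Civil War ended in 1492 after four devastating years."
print(score_essay(true_essay), score_essay(false_essay))  # identical scores

Because the scorer measures only form, swapping the correct date for a wrong one changes nothing – exactly the weakness Perelman describes.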
Dr. Mark Shermis, dean of Akron’s College of Education and one of the authors of the study, acknowledges that “automatic grading doesn’t do well on very creative kinds of writing. But this technology works well for about 95 percent of all the writing that’s out there.”
Another point in the machine’s favor: speed. The New York Times article points out that human graders working as quickly as possible are expected to grade up to 30 essays in an hour.
In contrast, some robo-graders can score 16,000 essays in 20 seconds – roughly 800 essays per second, or nearly 100,000 times the human pace.
That disparity would seem to support Shermis’ view that robotic graders can serve “as a supplement for overworked” entry-level writing instructors. But he warns that his findings shouldn’t be used as a justification to replace writing instructors with robots.
What these robo-graders can do, Shermis says, is “provide unlimited feedback as to how you can improve what you have generated, 24 hours a day, seven days a week.”