Monday, July 07, 2014
The Robot Says You Flunked: Algorithms versus Judgment
Harvard and MIT have teamed up to develop an artificial-intelligence system that grades essay questions on exams. The way it works is this. First, a human grader manually grades a hundred essays and feeds the essays and the grades to the computer. Then the computer allegedly learns to imitate the grader, and goes on to grade the rest of the essays a lot faster than any manual grader could. It is so fast, in fact, that the system can often give students nearly instant feedback on their essays, and a chance to improve their grade by rewriting the essay before the final grade is assigned. So we have finally gotten to the point of grading essays by algorithm, and following algorithms is all computers can do.
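To make the "learn from a hundred graded essays" idea concrete, here is a toy sketch in Python. It is emphatically not the edX system, whose internals I have not seen. It assumes the crudest possible approach: count the words in each essay and give a new essay the grade of the most similar human-graded one. The function names (`bag_of_words`, `cosine_similarity`, `train`, `predict_grade`) and the sample essays are all made up for illustration.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercased word counts: a deliberately crude stand-in for real essay features."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Similarity between two word-count vectors, from 0.0 (no shared words) to 1.0."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def train(graded_essays):
    """'Training' here just stores the human grader's examples as (features, grade) pairs."""
    return [(bag_of_words(text), grade) for text, grade in graded_essays]


def predict_grade(model, essay):
    """Copy the grade of the stored essay most similar to the new one."""
    features = bag_of_words(essay)
    _, best_grade = max(model, key=lambda pair: cosine_similarity(features, pair[0]))
    return best_grade


# Hypothetical examples standing in for the hundred hand-graded essays.
human_graded = [
    ("the mind is what the brain does and functionalism explains why", 90),
    ("i like computers they are fast", 55),
]
model = train(human_graded)
print(predict_grade(model, "functionalism says the mind is what it does"))
```

Even this toy shows where the human grader's judgment ends up: frozen into the examples, with the machine doing nothing but matching new essays against them.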
Joshua Schulz, a philosophy professor at DeSales University, doesn't think much of using machines to grade essays. His criticisms appeared in the latest issue of The New Atlantis, a quarterly on technology and society, and he accuses the software developers of "functionalism." Functionalism is a theory of the mind that says, basically, the mind is nothing more than what the mind does. So if you have a human being who can grade essays and a computer that can grade the same essays just as well, why, then, with regard to grading essays, there is no essential difference between the two.
With all due respect to Prof. Schulz, I think he is speculating, at least when he supposes that the essay-grading-software developers espouse a particular theory of the mind, or for that matter, any theory of the mind whatsoever. The head of the consortium that developed the software is an electrical engineer, not a philosopher. Engineers as a group are famously impatient with theorizing, and simply use whatever tools fall to hand to get the job done. And that's what apparently happened here. Problem: tons and tons of essay questions and not enough skilled graders to grade them. Solution: an automated essay grader whose output can't be distinguished from the work of skilled human graders. So where is the beef?
The thing that bothers Prof. Schulz is that the use of automated essay-grading tends to blur the distinction between the human mind and everything else. And here he touches on a genuine concern: the tendency of large bureaucracies to turn matters of judgment into automatic procedures that a machine can perform.
Going to extremes can make a point clearer, so let's try that here. Suppose you are unjustly accused of murder. By some unlikely coincidence, you were driving a car of a similar make to the car driven by a bank robber who shot and killed three people and escaped in a car whose license plate number matches yours except for the last two digits, which the eyewitness to the crime didn't remember. The detectives on the case didn't find the real bank robber, but they did find you. You are arrested, and in due time you enter the courtroom to find seated at the judge's bench, not a black-robed judge, but a computer terminal at which a data-entry clerk has entered all the relevant data. The computer determines that statistically, the chances of your being guilty are greater than the chances that you're innocent, and the computer has the final word. Welcome to Justice 2.0.
Most people would object to such a delicate thing as a murder trial being turned over to a machine. But nobody has a problem with lawyers who use word processors or PowerPoints in their courtroom presentations. The difference is that when computers and technology are used as tools by humans exercising that rather mysterious trait called judgment, no one being judged can blame the machines for an unjust judgment, because the persons running the machines are clearly in charge.
But when a grade comes out of a computer untouched by human hands (or unseen by human eyes until the student gets the grade), you can question whether the grader who set the example for the machine is really in charge or not. Presumably, there is still an appeals process in which a student could protest a machine-assigned grade to a human grader, and perhaps this type of system will become more popular and cease to excite critical comments. If it does, we will have moved another step along the road that further systematizes and automates interactions that used to be purely person-to-person.
Something similar has happened in a very different field: banking. My father was a loan officer for many years at a small, independent bank. He never finished college, but that didn't keep him from developing a finely honed gut feel for the credit-worthiness of prospective borrowers. He wouldn't have known an algorithm if it walked up and introduced itself, but he got to know his customers well, and his personal interactions with them were what he based his judgment on. He would guess wrong once in a great while, but usually because he allowed some extraneous factor to sway his judgment. For example, once my mother asked him to loan money to a work colleague of hers, and it didn't work out. But if he stuck to only the things he knew he should pay attention to, he did pretty well.
Recently I had occasion to borrow some money from one of the largest national banks in the U.S., and it was not a pleasant experience. I will summarize the process by saying it was based about 85% on a bunch of numbers that came out of computer algorithms that worked from objective data. At the very last step in the process, there were a few humans who intervened, but only after I had jumped through a long series of obligatory hoops that allowed the bankers to check off "must-do" boxes. If even one of those boxes had been left blank, no judgment would have been required: the machine would have said no, and that would have been the end of it. I got the strong impression that the people were there mainly to serve the machines, and not the other way around.
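The "must-do" box logic is simple enough to sketch in a few lines of Python. This is only my guess at the shape of such a rule, not the bank's actual software; the box names in `REQUIRED_BOXES` and the function `screen_application` are hypothetical.

```python
# Hypothetical "must-do" boxes; a real bank's list would be longer and different.
REQUIRED_BOXES = ("income_verified", "credit_score_on_file", "appraisal_complete")


def screen_application(checklist):
    """Reject automatically if any required box is blank; otherwise pass to a human."""
    missing = [box for box in REQUIRED_BOXES if not checklist.get(box, False)]
    if missing:
        # No judgment involved: one blank box is enough for the machine to say no.
        return "rejected", missing
    return "forward_to_human", []


print(screen_application({"income_verified": True, "credit_score_on_file": True}))
```

Notice that a human appears only in the last line of the function, after every box has been checked.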
The issue boils down to whether you think there is a genuine essential difference between humans and machines. If you do, as most people of faith do, then no non-human should judge a human about anything important, whether it's for borrowing money, assigning a grade, or going to jail. If you don't think there's a difference, there's no reason at all why computers can't judge people, except for purely performance-based factors such as the machines not being good enough yet. Let's just hope that the people who think there's no difference between machines and people don't end up running all the machines. Because there's a good chance that soon afterwards, the machines will be running the people instead.
Sources: The Winter 2014 issue of The New Atlantis carried Joshua Schulz's article "Machine Grading and Moral Learning" on pp. 109-119. The New York Times article from which Prof. Schulz learned about the AI-based essay grading system is available at http://www.nytimes.com/2013/04/05/science/new-test-for-computers-grading-essays-at-college-level.html. The Harvard-MIT consortium's name is edX.
Note to Readers: In my blog of June 16, 2014, I asked readers to comment on the question of monetizing this blog. Of the three or four responses received, all but one were mostly positive. I have decided to attempt it at some level, always subject to reversal if I think it's going badly. So in the coming weeks, you may see some changes in the blog format, and eventually some ads (tasteful ones, I hope) may appear. But I will try to preserve the basic format as it stands today as much as possible.