Why The Turing Test Is Bullshit

As many reported yesterday, a chatbot has passed the Turing Test, though under some very convenient conditions. The announcement has led some to declare that the age of AI is finally here, but that's nonsense. Here's why the Turing Test is a very poor way of measuring machine intelligence.

A quick recap of yesterday's announcement: A Russian-designed chatterbot named "Eugene Goostman" convinced 33% of judges that it was human. But the developers kludged their way to victory by presenting Eugene as a 13-year-old, non-native-English-speaking Ukrainian boy. This made its mistakes and inadequacies as a full-blown human more believable — a tactic which shows how craptastic the Turing Test really is.

It's worth noting that this is not the first time a chatbot has passed the Turing Test; Cleverbot did it back in 2011. Eugene's developers, with cybernetics professor Kevin Warwick at the PR helm, have done a masterful job at heralding their "achievement," making it seem like much more than it really is.

To be clear, Eugene is not a supercomputer, nor is it artificially intelligent or aware in the way many media outlets are claiming. Much of this has to do with the exaggerated importance many people ascribe to the Turing Test.

It's also important to note that chatbots — though impressive — have virtually no influence on the development of bona fide artificial intelligence. Indeed, the Turing Test, it would seem, has devolved into a contest that seeks to trick judges into believing they're interacting with a human. This is most certainly not what Alan Turing had in mind when he came up with the concept some six decades ago.

'Can a Machine Think?'

Today, most of us take it for granted that our brains — to a certain degree — work along computational principles. This approach, which is referred to as cognitive functionalism or computationalism, is a product of the digital age. It's no coincidence that mind and consciousness studies never really took off with any kind of fervor or sophistication until the advent of computer science. We now have a model that helps explain cognition; AI theorists have finally been able to study things like pattern recognition, learning, problem solving, theorem proving, and game-playing, to mention only a few.

We can thank Alan Turing for much of this. His Church-Turing Hypothesis made the extraordinary claim that whenever there is an effective method for obtaining the values of a mathematical function, the function can be computed by a Turing Machine (a hypothetical device that can simulate the logic of any computer program). In other words, logical computing machines can do anything described as being "rule of thumb" or "purely mechanical." This claim led Turing and others to wonder if the brain operated along similar principles, leading to the profound question: Can machines think?
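
To make the "purely mechanical" part concrete, here's a minimal Turing machine simulator, written as a sketch in Python (the transition table is my own toy example, not anything from Turing's work). The entire "program" is nothing but a lookup table of rules:

```python
# A toy Turing machine simulator: a sketch of Turing's hypothetical
# device, not a reconstruction of his formalism. The machine reads a
# symbol, consults a rule table, writes a symbol, and moves the head.
def run_turing_machine(tape, rules, start, accept, blank="_"):
    """rules maps (state, symbol) -> (new_state, write, move),
    where move is -1 (left) or +1 (right)."""
    tape = list(tape)
    state, head = start, 0
    while state != accept:
        symbol = tape[head] if head < len(tape) else blank
        state, write, move = rules[(state, symbol)]
        if head == len(tape):
            tape.append(blank)
        tape[head] = write
        head += move
    return "".join(tape).rstrip(blank)

# A "rule of thumb" program: walk right, flipping 0 <-> 1, halt on blank.
flip_bits = {
    ("scan", "0"): ("scan", "1", +1),
    ("scan", "1"): ("scan", "0", +1),
    ("scan", "_"): ("done", "_", +1),
}

print(run_turing_machine("10110", flip_bits, start="scan", accept="done"))
# -> 01001
```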

To determine whether or not human-like intelligence could take residence in a machine, Turing devised a test in 1950 called "The Imitation Game." In the test, judges working at computer terminals hold text conversations with a series of hidden contestants and have to decide which among them is a machine. The threshold was somewhat arbitrary: Turing predicted that by the year 2000, a machine would be able to fool an average interrogator at least 30% of the time after five minutes of questioning, and it's this bar that Eugene's 33% result was measured against.
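
For a sense of the arithmetic, here's a toy scoring sketch in Python (the verdicts below are illustrative, not the competition's actual data):

```python
# Toy scoring of a Turing Test session against the 30% bar derived
# from Turing's 1950 prediction. The verdicts are made up for
# illustration; 3 of 9 judges fooled works out to 33%.
verdicts = ["human", "machine", "human", "machine", "machine",
            "machine", "human", "machine", "machine"]

fooled = verdicts.count("human") / len(verdicts)
print(f"fooled {fooled:.0%} of judges -> {'pass' if fooled > 0.30 else 'fail'}")
# -> fooled 33% of judges -> pass
```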

But Is That Really Intelligence?

His naivete can be excused. It's important to remember that Turing was speculating about these things at a time when artificial intelligence was first emerging as a field. Our notions as to what constitutes "intelligent behavior" have changed considerably since then. It was thought, for example, that computers would never defeat humans at chess — a conviction that came crashing down in 1997 when IBM's Deep Blue defeated Garry Kasparov. Indeed, our conception of intelligence keeps shifting as our technological artifacts exhibit talents previously thought to be beyond a machine's reach.

Along these lines, Turing figured that human conversation was among the more "intelligent" things a machine could do, hence the Turing Test. Many people, then and now, argue that human-like interaction is an essential component of human-like intelligence. An artificial intelligence is no good, they argue, if its intelligence lies trapped in an unresponsive program.

But Turing had no way of knowing that human conversation — or the appearance of it — could be simulated by natural language processing (NLP) software, hence the rise of chatterbots. Yes, these programs exhibit intelligence — but they're intelligent in the same way that calculators are intelligent, which isn't really very intelligent at all. More crucially, the introduction of these programs into Turing Test competitions fails to answer the ultimate question posed by the test: Can machines think?
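
To see just how little machinery that takes, here's a deliberately crude, ELIZA-style responder, written as a sketch in Python (the regex rules and canned replies are hypothetical, loosely echoing the transcript below). Real chatterbots layer on thousands of such rules, but the principle is the same:

```python
import re

# Canned templates keyed to regex patterns, with no model of meaning
# anywhere. When nothing matches, the bot deflects.
RULES = [
    (re.compile(r"\bi (?:like|love) (.+)", re.I),
     "Why do you like {0}?"),
    (re.compile(r"\b(music|song|artist)\b", re.I),
     "Interesting tastes. What else do you listen to?"),
    (re.compile(r"\bwhere .* from\b", re.I),
     "I'd rather hear about the place where YOU live."),
]
FALLBACK = "Ummm... Frankly, I didn't get your question :-("

def reply(utterance: str) -> str:
    for pattern, template in RULES:
        match = pattern.search(utterance)
        if match:
            return template.format(*match.groups())
    return FALLBACK  # deflect; no understanding required

print(reply("what music do you like?"))  # keyword hit on "music"
print(reply("I love Eminem"))            # parrots back the matched group
print(reply("that's very rude?"))        # no match -> canned deflection
```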

Passing the Turing Test By Cheating

Though impressive, and despite their apparent ability to fool human judges, these machines — or more accurately, software programs — do not think in the same way humans do. For example, here's a transcript of a 2012 conversation that led a judge to pick Eugene over the human subject operating another terminal:

[16:11:25] Judge: what music do you like?

[16:11:32] Eugene: I like Eminem – because he doesn't have a thing to do with "polite-correctness".

[16:11:46] Judge: OMG i love eminem, he's my favourite artist!

[16:12:00] Eugene: I wonder what Eminem's nick mean. I associate it only with a word "enema". But he is a cool guy, I like his "The Real Slim Shady".

[16:12:55] Judge: i know what his nick name means, his real name is Marshall Mathers. From the initials M&M phynetically sounding like Eminem.

[16:13:06] Eugene: Eminem is a famous crapper. No, raper... ripper... Yeah, I mean - rapper. I like his "Stan" and "The Real Slim Shady".

[16:13:32] Judge: I like "Hailies song"

[16:13:46] Eugene: Interesting tastes. I like science fiction books, by the way. Where do you came from, by the way? Could you tell me about the place where you live?

[16:14:22] Judge: Epsom, home of the derby. yourself?

[16:14:27] Eugene: My guinea pig says that name Derby sounds very nice.

[16:14:41] Judge: i shopuld talk with him sometime

[16:14:48] Eugene: I think you should take some typing lessons. Or just type more accurately.

[16:15:04] Judge: that's very rude?

[16:15:09] Eugene: Ummm... Frankly, I didn't get your question:-(

[16:15:49] Judge: "Ummm" isnt a word. i think you need typing lessons

[16:15:58] Eugene: My thoughts are same. By the way, I still don't know your specialty - or, possibly, I've missed it?

It's all smoke and mirrors, folks. There's no thinking going on here — just quasi-pre-programmed responses spouted out by sophisticated algorithms. But because Turing's conjecture was directed at assessing the presence of human-like cognition in a machine, his test falls flat. It has not stood the test of time and should accordingly be either retired or refined.

Indeed, critics of the test argue that Turing's behavioral criterion of intelligence is insufficient and irrelevant to AI. What matters is whether the computer can demonstrate cognitive ability, regardless of behavior; it's not necessary for a program to speak in order for it to be intelligent. By that measure, many humans and intelligent systems would likely fail the Turing Test. At best, it can be said that the test assesses an extremely small and insufficient subset of human behaviors.

But the inadequacies of Turing's test have also led to its perversion. As Jason Hutchens pointed out in his essay "How to Pass the Turing Test By Cheating," it's surprisingly easy to code a human conversation simulator, or chatbot. Moreover, the Turing Test is a poor test of intelligence — one that encourages trickery, not intelligent behavior. Hutchens predicted that the $100,000 Loebner Prize, a real-world instantiation of the Turing Test, would eventually fail on account of these shortcomings.
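
The "cheats" in question are conversational dodges, not comprehension. Here's a hypothetical sketch in Python of the tactic, mirroring moves visible in the Eugene transcript above (the canned lines are mine, modeled on Eugene's):

```python
import random

# A Turing Test "cheat": never engage the question. Deflect, change the
# subject, attack the judge, or lean on the persona to excuse mistakes.
DODGES = [
    "My guinea pig says that question sounds very nice.",   # non sequitur
    "Where do you came from, by the way?",                  # change subject
    "I think you should take some typing lessons.",         # blame the judge
    "I'm only 13, and English is not my native language.",  # persona excuse
]

def dodge(question: str) -> str:
    # The input is never parsed, stored, or modeled in any way.
    return random.choice(DODGES)

print(dodge("What does 'meaning' mean?"))
```

None of this requires a model of the question, the judge, or the world, which is exactly the point.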

Other Problems

The Turing Test also fails to account for self-awareness or sentience in a machine. Part of the problem is that it conflates intelligence with sentience; it only tests how subjects act and respond — behaviors that can be simulated.

It's also a poor measure of intelligence. Some human behavior is unintelligent (e.g. random, unpredictable, chaotic, inconsistent, and irrational behavior). Moreover, some intelligent behavior is characteristically non-human in nature, but that doesn't make it unintelligent, nor does it signal an absence of subjective awareness.

By the same token, we wouldn't use the Turing Test to determine whether or not highly sapient animals are intelligent.

It's also subject to the anthropomorphic fallacy. Humans are particularly prone to projecting minds where there are none. In this sense, the test is more a measure of human intelligence — and gullibility — than a means of locating thinking processes in a machine.

The Turing test also fails to account for the difficulty in articulating conscious awareness. There are a number of conscious experiences that we, as conscious agents, have difficulty articulating, yet we experience them nonetheless. For example:

  • How do you know how to move your arm?
  • How do you choose which words to say?
  • How do you locate your memories?
  • How do you recognize what you see?
  • Why does seeing feel different from hearing?
  • Why are emotions so hard to describe?
  • Why does red look so different from green?
  • What does "meaning" mean?
  • How does reasoning work?
  • How does commonsense reasoning work?
  • How do we make generalizations?
  • How do we get (make) new ideas?
  • Why do we like pleasure more than pain?
  • What are pain and pleasure, anyway?

Turing Test 2.0?

All this is not to say that artificial intelligence is impossible, or that we should give up on trying to locate it in a machine. In fact, it's crucial that we figure this out; the day is coming when an artificial intelligence becomes sophisticated and self-aware enough that it migrates from an object of inquiry to a subject worthy of moral consideration.

In other words, we need to devise a test that measures the personhood of an AI, much like the trial portrayed in the excellent ST:TNG episode "The Measure of a Man." And indeed, James Sennett has proposed a similar modification to the test.

Additionally, Stevan Harnad has proposed a "Total Turing Test" in which, instead of conversation alone, a machine must interact in all areas of human endeavor. And instead of a five-minute conversation, the test would last the entire length of the machine's lifetime.

And in terms of developing AI: a little less chatbot, and a little more whole brain emulation and/or cognitive modeling.
