Testing For Intelligence

Introduction

In his 1950 paper "Computing Machinery and Intelligence" [1], Alan Turing, one of the founders of computer science, considered the question "Can machines think?". Such a question is interesting on many levels. It raises philosophical questions about the nature of intelligence and thought, and it poses a great challenge for computer scientists, namely whether we can make machines that think. For Turing, however, the question was too ambiguous. "Thought", he reasoned, is too abstract an idea to be used in such a question. Combined with the vagueness of the word "machine", this meant that any attempt to settle the question by appeal to the everyday meanings of the words was, he decided, absurd [1].

To combat this ambiguity, Turing replaced the question with a test that, he argued, could only be passed by an entity with intelligence. This notion of defining intelligence by the ability to solve certain problems was a revolutionary one. It opened the door to practical AI research in place of the purely philosophical speculation that had preceded it.

Turing's original test was a simple one. Based on a parlour game, it has a human interrogator question an unknown entity, which may be either a human or a machine, and judge from its responses which of the two it is. By relying on the interrogator's innate perception of human-like intelligence, Turing hoped to categorise machines by their intellectual qualities. As he put it in his paper:

The new problem has the advantage of drawing a fairly sharp line between the physical and the intellectual capacities of a man.[1]

A different approach to AI, the artificial neural network, was devised in the 1950s. Instead of modelling intelligence as a set of formal rules, neural networks aim to cope with a wide variety of situations by using a system that adapts in response to its experience. An example of this model is the twenty questions game, in which a device tries to identify an unknown object by asking at most twenty yes-or-no questions. By learning from each game it plays, the machine can build up a broad base of knowledge from which to derive its questions.

These two approaches tackle machine intelligence in quite different ways. To understand the reasons behind their use, and some of the criticisms levelled at them, it is useful to look in more depth at the history of their development.

A History of the Turing Test

When Turing began his work, computers were still in the realm of the theoretical. Turing was a mathematical logician, and was working on methods of solving mathematical problems with logical machines when the notion of machine intelligence occurred to him. His work on the Entscheidungsproblem in the 1930s had led him to consider the idea of a machine that could perform logical operations. This theoretical Turing machine remains the standard model against which computing systems are compared; the notion of Turing completeness stems from this research.

This new type of machine paved the way for an entirely new field of study, into which Turing eagerly delved. Together with Alonzo Church's lambda calculus, it laid the foundations of the theory of mechanical computation, summarised in the Church-Turing thesis. One of the first questions explored in this new field was what computation could and could not achieve. Artificial intelligence was an obvious problem to study, and there was much speculation in these early days as to whether it was possible. However, the lack of a precise definition of intelligence hampered early research, leaving most of the reasoning on a purely philosophical level.

The Turing test revolutionised AI largely because it provided a way to categorise, and more importantly to test, the intelligence of a system. The idea that a system's intelligence is defined by its behaviour led to the development of symbolic AI, in which intelligence is believed to be describable by a set of algorithms. In the Turing test, these algorithms aim to analyse the input text and craft a suitable response. The fame of the Turing test, coupled with the creation of several prizes, has meant that a large amount of effort has gone into passing it in the decades since 1950. Although the problem has not been solved (it is still impossible to fool everyone all of the time), several systems have come close, and their history is an interesting one.

ELIZA

In the 1960s an MIT professor, Joseph Weizenbaum, created ELIZA, in Weizenbaum's words "a program which makes natural language conversation with a computer possible" [2].

ELIZA functioned by searching the input text for certain keywords and then using a predefined rule to transform the text into a response; if no keyword matched, a canned response was returned [2]. Although the method was primitive, Weizenbaum modelled ELIZA's responses on Rogerian psychotherapy, a client-centred, non-directive form of therapy, and the resulting system fooled many people, at least temporarily.
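The keyword-and-transformation loop described above can be sketched in a few lines of Python. This is a minimal illustration, not Weizenbaum's original implementation: the rule table, the templates and the `respond` function are hypothetical, and the sketch simply applies the first matching rule, whereas ELIZA ranked its keywords by priority.

```python
import re

# Hypothetical rule table: each entry pairs a keyword pattern with a
# response template. These rules are illustrative, not Weizenbaum's script.
RULES = [
    (re.compile(r"\bI need (.+)", re.IGNORECASE), "Why do you need {0}?"),
    (re.compile(r"\bI am (.+)", re.IGNORECASE), "How long have you been {0}?"),
    (re.compile(r"\bmother\b", re.IGNORECASE), "Tell me more about your family."),
]

# Canned responses used when no keyword matches.
CANNED = ["Please go on.", "I see.", "Can you elaborate on that?"]


def respond(text: str, turn: int = 0) -> str:
    """Return a reply by applying the first rule whose keyword pattern matches."""
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            # Strip trailing punctuation from captured fragments, then
            # reassemble them into the response template.
            fragments = [g.rstrip(".!?") for g in match.groups()]
            return template.format(*fragments)
    # No keyword found: fall back to a canned response, cycling by turn number.
    return CANNED[turn % len(CANNED)]
```

For example, `respond("I need a holiday.")` returns "Why do you need a holiday?", while an input containing no keyword falls through to one of the canned replies. Nothing in the sketch models the meaning of the input, which is the weakness discussed below.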

ELIZA's responses are simplistic at best. It frequently uses question reversal, turning the user's own statement back into a question, to direct the conversation back towards the user. Although this is psychologically well designed, ELIZA has no understanding of the conversation, so its responses are vacuous and, on closer inspection, obviously mechanical.
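Question reversal relies on swapping first- and second-person words so that the user's statement can be handed back as a question. The sketch below assumes a small, hypothetical reflection table and a single fixed question template; ELIZA's scripts combined this kind of pronoun substitution with their keyword rules.

```python
# Hypothetical reflection table swapping first- and second-person words.
REFLECTIONS = {
    "i": "you", "me": "you", "my": "your", "am": "are",
    "you": "I", "your": "my", "are": "am",
}


def reflect(fragment: str) -> str:
    """Swap pronouns word by word so a statement can be turned back on the user."""
    words = fragment.lower().split()
    return " ".join(REFLECTIONS.get(w, w) for w in words)


def reverse_question(statement: str) -> str:
    """Turn a statement into a question about the user's own words."""
    core = statement.rstrip(".!?")
    return f"Why do you say {reflect(core)}?"
```

With this table, `reverse_question("I am unhappy with my job.")` produces "Why do you say you are unhappy with your job?". The table is deliberately naive: it lowercases everything and always maps "you" to "I", even where "me" would be grammatical, which is exactly the kind of mechanical slip that gives such programs away.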

Despite its shortcomings, Weizenbaum was so perturbed by the number of people fooled by his program that he wrote a book, Computer Power and Human Reason [3], in which he discussed the implications of the system. As he noted in the introduction:

What I had not realized is that extremely short exposures to a relatively simple computer program could induce powerful delusional thinking in quite normal people. This insight led me to attach new importance to questions of the relationship between the individual and the computer, and hence to resolve to think about them.[3]

Flaws in the Test

To modern readers, the output of early conversational programs such as ELIZA is obviously contrived. The prevalence of computers and automated systems has made people more sceptical of them, and as a result people have become better at spotting machine-generated responses in situations such as the Turing test. This leads to the first flaw in Turing's test: it is based on the perception of intelligence, a very fallible metric. Humans are naturally subjective, and therefore changes in society and education make definitive comparisons between systems tested at different times almost impossible. This may be a flaw in the test, or it may be implicit in the notion of intelligence itself. By defining intelligence in terms of an observer's judgement, Turing essentially abandoned any hope of consistency in such measurements.

As Weizenbaum had realised, another flaw in the Turing test is the fallibility of human nature, and in particular our tendency to personify inanimate objects. People believed in ELIZA's intelligence because they wanted to converse with it. Rather than reading its replies for what they were, they read into them the responses they wanted to hear. The Turing test therefore fails on another level. Humans are renowned for their ability to read meaning into things: to imagine that the constellations are pictures, to refer to ships as "she", and to recognise emotion in faces they have never seen before. Recognising and empathising with humanity is an innate human ability, and this capacity for social interaction is one of the reasons the human species has been so successful. To base a test for intelligence on human perception whilst ignoring this ability is surely naïve.

In his paper [4], Jason Hutchens comes to the same conclusion. Hutchens won the 1996 Loebner Prize, an annual competition that aims to promote AI research by running a formal version of the Turing test. His entry, HeX, was considerably more sophisticated than ELIZA, but it still worked by pattern matching the input strings. To make its responses more convincing, Hutchens built up a database of responses based on his own personality, in an attempt to anthropomorphise his creation. This system with a personality was successful, but it still only impersonated a conversation and lacked any real comprehension.

Twenty Questions

One of the most serious problems with symbolic AI is the sheer number of different scenarios a system may face. It is very difficult to write formal rules for real-world situations because of the number of parameters that depend on the state of the situation. To overcome this complexity, a model known as an artificial neural network can be adopted.

Whereas symbolic AI tries to imitate the human mind with logical formulae, neural networks try to imitate the brain. By building a network of simple nodes, known as neurons, and simulating thought as the interaction between them, neural networks can potentially deal with huge numbers of scenarios.
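To make the idea of simple interacting nodes concrete, the sketch below implements a single artificial neuron trained with the classic perceptron learning rule, here learning the logical AND function. It is a deliberately small toy under stated assumptions: one neuron rather than a network, a step activation, and made-up training data. It is not the architecture of any system discussed in this essay.

```python
from typing import List, Tuple

Sample = Tuple[List[float], int]


def step(x: float) -> int:
    """Threshold activation: the neuron fires (1) only if its input is positive."""
    return 1 if x > 0 else 0


def predict(weights: List[float], bias: float, inputs: List[float]) -> int:
    """Weighted sum of the inputs plus a bias, passed through the threshold."""
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)


def train_perceptron(samples: List[Sample], epochs: int = 10,
                     rate: float = 1.0) -> Tuple[List[float], float]:
    """Learn weights and a bias with the perceptron rule, starting from zero."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            # Only misclassified examples change the weights; each weight is
            # nudged in the direction that reduces the error.
            weights = [w + rate * error * x for w, x in zip(weights, inputs)]
            bias += rate * error
    return weights, bias


# Hypothetical training data: the truth table of logical AND.
AND_SAMPLES: List[Sample] = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
```

Calling `train_perceptron(AND_SAMPLES)` and then `predict` on each input reproduces the AND truth table. No rule for AND is ever written down; the behaviour emerges from repeated small corrections to the weights, which is the sense in which such a system adapts rather than follows hand-written rules.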

A good example of this type of program is the twenty questions game, 20Q. First developed in 1988 by Robin Burgener, it was initially programmed to recognise a small number of objects. The genius of the design lay in its ability to learn new objects when it lost. By the time 20Q was first made available to the public, it had already built up an impressive array of knowledge from which to draw conclusions. 20Q is currently marketed as a toy, and as Kevin Kelly of Wired magazine notes: "Burned into its 8-bit chip is a neural net that has been learning for 17 years."[7]
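The idea of learning a new object after losing a game can be illustrated with a binary tree of yes/no questions, as in the traditional "animal guessing" program. This is a deliberately swapped-in technique: 20Q itself uses a neural network, so the sketch is not Burgener's design, and the `Node`, `play` and `learn` names are hypothetical. When the program guesses wrongly, the player supplies the correct object and a question that distinguishes it from the wrong guess, and the losing leaf is replaced by that question.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Node:
    """Either a yes/no question with two children, or a guess (a leaf)."""
    text: str
    yes: Optional["Node"] = None
    no: Optional["Node"] = None

    def is_leaf(self) -> bool:
        return self.yes is None and self.no is None


def play(root: Node, answer: Callable[[str], bool]) -> Node:
    """Follow the player's answers down the tree and return the guess reached."""
    node = root
    while not node.is_leaf():
        node = node.yes if answer(node.text) else node.no
    return node


def learn(leaf: Node, correct_object: str, question: str, yes_for_new: bool) -> None:
    """After a wrong guess, turn the leaf into a question separating old and new objects."""
    old_guess = Node(leaf.text)
    new_object = Node(correct_object)
    leaf.text = question
    if yes_for_new:
        leaf.yes, leaf.no = new_object, old_guess
    else:
        leaf.yes, leaf.no = old_guess, new_object
```

As a usage example, start from `Node("Is it alive?", yes=Node("a cat"), no=Node("a rock"))`. A player thinking of a dog answers "yes" and the program wrongly guesses "a cat"; calling `learn` with "a dog" and the question "Does it bark?" grows the tree, so the next game with the same answers ends on "a dog". Unlike a neural network, this tree assumes every answer is a consistent yes or no, and one wrong answer sends it down the wrong branch for good.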

Conclusions

In 1950, at the dawn of the computer age, Alan Turing asked the question "Can machines think?". Despite the rapid advancement of computing, and indeed of artificial intelligence, we are still far from answering him.

One of the key problems is that we still have no precise definition of intelligence. Despite the success of systems based on Turing's behavioural definition of intelligence, this approach has many flaws. Arguments such as John Searle's Chinese Room have questioned what counts as thought.

Perhaps we are also guilty of defining thought too narrowly. Weizenbaum argues "that an entirely too simplistic notion of intelligence has dominated both popular and scientific thought, and that this notion is, in part, responsible for permitting artificial intelligence's perverse grand fantasy to grow."[3]

Regardless of our progress in solving Turing's original problem, the tests proposed since have inspired swathes of research and ideas. Understanding these tests is therefore instrumental in understanding how AI research has evolved and where it may lead.

References

[1] Turing, A. M. "Computing Machinery and Intelligence." Mind, 1950.
[2] Weizenbaum, J. "ELIZA: A Computer Program for the Study of Natural Language Communication Between Man and Machine." Communications of the ACM, 1966.
[3] Weizenbaum, J. Computer Power and Human Reason. 1976.
[4] Hutchens, J. "How to Pass the Turing Test by Cheating."
[5] http://www.loebner.net/Prizef/loebner-prize.html
[6] http://news.bbc.co.uk/1/hi/sci/tech/1194565.stm
[7] http://www.kk.org/cooltools/archives/000725.php