Why Prediction is the Essence of Intelligence
Is it a coincidence that machine learning and intelligence are both rooted in prediction?
Are we approaching a momentous juncture when our technology embodies the essence of intelligence? Or is this yet another chapter in a long history of misconceptions? And if it is indeed the essence, in a system of many components, what elevates prediction above the rest?
“Prediction is the essence of intelligence” — Yann LeCun
Yuval Noah Harari advises us to study history to loosen the grip of the past. “Studying history will not tell us what to choose, but at least it gives us more options.” If you’re charting a roadmap in artificial intelligence, surveying the options is prudent. By tracing the work of Marcus Hutter, Shane Legg, Jeff Hawkins, and Yann LeCun, we gain insights into why machine learning holds us in its powerful grip.
Technology molds our explanations of intelligence
Historically, the dominant technology of each age molded our explanations of how intelligence works. The hydraulic technologies of the ancient Greeks found a parallel in the flow of “humors”, which were thought to determine our bodily and mental functions. The mechanical age amplified the idea that humans are machines, with thought itself explained as motions in the brain. Enter electricity and communications, and the brain became a switchboard; enter computing, and the brain became an information processor.
In this age of artificial intelligence, many believe that prediction and learning are the essence of intelligence. Prediction and learning are also the major functional components of machine learning, a dominant technology of this age.
So one possible explanation for why prediction is the essence of intelligence is that our tools are engines of prediction. In this historical context, the essence of the thing is less about the merits of prediction and more about the difficulty in thinking beyond our tools.
Admittedly, this history is interesting, but in and of itself it is hardly a damning indictment of the idea. It may well be that prediction is indeed the essence, and we’re approaching that tremendous milestone when our technology achieves the essence of our intelligence. To weigh that proposition, we need to address the claim more directly.
A narrow wedge of intelligence
To identify the essence, we need to parse the aspects that are essential. It’s worth highlighting at this point the remarkable freedom afforded by the concept of intelligence, to abstract from it different goals, purposes, attributes, and essences.
When William Calvin explored The Emergence of Intelligence, he observed that the essence wasn’t always associated with prediction. “To most observers, the essence of intelligence is cleverness, a versatility in solving novel problems.” While expressing a personal affinity towards the importance of foresight and prediction, his survey included a range of complementary behaviors, such as exploration, creativity and versatility.
“We will never agree on a universal definition of intelligence because it is an open-ended word, like consciousness.” — William Calvin
Calvin concluded that the task of finding a single definition of intelligence was futile. “We will never agree on a universal definition of intelligence because it is an open-ended word, like consciousness.”
As it turned out, those pursuing machine intelligence were not at all deterred by this fungibility. While it may not be satisfying to all people and purposes, a tribe may rally around a simplified concept. These foundational choices may in turn bring essential attributes of intelligence into focus. This is what we’ll look at next.
A definition of machine intelligence
One effort to focus a community on a definition of machine intelligence was offered by Shane Legg and Marcus Hutter. They acknowledged the difficulty in developing a highly abstract yet general concept. However, with clarity about the goal of machine intelligence as an autonomous, goal-seeking system, they overcame the challenge.
“Intelligence measures an agent’s ability to achieve goals in a wide range of environments.” — Shane Legg & Marcus Hutter
Legg and Hutter informed their definition through a survey of expert opinions on intelligence. Given their ambitions for universality, they sought a definition that subsumed both natural and artificial intelligences. The essential features of intelligence, assessed against their goal, were extracted into a general definition: “Intelligence measures an agent’s ability to achieve goals in a wide range of environments.”
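For readers who want the formal counterpart, Legg and Hutter's work on universal intelligence renders this informal definition as a weighted sum of an agent's expected performance across environments. The statement below is a sketch of that formulation as it is usually presented, not something derived in this post. The symbols are notation introduced here: \pi is the agent, \mu an environment, E the class of computable environments considered, K Kolmogorov complexity, and V^{\pi}_{\mu} the expected total reward of \pi in \mu.

```latex
\Upsilon(\pi) \;:=\; \sum_{\mu \in E} 2^{-K(\mu)} \, V^{\pi}_{\mu}
```

Each term maps onto a phrase of the informal definition: V^{\pi}_{\mu} is the “ability to achieve goals”, the sum over E is the “wide range of environments”, and the weight 2^{-K(\mu)} favors simpler environments. Because K is not computable, the measure serves as a theoretical yardstick rather than a practical test.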
As anticipated by the historical context, we’re not only thinking about intelligence using the concepts of our technologies, we’re doing so in a very explicit way. If the circularity here is troubling, Legg and Hutter acknowledge the concern. They respond by tying their results to expert definitions of intelligence, including those of natural intelligence, and by showing that their work is consistent with the theory of universally optimal learning agents. In their words, “what we have done goes far beyond merely restating elementary reinforcement learning theory”.
However, this still leaves open the question of why prediction is the essence of intelligence. Machine intelligence is a system of functional components. Denoting one part as indispensable may seem odd: presumably, all the parts comprising the system are indispensable. What elevates prediction as the essence?
Prediction as the essence of machine intelligence
In the context of a system, the meaning of essence may be better understood as “the missing link”, the element that solves the puzzle.
In his theory of Universal Artificial Intelligence, Marcus Hutter set out to solve the problem of machine intelligence, through “a unification of the ideas of universal induction, probabilistic planning and reinforcement learning”. The combination of sequential decision theory and Solomonoff’s theory of Universal Induction provided a theory for rational agents in both known and unknown environments.
Hutter’s statements of values are noteworthy. Measured against his success criteria, the theory captures the informal definition of intelligence discussed above and is rooted in a theory of inductive inference. In another post, I’ve examined how induction drives AI theory and methodology.
Within a theory of machine intelligence then, we’ve arrived at a deeper understanding of why prediction is the essence. Inductive inference, a predictive process, solves the problem of how rational agents can achieve their goals in unknown environments. Machine learning is a process of inductive inference, drawing general conclusions from specific examples, and prediction is the essence of machine learning.
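To make “inductive inference as prediction” concrete, here is a minimal sketch in Python. It is a toy, not Hutter's or Solomonoff's actual construction: the hypothesis class (repeating binary patterns), the prior of 2^(-2n) for a pattern of length n, and the names `hypotheses` and `predict_next` are all assumptions made for illustration. The idea it shares with universal induction is only the shape of the argument: weight simpler explanations more heavily, discard those contradicted by the data, and predict with what remains.

```python
from itertools import product


def hypotheses(max_len):
    """Yield (pattern, prior) for every repeating binary pattern up to max_len."""
    for n in range(1, max_len + 1):
        for pattern in product((0, 1), repeat=n):
            # Toy stand-in for a simplicity prior (an assumption of this sketch):
            # a pattern of length n gets weight 2^(-2n), so shorter patterns dominate.
            yield pattern, 2.0 ** (-2 * n)


def predict_next(observed, max_len=4):
    """Return P(next bit = 1) under the mixture of hypotheses consistent with the data."""
    weight_one = 0.0
    weight_total = 0.0
    for pattern, prior in hypotheses(max_len):
        n = len(pattern)
        # A hypothesis survives only if it reproduces every observed bit.
        consistent = all(bit == pattern[i % n] for i, bit in enumerate(observed))
        if not consistent:
            continue
        weight_total += prior
        # Ask the surviving hypothesis what comes next.
        if pattern[len(observed) % n] == 1:
            weight_one += prior
    if weight_total == 0.0:
        # No hypothesis in this small class explains the data; fall back to a coin flip.
        return 0.5
    return weight_one / weight_total


if __name__ == "__main__":
    print(predict_next([0, 1, 0, 1, 0, 1]))
    print(predict_next([1, 1, 1]))
```

After a few alternating bits, only the alternating patterns survive and the predictor expects a 0 next; after a run of 1s, it leans strongly towards another 1 while leaving a small weight on longer patterns that break the run. General conclusions (which patterns remain plausible) are drawn from specific examples, and the only output is a prediction.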
This explanation doesn’t diverge from the historical expectation; it embraces it. Legg and Hutter defended their approach as “unabashedly functional”. What matters is the goal and the measurable performance of the agent. They celebrate that unmeasurable aspects such as consciousness, emotions, and creativity are stripped away, along with any anthropomorphic sympathies or appeals to natural intelligence. They have no interest in creating an “artificial human”.
Others believe that it is only through a reverse engineering of natural intelligences that artificial intelligence will ever be achieved. Is prediction the essence of intelligence in their worldview?
Prediction as the essence of natural intelligence
In his book, On Intelligence, Jeff Hawkins proposed that prediction offers an important measure of intelligence, framed in a concept of understanding. Citing Searle’s familiar Chinese Room thought experiment (a matter of no practical concern in the functional perspective discussed above), Hawkins explained that understanding embodies aspects of remembering. Intelligence requires inner representations and processing of experiences.
In Hawkins’ theory, predictions are evaluated against expectations of what will happen. The human cortex is made up of millions of columns, each composed of groups of neurons. (Again, harking back to the sway of our tools, this reminded Hawkins, an engineer by training, of the architecture of a silicon chip.) The uniformity in these structures implies a single algorithm or principle underlying all information processing in the cortex. These columns constitute units of prediction. In this memory-prediction framework, prediction is the essence of intelligence.
Even within this memory-prediction framework, the idea of prediction as the essence is difficult to pin down. While it’s true that people are forever living in the future, the functionality at play is more aptly described as prospection, not prediction. We’re continually evaluating our internal explanations of the world against relatively sparse, unexpected observations.
Critics such as Gary Marcus maintain that Hawkins’ model is overly simplified, abstracting only some of the known mechanisms in the brain; he notes that many other aspects remain a mystery. (This isn’t to suggest Marcus is an advocate for mainstream machine learning; he believes there’s “a bias in machine learning which is to assume that everything is learnt.”) Other researchers, such as Jeff Dean and Demis Hassabis, seem more sympathetic to Hawkins’ inspiration in nature, if not necessarily applauding the feasibility or realization of his ideas thus far.
“For fucks sake, DL people, leave language alone and stop saying you solve it.” — Yoav Goldberg
As might be expected, the complexity of intelligence and the inevitable compromises required to engineer solutions create fault lines. This brings tribes into conflict, as in the recent public debate between LeCun, Yoav Goldberg, and others representing the deep learning and NLP communities, subtitled, “for fucks sake, DL people, leave language alone and stop saying you solve it”.
When we’re looking to reality as an objective arbiter, the essence of intelligence only becomes more elusive. Ultimately, to understand why prediction is the essence of intelligence, we need to appeal to the court of public opinion and the most influential people in the industry.
A question becomes a meme
For much of this story, the question of whether prediction is the essence of intelligence was a relatively obscure, pragmatic consideration, whether in the pursuit of machine intelligence or the study of natural intelligence. Then the question became a meme.
This transformation can be traced through a short series of events. The vision of machine intelligence as inductive inference, expressed in applications of deep learning, emerged as a tremendously successful approach. Pioneers of this approach, such as Yann LeCun, Yoshua Bengio, and Geoffrey Hinton, vindicated after decades of research, became highly influential prophets of what comes next.
In settings of short attention, complex questions are necessarily simplified and stripped of their historical context. In an effort to educate and promote their research agendas to a massive new audience, complex questions were reduced to a form suitable for tweets, soundbites, and pitch decks.
Memes become dogma
When a rockstar like LeCun stands up and makes a proclamation, it reverberates across the entire industry. So it was in 2015, when LeCun stated, “Prediction is the essence of intelligence, and that’s what we’re trying to do.” And what LeCun is trying to do will influence and constrain the thought leadership and R&D agendas for many.
LeCun argues that the crux of the AI problem is “prediction under uncertainty”, which parallels the functional perspective discussed above. In his keynote at NIPS 2016, he highlights the central role of predictive learning (learning predictive models) as the necessary step for progress in AI. The main technical difficulty is that “the world is only partially predictable”.
As LeCun outlines the roadmap, prediction frames a host of challenges, even in areas not conventionally associated with forecasting. Common sense, for example, is described as, “Predicting any part of the past, present or future percepts from whatever information is available.” As with Hawkins’ memory-prediction duality discussed above, the functional framing could just as well be shifted from predicting to other techniques, such as representing, remembering, or reasoning. But why mess with success?
It’s the law of the hammer to treat everything as if it were a nail. When wielded as the essence of the thing, prediction becomes not only descriptive and a criterion for measurement, but also prescriptive of the solutions themselves. And when cast in prescriptive terms to an adoring tribe, the question is no longer questioned. It becomes truth.
Why prediction is the essence of intelligence
Dominant technologies have a tremendous influence over how we understand the world and ourselves. Throughout history, technologies such as hydraulics, steam engines, and computers have molded our explanations of intelligence. This powerful force continues today. Machine learning is the conceptual frame that explains why prediction is the essence of intelligence.
A complex concept like intelligence must be simplified before it can be engineered. A mere subset of attributes and success criteria is abstracted to suit the need. While a single definition won’t satisfy everyone, a community of practice may rally around this simplified concept.
For one community, a consensus emerged around a vision of machine intelligence as an autonomous, goal-seeking system. By design, this definition of intelligence was crafted within the expectations of how machine intelligence would be realized and measured. Within this architectural frame, inductive inference, a predictive process, offered a solution to the problem of how rational agents can achieve their goals in unknown environments. Prediction is also the essence of machine learning, a process of inductive inference.
This simplification and focus, while effective, is also inherently controversial. The complexity of intelligence affords the freedom to accentuate many purposes, theories, and architectures. Accordingly, artificial intelligence is an extraordinarily diverse collection of research and technologies, many at odds with the prevailing and dominant ideas.
However, might makes right. Machine learning, particularly deep learning, has emerged as the most successful and dominant set of technologies, overshadowing every other aspect of artificial intelligence. Just as the concept of intelligence was reduced to a simplified, pragmatic definition, in the eyes of many, artificial intelligence has been reduced to machine learning, further reduced to deep learning. In that worldview, prediction is indeed the essence.
Success also brings power and influence. The pioneers of deep learning, vindicated after decades of research, have been elevated to the status of prophets. Their statements become indiscriminate fodder for tweets, soundbites, and pitch decks; their research agendas and roadmaps set the course for an entire industry.
And given the fungibility of these concepts, any functional gap in machine intelligence may be conceived as a problem of prediction. If your roadmap is rooted in prediction as the essence, the law of the hammer dictates that it’s a prediction problem. And when cast in prescriptive terms to an adoring industry, the question is no longer questioned. It becomes truth.
Prediction is the essence of intelligence. And it will remain so until a more dominant technology teaches us otherwise.