Many of the advances in artificial intelligence that have made the news recently have involved artificial neural networks—large systems of simple elements that interact in complex ways, inspired by the simplicity and complexity of neurons and brains. New methods for building "deep" networks with many layers of neurons have met or exceeded the state of the art for problems as diverse as understanding speech, identifying the contents of images, and translating languages. For anybody interested in artificial and natural intelligence, these successes raise two questions: First, should all thinking machines resemble brains? Second, what do we learn about real brains (and minds) by exploring artificial ones?
When a person tries to interpret data—whether it's figuring out the meaning of a word or making sense of the actions of a colleague—there are two ways to go wrong: being influenced too much by preconceptions, and being influenced too much by the data. Your preconceptions might get in the way when you assume that a word in a new language means the same thing as a word in a language you already know, like deciding that "gâteau" and "gato" are the same thing in French and Spanish (which could have dire consequences, for both pets and birthday parties). You might be influenced too much by the data when you decide that your colleague hated your idea, when in fact he was short-tempered after being up all night with a sick kid (nothing to do with you at all).
Computers trying to interpret data—to learn from their input—run into exactly the same problems. Much machine learning research comes down to a fundamental tension between structure and flexibility. More structure means more preconceptions, which can be useful in making sense of limited data but can result in biases that reduce performance. More flexibility means a greater ability to capture the patterns that appear in data but a greater risk of finding patterns that aren't there.
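In machine learning this tension is known as the bias–variance tradeoff, and "finding patterns that aren't there" is overfitting. Here is a minimal pure-Python sketch of it; the sin(2x) curve, the noise level, and both models are illustrative assumptions, not anything from the essay. A structured model presumes the data lie on a straight line; a maximally flexible one-nearest-neighbor model simply memorizes its training data, noise and all.

```python
import math
import random

def make_data(n, rng, noise=0.35):
    # Noisy samples of an assumed curved "truth": y = sin(2x) + noise.
    xs = sorted(rng.uniform(0.0, 3.0) for _ in range(n))
    ys = [math.sin(2 * x) + rng.gauss(0, noise) for x in xs]
    return xs, ys

def fit_line(xs, ys):
    # Structured model: assume y = a + b*x and fit by least squares.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    return lambda x: a + b * x

def fit_1nn(xs, ys):
    # Flexible model: predict the y of the nearest training point.
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

rng = random.Random(0)
xs, ys = make_data(30, rng)
line, nn = fit_line(xs, ys), fit_1nn(xs, ys)

print("training error, structured (line):", round(mse(line, xs, ys), 3))
print("training error, flexible (1-NN): ", round(mse(nn, xs, ys), 3))
```

The flexible model's training error is exactly zero, because every training point is its own nearest neighbor: what it has "learned" includes the random noise, which is precisely a pattern that isn't there.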
In artificial intelligence research, this tension between structure and flexibility manifests in different kinds of systems that can be used to solve challenging problems like speech recognition, computer vision, and machine translation. For decades, the systems that performed best on these problems came down on the side of structure: they were the result of careful planning, design, and tweaking by generations of engineers who thought about the characteristics of speech, images, and syntax and tried to build into the system their best guesses about how to interpret these particular kinds of data. The recent breakthroughs using artificial neural networks come down firmly on the side of flexibility: they use a set of principles that can be applied in the same way to many different kinds of data—meaning that they have weak preconceptions about any particular kind of data—and they allow the system to discover how to make sense of its inputs.
Artificial neural networks are now arguably discovering better representations of speech, images, and sentences than the ones designed by those generations of engineers, and this is the key to their high performance. This victory of flexibility over structure is partly the result of innovations that have made it possible to build larger artificial neural networks and to train them quickly. But it's also partly the result of an increase in the amount of data that can be supplied to these neural networks. We have more recorded speech, more labeled images, and more documents in different languages than ever before, and the amount of data available changes where the balance between structure and flexibility should be struck.
When you don't have a lot of data—when you have to guess based on limited evidence—structure is more important. The guidance of wise engineers helps computers guess intelligently. But when you do have a lot of data, flexibility is more important. You don't want your system to be limited to the ideas that those engineers could come up with, if there's enough data to allow the computer to come up with better ideas. So machine learning systems that emphasize flexibility—like artificial neural networks—will be most successful at solving problems where large amounts of data are available, relative to what needs to be learned.
This insight—that having more data favors more flexibility—provides the answer to our two questions about artificial and natural intelligence. First, thinking machines should resemble brains—insofar as artificial neural networks resemble brains—when the problem being solved is one where flexibility trumps structure, where data are plentiful. Second, thinking along these lines can help us understand when real brains will resemble artificial neural networks: it tells us which aspects of the human mind are best viewed as the result of general-purpose learning algorithms that emphasize flexibility over structure, and which as the result of built-in preconceptions about the world and what it contains. Fundamentally, the answer will be governed by the quantity of data available and the complexity of what is to be learned.
Many of the great debates in cognitive science—such as how children learn language and become able to interpret the actions of others—come down to exactly these questions about the data available and the knowledge acquired. To address these questions we try to map out the inputs to the system (what children hear and see), characterize the result (what language is, what knowledge underlies social cognition), and explore different kinds of algorithms that might provide a bridge between the two.
The answers to these questions aren't just relevant to understanding human minds. Despite recent advances in artificial intelligence, human beings are still the best example we have of thinking machines. By identifying the quantity and the nature of the preconceptions that inform human cognition we can lay the groundwork for bringing computers even closer to human performance.