"A DIFFERENCE THAT MAKES A DIFFERENCE"
I’ve been thinking about information for years. Way back when I was a graduate student in Oxford, I first encountered information theory and wrote a little bit about it in my dissertation. I realized there was a huge gap between information in the everyday folk sense and information as presented in The Mathematical Theory of Communication by Claude Shannon and Warren Weaver. I explored that gap briefly in that book, but I left it dangling; I didn’t know what to do with it.
It’s been haunting me over the years. More recently, I’ve returned to the subject to see if I could take advantage of some of the other changes of perspective I’ve had over fifty years. Now, with the help of a few other people, I’ve got a pretty good bead on information in both senses and how they fit together. There is what’s sometimes called semantic information, which is information that’s about particular things. And then there’s Shannon information, which is just a measure. Strictly speaking it’s not just a measure, but it’s often quite acceptable to treat it as one. It’s like a bucket of information. How many quarts? How many pints? How many bits? But the bits can be about anything.
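To make that bucket image concrete, here is a minimal sketch, in Python of my own devising rather than anything from Shannon and Weaver, of the measure itself. The point is that the number of bits depends only on the shape of the probabilities, never on what the alternatives are about.

```python
import math

def entropy_bits(probabilities):
    """Shannon entropy H = -sum(p * log2 p), measured in bits.

    The measure cares only about the shape of the distribution,
    not about what the outcomes mean."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# A fair coin carries 1 bit per flip, whatever the coin "means".
print(entropy_bits([0.5, 0.5]))    # 1.0
# Eight equally likely alternatives: 3 bits, whether they are
# eagle alarms, chess moves, or gold fillings.
print(entropy_bits([1/8] * 8))     # 3.0
# A heavily biased source carries less than 1 bit per symbol.
print(entropy_bits([0.9, 0.1]))    # ~0.469
```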
The question is, how does the Shannon sense of information ground or fit with the understanding of information we talk about when we talk about disinformation, or being misinformed, or being a very well-informed person? How does this all fit with the philosophers’ favorite topics of belief and knowledge and so forth? There’s a tradition in philosophy that looks at the root case. For example, Tom believes that P—for some proposition. Tom believes that Larsen pitched the only perfect game in World Series history, to take a threadbare example.
Here we have a case of a person, Tom, who has a belief, and the belief is captured presumably by this proposition. It’s a sentence of English in this case, but one is to understand that that sentence of English is just standing in for something more abstract, roughly the same way that a numeral, Roman or Arabic, stands in for a number. The number is abstract; the numeral is a symbol. The English sentence is an expression of a proposition, and so philosophers have been very eager to get a good theory of propositions.
I played, worked, and struggled with that stuff for years, and grew more and more dissatisfied with it. I wrote a piece twenty-five, thirty years ago called "Beyond Belief," which was my attempt to put that tradition pretty much to bed. And to my satisfaction, I did. It was 1980-something when I wrote that. When I go back to that literature, the so-called propositional attitude task force, so named somewhat jocularly by the participants, here it is all these years later and they’re playing the same games and puzzling over the same examples, and they’re just as far from a solution as ever, as far as I can tell. I’m not sorry that I turned my back on propositions, but I still hadn’t answered the questions I wanted answered.
Having turned my back on propositions, I thought, what am I going to do about this? The area where it really comes up is when you start looking at the contents of consciousness, which is my number one topic. I like to quote Maynard Keynes on this. He was once asked, “Do you think in words or pictures?” to which he responded, “I think in thoughts.” It was a wonderful answer, but also wonderfully uninformative. What the hell’s a thought, then? How does it carry information? Is it like a picture? Is it iconic in some way? Does it resemble what it’s about, or is it like a word that refers to what it’s about without resembling it? Are there third, fourth, fifth alternatives? Looking at information in the brain and then trying to trace it back to information in the genes that must be responsible for providing the design of the brain that can then carry information in other senses, you gradually begin to realize that this does tie in with Shannon’s information theory. There’s a way of seeing information as "a difference that makes a difference," to quote Donald MacKay and Gregory Bateson.
Ever since then, I’ve been trying to articulate, with the help of Harvard evolutionary biologist David Haig, just what meaning is, what content is, ultimately in terms of biological information and physical information, the kind of information presented in The Mathematical Theory of Communication by Shannon and Weaver. There’s a chapter in my latest book called “What is Information?” I stand by it, but it’s under revision. I’m already moving beyond it and realizing there’s a better way of tackling some of these issues.
The key insight, which I’ve known for years, is that we have to get away from the idea of there being the pure ultimate fixed proposition that captures the information in any informational state. This goal of capturing the proposition, this attempt at idealization that philosophers have poured their hearts and souls into for a hundred years is just wrong. Don’t even try. I’m now coming around to wonder why it had such a hold on us. It’s quite obvious once you start thinking this way.
We and only we, among all the creatures on the planet, developed language. Language is very special when it comes to being an information-handling medium because it permits us to talk about things that aren’t present, to talk about things that don’t exist, to put together all manner of concepts and ideas in ways that are only indirectly anchored in our biological experience in the world. Compare it, for instance, with a vervet monkey alarm call. The vervet sees an eagle and issues the eagle alarm call. We can understand that as an alarm signal, and we can see the relationship of the seen eagle and the behavior on the part of the monkey and on the part of the audience of that monkey’s alarm call. That’s a nice root case.
Suppose we start asking what exactly that monkey call means. Does it mean, "Look out! There’s an eagle. It’s up there!" Or does it mean, "Jump into the trees!" Or does it mean, "Oh, help, help, help!" How would we put that alarm call in English? Don’t try. This is the trick. Don’t imagine that the way to have a theory of meaning and interpretation is to treat it as a theory that has as its goal the reduction of everything to some canonical proposition. That’s a very powerful idea, which is just a big mistake.
It’s powerful because, for instance, it allows you to say things like this: We can have a sentence in French, le chat est noir; a sentence in English, the cat is black; a sentence in German, die Katze ist schwarz; and they all mean the same thing. There is one and only one proposition, which is expressed differently in different languages. So, this is a way of giving us a signpost to this object, the proposition.
Propositions are supposed to be idealizations, rather like numbers or vectors or some other abstract formulation. It looks at first very powerful, and for some purposes it’s very useful. But it takes you away from enlightenment because it gives you this false sense that you haven’t really understood something until you’ve figured out how to articulate, how to point to, how to identify the proposition that a particular meaningful event expresses. No. There are all kinds of meaningful events that defy putting in terms of any particular proposition. That doesn’t make them not meaningful. You have to turn the whole thing upside down.
This is what I call David Haig’s strange inversion. Start with the simplest imaginable case, like a bacterium that responds to a gradient in its environment, and that response has a meaning. Well, what does it mean? In a way, don’t ask. It has meaning because the response in one way or another is relevant to the wellbeing of that bacterium. If it’s responding to food by moving towards it, that’s its meaning, and it’s not more precise than that. We have to get away from the idea that that’s a merely figurative or metaphorical case of meaning. It’s as real as meaning ever gets. Then, we start with dead simple cases of meaning, the meanings of the actions of bacteria and the like. We can get even simpler than that and talk about the meanings of the structures of proteins. We can get very simple and treat those as our atoms, those as our basic units. We then see human meaning, the linguistically bound meaning in books, newspapers, and on television, which has many properties that other kinds of meaning don’t have, not as the foundational case but rather as the exotic case of meaning, the case that creates theoretical illusions of various sorts, such as the hunt for the phantom propositions.
You say to some philosophers, “I’m very interested in meaning and interpretation. I don’t think propositions are going to play much of a role,” and many philosophers would tell you you’ve got to be wrong. They’re so wedded to that perspective, which they’ve learned in all of their classes. I was myself wedded to it to a significant degree.
I’m moving away from that. And it’s paying off not just in thinking about information in biology and information in what I call the intentional stance, which obviously ties in very nicely with that, but very specifically with how to talk about the content of states of consciousness or other states of the brain which are not all pictures. Almost none of them are pictures, and they aren’t all words. Many of them are, but that’s not where their meaning resides. So, we are freed up to think about how neural states and transitions can have quite elaborate meaning without it being expressible in any sentence.
I’ve been hinting at this in various ways for years. More than a decade ago I had an example of somebody saying, “Joe has a thing about redheads.” Well, what did I mean by “a thing about redheads”? It was a thing. It was a thing in his brain about redheads. In what sense? Well, whenever redheads were the topic, this thing became active and biased his cognition, his emotional state. He just had a thing about redheads. Okay, so now we have this thing. It’s not a proposition. Don't ask what the proposition is that it expresses or stores in memory. There’s no proposition. What’s true of it is that it’s this physical thing, which has the curious property of being relevant to redheads in that it becomes active whenever the topic is redheads. Whether redheads are seen, or talked about, or remembered, or imagined, this thing becomes active. That's a very crude example of a neural something or other that has meaning in the mind of a particular individual, but it defies translation into any propositional format, and none the worse for it.
The philosophers’ fascination with propositions was mirrored in good old-fashioned AI, the AI of John McCarthy, early Marvin Minsky, and Allen Newell, Herbert Simon, and Cliff Shaw. It was the idea that the way to make an intelligent agent was from the top down. You have a set of propositions in some proprietary formulation. It’s not going to be English—well, maybe LISP or something like that, where you define all the predicates and the operators. Then, you have this huge database that is beautifully articulated and broken up into atoms of meaning, which have the meaning they have by being part of the system they’re part of. You stipulate their meanings, and then you have a resolution theorem prover that sits on top of that, and this is how we’re going to generate a thoughtful, creative mind. It was a very attractive dream to many people, and it’s not quite dead yet.
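For readers who never met this style in the flesh, here is a minimal sketch of the GOFAI recipe, with predicates, facts, and rules invented purely for illustration: a stipulated database of atomic propositions and a little engine that grinds out their consequences from the top down.

```python
# A toy of the GOFAI style: stipulated atomic propositions plus rules,
# and an engine that derives consequences from them.
# The predicates and rules are invented for illustration only.

facts = {("bird", "tweety"), ("penguin", "opus")}

# Each rule: if all premises hold of some individual X, add the conclusion.
rules = [
    ((("penguin", "X"),), ("bird", "X")),
    ((("bird", "X"),), ("has_wings", "X")),
]

def forward_chain(facts, rules):
    """Keep applying rules until no new propositions appear."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, (head_pred, _) in rules:
            # Try every individual we know about as a binding for X.
            individuals = {arg for _, arg in derived}
            for x in individuals:
                if all((pred, x) in derived for pred, _ in premises):
                    new_fact = (head_pred, x)
                    if new_fact not in derived:
                        derived.add(new_fact)
                        changed = True
    return derived

print(forward_chain(facts, rules))
# Derives ("bird", "opus"), ("has_wings", "tweety"), ("has_wings", "opus")
# on top of the stipulated facts.
```

Everything these symbols mean is stipulated by the programmer; nothing in the system earns its meaning the way the bacterium’s response to a gradient does.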
This early attempt in AI, which went through well into the ‘80s, gave us many wonderful things. It’s important to recognize how much we now take for granted in AI and in computers that grew out of the explorations of the good old-fashioned AI pioneers. But it hit a brick wall. McCarthy and Hayes discovered the frame problem. There were other intractable issues, and then along came connectionism, and then, more recently, reinforcement learning and deep learning. People have moved away from this ideal of a canonical expression of specific propositions as in an axiom system.
I’ve used the term good old-fashioned AI or GOFAI, which was coined by my late dear friend John Haugeland in his book, Artificial Intelligence: The Very Idea. He was an early, influential, and knowledgeable critic of the field. He had earned the right to coin the amusing name “good old-fashioned AI” when it was still regarded as very promising by many people. He was already foreseeing its demise correctly in many regards, but the fact is that the fruits of good old-fashioned AI are all around us.
In fact, the Internet is largely based on good old-fashioned AI. When people talk today about semantic search as opposed to string search, they’re talking about going beyond what you can do with the methods of good old-fashioned AI, which amount to string search till the cows come home, and going deeper to get at the semantic meanings of what’s out there.
Those are the semantic meanings that the philosophers were trying to capture with their notion of propositions. If that’s just a fool’s errand, as I now think, then one has to acknowledge that semantic search, the semantic web, can never be anything more than a useful approximation. We should be very keen to recognize that and recognize that we’re not going to get canonical, clear, provable semantics out of most of what’s on there. You can get some because some areas of inquiry and datasets are wonderfully well articulated and organized and are just built for that kind of exploration. But most of the web isn’t, and that’s the way it’s going to continue.
That’s where deep learning comes in, especially Bayesian nets and so forth, because these are tools for extracting, by amazing neo-Darwinian methods, meaningful patterns from all of this diverse material. And the beautiful thing about it is they extract meaningful patterns without knowing what the meaning is, and that takes us back, interestingly enough, to Shannon and bits and differences that make a difference. Even when you can’t say what difference they make, you can say they make a difference.
What deep learning is now doing is coming up with competences that were unimaginable just a few years ago because they don’t depend on GOFAI-type comprehension. They don’t depend on being able to deduce from axioms that this has to be the meaning of this or that. They’re not deductive at all.
If you go to Google and search for something, they will search for exactly the sequence of letters you put in. Now, it’s getting better—if you think it’s better—in that it will try to guess what you really want to ask about, and if it thinks you’ve misspelled something, it will try an alternative spelling.
A lot of people object to that because they want to do research that depends on actual string search, but the techniques that Google and others are developing for going beyond the string and trying to suss out the intended meaning of the search are new territory. We should view this with some real caution and be careful that we know when it’s being used and when it isn’t because it may very well misinterpret what you want, and you’ll never know unless you can somehow peel back behind the screen and see what’s actually going on.
Those methods are the ones that are intended to make the web more semantic in that you can look things up by meaning and not just by strict name, not by just the symbols in the string. All of these issues are related.
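As a rough sketch of that contrast, and only a sketch: real engines are vastly more elaborate, and the word-overlap score below is a crude stand-in for whatever they actually use to get at meaning, not a claim about their methods.

```python
# Contrast a literal string search with a crude "semantic" lookup.
# The overlap score is a stand-in for richer models; illustrative only.

documents = [
    "Don Larsen pitched the only perfect game in World Series history",
    "The vervet monkey gives a distinct alarm call for eagles",
    "Claude Shannon measured information in bits",
]

def string_search(query, docs):
    """Return documents containing the exact query substring."""
    return [d for d in docs if query in d]

def overlap_search(query, docs):
    """Rank documents by how many query words they share."""
    q_words = set(query.lower().split())
    scored = [(len(q_words & set(d.lower().split())), d) for d in docs]
    return [d for score, d in sorted(scored, reverse=True) if score > 0]

print(string_search("perfect game", documents))   # exact substring only
print(overlap_search("monkey eagle warning", documents))
# The overlap search still finds the vervet sentence, even though the
# literal string "monkey eagle warning" appears nowhere in it.
```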
The nice phrase “the difference that makes a difference”—obviously, there are differences everywhere. This grain of sand is here, not there. This grain of sand is a little different shape than that one. Some differences, however, have roles to play in larger circumstances, and it may be the butterfly that flaps its wings that causes the hurricane. If we want to have a general term for differences that make a difference, then that’s the information in the Shannon sense in that it can give rise to a correlation which can then make a further difference, and further differences, and further differences.
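One way to make “gives rise to a correlation” concrete without saying what the difference is about is mutual information; this small sketch, with invented example data, measures in bits how much of a difference one variable makes to another.

```python
from collections import Counter
from math import log2

def mutual_information(pairs):
    """Mutual information I(X;Y) in bits, estimated from observed pairs.

    It quantifies how much of a difference X makes to Y (and vice versa)
    without saying anything about what X and Y are about."""
    n = len(pairs)
    p_xy = Counter(pairs)
    p_x = Counter(x for x, _ in pairs)
    p_y = Counter(y for _, y in pairs)
    return sum((c / n) * log2((c / n) / ((p_x[x] / n) * (p_y[y] / n)))
               for (x, y), c in p_xy.items())

# A signal perfectly correlated with its source: 1 bit of difference made.
print(mutual_information([("up", 1), ("down", 0)] * 50))        # 1.0
# A signal statistically unrelated to its source: no difference made.
print(mutual_information([("up", 1), ("up", 0),
                          ("down", 1), ("down", 0)] * 25))       # 0.0
```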
I’ve talked about inert historical facts. Those are differences that don’t make a difference. My stock example is this: Some of the gold in my teeth once belonged to Julius Caesar. Or, none of the gold in my teeth ever belonged to Julius Caesar, not a single milligram. One of those is true. That’s a difference that makes no difference. I doubt if that information exists in the world today. It could. We can prove that some things belonged to some people who have been dead for centuries. We have an informational chain, but it’s a way of thinking about information that is maximally simple and shorn of many of its usual connotations of codes, and signals, and languages.