Where does mathematics come from? I'm not talking about the philosophical question of whether mathematical truths have an existence independent of human minds. I mean concretely. What on earth is this field? When a mathematician makes cryptic pronouncements like, "We define the entropy of a probability distribution to be such-and-such," who or what led them to explore that definition over any other? Are they accepting someone else's definition? Are they making up the definition themselves? Where exactly does this whole dance begin?
Mathematics textbooks contain a mixture of the timeless and the accidental, and it isn't always easy to tell exactly which bits are necessary, inevitable truths, and which are accidental social constructs that could have easily turned out differently. An anthropologist studying mathematicians might notice that, for some unspecified reason, this species seems to prefer the concept of "even numbers" over the barely distinguishable concept of "(s?)even numbers," where a "(s?)even number" is defined to be a number that's either (a) even, or (b) seven. Now, when it comes to ad hoc definitions, this example is admittedly an extreme case. But in practice not all cases are quite this clear-cut, and our anthropologist still has an unanswered question: What is it that draws the mathematicians to one definition and not the other? How do mathematicians decide which of the infinity of possible mathematical concepts to define and study in the first place?
The secret is a piece of common unspoken folk-knowledge among mathematicians, but being an outsider, our anthropologist has no straightforward way of discovering it, since for some reason the mathematicians don't often mention it in their textbooks.
The secret is that, although it is legal in mathematics to arbitrarily choose any definitions we like, the best definitions aren't just chosen: they're derived.
Definitions are supposed to be the starting point of a mathematical exploration, not the result. But behind the scenes, the distinction isn't always so sharp. Mathematicians derive definitions all the time. How do you derive a definition? There's no single answer, but in a surprising number of cases, the answer turns out to involve an odd construct known as a functional equation. To see how this happens, let's start from the beginning.
Equations are mathematical sentences that describe the behaviors and properties of some (often unknown) quantity. A functional equation is just a mathematical sentence that says something about the behavior not of an unknown number but of an entire unknown function. This idea seems pretty mundane at first glance. But its significance becomes clearer when we realize that, in a sense, what a mathematical sentence of this form gives us is a quantitative representation of qualitative information. And it is exactly that kind of representation that is needed to create a mathematical concept in the first place.
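A classic warm-up example (the standard textbook one, not specific to this essay): suppose all we know about some unknown function f is one qualitative behavior, that combining inputs combines outputs. Written as a functional equation:

```latex
% One qualitative sentence about behavior, abbreviated into symbols:
f(x + y) = f(x) + f(y) \quad \text{for all real } x, y.
% Together with continuity, this already forces a specific quantitative form:
f(x) = c\,x \quad \text{for some constant } c.
```

One vague-sounding sentence about behavior, plus a mild regularity assumption, pins the function down to a single family. This is the move we'll see over and over below.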
When Claude Shannon invented information theory, he needed a mathematical definition of uncertainty. What's the right definition? There isn't one. But, whatever definition we choose, it should act somewhat like our everyday idea of uncertainty. Shannon decided he wanted his version of uncertainty to have three behaviors. Paraphrasing heavily: (1) Small changes to our state of knowledge only cause small changes to our uncertainty (whatever we mean by "small"), (2) Dice with more sides are harder to guess (and the dice don't actually have to be dice), and (3) If you stick two unrelated questions together (e.g., "What's your name?" and "Is it raining?"), your uncertainty about the whole thing should just be the first one's uncertainty plus the second one's (i.e., independent uncertainties add). These all seem pretty reasonable.
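The three behaviors translate almost directly into symbols. Here is a hedged paraphrase (Shannon's own axioms in "A Mathematical Theory of Communication" are worded slightly differently, and the uniqueness proof fills in the details), writing H for the uncertainty of a probability distribution:

```latex
% (1) Continuity: H varies continuously in the probabilities p_1, \ldots, p_n.
% (2) Dice with more sides are harder to guess:
H\!\left(\tfrac{1}{n},\ldots,\tfrac{1}{n}\right)
\ \text{ is monotonically increasing in } n.
% (3) Independent uncertainties add:
H(p_1 q_1,\ \ldots,\ p_n q_m) \;=\; H(p_1,\ldots,p_n) + H(q_1,\ldots,q_m).
```

Notice that the unknown here isn't a number. It's the function H itself, which is exactly what makes these functional equations.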
In fact, even though we're allowed to define uncertainty however we want, any definition that didn't have Shannon's three properties would have to be at least a bit weird. So Shannon's version of the idea is a pretty honest reflection of how our everyday concept behaves. However, it turns out that just those three behaviors above are enough to force the mathematical definition of uncertainty to look a particular way.
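To make that forcing concrete, here's a minimal numerical sketch (my own illustration, not Shannon's): the formula his three behaviors single out, H(p) = -Σ pᵢ log₂ pᵢ, checked against behaviors (2) and (3). The two distributions are made up for the sake of the example.

```python
import math
from itertools import product

def entropy(dist):
    """Shannon entropy in bits: H(p) = -sum(p_i * log2(p_i))."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

def fair(n):
    """Uniform distribution over n equally likely outcomes."""
    return [1 / n] * n

# Behavior (2): dice with more sides are harder to guess.
assert entropy(fair(2)) < entropy(fair(4)) < entropy(fair(6))

# Behavior (3): independent uncertainties add. The joint distribution
# of two unrelated questions is all pairwise products p_i * q_j.
name = [0.5, 0.3, 0.2]   # made-up distribution over possible names
rain = [0.7, 0.3]        # made-up distribution over rain / no rain
joint = [p * q for p, q in product(name, rain)]
assert math.isclose(entropy(joint), entropy(name) + entropy(rain))
```

The point isn't the code itself; it's that any function failing these checks wouldn't deserve the name "uncertainty," and (per Shannon's proof) essentially only this one passes them.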
Our vague, qualitative concept directly shapes the precise, quantitative one. (And it does so because the three English sentences above are really easy to turn into three functional equations. Basically just abbreviate the English until it looks like mathematical symbols.) This is, in a very real sense, how mathematical concepts are created. It's not the only way. But it's a pretty common one, and it shows up in the foundational definitions of other fields too.
It's the method Richard Cox used to prove that (assuming degrees of certainty can be represented by real numbers) the formalism we call "Bayesian probability theory" isn't just one ad hoc method among many, but is in fact the only method of inference under uncertainty that reduces to standard deductive logic in the special case of complete information, while obeying a few basic qualitative criteria of rationality and internal consistency.
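One strand of Cox's argument, heavily compressed (and hedged: the exact regularity conditions vary by presentation), runs like this. Assume the plausibility of "A and B" given C depends only on two simpler plausibilities:

```latex
% Qualitative assumption, abbreviated into symbols:
(A \wedge B \mid C) \;=\; F\big[\,(A \mid B \wedge C),\ (B \mid C)\,\big].
% Associativity of "and" then forces a functional equation on F:
F\big[F(x, y),\ z\big] \;=\; F\big[x,\ F(y, z)\big].
% The well-behaved solutions can be rescaled into the familiar product rule:
p(A \wedge B \mid C) \;=\; p(A \mid B \wedge C)\, p(B \mid C).
```

Same recipe as Shannon's: an English sentence about behavior, abbreviated into a functional equation, whose solutions turn out to be essentially unique.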
It's the method behind the mysterious assertions you might have heard if you've ever spent any time eavesdropping on economists: A "preference" is a binary relation that satisfies such-and-such. An "economy" is an n-tuple behaving like etcetera. These statements aren't as crazy as they might seem from the outside. The economists are doing essentially the same thing Shannon was. It may involve functional equations, or it may take some other form, but in every case it involves the same translation from qualitative to quantitative that functional equations so elegantly embody.
The pre-mathematical use of functional equations to derive and motivate our definitions exists on a curious boundary between vague intuition and mathematical precision. It is the DMZ where mathematics meets psychology. And although the term "functional equation" isn't nearly as attention-grabbing as the underlying concept deserves, functional equations offer valuable and useful insights into where mathematical ideas come from.