Structural insight: 2015

Thursday, October 1, 2015

Natural intelligence

[...] man had always assumed that he was more intelligent than dolphins because he had achieved so much — the wheel, New York, wars and so on — while all the dolphins had ever done was muck about in the water having a good time. But conversely, the dolphins had always believed that they were far more intelligent than man — for precisely the same reasons.
— Douglas Adams, The Hitchhiker's Guide to the Galaxy, Chapter 23.

I have a few things I want to say here about the nature of natural, as opposed to artificial, intelligence. While I'm (as usual) open to all sorts of possibilities, the theory of mind I primarily favor atm has a couple of characteristics that I see are contrary to prominent current lines of thought, so I want to try to explain where I'm coming from, and perhaps something of why.

Contents
The sapience engine
Scale
Evolution
Unlikelihood
Sapience

The sapience engine

In a nutshell, my theory is that the (correctly functioning) human brain is a sapience engine — a device that manipulates information (or, stored and incoming signals, if you prefer to put it so) in a peculiar way that gives rise to a sapient entity. The property of sapience itself is a characteristic that arises from the peculiar nature of the manipulation being done; the resulting entity is sapient not because of how much information it processes, but because of how it processes it.

This rather simple theory has some interesting consequences.

Scale

There's an idea enjoying some popularity in AI research these days, that intelligence is just a matter of scale. (Ray Dillinger articulated this view rather well, imho, in a recent blog post, "How Smart is a Smart AI?".) I can see at least four reasons this view has appeal in the current intellectual climate.

Doing computation on a bigger and bigger scale is what we know how to do. Compounding this, those who pursue such a technique are rewarded, both financially and psychologically, for enthusing about the great potential of what they're pursuing. And of course, what we know how to do doesn't get serious competition for attention, because the alternative is stuff we don't know how to do and that doesn't play nearly as well. Better still, by ascribing full sapience to a range of computational power we haven't achieved yet, we absolve ourselves of blame for having not yet achieved full AI. Notwithstanding that just because we know how to do it doesn't mean it has to be able to accomplish what we want it to.
The more complex a computer program gets — the more branching possibilities it encompasses and the bigger the database it draws on — the more effectively it can fool us into seeing its behavior as "like us". (Ray Dillinger's post discusses this.)
The idea that sapience is just a matter of scale appeals to a certain self-loathing undercurrent in modern popular culture. It seems to have become very trendy to praise the cleverness of other species, or of evolution itself, and emphasize how insignificant we supposedly are; in its extreme form this leads to the view that there's nothing at all special about us, we're just another species occupying an ecological niche no more "special" than the niches of various kinds of ants etc. (My subjective impression is that this trend correlates with environmentalism, though even if the correlation is real I'm very wary of reading too much into it. I observe the trend in Derek Bickerton's Adam's Tongue, 2009, which I wouldn't otherwise particularly associate with environmentalism.)
The "scale" idea also gets support from residual elements of the previously popular, opposing view of Homo sapiens as special. Some extraordinary claims are made about what a vast amount of computing power is supposedly possessed by the human brain, evidently supposing that the human brain is actually doing computation, exploiting the computational potential of its many billions of neurons in a computationally efficient way. As opposed to, say, squandering that computational potential in an almost inconceivably inefficient way in order to do something that qualitatively isn't computation. The computational-brain idea also plays on the old "big brain" idea in human evolution, which supposed that the reason we're so vastly smarter than other primates is that our brains are so much bigger. (Terrence Deacon in The Symbolic Species, 1997, debunking the traditional big-brain idea at length, notes its appeal to the simplistic notion of the human brain as a computer.)

I do think scale matters, but I suspect its role is essentially catalytic (Deacon also expresses a catalytic view); and, moreover, I suspect that beyond a certain point, bigger starts to degrade sapience rather than enhance it. I see scale coming into play in two respects. As sketched in my previous post on the subject (here), I conjecture the key device is a non-Cartesian theater, essentially short-term memory. There are two obvious size parameters for adjusting this model: the size of the theater, and the size of the audience. I suspect that with too small an audience, the resultant entity lacks efficacy, while with too large an audience, it lacks coherence. Something similar seems likely to apply to theater size; I don't think the classic "seven plus or minus two" size of human short-term memory is at all arbitrary, nor strongly dependent on other constraints of our wetware (such as audience size).

Note that coherent groups of humans, though they represent collectively a good deal more computational potential than a single human, are generally a lot stupider. Committees — though they can sometimes produce quite good results when well-chaired — are notorious for their poor collective skills; "design by committee" is a running joke. A mob is well-described as a mindless beast. Democracy succeeds not because it necessarily produces brilliant results but because it resists the horrors of more purely centralized forms of government. Wikipedia, the most spectacular effort to date to harness the wisdom of the masses, rather thoroughly lacks wisdom, being prone to the vices of both committees and mobs. (Do I hate Wikipedia? No. I approve deeply of some of the effects it has had on the world, while deploring others; that's a complicated subject for another time.) One might suppose that somehow the individual people in a group are acting a bit like neurons (or some mildly larger unit of brain structure), and one would need a really big group of people before intelligence would start to reemerge, but honestly I doubt it. Once you get past a "group of one", the potential intelligence of a group of people seems to max out at a well-run committee of about six, and I see no reason to think it'll somehow magically reemerge later. Six, remember, is roughly the size of short term memory, and I've both wondered myself, and heard others wonder, if this is because the individuals on the committee each have short term memories of about that size; but as an alternative, I wonder if, just possibly, the optimal committee size is not so much an echo of the size of the non-Cartesian theater as a second example of the same deep phenomenon that led the non-Cartesian theater to have that size in the first place.

Evolution

As might be guessed from the above, since my last blog post on this subject I've been reading Derek Bickerton's Adam's Tongue (2009) and Terrence Deacon's The Symbolic Species (1997, which was recommended to me by a commenter on my earlier post). Both have a fair amount to say about Noam Chomsky, mostly in the nature of disagreement with Chomsky's notion of a universal language instinct hardwired into the brain.

But it struck me, repeatedly throughout both books, that despite Deacon's disagreements with Chomsky and Bickerton's disagreements with Deacon and Chomsky, all three were in agreement that communication is the essence of the human niche, and sapience is an adjunct to it. I wondered why they thought that, other perhaps than two of them being linguists and therefore inclined to see their own subject in whatever they look at (which could as well explain why I look at the same things and see an algorithm). Because I don't altogether buy into that linguistic assumption. They seem to be dismissing a possibility that imho is worth keeping in play for now.

There's a word I picked up from Deacon: exaptation. Contrasting with adaptation by changing prefix ad- to ex-. The idea is that instead of a species feature developing as an adaptation for a purpose that the species finds beneficial, the feature develops for some other purpose and then, once available, gets exapted for a different purpose. The classic example is feathers, which are so strongly associated with flight now, that it's surprising to find they were apparently exapted to that purpose after starting as an adaptation for something else (likely, for insulation).

So, here's my thought. I've already suggested, in my earlier post, that language is not necessary to sapient thought, though it does often facilitate it and should naturally arise as a consequence of it. What if sapience was exapted for language after originally developing for some other purpose?

For me, the central question for the evolution of human sapience is why it hadn't happened before. One possible answer is, of course, that it had happened before. I'm inclined to think not, though. Why not? Because we're leaving a heck of a big mark on the planet. I'm inclined to doubt that some other sapient species would have been less capable or, collectively, more wise; so it really seems likely to me that if this had happened before we might have noticed. (Could it have happened before without our noticing? Yes, but as I see it Occam's Razor doesn't favor that scenario.)

To elaborate this idea — exaptation of sapience for language — and to put it into perspective with the alternatives suggested by Deacon and Bickerton, I'll need to take a closer look at how an evolutionary path might happen to be extremely unlikely.

Unlikelihood

Evolution works by local search in the space of possible genetic configurations: imagining a drastically different design is something a sapient being might do, not something evolution would do. At any given point in the process, there has to be a currently extant configuration from which a small evolutionary step can reach another successful configuration. Why might a possible target configuration (or a family of related ones, such as "ways of achieving sapience") be unlikely to happen? Two obvious contributing factors would be:

the target is only successful under certain external conditions that rarely hold, so that most of the time, even if an extant species were within a short evolutionary step of the target, it wouldn't take that step because it wouldn't be advantageous to do so.
there are few other configurations in the close neighborhood of the target, so that it's unlikely for any extant species to come within a short evolutionary step of the target.
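
To make the "local search" picture concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: configurations are short bit-tuples, a "small evolutionary step" is a single bit flip, and the `lonely_peak` fitness landscape is a hypothetical one chosen so that the target has few useful neighbors (the second factor above). It is not a model of any real genome.

```python
from typing import Callable, Iterator, Tuple

Config = Tuple[int, ...]

TARGET: Config = (1, 1, 1, 1)  # hypothetical target configuration


def neighbors(config: Config) -> Iterator[Config]:
    """Yield every configuration one small step (a single bit flip) away."""
    for i in range(len(config)):
        yield config[:i] + (1 - config[i],) + config[i + 1:]


def lonely_peak(config: Config) -> float:
    """Hypothetical landscape: the target is very fit, but everywhere else
    fitness falls as the configuration gets closer to the target."""
    if config == TARGET:
        return 10.0
    return -float(sum(config))


def local_search(start: Config,
                 fitness: Callable[[Config], float],
                 max_steps: int = 100) -> Config:
    """Greedy local search: take the best single step while it is an
    improvement, and stop as soon as no neighbor is strictly fitter."""
    current = start
    for _ in range(max_steps):
        best = max(neighbors(current), key=fitness)
        if fitness(best) <= fitness(current):
            return current
        current = best
    return current


# A species already one step away does take the last step...
near = local_search((1, 1, 1, 0), lonely_peak)
# ...but a species two steps away drifts off in the other direction.
far = local_search((1, 1, 0, 0), lonely_peak)
```

In this toy landscape the search only ever arrives at `TARGET` when it starts from one of the four configurations adjacent to it; from anywhere else, each advantageous small step leads away.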

To produce an unusually unlikely evolutionary target, it seems we should combine both of these sorts of factors. So suppose we have an evolutionary target that can only be reached from a few nearby configurations, and consider how a species would get to one of these nearby configurations. The species must have gotten there by taking a small evolutionary step that was advantageous at the time, right? Okay. I submit that, if the target is really to be hard to get to, we should expect this small step to a nearby configuration to have a different motivation than the last step to the target. This is because, if these two consecutive steps were taken for the same reason, then at any time a species takes the penultimate step, at that same time the last step should also be viable. That would cancel out our supposition that the target is only occasionally viable: any time it's viable to get to the nearby configuration, it's also viable to continue on to the target. In which case, for the target to be really difficult to get to, its nearby neighbors would have to be difficult to get to, merely pushing the problem of unlikelihood back from the target to its nearby neighbors, and we start over with asking why its neighbors are unlikely.

In other words, in order for the evolutionary target to be especially unlikely, we should expect it to be reached by an exaptation of something from another purpose.
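
The argument above can be put as a back-of-envelope calculation. This is only a sketch under loudly stated assumptions: `p_near` (the chance, in a given epoch, that some species sits one step from the target) and `p_conditions` (the chance that external conditions reward the last step) are hypothetical numbers, and when the two steps have different motivations I treat the two events as independent. None of these names or figures come from Deacon or Bickerton.

```python
def p_final_step(p_near: float, p_conditions: float,
                 same_motivation: bool) -> float:
    """Chance, per epoch, that the last step to the target is actually taken."""
    if same_motivation:
        # The conditions that rewarded the penultimate step also reward the
        # last one, so being near the target already implies the last step
        # is viable; all the unlikelihood is pushed back onto p_near.
        return p_near
    # Different motivations (exaptation): assumed independent, so the
    # two sources of unlikelihood multiply.
    return p_near * p_conditions


def expected_epochs(p: float) -> float:
    """Mean waiting time for an event with per-epoch probability p
    (the mean of a geometric distribution)."""
    if p <= 0.0:
        raise ValueError("probability must be positive")
    return 1.0 / p


# Hypothetical figures: each factor alone is a 1-in-100 event.
same = p_final_step(0.01, 0.01, same_motivation=True)
exapted = p_final_step(0.01, 0.01, same_motivation=False)
```

With these made-up figures, the same-motivation case waits on the order of a hundred epochs while the exaptation case waits on the order of ten thousand, which is the sense in which combining both factors yields an unusually unlikely target.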

(I'm actually unclear on whether or not feathers may be an example of this effect. Clearly they aren't necessary for flight, witness flying insects and bats; but without considerably more biological flight expertise, I couldn't say whether there are technical characteristics of feathered flight that have not been achieved by any other means.)

Sapience

This is one reason I'm doubtful of explaining the rarity of sapience while concentrating on communication. Lots of species communicate; so if sapience were an adaptation for communication, why would it be rare? Bickerton's book proposes a specific niche calling for enhanced communication: high-end scavenging, where bands of humans must cooperate to scavenge the remains of dead megafauna. Possible, yes; but I don't feel compelled by it. The purpose — the niche — doesn't seem all that unlikely to me.

Deacon's book proposes a more subtle, though seemingly closely related, purpose. Though less specific about the exact nature of the niche being occupied — which could be high-end scavenging, or perhaps group hunting — he suggests that in order to exploit the niche, hominins had to work together in large bands containing multiple mated pairs. This is tricky, he says, because in order for these group food-collection expeditions to be of major benefit to the species, those who go on the expeditions must find it in their own genetic self-interest to share the collected food with the stay-at-home nurturers and young. He discusses different ways to bring about the required self-interest motive; but evolution works, remember, in small steps, so not all of these strategies would be available for our ancestors. He suggests that the strategy they adopted for the purpose — the strategy, we may suppose, that was reachable by a small evolutionary step — was to have each mated pair enter into a social contract, essentially a marriage arrangement, in which the female agrees to mate only with a particular male in exchange for receiving food from that male's share. The arrangement holds together so long as they believe each other to be following the rules, and this requires intense communication between them plus sophisticated reasoning about each other's future behavior.

I do find (to the extent I understand them) Deacon's scenario somewhat more plausible than Bickerton's, in that it seems to provide more support for unlikelihood. Under Bickerton, a species tries to exploit a high-end scavenging niche, and the available solution to the coordination problem is proto-language. (He describes various other coordination techniques employed by bees and ants.) Under Deacon, a species tries to exploit a high-end scavenging-or-hunting niche, and the available solution to the cooperation problem is a social contract supported by symbolic thought. In either scenario, the species is presented with an opportunity that it can only exploit with an adaptation. For this to support unlikelihood, the adaptation has to be something that under most circumstances would not have been the easiest small-step solution to the challenge. Under Bickerton, the configuration of the species must make proto-language the closest available solution to the coordination problem. Under Deacon, the configuration of the species must make symbolic thinking the closest available solution to the cooperation problem. This is the sense in which, as I said, I find Deacon's scenario somewhat more plausible.

However, both scenarios seem to me to be missing something important. Both of them are centrally concerned with identifying a use (coordination per Bickerton, cooperation per Deacon) to which the new feature is to be put: they seek to explain the purpose of an adaptation. By my reasoning above, though, either the target of this adaptation should be something that's almost never the best solution for the problem, or the target should only be reachable if, at the moment it's wanted, some unlikely catalyzing factor is already present in the species (thus, available for exaptation). Or, of course, both.

From our end of human evolution, it seems that sapience is pretty much infinitely versatile, and so ought to be a useful adaptation for a wide variety of purposes. While this may be so, when conjecturing it one should keep in mind that if it is so, then sapience should be really difficult to achieve in the first place — because if it were both easy to achieve and useful for almost everything, one would expect it to be a very common development. The more immediately useful it is once achieved, the more difficult we'd expect it to be to achieve in the first place. I see two very plausible hypotheses here:

Sapience itself may be an adaptation for something other than communication. My previous post exploring sapience as a phenomenon (here) already suggested that sapience once achieved would quickly be exapted for communication. My previous posts regarding verbal culture (starting here) suggest that language, once acquired, may take some time (say, a few million years) to develop into a suitable medium for rapid technological development; so the big payoff we perceive from sapience would itself be a delayed exaptation of language, not contributing to its initial motivation. Deacon suggests there are significant costs to sapience, so that its initial adoption has to have a strong immediate benefit.
Sapience may require, for its initial emergence, exaptation of some other relatively unlikely internal feature of the mind. This calls for some deep mulling over, because we don't have at all a firm grasp of the internal construction of sapience; we're actually hoping for clues to the internal construction from studying the evolutionary process, which is what we're being offered here if we can puzzle it out.

Putting these pieces together, I envision a three-step sequence. First, hominin minds develop some internal feature that can be exapted for sapience if that becomes sufficiently advantageous to overcome its costs. Second, an opportunity opens up, whereby hominin communities have a lot to gain from group food-collection (be it scavenging or hunting), but to make it work requires sophisticated thinking about future behavior, leading to development of sapience. The juxtaposition of these first two steps is the prime source of unlikelihood. I place no specific requirement on how sapience is applied to the problem; I merely suppose that sapience (symbolic thinking, as Deacon puts it) makes individuals more able to realize that cooperating is in their own self-interest, and doing so is sufficiently advantageous to outweigh the costs of sapience, therefore genes for sapience come to dominate the gene pool. Third, as sapience becomes sufficiently ubiquitous in the population, it is naturally exapted for language, which then further plays into the group cooperation niche as well as synergizing with sapience more broadly. At this point, I think, the process takes on an internal momentum; over time, our ancestors increasingly exploit the language niche, becoming highly optimized for it, and the benefits of language continue to build until they reach critical mass with the neolithic revolution.

Sunday, June 28, 2015

Thinking outside the quantum box

Doctor: Don't tell me you're lost too.
Shardovan: No, but as you guessed, Doctor, we people of Castrovalva are too much part of this thing you call the occlusion.
Doctor: But you do see it, the spatial anomaly.
Shardovan: With my eyes, no — but, in my philosophy.
— Doctor Who, Castrovalva, BBC.

I've made no particular secret, on this blog, that I'm looking (in an adventuresome sort of way) for alternatives to quantum theory. So far, though, I've mostly gone about it rather indirectly, fishing around the edges of the theory for possible angles of attack without ever engaging the theory on its home turf. In this post I'm going to shave things just a bit closer — fishing still, but doing so within line-of-sight of the NO FISHING sign. I'm also going to explain why I'm being so indirect, which bears on what sort of fish I think most likely here.

As a reminder, in previous posts I've mentioned two reasons for looking for an alternative to quantum theory. Both reasons are indirect, considering quantum theory in the larger context of other theories of physics. First, I reasoned that when a succession of theories are getting successively more complicated, this suggests some wrong assumption may be shared by all of them (here). Later I observed that quantum physics and relativity are philosophically disparate from each other (here), a disparity that has been an important motivator for TOE (Theory of Everything) physicists for decades.

The earlier post looked at a few very minor bits of math, just enough to derive Bell's Inequality, but my goal was only to point out that a certain broad strategy could, in a sense, sidestep the nondeterminism and nonlocality of quantum theory. I made no pretense of assembling a full-blown replacement for standard quantum theory based on the strategy (though some researchers are attempting to do so, I believe, under the banner of the transactional interpretation). In the later post I was even less concrete, with no equations at all.

Contents
The quantum meme
How to fish
Why to fish
Hygiene again
The structure of quantum math
The structure of reality

The quantum meme

Why fish for alternatives away from the heart of the quantum math? Aside, that is, from the fact that any answers to be found in the heart of the math already have, presumably, plenty of eyeballs looking there for them. If the answer is to be found there after all, there's no lasting harm to the field in someone looking elsewhere; indeed, those who looked elsewhere can cheerfully write off their investment knowing they played their part in covering the bases — if it was at least reasonable to cover those bases. But going into that investigation, one wants to choose an elsewhere that's a plausible place to look.

Supposing quantum theory can be successfully challenged, I suggest it's quite plausible the successful challenge might not be found by direct assault (even though eventual confrontation would presumably occur, if it were really successful). Consider Thomas Kuhn's account of how science progresses. In normal science, researchers work within a paradigm, focusing their energies on problems within the paradigm's framework and thereby making, hopefully, rapid progress on those problems because they're not distracting themselves with broader questions. Eventually, he says, this focused investigation within the paradigm highlights shortcomings of the paradigm so they become impossible to ignore, researchers have a crisis of confidence in the paradigm, and after a period of distress to those within the field, a new paradigm emerges, through the process he calls a scientific revolution. I've advocated a biological interpretation of this, in which sciences are a variety of memetic organisms, and scientific revolution is the organisms' reproductive process. But if this is so, then scientific paradigms are being selected by Darwinian evolution. What are they being selected for?

Well, the success of science hinges on paradigms being selected for how effectively they allow us to understand reality. Science is a force to be reckoned with because our paradigms have evolved to be very good at helping us understand reality. That's why the scientific species has evolved mechanisms that promote empirical testing: in the long run, if you promote empirical testing and pass that trait on to your descendants, your descendants will be more effective, and therefore thrive. So far so good.

In theory, one could imagine that eventually a paradigm would come along so consistent with physical reality, and with such explanatory power, that it would never break down and need replacing. In theory. However, there's another scenario where a paradigm could get very difficult to break down. Suppose a paradigm offers the only available way to reason about a class of situations; and within that class of situations are some "chinks in the armor", that is, some considerations whose study could lead to a breakdown of the paradigm; but the only way to apply the paradigm is to frame things in a way that prevents the practitioner from thinking of the chinks-in-the-armor. The paradigm would thus protect itself from empirical attack, not by being more explanatory, but by selectively preventing empirical questions from being asked.

What characteristics might we expect such a paradigm to have, and would they be heritable? Advanced math that appears unavoidable would seem a likely part of such a complex. If learning the subject requires indoctrination in the advanced math, then whatever that math is doing to limit your thinking will be reliably done to everyone in the field; and if any replacement paradigm can only be developed by someone who's undergone the indoctrination, that will favor passing on the trait to descendant paradigms. General relativity and quantum theory both seem to exhibit some degree of this characteristic. But while advanced math may be an enabler, it might not be enough in itself. A more directly effective measure, likely to be enabled by a suitable base of advanced math, might be to make it explicitly impossible to ask any question without first framing the question in the form prescribed by the paradigm — as quantum theory does.

This suggests to me that the mathematical details of quantum theory may be a sort of tarpit, that pulls you in and prevents you from leaving. I'm therefore trying to look at things from lots of different perspectives in the general area without ever getting quite so close as to be pulled in. Eventually I'll have to move further and further in; but the more outside ideas I've tied lines to before then, the better I'll be able to pull myself out again.

How to fish

What I'm hoping to get out of this fishing expedition is new ideas, new ways of thinking about the problem. That's ideas, plural. It's not likely the first new idea one comes up with will be the key to unlocking all the mysteries of the universe. It's not even likely that just one new idea would ever do it. One might need a lot of new ideas, many of which wouldn't actually be part of a solution — but the whole collection of them, including all the ones not finally used, helps to get a sense of the overall landscape of possibilities, which may help in turning up yet more new ideas inspired from earlier ones, and indeed may make it easier to recognize when one actually does strike on some combination of ideas that produce a useful theory.

Hence my remark, in an aside in an earlier post, that I'm okay with absurd as long as it's different and shakes up my thinking.

Case in point. In the early 1500s, there was this highly arrogant and abrasive iconoclastic fellow who styled himself Philippus Aureolus Theophrastus Bombastus von Hohenheim; ostensibly our word "bombastic" comes from his name. He rejected the prevailing medical paradigm of his day, which was based on ancient texts, and asserted his superiority to the then-highly-respected ancient physician Celsus by calling himself "Paracelsus", which is the name you've probably heard of him under. He also shook up alchemical theory; but I mention him here for his medical ideas. Having rejected the prevailing paradigm, he was rather in the market for alternatives. He advocated observing nature, an idea that really began to take off after he shook things up. He advocated keeping wounds clean instead of applying cow dung to them, which seems a good idea. He proposed that disease is caused by some external agent getting into the body, rather than by an imbalance of humours, which sounds rather clever of him. But I'm particularly interested that he also, grasping for alternatives to the prevailing paradigm, borrowed from folk medicine the principle of like affects like. Admittedly, you couldn't do much worse than some of the prevailing practices of the day. But I'm fascinated by his latching on to like-affects-like, because it demonstrates how bits of replicative material may be pulled in from almost anywhere when trying to form a new paradigm. Having seen that, it figured later into my ideas on memetic organisms.

It also, along the way, flags the existence of a really radically different way of picturing the structure of reality. Like-affects-like is a wildly different way of thinking, and therefore ought to be a great limbering-up exercise.

In fact, like-affects-like is, I gather, the principle underlying the anthropological phenomenon of magic — sympathetic magic, it's called. I somewhat recall an anthropologist expounding at length (alas, I wish I could remember where) that anthropologically this can be understood as the principle underlying all magic. So I got to thinking, what sort of mathematical framework might one use for this sort of thing? I haven't resolved a specific answer for the math framework, yet; but I've tried to at least set my thoughts in order.

What I'm interested in here is the mathematical and thus scientific utility of the like-affects-like principle, not its manifestation in the anthropological phenomenon of magic (as Richard Cavendish observed, "The religious impulse is to worship, the scientific to explain, the magical to dominate and command"). Yet the term "like affects like" is both awkward and vague; so I use the term sympathy for discussing it from a mathematical or scientific perspective.

How might a rigorous model of this work, structurally? Taking a stab at it, one might have objects, each capable of taking on characteristics with a potentially complex structure, and patterns which can arise in the characteristics of the objects. Interactions between the objects occur when the objects share a pattern. The characteristics of objects might be dispensed with entirely, retaining only the patterns, provided one specifies the structure of the range of possible patterns (perhaps a lattice of patterns?). There may be a notion of degrees of similarity of patterns, giving rise to varying degrees of interaction. This raises the question of whether one ought to treat similar patterns as sharing some sort of higher-level pattern and themselves interacting sympathetically. More radically, one might ask whether an object is merely an intersection of patterns, in which case one might aspire to — in some sense — dispense with the objects entirely, and have only a sort of web of patterns. Evidently, the whole thing hinges on figuring out what patterns are and how they relate to each other, then setting up interactions on that basis.
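
As a limbering-up aid, here is one minimal way the structure just described might be sketched in Python. All of it is assumption: objects are reduced to bare names, each carrying a set of opaque pattern labels (`"p"`, `"q"`, and so on are hypothetical), two objects interact exactly when they share a pattern, and the "web of patterns" links two patterns whenever some object carries both. It deliberately takes no position on what patterns really are.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Set, Tuple

# Hypothetical objects, each described only by the patterns it exhibits.
objects: Dict[str, Set[str]] = {
    "a": {"p", "q"},
    "b": {"p", "q", "r"},
    "c": {"s"},
}


def interactions(objs: Dict[str, Set[str]]) -> Dict[Tuple[str, str], Set[str]]:
    """Map each interacting pair of objects to the patterns they share.
    Pairs with no shared pattern do not interact and are omitted."""
    result = {}
    for (name1, pats1), (name2, pats2) in combinations(sorted(objs.items()), 2):
        shared = pats1 & pats2
        if shared:
            result[(name1, name2)] = shared
    return result


def pattern_web(objs: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Dispense with the objects: keep only links between patterns that
    intersect in at least one object."""
    web = defaultdict(set)
    for pats in objs.values():
        for p, q in combinations(sorted(pats), 2):
            web[p].add(q)
            web[q].add(p)
    return dict(web)


pairs = interactions(objects)
web = pattern_web(objects)
```

Here `a` and `b` interact through the patterns `p` and `q`, while `c` interacts with nothing; in the web, `s` drops out entirely because it never intersects another pattern. Degrees of similarity and higher-level patterns are left out on purpose, since those are exactly the open questions.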

I distinguish between three types of sympathy:

Pseudo-sympathy (type 0). The phenomenon can be understood without recourse to the sympathetic principle, but it may be convenient to use sympathy as a way of modeling it.
Weak sympathy (type 1). The phenomenon may in theory arise from a non-sympathetic reality, but in practice there's no way to understand it without recourse to sympathy.
Strong sympathy (type 2). The phenomenon cannot, even in theory, arise from a non-sympathetic reality.

All of which gives, at least, a lower bound on how far outside the box one might think. One doesn't have to apply the sympathetic principle in a theory, in order to benefit from the reminder to keep one's thinking limber.

(It is, btw, entirely possible to imagine a metric space of patterns, in which degree of similarity between patterns becomes distance between patterns, and one slides back into a geometrical model after all. To whatever extent the merit of the sympathetic model is in its different way of thinking, to that extent one ought to avoid setting up a metric space of patterns, as such.)
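
To see how easily one slides back into geometry, here is a small sketch. It assumes, purely for illustration, that a pattern can be represented as a finite set of features; under that assumption the standard Jaccard distance turns similarity into distance. The sample patterns are hypothetical, and `looks_like_metric` only spot-checks the metric axioms on that sample rather than proving them.

```python
from itertools import permutations
from typing import FrozenSet, Iterable

Pattern = FrozenSet[str]


def jaccard_distance(p: Pattern, q: Pattern) -> float:
    """1 minus the fraction of features shared: 0 for identical patterns,
    1 for patterns with nothing in common."""
    union = p | q
    if not union:
        return 0.0  # two empty patterns are treated as identical
    return 1.0 - len(p & q) / len(union)


def looks_like_metric(sample: Iterable[Pattern]) -> bool:
    """Spot-check identity, symmetry, and the triangle inequality
    on a finite sample of distinct patterns."""
    sample = list(sample)
    if any(jaccard_distance(p, p) != 0.0 for p in sample):
        return False
    for p, q in permutations(sample, 2):
        d = jaccard_distance(p, q)
        if d <= 0.0 or d != jaccard_distance(q, p):
            return False
    for p, q, r in permutations(sample, 3):
        # small tolerance for floating-point rounding
        if jaccard_distance(p, r) > jaccard_distance(p, q) + jaccard_distance(q, r) + 1e-12:
            return False
    return True


patterns = [frozenset("ab"), frozenset("abc"), frozenset("cd"), frozenset("d")]
ok = looks_like_metric(patterns)
```

Once a function like `jaccard_distance` exists, "degree of interaction" becomes "nearness", and the model is geometric again, which is the temptation the aside above warns against.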

Why to fish

Asking questions is, broadly speaking, good. A line comes to mind from James Gleick's biography of Feynman (quoted favorably by Freeman Dyson): "He believed in the primacy of doubt, not as a blemish upon our ability to know but as the essence of knowing." Nevertheless, one does have to pick and choose which questions are worth spending most effort on; as mentioned above, the narrow focus of normal scientific research enables its often-rapid progress. I've been grounding my questions about quantum mechanics in observations about the character of the theory in relation to other theories of physics.

By contrast, one could choose to ground one's questions in reasoning about what sort of features reality can plausibly have. Einstein did this when maintaining that the quantum theory was an incomplete theory of the physical world — that it was missing some piece of reality. An example he cited is the Schrödinger's cat thought-experiment: Until observed, a quantum system can exist in a superposition of states. So, set up an experiment in which a quantum event is magnified into a macroscopic event — through a detector, the outcome of the quantum event causes a device to either kill or not kill a cat. Put the whole experimental apparatus, including the cat, in a box and close it so the outcome cannot be observed. Until you open the box, the cat is in a superposition of states, both alive and dead. Einstein reasoned that since the quantum theory alone would lead to this conclusion, there must be something more to reality that would disallow this superposition of cat.

The trouble with using this sort of reasoning to justify a line of research is, all it takes to undermine the justification is to say there's no reason reality can't be that strange.

Hence my preference for motivations based on the character of the theory, rather than the plausibility of the reality it depicts. My reasoning is still subjective — which is fine, since I'm motivating asking a question, not accepting an answer — but at least the reasoning is then not based on intuition about the nature of reality. Intuition specifically about physical reality could be right, of course, but has gotten a bad reputation — as part of the necessary process by which the quantum paradigm has secured its ecological niche — so it's better in this case to base intuition on some other criterion.

Hygiene again

To make sure I'm fully girded for battle — this is rough stuff, one can't be too well armed for it — I want to revisit some ideas I collected in earlier blog posts, and squeeze just a bit more out of them than I did before.

My previous thought relating explicitly to Theories of Everything was that, drawing an analogy with vau-calculi, spacetime geometry should perhaps be viewed not as a playing field on which all action occurs, but rather as a hygiene condition on the interactions that make up the universe. This analogy can be refined further. The role of variables in vau-calculi is coordinating causal connections between distant parts of the term. There are four kinds of variables, but unboundedly many actual variables of each kind; and α-renaming keeps these actual variables from bleeding into each other. A particular variable, though we may think of it as a very simple thing — a syntactic atom, in fact — is perhaps better understood as a distributed, complex-structured entity woven throughout the fabric of a branch of the term's syntax tree, humming with the dynamically maintained hygiene condition that keeps it separate from other variables. It may impinge on a large part of the α-renaming infrastructure, but most of its complex distributed structure is separate from the hygiene condition. The information content of the term is largely made up of these complex, distributed entities, with various local syntactic details decorating the syntax tree and regulating the rewriting actions that shape the evolution of the term. Various rewriting actions cause propagation across one (or perhaps more than one) of these distributed entities — and it doesn't actually matter how many rewriting steps are involved in this propagation, as for example even the substitution operations could be handled by gradually distributing information across a branch of the syntax tree via some sort of "sinking" structure, mirror to the binding structures that "rise" through the tree.
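The hygiene role of α-renaming is easier to see in the plain λ-calculus, which has one kind of variable rather than four. The following is a minimal sketch of capture-avoiding substitution there, not of vau-calculi themselves; the tuple encoding of terms and the helper names `free_vars`, `fresh`, and `substitute` are assumptions made for the example.

```python
import itertools

# Terms of the plain lambda-calculus, encoded as tuples (an assumed encoding):
#   ("var", name) | ("lam", name, body) | ("app", function, argument)

def free_vars(term):
    """Return the set of names occurring free in term."""
    kind = term[0]
    if kind == "var":
        return {term[1]}
    if kind == "lam":
        return free_vars(term[2]) - {term[1]}
    return free_vars(term[1]) | free_vars(term[2])

def fresh(avoid):
    """Pick a name of the form v0, v1, ... that is not in avoid."""
    for i in itertools.count():
        name = f"v{i}"
        if name not in avoid:
            return name

def substitute(term, name, value):
    """Replace free occurrences of name in term by value, alpha-renaming
    any binder that would otherwise capture a free variable of value."""
    kind = term[0]
    if kind == "var":
        return value if term[1] == name else term
    if kind == "app":
        return ("app",
                substitute(term[1], name, value),
                substitute(term[2], name, value))
    bound, body = term[1], term[2]
    if bound == name:
        return term  # name is rebound here, so no free occurrences below
    if bound in free_vars(value):
        # Hygiene step: rename the binder so it stays distinct
        # from the variables arriving inside value.
        new = fresh(free_vars(value) | free_vars(body) | {name})
        body = substitute(body, bound, ("var", new))
        bound = new
    return ("lam", bound, substitute(body, name, value))

# Substituting y for x in (lambda y. x y) must not produce (lambda y. y y).
term = ("lam", "y", ("app", ("var", "x"), ("var", "y")))
result = substitute(term, "x", ("var", "y"))
```

The renaming touches every occurrence of the bound variable under its binder, which is a small-scale version of the distributed entity described above.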

Projecting some of this, cautiously, through the analogy to physics, we find ourselves envisioning a structure of reality in which spacetime is a hygiene condition on interwoven, sprawling complex entities that impinge on spacetime but are not "inside" it; whose distinctness from each other is maintained by the hygiene condition; and whose evolution we expect to describe by actions in a dimension orthogonal to spacetime. The last part of which is interestingly suggestive of my other previous post on physics, where I noted, with mathematical details sufficient to make the point, that while quantum physics is evidently nondeterministic and nonlocal as judged relative to the time dimension, one can recover determinism and locality relative to an orthogonal dimension of "meta-time" across which spacetime evolves.

One might well ask why this hygiene condition in physics should take the form of a spacetime geometry that, at least at an intermediate scale, approximates a Euclidean geometry of three space and one time dimension. I have a thought on this, drawing from another of my irons in the fire; enough, perhaps, to move thinking forward on the question. This 3+1 dimension structure is apparently that of quaternions. And quaternions are, so at least I suspect (I've been working on a blog post exploring this point), the essence of rotation. So perhaps we should think of our hygiene condition as some sort of rotational constraint, and the structure of spacetime follows from that.

I also touched on Theories of Everything in a recent post while exploring the notion that nature is neither discrete nor continuous but something between. If there is a balance going on between discrete and continuous facets of the physical worldview, apparently the introduction of discrete elementary particles is not, in itself, enough discreteness to counterbalance the continuous feature provided by the wave functions of these particles, and the additional feature of wave-function collapse or the like is needed to even things out. One might ask whether the additional discreteness associated with wave-function collapse could be obviated by backing off somewhat on the continuous side. The uncertainty principle already suggests that the classical view of particles in continuous spacetime, which underlies the continuous wave function (more about that below), is an over-specification; the need for additional balancing discreteness might be another consequence of the same over-specification.

Interestingly, variables in λ-like calculi are also over-specified: that's why there's a need for α-renaming in the first place, because the particular name chosen for a variable is arbitrary as long as it maintains its identity relative to other variables in the term. And α-renaming is the hygiene device analogized to geometry in physics. Raising the prospect that to eliminate this over-specification might also eliminate the analogy, or make it much harder to pin down. There is, of course, Curry's combinatorial calculus which has no variables at all; personally I find Church's variable-using approach easier to read. Tracing that through the analogy, one might conjecture the possibility of constructing a Theory of Everything that didn't need the awkward additional discreteness, by eliminating the distributed entities whose separateness from each other is maintained by the geometrical hygiene condition, thus eliminating the geometry itself in the process. Following the analogy, one would expect this alternative description of physical reality to be harder to understand than conventional physics. Frankly I have no trouble believing that a physics without geometry would be harder to understand.
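The over-specification of names can be made concrete with a different name-free device than Curry's combinators: de Bruijn indices, in which each bound variable occurrence is replaced by the number of binders between it and its own binder. This is a sketch for the plain λ-calculus only; the tuple encoding and the function names `to_de_bruijn` and `alpha_equivalent` are assumptions made for the example.

```python
def to_de_bruijn(term, binders=()):
    """Convert a named lambda term to a nameless one.

    Named terms:    ("var", name) | ("lam", name, body) | ("app", f, a)
    Nameless terms: ("idx", n) | ("free", name) | ("lam", body) | ("app", f, a)
    binders holds the enclosing binder names, innermost first.
    """
    kind = term[0]
    if kind == "var":
        name = term[1]
        if name in binders:
            # index() finds the innermost binder of this name,
            # so shadowing is handled correctly.
            return ("idx", binders.index(name))
        return ("free", name)  # free variables keep their names
    if kind == "lam":
        return ("lam", to_de_bruijn(term[2], (term[1],) + binders))
    return ("app",
            to_de_bruijn(term[1], binders),
            to_de_bruijn(term[2], binders))

def alpha_equivalent(a, b):
    """Two terms differ only in their choice of bound names
    exactly when their nameless forms coincide."""
    return to_de_bruijn(a) == to_de_bruijn(b)

k1 = ("lam", "x", ("lam", "y", ("var", "x")))
k2 = ("lam", "a", ("lam", "b", ("var", "a")))
k3 = ("lam", "x", ("lam", "y", ("var", "y")))
```

Here `k1` and `k2` collapse to the same nameless term, so no renaming machinery is needed to compare them; the price is that the nameless form is noticeably harder to read.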

The idea that quantum theory as a model of reality might suffer from having had too much put into it, does offer a curious counterpoint to Einstein's suggestion that quantum theory is missing some essential piece of reality.

The structure of quantum math

The structure of the math of quantum theory is actually pretty simple... if you stand back far enough. Start with a physical system. This is a small piece of reality that we are choosing to study. Classically, it's a finite set of elementary things described by a set of parameters. Hamilton (yes, that's the same guy who discovered quaternions) proposed to describe the whole behavior of such a system by a single function, since called a Hamiltonian function, which acts on the parameters describing the instantaneous state of the system together with parameters describing the abstract momentum of each state parameter (essentially, how the parameters change with respect to time). So the Hamiltonian is basically an embodiment of the whole classical dynamics of the system, treated as a lump rather than being broken into separate descriptions of the individual parts of the system. Since quantum theory doesn't "do" separate parts, instead expecting everything to affect everything else, it figures the Hamiltonian approach would be particularly compatible with the quantum worldview. Nevertheless, in the classical case it's still possible to consider the parts separately. For a system with a bunch of parts, the number of parameters to the Hamiltonian will be quite large (typically, at least six times the number of parts — three coordinates for position and three for momentum of each part).
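To make "a single function embodying the whole dynamics" concrete, here is a minimal sketch for the simplest system I can think of, a mass on a spring with one position parameter q and one momentum parameter p. The choice of system, the numerical differentiation, and the names `hamiltonian`, `partial`, and `step` are all assumptions of the example. The only physics used is Hamilton's equations, dq/dt = ∂H/∂p and dp/dt = −∂H/∂q.

```python
import math

def hamiltonian(q, p, mass=1.0, stiffness=1.0):
    """Total energy of a mass on a spring (an illustrative choice of system)."""
    return p * p / (2.0 * mass) + stiffness * q * q / 2.0

def partial(f, args, i, h=1e-6):
    """Central-difference estimate of the partial derivative of f
    with respect to its i-th argument."""
    up = list(args)
    down = list(args)
    up[i] += h
    down[i] -= h
    return (f(*up) - f(*down)) / (2.0 * h)

def step(q, p, dt):
    """One symplectic-Euler step of Hamilton's equations:
    first dp/dt = -dH/dq, then dq/dt = dH/dp using the updated p."""
    p = p - dt * partial(hamiltonian, (q, p), 0)
    q = q + dt * partial(hamiltonian, (q, p), 1)
    return q, p

q, p = 1.0, 0.0
energy_start = hamiltonian(q, p)
for _ in range(10000):          # integrate up to time t = 10
    q, p = step(q, p, 0.001)
energy_end = hamiltonian(q, p)
exact_q = math.cos(10.0)        # known solution for this particular system
```

Nothing in `step` knows anything about springs; all the system-specific behavior comes from the one function `hamiltonian`.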

Now, the quantum state of the system is described by a vector over a complex Hilbert space of, typically, infinite dimension. Wait, what? Yes, that's an infinite number of complex numbers. In fact, it might be an uncountably infinite number of complex numbers. Before you completely freak out over this, it's only fair to point out that if you have a real-valued field over three-dimensional space, that's an uncountably infinite number of real numbers (the number of locations in three-space being uncountably infinite). Still, the very fact that you're putting this thing in a Hilbert space, which is to say you're not asking for any particular kind of simple structure relating the different quantities, such as a three-dimensional Euclidean continuum, is kind of alarming. Rather than a smooth geometric structure, this is a deliberately disorganized mess, and honestly I don't think it's unfair to wish there were some more coherent reality "underneath" that gives rise to this infinite structure. Indeed, one might suspect this is a major motive for wanting a hidden variable theory — not wishing for determinism, or wishing for locality, but just wishing for a simpler model of what's going on. David Bohm's hidden variable theory, although it did show one could recover determinism with actual classical particles "underneath", did so without simplifying the mathematics — the mathematical structure of the quantum state was still there, just given a makeover as a potential field. In my earlier account of this bit of history, I noted that Einstein, seeing Bohm's theory, remarked, "This is not at all what I had in mind." I implied that Einstein didn't like Bohm's theory because it was nonlocal; but one might also object that Bohm's theory doesn't offer a simpler underlying reality, rather a more complicated one.

The elements of the vector over Hilbert space correspond to observable classical states of the system; so this vector is indexed by, essentially, the sets of possible inputs to the Hamiltonian. One can see how, step by step, we've ended up with a staggering level of complexity in our description, which we cope with by (ironically) not looking at it. By which I mean, we represent this vast amorphous expanse of information by a single letter (such as ψ), to be manipulated as if it were a single entity using operations that perform some regimented, impersonal operation on all its components that doesn't in general require it to have any overall shape. I don't by any means deride such treatments, which recover some order out of the chaos; but it's certainly not reassuring to realize how much lack of structure is hidden beneath such neat-looking formulae as the Schrödinger equation. And the amorphism beneath the elegant equations also makes it hard to imagine an alternative when looking at the specifics of the math (as suspected based on biological assessment of the evolution of physics).
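A toy count gives a feel for the scale of this. The sketch below assumes, purely for illustration, that space has been coarse-grained to a finite grid of `sites` positions, that the particles are distinguishable, and that the quantum state is indexed by joint positions only (as in a position-basis wave function); the real thing is a continuum, so these numbers understate it. The function names are made up for the example.

```python
def classical_parameter_count(particles):
    """Positions and momenta in three dimensions: six numbers per particle."""
    return 6 * particles

def quantum_amplitude_count(particles, sites):
    """One complex amplitude per joint configuration, where each of the
    particles may sit at any of `sites` grid positions."""
    return sites ** particles

for n in (1, 2, 3, 10):
    print(n, classical_parameter_count(n), quantum_amplitude_count(n, sites=1000))
```

The classical description grows linearly with the number of parts, while the quantum one grows exponentially, which is the sense in which the single letter ψ hides so much.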

The quantum situation gets its structure, and its dynamics, from the Hamiltonian, that single creature embodying the whole of the rules of classical behavior for the system. The Schrödinger equation (or whatever alternative plays its role) governs the evolution of the quantum state vector over time, and contains within it a differential operator based on the classical Hamiltonian function.

$$ i\hbar \, \frac{\partial \Psi}{\partial t} = \hat{H} \, \Psi . $$

One really wants to stop and admire this equation. It's a linear partial differential equation, which is wonderful; nonlinearity is what gives rise to chaos in the technical sense, and one would certainly rather deal with a linear system. Unfortunately, the equation only describes the evolution of the system so long as it remains a purely quantum system; the moment you open the box to see whether the cat is dead, this wave function collapses into observation of one of the classical states indexing the quantum state vector, with (to paint in broad strokes) the squared magnitudes of the complex numbers in the vector determining the probability distribution of observed classical states.
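Both points, linear evolution and probabilistic observation, can be seen in the smallest possible case. The sketch below assumes a system with just two classical states, units in which ħ = 1, and an illustrative Hamiltonian matrix H = ω·[[0, 1], [1, 0]] chosen only because its evolution has a simple closed form; the names `evolve` and `probabilities` are made up for the example.

```python
import math

def evolve(state, t, omega=1.0):
    """Exact solution of i dΨ/dt = H Ψ (with ħ = 1) for the two-state
    Hamiltonian H = omega * [[0, 1], [1, 0]], an illustrative choice.
    For this H, exp(-iHt) = cos(omega t) I - i sin(omega t) [[0, 1], [1, 0]]."""
    a, b = state
    c, s = math.cos(omega * t), math.sin(omega * t)
    return (c * a - 1j * s * b, c * b - 1j * s * a)

def probabilities(state):
    """Chance of observing each classical state: the squared magnitude
    of its amplitude, normalized so the chances sum to one."""
    weights = [abs(z) ** 2 for z in state]
    total = sum(weights)
    return [w / total for w in weights]

up, down = (1 + 0j, 0j), (0j, 1 + 0j)
t = 0.3

# Linearity: evolving a weighted sum gives the weighted sum of the evolved parts.
mixed = (0.6 * up[0] + 0.8j * down[0], 0.6 * up[1] + 0.8j * down[1])
lhs = evolve(mixed, t)
evolved_up, evolved_down = evolve(up, t), evolve(down, t)
rhs = (0.6 * evolved_up[0] + 0.8j * evolved_down[0],
       0.6 * evolved_up[1] + 0.8j * evolved_down[1])

# Observation: starting in "up", the chances at time t are cos^2 and sin^2.
p_up, p_down = probabilities(evolve(up, t))
```

The function `evolve` is the smooth, linear part of the story; `probabilities` is where the discreteness gets bolted on.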

It also satisfies James Clerk Maxwell's General Maxim of Physical Science, which says (as recounted by Ludwik Silberstein) that when we take the derivatives of our system with respect to time, we should end up with expressions that do not themselves explicitly involve time. When this is so, the system is "complete", or, "undisturbed". (The idea here is that if the rules governing the system change over time, it's because the system is being affected by some other factor that is varying over time.)

The equation is, indeed, highly seductive. Although I'm frankly on guard against it, yet here I am, being drawn into making remarks on its properties. Back to the question of structure. This equation effectively segregates the mathematical description of the system into a classical part that drives the dynamics (the Hamiltonian), and a quantum part that smears everything together (the quantum state vector). The wave function Ψ, described by the equation, is the adapter used to plug these two disparate elements together. The moment you start contemplating the equation, this manner of segregating the description starts to seem inevitable. So, having observed these basic elements of the quantum math, let us step back again before we get stuck.

The key structural feature of the quantum description, in contrast to classical physics, is that the parts can't be considered separately. Classical separability is what produced the sense of simplicity that, I speculated above, could be an ulterior motive for hidden variable theories. The term for the quantum feature is superposition of states, i.e., a quantum state that could collapse into any of multiple classical states, and therefore must contain all of those classical states in its description.

A different view of this is offered by so-called quantum logic. The idea here (notably embraced by physicist David Finkelstein, who I've mentioned in an earlier post because he was lead author of some papers in the 1960s on quaternion quantum theory) is that quantum theory is a logic of propositions about the physical world, differing fundamentally from classical propositional logic because of the existence of superposition as a propositional principle. There's a counterargument that this isn't really a "logic", because it doesn't describe reasoning as such, just the behavior of classical observations when applied as a filter to quantum systems; and indeed one can see that something of the sort is happening in the Schrödinger equation, above — but that would be pulling us back into the detailed math. Quantum logic, whatever it doesn't apply to, does apply to observational propositions under the regime of quantum mechanics, while remaining gratifyingly abstracted from the detailed quantum math.

Formally, in classical logic we have the distributive law

P and (Q or R) = (P and Q) or (P and R) ;

but in quantum logic, (Q or R) is superpositional in nature, saying that we can eliminate options that are neither, yet allowing more than the union of situations where one holds and situations where the other holds; and this causes the distributive law to fail. If we know P, and we know that either Q or R (but we may be fundamentally unable to determine which), this is not the same as knowing that either both P and Q, or both P and R. We aren't allowed to refactor our proposition so as to treat Q separately from R, without changing the nature of our knowledge.

[note: I've fixed the distributive law, above, which I botched and didn't even notice till, thankfully, a reader pointed it out to me. Doh!]
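The failure of the distributive law can be checked by hand in a toy case. In the standard formulation of quantum logic, propositions correspond to subspaces, "and" is intersection, and "or" is the span of two subspaces rather than their union. The sketch below assumes the smallest interesting setting, subspaces of the real plane, with a line through the origin represented by an integer direction vector; the names `meet`, `join`, and `same_line` are made up for the example.

```python
ZERO, PLANE = "zero", "plane"   # the trivial subspace and the whole plane

def same_line(u, v):
    """Two nonzero vectors span the same line when their cross product vanishes."""
    return u[0] * v[1] - u[1] * v[0] == 0

def meet(x, y):
    """'and': the intersection of two subspaces."""
    if x == PLANE:
        return y
    if y == PLANE:
        return x
    if x == ZERO or y == ZERO:
        return ZERO
    return x if same_line(x, y) else ZERO

def join(x, y):
    """'or': the smallest subspace containing both (their span, not their union)."""
    if x == ZERO:
        return y
    if y == ZERO:
        return x
    if x == PLANE or y == PLANE:
        return PLANE
    return x if same_line(x, y) else PLANE

P, Q, R = (1, 1), (1, 0), (0, 1)   # three distinct lines through the origin

left = meet(P, join(Q, R))             # P and (Q or R)
right = join(meet(P, Q), meet(P, R))   # (P and Q) or (P and R)
```

Here `join(Q, R)` is the whole plane, which contains the diagonal line P even though P lies in neither Q nor R; so `left` is P itself while `right` is the trivial subspace, and the two sides of the distributive law disagree.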

One can see in this broadly the reason why, when we shift from classical physics to quantum physics, we lose our ability to consider the underlying system as made up of elementary things. In considering each classical elementary thing, we summed up the influences on that thing from each of the other elementary things, and this sum was a small tidy set of parameters describing that one thing alone. The essence of quantum logic is that we can no longer refactor the system in order to take this sum; the one elementary thing we want to consider now has a unique relationship with each of the other elementary things in the system.

Put that way, it seems that the one elementary thing we want to consider would actually have a close personal relationship with each other elementary thing in the universe. A very large Rolodex indeed. One might object that most of those elementary things in the universe are not part of the system we are considering — but what if that's what we're doing wrong? Sometimes, a whole can be naturally decomposable into parts in one way, but when you try to decompose it into parts in a different way you end up with a complicated mess because all of your "parts" are interacting with each other. I suggested, back in my first blog post on physics, that there might be some wrong assumption shared by both classical and quantum physics; well, the idea that the universe is made up of elementary particles (or quanta, whatever you prefer to call them) is something shared by both theories. The quantum math (Schrödinger equation again, above) has this classical decomposition built into its structure, pushing us to perceive the subsequent quantum weirdness as intrinsic to reality, or perhaps intrinsic to our observation of reality — but what if it's rather intrinsic to that particular way of slicing off a piece of the universe for consideration?

The quantum folks have been insisting for years that quantum reality seems strange only because we're imposing our intuitions from the macroscopic world onto the quantum-scale world where it doesn't apply. Okay... Our notion that the universe is made up of individual things is certainly based on our macroscopic experience. What if it breaks down sooner than we thought — what if, instead of pushing the idea of individual things down to a smaller and smaller scale until they sizzle apart into a complex Hilbert space, we should instead have concluded that individual things are something of an illusion even at macroscopic scales?

The structure of reality

One likely objection is that no matter how you split up reality, you'd still have to observe it classically and the usual machinery of quantum mechanics would apply just the same. There are at least a couple of ways — two come to mind atm — for some differently shaped 'slice' of reality to elude the quantum machinery.

The alternative slice might not be something directly observable.

Here an extreme example comes in handy (as hoped). Recall the sympathetic hypothesis, above. A pattern would not be subject to direct observation, any more than a Platonic ideal like "table" or "triangle" would be. (Actually, it seems possible a pattern would be a Platonic ideal.)

This is also reminiscent of the analogy with vau-calculus. I noted above that much of the substance of a calculus term is made up of variables, where by a variable I meant the entire dynamically interacting web delineated by a variable binding construct and all its matching variable instances. A variable in this sense isn't, so to speak, observable; one can observe a particular instance of a variable, but a variable instance is just an atom, and not particularly interesting.

The alternative slice might be something quantum math can't practically cope with.

Quantum math is very difficult to apply in practice; some simple systems can be solved, but others are intractable. (It's fashionable in some circles to assume more powerful computers will solve all math problems. I'm reminded of a quote attributed to Eugene Wigner, commenting on a large quantum calculation: "It is nice to know that the computer understands the problem. But I would like to understand it, too.") It's not inconceivable that phenomena deviating from quantum predictions are "hiding in plain sight". My own instinct is that if this were so, they probably wouldn't be just on the edge of what we can cope with mathematically, but well outside that perimeter.

This raises the possibility that quantum mechanics might be an idealized approximation, holding asymptotically in a degenerate case — in somewhat the same way that Newtonian mechanics holds approximately for macroscopic problems that don't involve very high velocities.

We have several reasons, by this point, to suspect that whatever it is we're contemplating adding to our model of reality, it's nonlocal (that is, nonlocal relative to the time dimension, as is quantum theory). On one hand, bluntly, classical physics has had its chance and not worked out; we're already conjecturing that insisting on a classical approach is what got us into the hole we're trying to get out of. On the other hand, under the analogy we're exploring with vau-calculus, we've already noted that most of the term syntax is occupied by distributed variables — which are, in a deep sense, fundamentally nonlocal. The idea of spacetime as a hygiene condition rather than a base medium seems, on the face of it, to call for some sort of nonlocality; in fact, saying reality has a substantial component that doesn't follow the contours of spacetime is evidently equivalent to saying it's nonlocal. Put that way, saying that reality can be usefully sliced in a way that defies the division into elementary particles/things is also another way of saying it's nonlocal, since when we speak of dividing reality into elementary "things", we mean, things partitioned away from each other by spacetime. So what we have here is several different views of the same sort of conjectured property of reality. Keeping in mind, multiple views of a single structure is a common and fruitful phenomenon in mathematics.

I'm inclined to doubt this nonlocality would be of the sort already present in quantum theory. Quantum nonlocality might be somehow a degenerate case of a more general principle; but, again bluntly, quantum theory too has had its chance. Moreover, it seems we may be looking for something that operates on macroscopic scales, and quantum nonlocality (entanglement) tends to break down (decohere) at these scales. This suggests the prospect of some form of robust nonlocality, in contrast to the more fragile quantum effects.

So, at this point I've got in my toolkit of ideas (not including sympathy, which seems atm quite beyond the pale, limited to the admittedly useful role of devil's advocate):

- a physical structure substantially not contained within spacetime.
  - space emergent as a hygiene condition, perhaps rotation-related.
  - robust nonlocality, with quantum nonlocality perhaps as an asymptotic degenerate case.
  - some non-spacetime dimension over which one can recover abstract determinism/locality.
- decomposition of reality into coherent "finite slices" in some way other than into elementary things in spacetime.
  - slices may be either non-observable or out of practical quantum scope.
  - the structural role of the space hygiene condition may be to keep slices distinct from each other.
  - conceivably an alternative decomposition of reality may allow some over-specified elements in classical descriptions to be dropped entirely from the theory, at unknown price to descriptive clarity.

I can't make up my mind if this is appallingly vague, or consolidating nicely. Perhaps both. At any rate, the next phase of this operation would seem likely to shift further along the scale toward identifying concrete structures that meet the broad criteria. In that regard, it is probably worth remarking that current paradigm physics already decomposes reality into nonlocal slices (though not in the sense suggested here): the types of elementary particles. The slices aren't in the spirit of the "finite" condition, as there are only (atm) seventeen of them for the whole of reality; and they may, perhaps, be too closely tied to spacetime geometry — but they are, in themselves, certainly nonlocal.

Sunday, May 17, 2015

Numberless differentiation

The method described in I.3 resulted in an analogy between the "discrete" space of index values Z=(1,2,...) and the continuous state space Ω of the mechanical system[...]. That this cannot be achieved without some violence to the formalism and to mathematics is not surprising.
— John von Neumann, Mathematical Foundations of Quantum Mechanics, trans. Robert T. Beyer, 1955, §I.4.

In this post, I mean to explore a notion that arose somewhat peripherally in an earlier post. I noted that the abstractive power of a programming language is, in a sense, the second derivative of its semantics. I've also noted, in another post, a curious analogy between term rewriting calculi and cosmological physics, which I used to inspire some off-the-wall thinking about physics. So I've been wondering if the abstractive power notion might be developed into a formal tool for altogether non-numeric discrete systems akin to the continuous differential calculus that plays such a central role in modern physics.

Developing a mathematical tool of this sort is something of a guessing game. One hopes to find one's way to some natural structure in the platonic realm of the mathematics, using whatever clues one can discern, wherever one finds them (be they in the platonic realm or elsewhere).

I've a story from my mother, from when she was in graduate school, about a student asking for motivation for something in an algebraic topology class. The professor replied, when you build a beautiful cathedral you tear down the scaffolding. To me, honestly, this always seemed to be missing the point. You should keep a record of where the scaffolding was. And the architectural plans. And, while you're at it, the draft architectural plans.

Candidly, when I started this I did not know what I was expecting to find, and the path of reasoning has taken several unexpected turns along the way. One thing I haven't found is a purely non-numeric discrete device analogous to continuous differential calculus; but I've got a plate full of deep insights that deserve their chance to be shared, and if the non-numeric discrete device is there to discover, it seems I must be closer to it than I started.

Contents
Discrete continuous physics
Differentiation in physics
Abstractiveness in programming
Causation in term rewriting
Draft analogy
Breaking the draft analogy
Retrenching

Discrete continuous physics

Physics has been struggling with the interplay of continuous and discrete elements for centuries (arguably since ancient times). Continuous math is a clean, powerful tool and therefore popular; on the face of it, one wouldn't expect to get nearly as much useful physics done with purely discrete structures. On the other hand, our ordinary experience includes discrete quantities as well as continuous ones (recalling Kronecker's attributed remark, God made the integers; all else is the work of man). Nineteenth-century classical physics reconciled continuous and discrete aspects by populating space with continuous fields and point particles.

The field/particle approach collapsed with quantum mechanics, which is commonly described as saying that various things appearing continuous at a macroscopic level turn out to be discrete at a very small scale. I'd put it almost the other way around. Our notion of discrete things comes from our ordinary, macroscopic experience, but evidently these macroscopic discrete things can be broken up into smaller things; and if we keep pushing this ordinary notion down to a sufficiently small scale, eventually it can't be pushed any further, and we end up having to postulate an underlying (non-observable) continuous wave function which, having been so-to-speak squashed as nearly flat as it can be, smears even supposedly discrete elementary particles into a continuous complex-valued field. Which gives rise to discrete events by means of wave-function collapse, the ultimate kludge. Wave-function collapse is a different way of reconciling continuous and discrete aspects of the theory, but tbh it feels a bit contrived.

In this view, howsoever we wandered into it, we've assumed a continuous foundation and sought to derive discreteness from the continuous substratum. Traditional quantum theory derives discreteness by means of wave function collapse. Some modern theories use the concept of a standing wave to derive discreteness; string theory does this, with closed loops vibrating at some overtone of their length, and the transactional interpretation does too, with waves propagating forward and backward in time between spacetime events and summing to a standing wave.

An alternative sometimes considered is that the foundation could be discrete — so-called digital physics. I've some doubts about this, intuitively, because — how to put this — I don't think Nature likes to be neatly classified as "discrete" or "continuous". I remember, in a long-ago brainstorming session with my father, posing the semi-rhetorical question, "what is neither discrete nor continuous?", to which he mildly suggested, "reality". A purely discrete foundation seems as unlikely to me as a purely continuous one; in fact, it's possible the overly continuous character of the underlying wave function in quantum theory may be one subtle reason I'm not fond of that theory.

I have a particular, if round-about, reason for interest in the prospects for introducing some form of discrete, nonlocal, nonquantum element into physics. I'd actually been thinking in this direction for some time before I realized why I was tending that way, and was rather excited to figure out why because it gives me a place to start looking for what kind of role this hypothetical discrete element might play in the physics. My reasoning comes from, broadly, Mach's principle — the idea that the way the local universe works is a function of the shape of the rest of the universe. (Mach's principle, btw, was given its name by Albert Einstein, who drew inspiration from it for general relativity.)

Suppose we want to study a very small part of the universe; say, for example, a single elementary particle. This requires us either to disregard the rest of the universe entirely, or to assume the influence of the rest of the universe on our single particle is neatly summarized by some simple factor in our system, such as a potential field. But quantum mechanics starts us thinking, somewhat at least, in terms of everything affecting everything else. Suppose this is actually more so than quantum mechanics tells us. Suppose our single particle is interacting nonlocally with the whole rest of the universe. We don't know anything about these interactions with the rest of the universe, so their influence on our one particle is —for us— essentially random. If it were all local and neatly summed up, we might try to ask for it to be described by a potential field of some sort; but since we're supposing this is nonlocal interaction, and we don't have any sort of nonlocal structure in our theory by which to describe it, we really can't do anything with it other than expect it to smear our probability distribution for our one particle.

This nonlocal interaction I'm talking about is presumably not, in general, the sort of nonlocal interaction that arises in quantum mechanics. That sort of interaction, at least when it occurs within the system one is studying, is quantum entanglement, and it's really quite fragile: at large scales it becomes very likely to decohere, its nonlocality breaking down under interactions with the rest of the universe. I'm hypothesizing, instead, some more robust sort of nonlocality, that can work on a large scale. From my reasoning above, it seems that as soon as we hypothesize some such robust nonlocal interaction, we immediately expect the behavior of our single-particle system to appear nondeterministic, simply because we no longer can have an undisturbed system, unless we consider the entire universe as a whole. In contrast to the frequent presentation of quantum nondeterminism as "strange", under this hypothesis nondeterminism at small scales is unsurprising — though not uninteresting, because we may get clues to the character of our robust nonlocal interaction by looking at how it disturbs our otherwise undisturbed small system.

Following these clues is made more challenging because we have to imagine not one new element of the theory, but two: on one hand, some sort of robust nonlocal interaction, and on the other hand, some local behavior that our small system would have had if the robust nonlocal interaction had not been present. We expect the familiar rules of quantum mechanics to be the sum of these two elements. I have already implied that the local element is not itself quantum mechanics, as I suggested something of quantum nondeterminism might be ascribed to the robust nonlocal element. One might suppose the local element is classical physics, and our robust nonlocal interaction is what's needed to sum with classical physics to produce quantum mechanics. Or perhaps, if we subtract the robust nonlocal interaction from quantum mechanics, we get something else again.

This also raises interesting possibilities at the systemic level. The sum of these two hypothetical elements doesn't have to be quantum mechanics exactly, it only has to be close enough for quantum mechanics to be useful when looking only at a small system; so, quantum mechanics could be a useful approximation in a special case, rather as Newtonian mechanics is. We may expect the robust nonlocal element to be significant at cosmological scales, and we may have misread some cosmological phenomena because we were expecting effects on that scale to be local. But just at the moment my point is that, since spacetime is apparently our primal source of continuity, and our robust nonlocal interaction is specifically bypassing that continuity, I would expect the new interaction to be at least partly discrete rather than continuous; intuitively, if it were altogether continuous one might expect it to be more naturally part of the geometry rather than remaining "nonlocal" as such.

Differentiation in physics

We want a schematic sense of the continuous device we're looking for an analogy to.

Suppose 𝓜 is a manifold (essentially, a possibly-curved n-dimensional space), and F is a field on 𝓜. The value of the field at a point p on 𝓜, Fp, might be a scalar (an ordinary number, probably real or complex), or perhaps a vector (directed magnitude); at any rate Fp would likely be some sort of tensor — but I'm trying to assume as little as possible. So Fp is some sort of field value at p.

The derivative of F at p is... something... that describes how F is changing on 𝓜 at p. If you know the derivative F ' of F at every point in 𝓜, you know the whole "shape" of F, and from this shape F ' you can almost reconstruct F. The process of doing so is integration. The process doesn't quite reconstruct F because you only know how F changes across 𝓜, not what it changes relative to. This absolute reference for all of F doesn't change, across the whole manifold 𝓜, so it isn't part of the information captured by F '. So the integral of F ' is, at least conceptually, F plus a constant.
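As a small worked instance of "F plus a constant", here is the one-dimensional special case. It assumes (purely for illustration) that 𝓜 is an interval of the real line, that F is real-valued with a continuous derivative, and that a is some chosen reference point in 𝓜; none of these assumptions is needed for the schematic argument itself.

```latex
F(x) \;=\; F(a) \;+\; \int_a^x F'(t)\,dt
```

The integral on the right depends only on F ', while F(a) is the single absolute reference value that F ' cannot supply.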

By assuming just a bit more about the values of field F, we can see how to do more with derivative F ' than merely integrate it to almost-reconstruct F. The usual supposition is that Fp is either a number (a scalar, continuous), or a number together with a configuration relative to the manifold 𝓜. The idea here is that the configuration is a simple relationship to the manifold, and the number gives it "depth" (more-or-less literally) by saying "how big" is the configured entity. The simplest such configured value is a vector: a directed magnitude, where the direction on the manifold is the configuration, and the magnitude in that direction is the number. One can also have a bivector, where the configuration, rather than a linear direction, is a two-dimensional plane orientation. There's a more subtle distinction between vectors and covectors (in general, multivectors and comultivectors), which has to do with integration versus differentiation. These distinctions of "type" (scalar, vector, covector, bivector, cobivector) are relatively crude; most of the configuration information is a continuous orientation, some variant of "direction". And ultimately, we use the configurations to guide relations between numbers. All that information about "shape" is channeled into numerical equations.

The continuity of the manifold comes into the situation in two ways. One, already obliquely mentioned, is that the configuration of the field is only partly discrete, the rest of it (in all but the scalar-field case) being a sort of pseudo-direction on the manifold whose value is continuous, and interacts in some continuous fashion with the numerical depth/intensity of the field. The other insinuation of manifold continuity into the continuous derivative is more pervasive: the derivative describes the shape of the field at a point p by taking the shape in an ε-neighborhood of p in the limit as ε approaches zero. The very idea of this limit requires that the neighborhood size ε (as well as the field intensity and probably the pseudo-direction) be continuous.
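The role of the shrinking ε-neighborhood can be sketched numerically. The following is only an illustration under simplifying assumptions: the manifold is the real line, the field is the scalar function sin, and the names `difference_quotient` and `shrinking_estimates` are hypothetical helpers, not anything from the treatment above.

```python
import math

def difference_quotient(f, p, eps):
    """Average rate of change of f over the neighborhood [p, p + eps]."""
    return (f(p + eps) - f(p)) / eps

def shrinking_estimates(f, p, steps=6):
    """Difference quotients for eps = 10^-1, 10^-2, ..., 10^-steps."""
    return [(10.0 ** -k, difference_quotient(f, p, 10.0 ** -k))
            for k in range(1, steps + 1)]

# The exact derivative of sin at 1.0 is cos(1.0); the estimates approach it
# as the neighborhood size eps shrinks toward zero.
estimates = shrinking_estimates(math.sin, 1.0)
for eps, value in estimates:
    error = abs(value - math.cos(1.0))
    print(f"eps={eps:.0e}  quotient={value:.8f}  error={error:.2e}")
```

The sketch only makes sense because ε can be made arbitrarily small, which is exactly the continuity assumption that has no obvious counterpart in a discrete structure.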

Abstractiveness in programming

The analogy needs a similar schematic of the discrete device. Abstractiveness, in my treatment, is a relation between programming languages, in which some programming language Q is formally as abstractive as programming language P.

The basic structure on which the treatment rests is an abstract state machine, whose states are "languages" (ostensibly, programming languages), and whose transitions are labeled by texts. The texts are members of a set T of terms generated over a context-free grammar; each text is treated as an atomic whole, so when we consider a sequence of texts, the individual texts within the sequence remain distinct from each other, as if T were the "alphabet" in which the sequence is written (rather than all the texts in the sequence being run together as a string over a more primitive alphabet). Some subset of the texts are designated as observables, which are used to judge equivalences between programs. In a typical interpretation, non-observable texts are units of program code, such as module declarations, while observable texts are descriptors of program output.

Everything that matters about a state of this abstract machine is captured by the set of text sequences possible from that state (a touch of Mach's principle, there); so formally we define each language to be the set of text sequences, without introducing separate objects to represent the states. A language P is thus a set of text sequences, P ⊆ T^*, that's closed under prefixing (that is, if xy ∈ P then x ∈ P). We write P/x for the language reached from language P by text sequence x ∈ P; that is, P/x = { y | xy ∈ P }.
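For a finite toy case, these definitions can be sketched directly. The sketch assumes a language is represented as a Python set of tuples of texts; the texts themselves ("def f", "call f", and so on) and the helper names are made up for illustration, and real languages in this treatment are typically infinite.

```python
def prefixes(seq):
    """All prefixes of a text sequence, from the empty one up to seq itself."""
    return [seq[:i] for i in range(len(seq) + 1)]

def prefix_closure(seqs):
    """Smallest prefix-closed set containing the given text sequences."""
    return {p for s in seqs for p in prefixes(tuple(s))}

def is_prefix_closed(P):
    """True if xy in P always implies x in P."""
    return all(p in P for s in P for p in prefixes(s))

def quotient(P, x):
    """P/x = { y | xy in P }, defined only for x in P."""
    x = tuple(x)
    if x not in P:
        raise ValueError("x must be a text sequence of P")
    n = len(x)
    return {s[n:] for s in P if s[:n] == x}

# A hypothetical toy language with three texts.
P = prefix_closure([("def f", "call f", "out 1"), ("def g",)])
after_def_f = quotient(P, ("def f",))
print(sorted(after_def_f))
```

Note that the quotient of a prefix-closed set is again prefix-closed, so P/x is itself a language in the same sense.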

These structures are used to define formally when one language "does" at least as much as another language. The criterion is existence of a function φ : P → Q, also written φP = Q, that maps each text sequence x ∈ P to a text sequence φx ∈ φP such that φ preserves prefixes (φx is always a prefix of φxy) and φ also preserves some property of P understood as defining what it "does". For language expressiveness, this additional preserved property is concerned with the observables, resulting in a formal statement that Q is at least as expressive as P. For language abstractiveness, the additional preserved property is concerned with both observables and expressiveness relations between arbitrary P/x and P/y, resulting in a formal statement that Q is at least as abstractive as P.
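The prefix-preservation half of this criterion is easy to check on a finite toy language. The sketch below assumes languages are sets of tuples of texts, and uses two hypothetical candidate functions: `phi_macro`, which rewrites each text separately via an invented expansion table, and `phi_reverse`, which reverses the sequence. It does not attempt the other half of the criterion (preserving observables or expressiveness relations).

```python
def is_prefix(u, v):
    """True if sequence u is a prefix of sequence v."""
    return v[:len(u)] == u

def preserves_prefixes(phi, P):
    """Check that phi(x) is a prefix of phi(xy) for every xy in P.

    Assumes P is prefix-closed, so phi is defined on every prefix x.
    """
    return all(is_prefix(phi(s[:i]), phi(s))
               for s in P for i in range(len(s) + 1))

# Hypothetical expansion table: each text of P becomes a sequence of Q texts.
EXPANSION = {
    "inc x": ("x = x + 1",),
    "twice inc x": ("x = x + 1", "x = x + 1"),
}

def phi_macro(seq):
    """Rewrite text by text, concatenating the expansions in order."""
    return tuple(t for text in seq for t in EXPANSION[text])

def phi_reverse(seq):
    """A deliberately bad candidate: reverse the whole sequence."""
    return tuple(reversed(seq))

P = {(), ("twice inc x",), ("twice inc x", "inc x")}
print(preserves_prefixes(phi_macro, P))
print(preserves_prefixes(phi_reverse, P))
```

Any function that rewrites one text at a time, as `phi_macro` does, preserves prefixes automatically; the reversal fails because extending a sequence changes the beginning of its image.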

The function φ is understood as "rewriting" P texts as Q texts; and the less intrusive this rewriting is, the more easily Q can subsume the capabilities of P. For expressiveness, the class of macro (aka polynomial) transformations is traditionally of interest; operations that can be eliminated by this sort of transformation are traditionally called syntactic sugar.

Causation in term rewriting

Anyone with a bit of background in computer science has probably worked both with grammars and with term-rewriting calculi (though likely not in depth at the same time). Have you thought about the contrast between them? I hadn't, really, until I ended up doing a master's thesis formulating a novel grammar model, followed by a doctoral dissertation formulating a novel term-rewriting calculus. Since I suspect grammars tend to be thought of with less, well, gravitas than calculi, I'll explain a bit of how the grammars in my master's thesis work.

The formalism is called RAGs — Recursive Adaptive Grammars. Where an attribute grammar would have a terminal alphabet for syntax, a nonterminal alphabet for metasyntax (that is, for specifying classes of syntax), and various domains of attribute values for semantics, a RAG has a single unified domain of answers serving all three roles (except that for metasyntax I'll use the deeper name cosemantics). A rule form looks like

⟨v₀,p₀⟩ → t₁ ⟨p₁,v₁⟩ ... tₙ ⟨pₙ,vₙ⟩ tₙ₊₁

where each tₖ is a syntax string, and each ⟨•,•⟩ is a cosemantics/semantics pair suitable for labeling a parent node in a parse tree. Conceptually, cosemantics is inherited, specifying how a tree can grow downward (the cosemantics on the root node is the "start symbol"); while semantics is synthesized, specifying the resultant meaning of the parsed string. The vₖ are variables, provided from other parts of the tree: cosemantics of the parent node, provided from above; and semantics of the children, provided from below. The pₖ are polynomials, determined by the current rule from these variables, hence, from the cosemantics of the parent and semantics of the children. (If the grammar is well-behaved, each child's cosemantics pₖ only uses variables v₀,...,vₖ₋₁, so there are no circular dependencies in the parse tree.)

Playing the role of the rule set of an attribute grammar, a RAG has a rule function ρ which maps each answer a into a set of rule forms ρ(a); in selecting answers for the variables in a rule form r (producing a rule instance of r), the inherited cosemantics v₀ has to be some answer a such that r ∈ ρ(a).

If the polynomials were only allowed to construct answers, our grammars could recognize somewhat more than context-free languages; but we also want to compute semantics using Turing-powerful computation. For this, I permitted in the polynomials a non-constructive binary operator — the query operator, written as an infix colon, •:•. Conceptually, parsing starts with cosemantics, parses syntax, and in doing so synthesizes semantics; but in the usual grammatical derivation relation, cosemantics and semantics are both on the left side of the derivation, while syntax is on the right, thus:

⟨cosemantics, semantics⟩ ⇒⁺ syntax

The query operator rearranges these elements, because it has the meaning "find the semantics that results from this cosemantics on this syntax":

cosemantics : syntax ⇒⁺ semantics

So we can embed queries in our rule forms, hence the "recursive" in "Recursive Adaptive Grammars". However, I meant RAGs to be based on an elementary derivation step; so I needed a set of derivation step axioms that would induce the above equivalence. My solution was to introduce one more operator — not permitted in rule forms at all, but used during intermediate steps in derivation. The operator: inverse, denoted by an overbar.

Imho, the inverse operator is elegant, bizarre, and cool. What it does is reverse the direction of derivation. That is, for any derivation step c₁ ⇒ c₂, we have another step c̅₂ ⇒ c̅₁. The inverse of the inverse of any term c is just the same term c back again; and every answer a is its own inverse. So, given any three answers a₁, a₂, and a₃, if ⟨a₁,a₂⟩ ⇒⁺ a₃ then a₃ ⇒⁺ ⟨̅a̅₁̅,̅a̅₂̅⟩̅ .

The whole derivation step relation is defined by four axioms. We've already discussed the first two of them, and the third is just the usual rewriting property of compatibility:

If c₁ ⇒ c₂ then c̅₂ ⇒ c̅₁ .
If ⟨a₁,a₂⟩ → a₃ is a rule instance of r ∈ ρ(a₁), then ⟨a₁,a₂⟩ ⇒ a₃ .
If c₁ ⇒ c₂ and C is a context in which the missing subterm isn't inside an inverse operator, then C[c₁] ⇒ C[c₂] .

The fourth axiom is the spark that makes queries come alive.

a₁ : ⟨̅a̅₁̅,̅a̅₂̅⟩̅ ⇒ a₂ .

So now, from ⟨a₁,a₂⟩ ⇒⁺ a₃ we can deduce by the first axiom (since the answer a₃ is its own inverse) a₃ ⇒⁺ ⟨̅a̅₁̅,̅a̅₂̅⟩̅ , by the third axiom a₁ : a₃ ⇒⁺ a₁ : ⟨̅a̅₁̅,̅a̅₂̅⟩̅ , and stringing this together with the fourth axiom, a₁ : a₃ ⇒⁺ a₂ .
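The chain of deductions just described can be traced mechanically. The following is a minimal sketch, not the formalism from the thesis: the class names (`Answer`, `Pair`, `Inv`, `Query`) and helper functions are invented for illustration, the first axiom is read as "a step from c₁ to c₂ yields a step from the inverse of c₂ to the inverse of c₁", and the starting derivation is simply assumed to be a valid rule instance.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Answer:
    name: str

@dataclass(frozen=True)
class Pair:            # a cosemantics/semantics pair
    co: object
    sem: object

@dataclass(frozen=True)
class Inv:             # the inverse (overbar) operator
    body: object

@dataclass(frozen=True)
class Query:           # cosemantics : syntax
    co: object
    arg: object

def inv(term):
    """Inverse, with the two stated identities built in."""
    if isinstance(term, Answer):   # every answer is its own inverse
        return term
    if isinstance(term, Inv):      # the inverse of an inverse is the original
        return term.body
    return Inv(term)

def axiom1(chain):
    """From c0 => ... => cn obtain inv(cn) => ... => inv(c0)."""
    return [inv(c) for c in reversed(chain)]

def axiom3_query(co, chain):
    """Compatibility for the context `co : []`, whose hole is outside any inverse."""
    return [Query(co, c) for c in chain]

def axiom4(term):
    """a1 : inverse of pair (a1, a2) => a2; None if the term has another shape."""
    if isinstance(term, Query) and isinstance(term.arg, Inv):
        pair = term.arg.body
        if (isinstance(pair, Pair) and pair.co == term.co
                and isinstance(pair.co, Answer) and isinstance(pair.sem, Answer)):
            return pair.sem
    return None

def query_derivation(chain):
    """Turn a derivation pair(a1, a2) =>+ a3 into one for a1 : a3 =>+ a2."""
    start = chain[0]
    if not isinstance(start, Pair):
        raise ValueError("derivation must start from a cosemantics/semantics pair")
    lifted = axiom3_query(start.co, axiom1(chain))
    result = axiom4(lifted[-1])
    if result is None:
        raise ValueError("fourth axiom does not apply")
    return lifted + [result]

a1, a2, a3 = Answer("a1"), Answer("a2"), Answer("a3")
# Assumed: pair(a1, a2) => a3 is a single step justified by the second axiom.
steps = query_derivation([Pair(a1, a2), a3])
for term in steps:
    print(term)
```

The printed sequence starts at the query a₁ : a₃, passes through the query on the inverted pair, and ends at a₂, mirroring the three uses of the axioms in the text.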

The converse, that a₁ : a₃ ⇒⁺ a₂ implies ⟨a₁,a₂⟩ ⇒⁺ a₃ , can also be proven without too much fuss... if one has already proven the basic result that an answer cannot be derived from another answer. That is, if answer a is derived from c in one or more steps, c ⇒⁺ a , then c is not an answer. This would be trivial if not for the inverse operator, which allows an answer a to be the left-hand side of infinitely many derivations (the inverses of all derivations that end with a); the theorem says that once you've derived a, further derivation from there can't reach another answer. It took me two rather messy pages to prove this in my master's thesis (simple proofs, if they exist, can take decades to find), and I realized at the time it was a sort of analog to the Church-Rosser theorem for a calculus — an elementary well-behavedness result that one really wants to prove first because without it one has trouble proving other things.

I didn't think more of it until, years later when explaining RAGs to my dissertation committee (though in the end I didn't use RAGs in my dissertation), I got to see someone else go through a WTF moment followed by an aha! moment over the virulent un-Church-Rosser-ness of RAGs. Somehow it hadn't registered on me just how very like a term-rewriting calculus my RAG formalism might look, to someone used to working with λ-like calculi. Mathematical systems are routinely classified according to their formal properties, but there's something deeper going on here. RAGs, despite being in a meaningful way Turing-powerful, don't just lack the Church-Rosser property, they don't want it. A λ-like calculus that isn't Church-Rosser is badly behaved, likely pathological; but a grammar that is Church-Rosser is degenerate. It's a matter of purpose, which is not covered by formal mathematical properties.

My point (approached obliquely, but I'm not sure one would appreciate it properly if one didn't follow a route that offers a good view of the thing on approach) is that the purpose of a calculus is mainly to arrive at the result of an operation, whereas the purpose of a grammar is to relate a starting point to the usually-unbounded range of destinations reachable from it. This requires us to think carefully on what we're doing with a discrete analog to the conventional continuous notion of differentiation, because when we take a conventional derivative, the thing we're differentiating is a single-valued function, akin in purpose to a calculus meant to produce a result, whereas when we describe expressiveness/abstractiveness as derivatives of semantics, the thing we're differentiating appears, on the face of it, to be very like a grammar, describing a usually-unbounded network of roads departing from a common starting point.

Draft analogy

As foundation for a notion of discrete differentiation, we're looking for a useful correspondence between elements of these two kinds of structures, continuous and discrete. So far, it might seem we've identified only differences, rather than similarities, between the two. Continuous differentiation is necessarily in the "calculus" camp, abstractiveness/expressiveness necessarily in the "grammar" camp. Continuous differentiation at a point p is fundamentally local, depending in its deepest nature on a family of ε-neighborhoods of p as ε varies continuously toward zero. Abstractiveness/expressiveness of a language P is fundamentally global, depending in its deepest nature on a family of languages reachable from P by means of (in general, unbounded) text sequences. The local structure of the continuous case doesn't even exist in the discrete case, where language P has a set of closest neighbors; and the global structure on which the discrete case depends is deliberately ignored by the continuous derivative.

A starting point for an analogy is hidden in plain sight (though perhaps I've given it away already, as I chose my presentation to set the stage for it). Recall again Mach's principle, that the way the local universe works is a function of the shape of the rest of the universe. Wikipedia offers this anecdote for it:

You are standing in a field looking at the stars. Your arms are resting freely at your side, and you see that the distant stars are not moving. Now start spinning. The stars are whirling around you and your arms are pulled away from your body. Why should your arms be pulled away when the stars are whirling? Why should they be dangling freely when the stars don't move?

(One is reminded of the esoteric principle "as above, so below".)

In our continuous and discrete cases, we have two ways to examine the properties of a global structure at a particular location (p or P), one of which draws on an unbounded family of ε-neighborhoods approaching arbitrarily close, and the other on an (in general) unbounded family of discrete structures receding arbitrarily far. Either way, we derive our understanding of conditions at the particular location by aggregating from an unbounded, coherent structure. One might call the two strategies "unbounded locality" and "unbounded globality". If we accept that the way the local universe works and the global structure of the universe are two faces of the same thing, then these strategies ought to be two ways of getting at the same thing.

This strategic similarity is reassuring, but lacks tactical detail. In the continuous case, field F gave us a value at point p, the derivative gave us another field with again a value at p, and we could keep applying the differentiation operation (if F was sufficiently well-behaved to start with) producing a series of higher-derivative fields, each providing again a value at p. Yet in the discrete case we don't appear to have a local "value" at P, the purely local "state" being so devoid of information that we actually chose to drop the states from the formal treatment altogether; and while we may intuit that expressiveness is the derivative of semantics, and abstractiveness the derivative of expressiveness, the actual formalisms we constructed for these three things were altogether different from each other. The semantics apparently had only the text sequences, set of observables, and a (typically, infinite) discrete structure called a "language"; expressiveness added to this picture functions between languages; and abstractiveness augmented each language with a family of functions between the different nodes within each language, then defining functions between these "languages with expressive structure". On the face of it, this doesn't appear to be a series of structures of the same kind (as fields F, F ', F '' are of the same kind, granting we do expect them to vary their configurations in some regular, generalizable way).

In my treatment of abstractive power, though, I noted that if, in augmenting languages P and Q with "expressive structure", the family of functions we use is the degenerate family of identity functions, then the relation "⟨Q,Id⟩ is as abstractive as ⟨P,Id⟩" is equivalent to "Q is as expressive as P". This, together with Mach's principle, gives us a tactical analogy. The underlying "state machine", whose states are languages and whose transitions are labeled by program texts, corresponds to manifold 𝓜. A particular state, taken together with the rest of the machine reachable from that state, corresponds to point p of the manifold. An "expressive structure" overlain on the machine, creating a web of functions between its states, corresponds to a field F (or F ', etc.). Differentiation is then an operation mapping a given "field" (overlying web of functions) to another. Thus, starting with our discrete machine 𝓜, overlaying the family of identity functions gives us the semantics of 𝓜, differentiating the semantics gives 𝓜's expressive structure, differentiating again gives its abstractive structure.

This analogy provides a way for us to consider the various parts of our discrete scheme in terms of the major elements of the continuous scheme (manifold, field, point). But on closer inspection, there are a couple of flaws in the analogy that need to be thought out carefully — and these flaws will lead us to a second philosophical difference comparable in depth to the difference between grammars and term-rewriting calculi.

Breaking the draft analogy

One mismatch in the analogy concerns the difference in substance between 𝓜 and F. In the continuous case, manifold 𝓜 consists entirely of the oriented distances between its points (we know the distances themselves are numbers because they are the ε needed for differentiation). If you start with a field F of uniform value across 𝓜, and take its integral, you can reconstruct 𝓜. The key point here is that 𝓜 and F are made of the same "stuff". In our discrete scheme, though, 𝓜 is a state machine with term-labeled transitions, while F is a family of functions between term sequences. The discrete 𝓜 and F are concretely different structures, made of different "stuff" rather than commensurate in the manner of the continuous 𝓜 and F. It might not be immediately obvious what the consequences would be of this analogy mismatch; but as happens, it feeds into the second mismatch, which is more directly concerned with the practical use of derivatives in physics.

The second mismatch concerns how the manifold changes over time — or, equivalently, its curvature. Our main precedent here is general relativity.

In general relativity, mass warps the space near its position, and the resulting curvature of space guides the movement of the mass over time. It's not apparent that abstractiveness/expressiveness has anything analogous to time in this sense. It should be apparent, though —once one takes a moment to consider— that general relativity doesn't have absolute time either. Observers moving relative to each other will disagree in their perceptions of time, and of space, and of which observed events are and are not simultaneous; so rather than absolute time and space, we have four-dimensional spacetime that looks different to different observers. Thinking of spacetime as a whole, rather than trying to factor it into absolute space and time, a particle with mass traces a one-dimensional curve across the four-dimensional manifold of spacetime. When we said, a moment ago, that the mass's position warps space while space guides the mass's movement, this mutual influence is described by mathematical equations involving the manifold 𝓜 and fields F, F ', etc. These equations are all about numbers (which don't occur in our discrete scheme) and rely on the fact that the manifold and the fields are both made out of this same numeric stuff (which they aren't in our discrete scheme). Evidently, the key element here which we have failed to carry across the analogy is that of equations encompassing both 𝓜 and F.

But, when we look more closely to try to find a discrete element to support an analogy with the continuous equations, we find that the omission is after all something more primal than numbers, or even commonality of substance between 𝓜 and F. The omission is one of purpose. In the continuous case, the purpose of those equations is to enable us to solve for 𝓜 and F. But the discrete device has no interest in solving for 𝓜 at all. We don't use our theories of expressiveness and abstractiveness to define a language; we only use them to study languages that we've devised by other means.

Retrenching

At this juncture in the narrative, imho we've done rather well at reducing the question of the analogy to its essence. But we still don't have an answer, and we've exhausted our supply of insights in getting this far. We need more ammunition. What else do we know about these continuous and discrete schemes, to break us out of this latest impasse?

For one thing, the asymmetric presence of solving in the continuous scheme is a consequence of a difference in the practical purposes for which the two mathematical schemes were devised. In physics, we try to set up a mathematical system that matches reality, and then use the math to translate observations into constraints and, thus, predictions. Solving is the point of the mathematical exercise, telling us what we should expect to deduce from observations if the theory is correct. Whereas in expressiveness/abstractiveness, we consider various possible programming languages —all of which are of our own devising— and study the different consequences of them. Rather than trying to figure out the properties of the world we're living in, we study the properties of different invented worlds in hopes of deciding which we'd like to live in.

The difference goes deeper, though. Even if we did want to solve for state machine 𝓜 in the discrete scheme —in effect, solve for the programming language— we wouldn't find expressiveness/abstractiveness, nor even semantics, at all adequate to the task. Recall from the above account of the continuous scheme, when observing that 𝓜 and F are made of the same "stuff", I suggested that from F one might reconstruct 𝓜. Strange to tell, when I wrote that, I wasn't even thinking yet in terms of this gap in the analogy over solving for 𝓜 (though, obviously, that's where my reasoning was about to take me). There is nothing in manifold 𝓜 that's in any fundamental way more complicated than fields F etc. In stark contrast to the discrete case. Expressiveness/abstractiveness are constructed from families of functions over 𝓜; to the extent semantic/expressive/abstractive comparisons are anything more than directed graphs on the states of 𝓜, their complexity is an echo of a deliberately very limited part of the structural complexity latent in 𝓜 itself. While the internal structure of a programming language —state machine 𝓜 itself— is immensely complicated; tbh, it seems way more complicated than a mere space-time manifold, which is a continuum whereas a programming language is discontinuity incarnate.

The point about discontinuity recalls a key feature of the other analogy I drew, in my earlier post, between cosmological physics and term rewriting. Under that analogy, fundamental forces in physics corresponded to substitution functions in term-rewriting; while physical geometry, with its close relation to the gravitational force, corresponded to α-renaming, with its close relation to λ-substitution. In our current analogy, we've been treating continuity, with its role in ε-neighborhoods, as the defining feature of the manifold, and finding nothing like it in the syntactic complexity of a programming language. What if for purposes of solving, what we need is not so much continuity as geometry?

For this, we need to find something α-renaming-like in the discrete scheme, which is challenging because expressiveness/abstractiveness is not based on term-rewriting. There's more to it, though. Whatever-it-is, α-renaming-like, we mean to be the analog to the manifold. In other words, it's the discrete 𝓜. And since the whole programming language, or equivalently the state machine, evidently isn't α-renaming-like, 𝓜 isn't the programming language after all, but something else. Some facet of the programming language. And not just any facet — one that guides our discrete differentiation analogously to how the geometry of the manifold guides continuous differentiation.