
Tuesday, July 31, 2018

Co-hygiene and emergent quantum mechanics

Thus quantum mechanics occupies a very unusual place among physical theories:  it contains classical mechanics as a limiting case, yet at the same time it requires this limiting case for its own formulation.
Lev Landau and Evgeny Lifshitz, Quantum Mechanics: Non-relativistic Theory (3rd edition, 1977, afaik).

Gradually, across a series of posts exploring alternative structures for a basic theory of physics, I've been trying to tease together a strategy wherein quantum mechanics is, rather than a nondeterministic foundation of reality, an approximation valid for sufficiently small systems.  This post considers how one might devise a concrete mathematical demonstration that the strategy can actually work.

I came into all this with a gnawing sense that modern physics had taken a conceptual wrong turn somewhere, that it had made some —unidentified— incautious structural assumption that ought not have been made and was leading it further and further astray.  (I explored the philosophy of this at some depth in an earlier post in the series, several years ago by now.)  The larger agenda here is to shake up our thinking on basic physics, accumulating different ways to structure theories so that our structural choices are made with eyes open, rather than just because we can't imagine an alternative.  The particular notion I'm stalking atm —woven around the concept of co-hygiene, to be explained below— is, in its essence, that quantum mechanics might be an approximation, just as Newtonian mechanics is, and that the quantum approximation may be a consequence of the systems-of-interest being almost infinitesimally small compared to the cosmos as a whole.  Quantum mechanics suggests that all the elementary parts of the cosmos are connected to all the other elementary parts, which is clearly not conducive to practical calculations.  In the model I'm pursuing, each element is connected to just a comparatively few others, and the whole jostles about, with each adjustment to an element shuffling its remote connections so that over many adjustments the element gets exposed to many other elements.  Conjecturally, if a sufficiently small system interacts in this way with a sufficiently vast cosmos, the resulting behavior of the small system could look a lot like nondeterminism.

The question is, could it look like quantum mechanics?

As I've remarked before, my usual approach to these sorts of posts is to lift down off my metaphorical shelf the assorted fragments I've got on the topic of interest; lay out the pieces on the table, adding at the same time any new bits I've lately collected; inspect them all severally and collectively, rearranging them and looking for new patterns as I see them all afresh; and record my trail of thought as I do so.  Sometimes I find that since the last time I visited things, my whole perception of them has shifted (I was, for example, struck in a recent post by how profoundly my perception of Church's λ-calculus has changed just in the past several years).  Hopefully I glean a few new insights from the fresh inspection, some of which find their way into the new groupings destined to go back up on the shelf to await the next time, while some other, more speculative branches of reasoning that don't make it into my main stream of thought are preserved in my record for possible later pursuit.

Moreover, each iteration achieves focus by developing some particular theme within its line of speculation; some details of previous iterations are winnowed away to allow an uncluttered view of the current theme; and once the new iteration reaches its more-or-less-coherent insights, such as they are, a reset is then wanted, to unclutter the next iteration.  Most of the posts in this series —with a couple of exceptions (1, 2)— have focused on the broad structure of the cosmos, touching only lightly on concrete mathematics of modern physics that, after all, I've suspected from the start of favoring incautious structural assumptions.  This incremental shifting between posts is why, within my larger series on physics, the current post has a transitional focus:  reviewing the chosen cosmological structure in order to apply it to the abstract structure of the mathematics, preparing from abstract ground to launch an assault on the concrete.

Though I'll reach a few conclusions here —oriented especially toward guidance for the next installment in the series— much of this is going to dwell on reasons why the problem is difficult, which if one isn't careful could create a certain pessimism toward the whole prospect.  I'm moderately optimistic that the problem can be pried open, over a sufficient number of patient iterations of study.  The formidable appearance of a mountain in-the-large oughtn't prevent us from looking for a way to climb it.

Contents
Co-hygiene
Primitive wave functions
Probability distributions
Quantum/classical interface
Genericity
The universe says 'hi'
The upper box
Co-hygiene

The schematic mathematical model I'm considering takes the cosmos to be a vast system of parts with two kinds of connections between them:  local (geometry), and non-local (network).  The system evolves by discrete transformational steps, which I conjecture may be selected based entirely on local criteria but, once selected, may draw information from both local and non-local connections and may have both local and non-local effects.  The local part of all this would likely resemble classical physics.

When a transformation step is applied, its local effect must be handled in a way that doesn't corrupt the non-local network; that's called hygiene.  If the non-local effect of a step doesn't perturb pre-existing local geometry, I call that co-hygiene.  Transformation steps are not required in general to be co-hygienic; but if they are, then local geometry is only affected by local transformation steps, giving the steps a close apparent affinity with the local geometry, and I conjectured this could explain why gravity seems more integrated with spacetime than do the other fundamental forces.  (Indeed, wondering why gravity would differ from the other fundamental forces was what led me into the whole avenue of exploration in the first place.)
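
To make the hygiene/co-hygiene distinction a bit more tangible, here is a deliberately crude sketch; the class, the two edge sets, and the step functions are my own illustrative scaffolding for this post, not a formalism from anywhere in the series.

import random

class ToyCosmos:
    """Elements joined by local (geometry) edges and non-local (network) edges."""
    def __init__(self, n):
        self.n = n
        self.geometry = {(i, i + 1) for i in range(n - 1)}          # simple 1-D local structure
        self.network = {i: random.randrange(n) for i in range(n)}   # one remote partner per element

    def co_hygienic_step(self, i):
        # Non-local effect: reshuffle element i's remote partner.
        # Pre-existing local geometry is left untouched (co-hygiene).
        self.network[i] = random.randrange(self.n)

    def non_co_hygienic_step(self, i, j):
        # Non-local effect that also perturbs local geometry (not co-hygienic).
        self.geometry.add((min(i, j), max(i, j)))
        self.network[i] = random.randrange(self.n)

cosmos = ToyCosmos(100)
cosmos.co_hygienic_step(3)   # element 3 gets a new remote partner; the geometry stays put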

Along the way, though, I also wondered if the non-local network could explain why the system deviated from "classical" behavior.  Here I hit on an idea that offered a specific reason why quantum mechanics might be an approximation that works for very small systems.  My inspiration for this sort of mathematical model was a class of variant λ-calculi (in fact, λ-calculus is co-hygienic, while in my dissertation I studied variant calculi that introduce non-co-hygienic operations to handle side-effects); and in those variant calculi, the non-local network topology is highly volatile.  That is, each time a small subsystem interacts non-locally with the rest of the system, it may end up with different network neighbors than it had before.  This means that if you're looking at a subsystem that is smaller than the whole system by a cosmically vast amount — say, if the system as a whole is larger than the subsystem by a factor of 10⁷⁰ or 10⁸⁰ — you might perform a very large number of non-local interactions and never interact with the same network-neighbor twice.  It would be, approximately, as if there were an endless supply of other parts of the system for you to interact non-locally with.  Making the non-local interactions look rather random.
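
A quick numerical sanity check on that "never the same neighbor twice" intuition, using nothing but the standard birthday-problem estimate (the pool sizes and the assumption of uniform re-wiring are inputs to the toy calculation, not anything derived here):

import math

def prob_of_repeat(pool_size, interactions):
    # Probability that, drawing `interactions` partners uniformly at random
    # from `pool_size` candidates, some partner comes up more than once.
    # (Birthday-problem approximation; expm1 keeps precision for tiny values.)
    return -math.expm1(-interactions * (interactions - 1) / (2.0 * pool_size))

print(prob_of_repeat(1e70, 1e20))   # about 5e-31: repeats essentially never happen
print(prob_of_repeat(1e6, 1e4))     # about 1.0: a small pool repeats almost surely

So long as the cosmos outweighs the subsystem by anything like the ratios above, the reshuffled partners behave, from the subsystem's point of view, like an inexhaustible supply of strangers.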

Without the network-scrambling, non-locality alone would not cause this sort of seeming-randomness.  The subsystem of interest could "learn" about its network neighbors through repeated interaction with them, and they would become effectively just part of its internal state.  Thus, the network-scrambling, together with the assumption that the system is vastly larger than the subsystem, would seem to allow the introduction of an element of effective nondeterminism into the model.

But, is it actually useful to introduce an element of effective nondeterminism into the model?  Notwithstanding Einstein's remark about whether or not God plays dice, if you start with a classical system and naively introduce a random classical element into it, you don't end up with a quantum wave function.  (There is a vein of research, broadly called stochastic electrodynamics, that seeks to derive quantum effects from classical electrodynamics with random zero-point radiation on the order of Planck's constant, but apparently they're having trouble accounting for some quantum effects, such as quantum interference.)  To turn this seeming-nondeterminism to the purpose would require some more nuanced tactic.

There is, btw, an interesting element of flexibility in the sort of effective-nondeterminism introduced:  The sort of mathematical model I'm conjecturing has deterministic rules, so conceivably there could be some sort of invariant properties across successive rearrangements of the network topology.  Thus, some kinds of non-local influences could be seemingly-random while others might, at least under some particular kinds of transformation (such as, under a particular fundamental force), be constant.  The subsystem of interest could "learn" these invariants through repeated interactions, even though other factors would remain unlearnable.  In effect, these invariants would be part of the state of the subsystem, information that one would include in a description of the subsystem but that, in the underlying mathematical model, would be distributed across the network.

Primitive wave functions

Suppose we're considering some very small physical system, say a single electron in a potential field.

A potential field, as I suggested in a previous post, is a simple summation of combined influences of the rest of the cosmos on the system of interest, in this case our single electron.  Classically —and under Relativity— the potential field would tell us nothing about non-local influences on the electron.  In this sort of simple quantum-mechanical exercise, the potential field used is, apparently, classical.

The mathematical model in conventional quantum mechanics posits, as its underlying reality, a wave function — a complex- (or quaternion-, or whatever-) valued field over the state space of the system, obeying some wave equation such as Schrödinger's,

iℏ ∂Ψ/∂t  =  Ĥ Ψ .

This posited underlying reality has no electron in the classical sense of something that has a precise position and momentum at each given time; the wave function is what's "really" there, and any observation we would understand as measuring the position or momentum of the electron is actually drawing on the information contained in the wave function.

While the wave function evolves deterministically, the mathematical model as a whole presents a nondeterministic theory.  This nondeterminism is not a necessary feature of the theory.  An alternative mathematical model exists, giving exactly the same predictions, in which there is an electron there in the classical sense, with precise position and momentum at each given time.  Of course its position and momentum can't be simultaneously known by an observer (which would violate the Heisenberg uncertainty principle); but in the underlying model the electron does have those unobservable attributes.  David Bohm published this model in 1952.  However Bohm's model doesn't seem to have offered anything except a demonstration that quantum theory does not prohibit the existence of an unobservable deterministic classical electron.  In Bohm's model, the electron had a definite position and momentum, yes, but it was acted on by a "pilot wave" that, in essence, obeyed Schrödinger's equation.  And Schrödinger's equation is non-local, in the sense that not only does it allow information (unobservable information) to propagate faster than light, it allows it to "propagate" infinitely fast; the hidden information in the wave function does not really propagate "through" space, it just shows up wherever the equation says it should.  Some years later, Bell's Theorem would show that this sort of non-locality is a necessary feature of any theory that always gives the same predictions as quantum mechanics (given some other assumptions, one of which I'm violating; I'll get back to that below); but my main point atm is that Bohm's model doesn't offer any new way of looking at the wave function itself.  You still have to just accept the wave function as a primitive; Bohm merely adds an extra stage of reasoning in understanding how the wave function applies to real situations.  If there's any practical, as opposed to philosophical, advantage to using Bohm's model, it must be a subtle one.  Nevertheless, it does reassure us that there is no prohibition against a model in which the electron is a definite, deterministic thing in the classical sense.

The sort of model I'm looking for would have two important differences from Bohm's.

First, the wave function would not be primitive at all, but instead would be a consequence of the way the local-geometric aspect of the cosmos is distorted by the new machinery I'm introducing.  The Schrödinger equation, above, seems to have just this sort of structure, with Ĥ embodying the classical behavior of the system while the rest of the equation is the shape of the distorting lens through which the classical behavior passes to produce its quantum behavior.  The trick is to imagine any sensible way of understanding this distorting lens as a consequence of some deeper representation (keeping in mind that the local-geometric aspect of the cosmos needn't be classical physics as such, though this would be one's first guess).

A model with different primitives is very likely to lead to different questions; to conjure a quote from Richard Feynman, "by putting the theory in a certain kind of framework you get an idea of what to change".  Hence a theory in which the wave function is not primitive could offer valuable fresh perspective even if it isn't in itself experimentally distinguishable from quantum mechanics.  There's also the matter of equivalent mathematical models that are easier or harder to apply to particular problems — conventional quantum mechanics is frankly hard to apply to almost any problem, so it's not hard to imagine an equivalent theory with different primitives could make some problems more tractable.

Second, the model I'm looking for wouldn't, at least not necessarily, always produce the same predictions as quantum mechanics.  I'm supposing it would produce the same predictions for systems practically infinitesimal compared to the size of the cosmos.  Whether or not the model would make experimentally distinguishable predictions from quantum mechanics at a cosmological scale, would seem to depend on how much, or little, we could work out about the non-local-network part of the model; perhaps we'd end up with an incomplete model where the network part of it is just unknown, and we'd be none the wiser (but for increased skepticism about some quantum predictions), or perhaps we'd find enough structural clues to conjecture a more specific model.  Just possibly, we'd end up with some cosmological questions to distinguish possible network structures, which (as usual with questions) could be highly fruitful regardless of whether the speculations that led to the questions were to go down in flames, or, less spectacularly, were to produce all the same predictions as quantum mechanics after all.

Probability distributions

Wave functions have always made me think of probability distributions, as if there ought to be some deterministic thing underneath whose distribution of possible states is generating the wave function.  What's missing is any explanation of how to generate a wave-function-like thing from a classical probability distribution.  (Not to lose track of the terminology, this is classical in the sense of classical probability, which in turn is based on classical logic, rather than classical physics as such.  Though they all come down to us from the late nineteenth century, and complement each other.)

A classical probability distribution, as such, is fairly distinctive.  You have an observable with a range of possible values, and you have a range of possible worlds each of which induces an observable value.  Each possible world has a non-negative-real likelihood.  The (unnormalized) probability distribution for the observable is a curve over the range of observable values, summing for each observable value the likelihoods of all possible worlds that yield that observable value.  The probability of the observable falling in a certain interval is the area under the curve over that interval, divided by the area under the curve over the entire range of observable values.  If you add together two mutually disjoint sets of possibilities, the areas under their curves simply add, since for each observable value the set of possible worlds yielding it is just the ones in the first set and the ones in the second set.

The trouble is, that distinctive pattern of a classical probability distribution is not how wave functions work.  When you add together two wave functions, the two curves get added all right, but the values aren't unsigned reals; they can cancel each other, producing an interference pattern as in classic electron diffraction.  (I demonstrated the essential role of cancellation, and a very few other structural elements, in quantum mechanical behavior in a recent post.)  As an additional plot twist, the wave function values add, but the probability isn't their sum but (traditionally) the square of the magnitude of their sum.
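
Here is the contrast in miniature (a sketch only; the Gaussian shapes and the phase factors are arbitrary choices made to expose the cancellation, not anything drawn from a real experiment):

import numpy as np

x = np.linspace(-1.0, 1.0, 5)            # a few sample points of the observable

# Two contributions (think: the two slits), as functions of x.
# Classical probability: non-negative weights that simply add.
p1 = np.exp(-(x - 0.3)**2)
p2 = np.exp(-(x + 0.3)**2)
classical = p1 + p2                      # never less than either contribution alone

# Quantum-style amplitudes: complex values with phases, so they can cancel.
a1 = np.sqrt(p1) * np.exp(1j * 10 * x)
a2 = np.sqrt(p2) * np.exp(-1j * 10 * x)
quantum = np.abs(a1 + a2)**2             # Born rule: square of the magnitude of the sum

print(classical)
print(quantum)                           # dips below the classical sum where phases disagree

The classical curve can never fall below either of its contributions; the quantum curve can and does, wherever the phases disagree, and that is the interference pattern.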

One solution is to reject classical logic, since classical logic gives rise to the addition rule for deterministic probability distributions.  Just say the classical notion of logical disjunction (and conjunction, etc.) is wrong, and quantum logic is the way reality works.  While you're at it, invoke the idea that the world doesn't have to make sense to us (I've remarked before on my dim view of the things beyond mortal comprehension trope).  Whatever its philosophical merits or demerits, this approach doesn't fit the current context for two reasons:  it treats the wave function as primitive whereas we're interested in alternative primitives, so it doesn't appear to get us anywhere new/useful; and, even if it did get us somewhere useful (which it apparently doesn't), it's not the class of mathematical model I'm exploring here.  I'm pursuing a mathematical model spiritually descended from λ-calculus, which is very much in the classical deterministic tradition.

So, we're looking for a way to derive a wave function from a classical probability distribution.  One has to be very canny about approaching something like this.  It's not plausible this would be untrodden territory; the strategy would naturally suggest itself, and lots of very smart, highly trained physicists with strong motive to consider it have had nearly a century in which to do so.  Yet, frankly, if anyone had succeeded it ought to be well-known in alternative-QM circles, and I'd hope to have at least heard of it.  So going into the thing one should apply a sort of lamppost principle, and ask what one is bringing to the table that could possibly allow one to succeed where they did not.  (A typical version of the lamppost principle would say, if you've lost your keys at night somewhere on a dark street with a single lamppost, you should look for them near the lamppost since your chances of finding them if they're somewhere else are negligible.  Here, to mix the metaphors, the something-new you bring to the table is the location of your lamppost.)

I'm still boggled by how close the frontier of human knowledge is.  In high school I chose computer science for a college major partly (though only partly) because it seemed to me like there was so much mathematics you could spend a lifetime on it without reaching the frontier — and yet, by my sophomore year in college I was exploring extracurricularly some odd corner of mathematics (I forget what, now) that had clearly never been explored before.  And now I'm recently disembarked from a partly-mathematical dissertation; a doctoral dissertation being, rather by definition, stuff nobody has ever done before.  The idea that the math I was doing in my dissertation was something nobody had ever done before, is just freaky.  At any rate, I'm bringing to this puzzle in physics a mathematical perspective that's not only unusual for physics, but unique even in the branch of mathematics I brought it from.

The particular mathematical tools I'm mainly trying to apply are:

  • "metatime" (or whatever else one wants to call it), over which the cosmos evolves by discrete transformation steps.  This is the thing I'm doing that breaks the conditions for Bell's Theorem; but all I've shown it works for is reshaping a uniform probability distribution into one that violates Bell's Inequality (here), whereas now we're not just reshaping a particular distribution but trying to mess with the rules by which distributions combine.

    My earlier post on metatime was explicitly concerned with the fact that quantum-mechanical predictions, while non-local with respect to time, could still be local with respect to some orthogonal dimension ("metatime").  Atm I'm not centrally interested in strict locality with respect to metatime; but metatime still interests me as a potentially useful tactic for a mathematical model, offering a smooth way to convert a classical probability distribution into time-non-locality.

  • transformation steps that aggressively scramble non-local network topology.  This seems capable of supplying classical nondeterminism (apparently, on a small scale); but the apparent nondeterminism we're after isn't classical.

  • a broad notion that the math will stop looking like a wave function whenever the network scrambling ceases to sufficiently approximate classical nondeterminism (which ought to happen at large scales).  But this only suggests that the nondeterminism would be a necessary ingredient in extracting a wave function, without giving any hint of what would replace the wave function when the approximation fails.

These are some prominent new things I'm bringing to the table.  At least the second and third are new.  Metatime is a hot topic atm, under a different name (pseudo-time, I think), as a device of the transactional interpretation of QM (TI).  Advocates recommend TI as eliminating the conceptual anomalies and problems of other interpretations — EPR paradox, Schrödinger's cat, etc. — which bodes well for the utility of metatime here.  I don't figure TI bears directly on the current purpose though because, as best I can tell, TI retains the primitive wave function.  (TI does make another cameo appearance, below.)

On the problem of deriving the wave function, I don't know of any previous work to draw on.  There certainly could be something out there I've simply not happened to cross paths with, but I'm not sanguine of finding such; for the most part, the subject suffers from a common problem of extra-paradigm scientific explorations:  researchers comparing the current paradigm to its predecessor are very likely to come to the subject with intense bias.  Researchers within the paradigm take pains to show that the old paradigm is wrong; researchers outside the paradigm are few and idiosyncratic, likely to be stuck on either the old paradigm or some other peculiar idea.

The bias by researchers within the paradigm, btw, is an important survival adaptation of the scientific species.  The great effectiveness of paradigm science — which benefits its evolutionary success — is in enabling researchers to focus sharply on problems within the paradigm by eliminating distracting questions about the merits of the paradigm; and therefore those distracting questions have to be crushed decisively whenever they arise.  It's hard to say whether this bias is stronger in the first generation of scientists under a paradigm, who have to get it moving against resistance from its predecessor, or amongst their successors trained within the zealous framework inherited from the first generation; either way, the bias tends to produce a dearth of past research that would aid my current purpose.

A particularly active, and biased, area of extra-paradigm science is no-go theorems, theorems proving that certain alternatives to the prevailing paradigm cannot be made to work (cf. old post yonder).  Researchers within the paradigm want no-go theorems to crush extra-paradigm alternatives once and for all, and proponents of that sort of crushing agenda are likely, in their enthusiasm, to overlook cases not covered by the formal no-go-result.  Extra-paradigm researchers, in contrast, are likely to ferret out cases not covered by the result and concentrate on those cases, treating the no-go theorems as helpful hints on how to build alternative ideas rather than discouragement from doing so.  The paradigm researchers are likely to respond poorly to this, and accuse the alternative-seekers of being more concerned with rejecting the paradigm than with any particular alternative.  The whole exchange is likely to generate much more heat than light.

Quantum/classical interface

A classical probability distribution is made up of possibilities.  One of them is, and the others are not; we merely don't know which one is.  This is important because it means there's no way these possibilities could ever interact with each other; the one that is has nothing to interact with because in fact there are no other possibilities.  That is, the other possibilities aren't; they exist only in our minds.  This non-interaction is what makes the probability distribution classical.  Therefore, in considering ways to derive our wave function from classical probability distributions, any two things in the wave function that interact with each other do not correspond to different classical possibilities.

It follows that quantum states — those things that can be superposed, interfere with each other, and partly cancel each other out — are not separated by a boundary between different classical possibilities.  This does not, on the face of it, prohibit superposable elements from being prior or orthogonal to such boundaries, so that the mathematical model superposes entities of some sort and then applies them to a classical probability distribution (or applies the distribution to them).  Also keep in mind, though we're striving for a model in which the wave function isn't primitive, we haven't pinned down yet what is primitive.

Now, the wave function isn't a thing.  It isn't observable, and we introduce it into the mathematics only because it's useful.  So if it also isn't primitive, one has to wonder whether it's even needed in the mathematics, or whether perhaps we're simply to replace it by something else.  To get a handle on this, we need to look at how the wave function is actually used in applying quantum mechanics to physical systems; after all, one can't very well fashion a replacement for one part of a machine unless one understands how that part interacts with the rest of the machine.

The entire subject of quantum mechanics appears imho to be filled with over-interpretation; to the extent any progress has been made in understanding quantum mechanics over the past nearly-a-century, it's consisted largely in learning to prune unnecessary metaphysical underbrush so one has a somewhat better view of the theory.

The earliest, conventional "interpretation" of QM, the "Copenhagen interpretation", says properties of the physical system don't exist until observed.  This, to be brutally honest, looks to me like a metaphysical statement without practical meaning.  There is a related, but more practical, concept called contextuality; and an associated — though unfortunately technically messy — no-go theorem called the Kochen–Specker theorem, a.k.a. the Bell–Kochen–Specker theorem.  This all relates to the Heisenberg uncertainty principle, which says that you can't know the exact position and momentum of a particle at the same time; the more you know about its position, the less you can know about its momentum, and vice versa.  One might think this would be because the only way to measure the particle's position or momentum is to interact with it, which alters the particle because, well, because to every action there is an equal and opposite reaction.  However, in the practical application of the wave function to a quantum-mechanical system, there doesn't appear to be any experimental apparatus within the quantum system for the equal-and-opposite-reaction to apply to.  Instead, there's simply a wave function and then it collapses.  Depending on what you choose to observe (say, the position or the momentum), it collapses differently, so that the unobservable internal state of the system actually remembers which you chose to observe.  This property, that the (unobservable) internal state of the system changes as a result of what you choose to measure about it, is contextuality; and the Kochen–Specker theorem says a classical hidden-variable theory, consistent with QM, must be contextual (much as Bell's Theorem says it must be non-local).  Remember Bohm's hidden-variable theory, in which the particle does have an unobservable exact position and momentum?  Yeah.  Besides being rampantly non-local, Bohm's model is also contextual:  the particle's (unobservable, exact) position and momentum are guided by the wave function, and the wave-function is perturbed by the choice of measurement, therefore the particle's (unobservable, exact) position and momentum are also perturbed by the choice of measurement.

Bell, being of a later generation than Bohr and Einstein (and thus, perhaps, less invested in pre-quantum metaphysical ideas), managed not to be distracted by questions of what is or isn't "really there".  His take on the situation was that the difficulty was in how to handle the interface between quantum reality and classical reality — not philosophically, but practically.  To see this, consider the basic elements of an exercise in traditional QM (non-relativistic, driven by Schrödinger's equation):

  • A set of parameters defines the classical state of the system; these become inputs to the wave equation.

  • A Hamiltonian operator Ĥ embodies the classical dynamics of the system.

  • Schrödinger's equation provides quantum distortion of the classical system.

  • A Hermitian operator called an "observable" embodies the experimental apparatus used to observe the system.  The wave function collapses to an eigenstate of the observable.
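
For orientation, here is how those elements line up in a minimal numerical exercise: a single particle in a one-dimensional harmonic potential, discretized on a grid.  The grid size, the potential, and the initial packet are arbitrary choices for illustration, and the collapse step itself isn't modeled; the sketch stops at the wave function and an expectation value.

import numpy as np

# Parameters of the particular system (units with hbar = m = 1).
n, span = 400, 20.0
x = np.linspace(-span / 2, span / 2, n)
dx = x[1] - x[0]
V = 0.5 * x**2                                # the classical potential field

# Hamiltonian operator: finite-difference kinetic term plus the potential.
lap = (np.diag(np.ones(n - 1), -1) - 2 * np.eye(n) + np.diag(np.ones(n - 1), 1)) / dx**2
H = -0.5 * lap + np.diag(V)

# Initial state: a displaced Gaussian wave packet, normalized.
psi0 = np.exp(-(x - 2.0)**2).astype(complex)
psi0 /= np.sqrt(np.sum(np.abs(psi0)**2) * dx)

# Schrödinger's equation: evolve by expanding in eigenstates of H.
evals, evecs = np.linalg.eigh(H)
def evolve(psi, t):
    coeff = evecs.conj().T @ psi
    return evecs @ (np.exp(-1j * evals * t) * coeff)

# An "observable", here simply position: its expectation in the evolved state.
psi_t = evolve(psi0, 1.0)
print(np.sum(x * np.abs(psi_t)**2) * dx)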

The observable is the interface between the quantum system and the classical world of the physicist; and Bell ascribes the difficulty to this interface.  Consider a standard double-slit experiment in which an electron gun fires electrons one at a time through the double slit at a CRT screen where each electron causes a scintillation.  As long as you don't observe which slit the electron passes through, you get an interference pattern from the wave function passing through the two slits, and that is quantum behavior; but there's nothing in the wave function to suggest the discreteness of the resulting scintillation.  That discreteness results from the wave function collapse due to the observable, the interface with classical physics — and that discreteness is an essential part of the described physical reality.  Scan that again:  in order to fully account for physical reality, the quantum system has to encompass only a part of reality, because the discrete aspect of reality is only provided by the interface between the quantum system and surrounding classical physics.  It seems that we couldn't describe the entire universe using QM even if we wanted to because, without a classical observable to collapse the wave function, the discrete aspect of physical reality would be missing.  (Notice, this account of the difficulty is essentially structural, with only the arbitrary use of the term observable for the Hermitian operator as a vestige of the history of philosophical angst over the "role of the observer".  It's not that there isn't a problem, but that presenting the problem as if it were philosophical only gets in the way of resolving it.)

The many-worlds interpretation of QM (MWI) says that the wave function does not, in fact, collapse, but instead the entire universe branches into multiples for the different possibilities described by the wave function.  Bell criticized that while this is commonly presented as supposing that the wave function is "all there is", in fact it arbitrarily adds the missing discreteness:

the extended wave does not simply fail to specify one of the possibilities as actual...it fails to list the possibilities.  When the MWI postulates the existence of many worlds in each of which the photographic plate is blackened at a particular position, it adds, surreptitiously, the missing classification of possibilities.  And it does so in an imprecise way, for the notion of the position of a black spot (it is not a mathematical point) [...] [or] reading of any macroscopic instrument, is not mathematically sharp.  One is given no idea of how far down towards the atomic scale the splitting of the world into branch worlds penetrates.
— J.S. Bell, "Six possible worlds of quantum mechanics", Speakable and unspeakable in quantum mechanics (anthology), 1993.
I'm inclined to agree:  whatever philosophical comfort the MWI might provide to its adherents, it doesn't clarify the practical situation, and adds a great deal of conceptual machinery in the process of not doing so.

The transactional "interpretation" of QM is, afaik, somewhat lower-to-the-ground metaphysically.  To my understanding, TI keeps everything in quantum form, and posits that spacetime events interact through a "quantum handshake":  a wave propagates forward in time from an emission event, while another propagates backward in time from the corresponding absorption event, and they form a standing wave between the two while backward waves cancel out before the emission and forward waves cancel after the absorption.  Proponents of the TI report that it causes the various paradoxes and conceptual anomalies of QM to disappear (cf. striking natural structure), and this makes sense to me because the "observable" Hermitian operator should be thus neatly accounted for as representing half of a quantum handshake, in which the "observer" half of the handshake is not part of the particular system under study.  Wherever we choose to put the boundary of the system under study, the interface to our experimental apparatus would naturally have this half-a-handshake shape.

The practical lesson from the transactional interpretation seems to be that, for purposes of modeling QM, we don't have to worry about the wave function collapsing.  If we can replicate the wave function, we're in.  Likewise, if we can replicate the classical probability distributions that the wave function generates; so long as this includes all the probability distributions that result from weird quantum correlations (spooky action-at-a-distance).  That the latter suffices, should be obvious since generating those probability distributions is the whole point of quantum theory; that the latter is possible is demonstrated by Bohm's hidden-variable theory (sometimes called the "Bohm Interpretation" by those focusing on its philosophy).

Genericity

There is something odd about the above list of basic elements of a QM exercise, when compared to the rewriting-calculus-inspired model we're trying to apply to it.  When one thinks of a calculus term, it's a very concrete thing, with a specific representation (in fact over-specific, so that maintaining it may require α-renaming to prevent specific name choices from disrupting hygiene); and even classical physics seems to present a rather concrete representation.  But the quantum distortion of the wave equation apparently applies to whatever description of a physical system we choose; to any choice of parameters and Ĥ, regardless of whether it bears any resemblance to classical physics.  It certainly isn't specific to the representation of any single elementary unit, since it doesn't even blink (metaphorically) at shifting application from a one-electron to a two-electron system.

This suggests, to me anyway, two things.  On the negative/cautionary side, it suggests a lack of information from which to choose a concrete representation for the "local" part of a physical system, which one might have thought would be the most straightforward and stable part of a cosmological "term".  Perhaps more to the point, though, on the positive, insight-aiding side it suggests that if the quantum distortion is caused by some sort of non-local network playing out through rewrites in a dimension orthogonal to spacetime, we should consider trying to construct machinery for it that doesn't depend, much, on the particular shape of the local representation.  If our distortion machinery does place some sort of constraints on local representation, they'd better be constraints that say something true about physics.  Not forgetting, we expect our machinery to notice the difference between gravity and the other fundamental forces.

My most immediate goal, though, lest we forget, is to reckon whether it's at all possible any such machinery can produce the right sort of quantum distortion:  a sanity check.  Clues to the sort of thing one ought to look for are extremely valuable; but, having assimilated those clues, I don't atm require a full-blown theory, just a sense of what sort of thing is possible.  Anything that can be left out of the demonstration probably should be.  We're not even working with the best wave equation available; the Schrödinger equation is only an approximation covering the non-relativistic case.  In fact, the transactional-interpretation folks tell us their equations require the relativistic treatment, so it's even conceivable the sanity check could run into difficulties because of the non-relativistic wave equation (though one might reasonably hope the sanity check wouldn't require anything so esoteric).  But all this talk about relativistic and non-relativistic points out that there is, after all, something subtle about local geometry built into the form of the wave equation even though it's not directly visible in the local representation.  In which case, the wave equation may still contain the essence of that co-hygienic difference between gravity and the other fundamental forces (although... for gravity even the usual special-relativistic Dirac equation might not be enough, and we'd be on to the Dirac equation for curved spacetime; let's hope we don't need that just yet).

The universe says 'hi'

Let's just pause here, take a breather and see where we are.  The destination I've had my eye on, from the start of this post, was to demonstrate that a rewriting system, of the sort described, could produce some sort of quantum-like wave function.  I've been lining up support, section by section, for an assault on the technical specifics of how to set up rewriting systems — and we're not ready for that yet.  As noted just above, we need more information from which to choose a concrete representation.  If we try to tangle with that stuff before we have enough clues from... somewhere... to guide us through it, we'll just tie ourselves in knots.  This kind of exploration has to be approached softly, shifting artfully from one path to another from time to time so as not to rush into hazard on any one angle of attack.  So, with spider-sense tingling —or perhaps thumbs pricking— I'll shift now to consider, instead of pieces of the cosmos, pieces of the theory.

In conventional quantum mechanics, as noted a couple of sections above, we've got basically three elements that we bring together:  the parameters of our particular system of study, our classical laws of physics, and our wave equation.  Well, yeah, we also have the Hermitian operator, but, as remarked earlier, we can set that aside since it's to do with interfacing to the system, which was our focus in that section but isn't what we're after now.  The parameters of the particular system are what they are.  The classical laws of physics are, we suppose, derived from the transformation rules of our cosmic rewriting system, with particular emphasis on the character of the primitive elements of the cosmos (whatever they are) and the geometry, and some degree of involvement of the network topology.  The wave equation is also derived from the transformation rules, especially from how they interact with the network topology.

This analysis is already deviating from the traditional quantum scenario, because in the traditional scenario the classical laws of physics are strictly separate from the wave equation.  We've had hints of something deep going on with the choice of wave equation; Transactional Interpretation researchers reporting that they couldn't use the non-relativistic wave equation; and then there was the odd intimation, in my recent post deriving quantum-like effects from a drastically simplified system that lacked a wave equation, that the lack of a wave equation was somehow crippling something to do with systemic coherence buried deep in the character of the mathematics.  Though it does seem plausible that the wave equation would be derived more from the network topology, and perhaps the geometry, whereas the physical laws would be derived more from the character of the elementary physical components, it is perhaps only to be expected that these two components of the theory, laws and wave equation, would be coupled through their deep origins in the interaction of a single cosmological rewriting calculus.

Here is how I see the situation.  We have a sort of black box, with a hand crank and input and output chutes, and the box is labeled physical laws + wave equation.  We can feed into it the parameters of the particular physical system we're studying (such as a single electron in a potential field), carefully turn the crank (because we know it's a somewhat cantankerous device so that a bit of artistry is needed to keep it working smoothly), and out comes a wave function, or something akin, describing, in a predictive sense, the observable world.  What's curious about this box is that we've looked inside, and even though the input and output are in terms of a classical world, inside the box it appears that there is no classical world.  Odd though that is, we've gotten tolerably good at turning the crank and getting the box to work right.  However, somewhere above that box, we are trying to assemble another box, with its own hand crank and input/output chutes.  To this box, we mean to feed in our cosmic geometry, network topology, and transformation rules, and possibly some sort of initial classical probability distribution, and if we can get the ornery thing to work at all, we mean to turn the crank and get out of it — the physical laws plus wave equation.

Having arrived at this vision of an upper box, I was reading the other day a truthfully rather prosaic account of the party line on quantum mechanics (a 2004 book, not at all without merit as a big-picture description of mainstream thought, called Symmetry and the Beautiful Universe), and encountered a familiar rhetorical question of such treatments:  when considering a quantum mechanical wave function, "‍[...] what is doing the waving?"  And unlike previous times I'd encountered that question (years or decades before), this time the answer seemed obvious.  The value of the wave function is not a property of any particular particle in the system being studied, nor is it even a property of the system-of-interest as a whole; it's not part of the input we feed into the lower box at all, rather it's a property of the state of the system and so part of the output.  The wave equation describes what happens when the system-of-interest is placed into the context of a vastly, vastly larger cosmos (we're supposing it has to be staggeringly vaster than the system-of-interest in order for the trick to work right), and the whole is set to jostling about till it settles into a stable state.  Evidently, the shape that the lower box gives to its output is the footprint of the surrounding cosmos.  So this time when the question was asked, it seemed to me that what is waving is the universe.

The upper box

All we have to work with here are our broad guesses about the sort of rewriting system that feeds into the upper box, and the output of the lower box for some inputs.  Can we deduce anything, from these clues, about the workings of the upper box?

As noted, the wave function that comes out of the lower box assigns a weight to each state of the entire system-of-interest, rather than to each part of the system.  Refining that point, each weight is assigned to a complete state of the system-of-interest rather than to a separable state of a part of the system-of-interest.  This suggests the weight (or, a weight) is associated with each particular possibility in the classical probability distribution that we're supposing is behind the wave equation generated by the upper box.  Keep in mind, these possibilities are not possible states of the system-of-interest at a given time; they're possible states of the whole of spacetime; the shift between those two perspectives is a slippery spot to step carefully across.

A puzzler is that the weights on these different possibilities are not independent of each other; they form a coherent pattern dictated by the wave equation.  Whatever classical scenario spacetime settles into, it apparently has to incorporate effective knowledge of other possible classical scenarios that it didn't settle into.  Moreover, different classical scenarios for the cosmos must —eventually, when things stabilize— settle down to a weight that depends only on the state of our system-of-interest.  Under the sort of structural discipline we're supposing, that correlation between scenarios is generated by any given possible spacetime jostling around between classical scenarios, and thus roaming over various possible scenarios to sample them.  Evidently, the key to all of this must be the transitions between cosmic scenarios:  these transitions determine how the weight changes between scenarios (whatever that weight actually is, in the underlying structure), how the approach to a stable state works (whatever exactly a stable state is), and, of course, how the classical probabilities eventually correlate with the weights.  That's a lot of unknowns, but the positive insight here is that the key lever for all of it is the transitions between cosmic scenarios.

And now, perhaps, we are ready (though we weren't a couple of sections above) to consider the specifics of how to set up rewriting systems.  Not, I think, at this moment; I'm saturated, which does tend to happen by the end of one of these posts; but as the next step, after these materials have gone back on the shelf for a while and had a chance to become new again.  I envision practical experiments with how to assemble a rewriting system that, fed into the upper box, would cause the lower box to produce simple quantum-like systems.  The technique is philosophically akin to my recent construction of a toy cosmos with just the barest skeleton of quantum-like structure, demonstrating that the most basic unclassical properties of quantum physics require almost none of the particular structure of quantum mechanics.  That treatment particularly noted that the lack of a wave equation seemed especially problematic; the next step I envision would seek to understand how something like a wave equation could be induced from a rewriting system.  Speculatively, from there one might study how variations of rewriting system produce different sorts of classical/quantum cosmos, and reason on toward what sort of rewriting system might produce real-world physics; a speculative goal perhaps quite different from where the investigation will lead in practice, but for the moment offering a plausible destination to make sail for.

Saturday, June 11, 2016

The co-hygiene principle

The mathematics is not there till we put it there.
Sir Arthur Eddington, The Philosophy of Physical Science, 1938.

Investigating possible connections between seemingly unrelated branches of science and mathematics can be very cool.  Independent of whether the connections actually pan out.  It can be mind-bending either way — I'm a big fan of mind-bending, as a practical cure for rigid thinking — and you can get all sorts of off-beat insights into odd corners that get illuminated along the way.  The more unlikely the connection, the more likely potential for mind bending; and also the more likely potential for pay-off if somehow it does pan out after all.

Two hazards you need to avoid, with this sort of thing:  don't overplay the chances it'll pan out — and don't underplay the chances it'll pan out.  Overplay and you'll sound like a crackpot and, worse, you might turn yourself into one.  Relish the mind bending, take advantage of it to keep your thinking limber, and don't get upset when you're not finding something that might not be there.  And at the same time, if you're after something really unlikely, say with only one chance in a million it'll pan out, and you don't leave yourself open to the possibility it will, you might just draw that one chance in a million and miss it, which would be just awful.  So treat the universe as if it has a sense of humor, and be prepared to roll with the punchlines.

Okay, the particular connection I'm chasing is an observed analogy between variable substitution in rewriting calculi and fundamental forces in physics.  If you know enough about those two subjects to say that makes no sense, that's what I thought too when I first noticed the analogy.  It kept bothering me, though, because it hooks into something on the physics side that's already notoriously anomalous — gravity.  The general thought here is that when two seemingly disparate systems share some observed common property, there may be some sort of mathematical structure that can be used to describe both of them and gives rise to the observed common property; and a mathematical modeling structure that explains why gravity is so peculiar in physics is an interesting prospect.  So I set out to understand the analogy better by testing its limits, elaborating it until it broke down.  Except, the analogy has yet to cooperate by breaking down, even though I've now featured it on this blog twice (1, 2).

So, building on the earlier explorations, in this post I tackle the problem from the other end, and try to devise a type of descriptive mathematical model that would give rise to the pattern observed in the analogy.

This sort of pursuit, as I go about it, is a game of endurance; again and again I'll lay out all the puzzle pieces I've got, look at them together, and try to accumulate a few more insights to add to the collection.  Then gather up the pieces and save them away for a while, and come back to the problem later when I'm fresh on it again.  Only this time I've kind-of succeeded in reaching my immediate goal.  The resulting post, laying out the pieces and accumulating insights, is therefore both an explanation of where the result comes from and a record of the process by which I got there.  There are lots of speculations within it shooting off in directions that aren't where I ended up.  I pointedly left the stray speculations in place.  Some of those tangents might turn out to be valuable after all; and taking them out would create a deceptive appearance of things flowing inevitably to a conclusion when, in the event, I couldn't tell whether I was going anywhere specific until I knew I'd arrived.

Naturally, for finding a particular answer — here, a mathematical structure that can give rise to the observed pattern — the reward is more questions.

Contents
Noether's Theorem
Calculi
Analogy
Metatime
Transformations
Determinism and rewriting
Nondeterminism and the cosmic footprint
Massive interconnection
Factorization
Side-effects
Co-hygiene
Epilog: hygiene
Noether's Theorem

Noether's theorem (pedantically, Noether's first theorem) says that each differentiable invariant in the action of a system gives rise to a conservation law.  This is a particularly celebrated result in mathematical physics; it's explicitly about how properties of a system are implied by the mathematical structure of its description; and invariants — the current fad name for them in physics is "symmetries" — are close kin to both hygiene and geometry, which relate to each other through the analogy I'm pursuing; so Noether's theorem has a powerful claim on my attention.

The action of a system always used to seem very mysterious to me, until I figured out it's one of those deep concepts that, despite its depth, is also quite shallow.  It comes from Lagrangian mechanics, a mathematical formulation of classical mechanics alternative to the Newtonian mechanics formulation.  This sort of thing is ubiquitous in mathematics, alternative formulations that are provably equivalent to each other but make various problems much easier or harder to solve.

Newtonian mechanics seeks to describe the trajectory of a thing in terms of its position, velocity, mass, and the forces acting on it.  This approach has some intuitive advantages but is sometimes beastly difficult to solve for practical problems.  The Lagrangian formulation is sometimes much easier to solve.  Broadly, the time evolution of the system follows a trajectory through abstract state-space, and a function called the Lagrangian of the system maps each state into a quantity that... er... well, its units are those of energy.  For each possible trajectory of the system through state-space, the time integral of the Lagrangian along that trajectory is the action.  The principle of least action says that starting from a given state, the system will evolve along the trajectory that minimizes the action.  Solving for the behavior of the system is then a matter of finding the trajectory whose action is smallest.  (How do you solve for the trajectory with least action?  Well, think of the trajectories as abstract values subject to variation, and imagine taking the "derivative" of the action over these variations.  The least action will be a local minimum, where this derivative is zero.  There's a whole mathematical technology for solving problems of just that form, called the "calculus of variations".)
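
To ground that in the standard textbook example (nothing here beyond the usual one-dimensional case):  take a single particle with Lagrangian L(x, ẋ) = ½mẋ² − V(x), so the action of a trajectory x(t) is S = ∫ L(x, ẋ) dt.  Requiring that small variations of the trajectory leave S stationary gives the Euler–Lagrange equation d/dt (∂L/∂ẋ) = ∂L/∂x, which for this L works out to mẍ = −dV/dx, Newton's second law recovered from the variational formulation.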

The Lagrangian formulation tends to be good for systems with conserved quantities; one might prefer the Newtonian approach for, say, a block sliding on a surface with friction acting on it.  And this Lagrangian affinity for conservative systems is where Noether's theorem comes in:  if there's a differentiable symmetry of the action — no surprise it has to be differentiable, seeing how central integrals and derivatives are to all this — the symmetry manifests itself in the system behavior as a conservation law.
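
Again the textbook illustration:  if V doesn't depend on x at all, the action is unchanged by the translation x ↦ x + ε; then ∂L/∂x = 0, so the Euler–Lagrange equation above gives d/dt (mẋ) = 0 and momentum is conserved.  Invariance of the action under time translation likewise yields conservation of energy, and invariance under rotation yields conservation of angular momentum.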

And what, you may ask, is this magical Lagrangian function, whose properties studied through the calculus of variations reveal the underlying conservation laws of nature?  Some deeper layer of reality, the secret structure that underlies all?  Not exactly.  The Lagrangian function is whatever works:  some function that causes the principle of least action to correctly predict the behavior of the system.  In quantum field theory — so I've heard, having so far never actually grappled with QFT myself — the Lagrangian approach works for some fields but there is no Lagrangian for others.  (Yes, Lagrangians are one of those mathematical devices from classical physics that treats systems in such an abstract, holistic way that it's applicable to quantum mechanics.  As usual for such devices, its history involves Sir William Rowan Hamilton, who keeps turning up on this blog.)

This is an important point:  the Lagrangian is whatever function makes the least-action principle work right.  It's not "really there", except in exactly the sense that if you can devise a Lagrangian for a given system, you can then use it via the action integral and the calculus of variations to describe the behavior of the system.  Once you have a Lagrangian function that does in fact produce the system behavior you want it to, you can learn things about that behavior from mathematical exploration of the Lagrangian.  Such as Noether's theorem.  When you find there is, or isn't, a certain differentiable symmetry in the action, that tells you something about what is or isn't conserved in the behavior of the system, and that result really may be of great interest; just don't lose sight of the fact that you started with the behavior of the system and constructed a suitable Lagrangian from which you are now deducing things about what the behavior does and doesn't conserve.

In 1543, Copernicus's heliocentric magnum opus De revolutionibus orbium coelestium was published with an unsigned preface by Lutheran theologian Andreas Osiander saying, more or less, that of course it'd be absurd to suggest the Earth actually goes around the Sun but it's a very handy fiction for the mathematics.  Uhuh.  It's unnecessary to ask whether our mathematical models are "true"; we don't need them to be true, just useful.  When Francis Bacon remarked that what is most useful in practice is most correct in theory, he had a point — at least, for practical purposes.

Calculi

The rewriting-calculus side of the analogy has a structural backstory from at least the early 1960s (some of which I've described in an earlier post, though with a different emphasis).  Christopher Strachey hired Peter Landin as an assistant, and encouraged him to do side work exploring formal foundations for programming languages.  Landin focused on tying program semantics to λ-calculus; but this approach suffered from several mismatches between the behavioral properties of programming languages versus λ-calculus, and in 1975 Gordon Plotkin published a solution for one of these mismatches, in one of the all-time classic papers in computer science, "Call-by-name, call-by-value and the λ-calculus" (pdf).  Plotkin defined a slight variant of λ-calculus, by altering the conditions for the β-rule so that the calculus became call-by-value (the way most programming languages behaved while ordinary λ-calculus did not), and proved that the resulting λv-calculus was fully Church-Rosser ("just as well-behaved" as ordinary λ-calculus).  He further set up an operational semantics — a rewriting system that ignored mathematical well-behavedness in favor of obviously describing the correct behavior of the programming language — and proved a set of correspondence theorems between the operational semantics and λv-calculus.

[In the preceding paragraph I perhaps should have mentioned compatibility, the other crucial element of rewriting well-behavedness; which you might think I'd have thought to mention since it's a big deal in my own work, though less flashy and more taken-for-granted than Church-Rosser-ness.]
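As a toy illustration of the call-by-value restriction (a minimal sketch, not Plotkin's actual formulation; the names Term, beta, and betaV are mine, and the substitution here is deliberately naive about variable capture, which is exactly the hygiene issue taken up further below):

```haskell
-- Hypothetical toy terms: variables, abstractions, applications.
data Term = Var String
          | Lam String Term
          | App Term Term
  deriving (Show, Eq)

-- A "value" in the call-by-value sense: anything but an application.
isValue :: Term -> Bool
isValue (App _ _) = False
isValue _         = True

-- Naive substitution (ignores capture; fine for closed illustrative examples).
subst :: String -> Term -> Term -> Term
subst x s (Var y)   | x == y    = s
                    | otherwise = Var y
subst x s (Lam y b) | x == y    = Lam y b
                    | otherwise = Lam y (subst x s b)
subst x s (App f a) = App (subst x s f) (subst x s a)

-- Ordinary β: any application of an abstraction is a redex.
beta :: Term -> Maybe Term
beta (App (Lam x b) a) = Just (subst x a b)
beta _                 = Nothing

-- Plotkin's β_v restriction: only reduce when the operand is already a value.
betaV :: Term -> Maybe Term
betaV (App (Lam x b) a) | isValue a = Just (subst x a b)
betaV _                             = Nothing
```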

Then in the 1980s, Matthias Felleisen applied Plotkin's approach to some of the most notoriously "unmathematical" behaviors of programs:  side-effects in both data (mutable variables) and control (goto and its ilk).  Like Plotkin, he set up an operational semantics and a calculus, and proved correspondence theorems between them, and well-behavedness for the calculus.  He introduced the major structural innovation of treating a side-effect as an explicit syntactic construct that could move upward within its term.  This upward movement would be a fundamentally different kind of rewrite from the function-application — the β-rule — of λ-calculus; abstractly, a side-effect is represented by a context 𝓢, which moves upward past some particular context C and, in the process, modifies C to leave in its wake some other context C':  C[𝓢[T]] → 𝓢[C'[T]] .  A side-effect is thus viewed as something that starts in a subterm and expands outward to affect more and more of the term until, potentially, it affects the whole term — if it's allowed to expand that far.  Of course, a side-effect might never expand that far if it's trapped inside a context that it can't escape from; notably, no side-effect can escape from context λx.[ ] , which is to say, no side-effect can escape from inside the body of a function that hasn't been called.

This is where I started tracking the game, and developing my own odd notions.  There seemed to me to be two significant drawbacks to Felleisen's approach, in its original published form.  For one thing, the transformation of context C to C', as 𝓢 moved across it, could be quite extensive; Felleisen himself aptly called these transformations "bubbling up"; as an illustration of how messy things could get, here are the rules for a continuation-capture construct C expanding out of the operator or operand of a function call:

(CT₁)T₂   →   C(λx₁.(T₁(λx₂.(A(x₁(x₂T₂))))))    for unused xₖ.
V(CT)   →   C(λx₁.(T(λx₂.(A(x₁(V x₂))))))    for unused xₖ.
The other drawback to the approach was that as published at the time, it didn't actually provide the full measure of well-behavedness from Plotkin's treatment of call-by-value.  One way or another, a constraint had to be relaxed somewhere.  What does the side-effect construct 𝓢 do once it's finished moving upward?  The earliest published solution was to wait till 𝓢 reaches the top of the term, and then get rid of it by a whole-term rewriting rule; that works, but the whole-term rewriting rule is explicitly not well-behaved:  calculus well-behavedness requires that any rewriting on a whole term can also be done to a subterm, and here we've deliberately introduced a rewriting rule that can't be applied to subterms.  So we've weakened the calculus well-behavedness.  Another solution is to let 𝓢 reach the top of the term, then let it settle into some sort of normal form, and relax the semantics–calculus correspondence theorems to allow for equivalent normal forms.  So the correspondence is weaker or, at least, more complicated.  A third solution is to introduce an explicit context-marker — in both the calculus and the operational semantics — delimiting the possible extent of the side-effect.  So you've got full well-behavedness but for a different language than you started out with.  (Felleisen's exploration of this alternative is part of the prehistory of delimited continuations, but that's another story.)

[In a galling flub, I'd written in the preceding paragraph Church-Rosser-ness instead of well-behavedness; fixed now.]

It occurred to me that a single further innovation should be able to eliminate both of these drawbacks.  If each side-effect were delimited by a context-marker that can move upward in the term, just as the side-effect itself can, then the delimiter would restore full Church-Rosser-ness without altering the language behavior; but, in general, the meanings of the delimited side-effects depend on the placement of the delimiter, so to preserve the meaning of the term, moving the delimiter may require some systematic alteration to the matching side-effect markers.  To support this, let the delimiter be a variable-binding construct, with free occurrences of the variable in the side-effect markers.  The act of moving the delimiter would then involve a sort of substitution function that propagates needed information to matching side-effect markers.  What with one thing and another, my academic pursuits dragged me away from this line of thought for years, but then in the 2000s I found myself developing an operational semantics and calculus as part of my dissertation, in order to demonstrate that fexprs really are well-behaved (though I should have anticipated that some people, having been taught otherwise, would refuse to believe it even with proof).  So I seized the opportunity to also elaborate my binding-delimiters approach to things that — unlike fexprs — really are side-effects.

This second innovation rather flew in the face of a tradition going back about seven or eight decades, to the invention of λ-calculus.  Alonzo Church was evidently quite concerned about what variables mean; he maintained that a proposition with free variables in it doesn't have a clear meaning, and he wanted to have just one variable-binding construct, λ, whose β-rule defines the practical meanings of all variables.  This tradition of having just one kind of variable, one binding construct, and one kind of variable-substitution (β-substitution) has had a powerful grip on researchers' imaginations for generations, to the point where even when other binding constructs are introduced they likely still have most of the look-and-feel of λ.  My side-effect-ful variable binders are distinctly un-λ-like, with rewriting rules, and substitution functions, bearing no strong resemblance to the β-rule.  Freedom from the β mold had the gratifying effect of allowing much simpler rewriting rules for moving upward through a term, without the major perturbations suggested by the term "bubbling up"; but, unsurprisingly, the logistics of a wild profusion of new classes of variables were not easy to work out.  Much elegant mathematics surrounding λ-calculus rests squarely on the known simple properties of its particular take on variable substitution.  The chapter of my dissertation that grapples with the generalized notion of substitution (Chapter 13, "Substitutive Reduction Systems", for anyone keeping score) has imho appallingly complicated foundations, although the high-level theorems at least are satisfyingly powerful.  One thing that did work out neatly was enforcement of variable hygiene, which in ordinary λ-calculus is handled by α-renaming.  In order to apply any nontrivial term-rewriting rule without disaster, you have to first make sure there aren't some two variables using the same name whose distinction from each other would be lost during the rewrite.  It doesn't matter, really, what sort of variables are directly involved in the rewrite rule:  an unhygienic rewrite could mess up variables that aren't even mentioned by the rule.  Fortunately, it's possible to define a master α-renaming function that recurses through the term renaming variables to maintain hygiene, and whenever you add a new sort of variable to the system, just extend the master function with particular cases for that new sort of variable.  Each rewriting rule can then invoke the master function, and everything works smoothly.
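A hedged sketch of what such a master renaming pass might look like, reusing the toy Term type from the earlier sketch (the function name freshen and the fresh-name scheme are mine, and the sketch assumes free variables don't already use names of the form "v0", "v1", ...):

```haskell
import qualified Data.Map as M

-- Walk the term once, giving every binder a globally fresh name, so that no
-- two distinct variables share a name and later rewrites can't confuse them.
-- Free variables (absent from the environment) are left untouched.
freshen :: Term -> Term
freshen t = fst (go M.empty t 0)
  where
    go env (Var x)   n = (Var (M.findWithDefault x x env), n)
    go env (Lam x b) n =
      let x'       = "v" ++ show n            -- next fresh name
          (b', n') = go (M.insert x x' env) b (n + 1)
      in  (Lam x' b', n')
    go env (App f a) n =
      let (f', n')  = go env f n
          (a', n'') = go env a n'
      in  (App f' a', n'')
```

Extending the pass to a new sort of variable then amounts to adding cases for that sort's binding and instance forms, which is the "master function" idea in miniature.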

I ended up with four classes of variables.  "Ordinary" variables, of the sort supported by λ-calculus, I found were actually wanted only for a specific (and not even technically necessary) purpose:  to support partial evaluation.  You could build the whole calculus without them and everything would work right, but the equational theory would be very weak.  (I blogged on this point in detail here.)  A second class of variable supported continuations; in effect, the side-effect marker was a "throw" and the binding construct was a "catch".  Mutable state was more complicated, involving two classes of variables, one for assignments and one for lookups.  The variables for assignment were actually environment identities; each assignment side-effect would then specify a value, a symbol, and a free variable identifying the environment.  The variables for lookup stood for individual environment-symbol queries; looking up a symbol in an environment would generate queries for that environment and each of its ancestor environments.  The putative result of the lookup would be a leaf subterm with free variable occurrences for all the queries involved, waiting to assimilate the query results, while the queries themselves would rise through the term in search of matching assignments.  Whenever a query found a matching assignment, it would self-annihilate while using substitution to report the result to all waiting free variable occurrences.

Does all this detail matter to the analogy with physics?  Well, that's the question, isn't it.  There's a lot there, a great deal of fodder to chew on when considering how an analogy with something else might have a structural basis.

Analogy

Amongst the four classes of variables, partial-evaluation variables have a peculiar sort of... symmetry.  If you constructed a vau-calculus with, say, only continuation variables, you'd still have two different substitution functions — one to announce that a delimiting "catch" has been moved upward, and one for α-renaming.  If you constructed a vau-calculus with only mutable-state variables, you'd have, well, a bunch of substitution functions, but in particular all the substitutions used to enact rewriting operations would be separate from α-renaming.  β-substitution, though, is commensurate with α-renaming; once you've got β-substitution of partial-evaluation variables, you can use it to α-rename them as well, which is why ordinary λ-calculus has, apparently at least, only one substitution function.

Qualitatively, partial-evaluation variables seem more integrated into the fabric of the calculus, in contrast to the other classes of variables.

All of which put me powerfully in mind of physics because it's a familiar observation that gravity seems qualitatively more integrated into the fabric of spacetime, in contrast to the other fundamental forces (xkcd).  General relativity portrays gravity as the shape of spacetime, whereas the other forces merely propagate through spacetime, and a popular strategy for aspiring TOEs (Theories Of Everything) is to integrate the other fundamental forces into the geometry as well — although, looking at the analogy, perhaps that popular strategy isn't such a good idea after all.  Consider:  The analogy isn't just between partial-evaluation variables and gravity.  It's between the contrast of partial-evaluation variables against the other classes of variables, and the contrast of gravity against the other fundamental forces.  All the classes of variables, and all the fundamental forces, are to some extent involved.  I've already suggested that Felleisen's treatment of side-effects was both weakened and complicated by its too-close structural imitation of λ, whereas a less λ-like treatment of side-effects can be both stronger and simpler; so, depending on how much structure carries through the analogy, perhaps trying to treat the other fundamental forces too much like gravity should be expected to weaken and complicate a TOE.

Projecting through the analogy suggests alternative ways to structure theories of physics, which imho is worthwhile independent of whether the analogy is deep or shallow; as I've remarked before, I actively look for disparate ways of thinking as a broad base for basic research.  The machinery of calculus variable hygiene, with which partial-evaluation variables have a special affinity, is only one facet of term structure; and projecting this through to fundamental physics, where gravity has a special affinity with geometry, suggests that geometry itself might usefully be thought of, not as the venue where physics takes place, but merely as part of the rules by which the game is played.  Likewise, the different kinds of variables differ from each other by the kinds of structural transformations they involve; and projecting that through the analogy, one might try to think of the fundamental forces as differing from each other not (primarily) by some arbitrary rules of combination and propagation, but by being different kinds of structural manipulations of reality.  Then, if there is some depth to the analogy, one might wonder if some of the particular technical contrasts between different classes of variables might be related to particular technical contrasts between different fundamental forces — which, frankly, I can't imagine deciphering until and unless one first sets the analogy on a solid technical basis.

I've speculated several times on this blog on the role of non-locality in physics.  Bell's Theorem says that the statistical distribution of quantum predictions cannot be explained by any local, deterministic theory of physics if, by 'local and deterministic', you mean 'evolving forward in time in a local and deterministic way'; but it's quite possible to generate this same statistical distribution of spacetime predictions using a theory that evolves locally and deterministically in a fifth dimension orthogonal to spacetime.  Which strikes a familiar chord through the analogy with calculus variables, because non-locality is, qualitatively at least, the defining characteristic of what we mean by "side-effects", and the machinery of α-renaming maintains hygiene for these operations exactly by going off and doing some term rearranging on the side (as if in a separate dimension of rewriting that we usually don't bother to track).  Indeed, thought of this way, a "variable" seems to be an inherently distributed entity, spread over a region of the term — called its scope — rather than located at a specific point.  A variable instance might appear to have a specific location, but only because we look at a concrete syntactic term; naturally we have to have a particular concrete term in order to write it down, but somehow this doesn't seem to do justice to the reality of the hygiene machinery.  One could think of an equivalence class of terms under α-renaming, but imho even that is a bit passive.  The reality of a variable, I've lately come to think, is a dynamic distributed entity weaving through the term, made up of the binding construct (such as a λ), all the free instances within its scope, and the living connections that tie all those parts together; I imagine if you put your hand on any part of that structure you could feel it humming with vitality.

Metatime

To give a slightly less hand-wavy description of my earlier post on Bell's Theorem — since it is the most concrete example we have to inform our view of the analogy on the physics side:

Bell looked at a refinement of the experiment from the EPR paradox.  A device emits two particles with entangled spin, which shoot off in opposite directions, and their spins are measured by oriented detectors at some distance from the emitter.  The original objection of Einstein, Podolsky, and Rosen was that the two measurements are correlated with each other, but because of the distance between the two detectors, there's no way for information about either measurement to get to where the other measurement takes place without "spooky action at a distance".  Bell refined this objection by noting that the correlation of spin measurements depends on the angle θ between the detectors.  If you suppose that the orientations of the detectors at measurement are not known at the time and place where the particles are emitted, and that the outcomes of the measurements are determined by some sort of information — "hidden variable" — propagating from the emission event at no more than the speed of light, then there are limits (called Bell's Inequality) on how the correlation can be distributed as a function of θ, no matter what the probability distribution of the hidden variable.  The distribution predicted by quantum mechanics violates Bell's Inequality; so if the actual probability distribution of outcomes from the experiment matches the quantum mechanical prediction, we're living in a world that can't be explained by a local hidden-variable theory.
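For concreteness, the standard quantitative version of those limits (the CHSH form of Bell's Inequality; this is textbook material, cited here only to pin down what "limits on the correlation" means): with detector settings a, a' on one side and b, b' on the other, and E the correlation of the ±1 outcomes,

```latex
% Any local hidden-variable theory obeys
\[
  \left| E(a,b) - E(a,b') + E(a',b) + E(a',b') \right| \;\le\; 2 ,
\]
% while quantum mechanics predicts, for the spin-singlet state,
\[
  E(a,b) = -\cos\theta ,
\]
% with theta the angle between the detectors, which lets the left-hand side
% reach 2*sqrt(2) for suitable settings.
```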

My point was that this whole line of reasoning supposes the state of the world evolves forward in time.  If it doesn't, then we have to rethink what we even mean by "locality", and I did so.  Suppose our entire four-dimensional reality is generated by evolving over a fifth dimension, which we might as well call "metatime".  "Locality" in this model means that information about the state of one part of spacetime takes a certain interval of metatime to propagate a certain distance to another part of spacetime.  Instead of trying to arrange the probability distribution of a hidden variable at the emission event so that it will propagate through time to produce the desired probability distribution of measurements — which doesn't work unless quantum mechanics is seriously wrong about this simple system — we can start with some simple, uniform probability distribution of possible versions of the entire history of the experiment, and by suitably arranging the rules by which spacetime evolves, we can arrange that eventually spacetime will settle into a stable state where the probability distribution is just what quantum mechanics predicts.  In essence, it works like this:  let the history of the experiment be random (we don't need nondeterminism here; this is just a statement of uniformly unknown initial conditions), and suppose that the apparent spacetime "causation" between the emission and the measurements causes the two measurements to be compared to each other.  Based on θ, let some hidden variable decide whether this version of history is stable; and if it isn't stable, just scramble up a new one (we can always do that by pulling it out of the uniform distribution of the hidden variable, without having to posit fundamental nondeterminism).  By choosing the rule for how the hidden variable interacts with θ, you can cause the eventual stable history of the experiment to exhibit any probability distribution you choose.
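Here is a deliberately crude toy version of that resampling scheme (my own illustrative reconstruction in Haskell, not the construction from the earlier post; it assumes the random package, and it collapses the "history" down to just the pair of ±1 outcomes): keep scrambling the history until the hidden variable declares it stable, with the stability rule tuned to the singlet-state joint distribution P(a,b | θ) = (1 − a·b·cos θ)/4.

```haskell
import System.Random (randomRIO)

-- Scramble candidate histories until the hidden-variable rule accepts one.
settleHistory :: Double -> IO (Int, Int)
settleHistory theta = do
  a <- pick                              -- candidate outcome at detector 1
  b <- pick                              -- candidate outcome at detector 2
  u <- randomRIO (0, 1) :: IO Double     -- the "hidden variable"
  -- Stability rule: accept with probability (1 - a*b*cos theta) / 2,
  -- which is the quantum joint probability up to normalization.
  if u < (1 - fromIntegral (a * b) * cos theta) / 2
    then return (a, b)                   -- stable: this history is realized
    else settleHistory theta             -- unstable: scramble a new history
  where
    pick = do
      c <- randomRIO (0, 1) :: IO Double
      return (if c < 0.5 then 1 else -1)

main :: IO ()
main = do
  let theta = pi / 3
  samples <- mapM (const (settleHistory theta)) [1 .. 100000 :: Int]
  let agree = length [() | (a, b) <- samples, a == b]
  -- Quantum mechanics predicts agreement with probability sin^2(theta/2),
  -- which is 0.25 at theta = pi/3.
  print (fromIntegral agree / 100000 :: Double)
```

Note that the acceptance rule could just as easily be tuned to any other target distribution, which is exactly the point taken up next.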

That immense power is something to keep a cautious eye on:  not only can this technique produce the probability distribution predicted by quantum mechanics, it can produce any other probability distribution as well.  So, if the general structure of this mathematical theory determines something about the structure of the physical reality it depicts, what it determines is apparently not, in any very straightforward fashion, that probability distribution.

Transformations

The side of the analogy we have prior detailed structural knowledge about is the vau-calculus side.  Whatever useful insights we may hope to extract from the metatime approach to Bell's Theorem, it's very sketchy compared to vau-calculus.  So if we want to work out a structural pattern that applies to both sides of the analogy, it's plausible to start building from the side we know about, questioning and generalizing as we go along.  To start with,

  • Suppose we have a complex system, made up of interconnected parts, evolving by some sort of transformative steps according to some simple rules.
Okay, freeze frame.  Why should the system be made up of parts?  Well, in physics it's (almost) always the parts we're interested in.  We ourselves are, apparently, parts of reality, and we interact with parts of reality.  Could we treat the whole as a unit and then somehow temporarily pull parts out of it when we need to talk about them?  Maybe, but the form with parts is still the one we're primarily interested in.  And what about "transformative steps"; do we want discrete steps rather than continuous equations?  Actually, yes, that is my reading of the situation; not only does fundamental physics appear to be shot through with discreteness (I expanded on this point a while back), but the particular treatment I used for my metatime proof-of-concept (above) used an open-ended sequence of discrete trials to generate the requisite probability distribution.  If a more thoroughly continuous treatment is really wanted, one might try to recover continuity by taking a limit a la calculus.
  • Suppose we separate the transformation rules into two groups, which we call bookkeeping rules and operational rules; and suppose we have a set of exclusive criteria on system configurations, call these hygiene conditions, which must be satisfied before any operational rule can be applied.

Freeze again.  At first glance, this looks pretty good.  From any unhygienic configuration, we can't move forward operationally until we've done bookkeeping to ensure hygiene.  Both calculus rewriting and the metatime proof-of-concept seemingly conform to this pattern; but the two cases differ profoundly in how their underlying hygiene (supposing that's what it is, in the physics case) affects the form of the modeled system, and we'll need to consider the difference carefully if we mean to build our speculations on a sound footing.

Determinism and rewriting

Hygiene in rewriting is all about preserving properties of a term (to wit, variable instance–binding correspondences), whereas our proof-of-concept metatime transformations don't appear to be about perfectly preserving something but rather about shaping probability distributions.  One might ask whether it's possible to set up the internals of our metatime model so that the probability distribution is a consequence, or symptom, of conserving something behind the scenes.  Is the seemingly nondeterministic outcome of our quantum observation in a supposedly small quantum system here actually dictated by the need to maintain some cosmic balance that can't be directly observed because it's distributed over a ridiculously large number of entities (such as the number of electrons in the universe)?  That could lead to some bracing questions about how to usefully incorporate such a notion into a mathematical theory.

As an alternative, one might decide that the probability distribution in the metatime model should not be a consequence of absolutely preserving a condition.  There are two philosophically disparate sorts of models involving probabilities:  either the probability comes from our lack of knowledge (the hidden-variable hypothesis), and in the underlying model the universe is computing an inevitable outcome; or the probability is in the foundations (God playing dice, in the Einsteinian phrase), and in the underlying model the universe is exploring the range of possible outcomes.  I discussed this same distinction, in another form, in an earlier post, where it emerged as the defining philosophical distinction between a calculus and a grammar (here).  In those terms, if our physics model is fundamentally deterministic then it's a calculus and by implication has that structural affinity with the vau-calculi on the other side of the analogy; but if our physics model is fundamentally nondeterministic then it's a grammar, and our analogy has to try to bridge that philosophical gap.  Based on past experience, though, I'm highly skeptical of bridging the gap; if the analogy can be set on a concrete technical basis, the TOE on the physics side seems to me likely to be foundationally deterministic.

The foundationally deterministic approach to probability is to start with a probabilistic distribution of deterministic initial states, and evolve them all forward to produce a probabilistic distribution of deterministic final states.  Does the traditional vau-calculus side of our analogy, where we have so much detail to start with, have anything to say about this?  In the most prosaic sense, one suspects not; probability distributions don't traditionally figure into deterministic computation semantics, where this approach would mean considering fuzzy sets of terms.  There may be some insight lurking, though, in the origins of calculus hygiene.

When Alonzo Church's 1932/3 formal logic turned out to be inconsistent, he tried to back off and find some subset of it that was provably consistent.  Here consistent meant that not all propositions are equivalent to each other, and the subset of the logic that he and his student J. Barkley Rosser proved consistent in this sense was what we now call λ-calculus.  The way they did it was to show that if any term T1 can be reduced in the calculus in two different ways, as T2 and T3, then there must be some T4 that both of them can be reduced to.  Since logical equivalence of terms is defined as the smallest congruence generated by the rewriting relation of the calculus, from the Church-Rosser property it follows that if two terms are equivalent, there must be some term that they both can be reduced to; and therefore, two different irreducible terms cannot possibly be logically equivalent to each other.
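Stated compactly (the standard modern formulation, with a double-headed arrow for multi-step reduction; not Church and Rosser's original presentation):

```latex
% Church-Rosser property of a reduction relation:
\[
  T_1 \twoheadrightarrow T_2 \;\text{ and }\; T_1 \twoheadrightarrow T_3
  \quad\Longrightarrow\quad
  \exists\, T_4 .\;\; T_2 \twoheadrightarrow T_4 \;\text{ and }\; T_3 \twoheadrightarrow T_4 .
\]
% Hence equivalent terms share a common reduct, and two distinct irreducible
% terms cannot be equivalent: the consistency argument sketched above.
```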

Proving the Church-Rosser theorem for λ-calculus was not, originally, a simple matter.  It took three decades before a simple proof began to circulate, and the theorem for variant calculi continues to be a challenge.  And this is (in one view of the matter, at least) where hygiene comes into the picture.  Church had three major rewriting rules in his system, later called the α, β, and γ rules.  The α rule was the "bookkeeping" rule; it allowed renaming a bound variable as long as you don't lose its distinction from other variables in the process.  The β rule is now understood as the single operational rule of λ-calculus, how to apply a function to an argument.  The γ rule is mostly forgotten now; it was simply doing a β-step backward, and was later dropped in favor of starting with just α and β and then taking the congruent closure (reflexive, symmetric, transitive, and compatible).  Ultimately the Church-Rosser theorem allows terms to be sorted into β-equivalence classes; but the terms in each class aren't generally thought of as "the same term", just "equivalent terms".  α-equivalent terms, though, are much closer to each other, and for many purposes would actually be thought of as "the same term, just written differently".  Recall my earlier description of a variable as a distributed entity, weaving through the term, made up of binding construct, instances, and living connections between them.  If you have a big term, shot through with lots of those dynamic distributed entities, the interweaving could really be vastly complex.  So factoring out the α-renaming is itself a vast simplification, which for a large term may dwarf what's left after factoring to complete the Church-Rosser proof.  To see by just how much the bookkeeping might dwarf the remaining operational complexity, imagine scaling the term up to the sort of cosmological scope mentioned earlier — like the number of electrons in the universe.

It seems worth considering, that hygiene may be a natural consequence of a certain kind of factoring of a vastly interconnected system:  you sequester almost all of the complexity into bookkeeping with terrifically simple rules applied on an inhumanly staggering scale, and comparatively nontrivial operational rules that never have to deal directly with the sheer scale of the system because that part of the complexity was factored into the bookkeeping.  In that case, at some point we'll need to ask when and why that sort of factoring is possible.  Maybe it isn't really possible for the cosmos, and a flaw in our physics is that we've been trying so hard to factor things this way; when we really dive into that question we'll be in deep waters.

It's now no longer clear, btw, that geometry corresponds quite directly to α-renaming.  There was already some hint of that in the view of vau-calculus side-effects as "non-local", which tends to associate geometry with vau-calculus term structure rather than α-renaming as such.  Seemingly, hygiene is then a sort of adjunct to the geometry, something that allows the geometry to coexist with the massive interconnection of the system.

But now, with massive interconnection resonating between the two sides of the analogy, it's definitely time to ask some of those bracing questions about incorporating cosmic connectivity into a mathematical theory of physics.

Nondeterminism and the cosmic footprint

We want to interpret our probability distribution as a footprint, and reconstruct from it the massively connected cosmic order that walked there.  Moreover, we're conjecturing that the whole system is factorable into bookkeeping/hygiene on one hand(?), and operations that amount to what we'd ordinarily call "laws of physics" on the other; and we'd really like to deduce, from the way quantum mechanics works, something about the nature of the bookkeeping and the factorization.

Classically, if we have a small system that's acted on by a lot of stuff we don't know about specifically, we let all those influences sum to a potential field.  One might think of this classical approach as a particular kind of cosmological factorization in which the vast cosmic interconnectedness is reduced to a field, so one can then otherwise ignore almost everything to model the behavior of the small system of interest using a small operational set of physical laws.  We know the sort of cosmic order that reduces that way, it's the sort with classical locality (relative to time evolution); and the vaster part of the factorization — the rest of the cosmos, that reduced to a potential field — is of essentially the same kind as the small system.  The question we're asking at this point is, what sort of cosmic order reduces such that its operational part is quantum mechanics, and what does its bookkeeping part look like?  Looking at vau-calculus, with its α-equivalence and Church-Rosser β-equivalence, it seems fairly clear that hygiene is an asymmetric factorization:  if the cosmos factors this way, the bookkeeping part wouldn't have to look at all like quantum mechanics.  A further complication is that quantum mechanics may be an approximation only good when the system you're looking at is vastly smaller than the universe as a whole; indeed, this conjecture seems rather encouraged by what happens when we try to apply our modern physical theories to the cosmos as a whole: notably, dark matter.  (The broad notion of asymmetric factorizations surrounding quantum mechanics brings to mind both QM's notorious asymmetry between observer and observed, and Einstein's suggestion that QM is missing some essential piece of reality.)

For this factorization to work out — for the cosmic system as a whole to be broadly "metaclassical" while factoring through the bookkeeping to either quantum mechanics or a very good approximation thereof — the factorization has to have some rather interesting properties.  In a generic classical situation where one small thing is acted on by a truly vast population of other things, we tend to expect all those other things to average out (as typically happens with a classical potential field), so that the vastness of the population makes their combined influence more stable rather than less; and also, as our subsystem interacts with the outside influence and we thereby learn more about that influence, we become more able to allow for it and still further reduce any residual unpredictability of the system.

Considered more closely, though, the expectation that summing over a large population would tend to average out is based broadly on the paradigm of signed magnitudes on an unbounded scale that attenuate over distance.  If you don't have attenuation, and your magnitudes are on a closed loop, such as angles of rotation, increasing the population just makes things more unpredictable.  Interestingly, I'd already suggested in one of my earlier explorations of the hygiene analogy that the physics hygiene condition might be some sort of rotational constraint, for the — seemingly — unrelated reason that the primary geometry of physics has 3+1 dimension structure, which is apparently the structure of quaternions, and my current sense of quaternions is that they're the essence of rotation.  Though when things converge like this, it can be very hard to distinguish between an accidental convergence and one that simply reassembles fragments of a deeper whole.
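To make that contrast concrete (a standard probability observation, not anything specific to physics): bounded signed magnitudes average out under the law of large numbers, while unattenuated angles on a closed loop do not.

```latex
% Bounded signed magnitudes: for independent X_i with mean mu and variance sigma^2,
\[
  \operatorname{Var}\Bigl(\tfrac{1}{N}\sum_{i=1}^{N} X_i\Bigr) \;=\; \frac{\sigma^2}{N} ,
\]
% which shrinks as N grows, so a larger population makes the combined
% influence more stable.
% Unattenuated angles on a closed loop: if even one Theta_i is uniform on
% [0, 2*pi) and independent of the rest, then
\[
  \Bigl(\sum_{i=1}^{N} \Theta_i\Bigr) \bmod 2\pi
\]
% is itself exactly uniform, no matter how large N gets.
```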

I'll have a thought on the other classical problem — losing unpredictability as we learn more about outside influences over time — after collecting some further insights on the structural dynamics of bookkeeping.

Massive interconnection

Given a small piece of the universe, which other parts of the universe does it interact with, and how?

In the classical decomposition, all interactions with other parts of the universe are based on relative position in the geometry — that is, locality.  Interestingly, conventional quantum mechanics retains this manner of selecting interactions, embedding it deeply into the equational structure of its mathematics.  Recall the Schrödinger equation,

iℏ ∂Ψ/∂t   =   Ĥ Ψ .
The element that shapes the time evolution of the system — the Hamiltonian operator Ĥ — is essentially an embodiment of the classical expectation of the system behavior; which is to say, interaction according to the classical rules of locality.  (I discussed the structure of the Schrödinger equation at length in an earlier post, yonder.)  Viewing conventional QM this way, as starting with classical interactions and then tacking on quantum machinery to "fix" it, I'm put in mind of Ptolemaic epicycles, tacked on to the perfect-circles model of celestial motion to make it work.  (I don't mean the comparison mockingly, just critically; Ptolemy's system worked pretty well, and Copernicus used epicycles too.  Turns out there's a better way, though.)

How does interaction-between-parts play out in vau-calculus, our detailed example of hygiene at work?

The whole syntax of a calculus term is defined by two aspects:  the variables — by which I mean those "distributed entities" woven through the term, each made of a binding, its bound instances, and the connections that hygiene maintains between them — and, well, everything else.  Once you ignore the specific identities of all the variable instances, you've just got a syntax tree with each node labeled by a context-free syntax production; and the context-free syntax doesn't have very many rules.  In λ-calculus there are only four syntax rules: a term is either a combination, or a λ-abstraction, or a variable, or a constant.  Some treatments simplify this by using variables for the "constants", and it's down to only three syntax rules.  The lone operational rule β,

((λx.T₁)T₂)   →   T₁[x ← T₂]   ,
gives a purely local pattern in the syntax tree for determining when the operation can be applied:  any time you have a parent node that's a combination whose left child is a λ-abstraction.  Operational rules stay nearly this simple even in vau-calculi; the left-hand side of each operational rule specifies when it can be applied by a small pattern of adjacent nodes in the syntax tree, and mostly ignores variables (thanks to hygienic bookkeeping).  The right-hand side is where operational substitution may be introduced.  So evidently vau-calculus — like QM — is giving local proximity a central voice in determining when and how operations apply, with the distributed aspect (variables) coming into play in the operation's consequences (right-hand side).

Turning it around, though, if you look at a small subterm — analogous, presumably, to a small system studied in physics — what rules govern its non-local connections to other parts of the larger term?  Let's suppose the term is larger than our subterm by a cosmically vast amount.  The free variables in the subterm are the entry points by which non-local influences from the (vast) context can affect the subterm (via substitution functions).  And there is no upper limit to how fraught those sorts of interconnections can get (which is, after all, what spurs advocates of side-effect-less programming).  That complexity belongs not to the "laws of physics" (neither the operational nor even the bookkeeping rules), but to the configuration of the system.  From classical physics, we're used to having very simple laws, with all the complexity of our problems coming from the particular configuration of the small system we're studying; and now we've had that turned on its head.  From the perspective of the rules of the calculus, yes, we still have very simple rules of manipulation, and all the complexity is in the arrangement of the particular term we're working with; but from the perspective of the subterm of interest, the interconnections imposed by free variables look a lot like "laws of physics" themselves.  If we hold our one subterm fixed and allow the larger term around it to vary probabilistically then, in general, we don't know what the rules are and have no upper bound on how complicated those rules might be.  All we have are subtle limits on the shape of the possible influences by those rules, imposed roundaboutly by the nature of the bookkeeping-and-operational transformations.

One thing about the shape of these nonlocal influences:  they don't work like the local part of operations.  The substitutive part of an operation typically involves some mixture of erasing bound variable instances and copying fragments from elsewhere.  The upshot is that it rearranges the nonlocal topology of the term, that is, rearranges the way the variables interweave — which is, of course, why the bookkeeping rules are needed, to maintain the integrity of the interweaving as it winds and twists.  And this is why a cosmic system of this sort doesn't suffer from a gradual loss of unpredictability as the subsystem interacts with its neighbors in the nonlocal network and "learns" about them:  each nonlocal interaction that affects it changes its nonlocal-network neighbors.  As long as the overall system is cosmically vast compared to the subsystem we're studying, in practice we won't run out of new network neighbors no matter how many nonlocal actions our subsystem undergoes.

This also gives us a specific reason to suspect that quantum mechanics, by relying on this assumption of an endless supply of fresh network neighbors, would break down when studying subsystems that aren't sufficiently small compared to the cosmos as a whole.  Making QM (as I've speculated before), like Newtonian physics, an approximation that works very well in certain cases.

Factorization

Here's what the reconstructed general mathematical model looks like so far:

  • The system as a whole is made up of interconnected parts, evolving by transformative steps according to simple rules.

  • The interconnections form two subsystems:  local geometry, and nonlocal network topology.

  • The transformation rules are of two kinds, bookkeeping and operational.

  • Operational rules can only be applied to a system configuration satisfying certain hygiene conditions on its nonlocal network topology; bookkeeping, which only acts on nonlocal network topology, makes it possible to achieve the hygiene conditions.

  • Operational rules are activated based on local criteria (given hygiene).  When applied, operations can have both local and nonlocal effects, while the integrity of the nonlocal network topology is maintained across both kinds of effect via hygiene, hence bookkeeping.

To complete this picture, it seems, we want to consider what a small local system consists of, and how it relates to the whole.  This is all the more critical since we're entertaining the possibility that quantum mechanics might be an approximation that only works for a small local system, so that understanding how a local system relates to the whole may be crucial to understanding how quantum mechanics can arise locally from a globally non-quantum TOE.

A local system consists of some "local state", stuff that isn't interconnection of either kind; and some interconnections of (potentially) both kinds that are entirely encompassed within the local system.  For example, in vau-calculi — our only prior source for complete examples — we might have a subterm (λx.(y‍x)).  Variables are nonlocal network topology, of course, but in this case variable x is entirely contained within the local system.  The choice of variable name "x" is arbitrary, as long as it remains different from "y" (hygiene).  But what about the choice of "y"?  In calculus reasoning we would usually say that because y is free in this subterm, we can't touch it; but that's only true if we're interested in comparing it to specific contexts, or to other specific subterms.  If we're only interested in how this subterm relates to the rest of the universe, and we have no idea what the rest of the universe is, then the free variable y really is just one end of a connection whose other end is completely unknown to us; and a different choice of free variable would tell us exactly as much, and as little, as this one.  We would be just as well off with (λx.(z‍x)), or (λz.(w‍z)), or even (λy.(x‍y)) — as long as we maintain the hygienic distinction between the two variables.  The local geometry that can occur just outside this subterm, in its surrounding context, is limited to certain specific forms (by the context-free grammar of the calculus); the nonlocal network topology is vastly less constrained.
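In code (again the toy Term type from the sketches above, with a hypothetical freeVars helper): the free variables of a subterm are exactly its entry points for nonlocal influence, and they are all the outside ever sees of its variable structure.

```haskell
import qualified Data.Set as S

-- The free variables of a subterm: its nonlocal "entry points".
freeVars :: Term -> S.Set String
freeVars (Var x)   = S.singleton x
freeVars (Lam x b) = S.delete x (freeVars b)
freeVars (App f a) = freeVars f `S.union` freeVars a

-- freeVars (Lam "x" (App (Var "y") (Var "x")))  ==  S.fromList ["y"]
-- freeVars (Lam "z" (App (Var "w") (Var "z")))  ==  S.fromList ["w"]
-- Internally different names, identical "shape" as seen from outside.
```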

The implication here is that all those terms just named are effectively equivalent to (λx.(y‍x)).  One might be tempted to think of this as simply different ways of "writing" the same local system, as in physics we might describe the same local system using different choices of coordinate axes; but the choice of coordinate axes is about local geometry, not nonlocal network topology.  Here we're starting with simple descriptions of local systems, and then taking the quotient under the equivalence induced by the bookkeeping operations.  It's tempting to think of the pre-quotient simple descriptions as "classical" and analogize the quotient to "quantum", but in fact there is a second quotient to be taken.  In the metatime proof-of-concept, the rewriting kept reshuffling the entire history of the experiment until it reached a steady state — the obvious analogy is to a calculus irreducible term, the final result of the operational rewrite relation of the calculus.  And this, at last, is where Church-Rosser-ness comes in.  Church-Rosser is what guarantees that the same irreducible form (if any) is reached no matter in what order operational rules are applied.  It's the enabler for each individual state of the system to evolve deterministically.  To emphasize this point:  Church-Rosser-ness applies to an individual possible system state, thus belongs to the deterministic-foundations approach to probabilities.  The probability distribution itself is made up of individual possibilities each one of which is subject to Church-Rosser-ness.  (Church-Rosser-ness is also, btw, a property of the mathematical model:  one doesn't ask "Why should these different paths of system state evolution all come back together to the same normal form?", because that's simply the kind of mathematical model one has chosen to explore.)

The question we're trying to get a handle on is why the nonlocal effects of some operational rules would appear to be especially compatible with the bookkeeping quotient of the local geometry.

Side-effects

In vau-calculi, the nonlocal operational effects (i.e., operational substitution functions) that do not integrate smoothly with bookkeeping (i.e., with α-renaming) are the ones that support side-effects; and the one nonlocal operational effect that does integrate smoothly with bookkeeping — β-substitution — supports partial evaluation and turns out to be optional, in the sense that the operational semantics of the system could be described without that kind of nonlocal effect and the mathematics would still be correct with merely a weaker equational theory.

This suggests that in physics, gravity could be understood without bringing nonlocal effects into it at all, though there may be some sort of internal mathematical advantage to bringing them in anyway; while the other forces may be thought of as, in some abstract sense, side-effects.

So, what exactly makes a side-effect-ful substitution side-effect-ful?  Conversely, β-substitution is also a form of substitution; it engages the nonlocal network topology, reshuffling it by distributing copies of subterms, the sort of thing I speculated above may be needed to maintain the unpredictability aspect of quantum mechanics.  So, what makes β-substitution not side-effect-ful in character, beyond the very specific technicalities of β- and α-substitution; and just how much, or little, should we be abstracting away from those technicalities?  I'm supposing we have to abstract away at least a bit, on the principle that physics isn't likely to be technically close to vau-calculi in its mathematical details.

Here's a stab at a sufficient condition:

  • A nonlocal operational effect is side-effect-ful just if it perturbs the pre-existing local geometry.

The inverse property, called "purity" in a programming context (as in "pure function"), is that the nonlocal operational effect doesn't perturb the pre-existing local geometry.  β-substitution is pure in this sense, as it replaces a free variable-instance with a subterm but doesn't affect anything local other than the variable-instance itself.  Contrast this with the operational substitution for control variables; the key clauses (that is, nontrivial base cases) of the two substitutions are

x[x ← T]   →   T   .

(τx.T)[x ← C]   →   (τx.C[T[x ← C]])   .

The control substitution alters the local-geometric distance between pre-existing structure T and whatever pre-existing immediate context surrounds the subterm acted on.  Both substitutions have the — conjecturally important — property that they substantially rearrange the nonlocal network topology by injecting arbitrary new network connections (that is, new free variables).  The introduction of new free variables is a major reason why vau-calculi need bookkeeping to maintain hygiene; although, interestingly, it's taken all this careful reasoning about bookkeeping to conclude that bookkeeping isn't actually necessary to the notion of purity/impurity (or side-effect-ful/non-side-effect-ful); apparently, bookkeeping is about perturbations of the nonlocal network topology, whereas purity/impurity is about perturbations of the local geometry.  To emphasize the point, one might call this non-perturbation of local geometry co-hygiene — all the nonlocal operational effects must be hygienic, which might or might not require bookkeeping depending on internals of the mathematics, but only the β- and gravity nonlocal effects are co-hygienic.

Co-hygiene

Abstracting away from how we got to it, here's what we have:

  • A complex system of parts, evolving through a Church-Rosser transformation step relation.

  • Interconnections within a system state, partitioned into local (geometry) and nonlocal (network).

  • Each transformation step is selected locally.

  • The nonlocal effects of each transformation step rearrange — scramble — nonlocal connections at the locus where applied.

  • Certain transformative operations have nonlocal effects that do not disrupt pre-existing local structure — that are co-hygienic — and thereby afford particularly elegant description.

What sort of elegance is involved in the description of a co-hygienic operation depends on the technical character of the mathematical model; for β-reduction, what we've observed is functional compatibility between β- and α-substitution, while for gravity we've observed the general-relativity integration between gravity and the geometry of spacetime.

So my proposed answer to the conundrum I've been pondering is that the affinity between gravity and geometry suggests a modeling strategy with a nonlocal network pseudo-randomly scrambled by locally selected operational transformations evolving toward a stable state of spacetime, in which the gravity operations are co-hygienic.  A natural follow-on question is just what sort of mathematical machinery, if any, would cause this network-scrambling to approximate quantum mechanics.

On the side, I've got intimations here that quantum mechanics may be an approximation induced by the pseudo-random network scrambling when the system under study is practically infinitesimal compared to the cosmos as a whole, and perhaps that the network topology has a rotational aspect.

Meanwhile, an additional line of possible inquiry has opened up.  All along I've been trying to figure out what the analogy says about physics; but now it seems one might study the semantics of a possibly-side-effect-ful program fragment by some method structurally akin to quantum mechanics.  The sheer mathematical perversity of quantum mechanics makes me skeptical that this could be a practical approach to programming semantics; on the other hand, it might provide useful insights for the TOE mathematics.

Epilog: hygiene

So, what happened to hygiene?  It was a major focus of attention through nearly the whole investigation, and then dropped out of the plot near the end.

At its height of prestige, when directly analogized to spacetime geometry (before that fell through), hygiene motivated the suggestion that geometry might be thought of not as the venue where physics happens but merely as part of its rules.  That suggestion is still somewhat alive since the proposed solution treats geometry as abstractly just one of the two forms of interconnection in system state, though there's a likely asymmetry of representation between the two forms.  There was also some speculation that understanding hygiene on the physics side could be central to making sense of the model; that, I'd now ease off on, but do note that in seeking a possible model for physics one ought to keep an eye out for a possible bookkeeping mechanism, and certainly resolving the nonlocal topology of the model would seem inseparable from resolving its bookkeeping.  So hygiene isn't out of the picture, and may even play an important role; just not with top billing.

Would it be possible for the physics model to be unhygienic?  In the abstract sense I've ended up with, lack of hygiene would apparently mean an operation whose local effect causes nonlocal perturbation.  Whether or not dynamic scope qualifies would depend on whether dynamically scoped variables are considered nonlocal; but since we expect some nonlocal connectivity, and those variables couldn't be perturbed by local operations without losing most/all of their nonlocality, probably the physics model would have to be hygienic.  My guess is that if a TOE actually tracked the nonlocal network (as opposed to, conjecturally, introducing a quantum "blurring" as an approximation for the cumulative effect of the network), the tracking would need something enough like calculus variables that some sort of bookkeeping would be called for.