Saturday, June 11, 2016

The co-hygiene principle

The mathematics is not there till we put it there.
Sir Arthur Eddington, The Philosophy of Physical Science, 1938.

Investigating possible connections between seemingly unrelated branches of science and mathematics can be very cool.  Independent of whether the connections actually pan out.  It can be mind-bending either way — I'm a big fan of mind-bending, as a practical cure for rigid thinking — and you can get all sorts of off-beat insights into odd corners that get illuminated along the way.  The more unlikely the connection, the more likely potential for mind bending; and also the more likely potential for pay-off if somehow it does pan out after all.

Two hazards you need to avoid, with this sort of thing:  don't overplay the chances it'll pan out — and don't underplay the chances it'll pan out.  Overplay and you'll sound like a crackpot and, worse, you might turn yourself into one.  Relish the mind bending, take advantage of it to keep your thinking limber, and don't get upset when you're not finding something that might not be there.  And at the same time, if you're after something really unlikely, say with only one chance in a million it'll pan out, and you don't leave yourself open to the possibility it will, you might just draw that one chance in a million and miss it, which would be just awful.  So treat the universe as if it has a sense of humor, and be prepared to roll with the punchlines.

Okay, the particular connection I'm chasing is an observed analogy between variable substitution in rewriting calculi and fundamental forces in physics.  If you know enough about those two subjects to say that makes no sense, that's what I thought too when I first noticed the analogy.  It kept bothering me, though, because it hooks into something on the physics side that's already notoriously anomalous — gravity.  The general thought here is that when two seemingly disparate systems share some observed common property, there may be some sort of mathematical structure that can be used to describe both of them and gives rise to the observed common property; and a mathematical modeling structure that explains why gravity is so peculiar in physics is an interesting prospect.  So I set out to understand the analogy better by testing its limits, elaborating it until it broke down.  Except, the analogy has yet to cooperate by breaking down, even though I've now featured it on this blog twice (1, 2).

So, building on the earlier explorations, in this post I tackle the problem from the other end, and try to devise a type of descriptive mathematical model that would give rise to the pattern observed in the analogy.

This sort of pursuit, as I go about it, is a game of endurance; again and again I'll lay out all the puzzle pieces I've got, look at them together, and try to accumulate a few more insights to add to the collection.  Then gather up the pieces and save them away for a while, and come back to the problem later when I'm fresh on it again.  Only this time I've kind-of succeeded in reaching my immediate goal.  The resulting post, laying out the pieces and accumulating insights, is therefore both an explanation of where the result comes from and a record of the process by which I got there.  There are lots of speculations within it shooting off in directions that aren't where I ended up.  I pointedly left the stray speculations in place.  Some of those tangents might turn out to be valuable after all; and taking them out would create a deceptive appearance of things flowing inevitably to a conclusion when, in the event, I couldn't tell whether I was going anywhere specific until I knew I'd arrived.

Naturally, for finding a particular answer — here, a mathematical structure that can give rise to the observed pattern — the reward is more questions.

Noether's Theorem
Determinism and rewriting
Nondeterminism and the cosmic footprint
Massive interconnection
Epilog: hygiene
Noether's Theorem

Noether's theorem (pedantically, Noether's first theorem) says that each differentiable invariant in the action of a system gives rise to a conservation law.  This is a particularly celebrated result in mathematical physics; it's explicitly about how properties of a system are implied by the mathematical structure of its description; and invariants — the current fad name for them in physics is "symmetries" — are close kin to both hygiene and geometry, which relate to each other through the analogy I'm pursuing; so Noether's theorem has a powerful claim on my attention.

The action of a system always used to seem very mysterious to me, until I figured out it's one of those deep concepts that, despite its depth, is also quite shallow.  It comes from Lagrangian mechanics, a mathematical formulation of classical mechanics alternative to the Newtonian mechanics formulation.  This sort of thing is ubiquitous in mathematics, alternative formulations that are provably equivalent to each other but make various problems much easier or harder to solve.

Newtonian mechanics seeks to describe the trajectory of a thing in terms of its position, velocity, mass, and the forces acting on it.  This approach has some intuitive advantages but is sometimes beastly difficult to solve for practical problems.  The Lagrangian formulation is sometimes much easier to solve.  Broadly, the time evolution of the system follows a trajectory through abstract state-space, and a function called the Lagrangian of the system maps each state into a quantity that... er... well, its units are those of energy.  For each possible trajectory of the system through state-space, the path integral of the Lagrangian is the action.  The principle of least action says that starting from a given state, the system will evolve along the trajectory that minimizes the action.  Solving for the behavior of the system is then a matter of finding the trajectory whose action is smallest.  (How do you solve for the trajectory with least action?  Well, think of the trajectories as abstract values subject to variation, and imagine taking the "derivative" of the action over these variations.  The least action will be a local minimum, where this derivative is zero.  There's a whole mathematical technology for solving problems of just that form, called the "calculus of variations".)

The Lagrangian formulation tends to be good for systems with conserved quantities; one might prefer the Newtonian approach for, say, a block sliding on a surface with friction acting on it.  And this Lagrangian affinity for conservative systems is where Noether's theorem comes in:  if there's a differentiable symmetry of the action — no surprise it has to be differentiable, seeing how central integrals and derivatives are to all this — the symmetry manifests itself in the system behavior as a conservation law.

And what, you may ask, is this magical Lagrangian function, whose properties studied through the calculus of variations reveal the underlying conservation laws of nature?  Some deeper layer of reality, the secret structure that underlies all?  Not exactly.  The Lagrangian function is whatever works:  some function that causes the principle of least action to correctly predict the behavior of the system.  In quantum field theory — so I've heard, having so far never actually grappled with QFT myself — the Lagrangian approach works for some fields but there is no Lagrangian for others.  (Yes, Lagrangians are one of those mathematical devices from classical physics that treats systems in such an abstract, holistic way that it's applicable to quantum mechanics.  As usual for such devices, its history involves Sir William Rowan Hamilton, who keeps turning up on this blog.)

This is an important point:  the Lagrangian is whatever function makes the least-action principle work right.  It's not "really there", except in exactly the sense that if you can devise a Lagrangian for a given system, you can then use it via the action integral and the calculus of variations to describe the behavior of the system.  Once you have a Lagrangian function that does in fact produce the system behavior you want it to, you can learn things about that behavior from mathematical exploration of the Lagrangian.  Such as Noether's theorem.  When you find there is, or isn't, a certain differentiable symmetry in the action, that tells you something about what is or isn't conserved in the behavior of the system, and that result really may be of great interest; just don't lose sight of the fact that you started with the behavior of the system and constructed a suitable Lagrangian from which you are now deducing things about what the behavior does and doesn't conserve.

In 1543, Copernicus's heliocentric magnum opus De revolutionibus orbium coelestium was published with an unsigned preface by Lutheran theologian Andreas Osiander saying, more or less, that of course it'd be absurd to suggest the Earth actually goes around the Sun but it's a very handy fiction for the mathematics.  Uhuh.  It's unnecessary to ask whether our mathematical models are "true"; we don't need them to be true, just useful.  When Francis Bacon remarked that what is most useful in practice is most correct in theory, he had a point — at least, for practical purposes.


The rewriting-calculus side of the analogy has a structural backstory from at least the early 1960s (some of which I've described in an earlier post, though with a different emphasis).  Christopher Strachey hired Peter Landin as an assistant, and encouraged him to do side work exploring formal foundations for programming languages.  Landin focused on tying program semantics to λ-calculus; but this approach suffered from several mismatches between the behavioral properties of programming languages versus λ-calculus, and in 1975 Gordon Plotkin published a solution for one of these mismatches, in one of the all-time classic papers in computer science, "Call-by-name, call-by-value and the λ-calculus" (pdf).  Plotkin defined a slight variant of λ-calculus, by altering the conditions for the β-rule so that the calculus became call-by-value (the way most programming languages behaved while ordinary λ-calculus did not), and proved that the resulting λv-calculus was fully Church-Rosser ("just as well-behaved" as ordinary λ-calculus).  He further set up an operational semantics — a rewriting system that ignored mathematical well-behavedness in favor of obviously describing the correct behavior of the programming language — and proved a set of correspondence theorems between the operational semantics and λv-calculus.

[In the preceding paragraph I perhaps should have mentioned compatibility, the other crucial element of rewriting well-behavedness; which you might think I'd have thought to mention since it's a big deal in my own work, though less flashy and more taken-for-granted than Church-Rosser-ness.]

Then in the 1980s, Matthias Felleisen applied Plotkin's approach to some of the most notoriously "unmathematical" behaviors of programs:  side-effects in both data (mutable variables) and control (goto and its ilk).  Like Plotkin, he set up an operational semantics and a calculus, and proved correspondence theorems between them, and well-behavedness for the calculus.  He introduced the major structural innovation of treating a side-effect as an explicit syntactic construct that could move upward within its term.  This upward movement would be a fundamentally different kind of rewrite from the function-application — the β-rule — of λ-calculus; abstractly, a side-effect is represented by a context 𝓢, which moves upward past some particular context C and, in the process, modifies C to leave in its wake some other context C':  C[𝓢[T]] → 𝓢[C'[T]] .  A side-effect is thus viewed as something that starts in a subterm and expands outward to affect more and more of the term until, potentially, it affects the whole term — if it's allowed to expand that far.  Of course, a side-effect might never expand that far if it's trapped inside a context that it can't escape from; notably, no side-effect can escape from context λ.[ ] , which is to say, no side-effect can escape from inside the body of a function that hasn't been called.

This is where I started tracking the game, and developing my own odd notions.  There seemed to me to be two significant drawbacks to Felleisen's approach, in its original published form.  For one thing, the transformation of context C to C', as 𝓢 moved across it, could be quite extensive; Felleisen himself aptly called these transformations "bubbling up"; as an illustration of how messy things could get, here are the rules for a continuation-capture construct C expanding out of the operator or operand of a function call:

(CT1)T2   →   C(λx1.(T1(λx2.(A(x1(x2T2))))))    for unused xk.
V(CT)   →   C(λx1.(T(λx2.(A(x1(V‍x2))))))    for unused xk.
The other drawback to the approach was that as published at the time, it didn't actually provide the full measure of well-behavedness from Plotkin's treatment of call-by-value.  One way or another, a constraint had to be relaxed somewhere.  What does the side-effect construct 𝓢 do once it's finished moving upward?  The earliest published solution was to wait till 𝓢 reaches the top of the term, and then get rid of it by a whole-term rewriting rule; that works, but the whole-term rewriting rule is explicitly not well-behaved:  calculus well-behavedness requires that any rewriting on a whole term can also be done to a subterm, and here we've deliberately introduced a rewriting rule that can't be applied to subterms.  So we've weakened the calculus well-behavedness.  Another solution is to let 𝓢 reach the top of the term, then let it settle into some sort of normal form, and relax the semantics–calculus correspondence theorems to allow for equivalent normal forms.  So the correspondence is weaker or, at least, more complicated.  A third solution is to introduce an explicit context-marker — in both the calculus and the operational semantics — delimiting the possible extent of the side-effect.  So you've got full well-behavedness but for a different language than you started out with.  (Felleisen's exploration of this alternative is part of the prehistory of delimited continuations, but that's another story.)

[In a galling flub, I'd written in the preceding paragraph Church-Rosser-ness instead of well-behavedness; fixed now.]

It occurred to me that a single further innovation should be able to eliminate both of these drawbacks.  If each side-effect were delimited by a context-marker that can move upward in the term, just as the side-effect itself can, then the delimiter would restore full Church-Rosser-ness without altering the language behavior; but, in general, the meanings of the delimited side-effects depend on the placement of the delimiter, so to preserve the meaning of the term, moving the delimiter may require some systematic alteration to the matching side-effect markers.  To support this, let the delimiter be a variable-binding construct, with free occurrences of the variable in the side-effect markers.  The act of moving the delimiter would then involve a sort of substitution function that propagates needed information to matching side-effect markers.  What with one thing and another, my academic pursuits dragged me away from this line of thought for years, but then in the 2000s I found myself developing an operational semantics and calculus as part of my dissertation, in order to demonstrate that fexprs really are well-behaved (though I should have anticipated that some people, having been taught otherwise, would refuse to believe it even with proof).  So I seized the opportunity to also elaborate my binding-delimiters approach to things that — unlike fexprs — really are side-effects.

This second innovation rather flew in the face of a tradition going back about seven or eight decades, to the invention of λ-calculus.  Alonzo Church was evidently quite concerned about what variables mean; he maintained that a proposition with free variables in it doesn't have a clear meaning, and he wanted to have just one variable-binding construct, λ, whose β-rule defines the practical meanings of all variables.  This tradition of having just one kind of variable, one binding construct, and one kind of variable-substitution (β-substitution) has had a powerful grip on researchers' imaginations for generations, to the point where even when other binding constructs are introduced they likely still have most of the look-and-feel of λ.  My side-effect-ful variable binders are distinctly un-λ-like, with rewriting rules, and substitution functions, bearing no strong resemblance to the β-rule.  Freedom from the β mold had the gratifying effect of allowing much simpler rewriting rules for moving upward through a term, without the major perturbations suggested by the term "bubbling up"; but, unsurprisingly, the logistics of a wild profusion of new classes of variables were not easy to work out.  Much elegant mathematics surrounding λ-calculus rests squarely on the known simple properties of its particular take on variable substitution.  The chapter of my dissertation that grapples with the generalized notion of substitution (Chapter 13, "Substitutive Reduction Systems", for anyone keeping score) has imho appallingly complicated foundations, although the high-level theorems at least are satisfyingly powerful.  One thing that did work out neatly was enforcement of variable hygiene, which in ordinary λ-calculus is handled by α-renaming.  In order to apply any nontrivial term-rewriting rule without disaster, you have to first make sure there aren't some two variables using the same name whose distinction from each other would be lost during the rewrite.  It doesn't matter, really, what sort of variables are directly involved in the rewrite rule:  an unhygienic rewrite could mess up variables that aren't even mentioned by the rule.  Fortunately, it's possible to define a master α-renaming function that recurses through the term renaming variables to maintain hygiene, and whenever you add a new sort of variable to the system, just extend the master function with particular cases for that new sort of variable.  Each rewriting rule can then invoke the master function, and everything works smoothly.

I ended up with four classes of variables.  "Ordinary" variables, of the sort supported by λ-calculus, I found were actually wanted only for a specific (and not even technically necessary) purpose:  to support partial evaluation.  You could build the whole calculus without them and everything would work right, but the equational theory would be very weak.  (I blogged on this point in detail here.)  A second class of variable supported continuations; in effect, the side-effect marker was a "throw" and the binding construct was a "catch".  Mutable state was more complicated, involving two classes of variables, one for assignments and one for lookups.  The variables for assignment were actually environment identities; each assignment side-effect would then specify a value, a symbol, and a free variable identifying the environment.  The variables for lookup stood for individual environment-symbol queries; looking up a symbol in an environment would generate queries for that environment and each of its ancestor environments.  The putative result of the lookup would be a leaf subterm with free variable occurrences for all the queries involved, waiting to assimilate the query results, while the queries themselves would rise through the term in search of matching assignments.  Whenever a query found a matching assignment, it would self-annihilate while using substitution to report the result to all waiting free variable occurrences.

Does all this detail matter to the analogy with physics?  Well, that's the question, isn't it.  There's a lot there, a great deal of fodder to chew on when considering how an analogy with something else might have a structural basis.


Amongst the four classes of variables, partial-evaluation variables have a peculiar sort of... symmetry.  If you constructed a vau-calculus with, say, only continuation variables, you'd still have two different substitution functions — one to announce that a delimiting "catch" has been moved upward, and one for α-renaming.  If you constructed a vau-calculus with only mutable-state variables, you'd have, well, a bunch of substitution functions, but in particular all the substitutions used to enact rewriting operations would be separate from α-renaming.  β-substitution, though, is commensurate with α-renaming; once you've got β-substitution of partial-evaluation variables, you can use it to α-rename them as well, which is why ordinary λ-calculus has, apparently at least, only one substitution function.

Qualitatively, partial-evaluation variables seem more integrated into the fabric of the calculus, in contrast to the other classes of variables.

All of which put me powerfully in mind of physics because it's a familiar observation that gravity seems qualitatively more integrated into the fabric of spacetime, in contrast to the other fundamental forces (xkcd).  General relativity portrays gravity as the shape of spacetime, whereas the other forces merely propagate through spacetime, and a popular strategy for aspiring TOEs (Theories Of Everything) is to integrate the other fundamental forces into the geometry as well — although, looking at the analogy, perhaps that popular strategy isn't such a good idea after all.  Consider:  The analogy isn't just between partial-evaluation variables and gravity.  It's between the contrast of partial-evaluation variables against the other classes of variables, and the contrast of gravity against the other fundamental forces.  All the classes of variables, and all the fundamental forces, are to some extent involved.  I've already suggested that Felleisen's treatment of side-effects was both weakened and complicated by its too-close structural imitation of λ, whereas a less λ-like treatment of side-effects can be both stronger and simpler; so, depending on how much structure carries through the analogy, perhaps trying to treat the other fundamental forces too much like gravity should be expected to weaken and complicate a TOE.

Projecting through the analogy suggests alternative ways to structure theories of physics, which imho is worthwhile independent of whether the analogy is deep or shallow; as I've remarked before, I actively look for disparate ways of thinking as a broad base for basic research.  The machinery of calculus variable hygiene, with which partial-evaluation variables have a special affinity, is only one facet of term structure; and projecting this through to fundamental physics, where gravity has a special affinity with geometry, this suggests that geometry itself might usefully be thought of, not as the venue where physics takes place, but merely as part of the rules by which the game is played.  Likewise, the different kinds of variables differ from each other by the kinds of structural transformations they involve; and projecting that through the analogy, one might try to think of the fundamental forces as differing from each other not (primarily) by some arbitrary rules of combination and propagation, but by being different kinds of structural manipulations of reality.  Then, if there is some depth to the analogy, one might wonder if some of the particular technical contrasts between different classes of variables might be related to particular technical contrasts between different fundamental forces — which, frankly, I can't imagine deciphering until and unless one first sets the analogy on a solid technical basis.

I've speculated several times on this blog on the role of non-locality in physics.  Bell's Theorem says that the statistical distribution of quantum predictions cannot be explained by any local, deterministic theory of physics if, by 'local and deterministic', you mean 'evolving forward in time in a local and deterministic way'; but it's quite possible to generate this same statistical distribution of spacetime predictions using a theory that evolves locally and deterministically in a fifth dimension orthogonal to spacetime.  Which strikes a familiar chord through the analogy with calculus variables, because non-locality is, qualitatively at least, the defining characteristic of what we mean by "side-effects", and the machinery of α-renaming maintains hygiene for these operations exactly by going off and doing some term rearranging on the side (as if in a separate dimension of rewriting that we usually don't bother to track).  Indeed, thought of this way, a "variable" seems to be an inherently distributed entity, spread over a region of the term — called its scope — rather than located at a specific point.  A variable instance might appear to have a specific location, but only because we look at a concrete syntactic term; naturally we have to have a particular concrete term in order to write it down, but somehow this doesn't seem to do justice to the reality of the hygiene machinery.  One could think of an equivalence class of terms under α-renaming, but imho even that is a bit passive.  The reality of a variable, I've lately come to think, is a dynamic distributed entity weaving through the term, made up of the binding construct (such as a λ), all the free instances within its scope, and the living connections that tie all those parts together; I imagine if you put your hand on any part of that structure you could feel it humming with vitality.


To give a slightly less hand-wavy description of my earlier post on Bell's Theorem — since it is the most concrete example we have to inform our view of the analogy on the physics side:

Bell looked at a refinement of the experiment from the EPR paradox.  A device emits two particles with entangled spin, which shoot off in opposite directions, and their spins are measured by oriented detectors at some distance from the emitter.  The original objection of Einstein Podolsky and Rosen was that the two measurements are correlated with each other, but because of the distance between the two detectors, there's no way for information about either measurement to get to where the other measurement takes place without "spooky action at a distance".  Bell refined this objection by noting that the correlation of spin measurements depends on the angle θ between the detectors.  If you suppose that the orientations of the detectors at measurement is not known at the time and place where the particles are emitted, and that the outcomes of the measurements are determined by some sort of information — "hidden variable" — propagating from the emission event at no more than the speed of light, then there are limits (called Bell's Inequality) on how the correlation can be distributed as a function of θ, no matter what the probability distribution of the hidden variable.  The distribution predicted by quantum mechanics violates Bell's Inequality; so if the actual probability distribution of outcomes from the experiment matches the quantum mechanical prediction, we're living in a world that can't be explained by a local hidden-variable theory.

My point was that this whole line of reasoning supposes the state of the world evolves forward in time.  If it doesn't, then we have to rethink what we even mean by "locality", and I did so.  Suppose our entire four-dimensional reality is generated by evolving over a fifth dimension, which we might as well call "metatime".  "Locality" in this model means that information about the state of one part of spacetime takes a certain interval of metatime to propagate a certain distance to another part of spacetime.  Instead of trying to arrange the probability distribution of a hidden variable at the emission event so that it will propagate through time to produce the desired probability distribution of measurements — which doesn't work unless quantum mechanics is seriously wrong about this simple system — we can start with some simple, uniform probability distribution of possible versions of the entire history of the experiment, and by suitably arranging the rules by which spacetime evolves, we can arrange that eventually spacetime will settle into a stable state where the probability distribution is just what quantum mechanics predicts.  In essence, it works like this:  let the history of the experiment be random (we don't need nondeterminism here; this is just a statement of uniformly unknown initial conditions), and suppose that the apparent spacetime "causation" between the emission and the measurements causes the two measurements to be compared to each other.  Based on θ, let some hidden variable decide whether this version of history is stable; and if it isn't stable, just scramble up a new one (we can always do that by pulling it out of the uniform distribution of the hidden variable, without having to posit fundamental nondeterminism).  By choosing the rule for how the hidden variable interacts with θ, you can cause the eventual stable history of the experiment to exhibit any probability distribution you choose.

That immense power is something to keep a cautious eye on:  not only can this technique produce the probability distribution predicted by quantum mechanics, it can produce any other probability distribution as well.  So, if the general structure of this mathematical theory determines something about the structure of the physical reality it depicts, what it determines is apparently not, in any very straightforward fashion, that probability distribution.


The side of the analogy we have prior detailed structural knowledge about is the vau-calculus side.  Whatever useful insights we may hope to extract from the metatime approach to Bell's Theorem, it's very sketchy compared to vau-calculus.  So if we want to work out a structural pattern that applies to both sides of the analogy, it's plausible to start building from the side we know about, questioning and generalizing as we go along.  To start with,

  • Suppose we have a complex system, made up of interconnected parts, evolving by some sort of transformative steps according to some simple rules.
Okay, freeze frame.  Why should the system be made up of parts?  Well, in physics it's (almost) always the parts we're interested in.  We ourselves are, apparently, parts of reality, and we interact with parts of reality.  Could we treat the whole as a unit and then somehow temporarily pull parts out of it when we need to talk about them?  Maybe, but the form with parts is still the one we're primarily interested in.  And what about "transformative steps"; do we want discrete steps rather than continuous equations?  Actually, yes, that is my reading of the situation; not only does fundamental physics appear to be shot through with discreteness (I expanded on this point a while back), but the particular treatment I used for my metatime proof-of-concept (above) used an open-ended sequence of discrete trials to generate the requisite probability distribution.  If a more thoroughly continuous treatment is really wanted, one might try to recover continuity by taking a limit a la calculus.
  • Suppose we separate the transformation rules into two groups, which we call bookkeeping rules and operational rules; and suppose we have a set of exclusive criteria on system configurations, call these hygiene conditions, which must be satisfied before any operational rule can be applied.

Freeze again.  At first glance, this looks pretty good.  From any unhygienic configuration, we can't move forward operationally until we've done bookkeeping to ensure hygiene.  Both calculus rewriting and the metatime proof-of-concept seemingly conform to this pattern; but the two cases differ profoundly in how their underlying hygiene (supposing that's what it is, in the physics case) affects the form of the modeled system, and we'll need to consider the difference carefully if we mean to build our speculations on a sound footing.

Determinism and rewriting

Hygiene in rewriting is all about preserving properties of a term (to wit, variable instance–binding correspondences), whereas our proof-of-concept metatime transformations don't appear to be about perfectly preserving something but rather about shaping probability distributions.  One might ask whether it's possible to set up the internals of our metatime model so that the probability distribution is a consequence, or symptom, of conserving something behind the scenes.  Is the seemingly nondeterministic outcome of our quantum observation in a supposedly small quantum system here actually dictated by the need to maintain some cosmic balance that can't be directly observed because it's distributed over a ridiculously large number of entities (such as the number of electrons in the universe)?  That could lead to some bracing questions about how to usefully incorporate such a notion into a mathematical theory.

As an alternative, one might decide that the probability distribution in the metatime model should not be a consequence of absolutely preserving a condition.  There are two philosophically disparate sorts of models involving probabilities:  either the probability comes from our lack of knowledge (the hidden-variable hypothesis), and in the underlying model the universe is computing an inevitable outcome; or the probability is in the foundations (God playing dice, in the Einsteinian phrase), and in the underlying model the universe is exploring the range of possible outcomes.  I discussed this same distinction, in another form, in an earlier post, where it emerged as the defining philosophical distinction between a calculus and a grammar (here).  In those terms, if our physics model is fundamentally deterministic then it's a calculus and by implication has that structural affinity with the vau-calculi on the other side of the analogy; but if our physics model is fundamentally nondeterministic then it's a grammar, and our analogy has to try to bridge that philosophical gap.  Based on past experience, though, I'm highly skeptical of bridging the gap; if the analogy can be set on a concrete technical basis, the TOE on the physics side seems to me likely to be foundationally deterministic.

The foundationally deterministic approach to probability is to start with a probabilistic distribution of deterministic initial states, and evolve them all forward to produce a probabilistic distribution of deterministic final states.  Does the traditional vau-calculus side of our analogy, where we have so much detail to start with, have anything to say about this?  In the most prosaic sense, ones suspects not; probability distributions don't traditionally figure into deterministic computation semantics, where this approach would mean considering fuzzy sets of terms.  There may be some insight lurking, though, in the origins of calculus hygiene.

When Alonzo Church's 1932/3 formal logic turned out to be inconsistent, he tried to back off and find some subset of it that was provably consistent.  Here consistent meant that not all propositions are equivalent to each other, and the subset of the logic that he and his student J. Barkley Rosser proved consistent in this sense was what we now call λ-calculus.  The way they did it was to show that if any term T1 can be reduced in the calculus in two different ways, as T2 and T3, then there must be some T4 that both of them can be reduced to.  Since logical equivalence of terms is defined as the smallest congruence generated by the rewriting relation of the calculus, from the Church-Rosser property it follows that if two terms are equivalent, there must be some term that they both can be reduced to; and therefore, two different irreducible terms cannot possibly be logically equivalent to each other.

Proving the Church-Rosser theorem for λ-calculus was not, originally, a simple matter.  It took three decades before a simple proof began to circulate, and the theorem for variant calculi continues to be a challenge.  And this is (in one view of the matter, at least) where hygiene comes into the picture.  Church had three major rewriting rules in his system, later called the α, β, and γ rules.  The α rule was the "bookkeeping" rule; it allowed renaming a bound variable as long as you don't lose its distinction from other variables in the process.  The β rule is now understood as the single operational rule of λ-calculus, how to apply a function to an argument.  The γ rule is mostly forgotten now; it was simply doing a β-step backward, and was later dropped in favor of starting with just α and β and then taking the congruent closure (reflexive, symmetric, transitive, and compatible).  Ultimately the Church-Rosser theorem allows terms to be sorted into β-equivalence classes; but the terms in each class aren't generally thought of as "the same term", just "equivalent terms".  α-equivalent terms, though, are much closer to each other, and for many purposes would actually be thought of as "the same term, just written differently".  Recall my earlier description of a variable as a distributed entity, weaving through the term, made up of binding construct, instances, and living connections between them.  If you have a big term, shot through with lots of those dynamic distributed entities, the interweaving could really be vastly complex.  So factoring out the α-renaming is itself a vast simplification, which for a large term may dwarf what's left after factoring to complete the Church-Rosser proof.  To see by just how much the bookkeeping might dwarf the remaining operational complexity, imagine scaling the term up to the sort of cosmological scope mentioned earlier — like the number of electrons in the universe.

It seems worth considering, that hygiene may be a natural consequence of a certain kind of factoring of a vastly interconnected system:  you sequester almost all of the complexity into bookkeeping with terrifically simple rules applied on an inhumanly staggering scale, and comparatively nontrivial operational rules that never have to deal directly with the sheer scale of the system because that part of the complexity was factored into the bookkeeping.  In that case, at some point we'll need to ask when and why that sort of factoring is possible.  Maybe it isn't really possible for the cosmos, and a flaw in our physics is that we've been trying so hard to factor things this way; when we really dive into that question we'll be in deep waters.

It's now no longer clear, btw, that geometry corresponds quite directly to α-renaming.  There was already some hint of that in the view of vau-calculus side-effects as "non-local", which tends to associate geometry with vau-calculus term structure rather than α-renaming as such.  Seemingly, hygiene is then a sort of adjunct to the geometry, something that allows the geometry to coexist with the massive interconnection of the system.

But now, with massive interconnection resonating between the two sides of the analogy, it's definitely time to ask some of those bracing questions about incorporating cosmic connectivity into a mathematical theory of physics.

Nondeterminism and the cosmic footprint

We want to interpret our probability distribution as a footprint, and reconstruct from it the massively connected cosmic order that walked there.  Moreover, we're conjecturing that the whole system is factorable into bookkeeping/hygiene on one hand(?), and operations that amount to what we'd ordinarily call "laws of physics" on the other; and we'd really like to deduce, from the way quantum mechanics works, something about the nature of the bookkeeping and the factorization.

Classically, if we have a small system that's acted on by a lot of stuff we don't know about specifically, we let all those influences sum to a potential field.  One might think of this classical approach as a particular kind of cosmological factorization in which the vast cosmic interconnectedness is reduced to a field, so one can then otherwise ignore almost everything to model the behavior of the small system of interest using a small operational set of physical laws.  We know the sort of cosmic order that reduces that way, it's the sort with classical locality (relative to time evolution); and the vaster part of the factorization — the rest of the cosmos, that reduced to a potential field — is of essentially the same kind as the small system.  The question we're asking at this point is, what sort of cosmic order reduces such that its operational part is quantum mechanics, and what does its bookkeeping part look like?  Looking at vau-calculus, with its α-equivalence and Church-Rosser β-equivalence, it seems fairly clear that hygiene is an asymmetric factorization:  if the cosmos factors this way, the bookkeeping part wouldn't have to look at all like quantum mechanics.  A further complication is that quantum mechanics may be an approximation only good when the system you're looking at is vastly smaller than the universe as a whole; indeed, this conjecture seems rather encouraged by what happens when we try to apply our modern physical theories to the cosmos as a whole: notably, dark matter.  (The broad notion of asymmetric factorizations surrounding quantum mechanics brings to mind both QM's notorious asymmetry between observer and observed, and Einstein's suggestion that QM is missing some essential piece of reality.)

For this factorization to work out — for the cosmic system as a whole to be broadly "metaclassical" while factoring through the bookkeeping to either quantum mechanics or a very good approximation thereof — the factorization has to have some rather interesting properties.  In a generic classical situation where one small thing is acted on by a truly vast population of other things, we tend to expect all those other things to average out (as typically happens with a classical potential field), so that the vastness of the population makes their combined influence more stable rather than less; and also, as our subsystem interacts with the outside influence and we thereby learn more about that influence, we become more able to allow for it and still further reduce any residual unpredictability of the system.

Considered more closely, though, the expectation that summing over a large population would tend to average out is based broadly on the paradigm of signed magnitudes on an unbounded scale that attenuate over distance.  If you don't have attenuation, and your magnitudes are on a closed loop, such as angles of rotation, increasing the population just makes things more unpredictable.  Interestingly, I'd already suggested in one of my earlier explorations of the hygiene analogy that the physics hygiene condition might be some sort of rotational constraint, for the — seemingly — unrelated reason that the primary geometry of physics has 3+1 dimension structure, which is apparently the structure of quaternions, and my current sense of quaternions is that they're the essence of rotation.  Though when things converge like this, it can be very hard to distinguish between an accidental convergence and one that simply reassembles fragments of a deeper whole.

I'll have a thought on the other classical problem — losing unpredictability as we learn more about outside influences over time — after collecting some further insights on the structural dynamics of bookkeeping.

Massive interconnection

Given a small piece of the universe, which other parts of the universe does it interact with, and how?

In the classical decomposition, all interactions with other parts of the universe are based on relative position in the geometry — that is, locality.  Interestingly, conventional quantum mechanics retains this manner of selecting interactions, embedding it deeply into the equational structure of its mathematics.  Recall the Schrödinger equation,

iℏ Ψ
 =   Ĥ Ψ .
The element that shapes the time evolution of the system — the Hamiltonian function Ĥ — is essentially an embodiment of the classical expectation of the system behavior; which is to say, interaction according to the classical rules of locality.  (I discussed the structure of the Schrödinger equation at length in an earlier post, yonder.)  Viewing conventional QM this way, as starting with classical interactions and then tacking on quantum machinery to "fix" it, I'm put in mind of Ptolemaic epicycles, tacked on to the perfect-circles model of celestial motion to make it work.  (I don't mean the comparison mockingly, just critically; Ptolemy's system worked pretty well, and Copernicus used epicycles too.  Turns out there's a better way, though.)

How does interaction-between-parts play out in vau-calculus, our detailed example of hygiene at work?

The whole syntax of a calculus term is defined by two aspects:  the variables — by which I mean those "distributed entities" woven through the term, each made of a binding, its bound instances, and the connections that hygiene maintains between them — and, well, everything else.  Once you ignore the specific identities of all the variable instances, you've just got a syntax tree with each node labeled by a context-free syntax production; and the context-free syntax doesn't have very many rules.  In λ-calculus there are only four syntax rules: a term is either a combination, or a λ-abstraction, or a variable, or a constant.  Some treatments simplify this by using variables for the "constants", and it's down to only three syntax rules.  The lone operational rule β,

((λx.T1)T2)   →   T1[x ← T2]   ,
gives a purely local pattern in the syntax tree for determining when the operation can be applied:  any time you have a parent node that's a combination whose left child is a λ-abstraction.  Operational rules stay nearly this simple even in vau-calculi; the left-hand side of each operational rule specifies when it can be applied by a small pattern of adjacent nodes in the syntax tree, and mostly ignores variables (thanks to hygienic bookkeeping).  The right-hand side is where operational substitution may be introduced.  So evidently vau-calculus — like QM — is giving local proximity a central voice in determining when and how operations apply, with the distributed aspect (variables) coming into play in the operation's consequences (right-hand side).

Turning it around, though, if you look at a small subterm — analogous, presumably, to a small system studied in physics — what rules govern its non-local connections to other parts of the larger term?  Let's suppose the term is larger than our subterm by a cosmically vast amount.  The free variables in the subterm are the entry points by which non-local influences from the (vast) context can affect the subterm (via substitution functions).  And there is no upper limit to how fraught those sorts of interconnections can get (which is, after all, what spurs advocates of side-effect-less programming).  That complexity belongs not to the "laws of physics" (neither the operational nor even the bookkeeping rules), but to the configuration of the system.  From classical physics, we're used to having very simple laws, with all the complexity of our problems coming from the particular configuration of the small system we're studying; and now we've had that turned on its head.  From the perspective of the rules of the calculus, yes, we still have very simple rules of manipulation, and all the complexity is in the arrangement of the particular term we're working with; but from the perspective of the subterm of interest, the interconnections imposed by free variables look a lot like "laws of physics" themselves.  If we hold our one subterm fixed and allow the larger term around it to vary probabilistically then, in general, we don't know what the rules are and have no upper bound on how complicated those rules might be.  All we have are subtle limits on the shape of the possible influences by those rules, imposed roundaboutly by the nature of the bookkeeping-and-operational transformations.

One thing about the shape of these nonlocal influences:  they don't work like the local part of operations.  The substitutive part of an operation typically involves some mixture of erasing bound variable instances and copying fragments from elsewhere.  The upshot is that it rearranges the nonlocal topology of the term, that is, rearranges the way the variables interweave — which is, of course, why the bookkeeping rules are needed, to maintain the integrity of the interweaving as it winds and twists.  And this is why a cosmic system of this sort doesn't suffer from a gradual loss of unpredictability as the subsystem interacts with its neighbors in the nonlocal network and "learns" about them:  each nonlocal interaction that affects it changes its nonlocal-network neighbors.  As long as the overall system is cosmically vast compared to the the subsystem we're studying, in practice we won't run out of new network neighbors no matter how many nonlocal actions our subsystem undergoes.

This also gives us a specific reason to suspect that quantum mechanics, by relying on this assumption of an endless supply of fresh network neighbors, would break down when studying subsystems that aren't sufficiently small compared to the cosmos as a whole.  Making QM (as I've speculated before), like Newtonian physics, an approximation that works very well in certain cases.


Here's what the reconstructed general mathematical model looks like so far:

  • The system as a whole is made up of interconnected parts, evolving by transformative steps according to simple rules.

  • The interconnections form two subsystems:  local geometry, and nonlocal network topology.

  • The transformation rules are of two kinds, bookkeeping and operational.

  • Operational rules can only be applied to a system configuration satisfying certain hygiene conditions on its nonlocal network topology; bookkeeping, which only acts on nonlocal network topology, makes it possible to achieve the hygiene conditions.

  • Operational rules are activated based on local criteria (given hygiene).  When applied, operations can have both local and nonlocal effects, while the integrity of the nonlocal network topology is maintained across both kinds of effect via hygiene, hence bookkeeping.

To complete this picture, it seems, we want to consider what a small local system consists of, and how it relates to the whole.  This is all the more critical since we're entertaining the possibility that quantum mechanics might be an approximation that only works for a small local system, so that understanding how a local system relates to the whole may be crucial to understanding how quantum mechanics can arise locally from a globally non-quantum TOE.

A local system consists of some "local state", stuff that isn't interconnection of either kind; and some interconnections of (potentially) both kinds that are entirely encompassed within the local system.  For example, in vau-calculi — our only prior source for complete examples — we might have a subterm (λx.(y‍x)).  Variables are nonlocal network topology, of course, but in this case variable x is entirely contained within the local system.  The choice of variable name "x" is arbitrary, as long as it remains different from "y" (hygiene).  But what about the choice of "y"?  In calculus reasoning we would usually say that because y is free in this subterm, we can't touch it; but that's only true if we're interested in comparing it to specific contexts, or to other specific subterms.  If we're only interested in how this subterm relates to the rest of the universe, and we have no idea what the rest of the universe is, then the free variable y really is just one end of a connection whose other end is completely unknown to us; and a different choice of free variable would tell us exactly as much, and as little, as this one.  We would be just as well off with (λx.(z‍x)), or (λz.(w‍z)), or even (λy.(x‍y)) — as long as we maintain the hygienic distinction between the two variables.  The local geometry that can occur just outside this subterm, in its surrounding context, is limited to certain specific forms (by the context-free grammar of the calculus); the nonlocal network topology is vastly less constrained.

The implication here is that all those terms just named are effectively equivalent to (λx.(y‍x)).  One might be tempted to think of this as simply different ways of "writing" the same local system, as in physics we might describe the same local system using different choices of coordinate axes; but the choice of coordinate axes is about local geometry, not nonlocal network topology.  Here we're starting with simple descriptions of local systems, and then taking the quotient under the equivalence induced by the bookkeeping operations.  It's tempting to think of the pre-quotient simple descriptions as "classical" and analogize the quotient to "quantum", but in fact there is a second quotient to be taken.  In the metatime proof-of-concept, the rewriting kept reshuffling the entire history of the experiment until it reached a steady state — the obvious analogy is to a calculus irreducible term, the final result of the operational rewrite relation of the calculus.  And this, at last, is where Church-Rosser-ness comes in.  Church-Rosser is what guarantees that the same irreducible form (if any) is reached no matter in what order operational rules are applied.  It's the enabler for each individual state of the system to evolve deterministically.  To emphasize this point:  Church-Rosser-ness applies to an individual possible system state, thus belongs to the deterministic-foundations approach to probabilities.  The probability distribution itself is made up of individual possibilities each one of which is subject to Church-Rosser-ness.  (Church-Rosser-ness is also, btw, a property of the mathematical model:  one doesn't ask "Why should these different paths of system state evolution all come back together to the same normal form?", because that's simply the kind of mathematical model one has chosen to explore.)

The question we're trying to get a handle on is why the nonlocal effects of some operational rules would appear to be especially compatible with the bookkeeping quotient of the local geometry.


In vau-calculi, the nonlocal operational effects (i.e., operational substitution functions) that do not integrate smoothly with bookkeeping (i.e., with α-renaming) are the ones that support side-effects; and the one nonlocal operational effect that does integrate smoothly with bookkeeping — β-substitution — supports partial evaluation and turns out to be optional, in the sense that the operational semantics of the system could be described without that kind of nonlocal effect and the mathematics would still be correct with merely a weaker equational theory.

This suggests that in physics, gravity could be understood without bringing nonlocal effects into it at all, though there may be some sort of internal mathematical advantage to bringing them in anyway; while the other forces may be thought of as, in some abstract sense, side-effects.

So, what exactly makes a side-effect-ful substitution side-effect-ful?  Conversely, β-substitution is also a form of substitution; it engages the nonlocal network topology, reshuffling it by distributing copies of subterms, the sort of thing I speculated above may be needed to maintain the unpredictability aspect of quantum mechanics.  So, what makes β-substitution not side-effect-ful in character?  Beyond the very specific technicalities of β- and α-substitution; and just how much, or little, should we be abstracting away from those technicalities?  I'm supposing we have to abstract away at least a bit, on the principle that physics isn't likely to be technically close to vau-calculi in its mathematical details.

Here's a stab at a sufficient condition:

  • A nonlocal operational effect is side-effect-ful just if it perturbs the pre-existing local geometry.

The inverse property, called "purity" in a programming context (as in "pure function"), is that the nonlocal operational effect doesn't perturb the pre-existing local geometry.  β-substitution is pure in this sense, as it replaces a free variable-instance with a subterm but doesn't affect anything local other than the variable-instance itself.  Contrast this with the operational substitution for control variables; the key clauses (that is, nontrivial base cases) of the two substitutions are

x[x ← T]   →   T   .

(τx.T)[x ← C]   →   (τx.C[T[x ← C]])   .

The control substitution alters the local-geometric distance between pre-existing structure T and whatever pre-existing immediate context surrounds the subterm acted on.  Both substitutions have the — conjecturally important — property that they substantially rearrange the nonlocal network topology by injecting arbitrary new network connections (that is, new free variables).  The introduction of new free variables is a major reason why vau-calculi need bookkeeping to maintain hygiene; although, interestingly, it's taken all this careful reasoning about bookkeeping to conclude that bookkeeping isn't actually necessary to the notion of purity/impurity (or side-effect-ful/non-side-effect-ful); apparently, bookkeeping is about perturbations of the nonlocal network topology, whereas purity/impurity is about perturbations of the local geometry.  To emphasize the point, one might call this non-perturbation of local geometry co-hygiene — all the nonlocal operational effects must be hygienic, which might or might not require bookkeeping depending on internals of the mathematics, but only the β- and gravity nonlocal effects are co-hygienic.


Abstracting away from how we got to it, here's what we have:

  • A complex system of parts, evolving through a Church-Rosser transformation step relation.

  • Interconnections within a system state, partitioned into local (geometry) and nonlocal (network).

  • Each transformation step is selected locally.

  • The nonlocal effects of each transformation step rearrange — scramble — nonlocal connections at the locus where applied.

  • Certain transformative operations have nonlocal effects that do not disrupt pre-existing local structure — that are co-hygienic — and thereby afford particularly elegant description.

What sort of elegance is involved in the description of a co-hygienic operation depends on the technical character of the mathematical model; for β-reduction, what we've observed is functional compatibility between β- and α-substitution, while for gravity we've observed the general-relativity integration between gravity and the geometry of spacetime.

So my proposed answer to the conundrum I've been pondering is that the affinity between gravity and geometry suggests a modeling strategy with a nonlocal network pseudo-randomly scrambled by locally selected operational transformations evolving toward a stable state of spacetime, in which the gravity operations are co-hygienic.  A natural follow-on question is just what sort of mathematical machinery, if any, would cause this network-scrambling to approximate quantum mechanics.

On the side, I've got intimations here that quantum mechanics may be an approximation induced by the pseudo-random network scrambling when the system under study is practically infinitesimal compared to the cosmos as a whole, and perhaps that the network topology has a rotational aspect.

Meanwhile, an additional line of possible inquiry has opened up.  All along I've been trying to figure out what the analogy says about physics; but now it seems one might study the semantics of a possibly-side-effect-ful program fragment by some method structurally akin to quantum mechanics.  The sheer mathematical perversity of quantum mechanics makes me skeptical that this could be a practical approach to programming semantics; on the other hand, it might provide useful insights for the TOE mathematics.

Epilog: hygiene

So, what happened to hygiene?  It was a major focus of attention through nearly the whole investigation, and then dropped out of the plot near the end.

At its height of prestige, when directly analogized to spacetime geometry (before that fell through), hygiene motivated the suggestion that geometry might be thought of not as the venue where physics happens but merely as part of its rules.  That suggestion is still somewhat alive since the proposed solution treats geometry as abstractly just one of the two forms of interconnection in system state, though there's a likely asymmetry of representation between the two forms.  There was also some speculation that understanding hygiene on the physics side could be central to making sense of the model; that, I'd now ease off on, but do note that in seeking a possible model for physics one ought to keep an eye out for a possible bookkeeping mechanism, and certainly resolving the nonlocal topology of the model would seem inseparable from resolving its bookkeeping.  So hygiene isn't out of the picture, and may even play an important role; just not with top billing.

Would it be possible for the physics model to be unhygienic?  In the abstract sense I've ended up with, lack of hygiene would apparently mean an operation whose local effect causes nonlocal perturbation.  Whether or not dynamic scope qualifies would depend on whether dynamically scoped variables are considered nonlocal; but since we expect some nonlocal connectivity, and those variables couldn't be perturbed by local operations without losing most/all of their nonlocality, probably the physics model would have to be hygienic.  My guess is that if a TOE actually tracked the nonlocal network (as opposed to, conjecturally, introducing a quantum "blurring" as an approximation for the cumulative effect of the network), the tracking would need something enough like calculus variables that some sort of bookkeeping would be called for.


  1. Thanks for writing more about the theoretical underpinnings of vau calculus!

    I find myself wondering about that definition of variables as nonlocal entities. Since not all models of computation use variables, are there other ways of looking at them?

    In SKI calculus, a 'variable' seems to be just a zero-argument function, reducing to a set of stack operations.

    In the object-and-message-passing model of computation, a 'variable' seems to be a property slot of an object, and an object can be represented as a function. So isn't a variable just one particular special case of a pair of function argument -> function value?

    At the SKI and lambda levels, we can optimise variable names away for speed. But in a third camp - the world of large-scale software systems, databases and knowledge representation - variable names become exactly as important as their values. This is the level of source code; published APIs with agreed-on call conventions; databases; network protocols and packets; and documentation. At this level names are very much run-time data, yet operations on names can still form Turing-complete languages. Names as data here can persist for decades or more, living through multiple software implementation languages and operating systems.

    Could vau calulus help us gain insight into these kind of software systems?

    Regards, Nate

    1. Yes, I've wondered about the relationship between computational models with and without variables. What I figure: The point of having variables is the ways of expressing things made possible by their non-locality: you provide a binding for the variable in one place and then refer to it throughout the scope of the binding. SKI has the same computational power but the way you structure things should be fundamentally different because you can't refer to something that's defined elsewhere. Think of it in terms of formal expressiveness: given a program written using variables, how intrusively would you have to rearrange its structure in order to translate it into SKI? (I talked about expressiveness in my post on abstractive power, fwiw.)

      Of course, λ-calculus is co-hygienic, i.e., there are no side-effects. Side-effects are a whole new level of non-locality, and would seem to therefore imply a whole new level of structural expressiveness (for good or ill, or both).

  2. This comment has been removed by a blog administrator.

  3. This comment has been removed by a blog administrator.