Structural insight: Numberless differentiation

The method described in I.3 resulted in an analogy between the "discrete" space of index values Z=(1,2,...) and the continuous state space Ω of the mechanical system[...]. That this cannot be achieved without some violence to the formalism and to mathematics is not surprising.
— John von Neumann, Mathematical Foundations of Quantum Mechanics, trans. Robert T. Beyer, 1955, §I.4.

In this post, I mean to explore a notion that arose somewhat peripherally in an earlier post. I noted that the abstractive power of a programming language is, in a sense, the second derivative of its semantics. I've also noted, in another post, a curious analogy between term rewriting calculi and cosmological physics, which I used to inspire some off-the-wall thinking about physics. So I've been wondering if the abstractive power notion might be developed into a formal tool for altogether non-numeric discrete systems akin to the continuous differential calculus that plays such a central role in modern physics.

Developing a mathematical tool of this sort is something of a guessing game. One hopes to find one's way to some natural structure in the platonic realm of the mathematics, using whatever clues one can discern, wherever one finds them (be they in the platonic realm or elsewhere).

I've a story from my mother, from when she was in graduate school, about a student asking for motivation for something in an algebraic topology class. The professor replied, when you build a beautiful cathedral you tear down the scaffolding. To me, honestly, this always seemed to be missing the point. You should keep a record of where the scaffolding was. And the architectural plans. And, while you're at it, the draft architectural plans.

Candidly, when I started this I did not know what I was expecting to find, and the path of reasoning has taken several unexpected turns along the way. One thing I haven't found is a purely non-numeric discrete device analogous to continuous differential calculus; but I've got a plate full of deep insights that deserve their chance to be shared, and if the non-numeric discrete device is there to discover, it seems I must be closer to it than I started.

Contents
Discrete continuous physics
Differentiation in physics
Abstractiveness in programming
Causation in term rewriting
Draft analogy
Breaking the draft analogy
Retrenching

Discrete continuous physics

Physics has been struggling with the interplay of continuous and discrete elements for centuries (arguably since ancient times). Continuous math is a clean powerful tool and therefore popular; on the face of it, one wouldn't expect to get nearly as much useful physics done with purely discrete structures. On the other hand, our ordinary experience includes discrete quantities as well as continuous ones (recalling Kronecker's attributed remark, God made the integers; all else is the work of man). Nineteenth-century classical physics reconciled continuous and discrete aspects by populating space with continuous fields and point particles.

The field/particle approach collapsed with quantum mechanics, which is commonly described as saying that various things appearing continuous at a macroscopic level turn out to be discrete at a very small scale. I'd put it almost the other way around. Our notion of discrete things comes from our ordinary, macroscopic experience, but evidently these macroscopic discrete things can be broken up into smaller things; and if we keep pushing this ordinary notion down to a sufficiently small scale, eventually it can't be pushed any further, and we end up having to postulate an underlying (non-observable) continuous wave function which, having been so-to-speak squashed as nearly flat as it can be, smears even supposedly discrete elementary particles into a continuous complex-valued field. Which gives rise to discrete events by means of wave-function collapse, the ultimate kludge. Wave-function collapse is a different way of reconciling continuous and discrete aspects of the theory, but tbh it feels a bit contrived.

In this view, howsoever we wandered into it, we've assumed a continuous foundation and sought to derive discreteness from the continuous substratum. Traditional quantum theory derives discreteness by means of wave function collapse. Some modern theories use the concept of a standing wave to derive discreteness; string theory does this, with closed loops vibrating at some overtone of their length, and the transactional interpretation does too, with waves propagating forward and backward in time between spacetime events and summing to a standing wave.

An alternative sometimes considered is that the foundation could be discrete — so-called digital physics. I've some doubts about this, intuitively, because — how to put this — I don't think Nature likes to be neatly classified as "discrete" or "continuous". I remember, in a long-ago brainstorming session with my father, posing the semi-rhetorical question, "what is neither discrete nor continuous?", to which he mildly suggested, "reality". A purely discrete foundation seems as unlikely to me as a purely continuous one; in fact, it's possible the overly continuous character of the underlying wave function in quantum theory may be one subtle reason I'm not fond of that theory.

I have a particular, if round-about, reason for interest in the prospects for introducing some form of discrete, nonlocal, nonquantum element into physics. I'd actually been thinking in this direction for some time before I realized why I was tending that way, and was rather excited to figure out why because it gives me a place to start looking for what kind of role this hypothetical discrete element might play in the physics. My reasoning comes from, broadly, Mach's principle — the idea that the way the local universe works is a function of the shape of the rest of the universe. (Mach's principle, btw, was given its name by Albert Einstein, who drew inspiration from it for general relativity.)

Suppose we want to study a very small part of the universe; say, for example, a single elementary particle. This requires us either to disregard the rest of the universe entirely, or to assume the influence of the rest of the universe on our single particle is neatly summarized by some simple factor in our system, such as a potential field. But quantum mechanics starts us thinking, somewhat at least, in terms of everything affecting everything else. Suppose this is actually more so than quantum mechanics tells us. Suppose our single particle is interacting nonlocally with the whole rest of the universe. We don't know anything about these interactions with the rest of the universe, so their influence on our one particle is —for us— essentially random. If it were all local and neatly summed up, we might try to ask for it to be described by a potential field of some sort; but since we're supposing this is nonlocal interaction, and we don't have any sort of nonlocal structure in our theory by which to describe it, we really can't do anything with it other than expect it to smear our probability distribution for our one particle.

This nonlocal interaction I'm talking about is presumably not, in general, the sort of nonlocal interaction that arises in quantum mechanics. That sort of interaction, at least when it occurs within the system one is studying, is quantum entanglement, and it's really quite fragile: at large scales it becomes very likely to decohere, its nonlocality breaking down under interactions with the rest of the universe. I'm hypothesizing, instead, some more robust sort of nonlocality, that can work on a large scale. From my reasoning above, it seems that as soon as we hypothesize some such robust nonlocal interaction, we immediately expect the behavior of our single-particle system to appear nondeterministic, simply because we no longer can have an undisturbed system, unless we consider the entire universe as a whole. In contrast to the frequent presentation of quantum nondeterminism as "strange", under this hypothesis nondeterminism at small scales is unsurprising — though not uninteresting, because we may get clues to the character of our robust nonlocal interaction by looking at how it disturbs our otherwise undisturbed small system.

Following these clues is made more challenging because we have to imagine not one new element of the theory, but two: on one hand, some sort of robust nonlocal interaction, and on the other hand, some local behavior that our small system would have had if the robust nonlocal interaction had not been present. We expect the familiar rules of quantum mechanics to be the sum of these two elements. I have already implied that the local element is not itself quantum mechanics, as I suggested something of quantum nondeterminism might be ascribed to the robust nonlocal element. One might suppose the local element is classical physics, and our robust nonlocal interaction is what's needed to sum with classical physics to produce quantum mechanics. Or perhaps, if we subtract the robust nonlocal interaction from quantum mechanics, we get something else again.

This also raises interesting possibilities at the systemic level. The sum of these two hypothetical elements doesn't have to be quantum mechanics exactly, it only has to be close enough for quantum mechanics to be useful when looking only at a small system; so, quantum mechanics could be a useful approximation in a special case, rather as Newtonian mechanics is. We may expect the robust nonlocal element to be significant at cosmological scales, and we may have misread some cosmological phenomena because we were expecting effects on that scale to be local. But just at the moment my point is that, since spacetime is apparently our primal source of continuity, and our robust nonlocal interaction is specifically bypassing that continuity, I would expect the new interaction to be at least partly discrete rather than continuous; intuitively, if it were altogether continuous one might expect it to be more naturally part of the geometry rather than remaining "nonlocal" as such.

Differentiation in physics

We want a schematic sense of the continuous device we're looking for an analogy to.

Suppose 𝓜 is a manifold (essentially, a possibly-curved n-dimensional space), and F is a field on 𝓜. The value of the field at a point p on 𝓜, Fp, might be a scalar (an ordinary number, probably real or complex), or perhaps a vector (directed magnitude); at any rate Fp would likely be some sort of tensor — but I'm trying to assume as little as possible. So Fp is some sort of field value at p.

The derivative of F at p is... something... that describes how F is changing on 𝓜 at p. If you know the derivative F ' of F at every point in 𝓜, you know the whole "shape" of F, and from this shape F ' you can almost reconstruct F. The process of doing so is integration. The process doesn't quite reconstruct F because you only know how F changes across 𝓜, not what it changes relative to. This absolute reference for all of F doesn't change, across the whole manifold 𝓜, so it isn't part of the information captured by F '. So the integral of F ' is, at least conceptually, F plus a constant.

By assuming just a bit more about the values of field F, we can see how to do more with derivative F ' than merely integrate it to almost-reconstruct F. The usual supposition is that Fp is either a number (a scalar, continuous), or a number together with a configuration relative to the manifold 𝓜. The idea here is that the configuration is a simple relationship to the manifold, and the number gives it "depth" (more-or-less literally) by saying "how big" is the configured entity. The simplest such configured value is a vector: a directed magnitude, where the direction on the manifold is the configuration, and the magnitude in that direction is the number. One can also have a bivector, where the configuration, rather than a linear direction, is a two-dimensional plane orientation. There's a more subtle distinction between vectors and covectors (in general, multivectors and comultivectors), which has to do with integration versus differentiation. These distinctions of "type" (scalar, vector, covector, bivector, cobivector) are relatively crude; most of the configuration information is a continuous orientation, some variant of "direction". And ultimately, we use the configurations to guide relations between numbers. All that information about "shape" is channeled into numerical equations.

The continuity of the manifold comes into the situation in two ways. One, already obliquely mentioned, is that the configuration of the field is only partly discrete, the rest of it (in all but the scalar-field case) being a sort of pseudo-direction on the manifold whose value is continuous, and interacts in some continuous fashion with the numerical depth/intensity of the field. The other insinuation of manifold continuity into the continuous derivative is more pervasive: the derivative describes the shape of the field at a point p by taking the shape in an ε-neighborhood of p in the limit as ε approaches zero. The very idea of this limit requires that the neighborhood size ε (as well as the field intensity and probably the pseudo-direction) be continuous.

Abstractiveness in programming

The analogy needs a similar schematic of the discrete device. Abstractiveness, in my treatment, is a relation between programming languages, in which some programming language Q is formally as abstractive as programming language P.

The basic structure on which the treatment rests is an abstract state machine, whose states are "languages" (ostensibly, programming languages), and whose transitions are labeled by texts. The texts are members of a set T of terms generated over a context-free grammar; each text is treated as an atomic whole, so when we consider a sequence of texts, the individual texts within the sequence remain distinct from each other, as if T were the "alphabet" in which the sequence is written (rather than all the texts in the sequence being run together as a string over a more primitive alphabet). Some subset of the texts are designated as observables, which are used to judge equivalences between programs. In a typical interpretation, non-observable texts are units of program code, such as module declarations, while observable texts are descriptors of program output.

Everything that matters about a state of this abstract machine is captured by the set of text sequences possible from that state (a touch of Mach's principle, there); so formally we define each language to be the set of text sequences, without introducing separate objects to represent the states. A language P is thus a set of text sequences, P ⊆ T^*, that's closed under prefixing (that is, if xy ∈ P then x ∈ P). We write P/x for the language reached from language P by text sequence x ∈ P; that is, P/x = { y | xy ∈ P }.

These structures are used to define formally when one language "does" at least as much as another language. The criterion is existence of a function φ : P → Q, also written φP = Q, that maps each text sequence x ∈ P to a text sequence φx ∈ φP such that φ preserves prefixes (φx is always a prefix of φxy) and φ also preserves some property of P understood as defining what it "does". For language expressiveness, this additional preserved property is concerned with the observables, resulting in a formal statement that Q is at least as expressive as P. For language abstractiveness, the additional preserved property is concerned with both observables and expressiveness relations between arbitrary P/x and P/y, resulting in a formal statement that Q is at least as abstractive as P.

The function φ is understood as "rewriting" P texts as Q texts; and the less intrusive this rewriting is, the more easily Q can subsume the capabilities of P. For expressiveness, the class of macro (aka polynomial) transformations is traditionally of interest; operations that can be eliminated by this sort of transformation are traditionally called syntactic sugar.

Causation in term rewriting

Anyone with a bit of background in computer science has probably worked both with grammars and with term-rewriting calculi (though likely not in depth at the same time). Have you thought about the contrast between them? I hadn't, really, until I ended up doing a master's thesis formulating a novel grammar model (pdf), followed by a doctoral dissertation formulating a novel term-rewriting calculus (here). Since I suspect grammars tend to be thought of with less, well, gravitas than calculi, I'll explain a bit of how the grammars in my master's thesis work.

The formalism is called RAGs — Recursive Adaptive Grammars. Where an attribute grammar would have a terminal alphabet for syntax, a nonterminal alphabet for metasyntax (that is, for specifying classes of syntax), and various domains of attribute values for semantics, a RAG has a single unified domain of answers serving all three roles (except that for metasyntax I'll use the deeper name cosemantics). A rule form looks like

⟨v₀,p₀⟩ → t₁ ⟨p₁,v₁⟩ ... t_n ⟨p_n,v_n⟩ t_n+1

where each t_k is a syntax string, and each ⟨•,•⟩ is a cosemantics/semantics pair suitable for labeling a parent node in a parse tree. Conceptually, cosemantics is inherited, specifying how a tree can grow downward (the cosemantics on the root node is the "start symbol"); while semantics is synthesized, specifying the resultant meaning of the parsed string. The v_k are variables, provided from other parts of the tree: cosemantics of the parent node, provided from above; and semantics of the children, provided from below. The p_k are polynomials, determined by the current rule from these variables, hence, from the cosemantics of the parent and semantics of the children. (If the grammar is well-behaved, each child's cosemantics p_k only uses variables v₀,...v_k−1, so there are no circular dependencies in the parse tree.)

Playing the role of the rule set of an attribute grammar, a RAG has a rule function ρ which maps each answer a into a set of rule forms ρ(a); in selecting answers for the variables in a rule form r (producing a rule instance of r), the inherited cosemantics v₀ has to be some answer a such that r ∈ ρ(a).

If the polynomials were only allowed to construct answers, our grammars could recognize somewhat more than context-free languages; but we also want to compute semantics using Turing-powerful computation. For this, I permitted in the polynomials a non-constructive binary operator — the query operator, written as an infix colon, •:•. Conceptually, parsing starts with cosemantics, parses syntax, and in doing so synthesizes semantics; but in the usual grammatical derivation relation, cosemantics and semantics are both on the left side of the derivation, while syntax is on the right, thus:

⟨cosemantics, semantics⟩ ⇒⁺ syntax

The query operator rearranges these elements, because it has the meaning "find the semantics that results from this cosemantics on this syntax":

cosemantics : syntax ⇒⁺ semantics

So we can embed queries in our rule forms, hence the "recursive" in "Recursive Adaptive Grammars". However, I meant RAGs to be based on an elementary derivation step; so I needed a set of derivation step axioms that would induce the above equivalence. My solution was to introduce one more operator — not permitted in rule forms at all, but used during intermediate steps in derivation. The operator: inverse, denoted by an overbar.

Imho, the inverse operator is elegant, bizarre, and cool. What it does is reverse the direction of derivation. That is, for any derivation step c₁ ⇒ c₂, we have another step c₂ ⇒ c₁. The inverse of the inverse of any term c is just the same term c back again; and every answer a is its own inverse. So, given any three answers a₁, a₂, and a₃, if ⟨a₁,a₂⟩ ⇒⁺ a₃ then a₃ ⇒⁺ ⟨a₁,a₂⟩ .

The whole derivation step relation is defined by four axioms. We've already discussed the first two of them, and the third is just the usual rewriting property of compatibility:

If c₁ ⇒ c₂ then c₂ ⇒ c₁ .
If ⟨a₁,a₂⟩ → a₃ is a rule instance of r ∈ ρ(a₁), then ⟨a₁,a₂⟩ ⇒ a₃ .
If c₁ ⇒ c₂ and C is a context in which the missing subterm isn't inside an inverse operator, then C[c₁] ⇒ C[c₂] .

The fourth axiom is the spark that makes queries come alive.

a₁ : ⟨a₁,a₂⟩ ⇒ a₂ .

So now, from ⟨a₁,a₂⟩ ⇒⁺ a₃ we can deduce by the first axiom a₃ ⇒⁺ ⟨a₁,a₂⟩ , by the third axiom a₁ : a₃ ⇒⁺ a₁ : ⟨a₁,a₂⟩ , and stringing this together with the fourth axiom, a₁ : a₃ ⇒⁺ a₂ .

The converse, that a₁ : a₃ ⇒⁺ a₂ implies ⟨a₁,a₂⟩ ⇒⁺ a₃ , can also be proven without too much fuss... if one has already proven the basic result that an answer cannot be derived from another answer. That is, if answer a is derived from c in one or more steps, c ⇒⁺ a , then c is not an answer. This would be trivial if not for the inverse operator, which allows an answer a to be the left-hand side of infinitely many derivations (the inverses of all derivations that end with a); the theorem says that once you've derived a, further derivation from there can't reach another answer. It took me two rather messy pages to prove this in my master's thesis (simple proofs, if they exist, can take decades to find), and I realized at the time it was a sort of analog to the Church-Rosser theorem for a calculus — an elementary well-behavedness result that one really wants to prove first because without it one has trouble proving other things.

I didn't think more of it until, years later when explaining RAGs to my dissertation committee (though in the end I didn't use RAGs in my dissertation), I got to see someone else go through a WTF moment followed by an aha! moment over the virulent un-Church-Rosser-ness of RAGs. Somehow it hadn't registered on me just how very like a term-rewriting calculus my RAG formalism might look, to someone used to working with λ-like calculi. Mathematical systems are routinely classified according to their formal properties, but there's something deeper going on here. RAGs, despite being in a meaningful way Turing-powerful, don't just lack the Church-Rosser property, they don't want it. A λ-like calculus that isn't Church-Rosser is badly behaved, likely pathological; but a grammar that is Church-Rosser is degenerate. It's a matter of purpose, which is not covered by formal mathematical properties.

My point (approached obliquely, but I'm not sure one would appreciate it properly if one didn't follow a route that offers a good view of the thing on approach) is that the purpose of a calculus is mainly to arrive at the result of an operation, whereas the purpose of a grammar is to relate a starting point to the usually-unbounded range of destinations reachable from it. This requires us to think carefully on what we're doing with a discrete analog to the conventional continuous notion of differentiation, because when we take a conventional derivative, the thing we're differentiating is a single-valued function, akin in purpose to a calculus meant to produce a result, whereas when we describe expressiveness/abstractiveness as derivatives of semantics, the thing we're differentiating appears, on the face of it, to be very like a grammar, describing a usually-unbounded network of roads departing from a common starting point.

Draft analogy

As foundation for a notion of discrete differentiation, we're looking for a useful correspondence between elements of these two kinds of structures, continuous and discrete. So far, it might seem we've identified only differences, rather than similarities, between the two. Continuous differentiation is necessarily in the "calculus" camp, abstractiveness/expressiveness necessarily in the "grammar" camp. Continuous differentiation at a point p is fundamentally local, depending in its deepest nature on a family of ε-neighborhoods of p as ε varies continuously toward zero. Abstractiveness/expressiveness of a language P is fundamentally global, depending in its deepest nature on a family of languages reachable from P by means of (in general, unbounded) text sequences. The local structure of the continuous case doesn't even exist in the discrete case, where language P has a set of closest neighbors; and the global structure on which the discrete case depends is deliberately ignored by the continuous derivative.

A starting point for an analogy is hidden in plain sight (though perhaps I've given it away already, as I chose my presentation to set the stage for it). Recall again Mach's principle, that the way the local universe works is a function of the shape of the rest of the universe. Wikipedia offers this anecdote for it:

You are standing in a field looking at the stars. Your arms are resting freely at your side, and you see that the distant stars are not moving. Now start spinning. The stars are whirling around you and your arms are pulled away from your body. Why should your arms be pulled away when the stars are whirling? Why should they be dangling freely when the stars don't move?

(One is reminded of the esoteric principle "as above, so below".)

In our continuous and discrete cases, we have two ways to examine the properties of a global structure at a particular location (p or P), one of which draws on an unbounded family of ε-neighborhoods approaching arbitrarily close, and the other on an (in general) unbounded family of discrete structures receding arbitrarily far. Either way, we derive our understanding of conditions at the particular location by aggregating from an unbounded, coherent structure. One might call the two strategies "unbounded locality" and "unbounded globality". If we accept that the way the local universe works and the global structure of the universe are two faces of the same thing, then these strategies ought to be two ways of getting at the same thing.

This strategic similarity is reassuring, but lacks tactical detail. In the continuous case, field F gave us a value at point p, the derivative gave us another field with again a value at p, and we could keep applying the differentiation operation (if F was sufficiently well-behaved to start with) producing a series of higher-derivative fields, each providing again a value at p. Yet in the discrete case we don't appear to have a local "value" at P, the purely local "state" being so devoid of information that we actually chose to drop the states from the formal treatment altogether; and while we may intuit that expressiveness is the derivative of semantics, and abstractiveness the derivative of expressiveness, the actual formalisms we constructed for these three things were altogether different from each other. The semantics apparently had only the text sequences, set of observables, and a (typically, infinite) discrete structure called a "language"; expressiveness added to this picture functions between languages; and abstractiveness augmented each language with a family of functions between the different nodes within each language, then defining functions between these "languages with expressive structure". On the face of it, this doesn't appear to be a series of structures of the same kind (as fields F, F ', F '' are of the same kind, granting we do expect them to vary their configurations in some regular, generalizable way).

In my treatment of abstractive power, though, I noted that if, in augmenting languages P and Q with "expressive structure", the family of functions we use is the degenerate family of identity functions, then the relation "〈Q,Id〉 is as abstractive as 〈P,Id〉" is equivalent to "Q is as expressive as P". This, together with Mach's principle, gives us a tactical analogy. The underlying "state machine", whose states are languages and whose transitions are labeled by program texts, corresponds to manifold 𝓜. A particular state, taken together with the rest of the machine reachable from that state, corresponds to point p of the manifold. An "expressive structure" overlain on the machine, creating a web of functions between its states, corresponds to a field F (or F ', etc.). Differentiation is then an operation mapping a given "field" (overlying web of functions) to another. Thus, starting with our discrete machine 𝓜, overlaying the family of identity functions gives us the semantics of 𝓜, differentiating the semantics gives 𝓜's expressive structure, differentiating again gives its abstractive structure.

This analogy provides a way for us to consider the various parts of our discrete scheme in terms of the major elements of the continuous scheme (manifold, field, point). But on closer inspection, there are a couple of flaws in the analogy that need to be thought out carefully — and these flaws will lead us to a second philosophical difference comparable in depth to the difference between grammars and term-rewriting calculi.

Breaking the draft analogy

One mismatch in the analogy concerns the difference in substance between 𝓜 and F. In the continuous case, manifold 𝓜 consists entirely of the oriented distances between its points (we know the distances themselves are numbers because they are the ε needed for differentiation). If you start with a field F of uniform value across 𝓜, and take its integral, you can reconstruct 𝓜. The key point here is that 𝓜 and F are made of the same "stuff". In our discrete scheme, though, 𝓜 is a state machine with term-labeled transitions, while F is a family of functions between term sequences. The discrete 𝓜 and F are concretely different structures, made of different "stuff" rather than commensurate in the manner of the continuous 𝓜 and F. It might not be immediately obvious what the consequences would be of this analogy mismatch; but as happens, it feeds into the second mismatch, which is more directly concerned with the practical use of derivatives in physics.

The second mismatch concerns how the manifold changes over time — or, equivalently, its curvature. Our main precedent here is general relativity.

In general relativity, mass warps the space near its position, and the resulting curvature of space guides the movement of the mass over time. It's not apparent that abstractiveness/expressiveness has anything analogous to time in this sense. It should be apparent, though —once one takes a moment to consider— that general relativity doesn't have absolute time either. Observers moving relative to each other will disagree in their perceptions of time, and of space, and of which observed events are and are not simultaneous; so rather than absolute time and space, we have four-dimensional spacetime that looks different to different observers. Thinking of spacetime as a whole, rather than trying to factor it into absolute space and time, a particle with mass traces a one-dimensional curve across the four-dimensional manifold of spacetime. When we said, a moment ago, that the mass's position warps space while space guides the mass's movement, this mutual influence is described by mathematical equations involving the manifold 𝓜 and fields F, F ', etc. These equations are all about numbers (which don't occur in our discrete scheme) and rely on the fact that the manifold and the fields are both made out of this same numeric stuff (which they aren't in our discrete scheme). Evidently, the key element here which we have failed to carry across the analogy is that of equations encompassing both 𝓜 and F.

But, when we look more closely to try to find a discrete element to support an analogy with the continuous equations, we find that the omission is after all something more primal than numbers, or even commonality of substance between 𝓜 and F. The omission is one of purpose. In the continuous case, the purpose of those equations is to enable us to solve for 𝓜 and F. But the discrete device has no interest in solving for 𝓜 at all. We don't use our theories of expressiveness and abstractiveness to define a language; we only use them to study languages that we've devised by other means.

Retrenching

At this juncture in the narrative, imho we've done rather well at reducing the question of the analogy to its essence. But we still don't have an answer, and we've exhausted our supply of insights in getting this far. We need more ammunition. What else do we know about these continuous and discrete schemes, to break us out of this latest impasse?

For one thing, the asymmetric presence of solving in the continuous scheme is a consequence of a difference in the practical purposes for which the two mathematical schemes were devised. In physics, we try to set up a mathematical system that matches reality, and then use the math to translate observations into constraints and, thus, predictions. Solving is the point of the mathematical exercise, telling us what we should expect to deduce from observations if the theory is correct. Whereas in expressiveness/abstractiveness, we consider various possible programming languages —all of which are of our own devising— and study the different consequences of them. Rather than trying to figure out the properties of the world we're living in, we study the properties of different invented worlds in hopes of deciding which we'd like to live in.

The difference goes deeper, though. Even if we did want to solve for state machine 𝓜 in the discrete scheme —in effect, solve for the programming language— we wouldn't find expressiveness/abstractiveness, nor even semantics, at all adequate to the task. Recall from the above account of the continuous scheme, when observing that 𝓜 and F are made of the same "stuff", I suggested that from F one might reconstruct 𝓜. Strange to tell, when I wrote that, I wasn't even thinking yet in terms of this gap in the analogy over solving for 𝓜 (though, obviously, that's where my reasoning was about to take me). There is nothing in manifold 𝓜 that's in any fundamental way more complicated than fields F etc. In stark contrast to the discrete case. Expressiveness/abstractiveness are constructed from families of functions over 𝓜; to the extent semantic/expressive/abstractive comparisons are anything more than directed graphs on the states of 𝓜, their complexity is an echo of a deliberately very limited part of the structural complexity latent in 𝓜 itself. While the internal structure of a programming language —state machine 𝓜 itself— is immensely complicated; tbh, it seems way more complicated than a mere space-time manifold, which is a continuum whereas a programming language is discontinuity incarnate.

The point about discontinuity recalls a key feature of the other analogy I drew, in my earlier post, between cosmological physics and term rewriting. Under that analogy, fundamental forces in physics corresponded to substitution functions in term-rewriting; while physical geometry, with its close relation to the gravitational force, corresponded to α-renaming, with its close relation to λ-substitution. In our current analogy, we've been treating continuity, with its role in ε-neighborhoods, as the defining feature of the manifold, and finding nothing like it in the syntactic complexity of a programming language. What if for purposes of solving, what we need is not so much continuity as geometry?

For this, we need to find something α-renaming-like in the discrete scheme, which is challenging because expressiveness/abstractiveness is not based on term-rewriting. There's more to it, though. Whatever-it-is, α-renaming-like, we mean to be the analog to the manifold. In other words, it's the discrete 𝓜. And since the whole programming language, or equivalently the state machine, evidently isn't α-renaming-like, 𝓜 isn't the programming language after all, but something else. Some facet of the programming language. And not just any facet — one that guides our discrete differentiation analogously to how the geometry of the manifold guides continuous differentiation.

2 comments:

John CowanMay 17, 2015 at 10:55 AM
Or as I like to say, "Die Consen und die Fixnummen hat der lieber McCarthy gemacht: alles anderen ist Hackerenwerk." (McCarthy made conses and fixnums; all else is the work of hackers.)
AnonymousNovember 24, 2015 at 6:39 AM
This comment has been removed by a blog administrator.

Sunday, May 17, 2015

Numberless differentiation

2 comments: