Saturday, June 11, 2016

The co-hygiene principle

The mathematics is not there till we put it there.
Sir Arthur Eddington, The Philosophy of Physical Science, 1938.

Investigating possible connections between seemingly unrelated branches of science and mathematics can be very cool.  Independent of whether the connections actually pan out.  It can be mind-bending either way — I'm a big fan of mind-bending, as a practical cure for rigid thinking — and you can get all sorts of off-beat insights into odd corners that get illuminated along the way.  The more unlikely the connection, the more likely potential for mind bending; and also the more likely potential for pay-off if somehow it does pan out after all.

Two hazards you need to avoid, with this sort of thing:  don't overplay the chances it'll pan out — and don't underplay the chances it'll pan out.  Overplay and you'll sound like a crackpot and, worse, you might turn yourself into one.  Relish the mind bending, take advantage of it to keep your thinking limber, and don't get upset when you're not finding something that might not be there.  And at the same time, if you're after something really unlikely, say with only one chance in a million it'll pan out, and you don't leave yourself open to the possibility it will, you might just draw that one chance in a million and miss it, which would be just awful.  So treat the universe as if it has a sense of humor, and be prepared to roll with the punchlines.

Okay, the particular connection I'm chasing is an observed analogy between variable substitution in rewriting calculi and fundamental forces in physics.  If you know enough about those two subjects to say that makes no sense, that's what I thought too when I first noticed the analogy.  It kept bothering me, though, because it hooks into something on the physics side that's already notoriously anomalous — gravity.  The general thought here is that when two seemingly disparate systems share some observed common property, there may be some sort of mathematical structure that can be used to describe both of them and gives rise to the observed common property; and a mathematical modeling structure that explains why gravity is so peculiar in physics is an interesting prospect.  So I set out to understand the analogy better by testing its limits, elaborating it until it broke down.  Except, the analogy has yet to cooperate by breaking down, even though I've now featured it on this blog twice (1, 2).

So, building on the earlier explorations, in this post I tackle the problem from the other end, and try to devise a type of descriptive mathematical model that would give rise to the pattern observed in the analogy.

This sort of pursuit, as I go about it, is a game of endurance; again and again I'll lay out all the puzzle pieces I've got, look at them together, and try to accumulate a few more insights to add to the collection.  Then gather up the pieces and save them away for a while, and come back to the problem later when I'm fresh on it again.  Only this time I've kind-of succeeded in reaching my immediate goal.  The resulting post, laying out the pieces and accumulating insights, is therefore both an explanation of where the result comes from and a record of the process by which I got there.  There are lots of speculations within it shooting off in directions that aren't where I ended up.  I pointedly left the stray speculations in place.  Some of those tangents might turn out to be valuable after all; and taking them out would create a deceptive appearance of things flowing inevitably to a conclusion when, in the event, I couldn't tell whether I was going anywhere specific until I knew I'd arrived.

Naturally, for finding a particular answer — here, a mathematical structure that can give rise to the observed pattern — the reward is more questions.

Noether's Theorem
Determinism and rewriting
Nondeterminism and the cosmic footprint
Massive interconnection
Epilog: hygiene
Noether's Theorem

Noether's theorem (pedantically, Noether's first theorem) says that each differentiable invariant in the action of a system gives rise to a conservation law.  This is a particularly celebrated result in mathematical physics; it's explicitly about how properties of a system are implied by the mathematical structure of its description; and invariants — the current fad name for them in physics is "symmetries" — are close kin to both hygiene and geometry, which relate to each other through the analogy I'm pursuing; so Noether's theorem has a powerful claim on my attention.

The action of a system always used to seem very mysterious to me, until I figured out it's one of those deep concepts that, despite its depth, is also quite shallow.  It comes from Lagrangian mechanics, a mathematical formulation of classical mechanics alternative to the Newtonian mechanics formulation.  This sort of thing is ubiquitous in mathematics, alternative formulations that are provably equivalent to each other but make various problems much easier or harder to solve.

Newtonian mechanics seeks to describe the trajectory of a thing in terms of its position, velocity, mass, and the forces acting on it.  This approach has some intuitive advantages but is sometimes beastly difficult to solve for practical problems.  The Lagrangian formulation is sometimes much easier to solve.  Broadly, the time evolution of the system follows a trajectory through abstract state-space, and a function called the Lagrangian of the system maps each state into a quantity that... er... well, its units are those of energy.  For each possible trajectory of the system through state-space, the path integral of the Lagrangian is the action.  The principle of least action says that starting from a given state, the system will evolve along the trajectory that minimizes the action.  Solving for the behavior of the system is then a matter of finding the trajectory whose action is smallest.  (How do you solve for the trajectory with least action?  Well, think of the trajectories as abstract values subject to variation, and imagine taking the "derivative" of the action over these variations.  The least action will be a local minimum, where this derivative is zero.  There's a whole mathematical technology for solving problems of just that form, called the "calculus of variations".)

The Lagrangian formulation tends to be good for systems with conserved quantities; one might prefer the Newtonian approach for, say, a block sliding on a surface with friction acting on it.  And this Lagrangian affinity for conservative systems is where Noether's theorem comes in:  if there's a differentiable symmetry of the action — no surprise it has to be differentiable, seeing how central integrals and derivatives are to all this — the symmetry manifests itself in the system behavior as a conservation law.

And what, you may ask, is this magical Lagrangian function, whose properties studied through the calculus of variations reveal the underlying conservation laws of nature?  Some deeper layer of reality, the secret structure that underlies all?  Not exactly.  The Lagrangian function is whatever works:  some function that causes the principle of least action to correctly predict the behavior of the system.  In quantum field theory — so I've heard, having so far never actually grappled with QFT myself — the Lagrangian approach works for some fields but there is no Lagrangian for others.  (Yes, Lagrangians are one of those mathematical devices from classical physics that treats systems in such an abstract, holistic way that it's applicable to quantum mechanics.  As usual for such devices, its history involves Sir William Rowan Hamilton, who keeps turning up on this blog.)

This is an important point:  the Lagrangian is whatever function makes the least-action principle work right.  It's not "really there", except in exactly the sense that if you can devise a Lagrangian for a given system, you can then use it via the action integral and the calculus of variations to describe the behavior of the system.  Once you have a Lagrangian function that does in fact produce the system behavior you want it to, you can learn things about that behavior from mathematical exploration of the Lagrangian.  Such as Noether's theorem.  When you find there is, or isn't, a certain differentiable symmetry in the action, that tells you something about what is or isn't conserved in the behavior of the system, and that result really may be of great interest; just don't lose sight of the fact that you started with the behavior of the system and constructed a suitable Lagrangian from which you are now deducing things about what the behavior does and doesn't conserve.

In 1543, Copernicus's heliocentric magnum opus De revolutionibus orbium coelestium was published with an unsigned preface by Lutheran theologian Andreas Osiander saying, more or less, that of course it'd be absurd to suggest the Earth actually goes around the Sun but it's a very handy fiction for the mathematics.  Uhuh.  It's unnecessary to ask whether our mathematical models are "true"; we don't need them to be true, just useful.  When Francis Bacon remarked that what is most useful in practice is most correct in theory, he had a point — at least, for practical purposes.


The rewriting-calculus side of the analogy has a structural backstory from at least the early 1960s (some of which I've described in an earlier post, though with a different emphasis).  Christopher Strachey hired Peter Landin as an assistant, and encouraged him to do side work exploring formal foundations for programming languages.  Landin focused on tying program semantics to λ-calculus; but this approach suffered from several mismatches between the behavioral properties of programming languages versus λ-calculus, and in 1975 Gordon Plotkin published a solution for one of these mismatches, in one of the all-time classic papers in computer science, "Call-by-name, call-by-value and the λ-calculus" (pdf).  Plotkin defined a slight variant of λ-calculus, by altering the conditions for the β-rule so that the calculus became call-by-value (the way most programming languages behaved while ordinary λ-calculus did not), and proved that the resulting λv-calculus was fully Church-Rosser ("just as well-behaved" as ordinary λ-calculus).  He further set up an operational semantics — a rewriting system that ignored mathematical well-behavedness in favor of obviously describing the correct behavior of the programming language — and proved a set of correspondence theorems between the operational semantics and λv-calculus.

[In the preceding paragraph I perhaps should have mentioned compatibility, the other crucial element of rewriting well-behavedness; which you might think I'd have thought to mention since it's a big deal in my own work, though less flashy and more taken-for-granted than Church-Rosser-ness.]

Then in the 1980s, Matthias Felleisen applied Plotkin's approach to some of the most notoriously "unmathematical" behaviors of programs:  side-effects in both data (mutable variables) and control (goto and its ilk).  Like Plotkin, he set up an operational semantics and a calculus, and proved correspondence theorems between them, and well-behavedness for the calculus.  He introduced the major structural innovation of treating a side-effect as an explicit syntactic construct that could move upward within its term.  This upward movement would be a fundamentally different kind of rewrite from the function-application — the β-rule — of λ-calculus; abstractly, a side-effect is represented by a context 𝓢, which moves upward past some particular context C and, in the process, modifies C to leave in its wake some other context C':  C[𝓢[T]] → 𝓢[C'[T]] .  A side-effect is thus viewed as something that starts in a subterm and expands outward to affect more and more of the term until, potentially, it affects the whole term — if it's allowed to expand that far.  Of course, a side-effect might never expand that far if it's trapped inside a context that it can't escape from; notably, no side-effect can escape from context λ.[ ] , which is to say, no side-effect can escape from inside the body of a function that hasn't been called.

This is where I started tracking the game, and developing my own odd notions.  There seemed to me to be two significant drawbacks to Felleisen's approach, in its original published form.  For one thing, the transformation of context C to C', as 𝓢 moved across it, could be quite extensive; Felleisen himself aptly called these transformations "bubbling up"; as an illustration of how messy things could get, here are the rules for a continuation-capture construct C expanding out of the operator or operand of a function call:

(CT1)T2   →   C(λx1.(T1(λx2.(A(x1(x2T2))))))    for unused xk.
V(CT)   →   C(λx1.(T(λx2.(A(x1(V‍x2))))))    for unused xk.
The other drawback to the approach was that as published at the time, it didn't actually provide the full measure of well-behavedness from Plotkin's treatment of call-by-value.  One way or another, a constraint had to be relaxed somewhere.  What does the side-effect construct 𝓢 do once it's finished moving upward?  The earliest published solution was to wait till 𝓢 reaches the top of the term, and then get rid of it by a whole-term rewriting rule; that works, but the whole-term rewriting rule is explicitly not well-behaved:  calculus well-behavedness requires that any rewriting on a whole term can also be done to a subterm, and here we've deliberately introduced a rewriting rule that can't be applied to subterms.  So we've weakened the calculus well-behavedness.  Another solution is to let 𝓢 reach the top of the term, then let it settle into some sort of normal form, and relax the semantics–calculus correspondence theorems to allow for equivalent normal forms.  So the correspondence is weaker or, at least, more complicated.  A third solution is to introduce an explicit context-marker — in both the calculus and the operational semantics — delimiting the possible extent of the side-effect.  So you've got full well-behavedness but for a different language than you started out with.  (Felleisen's exploration of this alternative is part of the prehistory of delimited continuations, but that's another story.)

[In a galling flub, I'd written in the preceding paragraph Church-Rosser-ness instead of well-behavedness; fixed now.]

It occurred to me that a single further innovation should be able to eliminate both of these drawbacks.  If each side-effect were delimited by a context-marker that can move upward in the term, just as the side-effect itself can, then the delimiter would restore full Church-Rosser-ness without altering the language behavior; but, in general, the meanings of the delimited side-effects depend on the placement of the delimiter, so to preserve the meaning of the term, moving the delimiter may require some systematic alteration to the matching side-effect markers.  To support this, let the delimiter be a variable-binding construct, with free occurrences of the variable in the side-effect markers.  The act of moving the delimiter would then involve a sort of substitution function that propagates needed information to matching side-effect markers.  What with one thing and another, my academic pursuits dragged me away from this line of thought for years, but then in the 2000s I found myself developing an operational semantics and calculus as part of my dissertation, in order to demonstrate that fexprs really are well-behaved (though I should have anticipated that some people, having been taught otherwise, would refuse to believe it even with proof).  So I seized the opportunity to also elaborate my binding-delimiters approach to things that — unlike fexprs — really are side-effects.

This second innovation rather flew in the face of a tradition going back about seven or eight decades, to the invention of λ-calculus.  Alonzo Church was evidently quite concerned about what variables mean; he maintained that a proposition with free variables in it doesn't have a clear meaning, and he wanted to have just one variable-binding construct, λ, whose β-rule defines the practical meanings of all variables.  This tradition of having just one kind of variable, one binding construct, and one kind of variable-substitution (β-substitution) has had a powerful grip on researchers' imaginations for generations, to the point where even when other binding constructs are introduced they likely still have most of the look-and-feel of λ.  My side-effect-ful variable binders are distinctly un-λ-like, with rewriting rules, and substitution functions, bearing no strong resemblance to the β-rule.  Freedom from the β mold had the gratifying effect of allowing much simpler rewriting rules for moving upward through a term, without the major perturbations suggested by the term "bubbling up"; but, unsurprisingly, the logistics of a wild profusion of new classes of variables were not easy to work out.  Much elegant mathematics surrounding λ-calculus rests squarely on the known simple properties of its particular take on variable substitution.  The chapter of my dissertation that grapples with the generalized notion of substitution (Chapter 13, "Substitutive Reduction Systems", for anyone keeping score) has imho appallingly complicated foundations, although the high-level theorems at least are satisfyingly powerful.  One thing that did work out neatly was enforcement of variable hygiene, which in ordinary λ-calculus is handled by α-renaming.  In order to apply any nontrivial term-rewriting rule without disaster, you have to first make sure there aren't some two variables using the same name whose distinction from each other would be lost during the rewrite.  It doesn't matter, really, what sort of variables are directly involved in the rewrite rule:  an unhygienic rewrite could mess up variables that aren't even mentioned by the rule.  Fortunately, it's possible to define a master α-renaming function that recurses through the term renaming variables to maintain hygiene, and whenever you add a new sort of variable to the system, just extend the master function with particular cases for that new sort of variable.  Each rewriting rule can then invoke the master function, and everything works smoothly.

I ended up with four classes of variables.  "Ordinary" variables, of the sort supported by λ-calculus, I found were actually wanted only for a specific (and not even technically necessary) purpose:  to support partial evaluation.  You could build the whole calculus without them and everything would work right, but the equational theory would be very weak.  (I blogged on this point in detail here.)  A second class of variable supported continuations; in effect, the side-effect marker was a "throw" and the binding construct was a "catch".  Mutable state was more complicated, involving two classes of variables, one for assignments and one for lookups.  The variables for assignment were actually environment identities; each assignment side-effect would then specify a value, a symbol, and a free variable identifying the environment.  The variables for lookup stood for individual environment-symbol queries; looking up a symbol in an environment would generate queries for that environment and each of its ancestor environments.  The putative result of the lookup would be a leaf subterm with free variable occurrences for all the queries involved, waiting to assimilate the query results, while the queries themselves would rise through the term in search of matching assignments.  Whenever a query found a matching assignment, it would self-annihilate while using substitution to report the result to all waiting free variable occurrences.

Does all this detail matter to the analogy with physics?  Well, that's the question, isn't it.  There's a lot there, a great deal of fodder to chew on when considering how an analogy with something else might have a structural basis.


Amongst the four classes of variables, partial-evaluation variables have a peculiar sort of... symmetry.  If you constructed a vau-calculus with, say, only continuation variables, you'd still have two different substitution functions — one to announce that a delimiting "catch" has been moved upward, and one for α-renaming.  If you constructed a vau-calculus with only mutable-state variables, you'd have, well, a bunch of substitution functions, but in particular all the substitutions used to enact rewriting operations would be separate from α-renaming.  β-substitution, though, is commensurate with α-renaming; once you've got β-substitution of partial-evaluation variables, you can use it to α-rename them as well, which is why ordinary λ-calculus has, apparently at least, only one substitution function.

Qualitatively, partial-evaluation variables seem more integrated into the fabric of the calculus, in contrast to the other classes of variables.

All of which put me powerfully in mind of physics because it's a familiar observation that gravity seems qualitatively more integrated into the fabric of spacetime, in contrast to the other fundamental forces (xkcd).  General relativity portrays gravity as the shape of spacetime, whereas the other forces merely propagate through spacetime, and a popular strategy for aspiring TOEs (Theories Of Everything) is to integrate the other fundamental forces into the geometry as well — although, looking at the analogy, perhaps that popular strategy isn't such a good idea after all.  Consider:  The analogy isn't just between partial-evaluation variables and gravity.  It's between the contrast of partial-evaluation variables against the other classes of variables, and the contrast of gravity against the other fundamental forces.  All the classes of variables, and all the fundamental forces, are to some extent involved.  I've already suggested that Felleisen's treatment of side-effects was both weakened and complicated by its too-close structural imitation of λ, whereas a less λ-like treatment of side-effects can be both stronger and simpler; so, depending on how much structure carries through the analogy, perhaps trying to treat the other fundamental forces too much like gravity should be expected to weaken and complicate a TOE.

Projecting through the analogy suggests alternative ways to structure theories of physics, which imho is worthwhile independent of whether the analogy is deep or shallow; as I've remarked before, I actively look for disparate ways of thinking as a broad base for basic research.  The machinery of calculus variable hygiene, with which partial-evaluation variables have a special affinity, is only one facet of term structure; and projecting this through to fundamental physics, where gravity has a special affinity with geometry, this suggests that geometry itself might usefully be thought of, not as the venue where physics takes place, but merely as part of the rules by which the game is played.  Likewise, the different kinds of variables differ from each other by the kinds of structural transformations they involve; and projecting that through the analogy, one might try to think of the fundamental forces as differing from each other not (primarily) by some arbitrary rules of combination and propagation, but by being different kinds of structural manipulations of reality.  Then, if there is some depth to the analogy, one might wonder if some of the particular technical contrasts between different classes of variables might be related to particular technical contrasts between different fundamental forces — which, frankly, I can't imagine deciphering until and unless one first sets the analogy on a solid technical basis.

I've speculated several times on this blog on the role of non-locality in physics.  Bell's Theorem says that the statistical distribution of quantum predictions cannot be explained by any local, deterministic theory of physics if, by 'local and deterministic', you mean 'evolving forward in time in a local and deterministic way'; but it's quite possible to generate this same statistical distribution of spacetime predictions using a theory that evolves locally and deterministically in a fifth dimension orthogonal to spacetime.  Which strikes a familiar chord through the analogy with calculus variables, because non-locality is, qualitatively at least, the defining characteristic of what we mean by "side-effects", and the machinery of α-renaming maintains hygiene for these operations exactly by going off and doing some term rearranging on the side (as if in a separate dimension of rewriting that we usually don't bother to track).  Indeed, thought of this way, a "variable" seems to be an inherently distributed entity, spread over a region of the term — called its scope — rather than located at a specific point.  A variable instance might appear to have a specific location, but only because we look at a concrete syntactic term; naturally we have to have a particular concrete term in order to write it down, but somehow this doesn't seem to do justice to the reality of the hygiene machinery.  One could think of an equivalence class of terms under α-renaming, but imho even that is a bit passive.  The reality of a variable, I've lately come to think, is a dynamic distributed entity weaving through the term, made up of the binding construct (such as a λ), all the free instances within its scope, and the living connections that tie all those parts together; I imagine if you put your hand on any part of that structure you could feel it humming with vitality.


To give a slightly less hand-wavy description of my earlier post on Bell's Theorem — since it is the most concrete example we have to inform our view of the analogy on the physics side:

Bell looked at a refinement of the experiment from the EPR paradox.  A device emits two particles with entangled spin, which shoot off in opposite directions, and their spins are measured by oriented detectors at some distance from the emitter.  The original objection of Einstein Podolsky and Rosen was that the two measurements are correlated with each other, but because of the distance between the two detectors, there's no way for information about either measurement to get to where the other measurement takes place without "spooky action at a distance".  Bell refined this objection by noting that the correlation of spin measurements depends on the angle θ between the detectors.  If you suppose that the orientations of the detectors at measurement is not known at the time and place where the particles are emitted, and that the outcomes of the measurements are determined by some sort of information — "hidden variable" — propagating from the emission event at no more than the speed of light, then there are limits (called Bell's Inequality) on how the correlation can be distributed as a function of θ, no matter what the probability distribution of the hidden variable.  The distribution predicted by quantum mechanics violates Bell's Inequality; so if the actual probability distribution of outcomes from the experiment matches the quantum mechanical prediction, we're living in a world that can't be explained by a local hidden-variable theory.

My point was that this whole line of reasoning supposes the state of the world evolves forward in time.  If it doesn't, then we have to rethink what we even mean by "locality", and I did so.  Suppose our entire four-dimensional reality is generated by evolving over a fifth dimension, which we might as well call "metatime".  "Locality" in this model means that information about the state of one part of spacetime takes a certain interval of metatime to propagate a certain distance to another part of spacetime.  Instead of trying to arrange the probability distribution of a hidden variable at the emission event so that it will propagate through time to produce the desired probability distribution of measurements — which doesn't work unless quantum mechanics is seriously wrong about this simple system — we can start with some simple, uniform probability distribution of possible versions of the entire history of the experiment, and by suitably arranging the rules by which spacetime evolves, we can arrange that eventually spacetime will settle into a stable state where the probability distribution is just what quantum mechanics predicts.  In essence, it works like this:  let the history of the experiment be random (we don't need nondeterminism here; this is just a statement of uniformly unknown initial conditions), and suppose that the apparent spacetime "causation" between the emission and the measurements causes the two measurements to be compared to each other.  Based on θ, let some hidden variable decide whether this version of history is stable; and if it isn't stable, just scramble up a new one (we can always do that by pulling it out of the uniform distribution of the hidden variable, without having to posit fundamental nondeterminism).  By choosing the rule for how the hidden variable interacts with θ, you can cause the eventual stable history of the experiment to exhibit any probability distribution you choose.

That immense power is something to keep a cautious eye on:  not only can this technique produce the probability distribution predicted by quantum mechanics, it can produce any other probability distribution as well.  So, if the general structure of this mathematical theory determines something about the structure of the physical reality it depicts, what it determines is apparently not, in any very straightforward fashion, that probability distribution.


The side of the analogy we have prior detailed structural knowledge about is the vau-calculus side.  Whatever useful insights we may hope to extract from the metatime approach to Bell's Theorem, it's very sketchy compared to vau-calculus.  So if we want to work out a structural pattern that applies to both sides of the analogy, it's plausible to start building from the side we know about, questioning and generalizing as we go along.  To start with,

  • Suppose we have a complex system, made up of interconnected parts, evolving by some sort of transformative steps according to some simple rules.
Okay, freeze frame.  Why should the system be made up of parts?  Well, in physics it's (almost) always the parts we're interested in.  We ourselves are, apparently, parts of reality, and we interact with parts of reality.  Could we treat the whole as a unit and then somehow temporarily pull parts out of it when we need to talk about them?  Maybe, but the form with parts is still the one we're primarily interested in.  And what about "transformative steps"; do we want discrete steps rather than continuous equations?  Actually, yes, that is my reading of the situation; not only does fundamental physics appear to be shot through with discreteness (I expanded on this point a while back), but the particular treatment I used for my metatime proof-of-concept (above) used an open-ended sequence of discrete trials to generate the requisite probability distribution.  If a more thoroughly continuous treatment is really wanted, one might try to recover continuity by taking a limit a la calculus.
  • Suppose we separate the transformation rules into two groups, which we call bookkeeping rules and operational rules; and suppose we have a set of exclusive criteria on system configurations, call these hygiene conditions, which must be satisfied before any operational rule can be applied.

Freeze again.  At first glance, this looks pretty good.  From any unhygienic configuration, we can't move forward operationally until we've done bookkeeping to ensure hygiene.  Both calculus rewriting and the metatime proof-of-concept seemingly conform to this pattern; but the two cases differ profoundly in how their underlying hygiene (supposing that's what it is, in the physics case) affects the form of the modeled system, and we'll need to consider the difference carefully if we mean to build our speculations on a sound footing.

Determinism and rewriting

Hygiene in rewriting is all about preserving properties of a term (to wit, variable instance–binding correspondences), whereas our proof-of-concept metatime transformations don't appear to be about perfectly preserving something but rather about shaping probability distributions.  One might ask whether it's possible to set up the internals of our metatime model so that the probability distribution is a consequence, or symptom, of conserving something behind the scenes.  Is the seemingly nondeterministic outcome of our quantum observation in a supposedly small quantum system here actually dictated by the need to maintain some cosmic balance that can't be directly observed because it's distributed over a ridiculously large number of entities (such as the number of electrons in the universe)?  That could lead to some bracing questions about how to usefully incorporate such a notion into a mathematical theory.

As an alternative, one might decide that the probability distribution in the metatime model should not be a consequence of absolutely preserving a condition.  There are two philosophically disparate sorts of models involving probabilities:  either the probability comes from our lack of knowledge (the hidden-variable hypothesis), and in the underlying model the universe is computing an inevitable outcome; or the probability is in the foundations (God playing dice, in the Einsteinian phrase), and in the underlying model the universe is exploring the range of possible outcomes.  I discussed this same distinction, in another form, in an earlier post, where it emerged as the defining philosophical distinction between a calculus and a grammar (here).  In those terms, if our physics model is fundamentally deterministic then it's a calculus and by implication has that structural affinity with the vau-calculi on the other side of the analogy; but if our physics model is fundamentally nondeterministic then it's a grammar, and our analogy has to try to bridge that philosophical gap.  Based on past experience, though, I'm highly skeptical of bridging the gap; if the analogy can be set on a concrete technical basis, the TOE on the physics side seems to me likely to be foundationally deterministic.

The foundationally deterministic approach to probability is to start with a probabilistic distribution of deterministic initial states, and evolve them all forward to produce a probabilistic distribution of deterministic final states.  Does the traditional vau-calculus side of our analogy, where we have so much detail to start with, have anything to say about this?  In the most prosaic sense, ones suspects not; probability distributions don't traditionally figure into deterministic computation semantics, where this approach would mean considering fuzzy sets of terms.  There may be some insight lurking, though, in the origins of calculus hygiene.

When Alonzo Church's 1932/3 formal logic turned out to be inconsistent, he tried to back off and find some subset of it that was provably consistent.  Here consistent meant that not all propositions are equivalent to each other, and the subset of the logic that he and his student J. Barkley Rosser proved consistent in this sense was what we now call λ-calculus.  The way they did it was to show that if any term T1 can be reduced in the calculus in two different ways, as T2 and T3, then there must be some T4 that both of them can be reduced to.  Since logical equivalence of terms is defined as the smallest congruence generated by the rewriting relation of the calculus, from the Church-Rosser property it follows that if two terms are equivalent, there must be some term that they both can be reduced to; and therefore, two different irreducible terms cannot possibly be logically equivalent to each other.

Proving the Church-Rosser theorem for λ-calculus was not, originally, a simple matter.  It took three decades before a simple proof began to circulate, and the theorem for variant calculi continues to be a challenge.  And this is (in one view of the matter, at least) where hygiene comes into the picture.  Church had three major rewriting rules in his system, later called the α, β, and γ rules.  The α rule was the "bookkeeping" rule; it allowed renaming a bound variable as long as you don't lose its distinction from other variables in the process.  The β rule is now understood as the single operational rule of λ-calculus, how to apply a function to an argument.  The γ rule is mostly forgotten now; it was simply doing a β-step backward, and was later dropped in favor of starting with just α and β and then taking the congruent closure (reflexive, symmetric, transitive, and compatible).  Ultimately the Church-Rosser theorem allows terms to be sorted into β-equivalence classes; but the terms in each class aren't generally thought of as "the same term", just "equivalent terms".  α-equivalent terms, though, are much closer to each other, and for many purposes would actually be thought of as "the same term, just written differently".  Recall my earlier description of a variable as a distributed entity, weaving through the term, made up of binding construct, instances, and living connections between them.  If you have a big term, shot through with lots of those dynamic distributed entities, the interweaving could really be vastly complex.  So factoring out the α-renaming is itself a vast simplification, which for a large term may dwarf what's left after factoring to complete the Church-Rosser proof.  To see by just how much the bookkeeping might dwarf the remaining operational complexity, imagine scaling the term up to the sort of cosmological scope mentioned earlier — like the number of electrons in the universe.

It seems worth considering, that hygiene may be a natural consequence of a certain kind of factoring of a vastly interconnected system:  you sequester almost all of the complexity into bookkeeping with terrifically simple rules applied on an inhumanly staggering scale, and comparatively nontrivial operational rules that never have to deal directly with the sheer scale of the system because that part of the complexity was factored into the bookkeeping.  In that case, at some point we'll need to ask when and why that sort of factoring is possible.  Maybe it isn't really possible for the cosmos, and a flaw in our physics is that we've been trying so hard to factor things this way; when we really dive into that question we'll be in deep waters.

It's now no longer clear, btw, that geometry corresponds quite directly to α-renaming.  There was already some hint of that in the view of vau-calculus side-effects as "non-local", which tends to associate geometry with vau-calculus term structure rather than α-renaming as such.  Seemingly, hygiene is then a sort of adjunct to the geometry, something that allows the geometry to coexist with the massive interconnection of the system.

But now, with massive interconnection resonating between the two sides of the analogy, it's definitely time to ask some of those bracing questions about incorporating cosmic connectivity into a mathematical theory of physics.

Nondeterminism and the cosmic footprint

We want to interpret our probability distribution as a footprint, and reconstruct from it the massively connected cosmic order that walked there.  Moreover, we're conjecturing that the whole system is factorable into bookkeeping/hygiene on one hand(?), and operations that amount to what we'd ordinarily call "laws of physics" on the other; and we'd really like to deduce, from the way quantum mechanics works, something about the nature of the bookkeeping and the factorization.

Classically, if we have a small system that's acted on by a lot of stuff we don't know about specifically, we let all those influences sum to a potential field.  One might think of this classical approach as a particular kind of cosmological factorization in which the vast cosmic interconnectedness is reduced to a field, so one can then otherwise ignore almost everything to model the behavior of the small system of interest using a small operational set of physical laws.  We know the sort of cosmic order that reduces that way, it's the sort with classical locality (relative to time evolution); and the vaster part of the factorization — the rest of the cosmos, that reduced to a potential field — is of essentially the same kind as the small system.  The question we're asking at this point is, what sort of cosmic order reduces such that its operational part is quantum mechanics, and what does its bookkeeping part look like?  Looking at vau-calculus, with its α-equivalence and Church-Rosser β-equivalence, it seems fairly clear that hygiene is an asymmetric factorization:  if the cosmos factors this way, the bookkeeping part wouldn't have to look at all like quantum mechanics.  A further complication is that quantum mechanics may be an approximation only good when the system you're looking at is vastly smaller than the universe as a whole; indeed, this conjecture seems rather encouraged by what happens when we try to apply our modern physical theories to the cosmos as a whole: notably, dark matter.  (The broad notion of asymmetric factorizations surrounding quantum mechanics brings to mind both QM's notorious asymmetry between observer and observed, and Einstein's suggestion that QM is missing some essential piece of reality.)

For this factorization to work out — for the cosmic system as a whole to be broadly "metaclassical" while factoring through the bookkeeping to either quantum mechanics or a very good approximation thereof — the factorization has to have some rather interesting properties.  In a generic classical situation where one small thing is acted on by a truly vast population of other things, we tend to expect all those other things to average out (as typically happens with a classical potential field), so that the vastness of the population makes their combined influence more stable rather than less; and also, as our subsystem interacts with the outside influence and we thereby learn more about that influence, we become more able to allow for it and still further reduce any residual unpredictability of the system.

Considered more closely, though, the expectation that summing over a large population would tend to average out is based broadly on the paradigm of signed magnitudes on an unbounded scale that attenuate over distance.  If you don't have attenuation, and your magnitudes are on a closed loop, such as angles of rotation, increasing the population just makes things more unpredictable.  Interestingly, I'd already suggested in one of my earlier explorations of the hygiene analogy that the physics hygiene condition might be some sort of rotational constraint, for the — seemingly — unrelated reason that the primary geometry of physics has 3+1 dimension structure, which is apparently the structure of quaternions, and my current sense of quaternions is that they're the essence of rotation.  Though when things converge like this, it can be very hard to distinguish between an accidental convergence and one that simply reassembles fragments of a deeper whole.

I'll have a thought on the other classical problem — losing unpredictability as we learn more about outside influences over time — after collecting some further insights on the structural dynamics of bookkeeping.

Massive interconnection

Given a small piece of the universe, which other parts of the universe does it interact with, and how?

In the classical decomposition, all interactions with other parts of the universe are based on relative position in the geometry — that is, locality.  Interestingly, conventional quantum mechanics retains this manner of selecting interactions, embedding it deeply into the equational structure of its mathematics.  Recall the Schrödinger equation,

iℏ Ψ
 =   Ĥ Ψ .
The element that shapes the time evolution of the system — the Hamiltonian function Ĥ — is essentially an embodiment of the classical expectation of the system behavior; which is to say, interaction according to the classical rules of locality.  (I discussed the structure of the Schrödinger equation at length in an earlier post, yonder.)  Viewing conventional QM this way, as starting with classical interactions and then tacking on quantum machinery to "fix" it, I'm put in mind of Ptolemaic epicycles, tacked on to the perfect-circles model of celestial motion to make it work.  (I don't mean the comparison mockingly, just critically; Ptolemy's system worked pretty well, and Copernicus used epicycles too.  Turns out there's a better way, though.)

How does interaction-between-parts play out in vau-calculus, our detailed example of hygiene at work?

The whole syntax of a calculus term is defined by two aspects:  the variables — by which I mean those "distributed entities" woven through the term, each made of a binding, its bound instances, and the connections that hygiene maintains between them — and, well, everything else.  Once you ignore the specific identities of all the variable instances, you've just got a syntax tree with each node labeled by a context-free syntax production; and the context-free syntax doesn't have very many rules.  In λ-calculus there are only four syntax rules: a term is either a combination, or a λ-abstraction, or a variable, or a constant.  Some treatments simplify this by using variables for the "constants", and it's down to only three syntax rules.  The lone operational rule β,

((λx.T1)T2)   →   T1[x ← T2]   ,
gives a purely local pattern in the syntax tree for determining when the operation can be applied:  any time you have a parent node that's a combination whose left child is a λ-abstraction.  Operational rules stay nearly this simple even in vau-calculi; the left-hand side of each operational rule specifies when it can be applied by a small pattern of adjacent nodes in the syntax tree, and mostly ignores variables (thanks to hygienic bookkeeping).  The right-hand side is where operational substitution may be introduced.  So evidently vau-calculus — like QM — is giving local proximity a central voice in determining when and how operations apply, with the distributed aspect (variables) coming into play in the operation's consequences (right-hand side).

Turning it around, though, if you look at a small subterm — analogous, presumably, to a small system studied in physics — what rules govern its non-local connections to other parts of the larger term?  Let's suppose the term is larger than our subterm by a cosmically vast amount.  The free variables in the subterm are the entry points by which non-local influences from the (vast) context can affect the subterm (via substitution functions).  And there is no upper limit to how fraught those sorts of interconnections can get (which is, after all, what spurs advocates of side-effect-less programming).  That complexity belongs not to the "laws of physics" (neither the operational nor even the bookkeeping rules), but to the configuration of the system.  From classical physics, we're used to having very simple laws, with all the complexity of our problems coming from the particular configuration of the small system we're studying; and now we've had that turned on its head.  From the perspective of the rules of the calculus, yes, we still have very simple rules of manipulation, and all the complexity is in the arrangement of the particular term we're working with; but from the perspective of the subterm of interest, the interconnections imposed by free variables look a lot like "laws of physics" themselves.  If we hold our one subterm fixed and allow the larger term around it to vary probabilistically then, in general, we don't know what the rules are and have no upper bound on how complicated those rules might be.  All we have are subtle limits on the shape of the possible influences by those rules, imposed roundaboutly by the nature of the bookkeeping-and-operational transformations.

One thing about the shape of these nonlocal influences:  they don't work like the local part of operations.  The substitutive part of an operation typically involves some mixture of erasing bound variable instances and copying fragments from elsewhere.  The upshot is that it rearranges the nonlocal topology of the term, that is, rearranges the way the variables interweave — which is, of course, why the bookkeeping rules are needed, to maintain the integrity of the interweaving as it winds and twists.  And this is why a cosmic system of this sort doesn't suffer from a gradual loss of unpredictability as the subsystem interacts with its neighbors in the nonlocal network and "learns" about them:  each nonlocal interaction that affects it changes its nonlocal-network neighbors.  As long as the overall system is cosmically vast compared to the the subsystem we're studying, in practice we won't run out of new network neighbors no matter how many nonlocal actions our subsystem undergoes.

This also gives us a specific reason to suspect that quantum mechanics, by relying on this assumption of an endless supply of fresh network neighbors, would break down when studying subsystems that aren't sufficiently small compared to the cosmos as a whole.  Making QM (as I've speculated before), like Newtonian physics, an approximation that works very well in certain cases.


Here's what the reconstructed general mathematical model looks like so far:

  • The system as a whole is made up of interconnected parts, evolving by transformative steps according to simple rules.

  • The interconnections form two subsystems:  local geometry, and nonlocal network topology.

  • The transformation rules are of two kinds, bookkeeping and operational.

  • Operational rules can only be applied to a system configuration satisfying certain hygiene conditions on its nonlocal network topology; bookkeeping, which only acts on nonlocal network topology, makes it possible to achieve the hygiene conditions.

  • Operational rules are activated based on local criteria (given hygiene).  When applied, operations can have both local and nonlocal effects, while the integrity of the nonlocal network topology is maintained across both kinds of effect via hygiene, hence bookkeeping.

To complete this picture, it seems, we want to consider what a small local system consists of, and how it relates to the whole.  This is all the more critical since we're entertaining the possibility that quantum mechanics might be an approximation that only works for a small local system, so that understanding how a local system relates to the whole may be crucial to understanding how quantum mechanics can arise locally from a globally non-quantum TOE.

A local system consists of some "local state", stuff that isn't interconnection of either kind; and some interconnections of (potentially) both kinds that are entirely encompassed within the local system.  For example, in vau-calculi — our only prior source for complete examples — we might have a subterm (λx.(y‍x)).  Variables are nonlocal network topology, of course, but in this case variable x is entirely contained within the local system.  The choice of variable name "x" is arbitrary, as long as it remains different from "y" (hygiene).  But what about the choice of "y"?  In calculus reasoning we would usually say that because y is free in this subterm, we can't touch it; but that's only true if we're interested in comparing it to specific contexts, or to other specific subterms.  If we're only interested in how this subterm relates to the rest of the universe, and we have no idea what the rest of the universe is, then the free variable y really is just one end of a connection whose other end is completely unknown to us; and a different choice of free variable would tell us exactly as much, and as little, as this one.  We would be just as well off with (λx.(z‍x)), or (λz.(w‍z)), or even (λy.(x‍y)) — as long as we maintain the hygienic distinction between the two variables.  The local geometry that can occur just outside this subterm, in its surrounding context, is limited to certain specific forms (by the context-free grammar of the calculus); the nonlocal network topology is vastly less constrained.

The implication here is that all those terms just named are effectively equivalent to (λx.(y‍x)).  One might be tempted to think of this as simply different ways of "writing" the same local system, as in physics we might describe the same local system using different choices of coordinate axes; but the choice of coordinate axes is about local geometry, not nonlocal network topology.  Here we're starting with simple descriptions of local systems, and then taking the quotient under the equivalence induced by the bookkeeping operations.  It's tempting to think of the pre-quotient simple descriptions as "classical" and analogize the quotient to "quantum", but in fact there is a second quotient to be taken.  In the metatime proof-of-concept, the rewriting kept reshuffling the entire history of the experiment until it reached a steady state — the obvious analogy is to a calculus irreducible term, the final result of the operational rewrite relation of the calculus.  And this, at last, is where Church-Rosser-ness comes in.  Church-Rosser is what guarantees that the same irreducible form (if any) is reached no matter in what order operational rules are applied.  It's the enabler for each individual state of the system to evolve deterministically.  To emphasize this point:  Church-Rosser-ness applies to an individual possible system state, thus belongs to the deterministic-foundations approach to probabilities.  The probability distribution itself is made up of individual possibilities each one of which is subject to Church-Rosser-ness.  (Church-Rosser-ness is also, btw, a property of the mathematical model:  one doesn't ask "Why should these different paths of system state evolution all come back together to the same normal form?", because that's simply the kind of mathematical model one has chosen to explore.)

The question we're trying to get a handle on is why the nonlocal effects of some operational rules would appear to be especially compatible with the bookkeeping quotient of the local geometry.


In vau-calculi, the nonlocal operational effects (i.e., operational substitution functions) that do not integrate smoothly with bookkeeping (i.e., with α-renaming) are the ones that support side-effects; and the one nonlocal operational effect that does integrate smoothly with bookkeeping — β-substitution — supports partial evaluation and turns out to be optional, in the sense that the operational semantics of the system could be described without that kind of nonlocal effect and the mathematics would still be correct with merely a weaker equational theory.

This suggests that in physics, gravity could be understood without bringing nonlocal effects into it at all, though there may be some sort of internal mathematical advantage to bringing them in anyway; while the other forces may be thought of as, in some abstract sense, side-effects.

So, what exactly makes a side-effect-ful substitution side-effect-ful?  Conversely, β-substitution is also a form of substitution; it engages the nonlocal network topology, reshuffling it by distributing copies of subterms, the sort of thing I speculated above may be needed to maintain the unpredictability aspect of quantum mechanics.  So, what makes β-substitution not side-effect-ful in character?  Beyond the very specific technicalities of β- and α-substitution; and just how much, or little, should we be abstracting away from those technicalities?  I'm supposing we have to abstract away at least a bit, on the principle that physics isn't likely to be technically close to vau-calculi in its mathematical details.

Here's a stab at a sufficient condition:

  • A nonlocal operational effect is side-effect-ful just if it perturbs the pre-existing local geometry.

The inverse property, called "purity" in a programming context (as in "pure function"), is that the nonlocal operational effect doesn't perturb the pre-existing local geometry.  β-substitution is pure in this sense, as it replaces a free variable-instance with a subterm but doesn't affect anything local other than the variable-instance itself.  Contrast this with the operational substitution for control variables; the key clauses (that is, nontrivial base cases) of the two substitutions are

x[x ← T]   →   T   .

(τx.T)[x ← C]   →   (τx.C[T[x ← C]])   .

The control substitution alters the local-geometric distance between pre-existing structure T and whatever pre-existing immediate context surrounds the subterm acted on.  Both substitutions have the — conjecturally important — property that they substantially rearrange the nonlocal network topology by injecting arbitrary new network connections (that is, new free variables).  The introduction of new free variables is a major reason why vau-calculi need bookkeeping to maintain hygiene; although, interestingly, it's taken all this careful reasoning about bookkeeping to conclude that bookkeeping isn't actually necessary to the notion of purity/impurity (or side-effect-ful/non-side-effect-ful); apparently, bookkeeping is about perturbations of the nonlocal network topology, whereas purity/impurity is about perturbations of the local geometry.  To emphasize the point, one might call this non-perturbation of local geometry co-hygiene — all the nonlocal operational effects must be hygienic, which might or might not require bookkeeping depending on internals of the mathematics, but only the β- and gravity nonlocal effects are co-hygienic.


Abstracting away from how we got to it, here's what we have:

  • A complex system of parts, evolving through a Church-Rosser transformation step relation.

  • Interconnections within a system state, partitioned into local (geometry) and nonlocal (network).

  • Each transformation step is selected locally.

  • The nonlocal effects of each transformation step rearrange — scramble — nonlocal connections at the locus where applied.

  • Certain transformative operations have nonlocal effects that do not disrupt pre-existing local structure — that are co-hygienic — and thereby afford particularly elegant description.

What sort of elegance is involved in the description of a co-hygienic operation depends on the technical character of the mathematical model; for β-reduction, what we've observed is functional compatibility between β- and α-substitution, while for gravity we've observed the general-relativity integration between gravity and the geometry of spacetime.

So my proposed answer to the conundrum I've been pondering is that the affinity between gravity and geometry suggests a modeling strategy with a nonlocal network pseudo-randomly scrambled by locally selected operational transformations evolving toward a stable state of spacetime, in which the gravity operations are co-hygienic.  A natural follow-on question is just what sort of mathematical machinery, if any, would cause this network-scrambling to approximate quantum mechanics.

On the side, I've got intimations here that quantum mechanics may be an approximation induced by the pseudo-random network scrambling when the system under study is practically infinitesimal compared to the cosmos as a whole, and perhaps that the network topology has a rotational aspect.

Meanwhile, an additional line of possible inquiry has opened up.  All along I've been trying to figure out what the analogy says about physics; but now it seems one might study the semantics of a possibly-side-effect-ful program fragment by some method structurally akin to quantum mechanics.  The sheer mathematical perversity of quantum mechanics makes me skeptical that this could be a practical approach to programming semantics; on the other hand, it might provide useful insights for the TOE mathematics.

Epilog: hygiene

So, what happened to hygiene?  It was a major focus of attention through nearly the whole investigation, and then dropped out of the plot near the end.

At its height of prestige, when directly analogized to spacetime geometry (before that fell through), hygiene motivated the suggestion that geometry might be thought of not as the venue where physics happens but merely as part of its rules.  That suggestion is still somewhat alive since the proposed solution treats geometry as abstractly just one of the two forms of interconnection in system state, though there's a likely asymmetry of representation between the two forms.  There was also some speculation that understanding hygiene on the physics side could be central to making sense of the model; that, I'd now ease off on, but do note that in seeking a possible model for physics one ought to keep an eye out for a possible bookkeeping mechanism, and certainly resolving the nonlocal topology of the model would seem inseparable from resolving its bookkeeping.  So hygiene isn't out of the picture, and may even play an important role; just not with top billing.

Would it be possible for the physics model to be unhygienic?  In the abstract sense I've ended up with, lack of hygiene would apparently mean an operation whose local effect causes nonlocal perturbation.  Whether or not dynamic scope qualifies would depend on whether dynamically scoped variables are considered nonlocal; but since we expect some nonlocal connectivity, and those variables couldn't be perturbed by local operations without losing most/all of their nonlocality, probably the physics model would have to be hygienic.  My guess is that if a TOE actually tracked the nonlocal network (as opposed to, conjecturally, introducing a quantum "blurring" as an approximation for the cumulative effect of the network), the tracking would need something enough like calculus variables that some sort of bookkeeping would be called for.

Sunday, January 31, 2016

Schools of artlanging

Its [conlanging's] development to perfection must none the less certainly be prevented by its solitariness, the lack of interchange, open rivalry, study or imitation of others' technique.
"A Secret Vice", J.R.R. Tolkien, circa 1931.
The important thing is that conlanging start to have a critical apparatus within which the artistic merits of conlangs can be evaluated and where different schools of thought can define and defend themselves. [...] I therefore announce the founding of the Naturalist school of conlanging, which regards the following three things as values: [...]
— "Lighting Some Flames: Towards conlang artistry" (commonly known as "The Artlanger's Rant"), Jesse Bangs, Conlang Mailing List, March 12, 2002.
If naturalistic conlanging is the equivalent of realist painting, then what is an impressionist conlang?  A surrealist conlang?  What's the conlang equivalent of Guernica?  Who is the conlanging equivalent of Gauguin or Dalí?
The Art of Language Invention (Penguin Books, 2015), David J. Peterson, p. 264.

I've some things to say here about conlangs as works of art, with particular attention to the relationship between artist and audience.

The special occasion for this is that I've recently read David J. Peterson's new book The Art of Language Invention.  It's kind of mind-boggling to realize how far conlanging has come since the term was coined 25 years ago, as isolated practitioners of Tolkien's "secret vice" found each other through the internet and began to draw together into a community.  Peterson, of course, has played a central part in the conlanging community during its emergence from the shadows; he was involved in the establishment of the annual Language Creation Conference series and the Language Creation Society, and is probably most broadly recognized atm as the architect of the Dothraki language for the HBO series Game of Thrones.

Artlang appreciation
Small stuff
Artlang appreciation

Early in the Postscript of his book, Peterson asks "how an audience appreciates a conlang" (p. 260).  Seems to me there are two ways:  by observing its use — spoken or written — and by studying its description.  (Descriptions can include examples of use, of course, and it seems awfully common for critiques of conlang descriptions to either praise the plentiful examples or complain that there aren't enough of them.)  Peterson expresses doubts about studying a reference grammar, on grounds that doing so takes an awful lot of work.  On the other hand, against the former method, way back in his Introduction (p. 6) he suggests that if the actors made a mistake speaking a conlang in a TV program, only about a tenth of a percent of viewers would notice.

I think the one-in-a-thousand number gives something of a false impression of how much benefit works of fiction can accrue from using conlangs.  The number may be in the right ballpark, but all Peterson actually claimed for it was that it was how many viewers consciously notice a specific mistake in using a conlang.  One might ask, instead, how many viewers would subliminally notice if the alien (in its broadest sense) speech were faked with nonsense rather than using a conlang.

Granted, the benefits of using a conlang may be amplified in written fiction, because there alien speech appears in a very stable form inviting close and protracted scrutiny (with the complicating factor that it'll almost certainly have to appear in a romanized form).  For alien speech/writing in a TV or movie setting, though, the more of it there is the harder it is to produce an air of authenticity if there's really no language behind it.  (Peterson does allude to this effect in passing on page 2.)

Much has been made of the use, in Star Wars: The Force Awakens, of physical models in preference to computer-generated imagery (CGI).  I recall one of the actors remarking about this (all part of the pre-release publicity, of course) that it's possible to give a more authentic performance when interacting with a model that's really there than when pretending to interact in a blue room with something that isn't there.  The difference, both indirectly through the actors' performances and directly through the feel of the set, is meant to make the movie seem real to an audience subliminally.  I think the same principle applies to use of conlangs:  for a truly authentic feel there has to be a language.

Studying a reference grammar is a lot of work.  One thing we should be asking, in that regard, is how to make descriptions of language more accessible.  In fact it's not at all clear how a language ought to be described even for linguistic purposes, a shortcoming likely to become gradually clearer to a conlanger as they get into the more advanced levels of the subject.  For me the first major warning sign came with studying "The Blue Bird of Ergativity" (which I was set onto by the Conlangery Podcast) — a 2005 linguistics paper arguing, rather convincingly imho, that ergativity isn't really a thing, in the same way that blueness isn't a thing for birds:  a completely superficial property with no biological significance.  Before long I'd lost faith in things like the dative case.  Keep in mind, natural language translation has been just a few years away for the past half century; it's a dreadfully hard problem, which seems to require a thinking mind who actually understands the text being translated and is fluent in both languages.  Describing a language seems unlikely to be less difficult than translation.

All of this seems to me to imply another, deeper question, though.  From How do we describe a language? we're led to What does it mean to construct a language?, and then we find that the real starting point should be What is a language?.  A living natlang lives in the collective understanding of its speakers.  An extinct natlang is presumably defined by the past collective understanding of its speakers, which we try to reconstruct.  For a conlang?  There's a good chance a conlang mightn't be spoken fluently even by its creator; so either the conlang is its description, or the conlang is some sort of natural resonance point that its description tries to lead us to.

Small stuff

One early passage in the book stirred a minor pet peeve of mine.  On page 34, expanding on a remark that IPA represents sound independent of spelling, he goes off on a several-sentence rant about the absurdity of English spelling.  As a fluent speaker and writer of English, I find his ridicule of English spelling perfectly reasonable (and I enjoyed the way he said it).  What's striking to me, though, is that he has nothing similar to say, afaics, against IPA (the International Phonetic Alphabet).  Presumably because he's a very highly trained linguist.  It's possible to get so used to almost anything, that one stops noticing its oddness; and it seems as if Peterson may have done so with the IPA.  Avoiding that fate myself, I've retained my awareness that the IPA is utterly insane.

There is, of course, no likely cure for the IPA, for pretty much the same reason there's no likely cure for English spelling.  The ASCII-only alternative to the IPA, X-SAMPA (with its variant CXS), has the advantage of being ASCII-only and essentially nothing else to recommend it.  I exaggerate less than one might think.

The prospects for getting an alternative adopted certainly wouldn't stop a conlanger from trying to design such a thing.  So far my own efforts aren't worth blogging about.  If they ever reach the level of being interesting, perhaps I will blog about them.  In order to actually be useful for presenting conlangs, they'd have to be significantly easier than IPA for a non-linguist to understand, which seems a very high bar indeed given the challenge a non-linguist will face no matter what the notation (but this high bar also offers some small degree of guidance for efforts in that direction).

Another low-level point that bugs me is glossing, where an example of the conlang is juxtaposed with its translation on a separate line, and between them is a gloss explaining the features of each conlang element.  Here I think the bar for improving utility is considerably lower.  Glossing, like the examples it's attached to, is encouraged by conlang critiques.  I used glossing myself in my blog post a while back on Lamlosuo, e.g.

 losutu   li   susua 
the sometime-speaker sleeps
The trouble here may even be clear to a linguist (whereas a linguist probably wouldn't see the problem if the language being glossed were less peculiar).  What do those abbreviations stand for?  Hint:  NEUT does stand for neutral, but V doesn't stand for verb, and SUB doesn't stand for subject.  Even I do double-takes on STR, which in the circles I move in would ordinarily stand for string (but not in this gloss).  Linguistics documents tend to have a list somewhere of abbreviations used; and that's awful for conlanging, because memorizing those stupid abbreviations is one more obstacle to learning about the language, which is the end goal.  Your target audience for a conlang description isn't limited to professional linguists who consider learning a whole new system of analysis for this new language a more-or-less normal part of their profession.  I actually regretted, after the fact, having indulged common practice by introducing gloss abbreviations; the above should probably have been something more like
 losutu   li   susua 
the sometime-speaker sleeps

It would also be great for non-linguists if everything had sound files attached so you could hear what all the examples are supposed to sound like; but that is so difficult (it's not uncommon to create a conlang you can't yourself pronounce) one never hears complaints when it isn't done — just praise when it is well done.


Almost all conlangs have reference grammars; it's hard to define a conlang without a grammar, although I recall hearing of a collaborative project in which people proposed sentences in the language and the community voted on which ones to allow.  However, defining a conlang via a reference grammar has, in my experience fwiw, three major drawbacks to keep in mind.

First, grammars as used for natlangs are, or should be, descriptive rather than prescriptive.  The classificatory schemes of such grammars are analytic rather than synthetic:  even though a reference grammar is meant to describe the language (whereas a pedagogical grammar teaches its use), still it seeks to tell how the language works rather than why.  This is close to the point mentioned earlier about the superficiality of ergativity.  If such classifications are indeed superficial, someone carefully studying a natlang based on the reference grammar may still manage to glom on to the deeper nature of the language by using the grammar as a jumping-off point; but a conlang defined thus superficially may actually lack that deeper nature that its author, if pursuing naturalism, would like to have.

Second, grammars may tend to make languages seem much more orderly than they are in practice, especially spoken practice.  Anyone who has transcribed an oral presentation is aware of this:  we speak haltingly, with all sorts of false starts, typically with lots of low-level discourse particles apparently serving to help the listener navigate through the speaker's on-the-fly revisions in assembling their syntax (in English, um and er might be used for this, varying by dialect).  Perhaps this would be mitigated in fiction, both written and on-screen, because dialog would tend to be somewhat unrealistically orderly anyway for the sake of exposition; but even so, fully explaining a paragraph of fluently written English (or whatever is the primary natlang of the work) tends to involve some remarkably esoteric grammatical gymnastics.

Third and likely most problematic, grammars tend to describe things in terms of a standard set of principles for organizing sentences and clauses.  Why is that a problem?  Because very often the organization of sentences and clauses isn't the point, but merely a tool used in service of some other purpose — often a purpose that can only be properly appreciated by studying longer narratives and discourses.

Consider verb–argument alignment.  The point, apparently, is to specify which arguments to a verb play what part in the verb.  Most commonly you have nominative-accusative alignment, where the agent of a transitive verb is marked the same as the subject of an intransitive verb, while the transitive patient is marked differently.  Alternatively there's ergative-absolutive alignment, where transitive patient is marked as intransitive subject while transitive agent is marked differently.  Why this difference matters may vary from language to language, but often has partly to do with the way verbs are valency-reduced — as with the passive voice in nominative-accusative languages, anti-passive in ergative-absolutive — when fitting together larger narratives.  But then there's Tagalog, the dominant language of the Philippines.  Tagalog uses Austronesian alignment, which means there's marking on the verb indicating what system is used to mark the arguments, with a variety of options available — and the reason for doing so has to do with specificity.  That's right, specificity.  How specific a noun is.  Don't expect to get a really good explanation of how this works in Tagalog, either, because as best I can tell, as of now the linguistic community hasn't altogether figured it out yet.

What does seem clear to me, though, is that alignment in Tagalog is a particular instance where understanding what the feature is being used for is essential to making sense out of it (linguistically, at least; there's no need to consciously understand it in order to use it, otherwise Tagalog native speakers would simply explain it to the linguists and it would all be settled).  That is, figuring out which argument does what in the verb isn't really the "point" of alignment, it's just the mechanical act we use to delineate which facets of the language we've chosen to put in the box labeled "alignment".  I suspect some degree of the same effect applies to many of these seemingly low-level language features:  their apparent mechanical operation isn't necessarily their primary purpose, and their primary purpose may be impossible to see clearly until one gets up to the narrative scale.  (Appeal to the narrative scale is a recurring theme in the Conlangery Podcast, btw, even though they've never actually devoted a podcast to it, as such.)

What we're facing here is the difference between the "universal" — Chomskyist — approach to grammar, and the "functional" approach.  In essence, the universal approach supposes that grammar is hardwired into the human brain, so that the shape of language is determined by that wiring, whereas the functional approach supposes that grammar is shaped by the functions people want language to serve.  Whatever you think of the philosophies underlying these positions, there's a bunch of things going on in language that tend to get missed by the usual grammatical approach, and the functional approach seems like it ought to provide those missing elements.  Except, apparently it doesn't.  Functional grammar advocates tend to take the principle to its extreme and look at everything from the narrative perspective.  Whatever you think of that philosophically, it doesn't seem very useful for conlanging since it doesn't provide local structure — less a problem for functional grammarians than conlangers, thanks again to the descriptive/prescriptive distinction.

Ideal for conlanging — or so it seems to this conlanger — would be a modified variant form of traditional grammars, with comparable degree of local structure but free of traditional assumptions about purpose and mechanical classification, avoiding implicit bias toward conjectured universal restrictions (conlangs don't have to obey any rules hardwired into the human brain, if there are any, and some conlangs deliberately try not to);  and with high-level functionality integrated into it in a way that naturally meshes with the more low-level-mechanical facilities.  Both universal and functional grammarians would seem to have philosophical reasons for not trying to construct a grammar like that; but conlangers should have no such disincentive.


It's easy to write software that forces its human users to fit what they're doing into a rigid structure.  Doing so can have negative consequences for society, as it reduces users' opportunities to do what human sapience is actually good at; but to the current point, limiting the user's options is definitely counterproductive when what you're trying to support is creative artlanging.  It's not so easy to write software that supports the best of what humans can do (creative thinking) together with the best of what computers can do (rote memory and purely mechanical manipulation).  That sort of software must by nature be general-purpose and inherently flexible.  Two of the best artifacts we have so far in this genre — TeX and wikis — are both plausible candidates to support conlanging, but they both have flaws for the purpose.  In brief, TeX is overly technical, wikis are computationally weak.

What we need is the sort of simple basic elements one finds in a wiki;  with TeX-like computational power available at need — if the opportunity arises I'd like to try a fexpr-based extension strategy rather than the macro-based strategy shared by TeX and wikis;  and an interactive element with enough power to let the computer provide advice and mechanical aid while aggressively preserving the user's flexibility of form as well as substance.  Is all that even possible?  I suspect so.  The limiting factor isn't computational power (we've likely had enough of that since before the word "conlang" was coined) or brute force programming power (throwing more people at a project makes it take longer), but finessing the design of the software.  I've got a few lines I'm pursuing, some of which I've been pursuing for decades; note that I've named just two artifacts distributed across a half century of development, an average of less than one every other decade.  Patience is indicated.


Jesse Bangs's notion in the Artlanger's Rant was that Naturalism — which he defined as valuing a naturalistic, complex, creative conlang — would be a "school" of conlanging.  There's some history there.  The conlanging community that had formed around the CONLANG mailing list in 1991 underwent a schism in 1993, with a subgroup leaving to form the AUXLANG mailing list.  You can't very well use an auxlang to unite the world if you can't get everyone to agree on which auxlang is best for that purpose, so it seems natural that the auxlangers were inclined to advocate their favorite auxlang; after the schism, the CONLANG mailing list adopted a policy against language advocacy (though afaict auxlangers who respect the advocacy ban are more encouraged than required to use the other list).  When the Artlanger's Rant was written, that schism was still recent community history, and thinking of conlanging in terms of that sort of big goal would have been pretty natural (irony not intended).

I suspect Bangs's proposed definition of a Naturalist school may be a touch heavy on goals and light on means.  He does mention complexity/completeness, which is about content; and creativity, i.e. not imitating your own native language, which is sort of a technical criterion.  But I'm thinking of the nature of the description; like, the form of the reference grammar.  If you're trying for something that evokes naturalism using a very traditional and rather sketchy grammar, might you end up with a sort-of cubist conlang?  It seems that might modify the notion of completeness, shifting emphasis to a more abstract side of naturalism; not sure quite how that would work, but the nature of cubism suggests something of the sort; while on the purely technical side it might give a sense of simplification.  Perhaps some other non-traditional form of grammar would produce a sort-of impressionist conlang.  The goal of course is to nurture the conlangers' art, not to imitate schools of painting; but I do take from these pseudo-examples an encouragement to explore different forms of grammatical description.

What about the next part of Peterson's question?  Can a conlang convey a powerful message like that of Guernica?  Not to overplay the analogy, but can a conlang come across to its audience as anti-war?  Can it carry profound ideas, or strong emotions?  By what vector?  The mere phonaesthetics of a conlang surely can't carry so much, though it can presumably contribute to it.  Perhaps more can be be carried, for those who learn about them, by the vocabulary of the language, and the ideas/attitudes/ways of thinking that the language fosters.  Improved lucidity of language description for non-linguists would seem to help here; and ideas/attitudes might be affected less by traditional, relatively low-level mechanical considerations than by the sort of narrative-level things that functional linguists emphasize.  At any rate, I suspect that making some headway on the first round of challenges I've been discussing here will put us in a better position to tackle this second round of more abstract, expressive challenges.

Thursday, October 1, 2015

Natural intelligence

[...] man had always assumed that he was more intelligent than dolphins because he had achieved so much — the wheel, New York, wars and so on — while all the dolphins had ever done was muck about in the water having a good time.  But conversely, the dolphins had always believed that they were far more intelligent than man — for precisely the same reasons.
— Douglas Adams, The Hitchhiker's Guide to the Galaxy, Chapter 23.

I have a few things I want to say here about the nature of natural, as opposed to artificial, intelligence.  While I'm (as usual) open to all sorts of possibilities, the theory of mind I primarily favor atm has a couple of characteristics that I see are contrary to prominent current lines of thought, so I want to try to explain where I'm coming from, and perhaps something of why.

The sapience engine
The sapience engine

In a nutshell, my theory is that the (correctly functioning) human brain is a sapience engine — a device that manipulates information (or, stored and incoming signals, if you prefer to put it so) in a peculiar way that gives rise to a sapient entity.  The property of sapience itself is a characteristic that arises from the peculiar nature of the manipulation being done; the resulting entity is sapient not because of how much information it processes, but because of how it processes it.

This rather simple theory has some interesting consequences.


There's an idea enjoying some popularity in AI research these days, that intelligence is just a matter of scale.  (Ray Dillinger articulated this view rather well, imho, in a recent blog post, "How Smart is a Smart AI?".)  I can see at least four reasons this view has appeal in the current intellectual climate. 

  1. Doing computation on a bigger and bigger scale is what we know how to do.  Compounding this, those who pursue such a technique are rewarded, both financially and psychologically, for enthusing about the great potential of what they're pursuing.  And of course, what we know how to do doesn't get serious competition for attention, because the alternative is stuff we don't know how to do and that doesn't play nearly as well.  Better still, by ascribing full sapience to a range of computational power we haven't achieved yet, we absolve ourselves of blame for having not yet achieved full AI.  Notwithstanding that just because we know how to do it doesn't mean it has to be able to accomplish what we want it to.

  2. The more complex a computer program gets — the more branching possibilities it encompasses and the bigger the database it draws on — the more effectively it can fool us into seeing its behavior as "like us".  (Ray Dillinger's post discusses this.)

  3. The idea that sapience is just a matter of scale appeals to a certain self-loathing undercurrent in modern popular culture.  It seems to have become very trendy to praise the cleverness of other species, or of evolution itself, and emphasize how insignificant we supposedly are; in its extreme form this leads to the view that there's nothing at all special about us, we're just another species occupying an ecological niche no more "special" than the niches of various kinds of ants etc.  (My subjective impression is that this trend correlates with environmentalism, though even if the correlation is real I'm very wary of reading too much into it.  I observe the trend in Derek Bickerton's Adam's Tongue, 2009, which I wouldn't otherwise particularly associate with environmentalism.)

  4. The "scale" idea also gets support from residual elements of the previously popular, opposing view of Homo sapiens as special.  Some extraordinary claims are made about what a vast amount of computing power is supposedly possessed by the human brain — evidently supposing that the human brain is actually doing computation, exploiting the computational potential of its many billions of neurons in a computationally efficient way.  As opposed to, say, squandering that computational potential in an almost inconceivably inefficient way in order to do something that qualitatively isn't computation.  The computational-brain idea also plays on the old "big brain" idea in human evolution, which supposed that the reason we're so vastly smarter than other primates is that our brains are so much bigger.  (Terrance Deacon in The Symbolic Species, 1997, debunking the traditional big-brain idea at length, notes its appeal to the simplistic notion of the human brain as a computer.)

I do think scale matters, but I suspect its role is essentially catalytic (Deacon also expresses a catalytic view); and, moreover, I suspect that beyond a certain point, bigger starts to degrade sapience rather than enhance it.  I see scale coming into play in two respects.  As sketched in my previous post on the subject (here), I conjecture the key device is a non-Cartesian theater, essentially short-term memory.  There are two obvious size parameters for adjusting this model:  the size of the theater, and the size of the audience.  I suspect that with too small an audience, the resultant entity lacks efficacy, while with too large an audience, it lacks coherence.  Something similar seems likely to apply to theater size; I don't think the classic "seven plus or minus two" size of human short-term memory is at all arbitrary, nor strongly dependent on other constraints of our wetware (such as audience size).

Note that coherent groups of humans, though they represent collectively a good deal more computational potential than a single human, are generally a lot stupider.  Committees — though they can sometimes produce quite good results when well-chaired — are notorious for their poor collective skills; "design by committee" is a running joke.  A mob is well-described as a mindless beast.  Democracy succeeds not because it necessarily produces brilliant results but because it resists the horrors of more purely centralized forms of government.  Wikipedia, the most spectacular effort to date to harness the wisdom of the masses, rather thoroughly lacks wisdom, being prone to the vices of both committees and mobs.  (Do I hate Wikipedia?  No.  I approve deeply of some of the effects it has had on the world, while deploring others; that's a complicated subject for another time.)  One might suppose that somehow the individual people in a group are acting a bit like neurons (or some mildly larger unit of brain structure), and one would need a really big group of people before intelligence would start to reemerge, but honestly I doubt it.  Once you get past a "group of one", the potential intelligence of a group of people seems to max out at a well-run committee of about six, and I see no reason to think it'll somehow magically reemerge later.  Six, remember, is roughly the size of short term memory, and I've both wondered myself, and heard others wonder, if this is because the individuals on the committee each have short term memories of about that size; but as an alternative, I wonder if, just possibly, the optimal committee size is not so much an echo of the size of the non-Cartesian theater as a second example of the same deep phenomenon that led the non-Cartesian theater to have that size in the first place.


As might be guessed from the above, since my last blog post on this subject I've been reading Derek Bickerton's Adam's Tongue (2009) and Terrance Deacon's The Symbolic Species (1997, which was recommended to me by a commenter on my earlier post).  Both have a fair amount to say about Noam Chomsky, mostly in the nature of disagreement with Chomsky's notion of a universal language instinct hardwired into the brain.

But it struck me, repeatedly throughout both books, that despite Deacon's disagreements with Chomsky and Bickerton's disagreements with Deacon and Chomsky, all three were in agreement that communication is the essence of the human niche, and sapience is an adjunct to it.  I wondered why they thought that, other perhaps than two of them being linguists and therefore inclined to see their own subject in whatever they look at (which could as well explain why I look at the same things and see an algorithm).  Because I don't altogether buy into that linguistic assumption.  They seem to be dismissing a possibility that imho is worth keeping in play for now.

There's a word I picked up from Deacon:  exaptation.  Contrasting with adaptation by changing prefix ad- to ex-.  The idea is that instead of a species feature developing as an adaptation for a purpose that the species finds beneficial, the feature develops for some other purpose and then, once available, gets exapted for a different purpose.  The classic example is feathers, which are so strongly associated with flight now, that it's surprising to find they were apparently exapted to that purpose after starting as an adaptation for something else (likely, for insulation).

So, here's my thought.  I've already suggested, in my earlier post, that language is not necessary to sapient thought, though it does often facilitate it and should naturally arise as a consequence of it.  What if sapience was exapted for language after originally developing for some other purpose?

For me, the central question for the evolution of human sapience is why it hadn't happened before.  One possible answer is, of course, that it had happened before.  I'm inclined to think not, though.  Why not?  Because we're leaving a heck of a big mark on the planet.  I'm inclined to doubt that some other sapient species would have been less capable or, collectively, more wise; so it really seems likely to me that if this had happened before we might have noticed.  (Could it have happened before without our noticing?  Yes, but as I see it Occam's Razor doesn't favor that scenario.)

To elaborate this idea — exaptation of sapience for language — and to put it into perspective with the alternatives suggested by Deacon and Bickerton, I'll need to take a closer look at how an evolutionary path might happen to be extremely unlikely.


Evolution works by local search in the space of possible genetic configurations:  imagining a drastically different design is something a sapient being might do, not something evolution would do.  At any given point in the process, there has to be a currently extant configuration from which a small evolutionary step can reach another successful configuration.  Why might a possible target configuration (or a family of related ones, such as "ways of achieving sapience") be unlikely to happen?  Two obvious contributing factors would be:

  • the target is only successful under certain external conditions that rarely hold, so that most of time, even if an extant species were within a short evolutionary step of the target, it wouldn't take that step because it wouldn't be advantageous to do so.

  • there are few other configurations in the close neighborhood of the target, so that it's unlikely for any extant species to come within a short evolutionary step of the target.

To produce an unusually unlikely evolutionary target, it seems we should combine both of these sorts of factors.  So suppose we have an evolutionary target that can only be reached from a few nearby configurations, and consider how a species would get to one of these nearby configurations.  The species must have gotten there by taking a small evolutionary step that was advantageous at the time, right?  Okay.  I submit that, if the target is really to be hard to get to, we should expect this small step to a nearby configuration to have a different motivation than the last step to the target.  This is because, if these two consecutive steps were taken for the same reason, then at any time a species takes the penultimate step, at that same time the last step should also be viable.  That would cancel out our supposition that the target is only occasionally viable:  any time it's viable to get to the nearby configuration, it's also viable to continue on to the target.  In which case, for the target to be really difficult to get to, its nearby neighbors would have to be difficult to get to, merely pushing the problem of unlikelihood back from the target to its nearby neighbors, and we start over with asking why its neighbors are unlikely.

In other words, in order for the evolutionary target to be especially unlikely, we should expect it to be reached by an exaptation of something from another purpose.

(I'm actually unclear on whether or not feathers may be an example of this effect.  Clearly they aren't necessary for flight, witness flying insects and bats; but without considerably more biological flight expertise, I couldn't say whether there are technical characteristics of feathered flight that have not been achieved by any other means.)


This is one reason I'm doubtful of explaining the rarity of sapience while concentrating on communication.  Lots of species communicate; so if sapience were an adaptation for communication, why would it be rare?  Bickerton's book proposes a specific niche calling for enhanced communication:  high-end scavenging, where bands of humans must cooperate to scavenge the remains of dead megafauna.  Possible, yes; but I don't feel compelled by it.  The purpose — the niche — doesn't seem all that unlikely to me.

Deacon's book proposes a more subtle, though seemingly closely related, purpose.  Though less specific about the exact nature of the niche being occupied — which could be high-end scavenging, or perhaps group hunting — he suggests that in order to exploit the niche, hominins had to work together in large bands containing multiple mated pairs.  This is tricky, he says, because in order for these group food-collection expeditions to be of major benefit to the species, those who go on the expeditions must find it in their own genetic self-interest to share the collected food with the stay-at-home nurturers and young.  He discusses different ways to bring about the required self-interest motive; but evolution works, remember, in small steps, so not all of these strategies would be available for our ancestors.  He suggests that the strategy they adopted for the purpose — the strategy, we may suppose, that was reachable by a small evolutionary step — was to have each mated pair enter into a social contract, essentially a marriage arrangement, in which the female agrees to mate only with a particular male in exchange for receiving food from that male's share.  The arrangement holds together so long as they believe each other to be following the rules, and this requires intense communication between them plus sophisticated reasoning about each other's future behavior.

I do find (to the extent I understand them) Deacon's scenario somewhat more plausible than Bickerton's, in that it seems to provide more support for unlikelihood.  Under Bickerton, a species tries to exploit a high-end scavenging niche, and the available solution to the coordination problem is proto-language.  (He describes various other coordination techniques employed by bees and ants.)  Under Deacon, a species tries to exploit a high-end scavenging-or-hunting niche, and the available solution to the cooperation problem is a social contract supported by symbolic thought.  In either scenario, the species is presented with an opportunity that it can only exploit with an adaptation.  For this to support unlikelihood, the adaptation has to be something that under most circumstances would not have been the easiest small-step solution to the challenge.  Under Bickerton, the configuration of the species must make proto-language the closest available solution to the coordination problem.  Under Deacon, the configuration of the species must make symbolic thinking the closest available solution to the cooperation problem.  This is the sense in which, as I said, I find Deacon's scenario somewhat more plausible.

However, both scenarios seem to me to be missing something important.  Both of them are centrally concerned with identifying a use (coordination per Bickerton, cooperation per Deacon) to which the new feature is to be put:  they seek to explain the purpose of an adaptation.  By my reasoning above, though, either the target of this adaptation should be something that's almost never the best solution for the problem, or the target should only be reachable if, at the moment it's wanted, some unlikely catalyzing factor is already present in the species (thus, available for exaptation).  Or, of course, both.

From our end of human evolution, it seems that sapience is pretty much infinitely versatile, and so ought to be a useful adaptation for a wide variety of purposes.  While this may be so, when conjecturing it one should keep in mind that if it is so, then sapience should be really difficult to achieve in the first place — because if it were both easy to achieve and useful for almost everything, one would expect it to be a very common development.  The more immediately useful it is once achieved, the more difficult we'd expect it to be to achieve in the first place.  I see two very plausible hypotheses here:

  • Sapience itself may be an adaptation for something other than communication.  My previous post exploring sapience as a phenomenon (here) already suggested that sapience once achieved would quickly be exapted for communication.  My previous posts regarding verbal culture (starting here) suggest that language, once acquired, may take some time (say, a few million years) to develop into a suitable medium for rapid technological development; so the big payoff we perceive from sapience would itself be a delayed exaptation of language, not contributing to its initial motivation.  Deacon suggests there are significant costs to sapience, so that its initial adoption has to have a strong immediate benefit.

  • Sapience may require, for its initial emergence, exaptation of some other relatively unlikely internal feature of the mind.  This calls for some deep mulling over, because we don't have at all a firm grasp of the internal construction of sapience; we're actually hoping for clues to the internal construction from studying the evolutionary process, which is what we're being offered here if we can puzzle it out.

Putting these pieces together, I envision a three-step sequence.  First, hominin minds develop some internal feature that can be exapted for sapience if that becomes sufficiently advantageous to overcome its costs.  Second, an opportunity opens up, whereby hominin communities have a lot to gain from group food-collection (be it scavenging or hunting), but to make it work requires sophisticated thinking about future behavior, leading to development of sapience.  The juxtaposition of these first two steps being the prime source of unlikelihood.  I place no specific requirement on how sapience is applied to the problem; I merely suppose that sapience (symbolic thinking, as Deacon puts it) makes individuals more able to realize that cooperating is in their own self-interest, and doing so is sufficiently advantageous to outweigh the costs of sapience, therefore genes for sapience come to dominate the gene pool.  Third, as sapience becomes sufficiently ubiquitous in the population, it is naturally exapted for language, which then further plays into the group cooperation niche as well as synergizing with sapience more broadly.  At this point, I think, the process takes on an internal momentum; over time, our ancestors increasingly exploit the language niche, becoming highly optimized for it, and the benefits of language continue to build until they reach critical mass with the neolithic revolution.