The relativity-quantum split
<blockquote>
There may be said to be two classes of people in the world; those who constantly divide the people of the world into two classes, and those who do not.
<blockquote>
— <a href="https://en.wikiquote.org/wiki/Robert_Benchley">Robert Benchley</a>, <a href="https://en.wikipedia.org/wiki/Vanity_Fair_(magazine)"><i>Vanity Fair</i></a>, February 1920.
</blockquote>
</blockquote>
<p>
In this post I suggest, and explore, an alternative way of thinking about the relationship between relativity and quantum mechanics. This is part of my broad program of shaking up our thinking with alternative approaches to basic physics, but not part of my ongoing series, within that broad program, exploring <a href="https://fexpr.blogspot.com/2019/05/rewriting-and-emergent-physics.html">term-rewriting physics</a>; the idea here and the term-rewriting theme might be combinable, but, going into this, I see no obvious need for that to be possible, nor any obvious way for it to work.
</p>
<p>
Two of the deepest, most elemental mysteries of modern physics are (1) why gravity differs from the other fundamental forces, and (2) why the two great theories of post-classical basic physics, relativity and quantum mechanics, are so profoundly incompatible with each other. These two mysteries are evidently related to each other, focusing on different parts of the <a href="https://en.wikipedia.org/wiki/Blind_men_and_an_elephant">same elephant</a>. The point in both cases is to reconcile the incompatibilities: to unify gravity with the other forces, and to unify relativity with quantum mechanics. My long-running exploration of term-rewriting calculi for basic physics focuses on the first question, of gravity versus the other forces, starting from an observed structural similarity between term-rewriting calculi and the basic forces. This post focuses on the second question, of the rift between the two great theories, starting from a thought experiment I conjured for a post years ago when explaining the deep connection between special relativity and locality (<a href="https://fexpr.blogspot.com/2012/12/metaclassical-physics.html#sec-mcphys-saga">here</a>).
</p>
<p>
In past posts I've mentioned several reasons for accumulating a repertory of alternative approaches to a subject, even if the alternatives are not mutually compatible. One may draw on such a repertory for spare parts to use when assembling a new theory, and even spare parts not directly used may provide inspiration (<a href="https://fexpr.blogspot.com/2015/06/thinking-outside-quantum-box.html#sec-qt-how">here</a>). One may use alternative ideas as anchors when wading into an established paradigm that is easy to get trapped in, the better to tow oneself out again after acquiring whatever one waded in to learn: that is, the more alternative approaches one knows about going in, the less likely that during close contact with the paradigm one will be seduced into imagining the paradigm is the only way things can be done (<a href="https://fexpr.blogspot.com/2015/06/thinking-outside-quantum-box.html#sec-qt-meme">there</a>). One may use alternative ideas to ensure that when one does choose to accept some facet of an established paradigm, it is an informed decision rather than made through not knowing of any other choice (<a href="https://fexpr.blogspot.com/2017/06/co-hygiene-and-quantum-gravity.html">over thar</a>). And, aside from all those things, I'm a great believer in unorthodoxy as a mental limbering-up exercise (numerous mentions scattered throughout this blog).
</p>
<p>
This kind of repertory-accumulation calls for a sort of skeptical open-mindedness that doesn't arise in the normal practice of paradigm science, and engages hazards normal science doesn't have to cope with (hazards on both sides; pro and con, open-mindedness and skepticism). The essential power of paradigm science, per Kuhn's <a href="http://en.wikipedia.org/wiki/The_Structure_of_Scientific_Revolutions"><i>Structure of Scientific Revolutions</i></a>, is that researchers can throw themselves wholeheartedly into exploring the potential of the paradigm <i>because</i> they have no distractions by any alternative to the paradigm: all rivals have been ruthlessly suppressed. The ruthless suppression means, though, that the mainstream view of alternative approaches is distorted not only by the perspective mismatch between different paradigms, but by the mainstream deliberately heaping ridicule on its rivals. (I've been especially aware of this in my <a href="https://fexpr.blogspot.com/2016/08/interpreted-programming-languages.html">dissertation area</a>.) Alternative-science researchers often add further murk, not only by being sometimes under-cautious themselves, but also through their <i>style</i> of presentation. I recall once studying (briefly) a fairly impressive alternative-science book whose main quirk, that I noticed at the time, was a casually asserted conviction by the author that their mentor's research, which they were apparently quite competently expanding upon, had been arbitrarily suppressed by mainstream scientists who saw it as a threat to their employment. Yikes. This part of the author's presentation really did sound very crackpot-fringe, except that it was quite bizarrely off-kilter from the classic crackpot rant — because it wasn't a rant, but calmly presented as pedestrian fact. I eventually concluded this was likely explained by another part of Kuhn's account of a paradigm: the paradigm provides a model of what valid research in the field should look like. This researcher's mentor likely <i>had</i> ranted that way, and they simply took the claims on-board. Which I find a bit mind-blowing; but my immediate point is that one has to take alternative-science works one-at-a-time, expecting to encounter varying degrees and flavors of <a href="https://en.wiktionary.org/wiki/psychoceramic">psychoceramics</a> that may call for customized handling: what to filter out, what to take seriously, and what, occasionally, to flat-out discard.
</p>
<p>
At any rate, the current post operates, as one might hope to find in explorations of this sort, on two levels: it explores a specific approach, and in the process it gathers insights into the larger space of possible approaches that this specific approach belongs to. Honestly, going into this with the specific approach in mind, I didn't foresee the extent of the larger insights coming out of it. A single alternative hypothesis, such as term-rewriting physics, breaks out of the conventional mold but, with only two data points, doesn't offer much of a view of the broader range of possibilities, really only occasional glimpses; but two alternative hypotheses —or even better, more than two; the more the merrier, presuming they're very different from each other— allow some significant triangulation, for a stereoscopic view of the terrain. I do mean to give the specific hypothesis here its full measure of attention while I'm about it; but the general insights offer quite an impressive vista, and I'll be studying the <a href="https://en.wiktionary.org/wiki/viewshed">viewshed</a> extensively as I go. Or, in a more navigational metaphor, this is going to be something of a wild ride through the space of possible theories, with tangent paths densely packed along the way, shooting off in all directions; the contrast with my past, relatively much tamer explorations came as, candidly, something of a shock to me.
</p>
<p>
Rather a heavy jolt, in fact. With all those arguments in favor of exploring many different approaches to a problem, this time the range of possibilities has come roaring in on me and I'm experiencing a potential downside of the technique. Alternative scientists have this in common with mainstream scientists: when it comes time to pursue their chosen paradigm in-depth, they need some way to put on blinders and ignore the clamor of alternatives. There's one set of ways the mainstream blithely dismisses fringe theories, another set by which scientists working on the fringes blithely dismiss mainstream theories, because both groups <i>have</i> to do this in order to get any in-depth work done. The parallels go deeper, I'm observing; each group, in order to eliminate distractions without exhausting themselves, resorts to shortcut techniques that the other group criticizes with some justice — perhaps worth a blog post in itself, if I can assemble one. But, on the other hand, if you ever (as I have been trying to do) begin to see the full range of possible theories, it can be overwhelming. I find myself inadequately prepared, and in need of some novel strategies to cope.
</p>
<p>
A side issue, btw, on which I admit to some ambivalence, has been the role of detailed math in my exploratory posts on alternative physics. On one hand it seems highly advisable not to get lost in details when looking at multiple paradigms that differ in much higher-level, structural ways. On the other hand, I'm anxious to get down to <a href="https://en.wiktionary.org/wiki/get_down_to_brass_tacks#Etymology">brass tacks</a>. Sometimes inspiration starts close to the detailed math; as in my post on metaclassical physics, which started with an algorithm for generating a probability distribution. My long-running series on co-hygiene started with a high-level structural similarity, lacking low-level details, between the mathematics of term-rewriting calculi and the basic forces in physics. Whether a structural inspiration advances much beyond the inspirational stage depends on someone furnishing it with some solid mathematical details; string theory, for example, emerged in the early 1970s when an approach to quantum mechanics that had been kicking about since <a href="https://en.wikipedia.org/wiki/Werner_Heisenberg">Werner Heisenberg</a> proposed it about three decades earlier (roughly, that in describing elementary particles one should abandon space and time entirely, treating a particle as a black box with inputs and outputs) <a href="https://en.wikipedia.org/wiki/History_of_string_theory">merged</a> with some mathematics for the description of vibrating strings. An extended theory-development process, spanning years or decades, is likely to be punctuated by a series of upward steps in mathematical specifics, perhaps with long intervals between. The nascent idea of the current post, like my term-rewriting approach to physics, is in the market for a further infusion of low-level math to give it clearer form.
</p>
<p>
This post does not have a punchline. Conclusions reached are mostly about possible paths that might be worth further exploration; and there are lots of those, identifying which feels in itself worthwhile; but no particular path is chased down to its end here, not even within the specific central hypothesis of the post. Expect here to find questions, rather than answers.
</p>
<blockquote>
<b>Contents</b><br>
<a href="#sec-rqs-look">What things look like</a><br>
<a href="#sec-rqs-two">Why two theories</a><br>
<a href="#sec-rqs-slow">Slow things</a><br>
<a href="#sec-rqs-target">Target</a><br>
<a href="#sec-rqs-third">Third law</a><br>
<a href="#sec-rqs-second">Second law</a><br>
<a href="#sec-rqs-len">Length contraction</a><br>
<a href="#sec-rqs-shock">Shockwaves</a><br>
<a href="#sec-rqs-geo">Geometry</a><br>
<a href="#sec-rqs-field">Fields</a><br>
<a href="#sec-rqs-where">Where to go from here</a><br>
</blockquote>
<span style="font-size: large;" id="sec-rqs-look">What things look like</span>
<p>
To begin slowly, here's a simple and important point: the apparent "shape" of any principle in physics depends not only on the underlying mathematical model of what is happening, but on how one interprets it. No, my demonstration of this point is not about the interpretation of quantum mechanics; it's about the interpretation of <i>relativity</i>.
</p>
<p>
I submit that relativity <i>does not</i> have to be understood as preventing us from traveling faster than light; within a fairly reasonable set of assumptions, it can be understood as allowing travel at arbitrarily high speeds but shaping what it looks like and its consequences. To show this, I propose the following thought experiment.
</p>
<p>
Let us suppose we have what amounts to a giant ruler stretching, say, a thousand light-years; of negligible mass, so we won't have to worry about that detail; marked off in light-seconds, light-minutes, light-hours, and so on. We start at one end of the ruler, and we wish to travel very fast along the length of the ruler. What happens as we try to do that?
</p>
<p>
According to the usual interpretation of relativity, the faster we go, the more our clock slows down, preventing us from ever actually reaching the speed of light. A stationary observer —stationary with respect to the ruler, that is— sees our clock slowing down so that although we may continue to accelerate, the effort we put in is spread out over a longer and longer time and we never actually reach light-speed. However, suppose we choose to define our velocity in terms of the ticks of our clock and the markings on the ruler. Keep in mind, this is purely a matter of interpretation: a change in which ratio of numbers we choose to <i>call</i> "velocity". As we continue to increase our speed along the length of the ruler, the ruler appears —to us— to contract (although, to the stationary observer, <i>we</i> contract along our direction of motion). If we keep speeding up, after a while we'll be passing a light-second-marker on the ruler once each second-on-our-clock; what the stationary observer sees is that we travel one light-second along the ruler in somewhat more than one second, but our clock is now running somewhat more slowly so that it only advances by one second during that time. Continuing to accelerate, we can reach a point where we're passing <i>ten</i> such marks on the ruler for each tick of our clock; to us it appears the ruler has greatly contracted, while to a stationary observer it appears our clock has greatly slowed so that we pass ten light-second-marks of the ruler while our clock only advances by one second. According to our chosen interpretation, this means we are traveling at ten times the speed of light. Supposing we have the means to accelerate enough (a technological detail we're stipulating to), we can reach a "velocity" of a hundred times the speed of light, or a thousand, or whatever other (finite) goal we choose. Granted, in conventional terms it still appears to the stationary observer that our clock is slowing down and we stay below the speed of light; while to us they appear to contract in our direction of travel, and their clocks slow down behind us or speed up ahead of us (afaics; some of these details are usually omitted from simplified descriptions of the thought experiment). When we reach the other end of the ruler and slow to a stop, more than a thousand years will have elapsed for the stationary observer; while to us the elapsed time will have been much shorter, and the extensive aging of stationary observers will have been, under our interpretation, a strange consequence of our rapid travel.
</p>
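<p>
For concreteness, here's a minimal numerical sketch of the arithmetic behind this reinterpretation (the code and its names are mine, purely for illustration; the quantity we're choosing to call "velocity", ruler marks passed per tick of our own clock, is what the literature calls proper velocity or celerity, γ<i>v</i>):
</p>
<pre>
import math

def coordinate_speed(w):
    """Coordinate speed v (ruler frame) for proper velocity w = dx/dtau.

    From w = gamma * v with gamma = 1/sqrt(1 - v^2), in units of c:
    v = w / sqrt(1 + w^2).
    """
    return w / math.sqrt(1.0 + w * w)

# "Traveling at ten times the speed of light" in the post's sense means
# passing ten light-second marks per second of our own clock: w = 10.
for w in [1.0, 10.0, 100.0, 1000.0]:
    v = coordinate_speed(w)
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    print(f"w = {w:7.1f}  ->  v = {v:.9f} c,  gamma = {gamma:9.3f}")

# However large w grows, v stays below 1: the stationary observer always
# sees sub-light-speed travel by an ever-slower clock, while we see ever
# more ruler marks per tick.
</pre>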
<p>
I am not, atm, in any way advocating this unconventional interpretation of relativity. I only draw attention to it as a demonstration that what the predictions of a mathematical model "look like" depends on how you choose to interpret them.
</p>
<span style="font-size: large;" id="sec-rqs-two">Why two theories</span>
<p>
Waves and particles have been alternative views of elementary physical phenomena at least since <a href="https://en.wikipedia.org/wiki/Wave%E2%80%93particle_duality#History">Descartes</a>; sometimes one is more advantageous, sometimes the other. Waves, of course, are fields in motion, an extraordinarily complex —one might argue, unboundedly complex— phenomenon whose complexity may, in itself, be a reason to sometimes simplify one's view of things by considering particles. There is also a natural drive to use particles when the physical effect under consideration is inherently discrete. In principle, it seems, one can get along <i>almost</i> exclusively with either one, but then one has to introduce just a bit of the other. J.S. Bell noted, especially, that quantum mechanics describes everything exclusively with waves up until it becomes necessary in practice to account for discrete events, at which point one arbitrarily interjects wave-function collapse. All of this is a form of the discrete/continuous balance, which I discussed in a previous post (<a href="https://fexpr.blogspot.com/2015/05/numberless-differentiation.html#sec-ndif-dcp">yonder</a>).
</p>
<p>
No, I <i>don't</i> propose here to view wave/particle duality (nor any of the other complementary pairs I've mentioned so far) as the origin of the split between quantum mechanics and relativity. But with all this setting the stage, I'm now ready to introduce my main theme for the post, which starts out, at least, as another simple thought experiment, not based on any sort of modern physics.
</p>
<p>
Consider a "classical" universe, with three Euclidean dimensions of space and one of time, and classical particles moving about in this space at arbitrary velocities. That is, let us suppose a universe with no curvature to its spacetime, and <i>no fields</i>.<!-- (This isolation of different types of elements of a theory will turn out to highlight, rather effectively, what these different elements can <i>do</i> for a theory.) -->
</p>
<p>
This scenario is deliberately just a sketch, of course, and an unavoidably incomplete one; its point is just the sort of simplification noted above as an advantage of particles. Without fields we have, following the usual sort of classical treatment, no action-at-a-distance at all, and our point-like particles flying about with no forces between them will continually miss each other and thus not affect each other. So, yes, our sketch is going to miss some things because it's just a sketch; but let's see whether there's something interesting we <i>can</i> get out of it. (If we didn't play around with various parts of a problem, rather than trying to take on everything at once, it's doubtful we'd make any progress at all.)
</p>
<p>
If it helps to have a more specific mathematical framework, here's a way to think of it; keeping in mind that, because this is just a stepping stone on our way to some unimagined theory we'd like to find, whatever degrees of freedom we build into our mathematical framework may be much less structurally radical than what we eventually end up changing, later on. That said: Our central starting premise is that the entire configuration of the universe is a set of particles with position, velocity, and presumably some other properties which we leave unspecified for now. How each particle behaves over time must then be some sort of function of the whole system, also not precisely determined for now but, broadly speaking, we expect each particle to travel most of the time in a more-or-less-straight line, and expect each particle to be significantly affected by other particles only when it comes close to them even though interactions won't actually require point-collisions (since, on the face of it, point-collisions should be infinitely unlikely). The two unknown elements of the scenario, which we suppose are small enough for us to learn something useful from the scenario even while initially blurring them out, are the other properties that particles have (conventionally mass, charge, etc.), and the way particles passing close enough to each other deflect each other's trajectories.
</p>
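<p>
As a minimal executable sketch of this framework (every name and signature here is mine, for illustration only, with the two unknown elements left as explicit placeholders):
</p>
<pre>
import math
from dataclasses import dataclass, field

@dataclass
class Particle:
    pos: tuple    # position in Euclidean 3-space
    vel: tuple    # velocity, with no upper bound imposed
    props: dict = field(default_factory=dict)   # unknown element 1:
                  # whatever other properties particles turn out to carry

def deflect(a, b):
    """Unknown element 2: how two particles passing near each other
    alter each other's trajectories. Any concrete rule slotted in here
    is a modeling choice, not something the sketch determines."""
    raise NotImplementedError

def step(universe, dt, interaction_range):
    """Advance the whole system: mostly straight-line motion, with
    deflection only on close approach (no point-collisions required)."""
    for a in universe:
        for b in universe:
            if a is b:
                continue
            if math.dist(a.pos, b.pos) > interaction_range:
                continue
            deflect(a, b)
    for p in universe:
        p.pos = tuple(x + v * dt for x, v in zip(p.pos, p.vel))
</pre>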
<p>
Suppose we are interested in what happens in some particular volume of space over some particular interval of time; as a for-instance, it could be a spherical volume of space with a radius of "one <a href="https://en.wikipedia.org/wiki/Foot_(unit)">foot</a>" —by which we'll mean a <a href="https://en.wikipedia.org/w/index.php?title=Light-nanosecond">light-nanosecond</a>, for which a foot is a standard (and fairly good) approximation— and an interval of one nanosecond.
</p>
<p>
If we had relativity, guaranteeing that particles travel no faster than the speed of light, we could safely assume that, for a particle to impinge on our volume-of-interest before the end of the time interval, the particle would have to have been <i>within one foot of the volume at the start of the interval</i>, since at the speed of light the particle could only travel one foot, one light-nanosecond, by the end of the interval. Even considering secondary effects with a series of particles affecting each other, no such chain of effects could impact the volume by the end of the interval if the chain started more than a foot away from the volume. This is what makes relativity a <i>local</i> theory, that there is no need to consider anything outside an initial sphere with a two-foot radius. In practice, the advantage isn't that we're <i>sure</i> we haven't omitted something within that two-foot radius from our description of the system, but rather that we can get away with blithely claiming there's nothing else there, and if we've taken pains to "control" any actual experiment, our claim will usually have been true, while on the rare occasions there really was something else there, we can usually throw out that data on grounds the experiment was insufficiently controlled.
</p>
<p>
However, in this thought experiment we're supposing a "classical" (i.e., Euclidean) universe, with no speed limit, so that in fact a particle, if it's moving fast enough, could be anywhere in the universe at the start of the interval and still pass through the volume-of-interest during the one nanosecond we're studying. (Yes, even if the universe were of infinite size, since the velocity is unbounded too.) It's still true, more or less, that not every particle in the universe can impinge on the volume-of-interest during the interval-of-interest. We can <i>mostly</i> disregard any particle moving too slowly to overcome its distance from the volume at the start of the interval — unless, of course, some distant slow particle gets deflected, and thus sped up, by some other particle, but we're largely disregarding such secondary effects for the nonce; and there's also the complication of whether a distant, fast particle is aimed in the wrong direction, which must also take into account its possible deflection <i>by</i> distant slow particles (or even by other fast particles), again bringing our unknown elements into play. But even after dismissing whichever of these cases we can actually dismiss, we still don't have any obvious, practical upper bound on how many particles might be involved, and we really don't have any specific foreknowledge of most of those particles exactly because they're arbitrarily far away to start with.
</p>
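<p>
The contrast between the two cases fits in one small predicate (a sketch under this post's assumptions; the function and its names are mine, and secondary deflections are ignored, as in the text):
</p>
<pre>
import math

C = 1.0   # speed of light: one foot (light-nanosecond) per nanosecond

def can_impinge(pos, speed, center, radius, interval, speed_cap=None):
    """Could a particle at pos, moving at the given speed in some
    unknown direction, reach the sphere (center, radius) within the
    time interval?"""
    if speed_cap is not None:
        speed = min(speed, speed_cap)
    gap = math.dist(pos, center) - radius    # distance to the volume
    return speed * interval >= gap

# With relativity's speed cap, a particle two feet from the volume
# cannot matter within one nanosecond, however fast it claims to be:
print(can_impinge((3.0, 0, 0), 5.0, (0, 0, 0), 1.0, 1.0, speed_cap=C))
# -> False

# Without the cap, the same particle can impinge; and since speed is
# unbounded, no starting distance whatever rules a particle out:
print(can_impinge((3.0, 0, 0), 5.0, (0, 0, 0), 1.0, 1.0))
# -> True
</pre>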
<p>
Now, here's a deceptively-innocent-sounding question. What does a particle moving faster than light <i>look like</i>? Or more to the point, what does a whole universe's worth of particles moving faster than light and potentially impinging on the volume-of-interest look like? Recall, from our earlier thought-experiment, that even under relativity, interpreting "what it looks like" can make a profound difference in how the theory is understood. Could it be, that the universe of fast particles looks like... a quantum wave function? After all, we surely can't account individually for each and every particle in the universe, nor even for just all the fast ones; so any description we produce of all those fast particles will be something like a probability distribution.
</p>
<p>
In fact, it seems we're really going to want to separate our picture of this situation into two parts: for the slow things, we'll want to assume we know specifically what those things are, whereas with the universe of fast things we can't get away with that assumption so we'll have to handle it probabilistically. In the traditional nineteenth-century approach to this sort of situation, we would still assume that we know about a few particular particles (perhaps even just one), and then we would summarize all our assumptions about the rest of the universe by positing some sort of space-filling field(s) — but we're then, typically, constrained to assume that these space-filling fields are propagating at no more than the speed of light, which may not be a workable assumption at all if the rest of the universe includes lots of stuff faster than that. Limiting field propagation to the speed of light is likely to be especially problematic when the <i>known particles</i> we're considering are themselves moving faster than the field propagates. Limited field propagation speed is, on the face of it, naturally suited to slow, local situations. Keeping in mind those still-pending unknown elements, it seems plausible we would develop two separate sets of theoretical machinery to handle these two cases, fast and slow: a local deterministic theory for when we're focusing exclusively on the slow stuff, and a non-local probabilistic theory for when we care more about the influence of the fast rest of the universe. And that's just what we do have: relativity and quantum mechanics. It's unsurprising, in this view, that the two theories wouldn't mix.
</p>
<p>
Our scenario is still just a sketch, needing to be fleshed out mathematically with those unknown elements; and we hope to do it in a way that fits reality and provides some sort of clue to unifying relativity and quantum mechanics. If all this is a right track rather than, say, a flight of fancy, it ought in principle to be part of the rhyming scheme of physics and should therefore, once we're onto it, flow more-or-less smoothly from the <a href="https://fexpr.blogspot.com/2012/12/metaclassical-physics.html#sec-mcphys-natstruct">natural structure</a> of things. These sorts of speculations tend (such is the impression I've picked up over the decades) to concern themselves with explaining why quantum mechanics emerges, but my instinct here tells me to start by looking instead to explain why <i>relativity</i> emerges. Where does all that bending geometry to stay below the speed of light come from? And for that matter, how does that particular speed come to play a distinguished role in things?
</p>
<span style="font-size: large;" id="sec-rqs-slow">Slow things</span>
<p>
If, in some to-be-determined refinement of our sketch, quantum mechanics focuses mainly on allowing for the collective weight of all the fast things, but disregards some unknown element(s) to do with the slow stuff, and relativity attends to the local unknown element(s) but relentlessly disregards the collective weight of the fast stuff, then <i>both</i> theories are approximations. I've been speculating on this blog for some time that quantum mechanics may be an approximation, which in itself has felt a bit odd since physicists brag, justifiably, about how <a href="https://en.wikipedia.org/wiki/Precision_tests_of_QED">very accurate QED</a> is. Relativity is also considered <a href="https://en.wikipedia.org/wiki/Tests_of_general_relativity">accurate</a>. Is it really credible to speculate that <i>both</i> are approximations?
</p>
<p>
Maybe. As an aid to thought we should perhaps ask, at this point, what we mean by "approximation"; considering what we were able to do, earlier, by fiddling with our definition of "velocity", surely some careful thought on "approximation" wouldn't be out of order. For example, on the slow-and-local side of things, the very fact that we <i>are</i> considering a specifically chosen set of slow-and-local things is itself an approximation of reality, even if the results are then correct-to-many-decimal-places within the assumptions: as noted, if we perceive that something has interfered from outside, we say the experiment has been corrupted and throw out that case.
</p>
<p>
If traditional fields are a summary assumption about the rest of the universe (as I've suggested in several previous posts<!--2018/07, 2016/06, 2015/05-->, though the suggestion is going to zag violently sideways later in this post), we might plausibly expect this interpretation-as-a-summary-assumption to apply as well to the <i>gravitational</i> field, and thus to the relativistic curvature of spacetime. We're exploring a supposition in this post, though, that the overall scenario can be understood as Euclidean, and the relativistic curvature comes from our insistence on local focus. My first intuition here is that the relativistic curling inward to stay below the speed of light comes from a sort of <i>inward pressure</i> generated by the fast-universe leaning against the barrier of our insistent local assumption. Metaphorical pressure, of course.
</p>
<p>
As additional evidence that the conventional approach to fields is broken, I note <a href="https://en.wikipedia.org/wiki/Stochastic_electrodynamics">stochastic electrodynamics</a>, which —as best I figure— says that a classical system can reproduce some (<i>maybe</i> all) quantum effects if it's given random initial conditions on the order of <a href="https://en.wikipedia.org/wiki/Planck_constant">Planck's constant</a>.
</p>
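<p>
To give that claim a concrete (and deliberately modest) flavor, here is a toy illustration; it is not SED itself, which posits a full random background field rather than merely random initial conditions. A classical harmonic oscillator whose initial conditions are drawn with spreads of order Planck's constant reproduces the quantum ground state's position spread exactly, at all times:
</p>
<pre>
import math
import random

HBAR = 1.0            # units where hbar = m = omega = 1
M = OMEGA = 1.0
N = 100_000

def classical_x(x0, p0, t):
    # Exact classical trajectory of a harmonic oscillator.
    return x0 * math.cos(OMEGA * t) + (p0 / (M * OMEGA)) * math.sin(OMEGA * t)

# Random initial conditions "on the order of Planck's constant":
# Var(x0) = hbar/(2 m omega), Var(p0) = m omega hbar / 2.
rng = random.Random(0)
pairs = [(rng.gauss(0.0, math.sqrt(HBAR / (2 * M * OMEGA))),
          rng.gauss(0.0, math.sqrt(M * OMEGA * HBAR / 2)))
         for _ in range(N)]

for t in (0.0, 1.0, 2.5):
    var = sum(classical_x(x0, p0, t) ** 2 for x0, p0 in pairs) / N
    print(f"t = {t}: ensemble var(x) = {var:.4f}"
          f"  (quantum ground state: {HBAR / (2 * M * OMEGA):.4f})")
</pre>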
<p>
As an alternative to the "inward pressure" metaphor: perhaps, by admitting only the local part of what's going on, we lop off the fast part of things. This suggests that somehow the form of our refined mathematical description would lend itself to lopping-off of this sort.
</p>
<p>
The use of tensors in relativity seems to be a way of internalizing the curvature of spacetime into the dynamical equations; building the locality assumption into the mathematical language so that, if the fast/slow dichotomy is really there, it becomes literally impossible to speak of the fast-universe in that language. Suppose we want to express the dynamics of the system —whatever those are— in coordinates of our absolute Euclidean spacetime. What would that look like? Trained in the usual relativistic approach to these things, one might be tempted to simply take the absolute Euclidean space as a "stationary" reference frame and use Lorentz transformations to describe things that aren't stationary; but we know that's not altogether what we're looking for, because it excludes the fast-universe. </p>
<p>
At this juncture in the exploration, it seems to me that I've spent the past three-plus decades underestimating the <i>obscuring</i> power of relativity, and afaics I've been following the herd in this regard. Quantum mechanics is so ostentatiously peculiar, we've been spending all our metaphysical angst on it and accepted relativity with, in the larger scheme of things, scarcely a qualm. My own blog posts on physics, certainly, have heavily focused on where the weirdness of quantum mechanics comes from, touching lightly or not at all on relativity. Yet, consider what I just wrote about relativity: "it becomes literally impossible to speak of the fast-universe in that language." In my post <i>Thinking outside the quantum box</i>, I <a href="https://fexpr.blogspot.com/2015/06/thinking-outside-quantum-box.html#sec-qt-meme">noted</a> particularly that, in terms of memetic evolution, a scientific paradigm can improve its survival fitness either by matching reality especially well, <i>or</i> by inducing scientists to commit to a conceptual framework in which they cannot ask any questions that would expose weaknesses of the theory. Quantum mechanics explicitly controls how questions can be asked, and in my discussion I, as usual, barely mentioned relativity. But relativity controls the mathematical framework softly, so that we don't even notice what is missing. Which is why, though it's a favorable development that this angle of attack offers a symmetric explanation for the relativity/quantum split, what I find most promising about it is its ability to view relativity as preventing a question from being asked.
</p>
<p>
This view of relativity appears to be a direct consequence of having omitted fields in the first place. Einstein's theory of relativity traces back to problems created by the interaction between velocity of an observer and velocity of propagation of the electromagnetic field (through the thought experiment of <a href="https://en.wikipedia.org/wiki/Einstein%27s_thought_experiments#Pursuing_a_beam_of_light">chasing a light beam</a>). By setting aside fields, we've deferred facing the observer-velocity/propagation-velocity problem; I did notice this deferral repeatedly as I wended the path to this point, but had no reason to dwell on it yet; if we're on a right track, though, in coming upon a natural structure we expect a resonance effect in which the pieces of the puzzle should all fall into place, including that one.
</p>
<p>
What sort of dynamics could account for that peculiar <i>bending inward</i> that causes the language of relativity to limit discussion to the slow-universe? Meditating on it, here's one thought. If the effect can be viewed as a lopping off of the fast part of reality, this suggests that at the upper end of the relativistic velocity range —where the weird stuff, the "bending", occurs— we're only seeing part of the picture: something is being lost, i.e., something is <i>not</i> being conserved. Which smacks of the sort of reinterpretation-by-playing-definition-games that we used to tamper with relativity up toward the <a href="#sec-rqs-look">top</a> of this post. I'm also recalling, from freshman physics, there was this folderol about potential energy versus kinetic energy that always sounded pretty dubious to me (and, as I recall, to some of my instructors too): it hung together, but still somehow felt like a bit of a shell game, with energy shifting into reality from... where?
</p>
<p>
Which brings us back to the fast half of the scenario.
</p>
<span style="font-size: large;" id="sec-rqs-target">Target</span>
<p>
If we expect to get useful insight out of solving for the unknowns in this sketch, we need a solution that accounts for both halves; fast, as well as slow. Relativity is quintessentially local, of course, but, once we step beyond that framework, non-locality isn't <i>technically</i> difficult (whatever philosophical qualms one might have): mathematical models addressing quantum phenomena —of which, off hand, I can think of about four that have been mentioned on this blog at one time or another— have no apparent difficulty achieving non-local (even instantaneous) transmission of internal information needed to drive the model; the technical challenge is <i>targeting</i> the transmission to where the internal information is needed.
</p>
<p>
First there is, of course, the conventional approach to quantum mechanics. Non-local internal information propagation is handled so painlessly it's easy to overlook that it's happening, thanks to <a href="https://en.wikipedia.org/wiki/Fourier_analysis">Fourier analysis</a> — a ubiquitous tool of modern quantum mechanics, whose neat trick is to represent arbitrary behavior of a wave function as a sum of (in general) infinitely many sine waves. You can describe, in this manner, a wave that propagates across spacetime at finite velocity; but you can describe anything else this way, too. For, even if the described wave <i>were</i> local in its behavior, the components of the sum are sine waves, and any sine wave is inherently non-local. The internal information represented by a sine wave, whatever that information is, occurs throughout all of spacetime in undiminished form; periodically, yes, but there's always another peak, another trough. So non-locality isn't a technical difficulty. <i>Targeting</i> the information is another matter, though the conventional solution to this too is easily overlooked. The internal information needed for a given particle is delivered specifically to the one particle that needs to know by the expedient of giving each particle <i>its own wave function</i>. Although we make much of how there are supposedly only four fundamental forces, accounted for (well, three of the four) by the <a href="https://en.wikipedia.org/wiki/Standard_Model">Standard Model</a>, in the detailed math we custom-fashion a separate force for each particle. Or, occasionally, for each entangled set of particles, which is nearly the same thing since the math of large-scale entanglement tends to get hideously messy.
</p>
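<p>
The non-locality of the sine-wave components is easy to exhibit numerically (a sketch; NumPy assumed, and the wave packet here is just an arbitrary localized example):
</p>
<pre>
import numpy as np

# A localized Gaussian wave packet on a grid: essentially zero outside
# a small neighborhood of x = 0.
x = np.linspace(-50.0, 50.0, 4096)
packet = np.exp(-x ** 2)

# Decompose it into complex-exponential (sine/cosine) components.
coeffs = np.fft.fft(packet)

print("grid points where the packet exceeds 1e-10: ",
      int((np.abs(packet) > 1e-10).sum()), "of", x.size)
print("Fourier components exceeding 1e-10 in magnitude:",
      int((np.abs(coeffs) > 1e-10).sum()))

# Each retained component spans the entire grid at constant magnitude;
# only their sum is localized. The locality of the packet lives
# entirely in phase relationships among non-local parts.
</pre>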
<p>
I tentatively don't count basic de Broglie–Bohm <a href="https://en.wikipedia.org/wiki/De_Broglie%E2%80%93Bohm_theory">pilot wave</a> theory as a separate model here, on grounds that —in minimal form— it seems to me indistinguishable for this purpose from the conventional approach. Basic pilot-wave theory handles internal information distribution in essentially the same way as the conventional approach and merely, afaict, interprets what is happening differently; hence, presumably, Einstein's remark about it, "This is not at all what I had in mind." (According to Bohm, Einstein had suggested that quantum mechanics is an incomplete picture of reality in the same sense that a mechanical clock is incomplete if some of the gears are missing; but Bohm's pilot-wave theory, one might say, relabels the existing gears without really adding any.)
</p>
<p>
My reservation on this part of the count is that <i>latent</i> in pilot-wave theory is a potential feature that may fairly qualify as new on structural grounds: a pilot wave might not produce the sort of probability distribution conventional quantum mechanics calls for. The term for this sort of thing is <a href="https://en.wikipedia.org/wiki/Quantum_non-equilibrium">quantum non-equilibrium</a>: the idea that we see the usual quantum weirdness, rather than some other kind of weirdness, merely because we're living in a world whose hidden variables have settled into an equilibrium state. If Louis de Broglie explored this angle, I haven't heard of it; but scuttlebutt says David Bohm did consider it. I'm inclined to believe this, since Bohm, working at least a couple of decades later, evidently explored the idea much more thoroughly than de Broglie had had a chance to before it was rejected at the <a href="https://en.wikipedia.org/wiki/Fifth_Solvay_Conference">1927 Solvay Conference</a>.
</p>
<p>
Stochastic electrodynamics (SED) is, to my understanding, quite a different kettle of fish from pilot-wave theory. Notwithstanding that, as I write this, Wikipedia's opening sentence on SED claims it's "an extension of the de Broglie–Bohm interpretation". Wikipedia does note there's a tremendous range of work under the SED umbrella; but, afaics, what makes it all properly SED is that its basic strategy is to consider a classical system augmented by random background electromagnetic radiation on the order of Planck's constant. This "zero-point field" is a sort of pilot-wave, but doesn't in itself address the information-targeting problem at all; there are no customized fields (wave functions), and nothing else overtly replacing them. There's some work (I've fairly readily turned up) claiming that pilot-waves should arise as emergent phenomena in SED, apparently related to a relativistic-wave-equation effect called <a href="https://en.wikipedia.org/wiki/Zitterbewegung">Zitterbewegung</a> (literally, <i>jittery motion</i>; first proposed, according to Wikipedia, by <a href="https://en.wikipedia.org/wiki/Erwin_Schr%C3%B6dinger">Erwin Schrödinger</a>). One gathers this emergence claim is not uncontroversial.
</p>
<p>
Still, despite the controversy, there's a certain fascination to the prospect of a theory in which all quantum weirdness is emergent from a classical system, and one might wonder why researchers haven't been flocking to SED in much greater numbers. I suspect this is, at base, because SED is a blatantly incomplete theory: it makes no pretense at all of explaining where the zero-point field comes from (though one technical paper I found suggested many practitioners have a vague intuition that it comes from the electromagnetic forces of the rest of the universe). One is struck that, as it happens, accounting for the summed influences of the rest of the universe is just what the current blog post proposes to do.
</p>
<p>
Then there's the <a href="https://en.wikipedia.org/wiki/Transactional_interpretation">transactional interpretation</a>. From my forays into the transactional interpretation, its wave functions are conventional but, at least in the form primarily advocated by its chief architect <a href="https://en.wikipedia.org/wiki/John_G._Cramer">John G. Cramer</a>, it introduces one distinctive structural feature: an additional time-like dimension along which a quantum system develops under direction of its wave function. Cramer calls the additional dimension <i>pseudo-time</i>. The system is thereby able to reach equilibrium through a process orthogonal to spacetime, so that non-equilibrium is purely internal and cannot, even in principle, be observed.
</p>
<p>
My own term-rewriting approach, as it's lately been developing, also uses an additional time-like dimension for development to (some sort of) equilibrium; I used to call this extra dimension <i>meta-time</i>, and having outgrown that name I've yet to settle on an alternative name (so, even if I've maybe heard Cramer isn't happy with the name <i>pseudo-time</i> that he's now stuck with, I've thus far had no great luck either). Spacetime in my approach is a network of discrete elements (a "term") rather than a continuum, which precludes conventional wave functions. The most distinctive structural feature is that targeted internal information exchange is achieved through explicit point-to-point (or at least, contemplating one-to-many linkage between a variable binding and its bound instances, <a href="https://en.wikipedia.org/wiki/Party_line_(telephony)">party-line</a>) network connections.
</p>
<p>
The current post, though, seems to call for some different approach from any of these. The point here is to explore possible advantages of analyzing the universe in terms of fast and slow particles. To target internal information to specifically where it needs to go, something must be added to the sketch: either some whole new primary structural element, such as customized fields or customized network connections, which however seems likely to import the flavor of some other model rather than bringing out the flavor of the current one; or some peripheral device to, say, guide particles with the precision needed for entanglement. And whatever new thing we introduce, we want it to mesh well with the machinery we use to derive relativity.
</p>
<span style="font-size: large;" id="sec-rqs-third">Third law</span>
<blockquote>
To every action there is an equal and opposite reaction.
<blockquote>
— Common paraphrase of <a href="https://en.wikipedia.org/wiki/Newton's_third_law">Newton's third law of motion</a>.
</blockquote>
</blockquote>
<p>
In looking for a way through all this underbrush, there's an interesting distinction in these various arrangements, between structures that obey Newton's third law, and structures that don't. Wave functions in conventional quantum mechanics are spooky partly because they react but do not act; the wave function describes things that the particle <i>might</i> do, and is therefore affected by other things in the physical world, but most of the wave function —distributed throughout the entire universe, after all— does not affect any of the physical universe, except for the direct effect of the one particular event to which the wave function collapses, and the indirect effect of statistical distributions of quantized events. Yes, this sort of reaction-without-action is atypical of classical physics; but before one gets too carried away with how unclassical it is, note that <i>constant fields</i> that act but do not react are used —in practice— routinely in classical physics — and, for that matter, in quantum physics. When Einstein had to describe gravitational fields that simultaneously act on matter and react to it, he struggled to cope with the circularity.
</p>
<p>
Particles, on the other hand, are pretty much always supposed to react to everything under consideration, and act on each other, though typically they don't act on fields; granting, they may <i>generate</i> fields that other particles react to.
</p>
<p>
Of further interest, reaction-without-action seems closely associated with the task of targeting internal information; that is, in constructing a field to describe reaction, one specifies just what is going to do the reacting.
</p>
<p>
It seems that, as often used practically in elementary physics, particles are things but fields are <i>not</i> things; rather, fields represent aspects of interactions between things, either action or reaction but not so often both at once. We're almost —though not quite— safe to say that these are the essence of what a theory of physical reality represents: things, and the interactions between them; thus particles and, in some form or other, fields. Not quite safe for a couple of reasons: conceptually, quantum entanglement leaves room to wonder whether particles are independent things after all; and technically, orthodox fields in full generality aren't quite so straightforward (a point I'll expand upon further below, when it comes up in the natural flow of the discussion). But the sketch in our current thought experiment calls only for independent particles, so let us proceed from there.
</p>
<span style="font-size: large;" id="sec-rqs-second">Second law</span>
<blockquote>
The law that entropy always increases — the second law of thermodynamics — holds, I think, the supreme position among the laws of Nature. If someone points out to you that your pet theory of the universe is in disagreement with Maxwell's equations — then so much the worse for Maxwell's equations. If it is found to be contradicted by observation — well, these experimentalists do bungle things sometimes. But if your theory is found to be against the second law of thermodynamics I can give you no hope; there is nothing for it but to collapse in deepest humiliation.
<blockquote>
— <a href="https://en.wikipedia.org/wiki/Arthur_Eddington">Sir Arthur Stanley Eddington</a>, <a href="https://henry.pha.jhu.edu/Eddington.2008.pdf"><i>The Nature of the Physical World</i></a> (first edition was 1928), chapter 4.
</blockquote>
</blockquote>
<p>
Since our foremost interest here is in consequences of the sketch, we don't really want to introduce some vast new structure, unrelated to the sketch, in the unknowns added to complete the sketch. We'd also prefer to have an exact theory, even if it's intractable so that we're always going to have to introduce some sort of approximation when tackling any real-world problem (as long as we're getting <i>some</i> sort of useful insight from it).
</p>
<p>
Moreover, recall J.S. Bell's observation that the trouble with quantum mechanics is lack of clarity on where to draw the line between quantum and classical realms, i.e., when to collapse the wave function (discussed in a <a href="https://fexpr.blogspot.com/2018/07/co-hygiene-and-emergent-quantum.html#sec-ceqm-qci">previous post</a>). Now consider what this implies, for our current thought experiment, about interactions of sets of things. We can imagine trying to assemble our model from pairwise interactions between particles, though we don't at this point know how we would then account for entanglement. We can imagine treating the whole system as one big entangled unit, but really there doesn't appear to be any practical use at all in this "wave function of the universe" approach. And in between these two, we have a different form of the same problem that Bell was objecting to, lack of clarity on where to draw the line. So my own sense of the situation is that the only choice likely to provide an in-any-way-useful exact solution is pairwise interaction; at least, pairwise interaction "under the hood".
</p>
<p>
Where, then, does entanglement come in? Two possibilities come to mind. Either we derive entanglement as an emergent phenomenon from our pairwise interactions, or we derive entanglement as some sort of apparent distortion, akin to the apparent distortion we're already trying to conjure to account for curved relativistic spacetime. I don't have, going into this, any prior intuition for the emergent-phenomenon approach to this, besides which there are minds working on some variant of it already (so I needn't count, amongst my possible motives for pursuing it, a lack of exploration of that region by others). The distortion approach seems worth some attention; it certainly ought to be good for mind-bending. It is, to put this delicately, not immediately obvious how the distortion would work; for relativity, we've some hope that lopping off the "fast" part of the phenomenon would somehow leave behind curved spacetime, but is it remotely imaginable that dropping some terms from one's equations would "leave behind" entanglement? Plus the conundrum of just what one expects to "lop off" when looking at the quantum-like influence of the fast-universe.
</p>
<p>
To bring out this point in more detail, the distinction between lopping-off in relativity versus quantum mechanics is that in relativity it seems we may be able to lop off <i>some of the energy</i>, presumably siphoning off some of the energy to the fast-universe (which we'd likely manage with some sleight-of-hand, per above); but in quantum mechanics, what is there to siphon off? Either things are correlated with each other, or they aren't. How can a correlation between things be the result of siphoning something off? This does bring up, rather in passing, a crucial point: the <i>things</i> in quantum mechanics are... slow. That is, quantum mechanics, like relativity, is a theory about the slow-universe, and if there is any siphoning-off to be done, it's going to be siphoning off <i>into the fast-universe</i>, just as it was with relativity. As for the main question, what can be siphoned off to leave behind correlation... how about siphoning off <i>disorder</i>? Or at least, redistributing it. Admittedly, I've long been quietly disturbed by the whole issue of whether total entropy in the cosmos is increasing or decreasing; increasing entropy is disturbing in one way, and decreasing entropy is disturbing in a whole different way.
</p>
<p>
This might be a good moment to consider the emergence element after all, inasmuch as that seems to have been dependent on some sort of <i>equilibrium</i> assumption, which may tie in with siphoning off, or otherwise shifting about, of disorder.
</p>
<p>
With or without emergence, though, entropy is slippery; as it's been taught to me, entropy isn't something you specify as a behavior of a system, but something you derive <i>from</i> that behavior. So any solution of this sort is going to be intuited from the principles and shaped to produce the right sort of entropic effect. An encouraging sign is that, again, this is changing the form of the question: rather than asking directly what would generate a quantum wave function, we're asking what sort of complementary entropic system would, when subtracted from reality, leave behind a quantum wave function. We're conjecturing that our theories, both relativity and quantum mechanics, are only about a part, perhaps a smallish part, of the universe, while a lot of other coexisting stuff isn't covered and follows different rules; the whole fast/slow thing is a guess that got us started, and for this post I'll stick with it and see where it goes, but the idea of a coexisting part of reality following different rules is more general and may outlast the fast/slow guess. In my most recent post in my term-rewriting physics series I suggested something similar involving "macrostructures", large-scale cosmological patterns (<a href="https://fexpr.blogspot.com/2019/05/rewriting-and-emergent-physics.html#sec-rwep-scale">yonder</a>). The vibe of this coexisting-part-of-reality hypothesis is also reminiscent of <a href="http://en.wikipedia.org/wiki/Dark_matter">dark matter</a>.
</p>
<p>
Contemplating that comparison with dark matter, it occurs to me that, indeed, dark matter is essentially stuff that has to fall outside the purview of our familiar physical laws; which in turn is essentially the function assigned for this post to the fast-universe, and is also, substantially, a way of saying that our existing physical theories are special cases of something more general.
</p>
<p>
The imprecision of entropy —that it doesn't pin down the behavior of a system but merely measures a feature of it— should be at least partly counterbalanced by the fact that <i>quantum mechanics</i> doesn't pin down the behavior of a system either. One of the oddest things about quantum mechanics is, already, that the <a href="https://fexpr.blogspot.com/2015/06/thinking-outside-quantum-box.html#sec-qt-qmat">Schrödinger equation</a> doesn't specify any precise behavior at all, but merely takes some arbitrary classical behavior embodied by the <a href="https://en.wikipedia.org/wiki/Hamiltonian_(quantum_mechanics)">Hamiltonian</a> parameter <i>Ĥ</i> and produces a distorted probabilistic spectrum of possibilities. But for the current purpose, this ought to be an advantage, because probabilities are the stuff that entropy is made of.
</p>
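<p>
For reference, the standard time-dependent form being alluded to here (quoted from the textbook formulation, not derived from anything in this post) is
</p>
<blockquote>
<i>iħ</i> ∂ψ/∂<i>t</i> = <i>Ĥ</i>ψ
</blockquote>
<p>
where <i>Ĥ</i> is supplied rather than derived: whatever classical energy-accounting one embodies in <i>Ĥ</i>, the equation converts into an evolution of the probability-carrying wave function ψ.
</p>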
<p>
In general research for this post, semi-idly perusing explanations of quantum mechanics on the internet, I turned up incidentally a somewhat off-hand remark that the three great branches of nineteenth-century physics were Mechanics, Thermodynamics, and Electrodynamics. (One can, btw, save a great deal of time in this sort of perusal, by discarding any supposed-explanation as soon as it claims 'quantum mechanics says a particle can be in two places at once' or equivalent, or if it refers to <a href="https://en.wikipedia.org/wiki/Many-worlds_interpretation">many-worlds</a> as a "hypothesis" or "theory" or talks about "proving" it. If the presenter doesn't understand the subject well enough to distinguish between an interpretation and a theory, they literally don't know what they're talking about. I do not use the term <i>literally</i> figuratively.) This trichotomy raises a curious point that, while it seems inapplicable to the current post, may be worth pursuing in some future exploration: we are thoroughly accustomed to treat thermodynamics as an emergent property of large systems, of the statistical behavior of collections of things. Why? Why should mechanics and electrodynamics be considered primitive while thermodynamics is not? I recall noting, as I skimmed a modern quaternionic treatment of Maxwell's equations, that it claimed to unify electromagnetism with <i>heat</i>, and thinking, what the heck? Heat is at a totally different logical level, how can it even make sense to unify it with one of the fundamental forces? The question again is, why have we gotten into this habit of treating heat as a high-level emergent phenomenon? I suspect this is a conceptual legacy from the nineteenth century, when it seemed possible to derive thermodynamics stochastically from fine-grained deterministic particles. If we don't have deterministic particles anyway, though, it's not apparent we have anything to lose by making thermodynamics lower-level than these other "fundamental" forces. (Even if one prefers a deterministic foundation, that problem might be just as tractable working downward from thermodynamics.) With a befuddling implication that the primitive particles we've been studying could be emergent phenomena.
</p>
<span style="font-size: large;" id="sec-rqs-len">Length contraction</span>
<blockquote>
Now who would think<br>and who forecast<br>that bodies shrink<br>when they go fast?<br>It makes old Isaac's theory<br>look weary.
<blockquote>
— <i>Relativity</i> (to the tune of <a href="https://en.wikipedia.org/wiki/Personality_(Jimmy_Van_Heusen_and_Johnny_Burke_song)"><i>Personality</i></a>), in <a href="http://ww3.haverford.edu/physics-astro/songs/lehrer/physrev.htm"><i>The Physical Revue</i></a>, <a href="https://en.wikipedia.org/wiki/Tom_Lehrer">Tom Lehrer</a>, 1951.
</blockquote>
</blockquote>
<p>
It's all very well to suggest, in a hand-wavy sort of way, that as you try to accelerate toward the speed of light you don't actually get there because, approaching light-speed, more and more of that acceleration leaks out into the fast-universe; but sooner or later you're going to have to explain how that actually works. Just what sort of shifting to/from the fast-universe would actually serve to generate special relativity? In fishing for a thought-experiment to aid intuition on this, it's problematic to start with some primitive event that induces acceleration, because one gets distracted by trying to explain the nature of the primitive event. A particle decaying into two (or more) particles? An elastic collision between particles? I suggest a different thought-experiment, more abstract and therefore simpler (and, oddly enough, also reminiscent of classic <a href="https://en.wikipedia.org/wiki/Mythbusters">MythBusters</a>, which derived an important element of fun from professionally conducted demolition): suppose we have a spherical object, traveling near to the speed of light, and we blow it up. (Maybe it's a firecracker, or a <a href="https://en.wikipedia.org/wiki/Death_Star">Death Star</a>.)
</p>
<p>
Suppose this spherical object is moving (relative to us, presumed stationary observers) at 99% of the speed of light. At that relative velocity, there's going to be a pronounced special-relativistic effect. That is, when we say it's "spherical", we mean that someone at rest relative to the object would see it as spherical. Call its direction of travel the <i>x</i>-axis. Because (for us) it's moving at 0.99<i>c</i>, we see it as an <a href="https://en.wikipedia.org/wiki/Ellipsoid">ellipsoid</a>, circular in its profile in the <i>y‍z</i>-plane, but greatly flattened in its <i>x</i> dimension.
</p>
<p>
When it explodes, it throws off a shell of some sort —gas or bric-a-brac or whatnot— which moves outward from the sphere at constant velocity in all directions; this is, again, what an observer stationary relative to the object sees. Suppose, for convenient reference, that the sphere itself, or at least a cinder of somewhat-similar size, is left behind at the center, continuing along with its original course and speed. To see the relativistic effects without getting confused by velocities that coincidentally cancel through being too similar to each other, suppose the shell is moving outward at 9.9% of the speed of light. What shape do we see, in our very different reference frame? Evidently the <i>y‍z</i> profile of the shell is still circular. But the fore and aft sides of the shell are asymmetric. In accordance with special relativity, the fore side of the shell must still be traveling at less than the speed of light, although it's going faster than the remaining object; so this fore side of the shell appears very flattened up against the light-speed barrier. The aft side, though, is traveling slower than the remaining object: relativistic velocity composition puts it at about 0.988<i>c</i> (rather than the 0.891<i>c</i> of naive subtraction), still a thoroughly "relativistic" velocity with definite length-contraction, but with a smaller Lorentz factor, hence <i>less</i> length-contraction, than either the remaining object or the fore side. So the aft side of the shell is going to bulge outward noticeably more than the fore side.
</p>
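<p>
To put rough numbers on the scenario, here's a quick sketch in Python, using only the standard special-relativistic velocity-composition and Lorentz-factor formulas with the speeds chosen above; the helper names are mine, and this is illustration, not part of the thought-experiment's premises.
</p>
<pre>
# Relativistic velocity composition and Lorentz factors for the
# exploding-sphere thought experiment (units with c = 1 throughout).
from math import sqrt

def compose(v, w):
    """Combine frame velocity v with a velocity w measured in that frame."""
    return (v + w) / (1 + v * w)

def gamma(v):
    """Lorentz factor; x-extents contract by a factor of 1/gamma."""
    return 1 / sqrt(1 - v * v)

v_object, v_shell = 0.99, 0.099
fore = compose(v_object, +v_shell)   # ~0.9918, not the Galilean 1.089
aft  = compose(v_object, -v_shell)   # ~0.9878, not the Galilean 0.891
for label, v in [("fore", fore), ("object", v_object), ("aft", aft)]:
    print(f"{label:>6}: v = {v:.4f}c, gamma = {gamma(v):.2f}")
# The fore side flattens by ~1/7.8, the object by ~1/7.1, and the aft
# side by only ~1/6.4, so the aft side bulges out more than the fore.
</pre>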
<p>
What are we to make of this peculiarly asymmetric shell, flattened along the <i>x</i>-axis but much more so on its positive-<i>x</i> than its negative-<i>x</i> side? The shape evidently expresses, in some sense or other, the curvature of spacetime around the object. I'm reminded of the impression I've recently gathered, in my ongoing efforts to grok tensors, that the heart of the subject lies in the interplay of various sorts of derivatives — which may also be true of whatever sort of quaternionic treatment is dual to tensors, judging by my explorations on that front (<a href="https://fexpr.blogspot.com/2019/04/nabla_18.html">yonder</a>). That angle on things, though, seems contingent on some high-caliber insights I've not yet acquired on tensors and quaternions; so for the current post I'll let it lie.
</p>
<p>
If the bending-inward, as an object approaches the speed of light, is because some things get shunted out into the fast universe, for our purpose we need to say more than just that they're shunted out. We have to be concerned with just what happens to those shunted things, what form they take in the fast-universe — because otherwise what's the point of having this particular sort of new theory that says the stuff outside the old theories differs mainly by being faster-than-light. Will the new fast stuff somehow be targeted to return specifically to where it came from as the object decelerates away from light-speed, in which case it's not clear in what sense the targeted stuff is faster-than-light; or will it come back to haunt us through quantum-style effects, since we're supposing quantum mechanics is the theory that deals with the fast-universe affecting slow things?
</p>
<span style="font-size: large;" id="sec-rqs-shock">Shockwaves</span>
<p>
In discussing, <a href="#sec-rqs-two">above</a>, why the fast-universe hypothesis would naturally lead to two separate theories, I noted the awkwardness of fast particles interacting with a field whose propagation speed is slower than the particles. As a <i>discouragement</i> to considering the scenario, I'd stand by this. Contemplating Einstein's reasoning, one gets the sense that the notion of time was tied to the perception of light in a way that would foster philosophical absurdities if particles moved faster than the propagation speed of the electromagnetic medium. However, a century's worth of living with quantum mechanics ought to have made us a lot more tolerant of philosophical absurdities than we were when the framework of relativity was set up; and traveling faster than the propagation speed of the medium is not strictly a <i>technical</i> obstacle. We have, in fact, a quite familiar real-world example of what can happen when an object emanating waves travels faster than the waves propagate: a <a href="https://en.wikipedia.org/wiki/Sonic_boom">sonic boom</a>.
</p>
<p>
Intriguingly, a sonic boom is about as close to discontinuous as you can expect to get in sound propagating through a fluid medium. I've blogged before on the discrete/continuous balance (<a href="https://fexpr.blogspot.com/2015/05/numberless-differentiation.html#sec-ndif-dcp">yonder</a>; which I <i>also</i> alluded to in the discussion of why-two-theories); various modern theories of physics tend to suffer from an overdose of continuity, and compensate by introducing additional discreteness by various means; notably, wave-function collapse in the <a href="https://en.wikipedia.org/wiki/Copenhagen_interpretation">Copenhagen interpretation</a>, or standing waves in string theory (and elsewhere, such as the aforementioned SED). <a href="https://en.wikipedia.org/wiki/Loop_quantum_gravity">Loop quantum gravity</a> adds discreteness directly by quantizing space (although, admittedly, my intermittent forays have not yet given me a proper feel for that technique). So... if particles of the fast-universe produce something akin to a sonic boom, might this be an alternative source of discreteness? One might reasonably expect it to manifest in the quantum theory which considers effects from the fast-universe, rather than relativity which we're supposing systematically omits those effects.
</p>
<p>
There is an important distinction to be tracked, here, between waves that vibrate in the direction that the wave moves, versus waves that vibrate perpendicular to the direction the wave moves. Technically, the first are called <a href="https://en.wikipedia.org/wiki/Longitudinal_wave"><i>longitudinal</i></a> waves, the second, <a href="https://en.wikipedia.org/wiki/Transverse_wave"><i>transverse</i></a> waves. A sonic boom, as such, is a sound wave in a fluid medium; it's a wave of <i>pressure</i>, which is longitudinal: the medium moves in the direction of the wave's propagation to produce variations of pressure across surfaces orthogonal to the direction of propagation. Longitudinal waves occur in a <i>scalar</i> field; compression causes the scalar field value to increase (high pressure), decompression to decrease (low pressure). However, in the usual treatment under <a href="https://en.wikipedia.org/wiki/Maxwell%27s_equations">Maxwell's equations</a>, the electrical and magnetic fields are <i>vector</i> fields, with no scalar component, and electromagnetic waves are thus purely transverse.
</p>
<p>
Here we get into a bit of history. Around the turn of the century (no, not that century, the other one), <a href="https://en.wikipedia.org/wiki/Oliver_Heaviside">Oliver Heaviside</a> considered the possibility of longitudinal electromagnetic waves (<i>Electromagnetic Theory</i> Volume II, 1899, <a href="https://archive.org/details/electromagnetict02heavrich/page/493/mode/1up">Appendix D</a>) and roundly rejected them. He first observed simply that the observed phenomena of light aren't consistent with longitudinal waves: they don't arise in elastic solids, nor in light reflection and refraction. Then he dove anyway into an extensive mathematical exploration of how to consistently extend the existing equations with compressional waves, only to reach, on the far side of it all, the same conclusion.
</p>
<p>
Heaviside, if I may say so, knew what he was about. I expect his conclusions were thoroughly solid within the scope of his chosen assumptions, his chosen conceptual framework. To put this in proper perspective, though, what <i>were</i> his chosen assumptions?
</p>
<p>
Oliver Heaviside and <a href="https://en.wikipedia.org/wiki/Josiah_Willard_Gibbs">J. Willard Gibbs</a> were the primary contenders on the vector side of the great vectors-quaternions debate of the 1890s. As I've discussed on this blog before (most recently <a href="https://fexpr.blogspot.com/2019/04/nabla_18.html">last year</a>), quaternions were discovered in 1843 by <a href="https://en.wikipedia.org/wiki/William_Rowan_Hamilton">William Rowan Hamilton</a>, with the specific purpose of filling a technical need to represent directed magnitudes in three-dimensional space as mathematical values. Today, if we want to study some more-or-less-hairy structure, we may, somewhat casually (by historical standards), set up an algebra for it, specifying how to describe the things and what operations can be performed on them, and off we go. Not so in 1843. There was no idea of mathematically studying structures that don't have all the well-behavedness properties of ordinary numbers (not unless you count things like alchemical tracts that associate certain integers with certain metals, certain metallurgical operations with certain arithmetical operations, and then propose to transmute lead to gold arithmetically, which sort of thing has been implicated in associating the name of eighth-century Persian alchemist <a href="https://en.wikipedia.org/wiki/Jabir_ibn_Hayyan">Jabir ibn Hayyan</a> with the etymology of our word <i>gibberish</i>). Through the work of <a href="https://en.wikipedia.org/wiki/Leonhard_Euler">Leonhard Euler</a> et al., mathematicians had only just recently been forced to swallow complex numbers (which really are fabulously well-behaved by algebraic standards), and still had indigestion therefrom. Quaternions are, from a well-behavedness standpoint, the most conservative generalization that can be taken beyond the complex numbers, just <i>barely</i> less well-behaved by giving up a single axiom —commutativity of multiplication— while retaining, most especially, unique division.
</p>
<p>
(Since quaternion multiplication isn't commutative, i.e. in general  <i>ab</i> ≠ <i>b‍a</i>,  quaternion left-division and right-division are separate operations. Each non-zero quaternion <i>a</i> has a unique multiplicative inverse  <i>a</i><sup>−1</sup>  with  <i>a</i> <i>a</i><sup>−1</sup> = <i>a</i><sup>−1</sup> <i>a</i> = 1,  and one then has right-division  <i>a</i> / <i>b</i> = <i>a</i> <i>b</i><sup>−1</sup>,  left-division  <i>b</i> \ <i>a</i> = <i>b</i><sup>−1</sup><i>a</i>.)
</p>
<p>
The trick to constructing quaternions is that, instead of introducing a single square root of minus-one,  <i>i</i>,  as for complex numbers, you introduce <i>three</i> separate square roots of minus-one, traditionally called  <i>i</i>, <i>j</i>, <i>k</i>,  corresponding to the three orthogonal axes of three-dimensional space. The three imaginary units anti-commute with each other, thus  <i>i‍j = −j‍i</i> etc., and are cyclically related with  <i>i‍j‍k</i> = −1. The general form of a quaternion is  <i>q</i> = <i>w</i> + <i>x</i><i>i</i> + <i>y</i><i>j</i> + <i>z</i><i>k</i>. Part of Hamilton's genius was realizing that you can't get all the well-behavedness he wanted in just three dimensions; you need the three mutually symmetric imaginaries for the three dimensions and a fourth real (i.e., non-imaginary) term  <i>w</i>. (Btw, yes he did immediately think to interpret this fourth dimension, metaphysically, as time, so we could defensibly call it  <i>t</i>  rather than  <i>w</i>;  but I digress.) Carried away with enthusiasm over how beautifully the mathematics of quaternions worked out, he invented a great many new words to name different concepts in quaternion analysis; including <i>scalar</i> for the real part of <i>q</i>, <i>vector</i> for the non-real part, and <i>tensor</i> for the length of the whole thing (square root of  <i>w</i><sup>2</sup> + <i>x</i><sup>2</sup> + <i>y</i><sup>2</sup> + <i>z</i><sup>2</sup>), which is where we get all those words, although the meaning of <i>tensor</i> has since changed beyond apparent recognition. He proceeded to devote the rest of his life to exploring quaternion analysis, and seems to have eventually worked himself to death on it. Meanwhile, though, mathematicians took his idea of generalized numbers and axiomatic foundations, and ran with it. By the last couple of decades of the century, mathematicians had pretty thoroughly outgrown that instinct to retain every last drop of well-behavedness, and Gibbs and Heaviside proposed a system of <i>vector analysis</i> that would treat the triples of spatial coordinates  ⟨<i>x</i>, <i>y</i>, <i>z</i>⟩  as values for their own sake, dispensing with the scalar <i>w</i> and likewise dispensing with the whole idea that there were any imaginaries involved. And that, of course, is what set the stage for the <a href="https://fexpr.blogspot.com/2014/03/the-great-vectors-versus-quaternions.html">great debate of the 1890s</a>.
</p>
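<p>
For concreteness, here is a minimal sketch of quaternion arithmetic, encoding just the rules described above (the anti-commuting imaginaries, Hamilton's "tensor", the unique inverse, and the two divisions from the parenthetical earlier); the class name and layout are mine, nothing standard.
</p>
<pre>
# Minimal quaternions: q = w + x*i + y*j + z*k, with i*j = k = -j*i,
# j*k = i, k*i = j, and i*i = j*j = k*k = i*j*k = -1.
from math import sqrt

class Quat:
    def __init__(self, w, x, y, z):
        self.w, self.x, self.y, self.z = w, x, y, z
    def __mul__(self, q):  # Hamilton's product
        a, b, c, d = self.w, self.x, self.y, self.z
        e, f, g, h = q.w, q.x, q.y, q.z
        return Quat(a*e - b*f - c*g - d*h,
                    a*f + b*e + c*h - d*g,
                    a*g - b*h + c*e + d*f,
                    a*h + b*g - c*f + d*e)
    def tensor(self):  # Hamilton's "tensor": the length of the whole thing
        return sqrt(self.w**2 + self.x**2 + self.y**2 + self.z**2)
    def inverse(self):  # unique two-sided inverse of any non-zero quaternion
        n = self.tensor() ** 2
        return Quat(self.w/n, -self.x/n, -self.y/n, -self.z/n)
    def __repr__(self):
        return f"{self.w} + {self.x}i + {self.y}j + {self.z}k"

i, j, k = Quat(0,1,0,0), Quat(0,0,1,0), Quat(0,0,0,1)
print(i*j, j*i)         # k and -k: anti-commutation
print(i*j*k)            # -1 + 0i + 0j + 0k
a, b = Quat(1,2,3,4), Quat(2,0,1,0)
print(a * b.inverse())  # right-division a/b ...
print(b.inverse() * a)  # ... differs from left-division b\a
</pre>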
<p>
Notice, though, that in choosing vector analysis over quaternion analysis, what you are giving up is, straightforwardly, the scalar term. And scalars are what you need for longitudinal waves. (Heaviside, in his digression on the mathematics of longitudinal waves, seems to have gone about it by deriving a scalar from the vectors, quite a different approach from carrying a scalar term alongside each vector, but within the bounds of his chosen conceptual framework.) When modern investigators pursue the idea of longitudinal electromagnetic waves —rather outside the mainstream, but I see there was a 2019 conference with some of this sort of thing in Prague, under the title <a href="https://science21.cz/conference/?page_id=11"><i>Physics Beyond Relativity</i></a>— these investigators are apt to do so using a quaternionic generalization of Maxwell's equations. Maxwell himself, btw, never wrote his equations in the form we use today; they were cast into that form by Gibbs and Heaviside. Maxwell was a quaternion enthusiast, and used two forms: Cartesian coordinates for manual calculation, and quaternions for conceptual understanding. Maxwell died of cancer in 1879, at the age of 48 — eight months after <a href="https://en.wikipedia.org/wiki/William_Kingdon_Clifford">William Kingdon Clifford</a>, another quaternion enthusiast, died of tuberculosis aged 33, and about two years before Gibbs's system of vector analysis was first printed.
</p>
<p>
Fair warning: if you undertake to study the modern literature on longitudinal electromagnetic waves, brace yourself for some of that psychoceramic filtering I mentioned at the top of this post. Authors may vary, but those I've seen who seriously engage this topic tend to combine it with some variant of <a href="https://en.wikipedia.org/wiki/Nikola_Tesla">Nikola Tesla</a> power-transmission, which, though perhaps only moderately iffy in itself, is immediately adjacent to the full-blown conspiracy theory of Nikola Tesla limitless-free-energy, according to which the Powers That Be suppressed his limitless-free-energy invention because it would have undermined their monopoly on centralized power distribution and thereby their control over the general population. (If you enjoy a good conspiracy theory, there are some lovely ones about Tesla at <a href="https://www.conspiracies.net/nikola-tesla-conspiracy-theories/">conspiracies.net</a>; I'd warn you to turn off JavaScript and cookies before visiting that site because it's probably a trap to invade your privacy, but of course you already know the entire internet is such a trap... right? :p )
</p>
<span style="font-size: large;" id="sec-rqs-geo">Geometry</span>
<p>
When we say that the geometry in our thought-experiment is Euclidean, we're saying something about which distributions of particles, by location and velocity (or "position and momentum", however exactly one happens to frame it), are possible. If we're to derive quantum-equivalent discreteness from sonic booms of fast particles, this so-to-speak "Euclid-driven" distribution should be not merely coped with but positively <i>desirable</i>; otherwise we ought to question whether the particular premise of this thought experiment is really worth further pursuit after all. We've gotten a great deal of general insight in this post from pulling apart the different parts of our models (notably, particles and fields) and understanding how the different parts can be used in various theories; but as for this <i>particular</i> theory, that Euclid-driven distribution is its essence, and either we want that essence, or perhaps it's time we try shopping somewhere else.
</p>
<p>
Is there any sign of such a distribution in quantum mechanics? Overtly, there doesn't seem to be; the only opportunities for "tuning" in quantum mechanics seem to be far more global, like Planck's constant, the fine-structure constant, the gravitational constant, the somewhat-notorious <a href="http://en.wikipedia.org/wiki/Cosmological_constant">cosmological constant</a> (the sorts of things that come up in the so-called <a href="https://en.wikipedia.org/wiki/Dirac_large_numbers_hypothesis">Dirac large numbers hypothesis</a> — though it's a somewhat sobering indication of how hazardous this sort of investigation can be, that Dirac, with currently quite a good reputation in the physics community, gets the respectable front of this approach named after him, while in Wikipedia's article on Eddington, his particular pursuit of the approach is described as "almost numerological"; and before you say that's because of details of <i>how</i> they pursued it: yes, that's kind of the point). Similarly, in stochastic electrodynamics the only obvious parameter in the influence of the rest of the universe is the <i>size</i> of the zero-point field, i.e., Planck's constant again. If we're going to get anywhere with the specifically Euclid-driven approach, we need a place in the theory for more shaped input; either more customized to the particular systems we study (made up of slow particles), or at least more customized to the particular configurations of the fast-universe that affect our particular systems-of-interest.
</p>
<p>
My approach to term-rewriting physics is another example of attempting to assign a nontrivial structure to the non-local connections of the model; with, in that case, no initial expectation of anything remotely geometric about the "network" other than, perhaps, a vague suspicion there might be something rotational involved. Compare-and-contrast the <a href="http://en.wikipedia.org/wiki/Kaluza%E2%80%93Klein_theory">Kaluza–Klein</a>-style approach of string theory, in which a rotational element of this sort comes in through an overtly geometrical structure. Well into my exploration of the term-rewriting approach, I speculated on the possibility of emergent macrostructures that would, presumably, have some unimagined sort of dual "co-geometry", whereas string theory introduces new fine-grained local geometry, and the current post tries to impose a single geometry across the entire range of speeds (slow and fast). Details are especially lacking for the term-rewriting physics treatment, which starts with little notion of what sort of structure the network would have, other than broad inspiration from variable-binding patterns in vau-calculi; but the only obvious <i>function</i> of that unknown structure in the model is to pipe in seeming-nondeterminism, with no more specific purpose overtly indicated. It seems, though, that in both of my exploratory approaches I'm supposing there is <i>some</i> definite structure to the way the rest of the universe impinges on our system-of-interest to produce quantum phenomena. My intuition evidently favors this as a place for some of those missing gears that Einstein was talking about.
</p>
<p>
Suppose, then, we're trying to fashion a bit of machinery to fit into this gap in the theory where the overall structure of the network of non-local connections should go. What is this missing gizmo supposed to do? We've said, piping in nondeterminism, but there's little guidance in that. From the example of the Schrödinger equation, it seems that, if quantum mechanics represents the rest of the universe impinging non-locally on whatever system we're interested in, then the rest of the universe is acting as a sort of distorted mirror for whatever we're doing.
</p>
<p>
The very blankness of quantum mechanics on this point, the <i>lack</i> of any apparent overall structure to the non-local network, may itself be the shortcoming of the quantum theory that it systematically prevents us from asking about. Which might be an even more subtle sort of distraction, after all, than relativity offers; at least the speed of light has an obvious part to play in relativity, whereas this is just <i>not there</i>, something one doesn't think of because there's no reason one <i>would</i> think of it.
</p>
<p>
So perhaps in my next explorations into alternative physics, rather than using quantum mechanics and relativity as starting points from which to look for different ways to address the same things they address, I ought instead to be looking for macrostructure that's altogether outside their scope. Alternatively (sort of), it would be rather elegant if the shape of the interaction between macro- and microstructures were somehow responsible for the shape of the Schrödinger equation, the distorting lens that transforms classical systems-of-interest into their quantum forms; not that I've any idea atm how that would work.
</p>
<span style="font-size: large;" id="sec-rqs-field">Fields</span>
<p>
In background research whilst drafting this post, I stumbled on a whole alternative-science pocket community I hadn't encountered before, represented at the Prague conference linked earlier, <i>Physics Beyond Relativity</i>. The conference as a whole appears to have been a fairly eclectic collection of more-or-less-fringe science sharing a common negative theme of doubting some aspect or other of relativity; but, browsing the material, I gradually discerned a significant subset with a more positive unifying theme (perhaps following the conference organizers, who may have deliberately grouped similarly-themed talks near each other on the schedule).
</p>
<p>
Early on in perusing the conference site, I was rather bemused by the assertion that Lorentz contraction —which, the site pointedly remarks, has never been experimentally confirmed— was invented, following the <a href="https://en.wikipedia.org/wiki/Michelson%E2%80%93Morley_experiment">Michelson–Morley experiment</a>, to save the ether. There are several interesting points in that. The bit about experimental confirmation highlights the thorny general problem of what experiments do and don't confirm. The allusion to Michelson–Morley reminds me of a rant (seen many years ago, and which I cannot, alas, pin down atm) by a physicist wondering when and how Michelson–Morley had acquired a sort of mythological status, whereas their understanding of the history was that Michelson–Morley was just one amongst many elements that contributed to the consensus paradigm-shift in physics. But what really caused me to do a double-take was the bit about saving the ether. Say what? I was taught that the ether theory was <i>discredited</i> by Michelson–Morley, and this is pretty much what Wikipedia <a href="https://en.wikipedia.org/wiki/Aether_theories">says</a> about it (unsurprisingly, per Wikipedia's mainstream bias). The point about the ether turns out, I've concluded, to be rather central to that subset-unifying theme; but it took me a few steps to get there. (To be very clear, there were also plenty of talks at the conference indifferent re ether, and at least one openly advocating it.)
</p>
<p>
The point about experimental evidence for Lorentz contraction: It's actually quite hard to experimentally demonstrate Lorentz contraction, as on the face of it you'd have to accelerate some macroscopic object to near the speed of light and observe what it looks like — <i>from the side</i>, so e.g. accelerating a spaceship to near-light-speed going directly away from us wouldn't suffice. The only things we've accelerated that much are particles in particle accelerators, which might as well be point-particles. It was pointed out, though, that the particles themselves come in bunches, and if you treat one of those as a macroscopic object, you can see whether it contracts. Well, it doesn't. Experimentally observed. The official answer is, of course, that the individual particles are expected to contract, but the interval between them doesn't contract. Which, to me as a third party watching all this, rather demonstrates the principle that what an experiment demonstrates depends on your interpretation; the mainstream interpretation here doesn't seem glaringly unreasonable, but there may be an element of post-fact‍o justification in it; mainstream science is, somewhat by definition, heavily invested in concluding that what we're looking at is consistent with the prevailing paradigm. While these other scientists look at the same thing and see it as contradicting the paradigmatic prediction, which... isn't exactly <i>i‍m</i>plausible, either.
</p>
<p>
Another sub-theme at the conference is <a href="https://en.wikipedia.org/wiki/Weber_electrodynamics">Weber electrodynamics</a>, another bit of alternative science I'd somehow either never crossed paths with, or at least not so that it stuck with me; an alternative to the Maxwell electrodynamics thoroughly embraced by mainstream physics (occasioning another double-take on my part). Weber, like Maxwell though several years earlier, had gone about unifying a bunch of pre-existing equations to produce an overall description of the behavior of electrically charged particles; but whereas Maxwell synthesized a wave theory, Weber's starting point was <a href="https://en.wikipedia.org/wiki/Coulomb%27s_law">Coulomb's law</a> and, accordingly, his single equation describes a point-to-point force between two charged particles — with no field involved. Weber's generalization of Coulomb's law depends both on the distance between the particles, and on its first and second derivatives, the derivatives appearing in ordinarily negligible terms because they're divided by <i>c</i><sup>2</sup> — the square of the speed of light. In the structural terms discussed above, though, Weber's equation is structurally interesting in that it's <i>targeted</i>: it specifies exactly which particles are affected by the force, rather than describing a field which would then be expected to affect whatever particles happen to encounter it.
</p>
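<p>
For reference, a direct transcription of Weber's force law as it is usually stated in the modern literature; treat this as a sketch for orientation rather than an authoritative statement, and note the function name and the sample values are mine.
</p>
<pre>
# Weber's force between two charges: Coulomb's law corrected by terms
# in the first and second time-derivatives of the separation r, each
# correction suppressed by a factor of c**2.
EPS0 = 8.8541878128e-12  # vacuum permittivity, F/m
C    = 2.99792458e8      # speed of light, m/s
PI   = 3.141592653589793

def weber_force(q1, q2, r, rdot, rddot):
    """Magnitude of the force along the line between the two charges.

    r      separation (m)
    rdot   d r / d t (m/s), relative radial velocity
    rddot  d^2 r / d t^2 (m/s^2), relative radial acceleration
    """
    coulomb = q1 * q2 / (4 * PI * EPS0 * r**2)
    return coulomb * (1 - rdot**2 / (2 * C**2) + r * rddot / C**2)

# For everyday velocities the correction terms are negligible:
e = 1.602176634e-19  # elementary charge, C
print(weber_force(e, e, 1e-10, 0.0, 0.0))  # plain Coulomb force
print(weber_force(e, e, 1e-10, 1e3, 0.0))  # correction of order (1e3/c)^2
</pre>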
<p>
That dependence on the first and second derivatives, I admit to finding rather fascinating. For one thing, it puts me in mind, just a bit, of the acceleration-dependent <a href="https://en.wikipedia.org/wiki/Modified_Newtonian_dynamics">MOND</a> alternative to Newton's law of gravitation, which has been knocking around for several decades now as an alternative to the dark-matter hypothesis. (Admittedly, when it comes to it, I have trouble quite wrapping my head around the role of acceleration in Maxwell, which dives down some sort of <a href="https://en.wikipedia.org/wiki/Down_the_Rabbit_Hole">rabbit hole</a> to do with the metaphysics of the magnetic field; there's likely a whole blog post in that alone.)
</p>
<p>
Weber's equation occasions another example of alternative interpretations of the same observation. A peculiar property of the equation, criticized by <a href="https://en.wikipedia.org/wiki/Hermann_von_Helmholtz">Hermann von Helmholtz</a>, is that under certain circumstances it leads to effectively negative inertial mass. This involves <i>very</i> small distances — essentially, so it's suggested, the size of the nucleus of an atom. Negative inertial mass means attractive and repulsive forces swap places, with the implication that this could account for positively charged protons within an atomic nucleus not instantly blowing apart the nucleus. That's worth another double-take. Mainstream physics basically introduces a whole additional fundamental force (the strong nuclear force) to provide an attractive force to balance that terrific repulsive force between protons. The big-picture sense here is that our whole description of nature has been vastly complicated by using global ether-like fields rather than point-to-point forces.
</p>
<p>
It is, thereby, a fascinating exercise in the subjectivity of observation, to read in Wikipedia's article on Weber electrodynamics (linked above) the statement that "Despite various efforts, a velocity-dependent and/or acceleration-dependent correction to Coulomb's law has never been observed". Which is true only if you do not count, as observational evidence of such a correction, the observed fact that atomic nuclei exist. It's a bit sobering to reflect that the scientists on both sides (or however many sides this has) have such heavy psychological investments in their respective interpretations that most, if not all, of them are never going to budge without at the very least some truly extraordinary new development (and perhaps not even then).
</p>
<p>
A particular point raised by advocates of Weber, with interesting structural connections, is that Maxwell electrodynamics doesn't obey Newton's third law unless the electromagnetic field is treated as an object in itself; that is, the field can push a particle, but the only thing the particle pushes back on is the field itself. Another way of saying this is that various quantities are not conserved unless the field is included. Presumably, this would be true of any orthodox field carrying a force — including the gravitational field, with the expectation that relativity, predicting gravitational waves, would have this property as well. In fact, it seems that any <i>non-quantum</i> field would work this way, in general (although, as described earlier, classical fields are often treated as constants and thereby act but do not react). Note, though, that <i>quantum</i> waves are not subject to this objection, because of wave/particle duality: the wave does not distribute conserved quantities independent of particles, because when it actually interacts with something else, the wave itself <i>is</i> a particle. So all of this is tied in with quantization. With the heavy irony that this objection to relativity-style fields is contingent on these fields not being quantized. (Conversely, an objection to Weber electrodynamics, prominent in the Wikipedia article, is the existence of the phenomenon of <a href="https://en.wikipedia.org/wiki/Radiation_pressure">radiation pressure</a> — but this too appears to be an artifact of interpretation, in that the quantum strategy of substituting particles for waves should allow radiation pressure to be described in an ether-less manner as well.)
</p>
<p>
Compared to this radical rejection of fields in favor of pairwise interactions between particles, it makes sense that orthodox fields, even post-Michelson–Morley, would be perceived as ether theory. This, at any rate, I take to be the common tenet of that little alternative-science community, such as it is: that the basis of physics should be ether-free. My impression, from a sampling of talks at the conference, is that this subgroup have collectively a bunch of fragments of intuition that there's something there, and just need an insight to come along to bring it all together (which is what we're all looking for, of course, whichever area of theory-space we're looking in).
</p>
<p>
All of which highlights a point I failed to bring out, earlier, in discussing sonic booms: shockwaves occur in an energy-carrying medium. That is, any object moving through such a medium —in an orthodox (i.e., non-quantized) handling of such a situation— has to invest energy, and, therefore, it has to slow down. The loss of energy to field-drag would have to be allowed for in such a theory. Evidently, it would alter the shape of the probability distribution of fast-universe velocities, shifting them downward and eventually tending to push things from the fast-universe to the slow-universe. Oddly, it would also seem to cause even slow things to slow further, reminiscent of the Aristotelian view that <i>things in motion tend to come to rest</i>; one is also reminded of Heaviside's criticism of electromagnetic compression waves, per above, that they apparently do not occur in elastic bodies (presumably because if they did, we would expect each bounce of a rubber ball to produce an accompanying burst of electromagnetic radiation). One would then either have to provide some means to also shift velocities upward, or accept a model in which the universe necessarily runs down. The idea of siphoning disorder into the fast-universe, suggested earlier to support quantum entanglement, might possibly play into this. On the face of it, certainly, shockwaves aren't compatible with an ether-free theory.
</p>
<p>
A final thought on this. In my first physics post on this blog (back in 2012), I <a href="https://fexpr.blogspot.com/2012/12/metaclassical-physics.html#sec-mcphys-natstruct">said</a> that when a long succession of theories just keep getting more complicated, they may all share in common some wrong assumption that is diverting them into this downward spiral, and I particularly recommended basic physics as an area where this appeared to have been happening for the past century or so. In those terms, the common premise of that pocket community is that the ether metaphor is the wrong assumption that's got mainstream physics in a cul-de-sac.
</p>
<span style="font-size: large;" id="sec-rqs-where">Where to go from here</span>
<p>
The fast/slow particle hypothesis itself hasn't been hugely successful, though it has some interesting features. It allows relativity to be viewed as preventing something from being said, and on the quantum-theory side it stirs speculation on how to balance the correlations of entanglement by siphoning off disorder into the fast-universe.
</p>
<p>
Where the thought experiment has paid off spectacularly is on the "stereoscopic view of the terrain" side of things, where, as noted at the top of the post, we've turned up a welter of ideas to play with. Most of these have somehow to do with the role of fields in basic physical theories, which figures since the immediate thought experiment was mostly about omitting the parts of conventional theory that involve fields. Important distinctions were made between fields describing action versus fields describing reaction, and between fields describing universal forces such as electromagnetism (or presumably gravity, though the form of that in relativity is trickier) versus fields describing targeted information such as the wave function of a particular particle or entangled subsystem. The large-scale structure of non-local interactions was flagged for further study, which doesn't obviously relate to the fields aspect of things (though, once one has said that, one naturally starts wondering about it). Weber electrodynamics pops up as a field-free approach that pulls in velocity and acceleration, inviting possible structural comparison/contrast with Maxwell electrodynamics; moreover, the velocity/acceleration aspect is vaguely reminiscent of the acceleration-dependent force law of MOND, while the point-to-point feature brings to mind the targeted information of quantum wave functions. Both connections are tempting, hooking up Weber, or really any field-free approach, with either relativity <i>or</i> quantum mechanics.
</p>
<p>
Riffing for a moment: MOND and dark matter were both devised to explain why stars toward the outer edge of galaxies don't move as they were expected to under Newtonian gravitation. Both are <a href="http://www.catb.org/~esr/jargon/html/K/kludge.html">kludges</a>. The dark-matter hypothesis says, these stars don't move the way they ought to if they were being pulled only by the mass of the things we can see, so let's pretend there's lots of massive stuff we can't see that's distributed so as to produce the observed motion. It's necessary to hypothesize a huge amount of invisible mass. The MOND alternative, proposed in 1983, is to tweak Newton's second law ( <i>F</i> = <i>m</i> <i>a</i> ) as it applies to gravitation, so that it remains Newtonian for large accelerations but exceeds the Newtonian values for accelerations much below some small critical value <i>a</i><sub>0</sub>. Tampering with Newtonian dynamics doesn't bother me so much as having no intuition for why. If (for an example that readily comes to mind) the missing element were quantization of gravity, I'd guess the lower end of the curve might be attenuated rather than increased, akin to the way quantization attenuated predictions of black-body radiation (thereby forestalling the so-called "<a href="https://en.wikipedia.org/wiki/Ultraviolet_catastrophe">ultraviolet catastrophe</a>").
</p>
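<p>
For orientation, a sketch of how the MOND tweak plays out numerically. The particular interpolating function used here is one common convention (the so-called "simple" function), not canonical; the galaxy-scale mass and radii are merely illustrative.
</p>
<pre>
# MOND with the "simple" interpolating function mu(x) = x/(1+x):
# mu(a/a0) * a = a_N, which solves in closed form for a.
# a0 ~ 1.2e-10 m/s^2 is the usual fitted critical acceleration.
from math import sqrt

G  = 6.674e-11  # m^3 kg^-1 s^-2
A0 = 1.2e-10    # m/s^2

def newton_acc(M, r):
    return G * M / r**2

def mond_acc(M, r):
    aN = newton_acc(M, r)
    # Root of a**2 - aN*a - aN*A0 = 0, from mu(a/A0)*a = aN.
    return (aN + sqrt(aN**2 + 4 * aN * A0)) / 2

M_gal = 1e41  # a roughly galaxy-sized mass, kg
for r in (1e19, 1e20, 1e21):  # meters; the outer value is ~30 kpc
    vN = sqrt(newton_acc(M_gal, r) * r)  # circular speed, Newtonian
    vM = sqrt(mond_acc(M_gal, r) * r)    # circular speed, MOND
    print(f"r = {r:.0e} m: v_Newton = {vN/1e3:.0f} km/s, "
          f"v_MOND = {vM/1e3:.0f} km/s")
# The Newtonian speed falls off as 1/sqrt(r); the MOND speed flattens
# toward v**4 = G*M*A0 once the acceleration drops well below A0.
</pre>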
<p>
Speaking of which, how do we introduce quantum-ness into an ether-less theory à la Weber? Looking at a small system-of-interest, consisting of a few particles, we can certainly consider the force between each pair of particles of interest (though the number of pairs goes up as the square of the number of particles); but it seems a rather daunting prospect to consider, for each particle of interest, pairwise forces in relation to a practically infinite number of particles-not-individually-of-interest in the rest of the cosmos. Refining the question, then, how does the ether hypothesis impinge on quantum theory? Recall the Schrödinger equation,
<table style="display: inline-table; vertical-align:middle;"><tr>
<td>
<table><tr>
<td rowspan="2"><i>iℏ</i></td>
<td align="center" style="border-bottom:solid 1px">∂</td>
<td rowspan="2">Ψ</td></tr>
<tr><td>∂<i>t</i></td></tr>
</table>
</td>
<td>=</td>
<td><i>Ĥ</i></td>
<td>Ψ</td>
<td>.</td>
</tr></table>
There isn't any explicit ether-like field here; in fact, there's very little here at all (which is why it's such a nifty equation: simplicity). There is <i>time</i>, which may be awkward when relativity denies absolute time; but the shape of things appears to come in primarily because the whole equation is parameterized by the <i>classical</i> behavior of the system-of-interest, described by the Hamiltonian <i>Ĥ</i>. It isn't actually necessary for <i>Ĥ</i> to describe anything remotely ether-like, and physicists in tune with the technique could readily take an arbitrary non-geometrical <i>Ĥ</i> and run it through the Schrödinger equation to produce a quantum description. No, I don't know quite where to go with this; though I've wondered if an entire cosmos, when subjected to a Weber-style force law with certain properties, naturally gives rise to the abrupt, lurching behavior of quantization. These questions do, though, seem a credible jumping-off point for further exploration.
</p>
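<p>
To make that concrete, here's a sketch of doing exactly this numerically: feed an arbitrary Hermitian matrix, with nothing geometrical about it, through the Schrödinger equation. The random matrix is my stand-in for whatever non-geometrical <i>Ĥ</i> one might care to try.
</p>
<pre>
# Run an arbitrary Hamiltonian through the Schrodinger equation:
# i*hbar dPsi/dt = H Psi, solved spectrally as
# Psi(t) = V exp(-i*E*t/hbar) V_dagger Psi(0), where H = V E V_dagger.
import numpy as np

hbar = 1.0  # natural units
rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
H = (A + A.conj().T) / 2          # Hermitian, so evolution is unitary

evals, evecs = np.linalg.eigh(H)  # energy levels and eigenstates

def evolve(psi, t):
    """Psi(t) under the time-independent Schrodinger equation."""
    phases = np.exp(-1j * evals * t / hbar)
    return evecs @ (phases * (evecs.conj().T @ psi))

psi0 = np.zeros(4, dtype=complex)
psi0[0] = 1.0                     # start in an arbitrary basis state

for t in (0.0, 0.5, 2.0):
    probs = np.abs(evolve(psi0, t)) ** 2
    print(f"t={t}: {np.round(probs, 3)} (total {probs.sum():.3f})")
# Total probability stays 1 whatever H describes: unitary evolution.
</pre>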
<p>
If I were desperately trying to narrow things down to a single approach —which I realize is a common sort of desperation amongst scientists (normal or otherwise) when caught in a paradigm-crisis situation— all this welter of possible directions could be quite frustrating. Which, just at this moment, is another good reason to favor a more liberally wide-ranging attitude.
</p>
<span style="font-size: large;">Sapient storytelling</span> (2020-07-16)
<blockquote>
DOCTOR: You're improving, Harry.<br>
HARRY: Am I really?<br>
DOCTOR: Yes, your mind is beginning to work. It's entirely due to my influence, of course. You mustn't take any credit.
<blockquote>
— <i><a href="https://en.wikipedia.org/wiki/The_Ark_in_Space">The Ark in Space</a></i>, <i>Doctor Who</i>, 1975.
</blockquote>
</blockquote>
<p>
Here I mean to further develop my theories on the evolution of human culture, which has been a theme on this blog since <a href="https://fexpr.blogspot.com/2011/12/preface-to-homer.html">way back</a>. Most recently, I stretched my thinking on the subject through an exploration of the notoriously unorthodox ideas of <a href="https://fexpr.blogspot.com/2019/03/storytelling.html">Julian Jaynes</a>, emerging with a greatly sharpened focus on storytelling. This time I mean to take what I can, similarly, from Bruno Snell's <i><a href="https://en.wikipedia.org/wiki/Bruno_Snell">The Discovery of Mind</a></i> (<i>Die Entdeckung des Geistes</i>, 1946; English translation with some added material, 1953) and Eric Havelock's <i><a href="https://en.wikipedia.org/wiki/Eric_A._Havelock#Preface_to_Plato">Preface to Plato</a></i> (1963).
</p>
<p>
My theories (to recap) started with Havelock's core notion of a cultural phase shift from <i>orality</i> to <i>literacy</i> in Greece just before the time of Plato, circa 2500 years ago (<a href="https://en.wikipedia.org/wiki/Plato">Plato</a> was circa 2400 years ago; note, btw, I mostly rough out these ancient numbers-of-years-ago from the convenient figure 2000 CE, so my numbers are a shade low). From there, I extrapolated a still earlier phase of culture preceding orality, giving it the working name <i>verbality</i> and drawing inspiration for its character from <a href="https://en.wikipedia.org/wiki/Daniel_Everett">Daniel Everett</a>'s recent study of the <a href="https://en.wikipedia.org/wiki/Pirah%C3%A3_people">Pirahã</a> (<i><a href="https://en.wikipedia.org/wiki/Daniel_Everett#Don't_Sleep,_There_Are_Snakes:_Life_and_Language_in_the_Amazonian_Jungle">Don't Sleep, There Are Snakes: Life and Language in the Amazonian Jungle</a></i>, 2009). My seemingly-separate pursuits of <a href="https://fexpr.blogspot.com/2011/03/memetic-organisms.html">memetics</a> and <a href="https://fexpr.blogspot.com/2018/02/sapience-and-non-sapience.html">sapience</a> converged with this and thoroughly blended into it (leading me —not for the first time— to wonder if (a) I've been attracted to these things because my intuition, which I've always figured is a lot smarter than I am anyway, sensed they were connected; (b) my interests are all similar in character and therefore especially likely to have natural connections between them; (c) I've found connections between them because they were the things I was looking at; or (d) all of the above). And then I read Jaynes, because as readers of this blog might notice I favor using unorthodoxy as a mental limbering-up/perspective exercise, and, after reading him once, found him worth a second, in-depth reading to squeeze all the perspective out of it that I could; and Jaynes mentioned Snell.
</p>
<p>
My favored mode for these intermediate exploratory posts is to pull the appropriate material down from my (metaphorical) top shelf; spread it all out on my work bench; contemplate it, tinker with it, and develop some new ideas; and gather it up with perhaps some re-sorting and put it back up on the shelf till next time. In this case, I actually have quite a lot of previously developed material to pull down from the shelf, which I'm eager to write out in a clean copy following its extensive advancement provoked by Jaynes. It's always clearer what you've got, from one of these major explorations, after it's had time to settle out, leaving behind what would only become distracting artifacts of the exploration through which it formed.
</p>
<p>
Also as usual, I mean to leave a record of my explorations, including the tangents (here's a shout-out for tangents!), which in this case means showing how I've incrementally assembled understanding of the material as I've worked my way through the texts. Initial sections of this post are the state of my understanding before starting the studies of Snell and Havelock, and my accounts of those studies themselves are broadly time-ordered, as far as possible, recording which insights were derived at which parts of the source material; though in some cases I tweak the phrasing of these intermediate thoughts to avoid confusion from where things end up later.
</p>
<p>
I do feel I've come out of this with far greater understanding of the non-self-conscious Homeric mindset, and of the Havelockian conceptions of oral poetry and the literate mind; with some new insight possibly to be had, from the Havelockian conceptions, into the role of tense in the emergence of modern consciousness. The really unexpected outcome here has been a radical reevaluation of Socrates and Plato, whom tbh I'd tended to think of as dry academics, as instead —maybe— front-line fighters in a social revolution of immediate relevance to the internet age.
</p>
<blockquote>
<b>Contents</b><br>
<a href="#sec-sst-mind">Mind and language</a><br>
<a href="#sec-sst-time">Timeline</a><br>
<a href="#sec-sst-hall">Hallucination</a><br>
<a href="#sec-sst-snell">Snell</a><br>
<a href="#sec-sst-have">Havelock</a><br>
<a href="#sec-sst-evol">Evolution</a><br>
<a href="#sec-sst-op">Opinion</a><br>
</blockquote>
<span style="font-size: large;" id="sec-sst-mind">Mind and language</span>
<p>
There seems to be a consensus of modern authors I've read on the subject —though I could be stuck in a clique of authors who refer me to each other— that language is the unique feature of human evolution that sets us apart from other animals. This consensus view takes to its unreasonably literal extreme, so it seems to me, the insight that individual people are "nothing" without the culture they learn from their community and without the language through which that culture is communicated and embodied; so that we-as-individuals become unimportant except for our somehow having this ability to use language and thus allowing the collective to extend itself through us. <i>Homo linguist</i>, as it were. Having thus dismissed individual sapience as merely a consequence of our language use, one must then ask where on Earth our ability to use language comes from. A particularly popular answer in this context is the Chomskyist <a href="https://en.wikibooks.org/wiki/Conlang/Advanced/Grammar#The_universal_grammar_controversy">universal grammar</a> hypothesis, which says we've got a specialized language-processing unit hardwired into our brains; a natural conclusion, perhaps —that we'd need help from a peripheral language-processing device— once one dismisses the possibility we might actually be smart on our own.
</p>
<p>
I don't buy it.
</p>
<p>
Not that language isn't great and all that. (I'm a <a href="https://fexpr.blogspot.com/2018/09/discourse-and-language.html">big fan</a>.) But I don't think it's the driving force. It seems to me much simpler to suppose —though it may be dreadfully old-fashioned— that we are actually smart, that there is something extraordinary that goes on in our brains that doesn't happen (readily nor often) in the brains of other animals. And that <i>that</i> is where language comes from. There's no mystery in how we can understand language because its form is already a consequence of the character of our minds; our languages are the sorts of languages <i>we</i> would create if we'd been the ones who created them, because, well, we <i>are</i> the ones who created them. No need for some special encoding/decoding system, between sapients; the unadorned wetware already does exactly what's needed because what's needed is to pass the uncoded message directly from one mind to the next. I figure language is a natural consequence of having a high-enough density of similar-enough sapiences communicating with each other to make the development of language worthwhile; then just give it lots of time to evolve.
</p>
<p>
Honestly, it seems rather awkward otherwise to account for anyone ever having a clever idea that nobody else has had before. I don't find it credible that such things are all just a matter of certain ideas meeting up with each other and recombining; however they got together, they'd just sit there inertly if there weren't something to the individual host mind in which they met. And, given that there has to be some sort of thought engine there at all, suppose it's a powerful engine and I see no call for anything more than that; don't bother with a fancy interface unit to code and decode, just hook up a transmitter and receiver and let the things resonate with each other directly.
</p>
<p>
This view, that the uniqueness of the human mind is something about how it processes information, is perhaps not surprising from a traditional computer scientist, brought up on algorithms (i.e., rules for how to process information). Acknowledging the role of individual information processing is, afaics, rather atypical of recent memeticists, as memetics emphasizes evolution of ideas from the ideas' perspective, traditionally downplaying the role of the individual host minds in favor of the contagion from one mind to another — which, on the face of it, ought to spread from mind to mind through the vehicle of language. I'm finding, though, that once I've taken an algorithmic position and worked out a few simple theories about the algorithms involved, this modest catalyst to the memetic theory has a gratifying stabilizing and simplifying influence on my treatment of memetic evolution.
</p>
<p>
As a case in point, subordination of language to sapience takes the sting out of that gadfly of modern linguistics called the <a href="https://en.wikibooks.org/wiki/Conlang/FAQ#Sapir-Whorf">Sapir–Whorf hypothesis</a>: the theory that (in strong form) how your language works determines how you think. If the underlying sapient mind gives rise to language, which is then the medium by which ideas get from mind to mind, one expects a particular language to carry bias toward the kind of thought structure that was put into it —because, after all, the thoughts it induces should be approximately those put into it— and then the individual receiving mind, having been influenced by that bias, may (as we strive to do, when "keeping an open mind") shake somewhat loose from its conditioned assumptions, and project whatever new thoughts it may pull from the wellspring of its individual sapience, into its outgoing language; not only expressing new meaning, but bending the language in the process, as, say, nudging word nuances and grammatical structures, or in the extreme coining new words or innovating grammar.
</p>
<p>
My model of mind isn't elaborate; really, <i>having</i> a model of mind beneath the level of language seems half the battle. I'll introduce the main features of my model at the start of my timeline of events (just below).
</p>
<p>
Btw, when I suggest we're actually smart, <i>how</i> smart do I mean? Generally, that is; not getting hung up on individual variation (and setting aside individual fallibility, which is not to the point of what I mean by <i>smart</i>; cf. Thoreau's "<a href="https://www.dictionary.com/browse/a-foolish-consistency-is-the-hobgoblin-of-little-minds">hobgoblin of little minds</a>"). I've proposed in a <a href="https://fexpr.blogspot.com/2018/06/sapience-and-limits-of-formal-reasoning.html">past post</a> that sapience is qualitatively more powerful than formal reasoning — that is, more powerful <i>in practice</i>, as asking for it to be more powerful "in theory" would be surrendering the field to formal reasoning from the start. My recommendation on how much more powerful (offered in a post on <a href="https://fexpr.blogspot.com/2014/01/lamlosuo.html">conlanging</a>) has been that, similarly to the way formal computational power has an apparent "most powerful" class of Turing-powerful models able to simulate each other, there may be a most-powerful class to which general sapiences —such as humans— belong, that in-principle would be able to understand each other; notwithstanding the extraordinary practical difficulties of understanding each other in particular instances. (As I remarked in the conlanging post, we still aren't even sure whether or not <a href="https://en.wikipedia.org/wiki/rongorongo">rongorongo</a> <i>is</i> writing, let alone what it means.)
</p>
<span style="font-size: large;" id="sec-sst-time">Timeline</span>
<p>
The basic elements of my model of mind, I believe we share more-or-less in common with a very wide range of animals who are clearly nowhere near sapient. Including the very affectionate house cat who has kept me company throughout this and a great many preceding blog posts.
</p>
<p>
To start, I approximate the human mind as a whole as a vast population of <a href="https://en.wikipedia.org/wiki/Multi-agent_system">agents</a>, each representing a thought; with a central register designating about <a href="https://en.wikipedia.org/wiki/The_Magical_Number_Seven,_Plus_or_Minus_Two">seven-plus-or-minus-two</a> of these as current foci of attention. The central register, which I'm supposing is our short-term memory (hence the size, seven plus or minus two), I may also refer to as the "non-Cartesian theater" (or simply the <i>stage</i>), in contrast to a favorite debunking target of Daniel Dennett, the <a href="https://en.wikipedia.org/wiki/Cartesian_theater">Cartesian theater</a> (a major target of his 1991 book <i><a href="https://en.wikipedia.org/wiki/Consciousness_Explained">Consciousness Explained</a></i>); the point being that the key objections directed at the Cartesian theater are really objections to the monolithic Cartesian <i>audience</i> of the theater, which is not a problem when the audience is a massively distributed sea of agents.
</p>
<p>
The size of the theater is an obvious parameter of variation; one would think the house cat would have a very much smaller theater, and one might guess that most animals would.
</p>
<p>
The second component of my model of mind, which I reckon logically dependent on the first, I call the <i>self-loom</i>, which constructs a memory of the individual's history, based primarily I suppose on the pattern of occupants of the stage. I'm mentioning it separately as it seems logically possible to have the stage without using it as a frame for a loom (the smaller the stage, the more plausible this seems, to me — fewer threads means less for a loom to weave, after all), but the loom figures prominently in my theories as responsible for the curious phenomenon of <i>dreaming</i> — and like so many pet owners, I perceive from twitches and vocalizations while it's asleep that our house cat dreams. So all this mental machinery I'm describing, which is as much of the mind as I can sketch internal structure for atm, we share with probably most advanced mammals at the very least.
</p>
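<p>
A deliberately crude toy rendering of the model so far, just to fix the moving parts in place; every name and number in it is my own placeholder, not a claim about actual minds.
</p>
<pre>
# Toy model of mind: a sea of thought-agents, a bounded "stage" of
# roughly seven foci of attention, and a "self-loom" weaving a history
# out of successive stage contents. Purely illustrative.
import random

STAGE_SIZE = 7  # seven, plus or minus two

agents = [f"thought-{n}" for n in range(1000)]  # the vast population
stage = []   # the non-Cartesian theater: current foci of attention
loom = []    # the self-loom: remembered history of the stage

random.seed(1)
for step in range(5):
    # Some agents clamor for attention; the stage holds only so many.
    bidders = random.sample(agents, 20)
    stage = (bidders + stage)[:STAGE_SIZE]
    # The loom records the pattern of occupants, weaving a personal past.
    loom.append(list(stage))

print("stage now:", stage)
print("loom depth:", len(loom))
</pre>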
<p>
This shared level of mentality changes at the first point on my proposed timeline, the onset of the <a href="https://en.wikipedia.org/wiki/Paleolithic">Paleolithic</a>, about three million years ago (Wikipedia last I checked claimed 3.3 million). I figure this is when humans became sapient. I've blogged <a href="https://fexpr.blogspot.com/2015/10/natural-intelligence.html">before</a> on the evolutionary intricacies involved, which appear to make it an extremely unlikely evolutionary development, but I blogged about it in the abstract, deliberately keeping that discussion independent of just what the development was that took place then. I'm now identifying it as the facile ability to construct <i>symbolic thoughts</i> (cf. Terrence Deacon's 1997 book <i><a href="https://en.wikipedia.org/wiki/The_Symbolic_Species">The Symbolic Species</a></i>); thus identifying it, in my model, as a variation in agent formation, an activity implied by but undescribed in my sketch of the mind.
</p>
<p>
Language, as a product of sapience, would also have emerged at that time. I'm aware this event may be placed a bit later. Two million years out, say, rather than three; or even later, as there's still plenty of room in this part of the timeline. The rest of my theories wouldn't be perturbed even if the onset of sapience/language were as recent as, say, a hundred thousand years ago (although in my second post on Jaynes, I offered reason to suspect sapience would precede the divergence between <i>H. neanderthalensis</i> and <i>H. sapiens</i>, circa four hundred thousand years ago).
</p>
<p>
Most of the following three million years, through the entire <a href="https://en.wikipedia.org/wiki/Lower_Paleolithic">Lower Paleolithic</a> and <a href="https://en.wikipedia.org/wiki/Middle_Paleolithic">Middle Paleolithic</a>, I suppose occupied by verbality, that stage of human culture and language that precedes orality. Although the Pirahã presumably cannot be <i>typical</i> of such a period, because after all they are extraordinary in even existing long after the age of verbality has passed, there are several characteristics of the Pirahã I feel should apply to all verbality: their culture has no art and no storytelling; and their language has, on one hand, no time vocabulary nor tense, and on the other hand, lacks —<i>tremendously</i> controversially— a technical property called <a href="https://en.wikipedia.org/wiki/Recursion#In_language">recursion</a>.
</p>
<p>
Recursion might be a thoroughly obscure technical property of languages, were it not that its reputation was built up particularly by no less than <a href="https://en.wikipedia.org/wiki/Noam_Chomsky">Noam Chomsky</a> — I think I may fairly describe him as, reputationally, the greatest linguist alive today. Recursion is the property that a phrase, of some general sentence-like class, can contain embedded within it a subphrase of that same class. Chomsky had this pegged as a distinctive universal characteristic of human language, thus tightly linked with the supposed universal grammar instinct; and then Daniel Everett reported that the Pirahã language isn't recursive. With the result that Professor Everett has now had the unusual, if dubious, honor of being called a "pure charlatan" by, reputationally, the greatest linguist alive today.
</p>
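<p>
To make the technical property concrete, here is a toy sketch in Python. The grammar and vocabulary are invented by me purely for illustration (no claim about actual Pirahã or English syntax): a recursive rule lets a sentence embed a whole sentence of the same class, to any depth, while the base case by itself never can.
</p>
<pre>
# Toy illustration of grammatical recursion: a phrase of a general
# sentence-like class containing, embedded within it, a subphrase of
# that same class.  (Grammar and word lists invented for illustration.)

import random

def noun():
    return random.choice(["the bard", "the cat", "the hunter"])

def simple_sentence():
    # Non-recursive base case: subject and verb, no embedding.
    return noun() + " " + random.choice(["sleeps", "sings", "returns"])

def sentence(depth):
    if depth == 0:
        return simple_sentence()
    # Recursive case: a sentence embedding another sentence of the
    # same class -- the structural signature of reported speech.
    return noun() + " says that " + sentence(depth - 1)

print(sentence(0))  # e.g. "the cat sleeps"
print(sentence(2))  # e.g. "the bard says that the hunter says that the cat sings"
</pre>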
<p>
From my perspective, since language is a projection of sapient thought, the question becomes: what sort of <i>thinking</i> projects in the form of recursion? The answer, I submit, is meta-level use of thought, which projects as meta-level use of language. In other words, and more pointedly, <b>recursion is the property your language has if you're a storyteller.</b> (In the earlier post on my second reading of Jaynes, I called this "thinking about thinking", a seductively symmetric phrase that gets increasingly awkward the more closely one considers its semantics; during the explorations below, especially Havelock, it will become quite untenable.)
</p>
<p>
I therefore reckon that the introduction of recursion into language marks the beginning of storytelling and thus the transition from verbality to orality. I expect this to produce dramatic effects: the emergence of art, and an explosive acceleration in the development of new technologies. Both of which we observe at the onset of the <a href="https://en.wikipedia.org/wiki/Upper_Paleolithic">Upper Paleolithic</a>, which is variously dated anywhere from forty to seventy thousand years ago (Wikipedia, last I saw, claimed fifty); I've been using the admittedly nominal figure of forty. So I place the introduction of recursion then; likewise art and storytelling. Not tense, though; that comes later.
</p>
<p>
Interestingly enough, Pirahã, though lacking time vocabulary and verb tense, does have verb <a href="https://en.wikipedia.org/wiki/Grammatical_aspect">aspect</a> (the property dealing with state-of-completion, repetitivity, and such). The reconstructed <a href="https://en.wikipedia.org/wiki/Proto-Indo-European_language">Proto-Indo-European</a> (PIE) language, generally believed atm to have been extant roughly 6500 to 4500 years ago, in its earliest phase appears also to have had aspect without tense. I'm unsure where to put the onset of aspect, on my timeline; even whether to place it before or after the onset of orality, since the presence of aspect in a modern anomalous non-recursive language doesn't necessarily imply aspect must originally have developed before recursion.
</p>
<p>
The next steps on my timeline are the introduction of past tense, which seems to have been toward the early side of the PIE range, and future tense, which apparently post-dates PIE. I suspect the introduction of tense, and the sort of thinking about time that underlies it, is a major part of the shift of mentality Julian Jaynes was picking up on in his perception of the breakdown of the bicameral mind during roughly this period.
</p>
<p>
After tense, the final anchor point of my timeline —other than, of course, the present day— is the transition from orality to literacy, which I inherit from Havelock. I understand this to be (amongst, perhaps, other things) a profound change of attitude toward fact, resulting, at least in part, from a shift in the perceived basis of fact from oral traditions to written records which are inherently more stable (and we're now shifting partway back, as electronic records are, in some important ways, no more stable than oral traditions). Evidently Jaynes, Havelock, <i>everyone</i> sees that something extraordinary happened to the Greek mindset in the centuries leading up to Plato, with differences of interpretation as to what it was.
</p>
<span style="font-size: large;" id="sec-sst-hall">Hallucination</span>
<p>
Under the theories I've got going here, a human mind trained in a different stage of development of storytelling doesn't "work" fundamentally differently —it's all based on the stage, and the loom, and the idea-synthesizer— but it does think different thoughts. Trying to study such people, notably through their language (be it written or reconstructed), raises a formidable challenge of trying to <i>think like them</i>, which is after all the function of language: to induce thoughts in someone else or, in this case, to reproduce someone else's thoughts in oneself. It's not just that the meanings of words have changed, or that idioms and metaphors and the divisions between concepts have changed; with a different phase of storytelling, the nature of the conceptualizing process has changed.
</p>
<p>
It's also possible something else has changed, at least a bit, which may relate to the —objectively, rather bizarre— main thesis of Julian Jaynes; and this I would like to take a moment to consider, as at this late date any possible insight into what the world looked like to someone three, or six, or nine thousand years ago is not to be lightly dismissed. I'll suggest how to mitigate Jaynes's rather hard-edged theory.
</p>
<p>
Okay, if you haven't heard about Julian Jaynes, I hope my efforts at an even-handed presentation of his ideas don't leach all the mischievous fun out of them. Julian Jaynes's proposal was that during what I'd call the early part of the PIE era, the human mind functioned along radically different lines than it does now. Naturally I'm not all-in on that, because I maintain the basic structure of the mind has been substantially stable for millions, perhaps tens or hundreds of millions, of years. What gives his proposal a certain edge in notoriety is the form of the envisioned bicameral mind. He says that in the bicameral mind, the "left" (i.e., dominant) brain hemisphere handled the details of following directions generated by the "right" hemisphere. And these directions manifested themselves in the form of hallucinated gods. So that when human characters in the <i><a href="https://en.wikipedia.org/wiki/Iliad">Iliad</a></i> —Achilles and Agamemnon and such— were told what to do by various gods, that's not a metaphor, or poetic license; it's an accurate description of the sort of thing people in that era experienced, which was hallucinatory in nature.
</p>
<p>
I do think Jaynes miscalculated on the extremity of this effect, probably because he was basing his guess at the bicameral mind on modern clinical studies of extreme schizophrenia — a psychiatric disorder in which the patient is debilitated by hallucinations. Studying extreme cases can sometimes carry this liability, of perspective framed in terms of extremes. I'm particularly interested in whether the bicameral effect has to be as extreme as Jaynes depicts because Jaynes was led, apparently by his expectation of an extreme effect, to propose a correspondingly radical rearrangement of the underlying architecture of the mind during the breakdown of bicamerality, whereas I am proposing a substantially stable underlying architecture.
</p>
<p>
We were, however, asking what these earlier phases of storytelling looked like from the inside, and Jaynes notes some fairly straightforward descriptions of gods directing people in those times. The gods in the Iliad being a prime example. Another classic instance is the stone monumental depiction of ancient Babylonian king <a href="https://en.wikipedia.org/wiki/Hammurabi">Hammurabi</a>, of law-giving fame, in which he stands diligently listening before his enthroned god, receiving (so Jaynes interprets the scene) dictation of the laws. That was about 3750 years ago. Jaynes contrasts it with the monumental depiction, from about 500 years later, of Assyrian king <a href="https://en.wikipedia.org/wiki/Tukulti-Ninurta_I">Tukulti-Ninurta I</a>, kneeling rather than standing before the throne of his god (so, again, Jaynes reads it) — and the throne is empty. Which Jaynes proposes as a very straightforward depiction that the gods have ceased to appear.
</p>
<p>
My particular suggestion to mitigate Jaynes's scenario is, simply, that hallucinations don't have to be debilitating. Think of it this way. Schizophrenia patients describe having their whole awareness disrupted and overwhelmed by hallucinations, so that they're effectively prevented from completing thoughts for themselves. The accounts are harrowing. But an unwelcome actual, unhallucinated voice speaking to you can disrupt and prevent you from thinking, too. Whereas we can and do juggle multiple conversations with different people at once when we're in the groove of the situation. (I remember once hearing someone claim this as a peculiarity of home life in a certain subculture, and thinking, that's not just one subculture; that's normal for humans, but we underestimate how much alike we are, as well as how different, because each of us only gets to be one person.) So, with the right conceptual framework, it should be possible for personified gods to be at once both hallucinations <i>and</i> just aspects of the world. They don't have to be the harrowing experience described by modern schizophrenics, whose condition is both pathological and not well supported by the current phase of storytelling. Jaynes does discuss, extensively, the importance of teaching people what to expect of these sorts of phenomena, noting for instance that what happens to people when hypnotized has changed over the decades with social expectations of what will happen.
</p>
<p>
We have non-pathological hallucinations even now; religious visions are a notable type. The idea that these are, in fact, overwhelming, all-consuming experiences seems to be part of our <a href="https://en.wikipedia.org/wiki/Collective_unconscious">collective unconscious</a> (while the idea that an overwhelming, all-consuming hallucination isn't pathological if it's religious, is part of our social conditioning — but that's a whole different can of worms). However, being taught to expect overwhelming experiences may, indeed, <i>cause</i> them to overwhelm. We also, on consideration, have a paradigm of spirits, or imaginary friends, who keep us company and converse with us on a more equal basis.
</p>
<span style="font-size: large;" id="sec-sst-snell">Snell</span>
<p>
Snell's book uses <a href="https://en.wikipedia.org/wiki/Philology">philology</a> to support an account of the development of the modern European conception of the human mind over the interval from <a href="https://en.wikipedia.org/wiki/Homer">Homer</a>, circa 2700 years ago, through roughly the time of <a href="https://en.wikipedia.org/wiki/Aristophanes">Aristophanes</a> (and Plato), circa 2400 years ago, with some mention of authors as late as <a href="https://en.wikipedia.org/wiki/Ovid">Ovid</a> (circa 2000 years ago) and cameos from substantially modern authors (within the past two or three centuries).
</p>
<p>
All Jaynes says about Snell's analysis is that "our conclusions [...] are quite different."
</p>
<p>
Snell's treatment is a series of studies, always meant, he says in his introduction, to be a coherent collection. Working my way through the book, what most struck me about his writing style —I say this non-judgmentally— was that he mostly doesn't explain what, if anything, is his <i>point</i>. In each study, he starts saying things about his subject, goes on saying things about it for a while, then stops. This is not how I write, on this blog; I strongly favor a style one might call "least surprise", in which I strive to make it always clear where I'm going, and where I end up. On closer examination, though, there's more to it than saying what you're doing; there's a difference in what we're doing. In my explorations, I always <i>am</i> somewhere. There's a sort of singular focus to my ramblings that is never quite there in Snell. Of course each of Snell's sentences is saying one specific thing, and as one scales up there is still a broad flow, but it's more like narrating the flow of many objects down a river, describing objects that catch his eye as they pass, rather than following one object at a time as it weaves amongst the others. His writing style diffuses outward, while mine converges inward. It took me once through the book to begin to get a feel for what he was doing. My notes on that once-through reading were spotty. So when I finished, I read the whole thing a second time, from the top. Some of my first impressions seem significant —you only get to see something for the first time <i>once</i>— and I've tried to preserve those here; but I've filled in gaps and revised, leaving these notes considerably longer than their first draft.
</p>
<p>
He does, in fact, give a highly compressed accounting of the themes of the various essays, toward the end of his introduction (though not quite one-to-one, and excepting Chapter 7 which was added after the introduction was written). On my second reading, this accounting was important guidance for understanding the individual essays. On the first reading it failed to register on me, likely because I expected the theme of each individual essay would be more fully stated in that essay. This was, I confess, not the first time I've botched my reading of a book because I had clear, and mistaken, expectations of what sort of book I was getting into; saving readers of my blog from this sort of frustration is part of the motive for my least-surprise blog style. Turning this meta-lesson back to the content, there's an echo in it of Jaynes's remarks on the importance of preparatory conditioning in how one experiences things. When a story begins, say, "Once upon a time," if we expect a happily-ever-after sort of fairy tale to follow, it may be disorienting to find ourselves instead in a tale of the much darker <a href="https://en.wikipedia.org/wiki/Brothers_Grimm#Themes_and_analysis">Brothers Grimm variety</a>. This should provide some sort of insight into the evolution of storytelling.
</p>
<p>
The introduction painstakingly analyzes the problem of describing the development of a way of describing something whose shape is determined by its description. This seems to me to make it all more difficult than it needs to be. The tangle, which Snell struggles to sort out, is <i>created</i> by supposing that words actively participate in thought. It's all very well to warn (as Snell does) against applying a modern conception of mind to the study of ancient people who likely did not share that conception, but the warning is rather blunted when the alternative is a circular definition. My approach has no circularity problem. The modernly conceived mind, for me, is a fiction to which we ascribe the acts of the sapience engine; like a character in a historical drama, it is invented after the past events in which it pretends to participate. It doesn't actually <i>think</i>; that's an illusion. It is only <i>thought about</i>, and thereby affects our thinking in the same way that anything we think about affects our thinking. It's just another idea, a memetic structure that has evolved over time, as did whatever ancient notion preceded it.
</p>
<p>
As Snell works his way cautiously, in his introduction, through a discussion of the perplexing difficulties of translation caused by differing conceptions of mind, I find that much of what he says remains true under my different approach to the subject. Though I'm struck by the irony that what Snell describes with reference to our attempts to understand Homer, also applies —if on a smaller scale— to my attempts to understand Snell. On the positive side, it should follow that my study of Snell can serve to some degree as <i>practice</i> for studying more remote mindsets.
</p>
<p>
In Chapter 1, <i>Homer's View of Man</i>, Snell observes that Homer had many words for <i>fragments</i> of things that later Greeks would have a word for as a whole. He details Homer's various words for different ways of looking (wistfully, inquisitively, etc.), despite lack of a word for the general act of looking that they all share in common. Then he explicates in detail that Homer has no word for a living body but instead uses words for various parts of the body, and notes one could equally well say that Homer's Greeks obviously <i>had</i> bodies but did not know them as bodies, or that they "did not have a body in the modern sense of the word" [p. 8]; by which Snell evidently prepares for a similar discussion of words related to mind.
</p>
<p>
The fragmentation effect may relate to my interest in verb tense, which is about an individual reference point in time. (It will also relate, later in this post, to an observation of Havelock about plurality versus unity.)
</p>
<p>
Snell describes the situation in language-driven terms. These abstract concepts absent from Homer become obvious once there is a word for them, he says; whereas I would say of the same phenomenon that once the concept is there, a word appears for it and becomes the means by which it propagates easily and becomes ensconced in the culture. He even quite naturally presents <i>language</i> as an autonomous actor (supposing the translation from German preserves this nuance): "[...] as if language aims progressively to express the essence of an act, but is at first unable to comprehend it because it is a function, and as such neither tangibly apparent nor associated with certain unambiguous emotions." [p. 7]
</p>
<p>
Snell's discussion of words related to mind, in which he finds Homer's view of people fragmented mentally as well as physically, takes up more than half the chapter. Along the way, he notes particularly that he doesn't find it useful to analyze the situation in terms of <i>concrete</i>-versus-<i>abstract</i>, but rather in terms of <i>organ</i>-versus-<i>function</i>. Earlier words, he says, are essentially organs rather than functions. An interesting point since Jaynes repeatedly stressed the more traditional concrete-rather-than-abstract character of early writings. (With an eye to how Snell's organ-versus-function will play with Havelock's treatment, below, it seems worth emphasizing that organ-versus-function does not <i>conflict</i> with concrete-versus-abstract; functions are abstract; but organ-versus-function says more about the content than does concrete-versus-abstract.)
</p>
<p>
In the last paragraph of Chapter 1, Snell abruptly makes bold statements about wizardry and magic, which he says are obvious predecessors of the view of humans presented by Homer. The <a href="https://en.wikipedia.org/wiki/Olympian_gods">Olympian gods</a> are a replacement for this, an alternative to people being subject to the whims of wizardry and magic. He'd been proceeding so cautiously up till then, I was taken quite by surprise. This supposition about what comes before Homer seems just as arbitrary as Jaynes's — or mine, for that matter. Everybody wants to figure out what preceded Homer, and gets there (initially, at least) by guesswork; alas, here Snell doesn't acknowledge when he's guessing. At any rate, in this light, Jaynes's remark that "our conclusions [...] are quite different" —taking "conclusions" to mean <i>theories</i>— makes a lot more sense.
</p>
<p>
Chapter 2 is about the Olympian gods, which Snell says were an achievement <i>by</i> Homer (in, perhaps, an <a href="https://en.wikipedia.org/wiki/Homeric_Question">extended sense</a>); or, more precisely, he cites <a href="https://en.wikipedia.org/wiki/Herodotus">Herodotus</a> (circa 2450 years ago; rough contemporary of <a href="https://en.wikipedia.org/wiki/Socrates">Socrates</a>, circa 250 years after Homer — and, btw, after Homer's contemporary <a href="https://en.wikipedia.org/wiki/Hesiod">Hesiod</a>): "Herodotus, himself born in the land of these poems, testifies that Homer and Hesiod presented the Greeks with their gods." [p. 37]
</p>
<p>
What <i>preceded</i> the Olympian gods in Greece? Snell hovers near to this question throughout the chapter, but does not seem imho to seriously consider anything other than superstition and magic. His ending to the previous chapter was evidently meant to lead in to this. It feels, to me, rather presumptuous of him (though possibly he would consider himself less committed than I interpret); the closest I see to evidence is some remarks [p. 35] on the Titanic gods overthrown by the Olympians. I'm struck, btw, that Snell, remarking on the absence of the supposed pre-Olympian religion in Homer, says the omission must be <i>deliberate</i> — contrasting with his remark in his introduction [p. ix] that if Homer doesn't mention something we can deduce he didn't know about it. I remarked this about Jaynes, too, that he interpreted absence from Homer as either evidence of absence or as deliberate omission depending on what fit his thesis.
</p>
<p>
The Homeric mindset, as Snell describes it, does not have a notion of the individual person as a source of motivation, but has instead the pantheon of Olympian gods who provide motivations. (Snell quotes <a href="https://en.wikipedia.org/wiki/Johann_Wolfgang_von_Goethe">Goethe</a>: "The god to whom a man proves devout, that is his own soul turned inside out." [p. 31]) The Olympian gods do not, Snell notes, generally command (as the Christian god is wont to do [p. 29 endnote]), but rather offer advice to be followed or not at the person's choice. This strikes me as a significant contrast with Jaynes, who emphasizes the obedience rather than the choice; recall my remarks (<a href="#sec-sst-hall">above</a>) on the difference between overwhelming hallucinated gods, and conversing hallucinated companions. I have a particular interest in the difference this implies between the paths conceptual evolution has followed in different cultures, rather than exclusively the particular path it did follow in ancient Greece; though of course one can't judge Snell too harshly for not addressing that more, since his particular interest is, indeed, the particular path followed in ancient Greece.
</p>
<p>
Chapter 3 is about lyric poetry, which, Snell says, in Greece rather than in Europe as a whole, chronologically followed epics and preceded drama. A key change from epics to lyric poetry, and other arts at the same time, is the introduction of an author. Snell includes in the form, on stylistic grounds, poetry accompanied by flute rather than lyre, which he calls "personal lyric"; taking as his focus three authors: <a href="https://en.wikipedia.org/wiki/Archilochus">Archilochus</a>, circa 2700 years ago; <a href="https://en.wikipedia.org/wiki/Sappho">Sappho</a>, circa 2600; and <a href="https://en.wikipedia.org/wiki/Anacreon">Anacreon</a>, circa 2500. Personal experience of conflicting motivations is, Snell says, a new idea since the epic; <i>bitter-sweet</i> (in relation to love) is, he says, a new coinage when Sappho uses it. Snell's explanation —if I quite followed him— is that Homer describes people in terms too strictly <i>operational</i> for motivational conflicts to appear explicitly. While the lyric poets do not yet portray individuals as sources of motivation, they introduce a broad rhythm/ebb-and-flow of life as a force besides gods. Snell notes that during this time political parties were formed for the first time. And then again, reminiscent of Chapter 2, in the last paragraph of Chapter 3 he brings in something new, this time saying he has "shown" Homer couldn't understand the soul as being opposed to the body, and I said, wait, shown <i>what?</i>
</p>
<p>
In Chapter 4 he presents <a href="https://en.wikipedia.org/wiki/Pindar">Pindar</a> —rough contemporary of Anacreon— as a transitional figure, depicting the simplicity of the gods manifest on Earth. Snell figures Pindar could only have occurred at that moment, on the cusp of transition of mindset, when humans were about to be seen as individuals; Pindar sees the world as magnificently and manifestly ordered, overt rather than spiritual as the human spirit is not yet recognized as such, with gods and humans viewing each other rather symmetrically. Pindar also, Snell notes, does not subordinate the parts to the whole, a tendency not yet lost from the Homeric mindset.
</p>
<p>
Chapter 5 is quite dense. Its underlying theme is the emergence of a distinction between external and internal reality; its primary vehicle is discussion of the work of <a href="https://en.wikipedia.org/wiki/Aeschylus">Aeschylus</a>, especially in contrast to his predecessor Homer. In Homer, there are two <i>levels</i>, gods and men, but both are external, immediately present. In early ritual performances, the performance connects the represented myth with the occasion of the performance. A degree of dissociation occurs with performers describing the myth in words rather than acting it out. The story becomes dissociated from the occasion. Somewhere in all this, Snell notes a change of statue inscriptions from saying "I am so-and-so" to "I am a representation of so-and-so". Individual experience comes in with lyric poetry, but not individual decision. Aeschylus is concerned with individuals making <i>decisions</i> within their own minds; our modern image of Achilles struggling internally over his decision to get back into the fighting, Snell says, is due to Aeschylus. Action for Aeschylus, he says, is in the mind, involving both the past and future.
</p>
<p>
The immediate theme of Chapter 6 is the step from Aeschylus to Euripides (though on my first reading I had a lot of trouble figuring that out). The discussion turns about Aristophanes's criticism of Euripides in favor of Aeschylus, which gained no traction until revived by a series of thinkers starting in the 1700s — Snell mentions <a href="https://en.wikipedia.org/wiki/Gotthold_Ephraim_Lessing">Lessing</a>, <a href="https://en.wikipedia.org/wiki/Johann_Gottfried_Herder">Herder</a>, then in more depth <a href="https://en.wikipedia.org/wiki/August_Wilhelm_Schlegel">Schlegel</a> and <a href="https://en.wikipedia.org/wiki/Friedrich_Nietzsche">Nietzsche</a>, and eventually also <a href="https://en.wikipedia.org/wiki/Johann_Wolfgang_von_Goethe">Goethe</a>. It seems that, in essence, Aristophanes objected that Euripides leached the morality out of tragedy by over-rationalizing it; a criticism Snell ultimately rejects. Snell acknowledges that Euripides marked the end of Attic tragedy, but maintains it was a natural evolution, Euripides carrying further the rational shift of Aeschylus, in which decisions are made by individuals rather than gods (not dragging morality down to nature, for Snell maintains the Homeric gods are natural, but perhaps recognizing a spark of divinity in the human mind). Along the way Snell touches, briefly [p. 116], on Plato and the role of poetry in teaching, following apparently the traditional view later rejected by Havelock.
</p>
<p>
Chapter 7 discusses the changing Greek view of the distinction between divine knowledge and human knowledge, which Snell says existed in Homer but changed in character. This is the chapter that Snell added for the 1957 translation. On first reading I didn't follow what the difference was supposed to have started as, or changed to; Snell doesn't explain where he's going at the start, of course, and though in this case the title accurately reflects the main point ("Human Knowledge and Divine Knowledge Among the Early Greeks"), I only figured that out <i>after</i> reading it, having no longer taken the title seriously as an indicator of what mattered after my experiences with some of the earlier chapter titles. (Verily, the first reading did not go well.) At any rate, on <i>second</i> reading, the Homeric view was that knowledge is proportional to experience and therefore the gods alone have complete awareness, hence perfect knowledge. The influence of gods on humans is natural; humans learn by experience and consider it a gift from the gods. As this view changes, humans depend less on connection to the gods. Snell describes several variants due to particular authors, e.g. the gods may "know" some things that resemble truth but aren't truth. One place this can go is a scientific outlook toward empirical data, but Snell says ultimately the influence of <a href="https://en.wikipedia.org/wiki/Parmenides">Parmenides</a> led toward rejecting sensory evidence in favor of divine inspiration.
</p>
<p>
Reading the descriptions of these various views of knowledge, I got a strong sense that these people had different cognitive types — profoundly different kinds of information-processing going on in their heads. I'd love to get a handle on how cognitive types interact with the history of all this; but, constructing any sort of post about them is extraordinarily fraught. Not something to tangle with here, anyway.
</p>
<p>
Chapter 8 is about the changing Greek view of right behavior. On first reading: The Homeric-era goals, according to Snell, were profit, happiness, and honor/repute/glory. Honor, if I follow him rightly, is where support for the social order comes in, has a strong early association with religion, and leads later to the notion of the state. On second reading, my interpretation of the content stands; noting that this time around, the overall flow of Snell's reasoning came across tolerably well (contrary to the first reading); the addition of Chapter 7 made some sense. The destination of Chapter 8 is the moral attitude of Socrates, for whom virtue is founded on the decision-making ability of the individual, the emergence of which by separation from the gods has by this time been explored in two different aspects, in chapters 5 on tragedy and 7 on knowledge.
</p>
<p>
Chapter 9 traces the use of comparisons, from Homer forward, with reference to myth on the early end and logic on the late end. My sense, as I scrutinized this chapter on second reading, was that this chapter conveys an especially important element in Snell's thinking. His use of terms such as <i>metaphor</i> and <i>paradigm</i> appears to be precise, though what precisely they mean to him was less clear to me — despite his having apparently turned parts of his discussion particularly toward explicating those terms.
</p>
<p>
The contrast between his apparently preferred mode of thought and mine is striking; even when he sets out to explain what he is doing, his exposition is essentially unfocused, whereas even if I were to set out to wander aimlessly about an area of study, I would start out by saying plainly that that was what I was going to do, and each point I made in my wandering would have a gathering-inward coherence to it in contrast to the distributing-outward character of Snell's observations. I do wonder how much of his style is cultural versus how much is cognitive; I don't ask the same question too closely about my own style lest I tie my thinking in a knot, but note that after many weeks of deeply studying Snell, I had to reinforce my own style against a slight diffusion of focus.
</p>
<p>
The problematic language-as-thought that I noted in Snell's introduction and Chapter 1 is not noticeable out here, in the deep water midstream of his investigation. Here it appears that for Snell, language is the medium through which thought is expressed, thus language helps to guide thought, an attitude akin to my own. His use of the term <i>mythology</i> also (like <i>metaphor</i> and <i>paradigm</i>) appears to be specific and important to his overall vision, though I was unable to get a solid hold on it either. Mythical thinking presents images meant to be apprehended directly whereas logic searches. Comparisons are a form of expression, while reasoning is implicit behind expression in Homer and, over the following centuries, gradually shifts under the surface of expression until it finds a position where the form of expression rearranges itself so as to support logic explicitly.
</p>
<p>
(Reviewing these notes on Snell <i>after</i> my subsequent deep-study of Havelock, I find the remark about mythical thinking —"Mythical thinking presents images meant to be apprehended directly whereas logic searches"— absolutely stunningly Havelockian. There was no glimmer of such a recognition in my mind when I wrote the remark; yet the connection shines out so brightly now that I'm seriously contemplating whether to revisit this part of Snell. A thoroughly unanticipated thought as, laboring through my second reading of Snell, I tried to wring all the insight I could out of it in the sincere belief that I would surely never pass that way again.)
</p>
<p>
Chapter 10 appears, then, to be about the process by which language gradually rearranges itself to support logic; where the preceding chapter focused on comparison, this chapter focuses on the emergence of abstract nouns in the Greek language. As seems typical of Snell's style (at its best), the chapter touches a variety of variously intriguing themes. He repeatedly considers three primary kinds of words —nouns, adjectives, and verbs— and three kinds of nouns —proper, concrete, and abstract. The emergence of abstract nouns, which he says is essential to science, is enabled by the language having a definite article, which <i>becomes</i> able to transform an adjective or verb into an abstract noun. The emphasis on abstractions, I immediately recognized as reminiscent of Havelock. Snell also notes <i>mythical names</i> as a precursor of abstractions; at one point [p. 237] he contrasts Homer's mythical identification of Ocean as the origin of the gods with Thales's concrete identification of water as the origin of all things. Snell also spends some attention on <i>motion</i>, which he says (if I quite followed him) underlies all verbs in a scientific view; that bit, I admit, powerfully reminded me of the <i>vector language</i> concept behind my conlang <a href="https://fexpr.blogspot.com/2020/03/irregularity-in-language.html">Lamlosuo</a> (leading me to really wonder about the deep connections between my interests). This miscellany of fascinating themes flies somewhat out of control when, near the end of the chapter, he begins an apparently summing-up paragraph with "What we have seen to be true for the substantive and the adjective, has now been shown to hold for the verb as well." This was, sad to say, a complete non sequitur for me; I've no idea what we're supposed to understand to have been shown for all three kinds of words at this point. Later in that paragraph (which is almost a page long, as many of his paragraphs are), he also refers to there being three parts of all Indo-European grammar, by which again I'm unsure what is meant; <i>maybe</i> nouns, adjectives, and verbs, but nothing in the context confirms this: he suggests a connection to the three genres of poetry (epic, lyric, and drama), and I honestly have no idea how those are supposed to relate to grammar of any stripe.
</p>
<p>
At one point in the chapter [pp. 241–2], Snell addresses the matter of tense, which figures prominently in my own recent conjectures. Snell says science is particularly concerned only with past tense since all observations are necessarily in that tense (I'm aware this property of observation also figures prominently in journalism), and notes that the Greek language (in the age he's concerned with, I presume) uses aspect rather than tense. As I recall it's been noted somewhere-or-other in the <a href="https://conlangery.com">Conlangery podcast</a> that grammatical terminology tends to get murky when shifting between languages, so I'm wary of exactly what <i>doesn't have tenses</i> actually means, but the point is clearly highly important for how Greek fits into the evolutionary picture I'm assembling.
</p>
<p>
Chapter 11 is about evolving ideals of human behavior; truth and justice and such. While previous essays in the collection mostly focused on the ancient world, this one relates it all to the present, with rather stunning effect for me as I'd somehow lost track of the reality that this stuff was written in Germany in the middle of a century when really awful things had been happening in and around Germany. Snell writes, rather matter-of-factly, of considering a few years after the first World War "what values were worth saving in Europe". I've got other views lately of the bone-deep psychological impact of WWI, e.g. J.R.R. Tolkien's work is thoroughly shot through with it (once one relegates the Catholic influence to the background). And this particular chapter was first published, according to the translator's note, in 1947, before appearing in the collection in 1948.
</p>
<p>
Chapter 12 is about how the poetry of <a href="https://en.wikipedia.org/wiki/Callimachus">Callimachus</a> fits into the evolution of thought after the classical period. Callimachus was (I see from Wikipedia) born a decade or so after Aristotle died. Snell describes him as <i>post-philosophical</i>, not pursuing philosophical goals but rather making light of previous intellectual thought in an erudite manner that, Snell says, sets the tone for thousands of years of erudite poetry. He describes [p. 266] the pre-philosophical poets as "[staking] out new areas of the mind", from which the discovery of mind and philosophy arose, while Callimachus also explored new territory, though not in a profound moral or conceptual sense but rather as art for art's sake.
</p>
<p>
Finally, in Chapter 13 Snell fits <a href="https://en.wikipedia.org/wiki/Virgil">Virgil</a> into the picture. Virgil was a Roman poet writing several centuries later; chronologically he is at the end of the period studied by Snell's book, as Homer in Chapter 1 is chronologically at the beginning. (Despite the hand-waving in Chapter 2, I only recall one solid reference in Snell's book to anything before Homer, through the evidence of a scene on someone's shield in the Iliad [p. 285].) According to Snell, Virgil was influenced by the somewhat ironic writings of <a href="https://en.wikipedia.org/wiki/Theocritus">Theocritus</a> —afaict a contemporary of Callimachus— but took the erudite shepherds portrayed by Theocritus seriously, and created poetry with the seriousness of Homer, not grounded in the concrete world as Homer was but instead set in an idealistic landscape unhinged from reality. This, in the political context for Virgil of a Rome whose people were sick of chaos and wanted the peaceful stability furnished by <a href="https://en.wikipedia.org/wiki/Augustus">Augustus</a>. Snell presents this as Virgil's <i>discovery</i> of poetry as an exercise in imagination, and of poetic symbolism, also reprising Snell's point from his Introduction to the book that the entire intellectual progress throughout the book is discovery rather than invention. Along the way he explores differences between conceptual frameworks of, notably, Virgil, Theocritus, Homer and Hesiod, and Plato.
</p>
<p>
After completing my second, intensive study of the book, my overall —positive— assessment is that Snell understands the shifting mindsets at a deep level that he doesn't articulate directly (cf. <a href="https://fexpr.blogspot.com/2015/05/computation-and-truth.html#sec-comptruth-att">profundity index</a>). I'm on the brink of seeing what he's getting at.
</p>
<span style="font-size: large;" id="sec-sst-have">Havelock</span>
<p>
Going into a second reading of Havelock, I was particularly interested in his view of the conceptual structure of orality, which I didn't feel I'd understood from my first reading. After my experience with Snell I was also wondering how I would perceive Havelock's style.
</p>
<p>
Havelock's Foreword, in contrast to Snell, is written in a convergent rather than diffusive style. It also doesn't fall into the trap of treating words as self-defining, but rather describes the difficulty of understanding pre-Socratic texts as one of working out what concepts words signified in-context to those who used them, and how they came to do so; as in my own framework, a non-circular process, in which words express ideas but do not cause them. He describes his main thesis, that the pre-Socratics were not practicing philosophy in a modern sense but rather were developing literate concepts starting from an oral mentality.
</p>
<p>
Reference to <a href="https://en.wikipedia.org/wiki/Milman_Parry">Milman Parry</a> provided a fascinating detour through Wikipedia. Parry was born in 1902, eleven months and change before Havelock, and died at age thirty-three (just under thirty-three and a half). Havelock in his foreword credits Parry with a crucial clue to what an oral mentality looks like, but doesn't digress to explain what the clue was — hence Wikipedia. From the Wikipedia write-ups, Parry studied traditional oral poetry, and concluded it uses formulas that allow bards to rapidly assemble verse from modular parts. Havelock's contribution was to work out the conceptual changes involved in shifting from the construction of prose through oral formulae to the construction of prose through abstract categories. This information about Parry could be immensely valuable for understanding Havelock... if it's right, as opposed to, say, a deep misunderstanding of one or both of them by whoever imposed their perceptions on that part of Wikipedia. The latter concern connects directly to what made this detour immediately fascinating for me. Evidently, the Wikipedia articles about Parry and the people and ideas surrounding him were written by one set of contributors, the articles about Havelock and the people and ideas surrounding <i>him</i> by a different set of contributors, and quite blatantly the communities surrounding these two scholars have profoundly different frames for the overall situation. The articles in the Parry neighborhood portray Parry as the primary author of the orality/literacy distinction, and oh-by-the-way the distinction is associated with Havelock. Those in the Havelock neighborhood present Havelock as author of the distinction, with an important debt acknowledged by Havelock to the related work of Parry on the structure of oral poetry.
</p>
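<p>
As a toy gloss on that Wikipedia account of Parry (my own sketch, with invented formulas; real oral-formulaic composition is metrically constrained in ways this ignores), the mechanism would look something like this: a bard fills fixed frames from a stock of interchangeable prefabricated phrases, which is what lets verse be assembled rapidly on the fly.
</p>
<pre>
# Toy sketch of oral-formulaic composition as glossed above: verse
# assembled rapidly from modular, prefabricated parts.  The frames and
# epithets here are invented for illustration, not Parry's data.

import random

epithets = {
    "Achilles": ["swift-footed", "lion-hearted"],
    "Odysseus": ["resourceful", "much-enduring"],
}

frames = [
    "Then {epithet} {hero} spoke among the assembled chiefs",
    "So prayed {epithet} {hero}, and his prayer was heard",
]

def verse(hero):
    # The bard composes quickly because the parts are prefabricated:
    # pick a frame, slot in a stock epithet for the hero.
    frame = random.choice(frames)
    return frame.format(epithet=random.choice(epithets[hero]), hero=hero)

print(verse("Achilles"))   # e.g. "Then swift-footed Achilles spoke ..."
print(verse("Odysseus"))
</pre>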
<p>
(Interestingly, a much more pronounced example of reverberating hearsay surrounds Parry's death, and likely feeds back into the way the Parry-neighborhood treats him. In addition to studying Homer, Parry also traveled twice in the 1930s to <a href="https://en.wikipedia.org/wiki/Kingdom_of_Yugoslavia">Yugoslavia</a> where he recorded oral bards in that narrow window of time when modern recording equipment was available while the oral tradition was still practiced; and then Parry, having developed the habit of carrying a loaded pistol in those dangerous regions, accidentally fatally shot himself while unpacking his luggage in a Los Angeles hotel room. According to the recent source article cited by Wikipedia, this rather pointless death was transmogrified in the telling, Parry ennobled with comparisons to <a href="https://en.wikipedia.org/wiki/T._E._Lawrence">Lawrence of Arabia</a> (who had died similarly pointlessly about half a year before), <a href="https://en.wikipedia.org/wiki/Charles_Darwin">Charles Darwin</a>, <a href="https://en.wikipedia.org/wiki/Don_Quixote">Don Quixote</a>, <a href="https://en.wikipedia.org/wiki/Alexander_the_Great">Alexander the Great</a>, <a href="https://en.wikipedia.org/wiki/John_Keats">John Keats</a>, <a href="https://en.wikipedia.org/wiki/Paul_Gauguin">Paul Gauguin</a>, <a href="https://en.wikipedia.org/wiki/Igor_Stravinsky">Igor Stravinsky</a>; and ultimately the story was rewritten as a tale of tragic suicide after being denied well-deserved honors — reminiscent, the recent author noted, of ancient Greek hero <a href="https://en.wikipedia.org/wiki/Ajax_the_Great">Ajax</a>.)
</p>
<p>
Havelock reasons carefully from evidence in Plato's <a href="https://en.wikipedia.org/wiki/Republic_(Plato)"><i>Republic</i></a>. His first chapter is largely a debunking of numerous conventional views on how to interpret the <i>Republic</i>. Despite the title traditionally attached to it, the <i>Republic</i> is not about politics, by Havelock's reasoning, but rather its main theme is education; Havelock credits it with introducing the Western university system. Plato's Book Ten denounces poetry as intellectual poison, and Havelock systematically debunks numerous traditional ways of denying that Plato really means this; Havelock argues that the attack on poetry in Book Ten doesn't occur out of the blue, but fits into the larger scheme of the preceding Books. Although Plato is traditionally supposed to treat the <a href="https://en.wikipedia.org/wiki/Sophist">sophists</a> as his opponents, in Book Ten, Plato sees the sophists as his allies against the threat of poetry. Havelock also particularly notes that Plato is pointedly <i>not</i> interested in distinguishing epic from tragedy.
</p>
<p>
Alongside the debunking, Havelock also begins to assemble evidence of what Plato actually <i>is</i> talking about, noting that while we perceive poetry as an aesthetic experience, to Plato it must be something else. In Havelock's Chapter Two, he considers what Plato means by the word <i>mimesis</i> (accent on the second syllable, btw; rhymes with <i>thesis</i>), which is generally translated as something to do with imitation/mimicry. While Plato uses <i>mimesis</i> for dramatic rather than descriptive communication, at various points he uses it for what is done by composer, presenter, young student, or adult audience; the point Havelock is driving at seems to be that Plato was talking about something that defies distinctions we are accustomed to because his referent does not exist in our modern culture. Havelock's Chapter Three then articulates what he believes the oral mindset to be: a communicated oral tradition on a massive scale, requiring a whole way of life devoted to its constant reinforcement, which is identified with rather than considered (hence the term <i>mimesis</i>), and which places constraints on the poet as well as the audience. He describes a gradual migration from orality to literacy, passing through two intermediate phases: from more-or-less pure orality with Homer and Hesiod, through craft literacy with the tragedians, and semi-literacy with the sophists, to literacy with Plato.
</p>
<p>
In reading Havelock's description of plays in Athens, I was reminded of Jaynes's account of the first tragic play in Athens, <i>The Fall of Miletus</i>. In Jaynes's account —which, as I noted in my <a href="https://fexpr.blogspot.com/2019/03/storytelling.html">earlier post</a>, doesn't really seem to match Wikipedia's <a href="https://en.wikipedia.org/wiki/Phrynichus_(tragic_poet)">mainstream account</a> despite both versions invoking the same authority (Herodotus)— the Athenian audience was completely freaked out by the play's single performance, shutting down the city for days, upon which the play was burned and its author banned. Jaynes explains this extreme reaction as a population no longer bicameral but not yet knowing how to handle conscious emotions, which the conscious mind can dwell upon thus creating positive feedback that does not occur in the bicameral mind. As an alternative interpretation under Havelock's premise, of the effect alleged by Jaynes, it occurs to me that tragedies seem rather meant to make the audience <i>think</i>, which is a literate impulse, whereas, for such an early tragedy, an audience used to absorbing oral tradition mimetically might be unable to filter the tragic aspect of the story and might therefore drastically overreact to it. Note that Havelock's orality and Jaynes's bicamerality both omit reflecting on the emotion, with the interesting difference that in Jaynes's version the problem is the <i>presence</i> of reflection, whereas in my extrapolation of Havelock the problem is the <i>absence</i> of reflection.
</p>
<p>
Taking all these authors together at this stage, I found Havelock most credible on the points he addresses; but Jaynes seems on to something too, if more tenuously, and I've continued to look for ways Havelockian phenomena could project something suggestive of the shapes Jaynes perceives in the evidence. It's entirely possible, when sensing some elusive missing element in a substantially sound theory, to fumble about a bit looking for the missing element, so that higher-risk theories such as Jaynes's can play a vital role in assembling a complete picture — as, for that matter, Snell with his less focused style may also serve an important role in accumulating fragmentary patterns in a domain not yet ripe for a wholly coherent new theory.
</p>
<p>
Havelock's Chapter Four proposes that Homer is mainly instructional material with a plot added to hold it together, rather than, in its usual modern interpretation, an "epic" plot with some tedious bits added on. (Btw, I was caught off-guard when Havelock explained this in the first paragraph of the chapter; apparently habituated by reading Snell, I'd braced myself to try to puzzle out what the point of the chapter was.) Havelock identifies, citing Hesiod, two types of information primarily conveyed: <i>nomoi</i> (singular <i>nomos</i>), rules of social behavior; and <i>ethea</i> (singular <i>ethos</i>), rules of personal behavior. He then engages in a detailed discussion of Homer's content. Which drags on, as he looks at a long series of examples, each of which is individually unimpressive but whose point is, apparently, the cumulative weight of persistent repetition of formulae. Havelock seems to be turning inside-out not only the traditional interpretation of the <i>Iliad</i> as a plot with details added (thus, details with a plot added), but also Parry's interpretation of oral storytelling as modular parts used to construct a story (thus, instructive parts arranged as a story).
</p>
<p>
Studying Havelock's reasoning, I was struck by his, afaict, rather conventional view of the consciousness of Homeric-era people, whereas the other authors I've been immersed in lately —Jaynes and Snell— each in their own way envision the <i>self</i> evolving during the following centuries. Havelock, of course, proposes a transformation of the mind during this period that I've found quite compelling — but at the same time, I'm finding Havelock's account oddly incomplete. It's somewhat-belatedly occurred to me that I'm grasping for more than an assemblage of parts borrowed from these different authors: I want an evolution of <i>mind</i> starting with a Homeric state that is oral (in Havelock's sense) and <i>self</i>less with natural gods (in Snell's sense), with possibly some Jaynes-like element, morphing continuously to a literate conscious mind. Key to making this work, it seems, is to work out not only how the Homeric conception of the gods and individual people works, but how it works in, relates to, and perhaps depends on, an oral mindset. Neither the <i>self</i>less‍ness nor the orality ought to be fully understandable without the other, and likewise for the dual processes of emergence of <i>self</i> and of literacy.
</p>
<p>
I've been trying to wrap my head around a Homeric manner, based on Snell, of analyzing the whole of society and the people in it, involving Olympian gods, alternatively to the modern analysis into individual conscious selves and interactions between them. Now I'm trying to work out how orality bears on this, and on the transformation.
</p>
<p>
Toward the end of the chapter, Havelock notes that Plato objects to Homer claiming to give technical instruction without being an expert — that is, claiming to impart <i>techne</i> as well as <i>nomos</i>. "The boundary between moral behaviour and skilled behaviour in an oral culture", Havelock observes, "is rather thin."
</p>
<p>
Chapter Five is a discussion of the instructional view of Homer, after spending Chapter Four examining the detailed text of the start of the <i>Iliad</i> for instructional nature. His preferred simile for the oral epic is of a walk through a house crowded with furniture, describing each item as he passes it and trying to arrange to pass by and describe nearly everything, with quite limited flexibility for the bard, and considerable skill required to integrate the story with the instruction. He does discuss Parry's view, particularly noting that Parry was observing oral tradition in a culture where the organs of government had literacy at their disposal, so that the oral tradition was merely entertainment.
</p>
<p>
In seeking to better grasp Havelock's notion of oral poetry, I'm struck by remarks of <a href="https://en.wikipedia.org/wiki/Umberto_Eco">Umberto Eco</a> on the classic 1942 movie <i><a href="https://en.wikipedia.org/wiki/Casablanca_(film)">Casablanca</a></i>. The movie is, we're given to understand, a dense mass of cliches thrown together rather desperately during production, producing something "if not actually against the will of its authors and actors, then at least beyond their control. [...] [T]here unfolds with almost telluric force the power of Narrative in its natural state, without Art intervening to discipline it. [...] Something has spoken in place of the director." Eco also notes that while middle-aged audiences watch the movie nostalgically, college-age audiences "greet each scene and canonical line of dialogue ('Round up the usual suspects,' 'Was that cannon fire, or is it my heart pounding?' — or even every time that Bogey says 'kid') with ovations usually reserved for football games." (Umberto Eco, <i><a href="https://web.archive.org/web/20090308125514/http://www.themodernword.com/eco/eco_casablanca.html">Casablanca, or, The Clichés are Having a Ball</a></i>, 1994<!--from <i>Signs of Life in the U.S.A: Readings on Popular Culture for Writers</i>, Sonia Maasik and Jack Solomon, 1994, pp. 260–264-->.) If Havelock's oral-poetic mode is a real phenomenon, and people are basically the same today as we were three thousand years ago, the mode should still be there to tap into, and it surely sounds from Eco's description as if <i>Casablanca</i> is tapping into it.
</p>
<p>
Chapter Six discusses Hesiod's hymn to the Muses. Havelock maintains this is an account of what bards in an oral society do, expressed in the form such an account must take in their oral medium; and, he says, it agrees with what Plato says about what bards did in the society, from a philosopher's perspective. According to Havelock, Hesiod makes the Muses daughters of Zeus and <a href="https://en.wikipedia.org/wiki/Mnemosyne">Mnemosyne</a> — this being the means available to Hesiod to express relations between abstract categories without actually having abstract categories (which are literate), Zeus representative of social order, Mnemosyne of memory/recall/record. The ninth Muse described is <a href="https://en.wikipedia.org/wiki/Calliope_(Greek_mythology)">Calliope</a>, the consort of princes, whose name Havelock translates as "fair-utterance": representative of the oral-poetic use of words, which are either wielded directly by a prince with the gift, or provided by the prince's bard, as poetry is needed both to persuade people to follow orders and to put the orders in a form that will be remembered accurately. This aspect of princes is not dwelled upon by Homer, Havelock suggests, because it's so obvious no-one would think to notice it, and it's invisible in Homer's text since any words put in characters' mouths are part <i>of</i> Homer's text which is all poetic; and even so, Havelock notes a Homeric depiction of Achilles as a "speaker of tales", "chanting the glories of heroes".
</p>
<p>
Chapter Seven is about how the culture in which Homer and Hesiod lived would have been shaped by the <a href="https://en.wikipedia.org/wiki/Greek_Dark_Ages">ancient Greek Dark Age</a> of several centuries leading up to it, from the fall of the <a href="https://en.wikipedia.org/wiki/Mycenaean_Greece">Mycenaean civilization</a>. This was, by Havelock's account, a profoundly "dark" age in the sense that we have no direct knowledge of it, as it was especially purely non-literate — forcing Greek culture into a purely oral form, at the same time a diaspora was forcing strong measures to preserve their cultural identity. (Havelock is, btw, disinclined to view the age as "dark" in the sense of the Greeks being uncivilized.) Given the instructive nature of oral poetry, the culture depicted in Homer would be necessarily that of Homer's time, with the seeming of a story about ancient heroes merely a framework to convey the contemporary <i>nomoi</i> and <i>ethea</i>. Memory and a gift for the poetic form would be a real advantage for leaders, resulting in a culture led by the poetically talented (I'm picturing a word "poetocracy" here), which leads me to wonder whether certain cognitive types that we would hold in high esteem —such as Albert Einstein's— might have done poorly in that society. (It also occasionally crosses my mind, when studying this stuff, to wonder whether Plato was <a href="https://en.wikipedia.org/wiki/Amusia#Congenital_amusia">tone-deaf</a>.)
</p>
<p>
Chapter Eight, a relatively short chapter, is about what the oral mindset looks like. Havelock's main point here is that in either type of culture, oral or literate, a main repository of preserved cultural knowledge defines the highest form of expression, and the form of ordinary language devolves therefrom — turning "upside-down" a widespread literate perception of metrical language as a ceremonial/artistic variant <i>from</i> ordinary language. People in an oral culture do in fact, he says, speak ordinarily in a metrical fashion; a difficult thing to observe, but he mentions several glimpses of it: a few observations from encounters with oral cultures in WWI, and samples of pre-alphabetic writing. From WWI, <a href="https://en.wikipedia.org/wiki/T._E._Lawrence">T.E. Lawrence</a> describes metrical speech in the mustering of non-literate Arab troops, while an account of <a href="https://en.wikipedia.org/wiki/Gallipoli_campaign">Gallipoli</a> describes Australian soldiers making short logical remarks while Turkish soldiers use metrical idiom (I freely admit to loving the proverb quoted: "Smiling may you go and smiling may you come again"). In early syllabary writing systems, according to Havelock, the written form is too clumsy to become culturally dominant, and so instead imitates the usual form of speech; so Havelock suggests the metrical form of these writings is evidence that people actually spoke that way, rather than (as he notes has been suggested) evidence that the material was ceremonial and thus atypical. He also briefly notes that the non-metricality of literate speech is due to the speaker's perception that the main repository of knowledge is stable without meter.
</p>
<p>
Chapter Nine describes the psychology of oral poetic performance. The process, constraining performer as well as audience, is hypnotic, we're told, using rhythmic structure at multiple levels —meter, instrumental accompaniment, and sometimes also dance— to support memory by limiting the range of expressions possible and (if I'm following this point rightly) thereby minimizing the work involved, and also providing positive reinforcement through sensual pleasure. An altogether complex process. All of which sounds abstractly plausible but, for me, failed to come across as immediately persuasive; which on reflection I ascribe to a lack of concrete illustration or example. Contrast the preceding chapter which, though only about half as long, offered several interesting pieces of evidence to support its point. The lack of illustrative examples may be inherent in the technically intricate subject, which might require a more massive treatment to get into concrete evidence; but while that might <i>explain</i> the shortfall, it doesn't make it any less unfortunate for me, since I want illustrations to help me grok the concept and transform it for my altered theoretical framework.
</p>
<p>
Chapter Ten is about the character of what oral poetry says: "The Content and Quality of the Poetised Statement". This chapter is about as long as the preceding two chapters combined. The oral poetic form uses doings rather than abstractions; Havelock is very clear, in this chapter, that abstractions are strictly a literate phenomenon, repeatedly emphasizing that wherever Homer appears to be framing an abstraction, it's an illusion — though evidently I'm more able to assimilate this point now, after my studies of Jaynes and Snell, than I was on my first reading, when somehow I managed (as visible in my post on second reading of Jaynes) to come away unclear on whether orality would have its own form of abstraction.
</p>
<p>
It also deeply impresses me (thanks, presumably, to my recent revisiting of the subject on this blog) that the concept underlying my conlang prototype <a href="https://fexpr.blogspot.com/2020/03/irregularity-in-language.html">Lamlosuo</a> now seems, if not more peculiar, then more <i>deeply</i> peculiar than I'd recognized when devising it. In my recent post I was already concerned that the conlang, by starting with an entirely clean slate, would lack the vestiges of an oral past; having a <i>clinical atmosphere</i>, as I described it; but since the early days of that project (the better part of twenty years ago) I'd also characterized its alternative language model as <i>going</i> rather than <i>being-doing</i>, and it seems I was saying more than I realized. Havelock makes perfectly clear [p. 182] that the two halves of "being-doing" are, in fact, literacy and orality (he contrasts the words "being" and "doing" explicitly), so that by rejecting both, the Lamlosuo model evidently steps rather sideways from the whole Havelockian scheme. That is, <i>vectors</i>, the content words of the language, reject both being and doing; as noted in the recent post, my attempt to add a vector for the copular verb failed not because the language couldn't support the copular function, but because that function was performed by the role particles, so that it would make no sense for a vector to take on that function. This suggests that this facet of literacy has been kept out of the vocabulary yet built into the core grammatical structure, for whatever that implies.
</p>
<p>
Havelock also emphasizes in this chapter that the oral poetic form requires actions, each with a doer, an actor, for which he suggests polytheism is ideal since it provides suitable actors for a wide range of processes that a literate mind would perceive as abstract. Havelock notes particularly the extensive use of the metaphor of birth and death [pp. 172–3] (though he's only talking about Homeric metaphor here, not the sweeping force-alternative-to-gods that Snell ascribes to later "lyric" poetry). Somewhere around this part of the chapter, Havelock begins again to pull in example oral passages, mainly from the <i>Iliad</i>, which soon get quite thick, though I found it often required close study to appreciate the subtle points he was making about the examples. It seems to me he would have studied tremendous amounts of material and found these subtle patterns he describes permeating it all, which isn't something one can readily put into a book (not at this level of readability, anyway), so the best one can likely do is to explain in detail what to look for and then give a relatively few specific examples from which the attentive reader may be able to discern the subtleties.
</p>
<p>
The chapter explores in depth three key features of oral poetry, which Havelock repeatedly emphasizes are specifically criticized by Plato. (1) Everything must happen embedded in time, so, no timeless truths. (2) It must happen in a series of episodes, presented in sequence, between which the audience must construe any connection. (3) It must evoke vivid imagery.
</p>
<p>
The first point, embedding in time, would seem to contrast with Pirahã on the early side —a language without tense— and also contrast with the timeless "being" of "being-doing" languages on the late side. For the second point, series of episodes, Havelock's technical term is <i>parataxis</i> (though I gather some scholars may use the term somewhat differently). Snell had also remarked on how this sort of juxtaposition could be used to imply causation for which there were not yet suitable connectives. Havelock, on the other hand, gives [p. 184] an unusually sophisticated example where Homer does explicitly express causation, "Y because X" rather than "X, and Y"; but suggests that this doesn't work easily in the oral medium because it disrupts the implicitly temporal sequence of images. Contemplating this, I begin to suspect that the modern <i>movie</i> medium of storytelling may be more alien, in its relation to all this, than Lamlosuo's <i>going language</i> model.
</p>
<p>
Thus ends Part One (of two) of Havelock, about orality, to be followed by the somewhat shorter Part Two about the transition to literacy. Marshaling my understanding of the oral and literate mindsets at this point, before launching into Part Two, as noted earlier I see the two mindsets as different analyses of society as a whole into concepts. I think of the literate analysis as horizontal, or at least <i>flat</i>, the oral analysis as more vertical. The literate view has individual autonomous people, and then abstract connections between them; using the term <i>abstract</i> per Havelock to highlight that this analysis simply would not work in the oral-poetic medium. The oral view decomposes society by, more or less, motivations, allowing a maximally useful set of actors for oral-poetic doings in the absence of literate abstractions. (The "vertical" structure of this oral view puts me in mind of the sorts of non-local decompositions of basic physics I speculated on in a blog post <a href="https://fexpr.blogspot.com/2015/06/thinking-outside-quantum-box.html">some time back</a>.)
</p>
<p>
Initially I found the titles of the two Parts of the book both rather hard to grasp. Part One is "The Image-Thinkers", which really meant nothing to me going into my second reading (other than presumably referring to orality); only coming out of my deep study of that Part could I look back and see it as a comment on the importance of the vivid-imagery aspect of orality. Part Two is "The Necessity of Platonism", which I eventually decided —after <i>two</i> in-depth passes on its first chapter, Eleven— refers to the necessity, in order to implement his plan for societal reform, of eradicating <i>orality</i>. This is a major theme of Chapter Eleven: historians, says Havelock, have come to accept that the self-conscious mind emerged in Greece at about that time (he's talking about Snell, apparently); but what's new in his conception is that this means the overthrow of a preceding oral mindset.
</p>
<p>
The essential point here is that intellectual thinking requires a separation between thinker and thought-about (or knower and known), thus the concept of the thinker as an autonomous person — a concept that cannot coexist with <i>mimesis</i> in which the participants immerse themselves in each successive oral-poetic vivid episode (<i>psychological identification</i>, becoming many rather than one). Havelock describes "a whole group of intellectuals in the last half of the fifth century" wielding the weapon of the primal dialectic technique of <i>asking someone to explain what they just said</i>, destroying the rhythm of the mimetic process, "arousing the consciousness from its dream language and stimulating it to think abstractly." The first step in Plato's proposed curriculum is arithmetic, because it requires problem solving rather than memorization.
</p>
<p>
From grade school, I acquired an (admittedly rather vague) impression of Socrates and Plato as "ancient philosophers" in a very dry, bookish sense (though that's not really pejorative, coming from me). From Havelock I get a picture of social radicals on the front lines of an active war against a highly addictive non-conscious mindset, whose issues may well be as relevant in the internet age as they were in the Hellenic age. Havelock did say, early on, that his thesis required a reinterpretation of Plato's <i>Republic</i>; but it seems I didn't grok, at the time, the scope of the reinterpretation entailed. Looking back from the end of Chapter Eleven, the trial of Socrates makes a lot more sense than the way it was taught to me in school; and the "philosopher king" notion takes on a whole different character if one understands it not so much to mean "philosophers" —in the modern sense— should be in charge as <i>people who think literately rather than orally</i> should be in charge. Granting that particulars of Plato's vision may have been missing some practical insights provided by another two and a half millennia of experience, one suspects he might not be particularly surprised by the currently observable consequences of a prominent class of world leaders failing to apply intellection.
</p>
<p>
The underlying dynamic of literate learning is that, because the written record preserves itself and can be studied repeatedly, its audience can save themselves the substantial investment of energy in the oral-hypnotic process, freeing up all that energy for thinking. What I'm doing with Havelock, poring over his book for a highly intensive re-reading, is evidently a profoundly literate activity.
</p>
<p>
Havelock, btw, apparently prefers to use the verb <i>think</i>, if at all, to refer to literate intellection; a usage he seems aware could be confusing, and so doesn't rely on; but he nonetheless avoids the term with reference to orality. Whereas I, for whom sapience is the baseline of interest, use <i>think</i> for any information-processing that has the <i>sapience</i> thing, quite indifferent to the differences of use that distinguish literacy from orality, or even from verbality. Though I don't mean to change my usage to accommodate his, as such, I'm likely to be extra-wary of situations where the usage may be ambiguous, hence generally using the term with a modifier: sapient thought/thinking, literate thought/thinking, etc.
</p>
<p>
The words <i>abstract</i> and <i>concrete</i> are also worth some attention. In the second-Jaynes post, I used at least once the term <i>oral abstraction</i>. This was partly my confusion over Havelock's view of orality; as noted above, he's very clear that by <i>abstraction</i> he means a purely literate phenomenon. Snell uses <i>abstract</i> for the class of abstract nouns, as opposed to concrete nouns and proper nouns, which is rather akin to Havelock's usage since Snell's point is that abstract nouns are a late development. My own use of "oral abstraction" was in the context of Jaynes's emphasis on the ostensibly <i>concrete</i> nature of early writings. Snell maintained (also above) that he found concrete-versus-abstract less useful in studying Homer than organ-versus-function. My own point, referencing "oral abstraction", was that a mindset not decomposed into literate concepts might nonetheless be decomposed into concepts that aren't exactly "concrete" either. At the time I wrote that, I was deferring judgment on what an alternative decomposition would be; since studying Snell and tempering it with some Havelock, I'd tend to identify this as the "vertical" decomposition of society into motivation-oriented actors —notably including gods— rather than autonomous individual people. (Though whether that qualifies as "concrete" in Jaynes's sense is, on reflection, rather difficult to say; Snell's point about organ-versus-function seems to me to have some merit, while the horizontal-versus-vertical description needs work but may also be on to something.)
</p>
<p>
Contemplating the possible relationship between the internet-age consequences of world leaders who don't practice intellection, and Plato's motives for advocating "philosopher kings", a stray, curious, and rather circuitous thought occurs to me. In literate society, there seem to be some kinds of deep, latent folk-knowledge that don't get written down; at least, not explicitly, though one might be able to recover some of it by "reading between the lines". Suppose something similar were to occur in oral society, where some deep latent folk-knowledge isn't directly committed to the primary recording medium of oral sagas. Now, Havelock paints a picture of a cadre of intellectuals in the late fifth century BCE (basically Socrates's generation) waging guerrilla war on orality. I've been thinking of all these evolution-of-mind developments as inevitable natural developments, each occurring in its own time, and in the larger scheme of things presumably so they are; but in the specific event, what if these people were acting because of something they perceived about their past, something we're missing because it wasn't explicitly stated in the "official" record? This intellectual guerrilla war would have been about two or three centuries after Homer and Hesiod, and six or seven centuries after some cataclysmic <a href="https://en.wikipedia.org/wiki/Late_Bronze_Age_collapse">we-don't-know-what</a> wiped out civilization over a massive area (Mycenaean Greece, including Troy, was just part of it). So, if they were fighting tooth and nail to encourage intellectuals —especially, in the case of Socrates's star pupil Plato, intellectuals in charge of society— what does that suggest <i>they</i> thought about the cause of the Late Bronze Age collapse?
</p>
<p>
Not to put too fine a point on it, my core speculation here is that, just perhaps, these Late Bronze Age civilizations collapsed en masse partly because they were run by irrational idiots. Or, at least, that this made a bad situation worse. The Wikipedia article on the collapse mentions, amongst suggested possible causes, "general systems collapse", which afaics is a fancy way of saying the dynamics of the civilization were out of whack so it broke. If idiotic leaders were really widespread enough to contribute significantly to that big and sudden a disaster, it <i>would</i> have to be something out of whack in the dynamics of the system that tended to put idiots in charge. Of course, I thought to myself at this point, we can't really know; it's not like the ancient Greeks left us an epic account of the Mycenaean military commanders making decisions based on emotions unbridled by reason and getting people killed left and right as a result — but they did, <a href="https://en.wikipedia.org/wiki/Iliad">didn't they</a>. Which in an odd way would turn some of Havelock's iconoclasm inside-out: the plot of Homer's <i>Iliad</i>, which Havelock dismisses as a vehicle for the encyclopedic detail within, would also carry a larger message about the society, hidden in plain sight; and Plato's <i>Republic</i>, which Havelock says despite its traditional title is not primarily about politics, would look a lot more political.
</p>
<p>
Chapter Twelve is about the nature of literate knowledge as Plato characterizes it. The object of non-oral knowledge cannot be the content of the oral poem both because, Havelock notes, general knowledge is only implicit in the poem —which must deal exclusively with concrete instances, episodes embedded in time— and because to touch the poem is to fall under its spell. Plato does, Havelock says, believe that non-oral knowledge has definite characteristics, just as oral knowledge had its character — vivid, episodic, timeful. Plato's three essential features of non-oral knowledge —says Havelock, if I've understood him rightly— are substantially opposite to those three features of oral knowledge: non-visualness (an abstraction is unseen), "integrity" (an abstraction is "one", rather than the "many" of episodic oral poetry), and timelessness.
</p>
<p>
Note, btw, this characterization of the oral/literate contrast matches fairly well with Snell's characterization of Homeric style: fragments of things rather than wholes; organs rather than functions; described in operational terms.
</p>
<p>
Havelock, following his pattern of rejecting traditional interpretations of Plato in favor of his thesis that Plato is talking about the shift from orality to literacy, emphatically maintains Plato is talking about a "syntactical situation" rather than a "metaphysical super-reality". I think Havelock is about half right on this. I'm inclined to agree that Plato is talking about structural differences between oral knowledge and literate knowledge, and the traditional interpretation of Platonism is exaggerated by failing to take this structural element into account. I'm also mindful that Havelock uses the term <i>syntax</i> somewhat differently than I'm accustomed to (he's a classicist, I'm a computer scientist), and I don't quite have a handle on his usage. But I also think there's an element of cognitive type involved here, in which Plato is perceiving a kind of structure that is perceptible to some cognitive types and not to others. So in this case Havelock's attempt to turn the traditional interpretation upside down may only achieve a 90-degree angle (or, in a physics-nerd-ish metaphor, if he's turned it a full 180 degrees it has <a href="https://en.wikipedia.org/wiki/Spin-%C2%BD">spin ½</a>).
</p>
<p>
In Chapter Thirteen, with the rather difficult title "Poetry as Opinion", Havelock uses an analysis of the meanings of particular words to defend his claim that Plato's attack on poetry is not a digression, but rather is the same thing Plato had been addressing all along. I struggled to pull together the mass of information Havelock has provided; he seems afaics to be attempting to re-derive in this chapter some points that he had presented as established in earlier chapters, leaving me unable to confidently apply here any of my hard-won, tenuous grasp of preceding chapters. To my understanding, here Havelock discusses especially Book Five of Plato's <i>Republic</i>, where isolated abstractions are introduced; to some degree Book Seven, where arithmetic is proposed as a way of teaching literate thought; and Book Ten, with its attack on poetry; but Plato uses different words in different places. Especially, Book Ten uses the term <i>mimesis</i>, which Havelock had already spent an entire chapter discussing near the beginning of Part One; whereas Book Five uses the word <i>doxa</i> (<a href="https://en.wiktionary.org/wiki/%CE%B4%CF%8C%CE%BE%CE%B1#Ancient_Greek"><i>δόξα</i></a>), which is usually translated as... "opinion". Hence, of course, the chapter title: "Poetry as Opinion", Havelock's central claim being that what Plato attacks under the name "opinion" is also the subject of his attack on poetry, with modern misunderstanding arising because for us, opinion is common while poetry is esoteric, whereas for Plato <i>doxa</i> doesn't mean "opinion" in quite our sense, while poetry in Plato's world was ubiquitous. Plato distinguishes between <i>philodoxoi</i> (fans of opinion) and <i>philosophoi</i> (fans of intellectual thought) — which I found stunningly close to my own ideas on the modern struggle between fact-based and opinion-based mindsets; but rather than completely depart from Havelock at this point, I'll revisit that in a separate section, below.
</p>
<p>
The problem with "opinion", Havelock emphasizes, is that it leads to "contradiction", where a thing can be both big and small, beautiful and ugly, a person may be both just and unjust, noble and ignoble, etc. This initially perplexed me, and I took a while to catch on. Evidently, a thing takes on these "contradictory" properties at different times, which one has to be able to talk about in order to make sense of the real world (shades of <a href="https://fexpr.blogspot.com/2011/12/trouble-with-monads.html">monadic programming</a>); so that the objection to "contradiction" really did come across to me sounding as if Plato was presenting the time-dependent world as <i>less real</i> than the realm of literate abstractions. That <i>is</i> the traditional interpretation of Plato that I was taught; that has always struck me as a bit silly; and that Havelock explicitly denied was Plato's intent ("metaphysical super-reality"). The point, I eventually grokked, is that because any thing can take on these contradictory properties in different episodes of the oral narrative sequence, all such properties in the oral mindset are transitory, so that the oral thinker —the <i>philodoxos</i>— is unable to formulate timeless abstract thoughts. In hindsight, I've been conditioned by my familiarity with the late crisis in the foundations of mathematics, to take the word "contradiction" as a very strong word of denial, pushing me to interpret this criticism of orality as a wholesale denial of the transitory material world; though it's also possible Havelock <i>intended</i> some degree of this misunderstanding, as he claims, on the final pages of the chapter, that philosophy itself later lost track of its original purpose of defeating the oral mindset, thus of merely enabling abstract thoughts, and "substituted the attempt to throw off the spell of material things" [p. 250].
</p>
<p>
Also at the very end of the chapter, Havelock remarks that <i>doxa</i> means "impression" both in the sense of the impressions I get, and impressions others get of me; thus, impressions <i>by</i> me as subject and impressions <i>of</i> me as object, defying the subject/object distinction that Havelock chose as his foremost point for Part Two (Chapter Eleven, on separation of knower from known).
</p>
<p>
Chapter Fourteen is about Plato's view of the structure of abstractions. Havelock names the chapter after Plato's "theory of forms", but notes that Plato didn't present such a theory explicitly; rather, it appears to have been terminology familiar to the circle of intellectuals around him; and Havelock further limits his attention to the <i>Republic</i>, since he reckons Plato's later works are no longer so concerned with defeating orality and so lose track of the contrast. After noting pairs of opposite abstractions —beautiful/ugly, good/evil, etc.— Havelock cites a progression of increasingly complex abstractions about the real world: position in two dimensions, then in three; motion in space; sound moving in space. Which does seem an interesting progression, but Havelock's discussion seems incautious when he tries to ascribe to Plato an overly modern view of empirical science — Havelock as a classicist is well-positioned to recognize when scholars project modern attitudes on <i>poetry</i>, but not so able to recognize when he himself projects modern attitudes on science. The point Havelock drives at here is that the transitory world should be described in terms of timeless abstractions. Having developed this point carefully, Havelock then turns the point to a criticism of Plato: he discusses the different words Plato might have used and the words he did use, which translate into English as visualizable ones, <i>shape</i> and <i>form</i> (hence the "theory of forms"); a choice he says Plato made because he wants morality to be absolute, having himself been predisposed by his upbringing to authoritarianism. Havelock accuses Plato, though, of falling back into the spell of visualizability, and associates this with departure from science into mysticism.
</p>
<p>
Chapter Fifteen, Havelock's final chapter, goes back to look at the time between Homer and Plato, during which others gradually developed the movement that Plato articulated. Much of what he has to say is presented as hypothesis to be tested; he clearly wants someone to investigate further since, one gets the impression, he simply has not had time to do so himself as he's focused on Plato. Plato's word <i>philosophos</i>, he emphasizes, has to be interpreted much more broadly than its modern sense; it's anyone intellectual, anyone who thinks abstractly; and Havelock says it was quite a new word when Plato was using it. All the pre-Socratics, including the sophists, are meant to be included. Havelock suggests that Plato was basically inventing the idea of a class of intellectual people. He ascribes to Aristotle the analysis of that collection of thinkers according to their positions on specific issues, which is the way they've been perceived ever since; but, he maintains, from Plato's point of view it's more important that they have in common their use of the poetic idiom to form abstract thoughts.
</p>
<p>
I was struck by Havelock's remark [p. 291] that in order to have a society you must have an educational discipline; one wonders what that should be understood to imply about large-scale home schooling.
</p>
<p>
The first stage of the cultural change caused by a popularly usable alphabet is, Havelock figures, non-didactic poetry that hangs around because it can be written down. He describes the gradual exploration of what alphabetization enables, noting that authors at first may give up either rhythm or episodic sequence, but not both at once; one therefore gets either non-rhythmic episodic narrative, or rhythmic abstract thinking. Hesiod, he says, takes the first step, in the <a href="https://en.wikipedia.org/w/index.php?title=Hesiod%27s_Theogony"><i>Theogony</i></a>, by arranging the gods into families; and Havelock notes the shift from sound to vision, ear to eye, echo-and-response to architecture, requiring the support of a written form. I'm put in mind of visual programming languages (though quite what to think about them is unclear; after half a century they seem no closer to competing with textual programming languages, but it doesn't seem like we know how to do them, so the situation isn't like 3D movies that were waiting for a <a href="https://en.wikipedia.org/wiki/Avatar_(2009_film)"><i>precipitating event</i></a>; best guess, either text really is better, or we're going about visual languages in entirely the wrong way). At any rate, Hesiod's <i>Works and Days</i> goes further, or tries to, partway into non-visible abstracts. Havelock then describes the pre-Socratics seeking to improve on Hesiod's account of the structure of the world, and gradually realizing in doing so that what they are doing is something new.
</p>
<span style="font-size: large;" id="sec-sst-evol">Evolution</span>
<p>
Merging Snell's and Havelock's views of the Homeric mind, it seems that oral poetry requires everything to be framed in terms of ("in the syntax of", Havelock might say) actors doing things visualizably, and that to encompass within this framework as much as possible of the grand scheme of things, one decomposes the world into 'personified' motivations; more-or-less, gods. It's not practical, in such an oral framework, to analyze the world into individual people with relationships between them, because the relationships can't be explicitly described within the oral framework, being essentially abstract, which is a literate intellectual phenomenon. Thus, the dividing lines between these elements of reality —that is, the dividing lines between gods-or-the-like— are orthogonal to the dividing lines between individual people, which I've tended to describe as "vertical" rather than "horizontal" analysis of society. (The "horizontal" half of this metaphor clashes with pre-existing metaphors such as social strata; though the "vertical" half may have suggested itself exactly because it's orthogonal to those alternative "horizontal" senses.)
</p>
<p>
A crucial missing part of this picture, to my thinking, is how it generalizes to <i>other</i> oral cultures besides the ancient Greek. Snell and Havelock are, after all, specifically focused on ancient Greece exactly because of the properties that make it atypical (especially, its relatively extensive record and particular influence on later European culture), making generalization from it to other oral cultures problematic. This seems a significant area for future investigation; there's maybe a blog post there, or at least a significant chunk of one.
</p>
<p>
Havelock's requirement (which he ascribes to Plato) that each vivid episode in oral poetry be embedded in time —either past, present, or future— adds an interesting new dynamic to my timeline for the advancement of orality. I'd placed the development of tense relatively late in the overall span of time assigned to orality, which runs from the start of the Upper Paleolithic, a span of somewhere between forty and seventy thousand years. So, even in Greece it seems that for most of the oral period there would have been no tense. This raises the question of what oral storytelling would tend to look like <i>without</i> tense, since this would fall outside the scope of Havelock's portrait; most simply, one could lean further on juxtaposition of episodes to imply indirectly what tense would convey directly. The introduction of tense ought to make oral storytelling more potent, and future tense might be a natural technological advancement once one has past/present. It then seems plausible that Plato's timeless abstractions might be a naturally following technological development, but this reasoning hits a stumbling block in that Havelock firmly ties abstraction to literacy, and Greek literacy to alphabetic writing. One is then left with the rather peculiar question of what causal connection there could be between convenient writing and future tense, a puzzle that might-or-might-not benefit from some broader information on the chronological relations between future tense and different forms of writing in a broad sample of cultures. (Which would presumably require a closer examination of just what, in this context, one should understand <i>tense</i> to be — keeping in mind the slipperiness of the term, as noted above re Snell's Chapter 10.)
</p>
<p>
Thinking back to Jaynes's notion of hallucinated gods, with all the rest of this to draw on, a stray thought occurs. Given Parry's basic insight that oral epics are formulaic, and Havelock's that this corresponds to a certain kind of thought structure, should we expect the Jaynesian gods to have spoken to people formulaically? For that matter, could it be that the trouble with the modern hallucinations is that they're non-formulaic, that somehow this arrangement doesn't function smoothly without the oral thought-patterns?
</p>
<span style="font-size: large;" id="sec-sst-op">Opinion</span>
<p>
While evolution of the mind is what I'd mainly <i>expected</i> to get out of studying these books, what I got all-unexpectedly, from my deep-study of Havelock, was a strong suspicion that Plato may have been confronting a dire social problem remarkably similar to what we're facing a couple of decades into the twenty-first century.
</p>
<p>
I'd already been aware the shift from orality to literacy was understood to be, at least in part, caused by a perceived shift in the nature of the underlying stable representation of knowledge, from a "writ in water" oral tradition to something rather more like "writ in stone"; and it had occurred to me that the internet age has partly destabilized this, as records in electronic form are far more vulnerable to revision or outright erasure. The social volatility of the internet has been a pretty common observation for some years now. While some of this is perhaps due to the fact that information even in long-term storage is less stable than it was, the instability becomes drastically greater in the short term; the obviously massively increased <i>accessibility</i> of electronic records, for which we had such high hopes a couple of decades ago, turns out to be a double-edged sword, as empowering large segments of the population to make claims instantly accessible <i>to</i> large segments of the population subverts many of the traditional means by which rumors were kept from getting out of hand. It has seemed to me for some time that this change in the dynamics of meme transmission has so changed the environment of the ideosphere that we're in the midst of a memetic mass-extinction event.
</p>
<p>
And then there is the all-out war taking place now between fact-based and opinion-based mentalities. This has been a major theme in my recent thinking (though not, hitherto, on this blog), first in relation to <a href="https://en.wikinews.org/wiki/Wikinews:Neutrality">news neutrality</a> and then to the compatibility, or incompatibility, of civilization with the internet. In the <i>fact-based</i> mentality, one's basic impulse is to do one's best to acquire knowledge of objective reality, and one may then use that objective understanding as a foundation on which to build opinions. In the <i>opinion-based</i> mentality, one starts by choosing what opinions to hold, and may then select claims-of-fact to promote one's chosen opinions; or even <i>invent</i> claims of fact to promote one's opinions.
</p>
<p>
These bare-bones definitions don't give a sense of what the current conflict is like. So, with no previous blog post to refer to, I'll take a moment here for a ground-level view of the thing.
</p>
<p>
In the 2012 US presidential election, when <a href="https://en.wikipedia.org/wiki/Barack_Obama">Obama</a> beat <a href="https://en.wikipedia.org/wiki/Mitt_Romney">Romney</a> decisively, <a href="http://en.wikipedia.org/wiki/Fox_News">Fox News</a> had predicted a landslide for Romney based on what, to most of us, appeared to be laughably invalid techniques (iirc, they basically polled their own viewers and treated that as representative of the electorate). On election night <a href="https://en.wikipedia.org/wiki/Karl_Rove">Karl Rove</a> refused to believe that Romney had lost, and eventually Fox News's own <a href="https://en.wikipedia.org/wiki/Megyn_Kelly">Megyn Kelly</a> asked him, on air, "Is this just math that you do as a Republican to make yourself feel better? Or is this real?" (Megyn Kelly remained an advocate of journalism —a.k.a. "fact-based journalism"— at Fox for several more years, taking flak in 2016 for applying a fact-based approach to candidate <a href="https://en.wikipedia.org/wiki/Donald_Trump">Donald Trump</a>, before leaving Fox in early 2017 after publicly acknowledging having been sexually harassed by <a href="https://en.wikipedia.org/wiki/Roger_Ailes">Roger Ailes</a>. Trump, btw, will come up rather often in this discussion, for the rather banal reason that he's high-profile in the US — where I am.)
</p>
<p>
I'd had my own first major encounter with the opinion-based mindset back in 2010, in a discussion of <a href="https://en.wikinews.org">Wikinews</a> with a <a href="https://en.wikipedia.org">Wikipedian</a> who, if I'd quite understood the situation, had recently decided to adopt an anti-Wikinews... opinion. My own assessment of Wikipedia is, as a whole, a mix of positive and negative. (This isn't a situation where the positive and negative cancel each other.) In the late 1990s, after a hopeful start to the World Wide Web, we were sliding toward a dystopia in which knowledge would be strictly a privilege of the rich while freely available resources would be, pretty-much exclusively, propaganda. That dystopia didn't happen then, and continues to not happen, in significant part because of Wikipedia; not because of the quality of its information, which is <i>intrinsically</i> debatable and for this purpose largely irrelevant, but because the existence of Wikipedia makes it difficult for any one special interest to dominate the flow of information. The other side of it is that Wikipedia appears to have been designed, from the start, on the naive expectation that in a radically open editing environment, opinions will balance each other in the long run and therefore should not be a systemic concern. I don't believe, btw, that balance is a viable way to achieve neutrality (various counterexamples come to mind); but even if it were, I think Wikipedia is tragically mistaken in supposing it can safely ignore the fact-based/opinion-based conflict. To disregard the conflict is to favor the opinion-based side; Wikipedia fails on this front both by encouraging people to gamble on the reliability of claims (footnotes, Wikipedia's primary means to connect claims with their sources, are too low-profile to encourage mindfulness by readers, having been designed for unobtrusiveness in an age when information-providing was much more tightly controlled); and, more subtly, by teaching contributors to the project that neutrality is only achievable through a negotiation of many points of view, with the corollary that individuals are inherently non-neutral and therefore ultimately cannot aspire to individual neutrality (whereas Wikinews actively promotes individual neutrality, seeking to frame neutrality in a way that can for the most part be successfully applied by each individual contributor — and thereby playing into the dynamics of Wikinews's two-tier review-for-publication system, which however seems too technical a topic to belong in this blog post).
</p>
<p>
In the 2010 encounter, I was slow to catch on to what was happening, perhaps partly because the person I was attempting to discuss the matter with was someone I'd had, up to that time, a great deal of respect for (besides which, I had in those days not yet fully shaken free of the spell of Wikipedia's "<a href="https://en.wikipedia.org/wiki/Wikipedia:Assume_good_faith">assume good faith</a>" principle — another topic that seems too involved to tangle with here). They presented me with a block quote from a rabidly anti-Wikinews screed by someone who was clearly basing their attack on a fictionalized version of Wikinews. The Wikipedian presenting this to me finished with —and this too should have been a warning to me— the remark, "Hard to disagree with that." If I'd been quicker on the uptake, it might have occurred to me in the moment that it's only hard to disagree with this sort of attack if one really doesn't care about facts. However, not being quite that quick, I remarked that the argument was based on lack of knowledge of Wikinews. In retrospect, my remark was chosen in the expectation that if someone became aware their argument was based on a foundation of unsound claims, their basic impulse would be to seek a better understanding of the facts of the situation. That was not what happened. Instead, they looked for a way to shape whatever revised information might come their way into a different argument for the position they'd staked out for themselves.
</p>
<p>
Lest there be any question that this is a hot war, one might cite the 2018 murder of Washington Post columnist Jamal Khashoggi (here's a <a href="https://en.wikinews.org/wiki/Fianc%C3%A9e_of_murdered_Saudi_journalist_demands_justice_at_UN_General_Assembly">Wikinews article</a>), and reports from John Bolton on what Donald Trump says about journalists behind the scenes (here's a recent article, though not from Wikinews: J. Edward Moreno, "<a href="https://thehill.com/homenews/administration/503244-bolton-claims-trump-called-for-scumbag-journalists-to-be-executed">Bolton claims Trump called journalists 'scumbags' who should be 'executed'</a> ", <i>The Hill</i>, 06/17/20).
</p>
<p>
Okay, that's the modern conflict. How similar is it to the orality/literacy conflict of Plato's era? There's that suggestive speculation about the Late Bronze Age collapse. There's a suggestive irrationality about the opinion-based mentality. But then we're left to try to compare, more specifically, the oral mentality to the opinion-based mentality. Oral poetry is supposed to be rhythmic, vivid, episodic, and timeful. While it might be interesting to analyze Donald Trump's output against all these, that would clearly be an undertaking well beyond the scope of this (already quite long) blog post. However, as food for thought, here's a <i>Language Log</i> entry about Trump's rhetorical style from early this year: <a href="https://languagelog.ldc.upenn.edu/nll/?p=46024">Oral vs. written rhetoric</a> (filed by <a href="http://en.wikipedia.org/wiki/Mark_Liberman">Mark Liberman</a>, February 9).
</p>
<p>
But then, there's also something curious in that elusive translation of the ancient Greek word <a href="https://en.wiktionary.org/wiki/%CE%B4%CF%8C%CE%BE%CE%B1#Ancient_Greek"><i>doxa</i></a>, by which Havelock steadfastly maintains Plato is referring to oral thought, as opposed to <a href="https://en.wiktionary.org/wiki/%CF%83%CE%BF%CF%86%CE%AF%CE%B1#Ancient_Greek"><i>sophia</i></a>, intellectual thought. Havelock sticks, rather unenthusiastically, to the "conventional" translation of <i>doxa</i> as "opinion", with a chapter endnote that the word may be more properly "thought in general", "an unqualified 'state of mind' ". I'm thinking, though, the conventional translation —"opinion"— may capture something primal about the subject matter.
</p>
<p>
The suggestion I take from all this is that what we are facing today is about as close as can be to —literally, as well as in the spirit of the clichéd plot device— an ancient evil. Close enough to really wonder if we should be sending a scholar into the library to study ancient tomes and work out how the evil was defeated last time (speaking of clichés). Alas it's never quite that easy; circumstances have changed, the enemy has invented new defenses against common tactics. Amongst the most obvious is to preemptively accuse others of whatever one has been doing oneself (which is why I was especially worried when, toward the end of the 2016 election, Donald Trump started to accuse the Democrats of rigging the election). Nevertheless, a closer study of the tactics used in the ancient Greek conflict really does seem worthwhile.
</p>
<p>
I have in mind, for future posts, to explore how sapience interacts with technology, since that's another key factor in the current mess; as well as the dynamics of the structure of society (economics, politics, and whatnot); and —if at all possible without completely tying everyone's thinking in knots, including my own— how different cognitive types work and interact with each other and all the rest of it.
</p>
John Shutthttp://www.blogger.com/profile/00041398073010099077noreply@blogger.com4tag:blogger.com,1999:blog-7068528325708136131.post-51284616127128464632020-03-17T19:57:00.000-07:002020-03-21T06:36:42.123-07:00Irregularity in language<blockquote>
you wouldn't believe the kind of hate mail I get about my work on irregular verbs
<blockquote>
— <a href="https://en.wikiquote.org/wiki/Stephen_Pinker">Steven Pinker</a>, in an <a href="https://www.theguardian.com/books/2007/sep/22/featuresreviews.guardianreview8">interview with <i>The Guardian</i></a>, 2007.
</blockquote>
</blockquote>
<p>
Assembling my prototype <a href="https://en.wikibooks.org/wiki/Conlang">conlang</a> Lamlosuo transformed my understanding of irregularity in language. That was unexpected. The prototype was supposed to be a learning vehicle, yes — for learning about the language model I'd devised. Irregularity wasn't mentioned on the syllabus.
</p>
<p>
I set out to create an experimental prototype conlang with radically different semantic underpinnings than human natural languages (natlangs). (This blog is littered with evidence of my penchant for studying the structure of things by devising alternative structures to contrast with them.) The prototype was meant as a testbed for trying out features for conlangs based on the envisioned semantics; it had no strong <i>stake</i> in regularity, one way or another, aside from an inclination not to deliberately build in irregularities that would make the testbed less convenient to work with. The effect of the experiment, though, was rather like scattering iron filings on a piece of paper above a magnet, and thereby revealing unsuspected, ordinarily-invisible structure. From contemplating the shape of the prototype that emerged, I've both revised my thinking on irregularity in general, and drawn take-away lessons on the character of the language structure the prototype is actually meant to explore.
</p>
<p>
My <a href="https://fexpr.blogspot.com/2014/01/lamlosuo.html">first post</a> on Lamlosuo, several years ago now, laid out the premise of the project and a limited set of its structural consequences, while deferring further complications —such as an in-depth discussion of irregularity— to some later post. This post is its immediate sequel, describing major irregular elements of Lamlosuo as they emerged, as well as what I learned from them about irregularity in general and about the language model in particular.
</p>
<p>
<i>[Overall insights about the language project are largely —though by no means entirely— concentrated in the final section below. Insights into irregularity are distributed through the discussion, as they arise from details of the language.]</i>
</p>
<blockquote>
<b>Contents</b><br>
<a href="#sec-irr-gen">Irregularity</a><br>
<a href="#sec-irr-vec">Vector language</a><br>
<a href="#sec-irr-reg">Regularity</a><br>
<a href="#sec-irr-rid">Routine idiosyncrasies</a><br>
<a href="#sec-irr-pat">Patterns of variation</a><br>
<a href="#sec-irr-eid">Extraordinary idiosyncrasies</a><br>
<a href="#sec-irr-fut">Whither Lamlosuo?</a><br>
</blockquote>
<span style="font-size: large;" id="sec-irr-gen">Irregularity</span>
<p>
From our early-1970s hardcopy <i><a href="https://en.wikipedia.org/wiki/Encyclop%C3%A6dia_Britannica">Britannica</a></i> (bought by my parents to support their children's education), I gathered that commonly used words tend to accumulate irregularities, while uncommonly used words tend to accumulate regularity by <i>shedding</i> their irregularities. From 1990s internet resources on conlanging (published there by scattered conlangers as they reached out through the new medium to form a community), I gathered that irregularity may be introduced into a conlang to make it feel more naturalistic. All of which I still believe, but these credible ideas can easily morph into a couple of major misapprehensions about irregularity, both of which I was nursing by the time I committed to my first conlanging project, at the turn of the century: that the only reason natlangs have irregularity is that natlangs evolve randomly in the field, so that a planned conlang would only have irregularity if the designer deliberately put it there; and that irregularity serves no useful function in a language, so that desire for naturalism would be the only reason a conlang designer would put it there.
</p>
<p>
Twenty years later, I'd put my current understanding this way: Irregularity is a natural consequence of the impedance mismatch between the formal structure of language and the sapient semantics communicated through it (a mismatch I last blogged about <a href="https://fexpr.blogspot.com/2018/06/sapience-and-limits-of-formal-reasoning.html">yonder</a>). Sapient thought structures are too volatile to fit neatly into a single rigid format; large parts of a language, relatively far from its semantic core, may be tolerably regular, but the closer things get to its semantic core, the more often they call for variant structure. It may even be advantageous for elements near the core to be just slightly out of tune with each other, so they create (to use another physics metaphor) a complex <a href="https://en.wikipedia.org/wiki/Wave_interference">interference pattern</a> that can be exploited to slip sapient-semantic notions through the formal structure. Conversely, one may be able to deduce where the semantic core of the language is, from where this effect stirs up irregularity. By similar feedback, also, structural nonuniformities can orient sapient users of the language as they work intensively with the semantic core; I analogize this with the bumps on the <tt>F</tt> and <tt>J</tt> keys of a <tt>QWERTY</tt> keyboard, which allow a touch-typist to feel when their fingers are in standard position.
</p>
<p>
These effects are likely to apply as well to programming languages, which are ultimately vehicles for sapient thought. Note that the most peculiar symbol names of Lisp are concentrated at its semantic core: <tt>lambda</tt>, <tt>car</tt>, <tt>cdr</tt>.
</p>
<span style="font-size: large;" id="sec-irr-vec">Vector language</span>
<p>
My central goal for this first conlanging project was to entirely eliminate nouns and verbs, in a grammatical sense, by replacing the <a href="https://en.wikibooks.org/wiki/Conlang/Advanced/Grammar/Alignment">verb-with-arguments</a> structural primitive of human natlangs with some fundamentally different structural primitive. The verb-with-arguments structural pattern induces asymmetry between the grammatical functions of the central "verb" and the surrounding "nouns", which afaics is where the grammatical distinction between <i>verbs</i> and <i>nouns</i> comes from. (My notes also call these "being-doing" languages, as verbs commonly specify "doing" something while nouns specify simply "being" something.) In the structure I came up with to replace this, each content element would be, uniformly, an act of motion ("going"), understood to entail a thing that goes (the <i>cursor</i>), where it's going from and to, and perhaps some other elements such as the path by which it goes. For the project as a whole I hoped to have several related languages and some grammatical variance between them, but figured I'd need first to understand better how a language of this sort <i>can</i> work, to understand the kinds of variation possible. So I set out to build a prototype language, to serve as a testbed for studying whether-and-how the language model could work.
</p>
<p>
In the prototype language, there is just one open class of vocabulary words, called <i>vectors</i>, each of which has five participant slots, called <i>roles</i>. The five roles are: cursor, start, end, path, and pivot. The name <i>pivot</i> suggests that the action is somehow oriented about the pivot element, but really the pivot role is a sort of catch-all, a place to put an additional object associated with the action in some way. The pivot role in itself says something about irregularity. In lexicon building, each vector has definitions for each of its occupied roles. Defining all these roles for a given vector, I've found, establishes the meaning of the vector with great clarity. The cursor is the only absolutely mandatory role: there can't be a <i>going</i> without something that <i>goes</i>. The start and end are usually clear. The path is usually fairly straightforward as well, though sometimes occupied by an abstract process rather than a physical route of travel. But each vector is, in the end, semantically unique; and its uniqueness rebels against being pinned down precisely into a predetermined form —I analogize this to the <a href="https://en.wikipedia.org/wiki/Uncertainty_principle">Heisenberg uncertainty principle</a>, where constraining one part of a particle's description requires greater leeway for another part— so that while the cursor, start, and end are usually quite uniform, and the path has limited flexibility, the pivot provides more significant slack to accommodate the idiosyncrasy of each vector.
</p>
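<p>
(By way of illustration, for the programmers in the audience: here is a minimal sketch, in Python, of a vector as a data structure. The field names are invented for the occasion, not any official notation of the project; the only point the sketch encodes is the one just made, that the cursor is the sole mandatory role while the pivot is a deliberately loose catch-all.)
</p>
<pre>
from dataclasses import dataclass
from typing import Optional

@dataclass
class Vector:
    """A Lamlosuo-style content word: an act of going, with five roles."""
    gloss: str                    # rough English meaning of the vector
    cursor: str                   # the thing that goes; the only mandatory role
    start: Optional[str] = None   # where it goes from
    end: Optional[str] = None     # where it goes to
    path: Optional[str] = None    # route, or abstract process, of the going
    pivot: Optional[str] = None   # catch-all slot for each vector's idiosyncrasy
</pre>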
<p>
For example: The first meaning I worked out for the language was a vector meaning <i>speak</i>. This was before the language even had a phonology; it was meant to verify, before investing further in the structural primitive, that it was capable of handling abstracts; and <i>speak</i>, as a meaning in a conlang, was appealingly meta. In a speech act, it seemed the thing that goes from somewhere to somewhere is the message; so I reckoned the <i>cursor</i> should be the message, the thing said. The <i>start</i> would then be the speaker; and the <i>end</i> would be whomever receives it, the audience. It was unclear whether the <i>path</i> would be more usefully assigned to the route by which the message travels, or the transmission medium through which it travels (such as the air carrying sound, or aether carrying radio waves); waiting for a preference to emerge, I toyed with one or the other in my notes but ultimately the <i>path</i> role of that vector has remained unoccupied. For the <i>pivot</i>, I struck on the idea of making it <i>the language in which the message is expressed</i> (such as English — or Lamlosuo).
</p>
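<p>
(Continuing the sketch above, with the same invented field names, the <i>speak</i> vector comes out something like this.)
</p>
<pre>
# The 'speak' vector: the message goes from speaker to audience.
speak = Vector(
    gloss='speak',
    cursor='the message, the thing said',
    start='the speaker',
    end='the audience',
    path=None,  # route of travel vs. transmission medium: never settled
    pivot='the language the message is expressed in',
)
</pre>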
<p>
The "escape-valve" pattern —regularity with an outlet to accommodate variant structure that doesn't neatly fit the regularity— recurred a number of times in the language design as it gradually emerged. The various escape mechanisms accommodate different grades of variant structure, and while the relations between these devices are more complex than mere nesting, the whole reminds me somewhat of a set of <a href="https://en.wikipedia.org/wiki/Matryoshka_doll">matryoshka dolls</a>. With that image in mind, I'm going to try to order my description of these devices from the outside in, from the broadest and mildest irregularities to the narrowest and most extreme.
</p>
<p>
It's a fair question, btw, where all this emergent structure in the prototype emerges <i>from</i>. It all comes <i>through</i> my mind; the question is, what was I tapping into? (I'll set the origin of the vector primitive itself outside the scope of the question, as the initial inspiration seems rather discontinuous whereas the process after that may be somewhat fathomable.) My intent has been to access the <a href="https://plato.stanford.edu/entries/platonism/">platonic</a> structure of the language model; that's <i>platonic</i> with a lower-case <i>p</i>, meaning, structure independent of particular minds in the same sense that mathematical structure is independent of particular minds. Given the chosen language primitive, I've tried through the prototype to explore the contours of the platonic structural space around that chosen primitive, letting natural eddies in that space shape the design while, hopefully, reducing perturbations from biases-of-thought enough to let the natural eddies dominate. (I also have some things I could say on the relationship between platonic structure and sapient thought, which I might blog about at some point <i>if</i> I can figure out how to address it without getting myself, and possibly even those who read it, hopelessly mired in a quag of perspective bias.)
</p>
<span style="font-size: large;" id="sec-irr-reg">Regularity</span>
<p>
The outermost nesting shell; the outer matryoshka doll, as it were; is, in theory, the entirely regular structure of the language. I shall attempt to enumerate just those parts in this section, as briskly as may be. This arrangement turns out to be somewhat challenging, both because language features aren't altogether neatly arranged by how regular they are, and because the noted concentration of irregularity toward the semantic core assures there will be some irregularity in nearly all remotely interesting examples in Lamlosuo (on top of the limitations of Lamlosuo's thin vocabulary). Much of this material, with a bit more detail on some things and less on others, is included in the more leisurely treatment in the <a href="https://fexpr.blogspot.com/2014/01/lamlosuo.html">earlier post</a>.
</p>
<p>
Ordinarily, a syllable has five possible <a href="https://en.wikibooks.org/wiki/Conlang/Appendix/Glossary#;onset">onsets</a>: <b>f</b> <b>s</b> <b>l</b> <b>j</b> <b>w</b> (as in <i>fore</i>, <i>sore</i>, <i>lore</i>, <i>yore</i>, <i>wore</i>); five possible <a href="https://en.wikibooks.org/wiki/Conlang/Appendix/Glossary#;nucleus">nuclei</a>: <b>i</b> <b>u</b> <b>e</b> <b>o</b> <b>a</b> (close front, close back, mid front, mid back, open; in my idiolect, roughly as in <i>teem</i>, <i>tomb</i>, <i>tame</i>, <i>tome</i>, <i>tom</i>); and two possible <a href="https://en.wikibooks.org/wiki/Conlang/Appendix/Glossary#;coda">codas</a>: <b>n</b> <b>m</b> (as in <i>nor</i>, <i>more</i>). In writing a word, if a front vowel (<b>i</b> or <b>e</b>) is followed by <b>j</b> and another vowel, or if a back vowel (<b>u</b> or <b>o</b>) is followed by <b>w</b> and another vowel, the consonant between those vowels is omitted; for example, <b>lam‍losu‍wo</b> would be shortened to <b>lam‍losu‍o</b>. Two other sounds occasionally arise: an <a href="https://en.wikibooks.org/wiki/Conlang/Intermediate/Sounds/Phones#Allophones">allophone</a> of <b>f</b>, written as <b>t</b> (the initial sound of <i>thorn</i>); and one <a href="https://en.wikibooks.org/wiki/Conlang/Appendix/Glossary#;plosive">plosive</a> written as an apostrophe, <b>'</b> (the initial sound of <i>tore</i>).
</p>
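<p>
To make the spelling rule concrete, here is a minimal sketch in Python (a hedged illustration of my own: the function name, and the representation of words as plain strings without morpheme marks, are choices of this sketch, not of the language documents):
</p>
<pre>
VOWELS = "iueoa"

def elide(word):
    # Drop j after a front vowel (i, e), and w after a back vowel
    # (u, o), whenever another vowel follows.
    out = []
    for i, c in enumerate(word):
        if i and i + 1 != len(word):
            prev, nxt = word[i - 1], word[i + 1]
            if c == "j" and prev in "ie" and nxt in VOWELS:
                continue
            if c == "w" and prev in "uo" and nxt in VOWELS:
                continue
        out.append(c)
    return "".join(out)

assert elide("lamlosuwo") == "lamlosuo"
</pre>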
<p>
A basic vector word consists of an invariant stem and a mandatory class suffix. The stem is two or more consonant-vowel syllables (accent on the first syllable of the stem), and the class suffix is one consonant-vowel syllable. There are eleven classes: the neutral class, and ten genders; a neutral vector is sort-of a lexical verb, an engendered vector is sort-of a lexical noun (though this distinction lacks grammatical punch, as they're all still vectors). The neutral suffix after a back vowel (<b>u</b> or <b>o</b>) is ‑<b>wa</b>, otherwise it's ‑<b>ja</b> (so, the suffix consonant is omitted unless the stem ends with <b>a</b>). Genders identify <i>role</i> (one of the five) and <i>volitionality</i> (volitional or non-volitional). Non-volitional genders use front vowels, volitional genders use back vowels; the onset determines the role: ‑<b>li</b>/‑<b>lu</b> cursor, ‑<b>ti</b>/‑<b>tu</b> start, ‑<b>se</b>/‑<b>so</b> end, ‑<b>je</b>/‑<b>jo</b> path, ‑<b>we</b>/‑<b>wo</b> pivot. Somewhat relevant to irregularity, btw: start and end genders deliberately use different vowels to strengthen their phonological contrast since they have relatively weak semantic contrast; while, on the other hand, an earlier experiment in the language determined that assigning the vowels in consistent sets (either <b>i</b>/<b>u</b> or <b>e</b>/<b>o</b>, never <b>i</b>/<b>o</b> or <b>e</b>/<b>u</b>) is a desirable regularity to avoid confusion.
</p>
<p>
For example: The vector meaning <i>speak</i> has stem <b>losu</b>-. The neutral form is <b>losua</b>; engendered forms are <b>losuli</b> (message, non-volitional), <b>losutu</b> (speaker, volitional), <b>losuso</b> (audience, volitional), <b>losuo</b>/<b>losue</b> (living language/non-living language). My first thought for the non-volitional pivot, <b>losue</b>, was <i>dead language</i>; but then it occurred to me that that gender would also suit a conlang.
</p>
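<p>
The suffix machinery above is regular enough to reduce to a few lines; here is another hedged Python sketch (the function names are mine; the elision rule is repeated from the phonology sketch so this one stands alone), checked against the forms of <b>losu</b>- just given:
</p>
<pre>
VOWELS = "iueoa"

def elide(word):
    # the spelling rule from the phonology sketch above
    out = []
    for i, c in enumerate(word):
        if i and i + 1 != len(word):
            prev, nxt = word[i - 1], word[i + 1]
            if c == "j" and prev in "ie" and nxt in VOWELS:
                continue
            if c == "w" and prev in "uo" and nxt in VOWELS:
                continue
        out.append(c)
    return "".join(out)

# role: (non-volitional suffix, volitional suffix)
GENDERS = {"cursor": ("li", "lu"), "start": ("ti", "tu"),
           "end": ("se", "so"), "path": ("je", "jo"),
           "pivot": ("we", "wo")}

def neutral(stem):
    return elide(stem + ("wa" if stem[-1] in "uo" else "ja"))

def engender(stem, role, volitional):
    return elide(stem + GENDERS[role][1 if volitional else 0])

assert neutral("losu") == "losua"
assert engender("losu", "cursor", False) == "losuli"
assert engender("losu", "start", True) == "losutu"
assert engender("losu", "end", True) == "losuso"
assert engender("losu", "pivot", True) == "losuo"
assert engender("losu", "pivot", False) == "losue"
</pre>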
<p>
Vector words can also take any of a limited set of prefixes, each of the form consonant-vowel-consonant; as the two coda consonants are very similar (<b>m</b> and <b>n</b>), I try to avoid using two prefixes that differ only by coda. In ideal principle, each prefix would modify its vector in a uniform way. A vector prefix can also be detached from the vector it modifies, to become a preposition.
</p>
<p>
A simple clause is a chain of vectors, where each pair of consecutive vectors in the chain is connected by means of <i>role alignment</i>. Generically, one puts between the two vectors first a <i>dominant role particle</i>, which specifies a role of the first vector (the dominant vector in the alignment), then a <i>subordinate role particle</i> specifying a role of the second vector (the subordinate vector in the alignment), indicating that the same object occupies those two roles. Ordinarily, the dominant role particles are just the volitional gender suffixes, and the subordinate role particles are just the non-volitional gender suffixes, all now as standalone words, except using <b>f</b> rather than <b>t</b> for the start particles. For instance, <b>losua fu li susu‍a</b> would equate the start of <b>losua</b> with the cursor of <b>susu‍a</b>. If a vector is engendered, one <i>may</i> omit its role particle from an alignment, in which case by default it aligns on its engendered role (though an engendered vector <i>can</i> be explicitly aligned on any of its roles). There is also a set of <i>combined role particles</i>, using the usual role consonants with vowel <b>a</b>; a combined role particle aligns both vectors on that role.
</p>
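<p>
As a measure of how mechanical the regular core of alignment is, here is a small hedged sketch of the particle tables and the engendered-role default (representation and names are mine; a real clause would chain several such resolutions, the restrictive variants described just below are ignored, and the conventions of particular vectors, described later, can override all of this):
</p>
<pre>
DOMINANT = {"lu": "cursor", "fu": "start", "so": "end",
            "jo": "path", "wo": "pivot"}
SUBORDINATE = {"li": "cursor", "fi": "start", "se": "end",
               "je": "path", "we": "pivot"}
COMBINED = {"la": "cursor", "fa": "start", "sa": "end",
            "ja": "path", "wa": "pivot"}

def align(dom_particle, sub_particle, dom_gender=None, sub_gender=None):
    # Resolve one alignment to a (dominant role, subordinate role)
    # pair; an omitted particle falls back on the engendered role.
    if dom_particle in COMBINED:
        role = COMBINED[dom_particle]
        return role, role
    dom = DOMINANT[dom_particle] if dom_particle else dom_gender
    sub = SUBORDINATE[sub_particle] if sub_particle else sub_gender
    return dom, sub

# losua fu li susua: the start of losua is the cursor of susua
assert align("fu", "li") == ("start", "cursor")
# losutu li susua: engendered losutu defaults to its start role
assert align(None, "li", dom_gender="start") == ("start", "cursor")
</pre>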
<p>
Each of the fifteen basic role particles (five dominant, five subordinate, five combined) has a <i>restrictive</i> variant; the distinction being that a non-restrictive alignment asserts a relationship between vectors whose meanings are determined by other means, while a restrictive alignment must be taken into account in determining the meanings of the vectors. Each restrictive role particle prefixes the corresponding non-restrictive particle with its own vowel; thus, <b>ja</b> → <b>a‍ja</b>, etc.
</p>
<p>
A clause can be packaged up as an object by preceding it with a <i>subordinate content particle</i>. A subordinate content particle is simply a single vowel, as a standalone word. The five subordinate content particles determine the <i>mood</i> of the objectified clause (and can also be used at the front of a sentence to assign a mood to the whole thing): <b>a</b>, indicative; <b>i</b>, invitational; <b>u</b>, imperative; <b>e</b>, noncommittal; <b>o</b>, tentative. Having bundled up a clause as an object, one can then treat it as the subordinate half of a role alignment with a dominant vector. There are also dominant content particles, which package up the dominant vector (just the one vector) as an object to align with some role of the subordinate vector, thus beginning a subordinate relative clause. Dominant content particles prefix <b>ow</b>- to the corresponding subordinate content particles (the <b>w</b> attaches to the second syllable, and then is dropped since preceded by a back vowel) — with a lone exception for the dominant tentative content particle, which by strictly regular construction should be <b>oo</b> but uses leading vowel <b>u</b> (thus, <b>uo</b>) to avoid confusion with the dominant restrictive pivot particle (<b>oo</b>). (In crafting that detail, I was reminded of English "its" versus "it's".)
</p>
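<p>
The construction of the dominant content particles, collision and all, fits in a few lines of Python (hedged as before; only the derived forms come from the language notes, the code is mine):
</p>
<pre>
def elide(word):
    # only the back-vowel half of the spelling rule is needed here
    out = []
    for i, c in enumerate(word):
        if (i and i + 1 != len(word) and c == "w"
                and word[i - 1] in "uo" and word[i + 1] in "iueoa"):
            continue
        out.append(c)
    return "".join(out)

def dominant_content(sub_vowel):
    regular = elide("ow" + sub_vowel)
    # strictly regular construction of the tentative particle gives
    # oo, colliding with the dominant restrictive pivot particle
    return "uo" if regular == "oo" else regular

assert [dominant_content(v) for v in "aiueo"] == ["oa", "oi", "ou", "oe", "uo"]
assert elide("o" + "wo") == "oo"  # the colliding restrictive pivot particle
</pre>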
<p>
The image of a subordinate content particle packaging up a subordinate clause and objectifying it for alignment with a dominant role seems to have built into it a phrase-structure view of the situation. Possibly there is a way to view the same thing in a <a href="https://en.wikipedia.org/wiki/Dependency_grammar">dependency-grammar</a> framework (rather like wave-particle duality in physics); the whole constituency/dependency thing is not yet properly clear to me, and when I designed that part of Lamlosuo I was unaware of the whole controversy: phrase-structure was the only approach to grammar I'd ever seen, somewhat in grade-school and intensively in compiler parsing algorithms. So, this particular part of the language design <i>might or might not</i> contain an embedded structural bias.
</p>
<p>
A provector has a stem of the form vowel-consonant and a class suffix. The provector stems are <b>in</b>- (interrogative), <b>um</b>- (recollective), <b>en</b>- (indefinite), <b>on</b>- (relative), <b>an</b>- (demonstrative). The recollective provector has an antecedent earlier in the clause, and does not align with its syntactic predecessor; where ordinarily alignment can only align a vector with two others (the one before it and the one after it), as antecedent of a recollective provector it can participate in any number of additional alignments. (The demonstrative provector, btw, serves the function of a third-person pronoun, using cursor <b>an‍lu</b>/<b>an‍li</b> in general, volitional start <b>an‍tu</b> for a person of the same sex/gender as the speaker, volitional end <b>an‍so</b> for a person of different sex/gender from the speaker; but I digress.)
</p>
<p>
A vector can <a href="https://en.wikibooks.org/wiki/Conlang/Appendix/Glossary#;incorporation">incorporate</a> a simple clause. Position the vector at the front of the simple clause, and join the entire clause together with plosives (<b>'</b>) between its words; the whole then aligns as its first vector, with the rest of the incorporated clause aligned to it independent of any other surrounding context. Recollective provectors may be disambiguated by incorporating a copy of the antecedent vector.
</p>
<span style="font-size: large;" id="sec-irr-rid">Routine idiosyncrasies</span>
<p>
Beyond a vector's definitions of its neutral form and up-to-ten genders, each vector has a number of conventions associated with it that accommodate low-to-medium-grade vector-idiosyncrasies of the sort that occur broadly throughout the vocabulary. Role alignment is not as simple as "the object that occupies <i>this</i> role of <i>this</i> vector is the same object that occupies <i>that</i> role of <i>that</i> vector": that isn't always the sort of relation-between-vectors that's wanted, and when it is, there may be refinements needed to clarify what is meant. The meaning of an alignment is resolved primarily by alignment conventions of the dominant vector. My notes on the language design suggest that exceptions to the regular sense of alignment are most often associated with vectors corresponding, in a verb-with-arguments language, to conjunctions and helping verbs.
</p>
<p>
Combined role particles play a significant part in this because, it turns out, the "standard" meaning of the combined role particles —to align the same role of both vectors, thus <b>la</b> = <b>lu</b> <b>li</b>, <b>sa</b> = <b>so</b> <b>se</b>, etc.— is rarely wanted. The combined role particles are therefore an especially likely choice for reassignment by convention based on more realistic uses of a particular vector. A given vector often has some practical use, due to the particular meaning of that vector, for alignments that involve multiple roles of each vector (as a simple example, one might equate the cursor of both vectors, and at the same time equate the end of the first vector with the start of the second); or, sometimes, for some other more peculiar alignment strategy appropriate to the vector's particular meaning; and combined role particles are routinely drafted for the purpose.
</p>
<p>
Several rather ordinary vectors have some role that, by the nature of their meaning, is often a complex information structure described by a subordinate clause, and therefore they use the combined role particle on that role to imply a subordinate noncommittal content particle (<b>e</b>): <b>losua la </b>— (say that —), <b>lawa‍ja la </b>— (teach that —), <b>susu‍a wa </b>— (dream that —); <b>sofo‍a</b> (deduce) and <b>soo‍a</b> (imply) do this on multiple roles. A more mundane example of variant alignment conventions (not involving implied content particles) applies to stem <b>seli</b>-, <i>go repeatedly</i>, whose cursor is the set of instances of repeated going. When dominant in an alignment with combined cursor particle <b>la</b>, the subordinate vector is what is done repeatedly (a restrictive alignment); subordinate start, path, and end are assigned to those dominant roles, while subordinate cursor is assigned to dominant pivot. Preceding <b>seli</b>- by a number indicates the number of repetitions; for example, <b>siwe‍a seli‍a la jasu‍a</b> = sneeze three times. (In fact, this can be shortened to <b>siwe seli jasu‍a</b>; see my earlier remark on interesting examples.)
</p>
<p>
A moderately irregular configuration is two neutral vectors used consecutively in a clause with no explicit particle between them. The strictly regular language assigns no meaning to this arrangement, as there are no gender suffixes on the vectors to determine default roles when omitting the role particles; the configuration has to depend on conventions of the dominant (or, less likely, the subordinate) vector. The language notes stipulate that this type of alignment is restrictive.
</p>
<span style="font-size: large;" id="sec-irr-pat">Patterns of variation</span>
<p>
The alignment idiosyncrasies of particular vectors fall into overall patterns. At the start of Lamlosuo I didn't see this coming, which in retrospect seems part of my general failure to appreciate that irregularity is more than skin deep. With increasing number of vectors explored in the lexicon, though, I began to sense the shapes of these patterns beneath the surface, and then tried to work out what some of them were.
</p>
<p>
Because these are patterns that arise in other patterns that arise in the language, they compound the ambiguity between (again) the language's platonic structure versus my latent biases of thinking: each lexicon entry is subject to this ambiguity, both in the choice of the entry and in its articulation, while the perception of patterns amongst the data points is ambiguous again. This blog post has a lopsided interest in the platonic structure —my biases would be entirely irrelevant if not for the drive to subtract them from the picture— but I'd recommend here to not stint on even-handed skepticism. Vulnerable as the process is to infiltration by biases of thinking (the phrase "leaky as a sieve" comes to mind), it should be no less vulnerable to infiltration from the platonic structure of the language. Influences from the platonic realm can seep in both directly by perturbing interplay of language elements, and indirectly by perturbing biases of thought at any point in the backstory of the thought. Biased influence can therefore <i>be</i> platonic influence; or, putting the same thing another way, the only biases we'd want to subtract from the picture are those that aren't ultimately caused by the platonic structure. However murky the process gets, I'd still hope for the emergent patterns to carry evidence of the platonic structure.
</p>
<p>
Very early on, I'd speculated consecutive neutral vectors might align by chaining sequentially, cursor-to-cursor and end-to-start. This in its pure form looked less plausible as the lexicon grew, as it became clear that many vectors were of the wrong form. (For instance, aligning <b>losua susu‍a</b> in this way —<b>susu‍a</b> means <i>sleep</i>— would equate the message with the person who sleeps, and the audience with the act of falling asleep.) Another early notion was that some vectors would be used to modify other vectors, by aligning <i>in parallel</i> with them — equating cursor-to-cursor, path-to-path, start-to-start, end-to-end. I've called these modifiers <i>advectors</i>. Parallel alignment could be assigned, by dominant-vector-driven convention, to consecutive neutral vectors, and perhaps to the combined path particle (<b>ja</b>, which in this case would take on restrictive effect by convention). The sequential/parallel preference also arises in the semantics of more general alignments, such as the sentence (mentioned earlier) <b>losua fu li susu‍a</b>, which describes a speech act and a sleep act, both by the same person (dominant start, the speaker, is aligned with subordinate cursor, the sleeper); to understand the import of the alignment, one has to know whether the speaking and sleeping events take place in parallel (so that the person is speaking while sleeping) or in series (so that the person speaks and then sleeps).
</p>
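<p>
For reference, the two alignment shapes reduce to these role-to-role maps (dominant role on the left, subordinate on the right; the notation is mine, not the language notes'):
</p>
<pre>
SEQUENTIAL = {"cursor": "cursor", "end": "start"}   # chaining in series
PARALLEL = {"cursor": "cursor", "path": "path",     # advector modification
            "start": "start", "end": "end"}
</pre>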
<p>
When the merger of two vectors allows their combination to be treated as a single vector, the vector stems may be concatenated directly, forming a compound stem which can then be engendered after the merge. For example, stem <b>lolulelo</b>- means <i>father</i>, sequentially combining <b>lolu</b>- (<i>impregnate</i>) and <b>lelo</b>- (<i>give birth to</i>). According to current notes, btw, <b>lolulelo</b>- has shortened form <b>lole</b>-.
</p>
<p>
When a series of consecutive neutral vectors forms a compact clause, short of merging into a single compact vector, I've considered a convention that the neutral class suffix may be omitted from all but the last of the sequence — "in all but the most formal modern usage", as the language notes say. (Evidently I hesitated over this, as the LaTeX document has a boolean flag in it to turn this language feature on or off; but it's currently on.)
</p>
<p>
Accumulating vocabulary gradually revealed that pivots generally fell into several groupings: a reference point defining the action (whence the term <i>pivot</i>); an intermediate point on the path; motivation for the action; an agent causing the action; an instrument; a vehicle. Listing these now, it seems apparent these are the sorts of things that —in a <a href="https://en.wikipedia.org/wiki/Standard_Average_European">typical European natlang</a>— might well manifest as a clutter of more-or-less-obscure noun cases. I'd honestly never thought of those sorts of clutters-of-noun-cases as a form of intermediate-grade irregularity (despite having boggled at, say, <a href="https://en.wikipedia.org/wiki/Finnish_noun_cases#The_Finnish_locative_system">Finnish locatives</a>); and now I'm wondering why I hadn't.
</p>
<p>
Eventually, I worked out a tentative system of three logical roles —patient, agent, instrument— superimposed on the five concrete roles. These logical roles would map to concrete roles identifying the associated noun primarily affected by the action (patient), initiating the action (agent), and implementing the action (instrument). Of the three, only patient is mandatory; agent and instrument often occur, but sometimes either or both don't. Afaics, agent and instrument are always distinct from each other, but either may map to the same concrete role as patient.
</p>
<p>
Patient is usually either cursor or end, though occasionally pivot or start; "path patient", say my notes, "is unattested". Agent is usually either cursor, start, or pivot; if the patient is the cursor, usually the agent is either pivot or cursor. Instrument is usually either cursor or pivot: pivot when cursor is agent, cursor when cursor isn't agent. Patient also correlates with natlang verb valency: when a vector corresponds to an intransitive verb, its patient is almost always the cursor, when to a transitive verb, its patient is typically the end.
</p>
<p>
For some time it remained unclear whether the logical roles should be considered a platonic feature. I've often taken a "try it, see if it works" attitude toward adding things to the language, which is after all meant to be a testbed; the eventual rough indicator of a feature's platonic authenticity (platonicity?) is then how well it takes hold in the language once added. A few of the things I've added just sat there inertly in the language design, until eventually discarded as failing to resonate with the design (such as a vector equivalent of the English verb <i>be</i>; which in retrospect clashes with the Lamlosuo design, both as copula which is what role particles are for, and as pure being whereas vectors impose motion on everything they portray). Given some time to settle in, logical roles appear reasonably successful, having deeply integrated into some inner workings of the language: various sorts of alignments both guide and are guided by logical roles. Alignment guides logical roles, notably, in restrictive sequential or parallel alignments; for example, an advector inherits the logical roles of the other vector in parallel alignment. Logical roles guide alignment in the highly irregular vector(s) at the apparent heart of the language, which I'll describe momentarily.
</p>
<p>
I wondered about <i>aspect</i> —the structure of an activity with respect to time (as opposed to its placement in time, which is <i>tense</i>)— for the prototype language, since aspect is a prominent feature of human natlangs. Aspect has arisen in Lamlosuo mainly through the principle that the action of a neutral vector is usually supposed by default to happen once, whereas the action of an engendered vector is usually supposed to happen habitually. Thus, in <b>losua fu li susu‍a</b> someone speaks and then sleeps, whereas in <b>losutu li susu‍a</b> a <i>habitual</i> speaker sleeps. Usually, in a restrictive alignment, aspect too is inherited by the dominant vector, which affords some games with aspect by particular vectors (deferred to the next section below). If one wanted more nuanced sorts of aspect in the testbed language, one might introduce them through alignments with particular vectors that exist to embody those aspects; however, I never actually did this. Allowing myself to be guided by whatever "felt" natural to pursue (so one may speculate what sort of <a href="https://en.wikipedia.org/wiki/Butterfly_effect">butterfly started the relevant breeze</a>), my explorations led me instead to something... different. Not relating a vector to time, but rather taking "tangents" to the basic vector at various points and in various abstract-spatial directions. As the trend became more evident, I dubbed that sort of derived relation <i>attitude</i>. (My language notes assert, within the fictional narrative, that the emphasis on attitude rather than aspect is a natural consequence of the language speakers' navigational mindset.) Some rather mundanely regular particular vectors were introduced to support attitudes; looking through the lexicon, I see stems <b>jali</b>- (<i>leave</i>), <b>jeli</b>- (<i>go continuously</i>), <b>joli</b>- (<i>arrive</i>), supporting respectively the <i>inceptive</i>, <i>progressive</i>, and <i>terminative</i> attitudes.
</p>
<span style="font-size: large;" id="sec-irr-eid">Extraordinary idiosyncrasies</span>
<p>
In any given language, it seems, there's likely to be some particular hotspot in the vocabulary where idiosyncrasies cluster. Hopefully, the location of such a hotspot ought to say something profound about the language model, though as usual there's always potential bias-of-thought to take into account. The English verb <i>be</i> is a serious contender for the most irregular verb in the language, with <i>do</i> coming in a respectable second to round out this semantic heart of the language structure. As noted earlier, I've sometimes referred to human languages as "being-doing languages"; and occasionally my notes have called vector languages "going languages". Early on, I simplistically imagined that a generic vector meaning <i>go</i> might be the center of the language. Apparently not, though; in the central neighborhood, yes, but not at the very heart. The stand-out vector that's accumulated irregularity like it's going out of style is <b>fajoa</b> — meaning <i>change state</i>.
</p>
<p>
A sort of microcosm for this hotspot effect occurs in the finitely bounded set of Lamlosuo's vector prefixes (which, by the phonotactics described <a href="#sec-irr-reg">earlier</a>, are each consonant-vowel-consonant, so there are at most 5×5×2 = 50 of them, or 25 if no two prefixes differ only in their final consonant; the current lexicon has 12, which is about 50% build-out and feels fairly dense). Most of the prefixes are fairly straightforward in function (since prefix <b>jun</b>- makes a vector reflexive, <b>junlosua</b> would be <i>talk to oneself</i>; and so on). The most exceptional prefix, consistently through the evolution of the language, has been <b>lam</b>-, which makes the vector <i><a href="https://en.wikipedia.org/wiki/Deixis">deictic</a></i>, i.e., makes it refer to the current situation. The deictic prefix is rather strongly transformative, and I've used it only sparingly, on a few vectors where its effect is especially useful; in particular, stems <b>losu</b>-, <b>sile</b>-, <b>jilu</b>-. (Though I would expect a fluent speaker to confidently use <b>lam</b>- in less usual ways when appropriate, as fluent speakers are apt to occasionally bend their language to the bafflement of <a href="https://en.wikipedia.org/wiki/Second_language">L2</a> speakers.)
</p>
<p>
Stem <b>lam‍losu</b>- is the speaking in which the word itself occurs. Several of its engendered forms are particularly useful; <b>lamlosuo</b> (volitional pivot) is the living language of the speaking in which the word occurs, hence the conlang itself (viewed fictionally as a living language); <b>lam‍losu‍so</b> (volitional end) is the audience, thus the second-person pronoun; <b>lam‍losu‍tu</b> (volitional start) is the speaker, thus the first-person pronoun. The latter two are contracted (a purely syntactic form of irregularity, motivated by convenience/practicality) to <b>laso</b> and <b>latu</b>.
</p>
<p>
Stem <b>sile</b>- means <i>experience the passage of time</i>; the cursor is the experiencer; path, time; start, the experiencer's past; end, their future; pivot, the moment of experience. <b>lam‍sile</b>- is the immediate experience-of-time, whose pivot is <i>now</i>; after working with it for a while, I adopted a convention that the past/present/future might colloquially omit the prefix. Tense is indicated by aligning a clause with engendered <b>sile‍tu</b> (past), <b>sile‍wo</b> (present, if one wants to specify that explicitly), or <b>sile‍so</b> (future). Hence, <b>latu fi losua oa sile‍tu</b> = <b>sile‍tu a latu fi losua</b> = I spoke.
</p>
<p>
Stem <b>jilu</b>- means <i>go</i> or <i>travel</i> in a generic sense (whereas <i>go</i> in a directional sense is <b>wilu</b>-). <b>lam‍jilu‍a</b> is the going we're now engaged in; its cursor is an inclusive first-person pronoun (<i>we who are going together</i>); path, <i>the journey we're all on</i> (i.e, the activity we're engaged in); pivot, <i>here</i>; end (or occasionally start), <i>there</i>. With preposition <b>sum</b> indicating a long path, this enters into the formal phrase <b>sum lam‍sile‍tu sum lam‍jilu‍se</b>: <i>long ago and far away</i>.
</p>
<p>
Now, <b>fajo</b>-. <i>Change state</i>. The cursor is the thing whose state changes. Non-volitional path is the process of state-change, volitional path is the instrument of state-change. Non-volitional pivot is an intermediate state, volitional pivot is the agent of state-change. Start and end, both non-volitional, are the state before and after the change.
</p>
<p>
When dominant <b>fajo</b>- aligns its cursor with some role of a subordinate vector, <b>fajo</b>- is the state change undergone by the aligned subordinate role during the action of the subordinate vector. Either the dominant role, the subordinate role, or both may be elided; the dominant role when unspecified defaults to cursor —even if <b>fajo</b>- is engendered, an extraordinary exception— while the subordinate role when unspecified defaults to <i>patient</i>, making the meaning of the construct overtly dependent on which concrete role of the subordinate vector is the patient. Along with all this, the dominant pivot aligns to the subordinate agent, and dominant path to subordinate instrument (when the subordinate vector has those logical roles). According to the language notes, <i>if</i> the subordinate vector doesn't have an agent, <i>and</i> the subordinate pivot is an intermediate point on the subordinate path (as e.g. for <b>sile</b>-), <i>and</i> the subordinate cursor aligns with the dominant cursor, the dominant pivot is the state of the subordinate cursor as it passes through the subordinate pivot.
</p>
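<p>
To fix ideas on the defaulting, here is a hedged sketch; the logical-role tables for <b>susu</b>- and <b>losu</b>- are my own guesses for illustration, engenderment is simplified away, and the conditional rule at the end of the paragraph above is omitted:
</p>
<pre>
# logical role: concrete role that carries it (illustrative guesses)
SUSU = {"patient": "cursor"}                  # sleep: the sleeper
LOSU = {"patient": "end", "agent": "start"}   # speak: audience, speaker

def fajo_alignment(sub_logical, dom_role=None, sub_role=None):
    # Resolve a dominant fajo- alignment per the defaults above,
    # returning (dominant role, subordinate role) pairs.
    pairs = [(dom_role or "cursor", sub_role or sub_logical["patient"])]
    if "agent" in sub_logical:
        pairs.append(("pivot", sub_logical["agent"]))
    if "instrument" in sub_logical:
        pairs.append(("path", sub_logical["instrument"]))
    return pairs

# fajo-/susu-: the state change of the sleeper
assert fajo_alignment(SUSU) == [("cursor", "cursor")]
# fajoti losutu: engendered losutu supplies start as its aligned role
assert fajo_alignment(LOSU, sub_role="start") == [
    ("cursor", "start"), ("pivot", "start")]
</pre>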
<p>
One thus has such creatures as <b>fajo‍ti losu‍tu</b>, the state of having not yet spoken; and <b>fajo‍se losu‍so</b>, the state of having been spoken to. (Notice that these things take many more words to say in English than in Lamlosuo, whereas the past tense took many more words to say in Lamlosuo than in English.)
</p>
<p>
Cursor-aligned <b>fajo</b>- can also take the form of a preposition <b>fam</b> or prefix <b>fam</b>-, with the difference between the two that engenderment of the vector is applied after a prefix, but before a preposition. Thus, <b>fam</b>⟨stem⟩⟨gender⟩ = <b>fajo</b>⟨gender⟩ ⟨stem⟩<b>a</b>. For example, <b>susu‍e</b> = event of dreaming, <b>fam‍susu‍e</b> = <b>fajo‍e susu‍a</b> = state of dreaming.
</p>
<p>
When dominant <b>fajo</b>- aligns its path with a subordinate content clause, <b>fajo</b>- is the state change vector of the complex process described by the content clause. Combined role particle <b>ja</b> initiates a noncommittal content clause by implying subordinate content particle <b>e</b>. The dominant cursor is then the situation throughout the process, dominant start the situation before the process, dominant end the situation after the process, dominant pivot the agent of the process.
</p>
<p>
<b>fajoa</b> has siblings <b>lajoa</b> and <b>wajoa</b>.
</p>
<p>
<b>lajoa</b> describes a change of mental state. Dominant path of <b>lajo</b>- doesn't align with a subordinate clause, but dominant cursor aligns similarly to <b>fajo</b>-, describing the change of mental state of whichever participant in the subordinate action; note that the agent, if not otherwise determined, is the cursor's inclination toward the change (always available in the volitional pivot engendered form, <b>lajo‍o</b>). For example, recalling <b>sile‍tu</b> = <i>past</i> (earlier point in time), where <b>fajo‍ti sile‍a</b> = <b>fam‍sile‍ti</b> = <i>youth</i> (external state at earlier point in time), <b>lajo‍ti sile‍a</b> = <i>inexperience</i> (internal state at earlier point in time). When the subordinate vector already describes the mind, <b>fajo</b>- describes mental state, and <b>lajo</b>- is not used; e.g., <b>fam‍susu‍e</b> = <i>state of dreaming</i> is primarily an internal state.
</p>
<p>
<b>wajoa</b> describes the abstract process of being used as an instrument. Cursor, instance of use; non-volitional path, process of use; volitional path, person who uses; (volitional/non-volitional) pivot, agent of use; (volitional/non-volitional) start, instrument of use; end, patient of use. Alignment is similar to <b>fajo</b>-, but subordinate role defaults to instrument rather than patient. For example, <b>wajo‍o jilu‍a</b> = <i>person who uses a vehicle or riding beast</i>, <b>wajo‍o jilu‍e</b> = <i>person who uses a vehicle</i>, <b>wajo‍o jilu‍o</b> = <i>person who uses a riding beast</i>.
</p>
<p>
On the periphery of this central knot of irregularity is <b>jilu-</b>, meaning (again) <i>go</i> or <i>travel</i> in a generic sense. When dominant in an alignment with combined path particle <b>ja</b>, the role particle implies subordinate noncommittal content particle <b>e</b>, and <b>jilu</b>- aligns in parallel (it's an advector) to whatever complex process is described by the following subordinate clause. (I don't group this with the larger family of mundane vectors using combined role particles to imply subordinate content, because here the alignment is implicitly restrictive and doesn't follow from complexity in the semantics of the vector, as with <i>teach</i> (<b>lawa</b>-), <i>imply</i> (<b>soo</b>-), etc.) Here the alignment is purely a grammatical device; it unifies a complex process from the subordinate clause into a coherent vector, and objectifies it as the volitional path (engendered form <b>jilu‍jo</b>). More subtly, <b>jilu‍a ja</b> with an engendered subordinate vector can provide a neutral vector with habitual aspect: <b>jilu‍a wo jeo‍e</b> = <i>go using a fast vehicle (once)</i>, <b>jilu‍a ja jeo‍e</b> = <i>habitually go using a fast vehicle</i>.
</p>
<p>
One can (btw) also play games with habitual aspect in using <b>fajo</b>-, exactly because it <i>doesn't</i> inherit the aspect of the subordinate clause: engendering <b>fajo</b>- gives the state change habitual aspect, but gender in the subordinate clause does not. Thus, <b>latu we fajoa ja susu‍a lu laso</b> = <i>I (once) cause you to (once) sleep</i>; <b>latu we fajoa ja susu‍lu laso</b> = <i>I (once) cause you to habitually sleep</i>; <b>latu fajo‍o ja susu‍a lu laso</b> = <i>I habitually cause you to (once) sleep</i>; <b>latu fajo‍o ja susu‍lu laso</b> = <i>I habitually cause you to habitually sleep</i>. (Why I would have this soporific effect, we may suppose is provided by the context in which the sentence occurs.)
</p>
<span style="font-size: large;" id="sec-irr-fut">Whither Lamlosuo?</span>
<p>
After a while —perhaps a year or more of tinkering— Lamlosuo began to take on an increasingly organic texture. Natlangs draw richness from being shaped by many different people; a personal project, I think, when carried on for a long time starts to accrue richness from essentially the same source: its single author is never truly the same person twice. If you set aside the project and come back to it a week or a month later, you're not the same person you were when you set it aside; besides the additional things you've experienced in that time, most people would also no longer be quite immersed in some project details and would likely develop a somewhat different experience of them while reacquiring. So the personal project really is developed by many people: all the people that its single author becomes during development. This enrichment cannot be readily duplicated over a short time, because the author doesn't change much in a short time. This may be part of why the most impressive conlangs tend to be decades-long efforts; of course total labor adds up, but also, richness adds up.
</p>
<p>
The most active period of Lamlosuo development tailed off after about three years, due to a two-part problem in the vocabulary — phonaesthetic and semantic.
</p>
<p>
The phonology and phonotactics of Lamlosuo (whose conception I discussed a bit in the earlier post) are flat-out boring. There are just-about no internal markers indicating morphological structure within a vector stem —even the class suffix is generally hard to recognize as not part of the stem— so there has been a bias toward two-syllable vector stems; it's been my perception that uniformly two-syllable simple stems help a listener identify the class suffix, so that nonuniform stem lengths (especially, odd-syllable-count stems) can be disorienting. There are only a rather small number of two-syllable stems possible (basically, 5<sup>4</sup> = 625) and, moreover, packing those stems too close together within the available space not only makes them harder to remember, but harder even to distinguish. After a while I reformed the lexicon a bit by easing in some informal principles about distance between stems (somewhat akin to <a href="https://en.wikipedia.org/wiki/Hamming_distance">Hamming distance</a>) and some mnemonic similarities between semantically related stems. The most recent version of the language document has 70 simple vector stems.
</p>
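<p>
The arithmetic, and a crude stand-in for the spacing principle, are easy to render concretely (the positionwise distance function is my own simplification of the informal principle):
</p>
<pre>
from itertools import product

ONSETS, NUCLEI = "fsljw", "iueoa"
STEMS = ["".join(s) for s in product(ONSETS, NUCLEI, ONSETS, NUCLEI)]
assert len(STEMS) == 5 ** 4 == 625

def stem_distance(a, b):
    # positionwise mismatches between equal-length stems
    return sum(x != y for x, y in zip(a, b))

assert stem_distance("losu", "lolu") == 1  # speak/impregnate: uncomfortably close
assert stem_distance("losu", "fajo") == 4  # speak/change state: comfortably far
</pre>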
<p>
Semantically, a large part of the extant vocabulary is about the mechanics of saying things — attitude, conjunctions, numbers. One also wants to have something to talk <i>about</i>. Not wanting to build social biases into a vocabulary that didn't yet have a culture attached to it, I started with vocabulary for rather generic biological functions (eat, sleep...) and navigational maneuvers (go fast/slow, go against the current...) on the —naive— theory this would be "safe". Later, with the mechanics-oriented vocabulary more complete, a small second wave of content-oriented words ventured into emotional, intellectual, and spiritual matters. (The notes outline somewhat more ambitious spiritual structure than has been implemented yet; though I do rather like the stems deployed so far (speaking of bias) — <b>fulo</b>-, go wrongly, go contrary to the proper order of things; <b>jolo</b>-, go rightly, go with the proper order of things; <b>wio</b>-, inform emotion with reason; <b>wie</b>-, inspire reason with emotion.)
</p>
<p>
I did take away some lessons from building content vocabulary for Lamlosuo. The <i>vector</i> approach has a distinctly dynamic effect on the language's outlook, since it doesn't lend itself to merely labeling things but asks for some sort of "going" to underlie each word. This led, for instance, to the coining of two different words for <i>blood</i>, depending on what activity it's engaged in — <b>jesa‍lu</b> (circulating blood) and <b>fesa‍lu</b> (spilt blood). Also, just as the <i>vector</i> concept induces conception of motions for a given noun, the identification of roles for each vector induces conception of participants for a given activity; for instance, in trying to provide a vector corresponding to English adjective <i>fast</i>, one has first advector <b>jeo‍a</b>, go at high speed, from which one then gets <b>jeo‍lu</b> (fast goer), <b>jeo‍e</b> (fast vehicle), <b>jeo‍o</b> (fast riding beast).
</p>
<p>
The dynamism of everything being in motion is accompanied by a secondary effect playing in counterpoint: whereas human languages tend to provide each verb with a noun actor, Lamlosuo is more concerned to provide each vector with a noun affected. This is a rather subtle difference. The human-language tendency manifests especially in the <a href="https://en.wikibooks.org/wiki/Conlang/Appendix/Glossary#;nominative_case">nominative</a> case (which of course <a href="https://en.wikibooks.org/wiki/Conlang/Appendix/Glossary#;ergative-absolutive">ergative</a> languages don't have, but then, <a href="https://en.wikibooks.org/wiki/Conlang/Appendix/Glossary#;nominative-accusative">accusative</a> languages are more common); the Lamlosuo tendency is visible in the stipulation that the <i>patient</i> logical role is mandatory while the <i>agent</i> role is optional (keeping in mind, my terms <i>patient</i> and <i>agent</i> for Lamlosuo have somewhat different connotations than those terms as commonly used in human grammar: <i>affected</i> is not quite the same as <i>acted upon</i>). The distinction seems to derive from the relatively calm, measured nature of the vector metaphor for activity: while <i>going</i> is more dynamic than <i>being</i>, it is on balance less dynamic than most forms of <i>doing</i>. (If there's a bias there in my patterns of thought, I'm not sure its effect on this feature could be distinguished from its effect on the selection of the vector primitive in the first place.)
</p>
<p>
From time to time as Lamlosuo has developed, I've wondered about personal names. If even labeling ordinary classes of things requires the conception of underlying motions, how is one to handle a label meant to signify the unique identity of a particular person? I would resist a strategy for names that felt too much like backsliding into "being-doing" mentality, since much of the point of the exercise is to try something different (and since, into the bargain, any such backsliding would be highly suspect of bias on my part; not that I'd absolutely stonewall such a development, but the case for it would have to be pretty compelling). Early in the development of Lamlosuo, I was able to simply defer the question, as at that point questions about the language that <i>had</i> answers were the exception, and this was just one more in the general sea of unknowns. Lately, though, in closely studying the evolution of abstract thought in ancient Greece (reading Bruno Snell's <i><a href="https://en.wikipedia.org/wiki/Bruno_Snell">The Discovery of the Mind</a></i>, as part of drafting a follow-up to my post on <a href="https://fexpr.blogspot.com/2019/03/storytelling.html">storytelling</a>), I'm struck by how Lamlosuo's ubiquity of process sits in relation to Snell's analysis of abstract nouns, concrete nouns, and proper nouns (and verbs, to which Snell ascribes a special part in forming potent metaphors). The larger conlanging project, within which Lamlosuo serves, posits a long timeline of development of the conspeakers' civilization, and as I look at it now this raises the question of how their abstractions evolved. Mapping out the evolution might or might not provide, or inspire, a solution to the naming problem; at any rate, it's deeply unclear at this point what these factors imply for Lamlosuo, as well as for the larger project.
</p>
<p>
Avoiding cultural assumptions in the vocabulary created a highly clinical atmosphere (which is why I called the hope of culture-neutrality "naive": lack of cultural features is a kind of culture; also note, human culture ought to contain traces from the evolution of abstract thought). Each word tended to be given a rather pure, antiseptic meaning (until late in the game when I started deliberately working in a bit of flavor), heightening a trend already latent in the cut-and-dried mechanics of the language that arose from its early intent, as a testbed, to not bother with naturalism (so, in a way all this traces back to regularity). For example: hoping to insulate the various sex-related vocabulary words from lewd overtones, I set out to fashion an advector corresponding to the English adjective <i>obscene</i>, so that one might then claim the various other words weren't obscene without the advector (which of course amounts to making those other words more clinical). The result took on a life of its own. Advector <b>josu</b>-, <i>do something obscene</i> (with absolutely no implication whatever as to what is done); start agent; end patient; pivot instrument. One is naturally led to consider the difference between a non-volitional instrument and a volitional instrument. Throw in the reflexive prefix and, for extra vitriol, an invitational mood particle, and you've got <b>i jun‍josu‍a</b>, which the language notes translate as "FU", but really it's more precise, more... clinical than that.
</p>
<p>
One natural next major step for Lamlosuo —if there were to be a next major step, rather than moving on to the other languages it was meant to prepare the way for— would be a push to significantly expand the vocabulary, to allow testing the dynamics of larger discourses. (I wrote a long post a while back about <a href="https://fexpr.blogspot.com/2018/09/discourse-and-language.html">long discourses</a>). However, the bland, narrow vocabulary space seemed an obstacle to this sort of major vocabulary-expansion operation. A serious naturalistic conlang would combat this sort of problem <i>partly</i> through the richness that, as noted, comes from developing in many sessions over a long time; but ultimately one also has to mix this with some technological methods. Purely technological methods would always create something with an artificial feel, so one really wants to find ways of using technological methods to amplify whatever sapient richness is input to the system; and <i>that</i> sounds like an appropriate study for a testbed language such as Lamlosuo. Moreover, I just don't readily track all the complex details of a linguistic project like this — not if it's skipping like a pebble across time, with intervals between development sessions ranging from a few hours to a few years; I therefore imagined some sort of automated system that would help keep track of the parts of the language design, noting which parts are more, or less, conformant to expected patterns — and why. (I'm very much aware that, in creating such designs, to maintain an authentic sapient pattern you need to be able to explain an exception just once and not have the system keep hounding you about it until you give the answer the automated system favors.)
</p>
<p>
And at this point, things take an abrupt turn toward <a href="https://fexpr.blogspot.com/2011/04/fexpr.html">fexprs</a>. (Cf. <a href="https://en.wikipedia.org/wiki/Law_of_the_instrument">the law of the instrument</a>.) My internal document describing the language is written in <a href="https://en.wikibooks.org/wiki/LaTeX">LaTeX</a>. Yet, as just described, I'd like it to do <i>more</i>, and do it ergonomically. As it happens, I have a notion how to approach this, dormant since early in the development of my dissertation: I've had in mind that, if (as I've been inclined to believe for some years now) fexprs ought to replace macros in pretty much all situations where macros are used, then it follows that <a href="https://en.wikibooks.org/wiki/TeX">TeX</a>, which uses macros as its basic extension mechanism, should be redesigned to use fexprs instead. LaTeX is a (huge) macro package for TeX.
</p>
<p>
So, Lamlosuo waits on the speculative notion of a redesign of TeX. It seems I ought to come out of such a redesign with some sort of deeper understanding of the practical relationship between macro-based and fexpr-based implementations, because <a href="https://en.wikipedia.org/wiki/Donald_Knuth">Knuth</a>'s design of TeX is in essence quite integrated — a daunting challenge to contemplate tampering with. (One also has to keep in mind that the extreme stability of the TeX platform is one of its crucial features.) It's rather sobering to realize that a fexpr-based redesign of TeX <i>isn't</i> the most grandiose plan in my collection.
</p>
<span style="font-size: large;">Rewriting and emergent physics</span>
<blockquote>
<i>Tempora mutantur, nos et mutamur in illis.</i><br>
(Times change, and we change with them.)
<blockquote>
— <a href="https://en.wikipedia.org/wiki/Tempora_mutantur">Latin Adage</a>, 16th-century Germany.
</blockquote>
</blockquote>
<p>
I want to understand how a certain kind of mathematical system can act as a foundation for a certain kind of physical cosmos. The ultimate goal of course would be to find a physical cosmos that matches the one we're in; but as a first step I'd like to show it's possible to produce certain kinds of basic features that seem prerequisite to any cosmos similar to the one we're in. A demonstration of that much ought, hopefully, to provide a starting point to explore how features of the mathematical system shape features of the emergent cosmos.
</p>
<p>
The particular kind of system I've been incrementally designing, over a by-now-lengthy series of posts (most recently <a href="https://fexpr.blogspot.com/2018/07/co-hygiene-and-emergent-quantum.html">yonder</a>), is a rewriting system —think λ-calculus— where a "term" (really more of a <a href="https://en.wikipedia.org/wiki/Graph_theory">graph</a>) is a state of the whole spacetime continuum, a vast structure which is rewritten according to some local rewrite rules until it reaches some sort of "stable" state. The primitive elements of this state have two kinds of connections between them, <i>geometry</i> and <i>network</i>; and by some tricky geometry/network interplay I've been struggling with, gravity and the other fundamental forces are supposed to arise, while the laws of quantum physics emerge as an approximation good for subsystems sufficiently tiny compared to the cosmos as a whole. That's what's supposed to happen for physics of the real world, anyway.
</p>
<p>
To demonstrate the basic viability of the approach, I really need to make two things emerge from the system. The obvious puzzle in all this has been, from the start, how to coax quantum mechanics out of a classically deterministic rewriting system; inability to extract quantum mechanics from classical determinism has been the great stumbling block in devising alternatives to quantum mechanics for about as long as quantum mechanics has been around (harking back to von Neumann's 1932 no-go theorem). I established in a relatively recent post (<a href="https://fexpr.blogspot.com/2018/06/why-quantum-math-is-unclassical.html">thar</a>) that the quintessential mathematical feature of quantum mechanics, to be derived, is some sort of wave equation involving signed magnitudes that add (providing a framework in which waves can cancel, so producing interference and other quantum weirdness). The geometry/network decomposition is key for my efforts to do that; <i>not</i> something one would be trying to achieve, evidently, if not for the particular sort of rewriting-based alternative mathematical model I'm trying to apply to the problem; but, contemplating this alternative cosmic structure in the abstract, starting from a welter of interconnected elements, one first has to ask where the geometry — and the network — and the distinction between the two — <i>come</i> from.
</p>
<p>
Time after time in these posts I set forth, for a given topic, all the background that seems relevant at the moment, sift through it, glean some new ideas, and then set it all aside and move on to another topic, till the earlier topic, developing quietly while the spotlight is elsewhere, becomes fresh again and offers enough to warrant revisiting. It's not a strategy for the impatient, but there <i>is</i> progress, as I notice looking back at some of my posts from a few years ago. The feasibility of the approach hinges on recognizing that its value is not contingent on coming up with some earth-shattering new development (like, say, a fully operational Theory of Everything). One is, of course, always <i>looking</i> for some earth-shattering new development; looking for it is what gives the whole enterprise shape, and one also doesn't want to become one of those historical footnotes who after years of searching brushed past some precious insight and failed to recognize it, so that it had to wait for some other researcher to discover it later. But, as I noted early in this series, the simple act of pointing out <i>alternatives</i> to a prevailing paradigm in (say) physics is beneficial to the whole subject, like tilling soil to aerate it. Science works best with alternatives to choose between; and scientists work best when their thoughts and minds are well-limbered by stretching exercises. For these purposes, in fact, the more alternatives the merrier, so that as a given post is less successful in reaching a focused conclusion it's more likely to compensate in variety of alternatives.
</p>
<p>
In this series of physics posts, I keep hoping to get down to mathematical brass tacks; but very few posts in the series actually do so (with a recent exception in <a href="https://fexpr.blogspot.com/2018/06/why-quantum-math-is-unclassical.html">June of last year</a>). Alas, though the current post does turn its attention more toward mathematical structure, it doesn't actually achieve concrete specifics. Getting to the brass tacks requires first working out where they ought to be put.
</p>
<blockquote>
<b>Contents</b><br>
<a href="#sec-rwep-cast">Dramatis personae</a><br>
<a href="#sec-rwep-edges">Connections</a><br>
<a href="#sec-rwep-halt">Termination</a><br>
<a href="#sec-rwep-dual">Duality</a><br>
<a href="#sec-rwep-scale">Scale</a><br>
</blockquote>
<span style="font-size: large;" id="sec-rwep-cast">Dramatis personae</span>
<p>
A rewriting calculus is defined by its <i>syntax</i> and <i>rewriting rules</i>; for a given computation, one also needs to know the <i>start term</i>. In this case, we'll put off for the moment worrying about the starting configuration for our system.
</p>
<p>
The syntax defines the shapes of the pieces each state (aka term, graph, configuration) is made of, and how the pieces can fit together. For a λ-like calculus, the pieces of a term would be syntax-tree nodes; the parent/child connections in the tree would be the geometry, and the variable binding/instance connections would be the network. My best guess, thus far, has been that the elementary pieces of the cosmos would be events in spacetime. Connections between events would, according to the general scheme I've been conjecturing, be separated into local connections, defining spacetime, and non-local connections, providing a source of seeming-randomness if the network connections are sufficiently widely distributed over a cosmos sufficiently vast compared to the subsystem under consideration.
</p>
<p>
I'm guessing that, to really make this seeming-randomness trick work, the cosmos ought to be made up of some truly vast number of events; say, 10<sup>60</sup>, or 10<sup>80</sup>, or on up from there. If the network connections are <i>really</i> more-or-less-uniformly distributed over the whole cosmos, irrespective of the geometry, then there's no obvious reason not to count events that occur, say, within the event horizon of a black hole, and from anywhere/anywhen in spacetime, which could add up to much more than the currently estimated number of particles in the universe. Speculatively (which is the mode all of this is in, of course), if the galaxy-sized phenomena that motivate the dark-matter hypothesis are too big, relative to the cosmos as a whole, for the quantum approximation to work properly —so one would expect these phenomena to sit oddly with our lesser-scale physics— that would seem to suggest that the total size of the cosmos is <i>finite</i> (since in an infinite cosmos, the ratio of the size of a galaxy to the size of the universe would be exactly zero, no different than the ratio for an electron). Although, as an alternative, one might suppose such an effect could derive, in an infinite cosmos, from network connections that aren't distributed altogether uniformly across the cosmos (so that connections with the infinite bulk of things get damped out).
</p>
<p>
With the sort of size presumed necessary to the properties of interest, I won't be able to get away with the sort of size-based simplifying trick I've gotten away with before, as with a toy cosmos that has only four possible states. We can't expect to run a simulation with program states comparable in size to the cosmos; <a href="https://en.wikipedia.org/wiki/Moore%27s_law">Moore's law</a> won't stretch that far. For this sort of research I'd expect to have to learn, if not invent, some tools well outside my familiar haunts.
</p>
<p>
The form of cosmic rewrite rules seems very much up-for-grabs, and I've been modeling guesses on λ-like calculi while trying to stay open to pretty much any outre possibility that might suggest itself. In λ-like rewriting, each rewriting rule has a <i>redex pattern</i>, which is a local geometric shape that must be matched; it occurs, generally, only in the geometry, with no constraints on the network. The redex-pattern may call for the existence of a tangential network connection —the β-rule of λ-calculus does this, calling for a variable binding as part of the pattern— and the tangential connection may be rearranged when applying the rule, just as the local geometry specified by the redex-pattern may be rearranged. Classical λ-calculus, however, obeys <i>hygiene</i> and <i>co-hygiene</i> conditions: hygiene prohibits the rewrite rule from corrupting any part of the network that isn't tangent to the redex-pattern, while co-hygiene prohibits the rewrite rule from corrupting any part of the geometry that isn't within the redex-pattern. Impure variants of λ-calculus violate co-hygiene, but still obey hygiene. The guess I've been exploring is that the rewriting rules of physics are hygienic (and <a href="https://en.wikipedia.org/wiki/Confluence_(abstract_rewriting)">Church-Rosser</a>), and gravity is co-hygienic while the other fundamental forces are non-co-hygienic.
</p>
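<p>
To ground the hygiene condition in something concrete, here is a minimal Python sketch of β-rewriting with capture-avoiding substitution; the representation is the standard textbook one, chosen purely for illustration, with no pretense of physics:
</p>
<pre>
from dataclasses import dataclass
from itertools import count

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Lam:
    param: str
    body: object

@dataclass(frozen=True)
class App:
    fn: object
    arg: object

fresh = (f"x{i}" for i in count())  # supply of fresh variable names

def free_vars(t):
    if isinstance(t, Var):
        return {t.name}
    if isinstance(t, Lam):
        return free_vars(t.body) - {t.param}
    return free_vars(t.fn) | free_vars(t.arg)

def subst(t, name, value):
    # Hygienic substitution: rather than corrupt a variable (a
    # network structure) not tangent to the redex, rename it.
    if isinstance(t, Var):
        return value if t.name == name else t
    if isinstance(t, App):
        return App(subst(t.fn, name, value), subst(t.arg, name, value))
    if t.param == name:
        return t  # name is shadowed; nothing to substitute below
    if t.param in free_vars(value):
        p = next(fresh)  # imminent capture: rename the binder
        t = Lam(p, subst(t.body, t.param, Var(p)))
    return Lam(t.param, subst(t.body, name, value))

def beta(t):
    # one beta-step at the root: (lambda x. body) arg
    return subst(t.fn.body, t.fn.param, t.arg)

# (lambda x. lambda y. x) y rewrites to lambda y'. y, not lambda y. y
result = beta(App(Lam("x", Lam("y", Var("x"))), Var("y")))
assert result.param != "y" and result.body == Var("y")
</pre>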
<p>
I've lately had in mind that, to produce the right sort of probability distributions, the fluctuations of cosmic rewriting ought to, in essence, <i>compare</i> the different possible behaviors of the subsystem-under-consideration. Akin to numerical solution of a problem in the <a href="https://en.wikipedia.org/wiki/Calculus_of_variations">calculus of variations</a>.
</p>
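<p>
As a cartoon of that comparison idea (my own toy example, not a proposed physical model): random fluctuations of a candidate trajectory, kept or discarded by comparing a discretized action, numerically settle toward the stationary path.
</p>
<pre>
# Cartoon of "fluctuations comparing behaviors" (illustrative toy only):
# candidate trajectories are compared by a discretized action, in the
# spirit of numerical calculus of variations.

import random

DT = 0.1   # time step; 21 samples span total time 2.0, short enough that
           # the classical oscillator path is an actual minimum of the action

def action(path):
    """Discretized action for a unit-mass, unit-frequency oscillator:
    sum of (kinetic - potential) * DT over the steps."""
    S = 0.0
    for q0, q1 in zip(path, path[1:]):
        v = (q1 - q0) / DT
        S += (0.5 * v * v - 0.5 * q0 * q0) * DT
    return S

def fluctuate(path, scale=0.02):
    """Randomly jiggle the interior of the path, endpoints held fixed."""
    return [path[0]] + [q + random.gauss(0.0, scale)
                        for q in path[1:-1]] + [path[-1]]

random.seed(1)
path = [i / 20.0 for i in range(21)]       # initial guess: straight line 0 -> 1
for _ in range(20000):
    candidate = fluctuate(path)
    if action(candidate) < action(path):   # keep whichever behavior "wins"
        path = candidate
print(round(action(path), 4))              # settles toward the stationary path
</pre>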
<p>
Since the shape of spacetime is going to have to emerge from all this, the question arises —again— of why some connections to an event should be "geometry" while others are "network". The geometry is relatively regular and, one supposes, stable, while the network should be irregular and highly volatile; in fact the seeming-randomness <i>depends</i> on its being irregular and volatile. Conceivably, the redex patterns are geometric (or mostly geometric) because the engagement of connections within redex patterns <i>causes</i> those connections to be geometric in character (regular, stable), relative to the evolution of the cosmic state.
</p>
<p>
The overall character of the network is another emergent feature likely worth attention. Network connections in λ-calculus are grouped into variables, sub-nets defined by a binding and its bound instances, in terms of which hygiene is understood. Variables, as an example of network structure, seem built-in rather than emergent; the β-rule of λ-calculus is apparently too wholesale a rewriting to readily foster ubiquitous emergent network structure. Physics, though, seems likely to engage less wholesale rewriting, from which there should also be emergent structure, some sort of <i>lumpiness</i> —macrostructures— such that (at a guess) incremental scrambling of network connections would tend to circulate those connections only within a particular lump/macrostructure. The apparent alternative to such lumpiness would be a degree of uniform distribution that feels, to my intuition anyway, unnatural. One supposes the lumpiness would come into play in the nature of stable states that the system eventually settles into, and perhaps the size and character of the macrostructures would determine at what scale the quantum approximation ceases to hold.
</p>
<span style="font-size: large;" id="sec-rwep-edges">Connections</span>
<p>
Clearly, how the connections between nodes —the edges in the graph— are set up is the first thing we need to know, without which we can't imagine anything else concrete about the calculus. Peripheral to that is whether the nodes (or, for that matter, the edges) are <i>decorated</i>, that is, labeled with additional information.
</p>
<p>
In λ-calculus, the geometric connections are of just three forms, corresponding to the three syntactic forms in the calculus: a variable instance has one parent and no children; a combination node has one parent and two children, operator and operand; and a λ-expression has one parent and one child, the body of the function. For network connections, ordinary λ-calculus has one-to-many connections from each binding to its bound instances. These λ network structures —variables— are correlated with the geometry; the instances of a variable can be arbitrarily scattered through the term, but the binding of the variable, of which there is exactly one, is the sole asymmetry of the variable and gives it an effective singular location in the syntax tree, required to be an ancestor in the tree of all the locations of the instances. Interestingly, in the vau-calculus generalization of λ-calculus, the side-effectful bindings are somewhat less uniquely tied to a fixed location in the syntax tree, but are still one-per-variable and required to be located above all instances.
</p>
<p>
Physics doesn't obviously lend itself to a tree structure; there's no apparent way for a binding to be "above" its instances, nor apparent support for an asymmetric network structure. Symmetric structures would seem indicated. A conceivable alternative strategy might use time as the "vertical" dimension of a tree-like geometry, though this would seem rather contrary to the loss of absolute time in relativity.
</p>
<p>
A major spectrum of design choice is the arity of network structures, starting with whether network structures should have fixed arity, or unfixed as in λ-like calculi. Unfixed arity would raise the question of what size the structures would tend to have in a stable state. Macrostructures, "lumps" of structures, are a consideration even with fixed arity.
</p>
<span style="font-size: large;" id="sec-rwep-halt">Termination</span>
<p>
In exploring these realms of possible theory, I often look for ways to defer aspects of the theory till later, as a sort of <a href="https://en.wikipedia.org/wiki/Gordian_knot">Gordian-knot-cutting</a> (reducing how many intractable questions I have to tackle all at once). I've routinely left unspecified, in such deferral, just what it should mean for the cosmic rewriting system to "settle into a stable state". However, at this point we really have no choice but to confront the question, because our explicit main concern is with mathematical properties of the probability distribution of stable states of the system, and so we can do nothing concrete without pinning down what we mean by <i>stable state</i>.
</p>
<p>
In physics, one tends to think of stability in terms of asymptotic behavior in a metric space; afaics, <a href="https://en.wikipedia.org/wiki/Exponential_stability">exponential stability</a> for linear systems, <a href="https://en.wikipedia.org/wiki/Lyapunov_stability">Lyapunov stability</a> for nonlinear. In rewriting calculi, on the other hand, one generally looks for an <i>irreducible form</i>, a final state from which no further rewriting is possible. One could also imagine some sort of <i>cycle</i> of states that repeat forever, though making that work would require answers to some logistical questions. Stability (cyclic or otherwise) might have to do with constancy of which macrostructure each of an element's network connections associates to.
</p>
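<p>
Here's a small sketch of the contrast (hypothetical toy code, with arbitrary stand-ins for the rewrite relation): iterating a rewrite step until it either sticks at an irreducible form or falls into a cycle of recurring states.
</p>
<pre>
# Toy sketch of the two rewriting-flavored notions of "stable state":
# an irreducible form (no rule applies) versus a cycle of recurring states.
# The rewrite functions here are arbitrary stand-ins, not a proposed physics.

def settle(state, step, max_steps=10_000):
    """Iterate a rewrite `step` until it sticks (irreducible form) or
    repeats (a cycle); returns a tag plus the stable state or the cycle."""
    seen = {}                 # state -> index at which it first appeared
    trace = []
    for i in range(max_steps):
        if state in seen:
            return ("cycle", trace[seen[state]:])
        seen[state] = i
        trace.append(state)
        nxt = step(state)
        if nxt == state:      # irreducible: no further rewriting possible
            return ("irreducible", state)
        state = nxt
    return ("undecided", state)

# x -> x // 2 settles to the irreducible form 0;
# x -> (x * 2) % 7 falls into the cycle [3, 6, 5].
print(settle(97, lambda x: x // 2))
print(settle(3, lambda x: (x * 2) % 7))
</pre>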
<p>
If rewriting effectively explores the curvature of the action function (per the calculus of variations as mentioned earlier), it isn't immediately obvious how that would then lead to asymptotic stability. At any rate, different notions of stability lead to wildly different mathematical developments of the probability distribution, hence this is a major point to resolve. The choice of stability criterion may depend on recognizing what criterion can be used in some technique that arrives at the right sort of probability distribution.
</p>
<p>
There's an offbeat idea lately proposed by <a href="https://en.wikipedia.org/wiki/Tim_Palmer_(physicist)">Tim Palmer</a> called the <a href="https://en.wikipedia.org/wiki/Invariant_set_postulate">invariant set postulate</a>. Palmer, so I gather, is a mathematical physicist deeply involved in weather prediction, from which he's drawn some ideas to apply back to fundamental physics. A familiar pattern in nonlinear systems, apparently, is a fractal subset of state space which, under the dynamics of the system, the system tends to converge upon and, if the system state actually comes within the set, remains invariant within. In my rewriting approach these would be the stable states of the cosmos. The invariant set should be itself a metric space of lower dimension than the state space as a whole and (if I'm tracking him) uncomputable. Palmer proposes to <i>postulate</i> the existence of some such invariant subset of the quantum state space of the universe, to which the actual state of the universe is required to belong; and requiring the state of the universe to belong to this invariant set amounts to requiring non-independence between elements of the universe, providing an "out" to cope with no-go theorems such as <a href="https://en.wikipedia.org/wiki/Bell%27s_theorem">Bell's theorem</a> or the <a href="https://en.wikipedia.org/wiki/Kochen%E2%80%93Specker_theorem">Kochen–Specker theorem</a>. Palmer notes that while, in the sort of nonlinear systems this idea comes from, the invariant set arises as a consequence of the underlying dynamics of the system, for quantum mechanics he's postulating the invariant set with no underlying dynamics generating it. This seems to be where my approach differs fundamentally from his: I suppose an underlying dynamics, produced by my cosmic rewriting operation, from which one would expect to generate such an invariant set.
</p>
<p>
Re Bell and, especially, Kochen-Specker: those no-go theorems rule out certain kinds of mutual independence between separate observations under quantum mechanics; but the theorems can be satisfied —"coped with"— by imposing some quite subtle constraints, such as Palmer's invariant set postulate. It seems possible that Church-Rosser-ness, which tampers with independence between alternative rewrite sequences, may also suffice to cope with the theorems.
</p>
<span style="font-size: large;" id="sec-rwep-dual">Duality</span>
<p>
What if we treated the lumpy macrostructures of the universe as if they were primitive elements; would it be possible then to describe the primitive elements of the universe as macrostructures? Some caution is due here as to whether this micro/macro duality would belong to the fundamental structure of the cosmos or to an approximation. (Of course, this whole speculative side trip could be a wild goose chase; but, as usual, on one hand it might not be a wild goose chase, and on the other hand wild-goose-chasing can be good exercise.)
</p>
<p>
Perhaps one could have two coupled sets of elements, each serving as the macrostructures for the other. The coupling between them would be network (i.e., non-geometric), through which presumably each of the two systems would provide the other with quantum-like character. In general the two would have different sorts of primitive elements and different interacting forces (that is, different syntax and rewrite-rules). Though it seems likely the duals would be quite different in general, one might wonder whether in a special case they could sometimes have the same character, in which case one might even ask whether the two could settle into identity, a single system acting as its own macro-dual.
</p>
<p>
For such dualities to make sense at all, one would first have to work out how the geometry of each of the two systems affects the dynamics of the other system — presumably, manifesting through the network as some sort of probabilistic property. Constructing any simple system of this sort, showing that it can exhibit the sort of quantum-like properties we're looking for, could be a worthwhile proof-of-concept, providing a buoy marker for subsequent explorations.
</p>
<p>
On the face of it, a basic structural difficulty with this idea is that primitive elements of a cosmic system, if they resemble individual syntax nodes of a λ-calculus term, have a relatively small fixed upper bound on how many macrostructures they can be attached to, whereas a macrostructure may be attached to a vast number of such primitive elements. However, there <i>may</i> be a way around this.
</p>
<span style="font-size: large;" id="sec-rwep-scale">Scale</span>
<p>
I've discussed <a href="https://fexpr.blogspot.com/2017/06/co-hygiene-and-quantum-gravity.html#sec-csq-cosmic">before</a> the phenomenon of <a href="https://en.wikipedia.org/wiki/Quasiparticle">quasiparticles</a>, group behaviors in a quantum-mechanical system that appear (up to a point) as if they were elementary units; such eldritch creatures as <a href="https://en.wikipedia.org/wiki/Phonon">phonons</a> and <a href="https://en.wikipedia.org/wiki/Electron_hole">holes</a>. Quantum mechanics is fairly tolerant of inventing such beasts; they are overtly approximations of vastly complicated underlying systems. Conventionally "elementary" particles can't readily be analyzed in the same way —as approximations of vastly complicated systems at an even smaller scale— because quantum mechanics is inclined to stop at <a href="https://en.wikipedia.org/w/index.php?title=Planck_scale">Planck scale</a>; but I suggested one might achieve a similar effect by importing the complexity through network connections from the very-large-scale cosmos, as if the scale of the universe were wrapping around from the very small to the very large.
</p>
<p>
We're now suggesting that network connections provide the quantum-like probability distributions, at whatever scale affords these distributions. Moreover, we have this puzzle of imbalance between, ostensibly, small bounded network arity of primitive elements (analogous to nodes in a syntax tree) and large, possibly unbounded, network arity of macrostructures. The prospect arises that perhaps the conventionally "elementary" particles —quarks and their ilk— could be <i>already</i> very large structures, assemblages of very many primitive elements. In the analogy to λ-calculus, a quark would correspond to a subterm, with a great deal of internal structure, rather than to a parse-tree-node with strictly bounded structure. The quark could then have a very large network arity, after all. Quantum behavior would presumably arise from a favorable interaction between the influence of network connections to macrostructures at a very large cosmic scale, and the influence of geometric connections to microstructures at a very small scale. The structural interactions involved ought to be fascinating. It seems likely, on the face of it, that the macrostructures, exhibiting altogether different patterns of network connections than the corresponding microstructures, would also have different sorts of probability distributions, not so much quantum as <i>co-quantum</i> — whatever, exactly, that would turn out to mean.
</p>
<p>
If quantum mechanics is, then, an approximation arising from an interaction of influences from geometric connections to the very small and network connections to the very large, we would expect the approximation to hold, not at the small end of the range of scales, but only at a subrange of intermediate scales — not too large and at the same time not too small. In studying the dynamics of model rewriting systems, our attention should then be directed to the way these two sorts of connections can interact to reach a balance from which the quantum approximation can emerge.
</p>
<p>
At a wild, rhyming guess, I'll suggest that the larger a quantum "particle" (i.e., the larger the number of primitive elements within it), the smaller each corresponding macrostructure. Thus, as the quanta get larger, the macrostructures get smaller, heading toward a meeting somewhere in the mid scale — notionally, around the square root of the number of primitive elements in the cosmos — with the quantum approximation breaking down somewhere along the way. Presumably, the approximation also requires that the macrostructures not be <i>too large</i>, hence that the quanta not be too small. Spinning out the speculation, on a logarithmic scale, one might imagine the quantum approximation working tolerably well for, say, about the middle third of the lower half of the scale, with the corresponding macrostructures occupying the middle third of the upper half of the scale. This would put the quantum realm at a scale from the number of cosmic elements raised to the 1/3 power, down to the number of cosmic elements raised to the 1/6 power. For example, if the number of cosmic elements were 10<sup>120</sup>, quantum scale would be from 10<sup>40</sup> down to 10<sup>20</sup> elements. The takeaway lesson here is that, even if those guesses are off by quite a lot, the number of primitive elements in a minimal quantum could still be rather humongous.
</p>
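<p>
Spelled out as (trivial) arithmetic, with the same numbers as above:
</p>
<pre>
# Back-of-envelope scale arithmetic from the guess above (same numbers,
# nothing new): with N cosmic elements, the guessed quantum realm runs
# from N**(1/3) down to N**(1/6) elements, meeting the macrostructures
# near sqrt(N).

N_exp = 120                      # suppose N = 10**120 primitive elements
quantum_top = N_exp // 3         # N**(1/3) -> 10**40 elements
quantum_bottom = N_exp // 6      # N**(1/6) -> 10**20 elements
mid_scale = N_exp // 2           # sqrt(N)  -> 10**60, where the two meet

print(f"quantum realm: 10**{quantum_bottom} up to 10**{quantum_top} elements")
print(f"quanta and macrostructures meet around 10**{mid_scale} elements")
</pre>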
<p>
Study of the emergence of quasiparticles seems indicated.
</p>
<span style="font-size: large;">Nabla</span> (2019-04-18)
<blockquote>
They [quaternions] are relatively poor 'magicians'; and, certainly, they are no match for complex numbers in this regard.
<blockquote>
— <a href="https://en.wikipedia.org/wiki/Roger_Penrose">Roger Penrose</a>, <i><a href="https://en.wikipedia.org/wiki/The_Road_to_Reality">The Road to Reality</a></i>, 2005, §11.2.
</blockquote>
</blockquote>
<p>
In this post, I'm going to explore some deep questions about the nature of quaternion differentiation.
</p>
<p>
Along the way I'm going to suggest some reasons Penrose's assessment quoted above may be somewhat off-target. I'm quite interested in Penrose's view of quaternions because he presents a twenty-first century form of the classical arguments against quaternions, with an (afaics) sincere effort at objectivity, by someone who patently <i>does</i> appreciate the profound power of elegant mathematics (in apparent contrast to the vectorists' side of the <a href="https://fexpr.blogspot.com/2014/03/the-great-vectors-versus-quaternions.html">1890s debate</a>). The short-short version: Not only do I agree that quaternions lack the magic of complex numbers, I think it would be bizarre if they had that magic since they aren't complex numbers — but I see clues suggesting they've other magic of their own.
</p>
<p>
If I claimed to know just what the magic of quaternions is, it would be safe to bet I'd be wrong; the challenge is way too big for answers to come that easily. However, in looking for indirect evidence that some magic is there to find, I'll pick up some clues to where to look next, the ambiguous note on which this post will end... or rather, trail off.
</p>
<p>
Before I can start in on all that, though, I need to provide some background.
</p>
<blockquote>
<b>Contents</b><br>
<a href="#sec-fnab-stage">Setting the stage</a><br>
<a href="#sec-fnab-4ion">Quaternions</a><br>
<a href="#sec-fnab-doubt">Doubting classical nabla</a><br>
<a href="#sec-fnab-cons">Considering quaternions</a><br>
<a href="#sec-fnab-full">Full nabla</a><br>
<a href="#sec-fnab-part">Partial derivatives</a><br>
<a href="#sec-fnab-gen">Generalized quaternions</a><br>
<a href="#sec-fnab-rot">Rotation</a><br>
<a href="#sec-fnab-Mink">Minkowski</a><br>
<a href="#sec-fnab-Lang">Langlands</a><br>
</blockquote>
<span style="font-size: large;" id="sec-fnab-stage">Setting the stage</span>
<p>
When I was first learning vector calculus as a freshman in college (for perspective, that's about when <a href="https://en.wikipedia.org/wiki/Return_of_the_Jedi">Return of the Jedi</a> came out), I initially supposed that the use of the symbol ∇ in three different differential operators — ∇, ∇×, ∇<b>·</b> — was just a mnemonic device. My father, who'd been interested in quaternions since, as best I can figure, when <i>he</i> was in college (about when <a href="https://en.wikipedia.org/wiki/Casablanca_%28film%29">Casablanca</a> came out), promptly set me straight: those operators look similar because they're all fragments of a single quaternion differential operator called nabla:
<blockquote>
<table><tr>
<td>∇</td>
<td> = </td>
<td>
<table><tr>
<td rowspan="2"><i>i</i></td>
<td align="center" style="border-bottom:solid 1px">∂</td></tr>
<tr><td>∂<i>x</i></td></tr>
</table>
</td>
<td> + </td>
<td>
<table><tr>
<td rowspan="2"><i>j</i></td>
<td align="center" style="border-bottom:solid 1px">∂</td></tr>
<tr><td>∂<i>y</i></td></tr>
</table>
</td>
<td> + </td>
<td>
<table><tr>
<td rowspan="2"><i>k</i></td>
<td align="center" style="border-bottom:solid 1px">∂</td></tr>
<tr><td>∂<i>z</i></td></tr>
</table>
</td>
<td>.</td>
</tr></table>
</blockquote>
If you know a smattering of vector calculus, you may be asking, isn't that just the definition of the gradient operator? No, because of the seemingly small detail that the factors <i>i</i>, <i>j</i>, <i>k</i> aren't unit vectors, <b>i</b>, <b>j</b>, <b>k</b> — <i>i</i>, <i>j</i>, <i>k</i> are imaginary numbers. Which has extraordinary consequences. I'd better take a moment to explain quaternions.
</p>
<span style="font-size: large;" id="sec-fnab-4ion">Quaternions</span>
<p>
A <i>vector space</i> over some kind of scalar numbers consists of the <i>n</i>-tuples of scalars, for some fixed <i>n</i>, together with scalar multiplication (to multiply a vector by a scalar, just multiply all the elements of the vector by that scalar) and vector addition (add corresponding elements of the vectors). An <i>algebra</i> is a vector space equipped with an internal operation called multiplication ("internal" meaning you multiply a vector by a vector, rather than by a scalar) and a multiplicative identity, such that scalar multiplication commutes with internal multiplication, and internal multiplication is <i><a href="https://en.wikipedia.org/wiki/Bilinear_map">bilinear</a></i> (fancy term, simple once you've seen it: each element of the product is a polynomial in the elements of the factors, where each term in each polynomial has one element from each factor).
</p>
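<p>
A quick sketch of the point (toy Python; complex numbers serve as a two-dimensional example): bilinearity means the whole product is determined by the multiplication table of the basis elements.
</p>
<pre>
# Toy sketch (assuming nothing beyond the definitions above): an algebra is
# a vector space plus a bilinear product, so the whole algebraic structure
# is carried by the basis multiplication table.

def make_product(table):
    """table[i][j] is the vector giving basis_i * basis_j; extend bilinearly."""
    n = len(table)
    def product(a, b):
        out = [0.0] * n
        for i in range(n):
            for j in range(n):
                coeff = a[i] * b[j]          # each term: one element per factor
                for k in range(n):
                    out[k] += coeff * table[i][j][k]
        return out
    return product

# Complex numbers as a 2-dimensional algebra over the reals, basis (1, i):
complex_mul = make_product([
    [[1, 0], [0, 1]],    # 1*1 = 1,  1*i = i
    [[0, 1], [-1, 0]],   # i*1 = i,  i*i = -1
])
print(complex_mul([0, 1], [0, 1]))   # i * i  ->  [-1.0, 0.0]
</pre>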
<p>
Whatever interesting properties a particular algebra has are, in a sense, contained in its internal multiplication. So when we speak of the "algebraic structure" of an algebra, what we're talking about is really just its multiplication table.
</p>
<p>
Quaternions are a four-dimensional hypercomplex algebra. They're denoted by the symbol ℍ (after <a href="https://en.wikipedia.org/wiki/William_Rowan_Hamilton">Hamilton</a>, their discoverer). <i>Hypercomplex</i> just means that the first of the four basis elements is the multiplicative identity, so that the first dimension of the vector space can be identified with the scalars, in this case the real numbers, ℝ. Traditionally the four basis elements are called 1, <i>i</i>, <i>j</i>, <i>k</i>; which said, hereafter I'll prefer to call the imaginaries <i>i</i><sub>1</sub>, <i>i</i><sub>2</sub>, <i>i</i><sub>3</sub>, and occasionally use <i>i</i><sub>0</sub> as a synonym for 1. The four real vector-space components of a quaternion, I'll indicate by putting subscripts 0,1,2,3 on the name of the quaternion; thus, <i>a</i> = <i>a</i><sub>0</sub> + <i>a</i><sub>1</sub><i>i</i><sub>1</sub> + <i>a</i><sub>2</sub><i>i</i><sub>2</sub> + <i>a</i><sub>3</sub><i>i</i><sub>3</sub> = Σ <i>a</i><sub>k</sub><i>i</i><sub>k</sub>.
</p>
<p>
Quaternion multiplication is defined by <i>i</i><sub>1</sub><sup>2</sup> = <i>i</i><sub>2</sub><sup>2</sup> = <i>i</i><sub>3</sub><sup>2</sup> = <i>i</i><sub>1</sub> <i>i</i><sub>2</sub> <i>i</i><sub>3</sub> = −1, where multiplication of the imaginary basis elements is associative (<i>i</i><sub>1</sub>(<i>i</i><sub>2</sub> <i>i</i><sub>3</sub>) = (<i>i</i><sub>1</sub> <i>i</i><sub>2</sub>) <i>i</i><sub>3</sub> and so on) and different imaginary basis elements anticommute (<i>i</i><sub>1</sub> <i>i</i><sub>2</sub> = − <i>i</i><sub>2</sub> <i>i</i><sub>1</sub> and so on). The whole multiplication table can be put together from these few rules, and we have the quaternion product (take a deep breath):
<blockquote>
<table>
<tr>
<td><i>ab</i></td><td> = </td>
<td align="right">(<i>a</i><sub>0</sub><i>b</i><sub>0</sub> − <i>a</i><sub>1</sub><i>b</i><sub>1</sub> − <i>a</i><sub>2</sub><i>b</i><sub>2</sub> − <i>a</i><sub>3</sub><i>b</i><sub>3</sub>)</td>
</tr>
<tr><td></td><td></td>
<td>
+ <i>i</i><sub>1</sub> (<i>a</i><sub>0</sub><i>b</i><sub>1</sub> + <i>a</i><sub>1</sub><i>b</i><sub>0</sub> + <i>a</i><sub>2</sub><i>b</i><sub>3</sub> − <i>a</i><sub>3</sub><i>b</i><sub>2</sub>)
</td>
</tr>
<tr><td></td><td></td>
<td>
+ <i>i</i><sub>2</sub> (<i>a</i><sub>0</sub><i>b</i><sub>2</sub> − <i>a</i><sub>1</sub><i>b</i><sub>3</sub> + <i>a</i><sub>2</sub><i>b</i><sub>0</sub> + <i>a</i><sub>3</sub><i>b</i><sub>1</sub>)
</td>
</tr>
<tr><td></td><td></td>
<td>
+ <i>i</i><sub>3</sub> (<i>a</i><sub>0</sub><i>b</i><sub>3</sub> + <i>a</i><sub>1</sub><i>b</i><sub>2</sub> − <i>a</i><sub>2</sub><i>b</i><sub>1</sub> + <i>a</i><sub>3</sub><i>b</i><sub>0</sub>)
</td>
<td>.</td>
</tr>
</table>
</blockquote>
This is that bilinear multiplication table mentioned earlier, where each element of the product is a simple polynomial in elements of the factors. If you stare at this a bit, you can also see that when <i>a</i> and <i>b</i> are imaginary (that is, <i>a</i><sub>0</sub> = <i>b</i><sub>0</sub> = 0), the real part of the product is minus the dot product of the vectors, and the imaginary part of the product is the cross product of the vectors: <b>ab</b> = <b>a</b>×<b>b</b> − <b>a·b</b>.
</p>
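<p>
For anyone who likes to check such formulas mechanically, the product transcribes directly into code (a sketch of mine, representing a quaternion as a 4-tuple of real coefficients):
</p>
<pre>
# The quaternion product above, transcribed directly; quaternions are
# represented as 4-tuples (a0, a1, a2, a3).

def qmul(a, b):
    a0, a1, a2, a3 = a
    b0, b1, b2, b3 = b
    return (a0*b0 - a1*b1 - a2*b2 - a3*b3,
            a0*b1 + a1*b0 + a2*b3 - a3*b2,
            a0*b2 - a1*b3 + a2*b0 + a3*b1,
            a0*b3 + a1*b2 - a2*b1 + a3*b0)

i1, i2, i3 = (0,1,0,0), (0,0,1,0), (0,0,0,1)
assert qmul(i1, i2) == i3 and qmul(i2, i1) == (0,0,0,-1)   # anticommutation
assert qmul(i1, qmul(i2, i3)) == qmul(qmul(i1, i2), i3)    # associativity here

# For imaginary a, b: real part is minus the dot product, imaginary part
# the cross product.  E.g. a = i1, b = i2: dot = 0, cross = i3, and indeed:
print(qmul(i1, i2))   # (0, 0, 0, 1)
</pre>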
<p>
A few handy notations: real part re(<i>a</i>) = <i>a</i><sub>0</sub>; imaginary part i‍m(<i>a</i>) = <i>a</i> − <i>a</i><sub>0</sub>; conjugate <i>a</i><sup>*</sup> = re(<i>a</i>) − i‍m(<i>a</i>); norm ||<i>a</i>|| = sqrt (Σ <i>a</i><sub>k</sub><sup>2</sup>).
</p>
<p>
Quaternion multiplication is associative. Quaternion multiplication is also non-commutative, which was a big deal when Hamilton first discovered quaternions in 1843, because until then all known kinds of numbers had obeyed all the "usual" laws of arithmetic. But what's <i>really</i> interesting about quaternion multiplication — at least, on the face of it — is that it has <i>unique division</i>. That is, for all quaternions <i>a</i> and <i>b</i>, where <i>a</i> is non-zero, there is exactly one quaternion <i>x</i> such that <i>ax</i> = <i>b</i>, and exactly one quaternion <i>x</i> such that <i>x‍a</i> = <i>b</i>. In particular, with <i>b</i>=1, this says that every non-zero <i>a</i> has a unique left-multiplicative inverse, and a unique right-multiplicative inverse. These are actually the same number, which we write
<blockquote>
<table><tr>
<td><i>a</i><sup>−1</sup></td>
<td> = </td>
<td>
<table><tr><td align="center" style="border-bottom:solid 1px"><i>a</i><sup>*</sup></td></tr><tr><td> ||<i>a</i>||<sup>2</sup> </td></tr></table>
</td>
</tr></table>
</blockquote>
(the conjugate divided by the square of the norm). So <i>a</i> <i>a</i><sup>−1</sup> = <i>a</i><sup>−1</sup><i>a</i> = 1.
</p>
<p>
Division algebras are very special; pathological cases aside, there are only four of them: real numbers, complex numbers, quaternions, and octonions. (Yes, there are hypercomplex numbers with seven imaginaries that are even more mind-bending than quaternions. But that's another story.)
</p>
<p>
To solve equation <i>ax</i>=<i>b</i>, we left-multiply both sides by <i>a</i><sup>−1</sup>, thus <i>a</i><sup>−1</sup><i>b</i> = <i>a</i><sup>−1</sup>(<i>ax</i>) = (<i>a</i><sup>−1</sup><i>a</i>)<i>x</i> = <i>x</i>; and likewise, the solution to <i>x‍a</i>=<i>b</i> is <i>x</i> = <i>b</i> <i>a</i><sup>−1</sup>. We call right-multiplication by <i>a</i><sup>−1</sup> "right-division by <i>a</i>", and write <i>b</i> / <i>a</i> = <i>b</i> <i>a</i><sup>−1</sup>; similarly, left-division <i>a</i> \ <i>b</i> = <i>a</i><sup>−1</sup> <i>b</i>. (Backslash, btw, is such a dreadfully overloaded symbol, I can somewhat understand why I haven't seen others use it this way; but I'm quite bowled over by how elegantly natural this use seems to me. It preserves the order of symbols when applying associativity: (<i>a</i> / <i>b</i>) <i>c</i> = <i>a</i> (<i>b</i> \ <i>c</i>).) Naturally, I won't write division vertically unless the denominator is real.
</p>
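<p>
And a companion sketch for conjugate, norm, inverse, and the two divisions (self-contained, so the product function is repeated), verifying that left- and right-division each solve their equation and generally differ:
</p>
<pre>
# Conjugate, norm, inverse, and left/right division, in the same 4-tuple
# representation (self-contained sketch, so the product is repeated).

from math import sqrt

def qmul(a, b):
    a0, a1, a2, a3 = a; b0, b1, b2, b3 = b
    return (a0*b0 - a1*b1 - a2*b2 - a3*b3,
            a0*b1 + a1*b0 + a2*b3 - a3*b2,
            a0*b2 - a1*b3 + a2*b0 + a3*b1,
            a0*b3 + a1*b2 - a2*b1 + a3*b0)

def conj(a):  return (a[0], -a[1], -a[2], -a[3])
def norm(a):  return sqrt(sum(c*c for c in a))
def inv(a):   n2 = sum(c*c for c in a); return tuple(c / n2 for c in conj(a))

def rdiv(b, a):  return qmul(b, inv(a))   # b / a, solves x*a = b
def ldiv(a, b):  return qmul(inv(a), b)   # a \ b, solves a*x = b

a, b = (1.0, 2.0, -1.0, 0.5), (0.0, 1.0, 3.0, -2.0)
x = ldiv(a, b)
assert all(abs(u - v) < 1e-12 for u, v in zip(qmul(a, x), b))   # a*x == b
y = rdiv(b, a)
assert all(abs(u - v) < 1e-12 for u, v in zip(qmul(y, a), b))   # y*a == b
# and since multiplication is non-commutative, x and y generally differ:
print(x, y)
</pre>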
<p>
Okay, we're girded for battle. Back to nabla.
</p>
<span style="font-size: large;" id="sec-fnab-doubt">Doubting classical nabla</span>
<p>
Our definition of nabla, you'll recall, was
<blockquote>
<table><tr>
<td>∇</td>
<td> = </td>
<td>
<table><tr>
<td rowspan="2"><i>i</i><sub>1</sub></td>
<td align="center" style="border-bottom:solid 1px">∂</td></tr>
<tr><td>∂<i>x</i><sub>1</sub></td></tr>
</table>
</td>
<td> + </td>
<td>
<table><tr>
<td rowspan="2"><i>i</i><sub>2</sub></td>
<td align="center" style="border-bottom:solid 1px">∂</td></tr>
<tr><td>∂<i>x</i><sub>2</sub></td></tr>
</table>
</td>
<td> + </td>
<td>
<table><tr>
<td rowspan="2"><i>i</i><sub>3</sub></td>
<td align="center" style="border-bottom:solid 1px">∂</td></tr>
<tr><td>∂<i>x</i><sub>3</sub></td></tr>
</table>
</td>
<td>.</td>
</tr></table>
</blockquote>
This operator is immensely useful; people have been making great use of it, or of its fragments in vector calculus, for well over a century and a half. But three things about it bother me, the first two of which I've seen remarked on in papers written in the past few decades.
</p>
<b>Handedness</b>
<p>
My first bother follows from the fact that quaternion multiplication isn't commutative. This was, remember, a dramatic new idea in 1843; the key innovation that empowered Hamilton's discovery, because what he wanted — something akin to complex numbers but with arithmetic sensible in three dimensions — requires non-commutative multiplication. But if multiplication isn't commutative, why should the partial derivatives in the definition of ∇ necessarily be <i>left</i>-multiplied by the imaginaries? Why shouldn't they be <i>right</i>-multiplied, instead?
</p>
<p>
I've seen modern papers that use together both left and right versions of nabla. Peter Michael Jack, who's had a web presence for (I believe) nearly as long as there's been a web, has suggested using a prefix operator for the left-multiplying nabla and a postfix operator for the right-multiplying nabla. Candidly, I find that notation direly hard to read. The point of prefix operators (which Hamilton championed, the better part of a century before <a href="https://en.wikipedia.org/wiki/Jan_%C5%81ukasiewicz">Jan Łukasiewicz</a>) is to make expressions much simpler to parse, and mixing prefix with postfix doesn't do that. Another notation I've seen used in at least one modern source is a subscript <i>q</i> on the left or right of the nabla symbol to indicate which side to put the imaginary factors on. I'm not greatly enthused by that notation either because it uses a multi-part symbol. I have an alternative solution in mind for the notational puzzle; but I'll want to make clear first the whole of what I'm trying to notate.
</p>
<b>Truncation</b>
<p>
My second bother is that traditionally defined nabla isn't even a full quaternion operator. It only has the partial derivatives with respect to the three imaginary components. Where's the partial with respect to the real component? In the 1890s debate, the quaternionists said quaternions are profoundly meaningful as coherent entities, and the vectorists said scalars and vectors are meaningful while their sum is meaningless. Now, I'm quite sympathetic to the importance of mathematical elegance, but come on, guys, make up your minds! Either you go full-quaternion, or you don't. A nabla that only acts on vector functions is just lame.
</p>
<p>
There's a good deal of history related to the question of imaginary nabla versus full nabla. The truncation of nabla to the imaginary components throughout the nineteenth century may have been partly an historical accident. Consideration of the four-dimensional operator seems to have started just before the turn of the twentieth century, and modern quaternionists I've observed use a full-quaternion operator. I'll have more to say about this history in the next section.
</p>
<b>Meaning</b>
<p>
My third bother is a byproduct, as best I can figure, of my persistent sense of <i>arbitrariness</i> about the nabla operator. (This is the difficulty I've never seen anyone else remark upon. Perhaps I'm missing something everyone else gets, or maybe I've just never looked in the right place; but then again, maybe people are just reluctant to publicly admit something doesn't make sense to them. That might explain a lot about the world.) It was never obvious to me why it should be meaningful — or, if you prefer the word, <i>useful</i> — to multiply the partial derivatives by the imaginaries in the first place. It's clear to me why you'd do that if you were defining <i>gradient</i>, because gradient is meaningful for any number of dimensions, and doesn't depend in any way on the existence of a division operation. But quaternions do have unique division, in fact it's rather a big deal that they have unique division, and the usual definition of ordinary derivative involves dividing by Δ<i>x</i>. So why are we multiplying by the imaginaries, instead of dividing by them?
</p>
<span style="font-size: large;" id="sec-fnab-cons">Considering quaternions</span>
<p>
Some of my above questions have historical answers, which also bear on the challenge raised by Penrose in the epigraph at the beginning of this post.
</p>
<p>
By Chapter 11 of <i>The Road to Reality</i>, where Penrose makes that remark (and where he also candidly describes the question of quaternions' use in physics as a "can of worms"), he's already described some marvelous properties of complex numbers, culminating with one (<i><a href="https://en.wikipedia.org/wiki/Hyperfunction">hyper‍functions</a></i>) only published in 1958. Which raises an important point. Complex numbers have been intensely studied by mainstream researchers throughout the modern era of physics, yet Penrose's crowning bit of complex 'magic' wasn't discovered until <i>1958</i>?
</p>
<p>
Compare that to how much, or rather how little, scrutiny quaternions have received. Hamilton discovered them in 1843; but Hamilton was a mathematical genius, not a great communicator. Quaternions, so I gather, remained the archetype of a baffling abstract theory until Einstein's General Theory of Relativity took over that role. The first tome Hamilton wrote on the subject, <i>Lectures on Quaternions</i>, daunted the mathematical luminaries of the day; his later <i>Elements of Quaternions</i>, published incomplete in 1866 following his death in 1865, wasn't easy either. The first accessible introduction to the subject was <a href="https://en.wikipedia.org/wiki/Peter_Tait_(physicist)">Peter Guthrie Tait</a>'s 1867 <i>Elementary Treatise on Quaternions</i>. Quaternions got a big publicity boost when <a href="https://en.wikipedia.org/wiki/James_Clerk_Maxwell">James Clerk Maxwell</a> used them (for their conceptual clarity, rather than for mundane calculations) in his 1873 <i>Treatise on Electricity and Magnetism</i>. And then in the 1890s quaternions "lost" the great vectors-versus-quaternions debate and their use gradually faded thereafter. There simply weren't all that many people working with quaternions in the nineteenth century, and as world population increased in the twentieth century quaternions were no longer a hot topic.
</p>
<p>
Moreover, exploration of nabla got off on the wrong foot. Hamilton seems to have first dabbled with it several years <i>before</i> he discovered quaternions, as a sort of "square root" of the <a href="https://en.wikipedia.org/wiki/Laplace_operator">Laplacian</a>, at which time naturally he only gave it three components; and when he adapted it to a quaternionic form it still had only three components. He didn't do much with it in the <i>Lectures</i>, and planned a major section on it for the <i>Elements</i> but, afaict, died before he got to it. James Clerk Maxwell was a first-class mind and a passionate quaternion enthusiast, but died at the age of forty-eight in 1879 — the same year as <a href="https://en.wikipedia.org/wiki/William_Kingdon_Clifford">William Kingdon Clifford</a>, who was only thirty-three, another first-class mind who had explored quaternions. The full-quaternion nabla <i>was</i> finally looked at, preliminarily, in 1896 by Shunkichi Kimura, but by that time the quaternionic movement was starting to wind down. Yes, quaternions were still being used for some decades thereafter, but less and less, and the notations got harder and harder to follow as quaternionic notation was hybridized with Gibbs vector notation, further disrupting the continuity of the tradition and undermining systematic progress. Imho, it's <i>entirely</i> possible for major insights to still be waiting patiently.
</p>
<p>
A subtle point on which Penrose's portrayal of quaternions is somewhat historically off: Penrose says that although to a modern mind the one real and three imaginary components of quaternions naturally suggest the one time and three space dimensions of spacetime, that's just because we've been acclimated to the idea of spacetime by Einstein's theory of relativity; and quaternions don't actually work for relativity because they have the wrong signature (I'll say a bit more about this below; see <a href="https://en.wikipedia.org/wiki/Signature_(topology)">here</a>). But as far as the notion of spacetime goes, the shoe is on the other foot. Hamilton expected mathematics to coincide with reality (a principle Penrose also, broadly, embraces), and as soon as he discovered quaternions he <i>did</i> connect their structure metaphysically with the four dimensions of space and time. Penrose is quite right, I think, that ideas like this get to be "in the air"; but in this case it looks to me like it first got into the air <i>from</i> quaternions. So I'm more inclined to suspect quaternions suggested spacetime and thereby subtly contributed to relativity, rather than relativity and spacetime suggesting a connection to quaternions. The latter implies an anachronistic influence that must be illusory (for relativity to influence Hamilton would seem to require a <a href="https://en.wikipedia.org/wiki/TARDIS">TARDIS</a>); the former hints at some deeper magic.
</p>
<p>
The point about quaternions having the wrong signature has its own curious historical profile. Penrose expresses very much the mainstream party line on the issue, essentially echoing the assessment of <a href="https://en.wikipedia.org/wiki/Hermann_Minkowski">Hermann Minkowski</a> a century earlier who, in formulating his geometry of spacetime, explicitly rejected quaternions, saying they were "too narrow and clumsy for the purpose". The basic mathematical point here (or, at least, a form of it) is that the norm of a quaternion is the square root of the sum of the squares of its components, √(<i>t</i><sup>2</sup>+<i>x</i><sup>2</sup>+<i>y</i><sup>2</sup>+<i>z</i><sup>2</sup>), whereas in Minkowski spacetime the three spatial elements should be negative, √(<i>t</i><sup>2</sup>−<i>x</i><sup>2</sup>−<i>y</i><sup>2</sup>−<i>z</i><sup>2</sup>). But here the plot thickens. Minkowski, who so roundly rejected quaternions, defines a differential operator that is, structurally, the four-dimensional nabla. As for quaternions and relativity, <a href="https://en.wikipedia.org/wiki/Ludwik_Silberstein">Ludwik Silberstein</a> (a notable popularizer of relativity, in his day) did use quaternions for special relativity — except that, to be precise, he used <i>biquaternions</i>.
</p>
<p>
Biquaternions (which Hamilton had also worked with) are quaternions whose four coefficients are complex numbers in a fourth, independent imaginary. Or, equivalently, they're complex numbers whose two coefficients are quaternions in an independent set of three imaginaries. Either way, that's a total of eight real coefficients. Biquaternions do not, of course, have unique division. However, there are some oddly suggestive features to Silberstein's treatment. His spacetime vectors have only four non-zero real coefficients (of the four quaternion coefficients, <i>a</i><sub>0</sub> is real while <i>a</i><sub>k≥1</sub> are imaginary, so that Σ <i>a<sub>k</sub></i><sup>2</sup> = <i>a</i><sub>0</sub><sup>2</sup>−||<i>a</i><sub>1</sub>||<sup>2</sup>−||<i>a</i><sub>2</sub>||<sup>2</sup>−||<i>a</i><sub>3</sub>||<sup>2</sup>; while other biquaternions he considers have imaginary <i>a</i><sub>0</sub> and real <i>a</i><sub>k≥1</sub>). Moreover, he prominently uses the "inverse" of a biquaternion, defined structurally just as for quaternions, <table style="display: inline-table;"><tr><td align="center" style="border-bottom:solid 1px"><i>a</i><sup>*</sup></td></tr><tr><td>||<i>a</i>||<sup>2</sup></td></tr></table>, notwithstanding the technical lack of general biquaternion division.
</p>
<p>
Silberstein's approach contrasts with the quaternionic treatment of special relativity by <a href="https://en.wikipedia.org/wiki/P.A.M._Dirac">P.A.M. Dirac</a>, published in 1945 as part of the centennial celebration for the discovery of quaternions. Dirac used real quaternions on the grounds that since the merit of quaternions is in their having division, it would be pointless to use biquaternions which are of no particular mathematical interest. His mapping of spacetime coordinates onto real quaternions was unintuitive. But looking at the oddly familiar-looking patterns in Silberstein's treatment, and Minkowski's operator which is hard not to think of as a full quaternionic nabla, one might well wonder if there is something going on that defies Dirac's claim about the importance of unique division. Perhaps we've been incautious in our assumptions about just where the deep magic is to be found.
</p>
<p>
There are two pitfalls in this kind of thinking, which the inquirer must thread carefully between. On one hand, one might <i>assume</i> there is some unknown deep magic here, <i>rather than</i> trying to work out what it is; this not only would lean toward numerology, but if there really is something to be found, would miss out on the benefits of finding it. On the other hand, one could derive some superficial mathematical account of the particular mathematical relationships involved, based on math one already knows about, and <i>assume</i> this is all there is to the matter; which would again guarantee that any deeper insight waiting to be found would not be found. (Current mainstream thinking, btw, falls into the latter pitfall, essentially reasoning that <a href="https://en.wikipedia.org/wiki/Geometric_algebra">geometric algebras</a> are useful in a way that quaternions are not, therefore quaternions are not useful.) Is there <i>any</i> situation where it would really be time to give up the search altogether? Well, yes, one does come to mind — if one were to arrive at some deep insight into why one should really believe there <i>isn't</i> some deep magic here. Which might itself be some rather deep and interesting magic.
</p>
<p>
Frankly, I don't even know quite where to <i>look</i> for this hypothetical deep magic. I sense its presence, as I've just described; but so far, I'm exploring various questions in the general neighborhood, patiently, with the notion that if these sorts of great insights naturally emerge from a large, broad body of research (as they have done for complex numbers), the chances of finding such a thing should improve as one increases the overall size and breadth of one's body of lesser insights.
</p>
<p>
Which brings me back to the particular point I'm pursuing in this post, the full quaternion nabla.
</p>
<span style="font-size: large;" id="sec-fnab-full">Full nabla</span>
<p>
From a purely technical perspective, it isn't difficult to define four versions of the full quaternion nabla, differing only by whether each imaginary acts on its corresponding partial derivative by left-multiplying, right-multiplying, left-dividing, or right-dividing. The only remaining — purely technical — question is how to <i>write</i> these four different operators in an uncluttered way that keeps them straight. Since the traditional nabla has three partial derivatives and is denoted by a triangle, I'll denote these full nablas, with four partial derivatives, by a square. To keep track of how the imaginaries are introduced, I'll put a dot inside the square, near one of the corners: upper left for left-multiplying by imaginaries, upper right for right-multiplying, lower left for left-dividing, lower right for right-dividing. (This operator notation affords coherence, as the dot is inside so there's no mistaking it for a separate element, and, as a bonus, should also be easy to write quickly and accurately by hand on the back of an envelope.)
</p>
<p>
Let <i>a</i> = <i>f</i>(<i>x</i>). Noting that for imaginary <i>i</i><sub>k</sub>, 1/<i>i</i><sub>k</sub> = −<i>i</i><sub>k</sub>, the full-quaternion nablas are
<blockquote>
<!-- left-mult nabla -->
<table><tr>
<td>
<table cellpadding=0 cellspacing=0px style="border-left:1px solid #000; border-right:1px solid #000; border-top:1px solid #000; border-bottom:1px solid #000; color:#000"><tr><td style ="border-top:1px solid #000; border-bottom:1px solid #000; font-size: 25%;">
● <br>
</td></tr></table>
</td>
<td><i>a</i></td>
<td> = </td>
<td>
<table><tr>
<td align="center" style="border-bottom:solid 1px">∂<i>a</i></td></tr>
<tr><td>∂<i>x</i><sub>0</sub></td></tr>
</table>
</td>
<td> + </td>
<td>
<table><tr>
<td rowspan="2"><i>i</i><sub>1</sub></td>
<td align="center" style="border-bottom:solid 1px">∂<i>a</i></td></tr>
<tr><td>∂<i>x</i><sub>1</sub></td></tr>
</table>
</td>
<td> + </td>
<td>
<table><tr>
<td rowspan="2"><i>i</i><sub>2</sub></td>
<td align="center" style="border-bottom:solid 1px">∂<i>a</i></td></tr>
<tr><td>∂<i>x</i><sub>2</sub></td></tr>
</table>
</td>
<td> + </td>
<td>
<table><tr>
<td rowspan="2"><i>i</i><sub>3</sub></td>
<td align="center" style="border-bottom:solid 1px">∂<i>a</i></td></tr>
<tr><td>∂<i>x</i><sub>3</sub></td></tr>
</table>
</td>
</tr></table>
<!-- right-mult nabla -->
<table><tr>
<td>
<table cellpadding=0 cellspacing=0px style="border-left:1px solid #000; border-right:1px solid #000; border-top:1px solid #000; border-bottom:1px solid #000; color:#000"><tr><td style ="border-top:1px solid #000; border-bottom:1px solid #000; font-size: 25%;">
● <br>
</td></tr></table>
</td>
<td><i>a</i></td>
<td> = </td>
<td>
<table><tr>
<td align="center" style="border-bottom:solid 1px">∂<i>a</i></td></tr>
<tr><td>∂<i>x</i><sub>0</sub></td></tr>
</table>
</td>
<td> + </td>
<td>
<table><tr>
<td align="center" style="border-bottom:solid 1px">∂<i>a</i></td>
<td rowspan="2"><i>i</i><sub>1</sub></td>
</tr>
<tr><td>∂<i>x</i><sub>1</sub></td></tr>
</table>
</td>
<td> + </td>
<td>
<table><tr>
<td align="center" style="border-bottom:solid 1px">∂<i>a</i></td>
<td rowspan="2"><i>i</i><sub>2</sub></td></tr>
<tr><td>∂<i>x</i><sub>2</sub></td></tr>
</table>
</td>
<td> + </td>
<td>
<table><tr>
<td align="center" style="border-bottom:solid 1px">∂<i>a</i></td>
<td rowspan="2"><i>i</i><sub>3</sub></td></tr>
<tr><td>∂<i>x</i><sub>3</sub></td></tr>
</table>
</td>
</tr></table>
<!-- left-div nabla -->
<table><tr>
<td>
<table cellpadding=0 cellspacing=0px style="border-left:1px solid #000; border-right:1px solid #000; border-top:1px solid #000; border-bottom:1px solid #000; color:#000"><tr><td style ="border-top:1px solid #000; border-bottom:1px solid #000; font-size: 25%;">
<br> ●
</td></tr></table>
</td>
<td><i>a</i></td>
<td> = </td>
<td>
<table><tr>
<td align="center" style="border-bottom:solid 1px">∂<i>a</i></td></tr>
<tr><td>∂<i>x</i><sub>0</sub></td></tr>
</table>
</td>
<td> − </td>
<td>
<table><tr>
<td rowspan="2"><i>i</i><sub>1</sub></td>
<td align="center" style="border-bottom:solid 1px">∂<i>a</i></td></tr>
<tr><td>∂<i>x</i><sub>1</sub></td></tr>
</table>
</td>
<td> − </td>
<td>
<table><tr>
<td rowspan="2"><i>i</i><sub>2</sub></td>
<td align="center" style="border-bottom:solid 1px">∂<i>a</i></td></tr>
<tr><td>∂<i>x</i><sub>2</sub></td></tr>
</table>
</td>
<td> − </td>
<td>
<table><tr>
<td rowspan="2"><i>i</i><sub>3</sub></td>
<td align="center" style="border-bottom:solid 1px">∂<i>a</i></td></tr>
<tr><td>∂<i>x</i><sub>3</sub></td></tr>
</table>
</td>
<td> = </td>
<td>(</td>
<td>
<table cellpadding=0 cellspacing=0px style="border-left:1px solid #000; border-right:1px solid #000; border-top:1px solid #000; border-bottom:1px solid #000; color:#000"><tr><td style ="border-top:1px solid #000; border-bottom:1px solid #000; font-size: 25%;">
● <br>
</td></tr></table>
</td>
<td><i>a</i></td>
<td>)</td><td>‌<sup>*</sup></td>
</tr></table>
<!-- right-div nabla -->
<table><tr>
<td>
<table cellpadding=0 cellspacing=0px style="border-left:1px solid #000; border-right:1px solid #000; border-top:1px solid #000; border-bottom:1px solid #000; color:#000"><tr><td style ="border-top:1px solid #000; border-bottom:1px solid #000; font-size: 25%;">
<br> ●
</td></tr></table>
</td>
<td><i>a</i></td>
<td> = </td>
<td>
<table><tr>
<td align="center" style="border-bottom:solid 1px">∂<i>a</i></td></tr>
<tr><td>∂<i>x</i><sub>0</sub></td></tr>
</table>
</td>
<td> − </td>
<td>
<table><tr>
<td align="center" style="border-bottom:solid 1px">∂<i>a</i></td>
<td rowspan="2"><i>i</i><sub>1</sub></td>
</tr>
<tr><td>∂<i>x</i><sub>1</sub></td></tr>
</table>
</td>
<td> − </td>
<td>
<table><tr>
<td align="center" style="border-bottom:solid 1px">∂<i>a</i></td>
<td rowspan="2"><i>i</i><sub>2</sub></td></tr>
<tr><td>∂<i>x</i><sub>2</sub></td></tr>
</table>
</td>
<td> − </td>
<td>
<table><tr>
<td align="center" style="border-bottom:solid 1px">∂<i>a</i></td>
<td rowspan="2"><i>i</i><sub>3</sub></td></tr>
<tr><td>∂<i>x</i><sub>3</sub></td></tr>
</table>
</td>
<td> = </td>
<td>(</td>
<td>
<table cellpadding=0 cellspacing=0px style="border-left:1px solid #000; border-right:1px solid #000; border-top:1px solid #000; border-bottom:1px solid #000; color:#000"><tr><td style ="border-top:1px solid #000; border-bottom:1px solid #000; font-size: 25%;">
● <br>
</td></tr></table>
</td>
<td><i>a</i></td>
<td>)</td><td>‌<sup>*</sup></td>
</tr></table>
</blockquote>
and when we expand <i>a</i> = <i>a</i><sub>0</sub> + <i>a</i><sub>1</sub><i>i</i><sub>1</sub> + <i>a</i><sub>2</sub><i>i</i><sub>2</sub> + <i>a</i><sub>3</sub><i>i</i><sub>3</sub>,
<blockquote>
<table>
<tr>
<td>
<table><tr>
<td>
<table cellpadding=0 cellspacing=0px style="border-left:1px solid #000; border-right:1px solid #000; border-top:1px solid #000; border-bottom:1px solid #000; color:#000"><tr><td style ="border-top:1px solid #000; border-bottom:1px solid #000; font-size: 25%;">
● <br>
</td></tr></table>
</td>
<td><i>a</i></td>
</tr></table>
</td><td> = </td>
<td><!-- + --></td>
<td><!-- i0 --></td>
<td>(</td>
<td><table><tr><td align="center" style="border-bottom:solid 1px">∂<i>a</i><sub>0</sub></td></tr><tr><td>∂<i>x</i><sub>0</sub></td></tr></table></td>
<td>−</td>
<td><table><tr><td align="center" style="border-bottom:solid 1px">∂<i>a</i><sub>1</sub></td></tr><tr><td>∂<i>x</i><sub>1</sub></td></tr></table></td>
<td>−</td>
<td><table><tr><td align="center" style="border-bottom:solid 1px">∂<i>a</i><sub>2</sub></td></tr><tr><td>∂<i>x</i><sub>2</sub></td></tr></table></td>
<td>−</td>
<td><table><tr><td align="center" style="border-bottom:solid 1px">∂<i>a</i><sub>3</sub></td></tr><tr><td>∂<i>x</i><sub>3</sub></td></tr></table></td>
<td>)</td>
</tr>
<tr><td></td><td></td>
<td>+</td>
<td><i>i</i><sub>1</sub></td>
<td>(</td>
<td><table><tr><td align="center" style="border-bottom:solid 1px">∂<i>a</i><sub>1</sub></td></tr><tr><td>∂<i>x</i><sub>0</sub></td></tr></table></td>
<td>+</td>
<td><table><tr><td align="center" style="border-bottom:solid 1px">∂<i>a</i><sub>0</sub></td></tr><tr><td>∂<i>x</i><sub>1</sub></td></tr></table></td>
<td>+</td>
<td><table><tr><td align="center" style="border-bottom:solid 1px">∂<i>a</i><sub>3</sub></td></tr><tr><td>∂<i>x</i><sub>2</sub></td></tr></table></td>
<td>−</td>
<td><table><tr><td align="center" style="border-bottom:solid 1px">∂<i>a</i><sub>2</sub></td></tr><tr><td>∂<i>x</i><sub>3</sub></td></tr></table></td>
<td>)</td>
</tr>
<tr><td></td><td></td>
<td>+</td>
<td><i>i</i><sub>2</sub></td>
<td>(</td>
<td><table><tr><td align="center" style="border-bottom:solid 1px">∂<i>a</i><sub>2</sub></td></tr><tr><td>∂<i>x</i><sub>0</sub></td></tr></table></td>
<td>−</td>
<td><table><tr><td align="center" style="border-bottom:solid 1px">∂<i>a</i><sub>3</sub></td></tr><tr><td>∂<i>x</i><sub>1</sub></td></tr></table></td>
<td>+</td>
<td><table><tr><td align="center" style="border-bottom:solid 1px">∂<i>a</i><sub>0</sub></td></tr><tr><td>∂<i>x</i><sub>2</sub></td></tr></table></td>
<td>+</td>
<td><table><tr><td align="center" style="border-bottom:solid 1px">∂<i>a</i><sub>1</sub></td></tr><tr><td>∂<i>x</i><sub>3</sub></td></tr></table></td>
<td>)</td>
</tr>
<tr><td></td><td></td>
<td>+</td>
<td><i>i</i><sub>3</sub></td>
<td>(</td>
<td><table><tr><td align="center" style="border-bottom:solid 1px">∂<i>a</i><sub>3</sub></td></tr><tr><td>∂<i>x</i><sub>0</sub></td></tr></table></td>
<td>+</td>
<td><table><tr><td align="center" style="border-bottom:solid 1px">∂<i>a</i><sub>2</sub></td></tr><tr><td>∂<i>x</i><sub>1</sub></td></tr></table></td>
<td>−</td>
<td><table><tr><td align="center" style="border-bottom:solid 1px">∂<i>a</i><sub>1</sub></td></tr><tr><td>∂<i>x</i><sub>2</sub></td></tr></table></td>
<td>+</td>
<td><table><tr><td align="center" style="border-bottom:solid 1px">∂<i>a</i><sub>0</sub></td></tr><tr><td>∂<i>x</i><sub>3</sub></td></tr></table></td>
<td>) .</td>
</tr>
</table>
</blockquote>
Here, the left-hand column is the partial with respect to <i>x</i><sub>0</sub>, and the rest is the fragmentary differential operators from vector calculus: the rest of the top row is minus the divergence, the rest of the diagonal is the gradient, and the remaining six terms are the curl. When we reverse the order of multiplication for the right-multiplying <table style="display:inline"><tr><td><table cellpadding=0 cellspacing=0px style="border-left:1px solid #000; border-right:1px solid #000; border-top:1px solid #000; border-bottom:1px solid #000; color:#000"><tr><td style ="border-top:1px solid #000; border-bottom:1px solid #000; font-size: 25%;"> ● <br> </td></tr></table></td></tr></table>, the imaginaries commute with the scalars and with themselves, but anticommute with each other — so everything stays the same except that the sign of the curl is reversed. We have
<blockquote>
<table>
<!-- left-mult nabla -->
<tr>
<td>
<table><tr>
<td>
<table cellpadding=0 cellspacing=0px style="border-left:1px solid #000; border-right:1px solid #000; border-top:1px solid #000; border-bottom:1px solid #000; color:#000"><tr><td style ="border-top:1px solid #000; border-bottom:1px solid #000; font-size: 25%;">
● <br>
</td></tr></table>
</td>
</tr></table>
</td><td> = </td>
<td><table><tr><td align="center" style="border-bottom:solid 1px">∂</td></tr><tr><td>∂<i>x</i><sub>0</sub></td></tr></table></td>
<td>−</td>
<td>div</td>
<td>+</td>
<td>grad</td>
<td>+</td>
<td>curl</td>
</tr>
<!-- right-mult nabla -->
<tr>
<td>
<table><tr>
<td>
<table cellpadding=0 cellspacing=0px style="border-left:1px solid #000; border-right:1px solid #000; border-top:1px solid #000; border-bottom:1px solid #000; color:#000"><tr><td style ="border-top:1px solid #000; border-bottom:1px solid #000; font-size: 25%;">
● <br>
</td></tr></table>
</td>
</tr></table>
</td><td> = </td>
<td><table><tr><td align="center" style="border-bottom:solid 1px">∂</td></tr><tr><td>∂<i>x</i><sub>0</sub></td></tr></table></td>
<td>−</td>
<td>div</td>
<td>+</td>
<td>grad</td>
<td>−</td>
<td>curl</td>
</tr>
<!-- left-div nabla -->
<tr>
<td>
<table><tr>
<td>
<table cellpadding=0 cellspacing=0px style="border-left:1px solid #000; border-right:1px solid #000; border-top:1px solid #000; border-bottom:1px solid #000; color:#000"><tr><td style ="border-top:1px solid #000; border-bottom:1px solid #000; font-size: 25%;">
<br> ●
</td></tr></table>
</td>
</tr></table>
</td><td> = </td>
<td><table><tr><td align="center" style="border-bottom:solid 1px">∂</td></tr><tr><td>∂<i>x</i><sub>0</sub></td></tr></table></td>
<td>+</td>
<td>div</td>
<td>−</td>
<td>grad</td>
<td>−</td>
<td>curl</td>
</tr>
<!-- right-div nabla -->
<tr>
<td>
<table><tr>
<td>
<table cellpadding=0 cellspacing=0px style="border-left:1px solid #000; border-right:1px solid #000; border-top:1px solid #000; border-bottom:1px solid #000; color:#000"><tr><td style ="border-top:1px solid #000; border-bottom:1px solid #000; font-size: 25%;">
<br> ●
</td></tr></table>
</td>
</tr></table>
</td><td> = </td>
<td><table><tr><td align="center" style="border-bottom:solid 1px">∂</td></tr><tr><td>∂<i>x</i><sub>0</sub></td></tr></table></td>
<td>+</td>
<td>div</td>
<td>−</td>
<td>grad</td>
<td>+</td>
<td>curl</td>
<td>.</td>
</tr>
</table>
</blockquote>
By taking differences between these nablas, one can isolate the partial with respect to <i>x</i><sub>0</sub>, and the curl, and... the gradient minus the divergence. One cannot, however, separate the gradient <i>from</i> the divergence this way, which raises the suspicion that the gradient and divergence are, in some profound sense, a single entity. There may be some insights waiting here into the intuitive meanings of these various fragments of the full nabla.
</p>
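<p>
These four decompositions are easy to spot-check numerically; here's a sketch of mine using central finite differences on an arbitrary sample function (all the helper names are my own inventions):
</p>
<pre>
# Numeric spot-check of the four decompositions just listed: each full nabla
# is evaluated by finite differences on a sample quaternion function and
# compared against d/dx0, div, grad, and curl assembled by hand.

def qmul(a, b):
    a0, a1, a2, a3 = a; b0, b1, b2, b3 = b
    return (a0*b0 - a1*b1 - a2*b2 - a3*b3,
            a0*b1 + a1*b0 + a2*b3 - a3*b2,
            a0*b2 - a1*b3 + a2*b0 + a3*b1,
            a0*b3 + a1*b2 - a2*b1 + a3*b0)

def qadd(*qs): return tuple(sum(c) for c in zip(*qs))
def qscale(s, q): return tuple(s*c for c in q)

def partial(f, x, k, h=1e-6):
    """Central-difference partial derivative of quaternion-valued f wrt x_k."""
    xp = list(x); xm = list(x)
    xp[k] += h; xm[k] -= h
    return tuple((p - m) / (2*h) for p, m in zip(f(xp), f(xm)))

BASIS = [(1,0,0,0), (0,1,0,0), (0,0,1,0), (0,0,0,1)]

def nabla(f, x, side, sign):
    """side='left'/'right' multiplication by i_k; sign=+1 multiply, -1 divide
    (left-dividing by i_k is left-multiplying by -i_k, and so on)."""
    out = partial(f, x, 0)
    for k in (1, 2, 3):
        d = partial(f, x, k)
        term = qmul(BASIS[k], d) if side == 'left' else qmul(d, BASIS[k])
        out = qadd(out, qscale(sign, term))
    return out

def fragments(f, x):
    d = [partial(f, x, k) for k in range(4)]
    dt   = d[0]                                    # d/dx0 of the whole quaternion
    div  = (d[1][1] + d[2][2] + d[3][3], 0, 0, 0)  # divergence of the vector part
    grad = (0, d[1][0], d[2][0], d[3][0])          # gradient of the real part
    curl = (0, d[2][3] - d[3][2], d[3][1] - d[1][3], d[1][2] - d[2][1])
    return dt, div, grad, curl

f = lambda x: (x[0]*x[1], x[2]**2, x[0] + x[3], x[1]*x[2]*x[3])  # arbitrary test
x = [0.3, -1.2, 0.7, 2.0]
dt, div, grad, curl = fragments(f, x)
checks = [
    (nabla(f, x, 'left',  +1), qadd(dt, qscale(-1, div), grad, curl)),
    (nabla(f, x, 'right', +1), qadd(dt, qscale(-1, div), grad, qscale(-1, curl))),
    (nabla(f, x, 'left',  -1), qadd(dt, div, qscale(-1, grad), qscale(-1, curl))),
    (nabla(f, x, 'right', -1), qadd(dt, div, qscale(-1, grad), curl)),
]
assert all(max(abs(u - v) for u, v in zip(got, want)) < 1e-5
           for got, want in checks)
print("all four decompositions check out")
</pre>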
<p>
Wait. Wasn't part of the point of the 1890s debate that the quaternionists maintained the whole quaternion was in a profound sense a single entity? Why are we still talking about the meanings of fragments of this thing, instead of the whole? And while we're at it, why <i>is</i> it in any way meaningful to multiply-or-divide the partial derivatives by the basis elements?
</p>
<span style="font-size: large;" id="sec-fnab-part">Partial derivatives</span>
<p>
From here, the path I've been following breaks up, with faint trails scattering off in many directions. No one trail immediately suggests itself to me as especially worth a protracted stroll, so for now I'll take a quick look down the first turn or so of several, getting a sense of the immediate neighborhood, and let my back‍brain mull over what to explore in some future post.
</p>
<p>
Possibly, in my quest for the deeper meaning of the nabla operator, I may be asking too much. Granted, this may be one of those situations where it's <i>right</i> to ask too much; some kinds of results must be pursued that way. But it's worth keeping in mind that, idealism aside, there's always been a strong element of utility in the nabla tradition: starting with the pre-quaternion history of nabla, as noted above, the choice of operator has been in significant part a matter of what works.
</p>
<p>
A secondary theme that's been in play at least since Shunkichi Kimura's 1896 treatment is total derivatives versus partial derivatives. Without getting entangled in the larger question of coherent meaning, Kimura did address this point explicitly and up-front: why write
<blockquote>
<table><tr>
<td>
<table cellpadding=0 cellspacing=0px style="border-left:1px solid #000; border-right:1px solid #000; border-top:1px solid #000; border-bottom:1px solid #000; color:#000"><tr><td style ="border-top:1px solid #000; border-bottom:1px solid #000; font-size: 25%;">
● <br>
</td></tr></table>
</td>
<td><i>a</i></td>
<td> = </td>
<td>
<table><tr>
<td align="center" style="border-bottom:solid 1px">∂<i>a</i></td></tr>
<tr><td>∂<i>x</i><sub>0</sub></td></tr>
</table>
</td>
<td> + </td>
<td>
<table><tr>
<td rowspan="2"><i>i</i><sub>1</sub></td>
<td align="center" style="border-bottom:solid 1px">∂<i>a</i></td></tr>
<tr><td>∂<i>x</i><sub>1</sub></td></tr>
</table>
</td>
<td> + </td>
<td>
<table><tr>
<td rowspan="2"><i>i</i><sub>2</sub></td>
<td align="center" style="border-bottom:solid 1px">∂<i>a</i></td></tr>
<tr><td>∂<i>x</i><sub>2</sub></td></tr>
</table>
</td>
<td> + </td>
<td>
<table><tr>
<td rowspan="2"><i>i</i><sub>3</sub></td>
<td align="center" style="border-bottom:solid 1px">∂<i>a</i></td></tr>
<tr><td>∂<i>x</i><sub>3</sub></td></tr>
</table>
</td>
</tr></table>
</blockquote>
rather than
<blockquote>
<table><tr>
<td>
<table cellpadding=0 cellspacing=0px style="border-left:1px solid #000; border-right:1px solid #000; border-top:1px solid #000; border-bottom:1px solid #000; color:#000"><tr><td style ="border-top:1px solid #000; border-bottom:1px solid #000; font-size: 25%;">
● <br>
</td></tr></table>
</td>
<td><i>a</i></td>
<td> = </td>
<td>
<table><tr>
<td align="center" style="border-bottom:solid 1px"><i>d</i><i>a</i></td></tr>
<tr><td><i>d</i><i>x</i><sub>0</sub></td></tr>
</table>
</td>
<td> + </td>
<td>
<table><tr>
<td rowspan="2"><i>i</i><sub>1</sub></td>
<td align="center" style="border-bottom:solid 1px"><i>d</i><i>a</i></td></tr>
<tr><td><i>d</i><i>x</i><sub>1</sub></td></tr>
</table>
</td>
<td> + </td>
<td>
<table><tr>
<td rowspan="2"><i>i</i><sub>2</sub></td>
<td align="center" style="border-bottom:solid 1px"><i>d</i><i>a</i></td></tr>
<tr><td><i>d</i><i>x</i><sub>2</sub></td></tr>
</table>
</td>
<td> + </td>
<td>
<table><tr>
<td rowspan="2"><i>i</i><sub>3</sub></td>
<td align="center" style="border-bottom:solid 1px"><i>d</i><i>a</i></td></tr>
<tr><td><i>d</i><i>x</i><sub>3</sub></td></tr>
</table>
</td>
<td> </td><td> </td><td>?</td>
</tr></table>
</blockquote>
Kimura, after noting that the two forms are interchangeable when the <i>x</i><sub>k</sub> are independent, chose partial derivatives. He reached this choice by considering the utility of the two candidate operators in expressing some standard equations, and adopting the one he found notationally more convenient. It figures this would be the operator using partial derivatives, which are the more technically primitive building blocks and thus (one would think) ought logically to provide a more versatile foundation.
</p>
<p>
An (arguably) more definite form of the total/partial question appears in modern quaternionic treatments of Maxwell's equations (<a href="https://arxiv.org/abs/math-ph/0307038">[1]</a>, <a href="https://web.archive.org/web/20131216024643/http://www.infohand.info/Workaround/Science_Side_3_files/QuatElectrodynaStAlban.pdf">[2]</a>), with the peculiar visible consequence that the definition of full nabla in these treatments has a stray factor of 1/<i>c</i> on the partial with respect to time (<i>x</i><sub>0</sub>). On investigation, this turns out to be a consequence of starting out with the total derivative with respect to time, supposing (as I track this, three and a half decades after I took <a href="https://en.wikibooks.org/wiki/Advanced_Mathematics_for_Engineers_and_Scientists/Introduction_to_Partial_Differential_Equations">diffy Qs</a>) that the whole is time-dependent. Expanding the total derivative in terms of partials,
<blockquote>
<table><tr>
<td>
<table><tr>
<td align="center" style="border-bottom:solid 1px"><i>d</i></td></tr>
<tr><td><i>d‍x</i><sub>0</sub></td></tr>
</table>
</td>
<td> = </td>
<td>
<table><tr>
<td align="center" style="border-bottom:solid 1px">∂</td></tr>
<tr><td>∂<i>x</i><sub>0</sub></td></tr>
</table>
</td>
<td> + </td>
<td>
<table>
<tr>
<td align="center" style="border-bottom:solid 1px">∂<i>x</i><sub>1</sub>
</td>
<td align="center" style="border-bottom:solid 1px">∂</td>
</tr>
<tr>
<td>∂<i>x</i><sub>0</sub></td>
<td>∂<i>x</i><sub>1</sub></td>
</tr>
</table>
</td>
<td> + </td>
<td>
<table>
<tr>
<td align="center" style="border-bottom:solid 1px">∂<i>x</i><sub>2</sub>
</td>
<td align="center" style="border-bottom:solid 1px">∂</td>
</tr>
<tr>
<td>∂<i>x</i><sub>0</sub></td>
<td>∂<i>x</i><sub>2</sub></td>
</tr>
</table>
</td>
<td> + </td>
<td>
<table>
<tr>
<td align="center" style="border-bottom:solid 1px">∂<i>x</i><sub>3</sub>
</td>
<td align="center" style="border-bottom:solid 1px">∂</td>
</tr>
<tr>
<td>∂<i>x</i><sub>0</sub></td>
<td>∂<i>x</i><sub>3</sub></td>
</tr>
</table>
</td>
<td>.</td>
</tr></table>
</blockquote>
Now, the partials <table style="display: inline-table;"><tr><td align="center" style="border-bottom:solid 1px">∂<i>x</i><sub>k≥1</sub></td></tr><tr><td>∂<i>x</i><sub>0</sub></td></tr></table> are the velocities of propagation along the spatial axes, which for Maxwell's equations are taken to be the speed of light, <i>c</i>. This factor of <i>c</i> therefore shows up on three out of four partials, but not on the partial with respect to time; for convenience —that again— one defines an operator with a factor of 1/<i>c</i> on it, which eliminates the extra factors of <i>c</i> on three of the partials, but introduces a 1/<i>c</i> on the partial with respect to time.
</p>
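<p>
Spelling out that bookkeeping (a sketch, in LaTeX notation, under the stated assumption that all three propagation velocities equal <i>c</i>):
</p>
<blockquote>
<pre>
\frac{d}{dx_0}
  = \frac{\partial}{\partial x_0}
    + \sum_{k=1}^{3} \frac{\partial x_k}{\partial x_0}\,\frac{\partial}{\partial x_k}
  = \frac{\partial}{\partial x_0} + c \sum_{k=1}^{3} \frac{\partial}{\partial x_k} ,
\qquad\text{so}\qquad
\frac{1}{c}\,\frac{d}{dx_0}
  = \frac{1}{c}\,\frac{\partial}{\partial x_0}
    + \sum_{k=1}^{3} \frac{\partial}{\partial x_k} .
</pre>
</blockquote>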
<p>
And then there is the matter of orienting the partials, which I'm still foggy on: how the imaginaries get in there, and thus whether they multiply or divide, on the left or on the right. I see treatments just splicing the imaginaries in with at most a casual reference to orientation in an algebra; early classroom experience conditioned me to attribute that move to someone who understands it all and doesn't take time to explain every little thing (I've been in that position a few times myself), but over time I've started to suspect that the folks acting so in this case might not really understand it any better than I do (I've been in that position, too).
</p>
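<p>
For whatever it's worth, the bare fact that left and right placement differ is easy to exhibit; here's a toy sketch in Python (the helper is mine, not drawn from any of the treatments above) using Hamilton's product rules:
</p>
<blockquote>
<pre>
# Quaternions as 4-tuples (w, x, y, z) = w + x*i + y*j + z*k, multiplied by
# Hamilton's rules i^2 = j^2 = k^2 = ijk = -1.
def qmul(a, b):
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return (w1*w2 - x1*x2 - y1*y2 - z1*z2,
            w1*x2 + x1*w2 + y1*z2 - z1*y2,
            w1*y2 - x1*z2 + y1*w2 + z1*x2,
            w1*z2 + x1*y2 - y1*x2 + z1*w2)

i = (0, 1, 0, 0)
j = (0, 0, 1, 0)
print(qmul(i, j))   # (0, 0, 0,  1) : i*j =  k
print(qmul(j, i))   # (0, 0, 0, -1) : j*i = -k
</pre>
</blockquote>
<p>
So an operator defined by splicing an imaginary in on the left is simply not the same operator as one spliced in on the right; whatever the treatments mean, the choice isn't vacuous.
</p>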
<span style="font-size: large;" id="sec-fnab-gen">Generalized quaternions</span>
<p>
Quaternions lost out on the concrete front to vector calculus. But they also lost out on the abstract front. Mathematicians took Hamilton's idea of using axioms to define more general forms of numbers and reason about their properties, and ran with it. Linear algebra. Clifford algebras. Lie and Jordan algebras. Rings. Groups. Monoids. Semi-groups. People who want special numbers won't go as far as quaternions, and people who want general numbers won't stop at quaternions.
</p>
<p>
Yet, generalized quaternions — quaternions whose four coefficients aren't real numbers — have occasionally been employed. Why? On the face of it, generalized quaternions don't have the specific properties that make real quaternions unique. Are they used, then, out of some perceived mystical significance of quaternions, or is there actually something structural about quaternions, aside from their unique mathematical properties as a division algebra, that they can confer even in the generalized setting? I do not, of course, have a decisive answer for this question. I do have some places to look for small insights building toward prospects of an answer.
</p>
<p>
The places to look evidently fall into two groups: those that look within the scope of real quaternions, and those that look at generalized forms of quaternions. In looking at real quaternions the point is to understand what they have to offer <i>beyond</i> mere unique division, that might possibly linger after the unique division itself has dropped away. I'll have more to say, further below, about real quaternions than about generalized ones; I'm simply not familiar with much research using generalized quaternions as such, as most researchers either stick with real quaternions or drop quaternion structure altogether.
</p>
<p>
On the generalized-quaternions front, I've already mentioned Silberstein; but, tbh, all I get from Silberstein is the question. That is, Silberstein's work suggests to me there's something of interest in generalized quaternions, but doesn't go far enough to identify what. There are some well-known generalizations that go off in different directions from Silberstein; besides geometric algebras, which are enjoying some popularity atm, there's the <a href="https://en.wikipedia.org/wiki/Cayley-Dickson_construction">Cayley–Dickson construction</a>, which offers an infinite sequence of hypercomplex algebras with 2<sup><i>n</i></sup> components, each losing just a bit more well-behavedness: complexes, quaternions, octonions, sedenions, and on indefinitely (though usually not bothering with fancy names beyond the sedenions). So far, I haven't felt any of those sorts of generalizations were retaining the character of <i>quaternions</i>; so that, whatever merits those generalizations might enjoy in themselves, they wouldn't offer insights into the peculiar merits of quaternions.
</p>
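<p>
The Cayley–Dickson doubling itself is compact enough to sketch; here's a minimal Python rendition (my own toy encoding: an element of each algebra in the sequence is either a real number or a pair of elements of the previous algebra):
</p>
<blockquote>
<pre>
# Cayley-Dickson doubling: (a,b)(c,d) = (ac - conj(d)b, da + b conj(c)),
# with conjugate (a,b)* = (a*, -b).  Reals -> complexes -> quaternions ->
# octonions -> sedenions -> ...
def neg(x):
    return (neg(x[0]), neg(x[1])) if isinstance(x, tuple) else -x

def conj(x):
    return (conj(x[0]), neg(x[1])) if isinstance(x, tuple) else x

def add(x, y):
    return (add(x[0], y[0]), add(x[1], y[1])) if isinstance(x, tuple) else x + y

def mul(x, y):
    if not isinstance(x, tuple):
        return x * y
    (a, b), (c, d) = x, y
    return (add(mul(a, c), neg(mul(conj(d), b))),
            add(mul(d, a), mul(b, conj(c))))

# Quaternions as doubled complexes as doubled reals:
i = ((0, 1), (0, 0))
j = ((0, 0), (1, 0))
print(mul(i, j))   # ((0, 0), (0, 1)), i.e. k; while mul(j, i) gives -k
</pre>
</blockquote>
<p>
Each doubling sheds a bit of structure (commutativity at the quaternions, associativity at the octonions, alternativity at the sedenions), which is the losing-a-bit-more-well-behavedness of the sequence.
</p>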
<p>
As it happens, I do know of someone who continues further in what appears to be the same direction as Silberstein. But there's a catch.
</p>
<p>
The work I'm thinking of was done about sixty years ago by a Swedish civil engineer by the name of Otto Fischer. He wrote two books on the subject, <i>Universal Mechanics and Hamilton's Quaternions</i> (1951) and <i>Five Mathematical Structural Models in Natural Philosophy with Technical Physical Quaternions</i> (1957). It happens I can study the earlier book all I want, because my father bought a copy which I've inherited. Fischer indeed did not stop at real quaternions nor biquaternions. He moved on to what he called <i>quadric quaternions</i> — quaternions whose coefficients are themselves quaternions in an independent set of imaginaries, thus with six elementary imaginaries in two sets of three, and sixteen real coefficients — and thence to <i>double quadric quaternions</i>, which are quadric quaternions whose sixteen coefficients are themselves quadric quaternions in independent imaginaries, thus twelve elementary imaginaries in four sets of three, and 256 real coefficients. If what is needed to bring out the secrets of generalized quaternions is a sufficiently general treatment, Fischer should qualify.
</p>
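<p>
On my reading of that description, a quadric quaternion is what a modern algebraist would call a tensor product of two copies of the quaternions: Hamilton's product rule only requires the coefficients to commute with the outer imaginaries, not with each other, so the same rule can be reused verbatim with quaternion-valued coefficients. A small Python sketch of the idea (my own construction for illustration; Fischer's actual machinery is far more elaborate):
</p>
<blockquote>
<pre>
# Hamilton's product, parameterized by the coefficient algebra's operations,
# so the same formula serves for real coefficients or (nested) quaternion
# coefficients.
import operator

def qmul(a, b, mul=operator.mul, add=operator.add, neg=operator.neg):
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    def s(*ts):                      # sum of signed coefficient products
        total = ts[0]
        for t in ts[1:]:
            total = add(total, t)
        return total
    return (s(mul(w1,w2), neg(mul(x1,x2)), neg(mul(y1,y2)), neg(mul(z1,z2))),
            s(mul(w1,x2), mul(x1,w2), mul(y1,z2), neg(mul(z1,y2))),
            s(mul(w1,y2), neg(mul(x1,z2)), mul(y1,w2), mul(z1,x2)),
            s(mul(w1,z2), mul(x1,y2), neg(mul(y1,x2)), mul(z1,w2)))

# Quadric quaternions: four coefficients, each itself a real quaternion,
# hence 16 real coefficients in all.
def qqmul(a, b):
    return qmul(a, b, mul=qmul,
                add=lambda p, q: tuple(map(operator.add, p, q)),
                neg=lambda p: tuple(map(operator.neg, p)))

zero, one, inner_i = (0,0,0,0), (1,0,0,0), (0,1,0,0)
outer_i = (zero, one, zero, zero)              # the outer imaginary i
scalar_inner_i = (inner_i, zero, zero, zero)   # inner i as a "scalar"
# The two sets of imaginaries commute with each other:
print(qqmul(outer_i, scalar_inner_i) == qqmul(scalar_inner_i, outer_i))  # True
</pre>
</blockquote>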
<p>
Looking back now, Fischer's work looks a bit <i>fringe</i>; but it didn't look so extra-paradigmatic at the time. The 1890s vectors-quaternions debates were in the outer reaches of living memory, about as far removed as the 1950s are today; and work on quaternions had been done by some prominent physicists within the past few years. In particular, <a href="https://en.wikipedia.org/wiki/Arthur_Eddington">Sir Arthur Eddington</a>, who had tinkered with quaternions, had only recently died. Fischer's work was — deservedly — criticized for its density, but afaict wasn't dismissed out of hand, as such.
</p>
<p>
In any case, my current interest is on the periphery of things, rather than in the center of prevailing paradigm research; so I can afford to tolerate a certain off-beat character in Fischer's work, at least until Fischer gives me a reason to think I've nothing further worthwhile to find in it. And Fischer comes across as competent, and quite aware of the density and indirection of his own work, which he seeks to mitigate (though there's a real question as to whether he succeeds).
</p>
<p>
What I really want to understand about Fischer's work is, having provided himself with such an immense array of generalized quaternionic structure, what does he <i>use</i> it for? There are some clues readily visible in the preface and final sections of the book; somehow he seems to be associating different quaternion subsets of his general numbers with different specialties, and he's playing some kind of games with "pyramids" of differential operators. To really get a handle on it all, I fear it may be necessary to confront the book in full depth from page 1, which I've tried far enough to realize it's the single densest mathematical treatment I've encountered (though he does take very seriously his own advice to "begin at the beginning", else I'd hold out no hope at all of making sense of it).
</p>
<p>
So, studying Fischer's work may be one source of... eventual... insights into the puzzle of generalized quaternions. It certainly isn't a short-term prospect; but, there it is.
</p>
<p>
Before getting back to real quaternions in the next section, I'll digress to remark that Fischer reinforces a belief I've held ever since I really started researching the history of quaternions — in <a href="https://fexpr.blogspot.com/2014/03/the-great-vectors-versus-quaternions.html">1986</a> — that what we really need in mathematics is a certain type of <i>software</i>.
</p>
<p>
By my reading of the history, the vectorists in the 1890s debate really did have one important practical point in their favor: if you have to deal with the algebra by hand, it seems it'd be vastly easier to not make careless errors when following the rectangular regimentation of matrix algebra than the spinning vortices of quaternion algebra. (Recalling from my earlier post, the equivalence between matrix and quaternion methods is akin to the equivalence between particles and waves — with quaternions playing the part of waves.) That is, if you try to do quaternion algebra, involving breaking things down into components, on the back of an envelope you're awfully likely to make a mistake; so I immediately imagined having a computer help you get it right. (I didn't imagine a <i>graphical</i> user interface, btw, as that technology really didn't exist yet for personal computing. Looking back, I find myself ambivalent about GUIs; sure, they can be sparkly, but they don't always help us think clearly; we're so busy thinking of how to use the graphics, we forget to think first and foremost about the logical structures we'd like to interface <i>with</i>.)
</p>
<p>
Thinking about this idea, I eventually decided the underlying logical structures one wants would be essentially <i>proofs</i>, so that in a sense the software would be a sort of "proof processor", by loose analogy with the "word processor". Achieving the fluidity of back-of-the-envelope algebra was always key to my concept; my occasional encounters with "symbolic math" software have given me the impression of something far too cumbersome for what I envision. Shifting between alternative paths of reasoning should be a matter of course; symbol definition would seem to call for something halfway between conventional "declarative" and "imperative" styles. I also imagined the computer trying, in its free moments, to devise context-sensitive helpful suggestions for what to do next, <i>without</i> trying to take control of the proof task away from the human user. I've never been a fan of fully-automated proof, as such; in the early days of personal computing (as a commenter on another of my posts reminded me) we anticipated computers of the future would enhance <i>our</i> brain power, not attempt to replace it, and the enhancements weren't to be just increasing our ability to look things up, either.
</p>
<p>
Where does Fischer come into this? Well, Fischer not only deals with massive grids of coordinates, his notation looks extremely idiosyncratic to me, using different conventions than anything else I've seen. Perhaps a typical 1950s Swedish civil engineer would find much of it quite conventional. But, unless you spend all of your time in one narrow mathematical subcommunity, studying mathematics is a pretty heavily linguistic exercise, because every subcommunity has its own language and one is forever having to translate between them. Wouldn't it be nice to be able to just toggle some controls and switch between the way one author (such as Fischer) wrote, and the conventions used by whichever other author you prefer?
</p>
<p>
Btw, this software I'm describing? Not a minor interest. Not just a <i>past</i> interest. I still want it, all the more because, even though I've always felt it was doable and would be immensely valuable, afaict we're no closer to having it now than we were thirty years ago. Never assume that what you think is needed will be provided by somebody else. Think of it this way: if <i>you</i> can see it's doable and would be valuable, presumably you'd be more likely than most people to make it happen; so if <i>you</i> aren't going to the effort to make it happen, that's a sample of one suggesting that nobody else will go to the effort either. I've also never felt I could properly describe this software in words, so even if I were gifted with a team of programmers to implement it I couldn't tell them what to do; so I figured if it was going to happen I'd have to do it myself. Only, it looks like a huge project, so for one person to implement it would require a programming language with unprecedentedly vast abstractive power. By some strange coincidence, designing a programming language like that is something I've been <a href="https://fexpr.blogspot.com/2013/12/abstractive-power.html">striving for</a> ever since.
</p>
<span style="font-size: large;" id="sec-fnab-rot">Rotation</span>
<p>
Another trail that, sooner or later, clearly needs to be explored is the relationship, at its most utterly abstract, between quaternions and <i>rotation</i>.
</p>
<p>
Hamilton was looking at rotations, from the start. Quaternions, as noted, stand in relation to matrices as waves to particles; in some profound sense, quaternions seem to be the essence of rotation. The ordinary understanding of quaternion division is that a quaternion is the <i>ratio</i> of two three-vectors, and the non-commutativity of quaternion multiplication then follows directly from the recognition that rotations on a sphere produce different results if done in a different order. Even Silberstein, who was using biquaternions rather than real quaternions and was working in Minkowski rather than Euclidean spacetime, was doing rotations; which in itself suggests that more is going on here than meets the eye.
</p>
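<p>
A concrete illustration of that order-dependence (a minimal sketch of my own, using the standard encoding of a rotation by angle <i>t</i> about a unit axis as the unit quaternion cos(<i>t</i>/2) + sin(<i>t</i>/2)·axis, acting on a vector by conjugation):
</p>
<blockquote>
<pre>
import numpy as np

def qmul(a, b):
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([w1*w2 - x1*x2 - y1*y2 - z1*z2,
                     w1*x2 + x1*w2 + y1*z2 - z1*y2,
                     w1*y2 - x1*z2 + y1*w2 + z1*x2,
                     w1*z2 + x1*y2 - y1*x2 + z1*w2])

def rotor(axis, angle):
    return np.concatenate([[np.cos(angle/2)],
                           np.sin(angle/2) * np.asarray(axis, float)])

def rotate(q, v):
    v_as_quat = np.concatenate([[0.0], v])
    q_conj = q * np.array([1, -1, -1, -1])   # conjugate = inverse for unit q
    return qmul(qmul(q, v_as_quat), q_conj)[1:]

qx = rotor([1, 0, 0], np.pi/2)   # quarter turn about x
qz = rotor([0, 0, 1], np.pi/2)   # quarter turn about z
v = np.array([0.0, 1.0, 0.0])

print(rotate(qmul(qx, qz), v))   # ~ [-1, 0, 0]
print(rotate(qmul(qz, qx), v))   # ~ [ 0, 0, 1] : composing in the other
                                 #   order lands somewhere else entirely
</pre>
</blockquote>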
<p>
This is a tricky point. The relationship between quaternions and rotation is readily explained, indeed rather trivialized, in terms of peculiarities of rotation in three-dimensional Euclidean space. This is very much the canonical view, the one embraced by Penrose. Real quaternions become a single case in a general framework, and are then easily dismissed as merely an aberration that loses its seeming specialness when the wider context is properly appreciated.
</p>
<p>
The weakness in this reasoning is that it depends on the choice of general framework. This would be easier to see if the framework involved were alternative rather than mainstream. Suppose there were two different general frameworks in which the specific case (here, quaternions) could be fit; and in one of these frameworks, the specific case appears incidental, while in the other framework it appears pivotal. It would then be hard to make a compelling case, based on the first framework, that the specific case is incidental, because the second framework would be right there calling that conclusion into question. If the first framework is the only one we know about, though, the same case can be quite persuasive. To even question the conclusion we'd have to imagine the possibility of an alternative framework; and actually <i>finding</i> such an alternative could be a formidable challenge. Especially with the possibility hanging over us that perhaps the alternative mightn't really exist after all.
</p>
<p>
Investigating this trail seems likely to become an intensive study in avoiding conceptual pitfalls while dowsing for new mathematics.
</p>
<span style="font-size: large;" id="sec-fnab-Mink">Minkowski</span>
<p>
A narrow, hence more technically fraught, target for mathematical dowsing is Minkowski spacetime. Minkowski's decisive condemnation of a quaternionic approach —"too narrow and clumsy for the purpose"— is a standard quote on the subject, cited by quaternion opponents and proponents alike. If there is an alternative general framework to be found, after all, it'd have to handle Minkowski.
</p>
<p>
Without actually wading into this thing (not to be undertaken lightly), I can only note from a distance a few features that may be of interest when the time comes. The mechanical trouble here is evidently to do with the pattern of signs, which seems reminiscent of the multiple variants of nabla (though the pessimist in me insists it can't be quite <i>that</i> easy); logically, though, those variants oughtn't apply to the situation <i>unless</i> one were really already dealing with a derivative. Offhand, the only way that comes to mind for derivatives to come into it is if the whole physical infrastructure is something less obvious than what Minkowski was doing; which, yes, is <i>cheating</i>. But cheating (so to speak) is likely the only way to end up with a different answer than Minkowski did, so this might, just conceivably, be a hopeful development.
</p>
<span style="font-size: large;" id="sec-fnab-Lang">Langlands</span>
<p>
I wondered whether even to mention this. The <i>geometric <a href="https://en.wikipedia.org/wiki/Langlands_program">Langlands</a> correspondence</i> lies at the extreme wide end of mathematical dowsing targets; about as poetic as mathematics comes (which is very poetic indeed), and at the same time about as esoteric as it comes (yea, verily).
</p>
<p>
Mathematics in its final form is, of course, highly formal (I say "of course", but see my earlier remarks on axioms as a legacy of quaternions). The ideas don't start out formal, though; and there's always lots of material that hasn't yet worked its way across to the formal side. Moreover, attempts to describe the poetry of mathematics for non-mathematicians, in my experience, ultimately fail because they attempt something that can't really be done: divorcing the (very real) poetic nature of mathematics from the technical nature of the subject. In the end that divorce is impossible, because the true poetry is that the elegance arises <i>from</i> the technicalities.
</p>
<p>
Poking around on the internet, I found a discussion on Quora from a few years ago on the question <a href="https://www.quora.com/Can-the-Langlands-Program-be-described-in-laymans-terms">Can the Langlands Program be described in layman's terms?</a> There were some earnest attempts that ultimately devolved into technical arcana; but my favorite answer, offered by a couple of respondents, was in essence: <i>no</i>.
</p>
<p>
My own hand-wavy assessment: Robert Langlands conjectured broad, deep connections between the seemingly distant mathematical subjects of number theory and algebraic geometry. Especially distant in that, poetically speaking, number theory is a flavor of "discrete" math, while algebraic geometry is toward the continuous side of things. (I riffed on the discrete/continuous balance in physics <a href="https://fexpr.blogspot.com/2015/05/numberless-differentiation.html#sec-ndif-dcp">some time back</a>.) An especially high-publicity result fitting within this vast program was Andrew Wiles's proof of Fermat's Last Theorem, which hinged on proving a conjecture about elliptic curves.
</p>
<p>
Why would I even bring up such a thing? The Langlands program has gotten tangled up, in this century, with supersymmetry in physics; and the geometric side of Langlands is about <i>complex</i> curves. In effect, Langlands biases mathematical speculations toward further enhancing the reputation of complex numbers. So if one suspects physics may also lean toward the quaternionic, and one is also looking for interesting mathematical properties of quaternions, it seems fair game to ask whether quaternions can play into some variation on Langlands.
</p>
Storytelling (2019-03-19)
<blockquote>
But when it was midnight Shahrazad awoke and signalled to her sister Dunyazad who sat up and said, "Allah upon thee, O my sister, recite to us some new story, delightsome and delectable, wherewith to while away the waking hours of our latter night."
<blockquote>
— <i><a href="https://en.wikisource.org/wiki/The_Book_of_the_Thousand_Nights_and_a_Night">The Book of the Thousand Nights and a Night:</a> A Plain and Literal Translation of the Arabian Nights Entertainments</i>, Richard Francis Burton, 1885.
</blockquote>
</blockquote>
<p>
In this post, I mean to further two purposes at once: to expand my thinking on the evolution of <a href="https://fexpr.blogspot.com/2018/02/sapience-and-non-sapience.html">sapient thought</a>, and to deepen my understanding of Julian Jaynes's book <i><a href="https://en.wikipedia.org/wiki/The_Origin_of_Consciousness_in_the_Breakdown_of_the_Bicameral_Mind">The Origin of Consciousness in the Breakdown of the Bicameral Mind</a></i> (1976). The evolution-of-sapience part is my long-term interest; but the questions Jaynes raises are my current fuel and direction for exploring that evolutionary theme. My evolutionary thinking emerges on the other side of this driven exploration with some fascinating new insights and a set of further investigations to pursue.
</p>
<p>
I've considered language evolution several times on this blog, lately <a href="https://fexpr.blogspot.com/2018/02/sapience-and-non-sapience.html#sec-sns-soc">in February of last year<!--in February 2018--></a>. My unifying theme has been an extension of Eric Havelock's theory expounded in his <i><a href="https://en.wikipedia.org/wiki/Eric_A._Havelock#Preface_to_Plato">Preface to Plato</a></i> (1963), where he supposes that ancient Greek culture around the time of Plato had just undergone a profound transformation from <i>orality</i>, in which human culture is preserved in oral sagas, to <i>literacy</i>, in which human culture is preserved in writing. I conjecture the existence of a still-earlier phase of human culture, before orality, for which I took cues from the <a href="https://en.wikipedia.org/wiki/Pirah%C3%A3">Pirahã</a> culture recently studied in the Amazon. The Pirahã have neither art nor storytelling; and their language has, amongst a variety of other peculiarities, neither number nor time vocabulary, nor verb tense. To provide a convenient handle on the idea, I've used the working name <i>verbality</i> for the pre-orality phase of culture; and I've hazarded a guess that the transition from verbality to orality is marked by the appearance of art and new technologies at the start of the Upper Paleolithic, circa forty thousand years ago.
</p>
<p>
I blogged preliminary thoughts on Jaynes, on my first reading of the book, <a href="https://fexpr.blogspot.com/2018/03/thoughts-on-jayness-breakdown-of.html">in March of last year<!--2018--></a>. Jaynes's thesis is that for some time leading up to about five thousand years ago, human minds functioned differently than the consciousness we experience today. Instead, the human mind was, in Jaynes's terminology, <i>bicameral</i>, with the left and right brains (<a href="https://en.wikipedia.org/wiki/Handedness#Division_of_labor">so-called</a>) operating in a mode of partially independent coordination resembling <a href="https://en.wikipedia.org/wiki/Schizophrenia">schizophrenia</a>, and this bicameral coordination has gradually broken down and been replaced by modern consciousness over the past five millennia. He apparently sees the modern <i>self</i> as a character in a story the mind tells itself, a view I've encountered from others and with which I agree. Jaynes presents a detailed case for his thesis; my top criticism, on first reading, was that a less radical explanation of the evidence could be afforded by the <i>memetic</i> hypothesis, which was not yet available when Jaynes first formulated his bicameral thesis (as memes were only proposed in the same year Jaynes published his book, 1976), and which [memetics] I have been refining myself on this blog (starting <a href="https://fexpr.blogspot.com/2011/03/memetic-organisms.html">some time back</a>).
</p>
<p>
Another theme from past posts that informs my view of mind evolution is my model of sapient mind. Going into this post, I'd expected my model itself to play a passive supporting role; instead, however, the discussions in this post have provided extensive feedback on implications of my model of mind.
</p>
<p>
As I tied up my thoughts on first reading Jaynes, I reckoned a second in-depth reading would be in order, coordinated with a systematic effort to reinterpret Jaynes's accumulated evidence as grist for a more detailed timeline of memetic and linguistic evolution following my <i>verbality</i> hypothesis. That's what I mean to do here.
</p>
<p>
I was blindsided, while preparing what I thought would be the final draft of this post, by a flash of insight that I hadn't remotely seen coming — at least, not (heh) consciously. Now, in these posts I mostly try to keep the path of my explorations intact — so it's clear how I got to where I did, and, also, so diverging paths not taken can be returned to some day; but it does sometimes happen that later turns of the path are related to earlier ones, so that the earlier turns should be marked for later recall, and I'll add forward references in the earlier discussion to cue the reader. This time, though, the new insight sharply altered the complexion of points scattered across the whole discussion, causing a few to look prescient and several others oblivious. Rather than compromise the whole depiction of the journey, I've left most of it untouched; with this one paragraph as a warning up-front, so the reader, when hit by that final turn, may hopefully suffer less mental whiplash from it than I did. For what it's worth, the one thing earlier in this introduction that now grates on me is the list of peculiarities of the Pirahã language; that list was cherry-picked from a ready-to-hand longer list of oddities of Pirahã in a discussion with no obvious relation to any of this — it was about <a href="http://dedalvs.com/smileys/2007.html#aesthetic">conlanging</a>. What grates is that there's something really important missing from the list, that looks obvious to me in retrospect (but while it's important here, it would have been entirely out-of-place in the conlanging discussion I cribbed from). It really will all make more sense (I hope) when I come to it in due season, at the far end of the path; though meanwhile my proofreading of this whole post will be punctuated by winces where I squelch my impulse to inject forward-references to it.
</p>
<p>
One point I wish to make perfectly clear from the outset of this post: I have great respect for Jaynes. In the process of this post I will say some pretty harsh things about him, and do not care to be misunderstood on this point. In a single lifetime, none of us can see everything; we need the testimony of those who have visited realms we have not, and Jaynes's insights from his professional background, though I may question and criticize mercilessly, I do take very seriously. I wouldn't otherwise be devoting such close attention to them.
</p>
<blockquote>
<b>Contents</b><br>
<a href="#sec-jt2-time">Timeline</a><br>
<a href="#sec-jt2-mind">Sapient mind</a><br>
<a href="#sec-jt2-cons">Jaynesian consciousness</a><br>
<a href="#sec-jt2-words">Words</a><br>
<a href="#sec-jt2-gods">Gods</a><br>
<a href="#sec-jt2-brain">Brains</a><br>
<a href="#sec-jt2-evol">Evolution</a><br>
<a href="#sec-jt2-break">Breakdown</a><br>
<a href="#sec-jt2-vest">Vestiges</a><br>
<a href="#sec-jt2-whole">Jaynes as a whole</a><br>
<a href="#sec-jt2-frame">Frame story</a><br>
</blockquote>
<span style="font-size: large;" id="sec-jt2-time">Timeline</span>
<p>
Here is my working theory, before undertaking my second reading of Jaynes, on <i>what</i> happened <i>when</i>.
</p>
<p>
Human sapience started, I conjecture, at the onset of the Paleolithic —the old stone age— circa three million years ago. Dates assigned to these early milestones vary somewhat; at this writing, for instance, Wikipedia puts the Paleolithic onset at 3.3 million. (I blogged on the emergence of sapience <a href="https://fexpr.blogspot.com/2015/10/natural-intelligence.html">back yonder</a>.) Some modern thinkers on this —e.g. Daniel Dennett, <i><a href="https://en.wikipedia.org/wiki/Darwin%27s_Dangerous_Idea">Darwin's Dangerous Idea</a></i> (1995)— have not only viewed language as the key distinguishing feature of sapience, but given language a causative or even definitive role in the process; however, I think language is more usefully understood as an effect rather than cause of sapience. I've speculated sapience is some sort of algorithmic phenomenon, quite possibly positioned in a pocket of evolutionary search-space such that most evolutionary paths are diverted away from it, requiring some peculiar set of conditions for ignition and surrounded by mostly disadvantageous alternatives. My best guess atm is that language may be a catalyst for sustained ignition: that language naturally emerges in a sufficiently dense population of sapients and, perhaps, helps to create the survival advantage needed to drive further evolution of both sapience and language. I also suspect the sapience engine may be related to the non-Cartesian theater of human short-term memory (more on that in the next section) — an algorithmic, rather than linguistic, view of the sapient mind.
</p>
<p>
I am, btw, not inclined to the Chomsky-esque notion of an elaborate <a href="https://en.wikibooks.org/wiki/Conlang/Advanced/Grammar#The_universal_grammar_controversy">universal grammar</a> device genetically programmed into the human brain. As usual, I've an <a href="https://en.wikipedia.org/wiki/William_of_Ockham">Ockham</a>-ish preference for a simpler theory. I envision the sapience engine as a simple and robust —if rather evolutionarily hard-to-find— "chunking" device, accounting simultaneously for, on the cognitive side, formation of abstract ideas, and, projected onto the linguistic side, the tree-structured tendencies of human grammar. Though it seems entirely plausible that genetic evolution, once it got sapience in its metaphorical teeth, would favor improved linguistic capacity, I still see the internal sapience engine —what the solitary sapient mind does in itself, rather than how it interacts with other sapient minds— as the root cause of the distinctive shape of human evolution. I'd tend to ascribe technical features of human language to practical constraints of communication by a simple sapience engine rather than to incidental constraints of an elaborate language engine. (Daniel Everett recommended a similar conclusion from his study of the Pirahã, <i><a href="https://en.wikipedia.org/wiki/Daniel_Everett#Don.27t_Sleep_There_are_Snakes:_Life_and_Language_in_the_Amazonian_Jungle">Don't Sleep, There Are Snakes: Life and Language in the Amazonian Jungle</a></i>, 2009: "Language is a by-product of general properties of human cognition [...] constraints on communication that are common to evolved primates [...] and the overarching constraints of human cultures".)
</p>
<p>
From my first reading of Jaynes, his vision of the early human mind differs rather extremely from mine. Because I envision the algorithmic core of human consciousness as essential to sapience, necessarily I would expect sapient early hominins some millions of years ago to have minds structured along very broadly the same lines as those of modern humans. Jaynes views conscious humans as an anomaly (if not a pathology) that has developed only within the past five thousand years or so. Since I view language as a natural consequence of sapience, and possibly a necessary part of its evolutionary advantage, my scenario also ascribes some form of language to sapient early hominins, again some millions of years ago. Jaynes suggests, in candid disagreement with mainstream thinking, that language didn't emerge until the start of the Upper Paleolithic —the late stone age— some forty or so thousand years ago; the point in human development where I'm placing the start of <i>orality</i>. On first reading, I found his attitude on this point quite refreshing, taking it for a cheerfully good-natured try at a dramatically unorthodox alternative to prevailing thought on a point that, honestly, we're all guessing at — our ancestors from ten or a hundred thousand years ago, let alone a million, having neglected to leave us any audio recordings of their interactions. I approve (as I've remarked numerous times on this blog) of shaking up orthodoxy, to keep our thinking limber. Nevertheless, in this case I'm disinclined to Jaynes's late date for the onset of language.
</p>
<p>
Jaynes's notion of modern consciousness —if I understood correctly on first reading— presumes the narrative self is generated by a monolithic device, whereas my notion of essential sapience engine, though possessing a center (non-Cartesian theater), is inherently distributed and thereby more flexible. As mentioned in my earlier post, Susan Blackmore in <i><a href="https://en.wikipedia.org/wiki/The_Meme_Machine">The Meme Machine</a></i> (2000) also described a monolithic notion of <i>self</i>, though she explained she no longer believed in it due to her study of memetics. When she adopted a notion of <i>self</i> as a character in a narrative generated by the mind, apparently she reckoned that only the generated character was monolithic, not the generating mind. Daniel Dennett too, e.g. in <i><a href="https://en.wikipedia.org/wiki/Consciousness_Explained">Consciousness Explained</a></i> (1991), described a non-monolithic mind; Dennett's notion of mind, as I understand it, was radically decentralized, apparently lacking any internal structure to the device prior to memes being fed into it. Differences aside, all these non-monolithic models of mind seem consistent with substituting memetics for bicamerality at the center of Jaynes's grand scheme.
</p>
<p>
The very fact that storytelling is involved in the modern notion of <i>self</i> implies, within my scheme, that early hominins, with a verbal rather than oral culture and therefore with no storytelling at all, would not have a self in quite the modern sense. This seems to me a less radical prospect than Jaynes's bicameral mind, because I don't see the narratized self as central to the structure of the mind; again, my sapience device affords a moderate degree of coherence prior to any memetic programming. There is a curious implication in this early lack of modern self, that, just as my conjectured sapience engine would support a softened form of modern conscious mind, it might also support a softened form of Jaynes's bicameral mind. It's unclear to me just what form this would take, and I'm not immediately convinced it should play a significant role in the evolutionary scheme, but it's another thing to consider on a second reading. One can't rule out this variant bicameral scenario without first properly understanding it; and one should also keep in mind that Jaynes was a psychologist, with (as alluded to earlier) expertise in quite a different realm of phenomena than my own background has offered me.
</p>
<p>
For the next several million years, through the entirety of the Lower Paleolithic and Middle Paleolithic, by my timeline human language was in its verbality phase, without —one supposes— art, storytelling, number, time. The only model I have for this sort of language is Pirahã, and from its peculiar circumstances one expects Pirahã to be an atypical, even pathological, example. Though I take Pirahã as an existence proof for language that lacks key features of orality, I have as yet no notion of the range of variation of verbal languages, let alone how in particular human languages before the age of orality might likely differ from the strange outlier Pirahã in an age of literacy. Not only does Pirahã provide only one data point in the conjectured range of verbal languages, but I don't immediately see any strong reason to expect Pirahã to be a <i>holdover</i> from the age of verbality; it seems just as likely to have somehow reinitiated verbality (that is, they're both vanishingly unlikely events, with nothing but the existence of Pirahã to suggest that at least one of the two probabilities is non-zero). It's a guess that the absences of number and time are really related to those of art and storytelling; absence of time seems, imho, intuitively as if it <i>ought</i> to be related to absence of storytelling, but it would not be nearly so hard to imagine absence of number as just a coincident peculiarity. <a href="https://en.wikipedia.org/wiki/Derek_Bickerton">Derek Bickerton</a> —<i>Adam's Tongue</i> (2009)— would populate these millions of years of human prehistory with a series of baby steps along the road to full-fledged language — steps that Jaynes must fit into the few tens of thousands of years of the Upper Paleolithic. I've wondered whether one might be able to somehow do a partial forensic reconstruction of verbal language by studying grammatical peculiarities of the <a href="https://en.wikibooks.org/wiki/Conlang/Appendix/Glossary#;Basque">Basque</a> language isolate, but it's not immediately clear that I could pursue that with my current means.
</p>
<p>
Orality started, in my timeline, at the onset of the Upper Paleolithic, the late stone age, perhaps forty thousand years ago (be the same more or less; at this writing, Wikipedia puts it around fifty thousand). Or more precisely, in my scenario the start of orality <i>brought about</i> the onset of the Upper Paleolithic around that time, marked by a flourishing of art and technology. I'd expect the sluggish information transmission of verbal society to support only a very gradual advancement of technology, and the introduction of orality to produce an immediate acceleration of technological progress. The correlation between technology and art was already suggested by the conspicuous absence of art along with storytelling from Pirahã culture; one might, with some element of justice, say that art appears necessary to technological advance. (A beta-audience for this text points out early cave paintings <i>telling a story</i>.)
</p>
<p>
Recently reported evidence dates <a href="https://en.wikipedia.org/wiki/Neanderthal">Neanderthal</a> cave paintings to sixty five thousand years ago (<a href="http://science.sciencemag.org/content/359/6378/912">link</a>). Under my premise, this would indicate that the transition to orality was not specific to species <i><a href="https://en.wikipedia.org/wiki/Homo_sapiens">Homo sapiens</a></i>, making the transition (inasmuch as the species are separate) a memetic rather than genetic phenomenon, and suggesting that the genetic potentials of both species were able to reach the transition. With my supposition that sapience itself is part of the genetic potential involved, and guessing sapience only developed <i>once</i>, rather than evolving convergently for both species, it would follow that even if sapience didn't start all the way back at the onset of the Paleolithic, it ought to be at least as old as the divergence between <i>H. neanderthalensis</i> and <i>H. sapiens</i>, circa four hundred thousand years ago.
</p>
<p>
An important point in my recent reasoning <a href="https://fexpr.blogspot.com/2018/02/sapience-and-non-sapience.html#sec-sns-meme">on sapience</a> is that memetic evolution is several orders of magnitude faster than genetic evolution. One might ask whether that applies to memetic evolution in verbal culture, or only later in oral/literate culture. If the Lower and Middle Paleolithic must be excluded from memetic evolution, it would raise the question of whether the onset of sapience ought to be reckoned from the start of the Upper Paleolithic after all, akin to Jaynes's timeline (and thereby weakening my verbality premise). However, the relatively slow start in the verbal phase may be compatible with rapid memetic evolution after all; note that <i>genetic</i> evolution on Earth got off to a very slow start, languishing in relatively primitive forms for at least three billion, perhaps as much as four billion, years before abruptly shifting into a higher gear with the <a href="https://en.wikipedia.org/wiki/Cambrian_explosion">Cambrian explosion</a>. That pattern would fit tolerably well with memetic evolution extending through all, or a significant part of, the Paleolithic, with the interval between Paleolithic onset and Upper Paleolithic onset suggesting a rate of progress roughly three orders of magnitude faster than genetic evolution.
</p>
<p>
The technical character of the verbality/orality transition is unclear. According to my notion of a substantially fixed sapience engine, it seems the change ought to be linguistic, or rather (more precisely) ought to have a key manifestation in that form; but is the essential/signature linguistic element a provision for time? For number? Something else? Atm —poised to undertake my second reading of Jaynes— I see no basis for a strong preference between these alternatives; conceivably, though, there might be a way to work backward to it:
</p>
<p>
Until my first reading of Jaynes, I hadn't had occasion to consider what the verbality hypothesis implies about the evolution of <i>orality</i>. On consideration, though, it seems apparent that, starting from the dead stop of verbality, the high art of storytelling wouldn't spring fully formed but would, in fact, require an immense effort, presumably over a very long time (by human standards), to develop. Indeed, the development of the <i>narrative self</i> would seem to be itself a contiguous part of the evolution of storytelling, while the evidence Jaynes presents for bicamerality might also inform my alternative premise with some glimpses of intermediate stages in the evolution of storytelling. (I noted, in my earlier post, Jaynes remarked that the <i><a href="https://en.wikipedia.org/wiki/Iliad">Iliad</a></i> never describes human bodies as a whole, but rather as collections of parts; and I wondered at the time, what that says about the state of the art of storytelling.)
</p>
<p>
With a sufficiently detailed picture of the evolution of storytelling, then, one might imagine running it backward to deduce how it might have started. Though one might also try to reason <i>forward</i> to the start of storytelling by imagining what verbal culture might have been like. Evidently some modest level of technological skill was passed on from generation to generation, as glacially advancing technology characterizes the Paleolithic. Speculatively, might the volume of technological knowledge, being passed from generation to generation, have simply grown large enough to draw the attention of sapient minds, which then did what sapient minds do — <i>thought about it</i>, thereby causing it to creep into their language?
</p>
<p>
Even after the transition from orality to literacy, the evolution of storytelling wouldn't stop; it would just adapt to the changing memetic environment.
</p>
<span style="font-size: large;" id="sec-jt2-mind">Sapient mind</span>
<p>
As I hope to effectively transplant Jaynes's research from his model of mind to my own model of mind, if I'm to carry it off plausibly, I need to be clear on what my model of mind is. I've described this model before, in my post on <a href="https://fexpr.blogspot.com/2015/02/sapience-and-language.html#sec-sap-theater">sapience and language</a>. In outline: I envision human short-term memory, with its <a href="https://en.wikipedia.org/wiki/The_Magical_Number_Seven,_Plus_or_Minus_Two">seven-plus-or-minus-two chunks of information</a>, as a sort of "non-Cartesian theater" forming the centerpiece of the mind. The audience watching the theater is a vast population of <a href="https://en.wikipedia.org/wiki/Multi-agent_system">agents</a> within the mind, each one representing and promoting —essentially, <i>embodying</i>— a thought. When a thought is promoted loudly enough —for which an important element is relating to other thoughts that are currently well-promoted, especially the ones now on-stage— the thought may be promoted onto the stage, becoming one of the lucky seven-plus-or-minus-two.
</p>
<p>
(A degree of freedom here is that I'd entertain, if necessary, a model in which the theater is something other than short-term memory; but short-term memory was the inspiration for my model, and remains my favored hypothesis.)
</p>
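<p>
As a toy illustration only (my own sketch of the shape of the model, emphatically not a claim about mechanism), the promotion dynamic might be caricatured in a few lines of Python:
</p>
<blockquote>
<pre>
# Caricature of the non-Cartesian theater: many agents, each promoting one
# thought; each round, the most loudly promoted thoughts take the stage,
# where "loudness" is base activation plus noise plus support from
# relatedness to what's already on-stage.
import random

STAGE_SIZE = 7   # seven, plus or minus two

def promote(agents, stage, relatedness):
    def loudness(thought):
        base = agents[thought] + random.gauss(0, 0.1)
        support = sum(relatedness(thought, s) for s in stage)
        return base + support
    return sorted(agents, key=loudness, reverse=True)[:STAGE_SIZE]

agents = {n: random.random() for n in range(100)}   # 100 candidate thoughts
stage = []
for _ in range(5):                                  # a few promotion rounds
    stage = promote(agents, stage, lambda a, b: 1.0 / (1 + abs(a - b)))
print(stage)   # often drifts toward a cluster of mutually related thoughts
</pre>
</blockquote>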
<p>
Daniel Dennett spent a great deal of effort, in his 1991 book <i><a href="https://en.wikipedia.org/wiki/Consciousness_Explained">Consciousness Explained</a></i>, debunking the notion of a "Cartesian theater", in which the audience watching the theater is a monolithic consciousness. If the monolithic observer is immaterial (a <i>soul</i>), you've got Cartesian dualism; if the monolithic observer is material, it's apparently a mind, thus of the same type as the mind within which the theater and observer occur, so you've got an infinite recursion. However, neither objection applies to a model of mind in which the audience of the theater is a massively distributed sea of agents.
</p>
<p>
Presumably the agents in the audience also have some limited direct interaction (whispering to each other during the performance, as it were); and occasionally these interactions might rise to the level of sub-communities. The model would seem to be continuously deformable into a spectrum of unusual configurations, in which a large subcommunity could come to operate on a similar scale to the primary theater. One might wonder whether, in cases of <a href="https://en.wikipedia.org/wiki/Dissociative_identity_disorder">split personality</a>, alternate personalities would have normal-sized, or smaller-than-normal, short-term memories (on which, I have no strongly favored hypothesis). Conceivably, some such unusual configuration might resemble Jaynes's bicamerality.
</p>
<p>
Whatever philosophers (such as Descartes) may have said about the mind, some common figures of speech have no difficulty portraying the mind as coherent-but-separable. Having difficulty choosing between two alternatives, one might say, "I'm of two minds. Part of me says [A], but another part says [B]." In my experience, this usage feels entirely natural, neither remarkable nor problematic, unless we're told to notice it.
</p>
<p>
The theater evidently provides a good loom (or the frame for one) with which to weave a narrative self. Dream states may be examples of configurations in which the primary theater is in retreat, raising some question of how far the coherence of the primary theater is, or is not, directly related to the narrative self. Researchers tell us that dreaming takes place during REM sleep, so that we don't really "wake up out of a dream" with things happening around us contributing to the dream — the dream has to have happened earlier — but I envision the dream process as a sort of idling function similar to the self-loom, producing fragments that the primary loom may catch up and weave in as it's coming on-line when we wake. The dreams we "remember" would be those woven into the self-narrative, and could therefore have been selected based on events taking place as we wake; this ought to relate to the mechanism whereby agents are promoted onto the stage based on relevance to external stimuli. The resulting patched narrative, with stitching from the dream fragment to the situation on waking, would look like waking up to that situation out of the dream.
</p>
<p>
Note that I'm using the term "narrative" to describe a semantic understanding of a process that happens, <i>not</i> a linguistic rendition of the process. The loom I'm describing is prior to (or, deeper than) language.
</p>
<p>
It may be crucially important that (as anyone with a pet cat or dog has likely noted) non-sapient animals dream. If dreaming is an idling function of what I've been calling the self-loom, then the loom would seem to be properly part of the mind prior to sapience. Presumably an animal mind would have an agent-promotion system akin to the one that advances agents to our non-Cartesian stage, leading to questions about the size of an animal's short-term memory that I have difficulty imagining how to investigate. The relations between promotion device, loom, and sapience ought to bear on the development of the narrative self.
</p>
<p>
Another likely manifestation, btw, of the self-loom is the illusion that we <i>hesitate</i> before acting on a sudden stimulus. Something abrupt happens, we hesitate, and then we act; or rather, that's the story we end up with. But that hesitation should be an artifact of the loom. The stimulus has to propagate through our nervous system, and when you're talking about fractions of a second, that propagation time is significant. As the loom weaves a story after the fact to describe what happened in terms of a synthesized <i>self</i>, it can't ascribe that time delay to propagation through the nervous system because that's not part of the story world (it would be <a href="https://en.wikipedia.org/wiki/Fourth_wall">breaking the fourth wall</a>). So, to explain the time delay in terms of a directly present self, the loom says that the self hesitated.
</p>
<p>
Key mysteries in the recipe for sapience would appear to be how new "chunks" of information —abstract thoughts, or agents— are formed, and how the theater-like organization arranges itself. (As my thinking on this has advanced, I've lost my erstwhile interest in trying to <i>build</i> a sapient mind; the social consequences of building sapiences would be fraught, and I suspect, in any case, that the highest likelihood of producing a healthy, well-balanced sapience would be by the traditional method — biological reproduction followed by many years of child-rearing.)
</p>
<span style="font-size: large;" id="sec-jt2-cons">Jaynesian consciousness</span>
<p>
Jaynes envisions consciousness as a construct, with a narrative, so it's not <i>entirely</i> unlike my view of the narrative self. His notion, which he assembles slowly and carefully over his first three chapters (counting his Introduction), is apparently more structurally detailed than mine, specific to consciousness rather than appealing to a more general theory of sapience, and based on a sophisticated notion of <i>metaphor</i>, which he presents in his third chapter and associates with language.
</p>
<p>
Some caution is wanted against possible confusion between Jaynes's approach to consciousness, which does involve narratization, and my approach to mind, which does include consciousness. To avoid getting tangled, I'll try to consistently use <i>consciousness</i> for Jaynesian consciousness, and reserve <i>narrative self</i> for the memetic notion I'm using within my non-Cartesian-theater-based theory of mind.
</p>
<p>
The first word of Jaynes's Introduction is "O". How many modern books can you think of that begin with the word "O"? How many modern books even <i>use</i> the word "O" (other than in quoting something from a bygone age)? The terms "prevenient counsel" and "introcosm" seem to belong in a paragraph that starts with the word "O". The word "introcosm", btw, isn't in any of the unabridged dead-tree dictionaries I have ready access to, and has almost no profile on the internet; one of the top Google matches I got was someone asking what it meant ten years ago on Yahoo Answers (because they'd encountered the word in Jaynes's book), and apparently nobody else knew either. From hints here and there, I'll guess it's psychology terminology borrowed into English from Spanish, perhaps tracing back to <a href="https://en.wikipedia.org/wiki/Juan_Luis_Vives">Juan Luis Vives</a>, the sixteenth-century father of modern psychology. My favorite definition so far, from <a href="https://twitter.com/steven_kotler/status/816252277103149056">Steven Kotler</a>: "the universe within, the infinity glimpsed down the rabbit hole of mind."
</p>
<p>
Jaynes's Introduction is a whirlwind tour of different approaches taken historically to the question of consciousness. Based on my impressions from first reading, I was particularly keen to look for discrepancies between what question Jaynes asks, and what questions the other historical approaches ask. That rather delightful first paragraph of Jaynes's, I find, defines his question: where does the introcosm, the internal world of the conscious mind, come from?
</p>
<p>
On my first reading, I suggested Jaynes was rather hard on past theories. On second reading, less so, with ambivalence on one theory in particular. Most of the historical approaches he describes are too broad to qualify as paradigms in the <a href="https://en.wikipedia.org/wiki/The_Structure_of_Scientific_Revolutions">Kuhnian sense</a>, and he offers plausible criticisms of them for his purpose. However, near the bottom of his list when he comes to the major twentieth-century school of <a href="https://en.wikipedia.org/wiki/Behaviorism">behaviorism</a>, which does qualify as a paradigm, it seems to me there may be some confusion between what Jaynes is interested in, what the paradigm is interested in, and what the behavioral approach —viewed in historical context, which is after all the context in which Jaynes presents it— applies to. Kuhn noted that a scientific paradigm defines what questions can validly be asked, and behaviorism expressly declines to ask about the <a href="https://en.wikipedia.org/wiki/Introspection#In_psychology">introspective</a> view of the mind; in historical context, behaviorism's defining role seems to be to learn about the mind by studying it strictly from the outside. Perhaps, as Jaynes portrays, behaviorists explicitly denied that consciousness exists (a point on which Jaynes's training informs him while mine does not); perhaps, also, Jaynes's perception of them may have been influenced by his own interests. Be that as it may, it seems a valid point by Jaynes that if your objective is to understand the origins of consciousness, behaviorism won't help you.
</p>
<p>
It didn't register on me till partway into the following chapter that one of those general approaches Jaynes had ticked off in his whirlwind tour is partly implied by the narrative self. I'll come back to that.
</p>
<p>
Back near the start of his Introduction, Jaynes notes that a variety of metaphors have been used to describe consciousness, varying with the popular imagination of the day (such as one from the nineteenth century that makes consciousness sound very like a steam engine), starting with early metaphors that describe consciousness as a sort of <i>vision</i> (which, I note, the above definition of <i>introcosm</i> also does). This early mention of metaphor didn't much register on first reading; it seems more significant on second reading, though, foreshadowing where Jaynes will take his explorations later in the book.
</p>
<p>
(I've been calling, btw, the <i>Breakdown of the Bicameral Mind</i> a book, because honestly in plain English that's what you call a thing made up of consecutively numbered pages bound in a cover with a collective title for the whole. But Jaynes calls it an <i>essay</i>. His essay is nineteen chapters divided, after the Introduction, into three parts with six chapters each, and he calls the parts <i>books</i> (numbered I–III), hence presumably his avoidance of the term <i>book</i> for the whole bound volume.)
</p>
<p>
After his Introduction, Jaynes's Book I is meant to follow the path by which he arrived at his beliefs, and its first chapter is about what consciousness is <i>not</i>. He suggests, at the end of the chapter, that this is an essential start to making his case, because if he can't show it's believable that an entire civilization could be made up of people who aren't conscious, the rest of what he has to say will fall flat.
</p>
<p>
The significance of the chapter from my perspective is quite different, due apparently to Jaynes focusing on <i>consciousness</i> while I'm focusing on the <i>narrative self</i>.
</p>
<p>
Each of the things Jaynes says consciousness is not is, to me, a feature of the story we tell about the narrative self, and therefore an element of storytelling about which we could ask, when did this element first enter the storytelling tradition? That is, each of them is potentially something to try to place somewhere on my timeline. Major items on his list: reacting to things doesn't require consciousness; concepts do not require it, nor learning, nor thinking, nor reason; and consciousness has no specific location (he notes that while we place it in the head, Aristotle placed it in the upper chest). Along the way, he also notes that the notion of the mind as a blank slate (evidently related to several of the items on his list) is present in Aristotle but didn't really catch on until John Locke in the seventeenth century.
</p>
<p>
However, while Jaynes reasons carefully about each individual thing on his list to show it doesn't need consciousness, for me it's immediate that none of those activities would require the narrative self because, simply put, the narrative self isn't involved in any of them until after the fact. The narrative self is like a fictional character in an historical novel, witnessing real historical events but unable to affect them because, after all, the character wasn't really there, but instead its participation was invented later by the storyteller. This, somewhat ironically, puts the narrative self roughly into one of Jaynes's rejected general views of consciousness from his Introduction: the fifth in his whirlwind tour, which he calls the "helpless spectator", a witness to events but unable to change them. Afaict it differs from the helpless spectator, because this fictional character is modeled on something else —a sapience engine— that really <i>was</i> a participant in the historical events. The sapience engine doesn't behave <i>quite</i> like a narrative self would, and this mismatch leads to some artifacts in the storyline (like the illusion of hesitation, mentioned above); but mostly the story hangs together pretty well. Even though any engagement of the narrative self in those real events is, strictly speaking, an illusion.
</p>
<p>
Jaynes presents what he's doing in this chapter as disproving <i>misconceptions</i> —his word— about consciousness. The reason these things are features to me, elements potentially to be placed on the timeline rather than misconceptions, is exactly because I'm embracing the essentially fictional and after-the-fact nature of the narrative self.
</p>
<p>
Despite the perspective shear between Jaynes's focus on consciousness and mine on the narrative self, Jaynes and I do have something curiously in common here, in the big picture of what we're doing: our ideas are both quite mind-bending to consider at scale. Jaynes openly acknowledges, in concluding the chapter, that the idea of an entire civilization of people who aren't conscious is extraordinary. However, even though I've described my alternative as "less radical", in a sense the narrative self goes even further than bicamerality in this regard: if the narrative self is a fiction that imperfectly approximates after-the-fact the actual performance of a sapience-engine, and to the extent this narrative fiction is the essence of what we think of as a "person", then while Jaynesian consciousness offers the prospect of whole ancient civilizations with no conscious people, the memetic narrative self proposes that the civilization we're in now has, in a certain sense, no people at all.
</p>
<p>
Having explored in depth his key thesis that entire civilizations of people could function without consciousness, Jaynes develops in Chapter I.2 his theory of what consciousness <i>is</i>. From my perspective, there are two kinds of content in this chapter, differing in their relevance to my agenda: how consciousness is formed, and features of the form it takes.
</p>
<p>
The features of consciousness described here are of less interest to me than the ones in the previous chapter. Those in the previous chapter were, in my terms, features of the narrative self that don't accurately reflect the approximated sapience-engine — which made them of particular interest to me since they were evidently part of the storyline of the self, and I'm interested in exploring the development of that storyline. They were of less interest to Jaynes, as they bore neither on the character of the approximated mind nor, apparently, on the process of forming consciousness, both of which are relevant to Jaynes's agenda. The features in <i>this</i> chapter are of more interest to Jaynes as he means them to accurately reflect the underlying mind, which he wants to understand better; but I already have a model of the sapience-engine. I'm not looking for new insights there, open though I'd hope to be to enhancements —or, more unfortunately, contradictions— to my model. So features of consciousness that Jaynes considers correct, i.e., accurate reflections of the mind, would likely have less bearing on my agenda.
</p>
<p>
Jaynes and I do, broadly, share an interest in how consciousness —or the closely related narrative self— is formed. Jaynes acknowledges that consciousness is an incomplete projection of the actual mind; but he can't invoke memetics to explain how the projection works because he developed his ideas before memetics was proposed by Richard Dawkins's <i><a href="https://en.wikipedia.org/wiki/The_Selfish_Gene">The Selfish Gene</a></i> (1976, the same year Jaynes's <i>Bicameral Mind</i> came out). Jaynes therefore devises another means of projection from the actual mind to the consciousness: <i>metaphor</i>, for which he provides an elaborate theory, with a cross-connection to language that makes of language a major tool for probing the historical development of consciousness. This results in a remarkably dense chapter, a fact that Jaynes explicitly acknowledges at the end of it.
</p>
<p>
Jaynes coins a flurry of terms to describe facets of metaphor. Any metaphor projects characteristics of one thing onto another thing; the thing projected <i>from</i>, he calls the <b>metaphier</b>, the thing projected <i>onto</i>, the <b>metaphrand</b>. The metaphier, he says, is always more known than the metaphrand, as we're projecting the metaphor in order to say something about the metaphrand. Not everything about the metaphier is relevant to the metaphor; so the particular features of the metaphier being projected are the <b>paraphiers</b>, and the features of the metaphrand they project onto are the <b>paraphrands</b>. An example he uses is "the snow blankets the ground": metaphier a blanket on a bed, metaphrand a layer of snow on the ground; paraphiers, he suggests, warmth, protection, sleep to be followed by waking, projected onto paraphrands of the snow keeping the earth snug while it sleeps till spring. He also has a term <b>analog</b> for a specialized type of metaphor, in which the metaphrand is meant to correspond part-by-part to the metaphier, as with a map corresponding part-by-part to the mapped territory. Even on the second reading I found myself wondering, <i>why do you think this is important enough to warrant all these new terms?</i>
</p>
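<p>
(Purely as an aid to keeping these coinages straight, here's a toy sketch of the structure the terms describe. It's in Python; the field names follow Jaynes's terms, but the representation, and the wording of the example entries, are my own invention.)
</p>
<pre>
from dataclasses import dataclass, field

# Toy record of Jaynes's metaphor terminology.

@dataclass
class Metaphor:
    metaphier: str    # the better-known thing, projected from
    metaphrand: str   # the less-known thing, projected onto
    # paraphier -> paraphrand: which projected feature of the metaphier
    # lands on which feature of the metaphrand.
    projection: dict = field(default_factory=dict)

# Jaynes's example, "the snow blankets the ground":
snow = Metaphor(
    metaphier="a blanket on a bed",
    metaphrand="a layer of snow on the ground",
    projection={
        "warmth and protection": "the snow keeping the earth snug",
        "sleep followed by waking": "the earth sleeping till spring",
    },
)

# Jaynes's "analog" is then the special case in which the projection is
# part-by-part exhaustive, as a map corresponds to its territory.
</pre>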
<p>
Jaynes lists six features of consciousness: spatialization, excerption, the analog 'I', the metaphor 'me', narratization, and conciliation. Essentially, these are an array of operations for generating aspects of the introcosm (cf. the epigraph from Locke at the <a href="https://mitpress.mit.edu/sites/default/files/sicp/full-text/book/book-Z-H-9.html">start</a> of the Wizard Book, also seeking to enumerate thought-generating operations). After intense study, I still can't figure out the difference between the analog 'I' and the metaphor 'me'. He spends about a page explaining the term conciliation — as I understand it, representing something in the conscious mind by shoehorning it into a form we're already familiar with — and then scarcely uses the term <i>conciliation</i> again in the volume (though it gets iirc at least one passing mention in Book III, and another in the 1990 Afterword).
</p>
<p>
Jaynes develops his treatment of metaphor in terms of language. Metaphor, he says, is the primary means by which language builds new vocabulary. By the end of the chapter, he takes this a step further and claims that metaphor is a linguistic process, i.e., that metaphor cannot exist without language — which implies, since he maintains consciousness is formed by metaphor, that consciousness cannot exist without language. I don't see evidence supporting this in his treatment, perhaps because (as remarked earlier), in difference from most authors I've read on this subject, I'm inclined to think of the algorithmic processes of the sapient mind as prior to, and thus generating, language rather than vice versa. However, Jaynes and I end up in similar places after all, Jaynes theorizing that consciousness is generated through linguistic metaphor, and I that the narrative self is a product of storytelling.
</p>
<p>
There appears to be an interesting comparison and contrast here between the machineries envisioned, for processes within language versus prior to language, in Jaynes's model of mind, Dennett's, and mine. As remarked earlier, Dennett's model appears, by my lights, to lack any internal structure to the device prior to memes being poured into it; but at the linguistic level, where Jaynes has his extensive inventory of consciousness-generating processes, I have as little to say about detailed processes as Dennett does. Whereas below the level of language, I'm the one of the three who suggests algorithmic structure while Jaynes and Dennett impose no structure at that level.
</p>
<p>
The distinction between high- and low-level processing also plays into my earlier point that the narrative self is fiction, versus Jaynes's view of consciousness as metaphor. Jaynes makes clear, as his discussion branches out in later chapters, that he sees the metaphor projection of consciousness as suppressing bicameral function of the mind, and thereby fundamentally altering the way human beings behave. Under my approach, the evolution of the narrative self would obviously affect people's behavior (as would any aspect of the evolution of storytelling), since it's part of what we think, and what we think affects what we do. So Jaynes and I agree on this much, that these phenomena, consciousness/self, have real consequences for human behavior. There is no contradiction here with my observation that the narrative self is a helpless spectator, witnessing events but unable to change them — because this is the difference between thinking, and being thought about. That is, the narrative self does not actually think, because its supposed role in thought is an after-the-fact fiction, with actual thought being done by the sapience-engine; but the sapience-engine does actually think <i>about</i> the narrative self, constructing it and reasoning about it, and the sapience-engine is a real actor in the world, so its thinking <i>about</i> the narrative self has real consequences. This distinction, though, is possible for me <i>because</i> I assign structure to algorithmic processing below the level of language; while the distinction is apparently not available to Jaynes because he <i>doesn't</i> consider algorithmic processing below the level of language. Without lower-level algorithmic processing, he continues to view consciousness as an active participant in thinking. His position is not obviously inconsistent (though lacking some telltale explanatory power noted earlier, such as for the hesitation effect); but, with Jaynes already acknowledging that thought does not <i>require</i> consciousness, it seems to me much simpler to keep the two cleanly separated by making the conscious mind always <i>thought about</i> rather than <i>thinking</i>.
</p>
<p>
On further reflection, it seems an advantage of my low-level treatment that its particulars factor out the <i>chunking</i> aspect of processing, by which ideas are formed, addressing only certain other kinds of processing — some ways that ideas, once formed, arrange themselves into a coherent mind. So I'm able to say something useful about low-level processing without being dragged into the central mystery of what Terrence Deacon, following C.S. Peirce, called <a href="https://en.wikipedia.org/wiki/The_Symbolic_Species">symbolic thinking</a>. In contrast to which, Jaynes (and, for that matter, Locke) tried to describe idea formation at a high level — which seems to me a doomed attempt, as it gives an unavoidably oversimplified account of something I suspect is at the heart of sapience.
</p>
<span style="font-size: large;" id="sec-jt2-words">Words</span>
<p>
Jaynes is particularly interested in the changes of meaning by which words gradually go from referring to concrete things, to referring to abstracts to do with mind or spirit. He touches on various words, e.g. English <i>be</i> and <i>obey</i>, Greek <i>soma</i> and <i>wanax</i>, but focuses especially on words for mind or spirit illuminated by the <i>Iliad</i>, which he considers the earliest written material we understand well enough to sift closely for the sorts of subtleties he's after. The <i>Iliad</i> is an oral story preserved in written form possibly as late as the eighth century BCE (about 2800–2700 years ago; Jaynes figures 2900–2800), describing alleged events from at least four centuries before that (3200–3100 years ago<!--twelfth century BCE-->). Jaynes focuses principally on seven words: <i><a href="https://en.wikipedia.org/wiki/Thumos">thumos</a></i>, <i><a href="https://en.wiktionary.org/wiki/phren">phrenes</a></i>, <i><a href="https://en.wikipedia.org/wiki/Nous">noos</a></i>, <i><a href="https://en.wikipedia.org/wiki/Psyche_(psychology)">psyche</a></i>; <i>kradie</i>, <i>ker</i>, <i>etor</i>. But here I hit a major procedural snag. How am I to make use of Jaynes's work?
</p>
<p>
Jaynes acknowledges a bias problem with this material; he describes translation of abstract terms in ancient texts as "a <a href="https://en.wikipedia.org/wiki/Rorschach_test">Rorschach test</a> in which modern scholars project their own subjectivity with little awareness of the importance of their distortion." (Btw: when Jaynes says <i>subjectivity</i>, he means thinking in a conscious manner.) That's why he starts with the <i>Iliad</i> rather than something older. But how then can we judge the bias of <i>Jaynes's</i> interpretation, without first learning ancient Greek and extensively studying the text of the <i>Iliad</i> ourselves? The best available answer, trite though it is, would seem to be: <i>carefully</i>. With each of Jaynes's observations about these words, one has to consider the likelihood of opportunities for Jaynes to misread the evidence in that particular case. The same goes, of course, for mainstream thought on these words, which Jaynes so colorfully describes as being led astray by its expectations.
</p>
<p>
I also see an opportunity here to make use of another major body of etymological work — again, of course, keeping in mind the likelihood of misconstructions: <a href="https://en.wikipedia.org/wiki/Proto-Indo-European_language">Proto-Indo-European</a> (PIE). Far and away the most broadly cross-correlated reconstruction of an early proto-language, PIE is for that same reason far more likely to reflect real trends in early language, and probes significantly further back into the originating period of consciousness/self than the <i>Iliad</i>; PIE is believed to have been spoken roughly 6500 to 4500 years ago, which would have it falling out of use more than a thousand years before the events in the <i>Iliad</i> (let alone their written recording, another half millennium or so later).
</p>
<p>
And here is something I already know about PIE (once reminded of it), from years of dabbling in <a href="https://en.wikibooks.org/wiki/Conlang">conlanging</a>: PIE does something funky with verb tense. I've seen authors say it doesn't even <i>have</i> tense, just <a href="https://en.wikipedia.org/wiki/Grammatical_aspect">aspect</a>. Wikipedia portrays PIE tense as <i>present</i> versus <i>past</i>, but the deeper one goes into that, the weirder it gets. I'm reminded of a remark (somewhere in the <a href="http://conlangery.conlang.org/">Conlangery podcast</a>) that all these technical grammatical terms become wobbly when one starts looking at multiple languages. But evidently those reconstructing PIE have had particular difficulty working out what to do with verb treatment of time, which in my framework is already something to watch out for, a likely area of volatility as verbs wouldn't have treated time at all until the advent of orality.
</p>
<p>
Jaynes also mentions related work by <a href="https://en.wikipedia.org/wiki/Bruno_Snell">Bruno Snell</a>, a couple of decades before Jaynes's <i>Bicameral Mind</i>, on the gradual development of awareness of mind from Homer through <a href="https://en.wikipedia.org/wiki/Aristophanes">Aristophanes</a>. Snell's apparent view of the process as a growing awareness rather than a change in mind-function would naturally make his work uninteresting to Jaynes, who mentions Snell in a footnote to explain, in effect, why he won't have any more to say about him; to me, though, Snell seems well worth investigating hereafter, as a different, philological take on the matter and a counterpoint to Jaynes — my own view of the process, as development of story, being somewhat different from (afaik pending further study) either.
</p>
<span style="font-size: large;" id="sec-jt2-gods">Gods</span>
<p>
At the core of bicamerality, as Jaynes envisions it, are the god-voices generated by the right side of the brain. He develops this idea at length, that the human characters in the <i>Iliad</i>, Achilles and Agamemnon and so on, are told what to do by gods; and that the humans have no self-awareness, no consciousness, no <i>introspective world</i>. Now, this is an interesting point, because lack of a narrative self seems to me quite a separate question from lack of an inner world.
</p>
<p>
By contrast, Havelock maintained that abstract forces are a literate notion, while oral traditions require actors, for which the Greek gods, he noted, are exceptionally well-suited. That's a distinctly narrative view of gods. (I'm reminded, at this point, of something I read about pre-Christian Slavic mythology — that when its development was arrested by Christianity, it was partway through a natural evolution from lesser spirits to greater gods. That too seems a potential source of some insight, worthy of further investigation, into the developmental track of storytelling.)
</p>
<p>
Under my model of sapience, which presupposes an underlying algorithmic mind-structure that doesn't vary much across these changes of mindset, the gods, like the modern narrative self, are just another thing <i>thought about</i>. So the existence of an inner world thought about, and perhaps used for planning in a not-so-alien sense, would be quite separate from whether the characters in that fictional world are presented as being self-aware in the modern sense. In this way, self-awareness appears to be a key story element, whose development for the story one would expect to be extraordinarily difficult to distinguish from the development of self-awareness by the storyteller.
</p>
<p>
What would a mind be like if primed by an earlier form of storytelling, without self-awareness built into it? Presumably the answer ought to be: rather like the minds of the human characters in the <i>Iliad</i>. And possibly also rather like Jaynes's bicameral mind. Jaynes, being a psychologist, was understandably much concerned with what it would be like to <i>be</i> bicameral. For my part, with zero training in psychology, the psychology of minds in earlier stages of the development of storytelling initially seemed a far more intractable puzzle than working out a sequence of development of overt storytelling features. Central though the reality, or unreality, of bicamerality is to Jaynes, my own central concern in this regard was finding which aspects of bicamerality are compatible with my model of sapience, and whether any incompatible aspects are especially likely. It only came on me gradually that I need the internal psychology to unravel the compatibility questions. Keeping in mind that imaginability, though perhaps necessary, can't be sufficient: however successfully Jaynes may have imagined bicamerality, and however compellingly he communicated it to the reader, that doesn't require it to have actually been so.
</p>
<p>
Jaynes has a persistent problem with confirmation bias, which imho he doesn't do terribly well at guarding against; I remarked on this on my first reading. The overall force of his argument builds through many specific details that, taken individually, could afford alternative explanations — some of which, in fairness, are much easier to see if one has an alternative explanation on-hand for the general phenomena he's noting (which, of course, I do). For example: In Chapter I.3, he addresses the potential objection to his theory that bicameral civilization should lead to chaos in any but a rigidly structured society, which is not what the <i>Iliad</i> depicts. He notes the recent translation of <a href="https://en.wikipedia.org/wiki/Linear_B">Linear B</a>, which turns out to depict a rigidly hierarchical society, and he concludes that the authors of the <i>Iliad</i> simply ignored this aspect of Mycenaean society. Apparently the available Linear B texts are all administrative records, which to me makes it unsurprising they'd present the society as rigidly structured. But it also struck me that Jaynes had just argued (on the previous page, no less), in reply to another objection, that the authors of the <i>Iliad</i>, in describing pervasively interventionist gods, were just describing the world as they knew it, not exercising some sort of poetic license. So Jaynes wants the authors' descriptions of gods to be simply describing the world as they knew it (which would be my guess also; it's quite consistent with an evolution of storytelling), but then when the poem isn't consistent with the kind of ancient society he wants, he figures they're systematically leaving that out of the epic.
</p>
<span style="font-size: large;" id="sec-jt2-brain">Brains</span>
<p>
Though above I contrasted Jaynes's, Dennett's, and my models of mind at the linguistic versus algorithmic levels, there's also a still-lower level one could consider: the neurological, the realm of brain hemispheres and regions, <a href="https://en.wikipedia.org/wiki/Commissural_fiber">commissural fibers</a>, <a href="https://en.wikipedia.org/wiki/Modularity_of_mind">modules</a>, and whatnot. Dennett has some things to say about the neurological level; Jaynes has a great deal prominently to say about it, as he conjectures a particular neural mechanism responsible for generating god-voices. I'm the one of the three who has had nothing to say at the neurological level, focusing instead on the algorithmic level with an eye to how it can give rise to higher-level structure.
</p>
<p>
Jaynes claims the brain is specialized to support bicamerality, with both hemispheres separately capable of understanding language, but only the left hemisphere in control of externally producing language, while the corresponding right-hemisphere facility is set up to produce an internal god-voice to tell the left hemisphere what to do when a decision is needed. He attaches much significance to the fact that language is controlled by a single hemisphere while most other "important" facilities, he says, are redundant to both hemispheres; he claims some great evolutionary pressure must have been at work to create this asymmetry — which falls rather flat for me, as I see no reason why the motive for this asymmetry should be support for bicamerality rather than, say, that language is too delicate to allow multiple sources of control; or, if the right-hemisphere facility must have some driving functional purpose, why the purpose mightn't be something else, related perhaps to short-term memory (which may also require singular control, implying the asymmetry would be far older than sapience). Jaynes proposes, as the channel carrying god-voices from right to left, the <a href="https://en.wikipedia.org/wiki/Anterior_commissure">anterior commissure</a>, and spends some time describing cases of <a href="https://en.wikipedia.org/wiki/Corpus_callosotomy">commissurotomy</a>, surgical cutting of the connections between the hemispheres (as a treatment for epilepsy); interestingly to me, though it passes very quickly and casually in Jaynes's treatment, "all patients show[ed] short-term memory deficits".
</p>
<p>
To what extent are bicameral god-voices, or similar phenomena, reconcilable with my model of mind? The audience for my non-Cartesian theater is a sea of agents that may sometimes coalesce into rather large structures, each of which one might wish to treat either as a coherent group of agents or as a single especially-large agent. If the theater effectively resides in the left hemisphere, small agents in the right hemisphere may find it more difficult than their left-hemisphere colleagues to achieve individual promotion to the stage, so they may have better luck if they form coalitions with other right-hemisphere residents; and there's also a second language facility on the right that could allow such a coalition to take on a relatively self-like form. One consequence could be an actor, from the right hemisphere, appearing on the stage with more-or-less the aspect of a bicameral god. More generally, one might expect the hemispheres to specialize in qualitatively different sorts of agents, the left perhaps for smaller and the right for larger (whatever that would actually come out as in practice). Perhaps a major advantage of the asymmetric theater is simply that it offers nonuniform granularity of thoughts.
</p>
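<p>
(A toy sketch of that scenario, in Python, with made-up parameters; this is not a claim about actual neural machinery, just the qualitative asymmetry. Individually handicapped right-hemisphere agents reach the stage, if at all, by pooling their salience into a single coalition, a god-sized actor.)
</p>
<pre>
import random

# Toy model: agents compete for the stage of the non-Cartesian theater.
# All numbers are invented. The stage "resides" in the left hemisphere,
# so right-hemisphere agents pay a handicap individually, but a coalition
# of them can pool its salience and mount the stage as a single actor.

THRESHOLD = 1.0        # salience needed to reach the stage (invented)
RIGHT_HANDICAP = 0.5   # multiplier on right-hemisphere salience (invented)

class Agent:
    def __init__(self, hemisphere, salience):
        self.hemisphere = hemisphere   # "left" or "right"
        self.salience = salience

def effective(salience, hemisphere):
    return salience * (RIGHT_HANDICAP if hemisphere == "right" else 1.0)

def promoted(agents):
    # Individuals that make it onto the stage on their own...
    acts = [a for a in agents
            if effective(a.salience, a.hemisphere) >= THRESHOLD]
    # ...plus, possibly, one pooled right-hemisphere coalition.
    right = [a for a in agents if a.hemisphere == "right"]
    pooled = effective(sum(a.salience for a in right), "right")
    if len(right) > 1 and pooled >= THRESHOLD:
        acts.append(Agent("right-coalition", pooled))  # god-sized actor
    return acts

random.seed(0)
agents = [Agent(random.choice(["left", "right"]), random.uniform(0.2, 1.2))
          for _ in range(20)]
for act in promoted(agents):
    print(act.hemisphere, round(act.salience, 2))
</pre>
<p>
(The particular numbers don't matter; under any such handicap, what reaches the stage from the right tends to be fewer, larger, coalition-like actors, which is one possible reading of the nonuniform granularity suggested above.)
</p>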
<span style="font-size: large;" id="sec-jt2-evol">Evolution</span>
<p>
Jaynes says bicamerality is the last step in the evolution of language. Of course, my theory posits a different relation between language and mind; and the Iliadic mind, whatever it was like, is not the last step for me; but this language-development view does give us some common ground, enhancing the likelihood that Jaynes's reasoning may have bearing on my timeline. Jaynes advocates group selection; he has in mind a bicameral civilization as a hive-like phenomenon with great numbers of people coordinated by the voices of gods, and he envisions this evolving at the group level rather than the individual level — which, I admit, strikes me as rather ironic since Jaynes's essay came out in the same year as Dawkins's <i>The Selfish Gene</i>, which has, as a major theme, debunking group selection.
</p>
<p>
I've mentioned Jaynes's theory that language didn't even start until the Upper Paleolithic, nominally forty thousand years ago. His basic justification for this claim is that if language had started millions of years earlier, he would expect lots more technological and cultural progress over all that earlier time; for which I have an alternative explanation, with my <i>verbality</i> hypothesis, fitting neatly with the long incubation time of genetic life before the Cambrian explosion. Going in to my second reading of Jaynes, my guess was that the steps he envisions in the development of language should be a mix of some steps that for me belong in the verbal phase of development, and some that belong to the development of storytelling in the oral phase.
</p>
<p>
Jaynes's major milestones in the development of language:
<ul>
<li><b>Intentional calls</b> (as opposed to silent signals), then separation of <b>modifiers</b> (e.g. intensifiers) from what they modify. Both prior to the Upper Paleolithic.</li>
<li><b>Commands</b>. He figures this triggered the onset of the Upper Paleolithic, which I've attributed to the phase shift from verbality to orality. It seems an awfully dramatic consequence to attach to something as simple as expressing a command, until one realizes that Jaynes is (and I am too, for that matter) considering how the conceptual framework we construct affects what we can think. This harks back to the old debate over the <a href="https://en.wikibooks.org/wiki/Conlang/FAQ#Sapir-Whorf">Sapir–Whorf hypothesis</a>; Jaynes evidently puts the framework at the language level and I at the algorithmic, but we both figure there's some room for the bounds to be stretched and thus to evolve over time and generations.</li>
<li><b>Animal nouns</b>. Causes appearance of animals in cave paintings, between twenty-five and fifteen thousand years ago. It's not immediately obvious to me what sort of development in storytelling might correspond to this observable effect.</li>
<li><b>Thing nouns</b>. Causes appearance of various new things — technical innovations, like barbed fishhooks. Here I'm insufficiently clear even on what event he has in mind.</li>
</ul>
Jaynes suggests that with language would come auditory hallucinations, providing continuity of attention in the absence of consciousness. Which is what the theater does in my model; and since I consider the theater necessary for the self-loom, and the loom necessary for dreaming, and since the cat now lying draped across my foot clearly dreams, I don't buy that language is needed for continuity of attention. I might buy that a larger theater (i.e., short-term memory) would make it easier to think through complex tasks, and remain focused on detailed procedures for a long time.
<ul>
<li><b>Names</b>. Jaynes puts these quite late on his timeline, around twelve to ten thousand years ago. This is curious since Everett describes the Pirahã as having names. Jaynes sees this as a major step in our conceptualization of other people, and therefore a major step in the development of bicameral god-voices. I'm skeptical on this point, as animals that aren't remotely sapient clearly recognize other individuals. Jaynes says, plausibly, that names increase the intensity of our thinking about people, just as animal nouns increase the intensity of our thinking about animals. I'm interested in why he thinks it's such a late development.</li>
</ul>
</p>
<p>
Shifting from the linguistic to the archaeological, Jaynes notes the emergence from around seven thousand years ago of great agricultural civilizations. He figures some tremendously powerful device —bicamerality— is needed to enable this. At this point, though, I find his case rather weakened by comparison with current events. Large numbers of people motivated to follow the directives of a fictional being does not require some radically different type of mind. Not to put too fine a point on it, if people want badly enough to find a strong leader with their interests at heart, they can be appallingly vulnerable to someone who tells them what they want to hear. They'll convince themselves that this <a href="https://en.wikipedia.org/wiki/Demagogue#English">demagogue</a> has their best interests at heart, despite ample evidence of intense selfishness, and will believe all manner of reality-defying claims because the demagogue said them. Not only does this demonstrate a tremendous capacity of modern conscious populations to be ring-led, but it's fair to say the demagogue's devotees are following the leader they imagine the demagogue to be, rather than the actual person. That in itself makes me doubt that any radical hypothesis such as bicamerality is needed to explain ancient civilizations organized around god-kings.
</p>
<p>
Jaynes offers three basic features of ancient civilizations that he maintains only make sense given bicamerality.
</p>
<p>
His first feature, treatment of gods as the rulers of civilizations, I found tbh thoroughly unconvincing as evidence for his theory, and (at first blush) not overtly insightful for the evolution of storytelling, either. Major explicit archaeological evidence for this feature is "houses of the gods" placed prominently at the center of ancient cities, in the positions, as Jaynes notes, where one might expect to find the dwelling of the prince or ruler. That expectation struck me as based on an unjustified bias in favor of an atheistic social structure. The most spectacular edifice in a medieval European city was apt to be its cathedral. Of the evidence Jaynes describes in that part of his discussion, the only point that seemed to me to invite explanation (I remarked on this also on first reading) was the way the Incan empire was conquered so ridiculously easily by Spanish conquistadors in 1532. That peculiar episode doesn't seem to need an extreme explanation such as bicamerality —religion has plenty of control over conscious people today— though it might, perhaps, demonstrate that different stages in the development of storytelling can alter how receptive individuals are to certain kinds of memes. On reflection, that's not a surprising phenomenon.
</p>
<p>
The second feature he presents as evidence is the custom of treating the deceased as if they were still alive. There are certainly plenty of examples of that; but again, I see no demand for bicamerality here. The motive for such practice <i>could</i> be as simple as a state of denial, or, more in line with my scenario, a narrative presentation of people as forces not limited in time; a natural result, perhaps, of attempting to position people, as speech participants, in a conceptual framework that has had time added to it (fitting neatly with the theory that time is the essential ingredient for the transition from verbality to orality). Jaynes claims that a wide variety of early civilizations used the same word for dead people as for gods — depending on the tricky question of how to interpret somewhat abstract terms in a language further removed and less triangulated than PIE. Jaynes's interpretation of such words is apparently as much of a Rorschach test for his biases as more mainstream researchers' interpretations are for theirs.
</p>
<p>
The third feature is the extensive religious use of human idols. This does seem to want psychological explanation. Here too, Jaynes reasons that only bicamerality can explain the observed phenomenon, while I see no call for any such extreme explanation. Nonetheless, the use of human idols does seem likely to fit into the evolution of storytelling; and offers the enticing prospect of concrete archaeological evidence going much further back in time than linguistic evidence can — though the details most of interest to me aren't of interest to Jaynes, details of sequence of changing practice over time.
</p>
<p>
I'm quite open to the possibility that the human mind, when conditioned with an earlier stage of orality (i.e., an earlier form of storytelling), might engage in more hallucination as part of its normal function; conceivably even hallucination as pervasive as Jaynes suggests (though I do wonder if the effects involved could take some intermediate form less explicit than what we would call hallucination... or if, conversely, what we call hallucination is differently perceived because of our different conceptual framework). The two main aspects of Jaynes's theory I'm actively disinclined to accept are his insistence on a radically altered mind architecture for his <i>bicameral man</i>, and his casual dismissal of the possibility that religion is a highly impactful <i>idea system</i> in its own right rather than a symptom of this radically altered mind-architecture. This sometimes makes it quite awkward to sort out what to think about some of Jaynes's supposed evidence, as I may find myself neither agreeing nor disagreeing, in that while Jaynes's interpretation of the evidence may seem quite unjustified, it may at the same time seem credibly like evidence of some different-but-related possibility that Jaynes isn't acknowledging.
</p>
<p>
As a major test of his theory, he then tries it out on writings from civilizations he figures were bicameral; his main choice is between hieroglyphic/hieratic and cuneiform, and for a starting point he rejects hieroglyphic/hieratic as far less accessible. As elsewhere in his linguistic explorations, though, while I agree with his concern that mainstream modern scholars fill in the unknowns in translation with their own conceptual biases, I find Jaynes is just as guilty of filling in the unknowns with his bicameral theories. It seems one would like to go into an in-depth translation effort already aware of a wide range of possibilities, to reduce the risk of lapsing unaware into one or another of them. How to do that without pouring a large increment of one's life into the effort, is not immediately clear to me (but there's a lot of that sort of problem floating around in this).
</p>
<p>
The various cultural organizations he describes —such as land owned by gods, with kings as their "tenant farmers"— seem like plausible ways to develop a story of the world if one is building it around deities as the actors needed for oral sagas, making them stages in the evolution of storytelling. The particular manners of writing he quotes, in which gods are described as speaking/uttering/commanding, may seem to him compelling evidence of pervasive auditory hallucinations, but seem to me entirely plausible forms of expression for a culture in an intermediate stage of inventing manners of storytelling description.
</p>
<p>
Jaynes offers particularly a bicameral alternative explanation of the ancient Egyptian notion of the <a href="https://en.wikipedia.org/wiki/Ka_(Egyptian_soul)">ka</a>, which he presents as badly in need of reconsideration. Here again, he is at his most credible when criticizing mainstream translators for injecting modern concepts into ancient writings, which they seem likely to do; I'm inclined to take him seriously when he doubts the conventional interpretation of the <i>ka</i>. (Wikipedia's account of the <i>ka</i> presents a strikingly confident picture, which I don't trust at all because Wikipedia nurtures a consistent bias toward mainstream thought.) I'm doubtful, though, of Jaynes's contention that these ancient languages were very concrete; my sense from Havelock was that while <i>literate</i> abstraction does not occur in oral society, there is a different sort of <i>oral</i> abstraction, so that it seems Jaynes could be confusing oral-abstract with concrete and thus missing an essential element for grasping concepts such as <i>ka</i>.
</p>
<p>
For example, Jaynes mentions a Sumerian proverb, "Act promptly, make your god happy"; or at least, that's a common translation. As an alternative (and with a nod to the extreme uncertainty of abstract translations so far back), Jaynes suggests, "Don't think; let there be no time between hearing your bicameral voice and doing what it tells you." I observe, though, that if one supposes such voices, be they hallucinated or conceptualized, are simply a natural impulse of the mind, not fundamentally different from our impulses except for how they're thought about, as shaped by more primitive storytelling technology, one might try —for example— "Don't hesitate to follow the dictates of your conscience."
</p>
<span style="font-size: large;" id="sec-jt2-break">Breakdown</span>
<p>
In the later chapters of Book II, Jaynes focuses on the process by which consciousness replaced bicameralism, especially in the second millennium BCE (4000–3000 years ago), describing its causes (Chapter II.3) and evidence of the process (Chapters II.4–II.6). This material contains more of direct interest to me, since my hope going in was to reinterpret the changes in terms of storytelling technology; though Jaynes's natural efforts to view everything in relation to bicameralism can seem more of a distraction for my purposes. For example, he discusses trade early in Chapter II.3, noting (or, claiming) that early trade between bicameral kingdoms was not an interaction between individual people, and as I approached the end of the chapter that was one of the points that had stuck in my mind; so that when, in concluding the chapter, he summarized the causes he'd named of the breakdown of bicameralism, I was surprised to see he didn't mention trade, instead (on careful inspection) naming the larger causative point under which he'd mentioned trade, "the inherent fragility of hallucinatory control". Which is the difference between my interest in changing behavior, versus his interest in properties of the internal state of bicameralism.
</p>
<p>
Another change he describes in that chapter that stuck in my mind —and also not mentioned in his summary— is the emergence of warfare: apparently, in the preceding millennium villages didn't have defensive walls. He also notes the immense viciousness of Assyrian laws and warfare when Assyria resurged late in the second millennium, which he ascribes to breakdown of social order because the previous social order had been maintained by bicameralism, which was now failing.
</p>
<p>
Tbh, the first of Jaynes's arguments I actually found impressive was at the very start of Chapter II.4 (basically, halfway through the essay). It seems, proportionate to its impact, especially at hazard from Jaynes's persistent habit of presenting a <a href="https://en.wikipedia.org/wiki/False_dilemma">false dilemma</a> between mainstream thought and his bicameral theory. By this point he's already described the monumental depiction of <a href="https://en.wikipedia.org/wiki/Hammurabi">Hammurabi</a>, Babylonian king of law-giving fame in the early second millennium BCE (3750 years ago), standing before his god seated on a throne, listening diligently to his god's instructions being delivered in a business-like way — according to Jaynes, a typical, matter-of-fact portrayal of a bicameral king being instructed by his god through normal hallucination in a smoothly functioning bicameral theocracy. Jaynes starts Chapter II.4 with the monumental depiction of <a href="https://en.wikipedia.org/wiki/Tukulti-Ninurta_I">Tukulti-Ninurta I</a>, Assyrian king half a millennium later and ostensibly the first to style himself "King of Kings"; starkly contrasting with Hammurabi, for the first time in history according to Jaynes, in two respects: he <i>kneels</i> before the throne — and the throne is empty; this too, in Jaynes's reading, a straightforward portrayal, that the gods, who guided orderly bicameral societies through hallucination, have ceased to appear.
</p>
<p>
Jaynes presents this contrast forcefully, making it quite a stunning revelation: the gods disappeared, and these monuments directly tell us so if we're able to understand the message. Jaynes was apparently deeply impressed, put that into his presentation of it, and it comes through. But, not to get carried away with the false dilemma here, how would this fit into a memetic alternative to bicameralism? Religious hallucinations happen even today. The Pirahã apparently have group hallucinations. Religions, and similar ideologies, exert tremendous force on modern, literate populations. The evidence presented, in the contrast between Hammurabi and Tukulti-Ninurta I, suggests <i>something</i> momentous shifted, something to do with social order, how one conceptualizes the world, and perhaps even hallucinations; but none of that seems to require a drastic rearrangement of internal architecture as implied by the bicameral hypothesis. The challenge seems to me to be in understanding how people in these ancient societies conceptualized the world, which, under the memetic hypothesis, should follow a continuous path of development from the verbal-oral transition toward the oral-literate transition. I'm thinking I should reread Havelock, which just possibly I might get more out of now; and Snell.
</p>
<p>
Jaynes notes a number of phenomena that arose after, in his view, bicamerality broke down. First on his list is <i>prayer</i>, begging a god to speak; he reports that recovering schizophrenics occasionally do this as their hallucinated voices retreat from them. Then he mentions <i>angels</i>, part-bird beings that, he says, start to appear as part of a distancing from gods: earlier, individuals have gods that can speak to the greater gods, then the same scenes are shown but the greater god is absent as with Tukulti-Ninurta I, and then, angels. Tbh I don't see how this progression follows naturally from the breakdown of bicameralism, but if one supposes these beings are conceptualizations of aspects of the world it seems this <i>may</i> afford a more natural interpretation of the sequence as a memetic evolution.
</p>
<p>
Then he describes <i>demons</i>, malevolent entities. Evil, he says, didn't exist earlier. As usual, he presents this development as evidence of bicameralism, which I found quite unconvincing: if these beings are part of the way one conceptualizes reality, and really terrible things are happening in the world (which seems a pretty good summary of that millennium), it doesn't seem remotely surprising that malevolent beings would be introduced into the conceptual mix.
</p>
<p>
He notes the retreat of the gods, most of whom used to be on Earth, to heaven.
</p>
<p>
He makes detailed note of the emergence of divination, distinguishing four kinds as gradual steps on the way to developing an analog space in which to consider alternative behaviors of the self — <i>exopsychic</i>, he calls them. In order of progression toward consciousness in Jaynes's view: omens, sometimes-bizarre supposed cause-effect connections, so divining from miscellaneous events; sortilege, the casting of lots, so divining from random events; augury, reading from natural processes (such as oil or wax poured in water, or the arrangement of entrails of a sacrificed animal); and at last spontaneous divination, reading from whatever next catches one's eye. He notes in the second and third steps the use of the right hemisphere, which is good at spatial relations, and use of metaphor. The first three he says were occasionally known in Mesopotamia in the mid-second millennium BCE, but became major trends later; the first two in the early first millennium BCE, the third in the late first millennium BCE. Spontaneous divination he doesn't find in Mesopotamia but figures must have been there, noting its description in the Old Testament; and he notes it was popular in Europe into the Middle Ages.
</p>
<p>
With the force of Jaynes's presentation, it took some time for me to register, as a subliminal sense of discomfort worked its way into the open, that all the while he uses Hammurabi as an exemplar of a bicameral-theocratic king, the law code Hammurabi is famous for concerns penalties for crimes by individual people, which doesn't altogether fit with Jaynes's portrayal of a bicameral society.
</p>
<p>
Closing out his discussion of Mesopotamia, he notes scattered signs of emerging consciousness: A change in tone of personal messages from Hammurabi —whose letters are apparently quite factual, or so Jaynes construes— to messages a thousand years later from an Assyrian king. (Which, I note, is very much a change in mode of storytelling.) Initiation of detailed annals of events, which Jaynes construes as <i>spatialization of time</i>. Versions of the epic of Gilgamesh from, seemingly, different eras, wherein the earlier lacks the interior perspective of the later.
</p>
<p>
A rare glimpse of Jaynes's own internal state occurs in his discussion of spontaneous divination, as he describes applying the technique as he writes that section. On his first try, he "reads" that he is getting too speculative, and on his second try, that he has to tie together a bunch of miscellaneous threads.
</p>
<p>
Jaynes's historical analysis of Greek literature in Book II, looking for evidence of the transition from bicamerality to consciousness (he likes for this the term <i><a href="https://en.wiktionary.org/wiki/transilience">transilience</a></i>), is at once particularly relevant to my own search for patterns of change over time, and a particularly clear case of the perspective flaw in Jaynes's approach. He considers the changing treatment of most of the terms from his earlier discussion, which is just the sort of thing I want; calling these terms <i>preconscious hypostases</i>, by which he means terms used to stand for the elements of internal state that will eventually be assembled into consciousness. (Btw: pronounce <i>hypostasis</i> with accent on the second syllable, similarly to <i>hypothesis</i>; shifting to the third syllable in the adjectival form, <i>hypostatic</i>, as <i>hypothetical</i>.) He hypothesizes that these hypostases pass through four phases of development: an <i>objective</i> phase of literal meaning about the external world; an <i>internal</i> phase of literal meaning internal to the person; a <i>subjective</i> phase where they become abstract spaces where feelings/thoughts can be "put"; and a <i>synthetic</i> phase where all the hypostases are assembled into a singular conscious self. And this sequence of phases is where his method loses its way. The intermittent evidence can't testify in detail to the whole sequence, so Jaynes's reconstruction of the changing meanings of the words is inspired by his hypothesis; which demonstrates that the historical evidence is <i>consistent</i> with the hypothesis, that it can be interpreted so as to line up with the hypothesis, rather than actively <i>supporting</i> the hypothesis. Making his reconstruction more difficult for me to apply to a variant hypothesis, since his reconstruction is partly founded on his hypothesis.
</p>
<p>
Jaynes's first, "objective" phase of preconscious hypostases seems imho especially dubious, because it is grounded in his much-invoked supposition that early writings are very concrete. If the nature of abstractions was shifting at that time —and especially if it was <i>continuously</i> shifting, an elaboration from Havelock's thesis— one might plausibly expect the vocabulary of these intermediate-oral abstractions to be nearly indistinguishable, at our great conceptual remove, from a vocabulary we would describe as "objective".
</p>
<p>
Jaynes dates the crucial transilience to consciousness in Greece to roughly 600 BCE (2600 years ago), noting a tremendous blossoming of Greek literature thereafter. This would be about two centuries before Plato — recalling that (to my understanding) Havelock reckoned the great shift of Greek culture from orality to literacy had <i>just recently happened</i> when Plato was writing. It seems that Jaynes and Havelock both placed a tremendous shift in Greek thought in this era, the difference between them being the character they assigned to the shift.
</p>
<p>
Jaynes remarks of several texts, as he discusses them, that they could be understood as describing the transition from bicamerality to consciousness. There is some ambiguity in this suggestion, between a text from which one can learn of the transition, and a deliberate recounting of the transition. The latter —deliberate, though one ought not in Jaynes's framework to say <i>conscious</i>, recounting— raises a more general point about Jaynes's theory in relation to mine. Supposing that some of these texts were deliberate recountings of the matter, why would one undertake to tell that story? A closely allied question is, <i>why religion?</i> Jaynes says the books of the Pentateuch were assembled out of "nostalgic anguish for the lost bicamerality of a subjectively conscious people." (Chapter II.6, p. 297) This seems to me, on reflection, a rather piecemeal approach to motivations. I tend to posit a basic human impulse to describe, explain — to <i>storytell</i>; which is admittedly not altogether adequate since the Pirahã apparently lack such an impulse, but does stand in essence for storytelling as a coherent phenomenon rather than a hodgepodge of separate effects. And this coherence leads shortly to my difference from Jaynes: once one starts to think of the whole sequence of development in terms of a coherent phenomenon of storytelling, it seems clear that the technology of storytelling would have to be invented, developing gradually over time; and the introduction of this powerful new factor into one's understanding of the situation softens, without eliminating, the ideas Jaynes is applying exclusively. That is, Jaynes appears to be assembling his model of the evolution of human thought from just two pillars —bicamerality and consciousness— whereas it seems to me a smoother model of the evolution ought to be afforded by building into it an explicit role for the technological development of storytelling.
</p>
<p>
Despite my objections to Jaynes's method in trying to defend his thesis, and my contention that his core insight signifies something somewhat different from, and less radical than, what he extrapolates from it, I'm fascinated by the insight itself. On a few occasions scattered through the essay —I have in mind atm three in particular— the core insight shines through, dazzlingly. I remarked above on the depiction of Tukulti-Ninurta I kneeling before an empty throne. Another dazzling moment occurs in Jaynes's discussion of the Old Testament as a record (deliberate or no) of the stormy bicameral-conscious transition; though, curiously, the impact of it failed to reach me on my first reading, as, to some extent, did all three moments of dazzlement, perhaps because I had to get through the whole of the essay once, to get the measure of Jaynes's overall vision settled in my mind, before I could see these particular moments from Jaynes's perspective.
</p>
<p>
What I suspect to be the deepest core of Jaynes's insight is revealed in Chapter I.4, <i>The Bicameral Mind</i>: an epiphanic recognition of commonality between experiences of modern schizophrenia and ancient writings. Jaynes, after several paragraphs' detailed description of a schizophrenic episode in which a man visiting the <a href="https://en.wikipedia.org/wiki/Coney_Island">Coney Island</a> beach was commanded by an auditory hallucination to drown himself, writes,
<blockquote>
The patient walking the pounded sands of Coney Island heard his pounding voices as clearly as Achilles heard Thetis along the misted shores of the Aegean. And even as Agamemnon "had to obey" the "cold command" of Zeus, or Paul the command of Jesus before Damascus, so Mr. Jayson waded into the Atlantic Ocean to drown. Against the will of his voices he was saved by lifeguards and brought to Bellevue Hospital, where he recovered to write of this bicameral experience.
<blockquote>
— Julian Jaynes, <i>The Origin of Consciousness in the Breakdown of the Bicameral Mind</i>, Book I, Chapter 4.
</blockquote>
</blockquote>
This passage not only failed to grab me on my first reading, but its effect on me was delayed even on second reading, finally drawing me back to it after some half dozen additional chapters.
</p>
<span style="font-size: large;" id="sec-jt2-vest">Vestiges</span>
<p>
Jaynes devotes Book III to "vestiges" of bicamerality, effects left over from bicamerality that linger even to this day. He considers religion to be, itself, such a vestige; which I honestly find not just unsupported, but implausible. Jaynes's basic method in promoting his thesis is to show that it can offer a coherent interpretation of the evidence, and at times he does make it all feel rather persuasively coherent — but, set down Jaynes for a while and go immerse yourself in the traditional interpretation of the history of religion, and it's quite coherent, too. Religion feels like a thing in itself, not a vestige of something else. This, I observe, is a common property of Book III: the effects he describes are, to varying degrees, not suggestive of bicamerality unless one starts with the bicameral hypothesis and looks for things that would then be related to it. In fairness, Jaynes presents his primary evidence for bicamerality in earlier chapters, and states up front in Book III that he hopes through these later vestiges to send illumination backward to "some of the darker problems of Books I and II"; so he's not really claiming these effects as further directly persuasive evidence.
</p>
<p>
Jaynes considers oracles a vestige, an effect caused by the loss of ubiquitous bicamerality, and outlines a six-stage process as bicamerality retreats. His stages: (1) <i>locality oracles</i>, awe-inspiring places that, in the early post-bicameral age, would still allow individuals to get in touch with their bicameral voices; (2) <i>prophets</i>, individual people who were still in touch with their bicameral voices after members of the general populace weren't; (3) <i>trained prophets</i>, taught with increasing difficulty to reach a bicameral state; (4) <i>possessed oracles</i>, taught to reach a frenzied state from which their voices would speak to others, but not to them; (5) <i>interpreted possessed oracles</i>, where additional specialists would be needed to figure out what was said in the frenzy; (6) <i>erratic oracles</i>, less and less teachable, less consistently accessible, less interpretable even by specialists. This sequence is of some potential interest to me since it's a change of behavior over time, for which Jaynes offers some evidence of chronological progression; but the presentation of the stages <i>as</i> stages in a progression, and the interpretation of them relative to bicamerality, seems rooted in Jaynes's chosen thesis.
</p>
<p>
Jaynes particularly notes that <i>possession</i>, which occurs starting from stage four of his oracular progression, is distinctly different from bicamerality in that the possessed individual doesn't remember it afterward. Jaynes doesn't feel fully able to fit this into his theory of the internal workings of the mind, though he feels it ought to fit somehow. Moving outward to increasingly un-bicamerality-like effects, he notes <i>negatory possession</i>, where the subject doesn't want to be possessed (briefly touching on Tourette's Syndrome), and glossolalia. These effects seem worth some careful attention in my own efforts to integrate the hallucinatory aspect of Jaynes's ideas into my view of the mind.
</p>
<p>
Then Jaynes discusses music and poetry — an important topic also for Havelock, as his theory was reinterpreting Plato's remarks on the subject from <i><a href="https://en.wikipedia.org/wiki/Republic_(Plato)">The Republic</a></i>. Jaynes's timeline calls for the bicameral gods to have <i>poetized</i> their instructions; then, as bicamerality begins to break down, for poets to have musical accompaniment, using the music to stimulate areas in the right hemisphere, adjacent to where god voices are generated; then, later, for poets to be unaccompanied by music as poetry becomes a left-hemisphere concern. He notes Plato describing poets being <i>possessed</i> by the muse. It strikes me once again, reading Jaynes's treatment, that Jaynes wants a rather extreme rewiring of the brain to be implied by, broadly, the conceptual difference between perceiving the muse as an aspect of the poet versus separate from the poet. This relates to my earlier remarks about the narrative self being <i>thought about</i> but inherently not a participant in <i>thinking</i>; which I see as a key error in Jaynes's theory, that he supposes the self participates in thinking when it cannot do so. However, he does make a pretty good case (even if it's not quite what he had in mind) that what we think can somehow affect gross brain architecture, cross-hemispherically. I often prefer (as one might notice from a sufficiently large cross-section of this blog) to find some way to bypass controversial problems, as a sort of Gordian-knot-cutting; with this thought/architecture entanglement, though, integrating Jaynes's observations into my storytelling timeline seems to require some sort of provision for the entanglement, and it's not immediately obvious how to do this while avoiding commitment on psychological/neurological questions <i>way</i> outside my areas of expertise.
</p>
<p>
Jaynes devotes Chapter III.4 to hypnosis, describing it in considerable depth and promoting it as another vestige of bicamerality, with the interesting twist that another person, the operator, takes the place of the bicameral voice. When Jaynes asks whether hypnotized subjects have elevated right-hemisphere activity, predicts they should if the bicamerality hypothesis is correct, and describes evidence that they do, he loses me at the second step, because I don't see why his bicamerality hypothesis should imply elevated right-hemisphere activity in this situation. If the role of the right hemisphere in bicamerality is supposed to be production of the bicameral hallucination, and there <i>isn't</i> any hallucination involved in hypnosis because the operator provides the authorization instead, then wouldn't the hypothesis predict an <i>un</i>elevated (perhaps even depressed) level of right-hemisphere activity? More broadly, Jaynes claims as he concludes the chapter that the alternative to a bicameral explanation is to suggest that the various aspects of hypnosis are all exaggerations of ordinary phenomena, and this he dismisses as not explaining, but explaining <i>away</i>, hypnosis. I take his point to be that the purpose of viewing each aspect of hypnosis this way is that the viewer doesn't want to believe in hypnosis. Some observations about this: Undoubtedly some people who embrace such reasoning would do so for that purpose, but presumably not all, and in any case, if the reasoning is valid it shouldn't matter why it was suggested. Jaynes's reason for suggesting various lines of reasoning, after all, is to support his preferred hypothesis, which doesn't necessarily make his suggestions wrong (though it does, admittedly, make it especially needful to view his suggestions with careful criticism). And viewing the various aspects of hypnosis as exaggerated fragments of ordinary brain function, rather than as scattered fragments of some fundamentally alternative mode of brain function, does not apparently prevent hypnosis from being a coherent phenomenon. I'm conjecturing that whatever state of thinking occupied the part of prehistory where Jaynes places the bicameral mind, it would be some coherent mode within the range of basic functions, and so would hypnosis be. If it's imaginable that fragments of bicameral function would reform, with some pieces missing, into a different cohesive phenomenon of hypnosis, then it seems (to me, anyway) at least as plausible that fragments of ordinary modern brain function could also form into a coherent phenomenon of hypnosis; and if they could do that, why not form into some other configuration(s) with further similarities to bicamerality?
</p>
<p>
The mixture of features he describes in hypnosis, differing in interesting ways from the mixtures in other phenomena he's discussed, ought to be useful to me in exploring what refinements of my model of mind would best accommodate some of his insights.
</p>
<p>
At almost-last, in Chapter III.5 Jaynes discusses schizophrenia. Which, he says, was not a thing in the bicameral age, was perceived during the breakdown of bicamerality as being god-touched, and only later came to be treated as an illness. <i>Defining</i> schizophrenia medically is, he notes, a can of worms, but he reckons the florid unmedicated condition is uniquely similar to bicamerality; unsurprisingly, since he apparently modeled his notion of bicamerality on what he knew of schizophrenia. As he describes patients reporting that their hallucinated voices would interfere with their conscious thinking by getting to the thoughts faster, this seems to me like support for both modes, the hallucination and the conscious thought, being simply alternative presentations of the underlying thought rather than, as I understand Jaynes to claim, profoundly different means of generating the thought in the first place.
</p>
<p>
Religion he views as a pale echo of bicamerality in the state of individual minds, whereas I see it as a tremendously powerful force in the memetic environment (the noösphere). (Jaynes's father, btw, was a Unitarian minister, which likely implies he would have picked up an extensive knowledge of religion and developed immunities to religious dogma.)
</p>
<p>
He also notes that schizophrenics are able to keep doing the same thing for a <i>very</i> long time without getting bored, and are apt to focus on details to the exclusion of the big picture; which are also facets of the way he envisions his <i>bicameral man</i>. Though, from my own perspective, it sounds slightly reminiscent of the autistic spectrum, which seems internally just about diametrically <i>opposite</i> to bicamerality.
</p>
<p>
His final chapter, III.6, is about <i>science</i>, which he treats as an offshoot of religion.
</p>
<p>
On the one hand, I already didn't agree with his treatment of religion as a <i>remnant</i> of something earlier rather than an evolutionary development in its own right, and I similarly view science as a further evolutionary development in its own right. It seems to me the shifts along the path from religion to science make more sense <i>without</i> his bicameral hypothesis; which is related to his hyperfocus —with respect to religion, science, hypnosis, etc.— on <i>authorization</i>, an aspect for him of our bicamerality-supporting neural structures. Which I just don't see as that low-level; not that people don't look for social approval and all that, but it's never seemed to me to be more than one impulse among many.
</p>
<p>
On the other hand, I've felt all along (through both readings) there's something awkwardly off-kilter about Jaynes's approach to scientific methodology. Some of that may be unavoidable since he's exploring areas whose inherent depth of complexity earns them the somewhat-pejorative descriptor "<a href="https://www.xkcd.com/435/">soft science</a>"; but the evidential flaws I've been noting in Jaynes's treatment are, with hindsight, consistent with symptoms of treating scientific hypotheses as authoritative pronouncements rather than conjectures within a bundle of alternatives.
</p>
<span style="font-size: large;" id="sec-jt2-whole">Jaynes as a whole</span>
<p>
Jaynes's 1990 Afterword looks at his theory as a whole; conveniently for me since, coming to the end of his essay, I need, for my objective here, to sum up my understanding of his ideas into something applicable to elaborating my own theory. He notes that he has, properly, not just one hypothesis, but several; which is true for me also, of course, as I've got my ideas about (at a quick inventory) sapience and mind; language; evolution; stages of culture; and the memetic structures of religion and science. Jaynes's inventory of hypotheses puts first "consciousness is based on language", which he, rightly I think, spends the most time on. This is the very point on which my treatment most differs from his (and from others'): he says other researchers have failed to separate introspection from other cognitive processes that aren't done by consciousness; but again, as has gradually come out over the above discussion, I don't agree that <i>anything</i> is done by consciousness. I'm seeing the conscious mind as an illusory construct of the same order as hallucinated gods; which also implies that, even if Jaynes were right about the past ubiquity of hallucinated gods, the shift from there to ubiquitous consciousness would simply not be as foundational as Jaynes describes it.
</p>
<p>
He also, by my lights, overplays the idea of metaphor. Just as I hold thinking prior to consciousness, I hold it likewise prior to language. His metaphors are also, I think, too <i>directional</i>. When we need a word for something, and stretch a word we already had to cover the new case, our choice of word says something interesting about how we're thinking, what meanings we find similar to each other, but it's not wholly a mapping <i>from</i> the previous meaning <i>to</i> the new one; stretching the meaning of a word to cover more is just stretching the meaning of the word to cover more.
</p>
<p>
The tail end of his Afterword strikes a peculiar note, as he claims emotions are consciousness of <i>affect</i> —consciousness of biochemically organized behavior, of things like <i>fear</i>, <i>shame</i>, <i>sexual excitement</i>— which cannot happen in bicameralism precisely because it is <i>consciousness</i> of these things. He makes his case that techniques had to be developed, after the breakdown of bicameralism, to <i>end</i> emotions so they didn't just get stuck in a positive feedback loop. He tells an impressive tale of how the first tragic play in Athens, <i>The Fall of Miletus</i>, was so upsetting to the populace that it shut down the city for days, after which it was banned, burned, and its author banished never to be heard from again. For another example of out-of-control emotion, he offers the story of Oedipus, who, he says, is alluded to in the Iliad and Odyssey, where apparently he killed his father, married his mother, subsequently realized it and felt <i>shame</i> —an affect— then got over it and lived on with his mother and their children to the end of his days; whereas later, in a more conscious age, the story was retold with Oedipus feeling <i>guilt</i>, an emotion, and going completely off the rails, tearing his eyes out etc. And thirdly Jaynes claims that sexual fantasy was also invented during this time. This is another of those Jaynesian details I'm not sure what to do with, both because I'm not sure how it might be integrated with my scenarios, and because I'm not sure how much is really there to integrate. Noting in this case, for example, that Jaynes's impressive tale of the first tragic play in Athens isn't particularly consistent with the current <a href="https://en.wikipedia.org/wiki/Phrynichus_(tragic_poet)">mainstream account</a> of what happened. (Wikipedia's inherent vulnerability to mainstream bias also means, conveniently in this case, it's tolerably likely to accurately depict mainstream thought.)
</p>
<p>
Jaynes's entire scenario seems to me to have too many parts; starting from his basic premise of bicameralism he then looks for diverse causes for its occurrence, its breakdown, and the emergence of various other things in its place, whereas my scenario calls for the continuous operation of a single process of memetic evolution. Ideally, anyway.
</p>
<p>
Overall, Jaynes has both narratized time and (I think) storytelling starting post-bicameralism, whereas I've had them starting nominally forty thousand years ago, when he has <i>language</i> start. He even describes the advent of consciousness in the first half of the first millennium BCE (3000–2500 years ago) as a "cognitive explosion", much as I've compared the advent of art and storytelling to the Cambrian Explosion, except of course I put the explosion more than thirty-five thousand years earlier. So if I mean to account for his evidence in my scenario —and it does seem to me my alternative has some advantages worth exploring— I should have a working hypothesis for how storytelling evolves through the period where Jaynes put bicameralism and its breakdown.
</p>
<p>
I should also be considering how my model of mind might interact with some form of hallucinatory phenomena, and the neurology thereof.
</p>
<span style="font-size: large;" id="sec-jt2-frame">Frame story</span>
<blockquote>
<i>A </i>frame story<i> is a story within which a story is told. A modern example occurs in the movie </i><a href="https://en.wikipedia.org/wiki/The_Princess_Bride_(film)">The Princess Bride</a><i>, where a grandfather offers to read a book to his sick grandson, and most of the movie is the story he reads. A classical example is the story of Scheherazade, saving herself and healing her king's mind with her stories night after night for a thousand and one nights, quoted in the epigraph at the top of this post. There's another reason frame stories matter to this final section, though; read on.</i>
</blockquote>
<p>
The further we try to push back into the period Jaynes would call bicameral, the less we have to go on. The invention of writing is itself evidence of something; of some step in the evolution I want to reconstruct; but having archaeological evidence of ancient writing doesn't necessarily imply having a clue what it means. And even as one thinks one has a clue, one could be severely mistaken. Jaynes rightly objects to interpreting ancient writings as using alternative metaphors to describe modern thoughts; but it is far more difficult to puzzle out a use of ancient words and idioms to describe the sort of thoughts one <i>would</i> have if those were the sorts of thoughts, words, and idioms one was accustomed to. You don't have to buy into Jaynesian bicameralism for that sort of translation to be a horrendously difficult challenge.
</p>
<p>
An alternative searchlight into that period is, as mentioned earlier, the reconstructed PIE (Proto-Indo-European) language, estimated to have fallen out of use more than a thousand years before the events of the Iliad. By Jaynes's timeline, the PIE speakers should be fully bicameral. The handling of time in PIE is, indeed, more primitive than in its more modern descendants, which had (for example) to devise their own approaches to handling future tense, from which one gathers the mother tongue had no such device. My sporadic readings on the matter suggest early PIE didn't have any tense at all, just <a href="https://en.wikipedia.org/wiki/Grammatical_aspect">aspect</a>; but this still seems to be a gradual evolution of treatment of time in an age that, by Jaynes's reckoning, ought not to be treating time at all.
</p>
<p>
Strikingly, Pirahã has aspect. Which apparently puts it, in that sense, on a par with early PIE, circa 6000 years ago. So <i>if</i> I'm right in putting the advent of storytelling some 34 (or more) thousand years before that, then evidently tense as such is spectacularly nowhere near the first event in the orality timeline, yet aspect stretches back all the way into verbality. Granted, aspect in Pirahã <i>could</i> be a point of difference between that modern anomaly and ancient verbal languages; but even so it ought to imply that aspect is not inherently non-verbal. Tense, with its apparent late onset, could be the first event in the phase of things that Jaynes is perceiving as the breakdown of the bicameral mind: first past tense that gets the ball rolling, then future tense that brings the roof down. Unless, of course, the introduction of past tense is the start of storytelling, more like where Jaynes puts the start of the decline of bicameralism, verbality ends there, and the event I've been figuring for the verbality/orality transition is something else again, such as Jaynes's envisioned start of language. I prefer to pursue the 40-thousand-years-of-storytelling hypothesis, for now; in which case, it would seem storytelling has to be initiated by something subtler than tense. A really close comparison of aspect in Pirahã and in PIE seems indicated, whether it uncovers some significant difference or not.
</p>
<p>
Reconstructed PIE also has a complex system of pronouns; I mention this because, from what I understand, Pirahã appears not to have its own pronouns, but instead a system of pronouns borrowed from another local language, suggesting that, just possibly, pronouns are characteristic of orality rather than verbality. This could tie in to the question of when the concept of personhood developed.
</p>
<p>
Besides reconstructed linguistic development, I have a couple of other sources of inspiration available to me for how storytelling may have developed starting from, as I'm conjecturing, the Upper Paleolithic onset (though language reconstruction, notably featuring PIE, seems the most solid of the three). As a second source, there is some internal reasoning to be done about what sorts of players could have inhabited ancient people's conceptual landscape — gods, spirits (see my earlier remarks on pre-Christian Slavic mythology), and of course the <i>self</i>, of which Jaynes too makes much. One might add a gloss at this point for my suspicion, earlier in this post, that the notion of an afterlife (what Jaynes calls "the living dead") may result from trying to reconcile the concept of personhood with the concept of time. As a third source, there are Jaynes's accounts of what was going on over the millennia, with the large caveat that Jaynes is a plainly biased interpreter of whatever direct or indirect evidence he can find. He apparently pins his entire scenario on <i>one</i> putative event, the bicameral age, for which of course there can only be indirect evidence, and then the rest of what he says happened is an exercise in demonstrating that the evidence could be interpreted consistently with that one event. So one can't altogether trust that what Jaynes says happened necessarily did happen, varying from case to case; though it can still inspire, and should be allowed for.
</p>
<p>
The dramatis personae of the introcosm would, as noted, include gods, spirits, and the self. Some thought might be given to ordering these elements, relative both to each other and to other features such as past and future tense. Written records are in some sense a character in the story as well, the psychological consequence of their inherent stability being, to my understanding, at the root of Havelock's notion of literacy.
</p>
<p>
A side notion I've been contemplating, in the vein of less radical variants on bicameralism, is that earlier forms of orality might encourage a <i>less traumatic</i> form of hallucinatory effect than what Jaynes proposes. This seems implicit in describing the conscious mind as on the same order as hallucinated gods. Jaynes evidently bases his vision of ancient bicamerality on modern schizophrenia, which, as he notes, is a debilitating pathology that interferes with the conscious mind; but, even if modern schizophrenia isn't something that would be anomalous in any era, we needn't expect ancient hallucinations to have been so disruptive. If the effects of hypnosis can change over decades because of what subjects <i>expect</i> it's like to be hypnotized —as Jaynes reports— why shouldn't the effects of god-hallucinations change with mental framework over centuries? A less traumatic form of hallucination should also fit better with my conjecture of a continuous evolutionary development of the oral mind.
</p>
<p>
Before trying to fully sort out quite what to do with Jaynes's version of events —on which I've certainly a better handle, at this point— and, likewise, his notions of internal phenomena both hallucinatory and conscious, I do want to reread Havelock, and study Snell.
</p>
<p>
...except...
</p>
<p>
at this point I hesitated. (Yes, as promised at the top of this post, I'm about to get blindsided.)
</p>
<p>
There seemed nothing more to do, but I wasn't enthused with the state of things. I'd finished rereading Jaynes's essay, accumulating some solid insights into implications of my model of mind. I'd looked at the situation as a whole after the rereading, adding a solid insight about the invention of tense and planning what to investigate next. Yet I was left with less notion than at the start of what caused the onset of orality, or even when it happened. My theory, meant to provide coherent shape to the development of human culture, was instead becoming rather shapeless.
</p>
<p>
Realizing, however, that these speculative exploratory posts can't all strike gold, and with no further inspiration apparent, I set about final polishing on the draft, preparing to post.
</p>
<p>
And <i>that's</i> when I got blindsided.
</p>
<p>
Remember that list of peculiarities of Pirahã? No number or time vocabulary, no verb tense. In my final summing up I added <i>pronouns</i> as sort-of-missing. But Pirahã is notorious in linguistics for something else, which only dawned on me (wham!) when I stopped trying to push forward and spent some downtime polishing the draft. There's a technical language property called <i>recursion</i>; <a href="https://en.wikibooks.org/wiki/Conlang/Appendix/Glossary#;Noam_Chomsky">Noam Chomsky</a> has tagged this property as universal-and-unique to human language. And Daniel Everett, after deep study of Pirahã, says Pirahã isn't recursive. Thereby, if correct, blowing Chomsky's cherished theory out of the water. Over which, Noam Chomsky — one of the most influential linguists of the modern age — has called Daniel Everett a "pure charlatan". (Well, okay, my citation is in a newspaper in Brazil, so, he called Everett a "charlatão puro".)
</p>
<p>
Recursion is the ability to nest sentences (or some similar grammatical category) inside each other, potentially to unlimited depth. An Old West hero and villain, in de rigueur white and black hats, are playing poker, and the villain says, dramatically aside with an evil snicker, "Little does he know that I know what's in his hand." The hero says, aside, "Little does he know that I know that he knows what's in my hand." Villain, "Little does he know that I know that he knows that I know what's in his hand." And this can go on as long as both players have the stamina for it. And that's recursion; the ability to say things about saying things. The ability to tell a <i>frame story</i>.
</p>
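<p>
Since this blog is Lisp-inclined, here's a minimal Scheme sketch (function names mine, purely illustrative) of that unbounded nesting; the recursive call is what lets an aside nest to any depth the players' stamina allows:
</p>
<pre>
; Build the chain "that I know that he knows ..." from level i up to depth.
(define (knows i depth)
  (if (> i depth)
      ""
      (string-append (if (odd? i) "that I know " "that he knows ")
                     (knows (+ i 1) depth))))

; The whole aside; odd depths are the villain speaking, even depths the hero.
(define (aside depth)
  (string-append "Little does he know "
                 (knows 1 depth)
                 (if (odd? depth)
                     "what's in his hand."
                     "what's in my hand.")))

; (aside 1) and (aside 3) are the villain's two lines above; (aside 2) is the hero's.
</pre>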
<p>
Which brings me back to the Sapir–Whorf hypothesis, that language affects how we think. I maintain stoutly that language is a secondary effect of our sapient thoughts. We project our ways of thinking into our language, and if they don't fit, we bend the language to the purpose, whether that means stretching the vocabulary or stretching the grammar. Naturally, once the language has been bent for a way of thinking, it may communicate that way of thinking to those who use the language, since communicating thought is what language does. The language thus becomes a common pool for a shared cultural mode. It can't do much to <i>prevent</i> us from using ways of thinking we've acquired, though it might make it harder for us to communicate them.
</p>
<p>
My immediate point is that recursion is a symptom, not an underlying cause. Saying about saying is a symptom. The underlying cause is <i>thinking about thinking</i>.
</p>
<p>
Which is the best answer I can offer to Jaynes's question: where the introcosm comes from.
</p>
<p>
This, then, is my sketch of the evolution of storytelling; four points, on which to pin the whole:
<ul>
<li>invention of framing, of thinking about thinking, projected into language as recursion; forty (or more) thousand years ago.</li>
<li>invention of past tense (by my tentative impression, about six thousand years ago).</li>
<li>invention of future tense (a bit later, I take it; best not even guess till I've studied up a bit on the subject).</li>
<li>literacy; onset in Greece about 2600–2400 years ago.</li>
</ul>
Sketchy, but no longer shapeless; inviting new avenues of investigation, including a tantalizing long interval between the first two points. And likely providing new guidance on what to attend to when studying Havelock and Snell.
</p>
<span style="font-size: large;">Lisp, mud, and wikis</span>
<blockquote>
APL is like a diamond. It has a beautiful crystal structure; all of its parts are related in a uniform and elegant way. But if you try to extend this structure in any way — even by adding another diamond — you get an ugly kludge. LISP, on the other hand, is like a ball of mud. You can add any amount of mud to it [...] and it still looks like a ball of mud!
<blockquote>
— <a href="https://dspace.mit.edu/handle/1721.1/6283">R1RS</a> (1978) page 29, paraphrasing remarks attributed to Joel Moses. (The merits of the attribution are extensively discussed in the bibliography of the HOPL II paper on Lisp by Steele and Gabriel (<i>SIGPLAN Notices</i> 28 no. 3 (March 1993), p. 268), but not in either freely-available on-line version of that <a href="http://www.dreamsongs.com/Essays.html">paper</a>.)
</blockquote>
</blockquote>
<p>
Modern programming languages try to support extension through regimentation, a trend popularized with the structured languages of the 1970s; but Lisp gets its noted extensibility through <i>lack</i> of regimentation. It's not quite that simple, of course, and the difference is why the mud metaphor works so well. Wikis are even muddier than Lisp, which I believe is one of the keys to their power. The Wikimedia Foundation has been trying lately to make wikis cleaner (with, if I may say so, predictably crippling results); but if one tries instead to introduce a bit of interaction into wiki markup (changing the texture of the mud, as it were), crowd-sourcing a wiki starts to look like a sort of crowd-sourced programming — with some weird differences from traditional programming. In this post I mean to explore wiki-based programming and try to get some sense of <a href="https://en.wikiquote.org/wiki/The_Matrix_(film)">how deep the rabbit hole goes</a>.
</p>
<p>
This is part of a bigger picture. I'm exploring how human minds and computers (more generally, non-sapiences) compare, contrast, and interact. Computers, while themselves not capable of being smart or stupid, can make us smarter or stupider (<a href="https://fexpr.blogspot.com/2018/02/sapience-and-non-sapience.html">post</a>). Broadly: They make us smarter when they cater to our convenience, giving full scope to our strengths while augmenting our weaknesses; as I've noted before (e.g. <a href="https://fexpr.blogspot.com/2015/10/natural-intelligence.html">yonder</a>), our strengths are closely related to language, which I reckon is probably why textual programming languages are still popular despite decades of efforts toward visual programming. They make us stupider when we cater to their convenience, which typically causes us to judge ourselves by what they are good at and find ourselves wanting. For the current post, I'll "only" tackle wiki-based programming.
</p>
<p>
I'll wend my way back to the muddiness/Lisp theme further down; first I need to set the stage with a discussion of the nature of wikis.
</p>
<blockquote>
<b>Contents</b><br>
<a href="#sec-wrp-wmar">Wiki markup</a><br>
<a href="#sec-wrp-wphi">Wiki philosophy</a><br>
<a href="#sec-wrp-math">Nimble math</a><br>
<a href="#sec-wrp-prog">Brittle programming</a><br>
<a href="#sec-wrp-lisp">Mud</a><br>
<a href="#sec-wrp-down">Down the rabbit-hole</a><br>
<a href="#sec-wrp-pers">Wiki perspective</a><!-- <br> -->
</blockquote>
<span style="font-size: large;" id="sec-wrp-wmar">Wiki markup</span>
<p>
Technically, a wiki is a collection of pages written in wiki markup, a simple markup language designed for specifying multi-page hypertext documents, characterized by very low-overhead notations, making it exceptionally easy both to read and to write. This, however, is deceptive, because <i>how easy</i> wiki markup is isn't just a function of the ergonomics of its rules of composition, but also of how it's learned in a public wiki community — an instance of the general principle that the technical and social aspects of wikis aren't separable. Here's what I mean.
</p>
<p>
In wiki markup (I describe wikimedia wiki markup here), to link to another page on the wiki locally called "foo", you'd write <code>[[foo]]</code>; to link to it from some other text "text" instead of from its own name, <code>[[foo|text]]</code>. Specify a paragraph break by a blank line; italics <code>''baz''</code>, boldface <code>'''bar'''</code>. Specify section heading "quux" by <code>==quux==</code> on its own line, or <code>===quux===</code> for a subsection, <code>====quux====</code> for a subsubsection, etc.; a bulleted list by putting each item on its own line starting with <code>*</code>. That's most of wiki markup right there. These rules can be so simple because specifying a hypertext document doesn't usually require specifying a lot of complicated details (unlike, say, the <a href="https://en.wikibooks.org/wiki/TeX">TeX</a> markup language where the point of the exercise is to specify finicky details of typesetting).
</p>
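<p>
To make that concrete, here's a small made-up page fragment (hypothetical content, using only the notations just described):
</p>
<pre>
==Example==
This links to [[foo]] and, via display text, to [[foo|another page]];
it also shows ''italics'' and '''boldface'''.

===A sub-topic===
* first item
* second item
</pre>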
<p>
But this core of wiki markup is easier than the description above makes it sound. Why? Because as a wiki user — at least, on a big pre-existing wiki such as Wikipedia, using a traditional raw-markup-editing wiki interface — you probably don't learn the markup primarily by reading a description like that, though you might happen across one. Rather, you start by making small edits to pages that have already been written by others and, in doing so, you happen to see the markup for other things near the thing you're editing. Here it's key that the markup is so easy to <i>read</i> that this incidental exposure to examples of the markup supports useful learning by osmosis. (It should be clear, from this, that bad long-term consequences would accrue from imposing a traditional <a href="https://en.wikipedia.org/wiki/WYSIWYG">WYSIWYG</a> editor on wiki users, because a traditional WYSIWYG editor systematically prevents learning-by-osmosis during editing — because preventing the user from seeing how things are done under-the-hood is the <i>purpose</i> of WYSIWYG.)
</p>
<p>
Inevitably, there will be occasions when more complicated specifications are needed, and so the rules of wiki markup do extend beyond the core I've described. For example, embedding a picture on the page is done with a more elaborate variant of the link notation. As long as these occasional complications stay below some practical threshold of visual difficulty, though (and as long as they remain sufficiently rare that newcomers can look at the markup and sort out what's going on), the learning-by-osmosis effect continues to apply to them. You may not have, say, tinkered with an image on a page before, but perhaps you've seen examples of that markup around, and even if they weren't completely self-explanatory you probably got some general sense of them, so when the time comes to advance to that level of tinkering yourself, you can figure out most or all of it without too much of the terrible fate of reading a help page. Indeed, you may do it by simply copying an example from elsewhere and making changes based on common sense. Is that <a href="https://en.wikipedia.org/wiki/Cargo_cult_programming">cargo-cult programming</a>? Or, cargo-cult markup? Maybe, but the markup for images isn't actually all that complicated, so there probably isn't an awful lot of extra baggage you could end up carrying around — and it's in the nature of the wiki philosophy that each page is perpetually a work in progress, so if you can do well enough as a first approximation, you or others may come along later and improve it. And you may learn still more by watching how others improve what you did.
</p>
<p>
Btw, my description of help pages as a terrible fate isn't altogether sarcastic. Wikis do make considerable use of help pages, but for a simple reason unrelated to the relative effectiveness of help pages. The wiki community are <i>the</i> ones who possess know-how to perform expert tasks on the wiki, so <i>the</i> way to capture that knowledge is to crowd-source the capture to the wiki community; and naturally the wiki community captures it by doing the one thing wikis are good for: building hypertext documents. However, frankly, informational documentation is not a great way to pass on basic procedural knowledge; the strengths of informational documentation lie elsewhere. Information consumers and information providers alike have jokes about how badly documentation works for casual purposes — from the consumer's perspective, "when all else fails, read the instructions"; from the producer's, "<a href="https://en.wiktionary.org/wiki/RTFM#English">RTFM</a>".
</p>
<p>
Sometimes things get complicated enough that one needs to extend the markup. For those cases, there's a notation for "template calls", which is to say, <a href="https://en.wikipedia.org/wiki/Macro_(computer_science)">macro</a> calls; I'll have more to say about that later.
</p>
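<p>
For a taste of the notation (the double-brace call syntax is wikimedia's; the template name here is made up for illustration): the following would invoke a template named <code>frobnicate</code> with one positional and one named argument.
</p>
<pre>
{{frobnicate|some text|mode=fancy}}
</pre>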
<span style="font-size: large;" id="sec-wrp-wphi">Wiki philosophy</span>
<p>
Here's a short list of key wiki philosophical principles. They more-or-less define, or at least greatly constrain, what it <i>is</i> to be a wiki. I've chosen them with an evident bias toward relevance for the current discussion, and without attempting a comprehensive view of wiki philosophy — although I suspect most principles that haven't made my list are not universal even to the wikimedian sisterhood (which encompasses considerable variation from the familiar Wikipedian pattern).
</p>
<table><tr><td> </td><td>
<p><b>Each page is perpetually a work in progress</b>; at any given moment it may contain some errors, and may acquire new ones, which might be fixed later. Some pages may have some sort of "completed" state after which changes to them are <i>limited</i>, such as archives of completed discussions; but even archives are merely instances of general wiki pages and, on a sufficiently long time scale, may occasionally be edited for curational purposes.</p>
<p><b>Pages are by and for the People.</b> This has (at least that I want to emphasize) two parts to it.</p>
<table><tr><td> </td><td>
<p><i>Anyone can contribute to a page.</i> Some kinds of changes may have to be signed off on by users the community has designated as specially trusted for the purpose (whatever the purpose is, in that instance); but the driving force is still input from the broader public.</p>
<p><i>The human touch is what makes wikis worthwhile.</i> Any automation introduced into a wiki setting <i>must</i> be strictly in the service of human decisions, always preserving — or, better, enhancing — the human quality of input. (One could write a whole essay on this point, someone probably should, and quite possibly someone has; but for now I'm just noting it.) People will be more passionate about donating their volunteer labor to a task that gives more scope for their creativity, and to a project that visibly embraces idealistic belief in the value of the human touch; and will lack passion for a task, and project, where their input is apparently no more valued than that of a <a href="https://en.wikipedia.org/wiki/Internet_bot">bot</a>.</p>
</td></tr></table>
<p><b>Page specification is universally grounded in learn-by-osmosis wiki markup.</b> I suspect this is often overlooked in discussions of wiki philosophy because the subject is viewed from the inside, where the larger sweep of history may be invisible. Frankly, looking back over the past five decades or so of the personal-computing revolution from a wide perspective, I find it glaringly obvious that this is the technical-side <i>sine qua non</i> of the success of wikis.</p>
</td></tr></table>
<p>
There is also a meta-principle at work here, deep in the roots of all these principles: <b>a wiki is a human self-organizing system</b>. The principles I've named provide the means for the system to self-organize; cripple them, and the system's dynamic equation is crippled. But this also means that we cannot expect to guide wikis through a conventional top-down approach (which is, btw, another reason why help pages don't work well on a wiki). Only structural rules that guide the self-organization can shape the wiki, and complicated structural rules will predictably bog down the system and create a mess; so <b>core simplicity</b> is the only way to make the wiki concept work.
</p>
<p>
The underlying wiki software has some particular design goals driven by the philosophical principles.
</p>
<table><tr><td> </td><td>
<p><b>Graceful degradation.</b> This follows from pages being works in progress; the software platform has to take whatever is thrown at it and make the best use of what's there. This is a point where it matters that the actual markup notations are few and simple: hopefully, <i>most</i> errors in a wiki page will be semantic and won't interfere with the platform's ability to render the result as it was intended to appear. Layout errors should tend to damp out rather than cascade, and it's always easier to fix a layout problem if it results in some sensible rendering in which both the problem and its cause are obvious.</p>
<p><b>Robustness against content errors.</b> Complementary to graceful degradation: while graceful degradation maximizes positive use of the content, robustness minimizes negative consequences. The robustness design goal is driven both by pages being works in progress and by anyone being able to contribute, in that the system needs to be robust both against consequences of things done by mistake and against consequences of things done maliciously.</p>
<p><b>Radical flexibility.</b> Vast, sweeping flexibility; capacity to express anything users can imagine, and scope for their imaginations. This follows from the human touch, the <i>by and for the People</i> nature of wikis; the point of the entire enterprise is to empower the users. To provide inherent flexibility and inherent robustness at the deep level where learning-by-osmosis operates and structural rules guide a self-organizing system, is quite an exercise in integrated design. One is reminded of the design principle advocated (though not entirely followed for some years now) by the Scheme reports, to design "not by piling feature on top of feature, but by removing the weaknesses and restrictions that make additional features appear necessary"; except, the principle is made even more subtle because placing numeric bounds on operations, in a prosaic technical sense, is often a desirable measure for robustness against content errors, so one has to find structural ways to enable power and flexibility in the system as a whole that coexist in harmony with selective robustness-driven numeric bounds.</p>
<p><b>Authorization/quality control.</b> This follows from the combination of anyone being able to contribute with need for robustness (both malicious and accidental). A wiki community must be able to choose users who are especially trusted by the community; if it can't do that, it's not a community. Leveraging off that, some changes to the wiki can be restricted to trusted users, and some changes once made may have some sort of lesser status until they've been approved by a trusted user. These two techniques can blur into each other, as a less privileged user can <i>request</i> a change requiring privileges and, depending on how the interface works, the request process might look very similar to making a change subject to later approval.</p>
</td></tr></table>
<p>
As software platform design goals for a wiki, imho there's nothing particularly remarkable, let alone controversial, about these.
</p>
<span style="font-size: large;" id="sec-wrp-math">Nimble math</span>
<p>
Shifting gears, consider abstraction in mathematics. In my post some time back on <a href="https://fexpr.blogspot.com/2011/11/where-do-types-come-from.html">types</a>, I noted
<blockquote>
In mathematics, there may be several different views of things any one of which could be used as a foundation from which to build the others. That's essentially <i>perfect</i> abstraction, in that from any one of these levels, you not only get to ignore what's under the hood, but you can't even tell whether there <i>is</i> anything under the hood. Going from one level to the next leaves no residue of unhidden details: you could build B from A, C from B, and A from C, and you've really gotten back to A, not some flawed approximation of it[.]
</blockquote>
Suppose we have a mathematical structure of some sort; for example, <a href="https://en.wikipedia.org/wiki/Matroid">matroids</a> (I had the fun some years back of taking a class on matroids from a professor who was coauthoring <a href="http://www.ams.org/bookstore-getitem/item=GSM-2">a book on the subject</a>). There are a passel of different ways to define matroids, all equivalent but not obviously so (the Wikipedia article uses the lol-worthily fancy term <i>cryptomorphic</i>). Certain theorems about matroids may be easier to prove using one or another definition, but once such a theorem has been proven, it doesn't <i>matter</i> which definition we used when we proved it; the theorem is simply true for matroids regardless of which equivalent definition of matroid is used. Having proven it we completely forget the lower-level stuff we used in the proof, a characteristic of mathematics related to <i>discharging the premise</i> in conditional proof (which, interestingly, seems somehow related to the emergence of antinomies in mathematics, as I remarked in an <a href="https://fexpr.blogspot.com/2015/05/computation-and-truth.html#sec-comptruth-where">earlier post</a>).
</p>
<p>
Generality in mathematics is achieved by <i>removing things</i>; it's very easy to do, just drop some of the assumptions about the structures you're studying. Some of the theorems that used to hold are no longer valid, but those drop out painlessly. There may be a great deal of work involved in figuring out which of the lost theorems may be re-proven in the broader setting, with or without weakening their conclusions, but that's just icing on the cake; nothing about the generalization can diminish the pre-existing math, and new results may apply to special cases as facilely as matroid theorems would leap from one definition to another. Specialization works just as neatly, offering new opportunities to prove stronger results from stronger assumptions without diminishing the more general results that apply.
</p>
<span style="font-size: large;" id="sec-wrp-prog">Brittle programming</span>
<p>
There seems to be a natural human impulse to seek out general patterns, and then want to <i>explain</i> them to others. This impulse fits in very well with Terrence Deacon's vision of the human mind in <i>The Symbolic Species</i> (which I touched on tangentially in my post on <a href="https://fexpr.blogspot.com/2015/10/natural-intelligence.html">natural intelligence</a>) as synthesizing high-level symbolic ideas for the purpose of supporting a social contract (hence the biological importance of sharing the idea once acquired). This is a familiar driving force for generalization in mathematics; but I see it also as an important driving force for generalization in programming. We've got a little program to do some task, and we see how to generalize it to do more (cf. <a href="https://xkcd.com/974/">this xkcd</a>); and then we try to <i>explain</i> the generalization. If we were explaining it to another human being, we'd probably start to explain our new idea in general terms, "hand-waving". As programmers, though, our primary dialog isn't with another human being, but with a computer (cf. Philip Guo's essay on <a href="http://www.pgbovine.net/two-cultures-of-computing.htm">User Culture Versus Programmer Culture</a>). And computers don't actually understand anything; we have to spell everything out with absolute precision — which is, it seems to me, what makes computers so valuable to us, but also trips us up because in communicating with them we feel compelled to treat them as if they were thinking in the sense we do.
</p>
<p>
So instead of a nimble general idea that synergizes with any special cases it's compatible with, the general and specific cases mutually enhancing each other's power, we create a rigid framework for specifying exactly how to do a somewhat wider range of tasks, imposing limitations on all the special cases to which it applies. The more generalization we do, the harder it is to cope with the particular cases to which the general system is supposed to apply.
</p>
<p>
It's understandable that programmers who are also familiar with the expansive flexibility of mathematics might seek to create a programming system in which one only expresses pure mathematics, hoping thereby to escape the restrictions of concrete implementation. Unfortunately, so I take from the above, the effort is misguided because the extreme rigidity of programming doesn't come from the character of what we're saying — it comes from the character of what we're saying it <i>to</i>. If we want to escape the problem of abstraction, we need a strategy to cope with the differential between computation and thought.
</p>
<span style="font-size: large;" id="sec-wrp-lisp">Mud</span>
<blockquote>
Lisp [...] made me aware that software could be close to executable mathematics.
<blockquote>
— <a href="http://www.sigsoft.org/SEN/deutsch.html">ACM Fellow Profile</a> of <a href="https://en.wikipedia.org/wiki/L_Peter_Deutsch">L. Peter Deutsch</a>.
</blockquote>
</blockquote>
<p>
Part of the beauty of the ball-of-mud quote is that there really does seem to be a causal connection between Lisp's use of generic, unencapsulated, largely un-type-checked data structures and its extensibility. The converse implication is that we should expect specialization, encapsulation, and aggressive type-checking of data structures — all strategies extensively pursued in various combinations in modern language paradigms — each to <i>retard</i> language extension.
</p>
<p>
Each of these three characteristics of Lisp serves to remove some kind of automated restriction on program form, leaving the programmer more responsible for judging what is appropriate — and thereby shifting things away from close conversation with the machine, enabling (though perhaps not actively encouraging) more human engagement. The downside of this is likely obvious to programming veterans: it leaves things more open to human fallibility. Lisp has a reputation as a language for good programmers. (As a remark attributed to Dino Dai Zovi puts it, "We all know that Lisp is the best language around, but in the hands of most it becomes like that scene in <a href="https://en.wikipedia.org/wiki/Fantasia_(1940_film)">Fantasia</a> when Mickey Mouse gets the wand.")
</p>
<p>
Even automatic garbage collection and bignums can be understood as reducing the depth of the programmer's conversation with the computer.
</p>
<p>
I am not, just atm, taking sides on whether the ability to define specialized data types makes a language more or less "general" than doing everything with a single amorphous structure (such as S-expressions). If you choose to think of a record type as just one example of the arbitrary logical structures representable using S-expressions, then a minimal Lisp, by not supporting record types, is declining to impose language-level restrictions on the representing S-expressions. If, on the other hand, you choose to think of a cons cell as just one example of a record type (with fields <i>car</i> and <i>cdr</i>), then "limiting" the programmer to a single record type minimizes the complexity of the programmer's forced interaction with the language-level type restrictions. Either way, the minimal-Lisp approach downplays the programmer's conversation with the computer, leaving the door open for more human engagement.
</p>
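<p>
To make the two views concrete, here's a minimal Scheme sketch (names mine, purely illustrative; the record syntax is SRFI-9/R7RS <code>define-record-type</code>):
</p>
<pre>
; View 1: a "record" as unencapsulated S-expression structure; the
; language imposes no restrictions on its shape.
(define (make-point x y) (list 'point x y))
(define (point-x p) (cadr p))
(define (point-y p) (caddr p))

; View 2: a distinct, encapsulated record type.
(define-record-type point-record
  (make-point-record x y)
  point-record?
  (x point-record-x)
  (y point-record-y))

; The first is open mud: any amount of extra structure can be consed on.
; The second is checked: (point-record-x (make-point 1 2)) is an error,
; because the accessor accepts only genuine point records.
</pre>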
<p>
Wikis, of course, are pretty much <i>all about</i> human engagement. While wiki markup minimizes close conversation with the computer, the inseparable wiki social context <i>does</i> actively encourage human engagement.
</p>
<span style="font-size: large;" id="sec-wrp-down">Down the rabbit-hole</span>
<p>
At the top of the post I referred to what happens <i>if</i> one introduces a bit of interaction into wiki markup. But, as with other features of wiki markup, this what-if cannot be separate from the reason one would want to do it. And as always with wikis, the motive is social. The point of the exercise — and this <a href="https://en.wikinews.org/wiki/Help:Dialog"><i>is</i> being tried</a> — is to enhance the wiki community's ability to capture their collective expertise at performing wiki tasks, by working with and enhancing the existing strengths of wikis.
</p>
<p>
The key enabling insight for this enhancement was that by adding a quite small set of interaction primitives to wiki markup (in the form, rather unavoidably, of a small set of templates), one can transform the whole character of wiki pages from passive to active. Most of this is just two primitives: one for putting text input boxes on a page, and another for putting <i>buttons</i> on the page that, when clicked, take the data in the various input boxes from the page and send it somewhere — to be transformed, used to initialize text boxes on another page, used to customize the appearance of another page, or, ultimately, used to devise an <i>edit</i> to another page. Suddenly it's possible for a set of wiki pages to be, in effect, an interactive <a href="https://en.wikipedia.org/wiki/Wizard_%28software%29">wizard</a> for performing some task, limited only by what the wiki community can collectively devise.
</p>
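<p>
Purely as a hypothetical sketch (these template names are invented for illustration; they are not the vocabulary of the linked dialog facility), a page acting as one step of such a wizard might contain markup along these lines:
</p>
<pre>
&lt;!-- hypothetical template names, for illustration only --&gt;
{{dialog text box|field=topic|label=Subject of the request}}
{{dialog button|label=Submit|use=topic|target=a page that acts on the request}}
</pre>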
<p>
Notice the key wiki philosophical principles still in place: each page perpetually a work in progress, by and for the people, with specification universally grounded in learn-by-osmosis wiki markup. It seems reasonable to suppose, therefore, that the design principles for the underlying wiki software, following <i>from</i> those wiki philosophical principles, ought also to remain in place. (In the event, the linked "dialog" facility is, in fact, designed with graceful degradation, robustness, and flexibility in mind and makes tactical use of authorization.)
</p>
<p>
Consider, though, the nature of the new wiki content enabled by this small addition to the primitive vocabulary of wiki markup. Pages with places to enter data and buttons that then take the entered data and, well, do stuff with it. At the upper end, as mentioned, a collection of these would add up to a crowd-sourced software wizard. The point of the exercise is to bring the wiki workflow to bear on growing these things, to capture the expertise uniquely held by the wiki community; so the wiki philosophical principles and design principles still apply, but now what they're applying to is really starting to look like programming; and whereas all those principles were pretty tame when we were still thinking of the output of the process as hypertext, applying them to a programming task can produce some startling clashes with the way we're accustomed to think about programming — as well as casting the wiki principles themselves in a different light than in non-interactive wikis.
</p>
<p>
Wikis have, as noted, a very mellow attitude toward errors on a wiki page. Programmers as a rule do not. The earliest generation of programmers seriously expected to write programs that didn't have bugs in them (and I wouldn't bet against them); modern programmers have become far more fatalistic about the occurrence of bugs — but one attitude that has, afaics, never changed is the perception that bugs are something to be <i>stamped out ASAP</i> once discovered. Even if some sort of fault-tolerance is built into a computer system, the gut instinct is still that a bug is an inherent evil, rather than a relatively less desirable state. It's a digital rather than analog, discrete rather than continuous, view. Do wikis want to eliminate errors? Sure; but it doesn't carry the programmer's sense of urgency, of need to exterminate the bug with prejudice. Some part of this is, of course, because a software bug is behavioral, and thus can actually <i>do</i> things, so its consequences can spread in a rapid, aggressive way that passive data does not... although errors in Wikipedia <i>are</i> notorious for the way they can spread about the infosphere — at a speed more characteristic of human gossip than computer processing.
</p>
<p>
The interactive-wiki experiment was expected from the outset to be, in its long-term direction, <i>unplannable</i>. Each individual technical step would be calculated and taken without probing too strenuously what step would follow it. This would be a substantially organic process; imagine a bird adding one twig at a time to a nest, or a painter carefully thinking out each brush stroke... or a wiki page evolving through many small edits.
</p>
<p>
In other words, the interactivity device —meant to empower a wiki community to <i>grow</i> its own interactive facilities in much the same way the wiki community grows the wiki content— would itself be developed by a process of growth. Here it's crucial —one wants to say, <i>vital</i>— that the expertise to be captured is uniquely held <i>by</i> the wiki community. This is also why a centralized organization, such as the Wikimedia Foundation, can't possibly provide interactive tools to facilitate the sorts of wiki tasks we're talking about: the centralized organization doesn't know what needs doing or how to do it, and it would be utterly unworkable to have people petitioning a central authority to please provide the tools. The central authority would be clueless at every single decision point along the way, from the largest decision to the smallest. The indirection would be prohibitively clumsy, which is also why wiki content doesn't go through a central authority: wikis are a successful content delivery medium because when someone is, say, reading a page on Wikipedia and sees a typo, they can just fix it themselves, in what turns out to be a very straightforward markup language, rather than going through a big, high-entry-cost process of petitioning a central authority to please fix it. What it all adds up to, on the bottom line, is the basic principle that wikis are <b>for the People</b>, and that necessarily applies just as much to knowledge of how to do things on the wiki as it does to the provided content.
</p>
<p>
Hence the on-wiki tasks need to be <i>semi</i>-automated, not automated as such: they're subject to concerns similar to drive-by-wire or fly-by-wire, in which Bad Things Happen if the interface is designed in a way that cuts the human operator out of the process. Not all the problems of fly/drive-by-wire apply (I've previously ranted, here and there, on misdesign of fly-by-wire systems); but the human operator needs not only to be directly told some things, but also to be given <i>peripheral information</i>. Sapient control of the task is essential, and in the case of wikis is an important part of the purpose of the whole exercise; and peripheral information is a necessary enabler for sapient control: the stuff the human operator catches in the corner of their eye allows them to recognize when things are going wrong (sometimes through subliminal clues that low-bandwidth interfaces would have excluded, sometimes through contextual understanding that automation lacks); and peripheral information also enables the training-by-osmosis that makes the user more able to do things <i>and</i> recognize problems subliminally <i>and</i> have contextual understanding.
</p>
<p>
These peculiarities are at the heart of the contrast with conventional programming. Most forms of programming clearly <i>couldn't</i> tolerate the mellow wiki attitude toward bugs; but we're not proposing to do most forms of programming. An on-wiki interactive assistant isn't supposed to go off and do things on its own; that sort of rogue software agent, which is the essence of the conventional programmer's zero-tolerance policy toward bugs, would be full automation, not semi-automation. Here the human operator is supposed to be kept in the loop and well-informed about what is done and why and whatever peripheral factors might motivate occasional exceptions, and be readily empowered to deviate from usual behavior. And when the human operator thinks the assistant could be improved, they should be empowered to change or extend it.
</p>
<p>
At this point, though, the rabbit hole rather abruptly dives very deep indeed. In the practical experiment, as soon as the interaction primitives were on-line and efforts began to apply them to real tasks, it became evident that using them was <i>fiendishly tricky</i>. It was really hard to keep track of all the details involved, in practice. Some of that, of course, has to be caused by the difficulty of overlaying these interactions on top of a wiki platform that doesn't natively support them very well (a politically necessary compromise, to operate within the social framework of the wikimedian sisterhood); but a lot of it appears to be due to the inherent volatility wiki pages take on when they become interactive. This problem, though, suggests its own solution. We've set out to grow interactive assistants to facilitate on-wiki tasks. The growing of interactive assistants is itself an on-wiki task. What we need is a <i>meta-assistant</i>, a semi-automated assistant to help users with the creation and maintenance of semi-automated assistants.
</p>
<p>
It might seem as if we're getting separated from the primitive level of wiki markup, but in fact the existence of that base level is a necessary stabilizing factor. Without a simple set of general primitives underneath, defining an organic realm of possibilities for sapient minds to explore, any elaborate high-level interface naturally devolves into a limited range of possibilities anticipated by the designer of the high-level interface (not unlike the difference between a multiple-choice quiz and an essay question; free-form text is where sapient minds can run rings around non-sapient artifacts).
</p>
<p>
It's not at all <i>easy</i> to design a meta-assistant to aid in managing the high-level design of assistants while at the same time not stifling the sapient user's flexibility to explore previously unimagined corners of the design space supported by the general primitives. Moreover, while pointedly not limiting the user, the meta-assistant can't help <i>guiding</i> the user; and while this guidance should be generally minimized and smoothed out (lest the nominal flexibility become a sudden drop into unsupported low-level coding when one steps off the beaten path), the guidance also has to be carefully chosen to favor properties of assistants that promote success of the overall interactive enterprise; as a short list,
<ul>
<li>Avoid favoring negative features such as inflexibility of the resulting assistants.</li>
<li>Preserve the reasoning behind exceptions, so that users aren't driven to relitigate exceptional decisions over and over until someone makes the officially "unexceptional" choice.</li>
<li>Show the user, unobnoxiously, what low-level manipulations are performed on their behalf, in some way that nurtures learning-by-osmosis. (On much the same tack, assistants should be designed —somehow or other— to coordinate with graceful degradation when the interactivity facilities aren't available.)</li>
<li>The shape of the assistants has to coordinate well with the strategy that the meta-assistant uses to aid the user in coping with failures of behavior during assisted operations — as the meta-assistant aids in both detecting operational problems and in adjusting assistants accordingly.</li>
<li>The entire system has to be coordinated to allow recovery of earlier states of assistants when customizations to an assistant don't work out as intended — which becomes extraordinarily fraught if one contemplates customizations <i>to the meta-assistant</i> (at which point, one might consider drawing inspiration from the Lispish notion of a reflective tower; I'm deeply disappointed, btw, to find <i>nothing</i> on the wikimedia sisterhood about reflective towers; cf. Wand and Friedman's classic paper [<a href="https://www.cs.indiana.edu/pub/techreports/TR196.pdf">pdf</a>]).</li>
</ul>
A common theme here is, evidently, interplay between features <i>of</i> and <i>encouraged by</i> the meta-assistant, reaching its height with the last two points. All in all, the design of the meta-assistant is a formidable — better, an <i>invigorating</i> — challenge. One suspects that, in fact, given the unexplored territory involved, it may be entirely impossible to plan it out in advance, so that growing would be really the only way to approach it at all.
</p>
<span style="font-size: large;" id="sec-wrp-pers">Wiki perspective</span>
<p>
As noted earlier, the practical experiment is politically (thus, <i>socially</i>) constrained to operate within the available wikimedia platform, using strictly non-core facilities —mostly, <a href="https://en.wikibooks.org/wiki/JavaScript">JavaScript</a>— to simulate interactive primitives. Despite some consequent awkward rough spots at the interface between interactivity and core platform, a lightweight prototype seemed appropriate to start with because the whole nature of the concept appeared to call for an agile approach — again, <i>growing</i> the facility. However, as a baseline of practical interactive-wiki experience has gradually accumulated, it is by now possible to begin envisioning what an interactive-core wiki platform ought to look like.
</p>
<p>
Beyond the basic interactive functionality itself, there appear to be two fundamental changes wanted in the texture of the platform interface.
</p>
<p>
The basic interaction functionality is mostly about moving information around. Canonically, interactive information originates from input fields on the wiki page, each specified in the wiki markup by a template call (sometimes with a default value specified within the call); and interactive information is caused to <i>move</i> by clicking a button, any number of which may occur on the wiki page, each also specified in the wiki markup by a template call. Each input field has a logical name within the page, and each button names the input fields whose data it is sending, along with the "action" to which the information is to be sent. There are specialized actions for actually modifying the wiki —creating or editing a page, or more exotic things like renaming, protecting, or deleting a page— but most buttons just send the information to another page, where two possible things can happen to it. Entirely within the framework of the interactive facility, incoming information to a page can be used to initialize an input field of the receiving page; but, more awkwardly (because it involves the interface between the lightweight interactive extension and the core non-interactive platform), incoming information to a page can be fed into the <i>template</i> facility of the platform. This wants a bit of explanation about wiki templates.
</p>
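<p>
Continuing the hypothetical notation of the earlier sketch (again, illustrative assumptions rather than the experiment's actual syntax), a receiving page might look something like this:
</p>
<pre>
&lt;!-- hypothetical receiving page, "Article wizard/step 2" --&gt;
&lt;!-- an incoming datum named "title" initializes this local input field --&gt;
{{dialog/input|name=title|default-from=title}}

&lt;!-- a button here sends the (possibly revised) field onward, this time
     to a specialized action that actually modifies the wiki --&gt;
{{dialog/button|label=Create page|fields=title|action=edit}}
</pre>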
<p>
Wiki pages are (for most practical purposes) in a single global namespace, all pages fully visible to each other. A page name can be optionally separated into "fields" using slashes, a la UNIX file names, which is useful in keeping track of intended relationships between pages, but with little-to-no use of relative names: page names are almost always fully qualified. A basic "template call" is delimited on the calling page by double braces ("{{}}") around the name of the callee; the contents of the callee are substituted (in wiki parlance, <i>transcluded</i>) into the calling page at the point of call. To nip potential resource drains (accidental or malicious) in the bud, recursion is simply disallowed, and a fairly small numerical bound is placed on the depth of nested calls. Optionally, a template can be parameterized, with arguments passed through the call, using a pipe character ("|") to separate the arguments from the template name and from each other; arguments may be explicitly named, or unnamed, in which case the first unnamed argument gets the name "1", the second "2", and so on. On the template page, a parameter name delimited by <i>triple</i> curly braces ("{{{}}}") is replaced by the argument with that name when the template is transcluded.
</p>
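<p>
A minimal example, using just the standard syntax described above:
</p>
<pre>
&lt;!-- contents of page "Template:Greeting" --&gt;
Hello, {{{1}}}! Regards, {{{sender}}}.

&lt;!-- a call on some other page; the first argument is unnamed,
     so it gets the name "1", while the second is explicitly named --&gt;
{{Greeting|world|sender=John}}

&lt;!-- which transcludes as:  Hello, world! Regards, John. --&gt;
</pre>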
<p>
In conventional programming-language terms, wiki template parameters are neither statically nor dynamically scoped; rather, all parameter names are strictly local to the particular template page on which they occur. This is very much in keeping with the principle of minimizing conversation with the machine; the meaning-as-template of a page is unaffected by the content of any other page whatever. Overall, the wiki platform has a subtle flavor of dynamic scoping about it, but not from parameter names, nor from how pages name each other; rather, from how a page names <i>itself</i>. The wiki platform provides a "magic word" —called like a template but performing a system service rather than transcluding another page— for extracting the name of the current page, <code>{{PAGENAME}}</code> (this'll do for a simple explanation). But here's the kicker: wiki markup <code>{{PAGENAME}}</code> doesn't generate the name of the page on which the markup occurs, but rather, the name of the page <i>currently being displayed</i>. Thus, when you're writing a template, <code>{{PAGENAME}}</code> doesn't give you the name of the template you're writing, but the name of whatever page ultimately calls it (which might not even be the name of the page that <i>directly</i> calls the one you're writing). This works out well in practice, perhaps because it focuses you on the act of display, which is after all the proper technical goal of wiki markup. (There was, so I understand, a request long ago from the wikimedia user community for a magic word naming the page on which the markup occurs, but the platform developers declined the request; apparently, though we've already named here a couple of pretty good design reasons not to have such a feature —minimal conversation, and display focus— the developers invoked some sort of efficiency concern.)
</p>
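<p>
For instance:
</p>
<pre>
&lt;!-- contents of page "Template:Where" --&gt;
This text is being displayed on page {{PAGENAME}}.

&lt;!-- calling {{Where}} from a page named "Sandbox" displays
       This text is being displayed on page Sandbox.
     rather than naming "Where", the page where the markup physically resides --&gt;
</pre>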
<p>
Getting incoming parameters from the dialog system to the template system can be done, tediously under-the-hood, by substituting data into the page — when the page is being viewed by means of an "action" (the <i>view</i> action) rather than directly through the wiki platform. This shunting of information back-and-forth between dialog and template is awkward and falls apart on some corner cases. Presumably, if the interactive facility were integrated into the platform, there ought to be just one kind of parameter, and just one kind of page-display so that the same mechanism handles all parameters in all cases. This, however, implies the first of the two texture changes indicated to the platform. The wikimedia <a href="https://en.wikipedia.org/wiki/Web_API">API</a> supports a request to the server to typeset a wiki text into html, which is used by the <i>view</i> action when shunting information from dialog parameters to template parameters; but, crucially, the API only supports this typesetting on an unadorned wiki text — that is, a wiki text <i>without parameters</i>.
</p>
<p>
To properly integrate interactivity into the platform, both the fundamental unit of modular wiki information, and the fundamental unit of wiki activity, have to change, from unadorned to parameterized. The basic act of the platform is then not displaying a page, but displaying a page given a set of arguments. This should apply, properly, to both the API and the user interface. Even without interactivity, it's already awkward to debug a template because one really wants to watch what happens as the template receives arguments, and follow the arguments as they flow downward and results flow back upward through nested template calls; which would require an interface oriented toward the basic unit of a wiki text plus arguments, rather than just a wiki text. Eventually, a debugging interface of this sort may be constructed on top of the dialog facility for wikimedia; in a properly integrated system, it would be basic platform functionality.
</p>
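<p>
In effect, where the platform's basic unit of display is currently a bare page name, the proposal makes it a page name plus arguments; which is to say, essentially a template call (the page and argument names here are just illustrative):
</p>
<pre>
&lt;!-- today's unit of display: a bare page name --&gt;
Article wizard

&lt;!-- proposed unit of display, for the API and user interface alike:
     a page name plus arguments, i.e. essentially a template call --&gt;
{{Article wizard|title=Ravens|draft=yes}}
</pre>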
<p>
The second texture change I'm going to recommend to the platform is less obvious, but I've come to see it as a natural extension of shifting from machine conversation to human.
</p>
<p>
Wiki template calls are a simple process (recall the absence of complicated scoping rules) well grounded in wiki markup; but even with those advantages, as nested call depth increases, transclusion starts to edge over into computational territory. The Wikimedia Foundation has apparently developed a certain corporate dislike for templates, citing caching difficulties (which they've elected not to try to mitigate as much as they might) and computational expense, and ultimately turning to an overtly computational (and, by my reckoning, philosophically anti-wiki) use of <a href="https://en.wikibooks.org/wiki/Lua_Programming">Lua</a> to implement templates under-the-hood.
</p>
<p>
As a source of inspiration, though, consider transitive closures. <a href="https://en.wikibooks.org">Wikibooks</a> (another wikimedia sister of Wikinews and Wikipedia) uses one of these: the entire collection of about 3000 books is hierarchically arranged in about 400 subjects, and a given book when filed in a subject has to be automatically put in all the ancestors of that subject. Doing this with templates, without recursion, can still be managed by a chain of templates calling each other in sequence (with some provision that if the chain isn't long enough, human operator intervention can be requested to lengthen it); but then there's also the fixed bound on nesting depth, which in practice only supports a tower of six or seven nested subjects. This could, of course, be technically solved the Foundation's way, by replacing the internals of the template with a Lua module free to use either recursion or iteration, at the cost of abandoning some central principles of wiki philosophy; but there's another way. If we had, for each subject, a pre-computed list of all its ancestors, we wouldn't need recursion or iteration to simply file the book in all of them. So, for each subject, keep a list of its ancestors tucked away in an annex to the subject page; and let the subject page <i>check</i> its list of ancestors, to make sure it's the same as what you get by merging its parents with its parents' lists of ancestors (which the parents, of course, are responsible for checking); and if the check fails, request human operator intervention to fix it — preferably, with a semi-automated assistant to help. If someone changes the subject hierarchy, recomputing the various ancestor lists then needs human beings to sign off on the changes (which is perhaps just as well, since changing the subject hierarchy is a rather significant act that ought to draw some attention).
</p>
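<p>
A sketch of what a subject page's annex might look like under this scheme; the template name and parameters here are hypothetical, and the real Wikibooks arrangement differs in detail:
</p>
<pre>
&lt;!-- hypothetical annex on page "Subject:Ravens" --&gt;
{{subject ancestors
|parents   = Subject:Birds
|ancestors = Subject:Birds, Subject:Animals, Subject:Science
}}
&lt;!-- the page checks that "ancestors" equals the merge of its parents with
     the parents' own ancestor lists; on a mismatch it requests human
     operator intervention, rather than recursing to repair itself --&gt;
</pre>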
<p>
This dovetails nicely into a likely strategy for avoiding bots —fully automated maintenance-task agents, which mismatch the <i>by and for the People</i> wiki principle— by instead providing for each task a semi-automated assistant and a device for requesting operator intervention. So, what if we staggered all template processing into these sorts of semi-automated steps? This might <i>also</i> dovetail nicely with parameterizing the fundamental unit of modular wiki information, as a deferred template call is just this sort of parameterized modular unit.
</p>
<p>
There are some interesting puzzles to work out around the edges of this vision. Of particular interest to me is the template call mechanism. On the whole it works remarkably well for its native purpose; one might wonder, though, whether it's possible (and whether it's desirable, which does not go without saying) to handle non-template primitives more cleanly, and whether there is anything meaningful to be applied here from the macro/<a href="https://fexpr.blogspot.com/2011/04/fexpr.html">fexpr</a> distinction. The practical experiment derives an important degree of agility from the fact that new "actions" can be written on the fly in JavaScript (not routinely, of course, but without the prohibitive political and bureaucratic hurdles of altering the wikimedia platform). Clearly there must be a line drawn <i>somewhere</i> between the province of wiki markup presented to the wiki community, and other languages used to implement the platform; but the philosophy I've described calls for expanding the wiki side of that boundary as far as possible, and if one is rethinking the basic structure of the wiki platform, that might be a good moment to pause and consider what might be done on that front.
</p>
<p>
As a whole, though, this prospect for an inherently interactive wiki platform seems to me remarkably coherent, sufficiently that I'm very tempted to thrash out the details and make it happen.
</p>
<span style="font-size: large;">Discourse and language</span>
<blockquote>
<p>"Of course, we practiced with computer generated languages. The neural modelers created alien networks, and we practiced with the languages they generated."</p>
<p>Ofelia kept her face blank. She understood what that meant: they had created machines that talked machine languages, and from this they thought they had learned how to understand alien languages. Stupid. Machines would not think like aliens, but like machines.</p>
<blockquote>
— <i><a href="https://en.wikipedia.org/wiki/Remnant_Population">Remnant Population</a></i>, Elizabeth Moon, 1996, Chapter Eighteen.
</blockquote>
A plague o' both your houses!
<blockquote>
— Mercutio, <i><a href="https://en.wikipedia.org/wiki/Romeo_and_Juliet">Romeo and Juliet</a></i>, William Shakespeare, circa 1591–5, act 3 scene 1.
</blockquote>
</blockquote>
<p>
I've realized lately that, for my advancement as a <a href="https://en.wikibooks.org/wiki/Conlang">conlanger</a>, I need to get a handle on <i>discourse</i>, by which I mean, the aspects of language that arise in coordinating texts beyond the scope of the ordinary grammatical rules of sentence formation. Turns out that's a can of worms; in trying to get a handle on discourse, I find myself confronting what are, as best I can work out, some of the messiest controversies in linguistics in the modern era.
</p>
<p>
I think conlanging can provide a valuable service for linguistics. My purpose in this post is overtly <i>con</i>linguistic: for conlanging theory, I want to understand the linguistic conceptual tangle; and for conlanging practice, I want methodology for investigating the discursive (that would be the adjectival form of <i>discourse</i>) dynamics of different language arrangements. But linguistics —I've in mind particularly grammar— seems to me to have gotten itself into something of a bind, from a scientific perspective. It's got a deficit of falsifiable claims. Since we've a limited supply of natural languages, we'd like to identify salient features of the ones we have; but how can we make falsifiable claims about which features matter unless we have an unconstrained space of possibilities within which we can see that natural languages are not randomly distributed? We have no such unconstrained space of possibilities; it seems we can only define a space of possibilities by choosing a particular model of how to describe language, and inability to choose between those models is part of the problem; they're all <i>differently</i> constrained, not <i>un</i>constrained. Conlanging, though, lets us imagine possibilities that may defy all our constrained models — if we don't let the models constrain our conlanging — and any tool that lets us do conlang thought-experiments <i>without choosing a linguistic model</i> should bring us closer to glimpsing an unconstrained space of possibilities.
</p>
<p>
As usual, I mean to wade in, and document my journey as well as my destination; such as it is. In this case, though a concrete methodology is my practical goal, it will be all I can manage, through mapping the surrounding territory (theoretical and practical), to see the goal more clearly; actually achieving the goal, despite all my striving for it here, will be for some future effort. The search here is going to take rather a lot of space, too, not least when I start getting into examples, since it's in the nature of the subject —study of large language structures— that the examples <i>be</i> large. If one wants to get a handle on a technique suitable for studying larger language structures, though, I reckon one has to be willing to pay the price of admission.
</p>
<blockquote>
<b>Contents</b><br>
<a href="#sec-dm-advice">Advice</a><br>
<a href="#sec-dm-acad">Academe</a><br>
<a href="#sec-dm-gosh">Gosh</a><br>
<a href="#sec-dm-easy">Easy as futhorc</a><br>
<a href="#sec-dm-rev">Revalency</a><br>
<a href="#sec-dm-relc">Relative clauses</a><br>
<a href="#sec-dm-deep">The deep end</a><!-- br -->
</blockquote>
<span style="font-size: large;" id="sec-dm-advice">Advice</span>
<p>
Some bits of advice I've picked up from on-line conlangers:
<ul>
<li><p>The concept of <i>morphemes</i> — word parts that compose to make a word form, like <i>in‑ describe ‑able → indescribable</i> — carries with it a bunch of conceptual baggage, about how to think about the structure of language, that is likely counterproductive to inject into conlanging. <a href="http://dedalvs.conlang.org">David J. Peterson</a> makes this point.</p></li>
<li><p>Many local features of language are best appreciated when one sees how they work in an extended discourse. This has been something of a recurring theme on the <a href="http://conlangery.conlang.org/">Conlangery Podcast</a>, advice they've given about a variety of features.</p></li>
<li><p><a href="http://dedalvs.conlang.org/notes/ergativity.php#ergativity">Ergativity</a>, a feature apparently fascinating to many conlangers, may not even be a thing. I was first set on to this objection by the Conlangery Podcast, who picked it up from the linguistic community where it occurred in, afaict, the keynote address of a 2005 conference on ergativity.</p></li>
</ul>
</p>
<p>
The point about morphemes is that they are just one way of thinking about how word forms arise. In fact, they are an unfalsifiable way to think about it: anything that language does, one can invent a way to describe using morphemes (ever hear of a <a href="https://en.wikipedia.org/wiki/Disfix">subtractive morpheme</a>?). That <i>might</i> be okay if you're trying to make sense of the overwhelming welter of natural languages, but that doesn't make it a good way to <i>construct</i> a language; at least, not a naturalistic <a href="https://en.wikibooks.org/wiki/Conlang/Appendix/Glossary#;artlang">artlang</a>. If you try to reverse-engineer morphemic analysis to construct a language, you'll end up with a conlang whose internal structure is the sort of thing most naturally described by morphemic analysis. It's liable to feel artificial; which, from a linguistic perspective, may suggest that morphemic analysis isn't what we're doing when we use language.
</p>
<p>
There are a couple of alternatives available to morphemic analysis. There's <i>lexeme-based morphology</i>; that's where you start with a "stem" and perform various processes on it to get its different word forms. For a noun that's called <i>declining</i>, for a verb it's <i>conjugating</i>. The whole collection of all the forms of the word is called a <i>lexeme</i>; yes, that's another <i>-eme</i> word for an abstract entity defined by some deeper structure (like a <i>phoneme</i>, abstracted from a set of different <i>allophones</i> that are all effectively the same sound in the particular language being studied; or a <i>morpheme</i> that is considered a single abstract entity although it may take different concrete forms in particular words, like English plural <i>-s</i> versus <i>-es</i> that might be analyzed as two <i>allomorphs</i> of the same morpheme; though afaik there's no <i>allo-</i> term corresponding to <i>lexeme</i>). The third major alternative approach is <i>word-based morphology</i>, in which the whole collection of word forms is considered as a set, rather than trying to view it as a bunch of different transformations applied to a stem. For naturalistic artlanging — or for linguistic experimentation — word-based morphology has the advantage that it doesn't try to artificially impose some particular sort of internal structure onto the word; but then again, not all natlangs are equally chaotic. For example, morpheme-based morphology is more likely to make sense if you're studying an <a href="https://en.wikibooks.org/wiki/Conlang/Appendix/Glossary#;agglutinative">agglutinative</a> language (which prefers to simply concatenate word parts, each with a single meaning), while becoming more difficult to apply with a <a href="https://en.wikibooks.org/wiki/Conlang/Appendix/Glossary#;fusional">fusional</a> language (where a single word part can combine a whole bunch of meanings, like a Latin noun suffix for a particular combination of gender, case, and number).
</p>
<p>
As I've tried to increase my sophistication as a conlanger, more and more I've come up against things for which discourse is recommended. But, I perceive a practical problem here. Heavyweight conlangers tend to be serious <a href="https://en.wikipedia.org/wiki/Polyglotism">polyglots</a>. Such people tend to treat learning a new language as something done relatively casually. <i>Study the use of this feature in longer discourses</i>, such a person might suggest, <i>to get a feel for how it works</i>. But to do that properly, it seems you'd have to reach a fairly solid level of comfort in the language. Not everyone will find that a small investment. And if you want to explore lots of different ways of doing things, multiplying by a large investment for each one may be prohibitive.
</p>
<p>
So one would really like to have a method for studying the discursive properties of different language arrangements without having to acquire expensive fluency in each language variant first. Keeping in mind, it's not all that easy even to give <i>one</i> really extended example of discourse, nor to explain to the reader what they ought to be noticing about it.
</p>
<p>
Okay, so, what's the story with ergativity? Here's how the 2005 paper explains the general concern:
<blockquote>
when we limit a collection to certain kinds of specimens, there is a question whether a workshop on "ergativity" is analogous to an effort to collect, say, birds with talons — an important taxonomic criterion —, birds that swim — which is taxonomically only marginally relevant, but a very significant functional pattern —, or, say, birds that are blue, which will turn out to be pretty much a useless criterion for any biological purpose.
<blockquote>
— <a href="http://pages.uoregon.edu/delancey/">Scott Delancey</a>, "<a href="http://celia.cnrs.fr/FichExt/Documents%20de%20travail/Ergativite/3dDelancey.htm">The Blue Bird of Ergativity</a>", 2005.
</blockquote>
</blockquote>
The paper goes on to discuss particular instances of ergativity in languages, and the sense I got in reading these discussions was (likely as the author intended) that what was going on in these languages was <i>really idiosyncratic</i>, and calling it "ergativity" didn't begin to do it justice.
</p>
<p>
Now, another thing often mentioned by conlangers in the next breath after ergativity is <i>trigger languages</i>. Trigger languages are yet another way of marking the relations between a verb and its arguments, different again from nominative-accusative languages (like English) or ergative-absolutive languages (like Basque). There's a catch, though. Trigger languages are a somewhat accidental invention of conlangers. They were meant to be a simplification of <a href="https://en.wikipedia.org/wiki/Austronesian_alignment">Austronesian alignment</a> — but (a) there may have been some misunderstanding of what linguists were saying about how Austronesian alignment works, and (b) linguists are evidently still trying to figure out how Austronesian alignment works. From poking around, the sense I got was that Austronesian alignment is only superficially about the relation of the verb to its arguments, really it's about some other property of nouns — specificity? — and, ultimately, one really ought to... study its use in longer discourses, to get a feel for how it works. Doh.
</p>
<p>
Another case where exploratory discourse is especially recommended is <i><a href="http://conlangery.com/2011/08/15/conlangery-11-nonconfigurationality/">nonconfigurationality</a></i>. Simply put, it's the property of some languages that word order isn't needed to indicate the grammatical relationships of words within a sentence, so that word order within a sentence can be used to indicate other things — such as discursive structure. Here again, though, there's a catch. There's a distinct difference between "configurational" languages like English, where word order is important in determining the roles of words in a sentence (the classic example is <i>dog bites man</i> versus <i>man bites dog</i>), and "nonconfigurational" languages like ancient Greek (or conlang <a href="https://en.wikibooks.org/wiki/Conlang/Appendix/Glossary#;Na'vi">Na'vi</a>), where word order within a given sentence is mostly arbitrary. However, the massively polysyllabic terms for these phenomena, "configurational" and "nonconfigurational", come specifically from the phrase-structure school of linguistics. That's, more-or-less, the Chomskyists. Plenty of structural assumptions there, with a side order of controversy. So, just as you take on more conceptual baggage by saying "<a href="https://en.wikibooks.org/wiki/Conlang/Appendix/Glossary#;bound_morpheme">bound morpheme</a>" than "<a href="https://en.wikibooks.org/wiki/Conlang/Appendix/Glossary#;inflection">inflection</a>", you take on more conceptual baggage by saying "nonconfigurational" than "free word order".
</p>
<p>
Is there another way of looking at "nonconfigurationality", without the phrase-structure? The Wikipedia article on nonconfigurationality notes that in <i>dependency grammar</i>, the configurational/‌nonconfigurational distinction is meaningless. One thing about Wikipedia, though: more subtle than merely "don't take it as gospel", you should think about the people who wrote what you're reading. In this case, some digging turns up a remark in the archives of Wikipedia's Linguistics project, to the effect that, we really appreciate your contributions, but please try for a more balanced presentation of different points of view, as for example in the nonconfigurationality article you spend more time talking about how dependency grammar doesn't like it than actually talking about the thing itself.
</p>
<p>
It seems Chomskyism is not the only linguistic school that has its enthusiasts.
</p>
<p>
The thought did cross my mind, at this point, that if dependency grammar fails to acknowledge the existence of a manifest distinction between configurational and nonconfigurational languages, that's not really something for dependency grammar to <i>brag</i> about. (Granting, this failure-to-acknowledge may be based on a narrower interpretation of "nonconfigurationality" than the phrase-structurists actually had in mind.)
</p>
<p>
In reading up on morpheme-<i>like</i> concepts, I came across the term <i>phonaestheme</i> — which caught my attention since I was aware <a href="https://en.wikibooks.org/wiki/Conlang/Appendix/Glossary#;J.R.R._Tolkien">J.R.R. Tolkien</a>'s approach to language, both natural and constructed, emphasized <i><a href="https://en.wikipedia.org/wiki/Phonaesthetics">phonaesthetics</a></i>. A phonaestheme is a bit of the form of a word that's suggestive of the word's meaning, without actually being a "unit of meaning" as a morpheme would be; that is, a word isn't composed of phonaesthemes, it just might happen to contain one or more of them. Notable phonaesthemes are <i>gl-</i> for words related to light or vision, and <i>sn-</i> for words related to the nose or mouth. Those two, so we're <a href="https://www.academia.edu/17549917/Phonesthemes_in_Latin_Language_2015_">told</a>, were mentioned two and a half millennia ago, by Plato.
</p>
<p>
The whole idea of phonaesthemes flies in the face of the principle of the <i>arbitrariness of signs</i>. More competing schools of thought. This is a pretty straightforward disagreement: Swiss linguist Ferdinand de Saussure, 1857–1913, apparently put great importance on the principle that the choice of a sign is completely arbitrary, absolutely unrelated to its meaning; obviously, the idea that words of certain forms tend to have certain kinds of meanings is not consistent with that.
</p>
<p>
That name, Ferdinand de Saussure, sounded familiar to me. Seems he was hugely influential in setting the course for twentieth-century linguistics, and he's considered the co-founder, along with C.S. Peirce, of semiotics — the theory of signs.
</p>
<p>
Semiotics definitely rang a bell for me. Not a particularly harmonious one, alas; my past encounters with semiotics had not been altogether pleasant.
</p>
<span style="font-size: large;" id="sec-dm-acad">Academe</span>
<p>
Back around 1990, when I first started thinking about <a href="https://fexpr.blogspot.com/2013/12/abstractive-power.html">abstraction theory</a>, I did some poking around to get a broad sense of who in history might have done similar work. There being no systematic database of such things (afaik) on the pre-web internet, I did my general poking around in a hardcopy Encyclopædia Britannica. Other than some logical terminology to do with defining sets, and specialized use of the term <i>abstraction</i> for function construction in λ-calculus, I found an interesting remark that (as best I can now recall) while Scholasticism, the dominant academic tradition in Europe during the Dark Ages, was mostly concerned with theological questions, one (or did the author claim it was <i>the</i> one?) non-theological question Scholastics extensively debated was the existence of universals — what I would tend to call "abstractions". There were three schools of thought on the question of universals. One school of thought said universals have real existence, perhaps even more real than the mundane world we live in; that's called Platonism, after Plato who (at least if we're not <a href="https://en.wikipedia.org/wiki/Eric_A._Havelock#Preface_to_Plato">misreading him</a>) advocated it. A second school of thought said universals have no real existence, but are just names for grouping things together. That's called <i>nominalism</i>; perhaps nobody believes it in quite the most extreme imaginable sense, but a particularly prominent representative of that school was William of Ockham (after whom <a href="https://en.wikipedia.org/wiki/Occam%27s_razor">Occam's razor</a> is named). In between these two extremes was the school of <i>conceptualism</i>, saying universals exist, but only as concepts; John Locke, who wrote <i>An Essay Concerning Human Understanding</i> (quoted at the beginning of the <a href="http://www.catb.org/jargon/html/W/Wizard-Book.html">Wizard Book</a> for its definition of abstraction), is cited as representative.
</p>
<p>
That bit of esoterica didn't directly help me with abstraction theory. Many years later, though, in researching W.V. Quine's dictum <i>To be is to be the value of a variable</i> (which I'd been told was the origin of Christopher Strachey's notion of first-class value), when I read a claim by Quine that the three early-twentieth-century schools of thought on the foundations of mathematics — logicism, intuitionism, and formalism — were latter-day manifestations of the three medieval schools of thought on universals, I was rather bemused to realize I understood what he was saying.
</p>
<p>
I kept hoping, though, I'd find some serious modern research relevant to what I was trying to do with abstraction theory. So I expanded the scope of my literature search to my alma mater's university library, and was momentarily thrilled to find references to treatment of <i>semiotics</i> (I'd never heard the term before) in terms of sets of texts, which sounded a little like what I was doing. It took me, iirc, one afternoon to be disillusioned. Moving from book to book in the stacks, I gathered that the central figure in the subject in modern times was someone (whom I'd also never heard of before) by the name of <a href="https://en.wikipedia.org/wiki/Jacques_Derrida">Jacques Derrida</a>. But it also became very clear to me that the material was coming across to me as meaningless nonsense — suggesting that either the material was so alien it might as well have been ancient Greek (I hadn't actually learned the term <i>nonconfigurational</i> at that point, but yes, same language), or else that the material was, in fact, meaningless nonsense.
</p>
<p>
The modern growth of the internet, all of which has happened since my first literature searches on that subject, doesn't necessarily improve uniformly on what could be done by searching off-line through physical stacks of books and journals in a really good academic library (indeed, it may be inferior in some important ways), but what is freely available on-line can be found with a lot less effort (if you can devise the right keywords to search for; which was less of a problem for off-line searches before the old physical card catalogs were destroyed — but I digress). Turns out I'm not alone in my reaction to Derrida; here are some choice quotes about him from <a href="https://en.wikiquote.org/wiki/Jacques_Derrida#Quotes_about_Derrida">Wikiquote</a>:
<blockquote>
Derrida's special significance lies not in the fact that he was subversive, but in the fact that he was an outright intellectual fraud — and that he managed to dupe a startling number of highly educated people into believing that he was onto something.
<blockquote>
— Mark Goldblatt, "The Derrida Achievement," <i>The American Spectator</i>, 14 October 2004.
</blockquote>
Those who hurled themselves after Derrida were not the most sophisticated but the most pretentious, and least creative members of my generation of academics.
<blockquote>
— Camille Paglia, "Junk Bonds and Corporate Raiders : Academe in the Hour of the Wolf", <i>Arion</i>, Spring 1991.
</blockquote>
Many French philosophers see in M. Derrida only cause for silent embarrassment, his antics having contributed significantly to the widespread impression that contemporary French philosophy is little more than an object of ridicule. M. Derrida's voluminous writings in our view stretch the normal forms of academic scholarship beyond recognition. Above all — as every reader can very easily establish for himself (and for this purpose any page will do) — his works employ a written style that defies comprehension.
<blockquote>
— Barry Smith et al. "Open letter against Derrida receiving an honorary doctorate from Cambridge University", <i>The Times</i> (London), Saturday, May 9, 1992.
</blockquote>
</blockquote>
This is the intellectual climate in which, in the 1990s, physicist Alan D. Sokal submitted a nonsense article to the peer-reviewed scholarly journal <i><a href="https://en.wikipedia.org/wiki/Social_Text">Social Text</a></i>, to see what would happen — and it was published (<a href="http://www.physics.nyu.edu/faculty/sokal/lingua_franca_v4/lingua_franca_v4.html">link</a>).
</p>
<p>
One might ask what academic tradition (if any) Derrida's work came from. Derrida references Saussure. Derrida's approach is sometimes called <i>post-structuralism</i>, as it critiques the <i>structuralist</i> tradition of the earlier twentieth century. Structuralism, I gather, said that the relation between the physical world and the world of ideas must be mediated by the structures of language. (In describing post-structuralism one may cover a multitude of sins with a delicate term like "critique", such as denying that the gap between reality and ideas <i>can</i> be bridged, or denying that there is such a thing as reality.) Structuralism, in turn, grew out of <i>structural linguistics</i>, the theory that language could be understood as a hierarchy of discrete structures — phonemes, morphemes, lexical categories, and so on. Structural linguistics is due in significant part to Ferdinand de Saussure.
</p>
<p>
It doesn't seem fair to blame Saussure for Derrida. Apparently a large part of all twentieth-century linguistic theory traces back through Saussure. Saussure's tidily structured approach to linguistics does appear to have led to both the Chomskyist <i>and</i> (rather less directly, afaict) the dependency grammar approaches — the phrase-structure approach is also called <i>constituency grammar</i> to contrast with dependency, as the key difference is whether one looks at the parts (constituents) or the connections (dependencies). Despite my suspicion that both of those approaches may have inherited some over-tidiness, I'm not inclined to "blame" Saussure for them, either; it seems to me perfectly possible that the structural strategy may have been a good way to move things forward in its day, and <i>also not</i> be a good way to move things forward from where we are now. The practical question is, where-to next?
</p>
<p>
That term <i>phonaestheme</i>, which reminded me of <i>phonaesthetics</i> associated with J.R.R. Tolkien? Turns out <i>phonaestheme</i> was coined by J.R. Firth, an English contemporary of J.R.R. Tolkien. Firth was big on the importance of context. "You shall know a word by the company it keeps", he's quoted from 1957. Apparently he favored "polysystematism", which afaict means that you don't limit yourself to just one structural system for studying language, but switch between systems opportunistically. Since that's pretty much what a conlanger has to do — whatever works — I rather like the attitude. "His work on prosody," says <a href="https://en.wikipedia.org/wiki/John_Rupert_Firth">Wikipedia</a> somewhat over-sagaciously, "[...] he emphasised at the expense of the phonemic principle". It took me a few moments to untangle that; it says a lot less than it seems; as best I can figure, <i>prosody</i> is aspects of speech sound that extend beyond individual sound units (phonemes), and the <i>phonemic principle</i> basically says all you need are phonemes, i.e., you don't need prosody. So... he emphasized prosody at the expense of the principle that you don't need prosody? Doesn't sound as impressive, somehow. Unsurprisingly, Saussure comes up in the early history of the idea of phonemes.
</p>
<p>
In hunting around for stuff about discourse, I've been aware for a while there's another whole family of approaches to grammar called <a href="https://en.wikipedia.org/wiki/Functional_theories_of_grammar">functional grammar</a> — as opposed to structural grammar. So the whole constituent/‌dependency thing is all structural, and this is off in a different world altogether. Words are considered for their purpose, which naturally puts a lot of attention on discourse because a lot of the purpose of words has to do with how they fit into their larger context (that's why the advice to consider discourses in the first place). There are a bunch of different flavors of functional grammar, including one — <i>systemic functional grammar</i> — due to Firth's student <a href="https://en.wikipedia.org/wiki/Michael_Halliday">Michael Halliday</a>; Wikipedia notes that Halliday approaches language as a semiotic system, and lists amongst the influences on systemic functional grammar both Firth and Saussure. (Oh what a tangled web we weave...) I keep hoping to find, somewhere in this tangle, a huge improvement on the traditional —evidently Saussurean— approach to grammar/‌linguistics I'm more-or-less familiar with. Alas, I keep being disappointed to find alien vocabulary and alien concepts, and keep gradually coming to suspect that a lot of what the structural approach can do well (there <i>are</i> things it does well) has been either sidelined or simply abandoned, while at the same time the terminology has been changed more than necessary, for the sake of being different.
</p>
<p>
It's a corollary to the way new scientific paradigms seek to gain dominance (somewhat relevant previous post: <a href="https://fexpr.blogspot.com/2011/03/memetic-organisms.html">yonder</a>) that the new paradigm will always change more than it needs to. A new paradigm will not succeed if it tries merely to improve those things about the old paradigm that need improvement. Normal science gets its effectiveness from the fact that normal scientists don't have to spend time and effort defending their paradigm, so they can put all that energy into working <i>within</i> the paradigm, and thereby make rapid progress at exploring that paradigm's space of possible research. Eventually this leads to clear recognition of the inadequacies of the paradigm; but even then, many folks will stick to the old paradigm, and we probably shouldn't think too poorly of them for doing so, even though we might think they're being shortsighted in the particular case. Determination to make one or another paradigm work is the wind in science's sails. But, exactly because abandoning the old paradigm for a new one is so traumatic, nobody's going to want to do it for a small reason. And those who do want to do it are likely to want to disassociate themselves from the old paradigm entirely. That means changing <i>way more than necessary</i>. Change for its own sake, far in excess of what was really needed to deal with the problems that precipitated the paradigm shift in the first place.
</p>
<p>
Another thread in the neighborhood of functional grammar is <i>emergent grammar</i>, a view of linguistic phenomena proposed in a 1987 paper by British-American linguist Paul Hopper. Looking over <a href="https://web.archive.org/web/20080918233214/http://home.eserver.org/hopper/emergence.html">that paper</a> gave me a better appreciation of the structuralism/‌functionalism schism as a struggle between rival paradigms. As Thomas Kuhn noted, rival paradigms aren't just alternative theories; they determine what entities there are, what questions are meaningful, what answers are meaningful — so followers of rival paradigms can fail to communicate by not even agreeing on what the subject of discussion is. Notably, even Hopper's definition of <i>discourse</i> isn't the same as what I thought I was dealing with when I started. My impression, starting after all with traditional (structural) grammar by default, was that discourse is above the level of a sentence; but for functional grammarians, to my understanding, that sentence boundary is itself artificial, and they'd object to making any such strong distinction between intra-sentence and inter-sentence. Hopper's paper is fairly willing to acknowledge that traditional grammatical notions aren't altogether illusions; its point is that they are only approximations of the pattern-matching reality assembled by language speakers, for whom the structure of language — abstract rules, idioms, literary allusions, whatever — is perpetually a work in progress.
</p>
<p>
Which sounds great... but, looking through some abstracts of more recent work in the emergent grammar tradition, one gets the impression that much of it amounts to "we don't yet have a clue how to actually do this". So once again, it seems there's more backing away from traditional structural grammar than replacing it; I've sympathy for their plight, as anyone trying to develop an alternative to a well-established paradigm is sure to have a less developed paradigm than their main competition, but that sympathy doesn't change my practical bottom line.
</p>
<p>
It was interesting to me, looking through Hopper's paper, that while much of it was quite accessible, the <i>examples of discourse</i> were not so much.
</p>
<span style="font-size: large;" id="sec-dm-gosh">Gosh</span>
<p>
Fascinating though the broad sweep of these shifting paradigmatic trends may be, it seems kind of like overkill. I do believe it's valuable <a href="https://fexpr.blogspot.com/2018/03/sapience-and-non-sapience.html">big picture</a>, but now that we've oriented to that big picture, it seems we ought to come down to Earth a bit if we're to deal with the immediate problem; I started just wanting a handy way to explore how different conlang features play out in extended discourse. As a conlanger I've neither been a great fan of linguistic universals (<a href="https://fexpr.blogspot.com/2014/01/lamlosuo.html">forsooth</a>), nor felt any burning need to overturn the whole concept of grammatical structure. As I've remarked <a href="https://fexpr.blogspot.com/2016/01/schools-of-artlanging.html">before</a>, a structural specification of a conlang is likely to <i>be</i> the conlang's primary identity; most conlangs don't have a lot of <a href="https://en.wikipedia.org/wiki/First_language">L1</a> speakers with which to do field interviews. Granting Kuhn's observation that a paradigm determines what questions and answers are possible, if a linguistic paradigm doesn't let me effectively answer the questions I need to answer to define my conlang, I won't be going in whole-hog for that linguistic paradigm.
</p>
<p>
Also, as remarked earlier, the various modern approaches — both structural and functional — <i>analyze</i> (natural) language, and there's no evident reason to suppose that running that analysis in reverse would make a good way to <i>construct</i> a language, certainly not if one hopes for a result that doesn't have the assumptions of that analysis built in.
</p>
<p>
So, for conlanging purposes, what would an ideal approach to language look like?
</p>
<p>
Well, it would be structural enough to afford a clear definition of the language. It would be <i>functional</i> enough to capture the more free-form aspects of discourse that intrude even on sentences in "configurational" languages like English. In all cases it would afford an easily accessible presentation of the language. Moreover, we would really like it to — if one can devise a way to achieve this — avoid pre-determining the range of ways the conlang could work. It might be possible, following a suitable structuralist paradigm, to reduce the act of building a language to a series of multiple-choice questions and some morpheme entries (or just tell the wizard to use a pseudo-random-number generator), but the result would not be <i>art</i>, just as paint-by-numbers isn't art; and, in direct proportion to its lack of artistry, it would lack value as an exploration of unconstrained language-space. For my part, I see this as an important and rather lovely insight: <b>the art of conlanging is potentially useful to the science of linguistics only to the extent that conlanging is an art rather than a science.</b>
</p>
<p>
The challenge has a definitional aspect and a descriptive aspect. One way to <i>define</i> how a language works is to give a classical structural specification. This can be relatively efficient and lucid, for the amount of complexity it can encompass. As folks such as Hopper point out, though, it misses a lot of things like idioms, and proverbs, and overall patterns of discourse. Not that they'd necessarily deny the classical structural description has some validity; it's just not absolute, nor complete. We'd like to be able to specify such linguistic patterns in a way that includes the more traditional ones and <i>also</i> includes all these other things, in a spectrum. Trouble is, we don't know how. One might try to do it by giving examples, and indeed with a sufficient amount of work that might more-or-less do the job; but then the descriptive aspect rears its head. Some of these patterns are apparently quite complicated and subtle, and by-example is quite a labor-intensive way to describe, <i>and</i> quite a labor-intensive way to learn, them. Insisting on both aspects at once, definitional and descriptive, isn't asking "too much", it's asking for what is actually needed for conlanging — which makes conlanging a much more practical forum for thrashing this stuff out; an academic discipline isn't likely to reject a paradigm on the grounds that it isn't sufficiently lucid for the lay public. The debatable academic merits of some occult theoretical approach to linguistics are irrelevant to whether an artlang's audience can understand it.
</p>
<p>
So what we're looking for is a lucid way to describe more-or-less-arbitrary patterns of the sort that make up language, ranging from ordinary sentence structure through large-scale discourse patterns and whatnot. Since large-scale discourse patterns are, afaict, already both furthest from lucidity and furthest from being covered by the traditional structural approach, they seem a likely place to start.
</p>
<span style="font-size: large;" id="sec-dm-easy">Easy as futhorc</span>
<p>
Let's take one of those extended examples that I found impenetrable in Hopper's paper. It's a passage from the <a href="https://en.wikisource.org/wiki/Anglo_Saxon_Chronicle_(A-Prime)">Anglo-Saxon Chronicle</a>, excerpted from the entry for Anno Domini 755; that's the first year that has a really lengthy report (it's several times the length of any earlier year). Here is the passage as rendered by Wikisource (as Hopper's paper did not fare well in conversion to html). It's rather sparsely punctuated; instead it's liberally sprinkled with the symbol ⁊, shorthand for "and" (even at junctures where there <i>is</i> a period). The alphabet used is a variant of Latin with several additional letters — æ and ð derived from Latin letters, þ and ƿ derived from futhorc runes (in which Anglo-Saxon had been written in earlier times; its first six runes are <!-- ᚠᚢᚦᚩᚱᚳ --> <b>f</b>eoh <b>u</b>r <b>þ</b>orn <b>o</b>s <b>r</b>ad <b>c</b>en — <i><a href="https://en.wikipedia.org/wiki/Anglo-Saxon_runes">futhorc</a></i>).
<blockquote>
⁊ þa geascode he þone cyning lytle werode on wifcyþþe on Merantune ⁊ hine þær berad ⁊ þone bur utan beeode ær hine þa men onfunden þe mid þam kyninge wærun
</blockquote>
The point Hopper is making about this passage has to do with the way its verbs and nouns are arranged, which wouldn't have to be arranged that way under a traditional structural description of the "rules" of Anglo-Saxon grammar. Truthfully, coming to it cold, I found his point fell completely flat, because only laborious scrutiny would allow me to even guess which of those words are verbs and which are nouns, let alone how the whole is put together. And that is the basic problem, right there: the pattern meant to be illustrated can't be seen without first achieving a level of comfort with the language that may be expensive. If, moreover, you want to consider a whole range of different ways of doing things (as I have sometimes wanted to do, in my own conlanging), the problem is greatly compounded.
</p>
<p>
Since Hopper's point involves logical decomposition of the passage into segments, he does so and sets next to each its translation (citing <a href="https://en.wikipedia.org/wiki/Charles_Plummer">Charles Plummer</a>'s 1899 translation); as Hopper's paper (at least, in html conversion) rather ran together each segment with its translation, making them hard to separate by eye, I've added tabular format:
<blockquote>
<table><tr><td><table border="1">
<tr>
<td bgcolor="lightblue" align=center><b>Anglo-Saxon</b></td>
<td bgcolor="lightblue" align=center><b>English</b></td>
</tr>
<tr>
<td align=center>⁊ þa geascode he þone cyning</td>
<td align=center>and then he found the king</td>
</tr>
<tr>
<td align=center>lytle werode</td>
<td align=center>with a small band of men</td>
</tr>
<tr>
<td align=center>on wifcyþþe</td>
<td align=center>a-wenching</td>
</tr>
<tr>
<td align=center>on Merantune</td>
<td align=center>in Merton</td>
</tr>
<tr>
<td align=center>⁊ hine þær berad</td>
<td align=center>and caught up with him there</td>
</tr>
<tr>
<td align=center>⁊ þone bur utan beeode</td>
<td align=center>and surrounded the hut outside</td>
</tr>
<tr>
<td align=center>ær hine þa men onfunden</td>
<td align=center>before the men were aware of him</td>
</tr>
<tr>
<td align=center>þe mid þam kyninge wærun</td>
<td align=center>who were with the king</td>
</tr>
</table></td></tr>
<tr><td align=center>Table 1.</td></tr></table>
</blockquote>
I looked at that and struggled to reason out which Anglo-Saxon word contributes what to each segment (and even then it was just a guess). The problem is further highlighted by Hopper's commentary, where he chooses to remark particularly on which bits are verb-initial and which are verb-final — as if I (his presumed interested, generally educated but lay, reader) could see at a glance which words are the verbs, or, as he may have supposed, just see at a glance the whole structure of the thing.
</p>
<p>
We can glimpse another part of the same <a href="https://en.wikipedia.org/wiki/Blind_men_and_an_elephant">elephant</a> from Tolkien's classic 1936 lecture "<a href="https://en.wikipedia.org/wiki/Beowulf:_The_Monsters_and_the_Critics"><i>Beowulf</i>: The Monsters and the Critics</a>", in which he promoted the wonderfully subversive position that <i>Beowulf</i> is beautiful poetry, not just a subject for dry academic scholarship. His lecture has been hugely influential ever since; but my point here is that he was one of those polyglots I was talking about earlier, and was able to appreciate the beauty of the poem <i>because he was fluent in Old English</i> (as well as quite a lot of other related languages, including, of all things, <a href="https://en.wikipedia.org/wiki/Gothic_language">Gothic</a>). I grok that such beauty is best appreciated from the inside; but it really is <i>difficult</i> for mere mortals to get inside like that. One suspects a shortfall of deep fluency even amongst the authors of academic treatises on <i>Beowulf</i> may have contributed significantly to the dryness Tolkien was criticizing. My concern here is that we want to be able to illustrate (and even investigate) facets of the structure of discourses without requiring <i>prior</i> fluency; if these illustrations also contribute to later fluency, that'd be wicked awesome.
</p>
<p>
The two problems with Table 1 are, evidently, that it doesn't show what's going on with the individual words, and that it doesn't show what's going on with the significant part (whatever that is) of the high-level structure. There's a standard technique meant to explain what the individual words are doing, <a href="https://en.wikipedia.org/wiki/Interlinear_gloss">glossing</a>. There are a couple of good reasons why one would not expect glossing to be a good fit here, but we need to start somewhere, so here's an attempt at an interlinear gloss for this passage:
<blockquote>
<table><tr><td><table border="1">
<tr>
<td align=left><table>
<tr>
<td>⁊</td>
<td>þa</td>
<td>geascode</td>
<td>he</td>
<td>þone</td>
<td>cyning</td>
</tr>
<tr>
<td>and<br> </td>
<td>then<br> </td>
<td>intensive-ask <br>3rd;sg;past</td>
<td>he<br>3rd;nom;sg </td>
<td>the<br>acc;msc;sg </td>
<td>king<br> </td>
</tr>
<tr>
<td>and </td>
<td>then </td>
<td>found</td>
<td>he</td>
<td>the</td>
<td>king</td>
</tr>
<!-- and then he found the king -->
</table></td>
</tr>
<tr>
<td align=left><table>
<tr>
<td>lytle</td>
<td>werode</td>
</tr>
<tr>
<td>small<br>instrumental;sg </td>
<td>troop<br>dative;sg</td>
</tr>
<tr>
<td>with a small</td>
<td>band of men</td>
</tr>
</table></td>
</tr>
<tr>
<td align=left><table>
<tr>
<td>on</td><td>wifcyþþe</td>
</tr>
<tr>
<td>on/at/about <br> </td>
<td>woman.knowledge<br>dative;sg</td>
</tr>
<tr>
<td>about</td><td>woman-knowledge</td>
</tr>
</table></td>
</tr>
<tr>
<td align=left><table>
<tr>
<td>on</td><td>Merantune</td>
</tr>
<tr>
<td>on/in/about<br> </td><td>Merton<br>dative</td>
</tr>
<tr>
<td>in</td><td>Merton</td>
</tr>
</table></td>
</tr>
<tr>
<td align=left><table>
<tr>
<td>⁊</td>
<td>hine</td>
<td>þær</td>
<td>berad</td>
</tr>
<tr>
<td>and<br> </td>
<td>he<br>acc;sg </td>
<td>there<br>adverb </td>
<td>catch.up.with.by.riding<br>3rd;sg;past</td>
</tr>
<tr>
<td>and </td>
<td>him</td>
<td>there</td>
<td>caught up with</td>
</tr>
</table></td>
</tr>
<tr>
<td align=left><table>
<tr>
<td>⁊</td>
<td>þone</td>
<td>bur</td>
<td>utan</td>
<td>beeode</td>
</tr>
<tr>
<td>and <br> </td>
<td>the<br>acc;msc;sg </td>
<td>room <br> </td>
<td>from.without <br>adverb</td>
<td>bego/surround<br>3rd;sg;past</td>
</tr>
<tr>
<td>and</td>
<td>the</td>
<td>hut</td>
<td>outside</td>
<td>surrounded</td>
</tr>
</table></td>
</tr>
<tr>
<td align=left><table>
<tr>
<td>ær</td>
<td>hine</td>
<td>þa</td>
<td>men</td>
<td>onfunden</td>
</tr>
<tr>
<td>before <br> </td>
<td>he<br>acc;sg </td>
<td>the<br>nom;pl </td>
<td>man<br>nom;pl </td>
<td>en-find<br>subjunctive;pl;past</td>
</tr>
<tr>
<td>before</td>
<td>him</td>
<td>the</td>
<td>men</td>
<td>became aware of</td>
</tr>
</table></td>
</tr>
<tr>
<td align=left><table>
<tr>
<td>þe</td>
<td>mid</td>
<td>þam</td>
<td>kyninge</td>
<td>wærun</td>
</tr>
<tr>
<td>who/which/that <br> </td>
<td>with <br> </td>
<td>the/that<br>dative </td>
<td>king<br>dative;sg </td>
<td>be<br>pl;past</td>
</tr>
<tr>
<td>who</td>
<td>with</td>
<td>the</td>
<td>king</td>
<td>were</td>
</tr>
</table></td>
</tr>
</table></td></tr>
<tr><td align=center>Table 2.</td></tr></table>
</blockquote>
Okay. Those two reasons I had in mind, why glossing would not be a good fit here, are both in evidence. Basically we get simultaneously <i>too much</i> and <i>too little</i> information.
</p>
<p>
I remarked in a previous post that it's very easy for glossing to fail to communicate its information ("too little" information). I didn't "borrow" the above gloss from someone else's translation (though I did occasionally compare notes with <a href="https://trd55.wordpress.com/2013/09/25/translation-of-cynewulf-and-cyneheard-first-paragraph/">one</a>); I put it together word-by-word, and got far more out of <i>that</i> than is available in Table 2. The internal structures of some of those words are quite fascinating. Hopper was talking about verb-initial and verb-final clauses, and I was sidetracked by the fact that his English translations didn't preserve the positions of the verbs; I've tried to fix that in Table 2, by tying the translation more closely to the original; but I was also thrown off by the translation "a-wenching", because it gave me the impression that was a verb-based segment. I do like the translation <i>a-wenching</i> rather more than other translations I've found, as it doesn't beat around the bush; I also found <i>womanizing</i>, <i>with a woman</i>, and, just when I thought I'd seen it all, <i>visiting a lady</i>, which forcefully reminded me of Malory's <i><a href="https://en.wikipedia.org/wiki/Le_Morte_d%27Arthur">Le Morte Darthur</a></i>. The original is a prepositional phrase, with preposition <i>on</i> and object of the preposition <i>wifcyþþe</i>.
</p>
<p>
I first consciously noticed about thirty years ago that prepositions are spectacularly difficult to translate between languages, an awareness that has shaped the directions of my conlanging ever since. <a href="https://en.wiktionary.org">Wiktionary</a> defines Old English (aka Anglo-Saxon) <i>on</i> as on/in/at/among. <i>wifcyþþe</i> is even more fun; it's not listed as a whole in Wiktionary, but an educated guess suggests it's a compound whose parts are individually listed — <i>wif</i>, woman/wife, and an inflected form of <i>cyþþu</i>, suggested definitions either knowledge, or homeland (country which is known to you). So the king was on about woman-knowledge. Silly me; I'd imagined that "biblical knowledge" thing was a euphemism for the sake of Victorian sensibilities, which perhaps it was in part, but the origin is at least a thousand years earlier and not apparently trying to spare anyone's sensibilities. It also doesn't involve any verb, so I adjusted the translation to reflect that.
</p>
<p>
The declension of <i>cyþþu</i> was rather illuminating, too. The Wiktionary entry is under <i>cyþþu</i> because that's the nominative singular. The declension table has eight entries; columns for singular and plural, rows for nominative, accusative, genitive, and dative; no separate row for the instrumental case, though instrumental does show up separately for some Old English nouns. But here's the kicker: <i>cyþþe</i> is listed in all the singular entries except nominative, and as an alternative for the plural nominative and accusative. I've listed it as dative singular, because in this context (as best I can figure) it has to be dative to be the object of the preposition, and as a dative it has to be singular, but that really isn't an intrinsic property of the word. It really seems very... emergent. This word is somewhere in an intermediate state between showing these different cases and not showing them. Putting it another way, the cases themselves are in an intermediate state of being: the "reality" of those cases in the language depends on the language caring about them, and evidently different parts of the language are having different ideas about how "real" they should be (in contrast to unambiguous, purely regular inflections in non-naturalistic conlangs such as Esperanto or, for that matter, my own prototype <a href="https://fexpr.blogspot.com/2014/01/lamlosuo.html">Lamlosuo</a>).
</p>
<p>
There's also rather more going on than the gloss can bring out in <i>geascode</i> and <i>berad</i>, which involve productive prefixes <i>ge-</i> and <i>be-</i> added to <i>ascian</i> and <i>ridan</i> — ge-ask = discover by asking/demanding (interrogating?), be-ride = catch up with by riding. All that, beneath the level of what the gloss brings out — as well as the difficulty the gloss has <i>bringing</i> it out. The gloss seems most suited to providing additional information when focusing in on what a particular word is doing within a particular small phrase; it can't show the backstory of the word ("too little" information) at the same time that it clutters any attempt to view a large passage ("too much" information; the sheer size of Table 2 underlines this point). Possibly, for this purpose, the separate line for the translation is largely redundant, and could be merged with the gloss to save space; but there's still too much detail there. The next step would be to omit some of the information about the inflections; but this raises the question of just which information about the words <i>does</i> matter for the sort of higher-level structure we're trying to get at.
</p>
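<p>
Incidentally, the purely mechanical part of such a presentation (lining each word up with its annotations) is easy to automate, which is some comfort given how much manual judgment goes into the annotations themselves. Here's a minimal sketch of the alignment step, in Python (my choice of tooling, nothing canonical about it); it assumes every row has the same number of cells and that the font treats each character as single-width:
</p>
<blockquote><pre>
def interlinear(rows):
    """Render parallel lines (original, gloss, translation, ...) as
    word-aligned monospace columns; each row is a list of cells."""
    widths = [max(len(row[i]) for row in rows) for i in range(len(rows[0]))]
    return "\n".join(
        "  ".join(cell.ljust(w) for cell, w in zip(row, widths))
        for row in rows)

print(interlinear([
    ["⁊",   "þa",   "geascode", "he", "þone", "cyning"],
    ["and", "then", "ge-asked", "he", "the",  "king"],
]))
</pre></blockquote>
<p>
The hard part, of course, is deciding what goes in the rows; which is exactly the question, just raised, of which information about the words matters.
</p>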
<p>
Here's a compactified form based on the gloss, merging the gloss with the translation and omitting <i>most</i> of the grammatical notes.
<blockquote>
<table><tr><td><table border="1">
<tr>
<td align=left><table>
<tr>
<td>⁊</td>
<td>þa</td>
<td>geascode</td>
<td>he</td>
<td>þone</td>
<td>cyning</td>
</tr>
<tr>
<td>and<br> </td>
<td>then<br> </td>
<td>ge-asked <br>(found)</td>
<td>he<br>(nominative) </td>
<td>the<br>(accusative) </td>
<td>king<br> </td>
</tr>
<!-- and then he found the king -->
</table></td>
</tr>
<tr>
<td align=left><table>
<tr>
<td>lytle</td>
<td>werode</td>
</tr>
<tr>
<td>with a small<br>(instrumental) </td>
<td>band of men<br>(dative)</td>
</tr>
</table></td>
</tr>
<tr>
<td align=left><table>
<tr>
<td>on</td><td>wifcyþþe</td>
</tr>
<tr>
<td>on/about <br> </td>
<td>woman.knowledge<br>(dative; wenching)</td>
</tr>
</table></td>
</tr>
<tr>
<td align=left><table>
<tr>
<td>on</td><td>Merantune</td>
</tr>
<tr>
<td>in <br> </td><td>Merton<br>(dative)</td>
</tr>
</table></td>
</tr>
<tr>
<td align=left><table>
<tr>
<td>⁊</td>
<td>hine</td>
<td>þær</td>
<td>berad</td>
</tr>
<tr>
<td>and<br> </td>
<td>him<br>(accusative) </td>
<td>there<br>(adverb) </td>
<td>be-rode<br>(caught up with)</td>
</tr>
</table></td>
</tr>
<tr>
<td align=left><table>
<tr>
<td>⁊</td>
<td>þone</td>
<td>bur</td>
<td>utan</td>
<td>beeode</td>
</tr>
<tr>
<td>and <br> </td>
<td>the<br>(accusative) </td>
<td>hut <br> </td>
<td>outside<br>(adverb) </td>
<td>be-goed<br>(surrounded)</td>
</tr>
</table></td>
</tr>
<tr>
<td align=left><table>
<tr>
<td>ær</td>
<td>hine</td>
<td>þa men</td>
<td>onfunden</td>
</tr>
<tr>
<td>before <br> </td>
<td>him<br>(accusative) </td>
<td>the men<br>(nominative) </td>
<td>en-found<br>(noticed)</td>
</tr>
</table></td>
</tr>
<tr>
<td align=left><table>
<tr>
<td>þe</td>
<td>mid</td>
<td>þam</td>
<td>kyninge</td>
<td>wærun</td>
</tr>
<tr>
<td>who <br> </td>
<td>with <br> </td>
<td>the<br>(dative) </td>
<td>king<br>(dative) </td>
<td>were<br> </td>
</tr>
</table></td>
</tr>
</table></td></tr>
<tr><td align=center>Table 3.</td></tr></table>
</blockquote>
Imho this is better, bringing out a bit more of the most important low-level information, less of the dispensable low-level clutter, and perhaps leaving more opportunity for glimpses of high-level structure. In this particular case, since the higher-level structure Hopper wants to bring out is simply where the verbs are, one might do that by putting the verbs in <b>boldface</b>, thus:
<blockquote>
<table><tr><td><table border="1">
<tr>
<td align=left><table>
<tr>
<td>⁊</td>
<td>þa</td>
<td><b>geascode</b></td>
<td>he</td>
<td>þone</td>
<td>cyning</td>
</tr>
<tr>
<td>and<br> </td>
<td>then<br> </td>
<td><b>ge-asked <br>(found)</b></td>
<td>he<br>(nominative) </td>
<td>the<br>(accusative) </td>
<td>king<br> </td>
</tr>
<!-- and then he found the king -->
</table></td>
</tr>
<tr>
<td align=left><table>
<tr>
<td>lytle</td>
<td>werode</td>
</tr>
<tr>
<td>with a small<br>(instrumental) </td>
<td>band of men<br>(dative)</td>
</tr>
</table></td>
</tr>
<tr>
<td align=left><table>
<tr>
<td>on</td><td>wifcyþþe</td>
</tr>
<tr>
<td>on/about <br> </td>
<td>woman.knowledge<br>(dative; wenching)</td>
</tr>
</table></td>
</tr>
<tr>
<td align=left><table>
<tr>
<td>on</td><td>Merantune</td>
</tr>
<tr>
<td>in <br> </td><td>Merton<br>(dative)</td>
</tr>
</table></td>
</tr>
<tr>
<td align=right><table>
<tr>
<td>⁊</td>
<td>hine</td>
<td>þær</td>
<td><b>berad</b></td>
</tr>
<tr>
<td>and<br> </td>
<td>him<br>(accusative) </td>
<td>there<br>(adverb) </td>
<td><b>be-rode<br>(caught up with)</b></td>
</tr>
</table></td>
</tr>
<tr>
<td align=right><table>
<tr>
<td>⁊</td>
<td>þone</td>
<td>bur</td>
<td>utan</td>
<td><b>beeode</b></td>
</tr>
<tr>
<td>and <br> </td>
<td>the<br>(accusative) </td>
<td>hut <br> </td>
<td>outside<br>(adverb) </td>
<td><b>be-goed<br>(surrounded)</b></td>
</tr>
</table></td>
</tr>
<tr>
<td align=right><table>
<tr>
<td>ær</td>
<td>hine</td>
<td>þa men</td>
<td><b>onfunden</b></td>
</tr>
<tr>
<td>before <br> </td>
<td>him<br>(accusative) </td>
<td>the men<br>(nominative) </td>
<td><b>en-found<br>(noticed)</b></td>
</tr>
</table></td>
</tr>
<tr>
<td align=right><table>
<tr>
<td>þe</td>
<td>mid</td>
<td>þam</td>
<td>kyninge</td>
<td><b>wærun</b></td>
</tr>
<tr>
<td>who <br> </td>
<td>with <br> </td>
<td>the<br>(dative) </td>
<td>king<br>(dative) </td>
<td><b>were<br> </b></td>
</tr>
</table></td>
</tr>
</table></td></tr>
<tr><td align=center>Table 4.</td></tr></table>
</blockquote>
Hopper's point is, broadly, that this follows the pattern of "a verb-initial clause, usually preceded by a temporal adverb such as a 'then'; [...] [which] may contain a number of lexical nouns introducing circumstances and participants [...] followed by a succession of verb-final clauses". And indeed, we can now see that that's what's going on here.
</p>
<p>
The technique used by Table 4, with some success, also has a couple of limitations. (1) It is specific to this one type of structure, with no apparent generalization. (2) It appears to be a means only for showing the reader a pattern that the linguist already recognizes, rather than for the linguist to discover patterns, or, even more insightfully, for the linguist to experiment with how the high-level dynamics would be changed by an alteration in the rules of the language. Are those other things too much to ask? Heck no. Ask, otherwise ye should expect not to receive.
</p>
<span style="font-size: large;" id="sec-dm-rev">Revalency</span>
<p>
For a second case study to move things forward, I have in mind something qualitatively different; not a single extended passage with some known property(-ies) to be conveyed to the reader, but a battery of examples exploring different ways to arrange a language, meant to be exhaustive within a limited range of options. It is, in fact, a variant of the case study that set me on the road to the discourse-representation problem. About ten years ago, after first encountering David J. Peterson's essay on <a href="http://dedalvs.conlang.org/notes/ergativity.php#ergativity">Ergativity</a>, I dreamed up a verb alignment scheme, alternative to nominative-accusative (NA) or ergative-absolutive (EA), called <i>valent-revalent</i> (VR), and was curious enough to try to get a handle on it by an in-depth systematic comparison with NA and EA. The attempt was both a success and a failure. I learned some interesting things about VR that were not at all apparent to start with, but I'm unsure how far to credit the lucidity of the presentation — by which we want to elucidate things for both the conlanger and, hopefully, their audience — for those insights; it seems to some extent I learned those things by immersing myself in the act of <i>producing</i> the presentation. I also came away from it with a feeling of artificiality about VR, but it's taken me years to work out why; and in the long run I didn't stay satisfied with the way I'd explored the comparison between the systems, which is part of why I'm writing this blog post now.
</p>
<p>
First of all, we need to choose what form our illustrations will take — that is, we have to choose our example "language". Peterson's essay defines a toy conlang — Ergato — with only a few words and suffixes so that simply working through the examples, with a wide variety of different ways for the grammar to work, is enough to confer familiarity. I liked his essay and imitated it, using a subset of Ergato for an even smaller language, to illustrate just the specific issues I was interested in. Another alternative, since we're trying to explore the structure itself, might be to use pseudo-English with notes, like the translational gloss in the previous section but without the Old English at the top. Some objections come to mind, though pseudo-English is well worth keeping handy in the toolkit. The pseudo-English may be distracting; Ergato is, gently, more immersive. The pseudo-English representation would be less compact than Ergato. And a micro-conlang like Ergato has more of the <i>fun</i> of conlanging in it.
</p>
<p>
The basic elements of reduced Ergato:
</p>
<blockquote>
<p>
<table>
<tr>
<th>Verbs</th>
<td width=1%></td>
<th>Nouns</th>
<td width=1%></td>
<th>Pronoun</th>
</tr>
<tr>
<td rowspan=4 valign=top>
<table border>
<tr>
<th> English </th>
<th> Ergato </th>
</tr>
<tr>
<th>
<em>to sleep</em><br>
<em>to pet</em><br>
<em>to give</em>
</th>
<th>
sapu<br>
lamu<br>
kanu
</th>
</tr>
</table>
</td>
<td rowspan=4></td>
<td rowspan=4 valign=top>
<table border>
<tr>
<th> English </th>
<th> Ergato </th>
</tr>
<tr>
<th>
<em>panda</em><br>
<em>woman</em><br>
<em>book</em><br>
<em>man</em><br>
<em>fish</em>
</th>
<th>
palino<br>
kelina<br>
kitapo<br>
hopoko<br>
tanaki
</th>
</tr>
</table>
</td>
<td rowspan=4></td>
<td rowspan=1 valign=top>
<table border>
<tr>
<th> English </th>
<th> Ergato </th>
</tr>
<tr>
<th>
<em>she</em>
</th>
<th>
li
</th>
</tr>
</table>
</td>
</tr>
<tr><td> </td></tr>
<tr>
<th>Conjunction</th>
</tr>
<tr>
<td>
<table border>
<tr>
<th> English </th>
<th> Ergato </th>
</tr>
<tr>
<th>
<em>and</em>
</th>
<th>
i
</th>
</tr>
</table>
</td>
</tr>
</table>
</p>
<p>
<table>
<tr>
<th>Suffixes</th>
<td width=1%></td>
<th>Suffixes</th>
</tr>
<tr>
<td valign=top>
<table border>
<tr>
<th> English </th>
<th> Ergato </th>
</tr>
<tr>
<th>
Valency Reduction <br>
Past Tense<br>
Plural<br>
</th>
<th>
-to<br>
-ri<br>
-ne<br>
</th>
</tr>
</table>
</td>
<td></td>
<td valign=top>
<table border>
<tr>
<th> English </th>
<th> Ergato </th>
</tr>
<tr>
<th>
Default Case<br>
Special Case<br>
Recipient/Dative Case <br>
Oblique Case<br>
Extra Case
</th>
<th>
—<br>
-r<br>
-s<br>
-k<br>
-m
</th>
</tr>
</table>
</td>
</tr>
<tr><td colspan=3 align=center>Table 5.</td></tr>
</table>
</p>
</blockquote>
<p>
Peterson's essay had more verbs, especially, so he could explore various subtle semantic distinctions; for the structures I had in mind, I just chose one intransitive (sleep), one transitive (pet), and one ditransitive (give).
</p>
<p>
Quick review: NA and EA concern core thematic roles of noun arguments to a verb: S=subject, the single argument to an intransitive verb; A=agent, the actor argument to a transitive verb; P=patient, the acted-upon argument to a transitive verb. In pure NA and pure EA, two of the three core thematic roles share a case, while one is different; in pure NA, the odd case is accusative patient, the other two are nominative; in pure EA, the odd case is ergative agent, the other two are absolutive. There are other systems for aligning verb arguments, but I was, mostly, only looking at those two and VR.
</p>
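<p>
For what it's worth, the pure-system case assignment just reviewed is mechanical enough to state as a few lines of code; a throwaway sketch (in Python, with role and case names straight from the review above):
</p>
<blockquote><pre>
# Core-role cases under the two pure alignment systems.
# Roles: S = intransitive subject, A = agent, P = patient.
def core_case(role, system):
    if system == "NA":                    # nominative-accusative
        return "accusative" if role == "P" else "nominative"
    if system == "EA":                    # ergative-absolutive
        return "ergative" if role == "A" else "absolutive"
    raise ValueError("unknown system: " + system)

for system in ("NA", "EA"):
    print(system, {role: core_case(role, system) for role in "SAP"})
# NA {'S': 'nominative', 'A': 'nominative', 'P': 'accusative'}
# EA {'S': 'absolutive', 'A': 'ergative', 'P': 'absolutive'}
</pre></blockquote>
<p>
The two-roles-share-a-case-while-one-differs pattern is right there in the two conditionals.
</p>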
<p>
Word order was a question. Peterson remarks that he finds <a href="https://en.wikibooks.org/wiki/Conlang/Beginner/Grammar#Syntax">SOV</a> (subject object verb) most natural for an ergative language, and I find that too. (I'll have a suggestion as to why, a bit further below.) I'm less sure of my sense that SVO (subject verb object) is natural for a nominative language, because my native English is nominative and SVO, which might be biasing me (or, then again, the evolution of English might be biased in favor of SVO <i>because</i> of some sort of subtle affinity between SVO and nominativity). But I found verb-initial order (VSO) far the most natural arrangement for VR. So, when comparing these, should one use a single word order so as not to distract from the differences, or let each one put its best foot forward by using different orders for the three of them? I chose at the time to use verb-initial order for all three systems.
</p>
<p>
Okay, here's how VR works, in a nutshell (skipping some imho not-very-convincing suggestions about how it could have developed, <a href="https://en.wikibooks.org/wiki/Conlang/Types#Diachronic_languages">diachronically</a>); illustrations to follow. Argument alignment is by a combination of word order with occasional case-like marking. By default, all arguments have the unmarked case, called <i>valent</i>; the first argument is the subject/agent, the second is the patient, and if it's ditransitive the third is the recipient. Arguments can be omitted by simply leaving them off the end. If an argument is marked with suffix <b>-t</b>, it's in the <i>revalent</i> case, which means that an argument was omitted just before it; the omitted argument can be added back onto the end. To cover a situation that can only come up with a ditransitive verb, there's also a <i>double-revalent</i> case, marked by <b>-s</b>, that means two arguments were omitted. (The simplest, though not the only, reason VR prefers verb-initial order is that, in order to deduce the meaning of an argument from its VR marking, you have to already know the valency of the verb.)
</p>
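<p>
Since the VR rules are crisply mechanical (a tidiness that, as we'll see, comes back to haunt it), they can be captured in a small decoder. The queue-of-pending-roles formulation below is my own reading of the rules, a Python sketch rather than part of the scheme itself; it ignores the tense and plural suffixes, and assumes bare noun stems plus <b>-t</b>/<b>-s</b>:
</p>
<blockquote><pre>
from collections import deque

VERBS = {"sapu": 1, "lamu": 2, "kanu": 3}              # reduced-Ergato valencies
ROLES = {1: ["S"], 2: ["A", "P"], 3: ["A", "P", "R"]}  # subject/agent/patient/recipient

def decode_vr(sentence):
    """Map each noun of a verb-initial VR sentence to its thematic role."""
    verb, *nouns = sentence.rstrip(".").lower().split()
    queue = deque(ROLES[VERBS[verb]])   # roles pending assignment, in order
    roles = {}
    for noun in nouns:
        if noun.endswith("t"):          # revalent: a role was skipped here,
            queue.rotate(-1)            # ... so defer it to the end
            noun = noun[:-1]
        elif noun.endswith("s"):        # double-revalent: two roles skipped
            queue.rotate(-2)
            noun = noun[:-1]
        roles[noun] = queue.popleft()
    return roles

# "The book is being given by the woman to the panda."
print(decode_vr("Kanu kitapot kelinat palino."))
# -> {'kitapo': 'P', 'kelina': 'A', 'palino': 'R'}
</pre></blockquote>
<p>
Running this over the VR sentences in the tables below reproduces their English renderings, which is at least a check that I've stated the rules consistently.
</p>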
<p>
A first step is to illustrate the three systems side-by-side for ordinary sentences. To try to bring out the structure, such as it is, the suffixes are highlighted. This would come out better with a wider page; but we'd need a different format if there were more than three systems being illustrated, anyway.
<blockquote>
<p>The woman is sleeping.<br>
The woman is petting the panda.<br>
The woman is giving the book to the panda.</p>
<table>
<tr>
<th>NA</th>
<td width=1%></td>
<th>EA</th>
<td width=1%></td>
<th>VR</th>
</tr>
<tr>
<td valign=top>
<table border>
<tr><td align=left>Sapu kelina.</td></tr>
<tr><td align=left>Lamu kelina palino<b style="color:red;">r</b>.</td></tr>
<tr><td align=left>Kanu kelina kitapo<b style="color:red;">r</b> palino<b style="color:red;">s</b>.</td></tr>
</table>
</td>
<td></td>
<td valign=top>
<table border>
<tr><td align=left>Sapu kelina.</td></tr>
<tr><td align=left>Lamu kelina<b style="color:red;">r</b> palino.</td></tr>
<tr><td align=left>Kanu kelina<b style="color:red;">r</b> kitapo palino<b style="color:red;">s</b>.</td></tr>
</table>
</td>
<td></td>
<td valign=top>
<table border>
<tr><td align=left>Sapu kelina.</td></tr>
<tr><td align=left>Lamu kelina palino.</td></tr>
<tr><td align=left>Kanu kelina kitapo palino.</td></tr>
</table>
</td>
</tr>
<tr><td colspan=5 align=center>Table 6.</td></tr>
</table>
</blockquote>
</p>
<p>
Valency reduction changes a transitive verb to an intransitive one. Starting with a transitive sentence, the default-case argument is dropped, the special-case argument is promoted to default-case, the verb takes valency-reduction marking to make it intransitive, and the dropped argument may be reintroduced as an oblique. In NA, this is passive voice; in EA, it's anti-passive voice. VR is different from both, in that the verb doesn't receive a valency-reduction suffix at all (in fact, I chose revalent suffix <b>-t</b> on the premise that it was descended from a valency-reduction suffix that somehow moved from the verb to one of its noun arguments), and both the passive and anti-passive versions are possible.
<blockquote>
<p>The woman is petting the panda.<br>
The woman is petting.<br>
The panda is being petted.<br>
The panda is being petted by the woman.</p>
<table>
<tr>
<th>NA</th>
<td width=1%></td>
<th>EA</th>
<td width=1%></td>
<th>VR</th>
</tr>
<tr>
<td valign=top>
<table border>
<tr><td align=left>Lamu kelina palino<b style="color:red;">r</b>.</td></tr>
<tr><td> </td></tr>
<tr><td align=left>Lamu<b style="color:red;">to</b> palino.</td></tr>
<tr><td align=left>Lamu<b style="color:red;">to</b> palino kelina<b style="color:red;">k</b>.</td></tr>
</table>
</td>
<td></td>
<td valign=top>
<table border>
<tr><td align=left>Lamu kelina<b style="color:red;">r</b> palino.</td></tr>
<tr><td align=left>Lamu<b style="color:red;">to</b> kelina.</td></tr>
<tr><td> </td></tr>
<tr><td align=left>Lamu<b style="color:red;">to</b> kelina palino<b style="color:red;">k</b>.</td></tr>
</table>
</td>
<td></td>
<td valign=top>
<table border>
<tr><td align=left>Lamu kelina palino.</td></tr>
<tr><td align=left>Lamu kelina.</td></tr>
<tr><td align=left>Lamu palino<b style="color:red;">t</b>.</td></tr>
<tr><td align=left>Lamu palino<b style="color:red;">t</b> kelina.</td></tr>
</table>
</td>
</tr>
<tr><td colspan=5 align=center>Table 7.</td></tr>
</table>
</blockquote>
There may be a hint, here, of why ergativity would have an affinity for SOV word order. In this VSO order, anti-passivization moves the absolutive (unmarked) argument from two positions after the verb to one position after the verb (or to put it another way, it changes the position right after the verb from ergative to absolutive). Under SVO, anti-passivization would move the absolutive argument from after the verb to before it (or, change the position before the verb from ergative to absolutive). But under SOV, the absolutive would always be the argument immediately before the verb.
</p>
<p>
This reasoning doesn't associate NA specifically with SVO, but does tend to discourage NA from using SOV, since then passivization would move the nominative relative to the verb. On the other hand, considering more exotic word orders (which conlangers often do), this suggests NA would dislike VOS but be comfortable with OVS or OSV, while EA would dislike OVS and OSV but be comfortable with VOS.
</p>
<p>
Passivization omits the woman. Anti-passivization omits the panda. Pure NA Ergato has no way to omit the panda; pure EA Ergato has no way to omit the woman. Pure VR can omit either one. <i>English</i> can also omit either one, because in addition to allowing passivization of <i>to pet</i>, English also allows it to be an active intransitive verb — "The woman is petting." (One could say that <i>to pet</i> can be transitive or intransitive, or one might maintain that there are two separate verbs <i>to pet</i>, a transitive verb and an intransitive verb; it's an artificial question about the structural description of the language, not about the language itself.)
</p>
<p>
Table 7 is a broad and shallow study; and by that standard, imho rather successful, as the above is a fair amount of information to have gotten out of it. However, it's too shallow to provide insight into <i>why</i> one might want to use these systems (if, in fact, one would, which is open to doubt since natlangs generally aren't "pure" NA or EA in this sense, let alone VR). A particularly puzzling case, as presented here, is why a speaker of pure EA Ergato would want to drop the panda and then add it back in exactly the same position but with the argument cases changed; but on one hand this is evidently an artifact of the particular word order we've used, and on the other hand DeLancey was pointing out that different languages may have entirely different motives for ergativity.
</p>
<p>
Using VR, it's possible to specify any subset of the arguments to a verb, and put them in any order.
<blockquote>
<table>
<tr>
<td>
<table border>
<colgroup>
<col span="1" style="width: 39%;">
<col span="1" style="width: 61%;">
</colgroup>
<tbody>
<tr>
<td bgcolor="lightblue" align=center><b>VR</b></td>
<td bgcolor="lightblue" align=center><b>English</b></td>
</tr>
<tr>
<td align=left>
Kanu kelina kitapo palino.<br><br>
Kanu kelina palino<b style="color:red;">t</b> kitapo.<br><br>
Kanu kitapo<b style="color:red;">t</b> palino kelina.<br><br>
Kanu kitapo<b style="color:red;">t</b> kelina<b style="color:red;">t</b> palino.<br><br>
Kanu palino<b style="color:red;">s</b> kelina kitapo.<br><br>
Kanu palino<b style="color:red;">s</b> kitapo<b style="color:red;">t</b> kelina.<br><br>
</td>
<td align=left>
The woman is giving the book to the panda.<br>
The woman is giving to the panda the book.<br>
The book is being given to the panda by the woman.<br>
The book is being given by the woman to the panda.<br>
The panda is being given by the woman the book.<br>
The panda is being given the book by the woman.
</td>
</tr>
<tr>
<td align=left>
Kanu kelina kitapo.<br>
Kanu kelina palino<b style="color:red;">t</b>.<br>
Kanu kitapo<b style="color:red;">t</b> palino.<br>
Kanu kitapo<b style="color:red;">t</b> kelina<b style="color:red;">t</b>.<br>
Kanu palino<b style="color:red;">s</b> kelina.<br>
Kanu palino<b style="color:red;">s</b> kitapo<b style="color:red;">t</b>.
</td>
<td align=left>
The woman is giving the book.<br>
The woman is giving to the panda.<br>
The book is being given to the panda.<br>
The book is being given by the woman.<br>
The panda is being given to by the woman.<br>
The panda is being given the book.
</td>
</tr>
<tr>
<td align=left>
Kanu kelina.<br>
Kanu kitapo<b style="color:red;">t</b>.<br>
Kanu palino<b style="color:red;">s</b>.
</td>
<td align=left>
The woman is giving.<br>
The book is being given.<br>
The panda is being given to.
</td>
</tr>
</tbody>
</table>
</td>
</tr>
<tr><td align=center>Table 8.</td></tr>
</table>
</blockquote>
And this ultimately is why it fails. You can do this with VR; and why would you want to? In shallow studies of the pure NA and pure EA languages, we could suspend disbelief there would turn out to be some useful way to exploit them at a higher level of structure, because we know those pure systems at least approximate systems that occur in natlangs. But VR isn't approximating something from a natlang. It was dreamed up from low-level structural concerns; there's no reason to expect it will have some higher-level benefit. It's not something one would do without need, either. It requires tracking not just word order, not just case markings, but a correlation between the two such that case markings have only positional meaning <i>about</i> word order, rather than any direct meaning about the roles of the marked nouns, which seems something of a mental strain. It's got no redundancy built into it, and it's perfectly unambiguous in exhaustively covering the possibilities — much too tidy to occur in nature. There's also no leeway in it for the sort of false starts and revisions that take place routinely in natural speech; you can't decide after you've spoken a revalent noun argument to use a different word and then say that instead, because the meaning of the revalent suffix will be different the second time you use it.
</p>
<p>
It's still a useful experiment for exploring the dynamics of alignment systems, though.
</p>
<p>
But just a bit further up in scale, we meet a qualitatively different challenge.
</p>
<span style="font-size: large;" id="sec-dm-relc">Relative clauses</span>
<p>
Consider <i>relative clauses</i>. In my revalency explorations ten years ago, I seem to have simply chosen a way for relative clauses to work, and run with it. There was a Conlangery Podcast about relative clauses a while back (<a href="http://conlangery.com/2012/10/22/conlangery-72-relative-clauses/">2012</a>), which made clear there are a lot of ways to do this. Where to start? Not with English; too worn a trail. My decade-past choice looks rather NA-oriented; so, how about an EA language? Lots of languages have bits of ergativity in them — even English does — but deeply ergative languages are thinner on the ground. Here's a sample sentence from a 1972 dissertation on relative clauses in Basque (<a href="http://www.ai.mit.edu/projects/dm/theses/deRijk72.pdf">link</a>).
<blockquote>
Aitak irakurri nai du amak erre duen liburua.<br>
Father wants to read the book that mother has burned.
</blockquote>
I had a lot more trouble assembling a gloss for this sentence than for the earlier example in Anglo-Saxon. You might think it would be easier, since Basque is a living language actively growing in use over recent decades, whereas Anglo-Saxon has been dead for the better part of a thousand years; and since the example is specifically explicated in a dissertation by a linguist — one would certainly like to think that being explicated intensely by a linguist would be in its favor. The dissertation did cover more of these words than Wiktionary did. My main problem was with <i>du</i>/<i>duen</i>; I worked out from general context, with steadily increasing confidence, that they had to be finite auxiliary verbs, but my sources were most uncooperative about confirming that.
<blockquote>
<table>
<colgroup>
<col span="1" style="width: 13.5%;">
<col span="1" style="width: 14%;">
<col span="1" style="width: 10.5%;">
<col span="1" style="width: 14%;">
<col span="1" style="width: 14%;">
<col span="1" style="width: 17%;">
<col span="1" style="width: 17%;">
</colgroup>
<tbody>
<tr>
<td>aitak</td>
<td>irakurri</td>
<td>nai du</td>
<td>amak</td>
<td>erre</td>
<td>duen</td>
<td>liburua</td>
</tr>
<tr>
<td>father</td>
<td>to read</td>
<td>wants</td>
<td>mother</td>
<td>to burn</td>
<td>has done</td>
<td>book</td>
</tr>
<tr>
<td>(ergative)</td>
<td>(infinitive)</td>
<td>(desire has)</td>
<td>(ergative)</td>
<td>(infinitive)</td>
<td>(relativized)</td>
<td>(absolutive)</td>
</tr>
</tbody>
</table>
</blockquote>
Basque is a <a href="https://en.wikibooks.org/wiki/Conlang/Appendix/Glossary#;isolate">language isolate</a> — a natlang that, as best anyone can figure, isn't related to any other language on Earth. Suggested origins include <a href="https://en.wikipedia.org/wiki/Cro-Magnon">Cro-Magnons</a> and <a href="https://en.wikipedia.org/wiki/Drake_equation">aliens</a>.
</p>
<p>
Basque is thoroughly ergative (rather than merely split ergative — say, ergative only for the past tense). It's not altogether safe to classify Basque by the order of subject, object, and verb, because Basque word order apparently isn't about which noun is the subject and which is the object; it's about which is the topic and which is the focus; I haven't tackled that deeply enough to fully grok it, but it makes all kinds of sense to me that a language that thoroughly embraces ergativity would also not treat <i>subject</i> as an important factor in choosing word order, since subject in this sense is essentially the nominative case. That whole line of reasoning about why SOV would be more natural for an ergative language than SVO or VSO exhibits, in retrospect, a certain inadequacy of imagination. Also, most Basque verbs don't have finite forms. Sort-of as if most verbs could only be gerunds (<i>-ing</i>). Nearly all conjugation is on an auxiliary verb, which also determines whether the clause is transitive or intransitive — as if instead of "she burned the book" you'd say "she did burning of the book" (with auxiliary verb <i>did</i>). There are also <i>more</i> verbal inflections than in typical Indo-European languages; the auxiliary verb agrees with the subject, the direct object, <i>and</i> the indirect object (if those objects occur). I was reminded of noted conlang <a href="http://dedalvs.com/smileys/2009.html">Kēlen</a>, which arranges to have, in a sense, no verbs; if you took the Basque verbal arrangement a bit further by having no conjugating verbs at all beyond a small set of auxiliaries, and replaced the non-finite verbs with nouns, you'd more-or-less have Kēlen.
</p>
<p>
When a relative clause modifies a noun, one or another of the nouns in the relative clause refers to the antecedent — although in Basque the relative clause occurs before the noun it modifies, so say rather one of them refers to the <i>post</i>cedent. In my enthusiastically tidy mechanical tinkering ten years ago, I worried about how to specify such things unambiguously. Basque's solution? Omit the referring word entirely. Which also means you omit all the affixes that would have been on that noun in the relative clause; and Basque really pours on the affixes. So, as a result of omitting the shared noun from the relative clause, you may be omitting important information about its role in the relative clause, thus important information about how the relative clause relates to the noun it modifies, leaving lots of room for ambiguity which the audience just resolves from context. Now <i>that's</i> a natural language feature; I love it.
</p>
<p>
The 1972 dissertation took time out (and space, and effort) to argue, in describing this omission of the shared noun, that the omitted noun is deleted in place, rather than moved somewhere else and then deleted. This struck me as a good example of what can happen when you try to describe something (here, Basque) using a structure (here, conventional phrase-structure grammar) that mismatches the thing described, and have to go through contortions to make it come out right. It reminded me of debating how many angels can dance on the head of a pin. The sense of mismatch only got stronger when I noticed, early in the dissertation's treatment, the parenthetical remark "(questions of definiteness versus indefiniteness will not be raised here)". He'd put lots of attention into things dictated by his paradigm even though they don't correspond to obvious visible features of the language, while dismissing obvious visible things his paradigm said shouldn't matter.
</p>
<p>
Like I was saying earlier: determination to make one or another paradigm work is the wind in science's sails.
</p>
<p>
It's tempting to perceive Basque as a bizarre and complicated language. Unremitting ergativity. Massive agglutinative affixing. Polypersonal agreement on auxiliary verbs. Even two different "s" phonemes (that is, in English they're both allophones of the alveolar fricative). I'm given to understand such oddness continues as one drills down into details of the language. The Conlangery Podcast's discussion of <a href="http://conlangery.com/2013/08/12/conlangery-93-basqueeuskara-natlang/">Basque</a> notes that it has a great many exceptions, things that only occur in one corner of the language. But, there's something wrong with this picture. All I've picked up over the years suggests there is no such thing as an especially complicated or bizarre natlang. Basque is agglutinative, the simply composable morphological strategy that lends itself particularly well to morpheme-based analysis. The Conlangery discussion notes that Basque verbs are extremely regular. Standard Basque phonology has the most boring imaginable set of vowels (if you're looking for a set of vowels for an international auxlang, and you want to choose phonemes basically everyone on Earth will be able to handle, you choose the same five basic vowel sounds as Basque). From what I understand of the history of grammar, our grammatical technology traces its lineage back to long-ago studies of either Sanskrit, Greek, or Latin, three Indo-European languages whose obvious similarities famously led to the proposal of a common ancestor language. It's to be expected that a language bearing no apparent genetic relationship whatsoever to any of those languages would not fit the resultant grammatical mold. If somehow our theories of grammatical structure had all been developed by scholars who only knew Basque, presumably the Indo-European languages wouldn't fit that mold well, either.
</p>
<span style="font-size: large;" id="sec-dm-deep">The deep end</span>
<p>
All this discussion provides context for the problem and a broad sense of what is needed. The examples thus far, though, have been simple; even the Anglo-Saxon, despite its length. There's not much point charging blindly into complex examples without learning first what there is to be learned from more tractable ones. Undervaluing conventional structural insights seems a likely hazard of the functional approach.
</p>
<p>
My objective from the start, though, has been to develop means for studying the internal structure of <i>larger-scale</i> texts. Not these single sentences, about which hangs a pervasive sense of omission of larger structures intruding on them from above (I'm reminded (tangentially?) of the "network" aspect of subterms in my posts on <a href="https://fexpr.blogspot.com/2017/06/co-hygiene-and-quantum-gravity.html">co-hygiene</a>). Sooner or later, we've got to move past these shallow explorations, to the <a href="https://en.wiktionary.org/wiki/deep_end">deep end of the pool</a>.
</p>
<p>
We've sampled kinds of structure that occur toward the upper end of the sentence level. (I could linger on revalency for some time, but for this post that's only a means to an end.) Evidently we can't pour dense information into our presentation without drowning out what we want to exhibit — interlinear glosses are <i>way</i> beyond what we can usefully do — so we should expect an effective device to let us exhibit aspects of a large text one aspect at a time, rather than displaying its whole structure at once for the audience to pick things out of. It won't be "automatic", either; we expect any really useful technique to be used over a very wide range of structural facets with sapient minds at both the input and output of the operation — improvising explorations on the input side and extracting patterns insightfully from the output. (In other words, we're looking not for an algorithm, but for a means to enhance our ability to create and appreciate <i>art</i>.)
</p>
<p>
It would be a mistake, I think, to scale up only a little, say to looking at how one sentence relates to another; that's still looking down at small structures, rather than up at big ones. It would also be self-defeating to impose strict limitations on what sort of structure might be illustrable, though it's well we have some expectations to provide a lower bound on what might be there to find. One limitation I will impose, for now: I'm going to look at reasonably polished written prose, rather than the sort of unedited spoken text sometimes studied by linguists. Obviously the differences between polished prose and unedited speech are of interest — for both linguistics and conlanging — but ordinary oral speech is a chaotic mess of words struggling for the sort of coherent stream one finds in written prose. So it should be possible to get a clearer view of the emergent structure by studying the polished form, and then as a separate operation one might try to branch outward from the relatively well-defined structures to the noisily spontaneous compositional process of speech. The definition of a conlang seems likely to be more about the emergent structure than the process of emergence, anyway.
</p>
<p>
So, let's take something big enough to give us no chance of dwelling in details. The language has got to be English; the point is to figure out how to illustrate the structure, and a prerequisite to that is having prior insight (prior to the illustrative device, that is) into all the structure that's there to be illustrated. Here's a paragraph from my <i><a href="https://fexpr.blogspot.com/2011/12/preface-to-homer.html">Preface to Homer</a></i> post; I've tried to choose it (by sheer intuition) to be formidably natural yet straightforward. I admit, this paragraph appeals to me partly because of the unintentional meta-irony of a rather lyrical sentence about, essentially, how literate society outgrows oral society's dependence on poetic devices.
<blockquote>
Such oral tradition can be written down, and was written down, without disrupting the orality of the society. Literate society is what happens when the culture itself embraces writing as a means of preserving knowledge <i>instead of</i> an oral tradition. Once literacy is assimilated, set patterns are no longer needed, repetition is no longer needed, pervasive <i>actors</i> are no longer needed, and details become reliably stable in a way that simply doesn't happen in oral society — the keepers of an oral tradition are apt to believe they tell a story exactly the same way each time, but only because they and their telling change as one. When the actors go away, it becomes possible to conceive of abstract entities. Plato, with his descriptions of shadows on a cave wall, and Ideal Forms, and such, was (Havelock reckoned) trying to explain literate abstraction in a way that might be understood by someone with an oral worldview.
</blockquote>
Considering this as an example text in a full-fledged nominative-accusative SVO natlang, with an eye toward how the nouns and verbs are arranged to create the overall effect — there's an awful lot going on here. The first sentence starts out with an example of topic sharing (the second clause shares the subject of the first; that's another thing I explored for revalency, back when), and then an adverbial clause modifying the whole thing; just bringing out all that would be a modest challenge, but it's only a small part of the whole. I count a little over 150 words, with at least 17 finite verbs and upwards of 30 nouns; and I sense that almost everything about the construction of the whole has a reason to it, to do with how it relates to the rest. But even I (who wrote it) can't see the whole structure at once. How to bring it into the light, where we can see it?
</p>
<p>
The only linguistic tradition I've noticed marking up longer texts like this is incompatible with my objectives. <a href="https://en.wikibooks.org/wiki/Conlang/Appendix/Glossary#;corpus_linguistics">Corpus linguistics</a> is essentially data mining from massive quantities of natural text; in terms of the functions of a Kuhnian paradigm, it's strong on methodology, weak on theory. The method is to do studies of frequencies of patterns in these big corpora (the <a href="https://en.wikipedia.org/wiki/Brown_Corpus">Brown Corpus</a>, for example, has a bit over a million words); really the only necessary theoretical assumption is that such frequencies of patterns are useful for learning about the language. There is, btw, interestingly, no apparent way to reverse-engineer the corpus-linguistics method so as to construct a language. There is disagreement amongst researchers as to whether the corpus should be annotated, say for structure or parts of speech (which <i>does</i> entail some assumption of theory); but annotation even if provided is still meant to support data mining of frequencies from corpora, whereas I'm looking to help an audience grok the structure of a text of perhaps a few hundred words. Philosophically, corpus linguistics is about algorithmically extracting information from texts that cannot be humanly apprehended at once, whereas I'm all about humanly extracting information from a text by apprehension.
</p>
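<p>
To make the contrast concrete, here is the corpus-linguistics move in deliberately cartoonish miniature (a sketch only; real corpus work adds vast scale and, often, annotation):
</p>
<blockquote><pre>
from collections import Counter

def ngram_frequencies(tokens, n=2):
    """Count n-gram frequencies across a token stream."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

tokens = ("such oral tradition can be written down and was written down "
          "without disrupting the orality of the society").split()
print(ngram_frequencies(tokens).most_common(1))
# -> [(('written', 'down'), 2)]
</pre></blockquote>
<p>
Nothing in that operation is meant for a human to apprehend the text <i>through</i>; it produces statistics <i>about</i> the text.
</p>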
<p>
We'd like a display technique (or techniques) to bring out issues in how the text is constructed; why various nouns were arranged in certain ways relative to their verbs and to other nouns, say. Why did the first sentence say "the orality of the society" rather than "the society's orality"? The second sentence "Literate society is what happens when" rather than "Literate society happens when" (or, for that matter, "When [...], literate society happens")? More widely, why is most of the paragraph written in passive voice? We wouldn't expect to <i>directly</i> answer these, but they're the sorts of things we want the audience to be able to get insight into from looking at our displays.
</p>
<p>
Patterns of use of personal pronouns (first, second, third, fourth person), and/or animacy, specificity, or the like, are also commonly recommended for study 'to get a feel for how it works', though this particular passage is mostly lacking in pronouns.
</p>
<p>
A key challenge here seems to be getting just enough information into the presentation without swamping it in too much information. We can readily present the text with a few elements — words, or perhaps affixes — flagged out, by means of bolding or highlighting, and show a small amount of text structure by dividing it into lines and perhaps indenting some of them. Trying to use more than one means of flagging out could easily get confusing; multiple colors would be hard to reconcile with various forms of color-blindness, though conceivably one might get away with about two forms of flagging by some monochromatic means. But, how to deal with more than two kinds of elements; and, moreover, how to show complex relationships?
</p>
<p>
One way to handle more complex flags would be to insert simple tags of some sort into the text and flag the tags rather than the text itself. Relationships between the tags, one <i>might</i> try to make somewhat more apparent through the text formatting (linebreaks and indentation).
</p>
<p>
Trying to ease into the thing, here is a simple formatting of the text, with linebreaks and a bit of indentation.
<blockquote>
Such oral tradition can be written down,<br>
and was written down,<br>
without disrupting the orality of the society.<br>
Literate society is what happens when the culture itself embraces<br>
writing as a means of preserving knowledge<br>
<i>instead of</i> an oral tradition.<br>
Once literacy is assimilated,<br>
set patterns are no longer needed,<br>
repetition is no longer needed,<br>
pervasive <i>actors</i> are no longer needed,<br>
and details become reliably stable<br>
in a way that simply doesn't happen in oral society —<br>
the keepers of an oral tradition are apt to believe<br>
they tell a story exactly the same way each time,<br>
but only because they and their telling change as one.<br>
When the actors go away,<br>
it becomes possible to conceive of abstract entities.<br>
Plato, with his descriptions of shadows on a cave wall,<br>
and Ideal Forms,<br>
and such,<br>
was (Havelock reckoned) trying to explain literate abstraction<br>
in a way<br>
that might be understood<br>
by someone<br>
with an oral worldview.
</blockquote>
This brings out a bit of the structure, including several larger or smaller cases of parallelism; just enough, perhaps, to hint that there is much more there that is still just below the surface. One could imagine discussing the placement of each noun and verb relative to the surrounding structures, resulting in an essay several times the length of the paragraph itself. No wonder displaying the structure is such a challenge, when there's so much of it.
</p>
<p>
One could almost imagine trying to mark up the paragraph with a pen (or even multiple colors of pens), circling various words and drawing arrows between them. Probably creating a tangled mess and still not really conveying how the whole is put together. Though this does remind us that there's a whole other tradition for representing structure called <a href="http://languagelog.ldc.upenn.edu/nll/?p=37435">sentence diagramming</a>. Granting that sentence diagramming, besides its various controversies, doesn't bring out the right sort of structure, brings out too much else, and is limited to structure within a single sentence, it's another sort of presentational strategy to keep in mind.
</p>
<p>
Adding things up: we're asking for a simple, flexible way to flag out a couple of different kinds of words in an extended text and show how they're grouped... that can be readily implemented in html. The marking-two-kinds-of-words part is relatively easy; set the whole text in, say, <span style="color:grey">grey</span>, one kind of marked words in black, and a second kind of marked words (better perhaps to choose the less numerous marked kind) in black <b>boldface</b>. For grouping, indentation such as above seems rather clumsy and extremely space-consuming; as an experimental alternative, we could try <span style="color:red">red</span> parentheses.
</p>
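<p>
For concreteness, here is a minimal sketch (Python, with invented helper names; nothing here parses English, the word lists are supplied by hand) of how such markup might be generated mechanically. Multi-word phrases like "can be written down" would need extra handling; this only flags single words.
<blockquote><pre>
# Sketch: emit the grey/black/boldface markup described above.
def span(color, text, bold=False):
    body = f"&lt;b&gt;{text}&lt;/b&gt;" if bold else text
    return f'&lt;span style="color:{color}"&gt;{body}&lt;/span&gt;'

def mark_up(text, nouns, verbs):
    out = []
    for word in text.split():
        bare = word.strip(".,;:").lower()    # ignore punctuation and case
        if bare in verbs:
            out.append(span("black", word, bold=True))
        elif bare in nouns:
            out.append(span("black", word))
        else:
            out.append(word)                 # left in the base grey
    return span("grey", " ".join(out))

print(mark_up("Once literacy is assimilated,",
              nouns={"literacy"}, verbs={"is", "assimilated"}))
</pre></blockquote>
</p>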
<p>
Taking this one step at a time, here are the nouns and verbs:
<blockquote>
<span style="color:grey">Such oral <span style="color:black">tradition</span> <span style="color:black"><b>can be written down</b></span>, and <span style="color:black"><b>was written down</b></a></span>, without <span style="color:black"><b>disrupting</b></span> the <span style="color:black">orality</span> of the <span style="color:black">society</span>. Literate <span style="color:black">society</span> <span style="color:black"><b>is</b></span> <span style="color:black">what</span> <span style="color:black"><b>happens</b></span> when the <span style="color:black">culture</span> <span style="color:black">itself</span> <span style="color:black"><b>embraces</b></span> <span style="color:black">writing</span> as a <span style="color:black">means</span> of <span style="color:black"><b>preserving</b></span> <span style="color:black">knowledge</span> <i>instead of</i> an oral <span style="color:black">tradition</span>. Once <span style="color:black">literacy</span> <span style="color:black"><b>is assimilated</b></span>, set <span style="color:black">patterns</span> <span style="color:black"><b>are</b></span> no longer <span style="color:black"><b>needed</b></span>, <span style="color:black">repetition</span> <span style="color:black"><b>is</b></span> no longer <span style="color:black"><b>needed</b></span>, pervasive <span style="color:black"><i>actors</i></span> <span style="color:black"><b>are</b></span> no longer <span style="color:black"><b>needed</b></span>, and <span style="color:black">details</span> <span style="color:black"><b>become</b></span> reliably stable in a <span style="color:black">way</span> that simply <span style="color:black"><b>doesn't happen</b></span> in oral <span style="color:black">society</span> — the <span style="color:black">keepers</span> of an oral <span style="color:black">tradition</span> <span style="color:black"><b>are apt to believe</b></span> <span style="color:black">they</span> <span style="color:black"><b>tell</b></span> a <span style="color:black">story</span> exactly the same <span style="color:black">way</span> each <span style="color:black">time</span>, but only because <span style="color:black">they</span> and their <span style="color:black">telling</span> <span style="color:black"><b>change</b></span> as <span style="color:black">one</span>. When the <span style="color:black">actors</span> <span style="color:black"><b>go</b></span> <span style="color:black">away</span>, <span style="color:black">it</span> <span style="color:black"><b>becomes</b></span> possible to <span style="color:black"><b>conceive of</b></span> abstract <span style="color:black">entities</span>. <span style="color:black">Plato</span>, with his <span style="color:black"><b>descriptions</b></span> of <span style="color:black">shadows</span> on a cave <span style="color:black">wall</span>, and Ideal <span style="color:black">Forms</span>, and <span style="color:black">such</span>, <span style="color:black"><b>was</b></span> (<span style="color:black">Havelock</span> <span style="color:black"><b>reckoned</b></span>) <span style="color:black"><b>trying to explain</b></span> literate <span style="color:black">abstraction</span> in a <span style="color:black">way</span> that <span style="color:black"><b>might be understood</b></span> by <span style="color:black">someone</span> with an oral <span style="color:black">worldview</span>.</span>
</blockquote>
Marking that up was something of a shock for me. The first warning sign, if I'd recognized it, was the word "disrupting" in the first sentence; should that be marked as a noun, or a verb? Based on the structure of the sentence, it seemed to belong at the same level as, and parallel to, the two preceding forms of <i>write</i>, so I marked "disrupting" as a verb and moved on. The problem started to dawn on me when I hit the word "writing" in the second sentence, which from the structure of that sentence wanted to be a noun. The word "preserving", later in the sentence, seems logically more of an activity than a participant, so feels right as a verb although one might wonder whether it has some common structure with "writing". The real eye-opener though —for me— was the word "descriptions" in the final sentence. Morphologically speaking, it's <i>clearly</i> a noun. And yet. Structurally, it's parallel with "trying to explain"; that is, it's an activity rather than a participant.
</p>
<p>
The activity/participant semantic distinction is a common theme in my conlanging. I see this semantic distinction as unavoidable, although the corresponding grammatical and lexical noun/verb distinctions are more transitory. My two principal conlang efforts each seek to eliminate one of these transitory distinctions. <i>Lamlosuo</i>, the one I've blogged about, shuns grammatical nouns and verbs, though it has thriving lexical noun and verb classes. My other conlang, somewhat younger and less developed with current working name <i>Refactor</i>, has thriving grammatical nouns and verbs yet no corresponding lexical classes. (The semantic distinction is scarcely mentioned in my post on Lamlosuo; my draft post on Refactor, not nearly ready for prime time, has a bit more to say about activities and participants.)
</p>
<p>
In this case, had "descriptions" been replaced by a gerund —which grammatically could have been done, though the prose would not have flowed as smoothly (and why <i>that</i> should be is a fascinating question)— we'd already have the precedent, from earlier in the paragraph, of choosing to call a gerund a noun or verb depending on what better fits the structure of the passage. Imagine replacing "descriptions", or perhaps "descriptions of", by "describing". (An even more explicitly activity-oriented transformation would be to replace "with his descriptions of" by "when describing".)
</p>
<p>
The upshot is that I'm now tempted to think of <i>noun</i> and <i>verb</i> as "blue birds", in loose similarity to Delancey's doubts about ergativity. I'm starting to feel I no longer know what grammar is. Which may be in part a good thing, if you believe (as I do; cf. my physics posts) that shaking up one's thinking keeps it limber; but let's not forget, we're trying to aid conlanging, and the grammar of a conlang is apt to be its primary definition.
</p>
<p>
Meanwhile, building on the noun/verb assignments such as they are, here's a version with grouping parentheses:
<blockquote>
<span style="color:grey"><span style="color:red">(</span>Such oral <span style="color:black">tradition</span> <span style="color:red">((</span><span style="color:black"><b>can be written down</b></span><span style="color:red">)</span>, and <span style="color:red">(</span><span style="color:black"><b>was written down</b></a></span><span style="color:red">))</span>, without <span style="color:red">(</span><span style="color:black"><b>disrupting</b></span> <span style="color:red">(</span>the <span style="color:black">orality</span> <span style="color:red">(</span>of the <span style="color:black">society</span>.<span style="color:red">))))</span> <span style="color:red">(</span>Literate <span style="color:black">society</span> <span style="color:red">(</span><span style="color:black"><b>is</b></span> <span style="color:black">what</span> <span style="color:black"><b>happens</b></span> <span style="color:red">(</span>when the <span style="color:black">culture</span> <span style="color:black">itself</span> <span style="color:red">(</span><span style="color:black"><b>embraces</b></span> <span style="color:red">(</span><span style="color:black">writing</span> as a <span style="color:red">(</span><span style="color:black">means</span> of <span style="color:red">(</span><span style="color:black"><b>preserving</b></span> <span style="color:black">knowledge</span><span style="color:red">)))</span> <i>instead of</i> <span style="color:red">(</span>an oral <span style="color:black">tradition</span>.<span style="color:red">)))))</span> <span style="color:red">((</span>Once <span style="color:black">literacy</span> <span style="color:red">(</span><span style="color:black"><b>is assimilated</b></span>,<span style="color:red">))</span> <span style="color:red">(</span>set <span style="color:black">patterns</span> <span style="color:red">(</span><span style="color:black"><b>are</b></span> no longer <span style="color:black"><b>needed</b></span>,<span style="color:red">))</span> <span style="color:red">(</span><span style="color:black">repetition</span> <span style="color:red">(</span><span style="color:black"><b>is</b></span> no longer <span style="color:black"><b>needed</b></span>,<span style="color:red">))</span> <span style="color:red">(</span>pervasive <span style="color:black"><i>actors</i></span> <span style="color:red">(</span><span style="color:black"><b>are</b></span> no longer <span style="color:black"><b>needed</b></span>,<span style="color:red">))</span> and <span style="color:red">(</span><span style="color:black">details</span> <span style="color:red">(</span><span style="color:black"><b>become</b></span> reliably stable <span style="color:red">(</span>in a <span style="color:black">way</span> that simply <span style="color:red">(</span><span style="color:black"><b>doesn't happen</b></span> <span style="color:red">(</span>in oral <span style="color:black">society</span><span style="color:red">))))))</span> — <span style="color:red">(</span>the <span style="color:black">keepers</span> <span style="color:red">(</span>of an oral <span style="color:black">tradition</span><span style="color:red">)</span> <span style="color:red">(</span><span style="color:black"><b>are apt to believe</b></span> <span style="color:red">(</span><span style="color:black">they</span> <span style="color:red">(</span><span style="color:black"><b>tell</b></span> <span style="color:red">(</span>a <span style="color:black">story</span><span style="color:red">)</span> <span style="color:red">(</span>exactly the same <span 
style="color:black">way</span> <span style="color:red">(</span>each <span style="color:black">time</span>,<span style="color:red">)))))</span> but only because <span style="color:red">((</span><span style="color:black">they</span> and their <span style="color:black">telling</span><span style="color:red">)</span> <span style="color:red">(</span><span style="color:black"><b>change</b></span> <span style="color:red">(</span>as <span style="color:black">one</span><span style="color:red">))))</span>. <span style="color:red">(</span>When <span style="color:red">(</span>the <span style="color:black">actors</span> <span style="color:red">(</span><span style="color:black"><b>go</b></span> <span style="color:red">(</span><span style="color:black">away</span>,<span style="color:red">)))</span> <span style="color:black">it</span> <span style="color:red">(</span><span style="color:black"><b>becomes</b></span> possible <span style="color:red">(</span>to <span style="color:red">(</span><span style="color:black"><b>conceive of</b></span> <span style="color:red">(</span>abstract <span style="color:black">entities</span>.<span style="color:red">)))))</span> <span style="color:red">(</span><span style="color:black">Plato</span>, <span style="color:red">(</span>with his <span style="color:black"><b>descriptions</b></span> of <span style="color:red">(</span><span style="color:black">shadows</span> <span style="color:red">(</span>on a cave <span style="color:black">wall</span>,<span style="color:red">))</span> and <span style="color:red">(</span>Ideal <span style="color:black">Forms</span>,<span style="color:red">)</span> and <span style="color:red">(</span><span style="color:black">such</span>,<span style="color:red">))</span> <span style="color:red">(</span><span style="color:black"><b>was</b></span> (<span style="color:black">Havelock</span> <span style="color:black"><b>reckoned</b></span>) <span style="color:black"><b>trying to explain</b></span> literate <span style="color:black">abstraction</span> <span style="color:red">(</span>in a <span style="color:black">way</span> that <span style="color:red">(</span><span style="color:black"><b>might be understood</b></span> by <span style="color:black">someone</span> <span style="color:red">(</span>with an oral <span style="color:black">worldview</span>.<span style="color:red">)))))</span></span>
</blockquote>
Maybe I should have been prepared for it this time, after the noun/verb marking shook my confidence in the notions of noun and verb. Struggling to decide where to add parentheses here showing the nested, tree structure of the prose has convinced me that the prose is not primarily nested/tree-structured. This fluent English prose (interesting word, <i>fluent</i>, from Latin <i>fluens</i> meaning <i>flowing</i>) is more like a stream of key words linked into a chain by connective words, occasionally splitting into multiple streams depending in parallel from a common point — very much in the mold of <a href="https://fexpr.blogspot.com/2014/01/lamlosuo.html">Lamlosuo</a>. Yes, that would be the conlang whose structure I figured could not possibly occur in a natural human language, motivating me to invent a thoroughly non-human alien species of speakers; another take on the <a href="https://en.wikibooks.org/wiki/Conlang/FAQ#Anadew">anadew</a> principle of conlanging, in which conlang structures judged inherently unnatural turn out to occur in natlangs after all. In fairness, imho Lamlosuo is more extreme about the non-tree principle than English, as there really is an element of "chunking" apparent in human language that Lamlosuo studiously shuns; but I'm still not seeing, in this English prose, nearly the sort of syntax tree that grade-school English classes, or university compiler-construction classes, had primed me to expect. (The tree-structured approach seems, afaict, to derive from sentence diagramming, which was promulgated in 1877 as a teaching method.)
</p>
<p>
So here I am. I want to be able to illustrate the structure of a largish prose passage, on the order of a paragraph, so that the relationships between words, facing upward to large-scale structure, leap out at the observer. I've acquired a sense of the context for the problem. And I've discovered that I'm not just limited by not knowing how to display the structure — I don't even know what the structure is, not even in the case of my own first language, English. Perhaps the tree-structure idea is due to having looked at the structure facing inward toward small-scale structure rather than outward to large-scale; but I'm facing outward now, and thinking our approach to grammatical structure may be altogether wrong-headed. Which, as a conlanger, is particularly distressing since conlangs tend to use a conventionally structured grammar in the primary definition of the language.
</p>
<p>
Saturation point reached. Any further and I'd be supersaturated, and start to lose things as I went along. Time for a "reset", to clear away the general clutter we've accumulated along the path of this post. Give it some time to settle out, and a fresh post with a new specific focus can select the parts of this material it needs and start on its own path.
</p>
John Shutthttp://www.blogger.com/profile/00041398073010099077noreply@blogger.com0tag:blogger.com,1999:blog-7068528325708136131.post-83821258011952188202018-07-31T18:11:00.000-07:002019-01-14T07:22:30.175-08:00Co-hygiene and emergent quantum mechanics<blockquote>
Thus quantum mechanics occupies a very unusual place among physical theories: it contains classical mechanics as a limiting case, yet at the same time it requires this limiting case for its own formulation.
<blockquote>
— <a href="https://en.wikiquote.org/wiki/Lev_Landau">Lev Landau</a> and <a href="https://en.wikiquote.org/wiki/Evgeny_Lifshitz">Evgeny Lifshitz</a>, <i>Quantum Mechanics: Non-relativistic Theory</i> (3rd edition, 1977, <a href="https://en.wikiquote.org/wiki/Quantum_mechanics">afaik</a>).
</blockquote>
</blockquote>
<p>
Gradually, across a series of posts exploring alternative structures for a basic theory of physics, I've been trying to tease together a strategy wherein quantum mechanics is, rather than a nondeterministic foundation of reality, an approximation valid for sufficiently small systems. This post considers how one might devise a concrete mathematical demonstration that the strategy can actually work.
</p>
<p>
I came into all this with a gnawing sense that modern physics had taken a conceptual wrong turn somewhere, that it had made some —unidentified— incautious structural assumption that ought not have been made and was leading it further and further astray. (I explored the philosophy of this at some depth in an <a href="https://fexpr.blogspot.com/2015/06/thinking-outside-quantum-box.html">earlier post</a> in the series, several years ago by now.) The larger agenda here is to shake up our thinking on basic physics, accumulating different ways to structure theories so that our structural choices are made with eyes open, rather than just because we can't imagine an alternative. The <i>particular</i> notion I'm stalking atm —woven around the concept of <a href="https://fexpr.blogspot.com/2017/06/co-hygiene-and-quantum-gravity.html">co-hygiene</a>, to be explained below— is, in its essence, that quantum mechanics might be an approximation, just as Newtonian mechanics is, and that the quantum approximation may be a consequence of the systems-of-interest being almost infinitesimally small compared to the cosmos as a whole. Quantum mechanics suggests that all the elementary parts of the cosmos are connected to all the other elementary parts, which is clearly not conducive to practical calculations. In the model I'm pursuing, each element is connected to just a comparatively few others, and the whole jostles about, with each adjustment to an element shuffling its remote connections so that over many adjustments the element gets exposed to many other elements. Conjecturally, if a sufficiently small system interacts in this way with a sufficiently vast cosmos, the resulting behavior of the small system could look a lot like nondeterminism.
</p>
<p>
The question is, could it look like <i>quantum mechanics</i>?
</p>
<p>
As I've remarked before, my usual approach to these sorts of posts is to lift down off my metaphorical shelf the assorted fragments I've got on the topic of interest; lay out the pieces on the table, adding at the same time any new bits I've lately collected; inspect them all severally and collectively, rearranging them and looking for new patterns as I see them all afresh; and record my trail of thought as I do so. Sometimes I find that since the last time I visited things, my whole perception of them has shifted (I was, for example, struck in a recent post by how profoundly my perception of Church's λ-calculus has changed just in the past several years). Hopefully I glean a few new insights from the fresh inspection, some of which find their way into the new groupings destined to go back up on the shelf to await the next time, while some other, more speculative branches of reasoning that don't make it into my main stream of thought are preserved in my record for possible later pursuit.
</p>
<p>
Moreover, each iteration achieves focus by developing some particular theme within its line of speculation; some details of previous iterations are winnowed away to allow an uncluttered view of the current theme; and once the new iteration reaches its more-or-less-coherent insights, such as they are, a reset is then wanted, to unclutter the next iteration. Most of the posts in this series —with a couple of exceptions (<a href="https://fexpr.blogspot.com/2012/12/metaclassical-physics.html">1</a>, <a href="https://fexpr.blogspot.com/2018/06/why-quantum-math-is-unclassical.html">2</a>)— have focused on the broad structure of the cosmos, touching only lightly on concrete mathematics of modern physics that, after all, I've suspected from the start of favoring incautious structural assumptions. This incremental shifting between posts is why, within my larger series on physics, the current post has a transitional focus: reviewing the chosen cosmological structure in order to apply it to the abstract structure of the mathematics, preparing from abstract ground to launch an assault on the concrete.
</p>
<p>
Though I'll reach a few conclusions here —oriented especially toward guidance for the <i>next</i> installment in the series— much of this is going to dwell on reasons why the problem is difficult, which if one isn't careful could create a certain pessimism toward the whole prospect. I'm moderately optimistic that the problem can be pried open, over a sufficient number of patient iterations of study. The formidable appearance of a mountain in-the-large oughtn't prevent us from looking for a way to climb it.
</p>
<blockquote>
<b>Contents</b><br>
<a href="#sec-ceqm-cohy">Co-hygiene</a><br>
<a href="#sec-ceqm-prim">Primitive wave functions</a><br>
<a href="#sec-ceqm-prob">Probability distributions</a><br>
<a href="#sec-ceqm-qci">Quantum/classical interface</a><br>
<a href="#sec-ceqm-gen">Genericity</a><br>
<a href="#sec-ceqm-hi">The universe says 'hi'</a><br>
<a href="#sec-ceqm-box">The upper box</a><br>
</blockquote>
<span style="font-size: large;" id="sec-ceqm-cohy">Co-hygiene</span>
<p>
The schematic mathematical model I'm considering takes the cosmos to be a vast system of parts with two kinds of connections between them: local (geometry), and non-local (network). The system evolves by discrete transformational steps, which I conjecture may be selected based entirely on local criteria but, once selected, may draw information from both local and non-local connections and may have both local and non-local effects. The local part of all this would likely resemble classical physics.
</p>
<p>
When a transformation step is applied, its local effect must be handled in a way that doesn't corrupt the non-local network; that's called <i>hygiene</i>. If the <i>non-local</i> effect of a step doesn't perturb pre-existing <i>local</i> geometry, I call that <i>co-hygiene</i>. Transformation steps are not required in general to be co-hygienic; but if they are, then local geometry is only affected by local transformation steps, giving the steps a close apparent affinity with the local geometry, and I conjectured this could explain why gravity seems more integrated with spacetime than do the other fundamental forces. (Indeed, wondering why gravity would differ from the other fundamental forces was what led me into the whole avenue of exploration in the first place.)
</p>
<p>
Along the way, though, I also wondered if the non-local network could explain why the system deviated from "classical" behavior. Here I hit on an idea that offered a specific reason why quantum mechanics might be an approximation that works for very small systems. My inspiration for this sort of mathematical model was a class of variant λ-calculi (in fact, λ-calculus is co-hygienic, while in my dissertation I studied variant calculi that introduce non-co-hygienic operations to handle side-effects); and in those variant calculi, the non-local network topology is <i>highly volatile</i>. That is, each time a small subsystem interacts non-locally with the rest of the system, it may end up with different network neighbors than it had before. This means that <b><i>if</i></b> you're looking at a subsystem that is smaller than the whole system by a cosmically vast amount — say, if the system as a whole is larger than the subsystem by a factor of 10<sup>70</sup> or 10<sup>80</sup> — you might perform a very large number of non-local interactions and never interact with the same network-neighbor twice. It would be, approximately, as if there were an endless supply of other parts of the system for you to interact non-locally with. Making the non-local interactions look rather random.
</p>
<p>
Without the network-scrambling, non-locality alone would not cause this sort of seeming-randomness. The subsystem of interest could "learn" about its network neighbors through repeated interaction with them, and they would become effectively just part of its internal state. Thus, the network-scrambling, together with the assumption that the system is vastly larger than the subsystem, would seem to allow the introduction of an element of effective nondeterminism into the model.
</p>
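<p>
As a toy illustration of the scrambling effect (my own numerics, no part of the actual model): draw a freshly scrambled network-neighbor at each step and count how often the subsystem ever meets the same partner twice. For <i>n</i> interactions among <i>N</i> possible partners, the chance of any repeat is roughly <i>n</i><sup>2</sup>/2<i>N</i>, utterly negligible when <i>N</i> is anything remotely like 10<sup>70</sup>.
<blockquote><pre>
# Toy check: with non-local links reshuffled at every step, how often
# does a small subsystem interact with the same partner twice?
import random

def repeat_rate(N, n, trials=2000):
    hits = 0
    for _ in range(trials):
        seen = set()
        for _ in range(n):
            partner = random.randrange(N)   # freshly scrambled neighbor
            if partner in seen:
                hits += 1
                break
            seen.add(partner)
    return hits / trials

for N in (10**6, 10**10):
    print(N, repeat_rate(N, n=100))   # n*n/2N predicts 5e-3, then 5e-7
</pre></blockquote>
</p>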
<p>
But, is it actually <i>useful</i> to introduce an element of effective nondeterminism into the model? Notwithstanding Einstein's remark about whether or not God plays dice, if you start with a classical system and naively introduce a random classical element into it, you don't end up with a quantum wave function. (There is a vein of research, broadly called <a href="https://en.wikipedia.org/wiki/Stochastic_electrodynamics">stochastic electrodynamics</a>, that seeks to derive quantum effects from classical electrodynamics with random zero-point radiation on the order of Planck's constant, but apparently they're having trouble accounting for some quantum effects, such as quantum interference.) To turn this seeming-nondeterminism to the purpose would require some more nuanced tactic.
</p>
<p>
There is, btw, an interesting element of flexibility in the sort of effective-nondeterminism introduced: The sort of mathematical model I'm conjecturing has deterministic rules, so conceivably there could be some sort of <i>invariant</i> properties across successive rearrangements of the network topology. Thus, some kinds of non-local influences could be seemingly-random while others might, at least under some particular kinds of transformation (such as, under a particular fundamental force), be constant. The subsystem of interest could "learn" these invariants through repeated interactions, even though other factors would remain unlearnable. In effect, these invariants would be part of the state of the subsystem, information that one would include in a description of the subsystem but that, in the underlying mathematical model, would be distributed across the network.
</p>
<span style="font-size: large;" id="sec-ceqm-prim">Primitive wave functions</span>
<p>
Suppose we're considering some very small physical system, say a single electron in a potential field.
</p>
<p>
A potential field, as I suggested in a previous post, is a simple summation of combined influences of the rest of the cosmos on the system of interest, in this case our single electron. Classically —and under Relativity— the potential field would tell us nothing about <i>non-</i>local influences on the electron. In this sort of simple quantum-mechanical exercise, the potential field used is, apparently, classical.
</p>
<p>
The mathematical model in conventional quantum mechanics posits, as its underlying reality, a wave function — a complex- (or <a href="https://fexpr.blogspot.com/2014/03/the-great-vectors-versus-quaternions.html">quaternion</a>-, or <a href="https://en.wikipedia.org/wiki/Introduction_to_gauge_theory">whatever</a>-) valued field over the state space of the system, obeying some wave equation such as Schrödinger's,
</p>
<blockquote>
<table><tr>
<td>
<table><tr>
<td rowspan="2"><i>iℏ</i></td>
<td align="center" style="border-bottom:solid 1px">∂</td>
<td rowspan="2">Ψ</td></tr>
<tr><td>∂<i>t</i></td></tr>
</table>
</td>
<td> = </td>
<td> <i>Ĥ</i></td>
<td>Ψ</td>
<td>.</td>
</tr></table>
</blockquote>
<p>
This posited underlying reality has no electron in the classical sense of something that has a precise position and momentum at each given time; the wave function is what's "really" there, and any observation we would understand as measuring the position or momentum of the electron is actually drawing on the information contained in the wave function.
</p>
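<p>
Before picking at it, it may help to see "deterministic evolution of the wave function" concretely. Here is a minimal numerical sketch (my own illustration, using the standard split-step Fourier method with ℏ and mass set to 1; the harmonic potential and Gaussian packet are arbitrary choices):
<blockquote><pre>
# Split-step integration of  i ∂ψ/∂t = -(1/2) ∂²ψ/∂x² + V(x) ψ
import numpy as np

n, L, dt = 512, 40.0, 0.01
x = np.linspace(-L/2, L/2, n, endpoint=False)
k = 2 * np.pi * np.fft.fftfreq(n, d=L/n)      # momentum grid
V = 0.5 * x**2                                # harmonic potential (arbitrary)
psi = np.exp(-(x - 3.0)**2).astype(complex)   # displaced Gaussian packet
psi /= np.sqrt((abs(psi)**2).sum() * (L/n))   # normalize

half_V = np.exp(-0.5j * dt * V)               # half-step in the potential
kinetic = np.exp(-0.5j * dt * k**2)           # full step in the kinetic term
for _ in range(1000):
    psi = half_V * np.fft.ifft(kinetic * np.fft.fft(half_V * psi))

print("norm:", (abs(psi)**2).sum() * (L/n))   # stays 1: unitary, deterministic
</pre></blockquote>
</p>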
<p>
While the wave function evolves deterministically, the mathematical model as a whole presents a nondeterministic theory. This nondeterminism is not a necessary feature of the theory. An alternative mathematical model exists, giving exactly the same predictions, in which there is an electron there in the classical sense, with precise position and momentum at each given time. Of course its position and momentum can't be simultaneously known by an observer (which would violate the Heisenberg uncertainty principle); but in the underlying model the electron does <i>have</i> those unobservable attributes. David Bohm published this model in 1952. However Bohm's model doesn't seem to have offered anything <i>except</i> a demonstration that quantum theory does not prohibit the existence of an unobservable deterministic classical electron. In Bohm's model, the electron had a definite position and momentum, yes, but it was acted on by a "pilot wave" that, in essence, <i>obeyed Schrödinger's equation</i>. And Schrödinger's equation is non-local, in the sense that not only does it allow information (unobservable information) to propagate faster than light, it allows it to "propagate" infinitely fast; the hidden information in the wave function does not really propagate "through" space, it just shows up wherever the equation says it should. Some years later, Bell's Theorem would show that this sort of non-locality is a necessary feature of any theory that always gives the same predictions as quantum mechanics (given some other assumptions, one of which I'm violating; I'll get back to that below); but my main point atm is that Bohm's model doesn't offer any new way of looking at the wave function itself. You still have to just accept the wave function as a primitive; Bohm merely adds an extra stage of reasoning in understanding how the wave function applies to real situations. If there's any practical, as opposed to philosophical, advantage to using Bohm's model, it must be a subtle one. Nevertheless, it <i>does</i> reassure us that there is no prohibition against a model in which the electron is a definite, deterministic thing in the classical sense.
</p>
<p>
The sort of model I'm looking for would have two important differences from Bohm's.
</p>
<p>
First, the wave function would not be primitive at all, but instead would be a consequence of the way the local-geometric aspect of the cosmos is distorted by the new machinery I'm introducing. The Schrödinger equation, above, seems to have just this sort of structure, with <i>Ĥ</i> embodying the classical behavior of the system while the rest of the equation is the shape of the distorting lens through which the classical behavior passes to produce its quantum behavior. The trick is to imagine any sensible way of understanding this distorting lens as a consequence of some deeper representation (keeping in mind that the local-geometric aspect of the cosmos needn't be classical physics as such, though this would be one's first guess).
</p>
<p>
A model with different primitives is very likely to lead to <i>different questions</i>; to conjure a quote from Richard Feynman, "by putting the theory in a certain kind of framework you get an idea of what to change". Hence a theory in which the wave function is not primitive could offer valuable fresh perspective even if it isn't in itself experimentally distinguishable from quantum mechanics. There's also the matter of equivalent mathematical models that are easier or harder to apply to particular problems — conventional quantum mechanics is frankly hard to apply to almost <i>any</i> problem, so it's not hard to imagine an equivalent theory with different primitives could make some problems more tractable.
</p>
<p>
Second, the model I'm looking for <i>wouldn't</i>, at least not necessarily, always produce the same predictions as quantum mechanics. I'm supposing it would produce the same predictions for systems practically infinitesimal compared to the size of the cosmos. Whether or not the model would make experimentally distinguishable predictions from quantum mechanics at a cosmological scale, would seem to depend on how much, or little, we could work out about the non-local-network part of the model; perhaps we'd end up with an incomplete model where the network part of it is just unknown, and we'd be none the wiser (but for increased skepticism about some quantum predictions), or perhaps we'd find enough structural clues to conjecture a more specific model. Just possibly, we'd end up with some cosmological questions to distinguish possible network structures, which (as usual with questions) could be highly fruitful regardless of whether the speculations that led to the questions were to go down in flames, or, less spectacularly, were to produce all the same predictions as quantum mechanics after all.
</p>
<span style="font-size: large;" id="sec-ceqm-prob">Probability distributions</span>
<p>
Wave functions have always made me think of probability distributions, as if there ought to be some deterministic thing underneath whose distribution of possible states is generating the wave function. What's missing is any explanation of how to generate a wave-function-like thing from a classical probability distribution. (Not to lose track of the terminology, this is <i>classical</i> in the sense of <i>classical probability</i>, which in turn is based on <i>classical logic</i>, rather than <i>classical physics</i> as such. Though they all come down to us from the late nineteenth century, and complement each other.)
</p>
<p>
A classical probability distribution, as such, is fairly distinctive. You have an observable with a range of possible values, and you have a range of possible worlds each of which induces an observable value. Each possible world has a non-negative-real likelihood. The (unnormalized) probability distribution for the observable is a curve over the range of observable values, summing for each observable value the likelihoods of all possible worlds that yield that observable value. The probability of the observable falling in a certain interval is the area under the curve over that interval, divided by the area under the curve over the entire range of observable values. If you add together two mutually disjoint sets of possibilities, the areas under their curves simply add, since for each observable value the set of possible worlds yielding it is just the ones in the first set and the ones in the second set.
</p>
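<p>
The addition rule is almost trivially simple to state in code, which is rather the point. A sketch (purely illustrative; the worlds and values are invented):
<blockquote><pre>
# Classical distribution: sum the likelihoods of all possible worlds
# yielding each observable value; disjoint sets of worlds just add.
from collections import Counter

def distribution(worlds):
    """worlds: iterable of (likelihood, observable value) pairs."""
    dist = Counter()
    for likelihood, value in worlds:
        dist[value] += likelihood
    return dist

a = [(0.2, "left"), (0.3, "right")]    # one set of possible worlds
b = [(0.5, "left")]                    # a disjoint set
assert distribution(a + b) == distribution(a) + distribution(b)
print(dict(distribution(a + b)))       # roughly {'left': 0.7, 'right': 0.3}
</pre></blockquote>
</p>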
<p>
The trouble is, that distinctive pattern of a classical probability distribution is not how wave functions work. When you add together two wave functions, the two curves get added all right, but the values aren't unsigned reals; they can <i>cancel</i> each other, producing an interference pattern as in classic electron diffraction. (I demonstrated the essential role of cancellation, and a very few other structural elements, in quantum mechanical behavior in a <a href="https://fexpr.blogspot.com/2018/06/why-quantum-math-is-unclassical.html">recent post</a>.) As an additional plot twist, the wave function values add, but the <i>probability</i> isn't their sum but (traditionally) the square of the magnitude of their sum.
</p>
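<p>
And the corresponding sketch for wave-function-style values, to show the contrast: the values add as complex numbers, the probability is the squared magnitude of the sum, and equal-and-opposite contributions cancel rather than accumulate.
<blockquote><pre>
# Two paths to the same outcome: classical likelihoods always add,
# complex amplitudes can cancel (probability = |sum|², the Born rule).
import cmath

print("classical:", 0.5 + 0.5)               # 1.0, no way to cancel

a1 = cmath.rect(0.5, 0.0)                    # amplitude with phase 0
for phase in (0.0, cmath.pi / 2, cmath.pi):
    a2 = cmath.rect(0.5, phase)
    print(phase, abs(a1 + a2)**2)            # 1.0, 0.5, then about 0
</pre></blockquote>
</p>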
<p>
One solution is to reject classical logic, since classical logic gives rise to the addition rule for deterministic probability distributions. Just say the classical notion of logical disjunction (and conjunction, etc.) is wrong, and quantum logic is the way reality works. While you're at it, invoke the idea that the world doesn't have to make sense to us (I've remarked <a href="https://fexpr.blogspot.com/2014/01/lamlosuo.html#sec-lamlosuo-prelim">before</a> on my dim view of the <i>things beyond mortal comprehension</i> trope). Whatever its philosophical merits or demerits, this approach doesn't fit the current context for two reasons: it treats the wave function as primitive whereas we're interested in alternative primitives, so it doesn't appear to get us anywhere new/useful; and, even if it did get us somewhere useful (which it apparently doesn't), it's not the class of mathematical model I'm exploring here. I'm pursuing a mathematical model spiritually descended from λ-calculus, which is very much in the classical deterministic tradition.
</p>
<p>
So, we're looking for a way to derive a wave function from a classical probability distribution. One has to be very canny about approaching something like this. It's not plausible this would be untrodden territory; the strategy would naturally suggest itself, and lots of very smart, highly trained physicists with strong motive to consider it have had nearly a century in which to do so. Yet, frankly, if anyone had succeeded it ought to be well-known in alternative-QM circles, and I'd hope to have at least heard of it. So going into the thing one should apply a sort of lamppost principle, and ask what one is bringing to the table that could possibly allow one to succeed where they did not. (A typical version of the lamppost principle would say, if you've lost your keys at night somewhere on a dark street with a single lamppost, you should look for them near the lamppost since your chances of finding them if they're somewhere else are negligible. Here, to mix the metaphors, the something-new you bring to the table is the location of your lamppost.)
</p>
<p>
I'm still boggled by how close the frontier of human knowledge is. In high school I chose computer science for a college major partly (though only partly) because it seemed to me like there was so much mathematics you could spend a lifetime on it without reaching the frontier — and yet, by my sophomore year in college I was exploring <i>extracurricularly</i> some odd corner of mathematics (I forget what, now) that had clearly never been explored before. And now I'm recently disembarked from a partly-mathematical dissertation; a doctoral dissertation being, rather by definition, stuff nobody has ever done before. The idea that the math I was doing in my dissertation was something nobody had ever done before, is just freaky. At any rate, I'm bringing to this puzzle in physics a mathematical perspective that's not only unusual for physics, but unique even in the branch of mathematics I brought it from.
</p>
<p>
The particular mathematical tools I'm mainly trying to apply are:
</p>
<ul>
<li><p>"metatime" (or whatever else one wants to call it), over which the cosmos evolves by discrete transformation steps. This is the thing I'm doing that breaks the conditions for Bell's Theorem; but all I've shown it works for is reshaping a uniform probability distribution into one that violates Bell's Inequality (<a href="https://fexpr.blogspot.com/2012/12/metaclassical-physics.html">here</a>), whereas now we're not just reshaping a particular distribution but trying to mess with the rules by which distributions <i>combine</i>.</p>
<p>My earlier post on metatime was explicitly concerned with the fact that quantum-mechanical predictions, while non-local with respect to time, could still be local with respect to some orthogonal dimension ("metatime"). Atm I'm not centrally interested in strict locality with respect to metatime; but metatime still interests me as a potentially useful tactic for a mathematical model, offering a smooth way to convert a classical probability distribution into time-non-locality.</p></li>
<li><p>transformation steps that aggressively scramble non-local network topology. This seems capable of supplying classical nondeterminism (apparently, on a small scale); but the apparent nondeterminism we're after isn't classical.</p></li>
<li><p>a broad notion that the math will <i>stop</i> looking like a wave function whenever the network scrambling ceases to sufficiently approximate classical nondeterminism (which ought to happen at large scales). But this only suggests that the nondeterminism would be a necessary ingredient in extracting a wave function, without giving any hint of what would replace the wave function when the approximation fails.</p></li>
</ul>
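<p>
For reference, the inequality violation mentioned in the first item above is easy to exhibit numerically; a sketch, using the standard textbook angles (nothing original here): the singlet-state correlation E(<i>a</i>,<i>b</i>) = −cos(<i>a</i>−<i>b</i>) pushes the CHSH combination to 2√2, while any classical local model stays at or below 2.
<blockquote><pre>
# CHSH: quantum singlet correlations exceed the classical-local bound of 2.
import math

def E(a, b):
    return -math.cos(a - b)       # singlet-state correlation for settings a, b

a0, a1 = 0.0, math.pi / 2
b0, b1 = math.pi / 4, 3 * math.pi / 4
S = E(a0, b0) - E(a0, b1) + E(a1, b0) + E(a1, b1)
print(abs(S))                     # 2*sqrt(2), about 2.828; classical bound is 2
</pre></blockquote>
</p>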
<p>
These are some prominent new things I'm bringing to the table. At least the second and third are new. Metatime is a hot topic atm, under a different name (<i>pseudo-time</i>, I think), as a device of the <a href="https://en.wikipedia.org/wiki/Transactional_interpretation">transactional interpretation of QM</a> (TI). Advocates recommend TI as eliminating the conceptual anomalies and problems of other interpretations — EPR paradox, Schrödinger's cat, etc. — which bodes well for the utility of metatime here. I don't figure TI bears directly on the current purpose though because, as best I can tell, TI retains the primitive wave function. (TI does make another cameo appearance, below.)
</p>
<p>
On the problem of deriving the wave function, I don't know of any previous work to draw on. There certainly could be something out there I've simply not happened to cross paths with, but I'm not sanguine of finding such; for the most part, the subject suffers from a common problem of extra-paradigm scientific explorations: researchers comparing the current paradigm to its predecessor are very likely to come to the subject with intense bias. Researchers within the paradigm take pains to show that the old paradigm is wrong; researchers outside the paradigm are few and idiosyncratic, likely to be stuck on either the old paradigm or some other peculiar idea.
</p>
<p>
The bias by researchers within the paradigm, btw, is an important survival adaptation of the <a href="https://fexpr.blogspot.com/2011/03/memetic-organisms.html">scientific species</a>. The great effectiveness of paradigm science — which benefits its evolutionary success — is in enabling researchers to focus sharply on problems within the paradigm by eliminating distracting questions about the merits of the paradigm; and therefore those distracting questions have to be crushed decisively whenever they arise. It's hard to say whether this bias is stronger in the first generation of scientists under a paradigm, who have to get it moving against resistance from its predecessor, or amongst their successors trained within the zealous framework inherited from the first generation; either way, the bias tends to produce a dearth of past research that would aid my current purpose.
</p>
<p>
A particularly active, and biased, area of extra-paradigm science is <i>no-go theorems</i>, theorems proving that certain alternatives to the prevailing paradigm cannot be made to work (cf. old post <a href="https://fexpr.blogspot.com/2013/07/bypassing-no-go-theorems.html">yonder</a>). Researchers within the paradigm want no-go theorems to crush extra-paradigm alternatives once and for all, and proponents of that sort of crushing agenda are likely, in their enthusiasm, to overlook cases not covered by the formal no-go-result. Extra-paradigm researchers, in contrast, are likely to ferret out cases not covered by the result and <i>concentrate</i> on those cases, treating the no-go theorems as helpful hints on how to build alternative ideas rather than discouragement from doing so. The paradigm researchers are likely to respond poorly to this, and accuse the alternative-seekers of being more concerned with rejecting the paradigm than with any particular alternative. The whole exchange is likely to generate much more heat than light.
</p>
<span style="font-size: large;" id="sec-ceqm-qci">Quantum/classical interface</span>
<p>
A classical probability distribution is made up of possibilities. One of them is, and the others are not; we merely don't know which one is. This is important because it means there's no way these possibilities could ever <i>interact</i> with each other; the one that is has nothing to interact <i>with</i> because in fact there are no other possibilities. That is, the other possibilities aren't; they exist only in our minds. This non-interaction is what makes the probability distribution classical. Therefore, in considering ways to derive our wave function from <i>classical</i> probability distributions, any two things in the wave function that interact with each other <i>do not</i> correspond to different classical possibilities.
</p>
<p>
It follows that quantum states — those things that can be superposed, interfere with each other, and partly cancel each other out — are not separated by a boundary between different classical possibilities. This does not, on the face of it, prohibit superposable elements from being <i>prior</i> or <i>orthogonal</i> to such boundaries, so that the mathematical model superposes entities of some sort and then applies them to a classical probability distribution (or applies the distribution to them). Also keep in mind, though we're striving for a model in which the wave function isn't primitive, we haven't pinned down yet what <i>is</i> primitive.
</p>
<p>
Now, the wave function isn't a <i>thing</i>. It isn't observable, and we introduce it into the mathematics only because it's useful. So if it also isn't primitive, one has to wonder whether it's even needed in the mathematics, or whether perhaps we're simply to <i>replace</i> it by something else. To get a handle on this, we need to look at how the wave function is actually used in applying quantum mechanics to physical systems; after all, one can't very well fashion a replacement for one part of a machine unless one understands how that part interacts with the rest of the machine.
</p>
<p>
The entire subject of quantum mechanics appears imho to be filled with over-interpretation; to the extent any progress has been made in understanding quantum mechanics over the past nearly-a-century, it's consisted largely in learning to prune unnecessary metaphysical underbrush so one has a somewhat better view of the theory.
</p>
<p>
The earliest, conventional "interpretation" of QM, the "Copenhagen interpretation", says properties of the physical system don't exist until observed. This, to be brutally honest, looks to me like a metaphysical statement without practical meaning. There is a related, but more practical, concept called <i>contextuality</i>; and an associated — though unfortunately technically messy — no-go theorem called the <a href="https://en.wikipedia.org/wiki/Kochen%E2%80%93Specker_theorem">Kochen–Specker theorem</a>, a.k.a. the Bell–Kochen–Specker theorem. This all relates to the Heisenberg uncertainty principle, which says that you can't know the exact position and momentum of a particle at the same time; the more you know about its position, the less you can know about its momentum, and vice versa. One might think this would be because the only way to measure the particle's position or momentum is to interact with it, which alters the particle because, well, because to every action there is an equal and opposite reaction. However, in the practical application of the wave function to a quantum-mechanical system, there doesn't appear to <i>be</i> any experimental apparatus <i>within the quantum system</i> for the equal-and-opposite-reaction to apply to. Instead, there's simply a wave function and then it <i>collapses</i>. Depending on what you choose to observe (say, the position or the momentum), it collapses differently, so that the unobservable internal state of the system actually <i>remembers</i> which you chose to observe. This property, that the (unobservable) internal state of the system changes as a result of what you choose to measure about it, is contextuality; and the Kochen–Specker theorem says a classical hidden-variable theory, consistent with QM, must be contextual (much as Bell's Theorem says it must be non-local). Remember Bohm's hidden-variable theory, in which the particle does have an unobservable exact position and momentum? Yeah. Besides being rampantly non-local, Bohm's model is also contextual: the particle's (unobservable, exact) position and momentum are guided by the wave function, and the wave-function is perturbed by the choice of measurement, therefore the particle's (unobservable, exact) position and momentum are also perturbed by the choice of measurement.
</p>
<p>
<a href="https://en.wikipedia.org/wiki/John_Stewart_Bell">Bell</a>, being of a later generation than Bohr and Einstein (and thus, perhaps, less invested in pre-quantum metaphysical ideas), managed not to be distracted by questions of what is or isn't "really there". His take on the situation was that the difficulty was in how to handle the interface between quantum reality and classical reality — not philosophically, but practically. To see this, consider the basic elements of an exercise in traditional QM (non-relativistic, driven by Schrödinger's equation):
</p>
<ul>
<li><p>A set of parameters define the classical state of the system; these become inputs to the wave equation. <small>[typo fixed<!-- function -> equation-->]</small></p></li>
<li><p>A Hamiltonian operator <i>Ĥ</i> embodies the classical dynamics of the system.</p></li>
<li><p>Schrödinger's equation provides quantum distortion of the classical system.</p></li>
<li><p>A Hermitian operator called an "observable" embodies the experimental apparatus used to observe the system. The wave function collapses to an eigenstate of the observable. (All four elements are assembled in the sketch after this list.)</p></li>
</ul>
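<p>
To keep those four elements straight, here's a complete miniature exercise (a two-state system, with toy choices of Hamiltonian and observable that are mine, nothing canonical about them):
<blockquote><pre>
# Miniature QM exercise: parameters, Hamiltonian, Schrödinger evolution,
# then an "observable" collapsing the state per the Born rule.
import numpy as np

psi = np.array([1.0, 0.0], dtype=complex)    # initial state: "spin up"
H = np.array([[0.0, 1.0], [1.0, 0.0]])       # Hamiltonian (toy choice)

t = 0.7                                      # evolve: ψ(t) = exp(-iHt) ψ(0)
evals, evecs = np.linalg.eigh(H)
U = evecs @ np.diag(np.exp(-1j * evals * t)) @ evecs.conj().T
psi = U @ psi

Z = np.array([[1.0, 0.0], [0.0, -1.0]])      # observable (toy choice)
zvals, zvecs = np.linalg.eigh(Z)
probs = abs(zvecs.conj().T @ psi)**2         # Born probabilities
i = np.random.choice(len(zvals), p=probs)    # the one discrete outcome
psi = zvecs[:, i]                            # collapse to that eigenstate

print("P =", probs.round(3), " measured:", zvals[i])
</pre></blockquote>
</p>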
<p>
The observable is the interface between the quantum system and the classical world of the physicist; and Bell ascribes the difficulty to this interface. Consider a standard double-slit experiment in which an electron gun fires electrons one at a time through the double slit at a CRT screen where each electron causes a scintillation. As long as you don't observe which slit the electron passes through, you get an interference pattern from the wave function passing through the two slits, and that is quantum behavior; but there's nothing in the wave function to suggest the discreteness of the resulting scintillation. That discreteness results from the wave function collapse due to the observable, the interface with classical physics — and that discreteness is an <i>essential part of the described physical reality</i>. Scan that again: in order to fully account for physical reality, the quantum system <i>has</i> to encompass only a part of reality, because the discrete aspect of reality is only provided by the interface between the quantum system and surrounding classical physics. It seems that we couldn't describe the entire universe using QM even if we wanted to because, without a classical observable to collapse the wave function, the discrete aspect of physical reality would be missing. (Notice, this account of the difficulty is essentially structural, with only the arbitrary use of the term <i>observable</i> for the Hermitian operator as a vestige of the history of philosophical angst over the "role of the observer". It's not that there isn't a problem, but that presenting the problem as if it were philosophical only gets in the way of resolving it.)
</p>
<p>
The <a href="https://en.wikipedia.org/wiki/Many-worlds_interpretation">many-worlds interpretation of QM</a> (MWI) says that the wave function does not, in fact, collapse, but instead the entire universe branches into multiples for the different possibilities described by the wave function. Bell criticized that while this is commonly presented as supposing that the wave function is "all there is", in fact it arbitrarily adds the missing discreteness:
<blockquote>
the extended wave does not simply fail to specify one of the possibilities as actual...it fails to list the possibilities. When the MWI postulates the existence of many worlds in each of which the photographic plate is blackened at particular position, it adds, surreptitiously, the missing classification of possibilities. And it does so in an imprecise way, for the notion of the position of a black spot (it is not a mathematical point) [...] [or] reading of any macroscopic instrument, is not mathematically sharp. One is given no idea of how far down towards the atomic scale the splitting of the world into branch worlds penetrates.
<blockquote>
— J.S. Bell, "Six possible worlds of quantum mechanics", <i>Speakable and unspeakable in quantum mechanics</i> (anthology), 1993.
</blockquote>
</blockquote>
I'm inclined to agree: whatever philosophical comfort the MWI might provide to its adherents, it doesn't clarify the practical situation, and adds a great deal of conceptual machinery in the process of not doing so.
</p>
<p>
The transactional "interpretation" of QM is, afaik, somewhat lower-to-the-ground metaphysically. To my understanding, TI keeps everything in quantum form, and posits that spacetime events interact through a "quantum handshake": a wave propagates forward in time from an emission event, while another propagates backward in time from the corresponding absorption event, and they form a standing wave between the two while backward waves cancel out before the emission and forward waves cancel after the absorption. Proponents of the TI report that it causes the various paradoxes and conceptual anomalies of QM to disappear (cf. <a href="https://fexpr.blogspot.com/2012/12/metaclassical-physics.html#sec-mcphys-natstruct">striking natural structure</a>), and this makes sense to me because the "observable" Hermitian operator should be thus neatly accounted for as representing <i>half</i> of a quantum handshake, in which the "observer" half of the handshake is not part of the particular system under study. Wherever we choose to put the boundary of the system under study, the interface to our experimental apparatus would naturally have this half-a-handshake shape.
</p>
<p>
The practical lesson from the transactional interpretation seems to be that, for purposes of modeling QM, we don't have to worry about the wave function collapsing. If we can replicate the wave function, we're in. Likewise, if we can replicate the classical probability distributions that the wave function generates; so long as this includes all the probability distributions that result from weird quantum correlations (spooky action-at-a-distance). That the latter suffices, should be obvious since generating those probability distributions is the whole point of quantum theory; that the latter is <i>possible</i> is demonstrated by Bohm's hidden-variable theory (sometimes called the "Bohm Interpretation" by those focusing on its philosophy).
</p>
<span style="font-size: large;" id="sec-ceqm-gen">Genericity</span>
<p>
There is something odd about the above list of basic elements of a QM exercise, when compared to the rewriting-calculus-inspired model we're trying to apply to it. When one thinks of a calculus term, it's a very concrete thing, with a specific representation (in fact over-specific, so that maintaining it may require <i>α</i>-renaming to prevent specific name choices from disrupting hygiene); and even classical physics seems to present a rather concrete representation. But the quantum distortion of the wave equation apparently applies to whatever description of a physical system we choose; to any choice of parameters and <i>Ĥ</i>, regardless of whether it bears any resemblance to classical physics. It certainly isn't specific to the representation of any single elementary unit, since it doesn't even blink (metaphorically) at shifting application from a one-electron to a two-electron system.
</p>
<p>
This suggests, to me anyway, two things. On the negative/cautionary side, it suggests a lack of information from which to choose a concrete representation for the "local" part of a physical system, which one might have thought would be the most straightforward and stable part of a cosmological "term". Perhaps more to the point, though, on the <i>positive, insight-aiding</i> side it suggests that if the quantum distortion is caused by some sort of non-local network playing out through rewrites in a dimension orthogonal to spacetime, we should consider trying to construct machinery for it that <i>doesn't</i> depend, much, on the particular shape of the local representation. If our distortion machinery does place some sort of constraints on local representation, they'd better be constraints that say something true about physics. Not forgetting, we expect our machinery to notice the difference between gravity and the other fundamental forces.
</p>
<p>
My most immediate goal, though, lest we forget, is to reckon whether it's at all possible for any such machinery to produce the right sort of quantum distortion: a <i><a href="https://en.wikipedia.org/wiki/Sanity_check">sanity check</a></i>. Clues to the sort of thing one ought to look for are extremely valuable; but, having assimilated those clues, I don't atm require a full-blown theory, just a sense of what sort of thing is possible. Anything that can be <i>left out</i> of the demonstration probably should be. We're not even working with the best wave equation available; the Schrödinger equation is only an approximation covering the non-relativistic case. In fact, the transactional-interpretation folks tell us their equations require the relativistic treatment, so it's even conceivable the sanity check could run into difficulties because of the non-relativistic wave equation (though one might reasonably hope the sanity check wouldn't require anything so esoteric). But all this talk about relativistic and non-relativistic points out that there is, after all, something subtle about local geometry built into the form of the wave equation even though it's not directly visible in the local representation. In which case, the wave equation may still contain the essence of that co-hygienic difference between gravity and the other fundamental forces (although... for <i>gravity</i> even the usual special-relativistic Dirac equation might not be enough, and we'd be on to the Dirac equation for curved spacetime; let's hope we don't need that just yet).
</p>
<span style="font-size: large;" id="sec-ceqm-hi">The universe says 'hi'</span>
<p>
Let's just pause here, take a breather and see where we are. The destination I've had my eye on, from the start of this post, was to demonstrate that a rewriting system, of the sort described, could produce some sort of quantum-like wave function. I've been lining up support, section by section, for an assault on the technical specifics of how to set up rewriting systems — and we're <i>not ready for that yet</i>. As noted just above, we need more information from which to choose a concrete representation. If we try to tangle with that stuff before we have enough clues from... somewhere... to guide us through it, we'll just tie ourselves in knots. This kind of exploration has to be approached softly, shifting artfully from one path to another from time to time so as not to rush into hazard on any one angle of attack. So, with <a href="https://en.wikipedia.org/wiki/Spider-Man">spider-sense tingling</a> —or perhaps <a href="http://en.wikipedia.org/wiki/Three_Witches">thumbs pricking</a>— I'll shift now to consider, instead of pieces of the cosmos, pieces of the <i>theory</i>.
</p>
<p>
In conventional quantum mechanics, as noted a couple of sections above, we've got basically three elements that we bring together: the parameters of our particular system of study, our classical laws of physics, and our wave equation. Well, yeah, we also have the Hermitian operator, but, as remarked earlier, we can set that aside since it's to do with interfacing to the system, which was our focus in that section but isn't what we're after now. The parameters of the particular system are what they are. The classical laws of physics are, we suppose, derived from the transformation rules of our cosmic rewriting system, with particular emphasis on the character of the primitive elements of the cosmos (whatever they are) and the geometry, and some degree of involvement of the network topology. The wave equation is also derived from the transformation rules, especially from how they interact with the network topology.
</p>
<p>
This analysis is already deviating from the traditional quantum scenario, because in the traditional scenario the classical laws of physics are strictly <i>separate</i> from the wave equation. We've had hints of something deep going on with the choice of wave equation: Transactional Interpretation researchers reporting that they couldn't use the non-relativistic wave equation; and then the odd intimation, in my recent post deriving quantum-like effects from a drastically simplified system that <a href="https://fexpr.blogspot.com/2018/06/why-quantum-math-is-unclassical.html">lacked a wave equation</a>, that the lack of a wave equation was somehow crippling something to do with systemic coherence buried deep in the character of the mathematics. Though it does seem plausible that the wave equation would be derived more from the network topology, and perhaps the geometry, whereas the physical laws would be derived more from the character of the elementary physical components, it is perhaps only to be expected that these two components of the theory, laws and wave equation, would be coupled through their deep origins in the workings of a single cosmological rewriting calculus.
</p>
<p>
Here is how I see the situation. We have a sort of black box, with a hand crank and input and output chutes, and the box is labeled <i>physical laws + wave equation</i>. We can feed into it the parameters of the particular physical system we're studying (such as a single electron in a potential field), carefully turn the crank (because we know it's a somewhat cantankerous device so that a bit of artistry is needed to keep it working smoothly), and out comes a wave function, or something akin, describing, in a predictive sense, the observable world. What's curious about this box is that we've looked inside, and even though the input and output are in terms of a classical world, inside the box it appears that there <i>is</i> no classical world. Odd though that is, we've gotten tolerably good at turning the crank and getting the box to work right. However, somewhere above that box, we are trying to assemble another box, with its own hand crank and input/output chutes. To this box, we mean to feed in our cosmic geometry, network topology, and transformation rules, and possibly some sort of initial classical probability distribution, and if we can get the ornery thing to work at all, we mean to turn the crank and get out of it — the physical laws plus wave equation.
</p>
<p>
Having arrived at this vision of an upper box, I was reading the other day an honestly rather prosaic account of the party line on quantum mechanics (a 2004 book, not at all without merit as a big-picture description of mainstream thought, called <i><a href="https://en.wikipedia.org/wiki/Leon_M._Lederman#Publications">Symmetry and the Beautiful Universe</a></i>), and encountered a familiar rhetorical question of such treatments: when considering a quantum mechanical wave function, "[...] what is doing the waving?" And unlike previous times I'd encountered that question (years or decades before), this time the answer seemed obvious. The value of the wave function is not a property of any particular particle in the system being studied, nor is it even a property of the system-of-interest as a whole; it's not part of the input we feed into the lower box at all, rather it's a property of the <i>state</i> of the system and so part of the output. The wave equation describes what happens when the system-of-interest is placed into the context of a vastly, vastly larger cosmos (we're supposing it <i>has</i> to be staggeringly vaster than the system-of-interest in order for the trick to work right), and the whole is set to jostling about till it settles into a stable state. Evidently, the shape that the lower box gives to its output is the footprint of the surrounding cosmos. So this time when the question was asked, it seemed to me that what is waving is <i>the universe</i>.
</p>
<span style="font-size: large;" id="sec-ceqm-box">The upper box</span>
<p>
All we have to work with here are our broad guesses about the sort of rewriting system that feeds into the upper box, and the output of the lower box for some inputs. Can we deduce anything, from these clues, about the workings of the upper box?
</p>
<p>
As noted, the wave function that comes out of the lower box assigns a weight to each state of the entire system-of-interest, rather than to each part of the system. Refining that point, each weight is assigned to a complete state of the system-of-interest rather than to a separable state of a part of the system-of-interest. This suggests the weight (or, a weight) is associated with each particular possibility in the classical probability distribution that we're supposing is behind the wave equation generated by the upper box. Keep in mind, these possibilities are not possible states of the system-of-interest at a given time; they're possible states of the whole of spacetime; the shift between those two perspectives is a slippery spot to step carefully across.
</p>
<p>
A puzzler is that the weights on these different possibilities are not independent of each other; they form a coherent pattern dictated by the wave equation. Whatever classical scenario spacetime settles into, it apparently has to incorporate effective knowledge of other possible classical scenarios that it didn't settle into. Moreover, <i>different</i> classical scenarios for the cosmos must —eventually, when things stabilize— settle down to a weight that depends only on the state of our system-of-interest. Under the sort of structural discipline we're supposing, that correlation between scenarios is generated by any given possible spacetime jostling around between classical scenarios, and thus roaming over various possible scenarios to sample them. Evidently, the key to all of this must be the transitions between cosmic scenarios: these transitions determine how the weight changes between scenarios (whatever that weight actually is, in the underlying structure), how the approach to a stable state works (whatever exactly a stable state is), and, of course, how the classical probabilities eventually correlate with the weights. That's a lot of unknowns, but the positive insight here is that the key lever for all of it is the transitions between cosmic scenarios.
</p>
<p>
And now, perhaps, we are ready (though we weren't a couple of sections above) to consider the specifics of how to set up rewriting systems. Not, I think, at this moment; I'm saturated, which does tend to happen by the end of one of these posts; but as the next step, after these materials have gone back on the shelf for a while and had a chance to become new again. I envision practical experiments with how to assemble a rewriting system that, fed into the upper box, would cause the lower box to produce simple quantum-like systems. The technique is philosophically akin to my <a href="https://fexpr.blogspot.com/2018/06/why-quantum-math-is-unclassical.html">recent construction</a> of a toy cosmos with just the barest skeleton of quantum-like structure, demonstrating that the most basic unclassical properties of quantum physics require almost none of the particular structure of quantum mechanics. That treatment particularly noted that the lack of a wave equation seemed especially problematic; the next step I envision would seek to understand how something like a wave equation could be induced from a rewriting system. Speculatively, from there one might study how variations of rewriting system produce different sorts of classical/quantum cosmos, and reason on toward what sort of rewriting system might produce real-world physics; a speculative goal perhaps quite different from where the investigation will lead in practice, but for the moment offering a plausible destination to make sail for.
</p>
John Shutthttp://www.blogger.com/profile/00041398073010099077noreply@blogger.com0tag:blogger.com,1999:blog-7068528325708136131.post-36759090400982039272018-06-25T16:34:00.000-07:002018-07-24T19:46:53.059-07:00Why quantum math is unclassical<blockquote>
For me, the important thing about quantum mechanics is the equations, the mathematics. If you want to understand quantum mechanics, just do the math. All the words that are spun around it don't mean very much. It's like playing the violin. If violinists were judged on how they spoke, it wouldn't make much sense.
<blockquote>
— <a href="https://en.wikipedia.org/wiki/Freeman_Dyson">Freeman Dyson</a>, <a href="https://en.wikiquote.org/wiki/Quantum_mechanics">in</a> an interview with Onnesha Roychoudhuri, <i><a href="https://en.wikipedia.org/wiki/Salon_(website)">Salon</a></i>, 2007.
</blockquote>
</blockquote>
<p>
Put aside all metaphysical questions about what sort of universe could be described by quantum mechanics. Given that quantum mechanics is a recipe for making predictions about the physical world, and that those predictions are rather peculiar by classical standards, what is it about the <i>recipe</i> that causes these peculiarities?
</p>
<p>
In this post, I'm going to try to vastly simplify the recipe while still producing those peculiarities: I'm going to build a <i>toy cosmos</i>, a really tiny system with really simple rules that, on their face, have almost none of the specific structure of quantum mechanics; yet, if it works out right, the system will still exhibit certain particular effects whose origins —whose <i>mathematical</i> origins— I want to understand better. Here's my list of effects I want:
<ul>
<li>Nondeterminism.</li>
<li>Quantum interference.</li>
<li>Disappearance of quantum interference under observation.</li>
<li>Quantum entanglement.</li>
</ul>
</p>
<p>
I've tried this before, more than a decade ago, but my perspective has recently changed from my explorations of <a href="https://fexpr.blogspot.com/2017/06/co-hygiene-and-quantum-gravity.html">co-hygiene</a>. A little after the turn of the millennium I was studying a 1988 MIT AI Lab memo by Gary Drescher, "<a href="https://dspace.mit.edu/handle/1721.1/6486">Demystifying Quantum Mechanics: A Simple Universe with Quantum Uncertainty</a>", and wanted to use a similar technique to explore some specific peculiarities of quantum math. I used an even simpler toy cosmos than the 1988 memo had, which I could because my goals were narrower than Drescher's. I eventually put my results up on the web through my WPI CS Department account (<a href="https://web.cs.wpi.edu/~jshutt/Qeffects.pdf">2006</a>), though I didn't feel right at the time about making it a WPI CS Department tech report (a decision I eventually came to regret, after I'd got my doctoral hood and left, and it was too late). But, nifty though the 2006 paper was in some ways, I now feel it didn't go far enough in simplifying the simple universe. At the time I wanted to keep the "quantum" math similar enough to actual quantum mechanics to retain its look-and-feel, so that the reader would still think, yes, that is like quantum mechanics. Now, though, I really want to strip away almost <i>all</i> the structure of quantum mechanics; because I'm now very interested to know which consequences of quantum mechanics are caused by which parts of the mathematical model from which they flow.
</p>
<p>
The result, with most of the instrument missing, won't be recital-quality violin; not even musical, really. But I hope to learn from it a bit of how the instrument works.
</p>
<blockquote>
<b>Contents</b><br>
<a href="#sec-wqmu-class">Classical toy cosmos</a><br>
<a href="#sec-wqmu-quant">Quantum toy cosmos</a><br>
<a href="#sec-wqmu-non">Nondeterminism</a><br>
<a href="#sec-wqmu-int">Interference</a><br>
<a href="#sec-wqmu-obs">Observation</a><br>
<a href="#sec-wqmu-ent">Entanglement</a><br>
<a href="#sec-wqmu-why">Why</a><!-- <br> -->
</blockquote>
<span style="font-size: large;" id="sec-wqmu-class">Classical toy cosmos</span>
<p>
A quantum view of a cosmos can only be constructed relative to a classical view. So we have to start with a classical toy cosmos.
</p>
<p>
The instantaneous state of this cosmos consists of just two boolean —true/false— variables, <i>a</i> and <i>b</i>; so there are only four possible states for the cosmos to be in, which we call TT, TF, FT, FF (listing <i>a</i> then <i>b</i>). Time advances discretely from one moment <i>t</i> to the next <i>t</i>+1, and we're allowed to apply some experimental apparatus across that interval that determines how the state at <i>t</i>+1 depends on the state at <i>t</i>. There are just three kinds of experimental apparatus, each of which has two variants depending on whether it's focused on <i>a</i> or <i>b</i>:
<ul>
<li><b>set <i>v</i></b>: causes the variable to be <i>true</i> in the next state.</li>
<li><b>clear <i>v</i></b>: causes the variable to be <i>false</i> in the next state.</li>
<li><b>copy <i>v</i></b>: causes the value of the variable in the old state to become the value of <i>both</i> variables in the next state.</li>
</ul>
Nothing changes unless explicitly changed by the apparatus.
</p>
<p>
For example, from state TF, here are the states produced by the six possible experiments:
<blockquote>
<table>
<tr>
<td align="left">TF</td>
<td align="left">→</td>
<td align="left">set <i>a</i></td>
<td align="left">→</td>
<td align="left">TF</td>
</tr>
<tr>
<td align="left">TF</td>
<td align="left">→</td>
<td align="left">set <i>b</i></td>
<td align="left">→</td>
<td align="left">TT</td>
</tr>
<tr>
<td align="left">TF</td>
<td align="left">→</td>
<td align="left">clear <i>a</i></td>
<td align="left">→</td>
<td align="left">FF</td>
</tr>
<tr>
<td align="left">TF</td>
<td align="left">→</td>
<td align="left">clear <i>b</i></td>
<td align="left">→</td>
<td align="left">TF</td>
</tr>
<tr>
<td align="left">TF</td>
<td align="left">→</td>
<td align="left">copy <i>a</i></td>
<td align="left">→</td>
<td align="left">TT</td>
</tr>
<tr>
<td align="left">TF</td>
<td align="left">→</td>
<td align="left">copy <i>b</i></td>
<td align="left">→</td>
<td align="left">FF</td>
</tr>
</table>
</blockquote>
</p>
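<p>
For concreteness, here is a minimal executable sketch of this classical toy cosmos in Python. The representation (two-character strings listing <i>a</i> then <i>b</i>) and the function names are my own illustrative choices, nothing canonical; run as-is, the sketch reproduces the six-experiment table above.
<blockquote>
<pre>
STATES = ('TT', 'TF', 'FT', 'FF')

def classical_step(op, v, s):
    """Apply one experiment to classical state s.
    op is 'set', 'clear', or 'copy'; v is 'a' or 'b';
    s is a two-character string listing a then b."""
    a, b = s
    if op == 'set':
        a, b = ('T', b) if v == 'a' else (a, 'T')
    elif op == 'clear':
        a, b = ('F', b) if v == 'a' else (a, 'F')
    elif op == 'copy':
        # the old value of v becomes the value of both variables
        a, b = (a, a) if v == 'a' else (b, b)
    return a + b

# reproduce the six-experiment table above, starting from TF
for op in ('set', 'clear', 'copy'):
    for v in ('a', 'b'):
        print('TF ->', op, v, '->', classical_step(op, v, 'TF'))
</pre>
</blockquote>
</p>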
<span style="font-size: large;" id="sec-wqmu-quant">Quantum toy cosmos</span>
<p>
A quantum state of the cosmos consists of a vector indexed by classical states; that is, <i>q</i> = ⟨<i>w<sub>s</sub></i>⟩ where <i>s</i> varies over the four classical states of the cosmos (in order TT, TF, FT, FF).
</p>
<p>
We understand a quantum state to determine a probability distribution of classical states of the toy cosmos; for quantum state <i>q</i>, we denote the probability of classical state <i>s</i> by <i>p<sub>s</sub></i>(<i>q</i>).
</p>
<p>
As always when reasoning about quantum mechanics — but this bears repeating, to keep the concepts straight — we, as physicists studying the mathematics of the situation, are not <i>observers</i> in the technical sense of quantum theory. That is, we are not part of the toy cosmos at all. We can reason about the evolution of the quantum state of the toy cosmos; how an experiment changes the probabilities from time t to time t+1, from <i>p<sub>s</sub></i>(<i>q</i><sub>t</sub>) to <i>p<sub>s</sub></i>(<i>q</i><sub>t+1</sub>); and our reasoning does not alter the system. <i>Observation</i> is one of the possible processes within the toy cosmos, which we will eventually get around to reasoning about, below.
</p>
<p>
What sorts of values, though, are the weights <i>w<sub>s</sub></i> within the quantum state?
</p>
<p>
In current mathematical physics, one would expect these weights to be what's called a <i>gauge field</i> — one of those terms that doesn't mean much to outsiders but, to those in the know, carries along a great deal of extra baggage. We don't want that baggage here; and it's worth a moment just to consider why we don't want it.
</p>
<p>
In classical <i>Lagrangian mechanics</i>, one considers the evolution of a system as a path through the system's classical state-space (where points in the space are classical states of the system). A function called a <i>Lagrangian</i> maps points in the state-space to energies. The <i>action of the system</i> is the line integral of the Lagrangian along this path. The <i>principle of least action</i> says that from a given state, the system will follow the path that minimizes the action. One solves for this minimal path using a mathematical technology called the <i>calculus of variations</i>. And <i>Noether's theorem</i> (yeah, yeah, Noether's <i>first</i> theorem) says that each differentiable invariant of the action — each <i>symmetry</i> of the action — gives rise to a conservation law.
</p>
<p>
In recent quantum physics, the system state — the range of points in the state-space — consists of a classical state together with what I've called here a "weight"; that's the wavy part of the wave function. While part of that weight can be perceived more-or-less directly as probability (traditionally, probability proportional to the square of the amplitude of a complex number), the rest of it can't be perceived; but its symmetries give rise to conservation laws which in turn come out as classes of particles. Photons, gluons, and whatnot. The weights form a gauge field, the invariances that give rise to the conservation laws are gauge symmetries, etc.
</p>
<p>
Physicists tend to ground their thinking in an imagined "real world"; a century or so of quantum mechanics hasn't really dimmed this attitude, even if the "real world" now imagined is Platonic such as a gauge field. The attitude has considerable merit imho (leading, e.g., to the profound change I've <a href="https://fexpr.blogspot.com/2017/06/co-hygiene-and-quantum-gravity.html">noted</a> in my view of λ-calculus, which was after all originally an exercise in formalist meta-mathematics, essentially a manipulation of syntax deliberately disregarding any possible referent); but the attitude does seem to make physicists especially vulnerable to <i>mistaking the map for the territory</i>. That is, in treating the gauge field as if it were "really there", the physicist may forget to distinguish between a mathematical theory that successfully describes observable features of reality, and mathematics that is "known" to underlie reality. The Lagrangian (as I pointed out <a href="http://fexpr.blogspot.com/2016/06/the-co-hygiene-principle.html#sec-hp-noether">in an earlier post</a>) isn't some magic deeper level of reality, it's just whatever works to cause the principle of least action to give the right answer; and Noether's theorem, profound as it is, points out the physical consequences of a mathematical structure that was devised in the first place from the physical world, with the mathematical structure thus serving as essentially a catalyst to reasoning. Physicists, lacking a traditional classical-style model of reality, observe (say) a force and construct a gauge theory for it which they then think of as a theorized "real thing" (not necessarily a bad attitude), reason through Noether's theorem to a class of particles, look for them using massive devices such as the <a href="https://en.wikipedia.org/wiki/Large_Hadron_Collider">Large Hadron Collider</a>, and when they observe the phenomenon they predicted, then treat the particle as "known" and even take some properties of the gauge field as "known". The chain of reasoning is so long that even the question of whether the observed particle "exists" is somewhat open to interpretation; and the gauge field is even more problematic.
</p>
<p>
More to the immediate point, the purpose of this post calls for avoiding the entire baggage train attached to the term "gauge", in pursuit of a <i>minimal</i> mathematical structure giving rise to the specifically named peculiar behaviors of quantum mechanics.
</p>
<p>
Taking a semi-educated stab at minimality, let's have just three possible weights: a neutral weight, and two polar opposites. Call the neutral weight 0 (zero). One might call the other two 1 and −1, but really the orientation of those has to do with multiplication, and we're not going to have any sort of multiplication of weights by each other, so to avoid implying any particular orientation, let's unimaginatively call them <i>left</i> and <i>right</i>. Two operations are provided on weights. The first, unary negation, −<i>w</i>, transforms <i>left</i> to <i>right</i>, transforms <i>right</i> to <i>left</i>, and leaves 0 unchanged; the second, summation, is introduced below.
</p>
<p>
In the classical toy cosmos, each experiment determined, given the classical state <i>s</i> at time <i>t</i>, the resulting classical state <i>s'</i> at time <i>t</i>+1. In the quantum version, each experiment determines, for each possible classical state <i>s</i> at time <i>t</i> and each possible classical state <i>s'</i> at time <i>t</i>+1, what contribution weight <i>w</i><sub><i>t</i>,<i>s</i></sub> makes to weight <i>w</i><sub><i>t</i>+1,<i>s'</i></sub>. Each weight at time <i>t</i>+1 is simply the sum of the contributions to that weight from each of the weights at time <i>t</i>. This requires, of course, that we <i>sum</i> a set of weights; let the sum of a set of weights be whichever of <i>left</i> or <i>right</i> there are more of amongst the arguments, or zero if there are the same number of <i>left</i> and <i>right</i> arguments. This summation operation —for which we'll freely use the usual additive notation— is, btw, not at all mathematically well-behaved: commutative, but not associative, since, for example,
<blockquote>
<i>left</i> + <i>left</i> + (<i>right</i> + <i>right</i>) = <i>left</i><br><i>left</i> + (<i>left</i> + <i>right</i>) + <i>right</i> = 0<br>(<i>left</i> + <i>left</i>) + <i>right</i> + <i>right</i> = <i>right</i>.
</blockquote>
The ill-behavedness however is a bit moot, because in the six possible experiments of our toy cosmos, no sum will ever have more than two non-zero addends, and non-associativity only happens when there are at least three non-zero addends.
</p>
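<p>
As a sanity check on this arithmetic, here is a small Python rendering of the weight algebra, with <i>left</i> and <i>right</i> encoded as +1 and −1 (an arbitrary choice, since nothing here gives the weights an orientation; the names are mine). Implementing the sum as a single operation on a whole multiset, rather than pairwise, sidesteps the non-associativity just illustrated.
<blockquote>
<pre>
LEFT, RIGHT, ZERO = 1, -1, 0   # arbitrary encoding; only the opposition matters

def neg(w):
    """Unary negation: swaps left and right, leaves zero unchanged."""
    return -w

def wsum(ws):
    """Sum of a multiset of weights: whichever of left/right is in
    the majority among the arguments, else zero.  Clamping the
    integer total to the range [-1, 1] is exactly the majority rule."""
    return max(-1, min(1, sum(ws)))

# the non-associativity of pairwise summation, as illustrated above:
print(wsum([LEFT, LEFT, wsum([RIGHT, RIGHT])]))   # 1, i.e. left
print(wsum([LEFT, wsum([LEFT, RIGHT]), RIGHT]))   # 0
print(wsum([wsum([LEFT, LEFT]), RIGHT, RIGHT]))   # -1, i.e. right
</pre>
</blockquote>
</p>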
<p>
We understand a zero weight to mean that classical state is not possible at that time; and assign equal probabilities to all non-zero-weighted classical states in the quantum state. Presumably, for all possible experiments, a zero weight at time <i>t</i> contributes zero to each weight at time <i>t</i>+1.
</p>
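<p>
In code, continuing the sketch and representing a quantum state as a dict from classical state to weight (again my own rendering, not anything canonical), the probability rule is a one-liner:
<blockquote>
<pre>
def probs(q):
    """Equal probabilities over the classical states with non-zero
    weight; q maps each of the four classical states to its weight."""
    live = [s for s in q if q[s] != 0]
    return {s: 1 / len(live) for s in live}
</pre>
</blockquote>
</p>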
<p>
It remains to define, for each experiment, the contribution of each weight before the experiment to each weight after the experiment. We'll write <i>s</i> for a classical state before, <i>s'</i> after; before weight <i>w</i><sub><i>s</i></sub>, after weight <i>w'</i><sub><i>s'</i></sub>, and contribution of the former to the latter <i>w</i><sub><i>s</i>→<i>s'</i></sub>. We have <i>w'</i><sub><i>s'</i></sub> = Σ<sub>s</sub> <i>w</i><sub><i>s</i>→<i>s'</i></sub> (that is, each after-weight is the sum of the contributions to it from each of the before-weights). We'll mainly represent these transformations by tables, rather than depending on all this elaborate notation.
</p>
<p>
Consider any set/clear <i>v</i> experiment. Before-state <i>s</i> contributes nothing to any after-state that changes the non-<i>v</i> variable. If <i>s</i> already has <i>v</i> with the value called for, only the contribution to <i>s'</i>=<i>s</i> can be non-zero, <i>w</i><sub><i>s</i>→<i>s</i></sub> = <i>w</i><sub><i>s</i></sub>. If <i>s</i> doesn't have the value of <i>v</i> called for, it contributes its weight to the state with <i>v</i> changed, and also contributes the negation of its weight to the unchanged state. In all,
<blockquote>
<table>
<tr>
<td></td><td></td>
<td></td>
<td align="center">set <i>a</i><br>→</td>
<td></td>
</tr>
<tr>
<td>TT</td><td> </td>
<td><i>w</i><sub>TT</sub></td>
<td></td>
<td><i>w</i><sub>TT</sub> + <i>w</i><sub>FT</sub></td>
</tr>
<tr>
<td>TF</td><td></td>
<td><i>w</i><sub>TF</sub></td>
<td></td>
<td><i>w</i><sub>TF</sub> + <i>w</i><sub>FF</sub></td>
</tr>
<tr>
<td>FT</td><td></td>
<td><i>w</i><sub>FT</sub></td>
<td></td>
<td>−<i>w</i><sub>FT</sub></td>
</tr>
<tr>
<td>FF</td><td></td>
<td><i>w</i><sub>FF</sub></td>
<td></td>
<td>−<i>w</i><sub>FF</sub></td>
</tr>
</table>
<p>
<table>
<tr>
<td></td><td></td>
<td></td>
<td align="center">set <i>b</i><br>→</td>
<td></td>
</tr>
<tr>
<td>TT</td><td> </td>
<td><i>w</i><sub>TT</sub></td>
<td></td>
<td><i>w</i><sub>TT</sub> + <i>w</i><sub>TF</sub></td>
</tr>
<tr>
<td>TF</td><td></td>
<td><i>w</i><sub>TF</sub></td>
<td></td>
<td>−<i>w</i><sub>TF</sub></td>
</tr>
<tr>
<td>FT</td><td></td>
<td><i>w</i><sub>FT</sub></td>
<td></td>
<td><i>w</i><sub>FT</sub> + <i>w</i><sub>FF</sub></td>
</tr>
<tr>
<td>FF</td><td></td>
<td><i>w</i><sub>FF</sub></td>
<td></td>
<td>−<i>w</i><sub>FF</sub></td>
</tr>
</table>
<p>
<table>
<tr>
<td></td><td></td>
<td></td>
<td align="center">clear <i>a</i><br>→</td>
<td></td>
</tr>
<tr>
<td>TT</td><td> </td>
<td><i>w</i><sub>TT</sub></td>
<td></td>
<td>−<i>w</i><sub>TT</sub></td>
</tr>
<tr>
<td>TF</td><td></td>
<td><i>w</i><sub>TF</sub></td>
<td></td>
<td>−<i>w</i><sub>TF</sub></td>
</tr>
<tr>
<td>FT</td><td></td>
<td><i>w</i><sub>FT</sub></td>
<td></td>
<td><i>w</i><sub>TT</sub> + <i>w</i><sub>FT</sub></td>
</tr>
<tr>
<td>FF</td><td></td>
<td><i>w</i><sub>FF</sub></td>
<td></td>
<td><i>w</i><sub>TF</sub> + <i>w</i><sub>FF</sub></td>
</tr>
</table>
<p>
<table>
<tr>
<td></td><td></td>
<td></td>
<td align="center">clear <i>b</i><br>→</td>
<td></td>
</tr>
<tr>
<td>TT</td><td> </td>
<td><i>w</i><sub>TT</sub></td>
<td></td>
<td>−<i>w</i><sub>TT</sub></td>
</tr>
<tr>
<td>TF</td><td></td>
<td><i>w</i><sub>TF</sub></td>
<td></td>
<td><i>w</i><sub>TT</sub> + <i>w</i><sub>TF</sub></td>
</tr>
<tr>
<td>FT</td><td></td>
<td><i>w</i><sub>FT</sub></td>
<td></td>
<td>−<i>w</i><sub>FT</sub></td>
</tr>
<tr>
<td>FF</td><td></td>
<td><i>w</i><sub>FF</sub></td>
<td></td>
<td><i>w</i><sub>FT</sub> + <i>w</i><sub>FF</sub></td>
</tr>
</table>
</blockquote>
Follow the same pattern for a copy <i>v</i> experiment, adjusting which values are changed.
<blockquote>
<table>
<tr>
<td></td><td></td>
<td></td>
<td align="center">copy <i>a</i><br>→</td>
<td></td>
</tr>
<tr>
<td>TT</td><td> </td>
<td><i>w</i><sub>TT</sub></td>
<td></td>
<td><i>w</i><sub>TT</sub> + <i>w</i><sub>TF</sub></td>
</tr>
<tr>
<td>TF</td><td></td>
<td><i>w</i><sub>TF</sub></td>
<td></td>
<td>−<i>w</i><sub>TF</sub></td>
</tr>
<tr>
<td>FT</td><td></td>
<td><i>w</i><sub>FT</sub></td>
<td></td>
<td>−<i>w</i><sub>FT</sub></td>
</tr>
<tr>
<td>FF</td><td></td>
<td><i>w</i><sub>FF</sub></td>
<td></td>
<td><i>w</i><sub>FT</sub> + <i>w</i><sub>FF</sub></td>
</tr>
</table>
<p>
<table>
<tr>
<td></td><td></td>
<td></td>
<td align="center">copy <i>b</i><br>→</td>
<td></td>
</tr>
<tr>
<td>TT</td><td> </td>
<td><i>w</i><sub>TT</sub></td>
<td></td>
<td><i>w</i><sub>TT</sub> + <i>w</i><sub>FT</sub></td>
</tr>
<tr>
<td>TF</td><td></td>
<td><i>w</i><sub>TF</sub></td>
<td></td>
<td>−<i>w</i><sub>TF</sub></td>
</tr>
<tr>
<td>FT</td><td></td>
<td><i>w</i><sub>FT</sub></td>
<td></td>
<td>−<i>w</i><sub>FT</sub></td>
</tr>
<tr>
<td>FF</td><td></td>
<td><i>w</i><sub>FF</sub></td>
<td></td>
<td><i>w</i><sub>TF</sub> + <i>w</i><sub>FF</sub></td>
</tr>
</table>
</blockquote>
This has, btw, all been constructed to avoid awkward questions when interpreting quantum states probabilistically by guaranteeing that each experiment, operating on a predecessor quantum state with at least one non-zero weight, will always produce a successor quantum state with at least one non-zero weight.
</p>
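<p>
All six tables instantiate the one changed/unchanged rule described above, which makes them easy to render in code. Continuing the hypothetical Python sketch (the function names are again mine): a classically unchanged state keeps its weight, while a classically changed state contributes its weight to its successor and the negation of its weight to itself.
<blockquote>
<pre>
def classical_map(op, v):
    """The classical transition map of an experiment, as a dict
    from each classical state to its classical successor."""
    return {s: classical_step(op, v, s) for s in STATES}

def quantum_step(op, v, q):
    """Transform quantum state q (a dict from classical state to
    weight): each after-weight is the sum of the contributions
    made to it by the before-weights."""
    step = classical_map(op, v)
    contribs = {s: [] for s in STATES}
    for s in STATES:
        if step[s] == s:
            # classically unchanged state: keeps its weight
            contribs[s].append(q[s])
        else:
            # classically changed state: weight to its successor,
            # negated weight to itself
            contribs[step[s]].append(q[s])
            contribs[s].append(neg(q[s]))
    return {s: wsum(ws) for s, ws in contribs.items()}
</pre>
</blockquote>
</p>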
<p>
Demonstrating the intended quantum effects is —if it can be done— then just a matter of assembling suitable compositions of experiments.
</p>
<span style="font-size: large;" id="sec-wqmu-non">Nondeterminism</span>
<p>
The fundamental difference between quantum state and classical state is, always, that any observed state of reality is classical. <i>Quantum</i> state evolves deterministically — we've just specified precisely how it evolves through each experiment — and our difficulty is that we see no way to interpret the probability distributions of quantum mechanics as deterministic evolution of classical states.
</p>
<span style="font-size: large;" id="sec-wqmu-int">Interference</span>
<p>
The effect to be demonstrated is that a sequence of two experiments produces a probability distribution that doesn't compose the probability distributions of the two individual experiments.
</p>
<p>
Suppose we set <i>a</i> and then clear <i>a</i>. To be clear on what's going on, we start from a pure state, that is, a quantum state in which only one classical state is possible. If that pure state has <i>a</i>=true, the quantum state after set <i>a</i> would be unchanged, so the final probability distribution would be just that of the second experiment, clear <i>a</i>. So choose instead a pure starting state with <i>a</i>=false.
<blockquote>
<table>
<tr>
<td></td><td> </td>
<td></td>
<td align="center">set <i>a</i><br>→</td>
<td></td>
<td align="center">clear <i>a</i><br>→</td>
<td></td>
</tr>
<tr>
<td>TT</td><td></td>
<td></td>
<td></td>
<td><i>w</i><sub>FT</sub></td>
<td></td>
<td>−<i>w</i><sub>FT</sub></td>
</tr>
<tr>
<td>TF</td><td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>FT</td><td></td>
<td><i>w</i><sub>FT</sub></td>
<td></td>
<td>−<i>w</i><sub>FT</sub></td>
<td></td>
<td><!--<i>w</i><sub>FT</sub> − <i>w</i><sub>FT</sub>--></td>
</tr>
<tr>
<td>FF</td><td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</table>
</blockquote>
Here, the second experiment produces a quantum state at time <i>t</i>+2 where the weight on classical state FT is the sum of the weights on states TT and FT at time <i>t</i>+1; and since the first experiment has left those two as polar opposites, they cancel, <i>w</i><sub>FT</sub> − <i>w</i><sub>FT</sub> = 0, so the outcome of the sequence of two experiments is pure state TT. Even though each of the experiments individually, when applied to a pure state where the value isn't what the experiment seeks to make it, would produce a probability distribution between two possible classical result states.
</p>
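<p>
Running the sketch (with the definitions from the sections above, and arbitrarily taking the start weight to be <i>left</i>; <i>right</i> works symmetrically) confirms the cancellation:
<blockquote>
<pre>
q = {'TT': 0, 'TF': 0, 'FT': LEFT, 'FF': 0}   # pure state FT
q = quantum_step('set', 'a', q)      # {'TT': 1, 'TF': 0, 'FT': -1, 'FF': 0}
q = quantum_step('clear', 'a', q)    # {'TT': -1, 'TF': 0, 'FT': 0, 'FF': 0}
print(probs(q))                      # {'TT': 1.0}: pure state TT
</pre>
</blockquote>
</p>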
<span style="font-size: large;" id="sec-wqmu-obs">Observation</span>
<p>
In the standard two-slit experiment, electron wave interference disappears when we observe which slit the electron goes through. So, to disrupt the interference effect we've just demonstrated, put a copy <i>a</i> in between the other two operations, to observe, within the toy cosmos, the intermediate classical state of the system.
<blockquote>
<table>
<tr>
<td></td><td></td>
<td></td>
<td align="center">set <i>a</i><br>→</td>
<td></td>
<td align="center">copy <i>a</i><br>→</td>
<td></td>
<td align="center">clear <i>a</i><br>→</td>
<td></td>
</tr>
<tr>
<td>TT</td><td> </td>
<td></td>
<td></td>
<td><i>w</i><sub>FT</sub></td>
<td></td>
<td><i>w</i><sub>FT</sub></td>
<td></td>
<td>−<i>w</i><sub>FT</sub></td>
</tr>
<tr>
<td>TF</td><td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>FT</td><td></td>
<td><i>w</i><sub>FT</sub></td>
<td></td>
<td>−<i>w</i><sub>FT</sub></td>
<td></td>
<td><i>w</i><sub>FT</sub></td>
<td></td>
<td><i>w</i><sub>FT</sub></td>
</tr>
<tr>
<td>FF</td><td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>−<i>w</i><sub>FT</sub></td>
<td></td>
<td>−<i>w</i><sub>FT</sub></td>
</tr>
</table>
</blockquote>
Here, the final experiment gives a time <i>t</i>+3 weight for FT that is the sum of the time <i>t</i>+2 weights for TT and FT, but now they have the same sign so they don't cancel.
</p>
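<p>
The sketch again agrees: the intervening copy leaves the two surviving weights with the same sign, so nothing cancels.
<blockquote>
<pre>
q = {'TT': 0, 'TF': 0, 'FT': LEFT, 'FF': 0}   # pure state FT
q = quantum_step('set', 'a', q)      # {'TT': 1, 'TF': 0, 'FT': -1, 'FF': 0}
q = quantum_step('copy', 'a', q)     # {'TT': 1, 'TF': 0, 'FT': 1, 'FF': -1}
q = quantum_step('clear', 'a', q)    # {'TT': -1, 'TF': 0, 'FT': 1, 'FF': -1}
print(probs(q))                      # TT, FT, FF each 1/3
</pre>
</blockquote>
</p>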
<p>
Interestingly, although this does spoil the interference pattern from the previous demonstration, it doesn't produce the crisp "classical" probability distribution that we expect observation to exhibit in a similar scenario in real-world quantum mechanics. In my 2006 paper, I did get a crisply classical distribution; but there, the transformation of weights by the copy <i>v</i> operation was itself deterministic, assigning zero weight to those classical outcomes in which the value was not copied. I defined the copy transformation differently this time because it had always bothered me that the 2006 paper did not guarantee that an experiment could not result in an all-zero quantum state. My best guess, atm, as to why this zero-outcome problem doesn't ordinarily arise in full-blown quantum mechanics is that it has to do with the overall coherence provided by the wave equation, a structural component of quantum mechanics entirely omitted here. At least, I've never heard of this particular anomaly arising in full-blown quantum mechanics; though full-blown quantum mechanics does have anomalies of its own that seem no less alarming if perhaps more sophisticated, such as infinities that may crop up causing renormalization problems in quantum gravity.
</p>
<p>
Conceivably, this may be a clue that the presence of a wave equation is profoundly fundamental to the overall structure of quantum mechanics. Identifying the deep structural role of a wave equation, independent of the details of any particular wave equation, would seem to be another exercise for another day — though possibly not all that distant a day, given the sorts of questions I've been asking regarding co-hygiene.
</p>
<p>
At any rate, the intervening copy <i>a</i> experiment does alter the probability distribution of values of <i>a</i> despite the fact that the classical effect of the experiment on a pure classical state never alters the value of <i>a</i>.
</p>
<span style="font-size: large;" id="sec-wqmu-ent">Entanglement</span>
<p>
The idea of entanglement, in its strongest sense, is that things done to one variable affect the other variable. Loosely, we want to perform experiments on one variable that don't touch the other variable, yet alter the probability distribution of the other variable. There is so little mathematical structure left in our toy cosmos that there aren't a lot of options to consider for demonstrating this effect. The only operations that don't touch one variable are set/clear of the other variable. Asymmetric handling of states can be derived from the fact that the set-clear sequence we used to demonstrate interference only causes interference on a pure start state if <i>a</i>=false. So, suppose we run our set-clear on an initial quantum state with a correlation between <i>a</i> and <i>b</i>.
<blockquote>
<table>
<tr>
<td></td><td></td>
<td></td>
<td align="center">set <i>a</i><br>→</td>
<td></td>
<td align="center">clear <i>a</i><br>→</td>
<td></td>
</tr>
<tr>
<td>TT</td><td> </td>
<td></td>
<td></td>
<td><i>w</i><sub>FT</sub></td>
<td></td>
<td>−<i>w</i><sub>FT</sub></td>
</tr>
<tr>
<td>TF</td><td></td>
<td><i>w</i><sub>TF</sub></td>
<td></td>
<td><i>w</i><sub>TF</sub></td>
<td></td>
<td>−<i>w</i><sub>TF</sub></td>
</tr>
<tr>
<td>FT</td><td></td>
<td><i>w</i><sub>FT</sub></td>
<td></td>
<td>−<i>w</i><sub>FT</sub></td>
<td></td>
<td></td>
</tr>
<tr>
<td>FF</td><td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td><i>w</i><sub>TF</sub></td>
</tr>
</table>
</blockquote>
The two starting weights never get added to each other, so it doesn't matter for this sequence whether they have the same polarity, as long as they're both non-zero. In the start state, the probability of <i>b</i>=true is 1/2, as is the probability of <i>a</i>=true; in the final state, the probability of <i>b</i>=true is 1/3, while the probability of <i>a</i>=true is 2/3.
</p>
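<p>
And the entanglement demonstration in the sketch, starting from the correlated state (with both non-zero weights arbitrarily chosen to be <i>left</i>, since, as just noted, their polarities don't matter here):
<blockquote>
<pre>
q = {'TT': 0, 'TF': LEFT, 'FT': LEFT, 'FF': 0}   # a, b correlated; P(b=T) = 1/2
q = quantum_step('set', 'a', q)      # {'TT': 1, 'TF': 1, 'FT': -1, 'FF': 0}
q = quantum_step('clear', 'a', q)    # {'TT': -1, 'TF': -1, 'FT': 0, 'FF': 1}
print(probs(q))   # TT, TF, FF each 1/3: now P(b=T) = 1/3, P(a=T) = 2/3
</pre>
</blockquote>
</p>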
<span style="font-size: large;" id="sec-wqmu-why">Why</span>
<p>
Our toy cosmos deliberately leaves out most complications of quantum mechanics. We do need, in order that the theory be at all quantum-y, to be able to understand the mathematical model as describing a probability distribution of possible perceived classical states; to understand the quantum state as being partitioned into elements associated with particular classical states; and to understand each of these elements as contributing to various elements of the successor quantum state. That leaves the question of what sort of information a quantum state associates with each classical state; that is, what is the range over which each weight varies; and then, of course, what are the rules by which a given experiment transforms predecessor quantum state to successor quantum state. In order to exhibit interference, it seems there must be a way for weights to cancel each other out during the summation process, and in this post I've deliberately taken the simplest sort of weight I could imagine that would allow canceling.
</p>
<p>
The resulting toy cosmos does exhibit the quantum interference effect, and clearly the demonstration of this effect does rely on weights canceling during summation.
</p>
<p>
Nondeterminism —relative, that is, to classical states— arises, potentially, when a single predecessor classical-state contributes non-zero weight to more than one successor classical-state. Interference arises (given the cancellation provided for), again potentially, when a single successor classical-state receives non-zero contributions from more than one predecessor classical-state.
</p>
<p>
The quantum interference effect depends crucially on the fact that weights are <i>holistic</i>. That is, a weight is assigned to a classical state of the <i>entire cosmos</i>; it isn't a characteristic of any particular feature <i>within</i> the classical state of the cosmos. This is why observation <i>within the toy cosmos</i> disrupts interference: once the particular part of the cosmos we're manipulating (variable <i>a</i> in our demonstration) is "observed" by another part of the cosmos (variable <i>b</i> in our demonstration), the classical state of the cosmos as a whole may differ because of what the observer saw, so that interference does not occur. (Tbh, this point was more clearly exhibited in the 2006 paper, where observation was absolute — as it is in the full-blown quantum mechanics of our physical world; but it is still there to be found in the toy cosmos of this blog post.)
</p>
<p>
Entanglement was something I really wanted to understand in 2006; curiously, in 2018 I'm finding it less interesting than observation. An experiment can cause interference amongst the successors of one classical-state and not amongst the successors of another classical state, so that, in the quantum successor-state, successors of one classical-state are collectively <i>more probable</i> than successors of another classical-state. If the experiment only manipulates one variable (<i>a</i>) without affecting the other (<i>b</i>), this difference in probabilities of successor states can mean a difference in probabilities of values of the unmanipulated variable (<i>b</i>).
</p>
<p>
These latter two points are somewhat murkier from the above demonstrations than they were from the 2006 paper; the murkiness is apparently due to my decision in this blog post to define the copy <i>v</i> operation as something that might or might not change the state, rather than, as in the 2006 paper, something that always changes the state; and that decision was made here out of concern to avoid possible quantum zero-states. As noted earlier, this seems to be something to do with the absence, from this immensely simplified mathematical structure, of a wave equation that would ward off such anomalies.
</p>
<p>
It seems, then, that I went into this blog post seeking to clarify minimal structure needed to produce certain quantum effects; and confirmed that those effects could still be produced by the chosen reduced structure; but the structure became so reduced that the demonstrations were less clear than in the 2006 paper, and questions arose about what <i>other</i> primal characteristics of quantum mechanics may have already been lost due to evisceration of internal structure of the transformation of quantum state, i.e., the "wave equation" which has been replaced above by ad hoc tables specifying the successor weights for each experiment.
</p>
John Shutthttp://www.blogger.com/profile/00041398073010099077noreply@blogger.com0tag:blogger.com,1999:blog-7068528325708136131.post-17104888667964514132018-06-02T11:40:00.000-07:002018-06-02T11:40:45.067-07:00Sapience and the limits of formal reasoning<blockquote>
Anakin:   Is it possible to learn this power?<br>Palpatine: Not from a Jedi.
<blockquote>
— <i><a href="https://en.wikiquote.org/wiki/Revenge_of_the_Sith">Star Wars: Episode III – Revenge of the Sith</a></i>, George Lucas, 2005.
</blockquote>
</blockquote>
<p>
In this post I mean to tie together several puzzles I've struggled with, on this blog and elsewhere, for years; especially, on one hand, the philosophical implications of Gödel's results on the limitations of formal reasoning (<a href="https://fexpr.blogspot.com/2015/05/computation-and-truth.html">post</a>), and on the other hand, the implications of evidence that sapient minds are doing something our technological artifacts do not (<a href="https://fexpr.blogspot.com/2018/02/sapience-and-non-sapience.html">post</a>).
</p>
<p>
From time to time, amongst my exploratory/speculative posts here, I do come to some relatively firm conclusion; so, in this post, with the philosophical implications of Gödel. A central notion here will be that formal systems manipulate information from below, while sapiences manipulate it from above.
</p>
<p>
As a bonus I'll also consider how these ideas on formal logic might apply to my investigations on basic physics (<a href="https://fexpr.blogspot.com/2017/06/co-hygiene-and-quantum-gravity.html">post</a>); though, that will be more in the exploratory/speculative vein.
</p>
<p>
As this post is mostly tying together ideas I've developed in earlier posts, it won't be nearly as long as the earlier posts that developed them. Though I continue to document the paths my thoughts follow on the way to any conclusions, those paths won't be long enough to meander too very much this time; for proper meandering, see the earlier posts.
</p>
<blockquote>
<b>Contents</b><br>
<a href="#sec-slfr-truth">Truth</a><br>
<a href="#sec-slfr-physics">Physics</a><!-- br -->
</blockquote>
<span style="font-size: large;" id="sec-slfr-truth">Truth</span>
<p>
Through roughly the second half of the nineteenth century, mathematicians aggressively extended the range of formal reasoning, ultimately reaching for a single set of axioms that would found all of logic and mathematics. That last goal was decisively nixed by Gödel's Theorem(s) in 1931. Gödel proved, in essence, that any sufficiently nontrivial formal axiomatic system, if it doesn't prove anything false, cannot prove itself to be self-consistent. It's still possible to construct a more powerful axiomatic system that can prove the first one self-consistent, but that more powerful system then cannot prove <i>itself</i> self-consistent. In fact, you can construct an infinite series of not-wrong axiomatic systems, each of which can prove all of its predecessors self-consistent, but each system cannot prove its own self-consistency.
</p>
<p>
In other words, there is no well-defined maximum of truth obtainable by axiomatic means. By those means, you can go too far (allowing proofs of some things that aren't so), or you can stop short (failing to prove some things that are so), but you can't hit the target.
</p>
<p>
For those of us who work with formal reasoning a lot, this is a perplexing result. What should one make of it? Is there some notion of truth that is beyond the power of all these formal systems? And what would that even mean?
</p>
<p>
For the question of whether there is a notion of objective mathematical truth beyond the power of all these formal systems, the evident answer is, <i>not formally</i>. There's more to that than just the trivial observation that something more powerful than any axiomatic system cannot itself be an axiomatic system; we can also reasonably expect that whatever it is, we likely won't be able to prove its power is greater <i>axiomatically</i>.
</p>
<p>
I don't buy into the notion that the human mind mystically transcends the physical; an open mind I have, but I'm a reductionist at heart. Here, though, we have an out. In acknowledging that a hypothetical more-powerful something might not be formally provable more powerful, we open the door to candidates that we can't <i>formally</i> justify. Such as, a sapient mind that emerges by some combination of its constituent parts and so seemingly ought to be no more powerful than those parts, but... is. In practice. (There's a quip <a href="https://en.wikiquote.org/wiki/Jan_L._A._van_de_Snepscheut">floating around</a>, that "In theory, there is no difference between theory and practice. But, in practice, there is.")
</p>
<p>
A related issue here is the <a href="https://en.wikipedia.org/wiki/Curry-Howard_correspondence">Curry-Howard correspondence</a>, much touted in some circles as a fundamental connection between computation and logic. Except, I submit it can't be as fundamental as all that. Why? Because of the <a href="https://fexpr.blogspot.com/2015/05/computation-and-truth.html#sec-comptruth-ChurchTuring">Church-Turing thesis</a>. Which says, in essence, that there <i>is</i> a robust most-powerful sort of computation. In keeping with our expectation of an informal cap on formal power, the Church-Turing thesis in this general sense is inherently unprovable; however, specific parts of it are formally provable, formal equivalence between particular formal models of computation. The major proofs in that vein, establishing the credibility of the general principle, were done within the next several years after Gödel's Theorems proved that there <i>isn't</i> a most-powerful sort of formal logic. Long story short: most-powerful sort of computation, yes; most-powerful sort of formal logic, no; therefore, computation and formal logic are not the same thing.
</p>
<p>
Through my recent post exploring the difference between sapient minds and all our technological artifacts, I concluded, amongst other things, that (1) sapience cannot be measured by any standardized test, because for any <i>standardized</i> test one can always construct a technological artifact that will outperform sapient minds; and (2) sapient minds are capable of grasping the "big picture" within which all technology behaves, including what the purpose of a set of formal rules is, whether the purpose is achieved, when to step outside the rules, and how to improvise behavior once outside.
</p>
<p>
A complementary observation about formal systems is that each individual action taken —each axiomatic application— is driven by the elementary details of the system state. That is, the individual steps of the formal system are selected on a view looking up from the bottom of the information structure, whereas sapience looks downward from somewhere higher in the information structure. This can only be a qualitative description of the difference between the sapient and formal approaches, for the simple reason that we do not, in fact, know how to do sapience. As discussed in the earlier post, our technology does not even attempt to achieve actual sapience because we don't know, from a technical perspective, what we would be trying to achieve — since we can't even measure it, though we have various informal ways to observe its presence.
</p>
<p>
Keep in mind that this quality of sapience is not uniform. Though some cases are straightforward, in general clambering up into the higher levels of structure, from which to take a wide-angle view, may be extremely difficult even with sapience, and some people are better at it than others, apparently for reasons of nature, nurture, and circumstance. Indeed, the mix of reasons that leads a Newton or an Einstein to climb particularly high in the structure is just the sort of thing I'd expect to be quite beyond the practical grasp of formal analysis.
</p>
<p>
What we see in Gödel's results is, then, that even when we accept a reductionist premise that the whole structure is built up by axioms from an elementary foundation, for a sufficiently powerful system there are fundamental limits to the sorts of high-level insights that can be assembled by building strictly upward from the bottom of the structure.
</p>
<p>
Is that a big insight? Formally it says nothing at all. But I can honestly say that, having reached it, for the first time in <<i>mumble-mumble</i>> decades of contemplation I see Gödel's results as evidence of something that makes sense to me rather than evidence that something is failing to make sense to me.
</p>
<span style="font-size: large;" id="sec-slfr-physics">Physics</span>
<p>
In modern physics, too, we have a large-scale phenomenon (classical reality) that evidently cannot be straightforwardly built up by simple accretion of low-level elements of the system (quanta). Is it possible to understand this as another instance of the same broad phenomenon as the failure, per Gödel, to build a robust notion of truth from elementary axioms?
</p>
<p>
Probably not, as I'll elaborate below. However, in the process I'll turn up some ideas that may yet lead somewhere, though quite <i>where</i> remains to be seen; so, a bit of meandering after all.
</p>
<p>
Gödel's axiomatic scenario has two qualitative features not immediately apparent for modern physics:
<ul>
<li>Axiomatic truth appears to be part of, and therefore to evolve toward, absolute truth; the gap between the two appears to be a quantitative thing that shrinks as one continues to derive results axiomatically, even though it's unclear whether it shrinks toward zero, or toward some other-sized gap. Whereas, the gap between quantum state and classical state is clearly qualitative and does not really diminish under any circumstances.</li>
<li>The axiomatic shortfall only kicks in for <i>sufficiently powerful</i> systems. It's not immediately clear what property in physics would correspond to axiomatic power of this sort.</li>
</ul>
The sapience/formalism dichotomy doesn't manifest the same way for different sorts of structure; witness the aforementioned difference between computational power and axiomatic power, where apparently one has a robust maximum while the other does not. There is no obvious precedent to expect the dichotomy to generate a Gödel-style scale-gap in arbitrary settings. Nonetheless; might there still be a physics analog to these features of axiomatic systems?
</p>
<p>
Quantum state-evolution does not smooth out toward classical state-evolution at scale; this is the point of the <a href="https://fexpr.blogspot.com/2015/06/thinking-outside-quantum-box.html#sec-qt-why">Schrödinger's-cat</a> thought experiment. A Gödel-style effect in physics would seem to require some sort of shading from quantum state-evolution <i>toward</i> classical state-evolution. I don't see what shading of that sort would mean.
</p>
<p>
There is another possibility, here: turn the classical/quantum relationship on its head. Could <i>classical</i> state-evolution shade toward <i>quantum</i> state-evolution? Apparently, yes; I've already described a way for this to happen, when in my first post on <a href="https://fexpr.blogspot.com/2016/06/the-co-hygiene-principle.html">co-hygiene</a> I suggested that the network topology of spacetime, acting at a cosmological scale, could create a seeming of nondeterminism at comparatively small scales. Interestingly, this would also be a reversal in scale, with the effect flowing <i>from</i> cosmological scale <i>to</i> small scale. However, the very fact that this appears to flow from large to small does not fit the expected pattern of the Gödel analogy, which plays on the contrast between bottom-up formalism and top-down sapience.
</p>
<p>
On the other front, what of the sufficient-power threshold, clearly featured on the logic side of the analogy? If the quantum/classical dichotomy is an instance of the same effect, it would seem there must be something in physics corresponding to this power threshold. Physics considered in the abstract as a description of physical reality has no obvious place for <i>power</i> in a logical or computational sense. Interestingly, however, the particular alternative vein of speculation I've been exploring here lately (<a href="https://fexpr.blogspot.com/2017/06/co-hygiene-and-quantum-gravity.html">co-hygiene and quantum gravity</a>) recommends modeling physical reality as a discrete structure that evolves through a dimension orthogonal to spacetime, progressively toward a stable state approximating the probabilistic predictions of quantum mechanics — and it is reasonable to ask how much computational power the primitive operations of this orthogonal evolution of spacetime ought to have.
</p>
<p>
In such a scenario, the computational power is applied to state-evolution from some initial state of spacetime to a stable outcome, for some sense of <i>stable</i> to be determined. As a practical matter, this amounts to a transformation from some probability distribution of initial states of spacetime, to a probability distribution of stable states of spacetime that presumably resembles the probability distributions predicted by quantum mechanics. As it is unclear how one chooses the initial probability distribution, I've toyed with the idea that a quantum-mechanics-like distribution might be some sort of fixpoint under this transformation, so that spacetime would tend to come out resembling quantum mechanics more-or-less regardless of the initial distribution.
</p>
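<p>
To make the fixpoint idea concrete, here is a minimal numerical sketch. Everything in it is an arbitrary stand-in: the five coarse-grained "states of spacetime", the matrix playing the role of the rewriting transformation, and all the particular numbers; only the qualitative behavior matters. A fixed stochastic transformation, applied over and over to a probability distribution, drives wildly different initial distributions to the same fixpoint distribution.
</p>
<pre>
import numpy as np

# Arbitrary stand-in for the unknown rewriting dynamics: a fixed
# stochastic transformation T on probability distributions over
# five coarse-grained "states of spacetime".
rng = np.random.default_rng(0)
T = rng.random((5, 5))
T /= T.sum(axis=0)        # each column sums to 1, so T maps
                          # distributions to distributions

def evolve_to_fixpoint(p):
    """Apply T repeatedly until the distribution stops changing."""
    while True:
        q = T @ p
        if np.allclose(q, p, rtol=0.0, atol=1e-12):
            return q
        p = q

p1 = np.array([1.0, 0.0, 0.0, 0.0, 0.0])   # concentrated start
p2 = np.full(5, 0.2)                        # uniform start
print(evolve_to_fixpoint(p1))               # same fixpoint,
print(evolve_to_fixpoint(p2))               #   both times
</pre>
<p>
(For a strictly positive stochastic matrix, this insensitivity to the initial distribution is guaranteed by the Perron-Frobenius theorem. Whether the actual rewriting relation would behave anything like this toy is, of course, exactly the open question.)
</p>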
<p>
The spacetime-rewriting relation would also be the medium through which cosmological-scale determinism would induce small-scale apparent nondeterminism.
</p>
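<p>
The flavor of that inducement, at least, is easy to evoke with a toy sketch: a process that is strictly deterministic, but driven by structure a local observer cannot see, can be statistically indistinguishable on the ground from nondeterminism. The chaotic map below is an arbitrary stand-in for whatever the cosmological-scale structure might actually be.
</p>
<pre>
# Deterministic at the large scale, apparently nondeterministic at
# the small: iterate a chaotic map (fully deterministic) and observe
# only a coarse local "measurement" of the evolving global state.
x = 0.123456789                  # global state, evolving deterministically
bits = []
for _ in range(32):
    x = 4.0 * x * (1.0 - x)      # logistic map at r = 4
    bits.append(int(x > 0.5))    # all a local observer gets to see
print(bits)   # statistically indistinguishable from coin flips
</pre>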
<p>
Between inducing nondeterminism and transforming probability distributions, there would seem to be, potentially, great scope for dependence on the relative computational power of the rewriting relation. With such a complex interplay of factors at stake, it seems likely that even if there were a Gödel-like power threshold lurking, it would have to be deduced <i>from</i> a much better understanding of the rewriting relation, rather than contributing <i>to</i> a basic understanding of the rewriting relation. Nevertheless, I'm inclined to keep a weather eye out for any such power threshold as I move forward.
</p>
John Shutt (18 comments)

<span style="font-size: large;">Thoughts on Jaynes's Breakdown of the Bicameral Mind (2018-03-05)</span>
<blockquote>
It is one of those books that is either complete rubbish or a work of consummate genius, nothing in between! Probably the former, but I'm hedging my bets.
<blockquote>
— comment about Jaynes's <i>The Origin of Consciousness in the Breakdown of the Bicameral Mind</i> in <a href="https://en.wikipedia.org/wiki/Richard_Dawkins">Richard Dawkins</a>'s <i><a href="https://en.wikipedia.org/wiki/The_God_Delusion">The God Delusion</a></i>, 2006.
</blockquote>
</blockquote>
<p>
I've just read Julian Jaynes's 1976 book <i>The Origin of Consciousness in the Breakdown of the Bicameral Mind</i>, and here I'm posting my thoughts, built roughly on the structure of, though wider-ranging than, a book review.
</p>
<p>
This book engages three of my particular interests, deeply entangled in the instance so that they come as a package. I'm interested in the evolution and nature of the human mind, which of course is Jaynes's subject matter. I'm also interested in how to read a forceful presentation of a theory without missing its fault lines. And I'm interested in how best to present an unorthodox theory. (I've touched on all three of these in various past posts on this blog.)
</p>
<p>
To be clear: I enjoyed reading Jaynes's book; I think he's glimpsing something real though it might not be quite what he thinks it is; and I think his book, and his ideas, are worth studying. Keep those things in mind, moving forward through this post. My interests will cause me to emphasize criticisms of Jaynes's theories, I'll be trying to assemble a coherent alternative to contrast with Jaynes's theories, and with all that going on in this post the positive aspects of my assessment might get a bit buried. But I wouldn't be paying such close attention to Jaynes if I didn't see his work as fundamentally deserving of that attention.
</p>
<p>
When studying any forceful presentation of a theory, there is risk of joining the author in whatever traps of thinking they're caught in. The best time to scout out where the traps/fault lines are (take your pick of metaphors) is on first reading. That's true of both orthodox and unorthodox theories, btw; indeed, it's a common challenge for orthodox theories, where the traps must be easily overlooked for the theory to have achieved orthodoxy. Unorthodox theories are sometimes presented with markers that make them sound crazy, in which case the larger challenge may be to avoid underestimating them; but a strong unorthodox presentation without craziness markers — such as Jaynes's — can also contain hidden traps, and moreover the reader has to distinguish genuine traps from, so to speak, legitimate unorthodoxy.
</p>
<p>
Hence my reading Jaynes slowly and cautiously, jotting down whatever notes came to mind as I went along.
</p>
<p>
It wouldn't be difficult for Jaynes's basic theory to sound crazy; Dawkins has a point there. At its baldest, Jaynes's big idea is that until about four thousand years ago, human beings didn't have a conscious mind, but instead had a self-unaware left brain that took orders from hallucinated gods generated by their right brain — the <i>bicameral mind</i>. You don't want to dive right into the thick of a thing like that, and Jaynes doesn't do so. He builds his case slowly, so that as he adds pieces to the puzzle it's clear how they fit.
</p>
<p>
I see no need to choose between completely accepting or completely rejecting Jaynes's ideas, though. There seems room for Jaynes to be seeing some things others have missed, while missing some factors that lead him to a more-extreme-than-necessary explanation of what he sees. This particularly works if one has a suggestion for what Jaynes might be missing; and I do. I have in mind broadly memetics, and particularly the notion of <i>verbal society</i> which I suggested on this blog <a href="https://fexpr.blogspot.com/2011/12/preface-to-homer.html#sec-pth-orth">some time back</a> and have revisited several times, notably <a href="https://fexpr.blogspot.com/2015/02/sapience-and-language.html">[1]</a>, <a href="https://fexpr.blogspot.com/2018/02/sapience-and-non-sapience.html#sec-sns-soc">[2]</a>.
</p>
<p>
As for a work of consummate genius, well, that depends on one's view of genius. If it's possible for a work to be a masterstroke regardless of how much of it is right or wrong, then, why not? It's easy, when the <i>Iliad</i> says that someone did something because a god told them to, to say, oh, that's a poetic device; but in an academic climate where "poetic device" is the standard explanation, it takes something special to say — seriously, and with extensive scholarly research to back it up — that maybe, when the <i>Iliad</i> says a god told them to do something, the <i>Iliad</i> means just what it says.
</p>
<blockquote>
<b>Contents</b><br>
<a href="#sec-bbm-cave">Caveats</a><br>
<a href="#sec-bbm-back">Background</a><br>
<a href="#sec-bbm-book">The book that Jaynes wrote</a><br>
<a href="#sec-bbm-comm">Commentary</a><br>
<a href="#sec-bbm-stor">Storytelling</a><!-- <br> -->
</blockquote>
<span style="font-size: large;" id="sec-bbm-cave">Caveats</span>
<p>
When seeking to show an audience the plausibility of a paradigm scientific theory, it's common to point out things that are consistent with the theory. However, if you're trying to show plausibility of a highly unorthodox scientific theory (the sort whose opponents might call "lunatic fringe"), imo that technique basically doesn't work. My reasoning has to do with contrast between rival theories.
</p>
<p>
Imagine I've got a large whiteboard, with nothing written on it. (When I was in high school, it would have been a blackboard; and some years from now perhaps it'll be some sort of giant touchscreen technology. At any rate, it's big; say at least a yard/meter high and wider than it is high, perhaps a lot wider.) The points on this whiteboard are possible explanations for things; that is, explanations that we could, in principle, entertain. I draw on it a small circle, perhaps the size of the palm of my hand. The points inside the circle are tried and true sorts of scientific theories; we have repeatedly used experiments to test them against alternative explanations, and in that way they have earned good reputations as solid, viable explanations. So if one of those well-reputed explanations works very well for some new thing you're looking at, it's a credible candidate to explain that thing.
</p>
<p>
What if none of the explanations in that small circle works for the new thing you're studying? I'll draw a larger circle around that one, maybe four times the diameter. Points inside this larger circle are explanations we have thought of, even if they seem quite loony. There are flying saucers in there, and Sasquatches, and ancient aliens visiting Earth to build pyramids. But they're all explanations that we've thought of, even if we didn't think highly of them. Even the strangest among them, though, may have some people who favor them. And when we've got a new thing that doesn't afford an explanation in the smaller circle, but it could be explained by, say, ancient pyramid-building aliens (to take a vivid example), some people will claim that's evidence for ancient pyramid-building aliens.
</p>
<p>
Except, it isn't evidence for ancient pyramid-building aliens. It's <i>consistent</i> with ancient pyramid-building aliens, but ancient pyramid-building aliens don't have the earned reputation of things in the smaller circle. Remember, those orthodox explanations earned their reputations through experiments that contrasted them against alternatives. But when none of those orthodox explanations works for this new thing, and ancient pyramid-building aliens does work for the new thing, what alternative theories should we be considering? Presumably, anything that has as much repute as ancient pyramid-building aliens.
</p>
<p>
Uhuh.
</p>
<p>
And this is why I've made these circles much smaller than the whole whiteboard. The points in the larger circle are explanations we have thought of; but most of the whiteboard is outside that circle, and all that larger outside is explanations that we <i>could</i> consider, but we haven't thought of them. And really, we don't know how much of that vast array of explanations we haven't thought of might be (if we thought of it) at least as well reputed as ancient pyramid-building aliens.
</p>
<p>
The moral of the story, it would seem, is that if you're studying a really unorthodox explanation, and you want to be able to say something stronger than just that it would suffice to explain the phenomenon, you should work at finding alternatives.
</p>
<p>
I don't mean to lambaste Jaynes for not coming up with alternatives; Jaynes was pulling off a profoundly impressive feat by coming up with one solid unorthodoxy, and it's hardly fair to complain that he didn't come up with several. But it does seem that however many facts he finds to be <i>consistent</i> with his unorthodoxy, one ought not to interpret that as <i>support</i> for the unorthodoxy, as such. Throughout my reading of Jaynes, I kept this sort of skepticism in mind.
</p>
<p>
Another sort of trap for the unwary researcher in areas relating to the mind —orthodox or no— is highly abstract terms that really don't mean at all the same thing to everyone. (The same sort of problem may arise in religion, another area with really extraordinarily abstract terms.) I experienced this problem myself, some years ago, when reading Susan Blackmore's <i><a href="https://en.wikipedia.org/wiki/The_Meme_Machine">The Meme Machine</a></i>. Through most of the book I felt Blackmore seemed pretty much on-target, until I came to her chapter on the <i>self</i>; and when I hit that chapter it was quickly clear that something was going horribly wrong. Suddenly, I found Blackmore saying things that on their face (the face presented to me, of course) were obviously, glaringly false. And not just saying them, <i>reveling</i> in them. She was quite excited, after having believed all her life that she had a self, to realize that the self did not exist. This struck me as beyond silly. If she was so sure she didn't have a self, who did she imagine had written her book?
</p>
<p>
I didn't take this to be, necessarily, a mistake by Blackmore; it didn't feel that way, though there wasn't any other explanation that felt compellingly right either. But not chalking it up to a mistake by Blackmore did not in any way change the overt falsity of what she was saying. Hence my initial phrasing, that <i>something</i> was going horribly wrong.
</p>
<p>
After considerable puzzling (about a week's worth), I worked out what was going wrong. It wasn't a problem with the concepts, neither on Blackmore's part nor mine. It was a problem with the word "self". Susan Blackmore had believed all her life in... something... and was quite excited to realize that that something did not exist. But she called that something "self". And that thing, that she called "self", was something I had never believed in to begin with. I had always used the word "self" to mean something else. So when she said she had realized that the self does not exist, to me she was denying the existence of something quite different from what she intended to say did not exist. I think she was denying the existence of what Daniel Dennett would call the audience of the <a href="https://en.wikipedia.org/wiki/Cartesian_theater">Cartesian theater</a> — which Dennett spent much of his classic book <i>Consciousness Explained</i> debunking.
</p>
<p>
The moral here would seem to be, don't assume that other people mean the same thing you do by these sorts of highly abstract words.
</p>
<p>
Those two potential traps came to mind for me pretty quickly when I started reading Jaynes. Another, more content-specific, trap occurred to me a few chapters into the book. There is a well-known (in some circles) phenomenon that medical students, as they learn about various diseases, start worrying that they themselves may be suffering from those diseases. I've inherited a story of someone remarking, about an instance of this phenomenon, "Just wait till they start studying psychiatry." Well. Jaynes was a psychologist. There's this tendency to think in terms of pathologies. And it seemed to me, as I got into the thick of the book, that Jaynes was placing undue weight on pathological states such as schizophrenia. Without that emphasis, it seemed, one should be able to formulate a theory in the same general direction as Jaynes was exploring, without going to the extreme he went to (his <i>bicameral man</i>).
</p>
<span style="font-size: large;" id="sec-bbm-back">Background</span>
<p>
Jaynes is concerned with the development of consciousness over time, and, peripheral to that, the development of language over time.
</p>
<p>
Some major milestones of human development, more-or-less agreed upon:
<ul>
<li>About two and a half million years ago, stone tools appear. Start of the paleolithic (old stone age).</li>
<li>About forty or fifty thousand years ago, give or take, there is an explosion in the variety of artifacts. Art, tools for making tools, tools with artistic flair, tools for making clothing, etc. Start of the upper paleolithic (late stone age).</li>
<li>About ten thousand years ago (your millennium may vary), human agriculture begins. Start of the neolithic (new stone age).</li>
<li>About four thousand years ago, the first writing appears. This is a bit after the neolithic (perhaps a thousand years) and into the Bronze Age.</li>
<li>About 2500 years ago, around the time of Plato, science and philosophy blossom in ancient Greek civilization. Eric Havelock proposed that this is when ancient Greek society passes from <i>orality</i> to <i>literacy</i>.</li>
</ul>
According to Havelock's theory, the shift from oral society, in which knowledge is founded on oral epics such as the <i>Iliad</i>, to literate society in which knowledge is founded on writing, profoundly changes the character of human thinking. Modern Afghanistan has been suggested as an example of orality.
</p>
<p>
To Havelock's theory, I've proposed to add a still earlier phase of language and society, preceding orality, which I've tentatively called <i>verbality</i>. My notion of what verbality might look like has been inspired by the Pirahã language lately studied by Daniel Everett as recounted in his 2008 book <i><a href="https://en.wikipedia.org/wiki/Daniel_Everett#Don.27t_Sleep.2C_There_Are_Snakes:_Life_and_Language_in_the_Amazonian_Jungle">Don't Sleep, There Are Snakes: Life and Language in the Amazonian Jungle</a></i>. In particular, amongst many other peculiar features of the Pirahã, their culture has no art or storytelling, while their language has no tense and no numerical or temporal vocabulary. It seems perfectly reasonable that the Pirahã would not be typical of verbality, since it's <i>typical</i> for a verbal culture to have vanished many thousands of years ago; but I see it as a demonstration of possibility. It may be significant for Jaynes's theory that Everett describes a Pirahã group hallucination.
</p>
<p>
I don't have a good handle on just what precipitated the ancient transition from verbality to orality — although, if one speculates that the story of the expulsion from Eden might be, in some part, a distant memory of the verbality/orality transition, it may have been pretty traumatic. However, I do have a timeframe. If verbality does not support art, one would expect the transition to be clearly marked by the appearance of art; besides which, I expect a dramatic acceleration of memetic development starting at the transition; so, I place the end of verbality and start of orality circa forty thousand years ago, at the beginning of the upper paleolithic.
</p>
<p>
Once orality starts, about forty thousand years ago, it would then be necessary to work out increasingly effective ways to tell stories. It seems likely to have been a very difficult and slow process; one would, on reflection, hardly expect ancient humans to immediately shift from not telling stories at all to great epics. I'm guessing that writing, which didn't show up for about thirty-six thousand years, was a natural development once the art of storytelling reached a certain level of maturity. I really hadn't thought about oral society struggling to develop the art of storytelling, though, until I started reading Jaynes.
</p>
<span style="font-size: large;" id="sec-bbm-book">The book that Jaynes wrote</span>
<p>
Jaynes begins with a chronological rundown of theories of consciousness. This is good strategy, as it places his ideas solidly into context, allows the reader to <i>see</i> him doing so, and allows him to be seen considering alternatives, which helps the credibility not only of the theory, but also of Jaynes himself; not incidentally, as proponents of unorthodoxy need to be seen to be well-informed and attentive. On the downside, his treatment of individual past theories tends to make light of them — although, I notice, on at least one occasion some chapters later, he acknowledges having just used such a tactic, suggesting that he perceives it as a perfectly valid stylistic mode and not something to be taken too much to heart. I think he'd come across better by showing more respect for rival theories; at any rate, it's my preference.
</p>
<p>
His rundown of past theories seems likely to suffer from a problem, such as I described earlier, with the highly abstract term <i>consciousness</i>. He's quite clear that these different theories are saying different things, but he appears to assume they are all trying to get at a single idea. The difficulty might also be described in terms of Kuhnian paradigms (which I've discussed often on this blog, e.g. <a href="https://fexpr.blogspot.com/2011/03/memetic-organisms.html">[3]</a>, <a href="https://fexpr.blogspot.com/2015/06/thinking-outside-quantum-box.html#sec-qt-meme">[4]</a>). Amongst the functions of a paradigm, according to Kuhn, it determines what entities exist, what sorts of questions can be asked about them, and what sorts of answers can be given. So, while the different paradigms Jaynes describes are all searching for truth in the same general neighborhood, one should expect that some of the variance between them is not merely about what answer to give to a single common question that all of them are pursuing, but about what question is most useful to ask. As a reader, I struggled to deduce, from how Jaynes presented these past theories, just what question he wanted to answer; and I was still working on pinning that down after I'd finished the book. His own notion of consciousness is, to my understanding, substantially about <i>narratization</i>, essentially telling a story about the self. This notion of the <i>self</i> as a character in a story told by the mind seems fairly close to what I think of as <i>self</i> (as opposed to what Susan Blackmore apparently used to think of as <i>self</i> before studying memetics); and is clearly an application of storytelling (the thing that, by my hypothesis, is missing from verbality).
</p>
<p>
He is a good writer; his book — at least if you're interested in the subjects he's discussing — is perhaps not a page-turner, but his prose is nonetheless interesting rather than oppressive.
</p>
<p>
Following the introduction, he divides the work into three parts — Books I, II, and III — addressing the nature of the bicameral mind (Book I); the evidence he sees, burning across history, of the bicameral mind and its progressive breakdown (Book II); and the remnants of bicameralism he sees in our modern state (Book III). He added a substantial Afterword in 1990, apparently when he stopped lecturing at Princeton, at the age of 70, and seven years before his death.
</p>
<p>
Jaynes's central idea is that for some time leading up to about 4000 years ago, human minds functioned along different lines than the narratization-based <i>consciousness</i> we experience today. Instead, the human mind was, in Jaynes's terminology, <i>bicameral</i>. The left brain (more properly the hemisphere opposite the dominant side of the body, but most people are right-handed) handled ordinary stuff, and when additional oversight was needed, the right brain provided a hallucination of someone telling the left brain what to do. These hallucinations were perceived to be gods; or rather, in Jaynes's framework, by definition they <i>were</i> gods. One illustration he mentions, from the <i>Iliad</i>, has an angry Achilles asking Agamemnon to account for his behavior; Agamemnon says a god told him to, and Achilles just accepts that. The way Jaynes talks about these gods often makes them sound as if they were coherent beings, which struck me as an overestimation of how much coordination a civilization would likely be afforded simply by its population being bicameral. Jaynes portrays a nation of bicameral humans as extraordinarily well-coordinated (in terms that sometimes seem to flirt with <a href="https://en.wikipedia.org/wiki/Group_selection">group selection</a>, a particular pet peeve of Richard Dawkins that he spent most of his book <i>The Selfish Gene</i> debunking).
</p>
<p>
Jaynes's notion of bicamerality is extensively tied to his ideas about human language. The area of the brain ordinarily responsible for language is on the left side of the brain, but the corresponding right-side structure is largely unused; he figures that right-side structure is where hallucinated voices came from. His general view of the differing functions of the hemispheres is largely in line with, if distinctly more cautious than, the pop-psychology notion of analytic left brain and synthetic/artistic right brain (apparently the pop-psychology view had just gotten started a few years before Jaynes's book came out). He has some specific notions about the stages by which human language developed, which I didn't fully absorb (too detailed, perhaps, to pick up while struggling with the big picture of the book on a first reading), though apparently he sees metaphor as key to the way full-blown human language works in general. In a passage that stuck in my mind, he says that his linguist friends tell him human language is very old, stretching far back in the paleolithic (I've read estimates from a hundred thousand years all the way back to the start of the paleolithic at two and a half million years); he suggests this is implausible because things ought to have moved along much faster if language had been around during all that time, and instead he proposes language only started at the beginning of the upper paleolithic, forty thousand years ago.
</p>
<p>
He dates the start of bicamerality to the onset of agriculture, at the paleolithic/neolithic boundary, circa ten thousand years ago. His reasoning (to the best of my understanding) is that to make agriculture work required coordination of large groups, and this coordination was achieved via bicameral gods. For some thousands of years (nominally, about six thousand) this worked well, but then the world got more stressful, partly due to population growth from the agriculture that bicamerality had enabled, and the gods couldn't keep up, forcing the development of the new regime of consciousness.
</p>
<span style="font-size: large;" id="sec-bbm-comm">Commentary</span>
<p>
Jaynes seems to me to be operating at a disadvantage. Drawing inspiration from something he's familiar with, and viewing history through the lens of his individual perspective, he sees a pattern that he finds compellingly evident in history. It seems — from <i>my</i> individual perspective — that a less extreme explanation for the historical evidence might well be formulated; but the less extreme explanation I see uses tools that weren't available yet when Jaynes was developing his theory. Jaynes draws inspiration from his knowledge of the phenomenon of schizophrenics taking orders from hallucinations, which imho really is a delightfully bold move to shake up an orthodoxy that, like most orthodoxy, could do with a good shake-up. But, Jaynes's book was published in the <i>same year</i> as Dawkins's <i>The Selfish Gene</i>, which coined the word <i>meme</i>. It's easy to look back from four decades later and say that memetics can account for profound changes in population dynamics on a scale that Jaynes felt needed a radical hypothesis like bicamerality; but you can't stand on the shoulders of giants who haven't arrived yet.
</p>
<p>
Jaynes is concerned primarily, of course, with the <i>breakdown</i> of the bicameral mind, over time starting about four thousand years ago; he has little to say about the upper paleolithic, the thirty-thousand-or-so years from the start of language (by his reckoning, or the start of orality by mine) to the onset of bicamerality (by his reckoning, when the technological practice of agriculture was developed). His later discussion of modern hallucinations describes them as vestiges of bicamerality, which rather raises the question of whether humans in the upper paleolithic had hallucinations. The example of the Pirahã suggests to me that hallucinations —and language— were already part of the human condition even <i>before</i> the upper paleolithic. (An interesting question for further consideration is whether the Pirahã's group hallucinations were non-linguistic.)
</p>
<p>
My own preference is for less radical transitions (consistent with <a href="https://en.wikipedia.org/wiki/Occam%27s_razor">Occam's razor</a>). Jaynes may be underestimating how much of qualitative consciousness can exist without narratization in the modern sense; how much of language can exist without support for art or storytelling; how far social structure may be determined by what is believed without involving any fundamental change in how belief is processed by the mind. He also appears, in particular, to be underestimating how loosely organized the modern "conscious" mind is. His view of consciousness is monolithic (something I particularly noted when he began to discuss schizophrenia in Book III). Recall the atomic notion of <i>self</i>, which Susan Blackmore described rejecting after having previously believed in it. If the self is a character in a story we tell ourselves, then the mind that's telling the story was never really atomic in the first place, and we needn't expect a mind that tells such a story to be fundamentally differently organized than one that doesn't tell such a story. If hallucinations are somewhere within the penumbra of normal human mental functioning (and to my non-psychologist's eye it seems they may bear some kinship to narratization), it's possible for such phenomena to have had changing roles in society over the millennia without requiring a traumatic shift to/from a bicameral mind.
</p>
<p>
Another major pitfall he's at risk for concerns interpretation of evidence. Our perception of the distant past is grounded in physical evidence, but we have to build up layers on layers of interpretation on it to produce a coherent picture, so that what we actually see in our coherent picture is almost all interpretation. That leaves tremendous scope for self-fulfilling expectations in the sort of reasoning Jaynes is doing, where he reconsiders the evidence in light of his theory to see how well it fits. Some of his remarks reveal he's aware of this, but still, there it is. When he talks about how the meaning of a word changed over time, one should keep in mind that this is how he supposes it changed over time; the actual evidence is only written words themselves, while all the meanings involved are couched in a vast network of guesses.
</p>
<p>
The distinction between supportive evidence and consistent evidence is not absolute; it depends on how distinctive the evidence is — how much it calls for explanation. This needs care when applied at scale. Jaynes, in particular, examines a great pile of assorted evidence. When the theory intersects with a sufficient mass of evidence, just being consistent with so much begins to seem impressive; but really one has to sum up over the whole mass, and the sum of very many data points can still be zero; it depends on the data points.
</p>
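<p>
One crude way to make this quantitative (my formalization, nothing Jaynes uses) is the Bayesian notion of evidential weight: each datum contributes the logarithm of how much likelier it is under the theory than under the rival explanations, and the contributions simply add. A datum merely <i>consistent</i> with the theory, being about equally likely either way, contributes roughly zero; and ten thousand zeros still sum to zero.
</p>
<pre>
import math

def weight(p_under_theory, p_under_rivals):
    # Evidential weight of one datum: the log-likelihood ratio of
    # the theory against the rival explanations on offer.
    return math.log(p_under_theory / p_under_rivals)

# Ten thousand data points, each exactly as likely under the rivals
# as under the theory: total support is still zero.
print(sum(weight(0.9, 0.9) for _ in range(10_000)))     # 0.0

# One genuinely distinctive datum outweighs the whole pile.
print(weight(0.9, 0.9) * 10_000 + weight(0.9, 0.001))   # about 6.8
</pre>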
<p>
One doesn't want to give an unorthodox theory credit for explaining things that hadn't needed explaining.
</p>
<p>
Sometimes an explanation seems warranted. Jaynes remarks of the <i>Iliad</i> that it never describes human bodies as a whole, but rather as collections of parts, and that the same trend is visible in visual art of the time; though that seems open to an explanation in terms of evolving technology for storytelling, it doesn't seem gratuitous to ask for <i>some</i> explanation of it. Another point that gave me pause was his claim that the extraordinarily easy Spanish conquest of the Inca Empire was because the Inca Empire was bicameral, with the entire population following the dictates of their bicameral gods; though I didn't find it an altogether compelling case for his explanation, that chapter in history is odd enough that orthodox explanation isn't entirely at ease with it either.
</p>
<p>
Jaynes as a whole, though, gave me some general sense of unnecessary explanations. He sees evidence of hallucinations where I see unremarkable phenomena (such as "Houses of God") that may be consistent with his theory but don't need it. In Book III he is particularly keen on the idea that modern humans look for authority to take the place of the bicameral gods they have been deprived of; he sees a quest for bicameral authority in our attitude toward science (he's missing the difference between science and religion, btw, which may in part follow from predating memetics but which I still found unsettling), and even sees the same quest for authority in our enjoyment of stage magic; but I have never felt that people looking for authorities to follow needed explanation. I figure it's a basic behavioral impulse with some evolutionary value, rather like the impulse to be fair to others, or the impulse to hate people who don't belong to one's own social group (a very mixed bag, our basic behavioral impulses). Yet more broadly, throughout the book he presents religion as a remnant of bicamerality. Admittedly, this may come under the heading of things he missed by predating memetics; I now react to it by thinking, religion is neatly explained by evolution of memetic organisms — which ties in to the verbality/orality hypothesis — but I only made that evolutionary connection myself in the mid-1990s (<a href="https://fexpr.blogspot.com/2011/03/memetic-organisms.html">earlier post</a>).
</p>
<p>
Occasionally, in Jaynes's efforts to fit his theory to known facts, he encounters facts that don't fit easily. Overall, this happens to him only sporadically. He is aware that demonic possession doesn't fit his model, and tries to make it fit anyway. He finds himself reaching to explain why poetry and music, which he maintains are remnants of bicameralism, still exist — which wouldn't be a problem if he hadn't started by hypothesizing they were remnants of a radically different type of mind rather than being phenomena within the normal range of the sort of mind we now have.
</p>
<span style="font-size: large;" id="sec-bbm-stor">Storytelling</span>
<p>
I look forward — after I fully digest my first reading of Jaynes — to a second reading. My particular objective on a second reading would be to consider in detail how the evidence he claims for his bicamerality storyline fits with my verbality/orality storyline. This objective wouldn't have been possible on the first reading, as I was too busy struggling to grok the overall shape of what he was saying; in fact, though I'd been accumulating thought fragments throughout his book, it wasn't until Jaynes's Afterword that I realized, in a definite <i>Aha!</i> moment (my notes pinpoint it at the top of page 458), that the key concept in relating Jaynes's theories with mine is <i>storytelling</i>, which underpins Jaynes's notion of consciousness and my notion of the verbality/orality transition. So, as part of that full digestion, following is the more elaborated form that my theories have achieved from their first pass by Jaynes.
</p>
<p>
My narrative timeline, as it now stands (yes, theorizing is itself storytelling, which in this case feeds into the story being told since it implies that the advent of storytelling would produce a tremendous acceleration of human intellectual development), starts with the transition from verbality to orality at the beginning of the upper paleolithic. Speculatively, this cultural transition may coincide, in language development, with the introduction of one or both of the two devices mentioned above as missing from Pirahã: time, and numbers. Jaynes's ideas about consciousness are rather close to those two factors, as well. Once the orality-threshold device is introduced, whatever it is, there is a distinct expansion of human activity.
</p>
<p>
If the start of the upper paleolithic is when orality starts, it's a long time before the period Jaynes primarily discusses, as his breakdown of the bicameral mind starts only about four thousand years ago. The intervening thirty-six thousand years, be the same more or less, would have to be accounted for by the very slow process of inventing the art of advanced storytelling. As mentioned above, Jaynes has little to say about this period. He reckons language only began where I'm placing the verbality/orality transition, at the start of the upper paleolithic, and he (iirc) briefly describes a series of stages in the development of language that would have taken place during the upper paleolithic before the fully developed device of language catalyzed the emergence of the bicameral mind and the neolithic. Some of Jaynes's language stages would likely precede storytelling, but certainly a second reading should carefully examine these stages in case some of them offer some inspiration on storytelling after all. On the other hand, if he is indeed overestimating how much of consciousness must postdate his bicameral era, his timeline for the development of consciousness starting four thousand years ago might, on careful examination, be mapped more widely onto the entire oral period from (nominally) forty thousand to twenty-five hundred years ago.
</p>
<p>
After the verbality/orality transition, the next specific event in my timeline is the emergence of writing, the point at which, by my conceptual framework, the art of storytelling exceeds a critical threshold enabling it to support the written form. This coincides with Jaynes's start of the breakdown of the bicameral mind, four thousand years ago. Jaynes's bicameral age is for me the late part of the larger oral period prior to emergent writing; it might plausibly be reinterpreted as a phase in the development of storytelling, perhaps something milder than but similar to bicamerality, though quite what that would be is unclear (and might stubbornly remain unresolved even after a second in-depth reading).
</p>
<p>
The period from the advent of writing onward is intensively covered in Jaynes's book, and wants close reconsideration from top to bottom. Several complications apply.
</p>
<p>
Reinterpretations are likely to be steep in this period, with a wide conceptual gap. In Jaynes's framework, bicamerality is an absolute state of mind with power to direct ancient empires, while religions are pale echoes of it; in mine, bicamerality is expected to fall within the normal operating range of the human mind (though perhaps not a part of the range commonly exercised in the modern era), while religions are memetic organisms with the power to direct ancient empires.
</p>
<p>
I remarked earlier on the treacherous nature of physical evidence with multiple layers of interpretation built on it. A particular complication here is that Jaynes is judging what people think by how they describe their experiences, but I am hypothesizing that throughout the entire period people were <i>trying to figure out</i> how to describe their experiences, and in particular I'm guessing that explaining one's own thoughts was especially hard to figure out; so that the further back in time you go, the less people's descriptions reflect their inner life.
</p>
<p>
Judging by the above rough sketch of a timeline, the <i>Iliad</i> as we know it — even after compensating (or trying to) for mutation between being composed and being written down — should already represent an extremely advanced stage of storytelling, chronologically over nine tenths of the way from the onset of storytelling toward the present day. Hopefully, a close second reading can use the depth of Jaynes's treatment to conjecture intermediate steps in the long evolution of advanced storytelling.
</p>
<blockquote>
[Update: I undertook a second close reading of Jaynes, and explored the evolution of storytelling in that light, a year later in post <a href="https://fexpr.blogspot.com/2019/03/storytelling.html">Storytelling</a>.]
</blockquote>
John Shutt (7 comments)

<span style="font-size: large;">Sapience and non-sapience (2018-02-13)</span>
<blockquote>
DOCTOR: I knew a Galactic Federation once, lots of different lifeforms so they appointed a justice machine to administer the law.<br>ROMANA: What happened?<br>DOCTOR: They found the Federation in contempt of court and blew up the entire galaxy.
<blockquote>
— <i><a href="https://en.wikipedia.org/wiki/The_Stones_of_Blood">The Stones of Blood</a></i>, <i>Doctor Who</i>, BBC, 1978.
</blockquote>
</blockquote>
<p>
The biggest systemic threat atm to the future of civilization, I submit, is that we will design out of it the most important information-processing asset we have: ourselves. Sapient beings. Granted, there is a lot of bad stuff going on in the world right now; I put this threat first because coping with other problems tends to depend on civilization's collective wisdom.
</p>
<p>
That is, we're much less likely to get into trouble by successfully endowing our creations with sapience, than by our non-sapient creations leaching the sapience out of us. I'm not just talking about AIs, though that's a hot topic for discussion lately; our non-sapient creations include, for a few examples, corporations (remember Mitt Romney saying "corporations are people"?), bureaucracy (cf. <a href="https://en.wikipedia.org/wiki/Franz_Kafka">Franz Kafka</a>), AIs, big data analysis, restrictive user interfaces, and totalitarian governments.
</p>
<p>
I'm not saying AI isn't powerful, or useful. I'm certainly not suggesting human beings are all brilliant and wise — although one might argue that <i>stupidity</i> is something only a sapient being can achieve. Computers can't be stupid. They can <i>do</i> stupid things, but they don't produce the stupidity, merely conduct and amplify it. Including, of course, amplifying the consequences of assigning sapient tasks to non-sapient devices such as computers. Stupidity, especially by people in positions of power, is indeed a major threat in the world; but as a practical matter, much stupidity comes down to not thinking rationally, thus failing to tap the potential of our own sapience. Technological creations are by no means the only thing discouraging us from rational thought; but even in (for example) the case of religious "blind faith", technological creations can make things worse.
</p>
<p>
To be clear, when I say "collective wisdom", I don't just mean addressing externals like global climate change; I also mean addressing <i>us</i>. One of our technological creations is a global economic infrastructure that shapes most collective decisions about how the world is to run ("money makes the world go 'round"). We have some degree of control over how that infrastructure works, but limited control and also limited understanding of it; at some point I hope to blog about how that infrastructure does and can work; but the salient point for the current post is, if we want to survive as a species, we would do well to understand <i>what human beings contribute to the global infrastructure</i>. Solving the global economic conundrum is clearly beyond the scope of this post, but it seems that this post is a preliminary thereto.
</p>
<p>
I've mentioned <a href="https://fexpr.blogspot.com/2015/02/sapience-and-language.html">before</a> on this blog the contrast between sapience and non-sapience. Here I mean to explore the contrast, and interplay, between them more closely. Notably, populations of sapient beings have group dynamics fundamentally different from — and, seemingly, far more efficacious from an evolutionary standpoint than — the group dynamics of non-sapient constructs.
</p>
<p>
Not only am I unconvinced that modern science can create sapience, I don't think we can even measure it.
</p>
<blockquote>
<b>Contents</b><br>
<a href="#sec-sns-chess">Chess</a><br>
<a href="#sec-sns-meme">Memetics</a><br>
<a href="#sec-sns-alg">The sorcerer's apprentice</a><br>
<a href="#sec-sns-stat">Lies, damned lies, and statistics</a><br>
<a href="#sec-sns-pro">Pro-sapient tech</a><br>
<a href="#sec-sns-soc">Storytelling and social upheaval</a><!-- <br> -->
</blockquote>
<span style="font-size: large;" id="sec-sns-chess">Chess</span>
<p>
We seem to have talked ourselves into an inferiority complex. Broadly, I see three major trends contributing to this.
</p>
<p>
For one thing, advocates of science since Darwin, in attempting to articulate for a popular audience the profound implications of Darwinian theory, have emphasized the power of "blind" evolution, and in doing so they've tended to describe it in decision-making terms, rather as if it were <i>thinking</i>. Evolution <i>thinks</i> about the ways it changes species over time in the same sense that weather <i>thinks</i> about eroding a mountain, which is to say, not at all. Religious thinkers have tended to ascribe some divine specialness to human beings, and even scientific thinkers have shown a tendency, until relatively recently, to portray evolution as <i>culminating</i> in humanity; but in favoring objective observation over mysticism, science advocates have been pushed (even if despite themselves) into downplaying human specialness. Moreover, science advocates in emphasizing evolution have also played into a strong and ancient religious tradition that views parts/aspects of nature, and Nature herself, as sapient (cf. my past remarks on <a href="https://fexpr.blogspot.com/2011/12/preface-to-homer.html#sec-pth-orth">oral society</a>).
</p>
<p>
Meanwhile, in the capitalist structure of the world we've created, people are strongly motivated to devise ways to do things with technology, and strongly motivated to make strong claims about what they can do with it. There is no obvious capitalist motive for them to suggest technology might be inferior to people for some purposes, let alone for them to actually go out and <i>look</i> for advantages of not using technology for some things. Certainly our technology can do things with algorithms and vast quantities of data that clearly could not be done by an unaided human mind. So we've accumulated both evidence and claims for the power of technology, and neither for the power of the human mind.
</p>
<p>
The third major trend I see is more insidious. Following the scientific methods of objectivity highly recommended by their success in studying the natural world, we tried to objectively measure our intelligence; it seemed like a good idea at the time. And how do you objectively measure it? The means that comes to mind is to identify a standard, well-defined, structured task that requires intelligence (in some sense of the word), and test how well we do that task. It's just a matter of finding the right task to test for... right? No, it's not. The reason is appallingly simple. If a task really is well-defined and structured, we can in principle build technology to do it. It's when the task <i>isn't</i> well-defined and structured that a sapient mind is wanted. For quite a while this wasn't a problem. <a href="https://en.wikipedia.org/wiki/Alan_Turing">Alan Turing</a> proposed a test for whether a computer could "think" that it seemed no computer would be passing any time soon; computers were nowhere near image recognition; computers were hilariously bad at natural-language translation; computers couldn't play chess on the level of human masters.
</p>
<p>
To be brutally honest, automated natural-language translation is still awful. That task is defined by the way the human mind works — which might sound dismissive if you infer mere eccentricities of human thinking, but becomes quite profound if you take "the way the human mind works" to mean "sapience". The most obvious way computers can do automatic translation well is if we train people to constrain their thoughts to patterns that computers don't have a problem with, which seemingly amounts to training people to avoid sapient thought. (Training people to avoid sapient thought is, historically, characteristic of demagogues.) Image processing is still a tough nut to crack, though we're making progress. But chess has certainly been technologized. It figures that chess would be the first of those tasks I've mentioned to be technologized, as it's the most well-defined and structured of them. When it happened, I didn't take it as a sign that computers were becoming sapient, but rather a demonstration that chess doesn't strictly require whatever-it-is that distinguishes sapience. I wasn't impressed by <a href="https://en.wikipedia.org/wiki/Go_(game)">Go</a>, either. I wondered about computer <i>Jeopardy!</i>; but on reflection, that too is a highly structured problem, with no more penalty for a completely nonsensical wrong answer than for a plausible wrong one. I'm not suggesting these aren't all impressive technological achievements; I'm suggesting the very objectivity of these measures hides the missing element in them — understanding.
</p>
<p>
Recently in a discussion I read, someone described modern advances in AI by saying computers are getting 'better and better at understanding the world' (or nearly those words), and I thought, <i>understanding</i> is just what they aren't doing. It seems to me the technology is doing what it's always done — getting better and better at solving classes of problems without understanding them. The idea that the technology <i>understands</i> anything at all seems to me to be an extraordinary claim, therefore requiring extraordinary proof which I do not see forthcoming since, as remarked, we expect to be unable to test it by means of the most obvious sort of experiment (a structured aptitude test). If someone wants to contend that the opposite claim I'm making is also extraordinary — the claim that we understand in a sense the technology does not — I'll tentatively allow that resolving the question in <i>either</i> direction may require extraordinary proof; but I maintain there are things we need to do <i>in case I'm right</i>.
</p>
<p>
Somebody, I maintain, has to bring a big-picture perspective to bear. To understand, in order to choose the goals of what our technology is set to do, in order to choose the structural paradigm for the problem, in order to judge when the technology is actually solving the problem and when the situation falls outside the paradigm. In order to improvise what to do when the situation does fall outside the paradigm. That somebody has to be sapient.
</p>
<p>
For those skeptics who may wonder (keeping in mind I'm all for skepticism, myself) whether there is an <a href="https://en.wikipedia.org/wiki/Russell%27s_teapot">unfalsifiable</a> claim lurking here somewhere, note that we are not universally prohibited from <i>observing</i> the gap between sapience and non-sapience. The difficulty is with one means of observation: a very large and important class of experiments are predictably incapable of measuring, or even detecting, the gap. The reason this does not imply unfalsifiability is that scientific inquiry isn't limited to that particular class of experiments, large and important though the class is; the range of scientific inquiry doesn't have specific formally-defined boundaries — because it's an activity of sapient minds.
</p>
<p>
The gap is at least suggested by the aforementioned difficulty of automatic translation. What's missing in automatic translation is <i>understanding</i>: by its nature automatic translation treats texts for translation as strings to be manipulated, rather than indications about the reality in which their author is embedded. Whatever is missed by automatic translation because it is manipulating strings without thinking about their meaning, that is a manifestation of the sapience/non-sapience gap. Presumably, with enough work one could continue to improve automatic translators; any <i>particular</i> failure of translation can always be fixed, just as any standardized test can be technologized. How small the automatic-translation shortfall can be made in practice, remains to be seen; but the <i>shape</i> of the shortfall should always be that of an automated system doing a technical manipulation that reveals absence of comprehension.
</p>
<p>
Consider fly-by-wire airplanes, which I mentioned in a <a href="https://fexpr.blogspot.com/2015/02/sapience-and-language.html">previous post</a>. What happens when a fly-by-wire airplane encounters a situation outside the parameters of the fly-by-wire system? It turns control over to the human pilots. Who often don't realize, for a few critical moments (if those moments weren't critical, we wouldn't be talking about them, and quite likely the fly-by-wire system would not have bailed) that the fly-by-wire system has stopped flying the plane for them; and they have to orient themselves to the situation; and they've mostly been getting practice at letting the fly-by-wire system do things for them. And then when this stacked-deck of a situation leads to a horrible outcome, there are strong psychological, political, and economic incentives to conclude that it was human error; after all, the humans were in control at the denouement, right? It seems pretty clear to me that, of the possible ways that one could try to divvy up tasks between technology and humans, the model currently used by fly-by-wire airplanes (and now, one suspects, drive-by-wire cars) is a poor model, dividing tasks for the convenience of whoever is providing the automation rather than for the synergism of the human/non-human ensemble. It doesn't look as if we <i>know</i> how to design such systems for synergism of the ensemble; and it's not immediately clear that there's any economic incentive for us to figure it out. Occasionally, of course, something that seems unprofitable has economic potential that's only waiting for somebody to figure out how to exploit it; if there is such potential here, we <i>may</i> need first to understand the information-processing characteristics of sapience better. Meanwhile, I suggest, there is a massive penalty, on a civilization-wide scale (which is outside the province of ordinary economics), if we fail to figure out how to design our technology to nurture sapience. It should be possible to nurture sapience without first knowing how it works, or even exactly what it does — though figuring out how to nurture it may bring us closer to those other things.
</p>
<p>
I'll remark other facets of the inferiority-complex effect, as they arise in discussion, below.
</p>
<span style="font-size: large;" id="sec-sns-meme">Memetics</span>
<p>
By the time I'm writing this post, I've moved further along a path of thought I mentioned in my first contentful post on this blog. I wrote <a href="https://fexpr.blogspot.com/2011/03/memetic-organisms.html">then</a> that in <a href="https://en.wikipedia.org/wiki/Richard_Dawkins">Dawkins</a>'s original description of memetics, he made an understandable mistake by saying that memetic life was "still in its infancy, still drifting clumsily about in its primeval soup". That much I'm quite satisfied with: it <i>was</i> a mistake — memetic evolution has apparently proceeded about three to five orders of magnitude faster than genetic evolution, and has been well beyond <a href="https://en.wikipedia.org/wiki/Abiogenesis#.22Primordial_soup.22_hypothesis">primeval soup</a> for millennia, perhaps tens of millennia — and it was an understandable mistake, at that. I have more to say now, though, about the origins of the mistake. I wrote that memetic organisms are hard to recognize because you can't observe them directly, as their primary form is abstract rather than physical; and that's true as far as it goes; but there's also something deeper going on. Dawkins is a geneticist, and in describing necessary conditions under which replication gives rise to evolution, he assumed it would always require the sort of conditions that <i>genetic</i> replication needs to produce evolution. In particular, he appears to have assumed there must be a mechanism that copies a basic representation of information with fantastically high fidelity.
</p>
<p>
Now, this is a tricky point. I'm okay with the idea that extreme-fidelity basic replication is necessary for <i>genetic</i> evolution. It seems logically cogent that <i>something</i> would have to be replicated with extreme fidelity to support evolution-in-general (such as memetic evolution). But I see no reason this extreme-fidelity replication would have to occur in the basic representation. There's no apparent reason <i>we</i> must be able to pin down at all just what is being replicated with extreme fidelity, nor must we be able to identify a mechanism for extreme-fidelity copying. If we stipulate that evolution implies something is being extreme-fidelity-copied, and we see that evolution is taking place, we can infer that some extreme-fidelity copying is taking place; but evolution works by exploiting what happens with indifference to why it happens. We <i>might</i> find that underlying material is being copied wildly unfaithfully, yet somehow, beyond our ability to follow the connections, this copying preserves some inarticulable abstract property that leads to an observable evolutionary outcome. Evolution would exploit the abstract property with complete indifference to our inability to isolate it.
</p>
<p>
It appears that in the case of genetic evolution, we have identified a basic extreme-fidelity copying mechanism. In fact, apparently it even has an error-detection-and-correction mechanism built into it; which certainly seems solid confirmation that such extreme fidelity was direly needed for genetic evolution or such a sophisticated mechanism would never have developed. Yet there appears to be nothing remotely like that for memetic replication. If memetic evolution really had the same sort of dynamics as genetic evolution, we would indeed expect memetic life to be "still drifting clumsily about in its primeval soup"; it couldn't possibly do better than that until it had developed a super-high-fidelity low-level replicating mechanism.
</p>
<p>
Yet memetic evolution proceeds at, comparatively, breakneck pace, in spectacular defiance of the expectation. Therefore we may suppose that the dynamics of memetic evolution are altered by some factor with no counterpart in genetic evolution.
</p>
<p>
I suggest the key altering factor of memetic evolution, overturning the dynamics of genetic evolution, is that the basic elements of the host medium — people, rather than chemicals — are sapient. What this implies is that, while memetic replication involves obviously-low-fidelity copying of explicitly represented information, the individual hosts are <i>thinking</i> about the content, processing it through the lens of their big-picture sapient perspective. Apparently, this can result in an information flow with abstract fixpoints — things that get copied with extreme fidelity — that can't be readily mapped onto the explicit representation (e.g., what is said/written). My sense of this situation is that if it is even useful to explicitly posit the existence of discrete "memes" in memetic evolution, it might yet be appropriate to treat them as unknown quantities rather than pouring effort into trying to identify them individually. It seems possible the wholesale discreteness assumption may be unhelpful as well — though ideas don't seem like a continuous fluid in the usual simple sense, either.
</p>
<p>
This particular observation of the sapient/non-sapient gap is from an unusual angle. When trying to build an AI, we're likely to think in terms of what makes an individual entity sapient; likewise when defining sapience. The group dynamics of populations of sapients versus non-sapients probably won't (at a guess) help us in any direct way to build or measure sapience; but they do offer a striking view of the existence of a sapience/non-sapience gap. I've remarked before that groups of people get less sapient at scale; a population of sapiences is not itself sapient; but it appears that, when building a system, mixing in sapient components can produce systemic properties that aren't attainable with uniformly non-sapient components, thus attesting that the two kinds of components <i>do</i> have different properties.
</p>
<p>
This evolutionary property of networks of sapiences affords yet another opportunity to underestimate sapience itself. Seeing that populations of humans can accumulate tremendous knowledge over time — and recognizing that no individual can hope to achieve great feats of intellect without learning from, and interacting with, such a scholastic tradition — and given the various motives, discussed above, for downplaying human specialness — it may be tempting to suppose that sapience is not, after all, a property of individuals. However, cogito, ergo that's taking the idea of collective intelligence to an absurdity. The evolutionary property of memetics I've described is not merely a property of how the network is set up; if it were, genetic evolution ought to have struck on it at some point.
</p>
<p>
There are, broadly, three idealized models (at least three) of how a self-directing system can develop. There's "blind evolution", which explores alternatives by maintaining a large population with different individuals blundering down different paths simultaneously, and if the population is big enough, the variety amongst individuals is broad enough, and the viable paths are close enough to blunder into, enough individuals will succeed well enough that the population evolves rather than going extinct. This strategy isn't applicable to a <i>single</i> systemic decision, as with the now-topical issue of global climate change: there's no opportunity for different individuals to live in different global climates, so there's no opportunity for individuals who make better choices to survive better than individuals who make poorer choices. As a second model, there's a system directed by a sapience; the individual sapient mind who runs the show can <i>plan</i>, devising possible strategies and weighing their possible consequences before choosing. It is also subject to all the weaknesses and fallibilities of individuals — including plain old corruption (which, we're reminded, power causes). The third model is a large population of sapiences, evolving memetically — and that's different again. I don't pretend to fully grok the dynamics of that third model, and I think it's safe to say no-one else does either; we're all learning about it in real time as history unfolds, struggling with different ways of arranging societies (governmentally, economically, what have you).
</p>
<p>
A key weakness of the third model is that it only applies under fragile conditions; in particular, the conditions may be <i>deliberately</i> disrupted, at least in the short term; keeping in mind we're dealing with a population of sapiences each potentially deliberate. When systemic bias, or a small controlling population, interferes with the homogeneity of the sapient population, the model breaks down and the system's self-direction loses — at least, partly loses — its memetic dynamics. This is a vulnerability shared by the systems of democracy and capitalism alike.
</p>
<span style="font-size: large;" id="sec-sns-alg">The sorcerer's apprentice</span>
<p>
There are, of course, more-than-adequate ways for us to get into trouble by <i>succeeding</i> in giving our technology sapience. A particularly straightforward one is that we give it sapience and it decides it doesn't want to do what we want it to. In science fiction this scenario may be accompanied by a premise that the created sapience is smarter than we are — although, looking around at history, there seems a dearth of evidence that smart people end up <a href="http://www.theonion.com/article/humanity-surprised-it-still-hasnt-figured-out-bett-36361">running the show</a>. Even if they're only about as smart, and stupid, as we are, an influx of artificial sapiences into the general pool of sapience in civilization is likely to throw off the balance of the pool as a whole — either deliberately or, more likely, inadvertently. One has only to ask whether sapient AIs should have the right to vote to see a tangle of moral, ethical, and practical problems cascading forth (with <i>vote rigging</i> on one side, <i>slavery</i> on the other; not forgetting that, spreading opaque fog over the whole, we have no clue how to test for sapience). However, I see no particular reason to think we're close to giving our technology sapience; I have doubts we're even <i>trying</i> to do so, since I doubt we know where that target actually is, making it impossible for us to aim for it (though mistaking something else for the target is another opportunity for trouble). Even if we could eventually get ourselves into trouble by giving our technology sapience, we might not last long enough to do so because we get ourselves into trouble sooner by the non-sapient-technology route. So, back to non-sapience.
</p>
<p>
A major theme in non-sapient information processing is <i>algorithms</i>: rigidly specified instructions for how to proceed. An archetypal cautionary tale about what goes wrong with algorithms is <a href="https://en.wikipedia.org/wiki/The_Sorcerer%27s_Apprentice">The Sorcerer's Apprentice</a>, an illustration (amongst other possible interpretations) of what happens when a rigid formula is followed without sapient oversight to notice, from a big-picture perspective, when the formula itself ceases to be appropriate. One might argue that this characteristic rigidity is an inherently non-sapient limitation of algorithms.
</p>
<p>
It's not an accident that error-handling is among the great unresolved mysteries of programming-language design — algorithms being neither well-suited to determine when things have gone wrong, nor well-suited to cope with the mess when they do.
</p>
<p>
Algorithmic rigidity is what makes bureaucracy something to complain about — blind adherence to rules <i>even when they don't make sense in the context where they occur</i>, evoking the metaphor of being tied up in red tape. The evident dehumanizing effect of bureaucracy is that it eliminates discretion to take advantage of understanding arbitrary aspects of the big picture; it seems that to afford full scope to sapience, maximizing its potential, one wants to provide <i>arbitrary</i> flexibility — <i>freedom</i> — avoiding limitation to discrete choices.
</p>
<p>
A bureaucratic system can give lip service to "giving people more choices" by adding on additional rules, but this is not a route to the sort of innate freedom that empowers the potential of sapience. To the contrary: sapient minds are ultimately less able to cope with vast networks of complicated rules than technological creations such as computers — or corporations, or governments — are, and consequently, institutions such as corporations and governments naturally evolve vast networks of complicated rules as a strategy for asserting control over sapiences. There are a variety of ways to describe this. One might say that an institution, because it is a non-sapient entity in a sea of sapient minds, is more likely to survive if it has some property that limits sapient minds so they're less likely to overwhelm it. A more cynical way to say the same thing is that the institution survives better if it finds a way to prevent people from thinking. A stereotypical liberal conspiracy theorist might say "they" strangle "us" with complicated rules to keep us down — which, if you think about it, is yet another way of saying the same thing (other than the usual incautious assumption of conspiracy theorists, that the behavior must be a deliberate plot by individual sapiences rather than an evolved survival strategy of memetic organisms). Some people are far better at handling complexity than others, but even the greatest of our complexity tolerances are trivial compared to those of our non-sapient creations. Part of my point here is that I don't think that's somehow a "flaw" in us, but rather part of the inherent operational characteristics of sapience that shape the way it ought to be most effectively applied.
</p>
<span style="font-size: large;" id="sec-sns-stat">Lies, damned lies, and statistics</span>
<p>
A second major theme in non-sapient information processing is "big data". Where algorithms contrast with sapience in logical strategy, big data contrasts in sheer volume of raw data.
</p>
<p>
These two dimensions — logical strategy and data scale — are evidently related. Algorithms can be applied directly to arbitrarily-large-scale data; sapience cannot, which is why big data is the province of non-sapient technology. I suggested in an earlier post that the device of sapience only works at a certain range of scales, and that the sizes of both our short- and our long-term memories may be, to some extent, essential consequences of sapience rather than accidental consequences of evolution. Not everyone tops out at the same scale of raw data, of course; some people can take in a lot more, or a lot less, than others before they need to impose some structure on it. Interestingly, this is pretty clearly <i>not</i> some sort of "magnitude" of sapience, as there have been acknowledged geniuses, of different styles, toward both ends of the spectrum; examples that come to mind are Leonhard Euler (with a spectacular memory) and Albert Einstein (notoriously absent-minded).
</p>
<p>
That we sapiences <i>can</i> "make sense" of raw data, imposing structure on it and thereby coping with masses of data far beyond our ability to handle in raw form, would seem to be part of the essence of what it means to be sapient. The attendant limitation on raw data processing would then be a technical property of the Platonic realm in broadly the same sense as fundamental constants like <i>π</i>, <i>e</i>, etc., and distant kin to such properties of the physical realm as the conditions necessary for nuclear fusion.
</p>
<p>
Sometimes, we can make sense of vast data sets, many orders of magnitude beyond our native capacity, by leveraging technological capacity to process more-or-less-arbitrarily large volumes of raw data and boil them down algorithmically, to a scale/form within our scope. It should be clear that the success of the enterprise depends on how insightfully we direct the technology in boiling down the data; essentially, we have to intuit what sorts of analysis will give us the right sorts of information to gain insight into the salient features of the data. We're then at the short end of a data-mining lever; the bigger the data mine, the trickier it is to reason out how to direct the technological part of the operation. It's also possible to deliberately choose an analysis that will give us the answer we want, rather than helping us learn about reality. And thus are born the twin phenomena of <i>misuse of statistics</i> and <i><b>ab</b>use of statistics</i>.
</p>
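<p>
As a toy illustration of how the choice of analysis steers the conclusion, here is a minimal sketch in Haskell (the data series and the window choices are invented purely for the purpose): the same series yields opposite "trends" depending on which window one chooses to summarize.
</p>
<blockquote><pre>
-- Toy sketch: the same series yields opposite "trends" depending on
-- which window the analyst chooses to summarize.  The data and the
-- windows are invented for illustration only.

slope :: [Double] -> Double   -- least-squares slope against index 0,1,2,...
slope ys = (n * sxy - sx * sy) / (n * sxx - sx * sx)
  where xs  = map fromIntegral [0 .. length ys - 1]
        n   = fromIntegral (length ys)
        sx  = sum xs
        sy  = sum ys
        sxx = sum (map (^ 2) xs)
        sxy = sum (zipWith (*) xs ys)

series :: [Double]
series = [10, 12, 14, 16, 13, 11, 9, 7]   -- rises, then falls

main :: IO ()
main = do
  print (slope series)            -- whole series: about -0.57
  print (slope (take 4 series))   -- first half only:   2.0
  print (slope (drop 4 series))   -- second half only: -2.0
</pre></blockquote>
<p>
An analyst who wants to report growth quotes the first window; one who wants decline quotes the second; both can claim, truthfully, to have "run the numbers".
</p>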
<p>
There may be a temptation to apply technology to the problem of deciding how to mine the data. That —it should be clear on reflection— is an illusion. The technology is just as devoid of sapient insight when we apply it to the meta-analysis as when we applied it to the analysis directly; and the potential for miscues is yet larger, since technology working at the meta-level is in a position to make more biasing errors through lack of judgement.
</p>
<p>
One might be tempted to think of <i>conceptualization</i>, the process by which we impose concepts on raw data to structure and thus make sense of it, as "both cause and cure" of our limited capacity to process raw data; but this would, imo, be a mistake of orientation. Conceptualization — which seems to be the basic functional manifestation of sapience — may cause the limited-capacity problem, and it may also be the "cure", i.e., the means by which we cope with the problem, but neither of those is the <i>point</i> of conceptualization/sapience. As discussed, sapience differs from non-sapient information processing in ways that don't obviously fit on any sort of spectrum. Consider: logically, our inability to directly grok big data can't be a "failure" unless one makes a <i>value judgement</i> that that particular ability is something we should be able to do — and making a value judgement is something that can only be meaningfully ascribed to a sapience.
</p>
<p>
It's also rather common to imagine the possibility of a sapience of a different order, capable of processing vast (perhaps even arbitrarily vast) quantities of data. This can result from —as noted earlier— portraying evolution as if it were a sapient process. It may result from an extrapolation based on the existence of some people with higher raw-data tolerances than others; but this treats "intelligence" as an <i>ordering</i> correlated with raw data processing capacity — which, as I've noted above, it is not. Human sapiences toward the upper end of raw data processing capacity don't appear to be "more sapient", rather it's more like they're striking a different balance of parameters. Different strengths and weaknesses occur at different mixtures of the parameters, and this seems to me characteristic of an effect (sapience) that can only occur under a limited range of conditions, with the effect breaking down in different ways depending on which boundary of the range is crossed. Alternatively, it has sometimes been suggested there should be some sort of fundamentally different kind of mind, working on different principles than our own; but once one no longer expects this supposed effect to have anything to do with sapience as it occurs in humans, I see no basis on which to conjecture the supposed effect at all.
</p>
<p>
There's also yet another opportunity here for us to talk ourselves into an inferiority complex. We tend to break down a holistic situation into components for understanding, and then when things fail we may be inclined to ascribe failure to a particular component, rather than to the way the components fit together or to the system as a whole. So when a human/technology ensemble fails, we're that much more likely to blame the human component.
</p>
<span style="font-size: large;" id="sec-sns-pro">Pro-sapient tech</span>
<p>
How can we design technology to nurture sapience rather than stifle it? Though I don't claim to grasp the full scope of this formidable challenge, I have some suggestions that should help.
</p>
<p>
On the stifling side, the two big principles I've discussed are algorithms and scale; algorithms eliminate the arbitrary flexibility that gives sapience room to function, while vast masses of data overwhelm sapiences (technology handles arbitrarily large masses of data smoothly, not trying to grok big-picture implications that presumably grow <i>at least</i> quadratically with scale). Evidently sapience needs full-spectrum access to the data (it can't react to what it doesn't know), needs to have hands-on experience from which to learn, needs to be unfettered in its flexibility to act on what it sees.
</p>
<p>
Tedium should be avoided. Aspects of this are likely well-known in some circles, perhaps know-how related to (human) assembly-line work; from my own experience, tedium can trip up sapience in a couple of ways that blur into each other. Repeating actions over and over can lead to inattention, so that when a case comes along that ought to be treated differently, the sapient operator just does the same thing yet again, either failing to notice it at all, or "catching it too late" (i.e., becoming aware of the anomaly after having already committed to processing it in the usual way). On the other hand, paying full attention to an endless series of simple cases, even if they offer variations maintaining novelty, can exhaust the sapient operator's decision-making capacity; I, for one, find that making lots of little decisions drains me for a time, as if I had a reservoir of <i>choice</i> that, when depleted, refills at a limited natural rate. (I somewhat recall a theory ascribed to Barack Obama that a person can only make one or two big decisions per day; same principle.)
</p>
<p>
Another important principle to keep in mind is that sapient minds need <i>experience</i>. Even "deep learning" AIs need training, but with sapiences the need is deeper and wider; the point is not merely to "train" them to do a particular task, important though that is, but to give them accumulated broad experience in the whole unbounded context surrounding whatever particular tasks are involved. Teaching a student to think is an educator's highest aspiration. An expert sapient practitioner of any trade uses "tricks of the trade" that may be entirely outside the box. A typical metaphor for extreme forms of such applied sapient measures is 'chewing gum and baling wire'. One of the subtle traps of over-reliance on technology is that if sapiences aren't getting plenty of broad, wide hands-on experience, when situations outside known parameters arise there will be no-one clueful to deal with them — even if the infrastructure has sufficiently broad human-accessible flexibility to provide scope for out-of-the-box sapient measures. (An old joke describes an expert being called in to fix some sort of complex system involving pipes under pressure —recently perhaps a nuclear power plant, some older versions involve a steamboat— who looks around, taps a valve somewhere, and everything starts working again; the expert charges a huge amount of money —say a million dollars, though the figure has to ratchet up over time due to inflation— and explains, when challenged on the amount, that one dollar is for tapping the valve, and the rest is for knowing where to tap.)
</p>
<p>
This presents an economic/social challenge. The need to provide humans with hands-on experience is a long-term investment in fundamental robustness. For the same reason that standardized tests ultimately cannot measure sapience, short-term performance on <i>any</i> sufficiently well-structured task can be improved by applying technology to it, which can lead to a search for ways to make tasks more well-structured — with a completely predictable loss of ability to deal with... the unpredictable. I touched on an instance of this phenomenon when describing, in an earlier post, the inherent robustness of a traffic system made up of human drivers.
</p>
<p>
Suppression of sapience also takes much more sweeping, long-term systemic forms. A particular case that made a deep impression on me: in studying the history of my home town I was fascinated that the earliest European landowners of the area received land grants from the king, several generations before Massachusetts residents rose up in rebellion against English rule (causing a considerable ruckus, which you may have heard about). Those land grants were subject to <i>proving</i> the land, which is to say, demonstrating an ability to develop it. Think about that. We criticize various parties —developers, big corporations, whatever— for exploiting the environment, but those land grants, some four hundred years ago under a different system of government, <i>required</i> exploiting the land, otherwise the land would be taken away and given to someone else. Just how profoundly is that exploitation woven into the fabric of Western civilization? It appears to be quite beyond distinctions like monarchy versus democracy, capitalism versus socialism. We've got hold of the tail of a vast beast that hasn't even turned 'round to where we can see the thing as a whole; it's far, far beyond anything I can tackle in this post, except to note pointedly that we must be aware of it, and be thinking about it.
</p>
<p>
A much simpler, but also pernicious, source of long-term systemic bias is planning to add support for creativity "later". Criticism of this practice could dwell on quite reasonable tactical concerns, like whether anyone will really ever get around to attempting the addition, and whether a successful addition would fail to take hold because it would come too late to overcome previously established patterns of behavior; the key criticism I recommend, though, is that strategically, creativity is itself systemic and needs to be inherent in the design from the start. Anything tacked on as an afterthought would be necessarily inferior.
</p>
<p>
To give proper scope for sapience, its input — the information presented to the sapient operator in a technological interface — should be high-bandwidth from an unbounded well of ordered complexity. There has to be underlying rhyme-and-reason to what is presented, otherwise information overload is likely, but it mustn't be stoppered down to the sort of simple order that lends itself to formal, aka technological, treatment, which would defeat the purpose of bringing a sapience to bear on it. Take English text as archetypical: built up mostly from 26 letters and a few punctuation marks and whitespace, yet as one scales up, any formal/technological grasp on its complexity starts to fuzz until ultimately it gets entirely outside what a non-sapience can handle. Technology sinks in the swamp of natural language, while to a sapience natural language comes... well, <i>naturally</i>. This sort of emergent formal intractability seems a characteristic domain of sapience. There is apparently some range of variation in the sorts of rhyme and reason involved; for my part, I favor a clean simple set of orthogonal primitives, while another sort of mind favors a less tidy primitive set (more-or-less the design difference between Scheme and Common Lisp).
</p>
<p>
When filtering input to avoid simply overwhelming the sapient user, whitelisting is inherently more dangerous than blacklisting. That is, an automatic filter to <i>admit</i> information makes an algorithmic judgement about what may be important, which judgement is properly the purview of sapience, to assess unbounded context; whereas a filter to omit completely predictable information, though it certainly can go wrong, has a better chance of working since it isn't trying to make a call about which information is extraneous, only about which information is completely predictable (if properly designed; <i>censorship</i> being one of the ways for it to go horribly wrong).
</p>
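<p>
To make the asymmetry concrete, here is a minimal sketch in Haskell of the two filter styles applied to a stream of status messages (the messages and patterns are invented for illustration):
</p>
<blockquote><pre>
import Data.List (isInfixOf)

-- Hypothetical patterns, invented for illustration.
boring, important :: [String]
boring    = ["heartbeat OK", "cache refreshed"]   -- known to be predictable
important = ["ERROR", "disk"]                     -- guessed, in advance, to matter

-- Blacklist: drop only what we positively know is predictable noise.
-- Anything unanticipated still reaches the sapient operator.
blacklist :: [String] -> [String]
blacklist = filter (\msg -> not (any (`isInfixOf` msg) boring))

-- Whitelist: admit only what the algorithm guessed would matter.
-- Anything unanticipated is silently discarded.
whitelist :: [String] -> [String]
whitelist = filter (\msg -> any (`isInfixOf` msg) important)

main :: IO ()
main = do
  let msgs = ["heartbeat OK", "ERROR: disk full", "valve pressure anomalous"]
  print (blacklist msgs)   -- keeps the error AND the anomaly
  print (whitelist msgs)   -- keeps the error; the anomaly is silently lost
</pre></blockquote>
<p>
The whitelist works only as well as its author's advance guesses about what matters; the blacklist fails only where something believed predictable wasn't.
</p>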
<p>
On the output side —i.e., what the sapient operator is empowered to do— a key aspect is effective ability to step outside the framework. Sets of discrete top-level choices are likely to stifle sapient creativity rather than enhance it (not to be confused with a set of <i>building blocks</i>, which would include the aforementioned letters-plus-punctuation). While there is obvious advantage in facilities to support common types of actions, those facilities need to blend smoothly with robust handling of general cases, to produce graceful degradation when stepping off the beaten path. Handling some approaches more easily than others might readily turn into systemic bias against the others — a highly context-dependent pitfall, on which the reason for less-supported behavior seems to be the pivotal factor. (Consider the role of motive-for-deviation in the subjective balance between pestering the operator about an unconventional choice until they give it up, versus allowing one anomaly to needlessly propagate unchecked complications.)
</p>
<span style="font-size: large;" id="sec-sns-soc">Storytelling and social upheaval</span>
<p>
A final thought, grounding this view of individual sapiences back into global systemic threats (where I started, at the top of the post).
</p>
<p>
Have you noticed it's really hard to adapt a really good book into a really good movie? So it seems to me. When top-flight literature translates successfully to a top-flight movie, the literature is more likely to have been a short story. A whole book is more likely to translate into a miniseries, or a set of movies. I was particularly interested by the Harry Potter movies, which I found suffered from their attempt to fit far too much into each single movie; the Harry Potter books were mostly quite long, and were notable for their rich detail, and that couldn't possibly be captured by one movie per book without reducing the richness to something telegraphic. The books were classics, for the ages; the movies weren't actually bad, but they weren't in the same rarefied league as the books. (I've wondered if one could turn the Harry Potter book set into a television series, with one season per book.)
</p>
<p>
The trouble in converting literature to cinematography is bandwidth. From a technical standpoint this is counter-intuitive: text takes vastly less digital storage than video; but how much of that data can be used as effective signal depends on what kind of signal is intended. I maintain that as a storytelling medium, text is extremely high-bandwidth while video is a severe bottleneck, stunningly inefficient at getting the relevant ideas across if, indeed, they can be expressed at all. In essence, I suggest, storytelling is what language has evolved <i>for</i>. A picture may be worth a thousand words, but (a) it depends on which words and which picture, (b) it's apparently <a href="https://www.cl.cam.ac.uk/~afb21/publications/Student-ESP.html">more like 84 words</a>, and (c) it doesn't follow that a thousand pictures are worth a thousand times as many words.
</p>
<p>
In a post here some time back, I theorized that human language has evolved in three major stages (<a href="https://fexpr.blogspot.com/2011/12/preface-to-homer.html">post</a>). The current stage in the developed world is <i>literacy</i>, in which society embraces written language as a foundation for acquiring knowledge. The preceding stage was <i>orality</i>, where oral sagas are the foundation for acquiring knowledge, according to the theory propounded by Eric Havelock in his magnum opus <i><a href="https://en.wikipedia.org/wiki/Eric_A._Havelock#Preface_to_Plato">Preface to Plato</a></i>, where he proposes that Plato lived on the cusp of the transition of ancient Greek society from orality to literacy. My extrapolation from Havelock's theory says that before the orality stage of language was another stage I've called <i>verbality</i>, which I speculate may have more-or-less resembled the peculiar Amazonian language Pirahã (documented by Daniel Everett in <i><a href="https://en.wikipedia.org/wiki/Daniel_Everett#Don't_Sleep,_There_Are_Snakes:_Life_and_Language_in_the_Amazonian_Jungle">Don't Sleep, There Are Snakes</a></i>). Pirahã has a variety of strange features, but what particularly attracted my attention was that, adding up these features, Pirahã apparently does not and cannot support an oral culture; Pirahã culture has no history, art, or storytelling (does not), and the language has no temporal vocabulary, tense, or number system (cannot).
</p>
<p>
'No storytelling' is where this relates back to books-versus-movies. The nature of the transition from verbality to orality is unclear to me; but I (now) conjecture that once the transition to orality occurs, there would then necessarily be a long period of linguistic evolution during which society would slowly figure out how to tell stories. At some point in this development, writing would arise and after a while precipitate the transition to literacy. But the written form of language, in order to support the transition to literate society, would particularly have to be ideally suited to storytelling.
</p>
<p>
Soon after the inception of email as a communication medium came the development of <a href="https://en.wikipedia.org/wiki/Emoticon">emoticons</a>: symbols absent from traditional written storytelling but evidently needed to fill in for the contextual "body language" clues ordinarily available in face-to-face social interaction. This demonstrates that social interaction itself is not storytelling as such; for storytelling, written language was already well suited without emoticons. One might conjecture that video, while lower-storytelling-bandwidth than text, could have higher effective social-interaction-bandwidth than text. And on the other side of the equation, emoticons also demonstrate that the new electronic medium was already being used for non-storytelling social interaction.
</p>
<p>
For another glimpse into the character of the electronic medium, contrast the experience of browsing <a href="https://en.wikibooks.org">Wikibooks</a> — an online library of some thousands of open-access textbooks — against the pre-Internet experience of browsing in an academic library.
</p>
<p>
On Wikibooks, perhaps you enter through the main page, which offers you a search box and links to some top-level subject pages like Computing, Engineering, Humanities, and such. Each of those top-level subject pages provides an array of subsections, and each subsection will list all its own books as well as listing its own sub-subsections, and so on. The ubiquitous search box will do a string search, listing first pages that mention your chosen search terms in the page title, then pages that contain the terms somewhere in the content of the page. Look at a particular page of a book, and you'll see the text, perhaps navigation links such as next/previous page, parent page, subpages; there might be a navigation box on the right side of the page that shows the top-level table of contents of the book.
</p>
<p>
At the pre-Internet library, typically, you enter past the circulation desk, where a librarian is seated. Past that, you come to the card catalog; hundreds of alphabetically labeled deep drawers of three-by-five index cards, each card cumulatively customized by successive librarians over decades, perhaps over more than a century if this is a long-established library. (Side insight, btw: that card catalog is, in its essence, a collaborative hypertext document very like a wiki.) You may spend some time browsing through the catalog, flipping through the cards in various drawers, jotting down notes and using them to move from one drawer to another — a slower process than if you could move instantly from one to another by clicking an electronic link, but also a qualitatively richer experience. At every moment, surrounding context bears on your awareness; other index cards near the one you're looking at, other drawers; and beyond that, strange though it now seems that this is worth saying, <i>you are in a room</i>, literally immersed in context. Furniture, lights, perhaps a cork bulletin board with some notices on it; posters, signs, or notices on the walls, sometimes even thematic displays; miscellany (is that a potted plant over there?); likely some other people, quietly going about their own business. The librarian you passed at the desk probably had some of their own stuff there, may have been reading a book. Context. Having taken notes on what you found in the card catalog and formulated a plan, you move on to the <a href="https://en.wikipedia.org/wiki/Library_stack">stacks</a>; long rows of closely spaced bookcases, carefully labeled according to some indexing system referenced by the cards and jotted down in your notes, with perhaps additional notices on some of the cases — you're in another room — you come to the shelves, and may well browse through other books <i>near</i> what your notes direct you to, which you can hardly help noticing (not like an electronic system where you generally have to go out of your way to conjure up whatever context the system may be able to provide). You select the particular book you want, and perhaps take it to a reading desk (or just plunk down on the carpet right there, or a nearby footstool, to read); and as you're looking at a physical book, you may well <i>flip through the pages</i> as you go, yet another inherently context-intensive browsing technique made possible by the physicality of the situation.
</p>
<p>
What makes this whole pre-Internet experience profoundly different from Wikibooks — and I say this as a great enthusiast of Wikibooks — is the rich, deep, pervasive context. And <i>context</i> is where this dovetails back into the main theme of this post, recognizing context as the special province of sapience.
</p>
<p>
When the thriving memetic ecosystem of oral culture was introduced to the medium of written language, it did profoundly change things, producing literate culture, and new taxonomic classes of memetic organisms that could not have thrived in oral society (I'm thinking especially of scientific organisms); but despite these profound changes, the medium still thoroughly supported language, and context-intensive social interactions mostly remained in the realm of face-to-face encounters. So the memetic ecosystem continued to thrive.
</p>
<p>
<i>Memetic ecosystem</i> is where all of this links back to the earlier discussion of populations of sapiences.
</p>
<p>
That discussion noted that system self-direction through a population of sapiences can break down if the system is thrown out of balance. And while the memetic ecosystem handily survived the transition to literacy, it's an open question what will happen with the transition to the Internet medium. This time, the new medium is highly context-resistant while it aggressively pulls in social interactions. With sapience centering on context aspects that are <i>by default</i> eliminated or drastically transformed in the transition, it seems the transition must have, somehow, an extreme impact on the way sapient minds develop. If there is indeed a healthy, stable form of society to be achieved on the far side of this transition, I don't think we should kid ourselves that we know what that will look like, but it's likely to be very different, in some way or other, from the sort of stable society that preceded it.
</p>
<p>
The obvious forecast is social upheaval. The new system doesn't know how to put itself together, or really even know for sure whether it <i>can</i>. The old system is pretty sure to push back. As I write this, I look at the political chaos in the United States —and elsewhere— and I see these forces at work.
</p>
<p>
And I think of the word <i><a href="https://en.wikipedia.org/wiki/Singularity_(system_theory)">singularity</a></i>.
</p>
<span style="font-size: large;">Co-hygiene and quantum gravity</span>
<blockquote>
[<i>l'Universo</i>]<i> è scritto in lingua matematica</i><br>
([The Universe] is written in the language of mathematics)
<blockquote>
— Galileo Galilei, <i><a href="https://en.wikiquote.org/wiki/Galileo_Galilei#Il_Saggiatore_.281623.29">Il Saggiatore</a></i> (The Assayer), 1623.
</blockquote>
</blockquote>
<p>
Here's another installment in my ongoing exploration of exotic ways to structure a theory of basic physics. In our last exciting episode, I backtraced a baffling structural similarity between term-rewriting calculi and basic physics to a term-rewriting property I dubbed <a href="https://fexpr.blogspot.com/2016/06/the-co-hygiene-principle.html">co-hygiene</a>. This time, I'll consider what this particular vein of theory would imply about the big-picture structure of a theory of physics. For starters, I'll suggest it would imply, if fruitful, that quantum gravity is likely to be ultimately <i>un</i>fruitful and, moreover, quantum mechanics ought to be <i>less foundational</i> than it has been taken to be. The post continues on from there much further than, candidly, I had expected it to; by the end of this installment my immediate focus will be distinctly shifting toward relativity.
</p>
<p>
To be perfectly clear: I am <i>not</i> suggesting anyone should stop pursuing quantum gravity, nor anything else for that matter. I want to expand the range of theories explored, not contract it. I broadly diagnose basic physics as having fallen into a fundamental rut of thinking, that is, assuming something deeply structural about the subject that ought not to be assumed; and since my indirect evidence for this diagnosis doesn't tell me <i>what</i> that deep structural assumption is, I want to devise a range of mind-bendingly different ways to structure theories of physics, to reduce the likelihood that any structural choice would be made through mere failure to imagine an alternative.
</p>
<p>
The structural similarity I've been pursuing analogizes between, on one side, the contrast of pure function-application with side-effect-ful operations in term-rewriting calculi; and on the other side, the contrast of gravity with the other fundamental forces in physics. Gravity corresponds to pure function-application, and the other fundamental forces correspond to side-effects. In the earlier co-hygiene post I considered what this analogy might imply about nondeterminism in physics, and I'd thought my next post in the series would be about whether or not it's even mathematically <i>possible</i> to derive the quantum variety of nondeterminism from the sort of physical structure indicated. Just lately<!-- May 10, 2017 -->, though, I've realized there may be more to draw from the analogy by considering first what it implies about non-locality, folding in nondeterminism later. Starting with the observation that if quantum non-locality ("spooky action at a distance") is part of the analog to side-effects, then gravity should be outside the entanglement framework, implying both that <i>quantum gravity</i> would be a non-starter, and that quantum mechanics, which is routinely interpreted to act directly from the foundation of reality by shaping the spectrum of alternative versions of the entire universe, would have to be happening at a less fundamental level than the one where gravity differs from the other forces.
</p>
<p>
On my way to new material here, I'll start with material mostly revisited from the earlier post, where it was mixed in with a great deal of other material; here it will be more concentrated, with a different emphasis and perhaps some extra elements leading to additional inferences. As for the earlier material that isn't revisited here — I'm very glad it's there. This is, deliberately, paradigm-bending stuff, where different parts don't belong to the same conceptual framework and can't easily be held in the mind all at once; so if I hadn't written down all that intermediate thinking at the time, with its nuances and tangents, I don't think I could recapture it all later. I'll continue here my policy of capturing the journey, with its intermediate thoughts and their nuances and tangents.
</p>
<p>
Until I started describing λ-calculus here in earnest, it hadn't registered on me that it would be a major section of the post. Turns out, though, my perception of λ-calculus has been profoundly transformed by the infusion of perspective from physics; so I found myself going back to revisit basic principles that I would have skipped lightly over twenty years ago, and perhaps even two years ago. It remains to be seen whether developments later in this post will sufficiently alter my perspective to provoke yet another recasting of λ-calculus in some future post.
</p>
<blockquote>
<b>Contents</b><br>
<a href="#sec-csq-side">Side-effects</a><br>
<a href="#sec-csq-var">Variables</a><br>
<a href="#sec-csq-sev">Side-effect-ful variables</a><br>
<a href="#sec-csq-scope">Quantum scope</a><br>
<a href="#sec-csq-geonet">Geometry and network</a><br>
<a href="#sec-csq-cosmic">Cosmic structure</a><!-- br -->
</blockquote>
<span style="font-size: large;" id="sec-csq-side">Side-effects</span>
<p>
There were three main notions of computability in the 1930s, proved equi-powerful by the equivalence theorems that underpin the Church-Turing thesis: general recursive functions, λ-calculus, and Turing machines (due respectively to <a href="https://en.wikipedia.org/wiki/Jacques_Herbrand">Jacques Herbrand</a> and <a href="https://en.wikipedia.org/wiki/Kurt_Godel">Kurt Gödel</a>, to <a href="https://en.wikipedia.org/wiki/Alonzo_Church">Alonzo Church</a>, and to <a href="https://en.wikipedia.org/wiki/Alan_Turing">Alan Turing</a>). General recursive functions are broadly equational in style, λ-calculus is stylistically more applicative; both are purely functional. Turing machines, on the other hand, are explicitly imperative. Gödel apparently lacked confidence in the purely functional approaches as notions of <i>mechanical calculability</i>, though Church was more confident, until the purely functional approaches were proven equivalent to Turing machines; which to me makes sense as a matter of concreteness. (There's some discussion of the history in a paper by Solomon Feferman; <a href="https://math.stanford.edu/~feferman/papers/turingnotices.pdf">pdf</a>.)
</p>
<p>
This mismatch between abstract elegance and concrete straightforwardness was an early obstacle, in the 1960s, to applying λ-calculus to programming-language semantics. Gordon Plotkin found a schematic solution strategy for the mismatch in his 1975 paper "Call-by-name, call-by-value and the λ-calculus" (<a href="https://homepages.inf.ed.ac.uk/gdp/publications/cbn_cbv_lambda.pdf">pdf</a>); one sets up two formal systems, one a calculus with abstract elegance akin to λ-calculus, the other an operational semantics with concrete clarity akin to Turing machines, then proves well-behavedness theorems for the calculus and correspondence theorems between the calculus and operational semantics. The well-behavedness of the calculus allows us to reason conveniently about program behavior, while the concreteness of the operational semantics allows us to be certain we are really reasoning about what we intend to. For the whole arrangement to work, we need to find a calculus that is fully well-behaved while matching the behavior of the operational semantics we want so that the correspondence theorems can be established.
</p>
<p>
Plotkin's 1975 paper modified λ-calculus to match the behavior of eager argument evaluation; he devised a call-by-value λ<sub>v</sub>-calculus, with all the requisite theorems. The behavior was, however, still purely functional, i.e., without side-effects. Traditional mathematics doesn't incorporate side-effects. There was (if you think about it) no <i>need</i> for traditional mathematics to explicitly incorporate side-effects, because the practice of traditional mathematics was already awash in side-effects. Mutable state: mathematicians wrote down what they were doing; and they changed their own mental state and each other's. Non-local control-flow (aka "goto"s): mathematicians made intuitive leaps, and the measure of proof was understandability by other sapient mathematicians rather than conformance to some purely hierarchical ordering. The formulae themselves didn't contain side-effects because they didn't have to. Computer programs, though, have to explicitly encompass all these contextual factors that the mathematician implicitly provided to traditional mathematics. Programs are usually side-effect-ful.
</p>
<p>
In the 1980s Matthias Felleisen devised λ-like calculi to capture side-effect-ful behavior. At the time, though, he didn't <i>quite</i> manage the entire suite of theorems that Plotkin's paradigm had called for. Somewhere, something had to be compromised. In the first published form of Felleisen's calculi, he slightly weakened the well-behavedness theorems for the calculus. In another published variant he achieved full elegance for the calculus but slightly weakened the correspondence theorems between the calculus and the operational semantics. In yet another published variant he slightly modified the behavior — in operational semantics as well as calculus — to something he was able to reconcile without compromising the strength of the various theorems. This, then, is where I came into the picture: given Felleisen's solution and a fresh perspective (each generation knows a little less about <i>what can't be done</i> than the generation before), I thought I saw a way to capture the unmodified side-effect-ful behavior without weakening any of the theorems. Eventually I seized an opportunity to explore the insight, when I was writing my <a href="https://www.wpi.edu/Pubs/ETD/Available/etd-090110-124904/">dissertation</a> on a nearby topic. To explain where my approach fits in, I need to go back and pick up another thread: the treatment of <i>variables</i> in λ-calculus.
</p>
<span style="font-size: large;" id="sec-csq-var">Variables</span>
<p>
Alonzo Church also apparently seized an opportunity to explore an insight when doing research on a nearby topic. The main line of his research was to see if one could banish the paradoxes of classical logic by developing a formal logic that weakens <i><a href="https://en.wikipedia.org/wiki/Reductio_ad_absurdum">reductio ad absurdum</a></i> — instead of eliminating the <a href="https://en.wikipedia.org/wiki/Law_of_excluded_middle">law of the excluded middle</a>, which was a favored approach to the problem. But when he published the logic, in 1932, he mentioned <i>reductio ad absurdum</i> in the first paragraph and then spent the next several paragraphs ranting about the evils of unbound variables. One gathers he wanted everything to be perfectly clear, and unbound variables offended his sense of philosophical precision. His logic had just one possible semantics for a variable, namely, a parameter to be supplied to a function; he avoided the need for any alternative notions of universally or existentially quantified variables, by the (imho quite lovely) device of using higher-order functions for quantification. That is (since I've brought it up), existential quantifier Σ applied to function F would produce a proposition ΣF meaning that there is some true proposition FX, and universal quantifier Π applied to F, proposition ΠF meaning that every proposition FX is true. In essence, he showed that these quantifiers are orthogonal to variable-binding; leaving him with only a single variable-binding device, which, for some reason lost to history, he called "λ".
</p>
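<p>
In Church's logic, Σ and Π are primitive constants; but the flavor of the device, quantifiers as ordinary higher-order functions with λ the sole binder, can be illustrated over a finite domain. A minimal sketch in Haskell, assuming a finite domain (my simplification, no part of Church's system):
</p>
<blockquote><pre>
-- Finite-domain sketch of quantifiers as higher-order functions.
-- The only binding construct in sight is λ (Haskell's \); the
-- quantifiers themselves bind nothing.  (piQ avoids Prelude's pi.)
sigma, piQ :: [a] -> (a -> Bool) -> Bool
sigma dom f = any f dom   -- ΣF: some proposition FX is true
piQ   dom f = all f dom   -- ΠF: every proposition FX is true

main :: IO ()
main = do
  let dom = [0 .. 9] :: [Int]
  print (sigma dom (\x -> x * x > 50))   -- True (witness X = 8)
  print (piQ   dom (\x -> x >= 0))       -- True
</pre></blockquote>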
<p>
λ-calculus is formally a term-rewriting calculus; a set of terms together with a set of rules for rewriting a term to produce another term. The two basic well-behavedness properties that a term-rewriting calculus generally ought to have are <i>compatibility</i> and <i>Church-Rosser-ness</i>. Compatibility says that if a term can be rewritten when it's a standalone term, it can also be rewritten when it's a subterm of a larger term. Church-Rosser-ness says that if a term can be rewritten in two different ways, then the difference between the two results can always be eliminated by some further rewriting. Church-Rosser-ness is another way of saying that rewriting can be thought of as a directed process toward an answer, which is characteristic of calculi. Philosophically, one might be tempted to ask why the various paths of rewriting ought to reconverge later, but this follows from thinking of the terms as the underlying reality. If the terms merely <i>describe</i> the reality, and the rewriting lets us reason about its development, then the term syntax is just a way for us to separately describe different parts of the reality, and compatibility and Church-Rosser-ness are just statements about our ability (via this system) to reason separately about different aspects of the development at different parts of the reality without distorting our eventual conclusion about where the whole development is going. From that perspective, Church-Rosser-ness is about separability, and convergence is just the form in which the separability appears in the calculus.
</p>
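<p>
Stated formally, in standard rewriting terminology (with C[ ] an arbitrary term context, and →* the reflexive-transitive closure of the rewriting relation →):
</p>
<blockquote>
compatibility: if T<sub>1</sub> → T<sub>2</sub>, then C[T<sub>1</sub>] → C[T<sub>2</sub>] for every context C[ ];<br>
Church-Rosser-ness: if T →* T<sub>1</sub> and T →* T<sub>2</sub>, then there is some T<sub>3</sub> with T<sub>1</sub> →* T<sub>3</sub> and T<sub>2</sub> →* T<sub>3</sub>.
</blockquote>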
<p>
The syntax of λ-calculus — which particularly clearly illustrates these principles — is
<blockquote>
T ::= x | (TT) | (λx.T)  .
</blockquote>
That is, a term is either a variable; or a combination, specifying that a function is applied to an operand; or a λ-expression, defining a function of one parameter. The T in (λx.T) is the <i>body</i> of the function, x its parameter, and free occurrences of x in T are bound by this λ. An occurrence of x in T is free if it doesn't occur inside a context (λx.[ ]) within T. This connection between a λ and the variable instances it binds is structural. Here, for example, is a term involving variables x, y, and z, annotated with pointers to a particular binding λ and its variable instances:
<blockquote>
<pre>
((λx.((λy.((λx.(xz))(xy)))(xz)))(xy))
  ^^                 ^     ^
</pre>
</blockquote>
The x instance in the trailing (xy) is not bound by this λ since it is outside the binding expression. The x instance in the innermost (xz) is not bound since it is captured by another λ inside the body of the one we're considering. I suggest that the three marked elements — binder and two bound instances — should be thought of <i>together</i> as the syntactic representation of a deeper, distributed entity that connects distant elements of the term.
</p>
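<p>
For concreteness, here is a minimal rendering of this syntax as a data structure, together with the free-variable computation just described (a sketch in Haskell; the names are my own):
</p>
<blockquote><pre>
import Data.Set (Set)
import qualified Data.Set as Set

-- Minimal sketch of the syntax  T ::= x | (TT) | (λx.T).
data Term = Var String          -- x
          | App Term Term       -- (TT)
          | Lam String Term     -- (λx.T)
  deriving Show

-- Free variables: an occurrence of x is free unless some enclosing
-- (λx.[ ]) captures it, hence the deletion at each Lam.
freeVars :: Term -> Set String
freeVars (Var x)     = Set.singleton x
freeVars (App t1 t2) = freeVars t1 `Set.union` freeVars t2
freeVars (Lam x t)   = Set.delete x (freeVars t)
</pre></blockquote>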
<p>
There is just one rewriting rule — one of the fascinations of this calculus, that just one rule suffices for all computation — called the <i>β</i>-rule:
<blockquote>
((λx.T<sub>1</sub>)T<sub>2</sub>) → T<sub>1</sub>[x ← T<sub>2</sub>] .
</blockquote>
The left-hand side of this rule is the <i>redex pattern</i> (<i>redex</i> short for <i>reducible expression</i>); it specifies a local pattern in the syntax tree of the term. Here the redex pattern is that some particular parent node in the syntax tree is a combination whose left-hand child is a λ-expression. Remember, this rewriting relation is compatible, so the parent node doesn't have to be the root of the entire tree. It's important that this local pattern in the syntax tree includes a variable binder λ, thus engaging not only a local region of the syntax tree, but also a specific distributed structure in the network of non-local connections across the tree. Following my earlier post, I'll call the syntax tree the "geometry" of the term, and the totality of the non-local connections its "network topology".
</p>
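<p>
For a concrete instance of the rule (my example), and of compatibility allowing it to fire below the root of the term:
</p>
<blockquote>
((λx.(xx))y) → (yy)<br>
(z((λx.(xx))y)) → (z(yy)) .
</blockquote>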
<p>
The right-hand side of the rule specifies replacement by <i>substituting</i> the operand T<sub>2</sub> for the parameter x everywhere it occurs free in the body T<sub>1</sub>; but there's a catch. One might, naively, imagine that this would be recursively defined as
<blockquote>
x[x ← T] = T<br>
x<sub>1</sub>[x<sub>2</sub> ← T] = x<sub>1</sub> if x<sub>1</sub> isn't x<sub>2</sub><br>
<br>
(T<sub>1</sub> T<sub>2</sub>)[x ← T] = (T<sub>1</sub>[x ← T] T<sub>2</sub>[x ← T])<br>
<br>
(λx.T<sub>1</sub>)[x ← T<sub>2</sub>] = (λx.T<sub>1</sub>)<br>
(λx<sub>1</sub>.T<sub>1</sub>)[x<sub>2</sub> ← T<sub>2</sub>] = (λx<sub>1</sub>.T<sub>1</sub>[x<sub>2</sub> ← T<sub>2</sub>]) if x<sub>1</sub> isn't x<sub>2</sub>.
</blockquote>
This definition just descends the syntax tree substituting for the variable, and stops if it hits a λ that binds the same variable; very straightforward, and only a little tedious. Except that it doesn't work. Most of it does; but there's a subtle error in the rule for descending through a λ that binds a different variable,
<blockquote>
(λx<sub>1</sub>.T<sub>1</sub>)[x<sub>2</sub> ← T<sub>2</sub>] = (λx<sub>1</sub>.T<sub>1</sub>[x<sub>2</sub> ← T<sub>2</sub>]) if x<sub>1</sub> isn't x<sub>2</sub>.
</blockquote>
The trouble is, what if T<sub>1</sub> contains a free occurrence of x<sub>2</sub> and, at the same time, T<sub>2</sub> contains a free instance of x<sub>1</sub>? Then, <i>before</i> the substitution, that free instance of x<sub>1</sub> was part of some larger distributed structure; that is, it was bound by some λ further up in the syntax tree; but <i>after</i> the substitution, following this naive definition of substitution, a copy of T<sub>2</sub> is embedded within T<sub>1</sub> with an instance of x<sub>1</sub> that has been cut off from the larger distributed structure and instead bound by the inner λx<sub>1</sub>, essentially altering the sense of syntactic template T<sub>2</sub>. The inner λx<sub>1</sub> is then said to <i>capture</i> the free x<sub>1</sub> in T<sub>2</sub>, and the resulting loss of integrity of the meaning of T<sub>2</sub> is called <i>bad hygiene</i> (or, a <i>hygiene violation</i>). For example,
<blockquote>
((λy.(λx.y))x) ⇒<sub><i>β</i></sub> (λx.y)[y ← x]
</blockquote>
but under the naive definition of substitution, this would be (λx.x), because of the coincidence that the x we're substituting for y happens to have the same name as the bound variable of this inner λ. If the inner variable had been named anything else (other than y) there would have been no problem. The "right" answer here is a term of the form (λz.x), where any variable name could be used instead of z as long as it isn't "x" or "y". The standard solution is to introduce a rule for renaming bound variables (called <i>α-renaming</i>), and restrict the substitution rule to require that hygiene be arranged beforehand. That is,
<blockquote>
(λx<sub>1</sub>.T) → (λx<sub>2</sub>.T[x<sub>1</sub> ← x<sub>2</sub>]) where x<sub>2</sub> doesn't occur free in T<br>
<br>
(λx<sub>1</sub>.T<sub>1</sub>)[x<sub>2</sub> ← T<sub>2</sub>] = (λx<sub>1</sub>.T<sub>1</sub>[x<sub>2</sub> ← T<sub>2</sub>]) if x<sub>1</sub> isn't x<sub>2</sub> and doesn't occur free in T<sub>2</sub>.
</blockquote>
Here again, this may be puzzling if one thinks of the syntax as the underlying reality. If the distributed structures of the network topology are the reality, which the syntax merely describes, then α-renaming is merely an artifact of the means of description; indeed, the variable-names themselves are merely an artifact of the means of description.
</p>
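<p>
Continuing the Haskell sketch from above (same Term type and imports), here is how the repaired definition might look in code, with α-renaming invoked exactly where the side condition demands it; the fresh-name generator is my own simplistic choice, no part of the calculus:
</p>
<blockquote><pre>
-- Capture-avoiding substitution  T1[x ← T2],  continuing the Term
-- sketch above.  On descending through (λy.body) with y free in t2,
-- α-rename y first, exactly as the side condition demands.
subst :: String -> Term -> Term -> Term
subst x t2 (Var y)
  | y == x    = t2
  | otherwise = Var y
subst x t2 (App u v) = App (subst x t2 u) (subst x t2 v)
subst x t2 (Lam y body)
  | y == x                     = Lam y body                -- x rebound: stop
  | y `Set.member` freeVars t2 = Lam y' (subst x t2 body')
  | otherwise                  = Lam y (subst x t2 body)
  where y'    = fresh (Set.insert x (freeVars t2 `Set.union` freeVars body)) y
        body' = subst y (Var y') body                      -- α-rename y to y'

-- Simplistic fresh-name generator (my own choice): prime the name
-- until it avoids the given set.
fresh :: Set String -> String -> String
fresh used y = head (filter (`Set.notMember` used) (map prime [1 ..]))
  where prime n = y ++ replicate n '\''

-- The β-rule itself:  ((λx.T1)T2) → T1[x ← T2].
betaStep :: Term -> Maybe Term
betaStep (App (Lam x t1) t2) = Just (subst x t2 t1)
betaStep _                   = Nothing
</pre></blockquote>
<p>
Applied to the example above, betaStep on ((λy.(λx.y))x) renames the inner binder and yields (λx'.x), a term of the promised form (λz.x).
</p>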
<span style="font-size: large;" id="sec-csq-sev">Side-effect-ful variables</span>
<p>
Suppose we want to capture classical side-effect-ful behavior, unmodified, <i>without</i> weakening any of the theorems of Plotkin's paradigm. Side-effects are by nature distributed across the term, and would therefore seem to belong naturally to its network topology. In Felleisen's basic calculus, retaining the classical behavior and requiring the full correspondence theorems, side-effect-ful operations create syntactic markers that then "bubble up" through the syntax tree till they reach the top of the term, from which the global consequence of the side-effect is enacted by a whole-term-rewriting rule — thus violating compatibility, since the culminating rule is by nature applied to the whole term rather than to a subterm. This strategy seems, in retrospect, to be somewhat limited by an (understandable) inclination to conform to the style of variable handling in λ-calculus, whose sole binding device is tied to function application at a specific location in the geometry. Alternatively (as I seized the opportunity to explore in my dissertation), one can avoid the non-compatible whole-term rules by making the syntactic marker, which bubbles up through the term, a variable-binder. These side-effect-ful bindings are no longer strongly tied to a particular location in the geometry; they float, potentially to the top of the term, or may linger further down in the tree if the side-effect happens to only affect a limited region of the geometry. But the full classical behavior (in the cases Felleisen addressed) is captured, and Plotkin's entire suite of theorems are supported.
</p>
<p>
The calculus in which I implemented this side-effect strategy (along with some other things that were the actual point of the dissertation but don't appear to matter here) is called vau-calculus.
</p>
<p>
Recall that the <i>β</i>-rule of λ-calculus applies to a redex pattern at a specific location in the geometry, and requires a binder to occur there so that it can also tie in to a specific element of the network topology. The same is true of the side-effect-ful rules of the calculus I constructed: a redex pattern occurs at a specific location in the geometry with a local tie-in to the network topology. There may then be a substitutive operation on the right-hand side of the rule, which uses the associated element of the network topology to propagate side-effect-ful consequences back down the syntax tree to the entire encompassed subterm. There is a qualitative difference, though, between the traditional substitution of the <i>β</i>-rule and the substitutions of the side-effect-ful operations. A traditional substitution T<sub>1</sub>[x ← T<sub>2</sub>] may attach new T<sub>2</sub> subtrees at certain leaves of the T<sub>1</sub> syntax tree (free instances of x in T<sub>1</sub>), but does not disturb any of the pre-existing tree structure of T<sub>1</sub>. Consequently, the only effect of the <i>β</i>-rule on the pre-existing geometry is the rearrangement it does within the redex pattern. This is symmetric to the hygiene property, which assures (by active intervention if necessary, via α-renaming) that the only effect of the <i>β</i>-rule on the pre-existing network topology is what it does to the variable element whose binding is within the redex pattern. I've therefore called the geometry non-disturbance property <i>co-hygiene</i>. As long as <i>β</i>-substitution is the only variable substitution used, co-hygiene is an easily overlooked property of the <i>β</i>-rule since, unlike hygiene, it does not require any active intervention to maintain.
</p>
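<p>
Here is a toy check of the co-hygiene property (my own construction, not anything from the dissertation): if substitution is written functionally with structure sharing, a β-style substitution T<sub>1</sub>[x ← T<sub>2</sub>] leaves every pre-existing subtree of T<sub>1</sub> intact as the identical object except along the paths down to the replaced leaves; it never rearranges old structure:
</p>
<pre>
# A toy check of the co-hygiene property described above (my own
# construction).  Substitution with structure sharing returns the very
# same objects for subtrees containing no free x; the only new nodes lie
# on the paths to the replaced leaves.  Terms are nested tuples as in the
# previous sketch; for simplicity this version assumes no capture can
# occur (hygiene already arranged).

def subst_sharing(term, x, repl):
    """term[x ← repl] with maximal sharing of unchanged subtrees."""
    if isinstance(term, str):
        return repl if term == x else term
    tag = term[0]
    if tag == "lam":
        _, y, body = term
        if y == x:
            return term                     # no free x below: share whole
        new_body = subst_sharing(body, x, repl)
        return term if new_body is body else ("lam", y, new_body)
    if tag == "app":
        _, f, a = term
        nf, na = subst_sharing(f, x, repl), subst_sharing(a, x, repl)
        return term if (nf is f and na is a) else ("app", nf, na)
    raise ValueError(f"unknown term: {term!r}")

left = ("app", "y", "y")                    # contains no free x
t1 = ("app", left, ("lam", "z", "x"))       # one free x, on the right
t2 = ("lam", "w", "w")
result = subst_sharing(t1, "x", t2)
print(result[1] is left)    # True: geometry off the path is undisturbed
print(result[2][2] is t2)   # True: the new subtree hangs at the old leaf
</pre>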
<p>
The substitutions used by the side-effect-ful rewriting operations go to the same α-renaming lengths as the <i>β</i>-rule to assure hygiene. However, the side-effect-ful substitutions are non-co-hygienic. This might, arguably, be used as a technical definition of <i>side-effects</i>, which cause distributed changes to the pre-existing geometry of the term.
</p>
<span style="font-size: large;" id="sec-csq-scope">Quantum scope</span>
<p>
Because co-hygiene is about not perturbing pre-existing geometry, it seems reasonable that co-hygienic rewriting operations should be more in harmony with the geometry than non-co-hygienic rewriting operations. Thus, <i>β</i>-rewriting should be more in harmony with the geometry of the term than the side-effect-ful operations; which, subjectively, does appear to be the case. (The property that first drew my attention to all this was that α-renaming, which is geometrically neutral, is a special case of <i>β</i>-substitution, whereas the side-effect-ful substitutions are structurally disparate from α-renaming.)
</p>
<p>
And gravity is more in harmony with the geometry of spacetime than are the other fundamental forces; witness general relativity.
</p>
<p>
Hence my speculation, by analogy, that one might usefully structure a theory of basic physics such that gravity is co-hygienic while the other fundamental forces are non-co-hygienic.
</p>
<p>
One implication of this line of speculation (as I noted in the earlier post) would be the fruitlessness of efforts to unify the other fundamental forces with gravity by integrating them into the geometry of spacetime. If the other forces are non-co-hygienic, their non-affinity with geometry is structural, and trying to treat them in a more gravity-like way would be like trying to treat side-effect-ful behavior as structurally akin to function-application in λ-calculus — which I have long reckoned was the structural miscue that prevented Felleisen's calculus from supporting the full set of well-behavedness theorems.
</p>
<p>
On further consideration, though, something more may be suggested; even as the other forces might not integrate into the geometry of spacetime, gravity might not integrate into the infrastructure of quantum mechanics. All this has to do with the network topology, a non-local infrastructure that exists even in pure λ-calculus, but which in the side-effect-ful vau-calculus achieves what one might be tempted to call "spooky action at a distance". Suppose that quantum entanglement is part of this non-co-hygienic aspect of the theory. (Perhaps quantum entanglement would be the whole of the non-co-hygienic aspect, or, as I discussed in the earlier post, perhaps there would be other, non-quantum non-locality with interesting consequences at cosmological scale; then again, one might wonder if quantum entanglement would itself have consequences at cosmological scale that we have failed to anticipate because the math is beyond us.) It would follow that gravity would not exhibit quantum entanglement. On one hand, this would imply that quantum gravity should not work well as a natural unification strategy. On the other hand, to make this approach work, something rather drastic must happen to the underpinnings of quantum mechanics, both philosophical and technical.
</p>
<p>
We understand quantum mechanics as describing the shape of a spectrum of different possible realities; from a technical perspective that is what quantum mechanics describes, even if one doesn't accept it as a philosophical interpretation (and many do accept that interpretation, if only on the Occam's-Razor grounds that there's no reason to suppose, philosophically, some other foundation than the one supported technically). But shaped spectra of alternative versions of the entire universe seem reminiscent of whole-term rewriting in Felleisen's calculus — which was, notably, a consequence of a structural design choice in the calculus that actually weakened the internal symmetry of the system. The alternative strategy of vau-calculus both had a more uniform infrastructure <i>and</i> avoided the non-compatible whole-term rewriting rules. An analogous theory of basic physics ought to account for quantum entanglement without requiring wholesale branching of alternative universes. Put another way, if gravity isn't included in quantum entanglement, and therefore has to diverge from the other forces at a level <i>more</i> basic than the level where quantum entanglement arises, then the level at which quantum entanglement arises cannot be the <i>most</i> basic level.
</p>
<p>
That quantum structure would not be at the deepest level of physics does not at all suggest that what lies beneath it must be remotely classical. Quantum mechanics is mathematically a sort of lens that distorts whatever classical system is passed through it; taking the Schrödinger equation as demonstrative,
<blockquote>
<i>iℏ</i> ∂Ψ/∂<i>t</i> = <i>Ĥ</i>Ψ ,
</blockquote>
the classical system is contained in the Hamiltonian function <i>Ĥ</i>, which is plugged into the equation to produce a suitable spectrum of alternatives. Hence my description of the quantum equation itself as basic. But, following the vau-calculus analogy, it seems some sort of internal non-locality ought to be basic, as it follows from the existence of the network topology; looking at vau-calculus, even the <i>β</i>-rule fully engages the network topology, though co-hygienically.
</p>
<span style="font-size: large;" id="sec-csq-geonet">Geometry and network</span>
<p>
The above insights on the physical theory itself are mostly negative, indicating what this sort of theory of physics would not be like, what characteristics of conventional quantum math it would not have. What sort of structure <i>would</i> it have?
</p>
<p>
I'm not looking for detailed math, just yet, but the overall shape into which the details would be cast. Some detailed math will be needed, before things go much further, to demonstrate that the proposed approach is <i>capable</i> of generating predictions sufficiently consistent with quantum mechanics, keeping in mind the well-known no-go result of Bell's Theorem. I'm aware of the need; the question, though, is not whether Bell's Theorem can be sidestepped — of course it can, like any other no-go theorem, by blatantly violating one of its premises — but whether it can be sidestepped by a <i>certain kind of theory</i>. So the structure of the theory is part of the possibility question, and needs to be settled before we can ask the question properly.
</p>
<p>
In fact, one of my concerns for this sort of theory is that it might have <i>too many</i> ways to get around Bell's Theorem. Occam's Razor would not look favorably on a theory with redundant Bell-avoidance devices.
</p>
<p>
Let's now set aside locality for a moment, and consider nondeterminism. Bell's Theorem calls (in combination with some experimental results that are, somewhat inevitably, argued over) for <i>chronological</i> nondeterminism, that is, nondeterminism relative to the time evolution of the physical system. One might, speculatively, be able to approximate that sort of nondeterminism arbitrarily well, in a fundamentally non-local theory, by exploiting the assumption that the physical system under consideration is trivially small relative to the whole cosmos: interactions with distant elements of the cosmos could provide a more-or-less "endless" supply of pseudo-randomness. I considered this possibility in the earlier post on co-hygiene, and it is an interesting theoretical question whether (or, at the very least, <i>how</i>) a theory of this sort could in fact generate the sort of quantum probability distribution that, according to Bell's Theorem, cannot be generated by a chronologically deterministic local theory. The sort of theory I'm describing, however, is merely a way to provide a local illusion of nondeterminism in a non-local theory with global determinism — and when we're talking chronology, it is difficult even to <i>define</i> global determinism. Thanks to relativity, "time" is tricky to define even locally; trickier still now that we're contemplating a theory lacking the sort of continuity relativity relies on; and likely impossible to define globally, given relativity's deep locality. It's also no longer clear why one should expect chronological determinism at all.
</p>
<p>
A more straightforward solution, seemingly therefore favored by Occam's Razor, is to give up on chronological determinism and instead acquire <i>mathematical</i> determinism, by the arguably "obvious" strategy of supposing that the whole of spacetime evolves deterministically along an orthogonal dimension, converting unknown initial conditions (<i>initial</i> in the orthogonal dimension) into chronological nondeterminism. I demonstrated the principle of this approach in an <a href="https://fexpr.blogspot.com/2012/12/metaclassical-physics.html">earlier post</a>. It is a bit over-powered, though; a mathematically deterministic theory of this sort — moreover, a mathematically deterministic and mathematically <i>local</i> theory of this sort — can readily generate not only a quantum probability distribution of the sort considered by Bell's Theorem, but, on the face of it, any probability distribution you like. This sort of excessive power would seem rather disfavored by Occam's Razor.
</p>
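<p>
The principle is easy to caricature in a few lines of Python. This toy is entirely mine, and far cruder than the construction in the earlier post: a "spacetime" history evolves deterministically along an orthogonal dimension until stable, the only unknowns being the meta-initial conditions; yet the settled history, read in time order, looks chronologically nondeterministic:
</p>
<pre>
# A toy illustration (mine, not the earlier post's construction) of
# mathematical determinism along a dimension orthogonal to spacetime.
# "Spacetime" is just an array of cells indexed by time; a deterministic
# local rule is applied repeatedly along meta-time until the whole
# history is stable.  Randomness enters only in the meta-initial
# conditions, never in the evolution itself.

import random

def settle(history):
    """Deterministically rewrite the whole history until stable."""
    history = list(history)
    changed = True
    while changed:
        changed = False
        for t in range(1, len(history)):
            # Deterministic local rule: each event is forced to the
            # XOR of its predecessor's value and its own hidden bit.
            want = (history[t - 1][0] ^ history[t][1], history[t][1])
            if history[t] != want:
                history[t] = want
                changed = True
    return history

random.seed()  # the unknown meta-initial conditions
# Each cell is (observable value, hidden bit fixed before meta-time).
initial = [(0, 0)] + [(0, random.randint(0, 1)) for _ in range(15)]
final = settle(initial)
print([v for v, _ in final])  # a chronologically "random"-looking record
</pre>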
<p>
The approach does, however, seem well-suited to a co-hygiene-directed theory. Church-Rosser-ness implies that term rewriting should be treated as <i>reasoning</i> rather than directly as chronological evolution, which seemingly puts term rewriting on a dimension orthogonal to spacetime. The earlier co-hygiene post noted that calculi, which converge to an answer via Church-Rosser-ness, contrast with <i>grammars</i>, which are also term-rewriting systems but exist for the purpose of <i>di</i>verging and are thus naturally allied with mathematical nondeterminism whereas calculi naturally ally with mathematical determinism. So our desire to exploit the calculus/physics analogy, together with our desire for abstract separability of parts, seems to favor this use of a rewriting dimension orthogonal to spacetime.
</p>
<p>
A puzzle then arises about the notion of mathematical locality.  When the rewriting relation, through this orthogonal dimension (which I used to call "meta-time", though now that we're associating it with reasoning some other name is wanted), changes spacetime, there's no <i>need</i> for the change to be non-local. We can apparently generate any sort of physical laws, quantum or otherwise, without the need for more than strictly local rewrite rules; so, again by Occam's Razor, why would we need to suppose a whole elaborate non-local "network topology"? A strictly local rewriting rule sounds much simpler.
</p>
<p>
Consider, though, what we mean by <i>locality</i>. Both nondeterminism and locality must be understood relative to a dimension of change, thus "chronological nondeterminism"; but to be thorough in defining <i>locality</i> we also need a notion of what it means for two elements of a system state to be near each other. "Yes, yes," you may say, "but we have an obvious notion of nearness, provided by the geometry of spacetime." Perhaps; but then again, we're now deep enough in the infrastructure that we might expect the geometry of spacetime to <i>emerge</i> from something deeper. So, what is the essence of the geometry/network distinction in vau-calculus?
</p>
<p>
A λ-calculus term is a syntax tree — a <a href="https://en.wikipedia.org/wiki/Graph_theory">graph</a>, made up of nodes connected to each other by edges that, in this case, define the potential function-application relationships. That is, the whole purpose of the context-free syntax is to define where the interactions — the redex patterns for applying the <i>β</i>-rule — are. One might plausibly say much the same for the geometry of spacetime re gravity, i.e., location in spacetime defines the potential gravitational interactions. The spacetime geometry is not, evidently, hierarchical like that of λ-calculus terms; that hierarchy is apparently a part of the function-application concept. Without the hierarchy, there is no obvious opportunity for a direct physical analog to the property of compatibility in term-rewriting calculi.
</p>
<p>
The network topology, i.e., the variables, provides another set of connections between nodes of the graph. These groups of connections are less uniform, and the variations between them do not participate in the redex patterns, but are merely tangential to the redex patterns, thus cuing the engagement of a variable structure in a rewriting transformation. In vau-calculi the variable is always engaged in the redex through its <i>binding</i>, but this is done for compatibility; by guaranteeing that all the variable instances occur below the binding in the syntax tree, the rewriting transformation can be limited to that branch of the tree. Indeed, only the λ bindings really have a fixed place in the geometry, dictated by the role of the variable in the syntactically located function application; side-effect-ful bindings float rather freely, and their movement through the tree really makes no difference to the function-application structure as long as they stay far enough up in the tree to encompass all their matching variable instances. If not for the convenience of tying these bindings onto the tree, one might represent them as partly or entirely separate from the tree (depending on which kind of side-effect one is considering), tethered to the tree mostly by the connections to the bound variable instances. The redex pattern, embedded within the geometry, would presumably be at a variable instance. Arranging for Church-Rosser-ness would, one supposes, be rather more challenging without compatibility.
</p>
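<p>
For concreteness, here is a small sketch (representation and names mine) separating the two kinds of structure in a λ-calculus term: the tree edges of the geometry, and the network edges connecting each binding to its bound variable instances:
</p>
<pre>
# A sketch (terminology mine) of the two kinds of structure in a
# lambda-term: "geometry" = the syntax-tree edges, "network" = the
# connections tying each binding to its bound variable instances.
# Terms are nested tuples as in the earlier sketches.

def topology(term, path=(), env=None):
    """Return (tree_edges, binding_edges) for a lambda-term.
    tree_edges:    (parent position, child position)    -- the geometry
    binding_edges: (binder position, variable instance) -- the network
    Positions are paths of child indices from the root."""
    env = env or {}              # variable name -> position of its binder
    tree, net = [], []
    if isinstance(term, str):    # a variable instance
        if term in env:          # free variables have no binding edge
            net.append((env[term], path))
        return tree, net
    if term[0] == "lam":
        _, x, body = term
        env = {**env, x: path}
        children = [(1, body)]
    else:                        # ("app", f, a)
        _, f, a = term
        children = [(1, f), (2, a)]
    for i, sub in children:
        child = path + (i,)
        tree.append((path, child))
        t, n = topology(sub, child, env)
        tree += t
        net += n
    return tree, net

# ((lam x. (x x)) y): five tree edges; two binding edges, both tied to
# the binder at position (1,); the free y contributes no network edge.
tree, net = topology(("app", ("lam", "x", ("app", "x", "x")), "y"))
print(len(tree), net)   # 5 [((1,), (1, 1, 1)), ((1,), (1, 1, 2))]
</pre>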
<p>
Interestingly, btw, of the two classes of side-effects considered by vau-calculus (and by Felleisen), this separation of bindings from the syntax tree is more complete for sequential-state side-effects than for sequential-control side-effects — and sequential control is much more simply handled in vau-calculus than is sequential state. I'm still wondering if there's some abstract principle here that could relate to the differences between various non-gravitational forces in physics, such as the simplicity of Maxwell's equations for electromagnetism.
</p>
<p>
This notion of a binding node for a variable hovering outside the geometry, tethered more-or-less-loosely to it by connections to variable instances, has a certain vague similarity to the aggressive non-locality of quantum wave functions. The form of the wave function would, perhaps, be determined by a mix of the nature of the connections to the geometry together with some sort of blurring effect resulting from a poor choice of representing structures; the hope would be that a better choice of representation would afford a more focused description.
</p>
<p>
I've now identified, for vau-calculus, three structural differences between the geometry and the network.
<ul>
<li>The geometry contains the redex patterns (with perhaps some exotic exceptions).</li>
<li>The geometric topology is much simpler and more uniform than the network topology.</li>
<li>The network topology is treated hygienically by all rewriting transformations, whereas the geometry is treated co-hygienically only by one class of rewriting transformations (<i>β</i>).</li>
</ul>
But which of these three do we expect to carry over to physics?
</p>
<p>
The three major classes of rewriting operations in vau-calculus — function application, sequential control, and sequential state — all involve some information in the term that directs the rewrite and therefore belongs in the redex pattern. All three classes of operations involve distributing information to all the instances of the engaged variable. But, the three classes differ in how closely this directing information is tied to the geometry.
</p>
<p>
For function application, the directing information is entirely contained in the geometry, the redex pattern of the <i>β</i>-rule, ((λx.T<sub>1</sub>)T<sub>2</sub>). The only information about the variable not contained within that purely geometric redex pattern is the locations of the bound instances.
</p>
<p>
For sequential control, the variable binder is a <i>catch</i> expression, and the bound variable instances are <i>throw</i> expressions that send a value up to the matching catch. (I examined this case in detail in an <a href="https://fexpr.blogspot.com/2014/03/continuations-and-term-rewriting-calculi.html#sec-contin-glob">earlier post</a>.) The directing information contained in the variable, beyond the locations of the bound instances, would seem to be the location of the catch; but in fact the catch <i>can</i> move, floating upward in the syntax tree, though moving the catch involves a non-co-hygienic substitutive transformation — in fact, the <i>only</i> non-co-hygienic transformation for sequential control. So the directing information is still partly tied to the syntactic structure (and this tie is somehow related to the non-co-hygiene). The catch-throw device is explicitly hierarchical, which would not carry over directly to physics; but this may be only a consequence of its relation to the function-application structure, which <i>does</i> carry over (in the broad sense of spacetime geometry). There may yet be more to make of a side analogy between vau-calculus catch-throw and Maxwell's Equations.
</p>
<p>
For sequential state, the directing information is a full-blown <i>environment</i>, a mapping from symbols to values, with arbitrarily extensive information content and very little relation to geometric location. The calculus rewrite makes limited use of the syntactic hierarchy to coordinate time ordering of assignments — not so much inherently hierarchical as inherently tied to the time sequencing of function applications, which itself happens to be hierarchical — but this geometric connection is even weaker than for catch-throw, and its linkage to time ordering is more apparent. In correspondence with the weaker geometric ties, the supporting rewrite rules are much more complicated, as they moderate passage of information into and out of the mapping repository.
</p>
<p>
"Time ordering" here really does refer to time in broadly the same sense that it would arise in physics, not to rewriting order as such. That is, it is the chronological ordering of events in the programming language described by the rewriting system, analogous to the chronological ordering of events described by a theory of physics. Order of rewriting is in part related to described chronology, although details of the relationship would likely be quite different for physics where it's to do with relativity. This distinction is confusing even in term-rewriting PL semantics, where PL time is strictly classical; one might argue that confusion between <i>rewriting</i>, which is essentially reasoning, and <i>evaluation</i>, which is the PL process reasoned about, resulted in the unfortunately misleading "theory of fexprs is trivial" result which I have discussed here <a href="https://fexpr.blogspot.com/2013/07/explicit-evaluation.html">previously</a>.
</p>
<p>
It's an interesting insight that, while part of the use of syntactic hierarchy in sequential control/state — and even in function application, really — is about compatibility, which afaics does not at all carry over to physics, their remaining use of syntactic hierarchy is really about coordination of time sequencing, which <i>does</i> occur in physics in the form of relativity. Admittedly, in this sort of speculative exploration of possible theories for physics, I find the prospect of tinkering with the infrastructure of quantum mechanics not <i>nearly</i> as daunting as tinkering with the infrastructure of relativity.
</p>
<p>
At any rate, the fact that vau-calculus puts the redex pattern (<i>almost</i> always) entirely within a localized area of the syntax, would seem to be more a statement about the way the information is represented than about the geometry/network balance. That is, vau-calculus represents the entire state of the system by a syntactic term, so each item of information has to be given a specific location in the term, even if that location is chosen somewhat arbitrarily. It is then convenient, for time ordering, to require that all the information needed for a transformation should get together in a particular area of the term. Quantum mechanics may suffer from a similar problem, in a more advanced form, as some of the information in a wave function may be less tied to the geometry than the equations (e.g. the Schrödinger equation) depict it. What really makes things messy is devices that are <i>related</i> to the geometry but less tightly so than the primary, co-hygienic device. Perhaps that is the ultimate trade-off, with differently structured devices becoming more loosely coupled to the geometry and proportionately less co-hygienic.
</p>
<p>
All of which has followed from considering the first of three geometry/network asymmetries: that redex patterns are mostly contained in the geometry rather than the network. The other two asymmetries noted were (1) that the geometric structure is simple and uniform while the network structure is not, and (2) that the network is protected from perturbation while the geometry is not — i.e., the operations are all hygienic (protecting the network) but not all are co-hygienic (protecting the geometry). Non-co-hygiene complicates things only moderately, because the perturbations are to the <i>simple, uniform</i> part of the system configuration; all of the operations are hygienic, so they don't perturb the complicated, nonuniform part of the configuration. Which is fortunate for mathematical treatment; if the perturbations were to the messy stuff, it seems we mightn't be able to cope mathematically at all. So these two asymmetries go together. In my more cynical moments, this seems like wishful thinking; why should the physical world be so cooperative? However, perhaps they should be properly understood as two aspects of a single effect, itself a kind of separability, the same view I've recommended for Church-Rosser-ness; in fact, Church-Rosser-ness may be another aspect of the same whole. The essential point is that we are able to usefully consider individual parts of the cosmos even though they're all interconnected, because there are limits on how aggressively the interconnectedness is exercised. The "geometry" is the simple, uniform way of decomposing the whole into parts, and "hygiene" is an assertion that this decomposition suffices to keep things tractable. It's still fair to question why the cosmos should be separable in this way, and even to try to build a theory of physics in which the separation breaks down; but there may be some reassurance, re Occam's Razor, in the thought that these two asymmetries (simplicity/uniformity, and hygiene) are two aspects of a single serendipitous effect, rather than two independently serendipitous effects.
</p>
<span style="font-size: large;" id="sec-csq-cosmic">Cosmic structure</span>
<p>
Most of these threads are pointing toward a rewriting relation along a dimension orthogonal to spacetime, though we're lacking a good name for it atm (I tend to want to name things early in the development process, though I'm open to change if a better name comes along).
</p>
<p>
One thread, mentioned above, that seems at least partly indifferent to the rewriting question, is that of changes in the character of quantum mechanics at cosmological scale. This relates to the notion of <a href="https://en.wikipedia.org/wiki/Quantum_decoherence">decoherence</a>. It was recognized early in the conceptualization of quantum mechanics that a very small entangled quantum system would tend to interact with the rest of the universe and thereby lose its entanglement and, ultimately, become more classical. We can only handle the quantum math for very small physical systems; in fact, rather insanely small physical systems. Intuitively, what if this tendency of entanglement to evaporate when interacting with the rest of the universe <i>ceases to be valid</i> when the size of the physical system is sufficiently nontrivial compared to the size of the whole universe? In the traditional quantum mechanics, decoherence appears to be an all-or-nothing proposition, a strict dichotomy tied to the concept of <i>observation</i>. If something else is going on at large scales, either it is an unanticipated implication of the math-that-we-can't-do, or it is an aspect of the physics that our quantum math doesn't include because the phenomena that would cause us to confront this aspect are many orders of magnitude outside anything we could possibly apply the quantum math to. It's tantalizing that this conjures both the problem of observation, and the possibility that quantum mechanics may be (like Newtonian mechanics) only an approximation that's very good within its realm of application.
</p>
<p>
The persistently awkward interplay of the continuous and discrete is a theme I've visited <a href="https://fexpr.blogspot.com/2015/05/numberless-differentiation.html#sec-ndif-dcp">before</a>. Relativity appears to have too stiff a dose of continuity in it, creating a self-reference problem even in the non-quantum case (iirc Einstein had doubts on this point before convincing himself the math of general relativity could be made to work); and when non-local effects are introduced for the quantum case, continuity becomes overconstraining. Quantum gravity efforts suffer from a self-reference problem on steroids (non-<a href="https://en.wikipedia.org/wiki/Renormalization">renormalizable</a> infinities). The Big Picture perspective here is that non-locality and discontinuity go together because a <i>continuum</i> — as simple and uniform as it is possible to be — is always going to be perceived as geometry.
</p>
<p>
The non-local network in vau-calculus appears to be <i>inherently</i> discrete, based on completely arbitrary point-to-point connections defined by location of variable instances, with no obvious way to set up any remotely similar continuous arrangement. Moreover, the means I've described for deriving nondeterminism from the network connections (on which I went into some detail in the earlier post) exploits the potential for chaotic scrambling of discrete point-to-point connections by following successions of links hopscotching from point to point. While the geometry might seem more amenable to continuity, a truly continuous geometry doesn't seem consistent with point-to-point network connections, either, as one would then have the prospect of an infinitely dense tangle of network connections to randomly unrelated remote points, a sort of probability-density field that seems likely to wash out the randomness advantages of the strategy and less likely to be mathematically useful; so the whole rewriting strategy appears discrete in both the geometry and network aspects of its configuration as well as in the discrete rewriting steps themselves.
</p>
<p>
The rewriting approach may suffer from too stiff a dose of discreteness, as it seems to force a concrete choice of basic structures. Quantum mechanics is foundationally flexible on the choice of elementary particles; the mathematical infrastructure (e.g. the Schrödinger equation) makes no commitment on the matter at all, leaving it to the Hamiltonian <i>Ĥ</i>. Particles are devised comparatively freely, as with such entities as <a href="https://en.wikipedia.org/wiki/Phonon">phonons</a> and <a href="https://en.wikipedia.org/wiki/Electron_hole">holes</a>. Possibly the rewriting structure one chooses will afford comparable flexibility, but it's not at all obvious that one could expect this level of versatile refactoring from a thoroughly discrete system. Keeping in mind this likely shortfall of flexibility, it's not immediately clear what the basic elements should be. Even if one adopts, say, the <a href="https://en.wikipedia.org/wiki/Standard_model_of_particle_physics">standard model</a>, it's unclear how that choice of observable particles would correspond to concrete elements in a discrete spacetime-rewriting system (in one "metaclassical" scenario I've considered, spacetime events are particle-like entities tracing out one-dimensional curves as spacetime evolves across an orthogonal dimension); and it is by no means certain that the observable elements ought to follow the standard model, either. As I write this there is, part of the time, a cat sitting on the sofa next to me. It's perfectly clear to me that this is the correct way to view the situation, even though on even moderately closer examination the boundaries of the cat may be ambiguous, e.g. at what point an individual strand of fur ceases to be part of the cat. By the time we get down to the scale where quantum mechanics comes into play and refactoring of particles becomes feasible, though, is it even certain that those particles are "really" there? (<a href="https://en.wikipedia.org/wiki/Hilaire_Belloc">Hilaire Belloc</a> cast aspersions on the reality of a microbe merely because it couldn't be seen without the technological intervention of a <i><a href="http://www.poetry-archive.com/b/the_microbe.html">microscope</a></i>; how much more skepticism is recommended when we need a gigantic particle accelerator?)
</p>
<p>
Re the structural implications of <a href="https://en.wikipedia.org/wiki/Quasiparticle">quasiparticles</a> (such as holes), note that such entities are approximations introduced to describe the behavior of vastly complicated systems underneath. A speculation that naturally springs to mind is, could the underlying "elementary" particles be themselves approximations resulting from complicated systems at a vastly smaller scale; which would seem problematic in conventional physics since quantum mechanics is apparently inclined to stop at Planck scale. However, the variety of non-locality I've been exploring in this thread may offer a solution: by maintaining network connections from an individual "elementary" particle to remote, and rather arbitrarily scrambled, elements of the cosmos, one could effectively make the <i>entire cosmos</i> (or at least significant parts of it) serve as the vastly complicated system underlying the particle.
</p>
<p>
It is, btw, also not certain what we should expect as the destination of a spacetime-rewriting relation. An obvious choice, sufficient for a proof-of-concept theory (<a href="https://fexpr.blogspot.com/2012/12/metaclassical-physics.html">previous post</a>), is to require that spacetime reach a <i>stable state</i>, from which there is either no rewriting possible, or further rewriting leaves the system state unchanged. Is that the only way to derive a final state of spacetime? No. Whatever other options might be devised, one that comes to mind is some form of cycle, repeating a closed set of states of spacetime, perhaps giving rise to a set of states that would manifest in more conventional quantum math as a standing wave. Speculatively, different particles might differ from each other by the sort of cyclic pattern they settle into, determining a finite — or perhaps infinite — set of possible "elementary particles". (Side speculation: How do we choose an <i>initial</i> state for spacetime? Perhaps quantum probability distributions are themselves stable in the sense that, while most initial probability distributions produce a different final distribution, a quantum distribution produces itself.)
</p>
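<p>
Operationally, the candidate destinations just mentioned are easy to state. A generic sketch, purely illustrative, with a made-up step function standing in for a whole-spacetime rewrite: iterate the step until either it reaches a fixpoint, a stable state, or the trajectory closes into a cycle:
</p>
<pre>
# A generic sketch (mine, purely illustrative) of the two "destinations"
# just described for a deterministic rewriting relation: either a stable
# state (a fixpoint of one rewriting step) or a repeating cycle of
# states.  `step` stands in for one whole-configuration rewrite.

def destination(state, step, limit=10_000):
    """Iterate `step` from `state`; return ('stable', s) at a fixpoint,
    or ('cycle', states) on entering a closed cycle of states."""
    seen = {}     # state -> index in trace where first encountered
    trace = []
    for i in range(limit):
        if state in seen:
            return ("cycle", trace[seen[state]:])
        seen[state] = i
        trace.append(state)
        nxt = step(state)
        if nxt == state:
            return ("stable", state)
        state = nxt
    raise RuntimeError("no destination found within limit")

# Toy example with a made-up step function on integers mod 12:
print(destination(7, lambda n: (n * n + 1) % 12))   # ('cycle', [2, 5])
</pre>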
<p>
Granting that the calculus/physics analogy naturally suggests some sort of physical theory based on a discrete rewriting system, I've had recurring doubts over whether the rewriting ought to be in the direction of time — an intuitively natural option — or, as discussed, in a direction orthogonal to spacetime. At this point, though, we've accumulated several reasons to prefer rewriting orthogonal to spacetime.
</p>
<p>
<b>Church-Rosser-ness.</b> CR-ness is about ability to reason separately about the implications of different parts of the system, without having to worry about which reasoning to do first. The formal property is that whatever order one takes these locally-driven inferences in ("locally-driven" being a sort of weak locality), it's always possible to make later inferences that reach a common conclusion by either path. This makes it implausible to think of these inference steps as if they were chronological evolution.
</p>
<p>
<b>Bell's Theorem.</b> The theorem says, essentially, the probability distributions of quantum mechanics can't be generated by a conventionally deterministic local theory. Could it be done by a non-local rewriting theory evolving deterministically forward in time? My guess would be, probably it could (at least for classical time); but I suspect it'd be rather artificial, whereas my sense of the orthogonal-dimension rewriting approach (from my aforementioned proof-of-concept) is that it ought to work out neatly.
</p>
<p>
<b>Relativity.</b> Relativity uses an intensively continuous mathematical infrastructure to construct a relative notion of <i>time</i>. It would be rather awkward to set an intensively discrete rewriting relation on top of this relative notion of time; the intensively discrete rewriting really wants to be at a deeper level of reality than any continuous relativistic infrastructure, rather than built on top of it (just as we've placed it at a deeper level than quantum entanglement), with apparent continuity arising from statistical averaging over the discrete foundations. Once rewriting is below relativity, there is no clear definition of a "chronological" direction for rewriting; so rewriting orthogonal to spacetime is a natural device from which to derive relativistic structure. Relativity is, however, a quintessentially <i>local</i> theory, which ought to be naturally favored by a predominantly local rewriting relation in the orthogonal dimension. Deriving relativistic structure from an orthogonal rewriting relation with a simple causal structure also defuses the self-reference problems that have lingered about gravity.
</p>
<p>
It's rather heartening to see this feature of the theory (rewriting orthogonal to spacetime) — or really any feature of a theory — drawing support from considerations in both quantum mechanics <i>and</i> relativity.
</p>
<p>
The next phase of exploring this branch of theory — working from these clues to the sort of structure such a theory ought to have — seems likely to study how the shape of a spacetime-orthogonal rewriting system determines the shape of spacetime. My sense atm is that one would probably want to pay particular attention to how the system might give rise to a relativity-like structure, with an eye toward what role, if any, a non-local network might play in the system. One would want to keep in mind that the β-rule's use of network topology, though co-hygienic, is at the core of what function application <i>does</i> and, at the same time, inspired my suggestion to simulate nondeterminism through repeatedly rescrambled network connections; and, likewise, to keep in mind the evidence (variously touched on above) on the possible character of different kinds of generalized non-co-hygienic operations.
</p>
<span style="font-size: large;">Interpreted programming languages</span> (2016-08-29)
<blockquote>
Last night I drifted off while reading a Lisp book.
<blockquote>
— <a href="https://xkcd.com/224/">xkcd 224</a>.
</blockquote>
</blockquote>
<p>
It's finally registered on me that much of the past half century of misunderstandings and confusions about the semantics of Lisp, of quotation, and, yes, of <a href="https://fexpr.blogspot.com/2011/04/fexpr.html">fexprs</a>, can be accounted for by failure to recognize there is a <i>theoretical</i> difference between an interpreted programming language and a compiled programming language. If we fail to take this difference into account, our mathematical technology for studying compiled programming languages will fail when applied to interpreted languages, leading to loss of coherence in language designs and a tendency to blame the language rather than the theory.
</p>
<p>
Technically, a compiler translates the program into an executable form to be run thereafter, while an interpreter figures out what the program says to do and just does it immediately. Compilation allows higher-performance execution, because the compiler takes care of reasoning about source-code before execution, usually including how to optimize the translation for whatever resources are prioritized (time, space, peripherals). It's easy to suppose this is all there is to it; what the computer does is an obvious focus for attention. One might then suppose that interpretation is a sort of cut-rate alternative to compilation, a quick-and-dirty way to implement a language if you don't care about performance. I think this misses some crucial point about interpretation, some insight to be found not in the computer, but in the mind of the programmer. I don't understand that crucial insight clearly enough — yet — to focus a whole blog post on it; but meanwhile, there's this theoretical distinction between the two strategies which <i>also</i> isn't to be found in the computer's-eye view, and which I do understand enough about to focus this blog post on.
</p>
<p>
It's <i>not</i> safe to say the language is the same regardless of which way it's processed, because the language design and the processing strategy aren't really independent. In principle a given language might be processed either way; but the two strategies provide different conceptual frameworks for thinking about the language, lending themselves to different language design choices, with different purposes — for which different mathematical properties are of interest and different mathematical techniques are effective. This is a situation where formal treatment has to be considered in the context of intent. (I previously blogged about another case where this happens, <i>formal grammars</i> and <i>term-rewriting calculi</i>, which are both term-rewriting systems but have nearly diametrically opposite purposes; <a href="https://fexpr.blogspot.com/2015/05/numberless-differentiation.html#sec-ndif-caus">over thar</a>.)
</p>
<p>
I was set onto the topic of this post recently by some questions I was asked about <a href="http://www.cs.bc.edu/~muller/">Robert Muller</a>'s <a href="https://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=5B93E4C99C167A4374AB83E4686C274E?doi=10.1.1.40.4948&rep=rep1&type=pdf">M-Lisp</a>. My <a href="https://www.wpi.edu/Pubs/ETD/Available/etd-090110-124904/">dissertation</a> mentions Muller's work only lightly, because Muller's work and mine are so far apart. However, because they seem to start from the same place yet lead in such different directions, one naturally wants to know why, and I've struck on a way to make sense of it: starting from the ink-blot of Lisp, Muller and I both looked to find a nearby design with greater elegance — and we ended up with vastly different languages because Muller's search space was shaped by a conceptual framework of compilation while mine was shaped by a conceptual framework of interpretation. I will point out, below, where his path and mine part company, and remark briefly on how this divergence affected his destination.
</p>
<p>
The mathematical technology involved here, I looked at from a lower-level perspective in an <a href="https://fexpr.blogspot.com/2013/07/explicit-evaluation.html">earlier post</a>. It turns out, from a higher-level perspective, that the technology itself can be used for both kinds of languages, but certain details in the way it is usually applied only work with compiled languages and, when applied to interpreted languages, result in the trivialization of theory noted by Wand's classic paper, "<a href="http://www.ccis.northeastern.edu/home/wand/pubs.html#Wand98">The Theory of Fexprs is Trivial</a>".
</p>
<blockquote>
<b>Contents</b><br>
<a href="#sec-ict-mlisp">M-language</a><br>
<a href="#sec-ict-comp">Compilation</a><br>
<a href="#sec-ict-univ">Universal program</a><br>
<a href="#sec-ict-slisp">S-language</a><br>
<a href="#sec-ict-chaos">Half a century's worth of misunderstandings and confusions</a><!-- <br> -->
</blockquote>
<span style="font-size: large;" id="sec-ict-mlisp">M-language</span>
<p>
I'll explore this through a toy programming language which I'll then modify, starting with something moderately similar to what McCarthy <a href="https://web.archive.org/web/20131004215327/http://www-formal.stanford.edu/jmc/recursive.html">originally described for Lisp</a> (before it got unexpectedly implemented).
</p>
<p>
This is a compiled programming language, without imperative features, similar to λ-calculus, for manipulating data structures that are nested trees of atomic symbols. The syntax of this language has two kinds of expressions: S-expressions (<i>S</i> for <i>Symbolic</i>), which don't specify any computation but merely specify constant data structures — trees of atomic symbols — and M-expressions (<i>M</i> for <i>Meta</i>), which specify computations that manipulate these data structures.
</p>
<p>
S-expressions, the constants of the language, take five forms:
<blockquote>
S ::= s | () | #t | #f | (S . S) .
</blockquote>
That is, an S-expression is either a symbol, an empty list, true, false, or a pair (whose elements are called, following Lisp tradition, its <i>car</i> and <i>cdr</i>). A symbol name is a sequence of one or more upper-case letters. (There should be no need, given the light usage in this post, for any elaborate convention to clarify the difference between a symbol name and a nonterminal such as S.)
</p>
<p>
I'll assume the usual shorthand for lists, (S ...) ≡ (S . (...)), so for example (FOO BAR QUUX) ≡ (FOO . (BAR . (QUUX . ()))); but I won't complicate the formal grammar with this shorthand since it doesn't impact the abstract syntax.
</p>
<p>
The forms of M-expressions start out looking exactly like λ-calculus, then add on several other compound forms and, of course, S-expressions which are constants:
<blockquote>
M ::= x | [λx.M] | [M M] | [<i>if</i> M M M] | [<i>car</i> M] | [<i>cdr</i> M] | [<i>cons</i> M M] | [<i>eq?</i> M M] | S .
</blockquote>
The first form is a variable, represented here by nonterminal x. A variable name will be a sequence of one or more lower-case letters. Upper- versus lower-case letters is how McCarthy distinguished between symbols and variables in his original description of Lisp.
</p>
<p>
Variables, unlike symbols, are not constants; rather, variables are part of the computational infrastructure of the language, and any variable might stand for an arbitrary computation M.
</p>
<p>
The second form constructs a unary function, via λ; the third applies a unary function to a single argument. <i>if</i> expects its first argument to be boolean, and returns its second if the first is true, third if the first is false. <i>car</i> and <i>cdr</i> expect their argument to be a pair and extract its first and second element respectively. <i>cons</i> constructs a pair. <i>eq?</i> expects its two arguments to be S-expressions, and returns #t if they're identical, #f if they're different.
</p>
<p>
Compound unary functions, constructed by λ, are <i>almost</i> first-class. They can be returned as the values of expressions, and they can be assigned to variables; but as the grammar is set out, they cannot appear as elements of data structures. A pair expression is built up from two S-expressions, and a compound unary function is not an S-expression. McCarthy's original description of Lisp defines S-expressions this way; his stated purpose was only to manipulate trees of symbols. Trained as I am in the much later Lisp culture of first-class functions and minimal constraints, it felt unnatural to follow this narrower definition of S-expressions; but in the end I had to define it so. I'm trying to reproduce the essential factors that caused Lisp to come out the way it did, and, strange to tell, everything might have come out differently if not for the curtailed definition of S-expressions.
</p>
<p>
(One might ask, couldn't we indirectly construct a pair with a function in it using <i>cons</i>? A pair with a function in it would thus be a term that can't be represented directly by a source expression. This point likely doesn't matter directly to how things turned out; but fwiw, I suspect McCarthy didn't have in mind to allow that, no. It's entirely possible he also hadn't really thought about it yet at the preliminary stage of design where this detail had its impact on the future of Lisp. It's the sort of design issue one often discovers by playing around with a prototype — and in this case, playing around with a prototype is how things got out of hand; more on that later.)
</p>
<span style="font-size: large;" id="sec-ict-comp">Compilation</span>
<p>
Somehow we want to specify the meanings of programs in our programming language. Over the decades, a number of techniques for formal PL semantics have been entertained. One of the <i>first</i> things tried was to set up a term-rewriting machine modeled roughly on λ-calculus, that would perform small finite steps until it reached a final state; that was <a href="https://en.wikipedia.org/wiki/Peter_Landin">Peter Landin</a>'s notion, when <a href="https://en.wikipedia.org/wiki/Christopher_Strachey">Christopher Strachey</a> set him onto the problem in the 1960s. It took some years to get the kinks out of that approach, and meanwhile other techniques were tried — such as denotational semantics, meta-circular evaluation — but setting up a term-rewriting calculus has been quite a popular technique since the major technical obstacles to it were overcome. Of the three major computational models tied together in the 1930s by the Church-Turing thesis, <i>two</i> of them were based on term-rewriting: Turing machines, which were the <i>convincing</i> model, the one that lent additional credibility to the others when they were all proven equivalent; and λ-calculus, which had mathematical elegance that the nuts-and-bolts Turing-machine model lacked. The modern "small-step" rewriting approach to semantics (as opposed to "big-step", where one deduces how to go in a single step from start to finish) does a credible job of combining the strengths of Turing-machines and λ-calculus; I've a preference for it myself, and it's the strategy I'll use here. I described the technique in <a href="https://fexpr.blogspot.com/2013/07/explicit-evaluation.html#sec-expev-dram">somewhat more depth</a> in my previous post on this material.
</p>
<p>
Small-step semantics applies easily to this toy language because every intermediate computational state of the system is naturally represented by a source-code expression. That is, there is no obvious need to go beyond the source-code grammar we've already written. Some features that have been omitted from the toy language would make it more difficult to limit all computational states to source expressions; generally these would be "stateful" features, such as input/output or a mutable store. Landin used a rewriting system whose terms (computational states) were not source-code expressions. One might ask whether there are any language features that would make it <i>impossible</i> to limit computational states to source expressions, and the answer is essentially yes, there are — features related not to statefulness, but to interpretation. For now, though, we'll assume that all terms are source expressions.
</p>
<p>
We can define the semantics of the language in reasonably few rewriting schemata.
<blockquote>
[[λx.M<sub>1</sub>] M<sub>2</sub>] → M<sub>1</sub>[x ← M<sub>2</sub>]<br>[<i>if</i> #t M<sub>1</sub> M<sub>2</sub>] → M<sub>1</sub><br>[<i>if</i> #f M<sub>1</sub> M<sub>2</sub>] → M<sub>2</sub><br>[<i>car</i> (S<sub>1</sub> . S<sub>2</sub>)] → S<sub>1</sub><br>[<i>cdr</i> (S<sub>1</sub> . S<sub>2</sub>)] → S<sub>2</sub><br>[<i>cons</i> S<sub>1</sub> S<sub>2</sub>] → (S<sub>1</sub> . S<sub>2</sub>)<br>[<i>eq?</i> S S] → #t<br>[<i>eq?</i> S<sub>1</sub> S<sub>2</sub>] → #f if the S<sub>k</sub> are different.
</blockquote>
</p>
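<p>
Here is a runnable sketch of these schemata in Python; the nested-tuple representation is mine, and the substitution is naive since no shadowing arises in the example. The step function commits to one deterministic strategy, leftmost-outermost, within the compatible closure discussed next:
</p>
<pre>
# A runnable sketch of the rewriting schemata above.  Terms are nested
# tuples: ("lam", x, M), ("app", M1, M2), ("if", M1, M2, M3), ("car", M),
# ("cdr", M), ("cons", M1, M2), ("eq", M1, M2); S-expressions are
# uppercase strings (symbols), () for the empty list, True/False, and
# ("pair", S1, S2).  Variables are lowercase strings.

def is_sexpr(t):
    """Is t one of the S-expression constants?"""
    return (t == () or isinstance(t, bool)
            or (isinstance(t, str) and t.isupper())
            or (isinstance(t, tuple) and t[0] == "pair"
                and is_sexpr(t[1]) and is_sexpr(t[2])))

def subst(m, x, n):
    """Naive m[x ← n]; adequate here, where no shadowing occurs."""
    if isinstance(m, str) and not m.isupper():          # a variable
        return n if m == x else m
    if isinstance(m, tuple) and m and m[0] == "lam":
        return m if m[1] == x else ("lam", m[1], subst(m[2], x, n))
    if isinstance(m, tuple) and m and m[0] != "pair":   # other compounds
        return (m[0],) + tuple(subst(c, x, n) for c in m[1:])
    return m                                            # S-expr constant

def contract(m):
    """Apply one schema at the root; None if there's no redex here."""
    if not isinstance(m, tuple) or m == () or m[0] == "pair":
        return None
    tag = m[0]
    if tag == "app" and isinstance(m[1], tuple) and m[1] and m[1][0] == "lam":
        return subst(m[1][2], m[1][1], m[2])
    if tag == "if" and isinstance(m[1], bool):
        return m[2] if m[1] else m[3]
    if tag == "car" and isinstance(m[1], tuple) and m[1] and m[1][0] == "pair":
        return m[1][1]
    if tag == "cdr" and isinstance(m[1], tuple) and m[1] and m[1][0] == "pair":
        return m[1][2]
    if tag == "cons" and is_sexpr(m[1]) and is_sexpr(m[2]):
        return ("pair", m[1], m[2])
    if tag == "eq" and is_sexpr(m[1]) and is_sexpr(m[2]):
        return m[1] == m[2]
    return None

def step(m):
    """One rewriting step, leftmost-outermost; None if m is normal."""
    r = contract(m)
    if r is not None:
        return r
    if isinstance(m, tuple) and m != () and m[0] != "pair":
        for i in range(1, len(m)):
            r = step(m[i])
            if r is not None:
                return m[:i] + (r,) + m[i + 1:]
    return None

def run(m):
    """Iterate step to a normal form."""
    while True:
        n = step(m)
        if n is None:
            return m
        m = n

# [[λx.[cons x x]] [car (FOO . BAR)]] evaluates to (FOO . FOO):
prog = ("app", ("lam", "x", ("cons", "x", "x")),
        ("car", ("pair", "FOO", "BAR")))
print(run(prog))   # ('pair', 'FOO', 'FOO')
</pre>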
<p>
Rewriting relation → is the <i>compatible</i> closure of these schemata; that is, for any context C and terms M<sub>k</sub>, if M<sub>1</sub> → M<sub>2</sub> then C[M<sub>1</sub>] → C[M<sub>2</sub>]. Relation → is also Church-Rosser: although a given term M may be rewritable in more than one way, any resulting difference can always be eliminated by later rewriting. That is, the reflexive transitive closure →<sup>*</sup> has the diamond property: if M<sub>1</sub> →<sup>*</sup> M<sub>2</sub> and M<sub>1</sub> →<sup>*</sup> M<sub>3</sub>, then there exists M<sub>4</sub> such that M<sub>2</sub> →<sup>*</sup> M<sub>4</sub> and M<sub>3</sub> →<sup>*</sup> M<sub>4</sub>.
</p>
<p>
Formal equality of terms, =, is the symmetric closure of →<sup>*</sup> (thus, the reflexive symmetric transitive compatible closure of the schemata, which is to say, the least <i>congruence</i> containing the schemata).
</p>
<p>
Another important relation is operational equivalence, ≅. Two terms are operationally equivalent just if replacing either by the other in any possible context preserves the observable result of the computation. M<sub>1</sub> ≅ M<sub>2</sub> iff for every context C and S-expression S, C[M<sub>1</sub>] ↦* S iff C[M<sub>2</sub>] ↦* S. (Fwiw, relation ↦ is what the computation actually does, versus → which is anything the rewriting calculus could do; → is compatible Church-Rosser and therefore nice to work with mathematically, but ↦ is deterministic and therefore we can be sure it does what we meant it to. Another way of putting it is that → has the mathematical character of λ-calculus while ↦ has the practical character of Turing-machines.)
</p>
<p>
Operational equivalence is exactly what must be guaranteed in order for an optimizing compiler to safely perform a local source-to-source transformation: as long as the two terms are operationally equivalent, the compiler can replace one with the other in any context. The rewriting calculus is <i>operationally sound</i> if formal equality implies operational equivalence; then the rewriting calculus can supply proofs of operational equivalence for use in optimizing compilation.
</p>
<p>
Before moving on, two other points of interest about operational equivalence:
</p>
<ul>
<li><p>Operational equivalence of S-expressions is trivial; that is, S<sub>1</sub> and S<sub>2</sub> are operationally equivalent only if they are identical. This follows immediately by plugging the trivial context into the definition of operational equivalence, C[S<sub>k</sub>] ≡ S<sub>k</sub>. Thus, in every non-trivial operational equivalence M<sub>1</sub> ≅ M<sub>2</sub>, at least one of the M<sub>k</sub> is not an S-expression.</p></li>
<li><p>All terms in the calculus — all M-expressions — are source expressions; but if there <i>were</i> any terms in the calculus that were not source expressions, they would be irrelevant to a <i>source-to-source</i> optimizer; however, if these non-source terms could be usefully understood as terms in an intermediate language used by the compiler, an optimizer might still be able to make use of them and their formal equalities.</p></li>
</ul>
<span style="font-size: large;" id="sec-ict-univ">Universal program</span>
<p>
McCarthy's Lisp language was still in its infancy when the project took an uncontrollable turn in a radically different direction than McCarthy had envisioned going with it. Here's what happened.
</p>
<p>
A standard exercise in theory of computation is to construct a universal Turing machine, which can take as input an encoding of an arbitrary Turing machine T and an input w to T, and simulate what T would do given input w. This is an extremely tedious exercise; the input to a Turing machine looks nothing like the control device of a Turing machine, so the encoding is highly intrusive, and the control device of the universal machine is something of an unholy mess. McCarthy set out to lend his new programming language mathematical street cred by showing that not only could it simulate itself with a universal program, but the encoding would be much more lucid and the logic simpler in contrast to the unholy mess of the universal Turing machine.
</p>
<p>
The first step of this plan was to describe an encoding of programs in the form of data structures that could be used as input to a program. That is to say, an encoding of M-expressions as S-expressions. Much of this is a very straightforward homomorphism, recursively mapping the non-data M-expression structure onto corresponding S-expressions; for our toy language, encoding <i>φ</i> would have
</p>
<ul>
<li><i>φ</i>(x) = symbol s formed by changing the letters of its name from lower-case to upper-case. Thus, <i>φ</i>(foo) = FOO.</li>
<li><i>φ</i>([λx.M]) = (LAMBDA <i>φ</i>(x) <i>φ</i>(M)).</li>
<li><i>φ</i>([M<sub>1</sub> M<sub>2</sub>]) = (<i>φ</i>(M<sub>1</sub>) <i>φ</i>(M<sub>2</sub>)).</li>
<li><i>φ</i>([<i>if</i> M<sub>1</sub> M<sub>2</sub> M<sub>3</sub>]) = (IF <i>φ</i>(M<sub>1</sub>) <i>φ</i>(M<sub>2</sub>) <i>φ</i>(M<sub>3</sub>)).</li>
<li><i>φ</i>([<i>car</i> M]) = (CAR <i>φ</i>(M)).</li>
<li><i>φ</i>([<i>cdr</i> M]) = (CDR <i>φ</i>(M)).</li>
<li><i>φ</i>([<i>cons</i> M<sub>1</sub> M<sub>2</sub>]) = (CONS <i>φ</i>(M<sub>1</sub>) <i>φ</i>(M<sub>2</sub>)).</li>
<li><i>φ</i>([<i>eq?</i> M<sub>1</sub> M<sub>2</sub>]) = (EQ <i>φ</i>(M<sub>1</sub>) <i>φ</i>(M<sub>2</sub>)).</li>
</ul>
<p>
(This encoding ignores the small detail that certain symbol names used in the encoding — LAMBDA, IF, CAR, CDR, CONS, EQ — must not also be used as variable names, if the encoding is to behave correctly. McCarthy seems not to have been fussed about this detail, and nor should we be.)
</p>
<p>
For a proper encoding, though, S-expressions have to be encoded in a way that unambiguously distinguishes them from the encodings of other M-expressions. McCarthy's solution was
</p>
<ul>
<li><p><i>φ</i>(S) = (QUOTE S).</p></li>
</ul>
<p>
Now, in some ways this is quite a good solution. It has the virtue of simplicity, cutting the <a href="https://en.wikipedia.org/wiki/Gordian_Knot">Gordian knot</a>. It preserves the <i>readability</i> of the encoded S-expression, which supports McCarthy's desire for a lucid encoding. The main objection one could raise is that it isn't homomorphic; that is, <i>φ</i>((S<sub>1</sub> . S<sub>2</sub>)) is not built up from <i>φ</i>(S<sub>1</sub>) and <i>φ</i>(S<sub>2</sub>).
</p>
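<p>
To make the encoding concrete, here is a sketch of <i>φ</i> in Scheme. The representation of M-expressions is invented for the occasion — lower-case tags lam, app, quo, and so on; the toy M-language has no official concrete syntax here — so take this as illustration rather than gospel.
</p>
<blockquote><pre>
;; Assumes R7RS: (import (scheme base) (scheme char)).
(define (sym-up s)                           ; foo -> FOO
  (string->symbol (string-upcase (symbol->string s))))

(define (phi m)
  (if (pair? m)
      (case (car m)
        ((lam)  (list 'LAMBDA (sym-up (cadr m)) (phi (caddr m))))
        ((app)  (list (phi (cadr m)) (phi (caddr m))))
        ((if)   (list 'IF (phi (cadr m)) (phi (caddr m)) (phi (cadddr m))))
        ((car)  (list 'CAR (phi (cadr m))))
        ((cdr)  (list 'CDR (phi (cadr m))))
        ((cons) (list 'CONS (phi (cadr m)) (phi (caddr m))))
        ((eq?)  (list 'EQ (phi (cadr m)) (phi (caddr m))))
        ((quo)  (list 'QUOTE (cadr m))))     ; the non-homomorphic case
      (sym-up m)))                           ; a variable

;; (phi '(app (lam x (app x (quo FOO))) (quo (A B))))
;;   => ((LAMBDA X (X (QUOTE FOO))) (QUOTE (A B)))
</pre></blockquote>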
<p>
As McCarthy <a href="https://web.archive.org/web/20171027163451/https://www-formal.stanford.edu/jmc/history/lisp/lisp.html">later recounted</a>, they had expected to have plenty of time to refine the language design before it could be implemented. (The FORTRAN compiler, after all, had been a massive undertaking.) Meanwhile, to experiment with the language they began hand-implementing particular functions. The flaw in this plan was that, because McCarthy had been so successful in demonstrating a universal Lisp function <i>eval</i> with simple logic, it wasn't difficult to hand-implement <i>eval</i>; and, because he had been so successful in making the encoding lucid, this instantly produced a highly usable Lisp interpreter. The sudden implementation precipitated a user community and substantial commitment to specifics of what had been a preliminary language design.
</p>
<p>
All this might have turned out differently if the preliminary design had allowed first-class functions to be elements in pairs. A function has to be encoded, homomorphically, which would require a homomorphic encoding of pairs, perhaps <i>φ</i>((M<sub>1</sub> . M<sub>2</sub>)) = (CONS <i>φ</i>(M<sub>1</sub>) <i>φ</i>(M<sub>2</sub>)); once we allow arbitrary M-expressions within the pair syntax, (M<sub>1</sub> . M<sub>2</sub>), that syntax itself becomes a pair constructor and there's really no need for a separate <i>cons</i> operator in the M-language; then CONS can encode the one constructor. One might then reasonably restrict QUOTE to base cases; better yet, let (), #t, and #f encode as themselves, leaving only the case of encoding symbols, and rename QUOTE to SYMBOL. The encoding would then be fully homomorphic — but the encodings of large constant data structures would become unreadable. For example, the fairly readable constant structure
<blockquote>
((LAMBDA X (X Y)) FOO)
</blockquote>
would encode through <i>φ</i> as
<blockquote>
(CONS (CONS (SYMBOL LAMBDA) (CONS (SYMBOL X) (CONS (CONS (SYMBOL X) (CONS (SYMBOL Y) ())) ()))) (CONS (SYMBOL FOO) ())) .
</blockquote>
That didn't happen, though.
</p>
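<p>
The alternative encoder is even shorter — worth writing down, if only to watch the unreadability emerge mechanically. (Scheme again; the name phi-h is mine.)
</p>
<blockquote><pre>
;; Homomorphic encoding of constant S-expressions: pairs map to CONS,
;; symbols to SYMBOL; (), #t, and #f encode as themselves.
(define (phi-h s)
  (cond ((pair? s)   (list 'CONS (phi-h (car s)) (phi-h (cdr s))))
        ((symbol? s) (list 'SYMBOL s))
        (else s)))

;; (phi-h '((LAMBDA X (X Y)) FOO)) reproduces the CONS/SYMBOL term above.
</pre></blockquote>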
<p>
The homomorphic, non-QUOTE encoding would naturally tend to produce a universal function with no practical potential for meta-programming. In theory, one could use the non-homomorphic QUOTE encoding and still not offer any native meta-programming power. However, the QUOTE-based encoding means there are data structures lying around that look <i>exactly</i> like working executable code except that they happen to be QUOTE‍d. In practice, the psychology of the notation makes it rather inevitable that various facilities in the language would allow a blurring of the line between data and code. Lisp, I was told when first taught the language in the 1980s, treats code as data. Sic: I was <i>not</i> told Lisp treats data as code, but that it treats code as data.
</p>
<p>
In other words, Lisp had accidentally become an interpreted language; a profoundly different beast from the compiled language McCarthy had set out to create, and one whose character naturally suggests a whole different set of features that would not have occurred to someone designing a compiled language in 1960. There were, of course, some blunders along the way (dynamic scope is an especially famous one, and I would rate the abandonment of fexprs in favor of macros as another of similar magnitude); but in retrospect I see all that as part of exploring a whole new design space of interpreted features. Except that, over the past three decades or so, the Lisp community seems to have somewhat lost track of its interpreted roots; I'll get back to that.
</p>
<p>
Of interest:
</p>
<ul>
<li><p>
In S-expression Lisp, all source expressions are S-expressions. It is no less true now than before that an operational equivalence M<sub>1</sub> ≅ M<sub>2</sub> can only be nontrivial if at least one of the M<sub>k</sub> is not an S-expression; but now, if the M<sub>k</sub> are source expressions, we can be absolutely certain that they are both S-expressions. So if the operational equivalence relation is restricted to source expressions, it's trivial. This isn't disastrous; it just means that, in order to have nontrivial theory, we are going to need some terms that are not source expressions (as Landin did, though for a different reason); and if we choose to compile the language, we won't be allowed to call our optimizations "local source-to-source" (any given optimization could be one or the other, but not both at once).
</p></li>
<li><p>
This is the fork in the road where Muller and I went our separate ways. Muller's M-Lisp, taking the compiled-language view, supposes that S-expressions are encoded homomorphically, resulting in a baseline language with no native meta-programming power. He then considers how to add some meta-programming power to the resulting language. However, practical meta-programming requires the programmer be able to write <i>lucid</i> code that can also be manipulated as data; and the homomorphic encoding isn't lucid. So instead, meta-programming in Muller's extended M-Lisp uses general M-expressions directly (rather than their encodings). If an M-expression turns out to be wanted as data, it then gets belatedly encoded — with the drawback that the M-expression can't be rewritten until the rewriting schemata can tell it won't be needed as data. This causes difficulties with operational equivalence of general M-expressions; in effect, as the burden of meta-programming is shifted from S-expressions to general M-expressions, it carries along with it the operational-equivalence difficulties that had been limited to S-expressions.
</p>
</li>
</ul>
<span style="font-size: large;" id="sec-ict-slisp">S-language</span>
<p>
McCarthy hadn't finished the details of M-expressions, so S-expression Lisp wasn't altogether an encoding of anything; it was itself, leaving its implementors rather free to invent it as they went along. Blurring the boundary between quoted data and unquoted code provided meta-programming facilities that hadn't been available in compiled languages (essentially, I suggest, a sort of flexibility we enjoy in natural languages). In addition to QUOTE itself (which has a rather fraught history entangled with first-class functions and dynamic scope, cf. §3.3.2 of my <a href="https://www.wpi.edu/Pubs/ETD/Available/etd-090110-124904/">dissertation</a>), from very early on the language had <i>fexprs</i>, which are like LAMBDA-constructed functions except that they treat their operands as data — as if the operands had been QUOTE‍d — rather than evaluating them as code (which may later be done, if desired, explicitly by the fexpr using EVAL). In 1963, <i>macros</i> were added — not the mere template-substitution macros found in various other languages, but macros that treat their operands as data and perform an arbitrary computation to generate a data structure as output, which is then interpreted as code at the point of call.
</p>
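<p>
A tiny modern illustration of the blurring (using the eval of R7RS's (scheme eval) and (scheme repl) libraries; details vary by dialect):
</p>
<blockquote><pre>
(define body '(if (eq? x 'foo) 'yes 'no))   ; data that looks exactly like code
(eval (list 'let '((x 'foo)) body)          ; splice the data in as code...
      (interaction-environment))            ; => yes
</pre></blockquote>
<p>
A macro would do such splicing at expansion time; a fexpr would receive body-like operands at call time, and could decide then whether and where to evaluate them.
</p>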
<p>
But how exactly do we specify the meanings of programs in this interpreted S-expression language? We could resort to the meta-circular evaluator technique; this is a pretty natural strategy since an evaluator is exactly what we have as our primary definition of the language. That approach, though, is difficult to work with mathematically, and in particular doesn't lend itself to proofs of operational equivalence. If we try to construct a rewriting system the way we did before, we immediately run into the glaring practical problem that the same representation is used for executable code, which we want to have nontrivial theory, and passive data which <i>necessarily</i> has perfectly trivial theory. That is, as noted earlier, all source expressions are S-expressions and operational equivalence of S-expressions is trivial. It's possible to elaborate in vivid detail the theoretical train wreck that results from naive application of the usual rewriting semantics strategy to Lisp with quotation (or, worse, fexprs); but this seems to be mainly of interest if one is trying to prove that something can't be done. I'm interested in what <i>can</i> be done.
</p>
<p>
If what you want is a nontrivial theory, one that could in principle be used to guide optimizations, this is not difficult to arrange (once you know how; cf. my past discussion of <a href="https://fexpr.blogspot.com/2015/05/computation-and-truth.html#sec-comptruth-att">profundity index</a>). As mentioned, all nontrivial operational equivalences must have at least one of the two terms not a source expression (S-expression), therefore we need some terms that aren't source expressions; and our particular difficulty here is having no way to mark a source expression unmistakably as code; so, introduce a primitive context that says "evaluate this source expression". The new context only helps with operational equivalence if it's immune to QUOTE, and no source expression is immune to QUOTE, so that's yet another way to see that the new context must form a term that isn't a source expression.
</p>
<p>
Whereas the syntax of the compiled M-language had two kinds of terms — constant S-expressions and computational M-expressions — the syntax of the interpreted S-language will have <i>three</i> kinds of terms. There are, again, the "constant" terms, the S-expressions, which are now exactly the source expressions. There are the "computational" terms, which are needed for the actual work of computation; these are collectively shaped something like λ-calculus. We expect a big part of the equational strength of the rewriting calculus to reside in these computational terms, roughly the same equational strength as λ-calculus itself, and therefore of course those terms have to be entirely separate from the source expressions which can't have nontrivial equational theory. And then there are the "interpretation" terms, the ones that orchestrate the gradual conversion of source expressions into computational expressions. The code-marking terms are of this sort. The rewriting rules involving these "interpretation" terms will amount to an algorithm for interpreting source code.
</p>
<p>
This neat division of terms into three groups won't really be as crisp as I've just made it sound. Interpretation is by nature a gradual process whose coordination seeps into other parts of the grammar. Some non-interpretation terms will carry along <i>environment</i> information, in order to make it available for later use. This blurring of boundaries is perhaps another part of the flexibility that (I'll again suggest) makes interpreted languages more similar to natural languages.
</p>
<p>
I'll use nonterminal T for arbitrary terms. Here are the interpretation forms.
<blockquote>
T ::= [eval T T] | ⟨wrap T⟩ | e .
</blockquote>
Form [eval T<sub>1</sub> T<sub>2</sub>] is a term that stands for evaluating a term T<sub>1</sub> in an environment T<sub>2</sub>. This immediately allows us to distinguish between statements such as
<blockquote>
S<sub>1</sub> ≅ S<sub>2</sub><br>[eval S<sub>1</sub> e<sub>0</sub>] ≅ [eval S<sub>2</sub> e<sub>0</sub>]<br>∀e, [eval S<sub>1</sub> e] ≅ [eval S<sub>2</sub> e] .
</blockquote>
The first proposition is the excessively strong statement that S-expressions S<sub>k</sub> are operationally equivalent — interchangeable in any context — which can only be true if the S<sub>k</sub> are identically the same S-expression. The second proposition says that evaluating S<sub>1</sub> in environment e<sub>0</sub> is operationally equivalent to evaluating S<sub>2</sub> in environment e<sub>0</sub> — that is, for all contexts C and all S-expressions S, C[[eval S<sub>1</sub> e<sub>0</sub>]] ↦* S iff C[[eval S<sub>2</sub> e<sub>0</sub>]] ↦* S. The third proposition says that evaluating S<sub>1</sub> in <i>any</i> environment e is operationally equivalent to evaluating S<sub>2</sub> in the same e — which is what we would ordinarily have meant, in a compiled language, if we said that two executable code (as opposed to data) expressions S<sub>k</sub> were operationally equivalent.
</p>
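<p>
For a concrete instance (my example, anticipating the schemata below and the expected schema for LAMBDA as a reserved operator): take S<sub>1</sub> = ((LAMBDA X X) FOO) and S<sub>2</sub> = FOO. They are distinct S-expressions, so the first proposition fails — context (QUOTE [ ]) distinguishes them — yet for any environment e, [eval S<sub>1</sub> e] should reduce in a few steps to [eval FOO e], which is exactly [eval S<sub>2</sub> e]; so the third proposition plausibly holds.
</p>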
<p>
The second form, ⟨wrap T⟩, is a wrapper placed around a function T, that induces evaluation of the operand passed to the function. If T is used without such a wrapper (and presuming T isn't already a wrapped function), it acts directly on its unevaluated operand — that is, T is a fexpr.
</p>
<p>
The third form, e, is simply an environment. An environment is a series of symbol-value bindings, ⟪s<sub>k</sub>←T<sub>k</sub>⟫; there's no need to go into gory detail here (though I did say more in a <a href="https://fexpr.blogspot.com/2014/03/continuations-and-term-rewriting-calculi.html">previous post</a>).
</p>
<p>
The computational forms are, as mentioned, similar to λ-calculus with some environments carried along.
<blockquote>
T ::= x | [combine T T T] | ⟨λx.T⟩ | ⟨εx.T⟩ .
</blockquote>
Here we have a variable, a combination, and two kinds of function. Form ⟨λx.T⟩ is a function that substitutes its operand for x in its body T. Variant ⟨εx.T⟩ substitutes its dynamic environment for x in its body T.
</p>
<p>
Form [combine T<sub>1</sub> T<sub>2</sub> T<sub>3</sub>] is a term that stands for combining a function T<sub>1</sub> with an operand T<sub>2</sub> in a dynamic environment T<sub>3</sub>. The dynamic environment is the set of bindings in force at the point where the function is called; as opposed to the <i>static</i> environment, the set of bindings in force at the point where the function is constructed. Static environments are built into the bodies of functions by the function constructor, so they don't show up in the grammar. For example, [eval (LAMBDA X FOO) e<sub>0</sub>] would evaluate to a function with static environment e<sub>0</sub>, of the form ⟨wrap ⟨λx.[eval FOO ⟪...⟫]⟩⟩ with the contents of e<sub>0</sub> embedded somewhere in the ⟪...⟫.
</p>
<p>
Putting it all together,
<blockquote>
T ::= x | [combine T T T] | ⟨λx.T⟩ | ⟨εx.T⟩ | [eval T T] | ⟨wrap T⟩ | e | S .
</blockquote>
</p>
<p>
The rewriting schemata naturally fall into two groups, those for internal computation and those for source-code interpretation. (There are of course no schemata associated with the third group of syntactic forms, the syntactic forms for passive data, because passive.) The computation schemata closely resemble λ-calculus, except with the second form of function used to capture the dynamic environment (which fexprs sometimes need).
<blockquote>
[combine ⟨λx.T<sub>1</sub>⟩ T<sub>2</sub> T<sub>3</sub>] → T<sub>1</sub>[x ← T<sub>2</sub>]<br>[combine ⟨εx.T<sub>1</sub>⟩ T<sub>2</sub> T<sub>3</sub>] → [combine T<sub>1</sub>[x ← T<sub>3</sub>] T<sub>2</sub> T<sub>3</sub>] .
</blockquote>
The interpretation schemata look very much like the dispatching logic of a Lisp interpreter.
<blockquote>
[eval d T] → d<br> if d is an empty list, boolean, λ-function, ε-function, or environment<br>[eval s e] → <i>lookup</i>(s,e) if symbol s is bound in environment e<br>[eval (T<sub>1</sub> T<sub>2</sub>) T<sub>3</sub>] → [combine [eval T<sub>1</sub> T<sub>3</sub>] T<sub>2</sub> T<sub>3</sub>]<br>[combine ⟨wrap T<sub>1</sub>⟩ T<sub>2</sub> T<sub>3</sub>] → [combine T<sub>1</sub> [eval T<sub>2</sub> T<sub>3</sub>] T<sub>3</sub>] .
</blockquote>
(There would also be some atomic constants representing primitive first-class functions and reserved operators such as <i>if</i>, and schemata specifying what they do.)
</p>
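<p>
The interpretation schemata are close enough to an interpreter that it seems worth exhibiting one. Here is a minimal Scheme sketch of the eval/combine split — not the calculus itself (no explicit substitutions, environments are assoc lists, and combinations are simplified to a single operand), and the names make-wrap, $quote, and kadr are inventions for the occasion.
</p>
<blockquote><pre>
(define (make-wrap f) (list 'wrap f))        ; ⟨wrap T⟩
(define (wrap? c) (and (pair? c) (eq? (car c) 'wrap)))

(define (lookup s env)
  (cond ((assq s env) => cdr)
        (else (error "unbound symbol" s))))

(define (evaluate t env)
  (cond ((symbol? t) (lookup t env))                  ; [eval s e] → lookup(s,e)
        ((pair? t)   (combine (evaluate (car t) env)  ; [eval (T1 T2) e] →
                              (cadr t) env))          ;   [combine [eval T1 e] T2 e]
        (else t)))                                    ; [eval d e] → d

(define (combine c operand env)
  (if (wrap? c)                                       ; [combine ⟨wrap T1⟩ T2 e] →
      (combine (cadr c) (evaluate operand env) env)   ;   [combine T1 [eval T2 e] e]
      (c operand env)))      ; unwrapped: a fexpr; operand arrives unevaluated

(define ground               ; a two-entry ground environment
  (list (cons '$quote (lambda (operand env) operand))             ; a fexpr
        (cons 'kadr (make-wrap (lambda (operand env) (cadr operand))))))

;; (evaluate '($quote (A B)) ground)        => (A B)  — operand taken as data
;; (evaluate '(kadr ($quote (A B))) ground) => B      — operand evaluated first
</pre></blockquote>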
<span style="font-size: large;" id="sec-ict-chaos">Half a century's worth of misunderstandings and confusions</span>
<p>
As I remarked earlier, Lisp as we know it might not have happened — at least, not when and where it did — if McCarthy had thought to allow first-class functions to occur in pairs. The thing is, though, I don't think it's all that much of an "accident". He <i>didn't</i> think to allow first-class functions to occur in pairs, and perhaps the reason we're likely to think to allow them today is that our thinking has been shaped by decades of the free-wheeling attitude fostered by the language that Lisp became <i>because</i> he didn't think to then. The actual sequence of events seems less unlikely than one might first suppose.
</p>
<p>
Researchers trying to set up semantics for Lisp have been led astray, persistently over the decades, by the fact that the primary Lisp constructor of first-class functions is called LAMBDA. Its behavior is <b>not</b> that of calculus λ, exactly because it's entangled with the process of interpreting Lisp source code. This becomes apparent when contemplating rewriting calculi for Lisp of the sort I've constructed above (and have discussed before on this blog): When you evaluate a LAMBDA expression you get a <i>wrapped</i> function, one that explicitly evaluates its operand and <i>then</i> passes the result to a computational function; that is, passes the result to a fexpr. Scan that: ordinary Lisp functions do not correspond directly to calculus λ-abstractions, but <b>fexprs</b> <i>do</i> correspond directly to calculus λ-abstractions. In its relation to Lisp, λ-calculus is a formal calculus of fexprs.
</p>
<p>
Much consternation has also been devoted to the perceived theoretical difficulty presented by Lisp's quotation operator (and presented in more extreme form by fexprs), because it presents a particular context that can distinguish any two S-expressions placed into it: (QUOTE S<sub>1</sub>) and (QUOTE S<sub>2</sub>) are observably distinct whenever the S<sub>k</sub> are distinct from each other. Yet, this observation only makes sense in a compiled programming language. Back in the day, it would have been an unremarkable observation that Lisp only has syntax for data structures, no syntax at all for control. Two syntactically distinct Lisp source expressions are operationally non-equivalent <i>even without any quotation or fexpr context</i>, because they don't represent programs at all; they're just passive data structures. The context that makes a source expression code rather than data is patently not in the source; it's in what program you send the source expression to. Conventional small-step operational semantics presumes the decision to compile, along with a trivial interpretive mapping between source expressions and internal computational terms (so the interpretive mapping doesn't have to appear explicitly in the rewriting schemata). It is true that, without any such constructs as quotation or fexprs, there would be no reason to treat the language as interpreted rather than compiled; but once you've crossed that <a href="https://en.wikipedia.org/wiki/Crossing_the_Rubicon">Rubicon</a>, the particular constructs like quotation or fexprs are mere fragments of, and can be distractions from, the main theoretical challenge of defining the semantics of an interpreted language.
</p>
<p>
The evolution of Lisp features has itself been a long process of learning how best to realize the flexibility offered by interpreted language. Fexprs were envisioned just about from the very beginning — 1960 — but were sabotaged by dynamic scope, a misfeature that resulted from early confusion over how to handle symbol bindings in an interpreter. Macros were introduced in 1963, and unlike fexprs they lend themselves to preprocessing at compile time if one chooses to use a compiler; but macros really <i>ought</i> to be much less mathematically elegant... except that in the presence of dynamic scope, fexprs are virulently unstable. Then there was the matter of first-class functions; that's an area where Lisp ought to have had a huge advantage; but first-class functions don't really come into their own without static scope (<a href="ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-453.pdf">The Art of the Interpreter</a> noted this) — and first-class functions also present a difficulty for compilers (which is why procedures in ALGOL were second-class). The upshot was that after Lisp 1.5, when Lisp splintered into multiple dialects, first-class functions went into eclipse until they reemerged in the mid-1970s when Scheme introduced static scope into the language. Fexprs held on longer but, ironically, were finally rejected by the Lisp community in 1980 — just a little ahead of the mainstream adoption of Scheme's static-scope innovation. So for the next twenty years and more, Lisp had static scope and first-class functions, but macros and no fexprs. Meanwhile, EVAL — key workhorse of meta-linguistic flexibility — was expunged from the new generation of mainstream Lisps and has had great difficulty getting back in.
</p>
<p>
The latter half of Lisp's history has been colored by a long-term trend in programming language design as a whole. I've alluded to this several times above. I have no specific sources to suggest for this; it's visible in the broad sweep of what languages were created, what research was done, and I've sensed it through my interactions with the Lisp community over the past thirty years. When I learned Lisp in the mid-1980s, it was from the <a href="http://maclisp.info/pitmanual/">Revised Maclisp Manual</a>, Saturday Evening Edition (which I can see a few feet away on my bookshelf as I write this, proof that manuals can be well-written). Maclisp was a product of the mentality of the 1970s. Scheme too was a product of that mentality. And what comes through to me now, looking back, isn't the differences between those languages (different though they are), but that those people <i>knew</i>, gut deep, that Lisp is an interpreted language — philosophically, regardless of the technical details of the language processing software. The classic paper I cited above for the relationship between first-class functions and static scope was one of the <a href="https://en.wikipedia.org/wiki/History_of_the_Scheme_programming_language#The_Lambda_Papers">"lambda" papers</a> associated with the development of Scheme: "The Art of the Interpreter". The classic textbook — the Wizard Book — that emerged from the Scheme design is <a href="https://mitpress.mit.edu/sicp/sicp.html">Structure and Interpretation of Computer Programs</a>.
</p>
<p>
But then things changed. Compilation had sometimes intruded into Lisp design, yes (with unfortunate results, as I've mentioned), but the intrusion became more systematic. Amongst Scheme's other achievements it had provided improved compilation techniques, a positive development but which also encouraged greater focus on the challenges of compilation. We refined our mathematical technology for language semantics of compiled languages, we devised complex type systems for use with compiled languages, more and more we designed our languages to fit these technologies — and as Lisp didn't fit, more and more we tweaked our Lisp dialects to try to make them fit. Of course some of the indigenous features of Lisp <i>couldn't</i> fit, because the mathematical tools were fundamentally incompatible with them (no pun intended). And somewhere along the line, somehow, we forgot, perhaps not entirely but enough, that Lisp is interpreted. Second-class syntax has lately been treated more and more as if it were a primary part of the language, rather than a distraction from the core design. Whatever merits such languages have, wholeheartedly embracing the interpretation design stance is not among them.
</p>
<p>
I'm a believer in trying more rather than less. I don't begrudge anyone their opportunity to follow the design path that speaks to them; but not all those paths speak to me. Second-class syntax doesn't speak to me, nor recasting Lisp into a compiled language. I'm interested in compiling Lisp, but want the language design to direct those efforts rather than the other way around. To me, the potential of interpretation beckons; the exciting things we've already found on that path suggest to me there's more to find further along, and the only way to know is to follow the path and see. To do that, it seems to me we have to recognize that it <i>is</i> a distinct path, the distinct philosophy of interpretation; and, in company with that, we need to hone our mathematical technology for interpreted languages.
</p>
<blockquote>
These are your father's parentheses<br>Elegant weapons<br>for a more... civilized age.
<blockquote>
— <a href="https://xkcd.com/297/">xkcd 297</a>.
</blockquote>
</blockquote>
<span style="font-size: large;">The co-hygiene principle</span>
<blockquote>
The mathematics is not there till we put it there.
<blockquote>
— <a href="https://en.wikipedia.org/wiki/Arthur_Eddington">Sir Arthur Eddington</a>, <i>The Philosophy of Physical Science</i>, 1938.
</blockquote>
</blockquote>
<p>
Investigating possible connections between seemingly unrelated branches of science and mathematics can be very cool. <i>Independent</i> of whether the connections actually pan out. It can be mind-bending either way — I'm a big fan of mind-bending, as a practical cure for rigid thinking — and you can get all sorts of off-beat insights into odd corners that get illuminated along the way. The more unlikely the connection, the more likely potential for mind bending; and also the more likely potential for pay-off if somehow it <i>does</i> pan out after all.
</p>
<p>
Two hazards you need to avoid, with this sort of thing: don't overplay the chances it'll pan out — and don't <i>under</i>play the chances it'll pan out. Overplay and you'll sound like a crackpot and, worse, you might turn yourself into one. Relish the mind bending, take advantage of it to keep your thinking limber, and don't get upset when you're not finding something that might not be there. And at the same time, if you're after something really unlikely, say with only one chance in a million it'll pan out, and you don't leave yourself open to the possibility it <i>will</i>, you might just draw that one chance in a million and <i>miss it</i>, which would be just awful. So treat the universe as if it has a sense of humor, and be prepared to roll with the punchlines.
</p>
<p>
Okay, the <i>particular</i> connection I'm chasing is an observed analogy between variable substitution in rewriting calculi and fundamental forces in physics. If you know enough about those two subjects to say that makes no sense, that's what I thought too when I first noticed the analogy. It kept bothering me, though, because it hooks into something on the physics side that's already notoriously anomalous — gravity. The general thought here is that when two seemingly disparate systems share some observed common property, there may be some sort of mathematical structure that can be used to describe both of them and gives rise to the observed common property; and a mathematical modeling structure that explains why gravity is so peculiar in physics is an interesting prospect. So I set out to understand the analogy better by testing its limits, elaborating it until it broke down. Except, the analogy has yet to cooperate by breaking down, even though I've now featured it on this blog twice (<a href="https://fexpr.blogspot.com/2014/04/why-is-beta-substitution-like-higgs.html">1</a>, <a href="https://fexpr.blogspot.com/2015/06/thinking-outside-quantum-box.html#sec-qt-hyg">2</a>).
</p>
<p>
So, building on the earlier explorations, in this post I tackle the problem from the other end, and try to devise a type of descriptive mathematical model that would give rise to the pattern observed in the analogy.
</p>
<p>
This sort of pursuit, as I go about it, is a game of endurance; again and again I'll lay out all the puzzle pieces I've got, look at them together, and try to accumulate a few more insights to add to the collection. Then gather up the pieces and save them away for a while, and come back to the problem later when I'm fresh on it again. Only this time I've kind-of <i>succeeded</i> in reaching my immediate goal. The resulting post, laying out the pieces and accumulating insights, is therefore both an explanation of where the result comes from and a record of the process by which I got there. There are lots of speculations within it shooting off in directions that aren't where I ended up. I pointedly left the stray speculations in place. Some of those tangents might turn out to be valuable after all; and taking them out would create a deceptive appearance of things flowing inevitably to a conclusion when, in the event, I couldn't tell whether I was going anywhere specific until I knew I'd arrived.
</p>
<p>
Naturally, for finding a particular answer — here, a mathematical structure that can give rise to the observed pattern — the reward is <i>more questions</i>.
</p>
<blockquote>
<b>Contents</b><br>
<a href="#sec-hp-noether">Noether's Theorem</a><br>
<a href="#sec-hp-calc">Calculi</a><br>
<a href="#sec-hp-analogy">Analogy</a><br>
<a href="#sec-hp-bell">Metatime</a><br>
<a href="#sec-hp-trans">Transformations</a><br>
<a href="#sec-hp-rewrite">Determinism and rewriting</a><br>
<a href="#sec-hp-foot">Nondeterminism and the cosmic footprint</a><br>
<a href="#sec-hp-mass">Massive interconnection</a><br>
<a href="#sec-hp-factor">Factorization</a><br>
<a href="#sec-hp-side">Side-effects</a><br>
<a href="#sec-hp-co">Co-hygiene</a><br>
<a href="#sec-hp-epi">Epilog: hygiene</a><!-- <br> -->
</blockquote>
<span style="font-size: large;" id="sec-hp-noether">Noether's Theorem</span>
<p>
<a href="https://en.wikipedia.org/wiki/Noether's_theorem">Noether's theorem</a> (pedantically, Noether's <i>first</i> theorem) says that each differentiable invariant in the action of a system gives rise to a conservation law. This is a particularly celebrated result in mathematical physics; it's explicitly about how properties of a system are implied by the mathematical structure of its description; and invariants — the current fad name for them in physics is "symmetries" — are close kin to both hygiene and geometry, which relate to each other through the analogy I'm pursuing; so Noether's theorem has a powerful claim on my attention.
</p>
<p>
The <i>action of a system</i> always used to seem very mysterious to me, until I figured out it's one of those deep concepts that, despite its depth, is also quite shallow. It comes from Lagrangian mechanics, a mathematical formulation of classical mechanics alternative to the <i>Newtonian mechanics</i> formulation. This sort of thing is ubiquitous in mathematics, alternative formulations that are provably equivalent to each other but make various problems much easier or harder to solve.
</p>
<p>
Newtonian mechanics seeks to describe the trajectory of a thing in terms of its position, velocity, mass, and the forces acting on it. This approach has some intuitive advantages but is sometimes beastly difficult to solve for practical problems. The Lagrangian formulation is sometimes much easier to solve. Broadly, the time evolution of the system follows a trajectory through abstract state-space, and a function called the Lagrangian of the system maps each state into a quantity that... er... well, its units are those of energy. For each possible trajectory of the system through state-space, the path integral of the Lagrangian is the <i>action</i>. The <i>principle of least action</i> says that starting from a given state, the system will evolve along the trajectory that minimizes the action. Solving for the behavior of the system is then a matter of finding the trajectory whose action is smallest. (How do you solve for the trajectory with least action? Well, think of the trajectories as abstract values subject to variation, and imagine taking the "derivative" of the action over these variations. The least action will be a local minimum, where this derivative is zero. There's a whole mathematical technology for solving problems of just that form, called the "calculus of variations".)
</p>
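<p>
In symbols — standard Lagrangian mechanics, nothing specific to this discussion, written for a single coordinate q(t) — the action is the integral of the Lagrangian along the trajectory, and the "derivative zero" condition from the calculus of variations is the Euler–Lagrange equation:
</p>
<blockquote>
S[q] = ∫ L(q, q̇, t) dt ,&nbsp;&nbsp;&nbsp; (d/dt) ∂L/∂q̇ − ∂L/∂q = 0 .
</blockquote>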
<p>
The Lagrangian formulation tends to be good for systems with conserved quantities; one might prefer the Newtonian approach for, say, a block sliding on a surface with friction acting on it. And this Lagrangian affinity for conservative systems is where Noether's theorem comes in: if there's a differentiable symmetry of the action — no surprise it has to be differentiable, seeing how central integrals and derivatives are to all this — the symmetry manifests itself in the system behavior as a conservation law.
</p>
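<p>
The stock example (standard textbook fare, again for a single coordinate): if L has no explicit dependence on t — so the action is invariant under time translation — Noether's theorem yields conservation of the quantity
</p>
<blockquote>
E = q̇ (∂L/∂q̇) − L ,
</blockquote>
<p>
the energy; likewise invariance under spatial translation gives conservation of momentum, and invariance under rotation, angular momentum.
</p>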
<p>
And what, you may ask, is this magical Lagrangian function, whose properties studied through the calculus of variations reveal the underlying conservation laws of nature? Some deeper layer of reality, the secret structure that underlies all? Not exactly. The Lagrangian function is <i>whatever works</i>: some function that causes the principle of least action to correctly predict the behavior of the system. In quantum field theory — so I've heard, having so far never actually grappled with QFT myself — the Lagrangian approach works for some fields but there is no Lagrangian for others. (Yes, Lagrangians are one of those mathematical devices from classical physics that treats systems in such an abstract, holistic way that it's applicable to quantum mechanics. As usual for such devices, its history involves Sir William Rowan Hamilton, who keeps turning up on this blog.)
</p>
<p>
This is an important point: the Lagrangian is whatever function makes the least-action principle work right. It's not "really there", except in exactly the sense that if you can devise a Lagrangian for a given system, you can then use it via the action integral and the calculus of variations to describe the behavior of the system. Once you have a Lagrangian function that does in fact produce the system behavior you want it to, you can learn things about that behavior from mathematical exploration of the Lagrangian. Such as Noether's theorem. When you find there is, or isn't, a certain differentiable symmetry in the action, that tells you something about what is or isn't conserved in the behavior of the system, and that result really may be of great interest; just don't lose sight of the fact that you <i>started</i> with the behavior of the system and constructed a suitable Lagrangian from which you are now deducing things about what the behavior does and doesn't conserve.
</p>
<p>
In 1543, Copernicus's heliocentric magnum opus <i><a href="https://en.wikipedia.org/wiki/De_revolutionibus_orbium_coelestium">De revolutionibus orbium coelestium</a></i> was published with an unsigned preface by Lutheran theologian <a href="https://en.wikipedia.org/wiki/Andreas_Osiander">Andreas Osiander</a> saying, more or less, that of course it'd be absurd to suggest the Earth <i>actually</i> goes around the Sun but it's a very handy fiction for the mathematics. Uhuh. It's unnecessary to ask whether our mathematical models are "true"; we don't need them to be true, just useful. When Francis Bacon <a href="https://en.wikisource.org/wiki/Novum_Organum/Book_II_%28Wood%29">remarked</a> that what is most useful in practice is most correct in theory, he had a point — at least, for practical purposes.
</p>
<span style="font-size: large;" id="sec-hp-calc">Calculi</span>
<p>
The rewriting-calculus side of the analogy has a <i>structural</i> backstory from at least the early 1960s (some of which I've described in an earlier post, though with a <a href="https://fexpr.blogspot.com/2014/03/continuations-and-term-rewriting-calculi.html#sec-contin-glob">different emphasis</a>). <a href="https://en.wikipedia.org/wiki/Christopher_Strachey">Christopher Strachey</a> hired <a href="https://en.wikipedia.org/wiki/Peter_Landin">Peter Landin</a> as an assistant, and encouraged him to do side work exploring formal foundations for programming languages. Landin focused on tying program semantics to λ-calculus; but this approach suffered from several mismatches between the behavioral properties of programming languages versus λ-calculus, and in 1975 <a href="https://en.wikipedia.org/wiki/Gordon_Plotkin">Gordon Plotkin</a> published a solution for one of these mismatches, in one of the all-time classic papers in computer science, "Call-by-name, call-by-value and the λ-calculus" (<a href="https://homepages.inf.ed.ac.uk/gdp/publications/cbn_cbv_lambda.pdf">pdf</a>). Plotkin defined a slight variant of λ-calculus, by altering the conditions for the <i>β</i>-rule so that the calculus became <i>call-by-value</i> (the way most programming languages behaved while ordinary λ-calculus did not), and proved that the resulting λ<sub>v</sub>-calculus was fully <a href="https://fexpr.blogspot.com/2013/07/explicit-evaluation.html#sec-expev-dram">Church-Rosser</a> ("just as well-behaved" as ordinary λ-calculus). He further set up an operational semantics — a rewriting system that ignored mathematical well-behavedness in favor of obviously describing the correct behavior of the programming language — and proved a set of correspondence theorems between the operational semantics and λ<sub>v</sub>-calculus.
</p>
<blockquote><i>[In the preceding paragraph I perhaps should have mentioned </i><a href="https://fexpr.blogspot.com/2013/07/explicit-evaluation.html#sec-expev-dram">compatibility</a><i>, the other crucial element of rewriting well-behavedness; which you might think I'd have thought to mention since it's a big deal in my own work, though less flashy and more taken-for-granted than Church-Rosser-ness.]</i></blockquote>
<p>
Then in the 1980s, <a href="https://en.wikipedia.org/wiki/Matthias_Felleisen">Matthias Felleisen</a> applied Plotkin's approach to some of the most notoriously "unmathematical" behaviors of programs: <i>side-effects</i> in both data (mutable variables) and control (<code>goto</code> and its ilk). Like Plotkin, he set up an operational semantics and a calculus, and proved correspondence theorems between them, and well-behavedness for the calculus. He introduced the major structural innovation of treating a side-effect as an explicit syntactic construct that could <i>move upward</i> within its term. This upward movement would be a fundamentally different kind of rewrite from the function-application — the <i>β</i>-rule — of λ-calculus; abstractly, a side-effect is represented by a context 𝓢, which moves upward past some particular context C and, in the process, modifies C to leave in its wake some other context C': C[𝓢[<i>T</i>]] → 𝓢[C'[<i>T</i>]] . A side-effect is thus viewed as something that starts in a subterm and expands outward to affect more and more of the term until, potentially, it affects the whole term — if it's <i>allowed</i> to expand that far. Of course, a side-effect might never expand that far if it's trapped inside a context that it can't escape from; notably, no side-effect can escape from context λx.[ ] , which is to say, no side-effect can escape from inside the body of a function that hasn't been called.
</p>
<p>
This is where I started tracking the game, and developing my own odd notions. There seemed to me to be two significant drawbacks to Felleisen's approach, in its original published form. For one thing, the transformation of context C to C', as 𝓢 moved across it, could be quite extensive; Felleisen himself aptly called these transformations "bubbling up"; as an illustration of how messy things could get, here are the rules for a continuation-capture construct <b>C</b> expanding out of the operator or operand of a function call:
<blockquote>
(<b>C</b>T<sub>1</sub>)T<sub>2</sub> → <b>C</b>(λx<sub>1</sub>.(T<sub>1</sub>(λx<sub>2</sub>.(<b>A</b>(x<sub>1</sub>(x<sub>2</sub>T<sub>2</sub>)))))) for unused x<sub>k</sub>.<br>V(<b>C</b>T) → <b>C</b>(λx<sub>1</sub>.(T(λx<sub>2</sub>.(<b>A</b>(x<sub>1</sub>(V‍x<sub>2</sub>)))))) for unused x<sub>k</sub>.
</blockquote>
The other drawback to the approach was that as published at the time, it didn't actually provide the full measure of well-behavedness from Plotkin's treatment of call-by-value. One way or another, a constraint had to be relaxed somewhere. What does the side-effect construct 𝓢 do once it's <i>finished</i> moving upward? The earliest published solution was to wait till 𝓢 reaches the top of the term, and then get rid of it by a whole-term rewriting rule; that works, but the whole-term rewriting rule is explicitly not well-behaved: calculus well-behavedness requires that any rewriting on a whole term can also be done to a subterm, and here we've deliberately introduced a rewriting rule that can't be applied to subterms. So we've weakened the calculus well-behavedness. Another solution is to let 𝓢 reach the top of the term, then let it settle into some sort of normal form, and relax the semantics–calculus correspondence theorems to allow for equivalent normal forms. So the correspondence is weaker or, at least, more complicated. A third solution is to introduce an explicit context-marker — in both the calculus <i>and</i> the operational semantics — delimiting the possible extent of the side-effect. So you've got full well-behavedness but for a <i>different language</i> than you started out with. (Felleisen's exploration of this alternative is part of the prehistory of delimited continuations, but that's another story.)
</p>
<blockquote><i>[In a galling flub, I'd written in the preceding paragraph </i>Church-Rosser-ness<i> instead of </i>well-behavedness<i>; fixed now.]</i></blockquote>
<p>
It occurred to me that a single further innovation should be able to eliminate both of these drawbacks. If each side-effect were delimited by a context-marker that can move upward in the term, just as the side-effect itself can, then the delimiter would restore full Church-Rosser-ness without altering the language behavior; but, in general, the meanings of the delimited side-effects depend on the placement of the delimiter, so to preserve the meaning of the term, moving the delimiter may require some systematic alteration to the matching side-effect markers. To support this, let the delimiter be a <i>variable-binding construct</i>, with free occurrences of the variable in the side-effect markers. The act of moving the delimiter would then involve a sort of substitution function that propagates needed information to matching side-effect markers. What with one thing and another, my academic pursuits dragged me away from this line of thought for years, but then in the 2000s I found myself developing an operational semantics and calculus as part of my dissertation, in order to demonstrate that <a href="https://fexpr.blogspot.com/2011/04/fexpr.html">fexprs</a> really are well-behaved (though I should have anticipated that some people, having been taught otherwise, would refuse to believe it even <i>with</i> proof). So I seized the opportunity to also elaborate my binding-delimiters approach to things that — unlike fexprs — really are side-effects.
</p>
<p>
This second innovation rather flew in the face of a tradition going back about seven or eight decades, to the invention of λ-calculus. <a href="https://en.wikipedia.org/wiki/Alonzo_Church">Alonzo Church</a> was evidently quite concerned about what variables mean; he maintained that a proposition with free variables in it doesn't have a clear meaning, and he wanted to have just one variable-binding construct, λ, whose <i>β</i>-rule defines the practical meanings of all variables. This tradition of having just one kind of variable, one binding construct, and one kind of variable-substitution (<i>β</i>-substitution) has had a powerful grip on researchers' imaginations for generations, to the point where even when other binding constructs are introduced they likely still have most of the look-and-feel of λ. My side-effect-ful variable binders are distinctly un-λ-like, with rewriting rules, and substitution functions, bearing no strong resemblance to the <i>β</i>-rule. Freedom from the <i>β</i> mold had the gratifying effect of allowing much simpler rewriting rules for moving upward through a term, without the major perturbations suggested by the term "bubbling up"; but, unsurprisingly, the logistics of a wild profusion of new classes of variables were not easy to work out. Much elegant mathematics surrounding λ-calculus rests squarely on the known simple properties of its particular take on variable substitution. The chapter of my dissertation that grapples with the generalized notion of substitution (Chapter 13, "Substitutive Reduction Systems", for anyone <a href="https://www.wpi.edu/Pubs/ETD/Available/etd-090110-124904/">keeping score</a>) has imho appallingly complicated foundations, although the high-level theorems at least are satisfyingly powerful. One thing that did work out neatly was enforcement of variable hygiene, which in ordinary λ-calculus is handled by <i>α</i>-renaming. In order to apply any nontrivial term-rewriting rule without disaster, you have to first make sure there aren't some two variables using the same name whose distinction from each other would be lost during the rewrite. It doesn't matter, really, what sort of variables are directly involved in the rewrite rule: an unhygienic rewrite could mess up variables that aren't even mentioned by the rule. Fortunately, it's possible to define a master <i>α</i>-renaming function that recurses through the term renaming variables to maintain hygiene, and whenever you add a new sort of variable to the system, just extend the master function with particular cases for that new sort of variable. Each rewriting rule can then invoke the master function, and everything works smoothly.
</p>
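<p>
For readers who haven't met hygiene machinery concretely: here is the textbook capture-avoiding substitution for bare λ-calculus, in Scheme — a far cry from the dissertation's generalized substitutive reduction systems, but it shows the characteristic move, an <i>α</i>-renaming triggered exactly when a rewrite would otherwise lose a variable distinction. (The term representation and the name fresh are mine.)
</p>
<blockquote><pre>
(define counter 0)
(define (fresh x)                     ; x -> x.1, x.2, ...
  (set! counter (+ counter 1))
  (string->symbol
   (string-append (symbol->string x) "." (number->string counter))))

(define (free? x t)                   ; does x occur free in t?
  (case (car t)
    ((var) (eq? x (cadr t)))
    ((lam) (and (not (eq? x (cadr t))) (free? x (caddr t))))
    ((app) (or (free? x (cadr t)) (free? x (caddr t))))))

(define (subst t x s)                 ; t[x ← s]
  (case (car t)
    ((var) (if (eq? (cadr t) x) s t))
    ((app) (list 'app (subst (cadr t) x s) (subst (caddr t) x s)))
    ((lam)
     (let ((y (cadr t)) (body (caddr t)))
       (cond ((eq? y x) t)            ; x rebound here: stop
             ((free? y s)             ; would capture: α-rename first
              (let ((z (fresh y)))
                (list 'lam z (subst (subst body y (list 'var z)) x s))))
             (else (list 'lam y (subst body x s))))))))
</pre></blockquote>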
<p>
I ended up with four classes of variables. "Ordinary" variables, of the sort supported by λ-calculus, I found were actually wanted only for a specific (and not even technically necessary) purpose: to support partial evaluation. You could build the whole calculus without them and everything would work right, but the equational theory would be very weak. (I blogged on this point in detail <a href="https://fexpr.blogspot.com/2014/03/continuations-and-term-rewriting-calculi.html#sec-contin-pe">here</a>.) A second class of variable supported continuations; in effect, the side-effect marker was a "throw" and the binding construct was a "catch". Mutable state was more complicated, involving two classes of variables, one for assignments and one for lookups. The variables for assignment were actually environment identities; each assignment side-effect would then specify a value, a symbol, and a free variable identifying the environment. The variables for lookup stood for individual environment-symbol queries; looking up a symbol in an environment would generate queries for that environment and each of its ancestor environments. The putative result of the lookup would be a leaf subterm with free variable occurrences for all the queries involved, waiting to assimilate the query results, while the queries themselves would rise through the term in search of matching assignments. Whenever a query found a matching assignment, it would self-annihilate while using substitution to report the result to all waiting free variable occurrences.
</p>
<p>
Does all this detail matter to the analogy with physics? Well, that's the question, isn't it. There's a lot <i>there</i>, a great deal of fodder to chew on when considering how an analogy with something else might have a structural basis.
</p>
<span style="font-size: large;" id="sec-hp-analogy">Analogy</span>
<p>
Amongst the four classes of variables, partial-evaluation variables have a peculiar sort of... symmetry. If you constructed a vau-calculus with, say, only continuation variables, you'd still have two different substitution functions — one to announce that a delimiting "catch" has been moved upward, and one for <i>α</i>-renaming. If you constructed a vau-calculus with only mutable-state variables, you'd have, well, a bunch of substitution functions, but in particular all the substitutions used to enact rewriting operations would be separate from <i>α</i>-renaming. <i>β</i>-substitution, though, is commensurate with <i>α</i>-renaming; once you've got <i>β</i>-substitution of partial-evaluation variables, you can use it to <i>α</i>-rename them as well, which is why ordinary λ-calculus has, apparently at least, only one substitution function.
</p>
<p>
Qualitatively, partial-evaluation variables seem more integrated into the fabric of the calculus, in contrast to the other classes of variables.
</p>
<p>
All of which put me powerfully in mind of physics because it's a familiar observation that <i>gravity</i> seems qualitatively more integrated into the fabric of spacetime, in contrast to the other fundamental forces (<a href="https://xkcd.com/1489/">xkcd</a>). General relativity portrays gravity as the shape of spacetime, whereas the other forces merely propagate <i>through</i> spacetime, and a popular strategy for aspiring TOEs (Theories Of Everything) is to integrate the other fundamental forces into the geometry as well — although, looking at the analogy, perhaps that popular strategy isn't such a good idea after all. Consider: The analogy isn't just between partial-evaluation variables and gravity. It's between the <i>contrast</i> of partial-evaluation variables against the other classes of variables, and the <i>contrast</i> of gravity against the other fundamental forces. All the classes of variables, and all the fundamental forces, are to some extent involved. I've already suggested that Felleisen's treatment of side-effects was both weakened and complicated by its too-close structural imitation of λ, whereas a less λ-like treatment of side-effects can be both stronger and simpler; so, depending on how much structure carries through the analogy, perhaps trying to treat the other fundamental forces too much like gravity should be expected to weaken and complicate a TOE.
</p>
<p>
Projecting through the analogy suggests alternative ways to structure theories of physics, which imho is worthwhile independent of whether the analogy is deep or shallow; as I've remarked <a href="https://fexpr.blogspot.com/2015/06/thinking-outside-quantum-box.html#sec-qt-how">before</a>, I actively look for disparate ways of thinking as a broad base for basic research. The machinery of calculus variable hygiene, with which partial-evaluation variables have a special affinity, is only one <i>facet</i> of term structure; and projecting this through to fundamental physics, where gravity has a special affinity with geometry, this suggests that geometry itself might usefully be thought of, not as the venue where physics takes place, but merely as part of the rules by which the game is played. Likewise, the different kinds of variables differ from each other by the kinds of structural transformations they involve; and projecting <i>that</i> through the analogy, one might try to think of the fundamental forces as differing from each other not (primarily) by some arbitrary rules of combination and propagation, but by being different kinds of structural manipulations of reality. Then, if there <i>is</i> some depth to the analogy, one might wonder if some of the particular technical contrasts between different classes of variables might be related to particular technical contrasts between different fundamental forces — which, frankly, I can't imagine deciphering until and unless one first sets the analogy on a solid technical basis.
</p>
<p>
I've speculated several times on this blog on the role of non-locality in physics. Bell's Theorem says that the statistical distribution of quantum predictions cannot be explained by any local, deterministic theory of physics <i>if</i>, by 'local and deterministic', you mean 'evolving <i>forward in time</i> in a local and deterministic way'; but it's quite possible to generate this same statistical distribution of spacetime predictions using a theory that evolves locally and deterministically in a fifth dimension orthogonal to spacetime. Which strikes a familiar chord through the analogy with calculus variables, because non-locality is, qualitatively at least, the defining characteristic of what we mean by "side-effects", and the machinery of <i>α</i>-renaming maintains hygiene for these operations exactly by going off and doing some term rearranging on the side (as if in a separate dimension of rewriting that we usually don't bother to track). Indeed, thought of this way, a "variable" seems to be an inherently distributed entity, spread over a region of the term — called its <i>scope</i> — rather than located at a specific point. A variable instance might appear to have a specific location, but only because we look at a concrete syntactic term; naturally we have to have a particular concrete term in order to write it down, but somehow this doesn't seem to do justice to the reality of the hygiene machinery. One could think of an equivalence class of terms under <i>α</i>-renaming, but imho even that is a bit passive. The reality of a variable, I've lately come to think, is a dynamic distributed entity weaving through the term, made up of the binding construct (such as a λ), all the free instances within its scope, and the living connections that tie all those parts together; I imagine if you put your hand on any part of that structure you could feel it humming with vitality.
</p>
<span style="font-size: large;" id="sec-hp-bell">Metatime</span>
<p>
To give a slightly less hand-wavy description of my <a href="https://fexpr.blogspot.com/2012/12/metaclassical-physics.html">earlier post</a> on Bell's Theorem — since it <i>is</i> the most concrete example we have to inform our view of the analogy on the physics side:
</p>
<p>
Bell looked at a refinement of the experiment from the <a href="https://en.wikipedia.org/wiki/EPR_paradox">EPR paradox</a>. A device emits two particles with entangled spin, which shoot off in opposite directions, and their spins are measured by oriented detectors at some distance from the emitter. The original objection of Einstein, Podolsky, and Rosen was that the two measurements are correlated with each other, but because of the distance between the two detectors, there's no way for information about either measurement to get to where the other measurement takes place without "spooky action at a distance". Bell refined this objection by noting that the correlation of spin measurements depends on the angle <i>θ</i> between the detectors. If you suppose that the orientations of the detectors at measurement are not known at the time and place where the particles are emitted, and that the outcomes of the measurements are determined by <i>some</i> sort of information — "hidden variable" — propagating from the emission event at no more than the speed of light, then there are limits (called Bell's Inequality) on how the correlation can be distributed as a function of <i>θ</i>, no matter what the probability distribution of the hidden variable. The distribution predicted by quantum mechanics violates Bell's Inequality; so if the actual probability distribution of outcomes from the experiment matches the quantum mechanical prediction, we're living in a world that can't be explained by a local hidden-variable theory.
</p>
<p>
My point was that this whole line of reasoning supposes the state of the world evolves forward in time. If it doesn't, then we have to rethink what we even <i>mean</i> by "locality", and I did so. Suppose our entire four-dimensional reality is generated by evolving over a fifth dimension, which we might as well call "metatime". "Locality" in this model means that information about the state of one part of spacetime takes a certain interval of metatime to propagate a certain distance to another part of spacetime. Instead of trying to arrange the probability distribution of a hidden variable at the emission event so that it will propagate through time to produce the desired probability distribution of measurements — which doesn't work unless quantum mechanics is seriously wrong about this simple system — we can start with some simple, uniform probability distribution of possible versions of <i>the entire history of the experiment</i>, and by suitably arranging the rules by which spacetime evolves, we can arrange that eventually spacetime will settle into a stable state where the probability distribution is just what quantum mechanics predicts. In essence, it works like this: let the history of the experiment be random (we don't need nondeterminism here; this is just a statement of uniformly unknown initial conditions), and suppose that the apparent spacetime "causation" between the emission and the measurements causes the two measurements to be compared to each other. Based on <i>θ</i>, let some hidden variable decide whether this version of history is stable; and if it isn't stable, just scramble up a new one (we can always do that by pulling it out of the uniform distribution of the hidden variable, without having to posit fundamental nondeterminism). By choosing the rule for how the hidden variable interacts with <i>θ</i>, you can cause the eventual stable history of the experiment to exhibit any probability distribution you choose.
</p>
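<p>
To pin down the mechanics of that scramble-until-stable process, here is a minimal sketch (again my own illustration; reducing a "history" to a bare pair of outcomes, and the particular stability rule, are simplifications for brevity). It is ordinary rejection sampling: scramble up a whole candidate history, let a uniform hidden variable decide stability as a function of <i>θ</i>, and rescramble until stable. Plugging in the quantum prediction as the target makes the stable histories exhibit exactly that distribution; plugging in any other target would work just as well.
</p>
<pre>
import math, random

def target_probability(outcome_a, outcome_b, theta):
    # Quantum prediction for the spin singlet: unequal outcomes carry total
    # probability cos^2(theta/2), equal outcomes sin^2(theta/2); each of the
    # four outcome pairs gets half of its case's share.
    if outcome_a == outcome_b:
        return 0.5 * math.sin(theta / 2) ** 2
    return 0.5 * math.cos(theta / 2) ** 2

def stable_history(theta):
    while True:
        # Scramble up a candidate version of the entire history of the
        # experiment; here a "history" is just the pair of measurements.
        history = (random.choice((-1, 1)), random.choice((-1, 1)))
        # A uniform hidden variable decides, based on theta, whether this
        # version of history is stable.  (Acceptance probabilities are the
        # target probabilities rescaled so the largest is at most 1.)
        if 2.0 * target_probability(history[0], history[1], theta) > random.random():
            return history

theta = math.pi / 3
trials = 50_000
agree = 0
for _ in range(trials):
    a, b = stable_history(theta)
    agree += (a == b)
print(f"P(same outcome) ~ {agree / trials:.3f}  "
      f"(quantum prediction {math.sin(theta / 2) ** 2:.3f})")
</pre>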
<p>
That immense power is something to keep a cautious eye on: not only can this technique produce the probability distribution predicted by quantum mechanics, it can produce any other probability distribution as well. So, if the general structure of this mathematical theory determines something about the structure of the physical reality it depicts, what it determines is apparently not, in any very straightforward fashion, that probability distribution.
</p>
<span style="font-size: large;" id="sec-hp-trans">Transformations</span>
<p>
The side of the analogy we have prior detailed structural knowledge about is the vau-calculus side. Whatever useful insights we may hope to extract from the metatime approach to Bell's Theorem, it's very sketchy compared to vau-calculus. So if we want to work out a structural pattern that applies to both sides of the analogy, it's plausible to start building from the side we know about, questioning and generalizing as we go along. To start with,
<ul><li>Suppose we have a complex system, made up of interconnected parts, evolving by some sort of transformative steps according to some simple rules.</li></ul>
Okay, freeze frame. Why should the system be made up of parts? Well, in physics it's (almost) always the parts we're interested in. We ourselves are, apparently, parts of reality, and we interact with parts of reality. Could we treat the whole as a unit and then somehow temporarily pull parts out of it when we need to talk about them? Maybe, but the form with parts is still the one we're primarily interested in. And what about "transformative steps"; do we want discrete steps rather than continuous equations? Actually, yes, that <i>is</i> my reading of the situation; not only does fundamental physics appear to be shot through with discreteness (I expanded on this point <a href="https://fexpr.blogspot.com/2015/05/numberless-differentiation.html#sec-ndif-dcp">a while back</a>), but the particular treatment I used for my metatime proof-of-concept (above) used an open-ended sequence of discrete trials to generate the requisite probability distribution. If a more thoroughly continuous treatment is really wanted, one might try to recover continuity by taking a limit a la calculus.
<ul><li>Suppose we separate the transformation rules into two groups, which we call <i>bookkeeping</i> rules and <i>operational</i> rules; and suppose we have a set of exclusive criteria on system configurations, call these <i>hygiene conditions</i>, which must be satisfied before any operational rule can be applied.</li></ul></p>
Freeze again. At first glance, this looks pretty good. From any unhygienic configuration, we can't move forward operationally until we've done bookkeeping to ensure hygiene. Both calculus rewriting and the metatime proof-of-concept seemingly conform to this pattern; but the two cases differ profoundly in how their underlying hygiene (supposing that's what it is, in the physics case) affects the form of the modeled system, and we'll need to consider the difference carefully if we mean to build our speculations on a sound footing.
</p>
<span style="font-size: large;" id="sec-hp-rewrite">Determinism and rewriting</span>
<p>
Hygiene in rewriting is all about preserving properties of a term (to wit, variable instance–binding correspondences), whereas our proof-of-concept metatime transformations don't appear to be about perfectly preserving something but rather about shaping probability distributions. One might ask whether it's possible to set up the internals of our metatime model so that the probability distribution is a consequence, or symptom, of conserving something behind the scenes. Is the seemingly nondeterministic outcome of a quantum observation, in a supposedly small quantum system, actually dictated by the need to maintain some cosmic balance that can't be directly observed because it's distributed over a ridiculously large number of entities (such as the number of electrons in the universe)? That could lead to some bracing questions about how to usefully incorporate such a notion into a mathematical theory.
</p>
<p>
As an alternative, one might decide that the probability distribution in the metatime model should not be a consequence of absolutely preserving a condition. There are two philosophically disparate sorts of models involving probabilities: either the probability comes from our lack of knowledge (the hidden-variable hypothesis), and in the underlying model the universe is computing an inevitable outcome; or the probability is in the foundations (God playing dice, in the Einsteinian phrase), and in the underlying model the universe is exploring the range of possible outcomes. I discussed this same distinction, in another form, in an earlier post, where it emerged as the defining philosophical distinction between a <i>calculus</i> and a <i>grammar</i> (<a href="https://fexpr.blogspot.com/2015/05/numberless-differentiation.html#sec-ndif-caus">here</a>). In those terms, if our physics model is fundamentally deterministic then it's a calculus and by implication has that structural affinity with the vau-calculi on the other side of the analogy; but if our physics model is fundamentally nondeterministic then it's a <i>grammar</i>, and our analogy has to try to bridge that philosophical gap. Based on past experience, though, I'm highly skeptical of bridging the gap; if the analogy can be set on a concrete technical basis, the TOE on the physics side seems to me likely to be foundationally deterministic.
</p>
<p>
The foundationally deterministic approach to probability is to start with a probabilistic distribution of deterministic initial states, and evolve them all forward to produce a probabilistic distribution of deterministic final states. Does the traditional vau-calculus side of our analogy, where we have so much detail to start with, have anything to say about this? In the most prosaic sense, one suspects not; probability distributions don't traditionally figure into deterministic computation semantics, where this approach would mean considering <a href="https://en.wikipedia.org/wiki/Fuzzy_set">fuzzy sets</a> of terms. There may be some insight lurking, though, in the origins of calculus hygiene.
</p>
<p>
When Alonzo Church's 1932/3 formal logic turned out to be inconsistent, he tried to back off and find some subset of it that was provably consistent. Here <i>consistent</i> meant that not all propositions are equivalent to each other, and the subset of the logic that he and his student <a href="https://en.wikipedia.org/wiki/J._Barkley_Rosser">J. Barkley Rosser</a> proved consistent in this sense was what we now call λ-calculus. The way they did it was to show that if any term T<sub>1</sub> can be reduced in the calculus in two different ways, as T<sub>2</sub> and T<sub>3</sub>, then there must be some T<sub>4</sub> that both of them can be reduced to. Since <i>logical equivalence</i> of terms is defined as the smallest congruence generated by the rewriting relation of the calculus, from the Church-Rosser property it follows that if two terms are equivalent, there must be some term that they both can be reduced to; and therefore, two different irreducible terms cannot possibly be logically equivalent to each other.
</p>
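<p>
As a small worked illustration of the Church-Rosser property (my example, not Church and Rosser's): the term ((λx.(x x)) ((λy.y) z)) contains two redexes, and may be reduced starting from either one, but both routes arrive at the same irreducible term.
</p>
<blockquote>
((λx.(x x)) ((λy.y) z)) → ((λx.(x x)) z) → (z z) , reducing the inner redex first;<br>
((λx.(x x)) ((λy.y) z)) → (((λy.y) z) ((λy.y) z)) → (z ((λy.y) z)) → (z z) , reducing the outer redex first.
</blockquote>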
<p>
Proving the Church-Rosser theorem for λ-calculus was not, originally, a simple matter. It took three decades before a simple proof began to circulate, and the theorem for variant calculi continues to be a challenge. And this is (in one view of the matter, at least) where hygiene comes into the picture. Church had three major rewriting rules in his system, later called the <i>α</i>, <i>β</i>, and <i>γ</i> rules. The <i>α</i> rule was the "bookkeeping" rule; it allowed renaming a bound variable as long as you don't lose its distinction from other variables in the process. The <i>β</i> rule is now understood as the single operational rule of λ-calculus, how to apply a function to an argument. The <i>γ</i> rule is mostly forgotten now; it was simply doing a <i>β</i>-step backward, and was later dropped in favor of starting with just <i>α</i> and <i>β</i> and then taking the congruent closure (reflexive, symmetric, transitive, and compatible). Ultimately the Church-Rosser theorem allows terms to be sorted into <i>β</i>-equivalence classes; but the terms in each class aren't generally thought of as "the same term", just "equivalent terms". <i>α</i>-equivalent terms, though, are much closer to each other, and for many purposes would actually be thought of as "the same term, just written differently". Recall my earlier description of a variable as a distributed entity, weaving through the term, made up of binding construct, instances, and living connections between them. If you have a big term, shot through with lots of those dynamic distributed entities, the interweaving could really be vastly complex. So factoring out the <i>α</i>-renaming is itself a <i>vast</i> simplification, which for a large term may dwarf what's left after factoring to complete the Church-Rosser proof. To see by just <i>how much</i> the bookkeeping might dwarf the remaining operational complexity, imagine scaling the term up to the sort of cosmological scope mentioned earlier — like the number of electrons in the universe.
</p>
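<p>
Hygiene's bookkeeping is easy to exhibit in miniature. Below is a sketch of capture-avoiding substitution over a toy encoding of λ-calculus terms as nested Python tuples (the encoding and the helper names are my own, chosen for brevity, not any standard library). The renaming in the binder case is precisely the <i>α</i>-bookkeeping: it keeps every variable's instance–binding correspondences intact so that the operational work can proceed obliviously.
</p>
<pre>
import itertools

def free_vars(term):
    # A term is ('var', name), ('lam', name, body), or ('app', fun, arg).
    tag = term[0]
    if tag == 'var':
        return {term[1]}
    if tag == 'lam':
        return free_vars(term[2]) - {term[1]}
    return free_vars(term[1]) | free_vars(term[2])

def fresh_name(avoid):
    # Bookkeeping helper: pick a variable name distinct from everything in avoid.
    for i in itertools.count():
        name = f"v{i}"
        if name not in avoid:
            return name

def substitute(term, name, replacement):
    """Return term[name ← replacement], alpha-renaming to avoid capture."""
    tag = term[0]
    if tag == 'var':
        return replacement if term[1] == name else term
    if tag == 'app':
        return ('app', substitute(term[1], name, replacement),
                       substitute(term[2], name, replacement))
    binder, body = term[1], term[2]
    if binder == name:
        return term            # the substituted variable is shadowed here
    if binder in free_vars(replacement):
        # The alpha-renaming (bookkeeping) step: freshen the bound variable
        # before descending, so free variables of the replacement keep their
        # existing binding connections instead of being captured.
        new_binder = fresh_name(free_vars(body) | free_vars(replacement) | {name})
        body = substitute(body, binder, ('var', new_binder))
        binder = new_binder
    return ('lam', binder, substitute(body, name, replacement))

# Substituting x for y under (lam x. y): without renaming, x would be
# captured; with hygiene, the binder is freshened first.
print(substitute(('lam', 'x', ('var', 'y')), 'y', ('var', 'x')))
# -> ('lam', 'v0', ('var', 'x'))
</pre>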
<p>
It seems worth considering that hygiene may be a natural consequence of a certain kind of factoring of a vastly interconnected system: you sequester almost all of the complexity into bookkeeping, with terrifically simple rules applied on an inhumanly staggering scale, leaving comparatively nontrivial operational rules that never have to deal directly with the sheer scale of the system because that part of the complexity was factored into the bookkeeping. In that case, at some point we'll need to ask when and why that sort of factoring is possible. Maybe it <i>isn't</i> really possible for the cosmos, and a flaw in our physics is that we've been trying so hard to factor things this way; when we really dive into that question we'll be in deep waters.
</p>
<p>
It's now no longer clear, btw, that geometry corresponds quite directly to <i>α</i>-renaming. There was already some hint of that in the view of vau-calculus side-effects as "non-local", which tends to associate geometry with vau-calculus term structure rather than <i>α</i>-renaming as such. Seemingly, hygiene is then a sort of <i>adjunct</i> to the geometry, something that allows the geometry to coexist with the massive interconnection of the system.
</p>
<p>
But now, with massive interconnection resonating between the two sides of the analogy, it's definitely time to ask some of those bracing questions about incorporating cosmic connectivity into a mathematical theory of physics.
</p>
<span style="font-size: large;" id="sec-hp-foot">Nondeterminism and the cosmic footprint</span>
<p>
We want to interpret our probability distribution as a footprint, and reconstruct from it the massively connected cosmic order that walked there. Moreover, we're conjecturing that the whole system is factorable into bookkeeping/hygiene on one hand(?), and operations that amount to what we'd ordinarily call "laws of physics" on the other; and we'd really like to deduce, from the way quantum mechanics works, something about the nature of the bookkeeping and the factorization.
</p>
<p>
Classically, if we have a small system that's acted on by a lot of stuff we don't know about specifically, we let all those influences sum to a potential field. One might think of this classical approach as a particular kind of cosmological factorization in which the vast cosmic interconnectedness is reduced to a field, so one can then otherwise ignore almost everything to model the behavior of the small system of interest using a small operational set of physical laws. We know the sort of cosmic order that reduces that way: it's the sort with classical locality (relative to time evolution); and the vaster part of the factorization — the rest of the cosmos, that reduced to a potential field — is of essentially the same kind as the small system. The question we're asking at this point is, what sort of cosmic order reduces such that its operational part is quantum mechanics, and what does its bookkeeping part look like? Looking at vau-calculus, with its <i>α</i>-equivalence and Church-Rosser <i>β</i>-equivalence, it seems fairly clear that hygiene is an asymmetric factorization: if the cosmos factors this way, the bookkeeping part wouldn't have to look at all like quantum mechanics. A further complication is that quantum mechanics may be an approximation only good when the system you're looking at is vastly smaller than the universe as a whole; indeed, this conjecture seems rather encouraged by what happens when we try to apply our modern physical theories to the cosmos as a whole: notably, <a href="https://en.wikipedia.org/wiki/Dark_matter">dark matter</a>. (The broad notion of asymmetric factorizations surrounding quantum mechanics brings to mind both QM's notorious asymmetry between observer and observed, and Einstein's suggestion that QM is missing some essential piece of reality.)
</p>
<p>
For this factorization to work out — for the cosmic system as a whole to be broadly "metaclassical" while factoring through the bookkeeping to either quantum mechanics or a very good approximation thereof — the factorization has to have some rather interesting properties. In a generic classical situation where one small thing is acted on by a truly vast population of other things, we tend to expect all those other things to <i>average out</i> (as typically happens with a classical potential field), so that the vastness of the population makes their combined influence <i>more</i> stable rather than less; and also, as our subsystem interacts with the outside influence and we thereby learn more about that influence, we become more able to allow for it and still further reduce any residual unpredictability of the system.
</p>
<p>
Considered more closely, though, the expectation that summing over a large population would tend to average out is based broadly on the paradigm of signed magnitudes on an unbounded scale that attenuate over distance. If you don't have attenuation, and your magnitudes are on a closed loop, such as angles of rotation, increasing the population just makes things more unpredictable. Interestingly, I'd already suggested in one of my earlier explorations of the hygiene analogy that the physics hygiene condition might be some sort of rotational constraint, for the — seemingly — unrelated reason that the primary geometry of physics has 3+1 dimension structure, which is apparently the structure of <a href="https://fexpr.blogspot.com/2014/03/the-great-vectors-versus-quaternions.html">quaternions</a>, and my current sense of quaternions is that they're the essence of rotation. Though when things converge like this, it can be very hard to distinguish between an accidental convergence and one that simply reassembles fragments of a deeper whole.
</p>
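<p>
A quick numeric illustration of that contrast (a toy model of my own, under deliberately simple assumptions about the two kinds of influence): when many signed magnitudes are averaged, the combined influence concentrates as the population grows, whereas when many rotations are composed on a closed loop, the combined influence diffuses toward complete unpredictability.
</p>
<pre>
import math, random, statistics

def signed_influence_spread(n, trials=2000):
    # n sources each contribute a signed magnitude in [-1, 1]; the combined
    # influence is their mean, crudely like contributions to a potential.
    means = [statistics.fmean(random.uniform(-1.0, 1.0) for _ in range(n))
             for _ in range(trials)]
    return statistics.stdev(means)   # shrinks like 1/sqrt(n): averaging out

def rotational_concentration(n, trials=2000):
    # n sources each contribute a small rotation; angles add on a closed
    # loop (modulo 2*pi).  Concentration 1 means perfectly predictable,
    # near 0 means uniform, i.e. maximally unpredictable.
    totals = [sum(random.uniform(-0.3, 0.3) for _ in range(n)) % (2 * math.pi)
              for _ in range(trials)]
    c = statistics.fmean(math.cos(t) for t in totals)
    s = statistics.fmean(math.sin(t) for t in totals)
    return math.hypot(c, s)

for n in (1, 10, 100, 1000):
    print(f"n={n:4}  signed spread: {signed_influence_spread(n):.4f}"
          f"   rotational concentration: {rotational_concentration(n):.4f}")
</pre>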
<p>
I'll have a thought on the other classical problem — losing unpredictability as we learn more about outside influences over time — after collecting some further insights on the structural dynamics of bookkeeping.
</p>
<span style="font-size: large;" id="sec-hp-mass">Massive interconnection</span>
<p>
Given a small piece of the universe, which other parts of the universe does it interact with, and how?
</p>
<p>
In the classical decomposition, all interactions with other parts of the universe are based on relative position in the geometry — that is, locality. Interestingly, conventional quantum mechanics retains this manner of selecting interactions, embedding it deeply into the equational structure of its mathematics. Recall the Schrödinger equation,
<blockquote>
<table><tr>
<td>
<table><tr>
<td rowspan="2"><i>iℏ</i></td>
<td align="center" style="border-bottom:solid 1px">∂</td>
<td rowspan="2">Ψ</td></tr>
<tr><td>∂<i>t</i></td></tr>
</table>
</td>
<td> = </td>
<td> <i>Ĥ</i></td>
<td>Ψ</td>
<td>.</td>
</tr></table>
</blockquote>
The element that shapes the time evolution of the system — the Hamiltonian operator <i>Ĥ</i> — is essentially an embodiment of the classical expectation of the system behavior; which is to say, interaction according to the classical rules of locality. (I discussed the structure of the Schrödinger equation at length in an earlier post, <a href="https://fexpr.blogspot.com/2015/06/thinking-outside-quantum-box.html#sec-qt-qmat">yonder</a>.) Viewing conventional QM this way, as starting with classical interactions and then tacking on quantum machinery to "fix" it, I'm put in mind of <a href="https://en.wikipedia.org/wiki/Deferent_and_epicycle">Ptolemaic epicycles</a>, tacked on to the perfect-circles model of celestial motion to make it work. (I don't mean the comparison mockingly, just critically; Ptolemy's system worked pretty well, and Copernicus used epicycles too. Turns out there's a better way, though.)
</p>
<p>
How does interaction-between-parts play out in vau-calculus, our detailed example of hygiene at work?
</p>
<p>
The whole syntax of a calculus term is defined by two aspects: the variables — by which I mean those "distributed entities" woven through the term, each made of a binding, its bound instances, and the connections that hygiene maintains between them — and, well, everything else. Once you ignore the specific identities of all the variable instances, you've just got a syntax tree with each node labeled by a context-free syntax production; and the context-free syntax doesn't have very many rules. In λ-calculus there are only four syntax rules: a term is either a combination, or a λ-abstraction, or a variable, or a constant. Some treatments simplify this by using variables for the "constants", and it's down to only three syntax rules. The lone operational rule <i>β</i>,
<blockquote>
((λx.T<sub>1</sub>)T<sub>2</sub>) → T<sub>1</sub>[x ← T<sub>2</sub>] ,
</blockquote>
gives a purely <i>local</i> pattern in the syntax tree for determining when the operation can be applied: any time you have a parent node that's a combination whose left child is a λ-abstraction. Operational rules stay nearly this simple even in vau-calculi; the left-hand side of each operational rule specifies when it can be applied by a small pattern of adjacent nodes in the syntax tree, and mostly ignores variables (thanks to hygienic bookkeeping). The <i>right</i>-hand side is where operational substitution may be introduced. So evidently vau-calculus — like QM — is giving local proximity a central voice in determining when and how operations apply, with the distributed aspect (variables) coming into play in the operation's consequences (right-hand side).
</p>
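<p>
To see just how local the <i>β</i> rule's activation criterion is, here is a tiny sketch (reusing the sort of toy tuple encoding from the earlier sketch, again my own) that finds every redex in a term by examining only a parent node and its left child, never consulting variable identities.
</p>
<pre>
def redexes(term, path=()):
    # Yield the tree-paths of all beta-redexes in a tuple-encoded term.
    tag = term[0]
    if tag == 'app':
        if term[1][0] == 'lam':    # local pattern: combination of a lambda
            yield path
        yield from redexes(term[1], path + ('fun',))
        yield from redexes(term[2], path + ('arg',))
    elif tag == 'lam':
        yield from redexes(term[2], path + ('body',))

# ((lam x. x) y) applied to ((lam z. z) w): two redexes, found without
# looking at any variable identities at all.
term = ('app', ('app', ('lam', 'x', ('var', 'x')), ('var', 'y')),
               ('app', ('lam', 'z', ('var', 'z')), ('var', 'w')))
print(list(redexes(term)))   # -> [('fun',), ('arg',)]
</pre>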
<p>
Turning it around, though, if you look at a small subterm — analogous, presumably, to a small system studied in physics — what rules govern its non-local connections to other parts of the larger term? Let's suppose the term is larger than our subterm by a cosmically vast amount. The free variables in the subterm are the entry points by which non-local influences from the (vast) context can affect the subterm (via substitution functions). And there is no upper limit to how fraught those sorts of interconnections can get (which is, after all, what spurs advocates of side-effect-less programming). That complexity belongs not to the "laws of physics" (neither the operational nor even the bookkeeping rules), but to the <i>configuration of the system</i>. From classical physics, we're used to having very simple laws, with all the complexity of our problems coming from the particular configuration of the small system we're studying; and now we've had that turned on its head. From the perspective of the rules of the calculus, yes, we still have very simple rules of manipulation, and all the complexity is in the arrangement of the particular term we're working with; but from the perspective of the subterm of interest, the interconnections imposed by free variables look a lot like "laws of physics" themselves. If we hold our one subterm fixed and allow the larger term around it to vary probabilistically then, in general, we don't know what the rules are and have no upper bound on how complicated those rules might be. All we have are subtle limits on the <i>shape</i> of the possible influences by those rules, imposed roundaboutly by the nature of the bookkeeping-and-operational transformations.
</p>
<p>
One thing about the shape of these nonlocal influences: they don't work like the local part of operations. The substitutive part of an operation typically involves some mixture of erasing bound variable instances and copying fragments from elsewhere. The upshot is that it <i>rearranges the nonlocal topology of the term</i>, that is, rearranges the way the variables interweave — which is, of course, why the bookkeeping rules are needed, to maintain the integrity of the interweaving as it winds and twists. And this is why a cosmic system of this sort doesn't suffer from a gradual loss of unpredictability as the subsystem interacts with its neighbors in the nonlocal network and "learns" about them: each nonlocal interaction that affects it <i>changes its nonlocal-network neighbors</i>. As long as the overall system is cosmically vast compared to the subsystem we're studying, in practice we won't run out of new network neighbors no matter how many nonlocal actions our subsystem undergoes.
</p>
<p>
This also gives us a specific reason to suspect that quantum mechanics, by relying on this assumption of an endless supply of fresh network neighbors, would break down when studying subsystems that aren't sufficiently small compared to the cosmos as a whole. This would make QM (as I've speculated before), like Newtonian physics, an approximation that works very well in certain cases.
</p>
<span style="font-size: large;" id="sec-hp-factor">Factorization</span>
<p>
Here's what the reconstructed general mathematical model looks like so far:
</p>
<ul>
<li><p>The system as a whole is made up of interconnected parts, evolving by transformative steps according to simple rules.</p></li>
<li><p>The interconnections form two subsystems: <i>local geometry</i>, and <i>nonlocal network topology</i>.</p></li>
<li><p>The transformation rules are of two kinds, <i>bookkeeping</i> and <i>operational</i>.</p></li>
<li><p>Operational rules can only be applied to a system configuration satisfying certain <i>hygiene conditions</i> on its nonlocal network topology; bookkeeping, which only acts on nonlocal network topology, makes it possible to achieve the hygiene conditions.</p></li>
<li><p>Operational rules are activated based on local criteria (given hygiene). When applied, operations can have both local and nonlocal effects, while the integrity of the nonlocal network topology is maintained across both kinds of effect via hygiene, hence bookkeeping.</p></li>
</ul>
<p>
To complete this picture, it seems, we want to consider what a small local system consists of, and how it relates to the whole. This is all the more critical since we're entertaining the possibility that quantum mechanics might be an approximation that <i>only</i> works for a small local system, so that understanding how a local system relates to the whole may be crucial to understanding how quantum mechanics can arise locally from a globally non-quantum TOE.
</p>
<p>
A local system consists of some "local state", stuff that isn't interconnection of either kind; and some interconnections of (potentially) both kinds that are entirely encompassed within the local system. For example, in vau-calculi — our only prior source for complete examples — we might have a subterm (λx.(y‍x)). Variables are nonlocal network topology, of course, but in this case variable x is entirely contained within the local system. The choice of variable name "x" is arbitrary, as long as it remains different from "y" (hygiene). But what about the choice of "y"? In calculus reasoning we would usually say that because y is free in this subterm, we can't touch it; but that's only true if we're interested in comparing it to specific contexts, or to other specific subterms. If we're only interested in how this subterm relates to the rest of the universe, and we have <i>no idea what the rest of the universe is</i>, then the free variable y really is just one end of a connection whose other end is completely unknown to us; and a different choice of free variable would tell us exactly as much, and as little, as this one. We would be just as well off with (λx.(z‍x)), or (λz.(w‍z)), or even (λy.(x‍y)) — as long as we maintain the hygienic distinction between the two variables. The local geometry that can occur just outside this subterm, in its surrounding context, is limited to certain specific forms (by the context-free grammar of the calculus); the nonlocal network topology is vastly less constrained.
</p>
<p>
The implication here is that all those terms just named are effectively <i>equivalent</i> to (λx.(y‍x)). One might be tempted to think of this as simply different ways of "writing" the same local system, as in physics we might describe the same local system using different choices of coordinate axes; but the choice of coordinate axes is about local geometry, not nonlocal network topology. Here we're starting with simple descriptions of local systems, and then taking the <i>quotient</i> under the equivalence induced by the bookkeeping operations. It's tempting to think of the pre-quotient simple descriptions as "classical" and analogize the quotient to "quantum", but in fact there is a second quotient to be taken. In the metatime proof-of-concept, the rewriting kept reshuffling the entire history of the experiment until it reached a <i>steady state</i> — the obvious analogy is to a calculus irreducible term, the final result of the operational rewrite relation of the calculus. And this, at last, is where Church-Rosser-ness comes in. Church-Rosser is what guarantees that the same irreducible form (if any) is reached no matter in what order operational rules are applied. It's the enabler for each individual state of the system to evolve deterministically. To emphasize this point: Church-Rosser-ness applies to an <i>individual possible system state</i>, thus belongs to the deterministic-foundations approach to probabilities. The probability distribution itself is made up of individual possibilities each one of which is subject to Church-Rosser-ness. (Church-Rosser-ness is also, btw, a property <i>of the mathematical model</i>: one doesn't ask "Why should these different paths of system state evolution all come back together to the same normal form?", because that's simply the kind of mathematical model one has chosen to explore.)
</p>
<p>
The question we're trying to get a handle on is why the nonlocal effects of some operational rules would appear to be especially compatible with the bookkeeping quotient of the local geometry.
</p>
<span style="font-size: large;" id="sec-hp-side">Side-effects</span>
<p>
In vau-calculi, the nonlocal operational effects (i.e., operational substitution functions) that do <i>not</i> integrate smoothly with bookkeeping (i.e., with <i>α</i>-renaming) are the ones that support side-effects; and the one nonlocal operational effect that <i>does</i> integrate smoothly with bookkeeping — <i>β</i>-substitution — supports partial evaluation and turns out to be optional, in the sense that the operational semantics of the system could be described without that kind of nonlocal effect and the mathematics would still be correct with merely a weaker equational theory.
</p>
<p>
This suggests that in physics, gravity could be understood without bringing nonlocal effects into it at all, though there may be some sort of internal mathematical advantage to bringing them in anyway; while the other forces may be thought of as, in some abstract sense, side-effects.
</p>
<p>
So, what exactly makes a side-effect-ful substitution side-effect-ful? Conversely, <i>β</i>-substitution is also a form of substitution; it engages the nonlocal network topology, reshuffling it by distributing copies of subterms, the sort of thing I speculated above may be needed to maintain the unpredictability aspect of quantum mechanics. So, what makes <i>β</i>-substitution <i>not</i> side-effect-ful in character, beyond the very specific technicalities of <i>β</i>- and <i>α</i>-substitution? And just how much, or little, should we be abstracting away from those technicalities? I'm supposing we have to abstract away at least a bit, on the principle that physics isn't likely to be technically close to vau-calculi in its mathematical details.
</p>
<p>
Here's a stab at a sufficient condition:
</p>
<ul>
<li><p>A nonlocal operational effect is side-effect-ful just if it <i>perturbs the pre-existing local geometry</i>.</p></li>
</ul>
<p>
The inverse property, called "purity" in a programming context (as in "pure function"), is that the nonlocal operational effect <i>doesn't</i> perturb the pre-existing local geometry. <i>β</i>-substitution is pure in this sense, as it replaces a free variable-instance with a subterm but doesn't affect anything local other than the variable-instance itself. Contrast this with the operational substitution for control variables; the key clauses (that is, nontrivial base cases) of the two substitutions are
</p>
<blockquote>
<p>x[x ← T] → T .</p><p>(τx.T)[x ← C] → (τx.C[T[x ← C]]) .</p>
</blockquote>
<p>
The control substitution alters the local-geometric distance between pre-existing structure T and whatever pre-existing immediate context surrounds the subterm acted on. <i>Both</i> substitutions have the — conjecturally important — property that they substantially rearrange the nonlocal network topology by injecting arbitrary new network connections (that is, new free variables). The introduction of new free variables is a major reason why vau-calculi need bookkeeping to maintain hygiene; although, interestingly, it's taken all this careful reasoning about bookkeeping to conclude that bookkeeping isn't actually <i>necessary</i> to the notion of purity/impurity (or side-effect-ful/non-side-effect-ful); apparently, bookkeeping is about perturbations of the nonlocal network topology, whereas purity/impurity is about perturbations of the local geometry. To emphasize the point, one might call this non-perturbation of local geometry <i>co-hygiene</i> — all the nonlocal operational effects must be hygienic, which might or might not require bookkeeping depending on internals of the mathematics, but only the <i>β</i>- and gravity nonlocal effects are co-hygienic.
</p>
<span style="font-size: large;" id="sec-hp-co">Co-hygiene</span>
<p>
Abstracting away from how we got to it, here's what we have:
</p>
<ul>
<li><p>A complex system of parts, evolving through a Church-Rosser transformation step relation.</p></li>
<li><p>Interconnections within a system state, partitioned into local (geometry) and nonlocal (network).</p></li>
<li><p>Each transformation step is selected locally.</p></li>
<li><p>The nonlocal effects of each transformation step rearrange — scramble — nonlocal connections at the locus where applied.</p></li>
<li><p>Certain transformative operations have nonlocal effects that do not disrupt pre-existing local structure — that are <i>co-hygienic</i> — and thereby afford particularly elegant description.</p></li>
</ul>
<p>
What sort of elegance is involved in the description of a co-hygienic operation depends on the technical character of the mathematical model; for <i>β</i>-reduction, what we've observed is functional compatibility between <i>β</i>- and <i>α</i>-substitution, while for gravity we've observed the general-relativity integration between gravity and the geometry of spacetime.
</p>
<p>
So my proposed answer to the conundrum I've been pondering is that the affinity between gravity and geometry suggests a modeling strategy with a nonlocal network pseudo-randomly scrambled by locally selected operational transformations evolving toward a stable state of spacetime, in which the gravity operations are co-hygienic. A natural follow-on question is just what sort of mathematical machinery, if any, would cause this network-scrambling to approximate quantum mechanics.
</p>
<p>
On the side, I've got intimations here that quantum mechanics may be an approximation induced by the pseudo-random network scrambling when the system under study is practically infinitesimal compared to the cosmos as a whole, and perhaps that the network topology has a rotational aspect.
</p>
<p>
Meanwhile, an additional line of possible inquiry has opened up. All along I've been trying to figure out what the analogy says about <i>physics</i>; but now it seems one might study the semantics of a possibly-side-effect-ful program fragment by some method structurally akin to quantum mechanics. The sheer mathematical perversity of quantum mechanics makes me skeptical that this could be a <i>practical</i> approach to programming semantics; on the other hand, it might provide useful insights for the TOE mathematics.
</p>
<span style="font-size: large;" id="sec-hp-epi">Epilog: hygiene</span>
<p>
So, what happened to hygiene? It was a major focus of attention through nearly the whole investigation, and then dropped out of the plot near the end.
</p>
<p>
At its height of prestige, when directly analogized to spacetime geometry (before that fell through), hygiene motivated the suggestion that geometry might be thought of not as the venue where physics happens but merely as part of its rules. That suggestion is still somewhat alive since the proposed solution treats geometry as abstractly just one of the two forms of interconnection in system state, though there's a likely asymmetry of representation between the two forms. There was also some speculation that understanding hygiene on the physics side could be central to making sense of the model; that, I'd now ease off on, but do note that in seeking a possible model for physics one ought to keep an eye out for a possible bookkeeping mechanism, and certainly resolving the nonlocal topology of the model would seem inseparable from resolving its bookkeeping. So hygiene isn't out of the picture, and may even play an important role; just not with top billing.
</p>
<p>
Would it be possible for the physics model to be <i>un</i>hygienic? In the abstract sense I've ended up with, lack of hygiene would apparently mean an operation whose local effect causes nonlocal perturbation. Whether or not dynamic scope qualifies would depend on whether dynamically scoped variables are considered nonlocal; but since we expect some nonlocal connectivity, and <i>those</i> variables couldn't be perturbed by local operations without losing most/all of their nonlocality, probably the physics model would have to be hygienic. My guess is that if a TOE actually tracked the nonlocal network (as opposed to, conjecturally, introducing a quantum "blurring" as an approximation for the cumulative effect of the network), the tracking would need something enough like calculus variables that some sort of bookkeeping would be called for.
</p>
Schools of artlanging
<blockquote>
<!-- What range of accomplishment there is among these hidden craftsmen I can only surmise — and I surmise the range runs, if one only knew, from the crude chalk-scrawl of the village schoolboy to the heights of paleolithic or bushman art (or beyond). --> Its [conlanging's] development to perfection must none the less certainly be prevented by its solitariness, the lack of interchange, open rivalry, study or imitation of others' technique.
<blockquote>
"<a href="http://idiom.ucsd.edu/~bakovic/tolkien/secret_vice.pdf">A Secret Vice</a>", J.R.R. Tolkien, circa 1931.
</blockquote>
The important thing is that conlanging start to have a critical apparatus within which the artistic merits of conlangs can be evaluated and where different schools of thought can define and defend themselves. [...] I therefore announce the founding of the Naturalist school of conlanging, which regards the following three things as values: [...]
<blockquote>
— "<a href="http://archives.conlang.info/fo/jinbua/vhavheintian.html">Lighting Some Flames: Towards conlang artistry</a>" (commonly known as "The Artlanger's Rant"), <a href="http://jsbangs.com/">Jesse Bangs</a>, Conlang Mailing List, March 12, 2002.
</blockquote>
If naturalistic conlanging is the equivalent of realist painting, then what is an impressionist conlang? A surrealist conlang? What's the conlang equivalent of <i>Guernica</i>? Who is the conlanging equivalent of Gauguin or Dalí?
<blockquote>
— <i>The Art of Language Invention</i> (Penguin Books, 2015), <a href="http://dedalvs.conlang.org/">David J. Peterson</a>, p. 264.
</blockquote>
</blockquote>
<p>
I've some things to say here about conlangs as works of art, with particular attention to the relationship between artist and audience.
</p>
<p>
The special occasion for this is that I've recently read David J. Peterson's new book <i>The Art of Language Invention</i>. It's kind of mind-boggling to realize how far <a href="https://en.wikibooks.org/wiki/Conlang">conlanging</a> has come since the term was coined 25 years ago, as isolated practitioners of Tolkien's "secret vice" found each other through the internet and began to draw together into a community. Peterson, of course, has played a central part in the conlanging community during its emergence from the shadows; he was involved in the establishment of the annual <a href="http://conlang.org/language-creation-conference/">Language Creation Conference</a> series and the <a href="http://conlang.org/">Language Creation Society</a>, and is probably most broadly recognized atm as the architect of the Dothraki language for the HBO series <i>Game of Thrones</i>.
</p>
<blockquote>
<b>Contents</b><br>
<a href="#sec-artlangs-apprec">Artlang appreciation</a><br>
<a href="#sec-artlangs-small">Small stuff</a><br>
<a href="#sec-artlangs-gram">Grammar</a><br>
<a href="#sec-artlangs-soft">Software</a><br>
<a href="#sec-artlangs-schools">Schools</a><!-- <br> -->
</blockquote>
<span style="font-size: large;" id="sec-artlangs-apprec">Artlang appreciation</span>
<p>
Early in the Postscript of his book, Peterson asks "how an audience appreciates a conlang" (p. 260). Seems to me there are two ways: by observing its use — spoken or written — and by studying its description. (Descriptions can include examples of use, of course, and it seems awfully common for critiques of conlang descriptions to either praise the plentiful examples or complain that there aren't enough of them.) Peterson expresses doubts about studying a <a href="http://www-01.sil.org/linguistics/GlossaryOfLinguisticTerms/WhatIsAReferenceGrammar.htm">reference grammar</a>, on grounds that doing so takes an awful lot of work. On the other hand, against the former method, way back in his Introduction (p. 6) he suggests that if the actors made a mistake speaking a conlang in a TV program, only about a tenth of a percent of viewers would notice.
</p>
<p>
I think the one-in-a-thousand number gives something of a false impression of how much benefit works of fiction can accrue from using conlangs. The number may be in the right ballpark, but all Peterson actually claimed for it was that it was how many viewers consciously notice a specific mistake in using a conlang. One might ask, instead, how many viewers would subliminally notice if the alien (in its <a href="https://en.wiktionary.org/wiki/alien#Adjective">broadest sense</a>) speech were faked with nonsense rather than using a conlang.
</p>
<p>
Granted, the benefits of using a conlang may be amplified in written fiction, because there alien speech appears in a very stable form inviting close and protracted scrutiny (with the complicating factor that it'll almost certainly have to appear in a romanized form). For alien speech/writing in a TV or movie setting, though, the more of it there is the harder it is to produce an air of authenticity if there's really no language behind it. (Peterson does allude to this effect in passing on page 2.)
</p>
<p>
Much has been made of the use, in <i><a href="https://en.wikipedia.org/wiki/Star_Wars:_The_Force_Awakens">Star Wars: The Force Awakens</a></i>, of physical models in preference to computer-generated imagery (CGI). I recall one of the actors remarking about this (all part of the pre-release publicity, of course) that it's possible to give a more authentic performance when interacting with a model that's really there than when pretending to interact in a blue room with something that isn't there. The difference, both indirectly through the actors' performances and directly through the <i>feel</i> of the set, is meant to make the movie seem real to an audience subliminally. I think the same principle applies to use of conlangs: for a truly authentic feel there has to be a language.
</p>
<p>
Studying a reference grammar <i>is</i> a lot of work. One thing we should be asking, in that regard, is how to make descriptions of language more accessible. <!-- I still remember the thrill I got when I first read about the Law of Detachability in <a href="https://web.archive.org/web/20120204050647/http://www.frontiernet.net/~scaves/teonaht.html">Teonaht</a> — without getting deeply into the reference grammar at all. (I thought, <i>oh, that's lovely</i>.) --> In fact it's not at all clear how a language ought to be described even for <i>linguistic</i> purposes, a shortcoming likely to become gradually clearer to a conlanger as they get into the more advanced levels of the subject. For me the first major warning sign came with studying "<a href="http://celia.cnrs.fr/FichExt/Documents%20de%20travail/Ergativite/3dDelancey.htm">The Blue Bird of Ergativity</a>" (which I was set onto by the <a href="http://conlangery.conlang.org/">Conlangery Podcast</a>) — a 2005 linguistics paper arguing, rather convincingly imho, that ergativity isn't really a thing, in the same way that blueness isn't a thing for birds: a completely superficial property with no biological significance. Before long I'd lost faith in things like the dative case. Keep in mind, natural language translation has been just a few years away for the past half century; it's a dreadfully hard problem, which seems to require a thinking mind that actually understands the text being translated and is fluent in both languages. Describing a language seems unlikely to be less difficult than translation.
</p>
<p>
All of this seems to me to imply another, deeper question, though. From <i>How do we describe a language?</i> we're led to <i>What does it mean to construct a language?</i>, and then we find that the real starting point should be <i>What <b>is</b> a language?</i>. A living natlang lives in the collective understanding of its speakers. An extinct natlang is presumably defined by the past collective understanding of its speakers, which we try to reconstruct. For a conlang? There's a good chance a conlang mightn't be spoken fluently even by its creator; so either the conlang <i>is</i> its description, or the conlang is some sort of natural resonance point that its description tries to lead us to.
</p>
<span style="font-size: large;" id="sec-artlangs-small">Small stuff</span>
<p>
One early passage in the book stirred a minor pet peeve of mine. On page 34, expanding on a remark that IPA represents sound independent of spelling, he goes off on a several-sentence rant about the absurdity of English spelling. As a fluent speaker and writer of English, I find his ridicule of English spelling perfectly reasonable (and I enjoyed the way he said it). What's striking to me, though, is that he has nothing similar to say, afaics, against IPA (the International Phonetic Alphabet). Presumably <i>because</i> he's a very highly trained linguist. It's possible to get so used to almost anything, that one stops noticing its oddness; and it seems as if Peterson may have done so with the IPA. Avoiding that fate myself, I've retained my awareness that the IPA is utterly insane.
</p>
<p>
There is, of course, no likely cure for the IPA, for pretty much the <a href="https://xkcd.com/927/">same reason</a> there's no likely cure for English spelling. The ASCII-only alternative to the IPA, X-SAMPA (with its variant <a href="https://en.wikibooks.org/wiki/Conlang/Appendix/CXS">CXS</a>), has the advantage of being ASCII-only and essentially <i>nothing else to recommend it</i>. I exaggerate less than one might think.
</p>
<p>
The prospects for getting an alternative adopted certainly wouldn't stop a conlanger from trying to design such a thing. So far my own efforts aren't worth blogging about. If they ever reach the level of being <i>interesting</i>, perhaps I will blog about them. In order to actually be <i>useful</i> for presenting conlangs, they'd have to be significantly easier than IPA for a non-linguist to understand, which seems a very high bar indeed given the challenge a non-linguist will face no matter what the notation (but this high bar also offers some small degree of guidance for efforts in that direction).
</p>
<p>
Another low-level point that bugs me is <a href="https://en.wikipedia.org/wiki/Interlinear_gloss">glossing</a>, where an example of the conlang is juxtaposed with its translation on a separate line, and between them is a <i>gloss</i> explaining the features of each conlang element. Here I think the bar for improving utility is considerably lower. Glossing, like the examples it's attached to, is encouraged by conlang critiques. I used glossing myself in my blog post a while back on <i><a href="http://fexpr.blogspot.com/2014/01/lamlosuo.html">Lamlosuo</a></i>, e.g.
<blockquote>
<table>
<tr><td>
<table>
<tr>
<td align=center> <b>losutu</b> </td>
<td align=center> <b>li</b> </td>
<td align=center> <b>susua</b> </td>
</tr>
<tr>
<td align=center> V.STR <br> <i>speak</i> </td>
<td align=center> SUB <br> CUR </td>
<td align=center> NEUT <br> <i>sleep</i> </td>
</tr>
</table>
</td></tr>
<tr><td>
<i>the sometime-speaker sleeps</i>
</td></tr>
</table>
</blockquote>
The trouble here may even be clear to a linguist (whereas a linguist probably wouldn't see the problem if the language being glossed were less peculiar). What do those abbreviations stand for? Hint: NEUT does stand for <i>neutral</i>, but V doesn't stand for <i>verb</i>, and SUB doesn't stand for <i>subject</i>. Even I do double-takes on STR, which in the circles I move in would ordinarily stand for <i>string</i> (but not in this gloss). Linguistics documents tend to have a list somewhere of abbreviations used; and that's awful for conlanging, because memorizing those stupid abbreviations is one more obstacle to learning about the <i>language</i>, which is the end goal. Your target audience for a conlang description isn't limited to professional linguists who consider learning a whole new system of analysis for this new language a more-or-less normal part of their profession. I actually regretted, after the fact, having indulged common practice by introducing gloss abbreviations; the above should probably have been something more like
<blockquote>
<table>
<tr><td>
<table>
<tr>
<td align=center> <b>losutu</b> </td>
<td align=center> <b>li</b> </td>
<td align=center> <b>susua</b> </td>
</tr>
<tr>
<td align=center> volitional <br> start <br> <b>speak</b> </td>
<td align=center> subordinate <br> cursor <br> </td>
<td align=center> <br> neutral <br> <b>sleep</b> </td>
</tr>
</table>
</td></tr>
<tr><td align=center>
<i>the sometime-speaker sleeps</i>
</td></tr>
</table>
</blockquote>
</p>
<p>
It would also be great for non-linguists if everything had sound files attached so you could hear what all the examples are supposed to sound like; but that is so difficult (it's not uncommon to create a conlang you can't yourself pronounce) one never hears complaints when it isn't done — just praise when it is well done.
</p>
<span style="font-size: large;" id="sec-artlangs-gram">Grammar</span>
<p>
Almost all conlangs have reference grammars; it's hard to define a conlang <i>without</i> a grammar, although I recall hearing of a collaborative project in which people proposed sentences in the language and the community voted on which ones to allow. However, defining a conlang via a reference grammar has, in my experience fwiw, three major drawbacks to keep in mind.
</p>
<p>
First, grammars as used for natlangs are, or should be, descriptive rather than prescriptive. The classificatory schemes of such grammars are analytic rather than synthetic: even though a reference grammar is meant to describe the language (whereas a pedagogical grammar teaches its use), still it seeks to tell how the language works rather than why. This is close to the point mentioned earlier about the superficiality of ergativity. If such classifications are indeed superficial, someone carefully studying a natlang based on the reference grammar may still manage to glom on to the deeper nature of the language by using the grammar as a jumping-off point; but a conlang <i>defined</i> thus superficially may actually lack that deeper nature that its author, if pursuing naturalism, would like to have.
</p>
<p>
Second, grammars may tend to make languages seem much more orderly than they are in practice, especially spoken practice. Anyone who has transcribed an oral presentation is aware of this: we speak haltingly, with all sorts of false starts, typically with lots of low-level <a href="https://en.wikipedia.org/wiki/Discourse_particle">discourse particles</a> apparently serving to help the listener navigate through the speaker's on-the-fly revisions in assembling their syntax (in English, <i>um</i> and <i>er</i> might be used for this, varying by dialect). Perhaps this would be mitigated in fiction, both written and on-screen, because dialog would tend to be somewhat unrealistically orderly anyway for the sake of exposition; but even so, fully explaining a paragraph of fluently written English (or whatever is the primary natlang of the work) tends to involve some remarkably esoteric grammatical gymnastics.
</p>
<p>
Third and likely most problematic, grammars tend to describe things in terms of a standard set of principles for organizing sentences and clauses. Why is that a problem? Because very often the organization of sentences and clauses isn't the point, but merely a tool used in service of some other purpose — often a purpose that can only be properly appreciated by studying longer narratives and discourses.
</p>
<p>
Consider verb–argument alignment. The point, apparently, is to specify which arguments to a verb play what part in the verb. Most commonly you have nominative-accusative alignment, where the agent of a transitive verb is marked the same as the subject of an intransitive verb, while the transitive patient is marked differently. Alternatively there's ergative-absolutive alignment, where transitive patient is marked as intransitive subject while transitive agent is marked differently. Why this difference matters may vary from language to language, but often has partly to do with the way verbs are valency-reduced — as with the passive voice in nominative-accusative languages, anti-passive in ergative-absolutive — when fitting together larger narratives. But then there's Tagalog, the dominant language of the Philippines. Tagalog uses <a href="https://en.wikipedia.org/wiki/Austronesian_alignment">Austronesian alignment</a>, which means there's marking on the verb indicating what system is used to mark the arguments, with a variety of options available — and the reason for doing so has to do with specificity. That's right, specificity. How specific a noun is. Don't expect to get a really good explanation of how this works in Tagalog, either, because as best I can tell, as of now the linguistic community hasn't altogether figured it out yet.
</p>
<p>
What does seem clear to me, though, is that alignment in Tagalog is a particular instance where understanding what the feature is being used for is essential to making sense out of it (linguistically, at least; there's no need to consciously understand it in order to use it, otherwise Tagalog native speakers would simply explain it to the linguists and it would all be settled). That is, figuring out which argument does what in the verb isn't really the "point" of alignment, it's just the mechanical act we use to delineate which facets of the language we've chosen to put in the box labeled "alignment". I suspect some degree of the same effect applies to many of these seemingly low-level language features: their apparent mechanical operation isn't necessarily their primary purpose, and their primary purpose may be impossible to see clearly until one gets up to the narrative scale. (Appeal to the narrative scale is a recurring theme in the Conlangery Podcast, btw, even though they've never actually devoted a podcast to it, as such.)
</p>
<p>
What we're facing here is the difference between the "universal" — Chomskyist — approach to grammar, and the "functional" approach. In essence, the universal approach supposes that grammar is hardwired into the human brain, so that the shape of language is determined by that wiring, whereas the functional approach supposes that grammar is shaped by the functions people want language to serve. Whatever you think of the philosophies underlying these positions, there's a bunch of things going on in language that tend to get missed by the usual grammatical approach, and the functional approach seems like it ought to provide those missing elements. Except, apparently it doesn't. Functional grammar advocates tend to take the principle to its extreme and look at everything from the narrative perspective. Whatever you think of <i>that</i> philosophically, it doesn't seem very useful for conlanging since it doesn't provide local structure — less a problem for functional grammarians than conlangers, thanks again to the descriptive/prescriptive distinction.
</p>
<p>
Ideal for conlanging — or so it seems to this conlanger — would be a modified variant form of traditional grammars, with comparable degree of local structure but free of traditional assumptions about purpose and mechanical classification, avoiding implicit bias toward conjectured universal restrictions (conlangs don't have to obey any rules hardwired into the human brain, if there are any, and some conlangs deliberately try not to); and <i>with</i> high-level functionality integrated into it in a way that naturally meshes with the more low-level-mechanical facilities. Both universal and functional grammarians would seem to have philosophical reasons for not trying to construct a grammar like that; but conlangers should have no such disincentive.
</p>
<span style="font-size: large;" id="sec-artlangs-soft">Software</span>
<p>
It's easy to write software that forces its human users to fit what they're doing into a rigid structure. Doing so can have negative consequences for society, as it reduces users' opportunities to do what human sapience is actually good at; but to the current point, limiting the user's options is definitely counterproductive when what you're trying to support is creative artlanging. It's <i>not</i> so easy to write software that supports the best of what humans can do (creative thinking) together with the best of what computers can do (rote memory and purely mechanical manipulation). That sort of software must by nature be general-purpose and inherently flexible. Two of the best artifacts we have so far in this genre — TeX and wikis — are both plausible candidates to support conlanging, but they both have flaws for the purpose. In brief, TeX is overly technical, wikis are computationally weak.
</p>
<p>
What we need is the sort of simple basic elements one finds in a wiki; with TeX-like computational power available at need — if the opportunity arises I'd like to try a <a href="http://fexpr.blogspot.com/2011/04/fexpr.html">fexpr</a>-based extension strategy rather than the macro-based strategy shared by TeX and wikis; and an interactive element with enough power to let the computer provide advice and mechanical aid while aggressively preserving the user's flexibility of form as well as substance. Is all that even possible? I suspect so. The limiting factor isn't computational power (we've likely had enough of that since before the word "conlang" was coined) or brute force programming power (throwing more people at a project makes it take longer), but finessing the design of the software. I've got a few lines I'm pursuing, some of which I've been pursuing for decades; note that I've named just two artifacts distributed across a half century of development, an average of less than one every other decade. Patience is indicated.
</p>
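<p>
To make the fexpr idea slightly more concrete, here is a minimal sketch — a toy in Python, purely my illustration of the general strategy, not a design for the conlanging software itself. The point of a fexpr (an <i>operative</i>) is that it receives its operands <i>unevaluated</i>, along with the dynamic environment, and decides for itself what to evaluate; whereas a macro merely rewrites text before evaluation.
</p>
<pre>
# Toy sketch of fexpr-style extension (illustrative only).
# An "operative" receives its operands unevaluated, plus the dynamic
# environment; an "applicative" wraps an ordinary function so that its
# operands get evaluated first.

def evaluate(expr, env):
    if isinstance(expr, str):          # a symbol: look it up
        return env[expr]
    if not isinstance(expr, list):     # a literal (number, etc.)
        return expr
    combiner = evaluate(expr[0], env)  # a combination: evaluate the operator
    return combiner(expr[1:], env)     # ... and pass operands unevaluated

def applicative(f):
    # Wrap an ordinary function so its operands are evaluated.
    return lambda operands, env: f(*[evaluate(o, env) for o in operands])

def op_if(operands, env):
    # An operative: evaluates only the branch it actually needs.
    test, if_true, if_false = operands
    return evaluate(if_true if evaluate(test, env) else if_false, env)

env = {"if": op_if,
       "+":  applicative(lambda a, b: a + b),
       ">":  applicative(lambda a, b: a > b),
       "x":  3}
print(evaluate(["if", [">", 5, "x"], ["+", "x", 1], "x"], env))  # prints 4
</pre>
<p>
Nothing here is specific to conlanging; the point being illustrated is only that this extension style keeps the full evaluator available to extensions, where a macro processor does not.
</p>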
<span style="font-size: large;" id="sec-artlangs-schools">Schools</span>
<p>
Jesse Bangs's notion in the Artlanger's Rant was that Naturalism — which he defined as valuing a naturalistic, complex, creative conlang — would be a "school" of conlanging. There's some history there. The conlanging community that had formed around the CONLANG mailing list in 1991 underwent a schism in 1993, with a subgroup leaving to form the AUXLANG mailing list. You can't very well use an <a href="https://en.wikibooks.org/wiki/Conlang/Types#Auxiliary_languages">auxlang</a> to unite the world if you can't get everyone to agree on which auxlang is best for that purpose, so it seems natural that the auxlangers were inclined to advocate their favorite auxlang; after the schism, the CONLANG mailing list adopted a policy against language advocacy (though afaict auxlangers who respect the advocacy ban are more <i>encouraged</i> than <i>required</i> to use the other list). When the Artlanger's Rant was written, that schism was still recent community history, and thinking of conlanging in terms of that sort of big goal would have been pretty natural (irony not intended).
</p>
<p>
I suspect Bangs's proposed definition of a Naturalist school may be a touch heavy on goals and light on means. He does mention complexity/completeness, which is about content; and creativity, i.e. not imitating your own native language, which is sort of a technical criterion. But I'm thinking of the nature of the description; like, the form of the reference grammar. If you're trying for something that evokes naturalism using a very traditional and rather sketchy grammar, might you end up with a sort-of <a href="https://en.wikipedia.org/wiki/Cubism">cubist</a> conlang? It seems that might modify the notion of completeness, shifting emphasis to a more abstract side of naturalism; not sure quite how that would work, but the nature of cubism suggests something of the sort; while on the purely technical side it might give a sense of simplification. Perhaps some other non-traditional form of grammar would produce a sort-of <a href="https://en.wikipedia.org/wiki/Impressionism">impressionist</a> conlang. The goal of course is to nurture the conlangers' art, not to imitate schools of painting; but I do take from these pseudo-examples an encouragement to explore different forms of grammatical description.
</p>
<p>
What about the next part of Peterson's question? Can a conlang convey a powerful message like that of <i><a href="https://en.wikipedia.org/wiki/Guernica_%28Picasso%29">Guernica</a></i>? Not to overplay the analogy, but can a conlang come across to its audience as <i>anti-war</i>? Can it carry profound ideas, or strong emotions? By what vector? The mere phonaesthetics of a conlang surely can't carry so much, though it can presumably contribute to it. Perhaps more can be carried, for those who learn about them, by the vocabulary of the language, and the ideas/attitudes/ways of thinking that the language <a href="https://en.wikibooks.org/wiki/Conlang/FAQ#What.27s_the_.22Sapir-Whorf_hypothesis.22.3F">fosters</a>. Improved lucidity of language description for non-linguists would seem to help here; and ideas/attitudes might be affected less by traditional, relatively low-level mechanical considerations than by the sort of narrative-level things that functional linguists emphasize. At any rate, I suspect that making some headway on the first round of challenges I've been discussing here will put us in a better position to tackle this second round of more abstract, expressive challenges.
</p>
John Shutthttp://www.blogger.com/profile/00041398073010099077noreply@blogger.com3tag:blogger.com,1999:blog-7068528325708136131.post-86087285810792251522015-10-01T07:37:00.000-07:002018-06-02T11:50:27.485-07:00Natural intelligence<blockquote>
[...] man had always assumed that he was more intelligent than dolphins because he had achieved so much — the wheel, New York, wars and so on — while all the dolphins had ever done was muck about in the water having a good time. But conversely, the dolphins had always believed that they were far more intelligent than man — for precisely the same reasons.
<blockquote>
— Douglas Adams, <i><a href="https://en.wikiquote.org/wiki/The_Hitchhiker%27s_Guide_to_the_Galaxy">The Hitchhiker's Guide to the Galaxy</a></i>, Chapter 23.
</blockquote>
</blockquote>
<p>
I have a few things I want to say here about the nature of <i>natural</i>, as opposed to artificial, intelligence. While I'm (as usual) open to all sorts of possibilities, the theory of mind I primarily favor atm has a couple of characteristics that are, as I see it, contrary to prominent current lines of thought, so I want to try to explain where I'm coming from, and perhaps something of why.
</p>
<blockquote>
<b>Contents</b><br>
<a href="#sec-ni-eng">The sapience engine</a><br>
<a href="#sec-ni-scale">Scale</a><br>
<a href="#sec-ni-evol">Evolution</a><br>
<a href="#sec-ni-un">Unlikelihood</a><br>
<a href="#sec-ni-sap">Sapience</a><!-- <br> -->
</blockquote>
<span style="font-size: large;" id="sec-ni-eng">The sapience engine</span>
<p>
In a nutshell, my theory is that the (correctly functioning) human brain is a <i>sapience engine</i> — a device that manipulates information (or, stored and incoming signals, if you prefer to put it so) in a peculiar way that gives rise to a sapient entity. The property of sapience itself is a characteristic that arises from the peculiar nature of the manipulation being done; the resulting entity is sapient not because of how <i>much</i> information it processes, but because of <i>how</i> it processes it.
</p>
<p>
This rather simple theory has some interesting consequences.
</p>
<span style="font-size: large;" id="sec-ni-scale">Scale</span>
<p>
There's an idea enjoying some popularity in AI research these days, that intelligence is just a matter of scale. (Ray Dillinger articulated this view rather well, imho, in a recent blog post, "<a href="http://dillingers.com/blog/2015/09/12/how-smart-is-a-smart-ai/">How Smart is a Smart AI?</A>".) I can see at least four reasons this view has appeal in the current intellectual climate.
<ol>
<li><p>Doing computation on a bigger and bigger scale <i>is what we know how to do</i>. Compounding this, those who pursue such a technique are rewarded, both financially and psychologically, for enthusing about the great potential of what they're pursuing. And of course, what we know how to do doesn't get serious competition for attention, because the alternative is stuff we <i>don't</i> know how to do and that doesn't play nearly as well. Better still, by ascribing full sapience to a range of computational power we haven't achieved <i>yet</i>, we absolve ourselves of blame for having not yet achieved full AI. All of which notwithstanding, just because we know how to do something doesn't mean it can accomplish what we want it to.</p></li>
<li><p>The more complex a computer program gets — the more branching possibilities it encompasses and the bigger the database it draws on — the more effectively it can fool us into seeing its behavior as "like us". (Ray Dillinger's post discusses this.)</p></li>
<li><p>The idea that sapience is just a matter of scale appeals to a certain self-loathing undercurrent in modern popular culture. It seems to have become very trendy to praise the cleverness of other species, or of evolution itself, and emphasize how insignificant we supposedly are; in its extreme form this leads to the view that there's nothing at all special about us, we're just another species occupying an ecological niche no more "special" than the niches of various kinds of ants etc. (My subjective impression is that this trend correlates with environmentalism, though even if the correlation is real I'm very wary of reading too much into it. I observe the trend in <a href="https://en.wikipedia.org/wiki/Derek_Bickerton">Derek Bickerton</a>'s <i>Adam's Tongue</i>, 2009, which I wouldn't otherwise particularly associate with environmentalism.)</p></li>
<li><p>The "scale" idea also gets support from residual elements of the previously popular, opposing view of <i>Homo sapiens</i> as <i>special</i>. Some extraordinary claims are made about what a vast amount of computing power is supposedly possessed by the human brain — evidently supposing that the human brain is actually <i>doing</i> computation, exploiting the computational potential of its many billions of neurons in a computationally efficient way. As opposed to, say, squandering that computational potential in an almost inconceivably inefficient way in order to do something that qualitatively isn't computation. The computational-brain idea also plays on the old "big brain" idea in human evolution, which supposed that the reason we're so vastly smarter than other primates is that our brains are so much bigger. (Terrance Deacon in <i><a href="https://en.wikipedia.org/wiki/The_Symbolic_Species">The Symbolic Species</a></i>, 1997, debunking the traditional big-brain idea at length, notes its appeal to the simplistic notion of the human brain as a computer.)</p></li>
</ol>
</p>
<p>
I do think scale matters, but I suspect its role is essentially catalytic (Deacon also expresses a catalytic view); and, moreover, I suspect that beyond a certain point, bigger starts to <i>degrade</i> sapience rather than enhance it. I see scale coming into play in two respects. As sketched in my previous post on the subject (<a href="http://fexpr.blogspot.com/2015/02/sapience-and-language.html">here</a>), I conjecture the key device is a <i>non-Cartesian theater</i>, essentially <i>short-term memory</i>. There are two obvious size parameters for adjusting this model: the size of the theater, and the size of the audience. I suspect that with too small an audience, the resultant entity lacks efficacy, while with too large an audience, it lacks coherence. Something similar seems likely to apply to theater size; I don't think the classic "seven plus or minus two" size of human short-term memory is at all arbitrary, nor strongly dependent on other constraints of our wetware (such as audience size).
</p>
<p>
Note that coherent groups of humans, though they represent collectively a good deal more computational potential than a single human, are generally a lot stupider. Committees — though they can sometimes produce quite good results <i>when well-chaired</i> — are notorious for their poor collective skills; "design by committee" is a running joke. A mob is well-described as a mindless beast. Democracy succeeds not because it necessarily produces brilliant results but because it resists the horrors of more purely centralized forms of government. Wikipedia, the most spectacular effort to date to harness the wisdom of the masses, rather thoroughly lacks wisdom, being prone to the vices of both committees and mobs. (Do I hate Wikipedia? No. I approve deeply of some of the effects it has had on the world, while deploring others; that's a complicated subject for another time.) One might suppose that somehow the individual people in a group are acting a bit like neurons (or some mildly larger unit of brain structure), and one would need a really <i>big</i> group of people before intelligence would start to reemerge, but honestly I doubt it. Once you get past a "group of one", the potential intelligence of a group of people seems to max out at a well-run committee of about six, and I see no reason to think it'll somehow magically reemerge later. Six, remember, is roughly the size of short term memory, and I've both wondered myself, and heard others wonder, if this is because the individuals on the committee each have short term memories of about that size; but as an alternative, I wonder if, just possibly, the optimal committee size is not so much an echo of the size of the non-Cartesian theater as a second example of the same deep phenomenon that led the non-Cartesian theater to have that size in the first place.
</p>
<span style="font-size: large;" id="sec-ni-evol">Evolution</span>
<p>
As might be guessed from the above, since my last blog post on this subject I've been reading Derek Bickerton's <i>Adam's Tongue</i> (2009) and Terrence Deacon's <i>The Symbolic Species</i> (1997, which was recommended to me by a commenter on my earlier post). Both have a fair amount to say about Noam Chomsky, mostly in the nature of disagreement with Chomsky's notion of a universal language instinct hardwired into the brain.
</p>
<p>
But it struck me, repeatedly throughout both books, that despite Deacon's disagreements with Chomsky and Bickerton's disagreements with Deacon <i>and</i> Chomsky, all three were in agreement that communication is the essence of the human niche, and sapience is an adjunct to it. I wondered why they thought that, other perhaps than two of them being linguists and therefore inclined to see their own subject in whatever they look at (which could as well explain why I look at the same things and see an algorithm). Because I don't altogether buy into that linguistic assumption. They seem to be dismissing a possibility that imho is worth keeping in play for now.
</p>
<p>
There's a word I picked up from Deacon: <i>exaptation</i>. Contrasting with <i>adaptation</i> by changing prefix <i>ad-</i> to <i>ex-</i>. The idea is that instead of a species feature developing as an adaptation for a purpose that the species finds beneficial, the feature develops for some other purpose and then, once available, gets <i>exapted</i> for a different purpose. The classic example is <i>feathers</i>, which are so strongly associated with flight now, that it's surprising to find they were apparently <i>exapted</i> to that purpose after starting as an <i>adaptation</i> for something else (likely, for insulation).
</p>
<p>
So, here's my thought. I've already suggested, in my earlier post, that language is not necessary to sapient thought, though it does often <i>facilitate</i> it and should naturally arise as a <i>consequence</i> of it. What if sapience was exapted for language after originally developing for some other purpose?
</p>
<p>
For me, the central question for the evolution of human sapience is why it hadn't happened before. One possible answer is, of course, that it <i>had</i> happened before. I'm inclined to think not, though. Why not? Because we're leaving a heck of a big mark on the planet. I'm inclined to doubt that some other sapient species would have been less capable or, collectively, more wise; so it really seems likely to me that if this had happened before we might have <i>noticed</i>. (<i>Could</i> it have happened before without our noticing? Yes, but as I see it Occam's Razor doesn't favor that scenario.)
</p>
<p>
To elaborate this idea — exaptation of sapience for language — and to put it into perspective with the alternatives suggested by Deacon and Bickerton, I'll need to take a closer look at how an evolutionary path might happen to be extremely unlikely.
</p>
<span style="font-size: large;" id="sec-ni-un">Unlikelihood</span>
<p>
Evolution works by local search in the space of possible genetic configurations: imagining a drastically different design is something a sapient being might do, not something evolution would do. At any given point in the process, there has to be a currently extant configuration from which a <i>small</i> evolutionary step can reach another successful configuration. Why might a possible target configuration (or a family of related ones, such as "ways of achieving sapience") be unlikely to happen? Two obvious contributing factors would be:
<ul>
<li><p>the target is only successful under certain external conditions that rarely hold, so that most of the time, even if an extant species were within a short evolutionary step of the target, it wouldn't take that step because it wouldn't be advantageous to do so.</p></li>
<li><p>there are few other configurations in the close neighborhood of the target, so that it's unlikely for any extant species to come within a short evolutionary step of the target.</p></li>
</ul>
To produce an unusually unlikely evolutionary target, it seems we should combine both of these sorts of factors. So suppose we have an evolutionary target that can only be reached from a few nearby configurations, and consider how a species would get <i>to</i> one of these nearby configurations. The species must have gotten there by taking a small evolutionary step that was advantageous at the time, right? Okay. I submit that, if the target is really to be hard to get to, we should expect this small step to a nearby configuration to have <i>a different motivation than the last step to the target.</i> This is because, if these two consecutive steps were taken for the same reason, then at any time a species takes the penultimate step, at that same time the last step should also be viable. That would cancel out our supposition that the target is only occasionally viable: any time it's viable to get to the nearby configuration, it's also viable to continue on to the target. In which case, for the target to be really difficult to get to, its nearby neighbors would have to be difficult to get to, merely pushing the problem of unlikelihood back from the target to its nearby neighbors, and we start over with asking why its neighbors are unlikely.
</p>
<p>
In other words, in order for the evolutionary target to be especially unlikely, we should expect it to be reached by an <i>exaptation</i> of something from another purpose.
</p>
<p>
(I'm actually unclear on whether or not feathers may be an example of this effect. Clearly they aren't necessary for flight, witness flying insects and bats; but without considerably more biological flight expertise, I couldn't say whether there are technical characteristics of feathered flight that have not been achieved by any other means.)
</p>
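<p>
Before moving on, here is a toy simulation of the argument just made — a sketch in Python with made-up numbers throughout, meant only to illustrate the shape of the reasoning, not to model any actual evolutionary process. A "target" trait is viable only during rare windows, and reachable only through a precursor trait; when the precursor is advantageous only for the same reason as the target, the lineage essentially never reaches the target, whereas a precursor that's useful for an unrelated reason — exaptation fodder — is already sitting there waiting when a window opens.
</p>
<pre>
# Toy model (illustrative only): a lineage takes small evolutionary
# steps, each only when that step is currently advantageous.

import random

def reaches_target(precursor_useful_on_its_own, generations=20_000,
                   p_window=0.001, p_step=0.2, p_decay=0.01):
    has_precursor = False
    for _ in range(generations):
        window_open = p_window > random.random()  # rare external conditions
        if has_precursor and window_open and p_step > random.random():
            return True                # the last small step, to the target
        if not has_precursor:
            # The step to the precursor is taken only if advantageous now:
            # either for some unrelated purpose of its own, or because a
            # rare window makes it useful en route to the target.
            if (precursor_useful_on_its_own or window_open) \
               and p_step > random.random():
                has_precursor = True
        elif not (precursor_useful_on_its_own or window_open):
            # An unused, costly feature tends to be lost again.
            if p_decay > random.random():
                has_precursor = False
    return False

for independent in (True, False):
    hits = sum(reaches_target(independent) for _ in range(200))
    print("precursor independently useful:", independent, "->", hits, "/ 200")
</pre>
<p>
With these particular (arbitrary) numbers, the independently-useful precursor leads to the target in nearly every run, and the other case in almost none: the juxtaposition of the two unlikely factors is doing the work.
</p>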
<span style="font-size: large;" id="sec-ni-sap">Sapience</span>
<p>
This is one reason I'm doubtful of explaining the rarity of sapience while concentrating on communication. Lots of species communicate; so if sapience were an adaptation for communication, why would it be rare? Bickerton's book proposes a specific niche calling for enhanced communication: high-end scavenging, where bands of humans must cooperate to scavenge the remains of dead megafauna. Possible, yes; but I don't feel compelled by it. The purpose — the niche — doesn't seem all that unlikely to me.
</p>
<p>
Deacon's book proposes a more subtle, though seemingly closely related, purpose. Though less specific about the exact nature of the niche being occupied — which <i>could</i> be high-end scavenging, or perhaps group hunting — he suggests that in order to exploit the niche, hominins had to work together in large bands containing multiple mated pairs. This is tricky, he says, because in order for these group food-collection expeditions to be of major benefit to the species, those who go on the expeditions must find it in their own genetic self-interest to share the collected food with the stay-at-home nurturers and young. He discusses different ways to bring about the required self-interest motive; but evolution works, remember, in small steps, so not all of these strategies would be available for our ancestors. He suggests that the strategy they adopted for the purpose — the strategy, we may suppose, that was reachable by a small evolutionary step — was to have each mated pair enter into a social contract, essentially a marriage arrangement, in which the female agrees to mate only with a particular male in exchange for receiving food from that male's share. The arrangement holds together so long as they believe each other to be following the rules, and this requires intense communication between them plus sophisticated reasoning about each other's future behavior.
</p>
<p>
I do find (to the extent I understand them) Deacon's scenario somewhat more plausible than Bickerton's, in that it seems to provide more support for unlikelihood. Under Bickerton, a species tries to exploit a high-end scavenging niche, and the available solution to the coordination problem is proto-language. (He describes various other coordination techniques employed by bees and ants.) Under Deacon, a species tries to exploit a high-end scavenging-or-hunting niche, and the available solution to the <i>cooperation</i> problem is a social contract supported by symbolic thought. In either scenario, the species is presented with an opportunity that it can only exploit with an adaptation. For this to support unlikelihood, the adaptation has to be something that under most circumstances would <i>not</i> have been the easiest small-step solution to the challenge. Under Bickerton, the configuration of the species must make proto-language the closest available solution to the coordination problem. Under Deacon, the configuration of the species must make symbolic thinking the closest available solution to the cooperation problem. This is the sense in which, as I said, I find Deacon's scenario somewhat more plausible.
</p>
<p>
However, both scenarios seem to me to be missing something important. Both of them are centrally concerned with identifying a use (coordination per Bickerton, cooperation per Deacon) to which the new feature is to be put: they seek to explain the purpose of an <i>adaptation</i>. By my reasoning above, though, either the target of this adaptation should be something that's almost never the best solution for the problem, or the target should only be reachable if, at the moment it's wanted, some unlikely catalyzing factor is already present in the species (thus, available for exaptation). Or, of course, both.
</p>
<p>
From our end of human evolution, it seems that sapience is pretty much infinitely versatile, and so ought to be a useful adaptation for a wide variety of purposes. While this may be so, when conjecturing it one should keep in mind that if it <i>is</i> so, then sapience should be <i>really difficult</i> to achieve in the first place — because if it were both easy to achieve and useful for almost everything, one would expect it to be a very common development. The more immediately useful it is once achieved, the more difficult we'd expect it to be to achieve in the first place. I see two very plausible hypotheses here:
<ul>
<li><p>Sapience itself may be an adaptation for something other than communication. My previous post exploring sapience as a phenomenon (<a href="http://fexpr.blogspot.com/2015/02/sapience-and-language.html">here</a>) already suggested that sapience once achieved would quickly be exapted for communication. My previous posts regarding verbal culture (starting <a href="http://fexpr.blogspot.com/2011/12/preface-to-homer.html">here</a>) suggest that language, once acquired, may take some time (say, a few million years) to develop into a suitable medium for rapid technological development; so the big payoff we perceive from sapience would itself be a delayed exaptation of language, not contributing to its initial motivation. Deacon suggests there are significant costs to sapience, so that its initial adoption has to have a strong immediate benefit.</p></li>
<li><p>Sapience may require, for its initial emergence, exaptation of some other relatively unlikely internal feature of the mind. This calls for some deep mulling over, because we don't have at all a firm grasp of the internal construction of sapience; we're actually hoping for clues <i>to</i> the internal construction <i>from</i> studying the evolutionary process, which is what we're being offered here if we can puzzle it out.</p></li>
</ul>
</p>
<p>
Putting these pieces together, I envision a three-step sequence. First, hominin minds develop some internal feature that can be exapted for sapience if that becomes sufficiently advantageous to overcome its costs. Second, an opportunity opens up, whereby hominin communities have a lot to gain from group food-collection (be it scavenging or hunting), but to make it work requires sophisticated thinking about future behavior, leading to development of sapience. The juxtaposition of these first two steps is the prime source of unlikelihood. I place no specific requirement on how sapience is applied to the problem; I merely suppose that sapience (symbolic thinking, as Deacon puts it) makes individuals more able to realize that cooperating is in their own self-interest, and doing so is sufficiently advantageous to outweigh the costs of sapience, therefore genes for sapience come to dominate the gene pool. Third, as sapience becomes sufficiently ubiquitous in the population, it is naturally exapted for language, which then further plays into the group cooperation niche as well as synergizing with sapience more broadly. At this point, I think, the process takes on an internal momentum; over time, our ancestors increasingly exploit the language niche, becoming highly optimized for it, and the benefits of language continue to build until they reach critical mass with the <a href="https://en.wikipedia.org/wiki/Neolithic_Revolution">neolithic revolution</a>.
</p>
John Shutthttp://www.blogger.com/profile/00041398073010099077noreply@blogger.com1tag:blogger.com,1999:blog-7068528325708136131.post-36489016058568833572015-06-28T23:01:00.000-07:002019-05-25T17:21:35.037-07:00Thinking outside the quantum box<blockquote>
Doctor: Don't tell me you're lost too.<br>Shardovan: No, but as you guessed, Doctor, we people of Castrovalva are too much part of this thing you call the occlusion.<br>Doctor: But you do see it, the spatial anomaly.<br>Shardovan: With my eyes, no — but, in my philosophy.
<blockquote>
— Doctor Who, <i><a href="https://en.wikipedia.org/wiki/Castrovalva_%28Doctor_Who%29">Castrovalva</a></i>, BBC.
</blockquote>
</blockquote>
<p>
I've made no particular secret, on this blog, that I'm looking (in an adventuresome sort of way) for alternatives to quantum theory. So far, though, I've mostly gone about it rather indirectly, fishing around the edges of the theory for possible angles of attack without ever engaging the theory on its home turf. In this post I'm going to shave things just a bit closer — fishing still, but doing so within line-of-sight of the <i>NO FISHING</i> sign. I'm also going to explain why I'm being so indirect, which bears on what sort of fish I think most likely here.
</p>
<p>
To remind, in previous posts I've mentioned two reasons for looking for an alternative to quantum theory. Both reasons are indirect, considering quantum theory in the larger context of other theories of physics. First, I reasoned that when a succession of theories are getting successively more complicated, this suggests some wrong assumption may be shared by all of them (<a href="http://fexpr.blogspot.com/2012/12/metaclassical-physics.html#sec-mcphys-natstruct">here</a>). Later I observed that quantum physics and relativity are philosophically disparate from each other (<a href="http://fexpr.blogspot.com/2014/04/why-is-beta-substitution-like-higgs.html#sec-bhb-TOE">here</a>), a disparity that has been an important motivator for TOE (Theory of Everything) physicists for decades.
</p>
<p>
The earlier post looked at a few very minor bits of math, just enough to derive Bell's Inequality, but my goal was only to point out that a certain broad strategy could, in a sense, sidestep the nondeterminism and nonlocality of quantum theory. I made no pretense of assembling a full-blown replacement for standard quantum theory based on the strategy (though some researchers are attempting to do so, I believe, under the banner of the <a href="https://en.wikipedia.org/wiki/Transactional_interpretation">transactional interpretation</a>). In the later post I was even less concrete, with no equations at all.
</p>
<blockquote>
<b>Contents</b><br>
<a href="#sec-qt-meme">The quantum meme</a><br>
<a href="#sec-qt-how">How to fish</a><br>
<a href="#sec-qt-why">Why to fish</a><br>
<a href="#sec-qt-hyg">Hygiene again</a><br>
<a href="#sec-qt-qmat">The structure of quantum math</a><br>
<a href="#sec-qt-real">The structure of reality</a>
</blockquote>
<span style="font-size: large;" id="sec-qt-meme">The quantum meme</span>
<p>
Why fish for alternatives away from the heart of the quantum math? Aside, that is, from the fact that any answers to be found in the heart of the math already have, presumably, plenty of eyeballs looking there for them. If the answer is to be found there after all, there's no lasting harm <i>to the field</i> in someone looking elsewhere; indeed, those who looked elsewhere can cheerfully write off their investment knowing they played their part in covering the bases — if it was at least reasonable to cover those bases. But going into that investigation, one wants to choose an elsewhere that's a plausible place to look.
</p>
<p>
Supposing quantum theory can be successfully challenged, I suggest it's quite plausible the successful challenge might not be found by direct assault (even though eventual confrontation would presumably occur, if it were really successful). Consider Thomas Kuhn's account of how science progresses. In normal science, researchers work within a paradigm, focusing their energies on problems within the paradigm's framework and thereby making, hopefully, rapid progress on those problems because they're not distracting themselves with broader questions. Eventually, he says, this focused investigation within the paradigm highlights shortcomings of the paradigm so they become impossible to ignore, researchers have a crisis of confidence in the paradigm, and after a period of distress to those within the field, a new paradigm emerges, through the process he calls a <i>scientific revolution</i>. I've advocated a biological interpretation of this, in which sciences are a variety of <a href="http://fexpr.blogspot.com/2011/03/memetic-organisms.html">memetic organisms</a>, and scientific revolution is the organisms' reproductive process. But if this is so, then scientific paradigms are being selected by Darwinian evolution. What are they being selected <i>for</i>?
</p>
<p>
Well, the success of science hinges on paradigms being selected for how effectively they allow us to understand reality. Science is a force to be reckoned with because our paradigms have evolved to be very good at helping us understand reality. That's why the scientific species has evolved mechanisms that promote empirical testing: in the long run, if you promote empirical testing and pass that trait on to your descendants, your descendants will be more effective, and therefore thrive. So far so good.
</p>
<p>
In theory, one could imagine that eventually a paradigm would come along so consistent with physical reality, and with such explanatory power, that it would never break down and need replacing. In theory. However, there's another scenario where a paradigm could get very difficult to break down. Suppose a paradigm offers the only available way to reason about a class of situations; and within that class of situations are some "chinks in the armor", that is, some considerations whose study could lead to a breakdown of the paradigm; but the only way to apply the paradigm is to frame things in a way that prevents the practitioner from thinking of the chinks-in-the-armor. The paradigm would thus protect itself from empirical attack, not by being more explanatory, but by selectively preventing empirical questions from being asked.
</p>
<p>
What characteristics might we expect such a paradigm to have, and would they be heritable? Advanced math that appears unavoidable would seem a likely part of such a complex. If learning the subject requires indoctrination in the advanced math, then whatever that math is doing to limit your thinking will be reliably done to everyone in the field; and if any replacement paradigm can only be developed by someone who's undergone the indoctrination, that will favor passing on the trait to descendant paradigms. General relativity and quantum theory both seem to exhibit some degree of this characteristic. But while advanced math may be an enabler, it might not be enough in itself. A more directly effective measure, likely to be enabled by a suitable base of advanced math, might be to make it explicitly impossible to ask any question without first framing the question in the form prescribed by the paradigm — as quantum theory does.
</p>
<p>
This suggests to me that the mathematical details of quantum theory may be a sort of tarpit, that pulls you in and prevents you from leaving. I'm therefore trying to look at things from lots of different perspectives in the general area without ever getting quite so close as to be pulled in. Eventually I'll have to move further and further in; but the more outside ideas I've tied lines to before then, the better I'll be able to pull myself out again.
</p>
<span style="font-size: large;" id="sec-qt-how">How to fish</span>
<p>
What I'm hoping to get <i>out</i> of this fishing expedition is new ideas, new ways of thinking about the problem. That's <i>ideas</i>, plural. It's not likely the first new idea one comes up with will be the key to unlocking all the mysteries of the universe. It's not even likely that just one new idea would ever do it. One might need a lot of new ideas, many of which wouldn't actually be part of a solution — but the whole collection of them, including all the ones not finally used, helps to get a sense of the overall landscape of possibilities, which may help in turning up yet more new ideas inspired from earlier ones, and indeed may make it easier to recognize when one actually <i>does</i> strike on some combination of ideas that produce a useful theory.
</p>
<p>
Hence my remark, in an aside in an earlier post, that I'm okay with <i>absurd</i> as long as it's different and shakes up my thinking.
</p>
<p>
Case in point. In the early 1500s, there was this highly arrogant and abrasive iconoclastic fellow who styled himself Philippus Aureolus Theophrastus Bombastus von Hohenheim; ostensibly our word "bombastic" comes from his name. He rejected the prevailing medical paradigm of his day, which was based on ancient texts, and asserted his superiority to the then-highly-respected ancient physician Celsus by calling himself "Paracelsus", which is the name you've probably heard of him under. He also shook up alchemical theory; but I mention him here for his medical ideas. Having rejected the prevailing paradigm, he was rather in the market for alternatives. He advocated observing nature, an idea that really began to take off <i>after</i> he shook things up. He advocated keeping wounds clean instead of applying cow dung to them, which seems a good idea. He proposed that disease is caused by some external agent getting into the body, rather than by an imbalance of humours, which sounds rather clever of him. But I'm particularly interested that he also, grasping for alternatives to the prevailing paradigm, borrowed from folk medicine the principle of <i>like affects like</i>. Admittedly, you couldn't do much worse than some of the prevailing practices of the day. But I'm fascinated by his latching on to like-effects-like, because it demonstrates how bits of replicative material may be pulled in from almost anywhere when trying to form a new paradigm. Having seen that, it figured later into my ideas on <a href="http://fexpr.blogspot.com/2011/03/memetic-organisms.html">memetic organisms</a>.
</p>
<p>
It also, along the way, flags the existence of a really radically different way of picturing the structure of reality. Like-affects-like is a wildly different way of thinking, and therefore ought to be a great limbering-up exercise.
</p>
<p>
In fact, like-affects-like is, I gather, the principle underlying the anthropological phenomenon of magic — <i>sympathetic magic</i>, it's called. I somewhat recall an anthropologist expounding at length (alas, I wish I could remember where) that anthropologically this can be understood as the principle underlying all magic. So I got to thinking, what sort of mathematical framework might one use for this sort of thing? I haven't resolved a specific answer for the math framework, yet; but I've tried to at least set my thoughts in order.
</p>
<p>
What I'm interested in here is the mathematical and thus scientific utility of the like-affects-like principle, not its manifestation in the anthropological phenomenon of magic (as <a href="https://en.wikipedia.org/wiki/Richard_Cavendish_(occult_writer)">Richard Cavendish</a> observed, "The religious impulse is to worship, the scientific to explain, the magical to dominate and command"). Yet the term "like affects like" is both awkward and vague; so I use the term <i>sympathy</i> for discussing it from a mathematical or scientific perspective.
</p>
<p>
How might a rigorous model of this work, structurally? Taking a stab at it, one might have <i>objects</i>, each capable of taking on <i>characteristics</i> with a potentially complex structure, and <i>patterns</i> which can arise in the characteristics of the objects. Interactions between the objects occur when the objects share a pattern. The characteristics of objects might be dispensed with entirely, retaining only the patterns, provided one specifies the structure of the range of possible patterns (perhaps a <a href="https://en.wikipedia.org/wiki/Lattice_(order)">lattice</a> of patterns?). There may be a notion of degrees of similarity of patterns, giving rise to varying degrees of interaction. This raises the question of whether one ought to treat similar patterns as sharing some sort of higher-level pattern and themselves interacting sympathetically. More radically, one might ask whether an object is merely an intersection of patterns, in which case one might aspire to — in some sense — dispense with the objects entirely, and have <i>only</i> a sort of web of patterns. Evidently, the whole thing hinges on figuring out what patterns are and how they relate to each other, then setting up interactions on that basis.
</p>
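<p>
Taking that stab one step further, here is a toy sketch in Python of the structure just described — with every concrete choice (patterns as feature sets, similarity as overlap, strengths summed pairwise) my own arbitrary illustration, not a proposal:
</p>
<pre>
# Toy sketch of a sympathetic model (illustrative only): objects are
# merely sets of patterns; objects interact to the degree that their
# patterns resemble one another.

from itertools import combinations

Pattern = frozenset   # a pattern, reduced for now to a set of features

def similarity(p, q):
    # Degree of similarity between two patterns: here, set overlap.
    return len(p & q) / len(p | q) if p | q else 0.0

def interaction_strength(obj_a, obj_b):
    # One crude choice: sum similarity over all pairs of patterns.
    return sum(similarity(p, q) for p in obj_a for q in obj_b)

wax_doll = {Pattern({"human-shaped", "small"}), Pattern({"wax"})}
person   = {Pattern({"human-shaped", "tall"}), Pattern({"flesh"})}
candle   = {Pattern({"wax"}), Pattern({"cylindrical"})}

objects = {"wax doll": wax_doll, "person": person, "candle": candle}
for (name_a, a), (name_b, b) in combinations(objects.items(), 2):
    print(name_a, "~", name_b, round(interaction_strength(a, b), 2))
</pre>
<p>
Even this crude version raises the questions from the prose above: whether similar patterns should themselves interact via a shared higher-level pattern, and whether the objects can be dispensed with in favor of a web of patterns alone.
</p>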
<p>
I distinguish between three types of sympathy:
<ul>
<li><p>Pseudo-sympathy (type 0). The phenomenon can be understood without recourse to the sympathetic principle, but it may be convenient to use sympathy as a way of modeling it.</p></li>
<li><p>Weak sympathy (type 1). The phenomenon may <i>in theory</i> arise from a non-sympathetic reality, but in practice there's no way to understand it without recourse to sympathy.</p></li>
<li><p>Strong sympathy (type 2). The phenomenon cannot, even in theory, arise from a non-sympathetic reality.</p></li>
</ul>
</p>
<p>
All of which gives, at least, a lower bound on how far outside the box one might think. One doesn't have to <i>apply</i> the sympathetic principle in a theory, in order to benefit from the reminder to keep one's thinking limber.
</p>
<p>
(It is, btw, entirely possible to imagine a metric space of patterns, in which degree of similarity between patterns becomes <i>distance</i> between patterns, and one slides back into a geometrical model after all. To whatever extent the merit of the sympathetic model is in its different way of thinking, to that extent one ought to avoid setting up a metric space of patterns, as such.)
</p>
<span style="font-size: large;" id="sec-qt-why">Why to fish</span>
<p>
Asking questions is, broadly speaking, good. A line comes to mind from James Gleick's biography of Feynman (quoted favorably by Freeman Dyson): "He believed in the primacy of doubt, not as a blemish upon our ability to know but as the essence of knowing." Nevertheless, one does have to pick and choose which questions are worth spending most effort on; as mentioned above, the narrow focus of normal scientific research enables its often-rapid progress. I've been grounding my questions about quantum mechanics in observations about the character of the theory in relation to other theories of physics.
</p>
<p>
By contrast, one <i>could</i> choose to ground one's questions in reasoning about what sort of features reality can plausibly have. Einstein did this when maintaining that the quantum theory was an <i>incomplete</i> theory of the physical world — that it was missing some piece of reality. An example he cited is the Schrödinger's cat thought-experiment: Until observed, a quantum system can exist in a superposition of states. So, set up an experiment in which a quantum event is magnified into a macroscopic event — through a detector, the outcome of the quantum event causes a device to either kill or not kill a cat. Put the whole experimental apparatus, including the cat, in a box and close it so the outcome cannot be observed. Until you open the box, the cat is in a superposition of states, both alive and dead. Einstein reasoned that since the quantum theory alone would lead to this conclusion, there must be something more to reality that would disallow this superposition of cat.
</p>
<p>
The trouble with using this sort of reasoning to justify a line of research is, all it takes to undermine the justification is to say there's no reason reality can't be that strange.
</p>
<p>
Hence my preference for motivations based on the character of the theory, rather than the plausibility of the reality it depicts. My reasoning is still subjective — which is fine, since I'm motivating asking a question, not accepting an answer — but at least the reasoning is then not based on intuition about the nature of reality. Intuition specifically about physical reality <i>could</i> be right, of course, but has gotten a bad reputation — as part of the necessary process by which the quantum paradigm has secured its ecological niche — so it's better in this case to base intuition on some other criterion.
</p>
<span style="font-size: large;" id="sec-qt-hyg">Hygiene again</span>
<p>
To make sure I'm fully girded for battle — this is rough stuff, one can't be too well armed for it — I want to revisit some ideas I collected in earlier blog posts, and squeeze just a bit more out of them than I did before.
</p>
<p>
My previous thought relating explicitly to Theories of Everything was that, drawing an analogy with vau-calculi, spacetime geometry should perhaps be viewed not as a playing field on which all action occurs, but rather as a <a href="http://fexpr.blogspot.com/2014/04/why-is-beta-substitution-like-higgs.html#sec-bhb-vau">hygiene condition</a> on the interactions that make up the universe. This analogy can be refined further. The role of variables in vau-calculi is coordinating causal connections between distant parts of the term. There are four kinds of variables, but unboundedly many actual variables of each kind; and <i>α</i>-renaming keeps these actual variables from bleeding into each other. A particular variable, though we may think of it as a very simple thing — a syntactic atom, in fact — is perhaps better understood as a distributed, complex-structured entity woven throughout the fabric of a branch of the term's syntax tree, humming with the dynamically maintained hygiene condition that keeps it separate from other variables. It may impinge on a large part of the <i>α</i>-renaming infrastructure, but most of its complex distributed structure is separate from the hygiene condition. The information content of the term is largely made up of these complex, distributed entities, with various local syntactic details decorating the syntax tree and regulating the rewriting actions that shape the evolution of the term. Various rewriting actions cause propagation across one (or perhaps more than one) of these distributed entities — and it doesn't actually matter how many rewriting steps are involved in this propagation, as for example even the substitution operations could be handled by gradually distributing information across a branch of the syntax tree via some sort of "sinking" structure, mirror to the binding structures that "rise" through the tree.
</p>
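<p>
For readers who haven't waded through the vau-calculi material, here is the hygiene device itself in miniature — capture-avoiding substitution for the plain λ-calculus, sketched in Python. The term representation and the fresh-name scheme are my own toy choices (and the fresh names are simply assumed not to clash with anything already in use).
</p>
<pre>
# Toy capture-avoiding substitution: alpha-renaming is what keeps one
# variable's distributed identity from bleeding into another's.
# Terms: "x" (variable), ("lam", x, body), ("app", f, a).

import itertools

fresh = ("v%d" % i for i in itertools.count())   # assumed not to clash

def free_vars(t):
    if isinstance(t, str):
        return {t}
    if t[0] == "lam":
        return free_vars(t[2]) - {t[1]}
    return free_vars(t[1]) | free_vars(t[2])

def subst(t, x, s):
    # Compute t[x := s], renaming bound variables to avoid capture.
    if isinstance(t, str):
        return s if t == x else t
    if t[0] == "app":
        return ("app", subst(t[1], x, s), subst(t[2], x, s))
    _, y, body = t                    # a lambda
    if y == x:
        return t                      # x is shadowed here; nothing to do
    if y in free_vars(s):             # would capture: alpha-rename y first
        z = next(fresh)
        body = subst(body, y, z)
        y = z
    return ("lam", y, subst(body, x, s))

# Naive substitution would capture y below; hygiene renames it instead:
print(subst(("lam", "y", ("app", "x", "y")), "x", "y"))
# -> ('lam', 'v0', ('app', 'y', 'v0'))
</pre>
<p>
The renaming is pure bookkeeping — which name gets used doesn't matter, only that distinct variables stay distinct; that arbitrariness is the over-specification discussed below.
</p>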
<p>
Projecting some of this, cautiously, through the analogy to physics, we find ourselves envisioning a structure of reality in which spacetime is a hygiene condition on interwoven, sprawling complex entities that impinge on spacetime but are not "inside" it; whose distinctness from each other is maintained by the hygiene condition; and whose evolution we expect to describe by actions in a dimension orthogonal to spacetime. The last part of which is interestingly suggestive of my <i>other</i> previous post on physics, <a href="http://fexpr.blogspot.com/2012/12/metaclassical-physics.html">where</a> I noted, with mathematical details sufficient to make the point, that while quantum physics is evidently nondeterministic and nonlocal as judged relative to the time dimension, one can recover determinism and locality relative to an orthogonal dimension of "meta-time" across which spacetime evolves.
</p>
<p>
One might well ask why this hygiene condition in physics should take the form of a spacetime geometry that, at least at an intermediate scale, approximates a Euclidean geometry of three space and one time dimension. I have a thought on this, drawing from another of my <a href="https://en.wiktionary.org/wiki/irons_in_the_fire">irons in the fire</a>; enough, perhaps, to move thinking forward on the question. This 3+1 dimension structure is apparently that of <a href="http://fexpr.blogspot.com/2014/03/the-great-vectors-versus-quaternions.html">quaternions</a>. And quaternions are, so at least I suspect (I've been working on a blog post exploring this point), the essence of <i>rotation</i>. So perhaps we should think of our hygiene condition as some sort of rotational constraint, and the structure of spacetime follows from that.
</p>
<p>
I also touched on Theories of Everything in a recent post while exploring the notion that nature is neither discrete nor continuous but something between (<a href="http://fexpr.blogspot.com/2015/05/numberless-differentiation.html#sec-ndif-dcp">here</a>). If there is a balance going on between discrete and continuous facets of physical worldview, apparently the introduction of discrete elementary particles is not, in itself, enough discreteness to counterbalance the continuous feature provided by the wave functions of these particles, and the additional feature of wave-function collapse or the like is needed to even things out. One might ask whether the additional discreteness associated with wave-function collapse could be obviated by backing off somewhat on the continuous side. The uncertainty principle already suggests that the classical view of particles in continuous spacetime — which underlies the continuous wave function (more about that below) — is an over-specification; the need for additional balancing discreteness might be another consequence of the same over-specification.
</p>
<p>
Interestingly, variables in λ-like calculi are also over-specified: that's why there's a need for <i>α</i>-renaming in the first place, because the particular name chosen for a variable is arbitrary as long as it maintains its identity relative to other variables in the term. And <i>α</i>-renaming is the hygiene device analogized to geometry in physics. Raising the prospect that to eliminate this over-specification might also eliminate the analogy, or make it much harder to pin down. There is, of course, Curry's combinatory calculus, which has no variables at all; personally I find Church's variable-using approach easier to read. Tracing <i>that</i> through the analogy, one might conjecture the possibility of constructing a Theory of Everything that didn't need the awkward additional discreteness, by eliminating the distributed entities whose separateness from each other is maintained by the geometrical hygiene condition, thus eliminating the geometry itself in the process. Following the analogy, one would expect this alternative description of physical reality to be harder to understand than conventional physics. Frankly I have no trouble believing that a physics without geometry would be harder to understand.
</p>
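<p>
(To see how the variable-free alternative looks in the calculus world: the identity function λ<i>x</i>.<i>x</i> can be written with Curry's combinators as <i>S K K</i>, since <i>S K K x</i> → <i>K x</i> (<i>K x</i>) → <i>x</i> — no variable names anywhere, hence nothing to rename, and, at least to my eye, considerably harder to read.)
</p>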
<p>
The idea that quantum theory as a model of reality might suffer from having had <i>too much</i> put into it, does offer a curious counterpoint to Einstein's suggestion that quantum theory is missing some essential piece of reality.
</p>
<span style="font-size: large;" id="sec-qt-qmat">The structure of quantum math</span>
<p>
The <i>structure</i> of the math of quantum theory is actually pretty simple... if you stand back far enough. Start with a physical system. This is a small piece of reality that we are choosing to study. Classically, it's a finite set of elementary things described by a set of parameters. Hamilton (yes, that's the same guy who discovered quaternions) proposed to describe the whole behavior of such a system by a single function, since called a Hamiltonian function, which acts on the parameters describing the instantaneous state of the system together with parameters describing the abstract momentum of each state parameter (essentially, how the parameters change with respect to time). So the Hamiltonian is basically an embodiment of the whole classical dynamics of the system, treated as a lump rather than being broken into separate descriptions of the individual parts of the system. Since quantum theory doesn't "do" separate parts, instead expecting everything to affect everything else, it figures the Hamiltonian approach would be particularly compatible with the quantum worldview. Nevertheless, in the classical case it's still <i>possible</i> to consider the parts separately. For a system with a bunch of parts, the number of parameters to the Hamiltonian will be quite large (typically, at least six times the number of parts — three coordinates for position and three for momentum of each part).
</p>
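<p>
(For concreteness — this much is standard classical mechanics, recounted here only to fix ideas — the Hamiltonian <i>H</i>(<i>q</i>, <i>p</i>) determines the dynamics of all the position parameters <i>q<sub>i</sub></i> and conjugate momenta <i>p<sub>i</sub></i> at once, through Hamilton's equations,
<blockquote>
d<i>q<sub>i</sub></i>/d<i>t</i> = ∂<i>H</i>/∂<i>p<sub>i</sub></i> ,&nbsp;&nbsp;&nbsp; d<i>p<sub>i</sub></i>/d<i>t</i> = −∂<i>H</i>/∂<i>q<sub>i</sub></i> .
</blockquote>
One function, treated as a lump, drives every parameter of the system.)
</p>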
<p>
Now, the <i>quantum</i> state of the system is described by a vector over a complex Hilbert space of, typically, infinite dimension. Wait, what? Yes, that's an infinite number of complex numbers. In fact, it might be an <i>uncountably</i> infinite number of complex numbers. Before you completely freak out over this, it's only fair to point out that if you have a real-valued field over three-dimensional space, that's an uncountably infinite number of real numbers (the number of locations in three-space being uncountably infinite). Still, the very fact that you're putting this thing in a Hilbert space, which is to say you're not asking for any particular kind of simple structure relating the different quantities, such as a three-dimensional Euclidean continuum, is kind of alarming. Rather than a smooth geometric structure, this is a deliberately disorganized mess, and honestly I don't think it's unfair to wish there were some more coherent reality "underneath" that gives rise to this infinite structure. Indeed, one might suspect <i>this</i> is a major motive for wanting a hidden variable theory — not wishing for determinism, or wishing for locality, but just wishing for a simpler model of what's going on. David Bohm's hidden variable theory, although it did show one could recover determinism with actual classical particles "underneath", did so without simplifying the mathematics — the mathematical structure of the quantum state was still there, just given a makeover as a potential field. In my <a href="http://fexpr.blogspot.com/2012/12/metaclassical-physics.html">earlier account</a> of this bit of history, I noted that Einstein, seeing Bohm's theory, remarked, "This is not at all what I had in mind." I implied that Einstein didn't like Bohm's theory because it was nonlocal; but one might also object that Bohm's theory doesn't offer a simpler underlying reality, rather a more complicated one.
</p>
<p>
The elements of the vector over Hilbert space are observable classical states of the system; so this vector is <i>indexed</i> by, essentially, the sets of possible inputs to the Hamiltonian. One can see how, step by step, we've ended up with a staggering level of complexity in our description, which we cope with by (ironically) <i>not looking at it</i>. By which I mean, we represent this vast amorphous expanse of information by a single letter (such as <i>ψ</i>), to be manipulated as if it were a single entity using operations that perform some regimented, impersonal operation on all its components that doesn't in general require it to have any overall shape. I don't by any means deride such treatments, which recover some order out of the chaos; but it's certainly not <i>reassuring</i> to realize how much lack of structure is hidden beneath such neat-looking formulae as the Schrödinger equation. And the amorphism beneath the elegant equations also makes it hard to imagine an alternative when looking at the specifics of the math (as suspected based on biological assessment of the evolution of physics).
</p>
<p>
The quantum situation gets its structure, and its dynamics, from the Hamiltonian, that single creature embodying the whole of the rules of classical behavior for the system. The Schrödinger equation (or whatever alternative plays its role) governs the evolution of the quantum state vector over time, and contains within it a differential operator based on the classical Hamiltonian function.
<blockquote>
<i>iℏ</i> ∂Ψ/∂<i>t</i> = <i>Ĥ</i> Ψ .
</blockquote>
One really wants to stop and admire this equation. It's a linear partial differential equation, which is wonderful; nonlinearity is what gives rise to <i>chaos</i> in the technical sense, and one would certainly rather deal with a linear system. Unfortunately, the equation only describes the evolution of the system so long as it remains a purely quantum system; the moment you open the box to see whether the cat is dead, this wave function collapses into observation of one of the classical states indexing the quantum state vector, with (to paint in broad strokes) the amplitudes of the complex numbers in the vector determining the probability distribution of observed classical states.
</p>
<p>
It also satisfies James Clerk Maxwell's <i>General Maxim of Physical Science</i>, which says (as <a href="https://archive.org/details/theoryofrelativi00silbrich">recounted</a> by Ludwik Silberstein) that when we take the derivatives of our system with respect to time, we should end up with expressions that do not themselves explicitly involve time. When this is so, the system is "complete", or, "undisturbed". (The idea here is that if the rules governing the system change over time, it's because the system is being affected by some other factor that is varying over time.)
</p>
<p>
The equation is, indeed, highly seductive. Although I'm frankly on guard against it, yet here I am, being drawn into making remarks on its properties. Back to the question of structure. This equation effectively segregates the mathematical description of the system into a classical part that drives the dynamics (the Hamiltonian), and a quantum part that smears everything together (the quantum state vector). The wave function Ψ, described by the equation, is the adapter used to plug these two disparate elements together. The moment you start contemplating the equation, this manner of segregating the description starts to seem inevitable. So, having observed these basic elements of the quantum math, let us step back again before we get stuck.
</p>
<p>
The key structural feature of the quantum description, in contrast to classical physics, is that the parts can't be considered separately. This classical separability produced the sense of simplicity that, I speculated above, could be an ulterior motive for hidden variable theories. The term for this is <i>superposition of states</i>, i.e., a quantum state that could collapse into any of multiple classical states, and therefore must <i>contain</i> all of those classical states in its description.
</p>
<p>
A different view of this is offered by so-called <i>quantum logic</i>. The idea here (notably embraced by physicist <a href="https://en.wikipedia.org/wiki/David_Finkelstein">David Finkelstein</a>, who I've mentioned in an earlier post because he was lead author of some papers in the 1960s on quaternion quantum theory) is that quantum theory is a logic of propositions about the physical world, differing fundamentally from classical propositional logic because of the existence of superposition as a propositional principle. There's a counterargument that this isn't really a "logic", because it doesn't describe reasoning as such, just the behavior of classical observations when applied as a filter to quantum systems; and indeed one can see that something of the sort is happening in the Schrödinger equation, above — but that would be pulling us back into the detailed math. Quantum logic, whatever it doesn't apply to, <i>does</i> apply to observational propositions under the regime of quantum mechanics, while remaining gratifyingly abstracted from the detailed quantum math.
</p>
<p>
Formally, in classical logic we have the distributive law
<blockquote>
<i>P</i> and (<i>Q</i> or <i>R</i>) = (<i>P</i> and <i>Q</i>) or (<i>P</i> and <i>R</i>) ;
</blockquote>
but in quantum logic, (<i>Q</i> or <i>R</i>) is superpositional in nature, saying that we can eliminate options that are neither, yet allowing more than the union of situations where one holds and situations where the other holds; and this causes the distributive law to fail. If we know <i>P</i>, and we know that either <i>Q</i> or <i>R</i> (but we may be fundamentally unable to determine which), this is <b>not</b> the same as knowing that either both <i>P</i> and <i>Q</i>, or both <i>P</i> and <i>R</i>. We aren't allowed to refactor our proposition so as to treat <i>Q</i> separately from <i>R</i>, without changing the nature of our knowledge.
</p>
<blockquote><i>[note: I've fixed the distributive law, above, which I botched and didn't even notice till, thankfully, a reader pointed it out to me. Doh!]</i></blockquote>
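<p>
The failure is easy to exhibit numerically. Here's a small sketch, assuming numpy, that models propositions in the usual quantum-logic way as subspaces of a Hilbert space (a single qubit here), with "and" as intersection and "or" as span: <i>P</i> is spin-up along x, while <i>Q</i> and <i>R</i> are spin-up and spin-down along z.
</p>
<pre>
import numpy as np

def projector(basis):
    """Orthogonal projector onto the span of the (orthonormal) columns."""
    if basis.shape[1] == 0:
        return np.zeros((basis.shape[0], basis.shape[0]))
    return basis @ basis.conj().T

def join(A, B):
    """'or': span of the union, via SVD of the stacked bases."""
    M = np.hstack([A, B])
    if M.shape[1] == 0:
        return M
    U, s, _ = np.linalg.svd(M, full_matrices=False)
    return U[:, s > 1e-10]

def meet(A, B):
    """'and': intersection, the eigenvalue-2 eigenspace of P_A + P_B."""
    w, v = np.linalg.eigh(projector(A) + projector(B))
    return v[:, w > 2 - 1e-10]

ket0 = np.array([[1.0], [0.0]])               # Q: spin-up along z
ket1 = np.array([[0.0], [1.0]])               # R: spin-down along z
plus = np.array([[1.0], [1.0]]) / np.sqrt(2)  # P: spin-up along x

P, Q, R = plus, ket0, ket1
lhs = meet(P, join(Q, R))                     # P and (Q or R)
rhs = join(meet(P, Q), meet(P, R))            # (P and Q) or (P and R)
print(lhs.shape[1], rhs.shape[1])             # 1 0: the distributive law fails
</pre>
<p>
The left side is the whole one-dimensional subspace <i>P</i>, while the right side is the zero subspace; the two sides don't even have the same dimension.
</p>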
<p>
One can see in this broadly the reason why, when we shift from classical physics to quantum physics, we lose our ability to consider the underlying system as made up of elementary things. In considering each classical elementary thing, we summed up the influences on that thing from each of the other elementary things, and this sum was a small tidy set of parameters describing that one thing alone. The essence of quantum logic is that we can no longer refactor the system in order to take this sum; the one elementary thing we want to consider now has a unique relationship with each of the other elementary things in the system.
</p>
<p>
Put that way, it seems that the one elementary thing we want to consider would actually have a close personal relationship with <i>each other elementary thing in the universe</i>. A very large <a href="https://en.wiktionary.org/wiki/Rolodex">Rolodex</a> indeed. One might object that most of those elementary things in the universe are not part of the system we are considering — but what if that's what we're doing wrong? Sometimes, a whole can be naturally decomposable into parts in one way, but when you try to decompose it into parts in a different way you end up with a complicated mess because all of your "parts" are interacting with each other. I suggested, back in my first blog post on physics, that there might be some wrong assumption shared by both classical <i>and</i> quantum physics; well, the idea that the universe is made up of elementary particles (or quanta, whatever you prefer to call them) is something shared by both theories. The quantum math (Schrödinger equation again, above) has this classical decomposition built into its structure, pushing us to perceive the subsequent quantum weirdness as intrinsic to reality, or perhaps intrinsic to our observation of reality — but what if it's rather intrinsic to that particular way of slicing off a piece of the universe for consideration?
</p>
<p>The quantum folks have been insisting for years that quantum reality seems strange only because we're imposing our intuitions from the macroscopic world onto the quantum-scale world where they don't apply. Okay... Our notion that the universe is made up of individual things is certainly based on our macroscopic experience. What if it breaks down sooner than we thought — what if, instead of pushing the idea of individual things down to a smaller and smaller scale until they sizzle apart into a complex Hilbert space, we should instead have concluded that individual things are something of an illusion even at macroscopic scales?
</p>
<span style="font-size: large;" id="sec-qt-real">The structure of reality</span>
<p>
One likely objection is that no matter how you split up reality, you'd still have to observe it classically and the usual machinery of quantum mechanics would apply just the same. There are at least a couple of ways — two come to mind atm — for some differently shaped 'slice' of reality to elude the quantum machinery.
</p>
<ul>
<li>
<p>
The alternative slice might not be something directly observable.
</p>
<p>
Here an extreme example comes in handy (as hoped). Recall the sympathetic hypothesis, above. A pattern would not be subject to direct observation, any more than a Platonic ideal like "table" or "triangle" would be. (Actually, it seems possible a pattern would <i>be</i> a Platonic ideal.)
</p>
<p>
This is also reminiscent of the analogy with vau-calculus. I noted above that much of the substance of a calculus term is made up of <i>variables</i>, where by a variable I meant the entire dynamically interacting web delineated by a variable binding construct and all its matching variable instances. A variable in this sense isn't, so to speak, observable; one can observe a particular instance of a variable, but a variable instance is just an atom, and not particularly interesting.
</p>
</li>
<li>
<p>
The alternative slice might be something quantum math can't <i>practically</i> cope with. Quantum math is very difficult to apply in practice; some simple systems can be solved, but others are intractable. (It's fashionable in some circles to assume more powerful computers will solve all math problems. I'm reminded of a quote attributed to <a href="https://en.wikipedia.org/wiki/Eugene_Wigner">Eugene Wigner</a>, commenting on a large quantum calculation: "It is nice to know that the computer understands the problem. But I would like to understand it, too.") It's not inconceivable that phenomena deviating from quantum predictions are "hiding in plain sight". My own instinct is that if this were so, they probably wouldn't be just on the edge of what we can cope with mathematically, but well outside that perimeter.
</p>
<p>
This raises the possibility that quantum mechanics might be an <i>idealized approximation</i>, holding asymptotically in a degenerate case — in somewhat the same way that Newtonian mechanics holds approximately for macroscopic problems that don't involve very high velocities.
</p>
</li>
</ul>
<p>
We have several reasons, by this point, to suspect that whatever it is we're contemplating adding to our model of reality, it's nonlocal (that is, nonlocal relative to the time dimension, as is quantum theory). On one hand, bluntly, classical physics has had its chance and not worked out; we're already conjecturing that insisting on a classical approach is what got us into the hole we're trying to get out of. On the other hand, under the analogy we're exploring with vau-calculus, we've already noted that most of the term syntax is occupied by distributed variables — which are, in a deep sense, fundamentally nonlocal. The idea of spacetime as a hygiene condition <i>rather than</i> a base medium seems, on the face of it, to call for some sort of nonlocality; in fact, saying reality has a substantial component that doesn't follow the contours of spacetime is evidently <i>equivalent</i> to saying it's nonlocal. Put that way, saying that reality can be usefully sliced in a way that defies the division into elementary particles/things is <i>also</i> another way of saying it's nonlocal, since when we speak of dividing reality into elementary "things", we mean, things partitioned away from each other by spacetime. So what we have here is several different views of the same sort of conjectured property of reality. Keep in mind that multiple views of a single structure are a common and fruitful phenomenon in mathematics.
</p>
<p>
I'm inclined to doubt this nonlocality would be of the sort already present in quantum theory. Quantum nonlocality might be somehow a degenerate case of a more general principle; but, again bluntly, quantum theory too has had its chance. Moreover, it seems we may be looking for something that operates on macroscopic scales, and quantum nonlocality (entanglement) tends to break down (decohere) at these scales. This suggests the prospect of some form of <i>robust</i> nonlocality, in contrast to the more fragile quantum effects.
</p>
<p>
So, at this point I've got in my toolkit of ideas (not including sympathy, which seems atm quite beyond the pale, limited to the admittedly useful role of devil's advocate):
<ul>
<li>
a physical structure substantially not contained within spacetime.
<ul>
<li>space emergent as a hygiene condition, perhaps rotation-related.</li>
<li>robust nonlocality, with quantum nonlocality perhaps as an asymptotic degenerate case.</li>
<li>some non-spacetime dimension over which one can recover abstract determinism/locality.</li>
</ul>
</li>
<li>
decomposition of reality into coherent "finite slices" in some way other than into elementary things in spacetime.
<ul>
<li>slices may be either non-observable or out of practical quantum scope.</li>
<li>the structural role of the space hygiene condition may be to keep slices distinct from each other.</li>
<li>conceivably an alternative decomposition of reality may allow some over-specified elements in classical descriptions to be dropped entirely from the theory, at unknown price to descriptive clarity.</li>
</ul>
</li>
</ul>
I can't make up my mind if this is appallingly vague, or consolidating nicely. Perhaps both. At any rate, the next phase of this operation would seem likely to shift further along the scale toward identifying concrete structures that meet the broad criteria. In that regard, it is probably worth remarking that current paradigm physics <i>already</i> decomposes reality into nonlocal slices (though not in the sense suggested here): the <a href="https://en.wikipedia.org/wiki/Elementary_particle">types of elementary particles</a>. The slices aren't in the spirit of the "finite" condition, as there are only (atm) seventeen of them for the whole of reality; and they may, perhaps, be too closely tied to spacetime geometry — but they are, in themselves, certainly nonlocal.
</p>
John Shutthttp://www.blogger.com/profile/00041398073010099077noreply@blogger.com0tag:blogger.com,1999:blog-7068528325708136131.post-80293516190848236562015-05-17T10:26:00.000-07:002020-04-22T09:53:49.768-07:00Numberless differentiation<blockquote>
The method described in I.3 resulted in an analogy between the "discrete" space of index values Z=(1,2,...) and the continuous state space Ω of the mechanical system[...]. That this cannot be achieved without some violence to the formalism and to mathematics is not surprising.
<blockquote>
— <a href="https://en.wikipedia.org/wiki/John_von_Neumann">John von Neumann</a>, <i>Mathematical Foundations of Quantum Mechanics</i>, trans. Robert T. Beyer, 1955, §I.4.
</blockquote>
</blockquote>
<p>
In this post, I mean to explore a notion that arose somewhat peripherally in an earlier post. I noted that the abstractive power of a programming language is, in a sense, the <a href="http://fexpr.blogspot.com/2013/12/abstractive-power.html">second derivative of its semantics</a>. I've also noted, in another post, a <a href="http://fexpr.blogspot.com/2014/04/why-is-beta-substitution-like-higgs.html">curious analogy</a> between term rewriting calculi and cosmological physics, which I used to inspire some off-the-wall thinking about physics. So I've been wondering if the abstractive power notion might be developed into a formal tool, for altogether non-numeric discrete systems, akin to the continuous differential calculus that plays such a central role in modern physics.
</p>
<p>
Developing a mathematical tool of this sort is something of a guessing game. One hopes to find one's way to some <a href="http://fexpr.blogspot.com/2012/12/metaclassical-physics.html#sec-mcphys-natstruct">natural structure</a> in the <a href="https://en.wikipedia.org/wiki/Platonism">platonic realm</a> of the mathematics, using whatever clues one can discern, wherever one finds them (be they in the platonic realm or elsewhere).
</p>
<p>
I've a story from my mother, from when she was in graduate school, about a student asking for motivation for something in an algebraic topology class. The professor replied, <i>when you build a beautiful cathedral you tear down the scaffolding.</i> To me, honestly, this always seemed to be missing the point. You should keep a record of where the scaffolding was. And the architectural plans. And, while you're at it, the draft architectural plans.
</p>
<p>
Candidly, when I started this I did not know what I was expecting to find, and the path of reasoning has taken several unexpected turns along the way. One thing I haven't found is a purely non-numeric discrete device analogous to continuous differential calculus; but I've got a plate full of deep insights that deserve their chance to be shared, and if the non-numeric discrete device is there to discover, it seems I must be closer to it than I started.
</p>
<blockquote>
<b>Contents</b><br>
<a href="#sec-ndif-dcp">Discrete continuous physics</a><br>
<a href="#sec-ndif-difp">Differentiation in physics</a><br>
<a href="#sec-ndif-abst">Abstractiveness in programming</a><br>
<a href="#sec-ndif-caus">Causation in term rewriting</a><br>
<a href="#sec-ndif-draft">Draft analogy</a><br>
<a href="#sec-ndif-break">Breaking the draft analogy</a><br>
<a href="#sec-ndif-reset">Retrenching</a><!-- <br> -->
</blockquote>
<span style="font-size: large;" id="sec-ndif-dcp">Discrete continuous physics</span>
<p>
Physics has been struggling with the interplay of continuous and discrete elements for centuries (arguably <a href="https://en.wikipedia.org/wiki/Zeno%27s_paradoxes">since ancient times</a>). Continuous math is a clean powerful tool and therefore popular; on the face of it, one wouldn't expect to get nearly as much useful physics done with purely discrete structures. On the other hand, our ordinary experience includes discrete quantities as well as continuous ones (recalling <a href="https://en.wikipedia.org/wiki/Leopold_Kronecker">Kronecker</a>'s attributed remark, <i>God made the integers; all else is the work of man</i>). Nineteenth-century classical physics reconciled continuous and discrete aspects by populating space with continuous fields and point particles.
</p>
<p>
The field/particle approach collapsed with quantum mechanics, which is commonly described as saying that various things appearing continuous at a macroscopic level turn out to be discrete at a very small scale. I'd put it almost the other way around. Our notion of <i>discrete things</i> comes from our ordinary, macroscopic experience, but evidently these macroscopic discrete things can be broken up into smaller things; and if we keep pushing this ordinary notion down to a sufficiently small scale, eventually it can't be pushed any further, and we end up having to postulate an underlying (non-observable) continuous <i>wave function</i> which, having been so-to-speak squashed as nearly flat as it can be, smears even supposedly discrete elementary particles into a continuous complex-valued field. Which gives rise to discrete events by means of wave-function collapse, the ultimate <a href="https://en.wiktionary.org/wiki/kludge">kludge</a>. Wave-function collapse is a different way of reconciling continuous and discrete aspects of the theory, but tbh it feels a bit contrived.
</p>
<p>
In this view, howsoever we wandered into it, we've assumed a continuous foundation and sought to derive discreteness from the continuous substratum. Traditional quantum theory derives discreteness by means of wave function collapse. Some modern theories use the concept of a standing wave to derive discreteness; <a href="https://en.wikipedia.org/wiki/String_theory">string theory</a> does this, with closed loops vibrating at some overtone of their length, and the <a href="https://en.wikipedia.org/wiki/Transactional_interpretation">transactional interpretation</a> does too, with waves propagating forward and backward in time between spacetime events and summing to a standing wave.
</p>
<p>
An alternative sometimes considered is that the foundation could be discrete — so-called <a href="https://en.wikipedia.org/wiki/Digital_physics">digital physics</a>. I've some doubts about this, intuitively, because — how to put this — I don't think Nature likes to be neatly classified as "discrete" or "continuous". I remember, in a long-ago brainstorming session with my father, posing the semi-rhetorical question, "what is neither discrete nor continuous?", to which he mildly suggested, "reality". A purely discrete foundation seems as unlikely to me as a purely continuous one; in fact, it's possible the overly continuous character of the underlying wave function in quantum theory may be one subtle reason I'm not fond of that theory.
</p>
<p>
I have a particular, if round-about, reason for interest in the prospects for introducing some form of discrete, nonlocal, nonquantum element into physics. I'd actually been thinking in this direction for some time before I realized <i>why</i> I was tending that way, and was rather excited to figure out why because it gives me a place to start looking for what kind of role this hypothetical discrete element might play in the physics. My reasoning comes from, broadly, <a href="https://en.wikipedia.org/wiki/Mach%27s_principle">Mach's principle</a> — the idea that the way the local universe works is a function of the shape of the rest of the universe. (Mach's principle, btw, was given its name by Albert Einstein, who drew inspiration from it for general relativity.)
</p>
<p>
Suppose we want to study a very small part of the universe; say, for example, a single elementary particle. This requires us either to disregard the rest of the universe entirely, or to assume the influence of the rest of the universe on our single particle is neatly summarized by some simple factor in our system, such as a potential field. But quantum mechanics starts us thinking, somewhat at least, in terms of everything affecting everything else. Suppose this is actually <i>more so</i> than quantum mechanics tells us. Suppose our single particle is interacting nonlocally with the whole rest of the universe. We don't know anything about these interactions with the rest of the universe, so their influence on our one particle is —for us— essentially random. If it were all local and neatly summed up, we might try to ask for it to be described by a potential field of some sort; but since we're supposing this is nonlocal interaction, and we don't have any sort of nonlocal structure in our theory by which to describe it, we really can't do anything with it other than expect it to smear our probability distribution for our one particle.
</p>
<p>
This nonlocal interaction I'm talking about is presumably <i>not</i>, in general, the sort of nonlocal interaction that arises in quantum mechanics. That sort of interaction, at least when it occurs <i>within</i> the system one is studying, is <a href="https://en.wikipedia.org/wiki/quantum_entanglement">quantum entanglement</a>, and it's really quite fragile: at large scales it becomes very likely to <a href="https://en.wikipedia.org/wiki/Quantum_decoherence">decohere</a>, its nonlocality breaking down under interactions with the rest of the universe. I'm hypothesizing, instead, some more robust sort of nonlocality, that can work on a large scale. From my reasoning above, it seems that as soon as we hypothesize some such robust nonlocal interaction, we immediately <i>expect</i> the behavior of our single-particle system to appear nondeterministic, simply because we no longer <i>can</i> have an undisturbed system, unless we consider the entire universe as a whole. In contrast to the frequent presentation of quantum nondeterminism as "strange", under this hypothesis nondeterminism at small scales is unsurprising — though not un<i>interesting</i>, because we may get clues to the character of our robust nonlocal interaction by looking at how it disturbs our otherwise undisturbed small system.
</p>
<p>
Following these clues is made more challenging because we have to imagine not one new element of the theory, but <i>two</i>: on one hand, some sort of robust nonlocal interaction, and on the other hand, some local behavior that our small system <i>would</i> have had if the robust nonlocal interaction had not been present. We expect the familiar rules of quantum mechanics to be the <i>sum</i> of these two elements. I have already implied that the local element is not itself quantum mechanics, as I suggested something of quantum nondeterminism might be ascribed to the robust nonlocal element. One <i>might</i> suppose the local element is classical physics, and our robust nonlocal interaction is what's needed to sum with classical physics to produce quantum mechanics. Or perhaps, if we subtract the robust nonlocal interaction from quantum mechanics, we get something else again.
</p>
<p>
This also raises interesting possibilities at the systemic level. The sum of these two hypothetical elements doesn't have to be quantum mechanics exactly, it only has to be close enough for quantum mechanics to be useful when looking only at a small system; so, quantum mechanics could be a useful approximation in a special case, rather as Newtonian mechanics is. We may expect the robust nonlocal element to be significant at cosmological scales, and we may have misread some cosmological phenomena because we were expecting effects on that scale to be local. But just at the moment my point is that, since spacetime is apparently our primal source of continuity, and our robust nonlocal interaction is specifically bypassing that continuity, I would expect the new interaction to be at least partly discrete rather than continuous; intuitively, if it were altogether continuous one might expect it to be more naturally <i>part</i> of the geometry rather than remaining "nonlocal" as such.
</p>
<span style="font-size: large;" id="sec-ndif-difp">Differentiation in physics</span>
<p>
We want a schematic sense of the continuous device we're looking for an analogy to.
</p>
<p>
Suppose <b>𝓜</b> is a manifold (essentially, a possibly-curved <i>n</i>-dimensional space), and <b><i>F</i></b> is a field on <b>𝓜</b>. The value of the field at a point <i>p</i> on <b>𝓜</b>, <i><b>F</b>p</i>, might be a scalar (an ordinary number, probably real or complex), or perhaps a vector (directed magnitude); at any rate <i><b>F</b>p</i> would likely be some sort of tensor — but I'm trying to assume as little as possible. So <i><b>F</b>p</i> is <i>some</i> sort of field value at <i>p</i>.
</p>
<p>
The derivative of <b><i>F</i></b> at <i>p</i> is... something... that describes how <b><i>F</i></b> is changing on <b>𝓜</b> at <i>p</i>. If you know the derivative <b><i>F</i> '</b> of <b><i>F</i></b> at every point in <b>𝓜</b>, you know the whole "shape" of <b><i>F</i></b>, and from this shape <b><i>F</i> '</b> you can <i>almost</i> reconstruct <b><i>F</i></b>. The process of doing so is <i>integration</i>. The process doesn't quite reconstruct <b><i>F</i></b> because you only know how <b><i>F</i></b> changes across <b>𝓜</b>, not what it changes relative to. This absolute reference for all of <b><i>F</i></b> doesn't change, across the whole manifold <b>𝓜</b>, so it isn't part of the information captured by <b><i>F</i> '</b>. So the integral of <b><i>F</i> '</b> is, at least conceptually, <b><i>F</i></b> plus a constant.<!-- (I say <i>conceptually</i> because thus far I haven't required the value of the field at a point, <i><b>F</b>p</i>, to belong to an additive group; so thus far we don't know we can <i>add</i> a constant to it.) -->
</p>
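<p>
A quick numerical sketch of that "plus a constant", assuming numpy, with a one-dimensional stand-in for the manifold and a sine field offset by a constant:
</p>
<pre>
import numpy as np

x = np.linspace(0.0, 2 * np.pi, 1000)   # a 1-D stand-in for the manifold
F = np.sin(x) + 3.0                     # a field, offset by a constant

Fprime = np.gradient(F, x)              # the derivative: the "shape" of F

# Integrating F' (trapezoid rule) reconstructs F only up to a constant:
Frec = np.concatenate(([0.0],
    np.cumsum((Fprime[1:] + Fprime[:-1]) / 2 * np.diff(x))))
print(np.allclose(Frec + F[0], F, atol=1e-3))   # True; the constant is F[0]
</pre>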
<p>
By assuming just a bit more about the values of field <b><i>F</i></b>, we can see how to do more with derivative <b><i>F</i> '</b> than merely integrate it to almost-reconstruct <b><i>F</i></b>. The usual supposition is that <i><b>F</b>p</i> is either a number (a scalar, continuous), or a number together with a <i>configuration relative to the manifold </i><b>𝓜</b>. The idea here is that the configuration is a simple relationship to the manifold, and the number gives it "depth" (more-or-less literally) by saying "how big" is the configured entity. The simplest such configured value is a <i>vector</i>: a directed magnitude, where the direction on the manifold is the configuration, and the magnitude in that direction is the number. One can also have a <i>bivector</i>, where the configuration, rather than a linear direction, is a two-dimensional plane orientation. There's a more subtle distinction between vectors and covectors (in general, multivectors and comultivectors), which has to do with integration versus differentiation. These distinctions of "type" (scalar, vector, covector, bivector, cobivector) are relatively crude; most of the configuration information is a continuous orientation, some variant of "direction". And ultimately, we use the configurations to guide relations between <i>numbers</i>. All that information about "shape" is channeled into numerical equations.
</p>
<p>
The continuity of the manifold comes into the situation in two ways. One, already obliquely mentioned, is that the configuration of the field is only partly discrete, the rest of it (in all but the scalar-field case) being a sort of pseudo-direction on the manifold whose value is continuous, and interacts in some continuous fashion with the numerical depth/intensity of the field. The other insinuation of manifold continuity into the continuous derivative is more pervasive: the derivative describes the shape of the field at a point <i>p</i> by taking the shape in an ε-neighborhood of <i>p</i> in the <i>limit</i> as ε approaches zero. The very idea of this limit requires that the neighborhood size ε (as well as the field intensity and probably the pseudo-direction) be continuous.
</p>
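<p>
As a tiny reminder of how that limit behaves in practice, here's a sketch, again assuming numpy, of difference quotients over shrinking ε-neighborhoods closing in on the derivative:
</p>
<pre>
import numpy as np

# Difference quotients at p = 1.0, with sin as a stand-in field;
# the error against the true derivative cos(p) shrinks with eps.
f, p = np.sin, 1.0
for eps in [1e-1, 1e-2, 1e-3, 1e-4]:
    quotient = (f(p + eps) - f(p)) / eps
    print(eps, quotient, abs(quotient - np.cos(p)))
</pre>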
<span style="font-size: large;" id="sec-ndif-abst">Abstractiveness in programming</span>
<p>
The analogy needs a similar schematic of the discrete device. Abstractiveness, in my treatment, is a relation between programming languages, in which some programming language <i>Q</i> is formally <i>as abstractive as</i> programming language <i>P</i>.
</p>
<p>
The basic structure on which the treatment rests is an abstract state machine, whose states are "languages" (ostensibly, programming languages), and whose transitions are labeled by <i>texts</i>. The texts are members of a set <i>T</i> of terms generated over a context-free grammar; each text is treated as an atomic whole, so when we consider a <i>sequence of texts</i>, the individual texts within the sequence remain distinct from each other, as if <i>T</i> were the "alphabet" in which the sequence is written (rather than all the texts in the sequence being run together as a string over a more primitive alphabet). Some subset of the texts are designated as <i>observables</i>, which are used to judge equivalences between programs. In a typical interpretation, non-observable texts are units of program code, such as module declarations, while observable texts are descriptors of program output.
</p>
<p>
Everything that matters about a state of this abstract machine is captured by the set of text sequences possible from that state (a touch of Mach's principle, there); so formally we define each language to be the set of text sequences, without introducing separate objects to represent the states. A language <i>P</i> is thus a set of text sequences, <i>P</i> ⊆ <i>T</i><sup>*</sup>, that's closed under prefixing (that is, if <i>xy</i> ∈ <i>P</i> then <i>x</i> ∈ <i>P</i>). We write <i>P</i>/<i>x</i> for the language reached from language <i>P</i> by text sequence <i>x</i> ∈ <i>P</i>; that is, <i>P</i>/<i>x</i> = { <i>y</i> | <i>xy</i> ∈ <i>P</i> }.
</p>
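<p>
A throwaway sketch of these definitions, in Python, with texts as plain strings and text sequences as tuples (the particular texts are invented for the demo):
</p>
<pre>
def prefix_close(seqs):
    """Close a set of text sequences under taking prefixes."""
    return {s[:i] for s in seqs for i in range(len(s) + 1)}

def after(P, x):
    """P/x: the language reached from language P by text sequence x in P."""
    assert x in P
    return {s[len(x):] for s in P if s[:len(x)] == x}

P = prefix_close({("decl", "out_a"), ("decl", "decl", "out_b")})
print(after(P, ("decl",)))
# {(), ('out_a',), ('decl',), ('decl', 'out_b')}  (set order may vary)
</pre>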
<p>
These structures are used to define formally when one language "does" at least as much as another language. The criterion is existence of a function <i>φ</i> : <i>P</i> → <i>Q</i>, also written <i>φP</i> = <i>Q</i>, that maps each text sequence <i>x</i> ∈ <i>P</i> to a text sequence <i>φx</i> ∈ <i>φP</i> such that <i>φ</i> preserves prefixes (<i>φx</i> is always a prefix of <i>φxy</i>) and <i>φ</i> also preserves some property of <i>P</i> understood as defining what it "does". For language <i>expressiveness</i>, this additional preserved property is concerned with the observables, resulting in a formal statement that <i>Q</i> is at least as expressive as <i>P</i>. For language <i>abstractiveness</i>, the additional preserved property is concerned with both observables and expressiveness relations between arbitrary <i>P</i>/<i>x</i> and <i>P</i>/<i>y</i>, resulting in a formal statement that <i>Q</i> is at least as abstractive as <i>P</i>.
</p>
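<p>
Continuing the toy sketch above, the least intrusive sort of <i>φ</i> rewrites texts one for one, and such a map preserves prefixes automatically. The desugaring table here is, of course, hypothetical.
</p>
<pre>
SUGAR = {"let": "apply-lambda"}   # hypothetical desugaring table

def phi(x):
    """Rewrite each P-text as a Q-text; a per-text map like this
    preserves prefixes automatically: phi(x) is a prefix of phi(x + y)."""
    return tuple(SUGAR.get(t, t) for t in x)

assert phi(("let", "out_a"))[:1] == phi(("let",))   # prefix preservation
</pre>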
<p>
The function <i>φ</i> is understood as "rewriting" <i>P</i> texts as <i>Q</i> texts; and the less intrusive this rewriting is, the more easily <i>Q</i> can subsume the capabilities of <i>P</i>. For expressiveness, the class of macro (aka polynomial) transformations is traditionally of interest; operations that can be eliminated by this sort of transformation are traditionally called <i>syntactic sugar</i>.
</p>
<span style="font-size: large;" id="sec-ndif-caus">Causation in term rewriting</span>
<p>
Anyone with a bit of background in computer science has probably worked both with grammars and with term-rewriting calculi (though likely not in depth at the same time). Have you thought about the contrast between them? I hadn't, really, until I ended up doing a master's thesis formulating a novel grammar model (<a href="ftp://ftp.cs.wpi.edu/pub/projects_and_papers/theory/Shutt-thesis93.pdf">pdf</a>), followed by a doctoral dissertation formulating a novel term-rewriting calculus (<a href="http://www.wpi.edu/Pubs/ETD/Available/etd-090110-124904/">here</a>). Since I suspect grammars tend to be thought of with less, well, <i>gravitas</i> than calculi, I'll explain a bit of how the grammars in my master's thesis work.
</p>
<p>
The formalism is called RAGs — Recursive Adaptive Grammars. Where an <a href="https://en.wikipedia.org/wiki/Attribute_grammar">attribute grammar</a> would have a terminal alphabet for <i>syntax</i>, a nonterminal alphabet for <i>metasyntax</i> (that is, for specifying classes of syntax), and various domains of attribute values for <i>semantics</i>, a RAG has a single unified domain of <i>answers</i> serving all three roles (except that for <i>metasyntax</i> I'll use the deeper name <i>cosemantics</i>). A rule form looks like
<blockquote>
⟨<i>v</i><sub>0</sub>,<i>p</i><sub>0</sub>⟩ → <i>t</i><sub>1</sub> ⟨<i>p</i><sub>1</sub>,<i>v</i><sub>1</sub>⟩ ... <i>t<sub>n</sub></i> ⟨<i>p<sub>n</sub></i>,<i>v<sub>n</sub></i>⟩ <i>t</i><sub><i>n</i>+1</sub>
</blockquote>
where each <i>t<sub>k</sub></i> is a syntax string, and each ⟨•,•⟩ is a cosemantics/semantics pair suitable for labeling a parent node in a parse tree. Conceptually, cosemantics is inherited, specifying how a tree can grow downward (the cosemantics on the root node is the "start symbol"); while semantics is synthesized, specifying the resultant meaning of the parsed string. The <i>v<sub>k</sub></i> are variables, provided from other parts of the tree: cosemantics of the parent node, provided from above; and semantics of the children, provided from below. The <i>p<sub>k</sub></i> are <i>polynomials</i>, determined by the current rule from these variables, hence, from the cosemantics of the parent and semantics of the children. (If the grammar is well-behaved, each child's cosemantics <i>p<sub>k</sub></i> only uses variables <i>v</i><sub>0</sub>,...<i>v</i><sub><i>k</i>−1</sub>, so there are no circular dependencies in the parse tree.)
</p>
<p>
Playing the role of the rule set of an attribute grammar, a RAG has a rule function <i>ρ</i> which maps each answer <i>a</i> into a set of rule forms <i>ρ</i>(<i>a</i>); in selecting answers for the variables in a rule form <i>r</i> (producing a <i>rule instance</i> of <i>r</i>), the inherited cosemantics <i>v</i><sub>0</sub> has to be some answer <i>a</i> such that <i>r</i> ∈ <i>ρ</i>(<i>a</i>).
</p>
<p>
If the polynomials were only allowed to construct answers, our grammars could <i>recognize</i> somewhat more than context-free languages; but we also want to <i>compute</i> semantics using Turing-powerful computation. For this, I permitted in the polynomials a non-constructive binary operator — the <i>query operator</i>, written as an infix colon, •:•. Conceptually, parsing starts with cosemantics, parses syntax, and in doing so synthesizes semantics; but in the usual grammatical derivation relation, cosemantics and semantics are both on the left side of the derivation, while syntax is on the right, thus:
<blockquote>
⟨cosemantics, semantics⟩ ⇒<sup>+</sup> syntax
</blockquote>
The query operator rearranges these elements, because it has the meaning "find the semantics that results from this cosemantics on this syntax":
<blockquote>
cosemantics : syntax ⇒<sup>+</sup> semantics
</blockquote>
So we can embed queries in our rule forms, hence the "recursive" in "Recursive Adaptive Grammars". However, I meant RAGs to be based on an elementary derivation step; so I needed a set of derivation step axioms that would induce the above equivalence. My solution was to introduce one more operator — not permitted in rule forms at all, but used during intermediate steps in derivation. The operator: <i>inverse</i>, denoted by an overbar.
</p>
<p>
Imho, the inverse operator is elegant, bizarre, and cool. What it does is <i>reverse the direction of derivation</i>. That is, for any derivation step <i>c</i><sub>1</sub> ⇒ <i>c</i><sub>2</sub>, we have another step <span style="text-decoration:overline;"><i>c</i><sub>2</sub></span> ⇒ <span style="text-decoration:overline;"><i>c</i><sub>1</sub></span>. The inverse of the inverse of any term <i>c</i> is just the same term <i>c</i> back again; and every answer <i>a</i> is its own inverse. So, given any three answers <i>a</i><sub>1</sub>, <i>a</i><sub>2</sub>, and <i>a</i><sub>3</sub>, if ⟨<i>a</i><sub>1</sub>,<i>a</i><sub>2</sub>⟩ ⇒<sup>+</sup> <i>a</i><sub>3</sub> then <i>a</i><sub>3</sub> ⇒<sup>+</sup> <span style="text-decoration:overline;">⟨<i>a</i><sub>1</sub>,<i>a</i><sub>2</sub>⟩</span> .
</p>
<p>
The whole derivation step relation is defined by four axioms. We've already discussed the first two of them, and the third is just the usual rewriting property of <a href="http://fexpr.blogspot.com/2013/07/explicit-evaluation.html#sec-expev-dram">compatibility</a>:
<ul>
<li>If <i>c</i><sub>1</sub> ⇒ <i>c</i><sub>2</sub> then <span style="text-decoration:overline;"><i>c</i><sub>2</sub></span> ⇒ <span style="text-decoration:overline;"><i>c</i><sub>1</sub></span> .</li>
<li>If ⟨<i>a</i><sub>1</sub>,<i>a</i><sub>2</sub>⟩ → <i>a</i><sub>3</sub> is a rule instance of <i>r</i> ∈ <i>ρ</i>(<i>a</i><sub>1</sub>), then ⟨<i>a</i><sub>1</sub>,<i>a</i><sub>2</sub>⟩ ⇒ <i>a</i><sub>3</sub> .</li>
<li>If <i>c</i><sub>1</sub> ⇒ <i>c</i><sub>2</sub> and C is a context in which the missing subterm isn't inside an inverse operator, then C[<i>c</i><sub>1</sub>] ⇒ C[<i>c</i><sub>2</sub>] .</li>
</ul>
The fourth axiom is the spark that makes queries come alive.
<ul>
<li><i>a</i><sub>1</sub> : <span style="text-decoration:overline;">⟨<i>a</i><sub>1</sub>,<i>a</i><sub>2</sub>⟩</span> ⇒ <i>a</i><sub>2</sub> .</li>
</ul>
So now, from ⟨<i>a</i><sub>1</sub>,<i>a</i><sub>2</sub>⟩ ⇒<sup>+</sup> <i>a</i><sub>3</sub> we can deduce by the first axiom <i>a</i><sub>3</sub> ⇒<sup>+</sup> <span style="text-decoration:overline;">⟨<i>a</i><sub>1</sub>,<i>a</i><sub>2</sub>⟩</span> , by the third axiom <i>a</i><sub>1</sub> : <i>a</i><sub>3</sub> ⇒<sup>+</sup> <i>a</i><sub>1</sub> : <span style="text-decoration:overline;">⟨<i>a</i><sub>1</sub>,<i>a</i><sub>2</sub>⟩</span> , and stringing this together with the fourth axiom, <i>a</i><sub>1</sub> : <i>a</i><sub>3</sub> ⇒<sup>+</sup> <i>a</i><sub>2</sub> .
</p>
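<p>
For anyone who'd like to watch the axioms run, here's a toy sketch in Python. It assumes answers are plain strings and terms are nested tuples; compatibility is restricted to query operands, which suffices for the demo; and derivability is checked by a shallow breadth-first enumeration rather than anything clever.
</p>
<pre>
# Terms: answers are strings; ("pair", c1, c2), ("query", c1, c2), ("inv", c).

def inv(c):
    if isinstance(c, str):          # every answer is its own inverse
        return c
    if c[0] == "inv":               # inverse is an involution
        return c[1]
    return ("inv", c)

def steps(c, rules):
    """Terms reachable from c in one derivation step."""
    out = set()
    if isinstance(c, str):
        # axiom 1 applied to rule instances: a3 => inv(⟨a1,a2⟩)
        for (a1, a2), a3 in rules.items():
            if a3 == c:
                out.add(inv(("pair", a1, a2)))
        return out
    if c[0] == "pair" and (c[1], c[2]) in rules:       # axiom 2
        out.add(rules[(c[1], c[2])])
    if c[0] == "query":
        t = c[2]                                       # axiom 4
        if isinstance(t, tuple) and t[0] == "inv" \
                and t[1][0] == "pair" and t[1][1] == c[1]:
            out.add(t[1][2])
        for d in steps(c[1], rules):                   # axiom 3 (compatibility)
            out.add(("query", d, c[2]))
        for d in steps(c[2], rules):
            out.add(("query", c[1], d))
    return out

def derives(c, goal, rules, depth=5):
    seen = {c}
    for _ in range(depth):
        seen |= {d for c2 in seen for d in steps(c2, rules)}
        if goal in seen:
            return True
    return False

rules = {("a1", "a2"): "a3"}        # one rule instance: ⟨a1,a2⟩ -> a3
print(derives(("query", "a1", "a3"), "a2", rules))     # True: a1 : a3 =>+ a2
</pre>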
<p>
The converse, that <i>a</i><sub>1</sub> : <i>a</i><sub>3</sub> ⇒<sup>+</sup> <i>a</i><sub>2</sub> implies ⟨<i>a</i><sub>1</sub>,<i>a</i><sub>2</sub>⟩ ⇒<sup>+</sup> <i>a</i><sub>3</sub> , can also be proven without too much fuss... <i>if</i> one has already proven the basic result that <i>an answer cannot be derived from another answer</i>. That is, if answer <i>a</i> is derived from <i>c</i> in one or more steps, <i>c</i> ⇒<sup>+</sup> <i>a</i> , then <i>c</i> is not an answer. This would be trivial if not for the inverse operator, which allows an answer <i>a</i> to be the left-hand side of infinitely many derivations (the inverses of all derivations that end with <i>a</i>); the theorem says that once you've derived <i>a</i>, further derivation from there can't reach another answer. It took me two rather messy pages to prove this in my master's thesis (simple proofs, if they exist, can take decades to find), and I realized at the time it was a sort of analog to the <a href="https://en.wikipedia.org/wiki/Church-Rosser_theorem">Church-Rosser theorem</a> for a calculus — an elementary well-behavedness result that one really wants to prove <i>first</i> because without it one has trouble proving other things.
</p>
<p>
I didn't think more of it until, years later when explaining RAGs to my dissertation committee (though in the end I didn't use RAGs in my dissertation), I got to see someone <i>else</i> go through a <a href="https://en.wikipedia.org/wiki/Cognitive_dissonance">WTF moment</a> followed by an <a href="https://en.wikipedia.org/wiki/Eureka_effect">aha! moment</a> over the virulent un-Church-Rosser-ness of RAGs. Somehow it hadn't registered on me just <i>how</i> very like a term-rewriting calculus my RAG formalism might look, to someone used to working with λ-like calculi. Mathematical systems are routinely classified according to their formal properties, but there's something deeper going on here. RAGs, despite being in a meaningful way Turing-powerful, don't just lack the Church-Rosser property, they <i>don't want it</i>. A λ-like calculus that isn't Church-Rosser is badly behaved, likely pathological; but a grammar that <i>is</i> Church-Rosser is degenerate. It's a matter of purpose, which is not covered by formal mathematical properties.
</p>
<p>
My point (approached obliquely, but I'm not sure one would appreciate it properly if one didn't follow a route that offers a good view of the thing on approach) is that the purpose of a calculus is mainly to arrive at the result of an operation, whereas the purpose of a grammar is to relate a starting point to the usually-unbounded range of destinations reachable from it. This requires us to think carefully about what we're doing with a discrete analog to the conventional continuous notion of differentiation. When we take a conventional derivative, the thing we're differentiating is a single-valued function, akin in purpose to a calculus meant to produce a result; when we describe expressiveness/abstractiveness as derivatives of semantics, the thing we're differentiating appears, on the face of it, to be very like a grammar, describing a usually-unbounded network of roads departing from a common starting point.
</p>
<span style="font-size: large;" id="sec-ndif-draft">Draft analogy</span>
<p>
As foundation for a notion of discrete differentiation, we're looking for a useful correspondence between elements of these two kinds of structures, continuous and discrete. So far, it might seem we've identified only differences, rather than similarities, between the two. Continuous differentiation is necessarily in the "calculus" camp, abstractiveness/expressiveness necessarily in the "grammar" camp. Continuous differentiation at a point <i>p</i> is fundamentally local, depending in its deepest nature on a family of ε-neighborhoods of <i>p</i> as ε varies continuously toward zero. Abstractiveness/expressiveness of a language <i>P</i> is fundamentally global, depending in its deepest nature on a family of languages reachable from <i>P</i> by means of (in general, unbounded) text sequences. The local structure of the continuous case doesn't even exist in the discrete case, where language <i>P</i> has a set of <i>closest neighbors</i>; and the global structure on which the discrete case depends is deliberately ignored by the continuous derivative.
</p>
<p>
A starting point for an analogy is hidden in plain sight (though perhaps I've given it away already, as I chose my presentation to set the stage for it). Recall again Mach's principle, that the way the local universe works is a function of the shape of the rest of the universe. Wikipedia offers this anecdote for it:
<blockquote>
You are standing in a field looking at the stars. Your arms are resting freely at your side, and you see that the distant stars are not moving. Now start spinning. The stars are whirling around you and your arms are pulled away from your body. Why should your arms be pulled away when the stars are whirling? Why should they be dangling freely when the stars don't move?
</blockquote>
(One is reminded of the esoteric principle "<a href="https://en.wikipedia.org/wiki/Hermeticism#.22As_above.2C_so_below.22">as above, so below</a>".)
</p>
<p>
In our continuous and discrete cases, we have two ways to examine the properties of a global structure at a particular location (<i>p</i> or <i>P</i>), one of which draws on an unbounded family of ε-neighborhoods approaching arbitrarily close, and the other on an (in general) unbounded family of discrete structures receding arbitrarily far. Either way, we derive our understanding of conditions at the particular location by aggregating from an unbounded, coherent structure. One might call the two strategies "unbounded locality" and "unbounded globality". If we accept that the way the local universe works and the global structure of the universe are two faces of the same thing, then these strategies ought to be two ways of getting at the same thing.
</p>
<p>
This strategic similarity is reassuring, but lacks tactical detail. In the continuous case, field <i><b>F</b></i> gave us a value at point <i>p</i>, the derivative gave us another field with again a value at <i>p</i>, and we could keep applying the differentiation operation (if <i><b>F</b></i> was sufficiently well-behaved to start with) producing a series of higher-derivative fields, each providing again a value at <i>p</i>. Yet in the discrete case we don't appear to have a local "value" at <i>P</i>, the purely local "state" being so devoid of information that we actually chose to drop the states from the formal treatment altogether; and while we may intuit that expressiveness is the derivative of semantics, and abstractiveness the derivative of expressiveness, the actual formalisms we constructed for these three things were altogether different from each other. The semantics apparently had only the text sequences, set of observables, and a (typically, infinite) discrete structure called a "language"; expressiveness added to this picture functions between languages; and abstractiveness augmented each language with a family of functions between the different nodes within each language, then defining functions between these "languages with expressive structure". On the face of it, this doesn't appear to be a series of structures of the same kind (as fields <i><b>F</b></i>, <b><i>F</i> '</b>, <b><i>F</i> ''</b> are of the same kind, granting we do expect them to vary their configurations in some regular, generalizable way).
</p>
<p>
In my treatment of abstractive power, though, I noted that if, in augmenting languages <i>P</i> and <i>Q</i> with "expressive structure", the family of functions we use is the degenerate family of identity functions, then the relation "〈<i>Q</i>,<b>Id</b>〉 is as abstractive as 〈<i>P</i>,<b>Id</b>〉" is equivalent to "<i>Q</i> is as expressive as <i>P</i>". This, together with Mach's principle, gives us a tactical analogy. The underlying "state machine", whose states are languages and whose transitions are labeled by program texts, corresponds to manifold <b>𝓜</b>. A particular state, taken together with the rest of the machine reachable from that state, corresponds to point <i>p</i> of the manifold. An "expressive structure" overlain on the machine, creating a web of functions between its states, corresponds to a field <b><i>F</i></b> (or <b><i>F</i> '</b>, etc.). Differentiation is then an operation mapping a given "field" (overlying web of functions) to another. Thus, starting with our discrete machine <b>𝓜</b>, overlaying the family of identity functions gives us the semantics of <b>𝓜</b>, differentiating the semantics gives <b>𝓜</b>'s expressive structure, differentiating again gives its abstractive structure.
</p>
<p>
This analogy provides a way for us to consider the various parts of our discrete scheme in terms of the major elements of the continuous scheme (manifold, field, point). But on closer inspection, there are a couple of flaws in the analogy that need to be thought out carefully — and these flaws will lead us to a second philosophical difference comparable in depth to the difference between grammars and term-rewriting calculi.
</p>
<span style="font-size: large;" id="sec-ndif-break">Breaking the draft analogy</span>
<p>
One mismatch in the analogy concerns the difference in substance between <b>𝓜</b> and <b><i>F</i></b>. In the continuous case, manifold <b>𝓜</b> consists entirely of the oriented distances between its points (we know the distances themselves are numbers because they are the ε needed for differentiation). If you start with a field <b><i>F</i></b> of uniform value across <b>𝓜</b>, and take its <i>integral</i>, you can reconstruct <b>𝓜</b>. The key point here is that <b>𝓜</b> and <b><i>F</i></b> are made of the same "stuff". In our discrete scheme, though, <b>𝓜</b> is a state machine with term-labeled transitions, while <b><i>F</i></b> is a family of functions between term sequences. The discrete <b>𝓜</b> and <b><i>F</i></b> are concretely different structures, made of different "stuff" rather than commensurate in the manner of the continuous <b>𝓜</b> and <b><i>F</i></b>. It might not be immediately obvious what the consequences would be of this analogy mismatch; but as happens, it feeds into the second mismatch, which is more directly concerned with the practical use of derivatives in physics.
</p>
<p>
The second mismatch concerns how the manifold <i>changes over time</i> — or, equivalently, its <i>curvature</i>. Our main precedent here is general relativity.
</p>
<p>
In general relativity, mass warps the space near its position, and the resulting curvature of space guides the movement of the mass over time. It's not apparent that abstractiveness/expressiveness has anything analogous to <i>time</i> in this sense. It should be apparent, though —once one takes a moment to consider— that general relativity doesn't have absolute time either. Observers moving relative to each other will disagree in their perceptions of time, and of space, and of which observed events are and are not simultaneous; so rather than absolute time and space, we have four-dimensional spacetime that looks different to different observers. Thinking of spacetime as a whole, rather than trying to factor it into absolute space and time, a particle with mass traces a one-dimensional curve across the four-dimensional manifold of spacetime. When we said, a moment ago, that the mass's position warps space while space guides the mass's movement, this mutual influence is described by mathematical equations involving the manifold <b>𝓜</b> and fields <b><i>F</i></b>, <b><i>F</i> '</b>, etc. These equations are all about <i>numbers</i> (which don't occur in our discrete scheme) and rely on the fact that the manifold and the fields are both made out of this same numeric stuff (which they aren't in our discrete scheme). Evidently, the key element here which we have failed to carry across the analogy is that of equations encompassing both <b>𝓜</b> and <b><i>F</i></b>.
</p>
<p>
But, when we look more closely to try to find a discrete element to support an analogy with the continuous equations, we find that the omission is after all something more primal than numbers, or even commonality of substance between <b>𝓜</b> and <b><i>F</i></b>. The omission is one of <i>purpose</i>. In the continuous case, the purpose of those equations is to enable us to <i>solve for</i> <b>𝓜</b> <i>and</i> <b><i>F</i></b>. But the discrete device has no interest in <i>solving for</i> <b>𝓜</b> at all. We don't use our theories of expressiveness and abstractiveness to define a language; we only use them to study languages that we've devised by other means.
</p>
<span style="font-size: large;" id="sec-ndif-reset">Retrenching</span>
<p>
At this juncture in the narrative, imho we've done rather well at reducing the question of the analogy to its essence. But we still don't have an answer, and we've exhausted our supply of insights in getting this far. We need more ammunition. What else do we know about these continuous and discrete schemes, to break us out of this latest impasse?
</p>
<p>
For one thing, the asymmetric presence of <i>solving</i> in the continuous scheme is a consequence of a difference in the practical purposes for which the two mathematical schemes were devised. In physics, we try to set up a mathematical system that matches reality, and then use the math to translate observations into constraints and, thus, predictions. <i>Solving</i> is the point of the mathematical exercise, telling us what we should expect to deduce from observations if the theory is correct. Whereas in expressiveness/abstractiveness, we consider various possible programming languages —all of which are of our own devising— and study the different consequences of them. Rather than trying to figure out the properties of the world we're living in, we study the properties of different invented worlds in hopes of deciding which we'd like to live in.
</p>
<p>
The difference goes deeper, though. Even if we <i>did</i> want to solve for state machine <b>𝓜</b> in the discrete scheme —in effect, solve for the programming language— we wouldn't find expressiveness/abstractiveness, nor even semantics, at all adequate to the task. Recall from the above account of the continuous scheme, when observing that <b>𝓜</b> and <b><i>F</i></b> are made of the same "stuff", I suggested that from <b><i>F</i></b> one might reconstruct <b>𝓜</b>. Strange to tell, when I wrote that, I wasn't even thinking yet in terms of this gap in the analogy over solving for <b>𝓜</b> (though, obviously, that's where my reasoning was about to take me). There is nothing in manifold <b>𝓜</b> that's in any fundamental way <i>more complicated</i> than fields <b><i>F</i></b> etc. In stark contrast to the discrete case. Expressiveness/abstractiveness are constructed from families of functions over <b>𝓜</b>; to the extent semantic/expressive/abstractive comparisons are anything more than directed graphs on the states of <b>𝓜</b>, their complexity is an echo of a deliberately very limited part of the structural complexity latent in <b>𝓜</b> itself. While the internal structure of a programming language —state machine <b>𝓜</b> itself— is <i>immensely</i> complicated; tbh, it seems way more complicated than a mere space-time manifold, which is a continuum whereas a programming language is discontinuity incarnate.
</p>
<p>
The point about discontinuity recalls a key feature of the <i>other</i> analogy I drew, in my earlier post, between <a href="http://fexpr.blogspot.com/2014/04/why-is-beta-substitution-like-higgs.html">cosmological physics and term rewriting</a>. Under that analogy, fundamental forces in physics corresponded to substitution functions in term-rewriting; while physical <i>geometry</i>, with its close relation to the gravitational force, corresponded to α-renaming, with its close relation to λ-substitution. In our current analogy, we've been treating <i>continuity</i>, with its role in ε-neighborhoods, as the defining feature of the manifold, and finding nothing like it in the syntactic complexity of a programming language. What if for purposes of <i>solving</i>, what we need is not so much continuity as <i>geometry</i>?
</p>
<p>
For this, we need to find something α-renaming-like in the discrete scheme, which is challenging because expressiveness/abstractiveness is not based on term-rewriting. There's more to it, though. Whatever-it-is, α-renaming-like, we mean to be <i>the analog to the manifold</i>. In other words, it's the discrete <b>𝓜</b>. And since the whole programming language, or equivalently the state machine, evidently <i>isn't</i> α-renaming-like, <b>𝓜</b> isn't the programming language after all, but something else. Some <i>facet</i> of the programming language. And not just any facet — one that guides our discrete differentiation analogously to how the geometry of the manifold guides continuous differentiation.
</p>
John Shutthttp://www.blogger.com/profile/00041398073010099077noreply@blogger.com2tag:blogger.com,1999:blog-7068528325708136131.post-66571813367786520622015-05-10T08:20:00.000-07:002015-11-13T07:23:21.685-08:00Computation and truth<blockquote>
a very great deal more truth can become known than can be proven.
<blockquote>
— <a href="https://en.wikiquote.org/wiki/Richard_Feynman">Richard Feynman</a>, "The Development of the Space-Time View of Quantum Electrodynamics", Nobel Lecture, 1965.
</blockquote>
</blockquote>
<p>
In this post I'm going to take a sharper look at the relationship between computation and truth. This relates to my previously blogged interests both in difficulties with the conventional notion of type (<a href="http://fexpr.blogspot.com/2011/11/where-do-types-come-from.html">here</a>) and in possible approaches to sidestepping Gödel's Theorem (<a href="http://fexpr.blogspot.com/2013/07/bypassing-no-go-theorems.html">here</a>).
</p>
<p>
I was motivated to pursue these ideas further <i>now</i> by two recent developments: first, a discussion on LtU that — most gratifyingly — pushed me to re-review the more globally oriented aspects of Gödel's Theorem (<a href="http://lambda-the-ultimate.org/node/5104">here</a>); and second, following on that, an abrupt insight into what the <a href="https://en.wikipedia.org/wiki/Curry%E2%80%93Howard_correspondence">Curry–Howard correspondence</a> implies about the practical dynamics of static typing.
</p>
<p>
My biggest achievement here concerns something I've wanted to do for many years: to make clear exactly <i>why</i> Gödel's Theorem is true. By proving it. As an elementary result. If it's really as profound as it's made out to be, it should have a really simple proof; yet I've never seen the proof treated as an elementary result. At an elementary level one usually sees hand-waving about the proof, in a manner I find unsatisfying because the details omitted seem rather <i>essential</i> to understanding why the result is true. I don't expect to specify here every tiny detail of a proof, but any detail I omit should be obviously minor; it shouldn't feel as if anything problematic is being hidden.
</p>
<p>
I've comparatively far less to say about Curry–Howard. It's taken me years to come up with this insight into the connection between Curry–Howard and static typing, though; so if the difficulty of coming up with it isn't just me being thick, the fact that it doesn't take long to state is a good sign.
</p>
<blockquote>
<b>Contents</b><br>
<a href="#sec-comptruth-mech">That which is most computable is most true</a><br>
<a href="#sec-comptruth-ChurchTuring">Church–Turing</a><br>
<a href="#sec-comptruth-enum">Enumeration</a><br>
<a href="#sec-comptruth-diag">Diagonalization</a><br>
<a href="#sec-comptruth-halting">Uncomputability</a><br>
<a href="#sec-comptruth-first">Gödel</a><br>
<a href="#sec-comptruth-att">Attitude</a><br>
<a href="#sec-comptruth-CurryHoward">Curry–Howard</a><br>
<a href="#sec-comptruth-know">What we can't know</a><br>
<a href="#sec-comptruth-second">Gödel's Second Theorem</a><br>
<a href="#sec-comptruth-ord">Turing's dissertation</a><br>
<a href="#sec-comptruth-where">Where do we go from here?</a><!-- <br> -->
</blockquote>
<span style="font-size: large;" id="sec-comptruth-mech">That which is most computable is most true</span>
<p>
We want to get at truth using mechanical reasoning. Why? Because mechanical reasoning is objective. We can see what all the rules are, and decide whether we agree with them, and we can check how the particular reasoning was done and know whether or not it follows the rules, and if it all checks out we can all agree on the conclusion. Everything is out in the open, unlike, say, divine revelation, where an unadorned "I don't believe it" is a credible counter-argument.
</p>
<p>
In mechanical reasoning, propositions are built up using symbols from a finite alphabet, and a finite set of rules of deduction allow propositions to be proven, either from first principles (axioms) or from a finite number of previously proven propositions (theorems). In essence, this is computation (aka mechanical computation); I'm not talking about a sophisticated correspondence here, just common-sense mechanics: manipulating propositions using finite rules of deduction is manipulating strings over an alphabet using a finite set of rules, which is computation. What you can and can't do this way is governed by what you can and can't do by computation.
</p>
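<p>
Just to make "proof is string manipulation" tangible, here is a toy formal system rendered as a few lines of Python (a sketch for illustration only): Hofstadter's well-known MIU system, whose sole axiom is the string <code>MI</code> and whose four rules of deduction are string rewrites. Enumerating its theorems is a perfectly ordinary computation.
</p>
<blockquote><pre>
from collections import deque

def miu_steps(s):
    # The four MIU rules of deduction, as string rewrites.
    out = set()
    if s.endswith("I"):
        out.add(s + "U")                    # rule 1: xI -> xIU
    if s.startswith("M"):
        out.add("M" + s[1:] * 2)            # rule 2: Mx -> Mxx
    for i in range(len(s)):
        if s[i:i+3] == "III":
            out.add(s[:i] + "U" + s[i+3:])  # rule 3: xIIIy -> xUy
        if s[i:i+2] == "UU":
            out.add(s[:i] + s[i+2:])        # rule 4: xUUy -> xy
    return out

def theorems(axiom="MI"):
    # Breadth-first enumeration of every provable string.
    seen, queue = {axiom}, deque([axiom])
    while queue:
        s = queue.popleft()
        yield s
        for t in miu_steps(s):
            if t not in seen:
                seen.add(t)
                queue.append(t)
</pre></blockquote>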
<p>
Some results that came out gradually over the first several decades of the twentieth century appear to say there is no one unique most-powerful notion of truth accessible by mechanical reasoning. However, it turns out there <i>is</i> a unique most-powerful notion of computation. By the time this second point was fully appreciated, mathematicians had already long since been induced to reconcile themselves, one way or another, to the first point. However, looking at it all from nearly a century downstream, it seems advisable to me to start my systematic study with the historically later result, the unique most-powerful notion of computation.
</p>
<span style="font-size: large;" id="sec-comptruth-ChurchTuring">Church–Turing</span>
<p>
In the 1930s, three different formal models of general computation were proven to be equi-powerful: general recursive functions (1934, <a href="https://en.wikipedia.org/wiki/Kurt_G%C3%B6del">Gödel</a>, building on a suggestion of <a href="https://en.wikipedia.org/wiki/Jacques_Herbrand">Herbrand</a>), λ-calculus (1936, <a href="https://en.wikipedia.org/wiki/Alonzo_Church">Church</a>), and Turing machines (1936, <a href="https://en.wikipedia.org/wiki/Alan_Turing">Turing</a>). That is, for any computation you can do with one of these models, there's an equivalent computation you can do with either of the other models. Of these three models, Turing machines are perhaps the least mathematically elegant, being rather nuts-and-bolts devices; but of the three, it's also the model for which it's most intuitively obvious that you can do something by mechanical computation iff (if and only if) you can do it with a Turing machine. When the other models were proven equi-powerful with Turing machines, it didn't add much to the credibility of Turing machines; rather, it added credibility to the <i>other</i> models.
</p>
<p>
The "thesis", also called a "conjecture", is that <i>any</i> model of mechanical computation, if it isn't underpowered, is also equi-powerful with Turing machines and all these other "Turing-powerful" models of computation. It's a "thesis", "conjecture", etc., because it's an inherently informal statement and therefore isn't subject to formal proof. But it's demonstrated its unassailability for about three quarters of a century, now. Mathematicians study what it takes to build a model more powerful than Turing machines (in fact, Turing himself touched on this in his dissertation); you have to bring in something that isn't altogether mechanical. The maximum amount of computational power you can get mechanically is a robust quantity, somehow built into the fabric of Platonic mathematics much as the speed of light is built into the fabric of physics, and this quantity of power is the same no matter which way you get there (what computational model you use). And, that amount of computational power is <i>realizable</i>; it isn't something you can only approach, but something you can achieve (mathematically speaking).
</p>
<span style="font-size: large;" id="sec-comptruth-enum">Enumeration</span>
<p>
When exploring the <i>limits</i> of Turing-powerful computation, the basic technique is to frame computations in terms of <i>enumerations</i>.
</p>
<p>
Enumeration is where you just generate a list, perhaps a list that never ends; typically you don't want the whole list, you just watch it as it grows to see whether the information you actually want ever gets listed. As long as you're only asking whether a computation can be done — not how efficiently it can be done — all computations can be rearranged this way to use enumeration.
</p>
<p>
Suppose you've got any Turing machine. It computes outputs from inputs. Maybe it doesn't even always complete its computation: maybe sometimes it <i>diverges</i>, computing forever instead of finally producing an answer, perhaps by going into an infinite loop, or perhaps finding some more exotic way to non-terminate. But, given any such machine, you can always define a second machine that <i>enumerates</i> all the input/output pairs where the first machine given that input would halt with that output. How would you do that? It's straightforward, though tedious. Call the first machine <i>T</i><sub>1</sub>. We can enumerate all the possible inputs, since they're all built up from a finite alphabet; just list them in order of increasing length, and within inputs of a given length, list them alphabetically. For each of these inputs, using the mechanical rules of <i>T</i><sub>1</sub>, you can enumerate the states of <i>T</i><sub>1</sub>: the initial state, where it has the input and is about to start computing, the second state that results from what it does next from that first state, and so on. Call the <i>m</i><sup>th</sup> state of <i>T</i><sub>1</sub> on the <i>n</i><sup>th</sup> input S(<i>n</i>,<i>m</i>). Imagine an expanding table, which we slowly fill out, where <i>n</i> is the row number and <i>m</i> is the column number. Row <i>n</i> is the entire computation by <i>T</i><sub>1</sub> for the <i>n</i><sup>th</sup> input. We can't fill out the whole table by completing each row before moving on to the next, because some rows might extend to the right forever, as <i>T</i><sub>1</sub> doesn't halt on that input. But we <i>can</i> mechanically compute any particular cell in the table, S(<i>n</i>,<i>m</i>). So we just have to enumerate all the pairs <i>n</i>,<i>m</i> such that we'll eventually get to each of them — for example, we can do all the entries where the sum of <i>n</i> and <i>m</i> is two (that's just the leftmost topmost entry, S(1,1)), then all the ones where the sum is three (that's S(1,2) and S(2,1)), four (that's S(1,3), S(2,2), and S(3,1)), and so on — and whenever we find a cell where <i>T</i><sub>1</sub> halts, we output the corresponding <i>T</i><sub>1</sub> input/output pair. We now have a <i>T</i><sub>2</sub> that enumerates all and only the input/output pairs for which <i>T</i><sub>1</sub> halts.
</p>
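<p>
Here's the same construction as a small Python sketch, with the machine abstracted (my own modelling assumption, nothing canonical) into two functions: <code>init(inp)</code> giving a starting state, and <code>step(state)</code> giving either a successor state or a halt-with-output signal. Each pass of the loop opens one new row of the table and advances every open row one cell to the right, so every cell S(<i>n</i>,<i>m</i>) is eventually computed.
</p>
<blockquote><pre>
from itertools import count, product

def inputs(alphabet="ab"):
    # All strings over a finite alphabet: shortest first, alphabetical within a length.
    for length in count(0):
        for chars in product(alphabet, repeat=length):
            yield "".join(chars)

def halting_pairs(init, step):
    # T2: dovetail over the table S(n, m), yielding every input/output
    # pair on which the machine (init, step) halts.
    source = inputs()
    rows = []                              # (input, state) per row opened so far
    while True:
        rows.append((next(source), None))  # open one new row per pass...
        for k, (inp, state) in enumerate(rows):
            if state == "done":
                continue                   # row already halted and was reported
            state = init(inp) if state is None else step(state)  # ...advance one cell
            if isinstance(state, tuple) and state[0] == "halt":
                yield (inp, state[1])
                rows[k] = (inp, "done")
            else:
                rows[k] = (inp, state)

# Toy T1: reverses its input, one character per step; it never diverges,
# but a machine that did would just have rows that never report.
rev_init = lambda inp: (inp, "")
rev_step = lambda s: ("halt", s[1]) if s[0] == "" else (s[0][1:], s[1] + s[0][0])
# for pair in halting_pairs(rev_init, rev_step): print(pair)   # runs forever
</pre></blockquote>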
<span style="font-size: large;" id="sec-comptruth-diag">Diagonalization</span>
<p>
Way back in the late nineteenth century, though, <a href="https://en.wikipedia.org/wiki/Georg_Cantor">Georg Cantor</a> noticed that if you can enumerate a series of enumerations — even a series of infinite enumerations — you can find an enumeration that wasn't in the series. He famously did this with the real numbers, to show they can't all be enumerated; here I'll use the same trick on an enumeration of the rationals, to exhibit a real number that isn't rational. This is going to sound very similar to what we just did for Turing machines.
</p>
<p>
Consider the numbers less than one and not less than zero. We can enumerate the rational numbers in this interval: each such number is a non-negative integer divided by a strictly larger integer, so just list them in order of increasing denominator, and for a given denominator, by increasing numerator (0/1, 0/2, 1/2, 0/3, 1/3, 2/3, et cetera ad infinitum). For each of these ratios, we can enumerate the digits in the decimal representation of that ratio, starting with the tenths digit. (If the denominator divides a power of ten, after some point the decimal representation will be all zeros.) Call the <i>m</i><sup>th</sup> digit of the <i>n</i><sup>th</sup> ratio S(<i>n</i>,<i>m</i>). Imagine a table where <i>n</i> is the row number and <i>m</i> is the column number. Row <i>n</i> is the entire decimal representation of the <i>n</i><sup>th</sup> ratio. We can mechanically compute what should go in any particular entry of this table. And now comes the trick. We can construct the decimal representation of a real number, in the interval, that isn't equal to any of the ones we've enumerated. To do this, we read off the entries on the <i>diagonal</i> of our table from the upper left toward the lower right (S(1,1), S(2,2), etc.), and add one to each digit (modulo 10, so if we read a 9 we change that to 0): our <i>n</i><sup>th</sup> digit is (S(<i>n</i>,<i>n</i>) + 1) <b>mod</b> 10. This is a perfectly good way to specify a real number — it's an infinite sum of the form Σa<sub><i>n</i></sub>10<sup><i>−n</i></sup> — and we know it isn't rational because every rational number in the interval has some decimal digit on which it differs from our real number. (One detail that oughtn't be hidden: a terminating decimal has a second representation ending in all 9s, so differing digit-by-digit from an expansion doesn't instantly prove differing in value; to be scrupulous, use a replacement rule that never produces 0s or 9s, say turning each diagonal digit into 4, or into 5 if it was already 4, and the argument goes through verbatim.)
</p>
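<p>
And in code, under the same enumeration (a sketch; <code>Fraction</code> just keeps the arithmetic exact):
</p>
<blockquote><pre>
from fractions import Fraction
from itertools import count, islice

def ratios():
    # Rationals in [0,1), ordered by denominator and then numerator,
    # duplicates and all, exactly as enumerated in the text.
    for den in count(1):
        for num in range(den):
            yield Fraction(num, den)

def digit(q, m):
    # The m-th decimal digit of q (m = 1 is the tenths digit).
    return int(q * 10**m) % 10

def antidiagonal(k):
    # First k digits of the constructed real: digit n is (S(n,n) + 1) mod 10.
    qs = islice(ratios(), k)
    return [(digit(q, n) + 1) % 10 for n, q in enumerate(qs, start=1)]

print(antidiagonal(6))   # [1, 1, 1, 1, 4, 7]
</pre></blockquote>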
<p>
This general technique is called <i>diagonalization</i>: you have an enumerable set of inputs (the labels on the rows), for each input you have an enumerated sequence (the row with that label), and you then produce an enumerated sequence that differs from every row by reading off the diagonal from upper left downward to the right. Since diagonalization is used to show something isn't enumerated, and enumeration is at the heart of computation, naturally diagonalization is useful for showing things can't be computed.
</p>
<span style="font-size: large;" id="sec-comptruth-halting">Uncomputability</span>
<p>
Because a Turing machine, like any other algorithmic device such as a general recursive function or λ-calculus term, is fundamentally finite, it's a straightforward (if tedious) exercise to describe it as an input to another machine that can then, straightforwardly, interpret the description to simulate the behavior of the described machine. This is a <i>universal Turing machine</i>, which takes as input a machine description and an input to the described machine, and outputs the output of the described machine on that input — or doesn't halt, if the described machine wouldn't halt on that input.
</p>
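<p>
A universal machine is, in modern terms, just an interpreter. Here's a minimal sketch in Python (the description format, a dictionary of transition rules, is my own arbitrary choice):
</p>
<blockquote><pre>
def run(machine, tape):
    # Interpret a described Turing machine: `machine` is (rules, start, halt),
    # where rules maps (state, symbol) to (new state, new symbol, move),
    # move being -1 or +1.  Returns the final tape contents; loops forever
    # exactly when the described machine does.
    rules, state, halt = machine
    cells = dict(enumerate(tape))    # tape as a sparse dictionary, blank = "_"
    pos = 0
    while state != halt:
        state, cells[pos], move = rules[(state, cells.get(pos, "_"))]
        pos += move
    return "".join(cells[i] for i in sorted(cells)).strip("_")

# A described machine that swaps a's and b's, then halts at the first blank.
swapper = ({("s", "a"): ("s", "b", +1),
            ("s", "b"): ("s", "a", +1),
            ("s", "_"): ("h", "_", +1)}, "s", "h")
print(run(swapper, "abba"))   # baab
</pre></blockquote>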
<p>
However, although simulating a described machine is straightforward, it is not possible to determine in general, by computation, whether or not the described machine will halt on the given input. That is, we cannot possibly construct a Turing machine that <i>always halts</i> that determines whether any described machine halts on a given input. We can show this by diagonalization.
</p>
<p>
We can enumerate all possible machine descriptions, readily enough, since they're just alphabetic strings that obey some simple (and therefore checkable) syntactic rules. We can also, of course, enumerate all possible inputs to the described machines. Imagine a table where the entry S(<i>n</i>,<i>m</i>) at row <i>n</i> and column <i>m</i> is "yes" if the <i>n</i><sup>th</sup> machine halts on the <i>m</i><sup>th</sup> input, or "no" if it doesn't halt. Suppose we can construct a machine that computes the entries S(<i>n</i>,<i>m</i>) of this table. Then by going down the diagonal of the table we can also construct a machine whose behavior differs from every row of the table: Let machine <i>A</i> on the <i>m</i><sup>th</sup> input compute S(<i>m</i>,<i>m</i>), and if it's "no", halt and say "no", while if it's "yes", go into an infinite loop and thus never halt. If machine <i>A</i> is the <i>n</i><sup>th</sup> machine in the table, then <i>A</i> halts on the <i>n</i><sup>th</sup> input if and only if S(<i>n</i>,<i>n</i>) is "no"; but if the computation of S works right, <i>A</i> halts on the <i>n</i><sup>th</sup> input if and only if S(<i>n</i>,<i>n</i>) is "yes". That's a contradiction either way; and <i>A</i> can't simply be missing from the table, because the enumeration provably lists every machine description. The only assumption left to blame is the always-halting machine that computes S; no such machine can exist.
</p>
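<p>
The diagonal machine, sketched in Python on top of the <code>inputs()</code> enumeration from earlier (with <code>S</code> standing for the supposed always-halting computation of the table):
</p>
<blockquote><pre>
def make_diagonal(S, inputs):
    # Given a claimed decider S(n, m) -> "yes"/"no" for whether the n-th
    # machine halts on the m-th input, build the machine A that foils it.
    def A(inp):
        # Find which column this input labels.
        m = next(k for k, c in enumerate(inputs()) if c == inp)
        if S(m, m) == "no":
            return "done"        # halt, making a "no" at (m, m) wrong
        while True:              # diverge, making a "yes" at (m, m) wrong
            pass
    return A
</pre></blockquote>
<p>
If <code>A</code> is the <i>n</i><sup>th</sup> machine, it halts on the <i>n</i><sup>th</sup> input exactly when <code>S(n, n)</code> says it doesn't.
</p>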
<span style="font-size: large;" id="sec-comptruth-first">Gödel</span>
<p>
Gödel's Theorem (aka Gödel's First Theorem) says that any sufficiently powerful formal system is either incomplete or inconsistent — in essence, either it can't prove everything that's true, or it can prove things that aren't true.
</p>
<p>
To pin this down, we first need to work out what "sufficiently powerful" means. Gödel wanted a system powerful enough to reason about arithmetic: we can boil this down to, for an arithmetic function <i>f</i> and integer <i>i</i>, does <i>f</i>(<i>i</i>)=<i>j</i> or doesn't it? The functions of interest are, of course, general recursive functions, which are equi-powerful with Turing machines and with λ-calculus; so we can equivalently say we want to reason about whether a given Turing machine with input <i>i</i> will or will not produce output <i>j</i>. But a formal system is itself essentially a Turing machine; so in effect we're talking about a Turing machine <i>L</i> (the formal system; <i>L</i> for <i>logic</i>) that determines whether or not a Turing machine (the function <i>f</i>) on given input produces given output. The system would be consistent and complete if it confirms every true statement about whether or not <i>f</i> on given input produces given output, and doesn't confirm any false such statements.
</p>
<p>
Enumerate the machines <i>f</i> and make them the rows of a table. Enumerate the input/output pairs and make them the columns. In the entry for row <i>n</i> and column <i>m</i>, put a "yes" if <i>L</i> confirms that the <i>n</i><sup>th</sup> machine has the <i>m</i><sup>th</sup> input/output pair, a "no" if <i>L</i> confirms that it doesn't. Suppose <i>L</i> is consistent and complete.
</p>
<p>
It can't be both true and false that the <i>n</i><sup>th</sup> machine has the <i>m</i><sup>th</sup> input/output pair; so if <i>L</i> only confirms true propositions, there can't be both a "yes" and a "no" in any one table entry. What about blank table entries? For centuries it was generally agreed that a proposition must be either true or false; but this idea had fallen into some disrepute during the three decades leading up to Gödel's results. This is just as well, because, based on our supposition that <i>L</i> is consistent and complete, we can easily show that the table must have some blank entries. Suppose the table has no blank entries. Then for any machine <i>f</i><sub>1</sub>, and any input <i>i</i>, we can determine whether <i>f</i><sub>1</sub> halts on <i>i</i>, thus: construct another machine <i>f</i><sub>2</sub> that runs <i>f</i><sub>1</sub> on <i>i</i> and then halts with output <i>confirm</i>. Because there are no blank entries in the table, we know <i>L</i> can determine whether or not <i>f</i><sub>2</sub>(<i>i</i>)=<i>confirm</i>, and this also determines whether or not <i>f</i><sub>1</sub> halts on <i>i</i>. But we already know from the previous section that we cannot correctly determine by computation whether or not an arbitrary machine halts on an arbitrary input; therefore, there must be some blank entries in the table.
</p>
<p>
Is it possible for a proposition of this kind — that a given machine on a given input produces a given output — to be neither true nor false? If you think this isn't possible, then we have already proven to your satisfaction that <i>L</i> cannot be both consistent and complete. However, since we're collectively no longer so sure that propositions have to be either true or false, let's see if we can find a difficulty with the consistent complete system <i>without</i> insisting that every table entry must be filled in. Instead, we'll look for a particular entry that we know should be filled in, but isn't.
</p>
<p>
We're going to diagonalize. First, let's restrict our yes/no table by looking only at columns where the output is <i>confirm</i> (and, being really careful, suppress any duplicate column labels, so each column label occurs only once). So now our table has rows for machines <i>f</i>, columns for inputs <i>i</i>, and each entry (<i>n</i>,<i>m</i>) contains a "yes" if <i>L</i> confirms that the <i>n</i><sup>th</sup> machine on the <i>m</i><sup>th</sup> input produces output <i>confirm</i>, while the entry contains a "no" if <i>L</i> confirms it does not produce output <i>confirm</i>. The entry is blank if <i>L</i> doesn't confirm either proposition. Construct a machine <i>A</i> as follows. For given input, go through the column labels till you find the one that matches it (we were careful there'd be only one); call that column number <i>m</i>. Search, using <i>L</i>, for a confirmation of the "no" at table entry <i>m</i>,<i>m</i>; if and when that confirmation is found, output <i>confirm</i>. If <i>L</i> never confirms that "no", then <i>A</i> never halts, and never outputs <i>confirm</i>. Since <i>A</i> is a Turing machine, it is the label on some row <i>n</i> of the table. What is the content of table entry <i>n</i>,<i>n</i>? Remember, the content of the table entry is what <i>L</i> actually confirms about the behavior of <i>A</i> on the <i>n</i><sup>th</sup> input. By construction, if the entry contains "no", then <i>A</i> outputs <i>confirm</i>, and the "no" is incorrect. If the entry contains "yes", and the "yes" is correct, then <i>A</i> outputs <i>confirm</i>; but by construction <i>A</i> only does that after <i>L</i> confirms the "no" at this same entry, so the entry also contains a "no", and that "no" is incorrect. Therefore, if <i>L</i> doesn't confirm anything that's false, this table entry must be <i>blank</i>. But if we <i>know</i> the table entry is blank, then we know that, by failing to put a "no" there, <i>L</i> has failed to confirm something true, and is therefore incomplete.
</p>
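<p>
For concreteness, the same diagonal machine in Python dress (a sketch: I model propositions as tagged tuples, and assume a function <code>L_confirms(prop)</code> that searches for a confirmation by <i>L</i>, returning only if it finds one and otherwise running forever):
</p>
<blockquote><pre>
def make_goedel_diagonal(L_confirms, inputs):
    def A(inp):
        # Find our column: the unique m whose label matches the input.
        m = next(k for k, c in enumerate(inputs()) if c == inp)
        # Wait for L to confirm the "no" at diagonal entry (m, m): that the
        # m-th machine, run on this very input, does NOT output confirm...
        L_confirms(("does-not-produce", m, inp, "confirm"))
        return "confirm"   # ...and only then, pointedly, output confirm.
    return A
</pre></blockquote>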
<p>
If we are sure the formal system proves everything that's true, then we cannot possibly be sure it doesn't prove anything that's false; if we are sure it doesn't prove anything that's false, we cannot possibly be sure it proves everything that's true. <a href="https://en.wikipedia.org/wiki/Uncertainty_principle">Heisenberg's uncertainty principle</a> comes to mind.
</p>
<span style="font-size: large;" id="sec-comptruth-att">Attitude</span>
<p>
Gödel's results are commonly phrased in terms of what a formal system can prove about itself, and treated in terms of the rules of deduction in the formal system. There are both historical and practical reasons for this.
</p>
<p>
Over the course of the nineteenth century, the foundations of mathematics underwent a <a href="http://fexpr.blogspot.com/2014/03/the-great-vectors-versus-quaternions.html">Kuhnian paradigm shift</a>, settling on building things up formally from a set of axioms. In the 1890s people started to notice cracks in the axiomatic foundations, in the form of antinomies — pairs of contradictory statements that were both provable from the axioms. Mathematicians generally reacted by looking for some axiom they'd chosen that doesn't hold in general — as geometry had done in the early nineteenth century to explore non-Euclidean geometries that lack the <a href="https://en.wikipedia.org/wiki/Parallel_postulate">parallel postulate</a>. As a source of antinomies, attention fell primarily on the Law of the Excluded Middle, which says a proposition is either true or false; as an off-beat alternative, Alonzo Church considered weakening <i>reductio ad absurdum</i>, which says that if assuming a proposition leads to a contradiction, then the proposition is false. Thus, emphasis on choice of rules of deduction.
</p>
<p>
<a href="https://en.wikipedia.org/wiki/David_Hilbert">David Hilbert</a> proposed to use a subset of a formal system to prove the consistency of the larger system; this would have the advantage that one might be more confident of the subset, so that using the subset to prove consistency would increase confidence in the larger system. Gödel's result was understood to mean that the consistency of the whole formal system (for a powerful system) can only be proved in an <i>even more powerful</i> system. Thus, emphasis on what a formal system can prove about itself.
</p>
<p>
Explorations of how to cope with the Theorem have continued to focus on the system's rules of deduction; my own <a href="http://fexpr.blogspot.com/2013/07/bypassing-no-go-theorems.html">earlier post</a> tended this way. Alan Turing's dissertation at Princeton also followed this route. The emphasis on rules of deduction naturally suggests itself when looking for a way around Gödel's Theorem, because if you want to achieve a mechanical means for deriving truth, as a practical matter you can't achieve that without working out the specific mechanical means.
</p>
<p>
However, in this post I've been phrasing <i>and</i> treating Gödel's Theorem differently.
</p>
<p>
I phrased myself in terms of what we can know about the system — regardless of how we come to know — rather than what the system can prove about itself. (I'm not distinguishing, btw, between "what we can know" and "what can be true"; either would do, in principle, but we're no longer sure what "truth" is, and while it's awkward to talk about multiple notions of truth, it's easier to talk about multiple observers. When convenient I'll conjure a hypothetical omniscient being, to dispense with quibbles about "true but unknowable".)
</p>
<p>
My treatment of the Theorem conspicuously omits any internals of the formal system, supposing only that its conclusions are computable (and below I'll dispense with even that supposition). By depicting Gödel's Theorem without any reference to the rules of deduction, this approach seems to throw a wet blanket on attempts to cope with Gödel's Theorem by means of one's choice of rules of deduction — and frankly, I approve of discouraging attempts by that route. I'm not looking for a clever loophole in Gödel's result — invoking, say, uncountable infinities or second-order logic as a sort of <a href="https://en.wikipedia.org/wiki/Get_Out_of_Jail_Free_card">Get Out of Jail Free card</a>. In my experience, when somebody thinks they've found a loophole in something as fundamental as Gödel's Theorem, it's very likely they've outsmarted themselves and ended up with a bogus result. What I want is an <i>obvious</i> way of completely bypassing the Theorem; something poetically akin to <a href="https://en.wikipedia.org/wiki/Gordian_Knot">cutting the Gordian Knot</a>.
</p>
<p>
That is, I'm looking for a way around Gödel's Theorem with a high <i><span id="term-comptruth-profundity-index">profundity index</span></i>. This is an informal device I use to characterize the sort of solutions I favor. Imagine you could use numerical values to describe how difficult conceptual tasks are: each such value is a positive number, and the more difficult the task, the higher the number. Now, for a given idea, take the difficulty of coming up with the idea <i>the first time</i>, and divide by the difficulty of understanding the idea <i>once it's been explained to you</i>. That ratio is the profundity index of the idea. So an idea is profound if it was really difficult to come up with, but is really obvious once explained. If an idea that's incredibly hard to come up with in the first place turns out to be even <i>harder</i> to figure out how to explain clearly, the denominator you want is the difficulty of understanding it after somebody has figured out how to explain it clearly, and the numerator should include the difficulty of coming up with the explanation.
</p>
<p>
The metaphor of getting around something implies a desire to get to the other side; and it may be illuminating to ask why one wants to do so. We have here two notions, one practical and one philosophical. The notion of <i>truth</i> is as philosophical as you can get; it's the whole purpose of philosophy. The notion of <i>mechanical computation</i> is — despite quibbles about infinite resources and such — quintessentially practical, to do with getting results by an explicitly objective and reproducible procedure. Mathematicians in the second half of the nineteenth century sought to access truth through computation. The protracted collapse of that agenda across the first three decades of the twentieth century, culminating in Gödel's Theorem, has left us without a clear understanding of the proper role of computation in investigating truth; and with yet another in philosophy's long tradition of ways to not be sure what is true. So I suppose, in trying to get around Gödel's Theorem, my hopes are
<ul>
<li>to find a robust maximum of truth, as Turing power is a robust maximum of computational power.</li>
<li>to find a robust maximum way of obtaining truth through computation. </li>
</ul>
Gödel's Theorem tells us that Turing power itself, despite its robustness, does not provide a straightforward maximum of either truth or proof.
</p>
<span style="font-size: large;" id="sec-comptruth-CurryHoward">Curry–Howard</span>
<p>
Though there are (of course) situations in which the <i><a href="https://en.wikipedia.org/wiki/Curry%E2%80%93Howard_correspondence">Curry–Howard correspondence</a></i> is exactly what one needs, in general I see it as badly overrated.
</p>
<p>
The basic correspondence is between rules of deduction in formal proof systems, and rules of static typing in a programming language (classically, the typed λ-calculus). The canonical example is that <i>modus ponens</i> corresponds to typed function application: <i>modus ponens</i> says that if proposition <i>A</i> is provable and proposition <i>A</i>⇒<i>B</i> is provable, then proposition <i>B</i> is provable; typed function application says that if <i>a</i> is an expression of type <i>A</i> and <i>f</i> is an expression of type <i>A</i>→<i>B</i>, then <i>fa</i> is an expression of type <i>B</i>. Moving outward from this insight, when you construct a correctly typed program you are also constructing a proof; thus proofs correspond to correctly typed programs. A theorem corresponds to a type, so that asking whether a theorem has a proof is asking whether the corresponding type has a correctly typed expression of that type — that is, provability of the theorem corresponds to realizability of the type. And so on.
</p>
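<p>
In Python's type-hint notation (about as lightweight a rendering as I can manage; any language with function types would do), the <i>modus ponens</i> half of the canonical example looks like this:
</p>
<blockquote><pre>
from typing import Callable, TypeVar

A = TypeVar("A")
B = TypeVar("B")

def modus_ponens(a: A, f: Callable[[A], B]) -> B:
    # Typed application: from an inhabitant of A and an inhabitant of
    # A -> B, an inhabitant of B; read "inhabitant" as "proof".
    return f(a)
</pre></blockquote>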
<p>
The folk-wisdom version of the correspondence is that logic and computation are the same thing. The folk-corollary is that all reasoning should be done with types. This is the basis of modern type theory, and there are folks trying to recast programming language design, logic, and mathematics alike in the image of types. Curry–Howard has taken on a (one hopes, metaphorically) theological significance.
</p>
<p>
It strikes me, though, that the basic correspondence does not involve computation at all. If, in the realm of programming, a type system ever becomes Turing-powerful, that's a major mark against it because we want fast automatic type-checking and, even if we're willing to wait a little longer, we certainly want our type-checks to be guaranteed to halt. In any event, types are not the primary vehicle of computation, rather they're a means of reasoning about the form of programs — thus, not even reasoning directly about our computations, but rather about the way we <i>specify</i> our computations.
</p>
<p>
It's easy to get tangled up trying to make sense of this proof–program connection. For example, when we say that we want our automatic type-checking algorithm to always halt, that limits the computational power involved in checking an individual step of a proof, but puts no limit on the computational power of proof in general because the length of allowable proofs is unbounded, just as the <i>size of program expressions</i> is unbounded. There is no evident notion of what it means for a proof to "halt"; through Curry–Howard, this corresponds merely to the fact that λ-calculus has no "largest" expression, since any expression can be made a subexpression of a larger one; it has nothing whatever to do with halting of λ-calculus computations. The reason one gets tangled up like this is that although proofs and programs are technically connected through Curry–Howard, they have different and often incompatible <i>purposes</i>.
</p>
<p>
The purpose of a proof, I submit, is to elucidate a chain of reasoning. The more lucid, the better. Ideally one wants what <a href="https://en.wikipedia.org/wiki/Paul_Erd%C5%91s">Paul Erdős</a> used to call "one from The Book" — the idea being that God has a jealously guarded book of the most beautiful proofs of all theorems (Paul Erdős was an atheist; he said "You don't have to believe in God, but you should believe in The Book"). But the first duty of a <i>program</i> is to elucidate an algorithm. Seriously. This shouldn't be a controversial statement, and it's scary to realize that for some people it is. I'll say it again. The first duty of a program is to elucidate an algorithm. You should be able to tell at a glance <i>what the program is doing</i>. That is your first line of defense against getting the program wrong. Proofs of program correctness, with all the benefits and problems thereof, are a later line of defense, possibly useful but no substitute for being able to tell what the program does. (Yes, this is yet another of my dozen-or-so topics I'm trying to draft a blog post about, under the working title "How do we know it's right?".) And this is where the two sides of the Curry–Howard correspondence part company. If you relentlessly drive your program syntax toward more lucid expression of algorithms, you obfuscate the inference rules of your type system, which is to say, the deductive steps of the corresponding proof. If you drive to simplify the type system, you're no longer headed for maximally lucid algorithms. In practice, it seems, you try to get simultaneously the best of both worlds and end up sliding toward the worst of both. (My past blog post on types is <a href="http://fexpr.blogspot.com/2011/11/where-do-types-come-from.html">yonder</a>.)
</p>
<span style="font-size: large;" id="sec-comptruth-know">What we can't know</span>
<p>
Enough type-bashing. Back on the Gödel front we were hoping for robust maxima of truth and of computable truth. What does Gödel's Theorem actually tell us about these goals?
</p>
<p>
First of all we should recognize that Gödel's Theorem is not, primarily, a limit on what can be known by mechanical means. The uncomputability of the Halting problem (proven <i><a href="#sec-comptruth-halting">above</a></i>) is a limit on what can be known by mechanical means. Gödel's Theorem is a limit on what can be known <i>at all</i>. That is, it bears more on truth than it does on truth through computation. I touched on this point <a href="#sec-comptruth-first">above</a>. To further clarify, we can use a notion that played a brief role in Alan Turing's doctoral dissertation at Princeton (supervised by Alonzo Church), called an <i>oracle</i>.
</p>
<p>
Suppose we attach a peripheral device, <i>O</i>, to a Turing machine. The Turing machine, chugging along mechanically, can at any time ask a question of <i>O</i>, and <i>O</i> will give an answer (based on the question) in one step of the machine. <i>O</i> is a black box; we don't say how it works, and there's no need for it to be mechanical inside. Maybe there's a <a href="https://en.wikipedia.org/wiki/Jinn">djinn</a>, or a deity, or some such, inside <i>O</i> that's producing the answer. We're only assuming that it will always immediately answer any question asked of it. That's what we mean by an oracle. We'll call a Turing machine equipped with this device an <i>O-machine</i>.
</p>
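<p>
In code, an <i>O</i>-machine is just an ordinary program that takes the oracle as a black-box parameter (a sketch, with a hypothetical question format of my own devising). Note that the program text below never mentions what's inside <code>O</code>; the same finite description works whatever djinn is in the box, which is the crux of the next paragraph's argument.
</p>
<blockquote><pre>
def make_O_machine(O):
    # O is a black box: any function from questions to answers, assumed to
    # answer in one step.  Everything outside the single call to O is
    # ordinary mechanical computation.
    def machine(description, inp):
        answer = O(("halts?", description, inp))
        return "halts" if answer == "yes" else "loops"
    return machine
</pre></blockquote>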
<p>
Let's revisit the Halting problem, and see what we can — and can't — do to change the situation by introducing an oracle. Our basic result, remember, is that there is no purely mechanical means to determine whether or not an arbitrary Turing machine will halt on an arbitrary input. Okay. What if we had a non-mechanical means? Precisely, let us suppose we have an oracle <i>O</i> that will always tell us immediately whether a given ordinary Turing machine will halt on a given input. This supposition doesn't appear to have, in itself, any unfortunate consequences. Under the supposition that we have such an oracle <i>O</i>, we can easily build an <i>O</i>-machine that determines whether a given ordinary Turing machine will halt on a given input. What we <i>can't</i> do, given this oracle <i>O</i>, is build an <i>O</i>-machine that determines whether a given <i>O-machine</i> halts on a given input. This is one of those results that's easier to show if we make it stronger. Without even knowing what an oracle <i>O</i> does, we can prove that there is no <i>O</i>-machine that always halts that determines whether any described <i>O</i>-machine will halt on a given input. This is because we can describe all possible <i>O</i>-machines without ever having to describe the internals of <i>O</i>. <i>O</i> is the same for all of them, after all; we only have to describe the parts of the <i>O</i>-machines that differ from each other, and that we can do finitely no matter what <i>O</i> is. So we can still enumerate our <i>O</i>-machines, and diagonalize just as we did in the earlier section to prove an ordinary Turing machine couldn't solve the ordinary Halting problem. No matter what <i>O</i> is, even if it's got an omniscient being hidden inside it, an <i>O</i>-machine can't solve the <i>O</i>-Halting problem.
</p>
<p>
Likewise, our diagonalization to prove Gödel's Theorem works just as well for <i>O</i>-machines as for ordinary Turing machines. For any oracle <i>O</i>, let <i>L</i> be an <i>O</i>-machine that confirms some propositions, and fails to confirm others, about whether or not given <i>O</i>-machines on given inputs produce given outputs. Reasoning just as before, if we know that <i>L</i> confirms all true claims, then we cannot know that <i>L</i> doesn't confirm any false claims; if we know it doesn't confirm any false claims, then we cannot know it confirms all true claims. So even if we're allowed to consult an oracle, we still can't achieve an understanding of truth, <i>L</i>, that confirms all and only true claims (about <i>O</i>-machines, lest we forget).
</p>
<p>
We are now placing limits on what we can know, or equivalently, on what can be true. To get such results we must have started with some constraints, and we've conspicuously placed no constraints at all on <i>O</i>, not even computability. It's worth asking what our constraints were, that led to the result.
</p>
<p>
For one thing, we have required truth to agree with the reasoning in the diagonalization argument of our proof of the Theorem. Interestingly, this makes clear that the oracle itself is <i>not</i> our notion of truth, for we didn't require the oracle to respect the reasoning in our diagonalization; rather, with no constraints on how the oracle derives its answers from the questions asked of it, we proved a limitation on what those answers could <i>mean</i>. The <i>O</i>-machine <i>L</i>, together with our reasoning about it, became our <i><a href="https://en.wikipedia.org/wiki/I_Ching">I Ching</a></i>, the means by which we mapped the oracle's behavior into the realm of truth.</p>
<p>
The reasoning in the diagonalization isn't all we introduced; we also introduced... what? The requirement that our djinn/deity/whatever-it-is feed its answers to a mechanical device. The requirement that it base its answers only on what question the mechanical device asked of it. Ultimately, the requirement that the truth be <i>about</i> discrete inputs and outputs and be <i>expressed as</i> discrete propositions.
</p>
<p>
The discreteness of the mechanical device makes it possible to enumerate, and therefore diagonalize. The discreteness of the question/answer interactions with the oracle is apparently a corollary to that. The requirement that the oracle base its answer only on the question asked... is thought-provoking, since modern physics leans toward the concept of entanglement. Heisenberg's uncertainty principle already came up once above. One might wonder if <i>truth</i> ought to be reimagined in some way that makes it inherently part of the system in which it exists, rather than something separate; but here, the discrete separations — between machine and oracle, between subject and truth — seem a rather natural complement to the discreteness of the machine.
</p>
<p>
The common theme of discreteness is reminiscent of the importance attached to "chunking" of information in my earlier post on <a href="http://fexpr.blogspot.com/2015/02/sapience-and-language.html">sapience and language</a>. Moreover, the relationship between discreteness and continuity seems to be cropping up in several of my current major avenues of investigation — Gödel, linguistics, sapience, fundamental physics — and I find myself suspecting we lack some crucial insight into this relationship — hopefully, an insight with a very high profundity index, because the time required to acquire the insight, for the numerator of the index, appears to be measured in millennia, so in order for us to grok it once found we should hope for the denominator to be a lot smaller. However, I've no immediate thoughts in this direction. In the current case, it's more than a little mind-bending to try to construct a notion of truth that's inherently not separable into discrete propositions; indeed, one might wonder if the consequences of the attempt could be rather <a href="https://en.wikipedia.org/wiki/H._P._Lovecraft#Forbidden_knowledge">Lovecraftian</a>. So for the nonce I'll keep looking for profundity around Gödel's results <i>without</i> abandoning the concept of a proposition.
</p>
<span style="font-size: large;" id="sec-comptruth-second">Gödel's Second Theorem</span>
<p>
Gödel's Second Theorem says that if a sufficiently powerful formal system <i>L</i> can prove itself consistent, then <i>L</i> is not consistent.
</p>
<p>
Where Gödel's First Theorem was, as I interpret it, mainly about the relationship between <i>L</i> and truth, his Second Theorem is much more concerned with the specific behavior of <i>L</i>. I covered the "sufficiently powerful" clause in the First Theorem mostly by completeness, i.e., by requiring that <i>L</i> confirm everything we know is true. The Second Theorem isn't about completeness, though, so we need something else. Just as we have already required truth to agree with our diagonalization argument, we'll now separately require <i>L</i> to agree with the diagonalization argument too. And, we don't want to require any controversial behaviors from <i>L</i>, like the Law of the Excluded Middle or <i>reductio ad absurdum</i>, since they would compromise the generality of our result. Here's a set of relatively tame specific behaviors. Write [<i>T</i>:<i>i</i>⇒<i>o</i>] and [<i>T</i>:<i>i</i>⇏<i>o</i>] for the canonical representations, recognized by <i>L</i>, of the propositions that machine <i>T</i> on input <i>i</i> does or does not produce output <i>o</i>.
<ul>
<li><p><span id="ax-comptruth-yes"><small>[a]</small></span> If <i>T</i>(<i>i</i>)=<i>o</i>, then <i>L</i> confirms [<i>T</i>:<i>i</i>⇒<i>o</i>].</p></li>
<li><p>There is a canonical way, recognizable by <i>L</i>, to set up a machine (<i>S</i>∧<i>T</i>) that combines two other machines <i>S</i> and <i>T</i>, by running each of them on its input, requiring them both to produce the same output, and producing that output itself. We require <i>L</i> to do some reasoning about these machines:
<ul>
<li><span id="ax-comptruth-dup"><small>[b]</small></span> <i>L</i> confirms [(<i>T</i>∧<i>T</i>):<i>i</i>⇏<i>o</i>] iff it confirms [<i>T</i>:<i>i</i>⇏<i>o</i>].</li>
<li><span id="ax-comptruth-commut"><small>[c]</small></span> If <i>L</i> confirms [(<i>S</i>∧<i>T</i>):<i>i</i>⇏<i>o</i>], then it confirms [(<i>T</i>∧<i>S</i>):<i>i</i>⇏<i>o</i>].</li>
</ul>
</p></li>
<li><p>There is a canonical way, recognizable by <i>L</i>, to set up a machine (<i>S</i>∘<i>T</i>) that combines two other machines <i>S</i> and <i>T</i>, by running <i>T</i> on its input, then running <i>S</i> on the output from <i>T</i>, and producing the output from <i>S</i>. We require <i>L</i> to do some reasoning about these machines:
<ul>
<li><span id="ax-comptruth-comp"><small>[d]</small></span> If <i>T</i>(<i>i</i>)=<i>v</i>, then <i>L</i> confirms [(<i>S</i>∘<i>T</i>):<i>i</i>⇏<i>o</i>] iff it confirms [<i>S</i>:<i>v</i>⇏<i>o</i>].</li>
<li><span id="ax-comptruth-subst"><small>[e]</small></span> If <i>S</i><sub>1</sub>(<i>i</i>)=<i>S</i><sub>2</sub>(<i>i</i>)=<i>v</i>, then <i>L</i> confirms [((<i>R</i>∘<i>S</i><sub>1</sub>)∧<i>T</i>):<i>i</i>⇏<i>o</i>] iff it confirms [((<i>R</i>∘<i>S</i><sub>2</sub>)∧<i>T</i>):<i>i</i>⇏<i>o</i>].</li>
<li><span id="ax-comptruth-assoc"><small>[f]</small></span> <i>L</i> confirms [(((<i>Q</i>∘<i>R</i>)∘<i>S</i>)∧<i>T</i>):<i>i</i>⇏<i>o</i>] iff it confirms [((<i>Q</i>∘(<i>R</i>∘<i>S</i>))∧<i>T</i>):<i>i</i>⇏<i>o</i>].</li>
<li><span id="ax-comptruth-dist"><small>[g]</small></span> <i>L</i> confirms [((<i>S</i><sub>1</sub>∘<i>T</i>)∧(<i>S</i><sub>2</sub>∘<i>T</i>)):<i>i</i>⇏<i>o</i>] iff it confirms [((<i>S</i><sub>1</sub>∧<i>S</i><sub>2</sub>)∘<i>T</i>):<i>i</i>⇏<i>o</i>].</li>
</ul>
</p></li>
<li><p>There is a canonical way, recognizable by <i>L</i>, to set up a machine (<i>T</i>⇒<i>o</i>) that combines a machine <i>T</i> with an output <i>o</i>, by constructing a proposition that <i>T</i> on the given input produces output <i>o</i>. That is, (<i>T</i>⇒<i>o</i>)(<i>i</i>)=[<i>T</i>:<i>i</i>⇒<i>o</i>]. We require <i>L</i> to do some reasoning about these machines:
<ul>
<li><span id="ax-comptruth-meta"><small>[h]</small></span> <i>L</i> confirms [(<i>S</i>∧<i>T</i>):<i>i</i>⇏<i>confirm</i>] iff it confirms [((<i>L</i>∘(<i>S</i>⇒<i>confirm</i>))∧<i>T</i>):<i>i</i>⇏<i>confirm</i>]. </li>
</ul>
</p></li>
</ul>
For us, several of these deductions would likely be special cases of more general rules, and maybe they are for <i>L</i>; we only require that <i>L</i> reach these conclusions by some means. (The machine combinators themselves are sketched in code just below.)
</p>
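<p>
Here are the three machine combinators as a Python sketch (my own rendering, with machines as functions and propositions as tagged tuples; in the real construction everything would operate on machine <i>descriptions</i>, which is why <code>claims</code> takes a canonical name rather than the function itself):
</p>
<blockquote><pre>
def conj(S, T):
    # (S ∧ T): run both on the input; produce the output they agree on.
    def ST(i):
        a, b = S(i), T(i)     # if either diverges, so does (S ∧ T)
        while a != b:
            pass              # disagreement: never produce any output
        return a
    return ST

def comp(S, T):
    # (S ∘ T): the input through T, then the result through S.
    return lambda i: S(T(i))

def claims(T_name, o):
    # (T ⇒ o): on input i, construct the canonical proposition [T : i ⇒ o].
    return lambda i: ("produces", T_name, i, o)
</pre></blockquote>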
<p>
In order for <i>L</i> to prove its own consistency, we need to define consistency as an internal, checkable feature of <i>L</i>'s behavior — rather than by the external feature that <i>L</i> says nothing false according to the external notion of truth. Leading candidates are
<ul>
<li><i>L</i> does not confirm any antinomy; that is, there is no proposition such that <i>L</i> confirms both it and its negation.</li>
<li>There is some proposition that <i>L</i> does not confirm.</li>
</ul>
The second of these was widely favored in the early twentieth century: inconsistency means confirming all propositions whatever. Given <i>reductio ad absurdum</i> and the Law of the Excluded Middle, failing the first of these conditions entails failing the second (once any antinomy is provable, every proposition is provable), thus: Starting with a known proof of an antinomy, consider any proposition <i>q</i>. Supposing <i>not-q</i>, we can derive an antinomy (because we could have derived it even if we hadn't supposed <i>not-q</i>), therefore, by <i>reductio ad absurdum</i>, the supposition must be false; that is, <i>not-q</i> is false. But then, by the Law of the Excluded Middle, since <i>not-q</i> is false, <i>q</i> must be true. So every proposition can be proven.
</p>
<p>
I'll use the first of these two internal notions of consistency, because it is the weaker of the two, so that any theorem we derive from it will be a stronger result. Construct machines <i>T</i><sub>yes</sub> and <i>T</i><sub>no</sub> that, given a machine and an input/output pair, output propositions asserting the machine does and does not have that input/output: <i>T</i><sub>yes</sub>(<i>T</i>,<i>i</i>,<i>o</i>)=[<i>T</i>:<i>i</i>⇒<i>o</i>], <i>T</i><sub>no</sub>(<i>T</i>,<i>i</i>,<i>o</i>)=[<i>T</i>:<i>i</i>⇏<i>o</i>]. As self-proof of consistency, require that <i>L</i> confirm, for every possible input <i>i</i>, proposition [((<i>L</i>∘<i>T</i><sub>yes</sub>)∧(<i>L</i>∘<i>T</i><sub>no</sub>)):<i>i</i>⇏<i>confirm</i>].
</p>
<p>
Start out just as in the diagonalization for the First Theorem. Imagine a table where the column labels are all possible inputs, the row labels are all possible machines, and each entry contains a "yes" if <i>L</i> confirms that machine on that input outputs <i>confirm</i>, a "no" if <i>L</i> confirms that machine on that input doesn't output <i>confirm</i>. Construct a machine <i>A</i> as follows. First, construct <i>A<sub>0</sub></i> that, on the <i>m</i><sup>th</sup> input, outputs the machine/input/output needed to construct a proposition that the <i>m</i><sup>th</sup> machine on the <i>m</i><sup>th</sup> input does or does not produce output <i>confirm</i>. (Remember, <i>A</i><sub>0</sub> can do this by counting the column labels till it finds the input, then counting the row labels to find the corresponding machine.) Then just let <i>A</i>=(<i>L</i>∘(<i>T</i><sub>no</sub>∘<i>A<sub>0</sub></i>)). Let <i>n</i> be the row number of the row labeled by <i>A</i>. We're interested in the behavior of <i>A</i> on the <i>n</i><sup>th</sup> input; but this time, instead of just considering what is true about this behavior, we're interested in what <i>L</i> confirms about it.
</p>
<p>Call the <i>n</i><sup>th</sup> input <i>i</i>. We've got all our dominoes lined up; now watch them fall.
<blockquote>
<table>
<tr>
<td>By construction of <i>A<sub>0</sub></i>,</td>
<td><i>A<sub>0</sub></i>(<i>i</i>)=(<i>A</i>,<i>i</i>,<i>confirm</i>).</td>
</tr>
<tr>
<td>By construction of <i>T</i><sub>yes</sub>,</td>
<td><i>T</i><sub>yes</sub>(<i>A</i>,<i>i</i>,<i>confirm</i>)=[<i>A</i>:<i>i</i>⇒<i>confirm</i>].</td>
</tr>
<tr>
<td>By construction of <i>T</i><sub>no</sub>,</td>
<td><i>T</i><sub>no</sub>(<i>A</i>,<i>i</i>,<i>confirm</i>)=[<i>A</i>:<i>i</i>⇏<i>confirm</i>].</td>
</tr>
<tr>
<td>By consistency self-proof,</td>
<td><i>L</i> confirms [((<i>L</i>∘<i>T</i><sub>yes</sub>)∧(<i>L</i>∘<i>T</i><sub>no</sub>)):(<i>A</i>,<i>i</i>,<i>confirm</i>)⇏<i>confirm</i>].</td>
</tr>
<tr>
<td>By <a href="#ax-comptruth-comp"><small>[d]</small></a>,</td>
<td><i>L</i> confirms [(((<i>L</i>∘<i>T</i><sub>yes</sub>)∧(<i>L</i>∘<i>T</i><sub>no</sub>))∘<i>A<sub>0</sub></i>):<i>i</i>⇏<i>confirm</i>].</td>
</tr>
<tr>
<td>By <a href="#ax-comptruth-dist"><small>[g]</small></a>,</td>
<td><i>L</i> confirms [(((<i>L</i>∘<i>T</i><sub>yes</sub>)∘<i>A<sub>0</sub></i>)∧((<i>L</i>∘<i>T</i><sub>no</sub>)∘<i>A<sub>0</sub></i>)):<i>i</i>⇏<i>confirm</i>].</td>
</tr>
<tr>
<td>By <a href="#ax-comptruth-assoc"><small>[f]</small></a> and <a href="#ax-comptruth-commut"><small>[c]</small></a>,</td>
<td><i>L</i> confirms [((<i>L</i>∘(<i>T</i><sub>yes</sub>∘<i>A<sub>0</sub></i>))∧(<i>L</i>∘(<i>T</i><sub>no</sub>∘<i>A<sub>0</sub></i>))):<i>i</i>⇏<i>confirm</i>].</td>
</tr>
<tr>
<td>By definition of <i>A</i>,</td>
<td><i>L</i> confirms [((<i>L</i>∘(<i>T</i><sub>yes</sub>∘<i>A<sub>0</sub></i>))∧<i>A</i>):<i>i</i>⇏<i>confirm</i>].</td>
</tr>
<tr>
<td>By <a href="#ax-comptruth-subst"><small>[e]</small></a>,</td>
<td><i>L</i> confirms [((<i>L</i>∘(<i>A</i>⇒<i>confirm</i>))∧<i>A</i>):<i>i</i>⇏<i>confirm</i>].</td>
</tr>
<tr>
<td>By <a href="#ax-comptruth-meta"><small>[h]</small></a>,</td>
<td><i>L</i> confirms [(<i>A</i>∧<i>A</i>):<i>i</i>⇏<i>confirm</i>].</td>
</tr>
<tr>
<td>By <a href="#ax-comptruth-dup"><small>[b]</small></a>,</td>
<td><i>L</i> confirms [<i>A</i>:<i>i</i>⇏<i>confirm</i>].</td>
</tr>
<tr>
<td>By <a href="#ax-comptruth-yes"><small>[a]</small></a> and definition of <i>A</i>,</td>
<td><i>L</i> confirms [<i>A</i>:<i>i</i>⇒<i>confirm</i>].</td>
</tr>
</table>
</blockquote>
For a bonus, by <a href="#ax-comptruth-yes"><small>[a]</small></a>, <i>L</i> also proves itself inconsistent by confirming [((<i>L</i>∘<i>T</i><sub>yes</sub>)∧(<i>L</i>∘<i>T</i><sub>no</sub>)):(<i>A</i>,<i>i</i>,<i>confirm</i>)⇒<i>confirm</i>].
</p>
<p>
I don't think this result has much to do with the specific assumptions <small>[b]–[h]</small> about the behavior of <i>L</i>; going through the proof leaves me impressed by how relatively innocuous those assumptions were. Which, to my mind, is an insight well worth the exercise of going through the proof.
</p>
<span style="font-size: large;" id="sec-comptruth-ord">Turing's dissertation</span>
<p>
I've mentioned Turing's American doctoral dissertation several times. (<i>American</i> because he already had sufficient academic credentials in Europe.)
</p>
<p>Since Gödel had shown a formal system can't prove itself consistent, it was then of interest to ask how much <i>more</i> than a given formal system would be needed to prove it consistent. <a href="https://en.wikipedia.org/wiki/Gerhard_Gentzen">Gerhard Gentzen</a> produced some interesting results of this sort, exploring the formal consequences of postulating restricted forms of mathematical induction (before he was arrested in Prague at the end of World War II, handed over to Soviet forces, and starved to death in a prison camp). Turing's dissertation explored another approach: when considering a formal system <i>L</i>, simply construct a new system <i>L</i><sub>1</sub> that adds to <i>L</i> a postulate saying <i>L</i> is consistent. Naturally you can't make that postulate part of <i>L</i> itself, since it would (presuming sufficient power) cause <i>L</i> to become inconsistent if it wasn't already; but if <i>L</i> actually was consistent to start with, <i>L</i><sub>1</sub> should be consistent too since <i>L</i><sub>1</sub> can't prove its own consistency, only that of <i>L</i>. To prove the consistency of <i>L</i><sub>1</sub>, you can construct an <i>L</i><sub>2</sub> that adds to <i>L</i><sub>1</sub> a postulate saying <i>L</i><sub>1</sub> is consistent. And so on. In fact, Turing didn't even stop with <i>L<sub>k</sub></i> for every positive integer <i>k</i>; he supposed a family of formal systems <i>L<sub>α</sub></i> for any <i>α</i> representing an <a href="https://en.wikipedia.org/wiki/Ordinal_number">ordinal</a>. Ordinals are (broadly) the order types of well-ordered sets; there are infinitely many <i>countably infinite</i> ordinals; there are uncountably infinite ordinals. The only restriction for Turing was that since all this was being done with formal systems, the ordinals <i>α</i> had to be finitely represented.
</p>
<p>
Turing made the — imho, eminently reasonable — suggestion that the value of this technique lies in explicitly identifying where additional assumptions are being introduced. The additional assumptions themselves, consistency of all the individual ordinal logics in the system, are justified only by fiat, so apparently nothing has been contributed toward reaching truth through computation; the axioms weren't reached through computation, merely acknowledged in computation. Interestingly, nothing has been contributed toward getting around Gödel's Theorems, either: in terms of the framework set up in this post, the entire system of logics can be handled by a single formal system that is itself still subject to the Theorems.
</p>
<p>
I set out to simplify the situation, pruning unnecessary complications in order to better understand the essence of Gödel's results. It seems an encouraging sign for the success of that agenda, that Turing's authentically interesting but elaborate dissertation has little to say about my treatment — since this means the complexities addressed by his approach have been pruned.
</p>
<span style="font-size: large;" id="sec-comptruth-where">Where do we go from here?</span>
<p>
<b>Types.</b> The above suggests correctness proof should be separated from detailed program syntax. As I've remarked elsewhere, it should be possible to maintain propositions and theorems as first-class objects, so that proof becomes the computational act of constructing a theorem object — <i>if</i> one can determine the appropriate rules of deduction, which makes this concern dependent on choosing the appropriate notion of proof.
</p>
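<p>
A minimal sketch of what I mean (LCF-style, as it's called after the theorem prover that pioneered the trick): the only way to construct a <code>Theorem</code> object is through the inference-rule functions, so holding one <i>is</i> the proof. Which rules to adopt is, of course, exactly the question left open above; the two shown are placeholders.
</p>
<blockquote><pre>
_token = object()   # stand-in for the kernel's exclusive right to build theorems

class Theorem:
    # A first-class theorem: its existence certifies that its proposition
    # was derived by the rule functions below.
    def __init__(self, prop, token):
        assert token is _token, "theorems come only from inference rules"
        self.prop = prop

def axiom(prop):
    # Trusted base case; a real system would check prop against axiom schemata.
    return Theorem(prop, _token)

def modus_ponens(thm_a, thm_imp):
    # From theorems A and ("implies", A, B), the theorem B.
    tag, antecedent, consequent = thm_imp.prop
    assert tag == "implies" and antecedent == thm_a.prop
    return Theorem(consequent, _token)

q = modus_ponens(axiom("P"), axiom(("implies", "P", "Q")))
print(q.prop)   # Q
</pre></blockquote>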
<p>
<b>Truth.</b> Gödel's First Theorem, as treated here, says that truth <i>about the behavior of a formal logic</i> must conform to a certain constraint on the shape of that truth. The impression I've picked up, in constructing this post, is that this constraint should not be particularly troubling. A Euclidean triangle can't have more than one obtuse interior angle; and the truth about a formal logic can't include both completeness and consistency. I've been progressively mellowing about this since the moment I started thinking of a formal system as a filter on truth — an insight preserved above at the comparison to the <i>I Ching</i>.
</p>
<p>
<b>Proof.</b> Gödel's Second Theorem, as treated here, not only places a notably <i>small</i> constraint on proof (as the First Theorem does on truth), but does so in a way notably disengaged from controversial rules of deduction. The insight I take from this is that, in seeking a robust notion of proof, Gödel's Theorems aren't all that important. The great paradigm crisis of the early twentieth century (in the admittedly rather specialized area of mathematical foundations) was about antinomies implied by the axioms, not self-proofs of consistency. The self-proofs of consistency were just an envisioned possible countermeasure, and when Gödel showed the countermeasure couldn't work, the form of his Second Theorem, together with the lines of research people had already been pursuing, led them to the <a href="https://en.wiktionary.org/wiki/red_herring">red herring</a> of an infinite hierarchy of successively more powerful systems each able to show the consistency of those below it. Whatever interesting results may accompany the infinite hierarchy, I now suspect they have nothing much to say about a robust maximum of proof. Turing's ordinal logics — in essence an explicit manifestation of the infinite hierarchy — were, remember, about <i>assuming</i> consistency, not establishing it.
</p>
<p>
So it seems to me the problem of interest — even in my earlier post on bypassing no-go theorems, where Gödel's results served as the rallying point for my explorations — is how to fix the axioms. In that regard, I have a thought, which clearly calls for a separate post, but I'll offer it here to mull over.
</p>
<p>
I speculated in the <i>no-go theorems</i> post that where the proof of Russell's Paradox ends after a few steps, leaving a foundational problem, an analogous recursive predicate in Lisp simply fails to terminate, which might or might not be considered a bug in the program but doesn't have the broad foundational import of the Paradox. If a person reasoned about whether or not the set A of all sets that do not contain themselves contains itself, they might say, "If A does not contain itself, then by definition it does contain itself; but then, since it does contain itself, by definition it does not contain itself; but then, since it does not contain itself, by definition it does contain itself..." — and the person saying this quickly sees that the reasoning is going 'round in circles. This is much like
<blockquote><code>($define! A ($lambda (P) (not? (P P)))) ; A negates the self-application of P<br>(A A) ; so (A A) = (not? (A A)), and evaluation recurses without end</code></blockquote>
except that the Lisp interpreter running this code probably doesn't deduce, short of a memory fault or the like, that the computation won't halt, whereas the human quickly deduces the non-halting. The formal proof, however, does not go 'round in circles. It says, "Suppose A does not contain itself. By definition, since A does not contain itself, A does contain itself. That's a contradiction, therefore by <i>reductio ad absurdum</i>, A does not not contain itself. By the Law of the Excluded Middle, since A does not not contain itself, A does contain itself. By definition, since A does contain itself, A does not contain itself. That's an antinomy." Why did this not go 'round in circles? Because the initial premise, "Suppose A does not contain itself", was <i>discharged</i> when <i>reductio ad absurdum</i> was applied. The human reasoner, though, never forgot about the initial assumption.
</p>
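<p>
In Python, for comparison, the analogous self-application doesn't even loop silently: CPython's recursion limit cuts it off, a crude mechanical admission that the evaluation is going 'round in circles.
</p>
<blockquote><pre>
A = lambda P: not P(P)   # A asks: does P, applied to itself, come out false?
try:
    A(A)                 # (A A): each evaluation of P(P) recurses again
except RecursionError:
    print("no answer; the premise is never discharged")
</pre></blockquote>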
<p>
This isn't quite the same as a <a href="https://en.wikipedia.org/wiki/Conditional_proof">conditional proof</a>, in which you start with an assumption P, show that this assumption leads to a consequence Q, and then conclude from this that P⇒Q regardless of whether or not P holds; the assumption has been discharged, but it lingers embedded in the conclusion. It only appears to have discharged the assumption because we have a notion of implication that neatly absorbs the memory of our having assumed P in order to prove Q. Really, <i><b>all</b></i> proofs are conditional in the sense that they only hold if we accept the rules of deduction by which we reached them; but we can't absorb that into our propositions using logical implication. We could still preface all of our propositions with "suppose that such-and-such laws of deduction are valid"; but when writing it formally we'd need a different notation because it's not the same thing as logical implication. We might be tempted to use something akin to a <a href="https://en.wikipedia.org/wiki/Turnstile_%28symbol%29">turnstile</a>, which is (in my experience) strictly a meta-level notation — but in the case of <i>reductio ad absurdum</i>, it seems we don't want a meta-level notation. We want to qualify our propositions by the memory of other propositions we've rejected because they led to contradictions.
</p>
<p>
I'd expect to <i>explore</i> what this does to paradoxes (not just Russell's); I'm skeptical that in itself it would eliminate all the classical paradoxes. I do rather like the idea that the classical paradoxes are all caused by a single flaw, but I suspect the single flaw isn't in a single axiom; rather, it may be a single systemic flaw. It seems that, in some sense, not discharging the assumption in <i>reductio ad absurdum</i> is a way to detect <i>impredicativity</i>, the same source of paradoxes type theory was meant to eliminate.
</p>
<p>
I look forward to exploring that. Along with my dozen-or-so other draft post topics.
</p>
John Shutthttp://www.blogger.com/profile/00041398073010099077noreply@blogger.com6tag:blogger.com,1999:blog-7068528325708136131.post-5542750260666806602015-02-13T13:25:00.000-08:002018-06-02T11:56:44.532-07:00Sapience and language<blockquote>
Language is the dress of thought
<blockquote>
— <a href="https://en.wikiquote.org/wiki/Samuel_Johnson">Samuel Johnson</a>, <i>The Life of Cowley</i>, 1779(?).</blockquote>
Only a few, out of the hundred, claimed to use mathematical symbology at all. [...] All of them said they did it [math or physics] mostly in imagery or figurative terms. An amazing 30% or so, including Einstein, were down here in the mudpies [doing]. Einstein's deposition said, "I have sensations of a kinesthetic or muscular type." Einstein could feel the abstract spaces he was dealing with, in the muscles of his arms and his fingers[...] almost no adult creative mathematician or physicist uses [symbology] to do it[...] They use this channel to communicate, but not to do their thing.
<blockquote>
— <a href="https://en.wikipedia.org/wiki/Alan_Kay">Alan Kay</a>, <i>Doing With Images Makes Symbols</i>, 1987.
</blockquote>
</blockquote>
<p>
I've some thoughts to pursue here, drawing together the timeline of human evolution, the nature of human consciousness, human language processing, evolutionary memetics, and a few other odds and ends.
</p>
<p>
I've been developing various elements of this for years; several of them have featured in earlier posts on this blog. They came together into a big picture of sorts early this year as I've been reading <a href="https://en.wikipedia.org/wiki/Richard_Leakey">Richard Leakey</a>'s 1994 <i>The Origin of Humankind</i>. (I've found the content of great interest, although, for some reason I've been unable to quite place, the writing style often causes me to lose track of what he's just said and have to back up, sometimes by a page or more and sometimes more than once; I was sufficiently interested I was <i>willing</i> to back up for comprehension. Interestingly, the effect seemed less pronounced when the material was read aloud.)
</p>
<blockquote>
<b>Contents</b><br>
<a href="#sec-sap-early">Early hominids</a><br>
<a href="#sec-sap-cars">Driverless cars</a><br>
<a href="#sec-sap-without">Thinking without language</a><br>
<a href="#sec-sap-theater">The non-Cartesian theater</a><br>
<a href="#sec-sap-dress">The dress of thought</a><!-- <br> -->
</blockquote>
<span style="font-size: large;" id="sec-sap-early">Early hominids</span>
<p>
Leakey's book describes controversies in paleoanthropology. Fascinating stuff if you're interested in how scientific ideas develop (which I am), or in how humans developed (which I also am). Much of it has to do with when in human evolution various peculiarly human traits emerged.
</p>
<p>
Darwin suggested that three major human traits all co-evolved, developing simultaneously as a package because they complement each other — making and using tools, bipedalism which frees up the hands to make and use tools, and a big brain for figuring out how to make and use tools. Since it's a cogent idea and, into the bargain, appeals to one's sense of human exceptionalism, the co-evolution idea held sway for about a century before people questioned it and it broke down under scrutiny. Bipedalism seems to have emerged around five million years ago, tools maybe two and a half million years ago, and Leakey suggests a bigger brain more or less contemporaneous with tools.
</p>
<p>
Two other events are also of interest: the emergence of language, and the emergence of recognizably modern human behavior, with art, tools for making tools, tools with artistic flair, tools for making clothing, etc. The latter took place, by the evidence, around 35,000 years ago in Europe, some tens of thousands of years earlier in Africa; the beginning of the Upper Paleolithic, a.k.a. the Late Stone Age. When language emerged is harder to judge, but Leakey suggests it goes back as far as tools and the big brain. (It seems somewhat ironic that Leakey stresses how Darwin's co-evolution theory was wrong, and then himself separates out bipedalism but ends up promoting co-evolution of a different set of traits; I suspect he's likely right, but it is a bit bemusing that the theories would reshuffle that way.)
</p>
<p>
A question that hovers over Leakey's notion of early language development is, why did tool use evolve incredibly slowly for more than two million years, and then accelerate hugely at the beginning of the Upper Paleolithic? As it happens, I have an answer ready to hand, speculation of course but an interesting fit for the purpose (noting that, by Leakey's account, paleoanthropology has a healthy mix of speculation in it): the onset of the Upper Paleolithic might be the observable manifestation of the transition to Havelock's <i>oral society</i> from what, in an <a href="http://fexpr.blogspot.com/2011/12/preface-to-homer.html">earlier post</a>, I called <i>verbal society</i>.
</p>
<p>
Our available example of verbal society — so I conjectured — is the <a href="https://en.wikipedia.org/wiki/Pirah%C3%A3_people">Pirahã</a> society recently discovered in the Amazon. An atypical example, necessarily under the conjecture, because examples <i>typically</i> would have disappeared dozens of millennia ago. To recap from the earlier post, here's a key excerpt from David J. Peterson's enumeration of anomalous properties of the Pirahã:
<blockquote>no temporal vocabulary or tense whatsoever, no number system, and a culture of people who have no oral history, no art, and no appreciation for storytelling.</blockquote>
If early genus <i>homo</i> had that sort of culture, it would seem to explain rather well why things picked up spectacularly when they got <i>out</i> of that mode and into the more advanced oral culture of Havelock's <i>Preface to Plato</i>.
</p>
<p>
This in turn offers some insight into a question that arose in the relation of verbal culture to my still earlier blog post on <a href="http://fexpr.blogspot.com/2011/03/memetic-organisms.html">memetic organisms</a>. According to my theory, sciences are a major taxon of memetic organisms specifically adapted to literate society, but they cannot survive in oral society; and religions were a major taxon of memetic organisms in oral society, but cannot survive in verbal society. The first commenter on my verbal-society post asked what sort of memetic organisms would be dominant in verbal society. I suggested language itself as a memetic taxon, a suggestion I'm now more doubtful of but which, in any case, seems at best a rather incomplete answer. If the transition to oral culture is the onset of the Upper Paleolithic, though, we have at least a basis from which to speculate on the memetics of verbal society, because verbal society is then the memetic environment that gives rise to the archaeological record of the first two and a half million years or so of human technology. I've no deep thoughts atm on what specifically one might infer from this archaeological record about the memetic evolution behind it, except perhaps that memetic evolution in verbal society was very slow; but in principle, it offers a place for such memetic investigations to start.
</p>
<p>
A side-plot in my post on verbal society was the observation that the verbal-to-oral transition was a good fit for the story of Adam and Eve's eating of the fruit of the tree of knowledge, and being expelled for that from the Garden of Eden. I got criticized for that comparison. I'd had the comparison in mind, really, not primarily based on likelihood of it actually being the origin of the story, but because I'd noticed a connection to the backstory of how we know about the Pirahã. As recorded in <i><a href="https://en.wikipedia.org/wiki/Daniel_Everett#Don.27t_Sleep.2C_There_Are_Snakes:_Life_and_Language_in_the_Amazonian_Jungle">Don't Sleep, There Are Snakes: Life and Language in the Amazonian Jungle</a></i> (which I recommend), the Pirahã were studied by Christian missionaries; ultimately, the hope was to bring the Word of God to the Pirahã. The Pirahã culture wouldn't support a religion, which is an oral memetic organism; in fact, one of the missionaries (who wrote the book) ended up converting to atheism. When the comparison occurred to me between verbal-to-oral transition and eating of the fruit of the tree of knowledge, I was struck that this analogy would cast the Christian missionaries in the role of the Serpent in the Garden of Eden. The irony was irresistible, which is how the comparison ended up in the blog post.
</p>
<p>
In comments, the alternative suggestion was raised that the Fall corresponds to the development of agriculture (the onset of the <i>Neo</i>lithic). A theory I'd heard before and that does seem to fit passably well. With several more years to think about it, though, I don't think these theories are mutually exclusive. The story could of course have nothing to do with either of these actual events; but if it has a basis in actual events, there's no reason it should be based on just <i>one</i> real event. In oral culture, elements are introduced into a tradition and then shift around and mutate, tending toward a stable form. So the story of the Fall, which <i>eventually</i> got written down, can contain echoes of multiple major ancient societal shifts that don't have to have happened all at the same time. Reconsidering the comparison now, I'm struck by the inclusion, on Leakey's list of innovations at the beginning of the Upper Paleolithic, of <i>tools for making clothing</i> — recalling that when Adam and Eve ate of the fruit of the tree of knowledge, one of the immediate effects was that they realized they were naked, and... made clothes. Huh.
</p>
<span style="font-size: large;" id="sec-sap-cars">Driverless cars</span>
<p>
Proponents of aviation often cite statistics for how safe it is compared to driving. Statistics, though, even when accurate can often miss the point by presuming what questions ought to be asked. One such difficulty here is that the statistics comparing aviation to driving are per unit distance. I recall an SF novel (looking it up, <i><a href="https://en.wikipedia.org/wiki/A_Talent_for_War">A Talent for War</a></i>) describing an interstellar hyperspace-jump technology in which ships sometimes just don't come back out of a jump — but it's the "safest form of travel per passenger-kilometer".
</p>
<p>
There is a second way those statistics on aviation versus driving can miss the point. Some things aren't even <i>about</i> probabilities. What if someone would rather risk themself in a situation where they have a significant degree of control over events rather than risk themself in a situation where they've completely given up control to someone else? That's a decision based on a philosophical criterion, not a numerical one, and there is nothing inherently irrational nor ignorant about it.
</p>
<p>
The matter has been further complicated by the advent of fly-by-wire airplane technology. If one could rationally prefer to retain control rather than give it up to another human, how about giving up control to another human versus to a software system? It has long been my impression (and I gather I'm not alone) that the people least willing to trust computers are found amongst those who know most about them. Now, here's a subtle point: it seems to me that when someone is concerned with whether or not they have significant control over events, it's not the routine events they care about, but the <i>exceptional</i> ones. The unforeseen circumstances. And here there does seem to be a qualitative difference between a human pilot and a fly-by-wire system. The fly-by-wire system is programmed some time beforehand by programmers who try to anticipate what situations the system should be able to handle; rather by definition, it doesn't know how to handle unforeseen circumstances. To some extent, at least, the fly-by-wire system incorporates knowledge of past accidents, although there may also be some conscious trade-offs in which not every known possibility is built into the system. The human, on the other hand, draws on their experience to try to cope with the situation — which might sound much the same as the fly-by-wire system incorporating knowledge of past accidents, but I submit is qualitatively different. The fly-by-wire approach is algorithmic, while the human approach is improvisational.
</p>
<p>
The difference may be clearer in the case of driving, of which readers of this blog are more likely to have first-hand experience (more readers will have driven a car than piloted an airplane).
</p>
<p>
Driverless cars have been touted lately as a coming technology. Those with a vested interest (be it financial or emotional) in the success of the technology naturally tend to portray it as "safer" than cars driven by humans. Yet we're told driverless cars couldn't be put on the road in numbers unless one also banned human drivers. Why would that be? Presumably because the driverless cars would have trouble with unexpected human behavior. Which raises the question, if humans are so unpredictable, and driverless cars are (going to be) so much better drivers than humans are, why would human drivers be better than driverless cars at coping with unpredictable drivers? It's often seemed to me, when some other driver does something unusual and I compensate for it, or I do something unusual and other drivers compensate for it, that what's really impressive about the statistics on traffic accidents isn't how high the numbers are, but how low the numbers are. Spend some time driving in traffic, and it's a good bet you'll see a bunch of situations where a human's unexpected behavior didn't result in an accident because other human drivers successfully compensated for it — improvisationally.
</p>
<p>
Humans are really good at coping with free-form unexpected situations. Comparatively. We don't always handle a sudden problem correctly; but we also don't fall off the edge of our programming, either. A computer program that doesn't know what to do <i>really</i> doesn't know what to do, in a way that clearly distinguishes natural intelligence from conventional software. (I'm interested here in the nature of human intelligence; artificial intelligence is not to the current point, though the current discussion might offer some insights into it.)
</p>
<p>
It seems likely to me that the <i>point</i> of the hominid big brain is to enable individuals to cope with unexpected situations. This might even help to explain our impulse to be in control of our own fate when an emergency happens: we are, as a species, <i>specialists</i> in coping with the unexpected, and it's in the best interests of our selfish genes that we each rely on our own ability to cope, so that the combinations of genes that cope most effectively will be favored in the gene pool. In other words, dispassionately, if somebody is going to be pruned from the population because of a failure to cope with an emergency, our selfish genes are better off if they get pruned for their own failure rather than someone else's. (Yes, social behaviors vastly complicate this; there's definitely a place for good Samaritans, heroes, etc.; but atm I'm commenting on why individuals would desire to control their own fate, not why they'd commit acts of altruism.)
</p>
<p>
One might take the speculative riff further than that. One of the questions that comes up in archaeology is, why would we have started in East Africa? Leakey suggests the emergence of bipedalism had to do with varying habitats brought on by the advent of the <a href="https://en.wikipedia.org/wiki/East_African_Rift">East African Rift</a>. It certainly seems plausible bipedalism would have been enabled and favored by some sort of habitat shift. The subsequent emergence of tools, with co-evolving big brain and language, <i>might</i> then be supposed to follow simply from the opportunity provided by bipedalism; but for my part, I'm inclined to think the tool/brain/language effect may have required more of an evolutionary nudge than that. If it were that easily catalyzed, one might think it would have happened before us, and left evidence that we would have found by now. (Granted, that's easy to poke holes in. Maybe we're just randomly the first. Maybe it only happens once per planet because it rapidly causes the planet to be destroyed. Or —a rather more fun hypothesis— perhaps it <i>has</i> happened and left evidence, and the evidence is staring us in the face but we're assuming something that prevents us from seeing it. As long as you don't bring ancient aliens into it; I've contempt for fake science, though I enjoy exotic serious speculations.) So perhaps the tool/brain/language co-evolution got going, once enabled by bipedalism, because something in the environment was particularly irregular and therefore favored individuals able to individually cope with unforeseen circumstances.
</p>
<p>
What would this hypothetical environmental factor be, whose irregularity favors individual intelligence to cope with it? Besides the East African Rift, Leakey mentions the <a href="https://en.wikipedia.org/wiki/Social_intelligence#Hypothesis">social intelligence hypothesis</a>, that intelligence evolved to predict the behaviors of others in a complex social milieu. (This could put an interesting spin on <a href="https://en.wikipedia.org/wiki/Asperger_syndrome">Asperger's syndrome</a>, which one might conjecture would isolate brain power from its usual application to socialization, potentially making a surplus available for other purposes.)
</p>
<p>
If that's too cerebral for you, here's an alternative theory: Maybe language started as a rather pointless mating display, like so many other exaggerated animal features such as the peacock's tail. So that men trying to chat up women in bars would be what the species originally evolved for. (In all seriousness, the two ideas are not mutually exclusive; language skill might have had some sort of survival value from the outset and consequently it was beneficial for it to be treated as desirable in a mate.)
</p>
<span style="font-size: large;" id="sec-sap-without">Thinking without language</span>
<p>
The <i>Sapir–Whorf hypothesis</i> says that language influences thought. Benjamin Lee Whorf, writing in the early-to-mid twentieth century, called this the principle of <a href="https://en.wikipedia.org/wiki/Linguistic_relativity">linguistic relativity</a>, alluding to Einstein's theory of relativity since linguistic relativity implies that how we perceive the world varies with what language we use. Modern understanding of the hypothesis distinguishes strong and weak versions: the strong version says that the structure of a language prevents its speakers from thinking in non-conforming patterns, the weak version that the structure of a language merely <i>discourages</i> its speakers from thinking in non-conforming patterns. Typically for ideas named after people, Sapir and Whorf never coauthored a paper on the idea, didn't present it as a hypothesis, and didn't make the modern distinction between strong and weak versions.
</p>
<p>
The strong Sapir–Whorf hypothesis isn't taken very seriously by linguists nowadays, but the weak form is generally accepted to some extent or other. Popular culture enshrines the language–thought connection with tropes like "in <language X>, there are forty different words for <Y>" and "in <language X>, there is no word for <Y>". Either claim is ambiguous as to which direction it expects the thought/language influence to go. Out of context, I'd guess forty-different-words-for is typically meant to imply that <language X> speakers think about <Y> a lot, which depends on thought influencing language; while no-word-for is more ambiguous in direction, and seems to me at least as often meant to imply that the speakers cannot even conceive of <Y>. Saying they can't conceive of it <i>could</i> still be using the language as evidence for thought, but seems likely to have a stiff dose of strong Sapir–Whorf mixed in. The glaring flaw in this strong Sapir–Whorf reasoning is that if your language doesn't have a word for <Y>, and you have a need for such a word, you're likely to invent a word for it — borrowing from another language, compounding from existing vocabulary, or whatever sort of coinage <language X> favors.
</p>
<p>
Word coinage is an example of how thought can, rather than being limited by language, drive expansion of language to encompass new realms of thought. This, however, raises a subtler question about the relation between language and thought. The meaning <Y> appears to conceptually presage the new word — but how much wiggle room is there between the ability to think of <Y> and the ability to express it? Could you have chosen in the example, as a <language X> speaker, to hold off on inventing a word for <Y>, and think about <Y> for a while without having a name for it? Going further, is it possible to think without language? Is there perhaps a threshold of sophisticated thought, beyond which we need language to proceed?
</p>
<p>
Seems to me there's plenty of evidence of nontrivial thinking without language. It's definitely possible to think in pictures; I sometimes do this, and I've heard of others doing it (and, see the Alan Kay epigraph at the top of this post). One might suggest that pictures, being a concrete representation, are a sort of "language". But to amplify, it's possible to think in abstracts represented by pictures <i>without any accompanying verbal descriptions of the abstracts</i>. I'm confident of the lack of accompanying verbal descriptions because, in general, the abstracts may lack short names, while long, often awkward descriptions would have been conspicuous if present. In such a case, the pictures represent relationships amongst the abstracts, not the abstracts themselves, and possibly not all the relationships amongst them — so even if the pictures qualify as language, they express far less than the whole thought. More broadly, it's a common experience to come up with a deep idea and then have trouble putting it into words — which evidently implies that the two acts are distinct (coming up with it, and putting it into words), and frankly this effect isn't adequately explained by saying the thinker didn't "really" have an idea until they'd put it into words. The idea might become more <i>refined</i> whilst being put into words, and sometimes one finds in the process that the idea doesn't "pan out"... but there has to have been something to be refined, or something to not pan out.
</p>
<p>
So I don't find it at all plausible that thought arises from language. That, however, does not interfere either with the proposition that language naturally arises from thought, nor that language facilitates thought, both of which I favor. (I recall, a few years ago in conversation with an AI researcher, being mistaken for an advocate of <a href="https://en.wikipedia.org/wiki/Symbolic_artificial_intelligence">symbolic AI</a> since I'd suggested some reasoning under discussion could be aided by symbols. There's a crucial difference between aiding thought and being its foundation; these ideas need to be kept carefully separate.)
</p>
<span style="font-size: large;" id="sec-sap-theater">The non-Cartesian theater</span>
<p>
Daniel Dennett's 1991 book <i><a href="https://en.wikipedia.org/wiki/Consciousness_Explained">Consciousness Explained</a></i> is centrally concerned with debunking a common misapprehension about the internal structure of the human mind. Dennett reckoned the misapprehension is a holdover from the seventeenth-century mind–body dualism of <a href="https://en.wikipedia.org/wiki/Ren%C3%A9_Descartes">René Descartes</a>. In Cartesian dualism as recounted by Dennett, the body is a machine which, based on input to the senses, constructs a representation of the world in the brain where the mind/soul apprehends it. Although this doesn't actually <i>solve</i> the problem of interfacing between a material body and nonmaterial soul, it does at least simplify it by reducing the interface to a single specialized organ where the mind interacts with the material world (Descartes figured the point of interface was the pineal gland). Modern theorists envision the mind as an emergent behavior of the brain, rather than positing mind–body dualism; but they still have an unfortunate tendency, Dennett observed, to envision the mind as having a particular place — a "Cartesian theater" — where a representation of the world is presented for apprehension by the consciousness. An essential difficulty with this model of mind is that, having given up dualism, we can no longer invoke a supernatural explanation of the audience who watches the Cartesian theater. If a mind is something of type M, and we say that the mind has within it a Cartesian theater, then the audience is... something of type M. So the definition of type M is infinitely recursive, and supposing the existence of a Cartesian theater has no explanatory value at all. To understand consciousness better we have to reduce it to something else, rather than reducing it to <i>the same thing again</i>.
</p>
<p>
As an alternative to any model of mind based on the Cartesian theater, Dennett proposes a "<a href="https://en.wikipedia.org/wiki/Multiple_drafts_model">multiple drafts</a>" model of consciousness, in which representations of events are assembled in no one unique place (Cartesian theater) and, once assembled, may be used just as any other thoughts, revised, reconciled with each other, etc. Which is all very well but seems to me to be mostly a statement of <i>lack</i> of certain kinds of constraints on how consciousness works, with little to say about how consciousness actually <i>does</i> work.
</p>
<p>
It occurred to me in reading Dennett's book, though, that in setting about debunking a common misapprehension he was also making a mistake I'd seen before — when reading Richard Dawkins's 1976 book <i><a href="https://en.wikipedia.org/wiki/The_Selfish_Gene">The Selfish Gene</a></i>. Dawkins's book is another that sets about debunking a common misapprehension, in that case <i>group selection</i>, the idea that natural selection can function by selecting the most fit population of organisms. Dawkins argued compellingly that natural selection is intrinsically the differential survival of <i>genes</i>: whatever genes compete most successfully come to dominate, and only incidentally do these genes assemble and manipulate larger structures as means to achieve that survival, such as individual organisms, or populations of organisms. A hardy individual, or a hardy population, may help to make genes successful, but is not what is being selected; natural selection selects genes, and all else should be analyzed in terms of what it contributes to that end. Along the way, Dawkins illustrates the point with examples where people have fallen into the trap of reasoning in terms of group selection — and this is where, it seemed to me, Dawkins himself fell into a trap. The difference between group selection, where a population of organisms is itself selected for its success, and gene selection, where a population of organisms may promote the success of genes in its gene pool, is a subtle one. I got the impression (though I highly recommend Dawkins's book) that Dawkins was somewhat overeager to suppose thinking about successful populations must be based on the group-selection hypothesis, and thereby he might be led to discount some useful research.
</p>
<p>
Likewise, Dennett seemed to me overenthusiastic about ascribing a Cartesian theater to any model of consciousness that involves some sort of clearinghouse. That is, in his drive to debunk the Cartesian theater, the possibility of a <i>non</i>-Cartesian theater may have become collateral damage. This struck me as particularly unfortunate because I was already coming to suspect that a non-Cartesian theater might be a useful model of consciousness.
</p>
<p>
There is a widespread perception (my own first encounter with it was in a class on <a href="https://en.wikipedia.org/wiki/Human-computer_interaction">HCI</a> in the 1980s) that the human mind has a short-term memory whose size is about <i>seven plus-or-minus-two chunks of information</i>. Skipping much historical baggage associated with this idea (it can be a fatal mistake, when looking for new ideas, to take on a set of canonical questions and theories already known to lead to the very conceptual impasse one is trying to avoid), the notion of a short-term memory of 7±2 chunks has interesting consequences when set against the "Cartesian theater". Sure, we can't reduce consciousness by positing a Cartesian theater, but this short-term memory looks like <i>some</i> kind of theater. If short-term memory stores information in chunks, and our brain architecture has an evident discrete aspect to it, and experience suggests thoughts are discrete, perhaps we can usefully envision the <i>audience</i> of this non-Cartesian theater as a collection of <a href="https://en.wikipedia.org/wiki/Multi-agent_system">agents</a>, each promoting some thought. An agent whose thought relates associatively to something on-stage (one of the 7±2 chunks) gets amplified, and the agents most successfully amplified get to take their turn on-stage, where they can bring their thought to the particular attention of other members of the audience. There are all sorts of opportunities to fine-tune this model, but as a general heuristic it seems to me to answer quite well for a variety of experienced phenomena of consciousness — fundamentally based on a non-Cartesian theater.
</p>
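<p>
As a toy illustration of the heuristic, here's a sketch in Haskell of agents competing for a seven-chunk stage. To be clear, this is my sketch, not Dennett's; the "relatedness" measure and the amplification constants are invented for the occasion, mere placeholders for whatever the fine-tuning would put there.
<blockquote><pre>import Data.List (sortBy)
import Data.Ord (comparing, Down(..))

type Thought = String

data Agent = Agent { thought :: Thought, activation :: Double }

-- Crude stand-in for associative relatedness: sharing a word.
related :: Thought -> Thought -> Bool
related a b = any (`elem` words b) (words a)

stageSize :: Int
stageSize = 7   -- the "seven plus-or-minus-two" chunks

-- One round: agents whose thoughts relate to anything on-stage are
-- amplified, the rest decay; the most activated agents then take the
-- stage, bringing their thoughts to the attention of the whole audience.
step :: ([Thought], [Agent]) -> ([Thought], [Agent])
step (stage, agents) = (map thought winners, agents')
  where
    boost a
      | any (related (thought a)) stage = a { activation = activation a * 1.5 }
      | otherwise                       = a { activation = activation a * 0.9 }
    agents' = map boost agents
    winners = take stageSize (sortBy (comparing (Down . activation)) agents')</pre></blockquote>
Iterating <code>step</code> yields a stream of successive stage contents; all the opportunities for fine-tuning are hidden in the two made-up constants and the relatedness test.
</p>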
<span style="font-size: large;" id="sec-sap-dress">The dress of thought</span>
<p>
If "chunking" information is a key strategy of human thought, this might naturally facilitate representing thought using sequences of symbols with a tree-like structure. Conversely, expressing thought in tree-like linguistic form would naturally afford facile thinking as well as facile expression. Thus, as suggested above, language would naturally arise from thought and would facilitate thought. As an evolutionary phenomenon, development of thought and development of language would thus favor each other, tending to co-evolve.
</p>
<p>
Treating language as a naturally arising, and facilitating, but <i>incomplete</i> manifestation of thought seems to me quite important for clear thinking about memetic evolution, notably for clear thinking about memetic evolution across my conjectured phase boundaries of culture, from verbal to oral and from oral to literate. The incompleteness tells us we should not try to understand memes, nor culture, nor even language evolution, as a purely linguistic phenomenon. It does raise interesting questions about how culture and thought — the stuff of memes — are communicated from person to person (memetically, from host to host). Does the Pirahã language instill the Pirahã culture? Experience suggests that communicating with people in their physical presence is qualitatively more effective than doing so by progressively lower-bandwidth means, with text-only communications — which are extremely widespread now thanks to the internet — way down near the bottom. Emoticons developed quite early in the on-line revolution. Does the internet act as a sort of cultural filter, favoring transmission of some kinds of culture while curtailing others? I don't mean to <i>answer</i> these questions atm; but I do suggest that exploring them, including realizing they should be asked, is facilitated by understanding the thought–language relationship.
</p>
<p>
Somewhere along the evolutionary curve, co-evolving thought and language become an effective platform for the evolution of memetic lifeforms. From there onward, we're pushed to reconsider how we think about the evolutionary process. I agree with Dawkins that group selection is a mistake in thinking — that groups of organisms, and individual organisms, are essentially <a href="https://en.wiktionary.org/wiki/phenotype">phenotypes</a> of genes, and our understanding of their evolution should be grounded in survival of the fittest genes. The relationship between genes and memes, though, is something else again. At best, our genes and our memes enjoy a symbiotic relationship. Increasingly with rising levels of memetic evolution, our genes might be usefully understood as vehicles for our memes, much as Dawkins recommended understanding organisms as vehicles for their genes. This is in keeping with a general trend I've noticed, that thinking about memetic evolution forces us to admit that concepts in evolution have fuzzy boundaries. From the more familiar examples in genetic evolution we may habitually expect hard distinctions between replicator and phenotype, between organism and population; we want to assume an organism has a unique, constant genome throughout its life; we even <i>want</i> to suppose, though even without memetics we've been unable to entirely maintain it, that organisms are neatly sorted into discrete species. All these simplifying assumptions are a bit blurry around the edges even in genetic biology, and memetics forces us to pay attention to the blur.
</p>
<p>
As a parting shot, I note that <i>chunking</i> is an important theme in two other major areas of my blogging interest. In one of several blog posts currently in development relating to physics (pursuant to some thoughts already posted, cf. <a href="http://fexpr.blogspot.com/2014/04/why-is-beta-substitution-like-higgs.html">here</a>), I suggest that while quantum mechanics is sometimes portrayed as saying that in the strange world of the very small, seemingly continuous values turn out to be discrete, perhaps we should be thinking of quantum mechanics <i>the other way around</i> — that is, our notion of discrete entities works in our everyday world but, when we try to push it down into the realm of the very small it eventually cannot be pushed any further, and the pent-up continuous aspect of the world, having been smashed as flat as it can be, is smeared over the whole of creation in the form of quantum wave functions with no finite rate of propagation. In another developing post, I revisit my previously blogged interest in Gödel's Theorem, which arose historically from mathematicians' struggles to use <i>discrete</i> — which is to say, <i>symbolic</i> — reasoning to prove theorems about classical analysis. I don't go in for mysticism: I'm not inclined to suppose we're somehow fundamentally "limited" in our understanding by the central role of information chunking in our thought processes (cf. remarks in a previous blog post, <a href="http://fexpr.blogspot.com/2014/01/lamlosuo.html#sec-lamlosuo-prelim">here</a>); but it does seem that information chunking has a complexly interesting interplay with the dynamics of the systems we think about.
</p>
John Shutthttp://www.blogger.com/profile/00041398073010099077noreply@blogger.com7tag:blogger.com,1999:blog-7068528325708136131.post-24469750794111138622014-04-04T16:49:00.000-07:002015-09-07T08:50:59.933-07:00Why is beta-substitution like the Higgs boson?<blockquote>
"Why is a raven like a writing desk?"<br>...<br>"No, I give it up," Alice replied. "What's the answer?"<br>"I haven't the slightest idea," said the Hatter.
<blockquote>
— <i>Alice's Adventures in Wonderland</i>, <a href="https://en.wikisource.org/wiki/Alice%27s_Adventures_in_Wonderland_%281866%29/Chapter_7">Chapter 7</a>, Lewis Carroll.
</blockquote>
</blockquote>
<p>
I'm always in the market for new models of how a system can be structured. A wider range of models helps keep your thinking limber; the more kinds of structure you know, the more material you have to draw inspiration from when looking for alternatives to a given theory.
</p>
<p>
Several years ago, in developing vau-calculi, I noticed a superficial structural similarity between the different kinds of variable substitution I'd introduced in my calculi, and the fundamental forces of nature in physics. (I mentioned this <a href="http://fexpr.blogspot.com/2014/03/continuations-and-term-rewriting-calculi.html#sec-contin-ins">in an earlier blog post</a>.) Such observed similarities can, of course, be white noise; but it's also possible that both seemingly unrelated systems could share some deep pattern that gives rise to the observed similarity. In the case of vau-calculi and physics, the two systems are so laughably disparate that for several years I didn't look past being bemused by it. But just recently I was revisiting my interest in physics TOEs (that's <b>T</b>heory <b>o</b>f <b>E</b>verything, the current preferred name, last I heard, for what colloquially used to be called Grand Unified Theory, and before that, <a href="http://www.slac.stanford.edu/pubs/beamline/28/3/28-3-joseph.pdf">Unified Field Theory</a>), and I got to thinking.
</p>
<p>
This will take a bit of set-up, and the payoff may be anticlimactic; but given the apparent extreme difficulty of making progress in this area at all, I'll take what I can get.
</p>
<blockquote>
<b>Contents</b><br>
<a href="#sec-bhb-vau">Substitution in vau calculi</a><br>
<a href="#sec-bhb-TOE">Theories of Everything</a><br>
<a href="#sec-bhb-hyg">Hygienic physics</a><br>
</blockquote>
<span style="font-size: large;" id="sec-bhb-vau">Substitution in vau calculi</span>
<p>
Traditionally in λ-calculi, all variables are bound by a single construct, λ, and manipulated by a single operation, called substitution. Substitution is used in two ways.
</p>
<p>
The major rearrangements of calculus terms take place in an action called <i>β</i>-rewriting, where a variable is completely eliminated by discarding its binding λ and replacing all references to it in the <i>body</i> of the old λ with some given <i>argument</i>. The part about eliminating the old λ is just a local adjustment in the structure of the term; but the replacement of references is done by <i>β-substitution</i>, which is not localized at a particular point in the term structure but instead broadcast across the body (an entire branch of the term's syntax tree). When you do this big <i>β</i>-substitution operation, you have to be careful. A naive rule for substituting argument <i>A</i> for variable <i>x</i> in body <i>B</i> would be "replace every reference to <i>x</i> in <i>B</i> with <i>A</i>". If you naively do that you'll get into trouble, because <i>B</i> might contain λs that bind either <i>x</i>, or some other variable that's referred to in <i>A</i>. Then, by following that naive rule you would lose track of which variable reference is really meant to refer to which λ. This sort of losing track is called <i>bad hygiene</i>.
</p>
<p>
To maintain hygiene during <i>β</i>-substitution, we apply <i>α-renaming</i>, which simply means that we replace the variable of a λ, and all the references to it, with some other variable that isn't being used for other purposes and so won't lead to confusion. This is a special case of the same sort of operation as <i>β</i>-substitution, in which all references to a variable are replaced with something else; it just happens that the something else is another variable. These two cases, <i>β</i>-substitution and <i>α</i>-renaming, are not perceived as separate functions, just separate uses of the same function — substitution.
</p>
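<p>
To make the capture problem and its <i>α</i>-renaming remedy concrete, here is a minimal sketch in Haskell (not the notation of the calculi, and covering only the one traditional kind of variable):
<blockquote><pre>-- Terms of the pure lambda-calculus.
data Term = Var String | Lam String Term | App Term Term

-- Free variables of a term.
fv :: Term -> [String]
fv (Var x)   = [x]
fv (Lam x b) = filter (/= x) (fv b)
fv (App f a) = fv f ++ fv a

-- A variant of x not occurring in the given list of names.
fresh :: [String] -> String -> String
fresh used x = head [x ++ show n | n <- [0 ..] :: [Int], (x ++ show n) `notElem` used]

-- subst a x b: replace free references to x in b with a, hygienically.
subst :: Term -> String -> Term -> Term
subst a x (Var y)   = if y == x then a else Var y
subst a x (App f g) = App (subst a x f) (subst a x g)
subst a x t@(Lam y b)
  | y == x        = t                      -- x is shadowed here; stop
  | y `elem` fv a = Lam y' (subst a x b')  -- naive replacement would capture y...
  | otherwise     = Lam y (subst a x b)
  where y' = fresh (x : fv a ++ fv b) y    -- ...so alpha-rename the binding first
        b' = subst (Var y') y b

-- Beta-rewriting: discard the binding lambda, broadcast the argument.
beta :: Term -> Term
beta (App (Lam x b) a) = subst a x b
beta t                 = t</pre></blockquote>
The middle guard of <code>subst</code> is exactly the hygiene hazard described above: pushing the argument naively under a λ whose variable occurs free in the argument would lose track of which reference refers to which λ, so that binding gets renamed to a fresh variable first.
</p>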
<p>
It's possible to extend λ-calculus to encompass side-effectful behaviors — say, continuations and mutable storage — but to do so with well-behaved (technically, <i><a href="http://fexpr.blogspot.com/2013/07/explicit-evaluation.html#sec-expev-dram">compatible</a></i>) rewriting rules, you need some sort of bounding construct to define the scope of the side-effect. In my construction of vau-calculus (a variant λ-calculus), I developed a general solution for bounding side-effects with a <i>variable-binding</i> construct that isn't λ, and operating the side-effects using a variable-substitution function different from <i>β</i>-substitution. (Discussed <a href="http://fexpr.blogspot.com/2014/03/continuations-and-term-rewriting-calculi.html">here</a>.)
</p>
<p>
I ended up with four different kinds of variables, each with its own substitution operation — or operations. All four kinds of variables need <i>α</i>-renaming to maintain hygiene, though, and for the three kinds of side-effect variables, <i>α</i>-renaming is <b>not</b> a special case of operational substitution. If you count each variable type's <i>α</i>-renaming as a separate kind of substitution, there are a total of nine substitution functions (one both <i>α</i> and operational, three purely <i>α</i>, and five purely operational). λ-variables emerge as a peculiarly symmetric case, since they're the only type of variable whose substitutions (<i>α</i> and <i>β</i>) are commensurate.
</p>
<p>
This idea of multiple kinds of variables was not, btw, an unmixed blessing. One kind of variable — environment variables — turned out to be a lot more complicated to define than the others. Two kinds of variables (including those) each needed a second non-<i>α</i> substitution, falling into a sort of gray area, definitely not <i>α</i>-renaming but semi-related to hygiene and not altogether a full-fledged operational substitution. The most awkward part of my dissertation was the chapter in which I developed a general theory of rewriting systems with multiple kinds of variables and substitution functions — and the need to accommodate environment variables was at the heart of the awkwardness.
</p>
<span style="font-size: large;" id="sec-bhb-TOE">Theories of Everything</span>
<p>
Theoretical physics can be incredibly complicated; but when looking for possible strategies to tackle the subject, imho the only practical way to think about it is to step back from the details and look at the Big Picture. So here's my take on it.
</p>
<p>
There are, conventionally, four fundamental forces: gravity, electromagnetism, the weak nuclear force, and the strong nuclear force. Gravity was the first of these we got any sort of handle on, about three and a half centuries ago with <a href="https://en.wikipedia.org/wiki/Isaac_Newton">Isaac Newton</a>'s law of universal gravitation. Our understanding of electromagnetism dates to <a href="https://en.wikipedia.org/wiki/James_Clerk_Maxwell">James Clerk Maxwell</a>, about one and a half centuries ago. We've been aware of the weak nuclear force for less than a century, and the strong nuclear force for less than half a century.
</p>
<p>
Now, a bit more than a century ago, physics was based on a fairly simple, uniform model (due, afaics, to a guy about two and a half centuries ago, <a href="http://www.daviddarling.info/encyclopedia/B/Boscovich.html">Roger Joseph Boscovich</a>). Space had three Euclidean dimensions, changing with respect to a fourth Euclidean dimension of time; and this three-dimensional world was populated by point particles and space-filling fields. But then in the early twentieth century, physics kind of split in two. Two major theories arose, each of them with tremendous new explanatory power... but not really compatible with each other: general relativity, and quantum mechanics.
</p>
<p>
In general relativity, the geometry of space-time is curved by gravity — and gravity more-or-less <i>is</i> the curvature of space-time. The other forces propagate <i>through</i> space-time but, unlike gravity, remain separate from it. In quantum mechanics, waves of probability propagate through space, until observation (or something) causes the waves to collapse nondeterministically into an actuality (or something like an actuality); and various observable quantities are <i>quantized</i>, taking on only a discrete set of possible values. These two theories don't obviously have anything to do with each other, and leave gravity being treated in a qualitatively different way than the other forces.
</p>
<p>
Once gravity has become integrated with the geometry of space-time — through which all the forces, including gravity, propagate — it's rather hard to imagine achieving a <i>more</i> coherent view of reality by <i>undoing</i> the integration already achieved in order to treat gravity more like the other forces. As a straightforward alternative, various efforts have been made to modify the geometry so as to integrate the other forces into it as well. This is made more challenging by the various discrete-valued quantities of quantum mechanics, as the geometry in general relativity is continuous. The phenomena for which these two theories were created are at opposite scales, and the two theories are therefore perceived as applying primarily at those scales: general relativity to the very large, and quantum mechanics to the very small; so in attempting to integrate the other forces into the geometry, modification of the geometry tends to be at the smallest scales. The two most recently popular approaches to this are, to my knowledge, <a href="https://en.wikipedia.org/wiki/String_theory">string theory</a> and <a href="https://en.wikipedia.org/wiki/Loop_quantum_gravity">loop quantum gravity</a>.
</p>
<p>
I've remarked in an earlier blog post, though, that the sequence of increasingly complex theories in physics seems to me likely symptomatic of a wrong assumption held in common by all the theories in the sequence (<a href="http://fexpr.blogspot.com/2012/12/metaclassical-physics.html#sec-mcphys-natstruct">here</a>). Consequently, I'm in the market for radically different ways one might structure a TOE. In that earlier post, I considered an alternative structure for physics, but I wasn't really looking at the TOE problem head-on; just observing that a certain alternative structure could, in a sense, eliminate one of the more perplexing features of quantum mechanics.
</p>
<span style="font-size: large;" id="sec-bhb-hyg">Hygienic physics</span>
<p>
So here we have physics, with four fundamental forces, one of which (gravity) is somehow "special", more integrated with the fabric of things than the others are. And we have vau-calculus, with four kinds of variables, one of which (λ-variables) is somehow "special", more integrated with the fabric of things than the others are. Amusing, perhaps. Not, in itself, suggestive of a way to think about physics (not even an absurd one; I'm okay with absurd, if it's different and shakes up my thinking).
</p>
<p>
Take the analogy a bit further, though. All four forces propagate through space-time, but only gravity is integrated with it. All four operational substitutions entail <i>α</i>-renaming, but only <i>β</i>-substitution is commensurate with it. That's a more structural sort of analogy. Is there a TOE strategy here?
</p>
<p>
Well, each of the operational substitutions is capable of substantially altering the calculus term, but they're all mediated by <i>α</i>-renaming in order to maintain hygiene. There's really quite a lot more to term structure than the simple facets affected by <i>α</i>-renaming; that <i>quite a lot more</i> is what the rewriting actions, with their operational substitutions, engage. There is, nonetheless, for most purposes only <i>one</i> <i>α</i>-renaming operation, which has to deal with all the different kinds of variables at once, because although each operational substitution directly engages only one kind of variable, doing it naively could compromise any of the four kinds of variables.
</p>
<p>
Projecting that through the structural analogy, we envision a TOE in which the geometry serves as a sort of "hygiene" condition on the forces, but is really only a tangential facet of the reality that the forces operate on — impinging on all the forces but only, after all, a hygiene condition rather than a venue. Gravity acts on the larger structure of reality in a way that's especially commensurate with the structure of the hygiene condition.
</p>
<p>
Suggestively, quantum mechanics, bearing on the three non-gravitational forces, is notoriously <a href="http://fexpr.blogspot.com/2012/12/metaclassical-physics.html#sec-mcphys-saga">non-local</a> in space-time; while the three kinds of non-λ variables mediate computational side-effects — which is to say, computational effects that are potentially non-local in the calculus term.
</p>
<p>
The status of gravity in the analogy suggests a weakness in the speculations of my earlier post on "<a href="http://fexpr.blogspot.com/2012/12/metaclassical-physics.html#sec-mcphys-natstruct">metaclassical physics</a>": my technique for addressing determinism and locality seems to divorce all forces equally from the geometrical structure of reality, not offering any immediately obvious opportunity for gravity to be any more commensurate with the geometry than any other force. I did mention above, that post wasn't specifically looking at TOEs; but still, I'm inclined to skepticism about an approach to fundamental physics that seeks to mitigate one outstanding problem and fails to suggest mitigation for others — that's kind of how we ended up with this bifurcated mess of relativity and quantum mechanics in the first place. As I also remarked in that post when discussing why I suspect something awry in theoretical physics, while you can tell you're using an unnatural structure by the progression of increasingly complicated descriptions, you can also tell you've hit on the natural structure when subsidiary problems just seem to melt away, and the description practically writes itself. Perhaps there's a way to add a hygiene condition to the metaclassical model, but I'd want to see some subsidiary problems melting.
</p>
<p>
Supposing one wants to try to construct a TOE, metaclassical or not, based on this strategy, the question that needs answering is, what <i>is</i> the primary structure of reality, to which the geometry serves a sort of tangential hygiene-enforcement role? For this, I note that the vau-calculus term structure is just a syntactic representation of the information needed to support the rewriting actions, mostly (though not exclusively) to support the substitutions. So, the analogous structure of reality in the TOE would be a representation of the information needed to support... mainly, the forces, and the particles associated with them. What we know about this information is, thus, mainly encapsulated in the table of <a href="https://en.wikipedia.org/wiki/Elementary_particle">elementary particles</a>. Which one hopes would give us the wherewithal to encompass gravity — the analog to <i>β</i>-substitution — since it includes a mediating particle for mass: the <a href="https://en.wikipedia.org/wiki/Higgs_boson">Higgs boson</a>.
<blockquote><i>[Note: I've further explored the rewriting/physics analogy in a later post, <a href="http://fexpr.blogspot.com/2015/06/thinking-outside-quantum-box.html">here</a>.]</i></blockquote>
</p>
John Shutthttp://www.blogger.com/profile/00041398073010099077noreply@blogger.com1tag:blogger.com,1999:blog-7068528325708136131.post-56432423685071402812014-03-19T09:19:00.000-07:002014-10-24T06:10:08.060-07:00The great vectors-versus-quaternions debate<p>
What? You've never heard of it? A big knock-down, drag-out fight between great minds of its day over, more-or-less, the <i>philosophy</i> of how to go about mathematical physics. None of this "let's do an experiment to distinguish between these two theories" stuff; that's for wimps. This was the deep stuff: nuts-and-bolts versus mathematical elegance; generic versus well-behaved; even, so we're told, particles versus waves (I kid you not).
</p>
<p>
Old paradigms get crushed; it's part of how new paradigms establish and maintain themselves. History gets buried. But that doesn't mean we have to like it, or stand for it. As I write this, here's the sum total of what Wikipedia's article <a href="https://en.wikipedia.org/wiki/History_of_quaternions">History of quaternions</a> has to say about this colorful event in the history of mathematical physics:
<blockquote>
From the mid-1880s, quaternions began to be displaced by vector analysis, which had been developed by Josiah Willard Gibbs and Oliver Heaviside. Both were inspired by the quaternions as used in Maxwell's A Treatise on Electricity and Magnetism, but — according to Gibbs — found that "... the idea of the quaternion was quite foreign to the subject." Vector analysis described the same phenomena as quaternions, so it borrowed ideas and terms liberally from the classical quaternion literature. However, vector analysis was conceptually simpler and notationally cleaner, and eventually quaternions were relegated to a minor role in mathematics and physics.
</blockquote>
Yawn. (Also, so much for Wikipedian neutrality; but that's a different can of worms.)
</p>
<p>
In this post, I resurrect a paper about the vectors-versus-quaternions debate, written in the Long Ago when we used things called typewriters, and wrote-in special symbols by hand. It's been languishing for years in a <i>file folder</i>, one of those physical things that are the models for the icons on your phone.
</p>
<p>
Here's how the paper came about. I learned about the vectors-versus-quaternions debate from my father. In fact, I learned about <i>quaternions</i> from my father. I inherited his enthusiasm for them. And then, in my third year at <a href="https://en.wikipedia.org/wiki/Worcester_Polytechnic_Institute">WPI</a>, I seized an opportunity to study the debate in depth.
</p>
<p>
One of the requirements for the BS degree at WPI was the Humanities Sufficiency project. The idea was that tech students should be well-rounded, so they should take (and pass) a bunch of humanities classes and then, building on those classes, write a term paper on a subject that bridges the gap between humanities and sciences. WPI had an undergraduate grade called "NR", short (I think) for "Not Recorded": if you didn't pass a course, it didn't go on your transcript (though you didn't get your tuition back, naturally). This resulted in students taking some classes because they were interested, and being sometimes more concerned with learning than with getting high grades. So you'll understand when I say, it took me till my third year to accumulate the class credits I needed for the Sufficiency because I only passed about 50% of the humanities classes I took. Though a bunch of people, including me, were rather bemused when I not only passed, but got high marks in, <i>Philosophical Theories of Knowledge and Reality</i>.
</p>
<p>
I chose vectors-versus-quaternions as my topic, with the no-nonsense title "Quaternions: A Case Study in the Selection of Tools for Mathematical Physics". The Sufficiency was ordinarily a half-semester project, but wasn't required to be, and with such a juicy topic and personal interest, of course I took a bit over that. Professor Parkinson actually <i>apologized</i> for not giving me a top grade on it, explaining that he had a strict rule never to give a top grade to a Sufficiency that took more than the basic half-semester. At the time, what I thought (but at least had enough tact not to say to him) was that I was doing the work for its own sake, not for a mere <i>grade</i>. Later, when that grade caused me to graduate With Distinction instead of With High Distinction, I understood why he'd apologized. And was belatedly a bit put out, after all.
</p>
<p>
The things I <i>learned</i> from this paper have ever after informed my understanding of how scientific paradigms are chosen — a really major theme in my life since, after all, my master's thesis (<a href="ftp://ftp.cs.wpi.edu/pub/projects_and_papers/theory/Shutt-thesis93.pdf">pdf</a>) brushed against a rejected paradigm (extensible languages), while my <a href="http://www.wpi.edu/Pubs/ETD/Available/etd-090110-124904/">dissertation</a> outright resurrected one (<a href="http://fexpr.blogspot.com/2011/04/fexpr.html">fexprs</a>). The influence from this is also deep in the foundations of my thinking on memetic organisms, which I blogged on <a href="http://fexpr.blogspot.com/2011/03/memetic-organisms.html">some time back</a>.
</p>
<p>
The writing style is a bit stiffer, here and there, than I strive for now. (I was even worse in high school.) All in all, though, I'm still fairly pleased with the piece.
</p>
<p>
The original had both footnotes meant to be read inline, and endnotes with bibliographical details. Here, I've put both in sections at the end, using letters for the erstwhile footnotes, numbers for the endnotes. While I'm being pedantic, this version of the paper has three changes from the original version submitted in Spring 1986. (And, just to prove what a nerd I am, the changes were made in 2002.) Footnotes [<a href="#sec-qsuff-fn-m">m</a>] and [<a href="#sec-qsuff-fn-y">y</a>] have been added; and where Hamilton's nabla is first defined, I've corrected it to use full derivatives, having originally miswritten it with partial derivatives.
</p>
<p>
Associative memory is a strange thing. Certain details stick with us. I remember worrying about one partial sentence from Crowe that I just couldn't think how to put differently, and in the end deciding to let that passage stand; though at this late date I've no idea which passage that would be.
</p>
<blockquote>
<b>Contents:</b> <a href="#sec-qsuff-title">Title</a> <a href="#sec-qsuff-body">Body</a> <a href="#sec-qsuff-fn">Footnotes</a> <a href="#sec-qsuff-en">Endnotes</a> <a href="#sec-qsuff-bib">Bibliography</a>
</blockquote>
<hr></hr>
<center><span style="font-size: large;" id="sec-qsuff-title">Quaternions: A Case Study in the Selection of Tools for Mathematical Physics</span>
<br><br>
John N. Shutt
<br><br>
Presented to: Professor Parkinson<br>
Department of Humanities<br>
Term D, 1986
<br><br>
Submitted in Partial Fulfillment<br>
of the Requirements of<br>
the Humanities Sufficiency Program<br>
Worcester Polytechnic Institute<br>
Worcester, Massachusetts<br>
Corrections and additions: 11 March 2002<br>
</center>
<hr></hr>
<span id="sec-qsuff-body"></span>
<p>
Quaternions are a form of hypercomplex number with four
components. Mathematically, they are the next most well-behaved
algebra after the complex field. The extent of their usefulness
for mathematical physics has been in doubt since their discovery.
</p>
<p>
This paper examines historically the principal issues in the
use of quaternions for mathematical physics. The historical and
mathematical background of quaternions is examined, followed by
their application first to classical physics, and then to modern
physics. The paper concludes with an analysis of some of the
major issues.
<br>
</p>
<p>
Vectorial analysis, in its general sense, is the mathematical
treatment of directed magnitudes. It arose in the first
half of the nineteenth century as a synthesis of two major trends
of thought, one in physics and the other in mathematics.<sup><a href="#sec-qsuff-en-1">1</a></sup>
</p>
<p>
It has been pointed out<sup><a href="#sec-qsuff-en-2">2</a></sup> that the geometry of physics after
Newton differed intrinsically from that of the ancient Greeks.
The difference is that, while Newtonian physics is set on a
Euclidean stage, many of the principal players are (implicitly)
vectors, which are not present in Euclid.
</p>
<p>
This left physics by the early nineteenth century working
under a handicap. Vectors underlay most of physics, but could
not be handled gracefully. The primary mathematical tool for
handling geometry was the Cartesian coordinate system; Cartesian
coordinates are flexible in principle, but in practice they are
apt to be unwieldy and generally opaque. A natural language for
vectors was needed.<sup><a href="#sec-qsuff-fn-a">a</a></sup>
</p>
<p>
Mathematics by this time had become overextended.<sup><a href="#sec-qsuff-en-3">3</a></sup> The
problem was with the concept of number. The formulae of algebra,
originally developed using only positive numbers, were now being
successfully applied to negative and complex numbers. Since
mathematicians had traditionally grounded their work on intuition,
the more extensive number systems left many mathematicians
uneasy.
</p>
<p>
The problem led George Peacock, in 1833, to postulate the
principle of permanence of form. This principle said that
"Whatever algebraical forms are equivalent when the symbols are
general in form but specific in value [positive integers], will
be equivalent likewise when the symbols are general in value as
well as in form."<sup><a href="#sec-qsuff-en-4">4</a></sup> By 'general in form' is meant that properties
of a particular number cannot be generalized; for example,
14 <b>mod</b> 7 = 0 doesn't imply that <i>x</i> <b>mod</b> 7 = 0.
'General in value' is meant to refer to fractional, irrational, negative,
or complex numbers, but leaves a question almost as big as the one
to which the principle is addressed. Despite its shortcomings, the
principle was important because it did recognize that algebra is
based on rules.
</p>
<p>
At least six people<sup><a href="#sec-qsuff-fn-b">b</a></sup>
had independently devised the geometrical
representation of complex numbers before Gauss finally
published the idea in 1831.<sup><a href="#sec-qsuff-en-5">5</a></sup> Several of these people used this
representation as a justification for the complex number field.
(Of the first two of the six to make the discovery, Wessel
embraced this justification but Gauss did not.) It was Gauss's
publication that finally drew general attention to the idea.
</p>
<p>
However, William Rowan Hamilton (1805–1865) was not aware of
Gauss's 1831 paper until 1852. He was influenced instead by John
Warren, in whose work he would have been exposed to the concepts
of the associative, commutative and distributive laws. Hamilton
did not consider the geometrical approach a sufficient justification.
In 1837 he presented a fresh approach, interpreting
complex numbers as algebraic couples of real numbers.<sup><a href="#sec-qsuff-en-7">7</a></sup> He
defined addition, subtraction, multiplication and division of
couples and then derived from them the primitive properties of
complex numbers.
</p>
<p>
These mathematical developments suggested to many of the
mathematicians involved that a further extension of number might
be analogous to (3-dimensional) space.<sup><a href="#sec-qsuff-en-8">8</a></sup> It was generally
expected<sup><a href="#sec-qsuff-fn-c">c</a></sup>
that the sought-after extension would have three terms,
and obey all the laws of complex algebra (associative, commutative,
and distributive), as well as having close ties to spatial
geometry.
</p>
<p>
The ties to spatial geometry take many forms. The two that
Hamilton eventually settled on are (1) the Law of the Norms,<sup><a href="#sec-qsuff-fn-d">d</a></sup> and
(2) unique division. In retrospect, only the real and complex
numbers satisfy both all the usual laws and (2), and (1) is
simply impossible in three dimensions.<sup><a href="#sec-qsuff-fn-e">e</a> <a href="#sec-qsuff-en-9">9</a></sup>
</p>
<p>
The idea that triplets (Hamilton's term) might not satisfy
all the usual laws had occurred to Hamilton as early as 1830. He
was acquainted in particular with non-commutative multiplication
from some speculations in set theory.<sup><a href="#sec-qsuff-en-10">10</a></sup> On 16 October 1843, as he
walked to a Council of the Royal Irish Academy, several of the
above ideas converged in his mind to produce quaternions. He had
been working recently with triples of the form <i>a</i>+<i>bi</i>+<i>cj</i>, with
<i>i</i><sup>2</sup> = <i>j</i><sup>2</sup> = −1; he now realized that he could satisfy (1) above by
making the assumptions that <i>ij</i> = −<i>ji</i>, and that this product
yielded a third imaginary component <i>k</i> = <i>ij</i>, with <i>k</i><sup>2</sup> = −1.<sup><a href="#sec-qsuff-en-11">11</a></sup>
</p>
<p>
The resulting quaternion has the form <i>a</i>+<i>bi</i>+<i>cj</i>+<i>dk</i>, with
<i>i</i><sup>2</sup> = <i>j</i><sup>2</sup> = <i>k</i><sup>2</sup> = <i>ijk</i> = −1. Quaternion multiplication is
distributive over addition, and associative, but not commutative.<sup><a href="#sec-qsuff-en-9">9</a></sup>
The norm is a modulus of multiplication, and right-division and
left-division are unique. Real and complex numbers and quaternions
are the only three possible division algebras — that is, algebras
with associative and commutative addition, distributive and
associative multiplication, <i>and</i> unique division.<sup><a href="#sec-qsuff-fn-f">f</a></sup>
</p>
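<p>
To make these rules concrete, here is a minimal computational
sketch, assuming numpy; encoding a quaternion
<i>a</i>+<i>bi</i>+<i>cj</i>+<i>dk</i> as the array [a,b,c,d], and the
qmul and norm helpers, are my own conventions, not the paper's.
The last check is the Law of the Norms from footnote [<a href="#sec-qsuff-fn-d">d</a>].
</p>
<pre>
import numpy as np

def qmul(p, q):
    """Hamilton product, encoding i2 = j2 = k2 = ijk = -1."""
    pw, px, py, pz = p
    qw, qx, qy, qz = q
    return np.array([pw*qw - px*qx - py*qy - pz*qz,
                     pw*qx + px*qw + py*qz - pz*qy,
                     pw*qy - px*qz + py*qw + pz*qx,
                     pw*qz + px*qy - py*qx + pz*qw])

i, j, k = np.eye(4)[1], np.eye(4)[2], np.eye(4)[3]
assert np.allclose(qmul(i, j), k)     # ij = k ...
assert np.allclose(qmul(j, i), -k)    # ... but ji = -k

def norm(q):        # sum of squares of the four coefficients
    return float(np.dot(q, q))

rng = np.random.default_rng(0)
p, q = rng.normal(size=4), rng.normal(size=4)
assert np.isclose(norm(qmul(p, q)), norm(p) * norm(q))  # Law of the Norms
</pre>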
<p>
Hamilton created a plethora of new terms for use in his new
algebra.<sup><a href="#sec-qsuff-en-14">14</a></sup> A quaternion <i>q</i> is made up of a real part, called the
scalar of <i>q</i> and denoted <i>Sq</i>, and an imaginary part, called the
vector of <i>q</i> and denoted <i>Vq</i>. Alternatively, it can be expressed
as the product of a positive real number (the 'length,' or square
root of the norm of <i>q</i>), called the tensor of <i>q</i> and denoted <i>Tq</i>,
and a quaternion with tensor equal to one, called the versor of <i>q</i>
and denoted <i>Uq</i>.<sup><a href="#sec-qsuff-fn-g">g</a></sup> Thus <i>q</i> = <i>Sq</i> + <i>Vq</i> = <i>TqUq</i>.
</p>
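<p>
A small sketch of these decompositions, again assuming numpy and my
own array encoding; note that no quaternion multiplication is needed
to check them.
</p>
<pre>
import numpy as np

q = np.array([1.0, 2.0, -1.0, 0.5])
Sq = np.array([q[0], 0, 0, 0])    # scalar of q
Vq = np.array([0, *q[1:]])        # vector of q
Tq = np.sqrt(np.dot(q, q))        # tensor of q: square root of the norm
Uq = q / Tq                       # versor of q: tensor scaled out to one

assert np.allclose(q, Sq + Vq)
assert np.allclose(q, Tq * Uq)    # a real scalar commutes with q
</pre>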
<p>
A versor <i>u</i> has a unique decomposition <i>u</i> = cos <i>θ</i> + <i>v</i> sin <i>θ</i>
with angle 0 ≤ <i>θ</i> ≤ <i>π</i> and unit vector <i>v</i> = <i>UVu</i>.<sup><a href="#sec-qsuff-fn-h">h</a> <a href="#sec-qsuff-en-15">15</a></sup> If <i>p</i> is a
vector perpendicular to <i>v</i>, <i>p'</i> = <i>up</i> is the rotation of <i>p</i> by
angle <i>θ</i> about <i>v</i>. This allows great-circular arcs in space to be
represented by quaternions, leading to elegant proofs in
spherical trigonometry.
</p>
<p>
Any non-zero quaternion <i>q</i> has a unique inverse <i>q</i><sup>−1</sup> such that
<i>qq</i><sup>−1</sup> = <i>q</i><sup>−1</sup><i>q</i> = 1. Left- and right- division by <i>q</i> are defined
respectively as pre- and post- multiplication by <i>q</i><sup>−1</sup>. If <i>q</i> is a
versor <i>q</i> = cos <i>θ</i> + <i>v</i> sin <i>θ</i>, then for an arbitrary vector <i>p</i>,
<i>p'</i> = <i>qp</i>(<i>q</i><sup>−1</sup>) is the conical rotation of <i>p</i> by angle 2<i>θ</i> about <i>v</i>.
</p>
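<p>
Here is a numerical sketch, assuming numpy, of both rotation formulas
just described: the one-sided product <i>up</i> on a vector
perpendicular to <i>v</i> (rotation by <i>θ</i>), and conjugation
<i>qpq</i><sup>−1</sup> on an arbitrary vector (conical rotation by
2<i>θ</i>). The helpers and names are mine.
</p>
<pre>
import numpy as np

def qmul(p, q):
    pw, px, py, pz = p
    qw, qx, qy, qz = q
    return np.array([pw*qw - px*qx - py*qy - pz*qz,
                     pw*qx + px*qw + py*qz - pz*qy,
                     pw*qy - px*qz + py*qw + pz*qx,
                     pw*qz + px*qy - py*qx + pz*qw])

th = 0.4
q = np.array([np.cos(th), 0, 0, np.sin(th)])  # versor about v = k
p = np.array([0., 1, 0, 0])                   # the vector i, perpendicular to k

# One-sided product: rotation by theta about k.
assert np.allclose(qmul(q, p), [0, np.cos(th), np.sin(th), 0])

# Conjugation: conical rotation by 2*theta; for a versor, the inverse
# is just the conjugate.
qinv = q * np.array([1., -1, -1, -1])
assert np.allclose(qmul(qmul(q, p), qinv),
                   [0, np.cos(2*th), np.sin(2*th), 0])
</pre>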
<p>
Another useful decomposition is that of the quaternion
product of vectors into its scalar and vector parts. If <i>u</i>,<i>v</i> are
vectors separated by angle <i>θ</i>, <i>Suv</i> = −(<i>TuTv</i>) cos <i>θ</i> and
<i>Vuv</i> = (<i>TuTv</i>) (sin <i>θ</i>) <i>n</i> where <i>n</i> = <i>UVuv</i> is a unit vector
perpendicular to <i>u</i> and <i>v</i>.<sup><a href="#sec-qsuff-fn-i">i</a></sup> The scalar part is commutative
(<i>Suv</i> = <i>Svu</i>), and the vector part anticommutative (<i>Vuv</i> = −<i>Vvu</i>).
<i>Suv</i> and <i>Vuv</i> were later to form the basis for modern vector
analysis.
</p>
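<p>
In modern terms, <i>Suv</i> is the negative of the dot product and
<i>Vuv</i> is the cross product. A quick numerical check of that
reading, assuming numpy; the qmul helper is mine.
</p>
<pre>
import numpy as np

def qmul(p, q):
    pw, px, py, pz = p
    qw, qx, qy, qz = q
    return np.array([pw*qw - px*qx - py*qy - pz*qz,
                     pw*qx + px*qw + py*qz - pz*qy,
                     pw*qy - px*qz + py*qw + pz*qx,
                     pw*qz + px*qy - py*qx + pz*qw])

rng = np.random.default_rng(2)
u3, v3 = rng.normal(size=3), rng.normal(size=3)
uv = qmul(np.concatenate(([0.], u3)), np.concatenate(([0.], v3)))
assert np.isclose(uv[0], -np.dot(u3, v3))      # Suv = -(TuTv) cos(theta)
assert np.allclose(uv[1:], np.cross(u3, v3))   # Vuv = (TuTv) sin(theta) n
</pre>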
<p>
Quaternions were the subject of a debate at the British
Association meeting of 1847.<sup><a href="#sec-qsuff-en-16">16</a></sup> George Peacock, who favored
quaternions, did not come forward, but Sir John Herschel did, and
called quaternions "a cornucopia of scientific abundance."
Against quaternions it was objected that owing to their
complexity, quaternion calculations are overly prone to mistakes.
There was also at the meeting at least one representative of the
status quo; in Hamilton's words,
<blockquote>
Mr. Airy, seeing that the subject could not be
cushioned, rose to speak of his own acquaintance with it
[quaternions], which he avowed to be none at all; but
gave us to understand that what he did not know could
not be worth knowing.
</blockquote>
</p>
<p>
The background to this paper would be incomplete without
some mention of Hermann Günther Grassman (1807–1877).<sup><a href="#sec-qsuff-en-17">17</a></sup>
In 1844 Grassman published a work, monumental in both size and scope,
entitled <i>Die lineale Ausdehnungslehre</i> (calculus of extension).
</p>
<p>
The ideas behind the <i>Ausdehnungslehre</i> began in 1832 with
the interpretation of a parallelogram as the geometrical product
of two lines. Grassman generalized this insight to other shapes
and an arbitrary number of dimensions, and placed it on an
abstract mathematical basis. The system of the <i>Ausdehnungslehre</i>
was a very broad mathematical generalization originating from
these geometrical concepts. Several types of multiplication were
defined, the only requirement for a multiplication being
distributivity over addition.
</p>
<p>
The <i>Ausdehnungslehre</i> of 1844 was written in a strongly
metaphysical style, and was also highly abstract at a time when
mathematics was based on concrete intuition. Grassman was
unknown. Consequently, the <i>Ausdehnungslehre</i> went unnoticed by
the world at large.
</p>
<p>
Despite Grassman's efforts, including a revised <i>Ausdehnungslehre</i>
in 1862, his work remained obscure throughout his
life, only beginning to attract interest about the time of his
death. One by one, Grassman's discoveries were remade by others,
with Grassman's anticipation coming to light only afterward, in
questions of priority.
</p>
<p>
Although Grassman's inner and outer products are similar
respectively to the scalar and vector parts of Hamilton's
quaternion product of vectors, Grassman was conceptually distant
from quaternions. The significance of the <i>Ausdehnungslehre</i> here
is that it encompasses an <i>n</i>-dimensional system of vectorial
analysis.<sup><a href="#sec-qsuff-fn-j">j</a></sup>
</p>
<p>
An account will now be given of three significant figures in
the application of quaternions who worked within the quaternion
tradition in the nineteenth century. These three figures are
Hamilton, Tait, and Maxwell. Following this, the circumstances
will be described by which quaternions were abandoned in favor
of vector analysis.
</p>
<p>
The first major publication on quaternions was Hamilton's
<i>Lectures on Quaternions</i> of 1853.<sup><a href="#sec-qsuff-en-19">19</a></sup> The text ran to over 700
pages. It was difficult to read; by 1859, Herschel — a great
enthusiast of quaternions and an able mathematician — had only
managed to read through 129 of its pages.
</p>
<p>
In 1859 Hamilton began work on the <i>Elements of Quaternions</i>.
It was originally to be an elementary treatise, but became a
reference work longer than the <i>Lectures</i> — though without the
metaphysical emphasis of the earlier work. The <i>Elements</i>
developed quaternions mathematically in great detail, but did not add
to their physical application. By his own admission, Hamilton
was by this time out of touch with contemporary physics.
</p>
<p>
Hamilton was convinced of the value of quaternions to
physics, and had published scattered such applications. In 1846
he had defined a (nameless) operator
ᐊ = <i>i</i> <sup><i>d</i></sup>⁄<sub><i>dx</i></sub> + <i>j</i> <sup><i>d</i></sup>⁄<sub><i>dy</i></sub> + <i>k</i> <sup><i>d</i></sup>⁄<sub><i>dz</i></sub>.
However, he never did concentrate his own efforts on applications
to physics, choosing instead to develop quaternionic theory. He
had planned for a major section of his <i>Elements</i> on the ᐊ
operator, but the section was never written because of his death
in 1865. The <i>Elements</i> was published posthumously in 1866.
</p>
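<p>
In modern terms, applying this operator to a vector field packs the
negated divergence into the scalar part and the curl into the vector
part. Here is a symbolic sketch of that reading, assuming sympy; the
qmul helper and all the names are mine, and partial derivatives are
used in the modern manner.
</p>
<pre>
import sympy as sp

x, y, z = sp.symbols('x y z')
F1, F2, F3 = (sp.Function(n)(x, y, z) for n in ('F1', 'F2', 'F3'))

def qmul(p, q):   # Hamilton product on (w, x, y, z) 4-tuples
    pw, px, py, pz = p
    qw, qx, qy, qz = q
    return (pw*qw - px*qx - py*qy - pz*qz,
            pw*qx + px*qw + py*qz - pz*qy,
            pw*qy - px*qz + py*qw + pz*qx,
            pw*qz + px*qy - py*qx + pz*qw)

units = ((0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1))   # i, j, k
F = (sp.S(0), F1, F2, F3)
nablaF = (0, 0, 0, 0)
for u, a in zip(units, (x, y, z)):
    dF = tuple(sp.diff(c, a) for c in F)
    nablaF = tuple(s + t for s, t in zip(nablaF, qmul(u, dF)))

div  = sp.diff(F1, x) + sp.diff(F2, y) + sp.diff(F3, z)
curl = (sp.diff(F3, y) - sp.diff(F2, z),
        sp.diff(F1, z) - sp.diff(F3, x),
        sp.diff(F2, x) - sp.diff(F1, y))
assert sp.simplify(nablaF[0] + div) == 0                      # scalar part
assert all(sp.simplify(n - c) == 0
           for n, c in zip(nablaF[1:], curl))                 # vector part
</pre>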
<p>
Peter Guthrie Tait purchased and read Hamilton's <i>Lectures</i> in
1853, out of general curiosity.<sup><a href="#sec-qsuff-fn-k">k</a> <a href="#sec-qsuff-en-21">21</a></sup> In 1857 he encountered an
application that reminded him of Hamilton's ᐊ operator;
pursuing this, he shortly became a devoted quaternionist,
ultimately succeeding Hamilton as their chief advocate after the
latter's death.
</p>
<p>
Tait's interest in quaternions was for their physical
applications. His <i>Elementary Treatise on Quaternions</i>, published
in 1867, was the first accessible introduction to quaternions.
This work went into some detail on the operator ∇,<sup><a href="#sec-qsuff-fn-l">l</a></sup> using it to
express several important theorems (e.g. Green's and Stokes').
</p>
<p>
Tait did much to further quaternionic applications to
physics. Oddly, he seems to have scrupulously avoided quaternions
in his other work, including all of his lectures at the
University of Edinburgh. Quaternions are also omitted from
Tait's collaboration in mechanics with Lord Kelvin, the <i>Treatise
on Natural Philosophy</i>; speaking of this later, Kelvin said,
<blockquote>
We [Kelvin and Tait] have had a thirty-eight years' war
over quaternions.... Times without number I offered
to let quaternions into Thomson and Tait [the
<i>Treatise</i>], if he could only show that in any case our
work would be helped by their use. You will observe
that from beginning to end they were never introduced.
</blockquote>
It should be understood that Kelvin was throughout his life
resolutely opposed to all vectorial methods.
</p>
<p>
James Clerk Maxwell originally derived his equations in the
1860's using component notation.<sup><a href="#sec-qsuff-en-22">22</a></sup> He began to study quaternions
in 1870. In his <i>Treatise on Electricity and Magnetism</i> of 1873,
he presented both component and quaternionic notation.
</p>
<p>
Maxwell was a firm believer in physical analogy. He favored
quaternions as an aid to thinking, because the notation corresponds
more closely than does that of components to physical
reality. For calculation, however, he considered component
notation superior. He made this distinction in the preliminary
chapter of the <i>Treatise</i>, where he advocated "the introduction of
the ideas, as distinguished from the operations and methods of
Quaternions."
</p>
<p>
Maxwell's use of quaternions in the <i>Treatise</i> was accordingly
limited primarily to the restatement of important results in
quaternionic form. Nonetheless it led some physicists who had
never done so before to study quaternions.
</p>
<p>
In particular, this was the case with Josiah Willard Gibbs
of America and Oliver Heaviside of England.<sup><a href="#sec-qsuff-en-23">23</a></sup> These two men
proceeded independently along very similar lines.
</p>
<p>
From Maxwell's <i>Treatise</i>, both went to Tait's <i>Elementary
Treatise on Quaternions</i>. Both observed that, as actually used by
Maxwell and for the most part even by Tait, the vector/scalar
partition of quaternions was more important than the quaternions
themselves. Both then developed systems that treated vectors and
scalars as entirely separate entities, <i>V</i> ∇ and <i>S</i> ∇ as separate
operators,<sup><a href="#sec-qsuff-fn-m">m</a></sup> etc.
</p>
<p>
Heaviside went no further than this. His notation was not
entirely compatible with Tait's, but he never introduced concepts
outside the quaternion tradition. Thus his system was essentially
a subset of quaternion analysis.
</p>
<p>
Gibbs however, broke all ties with Hamilton, even to citing
Grassman as the main precedent for his system.<sup><a href="#sec-qsuff-fn-n">n</a></sup> His notation is
substantially different from Tait's; in particular he replaced
the prefix operators of Hamilton with infix operators.<sup><a href="#sec-qsuff-fn-o">o</a></sup> (∇ was
an exception to this.) Most significantly, he introduced a
concept totally alien to quaternion analysis — that of the dyad.
(A dyad is neither a vector nor a quaternion, but a tensor.)
</p>
<p>
Neither Gibbs nor Heaviside shared Tait's scruples about
using their systems in their other work. Gibbs applied his
vector analysis in periodic courses at Yale starting in 1879, and
in some of his physics papers. It was Heaviside who did most to
disseminate vector analysis; he made heavy use of his system in
several important electrical publications, such as his
<i>Electromagnetic Theory</i>, permanently linking vectors to that
rapidly growing field.
</p>
<p>
A debate took place in the early 1890's, on the proper
vectorial system for mathematical physics.<sup><a href="#sec-qsuff-fn-p">p</a> <a href="#sec-qsuff-en-25">25</a></sup> This debate
involved more than thirty letters and articles over five years
(1890–1894) in eight leading scientific journals, as well as a
scattering of other published writings. It was primarily between
Gibbs and Heaviside on one side and the English quaternionists on
the other.
</p>
<p>
Superficially, a prominent characteristic of the debate was
its colorful verbiage. Heaviside supplied it for the vectorists,
while considered vectorist ideology came primarily from the more
dignified Professor Gibbs. Metaphors and
the odd pot shot are scattered through the quaternionists'
writings, which are often pervaded by a tone of bitterness.
Particularly fiery are the literary antics of Alexander McAulay,
an unknown youngster who joined the ranks of the quaternionists
in 1893 <sup><a href="#sec-qsuff-en-26">26</a>,<a href="#sec-qsuff-en-26j">26j</a></sup> with what Tait called "the perfervid outburst of
an enthusiast."
</p>
<p>
The figures of Grassman and Hamilton became weapons in the
debate. The quaternionists played heavily on Hamilton's fame.
The vectorists dissociated themselves from Hamilton entirely, and
placed themselves firmly behind Grassman,<sup><a href="#sec-qsuff-fn-q">q</a></sup> for whom they built a
reputation. As a result of this and the ultimate triumph of
vector analysis, Hamilton's fame was tarnished. (It was later
resurrected through Hamilton's characteristic function in quantum
mechanics.)
</p>
<p>
Gibbs was conversant with the systems of Hamilton and
Grassman as well as his own. This served him well on several
occasions in the debate, for Tait was ill acquainted not only
with Grassman's <i>Ausdehnungslehre</i> but with both Gibbs' and
Heaviside's vector analyses.<sup><a href="#sec-qsuff-fn-r">r</a></sup>
</p>
<p>
The question recurs throughout the debate, of why quaternions
had not been more widely accepted.<sup><a href="#sec-qsuff-fn-s">s</a></sup>
This was generally not itself
an issue (an exception is Gibbs' letter to <i>Nature</i> of 16 March
1893 <sup><a href="#sec-qsuff-en-26g">26g</a></sup>), but served as a focal point for other issues.
</p>
<p>
The opening shots of the debate were fired by Tait,<sup><a href="#sec-qsuff-en-26a">26a</a></sup> and
were aimed principally not at vector analysis but at component
notation. His arguments centered on expressiveness; no detail
need be given, as the principal interest of the current discussion
is with issues <i>between</i> vectorial systems.<sup><a href="#sec-qsuff-fn-t">t</a></sup> However, it is
significant that Tait apparently failed to appreciate the coming
threat of vector analysis. He seems to have repeatedly
underestimated his opponents for several years into the debate.
</p>
<p>
What touched off the controversy was the following passage
from the preface to the 1890 (third) edition of Tait's <i>Treatise</i>.
<blockquote>
Even Prof. Willard Gibbs must be ranked as one of the
retarders of Quaternion progress, in virtue of his
pamphlet on <i>Vector Analysis</i>; a sort of hermaphrodite
monster, compounded of the notations of Hamilton and
Grassman.
</blockquote>
</p>
<p>
The issue of notations, which Gibbs early subordinated to
that of notions,<sup><a href="#sec-qsuff-fn-u">u</a> <a href="#sec-qsuff-en-26b">26b</a></sup>
nevertheless was addressed repeatedly.
Gibbs disliked the prefix operators of quaternions, because infix
operators were the existing norm.<sup><a href="#sec-qsuff-en-26b">26b</a></sup> Tait countered that the
prefix operators allowed the use of fewer parentheses, thus
enhancing brevity of expression.<sup><a href="#sec-qsuff-en-26c">26c</a></sup>
</p>
<p>
C. G. Knott objected to the large number of operators in
Gibbs' notation<sup><a href="#sec-qsuff-en-26e">26e</a></sup>, illustrating his point with Gibbs'
abbreviations 'Pot,' 'New,' 'Lap,' and 'Max,' all of which represent
various combinations of the Nabla operator. Gibbs argued
convincingly<sup><a href="#sec-qsuff-en-26i">26i</a></sup> that the quaternionic equivalents of these
operators were too complicated to be intelligible.
</p>
<p>
Alexander Macfarlane, like Knott a former student of Tait,
brought out another issue in 1891:<sup><a href="#sec-qsuff-en-26d">26d</a></sup> the sign of the scalar
product. In quaternion analysis, <i>Suv</i> for (positive) vectors
<i>u</i>,<i>v</i> is negative because <i>i</i><sup>2</sup> = <i>j</i><sup>2</sup> = <i>k</i><sup>2</sup> = −1 but the vectorists
had no stake in √−1, so for convenience they defined the scalar
product to be positive.
</p>
<p>
Macfarlane had his own solution to this. He distinguished
versors from vectors. Since two right turns produce a reversal
(−1), he set <i>i</i><sup>2</sup> = <i>j</i><sup>2</sup> = <i>k</i><sup>2</sup> = −1 for versors; but for convenience
he set <i>i</i><sup>2</sup> = <i>j</i><sup>2</sup> = <i>k</i><sup>2</sup> = 1 for vectors. Thereafter in the debate
he represented a third faction. The vectorists never specifically
addressed his system, so that his net influence was simply
to undermine the quaternionists.
</p>
<p>
The vectorists never addressed the issue of the sign of the
scalar product; in any case, there was no need for them to do so,
since Macfarlane did it for them. The response to Macfarlane's
innovation came from Knott. In December of 1892, Knott objected<sup><a href="#sec-qsuff-en-26e">26e</a></sup>
that without <i>i</i><sup>2</sup> = <i>j</i><sup>2</sup> = <i>k</i><sup>2</sup> = −1, multiplication ceases to be
associative. Macfarlane answered in May of 1893 <sup><a href="#sec-qsuff-en-26h">26h</a></sup> that, just
as with commutativity, associativity is only a convention and can
be given up with impunity if it is convenient to do so.
</p>
<p>
The crux of the controversy was the issue of notions.
Everyone in the debate (except Cayley, as noted earlier) agreed
that vectors and scalars are important for physics. The vectorists
maintained that the quaternion has no notable physical
interpretation (other than rotation, Gibbs acknowledged, but
dyadics serve this purpose satisfactorily<sup><a href="#sec-qsuff-en-26b">26b</a></sup>). The burden of
proof throughout the debate thus lay on the quaternionists,
although Tait did once make the same accusation of artificiality
against dyadics.<sup><a href="#sec-qsuff-en-26c">26c</a></sup>
</p>
<p>
The quaternionists did little during the debate to prove
their case. Knott did address the question twice. In 1892 he
argued simply that the ratio of two vectors is a fundamental
concept, so that since this ratio is a quaternion, quaternions
are fundamental.<sup><a href="#sec-qsuff-en-26e">26e</a></sup> In 1893 he added to this an analogy:
<blockquote>
Although sin <i>θ</i> and cos <i>θ</i> occur more frequently than
<i>θ</i> itself, we should not conclude that <i>θ</i> plays no
fundamental role. Similarly we should not infer that <i>αβ</i>
[the full quaternion product] is not fundamental simply
because <i>Vαβ</i> and <i>Sαβ</i> occur more frequently.<sup><a href="#sec-qsuff-en-31">31</a></sup>
</blockquote>
</p>
<p>
There is another issue which recurs periodically throughout
the debate. It is generalizability to higher numbers of dimensions.
From his first article in 1890, Tait praised quaternions
for being "uniquely adapted to Euclidean [i.e. three-dimensional]
space."<sup><a href="#sec-qsuff-en-26a">26a</a></sup> Gibbs in <i>his</i>
first letter of 1891 praised vectors for
being generalizable "to space of four or more dimensions."<sup><a href="#sec-qsuff-en-26b">26b</a></sup>
These views are representative of the positions taken on this
issue by the respective sides in the debate. There is one
exception: in February of 1893, Dr. William Peddie, Tait's
assistant at the University of Edinburgh, attempted to show that
"quaternions <i>are</i> applicable to space of four or any number of
dimensions."<sup><a href="#sec-qsuff-en-26f">26f</a></sup>
</p>
<p>
If there was a winning side to the debate, it was the
vectorists. They consistently outmaneuvered (Gibbs) and outspoke
(Heaviside) their opponents. More fundamentally, the
quaternionists did little to justify their position on the crucial
issue of notions. In any event, the decisive factor in the
ultimate acceptance of vector analysis was Heaviside's active use
of it in his published work.<sup><a href="#sec-qsuff-fn-v">v</a></sup>
</p>
<p>
Passing mention may be made of the International Society for
Promoting the Study of Quaternions and Allied Systems of
Mathematics.<sup><a href="#sec-qsuff-en-32">32</a></sup> It was organized shortly after the debate by Shunkichi
Kimura, residing at Yale at the time, and Pieter Molenbroek, and
published a bulletin from 1900 to 1913. The Society was plagued
with difficulties from its inception, the first of which was that
Tait declined the presidency due to ill health. (He died in
1901.) In 1913, all the offices were due for election at the
same time, with no one to arrange it, and the Society slipped
quietly into oblivion.
</p>
<p>
There was a tendency among quaternionists, which surfaced
on several occasions in the debate, to think of the vectorists as
ungrateful children. This is presumably the source of the bitter
overtone in their writings that was mentioned earlier. It may be
interesting in relation to this to consider the following excerpt
from a review by Heaviside written in the early 1900's.
<blockquote>
... as time went on, [after the controversy] ... it
was most gratifying to find that Prof. Tait softened
his harsh judgments, and came to recognize the
existence of rich fields of pure vector analysis, and
to tolerate the workers therein.... I appeased Tait
considerably (during a little correspondence we had) by
disclaiming any idea of discovering a new system.<sup><a href="#sec-qsuff-en-33">33</a></sup>
</blockquote>
</p>
<p>
Quaternions will now be considered in their application to
twentieth-century physics. It is appropriate in this regard to
recount the early history of the idea of using the real part of a
quaternion to represent time. Priority in this idea belongs to
Hamilton.
</p>
<p>
In the hierarchy of thought, Hamilton placed mathematics
above physics, and metaphysics above mathematics.<sup><a href="#sec-qsuff-en-34">34</a></sup> An important
element of his philosophy was the metaphysical importance of the
number three.<sup><a href="#sec-qsuff-en-35">35</a></sup> He attached much significance to the
tridimensionality of space, and this was a major impetus for his search
for algebraic triplets.
</p>
<p>
After arriving at quaternions by purely algebraic means,
Hamilton struggled to reconcile them with his philosophy. At
least since 1835 he had used time as a metaphysical justification
for the real numbers; this may have contributed to his later
identification of quaternions with the four dimensions of space
and time.
</p>
<p>
He clearly failed to pass on the idea to the physicist Tait.
Recall that in the 1890's controversy the quaternionists represented
quaternions as "uniquely adapted" to three-dimensional
space. Tait did speculate on the possibility of a fourth dimension,
as did Maxwell.<sup><a href="#sec-qsuff-en-37">37</a></sup> However, it is not clear that either of
them associated a fourth dimension with time.
</p>
<p>
In 1896, Kimura pointed out that ∇ is not a full quaternion
operator because it does not include the derivative with respect
to the real component.<sup><a href="#sec-qsuff-en-38">38</a></sup> He introduced a full quaternion
differential operator <sub><i>q</i></sub>∇, and used the scalar component to
represent the derivative with respect to time. He had physical
applications in mind.
</p>
<p>
The possibility of using quaternions occurred to Hermann
Minkowski when he was formulating his space-time in the 1890's
(naturally enough, since this was during the vector-quaternion
controversy). He rejected them completely as "too narrow and
clumsy for the purpose."<sup><a href="#sec-qsuff-en-39">39</a></sup>
</p>
<p>
To understand what might have motivated Minkowski to make
this judgment, consider some basic four-dimensional properties
of quaternions. Cayley showed in the 1850's that the general
rotation in Euclidean four-space may be expressed by the quaternion
formula <i>p'</i> = <i>upv</i>, where <i>u</i>,<i>v</i> are arbitrary versors.<sup><a href="#sec-qsuff-fn-w">w</a> <a href="#sec-qsuff-en-40">40</a></sup> Also
in Euclidean four-space, multiplication by an arbitrary versor
<i>u</i> = cos <i>θ</i> + <i>v</i> sin <i>θ</i> may be understood as a relative turning by
angle <i>θ</i> through 4-space.
</p>
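<p>
A quick numerical sketch of Cayley's result, assuming numpy: since
the norm is a modulus of multiplication and <i>u</i>,<i>v</i> are
versors, <i>p'</i> = <i>upv</i> preserves the Euclidean 4-norm, so it
acts as a rotation of 4-space. The helpers are mine.
</p>
<pre>
import numpy as np

def qmul(p, q):
    pw, px, py, pz = p
    qw, qx, qy, qz = q
    return np.array([pw*qw - px*qx - py*qy - pz*qz,
                     pw*qx + px*qw + py*qz - pz*qy,
                     pw*qy - px*qz + py*qw + pz*qx,
                     pw*qz + px*qy - py*qx + pz*qw])

rng = np.random.default_rng(3)
unit = lambda q: q / np.sqrt(np.dot(q, q))
u, v = unit(rng.normal(size=4)), unit(rng.normal(size=4))   # versors
p = rng.normal(size=4)
pp = qmul(qmul(u, p), v)
assert np.isclose(np.dot(pp, pp), np.dot(p, p))   # 4-norm preserved
</pre>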
<p>
Unfortunately, Minkowski space-time is non-Euclidean. While
quaternions have 'length' (modulus of multiplication)
<i>Tq</i> = (<i>w</i><sup>2</sup> + <i>x</i><sup>2</sup> + <i>y</i><sup>2</sup> + <i>z</i><sup>2</sup>)<sup><sup>1</sup>⁄<sub>2</sub></sup>,
interpreted as a space-time vector <i>q</i>
should have length (<i>w</i><sup>2</sup> − <i>x</i><sup>2</sup> − <i>y</i><sup>2</sup> − <i>z</i><sup>2</sup>)<sup><sup>1</sup>⁄<sub>2</sub></sup>.<sup><a href="#sec-qsuff-en-41">41</a></sup>
</p>
<p>
A solution to this problem was found in 1912 by Ludwig
Silberstein.<sup><a href="#sec-qsuff-en-42">42</a></sup> He let the scalar component, representing time,
be imaginary by introducing a fourth √−1 independent of the three
of quaternions. He was thus working with what Hamilton called
biquaternions, quaternions whose four coefficients are complex
numbers. Biquaternions do not, of course, have unique division,
although Silberstein did define a biquaternion "inverse."<sup><a href="#sec-qsuff-fn-x">x</a></sup>
</p>
<p>
By this device, Silberstein was able to express the Lorentz
transformation in the form <i>q'</i> = <i>QqQ</i>, where <i>q</i> is in frame <i>S</i> and
<i>q'</i> is its equivalent in frame <i>S'</i>. <i>Q</i> is a complex versor (i.e.
<i>TQ</i> = 1); further, the coefficient of <i>SQ</i> is real, and the three
coefficients of <i>VQ</i> are imaginary. Thus
<i>Q</i> = cos <i>θ</i> + <i>v</i> sin <i>θ</i> for
imaginary unit vector <i>v</i> and angle <i>θ</i>. The resulting transformation
is a rotation in Minkowski space-time by angle 2<i>θ</i> √−1.
Compare this with the general Euclidean three-space rotation
mentioned earlier.
</p>
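<p>
As a check that this machinery really does produce Lorentz boosts,
here is a numerical sketch under one reading of Silberstein's
conventions (the details, such as taking the rapidity to be 2<i>θ</i>
and boosting along <i>x</i>, are my assumptions, not taken verbatim
from his paper), assuming numpy; complex coefficients supply the
fourth √−1, which commutes with everything.
</p>
<pre>
import numpy as np

def qmul(p, q):
    pw, px, py, pz = p
    qw, qx, qy, qz = q
    return np.array([pw*qw - px*qx - py*qy - pz*qz,
                     pw*qx + px*qw + py*qz - pz*qy,
                     pw*qy - px*qz + py*qw + pz*qx,
                     pw*qz + px*qy - py*qx + pz*qw])

phi = 0.7                          # rapidity of the boost (phi = 2*theta)
t, x, y, z = 1.0, 2.0, 3.0, 4.0    # an arbitrary event
Q = np.array([np.cosh(phi/2), 1j*np.sinh(phi/2), 0, 0])  # SQ real, VQ imaginary, TQ = 1
q = np.array([1j*t, x, y, z])      # imaginary time component

qp = qmul(qmul(Q, q), Q)           # q' = Q q Q

tp = t*np.cosh(phi) - x*np.sinh(phi)   # the standard boost along x
xp = x*np.cosh(phi) - t*np.sinh(phi)
assert np.allclose(qp, [1j*tp, xp, y, z])
</pre>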
<p>
A different solution to the problem was given in a 1945
paper by P. A. M. Dirac.<sup><a href="#sec-qsuff-en-43">43</a></sup> Dirac submitted that the value of
quaternions lies in their special algebraic properties, and that
therefore resorting to biquaternions is not productive.<sup><a href="#sec-qsuff-fn-y">y</a></sup>
Restricting himself to real quaternions, he derived the general
quaternionic linear transformation <i>q'</i> = (<i>aq</i> + <i>b</i>)(<i>cq</i> + <i>d</i>)<sup>−1</sup>, with
quaternion coefficients <i>a</i>,<i>b</i>,<i>c</i>,<i>d</i>.<sup><a href="#sec-qsuff-fn-z">z</a></sup>
</p>
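<p>
Here is a numerical sketch of Dirac's transformation, assuming numpy;
the qinv and homography helpers are mine, and the closing check, that
these maps compose like 2 × 2 matrices with quaternion entries, is my
own illustration rather than anything in Dirac's paper.
</p>
<pre>
import numpy as np

def qmul(p, q):
    pw, px, py, pz = p
    qw, qx, qy, qz = q
    return np.array([pw*qw - px*qx - py*qy - pz*qz,
                     pw*qx + px*qw + py*qz - pz*qy,
                     pw*qy - px*qz + py*qw + pz*qx,
                     pw*qz + px*qy - py*qx + pz*qw])

def qinv(q):
    return q * np.array([1., -1, -1, -1]) / np.dot(q, q)   # Kq / Nq

def homography(a, b, c, d, q):
    return qmul(qmul(a, q) + b, qinv(qmul(c, q) + d))

rng = np.random.default_rng(4)
a1, b1, c1, d1, a2, b2, c2, d2, q = (rng.normal(size=4) for _ in range(9))

# Composition corresponds to a (non-commutative) matrix product of the
# coefficient quadruples; order of the quaternion factors matters.
lhs = homography(a1, b1, c1, d1, homography(a2, b2, c2, d2, q))
rhs = homography(qmul(a1, a2) + qmul(b1, c2), qmul(a1, b2) + qmul(b1, d2),
                 qmul(c1, a2) + qmul(d1, c2), qmul(c1, b2) + qmul(d1, d2), q)
assert np.allclose(lhs, rhs)
</pre>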
<p>
He used this equation to describe a transformation in
five-dimensional projective space, and restricted it to describe the
Lorentz group. He then derived a one-to-one correspondence
between the quaternions <i>q</i> and <i>q'</i> in his transformation and
space-time vectors, through a comparatively involved set of equations.
Finally, he used his quaternionic transformation to derive a
general quaternionic formula for the relativistic addition of
velocities. The elegance of this formula provoked the only
non-mathematical comment in the paper, "The quaternion formulation
appears to be the most suitable one for expressing generally the
law of addition of velocities."
</p>
<p>
Dirac's treatment is contrary to the traditional usage of
quaternions. Ever since their discovery, much of their perceived
value has been in the physical interpretability of quaternionic
formulae. This perception is evident in the following, written
to Hamilton by John T. Graves.
<blockquote>
There is still something in the system which gravels
me. I have not yet any clear views as to the extent to
which we are at liberty arbitrarily to create imaginaries,
and endow them with supernatural properties.
You are certainly justified by the event.... but I
am glad that you have glimpses on physical analogies.<sup><a href="#sec-qsuff-en-44">44</a></sup>
</blockquote>
Physical interpretability was the quaternionists' main argument
against component notation.
</p>
<p>
Dirac's approach was to set up a correspondence between
quaternions and space-time vectors; but the correspondence was
unintuitive. This abandonment of physical interpretation is
consistent with the general philosophy of much of modern
mathematical physics. However, it is not the way others have applied
quaternions to modern physics.
</p>
<p>
Papers applying quaternions to modern physics are not as
rare as one might suppose, numbering (as nearly as I can determine)
at least in the dozens.<sup><a href="#sec-qsuff-en-45">45</a></sup> These papers deal with a wide
range of topics. It is not within the scope of the current paper
to examine all their areas of application;<sup><a href="#sec-qsuff-fn-aa">aa</a></sup> for example, the use
of quaternions to describe elementary particles will be omitted.<sup><a href="#sec-qsuff-en-46">46</a></sup>
For modern quaternionic ideology, two representative examples
will be used.
</p>
<p>
The first example is a 1964 paper, "Quaternions in Relativity,"
by Peter Rastall.<sup><a href="#sec-qsuff-en-47">47</a></sup> The paper begins with a brief
historical account of quaternionic application to relativity,
along with commentary on why quaternions have been (up to 1964)
repeatedly passed by in favor of other formalisms.<sup><a href="#sec-qsuff-fn-bb">bb</a></sup> His own
interest in quaternions is for field equations in curved
(Riemannian) space-time. He argues that for this general case,
neither matrix nor spinor notation yields any clear physical
interpretation because neither can easily be understood in terms
of tetrads of real coordinates (<i>x</i>,<i>y</i>,<i>z</i>,<i>t</i>).
</p>
<p>
Quaternions, by which he means complex quaternions, or biquaternions,
are to provide this physical interpretability. He uses
Silberstein's form for Lorentz transformations in flat
space-time. (Recall that Silberstein's quaternions have only four
non-zero real coefficients, corresponding directly to coordinates
<i>x</i>,<i>y</i>,<i>z</i>,<i>t</i>.) He describes Riemannian space-time using tetrad
formalism, i.e. with each tetrad of event coordinates he
associates a set of four axes, which need not be orthogonal. By
combining these tools he then derives his general field
equation.
</p>
<p>
It is important to such quaternionic treatments of
relativity that a quaternionic equivalent is possible for any
matrix formula, and vice versa.<sup><a href="#sec-qsuff-fn-cc">cc</a> <a href="#sec-qsuff-en-48">48</a></sup>
The most basic quaternion-matrix
equivalence, demonstrated by C. S. Peirce in 1881, is
between a real quaternion <i>w</i> + <i>xα</i> + <i>yβ</i> + <i>zγ</i>
(imaginaries <i>α</i>, <i>β</i>, <i>γ</i>)
and a 2 × 2 complex matrix
<blockquote>
<table>
<tr>
<!-- start first matrix -->
<td>
<table cellpadding=0 cellspacing=0px style="border-left:1px solid #000; border-right:1px solid #000; color:#000">
<tr>
<td style ="border-top:1px solid #000; border-bottom:1px solid #000;"> </td>
<td>
<table border=0 cellpadding=0 cellspacing=0 style="color:#000;">
<tr>
<td align="right"><i>w</i>+<i>ix</i></td>
<td> </td>
<td align="right"><i>y</i>+<i>iz</i></td>
</tr>
<tr>
<td align="right">−<i>y</i>+<i>iz</i></td>
<td> </td>
<td align="right"><i>w</i>−<i>ix</i></td>
</tr>
</table>
</td>
<td style ="border-top:1px solid #000; border-bottom:1px solid #000;"> </td>
</tr>
</table>
</td>
<!-- end first matrix -->
<td align="center" valign="center"> = </td>
<td align="center" valign="center"><i>w</i> </td>
<!-- start second matrix -->
<td>
<table cellpadding=0 cellspacing=0px style="border-left:1px solid #000; border-right:1px solid #000; color:#000">
<tr>
<td style ="border-top:1px solid #000; border-bottom:1px solid #000;"> </td>
<td>
<table border=0 cellpadding=0 cellspacing=0 style="color:#000;">
<tr>
<td align="right">1</td>
<td> </td>
<td align="right">0</td>
</tr>
<tr>
<td align="right">0</td>
<td> </td>
<td align="right">1</td>
</tr>
</table>
</td>
<td style ="border-top:1px solid #000; border-bottom:1px solid #000;"> </td>
</tr>
</table>
</td>
<!-- end second matrix -->
<td align="center" valign="center"> + </td>
<td align="center" valign="center"><i>x</i> </td>
<!-- start third matrix -->
<td>
<table cellpadding=0 cellspacing=0px style="border-left:1px solid #000; border-right:1px solid #000; color:#000">
<tr>
<td style ="border-top:1px solid #000; border-bottom:1px solid #000;"> </td>
<td>
<table border=0 cellpadding=0 cellspacing=0 style="color:#000;">
<tr>
<td align="right"><i>i</i></td>
<td> </td>
<td align="right">0</td>
</tr>
<tr>
<td align="right">0</td>
<td> </td>
<td align="right">−<i>i</i></td>
</tr>
</table>
</td>
<td style ="border-top:1px solid #000; border-bottom:1px solid #000;"> </td>
</tr>
</table>
</td>
<!-- end third matrix -->
<td align="center" valign="center"> + </td>
<td align="center" valign="center"><i>y</i> </td>
<!-- start fourth matrix -->
<td>
<table cellpadding=0 cellspacing=0px style="border-left:1px solid #000; border-right:1px solid #000; color:#000">
<tr>
<td style ="border-top:1px solid #000; border-bottom:1px solid #000;"> </td>
<td>
<table border=0 cellpadding=0 cellspacing=0 style="color:#000;">
<tr>
<td align="right">0</td>
<td> </td>
<td align="right">1</td>
</tr>
<tr>
<td align="right">−1</td>
<td> </td>
<td align="right">0</td>
</tr>
</table>
</td>
<td style ="border-top:1px solid #000; border-bottom:1px solid #000;"> </td>
</tr>
</table>
</td>
<!-- end fourth matrix -->
<td align="center" valign="center"> + </td>
<td align="center" valign="center"><i>z</i> </td>
<!-- start fifth matrix -->
<td>
<table cellpadding=0 cellspacing=0px style="border-left:1px solid #000; border-right:1px solid #000; color:#000">
<tr>
<td style ="border-top:1px solid #000; border-bottom:1px solid #000;"> </td>
<td>
<table border=0 cellpadding=0 cellspacing=0 style="color:#000;">
<tr>
<td align="right">0</td>
<td> </td>
<td align="right"><i>i</i></td>
</tr>
<tr>
<td align="right"><i>i</i></td>
<td> </td>
<td align="right">0</td>
</tr>
</table>
</td>
<td style ="border-top:1px solid #000; border-bottom:1px solid #000;"> </td>
</tr>
</table>
</td>
<!-- end fifth matrix -->
</tr>
</table>
</blockquote>
The four matrices equivalent to 1, <i>α</i>, <i>β</i>, <i>γ</i> are essentially the
Pauli spin matrices.
</p>
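<p>
A numerical sketch of Peirce's equivalence, assuming numpy: under the
mapping displayed above, quaternion multiplication becomes 2 × 2
complex matrix multiplication. The helper names are mine.
</p>
<pre>
import numpy as np

def qmul(p, q):
    pw, px, py, pz = p
    qw, qx, qy, qz = q
    return np.array([pw*qw - px*qx - py*qy - pz*qz,
                     pw*qx + px*qw + py*qz - pz*qy,
                     pw*qy - px*qz + py*qw + pz*qx,
                     pw*qz + px*qy - py*qx + pz*qw])

def to_matrix(q):
    w, x, y, z = q
    return np.array([[ w + 1j*x, y + 1j*z],
                     [-y + 1j*z, w - 1j*x]])

rng = np.random.default_rng(1)
p, q = rng.normal(size=4), rng.normal(size=4)
assert np.allclose(to_matrix(qmul(p, q)), to_matrix(p) @ to_matrix(q))
</pre>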
<p>
The second example consists of two papers written in 1962–63
by a group of physicists on quaternion quantum mechanics.<sup><a href="#sec-qsuff-fn-dd">dd</a></sup> These
papers take advantage (respectively) of two basic properties of
quaternions: their close ties to 3-/4- vector spaces, and their
lack of commutativity.
</p>
<p>
The first paper presents the fundamentals of the theory.<sup><a href="#sec-qsuff-en-49">49</a></sup>
Their starting point is that
<blockquote>
a propositional calculus exists that we can call
general quantum mechanics (as distinguished from
complex quantum mechanics) in as much as no number
system or vector space at all is assumed in its
formulation....
<br><br>
It is always possible to represent the pure states
of a system of "general quantum mechanics" by rays in
a vector space in a one-to-one manner...
</blockquote>
and that the <i>only</i> number system over which this can be done for
all such systems is 𝓠, the quaternions. They suggest that while
real and complex quantum mechanics are very similar, "quaternion
quantum mechanics has many new features that make it much
richer."
</p>
<p>
The second paper capitalizes on a difficulty that arises
from the non-commutativity of quaternions.<sup><a href="#sec-qsuff-fn-ee">ee</a> <a href="#sec-qsuff-en-50">50</a></sup> Multiple mathematical
descriptions arise that should be equivalent. By postulating
invariance of the physical laws over these different
descriptions, they arrive at a new field of which electromagnetism
is a special case.
</p>
<p>
Perhaps the most extensive example of quaternionic application
in the twentieth century (if not for all time) is the work of
Otto Fischer. Fischer was a Swedish civil engineer who became
interested in quaternions in the years before World War II. In
the 1950's he published two substantial books on the application
of quaternions.<sup><a href="#sec-qsuff-en-51">51</a></sup>
</p>
<p>
In <i>Universal Mechanics and Hamilton's Quaternions</i> (I have
not had access to his other book), Fischer set himself a rather
ambitious goal.
<blockquote>
This is a book written by a civil engineer on universal
mechanics with an attempt to introduce a certain order
in its mathematical structure by means of Hamilton's
Quaternions. The term "universal mechanics" refers to
the mathematics of ordinary physics of motions,
elasticity, hydrodynamics, aerodynamics, electromagnetism,
together with relativistic and cosmic physics as well
as quantum mechanics.
</blockquote>
Fischer's aim is to create a close correspondence between
concepts and explicit mathematical structure. In pursuing this
goal, he correlates several types of mathematical structural
hierarchies to branching specialities within universal mechanics.
One such hierarchy is the "potential pyramid," which expands by
repeated differentiation from an 'apex' potential. Closely
related is an "affinor pyramid",<sup><a href="#sec-qsuff-fn-ff">ff</a></sup>
also formed by repeated
differentiation. The potential pyramid is static, consisting of
functions of simple, quadric or double quadric quaternions,<sup><a href="#sec-qsuff-fn-gg">gg</a></sup>
while an affinor pyramid consists of operators on such functions,
and is therefore dynamic, taking its shape from the function to
which it is applied.
</p>
<p>
The three different types of quaternions are the basis of
Fischer's other major unifying structure. After introducing and
doing much work with real quaternions in imaginaries <i>i</i><sub>1</sub>, <i>i</i><sub>2</sub>, <i>i</i><sub>3</sub>,
he proceeds to quadric quaternions by introducing "superdirections"
<i>ï</i>, <i>j</i><sub>1</sub>, <i>j</i><sub>2</sub>, <i>j</i><sub>3</sub> that correspond roughly to different
specialities in physics. Ultimately he applies this technique in
the more general case of double quadric quaternions to his goal
of unifying universal mechanics.<sup><a href="#sec-qsuff-fn-hh">hh</a></sup>
</p>
<p>
Fischer appears to have been widely competent in the
specialities of "universal mechanics." He surely did not,
however, have a talent for presentation. The superficial appearance
of the <i>Universal Mechanics</i> is that of the numerology of the
'crackpot fringe.'
</p>
<p>
Fischer was part of what has often been called the "Cult of
Quaternions"<sup><a href="#sec-qsuff-en-52">52</a></sup> — a tradition of enthusiastic devotees that began
with Hamilton and continues to the present day.<sup><a href="#sec-qsuff-fn-ii">ii</a></sup> In 1910, C. S.
Peirce wrote of his brother James Mills Peirce, who had been a
noted quaternionist until his death in 1906, that he "remained to
his dying day a superstitious worshipper of two hostile gods,
Hamilton and the scalar √−1." An earlier reference to the
'cult' is found in the chapter on vector analysis in Heaviside's
<i>Electromagnetic Theory</i>.
<blockquote>
"Quaternion" was, I think, defined by an American
schoolgirl to be "an ancient religious ceremony." This
was, however, a complete mistake. The ancients —
unlike Prof. Tait — knew not, and did not worship
Quaternions.<sup><a href="#sec-qsuff-en-53">53</a></sup>
</blockquote>
</p>
<p>
The members of the 'cult' who have been mentioned in this
paper all exhibited rational reasons for their enthusiasm. Yet,
their reputation as a semi-religious community is not entirely
unsupported in their own writings. A comparatively recent
example is E. T. Whittaker's statement (1940) on quaternionic
research after the turn of the century, that "the good work went
on."<sup><a href="#sec-qsuff-fn-jj">jj</a></sup>
In 1871 Maxwell had observed, with no sarcasm intended,
that "The unbelievers are rampant."<sup><a href="#sec-qsuff-en-54">54</a></sup>
</p>
<p>
Throughout this paper, the motivating philosophies of
applied quaternionists have been noted. Definite trends are
visible in these philosophies.
</p>
<p>
From their discovery, the claims of quaternions for physical
application have been interpretability and utility. Their
interpretation is certainly an improvement over component
notation; nevertheless, as F. D. Murnaghan has observed, to the
general public quaternions were the archetype of a baffling
abstract theory until they were supplanted in this role by
Einstein's General Theory of Relativity.<sup><a href="#sec-qsuff-en-56">56</a></sup> The quaternionists
simply continued to assert that quaternions <i>are</i> meaningful, and
as a tactic against component notation this was sufficient.
</p>
<p>
Against vector analysis, however, the tactic failed. Having
extracted the utilitarian part of quaternion analysis, the
vectorists discarded the remainder and branded it meaningless.
The result was that the quaternionists were left clinging
forlornly to their claims of interpretability while practical-minded
physicists flocked to vector analysis.
</p>
<p>
In the twentieth century, this parting of the ways has led
to a peculiar twist of fate. Following their vectorist and
anti-quaternionist traditions, physicists have adopted the
utilitarian notation of matrices. This has led them away from
interpretability, and ultimately even away from utility into increasing
abstraction. Quaternionists have continued to emphasize
interpretability, finally coming to be its uncontested claimants;
but, being an acknowledged fringe group, have had no general
acceptance to date.
</p>
<p>
Rastall's paper of 1964 concludes with the following
paragraph. It illustrates clearly the underlying ideology of the
modern physical application of quaternions. It is also a fitting
note on which to end the current paper on quaternions.
<blockquote>
The movement towards abstract algebraic and
coordinate-independent formulations of physical theories,
and away from particular matrix representations and
special coordinate systems, is an increasingly popular
one, and our work is in accord with it. Less popular,
and seemingly opposed to this rarefied mathematical
spirit, is our desire to make abstract concepts more
concrete and imaginable. To pure mathematical minds
the aim is unsympathetic. They are happy in their
complex spaces, and would prefer to postulate an affine
connection rather than to align tetrad vectors. It is
a matter of taste. Those, however, who are prepared to
exploit the accident of having been born in space-time
may find this paper useful.
</blockquote>
</p>
<hr></hr>
<span style="font-size: large;" id="sec-qsuff-fn">Footnotes</span>
<ul>
<li><span id="sec-qsuff-fn-a">[a]</span>
This need increased greatly with the development of
electromagnetic theory in the last third of the century.<sup><a href="#sec-qsuff-en-6">6</a></sup>
</li>
<li><span id="sec-qsuff-fn-b">[b]</span>
The six are Wessel (1799) and Gauss (about the same time),
Argand and Buée (1806), and Mourey and Warren (1828).
</li>
<li><span id="sec-qsuff-fn-c">[c]</span>
Almost everyone looked for a superset of the complex
numbers. An exception was Servois, who came close to quaternions
in 1815.<sup><a href="#sec-qsuff-en-12">12</a></sup>
</li>
<li><span id="sec-qsuff-fn-d">[d]</span>
In modern terms, the Law of the Norms says that the norm
(the sum of the squares of the real coefficients of the terms)
should be a modulus of multiplication. A modulus of multiplication
is a function <i>M</i>(<i>x</i>) such that <i>M</i>(<i>a</i>) * <i>M</i>(<i>b</i>) = <i>M</i>(<i>a</i>*<i>b</i>). The
Law of the Norms holds for real and complex numbers.
</li>
<li><span id="sec-qsuff-fn-e">[e]</span>
In 1844 Augustus de Morgan presented a number of triple
algebras that do not satisfy the Law of the Norms but do have
moduli of multiplication. Given the distributive law, the Law of
the Norms and the uniqueness of division are equivalent, so
naturally de Morgan's triple algebras do not have unique division
either.
</li>
<li><span id="sec-qsuff-fn-f">[f]</span>
If the associative law is also dropped, only one further
algebra, Cayley's octonions, is possible.<sup><a href="#sec-qsuff-en-13">13</a></sup>
</li>
<li><span id="sec-qsuff-fn-g">[g]</span>
'Scalar,' 'vector' and 'tensor' seem to be the only three
of Hamilton's original quaternionic terms that have developed
non-quaternionic usages. The modern meaning of 'tensor' is
completely different from Hamilton's.
</li>
<li><span id="sec-qsuff-fn-h">[h]</span>
Obviously, <i>v</i> is indeterminate for angles 0 and <i>π</i>. In the
following description of quaternions, special cases such as this
are generally omitted.
</li>
<li><span id="sec-qsuff-fn-i">[i]</span>
<i>u</i>,<i>v</i>,<i>n</i> always have the same sense. If a right-handed
coordinate system is used, <i>u</i>,<i>v</i>,<i>n</i> form a dextral set. Hamilton
used a left-handed system, while Tait, Maxwell, and Gibbs all used a
right-handed system.<sup><a href="#sec-qsuff-en-18">18</a></sup>
</li>
<li><span id="sec-qsuff-fn-j">[j]</span>
Also implied in Grassman's calculus of extension are matrix
theory and modern tensor analysis.<sup><a href="#sec-qsuff-en-20">20</a></sup>
</li>
<li><span id="sec-qsuff-fn-k">[k]</span>
Remember, this is the same book that so effectively stymied
Herschel.
</li>
<li><span id="sec-qsuff-fn-l">[l]</span>
This operator was apparently written as ᐊ by Hamilton but
as ᐁ by Tait. In Tait's form it has been variously
given the names nabla (after a visually similar Assyrian harp), del, and
atled (delta spelled backwards).<sup><a href="#sec-qsuff-en-24">24</a></sup>
</li>
<li><span id="sec-qsuff-fn-m">[m]</span>
In replacing <i>V</i> ∇ and <i>S</i> ∇ by ∇× and ∇·
(Gibbs' notation), it is also necessary to redefine ∇
itself from quaternionic operator <i>i</i> <sup><i>d</i></sup>⁄<sub><i>dx</i></sub> + <i>j</i> <sup><i>d</i></sup>⁄<sub><i>dy</i></sub> + <i>k</i> <sup><i>d</i></sup>⁄<sub><i>dz</i></sub>
to vectorial operator ı⃗ <sup>∂</sup>⁄<sub>∂<i>x</i></sub>
+ ȷ⃗ <sup>∂</sup>⁄<sub>∂<i>y</i></sub>
+ k⃗ <sup>∂</sup>⁄<sub>∂<i>z</i></sub>,
a subtle and profound (not to say confusing)
change that may serve to suggest the size of the intellectual gulf between
quaternion analysis and vector analysis.
</li>
<li><span id="sec-qsuff-fn-n">[n]</span>
There is, nevertheless, substantial evidence that Gibbs'
system was inspired by quaternions, and not by the
<i>Ausdehnungslehre</i>.<sup><a href="#sec-qsuff-en-23">23</a></sup>
</li>
<li><span id="sec-qsuff-fn-o">[o]</span>
'Infix notation' means that the operator symbol appears
between the operands, as in <i>u</i> × <i>v</i>. 'Prefix notation' means that
the operator appears before the operands, as in <i>Vuv</i>. Prefix
notation was later rediscovered by Jan Łukasiewicz,<sup><a href="#sec-qsuff-en-27">27</a></sup> and is now
generally called Polish notation.
</li>
<li><span id="sec-qsuff-fn-p">[p]</span>
The two sources used for the following discussion of the
1890's controversy both handle the debate on a roughly
chronological paper-by-paper basis. My discussion is a substantially
different organization of the material.
</li>
<li><span id="sec-qsuff-fn-q">[q]</span>
Gibbs provided the initiative that led to the publication
of Grassman's collected works.<sup><a href="#sec-qsuff-en-28">28</a></sup>
</li>
<li><span id="sec-qsuff-fn-r">[r]</span>
In a letter to <i>Nature</i> of January 1893, Tait wrote:
<blockquote>
I found that I should not only have to unlearn quaternions (in
whose disfavor much is said) but also to learn a new and most
uncouth parody of notations long familiar to me.... There I
was content to leave the matter.... Dr. Knott [Cargill Gilston
Knott, a former student of Tait and a staunch quaternionist] has
actually had the courage to <i>read</i> the pamphlets of Gibbs and
Heaviside; and, after an arduous journey through trackless
jungles, has emerged a more resolute supporter of Quaternions
than when he entered.<sup><a href="#sec-qsuff-en-29">29</a></sup>
</blockquote>
</li>
<li><span id="sec-qsuff-fn-s">[s]</span>
This lack of acceptance was exaggerated. Actually, 168
works were published in the quaternion tradition in the 1890's.<sup><a href="#sec-qsuff-en-30">30</a></sup>
</li>
<li><span id="sec-qsuff-fn-t">[t]</span>
There was only one paper in the debate that argued against
the use of all vectorial systems. This was written by Arthur
Cayley in 1894. It named quaternions in particular, and was
answered by Tait.<sup><a href="#sec-qsuff-en-26k">26k</a></sup>
</li>
<li><span id="sec-qsuff-fn-u">[u]</span>
In his response to Tait's preface, Gibbs wrote,
<blockquote>
The criticism relates particularly to notations, but I
believe that there is a deeper issue of notions underlying
that of notations. Indeed, if my offense had been solely in
the matter of notation, it would have been less accurate to
describe my production as a monstrosity, than to characterize
its dress as uncouth.
</blockquote>
</li>
<li><span id="sec-qsuff-fn-v">[v]</span>
Of course, quaternions didn't vanish overnight. They were
in the final stages of disappearance in about 1910;<sup><a href="#sec-qsuff-en-36">36</a></sup> and, as will
be seen, they have never vanished entirely.
</li>
<li><span id="sec-qsuff-fn-w">[w]</span>
Cayley considered this result in a purely geometrical
context — i.e. he didn't have time as a fourth dimension in mind.
Note that a special case of this formula is when the rotation is
orthogonal to the real axis; then <i>v</i> = <i>u</i><sup>−1</sup>, giving the general
three-dimensional rotation described earlier.
</li>
<li><span id="sec-qsuff-fn-x">[x]</span>
For a real quaternion <i>q</i>, the inverse is often defined as <i>q</i><sup>−1</sup> = <i>Kq</i>/<i>Nq</i>,
where <i>Kq</i> is the conjugate and <i>Nq</i> the norm of <i>q</i>.<sup><a href="#sec-qsuff-en-15">15</a></sup>
Silberstein simply carried this definition over to biquaternions.
</li>
<li><span id="sec-qsuff-fn-y">[y]</span>
Taken in the context of 1945,
Dirac's dismissal of biquaternions is reasonable. However,
in the context of modern theoretical physics,
which favors such rarified creatures as Clifford and Lie algebras,
biquaternions actually do have some interesting properties.
</li>
<li><span id="sec-qsuff-fn-z">[z]</span>
The form using left-division is <i>q'</i> = (<i>qa</i> + <i>b</i>)<sup>−1</sup>(<i>qc</i> + <i>d</i>),
with (of course) different quaternion values for <i>a</i>,<i>b</i>,<i>c</i>,<i>d</i>.
</li>
<li><span id="sec-qsuff-fn-aa">[aa]</span>
I have not discussed <i>any</i> specific areas of application to
classical physics. It has not been necessary to do so, since the
vector-quaternion controversy was independent of both area and
method of application.
</li>
<li><span id="sec-qsuff-fn-bb">[bb]</span>
Discussions of the reception (or rather lack of reception)
of quaternions in modern physics appear in most of the modern
works I have examined. Rastall's observations are interesting;
but his historical interpretations do not take into account the
vectorist and anti-quaternionist traditions.
</li>
<li><span id="sec-qsuff-fn-cc">[cc]</span>
According to A. W. Conway, this equivalence is itself
equivalent to wave-particle duality.
</li>
<li><span id="sec-qsuff-fn-dd">[dd]</span>
These papers appeared in <i>Journal of Mathematical Physics</i>.
I find no subsequent papers in that journal by this group.
</li>
<li><span id="sec-qsuff-fn-ee">[ee]</span>
The trouble with the non-commutativity of 𝓠 is that there
is no unique tensor product of the Hilbert space 𝓗<sub>𝓠</sub>
with itself.
</li>
<li><span id="sec-qsuff-fn-ff">[ff]</span>
Fischer explains this terminology on page 4 of the
<i>Universal Mechanics</i>:
<blockquote>
It is more common in literature to use the term "tensor" for
the general non-commutative "affinor" and speak of symmetric
and antisymmetric tensors instead of tensors and axiators.
But Spielreins terms apparently are more elucidative.
</blockquote>
</li>
<li><span id="sec-qsuff-fn-gg">[gg]</span>
Quadric quaternions are quaternions whose four coefficients
are themselves real quaternions; equivalently they are sums and
products of quaternions in two independent sets of imaginaries.
They have 16 real coefficients each. Double quadric quaternions
are quadric quaternions whose 16 coefficients are themselves
quadric quaternions, or equivalently sums and products of
quaternions in four independent sets of imaginaries. Double quadric
quaternions have 256 real coefficients each.
</li>
<li><span id="sec-qsuff-fn-hh">[hh]</span>
In gathering the above high-level description of Fischer's
method I have made repeated forays into the <i>Universal Mechanics</i>.
I found the book extremely difficult to read. It is highly
concentrated and moves very rapidly, which is natural considering
that it covers a very large quantity of material. This is
compounded by the erroneous omission from the printing of
scattered pages from the first two chapters of the book.
</li>
<li><span id="sec-qsuff-fn-ii">[ii]</span>
The most recent representative of the 'cult' in my
bibliography is James D. Edmonds (1974).
</li>
<li><span id="sec-qsuff-fn-jj">[jj]</span>
Whittaker qualifies as a 'cultist' because of a passage in
the same article (which appeared in the <i>Mathematical Gazette</i>):
<blockquote>
... those who were in the outer circles of Hamilton's
influence — e.g. Willard Gibbs in America and Heaviside in
England — wasted their energies in devising bastard
derivatives of the quaternion calculus...
</blockquote>
This editorial comment resulted in a brief correspondence in the
<i>Mathematical Gazette</i> between Whittaker and E. A. Milne in the
spirit of the controversy of the 1890's.<sup><a href="#sec-qsuff-en-55">55</a></sup>
</li>
</ul>
<hr></hr>
<span style="font-size: large;" id="sec-qsuff-en">Endnotes</span>
<ul>
<li><span id="sec-qsuff-en-1">[1]</span>
Cf. Michael J. Crowe, <i>A History of Vector Analysis</i>
(Notre Dame: University of Notre Dame Press, 1967), p. 1.
</li>
<li><span id="sec-qsuff-en-2">[2]</span>
Crowe, pp. 127–128.
</li>
<li><span id="sec-qsuff-en-3">[3]</span>
On the mathematical ancestry of quaternions, see Morris
Kline, <i>Mathematical Thought from Ancient to Modern Times</i>
(New York: Oxford University Press, 1972), pp. 772–779. On
extensions to the concept of number prior to 1800, see E. T. Bell,
<i>The Development of Mathematics</i>, 2nd ed. (New York: McGraw-Hill,
1945), pp. 172–178.
</li>
<li><span id="sec-qsuff-en-4">[4]</span>
Quoted in Kline, p. 773.
</li>
<li><span id="sec-qsuff-en-5">[5]</span>
Crowe, pp. 5–11.
</li>
<li><span id="sec-qsuff-en-6">[6]</span>
This is remarked on by Crowe, p. 220.
</li>
<li><span id="sec-qsuff-en-7">[7]</span>
Crowe, pp. 23–27.
</li>
<li><span id="sec-qsuff-en-8">[8]</span>
On Hamilton's attempts, see Crowe pp. 26–28; on other
attempts, see <a href="#sec-qsuff-en-5">[5]</a>.
</li>
<li><span id="sec-qsuff-en-9">[9]</span>
Edmund T. Whittaker, "The Sequence of Ideas in the
Discovery of Quaternions," <i>Proceedings of the Royal Irish Academy</i>
50 (1945) sect. A: 97–98.
</li>
<li><span id="sec-qsuff-en-10">[10]</span>
<i>Encyclopedia Britannica</i>, 11th ed., s.v. "Quaternions,"
by Alexander McAulay, p. 720. The relevant part of the article
is the historical profile, which is taken from the corresponding
article in the 9th edition, by Peter Guthrie Tait.
</li>
<li><span id="sec-qsuff-en-11">[11]</span>
William Rowan Hamilton, "Quaternions," <i>Proceedings of
the Royal Irish Academy</i> 50 (1945) sect. A: 89–92. This is the
first publication of some notes made by Hamilton on the day of
his discovery of quaternions.
</li>
<li><span id="sec-qsuff-en-12">[12]</span>
See Crowe, p. 10.
</li>
<li><span id="sec-qsuff-en-13">[13]</span>
Kenneth O. May, "The Impossibility of a Division
Algebra of Vectors in Three Dimensional Space," <i>American
Mathematical Monthly</i> 73 (1966): 289–291. On Cayley's octonions,
see Kline p. 792.
</li>
<li><span id="sec-qsuff-en-14">[14]</span>
On 'scalar' and 'vector,' see Crowe pp. 31–32. On
'tensor' and 'versor,' see Felix Klein, <i>Elementary Mathematics
from an Advanced Standpoint</i> (New York: Dover Publications, 1945),
p. 138. For some examples of others of Hamilton's terms, see
Crowe p. 36.
</li>
<li><span id="sec-qsuff-en-15">[15]</span>
The basic properties of quaternions are taken from
Louis Brand, <i>Vector and Tensor Analysis</i> (New York: John Wiley
& Sons, 1947). Brand devotes the last chapter (chapter X, pp.
403–429) of his book to quaternions.
</li>
<li><span id="sec-qsuff-en-16">[16]</span>
Crowe, pp. 34–35.
</li>
<li><span id="sec-qsuff-en-17">[17]</span>
Crowe, pp. 54–96.
</li>
<li><span id="sec-qsuff-en-18">[18]</span>
Crowe, p. 155.
</li>
<li><span id="sec-qsuff-en-19">[19]</span>
On Hamilton's <i>Lectures</i> and <i>Elements</i>, see Crowe pp. 36–41.
</li>
<li><span id="sec-qsuff-en-20">[20]</span>
Bell, pp. 200, 204.
</li>
<li><span id="sec-qsuff-en-21">[21]</span>
On Tait's quaternionic work see Crowe pp. 117–125.
</li>
<li><span id="sec-qsuff-en-22">[22]</span>
On Maxwell's use of quaternions, see Crowe pp. 127–139.
</li>
<li><span id="sec-qsuff-en-23">[23]</span>
On the development of Gibbs' and Heaviside's systems of
vector analysis, see Crowe pp. 150–177.
</li>
<li><span id="sec-qsuff-en-24">[24]</span>
See Crowe pp. 124, 146.
</li>
<li><span id="sec-qsuff-en-25">[25]</span>
The 1890's controversy is described in some detail in
Crowe pp. 182–224 (chapter 6). Much of the controversy is also
covered in Alfred M. Bork, " 'Vectors versus Quaternions' — The
Letters in <i>Nature</i>," <i>American Journal of Physics</i> 34 (1966):
202–211. Crowe's treatment is more comprehensive; however, Bork
goes into more detail on the contents of the articles he
discusses, and makes frequent use of quotations.
</li>
<li><span id="sec-qsuff-en-26">[26]</span>
General reference notes for specific papers in the
controversy are ordered chronologically, and indexed by letter
under number 26 (hence notes 26a–26k). References particularly
to one or the other secondary source are numbered separately.
<ul>
<li><span id="sec-qsuff-en-26a">[26a]</span>
Tait, <i>Philosophical Magazine</i>, January 1890, and the
preface to his <i>Elementary Treatise on Quaternions</i>, 1890
edition. See Crowe pp. 183–185 and (less) Bork pp. 202–203.
</li>
<li><span id="sec-qsuff-en-26b">[26b]</span>
Gibbs, <i>Nature</i>, 2 April 1891. See Crowe pp. 185–186
and Bork p. 203.
</li>
<li><span id="sec-qsuff-en-26c">[26c]</span>
Tait, <i>Nature</i>, 30 April 1891. See Crowe pp. 186–187
and Bork pp. 203–204.
</li>
<li><span id="sec-qsuff-en-26d">[26d]</span>
Macfarlane, <i>Proceedings of the American Association
for the Advancement of Science</i>, published in July 1892. See
Crowe pp. 190–191 and Bork p. 205.
</li>
<li><span id="sec-qsuff-en-26e">[26e]</span>
Knott, <i>Proceedings of the Royal Society of
Edinburgh</i>, read 19 December 1892. See Crowe pp. 201–203 and
Bork p. 207.
</li>
<li><span id="sec-qsuff-en-26f">[26f]</span>
Peddie, <i>Proceedings of the Royal Society of
Edinburgh</i>, read 10 February 1893. See Crowe pp. 208–209.
</li>
<li><span id="sec-qsuff-en-26g">[26g]</span>
Gibbs, <i>Nature</i>, 16 March 1893. See Crowe
pp. 198–200 and Bork p. 206.
</li>
<li><span id="sec-qsuff-en-26h">[26h]</span>
Macfarlane, <i>Nature</i>, 25 May 1893. See Crowe
pp. 203–204 and Bork p. 207.
</li>
<li><span id="sec-qsuff-en-26i">[26i]</span>
Gibbs, <i>Nature</i>, 17 August 1893. See Crowe
pp. 204–205 and Bork p. 208.
</li>
<li><span id="sec-qsuff-en-26j">[26j]</span>
McAulay, <i>Utility of Quaternions in Physics</i>, 1893.
See Crowe pp. 194–195.
</li>
<li><span id="sec-qsuff-en-26k">[26k]</span>
Arthur Cayley, "Coordinates versus Quaternions,"
and Tait, "On the Intrinsic Nature of the Quaternion
Method," both read before the Royal Society of Edinburgh on
2 July 1894. See Crowe pp. 211–215.
</li>
</ul>
</li>
<li><span id="sec-qsuff-en-27">[27]</span>
Bork, p. 204.
</li>
<li><span id="sec-qsuff-en-28">[28]</span>
Crowe, p. 161.
</li>
<li><span id="sec-qsuff-en-29">[29]</span>
Bork, p. 206.
</li>
<li><span id="sec-qsuff-en-30">[30]</span>
Crowe, p. 111. The supposed slowness of acceptance is
discussed in Crowe pp. 219–220.
</li>
<li><span id="sec-qsuff-en-31">[31]</span>
Crowe, p. 208.
</li>
<li><span id="sec-qsuff-en-32">[32]</span>
A brief account of the history of the International
Society is given in Hubert Kennedy, "James Mills Peirce and the
Cult of Quaternions," <i>Historia Mathematica</i> 6 (1979): 425–426.
</li>
<li><span id="sec-qsuff-en-33">[33]</span>
Crowe, p. 123.
</li>
<li><span id="sec-qsuff-en-34">[34]</span>
Kline, p. 778. These priorities are evident in much of
his work.
</li>
<li><span id="sec-qsuff-en-35">[35]</span>
On Hamilton's metaphysics, see Thomas L. Hankins,
"Triplets and Triads: Sir William Rowan Hamilton on the
Metaphysics of Mathematics." <i>Isis</i> 68 (1977): 175–193.
</li>
<li><span id="sec-qsuff-en-36">[36]</span>
Crowe, p. 240.
</li>
<li><span id="sec-qsuff-en-37">[37]</span>
Alfred M. Bork, "The Fourth Dimension in Nineteenth
Century Physics," <i>Isis</i> 55 (1964): 328–330.
</li>
<li><span id="sec-qsuff-en-38">[38]</span>
The article in which he wrote this is mentioned in
ibid., p. 338. The original article is Shunkichi Kimura, "On the
Nabla of Quaternions," <i>Annals of Mathematics</i>, 10 (1896): 127–155.
</li>
<li><span id="sec-qsuff-en-39">[39]</span>
James D. Edmonds, "Quaternion Quantum Theory: New
Physics or Number Mysticism?" <i>American Journal of Physics</i> 42
(1974): 221. Edmonds derives his information from a 1914 book by
Ludwig Silberstein. These same sentiments are attributed to
Minkowski in Otto F. Fischer, "Hamilton's Quaternions and
Minkowski's Potentials," <i>Philosophical Magazine</i> (7) 27 (1939):
375. Fischer does not identify the source of his information.
</li>
<li><span id="sec-qsuff-en-40">[40]</span>
Ludwig Silberstein, "Quaternionic Form of Relativity,"
<i>Philosophical Magazine</i> 23 (1912): 790.
</li>
<li><span id="sec-qsuff-en-41">[41]</span>
This observation is made in P. A. M. Dirac, "Application
of Quaternions to Lorentz Transformations," <i>Proceedings of the
Royal Irish Academy</i> 50 (1945) sect. A: 261.
</li>
<li><span id="sec-qsuff-en-42">[42]</span>
Silberstein, pp. 790–809.
</li>
<li><span id="sec-qsuff-en-43">[43]</span>
Dirac, pp. 261–270.
</li>
<li><span id="sec-qsuff-en-44">[44]</span>
Crowe, p. 34.
</li>
<li><span id="sec-qsuff-en-45">[45]</span>
For a list of such papers, see Edmonds, p. 220.
</li>
<li><span id="sec-qsuff-en-46">[46]</span>
For references on this topic, see David Finkelstein et
al., "Foundations of Quaternion Quantum Theory," <i>Journal of
Mathematical Physics</i> 3 (1962): 217, and Peter Rastall,
"Quaternions in Relativity," <i>Reviews of Modern Physics</i> 36 (1964):
820.
</li>
<li><span id="sec-qsuff-en-47">[47]</span>
Rastall, pp. 820–832.
</li>
<li><span id="sec-qsuff-en-48">[48]</span>
A. W. Conway, "Quaternions and Matrices," <i>Proceedings of
the Royal Irish Academy</i> 50 (1945) sect. A: 98–103. On the
generality of quaternions, see also William Kingdon Clifford,
"Applications of Grassman's Extensive Algebra," <i>American Journal
of Mathematics</i> 1 (1878): 350–358.
</li>
<li><span id="sec-qsuff-en-49">[49]</span>
Finkelstein et al., "Foundations," pp. 207–220.
</li>
<li><span id="sec-qsuff-en-50">[50]</span>
David Finkelstein et al., "Principle of General Q
Covariance," <i>Journal of Mathematical Physics</i> 4 (1963): 788–796.
</li>
<li><span id="sec-qsuff-en-51">[51]</span>
Crowe, pp. 254–255. The two books are <i>Universal
Mechanics and Hamilton's Quaternions</i>, 1951, and <i>Five Mathematical
Structural Models in Natural Philosophy with Technical Physical
Quaternions</i>, 1957. I have not seen the latter book. It may be
relevant that Silberstein used the name "physical quaternions"
for his specialized biquaternions corresponding to space-time
vectors.
</li>
<li><span id="sec-qsuff-en-52">[52]</span>
For example, the title of Kennedy's paper cited in <a href="#sec-qsuff-en-32">[32]</a>.
</li>
<li><span id="sec-qsuff-en-53">[53]</span>
Crowe, p. 171.
</li>
<li><span id="sec-qsuff-en-54">[54]</span>
Crowe, p. 133.
</li>
<li><span id="sec-qsuff-en-55">[55]</span>
Edmund T. Whittaker, "The Hamiltonian Revival,"
<i>Mathematical Gazette</i> 24 (1940): 153–158. The associated
correspondence appears in <i>Mathematical Gazette</i> 25 (1941): 106–108
and 25 (1941): 298–300.
</li>
<li><span id="sec-qsuff-en-56">[56]</span>
F. D. Murnaghan, "An Elementary Presentation of the
Theory of Quaternions," <i>Scripta Mathematica</i> 10 (1944): 37.
</li>
</ul>
<hr></hr>
<span style="font-size: large;" id="sec-qsuff-bib">Bibliography</span>
<ul>
<li>
Bell, E. T. <i>The Development of Mathematics</i>. 2nd ed. New York:
McGraw-Hill Book Company, Inc., 1945.
</li>
<li>
Bork, Alfred M. "The Fourth Dimension in Nineteenth-Century
Physics." <i>Isis</i> 55 (1964): 326–338.
</li>
<li>
—. " 'Vectors versus Quaternions' — the Letters in
Nature." <i>American Journal of Physics</i> 34 (1966): 202–211.
</li>
<li>
Brand, Louis. <i>Vector and Tensor Analysis</i>. New York: John Wiley
& Sons, Inc., 1947.
</li>
<li>
Clifford, William Kingdon. "Applications of Grassmann's Extensive
Algebra." <i>American Journal of Mathematics</i> 1 (1878): 350–358.
</li>
<li>
Conway, A. W. "Quaternions and Matrices." <i>Proceedings of the Royal
Irish Academy</i> 50 (1945) sect. A: 98–103.
</li>
<li>
Crowe, Michael J. <i>A History of Vector Analysis: The Evolution of
the Idea of a Vectorial System</i>. Notre Dame: University of
Notre Dame Press, 1967.
</li>
<li>
Dirac, P. A. M. "Application of Quaternions to Lorentz
Transformations."
<i>Proceedings of the Royal Irish Academy</i> 50 (1945)
sect. A: 261–270.
</li>
<li>
Edmonds, James D. "Quaternion Quantum Theory: New Physics or
Number Mysticism?" <i>American Journal of Physics</i> 42 (1974):
220–223.
</li>
<li>
Finkelstein, David; Jauch, Josef M.; Schiminovich, Samuel; and
Speiser, David. "Foundations of Quaternion Quantum
Mechanics." <i>Journal of Mathematical Physics</i> 3 (1962):
207–220.
</li>
<li>
—. "Principle of General Q Covariance." <i>Journal of
Mathematical Physics</i> 4 (1963): 788–796.
</li>
<li>
Fischer, Otto F. "Hamilton's Quaternions and Minkowski's
Potentials." <i>Philosophical Magazine</i> (7) 27 (Jan.–June 1939):
375–385.
</li>
<li>
—. <i>Universal Mechanics and Hamilton's Quaternions</i>.
Stockholm: Axion Institute, 1951.
</li>
<li>
Hamilton, William Rowan. "Quaternions." <i>Proceedings of the Royal
Irish Academy</i> 50 (1945) sect. A: 89–92.
</li>
<li>
Hankins, Thomas L. "Triplets and Triads: Sir William Rowan
Hamilton on the Metaphysics of Mathematics." <i>Isis</i> 68 (1977):
175–193.
</li>
<li>
Kennedy, Hubert. "James Mills Peirce and the Cult of Quaternions."
<i>Historia Mathematica</i> 6 (1979): 423–429.
</li>
<li>
Kimura, Shunkichi. "The Nabla of Quaternions." <i>Annals of
Mathematics</i> 10 (1896): 127–155.
</li>
<li>
Klein, Felix. <i>Elementary Mathematics from an Advanced Standpoint</i>.
Translated by E. R. Hedrick and C. A. Noble. New York: Dover
Publications, 1945.
</li>
<li>
Kline, Morris. <i>Mathematical Thought from Ancient to Modern
Times</i>. New York: Oxford University Press, 1972.
</li>
<li>
McAulay, Alexander. "Quaternions." <i>Encyclopedia Britannica</i>.
11th ed. 1911.
</li>
<li>
May, Kenneth O. "The Impossibility of a Division Algebra of
Vectors in Three Dimensional Space." <i>American Mathematical
Monthly</i> 73 (1966): 289–291.
</li>
<li>
Murnaghan, Francis D. "An Elementary Presentation of the Theory of
Quaternions." <i>Scripta Mathematica</i> 10 (1944): 37–49.
</li>
<li>
Rastall, Peter. "Quaternions in Relativity." <i>Reviews of Modern
Physics</i> 36 (1964): 820–832.
</li>
<li>
Silberstein, Ludwig. "Quaternionic Form of Relativity."
<i>Philosophical Magazine</i> (6) 23 (Jan.–June 1912): 790–809.
</li>
<li>
Whittaker, Edmund T. "The Hamiltonian Revival." <i>Mathematical
Gazette</i> 24 (1940): 153–158.
</li>
<li>
—. "The Sequence of Ideas in the Discovery of
Quaternions." <i>Proceedings of the Royal Irish Academy</i> 50
(1945) sect. A: 93–98.
</li>
</ul>
John Shutthttp://www.blogger.com/profile/00041398073010099077noreply@blogger.com15tag:blogger.com,1999:blog-7068528325708136131.post-45559700240560031462014-03-01T07:36:00.000-08:002015-09-07T08:46:45.772-07:00Continuations and term-rewriting calculi<blockquote>
Thinking is most mysterious, and by far the greatest light upon it that we have is thrown by the study of language.
<blockquote>
— <a href="https://en.wikiquote.org/wiki/Benjamin_Lee_Whorf">Benjamin Lee Whorf</a>.
</blockquote>
</blockquote>
<p>
In this post, I hope to defuse the pervasive myth that continuations are intrinsically a whole-computation device. They aren't. I'd originally meant to write about the relationship between continuations and delimited continuations, but find that defusing the myth is prerequisite to the larger discussion, and will fill up a post by itself.
</p>
<p>
To defuse the myth, I'll look at how continuations are handled in the vau-control-calculus. Explaining that calculus involves explaining the unconventional way vau-calculi handle <i>variables</i>. So, tracing back through the tangle of ideas to find a starting point, I'll begin with some remarks on the use of variables in term-rewriting calculi.
</p>
<p>
While I'm extracting this high-level insight from the lower-level math of the situation, I'll also defuse a second common misapprehension about continuations, that they are essentially function-like. This is a subtle point: continuations are invoked as if they were functions, and traditionally appear in the form of first-class functions, but their control-flow behavior is orthogonal to function application. This point is (as I've just demonstrated) difficult even to <i>articulate</i> without appealing to lower-level math; but it arises from the same lower-level math as the point about whole-computation, so I'll extract it here with essentially no additional effort.
</p>
<blockquote>
<b>Contents</b><br>
<a href="#sec-contin-pe">Partial-evaluation variables</a><br>
<a href="#sec-contin-glob">Continuations by global rewrite</a><br>
<a href="#sec-contin-cvar">Control variables</a><br>
<a href="#sec-contin-ins">Insights</a>
</blockquote>
<span style="font-size: large;" id="sec-contin-pe">Partial-evaluation variables</span>
<p>
From the lasting evidence, <a href="https://en.wikipedia.org/wiki/Alonzo_Church">Alonzo Church</a> had a thing about variables. Not as much of a thing as <a href="https://en.wikipedia.org/wiki/Haskell_Curry">Haskell Curry</a>, who developed a combinatorial calculus with no variables at all; but Church did feel, apparently, that a meaningful logical proposition should not have unbound variables in it. He had an elegant insight into how this could be accomplished: have a single binding construct —which for some reason he called λ— for the variable parameter in a function definition, and then —I quite enjoyed this— you don't need additional binding constructs for the existential and universal quantifiers, because you can simply make them higher-order functions and leave the binding to their arguments. For his quantifiers Π and Σ, Π(F,G) meant for all values <i>v</i> such that F(<i>v</i>) is true, G(<i>v</i>) is true; and Σ(F) meant there exists some value <i>v</i> such that F(<i>v</i>) is true. The full elegance of this was lost because only the computational subset of his logic survived, under the name λ-calculus, so the quantifiers fell by the wayside; but the habit of a single binding construct has remained.
</p>
<p>
In computation, though, I suggest that the useful purpose of λ-bound variables is <i>partial evaluation</i>. This notion dawned on me when working out the details of elementary vau-calculus. Although I've blogged about elementary vau-calculus in an earlier post, there I was looking at a different issue (<a href="http://fexpr.blogspot.com/2013/07/explicit-evaluation.html">explicit evaluation</a>), and took variables for granted. Suppose, though, that one were centrally concerned only with capturing the operational semantics of Lisp (with <a href="http://fexpr.blogspot.com/2011/04/fexpr.html">fexprs</a>) in a term-rewriting calculus <i>at all</i>, rather than capturing it in a calculus that looks as similar as possible to λ-calculus. One might end up with something like this:
<blockquote>
T ::= S | s | <tt>(</tt>T <tt>.</tt> T<tt>)</tt> | [wrap T] | A<br>A ::= [eval T T] | [combine T T T]<br>S ::= d | <tt>()</tt> | e | [operative T T T T]<br>e ::= ⟪ B<sup>*</sup> ⟫<br>B ::= s ← T
</blockquote>
Most of this is the same as in my earlier post (<a href="http://fexpr.blogspot.com/2013/07/explicit-evaluation.html">explicit evaluation</a>), but there are three differences: the internal structure of environments (e) is described; operatives have a different structure, which is fully described; and there are no variables.
</p>
<p>
Wait. No variables?
</p>
<p>
Here, a term (T) is either a self-evaluating term (S), a symbol (s), a pair, an applicative ([wrap T], where T is the underlying combiner), or an active term (A). An active term is the only kind of term that can be the left-hand side of a rewrite rule: it is either a plan to evaluate something in an environment, or a plan to invoke a combiner with some operands in an environment. A self-evaluating term is either an atomic datum such as a number (d), or nil, or an environment (e), or an operative — where an operative consists of a parameter tree, an environment parameter, a body, and a static environment. An environment is a delimited list of bindings (B), and a binding associates a symbol (s) with an assigned value (T).
</p>
<p>
The rewrite rules with eval on the left-hand side are essentially just the top-level logic of a Lisp evaluator:
<blockquote>
[eval S e] → S<br>[eval s e] → <i>lookup</i>(s,e) if <i>lookup</i>(s,e) is defined<br>[eval <tt>(</tt>T<sub>1</sub> <tt>.</tt> T<sub>2</sub><tt>)</tt> e] → [combine [eval T<sub>1</sub> e] T<sub>2</sub> e]<br>[eval [wrap T] e] → [wrap [eval T e]]
</blockquote>
That leaves two rules with combine on the left-hand side: one for combining an applicative, and one for combining an operative. Combining applicatives is easy:
<blockquote>
[combine [wrap T<sub>0</sub>] <tt>(</tt>T<sub>1</sub> ... T<sub>n</sub><tt>)</tt> e] → [combine T<sub>0</sub> <tt>(</tt>[eval T<sub>1</sub> e] ... [eval T<sub>n</sub> e]<tt>)</tt> e]
</blockquote>
Combining operatives is a bit more complicated. (It will help if you're familiar with how Kernel's <code><i>$vau</i></code> works; see <a href="http://fexpr.blogspot.com/2011/05/dangerous-things-should-be-difficult-to.html">here</a>.) The combination is rewritten as an evaluation of the body of the operative (its third element) in a local environment. The local environment starts as the static environment of the operative (its fourth element); then the ordinary parameters of the operative (its first element) are locally bound to the operands of the combination; and the environment parameter of the operative (its second element) is locally bound to the dynamic environment of the combination.
<blockquote>
[combine [operative T<sub>1</sub> T<sub>2</sub> T<sub>3</sub> T<sub>4</sub>] V e] → [eval T<sub>3</sub> <i>match</i>(T<sub>1</sub>,V) · <i>match</i>(T<sub>2</sub>,e) · T<sub>4</sub>] if the latter is defined
</blockquote>
where <i>match</i>(X,Y) constructs an environment binding the symbols in definiend X to the corresponding subterms in Y; e<sub>child</sub> · e<sub>parent</sub> concatenates two environments, producing an environment that tries to look up symbols in e<sub>child</sub>, and failing that looks for them in e<sub>parent</sub>; and a value (V) is a term such that every active subterm is inside a self-evaluating subterm.
</p>
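<p>
For readers who find executable code clearer than rewrite rules, here is a minimal sketch of the same top-level logic in plain Scheme. The representation is purely illustrative and my own (environments as association lists, applicatives as tagged <tt>(wrap ...)</tt> lists, operatives as five-element tagged lists, and host procedures standing in for primitive combiners); it is not Kernel, and every helper name here is hypothetical.
<blockquote>
<pre>
(define (tagged? t tag) (and (pair? t) (eq? (car t) tag)))

(define (lookup s env)                       ; lookup(s,e)
  (cond ((assq s env) => cdr)
        (else (error "unbound symbol" s))))

(define (match-tree def val)                 ; match(X,Y)
  (cond ((eq? def '_) '())                   ; '_ stands in for #ignore
        ((symbol? def) (list (cons def val)))
        ((and (null? def) (null? val)) '())
        ((and (pair? def) (pair? val))
         (append (match-tree (car def) (car val))
                 (match-tree (cdr def) (cdr val))))
        (else (error "parameter mismatch" def val))))

(define (term-eval t env)
  (cond ((symbol? t) (lookup t env))         ; [eval s e] -> lookup(s,e)
        ((tagged? t 'wrap)                   ; [eval [wrap T] e]
         (list 'wrap (term-eval (cadr t) env)))
        ((pair? t)                           ; [eval (T1 . T2) e] ->
         (combine (term-eval (car t) env)    ;   [combine [eval T1 e] T2 e]
                  (cdr t) env))
        (else t)))                           ; [eval S e] -> S

(define (combine c operands env)
  (cond ((tagged? c 'wrap)                   ; applicative: evaluate operands
         (combine (cadr c)
                  (map (lambda (t) (term-eval t env)) operands)
                  env))
        ((tagged? c 'operative)              ; evaluate body in local env:
         (term-eval (list-ref c 3)           ;   match(ptree,operands) then
                    (append                  ;   match(eformal,env) then static
                     (match-tree (list-ref c 1) operands)
                     (match-tree (list-ref c 2) env)
                     (list-ref c 4))))
        ((procedure? c) (apply c operands))  ; host primitives, for testing
        (else (error "not a combiner" c))))

;; e.g.: (term-eval '(f 2 3) (list (cons 'f (list 'wrap +))))  => 5
</pre>
</blockquote>
Environment concatenation e<sub>child</sub> · e<sub>parent</sub> is just <tt>append</tt> here, with <tt>assq</tt> giving the child's bindings priority.
</p>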
<p>
Sure enough, there are no variables here. This calculus behaves correctly. However, it has a weak equational theory. Consider evaluating the following two expressions in a standard environment e<sub>0</sub>.
<blockquote>
[eval <code>($lambda (x) (+ 0 x))</code> e<sub>0</sub>]<br>[eval <code>($lambda (x) (* 1 x))</code> e<sub>0</sub>]
</blockquote>
Clearly, these two expressions are equivalent; we can see that they are interchangeable. They both construct an applicative that takes one numerical argument and returns it unchanged. However, the rewriting rules of the calculus can't tell us this. These terms reduce to
<blockquote>
[wrap [operative <code>(x)</code> <code>#ignore</code> <code>(+ 0 x)</code> e<sub>0</sub>]]<br>[wrap [operative <code>(x)</code> <code>#ignore</code> <code>(* 1 x)</code> e<sub>0</sub>]]
</blockquote>
and both of these terms are irreducible! Whenever we call either of these combiners, its body is evaluated in a local environment that's almost like e<sub>0</sub>; but within the calculus, we can't even talk about what will happen when the body is evaluated. To do so we would have to construct an active evaluation term for the body; to build the active term we'd need to build a term for the local environment of the call; and to build a term for that local environment, we'd need to bind <code>x</code> to some sort of placeholder, meaning "some term, but we don't know what it is yet".
</p>
<p>
A variable is just the sort of placeholder we're looking for. So let's add some syntax. First, a primitive domain of variables. We call this domain x<sub>p</sub>, where the "p" stands for "partial evaluation", since that's what we want these variables for (and because, it turns out, we're going to want other variables that are for other purposes). We can't put this primitive domain under nonterminal S because, when we find out later what a variable stands for, what it stands for might not be self-evaluating; nor under nonterminal A because what it stands for might not be active. So x<sub>p</sub> has to go directly under nonterminal T.
<blockquote>
T ::= x<sub>p</sub>
</blockquote>
We also need a binding construct for these variables. It's best to use elementary devices in the calculus, to give lots of opportunities for provable equivalences, rather than big monolithic devices that we'd then be hard-put to analyze. So we'll use a traditional one-variable construct, and expect to introduce other devices to parse the compound definiends that were handled, in the variable-less calculus, by function <i>match</i>.
<blockquote>
S ::= ⟨λ x<sub>p</sub>.T⟩
</blockquote>
governed by, essentially, the usual β-rule of λ-calculus:
<blockquote>
[combine ⟨λ x<sub>p</sub>.T⟩ V e] → T[x<sub>p</sub> ← V]
</blockquote>
That is, combine a λ-expression by substituting its operand (V) for its parameter (x<sub>p</sub>) in its body (T). Having decided to bind our variables x<sub>p</sub> one at a time, we use three additional operative structures to deliver the various parts of the combination one at a time (a somewhat souped-up version of <a href="https://en.wikipedia.org/wiki/Currying">currying</a>): one structure for processing a null list of operands, one for splitting a dotted-pair operand into its two halves, and one for capturing the dynamic environment of the combination.
<blockquote>
S ::= ⟨λ<sub>0</sub>.T⟩ | ⟨λ<sub>2</sub>.T⟩ | ⟨ε.T⟩
</blockquote>
The corresponding rewrite rules are
<blockquote>
[combine ⟨λ<sub>0</sub>.T⟩ <tt>()</tt> e] → T<br>[combine ⟨λ<sub>2</sub>.T<sub>0</sub>⟩ <tt>(</tt>T<sub>1</sub> <tt>.</tt> T<sub>2</sub><tt>)</tt> e] → [combine [combine T<sub>0</sub> T<sub>1</sub> e] T<sub>2</sub> e]<br>[combine ⟨ε.T<sub>0</sub>⟩ T<sub>1</sub> e] → [combine [combine T<sub>0</sub> e ⟪⟫] T<sub>1</sub> e]
</blockquote>
</p>
<p>
Unlike the variable-less calculus, where the combine rewrite rule initiated evaluation of the body of an operative, here evaluation of the body must be built into the body when the operative is constructed. This would be handled by the δ-rules (specialized operative-call rewrite rules) for evaluating function definitions. For example, for variables x,y and standard environment e<sub>0</sub>,
<blockquote>
[eval <code>(<i>$lambda</i> (x) (+ 0 x))</code> e<sub>0</sub>] →<sup>+</sup> [wrap ⟨ε.⟨λy.⟨λ<sub>2</sub>.⟨λx.⟨λ<sub>0</sub>.[eval <code>(+ 0 x)</code> ⟪<code>x</code> ← x⟫ · e<sub>0</sub>]⟩⟩⟩⟩⟩]
</blockquote>
Variable y is a dummy variable used to discard the dynamic environment of the call, which is not used by ordinary functions. Variable x is our placeholder, in the constructed term to evaluate the body, for the unknown operand to be provided later.
</p>
<p>
The innermost redex (<b>red</b>ucible <b>ex</b>pression) here, [eval <code>(+ 0 x)</code> ⟪<code>x</code> ← x⟫ · e<sub>0</sub>], can be rewritten through a series of steps,
<blockquote>
[eval <code>(+ 0 x)</code> ⟪<code>x</code> ← x⟫ · e<sub>0</sub>]<br>   → [combine [eval <code>+</code> ⟪<code>x</code> ← x⟫ · e<sub>0</sub>] <code>(0 x)</code> ⟪<code>x</code> ← x⟫ · e<sub>0</sub>]<br>   → [combine [wrap +] <code>(0 x)</code> ⟪<code>x</code> ← x⟫ · e<sub>0</sub>]<br>   →<sup>+</sup> [combine + <code>(</code>0 x<code>)</code> ⟪<code>x</code> ← x⟫ · e<sub>0</sub>]
</blockquote>
Where we can go from here depends on additional information of one or another kind. We may have a rule that tells us the addition operative + doesn't use its dynamic environment, so that we can garbage-collect the environment,
<blockquote>
  → [combine + <code>(</code>0 x<code>)</code> ⟪⟫]
</blockquote>
If we have some contextual information that the value of x will be numeric, and a rule that zero plus any number is that number back again, we'd have
<blockquote>
  → x
</blockquote>
At any rate, we only have the opportunity to even start the partial evaluation of the body, and contemplate these possible further steps, because the introduction of variables allowed us to write a term for the partial evaluation in the first place.
</p>
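<p>
To make that last step concrete: "zero plus any number is that number back again" is just an algebraic simplification over body terms, the kind of rule a partial evaluator can apply once variables give it something to hold onto. A throwaway Scheme sketch, with bodies as quoted s-expressions (a representation of my own choosing; and recall that in the calculus these rules are only sound given the contextual information that the value of x is numeric):
<blockquote>
<pre>
(define (simplify t)
  (cond ((and (pair? t) (eq? (car t) '+) (eqv? (cadr t) 0))
         (simplify (caddr t)))              ; (+ 0 T) -> T
        ((and (pair? t) (eq? (car t) '*) (eqv? (cadr t) 1))
         (simplify (caddr t)))              ; (* 1 T) -> T
        ((pair? t) (map simplify t))        ; otherwise, simplify subterms
        (else t)))

(simplify '(+ 0 x))   ; => x
(simplify '(* 1 x))   ; => x
</pre>
</blockquote>
Both bodies now visibly reduce to the same term, which is exactly the equivalence the variable-less calculus was unable to express.
</p>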
<blockquote>
<i>[edit: I'd goofed, in this post, on the combination rule for λ<sub>0</sub>; it does not of course induce evaluation of T. Fixed now.]</i>
</blockquote>
<span style="font-size: large;" id="sec-contin-glob">Continuations by global rewrite</span>
<p>
The idea of using λ-calculus to model programming language semantics goes back <i>at least</i> to Peter Landin in the early 1960s, but there are a variety of programming language features that don't fit well with λ-calculus. In 1975, Gordon Plotkin proposed a remedy for one of these features — eager argument evaluation, whereas ordinary λ-calculus allows lazy argument evaluation and thereby has different termination properties. Plotkin designed a variant calculus, the λ<sub>v</sub>-calculus, and proved that on one hand λ<sub>v</sub>-calculus correctly models the semantics of a programming language with eager argument evaluation, while on the other hand it is comparably well-behaved to traditional λ-calculus. Particularly, the calculus rewriting relation is compatible and Church-Rosser, and satisfies soundness and completeness theorems relative to the intended operational semantics. (I covered those properties and theorems a bit more in an <a href="http://fexpr.blogspot.com/2013/07/explicit-evaluation.html">earlier post</a>.)
</p>
<p>
In the late 1980s, Matthias Felleisen showed that a technique similar to Plotkin's could be applied to other, more unruly kinds of programming-language behavior traditionally described as "side-effects": sequential control (continuations), and sequential state (mutable variables). This bold plan didn't quite work, in that he had to slightly weaken the well-behavedness properties of the calculus. In both cases (control and state), the problem is to distribute the consequences of a side-effect to everywhere it needs to be known; and Felleisen did this by having special constructs that would "bubble up" through the term, carrying the side-effect with them, until they encompassed the whole term, at which point there would be a whole-term rewriting rule to distribute the side-effect to everywhere it needed to go. The whole-term rewriting rules were the measure by which the well-behavedness of the calculus would fail, as whole-term rewriting isn't compatible.
</p>
<p>
For sequential control (our central interest here), Felleisen added two operators, <b>C</b> and <b>A</b>, to λ<sub>v</sub>-calculus. The syntax of λ<sub>v</sub>-calculus, before the addition, is just that of λ-calculus:
<blockquote>
T ::= x | (λx.T) | (TT)
</blockquote>
In place of the classic β-rule of λ-calculus, λ<sub>v</sub>-calculus has β<sub>v</sub>, which differs in that the operand in the rule must be a value (a variable or a λ-term), so that any redexes remaining in an operand can only occur inside λ-terms:
<blockquote>
((λx.T)V) → T[x ← V]
</blockquote>
The operational semantics, which acts only on whole terms, uses (per Felleisen) an evaluation context E to uniquely determine which subterm is reduced:
<blockquote>
E ::= ⎕ | (ET) | ((λx.T)E)
</blockquote>
<blockquote>
E[((λx.T)V)] ↦ E[T[x ← V]]
</blockquote>
For the control calculus, the term syntax adds <b>A</b> and <b>C</b>,
<blockquote>
T ::= x | (λx.T) | (TT) | (<b>A</b>T) | (<b>C</b>T)
</blockquote>
Neither of these operators has the semantics of call-with-current-continuation. Instead, (<b>A</b>T) means "abort the surrounding computation and just do T", while (<b>C</b>T) means "abort the surrounding computation and apply T to the (aborted) continuation". Although it's <i>possible</i> to build conventional call-with-current-continuation out of these primitive operators, the primitives themselves are obviously intrinsically whole-term operators. Operationally, evaluation contexts don't change at all, and the operational semantics has additional rules
<blockquote>
E[(<b>A</b>T)] ↦ T<br>E[(<b>C</b>T)] ↦ T(λx.(<b>A</b>E[x])) for unused variable x
</blockquote>
The compatible rewrite relation, →, has rules to move the new operators upward through a term until they reach its top level. The compatible rules for <b>A</b> are dead easy:
<blockquote>
(<b>A</b>T<sub>1</sub>)T<sub>2</sub> → <b>A</b>T<sub>1</sub><br>V(<b>A</b>T) → <b>A</b>T
</blockquote>
Evidently, though, once the <b>A</b> operator reaches the top of the term, the only way to get rid of it, so that computation can proceed, is a whole-term rewrite rule,
<blockquote>
<b>A</b>T ᐅ T
</blockquote>
The whole-term rule for <b>C</b> is easy too,
<blockquote>
<b>C</b>T ᐅ T(λx.(<b>A</b>x))
</blockquote>
but the compatible rewrite rules for <b>C</b> are, truthfully, just a bit frightening:
<blockquote>
(<b>C</b>T<sub>1</sub>)T<sub>2</sub> → <b>C</b>(λx<sub>1</sub>.(T<sub>1</sub>(λx<sub>2</sub>.(<b>A</b>(x<sub>1</sub>(x<sub>2</sub>T<sub>2</sub>)))))) for unused x<sub>k</sub><br>V(<b>C</b>T) → <b>C</b>(λx<sub>1</sub>.(T(λx<sub>2</sub>.(<b>A</b>(x<sub>1</sub>(Vx<sub>2</sub>)))))) for unused x<sub>k</sub>
</blockquote>
This does produce the right behavior (unless I've written it out wrong!), but it powerfully perturbs the term structure; Felleisen's description of this as "bubbling up" is apt. Imho, it's quite a marvelous achievement, especially given the prior expectation that nothing of the kind would be possible — an achievement in no way lessened by what can now be done with a great deal of hindsight.
</p>
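<p>
As an aside, the construction of conventional call-with-current-continuation from these primitives, alluded to above, is itself quite compact. The following encoding is standard in the literature, though the transcription into the present notation is mine:
<blockquote>
(<tt>call/cc</tt> T) → (<b>C</b>(λx<sub>1</sub>.(x<sub>1</sub>(Tx<sub>1</sub>)))) for unused x<sub>1</sub>
</blockquote>
Operationally, E[(<b>C</b>(λx<sub>1</sub>.(x<sub>1</sub>(Tx<sub>1</sub>))))] ↦ (λx<sub>1</sub>.(x<sub>1</sub>(Tx<sub>1</sub>)))k → k(Tk), where k = (λx.(<b>A</b>E[x])). If T returns a value without invoking k, the outer application of k re-installs E around that value; if T invokes k, E is re-installed around the thrown value; either way, the behavior is that of call-with-current-continuation.
</p>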
<p>
The perturbation effect appears to me, in retrospect, to be a consequence of describing the control flow of continuations using function-application structure. My own approach makes no attempt to imitate function-application and, seemingly as a result, its constructs move upward without the dramatic perturbation of Felleisen's <b>C</b>.
</p>
<p>
Various constraints can be tampered with to produce more well-behaved results. Felleisen later proposed to adjust the target behavior — the operational semantics — to facilitate well-behavedness, in work considered formative for the later notion of delimited continuations. The constraint I've tampered with isn't a formal condition, but rather a self-imposed limitation on what sort of answers can be considered: I introduce a new binding construct whose form doesn't resemble λ, and whose rewriting rules use a different substitution function than the β-rule.
</p>
<span style="font-size: large;" id="sec-contin-cvar">Control variables</span>
<p>
Consider the following Scheme expression:
<blockquote>
<tt>(call/cc (lambda (c) (c 3)))</tt>
</blockquote>
Presuming this is evaluated in an environment with the expected binding for <tt>call/cc</tt>, we can easily see it is operationally equivalent to <tt>3</tt>. Moreover, our reasoning to deduce this is evidently local to the expression; so why should our formalism have to rewrite the whole surrounding term (perturbing it in the process) in order to deduce this?
</p>
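<p>
The equivalence is also easy to confirm empirically, at any Scheme prompt:
<blockquote>
<pre>
(* 2 (call/cc (lambda (c) (c 3))))        ; => 6, same as (* 2 3)

;; Even a non-trivial use of the continuation can be reasoned about
;; locally: the 100 below is simply never reached.
(+ 1 (call/cc (lambda (c) (c 3) 100)))    ; => 4
</pre>
</blockquote>
It is just this sort of local equivalence that the rewrite rules below are designed to capture within the calculus.
</p>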
<p>
Suppose, instead of Felleisen's strategy of bubbling-up a side-effect to the top level of a term and then distributing it from there, we were to bubble-up (or, at least, migrate up) a side-effect to some sort of <i>variable-binding construct</i>, and then distribute it from there by some sort of substitution function to all free occurrences of the variable within the binding scope. The only problem, then, would be what happens if the side-effect has to be distributed more widely than the given scope — such as if a first-class continuation gets carried out of the subterm in which it was originally bound — and that can be solved by allowing the binding construct itself to migrate upward in the term, expanding its scope as much as necessary to encompass all instances of the continuation.
</p>
<p>
I did this originally in vau-calculus, of course, but for comparison with Felleisen's <b>A</b>/<b>C</b>, let's use λ<sub>v</sub>-calculus instead. Introduce a second domain of control variables, x<sub>c</sub>, disjoint from x<sub>p</sub>, and "catch" and "throw" term structures (κx.T) and (τx.T).
<blockquote>
T ::= x<sub>p</sub> | (TT) | (λx<sub>p</sub>.T) | (κx<sub>c</sub>.T) | (τx<sub>c</sub>.T)
</blockquote>
Partial-evaluation variables are bound by λ, control variables are bound by κ (catch). Control variables aren't terms; they can only occur free in τ-expressions, where they identify the destination continuation for the throw. κ and τ are evaluation contexts; that is,
<blockquote>
E ::= ... | (κx.E) | (τx.E)
</blockquote>
</p>
<p>
The rewrite rules for τ are pretty much the same as for Felleisen's <b>A</b>, except that there is now a compatible rewrite rule for what to do once the throw reaches its matching catch, rather than a whole-term rewrite for eliminating the <b>A</b> once it reaches the top of the term.
<blockquote>
(τx.T<sub>1</sub>)T<sub>2</sub> → τx.T<sub>1</sub><br>V(τx.T) → τx.T<br>κx.(τx.T) → κx.T
</blockquote>
What about rewrite rules for κ? One simple rule we need, in order to relate expressions with κ to expression without, is "garbage collection":
<blockquote>
κx.T → T if x does not occur free in T
</blockquote>
We also want rules for κ to migrate upward —non-destructively— when it occurs in an evaluation context; but κ may be the target of matching τ expressions, and if we move the κ without informing a matching τ, that τ will no longer do what it was meant to. Consider a κ, poised to move upward, with a matching τ somewhere in its body (embedded in some context C that doesn't capture the control variable).
<blockquote>
V(κx.C[τx.T])
</blockquote>
If C happens to be an evaluation context, then it is possible for the τ to move upward to meet the κ and disappear; and, supposing x doesn't occur free in T, we'd have (VT). Even if C isn't an evaluation context, (τx.T) thus represents the <i>potential</i> to form (VT). If we move the κ over the V, then, in order for the τ to still represent the same potential it did before, we'd have to change it to (τx.(VT)). And this has to happen for <i>every</i> matching τ. So let's fashion a substitution function for control variables, T[x ← C] where C doesn't capture any variables:
<blockquote>
y[x ← C] → y<br>(T<sub>1</sub>T<sub>2</sub>)[x ← C] → ((T<sub>1</sub>[x ← C])(T<sub>2</sub>[x ← C]))<br>(λy.T)[x ← C] → (λy.(T[x ← C])) where y isn't free in C<br>(κy.T)[x ← C] → (κy.(T[x ← C])) where y isn't x or free in C<br>(τy.T)[x ← C] → (τy.(T[x ← C])) if y isn't x<br>(τx.T)[x ← C] → (τx.C[T[x ← C]])
</blockquote>
The "where" conditions are met by α-renaming as needed. <i>Now</i> we're ready to write our rewrite rules for moving κ upward:
<blockquote>
(κx.T<sub>1</sub>)T<sub>2</sub> → κx.(T<sub>1</sub>[x ← ⎕T<sub>2</sub>] T<sub>2</sub>) where x isn't free in T<sub>2</sub><br>V(κx.T) → κx.(V(T[x ← V⎕])) where x isn't free in V<br>κy.(κx.T) → κy.(T[x ← y])
</blockquote>
As advertised, κ moves upward without perturbing the term structure (contrast with the bubbling-up rules for <b>C</b>). If we need a first-class continuation, we can simply wrap τ in λ: (λy.(τx.y)). The δ-rule for call/cc would be
<blockquote>
(<tt>call/cc</tt> T) → κx.(T(λy.(τx.y))) for unused x
</blockquote>
If this occurs in some larger context, and the first-class continuation escapes into that larger context, then the matching κ will have had to migrate outward ahead of it, across some evaluation context E, and the attendant substitutions will have transformed the continuation to (λy.(τx.E[y])).
</p>
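<p>
Since the control-variable substitution function carries most of the weight here, it may help to see it spelled out executably. Below is a sketch in Scheme over a naive tagged-list representation of terms, with contexts represented as procedures from terms to terms. α-renaming is omitted (variables are assumed already distinct), so this is illustrative rather than a full implementation, and all the names are my own:
<blockquote>
<pre>
;; T[x ← C], for control variable x and context ctx.
(define (subst-c t x ctx)
  (cond ((symbol? t) t)                            ; y[x ← C] = y
        ((eq? (car t) 'app)                        ; (T1 T2)[x ← C]
         (list 'app (subst-c (cadr t) x ctx)
                    (subst-c (caddr t) x ctx)))
        ((eq? (car t) 'lam)                        ; (λy.T)[x ← C]
         (list 'lam (cadr t) (subst-c (caddr t) x ctx)))
        ((eq? (car t) 'kappa)                      ; (κy.T)[x ← C]
         (list 'kappa (cadr t) (subst-c (caddr t) x ctx)))
        ((eq? (car t) 'tau)
         (let ((body (subst-c (caddr t) x ctx)))
           (if (eq? (cadr t) x)
               (list 'tau x (ctx body))            ; (τx.T)[x ← C] = (τx.C[T[x ← C]])
               (list 'tau (cadr t) body))))))

;; The rule (κx.T1)T2 → κx.((T1[x ← ⎕T2]) T2), as a function;
;; assumes x isn't free in t2.
(define (kappa-over-operator t1 x t2)
  (list 'kappa x
        (list 'app
              (subst-c t1 x (lambda (hole) (list 'app hole t2)))
              t2)))

;; e.g., moving the κ of (κk.(τk.y))z:
(kappa-over-operator '(tau k y) 'k 'z)
;;  => (kappa k (app (tau k (app y z)) z))
</pre>
</blockquote>
</p>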
<span style="font-size: large;" id="sec-contin-ins">Insights</span>
<p>
The orthogonality of continuation control-flow to function application is, in the lower-level math, rather explicitly demonstrated by the way κ moves smoothly upward through the term, in contrast to the perturbations of the bubbling-up rules for <b>C</b> as it forcibly interacts with function-application structure. The encapsulation of a τ within a λ to form a first-class continuation seals the deal.
</p>
<p>
The notion that continuations are a whole-term phenomenon —or, indeed, that any side-effect is a whole-term phenomenon— breaks down under the use of floating binding-constructs such as κ, which doesn't <i>require</i> the side-effect to remain encapsulated within a particular subterm, but does <i>allow</i> it to do so and thus allows local reasoning about it to whatever extent its actual behavior remains local. Whether or not that makes traditional continuations "undelimited" is a question of word usage: the κ binding-construct is a delimiter, but a movable one.
</p>
<p>
As a matter of tangential interest here, the vau-calculus handling of sequential state involves <i>two</i> new kinds of variables and <i>four</i> new syntactic constructs (two of which are binders, one for each of the new kinds of variables). Here's a sketch: Mutable data is contained in symbol-value assignments, which in turn are attached to environments; the identity of an environment is a variable, and its binding construct defines the region of the term over which that environment may be accessible. Assignments are a separate, non-binding syntactic construct, which floats upward toward its matching environment-binding. When a symbol is evaluated in an environment, a pair of syntax elements are created at the point of evaluation: a <i>query</i> construct to seek an assignment for the given symbol in the given environment, which binds a query-variable, and within it a matching <i>result</i> construct with a free occurrence of the query-variable. The result construct is an indivisible term. The query is a compound, which migrates upward through the term looking for an assignment for the symbol in the environment. When the query encounters a matching assignment, the query is annihilated (rather as a τ meeting its matching κ), while performing a substitution that replaces all matching result-constructs with the assigned value (there may by this time be any number of matching result-constructs, since the result of the original lookup may have been passed about in anticipation of eventually finding out what its value is).
</p>
<p>
As a final, bemusing note, there's a curious analogy (which I footnoted in my dissertation) between variables in the full side-effectful vau-calculus, and fundamental forces in physics. The four forces of nature traditionally are gravity, electromagnetism, strong nuclear force, and weak nuclear force; one of these —gravity— is quite different from the others, with a peculiar sort of uniformity that the others lack (gravity is only attractive). Whilst in vau-calculus we have four kinds of variables (partial-evaluation, control, environment, and query), of which one —partial-evaluation— is quite different from the others, with a peculiar sort of uniformity that the others lack (each of the <i>other</i> kinds of variable has an α-renaming substitution to maintain hygiene, and one or more separate kinds of substitution to aid its purpose in rewriting; but partial-evaluation variables have only one kind of substitution, of which α-renaming is a special case).
<blockquote><i>[Note: I later explored the physics analogy in a separate post, <a href="http://fexpr.blogspot.com/2014/04/why-is-beta-substitution-like-higgs.html">here</a>.]</i></blockquote>
</p>
John Shutthttp://www.blogger.com/profile/00041398073010099077noreply@blogger.com22