Tuesday, March 17, 2020

Irregularity in language

you wouldn't believe the kind of hate mail I get about my work on irregular verbs
Steven Pinker, in an interview with The Guardian, 2007.

Assembling my prototype conlang Lamlosuo transformed my understanding of irregularity in language.  That was unexpected.  The prototype was supposed to be a learning vehicle, yes — for learning about the language model I'd devised.  Irregularity wasn't mentioned on the syllabus.

I set out to create an experimental prototype conlang with radically different semantic underpinnings from those of human natural languages (natlangs).  (This blog is littered with evidence of my penchant for studying the structure of things by devising alternative structures to contrast with them.)  The prototype was meant as a testbed for trying out features for conlangs based on the envisioned semantics; it had no strong stake in regularity, one way or another, aside from an inclination not to deliberately build in irregularities that would make the testbed less convenient to work with.  The effect of the experiment, though, was rather like scattering iron filings on a piece of paper above a magnet, and thereby revealing unsuspected, ordinarily-invisible structure.  From contemplating the shape of the prototype that emerged, I've both revised my thinking on irregularity in general, and drawn take-away lessons on the character of the language structure the prototype is actually meant to explore.

My first post on Lamlosuo, several years ago now, laid out the premise of the project and a limited set of its structural consequences, while deferring further complications —such as an in-depth discussion of irregularity— to some later post.  This post is its immediate sequel, describing major irregular elements of Lamlosuo as they emerged, as well as what I learned from them about irregularity in general and about the language model in particular.

[Overall insights about the language project are largely —though by no means entirely— concentrated in the final section below.  Insights into irregularity are distributed through the discussion, as they arise from details of the language.]

Contents
Irregularity
Vector language
Regularity
Routine idiosyncrasies
Patterns of variation
Extraordinary idiosyncrasies
Whither Lamlosuo?
Irregularity

From our early-1970s hardcopy Britannica (bought by my parents to support their children's education), I gathered that commonly used words tend to accumulate irregularities, while uncommonly used words tend to accumulate regularity by shedding their irregularities.  From 1990s internet resources on conlanging (published there by scattered conlangers as they reached out through the new medium to form a community), I gathered that irregularity may be introduced into a conlang to make it feel more naturalistic.  All of which I still believe, but these credible ideas can easily morph into a couple of major misapprehensions about irregularity, both of which I was nursing by the time I committed to my first conlanging project, at the turn of the century:  that the only reason natlangs have irregularity is that natlangs evolve randomly in the field, so that a planned conlang would only have irregularity if the designer deliberately put it there; and that irregularity serves no useful function in a language, so that desire for naturalism would be the only reason a conlang designer would put it there.

20 years later, I'd put my current understanding this way:  Irregularity is a natural consequence of the impedance mismatch between the formal structure of language and the sapient semantics communicated through it (a mismatch I last blogged about yonder).  Sapient thought structures are too volatile to fit neatly into a single rigid format; large parts of a language, relatively far from its semantic core, may be tolerably regular, but the closer things get to its semantic core, the more often they call for variant structure.  It may even be advantageous for elements near the core to be just slightly out of tune with each other, so they create (to use another physics metaphor) a complex interference pattern that can be exploited to slip sapient-semantic notions through the formal structure.  Conversely, one may be able to deduce where the semantic core of the language is, from where this effect stirs up irregularity.  By similar feedback, also, structural nonuniformities can orient sapient users of the language as they work intensively with the semantic core; I analogize this with the bumps on the F and J keys of a QWERTY keyboard, which allow a touch-typist to feel when their fingers are in standard position.

These effects are likely to apply as well to programming languages, which are ultimately vehicles for sapient thought.  Note that the most peculiar symbol names of Lisp are concentrated at its semantic core:  lambda, car, cdr.

Vector language

My central goal for this first conlanging project was to entirely eliminate nouns and verbs, in a grammatical sense, by replacing the verb-with-arguments structural primitive of human natlangs with some fundamentally different structural primitive.  The verb-with-arguments structural pattern induces asymmetry between the grammatical functions of the central "verb" and the surrounding "nouns", which afaics is where the grammatical distinction between verbs and nouns comes from.  (My notes also call these "being-doing" languages, as verbs commonly specify "doing" something while nouns specify simply "being" something.)  In the structure I came up with to replace this, each content element would be, uniformly, an act of motion ("going"), understood to entail a thing that goes (the cursor), where it's going from and to, and perhaps some other elements such as the path by which it goes.  For the project as a whole I hoped to have several related languages and some grammatical variance between them, but figured I'd need first to understand better how a language of this sort can work, to understand the kinds of variation possible.  So I set out to build a prototype language, to serve as a testbed for studying whether-and-how the language model could work.

In the prototype language, there is just one open class of vocabulary words, called vectors, each of which has five participant slots, called roles.  The five roles are:  cursor, start, end, path, and pivot.  The name pivot suggests that the action is somehow oriented about the pivot element, but really the pivot role is a sort of catch-all, a place to put an additional object associated with the action in some way.  The pivot role in itself says something about irregularity.  In lexicon building, each vector has definitions for each of its occupied roles.  Defining all these roles for a given vector, I've found, establishes the meaning of the vector with great clarity.  The cursor is the only absolutely mandatory role:  there can't be a going without something that goes.  The start and end are usually clear.  The path is usually fairly straightforward as well, though sometimes occupied by an abstract process rather than a physical route of travel.  But each vector is, in the end, semantically unique; and its uniqueness rebels against being pinned down precisely into a predetermined form —I analogize this to the Heisenberg uncertainty principle, where constraining one part of a particle's description requires greater leeway for another part— so that while the cursor, start, and end are usually quite uniform, and the path has limited flexibility, the pivot provides more significant slack to accommodate the idiosyncrasy of each vector.

For example:  The first meaning I worked out for the language was a vector meaning speak.  This was before the language even had a phonology; it was meant to verify, before investing further in the structural primitive, that it was capable of handling abstracts; and speak, as a meaning in a conlang, was appealingly meta.  In a speech act, it seemed the thing that goes from somewhere to somewhere is the message; so I reckoned the cursor should be the message, the thing said.  The start would then be the speaker; and the end would be whomever receives it, the audience.  It was unclear whether the path would be more usefully assigned to the route by which the message travels, or the transmission medium through which it travels (such as the air carrying sound, or aether carrying radio waves); waiting for a preference to emerge, I toyed with one or the other in my notes but ultimately the path role of that vector has remained unoccupied.  For the pivot, I struck on the idea of making it the language in which the message is expressed (such as English — or Lamlosuo). 
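Since the role structure is so uniform, a lexicon entry is easy to model in code.  Below is a minimal sketch in Python; the class and field names are my own illustrative choices, not anything from the language notes.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Vector:
    """A lexicon entry: a stem plus definitions for its five roles.
    Only the cursor is mandatory; an unoccupied role stays None."""
    stem: str
    cursor: str
    start: Optional[str] = None
    end: Optional[str] = None
    path: Optional[str] = None
    pivot: Optional[str] = None

# The speak vector as described above; its path role remains unoccupied.
speak = Vector(stem="losu", cursor="the message", start="the speaker",
               end="the audience", pivot="the language of expression")
```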

The "escape-valve" pattern —regularity with an outlet to accommodate variant structure that doesn't neatly fit the regularity— recurred a number of times in the language design as it gradually emerged.  The various escape mechanisms accommodate different grades of variant structure, and while the relations between these devices are more complex than mere nesting, the whole reminds me somewhat of a set of matryoshka dolls.  With that image in mind, I'm going to try to order my description of these devices from the outside in, from the broadest and mildest irregularities to the narrowest and most extreme.

It's a fair question, btw, where all this emergent structure in the prototype emerges from.  It all comes through my mind; the question is, what was I tapping into?  (I'll set the origin of the vector primitive itself outside the scope of the question, as the initial inspiration seems rather discontinuous whereas the process after that may be somewhat fathomable.)  My intent has been to access the platonic structure of the language model; that's platonic with a lower-case p, meaning, structure independent of particular minds in the same sense that mathematical structure is independent of particular minds.  Given the chosen language primitive, I've tried through the prototype to explore the contours of the platonic structural space around that chosen primitive, letting natural eddies in that space shape the design while, hopefully, reducing perturbations from biases-of-thought enough to let the natural eddies dominate.  (I also have some things I could say on the relationship between platonic structure and sapient thought, which I might blog about at some point if I can figure out how to address it without getting myself, and possibly even those who read it, hopelessly mired in a quag of perspective bias.)

Regularity

The outermost nesting shell —the outer matryoshka doll, as it were— is, in theory, the entirely regular structure of the language.  I shall attempt to enumerate just those parts in this section, as briskly as may be.  This arrangement turns out to be somewhat challenging, both because language features aren't altogether neatly arranged by how regular they are, and because the noted concentration of irregularity toward the semantic core assures there will be some irregularity in nearly all remotely interesting examples in Lamlosuo (on top of the limitations of Lamlosuo's thin vocabulary).  Much of this material, with a bit more detail on some things and less on others, is included in the more leisurely treatment in the earlier post.

Ordinarily, a syllable has five possible onsets, f s l j w (as in fore, sore, lore, yore, wore);  five possible nuclei, i u e o a (close front, close back, mid front, mid back, open; in my idiolect, roughly as in teem, tomb, tame, tome, tom);  and two possible codas, n m (as in nor, more).  In writing a word, if a front vowel (i or e) is followed by j and another vowel, or if a back vowel (u or o) is followed by w and another vowel, the consonant between those vowels is omitted; for example, lam‍losu‍wo would be shortened to lam‍losu‍o.  Two other sounds occasionally arise:  an allophone of f, written as t (the initial sound of thorn);  and one plosive, written as an apostrophe, ' (the initial sound of tore).
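The contraction rule is mechanical enough to sketch in a few lines of code.  This is a toy model in Python, my own formulation, operating on plain-ASCII spellings:

```python
# Written-form contraction: between vowels, j drops after a front vowel
# (i, e) and w drops after a back vowel (u, o).
FRONT, BACK, VOWELS = set("ie"), set("uo"), set("iueoa")

def contract(word):
    out = []
    for i, ch in enumerate(word):
        prev = word[i - 1] if i > 0 else ""
        nxt = word[i + 1] if i + 1 < len(word) else ""
        if ch == "j" and prev in FRONT and nxt in VOWELS:
            continue
        if ch == "w" and prev in BACK and nxt in VOWELS:
            continue
        out.append(ch)
    return "".join(out)

print(contract("lamlosuwo"))  # lamlosuo
```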

A basic vector word consists of an invariant stem and a mandatory class suffix.  The stem is two or more consonant-vowel syllables (accent on the first syllable of the stem), and the class suffix is one consonant-vowel syllable.  There are eleven classes:  the neutral class, and ten genders;  a neutral vector is sort-of a lexical verb, an engendered vector is sort-of a lexical noun (though this distinction lacks grammatical punch, as they're all still vectors).  The neutral suffix after a back vowel (u or o) is ‑wa, otherwise it's ‑ja (so, the suffix consonant is omitted unless the stem ends with a).  Genders identify role (one of the five) and volitionality (volitional or non-volitional).  Non-volitional genders use front vowels, volitional genders use back vowels; the onset determines the role:  ‑li/‑lu cursor, ‑ti/‑tu start, ‑se/‑so end, ‑je/‑jo path, ‑we/‑wo pivot.  Somewhat relevant to irregularity, btw:  start and end genders deliberately use different vowels to strengthen their phonological contrast since they have relatively weak semantic contrast; while, on the other hand, an earlier experiment in the language determined that assigning the vowels in consistent sets (either i/u or e/o, never i/o or e/u) is a desirable regularity to avoid confusion.

For example:  The vector meaning speak has stem losu-.  The neutral form is losua; engendered forms are losuli (message, non-volitional), losutu (speaker, volitional), losuso (audience, volitional), losuo/losue (living language/non-living language).  My first thought for the non-volitional pivot, losue, was dead language; but then it occurred to me that that gender would also suit a conlang.
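Putting the suffix table together with the contraction rule from the phonology section, the whole engendering process can be sketched as follows.  This is a toy model; the function names and data representation are mine:

```python
FRONT, BACK, VOWELS = set("ie"), set("uo"), set("iueoa")

def contract(word):
    # drop j after a front vowel, or w after a back vowel, before a vowel
    return "".join(ch for i, ch in enumerate(word)
                   if not (0 < i < len(word) - 1
                           and word[i + 1] in VOWELS
                           and ((ch == "j" and word[i - 1] in FRONT)
                                or (ch == "w" and word[i - 1] in BACK))))

GENDER = {  # (role, volitional) -> gender suffix
    ("cursor", False): "li", ("cursor", True): "lu",
    ("start", False): "ti",  ("start", True): "tu",
    ("end", False): "se",    ("end", True): "so",
    ("path", False): "je",   ("path", True): "jo",
    ("pivot", False): "we",  ("pivot", True): "wo",
}

def neutral(stem):
    # -wa after a back vowel, otherwise -ja; contraction then drops the
    # suffix consonant unless the stem ends in a
    return contract(stem + ("wa" if stem[-1] in BACK else "ja"))

def engender(stem, role, volitional=False):
    return contract(stem + GENDER[(role, volitional)])

print(neutral("losu"), engender("losu", "pivot", True))  # losua losuo
```

Running this against the speak paradigm reproduces all six forms given above, which is a pleasant check that the suffixing and contraction rules compose as described.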

Vector words can also take any of a limited set of prefixes, each of the form consonant-vowel-consonant; as the two coda consonants are very similar (m and n), I try to avoid using two prefixes that differ only by coda.  In ideal principle, each prefix would modify its vector in a uniform way.  A vector prefix can also be detached from the vector it modifies, to become a preposition. 

A simple clause is a chain of vectors, where each pair of consecutive vectors in the chain is connected by means of role alignment.  Generically, one puts between the two vectors first a dominant role particle, which specifies a role of the first vector (the dominant vector in the alignment), then a subordinate role particle specifying a role of the second vector (the subordinate vector in the alignment), indicating that the same object occupies those two roles.  Ordinarily, the dominant role particles are just the volitional gender suffixes, the subordinate role particles are just the non-volitional gender suffixes, all now as standalone words, except using f rather than t for the start particles.  For instance, losua fu li susu‍a would equate the start of losua with the cursor of susu‍a.  If a vector is engendered, one may omit its role particle from an alignment, in which case by default it aligns on its engendered role (though an engendered vector can be explicitly aligned on any of its roles).  There is also a set of combined role particles, using the usual role consonants with vowel a; a combined role particle aligns both vectors on that role.
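To make the mechanics concrete, here is a toy reading of a minimal alignment in Python.  It is my own representation, handling only the simple four-word shape of the example:

```python
# Particle tables read off from the gender suffixes, with f for t
# in the start particles.
DOMINANT = {"lu": "cursor", "fu": "start", "so": "end",
            "jo": "path", "wo": "pivot"}
SUBORDINATE = {"li": "cursor", "fi": "start", "se": "end",
               "je": "path", "we": "pivot"}

def read_alignment(sentence):
    # vector, dominant particle, subordinate particle, vector
    v1, dom, sub, v2 = sentence.split()
    return (v1, DOMINANT[dom], v2, SUBORDINATE[sub])

v1, r1, v2, r2 = read_alignment("losua fu li susua")
print(f"the {r1} of {v1} is the {r2} of {v2}")
# the start of losua is the cursor of susua
```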

Each of the fifteen basic role particles (five dominant, five subordinate, five combined) has a restrictive variant; the distinction being that a non-restrictive alignment asserts a relationship between vectors whose meanings are determined by other means, while a restrictive alignment must be taken into account in determining the meanings of the vectors.  Each restrictive role particle prefixes the corresponding non-restrictive particle with its own vowel; thus, ja → a‍ja, etc.
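The restrictive construction interacts neatly with the contraction rule from the phonology section; a quick sketch (again my own code, not the language notes):

```python
FRONT, BACK, VOWELS = set("ie"), set("uo"), set("iueoa")

def contract(word):
    # drop j after a front vowel, or w after a back vowel, before a vowel
    out = []
    for i, ch in enumerate(word):
        if 0 < i < len(word) - 1 and word[i + 1] in VOWELS:
            if (ch == "j" and word[i - 1] in FRONT) or \
               (ch == "w" and word[i - 1] in BACK):
                continue
        out.append(ch)
    return "".join(out)

def restrictive(particle):
    # prefix the particle with its own vowel, then spell by contraction
    return contract(particle[1] + particle)

print(restrictive("ja"))  # aja
print(restrictive("wo"))  # oo  (the dominant restrictive pivot)
```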

A clause can be packaged up as an object by preceding it with a subordinate content particle.  A subordinate content particle is simply a single vowel, as a standalone word.  The five subordinate content particles determine the mood of the objectified clause (and can also be used at the front of a sentence to assign a mood to the whole thing):  a, indicative; i, invitational; u, imperative; e, noncommittal; o, tentative.  Having bundled up a clause as an object, one can then treat it as the subordinate half of a role alignment with a dominant vector.  There are also dominant content particles, which package up the dominant vector (just the one vector) as an object to align with some role of the subordinate vector, thus beginning a subordinate relative clause.  Dominant content particles prefix ow- to the corresponding subordinate content particles (the w attaches to the second syllable, and then is dropped since preceded by a back vowel) — with a lone exception for the dominant tentative content particle, which by strictly regular construction should be oo but uses leading vowel u (thus, uo) to avoid confusion with the dominant restrictive pivot particle (oo).  (In crafting that detail, I was reminded of English "its" versus "it's".)
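The content-particle construction, including the collision-avoidance exception, can be sketched the same way (a toy model, assuming the same contraction rule as before):

```python
FRONT, BACK, VOWELS = set("ie"), set("uo"), set("iueoa")

def contract(word):
    out = []
    for i, ch in enumerate(word):
        if 0 < i < len(word) - 1 and word[i + 1] in VOWELS:
            if (ch == "j" and word[i - 1] in FRONT) or \
               (ch == "w" and word[i - 1] in BACK):
                continue
        out.append(ch)
    return "".join(out)

SUBORDINATE_CONTENT = {"a": "indicative", "i": "invitational",
                       "u": "imperative", "e": "noncommittal",
                       "o": "tentative"}

def dominant_content(vowel):
    form = contract("ow" + vowel)  # the w drops, following the back vowel o
    # strictly regular construction gives oo for the tentative, which would
    # collide with the dominant restrictive pivot particle; use uo instead
    return "uo" if form == "oo" else form

print([dominant_content(v) for v in "aiueo"])  # ['oa', 'oi', 'ou', 'oe', 'uo']
```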

The image of a subordinate content particle packaging up a subordinate clause and objectifying it for alignment with a dominant role seems to have built into it a phrase-structure view of the situation.  Possibly there is a way to view the same thing in a dependency-grammar framework (rather like wave-particle duality in physics); the whole constituency/dependency thing is not yet properly clear to me, and when I designed that part of Lamlosuo I was unaware of the whole controversy:  phrase-structure was the only approach to grammar I'd even seen, somewhat in grade-school and intensively in compiler parsing algorithms.  So, this particular part of the language design might or might not contain an embedded structural bias.

A provector has a stem of the form vowel-consonant and a class suffix.  The provector stems are in- (interrogative), um- (recollective), en- (indefinite), on- (relative), an- (demonstrative).  The recollective provector has an antecedent earlier in the clause, and does not align with its syntactic predecessor; where ordinarily alignment can only align a vector with two others (the one before it and the one after it), as antecedent of a recollective provector it can participate in any number of additional alignments.  (The demonstrative provector, btw, serves the function of a third-person pronoun, using cursor an‍lu/an‍li in general, volitional start an‍tu for a person of the same sex/gender as the speaker, volitional end an‍so for a person of different sex/gender from the speaker; but I digress.)

A vector can incorporate a simple clause.  Position the vector at the front of the simple clause, and join the entire clause together with plosives (') between its words; the whole then aligns as its first vector, with the rest of the incorporated clause aligned to it independent of any other surrounding context.  Recollective provectors may be disambiguated by incorporating a copy of the antecedent vector.
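Mechanically, incorporation is just concatenation with plosives; a one-line sketch, using the plain-ASCII apostrophe spelling and a purely illustrative word choice:

```python
def incorporate(vector, clause_words):
    # position the vector at the front, then join the whole with plosives
    return "'".join([vector] + clause_words)

# hypothetical example: selia incorporating the clause from earlier
print(incorporate("selia", ["losua", "fu", "li", "susua"]))
# selia'losua'fu'li'susua
```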

Routine idiosyncrasies

Beyond a vector's definitions of its neutral form and up-to-ten genders, each vector has a number of conventions associated with it that accommodate low-to-medium-grade vector-idiosyncrasies of the sort that occur broadly throughout the vocabulary.  Role alignment is not as simple as "the object that occupies this role of this vector is the same object that occupies that role of that vector":  that isn't always the sort of relation-between-vectors that's wanted, and when it is, there may be refinements needed to clarify what is meant.  The meaning of an alignment is resolved primarily by alignment conventions of the dominant vector.  My notes on the language design suggest that exceptions to the regular sense of alignment are most often associated with vectors corresponding, in a verb-with-arguments language, to conjunctions and helping verbs.

Combined role particles play a significant part in this because, it turns out, the "standard" meaning of the combined role particles —to align the same role of both vectors, thus la = lu li, sa = so se, etc.— is rarely wanted.  The combined role particles are therefore an especially likely choice for reassignment by convention based on more realistic uses of a particular vector.  A given vector often has some practical use, due to the particular meaning of that vector, for alignments that involve multiple roles of each vector (as a simple example, one might equate the cursor of both vectors, and at the same time equate the end of the first vector with the start of the second); or, sometimes, for some other more peculiar alignment strategy appropriate to the vector's particular meaning; and combined role particles are routinely drafted for the purpose.

Several rather ordinary vectors have some role that, by the nature of their meaning, is often a complex information structure described by a subordinate clause, and therefore they use the combined role particle on that role to imply a subordinate noncommittal content particle (e):  losua la — (say that —), lawa‍ja la — (teach that —), susu‍a wa — (dream that —); sofo‍a (deduce) and soo‍a (imply) do this on multiple roles.  A more mundane example of variant alignment conventions (not involving implied content particles) applies to stem seli-, go repeatedly, whose cursor is the set of instances of repeated going.  When dominant in an alignment with combined cursor particle la, the subordinate vector is what is done repeatedly (a restrictive alignment); subordinate start, path, and end are assigned to those dominant roles, while subordinate cursor is assigned to dominant pivot.  Preceding seli- by a number indicates the number of repetitions; for example, siwe‍a seli‍a la jasu‍a = sneeze three times.  (In fact, this can be shortened to siwe seli jasu‍a; see my earlier remark on interesting examples.)

A moderately irregular configuration is two neutral vectors used consecutively in a clause with no explicit particle between them.  The strictly regular language assigns no meaning to this arrangement, as there are no gender suffixes on the vectors to determine default roles when omitting the role particles; the configuration has to depend on conventions of the dominant (or, less likely, the subordinate) vector.  The language notes stipulate that this type of alignment is restrictive.

Patterns of variation

The alignment idiosyncrasies of particular vectors fall into overall patterns.  At the start of Lamlosuo I didn't see this coming, which in retrospect seems part of my general failure to appreciate that irregularity is more than skin deep.  With increasing number of vectors explored in the lexicon, though, I began to sense the shapes of these patterns beneath the surface, and then tried to work out what some of them were.

Because these are patterns that arise in other patterns that arise in the language, they compound the ambiguity between (again) the language's platonic structure versus my latent biases of thinking:  each lexicon entry is subject to this ambiguity, both in the choice of the entry and in its articulation, while the perception of patterns amongst the data points is ambiguous again.  This blog post has a lopsided interest in the platonic structure —my biases would be entirely irrelevant if not for the drive to subtract them from the picture— but I'd recommend here to not stint on even-handed skepticism.  Vulnerable as the process is to infiltration by biases of thinking (the phrase "leaky as a sieve" comes to mind), it should be no less vulnerable to infiltration from the platonic structure of the language.  Influences from the platonic realm can seep in both directly by perturbing interplay of language elements, and indirectly by perturbing biases of thought at any point in the backstory of the thought.  Biased influence can therefore be platonic influence; or, putting the same thing another way, the only biases we'd want to subtract from the picture are those that aren't ultimately caused by the platonic structure.  However murky the process gets, I'd still hope for the emergent patterns to carry evidence of the platonic structure.

Very early on, I'd speculated consecutive neutral vectors might align by chaining sequentially, cursor-to-cursor and end-to-start.  This in its pure form looked less plausible as the lexicon grew, as it became clear that many vectors were of the wrong form.  (For instance, aligning losua susu‍a in this way —susu‍a means sleep— would equate the message with the person who sleeps, and the audience with the act of falling asleep.)  Another early notion was that some vectors would be used to modify other vectors, by aligning in parallel with them — equating cursor-to-cursor, path-to-path, start-to-start, end-to-end.  I've called these modifiers advectors.  Parallel alignment could be assigned, by dominant-vector-driven convention, to consecutive neutral vectors, and perhaps to the combined path particle (ja, which in this case would take on restrictive effect by convention).  The sequential/parallel preference also arises in the semantics of more general alignments, such as the sentence (mentioned earlier) losua fu li susu‍a, which describes a speech act and a sleep act, both by the same person (dominant start, the speaker, is aligned with subordinate cursor, the sleeper); to understand the import of the alignment, one has to know whether the speaking and sleeping events take place in parallel (so that the person is speaking while sleeping) or in series (so that the person speaks and then sleeps).

When the merger of two vectors allows their combination to be treated as a single vector, the vector stems may be concatenated directly, forming a compound stem which can then be engendered after the merge.  For example, stem lolulelo- means father, sequentially combining lolu- (impregnate) and lelo- (give birth to).  According to current notes, btw, lolulelo- has shortened form lole-.

When a series of consecutive neutral vectors forms a compact clause, short of merging into a single compact vector, I've considered a convention that the neutral class suffix may be omitted from all but the last of the sequence — "in all but the most formal modern usage", as the language notes say.  (Evidently I hesitated over this, as the LaTeX document has a Boolean flag in it to turn this language feature on or off; but it's currently on.)

Accumulating vocabulary gradually revealed that pivots generally fell into several groupings:  a reference point defining the action (whence the term pivot);  an intermediate point on the path;  the motivation for the action;  an agent causing the action;  an instrument;  a vehicle.  Listing these now, it seems apparent these are the sorts of things that —in a typical European natlang— might well manifest as a clutter of more-or-less-obscure noun cases.  I'd honestly never thought of those sorts of clutters-of-noun-cases as a form of intermediate-grade irregularity (despite having boggled at, say, Finnish locatives); and now I'm wondering why I hadn't.

Eventually, I worked out a tentative system of three logical roles —patient, agent, instrument— superimposed on the five concrete roles.  These logical roles would map to concrete roles identifying the associated noun primarily affected by the action (patient), initiating the action (agent), and implementing the action (instrument).  Of the three, only patient is mandatory; agent and instrument often occur, but sometimes either or both don't.  Afaics, agent and instrument are always distinct from each other, but either may map to the same concrete role as patient.

Patient is usually either cursor or end, though occasionally pivot or start; "path patient", say my notes, "is unattested". Agent is usually either cursor, start, or pivot; if the patient is the cursor, usually the agent is either pivot or cursor.  Instrument is usually either cursor or pivot:  pivot when cursor is agent, cursor when cursor isn't agent.  Patient also correlates with natlang verb valency:  when a vector corresponds to an intransitive verb, its patient is almost always the cursor, when to a transitive verb, its patient is typically the end.
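These tendencies are regular enough to state as a rough predicate.  The sketch below encodes them as hard rules purely for illustration; in the language itself they are tendencies, not laws, and the function and names are mine:

```python
ROLES = {"cursor", "start", "end", "path", "pivot"}

def plausible(patient, agent=None, instrument=None):
    """Check an assignment of logical roles to concrete roles against
    the observed tendencies (treated here, artificially, as hard rules)."""
    if patient == "path":                  # path patient is unattested
        return False
    if agent is not None and agent == instrument:
        return False                       # agent and instrument are distinct
    if agent is not None and instrument is not None:
        # instrument tends to be pivot when cursor is agent, else cursor
        expected = "pivot" if agent == "cursor" else "cursor"
        if instrument != expected:
            return False
    return all(r in ROLES
               for r in (patient, agent, instrument) if r is not None)

print(plausible("end", agent="cursor", instrument="pivot"))  # True
print(plausible("path"))                                     # False
```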

For some time it remained unclear whether the logical roles should be considered a platonic feature.  I've often taken a "try it, see if it works" attitude toward adding things to the language, which is after all meant to be a testbed; the eventual rough indicator of a feature's platonic authenticity (platonicity?) is then how well it takes hold in the language once added.  A few of the things I've added just sat there inertly in the language design, until eventually discarded as failing to resonate with the design (such as a vector equivalent of the English verb be; which in retrospect clashes with the Lamlosuo design, both as copula which is what role particles are for, and as pure being whereas vectors impose motion on everything they portray).  Given some time to settle in, logical roles appear reasonably successful, having deeply integrated into some inner workings of the language:  various sorts of alignments both guide and are guided by logical roles.  Alignment guides logical roles, notably, in restrictive sequential or parallel alignments; for example, an advector inherits the logical roles of the other vector in parallel alignment.  Logical roles guide alignment in the highly irregular vector(s) at the apparent heart of the language, which I'll describe momentarily.

I wondered about aspect —the structure of an activity with respect to time (as opposed to its placement in time, which is tense)— for the prototype language, since aspect is a prominent feature of human natlangs.  Aspect has arisen in Lamlosuo mainly through the principle that the action of a neutral vector is usually supposed by default to happen once, whereas the action of an engendered vector is usually supposed to happen habitually.  Thus, in losua fu li susu‍a someone speaks and then sleeps, whereas in losutu li susu‍a a habitual speaker sleeps.  Usually, in a restrictive alignment, aspect too is inherited by the dominant vector, which affords some games with aspect by particular vectors (deferred to the next section below).  If one wanted more nuanced sorts of aspect in the testbed language, one might introduce them through alignments with particular vectors that exist to embody those aspects; however, I never actually did this.  Allowing myself to be guided by whatever "felt" natural to pursue (so one may speculate what sort of butterfly started the relevant breeze), my explorations led me instead to something... different.  Not relating a vector to time, but rather taking "tangents" to the basic vector at various points and in various abstract-spatial directions.  As the trend became more evident, I dubbed that sort of derived relation attitude.  (My language notes assert, within the fictional narrative, that the emphasis on attitude rather than aspect is a natural consequence of the language speakers' navigational mindset.)  Some rather mundanely regular particular vectors were introduced to support attitudes; looking through the lexicon, I see stems jali- (leave), jeli- (go continuously), joli- (arrive), supporting respectively the inceptive, progressive, and terminative attitudes.

Extraordinary idiosyncrasies

In any given language, it seems, there's likely to be some particular hotspot in the vocabulary where idiosyncrasies cluster.  Hopefully, the location of such a hotspot ought to say something profound about the language model, though as usual there's always potential bias-of-thought to take into account.  The English verb be is a serious contender for the most irregular verb in the language, with do coming in a respectable second to round out this semantic heart of the language structure.  As noted earlier, I've sometimes referred to human languages as "being-doing languages"; and occasionally my notes have called vector languages "going languages".  Early on, I simplistically imagined that a generic vector meaning go might be the center of the language.  Apparently not, though; in the central neighborhood, yes, but not at the very heart.  The stand-out vector that's accumulated irregularity like it's going out of style is fajoa — meaning change state.

A sort of microcosm for this hotspot effect occurs in the finitely bounded set of Lamlosuo's vector prefixes (which, by the phonotactics described earlier, are each consonant-vowel-consonant, so there are at most 5×5×2 = 50 of them, or 25 if no two prefixes differ only in their final consonant; the current lexicon has 12, which is about 50% build-out and feels fairly dense).  Most of the prefixes are fairly straightforward in function (since prefix jun- makes a vector reflexive, junlosua would be talk to oneself; and so on).  The most exceptional prefix, consistently through the evolution of the language, has been lam-, which makes the vector deictic, i.e., makes it refer to the current situation.  The deictic prefix, as I've used it, is rather strongly transformative and I've used it only sparingly, on a few vectors where its effect is especially useful; in particular, stems losu-, sile-, jilu-.  (Though I would expect a fluent speaker to confidently use lam- in less usual ways when appropriate, as fluent speakers are apt to occasionally bend their language to the bafflement of L2 speakers.)
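The prefix arithmetic above can be checked mechanically; here's a minimal sketch in Python, with the caveat that the phoneme inventories used are illustrative placeholders (only the counts, five consonants, five vowels, and a two-way final choice, come from the text; the actual inventory isn't assumed):

```python
from itertools import product

# Illustrative placeholder inventories -- NOT the language's actual
# phonemes; only the counts (5 consonants, 5 vowels, 2 finals) matter.
consonants = ["j", "l", "s", "f", "w"]
vowels = ["a", "e", "i", "o", "u"]
finals = ["m", "n"]

# All consonant-vowel-consonant prefixes: 5 x 5 x 2 = 50.
prefixes = ["".join(p) for p in product(consonants, vowels, finals)]

# If no two prefixes may differ only in their final consonant,
# only the 25 distinct CV- bodies are usable.
bodies = {p[:2] for p in prefixes}

print(len(prefixes), len(bodies))  # 50 25; a 12-prefix lexicon is ~50% of 25
```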

Stem lam‍losu- is the speaking in which the word itself occurs.  Several of its engendered forms are particularly useful; lamlosuo (volitional pivot) is the living language of the speaking in which the word occurs, hence the conlang itself (viewed fictionally as a living language); lam‍losu‍so (volitional end) is the audience, thus the second-person pronoun; lam‍losu‍tu (volitional start) is the speaker, thus the first-person pronoun.  The latter two are contracted (a purely syntactic form of irregularity, motivated by convenience/practicality) to laso and latu.

Stem sile- means experience the passage of time; the cursor is the experiencer; path, time; start, the experiencer's past; end, their future; pivot, the moment of experience.  lam‍sile- is the immediate experience-of-time, whose pivot is now; after working with it for a while, I adopted a convention that the past/present/future might colloquially omit the prefix.  Tense is indicated by aligning a clause with engendered sile‍tu (past), sile‍wo (present, if one wants to specify that explicitly), or sile‍so (future).  Hence, latu fi losua oa sile‍tu = sile‍tu a latu fi losua = I spoke.

Stem jilu- means go or travel in a generic sense (whereas go in a directional sense is wilu-).  lam‍jilu‍a is the going we're now engaged in; its cursor is an inclusive first-person pronoun (we who are going together); path, the journey we're all on (i.e., the activity we're engaged in); pivot, here; end (or occasionally start), there.  With preposition sum indicating a long path, this enters into the formal phrase sum lam‍sile‍tu sum lam‍jilu‍se = long ago and far away.

Now, fajo-.  Change state.  The cursor is the thing whose state changes.  Non-volitional path is the process of state-change, volitional path is the instrument of state-change.  Non-volitional pivot is an intermediate state, volitional pivot is the agent of state-change.  Start and end, both non-volitional, are the state before and after the change.

When dominant fajo- aligns its cursor with some role of a subordinate vector, fajo- is the state change undergone by the aligned subordinate role during the action of the subordinate vector.  Either the dominant role, the subordinate role, or both may be elided; the dominant role when unspecified defaults to cursor —even if fajo- is engendered, an extraordinary exception— while the subordinate role when unspecified defaults to patient, making the meaning of the construct overtly dependent on which concrete role of the subordinate vector is the patient.  Along with all this, the dominant pivot aligns to the subordinate agent, and dominant path to subordinate instrument (when the subordinate vector has those logical roles).  According to the language notes, if the subordinate vector doesn't have an agent, and the subordinate pivot is an intermediate point on the subordinate path (as e.g. for sile-), and the subordinate cursor aligns with the dominant cursor, the dominant pivot is the state of the subordinate cursor as it passes through the subordinate pivot.

One thus has such creatures as fajo‍ti losu‍tu, the state of having not yet spoken; and fajo‍se losu‍so, the state of having been spoken to.  (Notice that these things take many more words to say in English than in Lamlosuo, whereas the past tense took many more words to say in Lamlosuo than in English.)

Cursor-aligned fajo- can also take the form of a preposition fam or prefix fam-, with the difference between the two that engenderment of the vector is applied after a prefix, but before a preposition.  Thus, fam⟨stem⟩⟨gender⟩ = fajo⟨gender⟩ ⟨stem⟩a.  For example, susu‍e = event of dreaming, fam‍susu‍e = fajo‍e susu‍a = state of dreaming.

When dominant fajo- aligns its path with a subordinate content clause, fajo- is the state change vector of the complex process described by the content clause.  Combined role particle ja initiates a noncommittal content clause by implying subordinate content particle e.  The dominant cursor is then the situation throughout the process, dominant start the situation before the process, dominant end the situation after the process, dominant pivot the agent of the process.

fajoa has siblings lajoa and wajoa.

lajoa describes a change of mental state.  Dominant path of lajo- doesn't align with a subordinate clause, but dominant cursor aligns similarly to fajo-, describing the change of mental state of whichever participant in the subordinate action; noting, the agent, if not otherwise determined, is the cursor's inclination toward the change (always available in the volitional pivot engendered form, lajo‍o).  For example, recalling sile‍tu = past (earlier point in time), where fajo‍ti sile‍a = fam‍sile‍ti = youth (external state at earlier point in time), lajo‍ti sile‍a = inexperience (internal state at earlier point in time).  When the subordinate vector already describes the mind, fajo- describes mental state, and lajo- is not used; e.g., fam‍susu‍e = state of dreaming is primarily an internal state.

wajoa describes the abstract process of being used as an instrument.  Cursor, instance of use; non-volitional path, process of use; volitional path, person who uses; (volitional/non-volitional) pivot, agent of use; (volitional/non-volitional) start, instrument of use; end, patient of use.  Alignment is similar to fajo-, but subordinate role defaults to instrument rather than patient.  For example, wajo‍o jilu‍a = person who uses a vehicle or riding beast, wajo‍o jilu‍e = person who uses a vehicle, wajo‍o jilu‍o = person who uses a riding beast.

On the periphery of this central knot of irregularity is jilu-, meaning (again) go or travel in a generic sense.  When dominant in an alignment with combined path particle ja, the role particle implies subordinate noncommittal content particle e, and jilu- aligns in parallel (it's an advector) to whatever complex process is described by the following subordinate clause.  (I don't group this with the larger family of mundane vectors using combined role particles to imply subordinate content, because here the alignment is implicitly restrictive and doesn't follow from complexity in the semantics of the vector, as with teach (lawa-), imply (soo-), etc.)  Here the alignment is purely a grammatical device; it unifies a complex process from the subordinate clause into a coherent vector, and objectifies it as the volitional path (engendered form jilu‍jo).  More subtly, jilu‍a ja with an engendered subordinate vector can provide a neutral vector with habitual aspect:  jilu‍a wo jeo‍e = go using a fast vehicle (once), jilu‍a ja jeo‍e = habitually go using a fast vehicle.

One can (btw) also play games with habitual aspect in using fajo-, exactly because it doesn't inherit the aspect of the subordinate clause:  engendering fajo- gives the state change habitual aspect, but gender in the subordinate clause does not.  Thus, latu we fajoa ja susu‍a lu laso = I (once) cause you to (once) sleep; latu we fajoa ja susu‍lu laso = I (once) cause you to habitually sleep; latu fajo‍o ja susu‍a lu laso = I habitually cause you to (once) sleep; latu fajo‍o ja susu‍lu laso = I habitually cause you to habitually sleep.  (Why I would have this soporific effect, we may suppose, is provided by the context in which the sentence occurs.)
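Since the four readings are generated by two independent binary choices (gender on the dominant vector, gender on the subordinate vector), the grid can be sketched mechanically.  The gloss strings come from the four examples; the helper function is purely illustrative:

```python
# Hypothetical helper: neither vector inherits habituality from the other,
# so the two booleans vary independently, giving a 2x2 grid of readings.
def gloss(dominant_habitual: bool, subordinate_habitual: bool) -> str:
    cause = "habitually cause" if dominant_habitual else "(once) cause"
    sleep = "habitually sleep" if subordinate_habitual else "(once) sleep"
    return f"I {cause} you to {sleep}"

for dom in (False, True):
    for sub in (False, True):
        print(gloss(dom, sub))
```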

Whither Lamlosuo?

After a while —perhaps a year or more of tinkering— Lamlosuo began to take on an increasingly organic texture.  Natlangs draw richness from being shaped by many different people; a personal project, I think, when carried on for a long time starts to accrue richness from essentially the same source:  its single author is never truly the same person twice.  If you set aside the project and come back to it a week or a month later, you're not the same person you were when you set it aside; besides the additional things you've experienced in that time, most people would also no longer be quite immersed in some project details and would likely develop a somewhat different experience of them while reacquiring them.  So the personal project really is developed by many people:  all the people that its single author becomes during development.  This enrichment cannot be readily duplicated over a short time, because the author doesn't change much in a short time.  This may be part of why the most impressive conlangs tend to be decades-long efforts; of course total labor adds up, but also, richness adds up.

The most active period of Lamlosuo development tailed off after about three years, due to a two-part problem in the vocabulary — phonaesthetic and semantic.

The phonology and phonotactics of Lamlosuo (whose conception I discussed a bit in the earlier post) are flat-out boring.  There are just-about no internal markers indicating morphological structure within a vector stem —even the class suffix is generally hard to recognize as not part of the stem— so there has been a bias toward two-syllable vector stems; it's been my perception that uniformly two-syllable simple stems help a listener identify the class suffix, so that nonuniform stem lengths (especially, odd-syllable-count stems) can be disorienting.  There are only a rather small number of two-syllable stems possible (basically, 5⁴ = 625) and, moreover, packing those stems too close together within the available space not only makes them harder to remember, but harder even to distinguish.  After a while I reformed the lexicon a bit by easing in some informal principles about distance between stems (somewhat akin to Hamming distance) and some mnemonic similarities between semantically related stems.  The most recent version of the language document has 70 simple vector stems.
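The stem-distance idea can be made concrete with plain Hamming distance standing in for the informal principles.  A sketch, run over some actual stems from this post (joining marks dropped; the choice of Hamming distance is my simplification):

```python
from itertools import combinations

def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length stems differ."""
    assert len(a) == len(b)
    return sum(x != y for x, y in zip(a, b))

# Some actual two-syllable stems mentioned in this post:
stems = ["losu", "susu", "sile", "jali", "jeli", "joli",
         "jilu", "wilu", "fajo", "lajo", "wajo"]

# Pairs at distance 1 differ in a single segment -- easiest to confuse,
# unless the closeness is a deliberate mnemonic for related meanings.
close = [(a, b) for a, b in combinations(stems, 2) if hamming(a, b) == 1]
print(close)
```

Run on this sample, the distance-1 pairs that turn up are exactly the deliberately related families: the attitude stems jali-/jeli-/joli-, the siblings fajo-/lajo-/wajo-, and jilu-/wilu- (generic versus directional go), i.e., the mnemonic-similarity principle at work.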

Semantically, a large part of the extant vocabulary is about the mechanics of saying things — attitude, conjunctions, numbers.  One also wants to have something to talk about.  Not wanting to build social biases into a vocabulary that didn't yet have a culture attached to it, I started with vocabulary for rather generic biological functions (eat, sleep...) and navigational maneuvers (go fast/slow, go against the current...) on the —naive— theory this would be "safe".  Later, with the mechanics-oriented vocabulary more complete, a small second wave of content-oriented words ventured into emotional, intellectual, and spiritual matters.  (The notes outline somewhat more ambitious spiritual structure than has been implemented yet; though I do rather like the stems deployed so far (speaking of bias) — fulo-, go wrongly, go contrary to the proper order of things; jolo-, go rightly, go with the proper order of things; wio-, inform emotion with reason; wie-, inspire reason with emotion.)

I did take away some lessons from building content vocabulary for Lamlosuo.  The vector approach has a distinctly dynamic effect on the language's outlook, since it doesn't lend itself to merely labeling things but asks for some sort of "going" to underlie each word.  This led, for instance, to the coining of two different words for blood, depending on what activity it's engaged in — jesa‍lu (circulating blood) and fesa‍lu (spilt blood).  Also, just as the vector concept induces conception of motions for a given noun, the identification of roles for each vector induces conception of participants for a given activity; for instance, in trying to provide a vector corresponding to English adjective fast, one has first advector jeo‍a, go at high speed, from which one then gets jeo‍lu (fast goer), jeo‍e (fast vehicle), jeo‍o (fast riding beast).

The dynamism of everything being in motion is accompanied by a secondary effect playing in counterpoint:  whereas human languages tend to provide each verb with a noun actor, Lamlosuo is more concerned to provide each vector with a noun affected.  This is a rather subtle difference.  The human-language tendency manifests especially in the nominative case (which of course ergative languages don't have, but then, accusative languages are more common); the Lamlosuo tendency is visible in the stipulation that the patient logical role is mandatory while the agent role is optional (keeping in mind, my terms patient and agent for Lamlosuo have somewhat different connotations than those terms as commonly used in human grammar:  affected is not quite the same as acted upon).  The distinction seems to derive from the relatively calm, measured nature of the vector metaphor for activity:  while going is more dynamic than being, it is on balance less dynamic than most forms of doing.  (If there's a bias there in my patterns of thought, I'm not sure its effect on this feature could be distinguished from its effect on the selection of the vector primitive in the first place.)

From time to time as Lamlosuo has developed, I've wondered about personal names.  If even labeling ordinary classes of things requires the conception of underlying motions, how is one to handle a label meant to signify the unique identity of a particular person?  I would resist a strategy for names that felt too much like backsliding into "being-doing" mentality, since much of the point of the exercise is to try something different (and since, into the bargain, any such backsliding would be highly suspect of bias on my part; not that I'd absolutely stonewall such a development, but the case for it would have to be pretty compelling).  Early in the development of Lamlosuo, I was able to simply defer the question, as at that point questions about the language that had answers were the exception, and this was just one more in the general sea of unknowns.  Lately, though, in closely studying the evolution of abstract thought in ancient Greece (reading Bruno Snell's The Discovery of the Mind, as part of drafting a follow-up to my post on storytelling), I'm struck by how Lamlosuo's ubiquity of process sits in relation to Snell's analysis of abstract nouns, concrete nouns, and proper nouns (and verbs, to which Snell ascribes a special part in forming potent metaphors).  The larger conlanging project, within which Lamlosuo serves, posits a long timeline of development of the conspeakers' civilization, and as I look at it now this raises the question of how their abstractions evolved.  Mapping out the evolution might or might not provide, or inspire, a solution to the naming problem; at any rate, it's deeply unclear at this point what these factors imply for Lamlosuo, as well as for the larger project.

Avoiding cultural assumptions in the vocabulary created a highly clinical atmosphere (which is why I called the hope of culture-neutrality "naive":  lack of cultural features is a kind of culture; also note, human culture ought to contain traces from the evolution of abstract thought).  Each word tended to be given a rather pure, antiseptic meaning (until late in the game when I started deliberately working in a bit of flavor), heightening a trend already latent in the cut-and-dried mechanics of the language that arose from its early intent, as a testbed, to not bother with naturalism (so, in a way all this traces back to regularity).  For example:  hoping to insulate the various sex-related vocabulary words from lewd overtones, I set out to fashion an advector corresponding to the English adjective obscene, so that one might then claim the various other words weren't obscene without the advector (which of course amounts to making those other words more clinical).  The result took on a life of its own.  Advector josu-, do something obscene (with absolutely no implication whatever as to what is done); start agent; end patient; pivot instrument.  One is naturally led to consider the difference between a non-volitional instrument and a volitional instrument.  Throw in the reflexive prefix and, for extra vitriol, an invitational mood particle, and you've got i jun‍josu‍a, which the language notes translate as "FU", but really it's more precise, more... clinical than that.

One natural next major step for Lamlosuo —if there were to be a next major step, rather than moving on to the other languages it was meant to prepare the way for— would be a push to significantly expand the vocabulary, to allow testing the dynamics of larger discourses.  (I wrote a long post a while back about long discourses).  However, the bland, narrow vocabulary space seemed an obstacle to this sort of major vocabulary-expansion operation.  A serious naturalistic conlang would combat this sort of problem partly through the richness that, as noted, comes from developing in many sessions over a long time; but ultimately one also has to mix this with some technological methods.  Purely technological methods would always create something with an artificial feel, so one really wants to find ways of using technological methods to amplify whatever sapient richness is input to the system; and that sounds like an appropriate study for a testbed language such as Lamlosuo.  Moreover, I just don't readily track all the complex details of a linguistic project like this — not if it's skipping like a pebble across time, with intervals between development sessions ranging from a few hours to a few years; I therefore imagined some sort of automated system that would help keep track of the parts of the language design, noting which parts are more, or less, conformant to expected patterns — and why.  (I'm very much aware that, in creating such designs, to maintain an authentic sapient pattern you need to be able to explain an exception just once and not have the system keep hounding you about it until you give the answer the automated system favors.)

And at this point, things take an abrupt turn toward fexprs.  (Cf. the law of the instrument.)  My internal document describing the language is written in LaTeX.  Yet, as just described, I'd like it to do more, and do it ergonomically.  As it happens, I have a notion how to approach this, dormant since early in the development of my dissertation:  I've had in mind that, if (as I've been inclined to believe for some years now) fexprs ought to replace macros in pretty much all situations where macros are used, then it follows that TeX, which uses macros as its basic extension mechanism, should be redesigned to use fexprs instead.  LaTeX is a (huge) macro package for TeX.

So, Lamlosuo waits on the speculative notion of a redesign of TeX.  It seems I ought to come out of such a redesign with some sort of deeper understanding of the practical relationship between macro-based and fexpr-based implementations, because Knuth's design of TeX is in essence quite integrated — a daunting challenge to contemplate tampering with.  (One also has to keep in mind that the extreme stability of the TeX platform is one of its crucial features.)  It's rather sobering to realize that a fexpr-based redesign of TeX isn't the most grandiose plan in my collection.

Thursday, May 2, 2019

Rewriting and emergent physics

Tempora mutantur, nos et mutamur in illis.
(Times change, and we change with them.)
Latin Adage, 16th-century Germany.

I want to understand how a certain kind of mathematical system can act as a foundation for a certain kind of physical cosmos.  The ultimate goal of course would be to find a physical cosmos that matches the one we're in; but as a first step I'd like to show it's possible to produce certain kinds of basic features that seem prerequisite to any cosmos similar to the one we're in.  A demonstration of that much ought, hopefully, to provide a starting point to explore how features of the mathematical system shape features of the emergent cosmos.

The particular kind of system I've been incrementally designing, over a by-now-lengthy series of posts (most recently yonder), is a rewriting system —think λ-calculus— where a "term" (really more of a graph) is a state of the whole spacetime continuum, a vast structure which is rewritten according to some local rewrite rules until it reaches some sort of "stable" state.  The primitive elements of this state have two kinds of connections between them, geometry and network; and by some tricky geometry/network interplay I've been struggling with, gravity and the other fundamental forces are supposed to arise, while the laws of quantum physics emerge as an approximation good for subsystems sufficiently tiny compared to the cosmos as a whole.  That's what's supposed to happen for physics of the real world, anyway.

To demonstrate the basic viability of the approach, I really need to make two things emerge from the system.  The obvious puzzle in all this has been, from the start, how to coax quantum mechanics out of a classically deterministic rewriting system; inability to extract quantum mechanics from classical determinism has been the great stumbling block in devising alternatives to quantum mechanics for about as long as quantum mechanics has been around (harking back to von Neumann's 1932 no-go theorem).  I established in a relatively recent post (thar) that the quintessential mathematical feature of quantum mechanics, to be derived, is some sort of wave equation involving signed magnitudes that add (providing a framework in which waves can cancel, so producing interference and other quantum weirdness).  The geometry/network decomposition is key for my efforts to do that; not something one would be trying to achieve, evidently, if not for the particular sort of rewriting-based alternative mathematical model I'm trying to apply to the problem; but, contemplating this alternative cosmic structure in the abstract, starting from a welter of interconnected elements, one first has to ask where the geometry — and the network — and the distinction between the two — come from.
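The bare mathematical point, before any machinery, fits in a few lines of arithmetic: nonnegative probabilities can only accumulate, whereas signed magnitudes add first and so can cancel.  A toy two-path sum (the amplitudes are arbitrary illustrative numbers, nothing physical):

```python
# Two paths with equal and opposite toy amplitudes.
path_a, path_b = 1, -1

# Classical-style combination: nonnegative magnitudes only accumulate.
classical = path_a**2 + path_b**2      # 1 + 1 = 2

# Quantum-style combination: signed magnitudes add before squaring,
# so the paths can cancel -- interference.
quantum = (path_a + path_b)**2         # (1 - 1)^2 = 0

print(classical, quantum)  # 2 0
```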

Time after time in these posts I set forth, for a given topic, all the background that seems relevant at the moment, sift through it, glean some new ideas, and then set it all aside and move on to another topic, till the earlier topic, developing quietly while the spotlight is elsewhere, becomes fresh again and offers enough to warrant revisiting.  It's not a strategy for the impatient, but there is progress, as I notice looking back at some of my posts from a few years ago.  The feasibility of the approach hinges on recognizing that its value is not contingent on coming up with some earth-shattering new development (like, say, a fully operational Theory of Everything).  One is, of course, always looking for some earth-shattering new development; looking for it is what gives the whole enterprise shape, and one also doesn't want to become one of those historical footnotes who after years of searching brushed past some precious insight and failed to recognize it, so that it had to wait for some other researcher to discover it later.  But, as I noted early in this series, the simple act of pointing out alternatives to a prevailing paradigm in (say) physics is beneficial to the whole subject, like tilling soil to aerate it.  Science works best with alternatives to choose between; and scientists work best when their thoughts and minds are well-limbered by stretching exercises.  For these purposes, in fact, the more alternatives the merrier, so that as a given post is less successful in reaching a focused conclusion it's more likely to compensate in variety of alternatives.

In this series of physics posts, I keep hoping to get down to mathematical brass tacks; but very few posts in the series actually do so (with a recent exception in June of last year).  Alas, though the current post does turn its attention more toward mathematical structure, it doesn't actually achieve concrete specifics.  Getting to the brass tacks requires first working out where they ought to be put.

Contents
Dramatis personae
Connections
Termination
Duality
Scale
Dramatis personae

A rewriting calculus is defined by its syntax and rewriting rules; for a given computation, one also needs to know the start term.  In this case, we'll put off for the moment worrying about the starting configuration for our system.

The syntax defines the shapes of the pieces each state (aka term, graph, configuration) is made of, and how the pieces can fit together.  For a λ-like calculus, the pieces of a term would be syntax-tree nodes; the parent/child connections in the tree would be the geometry, and the variable binding/instance connections would be the network.  My best guess, thus far, has been that the elementary pieces of the cosmos would be events in spacetime.  Connections between events would, according to the general scheme I've been conjecturing, be separated into local connections, defining spacetime, and non-local connections, providing a source of seeming-randomness if the network connections are sufficiently widely distributed over a cosmos sufficiently vast compared to the subsystem under consideration.

I'm guessing that, to really make this seeming-randomness trick work, the cosmos ought to be made up of some truly vast number of events; say, 10⁶⁰, or 10⁸⁰, or on up from there.  If the network connections are really more-or-less-uniformly distributed over the whole cosmos, irrespective of the geometry, then there's no obvious reason not to count events that occur, say, within the event horizon of a black hole, and from anywhere/anywhen in spacetime, which could add up to much more than the currently estimated number of particles in the universe.  Speculatively (which is the mode all of this is in, of course), if the galaxy-sized phenomena that motivate the dark-matter hypothesis are too big, relative to the cosmos as a whole, for the quantum approximation to work properly —so one would expect these phenomena to sit oddly with our lesser-scale physics— that would seem to suggest that the total size of the cosmos is finite (since in an infinite cosmos, the ratio of the size of a galaxy to the size of the universe would be exactly zero, no different than the ratio for an electron).  Although, as an alternative, one might suppose such an effect could derive, in an infinite cosmos, from network connections that aren't distributed altogether uniformly across the cosmos (so that connections with the infinite bulk of things get damped out).

With the sort of size presumed necessary to the properties of interest, I won't be able to get away with the sort of size-based simplifying trick I've gotten away with before, as with a toy cosmos that has only four possible states.  We can't expect to run a simulation with program states comparable in size to the cosmos; Moore's law won't stretch that far.  For this sort of research I'd expect to have to learn, if not invent, some tools well outside my familiar haunts.

The form of cosmic rewrite rules seems very much up-for-grabs, and I've been modeling guesses on λ-like calculi while trying to stay open to pretty much any outré possibility that might suggest itself.  In λ-like rewriting, each rewriting rule has a redex pattern, which is a local geometric shape that must be matched; it occurs, generally, only in the geometry, with no constraints on the network.  The redex-pattern may call for the existence of a tangential network connection —the β-rule of λ-calculus does this, calling for a variable binding as part of the pattern— and the tangential connection may be rearranged when applying the rule, just as the local geometry specified by the redex-pattern may be rearranged.  Classical λ-calculus, however, obeys hygiene and co-hygiene conditions:  hygiene prohibits the rewrite rule from corrupting any part of the network that isn't tangent to the redex-pattern, while co-hygiene prohibits the rewrite rule from corrupting any part of the geometry that isn't within the redex-pattern.  Impure variants of λ-calculus violate co-hygiene, but still obey hygiene.  The guess I've been exploring is that the rewriting rules of physics are hygienic (and Church-Rosser), and gravity is co-hygienic while the other fundamental forces are non-co-hygienic.
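To make the hygiene condition concrete, here's a minimal β-step in Python, a standard textbook capture-avoiding-substitution construction (nothing specific to the cosmic setting; the representation and function names are mine): the redex pattern is purely geometric, an application whose operator is a λ, but applying the rule rearranges the tangential variable's network, and the renaming in subst is exactly what keeps the rest of the network uncorrupted.

```python
import itertools

# Terms: ("var", name) | ("lam", name, body) | ("app", fn, arg).
fresh = (f"v{i}" for i in itertools.count())

def free_vars(t):
    if t[0] == "var":
        return {t[1]}
    if t[0] == "lam":
        return free_vars(t[2]) - {t[1]}
    return free_vars(t[1]) | free_vars(t[2])

def subst(t, x, s):
    """Capture-avoiding substitution of s for x in t."""
    if t[0] == "var":
        return s if t[1] == x else t
    if t[0] == "app":
        return ("app", subst(t[1], x, s), subst(t[2], x, s))
    y, body = t[1], t[2]
    if y == x:                 # x is shadowed here; y's variable is untouched
        return t
    if y in free_vars(s):      # would capture: rename y's whole variable first
        z = next(fresh)
        body = subst(body, y, ("var", z))
        y = z
    return ("lam", y, subst(body, x, s))

def beta(t):
    """One β step at the root, if the geometric redex pattern matches."""
    if t[0] == "app" and t[1][0] == "lam":
        return subst(t[1][2], t[1][1], t[2])
    return t

# (λx. λy. x) applied to a free variable y: naive substitution would
# capture y under the inner binder; hygiene forces the rename.
result = beta(("app", ("lam", "x", ("lam", "y", ("var", "x"))), ("var", "y")))
print(result)  # ('lam', 'v0', ('var', 'y')) -- the free y stays free
```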

I've lately had in mind that, to produce the right sort of probability distributions, the fluctuations of cosmic rewriting ought to, in essence, compare the different possible behaviors of the subsystem-under-consideration.  Akin to numerical solution of a problem in the calculus of variations.

Realizing that the shape of spacetime is going to have to emerge from all this, the question arises —again— of why some connections to an event should be "geometry" while others are "network".  The geometry is relatively regular and, one supposes, stable, while the network should be irregular and highly volatile; in fact, the seeming-randomness depends on its being irregular and volatile.  Conceivably, the redex-patterns are geometric (or mostly geometric) because the engagement of those connections within the redex-patterns causes those connections to be geometric in character (regular, stable), relative to the evolution of the cosmic state.

The overall character of the network is another emergent feature likely worth attention.  Network connections in λ-calculus are grouped into variables, sub-nets defined by a binding and its bound instances, in terms of which hygiene is understood.  Variables, as an example of network structure, seem built-in rather than emergent; the β-rule of λ-calculus is apparently too wholesale a rewriting to readily foster ubiquitous emergent network structure.  Physics, though, seems likely to engage less wholesale rewriting, from which there should also be emergent structure, some sort of lumpiness —macrostructures— such that (at a guess) incremental scrambling of network connections would tend to circulate those connections only within a particular lump/macrostructure.  The apparent alternative to such lumpiness would be a degree of uniform distribution that feels, to my intuition anyway, unnatural.  One supposes the lumpiness would come into play in the nature of stable states that the system eventually settles into, and perhaps the size and character of the macrostructures would determine at what scale the quantum approximation ceases to hold.

Connections

Clearly, how the connections between nodes —the edges in the graph— are set up is the first thing we need to know, without which we can't imagine anything else concrete about the calculus.  Peripheral to that is whether the nodes (or, for that matter, the edges) are decorated, that is, labeled with additional information.

In λ-calculus, the geometric connections are of just three forms, corresponding to the three syntactic forms in the calculus:  a variable instance has one parent and no children; a combination node has one parent and two children, operator and operand; and a λ-expression has one parent and one child, the body of the function.  For network connections, ordinary λ-calculus has one-to-many connections from each binding to its bound instances.  These λ network structures —variables— are correlated with the geometry; the instances of a variable can be arbitrarily scattered through the term, but the binding of the variable, of which there is exactly one, is the sole asymmetry of the variable and gives it an effective singular location in the syntax tree, required to be an ancestor in the tree of all the locations of the instances.  Interestingly, in the vau-calculus generalization of λ-calculus, the side-effectful bindings are somewhat less uniquely tied to a fixed location in the syntax tree, but are still one-per-variable and required to be located above all instances.
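One way to see the two edge kinds concretely is to represent a λ-term as a graph with geometry edges (parent/child) and network membership (which variable an instance belongs to).  A sketch, with class names of my own choosing, for illustration only:

```python
from dataclasses import dataclass, field

@dataclass(eq=False)
class Node:
    kind: str                                     # "inst", "app", or "lam"
    children: list = field(default_factory=list)  # geometry: parent/child edges
    variable: object = None                       # network: variable membership

@dataclass(eq=False)
class Variable:
    binder: "Node"     # the λ-node: the variable's one asymmetry
    instances: list    # may be scattered anywhere below the binder

# λx. x x  -- one variable, one binder, two instances.
x1, x2 = Node("inst"), Node("inst")
app = Node("app", [x1, x2])
lam = Node("lam", [app])
x = Variable(binder=lam, instances=[x1, x2])
x1.variable = x2.variable = x

# Hygiene is a constraint on this second edge-kind: rewriting must not
# corrupt any Variable grouping the redex pattern doesn't touch.
assert all(inst.variable is x for inst in x.instances)
```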

Physics doesn't obviously lend itself to a tree structure; there's no apparent way for a binding to be "above" its instances, nor apparent support for an asymmetric network structure.  Symmetric structures would seem indicated.  A conceivable alternative strategy might use time as the "vertical" dimension of a tree-like geometry, though this would seem rather contrary to the loss of absolute time in relativity.

A major spectrum of design choice is the arity of network structures, starting with whether network structures should have fixed arity, or unfixed as in λ-like calculi.  Unfixed arity would raise the question of what size the structures would tend to have in a stable state.  Macrostructures, "lumps" of structures, are a consideration even with fixed arity.

Termination

In exploring these realms of possible theory, I often look for ways to defer aspects of the theory till later, as a sort of Gordian-knot-cutting (reducing how many intractable questions I have to tackle all at once).  I've routinely left unspecified, in such deferral, just what it should mean for the cosmic rewriting system to "settle into a stable state".  However, at this point we really have no choice but to confront the question, because our explicit main concern is with mathematical properties of the probability distribution of stable states of the system, and so we can do nothing concrete without pinning down what we mean by stable state.

In physics, one tends to think of stability in terms of asymptotic behavior in a metric space; afaics, exponential stability for linear systems, Lyapunov stability for nonlinear.  In rewriting calculi, on the other hand, one generally looks for an irreducible form, a final state from which no further rewriting is possible.  One could also imagine some sort of cycle of states that repeat forever, though making that work would require answers to some logistical questions.  Stability (cyclic or otherwise) might have to do with constancy of which macrostructure each of an element's network connections associates to.
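To make the rewriting-calculus notion of stability concrete, here's a toy string-rewriting system (purely illustrative; nothing here is meant to model physics): stability as an irreducible form from which no rule applies, plus a naive detector for the repeat-forever cycles mentioned above:

```python
def step(state, rules):
    """Apply the first applicable rewrite rule; None if the state is irreducible."""
    for lhs, rhs in rules:
        if lhs in state:
            return state.replace(lhs, rhs, 1)
    return None

def settle(state, rules, limit=1000):
    """Rewrite until irreducible ('stable'), a repeating cycle ('cyclic'),
    or the step budget runs out ('unknown')."""
    seen = {state}
    for _ in range(limit):
        nxt = step(state, rules)
        if nxt is None:
            return ("stable", state)
        if nxt in seen:
            return ("cyclic", nxt)
        seen.add(nxt)
        state = nxt
    return ("unknown", state)

# A terminating system: bubble every a past every b.
assert settle("abab", [("ab", "ba")]) == ("stable", "bbaa")
# A non-terminating system: two rules that undo each other, forever.
assert settle("ab", [("ab", "ba"), ("ba", "ab")])[0] == "cyclic"
```

Even this toy shows why the choice of stability criterion matters: the same machinery gives a crisp yes/no for irreducibility, while cyclic "stability" already requires bookkeeping over the history of states.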

If rewriting effectively explores the curvature of the action function (per the calculus of variations as mentioned earlier), it isn't immediately obvious how that would then lead to asymptotic stability.  At any rate, different notions of stability lead to wildly different mathematical developments of the probability distribution, hence this is a major point to resolve.  The choice of stability criterion may depend on recognizing what criterion can be used in some technique that arrives at the right sort of probability distribution.

There's an offbeat idea lately proposed by Tim Palmer called the invariant set postulate.  Palmer, so I gather, is a mathematical physicist deeply involved in weather prediction, from which he's drawn some ideas to apply back to fundamental physics.  A familiar pattern in nonlinear systems, apparently, is a fractal subset of state space which, under the dynamics of the system, the system tends to converge upon and, if the system state actually comes within the set, remains invariant within.  In my rewriting approach these would be the stable states of the cosmos.  The invariant set should be itself a metric space of lower dimension than the state space as a whole and (if I'm tracking him) uncomputable.  Palmer proposes to postulate the existence of some such invariant subset of the quantum state space of the universe, to which the actual state of the universe is required to belong; and requiring the state of the universe to belong to this invariant set amounts to requiring non-independence between elements of the universe, providing an "out" to cope with no-go theorems such as Bell's theorem or the Kochen–Specker theorem.  Palmer notes that while, in the sort of nonlinear systems this idea comes from, the invariant set arises as a consequence of the underlying dynamics of the system, for quantum mechanics he's postulating the invariant set with no underlying dynamics generating it.  This seems to be where my approach differs fundamentally from his:  I suppose an underlying dynamics, produced by my cosmic rewriting operation, from which one would expect to generate such an invariant set.

Re Bell and, especially, Kochen–Specker:  those no-go theorems rule out certain kinds of mutual independence between separate observations under quantum mechanics; but the theorems can be satisfied —"coped with"— by imposing some quite subtle constraints.  Such as Palmer's invariant set postulate.  It seems possible that Church-Rosser-ness, which tampers with independence constraints between alternative rewrite sequences, may also suffice to satisfy the theorems.

Duality

What if we treated the lumpy macrostructures of the universe as if they were primitive elements; would it be possible to then describe the primitive elements of the universe as macrostructures?  Some caution is due here for whether this micro/macro duality would belong to the fundamental structure of the cosmos or to an approximation.  (Of course, this whole speculative side trip could be a wild goose chase; but, as usual, on one hand it might not be a wild goose chase, and on the other hand wild-goose-chasing can be good exercise.)

Perhaps one could have two coupled sets of elements, each serving as the macrostructures for the other.  The coupling between them would be network (i.e., non-geometric), through which presumably each of the two systems would provide the other with quantum-like character.  In general the two would have different sorts of primitive elements and different interacting forces (that is, different syntax and rewrite-rules).  Though it seems likely the duals would be quite different in general, one might wonder whether in a special case they could sometimes have the same character, in which case one might even ask whether the two could settle into identity, a single system acting as its own macro-dual.

For such dualities to make sense at all, one would first have to work out how the geometry of each of the two systems affects the dynamics of the other system — presumably, manifesting through the network as some sort of probabilistic property.  Constructing any simple system of this sort, showing that it can exhibit the sort of quantum-like properties we're looking for, could be a worthwhile proof-of-concept, providing a buoy marker for subsequent explorations.

On the face of it, a basic structural difficulty with this idea is that primitive elements of a cosmic system, if they resemble individual syntax nodes of a λ-calculus term, have a relatively small fixed upper bound on how many macrostructures they can be attached to, whereas a macrostructure may be attached to a vast number of such primitive elements.  However, there may be a way around this.

Scale

I've discussed before the phenomenon of quasiparticles, group behaviors in a quantum-mechanical system that appear (up to a point) as if they were elementary units; such eldritch creatures as phonons and holes.  Quantum mechanics is fairly tolerant of inventing such beasts; they are overtly approximations of vastly complicated underlying systems.  Conventionally "elementary" particles can't readily be analyzed in the same way —as approximations of vastly complicated systems at an even smaller scale— because quantum mechanics is inclined to stop at Planck scale; but I suggested one might achieve a similar effect by importing the complexity through network connections from the very-large-scale cosmos, as if the scale of the universe were wrapping around from the very small to the very large.

We're now suggesting that network connections provide the quantum-like probability distributions, at whatever scale affords these distributions.  Moreover, we have this puzzle of imbalance between, ostensibly, small bounded network arity of primitive elements (analogous to nodes in a syntax tree) and large, possibly unbounded, network arity of macrostructures.  The prospect arises that perhaps the conventionally "elementary" particles —quarks and their ilk— could be already very large structures, assemblages of very many primitive elements.  In the analogy to λ-calculus, a quark would correspond to a subterm, with a great deal of internal structure, rather than to a parse-tree-node with strictly bounded structure.  The quark could then have a very large network arity, after all.  Quantum behavior would presumably arise from a favorable interaction between the influence of network connections to macrostructures at a very large cosmic scale, and the influence of geometric connections to microstructures at a very small scale.  The structural interactions involved ought to be fascinating.  It seems likely, on the face of it, that the macrostructures, exhibiting altogether different patterns of network connections than the corresponding microstructures, would also have different sorts of probability distributions, not so much quantum as co-quantum — whatever, exactly, that would turn out to mean.

If quantum mechanics is, then, an approximation arising from an interaction of influences from geometric connections to the very small and network connections to the very large, we would expect the approximation to hold, not at the small end of the range of scales, but only at a subrange of intermediate scales — not too large and at the same time not too small.  In studying the dynamics of model rewriting systems, our attention should then be directed to the way these two sorts of connections can interact to reach a balance from which the quantum approximation can emerge.

At a wild, rhyming guess, I'll suggest that the larger a quantum "particle" (i.e., the larger the number of primitive elements within it), the smaller each corresponding macrostructure.  Thus, as the quanta get larger, the macrostructures get smaller, heading toward a meeting somewhere in the mid scale — notionally, around the square root of the number of primitive elements in the cosmos — with the quantum approximation breaking down somewhere along the way.  Presumably, the approximation also requires that the macrostructures not be too large, hence that the quanta not be too small.  Spinning out the speculation, on a logarithmic scale, one might imagine the quantum approximation working tolerably well for, say, about the middle third of the lower half of the scale, with the corresponding macrostructures occupying the middle third of the upper half of the scale.  This would put the quantum realm at a scale from the number of cosmic elements raised to the 1/3 power, down to the number of cosmic elements raised to the 1/6 power.  For example, if the number of cosmic elements were 10¹²⁰, quantum scale would be from 10⁴⁰ down to 10²⁰ elements.  The takeaway lesson here is that, even if those guesses are off by quite a lot, the number of primitive elements in a minimal quantum could still be rather humongous.
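For what it's worth, the arithmetic of that guess is trivial to spell out (this merely restates the numbers above, it adds nothing to them):

```python
# Log-scale bookkeeping for the guess above.
log_n = 120        # log10 of the guessed cosmic element count, 10^120

# middle third of the lower half of the scale: the quantum realm
quantum = (log_n / 6, log_n / 3)                          # 10^20 .. 10^40 elements
# middle third of the upper half: the corresponding macrostructures
macro = (log_n / 2 + log_n / 6, log_n / 2 + log_n / 3)    # 10^80 .. 10^100 elements

assert quantum == (20, 40)
assert macro == (80, 100)
```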

Study of the emergence of quasiparticles seems indicated.

Thursday, April 18, 2019

Nabla

They [quaternions] are relatively poor 'magicians'; and, certainly, they are no match for complex numbers in this regard.
Roger Penrose, The Road to Reality, 2005, §11.2.

In this post, I'm going to explore some deep questions about the nature of quaternion differentiation.

Along the way I'm going to suggest some reasons Penrose's assessment quoted above may be somewhat off-target.  I'm quite interested in Penrose's view of quaternions because he presents a twenty-first century form of the classical arguments against quaternions, with an (afaics) sincere effort at objectivity, by someone who patently does appreciate the profound power of elegant mathematics (in apparent contrast to the vectorists' side of the 1890s debate).  The short-short version:  Not only do I agree that quaternions lack the magic of complex numbers, I think it would be bizarre if they had that magic since they aren't complex numbers — but I see clues suggesting they've other magic of their own.

If I claimed to know just what the magic of quaternions is, it would be safe to bet I'd be wrong; the challenge is way too big for answers to come that easily.  However, in looking for indirect evidence that some magic is there to find, I'll pick up some clues to where to look next, the ambiguous note on which this post will end... or rather, trail off. 

Before I can start in on all that, though, I need to provide some background.

Contents
Setting the stage
Quaternions
Doubting classical nabla
Considering quaternions
Full nabla
Partial derivatives
Generalized quaternions
Rotation
Minkowski
Langlands
Setting the stage

When I was first learning vector calculus as a freshman in college (for perspective, that's about when Return of the Jedi came out), I initially supposed that the use of symbol ∇ in three different differential operators —  ∇,  ∇×,  ∇·  — was just a mnemonic device.  My father, who'd been interested in quaternions since, as best as I can figure, when he was in college (about when Casablanca came out), promptly set me straight:  those operators look similar because they're all fragments of a single quaternion differential operator called nabla:

∇  =  i ∂/∂x  +  j ∂/∂y  +  k ∂/∂z .
If you know a smattering of vector calculus, you may be asking, isn't that just the definition of the gradient operator?  No, because of the seemingly small detail that the factors i, j, k aren't unit vectors:  i, j, k are imaginary numbers.  Which has extraordinary consequences.  I'd better take a moment to explain quaternions.

Quaternions

A vector space over some kind of scalar numbers is the n-tuples of scalars, for some fixed n, together with scalar multiplication (to multiply a vector by a scalar, just multiply all the elements of the vector by that scalar) and vector addition (add corresponding elements of the vectors).  An algebra is a vector space equipped with an internal operation called multiplication ("internal" meaning you multiply a vector by a vector, rather than by a scalar) and a multiplicative identity, such that scalar multiplication commutes with internal multiplication, and internal multiplication is bilinear (fancy term, simple once you've seen it:  each element of the product is a polynomial in the elements of the factors, where each term in each polynomial has one element from each factor).

Whatever interesting properties a particular algebra has are, in a sense, contained in its internal multiplication.  So when we speak of the "algebraic structure" of an algebra, what we're talking about is really just its multiplication table.

Quaternions are a four-dimensional hypercomplex algebra.  They're denoted by the symbol ℍ (after Hamilton, their discoverer).  Hypercomplex just means that the first of the four basis elements is the multiplicative identity, so that the first dimension of the vector space can be identified with the scalars, in this case the real numbers, ℝ.  Traditionally the four basis elements are called 1, i, j, k; which said, hereafter I'll prefer to call the imaginaries i1, i2, i3, and occasionally use i0 as a synonym for 1.  The four real vector-space components of a quaternion, I'll indicate by putting subscripts 0,1,2,3 on the name of the quaternion; thus, a = a0 + a1i1 + a2i2 + a3i3 = Σ akik.

Quaternion multiplication is defined by i1² = i2² = i3² = i1i2i3 = −1, where multiplication of the imaginary basis elements is associative (i1(i2i3) = (i1i2)i3  and so on) and different imaginary basis elements anticommute (i1i2 = −i2i1  and so on).  The whole multiplication table can be put together from these few rules, and we have the quaternion product (take a deep breath):

ab  =     (a0b0 − a1b1 − a2b2 − a3b3)
      + i1 (a0b1 + a1b0 + a2b3 − a3b2)
      + i2 (a0b2 − a1b3 + a2b0 + a3b1)
      + i3 (a0b3 + a1b2 − a2b1 + a3b0) .
This is that bilinear multiplication table mentioned earlier, where each element of the product is a simple polynomial in elements of the factors.  If you stare at this a bit, you can also see that when a and b are imaginary (that is, a0 = b0 = 0), the real part of the product is minus the dot product of the vectors, and the imaginary part of the product is the cross product of the vectors:  ab = a×b − a·b.
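That table is mechanical enough to transcribe and check by machine; a quick Python sketch (my own transcription of the formula above, nothing more) verifying the defining identities and the cross/dot decomposition:

```python
def qmul(a, b):
    """Quaternion product of 4-tuples (a0, a1, a2, a3), per the table above."""
    a0, a1, a2, a3 = a
    b0, b1, b2, b3 = b
    return (a0*b0 - a1*b1 - a2*b2 - a3*b3,
            a0*b1 + a1*b0 + a2*b3 - a3*b2,
            a0*b2 - a1*b3 + a2*b0 + a3*b1,
            a0*b3 + a1*b2 - a2*b1 + a3*b0)

i1, i2, i3 = (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)
minus_one = (-1, 0, 0, 0)

# defining relations: i1² = i2² = i3² = i1 i2 i3 = −1
assert qmul(i1, i1) == qmul(i2, i2) == qmul(i3, i3) == minus_one
assert qmul(i1, qmul(i2, i3)) == minus_one
# anticommutation: i1 i2 = i3 = −(i2 i1)
assert qmul(i1, i2) == (0, 0, 0, 1)
assert qmul(i2, i1) == (0, 0, 0, -1)

# for imaginary a and b, ab = a×b − a·b: the real part is minus the
# dot product, the imaginary part is the cross product
a, b = (0, 1, 2, 3), (0, 4, 5, 6)
p = qmul(a, b)
assert p[0] == -(1*4 + 2*5 + 3*6)                    # −a·b
assert p[1:] == (2*6 - 3*5, 3*4 - 1*6, 1*5 - 2*4)    # a×b
```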

A few handy notations:   real part  re(a) = a0;   imaginary part  im(a) = a − a0;   conjugate  a* = re(a) − im(a);   norm  ||a|| = sqrt(Σ ak²).

Quaternion multiplication is associative.  Quaternion multiplication is also non-commutative, which was a big deal when Hamilton first discovered quaternions in 1843, because until then all known kinds of numbers had obeyed all the "usual" laws of arithmetic.  But what's really interesting about quaternion multiplication — at least, on the face of it — is that it has unique division.  That is, for all quaternions a and b, where a is non-zero, there is exactly one quaternion x such that  ax = b, and exactly one quaternion x such that  xa = b.  In particular, with b=1, this says that every non-zero a has a unique left-multiplicative inverse, and a unique right-multiplicative inverse.  These are actually the same number, which we write

a⁻¹  =  a* / ||a||²
(the conjugate divided by the square of the norm).  So  aa⁻¹ = a⁻¹a = 1.

Division algebras are very special; pathological cases aside, there are only four of them:  real numbers, complex numbers, quaternions, and octonions.  (Yes, there are hypercomplex numbers with seven imaginaries that are even more mind-bending than quaternions.  But that's another story.)

To solve equation ax=b, we left-multiply both sides by a⁻¹, thus  a⁻¹b = a⁻¹(ax) = (a⁻¹a)x = x; and likewise, the solution to xa=b is  x = ba⁻¹.  We call right-multiplication by  a⁻¹  "right-division by a", and write  b / a = ba⁻¹; similarly, left-division  a \ b = a⁻¹b.  (Backslash, btw, is such a dreadfully overloaded symbol, I can somewhat understand why I haven't seen others use it this way; but I'm quite bowled over by how elegantly natural this use seems to me.  It preserves the order of symbols when applying associativity:  (a / b) c = a (b \ c).)  Naturally, I won't write division vertically unless the denominator is real.
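All of which can be checked numerically; a self-contained Python sketch (again just transcribing the formulas above, with an arbitrary trio of test quaternions):

```python
def qmul(a, b):
    """Quaternion product of 4-tuples (a0, a1, a2, a3)."""
    a0, a1, a2, a3 = a
    b0, b1, b2, b3 = b
    return (a0*b0 - a1*b1 - a2*b2 - a3*b3,
            a0*b1 + a1*b0 + a2*b3 - a3*b2,
            a0*b2 - a1*b3 + a2*b0 + a3*b1,
            a0*b3 + a1*b2 - a2*b1 + a3*b0)

def conj(a):
    return (a[0], -a[1], -a[2], -a[3])

def inv(a):
    """a⁻¹ = a* / ||a||², for non-zero a."""
    n2 = sum(x*x for x in a)
    return tuple(x / n2 for x in conj(a))

def rdiv(b, a):   # b / a  =  b a⁻¹
    return qmul(b, inv(a))

def ldiv(a, b):   # a \ b  =  a⁻¹ b
    return qmul(inv(a), b)

a, b, c = (1.0, 2.0, -1.0, 3.0), (0.5, -1.0, 2.0, 1.0), (2.0, 0.0, 1.0, -1.0)
one = (1.0, 0.0, 0.0, 0.0)
close = lambda p, q: all(abs(x - y) < 1e-12 for x, y in zip(p, q))

# the left and right inverses coincide: a a⁻¹ = a⁻¹ a = 1
assert close(qmul(a, inv(a)), one) and close(qmul(inv(a), a), one)
# x = a \ b uniquely solves ax = b;  x = b / a solves xa = b
assert close(qmul(a, ldiv(a, b)), b)
assert close(qmul(rdiv(b, a), a), b)
# backslash preserves symbol order under associativity: (a / b) c = a (b \ c)
assert close(qmul(rdiv(a, b), c), qmul(a, ldiv(b, c)))
```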

Okay, we're girded for battle.  Back to nabla.

Doubting classical nabla

Our definition of nabla, you'll recall, was

∇  =  i1 ∂/∂x1  +  i2 ∂/∂x2  +  i3 ∂/∂x3 .
This operator is immensely useful; people have been making great use of it, or its fragments in vector calculus, for well over a century and a half.  But three things about it bother me, the first two of which I've seen remarked by papers written in the past few decades.

Handedness

My first bother follows from the fact that quaternion multiplication isn't commutative.  This was, remember, a dramatic new idea in 1843; the key innovation that empowered Hamilton's discovery, because what he wanted — something akin to complex numbers but with arithmetic sensible in three dimensions — requires non-commutative multiplication.  But if multiplication isn't commutative, why should the partial derivatives in the definition of ∇ necessarily be left-multiplied by the imaginaries?  Why shouldn't they be right-multiplied, instead?

I've seen modern papers that use together both left and right versions of nabla.  Peter Michael Jack, who's had a web presence for (I believe) nearly as long as there's been a web, has suggested using a prefix operator for the left-multiplying nabla and a postfix operator for the right-multiplying nabla.  Candidly, I find that notation very hard to read.  The point of prefix operators (which Hamilton championed, the better part of a century before Jan Łukasiewicz) is to make expressions much simpler to parse, and mixing prefix with postfix doesn't do that.  Another notation I see used in at least one place modernly is a subscript q on the left or right of the nabla symbol to indicate which side to put the imaginary factors on.  I'm not greatly enthused by that notation either because it uses a multi-part symbol.  I have an alternative solution in mind for the notational puzzle; but I'll want to make clear first the whole of what I'm trying to notate.

Truncation

My second bother is that traditionally defined nabla isn't even a full quaternion operator.  It only has the partial derivatives with respect to the three imaginary components.  Where's the partial with respect to the real component?  In the 1890s debate, the quaternionists said quaternions are profoundly meaningful as coherent entities, and the vectorists said scalars and vectors are meaningful while their sum is meaningless.  Now, I'm quite sympathetic to the importance of mathematical elegance, but come on, guys, make up your mind!  Either you go full-quaternion, or you don't. A nabla that only acts on vector functions is just lame.

There's a good deal of history related to the question of imaginary nabla versus full nabla.  The truncation of nabla to the imaginary components throughout the nineteenth century may have been partly an historical accident.  Consideration of the four-dimensional operator seems to have started just before the turn of the twentieth century, and modern quaternionists I've observed use a full-quaternion operator.  I'll have more to say about this history in the next section.

Meaning

My third bother is a byproduct, as best I can figure, of my persistent sense of arbitrariness about the nabla operator.  (This is the difficulty I've never seen anyone else remark upon.  Perhaps I'm missing something everyone else gets, or maybe I've just never looked in the right place; but then again, maybe people are just reluctant to publicly admit something doesn't make sense to them.  That might explain a lot about the world.)  It was never obvious to me why it should be meaningful — or, if you prefer the word, useful — to multiply the partial derivatives by the imaginaries in the first place.  It's clear to me why you'd do that if you were defining gradient, because gradient is meaningful for any number of dimensions, and doesn't depend in any way on the existence of a division operation.  But quaternions do have unique division, in fact it's rather a big deal that they have unique division, and the usual definition of ordinary derivative involves dividing by Δx.  So why are we multiplying by the imaginaries, instead of dividing by them?

Considering quaternions

Some of my above questions have historical answers, which also bear on the challenge raised by Penrose in the epigraph at the beginning of this post.

By Chapter 11 of The Road to Reality, where Penrose makes that remark (and where he also candidly describes the question of quaternions' use in physics as a "can of worms"), he's already described some marvelous properties of complex numbers, culminating with one (hyperfunctions) only published in 1958.  Which raises an important point.  Complex numbers have been intensely studied by mainstream researchers throughout the modern era of physics, yet Penrose's crowning bit of complex 'magic' wasn't discovered until 1958?

Compare that to how much, or rather how little, scrutiny quaternions have received.  Hamilton discovered them in 1843; but Hamilton was a mathematical genius, not a great communicator.  Quaternions, so I gather, remained the archetype of a baffling abstract theory until Einstein's General Theory of Relativity took over that role.  The first tome Hamilton wrote on the subject, Lectures on Quaternions, daunted the mathematical luminaries of the day; his later Elements of Quaternions, published incomplete in 1866 following his death in 1865, wasn't easy either.  The first accessible introduction to the subject was Peter Guthrie Tait's 1867 Elementary Treatise on Quaternions.  Quaternions got a big publicity boost when James Clerk Maxwell used them (for their conceptual clarity, rather than for mundane calculations) in his 1873 Treatise on Electricity and Magnetism.  And then in the 1890s quaternions "lost" the great vectors-versus-quaternions debate and their use gradually faded thereafter.  There simply weren't all that many people working with quaternions in the nineteenth century, and as world population increased in the twentieth century quaternions were no longer a hot topic.

Moreover, exploration of nabla got off on the wrong foot.  Hamilton seems to have first dabbled with it several years before he discovered quaternions, as a sort of "square root" of the Laplacian, at which time naturally he only gave it three components; and when he adapted it to a quaternionic form it still had only three components.  He didn't do much with it in the Lectures, and planned a major section on it for the Elements but, afaict, died before he got to it.  James Clerk Maxwell was a first-class mind and a passionate quaternion enthusiast, but died at the age of forty-eight in 1879 — the same year as William Kingdon Clifford, who was only thirty-three, another first-class mind who had explored quaternions.  The full-quaternion nabla was finally looked at, preliminarily, in 1896 by Shunkichi Kimura, but by that time the quaternionic movement was starting to wind down.  Yes, quaternions were still being used for some decades thereafter, but less and less, and the notations get harder and harder to follow as quaternionic notations were hybridized with Gibbs vector notations, further disrupting the continuity of the tradition and undermining systematic progress.  Imho, it's entirely possible for major insights to still be waiting patiently.

A subtle point on which Penrose's portrayal of quaternions is somewhat historically off:  Penrose says that although to a modern mind the one real and three imaginary components of quaternions naturally suggest the one time and three space dimensions of spacetime, that's just because we've been acclimated to the idea of spacetime by Einstein's theory of relativity; and quaternions don't actually work for relativity because they have the wrong signature (I'll say a bit more about this below).  But as far as the notion of spacetime goes, the shoe is on the other foot.  Hamilton expected mathematics to coincide with reality (a principle Penrose also, broadly, embraces), and as soon as he discovered quaternions he did connect their structure metaphysically with the four dimensions of space and time.  Penrose is quite right, I think, that ideas like this get to be "in the air"; but in this case it looks to me like it first got into the air from quaternions.  So I'm more inclined to suspect quaternions suggested spacetime and thereby subtly contributed to relativity, rather than relativity and spacetime suggesting a connection to quaternions.  The latter implies an anachronistic influence that must be illusory (for relativity to influence Hamilton would seem to require a TARDIS); the former hints at some deeper magic.

The point about quaternions having the wrong signature has its own curious historical profile.  Penrose expresses very much the mainstream party line on the issue, essentially echoing the assessment of Hermann Minkowski a century earlier who, in formulating his geometry of spacetime, explicitly rejected quaternions, saying they were "too narrow and clumsy for the purpose".  The basic mathematical point here (or, at least, a form of it) is that the norm of a quaternion is the square root of the sum of the squares of its components, √(t² + x² + y² + z²), whereas in Minkowski spacetime the three spatial elements should be negative, √(t² − x² − y² − z²).  But here the plot thickens.  Minkowski, who so roundly rejected quaternions, defines a differential operator that is, structurally, the four-dimensional nabla.  As for quaternions and relativity, Ludwik Silberstein (a notable popularizer of relativity, in his day) did use quaternions for special relativity — except that, to be precise, he used biquaternions.

Biquaternions (which Hamilton had also worked with) are quaternions whose four coefficients are complex numbers in a fourth, independent imaginary.  Or, equivalently, they're complex numbers whose two coefficients are quaternions in an independent set of three imaginaries.  Either way, that's a total of eight real coefficients.  Biquaternions do not, of course, have unique division.  However, there are some oddly suggestive features to Silberstein's treatment.  His spacetime vectors have only four non-zero real coefficients (of the four quaternion coefficients, a0 is real while ak≥1 are imaginary, so that  Σ ak² = a0² − ||a1||² − ||a2||² − ||a3||²; while other biquaternions he considers have imaginary a0 and real ak≥1).  Moreover, he prominently uses the "inverse" of a biquaternion, defined structurally just as for quaternions,  a* / ||a||², notwithstanding the technical lack of general biquaternion division.
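That Minkowski-signature sign pattern is easy to verify numerically; a sketch (my own quick check, using Python's complex numbers to play the fourth, independent imaginary):

```python
# Biquaternions: quaternions whose coefficients are complex numbers.  The
# quaternion product formula is unchanged; Python's j is the independent
# fourth imaginary, commuting with i1, i2, i3.
def qmul(a, b):
    a0, a1, a2, a3 = a
    b0, b1, b2, b3 = b
    return (a0*b0 - a1*b1 - a2*b2 - a3*b3,
            a0*b1 + a1*b0 + a2*b3 - a3*b2,
            a0*b2 - a1*b3 + a2*b0 + a3*b1,
            a0*b3 + a1*b2 - a2*b1 + a3*b0)

def conj(a):   # quaternion conjugate; leaves the complex coefficients alone
    return (a[0], -a[1], -a[2], -a[3])

# A Silberstein-style spacetime vector: real a0, imaginary ak for k >= 1.
t, x, y, z = 2.0, 0.3, -0.7, 1.1
a = (t + 0j, x * 1j, y * 1j, z * 1j)

# a a* comes out real, with Minkowski signature t² − x² − y² − z².
n2 = qmul(a, conj(a))
assert abs(n2[0] - (t*t - x*x - y*y - z*z)) < 1e-12
assert all(abs(v) < 1e-12 for v in n2[1:])
```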

Silberstein's approach contrasts with the quaternionic treatment of special relativity by P.A.M. Dirac, published in 1945 as part of the centennial celebration for the discovery of quaternions.  Dirac used real quaternions on the grounds that since the merit of quaternions is in their having division, it would be pointless to use biquaternions which are of no particular mathematical interest.  His mapping of spacetime coordinates onto real quaternions was unintuitive.  But looking at the oddly familiar-looking patterns in Silberstein's treatment, and Minkowski's operator which is hard not to think of as a full quaternionic nabla, one might well wonder if there is something going on that defies Dirac's claim about the importance of unique division.  Perhaps we've been incautious in our assumptions about just where the deep magic is to be found.

There are two pitfalls in this kind of thinking, which the inquirer must thread carefully between.  On one hand, one might assume there is some unknown deep magic here, rather than trying to work out what it is; this not only would lean toward numerology, but if there really is something to be found, would miss out on the benefits of finding it.  On the other hand, one could derive some superficial mathematical account of the particular mathematical relationships involved, based on math one already knows about, and assume this is all there is to the matter; which would again guarantee that any deeper insight waiting to be found would not be found.  (Current mainstream thinking, btw, falls into the latter pitfall, essentially reasoning that geometric algebras are useful in a way that quaternions are not, therefore quaternions are not useful.)  Is there any situation where it would really be time to give up the search altogether?  Well, yes, one does come to mind — if one were to arrive at some deep insight into why one should really believe there isn't some deep magic here.  Which might itself be some rather deep and interesting magic.

Frankly, I don't even know quite where to look for this hypothetical deep magic.  I sense its presence, as I've just described; but so far, I'm exploring various questions in the general neighborhood, patiently, with the notion that if these sorts of great insights naturally emerge from a large, broad body of research (as they have done for complex numbers), the chances of finding such a thing should improve as one increases the overall size and breadth of one's body of lesser insights.

Which brings me back to the particular point I'm pursuing in this post, the full quaternion nabla.

Full nabla

From a purely technical perspective, it isn't difficult to define four versions of the full quaternion nabla, differing only by whether each imaginary acts on its corresponding partial derivative by left-multiplying, right-multiplying, left-dividing, or right-dividing.  The only remaining — purely technical — question is how to write these four different operators in an uncluttered way that keeps them straight.  Since the traditional nabla has three partial derivatives and is denoted by a triangle, I'll denote these full nablas, with four partial derivatives, by a square.  To keep track of how the imaginaries are introduced, I'll put a dot inside the square, near one of the corners:  upper left for left-multiplying by imaginaries, upper right for right-multiplying, lower left for left-dividing, lower right for right-dividing.  (This operator notation affords coherence, as the dot is inside so there's no mistaking it for a separate element, and, as a bonus, should also be easy to write quickly and accurately by hand on the back of an envelope.)

Let  a = f(x).  Noting that for imaginary ik,  1/ik = −ik, the full-quaternion nablas are (writing ◰, ◳, ◱, ◲ for the square with the dot in the upper-left, upper-right, lower-left, and lower-right corner, respectively):

   ◰a  =  ∂a/∂x0 + i1 ∂a/∂x1 + i2 ∂a/∂x2 + i3 ∂a/∂x3

   ◳a  =  ∂a/∂x0 + (∂a/∂x1) i1 + (∂a/∂x2) i2 + (∂a/∂x3) i3

   ◱a  =  ∂a/∂x0 − i1 ∂a/∂x1 − i2 ∂a/∂x2 − i3 ∂a/∂x3  =  (◰a)*

   ◲a  =  ∂a/∂x0 − (∂a/∂x1) i1 − (∂a/∂x2) i2 − (∂a/∂x3) i3  =  (◳a)*
and when we expand  a = a0 + a1 i1 + a2 i2 + a3 i3,

   ◰a  =     ( ∂a0/∂x0 − ∂a1/∂x1 − ∂a2/∂x2 − ∂a3/∂x3 )
        + i1 ( ∂a1/∂x0 + ∂a0/∂x1 + ∂a3/∂x2 − ∂a2/∂x3 )
        + i2 ( ∂a2/∂x0 − ∂a3/∂x1 + ∂a0/∂x2 + ∂a1/∂x3 )
        + i3 ( ∂a3/∂x0 + ∂a2/∂x1 − ∂a1/∂x2 + ∂a0/∂x3 ) .
Here, the left-hand column is the partial with respect to x0, and the rest is the fragmentary differential operators from vector calculus:  the rest of the top row is minus the divergence, the rest of the diagonal is the gradient, and the remaining six terms are the curl.  When we reverse the order of multiplication for the right-multiplying ◳, the imaginaries commute with the scalars and with themselves, but anticommute with each other — so everything stays the same except that the sign of the curl is reversed.  We have

   ◰  =  ∂/∂x0 − div + grad + curl

   ◳  =  ∂/∂x0 − div + grad − curl

   ◱  =  ∂/∂x0 + div − grad − curl

   ◲  =  ∂/∂x0 + div − grad + curl .
By taking differences between these nablas, one can isolate the partial with respect to x0, and the curl, and... the gradient minus the divergence.  One cannot, however, separate the gradient from the divergence this way, which raises the suspicion that the gradient and divergence are, in some profound sense, a single entity.  There may be some insights waiting here into the intuitive meanings of these various fragments of the full nabla.
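These sign patterns are easy to sanity-check numerically.  Here's a minimal Python sketch (my own illustration, not part of any treatment cited here; the sample function, evaluation point, and step size are arbitrary choices): it assembles the left-multiplying nabla by honest quaternion multiplication of central-difference partials, then reassembles it from the vector-calculus fragments ∂/∂x0, div, grad, and curl, and confirms the two agree.

```python
# Check numerically that the left-multiplying full nabla decomposes as
# d/dx0 - div + grad + curl.  Quaternions are 4-tuples (a0, a1, a2, a3).

def qmul(p, q):
    """Hamilton product of two quaternions."""
    p0, p1, p2, p3 = p
    q0, q1, q2, q3 = q
    return (p0*q0 - p1*q1 - p2*q2 - p3*q3,
            p0*q1 + p1*q0 + p2*q3 - p3*q2,
            p0*q2 - p1*q3 + p2*q0 + p3*q1,
            p0*q3 + p1*q2 - p2*q1 + p3*q0)

def a(x):
    """Arbitrary smooth quaternion-valued sample function."""
    x0, x1, x2, x3 = x
    return (x1*x2, x0*x3, x1*x1, x2*x3)

h = 1e-4
def partial(k, x):
    """Central-difference partial of a with respect to x_k."""
    xp, xm = list(x), list(x)
    xp[k] += h
    xm[k] -= h
    return tuple((p - m) / (2*h) for p, m in zip(a(xp), a(xm)))

x = (0.3, -1.2, 0.7, 2.0)
d = [partial(k, x) for k in range(4)]       # d[k][j] = da_j/dx_k

# The nabla itself:  sum over k of  i_k * (da/dx_k),  with i_0 = 1.
basis = [(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)]
terms = [qmul(basis[k], d[k]) for k in range(4)]
box_ul = tuple(sum(t[j] for t in terms) for j in range(4))

# The same thing assembled from the vector-calculus fragments.
div  = d[1][1] + d[2][2] + d[3][3]                       # of the vector part
grad = (d[1][0], d[2][0], d[3][0])                       # of the scalar part
curl = (d[2][3] - d[3][2], d[3][1] - d[1][3], d[1][2] - d[2][1])
recombined = (d[0][0] - div,
              d[0][1] + grad[0] + curl[0],
              d[0][2] + grad[1] + curl[1],
              d[0][3] + grad[2] + curl[2])

assert all(abs(u - v) < 1e-9 for u, v in zip(box_ul, recombined))
```

Flipping signs per the table gives the corresponding checks for the other three squares.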

Wait.  Wasn't part of the point of the 1890s debate that the quaternionists maintained the whole quaternion was in a profound sense a single entity?  Why are we still talking about the meanings of fragments of this thing, instead of the whole?  And while we're at it, why is it in any way meaningful to multiply-or-divide the partial derivatives by the basis elements?

Partial derivatives

From here, the path I've been following breaks up, with faint trails scattering off in many directions.  No one trail immediately suggests itself to me as especially worth a protracted stroll, so for now I'll take a quick look down the first turn or so of several, getting a sense of the immediate neighborhood, and let my back‍brain mull over what to explore in some future post.

Possibly, in my quest for the deeper meaning of the nabla operator, I may be asking too much.  With the caveat that this may be one of those situations where it's right to ask too much; some kinds of results must be pursued that way.  Still, it's worth keeping in mind that, idealism notwithstanding, there's always been a strong element of utility in the nabla tradition.  Starting with, as noted above, the pre-quaternion history of nabla, the choice of operator has been in significant part a matter of what works.

A secondary theme that's been in play at least since Shunkichi Kimura's 1896 treatment is total derivatives versus partial derivatives.  Without tangling in the larger question of coherent meaning, Kimura did address this point explicitly and up-front:  why write

   ◰a  =  ∂a/∂x0 + i1 ∂a/∂x1 + i2 ∂a/∂x2 + i3 ∂a/∂x3

rather than

   ◰a  =  da/dx0 + i1 da/dx1 + i2 da/dx2 + i3 da/dx3   ?
Kimura, after noting that the two forms are interchangeable when the xk are independent, chose partial derivatives.  He reached this choice by considering the utility of the two candidate operators in expressing some standard equations, and adopting the one he found notationally more convenient.  It figures this would be the operator using partial derivatives, which are more technically primitive building blocks and thus —one would think— ought logically to provide a more versatile foundation.

An (arguably) more definite form of the total/partial question appears in modern quaternionic treatments of Maxwell's equations ([1], [2]), with the peculiar visible consequence that the definition of full nabla in these treatments has a stray factor of 1/c on the partial with respect to time (x0).  On investigation, this turns out to be a consequence of starting out with the total derivative with respect to time, supposing (as I track this, three and a half decades after I took diffy Q‍s) the whole is time-dependent.  Expanding the partials,

   d/dx0  =  ∂/∂x0 + (∂x1/∂x0) ∂/∂x1 + (∂x2/∂x0) ∂/∂x2 + (∂x3/∂x0) ∂/∂x3 .

Now, the partials ∂xk/∂x0 (for k ≥ 1) are the velocities of propagation along the spatial axes, which for Maxwell's equations are taken to be the speed of light, c.  This factor of c therefore shows up on three out of four partials, but not on the partial with respect to time; for convenience —that again— one defines an operator with a factor of 1/c on it, which eliminates the extra factors of c on three of the partials, but introduces a 1/c on the partial with respect to time.
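That chain-rule origin of the 1/c is simple enough to check numerically.  In this sketch (my own; the test function, speed, and trajectory are arbitrary assumptions), each spatial coordinate propagates as xk = c·x0, and the total derivative along the trajectory indeed equals the time partial plus c times each spatial partial.

```python
# Chain-rule origin of the 1/c:  along a trajectory where each spatial
# coordinate propagates at speed c  (x_k = c * x0 for k >= 1),  the total
# derivative with respect to x0 gains a factor of c on each spatial partial.

c, h = 3.0, 1e-5

def f(x0, x1, x2, x3):
    return x0*x1 + x2*x2 - x1*x3        # arbitrary smooth test function

def along_trajectory(x0):
    return f(x0, c*x0, c*x0, c*x0)

t = 0.7
lhs = (along_trajectory(t + h) - along_trajectory(t - h)) / (2*h)   # d/dx0

def partial(k, args):
    p, m = list(args), list(args)
    p[k] += h
    m[k] -= h
    return (f(*p) - f(*m)) / (2*h)

args = (t, c*t, c*t, c*t)
rhs = partial(0, args) + c*(partial(1, args) + partial(2, args) + partial(3, args))

assert abs(lhs - rhs) < 1e-6
```

Dividing the whole operator through by c to normalize the three spatial terms is what leaves the lone 1/c sitting on the partial with respect to time.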

And then there is the matter of orienting the partials.  I'm still foggy on how the imaginaries get in there, and thus on whether they multiply or divide, on the left or on the right.  I see treatments just splicing the imaginaries in with at most a casual reference to orientation in an algebra, which early classroom experience conditioned me to read as the work of someone who understands it all and doesn't take time to explain every little thing (I've been in that position a few times myself); but over time I've come to suspect that the folks writing so in this case might not really understand it any better than I do (I've been in that position, too).

Generalized quaternions

Quaternions lost out on the concrete front to vector calculus.  But they also lost out on the abstract front.  Mathematicians took Hamilton's idea of using axioms to define more general forms of numbers and reason about their properties, and ran with it.  Linear algebra.  Clifford algebras.  Lie and Jordan algebras.  Rings.  Groups.  Monoids.  Semi-groups.  People who want special numbers won't go as far as quaternions, and people who want general numbers won't stop at quaternions.

Yet, generalized quaternions — quaternions whose four coefficients aren't real numbers — have occasionally been employed.  Why?  On the face of it, generalized quaternions don't have the specific properties that make real quaternions unique.  Are they used, then, out of some perceived mystical significance of quaternions, or is there actually something structural about quaternions, aside from their unique mathematical properties as a division algebra, that they can confer even in the generalized setting?  I do not, of course, have a decisive answer for this question.  I do have some places to look for small insights building toward prospects of an answer.

The places to look evidently fall into two groups, those that look within the scope of real quaternions and those that look at generalized forms of quaternions.  In looking at real quaternions the point is to understand what they have to offer beyond mere unique division, that might possibly linger after the unique division itself has dropped away.  I'll have more to say, further below, about real quaternions than about generalized ones; I'm simply not familiar with much research using generalized quaternions as such, since most researchers either stick with real quaternions or drop quaternion structure altogether.

On the generalized-quaternions front, I've already mentioned Silberstein; but, tbh, all I get from Silberstein is the question.  That is, Silberstein's work suggests to me there's something of interest in generalized quaternions, but doesn't go far enough to identify what.  There are some well-known generalizations that go off in different directions from Silberstein; besides geometric algebras, which are enjoying some popularity atm, there's the Cayley–Dickson construction, which offers an infinite sequence of hypercomplex algebras with 2^n components, each losing just a bit more well-behavedness:  complexes, quaternions, octonions, sedenions, and on indefinitely (though usually not bothering with fancy names beyond the sedenions).  So far, I haven't felt any of those sorts of generalizations were retaining the character of quaternions; so that, whatever merits those generalizations might enjoy in themselves, they wouldn't offer insights into the peculiar merits of quaternions.
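For concreteness, here's a short Python sketch of the Cayley–Dickson doubling (my own illustration, using one standard form of the doubling product; not drawn from Silberstein or any source discussed here).  Each level represents a number as a pair of numbers from the level below; one doubling from the complexes recovers Hamilton's quaternions, and the next doubling loses associativity, giving the octonions.

```python
# Cayley-Dickson doubling: numbers at each level are pairs (a, b) of numbers
# from the level below, multiplied by  (a, b)(c, d) = (ac - d*b, da + bc*),
# where * is conjugation.  Reals sit at the bottom of the recursion.

def conj(x):
    if isinstance(x, tuple):
        a, b = x
        return (conj(a), neg(b))
    return x                        # a real is its own conjugate

def neg(x):
    return (neg(x[0]), neg(x[1])) if isinstance(x, tuple) else -x

def add(x, y):
    return (add(x[0], y[0]), add(x[1], y[1])) if isinstance(x, tuple) else x + y

def mul(x, y):
    if isinstance(x, tuple):
        (a, b), (c, d) = x, y
        return (add(mul(a, c), neg(mul(conj(d), b))),
                add(mul(d, a), mul(b, conj(c))))
    return x * y

# Level 1 -- complexes;  level 2 -- quaternions.
zero, one, i = (0.0, 0.0), (1.0, 0.0), (0.0, 1.0)
q1, i1, i2, i3 = (one, zero), (i, zero), (zero, one), (zero, i)

assert mul(i1, i2) == i3            # Hamilton:  i1 i2 = i3 ...
assert mul(i2, i1) == neg(i3)       # ... and it anticommutes
assert mul(i1, i1) == neg(q1)       # ik^2 = -1

# Level 3 -- octonions: associativity is now lost.
qzero = (zero, zero)
obasis = ([(q, qzero) for q in (q1, i1, i2, i3)]
          + [(qzero, q) for q in (q1, i1, i2, i3)])
assert any(mul(mul(x, y), z) != mul(x, mul(y, z))
           for x in obasis for y in obasis for z in obasis)
```

Iterating once more gives the sedenions, which lose even the absence of zero divisors; the recursion itself never stops.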

As it happens, I do know of someone who continues further in what appears to be the same direction as Silberstein.  But there's a catch.

The work I'm thinking of was done about sixty years ago by a Swedish civil engineer by the name of Otto Fischer.  He wrote two books on the subject, Universal Mechanics and Hamilton's Quaternions (1951) and Five Mathematical Structural Models in Natural Philosophy with Technical Physical Quaternions (1957).  It happens I can study the earlier book all I want, because my father bought a copy which I've inherited.  Fischer indeed did not stop at real quaternions nor biquaternions.  He moved on to what he called quadric quaternions — quaternions whose coefficients are themselves quaternions in an independent set of imaginaries, thus with six elementary imaginaries in two sets of three, and sixteen real coefficients — and thence to double quadric quaternions, which are quadric quaternions whose sixteen coefficients are themselves quadric quaternions in independent imaginaries, thus twelve elementary imaginaries in four sets of three, and 256 real coefficients.  If what is needed to bring out the secrets of generalized quaternions is a sufficiently general treatment, Fischer should qualify.

Looking back now, Fischer's work looks a bit fringe; but it didn't look so extra-paradigmatic at the time.  The 1890s vectors-quaternions debates were in the outer reaches of living memory, about as far removed as the 1950s are today; and work on quaternions had been done by some prominent physicists within the past few years.  In particular, Sir Arthur Eddington, who had tinkered with quaternions, had only recently died.  Fischer's work was — deservedly — criticized for its density, but afaict wasn't dismissed out of hand, as such.

In any case, my current interest is on the periphery of things, rather than in the center of prevailing paradigm research; so I can afford to tolerate a certain off-beat character in Fischer's work — up until Fischer gives me a reason to think I've nothing further worthwhile to find in it.  And Fischer comes across as competent and quite self-aware of the density and indirection of his work, which he seeks to mitigate — though there's a real question as to whether he succeeds.

What I really want to understand about Fischer's work is, having provided himself with such an immense array of generalized quaternionic structure, what does he use it for?  There are some clues readily visible in the preface and final sections of the book; somehow he seems to be associating different quaternion subsets of his general numbers with different specialties, and he's playing some kind of games with "pyramids" of differential operators.  To really get a handle on it all, I fear it may be necessary to confront the book in full depth from page 1, which I've tried far enough to realize it's the single densest mathematical treatment I've encountered (though he does take very seriously his own advice to "begin at the beginning", else I'd hold out no hope at all of making sense of it).

So, studying Fischer's work may be one source of... eventual... insights into the puzzle of generalized quaternions.  It certainly isn't a short-term prospect; but, there it is.

Before getting back to real quaternions in the next section, I'll digress to remark that Fischer reinforces a belief I've held ever since I really started researching the history of quaternions — in 1986 — that what we really need in mathematics is a certain type of software.

By my reading of the history, the vectorists in the 1890s debate really did have one important practical point in their favor:  if you have to deal with the algebra by hand, it seems it'd be vastly easier to not make careless errors when following the rectangular regimentation of matrix algebra than the spinning vortices of quaternion algebra.  (Recalling from my earlier post, the equivalence between matrix and quaternion methods is akin to the equivalence between particles and waves — with quaternions playing the part of waves.)  That is, if you try to do quaternion algebra, involving breaking things down into components, on the back of an envelope you're awfully likely to make a mistake; so I immediately imagined having a computer help you get it right.  (I didn't imagine a graphical user interface, btw, as that technology really didn't exist yet for personal computing.  Looking back, I find myself ambivalent about GUIs; sure, they can be sparkly, but they don't always help us think clearly; we're so busy thinking of how to use the graphics, we forget to think first and foremost about the logical structures we'd like to interface with.)

Thinking about this idea, I eventually decided the underlying logical structures one wants would be essentially proofs, so that in a sense the software would be a sort of "proof processor", by loose analogy with the "word processor".  Achieving the fluidity of back-of-the-envelope algebra was always key to my concept; my occasional encounters with "symbolic math" software have given me the impression of something far too cumbersome for what I envision.  Facilely moving between alternative paths of reasoning should be easy; symbol definition would seem to call for something halfway between conventional "declarative" and "imperative" styles.  I also imagined the computer trying, in its free moments, to devise context-sensitive helpful suggestions for what to do next — without trying to take control of the proof task away from the human user.  I've never been a fan of fully-automated proof, as such; in the early days of personal computing (as a commenter on another of my posts reminded me) we anticipated computers of the future would enhance our brain power, not attempt to replace it, and the enhancements weren't to be just increasing our ability to look things up, either.

Where does Fischer come into this?  Well, Fischer not only deals with massive grids of coordinates, his notation looks extremely idiosyncratic to me, using different conventions than anything else I've seen.  Perhaps a typical 1950s Swedish civil engineer would find much of it quite conventional.  But, unless you spend all of your time in one narrow mathematical sub‍community, studying mathematics is a pretty heavily linguistic exercise, because every sub‍community has their own language and one is forever having to translate between them.  Wouldn't it be nice to be able to just toggle some controls and switch between the way one author (such as Fischer) wrote, and the conventions used by whichever other author you prefer?

Btw, this software I'm describing?  Not a minor interest.  Not just a past interest.  I still want it, all the more because, even though I've always felt it was doable and would be immensely valuable, afaict we're no closer to having it now than we were thirty years ago.  Never assume that what you think is needed will be provided by somebody else.  Think of it this way:  if you can see it's doable and would be valuable, presumably you'd be more likely than most people to make it happen; so if you aren't going to the effort to make it happen, that's a sample of one suggesting that nobody else will go to the effort either.  I've also never felt I could properly describe this software in words, so even if I were gifted with a team of programmers to implement it I couldn't tell them what to do; so I figured if it was going to happen I'd have to do it myself.  Only, it looks like a huge project, so for one person to implement it would require a programming language with unprecedentedly vast abstractive power.  By some strange coincidence, designing a programming language like that is something I've been striving for ever since.

Rotation

Another trail that, sooner or later, clearly needs to be explored is the relationship, at its most utterly abstract, between quaternions and rotation.

Hamilton was looking at rotations, from the start.  Quaternions, as noted, stand in relation to matrices as waves to particles; in some profound sense, quaternions seem to be the essence of rotation.  The ordinary understanding of quaternion division is that a quaternion is the ratio of two three-vectors, and the non-commutativity of quaternion multiplication then follows directly from recognition that rotations on a sphere produce different results if done in a different order.  Even Silberstein, who was using biquaternions rather than real quaternions and was working in Minkowski rather than Euclidean spacetime, was doing rotations, which in itself suggests that what's going on is more than meets the eye.
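For readers who want the concrete core of that relationship, here's a minimal sketch (textbook-standard material in my own Python, not taken from any of the historical treatments): a unit quaternion cos(θ/2) + sin(θ/2)u rotates a vector, embedded as a pure quaternion, via v ↦ q v q*; and composing rotations about different axes in different orders gives different results, mirroring the non-commutativity of the quaternion product.

```python
# A unit quaternion  q = cos(t/2) + sin(t/2) u  (u a unit 3-vector) rotates
# a vector v, embedded as the pure quaternion (0, v), via  v -> q v q*.

from math import cos, sin, pi

def qmul(p, q):
    """Hamilton product of quaternions as 4-tuples."""
    p0, p1, p2, p3 = p
    q0, q1, q2, q3 = q
    return (p0*q0 - p1*q1 - p2*q2 - p3*q3,
            p0*q1 + p1*q0 + p2*q3 - p3*q2,
            p0*q2 - p1*q3 + p2*q0 + p3*q1,
            p0*q3 + p1*q2 - p2*q1 + p3*q0)

def conj(q):
    return (q[0], -q[1], -q[2], -q[3])

def rotor(axis, theta):
    """Unit quaternion for rotation by theta about a unit axis."""
    s = sin(theta/2)
    return (cos(theta/2), s*axis[0], s*axis[1], s*axis[2])

def rotate(q, v):
    """Rotate 3-vector v by conjugation:  q v q*."""
    return qmul(qmul(q, (0.0,) + tuple(v)), conj(q))[1:]

qz = rotor((0, 0, 1), pi/2)            # quarter turn about the z axis
x_rotated = rotate(qz, (1, 0, 0))      # carries the x axis to the y axis
assert all(abs(u - w) < 1e-9 for u, w in zip(x_rotated, (0, 1, 0)))

# Rotations about different axes don't commute -- and neither do the rotors:
qx = rotor((1, 0, 0), pi/2)
v = (0, 1, 0)
assert rotate(qmul(qz, qx), v) != rotate(qmul(qx, qz), v)
```

Note the half-angle in the rotor: each vector is acted on by q twice, once on each side, which is also why q and −q give the same rotation.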

This is a tricky point.  The relationship between quaternions and rotation is readily explained, indeed rather trivialized, in terms of peculiarities of rotation in three-dimensional Euclidean space.  This is very much the canonical view, the one embraced by Penrose.  Real quaternions become a single case in a general framework, and are then easily dismissed as merely an aberration that loses its seeming specialness when the wider context is properly appreciated.

The weakness in this reasoning is that it depends on the choice of general framework.  This would be easier to see if the framework involved were alternative rather than mainstream.  Suppose there were two different general frameworks in which the specific case (here, quaternions) could be fit; and in one of these frameworks, the specific case appears incidental, while in the other framework it appears pivotal.  It would then be hard to make a compelling case, based on the first framework, that the specific case is incidental, because the second framework would be right there calling that conclusion into question.  If the first framework is the only one we know about, though, the same case can be quite persuasive.  To even question the conclusion we'd have to imagine the possibility of an alternative framework; and actually finding such an alternative could be a formidable challenge.  Especially with the possibility hanging over us that perhaps the alternative mightn't really exist after all.

Investigating this trail seems likely to become an intensive study in avoiding conceptual pitfalls while dowsing for new mathematics.

Minkowski

A narrow, hence more technically fraught, target for mathematical dowsing is Minkowski spacetime.  Minkowski's decisive condemnation of a quaternionic approach —"too narrow and clumsy for the purpose"— is a standard quote on the subject, cited by quaternion opponents and proponents alike.  If there is an alternative general framework to be found, after all, it'd have to handle Minkowski.

Without actually wading into this thing (not to be undertaken lightly), I can only note from a distance a few features that may be of interest when the time comes.  The mechanical trouble here is evidently to do with the pattern of signs, which seems reminiscent of the multiple variants of nabla (though the pessimist in me insists it can't be quite that easy); yet, logically, those variants oughtn't to apply unless one were really already dealing with a derivative.  Offhand, the only way that comes to mind for derivatives to come into it is if the whole physical infrastructure is something less obvious than what Minkowski was doing — which, yes, is cheating; and cheating (so to speak) is likely the only way to end up with a different answer than Minkowski did, so this might, just conceivably, be a hopeful development.

Langlands

I wondered whether even to mention this.  The geometric Langlands correspondence lies at the extreme wide end of mathematical dowsing targets; about as poetic as mathematics comes (which is very poetic indeed), and at the same time about as esoteric as it comes (yea, verily).

Mathematics in its final form is, of course, highly formal (I say "of course", but see my earlier remarks on axioms as a legacy of quaternions).  The ideas don't start out formal, though; and there's always lots of material that hasn't yet worked its way across to the formal side.  Moreover, attempts to describe the poetry of mathematics for non-mathematicians, in my experience, ultimately fail because they attempt something that can't really be done:  they try to divorce the (very real) poetic nature of mathematics from its technical nature, when the true poetry is that the elegance arises from the technicalities.

Poking around on the internet, I found a discussion on Quora from a few years ago on the question Can the Langlands Program be described in layman's terms?  There were some earnest attempts that ultimately devolved into technical arcana; but my favorite answer, offered by a couple of respondents, was in essence:  no.

My own hand-wavy assessment:  Robert Langlands conjectured broad, deep connections between the seemingly distant mathematical subjects of number theory and algebraic geometry.  Especially distant in that, poetically speaking, number theory is a flavor of "discrete" math, while algebraic geometry is toward the continuous side of things.  (I riffed on the discrete/continuous balance in physics some time back.)  An especially high-publicity result fitting within this vast program was Andrew Wiles's proof of Fermat's Last Theorem, which hinged on proving a conjecture about elliptic curves.

Why would I even bring up such a thing?  The Langlands program has gotten tangled up, in this century, with supersymmetry in physics; and the geometric side of Langlands is about complex curves.  In effect, Langlands biases mathematical speculations toward further enhancing the reputation of complex numbers.  So if one suspects physics may also lean toward the quaternionic, and one is also looking for interesting mathematical properties of quaternions, it seems fair game to ask whether quaternions can play into some variation on Langlands.