Friday, February 13, 2015

Sapience and language

Language is the dress of thought
Samuel Johnson, The Life of Cowley, 1779(?).
Only a few, out of the hundred, claimed to use mathematical symbology at all. [...] All of them said they did it [math or physics] mostly in imagery or figurative terms. An amazing 30% or so, including Einstein, were down here in the mudpies [doing]. Einstein's deposition said, "I have sensations of a kinesthetic or muscular type." Einstein could feel the abstract spaces he was dealing with, in the muscles of his arms and his fingers[...] almost no adult creative mathematician or physicist uses [symbology] to do it[...] They use this channel to communicate, but not to do their thing.
Alan Kay, Doing With Images Makes Symbols, 1987.

I've some thoughts to pursue here, drawing together the timeline of human evolution, the nature of human consciousness, human language processing, evolutionary memetics, and a few other odds and ends.

I've been developing various elements of this for years; several of them have featured in earlier posts on this blog.  They came together into a big picture of sorts early this year as I've been reading Richard Leakey's 1994 The Origin of Humankind.  (I've found the content of great interest, although, for some reason I've been unable to quite place, the writing style often causes me to lose track of what he's just said and have to back up, sometimes by a page or more and sometimes more than once; I was sufficiently interested I was willing to back up for comprehension.  Interestingly, the effect seemed less pronounced when the material was read aloud.)

Early hominids
Driverless cars
Thinking without language
The non-Cartesian theater
The dress of thought
Early hominids

Leakey's book describes controversies in paleoanthropology.  Fascinating stuff if you're interested in how scientific ideas develop (which I am), or in how humans developed (which I also am).  Much of it has to do with when in human evolution various peculiarly human traits emerged.

Darwin suggested that three major human traits all co-evolved, developing simultaneously as a package because they complement each other: making and using tools, bipedalism which frees up the hands to make and use tools, and a big brain for figuring out how to make and use tools.  Since it's a cogent idea and, into the bargain, appeals to one's sense of human exceptionalism, the co-evolution idea held sway for about a century before people questioned it and it broke down under scrutiny.  Bipedalism seems to have emerged around five million years ago, tools maybe two and a half million years ago, and Leakey suggests a bigger brain more or less contemporaneous with tools.

Two other events are also of interest:  the emergence of language, and the emergence of recognizably modern human behavior, with art, tools for making tools, tools with artistic flair, tools for making clothing, etc.  The latter took place, by the evidence, around 35,000 years ago in Europe, some tens of thousands of years earlier in Africa; the beginning of the Upper Paleolithic, a.k.a. the Late Stone Age.  When language emerged is harder to judge, but Leakey suggests it goes back as far as tools and the big brain.  (It seems somewhat ironic that Leakey stresses how Darwin's co-evolution theory was wrong, and then Leakey separates out bipedalism but ends up promoting co-evolution of a different set of traits; I suspect he's likely right, but it is a bit bemusing that the theories would reshuffle that way.)

A question that hovers over Leakey's notion of early language development is, why did tool use evolve incredibly slowly for more than two million years, and then accelerate hugely at the beginning of the Upper Paleolithic?  As it happens, I have an answer ready to hand, speculation of course but an interesting fit for the purpose (noting that, by Leakey's account, paleoanthropology has a healthy mix of speculation in it):  the onset of the Upper Paleolithic might be the observable manifestation of the transition to Havelock's oral society from what, here in an earlier post, I called verbal society.

Our available example of verbal society — so I conjectured — is the Pirahã society recently discovered in the Amazon.  An atypical example, necessarily under the conjecture, because examples typically would have disappeared dozens of millennia ago.  To remind from the earlier post, here's a key excerpt from David J. Peterson's enumeration of anomalous properties of the Pirahã:

no temporal vocabulary or tense whatsoever, no number system, and a culture of people who have no oral history, no art, and no appreciation for storytelling.
If early genus Homo had that sort of culture, it would seem to explain rather well why things picked up spectacularly when they got out of that mode and into the more advanced oral culture of Havelock's Preface to Plato.

This in turn offers some insight into a question that arose in the relation of verbal culture to my still earlier blog post on memetic organisms.  According to my theory, sciences are a major taxon of memetic organisms specifically adapted to literate society, but they cannot survive in oral society; and religions were a major taxon of memetic organisms in oral society, but cannot survive in verbal society.  The first commenter on my verbal-society post asked what sort of memetic organisms would be dominant in verbal society.  I suggested language itself as a memetic taxon, a suggestion I'm now more doubtful of but which, in any case, seems at best a rather incomplete answer.  If the transition to oral culture is the onset of the Upper Paleolithic, though, we have at least a basis from which to speculate on the memetics of verbal society, because verbal society is then the memetic environment that gives rise to the archaeological record of the first two and a half million years or so of human technology.  I've no deep thoughts atm on what specifically one might infer from this archaeological record about the memetic evolution behind it, except perhaps that memetic evolution in verbal society was very slow; but in principle, it offers a place for such memetic investigations to start.

A side-plot in my post on verbal society was the observation that the verbal-to-oral transition was a good fit for the story of Adam and Eve's eating of the fruit of the tree of knowledge, and being expelled for that from the Garden of Eden.  I got criticized for that comparison.  I'd had the comparison in mind, really, not primarily based on likelihood of it actually being the origin of the story, but because I'd noticed a connection to the backstory of how we know about the Pirahã.  As recorded in Don't Sleep, There Are Snakes: Life and Language in the Amazonian Jungle (which I recommend), the Pirahã were studied by Christian missionaries; ultimately, the hope was to bring the Word of God to the Pirahã.  The Pirahã culture wouldn't support a religion, which is an oral memetic organism; in fact, one of the missionaries (who wrote the book) ended up converting to atheism.  When the comparison occurred to me between verbal-to-oral transition and eating of the fruit of the tree of knowledge, I was struck that this analogy would cast the Christian missionaries in the role of the Serpent in the Garden of Eden.  The irony was irresistible, which is how the comparison ended up in the blog post.

In comments, the alternative suggestion was raised that the Fall corresponds to the development of agriculture (the onset of the Neolithic).  A theory I'd heard before and that does seem to fit passably well.  With several more years to think about it, though, I don't think these theories are mutually exclusive.  The story could of course have nothing to do with either of these actual events; but if it has a basis in actual events, there's no reason it should be based on just one real event.  In oral culture, elements are introduced into a tradition and then shift around and mutate, tending toward a stable form.  So the story of the Fall, which eventually got written down, can contain echoes of multiple major ancient societal shifts that don't have to have happened all at the same time.  Reconsidering the comparison now, I'm struck by the inclusion, on Leakey's list of innovations at the beginning of the Upper Paleolithic, of tools for making clothing — recalling that when Adam and Eve ate of the fruit of the tree of knowledge, one of the immediate effects was that they realized they were naked, and... made clothes.  Huh.

Driverless cars

Proponents of aviation often cite statistics for how safe it is compared to driving.  Statistics, though, even when accurate can often miss the point by presuming what questions ought to be asked.  One such difficulty here is that the statistics comparing aviation to driving are per unit distance.  I recall an SF novel (looking it up, A Talent for War) describing an interstellar hyperspace-jump technology in which ships sometimes just don't come back out of a jump — but it's the "safest form of travel per passenger-kilometer".

There is a second way those statistics on aviation versus driving can miss the point.  Some things aren't even about probabilities.  What if someone would rather risk themself in a situation where they have a significant degree of control over events rather than risk themself in a situation where they've completely given up control to someone else?  That's a decision based on a philosophical criterion, not a numerical one, and there is nothing inherently irrational nor ignorant about it.

The matter has been further complicated by the advent of fly-by-wire airplane technology.  If one could rationally prefer to retain control rather than give it up to another human, how about giving up control to another human versus to a software system?  It has long been my impression (and I gather I'm not alone) that the people least willing to trust computers are found amongst those who know most about them.  Now, here's a subtle point:  it seems to me that when someone is concerned with whether or not they have significant control over events, it's not the routine events they care about, but the exceptional ones.  The unforeseen circumstances.  And here there does seem to be a qualitative difference between a human pilot and a fly-by-wire system.  The fly-by-wire system is programmed some time beforehand by programmers who try to anticipate what situations the system should be able to handle; rather by definition, it doesn't know how to handle unforeseen circumstances.  To some extent, at least, the fly-by-wire system incorporates knowledge of past accidents, although there may also be some conscious trade-offs in which not every known possibility is built into the system.  The human, on the other hand, draws on their experience to try to cope with the situation — which might sound much the same as the fly-by-wire system incorporating knowledge of past accidents, but I submit is qualitatively different.  The fly-by-wire approach is algorithmic, while the human approach is improvisational. 

The difference may be clearer in the case of driving, which readers of this blog would seem more likely to have first-hand experience of (the reader is more likely to have driven a car than piloted an airplane).

Driverless cars have been touted lately as a coming technology.  Those with a vested interest (be it financial or emotional) in the success of the technology naturally tend to portray it as "safer" than cars driven by humans.  Yet we're told driverless cars couldn't be put on the road in numbers unless one also banned human drivers.  Why would that be?  Presumably because the driverless cars would have trouble with unexpected human behavior.  Which raises the question, if humans are so unpredictable, and driverless cars are (going to be) so much better drivers than humans are, why would human drivers be better than driverless cars at coping with unpredictable drivers?  It's often seemed to me, when some other driver does something unusual and I compensate for it, or I do something unusual and other drivers compensate for it, that what's really impressive about the statistics on traffic accidents isn't how high the numbers are, but how low the numbers are.  Spend some time driving in traffic, and it's a good bet you'll see a bunch of situations where a human's unexpected behavior didn't result in an accident because other human drivers successfully compensated for it, improvisationally.

Humans are really good at coping with free-form unexpected situations.  Comparatively.  We don't always handle a sudden problem correctly; but we don't fall off the edge of our programming, either.  A computer program that doesn't know what to do really doesn't know what to do, in a way that clearly distinguishes natural intelligence from conventional software.  (I'm interested here in the nature of human intelligence; artificial intelligence is not to the current point, though the current discussion might offer some insights into it.)

It seems likely to me that the point of the hominid big brain is to enable individuals to cope with unexpected situations.  This might even help to explain our impulse to be in control of our own fate when an emergency happens:  we are, as a species, specialists in coping with the unexpected, and it's in the best interests of our selfish genes that we each rely on our own ability to cope, so that the combinations of genes that cope most effectively will be favored in the gene pool.  In other words, dispassionately, if somebody is going to be pruned from the population because of a failure to cope with an emergency, our selfish genes are better off if they get pruned for their own failure rather than someone else's.  (Yes, social behaviors vastly complicate this; there's definitely a place for good Samaritans, heroes, etc.; but atm I'm commenting on why individuals would desire to control their own fate, not why they'd commit acts of altruism.)

One might take the speculative riff further than that.  One of the questions that comes up in archaeology is, why would we have started in East Africa?  Leakey suggests the emergence of bipedalism had to do with varying habitats brought on by the advent of the East African Rift.  It certainly seems plausible bipedalism would have been enabled and favored by some sort of habitat shift.  The subsequent emergence of tools, with co-evolving big brain and language, might then be supposed to follow simply from the opportunity provided by bipedalism; but for my part, I'm inclined to think the tool/brain/language effect may have required more of an evolutionary nudge than that.  If it were that easily catalyzed, one might think it would have happened before us, and left evidence that we would have found by now.  (Granted, that's easy to poke holes in.  Maybe we're just randomly the first.  Maybe it only happens once per planet because it rapidly causes the planet to be destroyed.  Or —a rather more fun hypothesis— perhaps it has happened and left evidence, and the evidence is staring us in the face but we're assuming something that prevents us from seeing it.  As long as you don't bring ancient aliens into it; I've contempt for fake science, though I enjoy exotic serious speculations.)  So perhaps the tool/brain/language co-evolution got going, once enabled by bipedalism, because something in the environment was particularly irregular and therefore favored individuals able to individually cope with unforeseen circumstances.

What would this hypothetical environmental factor be, whose irregularity favors individual intelligence to cope with it?  Besides the East African Rift, Leakey mentions the social intelligence hypothesis, that intelligence evolved to predict the behaviors of others in a complex social milieu.  (This could put an interesting spin on Asperger's syndrome, which one might conjecture would isolate brain power from its usual application to socialization, potentially making a surplus available for other purposes.)

If that's too cerebral for you, here's an alternative theory:  Maybe language started as a rather pointless mating display, like so many other exaggerated animal features such as the peacock's tail.  So that men trying to chat up women in bars would be what the species originally evolved for.  (In all seriousness, the two ideas are not mutually exclusive; language skill might have had some sort of survival value from the outset and consequently it was beneficial for it to be treated as desirable in a mate.)

Thinking without language

The Sapir–Whorf hypothesis says that language influences thought.  Benjamin Lee Whorf, writing in the early-to-mid twentieth century, called this the principle of linguistic relativity, alluding to Einstein's theory of relativity since linguistic relativity implies that how we perceive the world varies with what language we use.  Modern understanding of the hypothesis distinguishes strong and weak versions:  the strong version says that the structure of a language prevents its speakers from non-conforming patterns of thought, the weak version that the structure of a language discourages its speakers from non-conforming patterns of thought.  Typically for ideas named after people, Sapir and Whorf never coauthored a paper on the idea, didn't present it as a hypothesis, and didn't make the modern distinction between strong and weak versions.

The strong Sapir–Whorf hypothesis isn't taken very seriously by linguists nowadays, but the weak form is generally accepted to some extent or other.  Popular culture enshrines the language–thought connection with tropes "in <language X>, there are forty different words for <Y>" and "in <language X>, there is no word for <Y>".  Either claim is ambiguous as to which direction it expects the thought/language influence to go.  Out of context, I'd guess forty-different-words-for is typically meant to imply that <language X> speakers think about <Y> a lot, which depends on thought influencing language; while no-word-for is more ambiguous in direction, and seems to me at least as often meant to imply that the speakers cannot even conceive of <Y>.  Saying they can't conceive of it could still be using the language as evidence for thought, but seems likely to have a stiff dose of strong Sapir–Whorf mixed in.  The glaring flaw in this strong Sapir–Whorf reasoning is that if your language doesn't have a word for <Y>, and you have a need for such a word, you're likely to invent a word for it — borrowing from another language, compounding from existing vocabulary, or whatever sort of coinage <language X> favors.

Word coinage is an example of how thought can, rather than being limited by language, drive expansion of language to encompass new realms of thought.  This however raises a subtler question about the relation between language and thought.  The meaning <Y> appears to conceptually presage the new word — but how much wiggle room is there between the ability to think of <Y> and the ability to express it?  Could you have chosen in the example, as a <language X> speaker, to hold off on inventing a word for <Y>, and think about <Y> for a while without having a name for it?  More so, is it possible to think without language?  Is there perhaps a threshold of sophisticated thought, beyond which we need language to proceed?

Seems to me there's plenty of evidence of nontrivial thinking without language.  It's definitely possible to think in pictures; I sometimes do this, and I've heard of others doing it (and, see the Alan Kay epigraph at the top of this post).  One might suggest that pictures, being a concrete representation, are a sort of "language".  But to amplify, it's possible to think in abstracts represented by pictures without any accompanying verbal descriptions of the abstracts.  I'm confident of the lack of accompanying verbal descriptions because, in general, the abstracts may lack short names, while long, often awkward descriptions would have been conspicuous if present.  In such a case, the pictures represent relationships amongst the abstracts, not the abstracts themselves, and possibly not all the relationships amongst them — so even if the pictures qualify as language, they express far less than the whole thought.  More broadly, it's a common experience to come up with a deep idea and then have trouble putting it into words — which evidently implies that the two acts are distinct (coming up with it, and putting it into words), and frankly this effect isn't adequately explained by saying the thinker didn't "really" have an idea until they'd put it into words.  The idea might become more refined whilst being put into words, and sometimes one finds in the process that the idea doesn't "pan out"... but there has to have been something to be refined, or something to not pan out.

So I don't find it at all plausible that thought arises from language.  That, however, does not interfere either with the proposition that language naturally arises from thought, nor that language facilitates thought, both of which I favor.  (I recall, a few years ago in conversation with an AI researcher, being mistaken for an advocate of symbolic AI since I'd suggested some reasoning under discussion could be aided by symbols.  There's a crucial difference between aiding thought and being its foundation; these ideas need to be kept carefully separate.)

The non-Cartesian theater

Daniel Dennett's 1991 book Consciousness Explained is centrally concerned with debunking a common misapprehension about the internal structure of the human mind.  Dennett reckoned the misapprehension is a holdover from the seventeenth-century mind–body dualism of René Descartes.  In Cartesian dualism as recounted by Dennett, the body is a machine which, based on input to the senses, constructs a representation of the world in the brain where the mind/soul apprehends it.  Although this doesn't actually solve the problem of interfacing between a material body and nonmaterial soul, it does at least simplify it by reducing the interface to a single specialized organ where the mind interacts with the material world (Descartes figured the point of interface was the pineal gland).  Modern theorists envision the mind as an emergent behavior of the brain, rather than positing mind–body dualism; but they still have an unfortunate tendency, Dennett observed, to envision the mind as having a particular place — a "Cartesian theater" — where a representation of the world is presented for apprehension by the consciousness.  An essential difficulty with this model of mind is that, having given up dualism, we can no longer invoke a supernatural explanation of the audience who watches the Cartesian theater.  If a mind is something of type M, and we say that the mind has within it a Cartesian theater, then the audience is... something of type M.  So the definition of type M is infinitely recursive, and supposing the existence of a Cartesian theater has no explanatory value at all.  To understand consciousness better we have to reduce it to something else, rather than reducing it to the same thing again.

As an alternative to any model of mind based on the Cartesian theater, Dennett proposes a "multiple drafts" model of consciousness, in which representations of events are assembled in no one unique place (Cartesian theater) and, once assembled, may be used just as any other thoughts, revised, reconciled with each other, etc.  Which is all very well but seems to me to be mostly a statement of lack of certain kinds of constraints on how consciousness works, with little to say about how consciousness actually does work.

It occurred to me in reading Dennett's book, though, that in setting about debunking a common misapprehension he was also making a mistake I'd seen before — when reading Richard Dawkins's 1976 book The Selfish Gene.  Dawkins's book is another that sets about debunking a common misapprehension, in that case group selection, the idea that natural selection can function by selecting the most fit population of organisms.  Dawkins argued compellingly that natural selection is intrinsically the differential survival of genes:  whatever genes compete most successfully come to dominate, and only incidentally do these genes assemble and manipulate larger structures as means to achieve that survival, such as individual organisms, or populations of organisms.  A hardy individual, or a hardy population, may help to make genes successful, but is not what is being selected; natural selection selects genes, and all else should be analyzed in terms of what it contributes to that end.  Along the way, Dawkins illustrates the point with examples where people have fallen into the trap of reasoning in terms of group selection — and this is where, it seemed to me, Dawkins himself fell into a trap.  The difference between group selection, where a population of organisms is itself selected for its success, and gene selection, where a population of organisms may promote the success of genes in its gene pool, is a subtle one.  I got the impression (though I highly recommend Dawkins' book) that Dawkins was somewhat overeager to suppose thinking about successful populations must be based on the group-selection hypothesis, and thereby he might be led to discount some useful research.

Likewise, Dennett seemed to me overenthusiastic about ascribing a Cartesian theater to any model of consciousness that involves some sort of clearinghouse.  That is, in his drive to debunk the Cartesian theater, the possibility of a non-Cartesian theater may have become collateral damage.  This struck me as particularly unfortunate because I was already coming to suspect that a non-Cartesian theater might be a useful model of consciousness.

There is a widespread perception (my own first encounter with it was in a class on HCI in the 1980s) that the human mind has a short-term memory whose size is about seven plus-or-minus-two chunks of information.  Skipping much historical baggage associated with this idea (it can be a fatal mistake, when looking for new ideas, to take on a set of canonical questions and theories already known to lead to the very conceptual impasse one is trying to avoid), the notion of a short-term memory of 7±2 chunks has interesting consequences when set against the "Cartesian theater".  Sure, we can't reduce consciousness by positing a Cartesian theater, but this short-term memory looks like some kind of theater.  If short-term memory stores information in chunks, and our brain architecture has an evident discrete aspect to it, and experience suggests thoughts are discrete, perhaps we can usefully envision the audience of this non-Cartesian theater as a collection of agents, each promoting some thought.  An agent whose thought relates associatively to something on-stage (one of the 7±2 chunks) gets amplified, and the agents most successfully amplified get to take their turn on-stage, where they can bring their thought to the particular attention of other members of the audience.  There are all sorts of opportunities to fine-tune this model, but as a general heuristic it seems to me to answer quite well for a variety of experienced phenomena of consciousness — fundamentally based on a non-Cartesian theater.
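For the programmers in the audience, here is a toy sketch of that heuristic in Python.  It is only one way to read the paragraph above, and every specific in it is my assumption rather than part of the model:  the names (`STAGE_SIZE`, `step`), the fixed stage size of seven, scoring amplification by counting associations with on-stage chunks, and dropping the oldest chunk when the stage is full.

```python
STAGE_SIZE = 7  # assumption: the midpoint of the "7±2 chunks" figure


def step(stage, agents, activation):
    """Run one round of the toy theater and return the new stage.

    stage      -- list of thoughts currently on-stage, oldest first
    agents     -- dict mapping each agent's thought to the set of
                  thoughts it relates to associatively
    activation -- dict of accumulated amplification (mutated in place)
    """
    on_stage = set(stage)

    # Agents in the audience are amplified by each on-stage chunk
    # their thought is associated with.
    for thought, associations in agents.items():
        if thought not in on_stage:
            gain = len(associations & on_stage)
            activation[thought] = activation.get(thought, 0) + gain

    # The most amplified audience member takes a turn on-stage.
    candidates = [t for t in agents
                  if t not in on_stage and activation.get(t, 0) > 0]
    if not candidates:
        return stage
    winner = max(candidates, key=lambda t: activation[t])
    activation[winner] = 0

    # Assumption: when the stage is full, the oldest chunk leaves.
    return (stage + [winner])[-STAGE_SIZE:]


# A tiny audience: "rain" is cued by "clouds", "umbrella" by "rain",
# and "picnic" by "sunshine" (which never appears on-stage).
agents = {
    "rain": {"clouds"},
    "umbrella": {"rain"},
    "picnic": {"sunshine"},
}
stage, activation = ["clouds"], {}
for _ in range(3):
    stage = step(stage, agents, activation)
```

Starting from "clouds" alone, the stage picks up "rain" and then "umbrella", while "picnic" never gets amplified and stays in the audience.  Nothing here watches the stage except more agents of the same simple kind, which is the point of calling the theater non-Cartesian.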

The dress of thought

If "chunking" information is a key strategy of human thought, this might naturally facilitate representing thought using sequences of symbols with a tree-like structure.  Conversely, expressing thought in tree-like linguistic form would naturally afford facile thinking as well as facile expression.  Thus, as suggested above, language would naturally arise from thought and would facilitate thought.  As an evolutionary phenomenon, development of thought and development of language would thus favor each other, tending to co-evolve.

Treating language as a naturally arising, and facilitating, but incomplete manifestation of thought seems to me quite important for clear thinking about memetic evolution, notably for clear thinking about memetic evolution across my conjectured phase boundaries of culture, from verbal to oral and from oral to literate.  The incompletion tells us we should not try to understand memes, nor culture, nor even language evolution, as a purely linguistic phenomenon.  It does raise interesting questions about how culture and thought — the stuff of memes — are communicated from person to person (memetically, from host to host).  Does the Pirahã language instill the Pirahã culture?  Experience suggests that communicating with people in their physical presence is qualitatively more effective than doing so by progressively lower-bandwidth means, with text-only communications — which are extremely widespread now thanks to the internet — way down near the bottom.  Emoticons developed quite early in the on-line revolution.  Does the internet act as a sort of cultural filter, favoring transmission of some kinds of culture while curtailing others?  I don't mean to answer these questions atm; but I do suggest that exploring them, including realizing they should be asked, is facilitated by understanding the thought–language relationship.

Somewhere along the evolutionary curve, co-evolving thought and language become an effective platform for the evolution of memetic lifeforms.  From there onward, we're pushed to reconsider how we think about the evolutionary process.  I agree with Dawkins that group selection is a mistake in thinking: that groups of organisms, and individual organisms, are essentially phenotypes of genes, and our understanding of their evolution should be grounded in survival of the fittest genes.  The relationship between genes and memes, though, is something else again.  At best, our genes and our memes enjoy a symbiotic relationship.  Increasingly with rising levels of memetic evolution, our genes might be usefully understood as vehicles for our memes, much as Dawkins recommended understanding organisms as vehicles for their genes.  This is in keeping with a general trend I've noticed, that thinking about memetic evolution forces us to admit that concepts in evolution have fuzzy boundaries.  From the more familiar examples in genetic evolution we may habitually expect hard distinctions between replicator and phenotype, between organism and population; we want to assume an organism has a unique, constant genome throughout its life; we even want to believe, though even without memetics we've been unable to maintain it entirely, that organisms are neatly sorted into discrete species.  All these simplifying assumptions are a bit blurry around the edges even in genetic biology, and memetics forces us to pay attention to the blur.

As a parting shot, I note that chunking is an important theme in two other major areas of my blogging interest.  In one of several blog posts currently in development relating to physics (pursuant to some thoughts already posted, cf. here), I suggest that while quantum mechanics is sometimes portrayed as saying that in the strange world of the very small, seemingly continuous values turn out to be discrete, perhaps we should be thinking of quantum mechanics the other way around — that is, our notion of discrete entities works in our everyday world but, when we try to push it down into the realm of the very small it eventually cannot be pushed any further, and the pent-up continuous aspect of the world, having been smashed as flat as it can be, is smeared over the whole of creation in the form of quantum wave functions with no finite rate of propagation.  In another developing post, I revisit my previously blogged interest in Gödel's Theorem, which arose historically from mathematicians' struggles to use discrete — which is to say, symbolic — reasoning to prove theorems about classical analysis.  I don't go in for mysticism: I'm not inclined to suppose we're somehow fundamentally "limited" in our understanding by the central role of information chunking in our thought processes (cf. remarks in a previous blog post, here); but it does seem that information chunking has a complexly interesting interplay with the dynamics of the systems we think about.

Friday, April 4, 2014

Why is beta-substitution like the Higgs boson?

"Why is a raven like a writing desk?"
"No, I give it up," Alice replied. "What's the answer?"
"I haven't the slightest idea," said the Hatter.
Alice's Adventures in Wonderland, Chapter 7, Lewis Carroll.

I'm always in the market for new models of how a system can be structured.  A wider range of models helps keep your thinking limber; the more kinds of structure you know, the more material you have to draw inspiration from when looking for alternatives to a given theory.

Several years ago, in developing vau-calculi, I noticed a superficial structural similarity between the different kinds of variable substitution I'd introduced in my calculi, and the fundamental forces of nature in physics.  (I mentioned this in an earlier blog post.)  Such observed similarities can, of course, be white noise; but it's also possible that both seemingly unrelated systems could share some deep pattern that gives rise to the observed similarity.  In the case of vau-calculi and physics, the two systems are so laughably disparate that for several years I didn't look past being bemused by it.  But just recently I was revisiting my interest in physics TOEs (that's Theory of Everything, the current preferred name, last I heard, for what colloquially used to be called Grand Unified Theory, and before that, Unified Field Theory), and I got to thinking.

This will take a bit of set-up, and the payoff may be anticlimactic; but given the apparent extreme difficulty of making progress in this area at all, I'll take what I can get.

Substitution in vau calculi
Theories of Everything
Hygienic physics
Substitution in vau calculi

Traditionally in λ-calculi, all variables are bound by a single construct, λ, and manipulated by a single operation, called substitution.  Substitution is used in two ways.

The major rearrangements of calculus terms take place in an action called β-rewriting, where a variable is completely eliminated by discarding its binding λ and replacing all references to it in the body of the old λ with some given argument.  The part about eliminating the old λ is just a local adjustment in the structure of the term; but the replacement of references is done by β-substitution, which is not localized at a particular point in the term structure but instead broadcast across the body (an entire branch of the term's syntax tree).  When you do this big β-substitution operation, you have to be careful.  A naive rule for substituting argument A for variable x in body B would be "replace every reference to x in B with A".  If you naively do that you'll get into trouble, because B might contain λs that bind either x, or some other variable that's referred to in A.  Then, by following that naive rule you would lose track of which variable reference is really meant to refer to which λ.  This sort of losing track is called bad hygiene.

To maintain hygiene during β-substitution, we apply α-renaming, which simply means that we replace the variable of a λ, and all the references to it, with some other variable that isn't being used for other purposes and so won't lead to confusion.  This is a special case of the same sort of operation as β-substitution, in which all references to a variable are replaced with something else; it just happens that the something else is another variable.  These two cases, β-substitution and α-renaming, are not perceived as separate functions, just separate uses of the same function — substitution.
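The difference between the naive rule and hygienic substitution can be sketched in a few lines of Python.  This is only an illustration for plain λ-terms, not the vau-calculus machinery; the class names (`Var`, `Lam`, `App`), the helper `fresh`, and the scheme for inventing new variable names are all assumptions made for the example.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Lam:
    param: str
    body: object


@dataclass(frozen=True)
class App:
    fn: object
    arg: object


def free_vars(t):
    """Names of variables referenced in t but not bound inside t."""
    if isinstance(t, Var):
        return {t.name}
    if isinstance(t, Lam):
        return free_vars(t.body) - {t.param}
    return free_vars(t.fn) | free_vars(t.arg)


def naive_subst(body, x, a):
    """The naive rule: replace every reference to x in body with a."""
    if isinstance(body, Var):
        return a if body.name == x else body
    if isinstance(body, Lam):
        return Lam(body.param, naive_subst(body.body, x, a))
    return App(naive_subst(body.fn, x, a), naive_subst(body.arg, x, a))


def fresh(base, avoid):
    """Hypothetical helper: first name base0, base1, ... not in avoid."""
    for i in count():
        candidate = f"{base}{i}"
        if candidate not in avoid:
            return candidate


def subst(body, x, a):
    """Hygienic substitution of a for x, alpha-renaming binders as needed."""
    if isinstance(body, Var):
        return a if body.name == x else body
    if isinstance(body, App):
        return App(subst(body.fn, x, a), subst(body.arg, x, a))
    if body.param == x:
        # An inner lambda rebinds x, so references below it are not ours.
        return body
    if body.param in free_vars(a):
        # This lambda would capture a variable of a: alpha-rename it first.
        new = fresh(body.param, free_vars(a) | free_vars(body.body) | {x})
        # Alpha-renaming is the same operation, substituting a variable.
        renamed = subst(body.body, body.param, Var(new))
        return Lam(new, subst(renamed, x, a))
    return Lam(body.param, subst(body.body, x, a))


# Substitute y for x in (lambda y. x).
term = Lam("y", Var("x"))
captured = naive_subst(term, "x", Var("y"))   # Lam("y", Var("y")): bad hygiene
hygienic = subst(term, "x", Var("y"))         # Lam("y0", Var("y"))
```

In the naive result the substituted `y` has been captured by the inner λ; in the hygienic result the inner λ's variable has been renamed out of the way, so the substituted `y` still refers to whatever it referred to outside.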

It's possible to extend λ-calculus to encompass side-effectful behaviors — say, continuations and mutable storage — but to do so with well-behaved (technically, compatible) rewriting rules, you need some sort of bounding construct to define the scope of the side-effect.  In my construction of vau-calculus (a variant λ-calculus), I developed a general solution for bounding side-effects with a variable-binding construct that isn't λ, and operating the side-effects using a variable-substitution function different from β-substitution.  (Discussed here.)

I ended up with four different kinds of variables, each with its own substitution operation — or operations.  All four kinds of variables need α-renaming to maintain hygiene, though, and for the three kinds of side-effect variables, α-renaming is not a special case of operational substitution.  If you count each variable type's α-renaming as a separate kind of substitution, there are a total of nine substitution functions (one both α and operational, three purely α, and five purely operational).  λ-variables emerge as a peculiarly symmetric case, since they're the only type of variable whose substitutions (α and β) are commensurate.

This idea of multiple kinds of variables was not, btw, an unmixed blessing.  One kind of variable — environment variables — turned out to be a lot more complicated to define than the others.  Two kinds of variables (including those) each needed a second non-α substitution, falling into a sort of gray area, definitely not α-renaming but semi-related to hygiene and not altogether a full-fledged operational substitution.  The most awkward part of my dissertation was the chapter in which I developed a general theory of rewriting systems with multiple kinds of variables and substitution functions — and the need to accommodate environment variables was at the heart of the awkwardness.

Theories of Everything

Theoretical physics can be incredibly complicated; but when looking for possible strategies to tackle the subject, imho the only practical way to think about it is to step back from the details and look at the Big Picture.  So here's my take on it. 

There are, conventionally, four fundamental forces:  gravity, electromagnetism, the weak nuclear force, and the strong nuclear force.  Gravity was the first of these we got any sort of handle on, about three and a half centuries ago with Isaac Newton's law of universal gravitation.  Our understanding of electromagnetism dates to James Clerk Maxwell, about one and a half centuries ago.  We've been aware of the weak nuclear force for less than a century, and the strong nuclear force for less than half a century.

Now, a bit more than a century ago, physics was based on a fairly simple, uniform model (due, afaics, to a guy about two and a half centuries ago, Roger Joseph Boscovich).  Space had three Euclidean dimensions, changing with respect to a fourth Euclidean dimension of time; and this three-dimensional world was populated by point particles and space-filling fields.  But then in the early twentieth century, physics kind of split in two.  Two major theories arose, each of them with tremendous new explanatory power... but not really compatible with each other:  general relativity, and quantum mechanics. 

In general relativity, the geometry of space-time is curved by gravity — and gravity more-or-less is the curvature of space-time.  The other forces propagate through space-time but, unlike gravity, remain separate from it.  In quantum mechanics, waves of probability propagate through space, until observation (or something) causes the waves to collapse nondeterministically into an actuality (or something like an actuality); and various observable quantities are quantized, taking on only a discrete set of possible values.  These two theories don't obviously have anything to do with each other, and leave gravity being treated in a qualitatively different way than the other forces.

Once gravity has become integrated with the geometry of space-time (through which all the forces, including gravity, propagate), it's rather hard to imagine achieving a more coherent view of reality by undoing the integration already achieved in order to treat gravity more like the other forces.  As a straightforward alternative, various efforts have been made to modify the geometry so as to integrate the other forces into it as well.  This is made more challenging by the various discrete-valued quantities of quantum mechanics, as the geometry in general relativity is continuous.  The phenomena for which these two theories were created are at opposite scales, and the two theories are therefore perceived as applying primarily at those scales:  general relativity to the very large, and quantum mechanics to the very small; so in attempting to integrate the other forces into the geometry, modification of the geometry tends to be at the smallest scales.  The two most recently-popular approaches to this are, to my knowledge, string theory and loop quantum gravity.

I've remarked in an earlier blog post, though, that the sequence of increasingly complex theories in physics seems to me likely symptomatic of a wrong assumption held in common by all the theories in the sequence (here).  Consequently, I'm in the market for radically different ways one might structure a TOE.  In that earlier post, I considered an alternative structure for physics, but I wasn't really looking at the TOE problem head-on; just observing that a certain alternative structure could, in a sense, eliminate one of the more perplexing features of quantum mechanics.

Hygienic physics

So here we have physics, with four fundamental forces, one of which (gravity) is somehow "special", more integrated with the fabric of things than the others are.  And we have vau-calculus, with four kinds of variables, one of which (λ-variables) is somehow "special", more integrated with the fabric of things than the others are.  Amusing, perhaps.  Not, in itself, suggestive of a way to think about physics (not even an absurd one; I'm okay with absurd, if it's different and shakes up my thinking).

Take the analogy a bit further, though.  All four forces propagate through space-time, but only gravity is integrated with it.  All four operational substitutions entail α-renaming, but only β-substitution is commensurate with it.  That's a more structural sort of analogy.  Is there a TOE strategy here?

Well, each of the operational substitutions is capable of substantially altering the calculus term, but they're all mediated by α-renaming in order to maintain hygiene.  There's really quite a lot more to term structure than the simple facets of it affected by α-renaming, with the quite a lot more being what the rewriting actions, with their operational substitutions, engage.  There is, nonetheless, for most purposes only one α-renaming operation, which has to deal with all the different kinds of variables at once, because although each operational substitution directly engages only one kind of variable, doing it naively could compromise any of the four kinds of variables.

Projecting that through the structural analogy, we envision a TOE in which the geometry serves as a sort of "hygiene" condition on the forces, but is really only a tangential facet of the reality that the forces operate on — impinging on all the forces but only, after all, a hygiene condition rather than a venue.  Gravity acts on the larger structure of reality in a way that's especially commensurate with the structure of the hygiene condition.

Suggestively, quantum mechanics, bearing on the three non-gravitational forces, is notoriously non-local in space-time; while the three kinds of non-λ variables mediate computational side-effects — which is to say, computational effects that are potentially non-local in the calculus term.

The status of gravity in the analogy suggests a weakness in the speculations of my earlier post on "metaclassical physics":  my technique for addressing determinism and locality seems to divorce all forces equally from the geometrical structure of reality, not offering any immediately obvious opportunity for gravity to be any more commensurate with the geometry than any other force.  I did mention above, that post wasn't specifically looking at TOEs; but still, I'm inclined to skepticism about an approach to fundamental physics that seeks to mitigate one outstanding problem and fails to suggest mitigation for others — that's kind of how we ended up with this bifurcated mess of relativity and quantum mechanics in the first place.  As I also remarked in that post when discussing why I suspect something awry in theoretical physics, while you can tell you're using an unnatural structure by the progression of increasingly complicated descriptions, you can also tell you've hit on the natural structure when subsidiary problems just seem to melt away, and the description practically writes itself.  Perhaps there's a way to add a hygiene condition to the metaclassical model, but I'd want to see some subsidiary problems melting.

Supposing one wants to try to construct a TOE, metaclassical or not, based on this strategy, the question that needs answering is, what is the primary structure of reality, to which the geometry serves a sort of tangential hygiene-enforcement role?  For this, I note that the vau-calculus term structure is just a syntactic representation of the information needed to support the rewriting actions, mostly (though not exclusively) to support the substitutions.  So, the analogous structure of reality in the TOE would be a representation of the information needed to support... mainly, the forces, and the particles associated with them.  What we know about this information is, thus, mainly encapsulated in the table of elementary particles.  Which one hopes would give us the wherewithal to encompass gravity — the analog to β-substitution — since it includes a mediating particle for mass:  the Higgs boson.

Wednesday, March 19, 2014

The great vectors-versus-quaternions debate

What?  You've never heard of it?  A big knock-down, drag-out fight between great minds of its day over, more-or-less, the philosophy of how to go about mathematical physics.  None of this "let's do an experiment to distinguish between these two theories" stuff; that's for wimps.  This was the deep stuff:  nuts-and-bolts versus mathematical elegance; generic versus well-behaved; even, so we're told, particles versus waves (I kid you not).

Old paradigms get crushed; it's part of how new paradigms establish and maintain themselves.  History gets buried.  But that doesn't mean we have to like it, or stand for it.  As I write this, here's the sum total of what Wikipedia's article History of quaternions has to say about this colorful event in the history of mathematical physics:

From the mid-1880s, quaternions began to be displaced by vector analysis, which had been developed by Josiah Willard Gibbs and Oliver Heaviside.  Both were inspired by the quaternions as used in Maxwell's A Treatise on Electricity and Magnetism, but — according to Gibbs — found that "... the idea of the quaternion was quite foreign to the subject."  Vector analysis described the same phenomena as quaternions, so it borrowed ideas and terms liberally from the classical quaternion literature.  However, vector analysis was conceptually simpler and notationally cleaner, and eventually quaternions were relegated to a minor role in mathematics and physics.
Yawn.  (Also, so much for Wikipedian neutrality; but that's a different can of worms.)

In this post, I resurrect a paper about the vectors-versus-quaternions debate, written in the Long Ago when we used things called typewriters, and wrote-in special symbols by hand.  It's been languishing for years in a file folder, one of those physical things that are the models for the icons on your phone.

Here's how the paper came about.  I learned about the vectors-versus-quaternions debate from my father.  In fact, I learned about quaternions from my father.  I inherited his enthusiasm for them.  And then, in my third year at WPI, I seized an opportunity to study the debate in depth.

One of the requirements for the BS degree at WPI was the Humanities Sufficiency project.  The idea was that tech students should be well-rounded, so they should take (and pass) a bunch of humanities classes and then, building on those classes, write a term paper on a subject that bridges the gap between humanities and sciences.  WPI had an undergraduate grade called "NR", short (I think) for "Not Recorded":  if you didn't pass a course, it didn't go on your transcript (though you didn't get your tuition back, naturally).  This resulted in students taking some classes because they were interested, and being sometimes more concerned with learning than with getting high grades.  So you'll understand when I say, it took me till my third year to accumulate the class credits I needed for the Sufficiency because I only passed about 50% of the humanities classes I took.  Though a bunch of people, including me, were rather bemused when I not only passed, but got high marks in, Philosophical Theories of Knowledge and Reality.

I chose vectors-versus-quaternions as my topic, with the no-nonsense title "Quaternions: A Case Study in the Selection of Tools for Mathematical Physics".  The Sufficiency was ordinarily a half-semester project, but wasn't required to be, and with such a juicy topic and personal interest, of course I took a bit over that.  Professor Parkinson actually apologized for not giving me a top grade on it, explaining that he had a strict rule never to give a top grade to a Sufficiency that took more than the basic half-semester.  At the time, what I thought (but at least had enough tact not to say to him) was that I was doing the work for its own sake, not for a mere grade.  Later, when that grade caused me to graduate With Distinction instead of With High Distinction, I understood why he'd apologized.  And was belatedly a bit put out, after all.

The things I learned from this paper have ever after informed my understanding of how scientific paradigms are chosen — a really major theme in my life since, after all, my master's thesis (pdf) brushed against a rejected paradigm (extensible languages), while my dissertation outright resurrected one (fexprs).  The influence from this is also deep in the foundations of my thinking on memetic organisms, which I blogged on some time back.

The writing style is a bit stiffer, here and there, than I strive for now.  (I was even worse in high school.)  All in all, though, I'm still fairly pleased with the piece.

The original had both footnotes meant to be read inline, and endnotes with bibliographical details.  Here, I've put both in sections at the end, using letters for the erstwhile footnotes, numbers for the endnotes.  While I'm being pedantic, this version of the paper has three changes from the original version submitted in Spring 1986.  (And, just to prove what a nerd I am, the changes were made in 2002.)  Footnotes [m] and [y] have been added; and where Hamilton's nabla is first defined, I've corrected it to use full derivatives, having originally miswritten it with partial derivatives.

Associative memory is a strange thing.  Certain details stick with us.  I remember worrying about one partial sentence from Crowe that I just couldn't think how to put differently, and in the end deciding to let that passage stand; though at this late date I've no idea which passage that would be.

Contents: Title Body Footnotes Endnotes Bibliography

Quaternions: A Case Study in the Selection of Tools for Mathematical Physics

John N. Shutt

Presented to: Professor Parkinson
Department of Humanities
Term D, 1986

Submitted in Partial Fulfillment
of the Requirements of
the Humanities Sufficiency Program
Worcester Polytechnic Institute
Worcester, Massachusetts
Corrections and additions: 11 March 2002

Quaternions are a form of hypercomplex number with four components.  Mathematically, they are the next most well-behaved algebra after the complex field.  The extent of their usefulness for mathematical physics has been in doubt since their discovery.

This paper examines historically the principal issues in the use of quaternions for mathematical physics.  The historical and mathematical background of quaternions is examined, followed by their application first to classical physics, and then to modern physics.  The paper concludes with an analysis of some of the major issues.

Vectorial analysis, in its general sense, is the mathematical treatment of directed magnitudes.  It arose in the first half of the nineteenth century as a synthesis of two major trends of thought, one in physics and the other in mathematics.[1]

It has been pointed out[2] that the geometry of physics after Newton differed intrinsically from that of the ancient Greeks.  The difference is that, while Newtonian physics is set on a Euclidean stage, many of the principal players are (implicitly) vectors, which are not present in Euclid.

This left physics by the early nineteenth century working under a handicap.  Vectors underlay most of physics, but could not be handled gracefully.  The primary mathematical tool for handling geometry was the Cartesian coordinate system; Cartesian coordinates are flexible in principle, but in practice they are apt to be unwieldy and generally opaque.  A natural language for vectors was needed.[a]

Mathematics by this time had become overextended.[3]  The problem was with the concept of number.  The formulae of algebra, originally developed using only positive numbers, were now being successfully applied to negative and complex numbers.  Since mathematicians had traditionally grounded their work on intuition, the more extensive number systems left many mathematicians uneasy.

The problem led George Peacock, in 1833, to postulate the principle of permanence of form.  This principle said that "Whatever algebraical forms are equivalent when the symbols are general in form but specific in value [positive integers], will be equivalent likewise when the symbols are general in value as well as in form."[4]  By 'general in form' is meant that properties of a particular number cannot be generalized; for example,  14 mod 7 = 0  doesn't imply that  x mod 7 = 0.  'General in value' is meant to refer to fractional, irrational, negative, or complex numbers, but leaves a question almost as big as the one to which the principle is addressed.  Despite its shortcomings, the principle was important because it did recognize that algebra is based on rules.

At least six people[b] had independently devised the geometrical representation of complex numbers before Gauss finally published the idea in 1831.[5]  Several of these people used this representation as a justification for the complex number field.  (Of the first two of the six to make the discovery, Wessel embraced this justification but Gauss did not.)  It was Gauss's publication that finally drew general attention to the idea.

However, William Rowan Hamilton (1805–1865) was not aware of Gauss's 1831 paper until 1852.  He was influenced instead by John Warren, in whose work he would have been exposed to the concepts of the associative, commutative and distributive laws.  Hamilton did not consider the geometrical approach a sufficient justification.  In 1837 he presented a fresh approach, interpreting complex numbers as algebraic couples of real numbers.[7]  He defined addition, subtraction, multiplication and division of couples and then derived from them the primitive properties of complex numbers.

These mathematical developments suggested to many of the mathematicians involved that a further extension of number might be analogous to (3-dimensional) space.[8]  It was generally expected[c] that the sought-after extension would have three terms, and obey all the laws of complex algebra (associative, commutative, and distributive), as well as having close ties to spatial geometry.

The ties to spatial geometry take many forms.  The two that Hamilton eventually settled on are (1) the Law of the Norms,[d] and (2) unique division.  In retrospect, only the real and complex numbers satisfy both all the usual laws and (2), and (1) is simply impossible in three dimensions.[e][9]

The idea that triplets (Hamilton's term) might not satisfy all the usual laws had occurred to Hamilton as early as 1830.  He was acquainted in particular with non-commutative multiplication from some speculations in set theory.[10]  On 16 October 1843, as he walked to a Council of the Royal Irish Academy, several of the above ideas converged in his mind to produce quaternions.  He had been working recently with triples of the form  a+bi+cj, with  i² = j² = −1; he now realized that he could satisfy (1) above by making the assumptions that  ij = −ji, and that this product yielded a third imaginary component  k = ij, with  k² = −1.[11]

The resulting quaternion has the form  a+bi+cj+dk, with  i² = j² = k² = ijk = −1.  Quaternion multiplication is distributive over addition, and associative, but not commutative.[9]  The norm is a modulus of multiplication, and right-division and left-division are unique.  Real and complex numbers and quaternions are the only three possible division algebras, that is, algebras with associative and commutative addition, distributive and associative multiplication, and unique division.[f]
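As a check on these rules, here is a minimal sketch of the quaternion product in Python.  The representation of a quaternion as a 4-tuple `(a, b, c, d)` standing for a+bi+cj+dk, and the names `qmul` and `norm`, are assumptions made for the illustration, not notation from the literature discussed.

```python
def qmul(p, q):
    """Hamilton product of p = (a1, b1, c1, d1) and q = (a2, b2, c2, d2)."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )


def norm(q):
    """Sum of the squares of the four components."""
    return sum(x * x for x in q)


MINUS_ONE = (-1, 0, 0, 0)
I = (0, 1, 0, 0)
J = (0, 0, 1, 0)
K = (0, 0, 0, 1)

# The defining relations.
relations_hold = (
    qmul(I, I) == MINUS_ONE
    and qmul(J, J) == MINUS_ONE
    and qmul(K, K) == MINUS_ONE
    and qmul(qmul(I, J), K) == MINUS_ONE
)

# Multiplication is not commutative: ij = k but ji = -k.
ij = qmul(I, J)   # (0, 0, 0, 1)
ji = qmul(J, I)   # (0, 0, 0, -1)

# The norm is a modulus of multiplication: norm(pq) = norm(p) * norm(q).
p, q = (1, 2, 3, 4), (5, 6, 7, 8)
law_of_norms = norm(qmul(p, q)) == norm(p) * norm(q)
```

With integer components every comparison here is exact, so the relations can be checked without worrying about rounding.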

Hamilton created a plethora of new terms for use in his new algebra.[14]  A quaternion q is made up of a real part, called the scalar of q and denoted  Sq, and an imaginary part, called the vector of q and denoted  Vq.  Alternatively, it can be expressed as the product of a positive real number (the 'length,' or square root of the norm of q), called the tensor of q and denoted  Tq, and a quaternion with tensor equal to one, called the versor of q and denoted  Uq.[g]  Thus  q = Sq + Vq = TqUq.
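The two decompositions can be illustrated with a short Python sketch.  The 4-tuple representation and the function names (`scalar`, `vector`, `tensor`, `versor`) are assumptions for the example; the vector part is kept as a quaternion with zero scalar component so that the two parts can be added back together.

```python
import math


def scalar(q):
    """Sq: the real part, as a quaternion with zero vector part."""
    return (q[0], 0.0, 0.0, 0.0)


def vector(q):
    """Vq: the imaginary part, as a quaternion with zero scalar part."""
    return (0.0, q[1], q[2], q[3])


def tensor(q):
    """Tq: the square root of the norm of q."""
    return math.sqrt(sum(x * x for x in q))


def versor(q):
    """Uq: q scaled to tensor one (undefined for the zero quaternion)."""
    t = tensor(q)
    return tuple(x / t for x in q)


q = (1.0, 2.0, 2.0, 4.0)

# q = Sq + Vq
sum_of_parts = tuple(s + v for s, v in zip(scalar(q), vector(q)))

# q = Tq Uq; multiplying by the real number Tq just scales each component.
product_of_parts = tuple(tensor(q) * x for x in versor(q))
```

For this q the tensor is 5, and both `sum_of_parts` and `product_of_parts` reproduce q up to rounding.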

A versor u has a unique decomposition  u = cos θ + v sin θ  with angle  0 ≤ θ ≤ π  and unit vector  v = UVu.[h][15]  If p is a vector perpendicular to v,  p' = up  is the rotation of p by angle θ about v.  This allows great-circular arcs in space to be represented by quaternions, leading to elegant proofs in spherical trigonometry.

Any non-zero quaternion q has a unique inverse  q⁻¹  such that  qq⁻¹ = q⁻¹q = 1.  Left- and right- division by q are defined respectively as pre- and post- multiplication by  q⁻¹.  If q is a versor  q = cos θ + v sin θ, then for an arbitrary vector p,  p' = qp(q⁻¹)  is the conical rotation of p by angle 2θ about v.
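A small numerical sketch in Python may make the doubling of the angle concrete.  Quaternions are represented as 4-tuples `(a, b, c, d)` for a+bi+cj+dk; the names `qmul`, `qinv`, and `rotate`, and the choice of θ = π/4 about the k axis, are assumptions made for the example.

```python
import math


def qmul(p, q):
    """Hamilton product of two quaternions given as 4-tuples."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )


def qinv(q):
    """Inverse of a non-zero quaternion: its conjugate divided by its norm."""
    a, b, c, d = q
    n = a * a + b * b + c * c + d * d
    return (a / n, -b / n, -c / n, -d / n)


def rotate(q, p):
    """The two-sided product q p q^-1."""
    return qmul(qmul(q, p), qinv(q))


theta = math.pi / 4
# Versor cos(theta) + v sin(theta) with v = k.
q = (math.cos(theta), 0.0, 0.0, math.sin(theta))
p = (0.0, 1.0, 0.0, 0.0)     # the vector i, perpendicular to k

two_sided = rotate(q, p)     # i turned by 2*theta = 90 degrees about k: j
one_sided = qmul(q, p)       # i turned by theta = 45 degrees about k
```

Up to rounding, `two_sided` comes out as j, a quarter turn, while the one-sided product turns the same perpendicular vector by only θ.  For a vector that is not perpendicular to v, the one-sided product is no longer a pure vector, which is why the two-sided form is the one that works for arbitrary p.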

Another useful decomposition is that of the quaternion product of vectors into its scalar and vector parts.  If u, v are vectors separated by angle θ,  Suv = −(TuTv) cos θ  and  Vuv = (TuTv) (sin θ) n  where  n = UVuv  is a unit vector perpendicular to u and v.[i]  The scalar part is commutative  (Suv = Svu), and the vector part anticommutative  (Vuv = −Vvu).  Suv  and  Vuv  were later to form the basis for modern vector analysis.
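In modern terms, Suv is the negative of the dot product and Vuv is the cross product.  The following Python sketch checks this for one pair of vectors; the tuple representation and the names `qmul`, `dot`, and `cross` are assumptions made for the illustration.

```python
def qmul(p, q):
    """Hamilton product of two quaternions given as 4-tuples."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )


def dot(u, v):
    """Dot product of two 3-component vectors."""
    return sum(x * y for x, y in zip(u, v))


def cross(u, v):
    """Cross product of two 3-component vectors."""
    return (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )


u = (1, 2, 3)
v = (4, 5, 6)

# Embed each vector as a quaternion with zero scalar part and multiply.
uv = qmul((0,) + u, (0,) + v)
vu = qmul((0,) + v, (0,) + u)

s_uv, v_uv = uv[0], uv[1:]   # -32 and (-3, 6, -3)
s_vu, v_vu = vu[0], vu[1:]   # -32 and (3, -6, 3)
```

The scalar parts agree in both orders while the vector parts differ only in sign, which is the commutative and anticommutative behavior described above.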

Quaternions were the subject of a debate at the British Association meeting of 1847.[16]  George Peacock, who favored quaternions, did not come forward, but Sir John Herschel did, and called quaternions "a cornucopia of scientific abundance." Against quaternions it was objected that owing to their complexity, quaternion calculations are overly prone to mistakes.  There was also at the meeting at least one representative of the status quo; in Hamilton's words,

Mr. Airy, seeing that the subject could not be cushioned, rose to speak of his own acquaintance with it [quaternions], which he avowed to be none at all; but gave us to understand that what he did not know could not be worth knowing.

The background to this paper would be incomplete without some mention of Hermann Günther Grassmann (1807–1877).[17]  In 1844 Grassmann published a work, monumental in both size and scope, entitled Die lineale Ausdehnungslehre (calculus of extension).

The ideas behind the Ausdehnungslehre began in 1832 with the interpretation of a parallelogram as the geometrical product of two lines.  Grassmann generalized this insight to other shapes and an arbitrary number of dimensions, and placed it on an abstract mathematical basis.  The system of the Ausdehnungslehre was a very broad mathematical generalization originating from these geometrical concepts.  Several types of multiplication were defined, the only requirement for a multiplication being distributivity over addition.

The Ausdehnungslehre of 1844 was written in a strongly metaphysical style, and was also highly abstract at a time when mathematics was based on concrete intuition.  Grassmann was unknown.  Consequently, the Ausdehnungslehre went unnoticed by the world at large.

Despite Grassmann's efforts, including a revised Ausdehnungslehre in 1862, his work remained obscure throughout his life, only beginning to attract interest about the time of his death.  One by one, Grassmann's discoveries were remade by others, with Grassmann's anticipation unveiled in a subsequent question of priority.

Although Grassmann's inner and outer products are similar respectively to the scalar and vector parts of Hamilton's quaternion product of vectors, Grassmann was conceptually distant from quaternions.  The significance of the Ausdehnungslehre here is that it encompasses an n-dimensional system of vectorial analysis.[j]

An account will now be given of three significant figures in the application of quaternions who worked within the quaternion tradition in the nineteenth century.  These three figures are Hamilton, Tait, and Maxwell.  Following this, the circumstances will be described by which quaternions were abandoned in favor of vector analysis.

The first major publication on quaternions was Hamilton's Lectures on Quaternions of 1853.[19]  The text ran to over 700 pages.  It was difficult to read; by 1859, Herschel, a great enthusiast of quaternions and an able mathematician, had only managed to read through 129 of its pages.

In 1859 Hamilton began work on the Elements of Quaternions.  It was originally to be an elementary treatise, but became a reference work longer than the Lectures — though without the metaphysical emphasis of the earlier work.  The Elements developed quaternions mathematically in great detail, but did not add to their physical application.  By his own admission, Hamilton was by this time out of touch with contemporary physics.

Hamilton was convinced of the value of quaternions to physics, and had published scattered such applications.  In 1846 he had defined a (nameless) operator  ᐊ = i d/dx + j d/dy + k d/dz.  However, he never did concentrate his own efforts on applications to physics, choosing instead to develop quaternionic theory.  He had planned for a major section of his Elements on the ᐊ operator, but the section was never written because of his death in 1865.  The Elements was published posthumously in 1866.

Peter Guthrie Tait purchased and read Hamilton's Lectures in 1853, out of general curiosity.[k][21]  In 1857 he encountered an application that reminded him of Hamilton's ᐊ operator; pursuing this, he shortly became a devoted quaternionist, ultimately succeeding Hamilton as their chief advocate after the latter's death.

Tait's interest in quaternions was for their physical applications.  His Elementary Treatise on Quaternions, published in 1867, was the first accessible introduction to quaternions.  This work went into some detail on the operator ∇,[l] using it to express several important theorems (e.g. Green's and Stokes').

Tait did much to further quaternionic applications to physics.  Oddly, he seems to have scrupulously avoided quaternions in his other work, including all of his lectures at the University of Edinburgh.  Quaternions are also omitted from Tait's collaboration in mechanics with Lord Kelvin, the Treatise on Natural Philosophy; speaking of this later, Kelvin said,

We [Kelvin and Tait] have had a thirty-eight years' war over quaternions....  Times without number I offered to let quaternions into Thomson and Tait [the Treatise], if he could only show that in any case our work would be helped by their use.  You will observe that from beginning to end they were never introduced.
It should be understood that Kelvin was throughout his life resolutely opposed to all vectorial methods.

James Clerk Maxwell originally derived his equations in the 1860s using component notation.[22]  He began to study quaternions in 1870.  In his Treatise on Electricity and Magnetism of 1873, he presented both component and quaternionic notation.

Maxwell was a firm believer in physical analogy.  He favored quaternions as an aid to thinking, because the notation corresponds more closely than does that of components to physical reality.  For calculation, however, he considered component notation superior.  He made this distinction in the preliminary chapter of the Treatise, where he advocated "the introduction of the ideas, as distinguished from the operations and methods of Quaternions."

Maxwell's use of quaternions in the Treatise was accordingly limited primarily to the restatement of important results in quaternionic form.  Nonetheless it led some physicists who had never done so before to study quaternions.

In particular, this was the case with Josiah Willard Gibbs of America and Oliver Heaviside of England.23  These two men proceeded independently along very similar lines.

From Maxwell's Treatise, both went to Tait's Elementary Treatise on Quaternions.  Both observed that, as actually used by Maxwell and for the most part even by Tait, the vector/scalar partition of quaternions was more important than the quaternions themselves.  Both then developed systems that treated vectors and scalars as entirely separate entities,  V ∇  and  S ∇  as separate operators,m etc.

Heaviside went no further than this.  His notation was not entirely compatible with Tait's, but he never introduced concepts outside the quaternion tradition.  Thus his system was essentially a subset of quaternion analysis.

Gibbs, however, broke all ties with Hamilton, even to the point of citing Grassman as the main precedent for his system.n  His notation is substantially different from Tait's; in particular he replaced the prefix operators of Hamilton with infix operators.o (∇ was an exception to this.)  Most significantly, he introduced a concept totally alien to quaternion analysis: that of the dyad.  (A dyad is neither a vector nor a quaternion, but a tensor.)

Neither Gibbs nor Heaviside shared Tait's scruples about using their systems in their other work.  Gibbs applied his vector analysis in periodic courses at Yale starting in 1879, and in some of his physics papers.  It was Heaviside who did most to disseminate vector analysis; he made heavy use of his system in several important electrical publications, such as his Electromagnetic Theory, permanently linking vectors to that rapidly growing field.

A debate on the proper vectorial system for mathematical physics took place in the early 1890's.p 25  This debate involved more than thirty letters and articles over five years (1890–1894) in eight leading scientific journals, as well as a scattering of other published writings.  It was primarily between Gibbs and Heaviside on one side and the English quaternionists on the other.

Superficially, a prominent characteristic of the debate was its colorful verbiage.  Heaviside was the contributor to this for the vectorists, while considered vectorist ideology was supplied primarily by the more dignified Professor Gibbs.  Metaphors and the odd pot shot are scattered through the quaternionists' writings, which are often pervaded by a tone of bitterness.  Particularly fiery are the literary antics of Alexander McAulay, an unknown youngster who joined the ranks of the quaternionists in 1893 26,26j with what Tait called "the perfervid outburst of an enthusiast."

The figures of Grassman and Hamilton became weapons in the debate.  The quaternionists played heavily on Hamilton's fame.  The vectorists dissociated themselves from Hamilton entirely, and placed themselves firmly behind Grassman,q for whom they built a reputation.  As a result of this and the ultimate triumph of vector analysis, Hamilton's fame was tarnished.  (It was later resurrected through Hamilton's characteristic function in quantum mechanics.)

Gibbs was conversant with the systems of Hamilton and Grassman as well as his own.  This served him well on several occasions in the debate, for Tait was ill acquainted not only with Grassman's Ausdehnungslehre but with both Gibbs' and Heaviside's vector analyses.r

The question of why quaternions had not been more widely accepted recurs throughout the debate.s  This was generally not itself an issue (an exception is Gibbs' letter to Nature of 16 March 1893 26g), but served as a focal point for other issues.

The opening shots of the debate were fired by Tait,26a and were aimed principally not at vector analysis but at component notation.  His arguments centered on expressiveness; no detail need be given, as the principal interest of the current discussion is with issues between vectorial systems.t  However, it is significant that Tait apparently failed to appreciate the coming threat of vector analysis.  He seems to have repeatedly underestimated his opponents for several years into the debate.

What touched off the controversy was the following passage from the preface to the 1890 (third) edition of Tait's Treatise.

Even Prof. Willard Gibbs must be ranked as one of the retarders of Quaternion progress, in virtue of his pamphlet on Vector Analysis; a sort of hermaphrodite monster, compounded of the notations of Hamilton and Grassman.

The issue of notations, which Gibbs early subordinated to that of notions,u 26b nevertheless was addressed repeatedly.  Gibbs disliked the prefix operators of quaternions, because infix operators were the existing norm.26b  Tait countered that the prefix operators allowed the use of fewer parentheses, thus enhancing brevity of expression.26c

C. G. Knott objected to the large number of operators in Gibbs' notation,26e illustrating his point with Gibbs' abbreviations 'Pot,' 'New,' 'Lap,' and 'Max,' all of which represent various combinations of the Nabla operator.  Gibbs argued convincingly26i that the quaternionic equivalents of these operators were too complicated to be intelligible.

Alexander Macfarlane, like Knott a former student of Tait, brought out another issue in 1891:26d the sign of the scalar product.  In quaternion analysis,  Suv  for (positive) vectors u,v is negative because  i² = j² = k² = −1;  but the vectorists had no stake in  √−1, so for convenience they defined the scalar product to be positive.

Macfarlane had his own solution to this.  He distinguished versors from vectors.  Since two right turns produce a reversal  (−1), he set  i² = j² = k² = −1  for versors; but for convenience he set  i² = j² = k² = 1  for vectors.  Thereafter in the debate he represented a third faction.  The vectorists never specifically addressed his system, so that his net influence was simply to undermine the quaternionists.

The vectorists never addressed the issue of the sign of the scalar product; in any case, there was no need for them to do so, since Macfarlane did it for them.  The response to Macfarlane's innovation came from Knott.  In December of 1892, Knott objected26e that without  i² = j² = k² = −1, multiplication ceases to be associative.  Macfarlane answered in May of 1893 26h that, just as with commutativity, associativity is only a convention and can be given up with impunity if it is convenient to do so.
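Both the sign question and Knott's objection can be checked with a few lines of arithmetic.  The following Python sketch is only an illustration: the names `product` and `square` are hypothetical, and it assumes that Macfarlane's vector convention changes only the squares of the units (to +1) while leaving products of distinct units, such as ij = k, as they are in quaternions.

```python
def dot(a, b):
    """Ordinary (positive) dot product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    """Ordinary cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def product(p, q, square=-1):
    """Multiply (scalar, vector) pairs whose units each square to `square`.

    square=-1 is Hamilton's quaternion product.  square=+1 is the assumed
    form of Macfarlane's vector convention (ij = k etc. left unchanged).
    """
    p0, pv = p
    q0, qv = q
    scalar = p0 * q0 + square * dot(pv, qv)
    vector = tuple(p0 * b + q0 * a + c
                   for a, b, c in zip(pv, qv, cross(pv, qv)))
    return (scalar, vector)


UNIT_I = (0, (1, 0, 0))
UNIT_J = (0, (0, 1, 0))

# Quaternion product of two pure vectors: the scalar part S uv is the
# negative of the dot product, and the vector part V uv is the cross product.
u = (0, (1, 2, 3))
v = (0, (4, 5, 6))
s_uv, v_uv = product(u, v)

# Compare (ii)j with i(ij) under each convention.
hamilton_left = product(product(UNIT_I, UNIT_I), UNIT_J)
hamilton_right = product(UNIT_I, product(UNIT_I, UNIT_J))
macfarlane_left = product(product(UNIT_I, UNIT_I, +1), UNIT_J, +1)
macfarlane_right = product(UNIT_I, product(UNIT_I, UNIT_J, +1), +1)
```

Under Hamilton's rule the two groupings agree; under the assumed +1 convention (ii)j comes out as j while i(ij) comes out as −j, which is the loss of associativity Knott pointed to.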

The crux of the controversy was the issue of notions.  Everyone in the debate (except Cayley, as noted earlier) agreed that vectors and scalars are important for physics.  The vectorists maintained that the quaternion has no notable physical interpretation (other than rotation, Gibbs acknowledged, but dyadics serve this purpose satisfactorily26b).  The burden of proof throughout the debate thus lay on the quaternionists, although Tait did once make the same accusation of artificiality against dyadics.26c

The quaternionists did little during the debate to prove their case.  Knott did address the question twice.  In 1892 he argued simply that the ratio of two vectors is a fundamental concept, so that since this ratio is a quaternion, quaternions are fundamental.26e  In 1893 he added to this an analogy:

Although  sin θ  and  cos θ  occur more frequently than θ itself, we should not conclude that θ plays no fundamental role.  Similarly we should not infer that  αβ  [the full quaternion product] is not fundamental simply because  Vαβ  and  Sαβ  occur more frequently.31

Another issue that recurs periodically throughout the debate is generalizability to higher numbers of dimensions.  From his first article in 1890, Tait praised quaternions for being "uniquely adapted to Euclidean [i.e. three-dimensional] space."26a  Gibbs in his first letter of 1891 praised vectors for being generalizable "to space of four or more dimensions."26b  These views are representative of the positions taken on this issue by the respective sides in the debate.  There is one exception: in February of 1893, Dr. William Peddie, Tait's assistant at the University of Edinburgh, attempted to show that "quaternions are applicable to space of four or any number of dimensions."26f

If there was a winning side to the debate, it was the vectorists.  They consistently outmaneuvered (Gibbs) and outspoke (Heaviside) their opponents.  More fundamentally, the quaternionists did little to justify their position on the crucial issue of notions.  In any event, the decisive factor in the ultimate acceptance of vector analysis was Heaviside's active use of it in his published work.v

Passing mention may be made of the International Society for Promoting the Study of Quaternions and Allied Systems of Mathematics.32  It was organized shortly after the debate by Shunkichi Kimura, residing at Yale at the time, and Pieter Molenbroek, and published a bulletin from 1900 to 1913.  The Society was plagued with difficulties from its inception, the first of which was that Tait declined the presidency due to ill health. (He died in 1901.)  In 1913, all the offices were due for election at the same time, with no one to arrange it, and the Society slipped quietly into oblivion.

There was a tendency among quaternionists, which surfaced on several occasions in the debate, to think of the vectorists as ungrateful children.  This is presumably the source of the bitter overtone in their writings that was mentioned earlier.  It may be interesting in relation to this to consider the following excerpt from a review by Heaviside written in the early 1900's.

... as time went on, [after the controversy] ... it was most gratifying to find that Prof. Tait softened his harsh judgments, and came to recognize the existence of rich fields of pure vector analysis, and to tolerate the workers therein....  I appeased Tait considerably (during a little correspondence we had) by disclaiming any idea of discovering a new system.33

Quaternions will now be considered in their application to twentieth-century physics.  It is appropriate in this regard to recount the early history of the idea of using the real part of a quaternion to represent time.  Priority in this idea belongs to Hamilton.

In the hierarchy of thought, Hamilton placed mathematics above physics, and metaphysics above mathematics.34  An important element of his philosophy was the metaphysical importance of the number three.35  He attached much significance to the tridimensionality of space, and this was a major impetus for his search for algebraic triplets.

After arriving at quaternions by purely algebraic means, Hamilton struggled to reconcile them with his philosophy.  At least since 1835 he had used time as a metaphysical justification for the real numbers; this may have contributed to his later identification of quaternions with the four dimensions of space and time.

He clearly failed to pass on the idea to the physicist Tait.  Recall that in the 1890's controversy the quaternionists represented quaternions as "uniquely suited" to three-dimensional space.  Tait did speculate on the possibility of a fourth dimension, as did Maxwell.37  However, it is not clear that either of them associated a fourth dimension with time.

In 1896, Kimura pointed out that ∇ is not a full quaternion operator because it does not include the derivative with respect to the real component.38  He introduced a full quaternion differential operator  q∇, and used the scalar component to represent the derivative with respect to time.  He had physical applications in mind.

The possibility of using quaternions occurred to Hermann Minkowski when he was formulating his space-time in the 1890's (naturally enough, since this was during the vector-quaternion controversy).  He rejected them completely as "too narrow and clumsy for the purpose."39

To understand what might have motivated Minkowski to make this judgment, consider some basic four-dimensional properties of quaternions.  Cayley showed in the 1850's that the general rotation in Euclidean four-space may be expressed by the quaternion formula  p' = upv, where u,v are arbitrary versors.w 40  Also in Euclidean four-space, multiplication by an arbitrary versor  u = cos θ + v sin θ  may be understood as a relative turning by angle θ through 4-space.

Unfortunately, Minkowski space-time is non-Euclidean.  While quaternions have 'length' (modulus of multiplication)  Tq = (w² + x² + y² + z²)^(1/2), a quaternion q interpreted as a space-time vector should have length  (w² − x² − y² − z²)^(1/2).41
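The contrast between the two 'lengths' can be seen numerically.  The Python sketch below is an illustration under stated assumptions, not part of the historical material: the function names are hypothetical, quaternions are stored as (w, x, y, z) tuples, and a versor is built as cos θ + n sin θ for a unit vector n.

```python
import math


def quat_mul(p, q):
    """Hamilton product of two quaternions given as (w, x, y, z)."""
    a0, a1, a2, a3 = p
    b0, b1, b2, b3 = q
    return (a0 * b0 - a1 * b1 - a2 * b2 - a3 * b3,
            a0 * b1 + a1 * b0 + a2 * b3 - a3 * b2,
            a0 * b2 - a1 * b3 + a2 * b0 + a3 * b1,
            a0 * b3 + a1 * b2 - a2 * b1 + a3 * b0)


def conj(q):
    """Conjugate; for a versor this is also its inverse."""
    w, x, y, z = q
    return (w, -x, -y, -z)


def euclid_sq(q):
    """Square of the quaternion 'length' Tq."""
    return sum(c * c for c in q)


def minkowski_sq(q):
    """Square of the space-time length, scalar part taken as time."""
    w, x, y, z = q
    return w * w - x * x - y * y - z * z


def versor(theta, axis):
    """Unit quaternion cos(theta) + n sin(theta), n = axis normalized."""
    ax, ay, az = axis
    n = math.sqrt(ax * ax + ay * ay + az * az)
    s = math.sin(theta) / n
    return (math.cos(theta), s * ax, s * ay, s * az)


def cayley_rotation(u, p, v):
    """The four-space rotation p' = u p v for versors u, v."""
    return quat_mul(quat_mul(u, p), v)


u = versor(0.3, (1.0, 2.0, 2.0))
v = versor(1.1, (0.0, 1.0, -1.0))
p = (0.5, -1.0, 2.0, 0.25)
p_rot = cayley_rotation(u, p, v)

# Special case v = inverse of u: a three-dimensional rotation by 2*theta.
k_quarter = versor(math.pi / 4, (0.0, 0.0, 1.0))
i_rotated = cayley_rotation(k_quarter, (0.0, 1.0, 0.0, 0.0), conj(k_quarter))

# A pure 'time' vector turned by a versor: Euclidean length survives,
# Minkowski length does not.
t_turned = quat_mul(k_quarter, (1.0, 0.0, 0.0, 0.0))
```

Here `p_rot` has the same Euclidean length as `p`, and `i_rotated` is the unit vector i carried to j.  The quantity `minkowski_sq(t_turned)` falls from 1 to approximately 0, which is the mismatch with space-time described above.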

A solution to this problem was found in 1912 by Ludwig Silberstein.42  He let the scalar component, representing time, be imaginary by introducing a fourth  √−1  independent of the three of quaternions.  He was thus working with what Hamilton called biquaternions, quaternions whose four coefficients are complex numbers.  Biquaternions do not, of course, have unique division, although Silberstein did define a biquaternion "inverse."x

By this device, Silberstein was able to express the Lorentz transformation in the form  q' = QqQ, where q is in frame S and q' is its equivalent in frame S'.  Q is a complex versor (i.e.  TQ = 1); further, the coefficient of  SQ  is real, and the three coefficients of  VQ  are imaginary.  Thus  Q = cos θ + v sin θ  for imaginary unit vector v and angle θ.  The resulting transformation is a rotation in Minkowski space-time by angle  2θ √−1.  Compare this with the general Euclidean three-space rotation mentioned earlier.
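A small numerical sketch may make the form  q' = QqQ  more concrete.  It is not Silberstein's own notation: the function names are hypothetical, units are chosen so that c = 1, and Q is parametrized by a real rapidity φ as  Q = cosh(φ/2) + √−1 sinh(φ/2) n  for a real unit vector n, which is one way (assumed here) of giving Q a real scalar coefficient, imaginary vector coefficients, and  TQ = 1.

```python
import math


def bq_mul(p, q):
    """Hamilton product of (w, x, y, z) tuples; entries may be complex."""
    a0, a1, a2, a3 = p
    b0, b1, b2, b3 = q
    return (a0 * b0 - a1 * b1 - a2 * b2 - a3 * b3,
            a0 * b1 + a1 * b0 + a2 * b3 - a3 * b2,
            a0 * b2 - a1 * b3 + a2 * b0 + a3 * b1,
            a0 * b3 + a1 * b2 - a2 * b1 + a3 * b0)


def boost_versor(rapidity, direction):
    """Complex versor with real scalar part and imaginary vector part.

    `direction` is assumed to be a real unit 3-vector.
    """
    half = rapidity / 2.0
    s = 1j * math.sinh(half)
    nx, ny, nz = direction
    return (math.cosh(half), s * nx, s * ny, s * nz)


def boost(event, rapidity, direction):
    """Transform the event (t, x, y, z), with c = 1, via q' = Q q Q."""
    t, x, y, z = event
    q = (1j * t, x, y, z)              # imaginary scalar part carries time
    big_q = boost_versor(rapidity, direction)
    w2, x2, y2, z2 = bq_mul(bq_mul(big_q, q), big_q)
    return (w2.imag, x2.real, y2.real, z2.real)


def interval(event):
    """Space-time interval t^2 - x^2 - y^2 - z^2."""
    t, x, y, z = event
    return t * t - x * x - y * y - z * z


event = (2.0, 1.0, 0.5, -0.3)
phi = 0.7
along_x = boost(event, phi, (1.0, 0.0, 0.0))
oblique = boost(event, phi, (1.0 / 3.0, 2.0 / 3.0, 2.0 / 3.0))
```

For the boost along x, the result agrees with the familiar textbook form  t' = t cosh φ − x sinh φ,  x' = x cosh φ − t sinh φ, with y and z unchanged; in both cases the interval is preserved.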

A different solution to the problem was given in a 1945 paper by P. A. M. Dirac.43  Dirac submitted that the value of quaternions lies in their special algebraic properties, and that therefore resorting to biquaternions is not productive.y  Restricting himself to real quaternions, he derived the general quaternionic linear transformation  q' = (aq + b)(cq + d)⁻¹, with quaternion coefficients a,b,c,d.z

He used this equation to describe a transformation in five-dimensional projective space, and restricted it to describe the Lorentz group.  He then derived a one-to-one correspondence between the quaternions q and q' in his transformation and space-time vectors, through a comparatively involved set of equations.  Finally, he used his quaternionic transformation to derive a general quaternionic formula for the relativistic addition of velocities.  The elegance of this formula provoked the only non-mathematical comment in the paper, "The quaternion formulation appears to be the most suitable one for expressing generally the law of addition of velocities."

Dirac's treatment is contrary to the traditional usage of quaternions.  Ever since their discovery, much of their perceived value has been in the physical interpretability of quaternionic formulae.  This perception is evident in the following, written to Hamilton by John T. Graves.

There is still something in the system which gravels me.  I have not yet any clear views as to the extent to which we are at liberty arbitrarily to create imaginaries, and endow them with supernatural properties.  You are certainly justified by the event....  but I am glad that you have glimpses on physical analogies.44
Physical interpretability was the quaternionists' main argument against component notation.

Dirac's approach was to set up a correspondence between quaternions and space-time vectors; but the correspondence was unintuitive.  This abandonment of physical interpretation is consistent with the general philosophy of much of modern mathematical physics.  However, it is not the way others have applied quaternions to modern physics.

Papers applying quaternions to modern physics are not as rare as one might suppose, numbering (as nearly as I can determine) at least in the dozens.45  These papers deal with a wide range of topics.  It is not within the scope of the current paper to examine all their areas of application;aa for example, the use of quaternions to describe elementary particles will be omitted.46  For modern quaternionic ideology, two representative examples will be used.

The first example is a 1964 paper, "Quaternions in Relativity," by Peter Rastall.47  The paper begins with a brief historical account of quaternionic application to relativity, along with commentary on why quaternions have been (up to 1964) repeatedly passed by in favor of other notations.bb  His own interest in quaternions is for field equations in curved (Riemannian) space-time.  He argues that for this general case, neither matrix nor spinor notation yields any clear physical interpretation because neither can easily be understood in terms of tetrads of real coordinates  (x,y,z,t).

Quaternions, by which he means complex or bi- quaternions, are to provide this physical interpretability.  He uses Silberstein's form for Lorentz transformations in flat space-time.  (Recall that Silberstein's quaternions have only four non-zero real coefficients, corresponding directly to coordinates  x,y,z,t.)  He describes Riemannian space-time using tetrad formalism, i.e. with each tetrad of event coordinates he associates a set of four axes, which need not be orthogonal.  By combining these tools he then derives his general field equation.

It is important to such quaternionic treatments of relativity that a quaternionic equivalent is possible for any matrix formula, and vice versa.cc 48  The most basic quaternion-matrix equivalence, demonstrated by C. S. Peirce in 1881, is between a real quaternion  w + xα + yβ + zγ  (imaginaries α, β, γ) and a  2 × 2  complex matrix

$$
\begin{pmatrix} w+ix & y+iz \\ -y+iz & w-ix \end{pmatrix}
= w \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}
+ x \begin{pmatrix} i & 0 \\ 0 & -i \end{pmatrix}
+ y \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}
+ z \begin{pmatrix} 0 & i \\ i & 0 \end{pmatrix}
$$

The four matrices equivalent to 1, α, β, γ are essentially the Pauli spin matrices.
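The sense in which this is an equivalence is that multiplying quaternions and multiplying the corresponding matrices give matching results.  The Python sketch below checks this for one pair of quaternions.  It is an illustration only: the function names are hypothetical, and it assumes the sign convention in which  w + xα + yβ + zγ  corresponds to the matrix with rows  (w+ix, y+iz)  and  (−y+iz, w−ix).

```python
def quat_mul(p, q):
    """Hamilton product of quaternions (w, x, y, z) in units 1, alpha, beta, gamma."""
    a0, a1, a2, a3 = p
    b0, b1, b2, b3 = q
    return (a0 * b0 - a1 * b1 - a2 * b2 - a3 * b3,
            a0 * b1 + a1 * b0 + a2 * b3 - a3 * b2,
            a0 * b2 - a1 * b3 + a2 * b0 + a3 * b1,
            a0 * b3 + a1 * b2 - a2 * b1 + a3 * b0)


def to_matrix(q):
    """Map w + x*alpha + y*beta + z*gamma to a 2x2 complex matrix (assumed signs)."""
    w, x, y, z = q
    return ((complex(w, x), complex(y, z)),
            (complex(-y, z), complex(w, -x)))


def mat_mul(a, b):
    """Product of two 2x2 matrices stored as nested tuples."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def det(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


p = (1, 2, 3, 4)
q = (5, -6, 7, -8)

via_quaternions = to_matrix(quat_mul(p, q))
via_matrices = mat_mul(to_matrix(p), to_matrix(q))

# The determinant of the matrix is the norm w^2 + x^2 + y^2 + z^2.
norm_p = sum(c * c for c in p)
det_p = det(to_matrix(p))
```

The two routes give the same matrix, and the determinant of the matrix equals the norm of the quaternion, so the modulus of multiplication carries over to the matrix form as well.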

The second example consists of two papers written in 1962–63 by a group of physicists on quaternion quantum mechanics.dd  These papers take advantage (respectively) of two basic properties of quaternions: their close ties to 3-/4- vector spaces, and their lack of commutativity.

The first paper presents the fundamentals of the theory.49  Their starting point is that

a propositional calculus exists that we can call general quantum mechanics (as distinguished from complex quantum mechanics) in as much as no number system or vector space at all is assumed in its formulation....

It is always possible to represent the pure states of a system of "general quantum mechanics" by rays in a vector space in a one-to-one manner...
and that the only number system over which this can be done for all such systems is 𝓠, the quaternions.  They suggest that while real and complex quantum mechanics are very similar, "quaternion quantum mechanics has many new features that make it much richer."

The second paper capitalizes on a difficulty that arises from the non-commutativity of 𝓠.ee 50  Multiple mathematical descriptions arise that should be equivalent.  By postulating invariance of the physical laws over these different descriptions, they arrive at a new field of which electromagnetism is a special case.

Perhaps the most extensive example of quaternionic application in the twentieth century (if not for all time) is the work of Otto Fischer.  Fischer was a Swedish civil engineer who became interested in quaternions in the years before World War II.  In the 1950's he published two substantial books on the application of quaternions.51

In Universal Mechanics and Hamilton's Quaternions (I have not had access to his other book), Fischer set himself a rather ambitious goal.

This is a book written by a civil engineer on universal mechanics with an attempt to introduce a certain order in its mathematical structure by means of Hamilton's Quaternions.  The term "universal mechanics" refers to the mathematics of ordinary physics of motions, elasticity, hydrodynamics, aerodynamics, electromagnetism, together with relativistic and cosmic physics as well as quantum mechanics.
Fischer's aim is to create a close correspondence between concepts and explicit mathematical structure.  In pursuing this goal, he correlates several types of mathematical structural hierarchies to branching specialities within universal mechanics.  One such hierarchy is the "potential pyramid," which expands by repeated differentiation from an 'apex' potential.  Closely related is an "affinor pyramid",ff also formed by repeated differentiation.  The potential pyramid is static, consisting of functions of simple, quadric or double quadric quaternions,gg while an affinor pyramid consists of operators on such functions, and is therefore dynamic, taking its shape from the function to which it is applied.

The three different types of quaternions are the basis of Fischer's other major unifying structure.  After introducing and doing much work with real quaternions in imaginaries i1, i2, i3, he proceeds to quadric quaternions by introducing "superdirections" ï, j1, j2, j3 that correspond roughly to different specialities in physics.  Ultimately he applies this technique in the more general case of double quadric quaternions to his goal of unifying universal mechanics.hh

Fischer appears to have been widely competent in the specialities of "universal mechanics."  He surely did not, however, have a talent for presentation.  The superficial appearance of the Universal Mechanics is that of the numerology of the 'crackpot fringe.'

Fischer was part of what has often been called the "Cult of Quaternions"52 — a tradition of enthusiastic devotees that began with Hamilton and continues to the present day.ii  In 1910, C. S. Peirce wrote of his brother James Mills Peirce, who had been a noted quaternionist until his death in 1906, that he "remained to his dying day a superstitious worshipper of two hostile gods, Hamilton and the scalar  √−1."  An earlier reference to the 'cult' is found in the chapter on vector analysis in Heaviside's Electromagnetic Theory.

"Quaternion" was, I think, defined by an American schoolgirl to be "an ancient religious ceremony."  This was, however, a complete mistake.  The ancients — unlike Prof. Tait — knew not, and did not worship Quaternions.53

The members of the 'cult' who have been mentioned in this paper all exhibited rational reasons for their enthusiasm.  Yet, their reputation as a semi-religious community is not entirely unsupported in their own writings.  A comparatively recent example is E. T. Whittaker's statement (1940) on quaternionic research after the turn of the century, that "the good work went on."jj In 1871 Maxwell had observed, with no sarcasm intended, that "The unbelievers are rampant."54

Throughout this paper, the motivating philosophies of applied quaternionists have been noted.  Definite trends are visible in these philosophies.

From their discovery, the claims of quaternions for physical application have been interpretability and utility.  Their interpretation is certainly an improvement over component notation; nevertheless, as F. D. Murnaghan has observed, to the general public quaternions were the archetype of a baffling abstract theory until they were supplanted in this role by Einstein's General Theory of Relativity.56  The quaternionists simply continued to assert that quaternions are meaningful, and as a tactic against component notation this was sufficient.

Against vector analysis, however, the tactic failed.  Having extracted the utilitarian part of quaternion analysis, the vectorists discarded the remainder and branded it meaningless.  The result was that the quaternionists were left clinging forlornly to their claims of interpretability while practical-minded physicists flocked to vector analysis.

In the twentieth century, this parting of the ways has led to a peculiar twist of fate.  Following their vectorist and anti-quaternionist traditions, physicists have adopted the utilitarian notation of matrices.  This has led them away from interpretability, and ultimately even away from utility into increasing abstraction.  Quaternionists have continued to emphasize interpretability, finally coming to be its uncontested claimants; but, being an acknowledged fringe group, have had no general acceptance to date.

Rastall's paper of 1964 concludes with the following paragraph.  It illustrates clearly the underlying ideology of the modern physical application of quaternions.  It is also a fitting note on which to end the current paper on quaternions.

The movement towards abstract algebraic and coordinate-independent formulations of physical theories, and away from particular matrix representations and special coordinate systems, is an increasingly popular one, and our work is in accord with it.  Less popular, and seemingly opposed to this rarefied mathematical spirit, is our desire to make abstract concepts more concrete and imaginable.  To pure mathematical minds the aim is unsympathetic.  They are happy in their complex spaces, and would prefer to postulate an affine connection rather than to align tetrad vectors.  It is a matter of taste.  Those, however, who are prepared to exploit the accident of having been born in space-time may find this paper useful.

  • [a] This need increased greatly with the development of electromagnetic theory in the last third of the century.6
  • [b] The six are Wessel (1799) and Gauss (about the same time), Argand and Buée (1806), and Mourey and Warren (1828).
  • [c] Almost everyone looked for a superset of the complex numbers.  An exception was Servois, who came close to quaternions in 1815.12
  • [d] In modern terms, the Law of the Norms says that the norm (the sum of the squares of the real coefficients of the terms) should be a modulus of multiplication.  A modulus of multiplication is a function  M(x)  such that  M(a) * M(b) = M(a*b).  The Law of the Norms holds for real and complex numbers.
  • [e] In 1844 Augustus de Morgan presented a number of triple algebras that do not satisfy the Law of the Norms but do have moduli of multiplication.  Given the distributive law, the Law of the Norms and the uniqueness of division are equivalent, so naturally de Morgan's triple algebras do not have unique division either.
  • [f] If the associative law is also dropped, only one further algebra, Cayley's octonions, is possible.13
  • [g] 'Scalar,' 'vector' and 'tensor' seem to be the only three of Hamilton's original quaternionic terms that have developed non-quaternionic usages.  The modern meaning of 'tensor' is completely different from Hamilton's.
  • [h] Obviously, v is indeterminate for angles 0 and π.  In the following description of quaternions, special cases such as this are generally omitted.
  • [i] u,v,n always have the same sense. If a right-handed coordinate system is used, u,v,n form a dextral set.  Hamilton used a left-handed system, while Tait, Maxwell, Gibbs all used a right-handed system.18
  • [j] Also implied in Grassman's calculus of extension are matrix theory and modern tensor analysis.20
  • [k] Remember, this is the same book that so effectively stymied Herschel.
  • [l] This operator was apparently written as ᐊ by Hamilton but as ᐁ by Tait.  In Tait's form it has been variously given the names nabla (after a visually similar Assyrian harp), del, and atled (delta spelled backwards).24
  • [m] In replacing  V ∇  and  S ∇  by  ∇×  and  ∇·  (Gibbs' notation), it is also necessary to redefine ∇ itself from quaternionic operator  i d/dx + j d/dy + k d/dz  to vectorial operator  ı⃗ ∂/∂x + ȷ⃗ ∂/∂y + k⃗ ∂/∂z;  a subtle and profound (not to say confusing) change that may serve to suggest the size of the intellectual gulf between quaternion analysis and vector analysis.
  • [n] There is, nevertheless, substantial evidence that Gibbs' system was inspired by quaternions, and not by the Ausdehnungslehre.23
  • [o] 'Infix notation' means that the operator symbol appears between the operands, as in  u × v.  'Prefix notation' means that the operator appears before the operands, as in  Vuv.  Prefix notation was later rediscovered by Jan Łukasiewicz,27 and is now generally called Polish notation.
  • [p] The two sources used for the following discussion of the 1890's controversy both handle the debate on a roughly chronological paper-by-paper basis.  My discussion is a substantially different organization of the material.
  • [q] Gibbs provided the initiative that led to the publication of Grassman's collected works.28
  • [r] In a letter to Nature of January 1893, Tait wrote:
    I found that I should not only have to unlearn quaternions (in whose disfavor much is said) but also to learn a new and most uncouth parody of notations long familiar to me....  There I was content to leave the matter....  Dr. Knott [Cargill Gilston Knott, a former student of Tait and a staunch quaternionist] has actually had the courage to read the pamphlets of Gibbs and Heaviside; and, after an arduous journey through trackless jungles, has emerged a more resolute supporter of Quaternions than when he entered.29
  • [s] This lack of acceptance was exaggerated.  Actually, 168 works were published in the quaternion tradition in the 1890's.30
  • [t] There was only one paper in the debate that argued against the use of all vectorial systems.  This was written by Arthur Cayley in 1894.  It named quaternions in particular, and was answered by Tait.26k
  • [u] In his response to Tait's preface, Gibbs wrote,
    The criticism relates particularly to notations, but I believe that there is a deeper issue of notions underlying that of notations.  Indeed, if my offense had been solely in the matter of notation, it would have been less accurate to describe my production as a monstrosity, than to characterize its dress as uncouth.
  • [v] Of course, quaternions didn't vanish overnight.  They were in the final stages of disappearance in about 1910;36 and, as will be seen, they have never vanished entirely.
  • [w] Cayley considered this result in a purely geometrical context; i.e. he didn't have time as a fourth dimension in mind.  Note that a special case of this formula is when the rotation is orthogonal to the real axis; then  v = u⁻¹, giving the general three-dimensional rotation described earlier.
  • [x] For a real quaternion q, the inverse is often defined as  q⁻¹ = Kq/Nq, where  Kq  is the complement and  Nq  the norm of q.15  Silberstein simply carried this definition over to biquaternions.
  • [y] Taken in the context of 1945, Dirac's dismissal of biquaternions is reasonable.  However, in the context of modern theoretical physics, which favors such rarefied creatures as Clifford and Lie algebras, biquaternions actually do have some interesting properties.
  • [z] The form using left-division is  q' = (qa + b)⁻¹(qc + d), with (of course) different quaternion values for a,b,c,d.
  • [aa] I have not discussed any specific areas of application to classical physics.  It has not been necessary to do so, since the vector-quaternion controversy was independent of both area and method of application.
  • [bb] Discussions of the reception (or rather lack of reception) of quaternions in modern physics appear in most of the modern works I have examined.  Rastall's observations are interesting; but his historical interpretations do not take into account the vectorist and anti-quaternionist traditions.
  • [cc] According to A. W. Conway, this equivalence is itself equivalent to wave-particle duality.
  • [dd] These papers appeared in Journal of Mathematical Physics.  I find no subsequent papers in that journal by this group.
  • [ee] The trouble with the non-commutativity of 𝓠 is that there is no unique tensor product of the Hilbert space  𝓗𝓠  with itself.
  • [ff] Fischer explains this terminology on page 4 of the Universal Mechanics:
    It is more common in literature to use the term "tensor" for the general non-commutative "affinor" and speak of symmetric and antisymmetric tensors instead of tensors and axiators.  But Spielreins terms apparently are more elucidative.
  • [gg] Quadric quaternions are quaternions whose four coefficients are themselves real quaternions; equivalently they are sums and products of quaternions in two independent sets of imaginaries.  They have 16 real coefficients each.  Double quadric quaternions are quadric quaternions whose 16 coefficients are themselves quadric quaternions, or equivalently sums and products of quaternions in four independent sets of imaginaries.  Double quadric quaternions have 256 real coefficients each.
  • [hh] In gathering the above high-level description of Fischer's method I have made repeated forays into the Universal Mechanics.  I found the book extremely difficult to read.  It is highly concentrated and moves very rapidly, which is natural considering that it covers a very large quantity of material.  This is compounded by the erroneous omission from the printing of scattered pages from the first two chapters of the book.
  • [ii] The most recent representative of the 'cult' in my bibliography is James D. Edmonds (1974).
  • [jj] Whittaker qualifies as a 'cultist' because of a passage in the same article (which appeared in the Mathematical Gazette):
    ... those who were in the outer circles of Hamilton's influence — e.g. Willard Gibbs in America and Heaviside in England — wasted their energies in devising bastard derivatives of the quaternion calculus...
    This editorial comment resulted in a brief correspondence in the Mathematical Gazette between Whittaker and E. A. Milne in the spirit of the controversy of the 1890's.[55]
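
The inverse formula in note [x], q⁻¹ = Kq/Nq, can be checked numerically.  What follows is only a minimal sketch, not taken from any of the sources cited here: it assumes the usual Hamilton product (i² = j² = k² = ijk = −1), takes Nq to be the sum of the squares of the four coefficients, and represents a quaternion as a 4-tuple (w, x, y, z).  The function names are illustrative.

```python
from fractions import Fraction

def qmul(p, q):
    """Hamilton product of two quaternions given as (w, x, y, z) tuples."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1*a2 - b1*b2 - c1*c2 - d1*d2,
            a1*b2 + b1*a2 + c1*d2 - d1*c2,
            a1*c2 - b1*d2 + c1*a2 + d1*b2,
            a1*d2 + b1*c2 - c1*b2 + d1*a2)

def conj(q):
    """Kq: negate the three imaginary coefficients."""
    w, x, y, z = q
    return (w, -x, -y, -z)

def norm(q):
    """Nq: sum of squares of the four coefficients (assumed convention)."""
    return sum(c * c for c in q)

def inverse(q):
    """q^-1 = Kq / Nq, in exact rational arithmetic."""
    n = Fraction(norm(q))
    return tuple(c / n for c in conj(q))

q = (1, 2, 3, 4)
print(qmul(q, inverse(q)))  # the identity quaternion, as exact Fractions
```

Because the norm is a real scalar, the same inverse works on either side: both q·q⁻¹ and q⁻¹·q give the identity.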

  • [1] Cf. Michael J. Crowe, A History of Vector Analysis (Notre Dame: University of Notre Dame Press, 1967), p. 1.
  • [2] Crowe, pp. 127–128.
  • [3] On the mathematical ancestry of quaternions, see Morris Kline, Mathematical Thought from Ancient to Modern Times (New York: Oxford University Press, 1972), pp. 772–779.  On extensions to the concept of number prior to 1800, see E. T. Bell, The Development of Mathematics, 2nd ed. (New York: McGraw-Hill, 1945), pp. 172–178.
  • [4] Quoted in Kline, p. 773.
  • [5] Crowe, pp. 5–11.
  • [6] This is remarked on by Crowe, p. 220.
  • [7] Crowe, pp. 23–27.
  • [8] On Hamilton's attempts, see Crowe pp. 26–28; on other attempts, see [5].
  • [9] Edmund T. Whittaker, "The Sequence of Ideas in the Discovery of Quaternions," Proceedings of the Royal Irish Academy 50 (1945) sect. A: 97–98.
  • [10] Encyclopedia Britannica, 11th ed., s.v. "Quaternions," by Alexander McAulay, p. 720.  The relevant part of the article is the historical profile, which is taken from the corresponding article in the 9th edition, by Peter Guthrie Tait.
  • [11] William Rowan Hamilton, "Quaternions," Proceedings of the Royal Irish Academy 50 (1945) sect. A: 89–92.  This is the first publication of some notes made by Hamilton on the day of his discovery of quaternions.
  • [12] See Crowe, p. 10.
  • [13] Kenneth O. May, "The Impossibility of a Division Algebra of Vectors in Three Dimensional Space," American Mathematical Monthly 73 (1966): 289–291.  On Cayley's octonions, see Kline p. 792.
  • [14] On 'scalar' and 'vector,' see Crowe pp. 31–32.  On 'tensor' and 'versor,' see Felix Klein, Elementary Mathematics from an Advanced Standpoint (New York: Dover Publications, 1945), p. 138.  For some examples of others of Hamilton's terms, see Crowe p. 36.
  • [15] The basic properties of quaternions are taken from Louis Brand, Vector and Tensor Analysis (New York: John Wiley & Sons, 1947).  Brand devotes the last chapter (chapter X, pp. 403–429) of his book to quaternions.
  • [16] Crowe, pp. 34–35.
  • [17] Crowe, pp. 54–96.
  • [18] Crowe, p. 155.
  • [19] On Hamilton's Lectures and Elements, see Crowe pp. 36–41.
  • [20] Bell, pp. 200, 204.
  • [21] On Tait's quaternionic work see Crowe pp. 117–125.
  • [22] On Maxwell's use of quaternions, see Crowe pp. 127–139.
  • [23] On the development of Gibbs' and Heaviside's systems of vector analysis, see Crowe pp. 150–177.
  • [24] See Crowe pp. 124, 146.
  • [25] The controversy of the 1890's is described in some detail in Crowe pp. 182–224 (chapter 6).  Much of the controversy is also covered in Alfred M. Bork, " 'Vectors versus Quaternions' — The Letters in Nature," American Journal of Physics 34 (1966): 202–211.  Crowe's treatment is more comprehensive; however, Bork goes into more detail on the contents of the articles he discusses, and makes frequent use of quotations.
  • [26] General reference notes for specific papers in the controversy are ordered chronologically, and indexed by letter under number 26 (hence notes 26a–26k).  References particularly to one or the other secondary source are numbered separately.
    • [26a] Tait, Philosophical Magazine, January 1890, and the preface to his Elementary Treatise on Quaternions, 1890 edition.  See Crowe pp. 183–185 and (less) Bork pp. 202–203.
    • [26b] Gibbs, Nature, 2 April 1891.  See Crowe pp. 185–186 and Bork p. 203.
    • [26c] Tait, Nature, 30 April 1891.  See Crowe pp. 186–187 and Bork pp. 203–204.
    • [26d] Macfarlane, Proceedings of the American Association for the Advancement of Science, published in July 1892.  See Crowe pp. 190–191 and Bork p. 205.
    • [26e] Knott, Proceedings of the Royal Society of Edinburgh, read 19 December 1892.  See Crowe pp. 201–203 and Bork p. 207.
    • [26f] Peddie, Proceedings of the Royal Society of Edinburgh, read 10 February 1893.  See Crowe pp. 208–209.
    • [26g] Gibbs, Nature, 16 March 1893.  See Crowe pp. 198–200 and Bork p. 206.
    • [26h] Macfarlane, Nature, 25 May 1893.  See Crowe pp. 203–204 and Bork p. 207.
    • [26i] Gibbs, Nature, 17 August 1893.  See Crowe pp. 204–205 and Bork p. 208.
    • [26j] McAulay, Utility of Quaternions in Physics, 1893.  See Crowe pp. 194–195.
    • [26k] Arthur Cayley, "Coordinates versus Quaternions," and Tait, "On the Intrinsic Nature of the Quaternion Method," both read before the Royal Society of Edinburgh on 2 July 1894.  See Crowe pp. 211–215.
  • [27] Bork, p. 204.
  • [28] Crowe, p. 161.
  • [29] Bork, p. 206.
  • [30] Crowe, p. 111.  The supposed slowness of acceptance is discussed in Crowe pp. 219–220.
  • [31] Crowe, p. 208.
  • [32] A brief account of the history of the International Society is given in Hubert Kennedy, "James Mills Peirce and the Cult of Quaternions," Historia Mathematica 6 (1979): 425–426.
  • [33] Crowe, p. 123.
  • [34] Kline, p. 778.  These priorities are evident in much of his work.
  • [35] On Hamilton's metaphysics, see Thomas L. Hankins, "Triplets and Triads: Sir William Rowan Hamilton on the Metaphysics of Mathematics," Isis 68 (1977): 175–193.
  • [36] Crowe, p. 240.
  • [37] Alfred M. Bork, "The Fourth Dimension in Nineteenth-Century Physics," Isis 55 (1964): 328–330.
  • [38] The article in which he wrote this is mentioned in ibid., p. 338.  The original article is Shunkichi Kimura, "On the Nabla of Quaternions," Annals of Mathematics 10 (1896): 127–155.
  • [39] James D. Edmonds, "Quaternion Quantum Theory: New Physics or Number Mysticism?" American Journal of Physics 42 (1974): 221.  Edmonds derives his information from a 1914 book by Ludwig Silberstein.  These same sentiments are attributed to Minkowski in Otto F. Fischer, "Hamilton's Quaternions and Minkowski's Potentials," Philosophical Magazine (7) 27 (1939): 375.  Fischer does not identify the source of his information.
  • [40] Ludwig Silberstein, "Quaternionic Form of Relativity," Philosophical Magazine (6) 23 (1912): 790.
  • [41] This observation is made in P. A. M. Dirac, "Application of Quaternions to Lorentz Transformations," Proceedings of the Royal Irish Academy 50 (1945) sect. A: 261.
  • [42] Silberstein, pp. 790–809.
  • [43] Dirac, pp. 261–270.
  • [44] Crowe, p. 34.
  • [45] For a list of such papers, see Edmonds, p. 220.
  • [46] For references on this topic, see David Finkelstein et al., "Foundations of Quaternion Quantum Mechanics," Journal of Mathematical Physics 3 (1962): 217, and Peter Rastall, "Quaternions in Relativity," Reviews of Modern Physics 36 (1964): 820.
  • [47] Rastall, pp. 820–832.
  • [48] A. W. Conway, "Quaternions and Matrices," Proceedings of the Royal Irish Academy 50 (1945) sect. A: 98–103.  On the generality of quaternions, see also William Kingdon Clifford, "Applications of Grassmann's Extensive Algebra," American Journal of Mathematics 1 (1878): 350–358.
  • [49] Finkelstein et al., "Foundations," pp. 207–220.
  • [50] David Finkelstein et al., "Principle of General Q Covariance," Journal of Mathematical Physics 4 (1963): 788–796.
  • [51] Crowe, pp. 254–255.  The two books are Universal Mechanics and Hamilton's Quaternions, 1951, and Five Mathematical Structural Models in Natural Philosophy with Technical Physical Quaternions, 1957.  I have not seen the latter book.  It may be relevant that Silberstein used the name "physical quaternions" for his specialized biquaternions corresponding to space-time vectors.
  • [52] For example, the title of Kennedy's paper cited in [32].
  • [53] Crowe, p. 171.
  • [54] Crowe, p. 133.
  • [55] Edmund T. Whittaker, "The Hamiltonian Revival," Mathematical Gazette 24 (1940): 153–158.  The associated correspondence appears in Mathematical Gazette 25 (1941): 106–108 and 25 (1941): 298–300.
  • [56] F. D. Murnaghan, "An Elementary Presentation of the Theory of Quaternions," Scripta Mathematica 10 (1944): 37.

  • Bell, E. T.  The Development of Mathematics.  2nd ed.  New York: McGraw-Hill Book Company, Inc., 1945.
  • Bork, Alfred M.  "The Fourth Dimension in Nineteenth-Century Physics."  Isis 55 (1964): 326–338.
  • —.  " 'Vectors versus Quaternions' — the Letters in Nature."  American Journal of Physics 34 (1966): 202–211.
  • Brand, Louis.  Vector and Tensor Analysis.  New York: John Wiley & Sons, Inc., 1947.
  • Clifford, William Kingdon.  "Applications of Grassmann's Extensive Algebra."  American Journal of Mathematics 1 (1878): 350–358.
  • Conway, A. W.  "Quaternions and Matrices."  Proceedings of the Royal Irish Academy 50 (1945) sect. A: 98–103.
  • Crowe, Michael J.  A History of Vector Analysis: the Evolution of the Idea of a Vectorial System.  Notre Dame: University of Notre Dame Press, 1967.
  • Dirac, P. A. M.  "Application of Quaternions to Lorentz Transformations." Proceedings of the Royal Irish Academy 50 (1945) sect. A: 261–270.
  • Edmonds, James D.  "Quaternion Quantum Theory: New Physics or Number Mysticism?"  American Journal of Physics 42 (1974): 220–223.
  • Finkelstein, David; Jauch, Josef M.; Schiminovich, Samuel; and Speiser, David.  "Foundations of Quaternion Quantum Mechanics."  Journal of Mathematical Physics 3 (1962): 207–220.
  • —.  "Principle of General Q Covariance."  Journal of Mathematical Physics 4 (1963): 788–796.
  • Fischer, Otto F.  "Hamilton's Quaternions and Minkowski's Potentials."  Philosophical Magazine (7) 27 (Jan.–June 1939): 375–385.
  • —.  Universal Mechanics and Hamilton's Quaternions. Stockholm: Axion Institute, 1951.
  • Hamilton, William Rowan.  "Quaternions."  Proceedings of the Royal Irish Academy 50 (1945) sect. A: 89–92.
  • Hankins, Thomas L.  "Triplets and Triads: Sir William Rowan Hamilton on the Metaphysics of Mathematics."  Isis 68 (1977): 175–193.
  • Kennedy, Hubert.  "James Mills Peirce and the Cult of Quaternions." Historia Mathematica 6 (1979): 423–429.
  • Kimura, Shunkichi.  "On the Nabla of Quaternions."  Annals of Mathematics 10 (1896): 127–155.
  • Klein, Felix.  Elementary Mathematics from an Advanced Standpoint. Translated by E. R. Hedrick and C. A. Noble.  New York:  Dover Publications, 1945.
  • Kline, Morris.  Mathematical Thought from Ancient to Modern Times.  New York: Oxford University Press, 1972.
  • McAulay, Alexander.  "Quaternions."  Encyclopedia Britannica. 11th ed.  1911.
  • May, Kenneth O.  "The Impossibility of a Division Algebra of Vectors in Three Dimensional Space."  American Mathematical Monthly 73 (1966): 289–291.
  • Murnaghan, Francis D.  "An Elementary Presentation of the Theory of Quaternions."  Scripta Mathematica 10 (1944): 37–49.
  • Rastall, Peter.  "Quaternions in Relativity."  Reviews of Modern Physics 36 (1964): 820–832.
  • Silberstein, Ludwig.  "Quaternionic Form of Relativity." Philosophical Magazine (6) 23 (Jan.–June 1912): 790–809.
  • Whittaker, Edmund T.  "The Hamiltonian Revival."  Mathematical Gazette 24 (1940): 153–158.
  • —.  "The Sequence of Ideas in the Discovery of Quaternions."  Proceedings of the Royal Irish Academy 50 (1945) sect. A: 93–98.