Structural insight: 2012

Wednesday, December 26, 2012

Metaclassical physics

Nature shows us only the tail of the lion. But there is no doubt in my mind that the lion belongs with it even if he cannot reveal himself to the eye all at once because of his huge dimension.
— Albert Einstein

What theoretical physics needs, I've long believed, is to violate some assumption that is shared in common by both classical physics and quantum mechanics. Everyone nowadays seems to understand that something different is needed, but I suspect the "radical" new theories I've heard of aren't radical enough.

So in this post I'm going to suggest a class of physical theory that seems to me to be radical enough (or if, you prefer, weird enough) to shake things up.

Indirect evidence of a wrong assumption

Suppose you're trying to find the best way to structure your description of something. (Examples: choosing the structure for a computer program to perform some task; or choosing the structure for a theory of physics.)

What you hope to find is the natural structure of what you're describing — a structure that affords a really beautiful, simple description. When you strike the natural structure, a sort of resonance occurs, in which various subsidiary problems you may have had with your description just melt away, and the description practically writes itself.

But here's a problem that often occurs: You've got a structure that affords a pleasing approximate description. But as you try to tweak the description for greater precision, instead of the description getting simpler, as it should if you were really fine-tuning near the natural structure, instead the description gets more and more complicated. What has happened, I suggest, is that you've got a local optimum in solution space, instead of the global optimum of the natural structure: small changes in the structure won't work as well as the earlier approximation, and may not work at all, so fundamental improvement would require a large change to the structure.

I suggest that physics over the past century-and-change has experienced just such a phenomenon. Classical physics was pleasing, but an approximation. Our attempts to come closer to reality gave us quantum mechanics, which has hideously ugly mathematics. And our attempts to improve QM... sigh.

You'll find advocates of these modern theories, and before them advocates of QM, gushing about how beautiful the math is, but frankly I find this wishful thinking. They focus on part of the math that's beautiful, and try to pretend the ugliness out of existence. Ignoring the elephant in the room. In the case of QM, the elephant is commonly called "observation", and in more formal social situations, "wave function collapse".

But it would be a mistake to focus too much on the messiness of QM math. If physics is stuck in a local optimum, we need to look foremost at big things that classical and quantum have in common, rather than getting too focused on details by which they contrast.

The saga of determinism and locality

Two really big ideas that have been much considered, in the contrast between classical and quantum physics, are determinism and locality.

In the classical view of reality, there are three dimensions of space and one dimension of time. In space, there are point particles, and space-filling fields. The particles move continuously through space over time, each tracing out a one-dimensional curve in four-space. The fields propagate in waves over time. The model is deterministic because, in principle, the state of everything at one moment in time completely determines the state of everything at all later moments in time. The model is local because neither particles nor waves travel faster than a certain maximum speed (the speed of light).

QM depicts nature as being fundamentally nondeterministic. Einstein didn't approve of that (it apparently offended his sense of the rhyming scheme of nature, as I've remarked before). God does not play dice, he said.

It's important to realize that Einstein was personally responsible, through his theory of special relativity, for making classical physics more local. Prior to relativity, classical physics did not prohibit things from moving arbitrarily fast; consequently, in considering what would happen to a given volume of space in a given interval of time, there was always the chance that by the end of the interval, some really fast particle might come zooming through the volume that had been on the other side of the universe at the beginning of the interval.

This relation between Einstein and locality helps us appreciate why Einstein, in attempting to demonstrate that quantum mechanics is flawed, constructed with two of his colleagues the EPR paradox showing that QM requires information to propagate across space faster than light. That is, in an attempt to discredit nondeterminism he reasoned that quantum nondeterminism implies non-locality, and since non-locality is obviously absurd, quantum nondeterminism must be wrong.

Perhaps you can see where this is going. Instead of discrediting nondeterminism, he ultimately contributed to discrediting locality.

Okay, let's back up a few years, to 1932. As an alternative to quantum nondeterminism, Einstein was interested in hidden variable theories. A hidden variable theory says that the state of reality is described by some sort of variables that evolve deterministically over time, but these underlying variables are fundamentally unobservable, so that the nondeterministic quantum world is merely our statistical knowledge about the hidden deterministic reality. In 1932, John von Neumann proved, formally, that no hidden variable theory can produce all exactly the same predictions as QM. (All hidden variable theories are experimentally distinguishable from QM.)

This is an example of a no-go theorem, a formal proof that a certain kind of theory cannot work. Often, the most interesting thing about a (correct) no-go theorem is its premise — precisely what it shows to be impossible. Because twenty years later, in 1952, David Bohm published a hidden variable theory experimentally indistinguishable from QM. The hidden variable theory was correct. The no-go theorem was correct. We can therefore deduce that what Bohm did must be different from what von Neumann proved impossible.

And so it was. On careful examination, von Neumann's proof assumes the hidden variable theory is local. Bohm's hidden variable theory has a quantum potential field that can propagate arbitrarily fast, so Bohm's theory is non-local. Einstein remarked, "This is not at all what I had in mind."

Now we come to Bell's Theorem. Published in 1964 (nine years after Einstein's death), this was another no-go theorem, based on a refinement of the EPR experiment. Bell showed that the particular probability distribution predicted by QM, in that experiment, could not possibly be produced by a deterministic local hidden variable theory.

Okay. Here again, maybe you can see already where I'm going with this. I'm preparing to propose a particular way of violating an assumption in the premise of Bell's Theorem. This particular violation may allow construction of a theory that isn't, to my knowledge, what any of the above cast of characters had in mind, but that might nevertheless be plausibly called a deterministic local hidden variable theory.

Changing history

The probability distribution Bell was considering, the one that couldn't be produced by a deterministic local hidden variable theory, has to do with the correlation between observations at two distant detectors, where both observations are based on a generating event that occurred earlier in time, at a point in space in between the detectors.

And one day some years ago, reading about all this, it occurred to me that if you think of these three events —two observations and one generation— as just three points between which signals can be sent back and forth, it's really easy to set up a simple mathematical model in which if you start with a uniform probability distribution, set the system going, and let the signals bounce back and forth until they reach a steady state, the probability distribution of the final state of the system will be exactly what QM predicts. This idea is somewhat reminiscent of a modern development in QM called the transactional interpretation (different, but reminiscent).

The math of this is really easy; it doesn't involve anything more complicated than a dot product of vectors. Wait, propagating back and forth in time? What does that even mean?

There are a lot of really badly thought-out depictions of time travel in modern science fiction. For which I'm sort-of grateful, because over the years I've been annoyed by them, and therefore thought about what was wrong with them, and thereby honed my thinking about time travel.

It seems to me the big problem with the idea of changing history is, what does "change history" mean? In order for something to change, it has to change relative to some other dimension. If a board is badly milled, its thickness may vary (change) along the length of the board, meaning its thickness depends on how far along its length you measure. The apparent magnitude of a star may vary with distance from the star. The position of a moving train varies over time. But if history changes, relative to what other dimension does it change? It isn't changing relative to any of the four dimensions of spacetime.

Let's suppose there is a fifth dimension, relative to which the entire four-dimensional spacetime continuum can change. As a simple name, let's call it "meta-time". This would, of course, raise lots of metaphysical questions; a favorite of mine is, if there's a fifth dimension of meta-time, why not a sixth of meta-meta-time, seventh of meta-meta-meta-time, and so proceed ad infinitum? Though fun to muse on, those sorts of questions aren't needed right now; just suppose for a moment there's meta-time, and let's see where it leads.

While we're at it, let's suppose this five-dimensional model is deterministic in the sense that, in principle, the state of spacetime at one moment in meta-time completely determines the state of spacetime at all later moments in meta-time. And let's also suppose the five-dimensional model is local in the sense that changes to spacetime (whatever they are) propagate, over meta-time, at some maximum rate. (So if you hop in your TARDIS and make a change to history on a particular day in 1963 London, that change to history propagates outward in space at, say, no more than 300,000km per meta-second, and propagates forward and backward in time at no more than one second per meta-second.)

That bit of math I mentioned earlier, in which the QM probability distribution of Bell's Theorem is reproduced? That can be made to use just this kind of model — a five-dimensional system, with three dimensions of space, one of time, and one of meta-time, with determinism and locality relative to meta-time. Granted, it's only a toy: it's nothing like a serious attempt to model reality with any generality at all, just a one-off model describing the particular experimental set-up of Bell's Theorem.

I've got one more suggestion to make, though. And I still won't have a full-blown theory, such as Bohm had (there's stuff Bohm's theory didn't include, but it did have some generality to it), but imho this last point is worth the price of admission. I wouldn't call it "compelling", because atm this is all too outre to be compelling, but I for one found it... remarkable. When I first saw it, it actually made me laugh out loud.

Metaclassical physics

Wondering what a full-blown theory of physics of this sort might look like, I tried to envision what sorts of things would inhabit this five-dimensional model.

In classical physics, as remarked, space contains point particles interacting with fields. And when you add in time, those things that used to look like point particles appear instead as one-dimensional curves, tracing the motion of the particle through spacetime. I was momentarily perplexed when I tried to add in meta-time. Would the three events in Bell's experiment, two observations and one generation, interact through vibrations in these one-dimensional curves traced through spacetime? Modern string theory does make a big thing out of stuff vibrating. Also, though, a one-dimensional curve vibrating, or otherwise evolving, over meta-time traces out a two-dimensional surface in the five-dimensional space-time-metatime continuum. We set out on this journey hoping to simplify things, hoping ideally to strike on the natural structure of physical reality and achieve resonance (the ring of truth?).

But wait. Why have point particles in space? Point particles in classical physics are nice because of the shape of the theory they produce – but points in space don't produce that shape of theory when they're moving through both time and meta-time. And those one-dimensional curves in spacetime don't play nearly such a central role in QM, notwithstanding they make a sort of cameo appearance in Feynman's path integral formulation.

What is really fundamental to QM is the elephant in the room, the thing that makes such a hideous mess out of QM mathematics: observation, known at up-scale parties as wave function collapse. QM views spacetime as a continuum punctuated by zero-dimensional spacetime events — essentially, observations.

And as spacetime evolves over meta-time, a zero-dimensional spacetime event traces out a one-dimensional curve.

So now, apparently, we have a theory in which a continuum is populated by zero-dimensional points and fields, evolving deterministically over a separate dimension with a maximum rate of propagation. Which is so much like classical physics that (as mentioned) when I saw it I laughed out loud.

Friday, January 27, 2012

Guarded continuations

Go to, let us go down, and there confound their language, that they may not understand one another's speech.

— Genesis 11:7 (King James Version)

I've been asked about Kernel's guarded continuations. In writing that section of the Kernel Report, I knew it wasn't as lucid as one might like, but getting it all down while I had it clearly in mind was something of an achievement in itself. I'm not sure I've ever seen a really lucid explanation of continuations; explaining them should probably be considered a major challenge in its own right, above and beyond figuring out how the mechanism should work. It's high time I tried to do better, though; however many tries it takes to get the explanation to come clear, I won't get there if my number of tries never increases.

On my Kernel web page I describe guarded continuations as 'Exception handling "done right"'. For an ultra-short tag line, that may be about as good as any; but guarded continuations are more a simultaneous blend-and-improvement on Scheme continuations and Java exception handling, rather than a supposedly-optimal exception facility. We may still be several generations of exception facilities away from getting exceptions right; but as may be, I think guarded continuations have something to add to the discussion.

It's also worth saying up front that guarded continuations are lower-level than a good exception facility would be. Kernel tries for very simple low-level facilities plus the ability to build higher-level abstractions from them. It's an open, and fascinating, question how well Kernel's abstraction facilities will actually apply to exception handling — and I hope to learn a lot from that question. (I expect shortcomings of Kernel's abstractive facilities to be not about the facilities' power (which I maintain is extraordinary) but about the directions in which that power can be applied. When I figure out what the missing directions are, I mean to go that way.)

Stacks and trees

Classically, programming languages deal with function calls using a stack. There are two kinds of stuff on the stack: local variables (data), and return addresses (control). The local data for a function call produces a single set of local variables; but the algorithm for that same call may produce many different return addresses, one for each point in the algorithm where it, in turn, calls a function. In principle, one might have two stacks, one for data and one for control; but classically, all local variables and all return addresses are use-once, to be discarded in the same order they were introduced, so it may seem natural to store both on a single stack (with resultant security problems, malware accessing data as control and vice versa).

First-class (lexically scoped) functions require local data to be stored, in general, as a tree rather than a stack. When a first-class function is returned as the result of a function call (in vintage parlance, as an upward funarg), lexical scoping expects that function to have access to —in general— the entire column of variables that were on the stack when the function was created, even though some of the function calls that created those variables have since ended. So data that used to be popped off the stack must now, potentially, persist. A set of local variables, which used to be the data part of a stack frame, is now an environment, with a fixed set of ancestors (variables further up the column) and potentially any number of children (one for each call to any function defined in that environment). (In Kernel, btw, environments are first-class objects; there was a lively LtU discussion on first-class environments, here.)

First-class continuations do to the control stack what first-class functions did to the data stack: they turn it into a tree. (First-class continuations do this to the data stack too, but I'm not aware of any language that has first-class continuations without first-class functions.) Just as first-class functions required entire columns of local variables to persist after the ends of function calls that created them, first-class continuations require entire columns of "return addresses" to persist after the ends of function calls that created them.

In visualizing the tree of continuations, the actual algorithms of the program hover in the areas beneath the nodes of the tree (possibly, multiple algorithms under a single node). Each child continuation is a socket waiting for a value to be plugged into it, to be chewed on by the algorithm above that child — which may, eventually, terminate and plug a value into its parent continuation. Primitive operations hover beneath the leaves of the tree. Kernel pretty much only needs one non-primitive operation: combination evaluation, which spins off a subtree to evaluate the operator, spins off subtrees as appropriate to evaluate operands, and finally hands the situation over to another algorithm to make the operative call (using for the operative call its parent continuation, in properly tail-recursive fashion).

Guards

When a function f calls another function g, f gives up some of its ability to regulate the future evolution of the computation. The programmer tends to assume that not all such power has been lost — that (1) the call to g won't terminate without control passing back through the point at which f called it, and (2) that particular call to g won't return through that point more than once. In the presence of first-class continuations, though, even these weak assumptions needn't hold. Once f calls g, the dynamic extent of that call might be exited, and/or later reentered any number of times, in ways that bypass f entirely. Perhaps f opens a file, calls g, and then closes the file before relinquishing control to whoever called f; if g terminates bypassing f, the file might not be closed outside the call to f, and if g is reentered bypassing f, the file might not be open inside the call to g.

Scheme addresses this sort of problem by means of dynamic-wind, a device that causes specified actions to be performed whenever a continuation-call tries to enter or leave the dynamic extent of the call to g. In R5RS Scheme (the version Kernel primarily diverged from), there were a couple of problems with dynamic-wind.

Although dynamic-wind allowed something to be done every time the dynamic extent of the call to g was entered, and every time it was exited, these before and after actions were unable to observe (let alone affect) what value was being passed into or out of the dynamic extent, so failing to support context-sensitive regulation of access to/from the dynamic extent. (In Javascript terms <shudder>, before and after are akin to finally rather than catch.)
The R5RS was explicitly baffled by what would happen if continuations were used to pass into or out of the dynamic extents of the calls to before and after — which suggested to me that the whole thing was something of a kludge. The correct behavior in such cases ought to follow naturally from the conceptual framework, and if it didn't follow naturally one had to wonder if the conceptual framework was altogether coherent.

Java exception handling differs from this arrangement in several ways.

Mostly obviously and trivially, it only allows exiting a dynamic extent, not reentering one.
An intercepting catch has access to the value being thrown (unlike the after of dynamic-wind).
A catch selectively intercepts throws, based on the "target" (one might even say, "destination") of the throw. Whereas Lisp cleanly separates the target of an abnormal transfer of control from the value being transferred —the continuation called, versus the value passed to it— Java entangles them in a single thrown object whose type is the "target".
Once a Java exception is caught, that's the end of the exception unless it is explicitly rethrown — in contrast to the after of dynamic-wind, which ordinarily does not disrupt the abnormal transfer of control during which it is called.

Interestingly, one fairly intricate characteristic shared by both Scheme's dynamic-wind and Java's catch is that the target is a node of a tree. This is, perhaps, all the more interesting since the trees are seemingly very different in nature. The Java target is a type in a tree of exception types; the Scheme target is a continuation in a tree of continuations.

Kernel's guarded continuations drew on both these models for inspiration. An applicative guard-continuation allows the construction of a continuation with entry- and exit-guard clauses. Exit-guard clauses selectively intercept continuation-calls from inside to outside the guarded dynamic extent (the branch of the continuation tree descending from the guarded continuation), and are therefore analogous to Java catch clauses, with some adjustments. Entry-guard clauses are symmetrically designed to selectively intercept continuation-calls from outside to inside the guarded extent.

Each guard clause has a selector continuation. Keep in mind that a continuation-call has two ends: it goes from one point in the continuation-tree to another, and in this case we know that one end of the call is inside the guarded extent, while the other end is outside. A continuation-call matches the selector of a clause just if the outside end of the call lies within the tree branch descending from the selector. For an exit-guard clause this means that the called continuation must be in the selector's branch of the tree. For an entry-guard, the starting point of the continuation-call must be within the selector's branch of the tree.

Because entry guards are selected based on the source of the continuation-call, a list of interceptors selected when the continuation-call is initiated will be different, in general, from what one would get by stopping at some intermediate interceptor and then "rethrowing" the call (in Java catch style). This suggests that normal termination of an intercepting guard should continue along the path of the original continuation-call, reserving explicit diversion for when the interceptor doesn't want to follow the original path. Secondarily, although we do want it to be convenient (as well as possible) for an interceptor to continue along the path of the original continuation-call, we don't really want to give the interceptor first-class continuations for either end of the call (source or destination). We may have built some abstract device on top of the continuation primitives, and as a result, the source/destination continuations may be internal details that really oughtn't be casually spread about.

Therefore, normal termination of an intercepting guard continues along the path of the continuation-call, and the interceptor when called is provided with an explicit continuation for diverting control to just outside the guarded extent (Java-style).

For versatile guard selection, each guarded gateway (entry or exit for a dynamic extent) has a list of guard clauses. When a continuation-call goes through the gate (entering or exiting), the selectors of these clauses are checked, one at a time, until one is found that matches the call, or until there are no more clauses so that no interception occurs. No more than one clause is selected. A root continuation is provided, which can be used as a selector that matches all possible calls. Using the root continuation, plus the ability for an interceptor to continue along the continuation-call path, one can (for example) arrange that a clause will be selected except when the call matches some narrower selector (for example, the error continuation, which is the ancestor of all evaluation-aborting error handlers). One provides a clause that catches errors and causes them to continue along their path, followed by a clause that catches everything and does whatever it is one wants done only in non-error situations.

When a continuation-call is made, a list of intercepting guard clauses can be immediately constructed, since continuation guards are fixed at the time the guarded continuation is constructed. The list doesn't have to be constructed then, of course, exactly because all the relevant parts of the continuation tree are immutable; but if it isn't constructed immediately, one will have to store the information needed to select the clauses later — which includes the source of the call as well as its destination (destination to select exit guards, source to select entry guards).

The Kernel Report carefully works out subtleties of the device, such that when a guard intercepts an entry/exit to dynamic extent d, and then itself makes a continuation-call, that call does not select any guard on d.

The traditional Scheme device for acquiring a first-class continuation object is call/cc, which calls a procedure and passes to that procedure the continuation to which that call would normally return. Frankly, this was always a very clumsy way to work with continuations; one might almost suspect it was devised as an "esoteric programming language" feature, akin to INTERCAL's COME FROM statement. With guarded continuations, though, dependence on call/cc finally becomes untenable. If I'm working in Java, and I want to create a new class of exception, I do that by adding a subtype to the tree of exception types; to do the same thing in Kernel, it seems I want to add a new continuation to the tree branch of continuations descending from the error continuation. If I had to use call/cc to acquire continuations, I could only add a new descendant of the error continuation if I were actually in the dynamic extent of the error continuation — so, to define a new "type" of error during normal computation, I'd have to first "throw" an error exception, passing through all the exit- and entry-guards to get to the point in the tree where I want to add the new continuation. What's wanted is ability to add a child continuation to an existing continuation without having to be inside the dynamic extent of the parent. Kernel provides just that ability, with extend-continuation (to add some algorithm between child and parent) and guard-continuation (to add guards between child and parent). (Interestingly, call/cc still has to be provided as a primitive, since it can't be derived from the other primitives.)

An illustrative example

Alas, no. Source code for continuation guarding is generally bulky and hard to read, and attempting to illuminate the basic concepts of continuation guarding that way would more likely just create an unhelpful mess. For me, at least, it's all best understood by visualizing the tree of nested dynamic scopes, whereas the source code only directly reveals a tree of nested static syntax; the dynamic tree then has to be deduced, and the two trees don't correspond as closely as one might think.

How best to arrange for a clear illustration, I'll have to give some more thought. Meanwhile, imagine there was an illustration here that made it all crystal clear to you.

Delimited continuations

I've also been asked to discuss guarded continuations versus delimited continuations. <cough> That's a rather large topic. I'd probably have to start by explaining why I think "delimited continuations" is a misnomer — by my reckoning, delimited continuations aren't continuations at all. They're functions, and functions are a completely different sort of creature than continuations. Continuations only have one side (input), which is why it's simply meaningless to talk about composing them. (Can you see I'm working up to a major rant, here?)

So I'm going to end this post here —it's already the longest post I've yet written on this blog— and hope to put together something on delimited continuations later. Perhaps that post will also help to further clarify guarded continuations, but we'll see. It's yet another item for my collection of in-development blog posts (along with guarded environments, another interesting topic this has reminded me of).