## Thursday, July 18, 2013

### Bypassing no-go theorems

This is not at all what I had in mind.
— Albert Einstein, in response to David Bohm's hidden variable theory.

A no-go theorem is a formal proof that a certain kind of theory cannot work.  (The term no-go theorem seems to be used mainly in physics; I find it useful in a more general context.)

A valid no-go theorem identifies a hopeless avenue of research; but in some cases, it also identifies a potentially valuable avenue for research.  This is because in some cases, the no-go theorem is commonly understood more broadly than its actual technical result.  Hence the no-go theorem is actually showing that some specific tactic doesn't work, but is interpreted to mean that some broad strategy doesn't work.  So when you see a no-go theorem that's being given a very broad interpretation, you may do well to ask whether there is, after all, a way to get around the theorem, by achieving what the theorem is informally understood to preclude without doing what the theorem formally precludes.

In this post, I'm going to look at four no-go theorems with broad informal interpretations.  Two of them are in physics; I touch on them briefly here, as examples of the pattern (having explored them in more detail in an earlier post).  A third is in programming-language semantics, where I found myself with a result that bypassed a no-go theorem of Mitchell Wand.  And the fourth is a no-go theorem in logic that I don't actually know quite how to bypass... or even whether it can be bypassed... yet... but I've got some ideas where to look, and it's good fun to have a go at it:  Gödel's Theorem.

John von Neuman's no-go theorem

In 1932, John von Neumann proved that no hidden variable theory can make all exactly the same predictions as quantum mechanics (QM):  all hidden variable theories are experimentally distinguishable from QM.  In 1952, David Bohm published a hidden variable theory experimentally indistinguishable from QM.

How did Bohm bypass von Neumann's no-go theorem?  Simple, really.  (If bypassing a no-go theorem is possible at all, it's likely to be very simple once you see how).  The no-go theorem assumed that the hidden variable theory would be local; that is, that under the theory, the effect of an event in spacetime cannot propagate across spacetime faster than the speed of light.  This was, indeed, a property Einstein wanted out of a hidden variable theory:  no "spooky action at a distance".  But Bohm's hidden variable theory involved a quantum potential field that obeys Schrödinger's Equation — trivially adopting the mathematical infrastructure of quantum mechanics, spooky-action-at-a-distance and all, yet doing it in a way that gave each particle its own unobservable precise position and momentum.  Einstein remarked, "This is not at all what I had in mind."

John Stewart Bell's no-go theorem

In 1964, John Stewart Bell published a proof that all local hidden variable theories are experimentally distinguishable from QM.  For, of course, suitable definition of "local hidden variable theory".  Bell's result can be bypassed by formulating a hidden variable theory in which signals can propagate backwards in time — an approach advocated by the so-called transactional interpretation of QM, and which, as noted in my earlier post on metaclassical physics, admits the possibility of a theory that is still "local" with respect to a fifth dimension of meta-time.

Mitchell Wand's no-go theorem

In 1998, Mitchell Wand published a paper The Theory of Fexprs is Trivial.

The obvious interpretation of the title of the paper is that if you include fexprs in your programming language, the theory of the language will be trivial.  When the paper first came out, I had recently hit on my key insight about how to handle fexprs, around which the Kernel programming language would grow, so naturally I scrutinized Wand's paper very closely to be sure it didn't represent a fundamental threat to what I was doing.  It didn't.  I might put Wand's central result this way:  If a programming language has reflection that makes all computational states observable in the syntactic theory of the language, and if computational states are in one-to-one correspondence with syntactic forms, then the syntactic theory of the language is trivial.  This isn't a problem for Kernel because neither of these conditions holds:  not all computational states are observable, and computational states are not in one-to-one correspondence with syntactic forms.  I could make a case, in fact, that in S-expression Lisp, input syntax represents only data:  computational states cannot be represented using input syntax at all, which means both that the syntactic theory of the language is already trivial on conceptual grounds, and also that the theory of fexprs is not syntactic.

At the time I started writing my dissertation, the best explanation I'd devised for why my theory was nontrivial despite Wand was that Wand did not distinguish between Lisp evaluation and calculus term rewriting, whereas for me Lisp evaluation was only one of several kinds of term rewriting.  Quotation, or fexprs, can prevent an operand from being evaluated; but trivialization results from a context preventing its subterm from being rewritten.  It's quite possible to prevent operand evaluation without trivializing the theory, provided evaluation is a specific kind of rewriting (requiring, in technical parlance, a redex that includes some context surrounding the evaluated term).

Despite myself, though, I was heavily influenced by Wand's result and by the long tradition in which it followed.  Fexprs had been rejected circa 1980 as a Lisp paradigm, in favor of macros.  A rejected paradigm is usually ridiculed in order to rally followers more strongly behind the new paradigm (see here).  My pursuit of $vau as a dissertation topic involved a years-long process of gradually ratcheting up expectations. At first, I didn't think it would be worth formulating a vau-calculus at all, because of course it wouldn't be well-enough behaved to justify the formulation. Then I thought, well, an operational semantics for an elementary subset of Kernel would be worth writing. Then I studied Plotkin's and Felleisen's work, and realized I could provide a semantics for Kernel that would meet Plotkin's well-behavedness criteria, rather than the slightly weakened criteria Felleisen had used for his side-effectful calculus. And then came the shock. When I realized that the vau-calculus I'd come up with, besides being essentially as well-behaved as Plotkin's call-by-value lambda-calculus, was actually (up to isomorphism) a conservative extension of call-by-value lambda-calculus. In other words, my theory of fexprs consisted of the entire theory of call-by-value lambda-calculus plus additional theorems. I was boggled. And I was naively excited. I figured, I'd better get this result published, quick, before somebody else notices it and publishes it first — because it's so incredibly obvious, it can't be long before someone else does find it. Did I say "naively"? That's an understatement. There's some advice for prospective graduate students floating around, which for some reason I associate with Richard Feynman (though I could easily be wrong about that [note: a reader points out this]), to the effect that you shouldn't be afraid people will steal your ideas when you share them, because if your ideas are any good you'll have trouble getting anyone to listen to them at all. In studying this stuff for years on end I had gone so far down untrodden paths that I was seeing things from a drastically unconventional angle, and if even so I had only just come around a corner to where I could see this thing, others were nowhere close to any vantage from which they could see it. [note: I've since written a post elaborating on this, Explicit evaluation.] Kurt Gödel's no-go theorem Likely the single most famous no-go theorem around is Gödel's Theorem. (Actually, it's Gödel's Theorems, plural, but the common informal understanding of the result doesn't require this distinction — and Gödel's result lends itself spectacularly to informal generalization.) This is what I'm going to spend most of this post on, because, well, it's jolly good fun (recalling the remark attributed to Abraham Lincoln: People who like this sort of thing will find it just the sort of thing they like). The backstory to Gödel was that in the early-to-mid nineteenth century, mathematics had gotten itself a shiny new foundation in the form of a formal axiomatic approach. And through the second half of the nineteenth century mathematicians expanded on this idea. Until, as the nineteenth century gave way to the twentieth, they started to uncover paradoxes implied by their sets of axioms. A perennial favorite (perhaps because it's easily explained) is Russell's Paradox. Let A be the set of all sets that do not contain themselves. Does A contain itself? Intuitively, one can see at once that if A contains itself, then by its definition it does not contain itself; and if it does not contain itself, then by its definition it does contain itself. The paradox mattered for mathematicians, though, for how it arose from their logical axioms, so we'll be a bit more precise here. The two key axioms involved are reductio ad absurdum and the Law of the Excluded Middle. Reductio ad absurdum says that if you suppose a proposition P, and under this supposition you are able to derive a contradiction, then not-P. Supposing A contains itself, we derive a contradiction, therefore A does not contain itself. Supposing A does not contain itself, we derive a contradiction, therefore —careful!— A does not not contain itself. This is where the Law of the Excluded Middle comes in: A either does or does not contain itself, therefore since it does not not contain itself, it does contain itself. We have therefore an antinomy, that is, we've proved both a proposition P and its negation not-P (A both does and does not contain itself). And antinomies are really bad news, because according to these axioms we've already named, if there is some proposition P for which you can prove both P and not-P, then you can prove every proposition, no matter what it is. Like this: Take any proposition Q. Suppose not-Q; then P and not-P, which is a contradiction, therefore by reductio ad absurdum, not-not-Q, and by the Law of the Excluded Middle, Q. When Russell's Paradox was published, the shiny new axiomatic foundations of mathematics were still less than a human lifetime old. Mathematicians started trying to figure out where things had gone wrong. The axioms of classical logic were evidently inconsistent, leading to antinomies, and the Law of the Excluded Middle was identified as a problem. One approach to the problem, proposed by David Hilbert, was to back off to a smaller set of axioms that were manifestly consistent, then use that smaller set of axioms to prove that a somewhat larger set of axioms was consistent. Although clearly the whole of classical logic was inconsistent, Hilbert hoped to salvage as much of it as he could. This plan to use a smaller set of axioms to bootstrap consistency of a larger set of axioms was called Hilbert's program, and I'm remarking it because we'll have occasion to come back to it later. Unfortunately, in 1931 Kurt Gödel proved a no-go theorem for Hilbert's program: that for any reasonably powerful system of formal logic, if the logic is consistent, then it cannot prove the consistency of its own axioms, let alone its own axioms plus some more on the side. The proof ran something like this: For any sufficiently powerful formal logic M, one can construct a proposition A of M that amounts to "this proposition is unprovable". If A were provable, that would prove that A is false, an antinomy; if not-A were provable, that would prove that A is true, again an antinomy; so M can only be consistent if both A and not-A are unprovable. But if M were able to prove its own consistency, that would prove that A is unprovable (because A must be unprovable in order for M to be consistent), which would prove that A is true, producing an antinomy, and M would in fact be inconsistent. Run by that again: If M can prove its own consistency, then M is in fact inconsistent. Typically, on completion of a scientific paradigm shift, the questions that caused the shift cease to be treated as viable questions by new researchers; research on those questions tapers off rapidly, pushed forward only by people who were already engaged by those questions at the time of the shift. So it was with Gödel's results. Later generations mostly treated them as the final word on the foundations of mathematics: don't even bother, we know it's impossible. That was pretty much the consensus view when I began studying this stuff in the 1980s, and it's still pretty much the consensus view today. Going there Having been trained to think of Gödel's Theorem as a force of nature, I nevertheless found myself studying it more seriously when writing the theoretical background material for my dissertation. I found myself discoursing at length on the relationship between mathematics, logic, and computation, and a curious discrepancy caught my eye. Consider the following Lisp predicate. ($define! A ($lambda (P) (not? (P P)))) Predicate A takes one argument, P, which is expected to be a predicate of one argument, and returns the negation of what P would return when passed to itself. This is a direct Lisp translation of Russell's Paradox. What happens when A is passed itself? The answer is, when A is passed itself, (A A), nothing interesting happens — which is really very interesting. The predicate attempts to recurse forever, never terminating, and in theory it will eventually fill up all available memory with a stack of pending continuations, and terminate with an error. What it won't do is cause mathematicians to despair of finding a solid foundation for their subject. If asking whether set A contains itself is so troublesome, why is applying predicate A to itself just a practical limit on how predicate A should be used? That question came from my dissertation. Meanwhile, another question came from the other major document I was developing, the R-1RK. I wanted to devise a uniquely Lisp-ish variant of the concept of eager type-checking. It seemed obvious to me that there should not be a fixed set of rules of type inference built into the language; that lacks generality, and is not extensible. So my idea was this: In keeping with the philosophy that everything should be first-class, let theorems about the program be an encapsulated type of first-class objects. And carefully design the constructors for this theorem type so that you can't construct the object unless it's provable. In effect, the constructors are the axioms of the logic. Modus ponens, say, is a constructor: given a theorem P and a theorem P-implies-Q, the modus-ponens constructor allows you to construct a theorem Q. As desired, there is no built-in inference engine: the programmer takes ultimate responsibility for figuring out how to prove things. Of the many questions in how to design such a first-class theorem type, one of the early ones has to be, what system of axioms should we use? Clearly not classical logic, because we know that would give us an inconsistent system. This though was pretty discouraging, because it seemed I would find myself directly confronting in my design the very sort of problems that Gödel's Theorem says are ultimately unsolvable. But then I had a whimsical thought; the sort of thing that seems at once not-impossible and yet such a long shot that one can just relax and enjoy exploring it without feeling under pressure to produce a result in any particular timeframe (and yet, I have moved my thinking forward on this over the years, which keeps it interesting). What if we could find a way to take advantage of the fact that our logic is embedded in a computational system, by somehow bleeding off the paradoxes into mere nontermination? So that they produce the anticlimax of functions that don't terminate instead of the existential angst of inconsistent mathematical foundations? Fragments At this point, my coherent vision dissolves into fragments of tentative insight, stray puzzle pieces I'm still pushing around hoping to fit together. One fragment: Alfred Tarski —who fits the aforementioned profile of someone already engaged by the questions when Gödel's results came out— suggested post-Gödel that the notion of consistency should be derived from common sense. Hilbert's program had actually pursued a formal definition of consistency, as the property that not all propositions are provable; this does have a certain practicality to it, in that the practical difficulty with the classical antinomies was that they allowed all propositions Q to be proven, so that "Q can be proven" ceased to be a informative statement about Q. Tarski, though, remarked that when a non-mathematician is told that both P and not-P are true, they can see that something is wrong without having to first receive a lecture on the formal consequences of antinomies in interaction with reductio ad absurdum. So, how about we resort to some common sense, here? A common-sensical description of Russell's Paradox might go like this: A is the set of all sets that do not contain themselves. If A contains itself, then it does not contain itself. But if it does not contain itself, then it does contain itself. But if it does contain itself, then it does not contain itself. But if it does not contain itself... And that is just what we want to happen: instead of deriving an antinomy, the reasoning just regresses infinitely. A human being can see very quickly that this is going nowhere, and doesn't bother to iterate beyond the first four sentences at most (and once they've learned the pattern, next time they'll probably stop after even fewer sentences), but they don't come out of the experience believing that A both does and does not belong to itself; they come out believing that there's no way of resolving the question. So perhaps we should be asking how to make the conflict here do an infinite regress instead of producing a (common-sensically wrong) answer after a finite number of steps. This seems to be some sort of deep structural change to how logical reasoning would work, possibly not even a modification of the axioms at all but rather of how they are used. If it does involve tampering with an axiom, the axiom tampered with might well be reductio ad absurdum rather than the Law of the Excluded Middle. This idea — tampering with reductio ad absurdum rather than the Law of the Excluded Middle — strikes a rather intriguing historical chord. Because, you see, one of the mathematicians notably pursuing Hilbert's program pre-Gödel did try to eliminate the classical antinomies by leaving intact the Law of the Excluded Middle and instead modifying reductio ad aburdum. His name was Alonzo Church —you may have heard of him— and the logic he produced had, in retrospect, a curiously computational flavor to it. While he was at it, he took the opportunity to simplify the treatment of variables in his logic, by having only a single structure that binds variables, which (for reasons lost in history) he chose to call λ. Universal and existential quantifiers in his logic were higher-order functions, which didn't themselves bind variables but instead operated on functions that did the binding for them. Quite a clever device, this λ. Unfortunately, it didn't take many years after Church's publication to show that antinomies arose in his system after all. Following the natural reflex of Hilbert's program, Church tried to find a subset of his logical axioms that could be proven consistent — and succeeded. It turned out that if you left out all the operators except λ you could prove that each proposition P was only equivalent to at most one irreducible form. This result was published in 1936 by Church and one of his students, J. Barkley Rosser, and today is known as the Church–Rosser Theorem (you may have heard of that, too). In the long run, Church's logic is an obscure historical footnote, while its λ-only subset turned out to be of great interest for computation, and is well-known under the name "λ-calculus". So evidently this idea of tampering with reductio ad absurdum and bringing computation into the mix is not exactly unprecedented. Is it possible that there is something there that Alonzo Church didn't notice? I'd have to say, "yes". Alonzo Church is one of those people who (like Albert Einstein, you'll recall he came up in relation to the first no-go theorem I discussed) in retrospect appears to have set a standard of intelligence none of us can possibly aspire to — but all such people are limited by the time they live in. Einstein died years before Bell's Theorem was published. Heck, Aristotle was clearly a smart guy too, and just think of everything he missed through the inconvenience of being born about two millennia before the scientific revolution. And Alonzo Church couldn't, by the nature of the beast, have created his logic based on a modern perspective on computation and logic since it was in part the further development of his own work over many decades that has produced that modern perspective. I've got one more puzzle piece I'm pushing around, that seems like it ought to fit in somewhere. Remember I said Church's logic was shown to have antinomies? Well, at the time the antinomy derivation was rather baroque. It involved a form of the Richard Paradox, which concerns the use of an expression in some class to designate an object that by definition cannot be designated by expressions of that class. (A version due to G.G. Berry concerns the twenty-one syllable English expression "the least natural number not nameable in fewer than twenty-two syllables".) The Richard Paradox is naturally facilitated by granting first-class status to functions, as λ-calculus and Lisp do. But, it turns out, there is a much simpler paradox contained in Church's logic, involving less logical machinery and therefore better suited for understanding what goes wrong when λ-calculus is embedded in a logic. This is Curry's Paradox. I'll assume, for this last bit, that you're at least hazily familiar with λ-calculus, so it'll come back to you when you see it. For Curry's Paradox, we need one logical operator, three logical axioms, and the machinery of λ-calculus itself. Our one logical operator is the binary implication operator, ⇒. The syntax of the augmented λ-calculus is then T ::= x | c | (λx.T) | (TT) | (T⇒T) That is, a term is either a variable, or a constant, or a lambda-expression, or an application, or an implication. We don't need a negation operator, because we're sticking with the generalized notion of inconsistency as the property that all propositions are provable. Our axioms assert that certain terms are provable: 1. For all terms P and Q, if provably P and provably (P⇒Q), then provably Q. (modus ponens) 2. For all terms P, provably P⇒P. 3. For all terms P and Q, provably ((P⇒(P⇒Q))⇒(P⇒Q)). The sole rewriting axiom of λ-calculus, lest we forget, is the β-rule: (λx.F)G → F[x ← G] That is, to apply function (λx.F) to argument G, substitute G for all free occurrences of x in F. To prove inconsistency, first we need a simple result that comes entirely from λ-calculus itself, called the Fixpoint Theorem. This result says that for every term F, there exists a term G such that FG = G (that is, every term F has a fixpoint). The proof works like this: Suppose F is any term, and let G = (λx.(F(xx)))(λx.(F(xx))), where x doesn't occur in F. Then G = (λx.(F(xx)))(λx.(F(xx))) → (F(xx))[x ← (λx.(F(xx)))] = F((λx.(F(xx)))(λx.(F(xx)))) = FG. Notice that although the Fixpoint Theorem apparently says that every F has a fixpoint G, it does not actually require F to be a function at all: instead of providing a G to which F can be applied, it provides a G from which FG can be derived. And —moreover— for most F, derivation from G is a divergent computation (G → FG → F(FG) → F(F(FG)) → ...). Now we're ready for our proof of inconsistency: that for every term P, provably P. Suppose P is any term. Let Q = λx.(x⇒P). By the Fixpoint Theorem, let R be a term such that QR = R. By writing out the definition of Q and then applying the β-rule, we have QR = (λx.(x⇒P))R → (R⇒P), therefore R = (R⇒P). By the second axiom, provably (R⇒R); but R = R⇒P, so, by replacing the right hand R in (R⇒R) with (R⇒P), provably (R⇒(R⇒P)). By the third axiom, provably ((R⇒(R⇒P))⇒(R⇒P)); and we already have provably (R⇒(R⇒P)), so, by modus ponens, provably (R⇒P). But R = (R⇒P), so provably R. Since provably R and provably (R⇒P), by modus ponens, provably P. Note: I've had to fix errors in this proof twice since publication; there's some sort of lesson there about either formal proofs or paradoxes. So, why did I go through all this in detail? Besides, of course, enjoying a good paradox. Well, mostly, this: The entire derivation turns on the essential premise that derivation in the calculus, as occurs (oddly backwards) in the proof of the Fixpoint Theorem, is a relation between logical terms — which is to say that all terms in the calculus have logical meaning. And we've seen something like that before, in my early explanation of Mitchell Wand's no-go theorem: trivialization of theory resulted from assuming that all calculus derivation was evaluation. So, if we got around Wand's no-go theorem by recognizing that some derivation is not evaluation, what can we do by recognizing that some derivation is not deduction? #### 22 comments: 1. Another possible route around Godel's no-go is to observe that a fixed-point combinator or Russel's paradox encoding are only possible in *untyped* lambda-calculus. To avoid them in a logic, the theory should include a hierarchy of types, similarly to the original system of the Principia Mathematica. There is nothing new here. I believe as early as in some footnote in Zermelo's Set Theory there is a mention of typed version of set theory to which Godel's theorem doesn't apply. 1. I, of course, am not a big fan in types, per my earlier blog post (Where to types come from?). In this case, it seems to me any system of types that successfully avoids paradoxes would necessarily restrict things more than necessary to avoid paradoxes (unless the type system itself is undeciable, which I find unappealing). What I'd really like is a system that seems to restrict things just enough to avoid paradoxes. At this point what I really need is to pinpoint the kinds of propositions about programs I want to prove; given that, I suspect I can build the thing. 2. Type system does prohibit certain constructions, but as far as mathematics is concerned, it is not restrictive. As a matter of fact, all the usual mathematics is typed, but implicitly. When a mathematician speaks, for example, about a ring R being a set of elements plus a set of operations, it is actually meant that an element of R is a tuple ((*ring*, R), x), tagged with a type label. This is the only way to not introduce a maze of disjointness conditions (otherwise your ring might end up being not disjoint from polynomials on it, for example). For a rather convincing demonstration see http://homotopytypetheory.org/book/. It is also consistently based on the idea of theorems (proofs, in fact) being first-class objects. 3. When a mathematician speaks, for example, about a ring R being a set of elements plus a set of operations, it is actually meant that an element of R is a tuple ((*ring*, R), x), tagged with a type label. This is not the case. When a mathematician says one thing, there's no reason to suppose they actually meant something else. There was a time when in Europe Latin was considered the scholarly language, and people's native languages were considered inferior. One could imagine a scholar in those times claiming that when someone said something fluently in, say, English, what they actually meant was something in Latin. Consider, for example, the spurious "rule" that an English sentence cannot properly end with a preposition — a rule with no validity for English, but true for Latin. It is a current fad to view the world through glasses tinted the color of the Curry–Howard correspondence, and deny the existence of anything that isn't visible through those glasses. My point, or at least part of it, is that types are a secondary structure whose introduction into a description of the primary system leads to obscure indirect consequences, so that if you really want to understand what's going on, you need to not introduce them. Note that when you spoke of a mathematician saying one thing and actually meaning something else, the something else involved a bunch of secondary structure that the mathematician in the first place had not dragged into the situation. 2. Tagging structure is "meant" in the sense, that if you force a mathematician to formalise his proof, it (or something similar) would arise, when a certain level of detail is reached. The underlying problem is that the set theory as it is (ZFC), is not an adequate universe of mathematical discourse. ZFC's sets are not structureless receptacles, which can be endowed with a structure (of a group or topological space), they already have some non-trivial structure: membership. Because of this, from set-theory point of view one might ask not only whether a point X on a Cartesian plane belongs to a triangle D, but also whether D belongs to X. The answer to the latter question depends on the irrelevant details of how points, real numbers, rational numbers (and so on all the way down) are encoded in sets. The answer can well be positive (of course except when X is a member of D---this would violate regularity). By implicit tagging mathematicians avoid such spurious set intersections. In effect, it amounts to building, on top of ZFC, of a version of the original Cantor's set-theory, in which such monsters were outlawed. See Lawver's "WHY ARE WE CONCERNED?" (http://conceptualmathematics.wordpress.com/2012/06/20/why-are-we-concerned/) I guess that the success of theory of categories proves that mathematicians do enjoy working with rich types. 1. You're missing the point. I acknowledge it's an easy point to miss if one hasn't got it yet. Types are a mold one can force anything into if one wants to do so badly enough, but it doesn't follow that it would advisable to do so, nor that "types are everywhere" in any useful sense. The success of category theory —and your impression of that "success" may be skewed by surrounding yourself with people who think highly of category theory— has no bearing on types either. If you define all distinctions between things to be symptoms of types, then you'll see types everywhere. This is rather like a theologian defining God to be the prime mover of all things and then, looking about at everything in motion, seeing God is everywhere. 2. I think you are missing a point too and a rather deeper one, viz., what constitutes a valid argument. A barrage of uninvited and unfounded implications about intentions and beliefs of your opponent certainly doesn't (as usual in such cases, they are all, without exception, wrong). Nor does repeating the statement (that types are not advisable) instead of addressing submitted examples to the contrary. Instead of continuing in the same vein, i.e., exposing to the world alleged defects of your intellectual upbringing and patent absurdity of heresies you swear allegiance to, let me do something more constructive and display on the door of this church a very short list of theses indeed: (A) that mathematicians do not use, in their day-to-day practice, ZFC sets; (B) that instead they use naive set-theory, which is typed such (C) that mathematical objects of different *types* (rings, triangles, manifolds, sequences) never accidentally coincide, even when nothing is known about the structure of their elements), but (D) in other cases, a type is without hesitation considered as a sub-type of another, even when their elements are different by construction (e.g., a rational number *is* a real number, even when the latter was constructed as a Dedekind's cut of the formers); (E) that outside of elementary school, functions are not considered as untyped lambda-forms, but have domain and range and (F) that widespread use of "morphism diagrams" indicates utility of typed functions (not to say about the fundamental rôle that function spaces play in many branches of mathematics); (G) that the success of theory of categories cannot be simply dismissed and its unbearing on types should be elaborated. Because my last example of (C), the one about a point and a triangle, somehow let you unimpressed, let me give you another one, straight from Commutative Algebra by Zariski and Samuel, (ch. I, §16). Consider an arbitrary ring R. Define a polynomial p over R as a finite sequence of coefficients (r0, r1, ...). Let S be the set (ring) of all polynomials. R can be embedded in S by the mapping j = \r->(r, 0, 0, ...). To avoid the nuisance of the explicit embedding, define a new version of the ring of polynomials R[x] as the union of R and (S - j(R)). Now one can, for example, define a function f on R[x] by the case analysis: - if p is in R, f(p) = ... - if p is in (S - j(R)), f(p) = ... Right? Wrong! Because an element r of R might easily have a form r = (r0, r1, ...), that is, be a sequence of other elements of R. In this case, R and (S - j(R)) are not disjoint and the entire theory collapses. Possibilities like this appear all over in mathematics and keeping track of them would be rather tedious, intractably tedious, in fact. Every theorem now stated as "For any group G, ..." would have to be re-formulated as "For any group G, with elements not having form ... or form ..." and such useless conditions would accumulate until totally incomprehensible. Besides, these conditions would depend on internal details of constructions within proofs, so one could never be sure whether a result can be translated in a different environment (typical problem with untyped software too). Instead mathematicians just assume (without even bothering to state it formally) that all their sets and elements are properly tagged as to be nicely disjoint when necessary. 3. You're evidently mistaking my attempts to communicate as accusations/attacks. They aren't. I'm looking for a way to communicate a subtle point that I've never before noticed needed to be put into words; this is quite exciting, because figuring out what needs to be articulated is often a prelude to major progress. It is also extremely frustrating because the most important things tend to be incredibly difficult to articulate the first time (though very easy to articulate the second time). Most of what you're saying, I would have considered obvious. Mixed into it though is this thread of not distinguishing different ways of organizing one's thoughts. That distinction is at the heart of what I'm trying to articulate. It's one thing to say one could, in principle, describe what a mathematician is saying/doing in terms of type tags. It's quite another to say the mathematician is describing what they're doing that way (even implicitly). 4. The difference between possible description of an activity and perception of the activity by the actor doesn't strike me as difficult to articulate. That this is part of your point was rather obvious from the beginning and I fully took this into account. My thesis is that this difference is moot, because mathematicians, in the course of their activities, don't speak or even think about issues like that. They have no opinion on the subject of properties of sets, because they are interested in, say, manifolds, not sets. In fact, *they* invented sets and epsilon-deltas and Peano axioms and a lot of other stuff only to make their manifolds and fibrations clearer and nicer. If necessary, they will do this again: the bulk of mathematics doesn't not depend on foundations at all. Because of this, the only way to understand what kind of foundations is best for mathematics is to analyse existing mathematical proofs and definitions, not what mathematicians say. After all, as Lakatos put it, "scientists know as much about foundations as fish about hydrodynamics". As a more striking example, most mathematicians profess Platonism, yet persist in proofs by contradiction, that is, freely work in formal systems that have no models. The other point you made is that somebody believing all distinctions being ones of type would see types everywhere. Again, this is the *only* way to see. E.g., similarly, one who believes in the primitive untyped universe, would invariably see types as an unadvisable mold. Believing that one of such ways is apriori right is exactly the theology. Typed and untyped theories are not, naturally, arranged in any hierarchy, even if some expositions make them so. To clarify the status of my assertion about how mathematicians treat types: in their normal activity they don't mention the properties of the underlying substrate, assuming it vaguely to be a "naive set theory". When I (being a mathematician by education), discuss the problems of untyped ZFC-like theory, like the one outlined above, with them, they invariably dismiss them as pathologies, asserting that disjointness and sub-typing can be achieved (e.g., by labelling) as necessary. Now, the ones who are interested in "foundations" (e.g., realise that paradoxes are property of Frege's *logical* theory, rather than Cantor's *set* theory) have more nuanced views, but in general: * they don't describe the substrate, unless asked to; * their actions are consistent with typed substrate and inconsistent with untyped substrate; * when asked, they describe some version of typed substrate. 5. The difference between possible description of an activity and perception of the activity by the actor doesn't strike me as difficult to articulate. That however is not the difference I'm trying to articulate. the bulk of mathematics doesn't not depend on foundations at all. On this we agree (supposing I understand your intended point, which is perhaps an incautious assumption given how much difficulty we're having in other areas). I liked the way Morris Kline put it: "Progress in mathematics almost demands a complete disregard of logical scruples." However, this is a distraction here (as is the analogy about fish and hydrodynamics) because of primary interest is mathematicians actually engaged in studying foundations. More like dolphins with advanced degrees in hydrodynamics (which just goes to show what can happen when an analogy runs amok). Your remark about Platonism suggests to me, btw, you're using an overly prescriptive definition of the term; but now here I'm digressing from the point. Digression is contagious. this is the *only* way to see. This statement is fairly strong evidence you've not yet realized what point I'm driving at. I'd expect you disagreeing with my point to look entirely different. This looks like you looking in one direction while I'm pointing in another. There's likely some clever way I could express myself that would cause you to say (or at least think), oh, that's what he means; why on Earth didn't he say so in the first pllce? Too bad I've so far not figured out what that way of expressing myself is. 3. > That however is not the difference I'm trying to articulate. The only way to resolve this problem, I can think of, is by trying again. Just telling "wrong" and hoping the elusive point will be found by poking around is not going to work. > However, this is a distraction here (as is the analogy about fish and hydrodynamics) because of primary interest is mathematicians actually engaged in studying foundations. Sorry, you misunderstood me completely. From the very first "When a mathematician speaks..." I meant a "normal" practicing mathematician. Note that I specifically mentioned that "all the *usual* mathematics is typed". And the running examples of points, triangles and manifolds should have provided an ample evidence of this. Opinions of people working on foundations matter only as much as large is that area of mathematics, i.e., very little. > This statement is fairly strong evidence you've not yet realized what point I'm driving at. Failures of understanding are possible in many places. I am not talking about typed universe being the only possible one. I am talking about the two following points of view being symmetrical: "the universe is typed, any distinction is one in type" and "the universe is untyped, any type is a mold". The former insists that typed setting is primitive and untyped one is an abstraction of the typed. The latter posits that untyped setting is primitive and typed one is built on top of the untyped. Both are consistent. The former can be formalised as a first order theory of categories with untyped universe as the category SET, defined by a chain of adjunctions. The latter can be formalised as ZFC with types as tags or categories. You seem to believe in the second one. You also seem to believe that I believe in the first one. 1. The only way to resolve this problem, I can think of, is by trying again. Just telling "wrong" and hoping the elusive point will be found by poking around is not going to work. We may be in agreement on this too, though putting things differently. Seems to me just trying again isn't going to work either, because the obstacle would remain, unidentified. So I've been trying to pinpoint where the mismatch of perspectives is. Sorry, you misunderstood me completely. From the very first "When a mathematician speaks..." I meant a "normal" practicing mathematician. I did understand you meant a "normal" practicing mathematician. It seemed to me that re the matter of Platonism vs proof by contradiction you were mismatching with what those sorts of mathematicians would mean when they profess "Platonism". I was also pointing out that the analogy about scientists and fish breaks down here, a lesser reason being that I don't think mathematicians (even "normal" mathematicians) are altogether "scientists", and a greater reason being that —for my current purposes, anyway— the mathematicians of interest are those of us studying foundations. The former insists that typed setting is primitive and untyped one is an abstraction of the typed. The latter posits that untyped setting is primitive and typed one is built on top of the untyped. [...] You seem to believe in the second one. You also seem to believe that I believe in the first one. We are perhaps getting somewhere. I don't believe in the first or second, and I hope you don't either. Mathematics is replete with alternative equivalent formulations of things. I had the pleasure while an undergraduate of taking a class from Brigitte Servatius who was in the process of coauthoring a book on matroid theory. (The book is Combinatorial Rigidity, 1993, and I'm rather tickled to be mentioned at the end of the preface.) There are a slew of different ways to formulate matroids, one of which seems to be in some sense the "simplest". There's Euclidean geometry, which I suspect has something peculiar about it for purposes of understanding other geometries. And of course the Church–Turing thesis doesn't keep Turing machines from being, for most purposes, much clumsier to program than lambda-calculus. And in designing programs, the choice of program structure can make a flabbergasting difference in how readily the program can be written (and debugged). This is my particular long-term interest, to understand how ability to express things is changed when one uses a language A to define a language B — my preferred name for such an investigation being abstraction theory. It's interesting to me, for example, that in mathematics one can abstract from one formulation to another and back again, and end up with really the same formulation one started with, whereas in our programming languages if we try to abstract from one formulation to another and back again, we generally end up with something that's only an approximation of what we started with. A failure of our programming languages, in which I think typing is complicit. I observe that while typed and untyped approaches may be "interchangeable" in the usual mathematical sense, it doesn't follow that they have the same abstractive properties, nor that certain things can be understood equally well in either formulation. My intuition says that when tangling with the paradoxes, bringing types into the picture is obfuscating. 2. > Mathematics is replete with alternative equivalent formulations of things. First, I'd like to clarify: is this an attempt to formulate the elusive point? I have to ask because, while I recall this being mentioned in some other your post, I cannot see anything related in the present discussion up to this point. Anyway, I very much agree. For me an ideal mathematical textbook would give multiple definitions of a concept, leaving their equivalence to a theorem or exercise (e.g., Cauchy vs. Dedekind reals). I think this email, which you refer to indirectly elsewhere as an example of typing red tape demonstrates of the same principle: free translation (or lack thereof) between organizing computation around data (classes) or around functions. As for your conclusion about types aggravating the complexity of such translations, I see the following arguments to the contrary: * translational maps between representations, which can be identified with proofs of equivalence, look very much like proof objects in some (constructive) proof theory: they need some coherency and good-behavedness conditions. Such theories are typed, see again homotopytypetheory.org; * most importantly, "alternative equivalent formulations of things" are only equivalent under some conditions. For example, Cauchy and Dedekind reals are equivalent in ZFC, but generally different in a constructive logic or elementary topos. Hence, formulations and translations must somehow record their assumptions, and this record is nothing but an interface to (a type of) the underlying logic or set theory or, in general, some other underlying layer of abstraction. These types guarantee that equivalence doesn't break. Compare this with the Stepanov's thesis that algorithms are defined on algebraic structures. 4. I confess I don't understand the fixpoint theorem at all. If F is ($lambda (x) (cons x x)) and cons has the usual Lisp definition, then surely this is a function with no fixpoint? There must be some subtle difference between lambda calculus and lambda calculus derived languages here? I can easily see that for any function F there can be a function G for which F is G's fixpoint... but I don't see how the reverse applies. I'm guessing that Lisp cons is not representable as pure lambda calculus?

1. Oh wait... I think I finally get it. A fixpoint is a recursive non-terminating calculation, a circular self-reference which of course evaluates to itself, so FG 'evaluates to' G in the sense that it's an infinite regression and any step along an infinite corridor gets you that same infinite corridor. So (\$lambda (x) (cons x x)) has no fixpoint in the domain of actually-allocated Lisp cons cells, but does have a fixpoint in the domain of functions applied to functions applied to functions...

So a fixpoint is essentially an infinite loop... and for any function, it's possible to force an infinite loop by passing it a reference to itself. Have I got that right?

2. And by 'domain' I mean 'range'. Mathematics is hard. :(

3. The key to understanding the fixpoint theorem, for me anyway, was realizing that when it says FG=G, it does not mean that for every F there is a G such that by calling F with parameter G one can get back G. I mention this in the blog post, above. The equality operator in FG=G is the equivalence relation induced by the calculus rewriting relation: that is, it's the symmetric reflexive transitive closure of the rewriting step relation. When you look at the proof of the theorem, it actually works by constructing an expression G such that evaluating G produces, as an intermediate result, FG. There is no need for F to even be a function, and if F is a function it doesn't matter what, if anything, F would actually do when called, because the proof of the theorem doesn't involve calling F.

So, F isn't being passed anything; we get distracted from this by knowing that F is, in fact, a function, and so we think about what would happen if F were called with G as a parameter — but F doesn't have to be called. In fact, if we tried to evaluate FG in Scheme or Kernel, eager-argument-evaluation would guarantee that F would never be called, because the operand G would have to be fully evaluated before passing it to F, and G has been chosen so that it reduces to FG, which reduces to F(FG), and so on ad infinitum.

The theorem could just as well have been stated as, "For every F there is a G such that G=FG", which is really saying the same thing but doesn't sound as impressive.

4. "The key to understanding the fixpoint theorem, for me anyway, was realizing that when it says FG=G, it does not mean that for every F there is a G such that by calling F with parameter G one can get back G"

Thanks! Yes, I think I finally grasp this now; it's very difficult for me to unlearn the practical "programmer" mindset which expects every function evaluation to actually terminate and produce a concrete value, and the evaluation of those values to be none of the calling routine's business, hidden behind an abstraction wall; we also tend to be trained to think of infinite left-recursive expansions as pathological cases and to be avoided at all costs. While I'm guessing lambda calculus, like turing machines and Typographical Number Theory, originated in the "typographical" family of approaches to logic where the formulae were considered as sequences of text symbols to be manipulated by an all-seeing mathematician obeying some text-substitution rules, but with read/write access to the whole sheet of paper.

The ideas of information security and data-hiding being built into an evaluating mechanism presumably originated much later. (Though it's interesting to wonder how much the WW2 Bletchley Park cryptography environment with its information walls cross-pollinated with the mathematics of computation being worked on in that same environment; again, I'm guessing not much as such; I have the impression that it wasn't until the 1960s, with timesharing systems, that the idea of an electronic computer having more than one simultaneous "user", and the need for active security between executing routines, was considered. But was there anything else happening in signals, cybernetics and information theory, eg Shannon et al, that might have emerged earlier?)

I must get back to reading your dissertation - I'm really enjoying it so far. The Smoothness Conjecture is a great statement of an intuition I've been trying to articulate, as to why macros and quasiquotation in traditional Lisps just feel "wrong" and clumsily bolted-on to me.

5. "Modus ponens, say, is a constructor: given a theorem P and a theorem P-implies-Q, the modus-ponens constructor allows you to construct a theorem Q."

This seems to come close to something that I've been worrying at: is it possible for cons cell structure to itself be a logic?

For example, what if symbol X meant 'X is asserted', and pair (X . Y) meant 'X implies Y'? Could we then use the mechanisms of Lisp list traversal themselves to make logical deductions, albeit of a fairly restricted and minimal kind? It seems like list structure would come close to natively implementing something http://en.wikipedia.org/wiki/Implicational_propositional_calculus .

An immediate result of this would be that proper list terms, eg (X Y) would not quite mean 'X implies Y' but actually 'X implies that (Y implies nil)', with S-expression nil having the meaning of trivial truth. (Which is the opposite of how classical Lisp interprets nil, but closer to Scheme. However instead of #FALSE being a value in itself, falsehood would be the complete _absence_ of a cons cell rather than an empty one.)

I've been tickling the edge of this idea by considering a simple extension to S-expressions which I've nicknamed T-expressions: take a dotted pair/list, but allow a list of items rather than a single one after the dot. This list represents a set of multiple possible completions of the pair, and could be empty. We could then use these modified S-expressions to represent arbitrary binary-relation (table or dictionary) structure, which flattens out to a list in the case that it represents a single row. There are two distinguishable null-like values: standard nil (), and a new empty set (.)

I've found that by restricting the part after the dot to proper lists, I can union and intersect these expressions just like sets (they are sets of S-expression proper lists, essentially), which isn't something that's always defined for many of today's object or dictionary structures (such as, say, JSON objects). I can also non-commutatively 'multiply' and 'divide' them, with multiply being Cartesian product and 'divide' being a 'lookup' operation, which returns distinguishable true (nil) and false (empty set) results.

I'm not sure how far beyond this can be taken towards an actual logic, but it seems like a useful intermediate result to me, and I haven't seen it published anywhere so far.

6. I've read through your blog in its entirety (a first for me). I am very fond of the tendency of your thinking (I also enjoy your prose). Much of the content is beyond my ken, but I'm working to close the gap. What I do understand has had a pretty sustained impact on my thinking for the last 4-5 months. Thank you for the inspiration, and please file this preamble under "solicitations to keep writing".

Shortly after I read this piece of yours, I read Quine's "The Ways of Paradox" (https://math.dartmouth.edu/~matc/Readers/HowManyAngels/Paradox.html). I think your essay and his share a common spirit. I wanted to mention it, just in case you were unfamiliar.

> What if we could find a way to take advantage of the fact that our logic is embedded in a computational system, by somehow bleeding off the paradoxes into mere nontermination? So that they produce the anticlimax of functions that don't terminate instead of the existential angst of inconsistent mathematical foundations?

I read this as a (potentially pivotal?) insight contributing to our acclimation to the paradox. It is that historical process of acclimation which is the central theme of Quine's essay. In concluding his essay, he writes,

>Like any veridical paradox, [Gödel's pardox] is one we can get used to, thereby gradually sapping its quality of paradox. But this one takes some sapping. And mathematical logicians are at it, most assiduously. Gödel's result started a trend of research that has grown in thirty years to the proportions of a big and busy branch of mathematics sometimes called proof theory, having to do with recursive functions and related matters, and embracing indeed a general abstract theory of machine computation.

I am following the clues of a suspicion that your critique of types and Quine's critique of propositionalism are akin, but I have a long way to go before I can say anything sensible on that front. I'll let you know if I ever make anyting of it. Many thanks for taking the time to make your thoughts available!

7. You might want to take a look at "Map Theory" by Klaus Grue (http://www.diku.dk/~grue/papers/Grue92/Grue92.pdf). To quote another paper by Grue, "[t]he deepest difference between set and map theories shows up in the treatment of infinite looping: Russell's sentence {x | x \not\in x} \in {x | x \not\in x} is just a term that takes an infinitely long time to compute, but set theory deals with this sentence by forbidding it rather than taking its value seriously."

8. Chaitin has an interesting treatment of this stuff in 'Meta Math!'... he seems to think the Turing approach is more general, and then goes on to try to showcase the weirdness into a single "most indeterminable" number he calls Omega.