## Sunday, May 10, 2015

### Computation and truth

a very great deal more truth can become known than can be proven.
Richard Feynman, "The Development of the Space-Time View of Quantum Electrodynamics", Nobel Lecture, 1965.

In this post I'm going to take a sharper look at the relationship between computation and truth.  This relates to my previously blogged interests both in difficulties with the conventional notion of type (here) and in possible approaches to sidestepping Gödel's Theorem (here).

I was motivated to pursue these ideas further now by two recent developments:  first, a discussion on LtU that — most gratifyingly — pushed me to re-review the more globally oriented aspects of Gödel's Theorem (here); and second, following on that, an abrupt insight into what the Curry–Howard correspondence implies about the practical dynamics of static typing.

My biggest achievement here is on something I've wanted to do for many years:  to make clear exactly why Gödel's Theorem is true.  By proving it.  As an elementary result.  If it's really as profound as it's made out to be, it should have a really simple proof; yet I've never seen the proof treated as an elementary result.  At an elementary level one usually sees hand-waving about the proof, in a manner I find unsatisfying because the details omitted seem rather essential to understanding why the result is true.  I don't expect to specify here every tiny detail of a proof, but any detail I omit should be obviously minor; it shouldn't feel as if anything problematic is being hidden.

I've comparatively far less to say about Curry–Howard.  It's taken me years to come up with this insight into the connection between Curry–Howard and static typing, though; so if the difficulty of coming up with it isn't just me being thick, the fact that it doesn't take long to state is a good sign.

Contents
That which is most computable is most true
Church–Turing
Enumeration
Diagonalization
Uncomputability
Gödel
Attitude
Curry–Howard
What we can't know
Gödel's Second Theorem
Turing's dissertation
Where do we go from here?
That which is most computable is most true

We want to get at truth using mechanical reasoning.  Why?  Because mechanical reasoning is objective.  We can see what all the rules are, and decide whether we agree with them, and we can check how the particular reasoning was done and know whether or not it follows the rules, and if it all checks out we can all agree on the conclusion.  Everything is out in the open, unlike things like divine revelation where an unadorned "I don't believe it" is a credible counter-argument.

In mechanical reasoning, propositions are built up using symbols from a finite alphabet, and a finite set of rules of deduction allow propositions to be proven, either from first principles (axioms) or from a finite number of previously proven propositions (theorems).  In essence, this is computation (aka mechanical computation); I'm not talking about a sophisticated correspondence here, just common-sense mechanics:  manipulating propositions using finite rules of deduction is manipulating strings over an alphabet using a finite set of rules, which is computation.  What you can and can't do this way is governed by what you can and can't do by computation.

Some results that came out gradually over the first several decades of the twentieth century appear to say there is no one unique most-powerful notion of truth accessible by mechanical reasoning.  However, it turns out there is a unique most-powerful notion of computation.  By the time this second point was fully appreciated, mathematicians had already long since been induced to reconcile themselves, one way or another, to the first point.  However, looking at it all from nearly a century downstream, it seems advisable to me to start my systematic study with the historically later result, the unique most-powerful notion of computation.

Church–Turing

In the 1930s, three different formal models of general computation were proven to be equi-powerful:  general recursive functions (1931, Gödel and Herbrand), λ-calculus (1936, Church), and Turing machines (1936, Turing).  That is, for any computation you can do with one of these models, there's an equivalent computation you can do with either of the other models.  Of these three models, Turing machines are perhaps the least mathematically elegant, rather nuts-and-bolts sort of devices, but it's also most intuitively obvious that you can do something by mechanical computation iff (if and only if) you can do it with a Turing machine.  When the other models were proven equi-powerful with Turing machines, it didn't add much to the credibility of Turing machines; rather, it added credibility to the other models.

The "thesis", also called a "conjecture", is that any model of mechanical computation, if it isn't underpowered, is also equi-powerful with Turing machines and all these other "Turing-powerful" models of computation.  It's a "thesis", "conjecture", etc., because it's an inherently informal statement and therefore isn't subject to formal proof.  But it's demonstrated its unassailability for about three quarters of a century, now.  Mathematicians study what it takes to build a model more powerful than Turing machines (in fact, Turing himself touched on this in his dissertation); you have to bring in something that isn't altogether mechanical.  The maximum amount of computational power you can get mechanically is a robust quantity, somehow built into the fabric of Platonic mathematics much as the speed of light is built into the fabric of physics, and this quantity of power is the same no matter which way you get there (what computational model you use).  And, that amount of computational power is realizable; it isn't something you can only approach, but something you can achieve (mathematically speaking).

Enumeration

When exploring the limits of Turing-powerful computation, the basic technique is to frame computations in terms of enumerations.

Enumeration is where you just generate a list, perhaps a list that never ends; typically you don't want the whole list, you just watch it as it grows to see whether the information you actually want ever gets listed.  As long as you're only asking whether a computation can be done — not how efficiently it can be done — all computations can be rearranged this way to use enumeration.

Suppose you've got any Turing machine.  It computes outputs from inputs.  Maybe it doesn't even always complete its computation:  maybe sometimes it diverges, computing forever instead of finally producing an answer, perhaps by going into an infinite loop, or perhaps finding some more exotic way to non-terminate.  But, given any such machine, you can always define a second machine that enumerates all the input/output pairs where the first machine given that input would halt with that output.  How would you do that?  It's straightforward, though tedious.  Call the first machine T1.  We can enumerate all the possible inputs, since they're all built up from a finite alphabet; just list them in order of increasing length, and within inputs of a given length, list them alphabetically.  For each of these inputs, using the mechanical rules of T1, you can enumerate the states of T1:  the initial state, where it has the input and is about to start computing, the second state that results from what it does next from that first state, and so on.  Call the mth state of T1 on the nth input S(n,m).  Imagine an expanding table, which we slowly fill out, where n is the row number and m is the column number.  Row n is the entire computation by T1 for the nth input.  We can't fill out the whole table by completing each row before moving on to the next, because some rows might extend to the right forever, as T1 doesn't halt on that input.  But we can mechanically compute any particular cell in the table, S(n,m).  So we just have to enumerate all the pairs n,m such that we'll eventually get to each of them — for example, we can do all the entries where the sum of n and m is two (that's just the leftmost topmost entry, S(1,1)), then all the ones where the sum is three (that's S(1,2) and S(2,1)), four (that's S(1,3), S(2,2), and S(3,1)), and so on — and whenever we find a cell where T1 halts, we output the corresponding T1 input/output pair.  We now have a T2 that enumerates all and only the input/output pairs for which T1 halts.

Diagonalization

Way back in the late nineteenth century, though, Georg Cantor noticed that if you can enumerate a series of enumerations — even a series of infinite enumerations — you can find an enumeration that wasn't in the series.  He did this with real numbers, to show that not all real numbers are rational.  This is going to sound very similar to what we just did for Turing machines.

Consider the numbers less than one and not less than zero.  We can enumerate the rational numbers in this interval:  each such number is a non-negative integer divided by a strictly larger integer, so just list them in order of increasing denominator, and for a given denominator, by increasing numerator (0/1, 0/2, 1/2, 0/3 1/3, 2/3, et cetera ad infinitum).  For each of these ratios, we can enumerate the digits in the decimal representation of that ratio, starting with the tenths digit.  (If the denominator divides a power of ten, after some point the decimal representation will be all zeros.)  Call the mth digit of the nth ratio S(n,m).  Imagine a table where n is the row number and m is the column number.  Row n is the entire decimal representation of the nth ratio.  We can mechanically compute what should go in any particular entry of this table.  And now comes the trick.  We can construct the decimal representation of a real number, in the interval, that isn't equal to any of the ones we've enumerated.  To do this, we read off the entries on the diagonal of our table from the upper left toward the lower right (S(1,1), S(2,2), etc.), and add one to each digit (modulo 10, so if we read a 9 we change that to 0):  our nth digit is  (S(n,n) + 1) mod 10.  This is a perfectly good way to specify a real number — it's an infinite sum of the form Σan10−n — and we know it isn't rational because every rational number in the interval has some decimal digit on which it differs from our real number.

This general technique is called diagonalization:  you have an enumerable set of inputs (the labels on the rows), for each input you have an enumerated sequence (the row with that label), and you then produce an enumerated sequence that differs from every row by reading off the diagonal from upper left downward to the right.  Since diagonalization is used to show something isn't enumerated, and enumeration is at the heart of computation, naturally diagonalization is useful for showing things can't be computed.

Uncomputability

Because a Turing machine, like any other algorithmic device such as a general recursive function or λ-calculus term, is fundamentally finite, it's a straightforward (if tedious) exercise to describe it as an input to another machine that can then, straightforwardly, interpret the description to simulate the behavior of the described machine.  This is a universal Turing machine, which takes as input a machine description and an input to the described machine, and outputs the output of the described machine on that input — or doesn't halt, if the described machine wouldn't halt on that input.

However, although simulating a described machine is straightforward, it is not possible to determine in general, by computation, whether or not the described machine will halt on the given input.  That is, we cannot possibly construct a Turing machine that always halts that determines whether any described machine halts on a given input.  We can show this by diagonalization.

We can enumerate all possible machine descriptions, readily enough, since they're just alphabetic strings that obey some simple (and therefore checkable) syntactic rules.  We can also, of course, enumerate all possible inputs to the described machines.  Imagine a table where the entry S(n,m) at row n and column m is "yes" if the nth machine halts on the mth input, or "no" if it doesn't halt.  Suppose we can construct a machine that computes the entries S(n,m) of this table.  Then by going down the diagonal of the table we can also construct a machine whose behavior differs from every row of the table:  Let machine A on the mth input compute S(m,m), and if it's "no", halt and say "no", while if it's "yes", go into an infinite loop and thus never halt.  If machine A is the nth machine in the table, then A halts on the nth input if and only if S(n,n) is "no".  Assuming that our computation of S works right, there can't be any such n, and A was never enumerated.

Gödel

Gödel's Theorem (aka Gödel's First Theorem) says that any sufficiently powerful formal system is either incomplete or inconsistent — in essence, either it can't prove everything that's true, or it can prove things that aren't true.

To pin this down, we first need to work out what "sufficiently powerful" means.  Gödel wanted a system powerful enough to reason about arithmetic:  we can boil this down to, for an arithmetic function f and integer i, does f(i)=j or doesn't it?  The functions of interest are, of course, general recursive functions, which are equi-powerful with Turing machines and with λ-calculus; so we can equivalently say we want to reason about whether a given Turing machine with input i will or will not produce output j.  But a formal system is itself essentially a Turing machine; so in effect we're talking about a Turing machine L (the formal system; L for logic) that determines whether or not a Turing machine (the function f) on given input produces given output.  The system would be consistent and complete if it confirms every true statement about whether or not f on given input produces given output, and doesn't confirm any false such statements.

Enumerate the machines f and make them the rows of a table.  Enumerate the input/output pairs and make them the columns.  In the entry for row n and column m, put a "yes" if L confirms that the nth machine has the mth input/output pair, a "no" if L confirms that it doesn't.  Suppose L is consistent and complete.

It can't be both true and false that the nth machine has the mth input/output pair; so if L only confirms true propositions, there can't be both a "yes" and a "no" in any one table entry.  What about blank table entries?  For centuries it was generally agreed that a proposition must be either true or false; but this idea had fallen into some disrepute during the three decades leading up to Gödel's results.  This is just as well, because, based on our supposition that L is consistent and complete, we can easily show that the table must have some blank entries.  Suppose the table has no blank entries.  Then for any machine f1, and any input i, we can determine whether f1 halts on i, thus:  construct another machine f2 that runs f1 on i and then halts with output confirm.  Because there are no blank entries in the table, we know L can determine whether or not f2(i)=confirm, and this also determines whether or not f1 halts on i.  But we already know from the previous section that we cannot correctly determine by computation whether or not an arbitrary machine halts on an arbitrary input; therefore, there must be some blank entries in the table.

Is it possible for a proposition of this kind — that a given machine on a given input produces a given output — to be neither true nor false?  If you think this isn't possible, then we have already proven to your satisfaction that L cannot be both consistent and complete.  However, since we're collectively no longer so sure that propositions have to be either true or false, let's see if we can find a difficulty with the consistent complete system without insisting that every table entry must be filled in.  Instead, we'll look for a particular entry that we know should be filled in, but isn't.

We're going to diagonalize.  First, let's restrict our yes/no table by looking only at columns where the output is confirm (and, being really careful, suppress any duplicate column labels, so each column label occurs only once).  So now our table has rows for machines f, columns for inputs i, and each entry (n,m) contains a "yes" if L confirms that the nth machine on the mth input produces output confirm, while the entry contains a "no" if L confirms it does not produce output confirm.  The entry is blank if L doesn't confirm either proposition.  Construct a machine A as follows.  For given input, go through the column labels till you find the one that matches it (we were careful there'd be only one); call that column number m.  Use L to confirm a "no" in table entry m,m, and once you've confirmed that, output confirm.  If L never confirms that "no", then A never halts, and never outputs confirm.  Since A is a Turing machine, it is the label on some row n of the table.  What is the content of table entry n,n?  Remember, the content of the table entry is what L actually confirms about the behavior of A on the nth input.  By construction, if the entry contains "no", then A outputs confirm, and the "no" is incorrect.  If the entry contains "yes", and the "yes" is correct, then A outputs confirm, and by construction it must have done so because the entry contains an incorrect "no" that caused A to behave this way.  Therefore, if L doesn't confirm anything that's false, this table entry must be blank.  But if we know the table entry is blank, then we know that, by failing to put a "no" there, L has failed to confirm something true, and is therefore incomplete.

If we are sure the formal system proves everything that's true, then we cannot possibly be sure it doesn't prove anything that's false; if we are sure it doesn't prove anything that's false, we cannot possibly be sure it proves everything that's true.  Heisenberg's uncertainty principle comes to mind.

Attitude

Gödel's results are commonly phrased in terms of what a formal system can prove about itself, and treated in terms of the rules of deduction in the formal system.  There are both historical and practical reasons for this.

In the first half of the nineteenth century, the foundations of mathematics underwent a Kuhnian paradigm shift, settling on building things up formally from a set of axioms.  In the 1890s people started to notice cracks in the axiomatic foundations, in the form of antinomies — pairs of contradictory statements that were both provable from the axioms.  Mathematicians generally reacted by looking for some axiom they'd chosen that doesn't hold in general — as geometry had done in the early nineteenth century to explore non-Euclidean geometries that lack the parallel postulate.  As a source of antinomies, attention fell primarily on the Law of the Excluded Middle, which says a proposition is either true or false; as an off-beat alternative, Alonzo Church considered weakening reductio ad absurdum, which says that if assuming a proposition leads to a contradiction, then the proposition is false.  Thus, emphasis on choice of rules of deduction.

David Hilbert proposed to use a subset of a formal system to prove the consistency of the larger system; this would have the advantage that one might be more confident of the subset, so that using the subset to prove consistency would increase confidence in the larger system.  Gödel's result was understood to mean that the consistency of the whole formal system (for a powerful system) can only be proved in an even more powerful system.  Thus, emphasis on what a formal system can prove about itself.

Explorations of how to cope with the Theorem have continued to focus on the system's rules of deduction; my own earlier post tended this way.  Alan Turing's dissertation at Princeton also followed this route.  The emphasis on rules of deduction naturally suggests itself when looking for a way around Gödel's Theorem, because if you want to achieve a mechanical means for deriving truth, as a practical matter you can't achieve that without working out the specific mechanical means.

However, in this post I've been phrasing and treating Gödel's Theorem differently.

I phrased myself in terms of what we can know about the system — regardless of how we come to know — rather than what the system can prove about itself.  (I'm not distinguishing, btw, between "what we can know" and "what can be true"; either would do, in principle, but we're no longer sure what "truth" is, and while it's awkward to talk about multiple notions of truth, it's easier to talk about multiple observers.  When convenient I'll conjure a hypothetical omniscient being, to dispense with quibbles about "true but unknowable".)

My treatment of the Theorem conspicuously omits any internals of the formal system, supposing only that its conclusions are computable (and below I'll dispense with even that supposition).  By depicting Gödel's Theorem without any reference to the rules of deduction, this approach seems to throw a wet blanket on attempts to cope with Gödel's Theorem by means of one's choice of rules of deduction — and frankly, I approve of discouraging attempts by that route.  I'm not looking for a clever loophole in Gödel's result — invoking, say, uncountable infinities or second-order logic as a sort of Get Out of Jail Free card.  In my experience, when somebody thinks they've found a loophole in something as fundamental as Gödel's Theorem, it's very likely they've outsmarted themselves and ended up with a bogus result.  What I want is an obvious way of completely bypassing the Theorem; something poetically akin to cutting the Gordian Knot.

That is, I'm looking for a way around Gödel's Theorem with a high profundity index.  This is an informal device I use to characterize the sort of solutions I favor.  Imagine you could use numerical values to describe how difficult conceptual tasks are:  each such value is a positive number, and the more difficult the task, the higher the number.  Now, for a given idea, take the difficulty of coming up with the idea the first time, and divide by the difficulty of understanding the idea once it's been explained to you.  That ratio is the profundity index of the idea.  So an idea is profound if it was really difficult to come up with, but is really obvious once explained.  If an idea that's incredibly hard to come up with in the first place turns out to be even harder to figure out how to explain clearly, the denominator you want is the difficulty of understanding it after somebody has figured out how to explain it clearly, and the numerator should include the difficulty of coming up with the explanation.

The metaphor of getting around something implies a desire to get to the other side; and it may be illuminating to ask why one wants to do so.  We have here two notions, one practical and one philosophical.  The notion of truth is as philosophical as you can get; it's the whole purpose of philosophy.  The notion of mechanical computation is — despite quibbles about infinite resources and such — quintessentially practical, to do with getting results by an explicitly objective and reproducible procedure.  Mathematicians in the second half of the nineteenth century sought to access truth through computation.  The protracted collapse of that agenda across the first three decades of the twentieth century, culminating in Gödel's Theorem, has left us without a clear understanding of the proper role of computation in investigating truth; and with yet another in philosophy's long tradition of ways to not be sure what is true.  So I suppose, in trying to get around Gödel's Theorem, my hopes are

• to find a robust maximum of truth, as Turing power is a robust maximum of computational power.
• to find a robust maximum way of obtaining truth through computation.
Gödel's Theorem tells us that Turing power itself, despite its robustness, does not provide a straightforward maximum of either truth or proof.

Curry–Howard

Though there are (of course) situations in which the Curry–Howard correspondence is exactly what one needs, in general I see it as badly overrated.

The basic correspondence is between rules of deduction in formal proof systems, and rules of static typing in a programming language (classically, the typed λ-calculus).  The canonical example is that modus ponens corresponds to typed function application:  modus ponens says that if proposition A is provable and proposition AB is provable, then proposition B is provable; typed function application says that if a is an expression of type A and f is an expression of type AB, then fa is an expression of type B.  Moving outward from this insight, when you construct a correctly typed program you are also constructing a proof; thus proofs correspond to correctly typed programs.  A theorem corresponds to a type, so that asking whether a theorem has a proof is asking whether the corresponding type has a correctly typed expression of that type — that is, provability of the theorem corresponds to realizability of the type.  And so on.

The folk-wisdom version of the correspondence is that logic and computation are the same thing. The folk-corollary is that all reasoning should be done with types.  This is the basis of modern type theory, and there are folks trying to recast both programming language design, logic, and mathematics in the image of types.  Curry–Howard has taken on a (one hopes, metaphorically) theological significance.

It strikes me, though, that the basic correspondence does not involve computation at all.  If, in the realm of programming, a type system ever becomes Turing-powerful, that's a major mark against it because we want fast automatic type-checking and, even if we're willing to wait a little longer, we certainly want our type-checks to be guaranteed to halt.  In any event, types are not the primary vehicle of computation, rather they're a means of reasoning about the form of programs — thus, not even reasoning directly about our computations, but rather about the way we specify our computations.

It's easy to get tangled up trying to make sense of this proof–program connection.  For example, when we say that we want our automatic type-checking algorithm to always halt, that limits the computational power involved in checking an individual step of a proof, but puts no limit on the computational power of proof in general because the length of allowable proofs is unbounded, just as the size of program expressions is unbounded.  There is no evident notion of what it means for a proof to "halt", and this corresponds, through Curry–Howard, to saying there is no such thing as a "largest" expression in λ-calculus that cannot be made a subexpression of a larger expression; it has nothing whatever to do with halting of λ-calculus computations.  The reason one gets tangled up like this is that although proofs and programs are technically connected through Curry–Howard, they have different and often incompatible purposes.

The purpose of a proof, I submit, is to elucidate a chain of reasoning.  The more lucid, the better.  Ideally one wants what Paul Erdős used to call "one from The Book" — the idea being that God has a jealously guarded book of the most beautiful proofs of all theorems (Paul Erdős was an atheist; he said "You don't have to believe in God, but you should believe in The Book").  But the first duty of a program is to elucidate an algorithm.  Seriously.  This shouldn't be a controversial statement, and it's scary to realize that for some people it is.  I'll say it again.  The first duty of a program is to elucidate an algorithm.  You should be able to tell at a glance what the program is doing.  That is your first line of defense against getting the program wrong.  Proofs of program correctness, with all the benefits and problems thereof, are a later line of defense, possibly useful but no substitute for being able to tell what the program does.  (Yes, this is yet another of my dozen-or-so topics I'm trying to draft a blog post about, under the working title "How do we know it's right?".)  And this is where the two sides of the Curry–Howard correspondence part company.  If you relentlessly drive your program syntax toward more lucid expression of algorithms, you obfuscate the inference rules of your type system, which is to say, the deductive steps of the corresponding proof.  If you drive to simplify the type system, you're no longer headed for maximally lucid algorithms.  In practice, it seems, you try to get simultaneously the best of both worlds and end up sliding toward the worst of both.  (My past blog post on types is yonder.)

What we can't know

Enough type-bashing.  Back on the Gödel front we were hoping for robust maxima of truth and of computable truth.  What does Gödel's Theorem actually tell us about these goals?

First of all we should recognize that Gödel's Theorem is not, primarily, a limit on what can be known by mechanical means.  The uncomputability of the Halting problem (proven above) is a limit on what can be known by mechanical means.  Gödel's Theorem is a limit on what can be known at all.  That is, it bears more on truth than it does on truth through computation.  I touched on this point above.  To further clarify, we can use a notion that played a brief role in Alan Turing's doctoral dissertation at Princeton (supervised by Alonzo Church), called an oracle.

Suppose we attach a peripheral device, O, to a Turing machine.  The Turing machine, chugging along mechanically, can at any time ask a question of O, and O will give an answer (based on the question) in one step of the machine.  O is a black box; we don't say how it works, and there's no need for it to be mechanical inside.  Maybe there's a djinn, or a deity, or some such, inside O that's producing the answer.  We're only assuming that it will always immediately answer any question asked of it.  That's what we mean by an oracle.  We'll call a Turing machine equipped with this device an O-machine.

Let's revisit the Halting problem, and see what we can —and can't— do to change the situation by introducing an oracle.  Our basic result, remember, is that there is no purely mechanical means to determine whether or not an arbitrary Turing machine will halt on an arbitrary input.  Okay.  What if we had a non-mechanical means?  Precisely, let us suppose we have an oracle O that will always tell us immediately whether a given ordinary Turing machine will halt on a given input.  This supposition doesn't appear to have, in itself, any unfortunate consequences.  Under the supposition that we have such an oracle O, we can easily build an O-machine that determines whether a given ordinary Turing machine will halt on a given input.  What we can't do, given this oracle O, is build an O-machine that determines whether a given O-machine halts on a given input.  This is one of those results that's easier to show if we make it stronger.  Without even knowing what an oracle O does, we can prove that there is no O-machine that always halts that determines whether any described O-machine will halt on a given input.  This is because we can describe all possible O-machines without ever having to describe the internals of OO is the same for all of them, after all; we only have to describe the parts of the O-machines that differ from each other, and that we can do finitely no matter what O is.  So we can still enumerate our O-machines, and diagonalize just as we did in the earlier section to prove an ordinary Turing machine couldn't solve the ordinary Halting problem.  No matter what O is, even if it's got an omniscient being hidden inside it, an O-machine can't solve the O-Halting problem.

Likewise, our diagonalization to prove Gödel's Theorem works just as well for O-machines as for ordinary Turing machines.  For any oracle O, let L be an O-machine that confirms some propositions, and fails to confirm others, about whether or not given O-machines on given inputs produce given outputs.  Reasoning just as before, if we know that L confirms all true claims, then we cannot know that L doesn't confirm any false claims; if we know it doesn't confirm any false claims, then we cannot know it confirms all true claims.  So even if we're allowed to consult an oracle, we still can't achieve an understanding of truth, L, that confirms all and only true claims (about O-machines, lest we forget).

We are now placing limits on what we can know, or equivalently, on what can be true.  To get such results we must have started with some constraints, and we've conspicuously placed no constraints at all on O, not even computability.  It's worth asking what our constraints were, that led to the result.

For one thing, we have required truth to agree with the reasoning in the diagonalization argument of our proof of the Theorem.  Interestingly, this makes clear that the oracle itself is not our notion of truth, for we didn't require the oracle to respect the reasoning in our diagonalization; rather, with no constraints on how the oracle derives its answers from the questions asked of it, we proved a limitation on what those answers could mean.  The O-machine L, together with our reasoning about it, became our I Ching, the means by which we mapped the oracle's behavior into the realm of truth.

The reasoning in the diagonalization isn't all we introduced; we also introduced... what?  The requirement that our djinn/deity/whatever-it-is feed its answers to a mechanical device.  The requirement that it base its answers only on what question the mechanical device asked of it.  Ultimately, the requirement that the truth be about discrete inputs and outputs and be expressed as discrete propositions.

The discreteness of the mechanical device makes it possible to enumerate, and therefore diagonalize.  The discreteness of the question/answer interactions with the oracle is apparently a corollary to that.  The requirement that the oracle base its answer only on the question asked... is thought-provoking, since modern physics leans toward the concept of entanglement.  Heisenberg's uncertainty principle already came up once above.  One might wonder if truth ought to be reimagined in some way that makes it inherently part of the system in which it exists, rather than something separate; but here, the discrete separations — between machine and oracle, between subject and truth — seem a rather natural complement to the discreteness of the machine.

The common theme of discreteness is reminiscent of the importance attached to "chunking" of information in my earlier post on sapience and language.  Moreover, the relationship between discreteness and continuity seems to be cropping up in several of my current major avenues of investigation — Gödel, linguistics, sapience, fundamental physics — and I find myself suspecting we lack some crucial insight into this relationship — hopefully, an insight with a very high profundity index, because the time required to acquire the insight, for the numerator of the index, appears to be measured in millennia, so in order for us to grok it once found we should hope for the denominator to be a lot smaller.  However, I've no immediate thoughts in this direction.  In the current case, it's more than a little mind-bending to try to construct a notion of truth that's inherently not separable into discrete propositions; indeed, one might wonder if the consequences of the attempt could be rather Lovecraftian.  So for the nonce I'll keep looking for profundity around Gödel's results without abandoning the concept of a proposition.

Gödel's Second Theorem

Gödel's Second Theorem says that if a sufficiently powerful formal system L can prove itself consistent, then L is not consistent.

Where Gödel's First Theorem was, as I interpret it, mainly about the relationship between L and truth, his Second Theorem is much more concerned with the specific behavior of L.  I covered the "sufficiently powerful" clause in the First Theorem mostly by completeness, i.e., by requiring that L confirm everything we know is true.  The Second Theorem isn't about completeness, though, so we need something else.  Just as we have already required truth to agree with our diagonalization argument, we'll now separately require L to agree with the diagonalization argument too.  And, we don't want to require any controversial behaviors from L, like the Law of the Excluded Middle or reductio ad absurdum, since they would compromise the generality of our result.  Here's a set of relatively tame specific behaviors.  Write [T:io] and [T:io] for the canonical representations, recognized by L, of the propositions that machine T on input i does or does not produce output o.

• [a] If T(i)=o, then L confirms [T:io].

• There is a canonical way, recognizable by L, to set up a machine (ST) that combines two other machines S and T, by running each of them on its input, requiring them both to produce the same output, and producing that output itself.  We require L to do some reasoning about these machines:

• [b] L confirms [(TT):io] iff it confirms [T:io].
• [c] If L confirms [(ST):io], then it confirms [(TS):io].

• There is a canonical way, recognizable by L, to set up a machine (ST) that combines two other machines S and T, by running T on its input, then running S on the output from T, and producing the output from S.  We require L to do some reasoning about these machines:

• [d] If T(i)=v, then L confirms [(ST):io] iff it confirms [S:vo].
• [e] If S1(i)=S2(i)=v, then L confirms [((RS1)∧T):io] iff it confirms [((RS2)∧T):io].
• [f] L confirms [(((QR)∘S)∧T):io] iff it confirms [((Q∘(RS))∧T):io].
• [g] L confirms [((S1T)∧(S2T)):io] iff it confirms [((S1S2)∘T):io].

• There is a canonical way, recognizable by L, to set up a machine (To) that combines a machine T with an output o, by constructing a proposition that T on the given input produces output o.  That is, (To)(i)=[T:io].  We require L to do some reasoning about these machines:

• [h] L confirms [(ST):iconfirm] iff it confirms [((L∘(Sconfirm))∧T):iconfirm].

For us, several of these deductions would likely be special cases of more general rules, and maybe they are for L; we only require that L reach these conclusions by some means.

In order for L to prove its own consistency, we need to define consistency as an internal, checkable feature of L's behavior — rather than by the external feature that L says nothing false according to the external notion of truth.  Leading candidates are

• L does not confirm any antinomy; that is, there is no proposition such that L confirms both it and its negation.
• There is some proposition that L does not confirm.
The second of these was widely favored in the early twentieth century:  inconsistency means confirming all propositions whatever.  Given reductio ad absurdum and the Law of the Excluded Middle, the first implies the second, thus:  Starting with a known proof of an antinomy, consider any proposition q.  Supposing not-q, we can derive an antinomy (because we could have derived it even if we hadn't supposed not-q), therefore, by reductio ad absurdum, the supposition must be false; that is, not-q is false.  But then, by the Law of the Excluded Middle, since q definitely isn't false, it must be true.  So every proposition can be proven.

I'll use the first of these two internal notions of consistency, because it is the weaker of the two, so that any theorem we derive from it will be a stronger result.  Construct machines Tyes and Tno that, given a machine and an input/output pair, output propositions asserting the machine does and does not have that input/output:  Tyes(T,i,o)=[T:io], Tno(T,i,o)=[T:io].  As self-proof of consistency, require that L confirm, for every possible input i, proposition [((LTyes)∧(LTno)):iconfirm].

Start out just as in the diagonalization for the First Theorem.  Imagine a table where the column labels are all possible inputs, the row labels are all possible machines, and each entry contains a "yes" if L confirms that machine on that input outputs confirm, a "no" if L confirms that machine on that input doesn't output confirm.  Construct a machine A as follows.  First, construct A0 that, on the mth input, outputs the machine/input/output needed to construct a proposition that the mth machine on the mth input does or does not produce output confirm.  (Remember, A0 can do this by counting the column labels till it finds the input, then counting the row labels to find the corresponding machine.)  Then just let A=(L∘(TnoA0)).  Let n be the row number of the row labeled by A.  We're interested in the behavior of A on the nth input; but this time, instead of just considering what is true about this behavior, we're interested in what L confirms about it.

Call the nth input i.  We've got all our dominoes lined up; now watch them fall.

 By construction of A0, A0(i)=(A,i,confirm). By construction of Tyes, Tyes(A,i,confirm)=[A:i⇒confirm]. By construction of Tno, Tno(A,i,confirm)=[A:i⇏confirm]. By consistency self-proof, L confirms [((L∘Tyes)∧(L∘Tno)):(A,i,confirm)⇏confirm]. By [d], L confirms [(((L∘Tyes)∧(L∘Tno))∘A0):i⇏confirm]. By [g], L confirms [(((L∘Tyes)∘A0)∧((L∘Tno)∘A0)):i⇏confirm]. By [f] and [c], L confirms [((L∘(Tyes∘A0))∧(L∘(Tno∘A0))):i⇏confirm]. By definition of A, L confirms [((L∘(Tyes∘A0))∧A):i⇏confirm]. By [e], L confirms [((L∘(A⇒confirm))∧A):i⇏confirm]. By [h], L confirms [(A∧A):i⇏confirm]. By [b], L confirms [A:i⇏confirm]. By [a] and definition of A, L confirms [A:i⇒confirm].
For a bonus, by [a], L also proves itself inconsistent by confirming [((LTyes)∧(LTno)):iconfirm].

I don't think this result has much to do with the specific assumptions [b]–[h] about the behavior of L; going through the proof leaves me impressed by how relatively innocuous those assumptions were.  Which, to my mind, is an insight well worth the exercise of going through the proof.

Turing's dissertation

I've mentioned Turing's American doctoral dissertation several times.  (American because he already had sufficient academic credentials in Europe.)

Since Gödel had shown a formal system can't prove itself consistent, it was then of interest to ask how much more than a given formal system would be needed to prove it consistent.  Gerhard Gentzen produced some interesting results of this sort, exploring the formal consequences of postulating restricted forms of mathematical induction (before he was arrested by the Germans late in World War II, transferred to the custody of the Soviets, and starved to death in a Soviet POW camp).  Turing's dissertation explored another approach:  when considering a formal system L, simply construct a new system L1 that adds to L a postulate saying L is consistent.  Naturally you can't add that postulate to L since it would (presuming sufficient power) cause L to become inconsistent if wasn't already; but if L actually was consistent to start with, L1 should be consistent too since L1 can't prove its own consistency, only that of L.  To prove the consistency of L1, you can construct an L2 that adds to L1 a postulate saying L1 is consistent.  And so on.  In fact, Turing didn't even stop with Lk for every positive integer k; he supposed a family of formal systems La for any a representing an ordinal.  Ordinals are (broadly) the sizes of well-ordered sets; there are infinitely many countably infinite ordinals; there are uncountably infinite ordinals.  The only restriction for Turing was that since all this was being done with formal systems, the ordinals α had to be finitely represented.

Turing made the — imho, eminently reasonable — suggestion that the value of this technique lies in explicitly identifying where additional assumptions are being introduced.  The additional assumptions themselves, consistency of all the individual ordinal logics in the system, are justified only by fiat, so nothing has been apparently contributed toward reaching truth through computation; the axioms weren't reached through computation, merely acknowledged in computation.  Interestingly, nothing has been contributed toward getting around Gödel's Theorems, either:  in terms of the framework set up in this post, the entire system of logics can be handled by a single formal system that is itself still subject to the Theorems.

I set out to simplify the situation, pruning unnecessary complications in order to better understand the essence of Gödel's results.  It seems an encouraging sign for the success of that agenda, that Turing's authentically interesting but elaborate dissertation has little to say about my treatment — since this means the complexities addressed by his approach have been pruned.

Where do we go from here?

Types.  The above suggests correctness proof should be separated from detailed program syntax.  As I've remarked elsewhere, it should be possible to maintain propositions and theorems as first-class objects, so that proof becomes the computational act of constructing a theorem object — if one can determine the appropriate rules of deduction, which makes this concern dependent on choosing the appropriate notion of proof.

Truth.  Gödel's First Theorem, as treated here, says that truth about the behavior of a formal logic must conform to a certain constraint on the shape of that truth.  The impression I've picked up, in constructing this post, is that this constraint should not be particularly troubling.  A Euclidean triangle can't have more than one obtuse interior angle; and the truth about a formal logic can't include both completeness and consistency.  I've been progressively mellowing about this since the moment I started thinking of a formal system as a filter on truth — an insight preserved above at the comparison to the I Ching.

Proof.  Gödel's Second Theorem, as treated here, not only places a notably small constraint on proof (as the First Theorem does on truth), but does so in a way notably disengaged from controversial rules of deduction.  The insight I take from this is that, in seeking a robust notion of proof, Gödel's Theorems aren't all that important.  The great paradigm crisis of the early twentieth century (in the admittedly rather specialized area of mathematical foundations) was about antinomies implied by the axioms, not self-proofs of consistency.  The self-proofs of consistency were just an envisioned possible countermeasure, and when Gödel showed the countermeasure couldn't work, the form of his Second Theorem, together with the lines of research people had already been pursuing, led them to the red herring of an infinite hierarchy of successively more powerful systems each able to show the consistency of those below it.  Whatever interesting results may accompany the infinite hierarchy, I now suspect they have nothing much to say about a robust maximum of proof.  Turing's ordinal logics — in essence an explicit manifestation of the infinite hierarchy — were, remember, about assuming consistency, not establishing it.

So it seems to me the problem of interest — even in my earlier post on bypassing no-go theorems, where Gödel's results served as the rallying point for my explorations — is how to fix the axioms.  In that regard, I have a thought, which clearly calls for a separate post, but I'll offer it here to mull over.

I speculated in the no-go theorems post that where the proof of Russell's Paradox ends after a few steps, leaving a foundational problem, an analogous recursive predicate in Lisp simply fails to terminate, which might or might not be considered a bug in the program but doesn't have the broad foundational import of the Paradox.  If a person reasoned about whether or not the set A of all sets that do not contain themselves contains itself, they might say, "If A does not contain itself, then by definition it does contain itself; but then, since it does contain itself, by definition it does not contain itself; but then, since it does not contain itself, by definition it does contain itself..." — and the person saying this quickly sees that the reasoning is going 'round in circles.  This is much like

`(\$define! A (\$lambda (P) (not? (P P))))(A A)`
except that the Lisp interpreter running this code probably doesn't deduce, short of a memory fault or the like, that the computation won't halt, whereas the human quickly deduces the non-halting.  The formal proof, however, does not go 'round in circles.  It says, "Suppose A does not contain itself.  By definition, since A does not contain itself, A does contain itself. That's a contradiction, therefore by reductio ad absurdum, A does not not contain itself.  By the Law of the Excluded Middle, since A does not not contain itself, A does contain itself.  By definition, since A does contain itself, A does not contain itself.  That's an antinomy."  Why did this not go 'round in circles?  Because the initial premise, "Suppose A does not contain itself", was discharged when reductio ad absurdum was applied.  The human reasoner, though, never forgot about the initial assumption.

This isn't quite the same as a conditional proof, in which you start with an assumption P, show that this assumption leads to a consequence Q, and then conclude from this that P⇒Q regardless of whether or not P holds; the assumption has been discharged, but it lingers embedded in the conclusion.  It only appears to have discharged the assumption because we have a notion of implication that neatly absorbs the memory of our having assumed P in order to prove Q.  Really, all proofs are conditional in the sense that they only hold if we accept the rules of deduction by which we reached them; but we can't absorb that into our propositions using logical implication.  We could still preface all of our propositions with "suppose that such-and-such laws of deduction are valid"; but when writing it formally we'd need a different notation because it's not the same thing as logical implication.  We might be tempted to use something akin to a turnstile, which is (in my experience) strictly a meta-level notation — but in the case of reductio ad absurdum, it seems we don't want a meta-level notation.  We want to qualify our propositions by the memory of other propositions we've rejected because they led to contradictions.

I'd expect to explore what this does to paradoxes (not just Russell's); I'm skeptical that in itself it would eliminate all the classical paradoxes.  I do rather like the idea that the classical paradoxes are all caused by a single flaw, but I suspect the single flaw isn't in a single axiom; rather, it may be a single systemic flaw.  It seems that, in some sense, not discharging the assumption in reductio ad absurdum is a way to detect impredicativity, the same source of paradoxes type theory was meant to eliminate.

I look forward to exploring that.  Along with my dozen-or-so other draft post topics.

#### 6 comments:

1. This description of Godel's incompleteness theorem is similar to other traditional accounts in that it misses an important fact: according to Godel's *completeness* theorem (a result much more fundamental than the celebrated incompleteness one) a sentence is provable iff it is valid in all models. This means that the "Godel's sentence", which is unprovable in PA, is necessary invalid in some versions of natural numbers.

Of course, up to this day, there is no shortage of people who believe that there is only one true set of natural numbers (sanctioned by the Pope and Congress, presumably) and that their Turing machines compute functions on this sacred object.

2. This comment has been removed by the author.

3. This comes from a discussion at Lambda the ultimate, answering this comment -

In your rendition of Gödel's incompleteness theorem, you speak about a Turing machine L and another A.

the machine A must obviously contain L.
No problem with that. Let's note it with A(L)

L is about implementation, in the sense that to be able to, given some machine f, determine that it produces some output, either you run the machine, or you already have within L all possible results of all possible machines, which is crazy.

So L corresponds with a given way to encode Turing machines and run them.

There are many, many ways of doing this (Encoding and running Turing machines). Let's note this by adding an index to L, L^i. It's totally harmless, and we can still have L^1, that is exactly and simply L, by
definition here and now.

However now we also have different A(L^i), and also A(L^1).

In your text, we may say that you encode A(L^1) in L^1 and feed it to A(L^1).
And this produces a contradiction.

But using indexes allows us to wonder:

What happens if we encode the A(L^1) in L^2 and feed it to the A(L^2) machine?
It will find no trouble at all, no contradiction.
And feeding it to A(L^2) amounts to nothing but running it;
So, running the Turing machine A(L^1) in different environments leads to different
results. The results are implementation dependent.

Expressing the same Turing machine, A(L^1), in 2 different encodings, L^1 and L^2, leads to different results.

1.- http://lambda-the-ultimate.org/node/5425#comment-94109

1. Of course, the argument is identical if we consider that L* i is not contained in A but is part of its input. The same machine, A, fed the same input, A + L^ 1, gives different results in different L^i.

4. Question about diagonalization: How do you prove the diagonal does not occur as a single 'line' entry in the table? Surely it could occur? For example a table where every line is 123456789, would have a diagonal that is 123456789?

1. For the diagonalization method, what's important about the diagonal is that for every row of the table, there is *at least* one entry in that row that's identical to the diagonal: for the nth row, its nth entry is identical to the nth entry on the diagonal (that's how we define what we mean by "the diagonal"). It doesn't matter if some row also has other entries in common with the diagonal, as long as it has at least one. If we can construct a row that differs in every entry from the diagonal, it cannot be the nth row in the table for *any n*. For a typical diagonalization proof that the halting problem is undecidable, there is in fact some row of the table that's identical to the diagonal, because it's possible to construct a Turing machine that for the nth input generates the nth entry of the diagonal, and since that Turing machine exists it must appear as a row of the table.