Structural insight: Interpreted programming languages

Monday, August 29, 2016

Interpreted programming languages

Last night I drifted off while reading a Lisp book.
— xkcd 224.

It's finally registered on me that much of the past half century of misunderstandings and confusions about the semantics of Lisp, of quotation, and, yes, of fexprs, can be accounted for by failure to recognize there is a theoretical difference between an interpreted programming language and a compiled programming language. If we fail to take this difference into account, our mathematical technology for studying compiled programming languages will fail when applied to interpreted languages, leading to loss of coherence in language designs and a tendency to blame the language rather than the theory.

Technically, a compiler translates the program into an executable form to be run thereafter, while an interpreter figures out what the program says to do and just does it immediately. Compilation allows higher-performance execution, because the compiler takes care of reasoning about source-code before execution, usually including how to optimize the translation for whatever resources are prioritized (time, space, peripherals). It's easy to suppose this is all there is to it; what the computer does is an obvious focus for attention. One might then suppose that interpretation is a sort of cut-rate alternative to compilation, a quick-and-dirty way to implement a language if you don't care about performance. I think this misses some crucial point about interpretation, some insight to be found not in the computer, but in the mind of the programmer. I don't understand that crucial insight clearly enough — yet — to focus a whole blog post on it; but meanwhile, there's this theoretical distinction between the two strategies which also isn't to be found in the computer's-eye view, and which I do understand enough about to focus this blog post on.

It's not safe to say the language is the same regardless of which way it's processed, because the language design and the processing strategy aren't really independent. In principle a given language might be processed either way; but the two strategies provide different conceptual frameworks for thinking about the language, lending themselves to different language design choices, with different purposes — for which different mathematical properties are of interest and different mathematical techniques are effective. This is a situation where formal treatment has to be considered in the context of intent. (I previously blogged about another case where this happens, formal grammars and term-rewriting calculi, which are both term-rewriting systems but have nearly diametrically opposite purposes; over t‍ha‍r.)

I was set onto the topic of this post recently by some questions I was asked about Robert Muller's M-Lisp. My dissertation mentions Muller's work only lightly, because Muller's work and mine are so far apart. However, because they seem to start from the same place yet lead in such different directions, one naturally wants to know why, and I've struck on a way to make sense of it: starting from the ink-blot of Lisp, Muller and I both looked to find a nearby design with greater elegance — and we ended up with vastly different languages because Muller's search space was shaped by a conceptual framework of compilation while mine was shaped by a conceptual framework of interpretation. I will point out, below, where his path and mine part company, and remark briefly on how this divergence affected his destination.

The mathematical technology involved here, I looked at from a lower-level perspective in an earlier post. It turns out, from a higher-level perspective, that the technology itself can be used for both kinds of languages, but certain details in the way it is usually applied only work with compiled languages and, when applied to interpreted languages, result in the trivialization of theory noted by Wand's classic paper, "The Theory of Fexprs is Trivial".

Contents
M-language
Compilation
Universal program
S-language
Half a century's worth of misunderstandings and confusions

M-language

I'll explore this through a toy programming language which I'll then modify, starting with something moderately similar to what McCarthy originally described for Lisp (before it got unexpectedly implemented).

This is a compiled programming language, without imperative features, similar to λ-calculus, for manipulating data structures that are nested trees of atomic symbols. The syntax of this language has two kinds of expressions: S-expressions (S for Symbolic), which don't specify any computation but merely specify constant data structures — trees of atomic symbols — and M-expressions (M for Meta), which specify computations that manipulate these data structures.

S-expressions, the constants of the language, take five forms:

S ::= s | () | #t | #f | (S . S) .

That is, an S-expression is either a symbol, an empty list, true, false, or a pair (whose elements are called, following Lisp tradition, its car and cdr). A symbol name is a sequence of one or more upper-case letters. (There should be no need, given the light usage in this post, for any elaborate convention to clarify the difference between a symbol name and a nonterminal such as S.)

I'll assume the usual shorthand for lists, (S ...) ≡ (S . (...)), so for example (FOO BAR QUUX) ≡ (FOO . (BAR . (QUUX . ()))); but I won't complicate the formal grammar with this shorthand since it doesn't impact the abstract syntax.

The forms of M-expressions start out looking exactly like λ-calculus, then add on several other compound forms and, of course, S-expressions which are constants:

M ::= x | [λx.M] | [M M] | [if M M M] | [car M] | [cdr M] | [cons M M] | [eq? M M] | S .

The first form is a variable, represented here by nonterminal x. A variable name will be a sequence of one or more lower-case letters. Upper- versus lower-case letters is how McCarthy distinguished between symbols and variables in his original description of Lisp.

Variables, unlike symbols, are not constants; rather, variables are part of the computational infrastructure of the language, and any variable might stand for an arbitrary computation M.

The second form constructs a unary function, via λ; the third applies a unary function to a single argument. if expects its first argument to be boolean, and returns its second if the first is true, third if the first is false. car and cdr expect their argument to be a pair and extract its first and second element respectively. cons constructs a pair. eq? expects its two arguments to be S-expressions, and returns #t if they're identical, #f if they're different.

Compound unary functions, constructed by λ, are almost first-class. They can be returned as the values of expressions, and they can be assigned to variables; but as the grammar is set out, they cannot appear as elements of data structures. A pair expression is built up from two S-expressions, and a compound unary function is not an S-expression. McCarthy's original description of Lisp defines S-expressions this way; his stated purpose was only to manipulate trees of symbols. Trained as I am in the much later Lisp culture of first-class functions and minimal constraints, it felt unnatural to follow this narrower definition of S-expressions; but in the end I had to define it so. I'm trying to reproduce the essential factors that caused Lisp to come out the way it did, and, strange to tell, everything might have come out differently if not for the curtailed definition of S-expressions.

(One might ask, couldn't we indirectly construct a pair with a function in it using cons? A pair with a function in it would thus be a term that can't be represented directly by a source expression. This point likely doesn't matter directly to how things turned out; but fwiw, I suspect McCarthy didn't have in mind to allow that, no. It's entirely possible he also hadn't really thought about it yet at the preliminary stage of design where this detail had its impact on the future of Lisp. It's the sort of design issue one often discovers by playing around with a prototype — and in this case, playing around with a prototype is how things got out of hand; more on that later.)

Compilation

Somehow we want to specify the meanings of programs in our programming language. Over the decades, a number of techniques for formal PL semantics have been entertained. One of the first things tried was to set up a term-rewriting machine modeled roughly on λ-calculus, that would perform small finite steps until it reached a final state; that was Peter Landin's notion, when Christopher Strachey set him onto the problem in the 1960s. It took some years to get the kinks out of that approach, and meanwhile other techniques were tried — such as denotational semantics, meta-circular evaluation — but setting up a term-rewriting calculus has been quite a popular technique since the major technical obstacles to it were overcome. Of the three major computational models tied together in the 1930s by the Church-Turing thesis, two of them were based on term-rewriting: Turing machines, which were the convincing model, the one that lent additional credibility to the others when they were all proven equivalent; and λ-calculus, which had mathematical elegance that the nuts-and-bolts Turing-machine model lacked. The modern "small-step" rewriting approach to semantics (as opposed to "big-step", where one deduces how to go in a single step from start to finish) does a credible job of combining the strengths of Turing-machines and λ-calculus; I've a preference for it myself, and it's the strategy I'll use here. I described the technique in somewhat more depth in my previous post on this material.

Small-step semantics applies easily to this toy language because every intermediate computational state of the system is naturally represented by a source-code expression. That is, there is no obvious need to go beyond the source-code grammar we've already written. Some features that have been omitted from the toy language would make it more difficult to limit all computational states to source expressions; generally these would be "stateful" features, such as input/output or a mutable store. Landin used a rewriting system whose terms (computational states) were not source-code expressions. One might ask whether there are any language features that would make it impossible to limit computational states to source expressions, and the answer is essentially yes, there are — features related not to statefulness, but to interpretation. For now, though, we'll assume that all terms are source expressions.

We can define the semantics of the language in reasonably few rewriting schemata.

[[λx.M₁] M₂] → M₁[x ← M₂]
[if #t M₁ M₂] → M₁
[if #f M₁ M₂] → M₂
[car (S₁ . S₂)] → S₁
[cdr (S₁ . S₂)] → S₂
[cons S₁ S₂] → (S₁ . S₂)
[eq? S S] → #t
[eq? S₁ S₂] → #f if the S_k are different.

Rewriting relation → is the compatible closure of these schemata; that is, for any context C and terms M_k, if M₁ → M₂ then C[M₁] → C[M₂]. Relation → is also Church-Rosser: although a given term M may be rewritable in more than one way, any resulting difference can always be eliminated by later rewriting. That is, the reflexive transitive closure →^* has the diamond property: if M₁ →^* M₂ and M₁ →^* M₃, then there exists M₄ such that M₂ →^* M₄ and M₃ →^* M₄.

Formal equality of terms, =, is the symmetric closure of →^* (thus, the reflexive symmetric transitive compatible closure of the schemata, which is to say, the least congruence containing the schemata).

Another important relation is operational equivalence, ≅. Two terms are operationally equivalent just if replacing either by the other in any possible context preserves the observable result of the computation. M₁ ≅ M₂ iff for every context C and S-expression S, C[M₁] ↦* S iff C[M₂] ↦* S. (Fwiw, relation ↦ is what the computation actually does, versus → which is anything the rewriting calculus could do; → is compatible Church-Rosser and therefore nice to work with mathematically, but ↦ is deterministic and therefore we can be sure it does what we meant it to. Another way of putting it is that → has the mathematical character of λ-calculus while ↦ has the practical character of Turing-machines.)

Operational equivalence is exactly what must be guaranteed in order for an optimizing compiler to safely perform a local source-to-source transformation: as long as the two terms are operationally equivalent, the compiler can replace one with the other in any context. The rewriting calculus is operationally sound if formal equality implies operational equivalence; then the rewriting calculus can supply proofs of operational equivalence for use in optimizing compilation.

Before moving on, two other points of interest about operational equivalence:

Operational equivalence of S-expressions is trivial; that is, S₁ and S₂ are operationally equivalent only if they are identical. This follows immediately by plugging the trivial context into the definition of operational equivalence, C[S_k] ≡ S_k. Thus, in every non-trivial operational equivalence M₁ ≅ M₂, at least one of the M_k is not an S-expression.
All terms in the calculus — all M-expressions — are source expressions; but if there were any terms in the calculus that were not source expressions, they would be irrelevant to a source-to-source optimizer; however, if these non-source terms could be usefully understood as terms in an intermediate language used by the compiler, an optimizer might still be able to make use of them and their formal equalities.

Universal program

McCarthy's Lisp language was still in its infancy when the project took an uncontrollable turn in a radically different direction than McCarthy had envisioned going with it. Here's what happened.

A standard exercise in theory of computation is to construct a universal Turing machine, which can take as input an encoding of an arbitrary Turing machine T and an input w to T, and simulate what T would do given input w. This is an extremely tedious exercise; the input to a Turing machine looks nothing like the control device of a Turing machine, so the encoding is highly intrusive, and the control device of the universal machine is something of an unholy mess. McCarthy set out to lend his new programming language mathematical street cred by showing that not only could it simulate itself with a universal program, but the encoding would be much more lucid and the logic simpler in contrast to the unholy mess of the universal Turing machine.

The first step of this plan was to describe an encoding of programs in the form of data structures that could be used as input to a program. That is to say, an encoding of M-expressions as S-expressions. Much of this is a very straightforward homomorphism, recursively mapping the non-data M-expression structure onto corresponding S-expressions; for our toy language, encoding φ would have

φ(x) = symbol s formed by changing the letters of its name from lower-case to upper-case. Thus, φ(foo) = FOO.
φ([λx.M]) = (LAMBDA φ(x) φ(M)).
φ([M₁ M₂]) = (φ(M₁) φ(M₂)).
φ([if M₁ M₂ M₃]) = (IF φ(M₁) φ(M₂) φ(M₃)).
φ([car M]) = (CAR φ(M)).
φ([cdr M]) = (CDR φ(M)).
φ([cons M₁ M₂]) = (CONS φ(M₁) φ(M₂)).
φ([eq? M₁ M₂]) = (EQ φ(M₁) φ(M₂)).

(This encoding ignores the small detail that certain symbol names used in the encoding — LAMBDA, IF, CAR, CDR, CONS, EQ — must not also be used as variable names, if the encoding is to behave correctly. McCarthy seems not to have been fussed about this detail, and nor should we be.)

For a proper encoding, though, S-expressions have to be encoded in a way that unambiguously distinguishes them from the encodings of other M-expressions. McCarthy's solution was

φ(S) = (QUOTE S).

Now, in some ways this is quite a good solution. It has the virtue of simplicity, cutting the Gordian knot. It preserves the readability of the encoded S-expression, which supports McCarthy's desire for a lucid encoding. The main objection one could raise is that it isn't homomorphic; that is, φ((S₁ . S₂)) is not built up from φ(S₁) and φ(S₂).

As McCarthy later recounted, they had expected to have plenty of time to refine the language design before it could be implemented. (The FORTRAN compiler, after all, had been a massive undertaking.) Meanwhile, to experiment with the language they began hand-implementing particular functions. The flaw in this plan was that, because McCarthy had been so successful in demonstrating a universal Lisp function eval with simple logic, it wasn't difficult to hand-implement eval; and, because he had been so successful in making the encoding lucid, this instantly produced a highly usable Lisp interpreter. The sudden implementation precipitated a user community and substantial commitment to specifics of what had been a preliminary language design.

All this might have turned out differently if the preliminary design had allowed first-class functions to be elements in pairs. A function has to be encoded, homomorphically, which would require a homomorphic encoding of pairs, perhaps φ((M₁ . M₂)) = (CONS φ(M₁) φ(M₂)); once we allow arbitrary M-expressions within the pair syntax, (M₁ . M₂), that syntax itself becomes a pair constructor and there's really no need for a separate cons operator in the M-language; then CONS can encode the one constructor. One might then reasonably restrict QUOTE to base cases; more, self-encode () #t and #f, leaving only the case of encoding symbols, and rename QUOTE to SYMBOL. The encoding would then be fully homomorphic — but the encodings of large constant data structures would become unreadable. For example, fairly readable constant structure

((LAMBDA X (X Y)) FOO)

would encode through φ as

(CONS (CONS (SYMBOL LAMBDA) (CONS (SYMBOL X) (CONS (CONS (SYMBOL X) (CONS (SYMBOL Y) ())) ()))) (CONS (SYMBOL FOO) ())) .

That didn't happen, though.

The homomorphic, non-QUOTE encoding would naturally tend to produce a universal function with no practical potential for meta-programming. In theory, one could use the non-homomorphic QUOTE encoding and still not offer any native meta-programming power. However, the QUOTE-based encoding means there are data structures lying around that look exactly like working executable code except that they happen to be QUOTE‍d. In practice, the psychology of the notation makes it rather inevitable that various facilities in the language would allow a blurring of the line between data and code. Lisp, I was told when first taught the language in the 1980s, treats code as data. Sic: I was not told Lisp treats data as code, but that it treats code as data.

In other words, Lisp had accidentally become an interpreted language; a profoundly different beast than the compiled language McCarthy had set out to create, and one whose character naturally suggests a whole different set of features that would not have occurred to someone designing a compiled language in 1960. There were, of course, some blunders along the way (dynamic scope is an especially famous one, and I would rate the abandonment of fexprs in favor of macros as another of similar magnitude); but in retrospect I see all that as part of exploring a whole new design space of interpreted features. Except that over the past three decades or so the Lisp community seems to have somewhat lost track of its interpreted roots; but, I'll get back to that.

Of interest:

In S-expression Lisp, all source expressions are S-expressions. It is no less true now than before that an operational equivalence M₁ ≅ M₂ can only be nontrivial if at least one of the M_k is not an S-expression; but now, if the M_k are source expressions, we can be absolutely certain that they are both S-expressions. So if the operational equivalence relation is restricted to source expressions, it's trivial. This isn't disastrous; it just means that, in order to have nontrivial theory, we are going to have to have some terms that are not source expressions (as Landin did, though for a different reason); and if we choose to compile the language, we won't be allowed to call our optimizations "local source-to-source" (any given optimization could be one or the other, but not both at once).
This is the fork in the road where Muller and I went our separate ways. Muller's M-Lisp, taking the compiled-language view, supposes that S-expressions are encoded homomorphically, resulting in a baseline language with no native meta-programming power. He then considers how to add some meta-programming power to the resulting language. However, practical meta-programming requires the programmer be able to write lucid code that can also be manipulated as data; and the homomorphic encoding isn't lucid. So instead, meta-programming in Muller's extended M-Lisp uses general M-expressions directly (rather than their encodings). If an M-expression turns out to be wanted as data, it then gets belatedly encoded — with the drawback that the M-expression can't be rewritten until the rewriting schemata can tell it won't be needed as data. This causes difficulties with operational equivalence of general M-expressions; in effect, as the burden of meta-programming is shifted from S-expressions to general M-expressions, it carries along with it the operational-equivalence difficulties that had been limited to S-expressions.

S-language

McCarthy hadn't finished the details of M-expressions, so S-expression Lisp wasn't altogether an encoding of anything; it was itself, leaving its implementors rather free to invent it as they went along. Blurring the boundary between quoted data and unquoted code provided meta-programming facilities that hadn't been available in compiled languages (essentially, I suggest, a sort of flexibility we enjoy in natural languages). In addition to QUOTE itself (which has a rather fraught history entangled with first-class functions and dynamic scope, cf. §3.3.2 of my dissertation), from very early on the language had fexprs, which are like LAMBDA-constructed functions except that they treat their operands as data — as if the operands had been QUOTE‍d — rather than evaluating them as code (which may later be done, if desired, explicitly by the fexpr using EVAL). In 1963, macros were added — not the mere template-substitution macros found in various other languages, but macros that treat their operands as data and perform an arbitrary computation to generate a data structure as output, which is then interpreted as code at the point of call.

But how exactly do we specify the meanings of programs in this interpreted S-expression language? We could resort to the meta-circular evaluator technique; this is a pretty natural strategy since an evaluator is exactly what we have as our primary definition of the language. That approach, though, is difficult to work with mathematically, and in particular doesn't lend itself to proofs of operational equivalence. If we try to construct a rewriting system the way we did before, we immediately run into the glaring practical problem that the same representation is used for executable code, which we want to have nontrivial theory, and passive data which necessarily has perfectly trivial theory. That is, as noted earlier, all source expressions are S-expressions and operational equivalence of S-expressions is trivial. It's possible to elaborate in vivid detail the theoretical train wreck that results from naive application of the usual rewriting semantics strategy to Lisp with quotation (or, worse, fexprs); but this seems to be mainly of interest if one is trying to prove that something can't be done. I'm interested in what can be done.

If what you want is nontrivial theory, that could in principle be used to guide optimizations, this is not difficult to arrange (once you know how; cf. my past discussion of profundity index). As mentioned, all nontrivial operational equivalences must have at least one of the two terms not a source expression (S-expression), therefore we need some terms that aren't source expressions; and our particular difficulty here is having no way to mark a source expression unmistakably as code; so, introduce a primitive context that says "evaluate this source expression". The new context only helps with operational equivalence if it's immune to QUOTE, and no source expression is immune to QUOTE, so that's yet another way to see that the new context must form a term that isn't a source expression.

Whereas the syntax of the compiled M-language had two kinds of terms — constant S-expressions and computational M-expressions — the syntax of the interpreted S-language will have three kinds of terms. There are, again, the "constant" terms, the S-expressions, which are now exactly the source expressions. There are the "computational" terms, which are needed for the actual work of computation; these are collectively shaped something like λ-calculus. We expect a big part of the equational strength of the rewriting calculus to reside in these computational terms, roughly the same equational strength as λ-calculus itself, and therefore of course those terms have to be entirely separate from the source expressions which can't have nontrivial equational theory. And then there are the "interpretation" terms, the ones that orchestrate the gradual conversion of source expressions into computational expressions. The code-marking terms are of this sort. The rewriting rules involving these "interpretation" terms will amount to an algorithm for interpreting source code.

This neat division of terms into three groups won't really be as crisp as I've just made it sound. Interpretation is by nature a gradual process whose coordination seeps into other parts of the grammar. Some non-interpretation terms will carry along environment information, in order to make it available for later use. This blurring of boundaries is perhaps another part of the flexibility that (I'll again suggest) makes interpreted languages more similar to natural languages.

I'll use nonterminal T for arbitrary terms. Here are the interpretation forms.

T ::= [eval T T] | ⟨wrap T⟩ | e .

Form [eval T₁ T₂] is a term that stands for evaluating a term T₁ in an environment T₂. This immediately allows us to distinguish between statements such as

S₁ ≅ S₂
[eval S₁ e₀] ≅ [eval S₂ e₀]
∀e, [eval S₁ e] ≅ [eval S₂ e] .

The first proposition is the excessively strong statement that S-expressions S_k are operationally equivalent — interchangeable in any context — which can only be true if the S_k are identically the same S-expression. The second proposition says that evaluating S₁ in environment e₀ is operationally equivalent to evaluating S₂ in environment e₀ — that is, for all contexts C and all S-expressions S, C[[eval S₁ e₀]] ↦* S iff C[[eval S₂ e₀]] ↦* S. The third proposition says that evaluating S₁ in any environment e is operationally equivalent to evaluating S₂ in the same e — which is what we would ordinarily have meant, in a compiled language, if we said that two executable code (as opposed to data) expressions S_k were operationally equivalent.

The second form, ⟨wrap T⟩, is a wrapper placed around a function T, that induces evaluation of the operand passed to the function. If T is used without such a wrapper (and presuming T isn't already a wrapped function), it acts directly on its unevaluated operand — that is, T is a fexpr.

The third form, e, is simply an environment. An environment is a series of symbol-value bindings, ⟪s_k←T_k⟫; there's no need to go into gory detail here (though I did say more in a previous post).

The computational forms are, as mentioned, similar to λ-calculus with some environments carried along.

T ::= x | [combine T T T] | ⟨λx.T⟩ | ⟨εx.T⟩ .

Here we have a variable, a combination, and two kinds of function. Form ⟨λx.T⟩ is a function that substitutes its operand for x in its body T. Variant ⟨εx.T⟩ substitutes its dynamic environment for x in its body T.

Form [combine T₁ T₂ T₃] is a term that stands for combining a function T₁ with an operand T₂ in a dynamic environment T₃. The dynamic environment is the set of bindings in force at the point where the function is called; as opposed to the static environment, the set of bindings in force at the point where the function is constructed. Static environments are built into the bodies of functions by the function constructor, so they don't show up in the grammar. For example, [eval (LAMBDA X FOO) e₀] would evaluate to a function with static environment e₀, of the form ⟨wrap ⟨λx.[eval FOO ⟪...⟫]⟩⟩ with the contents of e₀ embedded somewhere in the ⟪...⟫.

Putting it all together,

T ::= x | [combine T T T] | ⟨λx.T⟩ | ⟨εx.T⟩ | [eval T T] | ⟨wrap T⟩ | e | S .

The rewriting schemata naturally fall into two groups, those for internal computation and those for source-code interpretation. (There are of course no schemata associated with the third group of syntactic forms, the syntactic forms for passive data, because passive.) The computation schemata closely resemble λ-calculus, except with the second form of function used to capture the dynamic environment (which fexprs sometimes need).

[combine ⟨λx.T₁⟩ T₂ T₃] → T₁[x ← T₂]
[combine ⟨εx.T₁⟩ T₂ T₃] → [combine T₁[x ← T₃] T₂ T₃] .

The interpretation schemata look very much like the dispatching logic of a Lisp interpreter.

[eval d T] → d
if d is an empty list, boolean, λ-function, ε-function, or environment
[eval s e] → lookup(s,e) if symbol s is bound in environment e
[eval (T₁ T₂) T₃] → [combine T₁ T₂ T₃]
[combine ⟨wrap T₁⟩ T₂ T₃] → [combine T₁ [eval T₂ T₃] T₃] .

(There would also be some atomic constants representing primitive first-class functions and reserved operators such as if, and schemata specifying what they do.)

Half a century's worth of misunderstandings and confusions

As I remarked earlier, Lisp as we know it might not have happened — at least, not when and where it did — if McCarthy had thought to allow first-class functions to occur in pairs. The thing is, though, I don't think it's all that much of an "accident". He didn't think to allow first-class functions to occur in pairs, and perhaps the reason we're likely to think to allow them today is that our thinking has been shaped by decades of the free-wheeling attitude fostered by the language that Lisp became because he didn't think to then. The actual sequence of events seems less unlikely than one might first suppose.

Researchers trying to set up semantics for Lisp have been led astray, persistently over the decades, by the fact that the primary Lisp constructor of first-class functions is called LAMBDA. Its behavior is not that of calculus λ, exactly because it's entangled with the process of interpreting Lisp source code. This becomes apparent when contemplating rewriting calculi for Lisp of the sort I've constructed above (and have discussed before on this blog): When you evaluate a LAMBDA expression you get a wrapped function, one that explicitly evaluates its operand and then passes the result to a computational function; that is, passes the result to a fexpr. Scan that: ordinary Lisp functions do not correspond directly to calculus λ-abstractions, but fexprs do correspond directly to calculus λ-abstractions. In its relation to Lisp, λ-calculus is a formal calculus of fexprs.

Much consternation has also been devoted to the perceived theoretical difficulty presented by Lisp's quotation operator (and presented in more extreme form by fexprs), because it presents a particular context that can distinguish any two S-expressions placed into it: (QUOTE S₁) and (QUOTE S₂) are observably distinct whenever the S_k are distinct from each other. Yet, this observation only makes sense in a compiled programming language. Back in the day, it would have been an unremarkable observation that Lisp only has syntax for data structures, no syntax at all for control. Two syntactically distinct Lisp source expressions are operationally non-equivalent even without any quotation or fexpr context, because they don't represent programs at all; they're just passive data structures. The context that makes a source expression code rather than data is patently not in the source; it's in what program you send the source expression to. Conventional small-step operational semantics presumes the decision to compile, along with a trivial interpretive mapping between source expressions and internal computational terms (so the interpretive mapping doesn't have to appear explicitly in the rewriting schemata). It is true that, without any such constructs as quotation or fexprs, there would be no reason to treat the language as interpreted rather than compiled; but once you've crossed that Rubicon, the particular constructs like quotation or fexprs are mere fragments of, and can be distractions from, the main theoretical challenge of defining the semantics of an interpreted language.

The evolution of Lisp features has itself been a long process of learning how best to realize the flexibility offered by interpreted language. Fexprs were envisioned just about from the very beginning — 1960 — but were sabotaged by dynamic scope, a misfeature that resulted from early confusion over how to handle symbol bindings in an interpreter. Macros were introduced in 1963, and unlike fexprs they lend themselves to preprocessing at compile time if one chooses to use a compiler; but macros really ought to be much less mathematically elegant... except that in the presence of dynamic scope, fexprs are virulently unstable. Then there was the matter of first-class functions; that's an area where Lisp ought to have had a huge advantage; but first-class functions don't really come into their own without static scope (The Art of the Interpreter noted this) — and first-class functions also present a difficulty for compilers (which is why procedures in ALGOL were second-class). The upshot was that after Lisp 1.5, when Lisp splintered into multiple dialects, first-class functions went into eclipse until they reemerged in the mid-1970s when Scheme introduced static scope into the language. Fexprs held on longer but, ironically, were finally rejected by the Lisp community in 1980 — just a little ahead of the mainstream adoption of Scheme's static-scope innovation. So for the next twenty years and more, Lisp had static scope and first-class functions, but macros and no fexprs. Meanwhile, EVAL — key workhorse of meta-linguistic flexibility — was expunged from the new generation of mainstream Lisps and has had great difficulty getting back in.

The latter half of Lisp's history has been colored by a long-term trend in programming language design as a whole. I've alluded to this several times above. I have no specific sources to suggest for this; it's visible in the broad sweep of what languages were created, what research was done, and I've sensed it though my interactions with the Lisp community over the past thirty years. When I learned Lisp in the mid-1980s, it was from the Revised Maclisp Manual, Saturday Evening Edition (which I can see a few feet away on my bookshelf as I write this, proof that manuals can be well-written). Maclisp was a product of the mentality of the 1970s. Scheme too was a product of that mentality. And what comes through to me now, looking back, isn't the differences between those languages (different though they are), but that those people knew, gut deep, that Lisp is an interpreted language — philosophically, regardless of the technical details of the language processing software. The classic paper I cited above for the relationship between first-class functions and static scope was one of the "lambda" papers associated with the development of Scheme: "The Art of the Interpreter". The classic textbook — the Wizard Book — that emerged from the Scheme design is Structure and Interpretation of Computer Programs.

But then things changed. Compilation had sometimes intruded into Lisp design, yes (with unfortunate results, as I've mentioned), but the intrusion became more systematic. Amongst Scheme's other achievements it had provided improved compilation techniques, a positive development but which also encouraged greater focus on the challenges of compilation. We refined our mathematical technology for language semantics of compiled languages, we devised complex type systems for use with compiled languages, more and more we designed our languages to fit these technologies — and as Lisp didn't fit, more and more we tweaked our Lisp dialects to try to make them fit. Of course some of the indigenous features of Lisp couldn't fit, because the mathematical tools were fundamentally incompatible with them (no pun intended). And somewhere along the line, somehow, we forgot, perhaps not entirely but enough, that Lisp is interpreted. Second-class syntax has lately been treated more and more as if it were a primary part of the language, rather than a distraction from the core design. Whatever merits such languages have, wholeheartedly embracing the interpretation design stance is not among them.

I'm a believer in trying more rather than less. I don't begrudge anyone their opportunity to follow the design path that speaks to them; but not all those paths speak to me. Second-class syntax doesn't speak to me, nor recasting Lisp into a compiled language. I'm interested in compiling Lisp, but want the language design to direct those efforts rather than the other way around. To me, the potential of interpretation beckons; the exciting things we've already found on that path suggest to me there's more to find further along, and the only way to know is to follow the path and see. To do that, it seems to me we have to recognize that it is a distinct path, the distinct philosophy of interpretation; and, in company with that, we need to hone our mathematical technology for interpreted languages.

These are your father's parentheses
Elegant weapons
for a more... civilized age.
— xkcd 297.

49 comments:

gascheSeptember 2, 2016 at 6:15 AM
I enjoyed this blog post and I followed the technical developments, but I wouldn't be able to describe concisely and precisely after reading it what is the difference between interpretation and compilation that you are referring to -- if it is a "theoretical" difference I would expect a concise, formal expression of it.
ReplyDelete
Replies
John CowanSeptember 2, 2016 at 6:10 PM
I think what you are calling compiled vs. interpreted languages are already known by other names. It's common to call them static vs. dynamic languages, though this gets tangled up with the orthogonal question of type: Scheme is a static language with dynamic types, and I was reading a while back about a dynamic language that has statically typed records. Going further back, static languages are fundamentally non-self-modifying languages running on Harvard machines, whereas dynamic languages are self-modifying languages running on Von Neumann machines.
ReplyDelete
Replies
Nate CullSeptember 6, 2016 at 2:01 AM
"To me, the potential of interpretation beckons; the exciting things we've already found on that path suggest to me there's more to find further along, and the only way to know is to follow the path and see."

I wholeheartedly agree. To me, one of the main attractions of interpretation is that it puts the focus back on runtime: this is important because, on the Internet, "it's always runtime somewhere". One machine's code is always another machine's data.

I think Alan Kay understood this deeply with Smalltalk and its view of the 'net as made of small organic objects that can update at any time. And I think this understanding was almost completely lost with C++ and Java and their focus on compilation.

Really fascinating to see some of this history, though I knew a little of it - in particular, that M-expressions used case to distinguish between symbols and variables. Is that where Prolog inherited that particular (unfortunate) quirk from?

But I'm confused by this:

"In S-expression Lisp, all source expressions are S-expressions... if the Mk are source expressions, we can be absolutely certain that they are both S-expressions. So if the operational equivalence relation is restricted to source expressions, it's trivial."

How can we be 'absolutely certain'? Didn't we just establish that in 'S-expression Lisp', the φ-encoding translates S-expressions (from original M-expression Lisp) into (QUOTE) expressions? And M-expressions into non-QUOTE forms of S-expressions?

So we're not now dealing with a language of M-expressions and S-expressions - we're dealing with a language of non-QUOTE-expressions and QUOTE-expressions. So shouldn't the triviality rule in S-expression Lisp actually substitute the words "QUOTE-expression" for "S-expression"? As in:

"Operational equivalence of QUOTE-expressions is trivial; that is, Q1 and Q2 are operationally equivalent only if they are identical."

And isn't that (at least part of) the source of the unjust banning of FEXPRS this whole time?

Forgive me if I'm missing something blindingly obvious here. Why do we need the machinery of 'some terms that are not source-expressions'? Don't the source-expressions encode the M/S difference sufficiently?
ReplyDelete
Replies
Nate CullSeptember 6, 2016 at 2:21 AM
Ah... is the problem that in S-expression Lisp, the entire program can be itself wrapped in a (QUOTE) expression, and so the φ-encoding step would have to be performed, selectively, on all (QUOTE)-expressions that might be fed to an (EVAL)? And that it's hard to figure out in advance (without some machinery like that of vau-calculus) which blocks of data might end up in an EVAL?

"The context that makes a source expression code rather than data is patently not in the source; it's in what program you send the source expression to."

This seems to be the key to everything to me. And working programmers keep something like this in mind all the time, even when we're looking at 'just data'.

'Operational equivalence' isn't just a property that belongs to 'code'. It belongs to 'raw data' as well. Consider whitespace, and case sensitivity, to take two examples. The idea of whether two chunks of text 'are equal' depends on the context in which we think this text is going to be evaluated. Is it going to a case-insensitive tool? Then all forms of upper and lower case are equivalent. Is it going to be subjected to standard tokenisation and whitespace-stripping? Then the whitespace between tokens can be disregarded for equality purposes.

Can we tease something out of this programmer's intuition and formalise it?
ReplyDelete
Replies
Nate CullSeptember 6, 2016 at 2:23 AM
This comment has been removed by the author.
ReplyDelete
Replies
Nate CullSeptember 6, 2016 at 3:40 AM
(I wish I could edit without deleting).

I guess what I'm trying to ask is: can we apply some of the thinking that's gone into compilation, and apply it to compilation-like preprocessing of *data*?

We already have an idea a little like 'compilation of data': 'normal forms', as in SQL. Is there perhaps some middle ground between techniques developed for optimising of code, and techniques databases use for optimising queries and indexes?

Could some kind of 'data compilation' help sort out some of the mess that parsing is in, where we have lots of ad-hoc parsers just crawling with terrifying security vulnerabilities, wired straight into the Internet?

An idea that's easy to say but hard to formalise, I guess.
ReplyDelete
Replies
John ShuttSeptember 7, 2016 at 6:07 AM
@patchworkZombie: Sometimes the long-term drafts do get completed and posted, but often the one that gets posted was the most recently created, while older ones get salvaged for parts. There's awkward overlap between some current drafts, as I struggle to wrap my head around some very big ideas. Fwiw, here are some of the more promising working titles; the list started in reverse-alphabetical order by file name, and I kept that order because it makes a surprising amount of sense.

The twisted hourglass. The imho beautiful side of category theory; heavily visual. My main current interest, being delayed by technical challenge of image files.

Pitfalls of alternative science. How to pursue alternative ideas without sounding like, or being, a crackpot. Tiny bits of this sometimes show up in posted entries. The whole may never materialize, because it requires discussing my experience some years ago trying to get a paper peer-reviewed on the theory of fexprs, and I still haven't figured out how to talk about it without bitterness (which would defeat the intent of this draft).

Non-sapience and anti-sapience. What isn't sapient, what sapience doesn't do, what actively suppresses sapience. Aspires to consider how to promote humanity in a sapience-hostile civilization.

Self-organizing civilization. "How are we, as a civilization, to make decisions?" Aspires to subsume politics and economics.

Refactor — eliminating lexical nouns and verbs. My other conlang project. I see ways to eliminate nouns/verbs either lexicially or grammatically, but not both as once.

The quantum approximation. About whether it's mathematically possible to account for quantum mechanics using a physics TOE such as I described in my cohygiene post.

Nabla. Trying to work out the, er, quintessence of quaternions.

Markup languages and escaping the macro tarpit. Proposing markup languages should be designed using fexprs instead of macros.

Lisp, mud, wikis, and reimagining programming. Radically crowd-sourced programming that maximizes the sapient element rather than suppressing it.

How do we know it's right?. Discusses a "three-fold path": mechanistic reasoning (such as elaborate type systems), testing, and self-evidence.

Church's 1932/3 logic. I haven't seen a good exposition of it on the internet, so decided to write one. Bogged down over what to do with the battery of axioms whose individual significance is obscure.
ReplyDelete
Replies
Nate CullSeptember 7, 2016 at 7:20 PM
I also realise that we still do need *something* like quoting - eg, source code has to exist as data somewhere, and if we're Lisp people

Trying to wrap my head around the idea of "terms not in the source expression", are those the "eval, wrap and e" terms? And is that linked to the idea of markers for explicit evaluation? https://fexpr.blogspot.co.nz/2013/07/explicit-evaluation.html

Is the reason why a marker like 'eval' cannot occur in a source expression because it would immediately be acted upon by the interpreter and replaced with something else? Am I on the right track or way off?

I share your conviction that explicit evaluation is more correct than implicit, but I have quite a mental block about the idea of *where* evaluation occurs in an explict-evaluation model. And how to stop it happening before you want it to.

I have this vague idea in the back of my head that I've not yet been able to make come forward and fully identify itself: that a nice computing model would be like a spreadsheet, with a lot of 'cells' which have both an interior (eg expression / formula / definition) and an exterior (eg computed value). And that it might be useful to hold BOTH of those 'values' at once, and allow a visitor to decide at runtime (subject to some kind of security access check) whether they want to read the 'inside' or the 'outside' of the cell. As, eg, the user of a spreadsheet can do.

The main idea, I think, being that this wouldn't be quite a 'term-rewriting' system because the definition-side of the term wouldn't vanish; it would still be there even after it was 'computed'. Probably, though, with all the intermediate steps being thrown away.

This would be very similar to how we keep 'source files' and 'object files' of all our compiled software - except that we'd do it for ALL computation, at a fine-grained level, not just the large chunks we call 'programs'.

It would also be similar to how Unix filesystems can have 'symbolic links' (or Windows can have 'shortcut files') : the symbolic link is like a function definition, the file it redirects to is like a computed value. We'd maybe like to store both, and allow the user to read one or the other at will. If we can come up with one universal mechanism for solving this problem - since it recurs in so many contexts - that would be a win.

I wish I could convince my brain to be clearer about what it means by this, or how to do it.
ReplyDelete
Replies
Nate CullSeptember 7, 2016 at 7:22 PM
Ug. "Source code has to exist as data somewhere, and if we're Lisp people, we'd like that source code data representation to be homoiconic - as much like its final form as possible."
ReplyDelete
Replies
John ShuttSeptember 8, 2016 at 6:04 AM
@Nate Cull:

"I'm very concerned that, globally, the IT industry is heading in exactly the OPPOSITE direction. A reason why I think your research is very important for the future."

Yes, it's heading in the opposite direction. Which connects to the draft Non-sapience and anti-sapience. Basically, entities like governments and mega-corporations would find people easier to deal with if people were less sapient; and the role of those entities in deciding the direction of technological development connects to draft Self-organizing civilization. Which is the sort of thing I had in mind when I mentioned struggling to wrap my head around very big ideas: that's four drafts all on aspects of one very big idea.

(A third annoying thing about this comments interface is that comments can only be nested one layer deep.)
ReplyDelete
Replies
patchworkZombieSeptember 9, 2016 at 4:56 PM
I found an interesting sentence in Teitelman's phd thesis PILOT (1966) page 35
"In LISP, all data are in the form of symbolic expressions, or S-expressions. S-expressions are of indefinite length and have a branching tree structure in which subexpressions can be readily isolated. LISP computations are also written in the form of S-expressions. This makes LISP especially adaptable for our purposes. Like machine languages, and unlike most other higher level languages, one can write programs in LISP which will generate programs for further execution. Furthermore, it is possible to execute data as programs, and conversely treat programs as data."
ReplyDelete
Replies
UnknownSeptember 12, 2016 at 4:21 PM
I'm baffled by your epsilon abstraction: does it only come into play in the interpreter? In the term

[combine ⟨εx.T1⟩ T2 T3] → [combine T1[x ← T3] T2 T3]

does T1[x <- T3] reduce to a lambda abstraction? If not, what is it doing? I don't want to waste your time, I would be obliged to you if you would take the time to explain it to me. Thanks.
ReplyDelete
Replies
Kartik Sharma KanthAugust 11, 2017 at 1:46 AM
This comment has been removed by a blog administrator.
ReplyDelete
Replies
UnknownSeptember 15, 2017 at 10:32 PM
This might be helpful. I noticed that at the heart of the matter, there is the Leibniz rule for equality, x=y -> f x=f y. This is not going to work out when fexprs are thrown into the mix. x=y means that x evaluates the same as y in the same environment. When f is an fexpr the Leibniz will not be true because the fexpr distinguishes x's syntax from y. But all you have to do to make Leibniz true is put this proviso, that f is not an fexpr. This goes to what you were saying about equality being in relation to an evaluation context.
ReplyDelete
Replies
UnknownSeptember 16, 2017 at 1:31 AM
I think it is interesting that "eval" if limited to functions is so that all programs are static data, you end up with the imperative paradigm, where 'f' is the function pointer (or even better the s-expression) and 'f()' means evaluate it. In an imperative language every function definition is data, and postfixing parenthesis is the eval operator.
ReplyDelete
Replies
haider aliOctober 28, 2019 at 7:46 AM
I just want to let you know that I just check out your site and I find it very interesting and informative..
cs作业
ReplyDelete
Replies
NiyazJanuary 10, 2022 at 3:03 AM
More impressive Blog!!! Its more useful for us...Thanks for sharing with us...
big data in education
big data education
ReplyDelete
Replies
Serap SJanuary 30, 2022 at 2:38 AM
yalova eskort

antep eskort

sivas eskort

iskenderun eskort

van eskort

antalya eskort

düzce bayan

manisa bayan

izmit bayan

görükle bayan
ReplyDelete
Replies
AnonymousMay 20, 2022 at 10:46 AM
tül perde modelleri
Numara onay
VODAFONE MOBİL ÖDEME BOZDURMA
Nft Nasil Alinir
ANKARA EVDEN EVE NAKLİYAT
TRAFİK SİGORTASİ
dedektör
web sitesi kurma
ASK ROMANLARİ
ReplyDelete
Replies
AnonymousMay 26, 2022 at 11:22 PM
smm panel
smm panel
iş ilanları
İnstagram Takipçi Satın Al
https://www.hirdavatciburada.com/
HTTPS://WWW.BEYAZESYATEKNİKSERVİSİ.COM.TR
servis
jeton hilesi indir
ReplyDelete
Replies
AnonymousJune 27, 2022 at 10:09 AM
nft nasıl alınır
lisans satın al
en son çıkan perde modelleri
minecraft premium
özel ambulans
yurtdışı kargo
uc satın al
en son çıkan perde modelleri
ReplyDelete
Replies
Jack DanielJanuary 30, 2023 at 5:02 AM
This is really a high quality content. Very helpful & informative.
hire a hacker
ReplyDelete
Replies
John BezosFebruary 15, 2023 at 7:47 PM
Great article. Your blogs are unique and simple that is understood by anyone.
apps to hack iphone
ReplyDelete
Replies
onlinegemstonebeadsMay 30, 2023 at 11:32 PM
This comment has been removed by the author.
ReplyDelete
Replies
Power Of BacklinksMay 21, 2024 at 3:50 AM
Hey thank you!!! I was seeking for the particular information for long time. Good Luck ?
Dubai Visa
ReplyDelete
Replies
UAE VisaMay 29, 2024 at 5:35 AM
Definitely your all weblogs are helpful for me. I really enjoy a lot reading your all posts.
uae visa
ReplyDelete
Replies
dubai-visaJuly 8, 2024 at 10:47 PM
dubai visa
ReplyDelete
Replies
Nadia AdamsDecember 10, 2024 at 4:32 PM
This was such an insightful post on interpreted programming languages! It’s fascinating to see how these languages balance flexibility and performance. Just Amazing Discounts
ReplyDelete
Replies
Barbara NimmoDecember 10, 2024 at 4:33 PM
I’ve recently come across some great coding resources at discounted rates through Saving Cents Together, and I’m excited to dive deeper into learning more about these topics.
ReplyDelete
Replies
Botões EliteMay 5, 2025 at 11:36 PM
nice share thanks for the information. botoes
ReplyDelete
Replies

Add comment