Saturday, June 4, 2011

Primacy of syntax

One of the most influential papers in the history of computer science is Christopher Strachey's 1967 "Fundamental Concepts in Programming" — lecture notes for the International Summer School on Computer Programming, Copenhagen.  Christopher Strachey basically created the academic subject of programming languages, and in those lecture notes you will find many core concepts laid out.  First-class objects.  Left-values and right-values.  Polymorphism.  And so on.

Although Strachey did write a finished version of the lecture notes for publication in the proceedings of the summer school, those proceedings never materialized — so copies of the lecture notes were passed from hand to hand.  (The modern name for such a phenomenon is "going viral", but it's surely been happening since before writing was invented.)  Finally, on the twenty-fifth anniversary of Strachey's untimely death, the finished paper was published.

By that time, though, I'd spent over a decade studying my cherished copy of the raw lecture notes.  The notes consist largely of abbreviated, even cryptic, non-sentences, but their pithy phrases are often more expressive than the polished prose of the published paper — and sometimes the ideas candidly expressed didn't even make it into the published paper.  A particular item from the lecture notes, that has stuck with me from the first moment I read it (about a quarter century ago, by now) is
Basic irrelevance of syntax and primacy of semantics.
This seems to me to neatly capture an attitude toward syntax that became firmly rooted in the late 1960s, and has scarcely been loosened since.  My own passion, though, within the academic subject Strachey created, is abstraction; and from my first explorations of abstraction in the late 1980s, it has seemed to me that abstraction defies the separation between syntax and semantics.  Sometime during the 1988/89 academic year, I opined to a professor that the artificial distinction between syntax and semantics had been retarding the development of abstraction technology for over twenty years — and the professor I said it to laughed, and after a moment distractedly remarked "that's funny" as they headed off to their next appointment.  Nonplussed by that response (I was still rather in awe of professors, in those days), to myself alone I thought, but I wasn't joking.


A source code element modifies the language in which it occurs.  Starting, say, with standard Java, one introduces a new class, and ends up with a different programming language — almost exactly like standard Java, but not quite because it now has this additional class in it.  That is abstraction, building a new language on top of an old one.
Abstraction:  Transformation of one programming language into another by means of facilities available in the former language.
What makes it abstraction, rather than just an arbitrary PL transformation, is that it uses the facilities of the pre-existing programming language.  (For those who prefer to trace their definitions back a few centuries, the key requirement that the new language be latent in the old —that it be drawn out of the old, Latin abstrahere— appears in John Locke's account of abstraction in his An Essay Concerning Human Understanding; the relevant passage is the epigraph at the top of Chapter 1 of the Wizard Book — and is also the epigraph at the top of Chapter 1 of my dissertation.)

An abstractively powerful programming language is, by my reckoning, a language from which one can abstract to a wide variety of other languages.  The precise formalization of that is a subtle thing; and worthy of a blog entry of its own, especially since my formal treatment of it, WPI-CS-TR-08-01, gets rather dense in some parts.  For the current entry, though, we don't need those details; the key point here is that the result of the abstraction is another language.  This is in contrast to the denotational approach to programming languages (which Strachey helped create, BTW, and which is visible in nascent form in the 1967 lecture notes):  denotationally, the programming language is essentially a function which, when applied to a source-code term, produces a semantic value of another sort entirely.
[Note:  I've since written an entry on abstractive power.]
The idea that semantic results are languages is a very powerful one.  The results of computations (and any other embellishment one wants) can be modeled by introducing additional sorts of terms that may occur in languages; the significance of each language is then fully represented by the possible sequences of terms that can follow from it.  But then, one doesn't really need the explicit semantics at all.  Only the sequences of terms matter, and one can define a programming language by a set of sequences of terms.  At that point, one could say that all semantics is being represented as syntax (which is a truism about semantics, anyway), or one could just as well say that semantics has vanished entirely to be replaced with pure syntax.

Lisp and syntax

Lisp has been described as a language with no syntax.  There is a sense in which that's true:  if by "syntax" one means "syntax for representing programs rather than data".  In primordial S-expression Lisp, the only syntax exclusively represents data.  (The way that happened —to remind— was that McCarthy had originally envisioned a second kind of syntax, M-expressions, representing programs, but he'd also described an algorithm for encoding M-expressions as S-expressions.  He expected to have years in which to polish details of the language since writing a compiler was then understood to be such a colossal undertaking, but in the meantime they were hand-coding specific Lisp functions — and then S.R. Russell hand-coded eval, and abruptly they had a working interpreter for S-expression Lisp.)

I believe, by the way, this is how Lisp should be taught to novices:  Teach them S-expression syntax first, set it firmly in their minds that such expressions are data, and only after that begin to teach them about evaluation.  Mike Gennert and I tried this approach with a class back in the spring semester of 1999/2000.  Over the duration of the course, we led the students through writing a Scheme interpreter in Java, starting with a level-0 "REPL" loop that was missing the "E" — it would read in an S-expression, and just write it out without evaluating it.  By the end of the term we'd added in proper tail recursion.  The experiment as a whole wasn't as successful as we'd hoped, because at the moment we tried it, the department's curriculum was in a state of flux, and many of the students didn't already know Java; but we didn't have the sorts of problems I've seen, or heard others describe, due to novice students failing to think in terms of evaluating S-expressions.

The connection to abstraction is immediate and compelling.  Abstraction is all about specifying how syntax will be used for subsequent program code, and the design of Lisp is focused on virtuoso manipulation of the very type of data structure (S-expressions) that is effectively the syntax of program code.  The more power Lisp gives the programmer to control how syntax will be interpreted, the more abstractive power accrues.  Since fexprs greatly expand the programmer's direct control over the interpretation-time behavior of the evaluator (short of the "throw everything out and start over" tactic of running a meta-circular evaluator — a tactic that lacks stability), fexprs should massively increase the (already formidable) abstractive power of Lisp.  That's why the subtitle of my dissertation is $vau: the ultimate abstraction.

Note: Polymorphism

Strachey's 1967 lecture notes are most known —and cited— for coining the term polymorphism.  Therein, he divided polymorphism into two forms:  parametric polymorphism, and ad hoc polymorphism.

The parametric/ad hoc distinction struck me as a logical bipartition of polymorphism; that is, every possible form of polymorphism would necessarily be either parametric or ad hoc.  Evidently Cardelli and Wegner did not interpret Strachey this way; their taxonomy placed "inclusion polymorphism" outside of both parametric and ad hoc.

It also strikes me that the name ad hoc reflects real disapproval.  This goes back to the "basic irrelevance of syntax" remark.  At heart, parametric polymorphism is semantic, ad hoc is syntactic; one might be tempted to call them "semantic polymorphism" and "syntactic polymorphism", or even "good polymorphism" and "bad polymorphism".  There is a close connection between my perception that parametric/ad hoc reflects semantic/syntactic, and my perception that parametric/ad hoc was meant to be exhaustive.

Although I expect parametric polymorphism should have greater abstractive power than ad hoc polymorphism, I don't like biasing terminology.  I'd like to see any formal results stand on their own merits.  So, years ago, I looked for an alternative to "ad hoc".  The best I came up with at the time:  selection polymorphism.


  1. Why not go further?

    For example, within the domain of syntax, the leading book on compilers completely overlooks scannerless parsing - and instead focuses on parsing techniques that are deterministic, and have unambiguous parse trees, even at the expense of allowing the programmer to concisely state the grammar. As I've commented before on Lambda the Ultimate, there is a close connection between how your fexpr's solve code representation problems for extensible languages, and how scannerless parsers solve grammar representation problems for extensible languages without requiring "rule cracking" (a grammar transformation that violates abstraction, as it intuitively implies: cracking open a sealed grammar rule to insert a new symbol recognizing function).

    At any rate, I've decided to share your blog post on Alan Kay's FONC mailing list.

  2. The above terminology confused me the first few times I read it. I'll explain the source of confusion for anyone who might benefit:

    * If we add a new class to Java, I would usually say we are 'extending' Java, and that the 'abstraction' is what the new class represents.

    * I can conceive of at least two language transforms involved with every new abstraction: (a) the extension of the host language (Java to Java'), (b) the implementation of the extension (Java' to Java)

  3. I've had some more time to chew on the content.

    While I agree with your goal of 'abstractively powerful' systems, I disagree with your pursuit of it through syntactic abstraction. My reasons are as follows:

    (1) In languages for open systems, parts of the model are hidden. It is not the case that we can fully represent such a language as a sequence of terms.

    (2) We can achieve abstractively powerful language without code-walking transforms such as fexprs. Haskell is a fine example, modeling languages in monads or arrows. Having a good base 'semantics' helps a lot - i.e. you use fexprs to control evaluation order, but in Haskell we can model sequencing directly.

    (3) We want more than abstraction. We also want to inductively reason about properties across composition. Code walking transforms are too powerful, and hinder such reasoning.

  4. Awelonblue — I'm going to try to comment on both your posts; sorry about the length :-).

    Re your first:
    A difficulty in discussing this is that, as suggested by your first comment, words "abstraction" and its kin are heavily overloaded.  There's the process of abstraction in programming, where I start.  That's different from "an abstraction", which can be used to mean an instance of the process (as I use it), or the product of such a process, or the means of such a process.  I rarely speak of "an" abstraction, and then only in the first sense, while... I think... you're using it in the second and maybe third senses.

    I believe I'm also thinking of the process of abstraction in programming more abstractly than you are (which, you'll notice, is a very different sense of "abstract", further muddling things).  This seems a necessary consequence of the sense of "an abstraction" you've chosen.  When you describe the Java class as the abstraction, that constrains your thinking to a special case of the fabulously general class of transformations I'm admitting:  broadly speaking, you'll only look at monotonically increasing the language.  I've worked to avoid building into my mathematical framework any preconceptions about the specific techniques of interest; and I've been specially interested to avoid assuming monotonic increase, because it seems to me monotonic increase curtails practical abstractive power.

    The "transformation" from Java-prime back to Java is a notably non-constructive one:  the sole constraint I place on the relationship between the languages is that the derivation of Java-prime from Java must be mediated by a term permitted by Java.

    Re your second:
    (1) You've misunderstood what I'm doing; hopefully my future blog entry about the mathematics can be written so as to avoid this misunderstanding — and to that end, I really want to better understand the misunderstanding.

    Hiding (if I'm tracking your sense of the word) is irrelevant to the applicability of my treatment; it just means there are multiple ways of getting to isomorphically the same language.  I'm also wondering whether you might be assuming that "terms" have to be program syntax input by the programmer; there is no such constraint.  My treatment is, in an absolute sense, totally general:  anything it can't represent cannot be represented (since anything you can say about a language is said using some kind of syntax).

    (2 and 3) Here you seem to be moving into the realm of claims about mathematical properties of different kinds of abstractions.  Proving or disproving such claims is the point of having a formal mathematical theory of abstraction.  Our intuitions are telling us different things, but even aside from whether those things are true, we may have to cast them in mathematical form before we can even tell whether they actually contradict each other — it still seems possible we might both be right :-).

  5. I am familiar with the concept and value of non-monotonic language transforms. I use arrows, for example, to express function-like things that are in some ways less powerful than functions. I utilized the Java example because that is what you used.

    Hidden models include sensors, actuators, users, foreign services, closed libraries. It is true that some terms in an open system - specifically, references - may be too volatile or opaque to be entered by human hand, but I mention open systems for other reasons. Side-effects and relationships in an open system are not 'fully represented' in any term graph, and it would take a near-omniscient viewpoint (which might exist, but not within the language) to say things about them. References are extremely resistant to syntactic transform, and so we must focus on semantic approaches to abstraction anyway.

    Point (2) is simply an assertion that we can achieve 'abstractively powerful' languages via primarily semantic mechanisms. No need fexprs or generative grammars or their like. I’m not fond of code-as-data transformation as a basis for abstraction, but I am interested in reconciling generative grammars with predicative performance and security in open systems. (I’ve tabled the problem to later develop atop RDP.)

    Point (3) is about a specific mechanism for mathematical reasoning. A property is compositional only if you can inductively reason about it without peeking inside the operands. For example, if you know two subprograms are deadlock free, it's nice to be able to say whether their composition is deadlock free.

    Where your passion is abstraction, mine is composition in open systems. Compositional reasoning is the only form of mathematical reasoning we can consistently support in an open system. I consider composition more valuable than abstraction. Fortunately, I do not need to choose, I just need to be careful about which abstraction mechanisms I allow.

    It seems to me that inductive reasoning about important properties across composition is the very essence of semantics.

    So my points in summary are: (1) I want open systems, which severely constrain what syntax I can see or transform. (2) I can achieve abstractively powerful systems without focusing on sequencing or transformation of terms. (3) I essentially want semantics more than abstraction.