Friday, December 9, 2011

The trouble with monads

IT was six men of Indostan
    To learning much inclined,
Who went to see the Elephant
    (Though all of them were blind),
That each by observation
    Might satisfy his mind.
The Blind Men and the Elephant, John Godfrey Saxe
Note:  So far, I've found three different ways of applying this epigraph to the following blog post.
Monadic programming is a device by which pure functional languages are able to handle impure forms of computation (side-effects).  Depending on who you ask, it's either very natural or preposterously esoteric.

I submit there is a systemic problem with monadic programming in its current form.  Superficially, the problem appears technical.  Just beneath that surface, it appears to be about breaking abstraction barriers.  Below that, the abstraction barriers seem to have already been broken by the pure functional programming paradigm.  I'll suggest, broadly, an alternative approach to functional programming that might resurrect the otherwise disabled abstraction barriers.

What monads do for pure FP

A pure function has no side-effects.  Pure functional programming —programming entirely with pure functions— has some advantages for correctness proofs, which is (arguably) all very well as long as the purpose of the program is to produce a final answer.  Producing a final answer is just what a pure function does.  Sometimes, though, impurities are part of the purpose of the program.  I/O is the canonical example:  if the point is to interact with the external world, a program that cannot interact with the external world certainly misses the point.

If the point is to interact with the external world, and you still want to use a pure function to do it, you can write a pure function that (in essence) takes the state of the external world as a parameter, and returns its result together with a consequent state of the external world.  Other forms of impurity can be handled similarly, with a pure function that explicitly provides for the particular variety of impurity; monads are "simply" a general pattern for a broad range of such explicit provisions.
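The world-passing idea can be sketched concretely. Here is a minimal illustration in Python (the post itself is language-neutral; `World`, `put_line`, and `greet` are invented names): a "pure" function takes the state of the external world as a parameter and returns its result together with a consequent state.

```python
# A minimal sketch of world-passing style.  The "world" here is just a
# list of output lines standing in for the state of the external world;
# all names are invented for illustration.

def put_line(msg, world):
    """'Print' msg: a pure function from (msg, world) to (result, new world)."""
    return None, world + [msg]

def greet(name, world):
    """A 'program' built by threading the world through each effect."""
    _, world = put_line("Hello, " + name, world)
    _, world = put_line("Goodbye, " + name, world)
    return len(name), world

result, final_world = greet("Ada", [])
# result is the pure answer; final_world records the effects, in order.
```

Nothing impure happens anywhere; the "side-effects" are just data threaded from call to call, which is the explicit provision a monad generalizes.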

Note, however, that while the monadic program is parameterized by the initial state of the external world, the monad itself is hardcoded into the type signature of the pure function.

What goes wrong

The essential difficulty with this approach is that since the monad is hardcoded into the function's type signature, it also gets hardcoded into clients who wish to call that function.

To illustrate the resulting brittleness, suppose some relatively small function f is used by a large program p, involving many functions, with calls to f at the bottom of many layers of nesting of calls.  Suppose all the functions in p are pure, but later, we decide each call to f should incidentally output some diagnostic message, which makes no difference to the operation of the program but is meant to be observed by a human operator.  That's I/O, and the type signature of f hadn't provided for I/O; so we have to change its type signature by wiring in a suitable I/O monad.  But then, each function that directly calls f has to be changed to recognize the new signature of f, and since the calling function now involves I/O, its type signature too has to change.  And type signatures change all the way up the hierarchy of nested function calls, until the main function of p gets a different type signature.
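The propagation can be seen even at toy scale (a hypothetical f and its callers, sketched in Python with Writer-style value/log pairs standing in for an I/O monad): once f returns a pair instead of a plain value, every caller's signature changes to thread the log, all the way up.

```python
# Before: everything pure.
#   def f(x): return x * 2
#   def g(x): return f(x) + 1
#   def main(x): return g(x)
#
# After f acquires a diagnostic message, every function on the call path
# must change its signature to thread the (value, log) pair upward,
# Writer-monad style.  All names are invented for illustration.

def f(x):
    return x * 2, ["f called with %d" % x]   # value plus diagnostic log

def g(x):
    value, log = f(x)                        # g never logs anything itself,
    return value + 1, log                    # yet its signature changed too

def main(x):
    value, log = g(x)                        # and so, transitively, does
    return value, log                        # the signature of main
```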

Every direct or indirect client of f has been forced to provide for the stateful I/O behavior of f.  One could ask, though, why this stateful behavior of f should make any difference at all to those clients.  They don't do any I/O, and if not for this type-signature business they wouldn't care that f does; so why should f's I/O be any of their business?  For them to bother with it at all seems a violation of an abstraction barrier of f.

Actually, this very real abstraction violation was not caused by the introduction of monads into pure functional programming — it was highlighted by that introduction.  The violation had already occurred with the imposition of pure functional programming, which denies each component function the right to practice impurities behind an abstraction barrier while merely presenting a pure appearance to its clients.

The introduction of monads also created the distracting illusion that the clients were the ones responsible for violating the abstraction barrier.  On the contrary, the clients are merely where the symptoms of the violation appear.  The question should not be why the client function cares whether f is internally impure (it doesn't care; its involvement was forced), but rather, who is it who does care, and why?


Bigness
Monads come from a relatively modern branch of mathematics (it's a mere half-century old) called category theory.

A category is a well-behaved family of morphisms between objects of some uniform kind.  The category provides one operation on morphisms:  composition, which is defined only when one morphism ends on the same object where the next morphism starts.  (Technically, a category is its composition operation, in that two different categories may have the same objects and the same morphisms, and still be different if their composition operation is different.)

The canonical example is category Set, which is the family of mathematical functions between sets — with the usual notion of function composition.  That's all possible functions, between all possible sets.  This is typical of the scale at which category theory is brought to bear on computation theory:  a category represents the universe of all possible computations of interest.  The categories involved are then things like all computable pure functions, or all computable functions with a certain kind of side-effect — it should be clearly understood that these categories are big.  Staggeringly, cosmologically, big.
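At finite scale, the composition discipline can be sketched like this (a toy fragment of Set in Python; all names invented): each morphism carries its domain and codomain, and composition is defined only when one morphism ends on the object where the next begins.

```python
# A toy fragment of the category Set: a morphism is a (domain, codomain,
# mapping) triple, and composition is defined only when the morphisms
# meet end to end.  All names are invented for illustration.

class Morphism:
    def __init__(self, dom, cod, fn):
        self.dom, self.cod, self.fn = dom, cod, fn

    def then(self, other):              # diagrammatic composition: self ; other
        if self.cod != other.dom:
            raise TypeError("morphisms do not meet end to end")
        return Morphism(self.dom, other.cod, lambda x: other.fn(self.fn(x)))

double = Morphism("Int", "Int", lambda n: n * 2)
show   = Morphism("Int", "Str", lambda n: str(n))

assert double.then(show).fn(21) == "42"   # Int -> Int -> Str composes
# show.then(double) raises TypeError: Str does not meet Int.
```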

Besides well-behaved families of morphisms between objects of a uniform kind, there are also well-behaved families of morphisms from objects of one uniform kind to objects of another uniform kind.  These families of heterogeneous morphisms are called adjunctions.  An adjunction includes, within its structure, a category of homogeneous morphisms within each of the two kinds of objects — called the domain category (from which) and codomain category (to which the heterogeneous morphisms of the adjunction go).  The adjunction also projects objects and morphisms of each category onto the other, projects each of its own heterogeneous morphisms as a homogeneous morphism in each of the categories, and requires various relations to hold in each category between the various projections.

The whole massive adjunction structure can be viewed as a morphism from the domain category to the codomain category — and adjunctions viewed this way are, in fact, composable in a (what else?) very well-behaved way, so that one has a category Adj whose objects are categories and whose morphisms are adjunctions.  If the categories we're interested in are whole universes of computation, and the adjunctions are massive structures relating pairs of universes, the adjunctive category Adj is mind-numbingly vast.  (In its rigorous mathematical treatment, Adj is a large category, which means it's too big to be contained in large categories, which can themselves only contain "small" categories — an arrangement that prevents large categories from containing themselves and thereby avoids Russell's paradox.)

A monad is the result of projecting all the parts of an adjunction onto its domain category — in effect, it is the "shadow" that the adjunction casts in the domain.  This allows the entire relation between the two categories to be viewed within the universe of the domain; and in the categorical view of computation, it allows various forms of impure computation to be viewed within the universe of pure computation.  This was (to my knowledge) the earliest use of monads in relation to computation:  a device for viewing impure computation within the world of pure computation.  A significant limitation in this manner of viewing impure computations is that, although adjunctions are composable, monads in general are not.  Here the "shadow" metaphor works tolerably well:  two unconnected things may appear from their shadows to be connected.  Adjunctions are only composable if the domain of one is the codomain of the other — which is almost certainly not true here, because all our monads have the same domain category (pure computation), while the shadows cast in pure computation all appear to coincide since the distinct codomains have all been collapsed into the domain.
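The non-composability has a practical face: to combine two effects, the programmer must choose by hand how they interleave, and different choices give different programs. A caricature in Python (computations modeled as plain functions on an integer state; all names invented), running the same two-step program — bump a counter, then fail — under the two possible stackings of a failure effect and a state effect:

```python
# Monads do not compose automatically: combining a failure effect with a
# state effect requires choosing how they interleave, and the two choices
# disagree.  All names are invented for illustration.

def maybe_over_state(s):
    # "MaybeT over State" style: state -> (optional value, final state).
    # The increment happens in the state thread, then the failure;
    # the state change survives.
    s = s + 1
    return (None, s)

def state_over_maybe(s):
    # "StateT over Maybe" style: state -> optional (value, state).
    # The failure discards everything, including the state change.
    s = s + 1
    return None

assert maybe_over_state(0) == (None, 1)   # failed, but the count stuck
assert state_over_maybe(0) is None        # failed, and the count is gone
```

Since neither stacking is canonical, no single composite monad can be read off from the two "shadows" alone.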

Who is viewing these various forms of computation, through the shadows they cast on the world of pure computation?  Evidently, the programmer — in their role as Creator.  A God's eye view.  Viewing the totality of p through the universe of pure computation is the point of the exercise; the need for all clients of f to accommodate themselves to f's internal use of I/O is an artifact of the programmer's choice of view.

Rethinking the paradigm

So, here are the points we have to work with.
  • A monad represents the physical laws of a computational universe within which functions exist.
  • The monad itself exists within the pure computational universe, rather than within the universe whose laws it represents.  This is why monads are generally uncomposable:  they have forgotten which actual universes they represent, and composition of adjunctions wants that forgotten knowledge.
  • Function signatures reflect these computational laws, but serve two different purposes.  From the client's eye view, a function signature is an interface abstraction; while from the God's eye view (in the pure computational universe), a function signature is the laws under which the function and everything it uses must operate.
To uncouple a function's interface from its laws of computation, we need a function signature that does not take a God's eye view.  The most "purity" one can then ascribe to a function f, looking at the definition of f without the definitions of other functions it calls, is that f uses other functions as if they were pure, and doesn't itself introduce any impurities. From a God's eye view, computational laws then have to pass from function to function, either
  • by synthesis — when function f calls function g, g returns its computational laws along with its result value, and f works out how to knit them all into a coherent behavior, and returns its own knit-together computational laws along with its own result value — or
  • by inheritance — when function f calls function g, f passes in its computation laws along with its parameters to g, and g works out both how to knit them into its own computational laws internally, and how to present itself to f in terms f can understand.
For purposes of abstraction, inheritance ought to be the way to go, because the point of abstraction is to reduce the conceptual burden on the caller, rather than increase it.  Not surprisingly, inheritance also appears to be the more daunting strategy to figure out.
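The two strategies can be caricatured in code (a Python sketch; "laws" are modeled as a plain list of effect descriptions, and all names are invented). Under synthesis, the callee hands its laws back up for the caller to knit together; under inheritance, the caller hands its laws down and the callee accommodates itself to them.

```python
# A caricature of the two strategies for passing computational laws.
# "Laws" are modeled as a list of effect descriptions; all names are
# invented for illustration.

def g_synthesis(x):
    # Synthesis: g returns its laws along with its result;
    # the caller must knit them into its own.
    return x * 2, ["g wrote a log line"]

def f_synthesis(x):
    value, g_laws = g_synthesis(x)
    own_laws = ["f wrote a log line"]
    return value + 1, own_laws + g_laws   # f does the knitting

def g_inheritance(x, laws):
    # Inheritance: the caller's laws are passed in; g fits itself to
    # them internally and presents a plain value back to the caller.
    laws.append("g wrote a log line")
    return x * 2

def f_inheritance(x, laws):
    laws.append("f wrote a log line")
    return g_inheritance(x, laws) + 1     # f sees only an ordinary call
```

In the synthesis version the caller's code visibly grows with each effect the callee acquires; in the inheritance version the call site stays an ordinary call, which is why inheritance better serves abstraction — and why it is harder to design.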


  1. I think types should capture the interface of a value. In your example, when you wish to add some effect to the function f, but are sure that it `makes no difference to the operation of the program`, you should use unsafePerformIO or equivalent: if an effect does not change the interface, it can be cast away.

    This is, by the way, a perfectly fine way to do debug logging, and I believe it is accepted by Haskell programmers. Of course, you have to restrict yourself to operations that "make no difference"; if you suddenly realize that it actually makes a difference because, for example, you then need to control the "order of evaluation" of different parts of the program, then it changes the mode of use, hence the interface, and it is natural that it "breaks abstraction".

    Note that this case is a bit particular; it doesn't feel good to "hack away" the effect by using an unsafe feature. But this is actually necessary here, because the notion of "making no difference" you use is different from your programming language's, which is observational equivalence. By saying "oh, this logging does not matter", you are using a notion of equivalence that is more lenient than observational equivalence -- you quotient over some things that are actually observable. There is no way the type system will ever admit that your change is safe (because, from what it knows, it isn't); you have to force it to accept it.

    In the more common case where you want to use an effect inside (say, a mutable reference for optimization purposes) but it *really* is not observable, you can try to get your type system to admit it and keep the type itself effect-free. That is, for example, what the ST monad does in Haskell: a specific case where a parametricity/genericity-based argument allows the type system to prove that some uses of effects are unobservable.

    This is only a comment on your example. I think that, as presented, it is not actually convincing (for the reason I said: you should just use unsafePerformIO). Maybe it could be turned into a good example by changing the setting; but I'm afraid you will end up with "changing the type is inevitable and the right thing to do; the only problem is to propagate that change through the calls, and the huge code changes required are boring" -- a valid engineering problem, but I'm not quite sure it stands at the fundamental level you describe.

    I'm sorry I don't have much to say on your presentation of monad coming from adjunctions. It reminds me of the work on algebraic effects, which tend to compose more easily than general monads.

  2. It seems to me what you call "adjunctions" are actually "functors", or I did not get it?

  3. gasche — It's interesting you say the example has to do with the programmer's understanding of doesn't-matter versus the language's, because that 'programmer's understanding' is the God's eye view.  Whether it's acceptable from the language interpreter's perspective to use unsafe IO ("just do it") rather depends on what the language interpreter is using the information for.  Note, it's an illustration of the problem, not a claimed example of practicality.  I tend to go for demonstrating a principle without worrying about how often it'll come up in practice, perhaps because I assume that if a system has a theoretical hole in it, sooner or later someone is sure to step right in it; showing the hole is there is therefore sufficient to establish that the situation needs fixing for the long run (and I plan for the long run, because that's all that will matter — in the long run).

    DAY — I mean adjunctions, the highest level of the basic categorical structures — categories, functors, natural transformations, adjunctions.

  4. Minor note - looks like there's a couple words missing in the 4th paragraph under Bigness:

    >called the domain category *(from which)* and codomain category (to which the heterogeneous morphisms of the adjunction go).

  5. The grammatical structure must be harder to follow than I'd hoped. I did mean to write it that way. The tail of the sentence is meant to distribute over both: 'domain category from which, and codomain category to which, the heterogeneous morphisms of the adjunction go'.

  6. I looked at monads years ago and concluded that uniqueness types were a better way of handling I/O and other stateful behaviors; more natural and straightforward.

    But uniqueness types still require that intermediate functions (or predicates or combiners) adapt to the state being passed around, though in a more natural way.

    Nowadays I have hopes for a model that combines immutability with signals that fake mutability in a controllable way (Conceptually. Under the hood the relation is reversed for optimization; mutability is used to fake signals when possible). Signals don't need the co-operation of intermediate code, they are between objects. They should be blockable, though. I blogged a bit about that, and

  7. It does not 'break abstractions' to control effects as part of abstraction. I disagree with your argument that preventing easy addition of a diagnostic message is `what goes wrong`. But I do have my own objections to purity due to concerns with composition.

    Functions can be defined independently of a monad by generic programming techniques. I.e. instead of writing a program for monad Foo, we can write a function for typeclass (FooFeature m). This seems to undermine your complaint of `hardcoded` type signatures. It also fits a role for inheritance.

    I prefer a capability-secure but effectful model. Inheritance and synthesis can both be modeled by representing records of capabilities. Purity can be modeled by isolation.

  8. When you talk about synthesis and inheritance, toward the end of this post, are you referring back to the use of these terms in your master's thesis (i.e., RAGs)? If you used this with Kernel--with fexprs--would the inheritance occur at the point in the function call where beta substitution takes place (after evaluation of all arguments has occurred)? I'm still trying to wrap my head around the fexprs-as-abstractions-in-lambda-calculus idea. I haven't found anything much like it outside of your writings.

    1. Inheritance/synthesis here is the same sort of idea as in RAGs, yes. Inheritance is where stuff is passed into a called function, so yes, that would be the point in vau-calculus where beta-substitution occurs.

    2. Point blank: Are all functions fexprs in vau calculus?

      * If so, why is the fact so obscure, even among your blog readers?
      * If not, which functions are truly plain old Scheme procedures and nothing more?

    4. So, anything derived--directly or indirectly--from $vau is a fexpr...

      You won't just be inheriting from function definitions, you'll be inheriting from individual function calls.

      That is very interesting.

    5. I generally avoided the words "function" and "procedure" for Kernel and vau-calculus, because they get used in different ways. The alternative word was "combiner". Combiners are either "applicative" or "operative"; an applicative is a wrapper around some other combiner, and applying an applicative causes its operands to be evaluated to arguments, and the arguments are then passed to whatever combiner underlies the applicative (most often, an operative). Hence the title of my dissertation "Fexprs as the basis of Lisp function application". Ordinary Scheme procedures are applicatives whose underlying combiners are operatives.

    6. If I imagine a program to be a tree...then, given my amateur understanding of this, operatives would correspond to xylem and applicatives to phloem. Photosynthesis would correspond to the wrapping and unwrapping of applicatives; the "wood" in the interior of the tree, to operatives that operate like macros or special forms; the bark and the buds, to that part of the program generated by $lambda.

      How am I doing? :D

    7. 'Fraid I don't follow that analogy.

      It seems the tree we're discussing is that of dynamic extents; in Kernel some of its nodes are protected by guarded continuations.