Compilers for Free

DRAFT

This blog post is a draft. Please contact me if you spot errors or have feedback. If you are interested in applying the idea that the post describes, or know of related work based on similar ideas as the approach that this post describes, I would also love to hear from you.

Abstract

We present an approach to deriving a compiler from a high-level semantics. We illustrate the approach by implementing two compilers: one for arithmetic expressions, and one for arithmetic and conditional expressions. The programs we present are implemented in Haskell.

The contents of this blog post is based on joint work with Andrew Tolmach, Eelco Visser, Bram Crielaard, and Chiel Bruin.

Contents

Arithmetic Expressions

We start by considering how to compile arithmetic expressions.

Syntax, Monad, and Denotation

The syntax of arithmetic expressions is given by the following data type:

data Expr = Val Int | Add Expr Expr

The key idea behind our approach is to specify the semantics as a monadic denotation function which folds over the abstract syntax tree of our object language, and yields as output a monadic computation.

To illustrate the approach we shall define a monadic denotation function for arithmetic expressions. The monad we shall use to give meaning to arithmetic expressions will include the monadic operations given by the following type class:

class Monad m => MonadOps m val where
  intv :: Int -> m val
  addv :: val -> val -> m val

The monadic operations are chosen to closely correspond to the side-effectful operations that the machine that our compiler shall be targeting will interpret. The machine we shall target is register-based, and will execute sequences of instructions. In particular, the machine will know how to interpret instructions with a type signature akin to the following (we leave the actual notion of instruction abstract for now):

inti :: Int -> Reg -> Instr
addi :: Reg -> Reg -> Reg -> Instr

inti i r binds integer value i to register r, and addi r1 r2 r adds the integer values stored in registers r1 and r2, and stores the result in r.

Without further ado; the denotation function:

denote :: MonadOps m val =>
          Expr -> m val
denote (Val i) = intv i
denote (Add e1 e2) = do
  v1 <- denote e1
  v2 <- denote e2
  addv v1 v2

We will illustrate how we can instantiate the MonadOps interface to yield first a definitional interpreter, and subsequently a definitional compiler. The definitional compiler is given by mapping each monadic operation to underlying low-level instructions. The simplicitly of this mapping is what motivates the name of this blog post: the compiler comes “for free”.

An Interpreter for Arithmetic Expressions

It is straightforward to instantiate the MonadOps interface with the identity monad to yield an interpreter for our language:

instance MonadOps Identity Int where
  intv = return . id
  addv i1 i2 = return (i1 + i2)
interp :: Expr -> Int
interp = runIdentity . denote

Modus Operandi for Deriving a Compiler

To derive a compiler from the high-level denotation function we introduced above, we proceed in three steps:

  1. We introduce a language-independent notion command trees with join points, which forms a free monad.
  2. We provide a signature for commands for our arithmetic expressions.
  3. We provide a compilation function from command trees to instructions.

Command Trees With Join Points

We introduce a data type of command trees with join points. While the simple arithmetic expression language does not require a notion of join point, we still illustrate our approach in terms of a data type that supports it. Later in this post, we will see how command trees are useful for compiling conditional expressions that do have a notion of join point.

A command tree is a syntactic representation for possible-sequences of commands, and a command is a syntactic representation of a computational operation, such as the operations that we defined in the MonadOps type class.1

data Tree sig a where
  Leaf :: a -> Tree sig a
  Node :: sig n b p r q ->
          Vec n (p -> Tree sig r) ->
          Option b (q -> Tree sig a) ->
          Tree sig a

A Leaf v represents the direct return of a value v.

On the other hand, Node op ps k is a command tree node which comprises:

  1. an operation op, described by signature sig (we summarize what sig is below); and
  2. a sequence of sub-continuation-command-trees ps, whose join point (if it exists) is given by
  3. a continuation-command-tree k.

The Tree type is is strongly typed yet agnostic to what the underlying notion of command is. To balance this tension between typedness and agnosticism, Tree requires a signature which takes the form of an indexed type. The type of a signature sig is:

sig :: Nat -> Bool -> * -> * -> * -> *

A signature instance sig n b p r q tells us:

  • What the command is and what its parameters are (this goes into the definition of sig itself, and we illustrate this in the Cmd data type that we introduce further below);
  • How many sub-continuations the command has (represented as a natural number index n :: Nat).
  • Whether the command has a continuation or not (represented as an index b :: Boolean where False represents no continuation).
  • Each sub-continuation has an input type index p and a return type index r (both Haskell types).
  • And, finally, the command return type, and hence continuation parameter type, is given by the type index q (a Haskell type).

Tree makes use of a length-indexed vector type, Vec which we borrow from the vec library on Hackage, and makes use of the following Option type for representing an optional continuation:

data Option :: Bool -> * -> * where
  Some :: a -> Option 'True  a
  None ::      Option 'False a

Composing Command Trees with Join Points

Tree has a notion of monadic bind which lets us compose two command trees t1 and t2 by concatenating t2 to all of the continuation branches of t1:

instance Monad (Tree sig) where
  Leaf x       >>= g = g x
  Node op ps k >>= g =
    Node op ps (fmap (\ k' x -> k' x >>= g) k)

Note that monadic bind only concatenates in the continuation k of Node op ps k, because the sub-continuations (ps) represent computations that can dispatch to the join point given by k.

Before we move on to the actual steps involved in the construction of our compiler, we introduce a few functions which will come in handy later.

We can fold over a Tree to interpret, compile, or transform the tree:

foldT :: (forall a. a -> f a) ->
         (forall a n b p r q.
          sig n b p r q -> Vec n (p -> f r) -> Option b (q -> f a) ->
          f a) ->
         forall a. Tree sig a -> f a
foldT gen _   (Leaf x)       = gen x
foldT gen alg (Node op cs k) =
  alg op (fmap (\ c  -> foldT gen alg . c) cs)
         (fmap (\ k' -> foldT gen alg . k') k)

And, given a command signature instance and a matching list of sub-continuations, we can “lift” these into a command tree. We give separate lifting functions for commands that have a continuation (liftT), and commands that do not (liftF):

liftT :: sig n 'True p r q -> Vec n (p -> Tree sig r) -> Tree sig q
liftT op ps = Node op ps (Some Leaf)
liftF :: sig n 'False p r q -> Vec n (p -> Tree sig r) -> Tree sig q
liftF op ps = Node op ps None

Interlude: Command Trees With Join Points are a Free Monad

Tree is a specialization of the standard notion of Free monad (Swierstra 2008).2 The reasons we choose to implement and use Tree instead of encoding it in terms of Free are:

  1. Tree is “freer” than Free, in the same sense as the Freer monad3 (Kiselyov and Ishii 2015) is freer than Free: the monad instance for Tree sig above does not require sig to be a functor. The standard notion of free monad is subject such requirements. We recall its definition below.
  2. Tree can be implemented in dependently-typed languages like Coq or Agda, whereas Free cannot because Free is not a strictly-positive type.
  3. The Tree data type provides a direct representation of the flavor of command trees that we consider. The encoding in terms of Free is less direct.

To illustrate how Tree is a specialization of the standard notion of a Free monad, we first recall the standard notion of a Free monad:4

data Free f a = Pure a
              | Impure (f (Free f a))

If f is functorial, then Free f is functorial and monadic:

instance Functor f => Functor (Free f) where
  fmap f (Pure a)   = Pure (f a)
  fmap f (Impure g) = Impure (fmap (fmap f) g)
instance Functor f => Monad (Free f) where
  return = Pure
  Pure x   >>= g = g x
  Impure f >>= g = Impure (fmap (\ k' -> k' >>= g) f)

To show how Tree is a specialization of Free, consider the following J functor:

data J sig a where
  J :: sig n b p r q ->
       Vec n (p -> Free (J sig) r) ->
       Option b (q -> a) ->
       J sig a

instance Functor (J sig) where
  fmap f (J op subs k) = J op subs (fmap (\ k' -> f . k') k)

Using J:

type Tree' sig a = Free (J sig) a

If we unfold the definition of J in the data type Free (J sig) a, and if we unfold the definition of fmap in the monadic bind of Monad (Free (J sig)), we see that Tree sig a and Tree' sig a are, in fact, equivalent.

Step 1: A Signature for Commands for Arithmetic Expressions

Having introduced a notion of command trees with join points, we are ready for the first step of constructing our compiler. In this step, we define a signature data type for a notion of command that allows us to instantiate the MonadOps interface from above with Tree as the target monad, instead of the identity monad we used for interpretation earlier. This signature data type and MonadOps instance effectively instantiates denote as a compiler from Expressions to command Trees. In a subsequent step (further below), we compile command trees to instruction sequences with explicitly named registers.

In order to compile to command trees, we introduce a command for each operation in MonadOps:

data Cmd v :: Nat -> Bool -> * -> * -> * -> * where
  Intv :: Int ->    Cmd v 'Z 'True Void Void v
  Addv :: v -> v -> Cmd v 'Z 'True Void Void v

We use 'Z to indicate that a command has zero sub-continuations; and Void is the uninhabited type in Haskell, used here as parameter and return type of the non-existent sub-continuations of Intv and Addv.

Using Cmd v, we get a command tree MonadOps instance for free:

instance MonadOps (Tree (Cmd v)) v where
  intv i     = liftT (Intv i) VNil
  addv i1 i2 = liftT (Addv i1 i2) VNil

We also get for free a compiler from expressions into command trees:

compT :: Expr -> Tree (Cmd v) v
compT = denote

We can run command trees by folding over them and mapping each command to its corresponding monadic operation:

runT :: Expr -> Int
runT =
  runIdentity .
  foldT return
        (\ op ps k ->
           case (op, ps, k) of
             (Intv i, VNil, Some k) ->
               return i >>= k
             (Addv i1 i2, VNil, Some k) ->
               return (i1 + i2) >>= k
        ) .
  compT

Here the cases in foldT map each command back into the monad we used to instantiate MonadOps in our interpreter from earlier.

This concludes our first step towards obtaining a compiler that corresponds to the interpreter we gave above. Command trees are not quite instructions yet: they chain together sequences of commands, similarly to how instructions need to be chained, but they do so by relying on Haskell functions rather than explicitly-named registers.

Step 2. Compiling Command Trees to Instructions

The second step of our derivation compiles command trees into sequences of instructions that operate on explicitly-named registers. We use strings for register names:

type Reg = String

The Instr type encodes sequences of instructions:

data Instr (sig :: Nat -> Bool -> * -> * -> * -> *) where
  EndI :: Reg -> Instr sig
  BndI :: sig n b p r q -> Reg -> Instr sig -> Instr sig

The reason we chose the operations in MonadOps the way we did when we defined the type class above, was to correspond closely to the target language that we would be compiling to. In fact, we chose MonadOps to correspond so closely to the target language that we compile to that we shall reuse the signature of the commands that we instantiated MonadOps with in our command tree compiler above. Thus BndI op r c represents a command sequence where the command op :: sig n b p r q is the initial instruction in the sequence which, when run, will yield a value that will be bound to the register named r, and thereby made available to the remaining instruction sequence c.

Before we show how to generate instruction sequences, we introduce a monad Gen for generating fresh register names:

type Gen = State Int

Here, State refers to a standard state monad.

fresh generates a fresh register name:

fresh :: Gen Reg
fresh = do
  i <- get
  put (i + 1)
  return ("r" ++ show i)

instantiate k uses fresh to generate a new register name, and passes that name to k:

instantiate :: (Reg -> Gen b) -> Gen (Reg, b)
instantiate k = do
  r  <- fresh
  k' <- k r
  return (r, k')

Using fresh and instantiate, we define a compilation function compI from command trees to instructions, where commands are parameterized by register names rather than values (Cmd Reg):

compI :: Tree (Cmd Reg) Reg -> Gen (Instr (Cmd Reg))
compI (Leaf x) = return (EndI x)
compI (Node op@(Intv _) VNil (Some k)) = do
  (r, c) <- instantiate (compI . k)
  return (BndI op r c)
compI (Node op@(Addv _ _) VNil (Some k)) = do
  (r, c) <- instantiate (compI . k)
  return (BndI op r c)

Note that compI does nothing interesting: it merely applies the continuations in the command trees in a way that makes explicit where register names are bound and referenced.

We have thus obtained a compiler for free, by using a free monad as an intermediate representation:

comp :: Expr -> Instr (Cmd Reg)
comp = fst . flip runState 0 . compI . denote

The instructions that we produce can be further processed and compiled to do proper register allocation, and/or further translated to formats that target a specific back-end (e.g., machine code or a virtual machine). For the purpose of this blog post, we do not pursue this further, but instead present a simple “machine” that we will use to informally argue that the compiler is correct.

First, we introduce a function for folding over instruction sequences:

foldI :: (Reg -> a) ->
         (forall n b p r q. sig n b p r q -> Reg -> a -> a) ->
         Instr sig -> a
foldI gen _   (EndI x)      = gen x
foldI gen alg (BndI op r c) = alg op r (foldI gen alg c)

Using this function, we can implement a “machine” which uses a standard reader monad for accumulating explicit register-bindings that associate a register name with an integer value (i.e., as pairs (Reg, Int)):

runI :: Expr -> Int
runI =
  flip runReader []  .
  foldI (\ r -> do nv <- ask; return (fromJust (lookup r nv)))
        (\ op r c ->
           case op of
             Intv i ->
               local (\ nv -> (r, i) : nv) c
             Addv r1 r2 -> do
               nv <- ask
               let i1 = fromJust (lookup r1 nv)
                   i2 = fromJust (lookup r2 nv)
               local (\ nv -> (r, i1 + i2) : nv) c) .
  comp

The monad that is being wrapped and used internally in the runI function is the same monad as we used to define our interpretation function (Identity, for our simple arithmetic expression language). The only difference between runI and the runT function we defined earlier, is that runT used Haskell functions and Haskell function application to represent continuations, whereas runI uses explicit register bindings and a reader monad. Besides this difference, the cases in the application of foldI in runI above essentially map each command back into the monad we used to instantiate MonadOps in our interpreter from earlier. Just like runT did.

This concludes the derivation of our first compiler for arithmetic expressions.

Conditional Expressions

Next, we extend our expression language with conditional expressions.

Syntax and Monad for Conditional Expressions

The Expr' data type below is an extension of Expr with two new constructs:

data Expr' = Val' Int | Add' Expr' Expr'
           | Leq' Expr' Expr' | Ifte' Expr' Expr' Expr'

Leq' e1 e2 is a less-than-or-equal expression which returns a Boolean value; and Ifte' e e1 e2 is an if-then-else expression that expects e to be Boolean-value-producing.

We shall compile this language into a register-based target instruction language with labels and labeled jumps. The monadic operations that we shall use to give a denotation function for Expr's include an operation for introducing a forward-referenceable label (lbl'), and an operation for conditionally jumping to a label (cjmp'):5

class Monad m => MonadOps' m lbl val | m -> lbl, m -> val where
  intv'   :: Int -> m val
  addv'   :: val -> val -> m val
  leqv'   :: val -> val -> m val
  cjmp'   :: val -> lbl -> m val
  lbl'    :: (lbl -> m val) -> m val

To deal with join points, we also add operations for creating (tmp'), mutating (set'), and getting (get') the value of temporaries; i.e., mutable cells (or registers) used to store the value that an expression returns.

  tmp'    :: m val
  set'    :: val -> val -> m val
  get'    :: val -> m val

At this point, it is reasonable to ask: are we still implementing a “high-level semantics”? The operations above are, after all, quite low-level, imperative, and specific to the language that our compiler will be targeting.

We claim that the semantics we shall define by means of a denotation function for Expr' can still be “high-level”. Denotation functions are not required to be defined in terms of these operations directly. Rather, we can build abstractions defined in terms of the operations. For example, we can introduce syntactic sugar for a high-level if-then-else construct (ifte'):

  ifte' :: val -> m val -> m val -> m val
  ifte' v b1 b2 = do
    vt <- tmp'
    lbl' (\ ljoin -> do
      lbl' (\ lthen -> do
        cjmp' v lthen
        v <- b2
        set' vt v
        jmp' ljoin)
      v <- b1
      set' vt v)
    get' vt

Here, ifte' makes use of the following syntactic sugar for unconditional jumps (jmp'):

  jmp' :: lbl -> m val
  jmp' l = do
    vz <- intv' 0
    vt <- leqv' vz vz
    cjmp' vt l

Note that jmp' is a very inefficient way of doing an unconditional, labeled jump. We desugar jmp' in this way to keep the MonadOps' interface as small as possible. It is easy to extend MonadOps' with an additional operation for unconditional jumps.

Intermezzo: Language Design and Low-Level Details

It is reasonable to ask: why are we baking low-level details into the monad at all? There is a wide range of alternative ways we could have chosen to factor the type class. (For example, a type class that declares intv', addv', leqv', and ifte'.)

We argue that low-level details such as control-flow and memory management are essential aspects of language design, and completely hiding these details in the monad obscures this essence. We classify language design as the design of the Expr data type and the denote function; and as meta-language design the MonadOps type class, its instances, and the data types and functions required to create the instances. Given this classification, we subscribe to the following:

  • Language designers should have control over essential aspects of language design, such as low-level control-flow and memory management aspects.
  • Meta-language designers should cater to the needs of language designers by providing declarative abstractions that strike a balance between exposing enough detail, and providing declarative abstractions that language designers and language users can intuit.

The approach we illustrate in this post exposes to language designers what the underlying instructions are that we compile into when necessary, and in a way that denotation functions read and write similarly to a definitional interpreter. For the purpose of exposition, we focus on two extremely simple languages that have little in the way of interesting memory operations, and only the simplest kind of interesting control-flow (conditional expressions). Yet we believe these languages illustrate that command trees are a useful intermediate representation which strikes a useful balance between delarativity and detail. We conjecture, and intend to explore in the future, how command trees scale to define compilers for more interesting languages in a way that is consistent with the view on language design summarized above, and in a way that compilers follow “for free” from a high-level denotation function that exposes essential details but hides irrelevant ones.

Denotation Function for Conditional Expressions

We return to the task of giving a denotation function in terms of MonadOps'. The following defines a denotation function for our language with arithmetic expressions and conditional expressions:

denote' :: MonadOps' m lbl val =>
           Expr' -> m val
denote' (Val' i) = intv' i
denote' (Add' e1 e2) = do
  v1 <- denote' e1
  v2 <- denote' e2
  addv' v1 v2
denote' (Leq' e1 e2) = do
  v1 <- denote' e1
  v2 <- denote' e2
  leqv' v1 v2
denote' (Ifte' e e1 e2) = do
  v <- denote' e
  ifte' v (denote' e1) (denote' e2)

An Interpreter for Conditional Expressions

As before, we can instantiate MonadOps' to provide an interpreter for our language. This interpreter provides a kind of “reference semantics” for our language. We will only argue informally that the compiler we present later respects this reference semantics. Formal reasoning is beyond the scope of this post.

The monadic operations that we used for MonadOps require jumps and state. To encode these concepts in an interpreter, we resort to a standard continuation monad transformer (ContT), a standard state monad (State), and a notion of tagged values (V) that distinguishes integer values, Boolean values, and temporaries (we use a String to represent the name of a temporary):

type Temp = String
data V = IntV Int | BoolV Bool | TmpV Temp

The MonadOps' instance that uses continuations for modeling jumps, and state for modeling temporaries, is given below:

instance MonadOps' (ContT V (State [(Temp, V)])) (V -> State [(Temp, V)] V) V where
  intv' i = return (IntV i)
  addv' (IntV i1) (IntV i2) = return (IntV (i1 + i2))
  leqv' (IntV i1) (IntV i2) = return (BoolV (i1 <= i2))
  cjmp' (BoolV True) k = withContT (\ _ _ -> k (IntV 0)) (return (IntV 0))
  cjmp' (BoolV False) _ = return (IntV 0)
  lbl' k = withContT (\ l _ -> runContT (k l) l) (return (IntV 0))
  tmp' = do
    s <- get
    let t = "t" ++ show (length s)
    modify (\ s -> (t, IntV 0) : s)
    return (TmpV t)
  set' (TmpV t) v = do
    modify (\ s -> (t, v) : s)
    return v
  get' (TmpV t) = do
    s <- get
    return (fromJust (lookup t s))

Here, labels are represented as continuations, and jumping to a label is a matter of invoking a “label continuation”.

Using the instance of MonadOps' above, our interpreter is:

interp' :: Expr' -> V
interp' =
  fst .
  flip runState [] .
  flip runContT return .
  denote'

Modus Operandi, Again

As with the compiler we built for arithmetic expressions, our compiler for conditional expressions will be built in two steps:

  1. We provide a signature for commands for conditional expressions, and get (for free) a compiler from Expr' into command trees.
  2. We provide a compilation function from command trees to labeled instructions.

For each of the two steps, we provide a semantics for the intermediate and target representations, and discuss how these semantics relate to the reference semantics we gave in interp' above.

Step 1: A Signature for Commands

The signature for commands is given below, and corresponds to the type signatures for the corresponding commands in the MonadOps' type class we gave above:

data Cmd' v l :: Nat -> Bool -> * -> * -> * -> * where
  Intv' :: Int ->    Cmd' v l 'Z      'True  Void Void v
  Addv' :: v -> v -> Cmd' v l 'Z      'True  Void Void v
  Leqv' :: v -> v -> Cmd' v l 'Z      'True  Void Void v
  Cjmp' :: v -> l -> Cmd' v l 'Z      'True  Void Void v
  Lbl'  ::           Cmd' v l ('S 'Z) 'True  l    v    v
  Tmp'  ::           Cmd' v l 'Z      'True  Void Void v
  Set'  :: v -> v -> Cmd' v l 'Z      'True  Void Void v
  Get'  :: v ->      Cmd' v l 'Z      'True  Void Void v

The only command that has a sub-continuation (indicated by the natural number ('S 'Z)) is the Lbl' command, corresponding to the lbl' operation in MonadOps' which has a monadic computation as an argument.

The close correspondence between Cmd' and MonadOps' affords the following instance of MonadOps':

instance MonadOps' (Tree (Cmd' v l)) l v where
  intv' i = liftT (Intv' i) VNil
  addv' v1 v2 = liftT (Addv' v1 v2) VNil
  leqv' v1 v2 = liftT (Leqv' v1 v2) VNil
  cjmp' v l = liftT (Cjmp' v l) VNil
  lbl' k = liftT Lbl' (k ::: VNil)
  tmp' = liftT Tmp' VNil
  set' vr v = liftT (Set' vr v) VNil
  get' vr = liftT (Get' vr) VNil

The following compilation function from expressions to command trees follows for free:

compT' :: Expr' -> Tree (Cmd' v l) v
compT' = denote'

We can execute command trees for conditional expressions by mapping each command onto the corresponding monad operation that we gave in the MonadOps' instance for interp' above; just like we did for arithmetic expressions above:

runT' :: Expr' -> V
runT' =
  fst .
  flip runState [] .
  flip runContT return .
  foldT return
        (\ op ps k ->
           case (op, ps, k) of
             (Intv' i, VNil, Some k) ->
               return (IntV i) >>= k
             (Addv' (IntV i1) (IntV i2), VNil, Some k) ->
               return (IntV (i1 + i2)) >>= k
             (Leqv' (IntV i1) (IntV i2), VNil, Some k) ->
               return (BoolV (i1 <= i2)) >>= k
             (Cjmp' (BoolV True) l, VNil, Some k) ->
               withContT (\ _ _ -> l (IntV 0)) (return (IntV 0)) >>= k
             (Cjmp' (BoolV False) _, VNil, Some k) ->
               return (IntV 0) >>= k
             (Lbl', k0 ::: VNil, Some k) ->
               withContT (\ l _ -> runContT (k0 l) l) (return (IntV 0)) >>= k
             (Tmp', VNil, Some k) -> do
               s <- get
               let t = "t" ++ show (length s)
               modify (\ s -> (t, IntV 0) : s)
               k (TmpV t)
             (Set' (TmpV t) v, VNil, Some k) -> do
               modify (\ s -> (t, v) : s)
               k v
             (Get' (TmpV t), VNil, Some k) -> do
               s <- get
               k (fromJust (lookup t s))) .
  compT'

Step 2: Compiling Command Trees

In our second step, we compile command trees into sequences of labeled instructions. We use strings as labels:

type Label = String

Furthermore, we refine our notion of instruction sequence from earlier, to compose sequences of instructions that may or may not yield a result (sequenced using BndI' and SeqI' as defined below):

data Instr' (sig :: Nat -> Bool -> * -> * -> * -> *) where
  EndI' :: Reg -> Instr' sig
  BndI' :: sig n b p r q -> Reg -> Instr' sig -> Instr' sig
  SeqI' :: sig n b p r q ->        Instr' sig -> Instr' sig
  LblI' :: Label ->                Instr' sig -> Instr' sig

We also introduce a notion of concatenation for instructions which our compiler will make use of:

concatI' :: Instr' sig -> Instr' sig -> Instr' sig
concatI' (EndI' _)       c  = c
concatI' (BndI' op r c1) c2 = BndI' op r (concatI' c1 c2)
concatI' (SeqI' op c1)   c2 = SeqI' op   (concatI' c1 c2)
concatI' (LblI' l  c1)   c2 = LblI' l    (concatI' c1 c2)

As for arithmetic expressions, the main purpose of our compiler for conditional expressions is to make bindings explicit. Unlike the instructions we had for arithmetic expressions, instructions for conditional expressions bind both register names and label names. We use a monad for generating fresh names of either kind:

type Gen' = State (Int, Int)

And introduce functions for generating fresh names (their implementation is obvious and omitted):

freshr :: Gen' Reg
freshl :: Gen' Reg

instantiate' k uses fresh to generate a new register name, and passes that name to k, just as instrantiate from earlier did:

instantiate' :: (Reg -> Gen' b) -> Gen' (Reg, b)
instantiate' k = do
  r  <- freshr
  k' <- k r
  return (r, k')

Without further ado, the heart of our compiler, which translates command trees to instruction sequences:

compI' :: Tree (Cmd' Reg Label) Reg ->
          Gen' (Instr' (Cmd' Reg Label))
compI' (Leaf x) = return (EndI' x)
compI' (Node op@(Intv' _) VNil (Some k)) = do
  (r, c) <- instantiate' (compI' . k)
  return (BndI' op r c)
compI' (Node op@(Addv' _ _) VNil (Some k)) = do
  (r, c) <- instantiate' (compI' . k)
  return (BndI' op r c)
compI' (Node op@(Leqv' _ _) VNil (Some k)) = do
  (r, c) <- instantiate' (compI' . k)
  return (BndI' op r c)
compI' (Node op@(Cjmp' _ _) VNil (Some k)) = do
  (_, c) <- instantiate' (compI' . k)
  return (SeqI' op c)
compI' (Node Lbl' (k0 ::: VNil) (Some k)) = do
  l <- freshl
  c0 <- compI' (k0 l)
  (_, c ) <- instantiate' (compI' . k )
  return (concatI' c0 (LblI' l c))
compI' (Node op@Tmp' VNil (Some k)) = do
  (r, c) <- instantiate' (compI' . k)
  return (BndI' op r c)
compI' (Node op@(Set' _ _) VNil (Some k)) = do
  (r, c) <- instantiate' (compI' . k)
  return (BndI' op r c)
compI' (Node op@(Get' _) VNil (Some k)) = do
  (r, c) <- instantiate' (compI' . k)
  return (BndI' op r c)

The only case of the compiler that does something slightly interesting is the case for the Lbl' instruction, which concatenates a sub-continuation with a (labeled) continuation.

Our (free) compiler is thus:

comp' :: Expr' -> Instr' (Cmd' Reg Label)
comp' = fst . flip runState (0, 0) . compI' . denote'

Before we wrap up and discuss the takeaways from what we have seen in this post, we illustrate how the semantics of compiled instruction sequences corresponds rather systematically to the “reference semantics” in interp' from earlier. The main difference is how labels and jumps are treated. Since we now have a first-order representation, we do not use a continuation monad for dealing with jumps, but instead actually look up a code label in an instruction sequence, using the following resolveI' function:

resolveI' :: Instr' sig -> Label -> Instr' sig
resolveI' (BndI' _ _ c) l = resolveI' c l
resolveI' (SeqI' _   c) l = resolveI' c l
resolveI' (LblI' l'  c) l | l == l' = c
resolveI' (LblI' _   c) l = resolveI' c l

The other difference is that the cases of the execI' function below use registers for explicit memory. These differences aside, execI' corresponds to the “reference semantics” in interp', in the sense that each signature command is mapped onto the corresponding monad operation that we gave in the MonadOps' instance for interp' above.

execI' :: Instr' (Cmd' Reg Label) ->
          ReaderT (Instr' (Cmd' Reg Label)) (State [(Temp, V)]) V
execI' (EndI' r) = do
  s <- get
  return (fromJust (lookup r s))
execI' (BndI' (Intv' i) r c) = do
  modify (\ s -> (r, IntV i) : s)
  execI' c
execI' (BndI' (Addv' r1 r2) r c) = do
  s <- get
  let IntV i1 = fromJust (lookup r1 s)
      IntV i2 = fromJust (lookup r2 s)
  modify (\ s -> (r, IntV (i1 + i2)) : s)
  execI' c
execI' (BndI' (Leqv' r1 r2) r c) = do
  s <- get
  let IntV i1 = fromJust (lookup r1 s)
      IntV i2 = fromJust (lookup r2 s)
  modify (\ s -> (r, BoolV (i1 <= i2)) : s)
  execI' c
execI' (SeqI' (Cjmp' r l) c) = do
  s <- get
  let BoolV b = fromJust (lookup r s)
  case b of
    True  -> do
      p <- ask
      let k = resolveI' p l
      execI' k
    False ->
      execI' c
execI' (LblI' _ c) = execI' c
execI' (BndI' Tmp' r c) = do
  s <- get
  let t = "t" ++ show (length s)
  modify (\ s -> (r, TmpV t) : (t, IntV 0) : s)
  execI' c
execI' (BndI' (Set' r1 r2) r c) = do
  s <- get
  let TmpV t = fromJust (lookup r1 s)
      v      = fromJust (lookup r2 s)
  modify (\ s -> (r, v) : (t, v) : s)
  execI' c
execI' (BndI' (Get' rt) r c) = do
  s <- get
  let TmpV t = fromJust (lookup rt s)
      v      = fromJust (lookup t  s)
  modify (\ s -> (r, v) : s)
  execI' c
runI' :: Expr' -> V
runI' e =
  let i = comp' e
  in (fst .
      flip runState [] .
      flip runReaderT i .
      execI')
     i

Conclusion

We have presented an approach to deriving a compiler from a high-level semantics. We have illustrated the approach by building two compilers: one for arithmetic expressions, and one for arithmetic and conditional expressions. The key idea of the approach is to use denotation functions to define a high-level semantics, defined as a mapping of object language expressions into monadic operations. By choosing monadic operations in a way that they are low-level enough to correspond closely with the target instruction set, compilers follow for free from denotation functions. We argue that the low-level nature of the monadic operations that our approach relies on does not compromise the high-level nature of denotation functions, because we can build high-level abstractions on top of the low-level monadic operations, and use the high-level abstractions in our denotation functions.

References

Kiselyov, Oleg, and Hiromi Ishii. 2015. “Freer Monads, More Extensible Effects.” In Proceedings of the 8th ACM SIGPLAN Symposium on Haskell, Haskell 2015, Vancouver, BC, Canada, September 3-4, 2015, edited by Ben Lippmeier, 94–105. ACM. https://doi.org/10.1145/2804302.2804319.
Swierstra, Wouter. 2008. “Data Types à La Carte.” J. Funct. Program. 18 (4): 423–36. https://doi.org/10.1017/S0956796808006758.