{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE StandaloneDeriving #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE QuantifiedConstraints #-}
{-# LANGUAGE UndecidableInstances #-}
{-# LANGUAGE RankNTypes #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE FunctionalDependencies #-}
{-# OPTIONS -W #-}
import Control.Monad
import Control.Monad.Identity
import Control.Monad.State
import Control.Monad.Reader
import Control.Monad.Cont
import Data.Vec.Lazy (Vec (VNil, (:::)))
import Data.Nat
import Data.Void
import Data.Maybe
Compilers for Free
DRAFT
This blog post is a draft. Please contact me if you spot errors or have feedback. If you are interested in applying the idea that the post describes, or know of related work based on similar ideas as the approach that this post describes, I would also love to hear from you.
Abstract
We present an approach to deriving a compiler from a high-level semantics. We illustrate the approach by implementing two compilers: one for arithmetic expressions, and one for arithmetic and conditional expressions. The programs we present are implemented in Haskell.
The contents of this blog post is based on joint work with Andrew Tolmach, Eelco Visser, Bram Crielaard, and Chiel Bruin.
Contents
-
1 Arithmetic Expressions
- 1.1 Syntax, Monad, and Denotation
- 1.2 An Interpreter for Arithmetic Expressions
- 1.3 Modus Operandi for Deriving a Compiler
- 1.4 Command Trees With Join Points
- 1.5 Composing Command Trees with Join Points
- 1.6 Interlude: Command Trees With Join Points are a Free Monad
- 1.7 Step 1: A Signature for Commands for Arithmetic Expressions
- 1.8 Step 2. Compiling Command Trees to Instructions
-
2 Conditional Expressions
- 2.1 Syntax and Monad for Conditional Expressions
- 2.2 Intermezzo: Language Design and Low-Level Details
- 2.3 Denotation Function for Conditional Expressions
- 2.4 An Interpreter for Conditional Expressions
- 2.5 Modus Operandi, Again
- 2.6 Step 1: A Signature for Commands
- 2.7 Step 2: Compiling Command Trees
- 3 Conclusion
Arithmetic Expressions
We start by considering how to compile arithmetic expressions.
Syntax, Monad, and Denotation
The syntax of arithmetic expressions is given by the following data type:
data Expr = Val Int | Add Expr Expr
The key idea behind our approach is to specify the semantics as a monadic denotation function which folds over the abstract syntax tree of our object language, and yields as output a monadic computation.
To illustrate the approach we shall define a monadic denotation function for arithmetic expressions. The monad we shall use to give meaning to arithmetic expressions will include the monadic operations given by the following type class:
class Monad m => MonadOps m val where
intv :: Int -> m val
addv :: val -> val -> m val
The monadic operations are chosen to closely correspond to the side-effectful operations that the machine that our compiler shall be targeting will interpret. The machine we shall target is register-based, and will execute sequences of instructions. In particular, the machine will know how to interpret instructions with a type signature akin to the following (we leave the actual notion of instruction abstract for now):
inti :: Int -> Reg -> Instr
addi :: Reg -> Reg -> Reg -> Instr
inti i r
binds integer value i
to register r
, and addi r1 r2 r
adds the integer values stored in registers r1
and r2
, and stores the result in r
.
Without further ado; the denotation function:
denote :: MonadOps m val =>
Expr -> m val
Val i) = intv i
denote (Add e1 e2) = do
denote (<- denote e1
v1 <- denote e2
v2 addv v1 v2
We will illustrate how we can instantiate the MonadOps
interface to yield first a definitional interpreter, and subsequently a definitional compiler.
The definitional compiler is given by mapping each monadic operation to underlying low-level instructions.
The simplicitly of this mapping is what motivates the name of this blog post: the compiler comes “for free”.
An Interpreter for Arithmetic Expressions
It is straightforward to instantiate the MonadOps
interface with the identity monad to yield an interpreter for our language:
instance MonadOps Identity Int where
= return . id
intv = return (i1 + i2) addv i1 i2
interp :: Expr -> Int
= runIdentity . denote interp
Modus Operandi for Deriving a Compiler
To derive a compiler from the high-level denotation function we introduced above, we proceed in three steps:
- We introduce a language-independent notion command trees with join points, which forms a free monad.
- We provide a signature for commands for our arithmetic expressions.
- We provide a compilation function from command trees to instructions.
Command Trees With Join Points
We introduce a data type of command trees with join points. While the simple arithmetic expression language does not require a notion of join point, we still illustrate our approach in terms of a data type that supports it. Later in this post, we will see how command trees are useful for compiling conditional expressions that do have a notion of join point.
A command tree is a syntactic representation for possible-sequences of commands, and a command is a syntactic representation of a computational operation, such as the operations that we defined in the MonadOps
type class.1
data Tree sig a where
Leaf :: a -> Tree sig a
Node :: sig n b p r q ->
Vec n (p -> Tree sig r) ->
Option b (q -> Tree sig a) ->
Tree sig a
instance Functor (Tree sig) where
fmap f (Leaf x) = Leaf (f x)
fmap f (Node op cs k) = Node op cs (fmap (\ c -> fmap f . c) k)
instance Applicative (Tree sig) where
pure = Leaf
<*>) = ap (
A Leaf v
represents the direct return of a value v
.
On the other hand, Node op ps k
is a command tree node which comprises:
- an operation
op
, described by signaturesig
(we summarize whatsig
is below); and - a sequence of sub-continuation-command-trees
ps
, whose join point (if it exists) is given by - a continuation-command-tree
k
.
The Tree
type is is strongly typed yet agnostic to what the underlying notion of command is.
To balance this tension between typedness and agnosticism, Tree
requires a sig
nature which takes the form of an indexed type.
The type of a signature sig
is:
sig :: Nat -> Bool -> * -> * -> * -> *
A signature instance sig n b p r q
tells us:
- What the command is and what its parameters are (this goes into the definition of
sig
itself, and we illustrate this in theCmd
data type that we introduce further below); - How many sub-continuations the command has (represented as a natural number index
n :: Nat
). - Whether the command has a continuation or not (represented as an index
b :: Boolean
whereFalse
represents no continuation). - Each sub-continuation has an input type index
p
and a return type indexr
(both Haskell types). - And, finally, the command return type, and hence continuation parameter type, is given by the type index
q
(a Haskell type).
Tree
makes use of a length-indexed vector type, Vec
which we borrow from the vec
library on Hackage, and makes use of the following Option
type for representing an optional continuation:
data Option :: Bool -> * -> * where
Some :: a -> Option 'True a
None :: Option 'False a
instance Functor (Option b) where
fmap f (Some x) = Some (f x)
fmap _ None = None
Composing Command Trees with Join Points
Tree
has a notion of monadic bind which lets us compose two command trees t1
and t2
by concatenating t2
to all of the continuation branches of t1
:
instance Monad (Tree sig) where
Leaf x >>= g = g x
Node op ps k >>= g =
Node op ps (fmap (\ k' x -> k' x >>= g) k)
Note that monadic bind only concatenates in the continuation k
of Node op ps k
, because the sub-continuations (ps
) represent computations that can dispatch to the join point given by k
.
Before we move on to the actual steps involved in the construction of our compiler, we introduce a few functions which will come in handy later.
We can fold over a Tree
to interpret, compile, or transform the tree:
foldT :: (forall a. a -> f a) ->
forall a n b p r q.
(-> Vec n (p -> f r) -> Option b (q -> f a) ->
sig n b p r q ->
f a) forall a. Tree sig a -> f a
Leaf x) = gen x
foldT gen _ (Node op cs k) =
foldT gen alg (fmap (\ c -> foldT gen alg . c) cs)
alg op (fmap (\ k' -> foldT gen alg . k') k) (
And, given a command signature instance and a matching list of sub-continuations, we can “lift” these into a command tree.
We give separate lifting functions for commands that have a continuation (liftT
), and commands that do not (liftF
):
liftT :: sig n 'True p r q -> Vec n (p -> Tree sig r) -> Tree sig q
= Node op ps (Some Leaf) liftT op ps
liftF :: sig n 'False p r q -> Vec n (p -> Tree sig r) -> Tree sig q
= Node op ps None liftF op ps
Interlude: Command Trees With Join Points are a Free Monad
Tree
is a specialization of the standard notion of Free
monad (Swierstra 2008).2
The reasons we choose to implement and use Tree
instead of encoding it in terms of Free
are:
Tree
is “freer” thanFree
, in the same sense as theFreer
monad3 (Kiselyov and Ishii 2015) is freer thanFree
: the monad instance forTree sig
above does not requiresig
to be a functor. The standard notion of free monad is subject such requirements. We recall its definition below.Tree
can be implemented in dependently-typed languages like Coq or Agda, whereasFree
cannot becauseFree
is not a strictly-positive type.- The
Tree
data type provides a direct representation of the flavor of command trees that we consider. The encoding in terms ofFree
is less direct.
To illustrate how Tree
is a specialization of the standard notion of a Free
monad, we first recall the standard notion of a Free
monad:4
data Free f a = Pure a
| Impure (f (Free f a))
If f
is functorial, then Free f
is functorial and monadic:
instance Functor f => Functor (Free f) where
fmap f (Pure a) = Pure (f a)
fmap f (Impure g) = Impure (fmap (fmap f) g)
instance Functor f => Applicative (Free f) where
pure = Pure
<*>) = ap (
instance Functor f => Monad (Free f) where
return = Pure
Pure x >>= g = g x
Impure f >>= g = Impure (fmap (\ k' -> k' >>= g) f)
To show how Tree
is a specialization of Free
, consider the following J
functor:
data J sig a where
J :: sig n b p r q ->
Vec n (p -> Free (J sig) r) ->
Option b (q -> a) ->
J sig a
instance Functor (J sig) where
fmap f (J op subs k) = J op subs (fmap (\ k' -> f . k') k)
Using J
:
type Tree' sig a = Free (J sig) a
If we unfold the definition of J
in the data type Free (J sig) a
, and if we unfold the definition of fmap
in the monadic bind of Monad (Free (J sig))
, we see that Tree sig a
and Tree' sig a
are, in fact, equivalent.
Step 1: A Signature for Commands for Arithmetic Expressions
Having introduced a notion of command trees with join points, we are ready for the first step of constructing our compiler.
In this step, we define a signature data type for a notion of command that allows us to instantiate the MonadOps
interface from above with Tree
as the target monad, instead of the identity monad we used for interpretation earlier.
This signature data type and MonadOps
instance effectively instantiates denote
as a compiler from Expr
essions to command Tree
s.
In a subsequent step (further below), we compile command trees to instruction sequences with explicitly named registers.
In order to compile to command trees, we introduce a command for each operation in MonadOps
:
data Cmd v :: Nat -> Bool -> * -> * -> * -> * where
Intv :: Int -> Cmd v 'Z 'True Void Void v
Addv :: v -> v -> Cmd v 'Z 'True Void Void v
We use 'Z
to indicate that a command has zero sub-continuations; and Void
is the uninhabited type in Haskell, used here as parameter and return type of the non-existent sub-continuations of Intv
and Addv
.
Using Cmd v
, we get a command tree MonadOps
instance for free:
instance MonadOps (Tree (Cmd v)) v where
= liftT (Intv i) VNil
intv i = liftT (Addv i1 i2) VNil addv i1 i2
We also get for free a compiler from expressions into command trees:
compT :: Expr -> Tree (Cmd v) v
= denote compT
We can run command trees by folding over them and mapping each command to its corresponding monadic operation:
runT :: Expr -> Int
=
runT .
runIdentity return
foldT ->
(\ op ps k case (op, ps, k) of
Intv i, VNil, Some k) ->
(return i >>= k
Addv i1 i2, VNil, Some k) ->
(return (i1 + i2) >>= k
.
) compT
Here the cases in foldT
map each command back into the monad we used to instantiate MonadOps
in our interpreter from earlier.
This concludes our first step towards obtaining a compiler that corresponds to the interpreter we gave above. Command trees are not quite instructions yet: they chain together sequences of commands, similarly to how instructions need to be chained, but they do so by relying on Haskell functions rather than explicitly-named registers.
Step 2. Compiling Command Trees to Instructions
The second step of our derivation compiles command trees into sequences of instructions that operate on explicitly-named registers. We use strings for register names:
type Reg = String
The Instr
type encodes sequences of instructions:
data Instr (sig :: Nat -> Bool -> * -> * -> * -> *) where
EndI :: Reg -> Instr sig
BndI :: sig n b p r q -> Reg -> Instr sig -> Instr sig
The reason we chose the operations in MonadOps
the way we did when we defined the type class above, was to correspond closely to the target language that we would be compiling to.
In fact, we chose MonadOps
to correspond so closely to the target language that we compile to that we shall reuse the sig
nature of the commands that we instantiated MonadOps
with in our command tree compiler above.
Thus BndI op r c
represents a command sequence where the command op :: sig n b p r q
is the initial instruction in the sequence which, when run, will yield a value that will be bound to the register named r
, and thereby made available to the remaining instruction sequence c
.
Before we show how to generate instruction sequences, we introduce a monad Gen
for generating fresh register names:
type Gen = State Int
Here, State
refers to a standard state monad.
fresh
generates a fresh register name:
fresh :: Gen Reg
= do
fresh <- get
i + 1)
put (i return ("r" ++ show i)
instantiate k
uses fresh
to generate a new register name, and passes that name to k
:
instantiate :: (Reg -> Gen b) -> Gen (Reg, b)
= do
instantiate k <- fresh
r <- k r
k' return (r, k')
Using fresh
and instantiate
, we define a compilation function compI
from command trees to instructions, where commands are parameterized by register names rather than values (Cmd Reg
):
compI :: Tree (Cmd Reg) Reg -> Gen (Instr (Cmd Reg))
Leaf x) = return (EndI x)
compI (Node op@(Intv _) VNil (Some k)) = do
compI (<- instantiate (compI . k)
(r, c) return (BndI op r c)
Node op@(Addv _ _) VNil (Some k)) = do
compI (<- instantiate (compI . k)
(r, c) return (BndI op r c)
Note that compI
does nothing interesting: it merely applies the continuations in the command trees in a way that makes explicit where register names are bound and referenced.
We have thus obtained a compiler for free, by using a free monad as an intermediate representation:
comp :: Expr -> Instr (Cmd Reg)
= fst . flip runState 0 . compI . denote comp
The instructions that we produce can be further processed and compiled to do proper register allocation, and/or further translated to formats that target a specific back-end (e.g., machine code or a virtual machine). For the purpose of this blog post, we do not pursue this further, but instead present a simple “machine” that we will use to informally argue that the compiler is correct.
First, we introduce a function for folding over instruction sequences:
foldI :: (Reg -> a) ->
forall n b p r q. sig n b p r q -> Reg -> a -> a) ->
(Instr sig -> a
EndI x) = gen x
foldI gen _ (BndI op r c) = alg op r (foldI gen alg c) foldI gen alg (
Using this function, we can implement a “machine” which uses a standard reader monad for accumulating explicit register-bindings that associate a register name with an integer value (i.e., as pairs (Reg, Int)
):
runI :: Expr -> Int
=
runI flip runReader [] .
-> do nv <- ask; return (fromJust (lookup r nv)))
foldI (\ r ->
(\ op r c case op of
Intv i ->
-> (r, i) : nv) c
local (\ nv Addv r1 r2 -> do
<- ask
nv let i1 = fromJust (lookup r1 nv)
= fromJust (lookup r2 nv)
i2 -> (r, i1 + i2) : nv) c) .
local (\ nv comp
The monad that is being wrapped and used internally in the runI
function is the same monad as we used to define our interpretation function (Identity
, for our simple arithmetic expression language).
The only difference between runI
and the runT
function we defined earlier, is that runT
used Haskell functions and Haskell function application to represent continuations, whereas runI
uses explicit register bindings and a reader monad.
Besides this difference, the cases in the application of foldI
in runI
above essentially map each command back into the monad we used to instantiate MonadOps
in our interpreter from earlier.
Just like runT
did.
This concludes the derivation of our first compiler for arithmetic expressions.
deriving instance Show v => Show (Cmd v n b p r q)
deriving instance Show (Instr (Cmd Reg))
= comp (Add (Add (Val 5) (Val 6)) (Add (Val 19) (Val 12)))
testI -- BndI (Intv 5) "r0" (BndI (Intv 6) "r1" (BndI (Addv "r0" "r1") "r2" (BndI (Intv 19) "r3" (BndI (Intv 12) "r4" (BndI (Addv "r3" "r4") "r5" (BndI (Addv "r2" "r5") "r6" (EndI "r6")))))))
Conditional Expressions
Next, we extend our expression language with conditional expressions.
Syntax and Monad for Conditional Expressions
The Expr'
data type below is an extension of Expr
with two new constructs:
data Expr' = Val' Int | Add' Expr' Expr'
| Leq' Expr' Expr' | Ifte' Expr' Expr' Expr'
Leq' e1 e2
is a less-than-or-equal expression which returns a Boolean value; and Ifte' e e1 e2
is an if-then-else expression that expects e
to be Boolean-value-producing.
We shall compile this language into a register-based target instruction language with labels and labeled jumps.
The monadic operations that we shall use to give a denotation function for Expr'
s include an operation for introducing a forward-referenceable label (lbl'
), and an operation for conditionally jumping to a label (cjmp'
):5
class Monad m => MonadOps' m lbl val | m -> lbl, m -> val where
intv' :: Int -> m val
addv' :: val -> val -> m val
leqv' :: val -> val -> m val
cjmp' :: val -> lbl -> m val
lbl' :: (lbl -> m val) -> m val
To deal with join points, we also add operations for creating (tmp'
), mutating (set'
), and getting (get'
) the value of temporaries; i.e., mutable cells (or registers) used to store the value that an expression returns.
tmp' :: m val
set' :: val -> val -> m val
get' :: val -> m val
At this point, it is reasonable to ask: are we still implementing a “high-level semantics”? The operations above are, after all, quite low-level, imperative, and specific to the language that our compiler will be targeting.
We claim that the semantics we shall define by means of a denotation function for Expr'
can still be “high-level”.
Denotation functions are not required to be defined in terms of these operations directly.
Rather, we can build abstractions defined in terms of the operations.
For example, we can introduce syntactic sugar for a high-level if-then-else construct (ifte'
):
ifte' :: val -> m val -> m val -> m val
= do
ifte' v b1 b2 <- tmp'
vt -> do
lbl' (\ ljoin -> do
lbl' (\ lthen
cjmp' v lthen<- b2
v
set' vt v
jmp' ljoin)<- b1
v
set' vt v) get' vt
Here, ifte'
makes use of the following syntactic sugar for unconditional jumps (jmp'
):
jmp' :: lbl -> m val
= do
jmp' l <- intv' 0
vz <- leqv' vz vz
vt cjmp' vt l
Note that jmp'
is a very inefficient way of doing an unconditional, labeled jump.
We desugar jmp'
in this way to keep the MonadOps'
interface as small as possible.
It is easy to extend MonadOps'
with an additional operation for unconditional jumps.
Intermezzo: Language Design and Low-Level Details
It is reasonable to ask: why are we baking low-level details into the monad at all?
There is a wide range of alternative ways we could have chosen to factor the type class.
(For example, a type class that declares intv'
, addv'
, leqv'
, and ifte'
.)
We argue that low-level details such as control-flow and memory management are essential aspects of language design, and completely hiding these details in the monad obscures this essence.
We classify language design as the design of the Expr
data type and the denote
function; and as meta-language design the MonadOps
type class, its instances, and the data types and functions required to create the instances.
Given this classification, we subscribe to the following:
- Language designers should have control over essential aspects of language design, such as low-level control-flow and memory management aspects.
- Meta-language designers should cater to the needs of language designers by providing declarative abstractions that strike a balance between exposing enough detail, and providing declarative abstractions that language designers and language users can intuit.
The approach we illustrate in this post exposes to language designers what the underlying instructions are that we compile into when necessary, and in a way that denotation functions read and write similarly to a definitional interpreter. For the purpose of exposition, we focus on two extremely simple languages that have little in the way of interesting memory operations, and only the simplest kind of interesting control-flow (conditional expressions). Yet we believe these languages illustrate that command trees are a useful intermediate representation which strikes a useful balance between delarativity and detail. We conjecture, and intend to explore in the future, how command trees scale to define compilers for more interesting languages in a way that is consistent with the view on language design summarized above, and in a way that compilers follow “for free” from a high-level denotation function that exposes essential details but hides irrelevant ones.
Denotation Function for Conditional Expressions
We return to the task of giving a denotation function in terms of MonadOps'
.
The following defines a denotation function for our language with arithmetic expressions and conditional expressions:
denote' :: MonadOps' m lbl val =>
Expr' -> m val
Val' i) = intv' i
denote' (Add' e1 e2) = do
denote' (<- denote' e1
v1 <- denote' e2
v2
addv' v1 v2Leq' e1 e2) = do
denote' (<- denote' e1
v1 <- denote' e2
v2
leqv' v1 v2Ifte' e e1 e2) = do
denote' (<- denote' e
v ifte' v (denote' e1) (denote' e2)
An Interpreter for Conditional Expressions
As before, we can instantiate MonadOps'
to provide an interpreter for our language.
This interpreter provides a kind of “reference semantics” for our language.
We will only argue informally that the compiler we present later respects this reference semantics.
Formal reasoning is beyond the scope of this post.
The monadic operations that we used for MonadOps
require jumps and state.
To encode these concepts in an interpreter, we resort to a standard continuation monad transformer (ContT
), a standard state monad (State
), and a notion of tagged values (V
) that distinguishes integer values, Boolean values, and temporaries (we use a String
to represent the name of a temporary):
type Temp = String
data V = IntV Int | BoolV Bool | TmpV Temp
deriving instance Show V
The MonadOps'
instance that uses continuations for modeling jumps, and state for modeling temporaries, is given below:
instance MonadOps' (ContT V (State [(Temp, V)])) (V -> State [(Temp, V)] V) V where
= return (IntV i)
intv' i IntV i1) (IntV i2) = return (IntV (i1 + i2))
addv' (IntV i1) (IntV i2) = return (BoolV (i1 <= i2))
leqv' (BoolV True) k = withContT (\ _ _ -> k (IntV 0)) (return (IntV 0))
cjmp' (BoolV False) _ = return (IntV 0)
cjmp' (= withContT (\ l _ -> runContT (k l) l) (return (IntV 0))
lbl' k = do
tmp' <- get
s let t = "t" ++ show (length s)
-> (t, IntV 0) : s)
modify (\ s return (TmpV t)
TmpV t) v = do
set' (-> (t, v) : s)
modify (\ s return v
TmpV t) = do
get' (<- get
s return (fromJust (lookup t s))
Here, labels are represented as continuations, and jumping to a label is a matter of invoking a “label continuation”.
Using the instance of MonadOps'
above, our interpreter is:
interp' :: Expr' -> V
=
interp' fst .
flip runState [] .
flip runContT return .
denote'
interp' :: Expr' -> V
=
interp' fst .
flip runState (tail [("", IntV 0)]) .
flip runContT return .
denote'
Modus Operandi, Again
As with the compiler we built for arithmetic expressions, our compiler for conditional expressions will be built in two steps:
- We provide a signature for commands for conditional expressions, and get (for free) a compiler from
Expr'
into command trees. - We provide a compilation function from command trees to labeled instructions.
For each of the two steps, we provide a semantics for the intermediate and target representations, and discuss how these semantics relate to the reference semantics we gave in interp'
above.
Step 1: A Signature for Commands
The signature for commands is given below, and corresponds to the type signatures for the corresponding commands in the MonadOps'
type class we gave above:
data Cmd' v l :: Nat -> Bool -> * -> * -> * -> * where
Intv' :: Int -> Cmd' v l 'Z 'True Void Void v
Addv' :: v -> v -> Cmd' v l 'Z 'True Void Void v
Leqv' :: v -> v -> Cmd' v l 'Z 'True Void Void v
Cjmp' :: v -> l -> Cmd' v l 'Z 'True Void Void v
Lbl' :: Cmd' v l ('S 'Z) 'True l v v
Tmp' :: Cmd' v l 'Z 'True Void Void v
Set' :: v -> v -> Cmd' v l 'Z 'True Void Void v
Get' :: v -> Cmd' v l 'Z 'True Void Void v
The only command that has a sub-continuation (indicated by the natural number ('S 'Z)
) is the Lbl'
command, corresponding to the lbl'
operation in MonadOps'
which has a monadic computation as an argument.
deriving instance (Show v, Show l) => Show (Cmd' v l n b p r q)
The close correspondence between Cmd'
and MonadOps'
affords the following instance of MonadOps'
:
instance MonadOps' (Tree (Cmd' v l)) l v where
= liftT (Intv' i) VNil
intv' i = liftT (Addv' v1 v2) VNil
addv' v1 v2 = liftT (Leqv' v1 v2) VNil
leqv' v1 v2 = liftT (Cjmp' v l) VNil
cjmp' v l = liftT Lbl' (k ::: VNil)
lbl' k = liftT Tmp' VNil
tmp' = liftT (Set' vr v) VNil
set' vr v = liftT (Get' vr) VNil get' vr
The following compilation function from expressions to command trees follows for free:
compT' :: Expr' -> Tree (Cmd' v l) v
= denote' compT'
compT' :: Expr' -> Tree (Cmd' V (V -> State [(Temp, V)] V)) V
= denote' compT'
We can execute command trees for conditional expressions by mapping each command onto the corresponding monad operation that we gave in the MonadOps'
instance for interp'
above; just like we did for arithmetic expressions above:
runT' :: Expr' -> V
=
runT' fst .
flip runState [] .
flip runContT return .
return
foldT ->
(\ op ps k case (op, ps, k) of
Intv' i, VNil, Some k) ->
(return (IntV i) >>= k
Addv' (IntV i1) (IntV i2), VNil, Some k) ->
(return (IntV (i1 + i2)) >>= k
Leqv' (IntV i1) (IntV i2), VNil, Some k) ->
(return (BoolV (i1 <= i2)) >>= k
Cjmp' (BoolV True) l, VNil, Some k) ->
(-> l (IntV 0)) (return (IntV 0)) >>= k
withContT (\ _ _ Cjmp' (BoolV False) _, VNil, Some k) ->
(return (IntV 0) >>= k
Lbl', k0 ::: VNil, Some k) ->
(-> runContT (k0 l) l) (return (IntV 0)) >>= k
withContT (\ l _ Tmp', VNil, Some k) -> do
(<- get
s let t = "t" ++ show (length s)
-> (t, IntV 0) : s)
modify (\ s TmpV t)
k (Set' (TmpV t) v, VNil, Some k) -> do
(-> (t, v) : s)
modify (\ s
k vGet' (TmpV t), VNil, Some k) -> do
(<- get
s lookup t s))) .
k (fromJust ( compT'
runT' :: Expr' -> V
=
runT' fst .
flip runState (tail [("", IntV 0)]) .
flip runContT return .
return
foldT ->
(\ op ps k case (op, ps, k) of
Intv' i, VNil, Some k) ->
(return (IntV i) >>= k
Addv' (IntV i1) (IntV i2), VNil, Some k) ->
(return (IntV (i1 + i2)) >>= k
Leqv' (IntV i1) (IntV i2), VNil, Some k) ->
(return (BoolV (i1 <= i2)) >>= k
Cjmp' (BoolV True) l, VNil, Some k) ->
(-> l (IntV 0)) (return (IntV 0)) >>= k
withContT (\ _ _ Cjmp' (BoolV False) _, VNil, Some k) ->
(return (IntV 0) >>= k
Lbl', k0 ::: VNil, Some k) ->
(-> runContT (k0 l) l) (return (IntV 0)) >>= k
withContT (\ l _ Tmp', VNil, Some k) -> do
(<- get
s let t = "t" ++ show (length s)
-> (t, IntV 0) : s)
modify (\ s TmpV t)
k (Set' (TmpV t) v, VNil, Some k) -> do
(-> (t, v) : s)
modify (\ s
k vGet' (TmpV t), VNil, Some k) -> do
(<- get
s lookup t s))) .
k (fromJust ( compT'
Step 2: Compiling Command Trees
In our second step, we compile command trees into sequences of labeled instructions. We use strings as labels:
type Label = String
Furthermore, we refine our notion of instruction sequence from earlier, to compose sequences of instructions that may or may not yield a result (sequenced using BndI'
and SeqI'
as defined below):
data Instr' (sig :: Nat -> Bool -> * -> * -> * -> *) where
EndI' :: Reg -> Instr' sig
BndI' :: sig n b p r q -> Reg -> Instr' sig -> Instr' sig
SeqI' :: sig n b p r q -> Instr' sig -> Instr' sig
LblI' :: Label -> Instr' sig -> Instr' sig
deriving instance (forall n b p r q. Show (sig n b p r q)) => Show (Instr' sig)
We also introduce a notion of concatenation for instructions which our compiler will make use of:
concatI' :: Instr' sig -> Instr' sig -> Instr' sig
EndI' _) c = c
concatI' (BndI' op r c1) c2 = BndI' op r (concatI' c1 c2)
concatI' (SeqI' op c1) c2 = SeqI' op (concatI' c1 c2)
concatI' (LblI' l c1) c2 = LblI' l (concatI' c1 c2) concatI' (
As for arithmetic expressions, the main purpose of our compiler for conditional expressions is to make bindings explicit. Unlike the instructions we had for arithmetic expressions, instructions for conditional expressions bind both register names and label names. We use a monad for generating fresh names of either kind:
type Gen' = State (Int, Int)
And introduce functions for generating fresh names (their implementation is obvious and omitted):
freshr :: Gen' Reg
= do
freshr <- get
(i, l) + 1, l)
put (i return ("r" ++ show i)
freshl :: Gen' Reg
= do
freshl <- get
(i, l) + 1)
put (i, l return ("l" ++ show l)
instantiate' k
uses fresh
to generate a new register name, and passes that name to k
, just as instrantiate
from earlier did:
instantiate' :: (Reg -> Gen' b) -> Gen' (Reg, b)
= do
instantiate' k <- freshr
r <- k r
k' return (r, k')
Without further ado, the heart of our compiler, which translates command trees to instruction sequences:
compI' :: Tree (Cmd' Reg Label) Reg ->
Gen' (Instr' (Cmd' Reg Label))
Leaf x) = return (EndI' x)
compI' (Node op@(Intv' _) VNil (Some k)) = do
compI' (<- instantiate' (compI' . k)
(r, c) return (BndI' op r c)
Node op@(Addv' _ _) VNil (Some k)) = do
compI' (<- instantiate' (compI' . k)
(r, c) return (BndI' op r c)
Node op@(Leqv' _ _) VNil (Some k)) = do
compI' (<- instantiate' (compI' . k)
(r, c) return (BndI' op r c)
Node op@(Cjmp' _ _) VNil (Some k)) = do
compI' (<- instantiate' (compI' . k)
(_, c) return (SeqI' op c)
Node Lbl' (k0 ::: VNil) (Some k)) = do
compI' (<- freshl
l <- compI' (k0 l)
c0 <- instantiate' (compI' . k )
(_, c ) return (concatI' c0 (LblI' l c))
Node op@Tmp' VNil (Some k)) = do
compI' (<- instantiate' (compI' . k)
(r, c) return (BndI' op r c)
Node op@(Set' _ _) VNil (Some k)) = do
compI' (<- instantiate' (compI' . k)
(r, c) return (BndI' op r c)
Node op@(Get' _) VNil (Some k)) = do
compI' (<- instantiate' (compI' . k)
(r, c) return (BndI' op r c)
The only case of the compiler that does something slightly interesting is the case for the Lbl'
instruction, which concatenates a sub-continuation with a (labeled) continuation.
Our (free) compiler is thus:
comp' :: Expr' -> Instr' (Cmd' Reg Label)
= fst . flip runState (0, 0) . compI' . denote' comp'
Before we wrap up and discuss the takeaways from what we have seen in this post, we illustrate how the semantics of compiled instruction sequences corresponds rather systematically to the “reference semantics” in interp'
from earlier.
The main difference is how labels and jumps are treated.
Since we now have a first-order representation, we do not use a continuation monad for dealing with jumps, but instead actually look up a code label in an instruction sequence, using the following resolveI'
function:
resolveI' :: Instr' sig -> Label -> Instr' sig
BndI' _ _ c) l = resolveI' c l
resolveI' (SeqI' _ c) l = resolveI' c l
resolveI' (LblI' l' c) l | l == l' = c
resolveI' (LblI' _ c) l = resolveI' c l resolveI' (
The other difference is that the cases of the execI'
function below use registers for explicit memory.
These differences aside, execI'
corresponds to the “reference semantics” in interp'
, in the sense that each signature command is mapped onto the corresponding monad operation that we gave in the MonadOps'
instance for interp'
above.
execI' :: Instr' (Cmd' Reg Label) ->
ReaderT (Instr' (Cmd' Reg Label)) (State [(Temp, V)]) V
EndI' r) = do
execI' (<- get
s return (fromJust (lookup r s))
BndI' (Intv' i) r c) = do
execI' (-> (r, IntV i) : s)
modify (\ s
execI' cBndI' (Addv' r1 r2) r c) = do
execI' (<- get
s let IntV i1 = fromJust (lookup r1 s)
IntV i2 = fromJust (lookup r2 s)
-> (r, IntV (i1 + i2)) : s)
modify (\ s
execI' cBndI' (Leqv' r1 r2) r c) = do
execI' (<- get
s let IntV i1 = fromJust (lookup r1 s)
IntV i2 = fromJust (lookup r2 s)
-> (r, BoolV (i1 <= i2)) : s)
modify (\ s
execI' cSeqI' (Cjmp' r l) c) = do
execI' (<- get
s let BoolV b = fromJust (lookup r s)
case b of
True -> do
<- ask
p let k = resolveI' p l
execI' kFalse ->
execI' cLblI' _ c) = execI' c
execI' (BndI' Tmp' r c) = do
execI' (<- get
s let t = "t" ++ show (length s)
-> (r, TmpV t) : (t, IntV 0) : s)
modify (\ s
execI' cBndI' (Set' r1 r2) r c) = do
execI' (<- get
s let TmpV t = fromJust (lookup r1 s)
= fromJust (lookup r2 s)
v -> (r, v) : (t, v) : s)
modify (\ s
execI' cBndI' (Get' rt) r c) = do
execI' (<- get
s let TmpV t = fromJust (lookup rt s)
= fromJust (lookup t s)
v -> (r, v) : s)
modify (\ s execI' c
runI' :: Expr' -> V
=
runI' e let i = comp' e
in (fst .
flip runState [] .
flip runReaderT i .
execI') i
Conclusion
We have presented an approach to deriving a compiler from a high-level semantics. We have illustrated the approach by building two compilers: one for arithmetic expressions, and one for arithmetic and conditional expressions. The key idea of the approach is to use denotation functions to define a high-level semantics, defined as a mapping of object language expressions into monadic operations. By choosing monadic operations in a way that they are low-level enough to correspond closely with the target instruction set, compilers follow for free from denotation functions. We argue that the low-level nature of the monadic operations that our approach relies on does not compromise the high-level nature of denotation functions, because we can build high-level abstractions on top of the low-level monadic operations, and use the high-level abstractions in our denotation functions.
References
For readers familiar with free monads, and wondering how what we present in this section relates to the standard notion of
Free
monad, see the section dedicated to this question.↩︎See the “free” package, or see one of the many other implementations of free monads available on Hackage.↩︎
Again these definitions correspond to those found in the “free” package on Hackage↩︎
The things that go after the vertical bar (
|
) in the lineclass Monad m => MonadOps' m lbl val | m -> lbl, m -> val
are functional dependencies, needed to help Haskell with type inference.↩︎