# Calc: a simple Haskell eDSL

*2013-03-09*

The other day I started playing around with Accelerate, a Haskell eDSL for gpgpu computing. Accelerate provides us with multidimensional arrays and several functions to manipulate them that we can use to build expressions. These expressions can be compiled into Cuda code using the Cuda backend and then run on the gpu. To me it seemed like the library was imbued with some form of arcane magic, so I decided to investigate this eDSL deal further.

My first stop was the wiki page on Haskell eDSLs. Eventually, I stumbled upon several tutorials, and they all seemed to start with a language for a very simple calculator. I then decided to make my own version of that language and make a compiler that would translate expressions to something executable.

This post is therefore an introductory tutorial to Haskell eDSLs. My intent is that you get a first grasp of the idea so that you can then move on to more complicated matters.

Full source code can be reached here.

## The Calc Language

The language we’ll be building is one for a simple calculator that can add, subtract, and multiply numbers. We start off by formalising what an expression in this language looks like:

```
type Number = Integer
data Expr
= Lit Number
| Add Expr Expr
| Sub Expr Expr
| Mul Expr Expr
deriving Show
```

The syntax should be readable even by non-Haskell programmers. What
we’re saying is that an expression can be either a literal, the sum of
two expressions, the difference between two expressions or their
product. The `deriving Show`

part is simply to make the
expressions printable in ghci.

Now we can go ahead and build expressions. The literal number 2 is simply

`Lit 2`

The expression `2 + 3`

can be written as

`Add (Lit 2) (Lit 3)`

And we can even nest expressions, like in
`(2+3)*(4-2)`

:

`Mul (Add (Lit 2) (Lit 3)) (Sub (Lit 4) (Lit 2))`

And so on. But of course, having to construct an AST manually is a
pain; we’d like to type `2+3`

and `(2+3)*(4-2)`

just like we do regularly. That’s when we make `Expr`

an
instance of `Num`

:

```
instance Num Expr where
e1 + e2 = Add e1 e2
e1 - e2 = Sub e1 e2
e1 * e2 = Mul e1 e2
abs e = e
signum e = e
fromInteger = Lit
```

The definitions for `abs`

and `signum`

are a
cheat, but for this small example it’s ok. Once we’ve rolled this
instance we can start typing expressions just as if they were regular
integers:

```
> (2+3)*4*2 :: Int
40
> (2+3)*4*2 :: Expr
Mul (Mul (Add (Lit 2) (Lit 3)) (Lit 4)) (Lit 2)
```

This is quite powerful. We type a bunch of expressions using everyday syntax, and Haskell builds an AST for us.

But that’s not it. Notice that `Expr`

is now a
`Num`

. That means that any function in Haskell that is
polymorphic over `Num`

can now be used to build an
expression. Take, for instance, the list of all fibonacci numbers:

```
fib :: Num a => [a]
fib = 0 : 1 : zipWith (+) fib (tail fib)
```

Since all that fib requires is that elements can be added together,
`fib`

has type `Num a => [a]`

. Now we can build
a list of the first 5 fibonacci numbers as plain `Int`

s:

```
> take 5 fib :: [Int]
[0,1,1,2,3]
```

But here’s the cool part: `Expr`

is an instance of
`Num`

, so we can build a list of `Expr`

s as
well:

```
> take 5 fib :: [Expr]
[ Lit 0
, Lit 1
, Add (Lit 0) (Lit 1)
, Add (Lit 1) (Add (Lit 0) (Lit 1))
, Add (Add (Lit 0) (Lit 1)) (Add (Lit 1) (Add (Lit 0) (Lit 1)))
]
```

Take a look at that! `fib`

is building the expressions
that yield the first 5 numbers in the sequence. The power of this is
that we can use any function that evalutes to `Num`

to build
an `Expr`

, such as `fib`

. In other words, we can
now use the host language, Haskell, to build complicated expressions in
our Calc language.

## The Calc Compiler

An expression is kind of useless if we can’t do anything with it
other than building it. What we wish now is being able to evaluate an
`Expr`

. We could write a simple interpreter for that matter,
but I figured that a compiler would be cooler.

The compiler we are going to write translates an `Expr`

into x86 Linux assembly. This assembly code will then be compiled and
run like a regular assembly program, and we’ll make that program return
the result of the evaluated expression back into Haskell. For this task
we’ll be using `nasm`

and `ld`

.

First we define a helper function, `nconcat`

, that
concatenates a list of lists by intercalating a new line between each
successive list:

`nconcat = intercalate "\n"`

Next we define functions for each operation that can be done on
expressions, namely addition, subtraction and multiplication. The
convention we take is that these functions read their arguments off the
stack and return the result in the `eax`

register.

```
add = nconcat
[ "add:"
, "mov eax, [esp+4]"
, "mov ebx, [esp+8]"
, "add eax, ebx"
, "ret"
, ""
]
sub = nconcat
[ "sub:"
, "mov ebx, [esp+4]"
, "mov eax, [esp+8]"
, "sub eax, ebx"
, "ret"
, ""
]
mul = nconcat
[ "mul:"
, "mov eax, [esp+4]"
, "mov ebx, [esp+8]"
, "mul ebx"
, "ret"
, ""
]
```

Moving on, we define the `compile'`

function, which
translates an expression into a string:

```
compile' :: Expr -> String
compile' (Lit x) = "mov eax, " ++ show x
compile' (Add x y) = binOp "add" x y
compile' (Sub x y) = binOp "sub" x y
compile' (Mul x y) = binOp "mul" x y
```

The `compile'`

function relies on `binOp`

,
which we define next. `binOp`

takes a function name and two
expressions and applies that function to the evaluations of the given
expressions:

```
type Op = String
binOp :: Op -> Expr -> Expr -> String
binOp op x y
= nconcat
[ compile' x
, "push eax"
, compile' y
, "push eax"
, "call " ++ op
, "add esp, 8"
]
```

So `binOp`

and `compile'`

are mutually
recursive. `compile'`

compiles a single expression, using
`binOp`

when this expression is a function of two other
expressions.

A question that arises is how to make the assembly program return the
result of an evaluation back to Haskell. Since the Calc language only
defines expressions that evaluate to integers, we’re going to make a
little hack and make the assembly program return the result via its exit
code. For this purpose, we define the `exit`

function:

```
exit
= nconcat
[ "exit:"
, "mov ebx, [esp+4]"
, "mov eax, 1"
, "int 0x80"
, ""
]
```

`exit`

reads a number from the stack and exits with that
number as the exit code. `int 0x80`

is the way we perform a
syscall on Linux, `eax=1`

is how we instruct the kernel to
perform an exit, and `ebx`

holds the exit code.

This exit code hack has one limitation, which is that only values in the range 0..255 can be returned. For our illustrative purposes this is fine, however.

Now we have all of the elements to build an assembly program. For readability, we define a program to be

`newtype Prog = Compute Expr deriving Show`

Next we define the function that compiles a program:

```
compile :: Prog -> String
compile (Compute e)
= nconcat
[ header
, add
, sub
, mul
, exit
, "_start:"
, compile' e
, "push eax"
, "call exit"
]
```

where `header`

is defined as

```
header
= nconcat
[ "BITS 32"
, "section .text"
, "global _start"
, ""
]
```

This `header`

code is just a bunch of directives
`nasm`

expects. `BITS 32`

tells nasm we’re making
a 32-bit program. `section .text`

specifies that we are
defining the `.text`

section, where the executable code is,
and `global _start`

specifies the entry point.

The `compile'`

function takes an expression, compiles it
and wraps it with the header, the functions on expressions and a call to
exit that returns the result as the program’s exit code.

To visualise all of this, let’s compile an example expression to see the code that is produced:

```
> let e = 17 :: Expr
> compile (Compute e)
```

The resulting code is

```
BITS 32
section .text
global _start
add:
mov eax, [esp+4]
mov ebx, [esp+8]
add eax, ebx
ret
sub:
mov ebx, [esp+4]
mov eax, [esp+8]
sub eax, ebx
ret
mul:
mov eax, [esp+4]
mov ebx, [esp+8]
mul ebx
ret
exit:
mov ebx, [esp+4]
mov eax, 1
int 0x80
_start:
mov eax, 17
push eax
call exit
```

Notice how in `_start`

, the value 17 is moved to
`eax`

, pushed onto the stack and followed by a call to exit.
This makes the program quit with exit code 17.

The following code is what the expression `2*3`

compiles
to, omitting all of the boilerplate:

```
_start:
mov eax, 2
push eax
mov eax, 3
push eax
call mul
add esp, 8
push eax
call exit
```

The generated code could be better, for example by pushing the
literals 2 and 3 directly instead of moving them into `eax`

and then pushing `eax`

, but for our purposes it’s
sufficient.

Now we need to compile the generated assembly code. This is exactly
what the `nasm`

function does:

```
nasm :: String -> IO String
nasm code
= do writeFile "foo.s" code
system "nasm -f elf foo.s"
system "ld -o foo foo.o"
return "./foo"
```

The `nasm`

function takes some code, dumps it into
`foo.s`

, compiles it with `nasm`

, links it with
`ld`

, and then returns the command that we must execute to
run the generated program.

Finally, we define the `run`

function, which takes a
program, compiles it, runs it and interprets the result:

```
run :: Prog -> IO Int
run prog =
let code = compile prog
in nasm code >>= system >>= return . readExit
readExit :: ExitCode -> Int
readExit ExitSuccess = 0
readExit (ExitFailure x) = x
```

And voila. Now we can compute those fibonacci numbers and any expression that we fancy:

```
> let e = fib !! 6 :: Expr
> run . Compute $ e
8
> fib !! 6 :: Int
8
> 2*3 + 5*6 - 3
33
> run . Compute $ 2*3 + 5*6 - 3
33
```

## Where To Go From Here

The Calc language is easy to model because all expressions evaluate
to the same type: `Integer`

. As soon as we add expressions of
different types the language gets more complicated and we need something
like a `GADT`

. The GADTs section on
the wiki has an excellent tutorial on modeling more sophisticated
languages, so it is a good step to take from here.