Lisp In Seven Parts

Crafting interpreters in Erlang

Part 1

I’d like to write about a computer program that has been described by Dr William Byrd as being the most beautiful program ever written, which you may hear all about in his wonderfully re-watchable talk on it.

It is a Lisp interpreter written in Scheme, which itself is a dialect of Lisp. After having written the program out in full both on paper and in a text editor dozens of times, I decided to take it with me to new places, starting with Racket. The main benefit of ever so slightly changing the implementation talked about by Dr Byrd (written in Chez-Scheme) into a Racket program was that I was able to make use of Racket’s built-in pattern matching, so things just worked nicely out of the box. You’re also then able to hook into everything else Racket has to offer (which is a lot). But Racket itself is based on Scheme, so I then wondered what it’d be like to take it with me elsewhere. Somewhere a little different. Racket’s pattern matching was quite useful, so maybe a language with support for that. A functional programming language might make sense or at least something with good support for functions as values. The target language, Lisp, means we’ll be dealing with lists so a language that makes lists fun might be an idea. I’m sure this sounds like a lot of languages but of the ones I personally know, everything points to Erlang.

We give Erlang a list, it gives us that list back:

Eshell V11.0  (abort with ^G)
1> [1,2,[a,b,c]].
[1,2,[a,b,c]]

A list in Erlang doesn’t look all that different from a list in Scheme:

> '(1 2 (a b c))
(1 2 (a b c))


Part 2

What does the target language look like?

We can create functions
(lambda (x) x)

A function can be defined using lambda which is followed by a list of arguments and then a function body. The above function is known as the identity function as it simply returns its one and only argument, x.

Though in this text lambda is the term used to create functions, this term is arbitrary and as such you’re able to use any term you wish in place of it.

We can call functions
((lambda (x) (sum x 1)) 5)

Above is a function that will add 1 to its argument (given that sum has already been defined). It is being applied to 5 so we know that in this case, the expression will evaluate to 6.

Just like in Scheme, a function can be called by wrapping its operator and its operands in parentheses. As an example in Scheme, one may add two numbers together by applying the + operator to two operands like so:

> (+ 5 10)
15

These are known as S-Expressions. The general rule is that the function (also known as an operator or a procedure) goes first and then its arguments follow, each separated by a space.

A language such as this one that can evaluate expressions, define and then apply functions to arguments, is turing-complete. From this point on anything that one may express with logic may be expressed in this form. Each of the aforementioned components can be seen in action in this call to the identity function in which the identity function is passed as the argument:

((lambda (x) x) (lambda (y) y))

That is, a function is defined, (lambda (x) x), which itself is wrapped in parenthesizes meaning that it is being called with the argument, (lambda (y) y).

Part 3

interp.erl

-module(interp).

-export([eval_expr/2]).

eval_expr(Expr, _Env) when is_number(Expr) -> Expr.

The above is an Erlang module I’ve named interp. In Erlang, modules are a place to store code. The filename and the module must match. eval_expr is a function I have defined in this module which takes 2 parameters, Expr and _Env. This function is exported which means it may be called from outside of its module. when is_number is a guard which in the above case will only match this function if Expr is indeed a number. The reason it’s _Env and not simply Env is because if we don’t plan on making use of a function’s argument within the function body, it’s an unused variable and this must be denoted with a leading underscore. You may simply ignore what _Env actually means for now. The function’s body is written following -> and in this case Expr is returned and that’s it.

$ erl interp.erl
Eshell V11.0  (abort with ^G)
1> interp:eval_expr(42, []).
42

We may now evaluate numbers. Excellent.

Part 4

In Erlang one may define the same function multiple times and match against them based on their arguments using Erlang’s built-in pattern matching.

light_switch(on) -> 1;
light_switch(off) -> 0.

on and off are not variables. They’re called atoms. Things that simply represent themselves. They usually start with a lowercase letter whereas variables start with an uppercase letter. i.e. hello is an atom where as Hello is the name of a variable. One may also write the above as:

light_switch(X) ->
  if
    X == on -> 1;
    X == off -> 0
  end.

But why go to all that trouble when one can pattern match like in the first case? The last expression to be evaluated in a function is what that function will return. In this case, it’s the result from the if statement, or in the earlier example 1 or 0.

The following shows some expressions using Erlang’s atoms:

Eshell V11.0  (abort with ^G)
1> is_atom(hello).
true
2> is_atom('hello').
true
3> 'hello' == hello.
true

Scheme uses what it calls symbols as names for its variables. A symbol can be thought of at the same thing as an atom in Erlang. Something that represents itself and between the two languages, they look somewhat similar.

> 'hello
hello
> (symbol? 'hello)
#t
> (define hello "world")
> hello
"world"
> (eval 'hello)
"world"


Part 5

Let’s keep going.

interp.erl

-module(interp).

-export([eval_expr/2]).

eval_expr(Expr, _Env) when is_number(Expr) -> Expr;
eval_expr(Expr, Env) when is_atom(Expr) ->
  Env(Expr).

A new additional eval_expr definition but this time it is matching against Expr to be an atom with its guard. It then passes that atom to Env in what looks to be a function call. This must mean that Env is a function.

Env is the environment. You may be familiar with how you’re able to store and retrieve environment variables on your operating system whereby you have program-x running which has access to the system’s environment variables and also program-y which if running in the same environment has access to the same environment variables. One of the things the environment will be used for in this interpreter is the storage and retrieval of variables, mapping values passed as function arguments to their respective identifiers within a function body. An example in Scheme:

> ((lambda (x) (mul x x)) 5)
25

When it comes to applying x and x to mul, the function body needs to know what x actually is at runtime in order to then multiply it by itself. This is where the environment comes in. Just keep this in mind for now and we’ll keep going with some pattern matching against lambdas next.

interp.erl

-module(interp).

-export([eval_expr/2]).

eval_expr(Expr, _Env) when is_number(Expr) -> Expr;
eval_expr(Expr, Env) when is_atom(Expr) ->
  Env(Expr);
eval_expr([lambda, [X], Body], Env) ->
  fun(Arg) ->
    eval_expr(Body,
      fun(Y) ->
        case Y of
          X -> Arg;
          _ -> Env(Y)
        end
      end)
  end.

You may now start to notice some lispy patterns in the code, such as [lambda, [X], Body] which with a few syntactic modifications may be read as (lambda (x) body) in its lisp-like form.

This is a finer example of Erlang’s pattern matching compared to what has been seen so far. There are no guards on this new eval_expr, however we expect this function to only be called if it’s passed with a list containing the atom, lambda; a list containing an unknown variable, X, and a function body, Body.

fun(Arg) ->
  Arg + 1
end

In Erlang, functions may also be created using the fun keyword, the above being a function that when called will return Arg + 1.

In this interpreter we’re going to extend the environment when the Body is being evaluated. This is done by passing in a new environment in the call to eval_expr. Let’s take a look at what the extended environment looks like:

eval_expr([lambda, [X], Body], Env) ->
  fun(Arg) ->
    eval_expr(Body,
      fun(Y) ->
        case Y of
          X -> Arg;
          _ -> Env(Y)
        end
      end)
  end.

If we go back to an earlier line of code:

eval_expr(Expr, Env) when is_atom(Expr) ->
  Env(Expr);

remember that when an atom is evaluated, that value gets looked up by calling Env while passing that atom as the argument.

Inside the environment itself, Y, the passed argument to the environment, is compared against X, which is the identifier for the variable being looked up inside the function body. An example:

(lambda (myArg) (+ 1 myArg))

When 1 and myArg are being applied to +, we need to know what myArg is. Here one may say, relative to the eval_expr([lambda, [X], Body]… block above, that:

X = myArg
Y = myArg

And therefor X == Y

However in a case such as the following:

(lambda (myArg) (+ 1 somethingElse))

X = myArg
Y = somethingElse

And therefor X != Y.

If X == Y then the argument we’re looking for, Arg, is what the environment lookup will return. If X and Y are not the same, i.e. the identifier of the variable passed to the function call does not match that of the identifier in the function body, we call Env(Y) where Env comes from what is passed initially to eval_expr.

There is only one final piece left to this part of the code. And that is function application. In other words, a function call. Now we can create functions, let’s add in the ability to call them by pattern matching something that looks like a function call.

interp.erl

-module(interp).

-export([eval_expr/2]).

eval_expr(Expr, _Env) when is_number(Expr) -> Expr;
eval_expr(Expr, Env) when is_atom(Expr) ->
  Env(Expr);
eval_expr([lambda, [X], Body], Env) ->
  fun(Arg) ->
    eval_expr(Body,
      fun(Y) ->
        case Y of
          X -> Arg;
          _ -> Env(Y)
        end
      end)
  end;
eval_expr([Rator, Rand], Env) ->
  Rator1 = eval_expr(Rator, Env),
  Rand1 = eval_expr(Rand, Env),
  Rator1(Rand1).

If we pattern match against list containing two things, an operator and an operand, we evaluate each of them and then apply the rator to the rand. i.e. call rator while passing rand as the argument. lambdas can now be created and called.

This is Lisp in terms of Erlang.

Part 6

A helper function to create and pass an initial environment would be nice to play around with this interpreter. Let’s export a new function that we’ll then define, eval/1.

-export([eval_expr/2, eval/1]).

And then define it:

eval(Expr) ->
  eval_expr(Expr, fun(_Y) -> [error, variable_unbound] end).

The default environment doesn’t have any variable values to lookup. So if an undefined variable is found within a function body, this is what will be returned, a list containing two atoms, error and variable_unbound. These atoms are arbitrary and may be named anything you wish.

Thus far, our target language syntax is represented as a list in Erlang. Here is the identity function with the number 50 being passed:

[[lambda, [x], x], 50]

Which can be read as:

((lambda (x) x) 50)

in its lispy form.

With a helper function now in place, we can open up a new Erlang shell, compile the interp module and then call the eval function passing in a tiny Lisp program using Erlang’s lists.

$ erl
Eshell V11.0  (abort with ^G)
1> c(interp).
{ok,interp}
2> interp:eval([[lambda, [x], x], [lambda, [y], y]]).
#Fun<interp.1.24778783>


Part 7

With the core logic now implemented all that is left to do is to build a lispy front-end so we’re then able to write actual programs in terms of Lisp. You may have noticed that while building this interpreter, we’ve actually constructed it from the middle outwards. Often times in the past when I’ve taken on the task of writing an interpreter I’ve started first with the tokenizer, then the parser then the actual interpretation and so on… which is the same as the order of execution at runtime. At first glance it may make sense to actually write the program in the same sort of order. Get one piece working, test it, move on to the next piece, attach the two pieces together and so on, in a linear fashion.

In many cases, the order in which a program is built does matter. Perhaps in an ideal world it shouldn’t matter. But if one goes to write some code and does not first pre-define some solid interfaces between how the various parts of code should interact with each other, the part you begin to write first may influence the part that is written next. This effect, as stated, is predicated on the programmer not thinking too much about the interfaces and focusing only on the program from a top down perspective. In our case, we haven’t spoken much about the interfaces for this interpreter other than that Erlang’s built in lists are used to represent lists in the target language and Erlang functions to represent lambdas. These are choices that I have made as to keep things simple. It is in essence a mapping of concepts in the target language to features and data structures that exist within the host language i.e.

Target language -> Host Language
lambda -> an Erlang fun
number
-> a number in Erlang
symbol
-> an atom in Erlang
(1 2 3)
-> a list in Erlang i.e. [1,2,3]

You may wish, for example, to store some meta data of some sort in your lambdas, so rather than having an Erlang function being the thing that represents our lambdas it could be a list in Erlang that contains both the function and the additional meta data. Defining your lambdas like this will mean you’ll also need to modify how function calls, or function application, works to take into account your new list data structure and to get the function from that list to execute. These kinds of data structures are out of scope for this interpreter, but there’s nothing whatsoever stopping you from playing around with this.

Parsing S-Expressions from strings into Erlang lists, though is not the focus of this text, has been written so that we can write programs using Lisp’s syntax to test out this new interprerter. You may read through the parsing code in your own time or simply copy and paste this in to a file and run it. From here you’ll be able to take the interpreter in your own direction, add features. Maybe even implement it in your favourite language.

interp.erl

-module(interp).

-export([run/1,eval_expr/2,eval/1]).

run(ExprStr) -> eval(parse_string(ExprStr)).

eval(Expr) ->
  eval_expr(Expr, fun(_Y) -> [error, variable_unbound] end).

eval_expr(Expr, _Env) when is_number(Expr) -> Expr;
eval_expr(Expr, Env) when is_atom(Expr) ->
  Env(Expr);
eval_expr([lambda, [X], Body], Env) ->
  fun(Arg) ->
    eval_expr(Body,
      fun(Y) ->
        case Y of
          X -> Arg;
          _ -> Env(Y)
        end
      end)
  end;
eval_expr([Rator, Rand], Env) ->
  Rator1 = eval_expr(Rator, Env),
  Rand1 = eval_expr(Rand, Env),
  Rator1(Rand1).

parse_string(Expr) ->
  lists:nth(1, parse(tokenize(Expr), [])).

parse([], Acc) -> Acc;
parse([[$(]|Rest], Acc) ->
  {Rem, SubTree} = parse(Rest, []),
  parse(Rem, [SubTree | Acc]);
parse([[$)]|Rest], Acc) ->
  {Rest, lists:reverse(Acc)};
parse([[]|Rest], Acc) -> parse(Rest, Acc);
parse([H|T], Acc) -> parse(T, [parse_token(H)|Acc]).

parse_token(Tkn) ->
  case is_string_an_integer(Tkn) of
    true -> list_to_integer(Tkn);
    _    -> list_to_atom(Tkn)
  end.

is_string_an_integer("") -> false;
is_string_an_integer(Str) ->
  lists:all(fun(D) -> D >= $0 andalso D =< $9 end, Str).

tokenize(Expr) -> pad_tokens(Expr, ["(", ")"]).

pad_tokens(Str, []) ->
  string:split(lists:flatten(Str), " ", all);
pad_tokens(Str, [Tkn|Rest]) ->
  pad_tokens(
    string:replace(Str, Tkn, " " ++ Tkn ++ " ", all), Rest).

The parsing code and a new run function are the final bits needed to run this and test it out.

$ erl
Eshell V11.0  (abort with ^G)
1> c(interp).
{ok,interp}
2> F = interp:run("((lambda (x) x) (lambda (y) y))").
#Fun<interp.1.99366422>
3> F(10).
10

If you enjoyed reading this feel free to subscribe for free and get other things to read about directly to your inbox.

-> A Quick One on Quicksort
-> How Not to Waste Some Time