Post

Writing a Lisp Interpreter in Python

In this blog post, I’m going to cover another John Crickett coding challenge that I just completed - building a Lisp interpreter in Python. Lisp is the second oldest high-level language implemented (in 1959, after FORTRAN in 1957), probably most well-known for its fully parenthesized prefix notation (that definitely takes some getting used to). Scott Wlaschin calls Lisp a “galaxy brain” language for introducing the symbolic programming paradigm (if you haven’t watched his talk linked here I highly recommend it). This coding challenge has definitely been the most challenging to date - I had to spend a few hours just familiarizing myself with Lisp syntax so that I could understand it enough to cobble together an implementation with basic features.

For some interesting historical context on Lisp, and some thoughts on why, perhaps, Lisp is not more popular today, you can read “Lisp: Good News, Bad News, How to Win Big” by Richard P. Gabriel.

About

py-lisp-interpreter is a basic Lisp interpreter, written in Python. Running pylisp from the command line will allow the user to enter the Lisp REPL environment, or execute a .txt file from a provided file path.

Throughout this post, I will discuss how I implemented the lexing, parsing, and evaluation phases of my Lisp interpreter. For a helpful overview of the phases of an interpreter (and compiler), please see my previous post here

Type Definitions

In the header of our .py file containing our interpreter, we’ll be explicit about our Lisp types:

1
2
3
4
5
Symbol = str            # Implement a Lisp Symbol as a Python str
Number = (int, float)   # Implement a Lisp Number as a Python int or float
Atom = (Symbol, Number) # Implement a Lisp Atom as a Symbol or Number
List = list             # Implement a Lisp List as a Python list
Exp = (Atom, List)      # Implement a Lisp expression as an Atom or List

(Thanks to Peter Norvig for helping me conceptualize how to relate Lisp and Python types).

Lexer

Typically, depending on the syntax of a programming language, implementing a lexer can be quite complicated. In this case, however, thanks to Lisp’s syntax, and the beauty of Python, all we need to tokenize the input for our simple Lisp interpreter is str.split():

1
2
3
4
5
6
7
8
9
10
11
def tokenize(input: str) -> List[str]:
    """
    Split input string into a list of tokens. Note that we pad
    parentheses with white space before splitting to separate
    parentheses from Atoms
    --> we want '(+ 1 2)' to tokenize to ['(', '+', '1', '2', ')']
    not ['(+', '1', '2)']
    """

    tokenized_input: List[str] = input.replace("(", " ( ").replace(")", " ) ").split()
    return tokenized_input

Parser

Once we’ve tokenized the input source program, we can generate an abstract syntax tree (AST). We’ll use Python lists to express the AST, and lists of lists to denote nested expressions:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
def generate_ast(tokens: List) -> List:
    """
    Generate abstract syntax tree from input tokens.

    Example:
    tokenized_input = ['(', 'defun', 'doublen', '(', 'n', ')', '(', '*', 'n', '2', ')', ')']
    -->
    ast = ['defun', 'doublen', ['n'], ['*', 'n', 2]]
    """
    t = tokens.pop(0)

    # start a new sublist everytime we encounter an open parens
    if t == "(":
        ast = []
        # append tokens to sublist until we encounter a close parens
        while tokens[0] != ")":
            ast.append(generate_ast(tokens))
        tokens.pop(0)  # pop off ')'
        return ast
    elif t == ")":
        raise SyntaxError("Mismatched parens.")
    else:
        return atomize(t)

def atomize(token: str) -> Atom:
    """
    Atomize input tokens. Every token is either an int, float, or Symbol.

    Note that
        Symbol := str
        Number := (int, float)
        Atom   := (Symbol, Number)
    """
    try:
        return int(token)
    except ValueError:
        try:
            return float(token)
        except ValueError:
            return Symbol(token)

Evaluation

Once we have the AST, we are ready to evaluate the program! To do this, we need to be able to look up symbols in a mapping from variable name to value. We will call this our SymbolTable. We also need to make sure we support locally declared variables. To do this, we will allow for nested mappings, and when we need to look up a variable name in the SymbolTable, we can just look up the variable in the innermost mapping, then work outwards until we reach the global scope.

To implement SymbolTable, we will inherit from the Python dict class, and define our own find method to check the appropriate scope for where variables are defined.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
import operator as op
import math

class SymbolTable(dict):
    """
    A mapping of {variable name: value}
    """
    def __init__(self, params, args, outer_scope=None):
        self.update(zip(params, args))
        self.outer_scope = outer_scope

    def find(self, var):
        if var in self:
            return self[var]
        elif self.outer_scope is not None:
            return self.outer_scope.find(var)
        else:
            raise NameError(f"NameError: name '{var}' is not defined")


global_symbol_table = SymbolTable()
global_symbol_table.update(
    {
        "+": op.add,
        "-": op.sub,
        "*": op.mul,
        "/": op.truediv,
        "<=": op.le,
        "<": op.lt,
        ">": op.gt,
        ">=": op.ge,
        "!=": op.ne,
        "=": op.eq,
    }
)
# add standard math library operators to symbol_table
global_symbol_table.update(math.__dict__)

Now that we have our SymbolTable for keeping track of variables in different scopes, we are ready to evaluate:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
def eval(x: Exp, st=global_symbol_table):
    """Evaluate the abstract syntax tree"""
    if isinstance(x, Number):
        return x
    elif isinstance(x, Symbol):
        # start with the innermost scope of the symbol table
        # to find symbol definition, searching outer scope until
        # symbol definition is found
        return st.find(x)
    elif x[0] == "if":
        condition, statement, alternative = x[1:4]
        expression = (
            statement
            if eval(condition, SymbolTable(st.keys(), st.values(), st))
            else alternative
        )
        return eval(expression, SymbolTable(st.keys(), st.values(), st))
    elif x[0] == "defun":
        # `func_name`: str
        # `params`: List[str]
        # `func_body`: List
        # Example:
        #   "(defun doublen (n) (* 2 n))" -->
        #   `func_name`: "doublen"
        #   `params`: ["n"]
        #   `func_body`: ["*", 2, "n"]
        func_name, params, func_body = x[1:4]
        st[func_name] = (params, func_body)
        return f"Defined function: {func_name.upper()}"
    elif x[0] == "format":
        if isinstance(x[-1], list):
            fill_val = eval(x[-1], SymbolTable(st.keys(), st.values(), st))
            res = " ".join(str(i) for i in x[2:-1])
        else:
            fill_val = ""
            res = " ".join(str(i) for i in x[2:])
        if "~D~%" in res:
            res = res.replace('"', "").replace("~D~%", str(fill_val))
        else:
            res = res.replace('"', "").replace("~%", str(fill_val))
        return res
    else:
        func_name = x[0]
        func = eval(x[0], SymbolTable(st.keys(), st.values(), st))
        args = [eval(arg, SymbolTable(st.keys(), st.values(), st)) for arg in x[1:]]

        # if `func` is a tuple, it is a user defined function, so update local scoping
        # of symbol table with user-provided parameters
        if isinstance(func, tuple):
            params, func_body = func
            if len(args) != len(params):
                raise ValueError(
                    f'Function "{func_name}" expects {len(params)} arguments, but {len(args)} were provided.'
                )
            st.update(zip(params, args))
            return eval(func_body, SymbolTable(st.keys(), st.values(), st))
        elif isinstance(func, (int, float, str)):
            return func
        else:
            return func(*args)

Instructions

For Windows, create a folder named Aliases in your C drive: C:/Aliases. Add this folder to PATH. Next, create a batch file that will execute when you call the specified alias. For example, on my machine, I have a batch file named pylisp.bat located at C:/Aliases, that contains the following script:

1
2
3
4
@echo off
echo.
call C:\...\py-lisp-interpreter\pylisp_venv\Scripts\activate.bat
python C:\...\py-lisp-interpreter\main.py %*

So now, when I type pylisp in the command prompt, this batch file will execute, which in turn, launches the appropriate Python virtual environment, then runs the py-lisp-interpreter Python script.

Examples

Running pylisp from the command line launches a cli option to enter the REPL environment or execute a local file:

C:\> pylisp

[?] Would you like to open the REPL environment, or execute a file?: file
 > file
   REPL

Enter the location of the file: test_script.txt
Defined function: HELLO
Defined function: MEANING_OF_LIFE
Defined function: MEANING_OF_LIFE_ANSWER
Defined function: DOUBLEN
Defined function: FIB
Defined function: FACT
Hello Coding Challenge World
42
The meaning of life is 42
The double of 5 is 10
The double of 21 is 42
The double of 107 is 214
Factorial of 5 is 120
Factorial of 6 is 720
Factorial of 7 is 5040
Factorial of 10 is 3628800
Factorial of 12 is 479001600
The 7th number of the Fibonacci sequence is 13

After running a local file, the user is then prompted to enter another file to execute, or they can exit the program:

[?] Would you like to execute another file?: No
   Yes
 > No

If the user chooses to enter the REPL environment, they can then execute simple Lisp expressions:

C:\> pylisp

[?] Would you like to open the REPL environment, or execute a file?: REPL
   file
 > REPL

pylisp> (+ 21 21)
42
pylisp> (pow 2 3)
8.0
pylisp> (sin (/ pi 2) )
1.0
pylisp> (sin (/ pi 4) )
0.7071067811865476
pylisp> (defun fact (n) (if (<= n 1) 1 (* n (fact (- n 1)))))
Defined function: FACT
pylisp> (fact 5)
120
pylisp> (fact 10)
3628800
pylisp> (defun add (a b) (+ a b))
Defined function: ADD
pylisp> (add 4 5)
9
pylisp> (add (add 21 21) 42 )
84
pylisp> (add (add 21 21) (fact 5) )
162

Libraries

Just like the py-wc and py-head projects, everything I needed for py-lisp-interpreter was included in Python’s Standard Library. For a little more cli functionality, I installed inquirer.

Final Thoughts

This was the most difficult coding challenge for me thus far. I learned a lot about the Lisp programming language, and really came to appreciate it’s syntax. Implementing the eval function to evaluate the AST proved to be more involved and nuanced than I first thought, and improved my ability to “think recursively.”

Finally, if you check my repo for this project, you’ll see that I implemented two helper functions that both check if the Lisp program is valid before executing. Specifically, they check if the program has matching parentheses (if parens are not balanced, it is not a valid program).

For me, using a stack to add and pop parens as we scan the source code was the intuitive way to check balanced parens. After implementing that approach, I enjoyed thinking of a way to solve the same problem in a “functional” paradigm. The below function uses transform (map) reduce to do so. Specifically, ( get transformed to 1, ) get transformed to -1, and every other character gets mapped to 0. Now we can just apply a simple reduction with the + operator, and if the result is equal to 0, we know we had balanced parens in the input source code.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
def are_parens_matched_map_reduce(s: str) -> bool:
    """
    A more functional approach to check that all parens are matching.
    Uses built-in Python `map` and `reduce` functions.

    Raises:
        SyntaxError: If `s` is empty or if `s` does
        not have matching opened and closed parentheses
    """
    t: List = tokenize(s)
    if len(t) == 0:
        raise SyntaxError(f"Input string cannot be empty.")
    # make sure that user input starts and end with open/close parens
    elif t[0] != "(" or t[-1] != ")":
        raise SyntaxError(
            f'Input string "{s}" must start and end with open/closed parens.'
        )

    ### transform-reduce
    # transform (map) open parens to 1, close parens to -1,
    # and all other chars to 0, then sum the resulting iterator
    # if all parens are matched, res will be 0, otherwise, throw
    # a SyntaxError, because there are mismatched parens
    d = {"(": 1, ")": -1}
    res = reduce(lambda a, b: a + b, map(lambda x: d.get(x, 0), t))
    if res != 0:
        raise SyntaxError(f'Input string "{s}" contains mismatched parens.')
    else:
        return True

Acknowledgements

Thanks to John Crickett for the idea from his site, Coding Challenges!

Thanks to Peter Norvig for helping me when I got stuck, especially in setting up my symbol table for appropriately tracking scope.

If you happen to peruse my code and notice any bugs or opportunities for optimizations, please let me know!

This post is licensed under CC BY 4.0 by the author.