Another tiny scripting engine family

18 September 2020

Last year I came up with not one but two kinds of interpreter so tiny they can outright hide inside a bigger program and serve as scripting engines. Both were used as the basis for real, working languages, proving them viable. Yet when I needed to put one into the Ramus revival this spring, I did something entirely different.

We programmers tend to complicate things, but the essence of scripting is being able to tell a computer: "do this; test that; if so then yay, else nay". It's really that simple: a series of statements, each beginning with a word which says how to (wait for it) interpret the remaining words, at least within the same statement. Which in turn is great news for programmers, because it's very easy to parse a language like that: split input into lines and lines into words. Seriously, don't bother with anything more complicated for now. You don't need anything else to parse code like:

set a 0
set b $a
incr a
incr b 2
echo $a $b

Primitive? Sure, but it works, and people new to programming get it easily.
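To show how little parsing that takes, here is a sketch that breaks the sample script above into lines and then words using only Python's built-in str.splitlines and str.split (the names script and statements are just for illustration):

```python
script = """set a 0
set b $a
incr a
incr b 2
echo $a $b"""

# Split into lines, skip blank ones, then split each line on whitespace.
statements = [line.split() for line in script.splitlines() if line.strip()]

for words in statements:
    # The first word names the command; the rest are its arguments.
    print(words[0], words[1:])
```

Every word is still a string at this point; turning "0" into a number or "$a" into a variable's value is left to the interpreter.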

So let's look at one way to make this work. To keep it short, I'll use Python 3. Hopefully it's not too hard to follow even for fans of other languages:

variables = {}
commands = {}

def run_script(text):
    for line in text.split("\n"):
        if line == "" or line.isspace(): continue
        words = [parse_value(i) for i in line.split()]
        name = words[0]
        args = words[1:]
        commands[name](*args)

def parse_value(word):
    if word[0] == "$":
        return variables[word[1:]]
    elif word.isdigit():
        return int(word)
    elif word[0] == "-" and word[1:].isdigit():
        return int(word)
    else:
        return word

How's that! I wrote smaller interpreters in the past, but 20 lines is pretty damn good. I could have made it even shorter by going with just one dictionary, but this way it's clearer what I mean. It ran almost on the first try, too.

Well, it won't run anything until we define the actual commands:

def do_set(name, value):
    variables[name] = value

def do_incr(name, value=1):
    variables[name] += value

def do_echo(*args):
    print(*args)

commands["set"] = do_set
commands["incr"] = do_incr
commands["echo"] = do_echo

Yep, that's really all you need to run the example at the beginning. (Did you get "1 2" for output as expected?) Other basics can be added just as easily.
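For anyone who wants to check that output without assembling the pieces by hand, here is the first interpreter and its three commands gathered into one self-contained sketch. The only addition is contextlib.redirect_stdout, used to capture what echo prints. Note in passing that parse_value only recognizes whole numbers, so a word like 3.5 stays a string.

```python
import contextlib
import io

variables = {}
commands = {}

def run_script(text):
    for line in text.split("\n"):
        if line == "" or line.isspace():
            continue
        words = [parse_value(i) for i in line.split()]
        name = words[0]
        args = words[1:]
        commands[name](*args)

def parse_value(word):
    if word[0] == "$":
        return variables[word[1:]]
    elif word.isdigit():
        return int(word)
    elif word[0] == "-" and word[1:].isdigit():
        return int(word)
    else:
        return word

def do_set(name, value):
    variables[name] = value

def do_incr(name, value=1):
    variables[name] += value

def do_echo(*args):
    print(*args)

commands["set"] = do_set
commands["incr"] = do_incr
commands["echo"] = do_echo

example = """
set a 0
set b $a
incr a
incr b 2
echo $a $b
"""

# Collect whatever echo prints so the result can be compared.
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    run_script(example)
output = buffer.getvalue()
print(output, end="")
```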

It might not be as obvious how to add loops and conditionals. After all, we have no way to declare a literal list, for example, never mind blocks of code. But we don't need to, because a command doesn't have to start at the first word of a line. Consider a script like:

lappend numbers 1 2 3 4 5
foreach i $numbers echo \$i

The lappend command is trivial to implement with the current setup:

def do_lappend(name, *values):
    if name not in variables:
        variables[name] = list(values)
    else:
        variables[name].extend(values)

commands["lappend"] = do_lappend
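Calling lappend twice on the same name grows the list instead of replacing it, which is what lets a script build a list up piece by piece. A quick check, calling the Python function directly with arguments the interpreter would already have parsed:

```python
variables = {}

def do_lappend(name, *values):
    # Create the list on first use; extend it on later calls.
    if name not in variables:
        variables[name] = list(values)
    else:
        variables[name].extend(values)

# What "lappend numbers 1 2 3" and then "lappend numbers 4 5" boil down to,
# once the interpreter has parsed the words into integers.
do_lappend("numbers", 1, 2, 3)
do_lappend("numbers", 4, 5)
print(variables["numbers"])  # [1, 2, 3, 4, 5]
```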

On the other hand, adding a for-each loop requires reworking all the code so far (except the already defined commands):

def run_script(text):
    for line in text.split("\n"):
        if line == "" or line.isspace(): continue
        run_command(line.split())

def run_command(words):
    words = [parse_value(i) for i in words]
    name = words[0]
    args = words[1:]
    commands[name](*args)

def parse_value(word):
    if word[0] == "\\":
        return word[1:]
    elif word[0] == "$":
        return variables[word[1:]]
    elif word.isdigit():
        return int(word)
    elif word[0] == "-" and word[1:].isdigit():
        return int(word)
    else:
        return word

See what I did there? Now it's possible to rerun a command that was already split up, and to pass through a word starting with a dollar sign so that it only gets parsed when it's supposed to. It took exactly five more lines, and now we can demonstrate that loop:

def do_foreach(name, values, *words):
    for i in values:
        variables[name] = i
        run_command(words)

commands["foreach"] = do_foreach
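The self-contained sketch that follows makes the escaping rule concrete, using the revised run_command and parse_value plus echo, lappend and foreach. It runs the loop once with the dollar sign escaped and once without. In the second case $i is looked up while the foreach line itself is being parsed, before the loop has assigned anything to i, so Python raises KeyError. The capture helper is an addition for this sketch, only there to collect printed output.

```python
import contextlib
import io

variables = {}
commands = {}

def run_script(text):
    for line in text.split("\n"):
        if line == "" or line.isspace():
            continue
        run_command(line.split())

def run_command(words):
    words = [parse_value(i) for i in words]
    commands[words[0]](*words[1:])

def parse_value(word):
    if word[0] == "\\":
        return word[1:]  # escaped: drop the backslash, parse it later
    elif word[0] == "$":
        return variables[word[1:]]
    elif word.isdigit():
        return int(word)
    elif word[0] == "-" and word[1:].isdigit():
        return int(word)
    else:
        return word

def do_echo(*args):
    print(*args)

def do_lappend(name, *values):
    if name not in variables:
        variables[name] = list(values)
    else:
        variables[name].extend(values)

def do_foreach(name, values, *words):
    for i in values:
        variables[name] = i
        run_command(words)  # the body's words are parsed again on every pass

commands.update(echo=do_echo, lappend=do_lappend, foreach=do_foreach)

def capture(text):
    """Run a script and return everything it printed."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        run_script(text)
    return buffer.getvalue()

escaped = capture(r"""
lappend numbers 1 2 3
foreach i $numbers echo \$i
""")
print(escaped, end="")

# Start over, this time forgetting the backslash.
variables.clear()
try:
    capture(r"""
lappend numbers 1 2 3
foreach i $numbers echo $i
""")
except KeyError as err:
    missing = err.args[0]
    print("no such variable:", missing)
```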

The way it works is, do_foreach receives all its arguments like any other command, but only keeps two for its own use; the rest make up the loop body. It's fragile and limited: better remember to escape those dollar signs, and you can only have one command in the body. We can work around that, as you'll see, but loops just aren't at home in our little language. Conditionals are another story:

set a 3
test $a > 0
iftrue set b \$a
iftrue incr b -1
iftrue echo Yes!!
iffalse set b 0
iffalse echo Noo...

You can probably guess that test is supposed to set a flag that subsequent iftrue and iffalse commands can check whenever needed. But what about the conditions proper? Do we have to parse arithmetic expressions now?

Nope! In fact the syntax of test is fixed, always taking the same form:

def do_test(op1, test, op2):
    if test == ">":
        variables["*TEST*"] = (op1 > op2)
    else:
        print("Unknown operator in test:", test)

def do_iftrue(*words):
    if "*TEST*" not in variables:
        print("iftrue without test")
    elif variables["*TEST*"]:
        run_command(words)

def do_iffalse(*words):
    if "*TEST*" not in variables:
        print("iffalse without test")
    elif not variables["*TEST*"]:
        run_command(words)

commands["test"] = do_test
commands["iftrue"] = do_iftrue
commands["iffalse"] = do_iffalse
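Each extra operator means another branch in do_test. Assuming Python's standard operator module is acceptable, a lookup table keeps that to one line per operator; TEST_OPERATORS is a made-up name for this sketch, and the operator spellings are only a suggestion:

```python
import operator

variables = {}

# Hypothetical table: each test operator maps to a function
# from Python's standard operator module.
TEST_OPERATORS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "==": operator.eq,
    "!=": operator.ne,
}

def do_test(op1, test, op2):
    if test in TEST_OPERATORS:
        variables["*TEST*"] = TEST_OPERATORS[test](op1, op2)
    else:
        print("Unknown operator in test:", test)

do_test(3, ">", 0)
print(variables["*TEST*"])  # True
do_test(3, "<=", 2)
print(variables["*TEST*"])  # False
```

Mixing types still bites, though: in Python 3, comparing a number to a word with < or > raises TypeError, while == simply returns False.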

Yes, adding all the other operators would be tedious, but it can't be helped. And yes, the test result can live just fine in an ordinary variable. No need to modify the interpreter just for that. Except the code doesn't work as shown, because iftrue and iffalse send their respective commands to be re-parsed, and parse_value chokes on anything that's not a string (anymore). It would be easy enough to escape arguments like 0 and -1, but seriously? So let's make another small change:

def parse_value(word):
    if type(word) != str:
        return word
    elif word[0] == "\\":
        return word[1:]
    elif word[0] == "$":
        return variables[word[1:]]
    elif word.isdigit():
        return int(word)
    elif word[0] == "-" and word[1:].isdigit():
        return int(word)
    else:
        return word
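Here is the conditional example assembled into one self-contained sketch with the updated parse_value. It runs the script twice: once as written, and once with a set to 0 so the iffalse branch fires instead. The capture helper and the BODY constant are conveniences of this sketch, not part of the language.

```python
import contextlib
import io

variables = {}
commands = {}

def run_script(text):
    for line in text.split("\n"):
        if line == "" or line.isspace():
            continue
        run_command(line.split())

def run_command(words):
    words = [parse_value(i) for i in words]
    commands[words[0]](*words[1:])

def parse_value(word):
    if type(word) != str:
        return word  # already parsed (an int, say): pass it through
    elif word[0] == "\\":
        return word[1:]
    elif word[0] == "$":
        return variables[word[1:]]
    elif word.isdigit():
        return int(word)
    elif word[0] == "-" and word[1:].isdigit():
        return int(word)
    else:
        return word

def do_set(name, value):
    variables[name] = value

def do_incr(name, value=1):
    variables[name] += value

def do_echo(*args):
    print(*args)

def do_test(op1, test, op2):
    if test == ">":
        variables["*TEST*"] = (op1 > op2)
    else:
        print("Unknown operator in test:", test)

def do_iftrue(*words):
    if "*TEST*" not in variables:
        print("iftrue without test")
    elif variables["*TEST*"]:
        run_command(words)

def do_iffalse(*words):
    if "*TEST*" not in variables:
        print("iffalse without test")
    elif not variables["*TEST*"]:
        run_command(words)

commands.update(set=do_set, incr=do_incr, echo=do_echo,
                test=do_test, iftrue=do_iftrue, iffalse=do_iffalse)

def capture(text):
    """Run a script and return everything it printed."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        run_script(text)
    return buffer.getvalue()

# Everything after the first line of the example script.
BODY = r"""
test $a > 0
iftrue set b \$a
iftrue incr b -1
iftrue echo Yes!!
iffalse set b 0
iffalse echo Noo...
"""

yes_output = capture("set a 3" + BODY)
b_when_true = variables["b"]
no_output = capture("set a 0" + BODY)
b_when_false = variables["b"]
print(yes_output + no_output, end="")
```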

Now it should finally run and print "Yes!!". But still: our language won't get very far without some means to type literal lists, strings, or anything that can work as a block of code. Good thing it doesn't need to.

Well, there is something we can do. For one thing, notice how run_command happily accepts any list of words, like those stored by do_lappend. We just need to introduce the two to each other:

def do_eval(words):
    # words is the whole list stored in a variable, e.g. by lappend
    run_command(words)

commands["eval"] = do_eval

Just like that, you can now run code like:

lappend thumbs-up echo Okay, \$name !
set name Venus
eval $thumbs-up
set name Steve
eval $thumbs-up

and get (you've guessed it):

Okay, Venus !
Okay, Steve !
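The same example as a self-contained sketch. In this version do_eval takes the stored list as its one argument, which is exactly what eval $thumbs-up passes it. It also shows that the stored list is never modified: run_command builds a fresh list of parsed words every time, so $name is looked up anew on each eval.

```python
import contextlib
import io

variables = {}
commands = {}

def run_script(text):
    for line in text.split("\n"):
        if line == "" or line.isspace():
            continue
        run_command(line.split())

def run_command(words):
    # Builds a new list of parsed words; the input list is left untouched.
    words = [parse_value(i) for i in words]
    commands[words[0]](*words[1:])

def parse_value(word):
    if type(word) != str:
        return word
    elif word[0] == "\\":
        return word[1:]
    elif word[0] == "$":
        return variables[word[1:]]
    elif word.isdigit():
        return int(word)
    elif word[0] == "-" and word[1:].isdigit():
        return int(word)
    else:
        return word

def do_set(name, value):
    variables[name] = value

def do_echo(*args):
    print(*args)

def do_lappend(name, *values):
    if name not in variables:
        variables[name] = list(values)
    else:
        variables[name].extend(values)

def do_eval(words):
    # words is the whole list stored in a variable, e.g. by lappend
    run_command(words)

commands.update(set=do_set, echo=do_echo, lappend=do_lappend, eval=do_eval)

buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    run_script(r"""
lappend thumbs-up echo Okay, \$name !
set name Venus
eval $thumbs-up
set name Steve
eval $thumbs-up
""")
output = buffer.getvalue()
print(output, end="")
print(variables["thumbs-up"])  # the stored words still say $name
```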

Except... it doesn't solve the quoting problem, which was the whole point. We're also going to need some way to tell the interpreter: "set aside any code you see from now on, until further notice, without any parsing". Which in turn means changing run_script again:

compiling = False
proc_name = ""
proc_code = []

def run_script(text):
    global compiling, proc_name, proc_code
    for line in text.split("\n"):
        if line == "" or line.isspace():
            continue
        elif not compiling:
            run_command(line.split())
        elif line.strip() != "end":
            proc_code.append(line.split())
        else:
            variables[proc_name] = list(proc_code)
            proc_name = ""
            proc_code.clear()
            compiling = False

Note how procedures (as we're going to call them) are stored alongside variables, because they're not the same as commands. They don't take arguments, for one thing. So we'll need some other way to run them:

def do_call(name):
    for i in variables[name]:
        run_command(i)

commands["call"] = do_call

But we still haven't given the interpreter a way to start compiling in the first place:

def do_proc(name):
    global compiling, proc_name
    compiling = True
    proc_name = name

commands["proc"] = do_proc

Ta-da! Now we can finally run code that looks like this:

proc count-up
    incr counter
    echo $counter
end
set counter 0
call count-up
call count-up
call count-up

Note how there's no need for a dollar sign in front of a procedure name when calling it. That's one less way to get in trouble.
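A self-contained sketch of the compile-mode interpreter, with run_script, do_proc and do_call as above plus the commands from earlier. Besides count-up, the script defines a second procedure, show (a name made up for this example), so foreach can run a two-line body through call.

```python
import contextlib
import io

variables = {}
commands = {}
compiling = False
proc_name = ""
proc_code = []

def run_script(text):
    global compiling, proc_name
    for line in text.split("\n"):
        if line == "" or line.isspace():
            continue
        elif not compiling:
            run_command(line.split())
        elif line.strip() != "end":
            proc_code.append(line.split())  # stored as plain words, unparsed
        else:
            variables[proc_name] = list(proc_code)
            proc_name = ""
            proc_code.clear()
            compiling = False

def run_command(words):
    words = [parse_value(i) for i in words]
    commands[words[0]](*words[1:])

def parse_value(word):
    if type(word) != str:
        return word
    elif word[0] == "\\":
        return word[1:]
    elif word[0] == "$":
        return variables[word[1:]]
    elif word.isdigit() or (word[0] == "-" and word[1:].isdigit()):
        return int(word)
    else:
        return word

def do_set(name, value):
    variables[name] = value

def do_incr(name, value=1):
    variables[name] += value

def do_echo(*args):
    print(*args)

def do_lappend(name, *values):
    variables.setdefault(name, []).extend(values)  # same effect as before

def do_foreach(name, values, *words):
    for i in values:
        variables[name] = i
        run_command(words)

def do_proc(name):
    global compiling, proc_name
    compiling = True
    proc_name = name

def do_call(name):
    for line in variables[name]:
        run_command(line)  # each stored line is parsed only now

commands.update(set=do_set, incr=do_incr, echo=do_echo, lappend=do_lappend,
                foreach=do_foreach, proc=do_proc, call=do_call)

SCRIPT = """
proc count-up
    incr counter
    echo $counter
end
set counter 0
call count-up
call count-up
call count-up

proc show
    echo item $i
    incr total $i
end
set total 0
lappend numbers 1 2 3
foreach i $numbers call show
echo total $total
"""

buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    run_script(SCRIPT)
output = buffer.getvalue()
print(output, end="")
```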

So now the foreach command can do more than one thing in the loop body, if only in a clunky way, and frankly procedures are a poor fit for the language. Worse, sooner or later we're going to want stuff like while loops, and that's just not possible the way this works so far. So how about we set this first interpreter aside and start over briefly:

variables = {}
commands = {}

code = []
labels = {}
program_counter = 0

def load_script(text):
    for line in text.split("\n"):
        if line == "" or line.isspace():
            continue
        else:
            code.append(line.split())

def run_script():
    global program_counter
    program_counter = 0
    while program_counter < len(code):
        run_command(code[program_counter])
        program_counter += 1

You can copy-paste run_command and the latest parse_value from the first interpreter unchanged. Same for the three commands needed to run the first example, and for lappend, test, iftrue and iffalse. Even foreach still works like before, so why did we bother?

Because now we can also do this:

set d 1
label start
echo $d
incr d
test $d > 10
iffalse goto start

How? With just two more commands:

def do_label(name):
    global program_counter
    labels[name] = program_counter

def do_goto(name):
    global program_counter
    program_counter = labels[name]

commands["label"] = do_label
commands["goto"] = do_goto

Now you can see why the new run_script relies on a while loop, and why the program counter is visible to the entire interpreter. Speaking of which: note how after a jump, execution continues with the line after the label. A label command can still run again, but only if we jump to another label placed before it. That won't change anything, at least unless the code changes in the meantime. But hopefully no-one is going to try that while it's running!
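Here is the second interpreter as a self-contained sketch, running the counting loop above (it should print 1 through 10). The tail end illustrates a consequence of recording labels only when their line runs: a goto cannot jump forward to a label that has not been reached yet, and with this code it fails with KeyError.

```python
import contextlib
import io

variables = {}
commands = {}

code = []
labels = {}
program_counter = 0

def load_script(text):
    for line in text.split("\n"):
        if line == "" or line.isspace():
            continue
        code.append(line.split())

def run_script():
    global program_counter
    program_counter = 0
    while program_counter < len(code):
        run_command(code[program_counter])
        program_counter += 1  # after a goto, this steps past the label line

def run_command(words):
    words = [parse_value(i) for i in words]
    commands[words[0]](*words[1:])

def parse_value(word):
    if type(word) != str:
        return word
    elif word[0] == "\\":
        return word[1:]
    elif word[0] == "$":
        return variables[word[1:]]
    elif word.isdigit() or (word[0] == "-" and word[1:].isdigit()):
        return int(word)
    else:
        return word

def do_set(name, value):
    variables[name] = value

def do_incr(name, value=1):
    variables[name] += value

def do_echo(*args):
    print(*args)

def do_test(op1, test, op2):
    if test == ">":
        variables["*TEST*"] = (op1 > op2)
    else:
        print("Unknown operator in test:", test)

def do_iffalse(*words):
    if "*TEST*" not in variables:
        print("iffalse without test")
    elif not variables["*TEST*"]:
        run_command(words)

def do_label(name):
    labels[name] = program_counter  # index of the label line itself

def do_goto(name):
    global program_counter
    program_counter = labels[name]

commands.update(set=do_set, incr=do_incr, echo=do_echo, test=do_test,
                iffalse=do_iffalse, label=do_label, goto=do_goto)

load_script("""
set d 1
label start
echo $d
incr d
test $d > 10
iffalse goto start
""")
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    run_script()
output = buffer.getvalue()
start_index = labels["start"]
print(output, end="")

# Labels are only recorded when their line runs, so a forward jump fails.
code.clear()
labels.clear()
load_script("goto later\necho skipped\nlabel later")
try:
    run_script()
except KeyError as err:
    forward_error = err.args[0]
```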

How ironic. I've shown you two different ways to do this, and I'm still only three quarters of the way into the intended word count. So let's take the time to look at another issue. Without a way to compose commands, how are we going to, say, take the size of a list and do something with it? Why, it's perfectly possible for a command to place its result in a predefined variable instead of returning it:

def do_length(value):
    variables["#"] = len(value)

commands["length"] = do_length

That's right... we can name a variable anything we want. Try it, with c being any list, such as one built with lappend:

length $c
echo $#
length hello
echo $#
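Here is that example as a self-contained sketch, assuming c is a three-item list built with lappend beforehand. Note that $# works with no special support: parse_value just sees a dollar sign followed by a variable name, which happens to be #.

```python
import contextlib
import io

variables = {}
commands = {}

def run_script(text):
    for line in text.split("\n"):
        if line == "" or line.isspace():
            continue
        run_command(line.split())

def run_command(words):
    words = [parse_value(i) for i in words]
    commands[words[0]](*words[1:])

def parse_value(word):
    if type(word) != str:
        return word
    elif word[0] == "\\":
        return word[1:]
    elif word[0] == "$":
        return variables[word[1:]]
    elif word.isdigit() or (word[0] == "-" and word[1:].isdigit()):
        return int(word)
    else:
        return word

def do_echo(*args):
    print(*args)

def do_lappend(name, *values):
    variables.setdefault(name, []).extend(values)  # same effect as before

def do_length(value):
    variables["#"] = len(value)  # result goes into a variable named #

commands.update(echo=do_echo, lappend=do_lappend, length=do_length)

buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    run_script("""
lappend c 1 2 3
length $c
echo $#
length hello
echo $#
""")
output = buffer.getvalue()
print(output, end="")
```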

In fact, list and string length is such a common operation that Lua, for example, has an operator for it. And we can have one too:

def parse_value(word):
    if type(word) != str:
        return word
    elif word[0] == "\\":
        return word[1:]
    elif word[0] == "$":
        return variables[word[1:]]
    elif word[0] == "#":
        return len(variables[word[1:]])
    elif word.isdigit():
        return int(word)
    elif word[0] == "-" and word[1:].isdigit():
        return int(word)
    else:
        return word

Though of course the echo #c syntax only works for variables, so length is still good to have around.
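A small sketch of the difference: #c and #word below are lengths of variables (a list and a string), while #hello fails because it is looked up as a variable named hello rather than measured as a word. It calls run_command directly instead of going through run_script, just to keep it short.

```python
import contextlib
import io

variables = {}
commands = {}

def run_command(words):
    words = [parse_value(i) for i in words]
    commands[words[0]](*words[1:])

def parse_value(word):
    if type(word) != str:
        return word
    elif word[0] == "\\":
        return word[1:]
    elif word[0] == "$":
        return variables[word[1:]]
    elif word[0] == "#":
        return len(variables[word[1:]])  # #name: length of a variable's value
    elif word.isdigit() or (word[0] == "-" and word[1:].isdigit()):
        return int(word)
    else:
        return word

def do_set(name, value):
    variables[name] = value

def do_echo(*args):
    print(*args)

def do_lappend(name, *values):
    variables.setdefault(name, []).extend(values)

commands.update(set=do_set, echo=do_echo, lappend=do_lappend)

run_command("lappend c 1 2 3".split())
run_command("set word hello".split())

buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    run_command("echo #c #word".split())
output = buffer.getvalue()
print(output, end="")  # 3 5

# "#hello" asks for the length of a variable named hello, which doesn't exist.
try:
    run_command("echo #hello".split())
except KeyError as err:
    missing = err.args[0]
```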

So this is it. No, really: pretty much everything else is a detail. And it's very easy to add new commands; beware of overdoing it!

That said, after reading the first draft, a friend (hi, Adrick!) asked what the point of this exercise was. Not that it's supposed to have a point. But it does. It was important for this language to be usable by non-programmers and still have a tiny implementation. Neither a Lisp-like nor a Forth-like language would have fit the bill, whereas this still-unnamed language was praised as more intuitive than JavaScript. And the original version used in production was very much improvised. I should update it using insights from this article at some point.

In the meantime, enjoy, and dare to be different.