No Time To Play

Code versus data: a battle for the ages

by Felix Pleșoianu on Mar. 26, 2015, under Miscellaneous

You know what HTML is, right? It’s the file format browsers read in order to show you pretty web pages, with all the bells and whistles you’re accustomed to. It can get quite complex, but at its most basic it is nothing more than this:

Here is some bold text for your enjoyment.

Let’s look at the HTML that produces the text above. Tell me, is it code or data?

Here is some <b>bold text</b> for your enjoyment.

It looks like data, right? After all, it’s mostly text, meant to be read by human beings. They even call these HTML documents! How could anyone think it’s code?

Well, I say it is, because it instructs your web browser to do five things in sequence:

  1. Display “Here is some ”;
  2. switch to bold text;
  3. display “bold text”;
  4. switch to regular text;
  5. display “ for your enjoyment.”.
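If you want to watch those five steps fall out of the markup, here is a minimal sketch using Python's standard `html.parser` module. The `InstructionLister` class and the `list_steps` helper are names made up for this illustration, and the sketch only knows about the `<b>` tag, so treat it as a toy and not as a picture of what a real browser does.

```python
from html.parser import HTMLParser

class InstructionLister(HTMLParser):
    """Hypothetical helper: records what the markup tells a renderer to do."""

    def __init__(self):
        super().__init__()
        self.steps = []

    def handle_starttag(self, tag, attrs):
        if tag == "b":  # the only tag this toy understands
            self.steps.append("switch to bold text")

    def handle_endtag(self, tag):
        if tag == "b":
            self.steps.append("switch to regular text")

    def handle_data(self, data):
        self.steps.append("display " + repr(data))

def list_steps(markup):
    lister = InstructionLister()
    lister.feed(markup)
    lister.close()  # flush any text still buffered by the parser
    return lister.steps

for step in list_steps("Here is some <b>bold text</b> for your enjoyment."):
    print(step)
```

The parser hands over the document piece by piece, in order, which is exactly what a list of instructions looks like.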

Does that look like programming to you yet? Maybe it’s not cryptic enough. Let’s see how the same effect could be accomplished with an older language called Troff, which has been around since the early days of Unix:

Here is some
.B
bold text
.R
for your enjoyment.

There you go. The exact sequence of instructions I listed above, made explicit — a big no-no nowadays, when we like to pretend computers are easy. But even if you just select the text and click “Bold” in your favorite editor, deep down you’re expressing the same thing — a little computer program.
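To underline how literally those lines are instructions, here is a rough sketch of a reader for the snippet above. It is an assumption-laden toy: `read_toy_troff` is a made-up name, it understands nothing but `.B` and `.R`, and it has very little to do with how the real program works.

```python
def read_toy_troff(source):
    """Turn the snippet into (style, text) pairs. Handles only .B and .R."""
    style = "regular"
    runs = []
    for line in source.splitlines():
        if line == ".B":        # instruction: switch to bold
            style = "bold"
        elif line == ".R":      # instruction: switch back to regular
            style = "regular"
        elif line:              # anything else is text to display
            runs.append((style, line))
    return runs

snippet = "Here is some\n.B\nbold text\n.R\nfor your enjoyment."
for style, text in read_toy_troff(snippet):
    print(style, "->", text)
```

Lines starting with a dot change the state of the machine; every other line gets displayed in whatever state the machine happens to be in.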

How is that possible? Easy: because data is never “just data”. Computers act on it (and so do people). But in order to act upon data, we need to give it meaning — to interpret it.

And once you interpret data, how is it any different from a program?

Now, I’m not saying HTML is somehow on the same level as BASIC. But many people (even some programmers!) seem to attribute to programming languages some sort of mystique, some unique quality that sets them apart from data just because they are made on purpose for telling computers what to do.

But data can do that just as well. In fact, computer programs are data too. We’re just conditioned to treat them as distinct most of the time.

Wait, you’re going to say. Some data really is just data, not meant to be interpreted as instructions for the computer. Take a plain text file for example. Sure, it has newlines, but can you consider those instructions to the text editor? That’s kind of ludicrous, right? You have to draw the line somewhere.

Why, sure. The question is where to draw the line. Take these five numbers: 184 1 0 205 51. Now those look perfectly innocuous… no way a computer could find meaning in them. Except… that’s machine code for loading 1 into the AX register and calling interrupt 0x33 on Intel CPUs, part of how I used to set up the mouse in DOS. Get the CPU to execute those numbers as code, even by accident, and it will happily do what the numbers say…
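Here is a small sketch of reading those five numbers the way a 16-bit x86 CPU would. The `toy_disassemble` function is hypothetical and knows only the two opcodes that occur in the example (0xB8, load a 16-bit value into AX, and 0xCD, call an interrupt); anything else makes it give up.

```python
def toy_disassemble(code):
    """Decode only the two 16-bit x86 opcodes that appear in the example."""
    listing = []
    i = 0
    while i < len(code):
        opcode = code[i]
        if opcode == 0xB8:    # MOV AX, imm16 (operand stored low byte first)
            value = int.from_bytes(code[i + 1:i + 3], "little")
            listing.append(f"mov ax, 0x{value:04x}")
            i += 3
        elif opcode == 0xCD:  # INT imm8
            listing.append(f"int 0x{code[i + 1]:02x}")
            i += 2
        else:
            raise ValueError(f"opcode 0x{opcode:02x} is outside this toy's vocabulary")
    return listing

numbers = [184, 1, 0, 205, 51]   # "just data"
as_bytes = bytes(numbers)        # the very same values, as raw bytes
for line in toy_disassemble(as_bytes):
    print(line)
```

Nothing about the list changes between the two readings. Only the interpretation does.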

That’s how buffer overflow attacks happen. Maybe you’ve heard of them. And don’t even get me started about SQL injections. There’s no way people would type code into a website’s login form, right? That’s supposed to be data.

Well, you know what they say about assuming…
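For the curious, this is roughly what that looks like in practice. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the `users` table, the two login functions and the sample account are all invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'secret')")

def login_unsafe(name, password):
    # The form input is pasted into the query text, so it gets parsed as SQL.
    query = ("SELECT COUNT(*) FROM users WHERE name = '" + name +
             "' AND password = '" + password + "'")
    return conn.execute(query).fetchone()[0] > 0

def login_safe(name, password):
    # Placeholders keep the input on the data side of the fence.
    query = "SELECT COUNT(*) FROM users WHERE name = ? AND password = ?"
    return conn.execute(query, (name, password)).fetchone()[0] > 0

attack = "' OR '1'='1"   # typed into the password field
print(login_unsafe("alice", attack))
print(login_safe("alice", attack))
```

In the first function the "password" rewrites the condition so that it is always true; in the second, the same characters are compared as an ordinary string and the login fails.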

After such an introduction, you might be surprised to learn that this article is not, in fact, about computer security. But what it is about will take some explaining. For now, let’s say it’s about writing software more easily. Specifically, videogames.

Say you decided to code a very short and crappy text adventure, of the sort kids with home computers used to churn out in the 1980s. A first, naive implementation might look like this:

here = "ground-floor"

cable_plugged = False

def describe_room(room):
    if room == "ground-floor":
        print("You are in an abandoned factory.")
        print("You see: conveyor belt, big black button, stairs")
    elif room == "top-floor":
        print("You're at the top of the conveyor belt.")
        if cable_plugged:
            print("You see: plugged cable, humming motor.")
        else:
            print("You see: unplugged cable, electric motor.")

def do_command(command):
    global here, cable_plugged
    if here == "ground-floor":
        if command == "up" or command == "u":
            here = "top-floor"
        elif command == "push button":
            if cable_plugged:
                print("The conveyor belt starts moving.")
                print("You have won."); exit()
            else:
                print("Nothing happens.")
        else:
            print("Huh?")
    elif here == "top-floor":
        if command == "down" or command == "d":
            here = "ground-floor"
        elif command == "plug cable":
            print("You plug in the cable. The motor hums.")
            cable_plugged = True
        elif command == "unplug cable":
            print("You unplug the cable. The motor falls silent.")
            cable_plugged = False
        else:
            print("Huh?")

while True:
    describe_room(here)
    command = input("What do you do? ")
    if command == "quit":
        break
    else:
        do_command(command)

Ta-daa! And it’s less than 50 lines of Python 3. Try it, it’s actually playable, if completely uninteresting. But that’s not nearly its biggest problem.

Quite clearly, hardcoding every little detail as a special case simply doesn’t scale. If you tried to add more rooms, more commands, more flags… the complexity would quickly get out of hand. In fact, that was likely one of the first things you learned as a programmer. So what can you do? Identify the commonalities.

here = "ground-floor"

cable_plugged = False

def push_button():
    if cable_plugged:
        print("The conveyor belt starts moving.")
        print("You have won."); exit()
    else:
        print("Nothing happens.")

def plug_cable():
    global cable_plugged
    print("You plug in the cable. The motor hums.")
    cable_plugged = True

def unplug_cable():
    global cable_plugged
    print("You unplug the cable. The motor falls silent.")
    cable_plugged = False

descs = {
    "ground-floor": """You are in an abandoned factory.
        You see: conveyor belt, big black button, stairs""",
    "top-floor": """You're at the top of the conveyor belt.
        You see: insulated cable, electric motor."""
}

exits = {
    "ground-floor": {"up": "top-floor", "u": "top-floor"},
    "top-floor": {"down": "ground-floor", "d": "ground-floor"}
}

commands = {
    "ground-floor": {"push button": push_button},
    "top-floor": {"plug cable": plug_cable, "unplug cable": unplug_cable}
}

while True:
    print(descs[here])
    cmd = input("What do you do? ")
    if cmd == "quit":
        break
    elif cmd in commands[here]:
        commands[here][cmd]()
    elif cmd in exits[here]:
        here = exits[here][cmd]
    else:
        print("Huh?")

Oops! It appears we’ve gained two lines of code and actually lost functionality. So why bother? Easy: because now you can add another room with as little as three lines of code, rather than at least eight in the first version. (Try it!) Add ten more rooms, and you’ve saved fifty lines of code — the entire length of the program so far.
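As a sketch of what "three lines" means here: a new room is one entry in each of the three tables. The "basement" room below is hypothetical, and the tables are trimmed down to the bare minimum so the example stands on its own.

```python
# Trimmed copies of the three tables, just enough for the example.
descs = {"ground-floor": "You are in an abandoned factory."}
exits = {"ground-floor": {"up": "top-floor", "u": "top-floor"}}
commands = {"ground-floor": {}}

# The new room itself: one line per table.
descs["basement"] = "You are in a damp basement.\nYou see: stairs"
exits["basement"] = {"up": "ground-floor", "u": "ground-floor"}
commands["basement"] = {}   # no special commands, but the main loop looks the room up here

# Hooking it up to an existing room is a change to that room, not new logic.
exits["ground-floor"].update({"down": "basement", "d": "basement"})

print(descs[exits["ground-floor"]["down"]])
```

Note that the main loop doesn't change at all. It never knew how many rooms there were in the first place.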

Such is the power of implicit instructions.

Now what if I told you we can do even better? Look at those command handlers. Don’t they look kind of… uniform? Indeed, we can reduce them to a couple of basic forms:

(print a message, set a flag)
(test a flag, if true (print a message, set a flag), else print a message)

(Actually we don’t set a flag in push_button(), we exit directly. But that’s easily changed.)

There’s one more thing to do: group all those flags and messages together.

flags = {"cable_plugged": False, "won": False}

msgs = {
    "winmsg": "The conveyor belt starts moving.",
    "nope": "Nothing happens.",
    "plug": "You plug in the cable. The motor hums.",
    "unplug": "You unplug the cable. The motor falls silent."
}

Now we can rewrite the command handlers as follows:

commands = {
    "ground-floor": {"push button":
        ("cable_plugged", True, ("winmsg", "won", True), "nope")},
    "top-floor": {"plug cable": ("plug", "cable_plugged", True),
        "unplug cable": ("unplug", "cable_plugged", False)}
}

The only snag is that we now need a new Python function — just one — to interpret those tuples for us. Don’t worry, it’s very short:

def do_command(command):
    if isinstance(command, str):
        print(msgs[command])
    elif len(command) == 3:
        print(msgs[command[0]])
        flags[command[1]] = command[2]
    elif flags[command[0]] == command[1]:
        do_command(command[2])
    else:
        do_command(command[3])
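Because the function calls itself on the branches of a test, a branch can be another test. Here is a small sketch of that, with the function copied from above; the `fuse_ok` flag and the `nofuse` message are made up for the example and are not part of the game.

```python
flags = {"cable_plugged": True, "fuse_ok": False, "won": False}

msgs = {
    "winmsg": "The conveyor belt starts moving.",
    "nope": "Nothing happens.",
    "nofuse": "The motor hums, but a fuse has blown.",  # hypothetical extra message
}

def do_command(command):
    if isinstance(command, str):            # a bare message key
        print(msgs[command])
    elif len(command) == 3:                 # (message, flag, value)
        print(msgs[command[0]])
        flags[command[1]] = command[2]
    elif flags[command[0]] == command[1]:   # (flag, value, then, else)
        do_command(command[2])
    else:
        do_command(command[3])

# A test whose "then" branch is itself a test.
push_button = ("cable_plugged", True,
    ("fuse_ok", True, ("winmsg", "won", True), "nofuse"),
    "nope")

do_command(push_button)   # fuse_ok is False: the "nofuse" message is printed
flags["fuse_ok"] = True
do_command(push_button)   # both tests pass: "winmsg" is printed, "won" is set
```

One quirk of this encoding is worth knowing: any three-element tuple is read as "print and set", so a test always needs both of its branches spelled out.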

And of course now we have to call it instead of the old command handlers:

while not flags["won"]:
    print(descs[here])
    cmd = input("What do you do? ")
    if cmd == "quit":
        break
    elif cmd in commands[here]:
        do_command(commands[here][cmd])
    elif cmd in exits[here]:
        here = exits[here][cmd]
    else:
        print("Huh?")

if flags["won"]:
    print("You have won!")

Wait, did I say “interpret”? As a matter of fact, yes: do_command() is quite literally an interpreter for a very simple programming language. (Note the recursive calls: you can have conditionals in conditionals and so on.) Let that sink in for a moment: your data structures now have code in them… while still being “just data”!

There’s only one downside: now the entire code is fully 6 lines longer than before. So what exactly did we achieve here?

The answer is that now the game’s description is entirely separated from the code that runs it. We can rewrite those final 25 lines of Python into any other language and the game data will barely notice. The data is, after all, almost valid JSON. No, really:

{"exits":{"ground-floor": {"u": "top-floor", "up": "top-floor"},
    "top-floor": {"d": "ground-floor", "down": "ground-floor"}},
"here": "ground-floor", "flags": {"cable_plugged": false, "won": false},
"descs": {"ground-floor": "You are in an abandoned factory.\n\t\t\tYou see: conveyor belt, big black button, stairs",
    "top-floor": "You're at the top of the conveyor belt.\n\t\t\tYou see: insulated cable, electric motor."},
"commands": {"ground-floor":
        {"push button": ["cable_plugged", true, ["winmsg", "won", true], "nope"]},
    "top-floor": {"unplug cable": ["unplug", "cable_plugged", false],
        "plug cable": ["plug", "cable_plugged", true]}},
"msgs": {"unplug": "You unplug the cable. The motor falls silent.",
    "winmsg": "The conveyor belt starts moving.",
    "nope": "Nothing happens.",
    "plug": "You plug in the cable. The motor hums."}}

What we have here is, in effect, a language for describing text adventures. One that’s completely independent of the serialization format (you can just as easily re-encode the data as XML) and can be read with 25 lines of trivial Python code, or the equivalent in any other language you care to use. Of course, there are some limitations. You can’t do just anything in the command handlers, or have dynamic descriptions, or lose the game. But first of all you can write a variety of text adventures even so, and second, you can always extend the format. I just had to keep it short for my purposes.
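As a sketch of how little the interpreter cares about the serialization, here is a trimmed version of the game loaded with Python's standard `json` module. The `GAME_JSON` string is a cut-down copy of the data, and this `do_command` takes the loaded game as an extra argument instead of using globals; both choices are mine, for the sake of a self-contained example.

```python
import json

GAME_JSON = """
{"flags": {"cable_plugged": false, "won": false},
 "msgs": {"plug": "You plug in the cable. The motor hums.",
          "nope": "Nothing happens.",
          "winmsg": "The conveyor belt starts moving."},
 "commands": {
   "ground-floor": {"push button":
       ["cable_plugged", true, ["winmsg", "won", true], "nope"]},
   "top-floor": {"plug cable": ["plug", "cable_plugged", true]}}}
"""

def do_command(game, command):
    """Interpret one command. JSON arrays arrive as lists, which index like tuples."""
    if isinstance(command, str):
        print(game["msgs"][command])
    elif len(command) == 3:
        print(game["msgs"][command[0]])
        game["flags"][command[1]] = command[2]
    elif game["flags"][command[0]] == command[1]:
        do_command(game, command[2])
    else:
        do_command(game, command[3])

game = json.loads(GAME_JSON)
do_command(game, game["commands"]["ground-floor"]["push button"])  # cable unplugged: "nope"
do_command(game, game["commands"]["top-floor"]["plug cable"])
do_command(game, game["commands"]["ground-floor"]["push button"])  # now the winning branch
```

The tuples have turned into lists along the way, and the interpreter doesn't notice, because all it ever does is measure and index them.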

This approach isn’t even new. Various text adventure engines used in the 1980s, such as The Quill, worked in exactly this way. And while they were noticeably limited compared to Infocom’s Z-Machine, that was mostly because they had to run on computers with as little as 16 kilobytes of RAM. On a modern PC you can go as sophisticated as you like.

But why a database and not a virtual machine? All modern text adventures use one of the latter — and many graphical adventures, too. And indeed, a VM is more flexible. It can run faster as well, which is especially important on older computers, or in web browsers. But a database has advantages of its own:

  • you can eyeball the data and tweak it by hand;
  • you can analyze and transform the data with automated tools;
  • you can put a graphical front-end on it;
  • if you ever lose the interpreter, you can guess how the system works by reading the data, and make it work again at least partially.
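To give the "automated tools" point some substance, here is a sketch of a tiny checker that walks the command table and compares the message keys it finds against the message table. The function names are made up, and the sample data contains a deliberate slip (the wrong message key on one command) so there is something to find.

```python
def referenced_messages(command):
    """Yield every message key a command can print (same shapes as do_command)."""
    if isinstance(command, str):
        yield command
    elif len(command) == 3:          # (message, flag, value)
        yield command[0]
    else:                            # (flag, value, then, else)
        yield from referenced_messages(command[2])
        yield from referenced_messages(command[3])

def check_messages(commands, msgs):
    """Return (keys used but not defined, keys defined but never used)."""
    used = {key
            for room in commands.values()
            for command in room.values()
            for key in referenced_messages(command)}
    return sorted(used - set(msgs)), sorted(set(msgs) - used)

msgs = {"winmsg": "...", "nope": "...", "plug": "...", "unplug": "..."}
commands = {
    "ground-floor": {"push button":
        ("cable_plugged", True, ("winmsg", "won", True), "nope")},
    "top-floor": {"plug cable": ("plug", "cable_plugged", True),
        "unplug cable": ("plug", "cable_plugged", False)},  # deliberate slip
}

missing, unused = check_messages(commands, msgs)
print("missing:", missing)
print("unused:", unused)
```

The checker never runs the game. It just reads the data, and an unused "unplug" message is a strong hint that some command is pointing at the wrong text.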

Most importantly for my goal in writing this article, it demonstrates how the line between data and code is blurry at best. And that should change the way you view content, configuration… even scripting. Now you can treat them all as part of a whole, rather than misshapen pieces to juggle awkwardly.

See, there is no code versus data. The conflict was all in your mind.

(Special thanks to Chris Riddoch for suggesting a better approach, and to Abbey Spracklin for inspiring this article in the first place.)

Creative Commons License
Code versus data: a battle for the ages by Felix Pleșoianu is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
