You know what HTML is, right? It's the file format browsers read in
order to show you pretty web pages, with all the bells and whistles
you're accustomed to. It can get quite complex, but at its most basic
it's nothing more than this:
Here is some bold text for your enjoyment.
Let's look at the HTML that produces the text above. Tell me, is it
code or data?
Here is some <b>bold text</b> for your enjoyment.
It looks like data, right? After all, it's mostly text, meant to be
read by human beings. They even call these HTML documents! How could
anyone think it's code?
Well, I say it is, because it instructs your web browser to do five
things in sequence:
- Display "Here is some ";
- switch to bold text;
- display "bold text";
- switch to regular text;
- display " for your enjoyment.".
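To make that list concrete, here is a minimal sketch that turns the HTML snippet into those same five instructions. It relies on Python's standard html.parser module; the names InstructionLister and list_instructions are made up for this illustration, and only the <b> tag is understood.

```python
from html.parser import HTMLParser


class InstructionLister(HTMLParser):
    """Hypothetical helper: records what the markup tells a browser to do."""

    def __init__(self):
        super().__init__()
        self.instructions = []

    def handle_starttag(self, tag, attrs):
        if tag == "b":
            self.instructions.append("switch to bold text")

    def handle_endtag(self, tag):
        if tag == "b":
            self.instructions.append("switch to regular text")

    def handle_data(self, data):
        # Plain text between tags becomes a "display" instruction.
        self.instructions.append(f"display {data!r}")


def list_instructions(html):
    parser = InstructionLister()
    parser.feed(html)
    parser.close()  # flush any text still buffered at the end
    return parser.instructions


for step in list_instructions("Here is some <b>bold text</b> for your enjoyment."):
    print(step)
```

The parser never looks at what the text says; it only reacts to the markup, much like an interpreter stepping through statements.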
Does that look like programming to you yet? Maybe it's not cryptic
enough. Let's see how the same effect could be accomplished with an
older language called Troff, that they used in the mainframe era:
Here is some
.B
bold text
.R
for your enjoyment.
There you go. The exact sequence of instructions I listed above, made
explicit — a big no-no nowadays, when we like to pretend computers
are easy. But even if you just select the text and click "Bold" in
your favorite editor, deep down you're expressing the same thing —
a little computer program.
How is that possible? Easy: because data is never "just data".
Computers act on it (and so do people). But in order to act upon
data, we need to give it meaning — to interpret it.
And once you interpret data, how is it any different from a program?
Now, I'm not saying HTML is somehow on the same level as BASIC. But
many people (even some programmers!) seem to attribute programming
languages some sort of mystique — some sort of unique quality that
sets them apart from data just because they are made on purpose for
telling computers what to do.
But data can do that just as well. In fact, computer programs are data
too. We're just conditioned to treat them as distinct most of the time.
Wait, you're going to say. Some data really is just data, not meant to
be interpreted as instructions for the computer. Take a plain text
file for example. Sure, it has newlines, but can you consider those
instructions to the text editor? That's kind of ludicrous, right? You
have to draw the line somewhere.
Why, sure. The question is where to draw the line. Take these five
numbers: 184 1 0 205 51. Now those look perfectly innocuous... no way
a computer could find meaning in them. Except... that's machine code
for loading 1 into the AX register and calling interrupt 0x33 on Intel
CPUs, which is how I used to show the mouse cursor in DOS. Get the CPU
to execute those numbers as code, even by accident, and it will
happily do what the numbers say...
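If you want to check those numbers yourself, here is a small sketch of a disassembler that knows exactly two opcodes: 0xB8 (load a 16-bit value into AX) and 0xCD (call an interrupt). It assumes 16-bit real mode, as in DOS; the function name disassemble is made up for this example, and a real disassembler handles vastly more.

```python
def disassemble(code):
    """Hypothetical two-opcode decoder for 16-bit x86 real mode."""
    listing = []
    i = 0
    while i < len(code):
        opcode = code[i]
        if opcode == 0xB8:
            # MOV AX, imm16: the operand is stored low byte first.
            value = code[i + 1] | (code[i + 2] << 8)
            listing.append(f"mov ax, {value:#06x}")
            i += 3
        elif opcode == 0xCD:
            # INT imm8: the operand is the interrupt number.
            listing.append(f"int {code[i + 1]:#04x}")
            i += 2
        else:
            raise ValueError(f"unknown opcode {opcode:#04x}")
    return listing


numbers = [184, 1, 0, 205, 51]
print(" ".join(f"{n:02x}" for n in numbers))  # the same numbers in hex
for line in disassemble(numbers):
    print(line)
```

Same five numbers, two readings: a list of integers, or two instructions.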
That's how buffer overflow attacks happen. Maybe you've heard
of them. And don't even get me started about SQL injections.
There's no way people would type code into a website's login form,
right? That's supposed to be data.
Well, you know what they say about assuming...
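In case you haven't seen one, here is a rough sketch of how a login form turns data into code. It uses Python's built-in sqlite3 module with an in-memory database; the table layout and the functions make_db, login_unsafe and login_safe are invented for the example, and no real site should store passwords like this.

```python
import sqlite3


def make_db():
    """Build a throwaway in-memory database with a single user."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE users (name TEXT, password TEXT)")
    db.execute("INSERT INTO users VALUES ('alice', 'secret')")
    return db


def login_unsafe(db, name, password):
    # The input is pasted into the query text, so it gets parsed as SQL.
    query = ("SELECT count(*) FROM users "
             f"WHERE name = '{name}' AND password = '{password}'")
    return db.execute(query).fetchone()[0] > 0


def login_safe(db, name, password):
    # Placeholders keep the input as plain values, never as SQL.
    query = "SELECT count(*) FROM users WHERE name = ? AND password = ?"
    return db.execute(query, (name, password)).fetchone()[0] > 0


db = make_db()
attack = "' OR '1'='1"
print(login_unsafe(db, "alice", attack))  # the "password" rewrote the query
print(login_safe(db, "alice", attack))    # here it is just a wrong password
```

The only difference between the two functions is who gets to interpret the password: the SQL parser, or nobody.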
After such an introduction, you might be surprised to learn that this
article is not, in fact, about computer security. But what it is
about will take some explaining. For now, let's say it's about writing
software more easily. Specifically, videogames.
Say you decided to code a very short and crappy text adventure, of the
sort kids with home computers used to churn out in the 1980s. A first,
naive implementation might look like this:
here = "ground-floor"
cable_plugged = False

def describe_room(room):
    if room == "ground-floor":
        print("You are in an abandoned factory.")
        print("You see: conveyor belt, big black button, stairs")
    elif room == "top-floor":
        print("You're at the top of the conveyor belt.")
        if cable_plugged:
            print("You see: plugged cable, humming motor.")
        else:
            print("You see: unplugged cable, electric motor.")

def do_command(command):
    global here, cable_plugged
    if here == "ground-floor":
        if command == "up" or command == "u":
            here = "top-floor"
        elif command == "push button":
            if cable_plugged:
                print("The conveyor belt starts moving.")
                print("You have won."); exit()
            else:
                print("Nothing happens.")
        else:
            print("Huh?")
    elif here == "top-floor":
        if command == "down" or command == "d":
            here = "ground-floor"
        elif command == "plug cable":
            print("You plug in the cable. The motor hums.")
            cable_plugged = True
        elif command == "unplug cable":
            print("You unplug the cable. The motor falls silent.")
            cable_plugged = False
        else:
            print("Huh?")

while True:
    describe_room(here)
    command = input("What do you do? ")
    if command == "quit":
        break
    else:
        do_command(command)
Ta-daa! And it's less than 50 lines of Python 3. Try it, it's actually
playable, if completely uninteresting. But that's not nearly its
biggest problem.
Quite clearly, hardcoding every little detail as a special case simply
doesn't scale. If you tried to add more rooms, more commands, more
flags... the complexity would quickly get out of hand. In fact, that
was likely one of the first things you learned as a programmer. So
what can you do? Identify the commonalities.
here = "ground-floor"
cable_plugged = False

def push_button():
    if cable_plugged:
        print("The conveyor belt starts moving.")
        print("You have won."); exit()
    else:
        print("Nothing happens.")

def plug_cable():
    global cable_plugged
    print("You plug in the cable. The motor hums.")
    cable_plugged = True

def unplug_cable():
    global cable_plugged
    print("You unplug the cable. The motor falls silent.")
    cable_plugged = False

descs = {
    "ground-floor": """You are in an abandoned factory.
You see: conveyor belt, big black button, stairs""",
    "top-floor": """You're at the top of the conveyor belt.
You see: insulated cable, electric motor."""
}

exits = {
    "ground-floor": {"up": "top-floor", "u": "top-floor"},
    "top-floor": {"down": "ground-floor", "d": "ground-floor"}
}

commands = {
    "ground-floor": {"push button": push_button},
    "top-floor": {"plug cable": plug_cable, "unplug cable": unplug_cable}
}

while True:
    print(descs[here])
    cmd = input("What do you do? ")
    if cmd == "quit":
        break
    elif cmd in commands[here]:
        commands[here][cmd]()
    elif cmd in exits[here]:
        here = exits[here][cmd]
    else:
        print("Huh?")
Oops! It appears we've gained two lines of code and actually lost
functionality. So why bother? Easy: because now you can add another
room with as little as three lines of code, rather than at least eight
in the first version. (Try it!) Add ten more rooms, and you've saved
fifty lines of code — the entire length of the program so far.
Such is the power of implicit instructions.
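If you'd rather see it than try it, here is one way adding a room could look. The basement and its description are made up for this sketch, the command handlers are left out to keep it short, and walk() is a hypothetical helper that just follows exits so the result can be checked without playing.

```python
descs = {
    "ground-floor": "You are in an abandoned factory.\n"
                    "You see: conveyor belt, big black button, stairs",
    "top-floor": "You're at the top of the conveyor belt.\n"
                 "You see: insulated cable, electric motor.",
}
exits = {
    "ground-floor": {"up": "top-floor", "u": "top-floor"},
    "top-floor": {"down": "ground-floor", "d": "ground-floor"},
}
commands = {"ground-floor": {}, "top-floor": {}}  # handlers omitted here

# The three lines that add a room (an invented basement):
descs["basement"] = "You are in a damp basement.\nYou see: rusty pipes, stairs"
exits["basement"] = {"up": "ground-floor", "u": "ground-floor"}
commands["basement"] = {}

# One more line so the player can actually get down there.
exits["ground-floor"].update({"down": "basement", "d": "basement"})


def walk(start, moves):
    """Follow a list of moves through the exits table; bad moves are ignored."""
    room = start
    for move in moves:
        room = exits[room].get(move, room)
    return room


print(walk("ground-floor", ["down", "up", "up"]))
```

Note that the main loop didn't change at all: the new room is pure data.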
Now what if I told you we can do even better? Look at those command
handlers. Don't they look kind of... uniform? Indeed, we can reduce
them to a couple of basic forms:
(print a message, set a flag)
(test a flag, if true (print a message, set a flag), else print a message)
(Actually we don't set a flag in push_button(), we exit directly. But
that's easily changed.)
There's one more thing to do: group all those flags and messages
together.
flags = {"cable_plugged": False, "won": False}

msgs = {
    "winmsg": "The conveyor belt starts moving.",
    "nope": "Nothing happens.",
    "plug": "You plug in the cable. The motor hums.",
    "unplug": "You unplug the cable. The motor falls silent."
}
Now we can rewrite the command handlers as follows:
commands = {
    "ground-floor": {"push button":
        ("cable_plugged", True, ("winmsg", "won", True), "nope")},
    "top-floor": {"plug cable": ("plug", "cable_plugged", True),
        "unplug cable": ("unplug", "cable_plugged", False)}
}
The only snag is that we now need a new Python function — just one —
to interpret those tuples for us. Don't worry, it's very short:
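def do_command(command):
    if isinstance(command, str):
        # A bare string: print that message.
        print(msgs[command])
    elif len(command) == 3:
        # (message, flag, value): print the message, then set the flag.
        print(msgs[command[0]])
        flags[command[1]] = command[2]
    # Otherwise (flag, value, then, else): test the flag and recurse.
    elif flags[command[0]] == command[1]:
        do_command(command[2])
    else:
        do_command(command[3])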
And of course now we have to call it instead of the old command
handlers:
while not flags["won"]:
    print(descs[here])
    cmd = input("What do you do? ")
    if cmd == "quit":
        break
    elif cmd in commands[here]:
        do_command(commands[here][cmd])
    elif cmd in exits[here]:
        here = exits[here][cmd]
    else:
        print("Huh?")

# Only congratulate the player if the loop ended by winning, not by quitting.
if flags["won"]:
    print("You have won!")
Wait, did I say "interpret"? As a matter of fact, yes: do_command() is
quite literally an interpreter for a very simple programming language.
(Note the recursive calls: you can have conditionals in conditionals
and so on.) Let that sink in for a moment: your data structures now
have code in them... while still being "just data"!
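To see the nesting at work, here is a small sketch with a conditional inside a conditional. The door, lamp and key are invented for the example, and run() is a variant of do_command() that collects its output in a list instead of printing, so the result is easy to inspect.

```python
def run(command, flags, msgs, out):
    """Same three forms as do_command(), but appends messages to out."""
    if isinstance(command, str):
        out.append(msgs[command])
    elif len(command) == 3:
        out.append(msgs[command[0]])
        flags[command[1]] = command[2]
    elif flags[command[0]] == command[1]:
        run(command[2], flags, msgs, out)
    else:
        run(command[3], flags, msgs, out)


msgs = {
    "dark": "It's too dark to see.",
    "locked": "The door is locked.",
    "open": "The door swings open.",
}
flags = {"lamp_lit": False, "has_key": False, "door_open": False}

# if lamp_lit: (if has_key: open the door, else "locked") else "dark"
open_door = ("lamp_lit", True,
             ("has_key", True, ("open", "door_open", True), "locked"),
             "dark")


def try_open(**state):
    """Run open_door against a copy of the flags with some values overridden."""
    current = dict(flags, **state)
    out = []
    run(open_door, current, msgs, out)
    return out, current["door_open"]


print(try_open())
print(try_open(lamp_lit=True))
print(try_open(lamp_lit=True, has_key=True))
```

The nested tuple is an abstract syntax tree in all but name, and run() walks it the way any tree-walking interpreter would.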
There's only one downside: now the entire code is fully 5 lines longer
than before. So what exactly did we achieve here?
The answer is that now the game's description is entirely separated
from the code that runs it. We can rewrite those final 25 lines of
Python into any other language and the game data will barely notice.
The data is, after all, almost valid JSON. No, really:
{"exits": {"ground-floor": {"u": "top-floor", "up": "top-floor"},
           "top-floor": {"d": "ground-floor", "down": "ground-floor"}},
 "here": "ground-floor",
 "flags": {"cable_plugged": false, "won": false},
 "descs": {"ground-floor": "You are in an abandoned factory.\nYou see: conveyor belt, big black button, stairs",
           "top-floor": "You're at the top of the conveyor belt.\nYou see: insulated cable, electric motor."},
 "commands": {"ground-floor":
                {"push button": ["cable_plugged", true, ["winmsg", "won", true], "nope"]},
              "top-floor": {"unplug cable": ["unplug", "cable_plugged", false],
                            "plug cable": ["plug", "cable_plugged", true]}},
 "msgs": {"unplug": "You unplug the cable. The motor falls silent.",
          "winmsg": "The conveyor belt starts moving.",
          "nope": "Nothing happens.",
          "plug": "You plug in the cable. The motor hums."}}
What we have here is, in effect, a language for describing text
adventures. One that's completely independent of the serialization
format (you can just as easily re-encode the data as XML) and can be
read with 25 lines of trivial Python code — or any other language you
care to use. Of course, there are some limitations. You can't do just
anything in the command handlers, or have dynamic descriptions — or
lose the game. But first of all you can write a variety of text
adventures even so, and second, you can always extend the format. I
just had to keep it short for my purposes.
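As a rough illustration of how little the interpreter cares where the data comes from, here is a sketch that loads a cut-down version of the game from a JSON string and plays a scripted session. The play() function and the shortened game data are assumptions made for this example; JSON arrays arrive as Python lists rather than tuples, which the interpreter doesn't mind because it only checks for strings and lengths.

```python
import json

# A trimmed copy of the game data (one-line descriptions, fewer commands).
GAME_JSON = """
{"here": "ground-floor",
 "flags": {"cable_plugged": false, "won": false},
 "descs": {"ground-floor": "You are in an abandoned factory.",
           "top-floor": "You're at the top of the conveyor belt."},
 "exits": {"ground-floor": {"up": "top-floor"},
           "top-floor": {"down": "ground-floor"}},
 "commands": {"ground-floor": {"push button":
                ["cable_plugged", true, ["winmsg", "won", true], "nope"]},
              "top-floor": {"plug cable": ["plug", "cable_plugged", true]}},
 "msgs": {"winmsg": "The conveyor belt starts moving.",
          "nope": "Nothing happens.",
          "plug": "You plug in the cable. The motor hums."}}
"""


def play(game, inputs):
    """Feed a list of player inputs to the game; return the messages printed."""
    flags, msgs = game["flags"], game["msgs"]
    out = []

    def do_command(command):
        if isinstance(command, str):
            out.append(msgs[command])
        elif len(command) == 3:
            out.append(msgs[command[0]])
            flags[command[1]] = command[2]
        elif flags[command[0]] == command[1]:
            do_command(command[2])
        else:
            do_command(command[3])

    here = game["here"]
    for cmd in inputs:
        if cmd in game["commands"][here]:
            do_command(game["commands"][here][cmd])
        elif cmd in game["exits"][here]:
            here = game["exits"][here][cmd]
        else:
            out.append("Huh?")
        if flags["won"]:
            break
    return out


script = ["push button", "up", "plug cable", "down", "push button"]
for line in play(json.loads(GAME_JSON), script):
    print(line)
```

Swap json.loads() for any other parser and the rest stays the same, which is the whole point of keeping the game out of the code.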
This approach isn't even new. Various text adventure engines used in
the 1980s, such as The Quill, worked in exactly this way.
And while they were noticeably limited compared to Infocom's Z-Machine,
that was mostly because they had to run on computers with as little as
16 kilobytes of RAM. On a modern PC you can go as sophisticated as you
like.
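As one small example of extending the format, dynamic descriptions can reuse the same conditional form as the command handlers. This is only a sketch of a possible extension, not something the format above already supports: describe() is a hypothetical function, and a description is assumed to be either plain text or a (flag, value, then, else) tuple whose branches are themselves descriptions.

```python
def describe(desc, flags):
    """Resolve a description: plain text, or (flag, value, then, else)."""
    while not isinstance(desc, str):
        flag, value, if_true, if_false = desc
        desc = if_true if flags[flag] == value else if_false
    return desc


descs = {
    "ground-floor": "You are in an abandoned factory.\n"
                    "You see: conveyor belt, big black button, stairs",
    # The top floor looks different depending on the cable_plugged flag.
    "top-floor": ("cable_plugged", True,
                  "You're at the top of the conveyor belt.\n"
                  "You see: plugged cable, humming motor.",
                  "You're at the top of the conveyor belt.\n"
                  "You see: unplugged cable, electric motor."),
}

print(describe(descs["top-floor"], {"cable_plugged": False}))
print(describe(descs["top-floor"], {"cable_plugged": True}))
```

The main loop would then call describe(descs[here], flags) instead of printing descs[here] directly, which brings back the changing room description from the very first version.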
But why a database and not a virtual machine? All modern text
adventures use one of the latter — and many graphical adventures, too.
And indeed, a VM is more flexible. It can run faster as well, which is
especially important on older computers, or in web browsers. But a
database has advantages of its own:
- you can eyeball the data and tweak it by hand;
- you can analyze and transform the data with automated tools;
- you can put a graphical front-end on it;
- if you ever lose the interpreter, you can guess how the system works by reading the data, and make it work again at least partially.
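To illustrate the second point, here is a sketch of a tiny checking tool. It assumes the dictionary layout used in this article (flags, msgs, descs, exits, commands); check_game() is a made-up name, and the sample game is deliberately broken so there is something to report.

```python
def check_game(game):
    """Return a list of problems: dangling exits, unknown messages or flags."""
    problems = []
    rooms = set(game["descs"])

    for room, room_exits in game["exits"].items():
        for direction, target in room_exits.items():
            if target not in rooms:
                problems.append(
                    f"{room}: exit {direction!r} leads to unknown room {target!r}")

    def check_command(where, command):
        if isinstance(command, str):
            if command not in game["msgs"]:
                problems.append(f"{where}: unknown message {command!r}")
        elif len(command) == 3:
            check_command(where, command[0])
            if command[1] not in game["flags"]:
                problems.append(f"{where}: unknown flag {command[1]!r}")
        else:
            if command[0] not in game["flags"]:
                problems.append(f"{where}: unknown flag {command[0]!r}")
            check_command(where, command[2])
            check_command(where, command[3])

    for room, room_commands in game["commands"].items():
        for name, command in room_commands.items():
            check_command(f"{room}/{name}", command)
    return problems


game = {
    "flags": {"cable_plugged": False},
    "msgs": {"plug": "You plug in the cable. The motor hums."},
    "descs": {"top-floor": "You're at the top of the conveyor belt."},
    "exits": {"top-floor": {"down": "ground-floor"}},  # room never described
    "commands": {"top-floor": {
        "unplug cable": ("unplug", "cable_plugged", False)}},  # no such message
}

for problem in check_game(game):
    print(problem)
```

Doing the same for game logic buried in if/elif chains would mean parsing Python; here it is a couple of loops over dictionaries.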
Most importantly for my goal in writing this article, it demonstrates
how the line between data and code is blurry at best. And that should
change the way you view content, configuration... even scripting.
Now you can treat them all as part of a whole, rather than misshapen
pieces to juggle awkwardly.
See, there is no code versus data. The conflict was all in your mind.
(Special thanks to Chris Riddoch for suggesting a better approach, and
to Abbey Spracklin for inspiring this article in the first place.)