How to build a quine

Posted on December 28, 2021

I want to show you how to do a magic trick. I first saw the trick on Dan Piponi's blog, A Neighborhood of Infinity. Here's how Dan described it:

The code below spits out a Haskell program that prints out a Perl program that prints out a Python program that prints out a Ruby program that prints out a C program that prints out a Java program that prints out the original program.

I really wanted to understand how Dan did that, but I could not wrap my head around his Haskell code. I decided to to start from scratch, and eventually figured out how to do the trick. This is the first of four posts that explain how in detail.

We'll need some terminology. A quine is a program P such that Eval P = P. That is, evaluating the program produces the original program, or at least a textual representation thereof. A quine relay is a sequence of programs, say [P, Q, R] such that

Eval P = Q
Eval Q = R
Eval R = P.

The exact number of programs does not matter. What matters is that the programs cyclically produce one another. The programs are not required to be written in the same language; it is more interesting when they are not. The Eval functions are taken to be appropriate for the languages at hand. If P, Q, and R are written in different languages, we would require distinct Eval functions, say python, ruby, and lua interpreters.

My initial goal was to figure out how generate quine relays. The trouble was that I had only a vague idea, one I could not even begin to implement. I decided that solving a simpler version of the problem might provide insight. Since I didn't know how to generate quine relays, how about a writing a program to generate ordinary quines? It turned out I didn't know how to do that either. OK, I had to solve an even simpler version of the problem. How about writing a single quine by hand? I started there.

I came up with a four step recipe, illustrated below in python. The approach is easily adapted to various languages, as will be seen in the next post in this series.

Step One

Start with an empty list and three print loops.

data = [
]
for d in data:
    print(d)
for d in data:
    print(d)
for d in data:
    print(d)

The loops don't print anything because data is empty. The plan is to fill data with carefully chosen strings, and to adjust the print loops to print what we need.

Step Two

The middle loop will be special, let's set that up. We want it to wrap each line of data in double quotes, and then append a comma. The variable q is used to denote the double quote character. Notice that the value of q is given as chr(34). The reason for using the character encoding will be explained in the next step.

data = [
]
q = chr(34)
for d in data:
    print(d)
for d in data:
    print(q + d + q + ',')
for d in data:
    print(d)

Now we have a slightly longer Python program that doesn't print anything.

Step Three

To get output we need to put some content into data. Let's do so via a bit of copy-paste.

Copy the entire text of the program so far.
Paste it between the brackets that delimit data.
Wrap each pasted line in double quotes, so that we have valid strings. Take care to preserve indentation.
Separate the strings by appending commas, so that data is a valid list of strings.
Indent each line 4 spaces, for PEP 8 compliance, and adjust the middle print loop accordingly. This step is technically optional, but this whole effort is an exercise in obsessive detail. In for a dime, in for a dollar. Notice that the adjustment to the middle print loop is made to both the code and its representation in the data.

The result looks like this:

data = [
    "data = [",
    "]",
    "q = chr(34)",
    "for d in data:",
    "    print(d)",
    "for d in data:",
    "    print('    ' + q + d + q + ',')",
    "for d in data:",
    "    print(d)",
]
q = chr(34)
for d in data:
    print(d)
for d in data:
    print('    ' + q + d + q + ',')
for d in data:
    print(d)

The reason for defining q via character encoding is that above we wrapped each line of pasted text in double quotes. Had the double quote literal been present inside the pasted text, it would have led to woe and confusion. The quotes used to delimit python strings would have collided with quotes used as characters inside the strings. Better to avoid that via the character encoding trick.

At this point the program does print something, but not what we want. Let's fix it.

Step Four

We will adjust the print loops so that

The first loop emits the text of the first line of the program, by printing its representation, the first element of data.
The second loop emits the entire contents of data, including the double quotes. No change is required.
The third loop emits the text of the rest of program.

Python's slice notation is handy here. We need to make any adjustments twice: once in the working print loops, and again in their quoted analogs. These have to agree.

We also introduce a variable, boundary, used to coordinate the action of the first and third print loops. A variable may seem like overkill, but it will be useful to have this number parameterized when we start constructing quine relays in a subsequent post.

Here's the final version:

data = [
    "data = [",
    "]",
    "q = chr(34)",
    "boundary = 1",
    "for d in data[:boundary]:",
    "    print(d)",
    "for d in data:",
    "    print('    ' + q + d + q + ',')",
    "for d in data[boundary:]:",
    "    print(d)",
]
q = chr(34)
boundary = 1
for d in data[:boundary]:
    print(d)
for d in data:
    print('    ' + q + d + q + ',')
for d in data[boundary:]:
    print(d)

The claim is that this version is a quine. How does it work? Think back to the copy-paste step. We copied every line in the program, and pasted those lines into data. So each line in the text of the resulting program occurs twice: once as data, once as code.

When executed, the program prints each line twice. The middle loop prints all the the data, in quotes. The first and third loops coordinate to print each line of the surrounding code. The two loops coordinate via the value of boundary, which contains the number of lines before the first quoted string.

Testing

To check whether the program (saved as quine.py) is indeed a quine, use the diff utility to compare the file quine.py to the output from running the file.

> python --version
Python 2.7.16

> python3 --version
Python 3.9.5

> diff -s quine.py =(python quine.py)
Files quine.py and /dev/fd/63 are identical

> diff -s quine.py =(python3 quine.py)
Files quine.py and /dev/fd/63 are identical

diff is shown running in a zsh shell above. If you prefer bash, try using different process substitution syntax: <() instead of =(). Or skip process substitution altogether. Simply save the output to a file and diff it against quine.py. The latter approach would make more sense from a pure testing point of view, because it is more likely to succeed on a wider variety of platforms. But it does not display as well in a blog post, and process substitution on the CLI was my actual workflow.

But wait, there's more

The next post in this series uses the same approach to build a quine in F#. The goal will be to illustrate some issues that arise when the programming language changes, and to set the stage for automating quine generation in the subsequent post.

The supporting code repo is drcabana/quines.