drcabana.org
Computational divertissements


Quines and the halting problem

Posted on

Quines are generally portrayed as curiosities, often with the focus on writing the shortest possible quine. This leads to pretty little things like this Haskell one-liner:

main=putStr.ap(++)show$"main=putStr.ap(++)show$"

That is really slick. By comparison, the quine I demonstrated earlier has the elegant lines of a UPS van. But cargo vans do have their charms, not the least of which is the ability to haul cargo. With some refactoring we can make a quine haul computational cargo. Here's how.

We'll start with the by now familiar quine below, and do a bit of refactoring. The attentive reader will notice that relative to the earlier version, I have dropped the PEP 8 four space indentation of data. It is annoying and I am over it.

data = [
"data = [",
"]",
"q = chr(34)",
"boundary = 1",
"for d in data[:boundary]:",
"    print(d)",
"for d in data:",
"    print(q + d + q + ',')",
"for d in data[boundary:]:",
"    print(d)",
]
q = chr(34)
boundary = 1
for d in data[:boundary]:
    print(d)
for d in data:
    print(q + d + q + ',')
for d in data[boundary:]:
    print(d)

The first step is to repackage the print loops as a function. Instead of printing, the function quine returns a string. We leave the printing to the top level of the program. Here's the idea:

data = [
"data = [",
"]",
"def quine():",
"    q = chr(34)",
"    boundary = 1",
"    return data[:boundary] + [q + d + q + ',' for d in data] + data[boundary:]",
"for line in quine():",
"    print(line)",
]
def quine():
    q = chr(34)
    boundary = 1
    return data[:boundary] + [q + d + q + ',' for d in data] + data[boundary:]
for line in quine():
    print(line)

Notice that we can pack in arbitrary code and still preserve the quine property, so long as we add the new code as both data and code. For example, let's insert a hello function.

data = [
"data = [",
"]",
"def hello():",
"    return 'Hello'",
"def quine():",
"    q = chr(34)",
"    boundary = 1",
"    return data[:boundary] + [q + d + q + ',' for d in data] + data[boundary:]",
"for line in quine():",
"    print(line)",
]
def hello():
    return 'Hello'
def quine():
    q = chr(34)
    boundary = 1
    return data[:boundary] + [q + d + q + ',' for d in data] + data[boundary:]
for line in quine():
    print(line)

Now lets modify the top level so the program can express its dual nature. The new version will behave differently depending on whether it is invoked with a command line argument. Notice that because we stuck an import statement before declaring data we had to increment boundary.

import sys
data = [
"import sys",
"data = [",
"]",
"def hello():",
"    return 'Hello'",
"def quine():",
"    q = chr(34)",
"    boundary = 2",
"    return data[:boundary] + [q + d + q + ',' for d in data] + data[boundary:]",
"if __name__ == '__main__' :",
"    if len(sys.argv) > 1:",
"        for line in quine():",
"            print(line)",
"    else:",
"        print(hello())",
]
def hello():
    return 'Hello'
def quine():
    q = chr(34)
    boundary = 2
    return data[:boundary] + [q + d + q + ',' for d in data] + data[boundary:]
if __name__ == '__main__' :
    if len(sys.argv) > 1:
        for line in quine():
            print(line)
    else:
        print(hello())

The new program is a chameleon. It is a hello world program that can act as a quine on demand, as seen below.

❯ python chameleon.py
Hello

❯ diff -s chameleon.py =(python chameleon.py -q)
Files chameleon.py and /tmp/zshd1nU9h are identical

With a bit of work, arbitrary programs can be given the ability to act as quines when desired. So it turns out quines are not particularly rare. But are programs with quine powers even useful? Two possibilities jump to mind.

An obvious idea is to write worms. That strikes me as a knucklehead thing to do, but there it is. Writing worms is pretty well understood by now, so it's not like I'm letting the cat out of the bag.

The other idea (my motive for writing this post) is to give a simple explanation of why the halting problem is insoluble. An explanation that makes intuitive sense. Turing's original explanation was basically a diagonalization argument. That type of argument can be challenging for people who are not comfortable with mathematics. I think quines can shed light here.

Suppose for the sake of argument that there were a program capable of solving the halting problem. I'm going to call such a program a 'halting oracle'. What exactly does a halting oracle claim to be able to do?

Basically, it is a program that accepts other programs as input. It returns True if the input program eventually halts, and it returns False if it does not. The oracle must decide correctly, every time. Unlike many programs, the oracle never fails to halt. It always returns an answer. Given any program, it correctly decides whether that program eventually halts.

It turns out there are no halting oracles. They do not exist because they cannot exist. Here is why. Suppose some program Halts claims to be a halting oracle. We can write a program Q designed to stump any purported halting oracle. We build Q so that it has its own source code available via a quine function, as seen above. Q calls the oracle, and does something along these lines:

if Halts(quine()):
  while true: print('I win')
else:
  return 'You lose'

Q in effect asks the oracle 'Do I halt?'. If the oracle replies True (yes), then Q branches to an infinite loop. If the oracle replies False (no) then Q immediately halts. Heads I win, tails you lose. The point is that a halting oracle must correctly decide every time. Halts cannot do so for Q, and so fails to meet the requirements.

Concluding remarks

It's too bad about the non-existence of halting oracles, such a program would be very handy. So much so as to be magical. It could make short work of tough mathematical questions. For instance, consider what was long known as the Fermat conjecture. Fermat claimed that he had a proof that there are no positive integer solutions to the equation a^n + b^n = c^n where n is greater than 2. That claim was discovered after Fermat's death. If Fermat had a proof no one ever found it. Generations of mathematicians got nerd sniped into looking for a proof (or a refutation) of Fermat's claim. It took over three and a half centuries of effort to find a proof. A halting oracle would have made short work of the matter. The conjecture is parameterized by four positive integers. Write a program to enumerate all such 4-tuples, and sequentially test each one to see whether it satisfies a^n + b^n = c^n. If the program finds one that does, halt. Otherwise just keep looking. Here's the good part: don't try run the program, it might never stop. Instead give it to the halting oracle, which presumably would say the program does not halt, hence Fermat's claim is correct.

Too ivory tower for you? How about using a halting oracle to snag some of that sweet bitcoin that you missed out on when it sold for pennies? Breaking encryption is easy if you have a really really fast way to exhaustively search huge keyspaces, say at least 2^2048 distinct keys. It is not even remotely feasible to exhaustively search spaces that size with current tech. But a halting oracle is magic. Write a program that can enumerate the entire keyspace, but have it walk just the first half, testing each potential key to see whether it is the one sought. If the key is found, return it. If the key is not found in that first half of the keyspace, then loop forever. The cool part is you don't run the program, instead you ask the oracle whether it halts. If it does, the key is in the first half of the keyspace. If not, it is in the second half. Rinse, lather, repeat. You now have binary search on a vast scale, without needing the order properties that conventional binary search requires, or even actually running any search at all. Magic!

It turns out that halting oracles are just too good to be true.