R
1
R
2
R
3
R
4
Figure 2. Scratch space as a circular list
If H is 1, we must halt the machine. We do so by reading from
the invalid memory address 0:
mov [N], 0 ;; select between 0 and N
mov [N+1], N
mov X, [N + H]
mov X, [X] ;; load from 0 or N
If this memory access does not halt the program, then we have
successfully found another candidate transition and put a pointer to
it in T, and we have the current symbol cell in S. We are therefore
in a suitable state to run the program again, so we use our single
unconditional jump to loop back to the start:
jmp start
This simultes an arbitrary Turing machine, using only mov (and
a single jmp to loop the program).
6. Register use
While building our Turing machine simulator, we made free use
of many temporary registers. The x86 architecture has never been
accused of having too many registers, and we seem to already have
exceeded the budget of 8 general-purpose registers.
The registers could be allocated more carefully to fit within the
limit, but we take a different approach: we show that any program
that can be implemented using our restricted instruction set can be
translated to an equivalent program using at most four registers.
Suppose the original program uses n registers R
1
, . . . , R
n
. We
represent these in the translated program with n preallocated cells
of scratch space. The second word of each cell in scratch space
points to the next cell, and the second word of the last cell points to
the first, laying them out as a circularly linked list (see Figure 2).
Our four registers are S, which at startup points to the first cell of
scratch space, and three other registers A, B and C. We can advance
S to the next cell as follows:
mov A, 1
mov S, [S + A]
If the instruction mov S, [S + 1] is available directly, then
this can be done without using A. This allows us to load any R
i
value into A, B or C: we load 1 into the destination register, then
advance S to the correct position, and then perform mov A, [S].
We always know during the translation which scratch cell S
points to, so the number of mov S, [S + A] instructions required
to move S to the right scrach register is easily determined. For in-
stance, to load R
2
, R
4
and R
1
into registers A, B and C respectively,
we generate the following code (assuming four scratch cells):
mov A, 1
mov S, [S + A]
mov A, [S]
mov B, 1
mov S, [S + B]
mov S, [S + B]
mov B, [S]
mov C, 1
mov S, [S + C] ;; wraps back to R1
mov C, [S]
Our operations have at most three source operands (for indexed
store), so for any operation we can use a sequence like the above to
load the operands into the registers A, B and C. We generate code to
perform the operation using those registers, and then generate code
to store the result back into the scrach space (if there is a result).
We assume that the result is in register A, but it is trivial to modify
the following for any other result register.
We use B as a scratch register, and cycle S as before:
mov B, 1
mov S, [S + A]
The mov S, [S + B] instruction is repeated as many times as
necessary to reach the desired scratch cell, and then the result is
stored using mov [S], A.
This transformation works for any program that can be imple-
mented in our restricted instruction set, including the Turing ma-
chine simulator of the previous section. It is therefore possible to
simulate an arbitrary Turing machine using only the mov instruction
and four registers.
Thus, while it has been known for quite some time that x86 has
far too many instructions, we can now contribute the novel result
that it also has far too many registers.
7. Discussion
Finding Turing-completeness in unlikely places has long been a
pastime of bored computer scientists. The number of bizarre ma-
chines that have been shown Turing-complete is far too great to
describe them here, but a few resemble what this paper describes.
That a computer can get by with just one instruction was shown
by Farhad Mavaddat and Behrooz Parhami [2]. Their machine uses
a single instruction that takes two memory addresses and a jump
target, and its operation is “subtract and branch if less than or
equal”, combining arithmetic, memory load, memory store and
conditional branching into one instruction.
A single instruction machine using only MOVE was described
by Douglas W. Jones [1], where Turing-completeness is gained by
having memory-mapped arithmetic and logic units (so that other
operations can be performed by moving data a predefined mem-
ory location and collecting the result afterwards). Some machines
based on this principle have been built, and are generally known as
“move machines” or “transport-triggered architectures”.
Ra
´
ul Rojas [3] shows that, with self-modifying code, an instruc-
tion set with load, store, increment, zero and unconditional branch-
ing is Turing-complete. In the same paper, he also shows that a
machine without self-modifying code or code generation is Turing-
complete with increment and double-indirect loads and stores. The
double-indirect memory operations use a register as the address of
a memory cell holding the address to access (in pseudo-x86 nota-
tion, the load looks like mov A, [[A]]).
Removing all but the mov instruction from future iterations of
the x86 architecture would have many advantages: the instruc-
tion format would be greatly simplified, the expensive decode unit
would become much cheaper, and silicon currently used for com-
plex functional units could be repurposed as even more cache. As
long as someone else implements the compiler.
References
[1] D. W. Jones. The Ultimate RISC. ACM SIGARCH Computer Architec-
ture News, 16(3):48–55, 1988.
[2] F. Mavaddat and B. Parhamt. URISC: The Ultimate Reduced Instruc-
tion Set Computer. International Journal of Electrical Engineering Ed-
ucation, 25(4):327–334, 1988.
[3] R. Rojas. Conditional Branching is not Necessary for Universal Com-
putation in von Neumann Computers. Journal of Universal Computer
Science, 2(11):756–768, 1996.
4 2013/7/19