The von Neumann and Morgenstern development of rational economic agents requires events with arbitrary objective probabilities. Objective probabilities in the world usually arise from symmetries and the simplest example is the flip of a fair coin. If a coin comes up heads with probability , we’ll call it an “
-coin”. Can we use use an ordinary fair coin to simulate an arbitrary
-coin? If so, how many actual flips are required on average to simulate an
-flip?
Simulating an arbitrary coin in 2 fair flips
Here’s a simple way to do it. Consider the binary representation of the desired probability (eg.
). Flip an ordinary coin until it comes up heads. If this takes
flips, look at the
th bit of
. If it is 1, report “heads”, otherwise report “tails”. This process will examine the
th bit with probability
. The total probability of reporting heads is the sum of
over all the 1’s in
’s binary expansion. But that sum is exactly
! So this procedure reports heads with the correct probability.
How many flips does this require? If we denote the average number of flips by , then half the time it only requires a single flip and half the time it requires the first flip plus an average of
more flips:
Solving this we see that, on average, we can simulate an arbitrary probability coin with only two flips of an ordinary coin!
Simulating any distribution using fair flips
Two flips sounds pretty good, but is that the best we can do? Clearly we can do better for specific values of . If
, for example, then we require only 1 flip. If
then we can get away with
flips: if the first flip is heads, report “heads”, otherwise report the second flip. We’d also like to expand the simulation to use an ordinary fair coin to generate any of
possible outcomes with given probabilities
.
Here’s an algorithm which does this using the smallest number of coin flips possible. We can think of all possible outcomes of a sequence of ordinary coin flips as defining an infinite binary tree. The left branch of a node at depth corresponds to the
th flip coming up heads and the right branch to it coming up tails. Any procedure that repeatedly flips a fair coin and eventually reports a label between
and
can be represented by assigning labels to nodes in this infinite binary tree. None of the descendants of a labeled node should be labeled since the algorithm stops flipping once it reports a label. And enough subbranches should be labeled that there is zero probability for having to flip an infinite number of times. A particular node at depth
will be reached with probability
. The sum of the probabilities of all nodes with label
must be
.
A tree requiring a minimal average number of flips will not have two nodes labeled at the same depth. If it did and they were siblings, we could move the label to their parent and reduce the average depth. If it did and they were not siblings, we could swap one of the labels
with the other’s sibling and then move the label
to the parent.
So in an optimal tree, the set of nodes with label will be at distinct depths
and the sum of
over them must be
. This means that the depths of nodes with label
will correspond exactly to the
s in the binary expansion of
! There is an ambiguity in binary expansions since
. If any of the probabilities is rational with a denominator that is a power of 2, we should use the representation without the trailing infinite sequence of
s in order to get the fastest algorithm. Any such tree will require the same average number of flips to reach a label.
To express this as a specific algorithm, we can choose the particular ordering which labels available nodes in the binary tree at each depth from left to right. The current state will be represented by an integer which starts with the value 0 and the depth
will count how many flips have occurred. A fair coin is flipped and if it comes up tails, the state
is incremented. We examine the
th bit of the binary expansions of the probabilities
. If there are
ones in this bit location, we assign them to the integers from
to
. If
falls in this range, the corresponding index is output as the result. Otherwise, the index
is replaced by
. The fair coin is flipped again and the whole process is repeated until an index is output. This simple procedure produces samples with the proper probabilities and uses as few coin flips on average as possible.
Best case analysis
How many coin flips does this procedure require? It depends on the particular values of the s. Let’s first discuss the best case. With
non-zero probability outcomes, the values for the
s which lead to the fastest simulation are
, for
and
. For example, for
the distribution is:
,
,
,
, and
. With these choices, whenever the fair coin comes up heads, the algorithm reports a result and it always reports a result on the
th flip. If the algorithm didn’t stop on the
th tail, the average number of flips would be 2 as in the first analysis above. It makes it to the
th flip with probability
and the immediate result avoids an average of 2 additional flips. Thus the average number of flips required is only
.
Worst case analysis
What’s the worst case? The analysis is somewhat trickier here. As in the analysis above, the expected number of flips will be the solution to a recursive equation. We want the
s to be between 0 and 1, sum to 1, and produce the greatest average number of flips. But we can’t directly recurse on this because the
s shifted left by one bit may sum to more than 1, producing carry bits. With
values it is possible to produce a carry
as large as
. In order to produce a recursion relation, we need to determine the worst average time as a function
of the carry produced. To avoid the ambiguity in representing binary fractions, let’s assume that the
are not rational numbers with power of 2 denominators. In that case, to have
, the sum of each column together with any carry from the right must produce a 1 in the least significant bit. The worst case arises when this requirement is satisfied and the leftmost column has as few bits
as possible. To produce a carry of
and a least significant bit of 1, we must have
, where
is the carry from the right and
is the number of bits in the first column. We can easily solve for
as a function of
which makes
as small as possible. If
, then we can have
and
. If
, then we have
and
.
Copyright 2007 Stephen M. Omohundro
Filed under: algorithms, essays, probability