Coin flips

The von Neumann and Morgenstern development of rational economic agents requires events with arbitrary objective probabilities. Objective probabilities in the world usually arise from symmetries and the simplest example is the flip of a fair coin. If a coin comes up heads with probability 0\le\alpha\le1 , we’ll call it an “\alpha-coin”. Can we use use an ordinary fair coin to simulate an arbitrary \alpha-coin? If so, how many actual flips are required on average to simulate an \alpha-flip?

Simulating an arbitrary coin in 2 fair flips

Here’s a simple way to do it. Consider the binary representation of the desired probability \alpha (eg. .1011011100...). Flip an ordinary coin until it comes up heads. If this takes i flips, look at the ith bit of \alpha. If it is 1, report “heads”, otherwise report “tails”. This process will examine the ith bit with probability 2^{-i}. The total probability of reporting heads is the sum of 2^{-i} over all the 1’s in  \alpha’s binary expansion. But that sum is exactly \alpha! So this procedure reports heads with the correct probability.

How many flips does this require? If we denote the average number of flips by T, then half the time it only requires a single flip and half the time it requires the first flip plus an average of T more flips:

T={1\over2}\cdot1+{1\over2}\cdot (1+T)

Solving this we see that, on average, we can simulate an arbitrary probability coin with only two flips of an ordinary coin!

Simulating any distribution using fair flips

Two flips sounds pretty good, but is that the best we can do? Clearly we can do better for specific values of \alpha. If \alpha={1\over2} , for example, then we require only 1 flip. If \alpha={3\over4} then we can get away with 1{1\over2} flips: if the first flip is heads, report “heads”, otherwise report the second flip. We’d also like to expand the simulation to use an ordinary fair coin to generate any of n possible outcomes with given probabilities p_1, p_2, \ldots, p_n .

Here’s an algorithm which does this using the smallest number of coin flips possible. We can think of all possible outcomes of a sequence of ordinary coin flips as defining an infinite binary tree. The left branch of a node at depth d corresponds to the dth flip coming up heads and the right branch to it coming up tails. Any procedure that repeatedly flips a fair coin and eventually reports a label between 1 and n can be represented by assigning labels to nodes in this infinite binary tree. None of the descendants of a labeled node should be labeled since the algorithm stops flipping once it reports a label. And enough subbranches should be labeled that there is zero probability for having to flip an infinite number of times. A particular node at depth d will be reached with probability 2^{-d}. The sum of the probabilities of all nodes with label i must be p_i.

A tree requiring a minimal average number of flips will not have two nodes labeled i at the same depth. If it did and they were siblings, we could move the label to their parent and reduce the average depth. If it did and they were not siblings, we could swap one of the labels i with the other’s sibling and then move the label i to the parent.

So in an optimal tree, the set of nodes with label i will be at distinct depths d and the sum of 2^{-d} over them must be p_i. This means that the depths of nodes with label i will correspond exactly to the 1s in the binary expansion of p_i! There is an ambiguity in binary expansions since 1=.1111\ldots. If any of the probabilities is rational with a denominator that is a power of 2, we should use the representation without the trailing infinite sequence of 1s in order to get the fastest algorithm. Any such tree will require the same average number of flips to reach a label.

To express this as a specific algorithm, we can choose the particular ordering which labels available nodes in the binary tree at each depth from left to right. The current state will be represented by an integer j which starts with the value 0 and the depth d will count how many flips have occurred. A fair coin is flipped and if it comes up tails, the state j is incremented. We examine the dth bit of the binary expansions of the probabilities p_i. If there are k ones in this bit location, we assign them to the integers from {0} to {k-1}. If j falls in this range, the corresponding index is output as the result. Otherwise, the index j is replaced by 2\cdot(j-k). The fair coin is flipped again and the whole process is repeated until an index is output. This simple procedure produces samples with the proper probabilities and uses as few coin flips on average as possible.

Best case analysis

How many coin flips does this procedure require? It depends on the particular values of the p_is. Let’s first discuss the best case. With n non-zero probability outcomes, the values for the p_is which lead to the fastest simulation are p_i=2^{-i}, for 1\le i\le n-1 and p_n={2^{-n+1}}. For example, for n=5 the distribution is: p_1={1\over{2}} , p_2={1\over{4}} , p_3={1\over{8}} , p_4={1\over{16}} , and p_5={1\over{16}} . With these choices, whenever the fair coin comes up heads, the algorithm reports a result and it always reports a result on the nth flip. If the algorithm didn’t stop on the nth tail, the average number of flips would be 2 as in the first analysis above. It makes it to the nth flip with probability 2^{-n+1} and the immediate result avoids an average of 2 additional flips. Thus the average number of flips required is only 2-{1\over2^{n-2}}.

Worst case analysis

What’s the worst case? The analysis is somewhat trickier here. As in the analysis above, the expected number of flips T will be the solution to a recursive equation. We want the p_is to be between 0 and 1, sum to 1, and produce the greatest average number of flips. But we can’t directly recurse on this because the p_is shifted left by one bit may sum to more than 1, producing carry bits. With n values it is possible to produce a carry C as large as n-1. In order to produce a recursion relation, we need to determine the worst average time as a function T(C) of the carry produced. To avoid the ambiguity in representing binary fractions, let’s assume that the p_i are not rational numbers with power of 2 denominators. In that case, to have \sum_ip_i=1, the sum of each column together with any carry from the right must produce a 1 in the least significant bit. The worst case arises when this requirement is satisfied and the leftmost column has as few bits k as possible. To produce a carry of C and a least significant bit of 1, we must have 2\cdot C+1=R+k, where R is the carry from the right and k is the number of bits in the first column. We can easily solve for R as a function of C which makes k as small as possible. If C\le{{n-2}\over2}, then we can have k=0 and R=2\cdot C +1. If C>{{n-2}\over2}, then we have R=n-1 and k=2\cdot C-n+2.
Copyright 2007 Stephen M. Omohundro

Leave a Reply