The number of ways $n$ distinct elements can be arranged into a linear order is $n!$. Each of these arrangements defines a unique permutation of the $n$ elements. A uniformly random permutation is simply a random variable that takes on one of these $n!$ permutations, each with an equal probability. The growth rate of $n!$ is incredibly fast, as may be more apparent by Stirling's approximation
\[n! \sim \sqrt{2\pi n} \left( \frac{n}{e} \right)^n. \] Thus, it seems reasonable to ask whether a uniformly random permutation can actually be generated in practice.
Anyone familiar with playing cards will likely know a few different techniques for shuffling a deck. One of the most commonly used techniques is the riffle, but how effective is it? If one's goal is to have the cards in a uniformly random order, then the riffle is not very effective on its own (consider how likely it is that the card starting at the bottom of the deck ends up at the top). However, by properly formulating these concepts, the mathematician Persi Diaconis was able to prove that 7 riffles is sufficient for most games.
From a computational perspective, there turns out to be a very simple technique for generating a uniformly random permutation called the Fisher-Yates shuffle. This technique was popularized by the eminent computer scientist Donald Knuth, and he described the technique in detail in Volume 2 of his epic monograph The Art of Computer Programming. The technique is illustrated in the following code snippet:
# To shuffle a list A of n elements (A[1],...,A[n]) for i in 1..n do j = random from i,i+1,...,n swap A[i] and A[j]
Those familiar with computer science will notice that this algorithm has several desirable qualities. For example, the amount of auxiliary space required is small even when A is very large. Furthermore, the algorithm will run to completion quickly. In particular, the Fisher-Yates shuffle is an example of a linear time algorithm since the amount of computation performed is $O(n)$ for a list of size $n$.
We will now consider a game, wherein the problem is not about surviving (like in our previous posts) but instead about reducing the amount of time it takes to perform an action. This notion of computation time will be discussed in more detail in future posts, but for now it will suffice to defer to one's intuition.
A Secret Santa is a game/tradition often played during Christmas in western cultures. Each player is randomly assigned a unique person to whom they must anonymously give a gift. No two people can be assigned the same person, and no one person can be assigned themselves. How can one randomly sample among all valid assignments so that each is chosen with equal probability?
Observe that every valid assignment corresponds to a permutation. However, the converse is not true. In a permutation, a number can be mapped back to its own position, resulting in an invalid assignment. A permutation in which no number maps to its own position is called a derangement. We can therefore generate a Secret Santa assignment as follows:
- Generate an assignment from a uniformly random permutation using the Fisher-Yates shuffle.
- Repeat the previous step until no person is assigned to themselves.
Our assignment procedure is an example of a randomized algorithm. In particular, the amount of time it takes for the procedure to complete is not deterministic. Thus, we need to treat this runtime as a random variable, and in particular, we would like to know the expectation of this random variable; that is, we want to compute the expected runtime of the algorithm.
Let $p$ be the probability that a uniformly random permutation is a derangement. In a given iteration, our algorithm will therefore complete with probability $p$ and continue with probability $1-p$. Thus, we can think of the algorithm as a sequence of independent trials with success probability $p$, where the runtime is given by the number of trials before the first success. That is, the runtime is given by a geometric distribution, which has expectation $1/p$. We can therefore complete the analysis by estimating $p$.
For a uniformly random permutation $\pi$, let $A_i$ be the event that $i$ is mapped to its own position for $i=1,\dots,n$. The probability that $\pi$ is a derangement can therefore be expressed as $P(\overline{A_1} \cap \overline{A_2} \cap \dots \cap \overline{A_n})$. Observe that these events are not independent. The probability $P(\overline{A_i})$ is simply $(1 - 1/n)$ since $i$ has a $1/n$ chance of being mapped to its own position. However, we can argue that $P(\overline{A_i} | \overline{A_1} \cap \dots \cap \overline{A_{i-1}}) \geq P(\overline{A_i})$ since knowing that none of the first $i-1$ elements mapped to their own positions can only decrease the likelihood that $i$ maps to its own position. Thus, we can proceed to bound $p$ from below as
\[ P(\overline{A_1} \cap \overline{A_2} \cap \dots \cap \overline{A_n}) \geq P(\overline{A_1})P(\overline{A_2})\cdots P(\overline{A_n}), \] the right-hand side of which equates to $(1 - 1/n)^n$. Using the definition
\[e^x = \lim_{n \to \infty} \left( 1 + \frac{x}{n} \right)^n \] it follows that $p \geq 1/e$ as $n \to \infty$.
We have not yet made any claim to how accurate this estimate is, but rest assured that this (yet another) surprise appearance of the illustrious $e$ is no coincidence. To better bound this estimate we will employ the inclusion-exclusion principle. The argument goes as follows.
Suppose we start with the trivial approximation that all permutations are derangements. This approximations gives the crude estimate $p \approx 1$. We can improve this by discounting all permutations in which $i$ occurs in its own position for $i = 1,\dots,n$. That is, we can estimate
\[ p \approx 1 - \sum_{i=1}^n \frac{1}{n} \] since $i$ occurs in its own position with probability $1/n$. This approximation gives the slightly less crude estimate $p = 0$. The error in this approximation is that we double counted. Permutations in which distinct $i$ and $j$ both occurred in their own positions where discounted twice, once for $i$ and once for $j$. Thus, we can improve our approximation by adding back these permutations, giving the estimate
\[ p \approx 1 - 1 + \sum_{1 \leq i < j \leq n} \frac{1}{n(n-1)} \] since the probability that distinct $i$ and $j$ both occur in their own positions is $1/(n(n-1))$. Computing the summation gives the slightly better estimate $p=1/2$. The error in this approximation is again a form of double counting. If all distinct $i$, $j$, and $k$ occurred in their own positions, we would have added them back twice (instead of once). Thus, we can repeat the argument to improve the estimate to
\[ p \approx 1 - 1 + 1/2 - \sum_{1 \leq i < j < k \leq n} \frac{1}{n(n-1)(n-2)} \] since the probability that distinct $i$, $j$, and $k$ occur in their own positions is $1/(n(n-1)(n-2))$. Computing this summation gives the improved estimate $p = 1/3$. If we continue this reasoning, eventually our summation will terminate with the $n$-th estimate, giving the exact probability
\[ p = \sum_{k=0}^n \frac{(-1)^k}{k!}. \] This summation is the first $n$ terms of the Taylor series for the exponential function $e^x$ evaluated at $x=-1$. Thus, we can conclude that as $n \to \infty$ $p \to 1/e$. Hence, the simple trial and error approach to generating a uniformly random derangement takes only $e=2.7182818...$ iterations on average. Not too shabby.
We leave you with a straightforward exercise:
Using Stirling's approximation and estimates on how often people play card games, show that no two truly random decks will ever, in human history, be the same (except with an inconceivably low probability).