Wednesday, January 23, 2013

The Birthday Paradox

Imagine that you are sitting in a classroom with 22 other people. How likely is it that two people in this room share a birthday? As it turns out, this probability exceeds 50%; that is, more than half the time, two people in a room of 23 will share a birthday. This result may seem counterintuitive. In fact, it disagrees so strongly with human intuition that it is famously known as the birthday paradox. In today's post, we will derive this probability using the probabilistic techniques established in previous posts (see our post on permutation statistics for an overview of these techniques).

Let's start by defining $B_1,B_2,\dots,B_n$, where $B_i$ is a random variable for the birthday of person $i$. We will assume that the $B_i$ are independent and identically distributed over the values $1, 2, \dots, 365$ (not a leap year). A collision is a pair $B_i,B_j$ for which $B_i = B_j$ (and $i \neq j$). Let $X$ be a random variable for the number of collisions. To resolve the birthday paradox, we want to know the probability that $X$ is at least 1 as a function of $n$.
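Before any analysis, we can get a feel for the answer by simulation. Below is a minimal Monte Carlo sketch in Python (the function name and trial count are our own choices, not part of the derivation) that estimates $P(X \geq 1)$ for a room of $n$ people with uniformly random birthdays:

```python
import random

def estimate_collision_probability(n, m=365, trials=100_000):
    """Estimate P(X >= 1): the chance that at least two of n people
    share a birthday, with m equally likely days."""
    hits = 0
    for _ in range(trials):
        birthdays = [random.randrange(m) for _ in range(n)]
        if len(set(birthdays)) < n:  # a repeated day means a collision
            hits += 1
    return hits / trials

print(estimate_collision_probability(23))  # typically around 0.51
```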

Let's first examine the expectation of $X$. We can compute $X$ as a sum of indicator variables for each possible collision, from which it follows that
\[E[X] = \sum_{i < j} P(B_i = B_j), \] by linearity of expectation. If we assume that the birthdays are uniformly distributed, it follows that \[E[X] = \frac{1}{365}{n \choose 2}\] as $P(B_i = B_j) = \frac{1}{365}$ for each $i \neq j$. For what value of $n$ might we expect at least one collision? From the equation for $E[X]$, it follows that $E[X] \geq 1$ when $n \geq 28$. This analysis does not address the problem in question but does lend some intuition as to why the claim is true. The idea is that as $n$ increases, the number of ways a collision might occur increases at a rate of $O(n^2)$. This quadratic growth tells us that we can start to expect a collision when $n$ exceeds (some constant factor times) $\sqrt{m}$, where $m$ is the number of outcomes (in this case $m=365$).
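As a quick sanity check on this threshold, the short sketch below (helper name ours) evaluates $E[X] = \frac{1}{365}{n \choose 2}$ and reports the smallest $n$ for which it reaches 1:

```python
from math import comb

def expected_collisions(n, m=365):
    # E[X] = C(n, 2) / m under the uniform, independent assumption
    return comb(n, 2) / m

n = 2
while expected_collisions(n) < 1:
    n += 1
print(n, expected_collisions(n))  # 28, ~1.04 (n = 27 gives only ~0.96)
```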

Let's take a step back and try to better estimate $P(X \geq 1)$. We will generalize the problem slightly by assuming that there are $m$ possible outcomes and maintain the assumption that each is equally likely. Let $Y_i$ be the event that the $i$-th outcome (the $i$-th person's birthday) does not collide with any prior outcomes. It then follows that the probability of no collisions is
\[ P(X = 0) = P(Y_1 \cap Y_2 \cap Y_3 \cap \dots \cap Y_n), \] which expands to
\[P(Y_1)P(Y_2 | Y_1)P(Y_3 | Y_1,Y_2) \dots P(Y_n | Y_1,Y_2,\dots,Y_{n-1}),\] which is simply
\[1\left(1-\frac{1}{m}\right)\left(1-\frac{2}{m}\right)\dots \left(1 - \frac{n-1}{m}\right)\] since the $i$-th outcome need simply avoid $i-1$ distinct outcomes out of $m$.
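This product is easy to evaluate exactly. The sketch below (function name ours) computes $P(X \geq 1) = 1 - P(X = 0)$ term by term and finds the smallest $n$ at which it exceeds $1/2$:

```python
def collision_probability(n, m=365):
    """Exact P(X >= 1) via the product formula for P(X = 0)."""
    p_no_collision = 1.0
    for i in range(n):
        p_no_collision *= 1 - i / m  # person i + 1 must avoid i used days
    return 1 - p_no_collision

n = 2
while collision_probability(n) <= 0.5:
    n += 1
print(n, collision_probability(n))  # 23, ~0.507
```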

Through some simple analysis, one can show the well-known inequality $1 + x \leq e^x$, which holds for all $x$. Substituting this bound into our derivation above shows that
\[ P(X = 0) \leq e^{-\frac{1}{m}}e^{-\frac{2}{m}}\dots e^{-\frac{n-1}{m}}\] which simplifies to
\[ P(X = 0) \leq e^{-\frac{n(n-1)}{2m}} \] since $1 + 2 + \dots + (n-1) = \frac{n(n-1)}{2}$. To resolve the original question, we want to know how large $n$ must be for $P(X = 0) \leq 1/2$. Bounding the right-hand side above by $1/2$ and taking natural logarithms shows that it suffices to have
\[n(n-1) \geq \ln(4)\,m, \] which, for $m = 365$, is satisfied when $n \geq 23$.
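Because we replaced the exact product with an upper bound, it is worth checking how tight that bound is near the threshold. A brief comparison in Python, under the same uniform assumption:

```python
from math import exp, log

m = 365
for n in (22, 23):
    exact = 1.0
    for i in range(n):
        exact *= 1 - i / m
    bound = exp(-n * (n - 1) / (2 * m))
    print(n, round(exact, 4), round(bound, 4))
# n = 22: exact ~ 0.524, bound ~ 0.531 (both still above 1/2)
# n = 23: exact ~ 0.493, bound ~ 0.500 (the bound just dips below 1/2)

print(log(4) * m)  # ~505.997, and 23 * 22 = 506 just clears this threshold
```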

Initially, we stated that we only need to assume that each $B_i$ is independent and identically distributed for the birthday paradox to hold. This claim should be fairly intuitive since if birthdays are less uniformly distributed, the likelihood of a collision should increase. Moreover, it turns out that birthdays tend not to be uniformly distributed in practice.

Data source: NYTimes.com, Amitabh Chandra, Harvard University

We can provide a more general analysis by assuming that the $i$-th birthday occurs with probability $p_i$. A collision-free assignment of birthdays is one in which all $n$ people are assigned unique birthdays from the set of $m$ possible birthdays. Let $\mathcal{S}(m,n)$ be the set of all collision-free assignments. The probability of a particular assignment $S \in \mathcal{S}(m,n)$ is simply the product of the probabilities associated with $S$. Thus, it follows that \[P(X = 0) = \sum_{S \in \mathcal{S}(m,n)} \prod_{i \in S} p_i.\] In a uniform distribution, $p_i = \frac{1}{m}$ and the probability reduces to
\[ \frac{m!}{(m-n)!} \left(\frac{1}{m}\right)^n \]
which agrees with our analysis above (multiply the $\frac{1}{m}$ terms into the factorials). Hence, to prove that uniformity minimizes the likelihood of collisions, we want to show that \[ \sum_{S \in \mathcal{S}(m,n)} \prod_{i \in S} p_i \leq \frac{m!}{(m-n)!} \left(\frac{1}{m}\right)^n \] for any choice of probabilities $p_1, p_2, \dots, p_m$.
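For small $m$ and $n$ we can evaluate the sum over $\mathcal{S}(m,n)$ by brute force, which makes the inequality easy to see numerically before we prove it. The sketch below (the skewed probabilities are made up for illustration) compares a non-uniform distribution against the uniform one:

```python
from itertools import permutations

def prob_no_collision(p, n):
    """Sum of the products of p over all ordered collision-free
    assignments of n birthdays (brute force, small cases only)."""
    total = 0.0
    for days in permutations(range(len(p)), n):
        product = 1.0
        for d in days:
            product *= p[d]
        total += product
    return total

m, n = 10, 4
uniform = [1 / m] * m
skewed = [0.3, 0.2, 0.1, 0.1, 0.06, 0.06, 0.06, 0.06, 0.03, 0.03]
print(prob_no_collision(uniform, n))  # m!/(m-n)! * (1/m)^n = 0.504
print(prob_no_collision(skewed, n))   # strictly smaller, as claimed
```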

We can assume that $n \geq 2$ since the claim is trivial otherwise. We can then break up the summation using the following definitions:
  1. Let $\mathcal{A}$ be the set of all $S \in \mathcal{S}(m,n)$ where $1,2 \notin S$.
  2. Let $\mathcal{B}$ be the set of all $S \in \mathcal{S}(m, n - 1)$ where $1,2 \notin S$.
  3. Let $\mathcal{C}$ be the set of all $S \in \mathcal{S}(m, n - 2)$ where $1,2 \notin S$.
Partitioning the collision-free assignments according to whether they use neither, exactly one, or both of the birthdays 1 and 2, it follows that \[ P(X = 0) = \sum_{S \in \mathcal{A}} \prod_{i \in S} p_i + n(p_1 + p_2) \sum_{S \in \mathcal{B}} \prod_{i \in S} p_i + n(n-1)p_1p_2 \sum_{S \in \mathcal{C}} \prod_{i \in S} p_i. \] Observe that if we vary $p_1$ and $p_2$ while keeping their sum fixed (so that the remaining probabilities still form a valid distribution), only the $p_1 p_2$ factor changes, and for a fixed sum $s = p_1 + p_2$ the product $p_1 p_2 = p_1(s - p_1)$ is maximized when $p_1 = p_2 = s/2$. It follows that $P(X = 0)$ is maximized when $p_1 = p_2$. Since $p_1$ and $p_2$ were chosen arbitrarily, all of the probabilities must be equal when $P(X = 0)$ is maximized; that is, the uniform distribution maximizes $P(X = 0)$.

This result may seem to be of little practical importance. It turns out, however, that the notion of collisions is of vital importance in computer science. In the study of data structures, hash tables work by storing elements into one of $m$ buckets. The data structure is more efficient when fewer elements are assigned to the same bucket (that is, when there are fewer collisions). How an element is mapped to a bucket is determined by what is called a hash function. By the above analysis, it follows that we prefer hash functions that distribute elements more uniformly. How one constructs such a hash function is a subject for another post...
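As a rough illustration of this connection (the keys and bucket count below are arbitrary choices, not a recommendation), the following sketch hashes a handful of keys into $m$ buckets and counts colliding pairs, mirroring the birthday analysis with buckets in place of days:

```python
from collections import Counter

def colliding_pairs(keys, m):
    """Count pairs of keys whose hashes land in the same bucket."""
    buckets = Counter(hash(k) % m for k in keys)
    return sum(c * (c - 1) // 2 for c in buckets.values())

keys = [f"key-{i}" for i in range(30)]
print(colliding_pairs(keys, m=365))
# If the hash spreads keys roughly uniformly, we expect about
# C(30, 2) / 365 ~ 1.2 colliding pairs, exactly as in the birthday analysis.
```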

Wednesday, January 9, 2013

The Platonic Solids

Our most recent post took us back to antiquity and discussed some of the early developments in geometry. Today we will conjoin this discussion with the recurring theme of randomness. As with many pedagogical introductions to probability, we begin our discussion by considering the properties of dice.

Dice have been used throughout the ancient world since as early as 5,000 years ago. In the Burnt City, located in the far east of modern-day Iran, an ancient backgammon set was excavated (dating back to circa 3000 BC). The dice uncovered with this set are the oldest known to exist.

A backgammon set recovered from a Swedish warship that sank in August 1628.

Dice add a compelling mechanic to board games. They introduce fair chance by allowing players to conveniently sample from a uniformly distributed set of discrete events. Most of us are familiar with the standard 6-sided die, which takes on the shape of a cube. However, those familiar with Dungeons & Dragons know well that dice can take on a variety of different shapes with differing numbers of possible outcomes.

A typical variety of dice used in the board game Dungeons & Dragons.

The general shape of a die can be modeled as a polyhedron. To guarantee that any toss will eventually stop on exactly one face of the die, we consider only convex polyhedra. For the sake of uniformity, we limit our consideration further to those convex polyhedra for which each face is equally likely to turn up.

One way to guarantee uniformity is by enforcing a kind of symmetry in the shape of the polyhedron. For instance, if each face is indistinguishable from all other faces, we can argue that each face is equally likely to turn up. These geometric structures are referred to as face-transitive (or isohedral) convex polyhedra. The trapezohedron is an example of such a shape, as depicted below for 12 faces.

A 12-face trapezohedron makes a fair 12-sided die.

Though the face-transitive polyhedra are in a sense globally symmetric, they are not locally symmetric. That is, if one considers an individual face, it is not necessarily a regular polygon. If we further require each face to be a regular polygon, we arrive at what are called the convex regular polyhedra. As we will prove later, it turns out that there are exactly 5 convex regular polyhedra, famously known as the Platonic solids.

The 5 Platonic solids.

Named after the Greek philosopher Plato, the Platonic solids were deeply interwoven with the philosophies of ancient Greece. The four classical elements of ancient Greek philosophy (earth, air, fire, and water) were each associated with a Platonic solid, with the remaining fifth solid (the dodecahedron) being elevated to divine status as the shape of the heavens.

These peculiar beliefs regarding the Platonic solids persisted well beyond antiquity. Johannes Kepler, a pivotal figure in the scientific revolution, investigated ways of connecting the Platonic solids to the motions of the planets. Fortunately, more accurate data on the motions of the planets (made available thanks to Tycho Brahe) led Kepler to derive a far more accurate description of planetary motion (see Kepler's laws of planetary motion).

Kepler's model of the solar system based on the Platonic solids.

We mentioned previously that the Platonic solids capture all convex regular polyhedra. This fact has been known since antiquity, and a proof of this is included in Euclid's Elements. The proof goes something like the following.

Consider a vertex of a convex regular polyhedron. By convexity, the angles made by the edges incident to this vertex must sum to a value less than $2\pi$. Since each vertex must have degree at least 3, we can immediately exclude faces with more than 5 sides. Indeed, the interior angles of a regular $n$-sided polygon are exactly $\pi(1 - \frac{2}{n})$, which is at least $\frac{2\pi}{3}$ for all $n \geq 6$, so three such faces would already contribute an angle of at least $2\pi$. Thus, we need only consider the different ways of producing convex regular polyhedra out of triangles, squares, and pentagons.

The interior angles of triangles are $\frac{\pi}{3}$ radians. Thus, a convex regular polyhedron consisting of triangular faces could have vertices of degree 3, 4, or  5 (degree 6 or more would defy convexity for the reason illustrated above). These polyhedra are the tetrahedron, the octahedron, and the icosahedron.

The interior angles of squares are $\frac{\pi}{2}$ radians. Thus, a convex regular polyhedron consisting of square faces can only have vertices of degree 3, which corresponds to the cube. Likewise, the interior angles of pentagons are $\frac{3\pi}{5}$ radians, again forcing vertices of degree 3, which gives us the dodecahedron, the final Platonic solid.
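The case analysis above is small enough to enumerate mechanically. The sketch below (in Python, with the dictionary of names supplied by us) checks, for each candidate face shape and vertex degree, whether the angle condition $k \cdot \pi(1 - \frac{2}{n}) < 2\pi$, equivalently $k(n - 2) < 2n$, is satisfied:

```python
solids = {(3, 3): "tetrahedron", (3, 4): "octahedron", (3, 5): "icosahedron",
          (4, 3): "cube", (5, 3): "dodecahedron"}

# k regular n-gons meeting at a vertex must satisfy k * pi * (1 - 2/n) < 2*pi,
# which simplifies to k * (n - 2) < 2 * n.
for n in range(3, 8):        # candidate faces: triangles up to heptagons
    for k in range(3, 8):    # candidate vertex degrees
        if k * (n - 2) < 2 * n:
            print(n, k, solids[(n, k)])
# Exactly five pairs pass: (3,3), (3,4), (3,5), (4,3), and (5,3).
```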

We conclude today's post by noting another interesting property of these convex regular polyhedra:

The dual of a Platonic solid is a Platonic solid. 

The dual of a polyhedron is the polyhedron that results from defining a vertex for each face and joining two vertices whenever their corresponding faces in the original polyhedron share an edge. For example, the following figure depicts the cube and the octahedron, which are duals of one another.
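One concrete way to see the duality claim is through vertex, edge, and face counts: taking the dual swaps vertices and faces while leaving edges untouched. A small sketch (the counts are the standard ones for each solid) confirms that the swapped counts always belong to another Platonic solid:

```python
# (vertices, edges, faces) for each Platonic solid
counts = {
    "tetrahedron":  (4, 6, 4),
    "cube":         (8, 12, 6),
    "octahedron":   (6, 12, 8),
    "dodecahedron": (20, 30, 12),
    "icosahedron":  (12, 30, 20),
}

by_counts = {vef: name for name, vef in counts.items()}
for name, (v, e, f) in counts.items():
    dual = by_counts[(f, e, v)]  # the dual swaps vertex and face counts
    print(f"{name} <-> {dual}")
# tetrahedron <-> tetrahedron, cube <-> octahedron, octahedron <-> cube,
# dodecahedron <-> icosahedron, icosahedron <-> dodecahedron
```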

The wonders of duality.