Tuesday, September 10, 2013

The Sieve of Eratosthenes

In our last post, we introduced some basic concepts from number theory and, in particular, the concept of prime numbers. In today's post, we continue this foray by discussing some of the computational aspects of prime numbers.

Eratosthenes of Cyrene is often credited with inventing the discipline of geography. He invented the system of longitude and latitude and is famous for accurately estimating the circumference of the Earth in antiquity (as described in this post).

A portrait of Eratosthenes

Another enduring mathematical contribution of Eratosthenes is his famous sieve. The sieve provides an elegant method for identifying prime numbers and serves as one of the earliest examples of a number-theoretic algorithm.

We know from the prime number theorem that there are $O(\frac{n}{\log{n}})$ prime numbers among $1,2,\dots,n$, but can we find them? Is there a systematic way to identify all of the prime numbers from $1$ to $n$? The sieve of Eratosthenes solves precisely this task and does so efficiently.

We will describe the algorithm as though we are equipped with a pen and papyrus:
  1. Start by writing down all numbers from $2$ to $n$.
  2. The first number must be prime. Underline it, and cross off all multiples of this number.
  3. Repeat step 2, starting at the first unmarked number (i.e. the first number that is neither crossed off nor underlined). Once all numbers are marked, the underlined numbers are exactly the primes from $1$ to $n$.
To illustrate the idea further, consider the following example where $n = 10$.
  1. 2, 3, 4, 5, 6, 7, 8, 9, 10; write down all numbers from 2 to 10.
  2. 2, 3, 4, 5, 6, 7, 8, 9, 10; underline 2, cross off its multiples (4, 6, 8, 10).
  3. 2, 3, 4, 5, 6, 7, 8, 9, 10; underline 3, cross off its multiples (6, 9).
  4. 2, 3, 4, 5, 6, 7, 8, 9, 10; underline 5, cross off its multiples (10).
  5. 2, 3, 4, 5, 6, 7, 8, 9, 10; underline 7, cross off its multiples (n/a).
  6. The underlined numbers 2, 3, 5, and 7 are prime.
Someone considering the problem of outputting all prime numbers from $1$ to $n$ might first take a different approach. Another simple algorithm is the following:

# To output all primes from 1 to n
for i in 2..n do
  if no j in 2..i-1 divides i: output i

Though simple, this approach is far less efficient. For a given number $i$, checking whether it is divisible by any smaller number from $2$ to $i-1$ is expensive. That is, for each $i$ we perform $i-2$ division tests, giving a total of \[1 + 2 + 3 + \dots + (n - 3) + (n - 2) = \frac{1}{2}(n-1)(n-2)\] division tests.

A shrewd observer might notice that we can modify the second loop to perform division tests only for $j = 2$ to $\lfloor \sqrt{i} \rfloor$. This optimization is justified since if there exist $a$ and $b$ such that $ab = i$, then the smaller of $a$ and $b$ must be at most $\sqrt{i}$. Leveraging this optimization, the runtime is reduced from $O(n^2)$ to $O(n^\frac{3}{2})$.
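
For concreteness, here is how trial division with the square-root cutoff might look as runnable Python; this is just a sketch of the idea above, not code from the original post:

import math

def primes_by_trial_division(n):
    """Output all primes from 2 to n using trial division up to sqrt(i)."""
    primes = []
    for i in range(2, n + 1):
        # i is prime if no j in 2..floor(sqrt(i)) divides it
        if all(i % j != 0 for j in range(2, math.isqrt(i) + 1)):
            primes.append(i)
    return primes

print(primes_by_trial_division(10))  # [2, 3, 5, 7]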

How does this compare to the sieve of Eratosthenes? To better analyze the computation costs, we will rewrite the algorithm in pseudocode:

# To output all primes from 1 to n
let L = [2..n]
for i in L do:
  if i is not marked:
    output i
    for j in i,2i,3i,...,(n/i)i:
      mark j

Notice that the inner loop is only entered for numbers $i$ that are prime and that, for a given prime $i$, it makes $n/i$ iterations. Thus, the total runtime can be bounded by \[ \sum_{p}\frac{n}{p} \] which simplifies to \[n \sum_{p} \frac{1}{p}\] where both sums are over all primes $p \leq n$.

We can bound the sum $ \sum_{p} \frac{1}{p} $ by the harmonic series $ \sum_{i = 1}^n \frac{1}{i} $ which we know from a previous post is $O(\log{n})$. Thus, we have shown that the runtime of the sieve is $O(n\log{n})$, which is quite an improvement over the other algorithm's $O(n^\frac{3}{2})$. However, this analysis is not tight. It turns out that the sum is exponentially smaller; that is, by Mertens' second theorem (which sharpens Euler's result that the sum of the reciprocals of the primes diverges), we have that $ \sum_{p} \frac{1}{p}$ is $O(\log{\log{n}})$, and thus, the total runtime is $O(n\log{\log{n}})$.
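
For those who want to run the sieve, the following Python rendering of the pseudocode above is one possible sketch (the boolean array plays the role of crossing numbers off):

def sieve_of_eratosthenes(n):
    """Return all primes from 2 to n."""
    marked = [False] * (n + 1)        # marked[j] becomes True once j is crossed off
    primes = []
    for i in range(2, n + 1):
        if not marked[i]:
            primes.append(i)          # i is prime: "underline" it
            for j in range(i, n + 1, i):
                marked[j] = True      # cross off i, 2i, 3i, ...
    return primes

print(sieve_of_eratosthenes(10))  # [2, 3, 5, 7]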

We leave you with a few exercises:
  1. Can you find a way to improve the sieve of Eratosthenes by avoiding "double marking" of numbers?
  2. We showed that the naive algorithm can be implemented using $O(n^\frac{3}{2})$ division tests, but how expensive are division tests? Come up with a way to test if $i$ divides $n$ in $O(\log{n})$ time. 

Thursday, August 15, 2013

1, 2, 3, and to the 4...

With counting being such an integral part of our everyday lives, it's nearly impossible to imagine a world without numbers. How could one trade goods without measuring the quantities being exchanged? How might a shepherd, unable to count sheep, learn that wolves were thinning his flock? It is not surprising that there are clear remnants of counting systems in use by humans as long as 30,000 years ago.

Anyone with some background in anthropology will know that the bones of humans and other animals were used by the earliest people to make a myriad of bone tools. Tools for hunting and for building were commonplace, but archaeologists have also recovered ancient bone tools used for counting. The Ishango bone, a tally stick made from a baboon fibula, is thought to be more than 20,000 years old. In fact, the word tally has its origins in the Latin word talea, which refers to a stick or cutting.

In one way or another, humans have been counting since our earliest days. However, it was only around 5000 years ago that the ancient Babylonians and Egyptians began the first endeavors into true mathematics. By a series of abstractions, counting led to numeral systems, which led to arithmetic and eventually to algebra and geometry. Later, the Greeks laid the foundation for a still actively studied subject called Number Theory, which we'll discuss today.

Let's start with the natural numbers. That is, let $\mathbb{N}$ be the numbers $1, 2, 3, \dots$. For the remainder of this post, we will use the word number to mean a natural number unless otherwise stated.

If a number can be decomposed into a product of two non-unit numbers it is said to be composite. For example, the number $12$ is composite because it can be written as the product of $4$ and $3$; that is, $12 = 4 \cdot 3$. All remaining non-unit numbers are called prime.

Observe that the divisor $4$ of $12$ is also composite since $4 = 2 \cdot 2$. Thus, we can write $12$ as a product of primes $12 = 2 \cdot 2 \cdot 3$. This expression is called a prime factorization. The fact that every non-unit number has a prime factorization that is unique (up to the order of the factors) is known as the fundamental theorem of arithmetic.
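
As a small illustration of the fundamental theorem of arithmetic, here is a hypothetical helper (an illustrative sketch, not part of the original discussion) that computes a prime factorization by repeated division:

def prime_factorization(n):
    """Return the prime factors of n, with repetition, in nondecreasing order."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:                 # whatever remains is itself prime
        factors.append(n)
    return factors

print(prime_factorization(12))  # [2, 2, 3]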

That each number $n$ can be written as a unique factorization of primes $n = p_1 p_2  \dots$ leads one to ask questions about these prime numbers. How many are there? What is their distribution like? Do they follow a simple formula? Can we use an algorithm to test if a number is prime? Finding answers to this class of questions has been the focus of research in number theory for millennia and still to this day. We address the first question today and will discuss some elementary results regarding the distribution of primes.

The fact that there are infinitely many primes is known as Euclid's theorem. Every mathematician should know this theorem and Euclid's proof. It goes something like this:

Suppose, by way of contradiction, that there are finitely many primes. There is thus a largest prime $n$. Consider the number $k = 2 \cdot 3 \cdots (n-1) \cdot n + 1$. Clearly $k$ is larger than $n$ and hence, by our assumption, it must be divisible by a prime $p \leq n$. But observe that $p$ cannot divide $k$, since dividing $k$ by $p$ leaves a remainder of $1$. This contradiction completes our proof, and there are thus infinitely many primes.

So, now that we know that there are infinitely many primes, where are they? A first step to addressing this question might be to ask how many primes tend to be less than a given number $n$. In other words, as we step through the natural numbers, how often do we come across a number that is prime?

Consider the number ${2n \choose n} = \frac{(2n)!}{n!\,n!}$. Observe that every prime number $p$ in the range $n < p \leq 2n$ appears in the prime factorization of ${2n \choose n }$, and hence
\[\prod_{n < p \leq 2n} p \leq {2n \choose n}.\] We can furthermore bound ${2n \choose n}$ using Stirling's approximation (see this post) to show that ${2n \choose n} \leq 4^n$. Hence, it follows that
\[ \prod_{n < p \leq 2n} p \leq 4^n. \] We can use this inequality to bound the frequency of primes. Let $\pi(n)$ be the number of primes less than or equal to $n$. Since
\[n^{\pi(2n) - \pi(n)} \leq \prod_{n < p \leq 2n} p\] we can conclude that
\[n^{\pi(2n) - \pi(n)} \leq 4^n\] which simplifies to
\[\pi(2n) - \pi(n) \leq \log{4} \frac{n}{\log{n}}\] by taking the logarithm of both sides. By induction, it follows immediately that $\pi(n)$ is $O(\frac{n}{\log{n}})$.

We can use a similar approach to show an equivalent lower bound; that is, there are constants $c_1,c_2$ such that
\[ c_1 \frac{n}{\log{n}} \leq \pi(n) \leq c_2 \frac{n}{\log{n}}. \] One of the great achievements in mathematics in the 19th century was a proof that the inequality holds when $c_1 = c_2 = 1$ as $n$ tends to infinity. This result is known as the prime number theorem, which can be restated as
\[ \pi(x) \sim \frac{x}{\log{x}}. \]
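
One can get a feel for the prime number theorem numerically. The sketch below (illustrative code; the sieve-based counter is our own) compares $\pi(n)$ with $\frac{n}{\log{n}}$ for a few values of $n$:

import math

def count_primes(n):
    """Compute pi(n) with a simple sieve of Eratosthenes."""
    marked = [False] * (n + 1)
    count = 0
    for i in range(2, n + 1):
        if not marked[i]:
            count += 1
            for j in range(i * i, n + 1, i):
                marked[j] = True
    return count

for n in (10**3, 10**4, 10**5, 10**6):
    pi_n = count_primes(n)
    # the ratio pi(n) / (n / log n) drifts slowly toward 1 as n grows
    print(n, pi_n, round(pi_n / (n / math.log(n)), 3))
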
The fun of prime numbers does not stop here. Understanding the distribution of primes is at the core of solving the Riemann hypothesis, one of the most famous outstanding problems in all of mathematics. Moreover, prime numbers have extremely practical applications (e.g. RSA encryption), where one requires algorithms to find large prime numbers. We will discuss some of these topics in later posts.

Wednesday, January 23, 2013

The Birthday Paradox

Imagine that you are sitting in a classroom with 22 other people. How likely is it that two people in this room share a birthday? As it turns out, this probability exceeds 50%; that is, more than half the time, two people in a room of 23 will share a birthday. This result may seem counterintuitive. In fact, this likelihood disagrees so often with human intuition that it is famously known as the birthday paradox. In today's post, we will derive this probability using the probabilistic techniques established in previous posts (see our post on permutation statistics for an overview of these techniques).

Let's start by defining $B_1,B_2,\dots,B_n$, where $B_i$ is a random variable for the birthday of person $i$. We will assume that the $B_i$ are independent and identically distributed with a range of values $1,2,\dots,365$ (not a leap year). A collision is a pair $B_i,B_j$ for which $B_i = B_j$ (and $i \neq j$). Let $X$ be a random variable for the number of collisions. To resolve the birthday paradox, we want to know the probability that $X$ is at least 1 as a function of $n$.

Let's first examine the expectation of $X$. We can compute $X$ as a sum of indicator variables for each possible collision, from which it follows that
\[E[X] = \sum_{i < j} P(B_i = B_j), \] by linearity of expectation. If we assume that the birthdays are uniformly distributed, it follows that \[E[X] = \frac{1}{365}{n \choose 2}\] as $P(B_i = B_j) = \frac{1}{365}$ for each $i \neq j$. For what value of $n$ might we expect at least one collision? From the equation for $E[X]$, it follows that $E[X] \geq 1$ when $n \geq 28$. This analysis does not address the problem in question but does lend some intuition as to why the claim is true. The idea is that as $n$ increases, the number of ways a collision might occur increases at a rate of $O(n^2)$. This quadratic growth tells us that we can start to expect a collision when $n$ exceeds (some constant factor times) $\sqrt{m}$, where $m$ is the number of outcomes (in this case $m=365$).

Let's take a step back and try to better estimate $P(X \geq 1)$. We will generalize the problem slightly by assuming that there are $m$ possible outcomes and maintain the assumption that each is equally likely. Let $Y_i$ be the event that the $i$-th outcome (the $i$-th person's birthday) does not collide with any prior outcomes. It then follows that the probability of no collisions is
\[ P(X = 0) = P(Y_1 \cap Y_2 \cap Y_3 \cap \dots \cap Y_n), \] which expands to
\[P(Y_1)P(Y_2 | Y_1)P(Y_3 | Y_1,Y_2) \dots P(Y_n | Y_1,Y_2,\dots,Y_{n-1}),\] which is simply
\[1\left(1-\frac{1}{m}\right)\left(1-\frac{2}{m}\right)\dots \left(1 - \frac{n-1}{m}\right)\] since the $i$-th outcome need simply avoid $i-1$ distinct outcomes out of $m$.

Through some simple analysis, one can show the well-known inequality $1 + x \leq e^x$, valid for all $x$. Thus, substituting this bound into our derivation above shows that
\[ P(X = 0) \leq e^{-\frac{1}{m}}e^{-\frac{2}{m}}\dots e^{-\frac{n-1}{m}}\] which simplifies to
\[ P(X = 0) \leq e^{-\frac{n(n-1)}{2m}} \] since $1 + 2 + \dots + n-1 = \frac{n(n-1)}{2}$. To resolve the original question we want to know how large $n$ must be for $P(X = 0) \leq 1/2$. Taking the natural logarithm of the above inequality shows that
\[n(n-1) \geq \ln(4)m \] which, for $m = 365$, is satisfied when $n \geq 23$.
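
These thresholds are easy to verify directly. The following sketch (an illustrative calculation, not from the original post) computes the exact probability of no collision and finds the smallest $n$ for which a shared birthday is more likely than not:

def prob_no_collision(n, m=365):
    """P(X = 0): probability that n uniformly random birthdays are all distinct."""
    p = 1.0
    for i in range(n):
        p *= (m - i) / m      # the (i+1)-th person must avoid i earlier birthdays
    return p

n = 1
while prob_no_collision(n) > 0.5:
    n += 1
print(n, prob_no_collision(n))  # n = 23, with P(X = 0) close to 0.493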

Initially, we stated that we only need to assume that each $B_i$ is independent and identically distributed for the birthday paradox to hold. This claim should be fairly intuitive since if birthdays are less uniformly distributed, the likelihood of a collision should increase. Moreover, it turns out that birthdays tend not to be uniformly distributed in practice.

Data source: NYTimes.com, Amitabh Chandra, Harvard University

We can provide a more general analysis by assuming that the $i$-th birthday occurs with probability $p_i$. A collision-free assignment of birthdays is one in which all $n$ people are assigned unique birthdays from the set of $m$ possible birthdays. Let $\mathcal{S}(m,n)$ be the set of all collision-free assignments. The probability of a particular assignment $S \in \mathcal{S}(m,n)$ is simply the product of the probabilities associated with $S$. Thus, it follows that \[P(X = 0) = \sum_{S \in \mathcal{S}(m,n)} \prod_{i \in S} p_i.\] In a uniform distribution, $p_i = \frac{1}{m}$ and the probability reduces to
\[ \frac{m!}{(m-n)!} \left(\frac{1}{m}\right)^n \]
which agrees with our analysis above (multiply the $\frac{1}{m}$ terms into the factorials). Hence, to prove that uniformity minimizes the likelihood of collisions, we want to show that \[ \sum_{S \in \mathcal{S}(m,n)} \prod_{i \in S} p_i \leq \frac{m!}{(m-n)!} \left(\frac{1}{m}\right)^n \] for any choice of probabilities $p_1, p_2, \dots, p_m$.

We can assume that $n \geq 2$ since the claim is trivial otherwise. We can then break up the summation using the following definitions:
  1. Let $\mathcal{A}$ be the set of all $S \in \mathcal{S}(m,n)$ where $1,2 \notin S$. 
  2. Let $\mathcal{B}$ be the set of all $S \in \mathcal{S}(m, n - 1)$ where $1,2 \notin S$. 
  3. Let $\mathcal{C}$ be the set of all $S \in \mathcal{S}(m, n - 2)$ where $1,2 \notin S$.
It follows that \[ P(X = 0) = \sum_{S \in \mathcal{A}} \prod_{i \in S} p_i  + n(p_1 + p_2) \sum_{S \in \mathcal{B}} \prod_{i \in S} p_i  + n(n-1)p_1p_2 \sum_{S \in \mathcal{C}} \prod_{i \in S} p_i . \] Observe that if we vary $p_1$ and $p_2$ while keeping their sum fixed (and leaving the remaining probabilities unchanged), every term in this expression stays the same except the one containing $p_1p_2$, which is maximized when $p_1 = p_2$. It follows that $P(X = 0)$ is maximized when $p_1 = p_2$. Since $p_1$ and $p_2$ were chosen arbitrarily, it follows that all probabilities must be equal when $P(X = 0)$ is maximized; that is, the uniform distribution maximizes $P(X = 0)$.
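
A quick Monte Carlo check (purely illustrative, with a made-up skewed distribution of our own) agrees with this conclusion: skewing the birthday distribution only makes a collision more likely.

import random

def collision_rate(weights, n=23, trials=100_000):
    """Estimate P(X >= 1) when n birthdays are drawn with the given weights."""
    days = range(len(weights))
    hits = 0
    for _ in range(trials):
        sample = random.choices(days, weights=weights, k=n)
        if len(set(sample)) < n:
            hits += 1
    return hits / trials

uniform = [1] * 365
skewed = [2 if day < 100 else 1 for day in range(365)]   # some days twice as likely
print(collision_rate(uniform))  # roughly 0.507
print(collision_rate(skewed))   # noticeably larger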

This result may seem to be of little practical importance. It turns out, however, that the notion of collisions is of vital importance in computer science. In the study of data structures, hash tables work by storing elements into one of $m$ buckets. The data structure is more efficient when fewer elements are assigned to the same bucket (that is, when there are fewer collisions). How an element is mapped to a bucket is determined by what is called a hash function. By the above analysis, it follows that we prefer hash functions that distribute elements more uniformly. How one constructs such a hash function is a subject for another post...

Wednesday, January 9, 2013

The Platonic Solids

Our most recent post took us back to antiquity and discussed some of the early developments in geometry. Today we will conjoin this discussion with the recurring theme of randomness. As with many pedagogical introductions to probability, we begin our discussion by considering the properties of dice.

Dice have been used throughout the ancient world as early as 5000 years ago. In the Burnt City, located in the far east of modern-day Iran, an ancient game of backgammon was excavated (dating back to circa 3000BC). The dice uncovered with this backgammon set are the oldest known to exist.

A backgammon set recovered from a Swedish warship that sank in August 1628.

Dice add a compelling mechanic to board games. They introduce fair chance by allowing players to conveniently sample from a uniformly distributed set of discrete events. Most of us are familiar with the standard 6-sided die, which takes on the shape of a cube. However, those familiar with Dungeons & Dragons know well that dice can take on a variety of different shapes with different numbers of possible outcomes.

A typical variety of dice used in the board game Dungeons & Dragons.

The general geometry of a die can be thought of as a polyhedron. To guarantee that any toss will eventually stop on exactly one face of the die, we consider only convex polyhedra. For the sake of uniformity, we limit our consideration further to only those convex polyhedra for which each side is equally likely to turn up.

One way to guarantee uniformity is by enforcing a kind of symmetry in the shape of the polyhedron. For instance, if each face is indistinguishable from all other faces, we can argue that each face is equally likely to turn up. These geometric structures are referred to as face-transitive (or isohedral) convex polyhedra. The trapezohedron is an example of such a shape, as depicted below for 12 faces.

A 12-face trapezohedron makes a fair 12-sided die.

Though the face-transitive polyhedra are in a sense globally symmetric, they are not locally symmetric. That is, if one considers an individual face, it is not necessarily a regular polygon. If we further restrict the faces in this manner, we arrive at what are called the convex regular polyhedra. As we will prove later, it turns out that there are exactly 5 convex regular polyhedra, famously known as the Platonic solids.

The 5 Platonic solids.

Named after the Greek philosopher Plato, the Platonic solids were deeply interwoven with the philosophies of ancient Greece. The familiar classical elements of ancient Greek philosophy (Earth, Air, Fire, and Water) were each associated with a Platonic solid, the remaining fifth solid (the Dodecahedron) being elevated to divine status.

These peculiar beliefs regarding the Platonic solids persisted long after antiquity. Johannes Kepler, a pivotal figure in the scientific revolution, investigated ways of connecting the Platonic solids to the motion of the planets. Fortunately, more accurate data on the motions of the planets (made available thanks to Tycho Brahe) led Kepler to a far better description of how the planets move (see Kepler's laws of planetary motion).

Kepler's model of the solar system based on the Platonic solids.

We mentioned previously that the Platonic solids capture all convex regular polyhedra. This fact has been known since antiquity, and a proof of this is included in Euclid's Elements. The proof goes something like the following.

Consider a vertex in a convex regular polyhedron. By convexity, the angles made by the edges incident to this vertex must sum to a value less than $2\pi$. Since each vertex must have degree at least 3, we can immediately exclude faces with more than 5 sides. Indeed, the interior angles of a regular n-sided polygon are exactly $\pi(1 - \frac{2}{n})$, which is at least $\frac{2\pi}{3}$ for all $n \geq 6$, so three such faces meeting at a vertex would already account for at least $2\pi$. Thus, we only need to consider the different ways of producing convex regular polyhedra out of triangles, squares, and pentagons.

The interior angles of triangles are $\frac{\pi}{3}$ radians. Thus, a convex regular polyhedron consisting of triangular faces could have vertices of degree 3, 4, or  5 (degree 6 or more would defy convexity for the reason illustrated above). These polyhedra are the tetrahedron, the octahedron, and the icosahedron.

The interior angles of squares are $\frac{\pi}{2}$ radians. Thus, a convex regular polyhedron consisting of square faces could only have vertices of degree 3, which corresponds to the cube. Likewise, the interior angles of pentagons are $\frac{3\pi}{5}$ radians, again forcing vertices of degree 3, which gives us the dodecahedron, the final Platonic solid.
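
The whole case analysis can be condensed into a tiny search. The sketch below (illustrative only) enumerates the face/degree combinations satisfying the angle condition, using an exact integer form of the inequality:

solids = {(3, 3): "tetrahedron", (3, 4): "octahedron", (3, 5): "icosahedron",
          (4, 3): "cube", (5, 3): "dodecahedron"}

for n in range(3, 7):          # faces: triangle, square, pentagon, hexagon
    for d in range(3, 7):      # vertex degree
        # d copies of the interior angle pi*(1 - 2/n) must total less than 2*pi,
        # which is equivalent to the integer test d*(n - 2) < 2*n.
        if d * (n - 2) < 2 * n:
            print(n, d, solids[(n, d)])
# Exactly five combinations survive: the five Platonic solids.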

We conclude today's post by noting another interesting property of these convex regular polyhedra:

The dual of a Platonic solid is a Platonic solid. 

The dual of a polyhedron is the polyhedron that results from placing a vertex in each face and joining two vertices whenever their corresponding faces in the original polyhedron share an edge. For example, the following figure depicts the cube and the octahedron, which are duals of one another.

The wonders of duality.

Thursday, November 22, 2012

Great Circles and Ham Sandwiches

Today we will digress from the theme of our recent posts and discuss some geometric truths that even your grandmother might find interesting. To whet your appetite, consider the following claim:

There are always 2 opposite points on the Earth's equator that have exactly the same temperature.

It has been known for millennia that the Earth's shape can be approximated by a sphere. Greek philosophers as early as Pythagoras made claims of the Earth's spherical shape. Perhaps the most notable arguments from antiquity for a spherical Earth were given by Aristotle, who observed, among other things, that the Earth cast a round shadow on the Moon during a lunar eclipse.

By the Hellenistic era (~300 BC), the spherical Earth hypothesis was well established. Moreover, the circumference of the Earth was estimated (with astounding accuracy) by Eratosthenes during this period. Eratosthenes knew that the Sun was directly overhead at noon on the summer solstice in the ancient Egyptian city of Syene. Knowing the distance from Alexandria to Syene, he was able to approximate the Earth's circumference using trigonometry and the length of a shadow cast in Alexandria during the summer solstice. His mathematics was sound; had Eratosthenes been working with accurate measurements, his estimate of the Earth's meridional circumference would have been nearly exact.

It is now known that the approximate shape of the Earth is an oblate spheroid. Because the Earth rotates, its spheroidal shape compresses at the poles and bulges at the equator. For convenience, we will stick with the assumption that the Earth is a sphere, but the same arguments can be applied if we acknowledge the oblate structure.

The Earth's equator is an example of a great circle. Two points that lie opposite each other on a great circle are said to be diametrically opposed (or antipodal points). Let $C$ be any one of the Earth's great circles. Let $\tau$ be a continuous function mapping each point on the Earth to the temperature at that point. We will generalize the above claim by arguing that some pair of diametrically opposed points $p,q$ on $C$ satisfy $\tau(p) = \tau(q)$.

To prove this claim, we will introduce a new function $\delta(p,q) = \tau(p) - \tau(q)$, defined for all pairs of diametrically opposed points $p,q$ on $C$. Starting with any pair of diametrically opposed points $p,q$, consider the value of $\delta(p,q)$. If $\delta(p,q) = 0$, we have found the desired pair of points since $\delta(p,q) = 0$ only if $\tau(p) = \tau(q)$. Otherwise, observe that $\delta(p,q)$ must take on the opposite sign of $\delta(q,p)$. However, we can move from $p$ along the great circle in a continuous manner, evaluating $\delta$ at all intermediate pairs of diametrically opposed points until we reach $q$. Since $\delta$ changes sign during this process, there must be some intermediate pair of points $p^*, q^*$  for which $\delta(p^*, q^*) = 0$. This argument follows by the continuity of $\delta$ and can be formalized with the intermediate value theorem. Furthermore, observe that the claim holds for any continuous function over the Earth's surface (like barometric pressure or altitude).
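
To make the argument concrete, the following sketch (with a made-up temperature function of our own) walks a pair of antipodal points around a great circle and uses the sign change of $\delta$, via bisection, to locate a pair with equal temperatures:

import math

def tau(theta):
    """A made-up continuous 'temperature' as a function of position on the equator."""
    return 15 + 10 * math.sin(theta + 0.7) + 3 * math.cos(2 * theta) + math.sin(3 * theta - 1.2)

def delta(theta):
    # Temperature difference between a point and its antipode; delta(pi) = -delta(0).
    return tau(theta) - tau(theta + math.pi)

# delta changes sign between 0 and pi, so the intermediate value theorem
# guarantees a zero in between; bisection hunts it down.
lo, hi = 0.0, math.pi
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if delta(lo) * delta(mid) <= 0:
        hi = mid
    else:
        lo = mid
theta = 0.5 * (lo + hi)
print(tau(theta), tau(theta + math.pi))  # the two temperatures agree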

A shrewd reader may realize that the above argument can easily be extended to show that there are infinitely many pairs of diametrically opposed points on the Earth's surface with the same temperature. In fact, the Borsuk-Ulam theorem states that, for any continuous mapping $f$ from the sphere to $R^2$, there is some pair of diametrically opposed points on the sphere that $f$ maps to the same point in $R^2$. This result can be summarized as follows.

At any point in time, there are two antipodal points on the Earth's surface with the same temperature and barometric pressure.

This claim works for any continuous function mapping points on the Earth's surface to points in $R^2$. In fact, the theorem generalizes to higher dimensional spheres, but the two-dimensional version will suffice to prove the next claim.


Ham Sandwich Theorem. A ham and cheese sandwich can always be cut in half such that each half contains the same amount of bread, ham, and cheese. 

We will formalize this problem as follows. Let $B$, $H$, and  $C$ be bounded subsets of $R^3$ corresponding respectively to the bread, ham, and cheese. Our goal is to find a plane that bisects $B$, $H$, and  $C$ simultaneously.

Let $p$ be a point on the unit sphere embedded in $R^3$ centered at the origin. Consider the set of all planes parallel to the tangent plane at $p$. We will imagine these planes as being oriented in the direction of the normal vector from the origin to $p$. Thus, we can refer to points as being right of the plane or left of the plane. By the intermediate value theorem, one of these planes must bisect $B$. This follows since if we choose a far off plane in one direction, all of the bread will be left of the plane, and if we choose a far off plane in the opposite direction, all of the bread will be right of the plane.

We can therefore map every point $p$ on the sphere to a plane $\pi(p)$ that bisects $B$. Moreover, if we define $h(p)$ and $c(p)$ to be the amount of ham and cheese, respectively, right of the plane $\pi(p)$, then $f(p) = (h(p), c(p))$ is a continuous function from each point on the sphere to $R^2$. Indeed, we can always choose the plane $\pi(p)$ in an unambiguous manner (say, preferring the middle-most plane bisecting $B$) to guarantee continuity.

By the Borsuk-Ulam theorem, there exist two antipodal points $p$ and $q$ on the unit sphere for which $f(p) = f(q)$. However, since $p$ and $q$ are antipodal, it follows that $\pi(p) = \pi(q)$ (they have the same set of planes, and thus our choice of the middle-most plane is identical). But, since $\pi(p)$ and $\pi(q)$ are oriented in opposite directions, the fact that $f(p) = f(q)$ means that the amount of ham to the right of $\pi(p)$ equals the amount to its left (and likewise for the cheese); that is, $\pi(p)$ bisects both $H$ and $C$. This concludes the proof, as $\pi(p)$ also bisects $B$ by definition.

As mentioned above, these results generalize to higher dimensions. We will leave it as an exercise to come up with some interesting four-dimensional example of the Ham Sandwich theorem. Be festive! It's Thanksgiving today in the United States, so try your best to imagine a four-dimensional turkey draped in four-dimensional cranberry sauce.

Thursday, November 8, 2012

Permutation Statistics

In our previous post, we discussed the inconceivably large number of ways that $n$ numbers can be permuted. The magnitude of $n!$ for even reasonably small values of $n$ is nearly impossible to comprehend in any physical sense. To put things in a somewhat tangible perspective, consider the number $59! \approx 1.4 \times 10^{80}$.

If we employ some elementary stellar physics we can approximate the number of hydrogen atoms in a typical star at around $10^{57}$. Though incredibly large, this number doesn't come close to the measure of $59!$. A typical galaxy, like the one we call home, has around 400 billion stars. Basic arithmetic then reveals that such a galaxy has around $4 \times 10^{68}$ hydrogen atoms, still meager when compared to $59!$. Estimating the number of galaxies in the observable universe at around 80 billion, we can estimate the number of observable hydrogen atoms in the universe at approximately $3.2 \times 10^{79}$, which constitutes roughly 74% of all (baryonic) matter. Only after considering all atoms in the observable universe can we find a physical comparison to $59!$. One seeking quantities analogous to larger factorials will quickly run out of universe.

Despite the apparent complexity of enumerating permutations, mathematics provides a means to make precise universal statements about their properties. We discovered in our previous post that the fraction of permutations that are derangements converges to $1/e$ as $n$ tends to $\infty$. In another post, we learned about the distribution of cycles in permutations. In particular, we discovered that the expected number of cycles in a random permutation of $n$ elements is the $n$-th harmonic number $H_n$. In today's post, we will extend our list of interesting permutation statistics by exploring the notion of order.

When dealt a hand in a card game, it is common to arrange the cards in one's hand in a particular order. By comparing the rank of two cards (breaking ties by comparing the suits in alphabetical order), one can always arrange cards into their unique sorted order. This process of sorting is a fundamental concept in computer science, and a number of algorithms exist for sorting a set of $n$ elements. Before discussing any such sorting algorithms, we will first endeavor to get an idea of the extent to which a set of $n$ elements can be disordered.

The numbers $1,2,\dots,n$ are not only pairwise distinct elements but also come with an implicit order $1 < 2 < \dots < n$. This order allows us to conveniently measure order-related statistics of random permutations. For instance, what is the expected number of elements that are already in their sorted position in a uniformly random permutation $\pi$ of $1,2,\dots,n$? We can compute this expectation as follows.

Define $X_i$ to be an indicator variable that is $1$ if $i$ is mapped to position $i$ by $\pi$ and $0$ otherwise. Then, the desired expectation can be expressed as
\[ E\left[\sum_{i = 1}^n X_i \right] \] which by linearity of expectation simplifies to
\[ \sum_{i = 1}^n E[X_i]. \] As $E[X_i] = P(X_i = 1) = 1/n$, it follows that on average $1$ element is mapped to its own position.

Another key measure of disorder is the number of inversions. An inversion in a permutation is a pair $i < j$ where $i$ occurs at a later position than $j$ (the two elements are out of order). What is the expected number of inversions in a uniformly random permutation $\pi$ of $1,2,\dots,n$?

Define $X_{i,j}$ to be an indicator variable that is $1$ if $i$ and $j$ form an inversion in $\pi$ and $0$ otherwise. Then, our expectation reduces to
\[ \sum_{i < j}P(X_{i,j} = 1) \] by linearity of expectation. Since $i$ and $j$ are in one of their two possible relative orders with equal probability, the expectation is thus $\frac{1}{2} {n \choose 2}$.

Let's consider another measure of sortedness. Suppose that we were to partition the numbers in a permutation into contiguous sorted subsequences. For example, the sorted subsequences for the permutation $1,6,7,3,4,5,2,8$ are $(1,6,7)$, $(3,4,5)$, and $(2,8)$. In this example, the permutation is composed of $3$ sorted subsequences. How many subsequences would one expect for a uniformly random permutation $\pi$ of $1,2,\dots,n$?

Observe that each sorted subsequence, with the exception of the first one, begins with a number that is smaller than its predecessor. Moreover, any such number must always start a new increasing subsequence. Thus, if we define $X$ to be the number of consecutive pairs $a,b$ where $a > b$, it follows that the desired expectation can be computed as
\[ 1 + E[X] = 1 + \sum_{i=1}^{n-1} E[X_i] \] where $X_i$ is $1$ if the number in position $i$ is larger than the number in position $i+1$ and $0$ otherwise. Since $E[X_i] = P(X_i = 1) = 1/2$, the desired expectation is $\frac{1}{2}(n+1)$.

The previous result should not be surprising because of the symmetry of increasing subsequences and decreasing subsequences. That is, each number in a permutation must either start an increasing subsequence or a decreasing subsequence, with the exception of the first number which starts both an increasing and decreasing subsequence. Thus, if $X$ is the number of increasing subsequences and $Y$ is the number of decreasing subsequences, $X + Y = n + 1$. By linearity of expectation, we can then conclude that
\[ E[X] = E[Y] = \frac{1}{2}(n + 1) \] since $E[X]$ must equal $E[Y]$ by symmetry.
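
These expectations are easy to sanity-check by simulation. The sketch below (illustrative code, using the standard library's shuffle) estimates the average number of fixed positions, inversions, and sorted subsequences over random permutations:

import random

def permutation_stats(n, trials=2000):
    """Average fixed positions, inversions, and sorted runs over random permutations."""
    fixed = inversions = runs = 0
    for _ in range(trials):
        perm = list(range(1, n + 1))
        random.shuffle(perm)
        fixed += sum(1 for i, v in enumerate(perm, start=1) if v == i)
        inversions += sum(1 for i in range(n) for j in range(i + 1, n)
                          if perm[i] > perm[j])
        runs += 1 + sum(1 for i in range(n - 1) if perm[i] > perm[i + 1])
    return fixed / trials, inversions / trials, runs / trials

# For n = 20 the expectations are 1, (1/2)*C(20,2) = 95, and (20+1)/2 = 10.5.
print(permutation_stats(20))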

By applying these straightforward techniques, we can discern a number of properties regarding the order of elements in a permutation. The next expectation we consider turns out to be a little more involved.

Let's pursue further this notion of increasing and decreasing subsequences. Suppose that we partitioned $\pi$ into contiguous subsequences that can be either increasing or decreasing. We will create this partition in a way that minimizes the number of parts as follows. Look at the first two numbers. If they are increasing, start building up an increasing subsequence. If they are decreasing, start building up a decreasing subsequence. When our subsequence cannot be extended any further, start over using the subsequent two numbers. For example, the subsequences for the permutation $9,5,1,6,10,2,8,7,3,4$ would be $(9,5,1)$, $(6, 10)$, $(2,8)$, $(7,3)$, and $(4)$. How many subsequences would one expect in such a partition for a uniformly random permutation $\pi$ of $1,2,\dots,n$?

Let $a, b, c$ be three consecutive entries of the permutation $\pi$. We say that $b$ is an extremal point if $b > a,c$ or $b < a,c$. A contiguous subsequence of $\pi$ of size $k>2$ is said to be alternating if it contains $k-2$ extremal points. For example, in the permutation $9,5,1,6,10,2,8,7,3,4$:
  • 1, 10, 2, 8, and 3 are the extremal points, and
  • (5,1,6), (6,10,2), (10,2,8), (2,8,7), (6,10,2,8), (10,2,8,7), (6,10,2,8,7), and (7,3,4) are the alternating subsequences.
For $k = 3,4,\dots,n$, let $X_k$ be the number of alternating subsequences in $\pi$ of size $k$. Let $X_{k,i}$ be an indicator variable that is $1$ if the subsequence of size $k$ starting at position $i$ is alternating and $0$ otherwise. Observe that
\[ X_k = \sum_{i=1}^{n-k+1} X_{k,i} \]
and therefore
\[ E[X_k] = \sum_{i=1}^{n-k+1} E[X_{k,i}] \]
by linearity of expectation. Let $\phi(k)$ be the number of ways of permuting $k$ numbers into an alternating sequence. We can then express $P(X_{k,i} = 1)$ as $\frac{1}{k!}\phi(k)$ from which it follows that
\[ E[X_k] = \frac{n-k+1}{k!}\phi(k)\]
since $E[X_{k,i}] = P(X_{k,i} = 1)$.

We can relate alternating subsequences to the number of increasing and decreasing subsequences in the following manner. Observe that with the exception of the first subsequence, the start of every increasing or decreasing subsequence follows an extremal point. Thus, if we define $X$ to be the number of subsequences in our partition, we can estimate $X \approx 1 + X_3$. However, suppose that $\pi$ contained an alternating subsequence of size $4$. It could have been that only one of the two extremal points in this alternating subsequence was followed by a new increasing or decreasing subsequence. Thus, we can refine our approximation by discounting all alternating subsequences of size $4$. That is, $X \approx 1 + X_3 - X_4$. This estimation goes awry whenever an alternating subsequence of size $4$ is actually part of an alternating subsequence of size $5$, in which case we would have discounted 1 too many subsequences. We can thus refine the approximation further as $X \approx 1 + X_3 - X_4 + X_5$. By the inclusion-exclusion principle, we can extend this reasoning to achieve the complete relation
\[X = 1 + \sum_{k=3}^n (-1)^{k-1} X_k \] from which the desired expectation can be computed as
\[E[X] = 1 + \sum_{k=3}^n (-1)^{k-1} \frac{n - k + 1}{k!}\phi(k). \]
The Taylor series for the function $2(\tan{x} - \sec{x})$ about the point $0$ is
\[-\phi(0) + \phi(1)x - \phi(2)\frac{x^2}{2!} + \phi(3)\frac{x^3}{3!} - \dots \] if we define $\phi(0) = \phi(1) = 2$ (consistent with $\phi(2) = 2$, since any two distinct numbers trivially form an alternating sequence). Observe the similarity that this expansion has with $E[X]$. Writing $n - k + 1 = (n+1) - k$ and evaluating the series and its derivative at $x = 1$, we can relate the two with the identity
\[ E[X] = 1 + (n+1)\bigl(2\tan(1) - 2\sec(1) + 1\bigr) - \frac{d}{dx}2(\tan{x} - \sec{x})\Big|_{x=1} + o(1) \] from which it follows that
\[ E[X] \sim (2\tan(1) - 2\sec(1) + 1)n \] as $n \to \infty$. Thus, we have shown that the expected number of increasing and decreasing subsequences in our partition is asymptotically about $0.4132\,n$.
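
A simulation makes this constant visible. The sketch below (illustrative code; the greedy partition routine is our own) compares the average number of parts in the greedy partition with $(2\tan(1) - 2\sec(1) + 1)n$:

import math
import random

def greedy_blocks(perm):
    """Number of parts in the greedy monotone partition described above."""
    count, i, n = 0, 0, len(perm)
    while i < n:
        count += 1
        if i == n - 1:
            break
        increasing = perm[i] < perm[i + 1]
        j = i + 1
        while j + 1 < n and (perm[j] < perm[j + 1]) == increasing:
            j += 1
        i = j + 1
    return count

n, trials = 1000, 2000
total = 0
for _ in range(trials):
    perm = list(range(n))
    random.shuffle(perm)
    total += greedy_blocks(perm)
print(total / trials, (2 * math.tan(1) - 2 / math.cos(1) + 1) * n)  # both around 413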

Wednesday, October 24, 2012

Beyond the Riffle

Our two previous posts illustrated how employing randomness and estimating probabilities can uncover surprisingly effective strategies for games. In both cases, we analyzed statistics of uniformly random permutations. We have yet to give a proper definition of a uniformly random permutation and explain whether or not one can be generated algorithmically. Today's post will rectify this shortcoming and give  another example of a game that relates to random permutations.

The number of ways $n$ distinct elements can be arranged into a linear order is $n!$. Each of these arrangements defines a unique permutation of the $n$ elements. A uniformly random permutation is simply a random variable that takes on one of these $n!$ permutations, each with an equal probability. The growth rate of $n!$ is incredibly fast, as is made more apparent by Stirling's approximation
\[n! \sim \sqrt{2\pi n} \left( \frac{n}{e} \right)^n. \] Thus, it seems reasonable to ask whether a uniformly random permutation can actually be generated in practice.
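
To see how sharp Stirling's approximation is, the following snippet (illustrative) compares $n!$ with the formula for a few small values of $n$:

import math

def stirling(n):
    return math.sqrt(2 * math.pi * n) * (n / math.e) ** n

for n in (5, 10, 20):
    # the ratio n! / stirling(n) approaches 1 (it is roughly 1 + 1/(12n))
    print(n, math.factorial(n), stirling(n), math.factorial(n) / stirling(n))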

Anyone familiar with playing cards will likely know a few different techniques for shuffling a deck. One of the most commonly used techniques is the riffle, but how effective is it? If one's goal is to have the cards in a uniformly random order, then the riffle is not very effective on its own (consider how likely it is that the card starting at the bottom of the deck ends up at the top). However, by properly formulating these concepts, the mathematician Persi Diaconis was able to prove that about 7 riffle shuffles are sufficient for most games.

From a computational perspective, there turns out to be a very simple technique for generating a uniformly random permutation called the Fisher-Yates shuffle. This technique was popularized by the eminent computer scientist Donald Knuth, and he described the technique in detail in Volume 2 of his epic monograph The Art of Computer Programming. The technique is illustrated in the following code snippet:

# To shuffle a list A of n elements (A[1],...,A[n])
for i in 1..n do
    j = random from i,i+1,...,n
    swap A[i] and A[j]
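
A direct Python rendering of this shuffle might look as follows (a sketch; the standard library's random.shuffle provides the same functionality):

import random

def fisher_yates_shuffle(a):
    """Shuffle the list a in place so that every permutation is equally likely."""
    n = len(a)
    for i in range(n):
        j = random.randrange(i, n)    # uniform over positions i, i+1, ..., n-1
        a[i], a[j] = a[j], a[i]
    return a

print(fisher_yates_shuffle(list(range(1, 11))))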

Those familiar with computer science will notice that this algorithm has several desirable qualities. For example, the amount of auxiliary space required is small even when A is very large. Furthermore, the algorithm will run to completion quickly. In particular, the Fisher-Yates shuffle is an example of a linear time algorithm since the amount of computation performed is $O(n)$ for a list of size $n$.

We will now consider a game, wherein the problem is not about surviving (like in our previous posts) but instead about reducing the amount of time it takes to perform an action. This notion of computation time will be discussed in more detail in future posts, but for now it will suffice to defer to one's intuition.

A Secret Santa is a game/tradition often played during Christmas in western cultures. Each player is randomly assigned a unique person to whom they must anonymously give a gift. No two people can be assigned the same person, and no one person can be assigned themselves. How can one randomly sample among all valid assignments so that each is chosen with equal probability?

Observe that every valid assignment corresponds to a permutation. However, the converse is not true. In a permutation, a number can be mapped back to its own position, resulting in an invalid assignment. A permutation in which no number maps to its own position is called a derangement. We can therefore generate a Secret Santa assignment as follows:
  1. Generate an assignment from a uniformly random permutation using the Fisher-Yates shuffle.
  2. Repeat the previous step until no person is assigned to themselves.
The final assignment we end up with will be a uniformly random derangement: step (1) produces every permutation with equal probability, so conditioning on no person being assigned themselves leaves every derangement equally likely. We will conclude the discussion by considering how long this procedure takes to complete.
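
A Python sketch of this procedure (illustrative; the function name is our own) simply reshuffles until no one is assigned themselves:

import random

def random_derangement(n):
    """Sample a uniformly random derangement of 1..n by rejection."""
    people = list(range(1, n + 1))
    while True:
        assignment = people[:]
        random.shuffle(assignment)     # a uniformly random permutation
        if all(assignment[i] != people[i] for i in range(n)):
            return assignment          # person i+1 gives a gift to assignment[i]

print(random_derangement(8))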

Our assignment procedure is an example of a randomized algorithm. In particular, the amount of time it takes for the procedure to complete is not deterministic. Thus, we need to treat this runtime as a random variable, and in particular, we would like to know the expectation of this random variable; that is, we want to compute the expected runtime of the algorithm.

Let $p$ be the probability that a uniformly random permutation is a derangement. In a given iteration, our algorithm will therefore complete with probability $p$ and continue with probability $1-p$. Thus, we can think of the algorithm as a sequence of independent trials with success probability $p$, where the runtime is given by the number of trials up to and including the first success. That is, the runtime is given by a geometric distribution, which has expectation $1/p$. We can therefore complete the analysis by estimating $p$.

For a uniformly random permutation $\pi$, let $A_i$ be the event that $i$ is mapped to its own position for $i=1,\dots,n$. The probability that $\pi$ is a derangement can therefore be expressed as $P(\overline{A_1} \cap \overline{A_2} \cap \dots \cap \overline{A_n})$. Observe that these events are not independent. The probability $P(\overline{A_i})$ is simply $(1 - 1/n)$ since $i$ has a $1/n$ chance of being mapped to its own position. However, we can argue that $P(\overline{A_i} | \overline{A_1} \cap \dots \cap \overline{A_{i-1}}) \geq P(\overline{A_i})$ since knowing that none of the first $i-1$ elements mapped to their own positions can only decrease the likelihood that $i$ maps to its own position. Thus, we can proceed to bound $p$ from below as
\[ P(\overline{A_1} \cap \overline{A_2} \cap \dots \cap \overline{A_n}) \geq P(\overline{A_1})P(\overline{A_2})\cdots P(\overline{A_n}), \] the right-hand side of which equates to $(1 - 1/n)^n$. Using the definition
\[e^x = \lim_{n \to \infty} \left( 1 + \frac{x}{n} \right)^n \] it follows that $p \geq 1/e$ as $n \to \infty$.

We have not yet made any claim to how accurate this estimate is, but rest assured that this (yet another) surprise appearance of the illustrious $e$ is no coincidence. To better bound this estimate we will employ the inclusion-exclusion principle. The argument goes as follows.

Suppose we start with the trivial approximation that all permutations are derangements. This approximation gives the crude estimate $p \approx 1$. We can improve this by discounting all permutations in which $i$ occurs in its own position for $i = 1,\dots,n$. That is, we can estimate
\[ p \approx 1 - \sum_{i=1}^n \frac{1}{n} \] since $i$ occurs in its own position with probability $1/n$. This approximation gives the slightly less crude estimate $p = 0$. The error in this approximation is that we double counted. Permutations in which distinct $i$ and $j$ both occurred in their own positions were discounted twice, once for $i$ and once for $j$. Thus, we can improve our approximation by adding back these permutations, giving the estimate
\[ p \approx 1 - 1 + \sum_{1 \leq i < j \leq  n} \frac{1}{n(n-1)} \] since the probability that distinct $i$ and $j$ both occur in their own positions is $1/(n(n-1))$. Computing the summation gives the slightly better estimate $p=1/2$. The error in this approximation is again a form of double counting. If all distinct $i$, $j$, and $k$ occurred in their own positions, we would have added them back twice (instead of once). Thus, we can repeat the argument to improve the estimate to
\[ p \approx 1 - 1 + 1/2 - \sum_{1 \leq i < j < k \leq n} \frac{1}{n(n-1)(n-2)} \] since the probability that distinct $i$, $j$, and $k$ occur in their own positions is $1/(n(n-1)(n-2))$. Computing this summation gives the improved estimate $p = 1/3$. If we continue this reasoning, eventually our summation will terminate with the $n$-th estimate, giving the exact probability
\[ p = \sum_{k=0}^n \frac{(-1)^k}{k!}. \] This summation is the first $n+1$ terms of the Taylor series for the exponential function $e^x$ evaluated at $x=-1$. Thus, we can conclude that as $n \to \infty$, $p \to 1/e$. Hence, the simple trial and error approach to generating a uniformly random derangement takes only $e=2.7182818...$ iterations on average. Not too shabby.

We leave you with a straightforward exercise:

Using Stirling's approximation and estimates on how often people play card games, show that no two truly random decks will ever, in human history, be the same (except with an inconceivably low probability).