Tuesday, September 10, 2013

The Sieve of Eratosthenes

In our last post, we introduced some basic concepts from number theory and, in particular, the concept of prime numbers. In today's post, we continue this foray by discussing some of the computational aspects of prime numbers.

Eratosthenes of Cyrene is often credited with inventing the discipline of geography. He invented the system of longitude and latitude and is famous for accurately measuring the circumference of the Earth in antiquity (as described in this post).

A portrait of Eratosthenes

Another lasting mathematical contribution of Eratosthenes is his famous sieve. The sieve provides an elegant method for identifying prime numbers and serves as one of the earliest examples of number-theoretic algorithms.

We know from the prime number theorem that there are roughly $\frac{n}{\log{n}}$ prime numbers among $1,2,\dots,n$, but can we find them? Is there a systematic way to identify all of the prime numbers from $1$ to $n$? The sieve of Eratosthenes solves precisely this task and does so efficiently.

We will describe the algorithm as though we are equipped with a pen and papyrus:
  1. Start by writing down all numbers from $2$ to $n$.
  2. The first number must be prime. Underline it, and cross off all multiples of this number.
  3. Repeat 2 starting at the first unmarked number (i.e. the first number that is neither crossed off nor underlined). Once all numbers are marked, the underlined numbers are exactly the primes from $1$ to $n$.
To illustrate the idea further, consider the following example where $n = 10$.
  1. 2, 3, 4, 5, 6, 7, 8, 9, 10; write down all numbers from 2 to 10.
  2. 2, 3, 4, 5, 6, 7, 8, 9, 10; underline 2, cross off its multiples (4, 6, 8, 10).
  3. 2, 3, 4, 5, 6, 7, 8, 9, 10; underline 3, cross off its multiples (6, 9).
  4. 2, 3, 4, 5, 6, 7, 8, 9, 10; underline 5, cross off its multiples (10).
  5. 2, 3, 4, 5, 6, 7, 8, 9, 10; underline 7, cross off its multiples (n/a).
  6. The underlined numbers 2, 3, 5, and 7 are prime.
Faced with the problem of outputting all prime numbers from $1$ to $n$, one might first take a different approach. Another simple algorithm is the following:

# To output all primes from 1 to n
for i in 2..n do
  if no j in 2..i-1 divides i: output i

Though simple, this approach is far less efficient. For a given number $i$, checking whether it is divisible by any smaller number from $2$ to $i-1$ is expensive. That is, for each $i$ we perform $i-2$ division tests, giving a total of \[1 + 2 + 3 + \dots + (n - 3) + (n - 2) = \frac{1}{2}(n-1)(n-2)\] division tests.

A shrewd observer might notice that we can modify the second loop to perform division tests only for $j = 2$ to $\lfloor \sqrt{i} \rfloor$. This optimization is justified since, if there exist $a$ and $b$ such that $ab = i$, then the smaller of $a$ and $b$ must be at most $\sqrt{i}$. Leveraging this optimization, the runtime is reduced from $O(n^2)$ to $O(n^\frac{3}{2})$.
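
As a concrete (if still slow) version, here is the naive algorithm with the square-root optimization written in Python; the post itself works in pseudocode, and the function name here is ours:

```python
import math

def primes_by_trial_division(n):
    """Return all primes from 2 to n by trial division."""
    primes = []
    for i in range(2, n + 1):
        # Only test divisors up to floor(sqrt(i)): any factorization
        # i = a * b has min(a, b) <= sqrt(i).
        if all(i % j != 0 for j in range(2, math.isqrt(i) + 1)):
            primes.append(i)
    return primes
```

For example, `primes_by_trial_division(10)` returns `[2, 3, 5, 7]`.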

How does this compare to the sieve of Eratosthenes? To better analyze the computation costs, we will rewrite the algorithm in pseudocode:

# To output all primes from 1 to n
let L = [2..n]
for i in L do:
  if i is not marked:
    output i
    for j in i,2i,3i,...,(n/i)i:
      mark j

Notice that the second loop is only entered for numbers $i$ that are prime and for a given number $i$, it makes $n/i$ iterations. Thus, the total runtime can be bounded by \[ \sum_{p}\frac{n}{p} \] which simplifies to \[n \sum_{p} \frac{1}{p}\] where both sums are over all primes $p \leq n$.

We can bound the sum $ \sum_{p} \frac{1}{p} $ by the harmonic series $ \sum_{i = 1}^n \frac{1}{i} $, which we know from a previous post is $O(\log{n})$. Thus, we have shown that the runtime of the sieve is $O(n\log{n})$, which is quite an improvement over the other algorithm's $O(n^\frac{3}{2})$. However, this analysis is not tight. It turns out that the sum is exponentially smaller; by Mertens' second theorem on the sum of the reciprocals of the primes, $ \sum_{p} \frac{1}{p}$ is $O(\log{\log{n}})$, and thus, the total runtime is $O(n\log{\log{n}})$.
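
The sieve pseudocode translates almost line for line into runnable Python; a minimal sketch (the variable and function names are ours):

```python
def sieve_of_eratosthenes(n):
    """Return all primes from 2 to n via the sieve of Eratosthenes."""
    marked = [False] * (n + 1)  # marked[j] is True once j is crossed off
    primes = []
    for i in range(2, n + 1):
        if not marked[i]:
            primes.append(i)           # the first unmarked number is prime
            for j in range(i, n + 1, i):
                marked[j] = True       # cross off i, 2i, 3i, ...
    return primes
```

Note that the inner loop re-marks numbers that are already crossed off (6 is marked by both 2 and 3, for instance); this is the "double marking" referred to in exercise 1 below.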

We leave you with a few exercises:
  1. Can you find a way to improve the sieve of Eratosthenes by avoiding "double marking" of numbers?
  2. We showed that the naive algorithm can be implemented using $O(n^\frac{3}{2})$ division tests, but how expensive are division tests? Come up with a way to test if $i$ divides $n$ in $O(\log{n})$ time. 

Thursday, August 15, 2013

1, 2, 3, and to the 4...

With counting being such an integral part of our everyday lives, it's nearly impossible to imagine a world without numbers. How could one trade goods without measuring the quantities being exchanged? How might a shepherd, unable to count sheep, learn that wolves were thinning his flock? It is not surprising that there are clear remnants of counting systems used by humans as long as 30,000 years ago.

One with some background in anthropology will know that the bones of human skeletons (and other animals) were used by the earliest humans to make a myriad of bone tools. Tools for hunting and for building were commonplace, but archaeologists have also recovered ancient bone tools used for counting. The Ishango bone, a tally stick made from a baboon fibula, is thought to be more than 20,000 years old. In fact, the word tally has its origins in the Latin word talea, meaning a twig or cutting, the kind of stick on which such counts were notched.

In one way or another, humans have been counting since our earliest days. However, it was only around 5000 years ago that the ancient Babylonians and Egyptians began the first endeavors into true mathematics. By a series of abstractions, counting led to numeral systems, which led to arithmetic and eventually to algebra and geometry. Later, the Greeks laid the foundation for a still actively studied subject called number theory, which we'll discuss today.

Let's start with the natural numbers. That is, let $\mathbb{N}$ be the numbers $1, 2, 3, \dots$. For the remainder of this post, we will use the word number to mean a natural number unless otherwise stated.

If a number can be decomposed into a product of two non-unit numbers it is said to be composite. For example, the number $12$ is composite because it can be written as the product of $4$ and $3$; that is, $12 = 4 \cdot 3$. All remaining non-unit numbers are called prime.

Observe that the divisor $4$ of $12$ is also composite since $4 = 2 \cdot 2$. Thus, we can write $12$ as a product of primes $12 = 2 \cdot 2 \cdot 3$. This expression is called a prime factorization. The fact that every non-unit number has a unique prime factorization is known as the fundamental theorem of arithmetic.
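
To make this concrete, a prime factorization can be computed by repeatedly dividing out the smallest remaining factor. A minimal Python sketch (the function name is ours):

```python
def prime_factorization(n):
    """Return the prime factorization of n as a list, e.g. 12 -> [2, 2, 3]."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:   # divide out d as many times as possible
            factors.append(d)
            n //= d
        d += 1
    if n > 1:               # whatever remains is itself prime
        factors.append(n)
    return factors
```

For example, `prime_factorization(12)` returns `[2, 2, 3]`, and a prime input is returned as itself.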

That each number $n$ can be written as a unique product of primes $n = p_1 p_2 \dots$ leads one to ask questions about these prime numbers. How many are there? What is their distribution like? Do they follow a simple formula? Can we use an algorithm to test if a number is prime? Finding answers to this class of questions has been a focus of research in number theory for millennia and remains so to this day. We address the first question today and discuss some elementary results regarding the distribution of primes.

The fact that there are infinitely many primes is known as Euclid's theorem. Every mathematician should know this theorem and Euclid's proof. It goes something like this:

Suppose, by way of contradiction, that there are finitely many primes. There is thus a largest prime $n$. Consider the number $k = 2 \cdot 3 \cdots (n-1) \cdot n + 1$. Clearly $k$ is larger than $n$ and hence, by our assumption, it must be divisible by some prime $p \leq n$. But observe that $p$ cannot divide $k$, since dividing $k$ by $p$ leaves a remainder of $1$. This contradiction completes our proof: there are infinitely many primes.
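
The key step can be checked numerically: the number $k = 2 \cdot 3 \cdots n + 1$ leaves remainder $1$ when divided by every number from $2$ to $n$. A small Python sketch (note that $k$ need not itself be prime; e.g. $2 \cdot 3 \cdot 5 \cdot 7 \cdot 11 \cdot 13 + 1 = 30031 = 59 \cdot 509$, but its prime factors all exceed $13$):

```python
import math

n = 13
k = math.factorial(n) + 1  # 2 * 3 * ... * n + 1 (the extra factor of 1 is harmless)
# Every p in 2..n divides k - 1, so dividing k by p leaves remainder 1.
remainders = [k % p for p in range(2, n + 1)]
print(remainders)
```

Since none of $2,\dots,n$ divides $k$, the smallest non-trivial divisor of $k$ must be a prime larger than $n$.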

So, now that we know that there are infinitely many primes, where are they? A first step to addressing this question might be to ask how many primes tend to be less than a given number $n$. In other words, as we step through the natural numbers, how often do we come across a number that is prime?

Consider the number ${2n \choose n} = \frac{(2n)!}{n!\,n!}$. Observe that every prime number $p$ in the range $n < p \leq 2n$ appears in the prime factorization of ${2n \choose n}$ (such a $p$ divides the numerator $(2n)!$ but not $n!\,n!$), and hence
\[\prod_{n < p \leq 2n} p \leq {2n \choose n}.\] We can furthermore bound ${2n \choose n}$ using Stirling's approximation (see this post) to show that ${2n \choose n} \leq 4^n$. Hence, it follows that
\[ \prod_{n < p \leq 2n} p \leq 4^n. \] We can use this inequality to bound the frequency of primes. Let $\pi(n)$ be the number of primes less than or equal to $n$. Since each prime in the product exceeds $n$, we have
\[n^{\pi(2n) - \pi(n)} \leq \prod_{n < p \leq 2n} p,\] and we can conclude that
\[n^{\pi(2n) - \pi(n)} \leq 4^n,\] which simplifies to
\[\pi(2n) - \pi(n) \leq \log{4}\, \frac{n}{\log{n}}\] by taking the logarithm of both sides. By induction over dyadic intervals, it follows that $\pi(n)$ is $O(\frac{n}{\log{n}})$.

We can use a similar approach to show a matching lower bound; that is, there are constants $c_1,c_2$ such that
\[ c_1 \frac{n}{\log{n}} \leq \pi(n) \leq c_2 \frac{n}{\log{n}}. \] One of the great achievements of 19th-century mathematics was a proof that the ratio of $\pi(n)$ to $\frac{n}{\log{n}}$ tends to $1$ as $n$ tends to infinity. This result is known as the prime number theorem, which can be restated as
\[ \pi(x) \sim \frac{x}{\log{x}}. \]
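
These asymptotics are easy to probe numerically. A quick Python sketch comparing $\pi(n)$ with $n/\log n$ (it counts primes by trial division, so $n$ is kept modest; the convergence to $1$ is famously slow):

```python
import math

def prime_pi(n):
    """Count the primes less than or equal to n by trial division."""
    return sum(
        1
        for i in range(2, n + 1)
        if all(i % j != 0 for j in range(2, math.isqrt(i) + 1))
    )

n = 10_000
ratio = prime_pi(n) / (n / math.log(n))
# The ratio pi(n) / (n / log n) drifts slowly toward 1 as n grows.
print(prime_pi(n), round(ratio, 3))
```
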
The fun of prime numbers does not stop here. Understanding the distribution of primes is at the core of solving the Riemann hypothesis, one of the most famous outstanding problems in all of mathematics. Moreover, prime numbers have extremely practical applications (e.g. RSA encryption), where one requires algorithms to find large prime numbers. We will discuss some of these topics in later posts.

Wednesday, January 23, 2013

The Birthday Paradox

Imagine that you are sitting in a classroom with 22 other people. How likely is it that two people in this room share a birthday? As it turns out, this probability exceeds 50%; that is, more than half the time, two people in a room of 23 will share a birthday. This result may seem counterintuitive. In fact, this likelihood disagrees so often with human intuition that it is famously known as the birthday paradox. In today's post, we will derive this probability using the probabilistic techniques established in previous posts (see our post on permutation statistics for an overview of these techniques).

Let's start by defining $B_1,B_2,\dots,B_n$, where $B_i$ is a random variable for the birthday of person $i$. We will assume that the $B_i$ are independent and identically distributed with a range of values $1,2,\dots,365$ (not a leap year). A collision is a pair $B_i,B_j$ for which $B_i = B_j$ (and $i \neq j$). Let $X$ be a random variable for the number of collisions. To resolve the birthday paradox, we want to know the probability that $X$ is at least 1 as a function of $n$.

Let's first examine the expectation of $X$. We can compute $X$ as a sum of indicator variables for each possible collision, from which it follows that
\[E[X] = \sum_{i < j} P(B_i = B_j), \] by linearity of expectation. If we assume that the birthdays are uniformly distributed, it follows that \[E[X] = \frac{1}{365}{n \choose 2}\] as $P(B_i = B_j) = \frac{1}{365}$ for each $i \neq j$. For what value of $n$ might we expect at least one collision? From the equation for $E[X]$, it follows that $E[X] \geq 1$ when $n \geq 28$. This analysis does not address the problem in question but does lend some intuition as to why the claim is true. The idea is that as $n$ increases, the number of ways a collision might occur grows at a rate of $O(n^2)$. This quadratic growth tells us that we can start to expect a collision when $n$ exceeds (some constant factor times) $\sqrt{m}$, where $m$ is the number of outcomes (in this case $m=365$).
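
This calculation is easy to verify in code. A Python sketch finding the smallest $n$ for which $E[X] \geq 1$ (the function name is ours):

```python
from math import comb

def expected_collisions(n, m=365):
    """E[X] = C(n, 2) / m for n people and m equally likely birthdays."""
    return comb(n, 2) / m

n = 2
while expected_collisions(n) < 1:
    n += 1
print(n)  # prints 28, the smallest n with at least one expected collision
```
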

Let's take a step back and try to better estimate $P(X \geq 1)$. We will generalize the problem slightly by assuming that there are $m$ possible outcomes and maintain the assumption that each is equally likely. Let $Y_i$ be the event that the $i$-th outcome (the $i$-th person's birthday) does not collide with any prior outcomes. It then follows that the probability of no collisions is
\[ P(X = 0) = P(Y_1 \cap Y_2 \cap Y_3 \cap \dots \cap Y_n), \] which expands to
\[P(Y_1)P(Y_2 | Y_1)P(Y_3 | Y_1,Y_2) \dots P(Y_n | Y_1,Y_2,\dots,Y_{n-1}),\] which is simply
\[1\left(1-\frac{1}{m}\right)\left(1-\frac{2}{m}\right)\dots \left(1 - \frac{n-1}{m}\right)\] since the $i$-th outcome need simply avoid $i-1$ distinct outcomes out of $m$.

Through some simple analysis, one can show the well-known inequality $1 + x \leq e^x$, valid for all $x$. Thus, substituting this bound into our above derivation shows that
\[ P(X = 0) \leq e^{-\frac{1}{m}}e^{-\frac{2}{m}}\dots e^{-\frac{n-1}{m}}\] which simplifies to
\[ P(X = 0) \leq e^{-\frac{n(n-1)}{2m}} \] since $1 + 2 + \dots + (n-1) = \frac{n(n-1)}{2}$. To resolve the original question we want to know how large $n$ must be for $P(X = 0) \leq 1/2$. Taking the natural logarithm of the above inequality shows that
\[n(n-1) \geq \ln(4)m \] which, for $m = 365$, is satisfied when $n \geq 23$.
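
The product form of $P(X = 0)$ is also easy to evaluate exactly (up to floating point). A minimal Python sketch (the function name is ours):

```python
def collision_probability(n, m=365):
    """P(X >= 1) = 1 - (1)(1 - 1/m)(1 - 2/m)...(1 - (n-1)/m)."""
    p_none = 1.0
    for i in range(n):
        p_none *= (m - i) / m  # the i-th person avoids i earlier birthdays
    return 1.0 - p_none
```

With 23 people the collision probability is about 0.507, while with 22 it is about 0.476, confirming the threshold.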

Initially, we stated that we only need to assume that each $B_i$ is independent and identically distributed for the birthday paradox to hold. This claim should be fairly intuitive since if birthdays are less uniformly distributed, the likelihood of a collision should increase. Moreover, it turns out that birthdays tend not to be uniformly distributed in practice.

Data source: NYTimes.com, Amitabh Chandra, Harvard University

We can provide a more general analysis by assuming that the $i$-th birthday occurs with probability $p_i$. A collision-free assignment of birthdays is one in which all $n$ people are assigned unique birthdays from the set of $m$ possible birthdays. Let $\mathcal{S}(m,n)$ be the set of all collision-free assignments. The probability of a particular assignment $S \in \mathcal{S}(m,n)$ is simply the product of the probabilities associated with $S$. Thus, it follows that \[P(X = 0) = \sum_{S \in \mathcal{S}(m,n)} \prod_{i \in S} p_i.\] Under the uniform distribution, $p_i = \frac{1}{m}$ and the probability reduces to
\[ \frac{m!}{(m-n)!} \left(\frac{1}{m}\right)^n \]
which agrees with our analysis above (multiply the $\frac{1}{m}$ terms into the factorials). Hence, to prove that uniformity minimizes the likelihood of collisions, we want to show that \[ \sum_{S \in \mathcal{S}(m,n)} \prod_{i \in S} p_i \leq \frac{m!}{(m-n)!} \left(\frac{1}{m}\right)^n \] for any choice of probabilities $p_1, p_2, \dots, p_m$.

We can assume that $n \geq 2$ since the claim is trivial otherwise. We can then break up the summation using the following definitions:
  1. Let $\mathcal{A}$ be the set of all $S \in \mathcal{S}(m,n)$ where $1,2 \notin S$.
  2. Let $\mathcal{B}$ be the set of all $S \in \mathcal{S}(m, n - 1)$ where $1,2 \notin S$.
  3. Let $\mathcal{C}$ be the set of all $S \in \mathcal{S}(m, n - 2)$ where $1,2 \notin S$.
It follows that \[ P(X = 0) = \sum_{S \in \mathcal{A}} \prod_{i \in S} p_i  + n(p_1 + p_2) \sum_{S \in \mathcal{B}} \prod_{i \in S} p_i  + n(n-1)p_1p_2 \sum_{S \in \mathcal{C}} \prod_{i \in S} p_i . \] Observe that, among changes to $p_1$ and $p_2$ alone that preserve their sum (and hence a valid probability distribution), every part of this expression is fixed except the factor $p_1p_2$, which is maximized when $p_1 = p_2$. It follows that $P(X = 0)$ is maximized when $p_1 = p_2$. Since $p_1$ and $p_2$ were chosen arbitrarily, all probabilities must be equal when $P(X = 0)$ is maximized; that is, the uniform distribution maximizes $P(X = 0)$.
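
For small $m$ and $n$, the claim can be checked by brute force, summing the product of probabilities over all ordered collision-free assignments. A Python sketch (the helper name and the example skewed distribution are ours):

```python
from itertools import permutations

def prob_no_collision(p, n):
    """Exact P(X = 0) for n people when day i occurs with probability p[i]."""
    total = 0.0
    for days in permutations(range(len(p)), n):  # ordered, distinct days
        prod = 1.0
        for d in days:
            prod *= p[d]
        total += prod
    return total

uniform = [0.2] * 5
skewed = [0.4, 0.3, 0.1, 0.1, 0.1]
print(prob_no_collision(uniform, 3), prob_no_collision(skewed, 3))
```

The uniform distribution gives $P(X = 0) = 0.48$, the skewed one about $0.35$, so skewing the distribution makes a collision more likely.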

This result may seem to be of little practical importance. It turns out, however, that the notion of collisions is of vital importance in computer science. In the study of data structures, hash tables work by storing elements into one of $m$ buckets. The data structure is more efficient when fewer elements are assigned to the same bucket (that is, when there are fewer collisions). How an element is mapped to a bucket is determined by what is called a hash function. By the above analysis, it follows that we prefer hash functions that distribute elements more uniformly. How one constructs such a hash function is a subject for another post...

Wednesday, January 9, 2013

The Platonic Solids

Our most recent post took us back to antiquity and discussed some of the early developments in geometry. Today we will conjoin this discussion with the recurring theme of randomness. As with many pedagogical introductions to probability, we begin our discussion by considering the properties of dice.

Dice have been used throughout the ancient world for at least 5000 years. In the Burnt City, located in the far east of modern-day Iran, an ancient game of backgammon was excavated (dating back to circa 3000 BC). The dice uncovered with this backgammon set are the oldest known to exist.

A backgammon set recovered from a Swedish warship that sank in August 1628.

Dice add a compelling mechanic to board games. They introduce fair chance by allowing players to conveniently sample from a uniformly distributed set of discrete events. Most of us are familiar with the standard 6-sided die, which takes the shape of a cube. However, those familiar with Dungeons & Dragons know well that dice can take on a variety of different shapes with different numbers of possible outcomes.

A typical variety of dice used in the board game Dungeons & Dragons.

The general geometry of a die can be thought of as a polyhedron. To guarantee that any toss will eventually stop on exactly one face of the die, we consider only convex polyhedra. For the sake of uniformity, we limit our consideration further to only those convex polyhedra for which each side is equally likely to turn up.

One way to guarantee uniformity is by enforcing a kind of symmetry in the shape of the polyhedron. For instance, if each face is indistinguishable from all other faces, we can argue that each face is equally likely to turn up. These geometric structures are referred to as face-transitive (or isohedral) convex polyhedra. The trapezohedron is an example of such a shape, as depicted below for 12 faces.

A 12-face trapezohedron makes a fair 12-sided die.

Though the face-transitive polyhedra are in a sense globally symmetric, they are not locally symmetric. That is, if one considers an individual face, it is not necessarily a regular polygon. If we further restrict the faces in this manner, we arrive at what are called the convex regular polyhedra. As we will prove later, it turns out that there are exactly 5 convex regular polyhedra, famously known as the Platonic solids.

The 5 Platonic solids.

Named after the Greek philosopher Plato, the Platonic solids were deeply interwoven with the philosophies of ancient Greece. The familiar classical elements of ancient Greek philosophy (earth, air, fire, and water) were each associated with a Platonic solid, with the remaining fifth solid (the dodecahedron) associated with the heavens and elevated to divine status.

These peculiar beliefs regarding the Platonic solids persisted long beyond antiquity. Johannes Kepler, a pivotal figure in the scientific revolution, investigated ways of connecting the Platonic solids to the motions of the planets. Fortunately, more accurate data on the motions of the planets (made available thanks to Tycho Brahe) led Kepler to a far more accurate description of planetary motion (see Kepler's laws of planetary motion).

Kepler's model of the solar system based on the Platonic solids.

We mentioned previously that the Platonic solids capture all convex regular polyhedra. This fact has been known since antiquity, and a proof of this is included in Euclid's Elements. The proof goes something like the following.

Consider a vertex in a convex regular polyhedron. By convexity, the angles made by the edges incident to this vertex must sum to a value less than $2\pi$. Since each vertex must have degree at least 3, we can immediately exclude faces with more than 5 sides. Indeed, the interior angles of a regular $n$-sided polygon are exactly $\pi(1 - \frac{2}{n})$, which is at least $\frac{2\pi}{3}$ for all $n \geq 6$, so three or more such faces meeting at a vertex would contribute an angle sum of at least $2\pi$. Thus, we only need consider the different ways of producing convex regular polyhedra out of triangles, squares, and pentagons.

The interior angles of triangles are $\frac{\pi}{3}$ radians. Thus, a convex regular polyhedron consisting of triangular faces could have vertices of degree 3, 4, or 5 (degree 6 or more would defy convexity for the reason illustrated above). These polyhedra are the tetrahedron, the octahedron, and the icosahedron.

The interior angles of squares are $\frac{\pi}{2}$ radians. Thus, a convex regular polyhedron consisting of square faces could only have vertices of degree 3, which corresponds to the cube. Likewise, the interior angles of pentagons are $\frac{3\pi}{5}$ enforcing degree 3 vertices, which gives us the dodecahedron, the final Platonic solid.
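
The case analysis above amounts to enumerating the pairs $(s, d)$, where each face is a regular $s$-gon and $d \geq 3$ faces meet at each vertex, subject to the convexity condition $d \cdot \pi(1 - \frac{2}{s}) < 2\pi$, which simplifies to $d(s - 2) < 2s$. A short Python check (the search bound of 20 is an arbitrary cutoff; larger $s$ or $d$ only fail by more):

```python
solids = []
for s in range(3, 20):        # each face is a regular s-gon
    for d in range(3, 20):    # d faces meet at each vertex
        if d * (s - 2) < 2 * s:   # angle sum at a vertex stays below 2*pi
            solids.append((s, d))
print(solids)  # [(3, 3), (3, 4), (3, 5), (4, 3), (5, 3)]
```

These five pairs are exactly the tetrahedron, octahedron, icosahedron, cube, and dodecahedron.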

We conclude today's post by noting another interesting property of these convex regular polyhedra:

The dual of a Platonic solid is a Platonic solid. 

The dual of a polyhedron is the polyhedron that results from defining a vertex for each face and joining two vertices if their corresponding faces in the original polyhedron share an edge. For example, the following figure depicts the cube and the octahedron, which are duals of one another.

The wonders of duality.