Processing math: 100%

Thursday, November 8, 2012

Permutation Statistics

In our previous post, we discussed the inconceivably large number of ways that n numbers can be permuted. The magnitude of n! for even reasonably small values of n is nearly impossible to comprehend in any physical sense. To put things in a somewhat tangible perspective, consider the number 59!1.4×1080.

If we employ some elementary stellar physics we can approximate the number of hydrogen atoms in a typical star at around 1057. Though incredibly large, we haven't come close to the measure of 59!. A typical galaxy, like the one we call home, has around 400 billion stars. Basic arithmetic then reveals that such a galaxy has around 4×1068 hydrogen atoms, still meager when compared to 59!. Estimating the number of galaxies in the observable universe at around 80 billion, we can approximate the number of observable hydrogen atoms in the universe at  3.2×1079, which constitutes ~74% of all (baryonic) matter. Only after considering all atoms in the observable universe can we find a physical comparison to 59!. One seeking quantities analogous to larger factorials will quickly run out of universe.

Despite the apparent complexity of enumerating permutations, mathematics provides a means to make precise universal statements about their properties. We discovered in our previous post that the fraction of permutations that are derangements converges to 1/e as n tends to . In another post, we learned about the distribution of cycles in permutations. In particular, we discovered that the expected number of cycles in a permutation converges to the n-th harmonic number Hn as n tends to . In today's post, we will extend our list of interesting permutation statistics by exploring the notion of order.

When dealt a hand in a card game, it is common to arrange the cards in one's hand in a particular order. By comparing the rank of two cards (breaking ties by comparing the suits in alphabetical order), one can always arrange cards into their unique sorted order. This process of sorting is a fundamental concept in computer science, and a number of algorithms exist for sorting a set of n elements. Before discussing any such sorting algorithms, we will first endeavor to get an idea of the extent to which a set of n elements can be disordered.

The numbers 1,2,,n serve not only as being pairwise distinct elements but also provide an implicit order 1<2<<n. This order allows us to conveniently measure order-related statistics of random permutations. For instance, what is the expected number of elements that are already in their sorted position in a uniformly random permutation π of 1,2,,n? We can compute this expectation as follows.

Define Xi to be an indicator variable that is 1 if i is mapped to position i by π and 0 otherwise. Then, the desired expectation can be expressed as
E[ni=1Xi] which by linearity of expectation simplifies to
ni=1E[Xi]. As E[Xi]=P(Xi=1)=1/n, it follows that on average 1 element is mapped to its own position.

Another key measure of disorder is the number of inversions. An inversion in a permutation is a pair i<j where i occurs at a later position than j (the two elements are out of order). What is the expected number of inversions in a uniformly random permutation π of 1,2,,n?

Define Xi,j to be an indicator variable that is 1 if i and j form an inversion in π and 0 otherwise. Then, our expectation reduces to
ijP(Xi,j=1) by linearity of expectation. Since i and j are in one of their two possible relative orders with equal probability, the expectation is thus 12(n2).

Let's consider another measure of sortedness. Suppose that we were to partition the numbers in a permutation into contiguous sorted subsequences. For example, the sorted subsequences for the permutation 1,6,7,3,4,5,2,8 are (1,6,7), (3,4,5), and (2,8). In this example, the permutation is composed of 3 sorted subsequences. How many subsequences would one expect for a uniformly random permutation π of 1,2,,n?

Observe that each sorted subsequence, with the exception of the first one, begins with a number that is smaller than its predecessor. Moreover, any such number must always start a new increasing subsequence. Thus, if we define X to be the number of consecutive pairs a,b where a>b, it follows that the desired expectation can be computed as
1+E[X]=1+n1i=1E[Xi] where Xi is 1 if the number in position i is larger than the number in position i+1 and 0 otherwise. Since E[Xi]=P(Xi=1)=1/2, the desired expectation is 12(n+1).

The previous result should not be surprising because of the symmetry of increasing subsequences and decreasing subsequences. That is, each number in a permutation must either start an increasing subsequence or a decreasing subseqeuence, with the exception of the first number which starts both an increasing and decreasing subsequence. Thus, if X is the number of increasing subsequences and Y is the number of decreasing subsequences, X+Y=n+1. By linearity of expectation, we can then conclude that
E[X]=E[Y]=12(n+1) since E[X] must equal E[Y] by symmetry.

By applying these straightforward techniques, we can discern a number of properties regarding the order of elements in a permutation. The next expectation we consider turns out to be a little more involved.

Let's pursue further this notion of increasing and decreasing subsequences. Suppose that we partitioned π into contiguous subsequences that can be either increasing or decreasing. We will create this partition in a way that minimizes the number of parts as follows. Look at the first two numbers. If they are increasing, start building up an increasing subsequence. If they are decreasing, start building up a decreasing subsequence. When our subsequence cannot be extended any further, start over using the subsequent two numbers. For example, the subsequences for the permutation 9,5,1,6,10,2,8,7,3,4 would be (9,5,1), (6,10), (2,8), (7,3), and (4). How many subsequences would one expect in such a partition for a uniformly random permutation π of 1,2,,n?

Let a,b,c be a consecutive subsequence in the permutation π. We say that b is an extremal point if b>a,c or b<a,c. A subsequence of π of size k>2 is said to be alternating if it contains k2 extremal points. For example, in the permutation 9,5,1,6,10,2,8,7,3,4:
  • 1, 10, 2, 8, and 3 are the extremal points, and
  • (5,1,6), (6,10,2), (10,2,8), (2,8,7), (6,10,2,8), (10,2,8,7), (6,10,2,8,7), and (7,3,4) are the alternating subsequences.
For k=3,4,,n, let Xk be the number of alternating subsequences in π of size k. Let Xk,i be an indicator variable that is 1 if the subsequence of size k starting at position i is alternating and 0 otherwise. Observe that
Xk=nk+1i=1Xk,i
and therefore
E[Xk]=nk+1i=1E[Xk,i]
by linearity of expectation. Let ϕ(k) be the number of ways of permuting k numbers into an alternating sequence. We can then express P(Xk,i=1) as 1k!ϕ(k) from which it follows that
E[Xk]=nk+1k!ϕ(k)
since E[Xk,i]=P(Xk,i=1).

We can relate alternating subsequences to the number of increasing and decreasing subsequences in the following manner. Observe that with the exception of the first subsequence, the start of every increasing or decreasing subsequence follows an extremal point. Thus, if we define X to be the number of subsequences in our partition, we can estimate X1+X3. However, suppose that π contained an alternating subsequence of size 4. It could have been that only one of the two extremal points in this alternating subsequence was followed by a new increasing or decreasing subsequence. Thus, we can refine our approximation by discounting all alternating subsequences of size 4. That is, X1+X3X4. This estimation goes awry whenever an alternating subsequence of size 4 is actually part of an alternating subsequence of size 5, in which case we would have discounted 1 too many subsequences. We can thus refine the approximation further as X1+X3X4+X5. By the inclusion-exclusion principle, we can extend this reasoning to achieve the complete relation
X=1+nk=3(1)n1Xk from which the desired expectation can be computed as
E[X]=1+nk=3(1)n1nk+1k!ϕ(k).
The Taylor series for the function 2(tanxsecx) about the point 0 is
ϕ(0)+ϕ(1)xϕ(2)x22!+ϕ(3)x33! if we define ϕ(0)=ϕ(1)=1. Observe the similarity that this expansion has with E[X]. We can relate the two with the identity
E[X]=1+(n+1)(2tan(1)2sec(1)+1)(1+ddx|2(tanxsecx)|x=1) from which it follows that
E[X](2tan(1)2sec(1)+1)n as n. Thus, we have shown that the expected number of increasing and decreasing subsequences converges to about 0.4132n as n tends to .

No comments :

Post a Comment