27 December 2007

On Presidential Primaries

One of the crimes of our current electoral system — the United States picks its President in about as undemocratic a manner as can be imagined — is that the same four million people pick the candidates for both major parties every four years.

Wikipedia lists the fifty states' and six non-state U.S. territories' populations, based on official estimates by the U.S. Census Bureau. In light of next month's high-profile primary elections, the numbers are remarkable.

Elections are very expensive: candidates in the primaries must go door-to-door, and raise money for media buys. It would be highly undemocratic for California to hold the first primary. Eventually, the nominee must win California, but s/he should not have to compete there until most of the candidates have been culled from the game, and fundraising is funneled to only a few folks who've had lots of free media. Indeed, if California were the first primary, you can be sure that only the candidates who come into the game with big names and lots of personal money would be competitive. (Perhaps, of course, this is better? I wouldn't mind a system that rewarded career politicians who worked their way to the top after long terms in the Senate, the Cabinet, and governors' mansions.) So we must, if we are to have some semblance of a meritocratic democracy, begin the nominating contest in small states.

With that said, the current choices of Iowa, New Hampshire, and South Carolina cannot be justified.

Iowa, with just shy of three million people, is not large. It has five members of the House of Representatives, and its population is largely white, pro-farm, and anti-immigrant. New Hampshire is legitimately small: it has only 1.3 million people and two representatives in Congress. South Carolina, on the other hand, houses more than four million people, and is the only early primary state with a sizable non-white population.

But Iowa is larger than twenty other states (and the District of Columbia), any of which would make a reasonable first-in-the-nation primary. From smallest to largest, they are:

Wyoming, DC, Vermont, North Dakota, Alaska, South Dakota, Delaware, Montana, Rhode Island, Hawaii, New Hampshire, Maine, Idaho, Nebraska, West Virginia, New Mexico, Nevada, Utah, Kansas, Arkansas, and Mississippi.

These twenty-one voting districts represent the full range of American demographics. DC is urban, Black, and overwhelmingly Democratic. Hawaii is majority Asian; New Mexico has a large Mexican immigrant population. Many of the small states are rural and conservative, while others, like Rhode Island and Delaware, easily represent the metropolitan North East. Vermont has a bizarre politics all its own.

The 2008 election cycle will begin with the same four million people who have started the game for the last twenty years. Four million people whom the candidates, when they are in their most pandering mode, call "uniquely qualified" to pick a candidate. Are the rest of us stupid and uneducated? Are North Dakotans or Hawaiians somehow less able to consider the candidates' experiences and abilities? A democratic system should not enfranchise only those who are educated, or moneyed, or otherwise "qualified" to vote.

In a better system, a bipartisan, independent commission, taking input from major party leaders, would select the Primary order each cycle. Their selections should, by law or policy, always begin with a couple of small states — say, two states from the twenty smallest. Criteria should include some complementarity: one liberal, urban population paired with one rural, conservative one, say. And, most importantly, their selections should differ from cycle to cycle. In one cycle, Vermont and Idaho start off; in another, Utah and Delaware.

There have been many complaints this year about the election starting too early. It has been an expensive year (see, for instance, the listings at opensecrets.org): Clinton and Obama have each raised close to one hundred million dollars and spent forty million of it; Romney has raised and spent around sixty million. The general election will probably cost each party around a billion. And it's exhausting, and a distraction from the important work of running the country. Oregon's Governor, for instance, is out in Iowa stumping for Clinton, rather than doing his gubernatorial work. Will the 2012 election begin similarly early? It wouldn't if the Primary committee waited to announce the order, and perhaps was empowered also to announce the dates.

A system like this — or, indeed, any proposal to reform the primaries — would most likely require, in order to go into effect, the commitment of both parties as well as Congressional involvement. That's a tough order, but a necessary one if we are going to have a truly democratic democracy.

13 December 2007

Linear Differential Equations

In the calculus class I'm TAing, we spent some time learning how "the method of undetermined coefficients" could be used to solve linear differential equations. I have never taken a first-year differential equations class, so although I'd solved many differential equations this way, I had never really thought about such methods with any real theory. My goal in this entry is to describe the method and explain why it works, using more sophisticated ideas than in a first-year class, but still remaining very elementary. I had hoped to have this written and posted weeks ago; as it is, I'm writing it while proctoring the final exam for the class.

First, let me remind you of the set-up of the problem. We are trying to solve a non-homogeneous differential equation with constant coefficients:

$$ a_n y^{(n)} + \dots + a_1 y' + a_0 y = g(x) $$

I will assume that $a_n$ is not 0; then the left-hand side defines a linear operator $D$ on the space of (smooth) functions, with $n$-dimensional kernel.

We can diagonalize this operator by Fourier-transforming: if $y = e^{rx}$, then $D[y] = a(r) e^{rx}$, where $a(r)$ is the polynomial $a_n r^n + \dots + a_1 r + a_0$. If $a(r)$ has no repeated roots, then we can immediately read off a basis for the kernel as $e^{rx}$ for $r$ ranging over the $n$ roots. If there is a repeated root, then $a'(r)$ and $a(r)$ share a common factor; $a'(r)$ corresponds to the operator

$$ E[y] = n a_n y^{(n-1)} + \dots + 2 a_2 y' + a_1 y $$

Then, since $\frac{d^k}{dx^k} [x y] = x y^{(k)} + k y^{(k-1)}$, we see that

$$ D[x y] = x D[y] + E[y] $$

so $r$ is a repeated root of $a(r)$ if and only if $e^{rx}$ and $x e^{rx}$ are zeros of $D$.

More generally, linear differential operators with constant coefficients satisfy a Leibniz rule:

$$ D[p(x) e^{rx}] = \left( p(x) a(r) + p'(x) a'(r) + \frac{p''(x) a''(r)}{2!} + \dots + \frac{p^{(n)}(x) a^{(n)}(r)}{n!} \right) e^{rx} $$

for polynomial $p(x)$.

Thus, our ability to solve linear homogeneous differential equations with constant coefficients depends exactly on our ability to factor polynomials; for example, we can always solve second-order equations, by using the quadratic formula.
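For instance, here is a quick numerical sketch of that second-order case (my own illustration, assuming numpy is available; the equation y'' + y' + y = 0 is an arbitrary choice):

    import numpy as np

    # roots of the characteristic polynomial a(r) = r^2 + r + 1, for y'' + y' + y = 0
    r1, r2 = np.roots([1, 1, 1])
    print(r1, r2)          # -0.5 + 0.866j and -0.5 - 0.866j

    # so a basis for the kernel is e^{r1 x}, e^{r2 x}; taking real combinations,
    # e^{-x/2} cos(sqrt(3) x / 2) and e^{-x/2} sin(sqrt(3) x / 2)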

But, now, what if $g(x) \neq 0$? I will assume that, through whatever conniving factorization methods we use, we have found the entire kernel of $D$. Then our problem will be solved if we can find one solution to $D[y] = g(x)$; the full set of solutions is the affine space through this function parallel to the kernel. In calculus, we write this observation as "$y_g = y_c + y_p$", where g, c, and p stand for "general", "complementary" (i.e. the general solution of the homogeneous equation), and "particular", respectively.

For a general $g$, we can continue to work in the Fourier basis: Fourier transform, sending $D[y] \mapsto a(r)$, then divide and integrate to transform back. This may miss some solutions at the poles, and is computationally difficult: integrating is as hard as factoring polynomials. For second-order equations, we can get to "just an integral" via an alternative method, by judiciously choosing how to represent functions in terms of the basis of solutions for $D[y]=0$.

But for many physical problems, $g(x)$ is an especially simple function (and, of course, it can always be Fourier-transformed into one).

In particular, let's say that $g(x)$ is, for example, a sum of products of exponential (and sinusoidal) and polynomial functions. I.e. let's say that $g(x)$ is a solution to some homogeneous linear constant-coefficient equation $C[g(x)] = 0$. Another way of saying this: let's say that the space spanned by all the derivatives $g$, $g'$, $g''$, etc., is finite-dimensional. If it has dimension $m$, then $g^{(m)}(x)$ is a linear combination of lower-order derivatives, and thus I can find a minimal (order) $C$ of degree $m$ so that $C[g] = 0$. By the Leibniz rule, functions with this property form a ring: when I add two functions, the dimensions of their derivative spaces at most add; when I multiply, the dimensions at most multiply. Indeed, by the earlier discussion, we have an exact description of such functions: they are precisely sums of products of $x^s e^{rx}$ ($s$ a non-negative integer).

In any case, let's say $g$ is of this form, i.e. we have $C$ (with degree $m$) minimal such that $C[g] = 0$, and first let's assume that the kernels of $C$ and $D$ do not intersect. Then $D$ acts as a one-to-one linear operator on the finite-dimensional space $\ker C$, which by construction is spanned by the derivatives of $g$, and so must be onto. I.e. there is a unique point $y_p(x) = b_0 g(x) + b_1 g'(x) + \dots + b_{m-1} g^{(m-1)}(x) \in \ker C$ so that $D[y_p] = g$. Finding it requires only solving a system of linear equations in the $b_i$.
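Here is that computation in code (a minimal sketch of my own, assuming sympy is available; the equation y'' - 3y' + 2y = x e^{3x} is an arbitrary choice, for which $\ker C$ is spanned by e^{3x} and x e^{3x} and is disjoint from $\ker D$):

    import sympy as sp

    x, b0, b1 = sp.symbols('x b0 b1')
    g = x * sp.exp(3*x)
    y = (b0 + b1*x) * sp.exp(3*x)          # ansatz: a combination of g and its derivatives
    D = lambda f: sp.diff(f, x, 2) - 3*sp.diff(f, x) + 2*f

    residual = sp.simplify((D(y) - g) * sp.exp(-3*x))   # a polynomial in x
    eqs = sp.Poly(residual, x).all_coeffs()             # each coefficient must vanish
    sol = sp.solve(eqs, [b0, b1])
    print(sol)                                          # {b0: -3/4, b1: 1/2}
    print(sp.simplify(D(y.subs(sol)) - g))              # 0, so y_p = (x/2 - 3/4) e^{3x}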

If, however, $\ker C$ and $\ker D$ intersect, then we will not generically be able to solve this system of linear equations. Because $C$ is minimal, $g$ is not a linear combination of fewer than $m$ of its derivatives; if $D$ sends (linear combinations of) some of those derivatives to 0, we will never be able to map onto $g$. Let me say this again. If $D$ does not act one-to-one on $\ker C$, but $g$ is in the range of this restricted operator, then $g$ lies in a smaller-than-$m$-dimensional space closed under differentiation; thus, there is a differential operator of lower-than-$m$ degree that annihilates $g$.

How can we then solve the equation? As we observed earlier, $(xg)' = g + x g'$, and more generally

$$\frac{d^k}{dx^k}[x g(x)] = x g^{(k)}(x) + k g^{(k-1)}(x)$$

Then $C[xg]$ is a linear combination of derivatives of $g$; i.e. $C[xg] \in \ker C$. If we take the space spanned by the derivatives of $x g(x)$ together with $\ker C$, it is strictly larger than $\ker C$, still finite-dimensional, and still closed under differentiation. We can repeat this trick ---

$$ \frac{d^k}{dx^k}[ x^p g(x) ] = \sum_{i=0}^k \frac{k!p!}{i!(k-i)!(p-k+i)!} x^{p-k+i} g^{(i)}(x) $$

--- and eventually get a finite-dimensional space containing, among other things, $g$, and closed under differentiation; one convenient such space is $\ker(CD)$, which has dimension $n+m$ and contains $\ker C$. The kernel of $D$ in this larger space is at most $n$ dimensional (since $n = \dim \ker D$ in the space of all functions), and $D$ sends $\ker(CD)$ into $\ker C$ (if $C[D[y]] = 0$ then $D[y] \in \ker C$); so the range of $D$ restricted to this space is at least $m$-dimensional, i.e. is all of $\ker C$, and in particular contains $g$: the system of linear equations is solvable.

Of course, we can often stop before getting all the way to $n$ extra dimensions. But so long as we are only interested in functions that are zeros of constant-coefficient linear differential operators, we can always solve such differential equations. For example, every linear equation from a physics class, and almost every one in first-year calculus, is solvable with this method.
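And a sketch of the other case (again my own example, with sympy): in y'' - y = e^x, the right-hand side is annihilated by C = d/dx - 1, whose kernel sits inside $\ker D$ = span{e^x, e^{-x}}. The naive ansatz b e^x fails, so, as above, we multiply by x:

    import sympy as sp

    x, b = sp.symbols('x b')
    D = lambda f: sp.diff(f, x, 2) - f
    y = b * x * sp.exp(x)                                      # enlarged ansatz
    residual = sp.simplify((D(y) - sp.exp(x)) * sp.exp(-x))    # = 2*b - 1
    print(sp.solve(residual, b))                               # [1/2], i.e. y_p = x e^x / 2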

One final remark:

Differential operators compose like matrices multiply, and the composition is again a differential operator. If $g$ satisfies $C[g] = 0$, and if we're trying to solve $D[y]=g$, then we might decide to solve the more general equation $CD[y] = 0$. Its kernel is $(n+m)$-dimensional, and if we're truly gifted at factoring polynomials, then we can solve it directly. The solutions to our original equation must lie in this kernel.

14 October 2007

Divergent Series take 1

The following talk is significantly too long. Some parenthetical remarks are easy enough to excise, but what else should I drop?

The talk is available here (pdf). I will give it on Thursday at "Many Cheerful Facts", a brown-bag student-organized talk series in which different graduate students present general-audience material: if you know the subject already, you won't learn anything in the talk. Someone bakes something tasty each week.

Abstract: Whereas modern physicists write down divergent series all the time, mathematicians through the ages have been variously terrified or only mildly scared of such sums. In this talk, I will survey the most important methods of summing divergent series, and make general vague remarks about them. I will quote many results, but will studiously avoid proving anything.

30 September 2007

Whole grains, it bears repeating, are tasty and nutritious. They cook easily, though many take a fair amount of time. Whole grains are processed and sold dried: before eating, they must be boiled in (potentially salted or flavored) water. Most grains should be combined with a prescribed amount of water in a pot with a well-fitting lid, brought to a boil, and simmered, covered, for a prescribed amount of time. The length of time is determined by the grain; the amount of water should be just enough to have almost entirely evaporated off or been soaked up in that amount of time. If your pot does not have a well-fitting lid, you'll need to use more water. Some cooks prefer to soak their grains overnight, as this reduces cooking time. If you use too much water, boil uncovered for the last few minutes to evaporate off the excess. Do not stir your grains unless you want to develop the starches into a mushy mix. Covered, grains hold their heat exceedingly well. To make a better seal, you can lay a clean kitchen towel between the pot and its lid.

What follows is a first approximation of how much water to use, and how long to cook, for different grains. For details on grains' nutrition and substitutions, I refer you to The Cook's Thesaurus. For continual updates, check here.

Corn
  • Water per cup of grain: enough
  • Cooking time: 10 minutes
  • Notes: Corn can be steamed or boiled (or grilled or microwaved).

Oats, rolled
  • Water per cup of grain: 1 cup, adding more if it starts to burn
  • Cooking time: 5 minutes, stirring (uncovered), or until desired consistency
  • Notes: A traditional breakfast cereal, cooked as a mush. I suggest cooking with raisins, a stick of cinnamon, and some maple syrup. Rolled oats have been steamed once, so they cook fast.

Quinoa
  • Water per cup of grain: 1.5 cups
  • Cooking time: 10 minutes
  • Notes: Very fast, high protein. Rinsing first will reduce the slightly bitter flavor.

Rice, brown
  • Water per cup of grain: 1.5 cups
  • Cooking time: 20 minutes, plus 30 minutes with the heat turned off
  • Notes: Do not remove the lid during the entire process. Just turn off the heat and let the rice continue to cook in the steam in the pot.

Rice, white, Persian style
  • Water per cup of grain: 2 cups, or enough to cover by 2 inches
  • Cooking time: 10 minutes uncovered, then 45 minutes covered
  • Notes: Boil the rice, then drain, rinse in cold water, and drain again. Melt in a large saucepan 1 Tbsp butter per cup uncooked rice, then add the rice and stir once to coat well. Cover and steam over very low heat. The bottom should be crispy and golden when done.

Wheat, berry
  • Water per cup of grain: 2.5 cups
  • Cooking time: 1 hour
  • Notes: Good pasta substitute, especially with tomato sauce. Given the time involved in cooking, many suggest soaking first, or slow-cooking overnight. I haven't tried these techniques.

Wheat, bulgur
  • Water per cup of grain: 0.75 cup
  • Cooking time: 7 minutes, plus 15 minutes with the heat turned off
  • Notes: Do not remove the lid during the entire process. Just turn off the heat and let the wheat continue to cook in the steam in the pot. Bulgur has been steel-cut, soaked, and baked, so it cooks fast. For a tasty pilaf, sauté thin-sliced onion with two-inch pieces of vermicelli, then add the bulgur.

16 September 2007

Partial Fractions

I really am taking my own classes, and thinking about my own mathematics. But so far my classes have discussed supermathematics, which is cool but to which I have nothing so far to add, and classical (Lagrangian and Hamiltonian) mechanics, which I had intended to blog about last year. Perhaps I will some day write about such stuff; for now, I'd like to tell you about another topic we've been discussing in my calculus class.

Let's say I have a (proper) fraction m/n (in lowest terms). It's not a very simple expression: n most likely has lots of factors, and it would be nice to understand how much each factor contributes to the whole. For instance:

7/15 = 2/3 - 1/5

Can we always split up a number like this? At best, we could write a proper fraction as a sum of (proper) fractions with very small denominators.

Let's start by considering the case when n has two relatively prime factors: n = rs. We want to write

m/n = A/r + B/s.

Multiply both sides by n; we see that this is equivalent to solving the following (easy) Diophantine equation:

m = As + Br

This certainly has a solution. For instance, we can use Euclid's algorithm to write

1 = Xs + Yr

and then use A = mX and B = mY. Of course, this choice of A and B will generally be much larger than hoped for. Never fear, though: we can always shift A and B simultaneously, in opposite directions, by multiples of r and s respectively. Thus we can ensure that

0 < A < r

in which case

0 < As = m - Br < rs = n

so

-n < m-n < Br < m < n

and thus

-s < B < s.
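Here is the two-factor case in code (a sketch of my own; the function name is mine, and pow(s, -1, r), the modular inverse, needs Python 3.8 or later):

    from math import gcd

    def split_fraction(m, r, s):
        """Return (A, B) with m/(r*s) == A/r + B/s, 0 <= A < r and -s < B < s."""
        assert gcd(r, s) == 1 and 0 < m < r * s
        X = pow(s, -1, r)        # X*s = 1 (mod r), via the extended Euclidean algorithm
        A = (m * X) % r          # pick the representative with 0 <= A < r
        B = (m - A * s) // r     # exact division, since m - A*s = 0 (mod r)
        return A, B

    print(split_fraction(7, 3, 5))   # (2, -1): 7/15 = 2/3 - 1/5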

Thus, we can factor the denominator into prime-power parts, and use induction. Going directly: we're looking for A, B, ..., C such that

m/(rs...t) = A/r + B/s + ... + C/t

If we multiply both sides by (s...t), this is

m/r = A(s...t)/r + integer

so we're looking for an A such that

m ≡ A(s...t) (mod r).

Since r and (s...t) are relatively prime, we can definitely do this, and we can be sure that

0 < A < r.

Doing this for each term yields

m/n = A/r + B/s + ... + C/t + integer

and this integer is non-positive and no more (in absolute value) than the number of other summands, since each fraction is between 0 and 1. Thus we can subtract 1 from some of the summands to make the integer disappear.

11/60 = 1/4 + 1/3 + 3/5 - 1 = -3/4 + 1/3 + 3/5 = 1/4 - 2/3 + 3/5 = 1/4 + 1/3 - 2/5.
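Here is the whole procedure as code (my own sketch; it uses sympy's factorint for the factorization, and reproduces the example above, reporting the leftover integer rather than redistributing it):

    from sympy import factorint

    def partial_fractions(m, n):
        """Return ({prime_power: numerator}, k) with m/n = sum(num/pp) + k."""
        parts = {}
        for p, e in factorint(n).items():
            pp = p ** e
            rest = n // pp                            # product of the other prime powers
            parts[pp] = (m * pow(rest, -1, pp)) % pp  # solve m = A*rest (mod p^e), 0 <= A < p^e
        leftover = m - sum(a * (n // pp) for pp, a in parts.items())
        return parts, leftover // n                   # leftover is an exact multiple of n

    print(partial_fractions(11, 60))   # ({4: 1, 3: 1, 5: 3}, -1): 11/60 = 1/4 + 1/3 + 3/5 - 1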

This last step, where we have to subtract, reminds us that this decomposition is not unique. It's close: we have two choices for each term, but of course making some choices constrains others.

If we're working with polynomials, on the other hand, we never have to subtract. The division works exactly as with integers, but now all inequalities should be stated in terms of the degrees of the polynomials. By comparing degrees, we see that the left-hand side has degree less than 0 (the numerator has smaller degree than the denominator), so the "+ integer" (here, "+ polynomial") on the right-hand side must be 0. This also shows that the partial-fractions decomposition of polynomial fractions is unique.

Or, rather, there's one more step in the partial-fractions decomposition. What I've written so far allows us to factor the denominator into prime powers, and write one fraction for each power. But we can go one step further: if q < p^d, then we can write

q/p^d = A_1/p + A_2/p^2 + ... + A_d/p^d

with 0 ≤ A_i < p for each i. This is, of course, trivial, and is how we traditionally write integers in terms of a certain "base":

q = q_1 p + A_d
q_1 = q_2 p + A_{d-1}
...
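In code (a tiny illustration of my own), the recursion is just base-p digit extraction:

    def digits_base_p(q, p, d):
        """For 0 <= q < p**d, return [A_1, ..., A_d] with q/p**d = A_1/p + ... + A_d/p**d."""
        A = []
        for _ in range(d):
            q, a = divmod(q, p)      # peel off the lowest-order digit (A_d comes out first)
            A.append(a)
        return A[::-1]

    print(digits_base_p(11, 2, 4))   # [1, 0, 1, 1]: 11/16 = 1/2 + 0/4 + 1/8 + 1/16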

One more point bears stating. In the case when we're working with polynomials, and when we can completely factor the denominator into linear factors, partial-fractions decomposition becomes extremely easy, because dividing by a linear factor is trivial:

m(x) = (x-a)q(x) + m(a)

I.e. the remainder mod (x-a) is just the value of the polynomial at x=a. To get the quotients, synthetic division is very fast. This makes the last step trivial, and repeatedly dividing by p allows us to divide by p^d, so really we can do the initial steps quickly as well.
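For what it's worth, sympy's apart() carries out exactly this decomposition for polynomial fractions (a one-line illustration, assuming sympy is available; the fraction is an arbitrary example of mine):

    import sympy as sp

    x = sp.symbols('x')
    print(sp.apart((3*x + 5) / ((x - 1) * (x + 2)**2), x))
    # 8/(9*(x - 1)) - 8/(9*(x + 2)) + 1/(3*(x + 2)**2), up to the order of the terms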

14 September 2007

Lavender and Apple

This article is also posted (semi)permanently on my recipe files under the same title.

I was very concerned when I saw this month's They Go Really Well Together: Apples aren't in season! I thought. It's the start of September. We're still eating peaches and tomatoes. I had not contributed to any of the previous TGRWTs: one was inconveniently timed, and most included ingredients I was not a fan of. But apples and lavender! Those ingredients are amazing! If only she had waited a month.

Little did I know that Nature and the Farmer's Market had conspired to force me to enter. That Saturday was the first day of apple season: every stand, all of a sudden, was overflowing with amazing apples. I bought close to a dozen. Lavender grows near my house, but one stand also had bunches of gorgeous dried lavender. So I came home that Saturday with all the ingredients I needed.

Besides, my roommate and I were planning on going out to Lavender Country Contra Dance, which was to begin with a potluck. What better to bring than an entirely experimental dish? I ended up making two recipes: the Vegan Apple Lavender Crisp I brought to the contra dance, and a Lavender Apple Iced Tea that I've been enjoying at home. I'll end with Reviews.


Vegan Apple Lavender Crisp

Preheat oven 350°F. Line a 9x13-inch pan with aluminum foil for ease of removing the crisp later.



Slice
  • 4 cooking apples.



Take six stalks dried lavender; chop and mortar-and-pestle the flowers. Makes
  • 1 Tbsp dried lavender flowers.



Toss with apples. Also toss in
  • 1/2 tsp cinnamon
  • 1/4 tsp lemon zest
  • 1 tsp lemon juice
  • 1 tsp cornstarch



Pour apples into prepared pan.

In medium bowl, combine with fingers
  • 1/2 cup oats
  • 1 cup whole wheat pastry flour
  • 1 Tbsp canola oil
  • 3 Tbsp honey
  • 1 Tbsp water
  • 1/4 tsp salt
  • 1/4 tsp vanilla

Sprinkle over apples. Bake 350°F for 35 minutes, or until crust turns golden.




Lavender Apple Iced Tea

If you buy dried lavender, you will find yourself wanting to use the flowers in recipes, but most of what you bought is stem. Here's a recipe that can perfectly use up the extra.

Cut lavender stems into two-inch pieces.



Place in a half-gallon container.



Also add
  • 1 bag plain black tea (I use PG Tips)
  • 1 packet instant hot apple cider powder (it's cheating, but I had some lying around)

and fill with water. Refrigerate over night.


Reviews

The apple crisp was a huge hit at the dance. It did taste overwhelmingly of lavender, and it's not clear that the apple and the lavender married well. The cinnamon and lemon also each add their own zing; the four main flavors did make for a nice ensemble (and I'd be curious to try this with more honey in the crust). Overall, there were many compliments, and even more comments of the form "this is really weird". I liked it.

The iced tea, on the other hand, was divine. Without straining, you do get the occasional sip full of stem, but the flavors married amazingly well: it tasted of lavender and apple and tea without tasting of any of these individually. The adjective that comes to mind for the flavor is "smooth".

On the other hand, now a week later, the crisp is gone, and the drink tastes mostly of the tannins in the black tea, and the spices in the cider. C'est la vie.

13 September 2007

Integration

This semester I am a graduate-student instructor for Math 1B — the second half of a first-year calculus sequence. Students attend three hours a week of large lectures with 400 classmates, and also have three hours of GSI-led "section". We get remarkable freedom with our classes: I write my own quizzes and worksheets, and plan my own material. I spend a lot of my time with "big picture" material: I feel like it's not that valuable for the students to watch me solve problems at the chalkboard. Or rather, it's very valuable to do so in an interactive setting, but I have yet to figure out how to make an interactive class with twenty to thirty students. In my office hours I often have closer to six students, and there we work through homework problems together.

We just finished a unit on techniques with which to approximate an integral, and it is about these that I would like to tell you. Although elementary, approximation techniques connect with both advanced mathematics and deep philosophical questions.

Let me begin with the latter. Why do we do calculus? Certainly, few mathematicians will ever use much calculus in their later work. But physicists and engineers and statisticians do: indeed, anyone whose job involves processing numerical data will use calculus for every calculation, even if it seems like she's just asking the computer to do some magic.

So calculus, and, indeed, arithmetic, is primarily a tool for dealing with "physical numbers": measurements and quantities and values. What's interesting about numbers in the real world is that they are never precise: physical numbers are "fat" in the sense that they come (or ought to come) equipped with error terms. Fat numbers cannot be added or multiplied on the nose — errors can propagate, although they also tend to cancel out — and fat numbers do not form a group (the number "exactly zero" is not physical). The universe has a built-in uncertainty principle, not because of quantum mechanics, but simply because we are limited beings, only able to make finitely many measurements. But a "real number" in the mathematicians' sense has infinitely much data in it, and so can never exist in the real world.

Much of the time, the hardest of physical calculations consist of evaluating integrals. We have a (fat) function (which may approximate an honestly "true" function, or may be a convention that we are using to think about messy data), usually consisting of a collection of data-points, which we would like to integrate over some interval. Riemann tells us how to do this: divide the interval into a series of small subintervals, thereby dividing the area under the curve into a series of small regions; estimate the heights of each small region; add these together.

We are now left with the difficult task of estimating the heights of the regions. In some sense, our estimates don't matter: provided our function is sufficiently nice (and if it isn't, we have no right to call it a "function"), then all possible procedures of estimates will eventually converge to the same value as the number of regions gets large. But some methods will converge faster and slower, and we need to be able to calculate this integral in a reasonable amount of time: neither humans nor computers can make truly large numbers of calculations. Much better would be if we picked a method for which we could bound the number of calculations needed to get within an allowable error.

One easy procedure is to use the height of the function at the left-hand side of the region. To investigate this, let's do a simple example: let's integrate

\int_0^1 x dx

By algebra, we know that this area is actually 1/2. If we use the "left-hand rule" for estimating heights, with a single region, we get an estimate of the total area equal to 0. Our error is 1/2.

Why did I start with this function to test the rule? Let's say I had any linear function y=mx+b. Then the "+b" part cannot affect the error — the error depends on first (and higher) derivatives, and y=x is the simplest function with a nonzero first derivative.

By multiplying and using linearity, the error from using the left-hand rule in evaluating

\int_0^1 (mx + b) dx

will be m/2: error estimates must always scale linearly with the derivative on which they depend. And integrating over [a,b]? Holding m constant but scaling [0,1] to [a,b] requires scaling both the x- and y-ranges, so our error should be proportional to m(b-a)^2.

We can read this off from dimensional analysis. If x has units of seconds, say, and y is in feet, then m is in ft/sec, but the integral and the error are both in foot-seconds. If we want to estimate the error E in terms of m and the domain of integration, then E must be linear in m to get the feet right; the remaining factor must carry units of seconds squared, and since (b-a) is the only other unitful input, we must have E \propto m(b-a)^2. Using the coefficient of proportionality computed above, E = m(b-a)^2/2.

What other parameters do we have? We could divide [a,b] into n regions. If we do, then our total error is

E = n*m((b-a)/n)^2/2 = m(b-a)^2/(2n).

This is the error for a line. But we expect every function to behave locally like a line, so if we have a generic function, we expect this to estimate the error. We can turn that estimate into an honest inequality by being more precise: E should be a number, so we should use some m that estimates the derivative; if we pick m bigger than the absolute value of the derivative, we will bound the absolute error. (You can make this rough inequality strict by simply imagining some curvature in the graph. It's clear that the line passing through the left endpoint whose slope is the maximum slope of the function is strictly above the function everywhere.)
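A quick numerical check of that bound (my own code; the integrand e^x on [0,1] is an arbitrary choice, for which we may take m = e):

    import math

    def left_rule(f, a, b, n):
        h = (b - a) / n
        return h * sum(f(a + i * h) for i in range(n))

    f, exact = math.exp, math.e - 1                # integral of e^x over [0, 1]
    for n in (10, 100, 1000):
        err = exact - left_rule(f, 0.0, 1.0, n)
        print(n, err, math.e / (2 * n))            # the error stays below m(b-a)^2/(2n)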

We could have instead used the right endpoint of each interval to estimate the total integral. In the large-n limit, we expect the function over each small interval to behave like a straight line; the right-hand rule thus gives us an error almost exactly the same as the left-hand rule, except with the opposite sign (this is certainly true for a straight line). Thus we can get a much better estimate of the true integral by averaging the right- and left-hand estimate; doing this gives the "trapezoid" estimate, because of how it behaves geometrically on each subinterval.

When we average out the errors of the left- and right-hand rules, we are averaging out any dependence of the error on the first derivative of the function. Indeed, the trapezoid rule gives exactly the right results for the straight line. But there may still be an error. Our estimates on the errors are only true when the functions are straight lines; the failure of the errors to exactly cancel out must depend to first-order on the second derivative, as this measures the deviation in the (first) derivative. Dimensional analysis gives the correct error estimate for the trapezoid method:

Error(trapezoid) \leq m (b-a)^3 / (c n^2)

where c is some numerical constant, and m is a bound on the absolute second derivative. We can calculate c by working a particular example: the trapezoid estimate is exact for lines, and so up to scaling and subtracting a line, any parabola is y = x^2. The trapezoid rule (with one subinterval) predicts an area of 1/2 for the integral from 0 to 1, whereas the true area is 1/3; the error is 1/6, and since the second derivative is 2, we get c=12.

How do we get a better estimate? If we had another method that also had an error depending on the second derivative, we could average again, and get an error of order the third derivative. Notice that by dimensional analysis the power of n in the denominator is always equal to the order of the derivative: the more accurately we approximate our curves by polynomials, the more quickly our estimates converge on the true values.

One such second-derivative-dependent method is given by the "midpoint rule": estimate the height of each subinterval by the value of the function at the midpoint of the interval. I'll say why this is second-derivative-dependent in a moment. Dimensional analysis gives the same formula for the error as for trapezoids, and looking at y=x^2 (error 1/3 - 1/4 = 1/12) yields c=24.

Error(midpoint) \leq m (b-a)^3 / (24 n^2).

Thus the midpoint rule is twice as accurate as the trapezoid rule; taking the appropriately weighted average (two parts midpoint to one part trapezoid) yields "Simpson's rule". This should be called the "parabola rule", since geometrically Simpson estimates each subinterval by drawing the parabola that passes through the midpoint and the two endpoints.
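Here is a sketch (my own code) checking those rates on a test integral, with Simpson's rule computed as (2*midpoint + trapezoid)/3: doubling n should divide the trapezoid and midpoint errors by about 4, and the Simpson error by about 16.

    import math

    def trapezoid(f, a, b, n):
        h = (b - a) / n
        return h * (f(a) / 2 + sum(f(a + i * h) for i in range(1, n)) + f(b) / 2)

    def midpoint(f, a, b, n):
        h = (b - a) / n
        return h * sum(f(a + (i + 0.5) * h) for i in range(n))

    f, exact = math.exp, math.e - 1                # integral of e^x over [0, 1]
    for n in (4, 8, 16):
        T, M = trapezoid(f, 0, 1, n), midpoint(f, 0, 1, n)
        S = (2 * M + T) / 3                        # Simpson's rule on each subinterval
        print(n, exact - T, exact - M, exact - S)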

Naively, we might predict that Simpson's rule, by averaging out any dependence on the second derivative, should give an error linear in the third derivative. But in fact it is linear in the fourth. A hands-on way of seeing this is to simply apply it to a cubic. We might as well be interested in the integral from -1 to +1, and we may assume n=1. Given any cubic function, we can subtract some parabola so that it goes through y=0 at x=-1, 0, and +1; the parabola rule predicts an area of 0. And the true area? The cubics through these three points form a one-dimensional family: namely, the scalar multiples of x^3 - x = x(x-1)(x+1). These are odd functions, so their true integral over [-1,1] is also 0. Thus, Simpson's rule is exact on cubics, and the fourth derivative is the smallest that can create an error.

There is a more elegant way of seeing that Simpson's error depends on the fourth and not the third derivative, which will also explain why the midpoint rule, which naively just measures the height at a point and so ought to have an error controlled by the first derivative, really depends on the second. In particular, both the midpoint rule and Simpson's rule are symmetric under the reflection x \to -x. But the first and third derivatives (and indeed all odd derivatives) pick up minus signs under this reflection. By dimensional analysis, the error in an estimation method must be linear in the controlling derivative, and so any symmetric method cannot depend on odd derivatives.

Thus, I end with a cute challenge: come up with an (interesting, simple) estimation method with an error that really does scale as the third derivative of the function.

03 September 2007

Don't tell the department

I've been spending so much more time thinking about food than about mathematics. I had promised myself that math, the lowest-priority of my three main passions (with cooking and dancing) last year, would move to first in graduate school. So far it's second only because I haven't been dancing in months.

Is it a bad sign that I'm already fantasizing about dropping out and starting a restaurant or bakery? On my bookshelf, waiting to be read cover-to-cover, is a copy of Culinary Artistry, which is about cheffing. Dorenburg and Page distinguish between three kinds of cooking:
  1. "Cooking as a trade", where your primary goal is sustenance, and with your limited repertoire you're hoping your customers go away thinking "I'm full."
  2. "Cooking as craft", the style promulgated in the greatest cookbooks, has its main goal enjoyment; a chef should have a wide repertoire of classic dishes, and hope that the customers go away thinking "That was delicious."
  3. In "cooking as art", on the other hand, a chef prepares her own recipes, and any given night the menu will be very limited; customers should be entertained, and go away thinking "Life is beautiful."


This trade/craft/art distinction is useful in other disciplines. Everyone should be able to (but many can't) do mathematics at the most basic of trade levels — monetary arithmetic, being duly suspicious of newspaper statistics, etc. It bears remembering that tradesmen often have incredible skill: Alice Waters left the artistry of Chez Panisse a few years ago to try to reform public school food; she is now cooking trade food for the masses. I wonder whether an actuary considers her mathematics to be trade, craft, or art.

I feel like the majority of expository mathematics writing, and the entirety of an undergraduate math major curriculum, falls under "mathematics as craft": classic results presented (hopefully) well. There is certainly an artistry to teaching well, but the teacher-as-artist focuses on the delivery, not the mathematics itself. I have not yet tried serious mathematics research; I know I am excited by artistic mathematics, but I also know that I adore teaching, and I have often comforted myself with the reminder that, if research is not for me, I can have a good life teaching at a liberal arts college.

To do original, beautiful research, however, requires the creativity of an artist (and lots of crafts- and tradesmanship). A calculation can prove a new theorem, just like a recipe can create a new meal; a few geniuses create new recipes, new fields, new mathematical insights. If I can learn to be a mathematician-as-artist, then I will stay in research.

An advanced social dancer is an artist, although often not a great trades- or craftsman. An advanced ballerina is a virtuosic tradesman. When I go to Pilates, I am engaging in movement-as-craft.

Chez Panisse has a fixed menu every night: Mondays are $55 + drinks + 17% tip + 8.75% tax / person, and the price increases through the week. Other great restaurants also have small, constantly changing menus. Moosewood offers a limited, changing vegetarian menu every night: your choice of three or four entrées, a couple salads, etc. In both cases, recipes are original, based on seasonal organic ingredients and the tastes of the chef.

If I were to start a restaurant, I'd want it to be like the Moosewood. Cooking as craft does not excite me. I like knowing how to make all sorts of dishes, of course, because I like to understand how food works — the science and history — but I would never want to work in a restaurant where customers pick dishes from an extensive fixed menu, and I make those. As chef, I am the dom: I pick and create a menu that you will eat and hopefully find transcendent.

But the fantasy is not that I would run a restaurant (at best, I'd be a member of a cooperative restaurant like Moosewood). Rather, I'd like a bakery, making gourmet breads, and possibly serving pastries, coffee, sandwiches, and soup. Here it is food-as-craft, but as baker I don't serve anyone. I still make my loaves, and then sell the completed objects to you. Your choice is limited, and what I have available will change: there will be some staples, of course, but each day two soups will be available, and they will change by the week (one on Sundays, one on Wednesdays), and salads will be based on seasonality. If the bakery is successful, I will continue to expand: breads first, then pastries and coffee, then sandwiches and salads, and then dinners with a daily-changing menu. Such an operation is much more work than one person can manage.

In fact, however, I will never own a bakery, although I will continue to bake for myself, friends, and family. What I like best is that I can create everything from scratch, from raw ingredients. And I like sharing my food, and eating what other people have made, also from raw ingredients. I'm exceedingly happy when I am spending my time creating not just food, but also objects: I would love to learn more about woodworking, plumbing, pottery, knitting. However, I do hope to remain an academic. I love thinking about mathematics and communicating it.

The current fantasy, then, does not include selling food, but also does not include purchasing much. I'd rather move away, as much as possible, from the industrialized economy, towards one populated by human craftsmen and artists. Thus, the fairy-tale future involves a university, yes, where I will work six to nine months a year, but also a large house on a lot of land, outside the city but within the public-transportation network (there is no vehicle I enjoy more than the train, except possibly the bicycle). For the three months of summer I will be a full-time farmer, growing, pickling, and canning enough vegetables to live on.

Especially in the North, where the growing season is short but furious, I could avoid too much overlap between Spring planting, Fall harvesting, and Winter teaching. I have grown up in the West, and would like to return to the Pacific Northwest; on the other hand, I have no real desire to stay in the U.S.; perhaps I will live in British Columbia. At such latitudes, in this fairy-tale I will practice mathematics by night and farming by day.

Many of my recipes, some posted here, some e-mailed, some posted elsewhere, are available here.

10 August 2007

"I'm not a scientist."

Gov. Bill Richardson (D-New Mexico) has been receiving a lot of criticism for botching a question from Melissa Etheridge at Thursday's lgbt-themed Democratic forum. I would like to provide my views.

I haven't watched most of the forum. In the "botched" clip, Richardson is clearly exhausted and fumbling --- he later apologized, explaining that he had just flown in from New Hampshire. But the question seems like a setup, and should not have been included. Debates and forums are like interviews: you shouldn't ask for a "right" answer or a "wrong" answer; you should ask substantive questions. What happened here was that Etheridge asked a question with only one correct answer, and Richardson, for all his fumbling and exhaustion, provided a substantive answer.

Etheridge asks "Do you think sexuality is a choice or biological." When Richardson answers "it's a choice...", she interrupts him to clarify her question. His answer, more or less: I don't know the science, and regardless of the science, "across the board I've always felt that every human being deserves the same rights."

If Richardson had been better briefed on LGBT issues, he would, of course, have answered with a canned and scripted "blah blah scientists blah biological". Instead, his answer was honest and, more importantly, got to the real heart of the matter.

First of all, a more nuanced answer to the question asked is in order. Then a response to the question she meant.

I choose who I sleep with. Certainly nature doesn't rape me, doesn't force me to sleep with people of a certain gender even though I don't want to. To a broad extent I choose how I behave, and I certainly choose what groups to identify with.

Ah, but what about attractions? Certainly I'm not sexually attracted to everyone of a given gender, and I doubt that you are too. Indeed, I find many people attractive, of multiple genders. And, moreover, given a reasonably attractive person, I can often talk myself into having a crush on them. Can't you? And I bet you have some amount of conscious control over even the behaviors and objects you fetishize.

Now, I'm not saying that everyone gets to choose whether to like pussy or cock. But absolutely folks have some control.

And whether they do or don't shouldn't affect the rights we give to them. Analogies are important:

Skin color shouldn't determine one's access to rights, and other than bleaching and tanning, it's basically biologically determined.

Gender shouldn't either --- indeed, the LGB activists would do well to remember their T friends, who have been working for a long time to convince the public that it's OK for someone to choose to change their biological gender.

Eyesight is also not something we have a lot of choice about. Those of us with poor eyesight can choose whether to wear glasses or not, but the government ought to (and does) require that we wear glasses if we are going to drive.

Religion, on the other hand, almost everyone agrees is not biologically determined. I've asked some of my very religious friends to explain where their faith in God comes from; some of them say they couldn't change it even if they wanted to. But the point, just like for the queers, is they don't want to, and we have a very strong tradition in this country of protecting from unequal treatment people who choose to worship different Gods in different ways. We should do the same with people who choose to sleep with different genders in different ways.

This, I believe, is the thrust of Richardson's argument. So, hey, queers, lay off, will you? And think a little more rather than being so quick to condemn someone who gets it about discrimination.

29 July 2007

Day 3: What is Electric Charge?

The third and final day of my Topics in QM class was harder and more advanced, and had the distinct disadvantage of not being based on a particular paper. The goal was to explain electric charge from a QM/QFT point of view; to get there, we had to discuss momentum, wave functions, etc. To wit:

How can you describe the position of a particle? Need three numbers. What three numbers? They depend on where we're standing. So particle P knows three numbers, which depend on where we're standing.

How? If I move by (column) vector [\del x, \del y, \del z], P's numbers change by [x,y,z]\mapsto [x+\del x, y+\del y, z+\del z].

This is called an action: there's an \R^3 of translations, which "act" on the \R^3 of possible states of the particle.

Now let's be mathematicians, and generalize this. But let's start with a very simple case: let's say that P only has _one_ number, but that that number depends on where we're standing when we measure it. I.e. we have a fn \Phi: locations \to numbers.

Ok, now let's say that we move by some vector \vec{\del} = [\del x,\del y,\del z]. How does \Phi change? Well, by some function:

\Phi(location + \vec{\del}) = \Phi(loc) + f(loc,\vec{\del})

So, laws of nature seem to not depend on location. Of course, \Phi depends explicitly on location. So let's demand the next best thing: let's say that \Phi is symmetrical enough that f does not depend on location:

\Phi(location + \vec{\del}) = \Phi(loc) + f(\vec{\del})

Well, what if we move and move again?

\Phi(loc) + f(\vec\del_1 + \vec\del_2) = \Phi(loc + \vec\del_1 + \vec\del_2) = \Phi(loc + \vec\del_1) + f(\vec\del_2) = \Phi(loc) + f(\vec\del_1) + f(\vec\del_2).

So f is linear! (One student correctly asked about multiplication by constants; our functions will satisfy some smoothness condition (for \R, it suffices to be continuous; for \C, it suffices to be \C-differentiable), and then this \Z-linearity is enough to guarantee linearity.)

By linear algebra, a linear function f:\R^3 \to \R is exactly a row vector <f_x, f_y, f_z> s.t.

<f_x, f_y, f_z> [\del_x, \del_y, \del_z] = f(\vec\del)

(I'm writing <> for row vectors, [] for columns. This then is the usual matrix product.) Why? You can read off f_x, for instance, by stepping one unit in the x-direction, and seeing how much \Phi changes; I'll leave it as an exercise to prove that the three numbers f_x, f_y, f_z exactly give you your function.

In QM, we call the vector <f_x, f_y, f_z> the momentum of the particle.

Let me step back and say what we've done. CM is problematic in a really important way: if a particle is over there, how can I measure it, if everything is local? So, what we've said is that no, the "particle" is _everywhere_, and remembers a number, which we can only measure _here_.

Now, what if particle 1 turns into particle 2? Say 2 also knows a number \Phi_2, so \Phi_1(loc)\to\Phi_2(loc) as 1\to 2.

And let's say that this is as simple as possible: in particular,

\Phi_2(loc) = \Phi_1(loc) + \varphi

where \varphi does not depend on location --- i.e. is a constant.

Then

\Phi_2(loc) + f_2(\del) = \Phi_2(loc+\del) = \Phi_1(loc+\del) + \varphi = \Phi_1(loc) + f_1(\del) + \varphi

So f_1 = f_2! Conservation of momentum!

Now, we're mathematicians. Let's generalize.

Does it matter that \Phi maps to \R? What if, for instance, \Phi: locations \to S = circle? Then f(\del) is an angle change. We still demand the same linearity: f(\del_1+\del_2) = f(\del_1) + f(\del_2). By taking \del very small, we can ensure that f does not go all the way around the circle, in which case it still acts like a number, and it turns out that f is still given exactly by a triple of numbers. I'll leave it as an exercise, for those who know some calculus, to make this rigorous.

By the same argument, of course, this triple, which still deserves the name momentum, is conserved.

But, other things can act on S. See, only \R (and its products, like \R^3) can act on \R, because \R is too long for something compact to act on it. But actions on S can also include S actions! So let's just imagine that, in addition to moving in three-space, we could move in some circular direction. Then "location" includes an angle-valued component as well as the three large components: we can move by some \del(angle) = \theta, and then

\Phi \to \Phi + c(\theta)

Just as before, c should be linear:

c(\theta_1 + \theta_2) = c(\theta_1) + c(\theta_2)

But also, \theta is an angle, so \theta = \theta + 2\pi, so

c(\theta + 2\pi) = c(\theta)

And c(\theta) is also an angle, so

c(\theta) = c(\theta) + 2\pi.

So, exercise: the only linear maps satisfying this are

c(\theta) = k\theta, for k\in\Z, i.e. for k an integer!

Intuition: this is just topology! How can you wrap a circle around a circle? (Negative numbers correspond to going the wrong direction.)
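To spell out the exercise (my own sketch, using the linearity established above): additivity plus the smoothness condition force c(\theta) = k\theta for some real number k; then c(\theta + 2\pi) = c(\theta) as angles means 2\pi k = 0 (mod 2\pi), i.e. k \in \Z.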

This number k is called the charge of the particle! Charge is conserved exactly as it was for momentum, and this explains why charge is quantized.

Or, rather, it explains it _if_ space is (3 dimensions) \times circle. Can it be? Well.... yes, if the circle is very small, so small that atoms are bigger than the size of the circle.

This is called the Kaluza-Klein picture: (gravity in \R^3\times S) = (gravity in \R^3) \tensor (EM in \R^3).

BTW, in QM, physicists prefer everything to be multiplicative, not additive. So, fine: we can use \Psi(loc) = e^{\Phi(loc)}, turning all our "plus"es into "times"es. But we want \Phi to take values in a circle. In \C, we can do this via

\Psi(loc) = e^{i \Phi(loc)}

because e^{i2\pi} = 1. Then

\Psi(loc + \del + \theta) = e^{i\Phi(loc)} e^{i p.\del} e^{i k \theta}

Where p is the (row) vector of momentum, so p.\del is the dot-product, and k is the electric charge. The nice thing about using complex-valued \Psi is it lets us add them, etc., do QM, still w/ this rule. And assuming moving by (\del,\theta) is linear and that space really does have these symmetries, we get conservation of momentum, charge, etc.

28 July 2007

Another day of QM Topics: A Finite Hidden-Variables Model

This material is from R. Spekkens In defense of the epistemic view of quantum states: a toy theory.

Heisenberg says you can't have complete knowledge of position and momentum. This is a theorem in traditional QM "wavefunction" model. We will take this as axiomatic: you can't have complete info.

Important conceptual point: At any given time, the system is in a definite classical state, which I will call its ontic state. _But_, this theory will also explicitly refer to our _knowledge_ of the state of the system --- I'll call this the epistemic state. In QM (and SM) epistemic states are probability distributions. We will be more single-minded.

Let's say we have a (finite) set S of ontic states, and our epistemic state is simply a subset K of S. How much knowledge do we have? I.e. what is the "entropy" in the state K? Will turn out to relate to log_2 K.

In fact, let's define this intrinsically. We work only with yes/no questions. How many questions do I need to ask of a set S to resolve exactly the ontic state? log_2 S. A basis of questions will be a set of questions of minimum size that is enough to resolve the ontic state exactly; its size is called the "dim of the system". You can show that a question is in some basis iff it splits S in half.

Then some questions can be answered by the epistemic state, others can't. Examples.

Knowledge of K is
max_{bases} \{ number of questions answerable \} = log_2 S - log_2 K

Lack of knowledge = log_2 K.

We demand physical law:
  • systems have size 2^n (because we can only measure w/ yes/nos).
  • epistemic states of maximal knowledge have knowledge = lack of knowledge.


E.g. smallest system: \{ 1, 2, 3, 4 \}. Called an "atom". What are valid epistemic states? Cannot have epistemic state of size 3, as no question can resolve this.

What are valid physical fns? Cannot be many-to-one, by uncertainty.

What happens if you measure "Is it 12 or 34?" when it's in epistemic state 23? Say it's really 2. Then you get the answer 12. Do you know it's (still) in state 2? No! If we ask the question again, we get the same answer. So it's definitely in 1 or 2. But measuring requires interaction, so the measurement could have turned it into a 1. This is called "collapsing the wave fn." Now let's measure 23 vs. 14. We could get 14. So measurements do not commute.
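Here is a tiny simulation of that measurement rule (my own code, not from Spekkens' paper; the update rule, re-randomizing the ontic state within the answered half, is the one just described):

    import random

    def measure(ontic, question):
        """question = (pair, pair), a partition of {1, 2, 3, 4}; returns (answer, new ontic state)."""
        answer = question[0] if ontic in question[0] else question[1]
        return answer, random.choice(sorted(answer))   # measuring disturbs the ontic state

    ontic = 2                                          # "say it's really 2"
    answer, ontic = measure(ontic, ({1, 2}, {3, 4}))
    print(answer)                                      # {1, 2}: the epistemic state collapses to 12
    answer, ontic = measure(ontic, ({2, 3}, {1, 4}))
    print(answer)                                      # sometimes {2, 3}, sometimes {1, 4}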

How about systems with two atoms? Rule: our knowledge of any subsystem should also respect axiom. Examples of valid epistemic states: tensor products, pure entangled states.

Let's do teleportation. (Act out with students as atoms.) Let's say there's an atom --- you --- that I want to transmit. Let's take two others, and collapse them into an entangled state, by measuring (in two yes/no questions) "is it I, II, III, or IV?"

atom B
 4 | II   III  IV   I
 3 | III  IV   I    II
 2 | IV   I    II   III
 1 | I    II   III  IV
   +--------------------
     1    2    3    4    <- atom A


[Question: how do we know we can measure like this? A priori, "is it I,II" need not commute with "is it I,III". But let's demand that states of maximal knowledge consistent with axioms are possible.]

Now I give one atom (say A) to my BFF, who leaves for Neptune. (We both know what the entangled state is.)

Ok, now I ask the unknown state C and my half B of the entangled state the same question, and get an answer. (Remember, after measurement you're allowed to (jointly) pick any other ontic state in the epistemic state.) Important point: let's demand that physics be local, so this measurement did not affect BFF's particle.

Now I can transmit this classical info to BFF. BFF, what question did you want to know of my orig particle? Can translate now into question for your particle. Try! Compare with original.

Exercises:
  • cloning violates uncertainty.
  • study triplets of atoms --- prove the impossibility of triply-entangled state.
  • "dense coding". A priori, although an atom secretly knows two bits of information, it seems impossible to actually saturate the channel: even if I can give an atom to my BFF, I can't use it to transmit more than 1 bit, right? Because I have to tell him some particular direction along which to measure the spin epistemic state. But if we start with an entangled pair, he can keep one half, and then without touching his atom, I can manipulate my atom and hand it to him in such a way that I can transmit two bits. Work out the details of this.


Important point: QM violates Bell's inequality. As a local hidden-variables theory, our model always satisfies Bell. So different. But similar.

27 July 2007

Quantum Mechanics in Pictures, or How I spent my summer vacation

After a two-week camping trip, I visited Canada/USA Mathcamp for one week. While there, I studiously avoided any Harry Potter spoilers (upon returning home, I dutifully took a one-day break from the rest of life to read the book), offered an evening activity focused on mind-body-movement-partnering exercises called "Learn to Fly", and hung out with some of the best people on the planet. Oh, and I taught a three-day course on topics in quantum mechanics, with a fun structure: I set myself the challenge of preparing three one-day talks, which, while all part of the same large story, could be understood and appreciated entirely separately. I more-or-less succeeded, although certainly I would edit the talks before giving them again. Partly to preserve them for posterity, and partly so that I could share them with you all, I thought it might be nice to post my lecture notes here; the first is in this post, and the latter two will be in posts of their own.

Day One was on "Quantum Mechanics in Pictures", and is based largely on B. Coecke's paper Kindergarten Quantum Mechanics.

We begin by setting up a graphical notation with which to describe science. We have systems = particles, which we draw by vertical lines with upward-pointing arrows, and which we label by the kind of system=particle that the line represents. States, costates, operations, and numbers are, respectively, downward-pointing triangles with single lines coming out the top, upwards-pointing triangles with single lines coming out the bottom, squares with a line coming out each end, and diamonds with no lines. By "line", I always mean labeled directed line i.e. system.

"Numbers" live in some system --- for CM, numbers are 0 or 1; for SM, probabilities; for QM, numbers are complex ---, and we can multiply them by putting them next to each other. We can arrange our system in some state; a "costate" is really a measurement, asking "how much is this system having this property?". Operations are functions: we plug in a state of system A, and get out a state of system B.

Given two systems, we can combine them ("tensor product") by drawing their lines next to each other, and interpreting it as a single line --- like putting two pipes together into one big tube. This corresponds to having a joint system of A and B. In general, we can have "operations" from any number of systems to any other. We can also hook these objects together to form composite objects. Lines are allowed to cross each other --- we don't care about whether on top or on bottom --- and the usual laws hold. In general, we can tell what kind of object we have (A-state, function from A to B, etc.) just by inspecting the incoming and outgoing lines. E.g. a costate is a function from A to "nothing", i.e. to numbers. If I were speaking in abstract language, I'd say that we are working in a symmetric braided monoidal category.

This will be Axiom 0: that we have these types of objects and can combine them in these natural ways. Axiom 0 is required of any type of "physics": any scientific theory should accommodate systems, states, joint systems, manipulations, measurements, etc.

Quantum Mechanics admits two kinds of "turning upside down". The first kind gives us Axiom 1: Given any X: A -> B, we can form its adjoint X† : B -> A. In general, I will be sloppy and drop the dagger when I'm taking the adjoint of a state (to get a costate). The adjoint of a number is called its "complex conjugate". Given a state S, its adjoint is the measurement "how S-like is it?". We can define the dot product by drawing two systems meeting at a dot, meaning "turn the second one into its adjoint and compose". Taking the adjoint is some kind of "time reversal".

Axiom 2: for any physical evolution --- i.e. time running forward --- i.e. anything we can actually do _in a lab_ --- X:A->B, we have X^{-1} = X†. This is called "unitarity". Pictures: this says that time evolves deterministically, in such a way that relative similarity does not change. Note: This is not true in the physics of _math_. We could make a theory of "functions"; these can be many-to-one, so definitely not unitary.
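(Again a numerical aside of mine: a unitary matrix really does have its adjoint as its inverse, while a many-to-one "mathematical function", written as a matrix, does not.)

import numpy as np

H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)   # a physical evolution (the Hadamard gate)
print(np.allclose(H.conj().T @ H, np.eye(2)))                  # True: X-dagger is X^{-1}

collapse = np.array([[1, 1], [0, 0]], dtype=complex)           # sends both basis states to the first one
print(np.allclose(collapse.conj().T @ collapse, np.eye(2)))    # False: many-to-one, so not unitary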

There's another kind of turning upside down. Adjoints reverse operators, etc., but not spaces. Axiom 3 pt. 1: to every system A, a "dual" system A*, which we still label A but draw with the arrow pointing down, s.t. A** = A. Axiom 3 pt. 2: existence of Bell (co)states; pictures; the rule they satisfy.

Then we have a "teleportation protocol": I have a state. I want to transmit all the info in the state to you. But there's no reason to believe that I can _measure_ all that info (cf. Heisenberg). One protocol: just carry the particle to you. Bad: maybe you're in China, and we're just talking by phone. Better: before you leave, we make an entangled pair (like BFFs sharing a dollar bill). Then I _measure_ "in what way is my state like the LH part of the Bell state" and transmit that classical info. Advertisement: we will explain this in more detail tomorrow.
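(A toy check of mine, using ordinary qubits and ignoring normalization and the corrections needed for the other measurement outcomes: projecting my two systems onto the Bell costate leaves your system in the original state.)

import numpy as np

psi = np.array([0.6, 0.8j], dtype=complex)                  # the unknown state, on system 1 (mine)
bell = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)    # the shared Bell state, on systems 2 and 3

full = np.kron(psi, bell)                                     # joint state of systems 1, 2 (mine) and 3 (yours)
measure = np.kron(bell.conj().reshape(1, 4), np.eye(2))       # Bell costate on systems 1, 2; do nothing to 3

out = measure @ full                                          # what your system looks like afterwards
print(np.allclose(out, psi / 2))                              # True: proportional to the state I started with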

_But_: the no-cloning theorem. (Star Trek.) A "cloning protocol" would be a physical thing we could do, X(s,e) = (s,s) for a fixed e and any s. But then by unitarity and gluing we get a contradiction, unless the only numbers are 0 and 1. E.g. CM. If more numbers are available, no cloning.
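(The same contradiction in coordinates, a sketch of mine: the lecture's argument uses unitarity and gluing, while this version uses the closely related fact that physical evolutions are linear. A map that clones both basis states fails on a superposition.)

import numpy as np

e0 = np.array([1, 0], dtype=complex)
e1 = np.array([0, 1], dtype=complex)

# A linear "cloner" sending |0> -> |00> and |1> -> |11> (the blank state e is absorbed into the map).
CLONE = np.outer(np.kron(e0, e0), e0) + np.outer(np.kron(e1, e1), e1)

plus = (e0 + e1) / np.sqrt(2)
print(np.allclose(CLONE @ plus, np.kron(plus, plus)))   # False: superpositions refuse to be cloned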

Axiom 4: For any two states a and b, we can get from one to the other "continuously", in the sense that the path is continuous when you dot it with some (co)state c. This is true in SM. There exist "mixed states" where you don't have complete knowledge. SM allows the following cloning: measure the state exactly, then clone it. But this _does not_ clone the "mixed state" unless I perversely then _forget_ the new info. In QM, the axiom is amended: Axiom 4': we can continuously transform any _pure_ state into any other _pure_ state going through only _pure_ states.

"Pure" state can be well-defined; intuition is "state of maximum knowledge". We will give example of this tomorrow. It is a theorem that SM (= CM + mixed states = Axioms 0 through 3 plus technicalities) _plus_ Axiom 4' is equivalent to QM (in particular, forces complex numbers), proved in L. Hardy Quantum Theory From Five Reasonable Axioms.

18 July 2007

The problem of time

One of my campers asked me about the problem of Time: why is it different from space, and why can't there be hidden time dimensions?

Here's my response; please comment with corrections/additions:

The problem of Time is a very difficult one, and you've asked some deep questions, many of which do not yet have good answers. I'll answer bits of them, but my answers will be necessarily incomplete.

First of all, why aren't there hidden time dimensions? In fact, lots of physicists have asked about this, and a priori there could be hidden time dimensions. In special/general relativity, the only difference between space and time dimensions is a minus sign somewhere; you could make a manifold with two time dimensions and d spatial dimensions if you want. Then you can still do general (and special) relativity just fine --- the laws are perfectly consistent with any (nonnegative) numbers of time and space dimensions.
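(Concretely, in my notation: with two time coordinates t_1, t_2 and d space coordinates, the flat metric would just be ds^2 = -dt_1^2 - dt_2^2 + dx_1^2 + ... + dx_d^2; the equations of relativity don't care how many minus signs you pick.)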

But problems arise in quantum field theory. Many QFTs have what are called "anomalies": while the original classical laws are symmetric in a certain way, it becomes impossible to quantize and keep the symmetry. So, for instance, without a Higgs field, it's impossible to quantize a chiral gauge theory with massive fermions (most of the universe is described by non-chiral gauge theories, which can be quantized with massive fermions, but the Weak Force is chiral, meaning that if you look at it in a mirror, you get different effects; this is why physicists believe that there are Higgs bosons). There's a very important example in string theory: the _only_ anomaly-free bosonic string theories exist in 26 dimensions (25 space, one time), and the _only_ anomaly-free superstring theories are in 10 (9+1) dimensions.

In fact, for an anomaly-free QFT --- i.e. a QFT that respects special relativity --- you cannot simultaneously have two or more space dimensions and two or more time dimensions. You could give up special relativity, but then you'd be giving up the entire ballgame: the whole idea of QFT is to respect a symmetry that definitely appears in nature.


It seems that abstractifying (not a word) time is different from abstractifying space. Geometry takes properties of real space, and idealizes them into abstract spaces; of course, the whole point of special/general relativity is to take properties of spacetime and idealize them into abstract spacetimes.

What about phase space? Don't we normally abstractify space into phase space, and import time in the most stupid "\times\R" way? This is where the meat of the matter is. In some sense, I'd argue that time is inherently _more_ abstract than space. "Time" is a measurement of how things _change_ --- a function should be thought of as a one-dimensional object (with a "front" or "in" end, and a "back" or "out" end; functions can be glued together so long as you respect this orientation) pointing in the "time" direction. What's going on with things like "phase space" is that phase space exactly describes the _state_ of a particle, and you need a time dimension for the state to change. Thus Hamiltonian formalism in classical or quantum mechanics.
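(A tiny sketch of my own to make the point concrete: a point (x, p) of phase space is the whole state of a one-dimensional particle, and "time" is just the parameter along which Hamilton's equations push that point around.)

# Hamiltonian flow for H = p^2/2m + k x^2/2, integrated with small Euler steps.
def evolve(x, p, dt=1e-4, steps=10000, m=1.0, k=1.0):
    for _ in range(steps):
        x, p = x + dt * p / m, p - dt * k * x    # dx/dt = dH/dp, dp/dt = -dH/dx
    return x, p

print(evolve(1.0, 0.0))   # after time t = 1, roughly (cos 1, -sin 1): the state has moved along its orbit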

Incidentally, Lagrangian formalism is manifestly invariant under special relativity --- in the Lagrangian picture (the "least action principle"), the law is that particles evolve in such a way as to minimize a certain total path length. This is perfectly easily expressible in a spacetime, and that expression is probably more natural than the one in just space, because the path the particle takes is through spacetime. Lagrangian formalism is what most modern QFT folks use.


One other point is worth making: Roger Penrose, one of the most brilliant (and crackpottish) physicists alive today, invented a physics called "Twistor Theory". For twistor theory, he has to replace SO(3,1), the symmetry group of (our) three space dimensions and one time dimension, with SO(4,2), which most naturally describes a universe with two time dimensions. Twistor theory, which takes light rays as basic rather than spacetime points, has not been widely adopted; it's not clear to me what the philosophical corollaries of this change are.

Sorry I don't know any references. Hope this gives you some more ideas where to look.

30 June 2007

Marie Antoinette was right!

Or, rather, Marie-Therese was, when she encouraged the commoners to eat rich, expensive desserts. I recently procured a copy of Rose Levy Beranbaum's The Cake Bible, a book I highly recommend. Ultimately, I hope to try every cake she suggests; over the last two weeks, I began that project with the first five of her recipes, in order, plus a few of her later cakes. We've had almost one cake a night — no mean feat for a family of five.

It was with great pleasure, then, that I came upon an easy vegan gem. Beranbaum loves her butter, and her ingredient-sorted categories include All Egg Yolks, All Egg Whites, and Safe For Passover, but not, for instance, Low Fat (low saturated fat, yes, e.g. meringue). So when I saw a recipe from the '50s — she says it's an original, although I will paraphrase — that was safe for my godmother's birthday (my godmother tries to avoid dairy), I knew I had to try it. It turned out fantastic.

Chocolate Cake

Preheat the oven to 350°F.

In a medium bowl, whisk together
  • 1/3 cup unsweetened cocoa powder
  • 1 cup boiling water
until smooth, and allow to cool. Whisk in
  • 1 tsp vanilla
  • 3/4 cup mayonnaise*.


In the bowl of a standing mixer fitted with the paddle (or by hand in a large bowl, I guess), combine the dry ingredients:
  • 2 cups (sifted) cake flour
  • 1 cup sugar
  • 2 tsp baking soda
  • 1/2 tsp salt.
Add chocolate mixture, and mix on low until everything is moistened. Increase to medium and beat 1 minute to aerate.

Prepare two 8-inch cake pans by (spray) greasing them, lining the bottoms with circles of baker's parchment, and greasing again. Scrape the batter (very liquid) into the prepared pans; they should be only about 1/3 full (the cake expands significantly). Bake 20 to 25 minutes.

Having cherries on hand, and having made a cake with brandied cherries earlier last week, I combined some cherry preserves and the brandy-cherry-syrup in a small frypan over low heat to dissolve. After turning out the first cake, I poured a thin layer of glaze over it, so that I could glue the top layer on. I then poured more glaze on the top of the cake, so that little bits rolled down the sides, and lined the outer edge of the cake top with cherry halves.

* Note: I used only 1/3 cup mayonnaise, since I misread the recipe. It worked fine. More will only make it richer. And I'm sure that vegan mayonnaise substitutes, such as Nayonnaise, would work just as well. I would even be interested in trying this cake with Wildwood Garlic Aioli — I've had a mayonnaise cake made with garlic aioli before (a plain yellow cake; the cook also substituted rice flour for cake flour to make it gluten free, and it still turned out well), and Khymos suggests that chocolate and garlic might go well together with the addition of coffee. This recipe would definitely be good with some added instant espresso (I'd put in between a tsp and a Tbsp in the first step with the cocoa powder).

25 June 2007

A note on ultrafilters and voting.

Terence Tao has a very nice discussion of ultrafilters in his most recent post. In his comment thread, I posed two problems (and provided an embarrassingly incorrect solution to one of them), which I will repeat here; then I'll write about another aspect of Tao's post.

Problem (1): Construct (with Axiom of Choice, of course) a bounded sequence and an ultrafilter so that the sequence has no large convergent subsequences.

Problem (2): Open-ended. The Continuum Hypothesis, I've heard it claimed, is (at least sufficient to guarantee that) ultrapowers by different (nonprincipal) ultrafilters are equivalent. I first heard (a version of) this claim from Conway, and I believe he suggested that this was actually equivalent to CH, although that could be a faulty memory. A commenter says that it's not the case that two nonprincipal ultrafilters are themselves isomorphic (in the sense of there existing a bijection on \N that takes one ultrafilter to the other), but it's not clear to me why not. So, why not? And, more generally, is there a reasonably easy way to see the first claim in this paragraph?


Lastly, I would like to point out another extremely cool fact that Tao breezes through. To explain ultrafilters, Tao uses an analogy with voting theory: an ultrafilter is a way of deciding which side wins when infinitely many people vote.

This, then, suggests a proof of Arrow's Theorem.

Let me explain.

Given a set A of alternatives, and a set P of people, let's define a social welfare function (SWF) as a function that takes as input each person's (linear) rankings of the elements of A and gives as output a global ranking of the elements of A (the output may have ties). There are many conditions a "fair" SWF might satisfy; Arrow singles out three:
  1. Pareto (P): if the input consists of identical lists, then this list is the produced output;
  2. Independence of Irrelevant Alternatives (IIA): given two separate inputs such that, person-by-person, the relative rankings of two particular alternatives x and y are the same, then the two outputs have the same relative ranking of x and y;

  3. Monotonicity (M): Say the output has x ranked higher than y. Then if some people who originally ranked y above x switch the two (so that they now rank x above y), the output will still have x above y. Except for the possibility of ties, this property is strictly stronger than property P.


By the way, although originally we allowed the SWF to produce ties in the output, properties P and IIA are sufficient to guarantee (if A has at least three alternatives) that ties never occur. Indeed, say a certain input yields a tie between x and y. Let X be the set of people who prefer x to y, and Y the set of people who prefer y to x. Taking another alternative z, we can, using IIA, insert it so that all of X prefer x>z>y and all of Y prefer z>y>x; then IIA says x and y are still tied, and P says z>y and hence also z>x. Inserting instead so that, for X, x>y>z, and for Y, y>z>x, then P says y>z and hence also x>z. But the relative rankings of z and x are the same, person by person, in the two insertions, so IIA gives us the desired contradiction. Thus, from now on, I will assume that our SWF does not result in ties.

Arrow's theorem says that, if A has size at least three, and if P is finite, then the only SWF satisfying both P and IIA is a single-person dictatorship. Since M is also natural, and since the book* I'm glancing at for my definitions does so too, I will show that SWFs satisfying all three conditions are single-person dictatorships.

Well, an ultrafilter \F on P can give us a SWF: if you want to know the relative ranking of x versus y, just see whether the set of people who put x above y is \F-large (i.e. in \F) or not. This binary relation is then definitely an ordering, because of ultrafilter properties: for instance, if x > y and y > z in the output, then the set X of people ranking x above y and the set Y of people ranking y above z are both \F-large; but then X \cap Y is \F-large, and since the individual rankings are linear orderings, everyone in X \cap Y ranks x above z, so we must have x > z in the final ranking. Such a SWF obviously satisfies properties IIA and M (and hence P) above.
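(To make the dictionary concrete, a little Python sketch of mine. On a finite P every ultrafilter is principal, i.e. "large" means "contains some fixed person k", and then the induced SWF just reproduces person k's ballot — which is exactly the dictatorship Arrow's theorem promises.)

from itertools import combinations

def swf_from_ultrafilter(rankings, is_large):
    """rankings[i] is person i's ballot, best first; is_large says which subsets of people count."""
    alts = rankings[0]
    prefers = lambda i, x, y: rankings[i].index(x) < rankings[i].index(y)
    # x beats y exactly when the set of people preferring x to y is large.
    beats = {(x, y): is_large({i for i in range(len(rankings)) if prefers(i, x, y)})
             for x, y in combinations(alts, 2)}
    wins = lambda x, y: beats[(x, y)] if (x, y) in beats else not beats[(y, x)]
    return sorted(alts, key=lambda x: -sum(wins(x, y) for y in alts if y != x))

ballots = [["a", "b", "c"], ["c", "a", "b"], ["b", "c", "a"]]
print(swf_from_ultrafilter(ballots, lambda S: 2 in S))   # ['b', 'c', 'a'], i.e. person 2's ballot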

On the other hand, a SWF \F satisfying IIA and M gives us an ultrafilter. How? Let's pick two particular elements a and b in A. Then for any X \subseteq P, consider feeding \F a collection of rankings in which everyone in X puts a > b and everyone else puts b > a. If the SWF declares a > b, then declare X to be an a,b-large subset of P. IIA makes this well-defined, and M guarantees that supersets of large sets are large. But this definition does not depend on the choices of a and b: given another c, insert it so that X ranks a>b>c and everyone else ranks b>c>a; then the output ranks a>b (X is a,b-large) and b>c (by P, since everyone ranks b above c), hence a>c, and so X is a,c-large. Similarly, it's c,b-large by a different insertion, and by continuing this type of argument you can get that an a,b-large X is x,y-large for any x,y \in A.

Is the intersection of two large sets also large? It suffices to show that dividing a large set into two disjoint sets yields one large set and one small set. Say X = Y \cup Z is large, with Y and Z disjoint; then assign Y the ranking b>a>c, Z the ranking a>c>b, and everyone else c>b>a. As X is large, certainly \F assigns a>c. Of the three possible (relative) places for \F to put b, either b>c, in which case Y is large, or a>b, in which case Z is large.

Thus \F generates an ultrafilter on P. Writing A! for the set of permutations of A, it's clear how a SWF = ultrafilter \F acts: take our P-sequence of points in A!, and take the (ultra)limit. In the case when A is finite, the ultrapower of A! is just A! — i.e. the limit is just another permutation.

In any case, Arrow's theorem is thus equivalent to the statement that there are no nonprincipal ultrafilters on a finite set.


Some discussion is warranted. In justifying that a SWF satisfying the desired criteria is an ultrafilter, I all but proved Arrow's theorem. Given that the statement that "there are no nonprinciple ultrafilters on a finite set" is rather trivial, this is not too surprising. Also, I had to use the fact that A had at least three elements throughout the argument, because Arrow's theorem fails when there are only two options (for instance, Majority Rules works). So it's not immediate to translate Arrow's criteria into the definition of an ultrafilter. Possibly, though, my argument could be cleaned a bit to make it more immediate. Also, it seems that monotonicity is required in guaranteeing that our SWF gives an ultrafilter, but Arrow's theorem holds even if you drop that condition.
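(A quick illustration of mine of why three alternatives matter: pairwise majority, a perfectly good SWF when |A| = 2, produces a cycle on the classic three-voter, three-alternative profile, so it does not even yield a ranking, let alone a fair one.)

ballots = [["a", "b", "c"], ["b", "c", "a"], ["c", "a", "b"]]   # the classic Condorcet profile

def majority_prefers(x, y):
    return sum(b.index(x) < b.index(y) for b in ballots) > len(ballots) / 2

print(majority_prefers("a", "b"), majority_prefers("b", "c"), majority_prefers("c", "a"))
# True True True: a beats b beats c beats a, so "majority rules" gives no linear order once |A| >= 3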

On the other hand, this analysis does allow us to understand Arrow-style voting systems with infinitely many people. Or with continuously many people. Or other topologies. And, you know, the US is very large --- if we could use the Axiom of Choice in practice, then perhaps we could pretend that the population of the US is infinite and pick a nonprincipal ultrafilter to be our voting system.


* For a very fine text, suitable for someone who's never seen any mathematics or someone who's seen a lot, on Arrow's Theorem and related topics, I highly recommend Mathematics and Politics: Strategy, Voting, Power, and Proof by Alan D. Taylor (Springer, 1995).

19 June 2007

Money

It takes a lot of money to run a Presidential campaign. In 2000, Bush raised and spent a little under two hundred million dollars; Gore only got 150 million (Center for Responsive Politics). In 2004, each of the major candidates raised and spent between three and four hundred million dollars. For the 2008 race, so far Clinton has raised the most money (36 million, ibid.) and Obama is next (Romney is a close third). Mr. Clinton personally raises one hundred thousand dollars a night for his wife's campaign (New York Times). Each of the major candidates will likely (need to) raise at least half a billion dollars by the general election.

Of course, all presidential candidates are reasonably affluent. Romney is the richest, with between 190 and 250 million dollars to his name, and Clinton is next, with 10 to 50 million dollars, mostly from her husband's lecture circuit (CBS News). Campaign finance laws restrict the amount of money you or I may donate to any given candidate (for 2008, individuals may not donate more than $2,300 per candidate per election), but politicians are allowed to spend as much of their own money as they want.

Which leads, of course, to Michael Bloomberg, mayor of New York City. Today he announced, while campaigning in California, that he has changed his party affiliation from Republican to Independent (he was a Democrat before running for mayor). Almost certainly he will run for President; he certainly enjoys stoking that fire. But without a major party to provide funding and machinery, does he stand a chance? To find a third-party candidate who did better than Ross Perot (close to 20% in 1992, splitting the right and assuring Clinton a win), you have to go all the way back to Roosevelt in 1912; Nader never cracked 5% (New York Times). Except that Bloomberg is Forbes' 44th richest American, with 5.5 billion dollars (Wikipedia). He will likely run on the Independence ticket --- the same one that sponsored Ross Perot, and on whose ticket he ran (along with the GOP endorsement) for mayor. He supports abortion rights and gay marriage, and is staunchly pro-business. And, it's rumored, he has already set aside $1 billion for a presidential campaign (ibid.). This is easily enough to be competitive.

Bloomberg is not, of course, my favorite of the presidential candidates, but I certainly wouldn't mind him in the White House. I'm amused by what his friend George Stephanopoulos said about his potential run (ibid.; Stephanopoulos in italics):

1. 70% of the nation would need to feel as though the country is moving in the wrong direction. Check.
2. Both nominees would need to have disapproval ratings in the 40% range. Clinton and Romney.
3. 40% of the country would need to be open to a third party candidate. Likely?
4. 20-25% of the country would need to be "open to Mike Bloomberg." Time will tell.

10 June 2007

Categories, trees, and polyhedra

Here's a Zome Tool exercise that y'all can try your hand at, and then I'll tell you some of the category theory behind it, and why I was thinking of it.

Build a solid with six pentagonal faces and three quadrilateral faces.  By Euler's formula, then, there are 14 vertices, all necessarily trivalent.  I want this figure to be as "regular" as possible: so two of those 14 vertices should be where three pentagons meet (these ought to be antipodes) and each of the other twelve vertices is at the corner of two pentagons and a quadrilateral.
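(For the record, my arithmetic: the nine faces contribute 6·5 + 3·4 = 42 edge-face incidences, so E = 21; Euler's V - E + F = 2 then gives V = 2 + 21 - 9 = 14; and since the vertex degrees sum to 2E = 42 = 3·14 while every degree is at least 3, each vertex is exactly trivalent.)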


Now I'll tell you some category theory, and explain why this figure is the appropriate three-dimensional entry in the sequence that goes "point, line segment, pentagon".  I don't yet know what the next (four-dimensional) figure is, but I do know that it has 42 zero-dimensional vertices, 84 one-dimensional edges, and, I think, 56 two-dimensional faces and 14 three-dimensional hyperfaces.  Perhaps you can save me the trouble of actually drawing it myself, and work it out on your own.  I'm curious to know the pattern in the number and kind of faces of each dimension.  If you don't like category theory but do like combinatorics, then skip ahead until you get to pictures of rooted trees.


Much of this material is available at John Baez's website; in particular, I was thinking about this after reading some of Winter 2002 of his Quantum Gravity seminar.

Say you have some binary operation.  Like multiplication in a group.  Then given two elements, say A and B, there's one way to multiply them (in order): AB.  Yes, I could also do BA, but for now let's say that I know A comes first.  I'm thinking of A and B here as some kind of "machines" (you might say "functions") with one "IN" pipe and one "OUT" pipe.  I can compose two machines by hooking them together and putting a big metal box around them, and pretending that it's one machine:

->-A->- composed with ->-B->- gives

   _________
->-|-A->-B-|->-  
=:   ->-AB->-
   |_______|  

Fair enough.  This is what we learned back in elementary school when we first met functions.  From now on, I'll use parentheses () to denote "putting a box around something" and I'll probably just write functions on a line like we normally do.

But what about multiplying three functions?  There's this super cool property of function composition --- it's this property that motivates the idea of a "group" --- called "associativity".  In symbols:

(AB)C = A(BC)

What does this property really mean?  We defined composition to be a binary operation, and this says that the two ways to use a binary operation to create a ternary one are actually the same.  More intuitively, the associative law says that there's only one interpretation for ABC, i.e. for this picture:

->-A->-B->-C->-

You can prove that the associative law is sufficient to guarantee that any unparenthesized string of symbols has a unique interpretation.  If this is the full extent of our laws --- that we can draw pictures, each consisting of an (oriented) string with some number of (labeled) beads on it, and such that each picture has a unique interpretation depending only on the order of the beads --- then we get a category.  (Each picture is called a morphism, and the strings can also carry labels; a string label is called an object.  Of course, we're allowed to have no beads on a string: ->- is the "identity morphism" for the object labeling that string.  If all the strings have the same label (i.e. if we don't allow labeled strings) then we get a monoid.)
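(A trivial but concrete instance, in a short Python sketch of mine: composition of functions is associative on the nose, so any bracketing of a chain of machines gives the same machine.)

def compose(f, g):
    """The box around ->-f->-g->-: feed the output of f into g."""
    return lambda x: g(f(x))

A = lambda x: x + 1
B = lambda x: 2 * x
C = lambda x: x ** 2

AB_then_C = compose(compose(A, B), C)   # (AB)C
A_then_BC = compose(A, compose(B, C))   # A(BC)
print(all(AB_then_C(x) == A_then_BC(x) for x in range(10)))   # True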

Now, however, let's relax the associative law.  In particular, rather than saying that (AB)C equals A(BC), let's just say that they're "isomorphic".  We can start to develop the theory of "weak 2-categories", which I probably won't fully define here, by introducing an "associator 2-morphism", called a_{A,B,C}, which sends (AB)C to A(BC).

So far, so good.  What about if we want to compose four things?  Using a binary operation, there are five ways to do this.  We can place them in a pentagon, with associators going between them:

((AB)C)D 
--a_{AB,C,D}->  (AB)(CD)  --a_{A,B,CD}->  A(B(CD))
                                                   __
         \                                         /|
          \                                       /
       a_{A,B,C}                             a_{B,C,D}
            \                                   /
            _\|                                /
                                             
             (A(BC))D  --a_{A,BC,D}->  A((BC)D)

Well, Mac Lane realized that good definitions of a (weak) 2-category and of a monoidal category (like Vect, where there's a "tensor product") should involve this kind of pentagon.  See, just saying that there are "associator" isomorphisms a_{A,B,C} between different ways of multiplying objects isn't enough, because a priori the isomorphisms going along the top or bottom of this diagram need not be the same.  But there really should only be one way that ((AB)C)D = A(B(CD)).  Mac Lane then proved that this pentagon is enough: if for any four objects this pentagon commutes, then any two ways of parenthesizing any given string will be isomorphic in a unique way.  (Just like the associative law guarantees that any two parenthesizations are equal, the pentagon law guarantees that any two maps between parenthesizations are equal.)

How should we draw this?  Saying that this diagram commutes is like filling in the hole in the middle of the pentagon.  Indeed, I'll let you continue this story on your own into higher-dimensional categories, where, for instance, you need to introduce double associators like b_{A,B,C,D} defined by the following picture:

((AB)C)D  --a_{AB,C,D}->  (AB)(CD)  --a_{A,B,CD}->  A(B(CD))
                                                   __
         \                   ||                    /|
          \                  ||                   /
       a_{A,B,C}        
b_{A,B,C,D}         a_{B,C,D}
            \                ||                 /
            _\|              \/                /
                                             
             (A(BC))D  --a_{A,BC,D}->  A((BC)D)

By the way, what's a "zero associator"?  The associator a_{} gives us a morphism for each triple of objects, and the double associator b_{} gives us a 2-morphism for each quadruple of objects.  The 0-associator should give us an object for each pair of objects, and thus is exactly the binary operation m_{A,B}=AB.


Ok, now I'm going to move away from category theory --- I wanted to give you a sense of where this was all coming from --- to do some combinatorics and explain why and how I was thinking of the problem I had originally posed.

See, a parenthesization is, of course, really just a binary tree:

((A(BC))D)(EF) =

     B  C
      \/
   A  /
    \/
     \ D  E F
      \/  \/
       \  /
        \/

The rule is very simple: \/ means multiplication.  The symbols are written from left to right, but the multiplication is parenthesized from top to bottom.  Of course, we've already counted how many binary trees there are: the Catalan numbers C(n) = (2n)!/(n!(n+1)!) give the number of binary trees with n+1 leaves.  (Symmetry is not being considered here:

\/         \/
 \/  and  \/

are different trees.)
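(A little counting sketch of mine: enumerate the bracketings by brute force and compare with the Catalan formula.)

from math import factorial

def count_trees(leaves):
    """Rooted binary trees with this many ordered leaves, i.e. ways to parenthesize a product."""
    if leaves == 1:
        return 1
    return sum(count_trees(k) * count_trees(leaves - k) for k in range(1, leaves))

catalan = lambda n: factorial(2 * n) // (factorial(n) * factorial(n + 1))
print([count_trees(n + 1) for n in range(7)])   # [1, 1, 2, 5, 14, 42, 132]
print([catalan(n) for n in range(7)])           # the same list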

What about associators?  Binary trees exactly label each object; what symbols can we use to accurately describe the associators?

Well, an associator describes some "continuous" transformation from

\/        \/
 \/  to  \/

I think of this as "sliding" that middle arm down the left side and up the right.  The graph, then, must have gone through the following as an intermediate state:

\|/

Ah ha!  Associators correspond to trees with three-prong branches.  Rather than continuing to write words, let me just draw some ASCII pictures, and hopefully you'll understand the pattern:

0: ab

\  /
 \/


1: abc

\  /                            \  /
 \/        >--- \ | / --->       \/
  \  /           \|/          \  /
   \/                          \/


2: abcd

                          \  /  \  /
              \  /         \/    \/         \  /
 \  /          \/           \    /           \/          \  /
  \/       >--- \ | / --->   \  /   >--- \ | / --->       \/
   \  /          \|/          \/          \|/          \  /
    \/                                                  \/
     \  /                     ||                     \  /
      \/                      ||                      \/
                               
       \                  \  \  /  /               __
        \                  \ |  | /                /|
                            \ \/ /
       \ | /                 \||/              \ | /
        \|/                   \/                \|/
         \  /                                 \  /
          \/ \                ||             / \/
             _\|             \||/           /
                              \/
                 \  /                   \  /
                  \/                     \/
               \  /         \   /         \  /
                \/      >--- \ / --->      \/
                 \  /       \ | /       \  /
                  \/         \|/         \/

The thing in the middle of the pentagon is my failed attempt at a valence-four node.  It is on a downward-pointing 2-morphism, like the earlier b_{}, which goes from the morphism along the top to the morphism along the bottom.

I will leave to you the challenge of drawing the next figure (with all trees with five leaves).  In addition to pentagons, you will also get three quadrilaterals, with labels like

\ | /
 \|/
  \  |  /
   \ | /
    \|/

that describe the commutation (in the classical sense) of two different associators: in this case, a_{a,b,c} and a_{abc,d,e}.  You'll also, of course, get pentagons.  The image is hard to draw symmetrically on paper, because that requires edges to cross one another; it is, however, a planar graph, and fits well on a sphere.  The inside of the sphere, of course, should be filled in and labeled by the 5-valent node.

At higher dimensions, how much more complicated do the figures get?  We can derive some numerological data about the next figure.  It must have 42 vertices, because that's the next Catalan number.  I.e. there are 42 binary rooted trees with six leaves.  The figure should have 84 edges.  Why?  With six leaves, a binary rooted tree has five branch points.  An associator corresponds to a pair of adjacent branches; each branch except the bottom one has exactly one branch directly below it, and this accounts for all pairs.  So there are four pairs of branches, and hence four associators, i.e. four edges, at each vertex.  But each edge in the figure connects two vertices.  So there are 4/2 = 2 times as many edges as vertices.  Continuing this argument, we see that the n-dimensional figure, i.e. the one coming from all trees with n+2 leaves, will have C(n+1) = (2n+2)!/((n+1)!(n+2)!) vertices, and (n/2)C(n+1) = (2n+1)!/((n-1)!(n+2)!) edges.

What about hyperfaces of other dimensions?  It's hard to continue this style of argument to two-dimensional faces, because they come in both quadrilateral and pentagonal flavors.  Well, how many (n-1)-dimensional hyperfaces are there?  The central n-dimensional piece is labeled by the tree with a single branch point splitting into all n+2 leaves.  A move from a j-dimensional face to a (j+1)-dimensional face that it's a part of consists exactly of deleting an interior (non-leaf) edge.  So going from a (j+1)-dimensional face to one of its j-dimensional constituent faces consists of "zipping up" some k adjacent edges that all come from the same l-valent branch.  Except that k cannot be 1 --- zipping up one edge doesn't do anything --- nor can it be l.  So what does that leave us for (n-1)-dimensional faces?  I want to put one pair of parentheses in among n+2 leaves (a j-dimensional face corresponds to a (partial) parenthesization with n-j pairs of parentheses).  There are n+3 spots for each parenthesis to go; I want to choose two of them (the first one for the open parenthesis, the second for the close).  There are, of course, (n+3)(n+2)/2 ways to do this.  But n+3 of them don't count: n+2 of these ways just grab a single leaf, and one of them grabs all the leaves.  So there are really n(n+3)/2 valid single-parenthesizations, and hence 14 three-dimensional hyperfaces on the four-dimensional figure.  (When I was first trying to guess the number, I asked The Online Encyclopedia of Integer Sequences for the sequence 0,2,5,9, and sure enough got the right sequence.)

Baez says that at any dimension the figure you draw is homeomorphic to a solid ball.  If we believe him (I have no reason not to, but he hasn't justified this claim in my reading so far, and I haven't proven it myself) then it's easy to calculate the number of two-dimensional faces on the four-dimensional entry in our sequence of polytopes: by Euler's formula, the alternating sum of the number of faces of different dimensions must be exactly 1, and so there must be 56 two-dimensional faces, coming in a mix of quadrilaterals and pentagons.  I am very curious to know how many of each there are.  I would also like to know what the possible three-dimensional hyperfaces are: certainly there's the 9-sided figure that I asked you to construct with Zome Tool, corresponding to each tree with one five-valent branch (the four-solid has seven of these).  But there are also hyperfaces corresponding to trees with one node branching into four and one branching into three.  What kind of solids are these?  In even higher-dimensional figures, we could also get solids from trees with three ternary branches.
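(Putting the counts together in a short Python sketch of mine, for the four-dimensional figure n = 4, using the formulas above and the Euler relation:)

from math import factorial

catalan = lambda m: factorial(2 * m) // (factorial(m) * factorial(m + 1))

n = 4                                        # trees with n + 2 = 6 leaves
vertices = catalan(n + 1)                    # 42
edges = n * vertices // 2                    # 84: each vertex meets n associators, each edge has two ends
hyperfaces = n * (n + 3) // 2                # 14: one pair of parentheses among n + 2 leaves
faces = edges - vertices + hyperfaces        # 56: from the alternating sum 42 - 84 + faces - 14 + 1 = 1
print(vertices, edges, faces, hyperfaces)    # 42 84 56 14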

There's a family of polyhedra just waiting to be discovered.  And they have a deep algebraic interpretation, so they really ought to be found and studied.  Even at the very lowest dimensions, I don't understand some of it: for example, the solid you built out of Zome Tool, corresponding to all the rooted trees with five leaves, has a three-fold symmetry that isn't obvious from the original set-up.  Even its two-fold symmetry is different from the one in the problem:  the natural north and south poles of the figure are the graphs

\  /  \  /         \  /  \  /
 \/    \/           \/    \/
  \    /             \    /
   \  /      and
      \  /
    \/                 \/
     \  /           \  /
      \/             \/


whereas from a tree point of view the most natural endpoints are

\  /                     \  /
 \/                       \/
  \  /                 \  /
   \/        and
        \/
    \  /             \  /
     \/               \/
      \  /         \  /
       \/           \/

Indeed, the natural symmetry from the tree point of view --- reflection in the left-right direction --- manifests as the least likely of symmetries of the polyhedron: 180° rotation through one particular axis, the one connecting one of the quadrilaterals to the opposite edge.  Why?  I have no idea.  If you have ideas, please leave a comment.