Today's* ODEs class was rather amazing. In standard Yasha style, the topics darted around (he once explained that "It is impossible to be lost in my class. Because topic changes every two minutes, so if you are lost, you won't be lost."): a taste of KAM theory, a dash of geodesics on ellipsoids, and a bit of infinite-dimensional Hamiltonian systems.
It is this last discussion that I found rather mind-blowing. We did not say anything new; rather, we started defining what will eventually give us the quantum behavior of fields, from an entirely classical viewpoint.
Consider a PDE \d u(x,t)/\d t = F(u), where u is a function and F is some map from functions to functions. Let's say, for example, that we're interested in complex functions on the circle: u: S^1 \to \C is what the physicists would call a (complex) scalar field. Let V be the set of all scalar fields; I will not be precise about what conditions I want (presumably some smoothness conditions, say complex-analytic, and perhaps some convergence conditions, for instance that the square integral converges). I will call maps from V \to \R "functionals" and those from V \to V "operators"; V is a (complex) vector space, so it makes sense to talk about (real-, complex-, anti-) linear functionals and operators. For instance, \d/\dx is a linear operator; (1/2\pi) \int_0^{2\pi} -- g(x) dx is a linear functional. (I will from now on write \int for \int_0^{2\pi}; consistent with half of mathematics, the volume of the circle is 2\pi.) Rather than thinking of my problem as a PDE, I should think of it as an ODE in this infinite-dimensional space.
Imposing analyticity, etc., conditions on V curtails the freedom of functions: the value of a function at a point largely determines the value at nearby points. We ought to perform a change-of-basis so that we can better tell functions apart: let's assume, for instance, that each function has a Fourier expansion
u(x) = u_0 + \sum_{k=1}^\infty (p_k e^{-ikx} + q_k e^{ikx})
(Given, of course, by
u_0 = (1/2\pi) \int u(x) dx; p_k = (1/2\pi) \int u(x) e^{ikx} dx; q_k = (1/2\pi) \int u(x) e^{-ikx} dx.)
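(A minimal numerical sketch of this change of coordinates, mine rather than anything from class: sample a field on a grid, extract u_0, p_k, q_k by the quadrature formulas above, and reconstruct. The particular test field, the cutoff k <= 5, and the numpy machinery are all just illustrative choices.)

import numpy as np

# Sample a scalar field on the circle and extract the coefficients of
# u(x) = u_0 + sum_k (p_k e^{-ikx} + q_k e^{ikx}) by the quadrature formulas above.
N = 512
x = np.linspace(0.0, 2 * np.pi, N, endpoint=False)
u = 1.5 + 2.0 * np.exp(-3j * x) + (0.5 - 1j) * np.exp(2j * x)   # an arbitrary test field

def mode(u, k):
    """(1/2 pi) * integral of u(x) e^{ikx} dx, approximated by the grid mean."""
    return np.mean(u * np.exp(1j * k * x))

u0 = mode(u, 0)
p = {k: mode(u, +k) for k in range(1, 6)}    # coefficient of e^{-ikx}
q = {k: mode(u, -k) for k in range(1, 6)}    # coefficient of e^{+ikx}

u_rec = u0 + sum(p[k] * np.exp(-1j * k * x) + q[k] * np.exp(1j * k * x) for k in p)
assert np.allclose(u, u_rec)                 # the expansion reconstructs the field
print(u0, p[3], q[2])                        # ~ 1.5, 2.0, 0.5-1j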
((Sometime soon I will figure out what I believe to be the proper class of functions with which to do physics. One might hope for (complex) holomorphic, or for complex analytic, or for real C^\infty, or real analytic, or maybe just real integrable. But the Fourier transformation is vital to modern physics, and none of these is particularly the class of things with natural Fourier transforms, because I often end up with \delta functions. I ought to study the Fourier transform a bit more, because I don't understand something basic: it seems that the \delta functions supply a continuum of basis states, whereas the Fourier modes are countable. But every delta function has a Fourier transform, and modulo convergence this should be a change-of-basis. Perhaps it has something to do with the fact that the Fourier transform doesn't care about individual values of functions, just their integrals. We really ought to pick a class of "functions" so that Fourier really is a legitimate "change-of-basis" in the appropriate infinite-dimensional sense.))
Then our manifold V is, more or less, odd-dimensional. We cannot hope to put a symplectic structure on it. On the other hand, the ps and qs so naturally line up that we really want to write down \omega = \sum a_k dp_k \wedge dq_k, and we can do so on the even-dimensional subspace \{u_0 = 0\}. (The coefficients a_k have yet to be determined; \omega is a symplectic form for any choice of nonzero a_k, and we should choose judiciously to match other physics. Throughout this entry, I eschew the Einstein summation conventions.)
Well, almost. This \omega is almost certainly not going to converge if you feed it most pairs of functions, and I think that to restrict our functions to those that have Fourier expansions that converge rapidly enough is premature, especially since we have yet to determine the a_k. Rather, Yasha suggests, we should discuss the Poisson bracket, which is what really controls physics.
How so? you ask. And what, fundamentally, is a Poisson bracket? Consider our original Poisson bracket \{F,G\} = \omega(X_G,X_F), where X_F is defined by \omega(X_F,-) = dF(-). Then unwrapping definitions gives that \{F,G\} = dF(X_G) = X_G[F], where on the RHS I'm treating X_G as a differential operator. Our Poisson bracket knows exactly the information needed: given a Hamiltonian H, the flow is generated by the vector field X_H = \{-,H\}.
In general, a Poisson bracket is any bracket \{,\} satisfying
(a) Bilinearity (for now \R-linear; perhaps we will ask for \C-linear soon?)
(b) Anti-symmetry and Jacobi (i.e. \{,\} is a Lie bracket)
(c) It behaves as a (first-order) differential operator in each variable: \{FG,H\} = F\{G,H\} + \{F,H\}G.
Condition (c) guarantees that a Poisson bracket never cares about overall constants in F, etc.: \{F,G\} depends only on dF and dG. Symplectic forms, being non-degenerate, are dual to antisymmetric \T^2_0-tensors (two raised indices), and Poisson brackets are exactly of this form. But Poisson brackets need not be nondegenerate to give us physics. Indeed, the Poisson bracket as described is plenty to define a vector field to be the "derivative" of each function. (Not quite the gradient, but an antisymmetric version of one.) And this is what's needed.
So, to describe the physics of our system, it suffices to pick an appropriate Poisson bracket. Returning now to the system we were originally interested in, of scalar fields on the circle, our Hamiltonians should be functionals of u(x). Let's assume that every functional (at least all the physical ones) can be written in Taylor expansion as polynomials in p_k and q_k. Then to define the Poisson bracket, and continuing to ignore issues of convergence, it suffices to define the brackets between our various coefficients p_k, q_l, and u_0. Given \omega = \sum a_k dp_k \wedge dq_k (on our subspace of u_0=0), we get \{p_k,q_k\} = a_k^{-1}, and all other brackets are 0.
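(Here is a small sympy sketch, mine, of how such a bracket acts once it is specified on the generators: truncate to modes k = 1..K, take \{p_k,q_k\} = 1/a_k as above, and extend to polynomials by bilinearity, antisymmetry, and the Leibniz rule. The truncation K, the symbol names, and the sign conventions are assumptions of the sketch.)

import sympy as sp

K = 3
a = sp.symbols('a1:%d' % (K + 1), positive=True)
p = sp.symbols('p1:%d' % (K + 1))
q = sp.symbols('q1:%d' % (K + 1))

def bracket(F, G):
    # {F, G} = sum_k {p_k, q_k} (dF/dp_k dG/dq_k - dF/dq_k dG/dp_k), with {p_k, q_k} = 1/a_k
    out = 0
    for k in range(K):
        out += (sp.diff(F, p[k]) * sp.diff(G, q[k])
                - sp.diff(F, q[k]) * sp.diff(G, p[k])) / a[k]
    return sp.expand(out)

# A quadratic Hamiltonian H = sum c_k p_k q_k then generates linear equations of motion.
c = sp.symbols('c1:%d' % (K + 1))
H = sum(c[k] * p[k] * q[k] for k in range(K))
print(bracket(q[0], H))     # -c1*q1/a1
print(bracket(p[0], H))     #  c1*p1/a1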
What would be nice now is to find some physical reason to pick particular a_k, or even to motivate a Poisson bracket of this form at all. Perhaps you vaguely recall a physicist telling you that the Hamiltonian density functional for free scalar fields should look something like h(u(x)) = 1/2 (u^2 + m^2 u'^2), where u'(x) = du/dx. Then the Hamiltonian would be the integral of this: H(u) = (1/2\pi) \int_0^{2\pi} h(u(x)) dx = u_0^2/2 + \sum_{k=1}^\infty (1-m^2k^2) p_k q_k. With the Poisson bracket given, we can solve this explicitly: p_k(t) = p_k(t=0) e^{(m^2k^2-1) a_k t} and q_k(t) = q_k(t=0) e^{(1-m^2k^2) a_k t}. But this doesn't particularly match physical expectations --- why, for m^2 small and positive and a_k positive, should we expect fields to tend to develop lots of p_k oscillation for large k and lots of q_k oscillation for small k, and for the rest of the oscillation to die?
I'll come back to this kind of physical justification, and perhaps argue my way to a Hamiltonian and a Poisson bracket from the other direction, later. First, I want to explain why Yasha likes this set-up, via mathematical, rather than physical, elegance.
Many interesting Hamiltonians, Yasha observes, are of forms similar to the one I considered in the previous example: the functional H(u) is defined as the integral of some density functional h(x,u(x),u'(x),...). Let's consider this case. Yasha, in fact, uses a particularly restricted case: isotropy is easy (no x dependence), but Yasha also says no derivatives: h = h(u). Then H(u) = (1/2\pi) \int h(u(x)) dx.
Then what happens? Well, \dot{q_k} = \{H,q_k\} = a_k \dH/\dp_k = (1/2\pi) \int a_k \dh/\dp_k, and \dot{p_k} = (-1/2\pi) \int a_k \dh/\dq_k. Plugging these into \dot{u}(x) = \sum \dot{p_k} e^{ikx} + \dot{q_k} e^{-ikx} gives
\dot{u}(x) = (1/2\pi) \int \sum [-a_k \dh/\dq_k e^{ikx} + a_k \dh/\dp_k e^{-ikx}].
Can we recognize this as anything simpler? Recall the chain rule: \dh/\du = \sum (\dh/\dq_k \dq_k/\du + \dh/\dp_k \dp_k/\du). Then, since q_k = (1/2\pi) \int e^{ikx} u(x) dx, we see that \dq_k/\du(x) = (1/2\pi) e^{ikx}. So we see that
h'(u)(x) = \dh/\du(x) = (1/2\pi) \sum [ \dh/\dq_k e^{ikx} + \dh/\dp_k e^{-ikx} ]
This isn't quite what we want, because we have opposite signs on the two a_k. But if we differentiate h'(u) with respect to x, we get
\d/\dx [h'(u)(x)] = (1/2\pi) \sum [ ik \dh/\dq_k e^{ikx} + -ik \dh/\dp_k e^{-ikx} ]
And so, if we pick a_k = k, we see that
\dot{u}(x) = i \d/\dx [h'(u)]
I've been a bit sloppy with this calculation, and you may have trouble following the factors of 2\pi. In particular, I write \int, when I probably should have written \int_{y=0}^{2\pi} dy. But then I would have had to keep track of what's a function of what. Anyway, somewhere in here there's a delta-function, and the formula is correct.
Yasha doesn't do this calculation, preferring one a bit more general and rigorous, which I will sketch:
He observes that since we're working in a vector space, we can identify points and tangent vectors. Then what are the cotangent vectors? Our vector space of functions has a natural dot product: <u,v> = (1/2\pi) \int_0^{2\pi} u(x) v(x) dx; where now I'm thinking of u and v not as points but as tangent vectors at 0. So with each vector u(x) I can identify the linear functional (1/2\pi) \int u(x) -- dx, which Yasha calls \delta u. Then, knowing that he's chosen \{p_k,q_k\} = k, Yasha guesses a Poisson bracket:
P(\delta u,\delta v) = (1/2\pi i) \int u v' dx
Recalling the expressions for p_k and q_k as functions of u, we can recognize dp_k = \delta[e^{-ikx}] and dq_k = \delta[e^{ikx}]. Then we can check whether P is the proper Poisson bracket (really the tensor form, eating the derivatives of the functions we would feed into \{,\}) by evaluating it on our ps and qs:
\{p_k,q_k\} = P(e^{-ikx},e^{ikx}) = (1/2\pi i) \int e^{-ikx} ik e^{ikx} dx = k; all other brackets are 0, and P is antisymmetric by integration by parts, so it must be correct.
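(A quick quadrature check of this, mine rather than Yasha's: discretize the circle, differentiate spectrally, and evaluate P on the modes. It returns k on the pair (e^{-ikx}, e^{ikx}), and it is antisymmetric to machine precision; the grid size is an arbitrary choice.)

import numpy as np

N = 256
x = np.linspace(0.0, 2 * np.pi, N, endpoint=False)
wavenumbers = np.fft.fftfreq(N, d=1.0 / N)      # integer wavenumbers on the circle

def ddx(v):
    """Spectral d/dx of a periodic grid function."""
    return np.fft.ifft(1j * wavenumbers * np.fft.fft(v))

def P(u, v):
    """(1/2 pi i) * integral of u v' dx, via the uniform-grid mean."""
    return np.mean(u * ddx(v)) / 1j

for k in (1, 2, 5):
    print(k, P(np.exp(-1j * k * x), np.exp(1j * k * x)).real)   # ~ k

u, v = np.cos(3 * x) + 0.2 * np.sin(x), np.sin(2 * x)
print(abs(P(u, v) + P(v, u)))                                   # ~ 0: antisymmetric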
Then if F and G are two (real- or complex-valued) functionals on our space of scalar fields, what is their bracket \{F,G\}(u)? Let's say that F and G have the nice form F(u) = (1/2\pi) \int f(u(x)) dx (and G is similar). Then \{F,G\} = P(dF,dG) ... what is dF? Well, it's the linear part of F. If \epsilon v is a small change in u, then
(1/\epsilon) (F(u + \epsilon v) - F(u)) = (1/2\pi) \int f'(u(x)) v(x) dx
where f'(u) is the functional derivative of f with respect to u. So we can recognize dF(u) as \delta[f'(u)], and conclude that
\{F,G\}(u) = (1/2\pi i) \int (f'(u)) (g'(u))' dx
where on the second multiplicand, the inside prime is w.r.t. u, and the outside is w.r.t. x.
Then \dot{u}? Well, for any function(al) F(u), we have \dot{F} = \{H,F\}. u is not a functional, of course, but it is a vector of functionals: u(y) = (1/2\pi) \int u(x) 2\pi \delta(x-y) dx. So du(y) is the covector field (in x) given by \delta[2\pi \delta(x-y)]. (And I'm unfortunately using \delta for too many things: I want it both as an operator and as a Dirac delta function.) So, all in all,
\dot{u}(y) = -\{u(y),H\} = (-1/2\pi i) \int 2\pi \delta(x-y) (h'(u))' dx = i d/dx [h'(u)].
I would like to complete my discussion of scalar fields from an entirely different direction. Given a unit circle, I'd like to describe the propagation of "free scalar fields" around the circle, where now I'm thinking of these as some sort of wave. Remembering very little physics, I can imagine two different interesting dynamics. Either all waves move the same speed, regardless of their shape, or waves propagate at different speeds, with the "high-energy" ones moving faster.
Let's write down some differential equations and see what happens. I'm interested in \dot{u} = some functional of u. Of course, we should demand some isotropy, so x should not appear explicitly. What are the effects of different terms? Keeping everything linear — I want free field propagation, so everything should superimpose — we could ask about \dot{u} = cu, for constant c, but this is boring: the value of the field at a point never cares what the value is at neighboring points. (Indeed, the whole field evolves by multiplication by e^{ct}. If, for instance, c=i, then sure, different "modes" move at different "speeds", but this is the wrong analysis, since really the whole field is just rotating by some time-varying phase.)
More interesting is if \dot{u} = -u', say. Then expanding u(x) = \sum_{-\infty}^\infty u_k e^{ikx}, we can solve and conclude that \dot{u_k} = -ik u_k, so u_k(t) = e^{-ikt} u_k(0) and u(x,t) = u(x-t,0). So this is what happens if waves all travel at constant velocity.
But let's say that the kinds of waves we care about are surface waves. For instance, we might have a taut string, and waves are small oscillations. Then really physics should act to even out curvatures: we should expect an upwards pull on any point where the field has positive curvature. If we don't remember freshman mechanics, we might write down \dot{u} = u'', which gives us u_k(t) = e^{-k^2 t} u_k(0). This isn't bad: different modes decay at rates proportional to k^2. It's not quite perfect, though, because really the curvature gives the force, and hence the acceleration, not the velocity, so really we should have \ddot{u} = u''. Then we get back our original waves, except we have left-movers and right-movers. (More generally, we can add a mass term, and get H(u) = (1/2\pi) \int (1/2) [\dot{u}^2 + (u')^2 + m^2 u^2] = 1/2 \sum_{-\infty}^\infty [\dot{u_k}\dot{u_{-k}} + (k^2 + m^2) u_k u_{-k}], and the modes really do move at different velocities.)
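(To keep these dynamics straight, here is a mode-by-mode summary in a few lines of Python; the mass m = 0.5 is an arbitrary illustrative value, and this is only my own bookkeeping, not a claim about which dynamics is "right".)

import numpy as np

# transport  u-dot  = -u'          : u_k(t) = e^{-ikt} u_k(0)       -> speed 1 for every k
# "heat"     u-dot  = u''          : u_k(t) = e^{-k^2 t} u_k(0)     -> decay rate k^2
# wave+mass  u-ddot = u'' - m^2 u  : u_k(t) ~ e^{+/- i omega_k t},  omega_k = sqrt(k^2 + m^2)
m = 0.5
print(" k   transport speed   heat decay rate   wave phase speed omega_k/k")
for k in (1, 2, 5, 10):
    omega = np.sqrt(k ** 2 + m ** 2)
    print(f"{k:2d}   {1.0:15.3f}   {k ** 2:15.3f}   {omega / k:26.6f}")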
Anyway, the point is that I really do expect, in this world, to have 2\infty total dimensions: \Z worth of "position" coordinates u_k and \Z momentum coordinates, not the \Z/2 of each that Yasha was considering. By just inventing conjugate momentum coordinates v_k to position coordinates u_k, we can get, for the free field, such simple equations of motion as \dot{u_k} = v_{-k} and \dot{v_k} = -(k^2+m^2) u_{-k}.
So why does every quantum field theory course start quantizing the (real) scalar field by expanding in Fourier modes and imposing a nontrivial bracket between the coefficients? Because the (free) equations of motion, not the original setup, demand relationships between the Fourier modes of u and \dot{u}, and the nontrivial bracket is between u and \dot{u}.
Perhaps next time I will venture into the realm of quantum mechanics. I'd really like to understand how the classical Poisson bracket becomes the quantum Lie bracket, and where the hell that i\hbar comes from. First, of course, I will have to talk more about sets, Hilbert spaces, and the like, and I'll probably stay finite-dimensional for a while. Eventually, of course, I want to describe Feynman diagrams, and tie them back to the Penrose birdtracks and the tensors that started this series of entries.
That is, of course, if I ever get that far. I tend to be distracted by other time-consuming tasks: I am only a few hours of work away from being done applying to graduate schools.
*Of course, I started this entry a few weeks ago.
05 December 2006
11 November 2006
Liouville's Theorem
A long time ago, in a 2n-dimensional symplectic manifold M, with form \omega_{\mu\nu} (and dual form \omega^{\mu\nu}), far, far away...
Prologue: Our hero, a young Hamiltonian H: M\to\R, defines a "hamiltonian flow" via a vector field (X_H)^\nu = dH_\mu \omega^{\mu\nu}. We can understand H as, for instance, the total energy, and M as the phase space. H has a friend, G, which is preserved by the hamiltonian flow (e.g. momentum in some direction). This happens exactly when X_H[G] = (dG).(X_H) = 0 (thinking of X_H in the first line as a differential operator). But dG.X_H = (dG)_\nu (X_H)^\nu = (X_G)^\mu \omega_{\mu\nu} (X_H)^\nu = (dG)_\nu \omega^{\mu\nu} (dH)_\mu. So H's flow preserves G if and only if G's flow preserves H: being friends is a symmetric relationship.
Following the classical mechanists, we say that H and G are "in involution" if indeed \omega(X_H,X_G) = 0. More generally, we can define the "Poisson Bracket" \{H,G\} = \omega(X_H,X_G) = (dH)_\mu (dG)_\nu \omega^{\mu\nu}. Then clearly \{H,G\} = -\{G,H\}, and in particular H preserves itself (energy is conserved). Indeed, \{,\} behaves as a Lie bracket ought: it satisfies the Jacobi identity \{\{G,H\},K\} + \{\{H,K\},G\} + \{\{K,G\},H\} = 0, and \{,\} is \R-linear. (Thus C^\infty(M) is naturally a Lie algebra; the corresponding Lie group is the space of "symplectomorphisms", or diffeomorphisms on M that preserve \omega.) Moreover, X_{\{G,H\}} = [X_G,X_H] where [,] is the (canonical) Lie bracket on vector fields. (The Hamiltonian fields X_H are exactly the differentials of symplectomorphisms, hence the identification in the previous parenthetical.)
((Actually, it's not C^\infty(M) that's tangent to the symplecto group of M, but C^\infty(M) / \R, when M is connected. Our \{,\} depends only on the differential of Hamiltonian functions, and so ignores constant terms: there are \R possible constant terms for each connected component of M. We can, of course, equip \R with the trivial Lie algebra, and then C^\infty(M) is, as a Lie algebra, T(symplectomorphisms) \times \R. The physicists would say this by observing that energy is defined only up to a total constant; this constant cannot affect our physics because it appears only in commutators. The physicists try to use this observation to justify introducing infinite constants into their expressions.))
One day our hero H met another function F, but this one unpreserved by H's flow. How does F change? In our setup, where H and F have no explicit time dependence, and we're just flowing via \dot{x} = X_H, we have that dF/dt = X_H[F] = \{F,H\}.
When H and G are buddies (in involution), then each of H and G is preserved by X_H: the flow stays in the common level set H = H(0) and G = G(0). Assuming that H and G are independent, in the sense that dH and dG are linearly independent (so we're not in the G = H^2 case, for instance), this common level set is (2n-2)-dimensional.
The story: As a young and attractive Hamiltonian, our hero H was particularly popular: there were n-1 other Hamiltonians H_2,...,H_n so that, along with H_1=H, all were pairwise in involution (\{H_i,H_j\} = 0), and all independent (the set of dH_i is linearly independent at each point in M, or at least at each point in some common level set, and so in a neighborhood of that level set). This is the most friends any Hamiltonian can have: the common level set is n-dimensional, and the tangent space contains n independent vectors X_i = X_{H_i}. Because the X_i spanned each tangent space, and because \omega of any two was zero, the common level set was for all to see a Lagrangian submanifold.
Being very good friends, the H_i never got in each other's way. X_i could flow, and X_j could flow, and because of the relationship between Poisson and Lie brackets, their flows always commuted. The gang used this to great effect: by giving each friend an amount to flow, the crowd defined an \R^n action on the common level set. The friends set out to explore this countryside: Hamiltonian flow is volume-preserving, since it preserves the symplectic form (whose nth power is a volume form), and the resulting volume-preserving \R^n-action is transitive on connected components.
The friends, returning home by some element of the stabilizer subgroup, understood the landscape: the only discrete subgroups of \R^n are lattices, and so the common level set was necessarily a torus (in the compact case). Picking standard coordinates q^j for a torus, the friends observed an isotropy: at every point, X_i = a_i^j \d/\dq^j with a constant "frequency" matrix a.
Our hero's life was solved. If all the frequencies a_1^j were rational multiples of each other, what the Greeks called "commensurate", H's paths were closed. Otherwise, H's flow would be dense in some subtorus, and either way, physics was simple. Indeed, because every Lagrangian manifold has a neighborhood symplectomorphic to its cotangent bundle, there were "momentum" coordinates p_i conjugate to the angular position coordinates q^i, and these p_i depended only on the H_j. Indeed, in p,q coordinates, X_i was by definition -\dH_i/\dp_j \d/\dq^j + \dH_i/\dq^j \d/\dp_j, so the H_i knew themselves: H_i = -a_i^j p_j + const.
It was in this way that our hero the Hamiltonian understood how to flow not only in the level set, but in some neighborhood. The friends lived happily ever after.
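(Before the epilogue, a toy instance of the story, mine rather than Yasha's: two uncoupled harmonic oscillators on \R^4 with \omega = dp_1\wedge dq^1 + dp_2\wedge dq^2. The two Hamiltonians are in involution, the common level set is a 2-torus, and each flow rotates its own angle at a constant frequency, so the frequency matrix is diagonal. The symbol names and frequencies are assumptions of the sketch.)

import sympy as sp

q1, q2, p1, p2 = sp.symbols('q1 q2 p1 p2')
w1, w2 = sp.symbols('omega1 omega2', positive=True)

H1 = w1 * (p1**2 + q1**2) / 2
H2 = w2 * (p2**2 + q2**2) / 2

def bracket(F, G):
    """Canonical Poisson bracket for omega = dp1^dq1 + dp2^dq2 (my sign convention)."""
    return sum(sp.diff(F, q) * sp.diff(G, p) - sp.diff(F, p) * sp.diff(G, q)
               for q, p in [(q1, p1), (q2, p2)])

print(sp.simplify(bracket(H1, H2)))        # 0: in involution
print(bracket(q1, H1), bracket(p1, H1))    # omega1*p1, -omega1*q1: rotation at frequency omega1
# On the angle coordinates theta_i = arg(q_i + i p_i) of the level torus,
# X_{H_i} = omega_i d/d(theta_i), i.e. the "frequency matrix" a_i^j = omega_i delta_i^j.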
Epilogue: Sadly, not all Hamiltonians can have as nice a life as our hero, because many do not have so many friends. It has been shown that the three-body problem is not Liouville-integrable, as this property of having enough mutual friends (and hence admitting Lagrangian tori) came to be called. Much analysis has gone into studying perturbations of Liouville systems — weakly-interacting gravitating bodies, for instance — but I do not know this material, and so will not exposit on it here. In my next entry, I hope to speak more about the Poisson bracket, and how it turns classical into quantum systems.
Edit: The matrix a_i^j may depend, of course, on H, or equivalently on p. What is actually true is that, up to a constant, H_i(p) = \int_0^{p} a_i^j(p') dp'_j. It is by solving this equation that one may find the conjugate p coordinates. That H_i = -a_i^j p_j + const. is true only to first order, and to first-order we cannot know whether, for instance, entries in a_i^j remain commensurate, and so whether paths stay closed as the momentum changes. Generically, Hamiltonian flows are not closed, and instead a single path is dense in the entire torus. In the general three-body problem, the flow is dense in a space greater than the dimension of any Lagrangian submanifold.
09 November 2006
Tensors and Hamiltonians
I seem to have fallen way behind in writing about my classes. In particular, it may be a while yet before I do any quantum mechanics; I'm more excited by my classical geometry. But perhaps I will move into the quantum world soon. I almost understand it.
In the last few weeks, my classes have defined forms and fields, integration, chains, Lie groups, and Riemannian manifolds; quantum fields, fermions, SUSY, and Feynman diagrams; structural stability, Anosov flow, and a world worth of material in Yasha Eliashberg's class. Yasha pointed out in one lecture, "It is impossible to be lost in my class, because I keep changing topics every two minutes."
But I'm trying to provide a unified account of such material in these pages, and I last left you only with fields of tangent vectors. So, today, tensors, differential forms, and Hamiltonian mechanics.
Remember where we were: we have a smooth manifold M with local coordinates x^i, over which we can build two extremely important bundles, called T(M) and T^*(M). T(M) is the space of "derivations at a point": on each fiber we have a basis \d/\dx^i and coordinates \dot{x}^i. T^*(M) is dual to T(M) fiberwise: its basis is dx^i and its coordinates are p_i. But from these we can build all sorts of tensor bundles.
I touched on tensors in my last post, but hardly defined them. They are, however, a straightforward construction. A tensor is two vectors set next to each other. Or almost. If I have two vectors v\in V and w\in W, I can take their tensor product v\tensor w: I define \tensor to be multilinear, and that's all. V\tensor W is then generated by all the possible v\tensor w. More precisely, V\tensor W is the universal object so that bilinear maps from V\times W factor through it: there's a canonical bilinear map V\times W \to V\tensor W so that any bilinear map from V\times W to \R factors through this map and some linear map V\tensor W \to \R.
If you haven't seen tensors before, this definition probably only made things worse, so let me say some other words about tensors. (i) The product \tensor is multilinear: (av)\tensor w = a(v\tensor w) = v\tensor(aw), and (v_1 + v_2)\tensor w = v_1\tensor w + v_2\tensor w, and the same on the other side. Thus \R\tensor V is canonically isomorphic to V. (ii) if {v_i} and {w_j} are bases for V and W, then {v_i\tensor w_j} is a basis for V\tensor W. It is in this way that \tensor correctly generalizes \times from Set to Vect.
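(If coordinates help, here is the story for V = \R^2 and W = \R^3 in numpy, as a sketch: the tensor product of column vectors is the outer product, bilinearity is immediate, and the products of basis vectors give a basis of the 6-dimensional space.)

import numpy as np

v, w = np.array([1.0, 2.0]), np.array([3.0, 0.0, -1.0])
a = 5.0

assert np.allclose(np.outer(a * v, w), a * np.outer(v, w))   # (av) (x) w = a (v (x) w)
assert np.allclose(np.outer(v, a * w), a * np.outer(v, w))   # v (x) (aw) = a (v (x) w)

v1, v2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
assert np.allclose(np.outer(v1 + v2, w), np.outer(v1, w) + np.outer(v2, w))   # additivity

# Basis property: the e_i (x) f_j are the 2*3 "matrix units", and a general element of
# V (x) W is a linear combination of them (not necessarily a single outer product).
e, f = np.eye(2), np.eye(3)
basis = [np.outer(e[i], f[j]) for i in range(2) for j in range(3)]
print(len(basis))   # 6 = dim(V (x) W)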
We will be primarily interested in tensors comprised only of V and V^*, i.e. of vectors and dual vectors. Even more, when we move to bundles, we will be interested only in tensors over T(M) and T*(M). Of course, we can canonically commute V* past V, so all our tensors might as well live in V \tensor ... \tensor V \tensor V* \tensor ... \tensor V*, for various (possibly 0) numbers of Vs and V*s. Some notation: if there are n Vs and m V*s, I will write this (tensor) product as \T^n_m(V). \T^0_0 = \R; \T^1_0 = V.
How should you write these? As birdtracks, a name which Penrose even seems to be adopting. For these, and in my (and many physicists') notation, draw vectors with upward-pointing "arms" (since we write them with raised x^i) and dual vectors with downward-pointing "legs" (indices are lowered). The order of the arms matters, as does the order of the legs, but the canonical commutation referred to in the previous paragraph is explicit. To multiply two vectors, just draw them next to each other; in general, any tensor is the sum of products of vectors, but not necessarily just a product, so in general tensors are just shapes with arms and legs.
Birdtracks are an exquisite way to keep track of what's called "tensor contraction". See, what's important about dual vectors is that they can "eat" vectors and "spit out" numbers: there is a canonical pairing from V\tensor V* = \T^1_1 \to \R. We can generalize this contraction to any tensors, if we just say which arm eats which leg. In these notes, drawing birdtracks is hard; I will use the almost-as-good notation of raised and lowered indices. We can define \T^{-1} as being basically the same as \T_1, except that it automatically contracts in tensor products; this breaks associativity.
So, our basic objects are sections of \T^n_m(T(M)), by which I mean fields of tensors. A few flavors of tensors deserve special mention.
To \T^n(V) we can impose various relationships. In particular, there are two important projection operators, Sym and Ant. There is a canonical action of S_n on \T^n(V): \pi\in S_n sends a basis element e_{i_1}\tensor...\tensor e_{i_n} to e_{\pi(i_1)}\tensor...\tensor e_{\pi(i_n)}. Extending this action linearly, we can construct Sym and Ant by
Sym(\omega) = (1/n!) \sum_{\pi\in S_n} \pi(\omega)
Ant(\omega) = (1/n!) \sum_{\pi\in S_n} \sgn(\pi) \pi(\omega)
where \sgn(\pi) is 1 if \pi is an even permutation, -1 if it is odd. These definitions, of course, also work for \T_n. These are projection operators — Ant^2 = Ant and Sym^2 = Sym — so we can either quotient by their kernels or just work in their images, it doesn't matter. Define \S^n = Sym(\T^n) and \A^n = Ant(\T^n), and similarly for lowered indices. We have symmetric and "wedge" (antisymmetric) multiplication by, e.g., \alpha\wedge\beta = Ant(\alpha\tensor\beta); each is associative. One can immediately see that, if dim(V)=k, then dim(\S^n(V)) = \choose{k+n-1}{n} and dim(\A^n(V)) = \choose{k}{n}; of course, dim(\T^n) = k^n. Of particular importance: \T^2 = \S^2 + \A^2, where I will always use "+" between vector spaces to simply mean "direct sum" (which correctly generalizes disjoint union of bases).
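(A concrete version of Sym and Ant, my own sketch: tensors in \T^n over a k-dimensional V are k x ... x k arrays, S_n permutes the slots, and the projector identities and dimension counts can be checked directly.)

import numpy as np
from itertools import permutations
from math import comb, factorial

def sign(perm):
    """Sign of a permutation, by counting inversions."""
    s, perm = 1, list(perm)
    for i in range(len(perm)):
        for j in range(i + 1, len(perm)):
            if perm[i] > perm[j]:
                s = -s
    return s

def sym(t):
    return sum(np.transpose(t, pi) for pi in permutations(range(t.ndim))) / factorial(t.ndim)

def ant(t):
    return sum(sign(pi) * np.transpose(t, pi) for pi in permutations(range(t.ndim))) / factorial(t.ndim)

k, n = 3, 2
t = np.random.randn(*([k] * n))
assert np.allclose(sym(sym(t)), sym(t))    # Sym^2 = Sym
assert np.allclose(ant(ant(t)), ant(t))    # Ant^2 = Ant
assert np.allclose(sym(t) + ant(t), t)     # T^2 = S^2 + A^2 (special to n = 2)

# dim S^n(V) = C(k+n-1, n), dim A^n(V) = C(k, n), dim T^n(V) = k^n
print(comb(k + n - 1, n), comb(k, n), k ** n)   # 6 3 9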
From now on, V will always be T(M), and I'm now interested in fields. I will start using \T, \A, and \S to refer to the spaces of fields. We will from time to time call on tensor fields in \S_n, but those in \A_n end up being more important: we can understand them as differential forms. I may in a future entry try to understand differential forms better; for now, the following discussion suffices.
To each function f\in C^\infty(M) = \T_0 = \A_0 we can associate a canonical "differential": remembering that v\in \T^1 acts as a differential operator, we can let df\in\T_1 be the dual vector (field) so that df_i v^i = v(f). Even more simply, f:M\to\R, so it has a ("matrix") derivative Df: TM\to T\R. But T\R is trivial, and indeed we can canonically identify each fiber just with \R. So df = Df composed with this projection T\R \to \R along the base (preserving only the fiber). In coordinates, df = \df/\dx^i dx^i. It's tempting to follow the physicists and write d_i = \d/\dx^i, since the set of these "coordinate vector fields" "transforms as a dual vector".
From this, we can build the "exterior derivative" d:\A_i\to\A_{i+1} by declaring that d(df) = 0, and that if \alpha\in\A_r and \beta\in\A_s, then d(\alpha\wedge\beta) = d\alpha \wedge \beta + (-1)^r \alpha \wedge d\beta. This will not be important for the rest of my discussion today; I may revisit it. But I may decide that I prefer the physicists' \d_i = \d/\dx^i which acts on tensors of any stripe and satisfies the normal Leibniz rule. We'll see.
So, what can we do with tensors? We could pick a nondegenerate positive definite element of \S_2; such a critter is called a "Riemannian metric". I won't talk about those right now (I may come back to them in a later entry). Instead, I'm interested in their cousins, symplectic forms: nondegenerate fields \omega in \A_2. By nondegenerate, I mean that for any non-zero vector y there's a vector x so that \omega(x,y)\neq 0. Thus, symplectic forms, like metrics, give isomorphisms between V and V*, and so \omega_{ij} has an inverse form \omega^{ij}\in\A^2 so that \omega_{ij}\omega^{jk} = \delta_i^k \in \T^1_1.
Some observations about symplectic forms are immediate, and follow just from linear algebra. (i) Symplectic forms only live in even dimensions. (ii) We can find local coordinates p_i and q^i so that \omega = dp_i\wedge dq^i. (iii) In 2n-dimensional space, the form \omega^{\wedge n} \in \A_{2n} is a nowhere-zero volume form, so symplectic forms only live on orientable manifolds.
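(In Darboux coordinates the three observations are statements about a single block matrix; here it is in numpy, a sketch with my own sign conventions, which may differ from other formulas in this entry by a sign.)

import numpy as np

n = 3
I, Z = np.eye(n), np.zeros((n, n))
omega = np.block([[Z, -I],      # rows/columns ordered (q^1..q^n, p_1..p_n)
                  [I,  Z]])     # omega = dp_i ^ dq^i in this basis

assert np.allclose(omega, -omega.T)          # antisymmetric
omega_inv = np.linalg.inv(omega)             # nondegenerate: the raised form omega^{ij}
assert np.allclose(omega_inv, -omega)        # in this basis omega^{-1} = -omega
print(np.linalg.det(omega))                  # ~ 1.0: omega^n is a genuine volume form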
It's observation (ii) that gives us our first way of tackling symplectic forms, because we have a canonical example: the cotangent bundle of a manifold. If dim(M) = n, then T*M is 2n-dimensional and has a canonical form \omega = dp_i \wedge dq^i, where the q^i are "position" coordinates in M and the p_i are the "conjugate momentum" coordinates in the fibers. You can show that this coordinate formula for \omega transforms covariantly with changes of coordinates, so the form is well-defined; better is an invariant geometric meaning. And we can construct one. Let v be a vector in T(T*M), based at some point x,y\in T*M, i.e. x\in M and y is a covector at x. Then \pi: T*M \to M projects down along fibers, so we can pushforward v to \pi_*(v) \in TM. Now let y act on this tangent vector. This defines a form \alpha\in T*(T*M) = \A_1 (T*M): \alpha(v_{x,y}) = y.\pi_*(v). In local coordinates, \alpha = p_i dq^i. Then we can differentiate \alpha to get \omega = d\alpha \in \A_2; one can show that d\alpha is everywhere nondegenerate.
Why do we care? Well, let's say we're solving a mechanics problem of the following form: we have a system, in which the position of a state is given by some generalized coordinates q^i, and momentum by p_i. Then the total phase space is T^*(position space). And to each state, let's assign a total energy H = p^2/2m + U(q). We're thinking of p^2/2m as the "kinetic energy" and U(q) as the "potential energy". More generally, we could imagine replacing p^2/2m by any (positive definite symmetric) a^{ij} p_i p_j / 2; of course, we ought to pick coordinates to diagonalize a^{ij}, but c'est la vie. (And, of course, a^{ij} may depend on q.) Then our dynamics should be given by
\dot{q} = p/m = \dH/\dp
\dot{p} = -\dU/\dq = -\dH/\dq
How can we express this in more universal language? A coordinate-bound physicist might be content writing \dot{q}^i = a^{ij}p_j and \dot{p}_i = -\dU/\dq^i. But what's the geometry? Well, we want this vector field X_H = (\dot{p},\dot{q})\in \T^1 = \A^1, and we have the derivative dH \in \A_1 = \T_1. It turns out that the relationship is exactly that for any other v-field Y, dH.Y = \omega(X_H,Y). I.e. dH is X_H contracted with \omega. For the physicists, letting \mu index the coordinates in T*M, we have (X_H)^\mu = \omega^{\mu\nu} \dH/\dx^\nu.
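(Here is that relationship in a few lines of numpy, a sketch only: the inputs are H and \omega, the potential U(q) = q^2/2 and the crude Euler integrator are arbitrary choices of mine, and the sign conventions are the ones that reproduce the equations above.)

import numpy as np

m = 1.0
omega = np.array([[0.0, -1.0],     # coordinates x = (q, p); omega = dp ^ dq
                  [1.0,  0.0]])
omega_inv = np.linalg.inv(omega)

def H(x):
    q, p = x
    return p * p / (2 * m) + q * q / 2          # kinetic plus potential energy

def dH(x):
    q, p = x
    return np.array([q, p / m])                 # (dH/dq, dH/dp)

def X_H(x):
    return omega_inv @ dH(x)                    # raise the index of dH with omega

x0 = np.array([1.0, 0.0])
x, dt = x0.copy(), 1e-3
for _ in range(10000):
    x = x + dt * X_H(x)                         # crude Euler flow along X_H
print(H(x0), H(x))                              # ~ equal; Euler drifts slowly upward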
This is an important principle: the Hamiltonian, which knows only the total energy of a state, actually knows everything about the dynamics of the system (provided that the system knows which momenta are conjugate to which positions).
From here, we could go in a number of directions. One physics-ish question that deeply interests me is why our universe has this physics. The conventional story is that God created a position space, and that inherent to the notion of "position" is the notion of "conjugate momentum", and thus it would be natural to create a dynamics like we have. But what's entirely unclear to me is why our physics should be second-order at all. Wouldn't it be much more natural for God to start with just a manifold of states and a vector-field of dynamics? Perhaps God felt that a function is simpler than a vector field. But that's no matter: God could have just picked a manifold with a function and any nondegenerate element of \T^2, with which to convert d(the function) into the appropriate field. For instance, we could have had a Riemannian manifold with dynamics given by "flow in the direction of the gradient of some function".
No, instead God started with a symplectic manifold. Well, that's fine; I don't begrudge the choice of anti-symmetry. But then there's a really deep mystery: why, if fundamentally our state is given by a point in some symplectic manifold, do we distinguish between position and momentum? Certainly we can always find local coordinates in which \omega = dp\wedge dq, but for God there's some rotation between p- and q-coordinates that we don't see. The symmetry is broken.
Another direction we could go is to discuss this physics from the opposite direction, and I certainly intend to do so in a future entry. As many of you probably know, alternate to the Hamiltonian formalism is a similarly useful Lagrangian formalism for mechanics. Along with its "least action" principle, Lagrangian mechanics is a natural setting in which to understand Noether's Theorem and Feynman's Sum-Over-Paths QFT. At the very least, sometime soon I will try to explain the relationship between Lagrangians and Hamiltonians; it will draw deeply on such far-afield ideas as projective geometry.
But I think that in my next entry I will pursue yet another direction. I'd like to talk about Liouville's Theorem, which tells you how to solve Hamiltonian diffeqs, by introducing a bracket between Hamiltonian functions. I hope that I will be able to then explain how this relates to QFT and its canonical commutation relations. This last step I don't fully understand yet, but I hope to: I think it's the last bit I need to know before I can understand what it means to "quantize" a classical theory.
Edit: In addition to symplectic forms being nondegenerate and antisymmetric, they must also be closed: their exterior derivatives should be zero. Most immediately, this fact assures that symplectic forms are all locally equivalent (Darboux's theorem), and it allows us to talk about the cohomology class of a symplectic form. In a more advanced setting, closedness will translate into Jacobi's identity.
22 October 2006
Negative Dimensions
Since I'm behind in my series of posts on fields, quantum or otherwise, I will instead talk today about some linear algebra, and not define most of my terms.
The category Vect of vector spaces (over generic field \R = "real numbers") nicely generalizes the category Set of sets. Indeed, there is a "forgetful" functor in which each set forgets that it has a basis. Yes, that's the direction I mean. A vector space generalizes ("quantizes") in a natural way the notion of "set": rather than having definite discrete elements — two elements in a set either are or are not the same — a vector space allows super-positions of elements. A set is essentially "a vector space with a basis": morphisms of sets are morphisms of vector spaces that send basis elements to basis elements. So our forgetful functor takes each set X to the vector space Hom(X,\R) (Hom taken in the category of sets). But, I hear you complain, Hom(-,\R) is contravariant! Yes, but in this case, where I forgot to tell you that all sets are finite and all vector spaces finite-dimensional, we can make F = Hom(-,\R) covariant by F(\phi): f \mapsto g(y) = \sum_{x\in X s.t. \phi(x)=y} f(x). Actually, of course, if I'm allowing infinite sets, then I should specify that I don't quite want X \to Hom(X,\R), but the subspace of functions that send cofinitely many points in X to zero.
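(In case the direction of that functor is confusing, here is the covariant F in a few lines of Python, a sketch with names of my own choosing: a finitely supported function on X pushes forward along \phi by summing over fibers, and this respects composition.)

def pushforward(phi, f):
    """F(phi): sum the values of f over each fiber of phi."""
    g = {}
    for x, value in f.items():
        y = phi[x]
        g[y] = g.get(y, 0.0) + value
    return g

phi = {'a': 'u', 'b': 'u', 'c': 'v'}          # a map of sets X -> Y
f = {'a': 1.0, 'b': 2.5, 'c': -1.0}           # an element of F(X), written in the basis X
print(pushforward(phi, f))                    # {'u': 3.5, 'v': -1.0}

psi = {'u': 'w', 'v': 'w'}                    # another map of sets, Y -> Z
composite = {x: psi[phi[x]] for x in phi}
assert pushforward(composite, f) == pushforward(psi, pushforward(phi, f))   # functoriality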
Anyhoo, so Set has an initial object 1 = {one element} and a terminal object 0 = {empty set}, and well-defined (up to canonical isomorphism) addition and multiplication (respectively disjoint union and cartesian product). These generalize in Vect to 1 = \R and 0 = {0}, and to direct sum and tensor product; if we identify n = "\R^n" (bad notation, because it's really n\R; I want n-dimensional space with a standard basis, so the space of column vectors), then it's especially clear that sums and products are as they should be. So Vect is, well, not quite a rig (ring without negation), because nothing is defined uniquely, but some categorified version, where all I care is that everything be defined up to canonical isomorphism (so, generically, given by a universal property).
But I can do even better. To each vector space V is associated a dual space V^* = Hom_{Vect}(V,\R), and most of the time V^{**} = V. (I need to learn more linear algebra: I think that there are various kinds of vector spaces, e.g. finite-dim ones, for which this is true, and I think that there's something like V^* = V^{***}. If so, then I should certainly always pass immediately from V to V^{**}, or some such; I really want dualizing to be involutive.) By equals, of course, I always mean "up to a canonical isomorphism". Now, V\times V^* = Hom(V,V) is rather large, but there is a natural map Trace:Hom(V,V)\to\R, and this allows us to define a particular product "." which multiplies an element v\in V with w\in V^* by v.w = Tr(v\tensor w). Then . is multi-linear, as a product ought to be, and we can thus consider V.V^* = \R. Indeed, we can imagine some object 1/V that looks like V^* — a physicist wouldn't be able to tell the difference, because their elements are the same — so that V \tensor 1/V = \R. (Up to canonical isomorphism. It's not, of course, clear which copy of V we should contract 1/V with in V\tensor V. But either choice is the same up to canonical isomorphism.) There is even a natural trace from, say, \Hom(2,4) \to 2 — take the trace of the two 2x2 squares that make up the 4x2 matrices — "proving" that 4/2 = 2.
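(Again a quick numerical sketch, with made-up names: the pairing v.w = Tr(v\tensor w) is just the evaluation of the covector w on the vector v, and the promised trace \Hom(2,4) \to 2 is the pair of traces of the two stacked 2x2 blocks.)

    import numpy as np

    v = np.array([1.0, 2.0])            # an element of V = R^2
    w = np.array([3.0, -1.0])           # an element of V^*, as a row covector
    outer = np.outer(v, w)              # v tensor w, an element of Hom(V,V)
    print(np.trace(outer), w @ v)       # both 1.0: Tr(v tensor w) = w(v)

    M = np.arange(8.0).reshape(4, 2)    # an element of Hom(2,4), i.e. a 4x2 matrix
    block_traces = np.array([np.trace(M[0:2]), np.trace(M[2:4])])
    print(block_traces)                 # [ 3. 11.]: an element of R^2, the trace Hom(2,4) -> 2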
So it seems that, well, Vect is not a division rig, but it naturally extends to one. But what about that n in "ring"? What about negative dimensions? This I don't know.
See, it's an important question. Because, consider the tensor algebra T^{.}(V) = \R + V + V\tensor V + ... — this is an \N-graded algebra of multilinear functions on V^*. This looks an awful lot like the uncategorified 1+x+x^2+..., which we know is equal to 1/(1-x). (Why? Because (1-x)(1+x+...) = 1-x+x-x^2+x^2-... = 1, since every term cancels except for the -x^\infty, which is way off the page.) Anyhoo, so we ought to write the tensor algebra as 1/(1-V).
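(To make the bookkeeping honest: if dim V = n, then dim V^{\tensor k} = n^k, so the graded dimension of the tensor algebra is an honest geometric series, \sum_k \dim(V^{\tensor k}) t^k = \sum_k n^k t^k = 1/(1 - nt). The identity T^{.}(V) = 1/(1-V) is thus at least true at the level of Poincaré series.)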
Which doesn't make any sense at all. 1-V? Well, we might as well define 1-V as dual to the tensor algebra: there should be a natural way to contract any element of 1-V with any multilinear function on V^*. But this has a much shorter algebraic expression, which ought to have Platonic meaning. So, what's a natural object that we can construct out of V that contracts (linearly) with all multilinear functions to give real-valued traces?
If we could answer this, then perhaps we could find out what -V is. How? Not, certainly, by subtracting 1=\R from 1-V. No, I suggest that whatever our proposal be, we then try it on 1-2V = (T^.(V+V))^* = 1/(\R + V+V + (V+V)\tensor(V+V) + ...), and compare. What ought to happen is that there should be some natural object W such that 1-2V = W + 1-V, and it should turn out that 1-V = 1 + W. Whatever the case is, there should be a natural operation that "behaves like +" such that 1-V + V = 1. It's certainly not standard direct sum, just like how V \times 1/V is not the standard tensor product. But it should be like it in some appropriate sense. Most necessarily, it should satisfy linearity: if v_1,v_2\in V and w_1,w_2\in W, then v_1+w_1 and v_2+w_2 \in V+W should sum to (v_1+v_2)+(w_1+w_2). And, of course, if you have the right definition, then all the rest of arithmetic should work out: 1/(-V) = -(1/V), -V = -\R \times V, (-V)\times W = -(V\times W), and, most importantly, --V = V (up to canonical isomorphism).
One can go further with such symbolic manipulation. You've certainly met the symmetric tensor algebra S^{.}(V) of multilinear symmetric tensors, and you've probably defined each graded component S^{n}(V) as V^{\tensor n} / S_n, where by "/ S_n" I mean "modulo the S_n action that permutes the components in the n-times tensor product." (If you are a physicist, you probably defined the symmetric tensors as a _subspace_ of all tensors, rather than a quotient space, but this is ok, because the S_n identification generates a projection operator Sym: \omega \to (1/n!)\sum_{\pi\in S_n} \pi(\omega), and so the subspace is equal to the quotient. At least when the characteristic of the ground field is 0.) Well, S_n looks an awful lot like n!, so the symmetric algebra really looks like 1 + V + V^2/2! + ... = e^V. Which is reasonable: we can naturally identify S^{.}(V+W) = S^{.}V\tensor S^{.}W.
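(Here is a small numpy check of the projection-operator claim in the lowest interesting degree, the symmetric square S^2(V): the symmetrizer (1/2)(id + swap) on V\tensor V squares to itself, and its trace, which is the dimension of its image, is n(n+1)/2. The setup is mine, just for illustration.)

    import numpy as np

    n = 3
    swap = np.zeros((n * n, n * n))
    for i in range(n):
        for j in range(n):
            swap[i * n + j, j * n + i] = 1.0   # the S_2 action e_i tensor e_j -> e_j tensor e_i

    Sym = 0.5 * (np.eye(n * n) + swap)         # the projection onto symmetric 2-tensors
    print(np.allclose(Sym @ Sym, Sym))         # True: Sym is a projection
    print(round(np.trace(Sym)))                # 6 = n(n+1)/2, the dimension of S^2(V)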
It's not quite perfect, though. The dimension of S^{.}V, if dim V = n, is not e^n, but 1 + n + n(n+1)/2 + n(n+1)(n+2)/6 + ...; the k'th term n(n+1)...(n+k-1)/k! only approaches n^k/k! in the limit n\to\infty. Well, so why is that the dimension? When we symmetrize v\tensor w to 1/2(vw+wv), we generically identify different tensors. But v^2 symmetrizes to itself. Baez, though, says how to think about this: when a group action does not act freely, we should think of points like v^2 as only going to "half" points. So, for example, the group 2 can act on the vector space \R in a trivial way; we should think of \R/2 as consisting of only "half a dimension".
Anyway, the point is that we can divide by groups, and this is similar to our division by (dual) vector spaces: in either case, we are identifying, in a linear way, equivalence classes (either orbits or preimages).
Now, though, it becomes very obvious that we need to extend what kinds of spaces we're considering. Groups can act linearly in lots of ways, and it's rare that the quotient space is in fact a vector space. Perhaps the physicists are smart to confuse fixed subspaces and quotients: it restricts them just to projection operators. But, for instance, if we mod out \C by 2 = complex conjugation (which is real-linear, although not complex-linear), do we get \R or some more complicated orbifold? Is there a sense in which \R/2 + \R/2 = \R, where 2 acts by negation? \R/2 is the ray, so perhaps the direct sum model works, but you don't naturally get \R, just a one-dim space? To give interesting physics, it would be nice if these operations really did act on the constituent parts of each space. And what about dividing by 3? Every field of characteristic other than 2 has a non-trivial square root of 1, namely -1, but of \R and \C only \C has nontrivial nth roots of 1. So perhaps we really should just work with Vect of \C-linear spaces. Then we can always mod out by cyclic actions, but we don't normally get vector spaces.
Of course, part of the fun of categorifying is that there are multiple categorical interpretations of any arithmetic object: 6 may be the cyclic group C_6 = C_2 \times C_3, but 3! is the symmetric group S_3, and the groups 4 and 2x2 are also unequal. But if we come up with a coherent-enough theory, we ought to be able to make interesting discussion of things like square roots: there's an important sense in which the square root of a Lorentz vector is a spinor, and we should be able to say (1+V)^{1/2} = 1 + (1/2)V + (1/2)(-1/2)V^2/2 + (1/2)(-1/2)(-3/2)V^3/3! + ....
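(And as a pure symbol-pushing sanity check on that series, nothing categorical: here is a sympy snippet showing that squaring the truncated binomial series for (1+V)^{1/2} returns 1 + V plus terms only of degree beyond the truncation.)

    import sympy as sp

    V = sp.symbols('V')
    half = sp.Rational(1, 2)
    series = sum(sp.binomial(half, k) * V**k for k in range(6))   # (1+V)^{1/2} through degree 5
    print(sp.expand(series**2))   # 1 + V + terms of degree 6 and up; degrees 2 through 5 all cancel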
Overall, the move from Set to Vect deserves to be called "quantization" — well, really quantization doesn't yield vector spaces but (complex) Hilbert spaces, so really it should be the forgetful functor Set \to Hilb. If we have a coherent theory of how to categorify from numbers to Set, then it should match our theory of how to categorify from numbers to Hilb. And, ultimately, we should be able to understand all of linear algebra as even more trivial than how we already understand it: linear algebra is simply properly-categorified arithmetic.
10 October 2006
Tangent vectors and their fields
Voice-over: "Last time, on Blogging My Classes, Blogging My Fields,"
Screen flashes with images of surfaces and atlases. Main character says something cliche (but stunning because of the background music) about the definition of the manifold. Then screen switches to the final scene: The Scalar Bundle.
Voice-over: "And now, the continuation."
Classically, the tangent bundle T(M) to a manifold M was defined by taking equivalence classes of (parameterized) curves at each point, equivalent if they're tangent there. Slightly more universally, we can take our atlas of patches, and on each patch, consider the (locally trivial) bundle of tangent spaces to \R^n, then modding out by the transition functions between patches. But there is a better, more algebraic way to develop tangent vectors, directly from the sheaf of differentiable functions.
Within the space of linear functionals on C^\infty(M), consider those that are "derivations at the point p": l:C^\infty(M)\to\R should satisfy, for all f,g, l(fg) = f(p)l(g) + g(p)l(f). Of course, derivations at points of constant functions return zero, and one can check that derivations at points don't care about the value of the function outside a nbhd of the point, by considering bump functions. Given a coordinate patch x^i, the n derivations \d/\d x^i |_p (derivative in the i'th direction, evaluated at p) are examples, and it turns out that these form a basis for the (linear) space of derivations at p. (This is not entirely obvious. In coordinates, it comes from the fact that I can write any f(x):\R^n\to\R as f(x) = f(0) + \sum g_i(x) x^i (in a small nbhd of 0): letting h_x(t) = f(tx), we have f(x) - f(0) = \int_0^1 h_x'(t) dt = \sum_i x^i \int_0^1 (\d f/\d x^i)(tx) dt, so we can take g_i(x) = \int_0^1 (\d f/\d x^i)(tx) dt.) So we have, given an n-dimensional manifold, n dimensions worth of derivations at each point.
Now, intuitively, any tangent vector gives a derivation at its basepoint, by differentiating the function "in the direction of the vector". And, intuitively, there are n dimensions worth of tangent vectors. So we can define a tangent vector at p to be a derivation at p.
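(A quick symbolic check of the definition, with everything here chosen arbitrarily for illustration: the directional derivative at a point p really does satisfy the "derivation at p" rule l(fg) = f(p)l(g) + g(p)l(f).)

    import sympy as sp

    x, y, t = sp.symbols('x y t')
    p = {x: 1, y: 2}                    # the point p
    v = (3, -1)                         # a tangent direction at p

    def l(f):
        # d/dt of f(p + t v) at t = 0: the derivative of f in the direction v, at p
        curve = f.subs({x: p[x] + t * v[0], y: p[y] + t * v[1]})
        return sp.diff(curve, t).subs(t, 0)

    f = x**2 * y
    g = sp.sin(x) + y
    print(sp.simplify(l(f * g) - (f.subs(p) * l(g) + g.subs(p) * l(f))))   # 0: the Leibniz rule at p holds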
Thus, it's clear that a vector field is exactly a derivation: a field worth of derivations, one at each point. Indeed, any derivation — an algebraically-defined object, exactly a linear operator L from C^\infty(M) to itself that satisfies the Leibniz rule L(fg) = L(f)g + fL(g) — gives us a derivation at each point: L_p(f) = L(f)(p). (And, by chasing definitions, two derivations agree iff they agree at each point.) More generally, we can talk about the sheaf of (tangent) vector fields, by asking about derivations of functions defined only on various open sets.
It's worth now mentioning cotangent vectors, and specifying some notation. Of course, to any vector space (e.g. T_p(M), the tangent vectors at p), we can define the dual space (of linear functionals). By linear algebra, if dim(V)<\infty, then the dual space has the same dimension; given a basis, we can construct a dual basis. Working now with manifolds, given any function f, I can get a cotangent field df defined by df(v) = v[f], where we think of v as a derivation. In particular, by the claim I made above about being able to write f in some local normal form, given a coordinate system x = {x^i} on a nbhd U, it's clear that the {dx^i} are a basis for the space of sections of T^*(U) as a module over C^\infty(U). (Similarly, the partials \d/\d x^i are a basis of {sections of T(U)} as a module over functions.)
Following the physicists' convention, I will usually just write p_i for the cotangent field p_i dx^i, and similarly I will usually just write q^i for the vector field q^i \d/\d x^i. (Continuing the summation convention.) This works, because dx^i \d/\d x^j = \delta^i_j, so (p_i dx^i)(q^j \d/\d x^j) = p_i q^j \delta^i_j = p_i q^i, so the dot-products work out right. This is only because I happen to be using a basis and its dual basis. Eventually, I may redefine the index conventions truly coordinate-independently, but for now let's maintain the convention that whenever we interpret our formulas in terms of coordinates, we always use dual bases for T and T^*.
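(One more numerical aside, my own illustration: if the basis of T changes by an invertible matrix A, vector components change by A^{-1} and covector components by A^T, so the pairing p_i q^i is the same number in any pair of dual bases.)

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.normal(size=(3, 3))            # a (generically invertible) change of basis

    q = np.array([1.0, 2.0, -1.0])         # components of a vector at a point
    p = np.array([0.5, -3.0, 2.0])         # components of a covector at the same point

    q_new = np.linalg.solve(A, q)          # vector components transform by A^{-1}
    p_new = A.T @ p                        # covector components transform by A^T

    print(np.dot(p, q), np.dot(p_new, q_new))   # equal: the pairing is basis-independent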
Next time, I'd like to talk more about tensors, metrics, and similar structures. In particular, I'd like to define the Lorentz group and classify its representations.
09 October 2006
A new class of entries
I think I might like to spend some time thinking about definitions in mathematical physics. What is a quantum field, for instance? Physicists usually give a slightly incoherent answer: a quantum field is a quantum particle at every point, just like a field is a number at every point. You ask them to unpack this a bit, and some might remember that there may be global — what the physicists call "topological" — issues with such a definition, but for now let's only be concerned with the local definition, where a field is a function. So what should a quantum field be?
Conveniently, I'm taking three classes right now on related questions: Differential Geometry, Geometric Methods to ODEs, and Quantum Field Theory. I would like to start a series of entries blogging those classes, and relating it back to such foundational questions. I hope to get to answers involving infinitesimals: Robinson's "Non-standard Analysis", or Kock's "Synthetic Geometry". I don't have the answers yet.
What's most important about fields is their geometric nature. Like the physicists and the classical differential geometers, I may from time to time refer to coordinates, but ultimately I'd like a coordinate-invariant picture — indeed, one without coordinates at all. I also hope to ask and answer issues about how to regularize our fields, by which I mean "how continuous should they be?" This is an extremely non-trivial question: not only is it extremely unclear how to demand that two "nearby" "quantum particles" be "similar" (we can demand as much of classical fields: for any epsilon, there should be a delta at each point so that within the delta ball at that point the fields don't vary more than epsilon; perhaps we should find the right metric on Schrodinger-quantized particles?), but the physicists don't even want to be stuck with, say, C^\infty fields. They want \delta functions to work within their formalism. And yet they adamantly refuse to consider "pathological" fields that are too "wildly varying".
Eventually, it would be nice also to understand the Lagrangian and Hamiltonian, and this almost-symmetry between position and momentum. For now, I'd like to end this entry with some basic definitions.
Manifolds: There are many equivalent definitions of a manifold. Since the physicists and classical geometers like to work with coordinates (replacing geometry-defined, invariant objects with coordinate-defined, covariant objects), I'll use the definition that mentions coordinates explicitly. A manifold is a (metrizable) topological space M with a maximal atlas — to each "small" open set U in M we assign a collection of "coordinate patches" \phi:U\to\R^n, which should be homeomorphisms, subject to some regularity condition: if \phi:U\to\R^n and \psi:V\to\R^n, then \phi\psi^{-1} should be, say, smooth wherever it's defined. In general, modifying the word manifold modifies the condition on \phi\psi^{-1}: a C^\infty manifold has that all the \phi\psi^{-1}'s are C^\infty, for example. I will generally be interested only in C^\infty (aka "smooth") manifolds, although once we understand what kinds of functions the physicists are ok with, we may change that restriction. For a manifold, I demand that the atlas be maximal in the sense that it list all possible coordinatizations consistent with the smoothness condition. It is, of course, sufficient to simply cover our space with (coherent) patches, defining the rest as all other possibilities.
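(For a concrete example of an atlas, here is a toy version of a completely standard one, the code being mine: cover the circle by two angle charts, each omitting one point; on the overlap, the transition function is a locally constant shift by 2\pi, hence smooth.)

    import numpy as np

    def phi(point):      # chart on S^1 minus (-1, 0): the angle in (-pi, pi)
        x, y = point
        return np.arctan2(y, x)

    def psi(point):      # chart on S^1 minus (1, 0): the angle in (0, 2 pi)
        x, y = point
        return np.arctan2(y, x) % (2 * np.pi)

    def transition(theta):   # psi o phi^{-1}, defined for theta in (-pi, 0) and (0, pi)
        return psi((np.cos(theta), np.sin(theta)))

    for theta in (-2.0, -1.0, 1.0, 2.0):
        print(theta, transition(theta))   # theta + 2 pi on the lower arc, theta on the upper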
So that we can generalize this definition if we need to, it would be nice to reword this definition in the language of sheaves. The god-given structure on a smooth manifold is exactly enough to tell which functions are differentiable: a sheaf is a topological space along with a ring of "smooth functions" on each open set, so that the function rings align coherently (in full glory, a sheaf is a (contravariant) functor from the category of open sets in the space to the category of commutative \R-algebras (or of whatever else your sheaf consists), along with some "local" axioms, which ultimately say that to know a function I need exactly to know it on an open cover). I probably won't use this description, largely because I don't know what other conditions I would want to put on my sheaf in order to make it into something like a smooth manifold. Clearly every manifold generates a sheaf, and I have it on good authority that if two manifolds have the same sheaf, then they are the same manifold.
So what about our most-important of objects: a field? A field is a "section" of a "bundle".
Let's start with the latter of those undefined words. To each point p\in M, we associate a (for now) vector space V_p, called the "fiber at p". And let's (for now) demand some isotropy: V_p should be isomorphic to V_q for any given p and q in M, although not necessarily canonically so. (When we move to the realm of infinite-dimensional fibers, we may demand only that the fibers be somehow "smoothly varying" — I'm not sure yet how to define this. So long as everything is finite-dimensional, the isomorphism class of a fiber is determined by an integer, and integers cannot smoothly vary, so it suffices to consider bundles where the dimension of the fibers is constant.)
There should be some sort of association between nearby fibers: locally (on small neighborhoods U) the bundle should look like U\times V. So I ought to demand that the bundle be equipped with a manifold structure, which aligns coherently with M: a bundle E is a manifold along with a projection map \pi_E : E\to M, such that the inverse image of each point is a vector space. This is the same as saying that among the coordinate patches in E's atlas, there are some of the form \Phi: \pi^{-1}(U) \to \R^{n+k}, (where, of course, n is the dimension of M and k is the dimension of each fiber) so that \Phi = (\phi,\alpha), where \phi is a coordinate patch on M and \alpha is linear on each fiber. We can naturally embed M \into E by identifying each point p\in M with (p,0) in E (where 0 is the origin of the fiber at p).
I will soon make like a physicist and forget about global issues, but I do want to provide one example of why global issues are important: the cylinder and the Möbius strip are both one-dimensional ("line") bundles over the circle. The latter has a "twist" in it: as you go around the circle, you come back with an extra factor of -1.
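(In case a toy computation helps, with the caveat that this is only the combinatorial shadow of the geometry: trivialize a line bundle over the circle on two overlapping arcs; the bundle is then determined by a sign on each of the two overlap components, and transporting a fiber vector once around the circle multiplies it by the product of those signs.)

    def holonomy(transition_signs):
        # cross each overlap component once while going around the circle
        sign = 1
        for s in transition_signs:
            sign *= s
        return sign

    cylinder = [+1, +1]
    mobius = [+1, -1]
    print(holonomy(cylinder))   # +1: you come back to the vector you started with
    print(holonomy(mobius))     # -1: the promised extra factor of -1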
So what's a section of a bundle? A (global) section is a map s:M\to E so that \pi s:M\to M is the identity, i.e. a section picks out one vector from each fiber. We will for now think of our sections as being C^\infty.
The most important kinds of fields are "scalar" fields, by which I exactly mean a function, i.e. a number at every point. I want to do this because I want to consider other spaces of fields as modules over the ring of scalar fields, so I need to be able to multiply. Of course, there are many times when I don't want a full-fledged scalar field. The potential energy, for instance, is only defined up to a constant: I will eventually need my formalism to accommodate objects that have fields as derivatives, but aren't fields themselves. Since potentials don't care about constants, we could imagine that after going around a circle we measure a different potential energy than we had to begin with, but that we never picked up any force. The string theorists, in fact, need similar objects: locally, string theory looks like (conformal) field theory on the string's worldsheet. But perhaps the string wraps around a small extra dimension? This is why in the previous paragraph I refer to "global" sections: I really ought to allow myself a whole sheaf of fields, understanding that sometimes I want to work with fields that are only defined in a local area. But the physicists are generally clever about this type of problem, so, at the risk of saying things that we might think generalize but actually don't, I'm going to restrict my attention to scalar fields.
In which case, yes, by "scalar field" I mean "function from M \to \R". "A section of M\times\R". "A number at each point". Those who prefer to start with the sheaf of scalar fields will be happy to know that, when I define tangent vectors and their relatives in the next entry, I will start with these scalar fields.
16 September 2006
a statement of belief
Pacifism is something I've struggled with since at least mid high school. When the President started making waves about Iraq, the American Left moved strongly towards an isolationist/pacifist stance, and although I was nervous about the occasional paleoconservative philosophy, I was already on the bandwagon, having felt that the President's response in Afghanistan was hasty, poorly executed, and morally questionable. At the same time, however, I was reading A Problem From Hell: America and the Age of Genocide by Samantha Power, a fantastic book by a New York Times writer that lays the blame for the 20th Century's genocides squarely at the feet of this country and its reluctance to involve itself militarily in foreign affairs.
I did, at the time, describe myself as "trying to move towards pacifism". In my case, it wasn't a question of will power, but of wrestling with the morally ambiguous issue of military humanitarian intervention.
Having grown up in a Christian society, immersed in "turn the other cheek" rhetoric, I definitely understand the appeal. It is the noble thing to do for the resource-rich. For the resource-poor, "turning the other cheek" effectively means not responding to oppression, and it is totally not clear to me, in instances of direct physical threat, when the switch from resource rich to resource poor happens.
Were I attacked, would I be able to kill someone? No. Of course not. Do I think it would be moral to do so? Probably not. Were I to watch someone rape and murder my sister, I still would probably be unable to kill them; were the choice between killing them or having them rape and murder my sister, I think that I would not be able to bring myself to kill someone. But the moral action? Probably, yes: to prevent imminent harm, murder might be valid.
More generally, I simply do not believe that retributive justice is ethical. And since it is unethical to deprive you of the right to make personal decisions about life and death (you, for instance, have the right, in my mind, to suicide), it is certainly unethical to do so as punishment. But incarceration has four uses (and, since I don't value life per se the way many people do, I see murder as essentially a complete and violent form of incarceration) --- as retributive justice, as a way of bettering people, as deterrent, and to prevent other harm --- and the last is potentially ethical (the second would be if it were effective, but it is not). I can morally justify murder for, and only for, the purpose of preventing future harm, only as a last resort, and only when "turn the other cheek" is not the correct response. Ultimately, ethical decisions do involve balancing acts.
So what about the utilitarian test of the tourist, who may either kill one captured Indian (setting the rest free), or allow all twenty to be killed? I think that either choice must be allowed as an ethical choice; I myself would be entirely unable to fire the gun. But ultimately the answer is not really either: the completely ethical action is to consult first with the Indians and ask what they want. What's unethical about the situation is that the tourist is ultimately one of the oppressors, making life and death decisions for the oppressed people. Perhaps one Indian is willing to sacrifice themselves. Perhaps they decide to draw straws. Or perhaps they decide that they would all be happier dying than knowing that they lived only because someone else died for them. It should be their decision to make.
Similarly in international affairs, if we see endemic oppression, we may, and indeed we must, involve ourselves to help the oppressed. We must be careful to do the most effective things, and this is rarely, I believe, militaristic, and we must base our decisions strongly on what the oppressed people would like us to do. (This is, of course, hard. The most oppressed people are often sub-altern.) But, when we have the resources to just stand in the way of oppression, and absorb the onslaught of violent attempts to maintain the oppression while "turning the other cheek", then we cannot justify engaging ourselves in violence.
And, yet, we often find that we do not have such resources. And then is humanitarian military aid ethical? It helped, Power says, in Kosovo. I don't know.
02 September 2006
A Categorical Definition
Categories, best described with commutative diagrams, allow for truly non-linear thinking, and yet they are usually defined linearly, similar to the way groups are usually introduced. This almost makes sense: morphisms, as one-dimensional objects, are about the most natural thing to compose linearly. And yet the power of category theory comes from the non-linear diagrams people draw, showing non-linear relationships between objects. The snake lemma, for instance, is poorly expressed and even more poorly understood if you are limited to writing words on a page.
I am, on this blog, constrained to poor, linear writing. Nevertheless, I would like to provide a definition of "category" using only visual, diagramatic ideas. Perhaps I will succeed in ASCIIing the diagrams. This definition, I hope, will ultimately be seen as providing a more basic understanding of these powerful creatures.
To begin with, a diagram is a (labeled) directed graph: it's composed of (labeled) vertices ("objects") connected by (labeled) directed edges ("arrows"). There is a natural notion of "subdiagram", formed by deleting any collection of edges, and any collection of vertices and all their edges. A category is a collection of diagrams, subject to certain rules, to be enunciated lower down. The diagrams in a category are said to commute. I will sometimes leave off labels, in which case I generally mean to refer to all (commutative) diagrams of that "shape".
Rule 0: A diagram commutes if and only if all its finite subdiagrams commute. (Thus any subdiagram of a commutative diagram commutes. In any category, the empty diagram commutes.)
One advantage of this construction is that I don't need any of that junk about "a collection of objects, and between each pair A and B of objects a set Hom(A,B) of arrows...". Instead, I can say simply that a morphism is a commutative diagram A ---> B, where I have left off the label on the arrow: I will write "f:A->B" for the very simple diagram of an arrow from A to B labeled by f, but only because I have to make it fit in this constrained formatting.
Rule 1: If two diagrams commute, then their disjoint union commutes. If diagram D contains an object label A, and E contains B, and if A ---> B commutes (morphism called f), then in the disjoint union we can connect A and B via f so that the diagram still commutes:
    DDDDDDDD           EEEEEEEE
    D      D     f     E      E
    D   A  D --------> E  B   E
    DDDDDDDD           EEEEEEEE
Often we will draw a dotted arrow in a diagram. In these notes, creating dotted arrows is too hard; I will use equal signs instead, as in ===>. A diagram with a dotted arrow means "If the diagram without the dotted arrow commutes, then there is exactly one diagram with a (labeled) arrow in place of the dotted one." Sometimes folks will write a label on the arrow, in which case they mean the name (label) that the (unique) arrow which replaces it should have.
Rule 2: For any diagram E containing a finite chain (in the picture, I draw E "surrounding" the chain, to suggest that various objects in the chain might have other arrows to and from the rest of E), we have
EEEEEEEEEEEEEEEEEEEEEEEEEEE
E      B -> ... -> C      E
E     ^             \     E
E    /               v    E
E   A ==============> D   E
EEEEEEEEEEEEEEEEEEEEEEEEEEE
So, in particular, some corollaries:

- "composition"

      A -> B -> C
      ==========>

- "associativity"

  if    B ---> C          B ---> C
        ^     ^            \     |
        |    /               \   |
        |   /                 v  v
        A                        D
  commute, then so do

           C                   B
          ^ \                 ^ \
         /   v               /   v
        A --> D             A --> D

  (of course, with all edges (consistently) labeled.)

- "identity"

         //=====\\
        ||       ||
      A <========//

  normally called "\mathbb{1}_A", I will just call this morphism "1_A". I will let you write out the "left and right identity laws" in this language; they follow from Rule 2.
However, one more rule concerning the various 1_A morphisms is necessary.
Rule 3: If a commutative diagram contains 1_A:A->A for some A, then we can maintain commutativity by replacing this subdiagram with the single object A: all arrows into and out of either copy of A in the original diagram now go into and out of the single A in the new diagram. Conversely, any object A in a diagram may be replaced by 1_A:A->A, where all arrows on A are now duplicated, and placed once on each A. I will not try to draw this, but you should.
And that, folks, is it. Rules 0 through 3 suffice to define a category.
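(Translating back into the usual notation, just as a sanity check; nothing here is new. Given f:A->B and g:B->C, Rule 2 applied to the chain A -> B -> C produces exactly one arrow A -> C making the triangle commute; call it g \circ f. Applied to the chain A -> B -> C -> D with h:C->D, the same uniqueness says that (h \circ g) \circ f and h \circ (g \circ f) both fill the dotted arrow A ===> D, so
(h \circ g) \circ f = h \circ (g \circ f).
The "left and right identity laws", f \circ 1_A = f and 1_B \circ f = f, are exactly the Rule 2 statements I promised you could write out above.)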
Of course, we should say some more, to convince you that pure diagrammatic thinking is useful. For instance, an isomorphism is a commutative diagram of the form
      --->
    A      B.
      <---
With our "dotted arrow" (except I'm using equal signs) notation, we can go on to define "universal properties". It seems to me that there should really be "universal properties" and "co-universal properties", depending on which direction the arrows go. To define these, we introduce another new symbol [], which is kindof like ==>; whereas ==> defined an arrow uniquely, [] defines objects up to isomorphism. How? Well, say we have a diagram with a box. Then we're saying that (if the rest of the diagram commutes), then there's some object X which can fill the box (i.e. a labeling for that object that makes the diagram commute) s.t. if any other Y can also fill the box, then there's a dotted arrow from Y to X. I.e.:
   DDDDDD                       DDDDDD
  DD [] DD     means that      DD X  DD     for some X,
   DDDDDD                       DDDDDD

                                DDDDDD
  and if any other Y gives     DD Y  DD ,   then
                                DDDDDD

   DDDDDDDDDDDD
  DD Y ==> X  DD
   DDDDDDDDDDDD
where the idea in the last picture is that Y and X each have all the arrows, to and from the rest of D, that the original diagram says they should have. The dual notion to a universal property is a "co-universal property", in which Y => X is replaced by "Y <= X" in the definition. If needed, I will write that as {}.
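(The standard example, spelled out in the usual notation rather than diagrammatically: the product A x B, which shows up again just below, comes with maps \pi_A : A x B -> A and \pi_B : A x B -> B, and the universal property is that for any Y with maps f : Y -> A and g : Y -> B there is exactly one map <f,g> : Y -> A x B with \pi_A \circ <f,g> = f and \pi_B \circ <f,g> = g. That "exactly one map Y -> A x B" is precisely the dotted arrow Y ==> X.)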
Now, I don't actually know of any useful (co-)universal properties that are not (co-)limits — well, I think tensor products might be one, but I don't remember how to define that — so I really ought to just define limit. And maybe I should have just started with them, I dunno.
Anyhoo, given a (commutative) diagram D, the limit of the diagram lim D is the (universal property) diagram given by D with a box added, and arrows from the added box to each object in D. Co-limits are the dual notion. Limits, like anything defined via universal property, need not exist. Some examples:
- A x B = lim ( A   B )

- A x_C B = lim of
        B
        |
        v
  A --> C

- terminal object = lim (empty diagram)

            1_A
- lim A = A ---> A, which we're considering to be equivalent to A.
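(In the category of sets, assuming as I believe that it fits this framework, these come out to the familiar constructions: A x B is the set of ordered pairs {(a,b) : a in A, b in B}; writing f : A -> C and g : B -> C for the two arrows, A x_C B = {(a,b) : f(a) = g(b)}; and the terminal object is a one-point set.)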
I will stop here. Many category theory books from around here and onward start using diagrammatic reasoning and definitions more frequently, so I refer you to them for other definitions of other objects. My goal was to give diagrammatic definitions of the most basic elements of category theory, and to suggest that categories are best thought of not as collections of objects and morphisms, but simply as collections of diagrams.
31 August 2006
Today's News
Today's headlines in the New York Times:
- Lockheed Martin got another government contract.
- Bush said something he said last week too.
- Folks post stuff online.
- Someone in Chicago wants to be mayor.
No news is good news?
28 August 2006
Young boys and a man
While looking out the window at a rainy Newark Airport and waiting for a very delayed flight, I found myself standing next to a young boy — perhaps five or six — eating a large roll of bread. I struck up a conversation, and we were soon joined by his older brother — six or seven. I let the conversation go wherever it wandered, and learned quite a lot: that their father is a pilot; that the Yankees are the best baseball team, pitching is the best position, and next year they won't use the tee until you get six strikes; that the bushes below the hotel in Hawaii with the big rooms (three balconies in the suite!) now house a favorite action figure; that the police climbing the stairs into the jet-way were probably entering the airplane, because if there were a bad guy in the terminal, the security would have caught him in the initial screening (in fact, they were there to escort a very drunk passenger, who had repeatedly opened an alarmed door, from the terminal to the hospital).
After a while, their father joined us at the window. "Tell the man next to you" — me — "what the kind of plane with the bump on top is," he asked his younger son. "I'll give you a hint: it starts Seven...."
"Um, Seven Seven?"
"No, Seven Forty-Seven."
"Seven Forty-Seven."
"And if there are [a particular kind of wing flaps]" — here my memory of the technical terms, which he used, has gone — "then it's a 747-400."
What I found most memorable about this discussion was not the ease with which we changed topics — an ease I normally associate with the uniformly brilliant kids at Mathcamp; an ease often pathologized as ADHD and ruined with drugs such as ritalin — nor the freedom with which these kids would talk to a complete stranger. What stuck with me was one particular piece of language: "Tell the man next to you..."
Those who've known me for a while may remember previous discussions I've had (though I think not here) about the different words "boy", "man", "kid", etc., which I find fascinating. I've intentionally used some throughout this entry: Mathcamp students and five-year-olds I've both described as "kids," for instance, whereas my first companion was a "young boy." I generally insist that periodicals refer to high school, and certainly college, students as "men" and "women": my freshman roommate was on the men's swim team, and in my brother's CS class there are only six women, as opposed to "boys" and "girls." Mathcampers, on the other hand, and even my housemates, I often think of as "boys and girls". Not "children," perhaps, but "kids."
What's hardest, though, is self-identity — I'm good at holding multiple contradictory beliefs about the external reality — I had never before defined myself as someone who could be a "man [standing] next to you." Perhaps, when discussing sexual and gender politics, I've identified myself as a "(suitably adjectived) man," but more often as a "male." Categories like "men who have sex with men" are so entirely foreign that they don't seem to apply to me or any of my peers. People in my socioeconomic class don't become "adults" until closer to 26, but I'm definitely no longer a "young adult." I'm a "student" or a "guy," not a "man."
One reason for my sojourn to New York was to attend a ninetieth birthday party and family reunion, where I spent some time chatting with various second cousins whom I haven't seen in ten years. My father, an older brother, is younger than his cousins, so while I played cards and board games with my fourteen-year-old cousin, the majority of "my generation" were three to ten years older than me. One announced the wonderful news of her pregnancy, making the matriarch whose birthday we were celebrating extremely happy. I'm used to my peers consisting of younger siblings and students exactly my age; I'm used to understanding those classmates only a few years older than me as significantly closer to adult, since they tend to be grad students when I'm an undergrad, or undergrads when I'm in high school.
But I'll be graduating in four months, and dreaming of my own apartment, and, eventually, house and family. I watch my fresh-out-of-college friends with their jobs in Silicon Valley, and can't help but think how similar that life is to college — they have roommates, come to campus, go on dates. They're no more "adults" than I am.
I have no trouble being "mature", or "old", or even relatively "grown up". But I'm twenty-one years old, and have a hard time thinking of myself as an "adult". Identifying as a "man" is impossible, and yet it is my current self-descriptor.
17 August 2006
Angst and graduate school
I'm taking the GREs tomorrow (today), so instead of sleeping I'm avoiding looking up the rules and instructions. To do well on tests, it's best to go in knowing the structure of both the individual questions and the test as a whole. I don't yet, because I've been procrastinating with such useful time-sinks as listening to all of these pieces (link from the most excellent TWF234, about math and music). Oh, and actually getting things done --- I've written thank-you cards, answered e-mails --- there's no better way to be truly productive than to avoid something you really, really have to do.
One of my major accomplishments was writing back to a mathematical physicist whom I had asked for advice about grad schools. My e-mail ended up doing a decent job of outlining both some of my angst and some of my intellectual excitement; I thought you might enjoy it, and I'd gladly hear your advice as well:
I'm at what I imagine to be the hardest part of grad school applications (and academic life? I'm sure there are harder things that my fantasy of the "easy life after grad school" leaves out) before actually writing them: figuring out what I want to do. This seems to come in two parts: 1. what am I interested in (and how to formulate it, and how much to formulate it or leave interests undecided as yet)? 2. and what people and departments are right for me given my interests?
I know I want to go into a career in mathematics; I want to teach calculus, and, though I enjoy the amorphous lands between math and physics, I've been ultimately happiest in math departments. At the same time, I know that the mathematics I want to study should have obvious connections with the physics: I'm happiest when I can use language and ways of thinking from physical theories, and when it's clear how the mathematical objects I'm playing with are connected to various attempts at fundamental theories. I want an anthropologist to conclude that my epistemology involves a real world that I'm studying (as opposed to those mathematicians who study platonic, nonexistent ideals). I assume that such is "mathematical physics" --- I definitely enjoy the material in John Baez's This Week's Finds. I would not be interested in studying interesting applied math such as fluid mechanics (or cryptography).
More precisely? I've been devouring This Week's Finds recently, so have been enjoying Baez's fascination with n-categories. I could happily study those for a while. Mostly as a way for me to record and inspire my own thoughts, I've been working on defining linear algebra entirely in terms of Penrose's tangle notation for tensors. I generally feel like algebraic notations, in which ideas are strung in lines, are restrictive and don't take advantage of the page.
My favorite toy is the hyperreal numbers, invented more or less by Abraham Robinson. These beasties have the power to do all of calculus, and provide actual interpretations for divergent sums, concepts like "much smaller than", and other important tools that are generally treated with intuition rather than rigor in most of math and physics. I would love to work on various projects to interpret and rigorize the mathematical footing of modern physics with this type of under-used tool. I hold that Robinson's calculus is more powerful than Cauchy's --- it's a conservative extension, so it can't prove anything that Cauchy can't, but it provides much more elementary meanings to a lot of the intuition. Vector fields really are infinitesimal, etc. The problem is that Robinson didn't get very far in constructing a user interface for his operating system. He can do Leibniz calculus, but no better than Cauchy can, and he didn't go farther. Cauchy is Windows to Robinson's Unix; I want to write Macintosh, incorporating QFT and the like.
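(For instance, the derivative becomes the standard part of an honest quotient: for any nonzero infinitesimal \epsilon, f'(x) = st( (f(x+\epsilon) - f(x)) / \epsilon ), with no limits in sight. That is the sort of "elementary meaning" I have in mind.)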
What interests me most, beyond the actual mathematics, is the methods and institutions of mathematics and physics, and bettering those. I'm fascinated by the ways people think about math and physics, and the language they use, in a normative way: I want to find ways of understanding objects that get at their meanings, and specifically by combining math and physics intuition. It seems that the physicists are much more willing to take a cavalier attitude towards rigor, instead inventing formalisms that _might_ work in order to answer hard, hands-on questions. Whereas mathematicians may be better able to think extremely abstractly and provide the rigor, thereby arriving at a deeper meaning for the physicists' doodles. I want to help the mathematicians think in terms of particles, local processes, and effective theories (not to mention in terms of two-dimensional diagrams rather than "linear" equations). So what I would actually like to do is act as a translator.
So I've gotten some of the way towards an answer to my first question. Of course my interests will change as I continue to learn more math and physics. But my second question? I need all the advice I can get.
I think that I'm a strong applicant. I've taken the undergrad Intro to String Theory, and I'll be taking QFT this year. In math, I've taken a fair amount of algebra and analysis; most of my math knowledge comes from reading (including lots of math and physics blogs) and attending (now as a counselor) Canada/USA Mathcamp, which tries to expose its students to a wide variety of graduate-level math. So I'm looking at the top: strong departments, with people doing what I'm interested in.
But which are those? And who are the people?
Are there mathematical physics journals I should be reading or glancing at, because they're interesting or because they will suggest people and places I should pursue?
If you have any other advice for an aspiring (and presumably as-yet naive) mathematician with physics envy, please do share. Thank you so much.
09 August 2006
A conventional question
I'm in the process of writing up what I understand about tensors, defining them from scratch, using only intuition and Penrose's graphical notation. Eventually, perhaps I will write a version of my notes for Wikipedia, since their current article on the subject is laughably bad. I first read about them in this post by jao at physics musings; I had started reading Penrose's most recent book, The Road to Reality: A Complete Guide to the Laws of the Universe, which explores them in some depth. I am going to shamelessly reproduce jao's picture of such diagrams, so that you have some idea what I'm talking about:
Incidentally, I wonder what the history of such doodles really is. I hear talk of "einbeins", "zweibeins" and "dreibeins", lit. one-leg, two-leg, and three-leg, and, if I knew more German, multi-legs ("mehrbeins"?), which sound like these tensorial pictures. Based on skimming the discussion here, it looks like einbeins are related, but not fully formulated. I wonder why someone would refer to an object's legs, though, unless it had legs.
Anyway, the notational question I wanted to ask was this:
We generally write "vectors" (as opposed to covectors) with raised indices, and covectors with lowered indices. This has physical significance: the Poincare group acts differently depending on whether the index is raised or lowered: on lowered indices, symmetries act by the adjoint, and so it's really a "dual" action, in the sense that it happens in the backwards order. So, although in some sense vectors and covectors are interchangeable, interpretations of diagrams are not.
Since vectors' indices are raised, Penrose proposes that a vector ought to have one "arm" (an edge coming out of the top), whereas a covector ought to have one leg. This makes sense, and closely matches how he thinks of contractions: contracting indices corresponds to drawing curves from the tops of the vectors to the bottoms of the covectors.
On the other hand, as soon as you start playing around with Penrose's diagrams — well, as soon as Josh H. started playing with them, when I introduced them to him over IM — you notice the connection between these diagrams and various ideas from quantum topology. In particular, diagrams like this look an awful lot like tangles.
This is actually no surprise. An (n,m)-tensor (one with n arms and m legs, so e.g. a vector is a (1,0)-tensor), by definition, is a map from V tensored with itself m times to V tensored with itself n times. (By convention, V tensored with itself 0 times is the ground field — no, not a generic one-dimensional vector space, because I do in fact need the special number "1". This is so that there is a natural isomorphism between "V^0 tensor W" and W.)
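(A quick sanity check in this language: a (1,1)-tensor is a map T : V -> V, i.e. an ordinary linear operator, and joining its arm to its leg gives a (0,0)-tensor, i.e. a number, namely the trace of T (in finite dimensions, \sum_i T^i_i). Likewise a vector really is a map from the ground field, v = T(1) for T : k -> V, and a covector is a map V -> k.)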
But this, then, is a problem, because this commits me to reading my morphisms as going up. But my friends who study TQFTs think of their cobordisms as going down (see, for example, the many This Week's Finds starting at Week 73, in which Baez gives a mini course on n-categories).
So clearly one of us is right, and the other is wrong. Either we should think of vectors as having a head and a leg (more than half the time I catch myself drawing them this way anyway), or we should think of cobordisms, tangles, and their cousins as transforming the bottom of the page into the top of the page.
I'm leaning towards the latter, but only because there's one more, very established, case in which this matters. In diagrams of Minkowski space, and more generally in pictures of, for example, light cones in curved space, the positive time direction is always drawn going up the page. And if our conventions are to have any sensible physical meaning, morphisms must correspond to forward time evolution.
So only typesetters and screen-renderers (and English language readers), who insist on putting (0,0) in the upper left corner of the page, have it backwards. Then again, mathematicians have known that for years.
But, of course, we do live in a democracy. If you tell me that infinitely more people and publications think time flows down, then I'll happily switch my conventions.
28 July 2006
Mathcamp High School, to within an order of magnitude
Although I've been a Mathcamp JC, I've carefully stayed away from any financial talk. I've certainly been around, so perhaps have more knowledge than most, but I'm pretty sure that any numerics I might quote are, to within my level of accuracy, publicly available.
Mathcamp tuition for the five-week summer hovers around $3000; half of this goes to the university for room and board. (For comparison, one week without scholarship at a private American university costs more than $1000.) There are 100 students, many of whom do not pay full fare; we can expect an operating budget from tuition of around 50 thousand dollars (if fully half of camp is covered by scholarships; perhaps 100 thousand is a good upper estimate). What, then, are expenses? There are roughly 20 staff, who each make, travel and housing included, between three and four thousand dollars for the summer. (This is less, per week, than many camps pay, but certainly well worth it.) That right there eats up most of the operating budget; housing for visitors (at any given time there are perhaps four former staff and perhaps four Famous Professors) already pushes us over any reasonable estimate. And, of course, visiting professors receive around $100 a day, plus travel, far below what many conferences pay. So it's possible, with squeezing, that Mathcamp breaks even. More likely, the Mathematical Foundation of America is doing its job well.
Greg, a camper, brought up one day after lunch the fantasy of a Mathcamp High School. Is it feasible? he asked. Is it good? I responded.
Mathcamp High School, under the present model, would be extremely expensive. College tuition is on par with tuition at elite boarding schools: Exeter, for instance, expects about $37 000 per year. Mathcamp successfully draws students from many socioeconomic backgrounds; it would have to be very careful to continue to do so in a yearlong model. Is MFOA up to the task?
Of course, I would not go to Mathcamp High School. Boarding schools can be extremely valuable, especially for the unfortunately many children who come from less-than-supportive families. But students with good families and good local schools should not, generally, choose boarding schools, even if they are economically feasible.
Mathcamp High School would also have to be extremely careful about keeping the culture of freedom that it currently rides to great success. Can an elite school for gifted youth avoid all quantitative grades and measurements? Can it allow students complete freedom to choose their classes and design their curricula? There are Waldorf schools that succeed. How do they do it?
One thing MHS would have to do is find young, enthusiastic, and brilliant teachers who can commit to years-long tenures. Mathcamp the summer program has a student-teacher ratio of almost five-to-one; advising groups (not all teachers advise) are around size seven. To make MHS work would require JCs and Mentors to be even more involved in helping students pick classes and put together four-year plans, assuring that they cover complete curricula.
Where would the staff come from? Undergrad and graduate students have their own academic careers to see to. But they are who you want: young and enthusiastic and able to directly participate in the culture.
Perhaps you run it at a college. Perhaps, as part of the deal allowing Mathcamp to stay on that campus for the year, the staff can enroll as full-time students. Then, especially if MHS's campus continues to move from year to year, Mathcamp can continue to bring in students from around the country.
I would gladly spend a year on staff at MHS. But I would not give up four years at Stanford to work as an MHS staff advisor, and I had said earlier that helping students plan high-school-long curricula takes staff who are there for the long haul. And what about graduate students?
MHS could no longer provide only math classes. American colleges demand that American high schools provide general liberal educations, and I'm sure that the Mathcamp model would work in other academic areas. What I'm not sure about is how tied Mathcamp's particularly dorky culture is to the math emphasis. The value at Mathcamp of sitting around in the lounge talking math, and math specifically, is astronomical: full immersion in the math jokes and culture and language cannot be replaced.
The most important question, in any discussion of extending Mathcamp from five weeks to forty, is how diminished the immediate experience would be. And how much of that is a good thing. Campers talk extremely positively about how "intense" a summer they had. Mathcamp levels of intensity are not sustainable, although some colleges come close. Is close good enough?
In the hope of providing a more year-round Mathcamp presence, I have, instead, a different suggestion. Let's hold Mathcamp more than once a year. Five weeks in the summer, yes, but also two weeks over winter break. These would most reasonably be in the American south or southwest, where it's warmer, but perhaps these two weeks should be in southern Maine, so as to end with four days at MIT's Mystery Hunt. Then another week around spring break: Mathcamp, I feel, has the bravado to encourage students to skip a week of high school if the local spring break falls on a different week from Mathcamp's. Each of these would be run, not as just a reunion, but as a mini camp: JCs and Mentors would live in dorms as RAs, fulfill all the in-loco-parentis responsibilities of camp counselors, and run activities and teach classes. This would not be a reunion but a week- or two-week-long math conference.
This I could skip school to organize. College final exam periods are largely a laugh; one can find workarounds for missing classes and assignments. I would not give up a year at Stanford for a year JCing, unless I simultaneously had access to a university's classes and undergraduate community. But I would happily — nay, eagerly — give up a few weeks.
05 July 2006
Orange Juice: it's what's for breakfast
When I went off to college, my mother strongly encouraged me not to get a credit card. I have a debit/atm card, no debt, and no credit either. Sometime, probably at the end of the summer, I'll sign up for one of those air miles cards, and put all sorts of notes-to-self in my calendar about how much to spend, and when to pay it off, and when to cancel. I need the credit history, because it won't be too long before I actually will want to borrow. And it's possible, given the right circumstances, to make money (or at least air fare) off those cards. But in my case the right circumstances include having parents pay for tuition and bail me out when needed (never more than the cost of books and board bill, which I've been covering out of pocket, although they offered to pay for them).
And they've made it very clear that when I hit grad school, so exactly a year from now, I'm responsible for paying for everything. Rent, food, etc. comes out of whatever salary and stipend I get.
One way to live very well and very cheaply, if you're willing to spend the time and energy on thinking about cooking and eating, is to always cook vegan. Or vegan plus eggs, since eggs are cheaper than soy. Or, rather, it's very easy to spend huge amounts of money on vegan products — soy milks and egg replacers and yummy, unnecessary stuff. But, if you have the palate, vegetables and beans and soy are cheaper than meat and dairy.
At school, this is a big part of how we cut costs — my quarterly board bill is less than any other eating arrangement on campus. I'll be doing half the ordering for our kitchen, and it's great to buy bulk flours. Our most expensive products are the (organic, free range) dairy: cheese and butter are expensive. They're also high fat, and especially high in saturated fat (that's how they stay solid at room temp). It's a constant challenge to try to convince the residents, many of whom have never even tried vegetarianism before, that they don't need cheese and butter to survive. I plan to wow them, early on, with vegan desserts: my brother got me a vegan cookbook that, because of its veganism, is also zero-cholesterol, almost zero-saturated fat, and generally very low fat. Silken tofu, my current favorite ingredient, goes in almost every cake, frosting, and pudding.
Poverty is one of the major causes of American obesity. Eating healthy requires resources: time, energy, education, and money. Whereas McDonalds will sell you all the calories you need in a meal for a dollar and no wait. But if you have the conveniences of, for instance, an academic life, in which the school provides medical coverage, athletic facilities, something intellectual to do, and a small amount of money, living cheaply and eating well is very easy. The trick is to be a food snob: prefer your own cooking, buy only the very best ingredients, and think carefully about what you eat. And eat vegan. And organic. And local. And, most importantly, be part of the "slow food" push. WholeFoods will happily provide expensive vegan organic premade and packaged products.
Below is my signature dessert, which I usually think of as a vegan gluten-free brownie recipe, but I'll present here as a chocolate raspberry cake (as I had it for my birthday), with commentary on how to modify. As always, check local availability before committing to any particular fresh produce — by varying the fruit, one can make a seasonal cake in almost any season.
Chocolate raspberry cake
Preheat oven 350°F (325 for gluten-free). Grease two nine-inch round cake pans (or one 9x13 pan for brownies), and, for cakes, cut parchment or wax paper into circles to exactly fit on the bottom of the pans (for easier removal), place in, and grease both sides.
In blender, combine wet ingredients until smooth:
- 1 cup (8 oz) silken tofu
- 1/2 cup raspberry (or other fruit) jam
- 1/4 cup canola oil
- 1 Tbsp vanilla
In standing mixer with paddle blade, mix dry ingredients:
- 2 cups sugar (for fudge brownies, use 3 cups)
- 1 cup unsweetened cocoa powder (for brownies, use 1 1/2 cups)
- optional: up to 1 Tbsp instant coffee powder
- 2 cups all-purpose flour (or cake flour, or tapioca flour for gluten-free; if making cake with tapioca flour, supplement with 2 tsp xanthan gum, a gluten substitute derived from bacteria, and 1/4 cup cornstarch)
- 1 tsp baking soda (for brownies, use less; baking powder also works, and has less leavening power, because the batter is already acidic)
Pour in wet ingredients. (For brownies, also add
- 3 cups (vegan) dark chocolate chips)
- 1 cup soymilk (be sure, if making gluten-free, to check the brand — Soy Dream and Almond Breeze are both safe, whereas Edensoy and Vitasoy are not)
Frosting and assembly
Tofu generally comes in 16-oz packs, and I usually use about 9 oz in this cake. So the rest, rather than trying to keep it, goes into the frosting. (In theory one would have the presence of mind to do the frosting a day ahead, so that the tofu can set. But I never do.)
Wash and clean standing mixer bowl, and fit with wire whisk. Whip
- silken tofu
- cocoa powder
- powdered sugar
- ground instant coffee
- corn starch and/or tapioca powder to thicken
For a raspberry chocolate cake, I also like to acquire fresh raspberries, and to make a raspberry syrup/glaze. This latter is very easy: in a sauce pan, heat raspberry jam with a little water until it dissolves, just before boiling (careful not to overheat and burn the sugar).
Once cakes are done, let cool 10 minutes then remove from pans and let cool completely. To assemble, place one cake face down on plate. Spread a thin layer of frosting, and cover with
- fresh raspberries, cut in half
- whole fresh raspberries
Serve, and amaze your friends, after they've commented on how moist and rich it is, by revealing its ingredients.