13 December 2007

Linear Differential Equations

In the calculus class I'm TAing, we spent some time learning how "the method of undetermined coefficients" could be used to solve linear differential equations. I have never taken a first-year differential equations class, so although I'd solved many differential equations this way, I had never really thought about such methods with any real theory behind them. My goal in this entry is to describe the method and explain why it works, using more sophisticated ideas than in a first-year class, but still remaining very elementary. I had hoped to have this written and posted weeks ago; as it is, I'm writing it while proctoring the final exam for the class.

First, let me remind you of the set-up of the problem. We are trying to solve a non-homogeneous differential equation with constant coefficients:

$$ a_n y^{(n)} + \dots + a_1 y' + a_0 y = g(x) $$

I will assume that $a_n$ is not 0; then the left-hand side defines a linear operator $D$ on the space of functions, with an $n$-dimensional kernel.

We can diagonalize this operator by Fourier-transforming: if $y = e^{rx}$, then $D[y] = a(r) e^{rx}$, where $a(r)$ is the polynomial $a_n r^n + \dots + a_1 r + a_0$. If $a(r)$ has no repeated roots, then we can immediately read off a basis for the kernel as $e^{rx}$ for $r$ ranging over the $n$ roots. If there is a repeated root, then $a'(r)$ and $a(r)$ share a common factor; $a'(r)$ corresponds to the operator

$$ E[y] = n a_n y^{(n-1)} + \dots + 2 a_2 y' + a_1 y $$

Then, since $\frac{d^k}{dx^k} [x y] = x y^{(k)} + k y^{(k-1)}$, we see that

$$ D[x y] = x D[y] + E[y] $$

so $r$ is a repeated root of $a(r)$ if and only if $e^{rx}$ and $x e^{rx}$ are zeros of $D$.
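For concreteness, a small example of my own choosing: for $y'' - 2y' + y = 0$ we have $a(r) = r^2 - 2r + 1 = (r-1)^2$, so $r = 1$ is a repeated root, and the kernel of $D$ is spanned by

$$ e^{x} \quad \text{and} \quad x e^{x}, $$

just as the discussion above predicts.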

More generally, linear differential operators with constant coefficients satisfy a Leibniz rule:

$$ D[p(x) e^{rx}] = \left( p(x) a(r) + p'(x) a'(r) + \frac{p''(x) a''(r)}{2!} + \dots + \frac{p^{(n)}(x) a^{(n)}(r)}{n!} \right) e^{rx} $$

for polynomial $p(x)$.
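As a quick check of this rule (with an illustrative choice of $D$ and $p$): take $D = \frac{d^2}{dx^2}$, so $a(r) = r^2$, and $p(x) = x^2$. Then

$$ D[x^2 e^{rx}] = \left( x^2 \cdot r^2 + 2x \cdot 2r + \frac{2 \cdot 2}{2!} \right) e^{rx} = \left( r^2 x^2 + 4rx + 2 \right) e^{rx}, $$

which is exactly what differentiating $x^2 e^{rx}$ twice by hand gives.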

Thus, our ability to solve linear homogeneous differential equations with constant coefficients depends exactly on our ability to factor polynomials; for example, we can always solve second-order equations by using the quadratic formula.
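For instance, for $y'' - y' - 2y = 0$ the quadratic formula gives

$$ a(r) = r^2 - r - 2 = 0 \implies r = \frac{1 \pm \sqrt{1 + 8}}{2} = 2 \text{ or } -1, $$

so the kernel is spanned by $e^{2x}$ and $e^{-x}$.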

But, now, what if $g(x) \neq 0$? I will assume that, through whatever conniving factorization methods we use, we have found the entire kernel of $D$. Then our problem will be solved if we can find one solution to $D[y] = g(x)$; all other solutions lie in the affine space through this function parallel to the kernel. In calculus, we write this observation as "$y_g = y_c + y_p$", where g, c, and p stand for "general", "complementary" (i.e. the solution of the homogeneous equation), and "particular", respectively.

For a general $g$, we can continue to work in the Fourier basis: Fourier transform, sending $y \mapsto \hat{y}(r)$ and hence $D[y] \mapsto a(r)\,\hat{y}(r)$, then divide by $a(r)$ and integrate to transform back. This may miss some solutions at the poles, and it is computationally difficult: the integration is as hard as factoring polynomials. For second-order equations, we can get to "just an integral" via an alternative method, by judiciously choosing how to represent functions in terms of the basis of solutions of $D[y]=0$.
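(I believe the alternative method being alluded to is the one usually called variation of parameters: for $a_2 y'' + a_1 y' + a_0 y = g$ with independent homogeneous solutions $y_1$ and $y_2$, one form of the answer is

$$ y_p = -y_1 \int \frac{y_2\, g}{a_2 (y_1 y_2' - y_1' y_2)}\, dx + y_2 \int \frac{y_1\, g}{a_2 (y_1 y_2' - y_1' y_2)}\, dx, $$

which is indeed "just an integral", or rather two of them.)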

But for many physical problems, $g(x)$ is an especially simple function (and, of course, it can always be Fourier-transformed into one).

In particular, let's say that $g(x)$ is, for example, a sum of products of exponential (and sinusoidal) and polynomial functions. I.e., let's say that $g(x)$ is a solution to some homogeneous linear constant-coefficient equation $C[g(x)] = 0$. Another way of saying this: let's say that the space spanned by all the derivatives $g$, $g'$, $g''$, etc. is finite-dimensional. If it has dimension $m$, then $g^{(m)}(x)$ is a linear combination of lower-order derivatives, and thus I can find a minimal $C$, of order $m$, so that $C[g] = 0$. By the Leibniz rule, functions with this property form a ring: when I add two functions, the dimensions of their derivative spaces at most add; when I multiply, the dimensions at most multiply. Indeed, by the earlier discussion, we have an exact description of such functions: they are precisely the linear combinations of terms $x^s e^{rx}$ ($s$ a non-negative integer, $r$ possibly complex).
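For example (an illustration of my own), take $g(x) = x e^{2x}$. All of its derivatives lie in the two-dimensional space spanned by $e^{2x}$ and $x e^{2x}$, so $m = 2$, and the minimal annihilator is $C = \left( \frac{d}{dx} - 2 \right)^2$, i.e. $C[y] = y'' - 4y' + 4y$:

$$ C[x e^{2x}] = (4 + 4x) e^{2x} - 4(1 + 2x) e^{2x} + 4x e^{2x} = 0. $$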

In any case, let's say $g$ is of this form, i.e. we have a minimal $C$ (of order $m$) such that $C[g] = 0$, and first let's assume that the kernels of $C$ and $D$ intersect only in $0$. Since $D$ commutes with $C$, it maps $\ker C$ to itself; so $D$ acts as a one-to-one linear operator on the finite-dimensional space $\ker C$, which by construction is spanned by the derivatives of $g$, and so must be onto. I.e. there is a unique $y_p(x) = b_0 g(x) + b_1 g'(x) + \dots + b_{m-1} g^{(m-1)}(x) \in \ker C$ so that $D[y_p] = g$. Finding it requires only solving a system of linear equations in the $b_i$.
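Continuing the example above, take $D[y] = y'' + y$, whose kernel (spanned by $\cos x$ and $\sin x$) meets $\ker C$ only in $0$, and $g(x) = x e^{2x}$. Writing the candidate solution in the basis $\{ e^{2x}, x e^{2x} \}$ of $\ker C$ (the same space as the span of $g$ and $g'$), say $y_p = \alpha e^{2x} + \beta x e^{2x}$, the equation

$$ D[y_p] = (5\alpha + 4\beta) e^{2x} + 5\beta\, x e^{2x} = x e^{2x} $$

forces $\beta = \tfrac{1}{5}$ and $\alpha = -\tfrac{4}{25}$, so $y_p = \left( \tfrac{x}{5} - \tfrac{4}{25} \right) e^{2x}$.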

If, however, $\ker C$ and $\ker D$ intersect nontrivially, then we will not generically be able to solve this system of linear equations. Because $C$ is minimal, $g$ is not a linear combination of fewer than $m$ of its derivatives; if $D$ sends (linear combinations of) some of those derivatives to $0$, we will never be able to hit $g$. Let me say this again. If $D$ does not act one-to-one on $\ker C$, but $g$ were in the range of this restricted operator, then $g$ would lie in a space of dimension smaller than $m$ that is closed under differentiation; thus there would be a differential operator of order lower than $m$ annihilating $g$, contradicting the minimality of $C$.
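The simplest instance of this failure (again an example of my own): $y' - y = e^x$. Here $C = D = \frac{d}{dx} - 1$, so the kernels coincide, and the candidate $y_p = b e^x$ is useless:

$$ D[b e^x] = b e^x - b e^x = 0 \neq e^x $$

no matter how $b$ is chosen.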

How can we then solve the equation? As we observed earlier, $(xg)' = g + x g'$, and more generally

$$\frac{d^k}{dx^k}[x g(x)] = x g^{(k)}(x) + k g^{(k-1)}(x)$$

Then $C[xg]$ is a linear combination of derivatives of $g$; i.e. $C[xg] \in \ker C$. The space spanned by the derivatives of $x g(x)$ contains $\ker C$ (in particular, it contains $g$), is still closed under differentiation, and is at least one dimension larger than $\ker C$. We can repeat this trick ---

$$ \frac{d^k}{dx^k}[ x^p g(x) ] = \sum_{i=0}^k \frac{k!p!}{i!(k-i)!(p-k+i)!} x^{p-k+i} g^{(i)}(x) $$

--- and eventually get a space containing $g$, closed under differentiation, and preserved by $D$. The kernel of $D$ on this larger space is still at most $n$-dimensional (since $n = \dim \ker D$ in the space of all functions), so we cannot lose much by applying $D$; more precisely, if $r$ is a root of the polynomial $a$ with multiplicity $k$ (necessarily $k \leq n$), then by the Leibniz rule $D$ sends $x^s e^{rx}$ to a nonzero multiple of $x^{s-k} e^{rx}$ plus lower powers of $x$ times $e^{rx}$. So after multiplying by $x$ at most $n$ times, every term of $g$, and hence $g$ itself, lies in the range of $D$ restricted to this space, and the system of linear equations is solvable.
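In the little example above, one factor of $x$ already suffices: trying $y_p = b\, x e^x$ in $y' - y = e^x$ gives

$$ D[b x e^x] = b (e^x + x e^x) - b x e^x = b e^x, $$

so $b = 1$ and $y_p = x e^x$.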

Of course, we can often stop well before multiplying by $x$ all $n$ times. But so long as we are only interested in functions that are zeros of constant-coefficient linear differential operators, we can always solve our differential equations. For example, every linear equation from a physics class, and almost every one in first-year calculus, is solvable with this method.

One final remark:

Differential operators compose like matrices, and the composition is again a differential operator; for constant-coefficient operators, composing corresponds to multiplying the associated polynomials. If $g$ satisfies $C[g] = 0$, and if we're trying to solve $D[y] = g$, then we might decide to solve the more general equation $CD[y] = 0$: indeed, if $D[y] = g$, then $CD[y] = C[g] = 0$. The kernel of the left-hand side is $(n+m)$-dimensional, and if we're truly gifted at factoring polynomials, we can find it directly. The solutions to our original equation must then all lie in this kernel.
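In that same small example, $C = D = \frac{d}{dx} - 1$, so $CD = \left( \frac{d}{dx} - 1 \right)^2$, and $CD[y] = 0$ reads $y'' - 2y' + y = 0$, with kernel spanned by $e^x$ and $x e^x$. Substituting the general element of this kernel back into the original equation,

$$ y = c_1 e^x + c_2 x e^x \implies y' - y = c_2 e^x = e^x \implies c_2 = 1, $$

we recover the particular solution $x e^x$ along with the homogeneous part $c_1 e^x$.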
