
Differential Forms are simpler than you’ve been told

Abstract

I’ve struggled to understand differential forms for many years. I have several books that present differential forms and the generalized Stokes theorem at various levels of sophistication, and none of them made the subject click into place in my head.

I recently had an epiphany, not from a new book but rather from reframing and untangling bits and pieces from those books.

In this post I’ll just explain the roles of the various objects that are defined in books and materials on differential forms, and how they fit together. I won’t go into the heavy stuff, because that is well covered in those materials. The complete theory requires quite a bit of algebraic machinery. This post is meant to help you through the journey.

TL;DR

The great success of differential forms is to generalize determinants and cofactor expansions via the wedge product, giving a viable theory of algebraic area, and then to use this theory to relate integrals over k-dimensional “volumes” with integrals over their (k-1)-dimensional boundary “surfaces”.
Once you understand the wedge product and the importance of multi-linearity, the exterior derivative is almost inevitable, and the rest is just book-keeping.

Red herrings and obscurity

Books on differential forms and Stokes’s Theorem pay a heavy toll to accommodate some historical baggage:

  • A smooth transition from “div, grad, curl” and the classic integral theorems
  • The $ dx $ notation in integrals

I think those are red herrings. Differential forms and generalized Stokes are clearer than the classical path. And the $ dx $ notation for integrals doesn’t really add much insight; to be honest, there is less to it than meets the eye.

The books tend to be either full-on axiomatic and algebraic, or frustratingly informal and vague, defining differential forms as “something you integrate” or as “a formalism”. Yikes, what a silly thing to say.

Forms are generalizations of determinants

Determinants are defined on square matrices. Viewed from another vantage point, they take N vectors in an N-dimensional vector space and compute the “volume” of the parallelepiped they span.

When learning the theory of determinants, you probably saw them characterized as multilinear, alternating functions, motivated by the need to measure volumes, and to detect linearly dependent sets of vectors.

A form is, like a determinant, a multilinear, alternating function into $ \mathbb{R} $. Like a determinant, a form measures K-dimensional volumes in an N-dimensional vector space, but… when K is smaller than N, something unexpected happens. A form does not work equally well for all K-dimensional volumes. A form may measure 0 for a given set of K linearly independent vectors, something that could never happen for determinants.

For example, in $ \mathbb{R}^3 $ we can define a 2-form $ \psi_{xy} $ that measures the area of the projection onto the x-y plane of the parallelogram spanned by 2 vectors.

We can define $ \psi_{xy} $ as the 2×2 determinant of the x-y components of the vectors. Or, said another way, $ \psi_{xy} $ is the minor that results from discarding the z component of the two input vectors.

$$ \psi_{xy} (a, b) = \psi_{xy} \begin{pmatrix} a_x & b_x \cr a_y & b_y \cr a_z & b_z \cr \end{pmatrix} \ = \begin{vmatrix} a_x & b_x \cr a_y & b_y \cr \end{vmatrix} \ = a_x \ b_y - b_x \ a_y $$

This function is easily seen to be bilinear and alternating. Note that we could have two linearly independent vectors $ a, b$ such that $ \psi_{xy}(a, b) = 0 $.
For example $ \begin{pmatrix} 1 \cr 1 \cr 1 \cr \end{pmatrix} $ and $ \begin{pmatrix} 1 \cr 1 \cr 2 \end{pmatrix} $ are linearly independent vectors such that $ \psi_{xy} $ takes value 0 for the pair. That makes sense: their projections onto the x-y plane span no area.
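If you want to poke at this yourself, here’s a minimal sketch in Python (the function name `psi_xy` is my own; numpy is just there for the vectors):

```python
import numpy as np

def psi_xy(a, b):
    # the 2-form psi_xy: the 2x2 minor of the x-y components
    return a[0] * b[1] - b[0] * a[1]

a = np.array([1.0, 1.0, 1.0])
b = np.array([1.0, 1.0, 2.0])

print(np.linalg.matrix_rank(np.column_stack([a, b])))  # 2: a and b are independent
print(psi_xy(a, b))                                    # 0.0: yet they span no x-y area
```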

Once you realize this, it’s easier to understand all the trouble the books will go through to prove that the space of multilinear alternating K-forms on an N-dimensional space is a vector space with dimension $ {N \choose K} $.

In $ \mathbb{R}^3 $, the space of 1-forms and the space of 2-forms both have dimension 3.
Let’s define a basis of 1-forms:

$$ \phi_x(v) = v_x \ \ \ \phi_y(v) = v_y \ \ \ \phi_z(v) = v_z $$

And a basis of 2-forms:

$$ \psi_{xy}(a, b) = \begin{vmatrix} a_x & b_x \cr a_y & b_y \cr \end{vmatrix} \ \ \ \psi_{yz}(a, b) = \begin{vmatrix} a_y & b_y \cr a_z & b_z \cr \end{vmatrix} \ \ \ \psi_{zx}(a, b) = \begin{vmatrix} a_z & b_z \cr a_x & b_x \cr \end{vmatrix} $$
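To make the dimension claim concrete, here’s a sketch (my own toy construction, assuming numpy) of a general 2-form in $ \mathbb{R}^3 $ as a linear combination of those three basis 2-forms, with a check that it is alternating:

```python
import numpy as np

def minor(a, b, i, j):
    # the 2x2 minor built from components i and j of the vectors a, b
    return a[i] * b[j] - b[i] * a[j]

def two_form(c_xy, c_yz, c_zx):
    # a general 2-form on R^3: a linear combination of the three basis 2-forms
    return lambda a, b: (c_xy * minor(a, b, 0, 1)    # psi_xy
                         + c_yz * minor(a, b, 1, 2)  # psi_yz
                         + c_zx * minor(a, b, 2, 0)) # psi_zx

omega = two_form(2.0, -1.0, 3.0)
a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])
print(omega(a, b), omega(b, a))  # opposite signs: alternating
print(omega(a, a))               # 0.0 on a repeated argument
```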

Note that we said that the space of K-forms on an N-dimensional space is a vector space with dimension $ {N \choose K} $. As a special case, when K equals N, the space of K-forms is 1-dimensional. Which we knew, since the determinant is fully characterized as the alternating multilinear form such that $ \det(e_1, \dots, e_n) = 1 $ (where the $ e_i $ are the basis vectors of $ \mathbb{R}^n $).

The wedge product generalizes determinant cofactors

When computing determinants, one will generally use the Laplace expansion, aka cofactor expansion, to decompose a determinant into a sum of smaller determinants.

Working downward on dimension, we compute a 3×3 determinant as a weighted sum of 2×2 minors.

Or, working upward on dimension: given that we have 2×2 determinants in the x-y plane to measure surface area, how could we build 3×3 determinants to compute volumes with the extra z dimension?

We’ve defined the basis 1-form $ \phi_z $ and basis 2-form $ \psi_{xy} $. We would expect the “volume” in 3-space to be Area × Height, something like $ \text{vol}(a, b, c) = \phi_z(c)\ \psi_{xy}(a, b) $.

However, note that the expression above could give a positive volume for a degenerate 2-dimensional parallelepiped.

Let’s choose:

$$ a = (1, 0, 1)^T,\ \ b= (0, 1, 0)^T,\ \ c=a=(1, 0, 1)^T $$ and observe: $$ \phi_z(c) = 1,\ \ \psi_{xy}(a, b) = 1 \implies \text{vol}(a, b, c) = 1 $$
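Here’s that failure in code form (a tiny sketch, assuming numpy):

```python
import numpy as np

phi_z  = lambda v: v[2]
psi_xy = lambda a, b: a[0] * b[1] - b[0] * a[1]

a = np.array([1.0, 0.0, 1.0])
b = np.array([0.0, 1.0, 0.0])
c = a.copy()  # degenerate: c repeats a

print(phi_z(c) * psi_xy(a, b))                    # 1.0: naive Area x Height is fooled
print(np.linalg.det(np.column_stack([a, b, c])))  # 0.0: the true volume is degenerate
```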

The failure here is that we combined $ \phi_z $ with $ \psi_{xy} $ but did not get an alternating form.

The wedge product can combine the forms $\psi_{xy}$ and $\phi_z$ to create an alternating 3-form, which is none other than the determinant in 3-space. Let’s look at a 3×3 determinant’s cofactor expansion.

$$ \begin{vmatrix} a_x & b_x & c_x \cr a_y & b_y & c_y \cr a_z & b_z & c_z \cr \end{vmatrix} = a_z \begin{vmatrix} b_x & c_x \cr b_y & c_y \cr \end{vmatrix} - b_z \begin{vmatrix} a_x & c_x \cr a_y & c_y \cr \end{vmatrix} + c_z \begin{vmatrix} a_x & b_x \cr a_y & b_y \cr \end{vmatrix} $$

Notice that $$ a_z = \phi_z(a) $$ and $$ \begin{vmatrix} b_x & c_x \cr b_y & c_y \cr \end{vmatrix} = \psi_{xy}(b, c) $$

So, $$ \det(a, b, c) $$ $$ = \phi_z(a)\ \psi_{xy}(b, c) - \phi_z(b)\ \psi_{xy}(a, c) + \phi_z(c)\ \psi_{xy}(a, b) $$ $$ = \phi_z \wedge \psi_{xy}\ (a, b, c) $$

With the previous example $a, b, c$ defining a degenerate 2-dimensional parallelepiped, we’d get:

$$ \det(a, b, c) $$ $$ = \phi_z(a)\ \psi_{xy}(b, c) - \phi_z(b)\ \psi_{xy}(a, c) + \phi_z(c)\ \psi_{xy}(a, b) $$ $$ = 1 \times \psi_{xy}(b, c) - 0 \times 0 + 1 \times \psi_{xy}(a, b) $$ $$ = 1 \times \psi_{xy}(b, a) + 1 \times \psi_{xy}(a, b) $$ $$ = 0 $$ since $ a = c $ and $ \psi_{xy} $ is alternating.
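A quick random test of that identity, assuming numpy (the helper `wedge` below just spells out the three-term shuffle):

```python
import numpy as np

phi_z  = lambda v: v[2]
psi_xy = lambda a, b: a[0] * b[1] - b[0] * a[1]

def wedge(a, b, c):
    # the cofactor-style shuffle that makes phi_z ^ psi_xy alternating
    return (phi_z(a) * psi_xy(b, c)
            - phi_z(b) * psi_xy(a, c)
            + phi_z(c) * psi_xy(a, b))

rng = np.random.default_rng(0)
a, b, c = rng.standard_normal((3, 3))
print(wedge(a, b, c))                             # phi_z ^ psi_xy (a, b, c)
print(np.linalg.det(np.column_stack([a, b, c])))  # the same number
```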

In determinants, one ends up computing a sum made up of products of permuted vector components.

$$ \det(A) = \sum_{\sigma \in S_n} \Big( \text{sign}(\sigma) \prod_{i=1}^{n} a_{i,\sigma_i}\Big) $$ (ref: Wikipedia)
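The Leibniz formula translates directly into code; here’s a naive sketch in pure Python (exponential time, for illustration only):

```python
from itertools import permutations
from math import prod

def sign(perm):
    # (-1)^(number of inversions) is the sign of the permutation
    return (-1) ** sum(perm[i] > perm[j]
                       for i in range(len(perm))
                       for j in range(i + 1, len(perm)))

def det(A):
    # sum over all permutations of signed products of entries
    n = len(A)
    return sum(sign(p) * prod(A[i][p[i]] for i in range(n))
               for p in permutations(range(n)))

print(det([[1, 2], [3, 4]]))                   # -2
print(det([[2, 0, 0], [0, 3, 0], [0, 0, 4]]))  # 24
```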

The wedge product generalizes this: given two alternating forms, it shuffles them in such a way that the result is also an alternating form. And the crazy thing is, by this simple insistence on getting another multilinear alternating function, we end up with a viable and consistent theory of algebraic volume.

The proofs and computations around the wedge product are a bit of a slog, but knowing that they’re just repeating the determinant’s trick of permuting components, it all makes more sense.

Differential forms are just form fields

A vector field is an assignment of a vector to each point in a space. A differential form is an assignment of a form to each point in a space. That’s it, there’s nothing more there. Differential forms should be called form fields. And that’s what I’ll call them for the remainder of this post.

We already know a 1-form field: the derivative. In the general setting of functions on vector spaces, the derivative of a function $ f: V \to \mathbb{R} $ at a point $ x_0 $ is the linear map that approximates the function near $ x_0 $.

$$ Df\vert_{x_0}(h) \approx f(x_0 + h) - f(x_0) $$

The $ \vert_{x_0} $ notation expresses that $ Df $ has a dependency on $x_0$, while the “linear” argument is $h$. By convention we’ll say that $f$ itself is a 0-form, i.e. it takes no “linear” argument. So, $Df$ takes 1 more linear argument than $f$.
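A numeric sanity check of that approximation (my own toy $f$, assuming numpy):

```python
import numpy as np

def f(x):
    # a 0-form: a scalar function on R^3
    return x[0] * x[1] + np.sin(x[2])

def Df(x, h):
    # the derivative at x as a 1-form: the gradient of f paired with h
    grad = np.array([x[1], x[0], np.cos(x[2])])
    return grad @ h

x0 = np.array([1.0, 2.0, 0.5])
h = 1e-6 * np.array([1.0, -1.0, 2.0])
print(Df(x0, h))          # the linear approximation
print(f(x0 + h) - f(x0))  # the actual difference: agrees to first order
```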

In several books there is a long preparation leading to the definition of a differential form. Really, it’s nothing much on top of forms. And the term differential forms makes them sound all too fancy. I don’t see anything intrinsically “differential” about them.

Integrals in the small are approximately forms

In the general setting of functions on vector spaces, the derivative of a function at a point is the linear map that best approximates the function in a small neighborhood of the point.

In the same spirit, we can examine the integral of a function $ f $ in a very small parallelepiped around a point $ p$, spanned by k vectors. Let’s call the parallelepiped $ S_p$, and the vectors that span it $v_1, \dots, v_k $. If the function $ f $ is well behaved, we can consider it approximately constant in $ S_p$. Then by analogy to single-variable calculus, we could say the value of the integral on $ S_p$ is $ f(p) $ times the volume of $ S_p $. We know how to compute volumes: forms.

For a small enough parallelepiped $ S_p $ around $ p $ then, the integral of $ f$ would be:

$$ \int_{S_p} f \approx f(p)\ \omega(v_1, \dots, v_k) $$

for some k-form $ \omega $. Note that $ f(p)\ \omega $ is also a k-form, so in the small, the integral is a k-form.

This point especially explains why forms are useful to the theory of integration. While the derivative is the linear function that approximates a given function locally, a form field approximates integrals over k-volumes locally.
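A brute-force check of this, in 2-space for simplicity (everything here is my own toy setup, assuming numpy): integrate $f$ over a small parallelogram at $p$ spanned by $v_1, v_2$, and compare with $f(p)\ \psi_{xy}(v_1, v_2)$.

```python
import numpy as np

f = lambda x: np.exp(x[0]) * np.cos(x[1])

p  = np.array([0.3, 0.7])
v1 = 1e-3 * np.array([1.0, 0.0])
v2 = 1e-3 * np.array([0.3, 1.0])

# brute-force integral over the parallelogram p + s v1 + t v2, with s, t in [0, 1]
s, t = np.meshgrid(np.linspace(0, 1, 200), np.linspace(0, 1, 200))
pts = p[:, None] + np.outer(v1, s.ravel()) + np.outer(v2, t.ravel())
area = v1[0] * v2[1] - v2[0] * v1[1]  # psi_xy(v1, v2), positive here
print(f(pts).mean() * area)           # the integral, computed the hard way
print(f(p) * area)                    # f(p) times the form: nearly equal
```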

Forms dictate how domains of integration count for the integral

We’ve seen the basis forms $\phi_x$ and $\psi_{xy} = \phi_x \wedge \phi_y$. We’d like to integrate them.

In n-space, even the simplest integral needs to choose a direction. Looking at the integral as the limit of a sum, let’s take a line $L$ starting at a point $p$ as the domain of integration.

Let’s integrate $f \phi_x$ over $L$ by decomposing $L$ into a series of segments $l_m$. The function $f$ is evaluated at points of the form $ x_m = p + l_1 + \dots + l_m $.

$$ \int_L f \phi_x \approx \sum_{m=1}^{M} f(x_m) \phi_x(l_m) $$

Imagine $L$ is a vertical line, that is $l_m = c_m e_y$.

Then the terms $\phi_x(l_m) = c_m \phi_x(e_y) = 0$ all vanish, and the integral is 0.

Now imagine $L$ is horizontal and $l_m = c_m e_x$. Then $\phi_x(l_m) = c_m \phi_x(e_x) = c_m $, and

$$ \sum_{m=1}^{M} f(x_m) \phi_x(l_m) = \sum_{m=1}^{M} f(x_m) c_m $$

And this looks very much like a plain old integral from single-variable calculus.

The form $\phi_x$ dictates that the vertical component of the domain of integration is ignored, and that if our domain of integration is “parallel to” $\phi_x$, then the integral is just a plain old integral.
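Here’s the whole story as a sketch in Python (the segment-summing helper is my own):

```python
import numpy as np

phi_x = lambda v: v[0]

def integrate_along(f, p, step, M):
    # approximate the integral of f phi_x as a sum of f(x_m) phi_x(l_m)
    # over M equal segments l_m = step
    total, x = 0.0, np.array(p, dtype=float)
    for _ in range(M):
        x = x + step
        total += f(x) * phi_x(step)
    return total

f = lambda x: x[0] ** 2
# horizontal line from (0, 0) to (1, 0): a plain old integral of x^2, about 1/3
print(integrate_along(f, (0, 0), np.array([1e-3, 0.0]), 1000))
# vertical line: phi_x kills every segment, so the integral is 0
print(integrate_along(f, (0, 0), np.array([0.0, 1e-3]), 1000))
```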

In the rest of this post we only need to consider domains of integration that are perpendicular or parallel to the form being integrated.

The exterior derivative is … what you’d expect

Having established that in the small, integrals are approximately forms, and that forms can be combined into higher-dimensional forms, the exterior derivative is almost low hanging fruit.

Let’s remember that, for a function $ f: V \to \mathbb{R} $,

$$ Df\vert_{x_0}(h) \approx f(x_0 + h) - f(x_0) $$

We’re going to invent a new operation, the exterior differential $d$. Its first “rule” is for functions $ f: V \to \mathbb{R} $, i.e. for 0-forms. The exterior differential of a 0-form is the same as the derivative:

$$ df\vert_{x_0}(h) = Df\vert_{x_0}(h) \approx f(x_0 + h) - f(x_0) $$

Let’s imagine we have a k-form field. The general k-form field is a linear combination of basis k-forms, and as we saw already, there are $ {n \choose k} $ such basis k-forms in a space of dimension n.

$$ \omega(v_1, \dots, v_k) = \sum_{i=1}^{n \choose k} f_i(x_1, \dots, x_n)\ \phi_{\sigma^i_1, \dots, \sigma^i_k} (v_1, \dots, v_k) $$ where each multi-index $ \sigma^i $ picks out $k$ of the $n$ coordinates.

Note that the functions $ f_i $ are defined on the ambient vector space with dimension $n$, NOT $k$.

We lose no generality by looking at a single summand: $$ f_i(x_1, \dots, x_n)\ \phi_{\sigma_1, \dots, \sigma_k} (v_1, \dots, v_k) $$ For convenience, let’s just call this $f(x)\ \phi(v_1, \dots, v_k)$.

Now, a differential or exterior derivative ought to be found by taking differences at points that are close, just as with the derivative of a function:

$$ f(x+h) \phi(v_1, \dots, v_k) - f(x) \phi (v_1, \dots, v_k) $$ $$ = \Big[ f(x+h) -f(x)\Big]\ \phi(v_1, \dots, v_k) $$ $$ \approx df\vert_x(h)\ \phi(v_1, \dots, v_k) $$

The combination takes one more input vector, $h$, on top of $v_1, \dots, v_k$. So, we’re combining the forms $ df\vert_x $ and $ \phi $. Let’s aim to get an alternating form as a result. Given that $\phi$ is a k-form, the result will be an alternating (k+1)-form. We can get this by using the wedge product. We define:

$$ \boxed{ d(f\phi)\vert_x = df\vert_x \wedge \phi } $$

It’s important; that’s why it gets a frame.
Let’s go back to this part: $$ f(x+h) \phi(v_1, \dots, v_k) - f(x) \phi (v_1, \dots, v_k) $$

That’s a mini-integral at point $ x+h $, and a mini-integral at point $ x $ with negative sign. Both are k-forms so they measure k-volumes. $ d(f\phi)\vert_x $ is a (k+1)-form, so it’s a mini-integral measuring (k+1)-volumes.

Using the $\vert_x$ notation to clarify the arguments we’re interested in:

$$ \boxed{ d(f\phi)\vert_x(h, v_1, \dots, v_k) \approx f\vert_{x+h} \phi(v_1, \dots, v_k) - f\vert_x \phi (v_1, \dots, v_k) } $$

Bringing it all together … stopping before Stokes

The last equation smacks of the fundamental theorem of calculus. Let’s corroborate this.

We’re going to pick a very simple domain of integration: an $(n-1)$-dimensional base $B = a_1\ e_1\times \dots \times a_{n-1}\ e_{n-1}$, with height given by $ H\ e_n $. The domain of integration is the box $S_p = B \times H\ e_n$ at point $p$. Let’s divide the height into $m$ small increments $h=\frac{H}{m}\ e_n$.

$$ \sum_{l=1}^{m} d(f\phi)\vert_{p+(l-1)h}(h, v_1, \dots, v_k) $$

using the formula from last section…

$$ \approx \sum_{l=1}^m \Big[f\vert_{p+lh} \phi(v_1, \dots, v_k) - f\vert_{p+(l-1)h} \phi(v_1, \dots, v_k)\Big] $$ $$ = f\vert_{p+H e_n} \phi(v_1, \dots, v_k) - f\vert_{p} \phi(v_1, \dots, v_k) $$

As happens in single-variable calculus, the sums of differences have a telescoping effect, and collapse to a single subtraction. One would suspect: $$ \int_{S_p} d(f\phi) = \int_{(p+H e_n)\times B} f\phi - \int_{p\times B} f\phi $$

Let’s prove it.

$$ d(f \phi_1 \wedge \dots \wedge \phi_{n-1}) = \Big( \sum_{j=1}^n \frac{\partial}{\partial x_j} f\ \phi_j \Big) \wedge \phi_1 \wedge \dots \wedge \phi_{n-1} $$ $$ = \frac{\partial}{\partial x_n} f\ \phi_n \wedge \phi_1 \wedge \dots \wedge \phi_{n-1} = (-1)^{n-1} \frac{\partial}{\partial x_n} f\ \phi_1 \wedge \dots \wedge \phi_n $$

In the first line, every term with $j \ne n$ vanishes, because it repeats one of the $\phi_i$ across the wedge, and a wedge with a repeated factor is 0. Note also that reordering the $\phi_i$ as we have done above incurs a possible change of sign. Each transposition of two 1-forms across $\wedge$ adds a sign reversal. Hence the $(-1)^{n-1}$ factor in the equation above.
Much as I commented that getting div, curl, grad is a red herring, it is satisfying to compute $$ d(f \phi_x + g \phi_y) = (\frac{\partial g}{\partial x} - \frac{\partial f}{\partial y})\ \phi_x\wedge\phi_y$$ just so easily. Yes, I know, that was implicitly done in dimension 2. If you did it in dimension 3, $\partial z$ and $\phi_z$ would enter the picture, and you’d get the curl.
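We can check that curl-flavored computation numerically, with $df$ taken by finite differences (the functions and points are my own toy choices, assuming numpy):

```python
import numpy as np

eps = 1e-6

def d_of(f, phi, x, u, v):
    # (df ^ phi)(u, v) = df(u) phi(v) - df(v) phi(u), df via finite differences
    df = lambda w: (f(x + eps * w) - f(x)) / eps
    return df(u) * phi(v) - df(v) * phi(u)

f = lambda x: np.sin(x[0]) * x[1]       # coefficient of phi_x
g = lambda x: x[0] ** 2 + np.cos(x[1])  # coefficient of phi_y
phi_x = lambda w: w[0]
phi_y = lambda w: w[1]

x = np.array([0.4, 1.3])
u = np.array([1.0, 2.0])
v = np.array([-1.0, 0.5])

lhs = d_of(f, phi_x, x, u, v) + d_of(g, phi_y, x, u, v)
curl = 2 * x[0] - np.sin(x[0])            # dg/dx - df/dy at x
rhs = curl * (u[0] * v[1] - v[0] * u[1])  # times psi_xy(u, v)
print(lhs, rhs)                           # agree up to finite-difference error
```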

OK, back to our proof: $$ \int_{S_p} d(f\phi_1 \wedge \dots \wedge \phi_{n-1}) = (-1)^{n-1} \int_{S_p} \frac{\partial}{\partial x_n} f \phi_1 \wedge \dots \wedge \phi_n $$

$$ = (-1)^{n-1} \int_{B} \Big( \int_p^{p+H e_n} \frac{\partial}{\partial x_n} f \phi_n \Big) \ \phi_1 \wedge \dots \wedge \phi_{n-1} $$

$$ = (-1)^{n-1} \int_{B} \Big( f\vert_{p+H e_n} - f\vert_p \Big)\ \phi_1 \wedge \dots \wedge \phi_{n-1} $$ $$ = (-1)^{n-1} \Big( \int_{B} f\vert_{p+H e_n}\ \phi_1 \wedge \dots \wedge \phi_{n-1} - \int_{B}f\vert_p\ \phi_1 \wedge \dots \wedge \phi_{n-1} \Big) $$

$$ = (-1)^{n-1} \Big( \int_{(p+H e_n)\times B} f\ \phi_1 \wedge \dots \wedge \phi_{n-1} - \int_{p\times B}f\ \phi_1 \wedge \dots \wedge \phi_{n-1} \Big) $$

We thus get our proof of a sort of “Fundamental Theorem of Calculus”, but we’re not quite there yet.

These two integrals are across the $(n-1)$-dimensional base $B$, and their opposite signs show their opposite orientations. We can think of them as the integral of $f\phi_1 \wedge \dots \wedge \phi_{n-1}$ across the lid and base of the domain $S_p$. And, noting that $\phi_1 \wedge \dots \wedge \phi_{n-1}$ will be 0 for all the other “walls” of $S_p$, we can say that the two summands compute the total integral of

$f\phi_1 \wedge \dots \wedge \phi_{n-1}$

across the oriented boundary of $S_p$. The boundary will incorporate book-keeping for that $(-1)^{n-1} $ we encountered above.

$$ \int_{S_p} d(f\phi) = \int_{(p+H e_n)\times B} f\phi - \int_{p\times B} f\phi = \int_{\partial S_p} f\phi $$
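A numeric sanity check of this identity in the smallest case, $n = 2$ with $\phi = \phi_x$, where $ d(f\phi_x) = -\frac{\partial f}{\partial y}\ \phi_x \wedge \phi_y $ (the rectangle and the function are my own toy choices, assuming numpy):

```python
import numpy as np

f  = lambda x, y: np.sin(x) * y ** 2  # the 0-form coefficient
fy = lambda x, y: 2 * np.sin(x) * y   # its partial df/dy

# rectangle S = [0, 1] x [0, 2], sampled at cell midpoints
n = 1000
xs = (np.arange(n) + 0.5) / n * 1.0
ys = (np.arange(n) + 0.5) / n * 2.0
dx, dy = 1.0 / n, 2.0 / n

X, Y = np.meshgrid(xs, ys)
lhs = np.sum(-fy(X, Y)) * dx * dy  # integral of d(f phi_x) = -df/dy phi_x ^ phi_y over S

# boundary: phi_x ignores the vertical sides; the bottom enters +, the lid enters -
rhs = np.sum(f(xs, 0.0)) * dx - np.sum(f(xs, 2.0)) * dx
print(lhs, rhs)  # both come out to -4 (1 - cos 1), approximately -1.8388
```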

We’ve only done one term $f\ \phi_{\sigma_1} \wedge \dots \wedge \phi_{\sigma_k}$ while there are ${N \choose K}$ such terms; what remains is just keeping your notation straight, and is well handled by books and materials on forms and Stokes.

It’s interesting to realize that the fundamental part of this result, and the fundamental part of Stokes’s Theorem as done with forms, is the definition of the exterior derivative.

Once we got here:

$$ \boxed{ d(f\ \phi_1 \wedge \dots \wedge \phi_{n-1}) = \frac{\partial}{\partial x_n} f\ \phi_n \wedge \phi_1 \wedge \dots \wedge \phi_{n-1} } $$

everything else was simple follow-through.

Michael Spivak writes this in Calculus on Manifolds:

Stokes’ theorem shares three important attributes with many fully evolved major theorems:

  1. It is trivial.
  2. It is trivial because the terms appearing in it have been properly defined.
  3. It has significant consequences.

There’s more… elsewhere

There are more pieces to the narrative, which you can find in the books. Majorly missing from this post:

  • integrating the general form with its ${N \choose K}$ terms
  • generalized “cubes” and their boundaries in N-space via chains
  • integrating over non-rectangular domains via pullbacks
  • a proper discussion of orientation
  • defining exterior differentiation in a coordinate-free way

I haven’t found a single book that made everything click, and several that are popular left me frustrated, so I’ll ignore them. However, here are three books I recommend:

  • Harold Edwards Advanced Calculus: A Differential Forms Approach
  • Serge Lang Undergraduate Analysis
  • Bamberg & Sternberg A Course in Mathematics for Students of Physics

Edwards and Bamberg & Sternberg are both introductory and try to motivate things. Lang gives a quick intro without much motivation and gets to Stokes with ease and no fuss.

Honorable mention to Michael Spivak’s Calculus on Manifolds and A Comprehensive Introduction to Differential Geometry. They are a bit uneven, in that Spivak just goes full-steam on multilinear algebra without much in the way of motivation, but then, they do get to the point fast, and Spivak knows how to tell a story with math.

Godspeed.