Lost Among Notes


Writing programs to clarify calculus

You have to hand it to mathematicians, they create enticing notation. Already in basic calculus there are symbols like ∫, d/dx and lim, and their myriad combinations in complicated layouts. I remember looking at my father’s physics and math textbooks when I had just started learning calculus in high school, and imagining all sorts of magical powers that would be unlocked by the arcane symbols.

Mathematics has a lot of flexibility in terms of language too. Spoken languages can be used, and then there are meta-mathematical languages like set theory, logic or category theory.

And yet, I see cases where mathematical notation and exposition feel clunky or naïve or even timid, and where formulation using a computer language with its constraints can be comparatively clearer.

The derivative is a what?

In calculus, linguistic contortions are made when introducing the derivative. One begins with the derivative of a function at a point. Then the derivative function is introduced by extending to all points.

This is from the Wikipedia page on the derivative:

If f is a function that has a derivative at every point in its domain, then a function can be defined by mapping every point x to the value of the derivative of f at x. This function is written f’ and is called the derivative function or the derivative of f.

Quite a mouthful. The paragraph defines the derivative function, and with this wording, which is fairly standard, there are as many derivative functions as there are functions1. And yet we talk informally of “the derivative” as if there were only one.

What do we mean when we talk of “the derivative”? What kind of object is it? In college-level math one might see mentions of “the derivative operator”, sometimes represented with a D so that Df = f'. You can see this in some linear algebra / differential equations texts, where “the derivative” is a useful object to manipulate.

The intuitive idea one talks about when one talks of “the derivative” is an operation that can be applied to a function and returns another function:

derivative(sin) = cos
derivative(exp) = exp
derivative(f) = f'

In any programming language with support for function values, we can easily define a function which does exactly that.

The derivative as a function written in Go

All the code shown in this post is available on its GitHub page.

Let’s write a plausible definition of the derivative in Go. The code in this section is not what one would do for a mathematical system; the point of writing it is to show the expressive power of a mainstream language, in this case Go.
Don’t worry about the details. Just try to get the gist of what the code says.

Since Go is a strongly typed language, let’s create a type to represent functions of a real variable:

type realfunc func(float64) float64

Using floating point numbers to represent real numbers is inherently inaccurate, so this code won’t be winning any prizes. Again, my focus in this section is on the linguistic corset imposed by programming languages.
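As a quick illustration of that inaccuracy, here is the classic floating point surprise, written in Go:

```go
package main

import "fmt"

func main() {
	// 0.1 and 0.2 have no exact binary representation,
	// so their float64 sum is not exactly 0.3.
	var a, b float64 = 0.1, 0.2
	fmt.Println(a+b == 0.3)    // prints false
	fmt.Printf("%.17f\n", a+b) // prints 0.30000000000000004
}
```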

The starting point of the derivative is to compute the rate of change of a function around an input. Easy:

func rateOfChange(f realfunc, x, h float64) (float64, error) {
    if h == 0 {
        return 0, errors.New("h cannot be zero")
    }
    return (f(x+h) - f(x)) / h, nil
}

Notice that rateOfChange is a function, and it takes another function as a parameter. Many a mathematician would quibble here, but seriously, is there any doubt that this is a valid function? rateOfChange returns two values: a number and an error. In the normal case where the h parameter is sensible, the error is nil, meaning there is no error.

The derivative of a function at a point is the limit of the rate of change as the perturbation (the h argument in our code above) tends to zero. With our approach using floating point numbers we can’t compute the exact limit2, but we can compute the rateOfChange with progressively smaller values of h, and call it done when the value stabilizes. We’re going to use a rough criterion: we’re done when the rate of change varies less than 0.000001 on consecutive iterations.

func derivativeAt(f realfunc, x float64) (float64, error) {
    var limit float64 = 0
    for h := 0.1; h > 0.00000000001; h = h / 2 {
        rate, _ := rateOfChange(f, x, h)
        if math.Abs(limit-rate) < 0.000001 {
            return rate, nil
        }
        limit = rate
    }
    return 0, fmt.Errorf("function not differentiable at %f", x)
}

The details in the code above are not that important, but note that derivativeAt, like rateOfChange, returns an error argument. We can use it to signal that a function is not differentiable at a point, which happens if the rates computed on smaller and smaller perturbations don’t stabilize. Let’s see an example:

func main() {
    beast := func(x float64) float64 {
        if x == 0 {
            return 0
        }
        return x * math.Sin(1/x)
    }
    _, err := derivativeAt(beast, 0)
    fmt.Printf("derivativeAt(beast, 0) → %s\n", err.Error())
}

Which prints out:

derivativeAt(beast, 0) → function not differentiable at 0.000000

We’re ready to define the derivative function:

func derivative(f realfunc) realfunc {
    return func(x float64) float64 {
        rate, _ := derivativeAt(f, x)
        return rate
    }
}

We are creating a new function as the value returned by the derivative function. Not every programming language allows creating functions on the fly, but it’s not a rarity either. Several general-purpose languages can do this, for example JavaScript, Go, Scheme, and all the functional languages.

In derivative we are discarding the second value returned by derivativeAt (in Go we use an underscore _ to ignore a value). By doing this we ignore that a function may fail to be differentiable at a point. In a code review this would get some raised eyebrows, and with good reason. We could easily do something smarter, but in math one often hand-waves these details away. We have no problem saying that 1/x is the derivative of log(x), and only if pressed do we add that this is only true for values of x greater than zero (of course!!)

There you have it. We have defined the derivative as a function that takes a real function as input, and returns another real function.

Let’s test this code to make sure we’re not deluded. We create some smoke tests:

func main() {
    fmt.Printf("exp(1) → %f\n", math.Exp(1))
    expPrime := derivative(math.Exp)
    fmt.Printf("expPrime(1) → %f\n", expPrime(1))
    fmt.Printf("cos(π) → %f\n", math.Cos(math.Pi))
    sinPrime := derivative(math.Sin)
    fmt.Printf("sinPrime(π) → %f\n", sinPrime(math.Pi))
}

Executing this code will print the following:

exp(1) → 2.718282
expPrime(1) → 2.718282
cos(π) → -1.000000
sinPrime(π) → -1.000000

Go is a simple general-purpose programming language, but many mathematicians, with all the expressive freedom at their disposal, are more conservative than Go in their use of functions. The sad thing is, the modern mathematical definition of a function would fully justify calling the derivative operation a function.3

The derivative of vector functions

While the linguistic clunkiness around “the derivative” in basic calculus is only minor, in multivariate calculus the story gets more complicated. Since there is no division operator in vector spaces, the “rate of change” approach is abandoned in favor of the “best linear approximation”.

This is from Lang’s Undergraduate Analysis:

If f is differentiable at every point x of U, then we say that f is differentiable on U. In that case the derivative f' is a map f' : U → L(E,F) from U into the space of linear maps L(E,F).

f' is a very different kind of function from f, and you cannot graph them side by side. In fact you can't really graph f' at all; you need to resort to tricks like level curves, or to talking of the gradient vector, which is really the row matrix representing the linear map, but whatever.

Let’s write some code. First let’s create types to represent vectors, vector functions, and linear maps. Linear maps are one kind of vector function, but we’re going to keep track of them separately for clarity:

type vector struct {
    X float64
    Y float64
    Z float64
}

type vectorFunc func(vector) float64

type linearMap func(vector) float64

A limitation of these definitions is that we’re only looking at 3D vectors, and only functions mapping those vectors to real numbers. For the general study of functions from N-space to M-space we’d need more sophisticated code, and Go may not be the easiest language for that.

The derivative of a function at a point is the linear map that best approximates the function at that point.
For small h we have f(x+h) - f(x) ≈ L(h) where L, the derivative at x, is a linear map.
What does the linear map L “do” when h isn’t small? It’s a linear projection, in the simple everyday way we make linear projections when estimating our arrival time at the office, assuming the traffic keeps behaving as it has in the last few minutes.

Since L is linear, L(k × h) = k × L(h), and since it’s the derivative, f(x+h) - f(x) ≈ L(h) for small values of h. Very roughly, for an arbitrary h, we could compute the linear projection like so:

  1. scale h down, say multiplying it by k = 0.000001 → g = k×h
  2. compute f(x + g) - f(x)
  3. multiply the result by the inverse of k, in this case 1,000,000

We can easily code this in Go, but before we do that we need to define a few operations to handle vector objects, since Go cannot natively sum or scale them.

func sumVect(a, b vector) vector {
    return vector{
        X: a.X + b.X,
        Y: a.Y + b.Y,
        Z: a.Z + b.Z,
    }
}

func scaleVect(k float64, a vector) vector {
    return vector{
        X: k * a.X,
        Y: k * a.Y,
        Z: k * a.Z,
    }
}

func lengthVect(a vector) float64 {
    return math.Sqrt(a.X*a.X + a.Y*a.Y + a.Z*a.Z)
}

We’re ready to write the linear projection code, adding the iterative computation just as we did in derivativeAt:

func linearProjectionAt(f vectorFunc, x vector) linearMap {
    return func(h vector) float64 {
        var limit float64 = 0
        hLength := lengthVect(h)
        for k := 0.1; k*hLength > 0.00000000001; k = k / 2 {
            g := scaleVect(k, h)
            projection := (f(sumVect(x, g)) - f(x)) / k
            if math.Abs(limit-projection) < 0.0000001 {
                return limit
            }
            limit = projection
        }
        return 0
    }
}

Note that we’re not using an error argument to indicate non-differentiability like we did in derivativeAt. We should have, but we’re going to ignore the errors anyway, as we did with derivative.

What we’re calling the linear projection is usually called the directional derivative in math books, though I never liked that definition. I only really grasped what the directional derivative means while writing this post.

We’re almost there: the derivative ought to take a function as input and generate a new function that assigns the linear projection map to each point. We’re going to define a form field as any function that assigns a linear map to each point.

type formField func(x vector) linearMap

func derivative3D(f vectorFunc) formField {
    return func(x vector) linearMap {
        return linearProjectionAt(f, x)
    }
}

Just as we did for real functions, let’s write some test code to see if we’re out of our minds.

func main() {
    sphereF := func(a vector) float64 {
        return a.X*a.X + a.Y*a.Y + a.Z*a.Z
    }
    spherePrime := derivative3D(sphereF)
    fmt.Printf("spherePrime(1, 0, 0)(1, 0, 0) → %f\n",
        spherePrime(vector{X: 1})(vector{X: 1}))
    fmt.Printf("spherePrime(1, 0, 0)(0, 1, 0) → %f\n",
        spherePrime(vector{X: 1})(vector{Y: 1}))
    fmt.Printf("spherePrime(1, 0, 0)(0, 0, 1) → %f\n",
        spherePrime(vector{X: 1})(vector{Z: 1}))
}

Executing this code will print:

spherePrime(1, 0, 0)(1, 0, 0) → 2.000000
spherePrime(1, 0, 0)(0, 1, 0) → 0.000000
spherePrime(1, 0, 0)(0, 0, 1) → 0.000000

Yes, the derivative is a weird object: a function that returns a function at each point. If you read the above results between the lines, you’ll see that the row matrix representing spherePrime at (1, 0, 0) is (2, 0, 0), which is expected, since the Jacobian matrix of the sphereF function is (2x, 2y, 2z).
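That row-matrix reading can be made mechanical: apply a linear map to the three basis vectors and collect the results. The gradient helper below is my own name for this, not code from the repository; to keep the snippet standalone it uses a hand-written linear map (the exact derivative of sphereF at (1, 0, 0)) rather than the numerically computed one:

```go
package main

import "fmt"

type vector struct {
	X float64
	Y float64
	Z float64
}

type linearMap func(vector) float64

// gradient recovers the row matrix representing a linear map
// by evaluating it on the three basis vectors.
func gradient(L linearMap) vector {
	return vector{
		X: L(vector{X: 1}),
		Y: L(vector{Y: 1}),
		Z: L(vector{Z: 1}),
	}
}

func main() {
	// The exact derivative of sphereF at (1, 0, 0): L(h) = 2·h.X.
	L := func(h vector) float64 { return 2 * h.X }
	fmt.Println(gradient(L)) // prints {2 0 0}
}
```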

Writing the code forces us to confront that the derivative is more complicated than just returning a gradient vector, even if that is a convenient image.

The conceptual difficulty of functions

The code we’ve been writing, creating functions on demand and taking functions as parameters, confuses many beginning programmers, just like recursion does. I’m not sure what separates the programmers who are comfortable with this type of functional programming from those who aren’t, but perhaps the former are more mathematically inclined, or more willing to accept abstraction.

The mathematical concept of function has had a long evolution. In the times of Euler and Lagrange, a function was an expression, a formula involving polynomials, quotients, exponentials and such.

The modern concept with its set-theoretic trappings was developed because mathematicians were starting to study objects that deserved to be called functions but were not writable neatly as formulas.

In the modern definition, a function is a tabulation where no two entries are allowed to have the same value in the left column:

x     exp(x)
0     1
1     e
-1    1/e

We’re fully justified in calling the derivative a function:

f     derivative(f)
1     0
x     1
exp   exp
sin   cos

In basic calculus of a real variable, we take the derivative of a function of a real variable, and we get another function of a real variable. In general multi-variate calculus, we take the derivative of a general vector function, and we get a function that returns another function on each point.
To be fair, this is complicated to grapple with. The surprising bit, for me, is that the notation and prose around this in math books is so often more convoluted than it needs to be, and that a language as limited as Go can be liberating.


  1. assuming we have restricted ourselves to the set of differentiable functions. ↩︎

  2. it is possible to get exact results for a limit computation in a program, in the same way a person would do it manually: by symbolic manipulation. A program for derivatives or limits would not be representing “the variable” x as a floating point number or any kind of number really. ↩︎

  3. a function from a set A to a set B is a collection of pairs of the form (a, b) where no two pairs can have the same first element. ↩︎