Writing programs to clarify calculus
You have to hand it to mathematicians: they create enticing notation.
In basic calculus already there are ∫ and ∂ and their myriad combinations in complicated layouts.
I remember looking at my father's physics and math textbooks when I had just started learning calculus in high school, and imagining all sorts of magical powers that would be unlocked by the arcane symbols.
Mathematics has a lot of flexibility in terms of language too. Spoken languages can be used, and then there are metamathematical languages like set theory, logic or category theory.
And yet, I see cases where mathematical notation and exposition feel clunky or naïve or even timid, and where formulation using a computer language with its constraints can be comparatively clearer.
The derivative is a what?
In calculus, linguistic contortions are made when introducing the derivative. One begins with the derivative of a function at a point. Then the derivative function is introduced by extending to all points.
This is from the Wikipedia page on the derivative:
If f is a function that has a derivative at every point in its domain, then a function can be defined by mapping every point x to the value of the derivative of f at x. This function is written f’ and is called the derivative function or the derivative of f.
Quite a mouthful. The paragraph defines the derivative function, and with this wording, which is fairly standard, there are as many derivative functions as there are functions^{1}. And yet we talk informally of “the derivative” as if there were only one.
What do we mean when we talk of “the derivative”? What kind of object is it?
In college-level math one might see mentions of "the derivative operator", sometimes represented with a D so that Df = f'.
You can see this in some linear algebra / differential equations texts, where
“the derivative” is a useful object to manipulate.
The intuitive idea one talks about when one talks of “the derivative” is an operation that can be applied to a function and returns another function:
derivative(sin) = cos
derivative(exp) = exp
derivative(f) = f'
In any programming language with support for function values, we can easily define a function which does exactly that.
The derivative as a function written in Go
All the code shown in this post is available on its GitHub page.
Let’s write a plausible definition of the derivative in Go.
The code in this section is not what one would do for a mathematical
system; the point of writing it is to show the expressive power of a mainstream
language, in this case Go.
Don’t worry about the details. Just try to get the gist of what the
code says.
Since Go is a strongly typed language, let’s create a type to represent functions of a real variable:
type realfunc func(float64) float64
Using floating point numbers to represent real numbers is inherently inaccurate, so this code won’t be winning any prizes. Again, my focus in this section is on the linguistic corset imposed by programming languages.
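As a tiny illustration of that inaccuracy (a standalone sketch, separate from the calculus code):

```go
package main

import "fmt"

func main() {
	// Floating point sums are not exact: 0.1 and 0.2 have no exact
	// binary representation, so their sum is not exactly 0.3.
	a, b := 0.1, 0.2
	fmt.Println(a+b == 0.3) // false
	fmt.Println(a + b)      // 0.30000000000000004
}
```

(Note that the variables matter: a constant expression like `0.1+0.2 == 0.3` would be evaluated by the Go compiler with exact arbitrary precision and come out true.)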
The starting point of the derivative is to compute the rate of change of a function around an input. Easy:
func rateOfChange(f realfunc, x, h float64) (float64, error) {
	if h == 0 {
		return 0, errors.New("h cannot be zero")
	}
	return (f(x+h) - f(x)) / h, nil
}
Notice that rateOfChange
is a function, and it takes another function as a
parameter. Many a mathematician would quibble here, but seriously, is there
any doubt that this is a valid function?
rateOfChange returns two values: a number and an error. In the normal case, where the h parameter is sensible, the error is nil, meaning there is no error.
The derivative of a function at a point is the limit of the rate of change as
the perturbation (the h
argument in our code above) tends to zero.
With our approach using floating point numbers we can’t compute the exact
limit^{2},
but we can compute the rateOfChange
with progressively smaller
values of h
, and call it done when the value stabilizes. We’re going to
use a rough criterion: we’re done when the rate of change
varies less than 0.000001 on consecutive iterations.
func derivativeAt(f realfunc, x float64) (float64, error) {
	var limit float64 = 0
	for h := 0.1; h > 0.00000000001; h = h / 2 {
		rate, _ := rateOfChange(f, x, h)
		if math.Abs(limit-rate) < 0.000001 {
			return rate, nil
		}
		limit = rate
	}
	return 0, fmt.Errorf("function not differentiable at %f", x)
}
The details in the code above are not that important, but note that
derivativeAt
, like rateOfChange
, returns an error argument. We
can use it to signal that a function is not differentiable at a point, which
happens if the rates computed on smaller and smaller perturbations don’t
stabilize. Let’s see an example:
func main() {
	beast := func(x float64) float64 {
		if x == 0 {
			return 0
		}
		return x * math.Sin(1/x)
	}
	_, err := derivativeAt(beast, 0)
	fmt.Printf("derivativeAt(beast, 0) → %s\n", err.Error())
}
Which prints out:
derivativeAt(beast, 0) → function not differentiable at 0.000000
We’re ready to define the derivative function:
func derivative(f realfunc) realfunc {
	return func(x float64) float64 {
		rate, _ := derivativeAt(f, x)
		return rate
	}
}
We are creating a new function as the value returned by the derivative
function.
Not every programming language allows creating functions on the fly, but it's not a rarity either. Several general-purpose languages can do this, for example JavaScript, Go, Scheme, and all the functional languages.
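A minimal sketch of what "creating functions on the fly" means, using a hypothetical makeAdder helper:

```go
package main

import "fmt"

// makeAdder builds a new function at run time: each call returns a
// closure that remembers its own value of n.
func makeAdder(n int) func(int) int {
	return func(x int) int { return x + n }
}

func main() {
	add3 := makeAdder(3)
	fmt.Println(add3(4)) // 7
}
```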
In derivative we are discarding the second value returned by derivativeAt (in Go we use an underscore _ to ignore a returned value).
By doing this we ignore that a function may fail to be differentiable at a
point. In a code review this would get some raised eyebrows, and with good
reason. We could easily do something smarter, but in math one often handwaves
these details away. We have no
problem saying that 1/x
is the derivative of
log(x)
, and only if pressed adding that this is only
true for values of x
greater than zero (of course!!)
There you have it. We have defined the derivative
as a function that
takes a real function as input, and returns another real function.
Let’s test this code to make sure we’re not deluded. We create some smoke tests:
func main() {
	fmt.Printf("exp(1) → %f\n", math.Exp(1))
	expPrime := derivative(math.Exp)
	fmt.Printf("expPrime(1) → %f\n", expPrime(1))

	fmt.Printf("cos(π) → %f\n", math.Cos(math.Pi))
	sinPrime := derivative(math.Sin)
	fmt.Printf("sinPrime(π) → %f\n", sinPrime(math.Pi))
}
Executing this code will print the following:
exp(1) → 2.718282
expPrime(1) → 2.718282
cos(π) → -1.000000
sinPrime(π) → -1.000000
Go is a simple general-purpose programming language, yet many mathematicians, with all the expressive freedom at their disposal, are more conservative than Go in their use of functions. The sad thing is, the modern mathematical definition of a function would fully justify calling the derivative operation a function.^{3}
The derivative of vector functions
While the linguistic clunkiness around “the derivative” in basic calculus is only minor, in multivariate calculus the story gets more complicated. Since there is no division operator in vector spaces, the “rate of change” approach is abandoned in favor of the “best linear approximation”.
This is from Lang’s Undergraduate Analysis:
If f is differentiable at every point x of U, then we say that f is differentiable on U. In that case the derivative f' is a map f' : U → L(E,F) from U into the space of linear maps L(E,F).
f' is a very different kind of function from f, and you cannot graph them side by side. In fact you can't really graph f' at all; you need to resort to tricks like level curves, or to talking of the gradient vector, which is really the row matrix representing the linear map, but whatever.
Let's write some code. First let's create types to represent vectors, vector functions, and linear maps. Linear maps are one kind of vector function, but we're going to keep track of them separately for clarity:
type vector struct {
	X float64
	Y float64
	Z float64
}

type vectorFunc func(vector) float64

type linearMap func(vector) float64
A limitation of these definitions is that we're only looking at 3D vectors, and only at functions mapping those vectors to real numbers. For the general study of functions from N-space to M-space we'd need more sophisticated code, and Go may not be the easiest language for that.
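For the curious, one plausible way to generalize is slice-based vectors, trading compile-time safety for flexibility; vec and vecFunc are hypothetical names, not used elsewhere in this post:

```go
package main

import "fmt"

// A sketch of more general types: vectors as slices of arbitrary
// length, and functions from N-space to M-space as functions between
// slices. Dimension mismatches would surface at run time rather than
// being caught by the type system.
type vec []float64

type vecFunc func(vec) vec

func main() {
	// a function from 2-space to 3-space
	f := vecFunc(func(v vec) vec {
		return vec{v[0] + v[1], v[0] * v[1], v[0] - v[1]}
	})
	fmt.Println(f(vec{2, 3})) // [5 6 -1]
}
```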
The derivative of a function at a point is the linear map that best approximates the function at that point. For small h we have f(x+h) - f(x) ≈ L(h), where L, the derivative at x, is a linear map.
What does the linear map L "do" when h isn't small? It's a linear projection, in the simple everyday way we make linear projections when estimating our arrival time at the office, assuming the traffic behaves as it has in the last few minutes.
Since L is linear, L(k × h) = k × L(h), and since it's the derivative, f(x+h) - f(x) ≈ L(h) for small values of h.
Very roughly, for an arbitrary h, we could compute the linear projection like so:

- scale h down, say multiplying it by k = 0.000001 → g = k×h
- compute f(x + g) - f(x)
- multiply the result by the inverse of k, in this case 1,000,000
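The three steps can be sketched directly for a single fixed k; roughProjection is a hypothetical name, and the post's actual code refines k iteratively instead of fixing it:

```go
package main

import "fmt"

type vector struct{ X, Y, Z float64 }

// roughProjection follows the three steps above for one small k:
// scale h down to g = k×h, take the difference f(x+g) - f(x), and
// scale the result back up by 1/k.
func roughProjection(f func(vector) float64, x, h vector) float64 {
	const k = 0.000001
	g := vector{k * h.X, k * h.Y, k * h.Z}
	shifted := vector{x.X + g.X, x.Y + g.Y, x.Z + g.Z}
	return (f(shifted) - f(x)) / k
}

func main() {
	sphere := func(a vector) float64 {
		return a.X*a.X + a.Y*a.Y + a.Z*a.Z
	}
	// projection of x²+y²+z² at (1,0,0) along (1,0,0): close to 2
	fmt.Printf("%f\n", roughProjection(sphere, vector{X: 1}, vector{X: 1}))
}
```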
We can easily code this in Go, but before we do that we need to define a few
operations to handle vector
objects, since Go cannot natively sum or scale
them.
func sumVect(a, b vector) vector {
	return vector{
		X: a.X + b.X,
		Y: a.Y + b.Y,
		Z: a.Z + b.Z,
	}
}

func scaleVect(k float64, a vector) vector {
	return vector{
		X: k * a.X,
		Y: k * a.Y,
		Z: k * a.Z,
	}
}

func lengthVect(a vector) float64 {
	return math.Sqrt(a.X*a.X + a.Y*a.Y + a.Z*a.Z)
}
We're ready to write the linear projection code, adding to it the iterative computation just like we did in derivativeAt:
func linearProjectionAt(f vectorFunc, x vector) linearMap {
	return func(h vector) float64 {
		var limit float64 = 0
		hLength := lengthVect(h)
		for k := 0.1; k*hLength > 0.00000000001; k = k / 2 {
			g := scaleVect(k, h)
			projection := (f(sumVect(x, g)) - f(x)) / k
			if math.Abs(limit-projection) < 0.0000001 {
				return limit
			}
			limit = projection
		}
		return 0
	}
}
Note that we're not using an error value to signal non-differentiability like we did in derivativeAt. We should have, but we're going to ignore the errors anyway, as we did in derivative.
What we're calling the linear projection is generally called the directional derivative in math books, but I never liked that definition; I only really grasped what the directional derivative means while writing this post.
We’re almost there: the derivative ought to take a function as input and generate a new function that assigns the linear projection map to each point. We’re going to define a form field as any function that assigns a linear map to each point.
type formField func(x vector) linearMap

func derivative3D(f vectorFunc) formField {
	return func(x vector) linearMap {
		return linearProjectionAt(f, x)
	}
}
Just as we did for real functions, let’s write some test code to see if we’re out of our minds.
func main() {
	sphereF := func(a vector) float64 {
		return a.X*a.X + a.Y*a.Y + a.Z*a.Z
	}
	spherePrime := derivative3D(sphereF)
	fmt.Printf("spherePrime(1, 0, 0)(1, 0, 0) → %f\n",
		spherePrime(vector{X: 1})(vector{X: 1}))
	fmt.Printf("spherePrime(1, 0, 0)(0, 1, 0) → %f\n",
		spherePrime(vector{X: 1})(vector{Y: 1}))
	fmt.Printf("spherePrime(1, 0, 0)(0, 0, 1) → %f\n",
		spherePrime(vector{X: 1})(vector{Z: 1}))
}
Executing this code will print:
spherePrime(1, 0, 0)(1, 0, 0) → 2.000000
spherePrime(1, 0, 0)(0, 1, 0) → 0.000000
spherePrime(1, 0, 0)(0, 0, 1) → 0.000000
Yes, the derivative is a weird object: a function that returns a function at each point. If you read the above results between the lines, you'll see that the row matrix representing spherePrime at (1, 0, 0) is (2, 0, 0), which is expected since the Jacobian matrix of the sphereF function is (2x, 2y, 2z).
Writing the code forces us to confront that the derivative is more complicated than just returning a gradient vector, even if that is a convenient image.
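To make that concrete: evaluating the derivative's linear map on the three basis vectors is exactly how one recovers the gradient row matrix. A self-contained sketch, using a plain finite difference in place of linearProjectionAt, and with gradientOf as my own hypothetical name:

```go
package main

import "fmt"

type vector struct{ X, Y, Z float64 }

// gradientOf recovers the row matrix of the derivative at x by
// evaluating a finite-difference approximation of the linear map on
// the three basis vectors. (linearProjectionAt from the post could be
// used in place of the inline difference quotient.)
func gradientOf(f func(vector) float64, x vector) vector {
	const k = 0.000001
	at := func(h vector) float64 {
		shifted := vector{x.X + k*h.X, x.Y + k*h.Y, x.Z + k*h.Z}
		return (f(shifted) - f(x)) / k
	}
	return vector{
		X: at(vector{X: 1}),
		Y: at(vector{Y: 1}),
		Z: at(vector{Z: 1}),
	}
}

func main() {
	sphere := func(a vector) float64 {
		return a.X*a.X + a.Y*a.Y + a.Z*a.Z
	}
	g := gradientOf(sphere, vector{X: 1})
	// expected: close to (2, 0, 0), matching the Jacobian (2x, 2y, 2z)
	fmt.Printf("gradient at (1, 0, 0) ≈ (%f, %f, %f)\n", g.X, g.Y, g.Z)
}
```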
The conceptual difficulty of functions
The code we've been writing, creating functions on demand and taking functions as parameters, confuses many beginning programmers, just like recursion does. I'm not sure where the divide lies between the programmers who are comfortable with this type of functional programming and those who aren't; perhaps the comfortable ones are more mathematically inclined, or more willing to accept abstraction.
The mathematical concept of function has had a long evolution. In the times of Euler and Lagrange, a function was an expression, a formula involving polynomials, quotients, exponentials and such.
The modern concept, with its set-theoretic trappings, was developed because mathematicians were starting to study objects that deserved to be called functions but could not be written neatly as formulas.
In the modern definition, a function is a tabulation where no two entries are allowed to have the same value on the left column:
| x  | exp(x) |
|----|--------|
| 0  | 1      |
| 1  | e      |
| -1 | 1/e    |
| …  | …      |
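A Go map makes this concrete for a finite table: it is literally a set of pairs, and the language enforces that no key (left-column entry) appears twice. The values below are truncated by hand:

```go
package main

import "fmt"

func main() {
	// A finite chunk of the exp table, written directly as pairs.
	// Map keys are unique by construction, which is precisely the
	// constraint in the modern definition of a function.
	exp := map[float64]float64{
		0:  1,
		1:  2.718281828,
		-1: 0.367879441,
	}
	fmt.Println(exp[0]) // 1
}
```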
We’re fully justified in calling the derivative a function:
| f   | derivative(f) |
|-----|---------------|
| 1   | 0             |
| x   | 1             |
| exp | exp           |
| sin | cos           |
| …   | …             |
In basic calculus of a real variable, we take the derivative of a function of a
real variable, and we get another function of a real variable.
In general multivariate calculus, we take the derivative of a general vector
function, and we get a function that returns another function on each point.
To be fair, this is complicated to grapple with.
The surprising bit, for me, is that the notation and prose around this in math
books is so often more convoluted than it needs to be, and that a language as
limited as Go can be liberating.

1. Assuming we have restricted ourselves to the set of differentiable functions. ↩︎
2. It is possible to get exact results for a limit computation in a program, in the same way a person would do it manually: by symbolic manipulation. A program for derivatives or limits would not represent "the variable" x as a floating point number, or any kind of number really. ↩︎
3. A function from a set A to a set B is a collection of pairs of the form (a, b) where no two pairs can have the same first element. ↩︎