Posts Tagged ‘calculus’

Integration by parts

April 19, 2013

How did loving the ground-up toenails of bisexuals get an interior designer to take up geology? Simple, he went from noting decor to what the core denotes by being into grated bi-parts.

I don’t really get why this XKCD is funny.

But here is a picture explaining integration by parts:

The area of the entire rectangle is uv, and it is made of two parts we integrate, so


uv = \int \!u\, \text{d}v + \int\! v\,\text{d}u


and therefore


\int \! u \,\text{d}v = uv - \int\! v \,\text{d}u


Also, take \text{d}(uv) = \text{d}(\int \!u \,\text{d}v + \int \!v\,\text{d}u) and you find


\text{d}(uv) = u \,\text{d}v + v\,\text{d}u,


which is the product rule.
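The formula is easy to check numerically. Here is a minimal sketch, assuming the particular choice u = x and v = \sin x on the interval from 0 to 2 (my example, not something taken from the picture). With definite limits, the uv term becomes the change in uv between the endpoints. The helper name `midpoint_integral` is made up for illustration.

```python
import math

def midpoint_integral(f, a, b, n=100_000):
    """Approximate the integral of f from a to b with the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

a, b = 0.0, 2.0

# Assumed example: u = x and v = sin x, so du = dx and dv = cos x dx.
integral_u_dv = midpoint_integral(lambda x: x * math.cos(x), a, b)
integral_v_du = midpoint_integral(lambda x: math.sin(x), a, b)
uv_change = b * math.sin(b) - a * math.sin(a)   # uv evaluated between the endpoints

print(integral_u_dv)              # left side:  integral of u dv
print(uv_change - integral_v_du)  # right side: uv minus integral of v du
```

The two printed numbers should agree up to the small error of the midpoint rule.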

Why is the integral of 1/x equal to the natural logarithm of x?

December 17, 2011

The title of this post asks a question that many calculus students find befuddling. Here I’ll give some geometric intuition behind it. I leave small logical gaps to avoid cheating the reader of the pleasure of their discovery.

One essential feature of logarithms is that they make a multiplication problem equivalent to an addition problem, by which I mean

\ln(ab) = \ln(a) + \ln(b)

Meanwhile, \int\frac{1}{x}\mathrm{d}x is usually thought of geometrically as the area underneath a curve. The problem, then, is to try to see visually what an area under a curve has to do with turning multiplication into addition.

Here’s a graph of 1/x, and we’re finding, as an example, the area under it from 1 to 2.


Let’s say now that we multiply the limits of integration by two, so we’re now finding the area from 2 to 4. Here’s what that looks like.

second integral

The two portions are actually very similar to each other in their overall shape. The orange one is twice as wide as the green one, but also half as tall. Here they are overlaid.

overlaid integrals

If you take the green shape and first squash it down vertically by a factor of two then stretch it out horizontally by a factor of two, you get the orange shape exactly. (If you don’t believe this, convince yourself it works!) This means the areas of these shapes are exactly the same, even though we don’t know what that area is.

Show for yourself that this result is general. The area under 1/x from a to b is the same as that from ac to bc.

What, then, is the area from 1 to 6? We can break it into two parts – the area from 1 to 2 and the area from 2 to 6. But the area from 2 to 6 is the same as the area from 1 to 3, by the above reasoning.

Thus, the area from 1 to 6 is the same as the sum of the areas from 1 to 2 and from 1 to 3. Note that 6 = 3*2. Again, this is general. The area under 1/x from 1 to ab is the same as the sum of the areas from 1 to a and from 1 to b.
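If you would like to see the numbers before proving anything, here is a rough numerical sketch. It estimates the areas with a midpoint rule; the function name `area_under_reciprocal` and the number of slices are my own choices, not part of the argument.

```python
def area_under_reciprocal(a, b, n=200_000):
    """Midpoint-rule estimate of the area under 1/x between a and b."""
    h = (b - a) / n
    return h * sum(1.0 / (a + (i + 0.5) * h) for i in range(n))

area_1_2 = area_under_reciprocal(1, 2)
area_2_4 = area_under_reciprocal(2, 4)
area_1_3 = area_under_reciprocal(1, 3)
area_2_6 = area_under_reciprocal(2, 6)
area_1_6 = area_under_reciprocal(1, 6)

print(area_1_2, area_2_4)             # stretched and squashed copies of each other
print(area_1_3, area_2_6)             # same idea with a = 1, b = 3, c = 2
print(area_1_6, area_1_2 + area_1_3)  # area to 6 versus area to 2 plus area to 3
```

Each printed pair should match to many decimal places, which is the numerical face of the squash-and-stretch argument.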

That’s pretty good motivation for the definition

\ln(x) = \int_1^x\frac{1}{t}\mathrm{d}t

Note that this is being taken as a definition of the natural logarithm, not a proof of the relationship. Our argument about the integral of 1/x now translates to the statement

\ln(ab) = \ln(a) + \ln(b)

Now, step by step, we will show that all the other properties you expect of the natural logarithm follow from this definition.

It is evident that

\ln(1) = 0

Our definition implies that the logarithm grows without bound because if we continually multiply the argument of the logarithm by two, we continually add \ln(2) to the value. (i.e. \ln(2x) = \ln(x) + \ln(2)). Since we can multiply any number by two over and over, we can add \ln(2) to the logarithm as many times as we want. That means we can make the logarithm arbitrarily big.

This also means that starting the integral from 1 rather than from zero was a good idea. If we start from zero, the integral is infinite. We can see this because 1/x is symmetric about the line y = x.


This implies that the area to the left of the curve is the same as the area under the curve, like this.


We just showed that the area under the curve diverges as we move the right hand side of the integral out to infinity, so the area to the left of the curve diverges, too. If we started the integral at zero, it would be infinite.

What about taking the logarithm of numbers less than one? A good check of whether everything makes sense so far is to work out that \ln(1/x) = - \ln(x).

Since the area under 1/x starts at zero when x=1 and goes up infinitely, it is clear that there must be some number x such that \ln(x) = 1. Let’s choose to call that number e. We don’t know what it is yet, but it certainly exists. Thus

\ln(e) = 1

Again, this is a definition, not a proof.
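Although we have not pinned e down analytically yet, nothing stops us from hunting for it numerically. The sketch below estimates the area under 1/t with a midpoint rule and bisects for the point where the area reaches 1. The bracket from 2 to 3 is an assumption I am supplying, and the names `ln_by_area` and `solve_for_e` are made up for this illustration.

```python
def ln_by_area(x, n=20_000):
    """Area under 1/t from 1 to x (midpoint rule); negative when x < 1."""
    h = (x - 1.0) / n
    return h * sum(1.0 / (1.0 + (i + 0.5) * h) for i in range(n))

def solve_for_e(lo=2.0, hi=3.0, steps=40):
    """Bisect for the x where the area equals 1, assuming it lies in [lo, hi]."""
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if ln_by_area(mid) < 1.0:
            lo = mid   # not enough area yet, so e is further right
        else:
            hi = mid   # too much area, so e is further left
    return 0.5 * (lo + hi)

e_estimate = solve_for_e()
print(e_estimate)
print(ln_by_area(0.5), -ln_by_area(2.0))  # a check of ln(1/x) = -ln(x) at x = 2
```

Bisection works here because the area only grows as the right-hand limit moves outward.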

It is immediately apparent that, for example, \ln(e^5) = \ln(e*e*e*e*e) = 5\ln(e) = 5. That makes e a pretty handy number. It shows us that the logarithm of a number x is how many times you need to multiply e by itself in order to get x.

How about \ln(e^{3/2})? That is \ln\left([e^{1/2}]^3\right) = 3\ln(e^{1/2}). So in order to understand logarithms of rational numbers, we need to understand roots of e.

That’s not so hard, though.

\ln(e^{1/2}*e^{1/2}) = \ln(e) = 1.

On the other hand,

\ln(e^{1/2}*e^{1/2}) = \ln(e^{1/2}) + \ln(e^{1/2}) = 2 \ln(e^{1/2})

From this we deduce \ln(e^{1/2}) = 1/2. Returning to the unfinished example, \ln(e^{3/2}) = 3*(1/2) = 3/2. It is not a great leap to say that for any rational number x, we have

\ln(e^x) = x

This is an important result; it is probably the definition of \ln(x) that you’re used to. The pieces are falling into place. The main remaining hurdle is to find the value of e and show it comes to what we expect.

Before that, we should mention how the above relation works for irrational numbers. Irrational numbers are squeezed in between the rational ones, and since the definition of the logarithm as the area under a curve is evidently smooth, the logarithm of an irrational number is squeezed in tightly as well. Ultimately, the above relation holds for all positive numbers. However, the fine details of real numbers are more involved than I would like to address here. (The logarithm of a negative number or of zero isn’t defined, at least not in the real numbers. What is a difficulty with doing so?)

Finally, we would like some way of determining what e is. Here is one way to do it. For small values of x, we can see that

\ln(1+x) \approx x

This follows from the extremely simple approximation below.


The red box is an approximation to the area of the green integral. The red box clearly has area x while the green integral is \ln(1+x). Thus

\ln(1+x) \approx x

It’s crude, but it works better and better as x becomes tiny. Multiplying both sides of the approximation by 1/x, we get

\frac{1}{x}\ln(1+x) \approx 1

We know how to rewrite the left hand side. It gives

\ln\left([1+x]^{1/x}\right) \approx 1

Since we have defined e by \ln(e) = 1, we finally see

e = \lim_{x\to 0} (1+x)^{1/x}

This is the common definition of e. At last we see that the reason that the integral of 1/x is \ln(x) is that all the properties of the two functions are exactly the same, and so they must be the same function.
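To watch the limit settle down, here is a small sketch that evaluates (1+x)^{1/x} for shrinking x. The function name `compound` and the list of sample values are arbitrary choices of mine.

```python
def compound(x):
    """The quantity (1 + x)^(1/x) from the limit above."""
    return (1.0 + x) ** (1.0 / x)

small_values = [0.1, 0.01, 0.001, 0.0001, 0.00001]
estimates = [compound(x) for x in small_values]

for x, estimate in zip(small_values, estimates):
    print(x, estimate)
```

The estimates creep upward toward 2.71828..., and the crude box approximation is the reason they approach from below: the red box slightly overestimates the green area.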

Visualizing Elementary Calculus: Optimization

April 26, 2011

Geometric thinking sometimes lets us skip a bunch of algebraic steps in basic min/max problems. Here are some common problems solved geometrically. I learned to think about optimization this way from The Feynman Lectures.

This series:

I – Introduction
II – Trigonometry
III – Differentiation Rules 1
IV – Graphs, Tangents, Derivatives
V – Optimization


Where is the vertex of the parabola

y = ax^2 + bx + c ?

A parabola looks like this, with the vertex at the lowest part (or highest if it opens down)

If you’re at the bottom like that, the tangent line must be flat. Otherwise, you could take a small step in whichever direction the tangent line slopes down, and you’d get to something smaller, and hence you weren’t at the bottom to begin with.

So to find the vertex, we simply need to look for where the tangent is horizontal. In the previous post, we saw that the slope of the tangent is the derivative, so we need to set the derivative to zero.

y = ax^2 + bx + c

\frac{\textrm{d}y}{\textrm{d}x} = 2ax + b = 0

x = \frac{-b}{2a} ,

a fact you may remember from algebra.


Suppose we want to put up some fence to make a rectangular pen. We only have 90m of fence to use, and we want the biggest possible pen.

It’s easy to guess by symmetry that the optimal shape is a square, but what if we twist the problem slightly? Say the fence is going up against the side of a cliff, and so we get one side of it for free. Now what is the best rectangular pen?

The fence has a length and a width like this:

To maximize or minimize A with respect to B, \textrm{d}A/\textrm{d}B must be zero. So we want the derivative of area with respect to side length to be zero. We draw a picture to show the product rule

\textrm{d}A = l\textrm{d}w + w \textrm{d}l  = 0

If we take away 1 meter from the vertical length of the fence, it has to be split in two to go on the horizontal widths, so they only add half a meter. \textrm{d}w = -\frac{1}{2} \textrm{d}l, so

\textrm{d}A = \frac{-1}{2} l \textrm{d} l + w \textrm{d} l = 0

l = 2w

so the vertical length of the fence should be twice the horizontal width.
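A brute-force search gives the same answer. This is only a sketch: it takes l to be the single fenced side opposite the cliff and w to be each of the two sides that split the remaining fence, which is how I read the relation \textrm{d}w = -\frac{1}{2}\textrm{d}l. The names `FENCE` and `pen_area` are made up.

```python
FENCE = 90.0  # metres of fence available

def pen_area(l):
    """Area of the pen when the single side has length l."""
    w = (FENCE - l) / 2.0   # the two equal sides share whatever fence is left
    return l * w

# Scan candidate lengths in 1 cm steps and keep the one with the largest area.
candidates = [i / 100.0 for i in range(0, 9001)]
best_l = max(candidates, key=pen_area)
best_w = (FENCE - best_l) / 2.0

print(best_l, best_w, pen_area(best_l))
```

The search lands on l = 45 and w = 22.5, so l = 2w as the differentials predicted.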

Distance to a line

Here is an easy problem: Given a line and a point, what is the shortest path from the point to the line?

There are many ways to go from the point to the line. Here are a few:

If the point of contact with the line is called x and the distance from our original point to the line is called l, we can form the derivative \textrm{d}l/\textrm{d}x. This derivative tells us how the distance to the line changes as we move the point around.

If we find x that minimizes l, then \textrm{d}l/\textrm{d}x is zero there. We can make a little picture to illustrate \textrm{d}l and \textrm{d}x. \textrm{d}x is a little distance along the line.

\textrm{d}l is the change in the length of the segment. To find it, draw a circle with the center at the point off the line going through one of the candidate points on the line. The circle shows everywhere that’s equidistant, so the length of the other segment outside the circle is how much longer it is.

In order for the extra bit to shrink to zero, indicating the derivative is zero, we must have the circle be tangent to the line. Tangents to a circle are perpendicular to radii, so the shortest possible path from a point to a line is perpendicular to the line. This is a result you could probably get without calculus, but it’s a good warm-up for the next bit.
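Here is a numerical version of the same argument, as a sketch under assumptions of my own: the point P, the 30 degree tilt of the line, and the helper names are all arbitrary. It slides the contact point along the line until l stops shrinking, then checks that the shortest segment is perpendicular to the line.

```python
import math

P = (2.0, 5.0)                           # the point off the line (arbitrary choice)
ANGLE = math.radians(30.0)               # the line runs through the origin at 30 degrees
D = (math.cos(ANGLE), math.sin(ANGLE))   # unit vector along the line

def distance_to_contact(x):
    """Length l of the segment from P to the point a distance x along the line."""
    qx, qy = x * D[0], x * D[1]
    return math.hypot(P[0] - qx, P[1] - qy)

def ternary_min(f, lo, hi, steps=200):
    """Ternary search for the minimum of a function with a single dip."""
    for _ in range(steps):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)

x_best = ternary_min(distance_to_contact, -10.0, 10.0)
qx, qy = x_best * D[0], x_best * D[1]

# Dot product of the shortest segment with the line's direction.
# A value of zero means the two are perpendicular.
dot = (P[0] - qx) * D[0] + (P[1] - qy) * D[1]
print(x_best, dot)
```

The dot product comes out as zero to within the precision of the search.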

Fermat’s Principle

Fermat’s principle for optics says that light takes whatever path from A to B is fastest. We can find such paths by calculus, keeping in mind “the fastest path” means the derivative of the time of travel is zero.

Take two arbitrary points on the same side of a flat mirror.

What is the fastest route from A to B? The answer is a straight line, ignoring the mirror. But what is the fastest route that also touches the mirror somewhere? There are many potential places to touch the mirror, and therefore many potential paths.

The fastest one has the derivative of path length with respect to contact point equal to zero, so take two nearby points and compare.

In this picture, segment AC is clearly shorter than segment AD. How much shorter? Draw a circle with AC as a radius.

The purple segment shows the discrepancy. We would like to find its length. Zoom in on the interesting area.

Since this is calculus, we are letting the points C and D be separated by a very small distance \textrm{d}x. When we zoom in, the circle appears indistinguishable from its tangent line, which is a line perpendicular to AC. Also, as C and D get closer together, AC becomes parallel to AD, so the circle is also perpendicular to AD.

The purple segment’s length is just \sin\theta \textrm{d}x.

Next we want to do basically the same thing to figure out how much longer CB is than DB.

Again, we zoom in on the interesting area, making the same linear approximations as the separation \textrm{d}x becomes very small.

This time, we get that the extra length is \textrm{d}x\sin\phi.

These two extra lengths must cancel each other out if the paths are going to be the same length, so

\textrm{d}x\sin\theta = \textrm{d}x\sin\phi

so \theta = \phi. \theta is the angle that the incoming rays make with the vertical, and \phi is the angle that the outgoing rays make with the vertical (exercise). So Fermat’s principle says that light bounces off a mirror at the same angle it came in.
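The equal-angle result can be checked by brute force. The following is a sketch with made-up coordinates: the mirror is taken to be the x-axis, and A and B are arbitrary points above it. It searches for the contact point with the shortest total path and then compares the sines of the two angles measured from the vertical.

```python
import math

A = (0.0, 1.0)   # assumed coordinates; the mirror is the x-axis
B = (4.0, 3.0)

def path_length(x):
    """Length of the path from A to the mirror point (x, 0) and on to B."""
    return math.hypot(x - A[0], A[1]) + math.hypot(B[0] - x, B[1])

def ternary_min(f, lo, hi, steps=200):
    """Ternary search for the minimum of a function with a single dip."""
    for _ in range(steps):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)

x_contact = ternary_min(path_length, A[0], B[0])

# Sines of the angles the incoming and outgoing rays make with the vertical.
sin_theta = (x_contact - A[0]) / math.hypot(x_contact - A[0], A[1])
sin_phi = (B[0] - x_contact) / math.hypot(B[0] - x_contact, B[1])

print(x_contact, sin_theta, sin_phi)
```

For these particular points the contact lands at x = 1, and the two sines agree.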

A similar problem is the “lifeguard problem”. You’re a lifeguard. You see a drowning person, and you want to go save them, but you have to decide what path is fastest. You run part way on sand and swim part way in the water. What path should you choose?

You go faster on the beach, so you probably shouldn’t take a straight line. Instead, run further on the beach and turn a bit when you enter the water. We want to know how much you should turn. Again, take two nearby points and find the condition so that the difference in path lengths sums to zero. Let’s bring those green circles and purple segments back again.

They’re different lengths, which is actually what we want. We want those two purple segments to take the same amount of time to traverse, not to be the same length. That way, the two nearby paths take equal total time and the derivative of the time with respect to the entry point is zero.

From here, you can follow the details through to find v_{water} \sin\theta = v_{land} \sin \phi, which is called Snell’s Law.
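If you would rather see the relation appear numerically before deriving it, here is a sketch. All the numbers are invented: the shoreline is the x-axis, the lifeguard starts 10 m up the beach, the swimmer is 10 m out and 20 m along, and the speeds are 7 m/s on land and 2 m/s in water. I take \theta as the angle of the beach leg and \phi as the angle of the water leg, both measured from the perpendicular to the shoreline.

```python
import math

V_LAND, V_WATER = 7.0, 2.0   # assumed running and swimming speeds, in m/s
GUARD = (0.0, 10.0)          # lifeguard on the beach (y > 0)
SWIMMER = (20.0, -10.0)      # swimmer in the water (y < 0)

def travel_time(x):
    """Total time when the lifeguard enters the water at the point (x, 0)."""
    run = math.hypot(x - GUARD[0], GUARD[1])
    swim = math.hypot(SWIMMER[0] - x, SWIMMER[1])
    return run / V_LAND + swim / V_WATER

def ternary_min(f, lo, hi, steps=200):
    """Ternary search for the minimum of a function with a single dip."""
    for _ in range(steps):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)

x_entry = ternary_min(travel_time, GUARD[0], SWIMMER[0])

sin_theta = (x_entry - GUARD[0]) / math.hypot(x_entry - GUARD[0], GUARD[1])
sin_phi = (SWIMMER[0] - x_entry) / math.hypot(SWIMMER[0] - x_entry, SWIMMER[1])

print(x_entry)
print(V_WATER * sin_theta, V_LAND * sin_phi)  # the two sides of the relation
```

The straight-line path would enter the water at x = 10, and the search picks an entry point further along the beach than that, as expected.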

Fermat’s principle does not really state that light takes a path of least time; in fact, having the derivative be zero is enough. In most cases the time is least, but in some applications images actually form where the time is at a maximum compared to nearby paths, or even where it is a “stationary point” – the derivative is zero but the point is neither a minimum nor a maximum, which happens, for example, at the origin of the graph of y = x^3.

Witches with unusually-shaped heads

This example is somewhat artificial, but what is the largest cylinder (an unusual head shape) that fits inside a given right circular cone (witch’s hat)?

The correct cylinder is clearly something along these lines:

We want to optimize V by changing h, so we had better set \textrm{d}V/\textrm{d}h = 0.

As the cylinder gets a little taller, it sweeps out some volume with its top, and sucks in some volume with its sides, so

\textrm{d}V = A_{top}\textrm{d}h - A_{sides} \textrm{d}r

\textrm{d}h and \textrm{d}r are related by the slope of the cone, which is R/H, so we have

\textrm{d}V = \pi r^2\textrm{d}h - 2\pi r h \textrm{d}h * \frac{R}{H} = 0

which is equivalent to

\frac{r}{h} = 2\frac{R}{H}

This happens when

h = \frac{H}{3}

r = \frac{2R}{3}
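A quick scan over cylinder heights confirms these values. This is a sketch with an arbitrary cone (R = 2, H = 9) and an invented function name; it uses the fact that a cylinder of height h touching the cone has radius R(1 - h/H).

```python
import math

R, H = 2.0, 9.0   # base radius and height of the cone (arbitrary choice)

def cylinder_volume(h):
    """Volume of the inscribed cylinder of height h."""
    r = R * (1.0 - h / H)   # the cone narrows linearly with height
    return math.pi * r * r * h

# Try 3001 evenly spaced heights between 0 and H and keep the best.
heights = [H * i / 3000 for i in range(3001)]
best_h = max(heights, key=cylinder_volume)
best_r = R * (1.0 - best_h / H)

print(best_h, H / 3)
print(best_r, 2 * R / 3)
```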

These toy optimization problems are given to calculus students for practice. This is a useful skill, but many real optimization problems are more difficult because they involve many variables (even infinitely many). These problems are extremely important to physics, though. In the next posts, we’ll see some physics examples.


  1. Write down the quadratic formula and stare at it until you understand how it shows you what the vertex of a parabola is.
  2. What are the smallest and largest values of the function f(x) = \sin(x)/x? (You can check your answer like this.)
  3. Where are the “humps” in the graph of the cubic equation y = ax^3 + bx^2 + cx + d? Under what conditions does it have humps? How can you use this to tell whether a cubic has one real zero or three?
  4. Use the Pythagorean theorem and some algebra to solve the problem of finding the shortest segment from a point to a line.
  5. Prove the result about bouncing off the flat mirror using the concept of an image point. Create a new point, called B' on the other side of the mirror opposite B. For every path from A to the mirror to B, there is an equally-long path from A to the same point on the mirror to B'. Now use the fact that the fastest route from A to B' is a straight line to find the fastest route from A to B touching the mirror.
  6. Imagine that instead of a single pen, we want to make a whole grid of pens (or cubicles) enclosed on all sides. Our grid is m pens wide and n pens tall. If we have a fixed amount of fencing, what should the aspect ratio of the pens be to maximize their area? (Answer: the whole grid should have width : height = m+1 : n+1, so each pen has width : height = n(m+1) : m(n+1))
  7. Find the optimal height for a cylinder with fixed surface area and maximal volume. Compare this to a cylinder with only one end cap, and then one with no end caps. (Answer: h = 2r, h = r, and unbounded)
  8. Here’s a modified version of the lifeguard problem. The pool has become an ellipse. What path should the lifeguard take? Try to find a condition such that there are three equal, optimal paths.

Visualizing Elementary Calculus: Graphs, Tangents, Derivatives

April 17, 2011

The derivative as the slope of a graph is standard fare, and it’s important for visualizing calculus.

This series
I – Introduction
II – Trigonometry
III – Differentiation Rules 1
IV – Graphs, Tangents, Derivatives

The Derivative as Slope

Let’s look at the graph of y=x^2.

If we take a point on this graph, for example (2,4), the y-value is the square of the x-value.

If we look at a nearby point, those values have changed by \textrm{d}y and \textrm{d}x respectively. We can visualize those changes like this:

\textrm{d}y and \textrm{d}x are supposed to represent tiny changes, so we had better bring the points in close to each other and zoom in. Any reasonable curve looks like a straight line when you zoom in on it enough, including this one. As far as these nearby points are concerned, y = x^2 is a line, and they are on it. That line is called the tangent line. Here it is:

The value of \textrm{d}y/\textrm{d}x is the derivative of y with respect to x, but in this context it is also called the slope of the tangent line. So, the derivative of a function at a certain point is the slope of the tangent line at that point.

If we zoom back out again, eventually the graph of y = x^2 no longer looks like a line; we can see its curvature. The tangent line tracks the graph for a while, but eventually diverges. The red line shown below is the tangent line to the parabola. The derivative of x^2 with respect to x is 2x, so the slope of this tangent line through (2,4) is 2x = 2*2 = 4. To find the equation for the tangent line itself, we choose the line with the specified slope that goes through the point. That would be y = 4x-4.
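The zooming-in can be imitated with a few lines of arithmetic. This sketch (the function name and sample points are my own) computes the slope between (2,4) and ever-closer neighbours on the parabola, then measures the gap between the parabola and the line y = 4x - 4 near x = 2.

```python
def secant_slope(dx, x=2.0):
    """Slope between (x, x^2) and the nearby point (x + dx, (x + dx)^2)."""
    return ((x + dx) ** 2 - x ** 2) / dx

slopes = [secant_slope(dx) for dx in (1.0, 0.1, 0.01, 0.001)]
print(slopes)

# Gap between the parabola and the tangent line y = 4x - 4 close to x = 2.
gaps = [(x, x ** 2 - (4 * x - 4)) for x in (1.9, 1.99, 2.0, 2.01, 2.1)]
print(gaps)
```

The slopes head toward 4 as the neighbour closes in, and the gap is (x-2)^2, which is why the line hugs the curve so tightly near the point of tangency and drifts away further out.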


Elementary geometry tells us that the tangent to a circle is perpendicular to the radius. Let’s combine this fact with some calculus.

If we have a circle centered at the origin, the slope of the radius to a point (x,y) on the circle is y/x.

The circle is given by x^2 + y^2 = R^2. Applying \textrm{d} to both sides gives 2x\textrm{d}x + 2y\textrm{d}y = 0 (because \textrm{d}R = 0). This simplifies to

\frac{\textrm{d}y}{\textrm{d}x} = -\frac{x}{y}

which is the slope of the tangent line.

Since this is perpendicular to a line of slope y/x, we see that perpendicular lines have negative-reciprocal slopes, a fact familiar from algebra.

Square Roots

If you want to estimate the square root of a number n, a good way is to take a guess g, then average g with n/g. For example, to find the square root of 37, guess that it’s 6, then take the average of 6 and 37/6.

\frac{6 + 37/6}{2} = 6.0833

The actual answer is about 6.0828. It’s close. To get closer, iterate.

\frac{6.0833 + 37/6.0833}{2} = 6.08276256

The actual answer, with more accuracy, is 6.08276253. So we’ve got 7 decimal places of accuracy after two iterations of guessing.
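The guess-and-average recipe takes only a few lines of code. This is a minimal sketch; the function name `improve` is invented, and it carries full precision between rounds rather than rounding the first guess to 6.0833.

```python
def improve(guess, n):
    """One round of the trick: average the guess with n / guess."""
    return (guess + n / guess) / 2.0

n = 37
guess = 6.0
history = [guess]
for _ in range(3):
    guess = improve(guess, n)
    history.append(guess)

for value in history:
    print(value, value * value)   # the square should close in on 37
```

The first improved guess is 73/12, about 6.0833, and each further round roughly doubles the number of correct digits.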

Calculus shows us where this comes from. We are estimating \sqrt{n}. That is a zero of x^2 - n. So we plot y = x^2 - n (here, n = 37).

We don’t know where the zero is, but we know that x = 6 is near the zero. So we draw the tangent line to the graph at x = 6. This tangent is y = 12x - 73.

The tangent line tracks the parabola quite closely for the very short \textrm{d}x from the point x = 6 to wherever the zero is. So closely that we can’t even see the difference there. Zoom in near the point (6,-1).

Now we see that the tangent line is a very good approximation to the parabola near the zero, so we can approximate the zero using the zero of the tangent line instead of the zero of the parabola. The zero of the tangent line is given by

0 = 12x - 73

x = 6.0833

This is our first new guess for the zero of the parabola. It’s off, but only by a tiny bit, as this even-more-zoomed picture shows. We’ve zoomed in so closely that the original point (6,-1) is no longer visible.

From here, we can iterate the process by drawing a new tangent line like this:

We’ve zoomed in even closer. The red line is the tangent that gave us our first improved guess of 6.0833. Next, we drew a new tangent (purple) to the graph (blue) at the location of the improved guess to get a second improved guess, which is again so close we can’t even see the difference on this picture, despite zooming in three times already.

This general idea of estimating the zeroes of a function by guessing, drawing tangents, and finding a zero of the tangent, is called Newton’s method.
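Written as code, one step of this process is just "build the tangent line, find where it crosses zero". The sketch below is one way to express that; the names `newton_step`, `f`, and `df` are mine, and the derivative is supplied by hand rather than computed.

```python
def newton_step(f, df, x):
    """Zero of the tangent line to f at x, used as the next guess."""
    slope = df(x)
    intercept = f(x) - slope * x   # tangent line: y = slope * x + intercept
    return -intercept / slope

f = lambda x: x * x - 37    # the parabola whose zero is the square root of 37
df = lambda x: 2 * x        # its derivative

first = newton_step(f, df, 6.0)      # tangent at x = 6 is y = 12x - 73
second = newton_step(f, df, first)   # tangent at the improved guess

print(first, second)
```

Nothing here is specific to square roots: swapping in a different f and df applies the same tangent-line step to another function, provided the starting guess is close enough to a zero.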


  1. Graph y = \sin x and find the places where the tangent line slices through the graph, rather than lying completely above or below it near the point of tangency. What is unique about the derivative at these points? (Answer: the derivative is at a local minimum or maximum (i.e. the graph is steepest) when the tangent line slices through)
  2. Find the slope of the tangent line to a point (x,y) on the ellipse (x/a)^2 + (y/b)^2 = 1 via calculus. Find it again by starting with the unit circle x'^2 + y'^2 = 1, for which you already know the slope of the tangent, and making appropriate substitutions for x' and y'. (Answer: \textrm{d}y/\textrm{d}x = -x/y * (b/a)^2)
  3. In this post, we found that y = 4x - 4 is tangent to y = x^2 at (2,4). Confirm this without calculus by noting that there are many lines through (2,4), all with different slopes. The thing that singles out the tangent line is that it only intersects the parabola once. Any line through (2,4) with a shallower slope than the tangent will intersect the parabola at (2,4), but intersect again somewhere off to the left. Any line with a steeper slope will have a second intersection to the right. Use algebra to write down the equation for a line passing through (2,4) with unknown slope, and set its y-value equal to x^2 to find the intersections with the parabola. What slope does the line need to have so that there is only one such intersection?
  4. Do the previous exercise over for a circle (i.e. use algebra to find the tangent line to a circle)
  5. For any point outside a circle, there are two tangents to the circle that pass through the point. When are these tangents perpendicular? (Answer: When the point is on a circle with the same center and a radius \sqrt{2} times as large)
  6. Newton’s method of estimating zeroes gave the same numerical answer for the zero of x^2 - 37 as the algorithm for estimating square roots gave for \sqrt{37}. Show that this is always the case (i.e. perform Newton’s method on y = x^2 -n with a tangent at some point g, and show that the new guess generated is the same as that given in the algorithm).
  7. Use Newton’s method to estimate 28^{1/3} to four decimal places (Answer: 3.0366).

Visualizing Elementary Calculus: Differentiation Rules 1

March 27, 2011

The basic rules of differentiation are linearity, the product rule, and the chain rule. Once we start graphing functions, we’ll revisit these rules.

This Series
I – Introduction
II – Trigonometry
III – Differentiation Rules


The linearity of differentials means

\textrm{d}(\alpha u + \beta v) = \alpha \textrm{d}u + \beta \textrm{d}v

\alpha and \beta are constants, while u and v might change.

This looks obvious, but here’s a quick sketch.

First we’ll look at \textrm{d}(\alpha u). Construct a right triangle with base 1 and hypotenuse \alpha. Then extend the base by length u. This creates a larger, similar triangle. The hypotenuse must be \alpha times the base, so the hypotenuse is extended by \alpha u.

Then increase u by \textrm{d}u. This induces an increase \textrm{d}(\alpha u) in the hypotenuse.

We draw an original blue triangle with base 1 and hypotenuse \alpha. Then it’s extended to the dark green triangle, adding u to the base and \alpha u to the hypotenuse. Finally, we increment u by \textrm{d}u and observe the effect.

The little right triangle made by \textrm{d}u and \textrm{d}(\alpha u) is similar to the original, so

\frac{\textrm{d}(\alpha u)}{\textrm{d}u} = \frac{\alpha}{1}


\textrm{d}(\alpha u) = \alpha \textrm{d}u

Next look at \textrm{d}(u + v). u+v is just two line segments laid one after the other. We increase the lengths by \textrm{d}u and \textrm{d}v and see what the change in the total length \textrm{d}(u+v) is.

The total change is equal to the sum of the changes.

\textrm{d}(u + v) = \textrm{d}u + \textrm{d}v

These rules combine to give the rule for linearity

\textrm{d}(\alpha u + \beta v) = \alpha \textrm{d}u + \beta \textrm{d}v

The Product Rule

The product rule is

\textrm{d}(uv) = u\textrm{d}v + v\textrm{d}u

To show this, we need a line segment with length uv.

Start by drawing u, then drawing a segment of length 1 starting at the same place as u and going an arbitrary direction.

Close the triangle. Extend the segment of length 1 by v, and close the new triangle. We’ve now extended the base by uv.

Construction of length u*v, by similar triangles.

Increase u by \textrm{d}u and v by \textrm{d}v. This results in several changes to uv.

The segment uv has a little bit chopped off on the left, since \textrm{d}u cuts into the place where it used to be.

uv is also extended twice on the right. The first extension is the projection of \textrm{d}v down onto the base. All such projections multiply the length by u, so the piece added is u\textrm{d}v.

Finally there is a piece added from the very skinny tall triangle. It is similar to the skinny, short triangle created by adding \textrm{d}u to u. The tall triangle is (1+v) times as far from the bottom left corner as the short one, so it is (1+v) times as big. Since the base of the short one is \textrm{d}u, the base of the tall one is (1+v)\textrm{d}u.

Combining all three changes to uv, one subtracting from the left and two adding to the right, we get

\textrm{d}(uv) = -\textrm{d}u + u\textrm{d}v + (1+v)\textrm{d}u = u\textrm{d}v + v\textrm{d}u

This is the product rule. We’ll give another visual proof in the exercises.
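One thing the differential notation hides is the tiny leftover piece \textrm{d}u\,\textrm{d}v. The sketch below, with arbitrary starting values for u and v, compares the true change in uv with the product-rule estimate as the increments shrink.

```python
u, v = 3.0, 5.0   # arbitrary starting lengths

leftovers = []
for step in (1e-1, 1e-2, 1e-3):
    du, dv = step, 2 * step                  # small increments (dv chosen as 2 du)
    exact = (u + du) * (v + dv) - u * v      # true change in the product
    rule = u * dv + v * du                   # product-rule estimate
    leftovers.append((step, exact - rule))   # what the rule leaves out
    print(step, exact, rule, exact - rule)
```

The leftover equals du times dv, so it shrinks a hundredfold each time the increments shrink tenfold. That is why it can be dropped when the changes are truly tiny.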

The Chain Rule

Suppose we want \textrm{d}\sin x^2. (There’s no particular reason I can think of to want that, but we have a limited milieu of functions at hand right now.)

We know \textrm{d}(\sin\theta) = \cos\theta\textrm{d}{\theta}. Let \theta = x^2.

\textrm{d}(\sin x^2) = \cos(x^2)\textrm{d}(x^2)

But we already know that \textrm{d}(x^2) = 2x\textrm{d}x, so substitute that in to get

\textrm{d}(\sin x^2) = \cos(x^2)2x\textrm{d}x

This is called the chain rule. A symbolic way to write it is

\frac{\textrm{d}f}{\textrm{d}t} = \frac{\textrm{d}f}{\textrm{d}x}\frac{\textrm{d}x}{\textrm{d}t}

Suppose you are hiking up a mountain trail. f is your height above sea level. x is the distance you’ve gone down the trail. t is the time you’ve been hiking.

\textrm{d}f/\textrm{d}t is the rate you are gaining height. According to the chain rule, you can calculate this rate by multiplying the slope of the trail \textrm{d}f/\textrm{d}x by your speed \textrm{d}x/\textrm{d}t.
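The worked example \textrm{d}(\sin x^2) = \cos(x^2)2x\textrm{d}x can be checked against a direct numerical slope. This is a sketch; the helper name `central_difference`, the step size, and the sample points are my own choices.

```python
import math

def central_difference(f, x, dx=1e-6):
    """Numerical slope of f at x from two points straddling x."""
    return (f(x + dx) - f(x - dx)) / (2 * dx)

f = lambda x: math.sin(x ** 2)                     # the function sin(x^2)
chain_rule = lambda x: math.cos(x ** 2) * 2 * x    # what the chain rule predicts

for x in (0.5, 1.0, 2.0):
    print(x, central_difference(f, x), chain_rule(x))
```

The two columns should agree to about six decimal places or better.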


  • Show that the linearity rule \textrm{d}(\alpha u) = \alpha \textrm{d}u is a special case of the product rule.
  • What is the derivative of A\sin\theta + C\cos\theta with respect to \theta? Take the derivative with respect to \theta of that. (This is called a “second derivative”.) What do you get? (Answer: -1 times the original function)
  • Use the product rule to prove by induction that the derivative of x^n is n x^{n-1} for all positive integers n.
  • Apply the product rule to x^nx^{-n} = 1 to prove that the “power rule” from the previous question holds for all integers n.
  • Look back at the arguments from the introduction. Draw a rectangle with one side length u and one side length v. Its area is uv. Use this to prove the product rule.
  • Apply the chain rule to (x^{1/n})^{n} = x to find the derivative of x^{1/n} with respect to x for all integers n (Answer: \frac{1}{n} x^{1/n -1})
  • Argue that the derivative of x^{p/q} is \frac{p}{q}x^{p/q - 1} for all rational numbers p/q.
  • Show that the derivative of a polynomial is always another polynomial. Is there any polynomial that is its own derivative? (Answer: no, except zero)
  • Combine the product rule with the chain rule to prove the quotient rule \textrm{d}\frac{u}{v} = \frac{v\textrm{d}u - u\textrm{d}v}{v^2}

A Non-mathematician’s Non-apology

March 26, 2011

After finishing this post about the derivative of the sine function, I decided to hunt around online to see how common its approach is.

It’s not common. Most sites take the derivative of sine by considering

\frac{\textrm{d}(\sin\theta)}{\textrm{d}\theta} = \lim_{\Delta\theta \to 0}\frac{\sin(\theta + \Delta \theta) - \sin(\theta)}{\Delta \theta}

and working from there.

Eventually, after wading through three pages of results, I found another write-up of the geometric argument from, of all places, a site called Biblical Christian World View. It is apparently the personal site of a guy who’s good at math and also thinks it makes sense to write things like,

I illustrated Biblical truths with mathematical expressions. For an example, I illustrated the Biblical truth, “With God, nothing is impossible” as “two negatives equate to a ringing positive.” In the arithmetic of negative numbers -(-7) = +7! Two negatives equal a positive.

So. There’s that.

But just a little further along the Google results I found one more presentation of the same idea. This one is from Victor J. Katz, a mathematician who wrote a book about the history of math, and was writing from the historical point of view.

His article is much better than mine. The proof is clearer and surrounded with tons of other insight.

Katz delightfully points out how great a term “arcsine” is – it’s the length of the arc associated with that value of the sine function. Then, at the end, he gives Leibniz’ original argument that y = \sin\theta satisfies \frac{\textrm{d}^2 y}{(\textrm{d}\theta)^2} = -y, and it’s crazy! Differentials are applied willy-nilly and manipulated algebraically in ways nobody does any more. I felt disoriented at first, adapting to this new way of thinking about calculus, and then wondered why I’d never seen it until now.

It’s true that there are a lot of old techniques no one uses, and that’s because now we have better ones. Indeed, modern analysis, with its deltas and epsilons, is much better, mathematically, than manipulating differentials in dubious ways. It’s rigorous and logical.

It’s also hard. I’ve been asked to teach delta-epsilon proofs to quite a few people, and I’ve never been able to get it across. I’m giving up on that for beginners. I am going to teach the geometry stuff, and I’m not going to feel guilty about it.

It is okay to learn a thing the wrong way the first time. That first pass is only there to get you used to the main ideas, and the main idea of calculus is applying derivatives, integrals, and series. It is not the mean value theorem.

Once you learn a rough version, you practice it in the field until you’re comfortable. Do some physics. Learn some differential equations. After all that, it’s nice to come back, study calculus again, and finally understand all that’s really going on.

Actually, I like it better that way. Lots of my college classes made me think, “Oh, wow – so that’s what was behind the curtain!” But if you had shown me all the wheels and gears up front, I’d have been too busy checking how each one fit into the next to see what they accomplished.

A case-in-point is linear algebra. I remember almost nothing from my freshman linear algebra course. It wasn’t a bad course, but it was rigorous, proving theorems from the axioms of vector spaces, and it was beyond the level I was ready for at the time.

A couple years later, I found I really did need to know linear algebra to get through quantum mechanics, so I watched Gilbert Strang’s video lectures, which are far more concrete.

They were wonderful. I understood what was happening. I could do all the calculations and answer all the conceptual questions.

Then, finally, I went back to read Sheldon Axler’s Linear Algebra Done Right, a book that goes back again to the axioms-of-a-vector-space point of view, and thought it was wonderful.

Keith Devlin disagrees. Devlin takes up multiplication, claiming one should not tell young children that multiplication is repeated addition. Multiplication is its own fundamental operation. (The field axioms treat multiplication and addition independently.)

I was taught multiplication as repeated addition as a child, and then retaught multiplication as a fundamental operation in college. Do you know how confused I was by that? None. Zero confusion ever. In fact I never even noticed the discrepancy until Devlin pointed it out. I thought about multiplication as repeated addition when it was convenient, and thought about it as multiplication when that was convenient, and never realized I was switching.

I do the same for the geometric and analytic modes of thinking about calculus now. When I’m solving a physics problem, I don’t even notice whether I’m doing calculus or algebra at a given moment – it’s all just problem solving.

Why, then, do introductory calculus classes spend a month learning limits? Better just to ignore them and press on to the good stuff. There will be time later for learning what the difference between “continuous”, “differentiable”, and “smooth” is – modern medical science is working new miracles all the time.

Visualizing Elementary Calculus: Trigonometry

March 26, 2011

Here we’ll find the derivatives of trigonometric functions. The goal is to reinforce the idea of \textrm{d} as a thing that means “a little bit of” and grant some new insight into why these derivatives are what they are. The first argument is based on the preface of Tristan Needham’s Visual Complex Analysis. I haven’t read the bulk of it, but the preface is good.

This series
I – Introduction
II – Trigonometry

The Sine Function

Let’s find \textrm{d}(\sin\theta) / \textrm{d}\theta. The sine function is the height of a right triangle in the unit circle. We’ll draw it, and add a little change in \theta. This induces a change in \sin\theta. The change in \theta is called \textrm{d}\theta and the change in \sin\theta is called \textrm{d}(\sin\theta).

We show the sine of an angle as the dark blue line. The change in the sine when we change the angle slightly is the light blue line.

The interesting part is \textrm{d}\sin\theta, so we’ll zoom in there in the next picture. Before we do, remember that the arc length along a piece of the unit circle is equal to the angle it subtends. This will tell us the length of the little piece of the circumference near \textrm{d}\sin\theta. Also remember that we’re imagining \textrm{d}\theta to get smaller and smaller, until the two radii in the picture are parallel. We get this:

The interesting region is blown up to large size. The black line d(theta) is part of the edge of the circle. The angles marked are congruent to theta.

The section of the circle is \textrm{d}\theta long. It looks like a straight line because we are zoomed in close, like the horizon at the beach. You can use some geometry to show that the angles marked are congruent to \theta.

Looking at the right triangle formed, we can use the definition of the cosine function to read off

\frac{\textrm{d}(\sin\theta)}{\textrm{d}\theta} = \cos\theta

which is the derivative of the sine function.
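If you would like to see the picture confirmed with numbers, here is a minimal sketch. It takes a small but finite \textrm{d}\theta, measures the change in the sine directly, and compares the ratio to \cos\theta. The function name and the sample angle are my own choices for illustration.

```python
import math

def sine_change_ratio(theta, dtheta):
    """Return d(sin theta) / d(theta) for a small but finite d(theta)."""
    d_sin = math.sin(theta + dtheta) - math.sin(theta)
    return d_sin / dtheta

theta = 1.0  # radians; an arbitrary sample angle
for dtheta in (0.1, 0.01, 0.001, 0.0001):
    ratio = sine_change_ratio(theta, dtheta)
    print(f"dtheta={dtheta:<7} ratio={ratio:.6f}  cos(theta)={math.cos(theta):.6f}")
```

As \textrm{d}\theta shrinks, the printed ratio should settle toward the cosine, just as the zoomed-in triangle suggests.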

Motion on the Unit Circle

Another way to view these derivatives is to imagine a point moving around the outside edge of the unit circle with speed one. Its location as a function of time is (\cos t, \sin t).

Its velocity is tangent to the circle and length one. Let’s draw the velocity vector right at the point, and then also translate it to the origin.

The position of the point is the red vector r. Its velocity is the green tangent v, which has also been copied to the origin.

We want to know the coordinates of \vec{v}. That’s not too hard; \vec{v} is a quarter-circle rotation of \vec{r}. Draw in the components of \vec{r}, and rotate those components to get \vec{v}. The x-component of the position becomes the y-component of the velocity, and the y-component of the position becomes minus one times the x-component of the velocity.

The components of the position get rotated a quarter turn to make the components of the velocity.

The derivative of position is velocity, and so comparing components between the position and velocity vectors, we get

\frac{\textrm{d}(\cos\theta)}{\textrm{d}\theta} = -\sin\theta

\frac{\textrm{d}(\sin\theta)}{\textrm{d}\theta} = \cos\theta
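The rotation argument can be checked numerically. The sketch below rotates the position vector a quarter turn counterclockwise and compares it to a velocity estimated from two nearby positions. The helper names and the sample time are assumptions made for this example.

```python
import math

def position(t):
    """Location of the point on the unit circle at time t."""
    return (math.cos(t), math.sin(t))

def rotate_quarter_turn(vec):
    """Rotate a 2D vector counterclockwise by 90 degrees: (x, y) -> (-y, x)."""
    x, y = vec
    return (-y, x)

def numerical_velocity(t, dt=1e-6):
    """Estimate the velocity from the positions just before and just after t."""
    x0, y0 = position(t - dt)
    x1, y1 = position(t + dt)
    return ((x1 - x0) / (2 * dt), (y1 - y0) / (2 * dt))

t = 0.7  # an arbitrary sample time
print("rotated position:  ", rotate_quarter_turn(position(t)))
print("numerical velocity:", numerical_velocity(t))
```

The two printed vectors should agree to many decimal places, which is the statement that the velocity is (-\sin t, \cos t).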


  • Look back at the first derivation we gave that \textrm{d}(\sin\theta)/\textrm{d}\theta = \cos\theta. Rework it to find derivatives of the other five trig functions. You might want to note that one way to interpret \tan\theta and \sec\theta is

The tangent and secant of an angle are side lengths of a right triangle with "adjacent" side length one.


  • Look back at the argument about a dot moving around a circle. Consider a larger circle to find the derivative of 5\sin\theta with respect to \theta. (Answer: 5\cos\theta)
  • Suppose the dot moving around the edge of the circle is going three times as fast. What does this mean for the derivative of \sin(3 t) and \cos(3 t) with respect to t? Remember that the velocity must still be perpendicular to the position, but not necessarily unit length any more. (Answer: the derivative of \sin(3 t) with respect to t is 3\cos(3 t).)
  • Suppose the dot is moving at a variable speed v(t) = t, so that it keeps getting faster. Then the y-coordinate of the position is \sin(\frac{1}{2}t^2). Again, the velocity is perpendicular to position, but its length is changing. What is the derivative of \sin(\frac{1}{2}t^2) with respect to t? (Answer: t\cos(\frac{1}{2}t^2))
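Once you have worked the exercises geometrically, a quick numerical check of the stated answers can be reassuring. This is only a sketch: it estimates each derivative from two nearby values and prints it next to the claimed answer. The helper name and the sample time are illustrative choices.

```python
import math

def central_difference(f, t, dt=1e-6):
    """Estimate df/dt at t from the values of f just before and just after t."""
    return (f(t + dt) - f(t - dt)) / (2 * dt)

t = 1.3  # an arbitrary sample time
checks = [
    ("5 sin(t)", lambda s: 5 * math.sin(s), 5 * math.cos(t)),
    ("sin(3t)", lambda s: math.sin(3 * s), 3 * math.cos(3 * t)),
    ("sin(t^2/2)", lambda s: math.sin(0.5 * s * s), t * math.cos(0.5 * t * t)),
]
for name, f, claimed in checks:
    print(f"{name:<12} numerical={central_difference(f, t):.6f} claimed={claimed:.6f}")
```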

Visualizing Elementary Calculus: Introduction

March 25, 2011

Recently I’ve been trying to be more geometrical when discussing elementary calculus with high school students. I don’t want to write an entire introduction to calculus, but the next few posts will outline some ways I think the geometric view can be helpful.

This series
I – Introduction
II – Trigonometry

You know about \Delta, which means “the change in”. For example, if w represents my weight, then -\Delta w represents the weight of the poop I just took.

Let’s say h is your height above sea level. \Delta h is the change in that height, but what change? The change when you climb the stairs? When you jump out of a plane? When you step on a banana peel?

When we think about change, we usually think about two things changing together. You get higher when you climb another stair on the staircase. h is changing, and so is s, the number of stairs climbed.

These two changes are related to each other. Say the stairs are 10 cm high. Then you gain 10 cm of height for each stair. We can write that as \Delta h = 10 {\rm cm} \hspace{.5em} \Delta s. We can also write it \Delta h / \Delta s = 10 \hspace{.5em}{\rm cm}. This says, “the height per stair is ten centimeters.”

This is the goal of calculus – to study the relationships between changing quantities. Let’s do a real example.

The Area of a Square

Let’s say we have a square whose side lengths are x. Its area is x^2. What is the relationship between changes in its area and changes in the length of a side? Draw the square, then expand the sides some. The amount the sides have expanded is \Delta x. The new area that’s been added is \Delta (x^2).

We begin with the red square on the left, whose area is x^2. We add an extra amount Delta(x) to the sides, creating all the new green area.

From the picture we see

\Delta(x^2) = 2x\Delta x + (\Delta x)^2

This formula relates \Delta (x^2), the change in the area, to \Delta x, the change in the length of a side.

The Derivative of x^2

In the picture of the square, there is a little piece in the upper-right corner whose area is (\Delta x)^2. It is the smallest bit of area in the whole picture.

Look what happens when we make \Delta x even smaller.

We shrink Delta(x) and observe what happens to the different areas being added on.

In the first picture, \Delta x (no longer marked) is a quarter of x. (\Delta x)^2 is the dark green area, and it is one quarter as large as x \Delta x, the light green area. We see this because the dark patch fits inside the light one four times.

In the second picture, we shrink \Delta x to one eighth of x. All the green areas shrink, but the dark patch shrinks on two sides while the light patches shrink on only one. As a result, the dark (\Delta x)^2 is now only one eighth the size of the light x \Delta x.

If we continued to shrink \Delta x, this ratio would continue to decrease. Eventually we could tile the dark patch a million times into the light one. So, as long as \Delta x is very small, we can get a good estimate of the entire green area by ignoring the dark part (\Delta x)^2. Thus

\Delta(x^2) \approx 2x\Delta x

This approximation becomes better and better as \Delta x shrinks, becoming perfect as \Delta x becomes infinitesimally small.
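Here is a small sketch of the shrinking argument in numbers. It splits the added area into the two light strips and the dark corner, then shows how the corner fades relative to one strip and how the estimate 2x\Delta x closes in on the exact change. The function name and the sample values are my own.

```python
def area_pieces(x, dx):
    """Split the area added to the square into its light and dark parts."""
    light = 2 * x * dx  # the two light-green strips, each x by dx
    dark = dx ** 2      # the dark-green corner, dx by dx
    return light, dark

x = 1.0
for fraction in (1 / 4, 1 / 8, 1 / 100, 1 / 1_000_000):
    dx = fraction * x
    light, dark = area_pieces(x, dx)
    exact = (x + dx) ** 2 - x ** 2
    print(f"dx/x={fraction:<9.2e} dark/one strip={dark / (x * dx):.2e} "
          f"estimate/exact={light / exact:.8f}")
```

The first two rows correspond to the two pictures above; the later rows push \Delta x much smaller.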

When we want to indicate these infinitely small changes, we trade in the \Delta for a {\rm d} and write

\textrm{d}(x^2) = 2x \textrm{d}x

The terms \textrm{d}(x^2) and \textrm{d}x are called “differentials”. The equation expresses the relationship between two infinitely-small changes, one in x and one in x^2.

Frequently, we divide by \textrm{d}x on both sides to get

\frac{\textrm{d}(x^2)}{\textrm{d}x} = 2x

This is called “the derivative of x^2 with respect to x”.

Example 1: Estimating Squares

20^2 = 400. What is 21^2?

Here x = 20, and we’re looking at x^2. When x goes from 20 to 21, it changes by 1, so \textrm{d}x = 1. Our formula tells us

\textrm{d}(x^2) = 2x \textrm{d}x = 2*20*(1) = 40

Hence, x^2 increases by about 40, from 400 to 440.

The real value is 441. We got the change in x^2 wrong by about 2%. That’s because \textrm{d}x wasn’t infinitely small.

Let’s try again, this time estimating the square of 20.00458. Now \textrm{d}x = .00458, so

\textrm{d}(x^2) = 2 x \textrm{d}x = 2*20*.00458 = .1832

The estimate is 400.1832. The real value is 400.183221. We did much better, under-estimating the change by only 0.01% this time. Also, it was not much harder to do this problem than the last, but squaring out 20.00458 by hand would be a pain. We saved some work.
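Both estimates can be reproduced in a few lines. This sketch applies \textrm{d}(x^2) = 2x\textrm{d}x and compares the result with the exact square; the function name is just an illustrative choice.

```python
def estimate_square(x, dx):
    """Estimate (x + dx)**2 starting from x**2, using d(x^2) = 2 x dx."""
    return x ** 2 + 2 * x * dx

for x, dx in ((20, 1), (20, 0.00458)):
    estimate = estimate_square(x, dx)
    exact = (x + dx) ** 2
    true_change = exact - x ** 2
    relative_error = (exact - estimate) / true_change
    print(f"dx={dx}: estimate={estimate:.6f} exact={exact:.6f} "
          f"error in the change={relative_error:.4%}")
```

The leftover error is exactly the ignored corner (\textrm{d}x)^2, which is why it drops so quickly as \textrm{d}x shrinks.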

Example 2: How Far Is the Horizon?

The beach is a good place to think about calculus. If you look out at the ocean, the horizon appears perfectly flat. Nonetheless, we know the Earth is really curved. In fact, we can deduce the curvature of the Earth by standing on the beach and enlisting the help of a friend in a boat.

It works like this: You stand on the beach with your head two meters above the water. Your friend sails away until the boat begins to disappear from sight. The reason the bottom of the boat is disappearing is that it is hidden behind the curvature of Earth.

When the bottom of the boat disappears, measure the distance to some part of the boat you can still see. What’s the relationship between your height, the distance to the boat, and the radius of Earth?

A picture will help. We’ll call your height h and the distance to the horizon z.

You are the vertical stick on top, height h. The boat is the brown circle. It's at the horizon, a distance z away. The dotted line shows your line of sight. When the bottom of the boat begins disappearing, a right triangle forms.

Your height, the radius of Earth, and the distance to the horizon are related by the Pythagorean theorem to give

R^2 + z^2 = (R+h)^2

this is equivalent to

z^2 = 2Rh + h^2

As we have seen, if your height h is small compared to the size of the Earth (and it is), the term h^2 drops away and the distance to the horizon is

z = \sqrt{2Rh}

You can see about 5 {\rm km} at the beach, making the radius of Earth about 6,000 {\rm km}. (It’s actually 6378.1 {\rm km}).

Next we want to know how much further you can see if you stand on your tiptoes. That would be a small change \textrm{d}h to your height. It would let you see a small amount \textrm{d}z further. How is \textrm{d}h related to \textrm{d}z?

We already know

\textrm{d}(x^2) = 2x\textrm{d}x

So let x^2 = h, or x = \sqrt{h}, and we have

\textrm{d}h = 2\sqrt{h}\hspace{.3em}\textrm{d}(\sqrt{h})

But we also know

\sqrt{h} = \frac{z}{\sqrt{2R}}

so we can substitute that in to \textrm{d}(\sqrt{h}) and get

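\textrm{d}h = 2\sqrt{h}\hspace{.3em}\textrm{d}\left(\frac{z}{\sqrt{2R}}\right)

The constant 1/\sqrt{2R} pops outside the differential, giving \textrm{d}h = \sqrt{2h/R}\hspace{.3em}\textrm{d}z. Dividing through, we get
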
\frac{\textrm{d}z}{\textrm{d}h} = \sqrt{\frac{R}{2h}}

This tells us how much further you can see if you get a little higher up. The interesting thing is it depends on h. The higher you go, the smaller \textrm{d}z. When you’re only two meters up, you get to see about twelve meters further out for every centimeter higher you go. However, if you’re 100m up on top of a carousel, you get less than two meters for each centimeter you rise.
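To put numbers on the horizon formulas, here is a minimal sketch. It assumes the radius of 6378.1 km quoted above and works in meters; the function names are mine.

```python
import math

EARTH_RADIUS_M = 6_378_100  # 6378.1 km, the figure quoted in the post

def horizon_distance(h, R=EARTH_RADIUS_M):
    """Distance to the horizon, z = sqrt(2 R h), valid for h much smaller than R."""
    return math.sqrt(2 * R * h)

def horizon_gain(h, R=EARTH_RADIUS_M):
    """dz/dh = sqrt(R / (2 h)): extra meters of view per extra meter of height."""
    return math.sqrt(R / (2 * h))

for h in (2, 100):
    z = horizon_distance(h)
    gain_per_cm = horizon_gain(h) * 0.01
    print(f"h={h:>3} m: horizon at {z / 1000:.1f} km, "
          f"about {gain_per_cm:.1f} m further per extra cm of height")
```

Both functions rely on dropping the h^2 term, so they should only be trusted for heights that are tiny compared to the Earth.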

It makes sense that the extra distance you see gets smaller and smaller the higher you go, and eventually shrinks down to zero. No matter how high you go, you can never see more than a quarter way around the globe.

(In reality, light bends due to refraction in the atmosphere, so you can sometimes see a bit further.)


Suppose we have a circle with radius R. It has a certain area (you undoubtedly know the formula already, but play along). Suppose we increase R by a small amount \textrm{d}R. What is the change \textrm{d}A in the area?

The original circle is dark blue with area A and radius R. The radius increases an amount dR, increasing the area by the light blue ring with area dA.

\textrm{d}A is the thin, light-blue ring. Imagine taking that ring and peeling it off the edge of the circle and laying it flat. We’d have a rectangle with width \textrm{d}R. Its length comes from the outside edge of the entire circle – the circumference. The circumference is 2 \pi R, so

\textrm{d}A = 2\pi R \textrm{d}R

We saw earlier that \textrm{d}(x^2) = 2x\textrm{d}x, so let x = R and we have

\textrm{d}A = \pi \textrm{d}(R^2)

Thus the quantities A and \pi R^2 change in exactly the same way. Since they also start out the same (both zero when R is zero), we have

A = \pi R^2
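The ring picture also works as a recipe: start with nothing and keep adding thin rings of area 2\pi R \textrm{d}R until you reach the radius you want. The sketch below does that with a plain loop. The function name and the number of rings are arbitrary choices for illustration.

```python
import math

def area_from_rings(radius, steps=100_000):
    """Add up thin rings of area 2*pi*R*dR from R = 0 out to the given radius."""
    dR = radius / steps
    total = 0.0
    for i in range(steps):
        R = (i + 0.5) * dR  # radius at the middle of this ring
        total += 2 * math.pi * R * dR
    return total

radius = 3.0
print("sum of rings:", area_from_rings(radius))
print("pi * R^2:    ", math.pi * radius ** 2)
```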

Next Post

We’ll look at trigonometry. Geometric arguments about the derivatives of trig functions are very simple ways of visualizing what’s going on, and are usually not introduced in a basic calculus course.


  • Draw a cube with sides x and show that \textrm{d}(x^3) = 3x^2\textrm{d}x. Thus the derivative of x^3 with respect to x is 3x^2.
  • Draw a line with length x and show that \textrm{d}(x) = \textrm{d}x, which is of course algebraically obvious. Thus the derivative of x with respect to itself is 1.
  • Draw a rectangle with width w and length c*w and show that \textrm{d}(c*w^2) = 2cw\textrm{d}w = c\textrm{d}(w^2). Thus, whenever you have the differential of a variable multiplied by a constant, the constant can pop outside. Where was this property used implicitly in this post?
  • Now that you know \textrm{d}(x^3) = 3x^2\textrm{d}x, let x^3 = u and find the derivative of u^{1/3} with respect to u. (Answer: \frac{1}{3} u^{-2/3})
  • What is \textrm{d}(x^3)/\textrm{d}(x^2)? Let u = x^2 and find the derivative of u^{3/2} with respect to u. (Answer: \frac{3}{2}u^{1/2}).
  • Examine \textrm{d}(x^4) by letting u = x^2, so we’re looking at \textrm{d}(u^2). Find the derivative of x^4 with respect to x. (Answer: 4x^3)
  • Draw an equilateral triangle with sides of length s. Increase the sides a small amount \textrm{d}s and relate this to the change in area \textrm{d}A. Does this agree with our previous findings?
  • Draw an ellipse with semi-major axis a and semi-minor axis b. Starting with a unit circle, argue by thinking about stretching that the area of the ellipse is \pi ab. Increase a by a small amount \textrm{d}a and increase b proportionately, so that \textrm{d}b = (b/a)\textrm{d}a. This adds a small area \textrm{d}A to the ellipse. Show that this area is 2\pi b\hspace{.3em}\textrm{d}a. Does this let us find the circumference of the ellipse by the same thought process as we used for the circle? (Answer: no). Why not?
  • Draw a sphere with radius R. Use the relationship between \textrm{d}R and \textrm{d}V to find the volume of a sphere, given its surface area is 4\pi R^2. Check your answer against this post.

Another Definite Integral

August 1, 2009

My students claimed they were doing a calculation that required

\int_{-\infty}^\infty e^{-x^4}dx.

I’m not sure what physical situation brought up such a question, but we can find the answer anyway. Let’s kill infinity birds with one stone by evaluating

\int_{0}^\infty x^{\alpha} e^{-x^\beta} dx.

and treating my students’ problem as a special case.

First define the gamma function by

\Gamma(n) \equiv \int_0^\infty t^{n-1}e^{-t}dt.

I have never understood why \Gamma(n) involves (n-1) as the power of t, rather than just n. It makes even less sense when you consider \Gamma(n) = (n-1)! for natural numbers n.

In the definition of the gamma function, make the substitution

t = x^\beta, t^{n-1} = x^{\beta n-\beta}, dt = \beta x^{\beta-1}dx

\begin{array}{rcl}\Gamma(n) & = & \int_0^\infty x^{\beta n-\beta}e^{-x^\beta}\beta x^{\beta-1}dx \\ { } & = & \beta\int_0^\infty x^{n\beta-1}e^{-x^\beta}dx\end{array}

We can choose whatever we want for n, as long as we think we can find \Gamma(n). So let’s turn this into the original problem by substituting

\begin{array}{rcl}n\beta-1 & = & \alpha \\ n & = & \frac{\alpha+1}{\beta} \end{array}

Putting it all together:

\frac{1}{\beta}\Gamma\left(\frac{\alpha+1}{\beta}\right) =  \int_0^\infty x^\alpha e^{-x^\beta}dx.

So we understand these seemingly-more-complicated definite integrals just as well as we understand the gamma function.

For the special case my students were interested in, which has \alpha = 0, \beta = 4, the integral goes from -\infty to \infty. The integrand is even, so we multiply the half-line result by two to get

\int_{-\infty}^{\infty} e^{-x^4}dx = \frac{1}{2}\Gamma(1/4).

A computer tells me this evaluates to about 1.8.
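If you want to ask the computer yourself, here is a minimal sketch using only Python's standard library. It evaluates the gamma-function formula and, as a sanity check, compares it with a brute-force midpoint sum. The function names, the cutoff, and the step count are assumptions made for this example, and the check assumes \alpha \geq 0 and \beta > 0 so the integrand is well behaved.

```python
import math

def closed_form(alpha, beta):
    """(1/beta) * Gamma((alpha + 1) / beta), the formula derived above."""
    return math.gamma((alpha + 1) / beta) / beta

def midpoint_integral(alpha, beta, upper=8.0, steps=80_000):
    """Midpoint-rule estimate of the integral of x**alpha * exp(-x**beta) on [0, upper]."""
    dx = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * dx  # midpoint of the i-th slice
        total += x ** alpha * math.exp(-(x ** beta)) * dx
    return total

# The students' problem: alpha = 0, beta = 4, doubled to cover the whole real line.
print("formula:    ", 2 * closed_form(0, 4))
print("brute force:", 2 * midpoint_integral(0, 4))

# Another case, alpha = 2 and beta = 2, where the formula gives sqrt(pi) / 4.
print(closed_form(2, 2), midpoint_integral(2, 2), math.sqrt(math.pi) / 4)
```

The cutoff at x = 8 is safe here because the integrand is already vanishingly small long before that point.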

Why Do Taylor Series Converge? (part 1)

July 11, 2009

Although I use Taylor series regularly, I have never understood precisely why they work.

The basic idea is simple enough. Suppose, by magic, it just happens to be true that

f(x) = c_0 + \sum_{k=1}^\infty c_k (x-x_0)^k

for some infinite list of real numbers (c_0,c_1,...c_k,...).

Then by differentiating both sides of the sum n times, we have

\frac{d^nf}{dx^n} = n!\,c_n + \sum_{k=n+1}^\infty c_k \frac{k!}{(k-n)!} (x-x_0)^{k-n}

and for the particular value x = x_0, every term in the remaining sum vanishes, so we can simplify this to

c_n = \frac{1}{n!}\left.\frac{d^nf}{dx^n}\right|_{x=x_0}

Plugging the c_n back into the original sum, we obtain the Taylor series for f about the point x_0.
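Here is a small sketch of that recipe in action. Differentiating (x-x_0)^n a total of n times leaves a factor of n!, so the coefficients used below are c_n = f^{(n)}(x_0)/n!. The example builds them for the sine function about x_0 = 0, where the derivatives cycle through 0, 1, 0, -1. The function names and the number of terms are my own choices.

```python
import math

def taylor_coefficients(derivatives_at_x0):
    """Turn [f(x0), f'(x0), f''(x0), ...] into c_n = f^(n)(x0) / n!."""
    return [d / math.factorial(n) for n, d in enumerate(derivatives_at_x0)]

def taylor_partial_sum(coefficients, x, x0=0.0):
    """Evaluate the truncated series: sum of c_k * (x - x0)**k."""
    return sum(c * (x - x0) ** k for k, c in enumerate(coefficients))

# Derivatives of sin at 0 cycle through 0, 1, 0, -1.
derivs = [(0, 1, 0, -1)[n % 4] for n in range(12)]
coeffs = taylor_coefficients(derivs)

x = 1.0
print("partial sum:", taylor_partial_sum(coeffs, x))
print("math.sin:   ", math.sin(x))
```

This only shows the series working for one friendly function; it says nothing yet about why it should work in general.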

This is not satisfying because, as I said, it involves magic when assuming we can write a function as an infinite sum of polynomials. What’s been worrying me in particular is the following:

Here is a rough picture of a smooth function.


Here is a different one.


These pictures are exactly the same most of the way – after I made the first one, I made the second one by erasing the last part and redoing it – giving it a twist at the end.

The problem is that if I calculate a Taylor series for these functions centered about x=0, I do it using solely local knowledge – just a bunch of derivatives right at that point.

In order to calculate derivatives at some point, you only need the value of the function there, and the values of the function within an infinitesimal neighborhood (by which I mean any finite neighborhood, no matter how small, is big enough). Is it really true that if I draw just a tiny little box around one part of a smooth function, and show you just what’s inside the box, you can tell me all the wild fluctuations the full function goes through anywhere from here to infinity?

If I only tweak the function way out on one of the ends, how are you supposed to know about it, just looking at your isolated little box? Shouldn’t my two example functions have the same Taylor series about x=0? Then they should be the same function. But they’re different.

Evidently, it is impossible to have two smooth functions that agree for a while, then go their separate ways. In fact, unless they’re the same function, it’s impossible for them to agree in any open interval at all. Why should this be true?

Let’s suppose the two smooth functions are f and g, and that they agree everywhere on the open interval (a,b). However, they do not have the same value everywhere. We’re looking for a contradiction in this setup.

Let’s define h as the difference f-g. Then h is smooth because f and g are. We also know h=0 on (a,b), but h is not zero everywhere. Form the set of all points x > a for which h(x) \neq 0. (If there are no such points, find all the points x<b where h(x) \neq 0 and flip the rest of this argument around to match.)

a is a lower bound for this set, and so the set must have a greatest lower bound. Because the real numbers are complete, the greatest lower bound must be a real number. Let's denote this number by c, so that now h=0 on (a,c), but h(c) \neq 0. This implies h is discontinuous, because \lim_{x \to c^-} h(x) = 0, but h(c) \neq 0. This contradicts h being smooth.

We must conclude that the original assumption was false. If f and g are two smooth functions that agree on some open interval, they agree everywhere. (This is not intended to be a rigorous proof, since I am mathematically incompetent to produce one.)

So somehow, when I tried to make those two pictures that fit together smoothly, I messed up. When I erased the right hand end of the first one and drew a new ending in for it, I was supposed to match all the derivatives of the original function with my new one. I might have matched the first derivative, second derivative, and third derivative, but somewhere along that line I went awry.

The only way I could have fit my new function to the old one for all their derivatives is to have changed the old one just a little bit in the process. I would have had to change it at every value, including all the way over at x=0, even though I'm just trying to put a new little blip on the function way over at x = something\_big.

This tiny little change to the original function shows up in that box you were using to monitor a little chunk of the function, and if your spectacles are powerful enough, you can extrapolate to determine exactly what changes I've made out on the periphery.

This seems pretty nice. It eases some of my concerns. It doesn't prove that the Taylor series of a smooth function converges, but it does show that the concept, that of describing an entire smooth function by its local characteristics at a given point, makes sense.

If I want to prove that Taylor series converge, I now only have to do it on some infinitesimal interval. Then, so long as the series converges to some value, it must converge to the correct one.