When I was a kid, I collected baseball cards. I’ve stopped long since. I can no longer recite streams of players’ yearly RBI totals, and I think my parents threw the cards out when I was away at college. Still, one problem related to those baseball cards still interests me.
Cards were sold in packs of twelve, each containing a random sample selected from the complete set of seven hundred. With my one-dollar allowance and the fifty-cent price of a pack of a cards, how long should I have expected to invest my income before getting a complete set?
To get a complete set of 700 cards, I first need to get a card that’s not a repeat, then get another one that’s not a repeat, and so on 700 times. The first card I pull out of the first pack can’t be a repeat. After that, the second card has a 699/700 chance of being new. Once I have c different cards, the probability of the next one being distinct is (700-c)/700.
We need to know how many cards I expect to draw before getting a new one, when the probability of the next card being new is p. This is similar to the classic problem of how many times one expects to roll a die before getting a 4. The answer is just what you’d think – six times. One argument for this is that if I roll the die many times, then one sixth of the rolls will be 4’s, so the average number of rolls from one 4 to the next is six. That means the expected number of rolls to get a 4 is six. The general answer is 1/p.
So once I have c different cards, the expected number of cards I have to look at to get another new one is 700/(700-c). This makes my expected number of cards purchased to complete the set
Factoring out 700 and writing the terms backwards, this is
In other words, if there are n possible equally-likely outcomes to a random event, the expected value of the number of trials to get every value to occur is n times the nth harmonic number.
An image from Wikipedia shows that the harmonic numbers are approximated by the integral of 1/x, so that the answer scales with n as n*log(n).
For my baseball card problem, n=700 and I’d need about 4500 cards, or around 4 years’ allowance, to complete the set (I never made it).
The integral picture immediately suggests an improvement. The curve drawn is an underestimate – the rectangles always pop up higher than the curve. What is the approximate area left over above the curve?
A simple guess comes from taking each of those left-over areas to be a triangle. These triangles all have a base of 1. Their heights vary, but the nice thing is that the top of one triangle begins at the bottom of the next. Imagine sliding all the triangles over so they’re stacked on top each other. This would give a total height of 1-1/(n+1) for the triangles out to the nth harmonic number. So a better guess for the for the nth harmonic number is
This is also an underestimate because the area above the curve doesn’t really consist of triangles, as you can see. truly converges to . is the Euler-Mascheroni constant. It’s now know to 30 billion digits, but the most important ones are 0.577…
Browsing Wikipedia reveals a trove of information about these mathematical entities. For example, if you have ever tried to sum a geometric series, you saw that
If you integrate from zero to one, you get 1/n, so
This lets us define fractional harmonic numbers, and for simple fractions they involve things like and . Further down that page we learn that harmonic numbers are related to the Riemann zeta function and the Riemann hypothesis and a bunch of other things I don’t know about.
Why might I find this stuff interesting? If you have a closed physical system with many particles, there are many possible physical states of those particles. The ergodic hypothesis assumes that as the system evolves in time, it passes through all those states, and all states are equally likely to be accessed at some far future time (assuming we restrict attention to those states that obey conservation laws, like conservation of energy). So the harmonic numbers might give some insight into how long I have to wait to observe particular configuration of the system. Would the harmonic numbers be discovered by a Boltzmann brain?