When I was a kid, I collected baseball cards. I’ve stopped long since. I can no longer recite streams of players’ yearly RBI totals, and I think my parents threw the cards out when I was away at college. Still, one problem related to those baseball cards still interests me.

Cards were sold in packs of twelve, each containing a random sample selected from the complete set of seven hundred. With my one-dollar allowance and the fifty-cent price of a pack of a cards, how long should I have expected to invest my income before getting a complete set?

To get a complete set of 700 cards, I first need to get a card that’s not a repeat, then get another one that’s not a repeat, and so on 700 times. The first card I pull out of the first pack can’t be a repeat. After that, the second card has a 699/700 chance of being new. Once I have c different cards, the probability of the next one being distinct is (700-c)/700.

We need to know how many cards I expect to draw before getting a new one, when the probability of the next card being new is p. This is similar to the classic problem of how many times one expects to roll a die before getting a 4. The answer is just what you’d think – six times. One argument for this is that if I roll the die many times, then one sixth of the rolls will be 4’s, so the average number of rolls from one 4 to the next is six. That means the expected number of rolls to get a 4 is six. The general answer is 1/p.

So once I have c different cards, the expected number of cards I have to look at to get another new one is 700/(700-c). This makes my expected number of cards purchased to complete the set

Factoring out 700 and writing the terms backwards, this is

In other words, if there are n possible equally-likely outcomes to a random event, the expected value of the number of trials to get every value to occur is n times the n^{th} harmonic number.

An image from Wikipedia shows that the harmonic numbers are approximated by the integral of 1/x, so that the answer scales with n as n*log(n).

For my baseball card problem, n=700 and I’d need about 4500 cards, or around 4 years’ allowance, to complete the set (I never made it).

The integral picture immediately suggests an improvement. The curve drawn is an underestimate – the rectangles always pop up higher than the curve. What is the approximate area left over above the curve?

A simple guess comes from taking each of those left-over areas to be a triangle. These triangles all have a base of 1. Their heights vary, but the nice thing is that the top of one triangle begins at the bottom of the next. Imagine sliding all the triangles over so they’re stacked on top each other. This would give a total height of 1-1/(n+1) for the triangles out to the n^{th} harmonic number. So a better guess for the for the n^{th} harmonic number is

This is also an underestimate because the area above the curve doesn’t really consist of triangles, as you can see. truly converges to . is the Euler-Mascheroni constant. It’s now know to 30 billion digits, but the most important ones are 0.577…

Browsing Wikipedia reveals a trove of information about these mathematical entities. For example, if you have ever tried to sum a geometric series, you saw that

If you integrate from zero to one, you get 1/n, so

This lets us define fractional harmonic numbers, and for simple fractions they involve things like and . Further down that page we learn that harmonic numbers are related to the Riemann zeta function and the Riemann hypothesis and a bunch of other things I don’t know about.

Why might I find this stuff interesting? If you have a closed physical system with many particles, there are many possible physical states of those particles. The ergodic hypothesis assumes that as the system evolves in time, it passes through all those states, and all states are equally likely to be accessed at some far future time (assuming we restrict attention to those states that obey conservation laws, like conservation of energy). So the harmonic numbers might give some insight into how long I have to wait to observe particular configuration of the system. Would the harmonic numbers be discovered by a Boltzmann brain?

January 1, 2011 at 4:42 pm

Hi Mark,

nice post! I don’t know about Riemann hypothesis but harmonic numbers obviously approximate $\zeta(-1) = -1/12$. This sounds like nonsense because harmonic numbers tend to infinity but it has to do with clever tricks in analytic continuation and zeta function regularization that are often used in QFT. See this post of Terry Tao for a gentle (if quite rigorous) introduction: http://terrytao.wordpress.com/2010/04/10/the-euler-maclaurin-formula-bernoulli-numbers-the-zeta-function-and-real-variable-analytic-continuation/

As for the talk about ergodic hypothesis: well, ergodic hypothesis tells you that system spends amount of time in a given place of the system that is proportional to its volume. Because we have exp(S/k_B) microstates accessible for a given macrostate, you have to wait for exp((S_1 – S_2) / k_B) (units being some characteristic time-scale in the system) for a broken egg with entropy S_1 to fix itself into an unbroken egg with entropy S_2.

January 2, 2011 at 10:37 am

Hi Marek,

Nice to see you here! Thanks for the link. I have to do some background reading before I can follow everything there, though.

I think you are right about the stat mech connection – I was just looking for some sort of tie in with physics, since I actually did the calculation while thinking about physics, not baseball cards.

January 2, 2011 at 12:13 pm

Yeah, Terry’s post are not particularly easy to read for physicists. He often assumes quite a lot math background.

Hm, there could be some connection but I don’t see it yet and I’ve never heard about it. But perhaps you’d be interested in another connection of Riemann zeta function and statistical physics via zeta distribution. It’s the only reasonable probability distribution on primes, so to speak. See these posts of Qiaochu Yuan: http://qchu.wordpress.com/2010/11/09/zeta-functions-statistical-mechanics-and-haar-measure/ http://qchu.wordpress.com/2010/11/14/a-little-more-about-zeta-functions-and-statistical-mechanics/ Some parts of the posts are quite technical (like profinite completions of groups) but mostly it’s just basic probability and should be understandable. What I find most interesting in these posts is the primon gas: http://en.wikipedia.org/wiki/Primon_gas which is a very nice application of statistical mechanics to prime numbers. I don’t know whether this is interesting either for physics or math though. But I guess it could pop up somewhere.