Breaking pills
In a recent post, Mark Jason Dominus posed the following question:
Suppose you have a bottle that contains \(N\) whole pills. Each day you select a pill at random from the bottle. If it is a whole pill you eat half and put the other half back in the bottle. If it is a half pill, you eat it. How many halfpills can you expect to have in the bottle the day after you break the last whole pill?
This is a nice problem and stated in a natural setting. Superficially, I was reminded of (Feller’s) Banach’s matchbox problem.
Consider an “inbetween state”, after some number of days have passed. The bottle will have some number of whole pills, say \(w\), and some number of halfpills, say \(h\). We can describe this as the state \((w, h)\). From this state, we pick a pill at random, which means that:

with probability \(w/(w+h)\), we pick a whole pill and put back a halfpill, thus going to state \((w  1, h + 1)\).

with probability \(h/(w+h)\), we pick a halfpill and eat it, thus going to state \((w, h  1)\).
Here’s the same thing drawn as a picture; the numbers on the arrows are the probabilities:
(We used the letters “w” and “h” to stand for “whole pills” and “halfpills”, but conveniently, we can also remember “h” as “height”, maybe.)
So starting at state \((N, 0)\), the state \((w, h)\) evolves according to those probabilities, and we are interested in the value of \(h\) at the first time we hit \(w = 0\).
[TODO: some sort of animation / interactive applet here, showing evolution and the first time the path hits the \(w=0\) vertical line.]
Two useless(?) digressions
There exists a mathematical formalism that models such random processes, though it turns out to be useless for this problem. Namely, a Markov chain. This is when we have some set \(S\) of states, and from each state \(s\) in \(S\), we move with probability \(p_i\) to a new state \(s_{i}\) that’s also in \(S\). (Here the set \(S\) may be all pairs \((w, h)\) of integers, say, or those such that \((w+h)\) is bounded by \(N\), or something like that.) As MJD points out, the theory of Markov chains is more useful when we have a fixed set of states and we want to know the eventual behaviour after a very large number of steps, but that’s not what we want here. (The number of states increases with \(N\), and eventually we’ll always end up in the \((0, 0)\) state.)
There’s another formalism, a special case of Markov chains, that is more relevant, though it was still not useful to me. Namely, the theory of an urn process. The following is from page 529 of Analytic Combinatorics by Flajolet and Sedgewick:
an urn model with two colours is determined by a \(2 × 2\) matrix with integer entries: \[\mathcal{M} = \begin{pmatrix} \alpha & \beta \cr \gamma & \delta \end{pmatrix}, \quad \alpha, \delta \in \mathbb{Z},,, \beta, \gamma \in \mathbb{Z}_{≥0}.\] At any instant, if a ball of the first colour is drawn, then it is placed back into the urn together with \(\alpha\) balls of the first colour and \(\beta\) balls of the second colour; similarly, when a ball of the second colour is drawn, with \(\gamma\) balls of the first colour and \(\delta\) balls of the second colour.
So we can model the whole pills and halfpills as two colours of balls, and thus our random process as the urn process determined by the matrix \[\begin{pmatrix} 0 & 1 \cr 0 & 0 \end{pmatrix}\]
—every time we pick a whole pill we add a halfpill to the bottle, and every time we pick a halfpill we return nothing to the bottle. Unfortunately I don’t know anything about urn processes besides this definition, and I couldn’t find a way to use this.
End of digression
Fortunately, it is not too hard to just solve this problem from scratch. Recall the illustration from earlier:
[TODO: either reproduce earlier animation or a small variant.]
So for each starting point \((w, h)\), there’s a probability distribution of the first state \((0, X)\) that is reached: a probability that this happens at \(X=0\), a probability that that this happens at \(X=1\), and so on. There’s a nice way of encapsulating a probability distribution (on nonnegative integers) into a single “object”, and that’s a probability generating function: using \(X\) to denote a random variable sampled from this distribution, we can write down the (formal power series)
\[P(z) = \Pr[X=0] + \Pr[X=1]z + \Pr[X=2]z^2 + \Pr[X=3]z^3 + \ldots\]
in which the probability of a particular nonnegative integer \(k\) is “stored” as the coefficient of \(z^k\).
In our problem, for each point \((w, h)\), we’ll encapsulate the corresponding probability distribution as \(P_{w, h}(z)\). This is a polynomial of degree \(w + h\), as the coefficient of \(z^k\) is \(0\) for \(k > w + h\) (we’ll never end up with more halfpills than the total number of pills we started with).
Each \(P_{w, h}(z)\) can be computed easily:

For \(w = 0\), we have \(P_{0, h}(z) = z^h\) as we’ve already hit \(w=0\) so the probability distribution is simply the current \(h\) with probability \(1\).

For \(w > 0\), we have \[P_{w, h}(z) = \frac{w}{w+h} P_{w1, h+1}(z) + \frac{h}{w+h} P_{w, h1}(z)\] where on the RHS we take the second term to be \(0\) if \(h=0\) (that is, as the product is going to be \(0\), we won’t bother ourselves with the question of what \(P_{w, 1}(z)\) is… or we could just define that arbitrarily, or whatever).
For example, in Sage, we can compute it as follows:
var('z')
from collections import defaultdict
p = defaultdict(dict) # So that we can refer to p[w][h] without error.
N = 31
for h in range(N): p[0][h] = z**h
for w in range(1, N):
w = Integer(w)
for h in range(N  w):
p[w][h] = w/(w+h)*p[w1][h+1] + h/(w+h)*(p[w][h1] if h else 0)
For example, we get
\[P_{2,3}(z) = \frac{163}{600}z + \frac{163}{600}z^{2} + \frac{133}{600}z^{3} + \frac{31}{200}z^{4} + \frac{2}{25}z^{5}\]
Hover over a point \((w, h)\) below to see the corresponding polynomial (thus, the probability distribution) for that point:
[TODO: create this interactive applet]
We started with \(P_{0, h} = z^h\), and it turns out that \(P_{1, h} = \frac{1}{h+1}(z + z^2 + \dots + z^h)\), but for \(w \ge 2\) the coefficients of these polynomials seem quite arbitrary, and I couldn’t find them on OEIS, let alone find a closed form.
But if we only care about the expected value, then it turns out to have a nice expression. For a probability generating function as before, namely
\[P(z) = \Pr[X=0] + \Pr[X=1]z + \Pr[X=2]z^2 + \Pr[X=3]z^3 + \ldots\]
we can get the expected value as the derivative of \(P\), evaluated at \(1\):
\[\begin{align}
P'(z) &= \Pr[X=1] + 2\Pr[X=2]z + 3\Pr[X=3]z^2 + \ldots \\\
P'(1) &= \Pr[X=1] + 2\Pr[X=2] + 3\Pr[X=3] + \ldots \\\
&= E[X]
\end{align}\]
So we can simply compute, additionally to the previous code:
E = defaultdict(dict)
for w in range(1, N):
for h in range(N  w):
E[w][h] = derivative(p[w][h])(z = 1)
This works, and the numbers turn out to be something interesting: see if you can notice the pattern below:
[TODO: make an interactive thing to see the values as fractions, replacing this table below.]
\begin{array}{cccccccccccc}
& h=0 & h=1 & h=2 & h=3 & h=4 & h=5 & h=6
& h=7 & h=8 & h=9 & h=10 \\\
w=1 & 1 & \frac{3}{2} & 2 & \frac{5}{2} &
3 & \frac{7}{2} & 4 & \frac{9}{2} & 5 &
\frac{11}{2} & 6 \\\
w=2 & \frac{3}{2} & \frac{11}{6} & \frac{13}{6} &
\frac{5}{2} & \frac{17}{6} & \frac{19}{6} &
\frac{7}{2} & \frac{23}{6} & \frac{25}{6} &
\frac{9}{2} & \frac{29}{6} \\\
w=3 & \frac{11}{6} & \frac{25}{12} & \frac{7}{3} &
\frac{31}{12} & \frac{17}{6} & \frac{37}{12} &
\frac{10}{3} & \frac{43}{12} & \frac{23}{6} &
\frac{49}{12} & \frac{13}{3} \\\
w=4 & \frac{25}{12} & \frac{137}{60} & \frac{149}{60}
& \frac{161}{60} & \frac{173}{60} & \frac{37}{12}
& \frac{197}{60} & \frac{209}{60} & \frac{221}{60}
& \frac{233}{60} & \frac{49}{12} \\\
w=5 & \frac{137}{60} & \frac{49}{20} & \frac{157}{60}
& \frac{167}{60} & \frac{59}{20} & \frac{187}{60}
& \frac{197}{60} & \frac{69}{20} & \frac{217}{60}
& \frac{227}{60} & \frac{79}{20} \\\
w=6 & \frac{49}{20} & \frac{363}{140} &
\frac{383}{140} & \frac{403}{140} & \frac{423}{140} &
\frac{443}{140} & \frac{463}{140} & \frac{69}{20} &
\frac{503}{140} & \frac{523}{140} & \frac{543}{140} \\\
w=7 & \frac{363}{140} & \frac{761}{280} &
\frac{199}{70} & \frac{831}{280} & \frac{433}{140} &
\frac{901}{280} & \frac{117}{35} & \frac{971}{280} &
\frac{503}{140} & \frac{1041}{280} & \frac{269}{70} \\\
w=8 & \frac{761}{280} & \frac{7129}{2520} &
\frac{7409}{2520} & \frac{2563}{840} & \frac{7969}{2520}
& \frac{8249}{2520} & \frac{2843}{840} &
\frac{8809}{2520} & \frac{9089}{2520} & \frac{1041}{280}
& \frac{9649}{2520} \\\
w=9 & \frac{7129}{2520} & \frac{7381}{2520} &
\frac{7633}{2520} & \frac{1577}{504} & \frac{8137}{2520}
& \frac{8389}{2520} & \frac{8641}{2520} &
\frac{8893}{2520} & \frac{1829}{504} & \frac{9397}{2520}
& \frac{9649}{2520} \\\
w=10 & \frac{7381}{2520} & \frac{83711}{27720} &
\frac{86231}{27720} & \frac{88751}{27720} &
\frac{91271}{27720} & \frac{93791}{27720} &
\frac{96311}{27720} & \frac{98831}{27720} &
\frac{101351}{27720} & \frac{103871}{27720} &
\frac{106391}{27720} \\\
\end{array}
It turns out that there’s a simple closedform expression:
\[E(w, h) = H_w + \frac{h}{w + 1}\]
where \(H_w\) stands for the harmonic number \[H_w = 1 + \frac12 + \frac13 + \dots + \frac1w.\]
This is easy to prove by induction:

For \(w = 0\), there’s nothing to prove: \(E(0, h) = h\), because all the probability is concentrated on that one value.

For \(w > 0\) and \(h = 0\), we can from \((w, 0)\) only move to \((w  1, 1)\), so \[E(w, 0) = E(w  1, 1) = H_{w1} + \frac1{(w1)+1} = H_w\] as expected.

For \(w > 0\) and \(h > 0\), suppose we’ve proved that \(E(w', h') = H_{w'} + h'/(w' + 1)\), for all \((w', h') < (w, h)\) lexicographically (i.e. anything to the left or directly below). Then, \[\begin{align}E(w, h) &= \frac{w}{w+h} E(w1, h+1) + \frac{h}{w+h} E(w, h1) \cr &= \frac{w}{w+h}\left(H_{w1} + \frac{h+1}{w}\right) + \frac{h}{w+h}\left(H_w + \frac{h1}{w+1}\right) \cr &= \frac{w}{w+h}\left(H_w + \frac{h}{w}\right) + \frac{h}{w+h}\left(H_w + \frac{h1}{w+1}\right) \cr &= H_w + \frac{h}{w+h}\left(1 + \frac{h1}{w+1}\right) \cr &= H_w + \frac{h}{w+h}\frac{w+h}{w+1} \cr &= H_w + \frac{h}{w+1}\end{align}\] which is what we wanted to prove. (I wrote out the manipulations explicitly because it took me a few tries to get it to work!)
Alternative proof
We proved that \[E(w, h) = H_w + \frac{h}{w + 1}\] but the proof by induction still gives not much intuition about why this should be so.
Another way to picture this problem is in terms of ballsinbins: consider the starting state \((w, h)\) as being made of \((w + h)\) bins, of which \(w\) bins have \(2\) balls and \(h\) bins have \(1\) ball. In each step we pick a nonempty bin at random (and remove one ball from it), and we’re asking for how many bins are nonempty at the first time that no bin has more than \(1\) ball. I don’t know if this way of thinking about it helps—in hindsight, it’s just a restatement of the original problem—but I already wrote the following, so I’ll stick to it.
Consider the following argument. Suppose we start at state \((w, h)\), i.e. with \(w\) bins having two balls, and \(h\) bins having a single ball. Consider a particular one of those \(h\) bins. Its contribution to the final expected number of singleball bins is simply the probability that this one remains nonempty when the last twoball bin is picked.
For each bin \(i\) (of the \(w + h\) bins), let \(\mathbb{1}_i\) denote the indicator function for the event of that particular bin remaining nonempty (i.e. having exactly one ball) at the first time when all the twoball bins have been picked at least once. And let \(p_i\) be its probability. Then, the expected number we want is \[E(w, h) = \sum_{i} \mathop{\mathbb{E}}[\mathbb{1}_i] = \sum_i p_i.\] For the \(h\) bins \(i\) that correspond to the oneball bins, the probability \(p_i\) turns out to be easy to compute: Assuming \(h > 0\), consider the \(w\) twoball bins along with this particular bin \(i\). For these \(w + 1\) bins write down the order in which they are first picked from, in the entire process (not stopping until there are no more balls). By symmetry, all possible orderings are equally likely: the fact that the \(w\) bins start with two balls only matters for the second time they are picked. This means that the probability that the bin \(i\) occurs last in this ordering is simply \(1/(w+1)\). Now, note that the event of this bin remaining nonempty when there are no more twoball bins is precisely the probability that this bin is not picked from until all the \(w\) bins have been picked from: in other words, \(p_i = 1/(w + 1)\) as we just calculated. This shows that \[E(w, h) = E(w, 0) + \frac{h}{w + 1}.\] And when we start with \(h = 0\), the outcome of very first step is that we invariably go to the \((w  1, 1)\) state, thus \[\begin{align}E(w, 0) &= E(w  1, 1) \cr &= E(w  1, 0) + \frac{1}{w}\end{align}\] iterating which gives \[E(w, 0) = H_w\] and our formula for \(E(w, h)\) is proved again.
Update
See followup post by MJD which includes more clear explanation and also an intriguing note about isomorphism with a frog puzzle.