Derangements, Dobiński's formula, and their combination
A couple of years ago, someone on a mailing list asked for a proof for their observation that
where $B_n$ denotes the Bell numbers (the number of set partitions of $n$ items), and \( !(nk) \) aka $D_{nk}$ denotes the number of derangements of $nk$ items.
Actually, this appears to be the $m=n$ special case of something more general: whenever $m \le n$, we can say
$$B_m n! = \sum_{k} {n \choose k} k^m D_{nk}$$
The “symbolic method” of Flajolet and Sedgewick (from their book Analytic Combinatorics) gives a nice unifying framework that puts in perspective (1) the relation between permutations and derangements, (2) Dobinski’s formula, and (3) this result — when viewed from this lens, all these results feel very natural (IMO).
0. Background (EGFs and labelled products)
Suppose we have a combinatorial class \(\mathcal{C}\), i.e. a set of objects such that \(C_n\) is the number of objects of size \(n\) and is finite. Then the exponential generating function (EGF) of this class is defined as
\[C(z) = \sum_{n} C_n \frac{z^n}{n!}\]
It makes sense to use EGFs when these objects are labelled, i.e. if an object is of size \(n\), then it contains \(n\) elements that have labels \(1, 2, \ldots, n\). For example, \(\mathcal{C}\) may be the class of all permutations, so that \(C_n\) counts the number of permutations of \(n\) items.
Some trivial examples:

\(\mathcal{E}\) for the classes containing only a single element of size 0, thus having EGF \(E(z) = 1\).

\(\mathcal{Z}\) for the class containing only a single element of size \(1\), thus having EGF \(Z(z) = z\).

\(\mathcal{U} = \operatorname{S\scriptsize ET}(\mathcal{Z})\) for an “urn”, i.e. a set of elements: There is exactly one for every size, i.e. \(U_n = 1\) for all \(n\), and \(U(z) = \sum_{n} {z^n \over n!} = e^z\).
We can define the labelled product of two such combinatorial classes: put together an object from the first class and an object from the second, redistributing labels so that the relative order is consistent within each object. For two classes \(\mathcal{A}\) and \(\mathcal{B}\), this operation, denoted \(\mathcal{A} \star \mathcal{B}\), corresponds to the product of the EGFs.
Note that the binomial convolution \(C_n = \sum_{k} {n \choose k} A_k B_{nk}\) corresponds to both

the combinatorial interpretation (to obtain an object of size \(n\) in \(\mathcal{C} = \mathcal{A} \star \mathcal{B}\), take an object of size \(k\) from \(\mathcal{A}\), and one of size \(nk\) from \(\mathcal{B}\), choose in \(n \choose k\) ways which \(k\) labels of \(n\) are given to the former, and relabel both accordingly), and

the formal expression: the coefficient of \(z^n\) in \(C(z) = A(z)B(z)\), namely \([z^n]C(z) = C_n/n!\) is got via products of the form \(A_k {z^k \over k!} B_{nk} {z^{nk} \over (nk)!} = {1 \over n!} {n \choose k} A_k B_{nk} z^n\) for some \(k\).
With that quick background (explained more clearly in the book!), all three results have very short proofs.
\[\def\stirling#1#2{\left\lbrace{#1 \atop #2}\right\rbrace}\]
1. Permutations and derangements
Let \(\mathcal{P}\) be the class of all permutations. There are \(P_n = n!\) permutations of size \(n\), so its EGF is \(P(z) = \sum_n P_n \frac{z^n}{n!} = \sum_{n} z^n = \frac{1}{1z}\). (In fact we can also obtain this by noting that any permutation is either the empty permutation or the labelled product of a single element and a smaller permutation, so \(\mathcal{P} = \mathcal{E} + \mathcal{Z} \star \mathcal{P}\) which for the EGF gives \(P(z) = 1 + zP(z)\) or \(P(z) = \frac{1}{1z}\).)
Let \(\mathcal{D}\) be the class of all derangements. Then note that any permutation is the labelled product of a derangement and a set of elements (the fixed points). So
\[\mathcal{P} = \mathcal{D} \star \mathcal{U}\]
\[P(z) = D(z) e^z\]
Comparing coefficients of $z^n/n!$ gives the relation \(n! = \sum_{k} \binom{n}{k} D_k\), or equivalently, from \(D(z) = e^{z}P(z)\), the relation $$D_n = \sum_k \binom{n}{k} (1)^k (nk)! = n! \sum_k \binom{n}{k} \frac{1}{k!}$$
(BTW, the same “symbolic method” also gives us a couple of other ways of deriving the same expression for $D_n$; see this Math.SE answer.)
2. Surjections and all functions (Dobiński’s formula)
Fix a set \(S\) of size \(M\) for some fixed constant integer \(M\). (For example, \(S\) may be \(\{1, 2, \ldots, M\}\).)
Let \(\mathcal{R}\) be the class of all surjective (“onto”) functions from \(S\) to some prefix of the natural numbers. That is, \(R_n\) counts the number of surjective functions from \(S\) to \(\{1, 2, \ldots, n\}\). This is an ordered set partition, so the number is \(R_n = n!\stirling{M}{n}\), where the notation $\stirling{M}{n}$ denotes the number of partitions of a set of size $M$ into $n$ parts, known as the Stirling subset numbers (aka Stirling numbers of the second kind). We can see this as: partition \(S\) into \(n\) nonempty subsets (the preimage of each element), then order these \(n\) subsets in any of \(n!\) ways. (Note that \(R_n\) is \(0\) for \(n > M\).)
Let \(\mathcal{F}\) be the class of all functions from \(S\) to some prefix of the natural numbers. That is, \(F_n\) counts the number of functions from \(S\) to \(\{1, 2, \ldots, n\}\), and this of course is \(n^M\).
Now, any such function is the labelled product of a surjection and set of “dummy” elements (in the codomain but not in the range). So we have
\[\mathcal{F} = \mathcal{U} \star \mathcal{R}\]
\[F(z) = e^z R(z)\]
Setting \(z = 1\) here gives
\[\sum_{n} \frac{n^M}{n!} = e \sum_{n} n! \stirling{M}{n} \frac{1}{n!} = e \sum_{n} \stirling{M}{n} = eB_M\]
which is known as Dobiński’s formula.
I had posted this online at some point, excuse the handwriting:
3. Putting them together
Above we’ve seen that \(\mathcal{P} = \mathcal{U} \star \mathcal{D}\) and that \(\mathcal{F} = \mathcal{U} \star \mathcal{R}\). Putting them together, we have
\[\mathcal{F} \star \mathcal{D} = \mathcal{R} \star \mathcal{U} \star \mathcal{D} = \mathcal{R} \star \mathcal{P}\]
\[F(z) D(z) = R(z) P(z)\]
So
\begin{align}
\sum_{k} {n \choose k} k^M D_{nk} &= \sum_{k} {n \choose k} k!\stirling{M}{k} (nk)! \\\
&= n! \sum_{k \le n} \stirling{M}{k}
\end{align}
and for \(n \ge M\) the RHS is \(n! B_M\). This is precisely the identity we initially wanted to prove.