Mathematical trolling: the Willans prime formula

December 16, 2023

It has long been of interest to study any regularity in the prime numbers, a question which can be phrased as: “Is there a formula for the $n$ th prime number?” While there has been a fair bit of progress and understanding (e.g. the work of Riemann, the prime-number theorem), in 1964 someone named(?) C. P. Willans from the University of Birmingham gave the following “troll” answer, answering the question literally while flagrantly violating its spirit.

$p_{n} = 1 + \sum_{i = 1}^{2^{n}} ⌊ {(\frac{n}{\sum_{j = 1}^{i} ⌊ {(\cos \frac{(j - 1)! + 1}{j} π)}^{2} ⌋})}^{1 / n} ⌋$

This ridiculous expression, which we’ll understand better below, is indeed a “formula” for the $n$ th prime number, but it is in nearly every way worse than just saying “the $n$ th prime number”. Two articles from a couple of decades later have some words on such formulas:

What is an Answer? by Herbert S. Wilf (May 1982):

sometimes […] we may feel that the disease was preferable to the cure. […] is unacceptable in polite society as an answer, despite its elegant appearance, because it is just a restatement of the question, and it does not give us a tool for calculating f(n) that we didn’t have before.
Formulas for Primes by Underwood Dudley (January 1983):

The conclusion to be drawn from all this, I think, is that formulas for formulas' sake do not advance the mathematical enterprise. Formulas should be useful. If not, they should be astounding, elegant, enlightening, simple, or have some other redeeming value. […] Even as in business and marriage, in mathematics not all that is true needs to be published.

Nevertheless, the formula is surprising at first glance and can be interesting to understand: it is mentioned on Wikipedia and there’s an excellent YouTube video about it.

It is based on a bunch of tricks, and the video does a good job of explaining them from the bottom up, so as a complement, let’s do it top-down.

The nth prime in terms of the prime-counting function

Let $π (m)$ denote the number of primes not greater than $m$ , e.g. $π (10) = 4$ because there are four primes $2, 3, 5, 7$ not greater than $10$ , and $π (11) = 5$ because $11$ is the fifth prime. (For more, see prime-counting function.)

Then, we can write an expression (circular definition) for $p_{n}$ as: $p_{n} = the smallest number m such that π (m) \geq n$

With the additional knowledge that $p_{n}$ is at most $2^{n}$ (by Bertrand’s postulate), we can write this as: $p_{n} = 1 + \sum_{i = 1}^{2^{n}} [π (i) < n]$ because if the $n$ th prime is $m$ , then each $i < m$ will add $1$ to the sum, while $m$ and higher will add $0$ .

Next, we can write $[c \leq n] = ⌊ {(\frac{n}{c})}^{1 / n} ⌋$ because for $c > n$ we have $n / c < 1$ so $(n / c)^{1 / n} < 1$ , while for $c \leq n$ we have $n \geq n / c \geq 1$ so $n^{1 / n} \geq (n / c)^{1 / n} \geq 1$ (and the global maximum of $n^{1 / n}$ is $e^{1 / e} \approx 1.445 < 2$ ).

Putting these together (and $π (i) < n$ being the same as $1 + π (i) \leq n$ ), we can write: $p_{n} = 1 + \sum_{i = 1}^{2^{n}} ⌊ {(\frac{n}{1 + π (i)})}^{1 / n} ⌋$ This is already a “formula” for $p_{n}$ , if we can express the $π (i)$ part as a “formula”.

The prime-counting function

$\begin{aligned} 1 + π (i) & = \sum_{j = 1}^{i} [j is prime or 1] \\ = \sum_{j = 1}^{i} [\frac{(j - 1)! + 1}{j} is an integer] \end{aligned}$ by Wilson’s theorem, which states that $j$ divides $(j - 1)! + 1$ iff $j$ is either prime or $1$ .

We can represent “is an integer” with this trick: $[x is an integer] = ⌊ {(c o s (x π))}^{2} ⌋$ as $c o s (x π)$ is $- 1$ or $1$ when $x$ is an integer, and strictly less than $1$ in absolute value otherwise.

And with that, we’re done!

Summary

Thus, Willans’s formula is “merely” a clever putting together of these tricks:

The $n$ th prime number is guaranteed to be at most $2^{n}$
it is the smallest number with $n$ primes less than it
Wilson’s theorem
$[c \leq n] = ⌊ (n / c)^{1 / n} ⌋$
$[x is an integer] = ⌊ {(c o s (x π))}^{2} ⌋$

In programming terms, it is like starting with the following naive and inefficient function for returning the $n$ th prime:

def nth_prime(n):
    num_primes_seen = 0
    for i in range(1, 2**n):
        # TODO: implement is_prime, returning 0 or 1
        num_primes_seen += is_prime(i)
        if num_primes_seen >= n:
            return i

and making it less efficient, by recomputing num_primes_seen (the number of primes $\leq i$ ) from scratch in a nested loop, and by choosing an awfully slow implementation of is_prime(i):

def nth_prime(n):
    for i in range(1, 2**n):
        # count = 1 + pi(n) = 1 + (number of primes <= i)
        count = sum((math.factorial(j - 1) + 1) % j == 0 
                    for j in range(1, i + 1))
        if count > n:
            return i

and making it even worse by always running the loop up to $2^{n}$ . This gives the following very inefficient program:

def nth_prime(n):
    p = 1
    for i in range(1, 2**n):
        count = sum((math.factorial(j - 1) + 1) % j == 0
                    for j in range(1, i + 1))
        p += (count <= n)
    return p

This (absurdly inefficient) program is now in a form amenable to “encoding” using formula tricks, and that is precisely what the Willans formula does.