Blog

Mathematical trolling: the Willans prime formula

It has long been of interest to study any regularity in the prime numbers, a question which can be phrased as: “Is there a formula for the $n$th prime number?” While there has been a fair bit of progress and understanding (e.g. the work of Riemann, the prime-number theorem), in 1964 someone named(?) C. P. Willans from the University of Birmingham gave the following “troll” answer, answering the question literally while flagrantly violating its spirit.

$$p_n = 1 + \sum_{i=1}^{2^n} \left\lfloor \left(\frac{n}{\sum_{j=1}^i \left\lfloor\left(\cos \frac{(j-1)! + 1}{j} \pi\right)^2\right\rfloor }\right)^{1/n} \right\rfloor$$

This ridiculous expression, which we’ll understand better below, is indeed a “formula” for the $n$th prime number, but it is in nearly every way worse than just saying “the $n$th prime number”. Two articles from a couple of decades later have some words on such formulas:

Nevertheless, the formula is surprising at first glance and can be interesting to understand: it is mentioned on Wikipedia and there’s an excellent YouTube video about it.

It is based on a bunch of tricks, and the video does a good job of explaining them from the bottom up, so as a complement, let’s do it top-down.

The nth prime in terms of the prime-counting function

Let $\pi(m)$ denote the number of primes not greater than $m$, e.g. $\pi(10) = 4$ because there are four primes $2, 3, 5, 7$ not greater than $10$, and $\pi(11) = 5$ because $11$ is the fifth prime. (For more, see prime-counting function.)

Then, we can write an expression (circular definition) for $p_n$ as: $$p_n = \text{the smallest number $m$ such that $\pi(m) \ge n$}$$

With the additional knowledge that $p_n$ is at most $2^n$ (by Bertrand’s postulate), we can write this as: $$p_n = 1 + \sum_{i = 1}^{2^n} [\pi(i) < n]$$ because if the $n$th prime is $m$, then each $i < m$ will add $1$ to the sum, while $m$ and higher will add $0$.

Next, we can write $$[c \le n] = \left\lfloor \left(\frac{n}{c}\right)^{1/n} \right\rfloor$$because for $c > n$ we have $n / c < 1$ so $(n/c)^{1/n} < 1$, while for $c \le n$ we have $n \ge n / c \ge 1$ so $n^{1/n} \ge (n/c)^{1/n} \ge 1$ (and the global maximum of $n^{1/n}$ is $e^{1/e} \approx 1.445 < 2$).

Putting these together (and $\pi(i) < n$ being the same as $1 + \pi(i) \le n$), we can write: $$p_n = 1 + \sum_{i = 1}^{2^n} \left\lfloor \left(\frac{n}{1 + \pi(i)}\right)^{1/n} \right\rfloor$$ This is already a “formula” for $p_n$, if we can express the $\pi(i)$ part as a “formula”.

The prime-counting function

$$\begin{align} 1 + \pi(i) &= \sum_{j = 1}^{i} [j\text{ is prime or $1$}] \cr &= \sum_{j = 1}^{i} \left[\frac{(j-1)!+1}{j}\text{ is an integer }\right] \end{align}$$ by Wilson’s theorem, which states that $j$ divides $(j-1)! + 1$ iff $j$ is either prime or $1$.

We can represent “is an integer” with this trick: $$[x\text{ is an integer}] = \lfloor \left( cos(x \pi) \right)^2 \rfloor$$ as $cos(x\pi)$ is $-1$ or $1$ when $x$ is an integer, and strictly less than $1$ in absolute value otherwise.

And with that, we’re done!

Summary

Thus, Willans’s formula is “merely” a clever putting together of these tricks:

In programming terms, it is like starting with the following naive and inefficient function for returning the $n$th prime:

def nth_prime(n):
    num_primes_seen = 0
    for i in range(1, 2**n):
        # TODO: implement is_prime, returning 0 or 1
        num_primes_seen += is_prime(i)
        if num_primes_seen >= n:
            return i

and making it less efficient, by recomputing num_primes_seen (the number of primes $\le i$) from scratch in a nested loop, and by choosing an awfully slow implementation of is_prime(i):

def nth_prime(n):
    for i in range(1, 2**n):
        # count = 1 + pi(n) = 1 + (number of primes <= i)
        count = sum((math.factorial(j - 1) + 1) % j == 0 
                    for j in range(1, i + 1))
        if count > n:
            return i

and making it even worse by always running the loop up to $2^n$. This gives the following very inefficient program:

def nth_prime(n):
    p = 1
    for i in range(1, 2**n):
        count = sum((math.factorial(j - 1) + 1) % j == 0
                    for j in range(1, i + 1))
        p += (count <= n)
    return p

This (absurdly inefficient) program is now in a form amenable to “encoding” using formula tricks, and that is precisely what the Willans formula does.