It has long been of interest to study any regularity in the prime numbers, a question which can be phrased as: “Is there a formula for the $n$th prime number?” While there has been a fair bit of progress and understanding (e.g. the work of Riemann, the prime-number theorem), in 1964 someone named(?) C. P. Willans from the University of Birmingham gave the following “troll” answer, answering the question literally while flagrantly violating its spirit.
$$p_n = 1 + \sum_{i=1}^{2^n} \left\lfloor \left(\frac{n}{\sum_{j=1}^i \left\lfloor\left(\cos \frac{(j-1)! + 1}{j} \pi\right)^2\right\rfloor }\right)^{1/n} \right\rfloor$$
This ridiculous expression, which we’ll understand better below, is indeed a “formula” for the $n$th prime number, but it is in nearly every way worse than just saying “the $n$th prime number”. Two articles from a couple of decades later have some words on such formulas:
Nevertheless, the formula is surprising at first glance and can be interesting to understand: it is mentioned on Wikipedia and there’s an excellent YouTube video about it.
It is based on a bunch of tricks, and the video does a good job of explaining them from the bottom up, so as a complement, let’s do it top-down.
The nth prime in terms of the prime-counting function
Let $\pi(m)$ denote the number of primes not greater than $m$, e.g. $\pi(10) = 4$ because there are four primes $2, 3, 5, 7$ not greater than $10$, and $\pi(11) = 5$ because $11$ is the fifth prime. (For more, see prime-counting function.)
Then, we can write an expression (circular definition) for $p_n$ as:
$$p_n = \text{the smallest number $m$ such that $\pi(m) \ge n$}$$
With the additional knowledge that $p_n$ is at most $2^n$ (by Bertrand’s postulate), we can write this as:
$$p_n = 1 + \sum_{i = 1}^{2^n} [\pi(i) < n]$$
because if the $n$th prime is $m$, then each $i < m$ will add $1$ to the sum, while $m$ and higher will add $0$.
Next, we can write $$[c \le n] = \left\lfloor \left(\frac{n}{c}\right)^{1/n} \right\rfloor$$because for $c > n$ we have $n / c < 1$ so $(n/c)^{1/n} < 1$, while for $c \le n$ we have $n \ge n / c \ge 1$ so $n^{1/n} \ge (n/c)^{1/n} \ge 1$ (and the global maximum of $n^{1/n}$ is $e^{1/e} \approx 1.445 < 2$).
Putting these together (and $\pi(i) < n$ being the same as $1 + \pi(i) \le n$), we can write:
$$p_n = 1 + \sum_{i = 1}^{2^n} \left\lfloor \left(\frac{n}{1 + \pi(i)}\right)^{1/n} \right\rfloor$$
This is already a “formula” for $p_n$, if we can express the $\pi(i)$ part as a “formula”.
The prime-counting function
$$\begin{align}
1 + \pi(i)
&= \sum_{j = 1}^{i} [j\text{ is prime or $1$}] \cr
&= \sum_{j = 1}^{i} \left[\frac{(j-1)!+1}{j}\text{ is an integer }\right]
\end{align}$$
by Wilson’s theorem, which states that $j$ divides $(j-1)! + 1$ iff $j$ is either prime or $1$.
We can represent “is an integer” with this trick:
$$[x\text{ is an integer}] = \lfloor \left( cos(x \pi) \right)^2 \rfloor$$
as $cos(x\pi)$ is $-1$ or $1$ when $x$ is an integer, and strictly less than $1$ in absolute value otherwise.
And with that, we’re done!
Summary
Thus, Willans’s formula is “merely” a clever putting together of these tricks:
- The $n$th prime number is guaranteed to be at most $2^n$
- it is the smallest number with $n$ primes less than it
- Wilson’s theorem
- $[c \le n] = \left\lfloor (n/c)^{1/n} \right\rfloor$
- $[x\text{ is an integer}] = \lfloor \left( cos(x \pi) \right)^2 \rfloor$
In programming terms, it is like starting with the following naive and inefficient function for returning the $n$th prime:
def nth_prime(n):
num_primes_seen = 0
for i in range(1, 2**n):
# TODO: implement is_prime, returning 0 or 1
num_primes_seen += is_prime(i)
if num_primes_seen >= n:
return i
and making it less efficient, by recomputing num_primes_seen
(the number of primes $\le i$) from scratch in a nested loop, and by choosing an awfully slow implementation of is_prime(i)
:
def nth_prime(n):
for i in range(1, 2**n):
# count = 1 + pi(n) = 1 + (number of primes <= i)
count = sum((math.factorial(j - 1) + 1) % j == 0
for j in range(1, i + 1))
if count > n:
return i
and making it even worse by always running the loop up to $2^n$. This gives the following very inefficient program:
def nth_prime(n):
p = 1
for i in range(1, 2**n):
count = sum((math.factorial(j - 1) + 1) % j == 0
for j in range(1, i + 1))
p += (count <= n)
return p
This (absurdly inefficient) program is now in a form amenable to “encoding” using formula tricks, and that is precisely what the Willans formula does.