# Finding Primes

One of the most efficient ways to find smaller primes is to use a sieve. There are several different sieves but one of the easiest to code is the sieve of Eratosthenes.

Here is a simple example:

```python
low_primes = [2, 3, 5, 7]
low_primes += [x for x in range(9, 100, 2) if all(x % p for p in low_primes)]
```

This starts with primes less than 10 and then finds primes between 10 and 100 by finding odd numbers that are not multiples of 2, 3, 5 or 7.

In order to test whether a number is prime, we only need to test whether it is a multiple of any prime up to the square root of the number since, if it has any integer factors, at least one of them must be at or below this limit. We could hence extend the above approach by using each new list to find primes in a larger interval – primes up to 10 allow the list to be extended to 100, primes up to 100 provide for extension up to 10,000 and so on.
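As a sketch of this bootstrapping idea (the variable names here are illustrative), the primes below 100 are enough to sieve everything up to 10,000:

```python
# Start from the single-digit primes, extend the list to 100, then use
# that list to find every prime below 10,000 (sqrt(10000) = 100, so
# trial division by primes under 100 is sufficient).
primes = [2, 3, 5, 7]
primes += [x for x in range(9, 100, 2) if all(x % p for p in primes)]
primes += [x for x in range(101, 10000, 2)
           if all(x % p for p in primes if p * p <= x)]
print(len(primes), primes[-1])  # 1229 9973
```

Each stage only divides by primes up to the square root of the candidate, so the work per number stays small even as the interval grows.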

But we really need a more general solution, which the following subroutine provides:

```python
def prime_list(n):
    sieve = [False, False] + [True] * (n - 1)
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            m = n // i - i
            sieve[i * i : n + 1 : i] = [False] * (m + 1)
    return [i for i in range(n + 1) if sieve[i]]
```
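As a quick sanity check of the subroutine (repeating its definition so the snippet runs on its own):

```python
def prime_list(n):
    # Sieve of Eratosthenes: sieve[i] is True exactly when i is prime.
    sieve = [False, False] + [True] * (n - 1)
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            m = n // i - i
            sieve[i * i : n + 1 : i] = [False] * (m + 1)
    return [i for i in range(n + 1) if sieve[i]]

print(prime_list(30))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```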

This is quite fast for an interpreted language such as Python, but it uses a lot of memory when large primes are needed. An alternative that consumes less memory uses a Python generator:

```python
def prime_gen(maxp=-1):
    d = {}
    q = 2
    while maxp < 0 or q < maxp:
        if q not in d:
            yield q
            d[q * q] = [q]
        else:
            for p in d[q]:
                d.setdefault(p + q, []).append(p)
            del d[q]
        q += 1
```
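One advantage of the generator form is that no upper bound need be fixed in advance. For example (repeating the definition so the snippet is self-contained), `itertools.islice` can take just the first ten primes:

```python
from itertools import islice

def prime_gen(maxp=-1):
    # Incremental sieve: d maps each upcoming composite to the list of
    # primes that divide it, so no fixed-size sieve array is needed.
    d = {}
    q = 2
    while maxp < 0 or q < maxp:
        if q not in d:
            yield q
            d[q * q] = [q]
        else:
            for p in d[q]:
                d.setdefault(p + q, []).append(p)
            del d[q]
        q += 1

print(list(islice(prime_gen(), 10)))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```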

We can time these two approaches using the following example which counts the number of primes below 1,000,000:

```python
# time.clock was removed in Python 3.8; perf_counter is its modern replacement
from time import perf_counter

class Timer:
    def __enter__(self):
        self.start = perf_counter()
    def __exit__(self, *args):
        print("Time", perf_counter() - self.start)

with Timer():
    x = prime_list(1000000)
    print(len(x), x[-1])

with Timer():
    x = [p for p in prime_gen(1000000)]
    print(len(x), x[-1])
```

This gives the output:

```
78498 999983
Time 0.637893911573
78498 999983
Time 7.35350640485
```

which shows that the prime list function is over ten times faster than the generator solution.

Nevertheless, the generator uses less memory and offers a lot of flexibility as can be seen in its use with a filter function such as:

```python
def select(f, s):
    for x in s:
        if f(list(map(int, str(x)))):
            yield x
```

Here a list of the digits of the input number is created and the function f is applied to this list. We can now use f to define a property we want the digits of a prime to have (f can be a lambda function, which is just an ordinary function written in a compact form).

For example, suppose we want to list the primes below 1000 with repeated digits. This does the job in a single line of code:

```python
print([p for p in select(lambda x: len(x) != len(set(x)), prime_gen(1000))])
```

```
[11, 101, 113, 131, 151, 181, 191, 199, 211, 223, 227, 229, 233, 277,
 311, 313, 331, 337, 353, 373, 383, 433, 443, 449, 499, 557, 577, 599,
 661, 677, 727, 733, 757, 773, 787, 797, 811, 877, 881, 883, 887, 911,
 919, 929, 977, 991, 997]
```

In a similar way we can easily find primes whose digits sum to 32:

```python
print([p for p in select(lambda x: sum(x) == 32, prime_gen(10000))])
```

```
[6899, 8699, 8969, 9689, 9887]
```

So far we have considered only primes small enough that lists of them can feasibly be held in memory. But when numbers are very large, with hundreds or thousands of digits, it becomes impractical to find primes in this way. Although there are ways of determining with certainty whether such a large number is prime, in practice they are very slow. Fortunately, however, there are tests that can determine whether a given number is almost certain to be prime. Such tests are often known as probable prime tests, but they would be more accurately called composite tests because they can determine with certainty that a number is composite (i.e. not prime).

One such test is the Miller-Rabin test, which can be written in Python as follows:

```python
from random import randrange

def miller_rabin(n, r=10):
    if n < 10:
        return n in (2, 3, 5, 7)
    t = n - 1
    s = 0
    while not t % 2:
        t >>= 1
        s += 1
    for i in range(r):
        a = randrange(2, n - 1)
        x = pow(a, t, n)
        if x != 1 and x != n - 1:
            for j in range(s - 1):
                x = (x * x) % n
                if x == 1:
                    return False
                if x == n - 1:
                    break
            else:
                return False
    return True
```

The first parameter is the number to test; the second is the number of times the test is to be repeated (with a default of 10). Each time the test is run, the value returned will be False if the number is composite and True if the number is possibly prime.
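A Carmichael number makes a good test case here, since such numbers pass the simpler Fermat test for every base coprime to them. Repeating the definition above so the snippet is self-contained:

```python
from random import randrange

def miller_rabin(n, r=10):
    if n < 10:
        return n in (2, 3, 5, 7)
    # Write n - 1 as t * 2**s with t odd.
    t = n - 1
    s = 0
    while not t % 2:
        t >>= 1
        s += 1
    for i in range(r):
        a = randrange(2, n - 1)
        x = pow(a, t, n)
        if x != 1 and x != n - 1:
            for j in range(s - 1):
                x = (x * x) % n
                if x == 1:
                    return False
                if x == n - 1:
                    break
            else:
                return False
    return True

# 561 = 3 * 11 * 17 is the smallest Carmichael number, yet the strong
# test still detects that it is composite (with overwhelming probability).
print(miller_rabin(561))
# 104729 is prime, so this always returns True.
print(miller_rabin(104729))
```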

For the Miller-Rabin test it is known that the probability of a single round falsely indicating a prime is no higher than 1/4. Moreover, rounds are independent of each other because each uses its own internal random base, so running N rounds reduces the chance of a false prime to at most 1 in 4^N. So, if 10 rounds all indicate that a value is prime, the probability that it is actually composite will be lower than 1 in 1,000,000. If, for example, we want to find the largest prime below each power of 10 from 10^10 to 10^30, we can use:

```python
for n in range(10, 31):
    t = 10 ** n - 1
    while not miller_rabin(t):
        t -= 2
    print(n, t)
```

```
10 9999999967
11 99999999977
12 999999999989
13 9999999999971
14 99999999999973
15 999999999999989
16 9999999999999937
17 99999999999999997
18 999999999999999989
19 9999999999999999961
20 99999999999999999989
21 999999999999999999899
22 9999999999999999999973
23 99999999999999999999977
24 999999999999999999999743
25 9999999999999999999999877
26 99999999999999999999999859
27 999999999999999999999999901
28 9999999999999999999999999791
29 99999999999999999999999999973
30 999999999999999999999999999989
```

In practice this test almost always performs far better than its theoretical error bound suggests, and running it 20 times makes the probability of falsely reporting a composite as prime far lower than the probability that a random machine memory error causes such a false report.