Over the last couple years security and privacy on the internet has become significant issue due to constant hacking and spying from groups and individuals both foreign and domestic. A common solution to such intrusions is cryptography, one common encryption algorithm is RSA encryption. RSA is a public-key encryption algorithm, the core security of the algorithm comes from the assumed difficulty of finding the prime factorization of a composite number consisting of two large prime factors. So the practical uses of the algorithm depend on the ability to generate large prime numbers that are known only to the individual who is setting up the encryption.

After studying the RSA algorithm I became interested in the process of generating prime numbers for use in cryptography and the statistics involved in the process. As an exercise in using the algorithms we will attempt to generate a large prime number with the appropriate properties so that it is improbable that the same prime number would be generated again.

Primality Tests

The problem with prime numbers is that usually the algorithms used to prove that a number is prime runs slowly, an alternative is to use a probabilistic algorithm that checks if the number is composite. If a number repetitively fails the composite test we can determine the probability that it is prime and it is considered to pass the equivalent primality test. So rather than proving a number to be prime we can show that there is only a small probability that the number could pass this new primality test.

For example the AKS primality test is the first deterministic primality test to be proven to run in polynomial time. The time complexity of the original version of this algorithm as it was when it published in 2002 is \(\tilde{O}(\log^{12}(n))\), where \(n\) is the number being tested for primality. The notation \(\tilde{O}\) is defined as \(\tilde{O}(f(n))=O(f(n)\log^{m}(f(n)))\) for some arbitrary \(m\). An updated version of the AKS algorithm has a time complexity of \(\tilde{O}(\log^{6}(n))=O(\log^{6}(n)\log^{m}(\log(n)))\). While alternatively one of the most commonly used probabilistic primality tests, the Miller-Rabin test, has a time complexity of \(O(k\log^{3}(n))\), where \(k\) is the number of times the algorithm is applied to a number \(n\) in order to reach the required probability of n being prime. When comparing the time complexity of these two algorithms we see that for large prime numbers even when \(k\) is large the Miller-Rabin test will tend to be faster. So rather than trying to find a large prime number by proving that \(n\) is prime, we can use a probabilistic primality test with the parameter set such that the probability that \(n\) is composite is small.

Small Probability

Our target is to generate a prime number that is unlikely to be generated again, this is achieved by finding a number that is large enough that the probability of selecting it specifically is small and showing that it is probably prime with the primality test. Since the primality is probabilistic we should set up the test such that the probability that the number is not prime is also small. But how small should these probabilities be?

We can consider the probability of some event occurring given the number of opportunities it has to occur to be \(P=pN\), where \(p\) is the probability of a particular event, and \(N\) is the number of times that the event could occur. So if \(p\) is the probability of selecting a random number \(n\), and \(N\) is the number of times a random number is selected, then \(P\) is the probability that \(n\) will be selected after \(N\) trials. In order to calculate the required size of \(p\) we still need to know how small \(P\) should be and the size of \(N\). The value of \(P\) can be set to any probability but for this exercise we will use \(P=\frac{1}{2}\), so given \(N\) tries it is as likely as not that an event with probability \(p\) will occur.

Now to estimate \(N\). If we let \(N\) be the maximum number of computations that a highly efficient parallel computation device could perform over its existence. Lets assume that no more than one particle is necessary to perform a calculation. Then we can use; the plank time as the minimum time for one particle to perform one computation, the approximate time till the heat death of the universe as the maximum time that the machine has available for calculations, and the total mass-energy of the observable universe in the form of the lightest particle from the standard model as the maximum number of particles available for computation. Then \(N=\frac{T_{\text{heat death}} n_{\text{neutrinos}}}{t_{\text{plank}}}=\frac{T_{\text{heat death}} E_{\text{universe}}}{t_{\text{plank}} E_{\text{neutrino}}}\), where the number of electron-neutrinos is calculated from the energy of the electron-neutrino and the mass-energy of the observable universe such that \(n_{\text{neutrinos}}=\frac{E_{\text{universe}}}{E_{\text{neutrino}}}\). Using these estimations \(N=\frac{10^{100}years ~ 2.5 \times 10^{88}eV}{5.4 \times 10^{-44}seconds ~ 2.2eV}=8 \times 10^{238}\).

Now that we have \(N\) we can calculate our small probability, \(p=\frac{P}{N}=\frac{1/2}{8 \times 10^{238}} = 6 \times 10^{-240}\). Given the assumptions and values used in the calculation of \(p\), any event that occurs with a probability smaller than \(p\) is as likely as not to never occur.

Generating the Prime Number

Using the Miller-Rabin primality test, the probability that a composite number passes a single iteration of the test is \(P=\frac{1}{4}\), so the probability that it passes \(n\) times is \(P=4^{-n}\). Now rearranging and calculating for the number of times the test should be run on a number so that the probability that it is composite is the small probability \(p\). \(n=\lceil-\frac{\log(p)}{\log(4)}\rceil=\lceil-\frac{\log(6 \times 10^{-240})}{\log(4)}\rceil=398\). So the each number should be tested \(398\) times before it is considered prime.

Finally the probability of randomly selecting the prime number should be the small probability \(p\). If the number was merely a random number with a value smaller than or equal to \(x\) then the probability of selecting it would be \(P=\frac{1}{x}\). But the number is considered to be a prime so all of the composite numbers can be ignored and the probability would then become \(P=\frac{1}{\pi(x)}\), where \(\pi(x)\) is the number of prime numbers smaller than \(x\). According to the prime number theorem the number of primes smaller than the integer \(x\) is \(\pi(x)\sim\frac{x}{\ln(x)}\), so the probability of choosing a particular random prime number smaller than \(x\) is \(P=\frac{1}{\pi(x)}=\frac{\ln(x)}{x}\). So we need to solve the equation \(0=\ln(x)-xP\). This function can be solved numerically using the Newton-Raphson method.

Once we solve for the number \(x\) we will be able to use that to calculate a prime number with the Miller-Rabin primality test such that the probability of the Miller-Rabin test failing is smaller than or equal to \(p\). And such that the probability of the number being randomly selected by any random process is smaller than or equal to \(p\).