```
CHAPTER  7

MISCELLANEOUS ALGORITHMS

7.0 Testing For Prime Numbers
-----------------------------

There are a number of algorithms where randomness plays an important part in
solving a perfectly precise problem. Perhaps the most interesting is the prime
number problem.

Suppose we want to tell whether positive integer p is prime.  Eratosthenes
sieve is essentially a dynamic programming technique in which we remember the
numbers divisible by small primes while working on the next. It requires one
bit for each number up to p and is suitable for numbers up to several million
(say). What if p has several hundred digits? The sieve clearly takes at least
O(p) time, and probably O(plogp).  The size of this problem is the amount of
information needed to specify p, i.e. the number of digits of p, i.e.  n = log
p. We want a fast algorithm, i.e. something polynomial in n = log p.  There is
practical interest in this problem in the area of encryption.

One of the fastest known methods is a probabilistic one based on Fermat's
little theorem. See Lehmann's paper in SIAM J. Comput. vol. 11.  If p is prime
and 1 <= a <= p-1 then

a^(p-1)  =  1 (mod p)

=>   a^((p-1)/2)  =  +1 or -1 (mod p)

For example if p = 11 then (p-1)/2 = 5 and:

a:   1  2  3  4  5  6  7  8  9  10
a^5:  1 -1  1  1  1 -1 -1 -1  1  -1

If p = 15 then (p-1)/2 = 7 and:

a:   1  2  3  4  5  6  7  8  9  10  11  12  13  14
a^7:  1 -7 -3  4  5  6 -2  2 -6  -5  -4   3   7  -1

Some nasty number theory ensures that the following probabilistic algorithm is
correct.

To test if p is prime
---------------------
If p = 2 give answer PRIME
If p is even give answer COMPOSITE
Choose 20 numbers (say) at random from 1 to p-1
For each number a calculate a^((p-1)/2) (mod p)
If any are not +1 or -1, give answer COMPOSITE
If all are +1, give answer COMPOSITE (prob. of being wrong 2^-20)
Otherwise give answer PRIME (prob. of being wrong 2^-20)

If k numbers are used instead of 20, the probability of a wrong answer is 2^-k.
If a theoretically sound random number generator is used, and the probability
of a wrong answer made less than the probability of a bug in the program, the
algorithm is as good as a non-probabilistic one.

All numbers, including intermediate results can be kept modulo p, so they all
have log p digits. The question remaining is how to calculate a^m efficiently.
Using m multiplications is O(m) - too slow.  Divide and conquer suggests
calculating a^(m/2) and squaring.  Indeed a^(2^k) takes k squarings, i.e. log m
squarings. If m is not a power of 2, it is a sum of powers of 2:

m = 42        m = 32 + 8 + 2 = 2^5 + 2^3 + 2^1

In 5 squarings, we can calculate a, a^2, a^4, a^8, a^16, a^32 and with 2 mults
we have a^42 = a^(32+8+2) = a^32 * a^8 * a^2.  In general we can use O(log m)
squarings and O(log m) multiplications.  A long multiplication can be done in
O((log p)^2) steps, making O((log p)^3) overall.

Note that the algorithm does not factorise composite numbers. There is no known
polynomial algorithm for that, and indeed some modern public key cryptography
techniques depend on the fact that you cannot factorise a number of several
hundred digits.

Finally, note that there is an arbitrary precision number package on unix which
can be used in programs, or interactively through the "bc" or "dc" commands.

7.1 String Searching
--------------------

Suppose we are searching for a word of w characters in a text of n characters
held in memory (e.g. in an editor). The obvious algorithm is (essentially):

To find word[1..w] in text[1..n]
--------------------------------
for i := 0 to n-w do
for j := 1 to w do
compare word[j] with text[i+j]

No better method was found until 1970, and no algorithm published until 1977.
The idea is to scan the text faster, using knowledge from the mismatches found.
The easiest method to understand is a variant of the Boyer-Moore algorithm. The
idea is to compare the last letter of the word first. Suppose we are looking
for STING in a text:

A STRING SEARCH
STING
^

The R mismatches. As there is no R in STING, STING cannot possibly match "
STRI", "STRIN", "TRING", or "RING ", so we can skip 5 places forward for the
next comparison:

A STRING SEARCH
STING
^

The S mismatches, but is contained in STING, so we can only skip 4 places:

A STRING SEARCH
STING
^

word: packed array [1..w] of char;
text: packed array [1..n] of char;
skip: array [char] of integer;

procedure initialise;
var j: integer; c: char;
begin
for c := chr(0) to chr(255) do skip[c] := w;
for j := 1 to w do skip[word[j]] := w-j;
end;

function search: integer;
var i,j: integer;
begin
i := w; j := w;
repeat
if text[i] = word[j] then begin
i := i - 1;
j := j - 1;
end else begin
i := i + skip[text[i]];
j := w;
end;
until (j<1) or (i>n);
search := i + 1
end;

These methods are typically O(w+n) rather than O(wn). Note that the
preprocessing is worth it. If the text is fixed, and many searches are to be
done in it, it may be worth preprocessing the text using some hash-table or
binary tree method.  Note that there are variants with less overhead, and there
are variants which never "back up" in the text suitable for file based
searching or for parallel processors.
```