CHAPTER 3 ALGORITHM DESIGN 3.0 Recursion ------------- There is really only one general method of algorithm design - break the problem down into smaller subproblems. Recursion is just a version of this break-down technique which breaks a problem into smaller subproblems of the SAME KIND. It does not mean that the resulting program need have any recursive procedures in it. For example, let's use recursion to analyse the sorting problem. The idea is to reduce the problem of sorting n items to the problem of sorting n-1 items. There are two approaches. One is to sort the first n-1 items then insert the n'th item in its correct place, giving the familiar insertion sort: sort a[1]...a[n] ---------------- if n=1 return sort a[1]...a[n-1] insert a[n] in the sorted list a[1]...a[n-1] Alternatively, we could search for the largest item, put it last, then sort the first n-1 items, giving a selection sort: Sort a[1]...a[n] ---------------- if n=1 return find the largest item and swap it with a[n]. sort a[1]...a[n-1] Both these algorithms take time f(n) satisfying f(n) = f(n-1) + O(n) => f(n) = O(n^2) Insertion sort is simpler and faster than any of the other famous O(n^2) algorithms (especially bubble sort). It is fast on small problems, and fast (linear, i.e. O(n)) on arrays which are almost sorted already. This makes it suitable for combining with more sophisticated algorithms. Remember, a recursive design should always begin with a simple case which can be solved easily. 3.1 Divide And Conquer ---------------------- This is another important technique in algorithm design. It is just a more sophisticated use of recursion in which we try to break the problem into equal sized pieces. In other words, there are two features: a breakdown into (1) SIMILAR PROBLEMS of (2) SIMILAR SIZES.

For example, consider searching for an item in a sorted array a[1]...a[n]. A linear search, looking at each element in turn is O(n). A binary chop is better. The idea is to break the search into two equal sized searches: To search a[i]...[j] for value v -------------------------------- if i > j return "not found" look at the middle element k = (i + j) div 2 if v = a[k] return "k" if v < a[k] then search a[i]...a[k-1] if v > a[k] then search a[k+1]...a[j] WARNING - if you try to "optimize" the 3-way test into 2-way test, you are liable to produce an incorrect algorithm. The time taken is: f(n) = f(n/2) + O(1) => f(n) = O(log(n)) Now let's try to break a sorting problem into two equal sized sorting problems. One approach is to sort the first half, sort the second half and combine the results, giving a merge sort: Sort items a[i]...a[j] ---------------------- if there is only one item, return sort the first half of the items a[1]...a[k] where k = (i+j) div 2 sort the second half of the items a[k+1]...a[j] merge the two sorted sequences Assuming that merging of sequences of length n can be done in time O(n), (see questions) the time f(n) to sort n items using this algorithm is: f(n) = 2f(n/2) + O(n) => f(n) = O(n*log(n)) Alternatively, we could try to split the numbers into low numbers and high numbers first, then sort the low numbers and sort the high numbers. We get the quicksort algorithm: Sort items a[i]...a[j] ---------------------- 1. if there is only one item, return 2. choose a "medium" value v 3. arrange for a[i]...a[k] to contain values <= v and for a[k+1]...a[n] to be > v (whatever k happens to be) 4. sort a[i]...a[k] 5. sort a[k+1]...a[n] The questions remaining in this algorithm are how to choose v at step (2), and how to do the splitting at step 3. Step 3 is done by another recursive design:

Split a[i]...a[j] into those <= v and those > v ----------------------------------------------- if i > j then return if a[i] <= v then split a[i+1]...a[j] and return if a[j] > v then split a[i]...a[j-1] and return now we know a[i] and a[j] are mis-placed swap a[i] and a[j] split a[i+1]...a[j-1] This splitting can be implemented with simple loops rather than a recursive procedure. (Increment i until a[i]>v, decrement j until a[j]<=v, swap a[i] and a[j], repeat). The splitting algorithm clearly examines each item once, so is O(n). More formally, the time taken is f(n) = f(n-1) + O(1) => f(n) = O(n) As for choosing v, we can choose a random item from a[i]...a[j], or if we assume the items are in random order to start with, we can just use v = a[i]. A popular compromise is to use the median of a[i], a[j] and a[(i+j) div 2]. On average, half the items will be less than v and half greater, so the time for quicksort ON AVERAGE is: f(n) = 2f(n/2) + O(n) => f(n) = O(n*log(n)) Thus Divide & Conquer design gives us two O(n*log(n)) algorithms for sorting. Which sorting algorithm should be used in practice? Mergesort is a useful tool in sorting large files. In memory, insertion sort is best for small problems (say up to 10 items) or when speed is unimportant. Otherwise use quicksort. Either way, take the code from (e.g.) Sedgewick. There are various ways of optimising quicksort. One problem is that it spends a lot of its time sorting small pieces of the array, at which insertion sort is better. Thus a good scheme is to change the first line to "if the number of items is less than 10 return". After the quicksort, insertion sort can be used once on the whole array to finish the job - this will be quick since the array is nearly sorted already. Also, one of the two recursive calls in the quicksort algorithm can easily be replaced by a loop. 3.2 Divide & Conquer - Multiplying Polynomials ---------------------------------------------- As another example, consider multiplying polynomials, say with n coefficients. p = 1 + 2*x + 3*x^2 + 4*x^3 + 5*x^4 + 6*x^5 q = 6 + 5*x + 4*x^2 + 3*x^3 + 2*x^4 + 1*x^5 The obvious algorithm involves multiplying each term of one poly by each term of the other, giving an O(n^2) algorithm. The divide and conquer approach suggests splitting each polynomial in two and reducing the problem to that of multiplying polynomials of length n/2:

p = 1+2*x+3*x^2 + x^3*(4+5*x+6*x^2) ----------- ----------- a b q = 6+5*x+4*x^2 + x^3*(3+2*x+1*x^2) ----------- ----------- c d The product p*q is: a*c + x^3*(a*d + b*c) + x^6*(b*d) If we have to perform all 4 multiplications a*c, a*d, b*c, b*d then we have gained nothing. However, a*d + b*c = (a + b) * (c + d) - a*c - b*d and so the product p*q can be calculated using only three half-length multiplications, namely a*c, b*d and (a+b)*(c+d). The additions etc. take O(n) time as they can be done coefficient by coefficient, and so the time taken is: f(n) = 3*f(n/2) + O(n) => f(n) = O(n^log(3)) = O(n^1.59) This is better than O(n^2), but only for large n because of the overheads. There is a O(n*log(n)*log(log(n))) algorithm for multiplying large polynomials based on the Fast Fourier Transform, but it is too complex to be practical except for VERY large problems. For matrix calculations it has been shown that the costs of multiplying two matrices, inverting a matrix, finding the determinant of a matrix etc. are all the same - a fast algorithm for one results in fast algorithms for the others. The normal multiplication method for NXN matrices takes O(N^3) time. There is a divide & conquer method called Strassen's algorithm for multiplying large NXN matrices in O(N^log7) = O(N^2.81...) based on the fact that 2 by 2 matrices can be multiplied using only seven multiplications instead of the usual 8. It becomes noticeably better than the standard method at around N=100. There is a better algorithm still (about O(N^2.795)), but the best possible power of N is not known. Note that the problem size is, strictly speaking, n = N^2 and so the standard method is O(n^(3/2)) = O(n^1.5) and Strassen's is O(n^log7/2) = O(n^1.40...). 3.3 Divide & Conquer - Graphics ------------------------------- As another example, consider Warnock's algorithm for producing a two-dimensional picture from 3-dimensional information (the hidden surface problem). The aim is (for example) to produce a picture of a cube:

----------------------------------- | | | ___________ | | /__________/| | | | || | | | || | (six squares) | | || | | |__________|/ | | | | | | | ----------------------------------- given a description of the position of the cube, and the viewing position and direction. Suppose the cube is represented as six squares of different colours, Front, Back, Top, Bottom, Left, and Right. The viewing rectangle is split into 4 equal pieces, and the squares are also cut into pieces. ------------------|----------------- | (4 pieces) | (4 pieces) | | _______|____ | | /_______|___/| | | | | || | | | | || | ------------------|----------------- | | | || | | |_______|___|/ | | | | | (1 piece) | (4 pieces) | | | | ------------------|----------------- Any piece which is completely covered by another piece can be discarded (see lower left corner). As this process is repeated, a rectangle will eventually have only one piece (or none) in it, and so can be easily be "coloured in" on the view screen. If a rectangle becomes very small without becoming simple enough to colour in a single colour, it can be coloured by some simple heuristic. Algorithms of this type are very common and useful in graphics, particularly as a pre-processing stage, or for use on multiple processors. 3.4 Removing Recursion ---------------------- Recursion can always be removed from an algorithm if desired. This removes the cost of the procedure calls involved. Advice - always remove recursion if you can do it without using your own stack. If you have to use your own stack, you might make the program slower, since the compiler may have access to a special fast hardware stack. Removing recursion only changes the speed of an algorithm by a constant factor, so it does not change the analysis. The easiest kind of recursion to remove is called HEAD- or TAIL-RECURSION, where there is a recursive call that appears very near the beginning or end of the procedure. For example the simplest cases are:

proc (n) proc (n) -------- -------- if n=0 return becomes for i := 1 to n do proc (n-1) etc..... etc..... proc (n) proc (n) -------- -------- if n=0 return becomes for i := n downto 1 do etc..... etc..... proc (n-1) These assume that n is not altered during the procedure. As another example, the last call in quicksort can be removed as follows: Sort items a[i]...a[j] ---------------------- 1. while i+1 < j (number of items > 1) 2. arrange for a[i]...a[k] to be less than a[k+1]...a[j] 3. sort a[i]...a[k] 4. i := k+1 (instead of: sort a[k+1]...a[j]) More complicated recursions can always be removed in the same way that a compiler does, i.e. by using a stack of local variables. This can often be thought of as "things left to be done". For example Quicksort becomes (using a stack of ranges): Sort a stack of ranges ---------------------- while the stack is not empty do take a range i,j from the stack while i+1 < j arrange for a[i]...[k] < a[k+1]...[j] add i,k to the stack i := k+1 3.5 Dynamic programming ----------------------- This is a fancy name for a simple idea, again built on recursion. The idea is to speed up algorithms at the expense of space by keeping a table of previous smaller results. For example, in calculating factorial(n), it helps to keep an array of results, up to the largest previously requested, say r. When factorial(n) is requested, if n <= r, then the value in the array is returned. If n > r, then entries r+1 to n are filled in, using one multiplication each. This speeds the algorithm up from O(n) to an average of O(1). Another example is pagination. Suppose there is a paragraph of text which consists of n words which must be split into lines in the best possible way. Suppose we have an array w[1]...w[n] of words. We are given a function LineCost(i,j) which measures the cost of putting words w[i]...w[j] on one line. The cost is high if the result would be aesthetically un-pleasant, and (effectively) infinite if it would be impossible. A straightforward recursive design gives:

To find min cost of paragraphing words w[1]...w[n] -------------------------------------------------- for i := 1 to n-1 do find min cost of paragraphing w[1]...w[i] add LineCost(i+1,n) return the minimum of the n-1 costs found In other words, for each i we find the best way of splitting the first i words, and putting the remaining words on one more line. If it were implemented directly as a recursive procedure, the complexity would be roughly f(n) = f(1)+f(2)+f(3)+...+f(n-1) = O(1+1+2+4+8+16+...+2^n) = O(2^n) If instead we keep a table of smaller results, so that ParaCost[i] becomes a table lookup instead of a recursive call, we get: To find min cost of paragraphing words w[1]...w[n] -------------------------------------------------- for j := 1 to n do for i := 1 to j-1 do find Paracost[i] + Linecost(i+1,j) Paracost[j] := minimum of the costs found return Paracost[n] Now f(n) = O(n^3), assuming that LineCost takes time O(n). This is the approach which the TeX typesetting package takes. 3.6 Dynamic Programming - Knapsack ---------------------------------- Another example is the knapsack problem. Suppose there are items of n types varying in size and value, and a knapsack of size K, e.g. size each (say cc) value each (say $) stack of gold bars 12 24 heap of silver coins 7 13 tree of banknotes 2 1 knapsack 28 What is the greatest value of items that can be put in the knapsack? Any number of items of one type can be used. An approximate answer can be obtained using a "greedy" algorithm by starting with the items which are most valuable for their size. In the above example, gold is best (2 $/cc) then silver (13/7 $/cc) then notes (1/2 $/cc). To fill the knapsack, we put in as much gold as possible (2 bars) then silver coins (none) then notes (2) for a total value of 48+2 = 50. Unfortunately, this algorithm does not give the best answer, as it is better to put in 4 silver coins for a total value of 4*13 = 52. For an exact answer, we must turn to recursion. There are n choices of first item, and for each choice we find the best way of filling the remaining space. Then we take the maximum over the n choices.

To find the max value of a knapsack of size K --------------------------------------------- for i := 1 to n do find value[i] plus max value of a knapsack of size (K-size[i]) return the maximum of the values found The time taken f(n) is again at least a sum of n previous values f(i), giving f(n) = O(2^n). Again if we tabulate the answers as knapsack[j], we get: To find max value of knapsack of size K --------------------------------------- for j := 1 to K do for i := 1 to n do if size[i] <= K then find value[i] + knapsack[j-size[i]] knapsack[j] := maximum value found return knapsack[K] which takes O(K*n) time. If we keep track of the best item as well as the max value at each stage, we can construct the optimum contents as well as the max value of the contents. The method can be adapted to the case when there are a limited number of items of each type. Note that this method is proportional to K, and is no good if K is large, say exponential in n. If the numbers in the problem are real instead of integer, this is equivalent to K being large. 3.7 Dynamic Programming - Sequences of Matrices ----------------------------------------------- Another example comes from multiplying matrices of different sizes. Suppose Mab stands for a matrix with 'a' rows and 'b' columns. The cost of multiplying Mab by Mbc is a*b*c (since there are 'a' rows of Mab to multiply by 'c' columns of Mbc, and each row/col has 'b' items in it). The cost of multiplying many matrices depends on the order. If a=2, b=3, c=1, d=4 then (Mab * Mbc) * Mcd costs a*b*c + a*c*d = 14 Mab * (Mbc * Mcd) costs b*c*d + a*b*d = 36 so the first order is better. Suppose we know the size of each of n matrices M1...Mn, and we want to find the minimum cost of multiplying them all together, i.e. of calculating M1*M2*...*Mn, we have a recursive design: To find min cost of multiplying M1*M2*...*Mn -------------------------------------------- for i := 1 to n-1 do find min cost of multiplying M1*...*Mi find min cost of multiplying Mi+1*...*Mn add these to the cost of multiplying (M1*...*Mi) by (Mi+1*...*Mn) return the minimum of these n-1 costs A two-dimensional table cost[i,j] is needed to avoid an exponential algorithm. Cost[i,j] is the cost of multiplying Mi*...*Mj. When calculating cost[i,j], we need to have available previous values of cost[i,k] and cost[k+1,j].

We can ensure this by filling in cost[i,j] in order of increasing j-i. First we fill in the diagonal i=j. Second we fill in the diagonal j-i=1. Third, we fill in the diagonal j-i=2, and so on. j --> ------------ i | 1 2 3 4 5 | 1 2 3 4 | | 1 2 3 v | 1 2 | 1 To find min cost of multiplying M1*...*Mn ----------------------------------------- initialise cost[i,i] to zero for L := 1 to n-1 do for i := 1 to n-L do calculate cost[i,i+L] as follows for j := i to i+L-1 do calculate cost[i,j] + cost[j+1,i+L] add cost of multiplying (Mi*...*Mj) by (Mj+1*...*Mi+L) cost[i,i+L] := min of values found return cost[1,n] This gives us an O(n^3) algorithm. Again, if we record with cost[i,j] the best place to split Mi...Mj, we can reconstruct an optimum order of multiplications. 3.8 Dynamic Programming - Optimal Search Trees ---------------------------------------------- Suppose we have a set of strings s[1]...s[n] in order, with known frequencies of occurrence f[1]...f[n]: s: A B C D E F G f: .20 .10 .05 .15 .30 .15 .05 We know that there will be a large number of searches for these strings, the strings appearing with the frequencies given (e.g. 20% of the searches are for string A). We want to create a binary search tree from them such as D --- 1 comparison / \ / \ A F --- 2 comparisons \ / \ B E G --- 3 comparisons \ C --- 4 comparisons before searching begins. We want the tree with minimum average search time. We measure the cost of a tree by multiplying the number of comparisons needed to find a string (distance from the root + 1) by the frequency for the string and summing. We want the tree with smallest cost. (This is like Huffman trees, except we must preserve the order). All nodes to the left of a node must be earlier, and all nodes to the right must be later.

If we choose s[i] as root, its left subtree must contain s[1]...s[i-1] and its right subtree must contain s[i+1]...s[n]. Thus we try each string s[1]...s[n] as root. Let weight[i,j] be the sum of the frequencies f[i]+...+f[j]. A recursive design gives: Find optimum time for s[i]...s[j] --------------------------------- for k := i to j do find optimum time t1 for s[i]...s[k-1] find optimum time t2 for s[k+1]...s[j] calculate t1 + t2 + weight[i,j] return the minimum of the n times found Note that every string in the tree takes one more comparison than it does in the two subtrees, so the total time is t1+(f(i)+..+f(k-1)) for the left tree, plus f(k) for the root plus t2+(f(k+1)+..+f(j)) for the right tree, giving a total of t1+t2+weight[i,j]. Again, this gives an exponential algorithm, and we need a two-dimensional table cost[i,j] = "optimum time for s[i]...s[j]". Again, we must fill it in order of increasing length j-i. Find optimum time for s[1]...s[n] --------------------------------- for L := 1 to n-1 do for i := 1 to n-L do find cost[i,i+L] as follows for j := i to i+L do t1 := cost[i,j-1] t2 := cost[j+1,i+L] find f[j] + t1 + weight[i,j-1] + t2 + weight[j+1,i+L] cost[i,i+L] := minimum of the costs found return cost[1,n] Again, this gives an O(n^3) algorithm. 3.9 Backtracking ---------------- Some problems are intractable, that is, there is no known efficient algorithm, and we have to resort to an exhaustive search for possible solutions. Such algorithms are generally exponential, and suffer from "combinatorial explosion". That is, there is a problem size, say 20 items, beyond which there is no practical hope of solution. To cope with 21 items would require a computer 20 times as powerful, and to cope with 22 items would require 400 times the power. The aim in designing exhaustive search algorithms is to put off the explosion as long as possible. For example, if we can improve an O(2^n) algorithm to O(2^(n/2)), we may put off the explosion from size 20 to size 40. The most famous problem of this kind is the Travelling Salesperson Problem (which always used to be called the Travelling Salesman Problem and is now sometimes called the Weary Traveller Problem). Given a number of towns and inter-town distances, find the shortest route which visits each town once. For example:

A B C D E .... A 5 6 7 8 B 4 3 3 C 2 4 D 3 E ... The statement of the problem says that the traveller must visit each town only once. There are circumstances where you might think it better to visit a town more than once. Suppose town T is visited twice, the second time being between visits to A and B (...T...ATB...). Then this must be because the sum of distances AT + TB is smaller than distance AB, otherwise it would be better to leave out the second visit. But that means that the shortest distance from A to B (which just HAPPENS to go through T) hasn't been recorded properly. Thus we may assume that the triangle inequality holds (direct routes are always shortest) and that the traveller visits towns only once. The most obvious algorithm for this is to generate all the n! permutations of the towns, calculate the length of each tour, and keep track of the best one so far. This is O(n!), but remember that success is measured by cutoff point and not by asymptotic behaviour. There are obviously gains to made through symmetry, but symmetry is a rather special problem. A more general technique is PRUNING. Suppose we generate the permutations as follows: begin with each town A,B,C,D... in turn, and for each generate all permutations of the n-1 remaining items giving ABCD... ABDC... ACBD... ACDB... ... When all possibilities of one type have been exhausted, we "backtrack" and try the next type. Note this is just another form of recursion. We may as well keep track of the length of the partial path. Suppose the best tour found so far is 20 and the length of ACBD is >= 20. Then we can ignore all permutations beginning ACBD... This is called pruning because it is cutting off a whole subtree of the search tree: . / \ A / \ AC / ACB / ACBD (this subtree is pruned) //\\

Pruning is more successful if an answer close to optimal is found early. Thus it pays to make the order of search one which makes this likely. For example, having decided on ACBD, the next town chosen could be the one closest to D. Note that the majority of nodes in a tree are near the bottom, so pruning can be very effective. It may be worth applying a good heuristic algorithm first to get a good solution to aid the pruning. Using very sophisticated techniques, the explosion point for this problem is around 100-200. 3.10 Heuristic Methods For Intractable Problems ----------------------------------------------- If faced with intractable problems as above, with problem sizes over the explosion point, we have to resort to approximate methods. These are difficult to design and almost impossible to analyse. Perhaps the commonest method is to look for a LOCAL OPTIMUM. In the travelling salesperson case, we could begin with a reasonable solution and make small changes. When no more small changes can improve the solution, we have a local maximum - no solution "close by" is any better. One quite successful method is to split the current solution into (say) 3 paths and investigate all possible ways of reconnecting the paths: ABCDEFGHI JKLMNOPQR STUVWXYZ ABCDEFGHI JKLMNOPQR ZYXWVUTS ABCDEFGHI RQPONMLKJ STUVWXYZ ABCDEFGHI RQPONMLKJ ZYXWVUTS ABCDEFGHI STUVWXYZ JKLMNOPQR ABCDEFGHI STUVWXYZ RQPONMLKJ ABCDEFGHI ZYXWVUTS JKLMNOPQR ABCDEFGHI ZYXWVUTS RQPONMLKJ It is difficult to say how well the method performs as there is no practical method of finding the best solution to compare against.