PRACTICAL ESTIMATION OF SPEED

1.0 The "Big O" Notation

Suppose a program for sorting n numbers takes 10*n^2 microseconds. On a slower
computer the same program might take 20*n^2 microseconds, while on a faster one
it might take 5*n^2 microseconds. The same happens for any program - there is a
constant factor which depends on the speed of the computer, the quality of the
compiler, the skill of the programmer, the unit of time, etc. Thus we say the
program takes "order n^2 time", written "O(n^2) time" meaning that it takes
k*n^2 time for some constant k.

Moreover, a program normally has an overhead for small problems (initialising
time etc.). Thus the time taken might be 1000 + 10n^2 microseconds.  We are not
usually interested in this overhead, so "O(n^2)" means "approximately k*n^2 for
large n".

Often, we can only get an upper bound for the time a program takes, i.e. all we
can say is that it takes at most k*n^2 time, or that the worst case takes k*n^2
time. Thus we end up with the following definition for O(f(n)):

   |  "O(f(n))" means "at most k*f(n) for large n".  |

Occasionally, we will be interested in the best case or average case.  Instead
of using a different notation, we will just say "the time taken is at least
O(n^2)" or "the average time taken is O(n^2)".

We always try to give the simplest and best estimate possible. Thus suppose a
program takes 5*n^2 + 10*n time. Since n^2 grows faster than n, we know that
10*n is less than 10*n^2, so the time is at most 15*n^2, so we can say it is
O(n^2). No better estimate is possible as the time is at least 5*n^2, and no
simpler estimate is possible. This leads to the following rule:

   |  if f grows faster than g then f+g is O(f)  |

Here are some examples of estimation using the rule:

   n^2 + n            is   O(n^2)
   a*n^3 + b*n^2 + d  is   O(n^3)
   a*n + b*n          is   O(n)
   2^n + n^2          is   O(2^n)
   2^n + n!           is   O(n!)
   n^2 + n*log(n)     is   O(n^2)      etc.

Not very many different functions appear as estimates of speed, and we can rank
them roughly as follows. Note that taking O(1) time means taking a fixed
constant amount of time regardless of the problem size.

O(1) (constant time) Wonderful \ O(log(n)) (logarithmic time) Excellent | O(n) (linear time) Very Good | Polynomial algorithms O(n*log(n)) (loglinear time) Good | O(n^2) (quadratic time) Acceptable | O(n^3) (cubic time) Dubious / O(2^n) (exponential time) Hopeless \ Exponential algorithms O(n!) ( " ) Disastrous / We use the letter "n" throughout to stand for the size of the problem (i.e. the amount of information necessary to specify the problem). For sorting, we have to specify n numbers to be sorted, so the problem size is n. For inverting an N*N matrix, the problem size is n = N^2 as there are N^2 elements in the matrix. For sparse matrices, only the non-zero elements need be specified, so the problem size is the number of non-zero elements. For testing whether a large number P is prime, the number will typically be much too large to fit in one computer "word", so the number will have to handled by putting one digit (in some base) in each word. The problem size is the number of digits which is n = O(log(P)). 1.1 Algorithms -------------- Suppose we want to know how long sorting takes. We might begin by writing and analysing an insertion sort program in Pascal like this one from Sedgewick (p.96): To sort array a[1] to a[n] -------------------------- var a: array[0..n] of integer; ..... a[0] := -maxint; {a dummy entry to avoid special cases} ..... 1 for i := 2 to n do 2 begin 3 v := a[i]; j := i; 4 while a[j-1] > v do 5 begin a[j] := a[j-1]; j := j-1 end; 6 a[j] := v; 7 end It is clear (from our knowledge of the way computers work) that line 5 takes a constant amount of time k, i.e. O(k) = O(1) time. The test on line 4, and any other overheads associated with loop processing, take O(1) time for each execution of the loop, giving us O(1+1) = O(1) time in total each time round the loop. The loop causes this to be repeated (at most) n times, giving O(n*1) = O(n) time for lines 4-5. Lines 3 and 6 take constant time each, so the lines 3-6 take O(1+n+1) = O(n) time. This block of lines is then repeated (at most) n times at line 1, in a loop with O(1) overheads, making a total of O(n*(1+n)) = O(n^2) time for lines 1-7. Initialisation takes O(1) time, so we say that THIS PROGRAM takes O(n^2) time.
It should be clear that the insertion-sort algorithm will give the same result regardless of language and programming details, provided it is programmed "sensibly" in a "sensible" sequential language. However, beware of languages with features which have hidden costs -- the string-handling in Basic, for example, while useful, is often very costly making it almost impossible to analyse the running time of a Basic program. Pascal and C have (almost) no hidden costs. Given this, we can say that the insertion-sort ALGORITHM takes O(n^2) time. The analysis of the algorithm can be done just from an informal description of the algorithm such as: sort array a[1]...a[n] ---------------------- 1 for i := 2 to n do 2 assume a[1]...a[i-1] already sorted 3 find the right place to insert a[i], say before a[j] 4 move a[j]...a[i-1] into positions a[j+1]...a[i] 5 insert the old a[i] into position a[j] 6 now a[1]...a[i] are sorted This is the kind of description we will usually use. The details of the dummy value, and the combining of the two steps 3 and 4 into a single loop are irrelevant to the analysis (though they make the program run faster by a constant factor). The analysis now goes as follows. Steps 3 and 4 take O(n) time each (as they involve scanning through up to n elements) and step 5 takes O(1) time. Thus steps 2 to 6 take O(n+n+1) = O(n) time. This is repeated O(n) times by line 1, giving O(n^2) time over all. The COMPUTATIONAL COMPLEXITY of a problem is the amount of time it takes to solve. There are O(n*log(n)) algorithms for sorting, which is better than O(n^2), so the complexity of sorting is O(n*log(n)). We need to be more precise now about how long various things take. 1.2 Simple Estimation --------------------- We make three important assumptions when estimating speed. The first is that the computer has only one processor - design and analysis of parallel algorithms is a new and difficult subject. Second, we assume that all numbers (integer or real) appearing in problems fit in one computer word so that arithmetic takes a small constant amount of time. Third, we assume that there is plenty of (internal) memory, and we ignore virtual memory problems, so that (e.g.) accessing an array element takes a small constant amount of time which is the same for a[1000000] as for a[1]. Arithmetic, array accesses and most built-in procedures are assumed to take a fixed, constant amount of time each, i.e. they are assumed to take O(1) time. Thus the following are O(1): a := b + c * sqrt(d) x[n] := y[n] + z[n] p := p^.next new (p) dispose (p) seta := setb + setc * [1,3,5,9] { + is union, * is intersection }
Note that new and dispose are for dynamic space allocation (malloc and free in C). Although they CAN be implemented in O(1) time (using free lists), they are sometimes implemented using full heap and garbage collection methods, and so can be costly. If the total length of (say) a collection of linked lists can be estimated in advance, it is often more efficient to avoid new and dispose and use an array of records (in Pascal, you would then have to use indexes in place of pointers). Note that the set operations in Pascal are very limited. They can only be used with small sets, e.g. "set of 0..31" and not really useful sets like "set of integer". The current standard does not even insist that "set of char" be available (though it usually is these days). Thus the main use of Pascal sets is to make hardware bit operations available at a high level, and to simplify fiddly things like "if ch in ['A'..'Z','a'..'z','0'..'9'] then". True large scale sets such as "set of integer" have to be implemented by other data structures such as lists, hash tables, trees or whatever. Suppose we have a statement of the form if x then A else B This takes time to evaluate and test expression x (usually O(1)) plus time to do A or time to do B. If we have no way of telling which will happen, we can make it the maximum of the two. If A takes O(n) time and B takes O(n^2) time then the if statement takes O(n^2) time. To get a better estimate, we would need to know how often x is true and how often it is false. For simple loops such as: for i := 1 to n do for j := 1 to n do something which takes O(log(n)) time we work out how long the inner block takes and multiply by the number of times it is executed. Each time round the loop, there will be a few simple statements to increment i and compare it with n. These will take O(1) time, and this can be absorbed into the time taken by the inner block. Thus above, an O(log(n)) operation is repeated n times giving O(n*log(n)) and this is repeated n times to give O(n^2*log(n)) time in total. The same thing often works for simple while loops etc. 1.3 Summing Series ------------------ With more difficult loops, the time taken by the inner block might vary. For example: 1 for i := 1 to n do 2 for j := 1 to i do 3 something which takes O(1) time We might just say i<=n, so step 3 is repeated O(n) times and the outer loop repeats this n times to give O(n^2). Let us try for a better estimate. Line 3 takes O(1) time. Line 2 repeats this i times to give O(i) time. Line 1 repeats this for each i to give a time which is of order 1 + 2 + 3 + ... + n written sigma (i=1..n) i It turns out that the sum is n*(n+1)/2 for which O(n^2) is the best estimate, so we can't improve our original rough guess.
Thus loops often lead to series which need to be summed in order to find good time estimates. However, we only need estimates for the sums, and the following are useful: ------------------------------------------ | sigma (i=1..n) 1 O(n) | | sigma (i=1..n) i O(n^2) | | sigma (i=1..n) i^2 O(n^3) | | sigma (i=1..n) 2^i O(2^n) | | sigma (i=1..n) (1/i) O(log(n)) | | sigma (i=1..n) (1/i^2) O(1) | | sigma (i=1..n) (1/2^i) O(1) | | sigma (i=1..n) (i/2^i) O(1) | ------------------------------------------ Many of these have exact solutions. A method of estimation which works on the hard ones (like 1/i or i/2^i) is to compare them with the corresponding integrals. For the last one, replace 2^i by e^i, and integrate by parts. Note that it can be difficult or impossible to tell how many times a loop is executed. No-one knows whether the following loop is always finite: while n > 1 do if n even then n := n div 2 else n := 3*n + 1 1.4 Procedures, Recursion and Recurrence Relations -------------------------------------------------- Apparently all we have to do to deal with a procedure is to work out how long it takes, then add in that amount each time it is called. For example, we might decide to use an insertion procedure to do insertion sorting: procedure insert(i,j) {moves item a[i] to position a[j] in array a} --------------------- temp := a[i] move a[j]...a[i-1] to positions a[j+1]...a[i] a[j] := temp sort array a[1]...a[n] ---------------------- for i := 2 to n do find the right place to insert a[i], say before a[j] insert (i,j) The procedure takes O(i-j) steps to do the moving. As we know nothing about j (except that it is between 1 and i), call this O(i). In the sorting program, finding the right place is also O(i). Thus the sorting takes time O(2 + 3 + ... + n) = O(sigma (i=2..n) i) = O(n^2). Note that we assume that the time to do the procedure call and the return is O(1), and so can be absorbed into the time taken by the procedure itself. (The call and return are essentially just "goto's" -- you can usually reckon that they take about the equivalent of two simple statements). Many languages also allow recursive procedures, i.e. ones which call themselves (directly or indirectly). This is implemented simply by keeping local variables on a "stack", so that they are "remembered" when the procedure returns. The local variables are accessed directly on the stack via an end-of-stack pointer and so never need to be moved, so call and return are still O(1). On modern processors, recursive procedures are no more expensive than non-recursive ones. Our insertion-sort example can be viewed as a recursive algorithm:
sort (i) {sorts items a[1]...a[i] in array a} -------- if i=1 return {there is nothing to do} sort (i-1) {make sure the first i-1 items are sorted} find the place to insert a[i] and insert it Note that Pascal has no "return" statement, but there is equivalent code. A recursive procedure must always begin with a test for a simple case which does not need recursion, otherwise we get into an infinite loop. Local variables of a procedure include the parameters and all variables declared inside the procedure. A call sort(3) would result in the following being executed: code stack ---- ----- ___ sort(3) |_3_|... test 3=1 ___ ___ sort(2) |_3_|_2_|... test 2=1 ___ ___ ___ sort(1) |_3_|_2_|_1_|... test 1=1 ___ ___ return from sort(1) |_3_|_2_|... insert a[2] before or after a[1] ___ return from sort(2) |_3_|... insert a[3] in a[1]...a[2] return from sort(3) Estimation of speed becomes a bit more difficult. We get a recurrence relation to solve as follows. Write f(n) for the time taken by sort(n). Then: f(n) = f(n-1) + O(n) Solving this gives O(n^2) as usual. Clearly, the loop version is more efficient as it avoids the cost of call and return (and avoids the stack). Recursion can always be removed from a program (after all, the compiler does it). When the stack is not needed as above, it is more efficient to remove the recursion. When the stack is necessary, recursion is often more efficient than implementing the stack for yourself (as there may be a built-in hardware stack). Either way, recursion does not change the speed estimates by more than a constant factor. We will see in the next chapter that recursion provides a very useful way of designing algorithms, so we will usually leave it in. Some useful recurrence relation solutions are: ------------------------------------------ | f(n) = f(n-1) + O(1) O(n) | | f(n) = f(n-1) + O(n) O(n^2) | | f(n) = f(n-1) + O(n^2) O(n^3) | | f(n) = f(n-1) + O(1/n) O(log(n)) | | f(n) = f(n-1) + O(1/n^2) O(1) | | f(n) = f(n-1) + O(1/2^n) O(1) | | | | f(n) = f(n/2) + O(1) O(log(n)) | | f(n) = f(n/2) + O(n) O(n) | | f(n) = f(n/2) + O(n^2) O(n^2) | | | | f(n) = 2*f(n/2) + O(1) O(n) | | f(n) = 2*f(n/2) + O(n) O(n*log(n)) | | f(n) = 2*f(n/2) + O(n^2) O(n^2) | | | | f(n) = 3*f(n/2) + O(1) O(n^log(3)) | (log base 2 - approx O(n^1.58)) | f(n) = 3*f(n/2) + O(n) O(n^log(3)) | | f(n) = 3*f(n/2) + O(n^2) O(n^2) | ------------------------------------------
These results can just be quoted in the exam if needed. The first block of these are solved by expanding into series which can be summed using the results we found before. One way to solve the rest is worth remembering. It works for any "f(n) = a*f(n/b) + ...". For example: f(n) = 2*f(n/2) + O(n) f(2^m) = 2*f(2^(m-1)) + O(2^m) { assume n is a power of 2 } f(2^m)/2^m = f(2^(m-1))/2^(m-1) + O(1) { divide by 2^m } g(m) = g(m-1) + O(1) { put g(m) = f(2^m)/2^m } g(m) = O(m) { from first block of recurrences} f(2^m)/2^m = O(m) { substitute for g } f(n)/n = O(log(n)) { as n=2^m, we have m=log(n) } f(n) = O(n*log(n)) { multiply by n } f(n) = 3*f(n/2) + O(n) f(2^m) = 3*f(2^(m-1)) + O(2^m) { n = 2^m } f(2^m)/3^m = f(2^(m-1))/3^(m-1) + O(2^m/3^m) { divide by 3^m } g(m) = g(m-1) + O((2/3)^m) { g(m) = f(2^m)/3^m } g(m) = O(1) { like sigma 1/2^m } f(2^m)/3^m = O(1) f(2^m) = O(3^m) = O((2^log(3))^m) = O(2^(m*log(3))) = O((2^m)^log(3)) f(n) = O(n^log(3)) When we use "log(n)" in estimation, the base doesn't usually matter, because it makes only a constant factor difference. When it does matter, as above, we usually mean base 2. 1.5 Trees and Logs ------------------ Logarithms to base 2 turn up all over the place. For example, suppose we do a "binary chop" search. We have a sorted array a[1]..a[n] of numbers, and we want to find a particular number x. We compare it with a[n/2] and then search either a[1]...a[n/2] or a[n/2]..a[n] as appropriate. If f(n) is the time to search a portion of array of length n, we get: f(n) = O(1) + f(n/2) = O(1 + 1 + 1 ... + 1) (about log(n) times) = log(n) We get the same effect when we use binary search trees. The size n of a tree is the number of items in it. The time taken to search for an item in a tree is proportional to the depth of the tree. If a binary tree has a perfect shape: . / \ . . / \ / \ then the depth is log(n) (near enough). A tree is said to be WELL-BALANCED if it has depth O(log(n)). Later on, we will look at a technique for making sure that search trees are well-balanced. Even if this technique isn't used, we can often assume the tree is grown "randomly", so we need to know the depth of random trees. The usual searching algorithm is: To search for item in tree -------------------------- if the tree is null, we have failed if root contains what we are looking for, we have succeeded search for item in one of the two sub-trees
Now we can estimate the time this takes in a random tree. On average, the chosen sub-tree will have size n/2 so the time taken is f(n) = O(1) + f(n/2) = O(log(n)) This is only an estimate, but it can be proven that ---------------------------------------------------------------------- | Random binary trees are well-balanced, i.e. have O(log(n)) depth | ---------------------------------------------------------------------- This is remarkable and very important. More precisely what it means is that if a tree is grown truly randomly, the probability of it being unbalanced is provably negligible. In practice, the construction is not truly random, but the chances of producing an unbalanced tree are still usually small.