CHAPTER 1

PRACTICAL ESTIMATION OF SPEED

1.0 The "Big O" Notation
------------------------

Suppose a program for sorting n numbers takes 10*n^2 microseconds. On a slower
computer the same program might take 20*n^2 microseconds, while on a faster one
it might take 5*n^2 microseconds. The same happens for any program - there is a
constant factor which depends on the speed of the computer, the quality of the
compiler, the skill of the programmer, the unit of time, etc. Thus we say the
program takes "order n^2 time", written "O(n^2) time" meaning that it takes
k*n^2 time for some constant k.

Moreover, a program normally has an overhead for small problems (initialising
time etc.). Thus the time taken might be 1000 + 10n^2 microseconds.  We are not
usually interested in this overhead, so "O(n^2)" means "approximately k*n^2 for
large n".

Often, we can only get an upper bound for the time a program takes, i.e. all we
can say is that it takes at most k*n^2 time, or that the worst case takes k*n^2
time. Thus we end up with the following definition for O(f(n)):

---------------------------------------------------
|  "O(f(n))" means "at most k*f(n) for large n".  |
---------------------------------------------------

Occasionally, we will be interested in the best case or average case.  Instead
of using a different notation, we will just say "the time taken is at least
O(n^2)" or "the average time taken is O(n^2)".

We always try to give the simplest and best estimate possible. Thus suppose a
program takes 5*n^2 + 10*n time. Since n^2 grows faster than n, we know that
10*n is at most 10*n^2 (for n >= 1), so the time is at most 15*n^2, and we can
say it is O(n^2). No better estimate is possible, as the time is at least
5*n^2, and no simpler estimate is possible. This leads to the following rule:

-----------------------------------------------
|  if f grows faster than g then f+g is O(f)  |
-----------------------------------------------

Here are some examples of estimation using the rule:

n^2 + n            is   O(n^2)
a*n^3 + b*n^2 + d  is   O(n^3)
a*n + b*n          is   O(n)
2^n + n^2          is   O(2^n)
2^n + n!           is   O(n!)
n^2 + n*log(n)     is   O(n^2)      etc.

Not very many different functions appear as estimates of speed, and we can rank
them roughly as follows. Note that taking O(1) time means taking a fixed
constant amount of time regardless of the problem size.

O(1)        (constant time)    Wonderful       \
O(log(n))   (logarithmic time) Excellent        |
O(n)        (linear time)      Very Good        |   Polynomial algorithms
O(n*log(n)) (loglinear time)   Good             |
O(n^2)      (quadratic time)   Acceptable       |
O(n^3)      (cubic time)       Dubious         /

O(2^n)      (exponential time) Hopeless        \     Exponential algorithms
O(n!)       (      "         ) Disastrous      /

We use the letter "n" throughout to stand for the size of the problem (i.e.
the amount of information necessary to specify the problem). For sorting, we
have to specify n numbers to be sorted, so the problem size is n.  For
inverting an N*N matrix, the problem size is n = N^2 as there are N^2 elements
in the matrix. For sparse matrices, only the non-zero elements need be
specified, so the problem size is the number of non-zero elements.  For testing
whether a large number P is prime, the number will typically be much too large
to fit in one computer "word", so the number will have to be handled by putting
one digit (in some base) in each word. The problem size is the number of
digits, which is n = O(log(P)).

1.1 Algorithms
--------------

Suppose we want to know how long sorting takes. We might begin by writing and
analysing an insertion sort program in Pascal like this one from Sedgewick
(p.96):

To sort array a[1] to a[n]
--------------------------
var a: array[0..n] of integer;
.....
a[0] := -maxint;   {a dummy entry to avoid special cases}
.....
1   for i := 2 to n do
2   begin
3       v := a[i];  j := i;
4       while a[j-1] > v do
5           begin a[j] := a[j-1]; j := j-1 end;
6       a[j] := v;
7   end

It is clear (from our knowledge of the way computers work) that line 5 takes a
constant amount of time k, i.e. O(k) = O(1) time.

The test on line 4, and any other overheads associated with loop processing,
take O(1) time for each execution of the loop, giving us O(1+1) = O(1) time in
total each time round the loop. The loop causes this to be repeated (at most) n
times, giving O(n*1) = O(n) time for lines 4-5.

Lines 3 and 6 take constant time each, so lines 3-6 take O(1+n+1) = O(n)
time.

This block of lines is then repeated (at most) n times at line 1, in a loop
with O(1) overheads, making a total of O(n*(1+n)) = O(n^2) time for lines 1-7.
Initialisation takes O(1) time, so we say that THIS PROGRAM takes O(n^2) time.

It should be clear that the insertion-sort algorithm will give the same result
regardless of language and programming details, provided it is programmed
"sensibly" in a "sensible" sequential language.  However, beware of languages
with features which have hidden costs -- the string-handling in Basic, for
example, while useful, is often very costly, making it almost impossible to
analyse the running time of a Basic program.  Pascal and C have (almost) no
hidden costs.
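For comparison, here is a sketch of the same algorithm in C (the function name
and calling convention are our own choices); as in the Pascal, a[0] is used as
a dummy entry to avoid special cases:

```c
#include <limits.h>

/* Sort a[1]..a[n]; a[0] must be available for the sentinel, as in Pascal. */
void insertion_sort(int a[], int n)
{
    int i, j, v;
    a[0] = INT_MIN;              /* dummy entry to avoid special cases */
    for (i = 2; i <= n; i++) {
        v = a[i];
        j = i;
        while (a[j-1] > v) {     /* at most i-1 moves, i.e. O(n) worst case */
            a[j] = a[j-1];
            j = j - 1;
        }
        a[j] = v;
    }
}
```

The structure is line-for-line the same as the Pascal, so the O(n^2) analysis
carries over unchanged.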

Given this, we can say that the insertion-sort ALGORITHM takes O(n^2) time.
The analysis of the algorithm can be done just from an informal description of
the algorithm such as:

sort array a[1]...a[n]
----------------------
1   for i := 2 to n do
2      assume a[1]...a[i-1] already sorted
3      find the right place to insert a[i], say before a[j]
4      move a[j]...a[i-1] into positions a[j+1]...a[i]
5      insert the old a[i] into position a[j]
6      now a[1]...a[i] are sorted

This is the kind of description we will usually use.  The details of the dummy
value, and the combining of the two steps 3 and 4 into a single loop are
irrelevant to the analysis (though they make the program run faster by a
constant factor).

The analysis now goes as follows. Steps 3 and 4 take O(n) time each (as they
involve scanning through up to n elements) and step 5 takes O(1) time. Thus
steps 2 to 6 take O(n+n+1) = O(n) time. This is repeated O(n) times by line 1,
giving O(n^2) time over all.

The COMPUTATIONAL COMPLEXITY of a problem is the amount of time it takes to
solve. There are O(n*log(n)) algorithms for sorting, which is better than
O(n^2), so the complexity of sorting is O(n*log(n)).

We need to be more precise now about how long various things take.

1.2 Simple Estimation
---------------------

We make three important assumptions when estimating speed. The first is that
the computer has only one processor - design and analysis of parallel
algorithms is a new and difficult subject.  Second, we assume that all numbers
(integer or real) appearing in problems fit in one computer word so that
arithmetic takes a small constant amount of time.  Third, we assume that there
is plenty of (internal) memory, and we ignore virtual memory problems, so that
(e.g.) accessing an array element takes a small constant amount of time which
is the same for a[1000000] as for a[1].

Arithmetic, array accesses and most built-in procedures are assumed to take a
fixed, constant amount of time each, i.e. they are assumed to take O(1) time.
Thus the following are O(1):

a := b + c * sqrt(d)
x[n] := y[n] + z[n]
p := p^.next
new (p)
dispose (p)
seta := setb + setc * [1,3,5,9]    { + is union, * is intersection }

Note that new and dispose are for dynamic space allocation (malloc and free in
C). Although they CAN be implemented in O(1) time (using free lists), they are
sometimes implemented using full heap and garbage collection methods, and so
can be costly. If the total length of (say) a collection of linked lists can be
estimated in advance, it is often more efficient to avoid new and dispose and
use an array of records (in Pascal, you would then have to use indexes in place
of pointers).
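A sketch of the free-list idea in C (the names and the pool size are our own
invention): a fixed array of records, with the free records chained together by
index, gives O(1) allocation and freeing:

```c
#define POOL_SIZE 1000

struct node { int value; int next; };   /* "next" is an index, not a pointer */

static struct node pool[POOL_SIZE];
static int free_head = -1;

/* Chain every record onto the free list: done once, O(POOL_SIZE). */
void pool_init(void)
{
    int i;
    for (i = 0; i < POOL_SIZE; i++) {
        pool[i].next = free_head;
        free_head = i;
    }
}

/* O(1): pop a record index off the free list, or -1 if the pool is empty. */
int pool_alloc(void)
{
    int i = free_head;
    if (i != -1) free_head = pool[i].next;
    return i;
}

/* O(1): push the record back onto the free list. */
void pool_free(int i)
{
    pool[i].next = free_head;
    free_head = i;
}
```

Replacing pointers by indexes like this is exactly what the array-of-records
version in Pascal requires.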

Note that the set operations in Pascal are very limited. They can only be used
with small sets, e.g. "set of 0..31" and not really useful sets like "set of
integer". The current standard does not even insist that "set of char" be
available (though it usually is these days). Thus the main use of Pascal sets
is to make hardware bit operations available at a high level, and to simplify
fiddly things like "if ch in ['A'..'Z','a'..'z','0'..'9'] then". True large
scale sets such as "set of integer" have to be implemented by other data
structures such as lists, hash tables, trees or whatever.

Suppose we have a statement of the form

if x then A else B

This takes time to evaluate and test expression x (usually O(1)) plus time to
do A or time to do B. If we have no way of telling which will happen, we can
make it the maximum of the two. If A takes O(n) time and B takes O(n^2) time
then the if statement takes O(n^2) time. To get a better estimate, we would
need to know how often x is true and how often it is false.  For simple loops
such as:

for i := 1 to n do
   for j := 1 to n do
      something which takes O(log(n)) time

we work out how long the inner block takes and multiply by the number of times
it is executed. Each time round the loop, there will be a few simple statements
to increment i and compare it with n. These will take O(1) time, and this can
be absorbed into the time taken by the inner block.  Thus above, an O(log(n))
operation is repeated n times giving O(n*log(n)) and this is repeated n times
to give O(n^2*log(n)) time in total. The same thing often works for simple
while loops etc.

1.3 Summing Series
------------------

With more difficult loops, the time taken by the inner block might vary.
For example:

1   for i := 1 to n do
2      for j := 1 to i do
3         something which takes O(1) time

We might just say i<=n, so step 3 is repeated O(n) times and the outer loop
repeats this n times to give O(n^2). Let us try for a better estimate.  Line 3
takes O(1) time. Line 2 repeats this i times to give O(i) time. Line 1 repeats
this for each i to give a time which is of order

1 + 2 + 3 + ... + n      written     sigma (i=1..n) i

It turns out that the sum is n*(n+1)/2 for which O(n^2) is the best estimate,
so we can't improve our original rough guess.
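This is easy to check by counting (a sketch; the counting function is our own):

```c
/* Count how many times the inner step of the triangular loop is executed. */
long triangular_count(int n)
{
    long count = 0;
    int i, j;
    for (i = 1; i <= n; i++)
        for (j = 1; j <= i; j++)
            count++;             /* the O(1) "something" on line 3 */
    return count;
}
```

The count comes out as exactly n*(n+1)/2, which is O(n^2).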

Thus loops often lead to series which need to be summed in order to find good
time estimates. However, we only need estimates for the sums, and the following
are useful:

------------------------------------------
|  sigma (i=1..n) 1          O(n)        |
|  sigma (i=1..n) i          O(n^2)      |
|  sigma (i=1..n) i^2        O(n^3)      |
|  sigma (i=1..n) 2^i        O(2^n)      |
|  sigma (i=1..n) (1/i)      O(log(n))   |
|  sigma (i=1..n) (1/i^2)    O(1)        |
|  sigma (i=1..n) (1/2^i)    O(1)        |
|  sigma (i=1..n) (i/2^i)    O(1)        |
------------------------------------------

Many of these have exact solutions. A method of estimation which works on the
hard ones (like 1/i or i/2^i) is to compare them with the corresponding
integrals. For the last one, replace 2^i by e^i, and integrate by parts.
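Two of the less obvious entries can be checked numerically (the code is our own
sketch): the harmonic sum sigma 1/i grows like log(n), while sigma i/2^i is
bounded by a constant (it tends to 2):

```c
#include <math.h>

/* sigma (i=1..n) 1/i -- grows like log(n) */
double harmonic(int n)
{
    double s = 0.0;
    int i;
    for (i = 1; i <= n; i++) s += 1.0 / i;
    return s;
}

/* sigma (i=1..n) i/2^i -- bounded by a constant (it tends to 2) */
double i_over_2i(int n)
{
    double s = 0.0, p = 1.0;
    int i;
    for (i = 1; i <= n; i++) {
        p /= 2.0;                /* p is now 1/2^i */
        s += i * p;
    }
    return s;
}
```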

Note that it can be difficult or impossible to tell how many times a loop is
executed. No-one knows whether the following loop is always finite:

while n > 1 do
   if n even then n := n div 2 else n := 3*n + 1
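This is the well-known 3n+1 (Collatz) problem. A C sketch which counts the
iterations (it has terminated for every n anyone has tried, though no proof is
known):

```c
/* Number of iterations of the 3n+1 loop until n reaches 1.
   No-one has proved this is finite for all n, but it always has been
   for every n ever tried. */
int collatz_steps(long n)
{
    int steps = 0;
    while (n > 1) {
        if (n % 2 == 0) n = n / 2;
        else            n = 3 * n + 1;
        steps++;
    }
    return steps;
}
```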

1.4 Procedures, Recursion and Recurrence Relations
--------------------------------------------------

Apparently all we have to do to deal with a procedure is to work out how long
it takes, then add in that amount each time it is called. For example, we might
decide to use an insertion procedure to do insertion sorting:

procedure insert(i,j)    {moves item a[i] to position a[j] in array a}
---------------------
temp := a[i]
move a[j]...a[i-1] to positions a[j+1]...a[i]
a[j] := temp

sort array a[1]...a[n]
----------------------
for i := 2 to n do
   find the right place to insert a[i], say before a[j]
   insert (i,j)

The procedure takes O(i-j) steps to do the moving. As we know nothing about j
(except that it is between 1 and i), call this O(i). In the sorting program,
finding the right place is also O(i). Thus the sorting takes time
O(2 + 3 + ...  + n) = O(sigma (i=2..n) i) = O(n^2).

Note that we assume that the time to do the procedure call and the return is
O(1), and so can be absorbed into the time taken by the procedure itself.
(The call and return are essentially just "goto's" -- you can usually reckon
that they take about the equivalent of two simple statements).

Many languages also allow recursive procedures, i.e. ones which call themselves
(directly or indirectly). This is implemented simply by keeping local variables
on a "stack", so that they are "remembered" when the procedure returns. The
local variables are accessed directly on the stack via an end-of-stack pointer
and so never need to be moved, so call and return are still O(1). On modern
processors, recursive procedures are no more expensive than non-recursive ones.
Our insertion-sort example can be viewed as a recursive algorithm:

sort (i)          {sorts items a[1]...a[i] in array a}
--------
if i=1 return     {there is nothing to do}
sort (i-1)        {make sure the first i-1 items are sorted}
find the place to insert a[i] and insert it

Note that Pascal has no "return" statement, but there is equivalent code.  A
recursive procedure must always begin with a test for a simple case which does
not need recursion, otherwise we get into an infinite loop. Local variables of
a procedure include the parameters and all variables declared inside the
procedure. A call sort(3) would result in the following being executed:

code                                    stack
----                                    -----
sort(3)                                 |_3_|...
test 3=1
sort(2)                                 |_3_|_2_|...
test 2=1
sort(1)                                 |_3_|_2_|_1_|...
test 1=1
return from sort(1)                     |_3_|_2_|...
insert a[2] before or after a[1]
return from sort(2)                     |_3_|...
insert a[3] in a[1]...a[2]
return from sort(3)                     (stack empty)

Estimation of speed becomes a bit more difficult. We get a recurrence relation
to solve as follows. Write f(n) for the time taken by sort(n). Then:

f(n) = f(n-1) + O(n)

Solving this gives O(n^2) as usual. Clearly, the loop version is more efficient
as it avoids the cost of call and return (and avoids the stack). Recursion can
always be removed from a program (after all, the compiler does it). When the
stack is not needed as above, it is more efficient to remove the recursion.
When the stack is necessary, recursion is often more efficient than
implementing the stack for yourself (as there may be a built-in hardware
stack).  Either way, recursion does not change the speed estimates by more than
a constant factor. We will see in the next chapter that recursion provides a
very useful way of designing algorithms, so we will usually leave it in.
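A C sketch of the recursive sort (our own transcription of the pseudocode); the
structure mirrors the recursion directly, and the recurrence f(n) = f(n-1) +
O(n) applies as it stands:

```c
/* Recursively sort a[1]..a[i] by insertion, as in the pseudocode. */
void rsort(int a[], int i)
{
    int j, v;
    if (i == 1) return;          /* the simple case: nothing to do */
    rsort(a, i - 1);             /* make sure the first i-1 items are sorted */
    v = a[i];                    /* find the place and insert a[i]: O(i) */
    for (j = i; j > 1 && a[j-1] > v; j--)
        a[j] = a[j-1];
    a[j] = v;
}
```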
Some useful recurrence relation solutions are:

------------------------------------------
|  f(n) = f(n-1) + O(1)      O(n)        |
|  f(n) = f(n-1) + O(n)      O(n^2)      |
|  f(n) = f(n-1) + O(n^2)    O(n^3)      |
|  f(n) = f(n-1) + O(1/n)    O(log(n))   |
|  f(n) = f(n-1) + O(1/n^2)  O(1)        |
|  f(n) = f(n-1) + O(1/2^n)  O(1)        |
|                                        |
|  f(n) = f(n/2) + O(1)      O(log(n))   |
|  f(n) = f(n/2) + O(n)      O(n)        |
|  f(n) = f(n/2) + O(n^2)    O(n^2)      |
|                                        |
|  f(n) = 2*f(n/2) + O(1)    O(n)        |
|  f(n) = 2*f(n/2) + O(n)    O(n*log(n)) |
|  f(n) = 2*f(n/2) + O(n^2)  O(n^2)      |
|                                        |
|  f(n) = 3*f(n/2) + O(1)    O(n^log(3)) | (log base 2 - approx O(n^1.58))
|  f(n) = 3*f(n/2) + O(n)    O(n^log(3)) |
|  f(n) = 3*f(n/2) + O(n^2)  O(n^2)      |
------------------------------------------

These results can just be quoted in the exam if needed.  The first block of
these are solved by expanding into series which can be summed using the results
we found before. One way to solve the rest is worth remembering. It works for
any "f(n) = a*f(n/b) + ...". For example:

f(n)   = 2*f(n/2) + O(n)
f(2^m) = 2*f(2^(m-1)) + O(2^m)         { assume n is a power of 2 }
f(2^m)/2^m = f(2^(m-1))/2^(m-1) + O(1) { divide by 2^m }
g(m) = g(m-1) + O(1)                   { put g(m) = f(2^m)/2^m }
g(m) = O(m)                            { from first block of recurrences}
f(2^m)/2^m = O(m)                      { substitute for g }
f(n)/n = O(log(n))                     { as n=2^m, we have m=log(n) }
f(n) = O(n*log(n))                     { multiply by n }
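The solution can also be checked exactly (a sketch, taking f(1) = 1 and the
O(n) term to be exactly n): for n = 2^m the recurrence gives f(n) =
n*(log(n) + 1), which is O(n*log(n)) as claimed.

```c
/* f(n) = 2*f(n/2) + n with f(1) = 1, for n a power of 2. */
long f2(long n)
{
    if (n == 1) return 1;
    return 2 * f2(n / 2) + n;
}
```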

f(n) = 3*f(n/2) + O(n)
f(2^m) = 3*f(2^(m-1)) + O(2^m)                   { n = 2^m }
f(2^m)/3^m = f(2^(m-1))/3^(m-1) + O(2^m/3^m)     { divide by 3^m }
g(m) = g(m-1) + O((2/3)^m)                       { g(m) = f(2^m)/3^m }
g(m) = O(1)                                      { like sigma 1/2^m }
f(2^m)/3^m = O(1)
f(2^m) = O(3^m) = O((2^log(3))^m) = O(2^(m*log(3))) = O((2^m)^log(3))
f(n) = O(n^log(3))

When we use "log(n)" in estimation, the base doesn't usually matter, because it
makes only a constant factor difference. When it does matter, as above, we
usually mean base 2.

1.5 Trees and Logs
------------------

Logarithms to base 2 turn up all over the place. For example, suppose we do a
"binary chop" search. We have a sorted array a[1]...a[n] of numbers, and we
want to find a particular number x. We compare it with a[n/2] and then search
either a[1]...a[n/2] or a[n/2]...a[n] as appropriate. If f(n) is the time to
search a portion of array of length n, we get:

f(n) = O(1) + f(n/2)
     = O(1 + 1 + ... + 1)    (about log(n) terms)
     = O(log(n))
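A C sketch of the binary chop (the code and names are our own), counting the
probes so that the log(n) behaviour can be seen directly:

```c
/* Search sorted a[1]..a[n] for x; return its index, or 0 if absent.
   If probes_out is non-null, the number of comparisons made is stored there. */
int chop(const int a[], int n, int x, int *probes_out)
{
    int lo = 1, hi = n, probes = 0;
    while (lo <= hi) {
        int mid = (lo + hi) / 2;
        probes++;
        if (a[mid] == x) {
            if (probes_out) *probes_out = probes;
            return mid;
        }
        if (a[mid] < x) lo = mid + 1;   /* search the right half */
        else            hi = mid - 1;   /* search the left half */
    }
    if (probes_out) *probes_out = probes;
    return 0;
}
```

Each probe halves the portion still to be searched, so at most about
log(n) + 1 probes are ever made.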

We get the same effect when we use binary search trees. The size n of a tree is
the number of items in it. The time taken to search for an item in a tree is
proportional to the depth of the tree. If a binary tree has a perfect shape:

      .
     / \
    .   .
   / \ / \

then the depth is log(n) (near enough). A tree is said to be WELL-BALANCED if
it has depth O(log(n)). Later on, we will look at a technique for making sure
that search trees are well-balanced. Even if this technique isn't used, we can
often assume the tree is grown "randomly", so we need to know the depth of
random trees.  The usual searching algorithm is:

To search for item in tree
--------------------------
if the tree is null, we have failed
if root contains what we are looking for, we have succeeded
search for item in one of the two sub-trees

Now we can estimate the time this takes in a random tree. On average, the
chosen sub-tree will have size n/2 so the time taken is

f(n) = O(1) + f(n/2) = O(log(n))

This is only an estimate, but it can be proven that

----------------------------------------------------------------------
|  Random binary trees are well-balanced, i.e. have O(log(n)) depth  |
----------------------------------------------------------------------

This is remarkable and very important. More precisely what it means is that if
a tree is grown truly randomly, the probability of it being unbalanced is
provably negligible. In practice, the construction is not truly random, but the
chances of producing an unbalanced tree are still usually small.
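The claim can be illustrated (though of course not proved) by experiment. The
following C sketch (code and constants our own) grows a search tree from
pseudo-random keys and measures its depth; for about 1000 items, log(n) is
about 10, the measured depth is typically in the twenties or thirties, and the
worst-case depth of 1000 is never seen in practice:

```c
#include <stdlib.h>

struct tnode { int key; struct tnode *left, *right; };

/* Standard binary search tree insertion; duplicates go to the right. */
struct tnode *bst_insert(struct tnode *t, int key)
{
    if (t == NULL) {
        t = malloc(sizeof *t);
        t->key = key;
        t->left = t->right = NULL;
    } else if (key < t->key) t->left  = bst_insert(t->left,  key);
    else                     t->right = bst_insert(t->right, key);
    return t;
}

/* Depth = number of nodes on the longest root-to-leaf path. */
int bst_depth(const struct tnode *t)
{
    int l, r;
    if (t == NULL) return 0;
    l = bst_depth(t->left);
    r = bst_depth(t->right);
    return 1 + (l > r ? l : r);
}

/* Depth of a tree grown from n pseudo-random keys (fixed seed). */
int random_tree_depth(int n, unsigned seed)
{
    struct tnode *t = NULL;
    int i;
    srand(seed);
    for (i = 0; i < n; i++) t = bst_insert(t, rand());
    return bst_depth(t);
}
```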
```