This working paper reviews the array and monadic approaches to graph structures, and makes a proposal for implementation in Brisk.

Trees are well supported by functional languages using recursive data types. In trying to generalise to graphs, two issues arise concerned with sharing and updating which mean that some different kind of representation is needed.

With tree structures, there is no distinction between copying a subtree and
sharing it. Thus a tree is often represented internally as a graph, with
identical subtrees being shared. This makes updating reasonably efficient.
Updating a tree consists of replacing those nodes which refer directly or
indirectly to the new information. Typically, this involves replacing the
nodes on the path from the root down to the point of change. Other parts of
the tree remain unchanged, shared by both the old and new versions of the tree.
In a tree with * n * nodes, an update often involves replacing only about
* log n * nodes.

With a graph structure, there is a distinction between copying a structure
and sharing it, otherwise you would not be able to distinguish cycles of
different lengths, or to distinguish a cycle from an infinite path. This
suggests that ** each node in a graph must have an identity ** so that, for
example, if you traverse a cycle, you can tell when you have got back to where
you started from.

The situation with updating is also different in a graph from a tree. In
general, a node must always be replaced by a new one if any of the items it
points to represent different values from before. In a graph, the update of
any one node would lead to a need to replace all the nodes from which it is
accessible, which could easily be the entire graph! This suggests that **
edges in a graph structure should be represented indirectly ** via some
lookup mechanism.

type Graph = Array Int Node type Node = [Int]This provides the necessary indirection so that a node can be updated in isolation. However, there are disadvantages to the array approach:

- Updating an array entry may not be constant time.
- Having the Int type visible is unsafe; it should be made abstract.
- Even then, it would not be possible to prevent someone from indexing one array with an index which belongs to another.
- Adding nodes is awkward; the array may need to be increased in size.
- Deleting nodes leaves holes in the array which need to be garbage collected explicitly.

- It is difficult to have a number of different graphs around at the same time.
- It can still be difficult to prevent one reference from being looked up in a graph to which it does not belong.
- All graph handling is forced to be monadic.

We assume that references are abstract, and that equality is the only
direct operation on them which might need to be available. We will call the
structure in which references are looked up a * Heap * rather than a
graph, since the item which a reference points to is arbitrary, and is not
restricted to being a simple node. The operations proposed are:

newHeap :: IO (Heap a) -- create a new heap newRef :: Heap a -> a -> (Ref, Heap a) -- generate a new pointer lookup :: Heap a -> Ref -> a -- like ! in arrays update :: Heap a -> Ref -> a -> Heap a -- like // in arraysThe operation to create a new heap is monadic, in fact an I/O operation. This is to ensure that every heap can be given a separate identity, and references from different heaps can be distinguished. Otherwise, if

There is no operation to find all the references in a heap. This is deliberate; it allows references to be garbage collected. A programmer implementing a graph must explicitly hold onto a node or nodes from which all other nodes of interest are accessible.

A reference can be implemented not directly as an address, but as a small indirection node containing the identity of the heap which it belongs to, and a pointer to the value currently being referenced.

The problem of accessing different versions of a heap at different times can be dealt with using trailers, as with the proposed Brisk implementation of arrays. Thus a heap is represented as node containing just its identity. When a heap is updated, a copy is made and the original is over-written with a node representing an unevaluated expression which refers to the new version and undoes the update.

The lookup operation is implemented by first evaluating the node representing the heap to bring it to the correct version if necessary, then checking that the reference node contains the correct heap identifier, and then indirecting to find the associated value.

Dr. Ian Holyer, ian@cs.bris.ac.uk. Last modified on Friday 22 August 1997 at 16:59.