This working paper reviews the array and monadic approaches to graph structures, and makes a proposal for implementation in Brisk.
Trees are well supported by functional languages using recursive data types. In trying to generalise to graphs, two issues arise concerned with sharing and updating which mean that some different kind of representation is needed.
With tree structures, there is no distinction between copying a subtree and sharing it. Thus a tree is often represented internally as a graph, with identical subtrees being shared. This makes updating reasonably efficient. Updating a tree consists of replacing those nodes which refer directly or indirectly to the new information. Typically, this involves replacing the nodes on the path from the root down to the point of change. Other parts of the tree remain unchanged, shared by both the old and new versions of the tree. In a tree with n nodes, an update often involves replacing only about log n nodes.
With a graph structure, there is a distinction between copying a structure and sharing it, otherwise you would not be able to distinguish cycles of different lengths, or to distinguish a cycle from an infinite path. This suggests that each node in a graph must have an identity so that, for example, if you traverse a cycle, you can tell when you have got back to where you started from.
The situation with updating is also different in a graph from a tree. In general, a node must always be replaced by a new one if any of the items it points to represent different values from before. In a graph, the update of any one node would lead to a need to replace all the nodes from which it is accessible, which could easily be the entire graph! This suggests that edges in a graph structure should be represented indirectly via some lookup mechanism.
type Graph = Array Int Node type Node = [Int]This provides the necessary indirection so that a node can be updated in isolation. However, there are disadvantages to the array approach:
We assume that references are abstract, and that equality is the only direct operation on them which might need to be available. We will call the structure in which references are looked up a Heap rather than a graph, since the item which a reference points to is arbitrary, and is not restricted to being a simple node. The operations proposed are:
newHeap :: IO (Heap a) -- create a new heap newRef :: Heap a -> a -> (Ref, Heap a) -- generate a new pointer lookup :: Heap a -> Ref -> a -- like ! in arrays update :: Heap a -> Ref -> a -> Heap a -- like // in arraysThe operation to create a new heap is monadic, in fact an I/O operation. This is to ensure that every heap can be given a separate identity, and references from different heaps can be distinguished. Otherwise, if newHeap just had type Heap a, and we were to define two heaps by applying identical operations to newHeap, then referential transparency would force us to conclude that the two heaps, and all the references generated from them, were identical values. This means that references from one heap could be used in another. By making newHeap monadic, an attempt to use a reference from one heap to look something up in another can be made an error.
There is no operation to find all the references in a heap. This is deliberate; it allows references to be garbage collected. A programmer implementing a graph must explicitly hold onto a node or nodes from which all other nodes of interest are accessible.
A reference can be implemented not directly as an address, but as a small indirection node containing the identity of the heap which it belongs to, and a pointer to the value currently being referenced.
The problem of accessing different versions of a heap at different times can be dealt with using trailers, as with the proposed Brisk implementation of arrays. Thus a heap is represented as node containing just its identity. When a heap is updated, a copy is made and the original is over-written with a node representing an unevaluated expression which refers to the new version and undoes the update.
The lookup operation is implemented by first evaluating the node representing the heap to bring it to the correct version if necessary, then checking that the reference node contains the correct heap identifier, and then indirecting to find the associated value.