Data diffusion architectures (also known as COMA machines) are scalable multiprocessors that provide a shared address space on top of distributed main memory. Their distinctive feature is that data ``diffuses'', that is, migrates and replicates in main memory according to which processors are using it, so that effective access time tends towards local access time. This is made possible by the associative organisation of main memory, which in effect decouples each address and its data item from any physical location, so that a data item can be placed and replicated wherever it is needed. Moreover, the physical address space need not be a fixed, contiguous address range: it can be any set of addresses within the processors' address range, possibly varying over time, provided that its size is smaller than that of main memory. This flexibility, similar to that of an address space under virtual memory management, opens new ways of organising virtual memory to support a general-purpose multiprogramming environment. This thesis analyses possible ways to organise virtual memory on such machines and proposes two main alternatives: {\it traditional virtual memory} (TVM), organised around a fixed, contiguous physical address space with a traditional mapping; and {\it associative memory virtual memory} (AMVM), organised around a varying, non-contiguous physical address space with a simpler mapping. Our analysis suggests that AMVM has performance advantages over TVM. The KSR-1, the first commercial data diffusion architecture, has a virtual memory similar to AMVM, but partly integrated with the data diffusing hardware; AMVM is more hardware-independent and potentially offers better performance. For data to diffuse with reasonable performance, a fraction of main memory must be reserved as diffusion space.
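To illustrate how associative organisation decouples an address from a physical location, the following is a minimal sketch of a set-associative main memory in which each stored item is identified by its full address acting as a tag. The class name `AssociativeMemory`, the choice of set index as address modulo the number of sets, and the refusal to store into a full set are all illustrative assumptions, not a description of any particular machine. Note how arbitrary, widely separated addresses can be resident at once, but only as long as each set has room.

```python
class AssociativeMemory:
    """Hypothetical set-associative memory: an address is a tag, not a location."""

    def __init__(self, num_sets: int, ways: int):
        self.num_sets = num_sets
        self.ways = ways  # slots per set
        # Each set maps address (tag) -> data item, holding at most `ways` entries.
        self.sets = [dict() for _ in range(num_sets)]

    def _set_for(self, address: int) -> dict:
        # Assumption: the set is chosen by the low-order address bits.
        return self.sets[address % self.num_sets]

    def store(self, address: int, value) -> bool:
        s = self._set_for(address)
        if address not in s and len(s) >= self.ways:
            # Set full: a real machine would have to evict or relocate an item.
            return False
        s[address] = value
        return True

    def load(self, address: int):
        # Returns None if the address is not resident in memory.
        return self._set_for(address).get(address)

    def resident_addresses(self) -> list:
        # The "physical address space" is whatever set of addresses is resident.
        return sorted(a for s in self.sets for a in s)


mem = AssociativeMemory(num_sets=4, ways=2)
mem.store(0x1000, "a")       # set 0
mem.store(0x900000000, "b")  # also set 0: far-apart, non-contiguous addresses
mem.store(7, "c")            # set 3
```

The usage lines show that residency depends on per-set capacity rather than on a contiguous range, which is why, on set-associative memory, the distribution of addresses across sets matters as much as their total number.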
This thesis also presents an analysis of diffusion space requirements which suggests that, on set-associative memory, diffusion space should be provided as a base-size reservation across all memory sets that is then adjusted on demand. To evaluate TVM and AMVM, and to gain insight into the provision of diffusion space, a multiprocessor emulation of a data diffusion architecture has been extended to emulate part of the Mach operating system's virtual memory. This extension implements TVM; a slightly modified version implements AMVM. On the applications tested, AMVM shows a marginal performance gain over TVM. We argue that AMVM will offer greater advantages for applications with higher degrees of parallelism or larger data sets. Finally, our results on set-associative memory suggest that adequate provision of diffusion space requires a simple interaction between the virtual memory software and the data diffusing hardware.
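The provisioning policy described above (a base reservation in every set, adjusted on demand) can be pictured with a small bookkeeping sketch. Everything here is a hypothetical illustration: the class `DiffusionSpace`, the idea that virtual memory software counts the distinct addresses it places in each set, and the `grow_reserve` hook standing in for the "simple interaction" with the data diffusing hardware are assumptions, not the mechanism evaluated in this thesis.

```python
class DiffusionSpace:
    """Hypothetical per-set bookkeeping kept by virtual memory software.

    In each memory set, some slots hold distinct addresses placed by the VM
    software; the remaining reserved slots are diffusion space, left free so
    that data items can migrate and replicate.
    """

    def __init__(self, num_sets: int, ways: int, base_reserve: int):
        if not 0 <= base_reserve < ways:
            raise ValueError("base reserve must leave at least one usable slot")
        self.ways = ways
        self.reserve = [base_reserve] * num_sets  # base size across all sets
        self.allocated = [0] * num_sets           # distinct addresses per set

    def can_allocate(self, s: int) -> bool:
        return self.allocated[s] < self.ways - self.reserve[s]

    def allocate(self, s: int) -> bool:
        # VM software may place a new address in set s only outside its reserve.
        if not self.can_allocate(s):
            return False
        self.allocated[s] += 1
        return True

    def grow_reserve(self, s: int, amount: int = 1) -> int:
        # On-demand adjustment, e.g. when the hardware reports that set s lacks
        # room for replicas (assumed trigger). Only currently free slots can be
        # added to the reserve; returns how many were actually granted.
        free = self.ways - self.reserve[s] - self.allocated[s]
        grant = min(amount, free)
        self.reserve[s] += grant
        return grant
```

With four slots per set and a base reserve of one, the VM software can place three distinct addresses in a set before it must choose another set; a set under diffusion pressure can have its reserve raised only from slots that are still free, which is where cooperation between the software and the hardware would come in.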