In designing a multiprocessor architecture, the motivating factors are that the architecture should be general purpose, easy to program and, at the same time, scalable. The Data Diffusion Machine (DDM) seeks to fulfil these criteria. The DDM provides shared-data access on distributed-memory hardware, allowing data to migrate freely to processors on demand. The DDM concept was originally proposed in terms of a hierarchy of buses, but has since been elaborated for different interconnects. This thesis presents a link-based realisation of the architecture and a link-based coherence protocol, which is central to maintaining the coherence of data. The link-based protocol exploits the combining properties of the DDM network to minimise traffic in the DDM hierarchy. The protocol also contains efficient and general support for synchronisation. Trace-driven simulation is often used to evaluate the design and performance of new architectures. This thesis presents a novel prototyping and performance evaluation methodology called Multiprocessor Emulation (MPE). Unlike trace-driven simulation, MPE is both fast and accurate, and it does not require the enormous resources that trace-driven simulation does. The thesis presents such an emulator for the DDM, implemented on a transputer-based multiprocessor. The emulator has run a number of "standard" shared-memory application programs and includes tools to visualise and analyse the performance data it produces. One source of distortion is the emulator's unrealistically long remote and shared access times. To bring the access times in line with those of comparable architectures, the emulator can be calibrated. Calibration slows down the computation so that the communication-to-computation ratio is closer to that of actual implementations. A simple analytic model has also been devised to project the performance of larger DDM configurations.
Performance results for the DDM obtained with the emulator correlate well with the DASH and SICS DDM results. Applications show wide-ranging behaviour, with miss rates between 0.25% and 12.5%. The optimal item size ranges between 32 and 128 bytes. The DDM protocol performs significantly better than the cache-coherent NUMA version of the protocol.