Assessing Diagnostics for Fault Tolerant Software
John Napier, Assessing Diagnostics for Fault Tolerant Software. PhD thesis. Department of Computer Science, University of Bristol. August 2001.
Reliability is of prime importance in computer-based safety-critical systems, where failure can have fatal consequences. Fault-tolerant techniques in software have a vital role to play, because verification and validation techniques cannot guarantee that software is error-free. Fault tolerance further improves the reliability of the system by ensuring that it continues to operate safely when residual software errors are encountered. On-line diagnosis is a critical aspect of software fault tolerance. At present, however, there is little real guidance on, or understanding of, how to use on-line diagnostics effectively and efficiently in practice. For a particular program design it is difficult to reason about what an effective diagnostic strategy should be, because our current understanding of software design errors is so poor. This thesis proposes that this understanding can be improved through a controlled process of experimentation that investigates how software behaves in the presence of simulated faults. An empirical method is developed and demonstrated which aims to increase our understanding of the key factors influencing the fault detection capabilities of on-line diagnostics. The experiments presented illustrate the potential of this approach and provide significant new insights into the relevance of these factors. They lay the foundations for a longer-term, progressive and controlled process of experimentation. Only by continuing the experimental process in this way will it be possible to move towards a better understanding of how to design effective diagnostics for fault-tolerant software.
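The general idea of measuring a diagnostic's fault detection capability by running software under simulated faults can be illustrated with a small sketch. This is not the method used in the thesis: the target program (an insertion sort), the three injected faults (`skip_index`, `flip_compare`, `overwrite_last`), the two diagnostics and the `coverage` function are all hypothetical choices made for illustration. Each run is compared against a golden reference (`sorted`) to decide whether a failure occurred, and coverage is taken to be the fraction of failures that the on-line diagnostic flags.

```python
import random
from collections import Counter
from typing import Callable, List, Optional

# A diagnostic inspects (output, input) and returns True if the output looks acceptable.
Diagnostic = Callable[[List[int], List[int]], bool]


def insertion_sort(xs: List[int], fault: Optional[str] = None) -> List[int]:
    """Hypothetical target program; `fault` selects a simulated design error."""
    a = list(xs)
    # "skip_index": the loop starts one element late, so a[0] and a[1] are never ordered.
    start = 2 if fault == "skip_index" else 1
    for i in range(start, len(a)):
        key = a[i]
        j = i - 1
        # "flip_compare": reversed comparison, which yields descending order.
        while j >= 0 and ((a[j] < key) if fault == "flip_compare" else (a[j] > key)):
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
    if fault == "overwrite_last" and len(a) > 1:
        a[-1] = a[-2]  # output stays sorted but loses an element and duplicates another
    return a


def is_sorted(out: List[int], inp: List[int]) -> bool:
    """Weak diagnostic: checks ordering only (the input is ignored)."""
    return all(x <= y for x, y in zip(out, out[1:]))


def is_sorted_permutation(out: List[int], inp: List[int]) -> bool:
    """Stronger diagnostic: ordering plus same multiset of values as the input."""
    return is_sorted(out, inp) and Counter(out) == Counter(inp)


def coverage(diagnostic: Diagnostic, faults: List[str], trials: int = 500, seed: int = 0) -> float:
    """Fraction of observed failures that the diagnostic detects."""
    rng = random.Random(seed)
    failures = detected = 0
    for fault in faults:
        for _ in range(trials):
            xs = [rng.randint(0, 9) for _ in range(rng.randint(2, 8))]
            out = insertion_sort(xs, fault)
            if out != sorted(xs):  # the golden reference says this run failed
                failures += 1
                if not diagnostic(out, xs):  # the diagnostic rejected the output
                    detected += 1
    return detected / failures if failures else 1.0


if __name__ == "__main__":
    faults = ["skip_index", "flip_compare", "overwrite_last"]
    for name, diag in [("sortedness", is_sorted), ("sorted+permutation", is_sorted_permutation)]:
        print(f"{name}: coverage = {coverage(diag, faults):.2f}")
```

In this toy setting the ordering-only check misses every `overwrite_last` failure, while the permutation check catches all three fault types. This illustrates, on a very small scale, how the choice of diagnostic and the class of fault both influence measured detection capability.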