The Design Error Problem

A computer system may fail to perform as expected because a physical component fails or because a design error is uncovered. For a system to be both ultra-reliable and safe, both of these potential causes of failure must be handled. Established techniques exist for handling physical component failure; these techniques use redundancy and voting. Reliability assessment in the presence of physical faults is based on Markov modeling techniques and is well understood (a brief sketch follows the list below). The design error is a far greater threat. Unfortunately, no scientifically justifiable defense against this threat is currently used in practice. Three basic strategies are advocated for dealing with design error:

1. testing (e.g., life testing or reliability growth models),
2. design diversity (i.e., multiple independently developed versions), and
3. fault avoidance through formal specification and verification.
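Before turning to the design error strategies, here is a minimal sketch of the kind of Markov reliability computation mentioned above, for a hypothetical triple-modular-redundant (TMR) system with repair; the model structure, failure rate, repair rate, and mission time are all illustrative assumptions, not taken from this page.

```python
# Minimal sketch of a Markov reliability model for physical faults:
# a triple-modular-redundant (TMR) system with repair. All parameters
# are hypothetical, chosen only to illustrate the technique.

lam = 1e-4   # per-hour failure rate of one component (assumed)
mu = 1.0     # per-hour repair rate (assumed)
T = 10.0     # mission time in hours (assumed)
dt = 1e-3    # Euler integration step

# States: 3 components working, 2 working (one under repair), system failed.
p3, p2, pf = 1.0, 0.0, 0.0
t = 0.0
while t < T:
    d3 = -3 * lam * p3 + mu * p2             # leave s3 on any failure; return on repair
    d2 = 3 * lam * p3 - (mu + 2 * lam) * p2  # enter s2 on the first failure
    df = 2 * lam * p2                        # a second concurrent failure defeats the voter
    p3 += d3 * dt
    p2 += d2 * dt
    pf += df * dt
    t += dt

print(f"probability of system failure by T={T} h: {pf:.3e}")
```

Models of this kind are what make the physical-fault side of the problem quantifiable; the point of the argument that follows is that no analogous, validated model exists for design errors.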
Many advocate design diversity as a means to overcome the limitations of testing. The basic idea is to use separate design and implementation teams to produce multiple versions of the software from the same specification. At run time, non-exact threshold voters are used to mask the effect of a design error in one of the versions. The hope is that design flaws will manifest errors independently, or nearly so. Under the independence assumption, one can obtain ultra-reliable-level estimates of system reliability even though the individual versions have failure rates on the order of 10^{-4} (a worked calculation appears below).

Unfortunately, the independence assumption has been rejected at the 99% confidence level in several experiments on low-reliability software. Furthermore, the independence assumption can never be validated for high-reliability software because of the exorbitant test times required. If one cannot assume independence, then one must measure the correlations between version failures. This is infeasible as well: it requires as much testing time as life-testing the complete system, because the correlations must themselves be in the ultra-reliable region for the system to be ultra-reliable. Therefore, it is not possible, within feasible amounts of testing time, to establish that design diversity achieves ultra-reliability. Consequently, design diversity can create an "illusion" of ultra-reliability without actually providing it.

From this analysis, we conclude that formal methods currently offer the most promising approach to the design fault problem. Because the often-quoted 1 - 10^{-9} reliability requirement is beyond the range of quantification, we have no choice but to develop life-critical systems in the most rigorous manner available to us, which is the use of formal methods.
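To make the numbers concrete, here is a minimal sketch of the arithmetic behind these claims, assuming a three-version system with majority voting and the illustrative failure rates quoted above.

```python
# Sketch of the reliability arithmetic above, under the (experimentally
# rejected) independence assumption. The specific numbers are illustrative.

p = 1e-4  # assumed per-hour failure probability of each individual version

# A 3-version majority voter fails when at least 2 of the 3 versions fail.
# Under independence, that probability is:
p_system = 3 * p**2 * (1 - p) + p**3
print(f"system failure probability per hour: {p_system:.3e}")  # ~3.0e-08

# Why the assumption cannot be validated by testing: to demonstrate a
# failure rate on the order of 1e-9 per hour, one needs on the order of
# 1e9 failure-free test hours (and more for statistical confidence).
test_hours = 1e9
print(f"required test time: {test_hours / (24 * 365):,.0f} years")  # ~114,155 years
```

Note that even the optimistic 3 x 10^{-8} figure falls short of the 10^{-9} regime, and the calculation collapses entirely once version failures are correlated, which is precisely what the experiments observed.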