

\paragraph{Abstract}{
The ability to assess the safety of man-made equipment has been a concern
since the dawn of the industrial age~\cite{indacc01}~\cite{steamboilers}.
The philosophy behind safety measures has progressed
with time, and by World War Two~\cite{boffin} we begin to see concepts such as `no single component failure should cause
a dangerous system failure' emerging.
The principle that a double failure must not cause a dangerous condition
can be found in the legally binding European standard EN298~\cite{en298}.
More sophisticated, statistically based standards, i.e.\ EN61508~\cite{en61508} and variants thereof,
govern failure conditions and determine the risk levels associated with systems.

All of these risk assessment techniques are based on variations on the theme of
Failure Mode Effects Analysis (FMEA), which has its roots in the 1940s mass production industry
and was designed to save large companies money by fixing the most financially
draining problems in a product first.

This thesis shows that the refinements and additions made to
FMEA to tailor it for military or statistical commercial use have common flaws
which make the resulting variants unsuitable for the higher safety requirements of the 21st century.
Problems with state explosion in failure mode reasoning and the impossibility
of integrating software and hardware failure mode models are the most obvious of these. %flaws.
The methodologies are explained in chapter~\ref{sec:chap2} and the advantages and drawbacks
of each FMEA variant are examined in chapter~\ref{sec:chap3}.
In chapter~\ref{sec:chap4}, a new methodology is then proposed which addresses the state explosion problem
and, using contract programmed software, allows the modelling of integrated
software/electrical systems.
This is followed by two chapters showing examples of the new modular FMEA technique, Failure Mode Modular De-Composition (FMMD),
first looking at electronic circuits and then at electronic/software hybrid systems.
}

\section{Introduction}
The motivation for this study came from two sources, one academic and the other
practical. I had recently completed an
MSc, and my project was to create an Euler/Spider Diagram editor in Java.
This editor allowed the user to draw Euler/Spider diagrams, and could then
represent these as abstract---or mathematical---definitions.
At work, writing embedded `C' and assembly language code for safety critical
industrial burners, we were faced with a new and daunting requirement:
conformance to the latest European standard, EN298. It appeared to ask for the impossible.
Not only did it require the usual safety measures (self checking of ROM and RAM, watchdog processors with separate clock sources, EMC,
triple fail safe control of valves), it also had one new clause that had far reaching consequences.
It stated that in the event of a failure, where the controller had gone into a `lockout~state' (a state where the controller
applies all possible safety measures to stop fuel entering the burner), it could not become dangerous should another fault occur.
In short, this meant we had to be able to deal with double failures.
Any components that could, in failing, create a dangerous state were already
documented and approved using failure mode effects analysis (FMEA). This new requirement
effectively meant that all combinations of component failures were
now required to be analysed. Because of the state explosion problem alone,
this was going to be virtually impossible to perform.
%
The way forward I saw was to de-compose the problem, and thus tame the state explosion, using the thinking behind
the fast Fourier transform (FFT)~\cite[Ch.~8]{fpodsadsp}, which takes a complex intermeshed series of real and imaginary number calculations
and, by de-composing them, simplifies the problem.
My reasoning was that, were I to analyse the problem in small modules, from the bottom up following the FFT example, I could apply
checking for all double failure scenarios.
Once these first modules (which I now call {\fgs}) were analysed, I could determine the symptoms of failure for them.
Using the symptoms of failure, I could then treat these modules as components, now called {\dcs}, and use them to build higher level
modules. I could apply double simultaneous failure mode checking within each module, because the number of components
in each module/{\fg} was quite small, thus avoiding state explosion problems, and I could apply
this double checking all the way up the hierarchy. In fact this meant, as a by-product, that many multiple as well as double
failures would be analysed.
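
As a rough illustration of the saving (a sketch under simplifying assumptions, not a result of the analysis itself), suppose a system of $N$ components is divided into modules of $k$ components each, that every component and every module has a single failure mode, and that $N = k^{L}$ for some number of hierarchy levels $L$. The symbols $N$, $k$ and $L$ are introduced here purely for illustration. A flat analysis must consider every pair of components in the system, whereas the modular analysis considers only the pairs within each module:
\[
P_{\mathrm{flat}} = \frac{N(N-1)}{2},
\qquad
P_{\mathrm{modular}} = \frac{N-1}{k-1} \cdot \frac{k(k-1)}{2} = \frac{k(N-1)}{2},
\]
where $(N-1)/(k-1)$ is the number of modules summed over all levels of the hierarchy.
For example, with $N = 81$ and $k = 3$ there are $27 + 9 + 3 + 1 = 40$ modules, giving $P_{\mathrm{modular}} = 40 \times 3 = 120$ pairs to check, against $P_{\mathrm{flat}} = 3240$ for the flat analysis.
Real modules will differ in size and in their number of failure symptoms, so these figures indicate only the order of the saving.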


Euler/Spider Diagrams
could be used to model failure modes in components.
Contours could represent failure modes, and the spider diagram
`existential~points' could represent instances of failure modes.
By drawing a spider collecting existential points, a common failure symptom could
be determined and from this a new diagram generated automatically, to represent the {\dc}.
Each spider represented a derived failure mode.
These concepts were presented at the ``Euler~2004''~\cite{Clark200519} conference at Brighton University.

2005 paper: the need for static analysis, because of the
high reliability of modern safety critical systems.

\section{Practical Experience: Safety Critical Product Approvals}

FMEA is performed on selected areas perceived as critical
by the test house.
This is supplemented by blanket measures: RAM and ROM checks, EMC, and electrical and environmental stress testing.

\subsection{Practical limitations of testing for certification vs. rigorous approach}

The state explosion problem arises when a failure mode of a given component is considered against
all other components in the system, i.e.\ an exponential ($2^N$) order of processing resource rather than a polynomial one, e.g.\ $N^2$.
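
To give a feel for the difference in scale (an illustrative count under stated assumptions, not a figure taken from any standard), suppose each of the $N$ components has $f$ failure modes, where $f$ is a symbol assumed here for illustration. Each component is then either working or in one of its $f$ failure modes, so the number of possible system states and the number of double failure scenarios are
\[
S_{\mathrm{all}} = (f+1)^{N},
\qquad
S_{\mathrm{double}} = \frac{N(N-1)}{2}\, f^{2}.
\]
With $f = 1$ the first expression reduces to the $2^N$ quoted above. For $N = 100$ and $f = 1$, $S_{\mathrm{double}} = 4950$ while $S_{\mathrm{all}} = 2^{100} \approx 1.27 \times 10^{30}$.
Even the smaller figure is a substantial burden when each scenario has to be reasoned through by hand against the whole system.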

This makes it impossible, in practice, to perform double simultaneous failure analysis (as demanded by EN298~\cite{en298}) on a complete system.