Good Friday morning

Robin Clark 2013-03-29 15:38:36 +00:00
parent 0ea57ac50c
commit a7aa5e3854
2 changed files with 236 additions and 155 deletions

@ -3,20 +3,32 @@
\label{sec:chap2}
The generic and statistical European Safety Standard, EN61508:6\cite{en61508}[B.6.6]
describes Failure Mode Effect Analysis (FMEA) as:
describes FMEA as:
\begin{quotation}
"To analyse a system design, by examining all possible sources of failure
``To analyse a system design, by examining all possible sources of failure
of a system's components and determining the effects of these failures
on the behaviour and safety of the system."
on the behaviour and safety of the system.''
\end{quotation}
\section*{Introduction}
This chapter introduces Failure Mode Effect Analysis (FMEA).
%It begins with a simple example to demonstrate the basic concept of FMEA
%and then
It starts by looking at how we determine the failure modes associated with components.
Two common electrical components, the resistor and the operational amplifier
are examined in the context of two sources of information that define failure modes.
A simple example of an FMEA is then given.
The four main variants are then described and finally we conclude by describing concepts
that underlie the usage and philosophy of FMEA.
\section{FMEA}
\section{FMEA Basic concept.}
\label{basicfmea}
%\subsection{FMEA}
%\tableofcontents[currentsection]
\paragraph{FMEA basic concept.}
%\paragraph{FMEA basic concept.}
FMEA~\cite{safeware}[pp.341-344] is widely used, and proof of its use is a mandatory legal requirement
for a large proportion of safety critical products sold in the European Union.
@ -62,15 +74,16 @@ the effectiveness of FMEA.
In order to apply any form of FMEA we need to know the ways in which
the components we are using can fail.
%
A good introduction to hardware and software failure modes may be found in~\cite{sccs}[pp.114-124].
\footnote{A good introduction to hardware and software failure modes may be found in~\cite{sccs}[pp.114-124].}
%
Typically when choosing components for a design, we look at manufacturers' data sheets
which describe functionality, physical dimensions,
environmental ranges and tolerances, and can indicate how a component may fail or misbehave
under given conditions.
%
How base components could fail internally, is not of interest to an FMEA investigation.
The FMEA investigator needs to know what failure behaviour a component may exhibit. %, or in other words, its modes of failure.
How %base
components could fail internally, is not of interest to an FMEA investigation.
The FMEA investigator needs to know what failure behaviour a component could exhibit. %, or in other words, its modes of failure.
%
A large body of literature exists giving guidance for the determination of component {\fms}.
%
@ -90,7 +103,7 @@ FMD-91 entries include general descriptions of internal failures alongside {\fm
%
FMD-91 entries need, in some cases, some interpretation to be mapped to a clear set of
component {\fms} suitable for use in FMEA.
%
A third document, MIL-1991~\cite{mil1991} provides overall reliability statistics for
component types, but does not detail specific failure modes.
%
@ -119,10 +132,13 @@ requires statistics for Meantime to Failure (MTTF) for all {\bc} failure modes.
\section{Determining the failure modes of Components.}
The starting point in the FMEA process are the failure modes of {\bcs}.
The starting point in the FMEA process is the set of failure modes of the components
we would typically find in a production parts list, which we can term the {\bcs}.
%
In order to define FMEA we must start with a discussion on how these failure modes are chosen.
%
In this section we look in detail at two common electrical components and examine how
In this section we pick %look in detail at
two common electrical components as examples, and examine how
the two chosen sources of {\fm} information define their failure mode behaviour.
We look at the reasons why some known failure modes % are omitted, or presented in
%specific but unintuitive ways.
@ -130,8 +146,8 @@ We look at the reasons why some known failure modes % are omitted, or presented
can be found in one source but not in the others and vice versa.
%
Finally we compare and contrast the failure modes determined for these components
from the FMD-91 reference source and from the guidelines of the
European burner standard EN298.
from the FMD-91~\cite{fmd91} reference source and from the guidelines of the
European burner standard EN298~\cite{en298}.
\subsection{Failure mode determination for generic resistor.}
\label{sec:resistorfm}
@ -221,6 +237,10 @@ and thus subject to drift/parameter change.
\subsubsection{Resistor Failure Modes}
\label{sec:res_fms}
The difference in resistor failure modes between FMD-91 and EN298 is that FMD-91
includes the failure mode DRIFT. EN298 does not include this, mainly because it imposes circuit design constraints
that effectively side-step that problem.
%
For this study we will take the conservative view from EN298, and consider the failure
modes for a generic resistor to be both OPEN and SHORT.
i.e.
@ -268,7 +288,7 @@ We need to translate these failure causes within the Op-Amp into {\fms}.
We can look at each failure cause in turn, and map it to potential {\fms} suitable for use in FMEA
investigations.
\paragraph{Op-Amp failure cause: Poor Die attach}
\paragraph{Op-Amp failure cause: Poor Die attach.}
The symptom for this is given as a low slew rate. This means that the op-amp
will not react quickly to changes on its input terminals.
This is a failure symptom that may not be of concern in a slow responding system like an
@ -276,24 +296,24 @@ instrumentation amplifier. However, where higher frequencies are being processed
a signal may entirely be lost.
We can map this failure cause to a {\fm}, and we can call it $LOW_{slew}$.
\paragraph{No Operation - over stress}
\paragraph{No Operation - over stress.}
Here the Op-Amp has been damaged, and the output may be held HIGH or LOW, or may be
effectively tri-stated, i.e. not able to drive circuitry along the next stages of
the signal path: we can call this state NOOP (no operation).
%
We can map this failure cause to three {\fms}, $LOW$, $HIGH$, $NOOP$.
\paragraph{Shorted $V_+$ to $V_-$}
\paragraph{Shorted inputs: $V_+$ to $V_-$.}
Due to the high intrinsic gain of an op-amp, and the effect of offset currents,
this will force the output HIGH or LOW.
We map this failure cause to $HIGH$ or $LOW$.
\paragraph{Open $V_+$}
\paragraph{Open input: $V_+$.}
This failure cause will mean that the minus input will have the very high gain
of the Op-Amp applied to it, and the output will be forced HIGH or LOW.
We map this failure cause to $HIGH$ or $LOW$.
\paragraph{Collecting Op-Amp failure modes from FMD-91}
\paragraph{Collecting Op-Amp failure modes from FMD-91.}
We can define an Op-Amp, under FMD-91 definitions to have the following {\fms}.
\begin{equation}
\label{eqn:opampfms}
@ -301,7 +321,7 @@ We can define an Op-Amp, under FMD-91 definitions to have the following {\fms}.
\end{equation}
\paragraph{Failure Modes of an Op-Amp according to EN298}
\paragraph{Failure Modes of an Op-Amp according to EN298.}
EN298 does not specifically define Op-Amp failure modes; these can be determined
by following a procedure for `integrated~circuits' outlined in
@ -470,7 +490,7 @@ component {\fms} in FMEA or FMMD and require interpretation.
FMEA is a bottom-up procedure which starts with the failure modes of the low level components of a system. An example
analysis will serve to demonstrate it in practice.
\paragraph{ FMEA Example: Milli-volt reader.}
\section{FMEA worked example: milli-volt reader.}
Example: Let us consider a system, in this case a simple milli-volt reader, consisting
of instrumentation amplifiers connected to a micro-processor
that reports its readings via RS-232.
@ -542,6 +562,7 @@ In this section we examine some fundamental concepts and underlying philosophies
\paragraph{The signal path.}
% C Garret does not like the terms afferent and efferent here, try to think of something else
Most electronic systems are used to process a signal: with signal processing
there is usually a clear afferent to transform to efferent path.
%
@ -558,9 +579,6 @@ An FMEA investigation will often take the component {\fm} and examine its effect
in the direction of the signal,
echoing diagnostic/fault~finding methods~\cite{garrett, maikowski}. % loebowski}.
%
The rationale and work-culture of those tasked to
perform FMEA are generally personnel who have performed fault finding.
%
When fault finding we generally follow the signal path, checking for correct behaviour
along it: when we find something out of place we zoom in and measure
the circuit behaviour until we find a faulty component or module.
@ -568,6 +586,10 @@ the circuit behaviour until we find a faulty component or module.
With this style of fault finding, because it is based on experiment,
we can hop from module to module eliminating working modules, until we find the
failure.
%
Those tasked with performing FMEA are generally personnel who have performed fault finding,
and this background shapes the rationale and work-culture of FMEA.
%
FMEA is a theoretical discipline.
@ -575,15 +597,23 @@ FMEA is a theoretical discipline.
It would be very unusual to build a circuit and then simulate
component failure modes.
%
This would be time consuming as it would involve building a circuit for each component {\fm} in the system.
This would be time consuming as it would involve building a circuit for each component {\fm} in
the system.\footnote{Building circuit simulations and simulating component failure modes
would be a very time consuming process and might only be performed as a final stage of accident investigation, where the cause is
required to be proven.}
%
We cannot, as with fault finding, verify modules along the signal path for correct behaviour
and eliminate them from the investigation.
%
With FMEA we therefore need to be more thorough.
FMEA is a `thought~experiment', not an actual experiment.
%
With FMEA we therefore need to be more thorough in the consideration of the effects a failure mode may have
on the other components in a system, than with fault finding.
%
The question is by how much.
%
Too much and the task becomes impossible due to time/labour constraints.
%
Too little and the analysis could become meaningless because it misses
potential system failures.
%
@ -594,10 +624,21 @@ of the component exhibiting the {\fm} under investigation.
Also, whether it is acceptable to follow the effects through the signal path {\em only}, or whether it is necessary
to look at the effect on all other components in the system,
is a matter for debate.
%
In practice, it is a compromise between the amount of time/money that can be spent
on analysis relative to the criticality of the project.
Metrics for measuring the amount of work to undertake for FMEA are examined in section~\ref{sec:xfmea}.
\paragraph{Failure Modes and the signal path}
In general a component failure mode in an electronic circuit will
change the circuit topology. For a single failure
this effect may cause additional complications for the analyst.
For multiple failures this means
that the analyst
will have to deal with an altered circuit topology
for each analysis.
\paragraph{Single component failure mode to system failure relation.}
@ -619,11 +660,12 @@ From a whole system perspective, we may find that {\bc} {\fms}
may have more than one possible system event associated with them.
Often there will be a clear one to one mapping, but
probabilities to failure (as used in FMECA)
could mean one to many.% mapping.
could mean one-to-many. % mapping.
%
\paragraph{Use of Markov chains to model failure modes.}
We could represent a failure mode and its possible outcomes using a Markov chain~\cite{probfmea_4338247}.
%
Where multiple simultaneous\footnote{Multiple simultaneous failures are taken to mean failures that occur within the same detection period.}
Where multiple simultaneous%\footnote{Multiple simultaneous failures are taken to mean failures that occur within the same detection period.}
failure modes are considered this complicates
the statistical nature of the Markov chain, cause effect model.
%
@ -734,15 +776,22 @@ required to map a failure cause to its potential outcomes.
In our basic FMEA example in section~\ref{basicfmea}
we were asked to consider one failure mode against all the components in the milli-volt reader.
%
To create a complete FMEA report on the milli-volt reader we would have had to examine every
To create an exhaustive FMEA report on the milli-volt reader, we would have had to examine every
known failure mode of every component within it---against all its other components.
%
The reasoning~distance is defined as the sum of the number of failure modes, against all other components
We define `reasoning~distance' as the number of components checked against
for a given failure mode to determine a system level symptom.
%
No current FMEA variant gives guidelines for the components that should
be included to analyse a {\fm} in a system.
%does not
The exhaustive~reasoning~distance would be
the sum of the number of failure modes, against all other components
in that system.
%
If the milli-volt reader had say 100 components, with three failure modes each, this
would give a reasoning distance of 3 * 100 * 99.
would give an exhaustive reasoning distance of 3 * 100 * 99.
%
The discussion on reasoning distance provides us with a metric to examine
the state explosion problems associated with forward search failure investigation
methodologies.
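
As a rough worked check of the figure above (a sketch only; the symbols $N$ for the number of components and $f$ for the number of failure modes per component are introduced here for illustration), the exhaustive reasoning distance for the single failure case is
\[
f \cdot N \cdot (N-1) = 3 \cdot 100 \cdot 99 = 29{,}700 .
\]
That is, each of the $300$ component failure modes is checked against the $99$ other components.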
@ -799,9 +848,10 @@ double failure scenarios (for burner lock-out scenarios).}
%(N^2 - N).f
\end{equation}
For our theoretical 100 components with 3 failure modes each example, this is
$100*99*98*3=2,910,600$ failure mode scenarios.
For our theoretical example of 100 components with 3 failure modes each, this is a reasoning distance of
$100*99*98*3=2,910,600$. % failure mode scenarios.
In practice there is an additional concern here, that of
the circuit topology changes that {\fms} can cause.
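
The double failure figure can be checked step by step. This is a sketch only, assuming $N$ denotes the number of components and $f$ the number of failure modes per component (symbols introduced here for illustration):
\[
f \cdot N \cdot (N-1) \cdot (N-2) = 3 \cdot 100 \cdot 99 \cdot 98 = 3 \cdot 970{,}200 = 2{,}910{,}600 .
\]
For comparison, the single failure case gives $3 \cdot 100 \cdot 99 = 29{,}700$, so considering double failures multiplies the work by a factor of $N-2 = 98$ in this example.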
\paragraph{Reliance on experts for meaningful FMEA Analysis.}
Current FMEA methodologies cannot consider---for the reason of state explosion---an exhaustive approach.
@ -818,7 +868,7 @@ on anything but a non-trivial system.
\subsection{Component Tolerance}
Component tolerances may need considered when determining if a component has failed.
Component tolerances may need to be considered when determining if a component has failed.
Calculations for acceptable ranges to determine failure or acceptable conditions
must be made where appropriate.
%
@ -846,13 +896,14 @@ is given in section~\ref{sec:resistortolerance}.
Production FMEA (or PFMEA), is FMEA used to prioritise, in terms of
cost, problems to be addressed in product production.
It focuses on known problems, determines the
frequency they occur and their cost to fix.
This is multiplied together and called an RPN
number.
%
It generally focuses on known problems: multiplying their
statistical frequency %they occur
by their cost to fix gives a Risk Priority Number (RPN)
for the component {\fm}.
%
Fixing problems with the highest RPN number
will return most cost benefit.
will return most cost benefit~\cite{bfmea}.
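
To illustrate the prioritisation (a hypothetical sketch: the two problems $A$ and $B$ and all figures below are invented for this example), take the RPN as frequency multiplied by cost to fix:
\[
RPN_A = 20 \times 5 = 100, \qquad RPN_B = 2 \times 80 = 160 .
\]
Here problem $A$ is assumed to occur 20 times per 1000 units at a cost of 5 to fix, and problem $B$ twice per 1000 units at a cost of 80 to fix.
Although $B$ is ten times rarer, its higher RPN means that it would be addressed first.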
% benign example of PFMEA in CARS - make something up.
\subsection{PFMEA Example}
@ -872,7 +923,7 @@ will return most cost benefit.
\section{FMECA - Failure Modes Effects and Criticality Analysis}
\subsection{ FMECA - Failure Modes Effects and Criticality Analysis}
\paragraph{ FMECA - Failure Modes Effects and Criticality Analysis.}
% \begin{figure}
% \centering
% %\includegraphics[width=100pt]{./military-aircraft-desktop-computer-wallpaper-missile-launch.jpg}
@ -883,10 +934,16 @@ will return most cost benefit.
% \end{figure}
FMECA places emphasis on determining criticality rather than the cost of system failures.
%
Applies some Bayesian statistics (probabilities of component failures
thereby causing given system level failures).
It applies Bayesian statistics (probabilities of component failures
and the probability of those failures causing given system level failures)
to determine the risk of system level events/symptoms.
The results of the probabilities for the system level failures
are multiplied by the operational time of the system.
For instance a military or emergency system may be typically operational for
a given number of hours. This in conjunction with the severity
of the system level event gives us a level of criticality.
%
Also the probability of the system failure causing a critical event.
%Also the probability of the system failure causing a critical event.
%
Applying Bayesian statistics to failure analysis suffers from the
problem that correlation does not imply causation~\cite{bayesfrequentist}.
@ -895,9 +952,7 @@ However, correlation is evidence for causation, and maybe the only evidence to h
and this is the justification behind its use.
A history of the usage and development of FMECA may be found in~\cite{FMECAresearch}.
\subsection{ FMECA - Failure Modes Effects and Criticality Analysis}
\paragraph{ FMECA - Failure Modes Effects and Criticality Analysis.}
FMECA is very similar to PFMEA, but instead of cost, a criticality or
seriousness factor is ascribed to putative top level incidents.
FMECA has three probability factors for component failures.
@ -917,7 +972,7 @@ a particular failure~mode occurring within a component. reference FMD-91.
\subsection{ FMECA - Failure Modes Effects and Criticality Analysis}
\paragraph{ FMECA - Failure Modes Effects and Criticality Analysis.}
\textbf{FMECA $\beta$ value.}
The second probability factor $\beta$, is the probability that the failure mode
will cause a given system failure.
@ -938,17 +993,15 @@ A weighting factor to indicate the seriousness of the putative system level erro
C_m = \beta \cdot \alpha \cdot \lambda_p \cdot t \cdot s
\end{equation}
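
As a minimal worked sketch of this calculation, assume the purely illustrative values $\beta = 0.5$, $\alpha = 0.3$, $\lambda_p = 10^{-6}$ failures per hour, $t = 1000$ hours and $s = 10$. None of these figures comes from a data source, and $\lambda_p$ is taken here to be the failure rate of the component, which is an assumption.
\[
C_m = 0.5 \times 0.3 \times 10^{-6} \times 1000 \times 10 = 1.5 \times 10^{-3}
\]
Doubling the operational time $t$, or the severity weighting $s$, doubles $C_m$, since every factor enters the product linearly.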
Highest $C_m$ values would be at the top of a `to~do' list
for a project manager.
The highest $C_m$ values would represent the most dangerous or serious
system level failures.
The highest $C_m$ values would be at the top of a `to~fix' list
for a project manager, and some levels of risk may be considered unacceptable
and require re-design of some systems.
\section{FMEDA - Failure Modes Effects and Diagnostic Analysis}
%\subsection{ FMEDA - Failure Modes Effects and Diagnostic Analysis}
% \begin{figure}
% \centering

@ -1,8 +1,20 @@
\label{sec:chap3}
\section*{Introduction}
This chapter examines FMEA in a critical light.
The problems with the scope---or required reasoning distance---of detail to apply
for FMEA analysis are examined. The impossibility of integrating software
and hardware in FMEA failure models, and the impossibility of performing meaningful
multiple failure analysis are examined.
Additional problems such as the inability to easily re-use, and validate (through
traceable reasoning) FMEA models are presented.
Finally we conclude with a list of deficiencies in current FMEA methodologies, and present a wish list
for an improved methodology.
\section{Historical Origins of FMEA}
\subsection{FMEA designed for simple electro-mechanical systems}
\subsection{FMEA: {\bc} {\fm} to system level failure modelling}
FMEA traces its roots to the 1940s when it was used to identify the most costly
failures arising from car mass-production~\cite{bfmea}.
It was later modified slightly to include severity of the top level failure (FMECA~\cite{fmeca}).
@ -14,6 +26,13 @@ This means that we have one analysis case per component failure mode for all the
This analysis philosophy has not changed since FMEA was first used.
\subsection{FMEA does not support Traceable Reasoning}
An FMEA report normally assigns one line of a spreadsheet to
each {\bc} {\fm}.
This means that the reasoning involved in determining the system level failure/symptom is described (if at all) very briefly.
Ideally supporting documentation would give the reasoning and calculations behind each analysis case,
but the structure of current FMEA reports does not encourage this.
\subsection{FMEA does not support modularity.}
It is common practice in the process control industry to buy in sub-systems,
typically sensors and actuators connected to an industrially hardened computer bus, e.g. CANbus~\cite{can,canspec}, modbus~\cite{modbus} etc.
@ -64,10 +83,19 @@ We could term such a group a `{\fg}'.
Given the {\bc} {\fm} to system level failure mode paradigm it is
difficult to re-use FMEA analysis.
%
Several strategies to aid re-use have been proposed~\cite{rudov2009language, reuse_of_fmea}, but
the fundamental problem remains, that, with any changes
to the component base in a system, it is very difficult to
determine which FMEA test scenarios must be re-worked.
%
It is common in safety critical systems to have repeated circuit topologies.
For instance we may have several signal input and output
structures that are repeated.
%
The failure mode behaviour of these repeated structures will be the same.
However, with the {\bc} {\fm} to system level failure mode mapping,
work is likely to be repeated.
\section{Software and FMEA}
@ -82,7 +110,7 @@ Similar difficulties in integrating mechanical and electronic/software
failure models are discussed in~\cite{SMR:SMR580,swassessment}.
\paragraph{Current work on Software FMEA}
\paragraph{Current work on Software FMEA.}
SFMEA usually does not seek to integrate
hardware and software models, but to perform
@ -204,104 +232,104 @@ utterly anachronistic in the distributed real time system environment.
FMEA is no longer fit for purpose!
%
\section{Conclusions on current FMEA Methodologies}
%% FOCUS
The focus of this chapter %literature review
is to establish the current practice and applications
of FMEA.
%, and to examine its strengths and weaknesses.
%% GOAL
Its
goal is to identify central issues and to criticise and assess the current
FMEA methodologies.
%% PERSPECTIVE
The perspective of the author, is as a practitioner of static failure mode analysis techniques
concerning approval of product
to European safety standards, both the prescriptive~\cite{en298,en230} and statistical~\cite{en61508}.
A second perspective is that of a software engineer trained to use formal methods.
Examining FMEA methodologies for mathematical properties, influenced by
formal methods applied to software, should provide a perspective not traditionally considered.
%% COVERAGE
The literature reviewed, has been restricted to published books, European safety standards (as examples
of current safety measures applied), and traditional research, from journal and conference papers.
%% ORGANISATION
The review is organised by concept, that is, FMEA can be applied to hardware, software, software~interfacing and
to multiple failure scenarios etc. Methodologies related to FMEA are briefly covered for the sake of context.
%% AUDIENCE
% Well duh! PhD supervisors and examiners....
% \subsection{Related Methodologies}
% FTA --- HAZOP --- ALARP --- Event Tree Analysis --- bow tie concept
% \subsection{Hardware FMEA (HFMEA)}
% \subsection{Multiple Failure scenarios and FMEA}
% \subsection{Software FMEA (SFMEA)}
\paragraph{Current work on Software FMEA}
SFMEA usually does not seek to integrate
hardware and software models, but to perform
FMEA on the software in isolation~\cite{procsfmea}.
%
Work has been performed using databases
to track the relationships between variables
and system failure modes~\cite{procsfmeadb}, to %work has been performed to
introduce automation into the FMEA process~\cite{appswfmea} and to provide code analysis
automation~\cite{modelsfmea}. Although the SFMEA and hardware FMEAs are performed separately,
some schools of thought aim for Fault Tree Analysis (FTA)~\cite{nasafta,nucfta} (top down - deductive)
and FMEA (bottom-up inductive)
to be performed on the same system to provide insight into the
software hardware/interface~\cite{embedsfmea}.
%
Although this
would give a better picture of the failure mode behaviour, it
is by no means a rigorous approach to tracing errors that may occur in hardware
through to the top (and therefore ultimately controlling) layer of software~\cite{swassessment}.
\paragraph{Current FMEA techniques are not suitable for software}
The main FMEA methodologies are all based on the concept of taking
base component {\fms}, and translating them into system level events/failures~\cite{sfmea,sfmeaa}.
%
In a complicated system, mapping a component failure mode to a system level failure
will mean a long reasoning distance; that is to say the actions of the
failed component will have to be traced through
several sub-systems, gauging its effects with and on other components.
%
With software at the higher levels of these sub-systems,
we have yet another layer of complication.
%
%In order to integrate software, %in a meaningful way
%we need to re-think the
%FMEA concept of simply mapping a base component failure to a system level event.
%
SFMEA regards, in place of hardware components, the variables used by the programs to be their equivalent~\cite{procsfmea}.
The failure modes of these variables, are that they could become erroneously over-written,
calculated incorrectly (due to a mistake by the programmer, or a fault in the micro-processor on which it is running), or
external influences such as
ionising radiation causing bits to be erroneously altered.
\paragraph{FMEA and Modularity}
From the 1940s onwards, software has evolved from simple procedural languages (e.g. assembly language/Fortran~\cite{f77} call return)
to structured programming (C~\cite{DBLP:books/ph/KernighanR88}, Pascal etc.) and then to object oriented models (Java, C++ etc.).
FMEA has undergone no such evolution.
%
In a world where sensor systems, often including embedded software components, are brought in to
create complex systems, FMEA still follows a rigid {\bc} {\fm} to system level error model,
that is only suitable for simple electro mechanical systems.
%
%
% MAYBE MOVE THIS TO CH3, FMEA CRITICISM
%
% \section{Conclusions on current FMEA Methodologies}
%
% %% FOCUS
% The focus of this chapter %literature review
% is to establish the current practice and applications
% of FMEA.
% %, and to examine its strengths and weaknesses.
% %% GOAL
% Its
% goal is to identify central issues and to criticise and assess the current
% FMEA methodologies.
% %% PERSPECTIVE
% The perspective of the author, is as a practitioner of static failure mode analysis techniques
% concerning approval of product
% to European safety standards, both the prescriptive~\cite{en298,en230} and statistical~\cite{en61508}.
% A second perspective is that of a software engineer trained to use formal methods.
% Examining FMEA methodologies for mathematical properties, influenced by
% formal methods applied to software, should provide a perspective not traditionally considered.
% %% COVERAGE
% The literature reviewed, has been restricted to published books, European safety standards (as examples
% of current safety measures applied), and traditional research, from journal and conference papers.
% %% ORGANISATION
% The review is organised by concept, that is, FMEA can be applied to hardware, software, software~interfacing and
% to multiple failure scenarios etc. Methodologies related to FMEA are briefly covered for the sake of context.
% %% AUDIENCE
% % Well duh! PhD supervisors and examiners....
%
% % \subsection{Related Methodologies}
% % FTA --- HAZOP --- ALARP --- Event Tree Analysis --- bow tie concept
% % \subsection{Hardware FMEA (HFMEA)}
% % \subsection{Multiple Failure scenarios and FMEA}
% % \subsection{Software FMEA (SFMEA)}
%
% \paragraph{Current work on Software FMEA}
%
% SFMEA usually does not seek to integrate
% hardware and software models, but to perform
% FMEA on the software in isolation~\cite{procsfmea}.
% %
% Work has been performed using databases
% to track the relationships between variables
% and system failure modes~\cite{procsfmeadb}, to %work has been performed to
% introduce automation into the FMEA process~\cite{appswfmea} and to provide code analysis
% automation~\cite{modelsfmea}. Although the SFMEA and hardware FMEAs are performed separately,
% some schools of thought aim for Fault Tree Analysis (FTA)~\cite{nasafta,nucfta} (top down - deductive)
% and FMEA (bottom-up inductive)
% to be performed on the same system to provide insight into the
% software hardware/interface~\cite{embedsfmea}.
% %
% Although this
% would give a better picture of the failure mode behaviour, it
% is by no means a rigorous approach to tracing errors that may occur in hardware
% through to the top (and therefore ultimately controlling) layer of software~\cite{swassessment}.
%
% \paragraph{Current FMEA techniques are not suitable for software}
%
% The main FMEA methodologies are all based on the concept of taking
% base component {\fms}, and translating them into system level events/failures~\cite{sfmea,sfmeaa}.
% %
% In a complicated system, mapping a component failure mode to a system level failure
% will mean a long reasoning distance; that is to say the actions of the
% failed component will have to be traced through
% several sub-systems, gauging its effects with and on other components.
% %
% With software at the higher levels of these sub-systems,
% we have yet another layer of complication.
% %
% %In order to integrate software, %in a meaningful way
% %we need to re-think the
% %FMEA concept of simply mapping a base component failure to a system level event.
% %
% SFMEA regards, in place of hardware components, the variables used by the programs to be their equivalent~\cite{procsfmea}.
% The failure modes of these variables, are that they could become erroneously over-written,
% calculated incorrectly (due to a mistake by the programmer, or a fault in the micro-processor on which it is running), or
% external influences such as
% ionising radiation causing bits to be erroneously altered.
%
%
% \paragraph{FMEA and Modularity}
% From the 1940s onwards, software has evolved from simple procedural languages (e.g. assembly language/Fortran~\cite{f77} call return)
% to structured programming (C~\cite{DBLP:books/ph/KernighanR88}, Pascal etc.) and then to object oriented models (Java, C++ etc.).
% FMEA has undergone no such evolution.
% %
% In a world where sensor systems, often including embedded software components, are brought in to
% create complex systems, FMEA still follows a rigid {\bc} {\fm} to system level error model,
% that is only suitable for simple electro mechanical systems.
%
%
%
% %
%
% %
% % MAYBE MOVE THIS TO CH3, FMEA CRITICISM
% 30JAN2013
%
\subsection{Where FMEA is now.}
\subsection{FMEA Criticism: Conclusions.}
FMEA is a useful tool for basic safety: it provides statistics on safety where field data is impractical, and is
very good with single failure modes linked to top level events.
FMEA has become part of the safety critical and safety certification industries.
@ -319,7 +347,7 @@ All these FMEA based methodologies have the following short comings:
\begin{itemize}
\item Impossible to integrate Software and hardware models,
\item State explosion problem exacerbated by increasing complexity due to density of modern electronics,
\item Impossibility to consider all multiple component failure modes~\cite{FMEAmultiple653556}
\item Impossible to consider all multiple component failure modes~\cite{FMEAmultiple653556}
\end{itemize}
@ -333,7 +361,7 @@ We now form a wish list, stating the features that we would want
in an improved FMEA methodology,
\begin{itemize}
\item No state explosion making analysis impractical,
\item Rigorous (total failure coverage within {\fgs} all interacting component and failure modes checked),
\item Exhaustive checking (total failure coverage within {\fgs}, all interacting components and failure modes checked),
\item Reasoning Traceable in system models,
\item Re-useable i.e. it should be possible to re-use analysis performed previously,
\item It must be possible to analyse simultaneous/multiple failures,