Good Friday morning
parent
0ea57ac50c
commit
a7aa5e3854
@ -3,20 +3,32 @@
|
||||
\label{sec:chap2}
|
||||
|
||||
The generic and statistical European Safety Standard, EN61508:6\cite{en61508}[B.6.6]
|
||||
describes Failure Mode Effect Analysis (FMEA) as:
|
||||
describes FMEA as:
|
||||
\begin{quotation}
|
||||
"To analyse a system design, by examining all possible sources of failure
|
||||
``To analyse a system design, by examining all possible sources of failure
|
||||
of a system's components and determining the effects of these failures
|
||||
on the behaviour and safety of the system."
|
||||
on the behaviour and safety of the system.''
|
||||
\end{quotation}
|
||||
|
||||
\section*{Introduction}
|
||||
This chapter introduces Failure Mode Effect Analysis (FMEA).
|
||||
%It begins with a simple example to demonstrate the basic concept of FMEA
|
||||
%and then
|
||||
It starts by looking at how we determine the failure modes associated with components.
|
||||
Two common electrical components, the resistor and the operational amplifier
|
||||
are examined in the context of two sources of information that define failure modes.
|
||||
A simple example of an FMEA is then given.
|
||||
The four main variants are then described and finally we conclude by describing concepts
|
||||
that underlie the usage and philosophy of FMEA.
|
||||
|
||||
|
||||
\section{FMEA}
|
||||
|
||||
|
||||
\section{FMEA Basic concept.}
|
||||
\label{basicfmea}
|
||||
%\subsection{FMEA}
|
||||
%\tableofcontents[currentsection]
|
||||
\paragraph{FMEA basic concept.}
|
||||
%\paragraph{FMEA basic concept.}
|
||||
|
||||
FMEA~\cite{safeware}[pp.341-344] is widely used, and proof of its use is a mandatory legal requirement
|
||||
for a large proportion of safety critical products sold in the European Union.
|
||||
@ -62,15 +74,16 @@ the effectiveness of FMEA.
|
||||
In order to apply any form of FMEA we need to know the ways in which
|
||||
the components we are using can fail.
|
||||
%
|
||||
A good introduction to hardware and software failure modes may be found in~\cite{sccs}[pp.114-124].
|
||||
\footnote{A good introduction to hardware and software failure modes may be found in~\cite{sccs}[pp.114-124].}
|
||||
%
|
||||
Typically when choosing components for a design, we look at manufacturers' data sheets
which describe functionality, physical dimensions,
environmental ranges and tolerances, and can indicate how a component may fail or misbehave
under given conditions.
|
||||
%
|
||||
How base components could fail internally, is not of interest to an FMEA investigation.
|
||||
The FMEA investigator needs to know what failure behaviour a component may exhibit. %, or in other words, its modes of failure.
|
||||
How %base
|
||||
components could fail internally is not of interest to an FMEA investigation.
|
||||
The FMEA investigator needs to know what failure behaviour a component could exhibit. %, or in other words, its modes of failure.
|
||||
%
|
||||
A large body of literature exists giving guidance for the determination of component {\fms}.
|
||||
%
|
||||
@ -90,7 +103,7 @@ FMD-91 entries include general descriptions of internal failures alongside {\fm
|
||||
%
|
||||
FMD-91 entries need, in some cases, some interpretation to be mapped to a clear set of
|
||||
component {\fms} suitable for use in FMEA.
|
||||
|
||||
%
|
||||
A third document, MIL-1991~\cite{mil1991} provides overall reliability statistics for
|
||||
component types, but does not detail specific failure modes.
|
||||
%
|
||||
@ -119,10 +132,13 @@ requires statistics for Meantime to Failure (MTTF) for all {\bc} failure modes.
|
||||
|
||||
\section{Determining the failure modes of Components.}
|
||||
|
||||
The starting point in the FMEA process is the set of failure modes of the {\bcs}.
|
||||
The starting point in the FMEA process is the set of failure modes of the components
|
||||
we would typically find in a production parts list, which we can term the {\bcs}.
|
||||
%
|
||||
In order to define FMEA we must start with a discussion of how these failure modes are chosen.
|
||||
%
|
||||
In this section we look in detail at two common electrical components and examine how
|
||||
In this section we pick %look in detail at
|
||||
two common electrical components as examples, and examine how
|
||||
the two chosen sources of {\fm} information define their failure mode behaviour.
|
||||
We look at the reasons why some known failure modes % are omitted, or presented in
|
||||
%specific but unintuitive ways.
|
||||
@ -130,8 +146,8 @@ We look at the reasons why some known failure modes % are omitted, or presented
|
||||
can be found in one source but not in the others and vice versa.
|
||||
%
|
||||
Finally we compare and contrast the failure modes determined for these components
|
||||
from the FMD-91 reference source and from the guidelines of the
|
||||
European burner standard EN298.
|
||||
from the FMD-91~\cite{fmd91} reference source and from the guidelines of the
|
||||
European burner standard EN298~\cite{en298}.
|
||||
|
||||
\subsection{Failure mode determination for generic resistor.}
|
||||
\label{sec:resistorfm}
|
||||
@ -221,6 +237,10 @@ and thus subject to drift/parameter change.
|
||||
|
||||
\subsubsection{Resistor Failure Modes}
|
||||
\label{sec:res_fms}
|
||||
The difference in resistor failure modes between FMD-91 and EN298 is that FMD-91 would
include the failure mode DRIFT. EN298 does not include this, mainly because it imposes circuit design constraints
that effectively side-step that problem.
|
||||
%
|
||||
For this study we will take the conservative view from EN298, and consider the failure
|
||||
modes for a generic resistor to be both OPEN and SHORT.
|
||||
i.e.
|
||||
@ -268,7 +288,7 @@ We need to translate these failure causes within the Op-Amp into {\fms}.
|
||||
We can look at each failure cause in turn, and map it to potential {\fms} suitable for use in FMEA
|
||||
investigations.
|
||||
|
||||
\paragraph{Op-Amp failure cause: Poor Die attach}
|
||||
\paragraph{Op-Amp failure cause: Poor Die attach.}
|
||||
The symptom for this is given as a low slew rate. This means that the op-amp
|
||||
will not react quickly to changes on its input terminals.
|
||||
This is a failure symptom that may not be of concern in a slow responding system like an
|
||||
@ -276,24 +296,24 @@ instrumentation amplifier. However, where higher frequencies are being processed
|
||||
a signal may entirely be lost.
|
||||
We can map this failure cause to a {\fm}, and we can call it $LOW_{slew}$.
|
||||
|
||||
\paragraph{No Operation - over stress}
|
||||
\paragraph{No Operation - over stress.}
|
||||
Here the Op-Amp has been damaged, and the output may be held HIGH or LOW, or may be
effectively tri-stated, i.e. not able to drive circuitry along the next stages of
the signal path: we can call this state NOOP (no operation).
|
||||
%
|
||||
We can map this failure cause to three {\fms}, $LOW$, $HIGH$, $NOOP$.
|
||||
|
||||
\paragraph{Shorted $V_+$ to $V_-$}
|
||||
\paragraph{Shorted inputs: $V_+$ to $V_-$.}
|
||||
Due to the high intrinsic gain of an op-amp, and the effect of offset currents,
|
||||
this will force the output HIGH or LOW.
|
||||
We map this failure cause to $HIGH$ or $LOW$.
|
||||
|
||||
\paragraph{Open $V_+$}
|
||||
\paragraph{Open input: $V_+$.}
|
||||
This failure cause will mean that the minus input will have the very high gain
|
||||
of the Op-Amp applied to it, and the output will be forced HIGH or LOW.
|
||||
We map this failure cause to $HIGH$ or $LOW$.
|
||||
|
||||
\paragraph{Collecting Op-Amp failure modes from FMD-91}
|
||||
\paragraph{Collecting Op-Amp failure modes from FMD-91.}
|
||||
We can define an Op-Amp, under FMD-91 definitions, to have the following {\fms}.
|
||||
\begin{equation}
|
||||
\label{eqn:opampfms}
|
||||
@ -301,7 +321,7 @@ We can define an Op-Amp, under FMD-91 definitions to have the following {\fms}.
|
||||
\end{equation}
|
||||
|
||||
|
||||
\paragraph{Failure Modes of an Op-Amp according to EN298}
|
||||
\paragraph{Failure Modes of an Op-Amp according to EN298.}
|
||||
|
||||
EN298 does not specifically define Op-Amp failure modes; these can be determined
|
||||
by following a procedure for `integrated~circuits' outlined in
|
||||
@ -470,7 +490,7 @@ component {\fms} in FMEA or FMMD and require interpretation.
|
||||
FMEA is a bottom-up procedure which starts with the failure modes of the low level components of a system. An example
analysis will serve to demonstrate it in practice.
|
||||
|
||||
\paragraph{ FMEA Example: Milli-volt reader.}
|
||||
\section{FMEA worked example: milli-volt reader.}
|
||||
Example: Let us consider a system, in this case a simple milli-volt reader, consisting
|
||||
of instrumentation amplifiers connected to a micro-processor
|
||||
that reports its readings via RS-232.
|
||||
@ -542,6 +562,7 @@ In this section we examine some fundamental concepts and underlying philosophies
|
||||
|
||||
\paragraph{The signal path.}
|
||||
|
||||
% C Garret does not like the terms afferent and efferent here, try to think of something else
|
||||
Most electronic systems are used to process a signal: with signal processing
|
||||
there is usually a clear afferent to transform to efferent path.
|
||||
%
|
||||
@ -558,9 +579,6 @@ An FMEA investigation will often take the component {\fm} and examine its effect
|
||||
in the direction of the signal,
|
||||
echoing diagnostic/fault~finding methods~\cite{garrett, maikowski}. % loebowski}.
|
||||
%
|
||||
Those tasked with performing FMEA are generally personnel who have performed fault finding,
and this shapes the rationale and work-culture of FMEA.
|
||||
%
|
||||
When fault finding we generally follow the signal path, checking for correct behaviour
|
||||
along it: when we find something out of place we zoom in and measure
|
||||
the circuit behaviour until we find a faulty component or module.
|
||||
@ -568,6 +586,10 @@ the circuit behaviour until we find a faulty component or module.
|
||||
With this style of fault finding, because it is based on experiment,
|
||||
we can hop from module to module eliminating working modules, until we find the
|
||||
failure.
|
||||
%
|
||||
Those tasked with performing FMEA are generally personnel who have performed fault finding,
and this shapes the rationale and work-culture of FMEA.
|
||||
%
|
||||
|
||||
|
||||
FMEA is a theoretical discipline.
|
||||
@ -575,15 +597,23 @@ FMEA is a theoretical discipline.
|
||||
It would be very unusual to build a circuit and then simulate
|
||||
component failure modes.
|
||||
%
|
||||
This would be time consuming as it would involve building a circuit for each component {\fm} in the system.
|
||||
This would be time consuming as it would involve building a circuit for each component {\fm} in
the system.\footnote{Building circuit simulations and simulating component failure modes
would be a very time consuming process and might only be performed as a final stage of accident investigation, where the cause is
required to be proven.}
|
||||
%
|
||||
We cannot, as with fault finding, verify modules along the signal path for correct behaviour
|
||||
and eliminate them from the investigation.
|
||||
%
|
||||
With FMEA we therefore need to be more thorough.
|
||||
FMEA is a `thought~experiment', not an actual experiment.
|
||||
%
|
||||
With FMEA we therefore need to be more thorough in the consideration of the effects a failure mode may have
|
||||
on the other components in a system, than with fault finding.
|
||||
%
|
||||
The question is by how much.
|
||||
%
|
||||
Too much and the task becomes impossible due to time/labour constraints.
|
||||
%
|
||||
Too little and the analysis could become meaningless because it misses
|
||||
potential system failures.
|
||||
%
|
||||
@ -594,10 +624,21 @@ of the component exhibiting the {\fm} under investigation.
|
||||
Also, whether it is acceptable to follow the effects through the signal path {\em only},
or whether it is necessary to look instead at the effect on all other components in the system,
is a matter for debate.
|
||||
%
|
||||
In practice, it is a compromise between the amount of time/money that can be spent
on analysis relative to the criticality of the project.
Metrics for measuring the amount of work to undertake for FMEA are examined in section~\ref{sec:xfmea}.
|
||||
|
||||
\paragraph{Failure Modes and the signal path.}
|
||||
|
||||
In general a component failure mode in an electronic circuit will
|
||||
change the circuit topology. For a single failure
|
||||
this effect may cause additional complications for the analyst.
|
||||
For multiple failures this means
that the analyst
will have to deal with a differently altered circuit topology
for each analysis.
|
||||
|
||||
|
||||
\paragraph{Single component failure mode to system failure relation.}
|
||||
|
||||
@ -619,11 +660,12 @@ From a whole system perspective, we may find that {\bc} {\fms}
|
||||
may have more than one possible system event associated with them.
|
||||
Often there will be a clear one to one mapping, but
|
||||
probabilities to failure (as used in FMECA)
|
||||
could mean one to many.% mapping.
|
||||
could mean one to many. % mapping.
|
||||
%
|
||||
\paragraph{Use of Markov chains to model failure modes.}
|
||||
We could represent a failure mode and its possible outcomes using a Markov chain~\cite{probfmea_4338247}.
|
||||
%
|
||||
Where multiple simultaneous\footnote{Multiple simultaneous failures are taken to mean failures that occur within the same detection period.}
|
||||
Where multiple simultaneous %\footnote{Multiple simultaneous failures are taken to mean failures that occur within the same detection period.}
failure modes are considered, this complicates
the statistical nature of the Markov chain cause-effect model.
|
||||
%
|
||||
@ -734,15 +776,22 @@ required to map a failure cause to its potential outcomes.
|
||||
In our basic FMEA example in section~\ref{basicfmea}
|
||||
we were asked to consider one failure mode against all the components in the milli-volt reader.
|
||||
%
|
||||
To create a complete FMEA report on the milli-volt reader we would have had to examine every
|
||||
To create an exhaustive FMEA report on the milli-volt reader, we would have had to examine every
|
||||
known failure mode of every component within it---against all its other components.
|
||||
%
|
||||
The reasoning~distance is defined as the sum of the number of failure modes, against all other components
|
||||
We define `reasoning~distance' as the number of components that a given
failure mode is checked against to determine a system level symptom.
|
||||
%
|
||||
No current FMEA variant gives guidelines for the components that should
|
||||
be included to analyse a {\fm} in a system.
|
||||
%does not
|
||||
The exhaustive~reasoning~distance would be
the total number of checks made when every failure mode of every component is examined against all the other components
in that system.
|
||||
%
|
||||
If the milli-volt reader had say 100 components, with three failure modes each, this
|
||||
would give a reasoning distance of 3 * 100 * 99.
|
||||
|
||||
would give an exhaustive reasoning distance of 3 * 100 * 99.
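%
As a sketch of this arithmetic, assume (purely for illustration; the symbols $N$, $f$ and $RD_{ex}$ are introduced here and are not taken from any FMEA standard)
a system of $N$ components, each with $f$ failure modes, where every failure mode is checked against the $N-1$ other components:
\[
RD_{ex} = f \times N \times (N-1) = 3 \times 100 \times 99 = 29{,}700 .
\]
Even this modest hypothetical parts count would call for nearly thirty thousand individual checks.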
|
||||
%
|
||||
The discussion of reasoning distance provides us with a metric to examine
the state explosion problems associated with forward search failure investigation
methodologies.
|
||||
@ -799,9 +848,10 @@ double failure scenarios (for burner lock-out scenarios).}
|
||||
%(N^2 - N).f
|
||||
\end{equation}
|
||||
|
||||
For our theoretical 100 components with 3 failure modes each example, this is
|
||||
$100*99*98*3=2,910,600$ failure mode scenarios.
|
||||
|
||||
For our theoretical example of 100 components with 3 failure modes each, this is a reasoning distance of
$100 \times 99 \times 98 \times 3 = 2{,}910{,}600$. % failure mode scenarios.
|
||||
In practice there is an additional concern here, that of
the circuit topology changes that {\fms} can cause.
|
||||
|
||||
\paragraph{Reliance on experts for meaningful FMEA Analysis.}
|
||||
Current FMEA methodologies cannot consider---for the reason of state explosion---an exhaustive approach.
|
||||
@ -818,7 +868,7 @@ on anything but a non-trivial system.
|
||||
|
||||
\subsection{Component Tolerance}
|
||||
|
||||
Component tolerances may need considered when determining if a component has failed.
|
||||
Component tolerances may need considering when determining if a component has failed.
|
||||
Calculations for acceptable ranges to determine failure or acceptable conditions
|
||||
must be made where appropriate.
|
||||
%
|
||||
@ -846,13 +896,14 @@ is given in section~\ref{sec:resistortolerance}.
|
||||
|
||||
Production FMEA (or PFMEA), is FMEA used to prioritise, in terms of
|
||||
cost, problems to be addressed in product production.
|
||||
|
||||
It focuses on known problems, determines the
|
||||
frequency they occur and their cost to fix.
|
||||
This is multiplied together and called an RPN
|
||||
number.
|
||||
%
|
||||
It generally focuses on known problems: multiplying their
statistical frequency %they occur
by their cost to fix gives a Risk Priority Number (RPN)
for the component {\fm}.
|
||||
%
|
||||
Fixing problems with the highest RPN
|
||||
will return most cost benefit.
|
||||
will return most cost benefit~\cite{bfmea}.
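%
As an illustration only, with invented figures that are not drawn from any production data,
suppose one known problem occurs in 20 of every 1000 units and costs 5 units of currency to fix,
while a second occurs in 2 of every 1000 units and costs 200 to fix.
Taking the RPN as frequency multiplied by cost to fix:
\[
\mathit{RPN}_1 = 20 \times 5 = 100, \qquad \mathit{RPN}_2 = 2 \times 200 = 400 .
\]
Under these assumed figures the rarer but more expensive problem has the higher RPN and would be addressed first.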
|
||||
|
||||
% benign example of PFMEA in CARS - make something up.
|
||||
\subsection{PFMEA Example}
|
||||
@ -872,7 +923,7 @@ will return most cost benefit.
|
||||
|
||||
\section{FMECA - Failure Modes Effects and Criticality Analysis}
|
||||
|
||||
\subsection{ FMECA - Failure Modes Effects and Criticality Analysis}
|
||||
\paragraph{ FMECA - Failure Modes Effects and Criticality Analysis.}
|
||||
% \begin{figure}
|
||||
% \centering
|
||||
% %\includegraphics[width=100pt]{./military-aircraft-desktop-computer-wallpaper-missile-launch.jpg}
|
||||
@ -883,10 +934,16 @@ will return most cost benefit.
|
||||
% \end{figure}
|
||||
FMECA places emphasis on determining criticality rather than the cost of system failures.
|
||||
%
|
||||
Applies some Bayesian statistics (probabilities of component failures
|
||||
thereby causing given system level failures).
|
||||
It applies Bayesian statistics (probabilities of component failures
|
||||
and the probability of those failures causing given system level failures)
|
||||
to determine the risk of system level events/symptoms.
|
||||
The results of the probabilities for the system level failures
|
||||
are multiplied by the operational time of the system.
|
||||
For instance a military or emergency system may be typically operational for
|
||||
a given number of hours. This in conjunction with the severity
|
||||
of the system level event gives us a level of criticality.
|
||||
%
|
||||
Also the probability of the system failure causing a critical event.
|
||||
%Also the probability of the system failure causing a critical event.
|
||||
%
|
||||
Applying Bayesian statistics to failure analysis suffers from the
problem that correlation does not imply causation~\cite{bayesfrequentist}.
|
||||
@ -895,9 +952,7 @@ However, correlation is evidence for causation, and maybe the only evidence to h
|
||||
and this is the justification behind its use.
|
||||
A history of the usage and development of FMECA may be found in~\cite{FMECAresearch}.
|
||||
|
||||
|
||||
|
||||
\subsection{ FMECA - Failure Modes Effects and Criticality Analysis}
|
||||
\paragraph{ FMECA - Failure Modes Effects and Criticality Analysis.}
|
||||
FMECA is very similar to PFMEA, but instead of cost, a criticality or
|
||||
seriousness factor is ascribed to putative top level incidents.
|
||||
FMECA has three probability factors for component failures.
|
||||
@ -917,7 +972,7 @@ a particular failure~mode occurring within a component. reference FMD-91.
|
||||
|
||||
|
||||
|
||||
\subsection{ FMECA - Failure Modes Effects and Criticality Analysis}
|
||||
\paragraph{ FMECA - Failure Modes Effects and Criticality Analysis.}
|
||||
\textbf{FMECA $\beta$ value.}
|
||||
The second probability factor, $\beta$, is the probability that the failure mode
|
||||
will cause a given system failure.
|
||||
@ -938,17 +993,15 @@ A weighting factor to indicate the seriousness of the putative system level erro
|
||||
C_m = {\beta} \cdot {\alpha} \cdot {{\lambda}_p} \cdot {t} \cdot {s}
|
||||
\end{equation}
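%
As a worked illustration of this product, take the following invented values, chosen only to make the
arithmetic easy to follow (in particular, ${\lambda}_p$ is assumed here to be expressed per hour and $t$ in hours):
$\beta = 0.5$, $\alpha = 0.2$, ${\lambda}_p = 10^{-6}$, $t = 10^{4}$ and $s = 10$.
Then
\[
C_m = 0.5 \times 0.2 \times 10^{-6} \times 10^{4} \times 10 = 0.01 .
\]
Because $C_m$ is a simple product, doubling any single factor doubles $C_m$, so a severe but improbable failure
and a mild but probable one can receive the same criticality.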
|
||||
|
||||
Highest $C_m$ values would be at the top of a `to~do' list
|
||||
for a project manager.
|
||||
|
||||
|
||||
The highest $C_m$ values would represent the most dangerous or serious
|
||||
system level failures.
|
||||
The highest $C_m$ values would be at the top of a `to~fix' list
|
||||
for a project manager, and some levels of risk may be considered unacceptable
|
||||
and require re-design of some systems.
|
||||
|
||||
|
||||
\section{FMEDA - Failure Modes Effects and Diagnostic Analysis}
|
||||
|
||||
|
||||
|
||||
|
||||
%\subsection{ FMEDA - Failure Modes Effects and Diagnostic Analysis}
|
||||
% \begin{figure}
|
||||
% \centering
|
||||
|
@ -1,8 +1,20 @@
|
||||
\label{sec:chap3}
|
||||
|
||||
\section*{Introduction}
|
||||
|
||||
This chapter examines FMEA in a critical light.
|
||||
The problems with the scope---or required reasoning distance---of detail to apply
|
||||
for FMEA analysis are examined. The impossibility of integrating software
|
||||
and hardware in FMEA failure models, and the impossibility of performing meaningful
|
||||
multiple failure analysis are examined.
|
||||
Additional problems, such as the inability to easily re-use and validate (through
traceable reasoning) FMEA models, are presented.
|
||||
Finally we conclude with a list of deficiencies in current FMEA methodologies, and present a wish list
|
||||
for an improved methodology.
|
||||
|
||||
\section{Historical Origins of FMEA}
|
||||
|
||||
\subsection{FMEA designed for simple electro-mechanical systems}
|
||||
\subsection{FMEA: {\bc} {\fm} to system level failure modelling}
|
||||
FMEA traces its roots to the 1940s when it was used to identify the most costly
|
||||
failures arising from car mass-production~\cite{bfmea}.
|
||||
It was later modified slightly to include severity of the top level failure (FMECA~\cite{fmeca}).
|
||||
@ -14,6 +26,13 @@ This means that we have one analysis case per component failure mode for all the
|
||||
This analysis philosophy has not changed since FMEA was first used.
|
||||
|
||||
|
||||
\subsection{FMEA does not support Traceable Reasoning}
|
||||
An FMEA report normally assigns one line of a spreadsheet to
|
||||
each {\bc} {\fm}.
|
||||
This means that the reasoning involved in determining the system level failure/symptom is described (if at all) very briefly.
|
||||
Ideally supporting documentation would give the reasoning and calculations behind each analysis case,
|
||||
but the structure of current FMEA reports does not encourage this.
|
||||
|
||||
\subsection{FMEA does not support modularity.}
|
||||
It is common practice in the process control industry to buy in sub-systems,
typically sensors and actuators connected to an industrially hardened computer bus, e.g. CANbus~\cite{can,canspec}, modbus~\cite{modbus} etc.
|
||||
@ -64,10 +83,19 @@ We could term such a group a `{\fg}'.
|
||||
|
||||
Given the {\bc} {\fm} to system level failure mode paradigm it is
|
||||
difficult to re-use FMEA analysis.
|
||||
%
|
||||
Several strategies to aid re-use have been proposed~\cite{rudov2009language, reuse_of_fmea}, but
|
||||
the fundamental problem remains, that, with any changes
|
||||
to the component base in a system, it is very difficult to
|
||||
determine which FMEA test scenarios must be re-worked.
|
||||
%
|
||||
It is common in safety critical systems to have repeated circuit topologies.
|
||||
For instance we may have several signal input and output
|
||||
structures that are repeated.
|
||||
%
|
||||
The failure mode behaviour of these repeated structures will be the same.
|
||||
However, with the {\bc} {\fm} to system level failure mode mapping,
work is likely to be repeated.
|
||||
|
||||
|
||||
\section{Software and FMEA}
|
||||
@ -82,7 +110,7 @@ Similar difficulties in integrating mechanical and electronic/software
|
||||
failure models are discussed in~\cite{SMR:SMR580,swassessment}.
|
||||
|
||||
|
||||
\paragraph{Current work on Software FMEA}
|
||||
\paragraph{Current work on Software FMEA.}
|
||||
|
||||
SFMEA usually does not seek to integrate
|
||||
hardware and software models, but to perform
|
||||
@ -204,104 +232,104 @@ utterly anachronistic in the distributed real time system environment.
|
||||
|
||||
FMEA is no longer fit for purpose!
|
||||
%
|
||||
|
||||
\section{Conclusions on current FMEA Methodologies}
|
||||
|
||||
%% FOCUS
|
||||
The focus of this chapter %literature review
|
||||
is to establish the current practice and applications
|
||||
of FMEA.
|
||||
%, and to examine its strengths and weaknesses.
|
||||
%% GOAL
|
||||
Its
|
||||
goal is to identify central issues and to criticise and assess the current
|
||||
FMEA methodologies.
|
||||
%% PERSPECTIVE
|
||||
The perspective of the author is that of a practitioner of static failure mode analysis techniques
concerning approval of products
to European safety standards, both the prescriptive~\cite{en298,en230} and statistical~\cite{en61508}.
|
||||
A second perspective is that of a software engineer trained to use formal methods.
|
||||
Examining FMEA methodologies for mathematical properties, influenced by
|
||||
formal methods applied to software, should provide a perspective not traditionally considered.
|
||||
%% COVERAGE
|
||||
The literature reviewed has been restricted to published books, European safety standards (as examples
|
||||
of current safety measures applied), and traditional research, from journal and conference papers.
|
||||
%% ORGANISATION
|
||||
The review is organised by concept, that is, FMEA can be applied to hardware, software, software~interfacing and
|
||||
to multiple failure scenarios etc. Methodologies related to FMEA are briefly covered for the sake of context.
|
||||
%% AUDIENCE
|
||||
% Well duh! PhD supervisors and examiners....
|
||||
|
||||
% \subsection{Related Methodologies}
|
||||
% FTA --- HAZOP --- ALARP --- Event Tree Analysis --- bow tie concept
|
||||
% \subsection{Hardware FMEA (HFMEA)}
|
||||
% \subsection{Multiple Failure scenarios and FMEA}
|
||||
% \subsection{Software FMEA (SFMEA)}
|
||||
|
||||
\paragraph{Current work on Software FMEA}
|
||||
|
||||
SFMEA usually does not seek to integrate
|
||||
hardware and software models, but to perform
|
||||
FMEA on the software in isolation~\cite{procsfmea}.
|
||||
%
|
||||
Work has been performed using databases
|
||||
to track the relationships between variables
|
||||
and system failure modes~\cite{procsfmeadb}, to %work has been performed to
|
||||
introduce automation into the FMEA process~\cite{appswfmea} and to provide code analysis
|
||||
automation~\cite{modelsfmea}. Although the SFMEA and hardware FMEAs are performed separately,
|
||||
some schools of thought aim for Fault Tree Analysis (FTA)~\cite{nasafta,nucfta} (top down - deductive)
|
||||
and FMEA (bottom-up inductive)
|
||||
to be performed on the same system to provide insight into the
|
||||
software/hardware interface~\cite{embedsfmea}.
|
||||
%
|
||||
Although this
|
||||
would give a better picture of the failure mode behaviour, it
|
||||
is by no means a rigorous approach to tracing errors that may occur in hardware
|
||||
through to the top (and therefore ultimately controlling) layer of software~\cite{swassessment}.
|
||||
|
||||
\paragraph{Current FMEA techniques are not suitable for software}
|
||||
|
||||
The main FMEA methodologies are all based on the concept of taking
|
||||
base component {\fms}, and translating them into system level events/failures~\cite{sfmea,sfmeaa}.
|
||||
%
|
||||
In a complicated system, mapping a component failure mode to a system level failure
|
||||
will mean a long reasoning distance; that is to say the actions of the
|
||||
failed component will have to be traced through
|
||||
several sub-systems, gauging its effects with and on other components.
|
||||
%
|
||||
With software at the higher levels of these sub-systems,
|
||||
we have yet another layer of complication.
|
||||
%
|
||||
%In order to integrate software, %in a meaningful way
|
||||
%we need to re-think the
|
||||
%FMEA concept of simply mapping a base component failure to a system level event.
|
||||
%
|
||||
SFMEA regards, in place of hardware components, the variables used by the programs to be their equivalent~\cite{procsfmea}.
|
||||
The failure modes of these variables are that they could become erroneously over-written,
calculated incorrectly (due to a mistake by the programmer, or a fault in the micro-processor on which the program is running), or
erroneously altered by external influences such as
ionising radiation.
|
||||
|
||||
|
||||
\paragraph{FMEA and Modularity}
|
||||
From the 1940s onwards, software has evolved from simple procedural languages (e.g. assembly language/Fortran~\cite{f77} call return)
to structured programming (C~\cite{DBLP:books/ph/KernighanR88}, Pascal etc.) and then to object oriented models (Java, C++ etc.).
|
||||
FMEA has undergone no such evolution.
|
||||
%
|
||||
In a world where sensor systems, often including embedded software components, are brought in to
|
||||
create complex systems, FMEA still follows a rigid {\bc} {\fm} to system level error model,
|
||||
that is only suitable for simple electro-mechanical systems.
|
||||
|
||||
|
||||
|
||||
%
|
||||
|
||||
%
|
||||
% MAYBE MOVE THIS TO CH3, FMEA CRITICISM
|
||||
%
|
||||
% \section{Conclusions on current FMEA Methodologies}
|
||||
%
|
||||
% %% FOCUS
|
||||
% The focus of this chapter %literature review
|
||||
% is to establish the current practice and applications
|
||||
% of FMEA.
|
||||
% %, and to examine its strengths and weaknesses.
|
||||
% %% GOAL
|
||||
% Its
|
||||
% goal is to identify central issues and to criticise and assess the current
|
||||
% FMEA methodologies.
|
||||
% %% PERSPECTIVE
|
||||
% The perspective of the author, is as a practitioner of static failure mode analysis techniques
|
||||
% concerning approval of product
|
||||
% to European safety standards, both the prescriptive~\cite{en298,en230} and statistical~\cite{en61508}.
|
||||
% A second perspective is that of a software engineer trained to use formal methods.
|
||||
% Examining FMEA methodologies for mathematical properties, influenced by
|
||||
% formal methods applied to software, should provide a perspective not traditionally considered.
|
||||
% %% COVERAGE
|
||||
% The literature reviewed, has been restricted to published books, European safety standards (as examples
|
||||
% of current safety measures applied), and traditional research, from journal and conference papers.
|
||||
% %% ORGANISATION
|
||||
% The review is organised by concept, that is, FMEA can be applied to hardware, software, software~interfacing and
|
||||
% to multiple failure scenarios etc. Methodologies related to FMEA are briefly covered for the sake of context.
|
||||
% %% AUDIENCE
|
||||
% % Well duh! PhD supervisors and examiners....
|
||||
%
|
||||
% % \subsection{Related Methodologies}
|
||||
% % FTA --- HAZOP --- ALARP --- Event Tree Analysis --- bow tie concept
|
||||
% % \subsection{Hardware FMEA (HFMEA)}
|
||||
% % \subsection{Multiple Failure scenarios and FMEA}
|
||||
% % \subsection{Software FMEA (SFMEA)}
|
||||
%
|
||||
% \paragraph{Current work on Software FMEA}
%
% SFMEA usually does not seek to integrate
% hardware and software models, but to perform
% FMEA on the software in isolation~\cite{procsfmea}.
% %
% Work has been performed using databases
% to track the relationships between variables
% and system failure modes~\cite{procsfmeadb}, to %work has been performed to
% introduce automation into the FMEA process~\cite{appswfmea} and to provide code analysis
% automation~\cite{modelsfmea}. Although the SFMEA and hardware FMEAs are performed separately,
% some schools of thought aim for Fault Tree Analysis (FTA)~\cite{nasafta,nucfta} (top-down, deductive)
% and FMEA (bottom-up, inductive)
% to be performed on the same system to provide insight into the
% software/hardware interface~\cite{embedsfmea}.
% %
% Although this
% would give a better picture of the failure mode behaviour, it
% is by no means a rigorous approach to tracing errors that may occur in hardware
% through to the top (and therefore ultimately controlling) layer of software~\cite{swassessment}.
%
% \paragraph{Current FMEA techniques are not suitable for software}
%
% The main FMEA methodologies are all based on the concept of taking
% base component {\fms}, and translating them into system level events/failures~\cite{sfmea,sfmeaa}.
% %
% In a complicated system, mapping a component failure mode to a system level failure
% will mean a long reasoning distance; that is to say the actions of the
% failed component will have to be traced through
% several sub-systems, gauging its effects with and on other components.
% %
% With software at the higher levels of these sub-systems,
% we have yet another layer of complication.
% %
% %In order to integrate software, %in a meaningful way
% %we need to re-think the
% %FMEA concept of simply mapping a base component failure to a system level event.
% %
% In place of hardware components, SFMEA treats the variables used by the programs as their equivalent~\cite{procsfmea}.
% The failure modes of these variables are that they could become erroneously over-written,
% calculated incorrectly (due to a mistake by the programmer, or a fault in the micro-processor on which the program is running), or
% altered by external influences such as
% ionising radiation causing bits to be erroneously changed.
%
%
% \paragraph{FMEA and Modularity}
% From the 1940s onwards, software has evolved from simple procedural languages (i.e. assembly language/Fortran~\cite{f77} call return)
% to structured programming (C~\cite{DBLP:books/ph/KernighanR88}, Pascal etc.) and then to object oriented models (Java, C++, etc.).
% FMEA has undergone no such evolution.
% %
% In a world where sensor systems, often including embedded software components, are brought in to
% create complex systems, FMEA still follows a rigid {\bc} {\fm} to system level error model,
% that is only suitable for simple electro-mechanical systems.
%
%
%
% %
%
% %
% % MAYBE MOVE THIS TO CH3, FMEA CRITICISM
% 30JAN2013
%

\subsection{Where FMEA is now.}
\subsection{FMEA Criticism: Conclusions.}
FMEA is a useful tool for basic safety: it provides statistics on safety where field data are impractical to obtain,
and it works very well for single failure modes linked to top level events.
FMEA has become part of the safety critical and safety certification industries.
@ -319,7 +347,7 @@ All these FMEA based methodologies have the following short comings:
\begin{itemize}
\item Impossible to integrate Software and hardware models,
\item State explosion problem exacerbated by increasing complexity due to density of modern electronics,
\item Impossibility to consider all multiple component failure modes~\cite{FMEAmultiple653556}
\item Impossible to consider all multiple component failure modes~\cite{FMEAmultiple653556}
\end{itemize}


@ -333,7 +361,7 @@ We now form a wish list, stating the features that we would want
in an improved FMEA methodology,
\begin{itemize}
\item No state explosion making analysis impractical,
\item Rigorous (total failure coverage within {\fgs} all interacting component and failure modes checked),
\item Exhaustive checking (total failure coverage within {\fgs} all interacting component and failure modes checked),
\item Reasoning Traceable in system models,
\item Re-usable, i.e. it should be possible to re-use analysis performed previously,
\item It must be possible to analyse simultaneous/multiple failures,