Good Friday morning
parent 0ea57ac50c
commit a7aa5e3854
@ -3,20 +3,32 @@
|
|||||||
\label{sec:chap2}
|
\label{sec:chap2}
|
||||||
|
|
||||||
The generic and statistical European Safety Standard, EN61508:6\cite{en61508}[B.6.6]
|
The generic and statistical European Safety Standard, EN61508:6\cite{en61508}[B.6.6]
|
||||||
describes Failure Mode Effect Analysis (FMEA) as:
|
describes FMEA as:
|
||||||
\begin{quotation}
|
\begin{quotation}
|
||||||
"To analyse a system design, by examining all possible sources of failure
|
``To analyse a system design, by examining all possible sources of failure
|
||||||
of a system's components and determining the effects of these failures
|
of a system's components and determining the effects of these failures
|
||||||
on the behaviour and safety of the system."
|
on the behaviour and safety of the system.''
|
||||||
\end{quotation}.
|
\end{quotation}.
|
||||||
|
|
||||||
|
\section*{Introduction}
|
||||||
|
This chapter introduces Failure Mode Effect Analysis (FMEA).
|
||||||
|
%It begins with a simple example to demonstrate the basic concept of FMEA
|
||||||
|
%and then
|
||||||
|
It starts by looking at how we determine the failure modes associated with components.
|
||||||
|
Two common electrical components, the resistor and the operational amplifier
|
||||||
|
are examined in the context of two sources of information that define failure modes.
|
||||||
|
A simple example of an FMEA is then given.
|
||||||
|
The four main variants are then described and finally we conclude by describing concepts
|
||||||
|
that underlie the usage and philosophy of FMEA.
|
||||||
|
|
||||||
|
|
||||||
\section{FMEA}
|
|
||||||
|
|
||||||
|
\section{FMEA Basic concept.}
|
||||||
\label{basicfmea}
|
\label{basicfmea}
|
||||||
%\subsection{FMEA}
|
%\subsection{FMEA}
|
||||||
%\tableofcontents[currentsection]
|
%\tableofcontents[currentsection]
|
||||||
\paragraph{FMEA basic concept.}
|
%\paragraph{FMEA basic concept.}
|
||||||
|
|
||||||
FMEA~\cite{safeware}[pp.341-344] is widely used, and proof of its use is a mandatory legal requirement
|
FMEA~\cite{safeware}[pp.341-344] is widely used, and proof of its use is a mandatory legal requirement
|
||||||
for a large proportion of safety critical products sold in the European Union.
|
for a large proportion of safety critical products sold in the European Union.
|
||||||
@ -62,15 +74,16 @@ the effectiveness of FMEA.
|
|||||||
In order to apply any form of FMEA we need to know the ways in which
|
In order to apply any form of FMEA we need to know the ways in which
|
||||||
the components we are using can fail.
|
the components we are using can fail.
|
||||||
%
|
%
|
||||||
A good introduction to hardware and software failure modes may be found in~\cite{sccs}[pp.114-124].
|
\footnote{A good introduction to hardware and software failure modes may be found in~\cite{sccs}[pp.114-124].}
|
||||||
%
|
%
|
||||||
Typically when choosing components for a design, we look at manufacturers' data sheets
|
Typically when choosing components for a design, we look at manufacturers' data sheets
|
||||||
which describe functionality, physical dimensions,
|
which describe functionality, physical dimensions,
|
||||||
environmental ranges, tolerances and can indicate how a component may fail/misbehave
|
environmental ranges, tolerances and can indicate how a component may fail/misbehave
|
||||||
under given conditions.
|
under given conditions.
|
||||||
%
|
%
|
||||||
How base components could fail internally, is not of interest to an FMEA investigation.
|
How %base
|
||||||
The FMEA investigator needs to know what failure behaviour a component may exhibit. %, or in other words, its modes of failure.
|
components could fail internally, is not of interest to an FMEA investigation.
|
||||||
|
The FMEA investigator needs to know what failure behaviour a component could exhibit. %, or in other words, its modes of failure.
|
||||||
%
|
%
|
||||||
A large body of literature exists giving guidance for the determination of component {\fms}.
|
A large body of literature exists giving guidance for the determination of component {\fms}.
|
||||||
%
|
%
|
||||||
@ -90,7 +103,7 @@ FMD-91 entries include general descriptions of internal failures alongside {\fm
|
|||||||
%
|
%
|
||||||
FMD-91 entries need, in some cases, some interpretation to be mapped to a clear set of
|
FMD-91 entries need, in some cases, some interpretation to be mapped to a clear set of
|
||||||
component {\fms} suitable for use in FMEA.
|
component {\fms} suitable for use in FMEA.
|
||||||
|
%
|
||||||
A third document, MIL-1991~\cite{mil1991} provides overall reliability statistics for
|
A third document, MIL-1991~\cite{mil1991} provides overall reliability statistics for
|
||||||
component types, but does not detail specific failure modes.
|
component types, but does not detail specific failure modes.
|
||||||
%
|
%
|
||||||
@ -119,10 +132,13 @@ requires statistics for Meantime to Failure (MTTF) for all {\bc} failure modes.
|
|||||||
|
|
||||||
\section{Determining the failure modes of Components.}
|
\section{Determining the failure modes of Components.}
|
||||||
|
|
||||||
The starting point in the FMEA process is the failure modes of {\bcs}.
|
The starting point in the FMEA process is the failure modes of the components
|
||||||
|
we would typically find in a production parts list, which we can term the {\bcs}.
|
||||||
|
%
|
||||||
In order to define FMEA we must start with a discussion on how these failure modes are chosen.
|
In order to define FMEA we must start with a discussion on how these failure modes are chosen.
|
||||||
%
|
%
|
||||||
In this section we look in detail at two common electrical components and examine how
|
In this section we pick %look in detail at
|
||||||
|
two common electrical components as examples, and examine how
|
||||||
the two chosen sources of {\fm} information define their failure mode behaviour.
|
the two chosen sources of {\fm} information define their failure mode behaviour.
|
||||||
We look at the reasons why some known failure modes % are omitted, or presented in
|
We look at the reasons why some known failure modes % are omitted, or presented in
|
||||||
%specific but unintuitive ways.
|
%specific but unintuitive ways.
|
||||||
@ -130,8 +146,8 @@ We look at the reasons why some known failure modes % are omitted, or presented
|
|||||||
can be found in one source but not in the others and vice versa.
|
can be found in one source but not in the others and vice versa.
|
||||||
%
|
%
|
||||||
Finally we compare and contrast the failure modes determined for these components
|
Finally we compare and contrast the failure modes determined for these components
|
||||||
from the FMD-91 reference source and from the guidelines of the
|
from the FMD-91~\cite{fmd91} reference source and from the guidelines of the
|
||||||
European burner standard EN298.
|
European burner standard EN298~\cite{en298}.
|
||||||
|
|
||||||
\subsection{Failure mode determination for generic resistor.}
|
\subsection{Failure mode determination for generic resistor.}
|
||||||
\label{sec:resistorfm}
|
\label{sec:resistorfm}
|
||||||
@ -221,6 +237,10 @@ and thus subject to drift/parameter change.
|
|||||||
|
|
||||||
\subsubsection{Resistor Failure Modes}
|
\subsubsection{Resistor Failure Modes}
|
||||||
\label{sec:res_fms}
|
\label{sec:res_fms}
|
||||||
|
The differences in resistor failure modes between FMD-91 and EN298 are that FMD-91 would
|
||||||
|
include the failure mode DRIFT. EN298 does not include this, mainly because it imposes circuit design constraints
|
||||||
|
that effectively side step that problem.
|
||||||
|
%
|
||||||
For this study we will take the conservative view from EN298, and consider the failure
|
For this study we will take the conservative view from EN298, and consider the failure
|
||||||
modes for a generic resistor to be both OPEN and SHORT.
|
modes for a generic resistor to be both OPEN and SHORT.
|
||||||
i.e.
|
i.e.
|
||||||
@ -268,7 +288,7 @@ We need to translate these failure causes within the Op-Amp into {\fms}.
|
|||||||
We can look at each failure cause in turn, and map it to potential {\fms} suitable for use in FMEA
|
We can look at each failure cause in turn, and map it to potential {\fms} suitable for use in FMEA
|
||||||
investigations.
|
investigations.
|
||||||
|
|
||||||
\paragraph{Op-Amp failure cause: Poor Die attach}
|
\paragraph{Op-Amp failure cause: Poor Die attach.}
|
||||||
The symptom for this is given as a low slew rate. This means that the op-amp
|
The symptom for this is given as a low slew rate. This means that the op-amp
|
||||||
will not react quickly to changes on its input terminals.
|
will not react quickly to changes on its input terminals.
|
||||||
This is a failure symptom that may not be of concern in a slow responding system like an
|
This is a failure symptom that may not be of concern in a slow responding system like an
|
||||||
@ -276,24 +296,24 @@ instrumentation amplifier. However, where higher frequencies are being processed
|
|||||||
a signal may entirely be lost.
|
a signal may entirely be lost.
|
||||||
We can map this failure cause to a {\fm}, and we can call it $LOW_{slew}$.
|
We can map this failure cause to a {\fm}, and we can call it $LOW_{slew}$.
|
||||||
|
|
||||||
\paragraph{No Operation - over stress}
|
\paragraph{No Operation - over stress.}
|
||||||
Here the OP-Amp has been damaged, and the output may be held HIGH or LOW, or may be
|
Here the OP-Amp has been damaged, and the output may be held HIGH or LOW, or may be
|
||||||
effectively tri-stated, i.e. not able to drive circuitry along the next stages of
|
effectively tri-stated, i.e. not able to drive circuitry along the next stages of
|
||||||
the signal path: we can call this state NOOP (no Operation).
|
the signal path: we can call this state NOOP (no Operation).
|
||||||
%
|
%
|
||||||
We can map this failure cause to three {\fms}, $LOW$, $HIGH$, $NOOP$.
|
We can map this failure cause to three {\fms}, $LOW$, $HIGH$, $NOOP$.
|
||||||
|
|
||||||
\paragraph{Shorted $V_+$ to $V_-$}
|
\paragraph{Shorted inputs: $V_+$ to $V_-$.}
|
||||||
Due to the high intrinsic gain of an op-amp, and the effect of offset currents,
|
Due to the high intrinsic gain of an op-amp, and the effect of offset currents,
|
||||||
this will force the output HIGH or LOW.
|
this will force the output HIGH or LOW.
|
||||||
We map this failure cause to $HIGH$ or $LOW$.
|
We map this failure cause to $HIGH$ or $LOW$.
|
||||||
|
|
||||||
\paragraph{Open $V_+$}
|
\paragraph{Open input: $V_+$.}
|
||||||
This failure cause will mean that the minus input will have the very high gain
|
This failure cause will mean that the minus input will have the very high gain
|
||||||
of the Op-Amp applied to it, and the output will be forced HIGH or LOW.
|
of the Op-Amp applied to it, and the output will be forced HIGH or LOW.
|
||||||
We map this failure cause to $HIGH$ or $LOW$.
|
We map this failure cause to $HIGH$ or $LOW$.
|
||||||
|
|
||||||
\paragraph{Collecting Op-Amp failure modes from FMD-91}
|
\paragraph{Collecting Op-Amp failure modes from FMD-91.}
|
||||||
We can define an Op-Amp, under FMD-91 definitions to have the following {\fms}.
|
We can define an Op-Amp, under FMD-91 definitions to have the following {\fms}.
|
||||||
\begin{equation}
|
\begin{equation}
|
||||||
\label{eqn:opampfms}
|
\label{eqn:opampfms}
|
||||||
@ -301,7 +321,7 @@ We can define an Op-Amp, under FMD-91 definitions to have the following {\fms}.
|
|||||||
\end{equation}
|
\end{equation}
|
||||||
|
|
||||||
|
|
||||||
\paragraph{Failure Modes of an Op-Amp according to EN298}
|
\paragraph{Failure Modes of an Op-Amp according to EN298.}
|
||||||
|
|
||||||
EN298 does not specifically define Op-Amp failure modes; these can be determined
|
EN298 does not specifically define Op-Amp failure modes; these can be determined
|
||||||
by following a procedure for `integrated~circuits' outlined in
|
by following a procedure for `integrated~circuits' outlined in
|
||||||
@ -470,7 +490,7 @@ component {\fms} in FMEA or FMMD and require interpretation.
|
|||||||
FMEA is a bottom-up procedure which starts with the failure modes of the low level components of a system; an example
|
FMEA is a bottom-up procedure which starts with the failure modes of the low level components of a system; an example
|
||||||
analysis will serve to demonstrate it in practice.
|
analysis will serve to demonstrate it in practice.
|
||||||
|
|
||||||
\paragraph{ FMEA Example: Milli-volt reader.}
|
\section{FMEA worked example: milli-volt reader.}
|
||||||
Example: Let us consider a system, in this case a simple milli-volt reader, consisting
|
Example: Let us consider a system, in this case a simple milli-volt reader, consisting
|
||||||
of instrumentation amplifiers connected to a micro-processor
|
of instrumentation amplifiers connected to a micro-processor
|
||||||
that reports its readings via RS-232.
|
that reports its readings via RS-232.
|
||||||
@ -542,6 +562,7 @@ In this section we examine some fundamental concepts and underlying philosophies
|
|||||||
|
|
||||||
\paragraph{The signal path.}
|
\paragraph{The signal path.}
|
||||||
|
|
||||||
|
% C Garret does not like the terms afferent and efferent here, try to think of something else
|
||||||
Most electronic systems are used to process a signal: with signal processing
|
Most electronic systems are used to process a signal: with signal processing
|
||||||
there is usually a clear afferent to transform to efferent path.
|
there is usually a clear afferent to transform to efferent path.
|
||||||
%
|
%
|
||||||
@ -558,9 +579,6 @@ An FMEA investigation will often take the component {\fm} and examine its effect
|
|||||||
in the direction of the signal,
|
in the direction of the signal,
|
||||||
echoing diagnostic/fault~finding methods~\cite{garrett, maikowski}. % loebowski}.
|
echoing diagnostic/fault~finding methods~\cite{garrett, maikowski}. % loebowski}.
|
||||||
%
|
%
|
||||||
The rationale and work-culture of those tasked to
|
|
||||||
perform FMEA are generally personnel who have performed fault finding.
|
|
||||||
%
|
|
||||||
When fault finding we generally follow the signal path, checking for correct behaviour
|
When fault finding we generally follow the signal path, checking for correct behaviour
|
||||||
along it: when we find something out of place we zoom in and measure
|
along it: when we find something out of place we zoom in and measure
|
||||||
the circuit behaviour until we find a faulty component or module.
|
the circuit behaviour until we find a faulty component or module.
|
||||||
@ -568,6 +586,10 @@ the circuit behaviour until we find a faulty component or module.
|
|||||||
With this style of fault finding, because it is based on experiment,
|
With this style of fault finding, because it is based on experiment,
|
||||||
we can hop from module to module eliminating working modules, until we find the
|
we can hop from module to module eliminating working modules, until we find the
|
||||||
failure.
|
failure.
|
||||||
|
%
|
||||||
|
Those tasked with performing FMEA are generally personnel who have
|
||||||
|
performed fault finding, and this shapes the rationale and work-culture of FMEA.
|
||||||
|
%
|
||||||
|
|
||||||
|
|
||||||
FMEA is a theoretical discipline.
|
FMEA is a theoretical discipline.
|
||||||
@ -575,15 +597,23 @@ FMEA is a theoretical discipline.
|
|||||||
It would be very unusual to build a circuit and then simulate
|
It would be very unusual to build a circuit and then simulate
|
||||||
component failure modes.
|
component failure modes.
|
||||||
%
|
%
|
||||||
This would be time consuming as it would involve building a circuit for each component {\fm} in the system.
|
This would be time consuming as it would involve building a circuit for each component {\fm} in
|
||||||
|
the system\footnote{Building circuit simulations and simulating component failure modes
|
||||||
|
would be a very time consuming process and might only be performed as a final-stage of accident investigation, where the cause is
|
||||||
|
required to be proven.}
|
||||||
%
|
%
|
||||||
We cannot, as with fault finding, verify modules along the signal path for correct behaviour
|
We cannot, as with fault finding, verify modules along the signal path for correct behaviour
|
||||||
and eliminate them from the investigation.
|
and eliminate them from the investigation.
|
||||||
%
|
%
|
||||||
With FMEA we therefore need to be more thorough.
|
FMEA is a `thought~experiment', not an actual experiment.
|
||||||
|
%
|
||||||
|
With FMEA we therefore need to be more thorough in the consideration of the effects a failure mode may have
|
||||||
|
on the other components in a system, than with fault finding.
|
||||||
%
|
%
|
||||||
The question is by how much.
|
The question is by how much.
|
||||||
|
%
|
||||||
Too much and the task becomes impossible due to time/labour constraints.
|
Too much and the task becomes impossible due to time/labour constraints.
|
||||||
|
%
|
||||||
Too little and the analysis could become meaningless because it misses
|
Too little and the analysis could become meaningless because it misses
|
||||||
potential system failures.
|
potential system failures.
|
||||||
%
|
%
|
||||||
@ -594,10 +624,21 @@ of the component exhibiting the {\fm} under investigation.
|
|||||||
Also, whether following the effects through the signal path {\em only} is acceptable, or whether instead
|
Also, whether following the effects through the signal path {\em only} is acceptable, or whether instead
|
||||||
looking at its effect on all other components in the system is necessary,
|
looking at its effect on all other components in the system is necessary,
|
||||||
is a matter for debate.
|
is a matter for debate.
|
||||||
|
%
|
||||||
In practice, it is a compromise between the amount of time/money that can be spent
|
In practice, it is a compromise between the amount of time/money that can be spent
|
||||||
on analysis relative to the criticality of the project.
|
on analysis relative to the criticality of the project.
|
||||||
Metrics for measuring the amount of work to undertake for FMEA are examined in section~\ref{sec:xfmea}.
|
Metrics for measuring the amount of work to undertake for FMEA are examined in section~\ref{sec:xfmea}.
|
||||||
|
|
||||||
|
\paragraph{Failure Modes and the signal path}
|
||||||
|
|
||||||
|
In general a component failure mode in an electronic circuit will
|
||||||
|
change the circuit topology. For a single failure
|
||||||
|
this effect may cause additional complications for the analyst.
|
||||||
|
For multiple failures this means
|
||||||
|
that the analyst
|
||||||
|
will have to deal with altered---or changed---circuit topologies
|
||||||
|
of the electronic circuit for each analysis.
|
||||||
|
|
||||||
|
|
||||||
\paragraph{Single component failure mode to system failure relation.}
|
\paragraph{Single component failure mode to system failure relation.}
|
||||||
|
|
||||||
@ -619,11 +660,12 @@ From a whole system perspective, we may find that {\bc} {\fms}
|
|||||||
may have more than one possible system event associated with them.
|
may have more than one possible system event associated with them.
|
||||||
Often there will be a clear one to one mapping, but
|
Often there will be a clear one to one mapping, but
|
||||||
probabilities to failure (as used in FMECA)
|
probabilities to failure (as used in FMECA)
|
||||||
could mean one to many.% mapping.
|
could mean one to many. % mapping.
|
||||||
%
|
%
|
||||||
|
\paragraph{Use of Markov chains to model failure modes.}
|
||||||
We could represent a failure mode and its possible outcomes using a Markov chain~\cite{probfmea_4338247}.
|
We could represent a failure mode and its possible outcomes using a Markov chain~\cite{probfmea_4338247}.
|
||||||
%
|
%
|
||||||
Where multiple simultaneous\footnote{Multiple simultaneous failures are taken to mean failures that occur within the same detection period.}
|
Where multiple simultaneous%\footnote{Multiple simultaneous failures are taken to mean failures that occur within the same detection period.}
|
||||||
failure modes are considered this complicates
|
failure modes are considered this complicates
|
||||||
the statistical nature of the Markov chain, cause effect model.
|
the statistical nature of the Markov chain, cause effect model.
|
||||||
%
|
%
|
||||||
@ -734,15 +776,22 @@ required to map a failure cause to its potential outcomes.
|
|||||||
In our basic FMEA example in section~\ref{basicfmea}
|
In our basic FMEA example in section~\ref{basicfmea}
|
||||||
we were asked to consider one failure mode against all the components in the milli-volt reader.
|
we were asked to consider one failure mode against all the components in the milli-volt reader.
|
||||||
%
|
%
|
||||||
To create a complete FMEA report on the milli-volt reader we would have had to examine every
|
To create an exhaustive FMEA report on the milli-volt reader, we would have had to examine every
|
||||||
known failure mode of every component within it---against all its other components.
|
known failure mode of every component within it---against all its other components.
|
||||||
%
|
%
|
||||||
The reasoning~distance is defined as the sum of the number of failure modes, against all other components
|
We define `reasoning~distance' as the number of components that must be checked
|
||||||
|
for a given failure mode to determine a system level symptom.
|
||||||
|
%
|
||||||
|
No current FMEA variant gives guidelines for the components that should
|
||||||
|
be included to analyse a {\fm} in a system.
|
||||||
|
%does not
|
||||||
|
The exhaustive~reasoning~distance would be
|
||||||
|
the sum of the number of failure modes, against all other components
|
||||||
in that system.
|
in that system.
|
||||||
%
|
%
|
||||||
If the milli-volt reader had say 100 components, with three failure modes each, this
|
If the milli-volt reader had say 100 components, with three failure modes each, this
|
||||||
would give a reasoning distance of 3 * 100 * 99.
|
would give an exhaustive reasoning distance of 3 * 100 * 99.
|
||||||
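As a worked check of the figure above, write $N$ for the number of components and $f$ for the number of
failure modes per component (notation assumed here for illustration). Each of the $f \cdot N$ failure modes
is checked against the other $N-1$ components, so for the hypothetical 100 component milli-volt reader
\[
  f \cdot N \cdot (N-1) = 3 \times 100 \times 99 = 29\,700 .
\]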
|
%
|
||||||
The discussion on reasoning distance provides us with a metric to examine
|
The discussion on reasoning distance provides us with a metric to examine
|
||||||
the state explosion problems associated with forward search failure investigation
|
the state explosion problems associated with forward search failure investigation
|
||||||
methodologies.
|
methodologies.
|
||||||
@ -799,9 +848,10 @@ double failure scenarios (for burner lock-out scenarios).}
|
|||||||
%(N^2 - N).f
|
%(N^2 - N).f
|
||||||
\end{equation}
|
\end{equation}
|
||||||
|
|
||||||
For our theoretical 100 components with 3 failure modes each example, this is
|
For our theoretical example of 100 components with 3 failure modes each, this is a reasoning distance of
|
||||||
$100*99*98*3=2,910,600$ failure mode scenarios.
|
$100*99*98*3=2,910,600$. % failure mode scenarios.
|
||||||
|
In practice there is an additional concern here, that of
|
||||||
|
the circuit topology changes that {\fms} can cause.
|
||||||
|
|
||||||
\paragraph{Reliance on experts for meaningful FMEA Analysis.}
|
\paragraph{Reliance on experts for meaningful FMEA Analysis.}
|
||||||
Current FMEA methodologies cannot consider---for the reason of state explosion---an exhaustive approach.
|
Current FMEA methodologies cannot consider---for the reason of state explosion---an exhaustive approach.
|
||||||
@ -818,7 +868,7 @@ on anything but a non-trivial system.
|
|||||||
|
|
||||||
\subsection{Component Tolerance}
|
\subsection{Component Tolerance}
|
||||||
|
|
||||||
Component tolerances may need considered when determining if a component has failed.
|
Component tolerances may need considering when determining if a component has failed.
|
||||||
Calculations for acceptable ranges to determine failure or acceptable conditions
|
Calculations for acceptable ranges to determine failure or acceptable conditions
|
||||||
must be made where appropriate.
|
must be made where appropriate.
|
||||||
%
|
%
|
||||||
@ -846,13 +896,14 @@ is given in section~\ref{sec:resistortolerance}.
|
|||||||
|
|
||||||
Production FMEA (or PFMEA), is FMEA used to prioritise, in terms of
|
Production FMEA (or PFMEA), is FMEA used to prioritise, in terms of
|
||||||
cost, problems to be addressed in product production.
|
cost, problems to be addressed in product production.
|
||||||
|
%
|
||||||
It focuses on known problems, determines the
|
It generally focuses on known problems: multiplying their
|
||||||
frequency they occur and their cost to fix.
|
statistical frequency %they occur
|
||||||
This is multiplied together and called an RPN
|
by their cost to fix gives a Risk Priority Number (RPN)
|
||||||
number.
|
for the component {\fm}.
|
||||||
|
%
|
||||||
Fixing problems with the highest RPN number
|
Fixing problems with the highest RPN number
|
||||||
will return most cost benefit.
|
will return most cost benefit~\cite{bfmea}.
|
||||||
|
|
||||||
% benign example of PFMEA in CARS - make something up.
|
% benign example of PFMEA in CARS - make something up.
|
||||||
\subsection{PFMEA Example}
|
\subsection{PFMEA Example}
|
||||||
@ -872,7 +923,7 @@ will return most cost benefit.
|
|||||||
|
|
||||||
\section{FMECA - Failure Modes Effects and Criticality Analysis}
|
\section{FMECA - Failure Modes Effects and Criticality Analysis}
|
||||||
|
|
||||||
\subsection{ FMECA - Failure Modes Effects and Criticality Analysis}
|
\paragraph{ FMECA - Failure Modes Effects and Criticality Analysis.}
|
||||||
% \begin{figure}
|
% \begin{figure}
|
||||||
% \centering
|
% \centering
|
||||||
% %\includegraphics[width=100pt]{./military-aircraft-desktop-computer-wallpaper-missile-launch.jpg}
|
% %\includegraphics[width=100pt]{./military-aircraft-desktop-computer-wallpaper-missile-launch.jpg}
|
||||||
@ -883,10 +934,16 @@ will return most cost benefit.
|
|||||||
% \end{figure}
|
% \end{figure}
|
||||||
FMECA places emphasis on determining criticality rather than the cost of system failures.
|
FMECA places emphasis on determining criticality rather than the cost of system failures.
|
||||||
%
|
%
|
||||||
Applies some Bayesian statistics (probabilities of component failures
|
It applies Bayesian statistics (probabilities of component failures
|
||||||
thereby causing given system level failures).
|
and the probability of those failures causing given system level failures)
|
||||||
|
to determine the risk of system level events/symptoms.
|
||||||
|
The results of the probabilities for the system level failures
|
||||||
|
are multiplied by the operational time of the system.
|
||||||
|
For instance a military or emergency system may be typically operational for
|
||||||
|
a given number of hours. This in conjunction with the severity
|
||||||
|
of the system level event gives us a level of criticality.
|
||||||
%
|
%
|
||||||
Also the probability of the system failure causing a critical event.
|
%Also the probability of the system failure causing a critical event.
|
||||||
%
|
%
|
||||||
Applying Bayesian statistics to failure analysis, suffers the
|
Applying Bayesian statistics to failure analysis, suffers the
|
||||||
problem that correlation does not imply causation~\cite{bayesfrequentist}.
|
problem that correlation does not imply causation~\cite{bayesfrequentist}.
|
||||||
@ -895,9 +952,7 @@ However, correlation is evidence for causation, and maybe the only evidence to h
|
|||||||
and this is the justification behind its use.
|
and this is the justification behind its use.
|
||||||
A history of the usage and development of FMECA may be found in~\cite{FMECAresearch}.
|
A history of the usage and development of FMECA may be found in~\cite{FMECAresearch}.
|
||||||
|
|
||||||
|
\paragraph{ FMECA - Failure Modes Effects and Criticality Analysis.}
|
||||||
|
|
||||||
\subsection{ FMECA - Failure Modes Effects and Criticality Analysis}
|
|
||||||
Very similar to PFMEA, but instead of cost, a criticality or
|
Very similar to PFMEA, but instead of cost, a criticality or
|
||||||
seriousness factor is ascribed to putative top level incidents.
|
seriousness factor is ascribed to putative top level incidents.
|
||||||
FMECA has three probability factors for component failures.
|
FMECA has three probability factors for component failures.
|
||||||
@ -917,7 +972,7 @@ a particular failure~mode occurring within a component. reference FMD-91.
|
|||||||
|
|
||||||
|
|
||||||
|
|
||||||
\subsection{ FMECA - Failure Modes Effects and Criticality Analysis}
|
\paragraph{ FMECA - Failure Modes Effects and Criticality Analysis.}
|
||||||
\textbf{FMECA $\beta$ value.}
|
\textbf{FMECA $\beta$ value.}
|
||||||
The second probability factor $\beta$, is the probability that the failure mode
|
The second probability factor $\beta$, is the probability that the failure mode
|
||||||
will cause a given system failure.
|
will cause a given system failure.
|
||||||
@ -938,17 +993,15 @@ A weighting factor to indicate the seriousness of the putative system level erro
|
|||||||
C_m = {\beta} . {\alpha} . {{\lambda}_p} . {t} . {s}
|
C_m = {\beta} . {\alpha} . {{\lambda}_p} . {t} . {s}
|
||||||
\end{equation}
|
\end{equation}
|
||||||
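As an illustrative sketch only, using values invented for this example rather than taken from any data source,
suppose $\beta = 0.5$, $\alpha = 0.2$, $\lambda_p = 10^{-6}$ failures per hour (reading $\lambda_p$ as the
component failure rate, which is an assumption here), an operational time of $t = 1000$ hours and a severity weighting $s = 10$. Then
\[
  C_m = 0.5 \times 0.2 \times 10^{-6} \times 1000 \times 10 = 10^{-3} .
\]
Because the measure is a simple product, doubling any one factor doubles $C_m$.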
|
|
||||||
Highest $C_m$ values would be at the top of a `to~do' list
|
The highest $C_m$ values would represent the most dangerous or serious
|
||||||
for a project manager.
|
system level failures.
|
||||||
|
The highest $C_m$ values would be at the top of a `to~fix' list
|
||||||
|
for a project manager, and some levels of risk may be considered unacceptable
|
||||||
|
and require re-design of some systems.
|
||||||
|
|
||||||
|
|
||||||
\section{FMEDA - Failure Modes Effects and Diagnostic Analysis}
|
\section{FMEDA - Failure Modes Effects and Diagnostic Analysis}
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
%\subsection{ FMEDA - Failure Modes Effects and Diagnostic Analysis}
|
%\subsection{ FMEDA - Failure Modes Effects and Diagnostic Analysis}
|
||||||
% \begin{figure}
|
% \begin{figure}
|
||||||
% \centering
|
% \centering
|
||||||
|
@ -1,8 +1,20 @@
|
|||||||
\label{sec:chap3}
|
\label{sec:chap3}
|
||||||
|
|
||||||
|
\section*{Introduction}
|
||||||
|
|
||||||
|
This chapter examines FMEA in a critical light.
|
||||||
|
The problems with the scope---or required reasoning distance---of detail to apply
|
||||||
|
for FMEA analysis are examined. The impossibility of integrating software
|
||||||
|
and hardware in FMEA failure models, and the impossibility of performing meaningful
|
||||||
|
multiple failure analysis are examined.
|
||||||
|
Additional problems such as the inability to easily re-use, and validate (through
|
||||||
|
traceable reasoning) FMEA models are presented.
|
||||||
|
Finally we conclude with a list of deficiencies in current FMEA methodologies, and present a wish list
|
||||||
|
for an improved methodology.
|
||||||
|
|
||||||
\section{Historical Origins of FMEA}
|
\section{Historical Origins of FMEA}
|
||||||
|
|
||||||
\subsection{FMEA designed for simple electro-mechanical systems}
|
\subsection{FMEA: {\bc} {\fm} to system level failure modelling}
|
||||||
FMEA traces its roots to the 1940s when it was used to identify the most costly
|
FMEA traces its roots to the 1940s when it was used to identify the most costly
|
||||||
failures arising from car mass-production~\cite{bfmea}.
|
failures arising from car mass-production~\cite{bfmea}.
|
||||||
It was later modified slightly to include severity of the top level failure (FMECA~\cite{fmeca}).
|
It was later modified slightly to include severity of the top level failure (FMECA~\cite{fmeca}).
|
||||||
@ -14,6 +26,13 @@ This means that we have one analysis case per component failure mode for all the
|
|||||||
This analysis philosophy has not changed since FMEA was first used.
|
This analysis philosophy has not changed since FMEA was first used.
|
||||||
|
|
||||||
|
|
||||||
|
\subsection{FMEA does not support Traceable Reasoning}
|
||||||
|
An FMEA report normally assigns one line of a spreadsheet to
|
||||||
|
each {\bc} {\fm}.
|
||||||
|
This means that the reasoning involved in determining the system level failure/symptom is described (if at all) very briefly.
|
||||||
|
Ideally supporting documentation would give the reasoning and calculations behind each analysis case,
|
||||||
|
but the structure of current FMEA reports does not encourage this.
|
||||||
|
|
||||||
\subsection{FMEA does not support modularity.}
|
\subsection{FMEA does not support modularity.}
|
||||||
It is common practice in the process control industry to buy in sub-systems,
|
It is common practice in the process control industry to buy in sub-systems,
|
||||||
typically sensors and actuators connected to an industrially hardened computer bus, e.g. CANbus~\cite{can,canspec}, modbus~\cite{modbus} etc.
|
typically sensors and actuators connected to an industrially hardened computer bus, e.g. CANbus~\cite{can,canspec}, modbus~\cite{modbus} etc.
|
||||||
@ -64,10 +83,19 @@ We could term such a group a `{\fg}'.
|
|||||||
|
|
||||||
Given the {\bc} {\fm} to system level failure mode paradigm it is
|
Given the {\bc} {\fm} to system level failure mode paradigm it is
|
||||||
difficult to re-use FMEA analysis.
|
difficult to re-use FMEA analysis.
|
||||||
|
%
|
||||||
Several strategies to aid re-use have been proposed~\cite{rudov2009language, reuse_of_fmea}, but
|
Several strategies to aid re-use have been proposed~\cite{rudov2009language, reuse_of_fmea}, but
|
||||||
the fundamental problem remains, that, with any changes
|
the fundamental problem remains, that, with any changes
|
||||||
to the component base in a system, it is very difficult to
|
to the component base in a system, it is very difficult to
|
||||||
determine which FMEA test scenarios must be re-worked.
|
determine which FMEA test scenarios must be re-worked.
|
||||||
|
%
|
||||||
|
It is common in safety critical systems to have repeated circuit topologies.
|
||||||
|
For instance we may have several signal input and output
|
||||||
|
structures that are repeated.
|
||||||
|
%
|
||||||
|
The failure mode behaviour of these repeated structures will be the same.
|
||||||
|
However, with the {\bc} {\fm} to system level failure mode mapping,
|
||||||
|
the analysis work is likely to be repeated for each instance.
|
||||||
|
|
||||||
|
|
||||||
\section{Software and FMEA}
|
\section{Software and FMEA}
|
||||||
@ -82,7 +110,7 @@ Similar difficulties in integrating mechanical and electronic/software
|
|||||||
failure models are discussed in~\cite{SMR:SMR580,swassessment}.
|
failure models are discussed in~\cite{SMR:SMR580,swassessment}.
|
||||||
|
|
||||||
|
|
||||||
\paragraph{Current work on Software FMEA}
|
\paragraph{Current work on Software FMEA.}
|
||||||
|
|
||||||
SFMEA usually does not seek to integrate
|
SFMEA usually does not seek to integrate
|
||||||
hardware and software models, but to perform
|
hardware and software models, but to perform
|
||||||
@ -204,104 +232,104 @@ utterly anachronistic in the distributed real time system environment.
|
|||||||
|
|
||||||
FMEA is no longer fit for purpose!
|
FMEA is no longer fit for purpose!
|
||||||
%
|
%
|
||||||
|
%
|
||||||
\section{Conclusions on current FMEA Methodologies}
|
% \section{Conclusions on current FMEA Methodologies}
|
||||||
|
%
|
||||||
%% FOCUS
|
% %% FOCUS
|
||||||
The focus of this chapter %literature review
|
% The focus of this chapter %literature review
|
||||||
is to establish the current practice and applications
|
% is to establish the current practice and applications
|
||||||
of FMEA.
|
% of FMEA.
|
||||||
%, and to examine its strengths and weaknesses.
|
% %, and to examine its strengths and weaknesses.
|
||||||
%% GOAL
|
% %% GOAL
|
||||||
Its
|
% Its
|
||||||
goal is to identify central issues and to criticise and assess the current
|
% goal is to identify central issues and to criticise and assess the current
|
||||||
FMEA methodologies.
|
% FMEA methodologies.
|
||||||
%% PERSPECTIVE
|
% %% PERSPECTIVE
|
||||||
The perspective of the author, is as a practitioner of static failure mode analysis techniques
|
% The perspective of the author, is as a practitioner of static failure mode analysis techniques
|
||||||
concerning approval of product
|
% concerning approval of product
|
||||||
to European safety standards, both the prescriptive~\cite{en298,en230} and statistical~\cite{en61508}.
|
% to European safety standards, both the prescriptive~\cite{en298,en230} and statistical~\cite{en61508}.
|
||||||
A second perspective is that of a software engineer trained to use formal methods.
|
% A second perspective is that of a software engineer trained to use formal methods.
|
||||||
Examining FMEA methodologies for mathematical properties, influenced by
|
% Examining FMEA methodologies for mathematical properties, influenced by
|
||||||
formal methods applied to software, should provide a perspective not traditionally considered.
|
% formal methods applied to software, should provide a perspective not traditionally considered.
|
||||||
%% COVERAGE
|
% %% COVERAGE
|
||||||
The literature reviewed, has been restricted to published books, European safety standards (as examples
|
% The literature reviewed, has been restricted to published books, European safety standards (as examples
|
||||||
of current safety measures applied), and traditional research, from journal and conference papers.
|
% of current safety measures applied), and traditional research, from journal and conference papers.
|
||||||
%% ORGANISATION
|
% %% ORGANISATION
|
||||||
The review is organised by concept, that is, FMEA can be applied to hardware, software, software~interfacing and
|
% The review is organised by concept, that is, FMEA can be applied to hardware, software, software~interfacing and
|
||||||
to multiple failure scenarios etc. Methodologies related to FMEA are briefly covered for the sake of context.
|
% to multiple failure scenarios etc. Methodologies related to FMEA are briefly covered for the sake of context.
|
||||||
%% AUDIENCE
|
% %% AUDIENCE
|
||||||
% Well duh! PhD supervisors and examiners....
|
% % Well duh! PhD supervisors and examiners....
|
||||||
|
%
|
||||||
% \subsection{Related Methodologies}
|
% % \subsection{Related Methodologies}
|
||||||
% FTA --- HAZOP --- ALARP --- Event Tree Analysis --- bow tie concept
|
% % FTA --- HAZOP --- ALARP --- Event Tree Analysis --- bow tie concept
|
||||||
% \subsection{Hardware FMEA (HFMEA)}
|
% % \subsection{Hardware FMEA (HFMEA)}
|
||||||
% \subsection{Multiple Failure scenarios and FMEA}
|
% % \subsection{Multiple Failure scenarios and FMEA}
|
||||||
% \subsection{Software FMEA (SFMEA)}
|
% % \subsection{Software FMEA (SFMEA)}
|
||||||
|
%
|
||||||
\paragraph{Current work on Software FMEA}
|
% \paragraph{Current work on Software FMEA}
|
||||||
|
%
|
||||||
SFMEA usually does not seek to integrate
|
% SFMEA usually does not seek to integrate
|
||||||
hardware and software models, but to perform
|
% hardware and software models, but to perform
|
||||||
FMEA on the software in isolation~\cite{procsfmea}.
|
% FMEA on the software in isolation~\cite{procsfmea}.
|
||||||
%
|
% %
|
||||||
Work has been performed using databases
|
% Work has been performed using databases
|
||||||
to track the relationships between variables
|
% to track the relationships between variables
|
||||||
and system failure modes~\cite{procsfmeadb}, to %work has been performed to
|
% and system failure modes~\cite{procsfmeadb}, to %work has been performed to
|
||||||
introduce automation into the FMEA process~\cite{appswfmea} and to provide code analysis
|
% introduce automation into the FMEA process~\cite{appswfmea} and to provide code analysis
|
||||||
automation~\cite{modelsfmea}. Although the SFMEA and hardware FMEAs are performed separately,
|
% automation~\cite{modelsfmea}. Although the SFMEA and hardware FMEAs are performed separately,
|
||||||
some schools of thought aim for Fault Tree Analysis (FTA)~\cite{nasafta,nucfta} (top down - deductive)
|
% some schools of thought aim for Fault Tree Analysis (FTA)~\cite{nasafta,nucfta} (top down - deductive)
|
||||||
and FMEA (bottom-up inductive)
|
% and FMEA (bottom-up inductive)
|
||||||
to be performed on the same system to provide insight into the
|
% to be performed on the same system to provide insight into the
|
||||||
software hardware/interface~\cite{embedsfmea}.
|
% software hardware/interface~\cite{embedsfmea}.
|
||||||
%
|
% %
|
||||||
Although this
|
% Although this
|
||||||
would give a better picture of the failure mode behaviour, it
|
% would give a better picture of the failure mode behaviour, it
|
||||||
is by no means a rigorous approach to tracing errors that may occur in hardware
|
% is by no means a rigorous approach to tracing errors that may occur in hardware
|
||||||
through to the top (and therefore ultimately controlling) layer of software~\cite{swassessment}.
|
% through to the top (and therefore ultimately controlling) layer of software~\cite{swassessment}.
|
||||||
|
%
|
||||||
\paragraph{Current FMEA techniques are not suitable for software}
|
% \paragraph{Current FMEA techniques are not suitable for software}
|
||||||
|
%
|
||||||
The main FMEA methodologies are all based on the concept of taking
|
% The main FMEA methodologies are all based on the concept of taking
|
||||||
base component {\fms}, and translating them into system level events/failures~\cite{sfmea,sfmeaa}.
|
% base component {\fms}, and translating them into system level events/failures~\cite{sfmea,sfmeaa}.
|
||||||
%
|
% %
|
||||||
In a complicated system, mapping a component failure mode to a system level failure
|
% In a complicated system, mapping a component failure mode to a system level failure
|
||||||
will mean a long reasoning distance; that is to say the actions of the
|
% will mean a long reasoning distance; that is to say the actions of the
|
||||||
failed component will have to be traced through
|
% failed component will have to be traced through
|
||||||
several sub-systems, gauging its effects with and on other components.
|
% several sub-systems, gauging its effects with and on other components.
|
||||||
%
|
% %
|
||||||
With software at the higher levels of these sub-systems,
|
% With software at the higher levels of these sub-systems,
|
||||||
we have yet another layer of complication.
|
% we have yet another layer of complication.
|
||||||
%
|
% %
|
||||||
%In order to integrate software, %in a meaningful way
|
% %In order to integrate software, %in a meaningful way
|
||||||
%we need to re-think the
|
% %we need to re-think the
|
||||||
%FMEA concept of simply mapping a base component failure to a system level event.
|
% %FMEA concept of simply mapping a base component failure to a system level event.
|
||||||
%
|
% %
|
||||||
SFMEA regards, in place of hardware components, the variables used by the programs to be their equivalent~\cite{procsfmea}.
|
% SFMEA regards, in place of hardware components, the variables used by the programs to be their equivalent~\cite{procsfmea}.
|
||||||
The failure modes of these variables, are that they could become erroneously over-written,
|
% The failure modes of these variables, are that they could become erroneously over-written,
|
||||||
calculated incorrectly (due to a mistake by the programmer, or a fault in the micro-processor on which it is running), or
|
% calculated incorrectly (due to a mistake by the programmer, or a fault in the micro-processor on which it is running), or
|
||||||
external influences such as
|
% external influences such as
|
||||||
ionising radiation causing bits to be erroneously altered.
|
% ionising radiation causing bits to be erroneously altered.
|
||||||
|
%
|
||||||
|
%
|
||||||
\paragraph{FMEA and Modularity}
|
% \paragraph{FMEA and Modularity}
|
||||||
From the 1940s onwards, software has evolved from simple procedural languages (e.g. assembly language/Fortran~\cite{f77} call return)
|
% From the 1940s onwards, software has evolved from simple procedural languages (e.g. assembly language/Fortran~\cite{f77} call return)
|
||||||
to structured programming (C~\cite{DBLP:books/ph/KernighanR88}, Pascal etc.) and then to object oriented models (Java, C++ etc.).
|
% to structured programming (C~\cite{DBLP:books/ph/KernighanR88}, Pascal etc.) and then to object oriented models (Java, C++ etc.).
|
||||||
FMEA has undergone no such evolution.
|
% FMEA has undergone no such evolution.
|
||||||
%
|
% %
|
||||||
In a world where sensor systems, often including embedded software components, are brought in to
|
% In a world where sensor systems, often including embedded software components, are brought in to
|
||||||
create complex systems, FMEA still follows a rigid {\bc} {\fm} to system level error model,
|
% create complex systems, FMEA still follows a rigid {\bc} {\fm} to system level error model,
|
||||||
that is only suitable for simple electro-mechanical systems.
|
% that is only suitable for simple electro-mechanical systems.
|
||||||
|
%
|
||||||
|
%
|
||||||
|
%
|
||||||
%
|
% %
|
||||||
|
%
|
||||||
%
|
% %
|
||||||
% MAYBE MOVE THIS TO CH3, FMEA CRITICISM
|
% % MAYBE MOVE THIS TO CH3, FMEA CRITICISM
|
||||||
% 30JAN2013
|
% 30JAN2013
|
||||||
%
|
%
|
||||||
|
|
||||||
\subsection{Where FMEA is now.}
|
\subsection{FMEA Criticism: Conclusions.}
|
||||||
FMEA is a useful tool for basic safety. It provides statistics on safety where field data is impractical to obtain, and works
|
FMEA is a useful tool for basic safety. It provides statistics on safety where field data is impractical to obtain, and works
|
||||||
very well with single failure modes linked to top level events.
|
very well with single failure modes linked to top level events.
|
||||||
FMEA has become part of the safety critical and safety certification industries.
|
FMEA has become part of the safety critical and safety certification industries.
|
||||||
@ -319,7 +347,7 @@ All these FMEA based methodologies have the following short comings:
|
|||||||
\begin{itemize}
|
\begin{itemize}
|
||||||
\item Impossible to integrate Software and hardware models,
|
\item Impossible to integrate Software and hardware models,
|
||||||
\item State explosion problem exacerbated by increasing complexity due to density of modern electronics,
|
\item State explosion problem exacerbated by increasing complexity due to density of modern electronics,
|
||||||
\item Impossibility to consider all multiple component failure modes~\cite{FMEAmultiple653556}
|
\item Impossible to consider all multiple component failure modes~\cite{FMEAmultiple653556}
|
||||||
\end{itemize}
|
\end{itemize}
|
||||||
|
|
||||||
|
|
||||||
@ -333,7 +361,7 @@ We now form a wish list, stating the features that we would want
|
|||||||
in an improved FMEA methodology,
|
in an improved FMEA methodology,
|
||||||
\begin{itemize}
|
\begin{itemize}
|
||||||
\item No state explosion making analysis impractical,
|
\item No state explosion making analysis impractical,
|
||||||
\item Rigorous (total failure coverage within {\fgs}: all interacting components and failure modes checked),
|
\item Exhaustive checking (total failure coverage within {\fgs}: all interacting components and failure modes checked),
|
||||||
\item Reasoning Traceable in system models,
|
\item Reasoning Traceable in system models,
|
||||||
\item Re-usable, i.e. it should be possible to re-use analysis performed previously,
|
\item Re-usable, i.e. it should be possible to re-use analysis performed previously,
|
||||||
\item It must be possible to analyse simultaneous/multiple failures,
|
\item It must be possible to analyse simultaneous/multiple failures,
|
||||||
|