

%%% CHAPTER 2
\label{sec:chap2}

The generic and statistical European Safety Standard, EN61508:6\cite{en61508}[B.6.6]
describes Failure Mode Effect Analysis (FMEA) as:
\begin{quotation}
``To analyse a system design, by examining all possible sources of failure
of a system's components and determining the effects of these failures
on the behaviour and safety of the system.''
\end{quotation}
\fmeagloss
\section*{Introduction}
This chapter introduces Failure Mode Effect Analysis (FMEA).
%It begins with a simple example to demonstrate the basic concept of FMEA
%and then
It starts with a generic conceptual overview of the process.
It then looks at the stages of the FMEA process in greater detail, starting with
how to determine the failure modes associated with components.
%
Two common electrical components, the resistor and the operational amplifier,
are examined in the context of two sources of information that define failure modes.
%
To introduce the concept of FMEA, a simple worked example is given, using a hypothetical
milli-volt reader.
%
The four main FMEA variants in current use are described, along with
the concepts
that underlie the usage and philosophy of FMEA.
%
The overall FMEA process is then reviewed and modelled using UML,
which defines the entities needed to implement FMEA.
%
The act of defining relationships between the data objects in FMEA raises questions about the nature of the process
and allows analysis of its strengths and weaknesses.


\section{FMEA basic concept}
\label{basicfmea}
%\subsection{FMEA}
%\tableofcontents[currentsection]
%\paragraph{FMEA basic concept.}

FMEA~\cite{safeware}[pp.341-344] is widely used, and proof of its use is a %mandatory
legal requirement
for a large proportion of safety critical products sold in the European Union.
The acronym FMEA can be expanded as follows:
\begin{itemize}
   \item \textbf{F - Failures of a given component:} consider a particular component in a system;
    \item \textbf{M - Failure Mode:} choose a particular failure mode of this component;
    \item \textbf{E - Effects:} determine the effects this failure mode will cause;
   \item \textbf{A - Analysis:} analyse how much impact these effects will have on the environment, the operators, or the system itself.
\end{itemize}
\fmeagloss
%
FMEA is a broad term; it can mean anything from an informal check, in a brainstorming session,
of how failures could affect some equipment,
to a formal submission as part of safety critical certification.
FMEA is a manual, time-intensive process. To reduce the amount of manual work involved,
software packages~\cite{931423, 1778436820050601} and analysis strategies~\cite{incrementalfmea, automatingFMEA1281774}
have been developed.
%
FMEA is always performed in context: the equipment is always analysed for a particular purpose
and in a given environment. An `O' ring, for instance, can fail by leaking;
fitted as a water seal on a garden hose, the system level failure
would be a slight leak at the tap.
%
Applied to a solid rocket booster of a space shuttle, an `O' ring failure
could cause a catastrophic fire and the loss of the spacecraft and its crew~\cite{challenger}.
%
At a lower level, consider a resistor and capacitor forming a potential divider to ground.
This could be considered a low pass filter in some electrical environments~\cite{aoe},
but for fixed frequencies the same circuit could be used as a phase changer~\cite{electronicssysapproach}[p.114].
Used as a low pass filter, its failure modes could be `no~signal' and `all~pass';
used as a phase changer, they would be `no~signal' and `no~phase~change'.
%
The actual failure modes for a `group~of~components' are therefore defined by the
function that they perform.
%
% This chapter describes basic concepts of FMEA, uses a simple example to
% demonstrate a single  FMEA analysis stage, describes the four main variants of FMEA in use today
% and explores some concepts with which we can discuss and evaluate
% the effectiveness of FMEA.
\fmeagloss
\section{FMEA Process}

The FMEA process starts with the basic, or starting, components.
%
These are typically components that are bought in, or pre-assembled modules.
They are termed `{\bcs}' and are considered ``atomic'', i.e. they are not broken down further.
%
The first requirement for a {\bc} is to define the ways in which it can fail;
this relationship
is shown in figure~\ref{fig:component_fm_rel}.
\fmmdglossBC
%DIAGRAM of Base components and failure modes

\begin{figure}[h]
 \centering
 \includegraphics[width=300pt]{./CH2_FMEA/component_fm_rel.png}
 % component_fm_rel.png: 368x71 pixel, 72dpi, 12.98x2.50 cm, bb=0 0 368 71
 \caption{Base Component to Failure Modes relationship}
 \label{fig:component_fm_rel}
\end{figure}

The next stage is analysis, that is, the reasoning applied to the system in the event of
a given failure mode.
%
The aim is to determine how a failure mode, once its effects on the other components
in the system have been considered, translates into a system level failure or symptom.
%
The result of FMEA is therefore a system level failure,
or symptom, for each component failure mode.
%
In practice, each entry of an FMEA analysis of a {\bc} {\fm}
would typically be one line in a spreadsheet.
%
The analysis to symptom relationship is generally
one-to-one; however, here (see figure~\ref{fig:component_fm_rel_ana}) allowance is made for the possibility
of more than one failure symptom.
%DIAGRAM of reasoning and Symptoms.

\begin{figure}[h]
 \centering
 \includegraphics[width=400pt]{./CH2_FMEA/component_fm_rel_ana.png}
 % component_fm_rel_ana.png: 369x184 pixel, 72dpi, 13.02x6.49 cm, bb=0 0 369 184
 \caption{FMEA analysis entry data relationships}
 \label{fig:component_fm_rel_ana}
\end{figure}

Figure~\ref{fig:component_fm_rel_ana} defines the data relationships
for FMEA. This model is later extended in the conclusion
of this chapter.
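As an illustration of the relationships in figure~\ref{fig:component_fm_rel_ana}, the following minimal Python sketch models a {\bc}, its set of failure modes, and an FMEA entry that maps one failure mode, via an analysis, to one or more symptoms.
The class and method names (\texttt{BaseComponent}, \texttt{FmeaEntry}, \texttt{as\_row}) are hypothetical and are not taken from any published FMEA tool; flattening an entry to one spreadsheet line mirrors the practice described above.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FailureMode:
    """A symptomatic failure mode such as OPEN or SHORT (hashable)."""
    name: str


@dataclass
class BaseComponent:
    """An 'atomic' component and the set of ways in which it can fail."""
    name: str
    failure_modes: set = field(default_factory=set)


@dataclass
class FmeaEntry:
    """One FMEA entry: a component failure mode, the analysis applied,
    and one or more resulting system level symptoms."""
    component: BaseComponent
    failure_mode: FailureMode
    analysis: str
    symptoms: list

    def __post_init__(self):
        # An entry may only analyse a failure mode the component actually has.
        if self.failure_mode not in self.component.failure_modes:
            raise ValueError(
                f"{self.failure_mode.name} is not a failure mode of {self.component.name}")
        # Usually one symptom, but the model allows more than one.
        if not self.symptoms:
            raise ValueError("an FMEA entry needs at least one symptom")

    def as_row(self):
        """Flatten the entry to one spreadsheet line."""
        return [self.component.name, self.failure_mode.name,
                self.analysis, "; ".join(self.symptoms)]


OPEN, SHORT = FailureMode("OPEN"), FailureMode("SHORT")
r1 = BaseComponent("R1", {OPEN, SHORT})
entry = FmeaEntry(r1, SHORT, "minus input driven LOW", ["HIGH reading"])
print(entry.as_row())
```

Rejecting an entry whose failure mode is not in the component's set enforces the component to failure mode relationship of figure~\ref{fig:component_fm_rel} at the point where entries are created.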


\section{Determining the failure modes of {\bcs}}
\fmodegloss
\fmmdglossBC
\label{sec:determine_fms}
In order to apply any form of FMEA  the ways in which
the {\bcs}\footnote{A good introduction to hardware and software failure modes may be found in~\cite{sccs}[pp.114-124].} %used
can fail must be clearly defined.
%
In practice, this part of the process is guided by
the particular standard being conformed to.
%
Standards may differ in their definitions for the {\fms} of {\bcs}.
The reasons for these differences are examined below using two example components.
%
%
%%%%%%%%%% DATA SHEETS and FAILURE MODES %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
Typically, when choosing components for a design, engineers will look at manufacturers' data~sheets,
which describe functionality, physical dimensions,
environmental ranges, tolerances and so on.
%
It is rare for a data~sheet to list failure modes.
%
Data~sheets, after all, are a sales tool as well as a usage guide and technical description.
%
However, `reading~between~the~lines', or noting what is not~stated,
can in some cases indicate how a component could fail or misbehave.
%
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%under given conditions.
%
How %base
components could fail internally is not of interest to an FMEA investigation.
The FMEA investigator needs to know what failure behaviour a component could exhibit. %, or in other words, its modes of failure.
%
A large body of literature exists giving guidance for the determination of  component {\fms}.
%
For this study FMD-91~\cite{fmd91} and the gas burner standard EN298~\cite{en298} are examined.
%Some standards prescribe specific failure modes for generic component types.
In EN298, failure modes are listed for most generic component types; for types not listed,
they are determined using a procedure,
typically one that examines failure scenarios such as
`all~pins~open' and then `all~adjacent~pins~shorted'~\cite{en298}[A.1 note e].

%a procedure where failure scenarios of all pins OPEN and all adjacent pins shorted
%are examined.
%
%
FMD-91 is a reference document released into the public domain by the United States Department of Defense (DoD);
it describes `failures' of common electronic components, with percentage statistics for each failure.
%
FMD-91 entries include general descriptions of  internal failures alongside {\fms} of use to an FMEA investigation.
%
FMD-91 entries need, in some cases, some interpretation to be mapped to a clear set of
component {\fms} suitable for use in FMEA.
%
A third document, MIL-1991~\cite{mil1991}, provides overall reliability statistics for
component types, but does not detail specific failure modes.
%
By using MIL-1991 in conjunction with FMD-91, statistics can be determined for the failure modes
of component types.
%
As these documents are now a little old, the results
from them can be on the conservative side.
\frategloss
\fmmdglossFIT
%
For instance, a FIT\footnote{Failure rates measured per $10^9$ hours of operation
are known as Failure in Time (FIT) values.} value of around 100 might be determined
for a micro-processor using these documents, whereas
FIT claims for modern integrated micro-controllers are typically less than five~\cite{microchipreliability}.
%
The FMEA variant\footnote{EN61508 (and related standards) are based on the FMEA variant Failure Mode Effects and Diagnostic Analysis (FMEDA).}
used for the European standard EN61508~\cite{en61508}
requires statistics for Mean Time to Failure (MTTF) for all {\bc} failure modes.
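The arithmetic linking FIT values and MTTF can be sketched as follows.
This assumes a constant failure rate $\lambda$ (an exponential failure model), so that $MTTF = 1/\lambda$; with $\lambda$ expressed in FIT, $MTTF = 10^9/FIT$ hours.
The function names, and the proportional splitting of an overall component FIT value across failure modes by percentage shares (in the manner of combining MIL-1991 totals with FMD-91 percentages), are illustrative assumptions rather than a procedure prescribed by either document.

```python
FIT_HOURS = 1e9  # a FIT value counts failures per 10**9 hours of operation


def mttf_hours(fit):
    """MTTF in hours, assuming a constant failure rate (MTTF = 1/lambda)."""
    if fit <= 0:
        raise ValueError("FIT value must be positive")
    return FIT_HOURS / fit


def apportion_fit(total_fit, shares):
    """Split an overall component FIT value across failure modes in
    proportion to percentage shares (normalised, since published
    percentages need not sum exactly to 100)."""
    total = sum(shares.values())
    return {mode: total_fit * pct / total for mode, pct in shares.items()}


print(mttf_hours(100))  # 10000000.0
print(mttf_hours(5))    # 200000000.0
# Hypothetical overall FIT value and mode shares, for illustration only.
print(apportion_fit(10.0, {"MODE_A": 60.0, "MODE_B": 40.0}))
```

Under these assumptions a FIT value of 100 corresponds to an MTTF of $10^7$ hours, and a FIT value of five to $2 \times 10^8$ hours.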


% One is from the US military document FMD-91, where internal failures
% of components are described (with stats).
%
% The other is EN298 where the failure modes for generic component types are prescribed, or
% determined by a procedure where failure scenarios of all pins OPEN and all adjacent pins shorted
% is applied. These techniques
%
% The FMD-91 entries need, in some cases, some interpretation to be mapped to
% component failure symptoms, but include failure modes that can be due to internal failures.
% The EN298 SHORT/OPEN procedure cannot determine failures due to internal causes but can be applied to any IC.
%
% Could I come in and see you Chris to quickly discuss these.
%
% I hope to have chapter 5 finished by the end of March, chapter 5 being the
% electronics examples for the FMMD methodology.

\section{Determining the failure modes of components}
\fmodegloss
The starting points in the  FMEA process  are the failure modes of the {\bcs}.
%s
%Typically found in a production parts list, which are termed the {\bcs}.
%
As a foundation for FMEA, how these failure modes are defined, and how they
relate to particular standards, is discussed below.
%
%In this section we pick %look in detail at
Two common electrical components are used as examples,
and examined against two sources of {\fm} information. % define their failure mode behaviour.
%
Failure mode definitions for a given generic component may not always agree.
%
The reasons why some {\fms}
can be found in one source but not in the other are discussed.
%
Finally, the failure modes determined
from the FMD-91~\cite{fmd91} reference source and from the guidelines of the
European burner standard EN298~\cite{en298} are compared and contrasted.

\clearpage

\subsection{Failure mode determination for a generic resistor}
\label{sec:resistorfm}
%- Failure modes. Prescribed failure modes EN298 - FMD91
\paragraph{Resistor failure modes according to FMD-91.}
\fmodegloss

%The resistor is a ubiquitous component in electronics, and is therefore a good candidate for detailed examination of its failure modes.
%
FMD-91~\cite{fmd91}[3-178] lists many types of resistor,
together with their possible failure causes;
for instance, for \textbf{Resistor,~Fixed,~Film} the following failure causes are given:
\begin{itemize}
 \item Opened 52\%,
 \item Drift 31.8\%,
 \item Film Imperfections 5.1\%,
 \item Substrate defects 5.1\%,
 \item Shorted 3.9\%,
 \item Lead damage 1.9\%.
\end{itemize}
% This information may be of interest to the manufacturer of resistors, but it does not directly
% help a circuit designer.
% The circuit designer is not interested in the causes of resistor failure, but to build in contingency
% against {\fms} that the resistor could exhibit.
% We can determine these {\fms} by converting the internal failure descriptions
% to {\fms} thus:
To make this useful for FMEA, each failure cause must be mapped to a
symptomatic failure mode descriptor\footnote{The symptomatic descriptors chosen are based on experience and are not unique.},
as listed below:
%
%and map these failure causes to three symptoms,
%drift (resistance value changing), open and short.

\begin{itemize}
 \item Opened 52\% $\mapsto$ OPEN,
 \item Drift 31.8\% $\mapsto$ DRIFT,
 \item Film Imperfections 5.1\% $\mapsto$ OPEN,
 \item Substrate defects 5.1\% $\mapsto$ OPEN,
 \item Shorted 3.9\% $\mapsto$ SHORT,
 \item Lead damage 1.9\% $\mapsto$ OPEN.
\end{itemize}
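Because several failure causes map to the same symptomatic descriptor, the FMD-91 percentages can be collected per {\fm}.
The minimal Python sketch below sums them, assuming the causes are mutually exclusive so that their shares simply add; the helper name \texttt{symptom\_shares} is hypothetical.

```python
from collections import defaultdict

# FMD-91 failure causes for Resistor, Fixed, Film (percentages from the text)
# and the symptomatic failure mode each cause is mapped to.
causes = [
    ("Opened", 52.0, "OPEN"),
    ("Drift", 31.8, "DRIFT"),
    ("Film Imperfections", 5.1, "OPEN"),
    ("Substrate defects", 5.1, "OPEN"),
    ("Shorted", 3.9, "SHORT"),
    ("Lead damage", 1.9, "OPEN"),
]


def symptom_shares(rows):
    """Sum the cause percentages for each symptomatic failure mode."""
    totals = defaultdict(float)
    for _cause, pct, symptom in rows:
        totals[symptom] += pct
    # Round to one decimal place, the precision of the source data.
    return {symptom: round(value, 1) for symptom, value in totals.items()}


print(symptom_shares(causes))  # {'OPEN': 64.1, 'DRIFT': 31.8, 'SHORT': 3.9}
```

The collected shares total 99.8\% rather than 100\% because of rounding in the source figures.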
%

%
Note that the main cause of resistor value drift is overloading.
This is borne out by the FMD-91~\cite{fmd91}[232] entry for a resistor network, where the failure
modes do not include drift.
%
If it is ensured that resistors will not be exposed to overload conditions, the
probability of drift (sometimes called parameter change)
is significantly reduced, enough for some standards to exclude it~\cite{en298,en230}.


\paragraph{Resistor failure modes according to EN298.}

EN298, the European gas burner safety standard,
tends to  give  failure modes that are more directly
usable for performing FMEA than FMD-91.
%
The certification process for EN298 requires that a full FMEA be undertaken, examining all failure modes
of all electronic components~\cite{en298}[11.2 5]. % as part of the certification process.
%
Annex A of EN298 prescribes failure modes for common components
and gives guidance on determining sets of failure modes for complex components (i.e. integrated circuits).
EN298~\cite{en298}[Annex A] (for most types of resistor)
only requires that the failure mode OPEN be considered for FMEA analysis.
%
For resistor types not specifically listed in EN298, the failure modes
are considered to be either OPEN or SHORT.
%
Parameter change is not considered for resistors chosen for an EN298 compliant system because they must be {\em downrated}
during the design process.
%
That is to say, the power and voltage ratings of components must be calculated
for maximum possible exposure, with a 40\% safety margin.
%
This drastically reduces the probability
that the resistors will be overloaded,
and thus subject to drift or parameter change.
%
Clearly the assumed failure modes of base components represent a fundamental
limit of resolution in any failure analysis methodology.
% XXXXXX get ref from colin T

%If a resistor was rated for instance for

%These are useful for resistor manufacturersthey have three failure modes
%EN298
%Parameter change not considered for EN298 because the resistors are down-rated from
%maximum possible voltage exposure -- find refs.


% FMD-91 gives the following percentages for failure rates in
% \label{downrate}
% The parameter change, is usually a failure mode associated with over stressing the component.
%In a system designed to typical safety critical constraints (as in EN298)
%these environmentally induced failure modes need not be considered.

\subsubsection{Resistor Failure Modes}
\label{sec:res_fms}
The difference in resistor failure modes between FMD-91 and EN298 is that FMD-91
includes the failure mode DRIFT.
%
EN298 does not include this, mainly because it imposes circuit design constraints
that effectively sidestep the problem.
%
For this study, DRIFT is excluded (following EN298, on the grounds of downrating), while the conservative
view is taken that a resistor can fail SHORT as well as OPEN (as listed by FMD-91, and as assumed by EN298 for
resistor types it does not specifically list).
The failure modes for a generic resistor are thus taken to be OPEN and SHORT. The function $fm$ is used
to return the set of failure modes of a component,
i.e.
\label{ros}
$$ fm(R) = \{ OPEN, SHORT \} . $$
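The choice of $fm(R)$ can be expressed as a set operation: the symptomatic modes derived from FMD-91, less DRIFT, which downrating is assumed to exclude.
In the Python sketch below, $fm$ is modelled as a lookup from a generic component type to a frozen set of failure modes; the table and its key \texttt{R} are hypothetical.

```python
# Candidate failure mode sets for a generic resistor, as discussed above.
FMD91_RESISTOR = frozenset({"OPEN", "SHORT", "DRIFT"})
EN298_UNLISTED_RESISTOR = frozenset({"OPEN", "SHORT"})
# DRIFT is excluded on the assumption that resistors are downrated.
EXCLUDED_BY_DOWNRATING = frozenset({"DRIFT"})

# Hypothetical lookup table standing in for the function fm.
_FM_TABLE = {"R": FMD91_RESISTOR - EXCLUDED_BY_DOWNRATING}


def fm(component_type):
    """Return the set of failure modes assumed for a generic component type."""
    try:
        return _FM_TABLE[component_type]
    except KeyError:
        raise KeyError(f"no failure modes defined for {component_type!r}") from None


print(sorted(fm("R")))  # ['OPEN', 'SHORT']
```

The result coincides with the OPEN or SHORT assumption that EN298 makes for resistor types it does not specifically list.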
%
%
% Mention tolerance here
%
% hmmmmmm
%
%
\subsection{Failure mode determination for a generic operational amplifier}
%
The operational amplifier (op-amp) %is a differential amplifier and
is very widely used in nearly all fields of modern analogue electronics.
\fmmdglossOPAMP
%
Of the two sources of {\bc} {\fm} information being compared, only FMD-91
has an entry specific to operational amplifiers.
%
EN298 does not specifically define the
{\fms} of op-amps but
instead has a procedure for determining the {\fms} of
component types not specifically listed in it.
%
Operational amplifiers are typically packaged in dual or quad configurations,
meaning that a chip will contain two or four amplifiers.
%
The failure modes determined from the FMD-91 entries are presented and then
the failure mode determination procedure of EN298
is applied to a typical op-amp designed for instrumentation and measurement, the dual packaged version of the LM358~\cite{lm358}
(see figure~\ref{fig:lm258}).
%
The results from both sources of {\fm} definition are then compared.
\fmmdglossOPAMP

\paragraph{Failure Modes of an Op-Amp according to FMD-91.}
\fmodegloss
%Literature suggests, latch up, latch down and oscillation.
For op-amp failure modes, FMD-91~\cite{fmd91}[3-116] states:
\begin{itemize}
 \item Degraded output 50\%: low slew rate, poor die attach;
 \item No operation 31.3\%: overstress;
 \item Shorted inputs ($V_+$ to $V_-$) 12.5\%: overstress, resistive short in amplifier;
 \item Opened input ($V_+$) 6.3\%: open.
\end{itemize}

These are mostly internal causes of failure, of more interest to the component manufacturer
than to a test engineer
looking for the symptoms of failure.
%
These failure causes within the Op-Amp need to be translated to symptomatic {\fms}.
%
Each failure cause  is examined in turn, and mapped to potential {\fms} suitable for use in FMEA
investigations.

\paragraph{Op-Amp failure cause: Poor Die attach.}
\fmmdglossOPAMP
The symptom for this is given as a low slew rate.
%
The slew rate of a circuit or component is the maximum rate at which it can change its output voltage (i.e. the maximum of $\left| \frac{dV}{dt} \right|$).
%
A low slew rate will mean that the op-amp will not react quickly to changes on its input terminals.
%
%
This is a failure symptom that may not be of concern in a slow responding system like an
instrumentation amplifier. However, where higher frequencies are being processed,
a signal may  be lost entirely.
This failure cause can be mapped to a symptomatic {\fm} called $LOW\_SLEW$.
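The effect of a low slew rate can be quantified for a sinusoidal signal $V(t) = V_p \sin(2 \pi f t)$, whose maximum rate of change is $2 \pi f V_p$.
An amplifier with slew rate $SR$ can therefore follow such a signal only up to $f_{max} = SR / (2 \pi V_p)$.
The Python sketch below evaluates this; the function name and the slew rate figures used are hypothetical and are not taken from any data sheet.

```python
import math


def max_followable_frequency(slew_rate_v_per_us, peak_volts):
    """Highest sine frequency (Hz) an amplifier of the given slew rate
    (V/us) can follow at peak amplitude peak_volts, from SR = 2*pi*f*Vp."""
    slew_v_per_s = slew_rate_v_per_us * 1e6
    return slew_v_per_s / (2 * math.pi * peak_volts)


# Hypothetical figures: a healthy amplifier at 0.5 V/us, and a LOW_SLEW
# failure degrading it to 0.05 V/us, both with a 5 V peak signal.
for sr in (0.5, 0.05):
    print(f"{sr} V/us -> {max_followable_frequency(sr, 5.0):.0f} Hz")
```

Reducing the slew rate by a factor of ten reduces $f_{max}$ by the same factor; this may be irrelevant for a slowly changing instrumentation signal, but can distort or lose a higher frequency one.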

\paragraph{No Operation - over stress.}
Here the op-amp has been damaged, and the output may be held HIGH or LOW, or may be
effectively tri-stated, i.e. unable to drive circuitry in the next stages of
the signal path: this {\fm} is termed NOOP (no operation).
%
This failure cause thus maps to three {\fms}: $LOW$, $HIGH$ and $NOOP$.

\paragraph{Shorted inputs: $V_+$  to $V_-$.}
Owing to the high intrinsic gain of an op-amp, and the effect of offset currents,
shorted inputs will force the output HIGH or LOW.
This failure cause maps to $HIGH$ or $LOW$.

\paragraph{Open input: $V_+$.}
This failure cause means that the minus input will have the very high gain
of the op-amp applied to it, and the output will be forced HIGH or LOW.
This failure cause maps to $HIGH$ or $LOW$\footnote{No failure mode for open input ${V}_{-}$ was listed in this FMD-91 entry~\cite{fmd91}[3-116].}.

\paragraph{Collecting Op-Amp failure modes from FMD-91.}
Under FMD-91 definitions, an op-amp therefore has the following {\fms}:
\begin{equation}
 \label{eqn:opampfms}
 fm(OpAmp) = \{ HIGH, LOW, NOOP, LOW\_SLEW \} .
\end{equation}
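Collecting the {\fms} in equation~\ref{eqn:opampfms} amounts to taking the union of the symptomatic modes reached from each FMD-91 failure cause.
A minimal Python sketch follows, with a hypothetical dictionary \texttt{OPAMP\_CAUSE\_TO\_MODES} encoding the mappings argued in the paragraphs above.

```python
# FMD-91 op-amp failure causes mapped to symptomatic failure modes.
OPAMP_CAUSE_TO_MODES = {
    "Degraded output (poor die attach)": {"LOW_SLEW"},
    "No operation (overstress)": {"LOW", "HIGH", "NOOP"},
    "Shorted inputs V+ to V-": {"HIGH", "LOW"},
    "Opened input V+": {"HIGH", "LOW"},
}


def collect_failure_modes(cause_map):
    """fm(component) is the union of the modes reachable from every cause."""
    modes = set()
    for mapped in cause_map.values():
        modes |= mapped
    return modes


print(sorted(collect_failure_modes(OPAMP_CAUSE_TO_MODES)))
# ['HIGH', 'LOW', 'LOW_SLEW', 'NOOP']
```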


\paragraph{Failure Modes of an Op-Amp according to EN298.}

EN298 does not specifically define op-amp failure modes; these can be determined
by following a procedure for `integrated~circuits' outlined in
annex~A~\cite{en298}[A.1 note e].
%
This demands that all open connections, and shorts between adjacent pins, be considered as failure scenarios.
%
In table~\ref{tbl:lm358} these failure scenarios are examined for the dual-packaged LM358~\cite{lm358},
and its {\fms} are determined from them.
%
% Collecting the op-amp failure modes from table ~\ref{tbl:lm358} we obtain the same {\fms}
% that we got from FMD-91, listed in equation~\ref{eqn:opampfms}, except for
% $LOW\_SLEW$.
%
Collating the op-amp failure modes from table~\ref{tbl:lm358} yields the same {\fms}
as FMD-91 (listed in equation~\ref{eqn:opampfms}), except for
$LOW\_SLEW$.
\fmmdglossOPAMP

%\paragraph{EN298: Open and shorted pin failure symptom determination technique}


%Eighth


\begin{table}[h+]
\caption{LM358: EN298 Open and shorted pin failure symptom determination technique}
\begin{tabular}{|| l | l | c | c | l ||} \hline
 %\textbf{Failure Scenario} & &  \textbf{Amplifier Effect}  &   & \textbf{Symptom(s)}          \\
 \textbf{Failure} & &  \textbf{Amplifier Effect}  &   & \textbf{FMEA component}          \\
 \textbf{cause}   & &  \textbf{                }  &   & \textbf{Failure Mode}          \\

               \hline

       &  &                        &  &          \\   \hline

     FS1: PIN 1 OPEN  &  & A output open   &  &  $NOOP_A$              \\  \hline

  FS2: PIN 2 OPEN  &  &  A-input disconnected,        &  &          \\
                   &  &    infinite gain on A+input   &  &  $LOW_A$    or $HIGH_A$          \\  \hline

  FS3: PIN 3 OPEN  &  &  A+input disconnected,      &  &                \\
                   &  &  infinite gain on A-input   &  &  $LOW_A$    or $HIGH_A$             \\  \hline

  FS4: PIN 4 OPEN  &  &  power to chip (ground) disconnected           &  &  $NOOP_A$ and   $NOOP_B$            \\  \hline


  FS5: PIN 5 OPEN  &  &  B+input disconnected,     &  &                  \\
                   &  &   infinite gain on B-input   &  &  $LOW_B$    or $HIGH_B$               \\  \hline

  FS6: PIN 6 OPEN  &  &  B-input disconnected,    &  &                   \\
                   &  &    infinite gain on B+input  &  &   $LOW_B$    or $HIGH_B$               \\  \hline


  FS7: PIN 7 OPEN  &  &  B output open  &  &  $NOOP_B$              \\  \hline

  FS8: PIN 8 OPEN  &  &  power to chip     &  &                 \\
                   &  &    (V+ supply) disconnected   &  &   $NOOP_A$ and   $NOOP_B$              \\  \hline
                    &  &                        &  &          \\
                  %    &  &                        &  &          \\
                  %    &  &                        &  &          \\   \hline

       FS9: PIN 1 $\stackrel{short}{\longrightarrow}$  PIN 2   &  &  A -ve 100\% Feed back, unity gain                               &  &  $LOW_A$             \\  \hline

      FS10: PIN 2 $\stackrel{short}{\longrightarrow}$  PIN 3   &  &  A inputs shorted,                     &  &               \\
                                                             &  &    output controlled by internal offset  &  &  $LOW_A$ or $HIGH_A$               \\  \hline

      FS11: PIN 3 $\stackrel{short}{\longrightarrow}$  PIN 4   &  &  A + input held to ground                                &  &   $LOW_A$  or $HIGH_A$           \\  \hline

      FS12: PIN 5 $\stackrel{short}{\longrightarrow}$  PIN 6   &  &  B inputs shorted,                       &  &          \\
                                                                 &  &   output controlled by internal offset  &  & $LOW_B$ or $HIGH_B$           \\  \hline

      FS13: PIN 6 $\stackrel{short}{\longrightarrow}$  PIN 7   &  &  B -ve 100\% Feed back, unity gain                         &  &  $LOW_B$             \\  \hline

      FS14: PIN 7 $\stackrel{short}{\longrightarrow}$  PIN 8   &  &  B output held high                                      &  &  $HIGH_B$             \\  \hline


\hline
\end{tabular}
\label{tbl:lm358}
\end{table}

\begin{figure}[h+]
 \centering
 \includegraphics[width=200pt]{CH5_Examples/lm258pinout.jpg}
 % lm258pinout.jpg: 478x348 pixel, 96dpi, 12.65x9.21 cm, bb=0 0 359 261
 \caption{Pinout for an LM358 dual Op-Amp}
 \label{fig:lm258}
\end{figure}
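The EN298 procedure applied in table~\ref{tbl:lm358} can be sketched in Python.
The sketch assumes a dual in-line package, in which adjacent pins are those next to each other along the same side (so pins 4 and 5, and pins 8 and 1, are not treated as adjacent); this reproduces scenarios FS1 to FS14.
The symptom sets are transcribed from the table; encoding them as sets discards the table's distinction between `or' and `and', which does not matter when only the collected failure modes are of interest.
All names are hypothetical.

```python
def en298_scenarios(n_pins):
    """EN298-style failure scenarios for a dual in-line package: each pin
    open, then each pair of adjacent pins along the same side shorted."""
    if n_pins % 2:
        raise ValueError("a dual in-line package has an even number of pins")
    half = n_pins // 2
    opens = [f"PIN {p} OPEN" for p in range(1, n_pins + 1)]
    shorts = []
    for side in (list(range(1, half + 1)), list(range(half + 1, n_pins + 1))):
        shorts += [f"PIN {a} SHORT PIN {b}" for a, b in zip(side, side[1:])]
    return opens + shorts


# Symptoms per scenario, transcribed from the LM358 table (A/B = amplifier).
LM358_SYMPTOMS = {
    "PIN 1 OPEN": {"NOOP_A"},
    "PIN 2 OPEN": {"LOW_A", "HIGH_A"},
    "PIN 3 OPEN": {"LOW_A", "HIGH_A"},
    "PIN 4 OPEN": {"NOOP_A", "NOOP_B"},
    "PIN 5 OPEN": {"LOW_B", "HIGH_B"},
    "PIN 6 OPEN": {"LOW_B", "HIGH_B"},
    "PIN 7 OPEN": {"NOOP_B"},
    "PIN 8 OPEN": {"NOOP_A", "NOOP_B"},
    "PIN 1 SHORT PIN 2": {"LOW_A"},
    "PIN 2 SHORT PIN 3": {"LOW_A", "HIGH_A"},
    "PIN 3 SHORT PIN 4": {"LOW_A", "HIGH_A"},
    "PIN 5 SHORT PIN 6": {"LOW_B", "HIGH_B"},
    "PIN 6 SHORT PIN 7": {"LOW_B"},
    "PIN 7 SHORT PIN 8": {"HIGH_B"},
}

FMD91_OPAMP = {"HIGH", "LOW", "NOOP", "LOW_SLEW"}


def collate(symptoms_by_scenario):
    """Union of all scenario symptoms with the _A/_B amplifier suffix removed."""
    return {mode.rsplit("_", 1)[0]
            for modes in symptoms_by_scenario.values() for mode in modes}


scenarios = en298_scenarios(8)
assert len(scenarios) == 14 and set(scenarios) == set(LM358_SYMPTOMS)
en298_modes = collate(LM358_SYMPTOMS)
print(sorted(en298_modes))        # ['HIGH', 'LOW', 'NOOP']
print(FMD91_OPAMP - en298_modes)  # {'LOW_SLEW'}
```

The set difference makes explicit that $LOW\_SLEW$ is the only FMD-91 mode the pin-based procedure does not reveal.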

%\clearpage

\subsubsection{Failure modes of an Op-Amp}

\label{sec:opamp_fms}
For the purpose of the examples that follow in this document, op-amps
are assigned the following failure modes:
%
$$ fm(OPAMP) = \{ LOW, HIGH, NOOP, LOW\_SLEW \} . $$
%
\fmmdglossOPAMP
\subsection{Comparing the component failure mode sources: EN298 vs FMD-91}


The EN298 pin failure technique cannot reveal failure modes due to internal failures,
which is why it misses $LOW\_SLEW$.
%
The FMD-91 entries for op-amps, on the other hand, are not directly usable as
component {\fms} in FMEA and require interpretation.
%
However, once that interpretation has been carried out, the resulting failure mode model can
be used throughout the FMEA process.

%%%% Talk about R differences ?? XXXXX


\clearpage


\section{FMEA worked example: milli-volt reader}
%
FMEA is a bottom-up procedure which starts with the failure modes of the  low level components of a system.
%
An example analysis will serve to demonstrate it in practice.
%
%
Consider a simple milli-volt reader, consisting
of instrumentation amplifiers connected to a micro-processor
that reports its readings via RS-232.
%
\begin{figure}
 \centering
 \includegraphics[width=175pt]{./CH2_FMEA/mvamp.png}
 % mvamp.png: 561x403 pixel, 72dpi, 19.79x14.22 cm, bb=0 0 561 403
 \caption{System diagram of a milli-volt reader, showing an expanded circuit diagram for the component of interest.}
\end{figure}
\fmeagloss


\subsection{FMEA Example: Milli-volt reader}
%
Consider how the failure of one of the milli-volt reader's resistors could affect the system,
choosing the resistor R1 in the op-amp gain circuitry:
% \begin{figure}
%  \centering
%  \includegraphics[width=175pt]{./mvamp.png}
%  % mvamp.png: 561x403 pixel, 72dpi, 19.79x14.22 cm, bb=0 0 561 403
% \end{figure}


% \begin{figure}
%  \centering
%  \includegraphics[width=80pt]{./mvamp.png}
%  % mvamp.png: 561x403 pixel, 72dpi, 19.79x14.22 cm, bb=0 0 561 403
% \end{figure}
\begin{itemize}
   \item \textbf{F - Failures of a given component:} the resistor R1 could fail by going OPEN or SHORT (EN298 definition);
    \item \textbf{M - Failure Mode:} consider the component failure mode SHORT;
    \item \textbf{E - Effects:} this will drive the minus input LOW, causing a HIGH output and reading;
   \item \textbf{A - Analysis:} the reading will be out of the normal range, i.e. the milli-volt reading will be erroneous.
\end{itemize}

\fmeagloss


The analysis above gives a result for
one single component failure mode.
A complete FMEA report would have to contain an entry
for each failure mode of every component in the system under investigation.
%
In theory, it would be necessary to examine the failure~mode
in relation to the entire circuit.
%
Here, intuition has been used to determine the probable
effect of this failure mode.
%
For instance, it has been assumed that the resistor R1 going SHORT
will not affect the ADC, the micro-processor or the UART.
\fmmdglossADC
%
%
%
The {\bc} {\fm} R1 SHORT has been examined
and failure reasoning applied,
along a heuristically determined signal path,
to find a putative system level symptom.
%
\fmmdglossSIGPATH
That is, R1 going SHORT is expected simply to give an out-of-range value
that can be read by the ADC and reported correctly by the software.
%
Potential side effects of this {\fm} may not have been taken into account.
%
To put this in more general terms, this failure mode has not been examined
against all other components in the system, only those expected on the signal path.
%
Examining the {\fm} R1 SHORT against all components in this system would be a more rigorous and complete
approach to looking for system failures.
%
FMEA where
each failure mode is compared against all other components
is termed exhaustive FMEA (XFMEA).
%
The vagueness in failure outcome that results from not performing XFMEA
is reflected in the UML relationship of figure~\ref{fig:component_fm_rel_ana},
which allows a one-to-many mapping from a failure mode to its system level symptoms.
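The extra effort of XFMEA can be estimated by counting reasoning steps: a basic FMEA report needs one entry per component failure mode, whereas XFMEA examines each failure mode against every other component.
A minimal Python sketch follows; the counts for R1 and the op-amp use the failure mode sets chosen earlier in this chapter, while R2 (a hypothetical second gain resistor) and the counts for the ADC, micro-processor and UART are hypothetical placeholders.

```python
def fmea_entries(fm_counts):
    """Entries in a basic FMEA report: one per component failure mode."""
    return sum(fm_counts.values())


def xfmea_checks(fm_counts):
    """XFMEA: every failure mode is examined against every other component."""
    n = len(fm_counts)
    return sum(k * (n - 1) for k in fm_counts.values())


# Failure mode counts per component: R1 follows fm(R), OPAMP follows fm(OPAMP);
# R2, ADC, MICRO and UART are hypothetical placeholders.
MILLIVOLT_READER = {"R1": 2, "R2": 2, "OPAMP": 4, "ADC": 3, "MICRO": 3, "UART": 3}

print(fmea_entries(MILLIVOLT_READER))  # 17
print(xfmea_checks(MILLIVOLT_READER))  # 85
```

For $N$ components with $k$ failure modes each, XFMEA needs $N k (N-1)$ checks, so the work grows roughly with the square of the number of components: 270 checks for ten components with three failure modes each, but 29700 for a hundred.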


\section{Theoretical Concepts in FMEA}

In this section some fundamental concepts and underlying philosophies of FMEA are examined.

\paragraph{Failure modes of a component and mutual exclusivity.}
It is desirable that the failure modes of a component are mutually exclusive: were a component able
to fail in several ways at the same time, analysis would be complicated.
%
It would mean having to consider combinations of internal component failures
as separate failure modes. This concept is discussed in sections~\ref{ch4:mutex}
and~\ref{ch7:mutex}.
%
\fmmdglossMUTEX
%
In general, failure modes
for simple components   are mutually exclusive,
but large and complex components (such as integrated circuits), especially where they contain separate modules,
could have failure modes that are not mutually exclusive; these need special handling, see section~\ref{ch7:indfm}.

\paragraph{The signal path.}
\fmmdglossSIGPATH
% C Garret does not like the terms afferent and efferent here, try to think of something else
Most electronic systems are used to process a signal: with signal processing
there is usually a clear path from the signal coming into the system, it being processed in some way, and a resultant effect on
an output or control signal. % afferent to transform to efferent path.
%
That is, there is an input, some processing and an output.
%
In electronics this could be termed a sensor, processing and actuator
model.
%
In software this would be termed afferent, transform and efferent data flow.
%
For the purpose of FMEA, the signal path is defined by the components and connections  used to process the signal.
%
Some circuits have feedback loops or even circular signal paths, but it
is normal for an identifiable signal path to exist.
%
%can be identified.
%
An FMEA investigation will often take the component {\fm} and examine its effect along this path,
in the direction of the signal,
echoing diagnostic/fault~finding methods~\cite{garrett, maikowski}. % loebowski}.
%
When fault finding, the signal path is followed, checking for correct behaviour
along it: when something out of place is found,
the circuit behaviour is measured in finer granularity,
 until  a faulty component or module~\cite{garrett} is identified.
%
Because this style of fault finding is based on experiment,
hopping from module to module and eliminating working ones until the
failure is found~\cite{maikowski}, it is efficient in terms of
concentrating effort.
%
Those tasked with performing FMEA are generally personnel who have performed fault finding,
and this shapes the rationale and work culture of FMEA~\cite{cbds}[p.97].
%


FMEA is a theoretical discipline. %AF does not like this!
%
It would be very unusual to build a circuit and then reproduce each
component failure mode on it.
%
This would be time-consuming, as it would involve building a circuit for each component {\fm} in
the system\footnote{Building circuit simulations and simulating component failure modes
would be a very time-consuming process and might only be performed as a final stage of accident investigation, where the cause is
required to be proven.}.
%
It is not possible, as with fault finding, to verify modules along the signal path for correct behaviour
and eliminate them from the investigation.
%
FMEA is a `thought~experiment', not actual experiment.
%
FMEA therefore needs to consider the effects a failure mode may have
on the other components in a system more thoroughly than fault finding does.
%
The question is by how much.
%
Too much and the task becomes impossible due to time/labour constraints.
%
Too little and the analysis could become meaningless, because it could miss
potential system failures.
%
For a more complete analysis, the strategy of examining each component {\fm} along the complete signal path,
forwards and backwards from the placement
of the component exhibiting the {\fm} under investigation, could be applied.
%
% Also, whether following the effects through the signal path {\em only} is acceptable, and instead
% would looking at its effect on all other components in the system be necessary?
Is following the effects of a {\fm} {\em only} through the components along the signal path acceptable?
This could easily ignore side effects, which leads on to the idea of
looking at a {\fm}'s effects on all other components in the system.
%is a matter for debate.
%
In practice, a compromise is made between the amount of time and money that can be spent
on analysis and the criticality of the project.
Metrics for the amount of work that FMEA requires are examined in section~\ref{sec:xfmea}.
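The strategy of examining each component {\fm} along the complete signal path, forwards and backwards
from the failing component, can be sketched as a graph search.
This is a minimal sketch under stated assumptions: the component names and the hypothetical
\texttt{SIGNAL\_PATH} adjacency map are invented, and the signal path is assumed to be known as a
directed graph in the direction of signal flow. The size of the returned set is the number of
components that would be checked against one {\fm} under this strategy.

```python
from collections import deque

# Hypothetical signal path: each component maps to the components it feeds.
SIGNAL_PATH = {
    "R1": ["R2"],
    "R2": ["OPAMP"],
    "OPAMP": ["ADC"],
    "ADC": ["MICRO"],
    "MICRO": ["RELAY"],
    "RELAY": [],
}


def reachable(graph, start):
    """Components reachable from 'start' (excluding it); loops are handled."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt != start and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def reverse(graph):
    """Reverse every edge so that a search runs against the signal direction."""
    rev = {node: [] for node in graph}
    for src, dsts in graph.items():
        for dst in dsts:
            rev.setdefault(dst, []).append(src)
    return rev


def components_to_check(graph, failed):
    """Components along the signal path, forwards and backwards, from 'failed'."""
    return reachable(graph, failed) | reachable(reverse(graph), failed)


if __name__ == "__main__":
    print(sorted(components_to_check(SIGNAL_PATH, "OPAMP")))
    # -> ['ADC', 'MICRO', 'R1', 'R2', 'RELAY']
```

Components that are not on the signal path (for instance a power supply shared with other circuits) never appear in the result, which is exactly the side-effect blind spot discussed above.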

\paragraph{Failure Modes and the signal path.}
\fmmdglossSIGPATH
In general, a component failure mode in an electronic circuit will
change the circuit topology. For a single failure
this effect may cause additional complications for the analyst.
For multiple failures, the analyst
will have to deal with an altered circuit topology
for each analysis.


\paragraph{Single component failure mode to system failure relation.}
%
%
% NEED SOME NICE HISTORICAL REFS HERE
FMEA, due to its inductive bottom-up approach, is good
at mapping potential single component failures to system level faults/events.
%
The concept of the unacceptability of a single component failure causing a system failure % catastrophe,
is an important and easily understood measurement of safety.
%
Such single failure relationships are easy to quantify,
because Mean Time to Failure (MTTF) statistics~\cite{fmd91,mil1991} for commonly used components can be found.
%
When used in the design phase of a project, FMEA is also a useful tool
for discovering potential failure scenarios~\cite{1778436820050601}.
%
From a large system perspective, it may be found that {\bc} {\fms}
have more than one possible system event associated with them.
%
Often there will be a clear one-to-one mapping, but
probabilities of failure (as used in FMECA, see section~\ref{sec:FMECA})
could give a one-to-many mapping: one {\fm} to many system level symptoms.
%
\paragraph{Use of Markov chains to model failure modes.}
We could represent a failure mode and its possible outcomes using a Markov chain~\cite{probfmea_4338247}.
%
Where multiple simultaneous failure modes are considered, the statistical nature
of the Markov chain cause and effect model becomes more complicated.
%
What we in fact get is the merging, or local interaction, of two Markov chains
for the cause and effect model.
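A single {\fm} and its possible outcomes can be represented, for example, as an absorbing Markov chain
in which the system level outcomes are absorbing states.
The sketch below is a minimal illustration under stated assumptions: the states and transition
probabilities are hypothetical, and only one failure mode is modelled (the merging of chains for
simultaneous failures is not attempted). Repeatedly applying the transition probabilities moves all
of the probability mass into the absorbing outcome states.

```python
# Hypothetical cause/effect chain for ONE failure mode. Probabilities leaving
# each state sum to one; states with no successors are absorbing
# (system level outcomes).
CHAIN = {
    "opamp_output_high": {"reading_high": 0.9, "reading_saturated": 0.1},
    "reading_high": {"undetected_wrong_reading": 0.7, "alarm_raised": 0.3},
    "reading_saturated": {"alarm_raised": 1.0},
    "undetected_wrong_reading": {},
    "alarm_raised": {},
}


def outcome_probabilities(chain, start, steps=100):
    """Propagate probability mass from 'start'; return the absorbing states.

    'steps' must exceed the longest path to an absorbing state; for chains
    with cycles the result converges towards the absorption probabilities.
    """
    dist = {start: 1.0}
    for _ in range(steps):
        nxt = {}
        for state, p in dist.items():
            successors = chain[state]
            if not successors:  # absorbing state: mass stays where it is
                nxt[state] = nxt.get(state, 0.0) + p
            for target, q in successors.items():
                nxt[target] = nxt.get(target, 0.0) + p * q
        dist = nxt
    return {s: p for s, p in dist.items() if not chain[s]}


if __name__ == "__main__":
    result = outcome_probabilities(CHAIN, "opamp_output_high")
    for outcome, p in sorted(result.items()):
        print(f"{outcome}: {p:.2f}")
```

With these invented numbers the single failure mode leads to a raised alarm with probability 0.37 and to an undetected wrong reading with probability 0.63, a one-to-many mapping of the kind described above.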
%
\paragraph{Subjective and Objective thinking in relation to FMEA.}
\label{sec:subjectiveobjective}
FMEA is always performed in the context of the use of the equipment.
In philosophical terms, this context belongs to the domain of the subjective, while the
logic and reasoning behind failure causation belong to the objective.
%
Objective reasoning can trace a component level failure to a system level event,
but its meaning and/or severity can only be determined
in the subjective sense.
%
It is worth remembering that
failure mode analysis performed on the possible leaks from the O-ring on the space shuttle
did not link this failure to the catastrophic failure of the spacecraft~\cite{challenger,sanjeev}.
%
This was not a failure of objective reasoning, but of the subjective: the context in which the leak occurred.
%
That is, an objectively calculated failure mode outcome may have
more than one subjective outcome.
%

This means that objective reasoning can be applied to determine objective effects, but the criticality (the seriousness or consequences)
of those failures depends upon the Equipment Under Control (EUC)
and its environment.
%
For instance, a leak of nuclear material
aboard a spacecraft could result in
loss of mission, whereas a leak on Earth could have serious health and environmental consequences.
A single line of FMECA describing such a system risk is therefore an oversimplification (consider that the same
nuclear material will be present during transport and launch, and when outside Earth's environment).
%
Subjective appraisal of the outcome of a system failure mode can also
be subject to management and/or political pressure.
%
The two most recent variants of FMEA,
FMEDA and FMECA, have dipped a metaphorical toe into the subjective realm: FMECA with its `criticality~factor' and
FMEDA with its definition of `dangerous'.
%
However, while starting to address the subjective side
of failure analysis,
these methodologies
do not separate the final subjective stage from the objective. % stage of analysis.
%
A subjective assessment is made during the analysis of each {\bc} {\fm},
even though most {\bc} {\fms} cause shared
system level failures.
%
This means that work at the subjective
level is repeated.
%
Detailed work on subjective analysis is beyond the scope of this study.


\paragraph{Multiple Simultaneous Failure Modes.}
%
FMEA is less useful for determining events for multiple
simultaneous
failures\footnote{Multiple simultaneous failures are taken to mean failures that occur within the same detection period.
Detection periods are typically determined for the process under control. For instance, for a flame detector in an industrial burner this
is typically one second~\cite{en298}.}.
%
Work has been performed using component failure statistics to
select the more likely multiple failures for analysis~\cite{FMEAmultiple653556}.
%
%We now compound the multiple symptoms from one {\bc} {\fm} possibility
%with the merging of Markov chains.
%,this is an additional complication.
%, of having to change between these two modes of thinking, it becomes more difficult to
%get a balance between subjective and objective perspectives.
A complication for multiple failure analysis is that failure modes may cause a change in circuit topology,
meaning that the additional failures might have to be analysed with respect to the changed topology.
%subjective/objective become more cluttered when there are multiple possibilities
%for the the results of an FMEA line of reasoning.
Because multiple failures mean dealing with changed topologies,
the objective analysis is itself more complicated, and the subjective assessment
adds a further layer of complication.
%
%
Traditional FMEA makes the translation from objective to subjective
failure modes an intrinsic part of its process, which can be considered a weakness.

\paragraph{Failure modes and their observability criterion: detectable and undetectable.}
\label{sec:detectable}
\fmmdglossOBS
Often the effects of a failure mode may be easy to detect,
and the equipment can react by raising an alarm or compensating for the resulting fault.
%
Some failure modes may cause undetectable failures: for instance, a component failure that
shifts a measured reading could have adverse consequences yet not be flagged as a failure.
%
This type of failure
cannot be dealt with by passing an error indication to higher level modules,
because it simply cannot be detected.
%
The system therefore
has no way of knowing the reading is invalid.
%
The term observable has a specific meaning in the field of control engineering~\cite{721666, ACS:ACS1297}.
Because systems submitted for FMEA are generally related to control systems,
the terms `detectable' and `undetectable' (as defined in EN61508~\cite{en61508}) are used instead, to avoid confusion,
to describe the observability of failure modes in this document.
%\glossary{name={observability}, description={The property of a system failure in relation to a particular component failure mode, where it can be determined whether the readings/actions associated with it are valid, or the by-product of a failure. If we cannot determine that there is a fault present, the system level failure is said to be unobservable.}}
\fmmdglossOBS


\paragraph{Impracticality of Field Data for Modern Systems.}
\fmmdglossFIT
Modern electronic components are generally very reliable, and the systems built from them
are thus very reliable too. Reliable field data on failures will, therefore, be sparse.
%
To demonstrate that a continuous demand system achieves, say, ${10}^{-7}$ failures\footnote{${10}^{-7}$ failures per hour of operation is the
threshold for SIL 3 reliability~\cite{en61508}.
%
Failure rates are normally measured per $10^9$ hours of operation
and are known as Failure in Time (FIT) values.
%
The maximum FIT value for a SIL 3 system is therefore 100.}
per hour of operation, even 1000 correctly monitored units in the field
would be expected to produce only one failure per ten thousand hours of operation (i.e. roughly one failure every fourteen months).
%
It would be utterly impractical to get statistically significant data for equipment
at these reliability levels.
%
However, FMEA can be used (more specifically the FMEDA variant, see section~\ref{sec:FMEDA}),
working from known component failure rates, to obtain
statistical estimates of the equipment reliability.
\fmmdglossFIT
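The arithmetic of the paragraph above and its footnote can be checked with a few lines of code.
This is a minimal sketch under stated assumptions: failures are assumed to occur at a constant rate
(so failure counts are Poisson distributed), the units are assumed to run continuously for 8760 hours
a year, and the zero-failure demonstration at 95\% confidence is a standard reliability-testing result
added here for illustration, not taken from EN61508.
It shows why field data cannot realistically demonstrate SIL 3 rates: with 1000 units, roughly three
and a half years of failure-free operation would be needed.

```python
import math

HOURS_PER_YEAR = 8760  # assumption: continuous operation, 365 days a year


def fit_from_rate(rate_per_hour):
    """Convert a failure rate per hour into FIT (failures per 10**9 hours)."""
    return rate_per_hour * 1e9


def hours_between_fleet_failures(rate_per_hour, units):
    """Mean hours between failures across a fleet at a constant failure rate."""
    return 1.0 / (rate_per_hour * units)


def zero_failure_unit_hours(rate_per_hour, confidence=0.95):
    """Failure-free unit-hours needed to claim rate <= rate_per_hour.

    With Poisson failure counts, zero failures in T unit-hours supports the
    claim at the given confidence when exp(-rate * T) <= 1 - confidence.
    """
    return -math.log(1.0 - confidence) / rate_per_hour


if __name__ == "__main__":
    rate, units = 1e-7, 1000  # SIL 3 continuous demand threshold, fleet size
    print(fit_from_rate(rate))  # about 100 FIT
    print(hours_between_fleet_failures(rate, units) / HOURS_PER_YEAR)  # about 1.14 years
    print(zero_failure_unit_hours(rate) / units / HOURS_PER_YEAR)  # about 3.4 years
```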
%
\paragraph{Forward and Backward Searches.}
\fmmdglossFS
\fmmdglossBS
A forward search starts with possible failure causes
and uses logic and reasoning to determine system level outcomes.
%
Forward search types of fault analysis are said to be `inductive'.
%
A backward search starts with (undesirable) system level events and
works back down to potential causes using decomposition
of the system and logic.
%
FMEA based methodologies are forward (bottom-up) searches~\cite{Lutz:1997:RAU:590564.590572}, and top down
methodologies such as FTA~\cite{nucfta,nasafta} are backward searches.
%
%
Backward (or top-down) searches are said to be `deductive' (i.e. the causes of failure are
deduced).
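The difference in direction can be illustrated on a small, hypothetical table of failure modes and
symptoms. This is only a sketch: the component names, failure modes and symptoms are invented, and a
real backward search such as FTA decomposes the system rather than inverting an FMEA table; the
example shows only the direction in which each kind of search runs.

```python
# Hypothetical results of a forward analysis:
# (component, failure mode) -> set of system level symptoms.
FORWARD = {
    ("R1", "open"): {"reading_high"},
    ("R1", "short"): {"reading_low"},
    ("OPAMP", "output_high"): {"reading_high"},
    ("OPAMP", "output_low"): {"reading_low", "alarm"},
}


def forward_search(mapping, cause):
    """Inductive direction: from a failure cause to its system level outcomes."""
    return set(mapping.get(cause, set()))


def backward_search(mapping, symptom):
    """Deductive direction: from a system level event back to candidate causes."""
    return {cause for cause, effects in mapping.items() if symptom in effects}


if __name__ == "__main__":
    print(forward_search(FORWARD, ("OPAMP", "output_low")))
    print(sorted(backward_search(FORWARD, "reading_high")))
```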


\subsection{Reasoning distance.}
\label{reasoningdistance}
\fmmdglossRD
Reasoning distance is the number of stages of logic and reasoning used
in {\fm} analysis to map a failure cause to its potential outcomes, counted
as the number of {\fm}-to-component checks made.
%
The basic FMEA example in section~\ref{basicfmea}
considered one {\fm} against some of  the components in the milli-volt reader.
%
To create an exhaustive FMEA report on the milli-volt reader,  every
known failure mode of every component within it would have to be examined against all its other components.
%
`Reasoning~distance', for one {\fm}, is defined as the number of components checked against it
to determine its system level symptom(s).
%
No current FMEA variant gives guidelines for the components that should
be included to analyse a {\fm} in a system.
%
Examining a {\fm} against all the other components in a system
would give the maximum reasoning distance.
%
This is termed the exhaustive FMEA case for a single {\fm}.
%does not
% The exhaustive~reasoning~distance would be
% the sum of the number of failure modes, against all other components
% in that system.
Thus the exhaustive~reasoning~distance for a particular component
is the number of failure modes it has multiplied by the number of remaining components
in the system.
%
The exhaustive reasoning~distance for a system is the
sum of these products for all the components it contains.
%
If the milli-volt reader had, say, 100 components with three failure modes each, this
would give an exhaustive reasoning distance, for single failure analysis, of $3 \times 100 \times 99 = 29{,}700$.
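The exhaustive reasoning distance described above is simple to compute, and the same calculation
also covers systems whose components have differing numbers of failure modes.
A minimal sketch, assuming the system is described only by a list giving the number of failure modes
of each component (a hypothetical input format):

```python
def exhaustive_reasoning_distance(failure_modes_per_component):
    """Exhaustive single-failure reasoning distance for a whole system.

    Component i, with f_i failure modes, has each of them checked against the
    N - 1 other components, contributing f_i * (N - 1) checks.
    """
    n = len(failure_modes_per_component)
    return sum(f * (n - 1) for f in failure_modes_per_component)


if __name__ == "__main__":
    print(exhaustive_reasoning_distance([3] * 100))  # 29700
    print(exhaustive_reasoning_distance([2, 2, 3]))  # (2 + 2 + 3) * 2 = 14
```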
%
The discussion on reasoning distance provides a metric to examine
the state explosion problems associated with forward search failure investigation
methodologies.
%
\fmmdglossSTATEEX
%
It is apparent that the shorter the reasoning distance, the more precisely theoretical examination
can determine failure symptoms.
%
For instance, a better understanding of failure effects is expected for a very simple, small circuit
than for a very large system, where there are more variables and potential {\fm} interactions.
%
%.... general concept... simple ideas about how complex a
%failure analysis is the more modules and components are involved
% cite for forward and backward search related to safety critical software
 %{sfmeaforwardbackward}
\subsection{FMEA and the  State Explosion Problem}
\label{sec:xfmea}
\paragraph{Problem of which components to check for a given {\bc} {\fm}.}
\fmmdglossSTATEEX
%
FMEA for safety critical certification (e.g. for EN298 and EN61508)~\cite{en298,en61508} has to be applied
to all known failure modes of all components within a system.
%
In a typical report, each of these would be one line of a spreadsheet.
%
FMEA does not define or specify the scope of the investigation for each component failure mode.
%
For instance, should the signal path be followed, checking all components encountered along it, or should the scope be wider?
%
%If we wethe effect of a component {\fm} against all other components
%in a system, this could be said to be exhaustive analysis.

\paragraph{Exhaustive Single Failure FMEA.}
\fmmdglossXFMEA
%
To perform exhaustive FMEA (XFMEA), every possible interaction
of a failure mode with all other components in a system must be examined.
%
In other words, all possible failure scenarios must be considered.
%
%to do this completely (all failure modes against all components).
This is represented in the equation below, %~\ref{eqn:fmea_state_exp},
where $N$ is the total number of components in the system, $RD_{single}$ is the reasoning~distance and
$f$ is the number of failure modes per component:
%
\begin{equation}
  \label{eqn:fmea_single}
  RD_{single} = N \cdot (N-1) \cdot f  . % \\
  %(N^2 - N).f
\end{equation}
%
This means an order of $O(N^2)$  checks to perform
to undertake XFMEA for single failures.
%
Even small systems typically have around
100 components, each typically with 3 or more failure modes, which would give
$100 \times 99 \times 3 = 29{,}700$ as a reasoning~distance.
%
\fmmdglossSTATEEX
\paragraph{Exhaustive FMEA and double failure scenarios.}
%
%\paragraph{Exhaustive Double Failure FMEA}
When potential double failure
scenarios\footnote{Checking certain double failure scenarios is already a legal
requirement: the European gas burner standard (EN298:2003) demands the checking of
double failure scenarios (for burner lock-out scenarios).}
(two components failing within a given time frame) are considered, the order becomes $O(N^3)$.
With $RD_{double}$ as the reasoning~distance for double failure scenarios:
\begin{equation}
  \label{eqn:fmea_double}
  RD_{double} = N \cdot (N-1) \cdot (N-2) \cdot f  . % \\
  %(N^2 - N).f
\end{equation}
%
For a theoretical system with 100 components and a fixed 3 failure modes each, this gives a reasoning distance of
$100 \times 99 \times 98 \times 3 = 2{,}910{,}600$. % failure mode scenarios.
%
In practice there is an additional complication here: the changes in
circuit topology that {\fms} can cause.
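Evaluating the two reasoning~distance equations for systems of increasing size shows the state
explosion directly. The sketch below simply implements the equations for $RD_{single}$ and
$RD_{double}$ as given, with the number of failure modes per component fixed at three as in the text;
the system sizes chosen are arbitrary illustrative values.

```python
def rd_single(n, f):
    """RD_single = N (N - 1) f."""
    return n * (n - 1) * f


def rd_double(n, f):
    """RD_double = N (N - 1) (N - 2) f."""
    return n * (n - 1) * (n - 2) * f


if __name__ == "__main__":
    for n in (10, 100, 1000):
        # Columns: N, single failure checks, double failure checks.
        print(n, rd_single(n, 3), rd_double(n, 3))
    # The 100 component row prints: 100 29700 2910600
```

Each step from single to double failure analysis multiplies the work by a further factor of $N-2$.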

\paragraph{Reliance on experts for meaningful FMEA Analysis.}
Because of state explosion, current FMEA methodologies cannot consider an exhaustive approach.
%We define exhaustive FMEA ({\XFMEA}) as examining the effect of every component failure mode
%against the remaining components in the system under investigation.
%
\fmmdglossSTATEEX
%
Because, for practical reasons, XFMEA cannot be performed for anything other than a trivial system,
reliance is placed upon experts on the system under investigation
to perform a meaningful analysis.
%
These experts must use their judgement and experience to choose
sub-sets of the components in the system to check against each {\fm}.
%
These experts also have to select the areas they see as most critical for detailed FMEA analysis:
for reasons of time, it is usually impossible
to perform a detailed level of analysis on all component {\fms}
of anything but a small hypothetical system.

\subsection{Component Tolerance}

Component tolerances may need to be considered when determining whether a component has failed.
Where appropriate, acceptable ranges must be calculated to distinguish failure
from acceptable operating conditions.
%
An example of component tolerance considered for FMEA
is given in section~\ref{sec:resistortolerance}.
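As an illustration of such a calculation, the sketch below finds the worst-case output range of a
resistive potential divider from its resistor tolerances, and flags readings outside that range as a
suspected failure. It is a minimal sketch under stated assumptions: the divider, its component values,
the 5\% tolerance and the 5~V supply are hypothetical, the supply is taken as exact, and measurement
error is ignored.

```python
def divider_band(vin, r1, r2, tol):
    """Worst-case output range of Vout = Vin * R2 / (R1 + R2).

    Vout rises with R2 and falls with R1, so it is lowest when R2 is at its
    minimum and R1 at its maximum, and highest in the opposite case.
    'tol' is a fraction (0.05 for 5%).
    """
    r1_lo, r1_hi = r1 * (1 - tol), r1 * (1 + tol)
    r2_lo, r2_hi = r2 * (1 - tol), r2 * (1 + tol)
    low = vin * r2_lo / (r1_hi + r2_lo)
    high = vin * r2_hi / (r1_lo + r2_hi)
    return low, high


def classify(reading, band):
    """Readings inside the tolerance band are acceptable; others suggest failure."""
    low, high = band
    return "within tolerance" if low <= reading <= high else "failure suspected"


if __name__ == "__main__":
    band = divider_band(5.0, 10_000, 10_000, 0.05)  # about (2.375, 2.625) V
    print(band)
    print(classify(2.5, band))  # within tolerance
    print(classify(5.0, band))  # failure suspected (e.g. R1 shorted)
```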

\section{FMEA in current usage: Five variants}

\paragraph{Five main Variants of FMEA}
 \begin{itemize}
  \item \textbf{PFMEA - Production}   Emphasis on cost reduction and product improvement;
    \item \textbf{FMECA - Criticality}  Emphasis on minimising the effect of critical systems failing; % Military/Space
    \item \textbf{FMEDA - Statistical Safety} Statistical analysis giving Safety Integrity Levels;
   \item \textbf{DFMEA - Design or Static/Theoretical}  Approval of safety critical systems using FMEA and single or double failure prevention;%  EN298/EN230/UL1998
   \item \textbf{SFMEA - Software FMEA}  Only used in highly critical systems at present.
\end{itemize}


\section{PFMEA - Production FMEA : 1940s to present}
\fmmdglossPFMEA
%
Production FMEA (or PFMEA) is FMEA used to prioritise, in terms of
cost, problems to be addressed in product production.
%
It generally focuses on known problems: the statistical frequency of a problem
multiplied by its cost to fix gives a Risk Priority Number (RPN)
for the germane component {\fm}.
%
Fixing the problems with the highest RPN
will return the most cost benefit~\cite{bfmea}.
%
An example PFMEA report is presented in table~\ref{tbl:pfmeareport}.

% benign example of PFMEA in CARS - make something up.
\subsection{PFMEA Example}
\begin{table}[ht]
\caption{PFMEA example calculations} % title of Table
\label{tbl:pfmeareport}
\centering % used for centering table
\begin{tabular}{|| l | l | c | c | l ||} \hline
 \textbf{Failure Mode} &   \textbf{Probability}   & \textbf{Cost}        &  \textbf{Symptom} & \textbf{RPN} \\ \hline \hline
      relay 1 n/c      & $1 \times 10^{-5}$       &  38.0                & indicators fail   & 0.00038 \\ \hline
        relay 2 n/c      & $1 \times 10^{-5}$       &  98.0                & doorlocks fail   & 0.00098 \\ \hline
%       rear end crash    &  $14.4*10^{-6}$         & 267,700              & fatal fire       &  3.855 \\
%       ruptured f.tank   &                         &                      &                  &        \\ \hline
\hline
\end{tabular}
\end{table}
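The RPN column of table~\ref{tbl:pfmeareport} can be reproduced, and the rows ranked, as below.
This follows the definition given in the text (statistical frequency multiplied by cost to fix);
other PFMEA guidelines define the RPN differently, for example as the product of severity, occurrence
and detection rankings. The row data are copied from the table; the code structure is an illustrative
assumption.

```python
# Rows of the example PFMEA table: (failure mode, probability, cost, symptom).
ROWS = [
    ("relay 1 n/c", 1e-5, 38.0, "indicators fail"),
    ("relay 2 n/c", 1e-5, 98.0, "doorlocks fail"),
]


def rpn(probability, cost):
    """Risk Priority Number as described in the text: frequency times cost to fix."""
    return probability * cost


def ranked_by_rpn(rows):
    """Highest RPN first, i.e. the order in which problems would be addressed."""
    return sorted(rows, key=lambda row: rpn(row[1], row[2]), reverse=True)


if __name__ == "__main__":
    for mode, p, cost, symptom in ranked_by_rpn(ROWS):
        print(f"{mode:12s} {symptom:16s} RPN = {rpn(p, cost):.5f}")
```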


\section{FMECA - Failure Modes Effects and Criticality Analysis}
\fmmdglossFMECA
\label{sec:FMECA}
% \begin{figure}
%  \centering
%  %\includegraphics[width=100pt]{./military-aircraft-desktop-computer-wallpaper-missile-launch.jpg}
%  \includegraphics[width=300pt]{./CH2_FMEA/A10_thunderbolt.jpg}
%  % military-aircraft-desktop-computer-wallpaper-missile-launch.jpg: 1024x768 pixel, 300dpi, 8.67x6.50 cm, bb=0 0 246 184
%  \caption{A10 Thunderbolt}
%  \label{fig:f16missile}
% \end{figure}
FMECA places emphasis on determining criticality rather than the cost of system failures.
%
%
It applies Bayesian statistics within the FMEA process (i.e. using probabilities of component failures
and the probability of those failures causing given system level failures)
to determine the risk of system level events/symptoms.
%
%
These risk probabilities for system level failures
are then multiplied by the estimated operational time of the system.
%
For instance, a military or emergency system may typically be operational for
a given number of hours. The risk against time value, in conjunction with the severity
of the system level event, gives a `criticality~level'.
%
%Also the probability of the system failure causing a critical event.
%
Bayes' theorem can be seen as a theory on the `probability~of~causes'~\cite{probstatcrash}[p.9].
%
A given component failure may, for instance, be associated with
a particular system failure with a statistical probability that is either calculated or measured from field~data.
%
Applying Bayesian statistics to failure analysis suffers from the
problem that correlation does not imply causation~\cite{bayesfrequentist}.
%
However, correlation is evidence for causation, and may be the only evidence to hand;
this is the justification for its use.
%
This implies a weakness in the FMECA philosophy: failure causes can
become part of the failure mode model by inference, rather than by analytical
determination.
%
A history of the usage and development of FMECA may be found in~\cite{FMECAresearch}.
 \fmmdglossFMECA
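Bayes' theorem as a theory on the `probability~of~causes' can be sketched for FMECA-style data:
given that a particular system level symptom has been observed, the failure modes that could have
caused it are weighted by $\lambda_p \cdot \alpha$ (how often each mode occurs) and by $\beta$ (how
likely each mode is to produce the symptom). This is a minimal sketch under stated assumptions: the
failure modes and all numerical values are hypothetical, and exactly one of the listed failure modes is
assumed to be present.

```python
# Hypothetical failure modes: (lambda_p per 10**6 hours, alpha, beta), where
# beta = P(observed symptom | failure mode).
MODES = {
    "R1 open": (0.01, 0.9, 0.8),
    "R1 short": (0.01, 0.1, 0.0),
    "op-amp output high": (0.05, 0.3, 0.9),
    "op-amp output low": (0.05, 0.3, 0.1),
}


def posterior_causes(modes):
    """P(failure mode | symptom) by Bayes' theorem.

    The prior for each mode is proportional to lambda_p * alpha and the
    likelihood is beta. Normalising over the listed modes assumes that
    exactly one of them has occurred.
    """
    joint = {m: lam * alpha * beta for m, (lam, alpha, beta) in modes.items()}
    total = sum(joint.values())
    return {m: j / total for m, j in joint.items()}


if __name__ == "__main__":
    ranked = sorted(posterior_causes(MODES).items(), key=lambda kv: -kv[1])
    for mode, p in ranked:
        print(f"{mode:20s} {p:.3f}")
```

The result is only as good as the $\beta$ values fed in; if these come from correlation in field data, the weakness discussed above carries straight through to the inferred causes.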

\paragraph{FMECA probability, time and severity factors.}
%
FMECA is very similar to PFMEA, but instead of cost, a criticality or
seriousness factor is ascribed to putative top level incidents.
FMECA has three probability factors for component failures, a system operational time and a severity factor.

\textbf{FMECA ${\lambda}_{p}$ value.}
This is the overall failure rate of a base component.
This will typically be the failure rate per million ($10^6$) or
billion ($10^9$) hours of operation~\cite{mil1991}.

\textbf{FMECA $\alpha$ value.}
The failure mode probability, usually denoted by $\alpha$, is the probability of
a particular failure~mode occurring within a component, given that the component has failed~\cite{fmd91}.
%, should it fail.
%A component with N failure modes will thus have
%have an $\alpha$ value associated with each of those modes.
%As the $\alpha$ modes are probabilities, the sum of all $\alpha$ modes for a component must equal one.
%
\fmmdglossFMECA
%

\textbf{FMECA $\beta$ value.}
The second probability factor, $\beta$, is the probability that the failure mode
will cause a given system failure.
%
This corresponds to a `Bayesian' (conditional) probability, i.e. given a particular
component failure mode, the probability of a given system level failure~\cite{nucfta}[VI-19].

\textbf{FMECA `t' Value.}
The time for which a system will be operating, or the working lifetime of the product, is
represented by the variable $t$.
%for probability of failure on demand studies,
%this can be the number of  operating cycles or demands expected.

\textbf{Severity `s' value.}
A weighting factor to indicate the seriousness of the putative system level error.
%Typical classifications are as follows:~\cite{fmd91}

The statistical formula to calculate the criticality factor for one component {\fm} is given below:
%
\begin{equation}
 C_m  =  \beta \cdot \alpha \cdot \lambda_p \cdot t \cdot s .
\end{equation}
\fmmdglossFMECA
%
The highest $C_m$ values would represent the most dangerous or serious
system level failures.
These would be at the top of a `to~fix' list
for a project manager, and some levels of risk may be considered unacceptable
and require re-design of some systems.
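A minimal sketch of the criticality calculation and of the resulting `to~fix' ordering is given
below. The failure modes, their $\lambda_p$, $\alpha$, $\beta$ and severity values, and the
operational time are hypothetical; $\lambda_p$ is assumed to be quoted per $10^6$ hours and is
converted to a per-hour rate so that it can be combined with $t$ in hours.

```python
T_HOURS = 10_000  # hypothetical operational time t

# Hypothetical failure modes: (name, lambda_p per 10**6 h, alpha, beta, severity s)
MODES = [
    ("relay contacts weld", 0.5, 0.2, 1.0, 10),
    ("relay coil open", 0.5, 0.8, 0.5, 4),
    ("resistor drift", 0.002, 0.5, 0.3, 2),
]


def criticality(lambda_p_per_1e6_hours, alpha, beta, t_hours, severity):
    """C_m = beta * alpha * lambda_p * t * s, with lambda_p converted to per hour."""
    return beta * alpha * (lambda_p_per_1e6_hours / 1e6) * t_hours * severity


def to_fix_list(modes, t_hours):
    """Failure modes ordered by decreasing C_m (most critical first)."""
    scored = [(name, criticality(lam, a, b, t_hours, s)) for name, lam, a, b, s in modes]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, c_m in to_fix_list(MODES, T_HOURS):
        print(f"{name:20s} C_m = {c_m:.2e}")
```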
\fmmdglossFMECA

\section{FMEDA - Failure Modes Effects and Diagnostic Analysis}
%
%\subsection{ FMEDA - Failure Modes Effects and Diagnostic Analysis}
% \begin{figure}
%  \centering
%  \includegraphics[width=200pt]{./SIL.png}
%  % SIL.jpg: 350x286 pixel, 72dpi, 12.35x10.09 cm, bb=0 0 350 286
%  \caption{SIL requirements}
% \end{figure}
%\subsection{ FMEDA - Failure Modes Effects and Diagnostic Analysis}
%
\fmmdglossFMEDA
%
\begin{table}[ht]
\centering

%\centering % used for centering table
\begin{tabular}{|| c | c | c ||} \hline
 \textbf{SIL} &   \textbf{Low Demand}     & \textbf{Continuous Demand}          \\
              & Prob of failing on demand & Prob of failure per hour  \\ \hline \hline
      4       & $ 10^{-5}$ to $< 10^{-4}$  &   $ 10^{-9}$ to $< 10^{-8}$                \\ \hline
      3       &  $ 10^{-4}$ to $< 10^{-3}$ &    $ 10^{-8}$ to $< 10^{-7}$             \\ \hline
      2       &  $ 10^{-3}$ to $< 10^{-2}$ &    $ 10^{-7}$ to $< 10^{-6}$             \\ \hline
      1       &  $ 10^{-2}$ to $< 10^{-1}$ &    $ 10^{-6}$ to $< 10^{-5}$                        \\ \hline

\hline
\end{tabular}
\caption{Table adapted from EN61508-1:2001 [7.6.2.9 p33], showing statistical tolerance of `dangerous~failures' to
comply with a given SIL level} % title of Table
\label{tbl:sil_levels}
\end{table}
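The bands of table~\ref{tbl:sil_levels} can be read as a simple lookup from a probability of
dangerous failure to a SIL. This is an illustrative sketch only: the band limits are transcribed from
the table, while the function and its handling of values outside every band (returning
\texttt{None}) are assumptions made here. In EN61508 the SIL claimed for an installation also depends
on architectural constraints such as SFF, so a probability alone does not settle it.

```python
# (SIL, lower bound inclusive, upper bound exclusive), from the SIL table.
LOW_DEMAND = [(4, 1e-5, 1e-4), (3, 1e-4, 1e-3), (2, 1e-3, 1e-2), (1, 1e-2, 1e-1)]
CONTINUOUS = [(4, 1e-9, 1e-8), (3, 1e-8, 1e-7), (2, 1e-7, 1e-6), (1, 1e-6, 1e-5)]


def sil_band(probability, bands):
    """Return the SIL whose band contains 'probability', or None if none does.

    For LOW_DEMAND the probability is that of failing on demand; for
    CONTINUOUS it is the probability of dangerous failure per hour.
    """
    for sil, lower, upper in bands:
        if lower <= probability < upper:
            return sil
    return None


if __name__ == "__main__":
    print(sil_band(5e-8, CONTINUOUS))  # 3
    print(sil_band(1e-7, CONTINUOUS))  # 2: 1e-7 is outside the SIL 3 band
    print(sil_band(2e-3, LOW_DEMAND))  # 2
```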
%
% \begin{itemize}
%     \item \textbf{Statistical Safety}   Safety Integrity Level (SIL) standards (EN61508/IOC5108).
%     \item \textbf{Diagnostics}          Diagnostic or self checking elements modelled
%     \item \textbf{Complete Failure Mode Coverage}    All failure modes of all components must be in the model
%    \item \textbf{Guidelines}    To system architectures and development processes
% \end{itemize}
FMEDA is a modern extension of FMEA, in that it recognises the effect of
self checking features on safety, and provides detailed recommendations for computer/software architecture.
%
%
%
FMEDA is the fundamental methodology of the statistical (safety integrity level)
type standards (EN61508/IEC61508).
The end result of an EN61508 analysis is an
overall `level~of~safety', known as the Safety Integrity Level (SIL), assigned to an installed system.
%
This is a simple final result: a SIL from 1 to 4 (where 4 is safest).
%
These SIL levels are broadly linked to the concept of an
acceptance of given probabilities of dangerous
failures against time, as shown in table~\ref{tbl:sil_levels}.
%
The philosophy behind this recognises that no system can have perfect
safety integrity, but that risk and criticality can be matched to acceptable,
or realistic, levels of risk.
%There are currently four SIL `levels', one to four, with four being the highest level.
%
%
SIL levels are intended to
classify the statistical safety of installed plant:
sales terms such as a `SIL~3~sensor', or any other `device' given a SIL level, are therefore meaningless.
%
SIL analysis is concerned with `safety~loops', not individual modules, sensors, computing devices or actuators.
%
In control engineering terms, the safety~loop is the complete
path from sensors to signal~processing to actuators for a given function
in the plant.
%
This entire loop must be designed to detect and deal with any hazards
and have measures in place to reduce their effects.
%
In EN61508 terminology, a safety~loop is known as a Safety Instrumented Function (SIF).
%
\fmmdglossFMEDA
%
 % for four levels of
%safety integrity, referred to as Safety Integrity Levels (SIL).
%For Hardware
%
FMEDA requires
the analyst to consider all hardware components in a system
and requires that an MTTF value is assigned for each {\bc} {\fm};
the MTTF may be statistically mitigated (improved)
if it can be shown that self-checking measures will not only detect the failure mode within the SIF, but
also react in a safe way.
That is, the SIF can recognise that it has a fault condition and can take appropriate action.
%
The failure rate for each component {\fm}, which is the reciprocal of its MTTF under a constant failure rate assumption, is denoted using the symbol `$\lambda$'.
%
\paragraph{SIL and Software.}
EN61508 regulation in relation to software provides procedural quality guidelines and constraints (such as forbidding certain
programming languages and/or features): it does not provide a means to trace failure mode effects in software
or across the software/hardware interface.
%
While procedural guidelines and constraints can improve software reliability, ensuring that reliability targets, for software,
are actually met for given SIL levels is currently almost impossible~\cite{silsandsoftware}.
\fmmdglossFMEDA

%\subsection{ FMEDA - Failure Modes Effects and Diagnostic Analysis}
\label{sec:FMEDA}
\textbf{Failure Mode Classifications and metrics in FMEDA.}
 \begin{itemize}
  \item \textbf{Safe or Dangerous.}   Failure modes are classified SAFE or DANGEROUS.
    \item \textbf{Detectable failure modes.}   Failure modes are given the attribute DETECTABLE or UNDETECTABLE.
    \item \textbf{Four attributes for FMEDA Failure Modes.}    All failure modes may thus be Safe Detected (SD), Safe Undetected (SU), Dangerous Detected (DD) or Dangerous Undetected (DU).
   \item \textbf{Four statistical properties of a system.}  The failure rates for each of the four classifications of failure mode are summed:  \\
$ \sum \lambda_{SD}$, $\sum \lambda_{SU}$, $\sum \lambda_{DD}$, $\sum \lambda_{DU}$.
\end{itemize}
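The classification and summation steps above can be sketched as follows. The failure modes, their
failure rates (in FIT) and their safe/dangerous and detected/undetected attributes are hypothetical;
in a real FMEDA these attributes come from the analysis of each {\bc} {\fm} within the SIF.

```python
from collections import defaultdict

# Hypothetical base component failure modes:
# (name, failure rate in FIT, safe?, detected?)
FAILURE_MODES = [
    ("R1 open", 2.0, True, True),
    ("R1 short", 0.5, False, True),
    ("op-amp output high", 10.0, False, False),
    ("op-amp output low", 10.0, True, False),
    ("relay contacts weld", 5.0, False, True),
]


def fmeda_sums(modes):
    """Sum the failure rates into the SD, SU, DD and DU classifications."""
    sums = defaultdict(float)
    for _name, rate, safe, detected in modes:
        key = ("S" if safe else "D") + ("D" if detected else "U")
        sums[key] += rate
    return {k: sums[k] for k in ("SD", "SU", "DD", "DU")}


if __name__ == "__main__":
    print(fmeda_sums(FAILURE_MODES))
```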

% Failure modes are classified as Safe or Dangerous according
% to the putative system level failure they will cause.
% The Failure modes are also classified as Detected or
% Undetected.
% This gives us four level failure mode classifications:
% Safe-Detected (SD), Safe-Undetected (SU), Dangerous-Detected (DD) or Dangerous-Undetected (DU),
% and the probabilistic failure rate of each classification
% is represented by lambda variables
% (i.e. $\lambda_{SD}$, $\lambda_{SU}$, $\lambda_{DD}$, $\lambda_{DU}$).


%\subsection{ FMEDA - Failure Modes Effects and Diagnostic Analysis}

\textbf{Diagnostic Coverage.}
The diagnostic coverage (DC) is the ratio
of the dangerous detected failure rate
to the failure rate of all dangerous failures,
and is normally expressed as a percentage~\cite{en61508}[2-Annex C].
%
Here $\Sigma\lambda_{DD}$ is the sum of the failure rates of the
dangerous detected base component failure modes, and
$\Sigma\lambda_D$ the sum of the failure rates of all dangerous base component failure modes
($\Sigma\lambda_D = \Sigma\lambda_{DD} + \Sigma\lambda_{DU}$):
%
$$ DC = \frac{\Sigma\lambda_{DD}}{\Sigma\lambda_D} . $$
\fmmdglossFMEDA
%
%
%
%\subsection{ FMEDA - Failure Modes Effects and Diagnostic Analysis}
The \textbf{diagnostic coverage} for safe failures, where $\Sigma\lambda_{SD}$ is the sum of the failure rates of the
safe detected base component failure modes
and $\Sigma\lambda_S$ the sum of the failure rates of all safe base component failure modes
($\Sigma\lambda_S = \Sigma\lambda_{SD} + \Sigma\lambda_{SU}$),
is given as
%
$$ SF = \frac{\Sigma\lambda_{SD}}{\Sigma\lambda_S} . $$
%
%
%
%\subsection{ FMEDA - Failure Modes Effects and Diagnostic Analysis}
\textbf{Safe Failure Fraction.}
A key concept in FMEDA is the Safe Failure Fraction (SFF).
This is the ratio of the safe and dangerous detected failure rates
to the total of the safe and dangerous failure rates.
Again, this is usually expressed as a percentage:
%
$$ SFF = \big( \Sigma\lambda_S + \Sigma\lambda_{DD} \big) / \big( \Sigma\lambda_S + \Sigma\lambda_D \big) . $$
%
SFF determines how proportionately fail-safe a system is, not how reliable it is.
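To illustrate, take hypothetical summed failure rates of $\Sigma\lambda_S = 30$ FIT, $\Sigma\lambda_{DD} = 60$ FIT and $\Sigma\lambda_{DU} = 10$ FIT (so $\Sigma\lambda_D = 70$ FIT), chosen only to show the arithmetic:
$$ SFF = \frac{30 + 60}{30 + 70} = \frac{90}{100} = 90\% . $$
Only the dangerous undetected share, $10/100$, counts against the SFF. Scaling every failure rate by the same factor leaves the SFF unchanged, which is why it says nothing about reliability.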
%
A weakness in this philosophy is that adding extra safe failure modes (even unused ones)
improves the apparent SFF\footnote{This loophole, the artificial inflation of SFF
by including unnecessary safe functions or unused components,
is closed in the 2010 edition of the standard.}.
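A hypothetical example shows the effect. If a design has $\Sigma\lambda_S = 30$ FIT, $\Sigma\lambda_{DD} = 60$ FIT and $\Sigma\lambda_{DU} = 10$ FIT, its SFF is $90/100 = 90\%$. Adding an unused component that contributes a further 100 FIT of safe failures, while leaving the dangerous failures unchanged, gives
$$ SFF = \frac{(30 + 100) + 60}{(30 + 100) + 70} = \frac{190}{200} = 95\% , $$
an apparent improvement even though the dangerous undetected failure rate is still 10 FIT.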
\fmmdglossFMEDA
%
%
%
\subsection{ FMEDA - Failure Modes Effects and Diagnostic Analysis}
To achieve a given SIL, diagnostic coverage and SFF levels are prescribed, along with
hardware architectures and software techniques.
The overall aim of SIL is to classify the safety of a system
by statistically determining how frequently it can fail dangerously.
\fmmdglossFMEDA
%
%
%\subsection{ FMEDA - Failure Modes Effects and Diagnostic Analysis}
%\subsection{ FMEDA - Failure Modes Effects and Diagnostic Analysis}
%FMEA can be used as a term simple to mean Failure Mode Effects Analysis, and is
%part of product approval for many regulated products in the EU and the USA...
%
\section{FMEA used for Safety Critical Approvals}
\fmmdglossDFMEA
\subsection{DESIGN FMEA: Safety Critical Approvals FMEA}
% \begin{figure}[h]
%  \centering
%  \includegraphics[width=300pt,keepaspectratio=true]{./CH2_FMEA/tech_meeting.png}
%  % tech_meeting.png: 350x299 pixel, 300dpi, 2.97x2.53 cm, bb=0 0 84 72
%  \caption{FMEA  Meeting}
%  \label{fig:tech_meeting}
% \end{figure}
%Static FMEA, Design FMEA, Approvals FMEA
%
Experts from the approval house and the equipment manufacturer
discuss selected component failure modes
judged to lie in critical sections of the product.
%
This can be considered a design check method, deliberately
looking for weaknesses at a theoretical level.
%
%\subsection{DESIGN FMEA: Safety Critical Approvals FMEA}
%
% \begin{figure}[h]
%  \centering
%  \includegraphics[width=70pt,keepaspectratio=true]{./tech_meeting.png}
%  % tech_meeting.png: 350x299 pixel, 300dpi, 2.97x2.53 cm, bb=0 0 84 72
%  \caption{FMEA  Meeting}
%  \label{fig:tech_meeting}
% \end{figure}
%
\begin{itemize}
   \item It is impossible to examine all component failures, let alone apply FMEA exhaustively or rigorously,
   \item In practice, failure scenarios for critical sections are contested, and are either justified or addressed by implementing extra safety measures,
    \item The output is often meeting notes or minutes only: detailed technical arguments are rarely documented.
\end{itemize}
%
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% SFMEA????
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\section{Conclusion}
\begin{figure}[h]
 \centering
 \includegraphics[width=400pt]{./CH2_FMEA/component_fm_rel_ana_subj_obj.png}
 % component_fm_rel_ana_subj_obj.png: 694x303 pixel, 72dpi, 24.48x10.69 cm, bb=0 0 694 303
 \caption{FMEA UML data representation with subjective system level failure modes.}
 \label{fig:component_fm_rel_ana_subj_obj}
\end{figure}
%
Returning to the FMEA model, the data relationships shown in
figure~\ref{fig:component_fm_rel_ana} hold for the five variants of FMEA discussed.
%
This model can be extended if the system level symptoms are considered to have subjective
interpretations.
%
With the addition of subjective failure mode symptoms, the UML model for FMEA gains an attribute
(see figure~\ref{fig:component_fm_rel_ana_subj_obj}).
%
The UML data model reveals some aspects of FMEA that are left undefined.
These raise the questions discussed below.
%
\paragraph{Which, or how many, components should be checked for each {\fm} entry?}
For instance, a given {\fm} will have its effect assessed in relation
to some of the components in the system.
%
These components can be chosen according to several criteria,
based on the signal path or on adjacency in the electronic circuit.
Potential strategies are listed below:
%
\begin{itemize}
 \item Look at all components electronically adjacent (i.e. connected to the affected component),
 \item Look at all components connected (as above) and those one connection removed (i.e. those connected to the components connected to the affected component),
 \item Look at components forward of the {\fm} in the signal path,
 \item Look at all components in the signal path,
 \item Look at all components in the signal path including those one connection removed,
% dependency tree is a logical construct.
 \item Look at all components within pre-determined dependency models~\cite{cbds}[Ch.5],
 \item Look at all components in the system (i.e. XFMEA).
\end{itemize}
No current variant of FMEA gives any guidelines for which, or how many components to check for a given {\fm}.
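The adjacency strategies, and the exhaustive XFMEA case, can be stated more precisely. As a sketch using notation introduced here only for illustration, model the circuit as a simple undirected graph $G = (C, E)$, where $C$ is the set of components and $\{c, c'\} \in E$ when components $c$ and $c'$ are directly connected. For a {\fm} of component $c$, the candidate sets of components to check are then
$$ N_1(c) = \{ c' \in C \mid \{c, c'\} \in E \}, \qquad
   N_2(c) = \Big( N_1(c) \cup \bigcup_{c' \in N_1(c)} N_1(c') \Big) \setminus \{c\}, \qquad
   N_X(c) = C \setminus \{c\} . $$
If each of the $|C|$ components had, hypothetically, $K$ failure modes, the exhaustive strategy $N_X$ would require $K\,|C|\,(|C| - 1)$ checks, growing with the square of the number of components, whereas the adjacent strategy $N_1$ requires $K \sum_{c \in C} |N_1(c)| = 2K|E|$ checks, proportional to the number of connections.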
\fmmdglossRD
\paragraph{FMEA gives us objective system level failures/symptoms.} %, what do we do with subjective or contextual failures resulting from this?}
%
The two more modern variants of FMEA, FMECA and FMEDA, start to address the problem of subjective/contextual
failure symptoms of a system.
%
FMEDA classifies them as dangerous or safe failures.
%
FMECA gives us a statistically biased criticality level.
%
In both of these methodologies, however, there is no formal stage at which objective system failures are mapped
to subjective ones; this process appears to be intertwined with the basic analysis itself.
%
%
\paragraph{Re-use potential of an FMEA report.}
%
Each {\fm} entry in an FMEA report should have a reasoning or comments field.
This should guide anyone re-examining the analysis, or trying to re-use its results
on a similar project.
However, %, as with the components that we should check against a {\fm},
%there are no guidelines for documenting
in practice the depth of description given for the reasoning stages in FMEA entries varies.
%FMEA does not stipulat which
Ideally, each FMEA entry would contain a reasoning description
for its {\fm},
so that the entry can be more easily reviewed, revisited or audited. % than a traditional FMEA report.
%
Because FMEA is traditionally performed with one entry per component {\fm}, and so produces a large number of entries, full reasoning descriptions
are rare.
This means that re-use, review and checking of a traditional analysis must often be started from `cold'.
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%