
\ifthenelse {\boolean{paper}}
{
\abstract{ This
paper
describes how the FMMD methodology can be used to refine
safety critical designs and identify undetectable and dormant faults.
%
Once undetectable or dormant faults are discovered,
the design can be altered (or have a safety component added), and the FMMD analysis process re-applied.
This can be an iterative process applied until the
design has an acceptable level of safety. % of dormant or undetectable failure modes.
%
Used in this way, it is a design aid, giving the user
the possibility to refine/correct a {\dc} from the perspective
of its failure mode behaviour.
}
}
{
\section{Introduction}
This chapter
describes how the FMMD methodology can be used to examine
safety critical designs and identify undetectable and dormant faults.
%
Once undetectable or dormant faults are discovered,
the design can be altered (or have a safety component added), and the FMMD analysis process re-applied.
This can be an iterative process, applied until the
design has an acceptable level of safety. % dormant or undetectable failure modes.
%
Used in this way, it is a design aid, giving the user
the possibility to refine/correct a {\dc} from the perspective
of its failure mode behaviour.
}


\section{How FMMD Analysis can reveal design flaws w.r.t. failure behaviour }

\paragraph{Overview of FMMD Methodology}
FMMD analysis is a four stage process:
components are collected into {\fg}s;
each {\fg} is analysed w.r.t. its failure mode behaviour;
the failure mode behaviour is then viewed from the
{\fg} perspective (i.e. as symptoms of the {\fg});
and common symptoms are then collected.

%
%From the failure mode behaviour of the {\fg} common symptoms are collected.
These common symptoms are % in effect
the failure mode behaviour of
the {\fg} viewed as an % single
entity, or a `black box' component.
%
From the analysis of the {\fg} we can create a {\dc}, where the failure modes
are the symptoms of the {\fg} we derived it from.
%
\paragraph{detectable and undetectable failure modes}
The symptoms will be detectable (like a value out of range)
or undetectable (like a logic state or value being incorrect).
The `undetectable' failure modes are the most worrying for the safety critical designer.
%It is these that are, generally the ones that stand out as single
%failure modes.
For instance, out of range values are easy to detect by
systems using the {\dc} supplying them.
Undetectable faults are ones that forward incorrect information
which we have no way of validating or testing.
% we know we can cope with; they
%are an obvious error condition that will be detected by any modules
%using the {\dc}.
%
An undetectable failure mode can introduce serious
errors into a SYSTEM.


\paragraph{dormant faults} A dormant fault is one
which can manifest itself in conjunction with
another failure mode becoming active, or an environmental
condition changing (for instance temperature). Some
component failure modes may lead to dormant failure modes.
By examining test cases from a functional group against all
input conditions and germane environmental conditions
we can determine all the failure modes of the {\fg}.

\subsection{Iterative Design Example}

By applying FMMD analysis to a {\fg} we can determine which failure
modes of a {\dc} are undetectable or dormant.
We can then either modify the circuit and iteratively
apply FMMD to the design again, or we could add another {\fg}
that specifically tests for the undetectable/dormant conditions.

This
\ifthenelse {\boolean{paper}}
{
paper
}
{
chapter
}
describes a milli-volt amplifier (see R18 in figure \ref{fig:mv1}), with an inbuilt safety\footnote{The `safety resistor' also acts
as a potential divider to provide a milli-volt offset. An offset is often required to allow for negative readings from the
milli-volt source.}
resistor. The circuit is analysed and it is found that all but one of the resulting failure symptoms
are detectable.
We then design a circuit to test for the `undetectable' failure mode
and analyse this with FMMD.
We then use both {\dcs} to form a {\fg} which we can call our `self testing milli-volt amplifier'.
We then analyse the {\fg} and discuss the failure modes/symptoms of the resultant {\dc}.
\section{An example: A Millivolt Amplifier}

\begin{figure}[h]
 \centering
 \includegraphics[width=200pt,bb=0 0 678 690,keepaspectratio=true]{./fmmd_design_aide/mv_opamp_circuit.png}
 % mv_opamp_circuit.png: 678x690 pixel, 72dpi, 23.92x24.34 cm, bb=0 0 678 690
 \caption{Milli-Volt Amplifier with Safety/Offset Resistor}
 \label{fig:mv1}
\end{figure}

\subsection{Brief Circuit Description}

This circuit amplifies a milli-volt input by a gain of $\approx 184$ ($\frac{150E3}{820}+1$).
An offset is applied to the input by R18 and R22 forming a potential divider
of $\frac{820}{2.2E6+820}$. With 5V applied as Vcc this gives an input offset of $1.86\,mV$,
so the amplified offset is $\approx 342 \, mV$. We can determine the amplified milli-volt source signal
by subtracting this amount from the reading. We can also define an acceptable
range for the readings. This would depend on the characteristics of the milli-volt source, and also on the
thresholds of the voltages considered out of range. For the sake of example let us
consider this to be a type K thermocouple amplifier, with a range of temperatures
expected to be within {{0}\oc} and {{300}\oc}.
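
The figures above can be written out as a short worked calculation. The symbols $G$ and $V_{off}$ are introduced here purely for illustration and are not labels from the circuit diagram; the component values are those quoted in the text.
\begin{eqnarray*}
G & = & \frac{150 \times 10^{3}}{820} + 1 \approx 184 \\
V_{off} & = & 5\,\mathrm{V} \times \frac{820}{2.2 \times 10^{6} + 820} \approx 1.86\,\mathrm{mV} \\
G \times V_{off} & \approx & 184 \times 1.86\,\mathrm{mV} \approx 342\,\mathrm{mV}
\end{eqnarray*}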

% TODO: EXPAND

\section{FMMD Analysis}


\begin{table}[h]
\caption{Milli Volt Amplifier Single Fault FMMD} % title of Table
\centering % used for centering table
\begin{tabular}{||l|c|l|c||}
\hline \hline
 \textbf{Test} & \textbf{Failure } & \textbf{Symptom } & \textbf{MTTF}   \\
 \textbf{Case} &  \textbf{mode} & \textbf{       } &    \\ % \textbf{per $10^9$ hours of operation}   \\
%   R         &    wire        & res +         & res -    & description
\hline
\hline
TC:1 $R18$ SHORT    &  Amp plus input high        &  Out of range       &  1.38  \\ \hline
TC:2 $R18$ OPEN     &  No offset voltage         & \textbf{Low reading}     &  12.42\\ \hline
 \hline
TC:3 $R22$ SHORT    &  No offset voltage       & \textbf{Low reading}      & 1.38  \\ \hline
TC:4 $R22$ OPEN     &  Amp plus input high       & Out of range      & 1.38  \\ \hline
\hline
TC:5 $R26$ SHORT     &  No gain from amp         &  Out of range   &  1.38  \\ \hline
TC:6 $R26$ OPEN     &  Very high amp gain     & Out of range     & 12.42 \\ \hline
\hline
TC:7 $R30$ SHORT     &  Very high amp gain         &  Out of range   &  1.38  \\ \hline
TC:8 $R30$ OPEN     &  No gain from amp            &  Out of range     & 12.42 \\ \hline
\hline
TC:9 $OP\_AMP$ LATCH UP     &   High amp output         &  Out of range   &  1.38  \\ \hline
TC:10 $OP\_AMP$ LATCH DOWN     &  Low amp output            &  Out of range     & 12.42 \\ \hline

\end{tabular}
\label{tab:fmmdaide1}
\end{table}


Given the components R18, R22, R26, R30 and IC1, this analysis process has derived
the component `milli-volt amplifier' with two failure modes, `Out of range' and
`Low reading'.
We can represent this in an FMMD hierarchy diagram, see figure \ref{fig:mvamp_fmmd}.

\begin{figure}[h]
 \centering
 \includegraphics[width=200pt,keepaspectratio=true]{./fmmd_design_aide/mvamp_fmmd.jpg}
 % mvamp_fmmd.jpg: 281x344 pixel, 72dpi, 9.91x12.14 cm, bb=0 0 281 344
 \caption{FMMD analysis Hierarchy for Milli-Volt Amplifier}
 \label{fig:mvamp_fmmd}
\end{figure}

Table \ref{tab:fmmdaide1} shows two possible causes for an undetectable
error, that of a low reading due to the loss of the offset millivolt signal.
Typically this type of circuit would be used to read a thermocouple,
and this error symptom, `low\_reading', would mean our plant could
believe that the temperature reading is lower than it actually is.
To take an example from a K type thermocouple, the offset of 1.86mV
%from the potential divider represents amplified to
would represent  $\approx \; 46\,^{\circ}{\rm C}$ \cite{eurothermtables} \cite{aoe}.
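
As a rough cross-check of this figure, assume a nominal type K sensitivity of about $41\,\mu\mathrm{V}$ per degree Celsius near room temperature and a reference junction at $0\,^{\circ}{\rm C}$. The sensitivity is a commonly quoted approximation, not a value taken from the cited tables.
\[
\frac{1.86\,\mathrm{mV}}{41\,\mu\mathrm{V}/^{\circ}{\rm C}} \approx 45\,^{\circ}{\rm C}
\]
This linear estimate is in line with the table-derived figure; the thermocouple tables should be preferred where accuracy matters.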

\clearpage
\subsection{Undetected Failure Mode: Incorrect Reading}

Although statistically this failure is unlikely, % TODO: get stats for R short FIT etc from pt100 doc
if the reading is considered critical, or we are aiming for a high integrity level,
this may be unacceptable.
We will need to add some type of detection mechanism to the circuit to
test $R_{off}$ periodically.
For instance, were we to check $R_{off}$ every $\tau = 20\,ms$, we could work out the detection
allowance according to EN61508.


\section{Proposed Checking Method}

Were we able to switch a second resistor in series with the
820R resistor (R22) and switch it out again, we could test
that the safety resistor (R18) is still functioning correctly.

With the new resistor switched in we would expect
the voltage added by the potential divider
to increase.

The circuit in figure \ref{fig:mvamp2} shows a FET
controlled by the `test line' connection, which can switch in the resistor R36,
also with a value of \ohms{820}.

We could detect the effect on the reading with the potential divider
as follows.

%% check figures
With R36 switched in, the potential divider is $\frac{820R+820R}{2M2+820R+820R}$; over 5V this gives
3.724mV, which amplified by 184 is 0.685V \adcten{140}.
%
With the second resistor
switched out, the potential divider is $\frac{820R}{2M2+820R}$; over 5V this gives 1.86mV,
which amplified by 184 is 0.342V \adcten{70}.

This is a difference of \adcten{70} in the readings.
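
The ADC figures above can be reproduced as follows. This assumes a 10 bit converter with a 5V reference, which is not stated in the text but is suggested by the ten bit readings quoted. The symbols $N_{test}$ and $N_{normal}$ are introduced here for illustration only.
\begin{eqnarray*}
N_{test} & = & \frac{0.685\,\mathrm{V}}{5\,\mathrm{V}} \times 2^{10} \approx 140 \\
N_{normal} & = & \frac{0.342\,\mathrm{V}}{5\,\mathrm{V}} \times 2^{10} \approx 70 \\
N_{test} - N_{normal} & \approx & 70
\end{eqnarray*}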

So periodically, perhaps even as frequently as once every few seconds,
we can apply the checking resistor and look for a corresponding
change in the reading.

Let us analyse this in more detail to prove that we are indeed checking for
the failure of the safety resistor, and that we are not introducing
any new problems.

First let us look at the new transistor and resistor and
treat these as a functional group.
In our analysis of the failure modes we have to consider
both states of the transistor, ON and OFF.

\begin{figure}[h]
 \centering
 \includegraphics[width=200pt,keepaspectratio=true]{./fmmd_design_aide/mv_opamp_circuit2.png}
 % mv_opamp_circuit2.png: 577x479 pixel, 72dpi, 20.35x16.90 cm, bb=0 0 577 479
 \caption{Amplifier with check circuit}
 \label{fig:mvamp2}
\end{figure}


\section{FMMD analysis of Safety Addition}


This test circuit has two operational states, in that it
can be switched on to apply the test series resistance, and
off to obtain the correct reading.
%
We must examine each test case from these two perspectives.
For $\overline{TEST\_LINE}$ ON the transistor is turned OFF
and we are in a test mode and expect the reading to go up by around \adcten{70}.
For $\overline{TEST\_LINE}$ OFF the transistor is ON and R36 is by-passed,
and the reading is assumed to be valid.

\begin{table}[h]
\caption{Test Addition Single Fault FMMD} % title of Table
\centering % used for centering table
\begin{tabular}{||l|l|c|l|c||}
\hline \hline
 \textbf{test line } & \textbf{Test} & \textbf{Failure } & \textbf{Symptom } & \textbf{MTTF}   \\
 \textbf{status}     & \textbf{Case} &  \textbf{mode} & \textbf{       } &    \\ % \textbf{per $10^9$ hours of operation}   \\
%   R         &    wire        & res +         & res -    & description
\hline
\hline
%% OK TR1 OFF , and so 36 in series. R36 has shorted so
$\overline{TEST\_LINE}$ ON  & TC:1 $R36$ SHORT     & No added resistance               &  NO TEST EFFECT           & XX 1.38  \\ \hline
%%
$\overline{TEST\_LINE}$ OFF  & TC:1 $R36$ SHORT      & dormant failure        &  NO SYMPTOM       & XX 1.38  \\ \hline
%% here TR1 should be OFF, as R36 is open we now have an open circuit
$\overline{TEST\_LINE}$ ON  & TC:2 $R36$  OPEN      & open circuit              &  OPEN CIRCUIT               & XX 12.42\\ \hline
%% here TR1 should be ON and R36 by-passed, the fact it has gone OPEN means no symptom here, a dormant failure.
$\overline{TEST\_LINE}$ OFF  & TC:2 $R36$  OPEN     & dormant failure              &  NO SYMPTOM               & XX 12.42\\ \hline
 \hline
%
%% TR1 OFF so R36 should be in series. Because TR1 is ON because it is faulty, R36 is not in series
$\overline{TEST\_LINE}$ ON & TC:3   $TR1$  ALWAYS ON    &  No added resistance   &  NO TEST EFFECT      & XX 1.38  \\ \hline
%%
%% TR1 ON R36 should be bypassed by TR1, and it is, but as TR1 is always on we have a dormant failure.
$\overline{TEST\_LINE}$ OFF & TC:3   $TR1$  ALWAYS ON    &  dormant failure    &   NO SYMPTOM & XX 1.38  \\ \hline
%%
%% TR1 should be off as overline{TEST\_LINE}$ is ON. As TR1 is faulty it is always off and we have a dormant failure.
$\overline{TEST\_LINE}$ ON & TC:4   $TR1$  ALWAYS OFF    &  dormant failure    &   NO SYMPTOM   & XX 1.38  \\ \hline
%%
%% TR1 should be ON, but is off due to TR1 failure. The resistance R36 will always be in series therefore
$\overline{TEST\_LINE}$ OFF & TC:4 $TR1$  ALWAYS OFF    & resistance always added       &    NO TEST EFFECT   & XX 1.38  \\ \hline
\hline
\end{tabular}
\label{tab:testaddition}
\end{table}

\subsection{Test Cases Analysis in detail}

The purpose of this circuit is to switch a resistance in when we want to test the circuit
and to switch it out for normal operation.
The control is provided by a line called $\overline{TEST\_LINE}$.
Thus to apply the test conditions we set $\overline{TEST\_LINE}$ to ON or true,
and for normal operation we set it to OFF or false.

\subsubsection{TC 1}
This test case looks at the shorted resistor failure mode of R36.
\paragraph{$\overline{TEST\_LINE}$ ON}
Here TR1 should be off and R36 should be in series. As R36 is shorted, this means that
no resistance will be contributed to the circuit by R36.
In the terms of the behaviour
of the functional group, this means that it will provide no test effect.
\paragraph{$\overline{TEST\_LINE}$ OFF}
Here TR1 will be on and by-pass R36, so it does not make any difference if
R36 is shorted. This is a dormant failure; we can only detect this failure
when $\overline{TEST\_LINE}$ is ON.


\subsubsection{TC 2}
This test case looks at the open circuit resistor failure mode of R36.
\paragraph{$\overline{TEST\_LINE}$ ON}
Here TR1 should be off and R36 should be in series. As R36 is open, this means that
the test circuit is now open.
In the terms of the behaviour
of the functional group, this means that it will cause an open circuit failure.
\paragraph{$\overline{TEST\_LINE}$ OFF}
Here TR1 will be on and by-pass R36, so it does not make any difference if
R36 is open. This is a dormant failure; we can only detect this failure
when $\overline{TEST\_LINE}$ is ON.


\subsubsection{TC 3}
This test case looks at the transistor failure mode where TR1 is always ON.
\footnote{The transistor is being used as a switch, and so we can model it as having two failure modes ALWAYS ON or ALWAYS OFF.}
\paragraph{$\overline{TEST\_LINE}$ ON}
Here TR1 should be off and R36 should be in series. As TR1 is always ON, this means that
R36 will always be by-passed. Thus there will be no test effect.
\paragraph{$\overline{TEST\_LINE}$ OFF}
Here TR1 should be on and by-pass R36. As TR1 is always ON, the circuit
behaves as expected in this state.
This is a dormant failure; we can only detect this failure
when $\overline{TEST\_LINE}$ is ON.

\subsubsection{TC 4}
This test case looks at the transistor failure mode where TR1 is always OFF.
\paragraph{$\overline{TEST\_LINE}$ ON}
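Here TR1 should be OFF and R36 should be in series. As TR1 is always OFF, the circuit
behaves as expected in this state.
This is a dormant failure; we can only detect this failure
when $\overline{TEST\_LINE}$ is OFF.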
\paragraph{$\overline{TEST\_LINE}$ OFF}
Here TR1 should be ON, but is OFF due to failure.
The resistance R36 will always be in series.
As a symptom for this circuit, it means that there would be no test effect.


\subsection{Conclusion of FMMD analysis on safety addition}

From the FMMD analysis in table \ref{tab:testaddition}, the derived component has two failure modes:
`no~test~effect' and `open~circuit'.
%~out~of~range'.

The next stage is to combine the two derived components we have made into
a functional group.

\section{FMMD Hierarchy, with milli-volt amp and safety addition}

We take the two derived components
and place them into a functional group.
We can now analyse this functional
group w.r.t. the failure modes in the two derived components.

\vspace{20pt}
% TODO: Draw FMMD hierarchy diagram.
\vspace{20pt}

\subsection{Analysis of FMMD Derived component `added safety milli-volt amp'}


\section{Conclusions}

With the safety addition, reliability goes down, but safety goes up.
% TODO: work it out

We now have additional failure modes, so the reliability
of the `self testing' circuit is lower than that of the basic one.
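
The loss of reliability can be sketched under two simplifying assumptions that are not established in the text: constant component failure rates, and any single component failure counting as a failure of the circuit. The symbols $\lambda_{amp}$, $\lambda_{R36}$ and $\lambda_{TR1}$ are hypothetical names for the failure rates of the basic amplifier and of the two added components.
\begin{eqnarray*}
\lambda_{\mathrm{self\,test}} & = & \lambda_{amp} + \lambda_{R36} + \lambda_{TR1} \; > \; \lambda_{amp} \\
MTTF_{\mathrm{self\,test}} & = & \frac{1}{\lambda_{\mathrm{self\,test}}} \; < \; \frac{1}{\lambda_{amp}} = MTTF_{amp}
\end{eqnarray*}
No figures are substituted here, since the failure rates for the test addition are still marked as placeholders in table \ref{tab:testaddition}.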