\ifthenelse {\boolean{paper}}
{
\abstract{
This paper describes how the FMMD methodology can be used to refine
safety critical designs and identify undetectable and dormant faults.
%
Once undetectable or dormant faults are discovered,
the design can be altered (or have a safety component added) and the FMMD analysis process re-applied.
This can be an iterative process, applied until the design has an acceptable level of safety.
% of dormant or undetectable failure modes.
%
Used in this way, it is a design aid, giving the user the possibility to refine/correct
a {\dc} from the perspective of its failure mode behaviour.
}
}
{
\section{Introduction}
This chapter describes how the FMMD methodology can be used to examine
safety critical designs and identify undetectable and dormant faults.
%
Once undetectable or dormant faults are discovered,
the design can be altered (or have a safety component added) and the FMMD analysis process re-applied.
This can be an iterative process, applied until the design has an acceptable level of safety.
% dormant or undetectable failure modes.
%
Used in this way, it is a design aid, giving the user the possibility to refine/correct
a {\dc} from the perspective of its failure mode behaviour.
}

\section{How FMMD Analysis can reveal design flaws w.r.t. failure behaviour}

\ifthenelse {\boolean{paper}}
{
\paragraph{Overview of FMMD Methodology}
FMMD analysis is a five stage process.
Components are collected into {\fg}s, which are analysed w.r.t. their failure mode behaviour.
The failure mode behaviour is then viewed from the {\fg} perspective (i.e. as symptoms of the {\fg}),
and common symptoms are collected.
The final stage is to create a {\dc} which has the symptoms of the {\fg} it was sourced from as its failure modes.
}
{
\paragraph{Overview of FMMD Methodology}
To recap from chapter \ref{symptomex}, FMMD analysis is a five stage process.
Components are collected into {\fg}s, which are analysed w.r.t. their failure mode behaviour.
The failure mode behaviour is then viewed from the {\fg} perspective (i.e. as symptoms of the {\fg}),
and common symptoms are collected.
The final stage is to create a {\dc} which has the symptoms of the {\fg} it was sourced from as its failure modes.
}
%
%From the failure mode behaviour of the {\fg} common symptoms are collected. These common symptoms are
% in effect the failure mode behaviour of the {\fg} viewed as an
% single entity, or a `black box' component.
% From the analysis of the {\fg} we can create a {\dc}, where the failure modes are the symptoms of the {\fg} we derived it from.
%

\paragraph{Detectable and undetectable failure modes}
The symptoms will be detectable (like a value out of range)
or undetectable (like a logic state or value being incorrect).
The `undetectable' failure modes are, understandably, the most worrying for the safety critical designer.
EN61508, the statistically based European Norm, uses ratios of detected and undetected system failure modes
to classify a system's safety level, and describes sub-classifications
for detected and undetected failure modes~\cite{en61508}.
%It is these that are, generally the ones that stand out as single
%failure modes.
For instance, out of range values are easy to detect by systems using the {\dc} supplying them.
Undetectable faults are ones that forward incorrect information
where we have no way of validating or testing it.
% we know we can cope with; they
%are an obvious error condition that will be detected by any modules
%using the {\dc}.
%
An undetectable failure mode can introduce serious errors into a system.

\paragraph{Dormant faults}
A dormant fault is one which can manifest itself in conjunction with
another failure mode becoming active, or an environmental condition changing (for instance temperature).
Some component failure modes may lead to dormant failure modes.
By examining test cases from a functional group against all operational states
and germane environmental conditions we can determine all the failure modes of the {\fg}.

\subsection{Iterative Design Example}

By applying FMMD analysis to a {\fg} we can determine which failure
modes of a {\dc} are undetectable or dormant.
We can then either modify the circuit and iteratively apply
FMMD to the design again, or add another {\fg}
that specifically tests for the undetectable/dormant conditions.

This
\ifthenelse {\boolean{paper}}
{
paper
}
{
chapter
}
describes a milli-volt amplifier (see figure \ref{fig:mv1}), with an inbuilt safety\footnote{The `safety resistor' also acts as a potential divider to provide a milli-volt offset.
An offset is often required to allow for negative readings from the milli-volt source.}
resistor (R18).
The circuit is analysed and it is found that all but one of its derived failure modes are detectable.
We then design a circuit to test for the `undetectable' failure mode
and analyse this with FMMD.
The test circuit addition can now be represented by a {\dc}.
We then use both {\dcs} to form a {\fg}, which we can call our `self testing milli-volt amplifier'.
Finally we analyse this {\fg}, and the failure modes/symptoms of the resultant {\dc} are discussed.

\section{An example: A Millivolt Amplifier}

\begin{figure}[h]
 \centering
 \includegraphics[width=200pt,bb=0 0 678 690,keepaspectratio=true]{./fmmd_design_aide/mv_opamp_circuit.png}
 % mv_opamp_circuit.png: 678x690 pixel, 72dpi, 23.92x24.34 cm, bb=0 0 678 690
 \caption{Milli-Volt Amplifier with Safety/Offset Resistor}
 \label{fig:mv1}
\end{figure}

\subsection{Brief Circuit Description}

This circuit amplifies a milli-volt input with a gain of $\approx$ 184 ($\frac{150E3}{820}+1$)%
\footnote{The resistors used to program the gain of the op-amp would typically be of a $ \le 1\%$ guaranteed tolerance.
In practice, the small variations would be corrected with software constants programmed during production
test/calibration.}.
An offset is applied to the input by R18 and R22 forming a potential divider
of $\frac{820}{2.2E6+820}$.
With 5V applied as Vcc this gives an input offset of $1.86\,mV$.
Once amplified, this offset can be termed a $\Delta V$, an addition to the amplified mV value provided by the sensor.
The amplified offset is $\approx 342 \, mV$.
We can determine the amplified sensor value by subtracting this amount from the reading.

We can also define an acceptable range for the readings.
This would depend on the characteristics of the milli-volt source, and also on the
thresholds of the voltages considered out of range.
For the sake of example let us consider this to be a type K thermocouple amplifier, with
a range of temperatures expected to be within {{0}\oc} and {{300}\oc}.

\paragraph{Voltage range for {{0}\oc} to {{300}\oc}.}
Choosing the common Nickel-Chromium v. Nickel-Aluminium `K' type thermocouple,
{{0}\oc} provides an EMF of 0mV, and {{300}\oc} an EMF of 12.207mV.
Adding the 1.86mV offset to each of these and multiplying by 184 gives 342.24mV and 2588.33mV.
This is a suitable range to be read by an analogue to digital converter (ADC), which will typically have
a voltage span between 3.3V and 5V on modern microcontrollers/ADC chips.
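
The gain and offset figures quoted in the circuit description can be checked with a short worked calculation.
This is only a sketch: the symbols $G$, $V_{off}$, $V_{tc}$ and $V_{out}$ are introduced here for illustration,
an ideal op-amp is assumed, and the gain is rounded to 184 as in the text.
$$ G = \frac{150 \times 10^{3}}{820} + 1 \approx 183.9 \approx 184 $$
$$ V_{off} = 5\,V \times \frac{820}{2.2 \times 10^{6} + 820} \approx 1.86\,mV $$
Taking the offset to add directly to the thermocouple EMF $V_{tc}$ at the amplifier input, the output is
$$ V_{out} = G \, (V_{tc} + V_{off}) , $$
so that at {{0}\oc}, where $V_{tc} = 0$, we obtain $V_{out} \approx 184 \times 1.86\,mV = 342.24\,mV$.
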
Note that this output range also leaves a margin for error on both sides.
If the thermocouple were to become colder than {{0}\oc} it would supply a negative voltage, which would subtract from the offset.
At around {{-47}\oc} the amplifier output would be zero, but anything under 342.24mV is considered out of range.
Thus the ADC can comfortably read out of range values, and the controlling software can identify them as invalid.
Similarly, anything above the output expected at {{300}\oc} would be considered out of range, but would still be within
comfortable reading range for an ADC.

\section{FMMD Analysis}

\begin{table}[h]
\caption{Milli Volt Amplifier Single Fault FMMD} % title of Table
\centering % used for centering table
\begin{tabular}{||l|c|l|c||}
\hline \hline
 \textbf{Test} & \textbf{Failure } & \textbf{Symptom } & \textbf{MTTF} \\
 \textbf{Case} & \textbf{mode} & \textbf{ } & \\ % \textbf{per $10^9$ hours of operation} \\
% R & wire & res + & res - & description
\hline
\hline
TC:1 $R18$ SHORT & Amp plus input high & Out of range & 1.38 \\ \hline
TC:2 $R18$ OPEN & No offset voltage & \textbf{Low reading} & 12.42\\ \hline
\hline
TC:3 $R22$ SHORT & No offset voltage & \textbf{Low reading} & 1.38 \\ \hline
TC:4 $R22$ OPEN & Amp plus input high & Out of range & 1.38 \\ \hline
\hline
TC:5 $R26$ SHORT & No gain from amp & Out of range & 1.38 \\
TC:6 $R26$ OPEN & Very high amp gain & Out of range & 12.42 \\ \hline
\hline
TC:7 $R30$ SHORT & Very high amp gain & Out of range & 1.38 \\
TC:8 $R30$ OPEN & No gain from amp & Out of range & 12.42 \\ \hline
\hline
TC:9 $OP\_AMP$ LATCH UP & High amp output & Out of range & 1.38 \\
TC:10 $OP\_AMP$ LATCH DOWN & Low amp output & Out of range & 12.42 \\ \hline
\end{tabular}
\label{tab:fmmdaide1}
\end{table}

Given the components R18, R22, R26, R30 and IC1, this analysis process has derived the component `milli-volt amplifier'
with two failure modes, `Out of Range' and `Low reading'.

We can represent this in an FMMD hierarchy diagram, see figure \ref{fig:mvamp_fmmd}.

\begin{figure}[h]
 \centering
 \includegraphics[width=200pt,keepaspectratio=true]{./fmmd_design_aide/mvamp_fmmd.jpg}
 % mvamp_fmmd.jpg: 281x344 pixel, 72dpi, 9.91x12.14 cm, bb=0 0 281 344
 \caption{FMMD analysis Hierarchy for Milli-Volt Amplifier}
 \label{fig:mvamp_fmmd}
\end{figure}

Table \ref{tab:fmmdaide1} shows two possible causes of an undetectable error:
a low reading due to the loss of the offset millivolt signal.
The loss of the $\Delta V$ would mean that an incorrect temperature reading would be made.
Typically this type of circuit would be used to read a thermocouple, and this error symptom, `low\_reading',
would mean our plant could believe that the temperature is lower than it actually is.
To take an example from a K type thermocouple, the offset of 1.86mV
%from the potential divider represents amplified to
would represent $\approx \; 46\,^{\circ}{\rm C}$~\cite{eurothermtables}~\cite{aoe}.

%\clearpage
\subsection{Undetected Failure Mode: Incorrect Reading}

Although this failure is statistically unlikely,
% (get stats for R short FIT etc from pt100 doc)
if the reading is considered critical, or we are aiming for a high integrity level, this may be unacceptable.
We will need to add some type of detection mechanism to the circuit to
test $R_{off}$ periodically.
For instance, were we to check $R_{off}$ every $\tau = 20\,ms$,
we could work out the detection allowance according to EN61508.

\section{Proposed Checking Method}

Were we able to switch a second resistor in series with the
820R resistor (R22) and switch it out again, we could test that the safety resistor (R18) is still functioning correctly.
With the new resistor switched in we would expect
the voltage added by the potential divider
to increase.
The circuit in figure \ref{fig:mvamp2} shows an FET controlled by the `test line' connection,
which can switch in the resistor R36, also with a value of \ohms{820}.

We could detect the effect on the reading with the potential divider
according to the following formula.
%% check figures
The potential divider is now $\frac{820R+820R}{2M2+820R+820R}$; over 5V
this gives 3.724mV, which amplified by 184 is 0.685V \adcten{140}.
%
The potential divider with the second resistor switched out is $\frac{820R}{2M2+820R}$; over 5V
this gives 1.86mV, which amplified by 184 is 0.342V \adcten{70}.
This is a difference of \adcten{70} in the readings.
So periodically, perhaps even as frequently as once every few seconds, we can apply the checking resistor
and look for a corresponding change in the reading.

Let us analyse this in more detail to prove that we are indeed checking for the
failure of the safety resistor, and that we are not introducing
any new problems.

First let us look at the new transistor and resistor and treat these as a functional group.
In our analysis of the failure modes we have to consider both states of the transistor, ON and OFF.

\begin{figure}[h]
 \centering
 \includegraphics[width=200pt,keepaspectratio=true]{./fmmd_design_aide/mv_opamp_circuit2.png}
 % mv_opamp_circuit2.png: 577x479 pixel, 72dpi, 20.35x16.90 cm, bb=0 0 577 479
 \caption{Amplifier with check circuit}
 \label{fig:mvamp2}
\end{figure}

\section{FMMD analysis of Safety Addition}

This test circuit has two operational states, in that
it can be switched on to apply the test series resistance, and off
to obtain the correct reading.
%
We must examine each test case from these two perspectives.

For $\overline{TEST\_LINE}$ ON the transistor is turned OFF;
we are in test mode and expect the reading to go up by around \adcten{70}.
For $\overline{TEST\_LINE}$ OFF the transistor is ON and R36 is by-passed, and the reading
is assumed to be valid.
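
The size of the expected test effect can be worked through explicitly.
The following is a sketch under stated assumptions: the ADC is taken to be a 10 bit device with a 5V reference
(this is consistent with the readings quoted above, but it is an assumption), the gain is rounded to 184,
and the symbols $V_{test}$, $V_{norm}$ and $N$ are introduced here for illustration only.
$$ V_{test} = 5\,V \times \frac{820 + 820}{2.2 \times 10^{6} + 820 + 820} \times 184 \approx 0.685\,V $$
$$ V_{norm} = 5\,V \times \frac{820}{2.2 \times 10^{6} + 820} \times 184 \approx 0.343\,V $$
$$ N = \frac{V_{test} - V_{norm}}{5\,V} \times 1024 \approx \frac{0.343}{5} \times 1024 \approx 70 $$
In a healthy circuit we would therefore expect the reading to rise by about 70 ADC counts when R36 is switched in.
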

\begin{table}[h]
\caption{Test Addition Single Fault FMMD} % title of Table
\centering % used for centering table
\begin{tabular}{||l|l|c|l|c||}
\hline \hline
 \textbf{test line } & \textbf{Test} & \textbf{Failure } & \textbf{Symptom } & \textbf{MTTF} \\
 \textbf{status} & \textbf{Case} & \textbf{mode} & \textbf{ } & \\ % \textbf{per $10^9$ hours of operation} \\
% R & wire & res + & res - & description
\hline
\hline
%% OK TR1 OFF , and so 36 in series. R36 has shorted so
$\overline{TEST\_LINE}$ ON & TC:1 $R36$ SHORT & No added resistance & NO TEST EFFECT & XX 1.38 \\ \hline
%%
$\overline{TEST\_LINE}$ OFF & TC:1 $R36$ SHORT & dormant failure & NO SYMPTOM & XX 1.38 \\ \hline
%% here TR1 should be OFF, as R36 is open we now have an open circuit
$\overline{TEST\_LINE}$ ON & TC:2 $R36$ OPEN & open circuit & OPEN CIRCUIT & XX 12.42\\ \hline
%% here TR1 should be ON and R36 by-passed, the fact it has gone OPEN means no symptom here, a dormant failure.
$\overline{TEST\_LINE}$ OFF & TC:2 $R36$ OPEN & dormant failure & NO SYMPTOM & XX 12.42\\ \hline
\hline
%
%% TR1 OFF so R36 should be in series. Because TR1 is ON because it is faulty, R36 is not in series
$\overline{TEST\_LINE}$ ON & TC:3 $TR1$ ALWAYS ON & No added resistance & NO TEST EFFECT & XX 1.38 \\ \hline
%%
%% TR1 ON R36 should be bypassed by TR1, and it is, but as TR1 is always on we have a dormant failure.
$\overline{TEST\_LINE}$ OFF & TC:3 $TR1$ ALWAYS ON & dormant failure & NO SYMPTOM & XX 1.38 \\ \hline
%%
%% TR1 should be off as $\overline{TEST\_LINE}$ is ON. As TR1 is faulty it is always off and we have a dormant failure.
$\overline{TEST\_LINE}$ ON & TC:4 $TR1$ ALWAYS OFF & dormant failure & NO SYMPTOM & XX 1.38 \\ \hline
%%
%% TR1 should be ON, but is off due to TR1 failure.
%% The resistance R36 will always be in series therefore
$\overline{TEST\_LINE}$ OFF & TC:4 $TR1$ ALWAYS OFF & resistance always added & NO TEST EFFECT & XX 1.38 \\ \hline
\hline
\end{tabular}
\label{tab:testaddition}
\end{table}

\subsection{Test Cases Analysis in detail}

The purpose of this circuit is to switch a resistance in when we want to test the circuit
and to switch it out for normal operation.
The control is provided by a line called $\overline{TEST\_LINE}$.
Thus to apply the test conditions we set $\overline{TEST\_LINE}$ to ON,
and for normal operation we set it to OFF.

\subsubsection{TC 1}
This test case looks at the shorted resistor failure mode of R36.

\paragraph{$\overline{TEST\_LINE}$ ON}
Here TR1 should be off and R36 should be in series.
As R36 is shorted, no resistance will be contributed to the circuit by R36.
In terms of the behaviour of the functional group, this means that it will provide no test effect.

\paragraph{$\overline{TEST\_LINE}$ OFF}
Here TR1 will be on and by-pass R36, so it does not make any difference if R36 is shorted.
This is a dormant failure; we can only detect it when $\overline{TEST\_LINE}$ is ON.

\subsubsection{TC 2}
This test case looks at the open circuit resistor failure mode of R36.

\paragraph{$\overline{TEST\_LINE}$ ON}
Here TR1 should be off and R36 should be in series.
As R36 is open, the test circuit is now open.
In terms of the behaviour of the functional group, this means that it will cause an open circuit failure.

\paragraph{$\overline{TEST\_LINE}$ OFF}
Here TR1 will be on and by-pass R36, so it does not make any difference if R36 is open.
This is a dormant failure; we can only detect it when $\overline{TEST\_LINE}$ is ON.

\subsubsection{TC 3}
This test case looks at the transistor failure mode where TR1 is always ON.
\footnote{The transistor is being used as a switch, and so we can model it as having two failure modes, ALWAYS ON or ALWAYS OFF.}

\paragraph{$\overline{TEST\_LINE}$ ON}
Here TR1 should be off and R36 should be in series.
As TR1 is always ON, R36 will always be by-passed.
Thus there will be no test effect.

\paragraph{$\overline{TEST\_LINE}$ OFF}
Here TR1 should be on and by-pass R36.
This is a dormant failure; we can only detect it when $\overline{TEST\_LINE}$ is ON.

\subsubsection{TC 4}
This test case looks at the transistor failure mode where TR1 is always OFF.

\paragraph{$\overline{TEST\_LINE}$ ON}
Here TR1 should be OFF and R36 should be in series.
This is a dormant failure; we can only detect it when $\overline{TEST\_LINE}$ is OFF.

\paragraph{$\overline{TEST\_LINE}$ OFF}
Here TR1 should be ON, but is OFF due to failure.
The resistance R36 will always be in series.
As a symptom for this circuit, it means that there would be no test effect.

\subsection{Conclusion of FMMD analysis on safety addition}

From the FMMD analysis in table \ref{tab:testaddition}, the derived component
has two failure modes, `no~test~effect' and `open~circuit'. %~out~of~range'.
The next stage is to combine the two derived components we have made into a functional group.

\section{FMMD Hierarchy, with milli-volt amp and safety addition}

We now take the two derived components and place them into a functional group.
We can then analyse this functional group w.r.t. the failure modes of the two derived components.

\begin{figure}[h]
 \centering
 \includegraphics[width=300pt,bb=0 0 698 631,keepaspectratio=true]{./fmmd_design_aide/testable_mvamp.jpg}
 % testable_mvamp.jpg: 698x631 pixel, 72dpi, 24.62x22.26 cm, bb=0 0 698 631
 \caption{Testable milli-volt amplifier}
 \label{fig:testable_mvamp}
\end{figure}

\subsection{Analysis of FMMD Derived component `testable milli-volt amp'}

The failure mode of most concern is the undetectable failure `low~reading'.
This has two potential causes in the unmodified circuit, R22\_SHORT and R18\_OPEN.

\paragraph{R22\_SHORT with safety addition}
With the modified circuit, in the $\overline{TEST\_LINE}$ ON condition TR1 will be off and we will have
a reading + test $\Delta V$.
However, with $\overline{TEST\_LINE}$ OFF we have no potential divider.
R18 will pull the +ve terminal of the op-amp up, pushing the result out of range.
The failure is thus detectable.

\paragraph{R18\_OPEN with safety addition}
Here there is no potential divider.
The $\overline{TEST\_LINE}$ will have no effect whichever way it is switched.
The failure mode is thus detectable.

\paragraph{Symptom Extraction for the Functional Group `testable milli-volt amplifier'}
We have four failure modes to consider in the functional group `testable milli-volt amplifier'.
These are
\begin{itemize}
\item failure mode: open~potential~divider
\item failure mode: no~test~effect
\item failure mode: out~of~range
\item failure mode: low~reading
\end{itemize}

We can now collect symptoms.
The `open~potential~divider' failure mode (the `open~circuit' failure mode of the test circuit) will cause R18 to pull the +ve input of the
op-amp high, giving an out of range reading from the op-amp output.
We can group `low~reading' with `out~of~range'.
The `low~reading' failure mode now becomes either `no~test~effect' or `out~of~range' depending on the $\overline{TEST\_LINE}$
state.

\begin{table}[h]
\caption{Testable Milli Volt Amplifier Single Fault FMMD} % title of Table
\centering % used for centering table
\begin{tabular}{||l|c|l|c||}
\hline \hline
 \textbf{Test} & \textbf{Failure } & \textbf{Symptom } & \textbf{MTTF} \\
 \textbf{Case} & \textbf{mode} & \textbf{ } & \\ % \textbf{per $10^9$ hours of operation} \\
% R & wire & res + & res - & description
\hline
\hline
TC:1 $testcircuit$ & open potential divider & Out of range & XX 1.38 \\ \hline
\hline
TC:2 $testcircuit$ & no test effect & No test effect & XX 1.38 \\ \hline
\hline
TC:3 $mvamp$ & out of range & Out of range & XX 1.38 \\ \hline
TC:4 $mvamp$ & low reading & Out of range \& no test effect & XX 1.38 \\ \hline
\end{tabular}
\label{tab:fmmdaide2}
\end{table}

We now have two symptoms, `out~of~range' and `no~test~effect'.
So for single component failures we now have a circuit with no undetectable failure modes.
We can summarise the symptoms in a list.

\begin{itemize}
\item symptom: \textbf{out~of~range} caused by the failure modes: open~potential~divider, out~of~range, low~reading.
\item symptom: \textbf{no~test~effect} caused by the failure modes: no~test~effect, low~reading.
\end{itemize}

\section{MTTF Reliability statistics}

%\clearpage
\subsection{OP-AMP FIT Calculations}

The DOD electronic reliability of components
document MIL-HDBK-217F~\cite{mil1991}[5.1] gives formulae for calculating
the
%$\frac{failures}{{10}^6}$
${failures}/{{10}^6}$ % looks better
hours for a wide range of generic components.
These figures are based on components from the 1980s, and MIL-HDBK-217F
gives very conservative reliability figures when applied to
modern components.

The formula for a generic packaged micro~circuit is reproduced in equation \ref{microcircuitfit}.
The meanings of, and values assigned to, its coefficients are described in table \ref{tab:opamp}.
\begin{equation}
{\lambda}_p = (C_1{\pi}_T+C_2{\pi}_E){\pi}_Q{\pi}_L
 \label{microcircuitfit}
\end{equation}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% SIL assessment 8 PIN GENERAL OP-AMP
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{table}[ht]
\caption{OP AMP FIT assessment} % title of Table
\centering % used for centering table
\begin{tabular}{||c|c|l||}
\hline \hline
 \emph{Parameter} & \emph{Value} & \emph{Comments} \\
 & & \\ \hline \hline
 $C_1$ & 0.040 & $300$ to $1000$ BiCMOS transistors \\ \hline
 ${\pi}_T$ & 1.4 & max temp of $60^{\circ}$C\\ \hline
 $C_2$ & 0.0026 & number of functional pins (8) \\ \hline
 ${\pi}_E$ & 2.0 & ground fixed environment (not benign)\\ \hline
 ${\pi}_Q$ & 2.0 & Non-Mil spec component\\ \hline
 ${\pi}_L$ & 1.0 & More than 2 years in production\\ \hline
\hline \hline
\end{tabular}
\label{tab:opamp}
\end{table}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Taking these parameters and applying equation \ref{microcircuitfit},
$$ (0.040 \times 1.4 + 0.0026 \times 2.0) \times 2.0 \times 1.0 = 0.1224 $$
we get a value of $0.1224$ failures per ${10}^6$ hours.
This is a worst case FIT\footnote{where FIT (Failure in Time) is defined as
failures per Billion (${10}^9$) hours of operation} of approximately 122.

\subsection{Switching Transistor}

The switching transistor will be operating at a low frequency
and well within 50\% of its maximum voltage.
MIL-HDBK-217F\cite{mil1992}[6-25] gives an example
transistor in these environmental conditions, and assigns an FIT value of 11.

\section{Conclusions}

With the safety addition the undetectable failure mode of \textbf{low~reading}
disappears.
However, the overall reliability goes down.
This is simply because we have more components that {\em can} fail.
%% Safety vs. reliability paradox.
The sum of the MTTFs for the original circuit is DAH, and for the new one DAH.
The circuit is arguably safer now but statistically less reliable.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%