diff --git a/fmmd_design_aide/fmmd_design_aide.tex b/fmmd_design_aide/fmmd_design_aide.tex index 68897b0..3bc30da 100644 --- a/fmmd_design_aide/fmmd_design_aide.tex +++ b/fmmd_design_aide/fmmd_design_aide.tex @@ -5,51 +5,53 @@ paper describes how the FMMD methodology can be used to refine safety critical designs and identify undetectable and dormant faults. % -Its uses an industry standard mill-volt amplifier -circuit, intended for reading thermocouples. -It has an inbuilt safety resistor which allows it +As a working example, an industry standard milli-volt amplifier circuit, intended for reading thermocouples, +is analysed. +It has an inbuilt `safety~resistor' which allows it to detect the thermocouple becoming disconnected/going OPEN. % This circuit is analysed from an FMMD perspective and and two undetectable failure modes are identified. -A `safety check' circuit is then proposed and analysed. +% +An additional `safety check' circuit is then proposed and analysed. This has no undetectable failure modes, but does have one `dormant' failure mode. % This paper shows that once undetectable faults or dormant faults are discovered -the design can be altered (or have a safety component added), and the FMMD analysis process re-applied. +the design can be altered (or have a safety feature added), and the FMMD analysis process can then be re-applied. This can be an iterative process applied until the design has an acceptable level safety. % of dormant or undetectable failure modes. % Used in this way, its is a design aide, giving the user -the possibility to refine/correct a {\dc} from the perspective +the possibility to refine a {\dc} from the perspective of its failure mode behaviour. } } { \section{Introduction} This chapter -describes how the FMMD methodology can be used to examine +describes how the FMMD methodology can be used to refine safety critical designs and identify undetectable and dormant faults.
% -Its uses an industry standard mill-volt amplifier -circuit, intended for reading thermocouples. -It has an inbuilt safety resistor which allows it +As a working example, an industry standard milli-volt amplifier circuit, intended for reading thermocouples, +is analysed. +It has an inbuilt `safety~resistor' which allows it to detect the thermocouple becoming disconnected/going OPEN. % This circuit is analysed from an FMMD perspective and and two undetectable failure modes are identified. -A `safety check' circuit is then proposed and analysed. +% +An additional `safety check' circuit is then proposed and analysed. This has no undetectable failure modes, but does have one `dormant' failure mode. % This paper shows that once undetectable faults or dormant faults are discovered -the design can be altered (or have a safety component added), and the FMMD analysis process re-applied. +the design can be altered (or have a safety feature added), and the FMMD analysis process can then be re-applied. This can be an iterative process applied until the design has an acceptable level safety. % of dormant or undetectable failure modes. % Used in this way, its is a design aide, giving the user -the possibility to refine/correct a {\dc} from the perspective +the possibility to refine a {\dc} from the perspective of its failure mode behaviour. } @@ -101,8 +103,8 @@ are the symptoms of the {\fg} we derived it from. } % \paragraph{Undetectable failure modes.} -The symptoms will be detectable (like a value out of range) -or undetectable (like a logic state or value being incorrect). +Within a functional group, failure symptoms will be either detectable +or undetectable. The `undetectable' failure modes understandably, are the most worrying for the safety critical designer. EN61058~\cite{en61508}, the statistically based failure mode European Norm, using ratios of detected and undetected system failure modes to @@ -206,12 +208,14 @@ Choosing the common Nickel-Chromium v.
Nickel Aluminium `K' type thermocouple, Multiplying these by 184 and adding the 1.86mV offset gives 342.24mV and 2563.12mV. This is now in a suitable range to be read by an analogue digital converter, which will have a voltage span -typically between 3.3V and 5V on modern micro-controllers/ADC (Analogue Digital Converter) chips. +typically between 3.3V and 5V~\cite{pic18f2523}.% on modern micro-controllers/ADC (Analogue Digital Converter) chips. Note that this also leaves a margin or error on both sides of the range. If the thermocouple were to become colder than {{0}\oc} it would supply a negative voltage, which would subtract from the offset. At around {{-47}\oc} the amplifier output would be zero; -but anything under 342.24mV is considered out of range. +but anything under say 10mV is considered out of range\footnote{We need some negative range +to cope with cold junction compensation~\cite{aoe}, +which is a subject beyond the scope of this paper}. Thus the ADC can comfortably read out of range values but controlling software can determine it as invalid. Similarly anything over 2563.12mV would be considered out of range @@ -284,8 +288,8 @@ if the reading is considered critical, or we are aiming for a high integrity lev this may be unacceptable. We will need to add some type of detection mechanism to the circuit to test $R_{off}$ periodically. -For instance were we to check $R_off$ every $\tau = 20mS$ work out detection -allowance according to EN61508~\cite{en61508}. +%For instance were we to check $R_{off}$ every $\tau = 20mS$ work out detection +%allowance according to EN61508~\cite{en61508}. \section{Proposed Checking Method} @@ -298,7 +302,7 @@ With the new resistor switched in we would expect the voltage added by the potential divider to increase. 
-The circuit in figure \ref{fig:mvamp2} shows an FET transistor +The circuit in figure \ref{fig:mvamp2} shows a bi-polar transistor % yes its menally ill and goes on mad shopping spreees etc controlled by the `test line' connection, which can switch in the resitor R36 also with a value of \ohms{820}. @@ -357,7 +361,7 @@ and the reading is assumed to be valid. \centering % used for centering table \begin{tabular}{||l|l|c|l|c||} \hline \hline - \textbf{test line } & \textbf{Test} & \textbf{Failure } & \textbf{Symptom } & \textbf{MTTF} \\ + \textbf{test line } & \textbf{Test} & \textbf{Failure } & \textbf{Symptom } & \\ %\textbf{MTTF} \\ \textbf{status} & \textbf{Case} & \textbf{mode} & \textbf{ } & \\ % \textbf{per $10^9$ hours of operation} \\ % R & wire & res + & res - & description \hline @@ -373,16 +377,16 @@ $\overline{TEST\_LINE}$ OFF & TC:2 $R36$ OPEN & dormant failure \hline % %% TR1 OFF so R36 should be in series. Because TR1 is ON because it is faulty, R36 is not in series -$\overline{TEST\_LINE}$ LINE ON & TC:3 $TR1$ ALWAYS ON & No added resistance & NO TEST EFFECT & XX 1.38 \\ \hline +$\overline{TEST\_LINE}$ LINE ON & TC:3 $TR1$ ALWAYS ON & No added resistance & NO TEST EFFECT & 3 \\ \hline %% %% TR1 ON R36 should be bypassed by TR1, and it is, but as TR1 is always on we have a dormant failure. -$\overline{TEST\_LINE}$ OFF & TC:3 $TR1$ ALWAYS ON & dormant failure & NO SYMPTOM & XX 1.38 \\ \hline +$\overline{TEST\_LINE}$ OFF & TC:3 $TR1$ ALWAYS ON & dormant failure & NO SYMPTOM & 3 \\ \hline %% %% TR1 should be off as overline{TEST\_LINE}$ is ON. As TR1 is faulty it is always off and we have a dormant failure. -$\overline{TEST\_LINE}$ LINE ON & TC:4 $TR1$ ALWAYS OFF & dormant failure & NO SYMPTOM & 1.38 \\ \hline +$\overline{TEST\_LINE}$ LINE ON & TC:4 $TR1$ ALWAYS OFF & dormant failure & NO SYMPTOM & 8 \\ \hline %% %% TR1 should be ON, but is off due to TR1 failure.
The resistance R36 will always be in series therefore -$\overline{TEST\_LINE}$ OFF & TC:4 $TR1$ ALWAYS OFF & resistance always added & NO TEST EFFECT & 1.38 \\ \hline +$\overline{TEST\_LINE}$ OFF & TC:4 $TR1$ ALWAYS OFF & resistance always added & NO TEST EFFECT & 8 \\ \hline \hline \end{tabular} \label{tab:testaddition} @@ -502,23 +506,30 @@ giving an out of range reading from the op-amp output. We can group `low~reading' with `out~of~range'. The `low~reading' will now becomes either `no~test~effect' or `out~of~range' depending on the $\overline{TEST\_LINE}$ state. + +% +% NB: the calculate MTTF here we have to traverse down the DAG +% adding XOR conditions and multiplying AND conditions +% 16MAR2011 +% + \begin{table}[h+] \caption{Testable Milli Volt Amplifier Single Fault FMMD} % title of Table \centering % used for centering table \begin{tabular}{||l|c|l|c||} \hline \hline - \textbf{Test} & \textbf{Failure } & \textbf{Symptom } & \textbf{MTTF} \\ + \textbf{Test} & \textbf{Failure } & \textbf{Symptom } & \\ % \textbf{MTTF} \\ \textbf{Case} & \textbf{mode} & \textbf{ } & \\ % \textbf{per $10^9$ hours of operation} \\ % R & wire & res + & res - & description \hline \hline -TC:1 $testcircuit$ & open potential divider & Out of range & XX 1.38 \\ \hline +TC:1 $testcircuit$ & open potential divider & Out of range & \\ \hline % XX 1.38 \\ \hline \hline -TC:2 $testcircuit$ & no test effect & no test effect & XX 1.38 \\ \hline +TC:2 $testcircuit$ & no test effect & no test effect & \\ \hline % XX 1.38 \\ \hline \hline -TC:3 $mvamp$ & out of range & Out of Range & XX 1.38 \\ +TC:3 $mvamp$ & out of range & Out of Range & \\ \hline % XX 1.38 \\ \hline -TC:4 $mvamp$ & low reading & Out of range \& no test effect & XX 1.38 \\ +TC:4 $mvamp$ & low reading & Out of range \& no test effect & \\ \hline % XX 1.38 \\ \hline \end{tabular} @@ -598,6 +609,78 @@ and well within 50\% of its maximum voltage. We can also assume a benign temperature environment of $ < 60^{o}C$. 
MIL-HDBK-217F\cite{mil1992}[6-25] gives an exmaple transistor in these environmental conditions, and assigns an FIT value of 11. +% +The RAC failure mode distributions manual~\cite{fmd91}[2-25] entry for +bi-polar transistors, gives a 0.73 probability of them failing shorted, and a 0.27 probability of them failing OPEN. +% +For this example, we can therefore use a FIT value of 8 ($0.73 \times 11$) for the transistor failing +SHORT and a FIT of 3 ($0.27 \times 11$) for it failing OPEN. + + + +\subsection{Resistors} +\ifthenelse {\boolean{paper}} +{ +The formula given in MIL-HDBK-217F\cite{mil1991}[9.2] for a generic fixed film non-power resistor +is reproduced in equation \ref{resistorfit}. The meanings +and values assigned to its co-efficients are described in table \ref{tab:resistor}. + +\glossary{name={FIT}, description={Failure in Time (FIT). The number of times a particular failure is expected to occur in a $10^{9}$ hour time period.}} +\fmodegloss + +\begin{equation} +% fixed comp resistor{\lambda}_p = {\lambda}_{b}{\pi}_{R}{\pi}_Q{\pi}_E +resistor{\lambda}_p = {\lambda}_{b}{\pi}_{R}{\pi}_Q{\pi}_E + \label{resistorfit} +\end{equation} + +\begin{table}[ht] +\caption{Fixed film resistor Failure in time assessment} % title of Table +\centering % used for centering table +\begin{tabular}{||c|c|l||} +\hline \hline + \em{Parameter} & \em{Value} & \em{Comments} \\ + & & \\ \hline \hline + ${\lambda}_{b}$ & 0.00092 & stress/temp base failure rate $60^o$ C \\ \hline + %${\pi}_T$ & 4.2 & max temp of $60^o$ C\\ \hline + ${\pi}_R$ & 1.0 & Resistance range $< 0.1M\Omega$\\ \hline + ${\pi}_Q$ & 15.0 & Non-Mil spec component\\ \hline + ${\pi}_E$ & 1.0 & benign ground environment\\ \hline + +\hline \hline +\end{tabular} +\label{tab:resistor} +\end{table} + +Applying equation \ref{resistorfit} with the parameters from table \ref{tab:resistor} +gives the following failures in ${10}^6$ hours: + +\begin{equation} + 0.00092 \times 1.0 \times 15.0 \times 1.0 = 0.0138
\;{failures}/{{10}^{6} Hours} + \label{eqn:resistor} +\end{equation} + +While MIL-HDBK-217F gives MTTF for a wide range of common components, +it does not specify how the components will fail (in this case OPEN or SHORT). {Some standards, notably EN298 only consider resistors failing in OPEN mode}. +%FMD-97 gives 27\% OPEN and 3\% SHORTED, for resistors under certain electrical and environmental stresses. +% FMD-91 gives parameter change as a third failure mode, luvvverly 08FEB2011 +This example +compromises and uses a 90:10 ratio, for resistor failure. +Thus for this example resistors are expected to fail OPEN in 90\% of cases and SHORTED +in the other 10\%. +A standard fixed film resistor, for use in a benign environment, non military spec at +temperatures up to 60\oc is given a probability of 13.8 failures per billion ($10^9$) +hours of operation (see equation \ref{eqn:resistor}). +This figure is referred to as a FIT\footnote{FIT values are measured as the number of +failures per Billion (${10}^9$) hours of operation, (roughly 114,000 years). The smaller the +FIT number the more reliable the fault~mode} Failure in time. +} +{ % CHAPTER +Resistors for this example are considered to have a FIT of 13.8, and are expected to fail OPEN in 90\% of cases and SHORTED +in the other 10\%. +This is described in detail with supporting references in \ref{resistorfit}. +} + \section{Conclusions} @@ -605,14 +688,12 @@ With the safety addition the undetectable failure mode of \textbf{low~reading} disappears. However, the overall reliability though goes down ! This is simply because we have more components that {\em can} fail. - %% Safety vs. reliability paradox. - -The sum of the MTTF's for the original circuit is DAH, and for the new one -DAH. The circuit is arguably safer now +%The sum of the MTTF's for the original circuit is DAH, and for the new one +%DAH. +The circuit is arguably safer now but statistically less reliable. 
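+%
+As a rough illustration only, assume that the safety check adds just R36 and TR1 to the circuit, and that component failure rates simply add (a series assumption made here, not a result of the FMMD analysis).
+Using the FIT values quoted in this example (13.8 for a resistor and 11 for the transistor), the total failure rate would rise by
+\begin{equation}
+ \Delta\lambda = \lambda_{R36} + \lambda_{TR1} = 13.8 + 11 = 24.8 \; \mathrm{failures}/{10}^{9}\;\mathrm{hours}
+\end{equation}
+where $\Delta\lambda$, $\lambda_{R36}$ and $\lambda_{TR1}$ are symbols introduced for this sketch.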
- \paragraph{Practical side effect of checking for thermocouple disconnection} Because the potential divider provides an offset as a side effect of detecting a disconnection