EN61508:6\cite{en61508}[B.6.6] describes FMEA as:
\begin{quotation}
"To analyse a system design, by examining all possible sources of failure of a system's components and determining the effects of these failures on the behaviour and safety of the system."
\end{quotation}

\section{Concepts}

Two search directions are used in failure analysis~\cite{Lutz:1997:RAU:590564.590572}.
% cite for forward and backward search related to safety critical software
%{sfmeaforwardbackward}
A forward search starts with possible failure causes and works out what could happen as a result.
A backward search starts with possible failures and works back down towards their causes
(though not necessarily all the way to the base components of a system).

Reasoning distance is a general concept: a simple measure of how complex a failure analysis is.
The more modules and components that are involved in the analysis, the greater the reasoning distance.

\section{F.M.E.A.}
\subsection{FMEA}
%\tableofcontents[currentsection]
FMEA is a broad term; it could mean anything from an informal check on how failures could affect some equipment in an initial brain-storming session in product design, to formal submissions as part of safety critical certification.
%
This chapter describes basic concepts of FMEA, uses a simple example to demonstrate a single FMEA analysis stage, describes the five main variants of FMEA in use today and explores some concepts with which we can discuss and evaluate the effectiveness of FMEA.
% \subsection{FMEA}
% This talk introduces Failure Mode Effects Analysis, and the different ways it is applied.
% These techniques are discussed, and then
% a refinement is proposed, which is essentially a modularisation of the FMEA process.
\clearpage
\paragraph{FMEA basic concept.}
\begin{itemize}
 \item \textbf{F - Failures of given component} Consider a component in a system.
 \item \textbf{M - Failure Mode} Look at one of the ways in which it can fail (i.e. determine a component `failure~mode').
 \item \textbf{E - Effects} Determine the effects this failure mode will cause to the system we are examining.
 \item \textbf{A - Analysis} Analyse how much impact this symptom will have on the environment, people or the system itself.
\end{itemize}

FMEA is a procedure based on the low level components of a system, and an example analysis will serve to demonstrate it in practice.

\subsection{FMEA Example: Milli-volt reader}
Let us consider a system, in this case a milli-volt reader, consisting of instrumentation amplifiers connected to a micro-processor that reports its readings via RS-232.

\begin{figure}
 \centering
 \includegraphics[width=175pt]{./CH2_FMEA/mvamp.png}
 % mvamp.png: 561x403 pixel, 72dpi, 19.79x14.22 cm, bb=0 0 561 403
\end{figure}

Let us perform an FMEA and consider how one of its resistors failing could affect it.
For the sake of example let us choose resistor R1 in the OP-AMP gain circuitry.

\begin{itemize}
 \item \textbf{F - Failures of given component} The resistor (R1) could fail by going OPEN or SHORT (EN298 definition).
 \item \textbf{M - Failure Mode} Consider the component failure mode SHORT.
 \item \textbf{E - Effects} This will drive the minus input LOW, causing a HIGH OUTPUT/READING.
 \item \textbf{A - Analysis} The reading will be out of the normal range, and we will have an erroneous milli-volt reading.
\end{itemize}

The analysis above has given us a result for one failure scenario, i.e. for one component failure mode.
A complete FMEA report would have to contain an entry for each failure mode of all the components in the system under investigation.
%
Note here that we have had to look at the failure~mode in relation to the entire circuit.
We have used intuition to determine the probable effect of this failure mode.
For instance we have assumed that the resistor R1 going SHORT will not affect the ADC, the Microprocessor or the UART.
%
To put this in more general terms, we have not examined this failure mode against every other component in the system.
Perhaps we should: this would be a more rigorous and complete approach in looking for system failures.

\section{Theoretical Concepts in FMEA}

\subsection{The unacceptability of a single component failure causing a catastrophe}
FMEA, due to its inductive bottom-up approach, is very good at finding potential component failures that could have catastrophic implications.
Used in the design phase of a project, FMEA is an invaluable tool for unearthing this type of failure scenario.
It is less useful for determining catastrophic events caused by multiple simultaneous\footnote{Multiple simultaneous failures are taken to mean failures that occur within the same detection period.} failures.

\subsection{Impracticality of Field Data for modern systems}
Modern electronic components are generally very reliable, and the systems built from them are thus very reliable too.
Reliable field data on failures will therefore be sparse.
Should we wish to prove a continuous demand system for, say, ${10}^{-7}$ failures\footnote{${10}^{-7}$ failures per hour of operation is the threshold for S.I.L. 3 reliability~\cite{en61508}.} per hour of operation, even with 1000 correctly monitored units in the field we could only expect one failure per ten thousand hours.
Ten thousand hours is about 1.14 years, so this is slightly less than one failure a year.
It would be utterly impractical to get statistically significant data for equipment at these reliability levels.
However, we can use FMEA (more specifically the FMEDA variant, see section~\ref{sec:FMEDA}), working from known component failure rates, to obtain statistical estimates of the equipment reliability.

\subsection{Rigorous FMEA: State Explosion Problem}

\paragraph{Rigorous Single Failure FMEA}
FMEA for a safety critical certification~\cite{en298,en61508} will have to be applied to all known failure modes of all components within a system.
To perform FMEA rigorously, i.e. to examine every possible interaction of a failure mode with all other components in a system, we would need to look at all possible failure scenarios.
%to do this completely (all failure modes against all components).
The number of checks this requires is given in equation~\ref{eqn:fmea_single}, where $N$ is the total number of components in the system, and $f$ is the number of failure modes per component.
\begin{equation}
 \label{eqn:fmea_single}
 N (N-1) f % \\
 %(N^2 - N).f
\end{equation}

The number of checks needed to undertake a `rigorous~FMEA' is therefore of order $O(N^2)$.
Even small systems typically have 100 components, each with 3 or more failure modes, which gives $100 \times 99 \times 3 = 29{,}700$ checks.
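
As a rough illustration of how quickly this count grows, the expression in equation~\ref{eqn:fmea_single} can be evaluated for two other system sizes, with $N$ the number of components and $f$ the number of failure modes per component.
The sizes $N=10$ and $N=1000$ (both with $f=3$) are illustrative assumptions, not figures from a real system.
$$ N=10: \quad 10 \times 9 \times 3 = 270 $$
$$ N=1000: \quad 1000 \times 999 \times 3 = 2{,}997{,}000 $$
On these assumed sizes, each tenfold increase in the number of components multiplies the number of checks by roughly one hundred, which is the $O(N^2)$ growth described above.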
\paragraph{Rigorous Double Failure FMEA}
When we look at potential double failure scenarios\footnote{Certain double failure scenarios are already legal requirements: the European gas burner standard (EN298:2003) demands the checking of double failure scenarios (for burner lock-out scenarios).} (two components failing within a given time frame), the order becomes $O(N^3)$.
\begin{equation}
 \label{eqn:fmea_double}
 N (N-1) (N-2) f % \\
 %(N^2 - N).f
\end{equation}

For our theoretical example of 100 components with 3 failure modes each, this is $100 \times 99 \times 98 \times 3 = 2{,}910{,}600$ failure mode scenarios.

\paragraph{Reliance on experts for meaningful FMEA analysis.}
We define rigorous FMEA as examining the effect of every component failure mode against the remaining components in the system under investigation.
For practical reasons, FMEA cannot take this rigorous approach.
%
Because we cannot perform rigorous FMEA, we rely on experts in the system under investigation to perform a meaningful FMEA analysis.

\section{FMEA in practice: Five variants}

\paragraph{Five main Variants of FMEA}
\begin{itemize}
 \item \textbf{PFMEA - Production} Car Manufacture etc
 \item \textbf{FMECA - Criticality} Military/Space
 \item \textbf{FMEDA - Statistical safety} EN61508/IEC61508 Safety Integrity Levels
 \item \textbf{DFMEA - Design or static/theoretical} EN298/EN230/UL1998
 \item \textbf{SFMEA - Software FMEA} Only used in highly critical systems at present
\end{itemize}

\section{PFMEA - Production FMEA : 1940s to present}
Production FMEA (or PFMEA) is FMEA used to prioritise, in terms of cost, problems to be addressed in product production.
It focuses on known problems, and determines how frequently they occur and their cost to fix.
These two values are multiplied together to give a Risk Priority Number (RPN).
Fixing the problems with the highest RPN will return the most cost benefit.
% benign example of PFMEA in CARS - make something up.
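
As a minimal worked illustration of this product, let $P$ be the probability of the problem occurring and $C$ its cost, so that $RPN = P \times C$ (the symbol $C$ is shorthand introduced here for the cost figure).
Using the two relay rows from the example tables in this section:
$$ RPN_{relay1} = 1 \times 10^{-5} \times 38.0 = 0.00038 $$
$$ RPN_{relay2} = 1 \times 10^{-5} \times 98.0 = 0.00098 $$
On these figures the second relay fault has the higher RPN and would be addressed first.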
\subsection{PFMEA Example} \begin{table}[ht] \caption{FMEA Calculations} % title of Table %\centering % used for centering table \begin{tabular}{|| l | l | c | c | l ||} \hline \textbf{Failure Mode} & \textbf{P} & \textbf{Cost} & \textbf{Symptom} & \textbf{RPN} \\ \hline \hline relay 1 n/c & $1*10^{-5}$ & 38.0 & indicators fail & 0.00038 \\ \hline relay 2 n/c & $1*10^{-5}$ & 98.0 & doorlocks fail & 0.00098 \\ \hline % rear end crash & $14.4*10^{-6}$ & 267,700 & fatal fire & 3.855 \\ % ruptured f.tank & & & & \\ \hline \hline \end{tabular} \end{table} %Savings: 180 burn deaths, 180 serious burn injuries, 2,100 burned vehicles. Unit Cost: $200,000 per death, $67,000 per injury, $700 per vehicle. %Total Benefit: 180 X ($200,000) + 180 X ($67,000) + $2,100 X ($700) = $49.5 million. %COSTS %Sales: 11 million cars, 1.5 million light trucks. %Unit Cost: $11 per car, $11 per truck. %Total Cost: 11,000,000 X ($11) + 1,500,000 X ($11) = $137 million. %\subsection{Production FMEA : Example Ford Pinto : 1975} \subsection{PFMEA Example: Ford Pinto: 1975} \begin{figure}[h] \centering \includegraphics[width=300pt]{./CH2_FMEA/ad_ford_pinto_mpg_red_3_1975.jpg} % ad_ford_pinto_mpg_red_3_1975.jpg: 720x933 pixel, 96dpi, 19.05x24.69 cm, bb=0 0 540 700 \caption{Ford Pinto Advert} \label{fig:fordpintoad} \end{figure} \begin{figure}[h] \centering \includegraphics[width=300pt]{./CH2_FMEA/burntoutpinto.png} % burntoutpinto.png: 376x250 pixel, 72dpi, 13.26x8.82 cm, bb=0 0 376 250 \caption{Burnt Out Pinto} \label{fig:burntoutpinto} \end{figure} \begin{table}[ht] \caption{FMEA Calculations} % title of Table %\centering % used for centering table \begin{tabular}{|| l | l | c | c | l ||} \hline \textbf{Failure Mode} & \textbf{P} & \textbf{Cost} & \textbf{Symptom} & \textbf{RPN} \\ \hline \hline relay 1 n/c & $1*10^{-5}$ & 38.0 & indicators fail & 0.00038 \\ \hline relay 2 n/c & $1*10^{-5}$ & 98.0 & doorlocks fail & 0.00098 \\ \hline rear end crash & $14.4*10^{-6}$ & 267,700 & fatal fire & 3.855 
\\ ruptured f.tank & & & allow & \\ \hline rear end crash & $1$ & $11$ & recall & 11.0 \\ ruptured f.tank & & & fix tank & \\ \hline \hline \end{tabular} \end{table} % don't think this is relevant for the thesis: http://www.youtube.com/watch?v=rcNeorjXMrE \section{FMECA - Failure Modes Effects and Criticality Analysis} \subsection{ FMECA - Failure Modes Effects and Criticallity Analysis} \begin{figure} \centering %\includegraphics[width=100pt]{./military-aircraft-desktop-computer-wallpaper-missile-launch.jpg} \includegraphics[width=300pt]{./CH2_FMEA/A10_thunderbolt.jpg} % military-aircraft-desktop-computer-wallpaper-missile-launch.jpg: 1024x768 pixel, 300dpi, 8.67x6.50 cm, bb=0 0 246 184 \caption{A10 Thunderbolt} \label{fig:f16missile} \end{figure} Emphasis on determining criticality of failure. Applies some Bayesian statistics (probabilities of component failures and those thereby causing given system level failures). \subsection{ FMECA - Failure Modes Effects and Criticality Analysis} Very similar to PFMEA, but instead of cost, a criticality or seriousness factor is ascribed to putative top level incidents. FMECA has three probability factors for component failures. \textbf{FMECA ${\lambda}_{p}$ value.} This is the overall failure rate of a base component. This will typically be the failure rate per million ($10^6$) or billion ($10^9$) hours of operation. reference MIL1991. \textbf{FMECA $\alpha$ value.} The failure mode probability, usually denoted by $\alpha$ is the probability of a particular failure~mode occurring within a component. reference FMD-91. %, should it fail. %A component with N failure modes will thus have %have an $\alpha$ value associated with each of those modes. %As the $\alpha$ modes are probabilities, the sum of all $\alpha$ modes for a component must equal one. 
\textbf{FMECA $\beta$ value.}
The second probability factor, $\beta$, is the probability that the failure mode will cause a given system failure.
This is a conditional (`Bayesian') probability: given a particular component failure mode, it is the probability of a given system level failure.

\textbf{FMECA `t' Value}
The time that a system will be operating for, or the working life time of the product, is represented by the variable $t$.
%for probability of failure on demand studies,
%this can be the number of operating cycles or demands expected.

\textbf{Severity `s' value}
A weighting factor to indicate the seriousness of the putative system level error.
%Typical classifications are as follows:~\cite{fmd91}

The criticality value $C_m$ for a failure mode is the product of these factors:
\begin{equation}
 C_m = \beta \, \alpha \, \lambda_p \, t \, s
\end{equation}

The highest $C_m$ values would be at the top of a `to~do' list for a project manager.

\section{FMEDA - Failure Modes Effects and Diagnostic Analysis}
% \begin{figure}
%  \centering
%  \includegraphics[width=200pt]{./SIL.png}
%  % SIL.jpg: 350x286 pixel, 72dpi, 12.35x10.09 cm, bb=0 0 350 286
%  \caption{SIL requirements}
% \end{figure}
\begin{itemize}
 \item \textbf{Statistical Safety} Safety Integrity Level (SIL) standards (EN61508/IEC61508).
 \item \textbf{Diagnostics} Diagnostic or self checking elements are modelled.
 \item \textbf{Complete Failure Mode Coverage} All failure modes of all components must be in the model.
 \item \textbf{Guidelines} To system architectures and development processes.
\end{itemize}

FMEDA is the methodology behind statistical (safety integrity level) type standards (EN61508/IEC61508).
It provides a statistical overall level of safety and allows diagnostic mitigation for self checking etc.
It provides guidelines for the design and architecture of computer/software systems for the four levels of safety integrity.
%For Hardware
%
FMEDA does force the user to consider all hardware components in a system by requiring that an MTTF value is assigned for each failure~mode; the MTTF may be statistically mitigated (improved) if it can be shown that self-checking will detect failure modes.
For software it provides procedural quality guidelines and constraints (such as forbidding certain programming languages and/or features).

\subsection{FMEDA - Failure Modes Effects and Diagnostic Analysis}
\label{sec:FMEDA}
\textbf{Failure Mode Classifications in FMEDA.}
\begin{itemize}
 \item \textbf{Safe or Dangerous} Failure modes are classified SAFE or DANGEROUS.
 \item \textbf{Detectable failure modes} Failure modes are given the attribute DETECTABLE or UNDETECTABLE.
 \item \textbf{Four attributes to Failure Modes} All failure modes may thus be Safe Detected (SD), Safe Undetected (SU), Dangerous Detected (DD) or Dangerous Undetected (DU).
 \item \textbf{Four statistical properties of a system} \\
 $ \sum \lambda_{SD}$, $\sum \lambda_{SU}$, $\sum \lambda_{DD}$, $\sum \lambda_{DU}$
\end{itemize}
% Failure modes are classified as Safe or Dangerous according
% to the putative system level failure they will cause.
% The Failure modes are also classified as Detected or
% Undetected.
% This gives us four level failure mode classifications:
% Safe-Detected (SD), Safe-Undetected (SU), Dangerous-Detected (DD) or Dangerous-Undetected (DU),
% and the probabilistic failure rate of each classification
% is represented by lambda variables
% (i.e. $\lambda_{SD}$, $\lambda_{SU}$, $\lambda_{DD}$, $\lambda_{DU}$).

\textbf{Diagnostic Coverage.}
The diagnostic coverage is simply the ratio of the dangerous detected failure rates to the failure rates of all dangerous failures, and is normally expressed as a percentage.
$\Sigma\lambda_{DD}$ represents the summed failure rates of the dangerous detected base component failure modes, and $\Sigma\lambda_D$ the summed failure rates of all dangerous base component failure modes.
$$ DiagnosticCoverage = \frac{\Sigma\lambda_{DD}}{\Sigma\lambda_D} $$

The \textbf{diagnostic coverage} for safe failures, where $\Sigma\lambda_{SD}$ represents the summed failure rates of the safe detected base component failure modes, and $\Sigma\lambda_S$ the summed failure rates of all safe base component failure modes, is given as
$$ SF = \frac{\Sigma\lambda_{SD}}{\Sigma\lambda_S} $$

\textbf{Safe Failure Fraction.}
A key concept in FMEDA is Safe Failure Fraction (SFF).
This is the ratio of safe and dangerous detected failures to all safe and dangerous failures.
Again this is usually expressed as a percentage.
$$ SFF = \frac{\Sigma\lambda_S + \Sigma\lambda_{DD}}{\Sigma\lambda_S + \Sigma\lambda_D} $$

SFF measures how proportionately fail-safe a system is, not how reliable it is.
A weakness in this philosophy is that adding extra safe failures (even unused ones) improves the SFF.

To achieve SIL levels, diagnostic coverage and SFF levels are prescribed along with hardware architectures and software techniques.
The overall aim of SIL is to classify the safety of a system, by statistically determining how frequently it can fail dangerously.
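
The ratios above can be illustrated with a small worked example.
The failure rates below are hypothetical, chosen only to make the arithmetic easy, and are taken to be in failures per $10^9$ hours; we also assume that $\Sigma\lambda_S = \Sigma\lambda_{SD} + \Sigma\lambda_{SU}$ and $\Sigma\lambda_D = \Sigma\lambda_{DD} + \Sigma\lambda_{DU}$.
$$ \Sigma\lambda_{SD} = 40, \quad \Sigma\lambda_{SU} = 10, \quad \Sigma\lambda_{DD} = 45, \quad \Sigma\lambda_{DU} = 5 $$
$$ DiagnosticCoverage = \frac{45}{45 + 5} = 90\% $$
$$ SFF = \frac{(40 + 10) + 45}{(40 + 10) + (45 + 5)} = \frac{95}{100} = 95\% $$
If a further 100 units of safe failure rate are added, with the dangerous failures unchanged, the result becomes
$$ SFF = \frac{150 + 45}{150 + 50} = \frac{195}{200} = 97.5\% $$
so the SFF improves even though the dangerous undetected rate is still 5.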
{
\begin{table}[ht]
\caption{SIL failure probability bands} % title of Table
%\centering % used for centering table
\begin{tabular}{|| l | c | c ||}
\hline
\textbf{SIL} & \textbf{Low Demand} & \textbf{Continuous Demand} \\
 & Prob. of failing on demand & Prob. of failure per hour \\ \hline \hline
4 & $ 10^{-5}$ to $< 10^{-4}$ & $ 10^{-9}$ to $< 10^{-8}$ \\ \hline
3 & $ 10^{-4}$ to $< 10^{-3}$ & $ 10^{-8}$ to $< 10^{-7}$ \\ \hline
2 & $ 10^{-3}$ to $< 10^{-2}$ & $ 10^{-7}$ to $< 10^{-6}$ \\ \hline
1 & $ 10^{-2}$ to $< 10^{-1}$ & $ 10^{-6}$ to $< 10^{-5}$ \\ \hline
\hline
\end{tabular}
\end{table}
Table adapted from EN61508-1:2001 [7.6.2.9 p33]

FMEDA is a modern extension of FMEA, in that it allows for self checking features, and provides detailed recommendations for computer/software architecture.
It has a simple final result, a Safety Integrity Level (SIL) from 1 to 4 (where 4 is safest).
%FMEA can be used as a term simple to mean Failure Mode Effects Analysis, and is
%part of product approval for many regulated products in the EU and the USA...

\section{FMEA used for Safety Critical Approvals}
\subsection{DESIGN FMEA: Safety Critical Approvals FMEA}
\begin{figure}[h]
 \centering
 \includegraphics[width=300pt,keepaspectratio=true]{./CH2_FMEA/tech_meeting.png}
 % tech_meeting.png: 350x299 pixel, 300dpi, 2.97x2.53 cm, bb=0 0 84 72
 \caption{FMEA Meeting}
 \label{fig:tech_meeting}
\end{figure}

This variant is known variously as Static FMEA, Design FMEA or Approvals FMEA.
Experts from the Approval House and the Equipment Manufacturer discuss selected component failure modes judged to be in critical sections of the product.
\subsection{DESIGN FMEA: Safety Critical Approvals FMEA} % \begin{figure}[h] % \centering % \includegraphics[width=70pt,keepaspectratio=true]{./tech_meeting.png} % % tech_meeting.png: 350x299 pixel, 300dpi, 2.97x2.53 cm, bb=0 0 84 72 % \caption{FMEA Meeting} % \label{fig:tech_meeting} % \end{figure} \begin{itemize} \item Impossible to look at all component failures let alone apply FMEA rigorously. \item In practise, failure scenarios for critical sections are contested, and either justified or extra safety measures implemented. \item Often Meeting notes or minutes only. Unusual for detailed arguments to be documented. \end{itemize} \section{Software FMEA (SFMEA)} \paragraph{Current work on Software FMEA} SFMEA usually does not seek to integrate hardware and software models, but to perform FMEA on the software in isolation~\cite{procsfmea}. % Work has been performed using databases to track the relationships between variables and system failure modes~\cite{procsfmeadb}, to %work has been performed to introduce automation into the FMEA process~\cite{appswfmea} and to provide code analysis automation~\cite{modelsfmea}. Although the SFMEA and hardware FMEAs are performed separately, some schools of thought aim for Fault Tree Analysis (FTA)~\cite{nasafta,nucfta} (top down - deductive) and FMEA (bottom-up inductive) to be performed on the same system to provide insight into the software hardware/interface~\cite{embedsfmea}. % Although this would give a better picture of the failure mode behaviour, it is by no means a rigorous approach to tracing errors that may occur in hardware through to the top (and therefore ultimately controlling) layer of software. \subsection{Current FMEA techniques are not suitable for software} The main FMEA methodologies are all based on the concept of taking base component {\fms}, and translating them into system level events/failures~\cite{sfmea,sfmeaa}. 
% In a complicated system, mapping a component failure mode to a system level failure will mean a long reasoning distance; that is to say the actions of the failed component will have to be traced through several sub-systems, gauging its effects with and on other components. % With software at the higher levels of these sub-systems, we have yet another layer of complication. % %In order to integrate software, %in a meaningful way %we need to re-think the %FMEA concept of simply mapping a base component failure to a system level event. % SFMEA regards, in place of hardware components, the variables used by the programs to be their equivalent~\cite{procsfmea}. The failure modes of these variables, are that they could become erroneously over-written, calculated incorrectly (due to a mistake by the programmer, or a fault in the micro-processor on which it is running), or external influences such as ionising radiation causing bits to be erroneously altered. \paragraph{A more-complete Failure Mode Model} % HFMEA % SFMEA % VARIABLE CURRUPTION % MICRO PROCESSOR FAULTS % INTERFACE ANALYSIS % % add them all together --- a load of bollocks, lots of impressive inches of reports that no one will be bothered to read.... % In order to obtain a more complete failure mode model of a hybrid electronic/software system we need to analyse the hardware, the software, the hardware the software runs on (i.e. the software's medium), and the software/hardware interface. % HFMEA is a well established technique and needs no further description in this paper. \section{Example for analysis} % : How can we apply FMEA} For the purpose of example, we chose a simple common safety critical industrial circuit that is nearly always used in conjunction with a programmatic element. A common method for delivering a quantitative value in analogue electronics is to supply a current signal to represent the value to be sent~\cite{aoe}[p.934]. 
Usually, $4mA$ represents a zero or starting value and $20mA$ represents the full scale, and this is referred to as {\ft} signalling.
%
{\ft} has an electrical advantage as well, because the current is the same at every point in a series loop~\cite{aoe}[p.20].
Thus resistance in the wires between the source and the receiving end does not alter the accuracy of the signal.
%
This circuit has many advantages for safety.
If the signal becomes disconnected it reads an out of range $0mA$ at the receiving end.
This is outside the {\ft} range, and is therefore easy to detect as an error rather than an incorrect value.
%
Should the driving electronics go wrong at the source end, it will usually supply far too little or far too much current, making an error condition easy to detect.
%
At the receiving end, one needs a resistor to convert the current signal into a voltage that we can read with an ADC.%
%we only require one simple component to convert the
%BLOCK DIAGRAM HERE WITH FT CIRCUIT LOOP
\begin{figure}[h]
 \centering
 \includegraphics[width=250pt]{./CH2_FMEA/ftcontext.png}
 % ftcontext.png: 767x385 pixel, 72dpi, 27.06x13.58 cm, bb=0 0 767 385
 \caption{Context Diagram for {\ft} loop}
 \label{fig:ftcontext}
\end{figure}

The diagram in figure~\ref{fig:ftcontext} shows some equipment which is sending a {\ft} signal to a micro-controller system.
The signal is locally driven over a load resistor, and then read into the micro-controller via an ADC and its multiplexer.
From the voltage detected at the ADC, via the multiplexer, we read the intended quantitative value from the external equipment.

\subsection{Simple Software Example}
Consider a software function that reads a {\ft} input, and returns a value between 0 and 999 (i.e. per mil $\permil$) representing the value intended by the current detected, with an additional error indication flag to indicate the validity of the value returned.
%
Let us assume the {\ft} detection is via a \ohms{220} resistor, and that we read a voltage from an ADC into the software.
Let us define any value outside the 4mA to 20mA range as an error condition.
%
We use Ohm's law~\cite{aoe}, $V=IR$, to determine the corresponding voltage range:
$$0.004A \times \ohms{220} = 0.88V $$
and
$$0.020A \times \ohms{220} = 4.4V \;.$$
%
Our acceptable voltage range is therefore
%
$$(V \ge 0.88) \wedge (V \le 4.4) \; .$$
This voltage range forms our input requirement.
%
We can now examine a software function that performs a conversion from the voltage read to a per~mil representation of the {\ft} input current.
%
For the purpose of example the `C' programming language~\cite{DBLP:books/ph/KernighanR88} is used\footnote{C coding examples use the Misra~\cite{misra} and SIL-3 recommended language constraints~\cite{en61508}.}.
We initially assume a function \textbf{read\_ADC} which returns a floating point
%double precision
value representing the voltage read (see code sample in figure~\ref{fig:code_read_4_20_input}).
%%{\vbox{
\begin{figure}[h]
\footnotesize
\begin{verbatim}
/***********************************************/
/* read_4_20_input()                           */
/***********************************************/
/* Software function to read 4mA to 20mA input */
/* returns a value from 0-999 proportional     */
/* to the current input.                       */
/* INPUT_4_20_mA (the ADC channel) is assumed  */
/* to be defined elsewhere.                    */
/***********************************************/
double read_ADC( int channel );

int read_4_20_input ( int * value ) {
     double input_volts;
     int error_flag;

     /* set ADC MUX with input to read from */
     input_volts = read_ADC(INPUT_4_20_mA);

     if ( input_volts < 0.88 || input_volts > 4.4 ) {
          error_flag = 1; /* Error flag set to TRUE */
     }
     else {
          /* scale 0.88V..4.4V linearly onto 0..999 */
          *value = (int) ( (input_volts - 0.88) /
                           ( 4.4 - 0.88 ) * 999.0 );
          error_flag = 0; /* indicate current input in range */
     }
     /* ensure: value is proportional (0-999) to the 4 to 20mA input */
     return error_flag;
}
\end{verbatim}
%}
%}
\caption{Software Function: \textbf{read\_4\_20\_input}}
\label{fig:code_read_4_20_input}
%\label{fig:420i}
\end{figure}

We now look at the function called by \textbf{read\_4\_20\_input}, \textbf{read\_ADC}, which returns a voltage for a given ADC channel.
%
This function deals directly with the hardware in the micro-controller on which we are running the software.
%
Its job is to select the correct channel (ADC multiplexer) and then to initiate a conversion by setting an ADC `go' bit (see code sample in figure~\ref{fig:code_read_ADC}).
%
It takes the raw ADC reading and converts it into a floating point\footnote{The type `double' or `double precision' is a standard C language floating point type~\cite{DBLP:books/ph/KernighanR88}.} voltage value.
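
The scaling that these two functions are meant to perform can be checked by hand.
This is a sketch under stated assumptions: the symbols $V$ and $count$ are introduced here for illustration, the output is taken to be a linear mapping of the acceptable voltage range onto 0 to 999, and the ADC is taken to be 12 bit with a 5V full scale.
For a mid-scale input of $12mA$ across the \ohms{220} resistor:
$$ V = 0.012 \times 220 = 2.64 $$
$$ value = \frac{V - 0.88}{4.4 - 0.88} \times 999 = \frac{1.76}{3.52} \times 999 = 499.5 $$
which is stored as 499 when assigned to a C integer.
The end points $V=0.88$ and $V=4.4$ map to 0 and 999 respectively.
For the raw conversion, an ADC reading of $count = 2162$ gives
$$ V = \frac{2162 \times 5.0}{4096} \approx 2.639 $$
which is the closest 12 bit reading below 2.64V.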
%{\vbox{
\begin{figure}[h]
\footnotesize
\begin{verbatim}
/***********************************************/
/* read_ADC()                                  */
/***********************************************/
/* Software function to read voltage from a    */
/* specified ADC MUX channel                   */
/* Assume 10 ADC MUX channels 0..9             */
/* ADC_CHAN_RANGE = 9                          */
/* Assume ADC is 12 bit and ADCRANGE = 4096    */
/* ADCMUX, ADCGO and ADCOUT are assumed to be  */
/* memory-mapped (volatile) ADC registers      */
/* returns voltage read as double precision    */
/***********************************************/
double read_ADC( int channel ) {
     int timeout = 0;
     double dval;

     /* return out of range result */
     /* if invalid channel selected */
     if ( channel < 0 || channel > ADC_CHAN_RANGE )
          return -2.0;

     /* set the multiplexer to the desired channel */
     ADCMUX = channel;
     ADCGO = 1; /* initiate ADC conversion hardware */

     /* wait for ADC conversion with timeout */
     while ( ADCGO == 1 && timeout < 100 )
          timeout++;

     if ( timeout < 100 )
          dval = (double) ADCOUT * 5.0 / ADCRANGE;
     else
          dval = -1.0; /* indicate invalid reading */

     /* return voltage as a floating point value */
     /* ensure: value is voltage input to within 0.1% */
     return dval;
}
\end{verbatim}
\caption{Software Function: \textbf{read\_ADC}}
\label{fig:code_read_ADC}
\end{figure}
%}
%}

We now have a very simple software structure, a call tree, where {\em read\_4\_20\_input} calls {\em read\_ADC}, which in turn interacts with the hardware/electronics.
%shown in figure~\ref{fig:ct1}.
%
% \begin{figure}[h]
%  \centering
%  \includegraphics[width=56pt]{./ct1.png}
%  % ct1.png: 151x224 pixel, 72dpi, 5.33x7.90 cm, bb=0 0 151 224
%  \caption{Call tree for software example}
%  \label{fig:ct1}
% \end{figure}
%
This software is above the hardware in the conceptual call tree: from a programmatic perspective,
%in software terms---
the software is reading values from the `lower~level' electronics.
%
%FMEA is always a bottom-up process and so we must begin with this hardware.
%
The hardware is simply a load resistor, connected across an ADC input pin on the micro-controller and ground.
%
We can identify the resistor and the ADC module of the micro-controller as the base components in this design.
%
We now apply FMMD starting with the hardware.

\section{Failure Mode effects Analysis}
Four emerging and current techniques are now used to apply FMEA to the hardware, the software, the software medium and the software/hardware interface.

\subsection{Hardware FMEA}
The hardware FMEA requires that for each component we consider all failure modes and the putative effect those failure modes would have on the system.
The electronic components in our {\ft} system are the load resistor, the multiplexer and the analogue to digital converter.
{
\tiny
\begin{table}[h]
\caption{Hardware FMEA {\ft}} % title of Table
\label{tbl:r420i}
\begin{tabular}{|| l | c | l ||}
\hline
\textbf{Component} & \textbf{Failure} & \textbf{System} \\
 & \textbf{mode} & \textbf{Failure} \\ \hline \hline
$R$ & OPEN~\cite{en298}[Ann.A] & $LOW$ \\
 & & $READING$ \\ \hline
$R$ & SHORT~\cite{en298}[Ann.A] & $HIGH$ \\
 & & $READING$ \\ \hline
$MUX$ & read wrong & $VAL\_ERROR$ \\
 & input~\cite{fmd91}[3-102] & \\ \hline
$ADC$ & ADC output & $VAL\_ERROR$ \\
 & erroneous~\cite{fmd91}[3-109] & \\ \hline
\hline
\end{tabular}
\end{table}
}

The last two failures both lead to the system failure $VAL\_ERROR$.
They could lead to a low or high reading as well, but we would only be able to determine this from knowledge of the software system's criteria for these.

\clearpage
\subsection{Software FMEA - variables in place of components}
For software FMEA, we take the variables used by the system, and examine what could happen if they are corrupted in various ways~\cite{procsfmea, embedsfmea}.
From the function $read\_4\_20\_input()$ we have the variables $error\_flag$, $input\_volts$ and $value$; from the function $read\_ADC()$ we have $timeout$, $ADCMUX$, $ADCGO$ and $dval$.
We must now determine putative system failure modes for these variables becoming corrupted;
this is performed in table~\ref{tbl:sfmea}.

{
\tiny
\begin{table}[h]
\caption{SFMEA {\ft}} % title of Table
\label{tbl:sfmea}
\begin{tabular}{|| l | c | l ||} \hline
\textbf{Variable} & \textbf{Failure mode} & \textbf{System failure} \\ \hline \hline
$error\_flag$ & set FALSE & $VAL\_ERROR$ \\ \hline
$error\_flag$ & set TRUE & invalid error flag \\ \hline
$input\_volts$ & corrupted & $VAL\_ERROR$ \\ \hline
$value$ & corrupted & $VAL\_ERROR$ \\ \hline
$timeout$ & corrupted & $VAL\_ERROR$ \\ \hline
$ADCMUX$ & corrupted & $VAL\_ERROR$ \\ \hline
$ADCGO$ & corrupted & $VAL\_ERROR$ \\ \hline
$dval$ & corrupted & $VAL\_ERROR$ \\ \hline
\hline
\end{tabular}
\end{table}
}

\clearpage
\subsection{Software FMEA - failure modes of the medium ($\mu P$) of the software}

Microprocessors and microcontrollers have sets of known failure modes; these include RAM, ROM and
EEPROM failure\footnote{EEPROM failure is not applicable for this example.}
and oscillator clock timing faults.
These are analysed in table~\ref{tbl:sfmeaup}.

{
\tiny
\begin{table}[h]
\caption{SFMEA of the software medium {\ft}} % title of Table
\label{tbl:sfmeaup}
\begin{tabular}{|| l | c | l ||} \hline
\textbf{Component} & \textbf{Failure mode} & \textbf{System failure} \\ \hline \hline
$RAM$ & variable corruption & All errors from table~\ref{tbl:sfmea} \\ \hline
$RAM$ & program flow & process halts / crashes \\ \hline
$OSC$ & stopped & process halts \\ \hline
$OSC$ & too fast & ADC value errors \\ \hline
$OSC$ & too slow & ADC value errors \\ \hline
$ROM$ & program corruption & All errors from table~\ref{tbl:sfmea} \\ \hline
$ROM$ & constant/data corruption & All errors from table~\ref{tbl:sfmea} \\ \hline
\hline
\end{tabular}
\end{table}
}

\clearpage
\subsection{Software FMEA - The software/hardware
interface}

As FMEA is applied separately to software and hardware, the interface between them is an undefined factor.
Ozarin~\cite{sfmeainterface,procsfmea} recommends that an FMEA report be written to focus on the software/hardware interface.
This interface has specific problems common to many systems and configurations, which are described in~\cite{sfmeainterface}.
%An interface FMEA is performed in table~\ref{hwswinterface}.
%
The hardware to software interface for the {\ft} example is handled by the `C' function $read\_ADC()$
(see code sample in figure~\ref{fig:code_read_ADC}).
%
% An FMEA of the `software~medium' is given in table~\ref{tbl:sfmeaup}.

\paragraph{Timing and Synchronisation.}
The $ADCOUT$ register, from which the raw ADC value is read, is an internal register used by the ADC
and presented as a readable memory location once the ADC has finished updating it.
Reading it at the wrong time would return an invalid value.
Synchronisation is performed by polling the $ADCGO$ bit, a flag mapped to memory by which the ADC
indicates that the data is ready.

\paragraph{Interrupt Contention.}
Were an interrupt routine also to read from the ADC, $ADCMUX$ could be altered,
causing the non-interrupt routine to read from the wrong channel.

\paragraph{Data Formatting.}
The ADC may use a big-endian or little-endian integer format.
It may also right or left justify the bits in its value.

\subsection{SFMEA Conclusion}
%
This chapter has used a very simple example (the industry standard {\ft}
input circuit and software) to demonstrate the SFMEA and HFMEA methodologies used to describe a failure mode model.
%Even a modest system would be far too large to analyse in conference paper
%and this
%
%The {\dc} representing the {\ft} reader
%shows that by taking a
%modular approach for FMEA, i.e.
%FMMD, we can integrate
Our model is described by four FMEA reports, which
% we can model the failure mode behaviour from
model the system from several failure mode perspectives.
%
With traditional FMEA methods the reasoning~distance is large, because
it stretches from the component failure mode to the top (or system) level failure.
%
With these four analysis reports we have no intermediate stages along the `reasoning~path'
linking the failure modes of the electronics to those in the software.
%Software is often written `defensively' but t
%Each {\fg} to {\dc} transition represents a
%reasoning stage.
%
%
%For this reason applying traditional FMEA to software stretches
%the reasoning distance even further.
%
In fact, many of these reasoning paths overlap, or even by-pass one another, so
it is very difficult to gauge cause and effect.
For instance, hardware failures are not analysed in the context of how they will be handled (or missed) by the software.
%
Likewise, system outputs commanded from software may not take particular hardware limitations into account.

The interface FMEA does provide a useful check-list to ensure that the data and synchronisation conventions
used by the hardware and software are not mismatched.
However, the fact that it is perceived as required
%The fact its required
highlights the mismatches possible between the two types of analysis,
which could run deeper than the mere interface level.

While these techniques ensure that the software and hardware
are viewed and analysed from several perspectives, the result cannot be termed a homogeneous
failure mode model.
%
% For instance
% were the ADC to have a small value error, say adding
% a small percentage onto the value, we would be unable to
% detect this under the analysis conditions for this model, or
% be able to pinpoint it.
%
\section{Conclusion}

FMEA is a useful tool for basic safety analysis.
It provides statistics on safety where field data is impractical to obtain,
and it works very well with single failure modes linked to top level events.
FMEA has become part of the safety critical and safety certification industries.
SFMEA is in its infancy, but there is a gap in current certification for software:
EN61508 recommends hardware redundancy architectures in conjunction with FMEDA for hardware,
whereas for software it recommends language constraints and quality procedures but no inductive fault finding technique.