\documentclass{beamer}
\title[Failure Mode Effects Analysis]{Failure Mode Effects Analysis\\A critical view}
\usetheme{Warsaw}
\usepackage[latin1]{inputenc}
\author{Robin Clark -- Energy Technology Control Ltd}
\institute{Brighton University}
\setbeamertemplate{footline}[page number]
\begin{document}

\section{F.M.E.A.}

\begin{frame}
\frametitle{Outline}
\tableofcontents[currentsection]
\end{frame}

\begin{frame}
\begin{itemize}
\pause \item Failure
\pause \item Mode
\pause \item Effects
\pause \item Analysis
\end{itemize}
\end{frame}
%
% \begin{itemize}
% \item Failure
% \item Mode
% \item Effects
% \item Analysis
% \end{itemize}

\subsection{FMEA basic concept}

\begin{frame}
\begin{itemize}
\pause \item \textbf{F - Failures of given component} Consider a component in a system
\pause \item \textbf{M - Failure Mode} Look at one of the ways in which it can fail (i.e. determine a component `failure~mode')
\pause \item \textbf{E - Effects} Determine the effects this failure mode will have on the system we are examining
\pause \item \textbf{A - Analysis} Analyse how much impact this symptom will have on the environment, on people, or on the system itself
\end{itemize}
\end{frame}

\begin{frame}
Example: Let us consider a system, in this case a milli-volt reader, consisting of instrumentation amplifiers connected to a micro-processor that reports its readings via RS-232. Let us perform an FMEA and consider how the failure of one of its resistors could affect it. For the sake of example, let us choose a resistor in an op-amp stage reading the milli-volt source, such that if it were to go open the amplifier would have a gain of 1.
\begin{itemize}
\pause \item \textbf{F - Failures of given component} The resistor could fail by going OPEN or SHORT (EN298 definition).
\pause \item \textbf{M - Failure Mode} Consider the component failure mode OPEN
\pause \item \textbf{E - Effects} This will disconnect the feedback loop in the amplifier, causing a LOW READING
\pause \item \textbf{A - Analysis} The reading will be out of normal range, and we will have an erroneous milli-volt reading
\end{itemize}
\end{frame}

\begin{frame}
Note here that we have had to look at the failure~mode in relation to the entire circuit. We have used intuition to determine the probable effect of this failure mode. We have not examined this failure mode against every other component in the system. Perhaps we should: this would be a more rigorous and complete approach to looking for system failures.
\end{frame}

\subsection{Rigorous FMEA - State Explosion}

\begin{frame}
\frametitle{Rigorous Single Failure FMEA}
Consider the analysis where we look at all the failure modes in a system, and then see how they can affect all other components within it. We need to look at a large number of failure scenarios to do this completely (all failure modes against all components). This is represented in equation~\ref{eqn:fmea_single}, where $N$ is the total number of components in the system, and $cfm$ is the number of failure modes per component.
\end{frame}

\begin{frame}
\frametitle{Rigorous Single Failure FMEA}
\begin{equation}
\label{eqn:fmea_single}
N.(N-1).cfm
% \\
%(N^2 - N).cfm
\end{equation}
This means that of the order of $N^2$ checks are needed to perform `rigorous~FMEA'. Even small systems typically have 100 components, each with 3 or more failure modes. $100 \times 99 \times 3 = 29{,}700$.
\end{frame}

\begin{frame}
\frametitle{Rigorous Double Failure FMEA}
When looking at potential double failure scenarios (two components failing within a given time frame), the order becomes $N^3$.
\begin{equation}
\label{eqn:fmea_double}
N.(N-1).(N-2).cfm
% \\
%(N^2 - N).cfm
\end{equation}
$100 \times 99 \times 98 \times 3 = 2{,}910{,}600$.
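As a rough comparison, assuming the same illustrative figures ($N=100$, $cfm=3$), this is $N-2=98$ times the single-failure count:
$$
\frac{100.99.98.3}{100.99.3} = 98
$$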
The European gas burner standard (EN298:2003) demands the checking of double failure scenarios (for burner lock-out scenarios).
\end{frame}

\section{FMEA used for Safety Critical Approvals}

\begin{frame}
\frametitle{Safety Critical Approvals FMEA}
Experts from the Approval House and the Equipment Manufacturer discuss selected component failure modes judged to be in critical sections of the product.
\begin{figure}[h]
\centering
\includegraphics[width=100pt,keepaspectratio=true]{./tech_meeting.png}
% tech_meeting.png: 350x299 pixel, 300dpi, 2.97x2.53 cm, bb=0 0 84 72
\caption{FMEA Meeting}
\label{fig:tech_meeting}
\end{figure}
\end{frame}

\begin{frame}
\frametitle{Safety Critical Approvals FMEA}
\begin{figure}[h]
\centering
\includegraphics[width=70pt,keepaspectratio=true]{./tech_meeting.png}
% tech_meeting.png: 350x299 pixel, 300dpi, 2.97x2.53 cm, bb=0 0 84 72
\caption{FMEA Meeting}
\label{fig:tech_meeting}
\end{figure}
\begin{itemize}
\pause \item Impossible to look at all component failures, let alone apply FMEA rigorously.
\pause \item In practice, failure scenarios for critical sections are contested, and either justified or extra safety measures implemented.
\pause \item Recorded only as meeting notes or minutes.
\end{itemize}
\end{frame}

\section{PFMEA - Production FMEA : 1940s to present}

\begin{frame}
Production FMEA (or PFMEA) is FMEA used to prioritise, in terms of cost, problems to be addressed in product production. It focuses on known problems, and determines how frequently they occur and their cost to fix. These two values are multiplied together to give a Risk Priority Number (RPN). Fixing the problems with the highest RPN returns the greatest cost benefit.
\end{frame}

\begin{frame}
% benign example of PFMEA in CARS - make something up.
\frametitle{PFMEA Example}
{
\begin{table}[ht]
\caption{FMEA Calculations} % title of Table
%\centering % used for centering table
\begin{tabular}{|| l | l | c | c | l ||}
\hline
\textbf{Failure Mode} & \textbf{P} & \textbf{Cost} & \textbf{Symptom} & \textbf{RPN} \\
\hline \hline
relay 1 n/c & $1 \times 10^{-5}$ & 38.0 & indicators fail & 0.00038 \\ \hline
relay 2 n/c & $1 \times 10^{-5}$ & 98.0 & doorlocks fail & 0.00098 \\ \hline
% rear end crash & $14.4*10^{-6}$ & 267,700 & fatal fire & 3.855 \\
% ruptured f.tank & & & & \\ \hline
\hline
\end{tabular}
\end{table}
}
%Savings: 180 burn deaths, 180 serious burn injuries, 2,100 burned vehicles. Unit Cost: $200,000 per death, $67,000 per injury, $700 per vehicle.
%Total Benefit: 180 X ($200,000) + 180 X ($67,000) + $2,100 X ($700) = $49.5 million.
%COSTS
%Sales: 11 million cars, 1.5 million light trucks.
%Unit Cost: $11 per car, $11 per truck.
%Total Cost: 11,000,000 X ($11) + 1,500,000 X ($11) = $137 million.
\end{frame}

%\subsection{Production FMEA : Example Ford Pinto : 1975}

\begin{frame}
\frametitle{PFMEA Example: Ford Pinto: 1975}
\begin{figure}[h]
\centering
\includegraphics[width=200pt]{./ad_ford_pinto_mpg_red_3_1975.jpg}
% ad_ford_pinto_mpg_red_3_1975.jpg: 720x933 pixel, 96dpi, 19.05x24.69 cm, bb=0 0 540 700
\caption{Ford Pinto Advert}
\label{fig:fordpintoad}
\end{figure}
\end{frame}

\begin{frame}
\frametitle{PFMEA Example: Ford Pinto: 1975}
\begin{figure}[h]
\centering
\includegraphics[width=200pt]{./burntoutpinto.png}
% burntoutpinto.png: 376x250 pixel, 72dpi, 13.26x8.82 cm, bb=0 0 376 250
\caption{Burnt Out Pinto}
\label{fig:burntoutpinto}
\end{figure}
\end{frame}

\begin{frame}
\frametitle{PFMEA Example: Ford Pinto: 1975}
{
\begin{table}[ht]
\caption{FMEA Calculations} % title of Table
%\centering % used for centering table
\begin{tabular}{|| l | l | c | c | l ||}
\hline
\textbf{Failure Mode} & \textbf{P} & \textbf{Cost} & \textbf{Symptom} & \textbf{RPN} \\
\hline \hline
relay 1 n/c & $1 \times 10^{-5}$ & 38.0 & indicators
fail & 0.00038 \\ \hline
relay 2 n/c & $1 \times 10^{-5}$ & 98.0 & doorlocks fail & 0.00098 \\ \hline
rear end crash & $14.4 \times 10^{-6}$ & 267,700 & fatal fire & 3.855 \\
ruptured f.tank & & & allow & \\ \hline
rear end crash & $1$ & $11$ & recall & 11.0 \\
ruptured f.tank & & & fix tank & \\ \hline
\hline
\end{tabular}
\end{table}
}
http://www.youtube.com/watch?v=rcNeorjXMrE
\end{frame}

\section{FMECA - Failure Modes Effects and Criticality Analysis}

\begin{frame}
\frametitle{FMECA - Failure Modes Effects and Criticality Analysis}
Very similar to PFMEA, but instead of cost, a criticality or seriousness factor is ascribed to putative top level incidents. FMECA has three probability factors for component failures.

\textbf{FMECA ${\lambda}_{p}$ value.} This is the overall failure rate of a base component. This will typically be the failure rate per million ($10^6$) or billion ($10^9$) hours of operation.

\textbf{FMECA $\alpha$ value.} The failure mode probability, usually denoted by $\alpha$, is the probability of a particular failure mode occurring within a component.
%, should it fail.
%A component with N failure modes will thus have
%have an $\alpha$ value associated with each of those modes.
%As the $\alpha$ modes are probabilities, the sum of all $\alpha$ modes for a component must equal one.
\end{frame}

\begin{frame}
\frametitle{FMECA - Failure Modes Effects and Criticality Analysis}
\textbf{FMECA $\beta$ value.} The second probability factor, $\beta$, is the probability that the failure mode will cause a given system failure. This is a conditional (`Bayesian') probability: given a particular component failure mode, the probability of a given system level failure.

\textbf{FMECA `t' Value} The time that a system will be operating for, or the working lifetime of the product, is represented by the variable $t$.
%for probability of failure on demand studies,
%this can be the number of operating cycles or demands expected.

\textbf{Severity `s' value} A weighting factor to indicate the seriousness of the putative system level error.
%Typical classifications are as follows:~\cite{fmd91}
\begin{equation}
C_m = {\beta} . {\alpha} . {{\lambda}_p} . {t} . {s}
\end{equation}
The highest $C_m$ values would be at the top of a `to~do' list for a project manager.
\end{frame}

\section{FMEDA - Failure Modes Effects and Diagnostic Analysis}

\begin{frame}
\frametitle{FMEDA - Failure Modes Effects and Diagnostic Analysis}
FMEDA is the methodology behind statistical (safety integrity level) type standards (EN61508/IEC61508). It provides a statistical overall level of safety and allows diagnostic mitigation for self-checking etc. It provides guidelines for the design and architecture of computer/software systems for the four levels of safety integrity.

For hardware, FMEDA does force the user to consider all components in a system by requiring that an MTTF value is assigned to each. This MTTF may be statistically mitigated (improved) if it can be shown that self-checking will detect failure modes.
\end{frame}

\begin{frame}
Failure modes are classified as Safe or Dangerous according to the putative system level failure they will cause. The failure modes are also classified as Detected or Undetected. This gives us four failure mode classifications: Safe-Detected (SD), Safe-Undetected (SU), Dangerous-Detected (DD) or Dangerous-Undetected (DU). The probabilistic failure rate of each classification is represented by a lambda variable (i.e. $\lambda_{SD}$, $\lambda_{SU}$, $\lambda_{DD}$, $\lambda_{DU}$).
\end{frame}

\begin{frame}
\textbf{Diagnostic Coverage.} The diagnostic coverage is simply the ratio of the dangerous detected failure rates to the failure rate of all dangerous failures, and is normally expressed as a percentage. $\Sigma\lambda_{DD}$ represents the summed failure rate of the dangerous detected base component failure modes, and $\Sigma\lambda_D$ the summed failure rate of all dangerous base component failure modes.
$$ \mathit{DiagnosticCoverage} = \frac{\Sigma\lambda_{DD}}{\Sigma\lambda_D} $$
\end{frame}

\begin{frame}
The diagnostic coverage for safe failures, where $\Sigma\lambda_{SD}$ represents the summed failure rate of the safe detected base component failure modes, and $\Sigma\lambda_S$ the summed failure rate of all safe base component failure modes, is given as
$$ SF = \frac{\Sigma\lambda_{SD}}{\Sigma\lambda_S} $$

\textbf{Safe Failure Fraction.} A key concept in FMEDA is Safe Failure Fraction (SFF). This is the ratio of safe and dangerous detected failures against all safe and dangerous failure probabilities. Again this is usually expressed as a percentage.
$$ SFF = \frac{\Sigma\lambda_S + \Sigma\lambda_{DD}}{\Sigma\lambda_S + \Sigma\lambda_D} $$
\end{frame}

\begin{frame}
SIL levels and how they are calculated
\end{frame}

\section{FMEA - General Criticism}

\begin{frame}
\frametitle{FMEA - General Criticism}
\begin{itemize}
\pause \item Reasoning distance - component failure to system level symptom
\pause \item State explosion - impossible to perform rigorously
\pause \item Difficult to re-use previous analysis work
\pause \item FMEA type methodologies were designed for the simple electro-mechanical systems of the 1940s to 1960s.
\end{itemize}
FMEDA is a modern extension of FMEA, in that it will allow for self-checking features, and provides detailed recommendations for computer/software architecture, but
\end{frame}

\section{Failure Mode Modular De-Composition}
\subsection{FMEA and complexity of each failure scenario analysis}

\begin{frame}
Consider the FMEA type methodologies where we look at all the failure modes in a system, and then see how they can affect all other components within it, to determine its system level symptom or failure mode. We need to look at a large number of failure scenarios to do this completely (all failure modes against all components).
This is represented in equation~\ref{eqn:fmea_state_exp}, where $N$ is the total number of components in the system, and $cfm$ is the number of failure modes per component.
\begin{equation}
\label{eqn:fmea_state_exp}
N.(N-1).cfm
% \\
%(N^2 - N).cfm
\end{equation}
The FMMD methodology breaks the analysis down into small stages, by making the analyst choose functional groups; once analysed, each group is treated as a component to be used at a higher stage. This is designed to address the state explosion, of order $N^2$, inherent in equation~\ref{eqn:fmea_state_exp}.
\end{frame}

We can view the functional groups in FMMD as forming a hierarchy. If, for the sake of example, we consider each functional group to be three components, figure~\ref{fig:three_tree} shows how the levels work and converge to a top or system level.
\begin{figure}
\centering
\includegraphics[width=300pt]{./three_tree.png}
% three_tree.png: 780x226 pixel, 72dpi, 27.52x7.97 cm, bb=0 0 780 226
\caption{Functional Group Tree example}
\label{fig:three_tree}
\end{figure}

\clearpage

We can represent the number of failure scenarios to check in an FMMD hierarchy with equation~\ref{eqn:anscen}.
\begin{equation}
\label{eqn:anscen}
\sum_{n=0}^{L} {fgn}^{n}.fgn.cfm.(fgn-1)
\end{equation}
Here $fgn$ is the number of components in each functional group, $cfm$ is the number of failure modes per component and $L$ is the number of levels in addition to the zeroth.

So for a very simple analysis with three components forming a functional group, where each component has three failure modes, we have only one level (the zeroth). Checking every failure mode against the other components in the functional group then requires 18 checks.
\begin{equation}
\label{eqn:anscen2}
\sum_{n=0}^{0} {3}^{n}.3.3.(3-1) = 18
\end{equation}

\clearpage

In other words, we have three components in our functional group, and nine failure modes to consider.
So taking each failure mode and looking at how it could affect the functional group, we must compare each failure mode against the two other components (the `$fgn-1$' term).

For the single `zero' level FMMD case we are doing the same thing as FMEA type analysis (but on a very simple small sub-system). We are looking at how each failure~mode can affect the system/top level. We can use equation~\ref{eqn:fmea_state_exp} to represent the number of checks needed to perform FMEA rigorously, where $N$ is the total number of components in the system, and $cfm$ is the number of failure modes per component. Where $N=3$ and $cfm=3$ we can see that the number of checks for this simple functional group is the same for equation~\ref{eqn:fmea_state_exp} and equation~\ref{eqn:anscen}.

\clearpage

\section{Example}

To see the effects of reducing `state~explosion' we need to look at a larger system. Let us take a system with 3 levels and apply these formulae. Having three levels (in addition to the top zeroth level) will require 81 base level components.
$$
%\begin{equation}
% \label{eqn:fmea_state_exp22}
81.(81-1).3 = 19440
% \\
%(N^2 - N).cfm
%\end{equation}
$$
$$
%\begin{equation}
% \label{eqn:anscen}
\sum_{n=0}^{3} {3}^{n}.3.3.(2) = 720
%\end{equation}
$$
Thus for FMMD we need to examine 720 failure mode scenarios, and for traditional FMEA type analysis methods 19440.
% In practical example followed through, no more than 9 components have ever been required for a functional
% group and the largest known number of failure modes has been 6.
% If we take these numbers and double them (18 components per functional group
% and 12 failure modes per component) and apply the formulas for a 4 level analysis
% (i.e.

\clearpage

Note that for all possible double simultaneous failures the equation~\ref{eqn:fmea_state_exp} becomes equation~\ref{eqn:fmea_state_exp2}, essentially making the order $N^3$.
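
As a rough illustration only, the double failure counts can be worked through for the 81 component example above, assuming the same figures ($cfm=3$, $fgn=3$ and three levels in addition to the zeroth). The rigorous FMEA count gains a factor of $(N-2)$:
$$
81.(81-1).(81-2).3 = 1535760
$$
Applying the corresponding $(fgn-2)$ factor within each functional group leaves the FMMD count unchanged at 720, because $fgn-2=1$ when every functional group has only three components:
$$
\sum_{n=0}^{3} {3}^{n}.3.3.(3-1).(3-2) = 720
$$
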
The FMMD case (equation~\ref{eqn:anscen3}) is cubic within the functional groups only, not across all the components in the system.
\begin{equation}
\label{eqn:fmea_state_exp2}
N.(N-1).(N-2).cfm
% \\
%(N^2 - N).cfm
\end{equation}
\begin{equation}
\label{eqn:anscen3}
\sum_{n=0}^{L} {fgn}^{n}.fgn.cfm.(fgn-1).(fgn-2)
\end{equation}
\end{document}