The generic and statistical European Safety Standard, EN61508:6~\cite[B.6.6]{en61508},
describes Failure Mode Effect Analysis (FMEA) as:
\begin{quotation}
``To analyse a system design, by examining all possible sources of failure
of a system's components and determining the effects of these failures
on the behaviour and safety of the system.''
\end{quotation}
|
|
|
|
\section{Concepts}
|
|
|
|
\paragraph{Forward and backward searches}
|
|
|
|
A forward search starts with possible failure causes
and uses logic and reasoning to determine system level outcomes.
A backward search starts with system level events
and works back down (though not necessarily to
base components in a system) using decomposition
of the system and logic.
FMEA based methodologies are forward searches~\cite{Lutz:1997:RAU:590564.590572}, whereas top down
methodologies such as FTA~\cite{nucfta,nasafta} are backward searches.
|
|
|
|
\paragraph{Reasoning distance}
|
|
A reasoning distance is the number of stages of logic and reasoning
|
|
required to map a failure cause to its potential outcomes.
|
|
%.... general concept... simple ideas about how complex a
|
|
%failure analysis is the more modules and components are involved
|
|
% cite for forward and backward search related to safety critical software
|
|
%{sfmeaforwardbackward}
|
|
|
|
\section{FMEA}
|
|
|
|
%\subsection{FMEA}
|
|
%\tableofcontents[currentsection]
|
|
|
|
|
|
FMEA is a broad term; it could mean anything from an informal check on
how failures could affect some equipment in an initial brain-storming session
in product design, to a formal submission as part of safety critical certification.
%
This chapter describes basic concepts of FMEA, uses a simple example to
demonstrate a single FMEA analysis stage, describes the five main variants of FMEA in use today
and explores some concepts with which we can discuss and evaluate
the effectiveness of FMEA.
|
|
|
|
|
|
% \subsection{FMEA}
|
|
% This talk introduces Failure Mode Effects Analysis, and the different ways it is applied.
|
|
% These techniques are discussed, and then
|
|
% a refinement is proposed, which is essentially a modularisation of the FMEA process.
|
|
% %
|
|
%
|
|
% \begin{itemize}
|
|
% \item Failure
|
|
% \item Mode
|
|
% \item Effects
|
|
% \item Analysis
|
|
% \end{itemize}
|
|
%
|
|
%
|
|
%
|
|
% % % \begin{itemize}
|
|
% % \item Failure
|
|
% % \item Mode
|
|
% % \item Effects
|
|
% % \item Analysis
|
|
% % \end{itemize}
|
|
|
|
\clearpage
|
|
\paragraph{FMEA basic concept.}
|
|
|
|
|
|
\begin{itemize}
|
|
|
|
\item \textbf{F - Failures of given component} Consider a component in a system
|
|
\item \textbf{M - Failure Mode} Look at one of the ways in which it can fail (i.e. determine a component `failure~mode')
|
|
\item \textbf{E - Effects} Determine the effects this failure mode will cause to the system we are examining
|
|
\item \textbf{A - Analysis} Analyse how much impact this symptom will have on the environment/people/the system itself
|
|
\end{itemize}
|
|
|
|
|
|
|
|
FMEA is a procedure based on the low level components of a system, and an example
analysis will serve to demonstrate it in practice.
|
|
|
|
\paragraph{ FMEA Example: Milli-volt reader}
|
|
Example: Let us consider a system, in this case a milli-volt reader, consisting
|
|
of instrumentation amplifiers connected to a micro-processor
|
|
that reports its readings via RS-232.
|
|
\begin{figure}
|
|
\centering
|
|
\includegraphics[width=175pt]{./CH2_FMEA/mvamp.png}
|
|
% mvamp.png: 561x403 pixel, 72dpi, 19.79x14.22 cm, bb=0 0 561 403
|
|
\end{figure}
|
|
|
|
|
|
|
|
|
|
|
|
|
|
\subsection{FMEA Example: Milli-volt reader}
|
|
Let us perform an FMEA and consider how one of its resistors failing could affect
|
|
it.
|
|
For the sake of example let us choose resistor R1 in the OP-AMP gain circuitry.
|
|
% \begin{figure}
|
|
% \centering
|
|
% \includegraphics[width=175pt]{./mvamp.png}
|
|
% % mvamp.png: 561x403 pixel, 72dpi, 19.79x14.22 cm, bb=0 0 561 403
|
|
% \end{figure}
|
|
|
|
|
|
|
|
\paragraph{FMEA Example: Milli-volt reader}
|
|
% \begin{figure}
|
|
% \centering
|
|
% \includegraphics[width=80pt]{./mvamp.png}
|
|
% % mvamp.png: 561x403 pixel, 72dpi, 19.79x14.22 cm, bb=0 0 561 403
|
|
% \end{figure}
|
|
\begin{itemize}
|
|
\item \textbf{F - Failures of given component} The resistor (R1) could fail by going OPEN or SHORT (EN298 definition).
|
|
\item \textbf{M - Failure Mode} Consider the component failure mode SHORT
|
|
\item \textbf{E - Effects} This will drive the minus input LOW causing a HIGH OUTPUT/READING
|
|
\item \textbf{A - Analysis} The reading will be out of the normal range, and we will have an erroneous milli-volt reading
|
|
\end{itemize}
|
|
|
|
|
|
|
|
|
|
The analysis above has given us a result for one failure scenario i.e.
|
|
for one component failure mode.
|
|
A complete FMEA report would have to contain an entry
|
|
for each failure mode of all the components in the system under investigation.
|
|
%
|
|
Note here that we have had to look at the failure~mode
|
|
in relation to the entire circuit.
|
|
We have used intuition to determine the probable
|
|
effect of this failure mode.
|
|
For instance we have assumed that the resistor R1 going SHORT
|
|
will not affect the ADC, the Microprocessor or the UART.
|
|
%
|
|
To put this in more general terms, we have not examined this failure mode
against every other component in the system.
|
|
Perhaps we should: this would be a more rigorous and complete
|
|
approach in looking for system failures.
|
|
|
|
|
|
\section{Theoretical Concepts in FMEA}
|
|
|
|
|
|
\subsection{The unacceptability of a single component failure causing a catastrophe}
|
|
|
|
FMEA, due to its inductive bottom-up approach, is very good
|
|
at finding potential single component failures that could have catastrophic implications.
|
|
Used in the design phase of a project, FMEA is an invaluable tool
for unearthing these failure scenarios.
It is less useful for determining catastrophic events caused by multiple
simultaneous\footnote{Multiple simultaneous failures are taken to mean failures that occur within the same detection period.} failures.
|
|
|
|
\subsection{Impracticality of Field Data for modern systems}
|
|
|
|
Modern electronic components are generally very reliable, and the systems built from them
are thus very reliable too. Reliable field data on failures will therefore be sparse.
Should we wish to prove a continuous demand system for, say, ${10}^{-7}$ failures\footnote{${10}^{-7}$ failures per hour of operation is the
threshold for S.I.L. 3 reliability~\cite{en61508}. Failure rates are normally measured per $10^9$ hours of operation
and are known as Failure in Time (FIT) values. The maximum FIT value for a SIL 3 system is therefore 100.}
per hour of operation, even with 1000 correctly monitored units in the field
we could only expect one failure per ten thousand hours (a little over a year of operation).
It would be utterly impractical to get statistically significant data for equipment
at these reliability levels.
|
|
However, we can use FMEA (more specifically the FMEDA variant, see section~\ref{sec:FMEDA}),
|
|
working from known component failure rates, to obtain
|
|
statistical estimates of the equipment reliability.
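
As a rough check of the field data figures quoted above, assume a constant failure rate of $\lambda = 10^{-7}$ per hour for each unit and a fleet of $n = 1000$ monitored units (the symbols $\lambda$, $n$ and $T$ are introduced here purely for illustration). The expected fleet failure rate, and the mean time $T$ between observed failures, are then
\[
n \lambda = 1000 \times 10^{-7} = 10^{-4} \ \mathrm{failures\ per\ hour},
\qquad
T = \frac{1}{n \lambda} = 10^{4} \ \mathrm{hours} \approx 1.14 \ \mathrm{years},
\]
taking 8760 hours per year. Expressed as a FIT value, the same per-unit rate is $10^{-7} \times 10^{9} = 100$ failures per $10^9$ hours.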
|
|
|
|
|
|
\subsection{FMEA and the State Explosion Problem}
|
|
|
|
\paragraph{Rigorous Single Failure FMEA}
|
|
|
|
FMEA for a safety critical certification~\cite{en298,en61508} will have to be applied
|
|
to all known failure modes of all components within a system.
|
|
|
|
To perform FMEA rigorously, we would need to examine every possible interaction
of a failure mode with all other components in a system. In other words,
we would need to look at all possible failure scenarios.
%to do this completely (all failure modes against all components).
The number of checks this requires is given in equation~\ref{eqn:fmea_single},
where $N$ is the total number of components in the system, and
$f$ is the number of failure modes per component.
|
|
|
|
|
|
\begin{equation}
|
|
\label{eqn:fmea_single}
|
|
N \cdot (N-1) \cdot f % \\
|
|
%(N^2 - N).f
|
|
\end{equation}
|
|
|
|
|
|
This means that the number of checks required
to undertake a `rigorous~FMEA' is of order $O(N^2)$. Even small systems typically have
100 components, and these typically have 3 or more failure modes each, giving
$100 \times 99 \times 3 = 29{,}700$ checks.
|
|
|
|
\paragraph{Rigorous Double Failure FMEA}
|
|
When we look at potential double failure
scenarios\footnote{Certain double failure scenarios are already legal requirements: the European gas burner standard (EN298:2003)
demands the checking of double failure scenarios (for burner lock-out scenarios).}
(two components failing within a given time frame), the order becomes $O(N^3)$.
|
|
|
|
\begin{equation}
|
|
\label{eqn:fmea_double}
|
|
N \cdot (N-1) \cdot (N-2) \cdot f % \\
|
|
%(N^2 - N).f
|
|
\end{equation}
|
|
|
|
For our theoretical example of 100 components with 3 failure modes each, this is
$100 \times 99 \times 98 \times 3 = 2{,}910{,}600$ failure mode scenarios.
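
To illustrate how quickly these counts grow, the single and double failure expressions can be evaluated for a system twice the size. The figure $N = 200$ is an arbitrary value chosen here for illustration, with $f = 3$ as before:
\[
200 \times 199 \times 3 = 119{,}400,
\qquad
200 \times 199 \times 198 \times 3 = 23{,}641{,}200.
\]
Doubling the component count thus multiplies the single failure checks by roughly four and the double failure checks by roughly eight, as would be expected from the $O(N^2)$ and $O(N^3)$ orders.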
|
|
|
|
|
|
\paragraph{Reliance on experts for meaningful FMEA.}
For practical reasons, FMEA cannot take a rigorous approach.
We define rigorous FMEA as examining the effect of every component failure mode
against the remaining components in the system under investigation.
%
Because we cannot perform rigorous FMEA,
we rely on experts in the system under investigation
to perform a meaningful FMEA.
|
|
|
|
|
|
|
|
|
|
\section{FMEA in practice: Five variants}
|
|
|
|
\paragraph{Five main Variants of FMEA}
|
|
\begin{itemize}
|
|
\item \textbf{PFMEA - Production} Car manufacture etc.
\item \textbf{FMECA - Criticality} Military/Space
\item \textbf{FMEDA - Statistical safety} EN61508/IEC61508 Safety Integrity Levels
\item \textbf{DFMEA - Design or static/theoretical} EN298/EN230/UL1998
\item \textbf{SFMEA - Software} Only used in highly critical systems at present
|
|
\end{itemize}
|
|
|
|
|
|
|
|
|
|
|
|
\section{PFMEA - Production FMEA: 1940s to present}
|
|
|
|
|
|
|
|
Production FMEA (or PFMEA) is FMEA used to prioritise, in terms of
cost, problems to be addressed in product production.

It focuses on known problems and determines the
frequency with which they occur and their cost to fix.
These two values are multiplied together to give a Risk Priority Number (RPN).
Fixing the problems with the highest RPN
will return the most cost benefit.
|
|
|
|
% benign example of PFMEA in CARS - make something up.
|
|
\subsection{PFMEA Example}
|
|
\begin{table}[ht]
\caption{FMEA Calculations} % title of Table
%\centering % used for centering table
\begin{tabular}{|| l | l | c | c | l ||} \hline
\textbf{Failure Mode} & \textbf{P} & \textbf{Cost} & \textbf{Symptom} & \textbf{RPN} \\ \hline \hline
relay 1 n/c & $1 \times 10^{-5}$ & 38.0 & indicators fail & 0.00038 \\ \hline
relay 2 n/c & $1 \times 10^{-5}$ & 98.0 & doorlocks fail & 0.00098 \\ \hline
% rear end crash & $14.4*10^{-6}$ & 267,700 & fatal fire & 3.855 \\
% ruptured f.tank & & & & \\ \hline
\hline
\end{tabular}
\end{table}
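
The RPN column of the table can be reproduced directly from the two values being multiplied. Writing $P$ for the probability of occurrence and $C$ for the cost to fix (the symbol $C$ is introduced here only for illustration), the two rows give
\[
RPN = P \times C: \qquad
1 \times 10^{-5} \times 38.0 = 3.8 \times 10^{-4},
\qquad
1 \times 10^{-5} \times 98.0 = 9.8 \times 10^{-4}.
\]
On these figures the door lock relay failure has the higher RPN and would be addressed first.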
|
|
|
|
|
|
\section{FMECA - Failure Modes Effects and Criticality Analysis}
|
|
|
|
\subsection{ FMECA - Failure Modes Effects and Criticality Analysis}
|
|
% \begin{figure}
|
|
% \centering
|
|
% %\includegraphics[width=100pt]{./military-aircraft-desktop-computer-wallpaper-missile-launch.jpg}
|
|
% \includegraphics[width=300pt]{./CH2_FMEA/A10_thunderbolt.jpg}
|
|
% % military-aircraft-desktop-computer-wallpaper-missile-launch.jpg: 1024x768 pixel, 300dpi, 8.67x6.50 cm, bb=0 0 246 184
|
|
% \caption{A10 Thunderbolt}
|
|
% \label{fig:f16missile}
|
|
% \end{figure}
|
|
FMECA places its emphasis on determining the criticality of a failure.
It applies some Bayesian statistics (the probabilities of component failures, and of those failures causing given system level failures).
|
|
|
|
|
|
|
|
|
|
FMECA is very similar to PFMEA, but instead of cost, a criticality or
seriousness factor is ascribed to putative top level incidents.
FMECA has three probability factors for component failures.
|
|
|
|
\textbf{FMECA ${\lambda}_{p}$ value.}
|
|
This is the overall failure rate of a base component.
|
|
This will typically be the failure rate per million ($10^6$) or
billion ($10^9$) hours of operation (reference: MIL1991).
|
|
|
|
\textbf{FMECA $\alpha$ value.}
|
|
The failure mode probability, usually denoted by $\alpha$, is the probability of
a particular failure~mode occurring within a component (reference: FMD-91).
|
|
%, should it fail.
|
|
%A component with N failure modes will thus have
|
|
%have an $\alpha$ value associated with each of those modes.
|
|
%As the $\alpha$ modes are probabilities, the sum of all $\alpha$ modes for a component must equal one.
|
|
|
|
|
|
|
|
\textbf{FMECA $\beta$ value.}
The second probability factor, $\beta$, is the probability that the failure mode
will cause a given system failure.
This corresponds to a `Bayesian' (conditional) probability: given a particular
component failure mode, it is the probability of a given system level failure.
|
|
|
|
\textbf{FMECA `t' Value}
|
|
The time that a system will be operating for, or the working life time of the product is
|
|
represented by the variable $t$.
|
|
%for probability of failure on demand studies,
|
|
%this can be the number of operating cycles or demands expected.
|
|
|
|
\textbf{Severity `s' value}
|
|
A weighting factor to indicate the seriousness of the putative system level error.
|
|
%Typical classifications are as follows:~\cite{fmd91}
|
|
|
|
\begin{equation}
|
|
C_m = {\beta} \cdot {\alpha} \cdot {{\lambda}_p} \cdot {t} \cdot {s}
|
|
\end{equation}
|
|
|
|
The failure modes with the highest $C_m$ values would be at the top of a `to~do' list
for a project manager.
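
As a sketch of how the criticality equation is evaluated, consider a single failure mode with the following values, all of which are hypothetical and chosen only to make the arithmetic easy to follow: $\beta = 0.5$, $\alpha = 0.3$, ${\lambda}_p = 10^{-7}$ failures per hour, $t = 10^{4}$ hours and $s = 10$. Then
\[
C_m = 0.5 \times 0.3 \times 10^{-7} \times 10^{4} \times 10 = 1.5 \times 10^{-3}.
\]
Repeating this for every failure mode and sorting by $C_m$ gives the ranking described above. A real analysis would take ${\lambda}_p$ and $\alpha$ from published component data rather than from assumed values such as these.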
|
|
|
|
|
|
|
|
|
|
\section{FMEDA - Failure Modes Effects and Diagnostic Analysis}
|
|
|
|
|
|
|
|
|
|
\subsection{ FMEDA - Failure Modes Effects and Diagnostic Analysis}
|
|
% \begin{figure}
|
|
% \centering
|
|
% \includegraphics[width=200pt]{./SIL.png}
|
|
% % SIL.jpg: 350x286 pixel, 72dpi, 12.35x10.09 cm, bb=0 0 350 286
|
|
% \caption{SIL requirements}
|
|
% \end{figure}
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
\begin{itemize}
|
|
\item \textbf{Statistical Safety} Safety Integrity Level (SIL) standards (EN61508/IEC61508).
|
|
\item \textbf{Diagnostics} Diagnostic or self checking elements modelled
|
|
\item \textbf{Complete Failure Mode Coverage} All failure modes of all components must be in the model
|
|
\item \textbf{Guidelines} To system architectures and development processes
|
|
\end{itemize}
|
|
|
|
FMEDA is the methodology behind statistical (safety integrity level)
type standards (EN61508/IEC61508).
It provides a statistical overall level of safety
and allows diagnostic mitigation for self checking etc.
It provides guidelines for the design and architecture
of computer/software systems for the four levels of
safety integrity.
|
|
%For Hardware
|
|
%
|
|
FMEDA does force the user to consider all hardware components in a system
by requiring that an MTTF value is assigned for each failure~mode;
the MTTF may be statistically mitigated (improved)
if it can be shown that self-checking will detect failure modes.
For software it provides procedural quality guidelines and constraints (such as forbidding certain
programming languages and/or features).
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
\subsection{ FMEDA - Failure Modes Effects and Diagnostic Analysis}
|
|
\label{sec:FMEDA}
|
|
\textbf{Failure Mode Classifications in FMEDA.}
|
|
\begin{itemize}
|
|
\item \textbf{Safe or Dangerous} Failure modes are classified SAFE or DANGEROUS
|
|
\item \textbf{Detectable failure modes} Failure modes are given the attribute DETECTABLE or UNDETECTABLE
|
|
\item \textbf{Four attributes to Failure Modes} All failure modes may thus be Safe Detected (SD), Safe Undetected (SU), Dangerous Detected (DD) or Dangerous Undetected (DU)
|
|
\item \textbf{Four statistical properties of a system} \\
|
|
$ \sum \lambda_{SD}$, $\sum \lambda_{SU}$, $\sum \lambda_{DD}$, $\sum \lambda_{DU}$
|
|
\end{itemize}
|
|
|
|
% Failure modes are classified as Safe or Dangerous according
|
|
% to the putative system level failure they will cause.
|
|
% The Failure modes are also classified as Detected or
|
|
% Undetected.
|
|
% This gives us four level failure mode classifications:
|
|
% Safe-Detected (SD), Safe-Undetected (SU), Dangerous-Detected (DD) or Dangerous-Undetected (DU),
|
|
% and the probabilistic failure rate of each classification
|
|
% is represented by lambda variables
|
|
% (i.e. $\lambda_{SD}$, $\lambda_{SU}$, $\lambda_{DD}$, $\lambda_{DU}$).
|
|
|
|
|
|
\textbf{Diagnostic Coverage.}
The diagnostic coverage is simply the ratio
of the dangerous detected probabilities
to the probability of all dangerous failures,
and is normally expressed as a percentage. $\Sigma\lambda_{DD}$ represents
the summed failure rate of the dangerous detected base component failure modes, and
$\Sigma\lambda_D$ the summed failure rate of all dangerous base component failure modes.

$$ \mathit{DiagnosticCoverage} = \Sigma\lambda_{DD} / \Sigma\lambda_D $$
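
As an illustration of the diagnostic coverage ratio, suppose the dangerous failure modes of a system sum to the following failure rates. The figures are hypothetical and are given in FIT: $\Sigma\lambda_{DD} = 45$ and $\Sigma\lambda_{DU} = 5$, so that $\Sigma\lambda_D = \Sigma\lambda_{DD} + \Sigma\lambda_{DU} = 50$. Then
\[
\frac{\Sigma\lambda_{DD}}{\Sigma\lambda_D} = \frac{45}{50} = 0.9,
\]
that is, a diagnostic coverage of 90\%.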
|
|
|
|
|
|
|
|
|
|
The \textbf{diagnostic coverage} for safe failures, where $\Sigma\lambda_{SD}$ represents the summed failure rate of the
safe detected base component failure modes,
and $\Sigma\lambda_S$ the summed failure rate of all safe base component failure modes,
is given as

$$ SF = \frac{\Sigma\lambda_{SD}}{\Sigma\lambda_S} $$
|
|
|
|
|
|
|
|
\textbf{Safe Failure Fraction.}
A key concept in FMEDA is Safe Failure Fraction (SFF).
This is the ratio of all safe failures plus the dangerous detected failures
to all safe and dangerous failures.
Again this is usually expressed as a percentage.

$$ SFF = \big( \Sigma\lambda_S + \Sigma\lambda_{DD} \big) / \big( \Sigma\lambda_S + \Sigma\lambda_D \big) $$

SFF determines how proportionately fail-safe a system is, not how reliable it is.
There is a weakness in this philosophy: adding extra safe failures (even unused ones) improves the SFF.
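
The weakness noted above can be seen in a small worked example. The failure rates used are hypothetical and are given in FIT: $\Sigma\lambda_S = 50$, $\Sigma\lambda_{DD} = 45$ and $\Sigma\lambda_D = 50$, which leaves 5 FIT of dangerous undetected failures. The SFF is then
\[
SFF = \frac{50 + 45}{50 + 50} = \frac{95}{100} = 95\%.
\]
If components contributing a further 100 FIT of safe failures are added, with nothing else changed, the result becomes
\[
SFF = \frac{150 + 45}{150 + 50} = \frac{195}{200} = 97.5\%.
\]
The SFF has improved, yet the dangerous undetected failure rate is still 5 FIT and the system as a whole now fails more often.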
|
|
|
|
|
|
|
|
|
|
To achieve a given SIL, diagnostic coverage and SFF levels are prescribed along with
hardware architectures and software techniques.
The overall aim of SIL is to classify the safety of a system
by statistically determining how frequently it can fail dangerously.
|
|
|
|
|
|
|
|
|
|
|
|
\begin{table}[ht]
\caption{SIL requirements} % title of Table
%\centering % used for centering table
\begin{tabular}{|| l | c | c ||} \hline
\textbf{SIL} & \textbf{Low Demand} & \textbf{Continuous Demand} \\
 & Prob of failing on demand & Prob of failure per hour \\ \hline \hline
4 & $ 10^{-5}$ to $< 10^{-4}$ & $ 10^{-9}$ to $< 10^{-8}$ \\ \hline
3 & $ 10^{-4}$ to $< 10^{-3}$ & $ 10^{-8}$ to $< 10^{-7}$ \\ \hline
2 & $ 10^{-3}$ to $< 10^{-2}$ & $ 10^{-7}$ to $< 10^{-6}$ \\ \hline
1 & $ 10^{-2}$ to $< 10^{-1}$ & $ 10^{-6}$ to $< 10^{-5}$ \\ \hline
\hline
\end{tabular}
\end{table}

Table adapted from EN61508-1:2001 [7.6.2.9 p33].
|
|
|
|
|
|
|
|
|
|
FMEDA is a modern extension of FMEA, in that it will allow for
|
|
self checking features, and provides detailed recommendations for computer/software architecture.
|
|
It has a simple final result, a Safety Integrity Level (SIL) from 1 to 4 (where 4 is safest).
|
|
|
|
%FMEA can be used as a term simple to mean Failure Mode Effects Analysis, and is
|
|
%part of product approval for many regulated products in the EU and the USA...
|
|
|
|
|
|
|
|
|
|
|
|
|
|
\section{FMEA used for Safety Critical Approvals}
|
|
|
|
|
|
\subsection{DESIGN FMEA: Safety Critical Approvals FMEA}
|
|
\begin{figure}[h]
|
|
\centering
|
|
\includegraphics[width=300pt,keepaspectratio=true]{./CH2_FMEA/tech_meeting.png}
|
|
% tech_meeting.png: 350x299 pixel, 300dpi, 2.97x2.53 cm, bb=0 0 84 72
|
|
\caption{FMEA Meeting}
|
|
\label{fig:tech_meeting}
|
|
\end{figure}
|
|
This variant is variously known as static FMEA, design FMEA or approvals FMEA.

Experts from the approval house and the equipment manufacturer
discuss selected component failure modes
judged to be in critical sections of the product.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
% \begin{figure}[h]
|
|
% \centering
|
|
% \includegraphics[width=70pt,keepaspectratio=true]{./tech_meeting.png}
|
|
% % tech_meeting.png: 350x299 pixel, 300dpi, 2.97x2.53 cm, bb=0 0 84 72
|
|
% \caption{FMEA Meeting}
|
|
% \label{fig:tech_meeting}
|
|
% \end{figure}
|
|
|
|
\begin{itemize}
|
|
\item It is impossible to look at all component failures, let alone apply FMEA rigorously.
\item In practice, failure scenarios for critical sections are contested, and either justified or extra safety measures implemented.
\item Often only meeting notes or minutes are kept. It is unusual for detailed arguments to be documented.
|
|
\end{itemize}
|
|
|
|
|
|
|
|
|
|
\section{Literature Review}
|
|
|
|
%% FOCUS
|
|
The focus of this literature review is to establish the practice and applications
|
|
of FMEA, and to examine its strengths and weaknesses.
|
|
%% GOAL
|
|
Its
|
|
goal is to identify central issues and to criticise and assess the current
|
|
FMEA methodologies.
|
|
%% PERSPECTIVE
|
|
The perspective of the author is that of a practitioner of static failure mode analysis techniques
concerning the approval of products
to European safety standards, both the prescriptive~\cite{en298,en230} and statistical~\cite{en61508}.
A second perspective is that of a software engineer trained to use formal methods.
Examining FMEA methodologies for mathematical properties, influenced by
formal methods applied to software, should provide an angle not traditionally considered.
|
|
%% COVERAGE
|
|
The literature reviewed has been restricted to published books, European safety standards (as examples
of current safety measures applied), and traditional research from journal and conference papers.
|
|
%% ORGANISATION
|
|
The review is organised by concept: FMEA applied to hardware, to software, to software~interfacing and
to multiple failure scenarios. Methodologies related to FMEA are briefly covered for the sake of context.
|
|
%% AUDIENCE
|
|
% Well duh! PhD supervisors and examiners....
|
|
|
|
\subsection{Related Methodologies}
|
|
FTA --- HAZOP --- ALARP --- Event Tree Analysis --- bow tie concept
|
|
\subsection{Hardware FMEA (HFMEA)}
|
|
\subsection{Multiple Failure scenarios and FMEA}
|
|
\subsection{Software FMEA (SFMEA)}
|
|
|
|
\paragraph{Current work on Software FMEA}
|
|
|
|
SFMEA usually does not seek to integrate
|
|
hardware and software models, but to perform
|
|
FMEA on the software in isolation~\cite{procsfmea}.
|
|
%
|
|
Work has been performed using databases
|
|
to track the relationships between variables
|
|
and system failure modes~\cite{procsfmeadb}, to %work has been performed to
|
|
introduce automation into the FMEA process~\cite{appswfmea} and to provide code analysis
|
|
automation~\cite{modelsfmea}. Although the SFMEA and hardware FMEAs are performed separately,
some schools of thought aim for Fault Tree Analysis (FTA)~\cite{nasafta,nucfta} (top-down, deductive)
and FMEA (bottom-up, inductive)
to be performed on the same system to provide insight into the
software/hardware interface~\cite{embedsfmea}.
|
|
%
|
|
Although this
|
|
would give a better picture of the failure mode behaviour, it
|
|
is by no means a rigorous approach to tracing errors that may occur in hardware
|
|
through to the top (and therefore ultimately controlling) layer of software.
|
|
|
|
\paragraph{Current FMEA techniques are not suitable for software}
|
|
|
|
The main FMEA methodologies are all based on the concept of taking
|
|
base component {\fms}, and translating them into system level events/failures~\cite{sfmea,sfmeaa}.
|
|
%
|
|
In a complicated system, mapping a component failure mode to a system level failure
|
|
will mean a long reasoning distance; that is to say the actions of the
|
|
failed component will have to be traced through
|
|
several sub-systems, gauging its effects with and on other components.
|
|
%
|
|
With software at the higher levels of these sub-systems,
|
|
we have yet another layer of complication.
|
|
%
|
|
%In order to integrate software, %in a meaningful way
|
|
%we need to re-think the
|
|
%FMEA concept of simply mapping a base component failure to a system level event.
|
|
%
|
|
SFMEA regards the variables used by the programs as the equivalent of hardware components~\cite{procsfmea}.
The failure modes of these variables are that they could become erroneously over-written,
be calculated incorrectly (due to a mistake by the programmer, or a fault in the micro-processor on which the program is running), or
be erroneously altered by external influences such as
ionising radiation.
|
|
|
|
|
|
|
|
%
|
|
|
|
|
|
|
|
\section{Conclusion}
|
|
|
|
\paragraph{Where FMEA is now}
|
|
FMEA is a useful tool for basic safety: it provides statistics on safety where field data is impractical, and it is
very good with single failure modes linked to top level events.
FMEA has become part of the safety critical and safety certification industries.
%
SFMEA is in its infancy, and there is a gap in current
certification for software. EN61508~\cite{en61508} recommends hardware redundancy architectures in conjunction
with FMEDA for hardware; for software it recommends language constraints and quality procedures
but no inductive fault finding technique.
|
|
|
|
FMEA has adapted from a cost saving exercise for mass produced items, through incorporating statistical techniques
(FMECA), to allowing for self diagnostic mitigation (FMEDA).
However, it is still based on a single component failure mapped to a system level failure.
All these FMEA based methodologies have the following shortcomings:
\begin{itemize}
\item It is impossible to integrate software and hardware models.
\item The state explosion problem is exacerbated by increasing complexity due to the density of modern electronics.
\item It is impossible to consider all multiple component failure modes.
\end{itemize}
|