\label{sec:chap3}
\section*{Introduction}

This chapter examines FMEA in a critical light.
The problems of choosing the scope---or required reasoning distance---of the detail to apply
in FMEA analysis, the difficulties of integrating software
and hardware in FMEA failure models, and the impossibility of performing meaningful
multiple failure analysis are examined.
Additional problems, such as the inability to easily re-use and to validate (through
traceable reasoning) FMEA models, are presented.
Finally we conclude with a list of deficiencies in current FMEA methodologies, and present a wish list
for an improved methodology.

\section{Historical Origins of FMEA: {\bc} {\fm} to system level failure/symptom paradigm}

\subsection{FMEA: {\bc} {\fm} to system level failure modelling}

FMEA traces its roots to the 1940s, when it was used to identify the most costly
failures arising from car mass-production~\cite{bfmea}.
It was later modified slightly to include the severity of the top level failure (FMECA~\cite{fmeca}).
In the 1980s FMEA was extended again (FMEDA~\cite{fmeda}) to provide statistics
for predicting failure rates.
%
However, a typical entry in each of the above methodologies starts with a
particular component failure mode and associates it with a system---or top level---failure symptom.
This means that we have one analysis case per component failure mode for all the components in the system under investigation.
This analysis philosophy has not changed since FMEA was first used.

\subsection{FMEA does not support Traceable Reasoning}

An FMEA report normally assigns one line of a spreadsheet to
each {\bc} {\fm}.
This means that the reasoning involved in determining the system level failure/symptom is described (if at all) very briefly.
Ideally, supporting documentation would give the reasoning and calculations behind each analysis case,
but the structure of current FMEA reports does not encourage this.

\subsection{FMEA does not support modularity}

It is common practice in the process control industry to buy in sub-systems,
typically sensors and actuators connected to an industrially hardened computer bus, e.g. CANbus~\cite{can,canspec}, Modbus~\cite{modbus} etc.
Most sensor systems now are `smart'~\cite{smartinstruments}, that is to say, they contain programmatic elements
even if their outputs are analogue signals. For instance, a liquid level sensor that
supplies a {\ft} output would typically have been implemented
in analogue electronics before the 1980s. After that time, it would be common to use a micro-processor
based system to perform the functions of reading the sensor and converting the reading to a current (\ft) output.
For the non-safety critical systems integrator this brings the advantages
that come with using a digital system (increased accuracy, self checking, ease of
calibration etc.). For a safety critical systems integrator, however, this can be very problematic when it
comes to approvals. Even if the sensor manufacturer were to disclose the internal workings and software,
we would still have the problem of tracing the FMEA reasoning through the sensor, through the sensor's software,
and then through the system being integrated.
This problem is compounded by the fact that traditional FMEA cannot integrate software into FMEA models~\cite{sfmea,safeware}.

\section{Reasoning Distance used to measure Comparison Complexity}
\label{sec:reasoningdistance}

Due to state explosion, traditional FMEA cannot ensure that each failure mode of
a system's components is checked against every other component in the system which
it may affect.
%
FMEA is therefore performed using heuristics to decide
which components to check the effect of a component failure mode on.
We could term the number of checks made for each failure mode
on aspects of the system the reasoning distance.
%
In practice, FMEA may be performed by following the signal path
of the component failure mode to its system level effect. This is less than ideal,
and it can easily miss interactions with adjacent components that could cause
other system level symptoms.
%
Were we to compare the reasoning distance with the theoretical maximum---the sum of all failure
modes in a system multiplied by the number of components in it---we could arrive at a comparison complexity figure.
This figure would mean we could compare the maximum number of checks (i.e. exhaustive
analysis) with the number actually performed.

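This comparison can be written as a simple ratio. The notation below is introduced purely for illustration and is not taken from any FMEA standard: $C$ is the set of components in the system and $fm(c)$ the set of failure modes of component $c$.

\[
  CC(S) = \frac{\text{number of checks actually performed}}{|C| \times \sum_{c \in C} |fm(c)|}
\]

A value of $1$ would correspond to exhaustive analysis; heuristic, signal-path driven FMEA will typically score far lower. (Strictly, a failure mode need not be checked against its own component, so $|C|-1$ could replace $|C|$ in the denominator.)
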
\paragraph{The ideal of exhaustive FMEA (XFMEA).}
Obviously, exhaustively checking every component failure mode in a system
against all other components is the ideal for finding all possible system level failures.
While this is impossible for all but trivial systems, it should be possible
for small groups of components that work together to provide a well defined function.
We could term such a group a `{\fg}'.

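The exhaustive ideal for a small {\fg} can be sketched as a simple enumeration. The Python sketch below is illustrative only: the component names and failure modes are invented, not drawn from a real design.

```python
# Hypothetical failure modes for a small functional group
# (a potential divider feeding an op-amp); all names are invented.
failure_modes = {
    "R1": ["OPEN", "SHORT"],
    "R2": ["OPEN", "SHORT"],
    "OPAMP": ["LOW_SLEW", "NO_OPERATION", "HIGH_IMPEDANCE"],
}

def xfmea_checks(fm_map):
    """Enumerate every failure mode checked against every *other*
    component: the exhaustive (XFMEA) set of analysis cases."""
    checks = []
    for comp, fms in fm_map.items():
        for fm in fms:
            for other in fm_map:
                if other != comp:
                    checks.append((comp, fm, other))
    return checks

cases = xfmea_checks(failure_modes)
print(len(cases))  # 7 failure modes x 2 other components = 14
```

Even this tiny three-component group already requires 14 analysis cases; the same enumeration over a whole system is what makes XFMEA intractable.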
\section{Re-use of FMEA analysis}

Given the {\bc} {\fm} to system level failure mode paradigm, it is
difficult to re-use FMEA analysis.
%
Several strategies to aid re-use have been proposed~\cite{rudov2009language, reuse_of_fmea}, but
the fundamental problem remains that, with any changes
to the component base in a system, it is very difficult to
determine which FMEA test scenarios must be re-worked.
%
It is common in safety critical systems to have repeated circuit topologies.
For instance, we may have several signal input and output
structures that are repeated.
%
The failure mode behaviour of these repeated structures will be the same.
However, with the {\bc} {\fm} to system level failure mode mapping,
work is likely to be repeated.

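The kind of re-use that repeated topologies invite can be sketched as memoisation of local analysis results. This is an illustration only: all names and failure descriptions below are invented, and traditional FMEA's global base-component-to-system mapping is precisely what prevents this kind of re-use.

```python
# Cache of local failure analyses, keyed by circuit topology.
analysis_cache = {}

def analyse_structure(topology, analyse):
    """Return the local failure analysis for a repeated circuit
    topology, performing the analysis only on first encounter."""
    if topology not in analysis_cache:
        analysis_cache[topology] = analyse(topology)
    return analysis_cache[topology]

def local_fmea(topology):
    # Stand-in for a real local analysis of the structure.
    return {"OPEN": "reading low", "SHORT": "reading high"}

# Two identical 4-20mA input stages share a single analysis.
first = analyse_structure("4-20mA input stage", local_fmea)
second = analyse_structure("4-20mA input stage", local_fmea)
print(len(analysis_cache))  # the repeated topology was analysed once
```

A modular FMEA methodology would need an equivalent of this lookup: repeated structures analysed once, with only the differing surrounding context re-examined.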
\section{Software and FMEA}

Traditional FMEA deals only with electrical and mechanical components, i.e. it does not have provision for software.
Modern control systems nearly always have a significant software/firmware element,
and not being able to model software with current FMEA methodologies
is a cause for criticism~\cite[Ch.~12]{safeware}.
Similar difficulties in integrating mechanical and electronic/software
failure models are discussed in~\cite{SMR:SMR580,swassessment}.

\paragraph{Current work on Software FMEA.}

SFMEA usually does not seek to integrate
hardware and software models, but to perform
FMEA on the software in isolation~\cite{procsfmea}.
%
Work has been performed using databases
to track the relationships between variables
and system failure modes~\cite{procsfmeadb}, to
introduce automation into the FMEA process~\cite{appswfmea} and to provide code analysis
automation~\cite{modelsfmea}. Although the SFMEA and hardware FMEAs are performed separately,
some schools of thought aim for Fault Tree Analysis (FTA)~\cite{nasafta,nucfta} (top down, deductive)
and FMEA (bottom up, inductive)
to be performed on the same system to provide insight into the
software/hardware interface~\cite{embedsfmea}.
%
Although this
would give a better picture of the failure mode behaviour, it
is by no means a rigorous approach to tracing errors that may occur in hardware
through to the top (and therefore ultimately controlling) layer of software.

\subsection{The rise of the smart instrument}
%% AWE --- Atomic Weapons Establishment have this problem....
A smart instrument is defined as one that uses a micro-processor and software
in conjunction with its sensing electronics, rather than
analogue electronics only~\cite{smart_instruments_1514209}.
%
It is termed `smart' because it has some software, or intelligence, incorporated into it.
%
For instance, an AVO-8 multi-meter circa 1970 uses only analogue electronics, and we can determine
using FMEA how component failures within it could affect readings.
%
A modern multi-meter will have a small dedicated micro-processor and sensing electronics, all on the same chip,
with firmware to read the user controls and display results on an LCD.
%
For quality control, many safety critical processes require regular inspections
and measurements of physical characteristics of materials and machinery.
%
For highly critical systems, e.g. in the nuclear industry~\cite{parnas1991assessment},
the instruments used to perform these measurements must be analysed using traditional assessment (which entails
FMEA), to ensure that failure modes within the instrument cannot lead to invalid measurements.
%
Some work has been performed to offer black~box---or functional---testing of these instruments instead of
static analysis~\cite{Bishop:2010:ONT:1886301.1886325}.
However, black box testing of smart instruments is
yet to be an approved method of validation.
%
Most modern instruments now use highly integrated electronics coupled to micro-controllers, which read and filter the measurements
and interface to an LCD readout.
%
For the highly critical systems, this means that traditional FMEA cannot be used to validate
the design of such instruments.
%
Being more modern, these instruments are likely to be more reliable and
accurate than the analogue instruments in use some twenty years ago, but this cannot be validated
to a high level of reliability. This remains an unsolved problem for the industries dealing with highly safety critical
systems.
%
Currently, the only way that some smart~instruments have been permitted for
use in highly critical systems is to have them extensively
functionally tested~\cite{bishopsmartinstruments}.

\subsection{Distributed real time systems}

Distributed real time systems are control systems where
smart sensors communicate over a communications bus to
a master controller.
%
Most modern cars follow this information technology pattern and use CANbus~\cite{canspec,can}.
%
For instance, in a modern car there will be no mechanical linkage from the pedal to the engine; instead the throttle pedal will be linked to a sensor to determine how
far the pedal is pressed.
This sensor will be read by a micro-controller and its value passed, via CANbus, to the Engine Control Unit (ECU),
which will use that information (along with information from other sensors) to adjust the power required from the engine.
This adjustment could be direct, or could be another CANbus message passed to a micro-controller regulating engine function.
In terms of FMEA, see figure~\ref{fig:distcon}, our reasoning path spans four interface layers of electronics to software.
Traditional FMEA does not cater for the software/hardware interface, and here we have the additional complications
of the communications protocol used to transmit data, and the failure mode characteristics
of the communications physical layer.

The failure reasoning paths for a distributed real time system, with their multiple passes over the hardware/software
interface, mean that traditional FMEA, for these systems,
is impossible to perform.
%
The base component failure mode to system failure paradigm is
utterly anachronistic in the distributed real time system environment.

\begin{figure}[h]
\centering
\includegraphics[width=400pt]{./CH3_FMEA_criticism/distcon.png}
% distcon.png: 1622x656 pixel, 72dpi, 57.22x23.14 cm, bb=0 0 1622 656
\caption{Distributed Control System FMEA reasoning path for a single failure.}
\label{fig:distcon}
\end{figure}

\section{FMEA --- general criticism --- conclusion}

\begin{itemize}
\item FMEA type methodologies were designed for the simple electro-mechanical systems of the 1940s to 1960s.
\item Reasoning distance: component failure to system level symptom.
\item State explosion: impossible to perform FMEA exhaustively.
\item Difficult to re-use previous analysis work.
\item Very difficult to model simultaneous failures.
\item Software and hardware models are separate.
\item Distributed real time systems are very difficult to meaningfully analyse with FMEA.
\end{itemize}

FMEA is no longer fit for purpose!

\subsection{FMEA Criticism: Conclusions}

FMEA is a useful tool for basic safety: it provides statistics on safety where field data would be impractical to collect, and it is
very good with single failure modes linked to top level events.
FMEA has become part of the safety critical and safety certification industries.
%
SFMEA is in its infancy, and there are corresponding gaps in
certification for software. EN61508~\cite{en61508} recommends hardware redundancy architectures in conjunction
with FMEDA for hardware; for software it recommends language constraints and quality procedures,
but no inductive fault finding technique.
%
FMEA has adapted from a cost saving exercise for mass produced items~\cite{bfmea,generic_automotive_fmea_6034891}, to incorporating statistical techniques
(FMECA), to allowing for self diagnostic mitigation (FMEDA).
%
However, it is still based on the concept of single component failures mapped to top~level/system~failures.
All these FMEA based methodologies have the following shortcomings:
\begin{itemize}
\item Impossible to integrate software and hardware models,
\item State explosion problem exacerbated by the increasing complexity and density of modern electronics,
\item Impossible to consider all multiple component failure modes~\cite{FMEAmultiple653556}.
\end{itemize}

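The multiple failure shortcoming is a matter of simple combinatorics. The sketch below uses invented component and failure-mode counts purely for illustration:

```python
from math import comb

def total_failure_modes(n_components, fms_per_component):
    """Total failure modes in the system."""
    return n_components * fms_per_component

def multiple_failure_scenarios(n_components, fms_per_component, k):
    """Number of distinct combinations of k simultaneous failure modes."""
    return comb(total_failure_modes(n_components, fms_per_component), k)

# A modest board: 100 components with 4 failure modes each.
for k in (1, 2, 3):
    print(k, multiple_failure_scenarios(100, 4, k))
# 1 -> 400, 2 -> 79800, 3 -> 10586800
```

Even for this modest board, double failures yield tens of thousands of scenarios and triple failures over ten million: the state explosion that makes considering all multiple component failure combinations impossible in practice.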
\subsection{FMEA - Better Methodology - Wish List}

We now form a wish list, stating the features that we would want
in an improved FMEA methodology:
\begin{itemize}
\item No state explosion making analysis impractical,
\item Exhaustive checking (total failure coverage within {\fgs}: all interacting components and failure modes checked),
\item Reasoning traceable in system models,
\item Re-useable, i.e. it should be possible to re-use analysis performed previously,
\item It must be possible to analyse simultaneous/multiple failures,
\item Modular, i.e. usable in a distributed system.
\end{itemize}