print out-redpen-edit.... now a nice
cycle ride to prince regent-a 2000m swim and then maybe a 6 inch sub with tuna...
parent
05f96697e7
commit
ec7dc38679
13
mybib.bib
@ -860,7 +860,18 @@ strength of materials, the causes of boiler explosions",
|
|||||||
biburl="http://www.isa.org/InTechTemplate.cfm?template=/ContentManagement/ContentDisplay.cfm\&ContentID=77994",
|
biburl="http://www.isa.org/InTechTemplate.cfm?template=/ContentManagement/ContentDisplay.cfm\&ContentID=77994",
|
||||||
}
|
}
|
||||||
|
|
||||||
|
@INPROCEEDINGS{patterns6113886,
|
||||||
|
author={Lopatkin, I. and Iliasov, A. and Romanovsky, A. and Prokhorova, Y. and Troubitsyna, E.},
|
||||||
|
booktitle={High-Assurance Systems Engineering (HASE), 2011 IEEE 13th International Symposium on},
|
||||||
|
title={Patterns for Representing FMEA in Formal Specification of Control Systems},
|
||||||
|
year={2011},
|
||||||
|
pages={146-151},
|
||||||
|
keywords={control engineering computing;control systems;failure analysis;formal specification;program diagnostics;system recovery;effects analysis;error detection;error recovery;failure modes;formal event-B specification;formal system development;inductive safety analysis;requirement tracing;sluice control system;Computational modeling;Logic gates;Safety;Sensor systems;Switches;Event-B;FMEA;control systems;formal specification;patterns;safety},
|
||||||
|
doi={10.1109/HASE.2011.10},
|
||||||
|
ISSN={1530-2059},}
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
@PHDTHESIS{garrett,
|
@PHDTHESIS{garrett,
|
||||||
AUTHOR = "Chris Garrett",
|
AUTHOR = "Chris Garrett",
|
||||||
TITLE = "Functional diagnosis strategies for analog systems using heuristic programming techniques",
|
TITLE = "Functional diagnosis strategies for analog systems using heuristic programming techniques",
|
||||||
|
@ -128,7 +128,7 @@ This means that for each {\cb} node there are at least two hardware software int
|
|||||||
Because of this it is virtually impossible to apply meaningful traditional FMEA methodologies to
|
Because of this it is virtually impossible to apply meaningful traditional FMEA methodologies to
|
||||||
{\cb} systems.
|
{\cb} systems.
|
||||||
%
|
%
|
||||||
This paper firstly highlights the limitations with traditonal FMEA,
|
This paper firstly highlights the limitations with traditional FMEA,
|
||||||
and then describes a new modularised variant, Failure Mode Modular De-composition
|
and then describes a new modularised variant, Failure Mode Modular De-composition
|
||||||
which addresses the problems of applying FMEA to software/hardware hybrid systems.
|
which addresses the problems of applying FMEA to software/hardware hybrid systems.
|
||||||
%The paper first discussed work performed on software FMEA, and then shows the need
|
%The paper first discussed work performed on software FMEA, and then shows the need
|
||||||
|
@ -36,7 +36,7 @@ defined at the start of this chapter.
|
|||||||
The act
|
The act
|
||||||
of defining relationships between the data objects
|
of defining relationships between the data objects
|
||||||
in FMEA raise questions about the nature of the process
|
in FMEA raise questions about the nature of the process
|
||||||
and allow us to analytically discuss its strengths and weaknesses.
|
and allows us to analytically discuss its strengths and weaknesses.
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
@ -176,7 +176,8 @@ component types, but does not detail specific failure modes.
|
|||||||
Using MIL1991 in conjunction with FMD-91 we can determine statistics for the failure modes
|
Using MIL1991 in conjunction with FMD-91 we can determine statistics for the failure modes
|
||||||
of component types.
|
of component types.
|
||||||
%
|
%
|
||||||
The FMEDA process from European standard EN61508~\cite{en61508}
|
The FMEA variant\footnote{EN61508 (and related standards) are based on the FMEA variant Failure Mode Effects and Diagnostic Analysis (FMEDA)}
|
||||||
|
used for European standard EN61508~\cite{en61508}
|
||||||
requires statistics for Meantime to Failure (MTTF) for all {\bc} failure modes.
|
requires statistics for Meantime to Failure (MTTF) for all {\bc} failure modes.
|
||||||
|
|
||||||
|
|
||||||
@ -468,7 +469,7 @@ that we got from FMD-91, listed in equation~\ref{eqn:opampfms}.
|
|||||||
\end{table}
|
\end{table}
|
||||||
|
|
||||||
|
|
||||||
%\clearpage
|
\clearpage
|
||||||
|
|
||||||
\subsubsection{Failure modes of an Op-Amp}
|
\subsubsection{Failure modes of an Op-Amp}
|
||||||
|
|
||||||
@ -515,11 +516,6 @@ component {\fms} in FMEA or FMMD and require interpretation.
|
|||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
\clearpage
|
|
||||||
|
|
||||||
|
|
||||||
%%
|
%%
|
||||||
%% Paragraph using failure modes to build from bottom up
|
%% Paragraph using failure modes to build from bottom up
|
||||||
%%
|
%%
|
||||||
@ -662,11 +658,11 @@ echoing diagnostic/fault~finding methods~\cite{garrett, maikowski}. % loebowski}
|
|||||||
%
|
%
|
||||||
When fault finding, we generally follow the signal path checking for correct behaviour
|
When fault finding, we generally follow the signal path checking for correct behaviour
|
||||||
along it: when we find something out of place we zoom in and measure
|
along it: when we find something out of place we zoom in and measure
|
||||||
the circuit behaviour until we find a faulty component or module.
|
the circuit behaviour until we find a faulty component or module~\cite{garrett}.
|
||||||
%
|
%
|
||||||
With this style of fault finding, because it is based on experiment,
|
With this style of fault finding, because it is based on experiment,
|
||||||
we can hop from module to module eliminating working modules, until we find the
|
we can hop from module to module eliminating working modules, until we find the
|
||||||
failure.
|
failure~\cite{maikowski}.
|
||||||
%
|
%
|
||||||
The rationale and work-culture of those tasked to
|
The rationale and work-culture of those tasked to
|
||||||
perform FMEA are generally personnel who have performed fault finding.
|
perform FMEA are generally personnel who have performed fault finding.
|
||||||
@ -706,7 +702,7 @@ Also, whether following the effects through the signal path {\em only} is accept
|
|||||||
would looking at its effect on all other components in the system be necessary.
|
would looking at its effect on all other components in the system be necessary.
|
||||||
%is a matter for debate.
|
%is a matter for debate.
|
||||||
%
|
%
|
||||||
In practise, it is a compromise between the amount of time/money that can be spent
|
In practice, a compromise is made between the amount of time/money that can be spent
|
||||||
on analysis relative to the criticality of the project.
|
on analysis relative to the criticality of the project.
|
||||||
Metrics from measuring the amount of work to undertake for FMEA are examined in section~\ref{sec:xfmea}.
|
Metrics from measuring the amount of work to undertake for FMEA are examined in section~\ref{sec:xfmea}.
|
||||||
|
|
||||||
@ -717,7 +713,7 @@ change the circuit topology. For a single failure
|
|||||||
this effect may cause additional complications for the analyst.
|
this effect may cause additional complications for the analyst.
|
||||||
For multiple failures this means
|
For multiple failures this means
|
||||||
that the analyst
|
that the analyst
|
||||||
will have to deal altered---or changed circuit topologies---
|
will have to deal with altered---or changed circuit topologies---
|
||||||
of the electronic circuit for each analysis.
|
of the electronic circuit for each analysis.
|
||||||
|
|
||||||
|
|
||||||
@ -776,13 +772,29 @@ did not link this failure to the catastrophic failure of the spacecraft~\cite{ch
|
|||||||
This was not a failure in the objective reasoning, but more of the subjective, or the context in which the leak occurred.
|
This was not a failure in the objective reasoning, but more of the subjective, or the context in which the leak occurred.
|
||||||
%
|
%
|
||||||
What this means is that for an objectively calculated failure mode outcome, we may have
|
What this means is that for an objectively calculated failure mode outcome, we may have
|
||||||
more than one subjective outcome definition for it.
|
more than one subjective outcome. %, or definition, for it.
|
||||||
|
%
|
||||||
|
|
||||||
|
This means that objective reasoning can be applied to determine objective effects, but the criticality ---or the seriousness/consequences---
|
||||||
|
of those failures depends upon the Equipment Under Control (EUC)
|
||||||
|
and its environment.
|
||||||
|
%
|
||||||
|
For instance, a leak of nuclear material aboard a spacecraft could have the consequence
|
||||||
|
of loss of mission, but a leak on earth could have serious health and environmental consequences.
|
||||||
|
This means one line of FMECA describing a system risk is an oversimplification (consider that the same
|
||||||
|
nuclear material will be present during transport and launch, and when outside earth's environment).
|
||||||
|
%
|
||||||
|
Subjective appraisal of the outcome of a system failure mode can also
|
||||||
|
be subject to management and/or political pressure.
|
||||||
|
|
||||||
|
|
||||||
\paragraph{Multiple Simultaneous Failure Modes}
|
\paragraph{Multiple Simultaneous Failure Modes}
|
||||||
%
|
%
|
||||||
FMEA is less useful for determining events for multiple
|
FMEA is less useful for determining events for multiple
|
||||||
simultaneous
|
simultaneous
|
||||||
failures\footnote{Multiple simultaneous failures are taken to mean failures that occur within the same detection period.}.
|
failures\footnote{Multiple simultaneous failures are taken to mean failures that occur within the same detection period.
|
||||||
|
Detection periods are typically determined for the process under control. For a flame in an industrial burner this
|
||||||
|
could typically be one second~\cite{en298}.}.
|
||||||
%
|
%
|
||||||
Work has been performed using component failure statistics to
|
Work has been performed using component failure statistics to
|
||||||
offer the more likely multiple failures~\cite{FMEAmultiple653556} for analysis.
|
offer the more likely multiple failures~\cite{FMEAmultiple653556} for analysis.
|
||||||
@ -806,14 +818,18 @@ meaning the additional failures might have to be analysed with respect to the ch
|
|||||||
|
|
||||||
|
|
||||||
\paragraph{Failure modes and their observability criterion: detectable and undetectable.}
|
\paragraph{Failure modes and their observability criterion: detectable and undetectable.}
|
||||||
|
\label{sec:detectable}
|
||||||
Often the effects of a failure mode may be easy to detect,
|
Often the effects of a failure mode may be easy to detect,
|
||||||
and our equipment can react by raising an alarm or compensating for the resulting fault.
|
and our equipment can react by raising an alarm or compensating for the resulting fault.
|
||||||
%
|
%
|
||||||
Some failure modes may cause undetectable failures, for instance a component that causes
|
Some failure modes may cause undetectable failures, for instance a component that causes
|
||||||
a measured reading to change could have adverse consequences yet not be flagged as a failure.
|
a measured reading to change could have adverse consequences yet not be flagged as a failure.
|
||||||
%
|
%
|
||||||
This type of failure would not be flagged as a failure by the system, because
|
This type of failure %
|
||||||
it has no way of knowing the reading is invalid.
|
%would not be flagged as a failure by the system, because
|
||||||
|
cannot be dealt with by passing an error indication to higher level modules
|
||||||
|
because we cannot detect it. The system therefore
|
||||||
|
has no way of knowing the reading is invalid.
|
||||||
%
|
%
|
||||||
The term observable has a specific meaning in the field of control engineering~\cite{721666, ACS:ACS1297};
|
The term observable has a specific meaning in the field of control engineering~\cite{721666, ACS:ACS1297};
|
||||||
systems submitted for FMEA are generally related to control systems,
|
systems submitted for FMEA are generally related to control systems,
|
||||||
@ -893,11 +909,13 @@ methodologies.
|
|||||||
%{sfmeaforwardbackward}
|
%{sfmeaforwardbackward}
|
||||||
\subsection{FMEA and the State Explosion Problem}
|
\subsection{FMEA and the State Explosion Problem}
|
||||||
\label{sec:xfmea}
|
\label{sec:xfmea}
|
||||||
\paragraph{Exhaustive Single Failure FMEA.}
|
\paragraph{Problem of which components to check for a given {\bc} {\fm}.}
|
||||||
|
|
||||||
FMEA for a safety critical certification~\cite{en298,en61508} will have to be applied
|
FMEA for a safety critical certification~\cite{en298,en61508} will have to be applied
|
||||||
to all known failure modes of all components within a system.
|
to all known failure modes of all components within a system.
|
||||||
%
|
%
|
||||||
|
Each one of these, in a typical report, would be one line of a spreadsheet entry.
|
||||||
|
%
|
||||||
FMEA does not define or specify the scope of the investigation of each component failure mode.
|
FMEA does not define or specify the scope of the investigation of each component failure mode.
|
||||||
Should we follow the signal path, and all components we encounter along that, or should the scope be wider?
|
Should we follow the signal path, and all components we encounter along that, or should the scope be wider?
|
||||||
%
|
%
|
||||||
@ -921,7 +939,7 @@ $f$ is the number of failure modes per component.
|
|||||||
\end{equation}
|
\end{equation}
|
||||||
|
|
||||||
|
|
||||||
\paragraph{Exhaustive Single Failure FMEA}
|
\paragraph{Exhaustive FMEA and dual failures.}
|
||||||
This would mean an order of $O(N^2)$ number of checks to perform
|
This would mean an order of $O(N^2)$ number of checks to perform
|
||||||
to undertake an `exhaustive~FMEA'. Even small systems have typically
|
to undertake an `exhaustive~FMEA'. Even small systems have typically
|
||||||
100 components, and they typically have 3 or more failure modes each.
|
100 components, and they typically have 3 or more failure modes each.
|
||||||
@ -955,7 +973,7 @@ we rely on experts in the system under investigation
|
|||||||
to perform a meaningful FMEA analysis.
|
to perform a meaningful FMEA analysis.
|
||||||
%
|
%
|
||||||
These experts must use their judgement and experience to choose
|
These experts must use their judgement and experience to choose
|
||||||
sub-sets of the components in the system to check against each {\fm}.
|
sub-sets of the components in the system, to check against each {\fm}.
|
||||||
%
|
%
|
||||||
Also, %In practise
|
Also, %In practise
|
||||||
these experts have to select the areas they see as most critical for detailed FMEA analysis:
|
these experts have to select the areas they see as most critical for detailed FMEA analysis:
|
||||||
@ -1056,7 +1074,7 @@ FMECA has three probability factors for component failures.
|
|||||||
\textbf{FMECA ${\lambda}_{p}$ value.}
|
\textbf{FMECA ${\lambda}_{p}$ value.}
|
||||||
This is the overall failure rate of a base component.
|
This is the overall failure rate of a base component.
|
||||||
This will typically be the failure rate per million ($10^6$) or
|
This will typically be the failure rate per million ($10^6$) or
|
||||||
billion ($10^9$) hours of operation. reference MIL1991.
|
billion ($10^9$) hours of operation~\cite{mil1991}.
|
||||||
|
|
||||||
\textbf{FMECA $\alpha$ value.}
|
\textbf{FMECA $\alpha$ value.}
|
||||||
The failure mode probability, usually denoted by $\alpha$ is the probability of
|
The failure mode probability, usually denoted by $\alpha$ is the probability of
|
||||||
@ -1148,13 +1166,13 @@ or across the software/hardware interface.
|
|||||||
|
|
||||||
%\subsection{ FMEDA - Failure Modes Effects and Diagnostic Analysis}
|
%\subsection{ FMEDA - Failure Modes Effects and Diagnostic Analysis}
|
||||||
\label{sec:FMEDA}
|
\label{sec:FMEDA}
|
||||||
\textbf{Failure Mode Classifications in FMEDA.}
|
\textbf{Failure Mode Classifications and metrics in FMEDA.}
|
||||||
\begin{itemize}
|
\begin{itemize}
|
||||||
\item \textbf{Safe or Dangerous} Failure modes are classified SAFE or DANGEROUS
|
\item \textbf{Safe or Dangerous} Failure modes are classified SAFE or DANGEROUS
|
||||||
\item \textbf{Detectable failure modes} Failure modes are given the attribute DETECTABLE or UNDETECTABLE
|
\item \textbf{Detectable failure modes} Failure modes are given the attribute DETECTABLE or UNDETECTABLE
|
||||||
\item \textbf{Four attributes to Failure Modes} All failure modes may thus be Safe Detected(SD), Safe Undetected(SU), Dangerous Detected(DD), Dangerous Undetected(DU)
|
\item \textbf{Four attributes to Failure Modes} All failure modes may thus be Safe Detected(SD), Safe Undetected(SU), Dangerous Detected(DD), Dangerous Undetected(DU)
|
||||||
\item \textbf{Four statistical properties of a system} \\
|
\item \textbf{Four statistical properties of a system} We sum the statistics for the four classifications of system failures \\
|
||||||
$ \sum \lambda_{SD}$, $\sum \lambda_{SU}$, $\sum \lambda_{DD}$, $\sum \lambda_{DU}$
|
$ \sum \lambda_{SD}$, $\sum \lambda_{SU}$, $\sum \lambda_{DD}$, $\sum \lambda_{DU}$ \\
|
||||||
\end{itemize}
|
\end{itemize}
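% Editorial sketch: the numerical values below are invented for illustration
% and are not taken from any component database or from the analysis in this chapter.
As an illustration of how these four sums are used, two ratios defined in EN61508 can be computed
from them: the diagnostic coverage ($DC$) and the safe failure fraction ($SFF$),
\begin{equation}
DC = \frac{\sum \lambda_{DD}}{\sum \lambda_{DD} + \sum \lambda_{DU}}, \qquad
SFF = \frac{\sum \lambda_{SD} + \sum \lambda_{SU} + \sum \lambda_{DD}}
           {\sum \lambda_{SD} + \sum \lambda_{SU} + \sum \lambda_{DD} + \sum \lambda_{DU}} .
\end{equation}
Taking hypothetical totals of $\sum \lambda_{SD} = 40$, $\sum \lambda_{SU} = 10$,
$\sum \lambda_{DD} = 30$ and $\sum \lambda_{DU} = 20$ failures per $10^9$ hours,
we would obtain $DC = 30/50 = 0.6$ and $SFF = 80/100 = 0.8$.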
|
||||||
|
|
||||||
% Failure modes are classified as Safe or Dangerous according
|
% Failure modes are classified as Safe or Dangerous according
|
||||||
@ -1334,7 +1352,7 @@ However, as with the components that we should check against a {\fm}, there are
|
|||||||
the reasoning stages for an FMEA entry.
|
the reasoning stages for an FMEA entry.
|
||||||
%FMEA does not stipulat which
|
%FMEA does not stipulat which
|
||||||
Ideally each FMEA entry would contain a reasoning description
|
Ideally each FMEA entry would contain a reasoning description
|
||||||
for each component the {\fm} is checked against, so that the the entry can be reviewed or revisited.
|
for each component the {\fm} is checked against, so that the entry can be reviewed or revisited/audited.
|
||||||
Because FMEA is traditionally performed with one entry per component {\fm} full reasoning descriptions
|
Because FMEA is traditionally performed with one entry per component {\fm} full reasoning descriptions
|
||||||
are rare.
|
are rare.
|
||||||
This means that re-use, review and checking of traditional analysis must be started from `cold'.
|
This means that re-use, review and checking of traditional analysis must be started from `cold'.
|
||||||
|
@ -2,13 +2,52 @@
|
|||||||
|
|
||||||
\section*{Introduction}
|
\section*{Introduction}
|
||||||
|
|
||||||
This chapter examines FMEA in a critical light.
|
This chapter examines current FMEA
|
||||||
The problems with the scope---or required reasoning distance---of detail to apply
|
practice % `practice' is the noun and `practise' is the verb
|
||||||
for FMEA analysis, the difficulties of integrating software
|
in a
|
||||||
and hardware in FMEA failure models, and the near-impossibility of performing meaningful
|
critical light.
|
||||||
multiple failure analysis are examined.
|
Chapter~\ref{sec:chap2} introduced concepts underlying FMEA, and this chapter seeks to
|
||||||
Additional problems such as the inability to easily re-use, and validate (through
|
use these concepts to determine the drawbacks and advantages of its current usage.
|
||||||
traceable reasoning) FMEA models is presented.
|
%
|
||||||
|
FMEA is legally mandatory for a large proportion of safety critical systems
|
||||||
|
in Europe and the USA; at the very least, this means that experienced
|
||||||
|
engineers have to discuss a system at a level of detail starting
|
||||||
|
at {\bc} {\fms}.
|
||||||
|
%
|
||||||
|
This undoubtedly reveals dangers inherent in designs and makes
|
||||||
|
our lives safer. This chapter aims to look for the deficiencies in the FMEA process, to probe for weaknesses
|
||||||
|
and look for ways in which it could be done better and more efficiently.
|
||||||
|
|
||||||
|
A major problem is with the scope of examination---or required reasoning distance---to apply
|
||||||
|
for FMEA analysis.
|
||||||
|
Checking all combinations quickly leads to a state explosion problem:
|
||||||
|
limiting the number of components to check against for a given {\bc}
|
||||||
|
{\fm} could address this.
|
||||||
|
%
|
||||||
|
The difficulties of integrating software
|
||||||
|
and hardware in FMEA failure models mean that FMEA is showing its age: designed
|
||||||
|
in an era of simple electro-mechanical systems, it now faces a world in which ubiquitous
|
||||||
|
cheap micro-controllers and processors mean that most of today's systems are
|
||||||
|
now software/hardware hybrids.
|
||||||
|
%
|
||||||
|
|
||||||
|
With FMEA it is very difficult to perform %impossibility of performing
|
||||||
|
meaningful
|
||||||
|
multiple failure analysis.
|
||||||
|
The main reason for this is that, in electronics, each failure
|
||||||
|
can introduce a circuit topology change.
|
||||||
|
%
|
||||||
|
In software, in a similar vein,
|
||||||
|
one failure can influence the programmatic behaviour and decisions made,
|
||||||
|
complicating the analysis of additional failures.
|
||||||
|
%
|
||||||
|
Dual failure analysis is required by some recent European standards~\cite{en298,en230}
|
||||||
|
and with increasing demands on safety we are likely to see more multiple failure
|
||||||
|
FMEA requirements.
|
||||||
|
|
||||||
|
Other problems such as the inability to easily re-use, and validate/audit (through
|
||||||
|
traceable reasoning) FMEA models are presented.
|
||||||
|
%
|
||||||
Finally we conclude with a list of deficiencies in current FMEA methodologies, and present a wish list
|
Finally we conclude with a list of deficiencies in current FMEA methodologies, and present a wish list
|
||||||
for an improved methodology.
|
for an improved methodology.
|
||||||
|
|
||||||
@ -33,61 +72,12 @@ each {\bc} {\fm}.
|
|||||||
This means that the reasoning involved in determining the system level failure/symptom is described (if at all) very briefly.
|
This means that the reasoning involved in determining the system level failure/symptom is described (if at all) very briefly.
|
||||||
Ideally supporting documentation would give the reasoning and calculations behind each analysis case,
|
Ideally supporting documentation would give the reasoning and calculations behind each analysis case,
|
||||||
but the structure of current FMEA reports does not encourage this.
|
but the structure of current FMEA reports does not encourage this.
|
||||||
|
\paragraph{Re-use of FMEA analysis}
|
||||||
\subsection{FMEA does not support modularity.}
|
|
||||||
It is a common practise in the process control industry to buy in sub-systems,
|
|
||||||
typically sensors and actuators connected to an industrially hardened computer bus, i.e. CANbus~\cite{can,canspec}, modbus~\cite{modbus} etc.
|
|
||||||
Most sensor systems now are `smart'~\cite{smartinstruments}, that is to say, they contain programmatic elements
|
|
||||||
even if their outputs are %they supply
|
|
||||||
analogue signals. For instance a liquid level sensor that
|
|
||||||
supplies a {\ft} output, would have been typically have been implemented
|
|
||||||
in analogue electronics before the 1980s. After that time, it would be common to use a micro-processor
|
|
||||||
based system to perform the functions of reading the sensor and converting it to a current (\ft) output.
|
|
||||||
For the non-safety critical systems integrator this brings with it the advantages
|
|
||||||
that come with using a digital system (increased accuracy, self checking and ease of
|
|
||||||
calibration etc. ). For a safety critical systems integrator this can be very problematic when it
|
|
||||||
comes to approvals. Even if the sensor manufacturer will let you see the internal workings and software
|
|
||||||
we have a problem with tracing the FMEA reasoning through the sensor, through the sensors software
|
|
||||||
and then though the system being integrated.
|
|
||||||
This problem is compounded by the fact that traditional FMEA cannot integrate software into FMEA models~\cite{sfmea,safeware}.
|
|
||||||
|
|
||||||
|
|
||||||
\section{Reasoning Distance used to measure Comparison Complexity}
|
|
||||||
\label{sec:reasoningdistance}
|
|
||||||
Traditional FMEA cannot ensure that each failure mode of all its
|
|
||||||
components are checked against any other components in the system which
|
|
||||||
it may affect, due to state explosion.
|
|
||||||
%
|
%
|
||||||
FMEA is therefore performed using heuristics to decide
|
|
||||||
which components to check the effect of a component failure mode on.
|
|
||||||
We could term the number of checks made for each failure mode
|
|
||||||
on aspects of the system to be the reasoning distance.
|
|
||||||
%
|
|
||||||
In practise FMEA may be performed by following the signal path
|
|
||||||
of the component failure mode to its system level effect. This is less than ideal
|
|
||||||
and it can easily miss interactions with adjacent components, that could cause
|
|
||||||
other system level symptoms.
|
|
||||||
%
|
|
||||||
Were we to compare the reasoning distance with the theoretical maximum, the sum of all failure
|
|
||||||
modes in a system, multiplied by the number of components in it, we could arrive at a maximum
|
|
||||||
reasoning distance, which we can use as a comparison complexity figure.
|
|
||||||
%
|
|
||||||
This figure would mean we could compare the maximum number of checks (i.e. exhaustive %rigorous
|
|
||||||
analysis) with the number actually performed.
|
|
||||||
|
|
||||||
\paragraph{The ideal of exhaustive FMEA (XFMEA)}
|
|
||||||
Obviously, exhaustively checking every component failure mode in a system,
|
|
||||||
against all other components is the ideal for finding all possible system level failures.
|
|
||||||
While this is impossible for all but trivial systems, it should be possible
|
|
||||||
for small groups of components that work together to provide a well defined function.
|
|
||||||
We could term such a group a `{\fg}'.
|
|
||||||
|
|
||||||
\section{Re-use of FMEA analysis}
|
|
||||||
|
|
||||||
Given the {\bc} {\fm} to system level failure mode paradigm it is
|
Given the {\bc} {\fm} to system level failure mode paradigm it is
|
||||||
difficult to re-use FMEA analysis.
|
difficult to re-use FMEA analysis.
|
||||||
%
|
%
|
||||||
Several strategies to aid re-use have been proposed~\cite{rudov2009language, reuse_of_fmea}, but
|
Several strategies to aid re-use have been proposed~\cite{rudov2009language,patterns6113886,931423}, but
|
||||||
the fundamental problem remains, that, with any changes
|
the fundamental problem remains, that, with any changes
|
||||||
to the component base in a system, it is very difficult to
|
to the component base in a system, it is very difficult to
|
||||||
determine which FMEA test scenarios must be re-worked.
|
determine which FMEA test scenarios must be re-worked.
|
||||||
@ -100,6 +90,66 @@ The failure mode behaviour of these repeated structures will be the same.
|
|||||||
However with the {\bc} {\fm} to system level failure mode mapping
|
However with the {\bc} {\fm} to system level failure mode mapping
|
||||||
work is likely to be repeated.
|
work is likely to be repeated.
|
||||||
|
|
||||||
|
\subsection{FMEA does not support modularity.}
|
||||||
|
It is common practice in the process control industry to buy in sub-systems,
|
||||||
|
typically sensors and actuators connected to an industrially hardened computer bus, e.g. CANbus~\cite{can,canspec} or modbus~\cite{modbus}.
|
||||||
|
With traditional FMEA it is difficult to deal with
|
||||||
|
a `plug~and~play' paradigm. The design philosophy of FMEA is to trace {\bc} failure through to system failures.
|
||||||
|
This is incompatible with a modular approach where the architecture of a
|
||||||
|
system may differ between implementation sites.
|
||||||
|
The modularity problem is exacerbated by FMEA's problems modelling software/hardware hybrids, a problem
|
||||||
|
examined in section~\ref{sec:distributed}.
|
||||||
|
% Most sensor systems now are `smart'~\cite{smartinstruments}, that is to say, they contain programmatic elements
|
||||||
|
% even if their outputs are %they supply
|
||||||
|
% analogue signals. For instance a liquid level sensor that
|
||||||
|
% supplies a {\ft} output, would have been typically have been implemented
|
||||||
|
% in analogue electronics before the 1980s. After that time, it would be common to use a micro-processor
|
||||||
|
% based system to perform the functions of reading the sensor and converting it to a current (\ft) output.
|
||||||
|
% For the non-safety critical systems integrator this brings with it the advantages
|
||||||
|
% that come with using a digital system (increased accuracy, self checking and ease of
|
||||||
|
% calibration etc. ). For a safety critical systems integrator this can be very problematic when it
|
||||||
|
% comes to approvals. Even if the sensor manufacturer will let you see the internal workings and software
|
||||||
|
% we have a problem with tracing the FMEA reasoning through the sensor, through the sensors software
|
||||||
|
% and then though the system being integrated.
|
||||||
|
% This problem is compounded by the fact that traditional FMEA cannot integrate software into FMEA models~\cite{sfmea,safeware}.
|
||||||
|
|
||||||
|
|
||||||
|
\section{Reasoning Distance used to measure Comparison Complexity}
|
||||||
|
\label{sec:reasoningdistance}
|
||||||
|
Traditional FMEA cannot ensure that each failure mode of all its
|
||||||
|
components is checked against all other components in the system which
|
||||||
|
it may affect, due to state explosion.
|
||||||
|
%
|
||||||
|
FMEA is therefore performed using heuristics to decide
|
||||||
|
which components to check the effect of a component failure mode on.
|
||||||
|
%We could term the number of checks made for each failure mode
|
||||||
|
%on aspects of the system to be the reasoning distance.
|
||||||
|
%
|
||||||
|
Typically FMEA will be performed by following the signal path
|
||||||
|
of the component failure mode to its system level effect,
|
||||||
|
echoing fault finding reasoning.
|
||||||
|
%
|
||||||
|
This is less than ideal
|
||||||
|
and it can easily miss interactions with adjacent components, that could cause
|
||||||
|
other system level symptoms.
|
||||||
|
%
|
||||||
|
Were we to compare the reasoning distance with the theoretical maximum, the sum of all failure
|
||||||
|
modes in a system, multiplied by the number of components in it, we could arrive at a maximum
|
||||||
|
reasoning distance, which we can use as a comparison complexity figure.
|
||||||
|
%
|
||||||
|
This figure would mean we could compare the maximum number of checks (i.e. exhaustive %rigorous
|
||||||
|
analysis) with the number actually performed.
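%
% Editorial sketch: the (N-1) factor and the figure of five checks per failure
% mode are assumptions made for illustration only.
As a rough sketch of this comparison, assume a system of $N$ components in which component $i$
has $f_i$ failure modes, and assume that a failure mode need not be checked against its own component.
The maximum reasoning distance could then be written as
\begin{equation}
RD_{max} = (N-1) \sum_{i=1}^{N} f_i .
\end{equation}
For example, with $N = 100$ and $f_i = 3$ for every component, $RD_{max} = 99 \times 300 = 29700$ checks.
If an analysis that follows the signal path checked each failure mode against, say, five components
(a hypothetical figure), it would perform $300 \times 5 = 1500$ checks, roughly 5\% of the exhaustive total.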
|
||||||
|
|
||||||
|
\paragraph{The ideal of exhaustive FMEA (XFMEA).}
|
||||||
|
Obviously, exhaustively checking every component failure mode in a system,
|
||||||
|
against all other components is the ideal for finding all possible system level failures.
|
||||||
|
While this is impossible for all but trivial systems, we note that it should be possible
|
||||||
|
for small groups of components that work together to provide a well defined function.
|
||||||
|
We could term such a group a `{\fg}'. Potentially here we have a way of de-composing
|
||||||
|
the problem and reducing the $O(N^2)$ state explosion effect
|
||||||
|
associated with XFMEA.
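%
% Editorial sketch: the even partition into groups of k components and the
% values N=100, f=3, k=5 are hypothetical and chosen only to show the order of the saving.
To illustrate the potential saving, suppose that the $N$ components of a system can be partitioned
into {\fg}s of $k$ components each, that every component has $f$ failure modes, and that
exhaustive checking is applied only within each group.
Each group then requires $k(k-1)f$ checks, and there are $N/k$ groups, giving
\begin{equation}
\frac{N}{k} \times k(k-1)f = N(k-1)f
\end{equation}
checks in total, compared with $N(N-1)f$ for the system taken as a whole.
With $N = 100$, $f = 3$ and $k = 5$ this is $100 \times 4 \times 3 = 1200$ checks, against $29700$.
This sketch ignores the further analysis needed to combine the groups, so it indicates only the
order of the saving and should not be read as an exact count.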
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
\section{Software and FMEA}
|
\section{Software and FMEA}
|
||||||
|
|
||||||
@ -138,14 +188,16 @@ With the increasing use of micro-controllers in place of analogue electronics
|
|||||||
for most new designs of electronic product, the poor integration capabilities of FMEA
|
for most new designs of electronic product, the poor integration capabilities of FMEA
|
||||||
are now being seen as deficiencies.
|
are now being seen as deficiencies.
|
||||||
|
|
||||||
This apparent then in the dilemma now faced
|
This is becoming apparent in a dilemma now faced
|
||||||
by organisations dealing with highly safety critical systems, and having to rely on `smart~instruments'
|
by organisations dealing with highly safety critical systems, and having to rely on `smart~instruments'
|
||||||
that they can no longer validate using FMEA.
|
that they can no longer validate using FMEA.
|
||||||
|
%
|
||||||
Smart instruments are dealt with in the section below.
|
Smart instruments are dealt with in the section below.
|
||||||
Distributed real time systems, which rely on micro-controllers connected in a network
|
Distributed real time systems, which rely on micro-controllers connected in a network
|
||||||
using a communications protocol, are also impossible to analyse meaningfully using FMEA.
|
using a communications protocol, are also impossible to analyse meaningfully using FMEA.
|
||||||
|
|
||||||
\subsection{The rise of the smart instrument}
|
\subsection{The rise of the smart instrument}
|
||||||
|
\label{sec:smart}
|
||||||
%% AWE --- Atomic Weapons Establishment have this problem....
|
%% AWE --- Atomic Weapons Establishment have this problem....
|
||||||
A smart instrument is defined as one that uses a micro-processor and software
|
A smart instrument is defined as one that uses a micro-processor and software
|
||||||
in conjunction with its sensing electronics, rather than
|
in conjunction with its sensing electronics, rather than
|
||||||
@ -186,25 +238,31 @@ systems. %by traditional FMEA.
|
|||||||
Currently the only way that some smart~instruments have been permitted for
|
Currently the only way that some smart~instruments have been permitted for
|
||||||
use in highly critical systems is to have them extensively
|
use in highly critical systems is to have them extensively
|
||||||
functionally tested~\cite{bishopsmartinstruments}.
|
functionally tested~\cite{bishopsmartinstruments}.
|
||||||
|
|
||||||
|
|
||||||
%>>>>>>> 1b3d54f0ec2963017e98c4cdadc9a72a8bac911a
|
%>>>>>>> 1b3d54f0ec2963017e98c4cdadc9a72a8bac911a
|
||||||
|
|
||||||
\subsection{Distributed real time systems}
|
\subsection{Distributed real time systems}
|
||||||
|
\label{sec:distributed}
|
||||||
Distributed real time systems are control systems where
|
Distributed real time systems are control systems where
|
||||||
smart sensors communicate over a communications bus to
|
smart sensors communicate over a communications bus to
|
||||||
a master controller.
|
a master controller.
|
||||||
%
|
%
|
||||||
Most modern cars follow this information technology pattern and use CANbus~\cite{canspec,can}.
|
Most modern cars follow this information technology pattern and use CANbus~\cite{canspec,can}.
|
||||||
%
|
%
|
||||||
For instance, in a modern car there will be no mechanical linkage from the pedal to the engine, instead the throttle pedal will be linked to a sensor to determine how
|
For instance, in a modern car there will be no mechanical linkage from the pedal to the engine; instead the throttle pedal
|
||||||
|
will be linked to a sensor to determine how
|
||||||
far the pedal is pressed.
|
far the pedal is pressed.
|
||||||
This sensor will be read by a micro-controller, and passed, via CANbus, to the Engine Control Unit (ECU)
|
This sensor will be read by a micro-controller, and passed, via CANbus, to the Engine Control Unit (ECU)
|
||||||
which will use that information (along with information from other sensors) to adjust the power required from the engine.
|
which will use that information (along with information from other sensors) to adjust the power required from the engine.
|
||||||
|
%
|
||||||
This adjustment could be direct, or could be another CANbus message passed to a micro-controller regulating engine function.
|
This adjustment could be direct, or could be another CANbus message passed to a micro-controller regulating engine function.
|
||||||
In terms of FMEA, see figure~\ref{fig:distcon}, our reasoning path spans four interface layers of electronics to software.
|
%
|
||||||
|
In terms of FMEA, see figure~\ref{fig:distcon}, our reasoning path spans (at least) four interface layers of electronics to software.
|
||||||
|
%
|
||||||
Traditional FMEA does not cater for the software/hardware interface, and here we have the additional complications
|
Traditional FMEA does not cater for the software/hardware interface, and here we have the additional complications
|
||||||
%with the additional complications
|
%with the additional complications
|
||||||
of the communications protocol used to transmit data, and the failure mode characteristics
|
of the communications protocol used to transmit data and the failure mode characteristics
|
||||||
of the communications physical layer.
|
of the communications physical layer.
|
||||||
|
|
||||||
%(figure~\ref{fig:distcon}
|
%(figure~\ref{fig:distcon}
|
||||||
@ -235,10 +293,11 @@ utterly anachronistic in the distributed real time system environment.
|
|||||||
|
|
||||||
\begin{itemize}
|
\begin{itemize}
|
||||||
\item FMEA type methodologies were designed for simple electro-mechanical systems of the 1940s to 1960s.
|
\item FMEA type methodologies were designed for simple electro-mechanical systems of the 1940s to 1960s.
|
||||||
\item Reasoning Distance - component failure to system level symptom process is undefined in regard to the components to check against each given component{\fm}.
|
\item Reasoning distance - the process of tracing a component failure to a system level symptom does not define
|
||||||
|
which components to check against each given component {\fm}.
|
||||||
\item State explosion - impossible to perform FMEA exhaustively %rigorously
|
\item State explosion - impossible to perform FMEA exhaustively %rigorously
|
||||||
\item Difficult to re-use previous analysis work
|
\item Difficult to re-use previous analysis work
|
||||||
\item Very Difficult to model simultaneous failures.
|
\item Very difficult to model simultaneous failures.
|
||||||
\item Software and hardware models are separate (if the software is modelled at all).
|
\item Software and hardware models are separate (if the software is modelled at all).
|
||||||
\item Distributed real time systems are very difficult to analyse with FMEA because they typically involve many hardware/software interfaces.
|
\item Distributed real time systems are very difficult to analyse with FMEA because they typically involve many hardware/software interfaces.
|
||||||
\end{itemize}
|
\end{itemize}
|
||||||
@ -352,7 +411,8 @@ very good with single failure modes linked to top level events.
|
|||||||
FMEA has become part of the safety critical and safety certification industries.
|
FMEA has become part of the safety critical and safety certification industries.
|
||||||
%
|
%
|
||||||
SFMEA is in its infancy, and there are corresponding gaps in
|
SFMEA is in its infancy, and there are corresponding gaps in
|
||||||
certification for software, EN61508~\cite{en61508}, recommends hardware redundancy architectures in conjunction
|
certification for software. EN61508~\cite{en61508}, a modern standard based
|
||||||
|
on a modern variant of FMEA, recommends hardware redundancy architectures in conjunction
|
||||||
with FMEDA for hardware: for software it recommends language constraints and quality procedures
|
with FMEDA for hardware: for software it recommends language constraints and quality procedures
|
||||||
but no inductive fault finding technique.
|
but no inductive fault finding technique.
|
||||||
%
|
%
|
||||||
@ -378,7 +438,7 @@ We now form a wish list, stating the features that we would want
|
|||||||
in an improved FMEA methodology,
|
in an improved FMEA methodology,
|
||||||
\begin{itemize}
|
\begin{itemize}
|
||||||
\item Must be able to analyse hybrid software/hardware systems,
|
\item Must be able to analyse hybrid software/hardware systems,
|
||||||
\item no state explosion (which would make analysis impractical),
|
\item no state explosion (which has rendered exhaustive analysis impractical),
|
||||||
\item exhaustive checking at a modular level, %(total failure coverage within {\fgs} all interacting component and failure modes checked),
|
\item exhaustive checking at a modular level, %(total failure coverage within {\fgs} all interacting component and failure modes checked),
|
||||||
\item traceable reasoning system models,% to aid repeatability and checking,
|
\item traceable reasoning system models,% to aid repeatability and checking,
|
||||||
\item re-usable i.e. it should be possible to re-use analysis,
|
\item re-usable i.e. it should be possible to re-use analysis,
|
||||||
|
@ -348,7 +348,8 @@ we thus reveal design deficiencies.
|
|||||||
In Safety Integrity Level (SIL)~\cite{en61508} terms, by identifying undetectable faults and fixing them, we raise
|
In Safety Integrity Level (SIL)~\cite{en61508} terms, by identifying undetectable faults and fixing them, we raise
|
||||||
the safe failure fraction (SFF).
|
the safe failure fraction (SFF).
|
||||||
|
|
||||||
|
\section{Objective and Subjective Reasoning stages}
|
||||||
|
Opportunity for formal definitions and perhaps an interface or process for achieving it....
|
||||||
|
|
||||||
\section{Conclusion}
|
\section{Conclusion}
|
||||||
|
|
||||||
|