print out-redpen-edit.... now a nice

cycle ride to prince regent-a 2000m swim
and then maybe a 6 inch sub with tuna...
Robin Clark 2013-05-12 11:08:52 +01:00
parent 05f96697e7
commit ec7dc38679
5 changed files with 184 additions and 94 deletions


@@ -860,7 +860,18 @@ strength of materials, the causes of boiler explosions",
biburl="http://www.isa.org/InTechTemplate.cfm?template=/ContentManagement/ContentDisplay.cfm\&ContentID=77994",
}
@INPROCEEDINGS{patterns6113886,
author={Lopatkin, I. and Iliasov, A. and Romanovsky, A. and Prokhorova, Y. and Troubitsyna, E.},
booktitle={High-Assurance Systems Engineering (HASE), 2011 IEEE 13th International Symposium on},
title={Patterns for Representing FMEA in Formal Specification of Control Systems},
year={2011},
pages={146-151},
keywords={control engineering computing;control systems;failure analysis;formal specification;program diagnostics;system recovery;effects analysis;error detection;error recovery;failure modes;formal event-B specification;formal system development;inductive safety analysis;requirement tracing;sluice control system;Computational modeling;Logic gates;Safety;Sensor systems;Switches;Event-B;FMEA;control systems;formal specification;patterns;safety},
doi={10.1109/HASE.2011.10},
ISSN={1530-2059},}
@PHDTHESIS{garrett,
AUTHOR = "Chris Garrett",
TITLE = "Functional diagnosis strategies for analog systems using heuristic programming techniques",


@@ -128,7 +128,7 @@ This means that for each {\cb} node there are at least two hardware software int
Because of this it is virtually impossible to apply meaningful traditional FMEA methodologies to
{\cb} systems.
%
This paper firstly highlights the limitations with traditional FMEA,
and then describes a new modularised variant, Failure Mode Modular De-composition
which addresses the problems of applying FMEA to software/hardware hybrid systems.
%The paper first discussed work performed on software FMEA, and then shows the need


@@ -36,7 +36,7 @@ defined at the start of this chapter.
The act
of defining relationships between the data objects
in FMEA raises questions about the nature of the process
and allows us to analytically discuss its strengths and weaknesses.
@@ -176,7 +176,8 @@ component types, but does not detail specific failure modes.
Using MIL1991 in conjunction with FMD-91, we can determine statistics for the failure modes
of component types.
%
The FMEA variant\footnote{EN61508 (and related standards) is based on the FMEA variant Failure Mode Effects and Diagnostic Analysis (FMEDA).}
used by the European standard EN61508~\cite{en61508}
requires statistics for Mean Time to Failure (MTTF) for all {\bc} failure modes.
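As a minimal sketch of how a component-type failure rate can be split into per-failure-mode statistics, the Python below multiplies an overall rate (of the kind tabulated in MIL1991) by a failure mode distribution (of the kind given in FMD-91), then derives a per-mode MTTF under the assumption of a constant failure rate. The component, its modes and all the numbers are hypothetical placeholders, not values from either handbook.

```python
# Hypothetical sketch: split a component-type failure rate (as might be
# looked up in MIL1991) across its failure modes using a failure mode
# distribution (as might be looked up in FMD-91).
# All numbers below are illustrative placeholders, NOT handbook data.


def mode_rates(lambda_p, distribution):
    """Per-failure-mode rates, lambda_m = alpha_m * lambda_p.

    lambda_p     -- overall component failure rate (failures per 1e9 hours)
    distribution -- mapping of failure mode -> fraction alpha_m (sums to 1)
    """
    total = sum(distribution.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"mode fractions sum to {total}, expected 1.0")
    return {mode: alpha * lambda_p for mode, alpha in distribution.items()}


def mode_mttf_hours(rate_per_1e9_hours):
    """MTTF in hours for one failure mode, assuming a constant failure rate."""
    return 1e9 / rate_per_1e9_hours


# Illustrative values for a hypothetical resistor type.
rates = mode_rates(10.0, {"OPEN": 0.6, "SHORT": 0.1, "DRIFT": 0.3})
for mode, rate in rates.items():
    print(f"{mode}: {rate:.1f} per 1e9 h, MTTF {mode_mttf_hours(rate):.3g} h")
```

The check that the mode fractions sum to one encodes the assumption that the listed modes are exhaustive for that component type.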
@@ -468,7 +469,7 @@ that we got from FMD-91, listed in equation~\ref{eqn:opampfms}.
\end{table}
%\clearpage
\clearpage
\subsubsection{Failure modes of an Op-Amp}
@@ -515,11 +516,6 @@ component {\fms} in FMEA or FMMD and require interpretation.
\clearpage
%%
%% Paragraph using failure modes to build from bottom up
%%
@@ -662,11 +658,11 @@ echoing diagnostic/fault~finding methods~\cite{garrett, maikowski}. % loebowski}
%
When fault finding, we generally follow the signal path, checking for correct behaviour
along it: when we find something out of place we zoom in and measure
the circuit behaviour until we find a faulty component or module~\cite{garrett}.
%
Because this style of fault finding is based on experiment,
we can hop from module to module, eliminating working modules until we find the
failure~\cite{maikowski}.
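The signal-path style of fault finding described above can be sketched as a walk along an ordered list of modules, stopping at the first one whose output is out of tolerance. This is a minimal illustration, assuming each module output can be measured and compared with an expected value; the stage names, readings and tolerances are invented.

```python
# Hypothetical sketch of signal-path fault finding: walk the modules in
# signal order and stop at the first one whose measured output deviates
# from its expected value by more than its tolerance.
from typing import NamedTuple, Optional, Sequence


class Stage(NamedTuple):
    name: str
    measured: float   # value measured at the module output
    expected: float   # value a working module should produce
    tolerance: float  # acceptable absolute deviation


def first_faulty_stage(path: Sequence[Stage]) -> Optional[str]:
    """Return the name of the first stage whose output is out of tolerance."""
    for stage in path:
        if abs(stage.measured - stage.expected) > stage.tolerance:
            return stage.name  # all upstream stages have been eliminated as working
    return None  # every output checked was within tolerance


signal_path = [
    Stage("sensor", 0.52, 0.50, 0.05),
    Stage("amplifier", 5.10, 5.00, 0.25),
    Stage("filter", 1.20, 5.00, 0.25),   # output collapsed: suspect module
    Stage("adc", 245.0, 1023.0, 8.0),
]
print(first_faulty_stage(signal_path))  # filter
```

The downstream ADC reading is also wrong, but the walk never reaches it: as in manual fault finding, the first out-of-place measurement along the path is where attention is focused.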
%
Those tasked to perform FMEA are generally personnel who have performed fault finding,
and their rationale and work-culture reflect this.
@@ -706,7 +702,7 @@ Also, whether following the effects through the signal path {\em only} is accept
would looking at its effect on all other components in the system be necessary.
%is a matter for debate.
%
In practice, a compromise is made between the amount of time/money that can be spent
on analysis and the criticality of the project.
Metrics from measuring the amount of work to undertake for FMEA are examined in section~\ref{sec:xfmea}.
@@ -717,7 +713,7 @@ change the circuit topology. For a single failure
this effect may cause additional complications for the analyst.
For multiple failures this means
that the analyst
will have to deal with an altered circuit topology
for each analysis.
@@ -776,13 +772,29 @@ did not link this failure to the catastrophic failure of the spacecraft~\cite{ch
This was not a failure of objective reasoning, but rather of the subjective assessment of the context in which the leak occurred.
%
This means that for an objectively calculated failure mode outcome, we may have
more than one subjective outcome. %, or definition, for it.
%
This means that objective reasoning can be applied to determine objective effects, but the criticality (the seriousness or consequences)
of those failures depends upon the Equipment Under Control (EUC)
and its environment.
%
For instance, a leak of nuclear material aboard a spacecraft could result
in loss of mission, whereas a leak on Earth could have serious health and environmental consequences.
This means that one line of FMECA describing a system risk is an oversimplification (consider that the same
nuclear material will be present during transport and launch, and when outside the Earth's environment).
%
Subjective appraisal of the outcome of a system failure mode can also
be subject to management and/or political pressure.
\paragraph{Multiple Simultaneous Failure Modes}
%
FMEA is less useful for determining events for multiple
simultaneous
failures\footnote{Multiple simultaneous failures are taken to mean failures that occur within the same detection period.
Detection periods are typically determined for the process under control; for a flame in an industrial burner this
might typically be one second~\cite{en298}.}.
%
Work has been performed using component failure statistics to
select the more likely multiple failures for analysis~\cite{FMEAmultiple653556}.
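A minimal sketch of ranking candidate dual failures by likelihood, assuming independent failure modes so that the product of two rates can serve as a ranking key. This illustrates the general idea only and is not the method of the cited work; the components, modes and rates are hypothetical, and the failure modes of a single component are treated as mutually exclusive.

```python
# Hypothetical sketch: rank pairs of base component failure modes by the
# product of their rates (assumes independence), most likely pair first.
from itertools import combinations

# (component, failure mode) -> failure rate per 1e9 hours (illustrative)
mode_rate = {
    ("R1", "OPEN"): 6.0,
    ("R1", "SHORT"): 1.0,
    ("C1", "SHORT"): 4.0,
    ("C1", "OPEN"): 2.0,
    ("U1", "LOW"): 8.0,
}


def ranked_dual_failures(rates, top=None):
    """Return (mode_a, mode_b, key) for modes on different components."""
    pairs = [
        (a, b, rates[a] * rates[b])
        for a, b in combinations(rates, 2)
        if a[0] != b[0]  # modes of one component treated as mutually exclusive
    ]
    pairs.sort(key=lambda p: p[2], reverse=True)
    return pairs if top is None else pairs[:top]


for a, b, key in ranked_dual_failures(mode_rate, top=3):
    print(a, b, key)
```

Analysing pairs in this order lets a limited analysis budget be spent on the dual failures most likely to occur within one detection period.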
@@ -806,14 +818,18 @@ meaning the additional failures might have to be analysed with respect to the ch
\paragraph{Failure modes and their observability criterion: detectable and undetectable.}
\label{sec:detectable}
Often the effects of a failure mode may be easy to detect,
and our equipment can react by raising an alarm or compensating for the resulting fault.
%
Some failure modes may cause undetectable failures; for instance, a component failure that causes
a measured reading to change could have adverse consequences yet not be flagged as a failure.
%
This type of failure %
%would not be flagged as a failure by the system, because
cannot be dealt with by passing an error indication to higher level modules,
because we cannot detect it. The system therefore
has no way of knowing the reading is invalid.
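To illustrate the difference between detectable and undetectable failures, the sketch below assumes a sensor reporting over a 4--20\,mA current loop and a controller that can only reject readings outside a validity band. The band limits are illustrative assumptions. An open-circuit failure drives the reading out of range and can be flagged; a drift that stays in range passes as valid.

```python
# Hypothetical sketch: a controller can only flag current-loop readings
# that fall outside a validity band. Thresholds are illustrative assumptions.

LOW_LIMIT_MA = 3.8
HIGH_LIMIT_MA = 20.5


def classify_reading(current_ma: float) -> str:
    """Return 'FAULT' if the reading is detectably invalid, else 'VALID'."""
    if current_ma < LOW_LIMIT_MA or current_ma > HIGH_LIMIT_MA:
        return "FAULT"  # can be passed as an error indication to higher levels
    return "VALID"      # may still be wrong, but the system cannot tell


print(classify_reading(0.0))   # open-circuit loop: FAULT (detectable)
print(classify_reading(12.0))  # true value 12.0 mA: VALID
print(classify_reading(13.1))  # true 12.0 mA plus drift: VALID (undetectable)
```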
%
The term observable has a specific meaning in the field of control engineering~\cite{721666, ACS:ACS1297};
systems submitted for FMEA are generally related to control systems,
@@ -893,11 +909,13 @@ methodologies.
%{sfmeaforwardbackward}
\subsection{FMEA and the State Explosion Problem}
\label{sec:xfmea}
\paragraph{Problem of which components to check for a given {\bc} {\fm}.}
FMEA for a safety critical certification~\cite{en298,en61508} will have to be applied
to all known failure modes of all components within a system.
%
Each one of these, in a typical report, would be one line of a spreadsheet entry.
%
FMEA does not define or specify the scope of the investigation of each component failure mode.
Should we follow the signal path, and all components we encounter along it, or should the scope be wider?
%
@@ -921,7 +939,7 @@ $f$ is the number of failure modes per component.
\end{equation}
\paragraph{Exhaustive FMEA and dual failures.}
This would mean $O(N^2)$ checks
to undertake an `exhaustive~FMEA'. Even small systems typically have
100 components, and these typically have 3 or more failure modes each.
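The quadratic growth can be seen by counting checks under the assumption that each of the $f$ failure modes of each of the $N$ components is checked against every other component, giving $N f (N-1)$ checks. A minimal sketch:

```python
# Sketch: checks for exhaustive single-failure FMEA, assuming every failure
# mode of every component is checked against every other component.


def xfmea_single_checks(n_components: int, modes_per_component: int) -> int:
    # N components * f modes each * (N - 1) other components
    return n_components * modes_per_component * (n_components - 1)


for n in (10, 100, 1000):
    print(n, xfmea_single_checks(n, 3))
```

For the small system above (100 components, 3 failure modes each) this gives 29,700 checks; ten times as many components gives roughly a hundred times as many.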
@@ -955,7 +973,7 @@ we rely on experts in the system under investigation
to perform a meaningful FMEA analysis.
%
These experts must use their judgement and experience to choose
sub-sets of the components in the system, to check against each {\fm}.
%
Also, %In practise
these experts have to select the areas they see as most critical for detailed FMEA analysis:
@@ -1056,7 +1074,7 @@ FMECA has three probability factors for component failures.
\textbf{FMECA ${\lambda}_{p}$ value.}
This is the overall failure rate of a base component.
This will typically be the failure rate per million ($10^6$) or
billion ($10^9$) hours of operation~\cite{mil1991}.
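A small sketch of moving between the two bases mentioned for $\lambda_p$ (failures per $10^6$ hours and per $10^9$ hours, the latter commonly called FIT), and of deriving MTTF under the assumption of a constant failure rate, where MTTF $= 1/\lambda$. The example rate is illustrative.

```python
# Sketch: convert a base component failure rate between per-1e6-hour and
# per-1e9-hour (FIT) bases, and derive MTTF assuming a constant failure rate.

HOURS_PER_YEAR = 8760


def per_million_to_fit(rate_per_1e6_hours: float) -> float:
    return rate_per_1e6_hours * 1000.0


def mttf_hours_from_fit(fit: float) -> float:
    return 1e9 / fit  # MTTF = 1 / lambda, with lambda per hour = fit / 1e9


fit = per_million_to_fit(0.05)    # illustrative: 0.05 failures per 1e6 h
mttf = mttf_hours_from_fit(fit)   # 50 FIT gives an MTTF of 2e7 hours
print(fit, mttf, mttf / HOURS_PER_YEAR)  # the last value is about 2283 years
```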
\textbf{FMECA $\alpha$ value.}
The failure mode probability, usually denoted by $\alpha$, is the probability of
@@ -1148,13 +1166,13 @@ or across the software/hardware interface.
%\subsection{ FMEDA - Failure Modes Effects and Diagnostic Analysis}
\label{sec:FMEDA}
\textbf{Failure Mode Classifications and metrics in FMEDA.}
\begin{itemize}
\item \textbf{Safe or Dangerous} Failure modes are classified SAFE or DANGEROUS
\item \textbf{Detectable failure modes} Failure modes are given the attribute DETECTABLE or UNDETECTABLE
\item \textbf{Four attributes of Failure Modes} All failure modes may thus be Safe Detected (SD), Safe Undetected (SU), Dangerous Detected (DD) or Dangerous Undetected (DU)
\item \textbf{Four statistical properties of a system} We sum the statistics for the four classifications of system failures \\
$ \sum \lambda_{SD}$, $\sum \lambda_{SU}$, $\sum \lambda_{DD}$, $\sum \lambda_{DU}$
\end{itemize}
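The four sums can be computed directly from a classified failure mode list. The sketch below also derives the safe failure fraction and the diagnostic coverage using the usual EN61508 definitions, $\mathrm{SFF} = (\sum\lambda_{SD}+\sum\lambda_{SU}+\sum\lambda_{DD})/(\sum\lambda_{SD}+\sum\lambda_{SU}+\sum\lambda_{DD}+\sum\lambda_{DU})$ and $\mathrm{DC} = \sum\lambda_{DD}/(\sum\lambda_{DD}+\sum\lambda_{DU})$. The failure modes and rates are illustrative.

```python
# Sketch: sum failure rates over the four FMEDA classifications and derive
# SFF and DC. The failure mode list and rates (per 1e9 hours) are illustrative.
from collections import defaultdict

failure_modes = [
    # (component, failure mode, rate, safe?, detected?)
    ("R1", "OPEN", 6.0, False, True),
    ("R1", "SHORT", 1.0, True, True),
    ("C1", "SHORT", 4.0, False, False),
    ("U1", "LOW", 8.0, True, False),
]


def fmeda_sums(modes):
    """Return the summed rates for the SD, SU, DD and DU classifications."""
    sums = defaultdict(float)
    for _component, _mode, rate, safe, detected in modes:
        key = ("S" if safe else "D") + ("D" if detected else "U")
        sums[key] += rate
    return {k: sums[k] for k in ("SD", "SU", "DD", "DU")}


s = fmeda_sums(failure_modes)
total = sum(s.values())
sff = (s["SD"] + s["SU"] + s["DD"]) / total
dc = s["DD"] / (s["DD"] + s["DU"])
print(s, f"SFF={sff:.3f}", f"DC={dc:.3f}")
```

In this toy example, making the C1 SHORT failure detectable (moving its rate from DU to DD) raises the SFF from about 0.79 to 1.0, which is the sense in which finding and fixing undetectable faults raises the SFF.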
% Failure modes are classified as Safe or Dangerous according
@@ -1334,7 +1352,7 @@ However, as with the components that we should check against a {\fm}, there are
the reasoning stages for an FMEA entry.
%FMEA does not stipulat which
Ideally each FMEA entry would contain a reasoning description
for each component the {\fm} is checked against, so that the entry can be reviewed, revisited or audited.
Because FMEA is traditionally performed with one entry per component {\fm}, full reasoning descriptions
are rare.
This means that re-use, review and checking of traditional analysis must be started from `cold'.
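A minimal sketch of an entry structure that would keep such reasoning, assuming one reasoning record per component checked; the class and field names are hypothetical. Keeping the records also makes it possible to list the components a failure mode has not yet been checked against.

```python
# Hypothetical sketch of an FMEA entry holding a reasoning record for each
# component the base component failure mode was checked against.
from dataclasses import dataclass, field
from typing import List


@dataclass
class CheckRecord:
    against_component: str
    reasoning: str  # why the failure mode does or does not affect it


@dataclass
class FmeaEntry:
    component: str
    failure_mode: str
    system_effect: str
    checks: List[CheckRecord] = field(default_factory=list)

    def unchecked(self, all_components: List[str]) -> List[str]:
        """Components (other than the failing one) with no reasoning record."""
        done = {c.against_component for c in self.checks}
        return [c for c in all_components if c != self.component and c not in done]


entry = FmeaEntry("R1", "OPEN", "sensor reading low")
entry.checks.append(CheckRecord("U1", "illustrative: loss of R1 leaves U1 input unbiased"))
print(entry.unchecked(["R1", "C1", "U1"]))  # ['C1']
```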


@@ -2,13 +2,52 @@
\section*{Introduction}
This chapter examines current FMEA
practice % practice is a noun and practise is a verb
in a
critical light.
Chapter~\ref{sec:chap2} introduced concepts underlying FMEA, and this chapter seeks to
use these concepts to determine the drawbacks and advantages of its current usage.
%
That FMEA is legally mandatory for a large proportion of safety critical systems
in Europe and the USA means, at the very least, that experienced
engineers have to discuss a system at a level of detail starting
at {\bc} {\fms}.
%
This undoubtedly reveals dangers inherent in designs and makes
our lives safer. This chapter aims to look for the deficiencies in the FMEA process, to probe for weaknesses
and look for ways in which it could be done better and more efficiently.
A major problem is the scope of examination (the required reasoning distance) to apply
in FMEA analysis.
Checking all combinations quickly leads to a state explosion problem:
limiting the number of components to check against a given {\bc}
{\fm} could address this.
%
The difficulties of integrating software
and hardware in FMEA failure models mean that FMEA is showing its age: it was designed
in an era of simple electro-mechanical systems, whereas, with ubiquitous
cheap micro-controllers and processors, most of today's systems are
now software/hardware hybrids.
%
With FMEA it is very difficult to perform %impossibility of performing
meaningful
multiple failure analysis.
The main reasons for this are that in electronics, each failure
can introduce a circuit topology change.
%
In software, in a similar vein,
one failure can influence the programmatic behaviour and decisions made,
complicating the analysis of additional failures.
%
Dual failure analysis is required by some recent European standards~\cite{en298,en230}
and with increasing demands on safety we are likely to see more multiple failure
FMEA requirements.
Other problems, such as the inability to easily re-use and validate/audit (through
traceable reasoning) FMEA models, are also presented.
%
Finally we conclude with a list of deficiencies in current FMEA methodologies, and present a wish list
for an improved methodology.
@@ -33,61 +72,12 @@ each {\bc} {\fm}.
This means that the reasoning involved in determining the system level failure/symptom is described (if at all) very briefly.
Ideally supporting documentation would give the reasoning and calculations behind each analysis case,
but the structure of current FMEA reports does not encourage this.
\section{Re-use of FMEA analysis}
Given the {\bc} {\fm} to system level failure mode paradigm, it is
difficult to re-use FMEA analysis.
%
Several strategies to aid re-use have been proposed~\cite{rudov2009language,patterns6113886,931423}, but
the fundamental problem remains that, with any change
to the component base in a system, it is very difficult to
determine which FMEA test scenarios must be re-worked.
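If each FMEA entry recorded the components it considered (which traditional FMEA reports do not usually do), the entries affected by a component change could be selected mechanically. The sketch below assumes such records exist; the entry names and component sets are hypothetical.

```python
# Hypothetical sketch: select the FMEA entries to re-work after a component
# change, assuming each entry records the set of components it considered.
from typing import Dict, List, Set

entries: Dict[str, Set[str]] = {
    "R1-OPEN": {"R1", "U1"},
    "R1-SHORT": {"R1", "U1", "C1"},
    "C1-SHORT": {"C1", "U1"},
    "R2-OPEN": {"R2", "U2"},
}


def entries_to_rework(changed: str, index: Dict[str, Set[str]]) -> List[str]:
    """Entries whose recorded reasoning involved the changed component."""
    return sorted(e for e, comps in index.items() if changed in comps)


print(entries_to_rework("C1", entries))  # ['C1-SHORT', 'R1-SHORT']
```

The selection is only as good as the recorded component sets: an interaction that was never considered would not be listed, so the approach depends on the traceable reasoning that traditional FMEA lacks.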
@@ -100,6 +90,66 @@ The failure mode behaviour of these repeated structures will be the same.
However, with the {\bc} {\fm} to system level failure mode mapping,
work is likely to be repeated.
\subsection{FMEA does not support modularity.}
It is common practice in the process control industry to buy in sub-systems,
typically sensors and actuators connected to an industrially hardened computer bus, e.g.\ CANbus~\cite{can,canspec} or Modbus~\cite{modbus}.
With traditional FMEA it is difficult to deal with
a `plug~and~play' paradigm. The design philosophy of FMEA is to trace {\bc} failures through to system failures.
This is incompatible with a modular approach, where the architecture of a
system may differ between implementation sites.
The modularity problem is exacerbated by FMEA's difficulty in modelling software/hardware hybrids, a problem
examined in section~\ref{sec:distributed}.
% Most sensor systems now are `smart'~\cite{smartinstruments}, that is to say, they contain programmatic elements
% even if their outputs are %they supply
% analogue signals. For instance a liquid level sensor that
% supplies a {\ft} output, would have been typically have been implemented
% in analogue electronics before the 1980s. After that time, it would be common to use a micro-processor
% based system to perform the functions of reading the sensor and converting it to a current (\ft) output.
% For the non-safety critical systems integrator this brings with it the advantages
% that come with using a digital system (increased accuracy, self checking and ease of
% calibration etc. ). For a safety critical systems integrator this can be very problematic when it
% comes to approvals. Even if the sensor manufacturer will let you see the internal workings and software
% we have a problem with tracing the FMEA reasoning through the sensor, through the sensors software
% and then though the system being integrated.
% This problem is compounded by the fact that traditional FMEA cannot integrate software into FMEA models~\cite{sfmea,safeware}.
\section{Reasoning Distance used to measure Comparison Complexity}
\label{sec:reasoningdistance}
Traditional FMEA cannot ensure that each failure mode of all its
components are checked against any other components in the system which
it may affect, due to state explosion.
%
FMEA is therefore performed using heuristics to decide
which components a given component failure mode should be checked against.
%We could term the number of checks made for each failure mode
%on aspects of the system to be the reasoning distance.
%
Typically, FMEA will be performed by following the signal path
of the component failure mode to its system level effect,
echoing fault finding reasoning.
%
This is less than ideal,
as it can easily miss interactions with adjacent components that could cause
other system level symptoms.
%
The theoretical maximum reasoning distance is the sum of all failure
modes in a system multiplied by the number of components in it; we can use this maximum
as a comparison complexity figure.
%
This figure would allow us to compare the maximum number of checks (i.e.\ exhaustive %rigorous
analysis) with the number actually performed.
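A minimal sketch of the comparison complexity figure as defined above (total failure modes multiplied by the number of components), set against an assumed count of checks actually performed. The component list and the performed count are illustrative.

```python
# Sketch: comparison complexity figure = total failure modes * number of
# components, compared with a number of checks actually performed.
# Component data and the performed count are illustrative.


def max_reasoning_distance(modes_per_component: dict) -> int:
    total_modes = sum(modes_per_component.values())
    return total_modes * len(modes_per_component)


system = {"R1": 2, "R2": 2, "C1": 2, "U1": 4, "D1": 2}
cc = max_reasoning_distance(system)  # 12 failure modes * 5 components
performed = 18  # e.g. checks made by following signal paths only
print(cc, f"{performed / cc:.0%} of the exhaustive maximum")
```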
\paragraph{The ideal of exhaustive FMEA (XFMEA).}
Obviously, exhaustively checking every component failure mode in a system
against all other components is the ideal for finding all possible system level failures.
While this is impossible for all but trivial systems, we note that it should be possible
for small groups of components that work together to provide a well defined function.
We could term such a group a `{\fg}'. This potentially gives us a way of decomposing
the problem and reducing the $O(N^2)$ state explosion effect
associated with XFMEA.
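The potential saving can be sketched under simplifying assumptions: components are split into groups of fixed size $g$, each failure mode is checked only against the other members of its own group, and the checks needed at higher levels (where each group is treated as a derived component) are ignored. The group size and component counts are illustrative.

```python
# Sketch under simplifying assumptions: compare exhaustive checks with
# checks confined to fixed-size groups; higher-level checks are ignored.


def xfmea_checks(n: int, f: int) -> int:
    return n * f * (n - 1)  # every mode against every other component


def grouped_checks(n: int, f: int, g: int) -> int:
    groups, remainder = divmod(n, g)
    checks = groups * xfmea_checks(g, f)
    return checks + xfmea_checks(remainder, f)  # a final, smaller group


for n in (100, 1000):
    print(n, xfmea_checks(n, 3), grouped_checks(n, 3, 5))
```

With $g$ held fixed, the grouped count grows linearly in $N$ (1,200 against 29,700 checks for 100 components), although a full comparison would have to add the higher-level checks omitted here.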
\section{Software and FMEA}
@@ -138,14 +188,16 @@ With the increasing use of micro-controllers in place of analogue electronics
for most new designs of electronic product, the poor integration capabilities of FMEA
are now being seen as deficiencies.
This is becoming apparent in a dilemma now faced
by organisations dealing with highly safety critical systems, which have to rely on `smart~instruments'
that they can no longer validate using FMEA.
%
Smart instruments are dealt with in the section below.
Distributed real time systems, which rely on micro-controllers connected in a network
using a communications protocol, are also impossible to analyse meaningfully with FMEA.
\subsection{The rise of the smart instrument}
\label{sec:smart}
%% AWE --- Atomic Weapons Establishment have this problem....
A smart instrument is defined as one that uses a micro-processor and software
in conjunction with its sensing electronics, rather than
@@ -186,25 +238,31 @@ systems. %by traditional FMEA.
Currently the only way that some smart~instruments have been permitted for
use in highly critical systems is to have them extensively
functionally tested~\cite{bishopsmartinstruments}.
\subsection{Distributed real time systems}
\label{sec:distributed}
Distributed real time systems are control systems where
smart sensors communicate over a communications bus to
a master controller.
%
Most modern cars follow this information technology pattern and use CANbus~\cite{canspec,can}.
%
For instance, in a modern car there will be no mechanical linkage from the pedal to the engine; instead the throttle pedal
will be linked to a sensor to determine how
far the pedal is pressed.
This sensor will be read by a micro-controller, and passed, via CANbus, to the Engine Control Unit (ECU)
which will use that information (along with information from other sensors) to adjust the power required from the engine.
%
This adjustment could be direct, or could be another CANbus message passed to a micro-controller regulating engine function.
%
In terms of FMEA, see figure~\ref{fig:distcon}, our reasoning path spans (at least) four interface layers of electronics to software.
%
Traditional FMEA does not cater for the software/hardware interface, and here we have the additional complications
%with the additional complications
of the communications protocol used to transmit data and the failure mode characteristics
of the communications physical layer.
%(figure~\ref{fig:distcon}
@@ -235,10 +293,11 @@ utterly anachronistic in the distributed real time system environment.
\begin{itemize}
\item FMEA type methodologies were designed for simple electro-mechanical systems of the 1940s to 1960s.
\item Reasoning Distance - component failure to system level symptom process is undefined with regard
to the components to check against each given component {\fm}.
\item State explosion - impossible to perform FMEA exhaustively %rigorously
\item Difficult to re-use previous analysis work
\item Very difficult to model simultaneous failures.
\item Software and hardware models are separate (if the software is modelled at all).
\item Distributed real time systems are very difficult to analyse with FMEA because they typically involve many hardware/software interfaces.
\end{itemize}
@@ -352,7 +411,8 @@ very good with single failure modes linked to top level events.
FMEA has become part of the safety critical and safety certification industries.
%
SFMEA is in its infancy, and there are corresponding gaps in
certification for software. EN61508~\cite{en61508}, a modern standard based
on a modern variant of FMEA, recommends hardware redundancy architectures in conjunction
with FMEDA for hardware: for software it recommends language constraints and quality procedures
but no inductive fault finding technique.
%
@@ -378,7 +438,7 @@ We now form a wish list, stating the features that we would want
in an improved FMEA methodology,
\begin{itemize}
\item Must be able to analyse hybrid software/hardware systems,
\item no state explosion (which has rendered exhaustive analysis impractical),
\item exhaustive checking at a modular level, %(total failure coverage within {\fgs} all interacting component and failure modes checked),
\item traceable reasoning system models,% to aid repeatability and checking,
\item re-usable i.e. it should be possible to re-use analysis,


@@ -348,7 +348,8 @@ we thus reveal design deficiencies.
In Safety Integrity Level (SIL)~\cite{en61508} terms, by identifying undetectable faults and fixing them, we raise
the safe failure fraction (SFF).
\section{Objective and Subjective Reasoning stages}
Opportunity for formal definitions and perhaps an interface or process for achieving it....
\section{Conclusion}