2 hours of Saturday morning...

Robin Clark 2013-08-10 11:30:11 +01:00
parent 1af8fc13c2
commit 2190240e52
4 changed files with 113 additions and 78 deletions

View File

@ -77,7 +77,7 @@ use.
It then reveals common flaws
which make them unsuitable for the higher safety requirements of the 21st century.
%
Problems with state explosion in failure mode reasoning and the current impossibility
Problems with state explosion in failure mode reasoning and the current difficulties %impossibility
of integrating software and hardware failure mode models~\cite{1372150} are the most obvious of these. %flaws.
%
The four current methodologies are described in chapter~\ref{sec:chap2} and %the advantages and drawbacks
@ -103,7 +103,8 @@ This editor allowed the user to draw Euler/Spider diagrams, and could then
represent these as abstract---i.e. mathematical---definitions.
The primary motive for writing the Spider diagram editor was to provide an alternative
to formal languages for software specification.
Because of my exposure to FMEA, I started thinking of ways to apply formal languages and spider diagrams to
%
Because of my daily work exposure to FMEA, I started thinking of ways to apply formal languages and spider diagrams to
failure mode analysis.
%
%
@ -176,11 +177,11 @@ failures would be analysed, but because failure modes are traceable from the bas
these relationships can be held in a traversable data structure.
%
If held in a traversable data structure we can apply automated methods to search for all the combinations of multiple failure modes
within the model that have been analysed. Because of this, it will not always %it may not
within the model that has been analysed. Because of this, it will not always %it may not
be necessary to apply double checking
at all higher levels in the analysis hierarchy, to achieve complete double failure coverage.
%
The point at which it is possible to relax double failure checking can be verified automatically by traversing the
The point at which it is possible to relax double failure checking can be verified automatically by traversing
the failure mode model.
%
\subsection{Initial direction: Application of Spider diagrams to FMEA.}
@ -211,7 +212,7 @@ initial ideas, but a more traditional `spreadsheet' format has been used
for the analysis stages of the new methodology.
%
Euler diagrams have been used later in the thesis to describe the containment relationships
of derived components building hierarchical analysis models with the modularised
of derived components when building hierarchical analysis models with the modularised
variant of FMEA that this thesis proposes and defends.
%

View File

@ -21,7 +21,8 @@ how we determine the failure modes associated with components.
Two common electrical components, the resistor and the operational amplifier
are examined in the context of two sources of information that define failure modes.
%
To introduce the concept of FMEA, a simple example is given, using a hypothetical {\ft} milli-amp reader.
To introduce the concept of FMEA, a simple example is given, using a hypothetical four to twenty milli-amp ({\ft}) %milli-amp
reader.
%
The four main current FMEA variants are described and we develop %conclude by describing concepts
the concepts
@ -30,8 +31,9 @@ that underlie the usage and philosophy of FMEA.
We return to the overall process of FMEA
and model it using UML.
%
By using UML we define relationships between the FMEA data objects
defined at the start of this chapter.
By using UML %we define
relationships between the FMEA data objects
are defined. % at the start of this chapter.
%
The act
of defining relationships between the data objects
@ -116,7 +118,7 @@ To perform this we need to know how a failure
mode, considering its effect on other components in the system
will translate to a system level symptom/failure.
%
The result of FMEA is to determine a system level failures,
The result of FMEA is to determine system level failures,
or symptoms for each given component failure mode.
%
In practice, each entry of an FMEA analysis of a {\bc} {\fm}
@ -356,9 +358,10 @@ For the purpose of example for EN298, we look at
a typical op-amp designed for instrumentation and measurement, the dual packaged version of the LM358~\cite{lm358}
(see figure~\ref{fig:lm258}).
%
With the results from both sources of {\fm} definition,
we compare the failure mode definitions for FMD-91 and EN298
relating to operational amplifiers.
With the results from both sources of {\fm} definition, %
%we compare
the failure mode definitions for FMD-91 and EN298
relating to operational amplifiers are compared.
\paragraph{ Failure Modes of an Op-Amp according to FMD-91 }
@ -729,7 +732,7 @@ The question is by how much.
%
Too much and the task becomes impossible due to time/labour constraints.
%
Too little and the analysis could become meaningless because it misses
Too little and the analysis could become meaningless, because it could miss
potential system failures.
%
For a more complete analysis we should perhaps examine each component {\fm} along the complete signal path,
@ -771,7 +774,7 @@ because we can usually find Mean Time to Failure (MTTF) statistics~\cite{fmd91,m
Also, used in the design phase of a project, FMEA is a useful tool
for discovering potential failure scenarios~\cite{1778436820050601}.
%
From a whole system perspective, we may find that {\bc} {\fms}
From a large system perspective, we may find that {\bc} {\fms}
may have more than one possible system event associated with them.
Often there will be a clear one to one mapping, but
probabilities to failure (as used in FMECA)
@ -856,23 +859,22 @@ could typically be one second.~\cite{en298}}.
Work has been performed using component failure statistics to
offer the more likely multiple failures~\cite{FMEAmultiple653556} for analysis.
%
We now compound the multiple symptoms from one {\bc} {\fm} possibility
with the merging of Markov chains.
%
So for multiple failures we have the objective criteria complicated, and the subjective
adds another layer of complication.
%
%
Traditional FMEA has the translation from an objective to subjective
failure modes as an intrinsic part of the process,
this is an additional complication.
%We now compound the multiple symptoms from one {\bc} {\fm} possibility
%with the merging of Markov chains.
%,this is an additional complication.
%, of having to change between these two modes of thinking, it becomes more difficult to
%get a balance between subjective and objective perspectives.
Another complication for multiple failure analysis is that failure modes may cause a change in circuit topology
A complication for multiple failure analysis is that failure modes may cause a change in circuit topology
meaning the additional failures might have to be analysed with respect to the changed topology.
%subjective/objective become more cluttered when there are multiple possibilities
%for the the results of an FMEA line of reasoning.
Because multiple failures mean dealing with changed topologies,
the objective criteria are additionally complicated, with the subjective
adding another layer of complication.
%
%
Traditional FMEA has the translation from an objective to subjective
failure modes as an intrinsic part of its process, which can be considered a weakness.
\paragraph{Failure modes and their observability criterion: detectable and undetectable.}
\label{sec:detectable}
@ -954,7 +956,7 @@ the sum of the number of failure modes, against all other components
in that system.
%
If the milli-volt reader had say 100 components, with three failure modes each, this
would give an exhaustive reasoning distance of 3 * 100 * 99.
would give an exhaustive reasoning distance---for single failure analysis---of 3 * 100 * 99.
%
The discussion on reasoning distance provides us with a metric to examine
the state explosion problems associated with forward search failure investigation
@ -987,20 +989,20 @@ of a failure mode with all other components in a system). Or in other words,
This is represented in the equation below, %~\ref{eqn:fmea_state_exp},
where $N$ is the total number of components in the system, and
$f$ is the number of failure modes per component.
%
\begin{equation}
\label{eqn:fmea_single}
N.(N-1).f % \\
%(N^2 - N).f
\end{equation}
\paragraph{Exhaustive FMEA and dual failures.}
%
This would mean an order of $O(N^2)$ number of checks to perform
to undertake an `exhaustive~FMEA'. Even small systems have typically
100 components, and they typically have 3 or more failure modes each.
$100*99*3=29,700$.
100 components, and they typically have 3 or more failure modes each, which would give
$100*99*3=29,700$ as a reasoning distance.
\paragraph{Exhaustive FMEA and double failure scenarios.}
%\paragraph{Exhaustive Double Failure FMEA}
For looking at potential double failure
@ -1017,7 +1019,7 @@ double failure scenarios (for burner lock-out scenarios).}
For our theoretical 100 components with 3 failure modes each example, this is a reasoning distance of
$100*99*98*3=2,910,600$. % failure mode scenarios.
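%
As an illustrative sketch, and assuming the double failure count simply extends
equation~\ref{eqn:fmea_single} by one further factor for the second failed component
(an assumption inferred from the figures quoted above), the arithmetic can be written out as
\[
N(N-1)(N-2)f = 100 \times 99 \times 98 \times 3 = 2{,}910{,}600 ,
\]
which is $N-2 = 98$ times the single failure figure of $29{,}700$.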
In practice there is an additional concern here, that of
In practice there is an additional complication here, that of
the circuit topology changes that {\fms} can cause.
\paragraph{Reliance on experts for meaningful FMEA Analysis.}
@ -1034,7 +1036,8 @@ sub-sets of the components in the system, to check against each {\fm}.
%
Also, %In practise
these experts have to select the areas they see as most critical for detailed FMEA analysis:
it is usually impossible to perform a detailed level of analysis on all component {\fms}
it is usually impossible, because of the time the work would take,
to perform a detailed level of analysis on all component {\fms}
on anything but a trivial system.
\subsection{Component Tolerance}
@ -1071,7 +1074,7 @@ cost, problems to be addressed in product production.
It generally focuses on known problems and using their
statistical frequency %they occur
and their cost to fix multiplied gives a Risk Priority Number (RPN)
number for the component {\fm}.
number for the germane component {\fm}.
%
Fixing problems with the highest RPN number
will return most cost benefit~\cite{bfmea}.
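%
As a purely illustrative calculation (the symbols $P$ and $C$ and the figures below are hypothetical,
introduced here only as an example), taking the RPN as the product of frequency and cost to fix,
\[
RPN = P \times C ,
\]
a {\fm} occurring in $2\%$ of units at a cost of $50$ to fix gives $RPN = 0.02 \times 50 = 1.0$,
while one occurring in $0.1\%$ of units at a cost of $400$ gives $RPN = 0.001 \times 400 = 0.4$,
so the first would be addressed first.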
@ -1190,20 +1193,51 @@ and require re-design of some systems.
\subsection{ FMEDA - Failure Modes Effects and Diagnostic Analysis}
%
\begin{table}[ht]
\centering
\caption{Table adapted from EN61508-1:2001 [7.6.2.9 p33], showing statistical tolerance of `dangerous~failures' to
comply with a given SIL level} % title of Table
%\centering % used for centering table
\begin{tabular}{|| l | c | c ||} \hline
\textbf{SIL} & \textbf{Low Demand} & \textbf{Continuous Demand} \\
& Prob of failing on demand & Prob of failure per hour \\ \hline \hline
4 & $ 10^{-5}$ to $< 10^{-4}$ & $ 10^{-9}$ to $< 10^{-8}$ \\ \hline
3 & $ 10^{-4}$ to $< 10^{-3}$ & $ 10^{-8}$ to $< 10^{-7}$ \\ \hline
2 & $ 10^{-3}$ to $< 10^{-2}$ & $ 10^{-7}$ to $< 10^{-6}$ \\ \hline
1 & $ 10^{-2}$ to $< 10^{-1}$ & $ 10^{-6}$ to $< 10^{-5}$ \\ \hline
\hline
\end{tabular}
\label{tbl:sil_levels}
\end{table}
%
% \begin{itemize}
% \item \textbf{Statistical Safety} Safety Integrity Level (SIL) standards (EN61508/IEC61508).
% \item \textbf{Diagnostics} Diagnostic or self checking elements modelled
% \item \textbf{Complete Failure Mode Coverage} All failure modes of all components must be in the model
% \item \textbf{Guidelines} To system architectures and development processes
% \end{itemize}
FMEDA is a modern extension of FMEA, in that it recognises the effect of
self checking features on safety, and provides detailed recommendations for computer/software architecture.
%
It has a simple final result, a Safety Integrity Level (SIL) from 1 to 4 (where 4 is safest).
%
These SIL levels are broadly linked to the concept of an
acceptance of probability of dangerous failures against time, as shown in table~\ref{tbl:sil_levels}.
%
FMEDA is the fundamental methodology of the statistical (safety integrity level)
type standards (EN61508/IEC61508).
The end result of an EN61508 analysis is an % provides a statistical
overall `level of safety' known as a Safety Integrity level (SIL), for a system.
There are currently four SIL `levels', one to four, with four being the highest level.
It allows diagnostic mitigation for self checking circuitry.
overall `level~of~safety' known as a Safety Integrity level (SIL), for a system.
%
%There are currently four SIL `levels', one to four, with four being the highest level.
%
It allows diagnostic mitigation for self checking circuitry.
%
SIL levels are intended to
classify the statistical safety of installed and commissioned plant:
salesmen's terms such as a `SIL~3~sensor', or any other `device' given a SIL level, are meaningless.
%
% for four levels of
%safety integrity, referred to as Safety Integrity Levels (SIL).
@ -1214,6 +1248,7 @@ the analyst to consider all hardware components in a system
by requiring that an MTTF value is assigned for each base component failure~mode;
the MTTF may be statistically mitigated (improved)
if it can be shown that self-checking will detect failure modes.
%
The MTTF value for each component {\fm} is denoted using the symbol `$\lambda$'.
%
EN61508 regulation in relation to software provides procedural quality guidelines and constraints (such as forbidding certain
@ -1225,11 +1260,11 @@ or across the software/hardware interface.
\label{sec:FMEDA}
\textbf{Failure Mode Classifications and metrics in FMEDA.}
\begin{itemize}
\item \textbf{Safe or Dangerous} Failure modes are classified SAFE or DANGEROUS
\item \textbf{Detectable failure modes} Failure modes are given the attribute DETECTABLE or UNDETECTABLE
\item \textbf{Four attributes to Failure Modes} All failure modes may thus be Safe Detected(SD), Safe Undetected(SU), Dangerous Detected(DD), Dangerous Undetected(DU)
\item \textbf{Four statistical properties of a system} We sum the statistics for the four classifications of system failures \\
$ \sum \lambda_{SD}$, $\sum \lambda_{SU}$, $\sum \lambda_{DD}$, $\sum \lambda_{DU}$ \\
\item \textbf{Safe or Dangerous.} Failure modes are classified SAFE or DANGEROUS.
\item \textbf{Detectable failure modes.} Failure modes are given the attribute DETECTABLE or UNDETECTABLE.
\item \textbf{Four attributes for FMEDA Failure Modes.} All failure modes may thus be Safe Detected(SD), Safe Undetected(SU), Dangerous Detected(DD), Dangerous Undetected(DU)
\item \textbf{Four statistical properties of a system.} We sum the statistics for the four classifications of system failures \\
$ \sum \lambda_{SD}$, $\sum \lambda_{SU}$, $\sum \lambda_{DD}$, $\sum \lambda_{DU}$. \\
\end{itemize}
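%
As a sketch of how these four sums are typically combined in EN61508-style analyses
(the abbreviations $DC$ and $SFF$ are not defined in the text above and are introduced here as assumptions),
the diagnostic coverage and the safe failure fraction can be written as
\[
DC = \frac{\sum \lambda_{DD}}{\sum \lambda_{DD} + \sum \lambda_{DU}} ,
\qquad
SFF = \frac{\sum \lambda_{SD} + \sum \lambda_{SU} + \sum \lambda_{DD}}
           {\sum \lambda_{SD} + \sum \lambda_{SU} + \sum \lambda_{DD} + \sum \lambda_{DU}} .
\]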
% Failure modes are classified as Safe or Dangerous according
@ -1296,30 +1331,12 @@ by statistically determining how frequently it can fail dangerously.
%\subsection{ FMEDA - Failure Modes Effects and Diagnostic Analysis}
\begin{table}[ht]
\centering
\caption{FMEA Calculations} % title of Table
%\centering % used for centering table
\begin{tabular}{|| l | l | c | c | l ||} \hline
\textbf{SIL} & \textbf{Low Demand} & \textbf{Continuous Demand} \\
& Prob of failing on demand & Prob of failure per hour \\ \hline \hline
4 & $ 10^{-5}$ to $< 10^{-4}$ & $ 10^{-9}$ to $< 10^{-8}$ \\ \hline
3 & $ 10^{-4}$ to $< 10^{-3}$ & $ 10^{-8}$ to $< 10^{-7}$ \\ \hline
2 & $ 10^{-3}$ to $< 10^{-2}$ & $ 10^{-7}$ to $< 10^{-6}$ \\ \hline
1 & $ 10^{-2}$ to $< 10^{-1}$ & $ 10^{-6}$ to $< 10^{-5}$ \\ \hline
\hline
\end{tabular}
\end{table}
Table adapted from EN61508-1:2001 [7.6.2.9 p33]
%\subsection{ FMEDA - Failure Modes Effects and Diagnostic Analysis}
FMEDA is a modern extension of FMEA, in that it will allow for
self checking features, and provides detailed recommendations for computer/software architecture.
It has a simple final result, a Safety Integrity Level (SIL) from 1 to 4 (where 4 is safest).
%FMEA can be used as a term simple to mean Failure Mode Effects Analysis, and is
%part of product approval for many regulated products in the EU and the USA...

View File

@ -16,7 +16,7 @@ at {\bc} {\fms}.
%
This undoubtedly reveals dangers inherent in designs and makes
our lives safer. This chapter aims to look for the deficiencies in current FMEA processes, to probe for weaknesses
and look for ways in which it could be done better and more efficiently.
and look for ways in which it could be performed better and more efficiently.
A major problem is with the scope of examination---or required reasoning distance---to apply
for FMEA analysis.
@ -75,9 +75,12 @@ This analysis philosophy has not changed since FMEA was first used.
\subsection{FMEA does not support Traceable Reasoning}
An FMEA report normally assigns one line of a spreadsheet to
each {\bc} {\fm}.
%
This means that the reasoning involved in determining the system level failure/symptom is described (if at all) very briefly.
%
Ideally supporting documentation would give the reasoning and calculations behind each analysis case,
but the structure of current FMEA reports does not encourage this.
%
\paragraph{Re-use of FMEA analysis}
%
Given the {\bc} {\fm} to system level failure mode paradigm it is
@ -99,11 +102,15 @@ work is likely to be repeated.
\subsection{FMEA does not support modularity.}
It is common practice in the process control industry to buy in sub-systems,
typically sensors and actuators connected to an industrially hardened computer bus, e.g. CANbus~\cite{can,canspec} or modbus~\cite{modbus}.
%
With traditional FMEA it is difficult to deal with
a `plug~and~play' paradigm. The design philosophy of FMEA is to trace {\bc} failure through to system failures.
a `plug~and~play' paradigm.
%
The design philosophy of FMEA is to trace {\bc} failure through to system failures.
This is incompatible with a modular approach where the architecture of a
system may be different for implementation sites.
The modularity problem is exacerbated by FMEAS problems modelling software/hardware hybrids, a problem
%
The modularity problem is exacerbated by FMEA's problems modelling software/hardware hybrids, a problem
examined in section~\ref{sec:distributed}.
% Most sensor systems now are `smart'~\cite{smartinstruments}, that is to say, they contain programmatic elements
% even if their outputs are %they supply
@ -189,7 +196,7 @@ to be performed on the same system to provide insight into the
software/hardware interface~\cite{embedsfmea}.
%
Although this
would give a better picture of the failure mode behaviour, it
should give a better picture of the failure mode behaviour, it
is by no means a rigorous approach to tracing errors that may occur in hardware
through to the top (and therefore ultimately controlling) layer of software.
%
@ -263,6 +270,7 @@ Most modern cars follow this information technology pattern and use CANbus~\cite
For instance, in a modern car there will be no mechanical linkage from the pedal to the engine, instead the throttle pedal
will be linked to a sensor to determine how
far the pedal is pressed.
%
This sensor will be read by a micro-controller, and passed, via CANbus, to the Engine Control Unit (ECU)
which will use that information (along with information from other sensors) to adjust the power required from the engine.
%
@ -305,8 +313,8 @@ utterly anachronistic in the distributed real time system environment.
\item FMEA type methodologies were designed for simple electro-mechanical systems of the 1940's to 1960's.
\item Reasoning Distance - component failure to system level symptom process is undefined in regard
to the components to check against each given component {\fm}.
\item State explosion - impossible to perform FMEA exhaustively %rigorously
\item Difficult to re-use previous analysis work
\item State explosion - impossible to perform FMEA exhaustively. %rigorously
\item Difficult to re-use previous analysis work.
\item Very difficult to model simultaneous failures.
\item Software and hardware models are separate (if the software is modelled at all).
\item Distributed real time systems are very difficult to analyse with FMEA because they typically involve many hardware/software interfaces.

View File

@ -1241,7 +1241,7 @@ An outline of the FMMD process is itemised below:
\item Assign the common failure modes from the {\fg} as the failure modes of the {\dc}.
\end{itemize}
%
The FMMD process is described in more detail in section~\ref{sec:symptomabs}.
The FMMD process is described using formal definitions and algorithms in section~\ref{sec:symptomabs}.
%We can now call our functional~group a sub-system or a derived~component.
%The goal here is to know how it will behave under fault conditions !
@ -1479,7 +1479,7 @@ and creates a new {\dc} from it.
%group.
The newly created {\dc} requires a set of failure modes of its own.
As a derived component inherits from component, the UML model shows
that it inherits a set of failure modes.
that it inherits the property of a set of failure modes.
%
%These failure modes are the failure mode behaviour---or symptoms---of the {\fg} from which it was derived.
%
@ -1491,9 +1491,12 @@ that it inherits a set of failure modes.
%fault behaviour.
A {\fg} must comprise at least one component, and the UML diagram shows this
with the one to many relationship.
%
Under exceptional circumstances a component may need to be a member of more than
one {\fg} (this is looked at in section~\ref{sec:sideeffects}). The relationship between
the {\fg} and component is therefore $ \star \leftrightarrow 1..\star$.
one {\fg} (this is looked at in section~\ref{sec:sideeffects}).
%
The relationship between
the {\fg} and component is therefore---using UML notation---`$ \star \leftrightarrow 1..\star$'.
%
A {\fg} will only be associated with one {\dc} and is given a one to one relationship in the UML diagram.
%
@ -1534,7 +1537,7 @@ The lowest level in this hierarchy are the {\bcs}, the resistors and the op-amp.
The resistors are collected into a {\fg}, and the ${PD}$ derived component created from its analysis, is shown enclosing R1 and R2. % above the {\fg}.
%
As this derived component inherits the properties of a component, we may use
it in {\fg} higher in the hierarchy.
it in a {\fg} higher in the hierarchy.
%
The {\em PD} derived component is now placed into a {\fg}
with the op-amp.
@ -2214,12 +2217,17 @@ With the above condition true, we term this a `complete' FMMD failure model.
Ensuring this condition is described in section~\ref{sec:completetest}.
\paragraph{Mutual exclusivity of {\dc} failure modes.}
%
It is a desirable feature of a component that its failure modes
are mutually exclusive.
%
This also applies to {\dcs} produced in the FMMD process.
%
In the FMMD process symptoms are collected, i.e. no component failure modes may be shared
by a symptom within a {\fg}, and therefore the failure modes of a {\dc} are mutually exclusive.
Thus FMMD naturally produces {\dc} failure modes that are mutually exclusive.
%
Thus FMMD naturally produces {\dcs} with failure modes that are mutually exclusive.
%
This property is examined in more detail in section~\ref{ch7:mutex}.
\paragraph{Objective and contextual/subjective failure symptoms.}
@ -2234,7 +2242,8 @@ have to make this decision (the contextual effects) for each component {\fm} in
Because FMMD considers failure modes within functional groups,
the traditional state explosion problem in FMEA, which arises from the ideal of exhaustive FMEA (XFMEA) where each failure
mode could be considered in the context of all other components in the system, disappears.
FMMD applies XFMEA within {\fgs}.
%
With FMMD, because the {\fgs} have small numbers of components in them, we can easily apply XFMEA within the {\fgs}.
%
This issue is addressed formally in section~\ref{sec:cc}.