\ifthenelse {\boolean{paper}}
|
|
{
|
|
\abstract{
|
|
This paper proposes a methodology for
|
|
creating failure mode models of safety critical systems, which
|
|
has a common notation
|
|
for mechanical, electronic and software domains and applies an
|
|
incremental and rigorous approach.
|
|
%This paper describes how the proposed methodology
|
|
%functions, given requirements and constraints (such as number of combinations
|
|
%of failure causes for flat ).
|
|
%It describes the need for the new methodology to be bottom-up, and
|
|
%then the need for incremental modularisation
|
|
%to build a fault mode hierarchy, which leads to the concept of functional grouping,
|
|
%analysis of those groupings, and from that
|
|
%the creation of derived components.
|
|
%%
|
|
%% What I have done
|
|
%%
|
|
The four main static failure mode analysis methodologies were examined and
assessed in the context of newer European safety standards.
|
|
Some of the deficiencies identified in these methodologies led to
|
|
a wish list for a more rigorous methodology.
|
|
%%
|
|
%% What I have found
|
|
%%
|
|
From the wish list
|
|
%and considering some constraints determined from
|
|
%the evaluation of the four established methodologies,
|
|
a new
|
|
methodology is developed and proposed.
|
|
This has been named Failure Mode Modular De-Composition (FMMD).
|
|
|
|
%% Sell it
|
|
%%
|
|
In addition to addressing the traditional weaknesses of
Fault Tree Analysis (FTA), Failure Mode Effects Analysis (FMEA), Failure Mode Effects Criticality Analysis (FMECA)
and Failure Mode Effects and Diagnostic Analysis (FMEDA), FMMD provides the means to model multiple failure mode scenarios
as specified in newer European Safety Standards \cite{en298}.
The proposed methodology is bottom-up and can guarantee that no component failure mode is left unhandled.
It is also modular, meaning that the results of analysed components may be re-used in other projects.
|
|
}
|
|
}
|
|
{
|
|
%%% CHAPTER INTO NEARLT THE SAME AS ABSTRACT
|
|
|
|
This chapter proposes a methodology for
|
|
creating failure mode models of safety critical systems, which
|
|
has a common notation
|
|
for mechanical, electronic and software domains and applies an
|
|
incremental and rigorous approach.
|
|
%%
|
|
%This chapter describes how the proposed methodology functions
|
|
%given requirements and constraints such as the number of combinations
|
|
%of failure causes.
|
|
%It describes the need for the new methodology to be bottom-up, and
|
|
%then the need for incremental modularisation
|
|
%to build a fault mode hierarchy, which leads to the conceopt of functional grouping,
|
|
%analysis of those groupings, and from that
|
|
%the creation of derived components.
|
|
|
|
%% What I have done
|
|
%%
|
|
The four main static failure mode analysis methodologies were examined and
assessed in the context of newer European safety standards.
Some of the deficiencies identified in these methodologies led to
a wish list for a more rigorous methodology.
|
|
%%
|
|
%% What I have found
|
|
%%
|
|
From the wish list %
|
|
%and considering some constraints determined from
|
|
%the evaluation of the four established methodologies,
|
|
a new
|
|
methodology is developed and proposed.
|
|
This has been named Failure Mode Modular De-Composition (FMMD).
|
|
|
|
%% Sell it
|
|
%%
|
|
In addition to addressing the traditional weaknesses of
Fault Tree Analysis (FTA), Failure Mode Effects Analysis (FMEA), Failure Mode Effects Criticality Analysis (FMECA)
and Failure Mode Effects and Diagnostic Analysis (FMEDA), FMMD provides the means to model multiple failure mode scenarios
as specified in newer European Safety Standards \cite{en298}.
The proposed methodology is bottom-up and can guarantee that no component failure mode is left unhandled.
It is also modular, meaning that the results of analysed components may be re-used in other projects.
|
|
|
|
}
|
|
|
|
|
|
|
|
\section{Current Static Failure Mode Methodologies}
|
|
|
|
There are four methodologies in common use for failure mode modelling.
|
|
These are FTA, FMEA, FMECA
|
|
and FMEDA (a form of statistical assessment).
|
|
%
|
|
These methodologies date from the 1940s onwards, and were designed for
different application areas and purposes; all have drawbacks and
advantages, which are discussed in the next section.
|
|
%In short
|
|
%FTA, due to its top down nature, can overlook error conditions. FMEA and the Statistical Methods
|
|
%lack precision in predicting failure modes at the SYSTEM level.
|
|
|
|
\paragraph{FMMD in context.}
|
|
Failure Mode Modular De-composition
|
|
(FMMD) aims to address the
|
|
weaknesses in the four established methodologies, and to add
|
|
features such as the ability to analyse multiple
|
|
failure mode scenarios, and to allow modular re-use
|
|
of analysis.
|
|
|
|
%FMMD is an incremental bottom up FMEA process.
|
|
%% TERRIBLE PARAGRAPH
|
|
The FMMD
methodology is a detailed, hierarchical and incremental analytical
modelling process. It creates a failure mode model from which
the data models for FTA, FMEA, FMECA and FMEDA % (the statistical approach)
can be
derived. % if required.
An FMMD model is effectively a superset of the four traditional models.
|
|
It also focuses on component interaction within the model,
|
|
something not formally considered in the four established methodologies.
|
|
%
|
|
In addition it applies rigorous checking in all the analysis stages,
ensuring that \textbf{all} component failure modes are considered in the model.
|
|
|
|
%
|
|
\paragraph{FMMD process outline.}
|
|
This methodology has been named Failure Mode Modular De-composition (FMMD)
|
|
because it decomposes a SYSTEM into a hierarchy of modules or {\dc}s.
|
|
This
|
|
\ifthenelse {\boolean{paper}}
|
|
{
|
|
paper
|
|
}
|
|
{
|
|
chapter
|
|
}
|
|
presents the design considerations that motivated and provided the specification for
|
|
the FMMD methodology.
|
|
%
|
|
Firstly it briefly reviews the four traditional
|
|
static failure mode analysis methodologies and
|
|
lists their known weaknesses. A wish list is then drawn up
|
|
addressing these weaknesses and adding some extra requirements.
|
|
Using this wish list the philosophy for the new methodology
|
|
is determined.
|
|
%
|
|
FMMD works from the bottom up, taking small groups
|
|
of components, {\fgs}, and then analysing how they can fail.
|
|
\input{./shortfg}
|
|
|
|
\paragraph{Micro vs.\ macro failure mode analysis.}
|
|
The FMMD analysis is performed using failure mode effects analysis
|
|
from a micro rather than a macro perspective.
|
|
Thus instead of looking at component failure modes and determining how
|
|
they {\em may} cause a failure at SYSTEM level, we are looking at how
|
|
they {\em will} affect the component's local {\fg}.
|
|
When we know the failure modes of a {\fg} we can treat it as a `black box'
|
|
or {\dc}. With {\dc}s we can build {\fgs}
|
|
at higher levels of analysis, until we have a complete
|
|
hierarchy representing the failure behaviour of the SYSTEM.
|
|
%
|
|
Because all the failure modes of all the components
|
|
are held in a computer program, we can determine if the model has complete coverage
|
|
for component failure modes
|
|
(i.e. all component failure modes have been included in the model).
|
|
|
|
|
|
%OK need to describe the need for it
|
|
\section{The need for a new failure mode modelling methodology}
|
|
|
|
%%- There are dificulties with bot up methodologies,
|
|
%%- and this is in part due to the fact that accidents
|
|
%%- are always unforseen and unexpected.
|
|
|
|
%%- what do we have ENV factors, component failure modes.
|
|
|
|
%%- how difficult is it to take a single component failure mode and
|
|
%%- then from that determine how it will react with other components
|
|
%%- and how it will be affected
|
|
|
|
\subsection{General comments on bottom-up and top down approaches}
|
|
|
|
\paragraph{A general deficiency in top-down systems analysis.}
|
|
With a top down approach the investigator has to determine
|
|
a set of undesirable outcomes or `accidents'.
|
|
As most accidents are unexpected and their causes unforeseen~\cite{safeware},
it is fair to say that a top down approach is not guaranteed to
predict all possible undesirable outcomes.
Top-down methodologies can also miss known component failure modes,
simply by not decomposing the SYSTEM down to the level of base component failure modes.
|
|
|
|
\paragraph{A general problem with bottom-up static failure analysis.}
|
|
With the bottom up techniques we have all the known component failure modes
|
|
and the relative freedom to determine how each of these may affect the SYSTEM.
|
|
%
|
|
A problem with this is that a component typically
|
|
interacts in a complex way with several other functionally
|
|
adjacent components.
|
|
%
|
|
To take a component failure mode and then attempt to tie that
|
|
to a SYSTEM level outcome is very difficult.
|
|
%
|
|
%
|
|
The number of components
|
|
a failure mode under investigation might interact with is typically very large.
|
|
This makes it very difficult to predict the effects of a component
|
|
failure mode, because we have to decide which components it could affect,
|
|
or
|
|
in other words, which components are functionally adjacent to it.
|
|
%
|
|
We cannot consider all the components in the SYSTEM
|
|
when looking at a single failure mode,
|
|
and therefore human judgement must be used to
|
|
decide which interactions could be important.
|
|
|
|
Let $N$ be the number of components in our system, and $K$ be the average number of component failure modes
(ways in which a base~component can fail). The total number of base component failure modes
is $N \times K$. Examining the effect that each of these failure modes has on all
the other components\footnote{A base component failure will typically affect the sub-system
it is part of, and create a failure effect at the SYSTEM level.}
requires $(N-1) \times N \times K$ checks, in effect a very large set cross product.
|
|
|
|
|
|
If we complicate this further with applied states or environmental conditions,
another order of cross product complexity is added.
For instance, we may have a piece of self checking circuitry that
has two states, normal and testing mode, commanded by a logic line.
Or we may have a mechanical device that has different
failure mode behaviour for, say, different ambient pressures or temperatures.
|
|
|
|
If $E$ is the number of environmental conditions to consider
in a system, and $A$ the number of applied states,
the bottom-up analyst is presented with two
additional %cross product
factors, giving
$(N-1) \times N \times K \times E \times A$ checks.
If we put some typical very small embedded system numbers\footnote{These figures would
be typical of a very simple temperature controller, with a micro-controller, sensor
and heater circuit.} into this, say $N=100$, $K=2.5$, $A=2$, and $E=10$,
we have $99 \times 100 \times 2.5 \times 10 \times 2 = 495000$.
To look in detail at half a million test cases is obviously impractical.
|
|
|
|
If we were to consider multiple simultaneous failure modes,
we have yet another cross product of checks to perform.
%
For instance, looking at double simultaneous failure modes, where $\#C$
is the number of checks to perform,
the equation reads $\#C = (N-2) \times (N-1) \times N \times K \times E$.
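As a rough illustration only, substituting the same example figures used for the single failure case ($N=100$, $K=2.5$, $E=10$) into this expression as it is written gives
\[
\#C = 98 \times 99 \times 100 \times 2.5 \times 10 = 24255000 ,
\]
which is 49 times the 495000 checks estimated for single failure modes.
These figures are the illustrative ones assumed earlier and not measurements from a real system.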
|
|
|
|
The bottom-up methodologies FMEA, FMECA and FMEDA take single failure modes\footnote{Often component failures, rather than individual component
failure modes, are used, making the analysis process less precise.} and link them
to SYSTEM level failure modes. Because of the astronomical number of possible interactions,
some valid ones are in danger of being missed. We can term this analysis a `leap~of~faith'
(i.e. leaping from the
component failure mode to the SYSTEM level).
|
|
|
|
|
|
|
|
\paragraph{Ideal static failure mode methodology.}
|
|
An ideal static failure mode methodology would build a failure mode model
|
|
from which the traditional four models could be derived.
|
|
It would address the shortcomings of the other methodologies, and
|
|
would have a user friendly interface, with a visual (rather than symbolic) syntax with icons
|
|
to represent the results of analysis phases.
|
|
%
|
|
%There are four static analysis failure mode methodologies in common use.
|
|
%Each has its advantages and drawbacks, and each is suited for
|
|
%a different phase in the product life cycle.
|
|
The four methodologies in current use are discussed briefly below.
|
|
|
|
\subsection { FTA }
|
|
\glossary{name={FTA},description={Fault Tree Analysis}}
|
|
This, like all top~down methodologies, introduces the very serious problem
of missing component failure modes~\cite[Ch.~9]{faa}.
|
|
\fmodegloss
|
|
%, or modelling at
|
|
%a too high level of failure mode abstraction.
|
|
FTA was invented for use on the Minuteman nuclear defence missile
systems in the early 1960s and was not designed as a rigorous
fault/failure mode methodology.
It was designed to look for disastrous top level hazards and
determine how they could be caused.
It is more like a procedure to
be applied when discussing the safety of a system, with a top down hierarchical
notation using logic symbols that guides the analysis.
This methodology was designed for
experienced engineers sitting around a large diagram and discussing the safety aspects.
Also, the nature of a large rocket with red wire and remote detonation
fail-safes meant that the objective was to iron out common failures,
not to rigorously detect all possible failures.
Consequently it was not designed to guarantee coverage of all component failure modes,
and has no rigorous in-built safeguards to ensure coverage of all possible
system level outcomes.
Also, each system level error (or undesirable event) requires its own FTA tree.
This increases the amount of work to do and, in the case of updates to
particular sub-systems, introduces the requirement to update every FTA
tree that models the affected sub-system.
|
|
|
|
\subsubsection{ FTA weaknesses }
|
|
\begin{itemize}
|
|
\item Complex component interaction effects are by definition modelled by FTA, but because of the top down approach, not all
|
|
base component failure modes are guaranteed to be included in the model.
|
|
\item Possibility to miss environmental effects.
|
|
\item One FTA tree, per system failure mode. Thus there is not one model from which several FTA
|
|
trees can be derived. Maintainability and consistency cannot therefore be automatically checked.
|
|
\item No possibility to model base component level double failure modes.
|
|
\end{itemize}
|
|
|
|
\subsection { Current Bottom up methodologies - FMEA, FMECA, FMEDA }
|
|
|
|
The state explosion problem for bottom-up analysis
can theoretically be reduced by taking the path from
the base component failure mode, through the signal path~\cite{garrett},
and considering all failure modes for each component on the path.
This thinking is discussed in the concept of a `Reasoning~distance' in section \ref{sec:rd}.
%
In practice, this tracing through all components on the signal path is generally not done.
Typically simpler scenarios are used, where the effect of
the component failure is considered in isolation and then applied to the signal path.
|
|
|
|
\paragraph{Reasoning distance - complexity and reach-ability.}
|
|
Tracing a component level failure up to a top level event, without the exhaustive rigour that leads to state explosion, involves
working heuristically. A base component failure will typically
be conceptually removed by several stages from a top level event.
The `reasoning~distance' $R_D$ can be calculated by summing the number of failure modes in each component, for all components
that must interact to reach the top level event.
Where $C$ represents the set of components in a failure mode causation chain,
$c_i$ represents a component in $C$ and
the function $fm$ returns the set of failure modes for a given component,
equation~\ref{eqn:complexity} returns the `reasoning~distance'.
|
|
\begin{equation}
|
|
R_D = \sum_{i=1}^{|C|} |{fm(c_i)}| %\; where \; c \in C
|
|
\label{eqn:complexity}
|
|
\end{equation}
|
|
|
|
The reasoning distance is a value representing the number of failure modes
|
|
to consider to rigorously determine the causation chain
|
|
from the base component failure to the SYSTEM level event.
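As a purely illustrative sketch (the three components and their failure mode counts are hypothetical, not taken from any analysed circuit), consider a causation chain $C = \{c_1, c_2, c_3\}$ in which $c_1$ has two failure modes, $c_2$ has three and $c_3$ has four.
Equation~\ref{eqn:complexity} then gives
\[
R_D = |fm(c_1)| + |fm(c_2)| + |fm(c_3)| = 2 + 3 + 4 = 9 ,
\]
so nine failure modes would have to be examined to determine this single causation chain rigorously.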
|
|
|
|
The reasoning distance serves to show that completely determining the causes of a top level
event requires a large amount of work, far more than is
typical of heuristic or intuitive interpretation.
|
|
% could have a chapter on this.
|
|
% take a circuit or system and follow all the interactions
|
|
% to the components that cause the system level event.
|
|
|
|
\subsubsection{FMEA}
|
|
\label{pfmea}
|
|
This is an early static analysis methodology, and concentrates
on SYSTEM level errors which have been investigated.
The investigation will typically point to a particular failure
of a component.
The methodology is then applied to find the significance of the failure.
It is based on a simple equation where $S$ ranks the severity (or cost~\cite{bfmea}) of the identified SYSTEM failure,
$O$ its occurrence\footnote{The occurrence $O$ is the
probability of the failure happening.},
and $D$ the failure's detectability\footnote{Detectability: often failures
may occur but not be noticed or cause an effect.
Consider an unused feature failing.}. Multiplying these
together
gives a risk priority number (RPN), given by $RPN = S \times O \times D$.
This gives in effect
a prioritised `to~do~list', with higher $RPN$ values being the most urgent.
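As an illustrative sketch, assume that each factor is ranked on a scale of 1 to 10 (a common convention, but an assumption here, as are the figures themselves).
A first failure ranked $S=7$, $O=3$, $D=4$ and a second ranked $S=9$, $O=2$, $D=2$ would give
\[
RPN_1 = 7 \times 3 \times 4 = 84 , \qquad RPN_2 = 9 \times 2 \times 2 = 36 ,
\]
so the first failure would be placed higher on the list even though the second is the more severe.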
|
|
|
|
|
|
\paragraph{ FMEA weaknesses }
|
|
\begin{itemize}
|
|
\item Possibility to miss the effects of base component failure modes at SYSTEM level
(because it is each individual component, rather than each of its failure modes, that is considered for analysis).
|
|
\item Possibility to miss environmental effects.
|
|
\item Complex component interaction effects can be missed.
|
|
\item No possibility to model base component level double failure modes.
|
|
\end{itemize}
|
|
\fmodegloss
|
|
\paragraph{Note.} FMEA is sometimes used in its literal sense, that is to say
Failure Mode Effects Analysis, simply looking at a system's internal failure
modes and determining what may happen as a result.
The FMEA described in this section (\ref{pfmea}) is sometimes called `production FMEA'.
|
|
|
|
\subsection{FMECA}
|
|
|
|
Failure Mode, Effects, and Criticality Analysis (FMECA) extends FMEA by adding a criticality factor.
This is a bottom up methodology, which takes component failure modes
and traces them to SYSTEM level failures.
|
|
%
|
|
Reliability data for components is used to predict
failure statistics at the design stage.
An openly published source for the reliability of generic
electronic components was issued by the DOD
in 1991 (MIL~HDBK~\cite{mil1991}) and is a typical
source for MTTF data.
|
|
%
|
|
FMECA has a probability factor for a component error occurring % causing
|
|
termed the $\alpha$ factor.
|
|
%\footnote{for a given component failure mode there will be a $\beta$ value, the
|
|
%probability that the component failure mode will cause a given SYSTEM failure}.
|
|
%
|
|
A second probability factor, the likelihood that the component failure
mode will cause a given SYSTEM failure, is termed the $\beta$ factor.
Often the component failure mode cannot be proven to cause a SYSTEM level failure, and is instead
assigned a heuristic probability, the $\beta$ factor, by the design engineer.
The use of a $\beta$ factor
is often justified using Bayes' theorem~\cite{probstat}.
|
|
%Also, it can miss combinations of failure modes that will cause SYSTEM level errors.
|
|
%
|
|
The results of FMECA are similar to those of FMEA, in that component errors are
listed according to importance, based on
probability of occurrence and criticality.
|
|
% to prevent the SYSTEM fault of given criticallity.
|
|
Again this essentially produces a prioritised `to~do' list.
|
|
|
|
|
|
|
|
\paragraph{ FMECA weaknesses }
|
|
\begin{itemize}
|
|
\item Possibility to miss the effects of failure modes at SYSTEM level.
|
|
\item Possibility to miss environmental effects.
\item The $\beta$ factor is based on heuristics and typically does not reflect any rigorous calculations.
\item The $\alpha$ factor is generally taken from literature for generic components, such as MIL~HDBK~\cite{mil1992},
and may not take into account operational states and environmental factors.
|
|
\item Complex component interaction effects can be missed.
|
|
\item No possibility to model base component level double failure modes.
|
|
\end{itemize}
|
|
|
|
|
|
\subsection { FMEDA }
|
|
|
|
Failure Modes, Effects, and Diagnostic Analysis (FMEDA)
% This
is a process in which the investigating engineer takes all the components in a system
and, using the failure modes of those components,
ties them to possible SYSTEM level events/failure modes.
|
|
\fmodegloss
|
|
%
|
|
This technique
|
|
evaluates a product's statistical level of safety
|
|
taking into account its self-diagnostic ability.
|
|
The calculations and procedures for FMEDA are
|
|
described in EN61508~\cite[Part 2 App C]{en61508}.
|
|
The following gives an outline of the procedure.
|
|
|
|
|
|
\paragraph{Two statistical perspectives}
|
|
\ifthenelse {\boolean{paper}}
|
|
{
|
|
FMEDA is a statistical analysis methodology and is used from one of two perspectives:
Probability of Failure on Demand (PFD), and Probability of Failure
in continuous Operation, or Failure in Time (FIT).
|
|
|
|
\paragraph{Failure in Time (FIT).} Continuous operation is measured in failures per billion ($10^9$) hours of operation.
|
|
For a continuously running nuclear power-station, industrial burner or aircraft engine
|
|
we would be interested in its operational FIT values.
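As a hedged illustration (the figure of 100 FIT is hypothetical, and a constant failure rate is assumed), a FIT value converts to a failure rate $\lambda$ and a mean time to failure (MTTF) as follows:
\[
\lambda = 100 \times 10^{-9}\;\mathrm{h}^{-1} = 10^{-7}\;\mathrm{h}^{-1} ,
\qquad
MTTF = \frac{1}{\lambda} = 10^{7}\;\mathrm{h} .
\]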
|
|
|
|
\paragraph{Probability of Failure on Demand (PFD).} For a system such as anti-lock
automobile braking, or another fail safe measure applied in an emergency, we would be interested in PFD,
that is to say the proportion of demands on which it fails
to operate correctly.
|
|
}
|
|
{
|
|
FMEDA is a statistical analysis methodology and is used from one of two perspectives:
Probability of Failure on Demand (PFD) (see \ref{survey:pfd}),
and Probability of Failure
in continuous Operation, or Failure in Time (FIT) (see \ref{survey:fit}).
|
|
}
|
|
|
|
\subsubsection{The FMEDA Analysis Process}
|
|
|
|
\paragraph{Determine SYSTEM level failures from base components}
|
|
The first stage is to apply FMEA to the SYSTEM.
|
|
%
|
|
Each component is analysed in terms of how its failure
|
|
would affect the system.
|
|
Failure rates of individual components in the SYSTEM
|
|
are calculated based on component type and
|
|
environmental conditions.
|
|
|
|
%The SYSTEM errors are categorised as `safe' or `dangerous'.
|
|
|
|
|
|
%
|
|
%Statistical data exists for most component types \cite{mil1992}.
|
|
%
|
|
%This phase is typically implemented on a spreadsheet
|
|
%with rows representing each component. A typical component spreadsheet row would
|
|
%comprise of
|
|
%component type, placement,
|
|
%part number, environmental stress factors, MTTF, safe/dangerous etc.
|
|
%%will be a determination of whether the component failing will lead to a `safe'
|
|
%or `unsafe' condition.
|
|
|
|
%\paragraph{Overall SYSTEM failure rate.}
|
|
%The product failure rate is the sum of all component
|
|
%failure rates. Typically the sum of all MTTF rates for all
|
|
%components in an FMEDA spreadsheet.
|
|
%This is the sum of safe and unsafe
|
|
%failures.
|
|
|
|
\paragraph{Self Diagnostics.}
|
|
We next evaluate the SYSTEM's self-diagnostic ability.
|
|
|
|
%Each component's failure modes and failure rate are now available.
|
|
Failure modes are now classified as safe or dangerous.
|
|
This is done by taking a component failure mode and determining
|
|
if the SYSTEM error it is tied to is dangerous or safe.
|
|
The decision for this may be
|
|
based on heuristics or field data.
|
|
%EN61508 uses the $\lambda$ symbol to represent probabilities.
|
|
%Because we have statistics for each component failure mode,
|
|
%we can now now classify these in terms of safe and dangerous lambda values.
|
|
%Detectable failure probabilities are labelled `$\lambda_D$' (for
|
|
%dangerous) and `$\lambda_S$' (for safe) \cite{en61508}.
|
|
|
|
\paragraph{Determine Detectable and Undetectable Failures.}
|
|
Each safe and dangerous failure mode is now
classified as detectable or undetectable.
For the higher integrity levels, EN61508 requires that products have a high proportion of
self checking features.
%
This gives us four failure mode classifications:
Safe-Detected (SD), Safe-Undetected (SU), Dangerous-Detected (DD) and Dangerous-Undetected (DU).
The probabilistic failure rate of each classification
is represented by a lambda variable
(i.e. $\lambda_{SD}$, $\lambda_{SU}$, $\lambda_{DD}$, $\lambda_{DU}$).
|
|
|
|
\glossary{name={SD},description={Safe Detected; a SYSTEM level failure mode that is considered safe, and is detected by self checking mechanisms}}
|
|
\glossary{name={SU},description={Safe Undetected; a SYSTEM level failure mode that is considered safe, and is not detected by self checking mechanisms}}
|
|
\glossary{name={DD},description={Dangerous Detected; a SYSTEM level failure mode that is considered dangerous, and is detected by self checking mechanisms}}
|
|
\glossary{name={DU},description={Dangerous Undetected; a SYSTEM level failure mode that is considered dangerous, and is not detected by self checking mechanisms}}
|
|
|
|
Because it is recognised that some failure modes may not be discovered theoretically during the static
|
|
analysis, the
|
|
% admission of how daft it is to take a component failure mode on its own
|
|
% and guess how it will affect an ENTIRE complex SYSTEM
|
|
% Admission of failure of the process really !!!!
|
|
next step is to investigate using an actual working SYSTEM.
|
|
%
|
|
Failures are deliberately caused (by physical intervention), and any new SYSTEM level
failures observed are added to the model.
Heuristics and MTTF failure rates for the components
are used to calculate probabilities for these new failure modes,
along with their safety and detectability classifications (i.e.
$\lambda_{SD}$, $\lambda_{SU}$, $\lambda_{DD}$, $\lambda_{DU}$).
|
|
%SD, SU, DD, DU.
|
|
|
|
%With these classifications, and statistics for each component
|
|
%we can now calculate statistics for the diagnostic coverage (how good at `self checking' the system is)
|
|
%and its safe failure fraction (how many of its failures are self detected or safe compared to
|
|
%all failures possible).
|
|
%
|
|
%The calculations for these are described below.
|
|
|
|
%\paragraph{Diagnostic Coverage.}
|
|
%The diagnostic coverage is simply the ratio
|
|
%of the dangerous detected probabilities
|
|
%against the probability of all dangerous failures,
|
|
%and is normally expressed as a percentage.
|
|
%%$\Sigma\lambda_{DD}$ represents
|
|
%the percentage of dangerous detected base component failure modes, and
|
|
%$\Sigma\lambda_D$ the total number of dangerous base component failure modes.
|
|
%
|
|
%$$ DiagnosticCoverage = \Sigma\lambda_{DD} / \Sigma\lambda_D $$
|
|
%
|
|
%The diagnostic coverage for safe failures, where $\Sigma\lambda_{SD}$ represents the percentage of
|
|
%safe detected base component failure modes,
|
|
%and $\Sigma\lambda_S$ the total number of safe base component failure modes,
|
|
%is given as
|
|
%
|
|
%$$ SF = \frac{\Sigma\lambda_{SD}}{\Sigma\lambda_S} $$
|
|
%
|
|
%
|
|
\paragraph{Safe Failure Fraction.}
|
|
A key concept in FMEDA is Safe Failure Fraction (SFF).
This is the ratio of the safe and dangerous-detected failure rates
to the total of all safe and dangerous failure rates.
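Using the four $\lambda$ classifications introduced earlier, this ratio can be written out as
\[
SFF = \frac{\lambda_{SD} + \lambda_{SU} + \lambda_{DD}}{\lambda_{SD} + \lambda_{SU} + \lambda_{DD} + \lambda_{DU}} .
\]
As a sketch with hypothetical figures, chosen only to make the arithmetic easy to follow, taking $\lambda_{SD}=50$, $\lambda_{SU}=30$, $\lambda_{DD}=15$ and $\lambda_{DU}=5$ (all in FIT) gives
\[
SFF = \frac{50 + 30 + 15}{50 + 30 + 15 + 5} = \frac{95}{100} = 0.95 .
\]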
|
|
%Again this is usually expressed as a percentage.
|
|
|
|
%$$ SFF = \big( \Sigma\lambda_S + \Sigma\lambda_{DD} \big) / \big( \Sigma\lambda_S + \Sigma\lambda_D \big) $$
|
|
|
|
%This is the ratio of
|
|
%Step 4 Calculate SFF, SIL and PFD
|
|
%The SIL level of the product is finally determined from the Safe Failure Fraction (SFF) and the Probability of Failure on Demand (PFD). The following formulas are used.
|
|
%SFF = (lSD + lSU + lDD) / (lSD + lSU + lDD + lDU)
|
|
%PFD = (lDU)(Proof Test Interval)/2 + (lDD)(Down Time or Repair Time)
|
|
|
|
% Often a given component failure mode there will be a $\beta$ value, the
|
|
% probability that the component failure mode will cause a given SYSTEM failure.
|
|
|
|
%\paragraph{Risk Mitigation}
|
|
%
|
|
%The component may be have its risk factor
|
|
%reduced by the checking interval (or $\tau$ time between self checking procedures).
|
|
%
|
|
%Ultimately this technique calculates a risk factor for each component.
|
|
%The risk factors of all the components are summed and
|
|
%%give a value for the `safety level' for the equipment in a given environment.
|
|
|
|
|
|
|
|
|
|
|
|
\paragraph{Classification into Safety Integrity Levels (SIL).}
|
|
There are four SIL levels, from 1 to 4, with 4 being the highest safety level.
In addition to probabilistic risk factors, the
diagnostic coverage and SFF
have threshold bands becoming stricter for each level.
The software verification and specification techniques and constraints demanded
(such as language subsets, software redundancy, etc.)
also become stricter for each SIL level.
|
|
%%
|
|
%% Andrew asked me to expand on this here, but it would take at least two
|
|
%% pages. I think its more appropriate for the survey.tex chapter.
|
|
%%
|
|
|
|
Thus FMEDA uses statistical methods to determine
|
|
a safety level (SIL), typically used to meet an acceptable risk
|
|
value, specified for the environment the SYSTEM must work in.
|
|
EN61508 defines, in general terms,
risk assessment and required SIL levels~\cite[Part~5 Annex~A]{en61508}.
|
|
|
|
%the probability of
|
|
%failures occurring, and provide an adaquate risk level.
|
|
%
|
|
%A component failure mode, given its MTTF
|
|
%the probability of detecting the fault and its safety relevant validation time $\tau$,
|
|
%contributes a simple risk factor that is summed
|
|
%in to give a final risk result.
|
|
%
|
|
Thus an FMEDA
|
|
model can be implemented on a spreadsheet, where each component
|
|
has a calculated risk, a fault detection time (if any), an estimated risk importance
|
|
and other factors such as de-rating and environmental stress.
|
|
With one component failure mode per row,
|
|
all the statistical factors for SIL rating can be produced\footnote{A SIL rating will
|
|
apply to an installed plant, i.e. a complete installed and working SYSTEM.
|
|
SIL ratings for individual components or
|
|
sub-systems are meaningless, and the nearest equivalent would be the
|
|
FIT/PFD and SFF and diagnostic coverage figures.}.
|
|
\glossary{name={FIT}, description={Failure in Time (FIT). The number of times a particular failure is expected to occur in a $10^{9}$ hour time period.}}
|
|
|
|
|
|
|
|
|
|
|
|
|
|
\subsubsection{FMEDA and failure outcome prediction accuracy.}
|
|
FMEDA suffers from the same lack of accuracy in predicting
the outcomes of component failure modes as FMEA (section \ref{pfmea}).
|
|
\fmodegloss
|
|
%
|
|
This is because the analyst has to decide how particular components failing will impact on the SYSTEM or top level.
|
|
This involves a `leap of faith'. For instance, a resistor failing in a sensor circuit
|
|
may be part of a critical monitoring function.
|
|
The analyst is now put in a position
|
|
where he probably should assign a dangerous failure classification to it.
|
|
%
|
|
There is no analysis
|
|
of how that resistor would/could affect the components close to it, but because the circuitry
|
|
is part of a critical section it will most likely
|
|
be linked to a dangerous system level failure in an FMEDA study.
|
|
%
|
|
%%- IS THIS TRUE IS THERE A BETA FACTOR IN FMEDA????
|
|
%%-
|
|
%A $\beta$ factor, the heuristically defined probability
|
|
%of the failure causing the system fault may be applied.
|
|
%
|
|
%In FMEDA there is no detailed analysis of the failure mode behaviour
|
|
%of the component in its local environment
|
|
%Component failure modes are traceable directly to the SYSTEM level.
|
|
%it becomes more
|
|
%guess work than science.
|
|
%
|
|
With FMEDA, there is no rigorous cause and effect analysis for the failure modes
|
|
and how they interact on the micro scale (the components adjacent to them in terms of functionality).
|
|
Unintended side effects that lead to failure can be missed.
|
|
Also, component failure modes that are not
dangerous may be wrongly classified as dangerous simply because they exist in a critical
section of the product.
|
|
|
|
% some critical component failure
|
|
%modes, but we can only guess, in most cases what the safety case outcome
|
|
%will be if it occurs.
|
|
|
|
This leads to the practice of having components within a SYSTEM partitioned into different
safety level zones, as recommended in EN61508~\cite{en61508}. This is a vague way of determining
safety, as it can miss effects due to unexpected component interaction.
|
|
|
|
The statistical analysis methodology is the core philosophy
of the Safety Integrity Levels (SIL) embodied in EN61508~\cite{en61508}
and its international analogue, IEC61508.
|
|
|
|
|
|
|
|
\subsubsection{ FMEDA weaknesses }
|
|
\begin{itemize}
|
|
\item Possibility to miss the effects of failure modes at SYSTEM level.
|
|
\item Statistical nature allows a proportion of undetected failures for a given SIL. These could be catastrophic failures; as long as the perceived probability is low enough, they are considered acceptable under EN61508.
\item Complex component interaction effects are more likely to be seen than with FMEA or FMECA (because self diagnostic capability is considered), but can still be missed.
|
|
\item Allows a small proportion of `undetectable' error conditions.
|
|
\item No possibility to model base component level double failure modes.
|
|
\end{itemize}
|
|
%AND then how we can solve all there problems
|
|
|
|
\section{Desirable Criteria for a failure mode methodology}
|
|
\begin{itemize}
|
|
\item All component failure modes must be considered in the model.
|
|
\item It should be easy to integrate mechanical, electronic and software models~\cite[p.~287]{sccs}.
|
|
\item It should be re-usable, in that commonly used modules can be re-used in other designs/projects.
|
|
\item It should have a formal basis, that is to say, be able to produce mathematical proofs
|
|
for its results, such as system level error causation trees, reliability and safety statistics.
|
|
\item It should be easy to use, ideally using a
|
|
graphical syntax (as opposed to a formal symbolic/mathematical text based language).
|
|
\item From the top down, the failure mode model should follow a logical de-composition of the functionality
|
|
to smaller and smaller functional groupings \cite{maikowski}.
|
|
\item Multiple failure modes may be modelled from the base component level up.
|
|
\end{itemize}
|
|
|
|
|
|
\section{Design of a new static failure mode based methodology}
|
|
|
|
|
|
%
|
|
|
|
By taking {\dcs} to form higher level functional groups
|
|
we can build a bottom-up model incrementally.
|
|
In this way as we build the hierarchy, we naturally abstract the
|
|
failure mode behaviour, but can check that all failure modes in
|
|
the hierarchy have been considered and tied to causing symptoms.
|
|
|
|
\paragraph{Design Decision: Derived components must be determined from functional groups.}
|
|
The symptoms obtained from analysing a {\fg} will be used as the `failure~modes'
|
|
of its corresponding {\dc}.
|
|
|
|
\paragraph{Incremental Stages and \dcs .}
|
|
We can use incremental stages to build the hierarchy.
|
|
We can take small {\fg}s of components, where the {\fg}
|
|
is a small set of components that perform a simple
|
|
task.
|
|
%
|
|
%The functional group should perform a clearly defined task.
|
|
The design engineer must choose the components that form a {\fg}.
|
|
It should be possible to consider the {\fg} as a component or
|
|
black box, performing a given function.
|
|
The {\fg} should be chosen to be as small
|
|
(in terms of the number of components) as possible.
|
|
%
|
|
This should be small enough to be able %Another advantage of the functional group being small
|
|
to comfortably analyse all the failure
|
|
modes of its components.
|
|
%
|
|
We can consider these failure modes from the perspective
|
|
of the {\fg}. In other words, for each component failure mode in the {\fg},
|
|
we create a `test case' and decide how each failure affects the functional group.
|
|
%
|
|
With the results from the test cases we will now have the ways in which the
|
|
{\fg} can fail.
|
|
%
|
|
%
|
|
We can refine this further, by grouping the common symptoms, or results that
|
|
are the same failure {\wrt} the {\fg}.
|
|
%
|
|
We can now treat the {\fg} as a component, and create a corresponding {\dc}: in other words, a `sub-system' with a known set of failure modes.
|
|
%
|
|
We can now create a new/{\dc} and assign it these common symptoms
|
|
as its failure modes.
|
|
%
|
|
This {\dc} can be used to build higher level
|
|
{\fg}s, and this will naturally form a hierarchy.
|
|
This hierarchy can be extended until it encompasses
|
|
an entire SYSTEM.
|
|
%
|
|
It can be considered complete when
|
|
all failure modes from all components are included in the model
|
|
and all base component failure modes can be traced
|
|
through the fault tree to SYSTEM level failure modes.
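
The symptom collection stage described above can be sketched in set notation. The symbols used here ($FG$, $fm$, $T$, $S$, $\sigma$ and $DC$) are illustrative assumptions introduced for this sketch, not established notation. Let a {\fg} be a set of components $FG = \{c_1, \ldots, c_n\}$, and let $fm(c)$ denote the set of failure modes of component $c$. Considering single failures only, the test cases are
\[
T = \bigcup_{i=1}^{n} fm(c_i).
\]
Assuming that analysis assigns each test case exactly one symptom of the {\fg}, this can be viewed as a function $\sigma : T \rightarrow S$ onto the set of symptoms $S$. The {\dc}, written $DC$, is then given the failure modes
\[
fm(DC) = S = \{ \sigma(t) \mid t \in T \}, \qquad |S| \le |T| .
\]
The inequality holds because every symptom has at least one test case that causes it, and several test cases may share a symptom.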
|
|
|
|
\paragraph{Directed Acyclic Graph (DAG).}
|
|
If we ensure that
derived components cannot be included in {\fg}s
of a lower abstraction level, the hierarchy cannot contain cycles.

\paragraph{New methodology must be bottom-up.}
|
|
In order to ensure that all component failure modes have been covered
|
|
the methodology will have to work from the bottom-up
|
|
and start with the component failure modes.
|
|
\fmodegloss
|
|
%
|
|
\paragraph{Natural Fault Finding is top down.}
|
|
The traditional fault finding, or natural fault finding
|
|
is to start at the top with SYSTEM level failure modes/faults.
|
|
%
|
|
On encountering a
|
|
fault, the symptom is first observed at the top or
|
|
SYSTEM level. By decomposing the functionality of the faulty system and testing
|
|
we can further decompose the system until we find the
|
|
faulty base level component.
|
|
Decomposition of electrical circuits is formalised and explored
|
|
in \cite{maikowski}. This top down technique de-composes by functionality.
|
|
Simpler and simpler functional groups are discovered as we delve
|
|
further into the way the system works and is built.
|
|
|
|
|
|
\paragraph{Need for a `bottom-up' system de-composition.}
|
|
There is an apparent conflict here as de-composition normally implies a top-down approach. The natural way to
|
|
de-compose a system is from the top down.
|
|
%
|
|
If we do this though, we do not naturally include
|
|
all failure modes in the modules determined as we
|
|
de-compose downwards.
|
|
%
|
|
What is required here is to mimic this top-down de-composition
|
|
with a bottom up technique.
|
|
By doing that, we can take all base component failure modes
|
|
and ensure they are included in the model.
|
|
|
|
By taking components that form {\fg}s from the bottom up
|
|
and then taking those to form higher level
|
|
{\fg}s we can get a close approximation of the de-composition process from the bottom up.
|
|
The philosophy of top down de-composition is very similar.
|
|
Top down de-composition applies functional
|
|
de-composition, because it seeks to break the system down
|
|
into manageable and separately testable entities.
|
|
A second justification for this is that the design process for a product requires both top down and bottom-up
|
|
thinking. To analyse a system from the bottom-up is a useful
|
|
design validation process in itself \cite{sommerville}.
|
|
%%
|
|
%% CAN we find a ref for both top and bottom up being used
|
|
%% as design validation ????
|
|
|
|
\paragraph{Design Decision: Methodology must be bottom-up.}
|
|
In order to ensure that all component failure modes are handled,
|
|
this methodology must start at the bottom, with base component failure modes.
|
|
In this way automated checking can be applied to all component failure modes
|
|
to ensure none have been inadvertently excluded from the process.
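
One way such an automated check might be expressed, as a sketch only (the symbols $C$, $fm$, $T$ and $U$ are assumptions made here for illustration): let $C$ be the set of components under analysis, $fm(c)$ the set of failure modes of component $c$, and $T$ the set of test cases analysed, each test case being a set of one or more failure modes. The un-handled failure modes are
\[
U = \Big( \bigcup_{c \in C} fm(c) \Big) \setminus \Big( \bigcup_{t \in T} t \Big),
\]
and the analysis may be considered to cover all component failure modes only when $U = \emptyset$. Any members of $U$ can be reported to the design engineer.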
|
|
|
|
\paragraph{Problems with functional group hierarchy.}
|
|
A hierarchy of functional grouping, leading to a system model
|
|
still leaves us with the problem of the number of component failure modes.
|
|
The base components will typically have several failure modes each.
|
|
%
|
|
Given a typical embedded system may have hundreds of components,
|
|
this means that we would still have to tie base component failure modes
|
|
to SYSTEM level errors.
|
|
%
|
|
The problem with this is that the base component failure modes under investigation
are not rigorously examined in relation to functionally adjacent components.
%
If failure modes could be collected and simplified somehow
at each stage in a hierarchy of {\fgs}, the functionally adjacent
ideal would be met, and as we progress up the hierarchy the number
of failure modes should decrease.
|
|
%Thus there is the `possibility to miss failure mode effects
|
|
%at the much higher SYSTEM level' criticism of the FTA, FMEDA and FMECA methodologies.
|
|
%%%
|
|
%%% OK Got up to here Lunchtime edit 06DEC2010.............
|
|
|
|
\paragraph{Design Decision: Methodology must collate errors at each functional group stage.}
|
|
SYSTEMS typically have far fewer failure modes than the sum of their base component failure modes.
|
|
SYSTEM level failures may be caused by a variety of component failure modes.
|
|
A SYSTEM level failure mode is an abstracted failure mode, in that
|
|
it is a symptom of some lower level failure or failures.
|
|
Tracing the SYSTEM level failure or symptom, down through
|
|
a decomposed system, will give a fault tree. This will typically
|
|
trace the SYSTEM level failure mode to some individual base component failures
|
|
or combinations thereof.
|
|
% ABSTRACTION
|
|
For instance a failed resistor in a sensor at a base component level is a specific
|
|
failure mode.
|
|
%
|
|
For example it could be called `RESISTOR 1 OPEN'.
|
|
%
|
|
Now consider the symptom that `RESISTOR 1 OPEN' causes in a functional group comprising the sensor channel that
RESISTOR 1 is part of.
%
We might call it a `READING~HIGH' failure.
The fault has become less detailed and more general. There may be other
causes for a `READING~HIGH'. We can say that the failure
mode `READING~HIGH' is more abstract in terms of the SYSTEM than `RESISTOR 1 OPEN'.
%
At a higher level still
this may be called a `SENSOR CHANNEL 1' fault.
At a system level it may simply be a `SENSOR FAILURE'.
|
|
As we traverse up the fault tree the failure modes
|
|
become more abstract.
|
|
%
|
|
At each functional group collection, there must be a process to collect
|
|
common symptoms and reduce the number of failure modes to handle.
|
|
This must be a process that incrementally reduces the number
|
|
of failure modes as the abstraction level reaches the SYSTEM level.
|
|
|
|
\paragraph{How to build a meaningful SYSTEM failure behaviour model.}
|
|
The next problem is how we build a failure mode model
|
|
that converges from a multitude of base
|
|
component failures to a finite set of SYSTEM level failure modes.
|
|
%
|
|
It would be better to analyse the failure mode behaviour of each
|
|
functional group, and determine the ways in which it, rather than its
|
|
components, can fail.
|
|
%
|
|
Doing this naturally extracts the symptoms of the {\fg}
(each of which can potentially be caused by more than one component failure mode).
%
The number of symptoms will be less than or equal to the number
of component failure modes, and in practice will be much smaller.
|
|
%
|
|
Thus stage by stage symptom collection becomes the key to reducing the number
|
|
of failure modes to handle as we traverse up the hierarchy.
|
|
|
|
|
|
|
|
\paragraph{Component failures and {\fg} failure symptoms.}
|
|
In other words we want to find out what the symptoms of the failures in the {\fg}s
|
|
are.
|
|
%The number of symptoms of failure should be equal to or
|
|
%less than the number of component failure modes, simply because
|
|
%often there are several potential causes of failure symptoms.
|
|
%
|
|
When we have the symptoms, we can start thinking of the {\fg} as a component in its own right.
|
|
%with a simplified and reduced set of failure symptoms.
|
|
%
|
|
We can now create a new {\dc}, where its failure modes
|
|
are the failure symptoms of the {\fg}.
|
|
The data structure produced from collecting functional groups
and deriving components will naturally form a DAG, provided that derived components
are never included in {\fg}s of a lower abstraction level.
In other words, we cannot allow a {\fg}
to include any component created from it.
|
|
|
|
%
|
|
%
|
|
By representing the failure mode model as a DAG, we
|
|
now have the capability to take SYSTEM level failure modes
|
|
and determine the possible combinations of component failure modes that
|
|
could have caused it.
|
|
This will allow us to define fault trees for each SYSTEM level failure.
|
|
This will mean that we will be able to determine which
|
|
combinations of base component failures could cause the SYSTEM
|
|
failure.
|
|
%In FTA terminology, a list of possible
|
|
%causes for a SYSTEM level failure is known as a `cut set' \cite{nasafta}\cite{nucfta}.
|
|
If statistical models exist for the component failure modes
|
|
these failure causation trees (or minimal cut sets\footnote{In FTA terminology a minimal cut set is the branch of a
|
|
%\glossary{name={entry name}, description={entry description}}
|
|
\glossary{name={cut set}, description={A cut set in a fault tree is a set of base component failure modes, whose occurrence ensures that a TOP (or SYSTEM) event occurs} }
|
|
\glossary{name={minimal cut set}, description={A cut set in a fault tree that cannot be reduced (i.e. \textbf{all} the base component failure modes are required to cause the SYSTEM level event) } }
|
|
fault tree, from the top SYSTEM level to the bottom, with the least number
|
|
of base component failure modes. If a single base component failure mode can cause
|
|
a SYSTEM level error this is usually considered a liability.})
|
|
can be used to calculate Mean Time to Failure (MTTF) or
|
|
Probability of Failure on demand (PFD) figures.
|
|
Contrast the analytical capability of FMMD with the
|
|
methodologies where the component failure modes/components are linked
|
|
directly to SYSTEM failure modes with no analysis stages in between.
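
As a hedged sketch of how such figures might be obtained (the notation is introduced here for illustration, and statistical independence of the base component failure modes is assumed): if a SYSTEM level failure mode has minimal cut sets $M_1, \ldots, M_k$, and $P(f)$ is the probability of base component failure mode $f$ over the period of interest, then
\[
P(M_j) = \prod_{f \in M_j} P(f), \qquad
P(\mathrm{SYSTEM\ failure}) \le \sum_{j=1}^{k} P(M_j).
\]
The sum is an upper bound that is close to the true value when the individual probabilities are small (the rare event approximation commonly used in FTA).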
|
|
|
|
|
|
|
|
\paragraph{Design Decision: A functional group cannot contain {\dc}s at a higher abstraction level than itself.}
|
|
|
|
We can say that no component may be derived from itself directly
|
|
or indirectly.
|
|
We can track the `abstraction level' by increasing it each time
|
|
there is a phase of symptom collection.
|
|
We can use the symbol $\alpha$ to represent the abstraction level
|
|
and make it an attribute of a component.
|
|
Base components will have an $\alpha$ level of zero.
|
|
A derived component when created must always have a greater $\alpha$ value than any
|
|
of the components included in the {\fg} from which it was derived.
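
One simple rule that satisfies this constraint, given here as an illustrative sketch rather than as the definitive FMMD definition, is to set the abstraction level of a {\dc} $DC$ derived from a {\fg} $FG$ to
\[
\alpha(DC) = 1 + \max_{c \in FG} \alpha(c), \qquad \alpha(b) = 0 \ \mathrm{for\ any\ base\ component}\ b.
\]
Every component in a {\fg} then has a strictly lower $\alpha$ value than the component derived from it, so no component can appear among its own ancestors and the hierarchy is acyclic. For example, a {\fg} containing two base components ($\alpha = 0$) and one {\dc} with $\alpha = 1$ would yield a {\dc} with $\alpha = 2$.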
|
|
|
|
|
|
\paragraph{Natural Reduction in number of failure modes with abstraction level.}
|
|
%
|
|
Because common symptoms are being collected, as we build the tree upward
|
|
the number of failure modes decreases (or exceptionally stays the same)
|
|
at each level.\footnote{In very unusual cases, where none of the
failure modes of a {\fg} can be collected into common symptoms,
the number of failure modes from its components would be the
same as the number of failure modes in the component derived from it.}
|
|
This decreasing of the number of failure modes is borne out {\irl}.
|
|
Of the thousands of component failure modes in a typical product
|
|
there are generally only a handful of SYSTEM level failure modes
|
|
(or top level `symptoms' of underlying failures).
|
|
%
|
|
|
|
\subsection{Outline of the FMMD process}
|
|
\label{fmmdproc}
|
|
FMMD builds {\fg}s of components from the bottom-up.
|
|
The lowest level of components are termed base components.
|
|
These are the initial building blocks.
|
|
In electronics these would be the individual
|
|
passive and active components on the parts~list.
|
|
In mechanics the levers, linkages, springs and cogs etc.
|
|
%
|
|
Functional groups are collections of components
|
|
that work together to perform a simple function.
|
|
%
|
|
We can perform a failure mode effects analysis on each of the component failure
|
|
modes within a {\fg}. Because we can guide the process in software we can
|
|
ensure that all component failure modes
|
|
are included in the model.
|
|
%
|
|
We can then treat the {\fg} as a `black box' or component in its own right.
|
|
We can now look at how the {\fg} can fail.
|
|
%
|
|
Many of the component failure modes will
|
|
cause the same failure symptoms in the {\fg}.
|
|
We can collect these failures as common symptoms.
|
|
%
|
|
When we have our set of symptoms, we can now create
|
|
a {\dc}. The {\dc} will have as its set of failure
modes the collected symptoms of the {\fg}.
|
|
%
|
|
Because we can now have {\dcs} we can use these to form
|
|
new {\fg}s and we can build a hierarchical `failure~mode' model of the SYSTEM.
|
|
|
|
|
|
%%- Need diagram of hierarchy
|
|
%%-
|
|
%%-
|
|
\begin{figure}[h]
|
|
\centering
|
|
\includegraphics[width=200pt,bb=0 0 331 249,keepaspectratio=true]{./fmmd_concept/fmmd_hierarchy.jpg}
|
|
% fmmd_hierarchy.jpg: 331x249 pixel, 72dpi, 11.68x8.78 cm, bb=0 0 331 249
|
|
\caption{Example derived component created from the functional group comprised of components a,b,c}
|
|
\label{fig:fmmd_hierarchy}
|
|
\end{figure}
|
|
|
|
A {\fg} is a set of components (each with a set of failure modes)
that collectively serve some purpose (to perform some function),
|
|
and derived components are determined
|
|
from analysis and symptom collection
|
|
of the {\fg}.
|
|
|
|
The {\dc} is equipped with a new set of failure modes
|
|
corresponding to the symptoms from the {\fg}.
|
|
|
|
The diagram in figure \ref{fig:fmmd_hierarchy}, shows one stage
|
|
of the FMMD process. The resultant {\dc} may be used to
|
|
create higher level {\fg}s in later stages.
|
|
|
|
% \begin{figure}[h]
|
|
% \centering
|
|
% \includegraphics[bb=0 0 331 249,keepaspectratio=true]{./fmmd_hierarchy.jpg}
|
|
% % fmmd_hierarchy.jpg: 331x249 pixel, 72dpi, 11.68x8.78 cm, bb=0 0 331 249
|
|
% \caption{Example derived component created from a functional group comprised of components a,b,c}
|
|
% \label{fig:fmmd_hiarchy}
|
|
% \end{figure}
|
|
%
|
|
% \vspace{20pt}
|
|
% NEED DIAGRAM OF HIERARCHY
|
|
% \vspace{20pt}
|
|
|
|
We associate a component with its failure modes.
|
|
This is represented in UML in figure \ref{fig:component concept}.
|
|
|
|
\begin{figure}[h]
|
|
\centering
|
|
\includegraphics[width=200pt,keepaspectratio=true]{./fmmd_concept/component.jpg}
|
|
% component.jpg: 467x76 pixel, 72dpi, 16.47x2.68 cm, bb=0 0 467 76
|
|
\caption{Component with failure modes UML diagram}
|
|
\label{fig:component concept}
|
|
\end{figure}
|
|
|
|
|
|
\subsection{Environmental Conditions, Operational States and FMMD}
|
|
|
|
Any real world sub-system will exist in a variable environment
|
|
and may have several modes of operation.
|
|
In order to find all possible failures, the sub-system
|
|
must be analysed for each operational state
|
|
and environment condition that can affect it.
|
|
%
|
|
Two design decisions are required here: with respect to which objects should we
analyse the environmental conditions, and with respect to which the operational states?
|
|
There are three objects in our model to which these considerations could be applied.
|
|
We could apply these conditions for analysis
|
|
to the functional group, the components, or the derived
|
|
component.
|
|
|
|
\paragraph {Environmental Conditions and FMMD.}
|
|
|
|
Environmental conditions are external to the
|
|
{\fg} and are often things over which the system has no direct control.
|
|
Consider ambient temperature, pressure or even electrical interference levels.
|
|
%
|
|
Environmental conditions may affect different components in a {\fg}
|
|
in different ways.
|
|
|
|
For instance, a system may be specified for
|
|
$0\oc$ to $85\oc$ operation, but some components
|
|
may show failure behaviour between $60\oc$ and $85\oc$
|
|
\footnote{Opto-isolators typically show marked performance decrease after
|
|
$60\oc$ \cite{tlp181}, whereas another common component, say a resistor, will be unaffected.}.
|
|
Other components may operate comfortably within that whole temperature range specified.
|
|
Environmental conditions will have an effect on the {\fg} and the {\dc},
|
|
but they will have specific effects on individual components.
|
|
|
|
\paragraph{Design Decision.}
|
|
Environmental constraints will be applied to components.
|
|
A component will hold a set of environmental states that
|
|
affect it.
|
|
Environmental conditions will apply SYSTEM wide,
|
|
but may only affect specific components.
|
|
%Some may not be required for consideration
|
|
%for the analysis of particular systems.
|
|
|
|
\paragraph {Operational States and FMMD.}
|
|
|
|
Sub-systems may have specific operational states.
|
|
These could be a general health level such as
|
|
normal operation, graceful degradation or lockout.
|
|
Or they could be self~checking sub-systems that are either in a normal or self~check state.
|
|
|
|
Operational states are conditions that apply to a functional group, not individual components.
|
|
%% Andrew says that that does no make sense But I think it does
|
|
|
|
\paragraph{Design Decision.}
|
|
Operational state will be applied to {\fg}s.
|
|
|
|
\paragraph{UML Model of FMMD Analysis}
|
|
|
|
The UML diagram in figure \ref{fig:env_op_uml}, shows the data
|
|
relationships between {\fgs} and operational states, and component
|
|
failure modes and environmental factors.
|
|
|
|
|
|
\begin{figure}[h]
|
|
\centering
|
|
\includegraphics[width=400pt,bb=0 0 818 249,keepaspectratio=true]{./fmmd_concept/fmmd_env_op_uml.jpg}
|
|
% fmmd_env_op_uml.jpg: 818x249 pixel, 72dpi, 28.86x8.78 cm, bb=0 0 818 249
|
|
\caption{UML model of Environmental and Operational states w.r.t FMMD}
|
|
\label{fig:env_op_uml}
|
|
\end{figure}
|
|
|
|
|
|
|
|
\subsection{Justification of wishlist}
|
|
|
|
By applying the methodology in section \ref{fmmdproc}, the wishlist can
|
|
now be evaluated for the proposed FMMD methodology.
|
|
|
|
\subsubsection{All component failure modes must be considered in the model.}
|
|
The proposed methodology will be bottom-up.
|
|
This ensures that all component failure modes are handled.
|
|
|
|
|
|
\subsubsection{ It should be easy to integrate mechanical, electronic and software models.}
|
|
Because component failure modes are considered, we have a generic entity to model.
|
|
We can describe a mechanical, electrical or software component in terms of its failure modes.
|
|
%
|
|
Because of this
|
|
we can model and analyse integrated electromechanical systems, controlled by computers,
|
|
using a common notation.
|
|
|
|
\subsubsection{ It should be re-usable, in that commonly used modules can be re-used in other designs/projects.}
|
|
The hierarchical nature, taking {\fg}s and deriving components from them, means that
|
|
commonly used {\dcs} can be re-used in a design (for instance self checking digital inputs)
|
|
or even in other projects where the same {\dc} is used.
|
|
|
|
|
|
|
|
\subsubsection{It should have a formal basis, that is to say, data should be available to produce mathematical proofs for its results.}
|
|
Because the failure mode model of a SYSTEM is a hierarchy of {\fg}s and derived components,
SYSTEM level failure modes are traceable back down the fault tree to
component level failure modes. This provides causation trees~\cite{sccs}, or minimal cut sets,
for all SYSTEM failure modes.
|
|
|
|
\subsubsection{ It should be capable of producing reliability and danger evaluation statistics.}
|
|
The minimal cut sets for the SYSTEM level failures can have MTTF
and danger evaluation statistics computed from the component failure mode statistics~\cite{mil1991}.
|
|
|
|
\subsubsection{It should be easy to use, ideally using a graphical syntax (as opposed to a formal mathematical one).}
|
|
A modified form of constraint diagram (an extension of Euler diagrams) has
|
|
been developed to support the FMMD methodology.
|
|
This uses Euler circles to represent failure modes, and spiders to collect symptoms, to
|
|
advance a {\fg} to a {\dc}.
|
|
|
|
|
|
\subsubsection{From the top down the failure mode model should follow a logical de-composition of the functionality to smaller and smaller functional modules \cite{maikowski}.}
|
|
The bottom-up approach fulfils the logical de-composition requirement, because the {\fg}s
|
|
are built from components performing a given task.
|
|
|
|
|
|
\subsubsection{ Multiple failure modes may be modelled from the base component level up.}
|
|
By breaking the problem of failure mode analysis into small stages
|
|
and building a hierarchy, the problems associated with the cross products of
|
|
all failure modes within a system are reduced by an exponential order.
|
|
This is because the multiple failure modes are considered
|
|
within {\fgs} which have fewer failure modes to consider
|
|
at each FMMD stage.
|
|
Where appropriate, multiple simultaneous failures can be modelled by
|
|
introducing test~cases where the conjunction of failure modes is considered.
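
The scale of this reduction can be illustrated with a simple count. The figures below are hypothetical and consider only the first stage of the hierarchy. If a system has $N$ base component failure modes in total, the number of non-empty combinations of failure modes that a flat analysis could in principle consider is $2^{N}-1$. If the same failure modes are divided evenly among $k$ {\fgs}, the number of combinations within the {\fgs} is
\[
k\left(2^{N/k}-1\right).
\]
For $N=20$ and $k=4$ this gives $4 \times (2^{5}-1) = 124$ combinations, compared with $2^{20}-1 = 1\,048\,575$ for the flat case. Later stages add further test cases for the failure modes of the {\dcs}, but each stage again involves only a small {\fg}.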
|
|
|
|
|
|
\begin{figure}
|
|
\centering
|
|
\begin{tikzpicture}[shorten >=1pt,->,draw=black!50, node distance=\layersep]
|
|
\draw[style=thick];
|
|
|
|
\tikzstyle{every pin edge}=[<-,shorten <=1pt]
|
|
\tikzstyle{fmmde}=[circle,fill=black!25,minimum size=17pt,inner sep=0pt]
|
|
\tikzstyle{fmmdt}=[ellipse,fill=red!15,minimum size=17pt,inner sep=0pt]
\tikzstyle{fmmdc}=[rectangle,draw,fill=black!17,minimum size=17pt,inner sep=4pt]
\tikzstyle{fmmdi}=[regular polygon,regular polygon sides=6,draw,fill=black!25,minimum size=50,inner sep=4pt]
|
|
\tikzstyle{component}=[fmmde, fill=green!50];
|
|
\tikzstyle{ctext}=[fmmde, draw, fill=black!20];
|
|
\tikzstyle{failure}=[fmmde, fill=red!50];
|
|
\tikzstyle{symptom}=[fmmde, fill=blue!50];
|
|
\tikzstyle{inhibit}=[fmmdi, fill=blue!40];
|
|
\tikzstyle{condition}=[fmmdc, fill=black!20];
|
|
\tikzstyle{conjunction}=[fmmde, fill=red!40];
|
|
\tikzstyle{annot} = [text width=4em, text centered]
|
|
|
|
\node[condition] (C-Q) at (0,-1) {Condition Q};
|
|
\node[inhibit] (I) at (0,-4) {Inhibit};
|
|
\node[ctext] (CC) at (4,-4) {$\stackrel{ probability\; that}{ Q\; occurs\; given\; A}$};
|
|
%\node[text] (T) at (2,-2) {Probability that Q occurs given A};
|
|
\node[condition] (C-A) at (0,-7) {Condition A};
|
|
|
|
|
|
|
|
\path (C-A) edge (I);
|
|
\path (CC) edge (I);
|
|
\path (I) edge (C-Q);
|
|
%\path (C-1b) edge (CJ);
|
|
%\path (C-1b) edge (CJ);
|
|
|
|
\end{tikzpicture}
|
|
% End of code
|
|
\caption{FTA `inhibit' gate}
|
|
\label{fig:inhibitconcept}
|
|
\end{figure}
|
|
|
|
\subsubsection {Inhibit Conditions}
|
|
Some failure modes only occur when another failure has occurred, or
when an environmental condition reaches a critical value. This is specifically
dealt with in the FTA methodology~\cite[IV 9]{nucfta}.
|
|
An example FTA inhibit gate is shown in figure \ref{fig:inhibitconcept}.
|
|
\paragraph{Static or Dynamic Modelling of Inhibit}
|
|
If the model is static we can consider the conditional failure
as having a lower probability of occurring (i.e. the probability
of A multiplied by the probability that Q occurs given A).
|
|
If we wish to dynamically model the conditional failure
|
|
an attribute to the failure~modes must be added
|
|
that can reference other failure~modes and environmental conditions.
|
|
A UML diagram with inhibit conditions added is shown in figure \ref{fig:umlconcept2}.
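
For the static case, the calculation for the inhibit gate in figure \ref{fig:inhibitconcept} can be written as follows, assuming that Q can only occur when A has occurred. The numerical values are hypothetical and serve only as an illustration.
\[
P(Q) = P(A) \times P(Q \mid A)
\]
For instance, if $P(A) = 10^{-3}$ and $P(Q \mid A) = 0.1$, then $P(Q) = 10^{-4}$.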
|
|
|
|
\subsection{Safe, Dangerous, Detected and Undetected.}
|
|
|
|
The top level or SYSTEM failure modes can be examined and
|
|
assigned SIL~\cite{en61508} safe and dangerous attributes.
|
|
Detected failure modes appear as symptoms that have been
|
|
integrated into symptoms involving self checking.
|
|
Undetectable failure modes will follow a direct line
|
|
up from component level to SYSTEM level without being
|
|
incorporated into a self checking functional group.
|
|
These undetected failures correspond to a minimal cut
|
|
set where a single base~component failure mode
|
|
can be traced to a SYSTEM level failure mode.
|
|
They can thus be determined by searching the DAG
|
|
for a single base~component failure mode minimal cut set~\cite{nucfta}.
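
This search can be stated compactly; the notation is an assumption introduced here for illustration. If $\mathcal{M}(s)$ denotes the set of minimal cut sets of a SYSTEM level failure mode $s$, the base~component failure modes of interest are
\[
SP(s) = \{ f \mid \{f\} \in \mathcal{M}(s) \},
\]
that is, those failure modes that form a minimal cut set on their own.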
|
|
|
|
% UML DIAGRAM
|
|
|
|
\begin{figure}[h]
|
|
\centering
|
|
\includegraphics[width=400pt,keepaspectratio=true]{./fmmd_concept/fmmd_env_op_uml2.jpg}
|
|
% fmmd_env_op_uml2.jpg: 866x313 pixel, 72dpi, 30.55x11.04 cm, bb=0 0 866 313
|
|
\caption{UML diagram with Inhibit conditions}
|
|
\label{fig:umlconcept2}
|
|
\end{figure}
|
|
|
|
|
|
\subsection{Aims of FMMD Methodology}
|
|
\label{sec:aims}
|
|
Taking the four current failure mode methodologies into consideration, and comparing them to the proposed FMMD methodology, the following wish list or aims can be stated.
|
|
|
|
\begin{itemize}
|
|
\item It can be checked automatically that all component failure modes have
|
|
been considered in the model. Should a failure mode have been missed
|
|
the data model can be searched and the un-handled failure modes flagged to the design engineer.
|
|
\item Because we are modelling with failure modes, the {\fgs} and {\dcs} can be generic,
i.e. mechanical, electronic or software components.
|
|
\item The {\dcs} are re-usable, in that commonly used modules can be re-used in other designs/projects.
|
|
\item It will have a formal basis, that is to say,
|
|
we have the data at hand to produce meaningful
|
|
results (MTTF and the cause trees for SYSTEM level faults).
|
|
\item Overall reliability and danger evaluation statistics can be computed.
|
|
By knowing all causation trees,
|
|
the statistical probabilities (from base component data) for all causes can be simply added.
|
|
\item A graphical representation based on Euler diagrams is used.
|
|
This provides an interface that does not involve
|
|
formal mathematical/symbolic notation.
|
|
This is intended to be user friendly and to guide the user through the FMMD process
|
|
while applying automatic checks for un-handled conditions.
|
|
\item From the top down the failure mode model will follow a logical de-composition of the functionality; by
|
|
choosing {\fg}s and working bottom-up this hierarchical trait will occur as a natural consequence.
|
|
\item Undetectable or un-handled failure modes will be specifically flagged.
|
|
\item It is possible to model multiple failure modes.
|
|
\end{itemize}
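
The automatic check named in the first aim (and the flagging of un-handled failure modes in the eighth) can be illustrated with a minimal Python sketch.
The class names \texttt{Component} and \texttt{FunctionalGroup}, the \texttt{handled} mapping, and the resistor failure modes and symptom labels are all assumptions made for illustration; they are not taken from an existing FMMD implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List


@dataclass
class Component:
    """A base component with a known set of failure modes (hypothetical type)."""
    name: str
    failure_modes: FrozenSet[str]


@dataclass
class FunctionalGroup:
    """Components analysed together (hypothetical type).

    `handled` maps each analysed failure mode, keyed as 'component:mode',
    to the symptom it produces at the functional group level.
    """
    name: str
    components: List[Component]
    handled: Dict[str, str] = field(default_factory=dict)

    def unhandled_failure_modes(self) -> List[str]:
        """Return every component failure mode that has no recorded symptom."""
        missing = []
        for component in self.components:
            for mode in sorted(component.failure_modes):
                key = f"{component.name}:{mode}"
                if key not in self.handled:
                    missing.append(key)
        return missing


# Illustrative data only: two resistors forming a potential divider.
r1 = Component("R1", frozenset({"OPEN", "SHORT"}))
r2 = Component("R2", frozenset({"OPEN", "SHORT"}))
divider = FunctionalGroup(
    "potential_divider",
    [r1, r2],
    handled={"R1:OPEN": "LOW", "R1:SHORT": "HIGH", "R2:OPEN": "HIGH"},
)

# One failure mode was deliberately left out of the analysis above.
missing = divider.unhandled_failure_modes()
print(missing)
```

In this sketch the search of the data model is a simple set-membership test, so a forgotten failure mode (here the short circuit of \texttt{R2}) is reported to the engineer rather than silently ignored.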

\ifthenelse {\boolean{paper}}
{
%paper
\pagebreak[4]
\section{Re-Factoring the UML Model}
The UML models presented thus far in this paper
have been used to develop the data relationships required to perform FMMD analysis.
This section re-organises and rationalises the UML model.
We want to be able to use {\dcs} in functional groups.
It therefore makes sense for a {\dc} to inherit from {\em component}.

The re-factored UML diagram is shown in figure \ref{fig:refactored_uml}.

\begin{figure}[h]
\centering
\includegraphics[width=400pt,bb=0 0 702 464]{./master_uml.jpg}
% master_uml.jpg: 702x464 pixel, 72dpi, 24.76x16.37 cm, bb=0 0 702 464
\caption{Re-factored UML Diagram}
\label{fig:refactored_uml}
\end{figure}

}
{
% chapter
\section{Re-Factoring the UML Model}
This chapter has used UML diagrams to develop the data types required to implement FMMD.
The terms used in FMMD and the UML data model are further refined in
chapter \ref{defs}.
}
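
As a minimal sketch of how such data types might look in code, the following Python fragment treats a derived component as a kind of component, so that it can itself be placed in a higher-level functional group.
The names \texttt{Component}, \texttt{DerivedComponent}, \texttt{FunctionalGroup} and \texttt{derive}, and the example failure modes, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class Component:
    """Anything that has a name and a set of failure modes (hypothetical type)."""
    name: str
    failure_modes: FrozenSet[str]


@dataclass(frozen=True)
class DerivedComponent(Component):
    """A component whose failure modes are the symptoms of a functional group."""
    source_group: str = ""


@dataclass
class FunctionalGroup:
    """A collection of components (base or derived) analysed together."""
    name: str
    components: List[Component]

    def derive(self, symptoms: Iterable[str]) -> DerivedComponent:
        """Create a derived component from the symptoms found by analysis."""
        return DerivedComponent(
            self.name, frozenset(symptoms), source_group=self.name
        )


# Illustrative data only: analyse two resistors as a potential divider.
r1 = Component("R1", frozenset({"OPEN", "SHORT"}))
r2 = Component("R2", frozenset({"OPEN", "SHORT"}))
divider_group = FunctionalGroup("potential_divider", [r1, r2])
divider = divider_group.derive({"HIGH", "LOW"})

# Because DerivedComponent inherits from Component, the derived component
# can be used in a higher-level functional group alongside a base component.
opamp = Component("OPAMP", frozenset({"LATCH_HIGH", "LATCH_LOW"}))
amplifier_group = FunctionalGroup("amplifier", [divider, opamp])
print([c.name for c in amplifier_group.components])
```

The inheritance relationship is what allows the hierarchy to be built bottom-up: each level of analysis consumes components and produces a new component for the level above.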

\section{Conclusion}

\ifthenelse {\boolean{paper}}
{
This paper
describes how the FMMD methodology
functions, given requirements and constraints such as the number of combinations
of failure causes.
It describes the need for the new methodology to be bottom-up, and
then the need for incremental modularisation
to build a fault mode hierarchy, which leads to the concept of functional grouping,
analysis of those groupings, and from that
the creation of derived components.
It also
}
{
This chapter
}
sets out the background to the need for a new
static analysis methodology that can span the mechanical, electrical and software domains
using a common notation.
The author believes that FMMD addresses many shortcomings in current static failure mode analysis methodologies.
\vspace{60pt}
\today

%% $$\frac{-b\pm\sqrt{ {b^2-4ac}}}{2a}$$
%\today