\ifthenelse {\boolean{paper}}
{
\abstract{
This paper proposes a methodology for
creating failure mode models of safety critical systems.
The models share a common notation
for the mechanical, electronic and software domains, and are built using an
incremental and rigorous approach.

%% What I have done
%%
The four main static failure mode analysis methodologies were examined and
assessed in the context of newer European safety standards.
Some of the deficiencies identified in these methodologies led to
a wish list for a more ideal methodology.

%% What I have found
%%
From this wish list, and from constraints determined by
the evaluation of the four established methodologies, a new
methodology is developed and proposed. This has been named Failure Mode Modular De-Composition (FMMD).

%% Sell it
%%
In addition to addressing the traditional weaknesses of
Fault Tree Analysis (FTA), Failure Mode Effects Analysis (FMEA), Failure Mode Effects Criticality Analysis (FMECA)
and Failure Mode Effects and Diagnostic Analysis (FMEDA), FMMD provides the means to model multiple failure mode scenarios
as specified in newer European Safety Standards \cite{en298}.
The proposed methodology is bottom-up and can guarantee that no component failure mode is left unhandled.
It is also modular, meaning that the results of analysed components may be re-used in other projects.
}
}
{
%%% CHAPTER INTRO NEARLY THE SAME AS ABSTRACT

This chapter proposes a methodology for
creating failure mode models of safety critical systems.
The models share a common notation
for the mechanical, electronic and software domains, and are built using an
incremental and rigorous approach.

%% What I have done
%%
The four main static failure mode analysis methodologies were examined and
assessed in the context of newer European safety standards.
Some of the deficiencies identified in these methodologies led to
a wish list for a more ideal methodology.

%% What I have found
%%
From this wish list, and from constraints determined by
the evaluation of the four established methodologies, a new
methodology is developed and proposed. This has been named Failure Mode Modular De-Composition (FMMD).

%% Sell it
%%
In addition to addressing the traditional weaknesses of
Fault Tree Analysis (FTA), Failure Mode Effects Analysis (FMEA), Failure Mode Effects Criticality Analysis (FMECA)
and Failure Mode Effects and Diagnostic Analysis (FMEDA), FMMD provides the means to model multiple failure mode scenarios
as specified in newer European Safety Standards \cite{en298}.
The proposed methodology is bottom-up and can guarantee that no component failure mode is left unhandled.
It is also modular, meaning that the results of analysed components may be re-used in other projects.

}


\section{Current Static Failure Mode Methodologies}

There are four methodologies in common use for failure mode modelling.
These are FTA, FMEA, FMECA
and FMEDA (a form of statistical assessment).

These methodologies date from the 1940s onwards; their drawbacks and
advantages are discussed in the next section.
%In short
%FTA, due to its top down nature, can overlook error conditions. FMEA and the Statistical Methods
%lack precision in predicting failure modes at the SYSTEM level.


Failure Mode Modular De-composition
(FMMD) aims to address the
weaknesses in these methodologies and to add
features such as the ability to analyse double
failure mode scenarios and the modular re-use
of analysis.

%FMMD is an incremental bottom up FMEA process.
The FMMD
methodology presented here creates a more complete and detailed hierarchical failure mode model, from which
the data models of FTA, FMEA, FMECA and FMEDA (the statistical approach) can be
derived if required. An FMMD model is therefore a superset of all these models.
It also applies rigorous checking at every analysis stage,
ensuring that all component failure modes are considered in the model.

%
This methodology has been named Failure Mode Modular De-composition (FMMD)
because it de-composes a SYSTEM into a hierarchy of modules or {\dc}s.
This
\ifthenelse {\boolean{paper}}
{
paper
}
{
chapter
}
presents the design considerations that determined
the FMMD methodology.
It first briefly reviews the four traditional
static failure mode analysis methodologies and
lists their known weaknesses. A wish list is then drawn up,
addressing these weaknesses and adding some extra requirements.
Using this wish list, the philosophy for the new methodology
is built up.
%
FMMD works from the bottom up, taking small groups
of components, {\fgs}, and analysing how they can fail.
This analysis is performed using FMEA from a micro rather than a macro perspective.
Thus instead of looking at component failure modes and determining how
they {\em may} cause a failure at SYSTEM level, we look at how
they {\em will} affect the {\fg}.
When we know the failure modes of a {\fg} we can treat it as a `black box'
or {\dc}. With {\dc}s we can build {\fgs}
at higher levels of analysis, until we have a complete
hierarchy representing the failure behaviour of the SYSTEM.
%
Because all the failure modes of all the components
are held in a computer program, we can determine whether the model is complete
(i.e. all component failure modes have been included in the model).


%OK need to describe the need for it
\section{The need for a new failure mode modelling methodology}

%%- There are dificulties with bot up methodologies,
%%- and this is in part due to the fact that accidents
%%- are always unforseen and unexpected.

%%- what do we have ENV factors, component failure modes.

%%- how difficult is it to take a single component failure mode and
%%- then from that determine how it will react with other components
%%- and how it will be affected

\subsection{General Comments on bottom-up and top-down approaches}

\paragraph{A general deficiency in top-down systems analysis}
With a top-down approach the investigator has to determine
a set of undesirable outcomes or accidents.
As most accidents are unexpected and their causes unforeseen \cite{safeware},
it is fair to say that a top-down approach is not guaranteed to
predict all possible undesirable outcomes.
It can also miss known component failure modes,
simply by not de-composing down to the base component failure mode level of detail.

\paragraph{A general problem with bottom-up}
With the bottom-up techniques we have all the known component failure modes
and the freedom to determine how each of these may affect the SYSTEM.
We do, however, have a real problem in determining how
the failure mode of one component will affect another working component
to cause an undesirable state. Because the number of components
that one failure mode may interact with is large,
we cannot consider them all, and human judgement is used to
decide which interactions are important.

Let $N$ be the number of components in our system, and $K$ be the average number of component failure modes
(ways in which the component can fail). The total number of base component failure modes
is $N \times K$. To examine the effect that each of these failure modes has on all the other components
requires $(N-1) \times N \times K$ checks, in effect a set cross product.


Applied states or environmental conditions complicate this further,
adding another cross product factor.
We may, for instance, have a piece of self-checking circuitry that
has two states, normal and testing mode, commanded by a logic line.
Or we may have a mechanical device that has a different
failure mode behaviour for, say, different ambient pressures or temperatures.

If $E$ is the number of applied states or environmental conditions to consider
in a system, the job of the bottom-up analyst is complicated by a further cross product factor, giving
$(N-1) \times N \times K \times E$ checks.
If we put some typical very small embedded system numbers\footnote{These figures would
be typical of a very simple temperature controller, with a micro-controller, sensor and heater circuit.} into this, say $N=100$, $K=2.5$ and $E=10$,
we have $99 \times 100 \times 2.5 \times 10 = 247500$.
To look in detail at a quarter of a million test cases is obviously impractical.

If we were to consider multiple simultaneous failure modes
we would add yet another cross product factor.

For instance, for double simultaneous failure modes,
the equation reads $(N-2) \times (N-1) \times N \times K \times E$.
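
As a hedged illustration, substituting the same assumed figures ($N=100$, $K=2.5$ and $E=10$)
into this expression shows the scale of a double failure analysis:

$$ (N-2) \times (N-1) \times N \times K \times E = 98 \times 99 \times 100 \times 2.5 \times 10 = 24255000 $$

This is 98 times the single failure figure, or over 24 million test cases.
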

The bottom-up methodologies FMEA, FMECA and FMEDA take single failure modes and link them
to SYSTEM level failure modes. Because of the astronomical number of possible interactions,
some valid ones are in danger of being missed. We can term this analysis a `leap of faith' from the
component failure mode to the SYSTEM level.


\paragraph{Ideal static failure mode methodology}
An ideal static failure mode methodology would build a failure mode model
from which the traditional four models could be derived.
It would address the shortcomings of the other methodologies, and
would have a user-friendly interface, with a visual (rather than mathematical/formal) syntax and icons
to represent the results of analysis phases.
%
%There are four static analysis failure mode methodologies in common use.
%Each has its advantages and drawbacks, and each is suited for
%a different phase in the product life cycle.
The four methodologies in current use are discussed briefly below.

\subsection{FTA}

FTA, like all top-down methodologies, introduces the very serious problem
of missing component failure modes \cite[Ch.9]{faa}.
%, or modelling at
%a too high level of failure mode abstraction.
FTA was invented for use on the Minuteman nuclear defence missile
systems in the early 1960s and was not designed as a rigorous
fault/failure mode methodology.
It was designed to look for disastrous top level hazards and
determine how they could be caused.
It is more like a structure to
be applied when discussing the safety of a system, with a top-down hierarchical
notation, using logic symbols, that guides the analysis.
This methodology was designed for
experienced engineers sitting around a large diagram and discussing the safety aspects.
Also, the nature of a large rocket with red wire and remote detonation
failsafes meant that the objective was to iron out common failures,
not to rigorously detect all possible failures.
Consequently it was not designed to guarantee coverage of all component failure modes,
and has no rigorous in-built safeguards to ensure coverage of all possible
system level outcomes.

\subsubsection{FTA weaknesses}
\begin{itemize}
\item Possibility to miss component failure modes.
\item Possibility to miss environmental effects.
\item No possibility to model base component level double failure modes.
\end{itemize}

\subsection{FMEA}

\label{pfmea}
FMEA is an early static analysis methodology that concentrates
on SYSTEM level errors which have been investigated.
The investigation will typically point to a particular failure
of a component.
The methodology is then applied to find the significance of the failure.
It is based on a simple equation where $S$ ranks the severity (or cost \cite{bfmea}) of the identified SYSTEM failure,
$O$ its occurrence, and $D$ the failure's detectability. Multiplying these
together
gives a risk priority number (RPN), given by $RPN = S \times O \times D$.
This gives in effect
a prioritised `todo list', with the higher $RPN$ values being the most urgent.
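
As an illustrative sketch only, assume each factor is ranked on a scale of 1 to 10
(a common convention, but an assumption here, as are the two failures that follow).
A failure with $S=8$, $O=3$ and $D=5$ gives

$$ RPN = 8 \times 3 \times 5 = 120 $$

while a failure with $S=4$, $O=6$ and $D=2$ gives

$$ RPN = 4 \times 6 \times 2 = 48 $$

so the first failure would be placed higher on the list, even though it occurs less often.
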


\subsubsection{FMEA weaknesses}
\begin{itemize}
\item Possibility to miss the effects of failure modes at SYSTEM level.
\item Possibility to miss environmental effects.
\item No possibility to model base component level double failure modes.
\end{itemize}

\paragraph{Note.} FMEA is sometimes used in its literal sense, that is to say
Failure Mode Effects Analysis, simply looking at a system's internal failure
modes and determining what may happen as a result.
FMEA as described in this section (\ref{pfmea}) is sometimes called `production FMEA'.

\subsection{FMECA}

Failure mode, effects, and criticality analysis (FMECA) extends FMEA.
This is a bottom-up methodology, which takes component failure modes
and traces them to the SYSTEM level failures.
%
Reliability data for components is used to predict the
failure statistics in the design stage.
An openly published source for the reliability of generic
electronic components was issued by the DOD
in 1991 (MIL-HDBK-217F \cite{mil1991}) and is a typical
source for MTTF data.
%
FMECA does this using probability\footnote{For a given component failure mode there will be a $\beta$ value, the
probability that the component failure mode will cause a given SYSTEM failure.}.
%
This lacks precision, or in other words, determinability prediction accuracy \cite{fafmea},
as often the component failure mode cannot be proven to cause a SYSTEM level failure, but
is instead assigned a probability factor $\beta$ by the design engineer.
%Also, it can miss combinations of failure modes that will cause SYSTEM level errors.
%
The result, as with FMEA, is an $RPN$ value indicating the significance of the SYSTEM fault.

%%-WIKI- Failure mode, effects, and criticality analysis (FMECA) is an extension of failure mode and effects analysis (FMEA).
%%-WIKI- FMEA is a a bottom-up, inductive analytical method which may be performed at either the functional or
%%-WIKI- piece-part level. FMECA extends FMEA by including a criticality analysis, which is used to chart the
%%-WIKI- probability of failure modes against the severity of their consequences. The result highlights failure modes with relatively high probability
%%-WIKI- and severity of consequences, allowing remedial effort to be directed where it will produce the greatest value.
%%-WIKI- FMECA tends to be preferred over FMEA in space and North Atlantic Treaty Organization (NATO) military applications,
%%-WIKI- while various forms of FMEA predominate in other industries.


\subsubsection{FMECA weaknesses}
\begin{itemize}
\item Possibility to miss the effects of failure modes at SYSTEM level.
\item Possibility to miss environmental effects.
\item No possibility to model base component level double failure modes.
\end{itemize}


\subsection{FMEDA or Statistical Analysis}

Failure Modes, Effects, and Diagnostic Analysis (FMEDA)
is a process that takes all the components in a system.
From the failure modes of those components, the investigating engineer
must tie them to possible SYSTEM level events/failure modes.
This technique
also evaluates the product's self-diagnostic ability.
The calculations and procedure for FMEDA are
described in EN61508 Part 2 Appendix C \cite[Part 2 App C]{en61508}.
The following gives an outline of the procedure.

\paragraph{FMEA}
The first stage is to apply FMEA to the SYSTEM.
Within the product all failure rates of individual
components contribute to the overall product failure rate.
Failure rates of individual components in the SYSTEM
are calculated based on component type and
environmental conditions.

\paragraph{Overall SYSTEM failure rate}
Product failure rate is the sum of all component
failure rates. This is the sum of safe and unsafe
failures.

\paragraph{Self Diagnostics}
We next evaluate the SYSTEM's self-diagnostic ability.

Each component's failure mode and its failure rate are listed.
Failure modes are classified as safe or dangerous\footnote{Again this is taking a component failure mode, determining
how it will react with any other components in the SYSTEM, and making a decision
based on heuristics.}.
Dangerous failures are labelled `$\lambda_D$' and safe failures `$\lambda_S$' by EN61508.

\paragraph{Determine Detectable and Undetectable Failures}
Each safe and dangerous failure mode is determined as detectable or undetectable by the SYSTEM's
self-checking features.
%
The result is a list of all components, their failure modes, the failure mode classification
as Safe-Detected (SD), Safe-Undetected (SU), Dangerous-Detected (DD) or Dangerous-Undetected (DU),
and the failure rate of each classification using the failure rate
prediction results ($\lambda_{SD}$, $\lambda_{SU}$, $\lambda_{DD}$, $\lambda_{DU}$).

Because some failure modes may not be discovered by theoretical analysis, the
next step is to investigate using an actual working SYSTEM.
This requires the deliberate introduction
of failures; any new failures discovered at this stage are classified as
$\lambda_{SD}$, $\lambda_{SU}$, $\lambda_{DD}$ or $\lambda_{DU}$
and added to the result set.
%SD, SU, DD, DU.

\paragraph{Diagnostic Coverage.}
The diagnostic coverage is simply the ratio
of the dangerous detected failure rate
to the failure rate of all dangerous failures,
and is normally expressed as a percentage.

$$ DiagnosticCoverage = \Sigma\lambda_{DD} / \Sigma\lambda_D $$

The diagnostic coverage for safe failures is given as

$$ SF = \frac{\Sigma\lambda_{SD}}{\Sigma\lambda_S} $$


\paragraph{Safe Failure Fraction.}
A key concept in FMEDA is Safe Failure Fraction (SFF).
This is the ratio of safe failures plus dangerous detected failures
to all failures, safe and dangerous.
Again this is usually expressed as a percentage.

$$ SFF = \big( \Sigma\lambda_S + \Sigma\lambda_{DD} \big) / \big( \Sigma\lambda_S + \Sigma\lambda_D \big) $$
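
As a worked sketch of these two ratios, assume a hypothetical product whose summed failure rates,
in failures per $10^9$ hours, are $\Sigma\lambda_{SD}=40$, $\Sigma\lambda_{SU}=10$, $\Sigma\lambda_{DD}=45$ and $\Sigma\lambda_{DU}=5$.
These figures are invented for illustration.
Taking $\Sigma\lambda_S = \Sigma\lambda_{SD} + \Sigma\lambda_{SU} = 50$ and $\Sigma\lambda_D = \Sigma\lambda_{DD} + \Sigma\lambda_{DU} = 50$, we have

$$ DiagnosticCoverage = 45 / 50 = 0.90 $$

$$ SFF = \big( 50 + 45 \big) / \big( 50 + 50 \big) = 0.95 $$

that is, a diagnostic coverage of 90\% and a safe failure fraction of 95\%.
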
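
\paragraph{Calculate SFF, SIL and PFD.}
The SIL level of the product is finally determined from the Safe Failure Fraction (SFF)
and the Probability of Failure on Demand (PFD). The following formulas are used.

$$ SFF = \big( \Sigma\lambda_{SD} + \Sigma\lambda_{SU} + \Sigma\lambda_{DD} \big) / \big( \Sigma\lambda_{SD} + \Sigma\lambda_{SU} + \Sigma\lambda_{DD} + \Sigma\lambda_{DU} \big) $$

$$ PFD = \Sigma\lambda_{DU} \times \frac{T_{proof}}{2} + \Sigma\lambda_{DD} \times T_{repair} $$

Here $T_{proof}$ is the proof test interval and $T_{repair}$ is the down time or repair time.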
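
As a worked sketch of the PFD formula, assume a dangerous undetected failure rate of
$5 \times 10^{-9}$ per hour, a dangerous detected failure rate of $45 \times 10^{-9}$ per hour,
a proof test interval of $8760$ hours (one year) and a repair time of $8$ hours.
All four figures are hypothetical.

$$ PFD = \big( 5 \times 10^{-9} \big) \times \frac{8760}{2} + \big( 45 \times 10^{-9} \big) \times 8 = 2.19 \times 10^{-5} + 3.6 \times 10^{-7} = 2.226 \times 10^{-5} $$

With these assumed values the undetected term dominates, because an undetected fault can persist
for up to a whole proof test interval, while a detected fault persists only until it is repaired.
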

% Often a given component failure mode there will be a $\beta$ value, the
% probability that the component failure mode will cause a given SYSTEM failure.

\paragraph{Risk Mitigation}

A component may have its risk factor
reduced by the checking interval ($\tau$, the time between self-checking procedures).

Ultimately this technique calculates a risk factor for each component.
The risk factors of all the components are summed to
give a value for the `safety level' of the equipment in a given environment.

\paragraph{Classification into Safety Integrity Levels (SIL).}
There are four SIL levels, from 1 to 4, with 4 being the highest safety level.
In addition to probabilistic risk factors, the
diagnostic coverage and SFF
have threshold bands that become stricter for each level.
Software techniques and constraints
also become stricter for each SIL level.

FMEDA uses MTTF and other statistical models to determine the probability of
failures occurring, and to provide an adequate risk level.
%
%A component failure mode, given its MTTF
%the probability of detecting the fault and its safety relevant validation time $\tau$,
%contributes a simple risk factor that is summed
%in to give a final risk result.
%
Thus a statistical
model can be implemented on a spreadsheet, where each component
has a calculated risk, a fault detection time (if any), an estimated risk importance
and other factors such as de-rating and environmental stress.
With one component failure mode per row, the risk for each row is calculated
and the rows are summed to give the final assessment figure.

\subsubsection{Two statistical perspectives}
The Statistical Analysis method is used from two perspectives:
Probability of Failure on Demand (PFD), and probability of failure
in continuous operation, measured as Failure in Time (FIT).
\paragraph{Failure in Time (FIT).}

Continuous operation is measured in failures per billion ($10^9$) hours of operation.
For a continuously running nuclear power station
we would be interested in its operational FIT values.
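
As an illustrative conversion, assume a hypothetical component rated at 100 FIT and a constant
failure rate (both assumptions made for this sketch). Its failure rate per hour and its mean time to failure are then

$$ \lambda = 100 \times 10^{-9} = 10^{-7} \mbox{ per hour}, \qquad MTTF = 1 / \lambda = 10^{7} \mbox{ hours} $$

which is roughly 1140 years of continuous operation.
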

\paragraph{Probability of Failure on Demand (PFD).}
For the anti-lock system on an automobile braking
system, for instance, we would be interested in PFD,
that is to say the proportion of demands on which
it fails to operate.

\subsubsection{FMEDA and determinability prediction accuracy}
FMEDA suffers from the same
lack of determinability prediction accuracy as FMEA above.
%
We have to decide how particular components failing will impact on the SYSTEM or top level.
This involves a `leap of faith'. For instance, a resistor failing in a sensor circuit
may be part of a critical monitoring function.
The analyst is now put in a position
where he must assign a critical failure possibility to it.
%
There is no analysis
of how that resistor would/could affect that circuit, but because the circuitry
it belongs to is part of a critical section, it will be linked to a critical system level fault.
%
A $\beta$ factor, the heuristically defined probability
of the failure causing the system fault, may be applied.
%
But because there is no detailed analysis of the failure mode behaviour
of the component, traceable to the SYSTEM level, it becomes more
guess work than science.
With FMEDA, there is no rigorous cause and effect analysis for the failure modes. Unintended side
effects that lead to failure can be missed.

Thus we may have the MTTF of some critical component failure
modes, but in most cases we can only guess what the safety case outcome
will be if they occur.

This leads to having components within a SYSTEM partitioned into different
safety level zones \cite{en61508}. This is a vague way of determining
safety.

The Statistical Analysis methodology is the core philosophy
of the Safety Integrity Levels (SIL) embodied in EN61508 \cite{en61508}
and its international analogue standard IEC61508.



\subsubsection{FMEDA weaknesses}
\begin{itemize}
\item Possibility to miss the effects of failure modes at SYSTEM level.
\item Statistical nature allows a proportion of undetected failures for a given SIL level.
\item Allows a small proportion of `undetectable' error conditions.
\item No possibility to model base component level double failure modes.
\end{itemize}
%AND then how we can solve all there problems

\section{A wish list for a failure mode methodology}
\begin{itemize}
\item All component failure modes must be considered in the model.
\item It should be easy to integrate mechanical, electronic and software models \cite[pp.287]{sccs}.
\item It should be re-usable, in that commonly used modules can be re-used in other designs/projects.
\item It should have a formal basis, that is to say, it should be able to produce mathematical proofs
for its results, such as system level error causation trees, reliability and safety statistics.
\item It should be easy to use, ideally using a graphical syntax (as opposed to a formal mathematical one).
\item From the top down, the failure mode model should follow a logical de-composition of the functionality
to smaller and smaller functional modules \cite{maikowski}.
\item Multiple failure modes may be modelled from the base component level up.
\end{itemize}


\section{Design of a new static failure mode based methodology}

\paragraph{New methodology must be bottom-up}
In order to ensure that all component failure modes have been covered,
the methodology will have to work from the bottom up
and start with the component failure modes.
%
\paragraph{Natural Fault Finding is top down}
Traditional, or natural, fault finding
works from the top down.
%
On encountering a
fault, the symptom is first observed at the top or
SYSTEM level. By de-composing the functionality of the faulty system and testing,
we can further de-compose the system until we find the
faulty base level component.
De-composition of electrical circuits is formalised and explored
in \cite{maikowski}. This top-down technique de-composes by functionality.
Simpler and simpler functional blocks are discovered as we delve
further into the way the system works and is built.

\paragraph{Design Decision: Methodology must be bottom-up.}
In order to ensure that all component failure modes are handled,
this methodology must start at the bottom, with base component failure modes.
In this way automated checking can be applied to all component failure modes
to ensure none have been inadvertently excluded from the process.

\paragraph{Need for a `bottom-up' system de-composition}
There is an apparent conflict here. The natural way to
de-compose a system is from the top down.
%
What is required here is to mimic this top-down de-composition
with a bottom-up technique.

By taking components that form {\fg}s from the bottom up,
and then taking those to form higher level
{\fg}s, we can get a close approximation of the de-composition process from the bottom up.
The philosophy of top-down de-composition is very similar.
Top-down de-composition applies functional
de-composition, because it seeks to break the system down
into manageable and separately testable entities.
A second justification for this is that the design process for a product requires both top-down and bottom-up
thinking. To analyse a system from the bottom up is a useful
design validation process in itself \cite{sommerville}.


\paragraph{Problem with functional group hierarchy}
A hierarchy of functional grouping, leading to a system model,
still leaves us with the problem of the number of component failure modes.
The base components will typically have several failure modes each.
%
Given that a typical embedded system may have hundreds of components,
this means that we still have to tie a large number of base component failure modes
to SYSTEM level errors. This is the `possibility to miss failure mode effects
at SYSTEM level' criticism of the FMEA, FMECA and FMEDA methodologies.

\paragraph{Design Decision: Methodology must reduce and collate errors at each functional group stage.}
SYSTEMS typically have far fewer failure modes than the sum of their component failure modes.
SYSTEM level failures may be caused by a variety of component failure modes.
A SYSTEM level failure mode is an abstracted failure mode, in that
it is a symptom of some lower level failure or failures.
% ABSTRACTION
For instance, a failed resistor in a sensor at a base component level is a specific
failure mode.
%
For example it could be called `RESISTOR 1 OPEN'.
Its symptom in a functional group comprising the sensor channel that reads from it may be more abstract,
or in other words describe the effect more generally.
%
We might call it `READING~HIGH' perhaps. At a higher level still
this may be called a `SENSOR CHANNEL 1' fault.
At a system level it may simply be a `SENSOR FAILURE'.
As we traverse up the fault tree the failure modes
become more abstract.
%
At each functional group collection, there must be a process to collect
common symptoms and reduce the number of failure modes to handle.
This must be a process that incrementally reduces the number
of failure modes as the abstraction level reaches the SYSTEM level.

\paragraph{How to build a meaningful SYSTEM failure behaviour model.}
The next problem is how to build a failure mode model
that converges to a finite set of SYSTEM level failure modes.
%
It would be better to analyse the failure mode behaviour of each
functional group, and determine the ways in which it, rather than its
components, can fail.
%
By doing this, the symptoms of the {\fg},
each of which can potentially be caused by more than one
component failure mode, become the natural means of reducing the number
of failure modes to handle as we traverse up the hierarchy.


\paragraph{Component failures and {\fg} failure symptoms.}
In other words, we want to find out what the symptoms of the failures in the {\fg}s
are.
The number of symptoms of failure should be equal to or
less than the number of component failure modes, simply because
a failure symptom often has several potential causes.
%
When we have the symptoms, we can start thinking of the {\fg} as a component in its own right.
%with a simplified and reduced set of failure symptoms.
%
We can now create a new {\dc}, whose failure modes
are the failure symptoms of the {\fg}.
In this way, as we build the hierarchy, we naturally abstract the
failure mode behaviour, but can check that all failure modes in
the hierarchy have been considered and tied to the symptoms they cause.


\paragraph{Incremental Stages and \dcs.}
We can use incremental stages to build the hierarchy.
We can take small {\fg}s of components, where the {\fg}
is a small set of components that perform a simple
task.
%
This should be small enough for us to consider all the failure
modes of its components.
%
We can consider these failure modes from the perspective
of the {\fg}. In other words, for each component failure mode in the {\fg},
we create a `test case' and decide how that failure affects the {\fg}.
%
The results of the test cases give us the ways in which the
{\fg} can fail.
%
%
We can refine this further by grouping the common symptoms, that is, results that
are the same failure w.r.t. the {\fg}.
%
We can now treat the {\fg} as a component and call it a {\dc}: in other words, a sub-system with a known set of failure modes.
%
We create this new {\dc} and assign it the common symptoms
as its failure modes.
%
This {\dc} can be used to build higher level
{\fg}s, and this will naturally form a hierarchy.
This hierarchy can be extended until it encompasses
an entire system. It can be considered complete when
all failure modes from all components are handled
and connectable to a SYSTEM level failure mode.
|
||
|
||
\paragraph{Directed Acyclic Graph.} The hierarchy will naturally form a directed acyclic graph (DAG),
meaning that for every SYSTEM failure mode we will be able to trace
back through the DAG to its possible component failure mode causes.
If statistical models exist for the component failure modes,
these failure causation trees (or minimal cut sets \cite{nucfta})
can be used to calculate Mean Time to Failure (MTTF) or Probability of Failure on Demand (PFD) figures.
%
Because common symptoms are being collected, as we build the tree upward
the number of failure modes decreases (or exceptionally stays the same) at each level.
%
This decrease in the number of failure modes is borne out {\irl}.
Of the thousands of component failure modes in a typical product,
there are generally only a handful of SYSTEM level failure modes
(or top level `symptoms' of underlying failures).
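
As a hedged illustration of how such a figure might be derived, assume (these assumptions are not made
in the text above) that a SYSTEM failure mode $s$ is caused by any one of $n$ independent component
failure modes, each with a constant failure rate $\lambda_i$. Under those assumptions
\[
\lambda_s = \sum_{i=1}^{n} \lambda_i , \qquad \mathrm{MTTF}_s = \frac{1}{\lambda_s} .
\]
For example, three causes with purely illustrative rates of $2 \times 10^{-7}$, $3 \times 10^{-7}$ and
$5 \times 10^{-7}$ failures per hour give $\lambda_s = 10^{-6}$ failures per hour and
$\mathrm{MTTF}_s = 10^{6}$ hours.
Causes that require several component failure modes to be active at once do not add in this simple way
and need a separate treatment.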
%

\subsection{Outline of the FMMD process}
\label{fmmdproc}
FMMD builds {\fg}s of components from the bottom up.
The lowest level components are termed base components.
These are the initial building blocks.
In electronics these would be the individual
passive and active components on the parts~list.
In mechanics these would be the levers, springs, cogs etc.
Functional groups are collections of components
that work together to perform a simple function.
%
We can perform a failure mode effects analysis on each of the component failure
modes within the {\fg}. We can thus ensure that all component failure modes
are covered.
%
We can then treat the {\fg} as a `black box', or component in its own right,
and look at how the {\fg} can fail.
%
Many of the component failure modes will
cause the same failure symptoms in the {\fg}.
We can collect these failures as common symptoms.
%
When we have our set of symptoms, we can create
a {\dc}. The {\dc} will have as its set of failure
modes the collected symptoms of the {\fg}.
%
Because we now have {\dcs}, we can use these to form
new {\fg}s, and so build a hierarchical `failure~mode' model of the SYSTEM.

Figure \ref{fig:fmmd_hierarchy} shows one stage
of the FMMD process. The resultant {\dc} may be used to
create higher level {\fg}s in later stages.
\begin{figure}[h]
 \centering
 \includegraphics[width=200pt,bb=0 0 331 249,keepaspectratio=true]{./fmmd_concept/fmmd_hierarchy.jpg}
 % fmmd_hierarchy.jpg: 331x249 pixel, 72dpi, 11.68x8.78 cm, bb=0 0 331 249
 \caption{Example derived component created from the functional group comprising components a, b and c}
 \label{fig:fmmd_hierarchy}
\end{figure}
\subsection{Environmental Conditions, Operational States and FMMD}

Any real world sub-system will exist in a variable environment and may have several modes of operation.
In order to find all possible failures, the sub-system must be analysed for each operational state
and environmental condition that can affect it.
Two design decisions are required here: to which objects should the
environmental conditions be applied, and to which the operational states?
We could apply these conditions for analysis
to the functional group, the components, or the derived
component.

\paragraph{Environmental Conditions and FMMD.}
Environmental conditions are external to the
{\fg} and are often things that the system has no direct control over.
Consider ambient temperature, pressure or even electrical interference levels.

Environmental conditions may affect different components in a {\fg}
in different ways.

For instance, a system may be specified for
0\,$^{\circ}$C to 85\,$^{\circ}$C operation, but some components
may show failure behaviour between 60\,$^{\circ}$C and 85\,$^{\circ}$C%
\footnote{Opto-isolators typically show a marked decrease in performance above
60\,$^{\circ}$C, whereas another common component, the resistor, will be unaffected.}.
Environmental conditions will have an effect on the {\fg} and the {\dc},
but they will have specific effects on individual components.

\paragraph{Design Decision.}
Environmental constraints will be applied to components.
A component will hold a set of environmental states that
affect it.
Environmental conditions will apply SYSTEM wide,
but may only affect specific components.
%Some may not be required for consideration
%for the analysis of particular systems.

\paragraph{Operational States and FMMD.}
Sub-systems may have specific operational states.
These could reflect a general health level, such as
normal operation, graceful degradation or lockout.
Alternatively, a self~checking sub-system may be in either a normal or a self~check state.

Operational states are conditions that apply to a functional group, not to individual components.

\paragraph{Design Decision.}
Operational states will be applied to {\fg}s.

\paragraph{UML Model of FMMD Analysis.}
Figure \ref{fig:env_op_uml} shows a UML model of the components and the functional group,
with the ENV and OP\_STAT classes associated with them.

\begin{figure}[h]
 \centering
 \includegraphics[width=400pt,bb=0 0 818 249,keepaspectratio=true]{./fmmd_concept/fmmd_env_op_uml.jpg}
 % fmmd_env_op_uml.jpg: 818x249 pixel, 72dpi, 28.86x8.78 cm, bb=0 0 818 249
 \caption{UML model of Environmental and Operational states w.r.t.\ FMMD}
 \label{fig:env_op_uml}
\end{figure}

\subsection{Justification of wishlist}

Given the process outlined in section \ref{fmmdproc}, the proposed FMMD methodology can
now be evaluated against the wishlist.

\subsubsection{All component failure modes must be considered in the model.}
The proposed methodology is bottom-up: analysis starts from the failure modes of every base component.
This ensures that all component failure modes are handled.

\subsubsection{It should be easy to integrate mechanical, electronic and software models.}
Because component failure modes are considered, we have a generic entity to model.
We can describe a mechanical, electrical or software component in terms of its failure modes.
%
Because of this,
we can model and analyse integrated electro-mechanical systems, controlled by computers,
using a common notation.

\subsubsection{It should be re-usable, in that commonly used modules can be re-used in other designs/projects.}
The hierarchical nature of the model, taking {\fg}s and deriving components from them, means that
commonly used {\dcs} can be re-used within a design (for instance self checking digital inputs)
or even in other projects where the same {\dc} is used.

\subsubsection{It should have a formal basis, that is to say, it should be able to produce mathematical proofs
for its results.}
Because the failure mode model of a SYSTEM is a hierarchy of {\fg}s and {\dcs},
SYSTEM level failure modes are traceable back down the tree to
component level failure modes. This provides causation trees \cite{sccs}, or minimal cut sets%
\footnote{Here minimal cut sets represent combinations of component failure modes that can result in a SYSTEM level failure.},
for all SYSTEM failure modes.

\subsubsection{It should be capable of producing reliability and danger evaluation statistics.}
For the minimal cut sets of the SYSTEM level failures, MTTF
and danger evaluation statistics can be computed from the component failure mode statistics \cite{mil1991}.

\subsubsection{It should be easy to use, ideally using a graphical syntax (as opposed to a formal mathematical one).}
A modified form of constraint diagram (an extension of Euler diagrams) has been developed to support the FMMD methodology.
This uses Euler circles to represent failure modes, and spiders to collect symptoms, in order to
advance a {\fg} to a {\dc}.

\subsubsection{From the top down the failure mode model should follow a logical de-composition of the functionality
to smaller and smaller functional modules \cite{maikowski}.}
The bottom-up approach fulfils the logical de-composition requirement, because the {\fg}s
are built from components performing a given task.

\subsubsection{Multiple failure modes may be modelled from the base component level up.}
By breaking the problem of failure mode analysis into small stages
and building a hierarchy, the problems associated with the cross products of
all failure modes within a system are greatly reduced, because combinations of
failure modes need only be considered within each small {\fg}.
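
As a rough worked illustration (the figures are hypothetical and chosen only for convenient arithmetic),
consider a system with 1000 component failure modes in total.
Checking every pair of failure modes across the whole system means examining
\[
\frac{1000 \times 999}{2} = 499\,500
\]
pairs. If instead the components are arranged into 100 {\fg}s containing 10 failure modes each,
the pairs within one {\fg} number $\frac{10 \times 9}{2} = 45$, giving
\[
100 \times 45 = 4\,500
\]
pairs at the lowest level of the hierarchy.
Higher levels add further checks, but these are carried out on the smaller sets of symptoms
collected into each {\dc}.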

\subsection{Advantages of FMMD Methodology}

\begin{itemize}
 \item It can be checked automatically that all component failure modes have been considered in the model.
 \item Because we model with failure modes, the {\fgs} and {\dcs} can be generic, i.e.\ they may represent mechanical, electronic or software components.
 \item The {\dcs} are re-usable, in that commonly used modules can be re-used in other designs/projects.
 \item It has a formal basis, that is to say, it is able to produce mathematical proofs
 for its results (MTTF and the cause trees for SYSTEM level faults).
 \item Overall reliability and danger evaluation statistics can be computed.
 By knowing all causation trees,
 the statistical probabilities (from base component data) for all causes can be simply added.
 \item A graphical representation based on Euler diagrams is used, providing an interface that does not involve
 formal mathematical notation. This is intended to be user friendly and to guide the user through the FMMD process
 while applying automatic checks for unhandled conditions.
 \item From the top down the failure mode model will follow a logical de-composition of the functionality; by
 choosing {\fg}s and working bottom-up this hierarchical trait will occur as a natural consequence.
 \item Undetectable or unhandled failure modes will be specifically flagged.
 \item It is possible to model multiple failure modes.
\end{itemize}

\section{Conclusion}

This paper provides the background to the need for a new methodology for
static analysis that can span the mechanical, electrical and software domains
using a common notation.
The author believes that the proposed methodology addresses many shortcomings in current static failure mode analysis methodologies.
\vspace{60pt}
\today