\ifthenelse {\boolean{paper}}
{
\abstract{ This paper proposes a methodology for
creating failure mode models of safety critical systems, which
has a common notation that can be integrated across
the mechanical, electronic and software domains.
In addition, the methodology addresses the traditional weaknesses of
Fault Tree Analysis (FTA), Failure Mode Effects Analysis (FMEA)
and Failure Mode Effects and Diagnostic Analysis (FMEDA).
The proposed methodology is bottom-up and
modular.}
}
{}
\section{Introduction}
There are four methodologies in common use for failure mode modelling.
These are FTA, FMEA, FMECA
and FMEDA (a form of statistical analysis).
These methodologies date from the 1940s onwards and have several drawbacks.
%In short
%FTA, due to its top down nature, can overlook error conditions. FMEA and the Statistical Methods
%lack precision in predicting failure modes at the SYSTEM level.
The Failure Mode Modular De-composition
(FMMD) methodology aims to address the
weaknesses in these methodologies and to add
features such as the ability to analyse double
failure mode scenarios, and to allow modular re-use
of analysis.
The FMMD
methodology presented here provides a more detailed and analytical
modelling system, which creates a more complete and detailed hierarchical failure mode model from which
the data models from FTA, FMEA and the statistical approach can be
derived if required.
It also applies rigorous checking in the analysis stages,
ensuring that all component failure modes are considered in the model.
This
\ifthenelse {\boolean{paper}}
{
paper
}
{
chapter
}
presents a bottom-up modular methodology, an extension and refinement of FMEA, where instead of looking
at individual component failure modes and deciding on their impact on the SYSTEM,
it uses the component failure modes to build modules or derived components,
using incremental steps to build a hierarchical model.
%
This methodology has been named Failure Mode Modular De-composition (FMMD)
because it de-composes a SYSTEM into a hierarchy of modules or {\dc}s.
%
It does this by working from the bottom up, taking small groups
of components, {\fgs}, and then analysing how they can fail.
This analysis is performed using FMEA from a micro rather than a macro perspective.
Thus instead of looking at component failure modes and determining how
they {\em may} cause a failure at SYSTEM level, we are looking at how
they {\em will} affect the {\fg}.
When we know the failure modes of a {\fg} we can treat it as a `black box'
or {\dc}. With {\dc}s we can build {\fgs}
at higher levels of analysis, until we have a complete
hierarchy representing the failure behaviour of the SYSTEM.
%
Because all the failure modes of all the components
are held in a computer program, we can determine if the model is complete
(i.e. all component failure modes have been included in the model).
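
The completeness check can be sketched in set notation. The symbols below are introduced here for illustration and are not defined elsewhere in this text: let $C$ be the set of base components, $fm(c)$ the set of failure modes of a component $c$, and $A$ the set of component failure modes that have been analysed in at least one {\fg}. Under these assumptions the model is complete when
\[
\left( \bigcup_{c \in C} fm(c) \right) \setminus A = \emptyset .
\]
Any failure mode remaining in the difference is one that the analysis has not yet considered.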
%OK need to describe the need for it
\section{The need for a new failure mode modelling methodology}
\paragraph{Ideal static failure mode methodology}
An ideal static failure mode methodology would build a failure mode model
from which the traditional four models could be derived.
It would address the shortcomings of the other methodologies, and
would have a user-friendly interface, with a visual (rather than mathematical/formal) syntax with icons
to represent the results of analysis phases.
%
%There are four static analysis failure mode methodologies in common use.
%Each has its advantages and drawbacks, and each is suited for
%a different phase in the product life cycle.
The four methodologies in current use are discussed briefly below.
\subsection { FTA }
This, like all top-down methodologies, introduces the very serious problem
of missing component failure modes \cite[Ch.9]{faa}.
%, or modelling at
%a too high level of failure mode abstraction.
FTA was invented for use on the Minuteman nuclear defence missile
systems in the early 1960s and was not designed as a rigorous
fault/failure mode methodology. It is more like a structure to
be applied when discussing the safety of a system, with a top-down hierarchical
notation that guides the analysis. This methodology was designed for
experienced engineers sitting around a large diagram and discussing the safety aspects.
Also, the nature of a large rocket with red wire and remote detonation
failsafes meant that the objective was to iron out common failures,
not to rigorously detect all possible failures.
Consequently it was not designed to guarantee coverage of all component failure modes,
and has no rigorous in-built safeguards to ensure coverage of all possible
system level outcomes.
\subsubsection{ FTA weaknesses }
\begin{itemize}
\item Possibility of missing component failure modes.
\item Possibility of missing environmental effects.
\item No possibility of modelling base component level double failure modes.
\end{itemize}
\subsection { FMEA }
This is an early static analysis methodology, and concentrates
on SYSTEM level errors which have been investigated.
The investigation will typically point to a particular failure
of a component.
The methodology is then applied to find the significance of the failure.
It is based on a simple equation where $S$ ranks the severity (or cost \cite{fmea}) of the identified SYSTEM failure,
$O$ its occurrence, and $D$ the failure's detectability. Multiplying these
together
gives a risk priority number, i.e. $RPN = S \times O \times D$.
This gives in effect
a prioritised to-do list, with the higher $RPN$ values being the most urgent.
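
As a worked illustration, assume each factor is ranked on a scale of 1 to 10 (a common convention, but an assumption here, as the scale is not stated above) and take two hypothetical failures, numbered 1 and 2:
\[
RPN_1 = 8 \times 3 \times 5 = 120 , \qquad RPN_2 = 4 \times 6 \times 2 = 48 .
\]
Failure 1 would therefore be placed above failure 2 in the prioritised list, even though failure 2 has the higher occurrence ranking.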
\subsubsection{ FMEA weaknesses }
\begin{itemize}
\item Possibility of missing the effects of failure modes at SYSTEM level.
\item Possibility of missing environmental effects.
\item No possibility of modelling base component level double failure modes.
\end{itemize}
\subsection{FMECA}
Failure mode, effects, and criticality analysis (FMECA) extends FMEA.
This is a bottom-up methodology, which takes component failure modes
and traces them to the SYSTEM level failures. The components
have reliability data and this can be used to predict the
failure statistics in the design stage \cite{mil1991}.
It can do this using probability\footnote{For a given component failure mode there will be a $\beta$ value, the
probability that the component failure mode will cause a given SYSTEM failure.}.
%
This lacks precision, or in other words determinability (prediction accuracy) \cite{fafmea},
as often the component failure mode cannot be proven to cause a SYSTEM level failure, but
is instead assigned a probability factor $\beta$ by the design engineer.
%Also, it can miss combinations of failure modes that will cause SYSTEM level errors.
%
The results, as with FMEA, are $RPN$ values determining the significance of the SYSTEM fault.
%%-WIKI- Failure mode, effects, and criticality analysis (FMECA) is an extension of failure mode and effects analysis (FMEA).
%%-WIKI- FMEA is a a bottom-up, inductive analytical method which may be performed at either the functional or
%%-WIKI- piece-part level. FMECA extends FMEA by including a criticality analysis, which is used to chart the
%%-WIKI- probability of failure modes against the severity of their consequences. The result highlights failure modes with relatively high probability
%%-WIKI- and severity of consequences, allowing remedial effort to be directed where it will produce the greatest value.
%%-WIKI- FMECA tends to be preferred over FMEA in space and North Atlantic Treaty Organization (NATO) military applications,
%%-WIKI- while various forms of FMEA predominate in other industries.
\subsubsection{ FMECA weaknesses }
\begin{itemize}
\item Possibility of missing the effects of failure modes at SYSTEM level.
\item Possibility of missing environmental effects.
\item No possibility of modelling base component level double failure modes.
\end{itemize}
\subsection { FMEDA or Statistical Analysis }
This is a process that takes all the components in a system,
and from the failure modes of those components\footnote{For a given component failure mode there will be a $\beta$ value, the
probability that the component failure mode will cause a given SYSTEM failure.}
calculates a risk factor for each.
The risk factors of all the component failure modes are summed and
give a value for the `safety level' for the equipment in a given environment.
%%-he FMEDA technique considers
%%-• All components of a design,
%%-• The functionality of each component,
%%-• The failure modes of each component,
%%-• The impact of each component failure mode on the product functionality,
%%-• The ability of any automatic diagnostics to detect the failure,
%%-• The design strength (de-rating, safety factors) and
%%-• The operational profile (environmental stress factors).
This uses MTTF and other statistical models to determine the probability of
failures occurring. A component failure mode, given its MTTF,
the probability of detecting the fault and its safety relevant validation time $\tau$,
contributes a simple risk factor that is summed
in to give a final risk result. Thus a statistical
model can be implemented on a spreadsheet, where each component
has a calculated risk, a fault detection time (if any), an estimated risk importance
and other factors such as de-rating and environmental stress.
With one component failure mode per row,
the row values are summed to give the final assessment figure.
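
A minimal sketch of such a spreadsheet calculation follows. The per-row formula is an assumption chosen for illustration, since the text above does not give one: each row $i$ has a failure rate $\lambda_i$ (derived from its MTTF) and a probability $d_i$ that the fault is detected, and contributes only its undetected portion,
\[
r_i = \lambda_i (1 - d_i), \qquad R = \sum_{i} r_i .
\]
For three hypothetical rows with $\lambda = (100, 50, 200)$ failures per $10^9$ hours and $d = (0.9, 0, 0.99)$:
\[
R = 100 \times 0.1 + 50 \times 1 + 200 \times 0.01 = 10 + 50 + 2 = 62
\]
failures per $10^9$ hours. A real assessment would also weight each row by factors such as $\tau$, de-rating and environmental stress.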
\paragraph{Two statistical perspectives}
The Statistical Analysis method is used from two perspectives:
Probability of Failure on Demand (PFD), and probability of failure
in continuous operation, expressed as Failure in Time (FIT) and measured in failures per billion ($10^9$) hours of operation.
For instance, with the anti-lock system on an automobile braking
system, we would be interested in PFD.
For a continuously running nuclear power station
we would be interested in its 24/7 operation FIT values.
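
The FIT figure relates to MTTF as follows, assuming a constant failure rate $\lambda$ (an assumption made here for illustration) with MTTF expressed in hours:
\[
\lambda = \frac{1}{\mathrm{MTTF}}, \qquad \mathrm{FIT} = \lambda \times 10^{9} = \frac{10^{9}}{\mathrm{MTTF}} .
\]
For example, a hypothetical component failure mode with an MTTF of $2 \times 10^{6}$ hours has a FIT value of $10^{9} / (2 \times 10^{6}) = 500$.
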
The statistical approach suffers from the same
lack of determinability (prediction accuracy) as FMEA above.
We have to decide how particular components failing will impact on the SYSTEM or top level.
This involves a `leap of faith'. For instance, a resistor failing in a sensor circuit
may be part of a critical monitoring function.
The analyst is now put in a position
where they must assign a critical failure possibility to it. There is no analysis
of how that resistor would/could affect that circuit, but because the circuitry
it belongs to is part of a critical section, it is linked to a critical system level fault.
There is no cause and effect analysis for the failure modes. Unintended side
effects that lead to failure can be missed.
Thus we may have the MTTF of some critical component failure
modes, but we can only guess, in most cases, what the safety case outcome
will be if it occurs.
This leads to having components within a SYSTEM partitioned into different
safety level zones \cite{en61508}. This is a vague way of determining
safety.
The Statistical Analysis methodology is the core philosophy
of the Safety Integrity Levels (SIL) of EN61508 \cite{en61508}.
\subsubsection{ FMEDA weaknesses }
\begin{itemize}
\item Possibility of missing the effects of failure modes at SYSTEM level.
\item Statistical nature allows critical failures to be considered acceptable for a given SIL.
\item Allows a small proportion of `undetectable' error conditions.
\item No possibility of modelling base component level double failure modes.
\end{itemize}
%AND then how we can solve all there problems
\section{A wish list for a failure mode methodology}
\begin{itemize}
\item All component failure modes must be considered in the model.
\item It should be easy to integrate mechanical, electronic and software models \cite[pp.~287]{sccs}.
\item It should be re-usable, in that commonly used modules can be re-used in other designs/projects.
\item It should have a formal basis, that is to say, it should be able to produce mathematical proofs
for its results.
\item It should be capable of producing reliability and danger evaluation statistics.
\item It should be easy to use, ideally using a graphical syntax (as opposed to a formal mathematical one).
\item From the top down the failure mode model should follow a logical de-composition of the functionality
to smaller and smaller functional modules \cite{maikowski}.
\item Multiple failure modes may be modelled from the base component level up.
\end{itemize}
\section{Proposed Methodology \\ Failure Mode Modular De-Composition (FMMD)}
\paragraph{New methodology must be bottom-up}
In order to ensure that all component failure modes have been covered
the methodology will have to work from the bottom-up
and start with the component failure modes.
%
\paragraph{How to build a SYSTEM failure behaviour model}
The next problem is how to build a failure mode model
that converges to a finite set of SYSTEM level failure modes.
%
\paragraph{Incremental stages and {\fg}s}
We can use incremental stages to build the hierarchy.
We can take small {\fg}s of components, where a {\fg}
is a small set of components that perform a simple
task.
This should be small enough to be able to consider all the failure
modes of its components.
We can consider these failure modes from the perspective
of the {\fg}. In other words, for each component failure mode in the {\fg},
we create a `test case' and decide how each failure affects the functional group.
%
With the results from the test cases we will now have the ways in which the
{\fg} can fail.
%
We can now treat the {\fg} as a component, or rather a {\dc}.
We can refine this further by grouping the common symptoms, that is, results that
are the same failure with respect to the {\fg}.
%
We can now create a {\dc} and assign these common symptoms
as its failure modes.
%
This {\dc} can be used to build higher level
{\fg}s, and naturally a hierarchy is being formed, which is
a failure mode behaviour model.
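
This stage can be sketched more formally. The notation and the example circuit are introduced here for illustration only. Let a {\fg} be a set of components $FG = \{c_1, \ldots, c_n\}$, and let $fm(c)$ be the set of failure modes of component $c$, where failure modes of different components are treated as distinct. The test cases are
\[
T = \bigcup_{c \in FG} fm(c) ,
\]
and the analysis assigns each test case a symptom, which is a function $a : T \rightarrow S$ onto the set of symptoms $S$. The {\dc} then has $S$ as its failure modes.

As a hypothetical example, take a potential divider in which $R_1$ connects the supply to the output and $R_2$ connects the output to ground. Assume each resistor can fail OPEN or SHORT and that the output is unloaded:
\begin{center}
\begin{tabular}{|l|l|}
\hline
Test case & Symptom \\ \hline
$R_1$ OPEN & output LOW \\
$R_1$ SHORT & output HIGH \\
$R_2$ OPEN & output HIGH \\
$R_2$ SHORT & output LOW \\ \hline
\end{tabular}
\end{center}
The four component failure modes collapse into two symptoms, so the {\dc} for the potential divider has the failure modes $\{ \mathrm{HIGH}, \mathrm{LOW} \}$.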
\paragraph{Directed Acyclic Graph} This will naturally form a Directed Acyclic Graph (DAG),
meaning that for all SYSTEM failure modes, we will be able to trace
back through the DAG to possible component failure mode causes.
If statistical models exist for the component failure modes,
these failure causation trees (or minimal cut sets \cite{nucfta})
can be used to calculate Mean Time to Failure (MTTF) or Probability of Failure on Demand (PFD) figures.
%
Because common symptoms are being collected, as we build the tree upward
the number of failure modes decreases (or exceptionally stays the same) at each level.
%
This decrease in the number of failure modes is borne out {\irl}.
Of the thousands of component failure modes in a typical product
there are generally only a handful of SYSTEM level failure modes.
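
As a sketch of how statistical figures could be derived from the minimal cut sets, assume the component failure modes are statistically independent and individually rare (both assumptions introduced here). If $K_1, \ldots, K_m$ are the minimal cut sets of a SYSTEM failure mode, $p_i$ is the probability of component failure mode $i$, and $P_{\mathrm{sys}}$ is the probability of the SYSTEM failure mode, then
\[
P_{\mathrm{sys}} \approx \sum_{j=1}^{m} \prod_{i \in K_j} p_i .
\]
For hypothetical cut sets $\{a\}$ and $\{b, c\}$ with $p_a = 10^{-6}$ and $p_b = p_c = 10^{-3}$, this gives $10^{-6} + 10^{-3} \times 10^{-3} = 2 \times 10^{-6}$.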
%
\subsection{Outline of the FMMD process}
FMMD builds {\fg}s of components from the bottom-up.
Thus the {\fg}s are minimal collections of components
that work together to perform a simple function.
We can perform a failure mode effects analysis on each of the component failure
modes within the {\fg}. We can thus ensure that all component failure modes
are covered. We can then treat the {\fg} as a `black box' or component in its own right.
We can now look at how the {\fg} can fail. Many of the component failure modes will
cause the same failure symptoms in the {\fg} failure behaviour.
We can collect these failures as common symptoms.
When we have our set of symptoms, we can now create
a {\dc}. The {\dc} will have as its set of failure
modes the collected symptoms of the {\fg}.
Because we now have {\dcs}, we can use these to form
new {\fg}s and we can build a hierarchical model of the system failure modes.
\subsection{Justification of wishlist}
\subsubsection{All component failure modes must be considered in the model.}
The proposed methodology will be bottom-up.
This ensures that all component failure modes are handled.
\subsubsection{ It should be easy to integrate mechanical, electronic and software models.}
Each functional component's failure modes are considered. Because of this,
the failure modes of a mechanical, electrical or software system can be modelled
using a common notation.
\subsubsection{ It should be re-usable, in that commonly used modules can be re-used in other designs/projects.}
The hierarchical nature, taking {\fg}s and deriving components from them, means that
commonly used {\dcs} can be re-used in a design (for instance self checking digital inputs)
or even in other projects where the same {\dc} is used.
\subsubsection{ It should have a formal basis, that is to say, it should be able to produce mathematical proofs
for its results}
Because the failure mode model of a SYSTEM is a hierarchy of {\fg}s and derived components,
SYSTEM level failure modes are traceable back down the tree to
component level failure modes. This provides causation trees \cite{sccs}, or minimal cut sets\footnote{Here minimal cut sets represent combinations of component failure modes that can result in a SYSTEM level failure.},
for all SYSTEM failure modes.
\subsubsection{ It should be capable of producing reliability and danger evaluation statistics.}
The minimal cut sets for the SYSTEM level failures can have computed MTTF
and danger evaluation statistics sourced from the component failure mode statistics \cite{mil1991}.
\subsubsection{ It should be easy to use, ideally using a graphical syntax (as opposed to a formal mathematical one).}
A modified form of constraint diagram (an extension of Euler diagrams) has been developed to support the FMMD methodology.
This uses Euler circles to represent failure modes, and spiders to collect symptoms, to
advance a {\fg} to a {\dc}.
\subsubsection{ From the top down the failure mode model should follow a logical de-composition of the functionality
to smaller and smaller functional modules \cite{maikowski}.}
The bottom-up approach fulfills the logical de-composition requirement, because the {\fg}s
are built from components performing a given task.
\subsubsection{ Multiple failure modes may be modelled from the base component level up}
By breaking the problem of failure mode analysis into small stages
and building a hierarchy, the problems associated with the cross products of
all failure modes within a system are greatly reduced, by an exponential order.
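
The scale of the reduction can be illustrated with assumed figures (the numbers below are hypothetical). A flat analysis of double failures over $N$ component failure modes must consider every pair,
\[
\frac{N(N-1)}{2} ,
\]
which for $N = 1000$ is $499\,500$ pairs. If the same failure modes are instead divided among $100$ {\fg}s of $10$ failure modes each, the lowest level of the hierarchy requires
\[
100 \times \frac{10 \times 9}{2} = 4500
\]
pairs. Higher levels add further pairs, but these are formed from the much smaller sets of {\dc} failure modes.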
\subsection{Advantages of FMMD Methodology}
\begin{itemize}
\item It can be checked automatically that all component failure modes have been considered in the model.
\item Because we are modelling with {\fgs} and {\dcs}, these can be generic, i.e. mechanical, electronic or software components.
\item The {\dcs} are re-usable, in that commonly used modules can be re-used in other designs/projects.
\item It will have a formal basis, that is to say, it is able to produce mathematical proofs
for its results (MTTF and the cause trees for SYSTEM level faults).
\item Overall reliability and danger evaluation statistics can be computed.
\item A graphical representation based on Euler diagrams is used.
\item From the top down the failure mode model will follow a logical de-composition of the functionality; by
choosing {\fg}s and working bottom-up through the hierarchy this happens as a natural consequence.
\item Undetectable or unhandled failure modes will be specifically flagged.
\item It is possible to model multiple failure modes.
\end{itemize}
\section{Conclusion}
\vspace{30pt}
\today