Robin_PHD/fmmd_concept/fmmd_concept.tex
2010-10-04 19:01:04 +01:00


\ifthenelse {\boolean{paper}}
{
\abstract{ This paper proposes a methodology for
creating failure mode models of safety critical systems, which
has a common notation that can be integrated
across the mechanical, electronic and software domains.
In addition, the methodology addresses the traditional weaknesses of
Fault Tree Analysis (FTA), Failure Mode Effects Analysis (FMEA)
and Failure Mode Effects and Diagnostic Analysis (FMEDA).
The proposed methodology is bottom-up and
modular.}
}
{}
\section{Introduction}
There are three methodologies in common use for failure mode modelling.
These are FTA, FMEA
and FMEDA (a form of statistical analysis).
These methodologies have several drawbacks.
In short,
FTA, due to its top-down nature, can overlook error conditions. FMEA and the statistical methods
lack precision in predicting failure modes at the SYSTEM level.
The Failure Mode Modular De-composition
(FMMD) methodology presented here provides a more detailed and analytical
modelling system from which
the data models for FTA, FMEA and the statistical approach can be
derived if required.
It also applies rigorous checking in the analysis stages,
ensuring that all component failure modes are considered in the model.
This
\ifthenelse {\boolean{paper}}
{
paper
}
{
chapter
}
presents a bottom-up modular methodology, an extension and refinement of FMEA. Instead of looking
at individual component failure modes and deciding on their impact on the SYSTEM,
it uses the component failure modes to build modules or derived components.
This methodology has been named Failure Mode Modular De-composition (FMMD)
because it de-composes a SYSTEM into a hierarchy of modules or {\dc}s.
It does this by working from the bottom up, taking small groups
of components, {\fgs}, and then analysing how they can fail.
This analysis is performed using FMEA from a micro rather than a macro perspective.
Thus, instead of looking at a component failure mode and determining how
it {\em might} cause a failure at SYSTEM level, we look at how
it will affect the {\fg}.
When we know the failure modes of a {\fg} we can treat it as a `black box'
or {\dc}. With {\dc}s we can build {\fgs}
at higher levels of analysis, until we have a complete
hierarchy representing the failure behaviour of the SYSTEM.
Because all the failure modes of all the components
are held in a computer program, we can determine if the model is complete
(i.e. all component failure modes have been included in the model).
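
The completeness condition can be stated a little more formally.
The following is only a sketch, in notation introduced here for illustration
(the symbols $C$, $\mathit{fm}$ and $A$ are assumptions, not established FMMD notation).
Let $C$ be the set of components in the SYSTEM, and let $\mathit{fm}(c)$ be the set of
failure modes of a component $c \in C$.
Let $A$ be the set of component failure modes that have been considered
in the analysis of some {\fg}.
The model is then complete when
\[
\bigcup_{c \in C} \mathit{fm}(c) \subseteq A ,
\]
and the failure modes still to be analysed are those in
$\left( \bigcup_{c \in C} \mathit{fm}(c) \right) \setminus A$.
A computer program only has to check that this second set is empty.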
%OK need to describe the need for it
\section{The need for a new failure mode modelling methodology}
The weaknesses of each existing methodology are summarised below.
\subsection { FTA }
FTA, like all top-down methodologies, introduces the very serious problem
of missing component failure modes, or of modelling at
too high a level of failure mode abstraction.
FTA was invented for use on the Minuteman nuclear defence missile
systems in the early 1960s and was not designed as a rigorous
fault/failure mode methodology. It is more like a structure to
be applied when discussing the safety of a system, with a top-down hierarchical
notation that guides the analysis. This methodology was designed for
experienced engineers sitting around a large diagram and discussing the safety aspects.
Also, the nature of a large rocket with red wire and remote detonation
failsafes meant that the objective was to iron out common failures,
not to rigorously detect all possible failures.
Consequently it was not designed to guarantee coverage of all component failure modes,
and has no rigorous in-built safeguards to ensure coverage of all possible
system level outcomes.
\subsection { FMEA }
FMEA places on the analyst the burden of taking individual component failure modes
and trying to determine what effects these will have at SYSTEM level.
Justifications for this methodology are often statistical, and Bayes' theorem \cite{probstat}
is often cited.
This lacks precision, or in other words the ability to predict outcomes deterministically,
as often the component failure mode cannot be proven to cause a SYSTEM level failure, only to make it more likely.
Also, it can miss combinations of failure modes that will cause SYSTEM level errors.
\subsection { FMEDA or Statistical Analysis }
This is a process that takes all the components in a system,
and from the failure modes of those components
calculates a risk factor for each.
The risk factors of all the component failure modes are summed and
give a value for the `safety level' for the equipment in a given environment.
%%-he FMEDA technique considers
%%-• All components of a design,
%%-• The functionality of each component,
%%-• The failure modes of each component,
%%-• The impact of each component failure mode on the product functionality,
%%-• The ability of any automatic diagnostics to detect the failure,
%%-• The design strength (de-rating, safety factors) and
%%-• The operational profile (environmental stress factors).
This uses MTTF and other statistical models to determine the probability of
failures occurring. A component failure mode, given its MTTF,
the probability of detecting the fault and its safety relevant validation time $\tau$,
contributes a simple risk factor that is summed
to give a final risk result. Thus a statistical
model can be implemented on a spreadsheet, with one component failure mode per row, where each row
has a calculated risk, a fault detection time (if any), an estimated risk importance
and other factors such as de-rating and environmental stress.
The rows are then summed to give the final assessment figure.
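
As an illustration of the spreadsheet calculation, the following is a minimal sketch
under stated assumptions, not the formula prescribed by any particular standard.
Assume a constant failure rate for each component failure mode $i$,
$\lambda_i = 1 / \mathit{MTTF}_i$, and let $d_i$ (a symbol introduced here) be the
probability that the fault is detected, with $0 \le d_i \le 1$.
One simple risk factor per row is then the undetected failure rate, and the
final figure is the sum over the $n$ rows:
\[
R = \sum_{i=1}^{n} \lambda_i \, (1 - d_i) .
\]
For example, two failure modes, each with an MTTF of $10^{6}$ hours, one with
$d_1 = 0.9$ and one with $d_2 = 0$, give
$R = 10^{-6} \times 0.1 + 10^{-6} \times 1 = 1.1 \times 10^{-6}$ failures per hour.
The validation time $\tau$ and the de-rating and stress factors are left out of this sketch.
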
\paragraph{Two statistical perspectives}
The Statistical Analysis method is used from two perspectives:
Probability of Failure on Demand (PFD), and probability of failure
in continuous operation, expressed as Failure in Time (FIT) and measured in failures per billion ($10^9$) hours of operation.
For instance, with the anti-lock system on an automobile braking
system, we would be interested in PFD.
For a continuously running nuclear power station
we would be interested in its 24/7 operation FIT values.
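
A FIT figure relates directly to MTTF if a constant failure rate is assumed
(an assumption made here for illustration only).
With the MTTF expressed in hours,
\[
\mathit{FIT} = \frac{10^{9}}{\mathit{MTTF}} ,
\]
so that a component failure mode with an MTTF of $10^{7}$ hours has a rate of
$100$ FIT, i.e. $10^{-7}$ failures per hour.
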
The statistical approach suffers from the same lack of
deterministic prediction as FMEA above.
We have to decide how particular components failing will impact on the SYSTEM or top level.
This involves a `leap of faith'. For instance, a resistor failing in a sensor circuit
may be part of a critical monitoring function. The analyst is put in a position
where they must assign a critical failure possibility to it. There is no analysis
of how that resistor would or could affect that circuit; because the circuitry
it belongs to is part of a critical section, it is simply linked to a critical system level fault.
In this way we may have the MTTF of some critical component failure
modes, but we can only guess, in most cases, what the safety case outcome
will be if one occurs.
This leads to having components within a SYSTEM partitioned into different
safety level zones \cite{en61508}. This is a vague way of determining
safety.
The Statistical Analysis methodology is the core philosophy
of the Safety Integrity Levels (SIL) of EN61508 \cite{en61508}.
%AND then how we can solve all there problems
\section{A wish list for a failure mode methodology}
\begin{itemize}
\item All component failure modes must be considered in the model.
\item It should be easy to integrate mechanical, electronic and software models.
\item It should be re-usable, in that commonly used modules can be re-used in other designs/projects.
\item It should have a formal basis, that is to say, it should be able to produce mathematical proofs
for its results.
\item It should be capable of producing reliability and danger evaluation statistics.
\item It should be easy to use.
\end{itemize}
\section{Building blocks of a safety critical system}
This section looks at common features in a safety critical system and
then looks at the building blocks of these systems
and their characteristics.
\subsection{What is a safety critical system?}
DEFINITIONS GET REFS
TYPICALLY HAS MECHANICAL, ELECTRONIC and SOFTWARE
actuators control intelligence
\subsection{An example : industrial burner}
An industrial burner is a good example of a safety critical system.
It has some lethal risks and some environmental ones.
It could, by igniting an explosive mixture, cause an explosion.
By burning incorrect proportions of fuel and air, it could be inefficient and waste
resources, or worse, could burn poisonously (typically producing carbon monoxide, but also,
where the flame temperature is very high, NOX emissions).
To prevent igniting an explosive mixture, air is pumped through the furnace
chamber on start-up, and this is verified with an air pressure switch.
NEED A DIAGRAM HERE
NEED A STATE CHART TOO
It is interesting here to compare how the different methodologies
would deal with a particular sub-system in the burner controller
and compare how they analyse it.
The Flame scanner is a good example for this.
We shall consider a simple infra-red (IR) flame scanner.
This is in the form of an IR sensitive resistor.
The flame type we will be looking for will have a characteristic
flicker frequency of around 13Hz.
The circuit is then simply a resistor voltage divider connected to
a micro-controller reading the voltage.
The flame scanner is thus a two resistor voltage divider.
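
The behaviour of this circuit can be sketched with the usual voltage divider equation.
The names $R_1$ (upper resistor), $R_2$ (lower resistor), $V_{cc}$ (supply) and
$V_{out}$ (the voltage read by the micro-controller) are introduced here for illustration,
and no assumption is made about which of the two is the IR sensitive resistor:
\[
V_{out} = V_{cc} \, \frac{R_2}{R_1 + R_2} .
\]
If we assume that each resistor has just the two failure modes OPEN and SHORT,
and that the micro-controller input draws negligible current,
the four single failure cases are:
\begin{center}
\begin{tabular}{lll}
\hline
Component & Failure mode & Effect on $V_{out}$ \\
\hline
$R_1$ & OPEN  & $V_{out} = 0$ \\
$R_1$ & SHORT & $V_{out} = V_{cc}$ \\
$R_2$ & OPEN  & $V_{out} = V_{cc}$ \\
$R_2$ & SHORT & $V_{out} = 0$ \\
\hline
\end{tabular}
\end{center}
In every case the reading is stuck at one of the supply rails, so the flicker signal is lost.
Under these assumptions the divider, seen from outside, has only two failure modes:
output stuck low and output stuck high.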
\subsection{The Flame Scanner}
\subsubsection{Macro FTA perspective}
SHOW ALL TOP LEVEL FAULTS. EXPLOSION, POISONOUS BURNING CO, POISONOUS BURNING NOX, FAILS TO LIGHT etc
Follow the explosion tree down to flame scanner fails ON, and OFF
etc
\subsubsection{Macro FMEA/Statistical perspective}
Each of the resistors is considered critical, in the statistical case, and so the MTTF
is added into the DANGEROUS section.
For FMEA the resistor failures add up to the SYSTEM level, show this is inappropriate
and makes several jumps in applied knowledge, thus Bayes' theorem etc
\subsubsection{Micro FMMD perspective}
Here show how the flame scanner becomes a black box, or component in itself.
How it is now available to be integrated into higher level designs.
%and then an ignition position is checked.
%Initially a pilot flame is started and when this is stable, the main
%flame is fired.
%To check the stability of the flame, a flame scanner is required.
%To mix the fuel and air, motors to position valves are generally used.
%To prevent fuel leakage into the furnace, safety shut-off valves are used \footnote{These generally open slowly under power, and when power is removed `slam shut'. Thus
%in the event of a general power failure, the default to safe behaviour.}
Motors controlling air and fuel flow
safety chain to power for shutdown valves
safety shutdown valves on fuel
flame sensor
air pressure sensor
\section{Base Level Components}
A common factor in all safety critical systems is
base level, or bought-in, components. Whether these are
electrical, mechanical or firmware, they should all
have known failure modes.
\subsection { Failure modes defining the component}
We can consider each bought-in component as a base level component,
and it should have an associated set of failure modes.
\subsection { Complication of multiple failure modes }
A very complicated component, like an integrated circuit or perhaps a servo motor, has
a set of failure modes, where several things could go wrong with it within the $\tau$ period.
This is a simultaneous failure: more than one failure mode being active during the same time period.
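
The number of cases this adds to the analysis can be counted.
As a sketch, take a component with $n$ failure modes (the symbol $n$ is introduced here)
and assume that any of them may be active together.
The number of pairs of simultaneous failure modes is then
\[
\frac{n (n - 1)}{2} ,
\]
and the number of non-empty combinations of any size is $2^{n} - 1$.
A component with $n = 4$ failure modes thus has $4$ single cases, $6$ double cases
and $2^{4} - 1 = 15$ combinations in all.
In practice some combinations are mutually exclusive (a resistor cannot be both
open and shorted), so these figures are upper bounds.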
\section{FMMD Proposed Methodology Outline}
fire away, essentially the elevator pitch
\subsection{Treating a functional group as a component}
\subsection{Using a derived component in designs}
\section{Building a Failure Mode Model Hierarchy}
AND the hierarchy...
Probably about 3 pages