\ifthenelse {\boolean{paper}}
{
\abstract{ This paper proposes a methodology for
creating failure mode models of safety critical systems, using
a common notation that can be integrated across
the mechanical, electronic and software domains.
In addition, the methodology addresses the traditional weaknesses of
Fault Tree Analysis (FTA), Failure Mode Effects Analysis (FMEA)
and Failure Mode Effects and Diagnostic Analysis (FMEDA).
The proposed methodology is bottom-up and
modular.}
}
{}

\section{Introduction}

There are three methodologies in common use for failure mode modelling:
FTA, FMEA
and FMEDA (a form of statistical analysis).

These methodologies have several drawbacks.
In short,
FTA, because of its top-down nature, can overlook error conditions, while FMEA and the statistical methods
lack precision in predicting failure modes at the SYSTEM level.

The Failure Mode Modular De-composition
(FMMD) methodology presented here provides a more detailed and analytical
modelling system, from which
the data models of FTA, FMEA and the statistical approach can be
derived if required.
It also applies rigorous checking in the analysis stages,
ensuring that all component failure modes are considered in the model.

This
\ifthenelse {\boolean{paper}}
{
paper
}
{
chapter
}
presents a bottom-up modular methodology, an extension and refinement of FMEA. Instead of looking
at individual component failure modes and deciding on their impact on the SYSTEM,
it uses the component failure modes to build modules, or derived components.
This methodology has been named Failure Mode Modular De-composition (FMMD)
because it de-composes a SYSTEM into a hierarchy of modules or {\dc}s.
It does this by working from the bottom up, taking small groups
of components, {\fgs}, and then analysing how they can fail.
This analysis is performed using FMEA from a micro rather than a macro perspective.
Thus, instead of looking at a component failure mode and determining how
it {\em might} cause a failure at SYSTEM level, we look at how
it will affect the {\fg}.
When we know the failure modes of a {\fg}, we can treat it as a `black box'
or {\dc}. With {\dc}s we can build {\fgs}
at higher levels of analysis, until we have a complete
hierarchy representing the failure behaviour of the SYSTEM.
Because all the failure modes of all the components
are held in a computer program, we can determine whether the model is complete
(i.e.\ whether all component failure modes have been included in the model).
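
Because the failure modes are held as data, the completeness check reduces to a set difference.
The Python sketch below is illustrative only: the function \texttt{coverage\_report}, the dictionary
layout, the capacitor C1 and the failure modes listed for each part are hypothetical and are not taken
from any parts library. It compares every (component, failure mode) pair in the SYSTEM against the
pairs actually used in the model, and reports those that are missing or that match no real component.

\begin{verbatim}
```python
# Hypothetical sketch: check that every base component failure mode
# has been included in the failure mode model.


def coverage_report(components, analysed):
    """Compare required and analysed (component, failure mode) pairs.

    components: {component name: set of failure modes} for the whole SYSTEM
    analysed:   (component name, failure mode) pairs used in the model
    Returns (missing, unknown) as sorted lists; the model is complete
    only when both lists are empty.
    """
    analysed = set(analysed)
    required = {(name, mode) for name, modes in components.items() for mode in modes}
    missing = sorted(required - analysed)   # modes the analyst has not considered
    unknown = sorted(analysed - required)   # pairs that match no real component
    return missing, unknown


if __name__ == "__main__":
    # Assumed failure modes, for illustration only.
    parts = {"R1": {"OPEN", "SHORT"}, "R2": {"OPEN", "SHORT"}, "C1": {"OPEN", "SHORT"}}
    used = {("R1", "OPEN"), ("R1", "SHORT"), ("R2", "OPEN"), ("R2", "SHORT"), ("C1", "OPEN")}
    missing, unknown = coverage_report(parts, used)
    print("complete" if not (missing or unknown) else f"missing: {missing}")
```
\end{verbatim}

Here the report flags the unanalysed SHORT failure mode of C1, so the model would be rejected as incomplete.
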

%OK need to describe the need for it
\section{The need for a new failure mode modelling methodology}

\paragraph{Ideal static failure mode methodology}
An ideal static failure mode methodology would build a failure mode model
from which the models of the four methodologies described below could be derived.
It would address the shortcomings of those methodologies, and
would have a user-friendly interface, with a visual (rather than mathematical/formal) syntax using icons
to represent the results of analysis phases.

There are four static analysis failure mode methodologies in common use.
Each has its advantages and drawbacks, and each is suited to
a different phase in the product life cycle.
These four methodologies are discussed briefly below.

\subsection { FTA }

FTA, like all top-down methodologies, introduces the very serious problem
of missing component failure modes, or of modelling at
too high a level of failure mode abstraction.
FTA was invented for use on the Minuteman nuclear defence missile
systems in the early 1960s and was not designed as a rigorous
fault/failure mode methodology. It is more a structure to
be applied when discussing the safety of a system, with a top-down hierarchical
notation that guides the analysis. This methodology was designed for
experienced engineers sitting around a large diagram and discussing the safety aspects.
Also, the nature of a large rocket with red-wire and remote-detonation
failsafes meant that the objective was to iron out common failures,
not to rigorously detect all possible failures.
Consequently, it was not designed to guarantee coverage of all component failure modes,
and has no rigorous built-in safeguards to ensure coverage of all possible
system level outcomes.


\subsection { FMEA }

This is an early static analysis methodology that concentrates
on SYSTEM level errors which have already been investigated.
The investigation will typically point to a particular failure
of a component.
The methodology is then applied to find the significance of the failure.
It is based on a simple equation, where $S$ ranks the severity (or cost \cite{fmea}) of the identified SYSTEM failure,
$O$ its occurrence, and $D$ its detectability (ranked so that a failure that is harder to detect scores higher).
Multiplying these together
gives a risk priority number, i.e.\ $RPN = S \times O \times D$.
In effect, this gives
a prioritised to-do list, with the highest $RPN$ values being the most urgent.
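
The $RPN$ ranking can be sketched in a few lines of Python. This is a minimal illustration, assuming
the common convention of ranking $S$, $O$ and $D$ on 1 to 10 scales; the class \texttt{FmeaEntry},
the function \texttt{prioritise} and the example failures (loosely based on an industrial burner) are
hypothetical.

\begin{verbatim}
```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class FmeaEntry:
    """One investigated SYSTEM failure, ranked on assumed 1..10 scales."""
    failure: str
    severity: int    # S: how serious (or costly) the failure is
    occurrence: int  # O: how often it is expected to occur
    detection: int   # D: detection ranking, higher = harder to detect

    def __post_init__(self):
        for rank in (self.severity, self.occurrence, self.detection):
            if not 1 <= rank <= 10:
                raise ValueError(f"rank {rank} outside the assumed 1..10 scale")

    @property
    def rpn(self) -> int:
        # Risk priority number: RPN = S x O x D
        return self.severity * self.occurrence * self.detection


def prioritise(entries: list[FmeaEntry]) -> list[FmeaEntry]:
    """Return the entries as a to-do list, highest RPN (most urgent) first."""
    return sorted(entries, key=lambda e: e.rpn, reverse=True)


if __name__ == "__main__":
    todo = prioritise([
        FmeaEntry("burner fails to light", 4, 5, 2),
        FmeaEntry("flame scanner reports flame when none present", 10, 2, 7),
        FmeaEntry("air pressure switch stuck closed", 9, 3, 4),
    ])
    for entry in todo:
        print(f"{entry.rpn:4d}  {entry.failure}")
```
\end{verbatim}

Running the example lists the three failures in order of decreasing $RPN$: 140, 108 and 40.
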

\subsection{FMECA}

Failure mode, effects, and criticality analysis (FMECA) extends FMEA.
This is a bottom-up methodology, which takes component failure modes
and traces them to the SYSTEM level failures. The components
have reliability data, and this can be used to predict the
failure statistics at the design stage \cite{mil1992}.
It can do this using probability\footnote{For a given component failure mode there will be a $\beta$ value, the
probability that the component failure mode will cause a given SYSTEM failure.}.
%
This lacks precision or, in other words, determinability prediction accuracy \cite{fafmea},
because often the component failure mode cannot be proven to cause a SYSTEM level failure, and is instead
assigned a probability ($\beta$) factor by the design engineer.
%Also, it can miss combinations of failure modes that will cause SYSTEM level errors.
%
The results, as with FMEA, are an $RPN$ determining the significance of the SYSTEM fault.
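
How reliability data and $\beta$ combine can be sketched as follows. The sketch assumes the failure
mode criticality number used in MIL-STD-1629A, $C_m = \beta \alpha \lambda_p t$, where $\lambda_p$ is
the part failure rate, $\alpha$ the fraction of that rate attributed to the failure mode and $t$ the
operating time; the class \texttt{ModeEntry}, the part R1 and all numbers are hypothetical. Note that
$\beta$ enters simply as a number chosen by the engineer, which is the weakness described above.

\begin{verbatim}
```python
from dataclasses import dataclass


@dataclass
class ModeEntry:
    """One component failure mode in an FMECA worksheet (illustrative fields)."""
    part: str
    mode: str
    lambda_p: float  # part failure rate, failures per hour (assumed unit)
    alpha: float     # failure mode ratio: share of lambda_p due to this mode
    beta: float      # engineer-assigned probability the mode causes the SYSTEM failure

    def criticality(self, t_hours: float) -> float:
        """Failure mode criticality number C_m = beta * alpha * lambda_p * t."""
        return self.beta * self.alpha * self.lambda_p * t_hours


def item_criticality(entries, t_hours):
    """Sum C_m per part for one SYSTEM failure (one severity class)."""
    totals = {}
    for entry in entries:
        totals[entry.part] = totals.get(entry.part, 0.0) + entry.criticality(t_hours)
    return totals


if __name__ == "__main__":
    worksheet = [
        ModeEntry("R1", "open", 2e-6, 0.7, 1.0),   # judged always to cause it
        ModeEntry("R1", "short", 2e-6, 0.3, 0.2),  # beta is a guess, not derived
    ]
    print(item_criticality(worksheet, t_hours=10_000))  # R1: about 0.0152
```
\end{verbatim}
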

%%-WIKI- Failure mode, effects, and criticality analysis (FMECA) is an extension of failure mode and effects analysis (FMEA).
%%-WIKI- FMEA is a a bottom-up, inductive analytical method which may be performed at either the functional or
%%-WIKI- piece-part level. FMECA extends FMEA by including a criticality analysis, which is used to chart the
%%-WIKI- probability of failure modes against the severity of their consequences. The result highlights failure modes with relatively high probability
%%-WIKI- and severity of consequences, allowing remedial effort to be directed where it will produce the greatest value.
%%-WIKI- FMECA tends to be preferred over FMEA in space and North Atlantic Treaty Organization (NATO) military applications,
%%-WIKI- while various forms of FMEA predominate in other industries.

\subsection { FMEDA or Statistical Analysis }

This is a process that takes all the components in a system
and, from the failure modes of those components\footnote{For a given component failure mode there will be a $\beta$ value, the
probability that the component failure mode will cause a given SYSTEM failure.},
calculates a risk factor for each.
The risk factors of all the component failure modes are summed to
give a value for the `safety level' of the equipment in a given environment.

%%-The FMEDA technique considers
%%-• All components of a design,
%%-• The functionality of each component,
%%-• The failure modes of each component,
%%-• The impact of each component failure mode on the product functionality,
%%-• The ability of any automatic diagnostics to detect the failure,
%%-• The design strength (de-rating, safety factors) and
%%-• The operational profile (environmental stress factors).

This uses MTTF and other statistical models to determine the probability of
failures occurring. Each component failure mode, given its MTTF,
the probability of detecting the fault and its safety-relevant validation time $\tau$,
contributes a simple risk factor, and these risk factors are summed
to give a final risk result. Thus a statistical
model can be implemented on a spreadsheet with one component failure mode per row,
each row holding a calculated risk, a fault detection time (if any), an estimated risk importance
and other factors such as de-rating and environmental stress.
The rows are all summed to give the final assessment figure.
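
The spreadsheet calculation can be sketched as follows. The per-row risk factor used here, the
failure rate in FIT scaled by a stress factor, by $\beta$ and by the probability that the fault is
{\em not} detected, is an assumption chosen for illustration; the column names, the function names
\texttt{row\_risk} and \texttt{assess} and the example rows are hypothetical, and a real FMEDA
worksheet weights its columns according to the standard being applied.

\begin{verbatim}
```python
import csv
import io

# Hypothetical worksheet: one component failure mode per row.
#   fit       - failure rate of this mode, failures per 10^9 hours
#   beta      - probability that the mode causes the SYSTEM failure
#   detection - probability that diagnostics detect the fault
#   stress    - de-rating / environmental multiplier on the failure rate
WORKSHEET = """component,mode,fit,beta,detection,stress
R1,open,2.0,1.0,0.9,1.0
R1,short,0.5,0.0,0.0,1.0
C3,short,5.0,0.5,0.6,1.5
U2,output stuck,20.0,1.0,0.99,1.2
"""


def row_risk(row):
    """Assumed risk factor for one row: undetected, SYSTEM-relevant FIT."""
    fit = float(row["fit"]) * float(row["stress"])
    return fit * float(row["beta"]) * (1.0 - float(row["detection"]))


def assess(worksheet):
    """Sum the per-row risk factors to give the final assessment figure (FIT)."""
    rows = csv.DictReader(io.StringIO(worksheet))
    return sum(row_risk(row) for row in rows)


if __name__ == "__main__":
    print(f"residual risk: {assess(WORKSHEET):.2f} FIT")
```
\end{verbatim}

With these rows the sum is 1.94 FIT. Note that the R1 short row contributes nothing only because its
$\beta$ was set to zero, which is exactly the kind of judgement the analysis cannot check.
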

\paragraph{Two statistical perspectives}
The statistical analysis method is used from two perspectives:
Probability of Failure on Demand (PFD), and probability of failure
in continuous operation, expressed as Failure in Time (FIT) and measured in failures per billion ($10^9$) hours of operation.
For instance, with the anti-lock system on an automobile braking
system, we would be interested in PFD.
For a continuously running nuclear power station,
we would be interested in its 24/7 operation FIT values.
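
The units involved can be illustrated with a short calculation. The sketch converts a FIT value to a
failure rate and an MTTF (assuming a constant failure rate) and, as an assumption not taken from the
text above, uses the simplified single-channel approximation $PFD_{avg} \approx \lambda_{DU} T / 2$,
valid when $\lambda_{DU} T \ll 1$, where $\lambda_{DU}$ is the dangerous undetected failure rate and
$T$ a hypothetical proof-test interval. The function names are illustrative.

\begin{verbatim}
```python
HOURS_PER_YEAR = 8760.0


def fit_to_rate(fit):
    """FIT (failures per 10^9 hours) -> failure rate in failures per hour."""
    return fit / 1e9


def mttf_hours(fit):
    """Mean time to failure in hours, assuming a constant failure rate."""
    if fit <= 0:
        raise ValueError("FIT must be positive")
    return 1e9 / fit


def pfd_avg(lambda_du, proof_test_interval_h):
    """Assumed 1oo1 approximation PFDavg ~= lambda_DU * T / 2 (lambda_DU * T << 1)."""
    return lambda_du * proof_test_interval_h / 2.0


if __name__ == "__main__":
    fit = 100.0                  # hypothetical dangerous undetected FIT
    rate = fit_to_rate(fit)      # failures per hour
    print(f"rate = {rate:.1e} /h, MTTF = {mttf_hours(fit):.1e} h")
    print(f"PFDavg with a yearly proof test = {pfd_avg(rate, HOURS_PER_YEAR):.2e}")
```
\end{verbatim}

For 100~FIT this gives a rate of $10^{-7}$ per hour, an MTTF of $10^{7}$ hours and, with a yearly
proof test, a $PFD_{avg}$ of about $4.4 \times 10^{-4}$.
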

This suffers from the same lack of
determinability prediction accuracy as FMECA above.
We have to decide how the failure of a particular component will impact the SYSTEM or top level.
This involves a `leap of faith'. For instance, a resistor in a sensor circuit
may be part of a critical monitoring function.
The analyst is then put in a position
where he must treat its failure as possibly critical. There is no analysis
of how that resistor could affect the circuit; simply because the circuitry
it belongs to is a critical section, it is linked to a critical system level fault.
There is no cause-and-effect analysis for the failure modes, and unintended side
effects that lead to failure can be missed.

Thus we may have the MTTF of some critical component failure
modes, but in most cases we can only guess what the safety case outcome
will be if one occurs.

This leads to the components within a SYSTEM being partitioned into different
safety level zones \cite{en61508}. This is a vague way of determining
safety.

The Statistical Analysis methodology is the core philosophy
of the Safety Integrity Levels (SIL) of EN61508 \cite{en61508}.

%AND then how we can solve all these problems

\section{A wish list for a failure mode methodology}
\begin{itemize}
\item All component failure modes must be considered in the model.
\item It should be easy to integrate mechanical, electronic and software models.
\item It should be re-usable, in that commonly used modules can be re-used in other designs/projects.
\item It should have a formal basis; that is to say, it should be able to produce mathematical proofs
for its results.
\item It should be capable of producing reliability and danger evaluation statistics.
\item It should be easy to use, ideally using a graphical syntax (as opposed to a formal mathematical one).
\item From the top down, the failure mode model should follow a logical de-composition of the functionality
into smaller and smaller functional modules \cite{maikowski}.
\end{itemize}

\section{Proposed Methodology}

The proposed methodology will be bottom-up. To fulfil the logical de-composition requirement,
it must build {\fg}s from the bottom up. These are minimal collections of components
that work together to perform a simple function.
We can perform a failure mode effects analysis on each of the component failure
modes within the {\fg}, and thus ensure that all component failure modes
are covered. We can then treat the {\fg} as a `black box', or a component in its own right.
We can now look at how the {\fg} can fail. Many of the component failure modes will
cause the same failure symptoms in the {\fg}'s failure behaviour.
We can collect these failures as common symptoms.
When we have our set of symptoms, we can create
a {\dc}. The {\dc} will have as its set of failure
modes the collected symptoms of the {\fg}.
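
The steps above, analysing every component failure mode in a {\fg} and collecting the results into
common symptoms, can be sketched in Python. The sketch assumes that each resistor fails only OPEN or
SHORT, and that the {\fg} is a two-resistor potential divider with R1 connected to the supply and the
output taken across R2; the function \texttt{analyse\_group} and the symptom names HIGH and LOW are
hypothetical. The effect of each failure mode on the {\fg} remains an engineering decision, supplied
here as a dictionary; the code only enforces coverage and performs the grouping.

\begin{verbatim}
```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    failure_modes: frozenset


def analyse_group(name, components, effects):
    """FMEA on one functional group, collapsed into a derived component.

    effects maps (component name, failure mode) -> symptom of the group,
    as decided by the analyst. Every component failure mode must be
    present, so none can be silently skipped.
    """
    required = {(c.name, mode) for c in components for mode in c.failure_modes}
    missing = required - set(effects)
    unknown = set(effects) - required
    if missing or unknown:
        raise ValueError(f"missing: {sorted(missing)}, unknown: {sorted(unknown)}")

    # Group the component failure modes that cause the same symptom.
    symptoms = defaultdict(set)
    for (component, mode), symptom in effects.items():
        symptoms[symptom].add((component, mode))

    # The derived component's failure modes are the collected symptoms.
    return Component(name, frozenset(symptoms)), dict(symptoms)


if __name__ == "__main__":
    r1 = Component("R1", frozenset({"OPEN", "SHORT"}))  # supply side
    r2 = Component("R2", frozenset({"OPEN", "SHORT"}))  # ground side, output across R2
    divider, causes = analyse_group(
        "PotentialDivider",
        [r1, r2],
        {
            ("R1", "OPEN"): "LOW",    # no current: output pulled to ground
            ("R1", "SHORT"): "HIGH",  # output tied to supply
            ("R2", "OPEN"): "HIGH",   # output pulled up to supply via R1
            ("R2", "SHORT"): "LOW",   # output tied to ground
        },
    )
    print(sorted(divider.failure_modes))  # ['HIGH', 'LOW']
    print(sorted(causes["LOW"]))          # [('R1', 'OPEN'), ('R2', 'SHORT')]
```
\end{verbatim}

Four component failure modes collapse into two symptoms, so the {\dc} is simpler to use in a
higher-level {\fg} than the two resistors it replaces.
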

Because we now have a {\dc}, we can use it to form
new {\fg}s, and so build a hierarchical model of the system failure modes.
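
A sketch of the resulting hierarchy follows. It assumes a hypothetical \texttt{Part} type in which a
{\dc} records, for each of its failure modes, the child failure modes that cause it; the flame
scanner group, the ADC failure modes STUCK and NOISY, and the symptom names are invented for
illustration. Because every {\dc} keeps these links, any failure mode at the top of the hierarchy can
be traced back to the base component failure modes beneath it, which suggests one way a fault-tree
view could be derived from the model.

\begin{verbatim}
```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Part:
    """A base component (no children) or a derived component."""
    name: str
    failure_modes: set[str]
    children: list[Part] = field(default_factory=list)
    # symptom -> set of (child name, child failure mode) that cause it
    causes: dict[str, set[tuple[str, str]]] = field(default_factory=dict)


def derive(name, group, effects):
    """Turn a functional group into a derived component.

    effects: {(child name, child failure mode): symptom}, decided by the analyst.
    """
    required = {(c.name, m) for c in group for m in c.failure_modes}
    if set(effects) != required:
        raise ValueError("effects must cover exactly the group's failure modes")
    causes: dict[str, set[tuple[str, str]]] = {}
    for key, symptom in effects.items():
        causes.setdefault(symptom, set()).add(key)
    return Part(name, set(causes), list(group), causes)


def trace(part, mode):
    """Return the base component failure modes that lead to `mode` of `part`."""
    if not part.children:
        return {(part.name, mode)}
    by_name = {child.name: child for child in part.children}
    found = set()
    for child_name, child_mode in part.causes[mode]:
        found |= trace(by_name[child_name], child_mode)
    return found


if __name__ == "__main__":
    r1 = Part("R1", {"OPEN", "SHORT"})
    r2 = Part("R2", {"OPEN", "SHORT"})
    divider = derive("PotentialDivider", [r1, r2], {
        ("R1", "OPEN"): "LOW", ("R1", "SHORT"): "HIGH",
        ("R2", "OPEN"): "HIGH", ("R2", "SHORT"): "LOW",
    })
    adc = Part("ADC", {"STUCK", "NOISY"})  # hypothetical ADC input failure modes
    # A higher-level functional group built from a derived component.
    scanner = derive("FlameScanner", [divider, adc], {
        ("PotentialDivider", "HIGH"): "NO_FLICKER",
        ("PotentialDivider", "LOW"): "NO_FLICKER",
        ("ADC", "STUCK"): "NO_FLICKER",
        ("ADC", "NOISY"): "FALSE_FLICKER",
    })
    print(sorted(scanner.failure_modes))           # ['FALSE_FLICKER', 'NO_FLICKER']
    print(sorted(trace(scanner, "FALSE_FLICKER")))  # [('ADC', 'NOISY')]
```
\end{verbatim}
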

%%- \section{building blocks of a safety critical system}
%%-
%%- This section looks at common features in a safety critical system and
%%- then looks at the building blocks of these systems
%%- and their characteristics.
%%-
%%- \subsection{what is a safety critical system?}
%%-
%%- DEFINITIONS GET REFS
%%-
%%-
%%- TYPICALLY HAS MECHANICAL, ELECTRONIC and SOFTWARE
%%- actuators control intelligence
%%-
%%- \subsection{An example : industrial burner}
%%-
%%- An industrial burner is a nice example of a safety critical system.
%%- It has some lethal risks and some environmental.
%%- It could, by igniting an explosive mixture, cause an explosion.
%%- By burning incorrect proportions of fuel and air, it could be inefficient and waste
%%- resources, or worse could cause poisonous burning (typically carbon monoxide, but also
%%- where flame temperature is very high, can produce NOX emissions).
%%-
%%- To prevent igniting an explosive mixture, air is pumped through the furnace
%%- chamber on start-up, and this is verified with an air pressure switch.
%%-
%%-
%%- NEED A DIAGRAM HERE
%%-
%%-
%%- NEED A STATE CHART TOO
%%-
%%- It is interesting here to compare how the different methodologies
%%- would deal with a particular sub-system in the burner controller
%%- and compare how they analyse it.
%%- The Flame scanner is a good example for this.
%%- We shall consider a simple infra red (IR) flame scanner.
%%- This is in the form of an IR sensitive resistor.
%%- The flame type we will be looking for will have a characteristic
%%- flicker frequency of around 13Hz.
%%- The circuit is then simply a resistor voltage divider connected to
%%- a micro-controller reading the voltage.
%%- The flame scanner is thus a two resistor voltage divider.
%%-
%%- \subsection{The Flame Scanner}
%%- \subsubsection{Macro FTA perspective}
%%-
%%- SHOW ALL TOP LEVEL FAULTS. EXPLOSION, POISONOUS BURNING CO, POISONOUS BURNING NOX, FAILS TO LIGHT etc
%%-
%%- Follow the explosion tree down to flame scanner fails ON, and OFF
%%-
%%- etc
%%- \subsubsection{Macro FMEA/Statistical perspective}
%%-
%%- Each of the resistors is considered critical, in the statistical case, and so the MTTF
%%- is added into the DANGEROUS section.
%%-
%%- For FMEA the resistor failures add up to the SYSTEM level, show this is inappropriate
%%- and makes several jumps in applied knowledge, thus Bayes theorem etc
%%-
%%- \subsubsection{Micro FMMD perspective}
%%-
%%-
%%- Here show how the flame scanner becomes a black box, or component in itself.
%%- How it is now available to be integrated into higher level designs.
%%-
%%- %and then an ignition position is checked.
%%- %Initially a pilot flame is started and when this is stable, the main
%%- %flame is fired.
%%- %To check the stability of the flame, a flame scanner is required.
%%- %To mix the fuel and air, motors to position valves are generally used.
%%- %To prevent fuel leakage into the furnace, safety shut-off valves are used \footnote{These generally open slowly under power, and when power is removed `slam shut'. Thus
%%- %in the event of a general power failure, they default to safe behaviour.}
%%-
%%-
%%-
%%-
%%- Motors controlling air and fuel flow
%%- safety chain to power for shutdown valves
%%- safety shutdown valves on fuel
%%- flame sensor
%%- air pressure sensor
%%-
%%-
%%- \section{Base Level Components}
%%-
%%- A common factor with all safety critical systems, is
%%- base level -or- bought in components. Be these
%%- electrical, mechanical or firmware, they should all
%%- have known failure modes.
%%-
%%- \subsection { Failure modes defining the component}
%%- We can consider each bought-in component as a base level component,
%%- and it should have an associated set of failure modes.
%%-
%%-
%%-
%%- \subsection { Complication of multiple failure modes }
%%- A very complicated component, like an integrated circuit or perhaps a servo motor, has
%%- a set of failure modes, where several things could go wrong with it within the $\tau$ period.
%%- This is a simultaneous failure, or more than one failure mode being active during the same time period.
%%-
%%-
%%- \section{FMMD Proposed Methodology Outline}
%%-
%%- fire away, essentially the elevator pitch
%%-
%%- \subsection{Treating a functional group as a component}
%%- \subsection{Using a derived component in designs}
%%- \section{Building a failure Mode model Hierarchy}
%%-
%%- AND the hierarchy...
%%-
%%-
%%- Probably about 3 pages