%
% Make the revision and doc number macros so they are defined in one place

\ifthenelse {\boolean{paper}}
{
\begin{abstract}
A survey of static failure mode analysis methodologies applicable to safety critical systems.
\end{abstract}
}
{
\section{Overview}
A survey of static failure mode analysis methodologies applicable to safety critical systems.
}

There are four methodologies in common use for failure mode modelling.
These are FTA, FMEA, FMECA
and FMEDA (a form of statistical assessment).
%
These methodologies date from the 1940s onwards, and were designed for
different application areas and reasons; all have drawbacks and
advantages that are discussed in the next section.
%In short
%FTA, due to its top down nature, can overlook error conditions. FMEA and the Statistical Methods
%lack precision in predicting failure modes at the SYSTEM level.

This
\ifthenelse {\boolean{paper}}
{
paper
}
{
chapter
}
presents the design considerations that motivated and provided the specification for
the FMMD methodology.
%

\section{Introduction}

\subsection{Failure Modes and System Failure Symptoms}
describe briefly what a base component failure mode is and what a system level failure mode is.

\section{Four Current Failure Mode Analysis Methodologies}

\subsection{FTA}

This, like all top~down methodologies, introduces the very serious problem
of missing component failure modes \cite{faa}[Ch.9].
%, or modelling at
%a too high level of failure mode abstraction.
FTA was invented for use on the Minuteman nuclear defence missile
systems in the early 1960s and was not designed as a rigorous
fault/failure mode methodology.
It was designed to look for disastrous top level hazards and
determine how they could be caused.
It is more like a procedure to
be applied when discussing the safety of a system, with a top down hierarchical
notation, using logic symbols, that guides the analysis.
This methodology was designed for
experienced engineers sitting around a large diagram and discussing the safety aspects.
Also, the nature of a large rocket, with its red wire and remote detonation
fail-safes, meant that the objective was to iron out common failures,
not to rigorously detect all possible failures.
Consequently it was not designed to guarantee coverage of all component failure modes,
and has no rigorous in-built safeguards to ensure coverage of all possible
system level outcomes.

\subsubsection{FTA weaknesses}
\begin{itemize}
\item Possibility to miss component failure modes.
\item Possibility to miss environmental effects.
\item No possibility to model base component level double failure modes.
\end{itemize}

\subsection{FMEA}

\label{pfmea}
This is an early static analysis methodology, and concentrates
on SYSTEM level errors that have been investigated.
The investigation will typically point to a particular failure
of a component.
The methodology is then applied to find the significance of the failure.
It is based on a simple equation where $S$ ranks the severity (or cost \cite{bfmea}) of the identified SYSTEM failure,
$O$ its occurrence\footnote{The occurrence $O$ is the
probability of the failure happening.},
and $D$ gives the failure's detectability\footnote{Detectability: often failures
may occur but not be noticed or cause an effect.
Consider an unused feature failing.}. Multiplying these
together
gives a risk priority number (RPN), given by $RPN = S \times O \times D$.
This gives in effect
a prioritised `todo list', with higher $RPN$ values being the most urgent.
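
As an illustration, suppose a SYSTEM failure is scored on 1--10 scales (these
figures are invented purely for this example) with severity $S=8$, occurrence
$O=3$ and detectability $D=5$.  Then

$$ RPN = S \times O \times D = 8 \times 3 \times 5 = 120 , $$

whereas a failure scored $S=9$, $O=6$, $D=7$ gives $RPN = 378$ and would be
placed higher on the corrective action list.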


\subsubsection{FMEA weaknesses}
\begin{itemize}
\item Possibility to miss the effects of failure modes at SYSTEM level.
\item Possibility to miss environmental effects.
\item No possibility to model base component level double failure modes.
\end{itemize}

\paragraph{Note.} FMEA is sometimes used in its literal sense, that is to say
Failure Mode Effects Analysis: simply looking at a system's internal failure
modes and determining what may happen as a result.
The FMEA described in this section (\ref{pfmea}) is sometimes called `production FMEA'.

\subsection{FMECA}

Failure mode, effects, and criticality analysis (FMECA) extends FMEA by adding a failure outcome criticality factor.
This is a bottom up methodology, which takes component failure modes
and traces them to SYSTEM level failures.
%
Reliability data for components is used to predict the
failure statistics in the design stage.
An openly published source for the reliability of generic
electronic components was published by the DOD
in 1991 (MIL~HDBK \cite{mil1991}) and is a typical
source for MTTF data.
%
FMECA has a probability factor for a component error becoming % causing
a SYSTEM level error.
This is termed the $\beta$ factor.
%\footnote{for a given component failure mode there will be a $\beta$ value, the
%probability that the component failure mode will cause a given SYSTEM failure}.
%
This lacks precision, or in other words, failure outcome prediction accuracy \cite{fafmea},
as often the component failure mode cannot be proven to cause a SYSTEM level failure, but is
assigned a probability $\beta$ factor by the design engineer. The use of a $\beta$ factor
is often justified using Bayes theorem \cite{probstat}.
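
For reference, a typical form of the criticality calculation used alongside the
$\beta$ factor (this is the MIL-STD-1629A style criticality number, quoted here
as an illustration rather than from the surveyed texts) multiplies $\beta$ by
the failure mode ratio $\alpha$, the part failure rate $\lambda_p$ and the
operating time $t$:

$$ C_m = \beta \, \alpha \, \lambda_p \, t . $$

Failure modes with higher $C_m$ values demand attention first.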
%Also, it can miss combinations of failure modes that will cause SYSTEM level errors.
%
The results of FMECA are similar to FMEA, in that component errors are
listed according to importance, based on
probability of occurrence and criticality.
% to prevent the SYSTEM fault of given criticality.
Again this essentially produces a prioritised `todo' list.

%%-WIKI- Failure mode, effects, and criticality analysis (FMECA) is an extension of failure mode and effects analysis (FMEA).
%%-WIKI- FMEA is a a bottom-up, inductive analytical method which may be performed at either the functional or
%%-WIKI- piece-part level. FMECA extends FMEA by including a criticality analysis, which is used to chart the
%%-WIKI- probability of failure modes against the severity of their consequences. The result highlights failure modes with relatively high probability
%%-WIKI- and severity of consequences, allowing remedial effort to be directed where it will produce the greatest value.
%%-WIKI- FMECA tends to be preferred over FMEA in space and North Atlantic Treaty Organization (NATO) military applications,
%%-WIKI- while various forms of FMEA predominate in other industries.

\subsubsection{FMECA weaknesses}
\begin{itemize}
\item Possibility to miss the effects of failure modes at SYSTEM level.
\item The $\beta$ factor is based on heuristics and does not reflect any rigorous calculations.
\item Possibility to miss environmental effects.
\item No possibility to model base component level double failure modes.
\end{itemize}

\subsection{FMEDA or Statistical Analysis}

Failure Modes, Effects, and Diagnostic Analysis (FMEDA)
% This
is a process that takes all the components in a system,
and using the failure modes of those components, the investigating engineer
ties them to possible SYSTEM level events/failure modes.
%
This technique
evaluates a product's statistical level of safety,
taking into account its self-diagnostic ability.
The calculations and procedures for FMEDA are
described in EN61508 %Part 2 Appendix C
\cite{en61508}[Part 2 App C].
The following gives an outline of the procedure.

\subsubsection{Two statistical perspectives}
FMEDA is a statistical analysis methodology and is used from one of two perspectives:
Probability of Failure on Demand (PFD), and Probability of Failure
in continuous Operation, or Failure in Time (FIT).
\label{survey:fit}
\paragraph{Failure in Time (FIT).} Continuous operation is measured in failures per billion ($10^9$) hours of operation.
For a continuously running nuclear power station, industrial burner or aircraft engine
we would be interested in its operational FIT values.
\label{survey:pfd}
\paragraph{Probability of Failure on Demand (PFD).} For instance with an anti-lock braking system in
an automobile, or any other fail safe measure applied in an emergency, we would be interested in PFD.
That is to say, the proportion of demands on which it fails
to operate correctly.
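
To give a feel for these units (the figures are illustrative only): a failure
rate of 100~FIT is $100/10^9 = 10^{-7}$ failures per hour of operation, an MTTF
of $10^{7}$ hours (a little over 1100 years for a single unit); a PFD of
$10^{-3}$ means that, on average, one demand in a thousand would not be met.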

\subsubsection{The FMEDA Analysis Process}

\paragraph{Determine SYSTEM level failures from base components.}
The first stage is to apply FMEA to the SYSTEM.
%
Each component is analysed in terms of how its failure
would affect the system.
Failure rates of individual components in the SYSTEM
are calculated based on component type and
environmental conditions. The SYSTEM errors are categorised as `safe' or `dangerous'.
%
%Statistical data exists for most component types \cite{mil1992}.
%
This phase is typically implemented on a spreadsheet
with rows representing each component. A typical component spreadsheet row would
comprise the
component type, placement,
part number, environmental stress factors, MTTF, safe/dangerous classification etc.
%will be a determination of whether the component failing will lead to a `safe'
%or `unsafe' condition.
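
A hypothetical fragment of such a spreadsheet (the part numbers and figures are
invented purely for illustration) might look like this:

\begin{center}
\begin{tabular}{|l|l|l|l|l|}
\hline
Component & Part no. & Failure mode & MTTF (hours) & Classification \\
\hline
R1 (pull up resistor) & RES10K  & OPEN  & $10^{9}$          & dangerous \\
C4 (decoupling cap)   & CAP100N & SHORT & $5 \times 10^{8}$ & safe      \\
\hline
\end{tabular}
\end{center}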

\paragraph{Overall SYSTEM failure rate.}
The product failure rate is the sum of all component
failure rates: typically the sum of the failure rates (the reciprocals of the MTTF values) of all
components in an FMEDA spreadsheet.
%This is the sum of safe and unsafe
%failures.
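
Expressed as a formula, with $\lambda_i$ denoting the failure rate of the
$i$th of $N$ components (this notation is ours, for illustration only):

$$ \lambda_{SYSTEM} = \sum_{i=1}^{N} \lambda_i . $$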

\paragraph{Self Diagnostics.}
We next evaluate the SYSTEM's self-diagnostic ability.

%Each component's failure modes and failure rate are now available.
Failure modes are now classified as safe or dangerous.
This is done by taking a component failure mode and determining
if the SYSTEM error it is tied to is dangerous or safe.
The decision for this may be
based on heuristics or field data.
EN61508 uses the $\lambda$ symbol to represent failure rates.
Because we have statistics for each component failure mode,
we can now classify these in terms of safe and dangerous lambda values.
Failure rates are labelled `$\lambda_D$' (for
dangerous) and `$\lambda_S$' (for safe) \cite{en61508}.

\paragraph{Determine Detectable and Undetectable Failures.}
Each safe and dangerous failure mode is now
classified as detectable or undetectable.
EN61508 assumes that products have a high level of
self checking features.
%
This gives four failure mode classifications:
Safe-Detected (SD), Safe-Undetected (SU), Dangerous-Detected (DD) or Dangerous-Undetected (DU),
and the probabilistic failure rate of each classification
is represented by lambda variables
(i.e. $\lambda_{SD}$, $\lambda_{SU}$, $\lambda_{DD}$, $\lambda_{DU}$).
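
Under this classification the total failure rate of a component (or of the
SYSTEM) decomposes, in the usual EN61508 notation, as

$$ \lambda = \lambda_{SD} + \lambda_{SU} + \lambda_{DD} + \lambda_{DU} . $$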

Because it is recognised that some failure modes may not be discovered theoretically during the static
analysis, the
% admission of how daft it is to take a component failure mode on its own
% and guess how it will affect an ENTIRE complex SYSTEM
% Admission of failure of the process really !!!!
next step is to investigate using an actual working SYSTEM.

Failures are deliberately caused (by physical intervention), and any new SYSTEM level
failures are added to the model.
Heuristics and MTTF failure rates for the components
are used to calculate probabilities for these new failure modes,
along with their safety and detectability classifications (i.e.
$\lambda_{SD}$, $\lambda_{SU}$, $\lambda_{DD}$, $\lambda_{DU}$).
%SD, SU, DD, DU.

With these classifications, and statistics for each component,
we can now calculate statistics for the diagnostic coverage (how good at `self checking' the system is)
and its safe failure fraction (how many of its failures are self detected or safe compared to
all failures possible).

The calculations for these are described below.

\paragraph{Diagnostic Coverage.}
The diagnostic coverage is simply the ratio
of the dangerous detected failure rates
to the rate of all dangerous failures,
and is normally expressed as a percentage. $\Sigma\lambda_{DD}$ represents
the sum of the failure rates of the dangerous detected base component failure modes, and
$\Sigma\lambda_D$ the sum over all dangerous base component failure modes.

$$ DiagnosticCoverage = \Sigma\lambda_{DD} / \Sigma\lambda_D $$

The diagnostic coverage for safe failures, where $\Sigma\lambda_{SD}$ represents the sum of the rates of
the safe detected base component failure modes,
and $\Sigma\lambda_S$ the sum over all safe base component failure modes,
is given as

$$ SF = \frac{\Sigma\lambda_{SD}}{\Sigma\lambda_S} $$
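
As a worked illustration with invented figures: if the dangerous detected
failure rates sum to $\Sigma\lambda_{DD} = 9 \times 10^{-8}$ per hour and all
dangerous failure rates to $\Sigma\lambda_{D} = 10^{-7}$ per hour, then

$$ DiagnosticCoverage = \frac{9 \times 10^{-8}}{10^{-7}} = 0.9 = 90\% . $$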


\paragraph{Safe Failure Fraction.}
A key concept in FMEDA is the Safe Failure Fraction (SFF).
This is the ratio of the safe failures plus the dangerous detected failures
to all safe and dangerous failures.
Again this is usually expressed as a percentage.

$$ SFF = \big( \Sigma\lambda_S + \Sigma\lambda_{DD} \big) / \big( \Sigma\lambda_S + \Sigma\lambda_D \big) $$
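
Continuing the invented figures above, and supposing the safe failure rates
sum to $\Sigma\lambda_{S} = 4 \times 10^{-7}$ per hour:

$$ SFF = \frac{4 \times 10^{-7} + 9 \times 10^{-8}}{4 \times 10^{-7} + 10^{-7}}
       = \frac{4.9 \times 10^{-7}}{5 \times 10^{-7}} = 0.98 = 98\% . $$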

%This is the ratio of
%Step 4 Calculate SFF, SIL and PFD
%The SIL level of the product is finally determined from the Safe Failure Fraction (SFF) and the Probability of Failure on Demand (PFD). The following formulas are used.
%SFF = (lSD + lSU + lDD) / (lSD + lSU + lDD + lDU)
%PFD = (lDU)(Proof Test Interval)/2 + (lDD)(Down Time or Repair Time)

% Often a given component failure mode there will be a $\beta$ value, the
% probability that the component failure mode will cause a given SYSTEM failure.

%\paragraph{Risk Mitigation}
%
%The component may be have its risk factor
%reduced by the checking interval (or $\tau$ time between self checking procedures).
%
%Ultimately this technique calculates a risk factor for each component.
%The risk factors of all the components are summed and
%%give a value for the `safety level' for the equipment in a given environment.

\paragraph{Classification into Safety Integrity Levels (SIL).}
There are four SIL levels, from 1 to 4, with 4 being the highest safety level.
In addition to probabilistic risk factors, the
diagnostic coverage and SFF
have threshold bands becoming stricter for each level.
The software verification and specification techniques and constraints demanded
(such as language subsets, s/w redundancy etc.)
also become stricter for each SIL level.
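
For orientation, the indicative failure probability targets associated with
each SIL are shown below (these bands are quoted from memory of EN61508 and
are given for illustration only; the standard itself is the authoritative
source):

\begin{center}
\begin{tabular}{|c|c|c|}
\hline
SIL & Average PFD (low demand) & Dangerous failures per hour (continuous) \\
\hline
4 & $10^{-5}$ to $10^{-4}$ & $10^{-9}$ to $10^{-8}$ \\
3 & $10^{-4}$ to $10^{-3}$ & $10^{-8}$ to $10^{-7}$ \\
2 & $10^{-3}$ to $10^{-2}$ & $10^{-7}$ to $10^{-6}$ \\
1 & $10^{-2}$ to $10^{-1}$ & $10^{-6}$ to $10^{-5}$ \\
\hline
\end{tabular}
\end{center}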
%%
%% Andrew asked me to expand on this here, but it would take at least two
%% pages. I think its more appropriate for the survey.tex chapter.
%%

Thus FMEDA uses statistical methods to determine
a safety level (SIL), typically used to meet an acceptable risk
value, specified for the environment the SYSTEM must work in.
EN61508 defines, in general terms,
risk assessment and the required SIL levels \cite{en61508}[Part 5 Annex A].

%the probability of
%failures occurring, and provide an adaquate risk level.
%
%A component failure mode, given its MTTF
%the probability of detecting the fault and its safety relevant validation time $\tau$,
%contributes a simple risk factor that is summed
%in to give a final risk result.
%
Thus an FMEDA
model can be implemented on a spreadsheet, where each component
has a calculated risk, a fault detection time (if any), an estimated risk importance
and other factors such as de-rating and environmental stress.
With one component failure mode per row,
all the statistical factors for SIL rating can be produced\footnote{A SIL rating will apply
to an installed plant, i.e. a complete installed and working SYSTEM. SIL ratings for individual components or
sub-systems are meaningless, and the nearest equivalent would be the FIT/PFD, SFF and diagnostic coverage figures.}.

\subsubsection{FMEDA and failure outcome prediction accuracy}
FMEDA suffers from the same problem of
lack of component failure mode outcome prediction accuracy as FMEA in section \ref{pfmea}.
%
This is because the analyst has to decide how a particular component failing will impact on the SYSTEM, or top level.
This involves a `leap of faith'. For instance, a resistor failing in a sensor circuit
may be part of a critical monitoring function.
The analyst is now put in a position
where he probably should assign a dangerous failure classification to it.
%
There is no analysis
of how that resistor would or could affect the components close to it, but because the circuitry
is part of a critical section it will most likely
be linked to a dangerous system level failure in an FMEDA study.
%
%%- IS THIS TRUE IS THERE A BETA FACTOR IN FMEDA????
%%-
%A $\beta$ factor, the heuristically defined probability
%of the failure causing the system fault may be applied.
%
%In FMEDA there is no detailed analysis of the failure mode behaviour
%of the component in its local environment
%Component failure modes are traceable directly to the SYSTEM level.
%it becomes more
%guess work than science.
%
With FMEDA, there is no rigorous cause and effect analysis for the failure modes
and how they interact on the micro scale (the components adjacent to them in terms of functionality).
Unintended side effects that lead to failure can be missed.
Also, component failure modes that are not
dangerous may be wrongly assigned as dangerous simply because they exist in a critical
section of the product.

% some critical component failure
%modes, but we can only guess, in most cases what the safety case outcome
%will be if it occurs.

This leads to the practice of having components within a SYSTEM partitioned into different
safety level zones as recommended in EN61508~\cite{en61508}. This is a vague way of determining
safety, as it can miss unexpected effects due to `unexpected' component interaction.

The statistical analysis methodology is the core philosophy
of the Safety Integrity Levels (SIL) embodied in EN61508 \cite{en61508}
and its international analogue standard IEC~61508.

\subsubsection{FMEDA weaknesses}
\begin{itemize}
\item Possibility to miss the effects of failure modes at SYSTEM level.
\item Statistical nature allows a proportion of undetected failures for a given SIL level.
\item Allows a small proportion of `undetectable' error conditions.
\item No possibility to model base component level double failure modes.
\end{itemize}
%AND then how we can solve all these problems

\subsection{Deterministic FMEA}

EN298 requires that no two individual component failures may give rise to a dangerous condition.

\section{FMEDA: Failure Modes, Effects and Diagnostic Analysis}

This is the main basis of SIL certification for programmable electronic equipment.
It applies FMEA, with classification of the criticality of
components, adjustment of MTTF values by the self checking mechanisms in the product,
and mitigation by the safe failure fraction. This leads to a probabilistic
mean time to failure, or probability of failure on demand, that must
fall within the criteria for a given SIL safety level.
An overview of this method can be found in an EXIDA paper \cite{fmeda}
and a detailed description of the method for SIL certification in part 2 of
EN61508 \cite{en61508}.

Disadvantage: a single component failure is used to determine its effect on
the entire system. This leads to classifying components as safety or non-safety critical
at an early stage in the analysis. This means that complex interactions or side effects
of the components failing may not be taken into account.

Advantage: the concepts of self checking systems, and safe failure fraction\footnote{The Safe Failure Fraction (SFF) is the proportion of failures
that are either safe or detected, compared to all possible failures. The thinking here is that if components are detected as failing,
even though they are not safety critical, the system is self checking a greater proportion of itself, and is therefore safer. This
is an application of Bayes theorem to probabilistic error detection}.

This is a probabilistically based methodology.

\subsection{Safe Failure Fraction}

Introduce the idea of coverage.
A good example is RAM in a microprocessor/microcontroller: we cannot give 100\% coverage to it.
We can perform some tests that give us 60\% coverage etc.

\subsection{Diagnostic interval}

Reducing the effective failure rate (FIT) by detecting a fraction of the faults within a given interval. Give formulas etc.
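
A sketch of the kind of formula intended here, mirroring the PFD expression
left in the comments of the FMEDA section above (an approximation, not a
definitive EN61508 calculation): dangerous undetected failures are exposed for,
on average, half the proof test interval $T_{proof}$, while dangerous detected
failures are exposed only for the repair time $T_{repair}$, giving

$$ PFD \approx \lambda_{DU} \frac{T_{proof}}{2} + \lambda_{DD} T_{repair} . $$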


\subsection{Redundancy -- Models}

1oo1, 2oo3 etc.

\subsection{Field Data}

OK for EN61508, not OK for the nuclear industry; find refs.

\section{FTA}

Fault Tree Analysis.
Show how it works, top down.

FROM INTERNET HISTORY OF FTA

% A simple fault tree
% Author: Zhang Long, Mail: zhangloong[at]gmail.com
%\def\pgfsysdriver{pgfsys-dvipdfm.def}
%\documentclass{minimal}
%\usepackage{tikz}
%\usetikzlibrary{shapes.gates.logic.US,trees,positioning,arrows}
%\begin{document}

\begin{figure}
\begin{tikzpicture}[
% Gates and symbols style
and/.style={and gate US,thick,draw,fill=blue!40,rotate=90,
anchor=east,xshift=-1mm},
or/.style={or gate US,thick,draw,fill=blue!40,rotate=90,
anchor=east,xshift=-1mm},
be/.style={circle,thick,draw,fill=white!60,anchor=north,
minimum width=0.7cm},
tr/.style={buffer gate US,thick,draw,fill=white!60,rotate=90,
anchor=east,minimum width=0.8cm},
% Label style
label distance=3mm,
every label/.style={blue},
% Event style
event/.style={rectangle,thick,draw,fill=yellow!20,text width=2cm,
text centered,font=\sffamily,anchor=north},
% Children and edges style
edge from parent/.style={very thick,draw=black!70},
edge from parent path={(\tikzparentnode.south) -- ++(0,-1.05cm)
-| (\tikzchildnode.north)},
level 1/.style={sibling distance=7cm,level distance=1.4cm,
growth parent anchor=south,nodes=event},
level 2/.style={sibling distance=7cm},
level 3/.style={sibling distance=6cm},
level 4/.style={sibling distance=3cm}
%% For compatability with PGF CVS add the absolute option:
% absolute
]
%% Draw events and edges
\node (g1) [event] {No flow to receiver}
child{node (g2) {No flow from Component B}
child {node (g3) {No flow into Component B}
child {node (g4) {No flow from Component A1}
child {node (t1) {No flow from source1}}
child {node (b2) {Component A1 blocks flow}}
}
child {node (g5) {No flow from Component A2}
child {node (t2) {No flow from source2}}
child {node (b3) {Component A2 blocks flow}}
}
}
child {node (b1) {Component B blocks flow}}
};
%% Place gates and other symbols
%% In the CVS version of PGF labels are placed differently than in PGF 2.0
%% To render them correctly replace '-20' with 'right' and add the 'absolute'
%% option to the tikzpicture environment. The absolute option makes the
%% node labels ignore the rotation of the parent node.
\node [or] at (g2.south) [label=-20:G02] {};
\node [and] at (g3.south) [label=-20:G03] {};
\node [or] at (g4.south) [label=-20:G04] {};
\node [or] at (g5.south) [label=-20:G05] {};
\node [be] at (b1.south) [label=below:B01] {};
\node [be] at (b2.south) [label=below:B02] {};
\node [be] at (b3.south) [label=below:B03] {};
\node [tr] at (t1.south) [label=below:T01] {};
\node [tr] at (t2.south) [label=below:T02] {};
%% Draw system flow diagram
% \begin{scope}[xshift=-7.5cm,yshift=-5cm,very thick,
% node distance=1.6cm,on grid,>=stealth',
% block/.style={rectangle,draw,fill=cyan!20},
% comp/.style={circle,draw,fill=orange!40}]
% \node [block] (re) {Receiver};
% \node [comp] (cb) [above=of re] {B} edge [->] (re);
% \node [comp] (ca1) [above=of cb,xshift=-0.8cm] {A1} edge [->] (cb);
% \node [comp] (ca2) [right=of ca1] {A2} edge [->] (cb);
% \node [block] (s1) [above=of ca1] {Source1} edge [->] (ca1);
% \node [block] (s2) [right=of s1] {Source2} edge [->] (ca2);
% \end{scope}
\end{tikzpicture}
\caption{Example FTA for a Gas Supply with two Shutoff Valves}
\end{figure}


\subsection{Bayes Theorem in Relation to Failure Modes}

\paragraph{Conditional Probability}
Bayes theorem describes the probability of causes.

In the context of failure modes in components
we are interested in how they may affect a SYSTEM.
The SYSTEM failure modes can be seen as symptoms of the failure modes of base
components.
For example, let $B$ be a base component failure mode
and let $S$ be a system level failure mode.

We can say that the conditional probability of $S$ given $B$ is denoted as

\begin{equation}
\label{eqn:condprob}
P(S|B) = \frac{P(S \cap B)}{P(B)} .
\end{equation}

%\paragraph{Multiple Events and conditional Probability}
%
%add copy, describe probabilities for multiple events.....


%Or in other words we can say that the probability of $B$ and $S$ occurring
%divided by the probability of $S$ occurring due to any cause, is the probability
%the $B$ caused $S$.
We can call this the {\em conditional probability} of $S$ given $B$.
Re-arranging \ref{eqn:condprob} gives

$$ P(B) P(S|B) = P(S \cap B) . $$

The inverse condition, $B$ given $S$, gives

$$ P(S) P(B|S) = P(S \cap B) . $$

As both expressions are equal to $P(S \cap B)$,
we can state

$$ P(B) P(S|B) = P(S \cap B) = P(S) P(B|S) . $$

We can now re-arrange the equation~\cite{probstat} to remove the intersection $P(S \cap B)$ term,
thus

\begin{equation}
\label{eqn:bayes1}
P(S|B) = \frac{P(S) P(B|S)} {P(B)} .
\end{equation}

Equation \ref{eqn:bayes1} gives, for a base component failure mode $B$ that has occurred, the probability that the system level failure $S$ results.
Because we are interested in what base component failure modes could have caused $S$,
we need to re-arrange this:

\begin{equation}
\label{eqn:bayes2}
P(B|S) = \frac{P(B) P(S|B)} {P(S)} .
\end{equation}

Equation \ref{eqn:bayes2} can be read as: given that the system failure mode $S$ has occurred, what is the probability that the base component failure mode $B$ was the cause.

Typically a system level failure will have a number of possible causes, or base component failure
modes. Some base component failure modes may not be able to cause given system failures.
We can represent the base component failure modes as a partitioned set~\cite{nucfta}[fig VI-7], and overlay
a given system failure mode on it.

\begin{figure}[h]
\centering
\includegraphics[width=350pt,keepaspectratio=true]{./survey/partition.jpg}
% partition.jpg: 510x264 pixel, 72dpi, 17.99x9.31 cm, bb=0 0 510 264
\caption{Base Component Failure Modes represented as partitioned sets}
\label{fig:partitionbcfm}
\end{figure}

Figure \ref{fig:partitionbcfm} represents a small theoretical system
with nine base component failure modes. These are represented as partitions
in a set theoretic model of the system's possible failure mode causes.

\begin{figure}[h]
\centering
\includegraphics[width=350pt,keepaspectratio=true]{./survey/partition2.jpg}
% partition.jpg: 510x264 pixel, 72dpi, 17.99x9.31 cm, bb=0 0 510 264
\caption{Base Component Failure Modes with Overlaid System Error}
\label{fig:partitionbcfm2}
\end{figure}

Figure \ref{fig:partitionbcfm2} represents the case where we are looking at a particular
system level failure $S_k$. Looking at the diagram, we can see that this system failure
could be, but is not necessarily, caused by base component failure modes $B_1$, $B_2$ or $B_4$.
Should any other base component failure mode (causation event) occur, then according to the diagram
it cannot cause the system failure $S_k$.

\paragraph{Bayes Theorem}

Consider a SYSTEM error that has several potential base component causes.
Because a SYSTEM typically has a number of high level errors, let us consider
a specific one and label it $S_k$.
We can call $P(S_k)$ the prior probability of the SYSTEM error. That is to
say, the probability of $S_k$ occurring with no information about possible causes for it.
Consider a number of possible
base component `potential cause' events $B_n$, where $n$ is an index.
Our sample space $SS$, for investigating the system failure mode/symptom
$S_k$, is thus $ SS = \{B_1 ... B_n\} $.
Thus for any such event $B_n$, we can apply Bayes theorem
to determine the statistical likelihood that a given failure mode $B_n$
caused the system level error $S_k$:

%IN ENGLEEEESH Inverse causality.....
%Prob $B_n$ caused $S_k$ is the prob $S_k$ caused by $B_n$ divided by prob of $B_n$

$$
% P(S_k|B_n) = \frac{P(S_k) \; P(B_n | S_k) }{P(B_n)} alternate form of no use to MEEEEEE
P(B_n|S_k) = \frac{P(B_n) \; P(S_k | B_n) }{P(S_k)}
$$

For example, were we to have a component failure mode $B_n$ with a failure rate of $10^{-7}$ per hour,
and its associated system failure mode $S_k$ a failure rate of $5 \times 10^{-8}$ per hour, and given that
when the system error $S_k$ occurs there is a 10\% probability that $B_n$ had occurred (i.e. $P(B_n | S_k) = 0.1$), we can determine
the probability that a failure $B_n$ leads to the system failure $S_k$ thus

$$
P(S_k|B_n) = \frac{5 \times 10^{-8} \times 0.1 }{ 10^{-7}} = 0.05 = 5\% .
$$

Now take an example from the diagram (figure \ref{fig:partitionbcfm2}) where the base component fault cannot
lead to the system failure $S_k$:
taking say $B_9$, which does not overlap with $S_k$ (i.e. $B_9 \cap S_k = \emptyset $),
we can see that $P(S_k | B_9) = 0$.
Bayes theorem applied to $B_9$ becomes

$$P(B_9|S_k) = \frac{P(B_9) \times 0 }{ P(S_k)} = 0 = 0\% .$$

As $ P(S_k | B_n)$ is a factor in the numerator,
the application of Bayes theorem to $B_9$ being a cause for $S_k$ gives a probability
of zero, as we would expect.

Because we are interested in finding the probability of $S_k$ for all
base component failure modes, it is helpful to re-express
$P(S_k)$.

%
% here derive the trad version of bayes with the summation as the denominator
%
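
A sketch of the intended derivation (the standard total probability expansion,
assuming the $B_j$ are mutually exclusive and together cover all causes of
$S_k$): the denominator $P(S_k)$ can be written as a summation over the
partition, giving the familiar form of Bayes theorem

$$ P(B_n|S_k) = \frac{P(B_n) \; P(S_k|B_n)}{\sum_{j=1}^{N} P(B_j) \; P(S_k|B_j)} . $$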

RESTRICTIONS:

Because this uses conditional probability for multiple independent events,
complications such as operational states or environmental conditions
cannot be represented by the Bayesian model.
% consider 747 engines and a volcanic ash cloud....

\paragraph{Mutually independent events and base component failure statistics}

FMEA, FTA, FMECA and, to a great extent, FMEDA apply Bayesian
concepts to individual base~component failure rates, rather than
using base~component failure modes, for the events under
investigation.
This means a lack of precision in interpreting the base failure
modes as statistically independent events.
Typically, a base component may fail in more than one way,
and usually once it has, it stays in that failure mode.
This violates the principle of the events being statistically independent.

Show using area proportional Euler diagrams the failure modes and their
possible system level failure outcomes.

Discuss unused sections of hardware in a product.

Discuss protection devices like VDRs and capacitors for smoothing.

Discuss microprocessor watchdog and CRC ROM schemes.

Discuss hardware failsafes (a good example: over pressure safety valves).

Keep relating these back to Bayes theorem.



typeset in {\Huge \LaTeX} \today