\ifthenelse {\boolean{paper}}
|
||
{
|
||
\abstract{
|
||
This paper proposes a methodology for
|
||
creating failure mode models of safety critical systems, which
|
||
has a common notation
|
||
for mechanical, electronic and software domains and applies an
|
||
incremental and rigorous approach.
|
||
%This paper describes how the proposed methodology
|
||
%functions, given requirements and constraints (such as number of combinations
|
||
%of failure causes for flat ).
|
||
%It describes the need for the new methodology to be bottom-up, and
|
||
%then the need for incremental modularisation
|
||
%to build a fault mode hierarchy, which leads to the conceopt of functional grouping,
|
||
%analysis of those groupings, and from that
|
||
%the creation of derived components.
|
||
%%
|
||
%% What I have done
|
||
%%
|
||
The four main static failure mode analysis methodologies were examined and
assessed in the context of newer European safety standards.
|
||
Some of the deficiencies identified in these methodologies led to
|
||
a wish list for a more rigorous methodology.
|
||
%%
|
||
%% What I have found
|
||
%%
|
||
From the wish list
|
||
%and considering some constraints determined from
|
||
%the evaluation of the four established methodologies,
|
||
a new
|
||
methodology is developed and proposed.
|
||
This has been named Failure Mode Modular De-Composition (FMMD).
|
||
|
||
%% Sell it
|
||
%%
|
||
In addition to addressing the traditional weaknesses of
Fault Tree Analysis (FTA), Failure Mode Effects Analysis (FMEA), Failure Mode Effects Criticality Analysis (FMECA)
and Failure Mode Effects and Diagnostic Analysis (FMEDA), FMMD provides the means to model multiple failure mode scenarios
as specified in newer European Safety Standards \cite{en298}.
The proposed methodology is bottom-up and guarantees that no component failure mode is left unhandled.
It is also modular, meaning that the results of analysed components may be re-used in other projects.
|
||
}
|
||
}
|
||
{
|
||
%%% CHAPTER INTO NEARLT THE SAME AS ABSTRACT
|
||
|
||
This chapter proposes a methodology for
|
||
creating failure mode models of safety critical systems, which
|
||
has a common notation
|
||
for mechanical, electronic and software domains and applies an
|
||
incremental and rigorous approach.
|
||
%%
|
||
%This chapter describes how the proposed methodology functions
|
||
%given requirements and constraints such as the number of combinations
|
||
%of failure causes.
|
||
%It describes the need for the new methodology to be bottom-up, and
|
||
%then the need for incremental modularisation
|
||
%to build a fault mode hierarchy, which leads to the conceopt of functional grouping,
|
||
%analysis of those groupings, and from that
|
||
%the creation of derived components.
|
||
|
||
%% What I have done
|
||
%%
|
||
The four main static failure mode analysis methodologies were examined and
assessed in the context of newer European safety standards.
Some of the deficiencies identified in these methodologies led to
a wish list for a more rigorous methodology.
|
||
%%
|
||
%% What I have found
|
||
%%
|
||
From the wish list %
|
||
%and considering some constraints determined from
|
||
%the evaluation of the four established methodologies,
|
||
a new
|
||
methodology is developed and proposed.
|
||
This has been named Failure Mode Modular De-Composition (FMMD).
|
||
|
||
%% Sell it
|
||
%%
|
||
In addition to addressing the traditional weaknesses of
Fault Tree Analysis (FTA), Failure Mode Effects Analysis (FMEA), Failure Mode Effects Criticality Analysis (FMECA)
and Failure Mode Effects and Diagnostic Analysis (FMEDA), FMMD provides the means to model multiple failure mode scenarios
as specified in newer European Safety Standards \cite{en298}.
The proposed methodology is bottom-up and guarantees that no component failure mode is left unhandled.
It is also modular, meaning that the results of analysed components may be re-used in other projects.
|
||
|
||
}
\section{Current Static Failure Mode Methodologies}
|
||
|
||
There are four methodologies in common use for failure mode modelling.
These are FTA, FMEA, FMECA
and FMEDA (a form of statistical assessment).
%
These methodologies date from the 1940s onwards, and were designed for
different application areas and purposes; all have drawbacks and
advantages, which are discussed in the next section.
|
||
%In short
|
||
%FTA, due to its top down nature, can overlook error conditions. FMEA and the Statistical Methods
|
||
%lack precision in predicting failure modes at the SYSTEM level.
|
||
|
||
\paragraph{FMMD in context.}
|
||
Failure Mode Modular De-composition
(FMMD) aims to address the
weaknesses in the four established methodologies, and to add
features such as the ability to analyse multiple
failure mode scenarios and to allow modular re-use
of analysis.
|
||
|
||
%FMMD is an incremental bottom up FMEA process.
|
||
%% TERRIBLE PARAGRAPH
|
||
The FMMD
methodology provides a detailed, hierarchical, incremental and analytical
modelling system that creates a failure mode model from which
the data models for FTA, FMEA, FMECA and FMEDA % (the statistical approach)
can be derived. % if required.
An FMMD model is effectively a superset of all four traditional models.
It also focuses on component interaction within the model,
something not formally considered in the four established methodologies.
%
In addition, it applies rigorous checking at every analysis stage,
ensuring that \textbf{all} component failure modes are considered in the model.
|
||
|
||
%
|
||
\paragraph{FMMD process outline.}
|
||
This methodology has been named Failure Mode Modular De-composition (FMMD)
|
||
because it decomposes a SYSTEM into a hierarchy of modules or {\dc}s.
|
||
This
|
||
\ifthenelse {\boolean{paper}}
|
||
{
|
||
paper
|
||
}
|
||
{
|
||
chapter
|
||
}
|
||
presents the design considerations that motivated and provided the specification for
|
||
the FMMD methodology.
|
||
%
|
||
Firstly it briefly reviews the four traditional
|
||
static failure mode analysis methodologies and
|
||
lists their known weaknesses. A wish list is then drawn up
|
||
addressing these weaknesses and adding some extra requirements.
|
||
Using this wish list the philosophy for the new methodology
|
||
is determined.
|
||
%
|
||
FMMD works from the bottom up, taking small groups
|
||
of components, {\fgs}, and then analysing how they can fail.
|
||
\input{./shortfg}
|
||
|
||
\paragraph{Micro vs. macro failure mode analysis.}
|
||
The FMMD analysis is performed using failure mode effects analysis
|
||
from a micro rather than a macro perspective.
|
||
Thus instead of looking at component failure modes and determining how
|
||
they {\em may} cause a failure at SYSTEM level, we are looking at how
|
||
they {\em will} affect the component's local {\fg}.
|
||
When we know the failure modes of a {\fg} we can treat it as a `black box'
|
||
or {\dc}. With {\dc}s we can build {\fgs}
|
||
at higher levels of analysis, until we have a complete
|
||
hierarchy representing the failure behaviour of the SYSTEM.
|
||
%
|
||
Because all the failure modes of all the components
|
||
are held in a computer program, we can determine if the model has complete coverage
|
||
for component failure modes
|
||
(i.e. all component failure modes have been included in the model).
|
||
|
||
|
||
%OK need to describe the need for it
|
||
\section{The need for a new failure mode modelling methodology}
|
||
|
||
%%- There are dificulties with bot up methodologies,
|
||
%%- and this is in part due to the fact that accidents
|
||
%%- are always unforseen and unexpected.
|
||
|
||
%%- what do we have ENV factors, component failure modes.
|
||
|
||
%%- how difficult is it to take a single component failure mode and
|
||
%%- then from that determine how it will react with other components
|
||
%%- and how it will be affected
|
||
|
||
\subsection{General comments on bottom-up and top down approaches}
|
||
|
||
\paragraph{A general deficiency in top-down systems analysis.}
|
||
With a top down approach the investigator has to determine
a set of undesirable outcomes or `accidents'.
As most accidents are unexpected and their causes unforeseen \cite{safeware},
it is fair to say that a top down approach is not guaranteed to
predict all possible undesirable outcomes.
Top-down methodologies can also miss known component failure modes,
simply by not decomposing down to the base component failure mode level of detail.
|
||
|
||
\paragraph{A general problem with bottom-up static failure analysis.}
|
||
With the bottom up techniques we have all the known component failure modes
|
||
and the relative freedom to determine how each of these may affect the SYSTEM.
|
||
%
|
||
A problem with this is that a component typically
|
||
interacts in a complex way with several other functionally
|
||
adjacent components.
|
||
%
|
||
To take a component failure mode and then attempt to tie that
|
||
to a SYSTEM level outcome is very difficult.
|
||
%
|
||
%
|
||
The number of components
|
||
a failure mode under investigation might interact with is typically very large.
|
||
This makes it very difficult to predict the effects of a component
|
||
failure mode, because we have to decide which components it could affect,
|
||
or
|
||
in other words, which components are functionally adjacent to it.
|
||
%
|
||
We cannot consider all the components in the SYSTEM
|
||
when looking at a single failure mode,
|
||
and therefore human judgement must be used to
|
||
decide which interactions could be important.
|
||
|
||
Let $N$ be the number of components in our system, and $K$ be the average number of component failure modes
(ways in which a base~component can fail). The total number of base component failure modes
is $N \times K$. To examine the effect that each failure mode has on all
the other components\footnote{A base component failure will typically affect the sub-system
it is part of, and create a failure effect at the SYSTEM level.}
requires $(N-1) \times N \times K$ checks, in effect a very large set cross product.
|
||
|
||
|
||
Complicate this further with applied states or environmental conditions
|
||
and another order of cross product of complexity is added.
|
||
We may have a piece of self checking circuitry for instance that
|
||
has two states, normal and testing mode commanded by a logic line.
|
||
Or we may have a mechanical device that has a different
|
||
failure mode behaviour for say, different ambient pressures or temperatures.
|
||
|
||
If $E$ is the number of environmental conditions to consider
in a system, and $A$ the number of applied states,
the bottom-up analyst is presented with two
additional %cross product
factors, giving
$(N-1) \times N \times K \times E \times A$ checks.
|
||
If we put some typical figures for a very small embedded system\footnote{These figures would
be typical of a very simple temperature controller, with a micro-controller, sensor
and heater circuit.} into this, say $N=100$, $K=2.5$, $A=2$, and $E=10$,
we have $99 \times 100 \times 2.5 \times 10 \times 2 = 495000$.
To look in detail at half a million test cases is obviously impractical.
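
As a rough check on this arithmetic, the product can be built up one factor at a time, using the illustrative figures quoted above:
\[ N \times K = 100 \times 2.5 = 250 \quad \mbox{base component failure modes,} \]
\[ (N-1) \times N \times K = 99 \times 250 = 24750 \quad \mbox{failure mode to component checks,} \]
\[ (N-1) \times N \times K \times E \times A = 24750 \times 10 \times 2 = 495000 . \]
Each of the 250 base component failure modes would therefore need $99 \times 20 = 1980$ separate checks.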
|
||
|
||
If we were to consider multiple simultaneous failure modes,
we have yet another cross product of checks to be performed.
%
For instance, looking at double simultaneous failure modes, where $\#C$
is the number of checks to perform,
the equation reads $\#C = (N-2) \times (N-1) \times N \times K \times E$.
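
Using the same illustrative figures as before ($N=100$, $K=2.5$, $E=10$), this expression evaluates to
\[ \#C = 98 \times 99 \times 100 \times 2.5 \times 10 = 24255000 . \]
If the applied states factor $A=2$ from the single failure expression is assumed to carry over as well, the count doubles to $48510000$. Either way, this is roughly 50 to 100 times the single failure figure of $495000$.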
|
||
|
||
The bottom-up methodologies FMEA, FMECA and FMEDA take single failure modes\footnote{Often component failures, rather than individual component
failure modes, are used, making the analysis process less precise.} and link them
to SYSTEM level failure modes. Because of the astronomical number of possible interactions,
some valid ones are in danger of being missed. We can term this analysis a `leap~of~faith'
(i.e. leaping from the
component failure mode to the SYSTEM level).
|
||
|
||
|
||
|
||
\paragraph{Ideal static failure mode methodology.}
|
||
An ideal static failure mode methodology would build a failure mode model
from which the traditional four models could be derived.
It would address the shortcomings in the other methodologies, and
would have a user-friendly interface, with a visual (rather than symbolic) syntax with icons
to represent the results of analysis phases.
|
||
%
|
||
%There are four static analysis failure mode methodologies in common use.
|
||
%Each has its advantages and drawbacks, and each is suited for
|
||
%a different phase in the product life cycle.
|
||
The four methodologies in current use are discussed briefly below.
|
||
|
||
\subsection { FTA }
|
||
\glossary{name={FTA},description={Fault Tree Analysis}}
|
||
This, like all top~down methodologies, introduces the very serious problem
of missing component failure modes \cite[Ch.9]{faa}.
|
||
\fmodegloss
|
||
%, or modelling at
|
||
%a too high level of failure mode abstraction.
|
||
FTA was invented for use on the Minuteman nuclear defence missile
systems in the early 1960s and was not designed as a rigorous
fault/failure mode methodology.
It was designed to look for disastrous top level hazards and
determine how they could be caused.
It is more like a procedure to
be applied when discussing the safety of a system, with a top down hierarchical
notation using logic symbols that guides the analysis.
This methodology was designed for
experienced engineers sitting around a large diagram and discussing the safety aspects.
Also, the nature of a large rocket with red wire and remote detonation
failsafes meant that the objective was to iron out common failures,
not to rigorously detect all possible failures.
Consequently it was not designed to guarantee coverage of all component failure modes,
and has no rigorous in-built safeguards to ensure coverage of all possible
system level outcomes.
Also, each system level error (or undesirable event) requires its own FTA tree.
This increases the amount of work to do, and in the case of updates to
particular sub-systems, introduces the requirement to update every FTA
tree that models the affected sub-system.
|
||
|
||
\subsubsection{ FTA weaknesses }
|
||
\begin{itemize}
|
||
\item Complex component interaction effects are by definition modelled by FTA, but because of the top down approach, not all
|
||
base component failure modes are guaranteed to be included in the model.
|
||
\item Possibility to miss environmental effects.
\item One FTA tree per system failure mode. Thus there is not one model from which several FTA
trees can be derived, and maintainability and consistency cannot be automatically checked.
|
||
\item No possibility to model base component level double failure modes.
|
||
\end{itemize}
|
||
|
||
\subsection { FMEA }
|
||
|
||
\label{pfmea}
|
||
This is an early static analysis methodology, and concentrates
on SYSTEM level errors which have been investigated.
The investigation will typically point to a particular failure
of a component.
The methodology is then applied to find the significance of the failure.
It is based on a simple equation where $S$ ranks the severity (or cost \cite{bfmea}) of the identified SYSTEM failure,
$O$ its occurrence\footnote{The occurrence $O$ is the
probability of the failure happening.},
and $D$ the failure's detectability\footnote{Detectability: often failures
may occur but not be noticed or cause an effect.
Consider an unused feature failing.}. Multiplying these
together
gives a Risk Priority Number (RPN), given by $RPN = S \times O \times D$.
This gives, in effect,
a prioritised `to~do~list', with higher $RPN$ values being the most urgent.
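
As a purely illustrative example (the ratings below are hypothetical, and assume that each factor is scored on a scale of 1 to 10, with a higher $D$ meaning that the failure is harder to detect), consider two failures, $a$ and $b$:
\[ RPN_a = 8 \times 3 \times 5 = 120, \qquad RPN_b = 4 \times 6 \times 2 = 48 . \]
Failure $a$ would be placed higher on the list, even though failure $b$ occurs more often, because its severity and poor detectability dominate the product.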
|
||
|
||
|
||
\subsubsection{ FMEA weaknesses }
|
||
\begin{itemize}
|
||
\item Possibility to miss the effects of base component failure modes at SYSTEM level
(because it is each individual component, rather than each of its failure modes, that is considered for analysis).
|
||
\item Possibility to miss environmental effects.
|
||
\item Complex component interaction effects can be missed.
|
||
\item No possibility to model base component level double failure modes.
|
||
\end{itemize}
|
||
\fmodegloss
|
||
\paragraph{Note.} FMEA is sometimes used in its literal sense, that is to say
Failure Mode Effects Analysis, simply looking at a system's internal failure
modes and determining what may happen as a result.
The FMEA described in this section (\ref{pfmea}) is sometimes called `production FMEA'.
|
||
|
||
\subsection{FMECA}
|
||
|
||
Failure mode, effects, and criticality analysis (FMECA) extends FMEA by adding a criticality factor.
This is a bottom up methodology, which takes component failure modes
and traces them to the SYSTEM level failures.
%
Reliability data for components is used to predict the
failure statistics in the design stage.
An openly available source for the reliability of generic
electronic components was published by the DOD
in 1991 (MIL-HDBK-217F \cite{mil1991}) and is a typical
source for MTTF data.
|
||
%
|
||
FMECA has a probability factor for a component error becoming % causing
|
||
a SYSTEM level error.
|
||
This is termed the $\beta$ factor.
|
||
%\footnote{for a given component failure mode there will be a $\beta$ value, the
|
||
%probability that the component failure mode will cause a given SYSTEM failure}.
|
||
%
|
||
This lacks precision, or in other words, determinability (prediction accuracy) \cite{fafmea},
because often the component failure mode cannot be proven to cause a SYSTEM level failure, but is
assigned a probability, the $\beta$ factor, by the design engineer. The use of a $\beta$ factor
is often justified using Bayes' theorem \cite{probstat}.
|
||
%Also, it can miss combinations of failure modes that will cause SYSTEM level errors.
|
||
%
|
||
The results of FMECA are similar to FMEA, in that component errors are
|
||
listed according to importance, based on
|
||
probability of occurrence and criticality.
|
||
% to prevent the SYSTEM fault of given criticallity.
|
||
Again this essentially produces a prioritised `to~do' list.
|
||
|
||
%%-WIKI- Failure mode, effects, and criticality analysis (FMECA) is an extension of failure mode and effects analysis (FMEA).
|
||
%%-WIKI- FMEA is a a bottom-up, inductive analytical method which may be performed at either the functional or
|
||
%%-WIKI- piece-part level. FMECA extends FMEA by including a criticality analysis, which is used to chart the
|
||
%%-WIKI- probability of failure modes against the severity of their consequences. The result highlights failure modes with relatively high probability
|
||
%%-WIKI- and severity of consequences, allowing remedial effort to be directed where it will produce the greatest value.
|
||
%%-WIKI- FMECA tends to be preferred over FMEA in space and North Atlantic Treaty Organization (NATO) military applications,
|
||
%%-WIKI- while various forms of FMEA predominate in other industries.
|
||
|
||
|
||
\subsubsection{ FMECA weaknesses }
|
||
\begin{itemize}
|
||
\item Possibility to miss the effects of failure modes at SYSTEM level.
|
||
\item Possibility to miss environmental effects.
|
||
\item The $\beta$ factor is based on heuristics and does not reflect any rigorous calculations. Applying failure rates of individual components rather than individual failure modes
|
||
makes the factor less statistically reliable.
|
||
\item Complex component interaction effects can be missed.
|
||
\item No possibility to model base component level double failure modes.
|
||
\end{itemize}
|
||
|
||
|
||
\subsection { FMEDA }
|
||
|
||
Failure Modes, Effects, and Diagnostic Analysis (FMEDA)
% This
is a process in which the investigating engineer takes all the components in a system
and, using the failure modes of those components,
ties them to possible SYSTEM level events/failure modes.
|
||
\fmodegloss
|
||
%
|
||
This technique
|
||
evaluates a product's statistical level of safety
|
||
taking into account its self-diagnostic ability.
|
||
The calculations and procedures for FMEDA are
described in EN61508 %Part 2 Appendix C
\cite[Part 2 App C]{en61508}.
|
||
The following gives an outline of the procedure.
|
||
|
||
|
||
\subsubsection{Two statistical perspectives}
|
||
\ifthenelse {\boolean{paper}}
|
||
{
|
||
FMEDA is a statistical analysis methodology and is used from one of two perspectives:
Probability of Failure on Demand (PFD), and Probability of Failure
in continuous Operation, or Failure in Time (FIT).

\paragraph{Failure in Time (FIT).} Continuous operation is measured in failures per billion ($10^9$) hours of operation.
For a continuously running nuclear power station, industrial burner or aircraft engine
we would be interested in its operational FIT values.
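
As a rough illustration of the unit (the figure of 100 FIT is hypothetical, and a constant failure rate is assumed), a component rated at 100 FIT has
\[ \lambda = 100 \times 10^{-9} = 10^{-7} \quad \mbox{failures per hour,} \qquad MTTF = \frac{1}{\lambda} = 10^{7} \quad \mbox{hours,} \]
which is roughly 1140 years of continuous operation.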
|
||
|
||
\paragraph{Probability of Failure on Demand (PFD).} For instance with an anti-lock system in
automobile braking, or other fail safe measure applied in an emergency, we would be interested in PFD.
That is to say, the probability of it failing
to operate correctly on demand.
|
||
}
|
||
{
|
||
FMEDA is a statistical analysis methodology and is used from one of two perspectives:
Probability of Failure on Demand (PFD) (see \ref{survey:pfd}),
and Probability of Failure
in continuous Operation, or Failure in Time (FIT) (see \ref{survey:fit}).
|
||
}
|
||
|
||
\subsubsection{The FMEDA Analysis Process}
|
||
|
||
\paragraph{Determine SYSTEM level failures from base components.}
|
||
The first stage is to apply FMEA to the SYSTEM.
|
||
%
|
||
Each component is analysed in terms of how its failure
|
||
would affect the system.
|
||
Failure rates of individual components in the SYSTEM
|
||
are calculated based on component type and
|
||
environmental conditions.
|
||
|
||
%The SYSTEM errors are categorised as `safe' or `dangerous'.
|
||
|
||
|
||
%
|
||
%Statistical data exists for most component types \cite{mil1992}.
|
||
%
|
||
%This phase is typically implemented on a spreadsheet
|
||
%with rows representing each component. A typical component spreadsheet row would
|
||
%comprise of
|
||
%component type, placement,
|
||
%part number, environmental stress factors, MTTF, safe/dangerous etc.
|
||
%%will be a determination of whether the component failing will lead to a `safe'
|
||
%or `unsafe' condition.
|
||
|
||
%\paragraph{Overall SYSTEM failure rate.}
|
||
%The product failure rate is the sum of all component
|
||
%failure rates. Typically the sum of all MTTF rates for all
|
||
%components in an FMEDA spreadsheet.
|
||
%This is the sum of safe and unsafe
|
||
%failures.
|
||
|
||
\paragraph{Self Diagnostics.}
|
||
We next evaluate the SYSTEM's self-diagnostic ability.
|
||
|
||
%Each component’s failure modes and failure rate are now available.
|
||
Failure modes are now classified as safe or dangerous.
|
||
This is done by taking a component failure mode and determining
|
||
if the SYSTEM error it is tied to is dangerous or safe.
|
||
The decision for this may be
|
||
based on heuristics or field data.
|
||
%EN61508 uses the $\lambda$ symbol to represent probabilities.
|
||
%Because we have statistics for each component failure mode,
|
||
%we can now now classify these in terms of safe and dangerous lambda values.
|
||
%Detectable failure probabilities are labelled `$\lambda_D$' (for
|
||
%dangerous) and `$\lambda_S$' (for safe) \cite{en61508}.
|
||
|
||
\paragraph{Determine Detectable and Undetectable Failures.}
|
||
Each safe and dangerous failure mode is now
classified as detectable or undetectable.
For the higher integrity levels, EN61508 requires that products have a high proportion of
self checking features.
%
This gives us four failure mode classifications:
Safe-Detected (SD), Safe-Undetected (SU), Dangerous-Detected (DD) and Dangerous-Undetected (DU).
The probabilistic failure rate of each classification
is represented by a lambda variable
(i.e. $\lambda_{SD}$, $\lambda_{SU}$, $\lambda_{DD}$, $\lambda_{DU}$).
|
||
|
||
\glossary{name={SD},description={Safe Detected; a SYSTEM level failure mode that is considered safe, and is detected by self checking mechanisms}}
|
||
\glossary{name={SU},description={Safe Undetected; a SYSTEM level failure mode that is considered safe, and is not detected by self checking mechanisms}}
|
||
\glossary{name={DD},description={Dangerous Detected; a SYSTEM level failure mode that is considered dangerous, and is detected by self checking mechanisms}}
|
||
\glossary{name={DU},description={Dangerous Undetected; a SYSTEM level failure mode that is considered dangerous, and is not detected by self checking mechanisms}}
|
||
|
||
Because it is recognised that some failure modes may not be discovered theoretically during the static
|
||
analysis, the
|
||
% admission of how daft it is to take a component failure mode on its own
|
||
% and guess how it will affect an ENTIRE complex SYSTEM
|
||
% Admission of failure of the process really !!!!
|
||
next step is to investigate using an actual working SYSTEM.
|
||
%
|
||
Failures are deliberately caused (by physical intervention), and any new SYSTEM level
failures observed are recorded.
Heuristics and MTTF failure rates for the components
are used to calculate probabilities for these new failure modes,
along with their safety and detectability classifications (i.e.
$\lambda_{SD}$, $\lambda_{SU}$, $\lambda_{DD}$, $\lambda_{DU}$).
These new failures are then added to the model.
|
||
%SD, SU, DD, DU.
|
||
|
||
%With these classifications, and statistics for each component
|
||
%we can now calculate statistics for the diagnostic coverage (how good at `self checking' the system is)
|
||
%and its safe failure fraction (how many of its failures are self detected or safe compared to
|
||
%all failures possible).
|
||
%
|
||
%The calculations for these are described below.
|
||
|
||
%\paragraph{Diagnostic Coverage.}
|
||
%The diagnostic coverage is simply the ratio
|
||
%of the dangerous detected probabilities
|
||
%against the probability of all dangerous failures,
|
||
%and is normally expressed as a percentage.
|
||
%%$\Sigma\lambda_{DD}$ represents
|
||
%the percentage of dangerous detected base component failure modes, and
|
||
%$\Sigma\lambda_D$ the total number of dangerous base component failure modes.
|
||
%
|
||
%$$ DiagnosticCoverage = \Sigma\lambda_{DD} / \Sigma\lambda_D $$
|
||
%
|
||
%The diagnostic coverage for safe failures, where $\Sigma\lambda_{SD}$ represents the percentage of
|
||
%safe detected base component failure modes,
|
||
%and $\Sigma\lambda_S$ the total number of safe base component failure modes,
|
||
%is given as
|
||
%
|
||
%$$ SF = \frac{\Sigma\lambda_{SD}}{\Sigma\lambda_S} $$
|
||
%
|
||
%
|
||
\paragraph{Safe Failure Fraction.}
|
||
A key concept in FMEDA is Safe Failure Fraction (SFF).
This is the ratio of the safe and dangerous-detected failure rates
to the total of all safe and dangerous failure rates.
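
Written in terms of the four failure rate classifications, and assuming that the safe failure rate is the sum $\lambda_{SD} + \lambda_{SU}$, this can be sketched as
\[ SFF = \frac{\lambda_{SD} + \lambda_{SU} + \lambda_{DD}}{\lambda_{SD} + \lambda_{SU} + \lambda_{DD} + \lambda_{DU}} . \]
For example, with the purely hypothetical rates $\lambda_{SD}=400$, $\lambda_{SU}=300$, $\lambda_{DD}=250$ and $\lambda_{DU}=50$ (all in FIT),
\[ SFF = \frac{400 + 300 + 250}{400 + 300 + 250 + 50} = \frac{950}{1000} = 0.95 , \]
that is, 95\% of the total failure rate is either safe or detected.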
|
||
%Again this is usually expressed as a percentage.
|
||
|
||
%$$ SFF = \big( \Sigma\lambda_S + \Sigma\lambda_{DD} \big) / \big( \Sigma\lambda_S + \Sigma\lambda_D \big) $$
|
||
|
||
%This is the ratio of
|
||
%Step 4 Calculate SFF, SIL and PFD
|
||
%The SIL level of the product is finally determined from the Safe Failure Fraction (SFF) and the Probability of Failure on Demand (PFD). The following formulas are used.
|
||
%SFF = (lSD + lSU + lDD) / (lSD + lSU + lDD + lDU)
|
||
%PFD = (lDU)(Proof Test Interval)/2 + (lDD)(Down Time or Repair Time)
|
||
|
||
% Often a given component failure mode there will be a $\beta$ value, the
|
||
% probability that the component failure mode will cause a given SYSTEM failure.
|
||
|
||
%\paragraph{Risk Mitigation}
|
||
%
|
||
%The component may be have its risk factor
|
||
%reduced by the checking interval (or $\tau$ time between self checking procedures).
|
||
%
|
||
%Ultimately this technique calculates a risk factor for each component.
|
||
%The risk factors of all the components are summed and
|
||
%%give a value for the `safety level' for the equipment in a given environment.
|
||
\paragraph{Classification into Safety Integrity Levels (SIL).}
|
||
There are four SIL levels, from 1 to 4, with 4 being the highest safety level.
In addition to probabilistic risk factors, the
diagnostic coverage and SFF
have threshold bands becoming stricter for each level.
The software verification and specification techniques and constraints demanded
(such as language subsets, s/w redundancy etc.)
also become stricter for each SIL level.
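
The diagnostic coverage referred to here is commonly taken, and assumed in this sketch, to be the ratio of the dangerous detected failure rate to the total dangerous failure rate:
\[ DC = \frac{\lambda_{DD}}{\lambda_{DD} + \lambda_{DU}} . \]
With hypothetical rates of $\lambda_{DD}=250$ and $\lambda_{DU}=50$ (in FIT), this gives $DC = 250/300 \approx 0.83$, or about 83\%.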
|
||
%%
|
||
%% Andrew asked me to expand on this here, but it would take at least two
|
||
%% pages. I think its more appropriate for the survey.tex chapter.
|
||
%%
|
||
|
||
Thus FMEDA uses statistical methods to determine
a safety level (SIL), typically used to meet an acceptable risk
value, specified for the environment the SYSTEM must work in.
EN61508 defines, in general terms,
risk assessment and required SIL levels \cite[5 Annex A]{en61508}.
|
||
|
||
%the probability of
|
||
%failures occurring, and provide an adaquate risk level.
|
||
%
|
||
%A component failure mode, given its MTTF
|
||
%the probability of detecting the fault and its safety relevant validation time $\tau$,
|
||
%contributes a simple risk factor that is summed
|
||
%in to give a final risk result.
|
||
%
|
||
Thus an FMEDA
|
||
model can be implemented on a spreadsheet, where each component
|
||
has a calculated risk, a fault detection time (if any), an estimated risk importance
|
||
and other factors such as de-rating and environmental stress.
|
||
With one component failure mode per row,
|
||
all the statistical factors for SIL rating can be produced\footnote{A SIL rating will
|
||
apply to an installed plant, i.e. a complete installed and working SYSTEM.
|
||
SIL ratings for individual components or
|
||
sub-systems are meaningless, and the nearest equivalent would be the
|
||
FIT/PFD and SFF and diagnostic coverage figures.}.
|
||
\glossary{name={FIT}, description={Failure in Time (FIT). The number of times a particular failure is expected to occur in a $10^{9}$ hour time period.}}
\subsubsection{FMEDA and failure outcome prediction accuracy.}
|
||
FMEDA suffers from the same
lack of component failure mode outcome prediction accuracy as FMEA (section \ref{pfmea}).
|
||
\fmodegloss
|
||
%
|
||
This is because the analyst has to decide how particular components failing will impact on the SYSTEM or top level.
This involves a `leap of faith'. For instance, a resistor in a sensor circuit
may be part of a critical monitoring function.
When considering its failure, the analyst is put in a position
where they probably should assign a dangerous failure classification to it.
%
There is no analysis
of how that resistor would/could affect the components close to it, but because the circuitry
is part of a critical section it will most likely
be linked to a dangerous system level failure in an FMEDA study.
|
||
%
|
||
%%- IS THIS TRUE IS THERE A BETA FACTOR IN FMEDA????
|
||
%%-
|
||
%A $\beta$ factor, the heuristically defined probability
|
||
%of the failure causing the system fault may be applied.
|
||
%
|
||
%In FMEDA there is no detailed analysis of the failure mode behaviour
|
||
%of the component in its local environment
|
||
%Component failure modes are traceable directly to the SYSTEM level.
|
||
%it becomes more
|
||
%guess work than science.
|
||
%
|
||
With FMEDA, there is no rigorous cause and effect analysis for the failure modes
|
||
and how they interact on the micro scale (the components adjacent to them in terms of functionality).
|
||
Unintended side effects that lead to failure can be missed.
|
||
Also, component failure modes that are not
dangerous may be wrongly classified as dangerous simply because they lie in a critical
section of the product.
|
||
|
||
% some critical component failure
|
||
%modes, but we can only guess, in most cases what the safety case outcome
|
||
%will be if it occurs.
|
||
|
||
This leads to the practice of partitioning the components within a SYSTEM into different
safety level zones, as recommended in EN61508~\cite{en61508}. This is a vague way of determining
safety, as it can miss effects due to unexpected component interaction.
|
||
|
||
The Statistical Analysis methodology is the core philosophy
|
||
of the Safety Integrity Levels (SIL) embodied in EN61508 \cite{en61508}
|
||
and its international analogue standard IEC61508.
\subsubsection{ FMEDA weaknesses }
|
||
\begin{itemize}
|
||
\item Possibility to miss the effects of failure modes at SYSTEM level.
|
||
\item Statistical nature allows a proportion of undetected failures for a given SIL. These could be catastrophic failures; as long as the perceived probability is low enough, they are considered acceptable under EN61508.
|
||
\item Complex component interaction effects are more likely to be seen than with FMEA or FMECA (because self diagnostic capability is considered), but can still be missed.
|
||
\item Allows a small proportion of `undetectable' error conditions.
|
||
\item No possibility to model base component level double failure modes.
|
||
\end{itemize}
|
||
%AND then how we can solve all there problems
|
||
|
||
\section{A wish list for a failure mode methodology}
|
||
\begin{itemize}
|
||
\item All component failure modes must be considered in the model.
|
||
\item It should be easy to integrate mechanical, electronic and software models \cite[p.~287]{sccs}.
|
||
\item It should be re-usable, in that commonly used modules can be re-used in other designs/projects.
|
||
\item It should have a formal basis, that is to say, be able to produce mathematical proofs
|
||
for its results, such as system level error causation trees, reliability and safety statistics.
|
||
\item It should be easy to use, ideally using a
|
||
graphical syntax (as opposed to a formal symbolic/mathematical text based language).
|
||
\item From the top down, the failure mode model should follow a logical de-composition of the functionality
|
||
to smaller and smaller functional groupings \cite{maikowski}.
|
||
\item Multiple failure modes may be modelled from the base component level up.
|
||
\end{itemize}
|
||
|
||
|
||
\section{Design of a new static failure mode based methodology}
|
||
|
||
\paragraph{New methodology must be bottom-up.}
|
||
In order to ensure that all component failure modes have been covered
|
||
the methodology will have to work from the bottom-up
|
||
and start with the component failure modes.
|
||
\fmodegloss
|
||
%
|
||
\paragraph{Natural Fault Finding is top down.}
|
||
The traditional, or natural, way of fault finding
is to start at the top with SYSTEM level failure modes/faults.
|
||
%
|
||
On encountering a
|
||
fault, the symptom is first observed at the top or
|
||
SYSTEM level. By decomposing the functionality of the faulty system and testing
|
||
we can further decompose the system until we find the
|
||
faulty base level component.
|
||
Decomposition of electrical circuits is formalised and explored
|
||
in \cite{maikowski}. This top down technique de-composes by functionality.
|
||
Simpler and simpler functional groups are discovered as we delve
|
||
further into the way the system works and is built.
|
||
|
||
|
||
\paragraph{Need for a `bottom-up' system de-composition.}
|
||
There is an apparent conflict here, as de-composition normally implies a top-down approach. The natural way to
|
||
de-compose a system is from the top down.
|
||
%
|
||
If we do this though, we do not naturally include
|
||
all failure modes in the modules determined as we
|
||
de-compose downwards.
|
||
%
|
||
What is required here is to mimic this top-down de-composition
|
||
with a bottom up technique.
|
||
By doing that, we can take all base component failure modes
|
||
and ensure they are included in the model.
|
||
|
||
By taking components that form {\fg}s from the bottom up
|
||
and then taking those to form higher level
|
||
{\fg}s we can get a close approximation of the de-composition process from the bottom up.
|
||
The philosophy of top down de-composition is very similar.
|
||
Top down de-composition applies functional
|
||
de-composition, because it seeks to break the system down
|
||
into manageable and separately testable entities.
|
||
A second justification for this is that the design process for a product requires both top down and bottom-up
|
||
thinking. To analyse a system from the bottom-up is a useful
|
||
design validation process in itself \cite{sommerville}.
|
||
%%
|
||
%% CAN we find a ref for both top and bottom up being used
|
||
%% as design validation ????
|
||
|
||
\paragraph{Design Decision: Methodology must be bottom-up.}
|
||
In order to ensure that all component failure modes are handled,
|
||
this methodology must start at the bottom, with base component failure modes.
|
||
In this way automated checking can be applied to all component failure modes
|
||
to ensure none have been inadvertently excluded from the process.
|
||
|
||
\paragraph{Problems with functional group hierarchy.}
|
||
A hierarchy of functional grouping, leading to a system model
|
||
still leaves us with the problem of the number of component failure modes.
|
||
The base components will typically have several failure modes each.
|
||
%
|
||
Given a typical embedded system may have hundreds of components,
|
||
this means that we would still have to tie base component failure modes
|
||
to SYSTEM level errors.
|
||
%
|
||
The problem with this is that the base component failure modes under investigation
are not rigorously examined in relation to functionally adjacent components.
|
||
%
|
||
If failure modes could be collected and simplified somehow
|
||
at each stage in a hierarchy of {\fgs}, the functionally adjacent
|
||
ideal would be met, and as we progress up the hierarchy the number
|
||
of failure modes should decrease.
|
||
%Thus there is the `possibility to miss failure mode effects
|
||
%at the much higher SYSTEM level' criticism of the FTA, FMEDA and FMECA methodologies.
|
||
%%%
|
||
%%% OK Got up to here Lunchtime edit 06DEC2010.............
|
||
|
||
\paragraph{Design Decision: Methodology must collate errors at each functional group stage.}
|
||
SYSTEMS typically have far fewer failure modes than the sum of their base component failure modes.
|
||
SYSTEM level failures may be caused by a variety of component failure modes.
|
||
A SYSTEM level failure mode is an abstracted failure mode, in that
|
||
it is a symptom of some lower level failure or failures.
|
||
Tracing the SYSTEM level failure or symptom, down through
|
||
a decomposed system, will give a fault tree. This will typically
|
||
trace the SYSTEM level failure mode to some individual base component failures
|
||
or combinations thereof.
|
||
% ABSTRACTION
|
||
For instance a failed resistor in a sensor at a base component level is a specific
|
||
failure mode.
|
||
%
|
||
For example it could be called `RESISTOR 1 OPEN'.
|
||
%
|
||
Now consider the symptom that `RESISTOR 1 OPEN' causes in the functional group comprising the sensor channel that
RESISTOR 1 is part of.
%
We might call it a `READING~HIGH' failure.
The fault has become less detailed and more general. There may be other
|
||
causes for a `READING~HIGH'. We can say that the failure
|
||
mode `READING~HIGH' is more abstract in terms of the SYSTEM, than `RESISTOR 1 OPEN'.
|
||
%
|
||
At a higher level still
|
||
this may be called `SENSOR CHANNEL 1' fault.
|
||
At a system level it may simply be a `SENSOR FAILURE'.
|
||
As we traverse up the fault tree the failure modes
|
||
become more abstract.
|
||
%
|
||
At each functional group collection, there must be a process to collect
|
||
common symptoms and reduce the number of failure modes to handle.
|
||
This must be a process that incrementally reduces the number
|
||
of failure modes as the abstraction level reaches the SYSTEM level.
|
||
|
||
\paragraph{How to build a meaningful SYSTEM failure behaviour model.}
|
||
The next problem is how we build a failure mode model
|
||
that converges from a multitude of base
|
||
component failures to a finite set of SYSTEM level failure modes.
|
||
%
|
||
It would be better to analyse the failure mode behaviour of each
|
||
functional group, and determine the ways in which it, rather than its
|
||
components, can fail.
|
||
%
|
||
By doing this, the symptoms of the {\fg}
(each of which can potentially be caused by more than one component failure mode)
are extracted naturally.
|
||
%
|
||
The number of symptoms will be less than or equal to the number of
component failure modes, and in practice will be much less.
|
||
%
|
||
Thus stage by stage symptom collection becomes the key to reducing the number
|
||
of failure modes to handle as we traverse up the hierarchy.
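This bound can be written down directly. As a sketch, using notation that is introduced here purely for illustration: let a {\fg} $FG$ contain components $c$, let $fm(c)$ be the set of failure modes of $c$, and let $S$ be the set of symptoms collected from $FG$. If every component failure mode is assigned to exactly one symptom, and every symptom has at least one cause, then
\[
|S| \le \sum_{c \in FG} |fm(c)| ,
\]
with equality only when no two component failure modes share a symptom.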
\paragraph{Component failures and {\fg} failure symptoms.}
|
||
In other words we want to find out what the symptoms of the failures in the {\fg}s
|
||
are.
|
||
%The number of symptoms of failure should be equal to or
|
||
%less than the number of component failure modes, simply because
|
||
%often there are several potential causes of failure symptoms.
|
||
%
|
||
When we have the symptoms, we can start thinking of the {\fg} as a component in its own right.
|
||
%with a simplified and reduced set of failure symptoms.
|
||
%
|
||
We can now create a new {\dc}, where its failure modes
|
||
are the failure symptoms of the {\fg}.
|
||
%
|
||
|
||
By taking {\dcs} to form higher level functional groups
|
||
we can build a bottom-up model incrementally.
|
||
In this way as we build the hierarchy, we naturally abstract the
|
||
failure mode behaviour, but can check that all failure modes in
|
||
the hierarchy have been considered and tied to causing symptoms.
|
||
|
||
\paragraph{Design Decision: Derived components must be determined from functional groups.}
|
||
The symptoms obtained from analysing a {\fg} will be used as the `failure~modes'
|
||
of its corresponding {\dc}.
|
||
|
||
\paragraph{Incremental Stages and \dcs .}
|
||
We can use incremental stages to build the hierarchy.
|
||
We can take small {\fg}s of components, where the {\fg}
|
||
is a small set of components that perform a simple
|
||
task.
|
||
%
|
||
%The functional group should perform a clearly defined task.
|
||
The design engineer must choose the components that form a {\fg}.
|
||
It should be possible to consider the {\fg} as a component or
|
||
black box, performing a given function.
|
||
The {\fg} should be chosen to be as small
|
||
(in terms of the number of components) as possible.
|
||
%
|
||
This should be small enough to be able %Another advantage of the functional group being small
|
||
to comfortably analyse all the failure
|
||
modes of its components.
|
||
%
|
||
We can consider these failure modes from the perspective
|
||
of the {\fg}. In other words, for each component failure mode in the {\fg},
|
||
we create a `test case' and decide how each failure affects the functional group.
|
||
%
|
||
With the results from the test cases we will now have the ways in which the
|
||
{\fg} can fail.
|
||
%
|
||
%
|
||
We can refine this further, by grouping the common symptoms, or results that
|
||
are the same failure {\wrt} the {\fg}.
|
||
%
|
||
We can now treat the {\fg} as a component, and create a corresponding {\dc}: in other words, a `sub-system' with a known set of failure modes.
%
We assign these common symptoms to the new {\dc}
as its failure modes.
|
||
%
|
||
This {\dc} can be used to build higher level
|
||
{\fg}s, and this will naturally form a hierarchy.
|
||
This hierarchy can be extended until it encompasses
|
||
an entire SYSTEM.
|
||
%
|
||
It can be considered complete when
|
||
all failure modes from all components are included in the model
|
||
and all base component failure modes can be traced
|
||
through the fault tree to SYSTEM level failure modes.
|
||
|
||
\paragraph{Directed Acyclic Graph (DAG).}
|
||
If we ensure that
|
||
derived components cannot be included in {\fg}s
|
||
of a lower abstraction level
|
||
the data structure produced from collecting functional groups
|
||
and deriving components will naturally form a DAG.
|
||
In other words we can say that we cannot allow a {\fg}
|
||
to include any component created from it.
|
||
|
||
%
|
||
%
|
||
By representing the failure mode model as a DAG, we
|
||
now have the capability to take SYSTEM level failure modes
|
||
and determine the possible combinations of component failure modes that
|
||
could have caused it.
|
||
This will allow us to define fault trees for each SYSTEM level failure.
|
||
This means that we will be able to determine which
|
||
combinations of base component failures could cause the SYSTEM
|
||
failure.
|
||
%In FTA terminology, a list of possible
|
||
%causes for a SYSTEM level failure is known as a `cut set' \cite{nasafta}\cite{nucfta}.
|
||
If statistical models exist for the component failure modes
|
||
these failure causation trees (or minimal cut sets\footnote{In FTA terminology a minimal cut set is the branch of a
|
||
%\glossary{name={entry name}, description={entry description}}
|
||
\glossary{name={cut set}, description={A cut set in a fault tree is a set of base component failure modes, whose occurrence ensures that a TOP (or SYSTEM) event occurs} }
|
||
\glossary{name={minimal cut set}, description={A cut set in a fault tree that cannot be reduced (i.e. \textbf{all} the base component failure modes are required to cause the SYSTEM level event) } }
|
||
fault tree, from the top SYSTEM level to the bottom, with the least number
|
||
of base component failure modes. If a single base component failure mode can cause
|
||
a SYSTEM level error this is usually considered a liability.})
|
||
can be used to calculate Mean Time to Failure (MTTF) or
|
||
Probability of Failure on demand (PFD) figures.
|
||
Contrast the analytical capability of FMMD with the
|
||
methodologies where the component failure modes/components are linked
|
||
directly to SYSTEM failure modes with no analysis stages in between.
|
||
|
||
|
||
|
||
\paragraph{Design Decision: A functional group cannot
|
||
contain {\dc}s at a higher abstraction level than itself}
|
||
|
||
We can say that no component may be derived from itself directly
|
||
or indirectly.
|
||
We can track the `abstraction level' by increasing it each time
|
||
there is a phase of symptom collection.
|
||
We can use the symbol $\alpha$ to represent the abstraction level
|
||
and make it an attribute of a component.
|
||
Base components will have an $\alpha$ level of zero.
|
||
A derived component when created must always have a greater $\alpha$ value than any
|
||
of the components included in the {\fg} from which it was derived.
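One way to satisfy this rule (a sketch only; the `plus one' increment is an assumption made here for illustration) is to define the abstraction level of a {\dc} $d$, derived from a {\fg} $FG$, as
\[
\alpha(d) = 1 + \max_{c \in FG} \alpha(c),
\qquad
\alpha(b) = 0 \ \mbox{for every base component } b .
\]
Since $\alpha$ strictly increases along every derivation, no component can appear in a {\fg} from which it was itself derived, directly or indirectly, and the resulting graph is acyclic.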
|
||
|
||
|
||
\paragraph{Natural Reduction in number of failure modes with abstraction level.}
|
||
%
|
||
Because common symptoms are being collected, as we build the tree upward
|
||
the number of failure modes decreases (or exceptionally stays the same)
|
||
at each level.\footnote{In very unusual cases, where none of the known
failure modes of a {\fg} share a common symptom,
the number of failure modes from its components would be the
same as the number of failure modes in the component derived from it.}
|
||
This decreasing of the number of failure modes is borne out {\irl}.
|
||
Of the thousands of component failure modes in a typical product
|
||
there are generally only a handful of SYSTEM level failure modes
|
||
(or top level `symptoms' of underlying failures).
|
||
%
|
||
|
||
\subsection{Outline of the FMMD process}
|
||
\label{fmmdproc}
|
||
FMMD builds {\fg}s of components from the bottom-up.
|
||
The lowest level of components are termed base components.
|
||
These are the initial building blocks.
|
||
In electronics these would be the individual
|
||
passive and active components on the parts~list.
|
||
In mechanics the levers, linkages, springs and cogs etc.
|
||
%
|
||
Functional groups are collections of components
|
||
that work together to perform a simple function.
|
||
%
|
||
We can perform a failure mode effects analysis on each of the component failure
|
||
modes within a {\fg}. Because we can guide the process in software we can
|
||
ensure that all component failure modes
|
||
are included in the model.
|
||
%
|
||
We can then treat the {\fg} as a `black box' or component in its own right.
|
||
We can now look at how the {\fg} can fail.
|
||
%
|
||
Many of the component failure modes will
|
||
cause the same failure symptoms in the {\fg}.
|
||
We can collect these failures as common symptoms.
|
||
%
|
||
When we have our set of symptoms, we can now create
|
||
a {\dc}. The {\dc} will have, as its set of failure
modes, the collected symptoms of the {\fg}.
|
||
%
|
||
Because we can now have {\dcs} we can use these to form
|
||
new {\fg}s and we can build a hierarchical `failure~mode' model of the SYSTEM.
|
||
|
||
|
||
%%- Need diagram of hierarchy
|
||
%%-
|
||
%%-
|
||
\begin{figure}[h]
|
||
\centering
|
||
\includegraphics[width=200pt,bb=0 0 331 249,keepaspectratio=true]{./fmmd_concept/fmmd_hierarchy.jpg}
|
||
% fmmd_hierarchy.jpg: 331x249 pixel, 72dpi, 11.68x8.78 cm, bb=0 0 331 249
|
||
\caption{Example derived component created from the functional group comprised of components a,b,c}
|
||
\label{fig:fmmd_hierarchy}
|
||
\end{figure}
|
||
|
||
A {\fg} is a set of components (each with a set of failure modes)
|
||
that collectively group together to serve some purpose (to perform some function),
|
||
and derived components are determined
|
||
from analysis and symptom collection
|
||
of the {\fg}.
|
||
|
||
The {\dc} is equipped with a new set of failure modes
|
||
corresponding to the symptoms from the {\fg}.
|
||
|
||
The diagram in figure \ref{fig:fmmd_hierarchy}, shows one stage
|
||
of the FMMD process. The resultant {\dc} may be used to
|
||
create higher level {\fg}s in later stages.
|
||
|
||
% \begin{figure}[h]
|
||
% \centering
|
||
% \includegraphics[bb=0 0 331 249,keepaspectratio=true]{./fmmd_hierarchy.jpg}
|
||
% % fmmd_hierarchy.jpg: 331x249 pixel, 72dpi, 11.68x8.78 cm, bb=0 0 331 249
|
||
% \caption{Example derived component created from a functional group comprised of components a,b,c}
|
||
% \label{fig:fmmd_hiarchy}
|
||
% \end{figure}
|
||
%
|
||
% \vspace{20pt}
|
||
% NEED DIAGRAM OF HIERARCHY
|
||
% \vspace{20pt}
|
||
|
||
We associate a component with its failure modes.
|
||
This is represented in UML in figure \ref{fig:component concept}.
|
||
|
||
\begin{figure}[h]
|
||
\centering
|
||
\includegraphics[width=200pt,keepaspectratio=true]{./fmmd_concept/component.jpg}
|
||
% component.jpg: 467x76 pixel, 72dpi, 16.47x2.68 cm, bb=0 0 467 76
|
||
\caption{Component with failure modes UML diagram}
|
||
\label{fig:component concept}
|
||
\end{figure}
|
||
|
||
|
||
\subsection{Environmental Conditions, Operational States and FMMD}
|
||
|
||
Any real world sub-system will exist in a variable environment
|
||
and may have several modes of operation.
|
||
In order to find all possible failures, the sub-system
|
||
must be analysed for each operational state
|
||
and environment condition that can affect it.
|
||
%
|
||
Two design decisions are required here: which objects should we
|
||
analyse the environmental and the operational states with respect to?
|
||
There are three objects in our model to which these considerations could be applied.
|
||
We could apply these conditions for analysis
|
||
to the functional group, the components, or the derived
|
||
component.
|
||
|
||
\paragraph {Environmental Conditions and FMMD.}
|
||
|
||
Environmental conditions are external to the
|
||
{\fg} and are often things over which the system has no direct control.
|
||
Consider ambient temperature, pressure or even electrical interference levels.
|
||
%
|
||
Environmental conditions may affect different components in a {\fg}
|
||
in different ways.
|
||
|
||
For instance, a system may be specified for
|
||
$0\oc$ to $85\oc$ operation, but some components
|
||
may show failure behaviour between $60\oc$ and $85\oc$
|
||
\footnote{Opto-isolators typically show a marked decrease in performance above
|
||
$60\oc$ \cite{tlp181}, whereas another common component, say a resistor, will be unaffected.}.
|
||
Other components may operate comfortably within that whole temperature range specified.
|
||
Environmental conditions will have an effect on the {\fg} and the {\dc},
|
||
but they will have specific effects on individual components.
|
||
|
||
\paragraph{Design Decision.}
|
||
Environmental constraints will be applied to components.
|
||
A component will hold a set of environmental states that
|
||
affect it.
|
||
Environmental conditions will apply SYSTEM wide,
|
||
but may only affect specific components.
|
||
%Some may not be required for consideration
|
||
%for the analysis of particular systems.
|
||
|
||
\paragraph {Operational States and FMMD.}
|
||
|
||
Sub-systems may have specific operational states.
|
||
These could be a general health level such as
|
||
normal operation, graceful degradation or lockout.
|
||
Or they could be self~checking sub-systems that are either in a normal or self~check state.
|
||
|
||
Operational states are conditions that apply to a functional group, not individual components.
|
||
%% Andrew says that that does no make sense But I think it does
|
||
|
||
\paragraph{Design Decision.}
|
||
Operational state will be applied to {\fg}s.
|
||
|
||
\paragraph{UML Model of FMMD Analysis}
|
||
|
||
The UML diagram in figure \ref{fig:env_op_uml}, shows the data
|
||
relationships between {\fgs} and operational states, and component
|
||
failure modes and environmental factors.
|
||
|
||
|
||
\begin{figure}[h]
|
||
\centering
|
||
\includegraphics[width=400pt,bb=0 0 818 249,keepaspectratio=true]{./fmmd_concept/fmmd_env_op_uml.jpg}
|
||
% fmmd_env_op_uml.jpg: 818x249 pixel, 72dpi, 28.86x8.78 cm, bb=0 0 818 249
|
||
\caption{UML model of Environmental and Operational states w.r.t FMMD}
|
||
\label{fig:env_op_uml}
|
||
\end{figure}
|
||
|
||
|
||
|
||
\subsection{Justification of wishlist}
|
||
|
||
By applying the methodology in section \ref{fmmdproc}, the wishlist can
|
||
now be evaluated for the proposed FMMD methodology.
|
||
|
||
\subsubsection{All component failure modes must be considered in the model.}
|
||
The proposed methodology will be bottom-up.
|
||
This ensures that all component failure modes are handled.
|
||
|
||
|
||
\subsubsection{ It should be easy to integrate mechanical, electronic and software models.}
|
||
Because component failure modes are considered, we have a generic entity to model.
|
||
We can describe a mechanical, electrical or software component in terms of its failure modes.
|
||
%
|
||
Because of this
|
||
we can model and analyse integrated electro mechanical systems, controlled by computers,
|
||
using a common notation.
|
||
|
||
\subsubsection{ It should be re-usable, in that commonly used modules can be re-used in other designs/projects.}
|
||
The hierarchical nature, taking {\fg}s and deriving components from them, means that
|
||
commonly used {\dcs} can be re-used in a design (for instance self checking digital inputs)
|
||
or even in other projects where the same {\dc} is used.
|
||
|
||
|
||
|
||
\subsubsection{ It should have a formal basis, data should be available to produce mathematical proofs
|
||
for its results}
|
||
Because the failure mode model of a SYSTEM is a hierarchy of {\fg}s and derived components,
|
||
SYSTEM level failure modes are traceable back down the fault tree to
|
||
component level failure modes. This provides causation trees~\cite{sccs}, or minimal cut sets,
|
||
for all SYSTEM failure modes.
|
||
|
||
\subsubsection{ It should be capable of producing reliability and danger evaluation statistics.}
|
||
The minimal cut sets for the SYSTEM level failures can have MTTF
and danger evaluation statistics computed for them, sourced from the component failure mode statistics~\cite{mil1991}.
|
||
|
||
\subsubsection{ It should be easy to use, ideally
|
||
using a graphical syntax (as opposed to a formal mathematical one).}
|
||
A modified form of constraint diagram (an extension of Euler diagrams) has
|
||
been developed to support the FMMD methodology.
|
||
This uses Euler circles to represent failure modes, and spiders to collect symptoms, to
|
||
advance a {\fg} to a {\dc}.
|
||
|
||
|
||
\subsubsection{ From the top down the failure mode model should follow a logical de-composition of the functionality
|
||
to smaller and smaller functional modules \cite{maikowski}.}
|
||
The bottom-up approach fulfils the logical de-composition requirement, because the {\fg}s
|
||
are built from components performing a given task.
|
||
|
||
|
||
\subsubsection{ Multiple failure modes may be modelled from the base component level up.}
|
||
By breaking the problem of failure mode analysis into small stages
|
||
and building a hierarchy, the problems associated with the cross products of
|
||
all failure modes within a system are reduced by an exponential order.
|
||
This is because the multiple failure modes are considered
|
||
within {\fgs} which have fewer failure modes to consider
|
||
at each FMMD stage.
|
||
Where appropriate, multiple simultaneous failures can be modelled by
|
||
introducing test~cases where the conjunction of failure modes is considered.
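The size of this reduction can be illustrated with a rough count. The figures below are hypothetical and ignore the further test cases needed at higher levels of the hierarchy. Suppose a SYSTEM has $N$ base component failure modes in total. Considering every combination of simultaneous failure modes in a single flat analysis gives
\[
2^{N} - 1
\]
test cases. If instead the failure modes are divided among $m$ {\fgs}, each containing $n = N/m$ failure modes, the first FMMD stage requires at most
\[
m \, (2^{n} - 1)
\]
test cases. For $N = 20$, $m = 4$ and $n = 5$ this is $1048575$ test cases for the flat analysis against $124$ for the first FMMD stage.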
|
||
|
||
|
||
\begin{figure}
|
||
\centering
|
||
\begin{tikzpicture}[shorten >=1pt,->,draw=black!50, node distance=\layersep]
|
||
\draw[style=thick];
|
||
|
||
\tikzstyle{every pin edge}=[<-,shorten <=1pt]
|
||
\tikzstyle{fmmde}=[circle,fill=black!25,minimum size=17pt,inner sep=0pt]
|
||
 \tikzstyle{fmmdt}=[ellipse,fill=red!15,minimum size=17pt,inner sep=0pt]
|
||
\tikzstyle{fmmdc}=[rectangle,draw,fill=black!17,minimum size=17pt,inner sep=4pt]
|
||
 \tikzstyle{fmmdi}=[regular polygon,regular polygon sides=6,draw,fill=black!25,minimum size=50,inner sep=4pt]
|
||
\tikzstyle{component}=[fmmde, fill=green!50];
|
||
\tikzstyle{ctext}=[fmmde, draw, fill=black!20];
|
||
\tikzstyle{failure}=[fmmde, fill=red!50];
|
||
\tikzstyle{symptom}=[fmmde, fill=blue!50];
|
||
\tikzstyle{inhibit}=[fmmdi, fill=blue!40];
|
||
\tikzstyle{condition}=[fmmdc, fill=black!20];
|
||
\tikzstyle{conjunction}=[fmmde, fill=red!40];
|
||
\tikzstyle{annot} = [text width=4em, text centered]
|
||
|
||
\node[condition] (C-Q) at (0,-1) {Condition Q};
|
||
\node[inhibit] (I) at (0,-4) {Inhibit};
|
||
\node[ctext] (CC) at (4,-4) {$\stackrel{ probability\; that}{ Q\; occurs\; given\; A}$};
|
||
%\node[text] (T) at (2,-2) {Probability that Q occurs given A};
|
||
\node[condition] (C-A) at (0,-7) {Condition A};
|
||
|
||
|
||
|
||
\path (C-A) edge (I);
|
||
\path (CC) edge (I);
|
||
\path (I) edge (C-Q);
|
||
%\path (C-1b) edge (CJ);
|
||
%\path (C-1b) edge (CJ);
|
||
|
||
\end{tikzpicture}
|
||
% End of code
|
||
\caption{FTA `inhibit' gate}
|
||
\label{fig:inhibitconcept}
|
||
\end{figure}
|
||
|
||
\subsubsection {Inhibit Conditions}
|
||
Some failure modes only occur when another failure has occurred, or
|
||
due to an environmental condition reaching a critical value. This is specifically
|
||
dealt with using the FTA methodology~\cite[IV~9]{nucfta}.
|
||
An example FTA inhibit gate is shown in figure \ref{fig:inhibitconcept}.
|
||
\paragraph{Static or Dynamic Modelling of Inhibit}
|
||
If the model is static, we can consider the conditional failure
as occurring at a lower probability (i.e. the probability
of A multiplied by the probability that Q occurs given A).
|
||
If we wish to dynamically model the conditional failure
|
||
an attribute to the failure~modes must be added
|
||
that can reference other failure~modes and environmental conditions.
|
||
A UML diagram with inhibit conditions added is shown in figure \ref{fig:umlconcept2}.
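For the static case, the probability attached to the inhibit gate in figure \ref{fig:inhibitconcept} can be written as a conditional probability. Writing $P(A)$ for the probability of condition A and $P(Q|A)$ for the probability that Q occurs given A (this notation is introduced here for illustration), and assuming that Q can only arise through A, as the gate implies:
\[
P(Q) = P(A) \, P(Q|A) .
\]
For instance, with the purely illustrative values $P(A) = 10^{-3}$ and $P(Q|A) = 10^{-2}$, the conditional failure Q has probability $10^{-5}$.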
|
||
|
||
\subsection{Safe, Dangerous, Detected and Undetected.}
|
||
|
||
The top level or SYSTEM failure modes can be examined and
|
||
assigned SIL~\cite{en61508} safe and dangerous attributes.
|
||
Detected failure modes are those whose symptoms have been
incorporated into a self checking {\fg}.
Undetectable failure modes will follow a direct line
|
||
up from component level to SYSTEM level without being
|
||
incorporated into a self checking functional group.
|
||
These undetected failures correspond to a minimal cut
|
||
set where a single base~component failure mode
|
||
can be traced to a SYSTEM level failure mode.
|
||
They can thus be determined by searching the DAG
|
||
for a single base~component failure mode minimal cut set~\cite{nucfta}.
|
||
|
||
% UML DIAGRAM
|
||
|
||
\begin{figure}[h]
|
||
\centering
|
||
\includegraphics[width=400pt,keepaspectratio=true]{./fmmd_concept/fmmd_env_op_uml2.jpg}
|
||
% fmmd_env_op_uml2.jpg: 866x313 pixel, 72dpi, 30.55x11.04 cm, bb=0 0 866 313
|
||
\caption{UML diagram with Inhibit conditions}
|
||
\label{fig:umlconcept2}
|
||
\end{figure}
|
||
|
||
|
||
\subsection{Aims of FMMD Methodology}
|
||
\label{sec:aims}
|
||
Taking the four current failure mode methodologies into consideration, and comparing them to the proposed FMMD methodology, the following wish list or aims can be stated.
|
||
|
||
\begin{itemize}
|
||
\item It can be checked automatically that all component failure modes have
|
||
been considered in the model. Should a failure mode have been missed
|
||
the data model can be searched and the unhandled failure modes flagged to the design engineer.
|
||
\item Because we are modelling with failure modes, the components in {\fgs} and the {\dcs} can be generic,
i.e. mechanical, electronic or software components.
|
||
\item The {\dcs} are re-usable, in that commonly used modules can be re-used in other designs/projects.
|
||
\item It will have a formal basis, that is to say,
|
||
we have the data at hand to produce meaningful
|
||
results (MTTF and the cause trees for SYSTEM level faults).
|
||
\item Overall reliability and danger evaluation statistics can be computed.
|
||
By knowing all causation trees,
|
||
the statistical probabilities (from base component data) for all causes can be simply added.
|
||
\item A graphical representation based on Euler diagrams is used.
|
||
This provides an interface that does not involve
|
||
formal mathematical/symbolic notation.
|
||
This is intended to be user friendly and to guide the user through the FMMD process
|
||
while applying automatic checks for unhandled conditions.
|
||
\item From the top down the failure mode model will follow a logical de-composition of the functionality; by
|
||
choosing {\fg}s and working bottom-up, this hierarchical trait will occur as a natural consequence.
|
||
\item Undetectable or unhandled failure modes will be specifically flagged.
|
||
\item It is possible to model multiple failure modes.
|
||
\end{itemize}
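The addition of probabilities mentioned in the list above can be made explicit. As a sketch, assuming that base component failure modes are statistically independent and that their probabilities are small (both are assumptions made here), let $C_1, \ldots, C_k$ be the minimal cut sets of a SYSTEM level failure mode and $p_j$ the probability of base component failure mode $j$. Then
\[
P(C_i) = \prod_{j \in C_i} p_j ,
\qquad
P(\mathrm{SYSTEM\ failure}) \le \sum_{i=1}^{k} P(C_i) ,
\]
where the sum is a close approximation when all the $P(C_i)$ are small.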
|
||
|
||
|
||
\ifthenelse {\boolean{paper}}
{
%paper
\pagebreak[4]
\section{Re-Factoring the UML Model}
The UML models presented thus far in this paper
have been used to develop the data relationships required to perform FMMD analysis.
This section re-organises and rationalises the UML model.
We want to be able to use {\dcs} in functional groups.
It therefore makes sense for {\dc} to inherit from {\em component}.

The re-factored UML diagram is shown in figure \ref{fig:refactored_uml}.

\begin{figure}[h]
\centering
\includegraphics[width=400pt,bb=0 0 702 464]{./master_uml.jpg}
% master_uml.jpg: 702x464 pixel, 72dpi, 24.76x16.37 cm, bb=0 0 702 464
\caption{Re-factored UML Diagram}
\label{fig:refactored_uml}
\end{figure}

}
{
% chapter
\section{Re-Factoring the UML Model}
This chapter has used UML diagrams to develop the data types required to implement FMMD.
The terms used in FMMD and the UML data model are further refined in
chapter \ref{defs}.
}

\section{Conclusion}

\ifthenelse {\boolean{paper}}
{
This paper
describes how the FMMD methodology
functions, given requirements and constraints such as the number of combinations
of failure causes.
It describes the need for the new methodology to be bottom-up, and
then the need for incremental modularisation
to build a fault mode hierarchy, which leads to the concept of functional grouping,
analysis of those groupings, and from that
the creation of derived components.
It also
}
{
This chapter
}
provides the background to the need for a new methodology for
static analysis that can span the mechanical, electrical and software domains
using a common notation.
The author believes that FMMD addresses many shortcomings in current static failure mode analysis methodologies.
\vspace{60pt}
\today

%% $$\frac{-b\pm\sqrt{ {b^2-4ac}}}{2a}$$
%\today