morning edit
parent
bf210d6bca
commit
53352f41d8
@ -7,14 +7,14 @@
|
||||
\abstract{
|
||||
This paper proposes a methodology for
|
||||
creating failure mode models of safety critical systems, which
|
||||
has a common notation
|
||||
have a common notation
|
||||
for mechanical, electronic and software domains and apply an
|
||||
incremental and rigorous approach.
|
||||
|
||||
%% What I have done
|
||||
%%
|
||||
The four main static failure mode analysis methodologies were examined and
|
||||
in in the context of newer European safety standards assessed.
|
||||
in the context of newer European safety standards assessed.
|
||||
Some of the deficiencies in these methodologies led to
|
||||
a wish list for a more ideal methodology.
|
||||
|
||||
@ -27,7 +27,7 @@ methodology is developed. The has been named Failure Mode Modular De-Composition
|
||||
%% Sell it
|
||||
%%
|
||||
In addition to addressing the traditional weaknesses of
|
||||
Fault Tree Analysis (FTA), Fault Mode Effects Analysis (FMEA), Faliue Mode Effects Criticallity Analysis (FMECA)
|
||||
Fault Tree Analysis (FTA), Failure Mode and Effects Analysis (FMEA), Failure Mode Effects and Criticality Analysis (FMECA)
|
||||
and Failure Mode Effects and Diagnostic Analysis (FMEDA), FMMD provides the means to model multiple failure mode scenarios
|
||||
as specified in newer European Safety Standards \cite{en298}.
|
||||
The proposed methodology is bottom-up and
|
||||
@ -145,14 +145,14 @@ are held in a computer program, we can determine if the model is complete
|
||||
|
||||
\subsection{General Comments on bottom-up and top-down approaches}
|
||||
|
||||
\paragraph{A general Problem with top-down}
|
||||
\paragraph{A general deficiency in top-down systems analysis}
|
||||
With a top down approach the investigator has to determine
|
||||
a set of undesireable outcomes or accidents.
|
||||
a set of undesirable outcomes or accidents.
|
||||
As most accidents are unexpected and the causes unforeseen \cite{safeware}
|
||||
it is fair to say that a top down approach is not guaranteed to
|
||||
predict all possible undesirable outcomes.
|
||||
It also can miss known component failure modes, by
|
||||
simple not de-composing down to that level of detail.
|
||||
simply not de-composing down to that level of detail.
|
||||
|
||||
\paragraph{A general problem with bottom-up}
|
||||
With the bottom up techniques we have all the known component failure modes
|
||||
@ -165,22 +165,22 @@ we cannot consider them all and human judgement is used to
|
||||
decide which interactions are important.
|
||||
|
||||
Let $N$ be the number of components in our system, and $K$ be the average number of component failure modes
|
||||
(ways in which the component can fail). The total number of base component failure modes
|
||||
is $N \times K$. To even examine the affect that one failure mode has on all the other components
|
||||
(ways in which the component can fail). The total number of base component failure modes
|
||||
is $N \times K$. To examine the effect that one failure mode has on all the other components
|
||||
requires $(N-1) \times N \times K$ checks, in effect a set cross product.
|
||||
|
||||
|
||||
Complicate this further with applied states or environmental conditions
|
||||
and another order of cross product of complexity is added.
|
||||
We may have a peice of self checking circuity for instance that
|
||||
We may have a piece of self checking circuitry for instance that
|
||||
has two states, normal and testing mode commanded by a logic line.
|
||||
Or we may have a mechanical device that has a different
|
||||
failure mode behaviour for say, differnet ambient pressures or temperatures.
|
||||
failure mode behaviour for say, different ambient pressures or temperatures.
|
||||
|
||||
If $E$ is the number of applied states or environmental conditions to consider
|
||||
in a system, the job of the bottom-up analyst is complicated by a cross product factor again
|
||||
$(N-1) \times N \times K \times E$.
|
||||
If we put some typical very small embedded system numbersi\footnote{these figures would
|
||||
If we put some typical very small embedded system numbers\footnote{these figures would
|
||||
be typical of a very simple temperature controller, with a micro-controller, sensor and heater circuit} into this, say $N=100$, $K=2.5$ and $E=10$,
|
||||
we have $99 \times 100 \times 2.5 \times 10 = 247500$.
|
||||
To look in detail at a quarter of a million test cases is obviously impractical.
|
||||
@ -188,7 +188,7 @@ To look in detail at a quarter of a million test cases is obviously impractical.
|
||||
If we were to consider multiple simultaneous failure modes
|
||||
we add yet another cross product factor.
|
||||
|
||||
For instance for looking at double simultaneous failure modes
|
||||
For instance, when looking at double simultaneous failure modes,
|
||||
the equation reads $(N-2) \times (N-1) \times N \times K \times E$.
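As a rough illustration only, substituting the same assumed figures used for the single failure mode case ($N=100$, $K=2.5$ and $E=10$) gives
\[
(N-2) \times (N-1) \times N \times K \times E = 98 \times 99 \times 100 \times 2.5 \times 10 = 24255000 ,
\]
that is, $N-2 = 98$ times the single failure mode figure of $247500$.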
|
||||
|
||||
The bottom-up methodologies FMEA, FMECA and FMEDA take single failure modes and link them
|
||||
@ -198,8 +198,8 @@ component failure mode to the SYSTEM level.
|
||||
|
||||
|
||||
|
||||
\paragraph{Ideal Static failure mode methodology}
|
||||
An ideal Static failure mode methodology would build a failure mode model
|
||||
\paragraph{Ideal static failure mode methodology}
|
||||
An ideal static failure mode methodology would build a failure mode model
|
||||
from which the traditional four models could be derived.
|
||||
It would address the shortcomings in the other methodologies, and
|
||||
would have a user friendly interface, with a visual (rather than mathematical/formal) syntax with icons
|
||||
@ -217,7 +217,7 @@ of missing component failure modes \cite{faa}[Ch.9].
|
||||
%, or modelling at
|
||||
%a too high level of failure mode abstraction.
|
||||
FTA was invented for use on the Minuteman nuclear defence missile
|
||||
systems in the early 1960's and was not designed as a rigorous
|
||||
systems in the early 1960s and was not designed as a rigorous
|
||||
fault/failure mode methodology. It is more like a structure to
|
||||
be applied when discussing the safety of a system, with a top down hierarchical
|
||||
notation, that guides the analysis. This methodology was designed for
|
||||
@ -244,7 +244,7 @@ The investigation will typically point to a particular failure
|
||||
of a component.
|
||||
The methodology is now applied to find the significance of the failure.
|
||||
It is based on a simple equation where $S$ ranks the severity (or cost \cite{fmea}) of the identified SYSTEM failure,
|
||||
$O$ its occurrance, and $D$ giving the failures detectability. Muliplying these
|
||||
$O$ its occurrence, and $D$ giving the failure's detectability. Multiplying these
|
||||
together,
|
||||
gives a risk priority number (RPN), given by $RPN = S \times O \times D$.
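As a purely illustrative example (the $1$ to $10$ ranking scales and the values chosen here are assumptions, not figures taken from \cite{fmea}), a failure ranked $S=8$, $O=3$ and $D=5$ would score
\[
RPN = S \times O \times D = 8 \times 3 \times 5 = 120 .
\]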
|
||||
This gives in effect
|
||||
@ -286,7 +286,7 @@ The results, as with FMEA are an $RPN$ number determining the significance of th
|
||||
%%-WIKI- while various forms of FMEA predominate in other industries.
|
||||
|
||||
|
||||
\subsubsection{ FMEA weaknesses }
|
||||
\subsubsection{ FMECA weaknesses }
|
||||
\begin{itemize}
|
||||
\item Possibility to miss the effects of failure modes at SYSTEM level.
|
||||
\item Possibility to miss environmental effects.
|
||||
@ -314,7 +314,7 @@ The component may be mitigated by a vatriety of factors
|
||||
\item Coverage of self checking
|
||||
\end{itemize}
|
||||
|
||||
Ultimately this tequnique calculates a risk factor for each component.
|
||||
Ultimately this technique calculates a risk factor for each component.
|
||||
The risk factors of all the components are summed and
|
||||
give a value for the `safety level' for the equipment in a given environment.
|
||||
|
||||
@ -327,7 +327,7 @@ give a value for the `safety level' for the equipment in a given environment.
|
||||
%%-• The design strength (de-rating, safety factors) and
|
||||
%%-• The operational profile (environmental stress factors).
|
||||
|
||||
This uses MTFF and other statisical models to determine the probability of
|
||||
This uses MTTF and other statistical models to determine the probability of
|
||||
failures occurring.
|
||||
%
|
||||
A component failure mode, given its MTTF
|
||||
@ -342,21 +342,29 @@ and other factors such as de-rating and environmental stress.
|
||||
This can be calculated, with one component failure mode per row, on a spreadsheet
|
||||
and these are all summed to give the final assessment figure.
|
||||
|
||||
\paragraph{Two statistical perspectives}
|
||||
The Statistical Analysis method is used from two perspectives,
|
||||
\subsubsection{Two statistical perspectives}
|
||||
The statistical analysis method is used from two perspectives,
|
||||
Probability of Failure on Demand (PFD), and Probability of Failure
|
||||
in continuous Operation, Failure in Time (FIT) and measured in failures per billion ($10^9$) hours of operation.
|
||||
in continuous Operation, Failure in Time (FIT).
|
||||
\paragraph{Failure in Time (FIT).}
|
||||
|
||||
Continuous operation is measured in failures per billion ($10^9$) hours of operation.
|
||||
For a continuously running nuclear power station
|
||||
we would be interested in its operational FIT values.
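As a worked illustration, and assuming a constant failure rate $\lambda$ (an assumption made here for simplicity; the text does not fix a failure distribution), a FIT value converts to a mean time to failure as
\[
\lambda = \frac{FIT}{10^9} \; \mathrm{h}^{-1}, \qquad MTTF = \frac{1}{\lambda} = \frac{10^9}{FIT} \; \mathrm{h} .
\]
A hypothetical component rated at $100$ FIT would therefore have an MTTF of $10^7$ hours under this assumption.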
|
||||
|
||||
\paragraph{Probability of Failure on Demand (PFD).}
|
||||
For instance with the anti-lock system on an automobile braking
|
||||
system, we would be interested in PFD.
|
||||
For a continuously running nuclear powerstation
|
||||
we would be interested in its 24/7 operation FIT values.
|
||||
That is to say the ratio of it failing
|
||||
to succeeding on demand.
|
||||
|
||||
\subsubsection{FMEDA and determinability prediction accuracy}
|
||||
This suffers from the same problems of
|
||||
lack of determinability prediction accuracy, as FMEA above.
|
||||
%
|
||||
We have to decide how particular components failing will impact on the SYSTEM or top level.
|
||||
This involves a `leap of faith'. For instance, a resistor failing in a sensor circuit
|
||||
may be part of a critical monitioring function.
|
||||
may be part of a critical monitoring function.
|
||||
The analyst is now put in a position
|
||||
where he must assign a critical failure possibility to it.
|
||||
%
|
||||
@ -365,10 +373,10 @@ of how that resistor would/could affect that circuit, but because the circuitry
|
||||
it is part of is a critical section, it will be linked to a critical system level fault.
|
||||
%
|
||||
A $\beta$ factor, the heuristically defined probability
|
||||
of the failure causign the system fault may be applied.
|
||||
of the failure causing the system fault may be applied.
|
||||
%
|
||||
But because there is no detailed analysis of the failure mode behaviour
|
||||
of the component, traceable to the SYSTEM level, it becomnes more
|
||||
of the component, traceable to the SYSTEM level, it becomes more
|
||||
guesswork than science.
|
||||
With FMEDA, there is no rigorous cause and effect analysis for the failure modes. Unintended side
|
||||
effects that lead to failure can be missed.
|
||||
@ -405,6 +413,10 @@ for its results.
|
||||
\item It should be capable of producing reliability and danger evaluation statistics.
|
||||
\item It should be easy to use, ideally using a graphical syntax (as opposed to a formal mathematical one).
|
||||
\item From the top down, the failure mode model should follow a logical de-composition of the functionality
|
||||
for its results.
|
||||
\item It should be capable of producing reliability and danger evaluation statistics.
|
||||
\item It should be easy to use, ideally using a graphical syntax (as opposed to a formal mathematical one).
|
||||
\item From the top down, the failure mode model should follow a logical de-composition of the functionality
|
||||
to smaller and smaller functional modules \cite{maikowski}.
|
||||
\item Multiple failure modes may be modelled from the base component level up.
|
||||
\end{itemize}
|
||||
@ -412,7 +424,7 @@ to smaller and smaller functional modules \cite{maikowski}.
|
||||
|
||||
\section{Design of a new static failure mode based methodology}
|
||||
|
||||
\paragraph{New methodology Must be bottom-up}
|
||||
\paragraph{New methodology must be bottom-up}
|
||||
In order to ensure that all component failure modes have been covered
|
||||
the methodology will have to work from the bottom-up
|
||||
and start with the component failure modes.
|
||||
@ -422,7 +434,7 @@ The traditional fault finding, or natural fault finding
|
||||
is to work from the top down.
|
||||
%
|
||||
On encountering a
|
||||
fault, the symptom is first know at the top or
|
||||
fault, the symptom is first observed at the top or
|
||||
SYSTEM level. By de-composing the functionality of the faulty system and testing
|
||||
we can further de-compose the system until we find the
|
||||
faulty base level component.
|
||||
@ -432,10 +444,10 @@ Simpler and simpler functional blocks are discovered as we delve
|
||||
further into the way the system works and is built.
|
||||
|
||||
\paragraph{Design Decision: Methodology must be bottom-up.}
|
||||
In oder to ensure that all component failure modes are handled,
|
||||
In order to ensure that all component failure modes are handled,
|
||||
this methodology must start at the bottom, with base component failure modes.
|
||||
In this way automated checking can be applied to all component failure modes
|
||||
to ensure none have been inadvertantly excluded from the process.
|
||||
to ensure none have been inadvertently excluded from the process.
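The automated completeness check described here could be sketched roughly as follows. This is a minimal illustration only, not the FMMD tool itself: the class names \texttt{Component} and \texttt{FunctionalGroup}, the \texttt{symptom\_of} mapping and the resistor example are all hypothetical and chosen for the sketch.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Component:
    name: str
    failure_modes: set[str]


@dataclass
class FunctionalGroup:
    name: str
    components: list[Component]
    # (component name, failure mode) -> symptom seen at functional group level
    symptom_of: dict[tuple[str, str], str] = field(default_factory=dict)

    def unhandled(self) -> set[tuple[str, str]]:
        """Component failure modes not yet assigned to any symptom."""
        all_modes = {(c.name, m) for c in self.components for m in c.failure_modes}
        return all_modes - set(self.symptom_of)

    def derive_component(self) -> Component:
        """Treat the group as a component whose failure modes are its symptoms."""
        missing = self.unhandled()
        if missing:
            raise ValueError(f"unhandled failure modes: {sorted(missing)}")
        return Component(self.name, set(self.symptom_of.values()))


# Hypothetical two-resistor sensor channel, analysed from the bottom up.
r1 = Component("R1", {"OPEN", "SHORT"})
r2 = Component("R2", {"OPEN", "SHORT"})
group = FunctionalGroup("SENSOR_CHANNEL_1", [r1, r2])
group.symptom_of[("R1", "OPEN")] = "READING_HIGH"
group.symptom_of[("R1", "SHORT")] = "READING_LOW"
group.symptom_of[("R2", "OPEN")] = "READING_LOW"
remaining = group.unhandled()  # R2 SHORT has not been analysed yet
group.symptom_of[("R2", "SHORT")] = "READING_HIGH"
derived = group.derive_component()
```

Because the check starts from the full set of base component failure modes, an omitted mode shows up as a non-empty result from \texttt{unhandled()} rather than being silently skipped.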
|
||||
|
||||
\paragraph{Need for a `bottom-up' system de-composition}
|
||||
There is an apparent conflict here. The natural way to
|
||||
@ -450,7 +462,7 @@ and then taking those to form higher level
|
||||
The philosophy of top down de-composition is very similar.
|
||||
Top down de-composition applies functional
|
||||
de-composition, because it seeks to break the system down
|
||||
into manageable and separatetly testable entities.
|
||||
into manageable and separately testable entities.
|
||||
A second justification for this is that the design process for a product requires both top down and bottom-up
|
||||
thinking.
|
||||
|
||||
@ -463,17 +475,21 @@ The base components will typically have several failure modes each.
|
||||
Given a typical embedded system may have hundreds of components
|
||||
This means that we have to tie base component failure modes
|
||||
to SYSTEM level errors. This is the `possibility to miss failure mode effects
|
||||
at SYSTEM level' critism of the FTA, FMEDA and FMECA methodologies.
|
||||
at SYSTEM level' criticism of the FTA, FMEDA and FMECA methodologies.
|
||||
|
||||
\paragraph{Design Decision: Methodology must reduce and collate errors at each functional group stage.}
|
||||
SYSTEMS typically have far fewer failure modes then the sum of their component failure modes.
|
||||
SYSTEMS typically have far fewer failure modes than the sum of their component failure modes.
|
||||
SYSTEM level failures may be caused by a variety of component failure modes.
|
||||
A SYSTEM level failure mode is an abstracted failure mode, in that
|
||||
it is a symptom of some lower level failure or failures.
|
||||
% ABSTRACTION
|
||||
For instance a failed resistor in a sensor at a base component level is a specific
|
||||
failure mode. For example it could be called `RESISTOR 1 OPEN'.
|
||||
Its symptom in a functional group comprising the sensor channel that reads from it may be more abstract.
|
||||
failure mode.
|
||||
%
|
||||
For example it could be called `RESISTOR 1 OPEN'.
|
||||
Its symptom in a functional group comprising the sensor channel that reads from it may be more abstract
|
||||
or in other words describe the effect more generally.
|
||||
%
|
||||
We might call it `READING~HIGH' perhaps. At a higher level still
|
||||
this may be called a `SENSOR CHANNEL 1' fault.
|
||||
At a system level it may simply be a `SENSOR FAILURE'.
|
||||
@ -489,7 +505,7 @@ of failure modes as the abstraction level reaches the SYSTEM level.
|
||||
The next problem is how we build a failure mode model
|
||||
that converges to a finite set of SYSTEM level failure modes.
|
||||
%
|
||||
It would be better would be to analyse the failure mode behaviour of each
|
||||
It would be better to analyse the failure mode behaviour of each
|
||||
functional group, and determine the ways in which it, rather than its
|
||||
components, can fail.
|
||||
%
|
||||
@ -506,7 +522,7 @@ The number of symptoms of failure should be equal to or
|
||||
less than the number of component failure modes, simply because
|
||||
often there are several potential causes of failure symptoms.
|
||||
%
|
||||
When we have the the symptoms, we can start thinking of the {\fg} as a component in its own right.
|
||||
When we have the symptoms, we can start thinking of the {\fg} as a component in its own right.
|
||||
%with a simplified and reduced set of failure symptoms.
|
||||
%
|
||||
We can now create a new {\dc}, where its failure modes
|
||||
@ -548,7 +564,7 @@ an entire system. It can be considered complete when
|
||||
all failure modes from all components are handled
|
||||
and connectable to a SYSTEM level failure mode.
|
||||
|
||||
\paragraph{Directed Acyclic Graph}. This will naturally form a DAG
|
||||
\paragraph{Directed Acyclic Graph.} This will naturally form a DAG
|
||||
meaning that for all SYSTEM failure modes, we will be able to trace
|
||||
back through the DAG to possible component failure mode causes.
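The trace back through the DAG can be sketched as a simple recursive walk. This is an illustrative sketch under stated assumptions: the dictionary representation, the function name \texttt{trace\_causes} and the second resistor failure mode are hypothetical; the other labels reuse the sensor example from earlier in the text.

```python
def trace_causes(dag: dict[str, list[str]], failure_mode: str) -> set[str]:
    """Return the base component failure modes that can lead to failure_mode.

    dag maps each symptom to the lower level failure modes that cause it;
    anything without an entry is taken to be a base component failure mode.
    """
    causes = dag.get(failure_mode)
    if not causes:
        return {failure_mode}
    found: set[str] = set()
    for cause in causes:
        found |= trace_causes(dag, cause)
    return found


# Hypothetical hierarchy: each symptom points down to its possible causes.
dag = {
    "SENSOR FAILURE": ["SENSOR CHANNEL 1 FAULT"],
    "SENSOR CHANNEL 1 FAULT": ["READING HIGH"],
    "READING HIGH": ["RESISTOR 1 OPEN", "RESISTOR 2 SHORT"],
}
base_causes = trace_causes(dag, "SENSOR FAILURE")
```

The recursion terminates only because the structure is acyclic; a cycle in the mapping would need to be detected and rejected before the walk.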
|
||||
If statistical models exist for the component failure modes
|
||||
@ -577,7 +593,7 @@ We can then treat the {\fg} as a `black box' or component in its own right.
|
||||
We can now look at how the {\fg} can fail.
|
||||
%
|
||||
Many of the component failure modes will
|
||||
cause the same failure symptoms in the {fg} failure behaviour.
|
||||
cause the same failure symptoms in the {\fg} failure behaviour.
|
||||
We can collect these failures as common symptoms.
|
||||
%
|
||||
When we have our set of symptoms, we can now create
|
||||
@ -605,8 +621,8 @@ This ensures that all component failure modes are handled.
|
||||
|
||||
|
||||
\subsubsection{ It should be easy to integrate mechanical, electronic and software models.}
|
||||
Because component failure modes are considered, we have a generic enitity to model.
|
||||
We can describe a mecanical, electrical or software component in terms of its failure modes.
|
||||
Because component failure modes are considered, we have a generic entity to model.
|
||||
We can describe a mechanical, electrical or software component in terms of its failure modes.
|
||||
%
|
||||
Because of this
|
||||
we can model and analyse integrated electromechanical systems, controlled by computers,
|
||||
@ -670,7 +686,7 @@ chosing {\fg}s and working bottom-up this hierarchical trait will occur as a nat
|
||||
|
||||
\section{Conclusion}
|
||||
|
||||
This paper provides the backgroud for the need for a new methodology for
|
||||
This paper provides the background for the need for a new methodology for
|
||||
static analysis that can span the mechanical, electrical and software domains
|
||||
using a common notation.
|
||||
\vspace{60pt}
|
||||