morning edit

Robin Clark 2010-11-04 08:05:52 +00:00
parent bf210d6bca
commit 53352f41d8

@ -7,14 +7,14 @@
\abstract{ \abstract{
This paper proposes a methodology for This paper proposes a methodology for
creating failure mode models of safety critical systems, which creating failure mode models of safety critical systems, which
has a common notation have a common notation
for mechanical, electronic and software domains and apply an for mechanical, electronic and software domains and apply an
incremental and rigorous approach. incremental and rigorous approach.
%% What I have done %% What I have done
%% %%
The Four main static failure mode analysis methodologies were examined and The Four main static failure mode analysis methodologies were examined and
in in the context of newer European safety standards assessed. in the context of newer European safety standards assessed.
Some of the defeciencies in these methodologies lead to Some of the defeciencies in these methodologies lead to
a wish list for a more ideal methodology. a wish list for a more ideal methodology.
@ -27,7 +27,7 @@ methodology is developed. The has been named Failure Mode Modular De-Composition
%% Sell it %% Sell it
%% %%
In addition to addressing the traditional weaknesses of In addition to addressing the traditional weaknesses of
Fault Tree Analysis (FTA), Fault Mode Effects Analysis (FMEA), Faliue Mode Effects Criticallity Analysis (FMECA) Fault Tree Analysis (FTA), Fault Mode Effects Analysis (FMEA), Failure Mode Effects Criticallity Analysis (FMECA)
and Failure Mode Effects and Diagnostic Analysis (FMEDA), FMMD provides the means to model multiple failure mode scenarios and Failure Mode Effects and Diagnostic Analysis (FMEDA), FMMD provides the means to model multiple failure mode scenarios
as specified in newer European Safety Standards \cite{en298}. as specified in newer European Safety Standards \cite{en298}.
The proposed methodology is bottom-up and The proposed methodology is bottom-up and
@ -145,14 +145,14 @@ are held in a computer program, we can determine if the model is complete
\subsection{General Comments on bottom-up and top down approaches} \subsection{General Comments on bottom-up and top down approaches}
\paragraph{A general Problem with top-down} \paragraph{A general defeciency in top-down systems analysis}
With a top down approach the investigator has to determine With a top down approach the investigator has to determine
a set of undesireable outcomes or accidents. a set of undesirable outcomes or accidents.
As most accidents are unexpected and the causes unforseen \cite{safeware} As most accidents are unexpected and the causes unforseen \cite{safeware}
it is fair to say that a top down approach is not guaranteed to it is fair to say that a top down approach is not guaranteed to
predict all possible undesirable outcomes. predict all possible undesirable outcomes.
It also can miss known component failure modes, by It also can miss known component failure modes, by
simple not de-composing down to that level of detail. simply not de-composing down to that level of detail.
\paragraph{A general problem with bottom-up} \paragraph{A general problem with bottom-up}
With the bottom up techniques we have all the known component failure modes With the bottom up techniques we have all the known component failure modes
@ -165,22 +165,22 @@ we cannot consider them all and human judgement is used to
decide which interactions are important. decide which interactions are important.
Let N be the number of components in our system, and K be the average number of component failure modes Let N be the number of components in our system, and K be the average number of component failure modes
(ways in which the component can fail). The total number of base component failure modes (ways in which the component can fail). The total number of base comp failure modes
is $N \times K$. To even examine the affect that one failure mode has on all the other components is $N \times K$. To examine the affect that one failure mode has on all the other components
will be $(N-1) \times N \times K$, in effect a set cross product. will be $(N-1) \times N \times K$, in effect a set cross product.
Complicate this further with applied states or environmental conditions Complicate this further with applied states or environmental conditions
and another order of cross product of complexity is added. and another order of cross product of complexity is added.
We may have a peice of self checking circuity for instance that We may have a piece of self checking circuitry for instance that
has two states, normal and testing mode commanded by a logic line. has two states, normal and testing mode commanded by a logic line.
Or we may have a mechanical device that has a different Or we may have a mechanical device that has a different
failure mode behaviour for say, differnet ambient pressures or temperatures. failure mode behaviour for say, different ambient pressures or temperatures.
If $E$ is the number of applied states or environmental conditions to consider If $E$ is the number of applied states or environmental conditions to consider
in a system, the job of the bottom-up analyst is complicated by a cross product factor again in a system, the job of the bottom-up analyst is complicated by a cross product factor again
$(N-1) \times N \times K \times E$. $(N-1) \times N \times K \times E$.
If we put some typical very small embedded system numbersi\footnote{these figures would If we put some typical very small embedded system numbers\footnote{these figures would
be typical of a very simple temperature controller, with a micro-controller sensor and heater circuit} into this, say $N=100$, $K=2.5$ and $E=10$ be typical of a very simple temperature controller, with a micro-controller sensor and heater circuit} into this, say $N=100$, $K=2.5$ and $E=10$
we have $99 \times 100 \times 2.5 \times 10 = 247500 $. we have $99 \times 100 \times 2.5 \times 10 = 247500 $.
To look in detail at a quarter of a million test cases is obviously impractical. To look in detail at a quarter of a million test cases is obviously impractical.
@ -188,7 +188,7 @@ To look in detail at a quarter of a million test cases is obviously impractical.
If we were to consider multiple simultaneous failure modes If we were to consider multiple simultaneous failure modes
we have yet another complication cross product. we have yet another complication cross product.
For instance for looking at double simultaneous failure modes For instance for looking at double simultaneous failure modes,
the equation reads $(N-2) \times (N-1) \times N \times K \times E$. the equation reads $(N-2) \times (N-1) \times N \times K \times E$.
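As a worked instance of the double-failure expression, the illustrative figures already used for the single-failure case ($N=100$, $K=2.5$, $E=10$) can be substituted. This is only a sketch of the arithmetic under those assumed values, not a figure taken from the paper, and it assumes the amsmath package is loaded for the align* environment:

```latex
\begin{align*}
(N-2) \times (N-1) \times N \times K \times E
  &= 98 \times 99 \times 100 \times 2.5 \times 10 \\
  &= 98 \times 247500 \\
  &= 24255000
\end{align*}
```

That is roughly 24 million cases, 98 times the single-failure count of 247500.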
The bottom-up methodologies FMEA, FMECA and FMEDA take single failure modes and link them The bottom-up methodologies FMEA, FMECA and FMEDA take single failure modes and link them
@ -198,8 +198,8 @@ component failure mode to the SYSTEM level.
\paragraph{Ideal Static failure mode methodology} \paragraph{Ideal static failure mode methodology}
An ideal Static failure mode methodology would build a failure mode model An ideal static failure mode methodology would build a failure mode model
from which the traditional four models could be derived. from which the traditional four models could be derived.
It would address the short-comings in the other methodologies, and It would address the short-comings in the other methodologies, and
would have a user friendly interface, with a visual (rather than mathematical/formal) syntax with icons would have a user friendly interface, with a visual (rather than mathematical/formal) syntax with icons
@ -217,7 +217,7 @@ of missing component failure modes \cite{faa}[Ch.9].
%, or modelling at %, or modelling at
%a too high level of failure mode abstraction. %a too high level of failure mode abstraction.
FTA was invented for use on the minuteman nuclear defence missile FTA was invented for use on the minuteman nuclear defence missile
systems in the early 1960's and was not designed as a rigorous systems in the early 1960s and was not designed as a rigorous
fault/failure mode methodology. It is more like a structure to fault/failure mode methodology. It is more like a structure to
be applied when discussing the safety of a system, with a top down hierarchical be applied when discussing the safety of a system, with a top down hierarchical
notation, that guides the analysis. This methodology was designed for notation, that guides the analysis. This methodology was designed for
@ -244,7 +244,7 @@ The investigation will typically point to a particular failure
of a component. of a component.
The methodology is now applied to find the significance of the failure. The methodology is now applied to find the significance of the failure.
Its is based on a simple equation where $S$ ranks the severity (or cost \cite{fmea}) of the identified SYSTEM failure, Its is based on a simple equation where $S$ ranks the severity (or cost \cite{fmea}) of the identified SYSTEM failure,
$O$ its occurrance, and $D$ giving the failures detectability. Muliplying these $O$ its occurance, and $D$ giving the failures detectability. Muliplying these
together, together,
gives a risk probability number (RPN), given by $RPN = S \times O \times D$. gives a risk probability number (RPN), given by $RPN = S \times O \times D$.
This gives in effect This gives in effect
@ -286,7 +286,7 @@ The results, as with FMEA are an $RPN$ number determining the significance of th
%%-WIKI- while various forms of FMEA predominate in other industries. %%-WIKI- while various forms of FMEA predominate in other industries.
\subsubsection{ FMEA weaknesses } \subsubsection{ FMECA weaknesses }
\begin{itemize} \begin{itemize}
\item Possibility to miss the effects of failure modes at SYSTEM level. \item Possibility to miss the effects of failure modes at SYSTEM level.
\item Possibility to miss environmental affects. \item Possibility to miss environmental affects.
@ -314,7 +314,7 @@ The component may be mitigated by a vatriety of factors
\item Coverage of self checking \item Coverage of self checking
\end{itemize} \end{itemize}
Ultimately this tequnique calculates a risk factor for each component. Ultimately this technique calculates a risk factor for each component.
The risk factors of all the components are summed and The risk factors of all the components are summed and
give a value for the `safety level' for the equipment in a given environment. give a value for the `safety level' for the equipment in a given environment.
@ -327,7 +327,7 @@ give a value for the `safety level' for the equipment in a given environment.
%%-• The design strength (de-rating, safety factors) and %%-• The design strength (de-rating, safety factors) and
%%-• The operational profile (environmental stress factors). %%-• The operational profile (environmental stress factors).
This uses MTFF and other statisical models to determine the probability of This uses MTFF and other statistical models to determine the probability of
failures occurring. failures occurring.
% %
A component failure mode, given its MTTF A component failure mode, given its MTTF
@ -342,21 +342,29 @@ and other factors such as de-rating and environmental stress.
This can be calculated, with one component failure mode per row, on a spreadsheet This can be calculated, with one component failure mode per row, on a spreadsheet
and these are all summed to give the final assessment figure. and these are all summed to give the final assessment figure.
\paragraph{Two statistical perspectives} \subsubsection{Two statistical perspectives}
The Statistical Analysis method is used from two perspectives, he Statistical Analysis method is used from two perspectives,
Probability of Failure on Demand (PFD), and Probability of Failure Probability of Failure on Demand (PFD), and Probability of Failure
in continuous Operation, Failure in Time (FIT) and measured in failures per billion ($10^9$) hours of operation. in continuous Operation, Failure in Time (FIT).
\paragraph{Failure in Time (FIT)}.
Continuous operation is measured in failures per billion ($10^9$) hours of operation.
For a continuously running nuclear powerstation
we would be interested in its operational FIT values.
\paragraph{Probability of Failure on Demand (PFD)}.
For instance with the anti-lock system on a automobile braking For instance with the anti-lock system on a automobile braking
system, we would be interested in PFD. system, we would be interested in PFD.
For a continuously running nuclear powerstation That is to say the ratio of it failing
we would be interested in its 24/7 operation FIT values. to succeeding on demand.
\subsubsection{FMEDA and determinability prediction accuracy}.
This suffers from the same problems of This suffers from the same problems of
lack of determinability prediction accuracy, as FMEA above. lack of determinability prediction accuracy, as FMEA above.
% %
We have to decide how particular components failing will impact on the SYSTEM or top level. We have to decide how particular components failing will impact on the SYSTEM or top level.
This involves a `leap of faith'. For instance, a resistor failing in a sensor circuit This involves a `leap of faith'. For instance, a resistor failing in a sensor circuit
may be part of a critical monitioring function. may be part of a critical monitoring function.
The analyst is now put in a position The analyst is now put in a position
where he must assign a critical failure possibility to it. where he must assign a critical failure possibility to it.
% %
@ -365,10 +373,10 @@ of how that resistor would/could affect that circuit, but because the circuitry
it is part of critical section it will be linked to a critical system level fault. it is part of critical section it will be linked to a critical system level fault.
% %
A $\beta$ factor, the hueristically defined probability A $\beta$ factor, the hueristically defined probability
of the failure causign the system fault may be applied. of the failure causing the system fault may be applied.
% %
But because there is no detailed analysis of the failure mode behaviour But because there is no detailed analysis of the failure mode behaviour
of the component, traceable to the SYSTEM level, it becomnes more of the component, traceable to the SYSTEM level, it becomes more
guess work than science. guess work than science.
With FMEDA, there is no rigorous cause and effect analysis for the failure modes. Unintended side With FMEDA, there is no rigorous cause and effect analysis for the failure modes. Unintended side
effects that lead to failure can be missed. effects that lead to failure can be missed.
@ -405,6 +413,10 @@ for its results.
\item It should be capable of producing reliability and danger evaluation statistics. \item It should be capable of producing reliability and danger evaluation statistics.
\item It should be easy to use, Ideally using a graphical syntax (as oppossed to a formal mathematical one). \item It should be easy to use, Ideally using a graphical syntax (as oppossed to a formal mathematical one).
\item From the top down, the failure mode model should follow a logical de-composition of the functionality \item From the top down, the failure mode model should follow a logical de-composition of the functionality
for its results.
\item It should be capable of producing reliability and danger evaluation statistics.
\item It should be easy to use, ideally using a graphical syntax (as oppossed to a formal mathematical one).
\item From the top down, the failure mode model should follow a logical de-composition of the functionality
to smaller and smaller functional modules \cite{maikowski}. to smaller and smaller functional modules \cite{maikowski}.
\item Multiple failure modes may be modelled from the base component level up. \item Multiple failure modes may be modelled from the base component level up.
\end{itemize} \end{itemize}
@ -412,7 +424,7 @@ to smaller and smaller functional modules \cite{maikowski}.
\section{Design of a new static failure mode based methodology} \section{Design of a new static failure mode based methodology}
\paragraph{New methodology Must be bottom-up} \paragraph{New methodology must be bottom-up}
In order to ensure that all component failure modes have been covered In order to ensure that all component failure modes have been covered
the methodology will have to work from the bottom-up the methodology will have to work from the bottom-up
and start with the component failure modes. and start with the component failure modes.
@ -422,7 +434,7 @@ The traditional fault finding, or natural fault finding
is to work from the top down. is to work from the top down.
% %
On encountering a On encountering a
fault, the symptom is first know at the top or fault, the symptom is first observed at the top or
SYSTEM level. By de-composing the functionality of the faulty system and testing SYSTEM level. By de-composing the functionality of the faulty system and testing
we can further de-compose the system until we find the we can further de-compose the system until we find the
faulty base level component. faulty base level component.
@ -432,10 +444,10 @@ Simpler and simpler functional blocks are discovered as we delve
further into the way the system works and is built. further into the way the system works and is built.
\paragraph{Design Decision: Methodology must be bottom-up.} \paragraph{Design Decision: Methodology must be bottom-up.}
In oder to ensure that all component failure modes are handled, In order to ensure that all component failure modes are handled,
this methodology must start at the bottom, with base component failure modes. this methodology must start at the bottom, with base component failure modes.
In this way automated checking can be applied to all component failure modes In this way automated checking can be applied to all component failure modes
to ensure none have been inadvertantly excluded from the process. to ensure none have been inadvertently excluded from the process.
\paragraph{Need for a `bottom-up' system de-composition} \paragraph{Need for a `bottom-up' system de-composition}
There is an apparent conflict here. The natural way to There is an apparent conflict here. The natural way to
@ -450,7 +462,7 @@ and then taking those to form higher level
The philosophy of top down de-compositon is very similar. The philosophy of top down de-compositon is very similar.
Top down de-compositon applies functional Top down de-compositon applies functional
de-composition, because it seeks to break the system down de-composition, because it seeks to break the system down
into manageable and separatetly testable entities. into manageable and separately testable entities.
A second justification for this is that the design process for a product requires both top down and bottom-up A second justification for this is that the design process for a product requires both top down and bottom-up
thinking. thinking.
@ -463,17 +475,21 @@ The base components will typically have several failure modes each.
Given a typical embedded system may have hundreds of components Given a typical embedded system may have hundreds of components
This means that we have to tie base component failure modes This means that we have to tie base component failure modes
to SYSTEM level errors. This is the `possibility to miss failure mode effects to SYSTEM level errors. This is the `possibility to miss failure mode effects
at SYSTEM level' critism of the FTA, FMEDA and FMECA methodologies. at SYSTEM level' criticism of the FTA, FMEDA and FMECA methodologies.
\paragraph{Design Decision: Methodolgy must reduce and collate errors at each functional group stage.} \paragraph{Design Decision: Methodolgy must reduce and collate errors at each functional group stage.}
SYSTEMS typically have far fewer failure modes then the sum of their component failure modes. SYSTEMS typically have far fewer failure modes than the sum of their component failure modes.
SYSTEM level failures may be caused by a variety of component failure modes. SYSTEM level failures may be caused by a variety of component failure modes.
A SYSTEM level failure mode is an abstracted failure mode, in that A SYSTEM level failure mode is an abstracted failure mode, in that
it is a symptom of some lower level failure or failures. it is a symptom of some lower level failure or failures.
% ABSTRACTION % ABSTRACTION
For instance a failed resistor in a sensor at a base component level is a specific For instance a failed resistor in a sensor at a base component level is a specific
failure mode. For example it could be called `RESISTOR 1 OPEN'. failure mode.
Its symptom in a functional group comprising the sensor channel that reads from it may be more abstract. %
For example it could be called `RESISTOR 1 OPEN'.
Its symptom in a functional group comprising the sensor channel that reads from it may be more abstract
or in other words describe the effect more generally.
%
We might call it `READING~HIGH' perhaps. At a higher level still We might call it `READING~HIGH' perhaps. At a higher level still
this may be called `SENSOR CHANNEL 1' fault. this may be called `SENSOR CHANNEL 1' fault.
At a system level it may simply be a `SENSOR FAILURE'. At a system level it may simply be a `SENSOR FAILURE'.
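The abstraction chain described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names `collect_symptoms` and `abstract_upwards`, the dictionary layout, and the extra failure modes `RESISTOR 1 SHORT`, `RESISTOR 2 OPEN` and `READING LOW` are all hypothetical. Only `RESISTOR 1 OPEN`, `READING HIGH`, the `SENSOR CHANNEL 1` fault and `SENSOR FAILURE` come from the text.

```python
# Hypothetical sketch: names and structure are illustrative, not from the paper.
from collections import defaultdict


def collect_symptoms(failure_modes, symptom_of):
    """Group the failure modes of a functional group by the symptom each causes.

    Raises ValueError if any failure mode has no symptom assigned. This is a
    simple completeness check so that no failure mode is silently dropped.
    """
    unhandled = [fm for fm in failure_modes if fm not in symptom_of]
    if unhandled:
        raise ValueError(f"unhandled failure modes: {unhandled}")
    causes = defaultdict(list)
    for fm in failure_modes:
        causes[symptom_of[fm]].append(fm)
    return dict(causes)


def abstract_upwards(failure_mode, levels):
    """Follow one base failure mode up through successive symptom mappings."""
    chain = [failure_mode]
    for symptom_of in levels:
        chain.append(symptom_of[chain[-1]])
    return chain


# Base component failure modes of a sensor channel. The second and third
# entries, and their symptoms, are invented for illustration.
base_modes = ["RESISTOR 1 OPEN", "RESISTOR 1 SHORT", "RESISTOR 2 OPEN"]
channel_symptoms = {
    "RESISTOR 1 OPEN": "READING HIGH",
    "RESISTOR 1 SHORT": "READING LOW",
    "RESISTOR 2 OPEN": "READING HIGH",
}
grouped = collect_symptoms(base_modes, channel_symptoms)

# Each level maps the failure modes of one level to symptoms at the next.
levels = [
    channel_symptoms,
    {
        "READING HIGH": "SENSOR CHANNEL 1 FAULT",
        "READING LOW": "SENSOR CHANNEL 1 FAULT",
    },
    {"SENSOR CHANNEL 1 FAULT": "SENSOR FAILURE"},
]
chain = abstract_upwards("RESISTOR 1 OPEN", levels)
print(" -> ".join(chain))
```

In this sketch three base failure modes collapse to two symptoms at the sensor channel level and to a single failure mode at each level above, which is the kind of reduction the design decision asks for.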
@ -489,7 +505,7 @@ of failure modes as the abstraction level reaches the SYSTEM level.
The next problem is how to we build a failure mode model The next problem is how to we build a failure mode model
that converges to a finite set of SYSTEM level failure modes. that converges to a finite set of SYSTEM level failure modes.
% %
It would be better would be to analyse the failure mode behaviour of each It would be better to analyse the failure mode behaviour of each
functional group, and determine the ways in which it, rather than its functional group, and determine the ways in which it, rather than its
components, can fail. components, can fail.
% %
@ -506,7 +522,7 @@ The number of symptoms of failure should be equal to or
less than the number of component failure modes, simply because less than the number of component failure modes, simply because
often there are several potential causes of failure symptoms. often there are several potential causes of failure symptoms.
% %
When we have the the symptoms, we can start thinking of the {\fg} as a component in its own right. When we have the symptoms, we can start thinking of the {\fg} as a component in its own right.
%with a simplified and reduced set of failure symptoms. %with a simplified and reduced set of failure symptoms.
% %
We can now create a new {\dc}, where its failure modes We can now create a new {\dc}, where its failure modes
@ -538,7 +554,7 @@ are the same failure w.r.t. the {\fg}.
% %
We can now treat the {\fg} as a component, and call it a {\dc}, in other words, a sub-system with a known set of failure modes. We can now treat the {\fg} as a component, and call it a {\dc}, in other words, a sub-system with a known set of failure modes.
% %
We can now create a new{\dc} and assign it these common symptoms We can now create a new {\dc} and assign it these common symptoms
as its failure modes. as its failure modes.
% %
This {\dc} can be used to build higher level This {\dc} can be used to build higher level
@ -548,7 +564,7 @@ an entire system. It can be considered complete when
all failure modes from all components are handled all failure modes from all components are handled
and connectable to a SYSTEM level failure mode. and connectable to a SYSTEM level failure mode.
\paragraph{Directed Acyclic Graph}. This will naturally form a DAG \paragraph{Directed Acyclic Graph.} This will naturally form a DAG
meaning that for all SYSTEM failure modes, we will be able to trace meaning that for all SYSTEM failure modes, we will be able to trace
back through the DAG to possible component failure mode causes. back through the DAG to possible component failure mode causes.
If statistical models exist for the component failure modes If statistical models exist for the component failure modes
@ -577,7 +593,7 @@ We can then treat the {\fg} as a `black box' or component in its own right.
We can now look at how the {\fg} can fail. We can now look at how the {\fg} can fail.
% %
Many of the component failure modes will Many of the component failure modes will
cause the same failure symptoms in the {fg} failure behaviour. cause the same failure symptoms in the {\fg} failure behaviour.
We can collect these failures as common symptoms. We can collect these failures as common symptoms.
% %
When we have our set of symptoms, we can now create When we have our set of symptoms, we can now create
@ -605,8 +621,8 @@ This ensures that all component failure modes are handled.
\subsubsection{ It should be easy to integrate mechanical, electronic and software models.} \subsubsection{ It should be easy to integrate mechanical, electronic and software models.}
Because component failure modes are considered, we have a generic enitity to model. Because component failure modes are considered, we have a generic entity to model.
We can describe a mecanical, electrical or software component in terms of its failure modes. We can describe a mechanical, electrical or software component in terms of its failure modes.
% %
Because of this Because of this
we can model and analyse integrated electro mechanical systems, controlled by computers, we can model and analyse integrated electro mechanical systems, controlled by computers,
@ -670,7 +686,7 @@ chosing {\fg}s and working bottom-up this hierarchical trait will occur as a nat
\section{Conclusion} \section{Conclusion}
This paper provides the backgroud for the need for a new methodology for This paper provides the background for the need for a new methodology for
static analysis that can span the mechanical electrical and software domains static analysis that can span the mechanical electrical and software domains
using a common notation. using a common notation.
\vspace{60pt} \vspace{60pt}