Lunchtime, Andrew Fish Comments from weekend.
parent
4da27903f4
commit
14a3dc4c34
@ -16,7 +16,7 @@ incremental and rigorous approach.
|
||||
The four main static failure mode analysis methodologies were examined and
|
||||
in the context of newer European safety standards, assessed.
|
||||
Some of the deficiencies identified in these methodologies lead to
|
||||
a wish list for a more ideal methodology.
|
||||
a wish list for a more rigorous methodology.
|
||||
|
||||
%% What I have found
|
||||
%%
|
||||
@ -24,7 +24,8 @@ From the wish list
|
||||
%and considering some constraints determined from
|
||||
%the evaluation of the four established methodologies,
|
||||
a new
|
||||
methodology is developed and proposed. The has been named Failure Mode Modular De-Composition (FMMD).
|
||||
methodology is developed and proposed.
|
||||
This has been named Failure Mode Modular De-Composition (FMMD).
|
||||
|
||||
%% Sell it
|
||||
%%
|
||||
@ -58,7 +59,8 @@ From the wish list %
|
||||
%and considering some constraints determined from
|
||||
%the evaluation of the four established methodologies,
|
||||
a new
|
||||
methodology is developed and proposed. The has been named Failure Mode Modular De-Composition (FMMD).
|
||||
methodology is developed and proposed.
|
||||
This has been named Failure Mode Modular De-Composition (FMMD).
|
||||
|
||||
%% Sell it
|
||||
%%
|
||||
@ -112,7 +114,7 @@ ensuring that all component failure modes must be considered in the model.
|
||||
%
|
||||
\paragraph{FMMD Process outline.}
|
||||
This methodology has been named Failure Mode Modular De-composition (FMMD)
|
||||
because it de-composes a SYSTEM into a hierarchy of modules or {\dc}s.
|
||||
because it decomposes a SYSTEM into a hierarchy of modules or {\dc}s.
|
||||
This
|
||||
\ifthenelse {\boolean{paper}}
|
||||
{
|
||||
@ -133,10 +135,13 @@ is determined.
|
||||
%
|
||||
FMMD works from the bottom up, taking small groups
|
||||
of components, {\fgs}, and then analysing how they can fail.
|
||||
\input{./shortfg}
|
||||
|
||||
\paragraph{Micro Vs. Macro failure mode analysis.}
|
||||
This analysis is performed using FMEA from a micro rather than a macro perspective.
|
||||
Thus instead of looking at component failure modes and determining how
|
||||
they {\em may} cause a failure at SYSTEM level, we are looking at how
|
||||
they {\em will} affect the components local {\fg}.
|
||||
they {\em will} affect the component's local {\fg}.
|
||||
When we know the failure modes of a {\fg} we can treat it as a `black box'
|
||||
or {\dc}. With {\dc}s we can build {\fgs}
|
||||
at higher levels of analysis, until we have a complete
|
||||
@ -168,8 +173,8 @@ a set of undesirable outcomes or `accidents'.
|
||||
As most accidents are unexpected and the causes unforeseen \cite{safeware}
|
||||
it is fair to say that a top down approach is not guaranteed to
|
||||
predict all possible undesirable outcomes.
|
||||
It also can miss known component failure modes, by
|
||||
simply not de-composing down to the base component failure level of detail.
|
||||
Top-down methodologies can miss known component failure modes, by
|
||||
simply not decomposing down to the base component failure level of detail.
|
||||
|
||||
\paragraph{A general problem with bottom-up static failure analysis.}
|
||||
With the bottom up techniques we have all the known component failure modes
|
||||
@ -177,25 +182,29 @@ and the relative freedom to determine how each of these may affect the SYSTEM.
|
||||
%
|
||||
A problem with this is that a component typically
|
||||
interacts in a complex way with several other functionally
|
||||
adjacent components
|
||||
adjacent components.
|
||||
%
|
||||
To take a component failure mode and then attempt to tie that
|
||||
to a SYSTEM level outcome is very difficult.
|
||||
%
|
||||
The difficulty lies in
|
||||
%
|
||||
%Because of
|
||||
the number of components
|
||||
our failure mode under investigation may interact with is typically very large.
|
||||
The number of components
|
||||
a failure mode under investigation might interact with is typically very large.
|
||||
This makes it very difficult to predict the effects of a component
|
||||
failure mode, because we have to decide which components it could affect,
|
||||
or
|
||||
in other words, which components are functionally adjacent to it.
|
||||
%
|
||||
We cannot consider all the components in the SYSTEM
|
||||
when looking at a single failure mode,
|
||||
and human judgement must be used to
|
||||
and therefore human judgement must be used to
|
||||
decide which interactions could be important.
|
||||
|
||||
Let $N$ be the number of components in our system, and $K$ be the average number of component failure modes
|
||||
(ways in which the component can fail). The total number of base component failure modes
|
||||
is $N \times K$. To examine the effect that one failure mode has on all the other components
|
||||
(ways in which the base~component can fail). The total number of base component failure modes
|
||||
is $N \times K$. To examine the effect that one failure mode has on all
|
||||
the other components\footnote{A base component failure will typically affect the sub-system
|
||||
it is part of, and create a failure effect at the SYSTEM level.}
|
||||
requires $(N-1) \times N \times K$ checks, in effect a set cross product.
|
||||
|
||||
|
||||
@ -207,18 +216,21 @@ Or we may have a mechanical device that has a different
|
||||
failure mode behaviour for say, different ambient pressures or temperatures.
|
||||
|
||||
If $E$ is the number of applied states or environmental conditions to consider
|
||||
in a system, the job of the bottom-up analyst is complicated by a cross product factor again
|
||||
in a system, the bottom-up analyst is presented with an
|
||||
additional cross product factor,
|
||||
$(N-1) \times N \times K \times E$.
|
||||
If we put some typical very small embedded system numbers\footnote{these figures would
|
||||
be typical of a very simple temperature controller, with a micro-controller sensor and heater circuit} into this, say $N=100$, $K=2.5$ and $E=10$
|
||||
be typical of a very simple temperature controller, with a micro-controller sensor
|
||||
and heater circuit} into this, say $N=100$, $K=2.5$ and $E=10$
|
||||
we have $99 \times 100 \times 2.5 \times 10 = 247500 $.
|
||||
To look in detail at a quarter of a million test cases is obviously impractical.
|
||||
|
||||
If we were to consider multiple simultaneous failure modes,
|
||||
we have yet another complication cross product.
|
||||
we have yet another cross product of checks to be performed.
|
||||
|
||||
For instance for looking at double simultaneous failure modes,
|
||||
the equation reads $(N-2) \times (N-1) \times N \times K \times E$.
|
||||
For instance, when looking at double simultaneous failure modes, where $\#C$
|
||||
is the number of checks to perform,
|
||||
the equation reads $\#C = (N-2) \times (N-1) \times N \times K \times E$.
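%
As an illustrative check only, assuming the same example figures used for the
single failure mode case ($N=100$, $K=2.5$ and $E=10$), this would give
$$ \#C = 98 \times 99 \times 100 \times 2.5 \times 10 = 24255000 , $$
that is, $N-2 = 98$ times the single failure mode figure.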
|
||||
|
||||
The bottom-up methodologies FMEA, FMECA and FMEDA take single failure modes and link them
|
||||
to SYSTEM level failure modes. Because of the astronomical number of possible interactions,
|
||||
@ -232,7 +244,7 @@ component failure mode to the SYSTEM level).
|
||||
An ideal static failure mode methodology would build a failure mode model
|
||||
from which the traditional four models could be derived.
|
||||
It would address the short-comings in the other methodologies, and
|
||||
would have a user friendly interface, with a visual (rather than mathematical/formal) syntax with icons
|
||||
would have a user friendly interface, with a visual (rather than symbolic) syntax with icons
|
||||
to represent the results of analysis phases.
|
||||
%
|
||||
%There are four static analysis failure mode methodologies in common use.
|
||||
@ -251,7 +263,7 @@ systems in the early 1960s and was not designed as a rigorous
|
||||
fault/failure mode methodology.
|
||||
It was designed to look for disastrous top level hazards and
|
||||
determine how they could be caused.
|
||||
It is more like a structure to
|
||||
It is more like a procedure to
|
||||
be applied when discussing the safety of a system, with a top down hierarchical
|
||||
notation using logic symbols, that guides the analysis.
|
||||
This methodology was designed for
|
||||
@ -265,7 +277,7 @@ system level outcomes.
|
||||
|
||||
\subsubsection{ FTA weaknesses }
|
||||
\begin{itemize}
|
||||
\item Possibility to miss component failure modes
|
||||
\item Possibility to miss component failure modes.
|
||||
\item Possibility to miss environmental effects.
|
||||
\item No possibility to model base component level double failure modes.
|
||||
\end{itemize}
|
||||
@ -279,7 +291,11 @@ The investigation will typically point to a particular failure
|
||||
of a component.
|
||||
The methodology is now applied to find the significance of the failure.
|
||||
It is based on a simple equation where $S$ ranks the severity (or cost \cite{bfmea}) of the identified SYSTEM failure,
|
||||
$O$ its occurrance, and $D$ giving the failures detectability. Muliplying these
|
||||
$O$ its occurrence\footnote{The occurrence $O$ is the
|
||||
probability of the failure happening.},
|
||||
and $D$ giving the failure's detectability\footnote{Detectability: often failures
|
||||
may occur but not be noticed or cause an effect.
|
||||
Consider an unused feature failing.}. Multiplying these
|
||||
together,
|
||||
gives a risk priority number (RPN), given by $RPN = S \times O \times D$.
|
||||
This gives in effect
|
||||
@ -293,7 +309,7 @@ a prioritised `todo list', with higher the $RPN$ values being the most urgent.
|
||||
\item No possibility to model base component level double failure modes.
|
||||
\end{itemize}
|
||||
|
||||
\paragraph{note.} FMEA is sometimes used in its literal sense, that is to say
|
||||
\paragraph{Note.} FMEA is sometimes used in its literal sense, that is to say
|
||||
Failure Mode Effects Analysis, simply looking at a system's internal failure
|
||||
modes and determining what may happen as a result.
|
||||
FMEA described in this section (\ref{pfmea}) is sometimes called `production FMEA'.
|
||||
@ -311,21 +327,23 @@ electronic components was published by the DOD
|
||||
in 1991 (MIL HDK 1991 \cite{mil1991}) and is a typical
|
||||
source for MTTF data.
|
||||
%
|
||||
FMECA has a probability factor for a component causing
|
||||
FMECA has a probability factor for a component error becoming % causing
|
||||
a SYSTEM level error.
|
||||
This is termed the $\beta$ factor.
|
||||
%\footnote{for a given component failure mode there will be a $\beta$ value, the
|
||||
%probability that the component failure mode will cause a given SYSTEM failure}.
|
||||
%
|
||||
This lacks precision, or in other words, determinability prediction accuracy \cite{fafmea},
|
||||
as often the component failure mode cannot be proven to cause a SYSTEM level failure, but
|
||||
as often the component failure mode cannot be proven to cause a SYSTEM level failure, but is
|
||||
assigned a probability $\beta$ factor by the design engineer. The use of a $\beta$ factor
|
||||
is often justified using Bayes' theorem \cite{probstat}.
|
||||
%Also, it can miss combinations of failure modes that will cause SYSTEM level errors.
|
||||
%
|
||||
The results of FMECA are similar to FMEA, in that component errors are
|
||||
listed according to importance of fixing it to prevent the SYSTEM fault of given criticallity.
|
||||
Again this essentially produces a prioritised todo list.
|
||||
listed according to importance, based on
|
||||
probability of occurrence and criticality.
|
||||
% to prevent the SYSTEM fault of given criticallity.
|
||||
Again this essentially produces a prioritised `todo' list.
|
||||
|
||||
%%-WIKI- Failure mode, effects, and criticality analysis (FMECA) is an extension of failure mode and effects analysis (FMEA).
|
||||
%%-WIKI- FMEA is a a bottom-up, inductive analytical method which may be performed at either the functional or
|
||||
@ -362,7 +380,7 @@ The following gives an outline of the procedure.
|
||||
|
||||
|
||||
\subsubsection{Two statistical perspectives}
|
||||
FMEDA is a statistical analysis methodology is used from one of two perspectives,
|
||||
FMEDA is a statistical analysis methodology and is used from one of two perspectives,
|
||||
Probability of Failure on Demand (PFD), and Probability of Failure
|
||||
in continuous Operation, or Failure in Time (FIT).
|
||||
\paragraph{Failure in Time (FIT).} Continuous operation is measured in failures per billion ($10^9$) hours of operation.
|
||||
@ -372,7 +390,7 @@ we would be interested in its operational FIT values.
|
||||
\paragraph{Probability of Failure on Demand (PFD).} For instance with an anti-lock system in
|
||||
automobile braking, or other fail safe measure applied in an emergency, we would be interested in PFD.
|
||||
That is to say the ratio of it failing
|
||||
to succeeding on demand.
|
||||
to succeeding to operate correctly on demand.
|
||||
|
||||
\subsubsection{The FMEDA Analysis Process}
|
||||
|
||||
@ -388,9 +406,10 @@ environmental conditions. The SYSTEM errors are categorised as `safe' or `danger
|
||||
%Statistical data exists for most component types \cite{mil1992}.
|
||||
%
|
||||
This phase is typically implemented on a spreadsheet
|
||||
with rows representing each component. A typical component spreadshet row would
|
||||
with rows representing each component. A typical component spreadsheet row would
|
||||
consist of
|
||||
component type, placing in the system, part number, environmental stress factors, MTTF, safe/dangerous etc.
|
||||
component type, placement,
|
||||
part number, environmental stress factors, MTTF, safe/dangerous etc.
|
||||
%will be a determination of whether the component failing will lead to a `safe'
|
||||
%or `unsafe' condition.
|
||||
|
||||
@ -410,6 +429,7 @@ This is done by taking a component failure mode and determining
|
||||
if the SYSTEM error it is tied to is dangerous or safe.
|
||||
The decision for this may be
|
||||
based on heuristics or field data.
|
||||
EN61508 uses the $\lambda$ symbol to represent probabilities.
|
||||
Because we have statistics for each component failure mode,
|
||||
we can now classify these in terms of safe and dangerous lambda values.
|
||||
Detectable failure probabilities are labelled `$\lambda_D$' (for
|
||||
@ -417,8 +437,8 @@ dangerous) and `$\lambda_S$' (for safe) \cite{en61508}.
|
||||
|
||||
\paragraph{Determine Detectable and Undetectable Failures.}
|
||||
Each safe and dangerous failure mode is now
|
||||
classified as detectable or un-detectable, this
|
||||
is determined by the SYSTEM’s
|
||||
classified as detectable or un-detectable.
|
||||
EN61508 assumes that products have a high level of
|
||||
self checking features.
|
||||
%
|
||||
This gives us four level failure mode classifications:
|
||||
@ -436,7 +456,7 @@ next step is to investigate using an actual working SYSTEM.
|
||||
|
||||
Failures are deliberately caused (by physical intervention), and any new SYSTEM level
|
||||
failures are added to the model.
|
||||
Hueristics and MTTF failure rates for the components
|
||||
Heuristics and MTTF failure rates for the components
|
||||
are used to calculate probabilities for these new failure modes
|
||||
along with their safety and detectability classifications (i.e.
|
||||
$\lambda_{SD}$, $\lambda_{SU}$, $\lambda_{DD}$, $\lambda_{DU}$).
|
||||
@ -454,11 +474,16 @@ The calculations for these are described below.
|
||||
The diagnostic coverage is simply the ratio
|
||||
of the dangerous detected probabilities
|
||||
against the probability of all dangerous failures,
|
||||
and is normally expressed as a percentage.
|
||||
and is normally expressed as a percentage. $\Sigma\lambda_{DD}$ represents
|
||||
the summed probability of dangerous detected base component failure modes, and
|
||||
$\Sigma\lambda_D$ the summed probability of all dangerous base component failure modes.
|
||||
|
||||
$$ DiagnosticCoverage = \Sigma\lambda_{DD} / \Sigma\lambda_D $$
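%
As a hedged illustration, assume hypothetical summed values (not taken from any
real component data) of $\Sigma\lambda_{DD} = 90$ and $\Sigma\lambda_{DU} = 10$,
in the same units, and assume that
$\Sigma\lambda_D = \Sigma\lambda_{DD} + \Sigma\lambda_{DU}$.
The diagnostic coverage would then be
$$ DiagnosticCoverage = 90 / (90 + 10) = 0.9 , $$
or 90\% when expressed as a percentage.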
|
||||
|
||||
The diagnostic coverage for safe failures is given as
|
||||
The diagnostic coverage for safe failures, where $\Sigma\lambda_{SD}$ represents the summed probability of
|
||||
safe detected base component failure modes,
|
||||
and $\Sigma\lambda_S$ the summed probability of all safe base component failure modes,
|
||||
is given as
|
||||
|
||||
$$ SF = \frac{\Sigma\lambda_{SD}}{\Sigma\lambda_S} $$
|
||||
|
||||
@ -498,8 +523,13 @@ There are four SIL levels, from 1 to 4 with 4 being the highest safety level.
|
||||
In addition to probabilistic risk factors, the
|
||||
diagnostic coverage and SFF
|
||||
have threshold bands becoming stricter for each level.
|
||||
Demanded software verification and specification techniques and constraints (such as language sub-sets, s/w redundancy etc)
|
||||
Demanded software verification and specification techniques and constraints
|
||||
(such as language subsets, s/w redundancy etc)
|
||||
become stricter for each SIL level.
|
||||
%%
|
||||
%% Andrew asked me to expand on this here, but it would take at least two
|
||||
%% pages. I think its more appropriate for the survey.tex chapter.
|
||||
%%
|
||||
|
||||
Thus FMEDA uses statistical methods to determine
|
||||
a safety level (SIL), typically used to meet an acceptable risk
|
||||
@ -521,7 +551,7 @@ has a calculated risk, a fault detection time (if any), an estimated risk import
|
||||
and other factors such as de-rating and environmental stress.
|
||||
With one component failure mode per row,
|
||||
all the statistical factors for SIL rating can be produced\footnote{A SIL rating will apply
|
||||
to an installed plant, i.e. A complete SYSTEM. SIL ratings for individual components or
|
||||
to an installed plant, i.e. a complete installed and working SYSTEM. SIL ratings for individual components or
|
||||
sub-systems are meaningless, and the nearest equivalent would be the FIT/PFD and SFF and diagnostic coverage figures.}.
|
||||
|
||||
|
||||
@ -541,7 +571,7 @@ where he probably should assign a dangerous failure classification to it.
|
||||
%
|
||||
There is no analysis
|
||||
of how that resistor would/could affect the components close to it, but because the circuitry
|
||||
it is part of critical section it will most likely
|
||||
is part of a critical section, it will most likely
|
||||
be linked to a dangerous system level failure in an FMEDA study.
|
||||
%
|
||||
%%- IS THIS TRUE IS THERE A BETA FACTOR IN FMEDA????
|
||||
@ -571,7 +601,7 @@ safety level zones as recomended in EN61508\cite{en61508}. This is a vague way o
|
||||
safety, as it can miss unexpected effects due to `unexpected' component interaction.
|
||||
|
||||
The Statistical Analysis methodology is the core philosophy
|
||||
of the Safety Integrity Levels (SIL) ebodied in EN61508 \cite{en61508}
|
||||
of the Safety Integrity Levels (SIL) embodied in EN61508 \cite{en61508}
|
||||
and its international analog standard IEC61508.
|
||||
|
||||
|
||||
@ -590,11 +620,12 @@ and its international analog standard IOC5108.
|
||||
\item All component failure modes must be considered in the model.
|
||||
\item It should be easy to integrate mechanical, electronic and software models \cite[pp.287]{sccs}.
|
||||
\item It should be re-usable, in that commonly used modules can be re-used in other designs/projects.
|
||||
\item It should have a formal basis, that is to say, it should be able to produce mathematical proofs
|
||||
\item It should have a formal basis, that is to say, be able to produce mathematical proofs
|
||||
for its results, such as system level error causation trees, reliability and safety statistics.
|
||||
\item It should be easy to use, ideally using a graphical syntax (as oppossed to a formal mathematical one).
|
||||
\item It should be easy to use, ideally using a
|
||||
graphical syntax (as opposed to a formal symbolic/mathematical text based language).
|
||||
\item From the top down, the failure mode model should follow a logical de-composition of the functionality
|
||||
to smaller and smaller functional modules \cite{maikowski}.
|
||||
to smaller and smaller functional groupings \cite{maikowski}.
|
||||
\item Multiple failure modes may be modelled from the base component level up.
|
||||
\end{itemize}
|
||||
|
||||
@ -608,16 +639,16 @@ and start with the component failure modes.
|
||||
%
|
||||
\paragraph{Natural Fault Finding is top down.}
|
||||
The traditional fault finding, or natural fault finding
|
||||
is to work from the top down.
|
||||
is to start at the top with SYSTEM level failure modes/faults.
|
||||
%
|
||||
On encountering a
|
||||
fault, the symptom is first observed at the top or
|
||||
SYSTEM level. By de-composing the functionality of the faulty system and testing
|
||||
we can further de-compose the system until we find the
|
||||
SYSTEM level. By decomposing the functionality of the faulty system and testing
|
||||
we can further decompose the system until we find the
|
||||
faulty base level component.
|
||||
De-composition of electrical circuits is formalised and explored
|
||||
Decomposition of electrical circuits is formalised and explored
|
||||
in \cite{maikowski}. This top down technique de-composes by functionality.
|
||||
Simpler and simpler functional blocks are discovered as we delve
|
||||
Simpler and simpler functional groups are discovered as we delve
|
||||
further into the way the system works and is built.
|
||||
|
||||
|
||||
@ -644,6 +675,9 @@ into manageable and separately testable entities.
|
||||
A second justification for this is that the design process for a product requires both top down and bottom-up
|
||||
thinking. To analyse a system from the bottom-up is a useful
|
||||
design validation process in itself \cite{sommerville}.
|
||||
%%
|
||||
%% CAN we find a ref for both top and bottom up being used
|
||||
%% as design validaion ????
|
||||
|
||||
\paragraph{Design Decision: Methodology must be bottom-up.}
|
||||
In order to ensure that all component failure modes are handled,
|
||||
@ -656,10 +690,15 @@ A hierarchy of functional grouping, leading to a system model
|
||||
still leaves us with the problem of the number of component failure modes.
|
||||
The base components will typically have several failure modes each.
|
||||
%
|
||||
Given a typical embedded system may have hundreds of components
|
||||
A typical embedded system may have hundreds of components.
|
||||
This means that we would still have to tie base component failure modes
|
||||
to SYSTEM level errors. This is the `possibility to miss failure mode effects
|
||||
at SYSTEM level' criticism of the FTA, FMEDA and FMECA methodologies.
|
||||
to SYSTEM level errors.
|
||||
The problem with this is that the effects of the base component failure mode under investigation
|
||||
are not rigorously examined in relation to functionally adjacent components.
|
||||
Thus there is the `possibility to miss failure mode effects
|
||||
at the much higher SYSTEM level' criticism of the FTA, FMEDA and FMECA methodologies.
|
||||
%%%
|
||||
%%% OK Got up to here Lunchtime edit 06DEC2010.............
|
||||
|
||||
\paragraph{Design Decision: Methodology must reduce and collate errors at each functional group stage.}
|
||||
SYSTEMS typically have far fewer failure modes than the sum of their component failure modes.
|
||||
|