First go Monday night

Robin Clark 2010-11-15 18:58:02 +00:00
parent ff8ff96fe5
commit 7ed7388bfe


@ -320,20 +320,48 @@ This is a process that takes all the components in a system,
and from the failure modes of those components, the investigating engineer
must tie them to possible SYSTEM level events/failure modes.
This technique
evaluates and the products self-diagnostic ability,
The calculations and procedure for FMEDA are
evaluates a product's statistical level of safety
taking into account its self-diagnostic ability.
The calculations and procedures for FMEDA are
described in EN61508 Part 2 Appendix C \cite[Part 2 App C]{en61508}.
The following gives an outline of the procedure.
\paragraph{FMEA}
\subsubsection{Two statistical perspectives}
The Statistical Analysis method is used from two perspectives:
Probability of Failure on Demand (PFD), and probability of failure
in continuous operation, expressed as Failure in Time (FIT).
\paragraph{Failure in Time (FIT).}
Continuous operation is measured in failures per billion ($10^9$) hours of operation.
For a continuously running nuclear power station
we would be interested in its operational FIT values.
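As a worked illustration of the unit (the figure of 100 FIT is an assumed value, not taken from any component data source):
$$ \lambda = 100 \; FIT = 100 \times 10^{-9} \; h^{-1} = 10^{-7} \; h^{-1} $$
If a constant failure rate is assumed, this corresponds to a mean time to failure of $1/\lambda = 10^{7}$ hours.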
\paragraph{Probability of Failure on Demand (PFD).}
For a system that acts only when called upon, such as the anti-lock system on an automobile braking
system, we would be interested in PFD.
That is to say, the probability that it fails to
perform its function when a demand is made.
\subsubsection{The FMEDA Analysis Process}
\paragraph{Determine SYSTEM level failures from base components}
The first stage is to apply FMEA to the SYSTEM.
Within the product all failure rates of individual
components contribute to the overall product failure rate.
%
Each component is analysed in terms of how its failure
would affect the system.
%
Failure rates of individual components in the SYSTEM
are calculated based on component type and
environmental conditions.
%
Statistical data exists for most component types \cite{mil1992}.
%
This phase is typically implemented on a spreadsheet, recording each component's
type, placement in the system, part number, environmental stress factors, etc.
%will be a determination of whether the component failing will lead to a `safe'
%or `unsafe' condition.
\paragraph{Overall SYSTEM failure rate}
\paragraph{Overall SYSTEM failure rate.}
Product failure rate is the sum of all component
failure rates. This is the sum of safe and unsafe
failures.
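As a sketch in symbols (the index $i$ and the name $\lambda_{SYSTEM}$ are introduced here purely for illustration), with $\lambda_i$ the failure rate of the $i$th component:
$$ \lambda_{SYSTEM} = \sum_i \lambda_i $$
For example, three components with assumed rates of 10, 25 and 5 FIT would give a product failure rate of 40 FIT.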
@ -341,29 +369,47 @@ failures.
\paragraph{Self Diagnostics}
We next evaluate the SYSTEM's self-diagnostic ability.
Each component's failure mode and its failure rate are listed.
Failure modes are classified as safe or dangerous\footnote{Again this is taking a component failure mode and determining
how it will react with any other components in the SYSTEM and making a decision
based on heuristics.}.
Dangerous failures are labelled `$\lambda_D$' and safe failures `$\lambda_S$' by EN61508.
Each component's failure modes and failure rates are now available.
Failure modes are now classified as safe or dangerous.
This is done by taking a component failure mode, determining
how it will interact with any other components in the SYSTEM, and taking a decision
based on heuristics.
Failure rates are labelled `$\lambda_D$' (for
dangerous) and `$\lambda_S$' (for safe) \cite{en61508}.
\paragraph{Determine Detectable and Undetectable Failures}
Each safe and dangerous failure mode is determined as detectable or undetectable by the SYSTEM's
Each safe and dangerous failure mode is now
determined as detectable or undetectable by the SYSTEM's
self-checking features.
%
The result is a list of all components, their failure modes, the failure mode classification
as Safe-Detected (SD), Safe-Undetected (SU), Dangerous-Detected (DD) or Dangerous-Undetected (DU),
and the failure rate of each classification using the failure rate
prediction results ($\lambda_{SD}$, $\lambda_{SU}$, $\lambda_{DD}$, $\lambda_{DU}$).
This gives us four failure mode classifications:
Safe-Detected (SD), Safe-Undetected (SU), Dangerous-Detected (DD) and Dangerous-Undetected (DU).
The failure rate of each classification
is represented by a lambda variable
($\lambda_{SD}$, $\lambda_{SU}$, $\lambda_{DD}$, $\lambda_{DU}$).
Because some failure modes may not be discovered theoretically during the
Because some failure modes may not be discovered theoretically during the static
analysis, the
% admission of how daft it is to take a component failure mode on its own
% and guess how it will affect an ENTIRE complex SYSTEM
next step is to investigate using an actual working SYSTEM.
This requires the deliberate introduction
of failures; any new failures discovered at this stage are classified
$\lambda_{SD}$, $\lambda_{SU}$, $\lambda_{DD}$, $\lambda_{DU}$
and added to the result set.
Failures are deliberately caused (by physical intervention), and any new SYSTEM level
failures are added to the model.
Heuristics and MTTF failure rates for the components
are used to calculate probabilities for these new failure modes
according to their safety and detectability classifications (i.e.
$\lambda_{SD}$, $\lambda_{SU}$, $\lambda_{DD}$, $\lambda_{DU}$).
%SD, SU, DD, DU.
With these classifications, and statistics for each component,
we can now calculate the diagnostic coverage (how good at `self checking' the system is)
and the safe failure fraction (the proportion of failures that are either safe or self-detected, compared to
all possible failures).
The calculations for these are described below.
\paragraph{Diagnostic Coverage.}
The diagnostic coverage is simply the ratio
of the dangerous detected probabilities
@ -385,23 +431,27 @@ Again this is usually expressed as a percentage.
$$ SFF = \big( \Sigma\lambda_S + \Sigma\lambda_{DD} \big) / \big( \Sigma\lambda_S + \Sigma\lambda_D \big) $$
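As a worked illustration with assumed figures (not taken from any real product), suppose the classified totals are
$\Sigma\lambda_{SD} = 200$, $\Sigma\lambda_{SU} = 100$, $\Sigma\lambda_{DD} = 150$ and $\Sigma\lambda_{DU} = 50$ FIT,
so that $\Sigma\lambda_S = 300$ and $\Sigma\lambda_D = 200$ FIT. Then
$$ SFF = \big( 300 + 150 \big) / \big( 300 + 200 \big) = 0.9 $$
or 90\%. With the same figures, taking diagnostic coverage as the ratio of dangerous detected to all dangerous failures
gives $150 / 200 = 0.75$, or 75\%.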
This is the ratio of
Step 4 Calculate SFF, SIL and PFD
The SIL level of the product is finally determined from the Safe Failure Fraction (SFF) and the Probability of Failure on Demand (PFD). The following formulas are used.
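$$ SFF = \big( \lambda_{SD} + \lambda_{SU} + \lambda_{DD} \big) / \big( \lambda_{SD} + \lambda_{SU} + \lambda_{DD} + \lambda_{DU} \big) $$
$$ PFD = \lambda_{DU} \cdot (\mbox{Proof Test Interval})/2 + \lambda_{DD} \cdot (\mbox{Down Time or Repair Time}) $$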
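As a rough worked illustration of the PFD expression above, assume (purely for the sake of example) a dangerous
undetected rate of 50 FIT ($5 \times 10^{-8} \; h^{-1}$), a dangerous detected rate of 150 FIT ($1.5 \times 10^{-7} \; h^{-1}$),
a proof test interval of one year (8760 hours) and a repair time of 8 hours:
$$ PFD = (5 \times 10^{-8})(8760/2) + (1.5 \times 10^{-7})(8) = 2.19 \times 10^{-4} + 1.2 \times 10^{-6} \approx 2.2 \times 10^{-4} $$
With these assumed figures the undetected term dominates, which shows why the proof test interval has such a strong
influence on the result.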
%This is the ratio of
%Step 4 Calculate SFF, SIL and PFD
%The SIL level of the product is finally determined from the Safe Failure Fraction (SFF) and the Probability of Failure on Demand (PFD). The following formulas are used.
%SFF = (lSD + lSU + lDD) / (lSD + lSU + lDD + lDU)
%PFD = (lDU)(Proof Test Interval)/2 + (lDD)(Down Time or Repair Time)
% Often a given component failure mode there will be a $\beta$ value, the
% probability that the component failure mode will cause a given SYSTEM failure.
\paragraph{Risk Mitigation}
%\paragraph{Risk Mitigation}
%
%The component may be have its risk factor
%reduced by the checking interval (or $\tau$ time between self checking procedures).
%
%Ultimately this technique calculates a risk factor for each component.
%The risk factors of all the components are summed and
%%give a value for the `safety level' for the equipment in a given environment.
The component may have its risk factor
reduced by the checking interval (or $\tau$ time between self checking procedures).
Ultimately this technique calculates a risk factor for each component.
The risk factors of all the components are summed and
give a value for the `safety level' for the equipment in a given environment.
\paragraph{Classification into Safety Integrity Levels (SIL).}
There are four SIL levels, from 1 to 4 with 4 being the highest safety level.
@ -423,55 +473,53 @@ Thus a statistical
model can be implemented on a spreadsheet, where each component
has a calculated risk, a fault detection time (if any), an estimated risk importance
and other factors such as de-rating and environmental stress.
This can be calculated, with one component failure mode per row, on a spreadsheet
and these are all summed to give the final assessment figure.
With one component failure mode per row,
all the statistical factors for SIL rating can be produced.
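
A minimal sketch of such a spreadsheet follows. The components, failure modes and FIT values are hypothetical
and are chosen only to show the layout.

\begin{tabular}{|l|l|r|l|}
\hline
Component & Failure mode & Rate (FIT) & Classification \\ \hline
R1 & open  & 10 & DD \\
R1 & short & 2  & DU \\
C1 & short & 5  & SD \\
C1 & open  & 3  & SU \\ \hline
\end{tabular}

Summing the rows by classification gives $\Sigma\lambda_{DD} = 10$, $\Sigma\lambda_{DU} = 2$,
$\Sigma\lambda_{SD} = 5$ and $\Sigma\lambda_{SU} = 3$ FIT, which are the inputs to the SIL calculations.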
\subsubsection{FMEDA and determinability prediction accuracy}.
\subsubsection{FMEDA and failure outcome prediction accuracy.}
This suffers from the same problems of
lack of determinability prediction accuracy, as FMEA above.
lack of component failure mode outcome prediction accuracy, as FMEA in section \ref{pfmea}.
%
We have to decide how particular components failing will impact on the SYSTEM or top level.
This is because the analyst has to decide how particular components failing will impact on the SYSTEM or top level.
This involves a `leap of faith'. For instance, a resistor failing in a sensor circuit
may be part of a critical monitoring function.
The analyst is now put in a position
where he must assign a critical failure possibility to it.
where he probably should assign a dangerous failure classification to it.
%
There is no analysis
of how that resistor would/could affect that circuit, but because the circuitry
it is part of critical section it will be linked to a critical system level fault.
of how that resistor would/could affect the components close to it, but because the circuitry
it is part of is a critical section, it will most likely
be linked to a dangerous system level failure in an FMEDA study.
%
A $\beta$ factor, the hueristically defined probability
of the failure causing the system fault may be applied.
%%- IS THIS TRUE IS THERE A BETA FACTOR IN FMEDA????
%%-
%A $\beta$ factor, the hueristically defined probability
%of the failure causing the system fault may be applied.
%
But because there is no detailed analysis of the failure mode behaviour
of the component, traceable to the SYSTEM level, it becomes more
of the component in its local environment
but traceable directly to the SYSTEM level, it becomes more
guesswork than science.
With FMEDA, there is no rigorous cause and effect analysis for the failure modes. Unintended side
effects that lead to failure can be missed.
%
With FMEDA, there is no rigorous cause and effect analysis for the failure modes
and how they interact on the micro scale (the components adjacent to them in terms of functionality).
Unintended side effects that lead to failure can be missed.
Also, component failure modes that are not
dangerous may be wrongly assigned as dangerous simply because they exist in a critical
section of the product.
By this we may have the MTTF of some critical component failure
modes, but we can only guess, in most cases what the safety case outcome
will be if it occurs.
% some critical component failure
%modes, but we can only guess, in most cases what the safety case outcome
%will be if it occurs.
This leads to having components within a SYSTEM partitioned into different
safety level zones \cite{en61508}. This is a vague way of determining
safety.
This leads to the practice of having components within a SYSTEM partitioned into different
safety level zones, as recommended in EN61508~\cite{en61508}. This is a vague way of determining
safety, as it can miss effects due to `unexpected' component interaction.
The Statistical Analysis methodology is the core philosophy
of the Safety Integrity Levels (SIL) embodied in EN61508 \cite{en61508}
@ -676,7 +724,7 @@ causes for a SYSTEM level failure is known as a minimal cut set \cite{nasafta}.
If statistical models exist for the component failure modes
these failure causation trees (or minimal cut sets \cite{nucfta})
can be used to calculate Mean Time to Failure (MTTF) or Probability of Failure on demand (PFD) figures.
Constract the analytical capability of this with the
Contrast the analytical capability of FMMD with the
methodologies where the component failure modes are linked
directly to SYSTEM failure modes with no analysis stages in between.