First go Monday night

Robin Clark 2010-11-15 18:58:02 +00:00
parent ff8ff96fe5
commit 7ed7388bfe


@ -320,20 +320,48 @@ This is a process that takes all the components in a system,
and from the failure modes of those components, the investigating engineer
must tie them to possible SYSTEM level events/failure modes.
This technique
evaluates a product's statistical level of safety
taking into account its self-diagnostic ability.
The calculations and procedures for FMEDA are
described in EN61508 Part 2 Appendix C \cite[Part 2 App C]{en61508}.
The following gives an outline of the procedure.
\subsubsection{Two statistical perspectives}
The Statistical Analysis method is used from two perspectives,
Probability of Failure on Demand (PFD), and Probability of Failure
in continuous Operation, Failure in Time (FIT).
\paragraph{Failure in Time (FIT).}
Continuous operation is measured in failures per billion ($10^9$) hours of operation.
For a continuously running nuclear power station
we would be interested in its operational FIT values.
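As a worked illustration of the unit (the figures are hypothetical, chosen only to show the arithmetic),
a component with a constant failure rate of $\lambda = 2 \times 10^{-7}$ failures per hour has
$$ FIT = \lambda \times 10^{9} = 2 \times 10^{-7} \times 10^{9} = 200 $$
that is, 200 failures would be expected per billion hours of operation.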
\paragraph{Probability of Failure on Demand (PFD).}
For instance with the anti-lock system on an automobile braking
system, we would be interested in PFD.
That is to say the ratio of the number of times it fails
to the number of times it is demanded.
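A minimal sketch of how a PFD figure might be estimated, assuming the common
single-channel, low-demand approximation in which only dangerous undetected failures
(rate $\lambda_{DU}$, a classification introduced later in this section) contribute,
and are revealed by a periodic proof test. The proof test interval $T$ and the numbers
below are assumptions made for illustration only.
$$ PFD_{avg} \approx \lambda_{DU} \, T / 2 $$
With a hypothetical $\lambda_{DU} = 5 \times 10^{-8}$ failures per hour (50 FIT) and
$T = 8760$ hours (one year) this gives
$$ PFD_{avg} \approx 5 \times 10^{-8} \times 8760 / 2 \approx 2.2 \times 10^{-4} $$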
\subsubsection{The FMEDA Analysis Process}
\paragraph{Determine SYSTEM level failures from base components}
The first stage is to apply FMEA to the SYSTEM.
%
Each component is analysed in terms of how its failure
would affect the system.%
Failure rates of individual components in the SYSTEM
are calculated based on component type and
environmental conditions.
%
Statistical data exists for most component types \cite{mil1992}.
%
This phase is typically implemented on a spreadsheet, with a row for each component
recording its type, placing in the system, part number, environmental stress factors etc.
%will be a determination of whether the component failing will lead to a `safe'
%or `unsafe' condition.
\paragraph{Overall SYSTEM failure rate.}
Product failure rate is the sum of all component
failure rates. This is the sum of safe and unsafe
failures.
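In symbols, writing $\lambda_S$ and $\lambda_D$ for the safe and unsafe (dangerous)
failure rates of the individual component failure modes, this sum can be sketched as
$$ \lambda_{total} = \Sigma\lambda_S + \Sigma\lambda_D $$
where $\lambda_{total}$ is a name introduced here for illustration only.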
@ -341,29 +369,47 @@ failures.
\paragraph{Self Diagnostics}
We next evaluate the SYSTEM's self-diagnostic ability.
Each component's failure modes and failure rate are now available.
Failure modes are now classified as safe or dangerous.
This is done by taking a component failure mode and determining
how it will react with any other components in the SYSTEM and taking a decision
based on heuristics.
The failure rates are labelled `$\lambda_D$' (for
dangerous) and `$\lambda_S$' (for safe) \cite{en61508}.
\paragraph{Determine Detectable and Undetectable Failures}
Each safe and dangerous failure mode is now
determined as detectable or un-detectable by the SYSTEM's
self checking features.
%
This gives us four failure mode classifications:
Safe-Detected (SD), Safe-Undetected (SU), Dangerous-Detected (DD) or Dangerous-Undetected (DU),
and the failure rate of each classification
is represented by lambda variables
($\lambda_{SD}$, $\lambda_{SU}$, $\lambda_{DD}$, $\lambda_{DU}$).
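Assuming each failure mode falls into exactly one of these four classes,
the safe and dangerous totals decompose as
$$ \Sigma\lambda_S = \Sigma\lambda_{SD} + \Sigma\lambda_{SU}, \qquad \Sigma\lambda_D = \Sigma\lambda_{DD} + \Sigma\lambda_{DU} $$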
Because some failure modes may not be discovered theoretically during the static
analysis, the
% admission of how daft it is to take a component failure mode on its own
% and guess how it will affect an ENTIRE complex SYSTEM
next step is to investigate using an actual working SYSTEM.
Failures are deliberately caused (by physical intervention), and any new SYSTEM level
failures are added to the model.
Heuristics and MTTF failure rates for the components
are used to calculate probabilities for these new failure modes
according to their safety and detectability classifications (i.e.
$\lambda_{SD}$, $\lambda_{SU}$, $\lambda_{DD}$, $\lambda_{DU}$).
%SD, SU, DD, DU.
With these classifications, and statistics for each component
we can now calculate statistics for the diagnostic coverage (how good at `self checking' the system is)
and its safe failure fraction (how many of its failures are self detected or safe compared to
all failures possible).
The calculations for these are described below.
\paragraph{Diagnostic Coverage.}
The diagnostic coverage is simply the ratio
of the dangerous detected probabilities
@ -385,23 +431,27 @@ Again this is usually expressed as a percentage.
$$ SFF = \big( \Sigma\lambda_S + \Sigma\lambda_{DD} \big) / \big( \Sigma\lambda_S + \Sigma\lambda_D \big) $$
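As a worked illustration with hypothetical figures (in FIT, not taken from any real product),
suppose $\Sigma\lambda_S = 300$, $\Sigma\lambda_{DD} = 150$ and $\Sigma\lambda_{DU} = 50$,
so that $\Sigma\lambda_D = 150 + 50 = 200$. Then
$$ SFF = \big( 300 + 150 \big) / \big( 300 + 200 \big) = 450 / 500 = 90\% $$
Only the 50 FIT of dangerous undetected failures counts against the safe failure fraction.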
%This is the ratio of
%Step 4 Calculate SFF, SIL and PFD
%The SIL level of the product is finally determined from the Safe Failure Fraction (SFF) and the Probability of Failure on Demand (PFD). The following formulas are used.
%SFF = (\lambda_{SD} + \lambda_{SU} + \lambda_{DD}) / (\lambda_{SD} + \lambda_{SU} + \lambda_{DD} + \lambda_{DU})
%PFD = (\lambda_{DU})(Proof Test Interval)/2 + (\lambda_{DD})(Down Time or Repair Time)
% Often a given component failure mode there will be a $\beta$ value, the
% probability that the component failure mode will cause a given SYSTEM failure.
%\paragraph{Risk Mitigation}
%
%The component may have its risk factor
%reduced by the checking interval (or $\tau$ time between self checking procedures).
%
%Ultimately this technique calculates a risk factor for each component.
%The risk factors of all the components are summed and
%%give a value for the `safety level' for the equipment in a given environment.
\paragraph{Classification into Safety Integrity Levels (SIL).}
There are four SIL levels, from 1 to 4 with 4 being the highest safety level.
@ -423,55 +473,53 @@ Thus a statistical
model can be implemented on a spreadsheet, where each component
has a calculated risk, a fault detection time (if any), an estimated risk importance
and other factors such as de-rating and environmental stress.
With one component failure mode per row,
all the statistical factors for SIL rating can be produced.
\subsubsection{FMEDA and failure outcome prediction accuracy}
This suffers from the same problems of
lack of component failure mode outcome prediction accuracy, as FMEA in section \ref{pfmea}.
%
This is because the analyst has to decide how particular components failing will impact on the SYSTEM or top level.
This involves a `leap of faith'. For instance, a resistor failing in a sensor circuit
may be part of a critical monitoring function.
The analyst is now put in a position
where he probably should assign a dangerous failure classification to it.
%
There is no analysis
of how that resistor would/could affect the components close to it, but because the circuitry
it is part of is a critical section it will most likely
be linked to a dangerous system level failure in an FMEDA study.
%
%%- IS THIS TRUE IS THERE A BETA FACTOR IN FMEDA????
%%-
%A $\beta$ factor, the heuristically defined probability
%of the failure causing the system fault may be applied.
%
But because there is no detailed analysis of the failure mode behaviour
of the component in its local environment,
and the failure mode is instead traced directly to the SYSTEM level, it becomes more
guess work than science.
%
With FMEDA, there is no rigorous cause and effect analysis for the failure modes
and how they interact on the micro scale (the components adjacent to them in terms of functionality).
Unintended side effects that lead to failure can be missed.
Also component failure modes that are not
dangerous, may be wrongly assigned as dangerous simply because they exist in a critical
section of the product.
% some critical component failure
%modes, but we can only guess, in most cases what the safety case outcome
%will be if it occurs.
This leads to the practice of having components within a SYSTEM partitioned into different
safety level zones as recommended in EN61508 \cite{en61508}. This is a vague way of determining
safety, as it can miss effects due to `unexpected' component interaction.
The Statistical Analysis methodology is the core philosophy
of the Safety Integrity Levels (SIL) embodied in EN61508 \cite{en61508}
@ -676,7 +724,7 @@ causes for a SYSTEM level failure is known as a minimal cut set \cite{nasafta}.
If statistical models exist for the component failure modes
these failure causation trees (or minimal cut sets \cite{nucfta})
can be used to calculate Mean Time to Failure (MTTF) or Probability of Failure on demand (PFD) figures.
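A minimal sketch of such a calculation, assuming the component failure modes are
statistically independent and individually rare (the rare-event approximation). The symbols
$C_i$ and $p_j$ are introduced here for illustration: $C_1, \ldots, C_n$ are the minimal cut sets
and $p_j$ is the probability of component failure mode $j$.
$$ P(\mbox{SYSTEM failure}) \approx \sum_{i=1}^{n} \prod_{j \in C_i} p_j $$
For example, with two hypothetical cut sets $\{a, b\}$ and $\{c\}$, and
$p_a = p_b = 10^{-3}$, $p_c = 10^{-5}$, this gives
$10^{-3} \times 10^{-3} + 10^{-5} = 1.1 \times 10^{-5}$.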
Contrast the analytical capability of FMMD with the
methodologies where the component failure modes are linked
directly to SYSTEM failure modes with no analysis stages in between.