diff --git a/fmmd_concept/fmmd_concept.tex b/fmmd_concept/fmmd_concept.tex index b1e42d7..9292187 100644 --- a/fmmd_concept/fmmd_concept.tex +++ b/fmmd_concept/fmmd_concept.tex @@ -320,20 +320,48 @@ This is a process that takes all the components in a system, and from the failure modes of those components, the investigating engineer must tie them to possible SYSTEM level events/failure modes. This technique -evaluates and the product’s self-diagnostic ability, -The calculations and procedure for FMEDA are +evaluates a product’s statistical level of safety +taking into account its self-diagnostic ability. +The calculations and procedures for FMEDA are described in EN61508 Part 2 Appendix C \cite{en61508}[Part 2 App C]. The following gives an outline of the procedure. -\paragraph{FMEA} + +\subsubsection{Two statistical perspectives} +The Statistical Analysis method is used from two perspectives, +Probability of Failure on Demand (PFD), and Probability of Failure +in continuous Operation, Failure in Time (FIT). +\paragraph{Failure in Time (FIT).} + +Continuous operation is measured in failures per billion ($10^9$) hours of operation. +For a continuously running nuclear power station +we would be interested in its operational FIT values. + +\paragraph{Probability of Failure on Demand (PFD).} +For instance with the anti-lock system on an automobile braking +system, we would be interested in PFD. +That is to say, the proportion of demands +on which it fails. + +\subsubsection{The FMEDA Analysis Process} + +\paragraph{Determine SYSTEM level failures from base components} The first stage is to apply FMEA to the SYSTEM. -Within the product all failure rates of individual -components contribute to the overall product failure rate. +% +Each component is analysed in terms of how its failure +would affect the system.% Failure rates of individual components in the SYSTEM are calculated based on component type and environmental conditions. 
+% +Statistical data exists for most component types \cite{mil1992}. +% +This phase is typically implemented on a spreadsheet, recording each component’s +type, placing in the system, part number, environmental stress factors etc. +%will be a determination of whether the component failing will lead to a `safe' +%or `unsafe' condition. -\paragraph{Overall SYSTEM failure rate} +\paragraph{Overall SYSTEM failure rate.} Product failure rate is the sum of all component failure rates. This is the sum of safe and unsafe failures. @@ -341,29 +369,47 @@ failures. \paragraph{Self Diagnostics} We next evaluate the SYSTEMS’s self-diagnostic ability. -Each component’s failure mode and its failure rate are listed. -Failure modes are classified as safe or dangerous\footnote{Again this is taking a component failure mode and determing -how it will react with any other components in the SYSTEM and making a decision -based on hueistics.}. -detectable failures are labelled `$\lambda_D$' and safe failures `$\lambda_S$' by EN61508. +Each component’s failure modes and failure rate are now available. +Failure modes are now classified as safe or dangerous. +This is done by taking a component failure mode and determining +how it will react with any other components in the SYSTEM and taking a decision +based on heuristics. +Failure rates are labelled `$\lambda_D$' (for +dangerous) and `$\lambda_S$' (for safe) \cite{en61508}. \paragraph{Determine Detectable and Undetecable Failures} -Each safe and dangerous failure mode is determined as detectable or un-detectable by the SYSTEMS’s +Each safe and dangerous failure mode is now +determined as detectable or un-detectable by the SYSTEM’s self checking features. 
% -The result is a list of all components, their failure modes, the failure mode classification -as Safe-Detected (SD), Safe-Undetected (SU), Dangerous-Detected (DD) or Dangerous-Undetected (DU), -and the failure rate of each classification using the failure rate -prediction results ($\lambda_{SD}$, $\lambda_{SU}$, $\lambda_{DD}$, $\lambda_{DU}$). +This gives us four failure mode classifications: +Safe-Detected (SD), Safe-Undetected (SU), Dangerous-Detected (DD) and Dangerous-Undetected (DU), +and the failure rate of each classification +is represented by lambda variables +($\lambda_{SD}$, $\lambda_{SU}$, $\lambda_{DD}$, $\lambda_{DU}$). -Because some failure modes may not be discovered theoretically during the +Because some failure modes may not be discovered theoretically during the static +analysis, the +% admission of how daft it is to take a component failure mode on its own +% and guess how it will affect an ENTIRE complex SYSTEM next step is to investigate using an actual working SYSTEM. -This requires the deliberate introduction -of failures; any new failures discovered at this stage are classified -$\lambda_{SD}$, $\lambda_{SU}$, $\lambda_{DD}$, $\lambda_{DU}$ -and added to the result set. + +Failures are deliberately caused (by physical intervention), and any new SYSTEM level +failures are recorded. +Heuristics and MTTF failure rates for the components +are used to calculate probabilities for these new failure modes +according to their safety and detectability classifications (i.e. +$\lambda_{SD}$, $\lambda_{SU}$, $\lambda_{DD}$, $\lambda_{DU}$). +These new failures are added to the model. %SD, SU, DD, DU. +With these classifications, and statistics for each component, +we can now calculate statistics for the diagnostic coverage (how good at `self checking' the system is) +and its safe failure fraction (how many of its failures are self detected or safe compared to +all failures possible). + +The calculations for these are described below. 
+ \paragraph{Diagnostic Coverage.} The diagnostic coverage is simply the ratio of the dangerous detected probabilities @@ -385,23 +431,27 @@ Again this is usually expressed as a percentage. $$ SFF = \big( \Sigma\lambda_S + \Sigma\lambda_{DD} \big) / \big( \Sigma\lambda_S + \Sigma\lambda_D \big) $$ -This is the ratio of -Step 4 Calculate SFF, SIL and PFD -The SIL level of the product is finally determined from the Safe Failure Fraction (SFF) and the Probability of Failure on Demand (PFD). The following formulas are used. -SFF = (lSD + lSU + lDD) / (lSD + lSU + lDD + lDU) -PFD = (lDU)(Proof Test Interval)/2 + (lDD)(Down Time or Repair Time) +%This is the ratio of +%Step 4 Calculate SFF, SIL and PFD +%The SIL level of the product is finally determined from the Safe Failure Fraction (SFF) and the Probability of Failure on Demand (PFD). The following formulas are used. +%SFF = (lSD + lSU + lDD) / (lSD + lSU + lDD + lDU) +%PFD = (lDU)(Proof Test Interval)/2 + (lDD)(Down Time or Repair Time) % Often a given component failure mode there will be a $\beta$ value, the % probability that the component failure mode will cause a given SYSTEM failure. -\paragraph{Risk Mitigation} +%\paragraph{Risk Mitigation} +% +%The component may be have its risk factor +%reduced by the checking interval (or $\tau$ time between self checking procedures). +% +%Ultimately this technique calculates a risk factor for each component. +%The risk factors of all the components are summed and +%%give a value for the `safety level' for the equipment in a given environment. + + -The component may be have its risk factor -reduced by the checking interval (or $\tau$ time between self checking procedures). -Ultimately this technique calculates a risk factor for each component. -The risk factors of all the components are summed and -give a value for the `safety level' for the equipment in a given environment. 
\paragraph{Classification into Safety Integrity Levels (SIL).} There are four SIL levels, from 1 to 4 with 4 being the highest safety level. @@ -423,55 +473,53 @@ Thus a statistical model can be implemented on a spreadsheet, where each component has a calculated risk, a fault detection time (if any), an estimated risk importance and other factors such as de-rating and environmental stress. -This can be calculated, with one component failure mode per row, on a spreadsheet -and these are all summed to give the final assessment figure. +With one component failure mode per row, +all the statistical factors for SIL rating can be produced. -\subsubsection{Two statistical perspectives} -he Statistical Analysis method is used from two perspectives, -Probability of Failure on Demand (PFD), and Probability of Failure -in continuous Operation, Failure in Time (FIT). -\paragraph{Failure in Time (FIT)}. -Continuous operation is measured in failures per billion ($10^9$) hours of operation. -For a continuously running nuclear powerstation -we would be interested in its operational FIT values. -\paragraph{Probability of Failure on Demand (PFD)}. -For instance with the anti-lock system on a automobile braking -system, we would be interested in PFD. -That is to say the ratio of it failing -to succeeding on demand. -\subsubsection{FMEDA and determinability prediction accuracy}. + + +\subsubsection{FMEDA and failure outcome prediction accuracy.} This suffers from the same problems of -lack of determinability prediction accuracy, as FMEA above. +lack of component failure mode outcome prediction accuracy, as FMEA in section \ref{pfmea}. % -We have to decide how particular components failing will impact on the SYSTEM or top level. +This is because the analyst has to decide how particular components failing will impact on the SYSTEM or top level. This involves a `leap of faith'. For instance, a resistor failing in a sensor circuit may be part of a critical monitoring function. 
The analyst is now put in a position -where he must assign a critical failure possibility to it. +where he probably should assign a dangerous failure classification to it. % There is no analysis -of how that resistor would/could affect that circuit, but because the circuitry -it is part of critical section it will be linked to a critical system level fault. +of how that resistor would/could affect the components close to it, but because the circuitry +it is part of is a critical section, it will most likely +be linked to a dangerous system level failure in an FMEDA study. % -A $\beta$ factor, the hueristically defined probability -of the failure causing the system fault may be applied. +%%- IS THIS TRUE IS THERE A BETA FACTOR IN FMEDA???? +%%- +%A $\beta$ factor, the hueristically defined probability +%of the failure causing the system fault may be applied. % But because there is no detailed analysis of the failure mode behaviour -of the component, traceable to the SYSTEM level, it becomes more +of the component in its local environment, +only a direct link to the SYSTEM level, it becomes more guess work than science. -With FMEDA, there is no rigorous cause and effect analysis for the failure modes. Unintended side -effects that lead to failure can be missed. +% +With FMEDA, there is no rigorous cause and effect analysis for the failure modes +and how they interact on the micro scale (the components adjacent to them in terms of functionality). +Unintended side effects that lead to failure can be missed. +Also, component failure modes that are not +dangerous may be wrongly assigned as dangerous simply because they exist in a critical +section of the product. -By this we may have the MTTF of some critical component failure -modes, but we can only guess, in most cases what the safety case outcome -will be if it occurs. +% some critical component failure +%modes, but we can only guess, in most cases what the safety case outcome +%will be if it occurs. 
-This leads to having components within a SYSTEM partitioned into different -safety level zones \cite{en61508}. This is a vague way of determining -safety. +This leads to the practice of having components within a SYSTEM partitioned into different +safety level zones as recommended in EN61508~\cite{en61508}. This is a vague way of determining +safety, as it can miss effects due to `unexpected' component interaction. The Statistical Analysis methodology is the core philosophy of the Safety Integrity Levels (SIL) ebodied in EN61508 \cite{en61508} @@ -676,7 +724,7 @@ causes for a SYSTEM level failure is known as a minimal cut set \cite{nasafta}. If statistical models exist for the component failure modes these failure causation trees (or minimal cut sets \cite{nucfta}) can be used to calculate Mean Time to Failure (MTTF) or Probability of Failure on demand (PFD) figures. -Constract the analytical capability of this with the +Contrast the analytical capability of FMMD with the methodologies where the component failure modes are linked directly to SYSTEM failure modes with no analysis stages in between.