%
% Make the revision and doc number macros so that they are defined in one place

\ifthenelse {\boolean{paper}}
{
\begin{abstract}
A survey of static failure mode analysis methodologies applicable to safety critical systems.
\end{abstract}
}
{
\section{Overview}
A survey of static failure mode analysis methodologies applicable to safety critical systems.
}

There are four methodologies in common use for failure mode modelling.
These are FTA, FMEA, FMECA and FMEDA (a form of statistical assessment).
%
These methodologies date from the 1940s onwards, and were designed for different application areas and reasons;
all have drawbacks and advantages that are discussed in the next section.
%In short
%FTA, due to its top down nature, can overlook error conditions. FMEA and the Statistical Methods
%lack precision in predicting failure modes at the SYSTEM level.
This
\ifthenelse {\boolean{paper}}
{
paper
}
{
chapter
}
presents the design considerations that motivated and provided the specification for the FMMD methodology.
%
\section{Introduction}

\subsection{Failure Modes and System Failure Symptoms}
Describe briefly what a base component failure mode is and what a system level failure mode is.

\section {Four Current Failure Mode Analysis Methodologies}

\subsection { FTA }
Fault Tree Analysis (FTA), like all top~down methodologies, introduces the very serious problem
of missing component failure modes \cite{faa}[Ch.9].
%, or modelling at
%a too high level of failure mode abstraction.
FTA was invented for use on the Minuteman nuclear defence missile
systems in the early 1960s and was not designed as a rigorous
fault/failure mode methodology.
It was designed to look for disastrous top level hazards and
determine how they could be caused.
It is more like a procedure to be applied when discussing the safety of a system,
with a top down hierarchical notation using logic symbols that guides the analysis.
This methodology was designed for experienced engineers sitting around a large diagram and discussing the safety aspects.
Also, the nature of a large rocket with red wire and remote detonation failsafes meant that the objective was to iron out common failures, not to rigorously detect all possible failures.
Consequently it was not designed to guarantee coverage of all component failure modes, and has no rigorous in-built safeguards to ensure coverage of all possible system level outcomes.

\paragraph{Outline of FTA Methodology}
FTA works by taking an undesirable event (or SYSTEM level failure mode or TOP level failure) and deciding, top-down, what sub-systems it depends upon, and which failure events of those sub-systems could cause the top level failure.
It then applies the same process to the sub-systems it identified from the top level, identifying lower level sub-systems and events.
It is not required to de-compose down to base component level.

\paragraph{One FTA Tree per System Level Failure Mode.}
Each system level error (or undesirable event) requires its own FTA tree.
This increases the amount of work to do, and in the case of updates to particular sub-systems, introduces the requirement to update every FTA tree modelling that sub-system.

\subsubsection{ FTA weaknesses }
\begin{itemize}
\item Possibility to miss component failure modes.
\item Possibility to miss environmental effects.
\item One FTA tree, per system failure mode. Thus there is not one model from which several FTA trees can be derived. Maintainability and consistency cannot therefore be automatically checked.
\item No possibility to model base component level double failure modes.
\end{itemize}

\subsection {FTA Example}

Fault Tree Analysis.
Show how it works, top down, FROM INTERNET HISTORY OF FTA

% A simple fault tree
% Author: Zhang Long, Mail: zhangloong[at]gmail.com
%\def\pgfsysdriver{pgfsys-dvipdfm.def}
%\documentclass{minimal}
%\usepackage{tikz}
%\usetikzlibrary{shapes.gates.logic.US,trees,positioning,arrows}
%\begin{document}
\begin{figure}
\begin{tikzpicture}[
% Gates and symbols style
    and/.style={and gate US,thick,draw,fill=blue!40,rotate=90,
        anchor=east,xshift=-1mm},
    or/.style={or gate US,thick,draw,fill=blue!40,rotate=90,
        anchor=east,xshift=-1mm},
    be/.style={circle,thick,draw,fill=white!60,anchor=north,
        minimum width=0.7cm},
    tr/.style={buffer gate US,thick,draw,fill=white!60,rotate=90,
        anchor=east,minimum width=0.8cm},
% Label style
    label distance=3mm,
    every label/.style={blue},
% Event style
    event/.style={rectangle,thick,draw,fill=yellow!20,text width=2cm,
        text centered,font=\sffamily,anchor=north},
% Children and edges style
    edge from parent/.style={very thick,draw=black!70},
    edge from parent path={(\tikzparentnode.south) -- ++(0,-1.05cm)
        -| (\tikzchildnode.north)},
    level 1/.style={sibling distance=7cm,level distance=1.4cm,
        growth parent anchor=south,nodes=event},
    level 2/.style={sibling distance=7cm},
    level 3/.style={sibling distance=6cm},
    level 4/.style={sibling distance=3cm}
%%  For compatibility with PGF CVS add the absolute option:
%   absolute
    ]
%% Draw events and edges
    \node (g1) [event] {No flow to receiver}
        child{node (g2) {No flow from Component B}
            child {node (g3) {No flow into Component B}
                child {node (g4) {No flow from Component A1}
                    child {node (t1) {No flow from source1}}
                    child {node (b2) {Component A1 blocks flow}}
                }
                child {node (g5) {No flow from Component A2}
                    child {node (t2) {No flow from source2}}
                    child {node (b3) {Component A2 blocks flow}}
                }
            }
            child {node (b1) {Component B blocks flow}}
        };
%% Place gates and other symbols
%% In the CVS version of PGF labels are placed differently than in PGF 2.0
%% To render
%% them correctly replace '-20' with 'right' and add the 'absolute'
%% option to the tikzpicture environment. The absolute option makes the
%% node labels ignore the rotation of the parent node.
    \node [or] at (g2.south) [label=-20:G02] {};
    \node [and] at (g3.south) [label=-20:G03] {};
    \node [or] at (g4.south) [label=-20:G04] {};
    \node [or] at (g5.south) [label=-20:G05] {};
    \node [be] at (b1.south) [label=below:B01] {};
    \node [be] at (b2.south) [label=below:B02] {};
    \node [be] at (b3.south) [label=below:B03] {};
    \node [tr] at (t1.south) [label=below:T01] {};
    \node [tr] at (t2.south) [label=below:T02] {};
%% Draw system flow diagram
% \begin{scope}[xshift=-7.5cm,yshift=-5cm,very thick,
% node distance=1.6cm,on grid,>=stealth',
% block/.style={rectangle,draw,fill=cyan!20},
% comp/.style={circle,draw,fill=orange!40}]
% \node [block] (re) {Receiver};
% \node [comp] (cb) [above=of re] {B} edge [->] (re);
% \node [comp] (ca1) [above=of cb,xshift=-0.8cm] {A1} edge [->] (cb);
% \node [comp] (ca2) [right=of ca1] {A2} edge [->] (cb);
% \node [block] (s1) [above=of ca1] {Source1} edge [->] (ca1);
% \node [block] (s2) [right=of s1] {Source2} edge [->] (ca2);
% \end{scope}
\end{tikzpicture}
\caption{Example FTA for a Gas Supply with two Shutoff Valves}
\end{figure}
\clearpage

\subsection { FMEA }
\label{pfmea}
This is an early static analysis methodology, and concentrates on SYSTEM level errors which have been investigated.
The investigation will typically point to a particular failure of a component.
The methodology is now applied to find the significance of the failure.
It is based on a simple equation where $S$ ranks the severity (or cost \cite{bfmea}) of the identified SYSTEM failure,
$O$ its occurrence\footnote{The occurrence $O$ is the probability of the failure happening.},
and $D$ gives the failure's detectability\footnote{Detectability: often failures may occur but not be noticed or cause an effect. Consider an unused feature failing.}.
Multiplying these together gives a risk priority number (RPN), given by $RPN = S \times O \times D$.
This gives in effect a prioritised `to~do~list', with higher $RPN$ values being the most urgent.

\subsubsection{ FMEA weaknesses }
\begin{itemize}
\item Possibility to miss the effects of failure modes at SYSTEM level.
\item Possibility to miss environmental effects.
\item No possibility to model base component level double failure modes.
\end{itemize}

\paragraph{Note.} FMEA is sometimes used in its literal sense, that is to say Failure Mode Effects Analysis, simply looking at a system's internal failure modes and determining what may happen as a result.
FMEA described in this section (\ref{pfmea}) is sometimes called `production FMEA'.

\subsection{FMECA}
Failure mode, effects, and criticality analysis (FMECA) extends FMEA and adds a failure outcome criticality factor.
This is a bottom up methodology, which takes component failure modes and traces them to the SYSTEM level failures.
%
Reliability data for components is used to predict the failure statistics in the design stage.
An openly published source for the reliability of generic electronic components was published by the DOD in 1991 (MIL HDBK 1991 \cite{mil1991}) and is a typical source for MTTF data.
%
FMECA has a probability factor for a component error causing a SYSTEM level error.
This is termed the $\beta$ factor.
%\footnote{for a given component failure mode there will be a $\beta$ value, the
%probability that the component failure mode will cause a given SYSTEM failure}.
%
This lacks precision, or in other words, determinability prediction accuracy \cite{fafmea}, as often the component failure mode cannot be proven to cause a SYSTEM level failure, but is assigned a probability $\beta$ factor by the design engineer.
The use of a $\beta$ factor is often justified using Bayes theorem \cite{probstat}.
%Also, it can miss combinations of failure modes that will cause SYSTEM level errors.
%
The results of FMECA are similar to FMEA, in that component errors are listed according to importance, based on
probability of occurrence and criticality.
% to prevent the SYSTEM fault of given criticality.
Again this essentially produces a prioritised `to~do~list'.

%%-WIKI- Failure mode, effects, and criticality analysis (FMECA) is an extension of failure mode and effects analysis (FMEA).
%%-WIKI- FMEA is a bottom-up, inductive analytical method which may be performed at either the functional or
%%-WIKI- piece-part level. FMECA extends FMEA by including a criticality analysis, which is used to chart the
%%-WIKI- probability of failure modes against the severity of their consequences. The result highlights failure modes with relatively high probability
%%-WIKI- and severity of consequences, allowing remedial effort to be directed where it will produce the greatest value.
%%-WIKI- FMECA tends to be preferred over FMEA in space and North Atlantic Treaty Organization (NATO) military applications,
%%-WIKI- while various forms of FMEA predominate in other industries.

\subsubsection{ FMECA weaknesses }
\begin{itemize}
\item Possibility to miss the effects of failure modes at SYSTEM level.
\item The $\beta$ factor is based on heuristics and does not reflect any rigorous calculations.
\item Possibility to miss environmental effects.
\item No possibility to model base component level double failure modes.
\end{itemize}

\subsection { FMEDA or Statistical Analysis }
Failure Modes, Effects, and Diagnostic Analysis (FMEDA)
%
is a process in which the investigating engineer takes all the components in a system and, using the failure modes of those components, ties them to possible SYSTEM level events/failure modes.
%
This technique evaluates a product's statistical level of safety, taking into account its self-diagnostic ability.
The calculations and procedures for FMEDA are described in EN61508
%Part 2 Appendix C
\cite{en61508}[Part 2 App C].
The following gives an outline of the procedure.

\subsubsection{Two statistical perspectives}
FMEDA is a statistical analysis methodology and is used from one of two perspectives:
Probability of Failure on Demand (PFD), and Probability of Failure
in continuous Operation, or Failure in Time (FIT).

\label{survey:fit}
\paragraph{Failure in Time (FIT).}
Continuous operation is measured in failures per billion ($10^9$) hours of operation.
For a continuously running nuclear power station, industrial burner or aircraft engine
we would be interested in its operational FIT values.

\label{survey:pfd}
\paragraph{Probability of Failure on Demand (PFD).}
For instance, with an anti-lock system in automobile braking,
or other fail safe measure applied in an emergency, we would be interested in PFD.
That is to say, the proportion of demands on which it fails to operate correctly.

\subsubsection{The FMEDA Analysis Process}

\paragraph{Determine SYSTEM level failures from base components}
The first stage is to apply FMEA to the SYSTEM.
%
Each component is analysed in terms of how its failure would affect the system.
Failure rates of individual components in the SYSTEM are calculated based on component type and environmental conditions.
The SYSTEM errors are categorised as `safe' or `dangerous'.
%
%Statistical data exists for most component types \cite{mil1992}.
%
This phase is typically implemented on a spreadsheet with rows representing each component.
A typical component spreadsheet row would comprise component type, placement, part number, environmental stress factors, MTTF, safe/dangerous etc.
%will be a determination of whether the component failing will lead to a `safe'
%or `unsafe' condition.

\paragraph{Overall SYSTEM failure rate.}
The product failure rate is the sum of all component failure rates.
Typically this is the sum of the failure rates (for a constant failure rate, the reciprocal of the MTTF) of all components in an FMEDA spreadsheet.
%This is the sum of safe and unsafe
%failures.
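As a minimal worked sketch of the summation described above, assume a hypothetical product with three components whose failure rates are $\lambda_1 = 10$, $\lambda_2 = 25$ and $\lambda_3 = 5$ FIT. These values are purely illustrative and are not taken from any data source. Assuming constant failure rates, so that $MTTF = 1/\lambda$, the overall figures would be:

```latex
% Hypothetical component failure rates, in FIT (failures per 10^9 hours)
$$ \lambda_{SYSTEM} = \sum_{i=1}^{3} \lambda_i = 10 + 25 + 5 = 40 \; \mbox{FIT} = 40 \times 10^{-9} \; \mbox{failures per hour} $$
% Constant failure rate assumed, so MTTF is the reciprocal of the rate
$$ MTTF_{SYSTEM} = \frac{1}{\lambda_{SYSTEM}} = \frac{1}{40 \times 10^{-9}} = 2.5 \times 10^{7} \; \mbox{hours} $$
```

Note that it is the failure rates that add; the MTTF values themselves do not.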
\paragraph{Self Diagnostics.}
We next evaluate the SYSTEM's self-diagnostic ability.
%Each component's failure modes and failure rate are now available.
Failure modes are now classified as safe or dangerous.
This is done by taking a component failure mode and determining if the SYSTEM error it is tied to is dangerous or safe.
The decision for this may be based on heuristics or field data.
EN61508 uses the $\lambda$ symbol to represent failure rates, treated here as failure probabilities.
Because we have statistics for each component failure mode, we can now classify these in terms of safe and dangerous lambda values.
The resulting failure probabilities are labelled `$\lambda_D$' (for dangerous) and `$\lambda_S$' (for safe) \cite{en61508}.

\paragraph{Determine Detectable and Undetectable Failures.}
Each safe and dangerous failure mode is now classified as detectable or un-detectable.
EN61508 assumes that products have a high level of self checking features.
%
This gives us four failure mode classifications:
Safe-Detected (SD), Safe-Undetected (SU), Dangerous-Detected (DD) or Dangerous-Undetected (DU),
and the probabilistic failure rate of each classification is represented by lambda variables
(i.e. $\lambda_{SD}$, $\lambda_{SU}$, $\lambda_{DD}$, $\lambda_{DU}$).

Because it is recognised that some failure modes may not be discovered theoretically during the static analysis, the
% admission of how daft it is to take a component failure mode on its own
% and guess how it will affect an ENTIRE complex SYSTEM
% Admission of failure of the process really !!!!
next step is to investigate using an actual working SYSTEM.
Failures are deliberately caused (by physical intervention), and any new SYSTEM level failures are added to the model.
Heuristics and MTTF failure rates for the components are used to calculate probabilities for these new failure modes, along with their safety and detectability classifications
(i.e. $\lambda_{SD}$, $\lambda_{SU}$, $\lambda_{DD}$, $\lambda_{DU}$).
%SD, SU, DD, DU.

With these classifications, and statistics for each component, we can now calculate statistics for the diagnostic coverage (how good at `self checking' the system is) and its safe failure fraction (how many of its failures are self detected or safe compared to all failures possible).
The calculations for these are described below.

\paragraph{Diagnostic Coverage.}
The diagnostic coverage is simply the ratio of the dangerous detected probabilities against the probability of all dangerous failures, and is normally expressed as a percentage.
$\Sigma\lambda_{DD}$ represents the summed failure probability of the dangerous detected base component failure modes, and $\Sigma\lambda_D$ the summed failure probability of all dangerous base component failure modes.

$$ DiagnosticCoverage = \Sigma\lambda_{DD} / \Sigma\lambda_D $$

The diagnostic coverage for safe failures, where $\Sigma\lambda_{SD}$ represents the summed failure probability of the safe detected base component failure modes, and $\Sigma\lambda_S$ that of all safe base component failure modes, is given as

$$ SF = \frac{\Sigma\lambda_{SD}}{\Sigma\lambda_S} $$

\paragraph{Safe Failure Fraction.}
A key concept in FMEDA is Safe Failure Fraction (SFF).
This is the ratio of safe and dangerous detected failures against all safe and dangerous failure probabilities.
Again this is usually expressed as a percentage.

$$ SFF = \big( \Sigma\lambda_S + \Sigma\lambda_{DD} \big) / \big( \Sigma\lambda_S + \Sigma\lambda_D \big) $$

%This is the ratio of
%Step 4 Calculate SFF, SIL and PFD
%The SIL level of the product is finally determined from the Safe Failure Fraction (SFF) and the Probability of Failure on Demand (PFD). The following formulas are used.
%SFF = (lSD + lSU + lDD) / (lSD + lSU + lDD + lDU)
%PFD = (lDU)(Proof Test Interval)/2 + (lDD)(Down Time or Repair Time)
% Often a given component failure mode there will be a $\beta$ value, the
% probability that the component failure mode will cause a given SYSTEM failure.
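To illustrate the diagnostic coverage and SFF ratios given above, the following is a small worked sketch. The four summed values are hypothetical (chosen only to make the arithmetic easy, and expressed in FIT): $\Sigma\lambda_{SD} = 30$, $\Sigma\lambda_{SU} = 20$, $\Sigma\lambda_{DD} = 45$ and $\Sigma\lambda_{DU} = 5$.

```latex
% Totals for the safe and dangerous classes (hypothetical values, in FIT)
$$ \Sigma\lambda_S = \Sigma\lambda_{SD} + \Sigma\lambda_{SU} = 30 + 20 = 50 , \qquad
   \Sigma\lambda_D = \Sigma\lambda_{DD} + \Sigma\lambda_{DU} = 45 + 5 = 50 $$
% Diagnostic coverage of dangerous failures
$$ DiagnosticCoverage = \frac{\Sigma\lambda_{DD}}{\Sigma\lambda_D} = \frac{45}{50} = 90\% $$
% Diagnostic coverage of safe failures
$$ SF = \frac{\Sigma\lambda_{SD}}{\Sigma\lambda_S} = \frac{30}{50} = 60\% $$
% Safe failure fraction
$$ SFF = \frac{\Sigma\lambda_S + \Sigma\lambda_{DD}}{\Sigma\lambda_S + \Sigma\lambda_D} = \frac{50 + 45}{50 + 50} = 95\% $$
```

In this sketch only the dangerous undetected failures ($\Sigma\lambda_{DU}$) count against the SFF, which is why it is higher than either coverage figure.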
%\paragraph{Risk Mitigation}
%
%The component may be have its risk factor
%reduced by the checking interval (or $\tau$ time between self checking procedures).
%
%Ultimately this technique calculates a risk factor for each component.
%The risk factors of all the components are summed and
%%give a value for the `safety level' for the equipment in a given environment.

\paragraph{Classification into Safety Integrity Levels (SIL).}
There are four SIL levels, from 1 to 4, with 4 being the highest safety level.
In addition to probabilistic risk factors, the diagnostic coverage and SFF have threshold bands becoming stricter for each level.
Demanded software verification and specification techniques and constraints (such as language subsets, s/w redundancy etc.) become stricter for each SIL level.
%%
%% Andrew asked me to expand on this here, but it would take at least two
%% pages. I think its more appropriate for the survey.tex chapter.
%%

Thus FMEDA uses statistical methods to determine a safety level (SIL), typically used to meet an acceptable risk value, specified for the environment the SYSTEM must work in.
EN61508 defines, in general terms, risk assessment and required SIL levels \cite{en61508}[5 Annex A].
%the probability of
%failures occurring, and provide an adaquate risk level.
%
%A component failure mode, given its MTTF
%the probability of detecting the fault and its safety relevant validation time $\tau$,
%contributes a simple risk factor that is summed
%in to give a final risk result.
%
Thus an FMEDA model can be implemented on a spreadsheet, where each component has a calculated risk, a fault detection time (if any), an estimated risk importance and other factors such as de-rating and environmental stress.
With one component failure mode per row, all the statistical factors for SIL rating can be produced\footnote{A SIL rating will apply to an installed plant, i.e. a complete installed and working SYSTEM.
SIL ratings for individual components or sub-systems are meaningless, and the nearest equivalent would be the FIT/PFD and SFF and diagnostic coverage figures.}.

\subsubsection{FMEDA and failure outcome prediction accuracy.}
FMEDA suffers from the same lack of component failure mode outcome prediction accuracy as FMEA in section \ref{pfmea}.
%
This is because the analyst has to decide how particular components failing will impact on the SYSTEM or top level.
This involves a `leap of faith'.
For instance, a resistor failing in a sensor circuit may be part of a critical monitoring function.
The analyst is now put in a position where they probably should assign a dangerous failure classification to it.
%
There is no analysis of how that resistor would/could affect the components close to it, but because the circuitry is part of a critical section it will most likely be linked to a dangerous system level failure in an FMEDA study.
%
%%- IS THIS TRUE IS THERE A BETA FACTOR IN FMEDA????
%%-
%A $\beta$ factor, the heuristically defined probability
%of the failure causing the system fault may be applied.
%
%In FMEDA there is no detailed analysis of the failure mode behaviour
%of the component in its local environment
%Component failure modes are traceable directly to the SYSTEM level.
%it becomes more
%guess work than science.
%
With FMEDA, there is no rigorous cause and effect analysis for the failure modes and how they interact on the micro scale (the components adjacent to them in terms of functionality).
Unintended side effects that lead to failure can be missed.
Also, component failure modes that are not dangerous may be wrongly assigned as dangerous simply because they exist in a critical section of the product.
% some critical component failure
%modes, but we can only guess, in most cases what the safety case outcome
%will be if it occurs.
This leads to the practice of having components within a SYSTEM partitioned into different safety level zones, as recommended in EN61508~\cite{en61508}.
This is a vague way of determining safety, as it can miss effects due to unexpected component interaction.

The Statistical Analysis methodology is the core philosophy of the Safety Integrity Levels (SIL) embodied in EN61508 \cite{en61508} and its international analog standard IEC61508.

\subsubsection{ FMEDA weaknesses }
\begin{itemize}
\item Possibility to miss the effects of failure modes at SYSTEM level.
\item Statistical nature allows a proportion of undetected failures for a given SIL level.
\item Allows a small proportion of `undetectable' error conditions.
\item No possibility to model base component level double failure modes.
\end{itemize}

%AND then how we can solve all there problems

\subsection{Deterministic FMEA}
EN298 requires that no two individual component failures may give rise to a dangerous condition.

\section{FMEDA: Failure Modes, Effects and Diagnostic Analysis}
This is the main basis of SIL certification for Programmed Electronic Equipment.
It applies FMEA, with classification of criticality of components, adjustment to MTTF values by self checking mechanisms in the product, and mitigation for a safe failure fraction.
This leads to a probabilistic mean time to failure or probability of failure on demand that will fall within the criteria for a given SIL safety level.
An overview of this method can be found in an EXIDA paper \cite{fmeda} and a detailed description of the method for SIL certification in part 2 of EN61508 \cite{en61508}.

disadvantage: a single component failure is used to determine its effect on the entire system.
This leads to classifying components as safety or non-safety critical at an early stage in the analysis.
This means that complex interactions or side effects of the components failing may not be taken into account.
advantage: concepts of self checking systems, and safe failure fraction\footnote{Safe Failure Fraction (SFF) is the proportion of failures that are either safe, or dangerous but detected, compared to all failures.
The thinking here is that if components are detected as failing even though they are not safety critical, the system is self checking a greater proportion of its own systems, and is therefore safer.
This is applying Bayes theorem for probabilistic error detection.}

This is a probabilistic based methodology.

\subsection{Safe Failure Fraction}
Introduce the idea of coverage.
A good example is RAM in a microprocessor/microcontroller: we cannot give 100\% coverage to it.
We can perform some tests that give us 60\% coverage etc.

\subsection{Diagnostic interval}
Reducing FIT with detecting a fraction of the faults within an interval.
Give formulas etc.

\subsection{Redundancy - Models}
1oo1 2oo3 etc.

\subsection{Field Data}
OK for EN61508, not OK for nuclear industry; find refs.

\subsection{Bayes Theorem in Relation to Failure Modes}

\paragraph{Conditional Probability}
Bayes theorem describes the probability of causes.
In the context of failure modes in components we are interested in how they may affect a SYSTEM.
The SYSTEM failure modes can be seen as symptoms of the failure modes of base components.

For example, let $B$ be a base component failure mode and let $S$ be a system level failure mode.
The conditional probability of $S$ given $B$ is denoted as
\begin{equation}
\label{eqn:condprob}
P(S|B) = \frac{P(S \cap B)}{P(B)}
\end{equation}
%\paragraph{Multiple Events and conditional Probability}
%
%add copy, describe probabilities for multiple events.....
%Or in other words we can say that the probability of $B$ and $S$ occurring
%divided by the probability of $S$ occurring due to any cause, is the probability
%the $B$ caused $S$. We can call this the {\em conditional probability} of $S$ given $B$.
Re-arranging equation \ref{eqn:condprob}
$$ P(B) P(S|B) = P(S \cap B) $$
The inverse condition, $B$ given $S$, is
$$ P(S) P(B|S) = P(S \cap B) $$
As both products are equal to the probability of the same intersection, we can state,
\begin{equation}
\label{eqn:bayes0}
P(S) P(B|S) = P(S \cap B) = P(B) P(S|B).
\end{equation}
We can now re-arrange the equation~\cite{probstat} to remove the intersection $P(S \cap B)$ term thus
\begin{equation}
\label{eqn:bayes1}
P(S|B) = \frac{P(S) P(B|S)} {P(B)} .
\end{equation}
This equation gives us the probability of the event $S$ occurring, given that event $B$ has occurred.
In the context of failure mode analysis, the event $B$ would be the occurrence of a component failure mode, and $S$ would be a system level error.

We can redefine $P(S)$ using equation \ref{eqn:bayes0}.
Where the base component failure modes $B_1 \ldots B_N$ partition the sample space,
$$ S = \bigcup_{i=1}^{N} ( S \cap B_i ) $$
now to find the probabilities we can express this as
$$ P(S) = P \big( \bigcup_{i=1}^{N} S \cap B_i \big) = \sum_{i=1}^{N} P(S|B_i) P(B_i) $$
and
$$ P(S) = P \big( \bigcup_{i=1}^{N} S \cap B_i \big) = \sum_{i=1}^{N} P(B_i|S) P(S) $$
We can express Bayes theorem, for the probability that a particular $B_n$ was the cause given that $S$ has occurred, thus
\begin{equation}
\label{eqn:bayes2}
P(B_n|S) = \frac{P(B_n) P(S|B_n)} { \sum_{i=1}^{N} P(S|B_i) P(B_i) } .
\end{equation}
%
%Equation \ref{eqn:bayes1} means, given the event $B$ what is the probability it was caused by $S$.
%Because we are interested in what base component failure modes could have caused $S$
%we need to re-arrange this
%\begin{equation}
%\label{eqn:bayes2}
% P(B|S) = \frac{P(B) P(S|B)} {P(S)} .
%\end{equation}
%
%Equation \ref{eqn:bayes2} can be read as given the system failure mode $S$

Typically a system level failure will have a number of possible causes, or base component failure modes.
For probability we are interested in these failure modes occurring, or rather the event of the failure modes becoming active.
We can represent the base component failure mode events as a partitioned set~\cite{nucfta}[fig VI-7], and overlay a given system failure mode on it.
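As a hedged numerical sketch of the partitioned-set view, assume a hypothetical system with only three mutually exclusive base component failure modes $B_1$, $B_2$, $B_3$ and one system failure mode $S$. All values below are invented for illustration: given that some failure mode has become active, $P(B_1) = 0.5$, $P(B_2) = 0.3$, $P(B_3) = 0.2$, with $P(S|B_1) = 0.1$, $P(S|B_2) = 0.2$ and $P(S|B_3) = 0$. The law of total probability and the standard form of Bayes theorem then give:

```latex
% Total probability of S over the partition B_1, B_2, B_3 (hypothetical values)
$$ P(S) = \sum_{i=1}^{3} P(S|B_i) P(B_i) = (0.1)(0.5) + (0.2)(0.3) + (0)(0.2) = 0.11 $$
% Probability that each base component failure mode was the cause, given S
$$ P(B_1|S) = \frac{P(S|B_1) P(B_1)}{P(S)} = \frac{0.05}{0.11} \approx 0.455 $$
$$ P(B_2|S) = \frac{P(S|B_2) P(B_2)}{P(S)} = \frac{0.06}{0.11} \approx 0.545 $$
$$ P(B_3|S) = \frac{P(S|B_3) P(B_3)}{P(S)} = \frac{0}{0.11} = 0 $$
```

The three results sum to one, as expected when the $B_i$ partition the sample space, and a failure mode that does not overlap $S$ contributes nothing.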
\paragraph{Bayes Theorem}
Consider a SYSTEM error that has several potential base component causes.
Because a SYSTEM typically has a number of high level errors, let us consider a specific one and label it $S_k$.
We can call $P(S_k)$ the prior probability of the SYSTEM error.
That is to say, the probability of $S_k$ occurring with no information about possible causes for it.
Consider a number of possible base component `potential cause' events as $B_n$ where $n$ is an index.
Our sample space $SS$, for investigating the system failure mode/symptom $S_k$, is thus $ SS = \{B_1 \ldots B_n\} $.

We can apply Bayes theorem to determine the statistical likelihood that a given failure mode $B_n$ will cause the system level error $S_k$ using equation \ref{eqn:bayes1}.

\begin{figure}[h]
\centering
\includegraphics[width=350pt,keepaspectratio=true]{./survey/partition.jpg}
% partition.jpg: 510x264 pixel, 72dpi, 17.99x9.31 cm, bb=0 0 510 264
\caption{Base Component Failure Modes represented as partitioned sets}
\label{fig:partitionbcfm}
\end{figure}

Figure \ref{fig:partitionbcfm} represents a small theoretical system with nine base component failure mode events.

\begin{figure}[h]
\centering
\includegraphics[width=350pt,keepaspectratio=true]{./survey/partition2.jpg}
% partition.jpg: 510x264 pixel, 72dpi, 17.99x9.31 cm, bb=0 0 510 264
\caption{Base Component Failure Modes with Overlaid System Error}
\label{fig:partitionbcfm2}
\end{figure}

Some base component failure modes may not be able to cause given system failures.
Figure \ref{fig:partitionbcfm2} represents the case where we are looking at a particular system level failure $S_k$.
Looking at the diagram we can see that this system failure could be, but is not necessarily, caused by base component failure modes $B_1, B_2 \; or \; B_4$.
Should any other base component failure mode (causation event) occur, according to the diagram it will not be able to cause the system failure $S_k$.

%IN ENGLEEEESH Inverse causality.....
%Prob $B_n$ caused $S_k$ is the prob $S_k$ caused by $B_n$ divided by prob of $B_n$
%%% \begin{equation}
%%% P(S_k|B_n) = \frac{P(S_k) \; P(B_n | S_k) }{P(B_n)}
%%% %alternate form of no use to MEEEEEE
%%% %P(B_n|S_k) = \frac{P(B_n) \; P(S_k | B_n) }{P(S_k)}
%%% \end{equation}

For example, were we to have a component that has a failure mode $B_n$ with a failure probability of $10^{-7}$ per hour, and its associated system failure mode $S_k$ has a failure probability of $5 \times 10^{-8}$ per hour, and given that when the system error $S_k$ occurs there is a 10\% probability that $B_n$ had occurred (i.e. $P(B_n | S_k) = 0.1$), we can determine the probability that $S_k$ is caused by $B_n$ thus
$$ P(S_k|B_n) = \frac{5 \times 10^{-8} \times 0.1 }{ 10^{-7}} = 0.05 = 5\% $$

Some base component failure modes may not be able to cause given system failures.
For instance, in the diagram (figure \ref{fig:partitionbcfm2}) events $B_5 \ldots B_9$ cannot cause event $S_k$.
Taking as an example $B_9$, which does not overlap with $S_k$ (i.e. $B_9 \cap S_k = \emptyset $), we can see that $P(B_9 | S_k) = 0$.
Bayes theorem applied to $B_9$ becomes
$$P(S_k|B_9) = \frac{P(S_k) \times 0 }{ P(B_9) } = 0 = 0\%$$
As $ P(B_n | S_k)$ is a factor in the numerator, the application of Bayes theorem to $B_9$ being a cause for $S_k$ has a probability of zero, as we would expect.

%%%%
%% BAYES
Because we are interested in finding the probability of $S_k$ for all base component failure modes, it is helpful to re-define $P(S_k)$.
In terms of set intersection, we can express $S_k$ as
$$ S_k = \bigcup_{i=1}^{N} ( S_k \cap B_i ) .$$
Now to find the probabilities we can express this as
$$ P(S_k) = P \big( \bigcup_{i=1}^{N} S_k \cap B_i \big) = \sum_{i=1}^{N} P(S_k|B_i) P(B_i) $$
and
$$ P(S_k) = P \big( \bigcup_{i=1}^{N} S_k \cap B_i \big) = \sum_{i=1}^{N} P(B_i|S_k) P(S_k) $$
We can express Bayes theorem, for the probability that $B_n$ was the cause given that $S_k$ has occurred, thus
\begin{equation}
\label{eqn:bayes2}
P(B_n|S_k) = \frac{P(B_n) P(S_k|B_n)} {\sum_{i=1}^{N} P(S_k|B_i) P(B_i)} .
\end{equation}
%
% here derive the trad version of bayes with the summation as the denominator
%
RESTRICTIONS: Because this uses conditional probability for multiple independent events, complications such as operational states or environmental conditions cannot be represented by the Bayesian model.
% consider 747 engines and a volcanic ash cloud....

\paragraph{Mutually independent events and base component failure statistics}
FMEA, FTA, FMECA and to a great extent FMEDA apply Bayesian concepts to individual base~component failure rates, rather than using base~component failure modes, for the events under investigation.
This means a lack of precision in interpreting the base failure modes as statistically independent events.
Typically, a base component may fail in more than one way, and usually once it has it stays in that failure mode.
This violates the principle of the events being statistically independent.

Show, using area proportional Euler Diagrams, the failure modes and their possible system level failure outcomes.

Discuss unused sections of hardware in a product.

Discuss protection devices like VDRs and capacitors for smoothing.

Discuss microprocessor watchdog and CRC ROM schemes.

Discuss hardware failsafes (good example: over pressure safety valves).

Keep relating these back to Bayes theorem.

typeset in {\Huge \LaTeX} \today