From 53352f41d87e0db41a23770a5da443a9345b982e Mon Sep 17 00:00:00 2001
From: Robin Clark
Date: Thu, 4 Nov 2010 08:05:52 +0000
Subject: [PATCH] morning edit

---
 fmmd_concept/fmmd_concept.tex | 104 ++++++++++++++++++++--------------
 1 file changed, 60 insertions(+), 44 deletions(-)

diff --git a/fmmd_concept/fmmd_concept.tex b/fmmd_concept/fmmd_concept.tex
index 50fdbea..b57eb89 100644
--- a/fmmd_concept/fmmd_concept.tex
+++ b/fmmd_concept/fmmd_concept.tex
@@ -7,14 +7,14 @@
 \abstract{
 This paper proposes a methodology for creating failure mode models of safety critical systems, which
-has a common notation
+have a common notation
 for mechanical, electronic and software domains and apply an incremental and rigorous approach.
 %% What I have done %%
 The Four main static failure mode analysis methodologies were examined and
-in in the context of newer European safety standards assessed.
+in the context of newer European safety standards assessed.
 Some of the defeciencies in these methodologies lead to a wish list for a more ideal methodology.
@@ -27,7 +27,7 @@ methodology is developed. The has been named Failure Mode Modular De-Composition
 %% Sell it %%
 In addition to addressing the traditional weaknesses of
-Fault Tree Analysis (FTA), Fault Mode Effects Analysis (FMEA), Faliue Mode Effects Criticallity Analysis (FMECA)
+Fault Tree Analysis (FTA), Fault Mode Effects Analysis (FMEA), Failure Mode Effects Criticallity Analysis (FMECA)
 and Failure Mode Effects and Diagnostic Analysis (FMEDA), FMMD provides the means to model multiple failure mode scenarios as specified in newer European Safety Standards \cite{en298}.
 The proposed methodology is bottom-up and
@@ -145,14 +145,14 @@ are held in a computer program, we can determine if the model is complete
 \subsection{General Comments on bottom-up and top down approaches}
-\paragraph{A general Problem with top-down}
+\paragraph{A general defeciency in top-down systems analysis}
 With a top down approach the investigator has to determine
-a set of undesireable outcomes or accidents.
+a set of undesirable outcomes or accidents.
 As most accidents are unexpected and the causes unforseen \cite{safeware} it is fair to say that a top down approach is not guaranteed to predict all possible undesirable outcomes. It also can miss known component failure modes, by
-simple not de-composing down to that level of detail.
+simply not de-composing down to that level of detail.
 \paragraph{A general problem with bottom-up}
 With the bottom up techniques we have all the known component failure modes
@@ -165,22 +165,22 @@ we cannot consider them all and human judgement is used to decide which interactions are important.
 Let N be the number of components in our system, and K be the average number of component failure modes
-(ways in which the component can fail). The total number of base component failure modes
-is $N \times K$. To even examine the affect that one failure mode has on all the other components
+(ways in which the component can fail). The total number of base comp failure modes
+is $N \times K$. To examine the affect that one failure mode has on all the other components
 will be $(N-1) \times N \times K$, in effect a set cross product. Complicate this further with applied states or environmental conditions and another order of cross product of complexity is added.
-We may have a peice of self checking circuity for instance that
+We may have a piece of self checking circuitry for instance that
 has two states, normal and testing mode commanded by a logic line.
 Or we may have a mechanical device that has a different
-failure mode behaviour for say, differnet ambient pressures or temperatures.
+failure mode behaviour for say, different ambient pressures or temperatures.
 If $E$ is the number of applied states or environmental conditions to consider in a system, the job of the bottom-up analyst is complicated by a cross product factor again $(N-1) \times N \times K \times E$.
-If we put some typical very small embedded system numbersi\footnote{these figures would
+If we put some typical very small embedded system numbers\footnote{these figures would
 be typical of a very simple temperature controller, with a micro-controller sensor and heater circuit} into this, say $N=100$, $K=2.5$ and $E=10$ we have $99 \times 100 \times 2.5 \times 10 = 247500 $. To look in detail at a quarter of a million test cases is obviously impractical.
@@ -188,7 +188,7 @@ To look in detail at a quarter of a million test cases is obviously impractical.
 If we were to consider multiple simultaneous failure modes we have yet another complication cross product.
-For instance for looking at double simultaneous failure modes
+For instance for looking at double simultaneous failure modes,
 the equation reads $(N-2) \times (N-1) \times N \times K \times E$. The bottom-up methodologies FMEA, FMECA and FMEDA take single failure modes and link them
@@ -198,8 +198,8 @@ component failure mode to the SYSTEM level.
-\paragraph{Ideal Static failure mode methodology}
-An ideal Static failure mode methodology would build a failure mode model
+\paragraph{Ideal static failure mode methodology}
+An ideal static failure mode methodology would build a failure mode model
 from which the traditional four models could be derived. It would address the short-comings in the other methodologies, and would have a user friendly interface, with a visual (rather than mathematical/formal) syntax with icons
@@ -217,7 +217,7 @@ of missing component failure modes \cite{faa}[Ch.9].
 %, or modelling at
 %a too high level of failure mode abstraction.
 FTA was invented for use on the minuteman nuclear defence missile
-systems in the early 1960's and was not designed as a rigorous
+systems in the early 1960s and was not designed as a rigorous
 fault/failure mode methodology. It is more like a structure to be applied when discussing the safety of a system, with a top down hierarchical notation, that guides the analysis. This methodology was designed for
@@ -244,7 +244,7 @@ The investigation will typically point to a particular failure of a component.
 The methodology is now applied to find the significance of the failure. Its is based on a simple equation where $S$ ranks the severity (or cost \cite{fmea}) of the identified SYSTEM failure,
-$O$ its occurrance, and $D$ giving the failures detectability. Muliplying these
+$O$ its occurance, and $D$ giving the failures detectability. Muliplying these
 together, gives a risk probability number (RPN), given by $RPN = S \times O \times D$. This gives in effect
@@ -286,7 +286,7 @@ The results, as with FMEA are an $RPN$ number determining the significance of th
 %%-WIKI- while various forms of FMEA predominate in other industries.
-\subsubsection{ FMEA weaknesses }
+\subsubsection{ FMECA weaknesses }
 \begin{itemize}
 \item Possibility to miss the effects of failure modes at SYSTEM level.
 \item Possibility to miss environmental affects.
@@ -314,7 +314,7 @@ The component may be mitigated by a vatriety of factors
 \item Coverage of self checking
 \end{itemize}
-Ultimately this tequnique calculates a risk factor for each component.
+Ultimately this technique calculates a risk factor for each component.
 The risk factors of all the components are summed and give a value for the `safety level' for the equipment in a given environment.
@@ -327,7 +327,7 @@ give a value for the `safety level' for the equipment in a given environment.
 %%-• The design strength (de-rating, safety factors) and
 %%-• The operational profile (environmental stress factors).
-This uses MTFF and other statisical models to determine the probability of
+This uses MTFF and other statistical models to determine the probability of
 failures occurring.
 %
 A component failure mode, given its MTTF
@@ -342,21 +342,29 @@ and other factors such as de-rating and environmental stress.
 This can be calculated, with one component failure mode per row, on a spreadsheet and these are all summed to give the final assessment figure.
-\paragraph{Two statistical perspectives}
-The Statistical Analysis method is used from two perspectives,
+\subsubsection{Two statistical perspectives}
+he Statistical Analysis method is used from two perspectives,
 Probability of Failure on Demand (PFD), and Probability of Failure
-in continuous Operation, Failure in Time (FIT) and measured in failures per billion ($10^9$) hours of operation.
+in continuous Operation, Failure in Time (FIT).
+\paragraph{Failure in Time (FIT)}.
+
+Continuous operation is measured in failures per billion ($10^9$) hours of operation.
+For a continuously running nuclear powerstation
+we would be interested in its operational FIT values.
+
+\paragraph{Probability of Failure on Demand (PFD)}.
 For instance with the anti-lock system on a automobile braking system, we would be interested in PFD.
-For a continuously running nuclear powerstation
-we would be interested in its 24/7 operation FIT values.
+That is to say the ratio of it failing
+to succeeding on demand.
+\subsubsection{FMEDA and determinability prediction accuracy}.
 This suffers from the same problems of lack of determinability prediction accuracy, as FMEA above.
 %
 We have to decide how particular components failing will impact on the SYSTEM or top level. This involves a `leap of faith'. For instance, a resistor failing in a sensor circuit
-may be part of a critical monitioring function.
+may be part of a critical monitoring function.
 The analyst is now put in a position where he must assign a critical failure possibility to it.
 %
@@ -365,10 +373,10 @@ of how that resistor would/could affect that circuit, but because the circuitry
 it is part of critical section it will be linked to a critical system level fault.
 %
 A $\beta$ factor, the hueristically defined probability
-of the failure causign the system fault may be applied.
+of the failure causing the system fault may be applied.
 %
 But because there is no detailed analysis of the failure mode behaviour
-of the component, traceable to the SYSTEM level, it becomnes more
+of the component, traceable to the SYSTEM level, it becomes more
 guess work than science. With FMEDA, there is no rigorous cause and effect analysis for the failure modes. Unintended side effects that lead to failure can be missed.
@@ -405,6 +413,10 @@ for its results.
 \item It should be capable of producing reliability and danger evaluation statistics.
 \item It should be easy to use, Ideally using a graphical syntax (as oppossed to a formal mathematical one).
 \item From the top down, the failure mode model should follow a logical de-composition of the functionality
+for its results.
+\item It should be capable of producing reliability and danger evaluation statistics.
+\item It should be easy to use, ideally using a graphical syntax (as oppossed to a formal mathematical one).
+\item From the top down, the failure mode model should follow a logical de-composition of the functionality
 to smaller and smaller functional modules \cite{maikowski}.
 \item Multiple failure modes may be modelled from the base component level up.
 \end{itemize}
@@ -412,7 +424,7 @@ to smaller and smaller functional modules \cite{maikowski}.
 \section{Design of a new static failure mode based methodology}
-\paragraph{New methodology Must be bottom-up}
+\paragraph{New methodology must be bottom-up}
 In order to ensure that all component failure modes have been covered the methodology will have to work from the bottom-up and start with the component failure modes.
@@ -422,7 +434,7 @@ The traditional fault finding, or natural fault finding is to work from the top down.
 %
 On encountering a
-fault, the symptom is first know at the top or
+fault, the symptom is first observed at the top or
 SYSTEM level. By de-composing the functionality of the faulty system and testing we can further de-compose the system until we find the faulty base level component.
@@ -432,10 +444,10 @@ Simpler and simpler functional blocks are discovered as we delve further into the way the system works and is built.
 \paragraph{Design Decision: Methodology must be bottom-up.}
-In oder to ensure that all component failure modes are handled,
+In order to ensure that all component failure modes are handled,
 this methodology must start at the bottom, with base component failure modes. In this way automated checking can be applied to all component failure modes
-to ensure none have been inadvertantly excluded from the process.
+to ensure none have been inadvertently excluded from the process.
 \paragraph{Need for a `bottom-up' system de-composition}
 There is an apparent conflict here. The natural way to
@@ -450,7 +462,7 @@ and then taking those to form higher level
 The philosophy of top down de-compositon is very similar. Top down de-compositon applies functional de-composition, because it seeks to break the system down
-into manageable and separatetly testable entities.
+into manageable and separately testable entities.
 A second justification for this is that the design process for a product requires both top down and bottom-up thinking.
@@ -463,17 +475,21 @@ The base components will typically have several failure modes each.
 Given a typical embedded system may have hundreds of components
 This means that we have to tie base component failure modes to SYSTEM level errors. This is the `possibility to miss failure mode effects
-at SYSTEM level' critism of the FTA, FMEDA and FMECA methodologies.
+at SYSTEM level' criticism of the FTA, FMEDA and FMECA methodologies.
 \paragraph{Design Decision: Methodolgy must reduce and collate errors at each functional group stage.}
-SYSTEMS typically have far fewer failure modes then the sum of their component failure modes.
+SYSTEMS typically have far fewer failure modes than the sum of their component failure modes.
 SYSTEM level failures may be caused by a variety of component failure modes. A SYSTEM level failure mode is an abstracted failure mode, in that it is a symptom of some lower level failure or failures.
 % ABSTRACTION
 For instance a failed resistor in a sensor at a base component level is a specific
-failure mode. For example it could be called `RESISTOR 1 OPEN'.
-Its symptom in a functional group comprising the sensor channel that reads from it may be more abstract.
+failure mode.
+%
+For example it could be called `RESISTOR 1 OPEN'.
+Its symptom in a functional group comprising the sensor channel that reads from it may be more abstract
+or in other words describe the effect more generally.
+%
 We might call it `READING~HIGH' perhaps. At a higher level still this may be called `SENSOR CHANNEL 1' fault. At a system level it may simply be a `SENSOR FAILURE'.
@@ -489,7 +505,7 @@ of failure modes as the abstraction level reaches the SYSTEM level.
 The next problem is how to we build a failure mode model that converges to a finite set of SYSTEM level failure modes.
 %
-It would be better would be to analyse the failure mode behaviour of each
+It would be better to analyse the failure mode behaviour of each
 functional group, and determine the ways in which it, rather than its components, can fail.
 %
@@ -506,7 +522,7 @@ The number of symptoms of failure should be equal to or less than the number of component failure modes, simply because often there are several potential causes of failure symptoms.
 %
-When we have the the symptoms, we can start thinking of the {\fg} as a component in its own right.
+When we have the symptoms, we can start thinking of the {\fg} as a component in its own right.
 %with a simplified and reduced set of failure symptoms.
 %
 We can now create a new {\dc}, where its failure modes
@@ -538,7 +554,7 @@ are the same failure w.r.t. the {\fg}.
 %
 We can now treat the {\fg} as a component, and call it a {\dc}, in other words, a sub-system with a known set of failure modes.
 %
-We can now create a new{\dc} and assign it these common symptoms
+We can now create a new {\dc} and assign it these common symptoms
 as its failure modes.
 %
 This {\dc} can be used to build higher level
@@ -548,7 +564,7 @@ an entire system.
 It can be considered complete when all failure modes from all components are handled and connectable to a SYSTEM level failure mode.
-\paragraph{Directed Acyclic Graph}. This will naturally form a DAG
+\paragraph{Directed Acyclic Graph.} This will naturally form a DAG
 meaning that for all SYSTEM failure modes, we will be able to trace back through the DAG to possible component failure mode causes. If statistical models exist for the component failure modes
@@ -577,7 +593,7 @@ We can then treat the {\fg} as a `black box' or component in its own right.
 We can now look at how the {\fg} can fail.
 %
 Many of the component failure modes will
-cause the same failure symptoms in the {fg} failure behaviour.
+cause the same failure symptoms in the {\fg} failure behaviour.
 We can collect these failures as common symptoms.
 %
 When we have our set of symptoms, we can now create
@@ -605,8 +621,8 @@ This ensures that all component failure modes are handled.
 \subsubsection{ It should be easy to integrate mechanical, electronic and software models.}
-Because component failure modes are considered, we have a generic enitity to model.
-We can describe a mecanical, electrical or software component in terms of its failure modes.
+Because component failure modes are considered, we have a generic entity to model.
+We can describe a mechanical, electrical or software component in terms of its failure modes.
 %
 Because of this we can model and analyse integrated electro mechanical systems, controlled by computers,
@@ -670,7 +686,7 @@ chosing {\fg}s and working bottom-up this hierarchical trait will occur as a nat
 \section{Conclusion}
-This paper provides the backgroud for the need for a new methodology for
+This paper provides the background for the need for a new methodology for
 static analysis that can span the mechanical electrical and software domains using a common notation.
 \vspace{60pt}
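
The case-count arithmetic in the bottom-up paragraph of this patch, $(N-1) \times N \times K \times E$ for single failure modes and $(N-2) \times (N-1) \times N \times K \times E$ for double simultaneous failure modes, can be checked with a minimal sketch. The function name `analysis_cases` and its parameter names are illustrative assumptions and do not come from the paper; the input figures $N=100$, $K=2.5$ and $E=10$ are the ones quoted in the text.

```python
from math import prod


def analysis_cases(n, k, e, simultaneous=1):
    """Count the cases a bottom-up analyst would face (hypothetical helper).

    n: number of components (N in the text)
    k: average number of failure modes per component (K)
    e: number of applied states or environmental conditions (E)
    simultaneous: how many failure modes are considered at once
    """
    # Interaction factor N * (N-1) * ... * (N - simultaneous), which gives
    # (N-1)*N for single failures and (N-2)*(N-1)*N for double failures.
    interaction = prod(n - i for i in range(simultaneous + 1))
    return interaction * k * e


single = analysis_cases(n=100, k=2.5, e=10)
double = analysis_cases(n=100, k=2.5, e=10, simultaneous=2)
print(single)  # 247500.0
print(double)  # 24255000.0
```

With the quoted figures the single failure count reproduces the quarter of a million cases given in the text, and the double failure formula gives roughly 24 million, which illustrates why the text calls exhaustive bottom-up checking impractical.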