% Robin_PHD/fmmdset/fmmdset.tex, 2010-02-19 15:01:40 +00:00


% $Id: fmmdset.tex,v 1.7 2009/06/06 11:52:09 robin Exp $
%
\newcommand{\Fam}{{\mathbb F}}
\newcommand{\Pow}{{\mathbb P}}
\newcommand{\Dis}{{\vee}}
\newcommand{\Con}{{\wedge}}
\newcommand{\FMEA}{{\bowtie}}
%
\newcommand{\Nat}{{\mathbb N}}
\newcommand{\Real}{{\mathbb R}}
\newcommand{\Complex} {{\mathbb C}}
\newcommand{\Rational} {{\mathbb Q}}
%
%\newtheorem{theorem}{Thoeorem}
%
% \def\lastname{Clark}
% \begin{document}
% \begin{frontmatter}
% \title{ Failure Mode Modular De-Composition } \author{Robin Clark\thanksref{ALL}\thanksref{r.clark@energytechnologycontrol.com}}
% \address{ Energy Technology Control\\
% 25 North Street, Lewes, BN7 2PE, Great Britain}
\begin{abstract}
This chapter describes a process for analysing safety critical systems, in order to formally prove how safe the
designs and built-in safety measures are. It provides
a rigorous method for creating a fault effects model of a system from the bottom up, using part level fault modes.
From the model, fault trees,
modular re-usable sections of safety critical systems,
and accurate statistical estimates of fault frequency can be derived automatically.
It provides the means to trace the causes of dangerous detected and dangerous undetected faults.
It is intended to be used to formally prove that systems meet EN and UL standards, including but not limited to
EN298, EN61508, EN12067, EN230 and UL1998.
\end{abstract}
% \begin{keyword}
% fault~tree fault~mode EN298 EN61508 EN12067 EN230 UL1998 safety~critical
% \end{keyword}
% \end{frontmatter}
% \bibliographystyle{unsrt}
\section{Introduction}
%This paper describes the Failure Mode Modular de-Composition (FMMD) method.
% described here, models a safety critical system from the bottom up.
The purpose of the FMMD methodology is to apply formal techniques to
the assessment of safety critical designs, aiding in identifying detected and undetected faults\footnote{Undetected faults
are faults which may occur but are not self-detected, or are impossible to detect by the system.}.
Formal methods are just beginning to be specified in some safety standards\footnote{Formal methods
such as the Z notation appear as `highly recommended' techniques in the EN61508 standard, but
currently apply only to software.}. However, some standards are now implying the handling of
simultaneous faults, which complicates the scenario based approvals that are currently used.
% Some safety critical system assemesment criteria
%are statistical, and require a target failure rate per hour of operation be met \cite{EN61508}.
%Specific safety standards may apply criteria such as no single part failure in a system may lead to
%a dangerous fault.
There are two main philosophies in assessing safety critical systems.
One is to specify an acceptable level of dangerous faults per hour of operation\footnote{The probability of failure per hour (PFH)
is commonly derived from failure rates quoted in FIT, i.e.\ failures per $10^{9}$ hours of operation.}.
This is a statistical approach, and is the one taken by the European safety reliability
standard EN61508, commonly referred to as the Safety Integrity Level (SIL)
standard.
The second is to specify
that no single or double part fault can lead to a dangerous fault in the system under consideration.
This entails tracing the effects of all part failure modes
and working out if they can lead to any dangerous faults in the system under consideration.
%For instance, during WWII after operational research teams had analysed data it was determined that
% an aircraft engine that can, through one part failure cause a catastrophic failure is an unacceptable design.\cite{boffin} .
Both of these methods require a complete fault analysis tree.%\cite{FMEA}.
The statistical method
requires additional Mean Time To Failure (MTTF) data for all part failure modes.
The FMMD methodology applies defined stages and processes that
create a modular fault mode hierarchy. From this,
complete fault analysis trees can be determined. It uses a modular approach, so that repeated sections
of a system design can be modelled once and re-used.
%formally prove safety critical
%hardware designs.
The FMMD method creates a hierarchy from
part~fault~mode level up to system level.
%It does this using
%well defined stages, and processes.
%It allows re-use of analysed modules DOH DOH DOH
%, and to create a framework where
%fault causation trees, and statistical likelihood
%of faults occurring are
When a design has been analysed using this method, fault~trees may be traversed, and statistical likelihoods of failure
and dangerous~faults can be determined from traversing the fault tree down to the MTTFs of individual parts.
%Starting with individual part failure modes, to collections of %parts (modules)
%and then to module level fault modes.
\subsection{Basic Concepts Of FMMD}
\subsubsection { What is a part ? }
A part here means a lowest level component: an entity which can be bought and
which ideally has some statistics for MTTF and known failure modes.
Where the manufacturer's MTTF data is non-existent (or unverified), a guide such as MIL1992\cite{MIL1992} may be used.
Parts for approved safety critical systems under formal observance~\cite{FMproduction}
will be documented in a parts list. The `parts list' is a formal document for
both design and quality assured production.
A parts list will typically include
the manufacturer's part number, guidance for placement in the system,
a functional description, vendor part numbers
and a description. A resistor, capacitor or a microcontroller would be typical of a `part'.
A part will normally have a set of failure~modes, or ways in which it can fail. Thus a set of failure~modes
is associated with each part.
\subsubsection{ Creating a fault hierarchy}
The main idea of the methodology is to build a hierarchy of fault modes from the part
level up to highest system levels.
The first stage is to choose
parts that interact and naturally form {\em functional groups}. {Functional groups} are thus collections of parts.
%These parts all have associated fault modes. A module is a set fault~modes.
From the point of view of fault analysis, we are not interested in the parts themselves, but in the ways in which they can fail.
For this study a Module will mean a collection of fault modes.
For a module formed from a {\em functional group} this will be the collection of all fault modes of its parts.
By analysing the fault behaviour of a `module' with respect to all the fault~modes,
we can derive a new set of possible fault~modes at the module level.
This new set of faults is the set of derived faults from the module level and is thus at a higher level of
fault~mode abstraction. Thus we can say that the module as a whole entity can fail in a number of well defined ways.
In other words we have taken a functional group, and analysed how it can fail according to the failure modes of its parts.
The ways in which the module can fail now become a new set of fault modes, the fault~modes
derived from the module. What this means is the `fault~symptoms' of the module have been derived.
%When we have determined the fault~modes at the module level these can become a set of derived faults.
By taking sets of derived faults (module level faults) we can combine these to form modules
at a higher level of fault abstraction. An entire hierarchy of fault modes can now be built in this way,
to represent the fault behaviour of the entire system. This can be seen as using the modules we have analysed
as parts, parts which may now be combined to create new functional groups,
but as parts at a higher level of fault abstraction.
%Choosing small sections of the system, by choosing a collection of interacting parts we can form
%modules. These parts are collected together to form modules $M$. These modules are then
%analysed with respect to the failure modes of the parts that make them up.
%This leads to a set of fault modes for each module, the derived fault~modes $D$.
%These derived fault modes may now be used at a higher abstraction level.
%A hierarchy can be built, with each level representing a higher level of fault abstraction.
%Each part may fail in a number of ways.
We can take the levels of abstraction and view them as a hierarchy, with part level faults at the bottom.
Each time we analyse a set of fault modes, with respect to how they interact,
we get a new set of faults at a higher level of fault abstraction.
At the lowest level of fault~mode abstraction are the
part failure modes. This is hierarchy abstraction level zero. Functional~groups, being collections of parts, are also at
abstraction level 0. A {\em module} formed by collecting the fault modes of the parts in a functional group is
also at level 0.
%By deriving the fault modes for a particular module.
A set of faults derived from a `module' will be at abstraction level 1.
Thus the module, a collection of part fault modes, is analysed, and the fault symptoms of that module derived.
The act of analysing and deriving new faults raises the abstraction level.
Simple parts may have
a single failure mode, and more complex ones may have hundreds.
% Hazel 11FEB2008 said this was difficult to read
%It can be easily seen that trying to judge the effect of a single part mode failure against
%all other parts failure modes in the system would be a thankless and practically impossible task.
Were we to analyse the effect of each part failure mode against all other parts in the circuit,
we could be facing a very large task.
Modularisation allows detailed (part fault mode level) analysis of well defined functional groups within a system.
The hierarchy allows the combining of these modules to form a meaningful fault representation of the system.
An example of a simple system will illustrate this.
\subsection{Example FMMD Hierarchy}
%%% This is the tikz picture ??/
%
%\begin{figure}[h+]
%\centering
%\input{fmmdset/mvsblock.tex}
%\caption{Block Diagram : Example Milli-Volt Sensor : Block Diagram}
%%\includegraphics[scale=0.20]{ptop.eps}
%\label{fig:mvsblock}
%\end{figure}
%
Consider a simple electronic system that contains, say, two milli-volt amplifiers
and supplies their readings onward via a serial link (RS232). This is simple in concept: plug in a
computer, run a terminal program, and the instrument will report the milli-volt readings in ASCII,
together with any error messages.
% in CRC checksum protected packets.
It is interesting to look at one of the `functional groups'. The milli-volt amplifiers are a good example.
These can be analysed by taking a functional~group: the components surrounding the op-amp,
a few resistors to determine offset and gain,
a safety resistor, and perhaps some smoothing capacitors.
These components form the functional group. The circuit is then analysed for all the fault combinations
of these parts. This produces a large collection of possible fault~modes for the milli-volt amplifier.
The two amplifiers are connected to the ADC, which converts the voltages to binary words for the microprocessor.
The microprocessor then uses the values to determine if the readings are valid, and formats text to send
via the RS232 serial line.
\begin{figure}[h]
%\centering
%\input{millivolt_sensor.tex}
\includegraphics[scale=0.4]{fmmdset/millivolt_sensor.eps}
\caption{Hierarchical Module Diagram : Milli-Volt Sensor Example}
\label{fig:mvs}
\end{figure}
This has a number of obvious functional~groups: the PCB power supply, the milli-volt amplifiers,
the analog to digital conversion circuitry, the microprocessor and the UART (serial link - RS232 transceiver).
It would make sense when analysing this system to take each one of these functional~groups in turn and examine them closely.
It would be sensible if the system could detect the most obvious fault~modes by self testing.
When these have been examined and diagnostic safeguard strategies have been thought up,
we might look at reporting any fault via the RS232 link.
% (if it still works !).
By doing this we have already used a modular approach.
We have analysed each section of the circuitry,
and then using the abstract errors derived from each module,
can fit these into a picture of the
fault~modes of the milli-volt monitor as a whole. However, this type of analysis is not guaranteed
to rigorously take into account all fault~modes.
It is nevertheless useful to follow an example fault up through the levels of the abstraction hierarchy; see below.
%The FMMD technique,
%goes further than this by considering all part fault~modes and
%places the analysis phases into a rigid structure.
%Each analysis phase is
%described using set theory in later sections.
%By creating a rigid hierarchy, not only can we traverse back
%down it to find possible causes for system errors, we can also determine
%combinations of fault modes that cause certain high level fault modes.
%For instance, it may be a criteria that no single part failure may cause a fatal error.
%If a fault tree can trace down to a single part fault for a potentially fatal
%fault mode, then a re-design must be undertaken.
%Some standards for automated burner controllers demand that two part failure modes cannot cause
%a dangerous/potentially fatal error. Again having a complete fault analysis tree will reveal these conditions.
\subsection{An example part Fault and its subsequent \\ abstraction to system or top level}
An example of a part fault effect on the example system is given below, showing how this fault
manifests itself at each abstraction level.
%\begin{example}
As an example let us consider a resistor failure in the first milli-volt sensor.
Let us say that this resistor, R48 say, with the particular fault mode `shorted'
causes the amplifier to output 5V.
At the part level we have one fault mode in one part.
%This is the lowest or zero level of fault abstraction.
Let us say that this amplifier has been designed to amplify the milli-volt input
to between 1 and 4 volts, a convenient voltage for the ADC/microcontroller to read.
Any voltage outside this range will be considered erroneous.
As the resistor short causes the amplifier to output 5V we can detect the error condition.
This resistor is a part in the `millivolt amplifier 1' module.
% (see figure \ref{fig:mvs}).
The fault mode at the derived fault level (abstraction level 1) is OUTPUT\_HIGH.
Looking higher in the hierarchy, the next abstraction level higher, level 2, will see this as
a `CHANNEL\_1' input fault.
%The system as a whole (abstraction level 3) will see this as
%a `MILLI\_VOLT\_SENSOR' fault~mode.
%\end{example}
\subsubsection{Abstraction Layer Summary \\ for example fault.}
\begin{description}
%\begin{list}
\item[Abstraction Level 0 :] Resistor has fault mode `R48\_SHORT' in amplifier 1.
\item[Abstraction Level 1 :] Amplifier 1 has fault mode `OUTPUT\_HIGH'.
\item[Abstraction Level 2 :] Milli-volt sensor has `CHANNEL\_1' fault.
%\item[Abstraction Level 3 :] System has `MILLI\_VOLT\_SENSOR' fault.
%\end{itemize}
%\end{list}
\end{description}
Thus we have looked at a single part fault and analysed its effect from the
bottom up on the system as a whole, going up through the abstraction layers.
\subsubsection{Fault Abstraction Level}
\label{fal}
Let the fault abstraction level mean the number of times an element of a system
has been transformed by applying fault mode effects analysis. In the example above
the error becomes more abstract each time we zoom out and consider the effects of
the fault on more generalised sections of the system.
\section{ Fault Modes and Collections }
A safety critical system is usually a collection of highly specified interacting parts.
These are
documented in a `parts list', which is taken to define, for the standards
agencies, the authorised parts that will be used for a particular approved product.
This is an important document, used in an official sense by standards agency
inspectors, often to validate production processes.
\begin{definition}
Let $L$ be the index set for a given system.
Let $p_l$ denote a part in the `parts~list' $ P_L = \{ \; p_l \; | \; l \; \in \; L \; \} $.
Thus $P_L$ represents the set of all parts in a safety critical product.
\end{definition}
{\em
Should this be a list instead ?
$ P_L = < \; p_1..L \; | \; p_l \; \in \; P_L \; > $
}
All parts in a safety critical system have known
`fault~modes'.
A fault~analysis function $fa$ may be applied to a part $p$,
returning a set of `fault~modes'.
\begin{definition}
Let the set $D^{a}_{l}$ represent a set of derived fault~modes,
where $a$ represents the abstraction level, and $l$ is an index to the part that gives rise to these fault modes.
A superscript of $0$ indicates that the fault~modes have been derived from
the lowest abstraction level, the parts~level.
%Let $K_p$ be the set of possible fault~modes for a given part `$p$',
%Let $p$ denote a part in the `parts~list' indexed by the variable l, $ PL = \{ {p}_{l} | l \in L \} $
%Thus PL represents a set of all parts in the system.
\end{definition}
\begin{definition}
\label{func:fa}
$$ fa(p_l) = \{ f_1, f_2, f_3, \ldots, f_n \} = D^{0}_{l} $$
where $f_1, \ldots, f_n$ represent part fault modes for the part $p_l$.
The fault~modes corresponding to part $p_l$ are represented by the set $ D^{0}_{l}$.
\end{definition}
%\begin{example}
For instance, were part number 48 in a parts~list to be a wire wound resistor,
applying fault analysis
would derive two fault conditions, open and short.
$$ fa(p_{48}) = \{ OPEN_{48}, SHORT_{48} \} = D^{0}_{48}$$
\label{disjointnotcare}
%\end{example}
\subsection { Defining an entire system \\ as a family of fault modes}
As a thought experiment, let us consider an entire system in terms of all its fault modes.
That is, without breaking a circuit into modules or chunks, simply consider all the fault modes of all the components in a system.
One way of analysing a system would be to take each part fault mode
in turn and then to determine its effect on the system as a whole.
This means taking a particular part fault mode and checking its effects upon every other component in the system.
\begin{definition}
\label{func:s0}
Let $S^{0}$ be a family of sets of part~faults, for all parts in the parts list
\label{systemlevel0}
$$ {S^{0}} = \{ \; D^{0}_{l} \; | \; l \; \in \; L \; \} $$
which is the same as saying
$${S^{0}} = \{ fa(x) | x \in P_{L} \} $$
\end{definition}
The entire set of all part `fault~modes' for a complete system, $S^{0}$, can be defined
as the family of all part fault~mode sets $D^{0}_{l}$, indexed by $l$ with $L$ as the index set.
$S^0$ then represents all the fault modes of all the parts in the system we are analysing.
We could analyse this circuit by taking each part~fault~mode and working out what effect it will have on the system as a whole.
This would be analysing the system from the perspective of the effect of single part failures.
We could go further and take all combinations of double simultaneous faults possible
and examine their effect on the system.
%\begin{equation}
%\label{systemlevel0}
% {S^{0}} = \{ \; K^{0}_{l} \; | \; l \; \in \; L \; \} = \{ fa(x) | x \in P_{L} \}
%\end{equation}
%
%-OR- In terms of the parts~list, parts transformed to set $K$ by function`$fa$' all exist in $S^0$
%$$ \forall x \{ x \in PL | fa(x) \in S^{0} \}$$
%
To completely analyse the effect of all part failure modes we would
need to consider their effects in all combinations. In other words, we
would have to look at each fault~mode, see how it affected every other component in the system,
and then work out what effect that would have on the entire system.
Checking every combination of fault~modes corresponds to the power set of the union of the sets in $S^0$
(using $ \mathcal{P} $ to represent the power set, and $(\bigcup {S^0})$ to mean flattening the family of sets).
$$ \mathit{AllCombinationsOfPartFaults} = \mathcal{P} ({\bigcup}_{l \in L} D^{0}_{l}) = \mathcal{P} (\bigcup {S^0}) $$
%where $ \cup \mathcal{F}S = \{ x | \forall A \in \mathcal{F}(x \in A) \} = \{x|\forall A ( A \in \mathcal{F} \rightarrow x \in A) \} $
That is to say, it means checking all the part faults in the system, in all combinations:
% (including the empty set)
the power set $ \mathcal{P} (\bigcup S^{0}) $ contains every possible
combination of part faults in the system.
% Get the formula from the 2004 paper done with JEAN
\begin{example}A typical circuit board with, say, 1000 parts, each of which has,
say, 5 fault modes, would mean $\bigcup S^0$ would have 5000 elements.
Thus, to check for the effect of single part failure modes would entail 5000 checks against 999
parts.
To check for all double failures would entail $(5000-1)^2 \times 998$ checks.
To check all possible combinations of failure modes
would lead to an astonishing number of part~failure~mode combinations to check ($2^{5000}$).
A full check at the part fault mode level is therefore impractical.
\end{example}
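To give a feel for the scale of the figures in the example above, the arithmetic can be worked through.
This is only an illustrative sketch, using the assumed values of the example (1000 parts with 5 fault modes each).
The single failure check count is
$$ 5000 \times 999 = 4\,995\,000 $$
and the size of the full power set is
$$ 2^{5000} = 10^{5000 \log_{10} 2} \approx 10^{1505.15} \approx 1.4 \times 10^{1505} $$
that is, a number with more than 1500 decimal digits.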
\subsection{Reducing the Power set combinations}
It would be much better if we could break the problem down into manageable chunks,
analyse the behaviour of the chunks, and then combine them with other functional~groups
that have been analysed.
In this way we could build a hierarchy of modules with analysis phases
leading to an eventual model of the entire system.
\begin{definition}
Let $F$ be a functional group of components within a system.
A `functional~group' once identified will be a sub-set of the parts~list $P_L$.
$ F \subseteq P_L $
\end{definition}
A {\em functional group} defined from the parts is at the zero'th level of abstraction.
The {\em functional group} is a collection: no analysis has been applied
and it therefore is a group at zero level fault abstraction.
To indicate this it will be given a superscript of 0.
There may be a large number of functional~groups identified at this stage.
An index will be used as a subscript to identify them. Thus $F^{0}_{4}$ would be the fourth
identified functional group.
\begin{example}
Looking at the milli-volt sensor in figure~\ref{fig:mvs},
we can easily identify the milli-volt amplifier as a functional group.
It has a small number of components and one specific, well defined job:
to amplify a milli-volt signal to within a given range and to
pass this on to another functional group to read it (the ADC).
\end{example}
The functional group as a list of parts is not directly useful. If we define a function
`$fm$' that translates a set of parts (functional~group) into a corresponding family of
fault~mode sets, these can be used for further analysis.
These families of fault mode sets are termed `modules'.
%Using the function $\#$ to represent cardinality thus $\#(A) = Cadinality(A)$.
\begin{definition}
Let $M^{a}_{n}$ represent a family of derived fault modes
corresponding to the functional group $F^{a}_{n}$, where $a$ represents the abstraction level
and $n$ is an index.
\end{definition}
The set $M^{0}_{n}$ is a family of fault~mode sets corresponding to the {\em functional group}
$F^{0}_{n}$, where the superscript $0$ represents the
hierarchy/abstraction level, and $n$ corresponds to the functional group index.
\begin{definition}
\label{func:fm}
The function $fm$ translates a set of parts to a family of corresponding fault~mode sets
$$fm: F^{0}_{n} \rightarrow M^{0}_{n} $$
or
$$ M^{0}_{n} = \{ fa(p) | p \in F^{0}_{n} \} $$
%
% completeness not necessaary with cardinality of sets as it iterates over whole parts list
% $$ fm( M^{a}_{n} ) = \forall x \{ x | ( x \in M^{a}_{n} \rightarrow ( fa(x) \in K^{a}_{n} ) \Con ( \#(M^{a}_{n}) = \#(K^{a}_{n} ) ) \}$$
%
\end{definition}
\begin{summary}
Beginning with the
individual parts, we have combined these into functional~groups.
These functional groups have been converted into families of corresponding sets of fault~modes (modules).
The next stage is to perform fault~mode~effect~analysis to determine the ways in which the `module' can fail,
or in other words to derive a set of faults at the `module'
level, i.e. at a higher level of fault~mode abstraction.
\end{summary}
\section { FMEA : Fault Mode Effect Analysis }
Fault mode effects analysis is the process of looking at a module and determining how it will fail
according to different scenarios of part failures.
This can be in the nature of a thought experiment (for instance, looking at an electronic circuit,
we might ask what happens if this resistor goes open),
or could be based on empirical data. However, the results gathered would always be subject to
the scrutiny and approval of any standards agency they are submitted to\footnote{Some standards are now listing common electronic parts with associated fault~modes that must be considered for
the approval process \cite{EN61508}.}. Often in an approval process, the approval agencies will talk through fault scenarios and require that manufacturers defend against potential part faults. This is on the basis of experience and knowledge of the type of
system under analysis. It is not necessarily a rigorous or mathematically complete process.
%This analysis needs to be performed by an expert
%for the system under scrutiny.
%\begin{definition} A module is a family of sets of `fault modes' of parts (eqn \ref{module}).\end{definition}
By looking at how the part fault~modes within a module interact, and then
analysing the scenario for every fault~mode combination, we can determine a new set of fault~modes:
fault~modes from the perspective of the module.
\begin{definition}
Let $D^{a+1}_{n}$ be the set of fault~modes derived from a module $M^{a}_{n}$.
A superscript will define the abstraction level and a subscript the module index.
Note that the act of deriving fault~modes from a module, raises the abstraction level.
\end{definition}
\begin{definition}
Let the symbol $\FMEA$ mean `fault mode effects analysis'. This will translate a family of sets $M$ into a corresponding set $D$,
i.e. $ \FMEA: M^{a}_{n} \mapsto D^{a+1}_{n}$. The $\FMEA$ operation has the effect of raising the fault abstraction level.
\end{definition}
%Each module analysed map to a derived set
%which will have the same subscript and index.
%A module at abstraction level 0, can be mapped
% to a derived set by applying $\FMEA$ to the M set thus:
\begin{equation}
\label{derive0}
D^{1}_{N} \; = \FMEA({\cal P} \; (\bigcup \; M^{0}_{N})) = \FMEA({\cal P} \; (\bigcup \; fm (F^{0}_{N})))
\end{equation}
The set $ D^{1}_{N} $ is the set of fault~modes derived from the $N$th module at abstraction level 0.
$D^1_{n}$ sets derived in this way can now be used in the same way as the $D^0_n$ sets were above,
but at a higher level of abstraction. Thus one or more $ D^{1}_{n} $ sets can be combined
to form an $M^1_{n}$ set.
\begin{example}
For example, we could build a first level $M$ set from three first level module derived fault modes, say
$ D^{1}_{1}, D^{1}_{4}, D^{1}_{7}$.
$$ M^{1}_{N} = \{ D^{1}_{1}, D^{1}_{4}, D^{1}_{7} \} $$
This $ M^{1}_{N} $ set may now be subjected to FMEA and will return a set of derived fault~modes at the second level of hierarchy thus
\begin{equation}
D^{2}_{N} \; = \FMEA({\cal P} \; (\bigcup \; M^{1}_{N}))
\end{equation}
This may continue up until the hierarchy is complete, with the fault~modes becoming more and
more abstract as the hierarchy gets higher. The top of the hierarchy will be a set of derived
fault modes, representing the system fault modes.
\end{example}
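The stages described so far can be summarised as a chain of mappings.
This is only a notational sketch using the symbols already defined ($fm$, $\FMEA$, $F$, $M$ and $D$);
the step labelled `collect' is not a function defined in this chapter, and simply stands for gathering several derived sets into a new module.
The power set and union of equation \ref{derive0} are left implicit in each $\FMEA$ step.
$$ F^{0}_{n} \stackrel{fm}{\longrightarrow} M^{0}_{n} \stackrel{\FMEA}{\longrightarrow} D^{1}_{n}
\stackrel{\mathrm{collect}}{\longrightarrow} M^{1}_{m} \stackrel{\FMEA}{\longrightarrow} D^{2}_{m} \longrightarrow \cdots $$
Each application of $\FMEA$ raises the abstraction level by one, while $fm$ and the collecting step leave it unchanged.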
\subsection{The FMEA Process}
In practice the combinations of fault modes to be analysed are placed in a
matrix, and the fault effect is determined for each combination under scrutiny.
It may be found that several combinations of part fault~modes lead to the same module
level fault~mode. This means that the module could fail in the same way due to a variety of causes,
and this fact will be useful later for determining fault trees. More importantly, it means that the number
of fault conditions for a module should be smaller than the sum of all the part fault conditions
in the module.
\subsection{ Practical limits for the number \\ of fault mode combinations to \\ consider within a module}
It may not be deemed necessary to analyse all scenarios from the power-set of an $M$ set.
Some European standards will only consider one part fault at a time for analysis,
and stricter ones\ref{en298} imply checking for double simultaneous faults. Statistically it
becomes very unlikely to have more than two parts fail suddenly, and
European\ref{en298} and North American\ref{UL1998} standards reflect this belief.
To express this formally we can use a cardinality restricted powerset (see \ref{ccp}).
Thus, for considering single part fault modes only, equation \ref{derive0} becomes
$ D^{1}_{N} \; = \FMEA({\cal P}_1 \; (\bigcup \; M^{0}_{N}))$,
and for considering double and single fault modes, $ D^{1}_{N} \; = \FMEA({\cal P}_2 \; (\bigcup \; M^{0}_{N}))$.
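As a rough indication of the saving, assume that ${\cal P}_k$ denotes the non-empty subsets of cardinality at most $k$
(the precise definition is the one given in \ref{ccp}; this reading is an assumption made here for illustration).
For a module whose flattened fault~mode set has $n$ elements, the number of test cases is then
$$ |{\cal P}_1| = n, \qquad |{\cal P}_2| = n + \frac{n(n-1)}{2} $$
compared with $2^n$ for the unrestricted power set.
For the 5000 part fault modes of the earlier circuit board example, the double fault case gives
$$ 5000 + \frac{5000 \times 4999}{2} = 12\,502\,500 $$
combinations, before discarding any that can be ruled out as physically impossible.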
{\em NOTE: need proof here of how this translates UP the hierarchy, because as it goes UP
only more error sources can be included in the fault tree NOT LESS OK need the express this formally maybe}
%Collecting the derived fault modes is thus important.
The FMEA process can be represented visually by drawing part fault modes as contours and scenario test
cases as points. Points which have the same module level fault~mode can then be joined by lines.
%Although this looks very much like a spider diagram, but it would be misleading to think of it so.
Note this is not a diagram representing sets. The Euler diagram here
is being used to represent logical conditions.
Each collection of test~cases joined by line(s) is a derived fault mode at the module level.
\subsection{FMEA Diagram : Definition of Symbols}
%\begin{figure}[t+]
%\centering
%\epsfig{file=cimg5043fmmd_spider.eps, width=\textwidth}
%\caption{ FMEA Diagram : Example Incomplete Analysis }
%\label{fig:sdfmea}
%\end{figure}
\begin{figure}
\centering
\input{fmmdset/fmea_diagram.tex}
\caption{FMEA Diagram : Example Incomplete Analysis}
\label{fig:sdfmea}
\end{figure}
When viewed as an FMEA diagram, each part fault mode becomes a contour.
Each asterisk (`*') represents a test~case corresponding to a combination of fault~modes within the module.
Overlapping contours represent the occurrence of simultaneous faults (i.e. all the fault modes
corresponding to contours are considered active for the test case).
\begin{summary}
\begin{itemize}
\item Test cases are represented by `*' marks.
\item Conjunctions of fault~modes (i.e. simultaneous faults) are represented by overlapping regions of contours.
\item Lines joining test cases mean the test cases cause the same fault at the module or $M$ set abstraction level.
\end{itemize}
\end{summary}
{\em TO DO: well-formedness and specific rules for FMEA diagrams}
\subsection {Example FMEA process using an FMEA diagram}
Consider a simple functional~group $ F^0_1 $ derived from two parts $p1,p2$.
Applying fault analysis to these parts gives sets of corresponding fault modes
(where $D^0_p$ is the set of fault modes for the part $p$,
and the individual fault modes use an indexed lower case $f$
with the part number with a post fixed fault type, here a..z).
$$ fa(p1) = \{ f_{p1a}, f_{p1b}, f_{p1c} \} = D^0_{p1} $$
$$ fa(p2) = \{ f_{p2a}, f_{p2b} \} = D^0_{p2} $$
Applying the `$fm$' function defined in (\ref{func:fm}) to the functional group $F$
gives an $M$ set. This is a module that we can use for fault behaviour analysis.
$$ fm( F_{1}^{0} ) = M_{1}^{0} = \{ D^0_{p1}, D^0_{p2} \} $$
Note that the union of this family is
$$ {\bigcup}{M_1^0} = \{ f_{p1a}, f_{p1b}, f_{p1c}, f_{p2a}, f_{p2b} \} $$
To analyse the effects of the fault~modes on the module, we first take the power set of the union of this family
$$ \mathcal{P} ({\bigcup}{M_1^0}) = \mathcal{P} \{ f_{p1a}, f_{p1b}, f_{p1c}, f_{p2a}, f_{p2b} \} $$
The power set returns all possible combinations of faults. In this case there would be $2^5$ combinations to check.
We could restrict the cardinality of the power set and so reduce the complexity.
For this we can use a restricted cardinality powerset (see \ref{ccp}).
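For the five fault modes of this example the sizes can be counted directly.
This is an illustrative count, assuming that the restricted power set $\mathcal{P}_{2}$ contains the non-empty subsets
with at most two elements (see \ref{ccp} for the actual definition).
$$ |\mathcal{P} ({\bigcup}{M_1^0})| = 2^5 = 32, \qquad |\mathcal{P}_{2} ({\bigcup}{M_1^0})| = 5 + \frac{5 \times 4}{2} = 15 $$
If, as a further assumption, two fault modes of the same part cannot be active together,
the $3 + 1 = 4$ same-part pairs drop out, leaving $5 + 3 \times 2 = 11$ test cases.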
If we restrict our search to single faults and double simultaneous faults $\mathcal{P}_{2}$, we can use a two dimensional
Euler\footnote{Euler diagrams are here not used to represent sets, but are used to represent boolean logic conditions}
diagram to represent this. In this Euler diagram, for simplicity, only 5 test cases
have been analysed (see figure \ref{fig:sdfmea}).
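As a rough worked count for the five part fault modes above (assuming every pairing is physically possible, which this example does not state), restricting the search to single and double faults gives
$$ {5 \choose 1} + {5 \choose 2} = 5 + 10 = 15 $$
candidate test cases, compared with the $2^5 - 1 = 31$ non-empty members of the unrestricted power set.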
For the purposes of this example it has been decided that some combinations of part faults cause the same module level error.
Where this happens, test~cases are joined by lines to indicate that they cause the same fault
(from the module or $M^a_n$ set perspective).
The single point $f2$ represents a module fault mode caused by the
combined part failures of $ f_{p2b}$ and $f_{p1a}$.
As an equation
$$f2 \rightarrow f_{p2b} \Con f_{p1a} $$
Two pairs of test cases cause the same module level error:
$f1$ and $f3$ are each joined to two test cases by connecting lines.
The module level fault mode $f1$ is caused either by fault~mode $f_{p1b}$ or by
the combination $f_{p1c} \Con f_{p2b}$.
As an equation
$$f1 \rightarrow f_{p1b} \Dis (f_{p1c} \Con f_{p2b}) $$
Similarly the logical causes for $f3$ are
$$f3 \rightarrow ( ( f_{p1a} \Con f_{p2a} ) \Dis ( f_{p1b} \Con f_{p2a} ) ) $$
or, simplifying by the distributive law,
$$f3 \rightarrow ( f_{p2a} \Con ( f_{p1a} \Dis f_{p1b} ) ) $$
All joined test cases and individual test cases, discovered during the FMEA process,
can now be considered
to be derived fault modes for the module.
Thus:
$$ D^{1}_{1} = \FMEA ({\bigcup}{M^{0}_{1}}) = \FMEA ({\bigcup} fm({F^{0}_{1}})) = \{ f1, f2, f3 \} $$
$ D^{1}_{1} $ is now a set of fault~modes for the module. We can now
treat this in the same way we treated the part fault mode sets ($D^{0}_{n}$),
and take several derived ( $ D^{1}_{n} $) sets and combine them to form higher level modules.
This is like considering the derived set to be a part, but a part at a higher level of abstraction.
The process can then continue up through the abstraction levels until we have a complete hierarchical model.
Note that, to reduce clutter in this example, the other test cases are ignored. A full analysis would include
all test cases.
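As a sketch of the next step up the hierarchy, suppose a second module has been analysed in the same way to give a derived set $D^{1}_{2}$ (a hypothetical set, introduced here only for illustration). Following the pattern used for $D^{1}_{1}$, the two derived sets are collected into a higher level module, which is then analysed to give a second level derived set:
$$ M^{1}_{1} = \{ D^{1}_{1}, D^{1}_{2} \} $$
$$ D^{2}_{1} = \FMEA ({\bigcup}{M^{1}_{1}}) $$
The indices here simply extend those of the worked example and are an assumption about how the numbering continues.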
\begin{figure}
\centering
\input{fmmdset/fmmdh.tex}
\caption{FMMD example Hierarchy}
\label{fig:sdfmea}
\end{figure}
\section {Building the Hierarchy - Higher levels \\ of Fault Mode Analysis}
Figure \ref{fig:fmmdh} shows a hierarchy of failure mode decomposition.
It can be seen that the derived fault~mode sets are higher level abstractions of the fault behaviour of the modules.
We can take this one stage further by combining the $D^{1}_{N}$ sets to form modules. These
$M^1_{N}$ fault mode collections can be used to create $D^2_{N}$ derived fault~mode sets and so on.
At the top of the hierarchy, there will be one final (where $t$ is the
top level) set $D^{t}_{N}$ of abstract fault modes. The causes for these
system level fault~modes will be traceable down to part fault modes.
A hierarchy of levels of faults, becoming more abstract at each level, should
converge to a small sub-set of system level errors.
This thinning out of the number of system level errors is borne out in practice;
real time control systems often have a small number of major reportable faults (typically $ < 50$),
even though they may have accompanying diagnostic data~\cite{sem}.
%\begin{figure}
%\subfigure[Euler Diagram]{\epsfig{file=fmmd_hierarchy_cimg5040.eps,width=4.2cm}\label{fig:exa}}
%\subfigure[Intersection A B ]{\epsfig{file=exampleareasubtraction2.eps,width=4.2cm}\label{fig:exb}}
%\subfigure[area to subtract]{\epsfig{file=exampleareasubtraction3.eps,width=4.2cm}\label{fig:exc}}
%\subfigure[A second graphic]{\epsfig{file=exampleareasubtraction3.eps,width=2cm}}
%{\epsfig{file=fmmd_hierarchy_cimg5040.eps,width=12cm}
%\label{fig:ex}
%\caption{Simple Euler Diagram}
%\end{figure}
\section {Modelling considerations}
\subsection{FMEA diagrammatic syntax and Well Formedness }
{\em TO DO RULES AND MEANING}
\subsection{The Parts List, Set or Bag}
The parts list is indexed by the set $L$ to ensure all parts are unique.
Thus a set suffices and bags are not required.
\subsection{The Empty Set}
Note that any power~set will always include the empty set.
Thus $\emptyset \in ({\cal P} \cup {\cal F}S)$ corresponds to the
state where there are no active errors in any of the parts
in the system, i.e. it is in the correct operational state.
$$ CorrectOperationalState = \emptyset \in ({\cal P} \cup {\cal F}S) $$
This simply says that where no part fault modes are active the system must be functioning correctly.
If this is not the case, then not all part~fault modes have been considered.
\subsection{Complete Coverage of Fault Modes }
To ensure that all fault modes are represented, each fault mode from
the union of all subsets of $S$ ($\bigcup {S}$) must be represented by a first
level module.
$$ CompleteCoverage = \forall \; x \exists \; y \; ( \; x \; \in \; (\bigcup \; S) \; \Rightarrow \; x \; \in \; (\bigcup \; D^{1}_{y}) ) $$
That is to say that where a fault~mode exists in the system, it must be
included in at least one module.
Were it not, we would have a fault~condition that was not modelled.
The $CompleteCoverage$ check would not ensure that the modules had been
identified correctly, but would ensure that there were no missing
fault~modes, and acts as a type of syntax check.
Note also that this means the modules could share parts
(although this would be unusual in practice).
\subsection{ Part Fault modes Conjoint and Disjoint }
Note that in the example (\ref{disjointnotcare}) the part fault modes are disjoint: the resistor cannot be both open and shorted
at the same time. Not all part fault modes are disjoint, however. Consider a complicated part
like a micro~processor. This could easily have more than one fault~mode active.
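To illustrate the effect of disjoint fault modes on the number of test cases, take two parts with fault mode sets $D^0_{p1}$ and $D^0_{p2}$ of sizes 3 and 2, as in the earlier FMEA diagram example, and assume (this is not stated for that example) that the fault modes within each part are disjoint. Double fault test cases can then only pair a fault mode from one part with a fault mode from the other:
$$ \#D^0_{p1} \times \#D^0_{p2} = 3 \times 2 = 6 $$
rather than the ${5 \choose 2} = 10$ pairs obtained when any two of the five fault modes may be active together.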
\section {Notes}
\subsection{ The Parts List }
A parts list is typically a document with an enumerated list of parts.
Each part includes a description, placement information, manufacturer's part number and optional comments and vendor sourcing numbers.
It is a document tied to a particular hardware revision of a product, and is used for the procurement of parts
and as a cross check for inspectors from standards agencies.
{\em NEED EXAMPLE PARTS LIST HERE....}
\subsection{ Proof of number of part~failure \\ modes preserved in hierarchy build}
Here we need to prove that if we have an abstract fault, then as it goes higher in the tree, it can only collect more, not fewer,
actual part~failure modes. This is obvious but needs a proof.
Also, this means dummy modules may be needed to avoid jumping levels in the tree structure.
%Complete coverage for all derived hierarch levels can be generalised thus:
%$$ CompleteCoverage = \forall \; h \; \forall \; x \exists \; y \; ( \; x \; \in \; \cup \; {\cal F} \; D^{h}
% \; \Rightarrow \; x \; \in \; \cup \; M^{h}_{y} ) $$
\subsection{Cardinality Constrained Powerset }
\label{ccp}
A cardinality constrained powerset is one where sub-sets of a cardinality greater than a threshold
are not included. This threshold is called the cardinality constraint.
To indicate this, the cardinality constraint $cc$ is subscripted to the powerset symbol thus: $\mathcal{P}_{cc}$.
Consider the set $S = \{a,b,c\}$. $\mathcal{P}_{2} S $ means all non-empty subsets of $S$ where the cardinality of the subsets is
less than or equal to 2.
$$ \mathcal{P} S = \{ \emptyset, \{a,b,c\}, \{a,b\},\{b,c\},\{c,a\},\{a\},\{b\},\{c\} \} $$
$$ \mathcal{P}_{2} S = \{ \{a,b\},\{b,c\},\{c,a\},\{a\},\{b\},\{c\} \} $$
$$ \mathcal{P}_{1} S = \{ \{a\},\{b\},\{c\} \} $$
A $k$ combination is a subset with $k$ elements.
The number of $k$ combinations (each of size $k$) from a set $S$
with $n$ elements (size $n$) is the binomial coefficient
$$ C^n_k = {n \choose k} = \frac{n!}{k!(n-k)!}$$
To find the number of elements in a cardinality constrained powerset of $S$ with up to $cc$ elements
in each combination sub-set,
we need to sum the combinations,
%subtracting $cc$ from the final result
%(repeated empty set counts)
from $1$ to $cc$ thus
%
% $$ {\sum}_{k = 1..cc} {\#S \choose k} = \frac{\#S!}{k!(\#S-k)!} $$
%
$$ \#\mathcal{P}_{cc} S = \sum^{cc}_{k=1} \frac{\#S!}{k!(\#S-k)!} $$
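As a check, for the set $S = \{a,b,c\}$ used above ($\#S = 3$), summing the $k$ combinations gives
$$ \#\mathcal{P}_{2} S = {3 \choose 1} + {3 \choose 2} = 3 + 3 = 6 $$
$$ \#\mathcal{P}_{1} S = {3 \choose 1} = 3 $$
which agrees with the six and three members listed for $\mathcal{P}_{2} S$ and $\mathcal{P}_{1} S$.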
\section{Future Ideas}
\subsection{ Production Quality Control }
A fault causation tree could be used for PCB fault finding (from the fault codes that are reported
by the equipment). This could be used in conjunction with a database to provide
production oriented FMEA\footnote{The term FMEA, applied to production, is a statistical process of
determining the probability of the fault occurring and multiplying that by the costs incurred from the fault.
This quickly becomes a priority to-do list with the most costly faults at the top.}.
\subsection { Test Rigs }
Test rigs apply a rigorous checking process to safety critical equipment before
it can be sold, and this is usually a legal or contractual requirement, backed up by inspections
and an approval process.
They are usually a clamp arrangement in which the PCB under test is placed.
Precise, calibrated test signals are then applied to the board under test. For PCBs containing
a microprocessor, custom test~rig software may be run on them to exercise
active sections of the PCB (for instance to drive outputs, relays etc).
The main purpose of a test rig is to prevent faulty equipment from being shipped.
However, a test rig will often reveal an easy to fix fault on a board (such as a part not soldered down completely
or missing parts). These boards can be mended and re-submitted to the test rig.
It is often a problem, when a unit fails in a test rig, to quickly determine why it has failed.
A fault causation tree would be useful for identifying which parts may be missing, not soldered down
or simply incorrect. The test rig, armed with the fault analysis tree, could point to parts or combinations of parts that could be checked
to correct the product.
\subsection {Modules - re-usability}
In the example system in the introduction, the milli-volt amplifiers
are the same circuit. The set of derived faults for the module may therefore
simply be given a different index number and re-used.
\subsection{ Multi Channel Safety Critical Systems }
Where a system has several independent parallel tasks, each one can be a separate hierarchy.
% \small
% \bibliography{vmgbibliography,mybib}
% \normalsize
% Typeset in \ \ {\huge \LaTeX} \ \ on \ \ \today
\begin{verbatim}
CVS Revision Identity $Id: fmmdset.tex,v 1.7 2009/06/06 11:52:09 robin Exp $
\end{verbatim}
%\end{document}
%\theend