

\abstract{ This chapter defines what is meant by the terms
components, component fault modes and `unitary~state' component fault modes.
%The application of Bayes theorem  in current methodologies, and
%the suitability of the `null hypothesis' or `P' value statistical approach
%are discussed.
Data types and their relationships are described using UML.
Mathematical constraints and definitions are made using set theory.
}


\section{Introduction}

When analysing a safety critical system using the
FMMD technique, we need clearly defined failure modes for
all the components that are used to model the system.
These failure modes are subject to the constraint that
the failure modes of a component must be mutually exclusive.
This constraint, and the definition of a component, are
described in this chapter.
%When building a system from components,
%we should be able to find all known failure modes for each component.
%For most common electrical and mechanical components, the failure modes
%for a given type of part can be obtained from standard literature\cite{mil1991}
%\cite{mech}. %The failure modes for a given component $K$ form a set $F$.


%%
%% Paragraph component and its relationship to its failure modes
%%

\section{ What is a Component ?}


Let us first define a component. This is anything we use to build a
product or system. It could be something quite complicated
like an integrated microcontroller, or quite simple like the humble resistor.
We can define a
component by its name, a manufacturer's part number and perhaps
a vendor's reference number.
What these components all have in common is that they can fail, and fail in
a number of well defined ways. For common components
there is established literature giving the failure modes the system designer should consider (with accompanying statistical
failure rates)\cite{mil1991}. For instance, a simple resistor is generally considered
to fail in two ways: it can go open circuit or it can short. As well as identifying a component by name and part number, we can therefore
associate it with a set of known failure modes. The UML diagram in figure
\ref{fig:component} shows a component as a simple data
structure with its failure modes.


\begin{figure}[h]
 \centering
 \includegraphics[width=400pt,bb=0 0 437 141,keepaspectratio=true]{component_failure_modes_definition/component.jpg}
 % component.jpg: 437x141 pixel, 72dpi, 15.42x4.97 cm, bb=0 0 437 141
 \caption{A Component and its Failure Modes}
 \label{fig:component}
\end{figure}

% \begin{figure}[h+]
%  \centering
%  \includegraphics[width=400pt,bb=0 0 433 68,keepaspectratio=true]{component_failure_modes_definition/component.jpg}
%  % component.jpg: 433x68 pixel, 72dpi, 15.28x2.40 cm, bb=0 0 433 68
%  \caption{A Component and its failure modes}
%  \label{fig:component}
% \end{figure}

A product naturally consists of many components and these are traditionally
kept in a `parts list'. For a safety critical product this is usually a formal document
and is used by quality inspectors to ensure the correct parts are being fitted.
In our UML diagram the parts list is simply a collection of components,
as shown in figure \ref{fig:componentpl}.
\begin{figure}[h]
 \centering
 \includegraphics[width=400pt,bb=0 0 712 68,keepaspectratio=true]{component_failure_modes_definition/componentpl.jpg}
 % componentpl.jpg: 712x68 pixel, 72dpi, 25.12x2.40 cm, bb=0 0 712 68
 \caption{Parts List of Components}
 \label{fig:componentpl}
\end{figure}


%%
%% Paragraph using failure modes to build from bottom up
%%

\section{Fault Mode Analysis, top down or bottom up?}

Traditional static fault analysis methods work from the top down.
They identify faults that can occur in a system, and then work down
to see how they could be caused. Some apply statistical techniques to
determine the likelihood of component failures
causing specific system level errors (see Bayes theorem, section \ref{bayes}).
Another top down technique is to apply cost benefit analysis
to determine which faults are the highest priority to fix\cite{FMEA}.
The aim of this study is to produce complete failure
models of safety critical systems from the bottom-up,
starting, where possible, with known component failure modes.

In order to analyse from the bottom-up, we need to take
small groups of components from the parts~list that naturally
work together to perform a simple function.
We can term this a `Functional~Group'. When we have a
`Functional~Group' we can look at the failure modes of all the components
in it and decide how these will affect the group.
In other words, we can determine the failure modes of the functional
group. These failure modes are derived from the functional group, and so we can call
them `derived failure modes'.
We now have something very useful, because
we can treat this functional group as a component with a known set of failure modes.
This newly derived component can be used as a higher level
building block for the system we are analysing.
Derived components can in turn be used
to form higher level functional groups.
This process can continue until we have built a hierarchy that converges to a failure model of the entire system.
To differentiate the components derived from functional groups, we can
add a new attribute to the class `Component', that of analysis
level.
We can represent this in a UML diagram, see figure \ref{fig:cfg}.

\begin{figure}[h]
 \centering
 \includegraphics[width=400pt,bb=0 0 712 235,keepaspectratio=true]{component_failure_modes_definition/cfg.jpg}
 % cfg.jpg: 712x205 pixel, 72dpi, 25.12x7.23 cm, bb=0 0 712 205
 \caption{Components Derived from Functional Groups}
 \label{fig:cfg}
\end{figure}

\section{Set theory description}

$$ System \stackrel{has}{\longrightarrow} PartsList $$

$$ PartsList  \stackrel{has}{\longrightarrow} Components $$

$$ Component \stackrel{has}{\longrightarrow} FailureModes $$

$$ FunctionalGroup  \stackrel{has}{\longrightarrow} Components $$

We use the symbol $\bowtie$ to indicate an analysis process that takes a
functional group and converts it into a new component.

$$ \bowtie ( FG ) \mapsto Component $$
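
As an illustrative sketch only, the way repeated application of $\bowtie$ builds a hierarchy can be written out for a hypothetical system.
The names $FG_1$, $FG_2$, $C_1 \ldots C_4$, $DC_1$ and $DC_2$ are assumptions made for this example and do not come from a real parts list.

$$ FG_1 = \{ C_1, C_2 \}, \quad \bowtie ( FG_1 ) \mapsto DC_1 $$

$$ FG_2 = \{ DC_1, C_3, C_4 \}, \quad \bowtie ( FG_2 ) \mapsto DC_2 $$

Here $C_1 \ldots C_4$ are components taken from the parts list, $DC_1$ is a derived component, and
$FG_2$ is a higher level functional group that uses $DC_1$ as one of its members.
If $FG_2$ accounted for every remaining part, the failure modes of $DC_2$ would form the failure model of the whole system.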


%
% \subsection{Systems, functional groups, sub-systems and failure modes}
%
% It is helpful here to define some terms, `system', `functional~group', `component', `base~component' and `sub-system'.
%
% A System, is really any coherent entity that would be sold as a safety critical product.
% A sub-system is a  part of some larger system.
% For instance a stereo amplifier separate is a sub-system. The
% whole Sound System, consists perhaps of the following `sub-systems':
% CD-player, tuner, amplifier~separate, loudspeakers and ipod~interface.
%
% %Thinking like this is a top~down analysis approach
% %and is the way in which FTA\cite{nucfta} analyses a System
% %and breaks it down.
%
% A sub-system will be composed of component parts, which
% may themselves be sub-systems.
%
% Eventually by a recursive downwards process we would be able to identify
% sub-systems built from base component parts.
% Each `component part'
% will have a known fault/failure behaviour.
% That is to say, each base component has a set of known
% ways in which it can fail.
%
% If we look at the sound system again as an
% example; the CD~player could fail in serveral distinct ways, no matter
% what has happened to it or has gone wrong inside it.
%
% A top down approach has an intrinsic problem in that we cannot guess
% every possible failure mode at the SYSTEM level.
% Using the reasoning that working from the bottom up forces the consideration of all possible
% component failures (which could be missed in a top~down approach)
% we are presented with a problem. Which initial collections of base components should we choose ?
%
% For instance in the CD~player example; to start at the bottom; we are presented with
% a massive list of base~components, resistors, motors, user~switches, laser~diodes all sorts !
% Clearly, working from the bottom~up we need to pick small
% collections of components that work together in some way.
% These are termed `functional~groups'. For instance the  circuitry that powers the laser diode
% to illuminate the CD might contain a handful of components, and as such would make a good candidate
% to be one of the base level functional~groups.
%
%
% In choosing the lowest level (base component) sub-systems we would look
% for the smallest `functional~groups' of components within a system. A functional~group is a set of components that interact
% to perform a specific function.
%
% When we have analysed the fault behaviour of a functional group, we can treat it as a `black box'.
% We can now call our functional~group a sub-system. The goal here is to  know how will behave under fault conditions !
% %Imagine buying one such `sub~system'  from a very honest vendor.
% %One of those sir, yes but be warned it may fail in these distinct ways, here
% %in the honest data sheet the set of failure modes is listed!
% This type of thinking is starting to become more commonplace in product literature, with the emergence
% of reliability safety standards such as IOC1508\cite{sccs},EN61508\cite{en61508}.
% FIT (Failure in Time - expected number of failures per billion hours of operation) values
% are published for some micro-controllers. A micro~controller
% is a complex sub-system in its self and could be considered a `black~box' with a given reliability.
% \footnote{Microchip sources give an FIT of 4 for their PIC18 series micro~controllers\cite{microchip}, The DOD
% 1991 reliability manual\cite{mil1991} applies a FIT of 100 for this generic type of component}
%
% As electrical components have detailed datasheets a useful extension of this would
% be failure modes of the component, with environmental factors and MTTF statistics.
%
% Currently this sort of information is generally only  available for generic component types\cite{mil1991}.
%
%
% %At higher levels of analysis, functional~groups are pre-analysed sub-systems that interact to
% %erform a given function.
%
% %\vspace{0.3cm}
% \begin{table}[h]
% \begin{tabular}{||l|l||} \hline \hline
%   {\em Definition } & {\em Description}    \\ \hline
% System & A product designed to  \\
%        & work as a coherent entity  \\  \hline
% Sub-system & A part of a system, \\
%            & sub-systems may contain sub-systems \\    \hline
% Failure mode & A way in which a System, \\
%              & Sub-system or component can fail \\     \hline
% Functional Group & A collection of sub-systems and/or \\
%                  & components that interact to \\
%                  & perform a specific function  \\    \hline
% Failure Mode     & The collection of all failure \\
% Group            & modes from all the members of a \\
%                  & functional group \\ \hline
% Derived      & A failure mode determined from the analysis \\
% Failure mode & of a `Failure Mode Group' \\ \hline
% Base Component & Any bought in component, which \\
%                & hopefully has a known set of failure modes  \\    \hline
%  \hline
% component_failure_modes_definition/
% \end{tabular}
% \label{tab:def}
% \caption{Table of FMMD definitions}
% \end{table}
% %\vspace{0.3cm}
%
% \section{A UML Model of terms introduced}
%
%
% \begin{figure}[h]
%  \centering
%  \includegraphics[width=350pt,bb=0 0 680 500,keepaspectratio=true]{component_failure_modes_definition/fmmd_uml.jpg}
%  % fmmd_uml.jpg: 680x500 pixel, 72dpi, 23.99x17.64 cm, bb=0 0 680 500
%  \caption{UML respresentation of Failure Mode Data types}
%  \label{fig:fmmd_uml}
% \end{figure}
%
% The diagram in figure \ref{fig:fmmd_uml}
% shows the relationships between the terms defined in table \ref{tab:def} as classes in a UML model.
% We can start with the functional group. This is a minimal collection
% of components that perform a simple given function.
% For our audio separates rig, this could be
% the compoents that supply power to the laser diode.
% From the `Functional~Group' we can now collect
% all the `failure modes of the `components', and
% produce a `Failure~Mode~Group'. This
% has a reference to the `Functional~Group', and is a collection
% of `failure modes.
% By analysing the effects of the failure modes in the `Failure~Mode~Group'
% we can determine the failure mode behaviour of the functional group.
% This failure mode behaviour is a collection of derived failure modes.
% We can now consider the Functional group as a component now, because
% we have a set of failure modes for it.
%
% \subsection{Sub-System Class Definition}
% A sub-system can be defined by the classes used to create it, and
% its set of derived failure modes.
% In this way sub-systems naturally form trees, with the lower most leaf nodes being
% base components.
% Note that the UML model is recursive. We can build functional groups using sub-systems
% as components. This UML model naturally therefore, forms a hierarchy
% of failure mode analysis, which has a one top level entry, that being the SYSTEM.
% The TOP level entry will determine the failure modes
% for the product/system under analysis.
%
% \subsection{Refining the UML model to use inheritance}
% We can refine this model a little by noticing that a system is merely the
% top level sub-system. We can thus have System inherit sub-system.
% A derived failure mode, is simply a failure mode at a higher level of analysis
% it can therefore inherit `failure\_mode'.
%
% The modified UML diagram using inheritance is figure \ref{fig:fmmd_uml2}.
% \begin{figure}[h]
%  \centering
%  \includegraphics[width=350pt,bb=0 0 877 675,keepaspectratio=true]{./fmmd_uml2.jpg}
%  % fmmd_uml2.jpg: 877x675 pixel, 72dpi, 30.94x23.81 cm, bb=0 0 877 675
%  \caption{UML Representation of Failure Mode Data Types}
%  \label{fig:fmmd_uml2}
% \end{figure}
% %
% % \begin{figure}[h]
% %  \centering
% %  \includegraphics[width=350pt,bb=0 0 680 500,keepaspectratio=true]{component_failure_modes_definition/fmmd_uml2.jpg}
% %  % fmmd_uml.jpg: 680x500 pixel, 72dpi, 23.99x17.64 cm, bb=0 0 680 500
% %  \caption{UML respresentation of Failure Mode Data types}
% %  \label{fig:fmmd_uml2}
% % \end{figure}


\section{Unitary State Component Failure Mode sets}

An important factor in defining a set of failure modes is that they
should be as clearly defined as possible.
%
It should not be possible, for instance, for
a component to have two or more failure modes active at once.

Having a set of failure modes where $N$ modes could be active simultaneously
would mean having to consider $2^N$ failure mode scenarios.
%
Should a component be analysed and simultaneous failure mode cases exist,
the combinations could be represented by new failure modes, or
the component should be considered from a fresh perspective,
perhaps considering it as several smaller components
within one package.
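
To illustrate the scale of the problem, consider a hypothetical component with $N = 3$ failure modes $a$, $b$ and $c$
(names assumed for this example). If the modes may be active in any combination, every subset of $\{a, b, c\}$ is a scenario:

$$ \emptyset,\ \{a\},\ \{b\},\ \{c\},\ \{a,b\},\ \{a,c\},\ \{b,c\},\ \{a,b,c\} $$

giving $2^3 = 8$ scenarios, where $\emptyset$ is the case with no failure mode active.
If at most one mode can be active at a time, only

$$ \emptyset,\ \{a\},\ \{b\},\ \{c\} $$

remain, that is $N + 1 = 4$ scenarios.
The first count doubles with every added failure mode, while the second grows by one.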


\begin{definition}
A set of failure modes where only one failure mode
can be active at a time is termed a `unitary~state' failure mode set.
The family of all such sets is termed $U$ throughout this study.
`Unitary~state' corresponds to the `mutually exclusive' definition in
probability theory\cite{probandstat}.
\end{definition}

We can define a function $FM()$ to
take a given component $K$ and return its set of failure modes $F$.

$$  FM : K \mapsto F $$

We can further define a set $U$ which is a set of sets of failure modes, where
the component failure modes in each of its members are unitary~state.
Thus if the failure modes of $F$ are unitary~state, we can say $F \in U$.


\section{Component failure modes : Unitary State example}

A component with simple `unitary~state' failure modes is the electrical resistor.

Electrical resistors can fail by going OPEN or SHORTED.

For a given resistor $R$ we can obtain its failure modes by applying
the function $FM$, thus $ FM(R) = \{R_{SHORTED},R_{OPEN}\} $.
A resistor cannot fail with both the OPEN and SHORTED conditions active at the same time. The conditions
OPEN and SHORTED are mutually exclusive.
Because of this the failure mode set $F=FM(R)$ is `unitary~state'.


Thus

$$ R_{SHORTED} \cap R_{OPEN} = \emptyset $$


We can make this a general case by taking a set $C$ representing a collection
of component failure modes, with $c_1, c_2 \in C$.
We can now state that


$$ C \in U \iff \forall c_1, c_2 \in C : c_1 \neq c_2 \Rightarrow c_1 \cap c_2 = \emptyset $$

That is to say, no pair of distinct failure modes may be active at the same time
if the failure mode set $C$ is to exist in the family of sets $U$.

Note that where there are more than two failure~modes, by banning pairs from happening at the same time
we have banned larger combinations as well.
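
The step from pairs to larger combinations can be checked with a short worked case.
Take a hypothetical set $C = \{a, b, c\}$ (names assumed for this example) in which every pair of failure modes is mutually exclusive:

$$ a \cap b = \emptyset, \quad a \cap c = \emptyset, \quad b \cap c = \emptyset $$

The combination of all three is contained in any one of the pairs, so

$$ a \cap b \cap c \subseteq a \cap b = \emptyset $$

and the three failure modes cannot all be active together.
The same argument applies to any combination of three or more failure modes, since each such combination contains at least one pair.
For a set of $N$ failure modes there are $N(N-1)/2$ pairs to check, for example $6$ pairs when $N = 4$.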


\section{Component Failure Modes and Statistical Sample Space}
%\paragraph{NOT WRITTEN YET PLEASE IGNORE}
A sample space is defined as the set of all possible outcomes.
Here the outcomes we are interested in are the failure modes
of the component.
When dealing with failure modes, we are not interested in
the state where the component is working perfectly or `OK' (i.e. operating with no error).
We are interested only in ways in which it can fail.
By definition while all components in a system are `working perfectly'
that system will not exhibit faulty behaviour.
Thus the statistical sample space $\Omega$ for a component/sub-system $K$ is
%$$ \Omega = {OK, failure\_mode_{1},failure\_mode_{2},failure\_mode_{3} ... failure\_mode_{N} $$
$$ \Omega(K) = \{OK, failure\_mode_{1},failure\_mode_{2},failure\_mode_{3}, \ldots ,failure\_mode_{N}\} $$
The failure mode set $F$ for a given component or sub-system
is therefore
$$ F = \Omega(K) \setminus \{OK\} $$
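
As a sketch, this can be written out for the resistor $R$ used earlier, with the failure mode names $R_{OPEN}$ and $R_{SHORTED}$ taken from that example:

$$ \Omega(R) = \{OK, R_{OPEN}, R_{SHORTED}\} $$

Removing the single outcome $OK$ leaves the failure mode set

$$ F = \{R_{OPEN}, R_{SHORTED}\} = FM(R) $$

If we assume, in addition, that the outcomes in $\Omega(R)$ are mutually exclusive and that no other outcome is possible,
then their probabilities sum to one:

$$ P(OK) + P(R_{OPEN}) + P(R_{SHORTED}) = 1 $$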

\clearpage

THIS SHOULD BE IN A DIFFERENT CHAPTER

\section{Current Methods for Safety Critical Analysis}


\subsection{Deterministic Approach}
\paragraph{NOT WRITTEN YET PLEASE IGNORE}
No single component fault may lead to a dangerous condition.
EN298, EN230 etc.

\subsection{Bayes Theorem}
\paragraph{NOT WRITTEN YET PLEASE IGNORE}
\label{bayes}
Describe application - likelihood of faults being the cause of symptoms -
probabilistic approach - no direct causation paths to the higher~abstraction fault mode.
Often, for instance, a component in a module within a module within a module etc
that has a probability of causing a SYSTEM level fault.

Used in FTA\cite{NASA}\cite{NUK}.
The idea being that probabilities can be assigned to components
failing, causing system level errors.

Problems: difficult to get reliable stats
for probability to cause because of small sample numbers...

FMMD approach can, by traversing down the tree, use known component failure figures
to get {\em accurate} probabilities and potential causes.
%$$ c1 \cap c2 \eq \emptyset  | c1 \neq c2 \wedge c1,c2 \in C \wedge C \in U  $$

%Thus if the failure~modes are pairwaise mutually exclusive they qualify for inclusion into the
%unitary~state set family.

\subsection{Safety Integrity Level Analysis}
\paragraph{NOT WRITTEN YET PLEASE IGNORE}
\label{sil}
This technique looks at all components in the parts list
and asks what the effect of the component failing will be.
Note that particular failure modes of the component are not considered.
The component can fail in any of its failure modes from the perspective of this analysis.
The analyst has to make a choice between four conditions:

\begin{itemize}
\item sd - A safe fault that is detected by an automated system
\item su - A safe fault that is undetected by an automated system
\item dd - A potentially dangerous fault that is detected by an automated system
\item du - A potentially dangerous fault that is not detected by an automated system
\end{itemize}
Actually this is almost how SIL analysis is done, because
the base components are listed
and their failure result classified as either sd, su, dd or du.

A formula is then applied according to the system architecture (1oo1, 2oo3, 3oo3 etc).

What is not done is to assign a probability to each of these conditions; the SIL analysis
person simply has to decide which it is.
Another fault in this is that it is very difficult to
extract meaningful stats
for how likely the detection systems are to pick the fault up, or even to introduce a fault of their own.

\subsection{Tests of Hypotheses and Significance}
\paragraph{NOT WRITTEN YET PLEASE IGNORE}
Linked in with Bayes theorem.
Accident analysis,
plane crashes and faults etc.
In high reliability systems the faults are often logged - strange occurrences -
processors resetting - what are the common factors - P values -
for instance very high voltage spikes can reset micro controllers -
but how do you correlate that with unshielded suppressed contactors...

Maybe looking at the equipment and seeing if there is a 5\%
level of the error being caused ?
i.e. using it to search for these conditions ?


Actually this could be used to refine the SIL method \ref{sil}
and give probabilities for the four conditions.