\ifthenelse {\boolean{paper}}
{
\abstract{ 
This paper proposes a methodology for
creating failure mode models of safety critical systems.
It uses a common notation
for the mechanical, electronic and software domains and applies an
incremental and rigorous approach.
%%
%% What I have done
%%
The four main static failure mode analysis methodologies were examined and
assessed in the context of newer European safety standards.
Some of the deficiencies identified in these methodologies led to
a wish list for a more rigorous methodology.
%%
%% What I have found
%%
From the wish list 
%and considering some constraints determined from
%the evaluation of the four established methodologies, 
a new
methodology is developed and proposed. 
This has been named Failure Mode Modular De-Composition (FMMD).

%% Sell it
%%
In addition to addressing the traditional weaknesses of
Fault Tree Analysis (FTA), Failure Mode and Effects Analysis (FMEA), Failure Mode, Effects and Criticality Analysis (FMECA)
and Failure Modes, Effects and Diagnostic Analysis (FMEDA), FMMD provides the means to model multiple failure mode scenarios,
as specified in newer European safety standards \cite{en298}.
The proposed methodology is bottom-up and guarantees that no component failure mode is left unhandled.
It is also modular, meaning that the results of analysed components may be re-used in other projects.
}
}
{
 %%% CHAPTER INTO NEARLT THE SAME AS ABSTRACT

This chapter proposes a methodology for
creating failure mode models of safety critical systems.
It uses a common notation
for the mechanical, electronic and software domains and applies an
incremental and rigorous approach.
%%
%% What I have done
%%
The four main static failure mode analysis methodologies were examined and
assessed in the context of newer European safety standards.
Some of the deficiencies identified in these methodologies led to
a wish list for a more ideal methodology.
%%
%% What I have found
%%
From the wish list %
%and considering some constraints determined from
%the evaluation of the four established methodologies,
a new
methodology is developed and proposed. 
This has been named Failure Mode Modular De-Composition (FMMD).

%% Sell it
%%
In addition to addressing the traditional weaknesses of
Fault Tree Analysis (FTA), Failure Mode and Effects Analysis (FMEA), Failure Mode, Effects and Criticality Analysis (FMECA)
and Failure Modes, Effects and Diagnostic Analysis (FMEDA), FMMD provides the means to model multiple failure mode scenarios,
as specified in newer European safety standards \cite{en298}.
The proposed methodology is bottom-up and guarantees that no component failure mode is left unhandled.
It is also modular, meaning that the results of analysed components may be re-used in other projects.

}


\section{Current Static Failure Mode Methodologies}

There are four methodologies in common use for failure mode modelling.
These are FTA, FMEA, FMECA
and FMEDA (a form of statistical assessment).
%
These methodologies date from the 1940s onwards, and were designed for
different application areas and reasons; all have drawbacks and
advantages that are discussed in the next section.
%In short
%FTA, due to its top down nature, can overlook error conditions. FMEA and the Statistical Methods
%lack precision in predicting failure modes at the SYSTEM level.

\paragraph{FMMD in context.}
Failure Mode Modular De-composition
(FMMD) aims to address the
weaknesses in the four established methodologies, and to add
features such as the ability to analyse multiple
failure mode scenarios, and to allow modular re-use
of analysis.

%FMMD is an incremental bottom up FMEA process. 
%% TERRIBLE PARAGRAPH
The FMMD
methodology provides a detailed, hierarchical, incremental and analytical
modelling system which will create a failure mode model from which
the data models from FTA, FMEA, FMECA and FMEDA % (the statistical approach) 
can be 
derived. % if required. 
An FMMD model is effectively a superset of the four traditional models.
It also focuses on component interaction within the model,
something not formally considered in the four established methodologies.
%
In addition, it applies rigorous checking at every analysis stage,
ensuring that all component failure modes are considered in the model.

%
\paragraph{FMMD Process outline.}
This methodology has been named Failure Mode Modular De-composition (FMMD)
because it decomposes a SYSTEM into a hierarchy of modules or {\dc}s.
This
\ifthenelse {\boolean{paper}}
{
paper
}
{
chapter
}
presents the design considerations that motivated and provided the specification for
the FMMD methodology.
%
It first reviews the four traditional
static failure mode analysis methodologies and 
lists their known weaknesses. A wish list is then drawn up
addressing these weaknesses and adding some extra requirements.
Using this wish list the philosophy for the new methodology
is determined.
%
FMMD works from the bottom up, taking small groups
of components, {\fgs}, and then analysing how they can fail.
\input{./shortfg}

\paragraph{Micro Vs. Macro failure mode analysis.}
This analysis is performed using FMEA from a micro rather than a macro perspective.
Thus instead of looking at component failure modes and determining how
they {\em may} cause  a failure at SYSTEM level, we are looking at how
they {\em will} affect the component's local {\fg}.
When we know the failure modes of a {\fg} we can treat it as a `black box'
or {\dc}. With {\dc}s we can build {\fgs} 
at higher levels of analysis, until we have a complete
hierarchy representing the failure behaviour of the SYSTEM.
%
Because all the failure modes of all the components
are held in a computer program, we can determine if the model is complete
(i.e. all component failure modes have been included in the model).
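%
As a sketch of this completeness check (the set names are illustrative only,
and are not part of the FMMD notation), let $F$ be the set of all failure modes
of all base components and let $M \subseteq F$ be the subset that has been
included in the analysis. The model is complete when
\[ F \setminus M = \emptyset . \]
Any member of $F \setminus M$ is a component failure mode that has not yet been handled.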


%OK need to describe the need for it
\section{The need for a new failure mode modelling methodology}

%%- There are dificulties with bot up methodologies,
%%- and this is in part due to the fact that accidents 
%%- are always unforseen and unexpected.

%%- what do we have ENV factors, component failure modes.

%%- how difficult is it to take a single component failure mode and
%%- then from that determine how it will react with other components
%%- and how it will be affected 

\subsection{General comments on bottom-up and top down approaches}

\paragraph{A general deficiency in top-down systems analysis.}
With a top down approach the investigator has to determine
a set of undesirable outcomes or `accidents'.
As most accidents are unexpected and their causes unforeseen \cite{safeware},
it is fair to say that a top down approach is not guaranteed to
predict all possible undesirable outcomes.
Top-down methodologies can also miss known component failure modes,
simply by not decomposing down to the base component failure mode level of detail.

\paragraph{A general problem with bottom-up static failure analysis.}
With the bottom up techniques we have all the known component failure modes
and the relative freedom to determine how each of these may affect the SYSTEM.
%
A problem with this is that a component typically 
interacts in a complex way with several other functionally
adjacent components.
%
To take a component failure mode and then attempt to tie that
to a SYSTEM level outcome is very difficult.
%
%
The number of components
a failure mode under investigation might interact with is typically very large. 
This makes it very difficult to predict the effects of a component 
failure mode, because we have to decide which components it could affect,
or
in other words, which components are functionally adjacent to it.
%
We cannot consider all the components in the SYSTEM
when looking at a single failure mode,
and therefore human judgement must be used to 
decide which interactions could be important.

Let $N$ be the number of components in our system, and $K$ be the average number of component failure modes
(ways in which the base~component can fail). The total number of base component failure modes
is $N \times K$. The number of checks needed to examine the effect that each failure mode has on all
the other components\footnote{A base component failure will typically affect the sub-system
it is part of, and create a failure effect at the SYSTEM level.}
is $(N-1) \times N \times K$, in effect a very large set cross product.


Complicate this further with applied states or environmental conditions
and another order of cross product of complexity is added.
We may, for instance, have a piece of self checking circuitry that
has two states, normal and testing mode, commanded by a logic line.
Or we may have a mechanical device that has a different
failure mode behaviour for, say, different ambient pressures or temperatures.

If $E$ is the number of applied states or environmental conditions to consider
in a system, the bottom-up analyst is presented with an
additional factor, giving
$(N-1) \times N \times K \times E$ checks.
If we put some typical figures for a very small embedded system\footnote{These figures would
be typical of a very simple temperature controller, with a micro-controller, sensor
and heater circuit.} into this, say $N=100$, $K=2.5$ and $E=10$,
we have $99 \times 100 \times 2.5 \times 10 = 247500$.
To look in detail at a quarter of a million test cases is obviously impractical.

If we were to consider multiple simultaneous failure modes,
we have yet another cross product of checks to be performed.
%
For instance, looking at double simultaneous failure modes, where $\#C$
is the number of checks to perform,
the equation reads $\#C = (N-2) \times (N-1) \times N \times K \times E$.
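%
As an illustration, substituting the example figures used above
($N=100$, $K=2.5$ and $E=10$) gives
\[ \#C = 98 \times 99 \times 100 \times 2.5 \times 10 = 24255000 , \]
that is, 98 times the single failure mode figure of 247500.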

The bottom-up methodologies FMEA, FMECA and FMEDA take single failure modes and link them
to SYSTEM level failure modes. Because of the astronomical number of possible interactions,
some valid ones are in danger of being missed. We can term this analysis a `leap~of~faith'
(i.e. leaping from the
component failure mode to the SYSTEM level).


\paragraph{Ideal static failure mode methodology.}
An ideal static failure mode methodology would build a failure mode model
from which the traditional four models could be derived.
It would address the shortcomings in the other methodologies, and
would have a user friendly interface, with a visual (rather than symbolic) syntax using icons
to represent the results of analysis phases.
%
%There are four static analysis failure mode methodologies in common use.
%Each has its advantages and drawbacks, and each is suited for 
%a different phase in the product life cycle.
The four methodologies in current use are discussed briefly below.

\subsection { FTA }

This, like all top~down methodologies, introduces the very serious problem
of missing component failure modes \cite[Ch.9]{faa}.
%, or modelling at
%a too high level of failure mode abstraction.
FTA was invented for use on the Minuteman nuclear defence missile
systems in the early 1960s and was not designed as a rigorous
fault/failure mode methodology.
It was designed to look for disastrous top level hazards and
determine how they could be caused.
It is more like a procedure to
be applied when discussing the safety of a system, with a top down hierarchical
notation using logic symbols that guides the analysis.
This methodology was designed for
experienced engineers sitting around a large diagram and discussing the safety aspects.
Also, the nature of a large rocket with red wire and remote detonation
failsafes meant that the objective was to iron out common failures,
not to rigorously detect all possible failures.
Consequently it was not designed to guarantee coverage of all component failure modes,
and has no rigorous in-built safeguards to ensure coverage of all possible
system level outcomes.

\subsubsection{ FTA weaknesses }
\begin{itemize}
\item Complex component interaction effects are by definition modelled by FTA, but because of the top down approach, not all
base component failure modes are guaranteed to be included in the model.
\item Possibility to miss environmental effects.
\item No possibility to model base component level double failure modes.
\end{itemize}

\subsection { FMEA }

\label{pfmea}
This is an early static analysis methodology, and concentrates
on SYSTEM level errors which have been investigated.
The investigation will typically point to a particular failure
of a component.
The methodology is then applied to find the significance of the failure.
It is based on a simple equation where $S$ ranks the severity (or cost \cite{bfmea}) of the identified SYSTEM failure,
$O$ its occurrence\footnote{The occurrence $O$ is the
probability of the failure happening.},
and $D$ the failure's detectability\footnote{Detectability: often failures
may occur but not be noticed or cause an effect.
Consider an unused feature failing.}. Multiplying these
together
gives a risk priority number (RPN), given by $RPN = S \times O \times D$.
This gives in effect
a prioritised `todo list', with higher $RPN$ values being the most urgent.
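%
As a worked illustration, assume that each factor is ranked on a scale of 1 to 10
(a common convention, assumed here rather than taken from the description above)
and take the hypothetical rankings $S=8$, $O=3$ and $D=5$:
\[ RPN = 8 \times 3 \times 5 = 120 . \]
A second failure ranked $S=4$, $O=3$ and $D=5$ would give $RPN = 60$,
and would therefore sit lower on the list.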


\subsubsection{ FMEA weaknesses }
\begin{itemize}
\item Possibility to miss the effects of base component failure modes at SYSTEM level
 (because it is each individual component, rather than each of its failure modes, that is considered for analysis).
\item Possibility to miss environmental effects.
\item Complex component interaction effects can be missed.
\item No possibility to model base component level double failure modes.
\end{itemize}

\paragraph{Note.} FMEA is sometimes used in its literal sense, that is to say
Failure Mode Effects Analysis, simply looking at a system's internal failure
modes and determining what may happen as a result.
The FMEA described in this section (\ref{pfmea}) is sometimes called `production FMEA'.

\subsection{FMECA}

Failure mode, effects, and criticality analysis (FMECA) extends FMEA by adding a criticality factor.
This is a bottom up methodology, which takes component failure modes
and traces them to the SYSTEM level failures. 
%
Reliability data for components is used to predict the
failure statistics in the design stage.
An openly available source of reliability data for generic
electronic components was published by the DOD
in 1991 (MIL-HDBK 1991 \cite{mil1991}) and is a typical
source for MTTF data.
%
FMECA has a probability factor for a component error becoming % causing 
a SYSTEM level error.
This is termed the $\beta$ factor.
%\footnote{for a given component failure mode there will be a $\beta$ value, the
%probability that the component failure mode will cause a given SYSTEM failure}.
%
This lacks precision, or in other words, failure outcome prediction accuracy \cite{fafmea},
as often the component failure mode cannot be proven to cause a SYSTEM level failure, but is
assigned a probability $\beta$ factor by the design engineer. The use of a $\beta$ factor
is often justified using Bayes' theorem \cite{probstat}.
%Also, it can miss combinations of failure modes that will cause SYSTEM level errors.
%
The results of FMECA are similar to FMEA, in that component errors are
listed according to importance, based on 
probability of occurrence and criticality.
% to prevent the SYSTEM fault of given criticallity.
Again this essentially produces a prioritised `todo' list.

%%-WIKI- Failure mode, effects, and criticality analysis (FMECA) is an extension of failure mode and effects analysis (FMEA).
%%-WIKI- FMEA is a a bottom-up, inductive analytical method which may be performed at either the functional or 
%%-WIKI- piece-part level. FMECA extends FMEA by including a criticality analysis, which is used to chart the 
%%-WIKI- probability  of failure modes against the severity of their consequences. The result highlights failure modes with relatively high probability 
%%-WIKI- and severity of consequences, allowing remedial effort to be directed where it will produce the greatest value. 
%%-WIKI- FMECA tends to be preferred over FMEA in space and North Atlantic Treaty Organization (NATO) military applications, 
%%-WIKI- while various forms of FMEA predominate in other industries.


\subsubsection{ FMECA weaknesses }
\begin{itemize}
\item Possibility to miss the effects of failure modes at SYSTEM level.
\item Possibility to miss environmental effects.
\item The $\beta$ factor is based on heuristics and does not reflect any rigorous calculations.
\item Complex component interaction effects can be missed.
\item No possibility to model base component level double failure modes.
\end{itemize}


\subsection { FMEDA }

Failure Modes, Effects, and Diagnostic Analysis (FMEDA)
% This 
is a process that takes all the components in a system,
and using the  failure modes of those components, the investigating engineer
ties them to possible SYSTEM level events/failure modes.
%
This technique
evaluates a product's statistical level of safety,
taking into account its self-diagnostic ability.
The calculations and procedures for FMEDA are
described in EN61508
\cite[Part 2 App C]{en61508}.
The following gives an outline of the procedure.


\subsubsection{Two statistical perspectives}
\ifthenelse {\boolean{paper}}
{
FMEDA is a statistical analysis methodology and is used from one of two perspectives,
Probability of Failure on Demand (PFD), and Probability of Failure
in continuous Operation, or Failure in Time (FIT).

\paragraph{Failure in Time (FIT).} Continuous operation is measured in failures per billion ($10^9$) hours of operation.
For a continuously running nuclear power station, industrial burner or aircraft engine
we would be interested in its operational FIT values.

\paragraph{Probability of Failure on Demand (PFD).} For an anti-lock system in
automobile braking, or any other fail safe measure applied in an emergency, we would be interested in PFD,
that is to say, the probability that it fails
to operate correctly on demand.
}
{
FMEDA is a statistical analysis methodology and is used from one of two perspectives,
Probability of Failure on Demand (PFD) (see \ref{survey:pfd}), and Probability of Failure
in continuous Operation, or Failure in Time (FIT) (see \ref{survey:fit}).
}

\subsubsection{The FMEDA Analysis Process}

\paragraph{Determine SYSTEM level failures from base components}
The first stage is to apply FMEA to the SYSTEM.
%
Each component is analysed in terms of how its failure
would affect the system.
Failure rates of individual components in the SYSTEM
are calculated based on component type and
environmental conditions. The SYSTEM errors are categorised as `safe' or `dangerous'.
%
%Statistical data exists for most component types \cite{mil1992}.
%
%This phase is typically implemented on a spreadsheet
%with rows representing each component. A typical component spreadsheet row would
%comprise of 
%component type, placement, 
%part number, environmental stress factors, MTTF, safe/dangerous etc.
%%will be a determination of whether the component failing will lead to a `safe'
%or `unsafe' condition.

%\paragraph{Overall SYSTEM failure rate.}
%The product failure rate is the sum of all component
%failure rates.  Typically the sum of all MTTF rates for all
%components in an FMEDA spreadsheet.
%This is the sum of safe and unsafe
%failures.

\paragraph{Self Diagnostics.}
We  next evaluate the SYSTEM's self-diagnostic ability.

%Each component’s failure modes and failure rate are now available.
Failure modes are now classified  as safe or dangerous.
This is done by taking a component failure mode and determining
if the SYSTEM error it is tied to is dangerous or safe.
The decision for this may be 
based on heuristics or field data.
%EN61508 uses the $\lambda$ symbol to represent probabilities.
%Because we have statistics for each component failure mode,
%we can now now classify these in terms of safe and dangerous lambda values.
%Detectable failure probabilities are labelled `$\lambda_D$' (for
%dangerous) and  `$\lambda_S$' (for safe) \cite{en61508}.

\paragraph{Determine Detectable and Undetectable Failures.}
Each safe and dangerous failure mode is now
classified as detectable or undetectable.
For the higher integrity levels, EN61508 assumes that products have a high proportion of
self checking features.
%
This gives us four failure mode classifications:
Safe-Detected (SD), Safe-Undetected (SU), Dangerous-Detected (DD) or Dangerous-Undetected (DU),
and the probabilistic failure rate of each classification
is represented by lambda variables
(i.e. $\lambda_{SD}$, $\lambda_{SU}$, $\lambda_{DD}$, $\lambda_{DU}$).

Because it is recognised that some failure modes may  not be discovered theoretically during the static
analysis, the
% admission of how daft it is to take a component failure mode on its own
% and guess how it will affect an ENTIRE complex SYSTEM 
% Admission of failure of the process really !!!!
next step  is to investigate using an actual working SYSTEM. 
%
Failures are deliberately caused (by physical intervention), and any new SYSTEM level
failures are recorded.
Heuristics and MTTF failure rates for the components
are used to calculate probabilities for these new failure modes
along with their safety and detectability classifications (i.e.
$\lambda_{SD}$, $\lambda_{SU}$, $\lambda_{DD}$, $\lambda_{DU}$).
These new failures are then added to the model.
%SD, SU, DD, DU.

%With these classifications, and statistics for each component
%we can now calculate statistics for the diagnostic coverage (how good at `self checking' the system is)
%and its safe failure fraction (how many of its failures are self detected or safe compared to
%all failures possible).
%
%The calculations for these are described below.

%\paragraph{Diagnostic Coverage.}
%The diagnostic coverage is simply the ratio
%of the dangerous detected probabilities
%against the probability of all dangerous failures,
%and is normally expressed as a percentage. 
%%$\Sigma\lambda_{DD}$ represents
%the percentage of dangerous detected base component failure modes, and 
%$\Sigma\lambda_D$ the total number of dangerous base component failure modes.
%
%$$ DiagnosticCoverage = \Sigma\lambda_{DD} / \Sigma\lambda_D $$
%
%The diagnostic coverage for safe failures, where  $\Sigma\lambda_{SD}$ represents the percentage of
%safe detected base component failure modes,
%and $\Sigma\lambda_S$ the total number of safe base component failure modes,
%is given as
%
%$$ SF = \frac{\Sigma\lambda_{SD}}{\Sigma\lambda_S} $$
%
%
\paragraph{Safe Failure Fraction.}
A key concept in FMEDA is Safe Failure Fraction (SFF).
This is the ratio of the safe and dangerous-detected failure rates
to the total of all safe and dangerous failure rates.
%Again this is usually expressed as a percentage.
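%
Using the failure rate classifications introduced above, this can be written as
\[ SFF = \frac{\Sigma\lambda_{SD} + \Sigma\lambda_{SU} + \Sigma\lambda_{DD}}
              {\Sigma\lambda_{SD} + \Sigma\lambda_{SU} + \Sigma\lambda_{DD} + \Sigma\lambda_{DU}} , \]
where each sum is taken over all component failure modes in the SYSTEM.
As a purely hypothetical illustration (the figures are invented for the example),
summed rates of $\Sigma\lambda_{SD}=40$, $\Sigma\lambda_{SU}=30$, $\Sigma\lambda_{DD}=25$
and $\Sigma\lambda_{DU}=5$ failures per $10^9$ hours give
\[ SFF = \frac{40 + 30 + 25}{40 + 30 + 25 + 5} = 0.95 . \]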

%$$ SFF = \big( \Sigma\lambda_S + \Sigma\lambda_{DD} \big) / \big( \Sigma\lambda_S + \Sigma\lambda_D \big) $$

%This is the ratio of 
%Step 4 Calculate SFF, SIL and PFD
%The SIL level of the product is finally determined from the Safe Failure Fraction (SFF) and the Probability of Failure on Demand (PFD). The following formulas are used.
%SFF = (lSD + lSU + lDD) / (lSD + lSU + lDD + lDU)
%PFD = (lDU)(Proof Test Interval)/2 + (lDD)(Down Time or Repair Time)

% Often a given component failure mode there will be a $\beta$ value, the
% probability that the component failure mode will cause a given SYSTEM failure.

%\paragraph{Risk Mitigation}
%
%The component may be have its risk factor 
%reduced by the checking interval (or $\tau$ time between self checking procedures).
%
%Ultimately this technique calculates a risk factor for each component.
%The risk factors of all the components are summed and 
%%give a value for the `safety level' for the equipment in a given environment.


\paragraph{Classification into Safety Integrity Levels (SIL).}
There are four SIL levels, from 1 to 4, with 4 being the highest safety level.
In addition to probabilistic risk factors, the
diagnostic coverage and SFF
have threshold bands, becoming stricter for each level.
Demanded software verification and specification techniques and constraints
(such as language subsets, s/w redundancy etc.)
also become stricter for each SIL level.
%%
%% Andrew asked me to expand on this here, but it would take at least two
%% pages. I think its more appropriate for the survey.tex chapter.
%%

Thus FMEDA uses statistical methods to determine 
a safety level (SIL), typically used to meet an acceptable risk 
value, specified for the environment the SYSTEM must work in.
EN61508 defines, in general terms,
risk assessment and required SIL levels \cite[5 Annex A]{en61508}.

%the probability of
%failures occurring, and provide an adaquate risk level.
%
%A component failure mode, given its MTTF
%the probability of detecting the fault and its safety relevant validation time $\tau$,
%contributes a simple risk factor that is summed
%in to give a final risk result. 
%
Thus an FMEDA 
model can be implemented on a spreadsheet, where each component
has a calculated risk, a fault detection time (if any), an estimated risk importance
and other factors such as de-rating and environmental stress.
With one component failure mode per row, 
all the statistical factors for SIL rating can be produced\footnote{A SIL rating will 
apply to an installed plant, i.e. a complete installed and working SYSTEM. 
SIL ratings for individual components or 
sub-systems are meaningless, and the nearest equivalent would be the 
FIT/PFD and SFF and diagnostic coverage figures.}.


\subsubsection{FMEDA and failure outcome prediction accuracy.}
FMEDA suffers from the same
lack of component failure mode outcome prediction accuracy as FMEA (section \ref{pfmea}).
%
This is because the analyst has to decide how particular components failing will impact on the SYSTEM or top level.
This involves a `leap of faith'. For instance, a resistor failing in a sensor circuit
may be part of a critical monitoring function.
The analyst is now put in a position
where they probably should assign a dangerous failure classification to it.
%
There is no analysis
of how that resistor would/could affect the components close to it, but because the circuitry
is part of a critical section it will most likely
be linked to a dangerous system level failure in an FMEDA study.
%
%%- IS THIS TRUE IS THERE A BETA FACTOR IN FMEDA????
%%-
%A $\beta$ factor, the heuristically defined probability
%of the failure causing the system fault may be applied.
%
%In FMEDA there is no detailed analysis of the failure mode behaviour
%of the component in its local environment
%Component failure modes are traceable directly to the SYSTEM level. 
%it becomes more
%guess work than science.
%
With FMEDA, there is no rigorous cause and effect analysis for the failure modes
and how they interact on the micro scale (the components adjacent to them in terms of functionality). 
Unintended side effects that lead to failure can be missed. 
Also component failure modes that are not
dangerous, may be wrongly assigned as dangerous simply because they exist in a critical
section of the product.

% some critical component failure 
%modes, but we can only guess, in most cases what the safety case outcome
%will be if it occurs.

This leads to the practice of having components within a SYSTEM partitioned into different
safety level zones, as recommended in EN61508 \cite{en61508}. This is a vague way of determining
safety, as it can miss effects due to unexpected component interaction.

The Statistical Analysis methodology is the core philosophy
of the Safety Integrity Levels (SIL) embodied in EN61508 \cite{en61508}
and its international counterpart, IEC~61508.


\subsubsection{ FMEDA weaknesses }
\begin{itemize}
\item Possibility to miss the effects of failure modes at SYSTEM level.
\item Statistical nature allows a proportion of undetected failures for a given SIL. These could be catastrophic failures; as long as the perceived probability is low enough, they are considered acceptable under EN61508.
\item Complex component interaction effects are more likely to be seen than with FMEA or FMECA (because self diagnostic capability is considered), but can still be missed.
\item Allows a small proportion of `undetectable' error conditions.
\item No possibility to model base component level double failure modes.
\end{itemize}
%AND then how we can solve all there problems

\section{A wish list for a failure mode methodology}
\begin{itemize}
\item All component failure modes must be considered in the model.
\item It should be easy to integrate mechanical, electronic and software models \cite[p.~287]{sccs}.
\item It should be re-usable, in that commonly used modules can be re-used in other designs/projects.
\item It should have a formal basis, that is to say, be able to produce mathematical proofs
for its results, such as system level error causation trees, reliability and safety statistics.
\item It should be easy to use, ideally using a 
graphical syntax (as opposed to a formal symbolic/mathematical text based language).
\item From the top down, the failure mode model should follow a logical de-composition of the functionality
to smaller and smaller functional groupings \cite{maikowski}.
\item Multiple failure modes may be modelled from the base component level up.
\end{itemize}


\section{Design of a new static failure mode based methodology}

\paragraph{New methodology must be bottom-up.}
In order to ensure that all component failure modes have been covered
the methodology will have to work from the bottom-up
and start with the component failure modes.
%
\paragraph{Natural Fault Finding is top down.}
The traditional, or natural, approach to fault finding
is to start at the top with SYSTEM level failure modes/faults.
%
On encountering a 
fault, the symptom is first observed at the top or
SYSTEM level. By decomposing the functionality of the faulty system and testing
we can further decompose the system until we find the
faulty base level component.
Decomposition of electrical circuits is formalised and explored
in \cite{maikowski}. This top down technique de-composes by functionality.
Simpler and simpler functional groups are discovered as we delve
further into the way the system works and is built.


\paragraph{Need for a `bottom-up' system de-composition.}
There is an apparent conflict here. The natural way to 
de-compose a system is from the top down.
%
If we do this though, we do not naturally include
all failure modes in the modules determined as we
de-compose downwards.
%
What is required here is to mimic this top-down de-composition
with a bottom up technique.
By doing that, we can take all base component failure modes
and ensure they are included in the model.

By taking components that form {\fg}s from the bottom up
and then taking those to form higher level
{\fg}s we can get a close approximation of the de-composition process from the bottom up.
The philosophy of top down de-composition is very similar.
Top down de-composition applies functional 
de-composition, because it seeks to break the system down
into manageable and separately testable entities.
A second justification for this is that the design process for a product requires both top down and bottom-up
thinking. To analyse a system from the bottom-up is a useful
design validation process in itself \cite{sommerville}.
%%
%% CAN we find a ref for both top and bottom up being used 
%% as design validation ????

\paragraph{Design Decision: Methodology must be bottom-up.}
In order to ensure that all component failure modes are handled,
this methodology must start at the bottom, with base component failure modes.
In this way automated checking can be applied to all component failure modes
to ensure none have been inadvertently excluded from the process.

\paragraph{Problems with functional group hierarchy.}
A hierarchy of functional groups leading to a system model
still leaves us with the problem of the number of component failure modes.
The base components will typically have several failure modes each,
and a typical embedded system may have hundreds of components.
This means that we would still have to tie a very large number of base component failure modes
to SYSTEM level errors. 
The problem with this is that the effects of the base component failure mode under investigation
are not rigorously examined in relation to functionally adjacent components.
Thus the criticism levelled at the FTA, FMEDA and FMECA methodologies, that it is possible to miss failure mode effects
at the much higher SYSTEM level, still applies.
%%%
%%% OK Got up to here Lunchtime edit 06DEC2010.............

\paragraph{Design Decision: Methodology must collate errors at each functional group stage.}
SYSTEMS typically have far fewer failure modes than the sum of their base component failure modes.
SYSTEM level failures may be caused by a variety of component failure modes.
A SYSTEM level failure mode is an abstracted failure mode, in that
it is a symptom of some lower level failure or failures.
Tracing the SYSTEM level failure or symptom, down through
a decomposed system, will give a fault tree. This will typically
trace the SYSTEM level failure mode to some individual base component failures
or combinations thereof.
% ABSTRACTION
For instance a failed resistor in a sensor is, at the base component level, a specific 
failure mode. 
%
It could be called `RESISTOR 1 OPEN'. 
%
Now consider the symptom that `RESISTOR 1 OPEN' causes in the functional group comprising the sensor channel
of which RESISTOR 1 is a part.
%
We might call it a `READING~HIGH' failure. 
The fault has become less detailed and more general. There may be other 
causes of a `READING~HIGH'. We can say that the failure
mode `READING~HIGH' is more abstract in terms of the SYSTEM than `RESISTOR 1 OPEN'.
%
At a higher level still
this may be called `SENSOR CHANNEL 1' fault.
At a system level it may simply be a `SENSOR FAILURE'.
As we traverse up the fault tree the failure modes
become more abstract. 
%
At each functional group collection, there must be a process to collect
common symptoms and reduce the number of failure modes to handle.
This must be a process that incrementally reduces the number
of failure modes as the abstraction level reaches the SYSTEM level.

\paragraph{How to build a meaningful SYSTEM failure behaviour model.}
The next problem is how we build a failure mode model
that converges from a multitude of base
component failures to a finite set of SYSTEM level failure modes.
%
It would be better to analyse the failure mode behaviour of each
functional group, and determine the ways in which it, rather than its
components, can fail.
% 
By doing this, the symptoms of the {\fg}
(each of which can potentially be caused by more than one component failure mode) 
are naturally extracted.
%
The number of symptoms will be less than or equal to the number
of component failure modes, and in practice will usually be much smaller.
%
Thus stage by stage symptom collection becomes the key to reducing the number
of failure modes to handle as we traverse up the hierarchy.
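
Symptom collection can be sketched as a mapping. The notation below is an
assumption introduced only for this illustration.
Let $F$ be the set of component failure modes in a {\fg} and $S$ the set of
symptoms obtained from its analysis. If each failure mode is assigned exactly
one symptom, and every symptom arises from at least one failure mode,
collection is a surjective function
\[
\sigma : F \rightarrow S , \qquad \mbox{so that} \qquad |S| \leq |F| ,
\]
with equality only when no two failure modes share a symptom.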


\paragraph{Component failures and {\fg} failure symptoms.}
In other words we want to find out what the symptoms of the failures in the {\fg}s
are. 
%The number of symptoms of failure should be equal to or
%less than the number of component failure modes, simply because
%often there are several potential causes of failure symptoms.
%
When we have the symptoms, we can start thinking of the {\fg} as a component in its own right.
%with a simplified and reduced set of failure symptoms.
%
We can now create a new {\dc} whose failure modes
are the failure symptoms of the {\fg}.
%

By taking {\dcs} to form higher level functional groups
we can build a bottom-up model incrementally.
In this way as we build the hierarchy, we naturally abstract the 
failure mode behaviour, but can check that all failure modes in 
the hierarchy have been considered and tied to causing symptoms.


\paragraph{Incremental Stages and \dcs .}
We can use incremental stages to build the hierarchy.
We can take small {\fg}s of components, where the {\fg}
is a small set of components that perform a simple
task.
%
%The functional group should perform a clearly defined task.
The design engineer must choose the components that form a {\fg}.
It should be possible to consider the {\fg} as a component or
black box, performing a given function.
The {\fg} should be chosen to be as small 
(in terms of the number of components) as possible.
%
%Another advantage of the functional group being small
It should be small enough that all the failure
modes of its components can be comfortably analysed.
%
We can consider these failure modes from the perspective
of the {\fg}. In other words, for each component failure mode in the {\fg},
we create a `test case' and decide how each failure affects the functional group.
%
With the results from the test cases we will now have the ways in which the 
{\fg} can fail.
%
%
We can refine this further, by grouping the common symptoms, or results that
are the same failure {\wrt} the {\fg}.
%
We can now treat the {\fg} as a component in its own right, that is, a sub-system with a known set of failure modes.
%
We create a new {\dc} and assign it these common symptoms
as its failure modes.
%
This {\dc} can be used to build higher level
{\fg}s, and this will naturally form a hierarchy.
This hierarchy can be extended until it encompasses 
an entire system. 
%
It can be considered complete when
all failure modes from all components are included in the model
and all base component failure modes can be traced
through the fault tree to SYSTEM level failure modes.

\paragraph{Directed Acyclic Graph (DAG).} 
If we ensure that
derived components cannot be included in {\fg}s
of a lower abstraction level,
the data structure produced from collecting functional groups
and deriving components will naturally form a DAG.
In other words, we cannot allow a {\fg}
to include any component derived from it.

%
%
By representing the failure mode model as a DAG, we
can take each SYSTEM level failure mode
and determine the possible combinations of base component failure modes that
could have caused it.
This allows us to define a fault tree for each SYSTEM level failure.
%In FTA terminology, a list of possible
%causes for a SYSTEM level failure is known as a `cut set' \cite{nasafta}\cite{nucfta}.
If statistical models exist for the component failure modes,
these failure causation trees (or minimal cut sets\footnote{In FTA terminology a minimal cut set is a smallest
combination of base component failure modes that is sufficient to cause
the SYSTEM level failure. If a single base component failure mode can cause
a SYSTEM level error this is usually considered a liability.})
can be used to calculate Mean Time to Failure (MTTF) or 
Probability of Failure on Demand (PFD) figures.
Contrast the analytical capability of FMMD with the 
methodologies where the component failure modes are linked
directly to SYSTEM failure modes with no analysis stages in between.


\paragraph{Design Decision: A functional group cannot 
contain {\dc}s at a higher abstraction level than itself}

We can say that no component may be derived from itself directly 
or indirectly.
We can track the `abstraction level' by increasing it each time
there is a phase of symptom collection.
We can use the symbol $\alpha$ to represent the abstraction level
and make it an attribute of a component.
Base components will have an $\alpha$ level of zero.
A derived component when created must always have a greater $\alpha$ value than any
of the components included in the {\fg} from which it was derived.
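
One simple rule that satisfies this requirement, offered as a sketch rather
than as part of the definition of the methodology, is to set the level of a derived
component to one more than the highest level found in its {\fg}.
Writing $FG$ for the set of components in the {\fg} and $dc$ for the
component derived from it (both symbols are assumptions made for this illustration):
\[
\alpha(dc) = 1 + \max_{c \in FG} \alpha(c) .
\]
For example, a {\fg} containing two base components ($\alpha = 0$) and one
{\dc} with $\alpha = 2$ would yield a {\dc} with $\alpha = 3$.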


\paragraph{Natural Reduction in number of failure modes with abstraction level}
%
Because common symptoms are being collected, as we build the tree upward
the number of failure modes decreases (or exceptionally stays the same)
 at each level.\footnote{In the very unusual case where no two 
failure modes of a {\fg} share a common symptom,
the number of failure modes from its components would be the
same as the number of failure modes in the component derived from it.}
This decrease in the number of failure modes is borne out {\irl}.
Of the thousands of component failure modes in a typical product
there are generally only a handful of SYSTEM level failure modes
(or top level `symptoms' of underlying failures). 
%

\subsection{Outline of the FMMD process}
\label{fmmdproc}
FMMD builds {\fg}s of components from the bottom-up. 
The lowest level of components are termed base components.
These are the initial building blocks.
In electronics these would be the individual
passive and active components on the parts~list.
In mechanics these would be the levers, linkages, springs, cogs etc.
%
Functional groups are collections of components
that work together to perform a simple function.
%
We can perform a failure mode effects analysis on each of the component failure
modes within a {\fg}. Because we can implement the process in software we can 
thus ensure that all component failure modes 
are included in the model. 
%
We can then treat the {\fg} as a `black box' or component in its own right.
We can now look at how the {\fg} can fail. 
%
Many of the component failure modes will
cause the same failure symptoms in the {\fg}.
We can collect these failures as common symptoms.
%
When we have our set of symptoms, we can create
a {\dc}. The {\dc} will have as its set of failure
modes the collected symptoms of the {\fg}.
%
Because we now have {\dcs}, we can use these to form
new {\fg}s and so build a hierarchical `failure~mode' model of the SYSTEM.
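
As an illustration of one FMMD stage, consider a purely hypothetical {\fg}
of two resistors, R1 and R2, forming a sensor channel.
The failure modes and the symptoms assigned to them below are assumptions
chosen for illustration, not the result of analysing a real circuit.

\begin{center}
\begin{tabular}{|l|l|}
\hline
Test case (component failure mode) & Symptom of the {\fg} \\
\hline
R1 OPEN  & READING HIGH \\
R1 SHORT & READING LOW \\
R2 OPEN  & READING LOW \\
R2 SHORT & READING HIGH \\
\hline
\end{tabular}
\end{center}

Collecting common symptoms reduces the four component failure modes to two,
so the resulting {\dc} has the failure modes READING HIGH and READING LOW.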


%%- Need diagram of hierarchy
%%-
%%-
\begin{figure}[h]
 \centering
 \includegraphics[width=200pt,bb=0 0 331 249,keepaspectratio=true]{./fmmd_concept/fmmd_hierarchy.jpg}
 % fmmd_hierarchy.jpg: 331x249 pixel, 72dpi, 11.68x8.78 cm, bb=0 0 331 249
 \caption{Example derived component created from the functional group comprised of components a,b,c}
 \label{fig:fmmd_hierarchy}
\end{figure}

A {\fg} is a set of components (each with a set of failure modes) 
that work together to serve some purpose (to perform some function).
Derived components are determined
from analysis and symptom collection
of the {\fg}.

The {\dc} is equipped with a new set of failure modes 
corresponding to the symptoms from the {\fg}.

The diagram in figure \ref{fig:fmmd_hierarchy} shows one stage
of the FMMD process. The resultant {\dc} may be used to
create higher level {\fg}s in later stages.


We associate a component with its failure modes.
This is represented in UML in figure \ref{fig:component concept}.

\begin{figure}[h]
 \centering
 \includegraphics[width=200pt,keepaspectratio=true]{./fmmd_concept/component.jpg}
 % component.jpg: 467x76 pixel, 72dpi, 16.47x2.68 cm, bb=0 0 467 76
 \caption{Component with failure modes UML diagram}
 \label{fig:component concept}
\end{figure}


\subsection{Environmental Conditions, Operational States and FMMD}

Any real world sub-system will exist in a variable environment
and may have several modes of operation.
In order to find all possible failures, the sub-system 
must be analysed for each operational state
and environment condition that can affect it.
%
Two design decisions are required here: to which objects in the model should 
environmental conditions be applied, and to which should operational states be applied?
There are three objects in our model to which these considerations could be applied.
We could apply these conditions for analysis 
to the functional group, the components, or the derived
component.

\paragraph {Environmental Conditions and FMMD.}

Environmental conditions are external to the
{\fg} and are often things over which the system has no direct control.
Consider ambient temperature, pressure or even electrical interference levels.
%
Environmental conditions may affect different components in a {\fg}
in different ways.

For instance a system may be specified for
$0\oc$  to $85\oc$ operation, but some components
may show failure behaviour between $60\oc$  and $85\oc$ 
\footnote{Opto-isolators typically show a marked decrease in performance above
$60\oc$ \cite{tlp181}, whereas another common component, say a resistor, will be unaffected.}.
Other components may operate comfortably across the whole specified temperature range.
Environmental conditions will have an effect on the {\fg} and the {\dc},
but they will have specific effects on individual components.

\paragraph{Design Decision.}
Environmental constraints will be applied to components.
A component will hold a set of environmental states that
affect it. 
Environmental conditions will apply SYSTEM wide,
but may only affect specific components.
%Some may not be required for consideration
%for the analysis of particular systems.

\paragraph {Operational States and FMMD.}

Sub-systems may have specific operational states.
These could be a general health level such as 
normal operation, graceful degradation or lockout.
Or they could be self~checking sub-systems that are either in a normal or self~check state.

Operational states are conditions that apply to a functional group, not individual components.
%% Andrew says that that does no make sense But I think it does

\paragraph{Design Decision.}
Operational states will be applied to {\fg}s.

\paragraph{UML Model of FMMD Analysis}

The UML diagram in figure \ref{fig:env_op_uml} shows the data
relationships between {\fgs} and operational states, and component
failure modes and environmental factors.


\begin{figure}[h]
 \centering
 \includegraphics[width=400pt,bb=0 0 818 249,keepaspectratio=true]{./fmmd_concept/fmmd_env_op_uml.jpg}
 % fmmd_env_op_uml.jpg: 818x249 pixel, 72dpi, 28.86x8.78 cm, bb=0 0 818 249
 \caption{UML model of Environmental and Operational states w.r.t FMMD}
 \label{fig:env_op_uml}
\end{figure}


\subsection{Justification of wishlist}

By applying the methodology in section \ref{fmmdproc}, the wishlist can
now be evaluated for the proposed FMMD methodology.

\subsubsection{All component failure modes must be considered in the model.}
The proposed methodology will be bottom-up. 
This ensures that all component failure modes are handled.


\subsubsection{ It should be easy to integrate mechanical, electronic and software models.}
Because component failure modes are considered, we have a generic entity to model.
We can describe a mechanical, electrical or software component in terms of its failure modes. 
%
Because of this 
we can model and analyse integrated electro-mechanical systems, controlled by computers,
using a common notation.

\subsubsection{ It should be re-usable, in that commonly used modules can be re-used in other designs/projects.}
The hierarchical nature, taking {\fg}s and deriving components from them, means that 
commonly used {\dcs} can be re-used in a design (for instance self checking digital inputs)
or even in other projects where the same {\dc} is used.


\subsubsection{ It should have a formal basis, data should be available to produce mathematical proofs
for its results}
Because the failure mode model of a SYSTEM is a hierarchy of {\fg}s and derived components,
SYSTEM level failure modes are traceable back down the fault tree to
component level failure modes. This provides causation trees \cite{sccs}, or minimal cut sets,
for all SYSTEM failure modes.

\subsubsection{ It should be capable of producing reliability and danger evaluation statistics.}
MTTF and danger evaluation statistics can be computed for the minimal cut sets of the SYSTEM level failures,
using the component failure mode statistics \cite{mil1991}.
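
Under two common simplifying assumptions, neither of which is established here,
this calculation could take the following form.
If the base component failure modes are statistically independent, the probability
of a minimal cut set $C_i$ is the product of the probabilities of its members;
if those probabilities are also small, the probability of the SYSTEM level failure
is approximately the sum over its minimal cut sets (the rare event approximation).
The symbols $C_i$, $f$ and $P$ are notation assumed for this sketch.
\[
P(C_i) = \prod_{f \in C_i} P(f) , \qquad
P(\mbox{SYSTEM failure}) \approx \sum_{i} P(C_i) .
\]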

\subsubsection{ It should be easy to use, ideally 
using a graphical syntax (as opposed to a formal mathematical one).}
A modified form of constraint diagram (an extension of Euler diagrams) has
been developed to support the FMMD methodology.
This uses Euler circles to represent failure modes, and spiders to collect symptoms, to
advance a {\fg} to a {\dc}.


\subsubsection{ From the top down the failure mode model should follow a logical de-composition of the functionality
to smaller and smaller functional modules \cite{maikowski}.}
The bottom-up approach fulfils the  logical de-composition requirement, because the {\fg}s
are built from components performing a given task. 


\subsubsection{ Multiple failure modes may be modelled from the base component level up.}
By breaking the problem of failure mode analysis into small stages
and building a hierarchy, the problems associated with the cross products of
all failure modes within a system are reduced by an exponential order.
This is because the multiple failure modes are considered
within {\fgs} which have fewer failure modes to consider 
at each FMMD stage.
Where appropriate, multiple simultaneous failures can be modelled by
introducing test~cases where the conjunction of failure modes is considered.
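
A rough count, using invented figures, indicates the scale of the saving for
double simultaneous failures.
Suppose a system has 100 base components with 3 failure modes each, giving 300 failure modes.
Examining every unordered pair of distinct failure modes across the whole system requires
\[
\frac{300 \times 299}{2} = 44850
\]
test cases.
If instead the components are partitioned into 20 {\fg}s of 5 components,
each {\fg} has 15 failure modes and requires $\frac{15 \times 14}{2} = 105$
pairwise test cases, giving $20 \times 105 = 2100$ for the first FMMD stage.
Higher stages add further test cases, but these operate on the reduced sets of symptoms.
The figures are illustrative assumptions only, and the count ignores pairs that
cannot physically occur together.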

\subsubsection {Inhibit Conditions}
Some failure modes only occur when another failure has occurred, or
when an environmental condition reaches a critical value. This is specifically
dealt with in the FTA methodology~\cite[IV 9]{nucfta}.
An example FTA inhibit gate is shown in figure \ref{fig:inhibitconcept}.

 \begin{figure}
 \centering
 \begin{tikzpicture}[shorten >=1pt,->,draw=black!50, node distance=\layersep]
     \draw[style=thick];

     \tikzstyle{every pin edge}=[<-,shorten <=1pt]
     \tikzstyle{fmmde}=[circle,fill=black!25,minimum size=17pt,inner sep=0pt]
     \tikzstyle{fmmdt}=[ellipse,fill=red!15,minimum size=17pt,inner sep=0pt]
     \tikzstyle{fmmdc}=[rectangle,draw,fill=black!17,minimum size=17pt,inner sep=4pt]
     \tikzstyle{fmmdi}=[regular polygon,regular polygon sides=6,draw,fill=black!25,minimum size=50,inner sep=4pt]
     \tikzstyle{component}=[fmmde, fill=green!50];
     \tikzstyle{ctext}=[fmmde, draw, fill=black!20];
     \tikzstyle{failure}=[fmmde, fill=red!50];
     \tikzstyle{symptom}=[fmmde, fill=blue!50];
     \tikzstyle{inhibit}=[fmmdi, fill=blue!40];
     \tikzstyle{condition}=[fmmdc, fill=black!20];
     \tikzstyle{conjunction}=[fmmde, fill=red!40];
     \tikzstyle{annot} = [text width=4em, text centered]

 	\node[condition] (C-Q) at (0,-1) {Condition Q};
        \node[inhibit]   (I)   at (0,-4) {Inhibit};
        \node[ctext] (CC)  at (4,-4) {$\stackrel{ probability\; that}{  Q\; occurs\; given\; A}$};
	%\node[text]      (T)   at (2,-2) {Probability that Q occurs given A};
 	\node[condition] (C-A) at (0,-7) {Condition A};

	
     \path (C-A) edge (I);
     \path (CC) edge (I);
     \path (I) edge (C-Q);
     %\path (C-1b) edge (CJ);
     %\path (C-1b) edge (CJ);

 \end{tikzpicture}
 % End of code
 \caption{FTA `inhibit' gate}
 \label{fig:inhibitconcept}
\end{figure}

\paragraph{Static or Dynamic Modelling of Inhibit}
If the model is static, we can treat the conditional failure
as having a lower probability of occurring (i.e. the probability 
of A multiplied by the probability that Q occurs given A).
If we wish to model the conditional failure dynamically,
an attribute must be added to the failure~modes
that can reference other failure~modes and environmental conditions.
A UML diagram with inhibit conditions added is shown in figure \ref{fig:umlconcept2}.
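
For the static case, and reading the annotation in figure \ref{fig:inhibitconcept}
as a conditional probability, the probability of the inhibited failure could be written as
\[
P(Q \wedge A) = P(A) \, P(Q \mid A) .
\]
With invented figures for illustration, if $P(A) = 10^{-3}$ and
$P(Q \mid A) = 10^{-2}$, the conditional failure has a probability of $10^{-5}$.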

\subsection{Safe, Dangerous, Detected and Undetected}

The top level or SYSTEM failure modes can be examined and
assigned SIL~\cite{en61508} safe and dangerous attributes.
Detected failure modes are those whose symptoms have been 
incorporated into a self checking functional group.
Undetected failure modes will follow a direct line
up from component level to SYSTEM level without being
incorporated into a self checking functional group.
These undetected failures correspond to a minimal cut 
set where a single base~component failure mode
can be traced to a SYSTEM level failure mode.
They can thus be determined by searching the DAG
for minimal cut sets consisting of a single base~component failure mode~\cite{nucfta}.

% UML DIAGRAM  

\begin{figure}[h]
 \centering
 \includegraphics[width=400pt,keepaspectratio=true]{./fmmd_concept/fmmd_env_op_uml2.jpg}
 % fmmd_env_op_uml2.jpg: 866x313 pixel, 72dpi, 30.55x11.04 cm, bb=0 0 866 313
 \caption{UML diagram with Inhibit conditions}
 \label{fig:umlconcept2}
\end{figure}


\subsection{Aims of FMMD Methodology}

Taking the four current failure mode methodologies into consideration, and comparing them to the proposed FMMD methodology, the following wish list or aims can be stated.

\begin{itemize}
\item It can be checked automatically that all component failure modes have
 been considered in the model. Should a failure mode have been missed
the data model can be searched and the unhandled failure modes flagged to the design engineer.
\item Because we are modelling with failure modes, the {\fgs} and {\dcs} can be generic, 
i.e.  mechanical, electronic or software components.
\item The {\dcs} are re-usable, in that commonly used modules can be re-used in other designs/projects.
\item It will have a formal basis, that is to say, 
we have the data at hand to produce meaningful
results (MTTF and the cause trees for SYSTEM level faults).
\item Overall reliability and danger evaluation statistics can be computed. 
By knowing all causation trees,
the statistical probabilities (from base component data) for all causes can be simply added.
\item A graphical representation based on Euler diagrams is used. 
This provides an interface that does not involve
formal mathematical/symbolic notation.
This is intended to be user friendly and to guide the user through the FMMD process
while applying automatic checks for unhandled conditions.
\item From the top down the failure mode model will follow a logical de-composition of the functionality; by
choosing {\fg}s and working bottom-up this hierarchical trait will occur as a natural consequence.
\item Undetectable or unhandled failure modes will be specifically flagged.
\item It is possible to model multiple failure modes.
\end{itemize}


\ifthenelse {\boolean{paper}}
{
%paper
\pagebreak[4]
\section{Re-Factoring the UML Model}
The UML models thus far in this paper
have been used to develop the data relationships required to perform FMMD analysis.
This section re-organises and rationalises the UML model.
We want to be able to use {\dcs} in functional groups.
It therefore makes sense for {\dc} to inherit from {\em component}.

The re-factored UML diagram is shown in figure \ref{fig:refactored_uml}.


\begin{figure}[h]
 \centering
 \includegraphics[width=400pt,bb=0 0 702 464]{./master_uml.jpg}
 % master_uml.jpg: 702x464 pixel, 72dpi, 24.76x16.37 cm, bb=0 0 702 464
 \caption{Re-factored UML Diagram}
 \label{fig:refactored_uml}
\end{figure}

}
{
% chapter
\section{Re-Factoring the UML Model}
This chapter has used UML diagrams to develop the data types required to implement FMMD.
The terms used in FMMD and the UML data model are further refined in 
chapter \ref{defs}.
}

\section{Conclusion}

This 
\ifthenelse {\boolean{paper}}
{
paper 
}
{
chapter
}
provides the background to the need for a new methodology for
static analysis that can span the mechanical, electrical and software domains
using a common notation.
The author believes it addresses many shortcomings in current static failure mode analysis methodologies.
\vspace{60pt}
\today

%% $$\frac{-b\pm\sqrt{ {b^2-4ac}}}{2a}$$
%\today