
\label{sec:chap8}

This study has examined the processes and state of the art of the four main FMEA variants.
%
It has exposed shortcomings in these methodologies, which can be summed up as an inability to
model hybrid software and hardware systems in a satisfactory manner, a problem with state explosion
and difficulty of re-use of analysis because there is no support for modularity.
%
The FMECA and FMEDA variants also suffer from mixing subjective and objective assessments of failure modes.
%
A modularised FMEA, Failure Mode Modular De-composition (FMMD), was proposed.
%
This modularised version is supported by the
{\fms} already defined for {\bcs} in the literature~\cite{fmd91,mil1991,en298,en230}.
%
A selection of electronic examples was analysed using FMMD
which deliberately introduced varying circuit
topologies with conventional and circular signal paths
and mixed digital and analogue designs.
%
For all these examples, the state explosion related performance was compared with that of
traditional FMEA.
%
In all but trivial cases there was a performance gain:
the number of manual analysis operations to perform
was significantly reduced.
%
In addition, the analysis naturally provided modules which could be re-used,
not only in the circuit under analysis but potentially in different and future projects as well.

Traditional FMEA methods have been applied to software, but analysis has always been performed separately from
the electronic FMEA~\cite{sfmeaa,sfmea}. %, and while modular kept strictly to a bottom-up approach.
%
Using established concepts from contract programming~\cite{dbcbe}, FMMD was extended to analyse software,
which facilitated a solution to the software/hardware interfacing problem~\cite{sfmeainterface}.
%
Two examples of mixed software and hardware systems were analysed as integrated FMMD models
as proof of concept. The first example, in chapter~\ref{sec:chap6}, was
presented to the System Safety IET conference in 2012~\cite{syssafe2012}.
%
Chapter~\ref{sec:chap7} viewed FMMD from a formal perspective and looked at problems and constraints
necessary to perform FMEA and FMMD.
%
Theoretical performance models were developed (see section~\ref{sec:theoreticalperfmodel}) which showed that with increasing modularisation
the number of manual checks to perform for analysis fell, which was validated by examining the reasoning distance performance of
the examples from chapter~\ref{sec:chap5}. % in this regard.
%
A unitary state failure mode concept was developed (see section~\ref{sec:unitarystate}), and it was shown that
the FMMD process naturally enforced this throughout the hierarchy of a model.
%
Finally, the FMMD process was described algorithmically using set theory in appendix~\ref{sec:algorithmfmmd}.

In conclusion, a new method of failure analysis has been devised which improves on established techniques in the following ways:
% \begin{itemize}
%     \item Must be able to analyse hybrid software/hardware systems,
%     \item no state explosion (which has rendered exhaustive analysis impractical),
%     \item exhaustive checking at a modular level, %(total failure coverage within {\fgs} all interacting component and failure modes checked),
%     \item traceable reasoning system models,% to aid repeatability and checking,
%     \item re-usable i.e. it should be possible to re-use analysis,
%     \item possibility to analyse simultaneous/multiple failures,
%     \item modular --- i.e. usable in a distributed system.
%   % \item
% \end{itemize}

\begin{itemize}
 \item FMMD provides the means to create failure models that integrate software and hardware,
 \item the state explosion related to exhaustive FMEA is solved,
 \item a modular approach to FMEA means that analysis work is re-usable,
 %\item FMMD encourages
 \item distributed systems, and smart instruments, can now be analysed and assessed,
 \item multiple failures can be analysed (without an undue state explosion cost).
\end{itemize}
These benefits hold under the following assumptions and constraints:
\begin{itemize}
 \item Failure modes are available for all {\bcs},
 \item Analysts are capable of finding suitable {\fgs} from electronic schematics,
 \item Software is hierarchical and its elements (functions) can be modelled using contract programming.
 %\item
\end{itemize}


Whilst investigating FMMD, a number of further areas for research revealed themselves.
These are presented below.

%\section{Conclusion}

% It is the authors belief that the practise of FMEA would be improved by taking a modular approach
% and that it is necessary that software and hardware should be included in the same failure mode models.
% %
% The proposed methodology, FMMD, provides the means to do this, and it is the authors hope that this
% or a variant thereof is taken up and used to improve system safety.

\section{Further Work}
%This section describes areas that the study has revealed where the FMMD methodology may be extended or improved.
 \subsection{How traditional FMEA reports can be derived from an FMMD model}
%
An FMMD model has a data structure (described by UML diagrams, see figure~\ref{fig:cfg}), and by traversing an FMMD hierarchy
we can map system level failures back to {\bc} {\fms} (or combinations thereof).
%
Because we can determine these mappings we can produce reports in the traditional FMEA format ({\bc}~{\fm}~$\mapsto$~{system failure}).
%
With the addition of {\bc} {\fm} statistics~\cite{mil1991} we can provide reliability predictions for system level failures.
%
The Pt100 example is revisited for this purpose and analysed for single and double failures, with statistics for {\bcs}
taken from MIL-HDBK-217F~\cite{mil1991},
in section~\ref{sec:bcstats}.
%
With an FMMD failure mode model a top down perspective is possible.
%
We could for instance take each system level failure and produce a causation tree for it, tracing back
to all {\bc} {\fms}.
%
This is very closely related to the structure of FTA (top down) failure causation graphs.
%
The possibility of automatically producing FTA diagrams from FMMD models
is examined in section~\ref{sec:fta}.
%


\subsection{Statistics: From base component failure modes to System level events/failures}
\label{sec:bcstats}
Knowing the statistical likelihood of a component failing can give a good indication
of the reliability of a system, or in the case of dangerous failures, the Safety Integrity Level
of a system.
%
EN61508~\cite{en61508} requires that statistical data is available and used for all component failure modes
analysed by FMEDA.
%
FMMD, as a bottom-up methodology, can use component failure mode statistical data and incorporate it
into its hierarchical model.
%By way of example, the Pt100 analysis %example
%from section~\{sec:pt100} has been used to demonstrate this.
Because we can use an FMMD model to generate an FMEA report, with additional {\bc} failure mode statistics
we can %therefore
use FMMD to produce an FMEDA report.


\paragraph{Pt100 Example: Single Failures and statistical data} %Mean Time to Failure}

From an earlier example, the model for the failure mode behaviour of the Pt100 circuit,
we can add {\bc} {\fm} statistics and determine the probability of symptoms of failure.
%
The US Department of Defense (DOD) electronic component reliability
handbook MIL-HDBK-217F~\cite{mil1991} gives formulae for calculating
the number of failures per ${10}^6$ hours
for a wide range of generic components\footnote{These figures are based on components from the 1980s, and MIL-HDBK-217F
can give conservative reliability figures when applied to
modern components}.
%
Using the MIL-HDBK-217F~\cite{mil1991} specifications for resistor and thermistor
failure statistics, we calculate the reliability of the Pt100 example (see section~\ref{sec:Pt100}).


\paragraph{Resistor FIT Calculations}

The formula given in MIL-HDBK-217F~\cite[9.2]{mil1991} for a generic fixed film non-power resistor
is reproduced in equation~\ref{resistorfit}. The meanings
and values assigned to its coefficients are described in table~\ref{tab:resistor}.
\glossary{name={FIT}, description={Failure in Time (FIT). The number of times a particular
failure is expected to occur in a $10^{9}$ hour time period.}}


\fmodegloss

\begin{equation}
% fixed comp resistor{\lambda}_p = {\lambda}_{b}{\pi}_{R}{\pi}_Q{\pi}_E
{\lambda}_p = {\lambda}_{b}{\pi}_{R}{\pi}_Q{\pi}_E
 \label{resistorfit}
\end{equation}

\begin{table}[ht]
\caption{Fixed film resistor Failure in time assessment} % title of Table
\centering % used for centering table
\begin{tabular}{||c|c|l||}
\hline \hline
 \em{Parameter}      &  \em{Value}    &   \em{Comments} \\
                     &                &      \\ \hline \hline
 ${\lambda}_{b}$ &  0.00092        & stress/temp base failure rate at $60^{\circ}$C  \\   \hline
 %${\pi}_T$ &  4.2         & max temp of $60^o$ C\\ \hline
 ${\pi}_R$ &  1.0         & Resistance range $< 0.1M\Omega$\\ \hline
 ${\pi}_Q$ &  15.0         & Non-Mil spec component\\ \hline
 ${\pi}_E$ &  1.0         & benign ground environment\\ \hline
\hline
\end{tabular}
\label{tab:resistor}
\end{table}

Applying equation~\ref{resistorfit} with the parameters from table~\ref{tab:resistor}
gives the following failures in ${10}^6$ hours:

\begin{equation}
 0.00092 \times 1.0 \times 15.0 \times 1.0 = 0.0138  \;\mathrm{failures}/{10}^{6}\,\mathrm{hours}
 \label{eqn:resistor}
\end{equation}

While MIL-HDBK-217F gives failure rates for a wide range of common components,
it does not specify how the components will fail (in this case OPEN or SHORT).
%
Some standards, notably EN298, consider most types of resistor to fail only in OPEN mode.
%FMD-97 gives 27\% OPEN and 3\% SHORTED, for resistors under certain electrical and environmental stresses.
% FMD-91 gives parameter change as a third failure mode, luvvverly 08FEB2011
This example
compromises and uses a 9:1 OPEN:SHORT ratio for resistor failure.
%
Thus for this example resistors are expected to fail OPEN in 90\% of cases and SHORTED
in the other 10\%.
A standard fixed film resistor of non-military specification, for use in a benign environment at
temperatures up to {60\oc}, is given a failure rate of 13.8 failures per billion ($10^9$)
hours of operation (see equation~\ref{eqn:resistor}).
In EN61508 terminology, this figure is referred to as a Failure in Time (FIT)\footnote{FIT values are measured as the number of
failures per billion (${10}^9$) hours of operation (roughly 114,000 years). The smaller the
FIT number the more reliable the component.}.
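%
As a worked step, assuming the 9:1 OPEN:SHORT ratio adopted for this example, the resistor FIT of 13.8
divides between its two failure modes as follows (the subscripted $\mathit{FIT}$ symbols are
introduced here for illustration only):
\[
\begin{array}{rcl}
\mathit{FIT}_{R,\mathrm{OPEN}}  & = & 0.9 \times 13.8 = 12.42 \\
\mathit{FIT}_{R,\mathrm{SHORT}} & = & 0.1 \times 13.8 = 1.38
\end{array}
\]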
%
The formula given for a thermistor in MIL-HDBK-217F~\cite[9.8]{mil1991} is reproduced in
equation~\ref{thermistorfit}. The variable meanings and values are described in table~\ref{tab:thermistor}.
%
\begin{equation}
% fixed comp resistor{\lambda}_p = {\lambda}_{b}{\pi}_{R}{\pi}_Q{\pi}_E
{\lambda}_p = {\lambda}_{b}{\pi}_Q{\pi}_E
 \label{thermistorfit}
\end{equation}
%
\begin{table}[ht]
\caption{Bead type Thermistor Failure in time assessment} % title of Table
\centering % used for centering table
\begin{tabular}{||c|c|l||}
\hline \hline
 \em{Parameter}      &  \em{Value}    &   \em{Comments} \\
                     &                &      \\ \hline \hline
 ${\lambda}_{b}$ &  0.021        & stress/temp base failure rate bead thermistor \\   \hline
 %${\pi}_T$ &  4.2         & max temp of $60^o$ C\\ \hline
 %${\pi}_R$ &  1.0         & Resistance range $< 0.1M\Omega$\\ \hline
 ${\pi}_Q$ &  15.0         & Non-Mil spec component\\ \hline
 ${\pi}_E$ &  1.0         & benign ground environment\\ \hline
\hline
\end{tabular}
\label{tab:thermistor}
\end{table}
%
\begin{equation}
 0.021 \times 15.0 \times 1.0 = 0.315 \; \mathrm{failures}/{10}^{6}\,\mathrm{hours}
 \label{eqn:thermistor}
\end{equation}
%
Thus a bead type thermistor of `non~military~spec' is given a FIT of 315.0.
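%
Assuming the same 9:1 OPEN:SHORT ratio is applied to the thermistor (an assumption, though one consistent
with the values listed in table~\ref{tab:stat_single}), its FIT of 315.0 divides as follows;
the subscripted symbols are again illustrative only:
\[
\begin{array}{rcl}
\mathit{FIT}_{T,\mathrm{OPEN}}  & = & 0.9 \times 315.0 = 283.5 \\
\mathit{FIT}_{T,\mathrm{SHORT}} & = & 0.1 \times 315.0 = 31.5
\end{array}
\]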
%
Using the 9:1 OPEN:SHORT ratio adopted above, we can draw up table~\ref{tab:stat_single},
showing the FIT values for all faults considered.
\glossary{name={FIT}, description={Failure in Time (FIT). The number of times a particular failure is expected to occur in a $10^{9}$ hour time period.}}

\begin{table}[ht]
\caption{Pt100 FMEA Single Fault Statistics} % title of Table
\centering % used for centering table
\begin{tabular}{||l|c|c|l||}
\hline \hline
 \textbf{Test} & \textbf{Result} & \textbf{Result} & \textbf{FIT}   \\
 \textbf{Case} &  \textbf{sense +} & \textbf{sense -} & \textbf{(failures per $10^9$ hours)}   \\
\hline
\hline
TC:1 $R_1$ SHORT    &  High Fault        &  -       &  1.38  \\ \hline
TC:2 $R_1$ OPEN     &  Low Fault         & Low Fault     &  12.42\\ \hline
\hline
TC:3 $R_3$ SHORT    &  Low Fault          & High Fault      & 31.5  \\ \hline
TC:4 $R_3$ OPEN     &  High Fault         & Low Fault    &   283.5 \\ \hline
\hline
TC:5 $R_2$ SHORT     &  -         &  Low Fault   &  1.38  \\ \hline
TC:6 $R_2$ OPEN     &  High Fault         & High Fault     & 12.42 \\ \hline
\hline
\end{tabular}
\label{tab:stat_single}
\end{table}

The FIT for the circuit as a whole is the sum of the FIT values for all the
test cases. The Pt100 circuit here has a FIT of 342.6. This corresponds to an MTTF of
$10^9/342.6 \approx 2.9 \times 10^6$ hours, or about 333 years per circuit.
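%
Written out, using the values in table~\ref{tab:stat_single}, the summation is
\[
\mathit{FIT}_{\mathrm{Pt100}} = 1.38 + 12.42 + 31.5 + 283.5 + 1.38 + 12.42 = 342.6 .
\]
The conversion from a FIT value to an MTTF assumes a constant failure rate, in which case
\[
\mathit{MTTF} = \frac{10^9}{\mathit{FIT}} \;\mathrm{hours}.
\]
The symbols $\mathit{FIT}_{\mathrm{Pt100}}$ and $\mathit{MTTF}$ are illustrative notation
and are not used elsewhere in this study.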

A probabilistic tree can now be drawn, with a FIT value for the Pt100
circuit and FIT values for all the component fault modes from which it was calculated.
We can see from this that the most likely fault is the thermistor going OPEN.
This circuit is around nine times more likely to fail in this way than in any other.
Were we to need a more reliable temperature sensor, this would probably
be the fault~mode we would scrutinise first.


\begin{figure}[ht]
 \centering
 \includegraphics[width=400pt,bb=0 0 856 327,keepaspectratio=true]{./CH5_Examples/stat_single.png}
 % stat_single.jpg: 856x327 pixel, 72dpi, 30.20x11.54 cm, bb=0 0 856 327
 \caption{Probabilistic Fault Tree: Pt100 Single Faults}
 \label{fig:stat_single}
\end{figure}


The Pt100 analysis presents a simple result for single faults.
The next analysis phase looks at how the circuit will behave under double simultaneous failure
conditions.


\paragraph{Pt100 Example: Double Failures and statistical data}
Because we can perform double simultaneous failure analysis under FMMD
we can also apply failure rate statistics to double failures.
%
%%
%% Need to talk abou the `detection time'
%% or `Safety Relevant Validation Time' ref can book
%% EN61508 gives detection calculations to reduce
%% statistical impacts of failures.
%%
%
If we consider the failure modes to be statistically independent, we can calculate
the FIT values for all the combinations of failures listed in table~\ref{tab:ptfmea2}
of the electronic examples chapter (chapter~\ref{sec:chap5}).
%
The failure mode of concern, the undetectable {\textbf{FLOATING}} condition,
requires that resistors $R_1$ and $R_2$ both fail.
%
We can multiply the two per-hour failure probabilities
together to find the probability of both failing.
%
The FIT value of 12.42 corresponds to
$12.42 \times {10}^{-9}$ failures per hour. Squaring this gives $ 154.3 \times {10}^{-18} $.
%
This is an astronomically small probability, so small that it would
probably fall below a threshold to sensibly consider.
%
However, it is very interesting from a failure analysis perspective,
because here we have found a fault that we cannot detect (at least at this
level in the FMMD hierarchy).
%
This means that should we wish to cope with
this fault, we need to devise a new way of detecting this
condition, perhaps in higher levels of the system/FMMD hierarchy.
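%
The same calculation can be sketched for any pair of failure modes. Assuming statistical independence,
and treating each FIT value as the probability of failure within a one hour interval,
the joint probability for failure modes $i$ and $j$ is approximately
\[
P_{ij} \approx (\mathit{FIT}_i \times 10^{-9}) \times (\mathit{FIT}_j \times 10^{-9})
       = \mathit{FIT}_i \, \mathit{FIT}_j \times 10^{-18} .
\]
As an illustration only, pairing $R_1$ OPEN with the thermistor $R_3$ OPEN (FIT values from table~\ref{tab:stat_single}) gives
\[
12.42 \times 283.5 \times 10^{-18} \approx 3.52 \times 10^{-15} .
\]
The notation $P_{ij}$ and the choice of this pairing are assumptions made for the purpose of the example.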
%
\glossary{name={FIT}, description={Failure in Time (FIT). The number of times a particular failure is expected to occur in a $10^{9}$ hour time period. Associated with continuous demand systems under EN61508~\cite{en61508}}}
%
%
\subsection{Deriving FTA diagrams from FMMD models}
\label{sec:fta}
%
Fault Tree Analysis (FTA)~\cite{ftahistory} is a top down methodology that
draws a fault tree---or top down fault causation diagram---for each given top-level
failure. With an FMMD model, we can trace all the causes of system failures
down to the base component level.
%
This would be enough to create a fault causation tree, but FTA introduces
concepts of operational and environmental states, and inhibit gates.
%
The FMEA philosophy in relation to these three concepts is to assume the worst case, that they
{\em may} occur,
and determine what system failures may arise.
%
The FTA perspective is that some safety can be built in
by preventing certain things happening (inhibit gates), and by considering
different behaviour due to environmental or operational states~\cite{nucfta,nasafta}.
%
If we require FMMD to produce full FTA diagrams, we need to add these
attributes to the FMMD UML model\footnote{Top down failure mode models, such as FTA, are additionally
useful in guiding diagnostic analysis.}.


\paragraph{Environment, operational states and inhibit gates: additions to the UML model.}

FTA, in addition to using symbols borrowed from digital logic, introduces three new symbols to
model environmental states, operational states and inhibit gates; we discuss here how these can be incorporated into
the FMMD model.

A  system will be expected to perform in a given environment.
%
Environment in the context of this study
means external influences under which the system could be expected to work. % under.
%
A typical data sheet for an electrical component will give
a working temperature range;
mechanical components could be specified for stress and loading limits.
It is unusual to have failure modes  described in product literature, although
for complicated components with firmware, errata documents~\cite{pic18f25k80erratta} are sometimes produced.

Systems may have distinct operational states. For instance, a safety critical controller
may have a LOCKOUT state where it has detected a serious problem and will not continue to operate until
authorised human intervention takes place.
A safety critical circuit may have a self test mode which could be operated externally;
a micro-processor may have a SLEEP mode, etc.
%
To make FMMD compatible with FTA, operational states and environmental conditions should
be factored into the UML model.
%
An undesired condition may occur where it could be necessary to inhibit some action of the system.
This is rather like a logical guard criterion. For instance, the gas burner standard EN298
states that a flame detector must confirm that a pilot flame has been established before the main burner fuel can be applied.
In FTA terms this would be an inhibit condition on the main fuel, i.e. PILOT\_NOT\_CONFIRMED.

We now look at the nature of these three attributes and decide how they should fit into the UML
model for FMMD developed in section~\ref{sec:fmmd_uml}.

\paragraph{Environmental Modelling.} The external influences/environment could typically be temperature ranges,
levels of electrical interference, high voltage contamination on supply
lines, radiation levels etc.
Environmental influences will affect specific components in specific ways\footnote{A good example of a part
affected by environmental conditions, in this case temperature, is the opto-isolator~\cite{tlp181}
which is typically affected at around {60 \oc}. Most electrical components are more robust to temperature variations.}.
Environmental analysis is thus applicable to components.
Environmental influences, such as over-stress due to voltage,
can be eliminated by down-rating components as discussed in section~\ref{sec:determine_fms}.
With given environmental constraints, we can therefore eliminate some failure modes from the model.


\paragraph{Operational states.}
Within the field of safety critical engineering, we often encounter
elements that include test or self-test facilities.
%
We also encounter degraded performance
(such as only performing certain functions in an emergency) and lockout/emergency conditions.
These can be broadly termed operational states. %, and apply to the
%functional groups.
%
We need to determine which UML class is most appropriate to hold a relationship
to operational states.
%
Consider for instance an electrical circuit that has a TEST line.
When the TEST line is activated, it supplies a test signal
which will validate the circuit. This circuit will have two operational states,
NORMAL and TEST mode.
%
It seems most appropriate to apply the operational states to {\fgs},
which by definition implement functionality, or purpose.
On this basis we associate operational states with {\fgs}.
%therefore are the best objects to model
%operational states.% with.

\paragraph{Inhibit Conditions.}
Inhibit conditions and the symbols used for them are described in~\cite[p.40]{nasafta}.
%
Some failure modes may only be active given specific environmental conditions
or when other failures are already active.
%
To model this, an `inhibit' class has been added.
%
This is an optional attribute of
a failure mode.
%
This inhibit class can be triggered
by a combination of environmental conditions or failure modes.
%
In the UML diagram, we therefore link this with
both environmental conditions and failure modes.


\paragraph{UML Diagram Additional Objects.}
The additional objects System, Environment, Inhibit and Operational States,
added to the UML diagram in figure~\ref{fig:cfg}, are represented in figure~\ref{fig:cfg2}.

\label{completeumlfurtherwork}

\begin{figure}[h]
 \centering
 \includegraphics[width=400pt,keepaspectratio=true]{./CH8_Conclusion/master_uml_further_work.png}
 % cfg2.png: 702x464 pixel, 72dpi, 24.76x16.37 cm, bb=0 0 702 464
 \caption{FMMD UML diagram, incorporating Environmental, Operational State and Inhibit gates}
 \label{fig:cfg2}
\end{figure}

\clearpage

\subsection{Retrospective Failure Mode analysis and FMMD}

The reasons for applying retrospective failure mode analysis could be to approve previously un-assessed
systems to a safety standard, or to determine the failure mode behaviour of an instrument used in
safety critical verification.
%
FMMD can be applied retrospectively to a project, and because of its modular nature, coupled with
its `bottom-up~work~flow' it
can reveal previously undetected system failure modes.
%
This is because the analyst
is  forced to deal with all component failure modes when applying the FMMD process, and
all failure modes of the resultant {\dcs} as we progress up a hierarchy.
%
FMMD requires that all failure modes of components in a {\fg} are resolved to
a symptom in the resulting {\dc}.
%
Because we can enforce a `complete' analysis, FMMD can find failure modes that were missed by
other FMEA processes; that is, the FMMD process can expose un-handled
failure modes.
%come to light.

We can apply retrospective FMMD to electronic and software hybrid systems as well.
%
The {\fms} of electronic components are established in the literature~\cite{fmd91,mil1991,en298,en230}.
%
Each function in the software would have to be assigned a `design~contract'~\cite{dbcbe} (where violations of
contract clauses will be treated as failure modes in FMMD).
%
% By %doing
% applying contracts and seeing how calling functions deal with
% the failures in the functions they call, we reveal un-handled the error conditions in
% the software.
% By treating hardware interfaces to software as {\dcs}, we automatically have a list of the failure modes
% of the electronics.
%%
With the contracts in place for the software functions, we can then integrate them into the FMMD model.
%
FMMD models both software and hardware;
we can thus verify that all
failure modes from the electronics module have been dealt
with by the controlling software.
%
If not, they are un-handled error conditions relating to the software/hardware interface.
%
% That is the  hardware interfaces to software in FMMD is a {\dc},
% the failure modes of this {\dc} are the list of all known failure modes
% of the electronics.
%
By performing FMMD on a software electronic hybrid system,
we thus reveal design deficiencies in the software, the electronics and the software/electronics interface.
%
FMEDA does not handle software or the software/hardware interface.
It thus potentially misses many undetected failures (in EN61508 terms, undetected dangerous and undetected safe failures).
In Safety Integrity Level (SIL)~\cite{en61508} terms, by identifying undetectable faults and fixing them, we raise
the safe failure fraction (SFF).
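%
The effect on the SFF can be made explicit. A common statement of the EN61508 safe failure fraction,
with ${\lambda}_{S}$ the total rate of safe failures and ${\lambda}_{DD}$ and ${\lambda}_{DU}$ the rates of
dangerous detected and dangerous undetected failures (symbols introduced here for illustration and not used elsewhere in this study), is
\[
\mathit{SFF} = \frac{{\lambda}_{S} + {\lambda}_{DD}}{{\lambda}_{S} + {\lambda}_{DD} + {\lambda}_{DU}} .
\]
Under this definition, making a formerly undetected dangerous failure detectable moves its rate from
${\lambda}_{DU}$ to ${\lambda}_{DD}$, which increases the numerator while leaving the denominator unchanged.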


%

\section{Objective and Subjective Reasoning stages}
%Opportunity for formal definitions and perhaps an interface or process for achieving it....
Applying failure mode effects analysis is an exercise in tracing cause and effect, viewed from
an `engineering' perspective. This is the realm of the objective.
%
The executive decisions about deploying systems are in the domain of management and politics.
%
The dangers, or potential negative effects, of a safety critical system depend not only on the system itself,
but on the environment in which it is used
and on human factors such as the training level of operatives, and psychological and logical factors in
the Human Machine Interface~(HMI)~\cite{stranks2007human}.
%
\paragraph{Objective and Subjective Reasoning in FMEA: Three Mile Island nuclear accident example.}
An example of objective and subjective factors is demonstrated in the accident report on the 1979 Three Mile Island
nuclear accident~\cite[App.D]{safeware}. Here, a vent valve for the primary reactor coolant (pressurised water) became stuck open.
This condition causes an objectively derived  failure mode --- `leakage~of~coolant' --- due to a stuck valve.
%
This, if recognised correctly by the operators, would have led quickly
to a reactor shut-down and
a maintenance procedure to replace the valve.
%
The failure was not recognised in time however, and coolant was lost
until a partial meltdown of the reactor fuel occurred, with a resulting
leak of radioactive material into the environment.
%
For the objective failure mode determined by
FMEA, that of leakage of coolant,
we would not reasonably expect this to go unchecked and unresolved for an extended period and cause such a critical failure.
%
The criticality level of that accident was therefore subjective. It was not known how the operators
would react, and deficiencies in the Human Machine Interface (HMI) were not considered as a factor in the failure analysis.


\paragraph{Further Work: Objective and Subjective Reasoning in FMEA.}
%
Criticality prediction could be termed the domain of subjective reasoning. With an objectively defined system level failure,
we are often next required to determine its level of criticality, or how serious the risk posed would be.
%
Two methodologies have started to consider this aspect, FMECA~\cite{fmeca} with its criticality and probability factors, and
FMEDA~\cite{en61508,fmeda} with its classification of dangerous and safe failures.
%
It is the author's opinion that more work is required to clarify this area. The scope of FMMD is  the objective level only.