\label{sec:chap1}

%\paragraph{Abstract} % : The Scope of this study.}{
{
%
We increasingly rely on automation in everyday life.
Many % of the
automated systems have the potential to cause harm or even death should they fail.
%
Safety assessment and certification are now required for %of
almost all potentially dangerous equipment.
%
As part of the assessment/certification process, we typically apply
a battery of tests, examining features such as resistance to extremes of environment, Electromagnetic Compatibility (EMC),
endurance regimes and static testing.
%
Static testing is carried out at the theoretical, or design, level and involves
looking at failure scenarios and trying to predict how systems would react.
%
This thesis deals with one area of static testing, that of Failure Mode Effects Analysis (FMEA)~\cite{iec60812}, a commonly
used technique that is legally mandatory for a wide range of equipment certification.

The ability to assess the safety of man-made equipment has been a concern
since the dawn of the industrial age~\cite{usefulinfoengineers,steamboilers}.
%
The philosophy behind safety measures has progressed
with time, and by World War Two concepts such as `no single component failure should cause
a dangerous system failure'~\cite{boffin} were emerging~\cite[Ch.13]{echoesofwar}.
%
Concepts such as these allow us to apply
objective criteria to safety assessment. We can extend the `no~single~failure' concept
so that double, or even multiple, failures are also unacceptable as the cause of dangerous states.
%
The prohibition of a double failure causing a dangerous condition
can be found in the legally binding European standard EN298\footnote{EN298:2003 became
a legal requirement for all new forced draft industrial burner controllers in 2006 within
the European Union.}, which
came into force
in 2006~\cite{en298}.
%
More sophisticated statistically based standards, e.g.\ EN61508~\cite{en61508} and variants thereof,
are based on statistical thresholds for the frequency of dangerous failures.
%
We could state, for instance, that we can tolerate an `acceptable' maximum number of
dangerous failures per billion hours of operation.
%
We can then broadly categorise ratings of failure rates into Safety Integrity Levels (SIL)~\cite{scsh}.
%
So for a maximum of 10 potentially dangerous failures per billion hours of operation we assign a SIL of 4,
for 100 a SIL of 3, and so on in powers of ten.
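%
As an illustrative reading of these figures (the symbol $\lambda_{\max}$ is introduced here for convenience only and is not notation taken from the standard),
the maximum tolerated number of dangerous failures per $10^{9}$ hours of operation for SIL~$n$ can be written as
\[
\lambda_{\max}(n) = 10^{\,5-n}, \qquad n \in \{1,2,3,4\},
\]
which gives $10$ for SIL~4, $100$ for SIL~3, $1000$ for SIL~2 and $10\,000$ for SIL~1.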
%
If we can determine a SIL rating,
we can match it against a risk.
%
The more dangerous the consequences of failure,
the higher the SIL rating we can demand.
%
A band-saw with one operative may require a SIL rating of 1,
but something with the potential to harm a larger number of people,
such as a nuclear power-station or air-liner,
with far greater consequences on dangerous failure,
may require a SIL rating of 4.
%
What we are saying is that while we may tolerate a low incidence of failure on a band-saw,
we will only tolerate extremely low incidences of failure in a nuclear plant.
SIL ratings provide another objective yardstick for the measurement of system safety.
%governing failure conditions and determining risk levels associated with systems.

All of these risk assessment techniques are based on variations of %on the theme of
Failure Mode Effects Analysis (FMEA), which has its roots in the mass production industry of the 1940s
and was designed to save large companies money by prioritising the most financially
draining problems in a product. % first.
%
The FMEA of the 1940s has been refined and extended into four main variants.
%
This thesis describes the refinements and additions made to
FMEA to tailor it for military or statistically biased % commercial
use.
It then reveals common flaws
which make these variants unsuitable for the higher safety requirements of the 21st century.
%
Problems with state explosion in failure mode reasoning and the current difficulties %impossibility
of integrating software and hardware failure mode models~\cite{1372150} are the most obvious of these. %flaws.
%
These four current methodologies are described in chapter~\ref{sec:chap2} and %the advantages and drawbacks
%of each FMEA variant are examined
critically assessed in chapter~\ref{sec:chap3}.
In chapter~\ref{sec:chap4}, a new methodology is proposed which addresses the state explosion problem
and, using contract programmed software, allows the modelling of integrated
software/electrical systems.
%
This is followed by two chapters showing examples of the new modular FMEA technique, Failure Mode Modular De-Composition (FMMD),
firstly looking at common electronic circuits and then at electronic/software hybrid systems.
}

\section{Motivation}
The motivation for this study came from two sources, one academic (my Software Engineering MSc project) and the other
practical (my work as a practising embedded software engineer using FMEA on safety critical burner systems).
%
% AF does not think the paragraph below should be included 12JAN2013
\paragraph{MSc Project: Euler/Spider diagram Editor.}
I had recently completed an
MSc, for which my project was to create an Euler/Spider~Diagram~\cite{howse:spider} editor in Java.
This editor allowed the user to draw Euler/Spider diagrams, and could then
represent these as abstract---i.e.\ mathematical---definitions.
%
The primary motive for writing the Spider diagram editor was to provide an alternative
to formal languages for software specification.
%
An added attraction of spider diagrams was that they could be used in
proving logic~\cite{stapleton:atpieds} and theorems~\cite{theoremflower,Fish200553} in an intuitive way.
%
Because of my daily work exposure to FMEA,
I started thinking of ways to apply formal languages and spider diagrams to
failure mode analysis.
%
%
\paragraph{European Safety Requirements increase in scope and complexity.}
At work---which consisted of designing, testing, building and writing embedded `C' and assembly language code for safety critical
industrial burners---we were faced with a new and daunting requirement:
conformance to the latest European standard, EN298~\cite{en298}.
%
It appeared to ask for the impossible:
not only did it require the usual safety measures (self checking of ROM and RAM, watchdog processors with separate clock sources, EMC and the
triple fail safe control of valves), it also had one new clause with far reaching consequences.
%
It stated that in the event of a failure, where the controller had gone into a `lockout~state'---a state where the controller
applies all possible safety measures to stop fuel entering the burner---it was not permitted to % could not
become dangerous should another fault occur.
%
In short, this meant we had to be able to deal with double failures.
%
Any of the components that could, in failing, create a dangerous state were already
documented and approved using failure mode effects analysis (FMEA).
%
This new requirement
effectively meant that all single and double component failures were
now required to be analysed.
%
The state explosion problem alone
meant that this analysis was going to be virtually impossible to perform.
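%
To give a rough sense of scale (a simplified count, assuming a system with $N$ component failure modes in which any two may occur together;
the symbol $N$ is introduced here purely for illustration), single failure analysis considers $N$ cases, whereas including all double failures gives
\[
N + \frac{N(N-1)}{2}
\]
cases. For $N = 100$ this is $100 + 4950 = 5050$ cases.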
%
To compound the problem, %state explosion problem
FMEA has a deficiency of repeated work: each component failure is typically represented
by one line or entry in a spreadsheet~\cite{bfmea}, so that identical sections of
circuitry (for instance repeated {\ft} outputs on a PCB)
were analysed many times.
%

%
\subsection{Modularising/De-Composing FMEA: Initial concepts.} % and augmenting this with concepts from Euler/Spider Diagrams.}
In the field of digital signal processing there is an algorithm, the fast Fourier transform (FFT)~\cite{fftoriginal}, that revolutionised
access to frequency analysis of digital samples.
This took the discrete Fourier transform (DFT), and applied de-composition to its
mesh of (often repeated) complex number calculations~\cite[Ch.8]{fpodsadsp}.
%
By doing this it reduced the order of computational complexity from quadratic, $O(N^{2})$, %n exponential
%order
to $O(N \log N)$~\cite[pp.401-3]{ctw}.
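%
As a rough illustration, assuming the usual operation counts of order $N^{2}$ for a direct DFT and $N \log_{2} N$ for a radix-2 FFT on $N$ samples
(constant factors ignored), a block of $N = 1024$ samples gives
\[
N^{2} = 1\,048\,576 \qquad \mbox{against} \qquad N \log_{2} N = 1024 \times 10 = 10\,240 ,
\]
a reduction of roughly two orders of magnitude.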
I wondered if this thinking could be applied to the state explosion problems encountered in FMEA.
%
%Following the concept of de-composing a problem, and thus simplifying the state explosion---using the thinking behind
%the fast Fourier transform (FFT)~\cite{fpodsadsp}[Ch.8], which takes a complex intermeshed series of real and imaginary number calculations
%and by de-composing them, simplifies the problem.
My reasoning was that if we analysed %were we to analyse
the problem in small modules, from the bottom-up following the FFT example, we could apply
checking for all double failure scenarios.
%
Once these first modules were analysed---we now call them {\fgs}---we could determine the symptoms of failure for them.
Using the symptoms of failure, we could then treat these modules as components in their own right---or {\dcs}---and use them to build higher level
{\fgs}. Higher and higher levels of {\fgs} could be built until we had a hierarchy
representing a failure mode model for the system.
%
Because this approach is modular, and %we can apply double simultaneous failure mode checking; and as %because
the number of components
in each {\fg} is typically small, double simultaneous failure mode checking can be applied
without encountering state explosion problems. % for the general case. % AF says `in the general case' here 12JAN2013
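%
A simplified count illustrates the saving, under the assumptions (made here for illustration only) that $N$ base component failure modes
are divided evenly into {\fgs} of $k$ failure modes each, and that only the lowest level of the hierarchy is counted.
Checking every pair across the whole system requires $N(N-1)/2$ double failure cases, whereas checking pairs only within each {\fg} requires
\[
\frac{N}{k} \cdot \frac{k(k-1)}{2} = \frac{N(k-1)}{2}
\]
cases. For $N = 100$ and $k = 5$ this is $200$ cases rather than $4950$; the cases added by higher levels of the hierarchy are omitted from this count.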
%
%
If we apply
double checking all the way up the hierarchy we can guarantee to have considered
every double simultaneous failure of all components in a system.
%
As a fortunate by-product, many multiple failures, as well as double
failures, are analysed. Moreover, because failure modes are traceable from the base components to the top level---or system---failure modes,
these relationships can be held in a traversable data structure.
%
With the model held in a traversable data structure, we can apply automated methods to search for all the combinations of multiple failure modes
that have been analysed within it. Because of this, it will not always %it may not
be necessary to apply double checking
at all higher levels in the analysis hierarchy to achieve complete double failure coverage.
%
The points at which it is possible to relax double failure checking can be verified automatically by traversing
the failure mode model.
%
\subsection{Initial direction: Application of Spider diagrams to FMEA.}

Because Euler/Spider Diagrams~\cite{howse:spider}
could be used to model failure modes in components,
it was thought that a diagrammatic notation would
be more user friendly than formal logic.
%
For an FMEA Spider diagram, contours represent failure modes, and the Spider diagram
`existential~points' represent instances of failure modes.
%
Overlapping contours represent multiple failure modes.
%
By drawing a spider collecting existential points, a common failure symptom could
be determined and from this a new diagram generated automatically to represent the {\dc}.
%
Each spider represented a derived failure mode.
The act of collecting common symptoms by drawing spiders
meant that the analyst was forced to associate each component failure mode with one symptom/derived~failure~mode.
%
These concepts were presented at the ``Euler~2004''~\cite{Clark200519} conference held at the University of Brighton. %
%
This defined the concepts for modularising FMEA using the formal visual notations from Spider diagrams.
This led to work on rapidly calculating available zones in Euler diagrams~\cite{Clark_fastzone}.
%
The spider diagram notation was useful in defining the concepts and
initial ideas, but a more traditional `spreadsheet' format has been used
for the analysis stages of the new methodology.
%
Euler diagrams are used later in the thesis to describe the containment relationships
of derived components when building hierarchical analysis models with the modularised
variant of FMEA that this thesis proposes and defends.
%

%
\section{Objectives of the thesis.}
The primary objective of the work performed for this thesis is to present a new modularised variant of
FMEA which solves the problems of:
\begin{itemize}
\item State Explosion,
\item Multiple failure mode modelling,
\item Re-usability of pre-analysed modules,
\item Inclusion of software in failure mode modelling.
\end{itemize}
To support this, worked examples using the new methodology were created and the work was published and presented at
IET safety conferences. % in 2011~\cite{syssafe2011} and 2012~\cite{syssafe2012}.
%
The development of FMMD, starting with a critique of FMEA and a wish-list for a better methodology,
was presented to the IET System safety conference in 2011~\cite{syssafe2011}.
%
FMEA currently cannot integrate software models into its hardware failure mode models~\cite{sfmea,modelsfmea,embedsfmea,sfmeainterface}.
%
FMMD can use the existing structure of functional software, in conjunction
with contract programming, to model software, and this concept was presented to the IET System safety conference in 2012~\cite{syssafe2012}.

\paragraph{Overview---quick guide to contents of the thesis.}
Chapter~\ref{sec:chap2} examines the current state of FMEA based methodologies. Chapter~\ref{sec:chap3}
examines the benefits and drawbacks of these methodologies
and proposes a detailed wish list for an ideal FMEA technique.
Chapter~\ref{sec:chap4} proposes Failure Mode Modular De-Composition (FMMD), a modularised variant
of FMEA designed to address the points in the detailed wish list.
Chapter~\ref{sec:chap5} provides worked examples using common electronic circuits.
Chapter~\ref{sec:chap6} gives two examples of integrated software and electronic systems analysed using FMMD.
Metrics and evaluation, along with an example showing double simultaneous failure analysis,
are provided in Chapter~\ref{sec:chap7}, with a conclusion and further work in Chapter~\ref{sec:chap8}.
|
|
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
|
|