\paragraph{Abstract}{
Increasingly, we rely on automation in everyday life.
Many automated systems have the potential to cause harm, or even death, should they fail.
Safety assessment and certification are now required for
almost all potentially dangerous equipment.
%
As part of the assessment/certification process, we typically apply
a battery of tests, examining features such as resistance to environmental extremes,
electromagnetic compatibility (EMC) and endurance, alongside static testing.
%
Static testing takes place at the theoretical, or design, level and involves
looking at failure scenarios and trying to predict how systems would react.
%
This thesis deals with one area of static testing, that of Failure Mode Effects Analysis (FMEA), a commonly
used technique that is legally mandatory for a wide range of equipment.

The ability to assess the safety of man-made equipment has been a concern
since the dawn of the industrial age~\cite{indacc01,usefulinfoengineers,steamboilers}.
The philosophy behind safety measures has progressed
with time, and by World War Two~\cite{boffin} we begin to see concepts such as `no single component failure should cause
a dangerous system failure' emerging. Concepts such as these allow us to apply
objective criteria to safety assessment. We can extend the `no~single~failure' concept
so that double, or even multiple, failures are not allowed to cause dangerous states.
%
The principle that a double failure causing a dangerous condition is unacceptable
can be found in the legally binding European standard EN298, which became
a legal requirement in 2006~\cite{en298}.
More sophisticated standards, e.g.\ EN61508~\cite{en61508} and variants thereof,
are based on statistical thresholds for the frequency of dangerous failures.
We could state, for instance, that we can tolerate an `acceptable' maximum number of
dangerous failures per billion hours of operation.
We can then broadly separate these failure rates into safety integrity levels (SIL).
So for a maximum of 10 failures per billion hours of operation we assign a SIL of 4,
for 100 a SIL of 3, etc.
If we can determine a SIL rating
we can match it against the risk.
The more dangerous the consequences of failure,
the higher the SIL rating we can demand.
A band-saw with one operative may require a SIL rating of 1;
a nuclear power station, with far greater consequences on dangerous failure,
may require a SIL rating of 4.
SIL ratings give us another objective yardstick to measure system safety.
%governing failure conditions and determining risk levels associated with systems.

All of these risk assessment techniques are based on variations on the theme of
Failure Mode Effects Analysis (FMEA), which has its roots in the 1940s mass production industry
and was designed to save large companies money by fixing the most financially
draining problems in a product first.

This thesis shows that the refinements and additions made to
FMEA to tailor it for military or statistical commercial use have common flaws
which make the resulting variants unsuitable for the higher safety requirements of the 21st century.
Problems with state explosion in failure mode reasoning and the impossibility
of integrating software and hardware failure mode models are the most obvious of these. %flaws.
The methodologies are explained in chapter~\ref{sec:chap2} and the advantages and drawbacks
of each FMEA variant are examined in chapter~\ref{sec:chap3}.
In chapter~\ref{sec:chap4}, a new methodology is then proposed which addresses the state explosion problem
and, using contract programmed software, allows the modelling of integrated
software/electrical systems.
This is followed by two chapters showing examples of the new modular FMEA technique, Failure Mode Modular De-composition (FMMD),
firstly looking at electronic circuits and then at electronic/software hybrid systems.
}

\section{Introduction}
The motivation for this study came from two sources, one academic and the other
practical.
\paragraph{MSc Project: Euler/Spider Diagram Editor.}
I had recently completed an
MSc, for which my project was to create an Euler/Spider~Diagram~\cite{howse:spider} editor in Java.
This editor allowed the user to draw Euler/Spider diagrams, and could then
represent these as abstract (i.e.\ mathematical) definitions.
\paragraph{European Safety Requirements increase in scope and complexity.}
At work, writing embedded `C' and assembly language code for safety-critical
industrial burners, we were faced with a new and daunting requirement:
conformance to the latest European standard, EN298. It appeared to ask for the impossible.
Not only did it require the usual safety measures (self-checking of ROM and RAM, watchdog processors with separate clock sources, EMC,
and triple fail-safe control of valves), it also contained one new clause that had far-reaching consequences.
It stated that in the event of a failure, where the controller had gone into a `lockout~state' (a state where the controller
applies all possible safety measures to stop fuel entering the burner), it could not become dangerous should another fault occur.
In short, this meant we had to be able to deal with double failures.
Any components that could, in failing, create a dangerous state were already
documented and approved using failure mode effects analysis (FMEA). This new requirement
effectively meant that all combinations of component failures
now had to be analysed. The resulting state explosion alone
meant that such an analysis was going to be virtually impossible to perform.
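As a rough illustration of this growth (the symbols and figures here are assumed purely for
the sake of the example and are not taken from EN298), consider a system of $N$ components,
each having $m$ failure modes. Single failure analysis has $Nm$ cases to consider,
whereas the number of distinct pairs of failure modes drawn from two different components is
\[
  \frac{N(N-1)}{2}\,m^{2} .
\]
For a modest controller with $N=100$ and $m=3$ this gives
\[
  Nm = 300
  \qquad \mbox{against} \qquad
  \frac{100 \times 99}{2} \times 3^{2} = 44\,550 ,
\]
so under these assumptions the double failure requirement multiplies the number of cases by roughly 150.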
FMEA also suffered from repeated work. Because each component failure is typically represented
by one line or entry in a spreadsheet~\cite{bfmea}, repeated sections of
circuitry (for instance, repeated 4--20\,mA outputs on a PCB) meant that
the analysis of identical circuitry was performed many times.
A desirable feature of a new methodology would be the ability to re-use
analysis for identical repeated modules. The development of this new methodology
was presented to the IET System Safety conference in 2011~\cite{syssafe2011}.
FMEA currently cannot integrate software into its failure mode models.
A modular variant of FMEA can use the existing structure of functional software, in conjunction
with contract programming, to model software~\cite{syssafe2012}.
%
\paragraph{Modularising FMEA and augmenting this with concepts from Euler/Spider Diagrams}
I followed the concept of de-composing a problem, and thus reducing the state explosion, using the thinking behind
the fast Fourier transform (FFT)~\cite[Ch.8]{fpodsadsp}, which takes a complex intermeshed series of real and imaginary number calculations
and, by de-composing them, simplifies the problem.
My reasoning was that, were I to analyse the problem in small modules, from the bottom up following the FFT example, I could apply
checking for all double failure scenarios.
Once these first modules (which I now call {\fgs}) were analysed, I could determine the symptoms of failure for them.
Using the symptoms of failure, I could then treat these modules as components, now called {\dcs}, and use them to build higher level
modules. I could apply double simultaneous failure mode checking because the number of components
in each module/{\fg} was quite small, thus avoiding state explosion problems, and I could apply
double checking all the way up the hierarchy.
In fact this means, as a by-product, that many multiple as well as double
failures would be analysed. Moreover, because failure modes are traceable from the base components to the top level (or system) failure modes,
and these are held in a data structure, we can apply automated methods to search all cardinalities of multiple failure modes
within the model.
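A minimal sketch of why this helps, under assumptions chosen only for illustration
(every base component and every {\dc} is taken to have a single failure mode, and each {\fg} contains at most $k$ components):
a flat analysis of $N$ components must consider $N(N-1)/2$ pairs of failures,
whereas a {\fg} of $k$ components contains only $k(k-1)/2$ pairs.
With $N=100$ and $k=5$, the flat analysis has
\[
  \frac{100 \times 99}{2} = 4950
\]
pairs to consider. In the hierarchy, the 100 base components form 20 {\fgs} of 10 pairs each,
the 20 resulting {\dcs} form 4 {\fgs} of 10 pairs each,
and the final 4 {\dcs} form a single {\fg} of $4 \times 3 / 2 = 6$ pairs, giving
\[
  20 \times 10 + 4 \times 10 + 1 \times 6 = 246
\]
pairs in total. Real components typically have several failure modes, so these figures indicate only the trend.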
%
Because Euler/Spider Diagrams
could be used to model failure modes in components,
it was thought that a diagrammatic notation would
be easier to demonstrate than formal logic.
%

For an FMEA Spider diagram, contours represent failure modes, and the spider diagram
`existential~points' represent instances of failure modes.
Overlapping contours could represent multiple failure modes.
By drawing a spider collecting existential points, a common failure symptom could
be determined and from this a new diagram generated automatically, to represent the {\dc}.
%
Each spider represented a derived failure mode.
The act of collecting common symptoms by drawing spiders
meant that the analyst was forced to associate each component failure mode with one symptom, or derived~failure~mode.
%
These concepts were presented at the ``Euler~2004''~\cite{Clark200519} conference held at Brighton University.
This brought together concepts for modularising FMEA and the formal visual notations from Spider diagrams.
Euler diagrams are used later in the thesis to describe the containment relationships
of derived components when building hierarchical analysis models with the modularised
variant of FMEA that this thesis proposes and defends.

\paragraph{Objectives of the thesis.}
The primary objective of the work performed for this thesis is to propose a modularised variant of
FMEA that solves the problems of:
\begin{itemize}
\item State explosion,
\item Multiple failure mode modelling,
\item Re-usability of pre-analysed modules,
\item Inclusion of software in failure mode modelling.
\end{itemize}

Chapter~\ref{chap2} examines the current state of FMEA based methodologies. Chapter~\ref{chap3}
examines the benefits and drawbacks of these methodologies
and proposes a detailed wish list for an ideal FMEA technique.
Chapter~\ref{chap4} proposes Failure Mode Modular De-composition (FMMD), a modularised variant
of FMEA designed to address the points in the detailed wish list.
Chapter~\ref{chap5} provides worked examples using common electronic circuits.
Chapter~\ref{chap6} gives two examples of integrated software and electronic systems analysed using FMMD.
Metrics and evaluation, along with an example showing double simultaneous failure analysis,
are dealt with in Chapter~\ref{chap7}.

% \section{Case Study: Safety Critical Product Approval changes for EN298:2003}
%
% FMEA performed on selected areas perceived as critical
% by test house.
% Blanket measures, RAM ROM checks, EMC, electrical and environmental stress testing
%
% \subsection{Practical limitations of testing for certification vs. rigorous approach}
%
% State explosion problem considering a failure mode of a given component against
% all other components in the system i.e. an exponential ($2^N$) order of processing resource
% rather than a polynomial i.e. $N^2$.
%
% Impossible to perform double simultaneous failure analysis (as demanded by EN298~\cite{en298}).