\label{sec:chap1}
%\paragraph{Abstract} % : The Scope of this study.}{
{
%
Increasingly society relies on automation in everyday life.
%
Many automated systems have the potential to cause harm or even death should they fail.
%
Safety assessment and certification is now required for almost all potentially dangerous equipment.
%
As part of the assessment/certification process, typically a battery of tests is applied, examining features such as resistance to extremes of environment, Electromagnetic Compatibility (EMC), endurance regimes and static testing.
%
Static testing is at the theoretical, or design, level and involves looking at failure scenarios and trying to predict how systems would react.
%
This thesis deals with one area of static testing, that of Failure Mode Effects Analysis (FMEA)~\cite{iec60812}, a commonly used technique that is a legal requirement for a wide range of equipment certification.

The ability to assess the safety of machinery has been a concern since the dawn of the industrial age~\cite{usefulinfoengineers,steamboilers}.
%
The philosophy behind safety measures has progressed over time and by World War Two concepts such as `no single component failure should cause a dangerous system failure'~\cite{boffin} had emerged~\cite{echoesofwar}[Ch.13].
%
Concepts such as these allow objective criteria of safety assessment.
%
The `no~single~failure' concept can be extended to double or even multiple failures being unacceptable as the cause of dangerous states.
%
The prohibition of any double failure causing a dangerous condition can be found in the legally binding European standard EN298\footnote{EN298:2003 became a legal requirement for all new forced draft industrial burner controllers in 2006 within the European Union.} which came into force in 2006~\cite{en298}.
%
More sophisticated statistically based standards, i.e.\ EN61508~\cite{en61508} and variants thereof, are based on statistical thresholds for the frequency of dangerous failures.
%
For instance, acceptable maximum numbers of dangerous failures per billion hours of operation could be stated.
%
Orders of failure rates can then be broadly categorised into Safety Integrity Levels (SIL)~\cite{scsh}.
%
So for a maximum of 10 potentially dangerous failures per billion hours of operation a SIL rating of 4 is assigned, for 100 a SIL rating of 3, and so on in powers of ten.
%
If SIL ratings can be determined, they can be matched against given risks.
%
The more dangerous the consequences of failure, the higher the SIL rating required.
%
A band-saw with one operative may require a SIL rating of 1, but systems such as nuclear power-stations or air-liners, with far greater consequences on dangerous failure, may require a SIL rating of 4.
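%
The banding described above can be summarised compactly.
%
What follows is only a restatement of the powers-of-ten rule given in the text: the symbol $\lambda_{\max}$ is introduced here purely for illustration, to stand for the maximum tolerated number of dangerous failures per $10^{9}$ hours of operation, and the rows for SIL~2 and SIL~1 are obtained by continuing the stated pattern rather than being quoted from the standard.
\[
  \mathrm{SIL} = 5 - \log_{10} \lambda_{\max},
  \qquad \lambda_{\max} \in \{10,\ 10^{2},\ 10^{3},\ 10^{4}\} .
\]
\begin{center}
\begin{tabular}{|c|c|}
\hline
Maximum dangerous failures per $10^{9}$ hours & SIL rating \\
\hline
$10$     & 4 \\
$10^{2}$ & 3 \\
$10^{3}$ & 2 \\
$10^{4}$ & 1 \\
\hline
\end{tabular}
\end{center}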
All of these risk assessment techniques are based on variations of Failure Mode Effects Analysis (FMEA), which has its roots in the 1940s mass production industry and was designed to save large companies money by prioritising the most financially draining problems in a product.
%
The FMEA of the 1940s has been refined and extended into four main variants.
%
This thesis describes the refinements and additions made to FMEA to tailor these variants for military or statistically biased commercial use.
It then reveals common flaws which make them unsuitable for the higher safety requirements of the 21st century.
%
\fmmdglossSTATEEX
Problems with state explosion in failure mode reasoning and the current difficulties of integrating software and hardware failure mode models~\cite{1372150} are the most obvious of these.
%
These four current methodologies are described in chapter~\ref{sec:chap2} and critically assessed in chapter~\ref{sec:chap3}.
\fmmdglossSTATEEX
In chapter~\ref{sec:chap4}, a new methodology is proposed which addresses the state explosion problem and, using contract programmed software, allows the modelling of integrated software/electrical systems.
%
This is followed by two chapters showing examples of the new modular FMEA analysis technique (Failure Mode Modular De-Composition, FMMD), firstly looking at a variety of common electronic circuits and then at electronic/software hybrid systems.
}

\section{Motivation}
The motivation for this study came from two sources, one academic (the author's Software Engineering MSc project) and the other practical (the author is a practising embedded software engineer working with FMEA on safety critical burner systems).
%
% AF does not think the paragraph below should be included 12JAN2013
\paragraph{MSc Project: Euler/Spider diagram Editor.}
The author had recently completed an MSc, the project for which was to create an Euler/Spider~Diagram~\cite{howse:spider} editor in Java.
This editor allowed the user to draw Euler/Spider diagrams, and could then represent these as abstract (i.e.\ mathematical) definitions.
%
The primary motive for writing the Spider diagram editor was to provide an alternative to formal languages for software specification.
%
An added attraction of using spider diagrams was that they could be used in proving logic and theorems~\cite{theoremflower,Fish200553} in an intuitive way.
%
Because of the author's daily work exposure to FMEA, it was natural to think of ways to apply formal languages and spider diagrams to failure mode analysis.
%
%
\paragraph{European Safety Requirements increase in scope and complexity.}
At work, which consisted of designing, testing, building and writing embedded `C' and assembly language code for safety critical industrial burners, the design team was faced with a new and daunting requirement: conformance to the latest European standard, EN298~\cite{en298}.
%
It appeared to ask for the impossible: not only did it require the usual safety measures (self-checking of ROM and RAM, watchdog processors with separate clock sources, EMC and the triple fail safe control of valves), it also had one new clause that had far reaching consequences.
%
It stated that in the event of a failure, where the controller had gone into a `lockout~state' (a state where the controller applies all possible safety measures to stop fuel entering the burner), it was not permitted to become dangerous should another fault occur.
%
In short this meant dealing with double failures.
%
Any of the components that could, in failing, create a dangerous state were already documented and approved using failure mode effects analysis (FMEA).
%
This new requirement effectively meant that single and double component failures were now required to be analysed~\cite{en298}[9.1.5].
%
Because of state explosion alone, this analysis was going to be virtually impossible to perform.
\fmmdglossSTATEEX
%
To compound the problem, FMEA has a deficiency of repeated work, as each component failure is typically represented by one line or entry in a spreadsheet~\cite{bfmea}; analysis on repeated sections of circuitry (for instance repeated {\ft} outputs on a PCB) meant that analysis of identical circuitry was performed many times.
%
%
\subsection{Modularising/De-Composing FMEA: Initial concepts.} % and augmenting this with concepts from Euler/Spider Diagrams.}
%
In the field of digital signal processing there is an algorithm that revolutionised access to frequency analysis of digital samples called the Fast Fourier transform (FFT)~\cite{fftoriginal}.
This took the discrete Fourier transform (DFT), and applied de-composition to its mesh of (often repeated) complex number calculations~\cite{fpodsadsp}[Ch.8].
%
By doing this it reduced the order of computational complexity for $N$ samples from polynomial, $O(N^{2})$, to $O(N \log N)$~\cite{ctw}[pp.401-3].
%
The author wondered if this thinking could be applied to the state explosion problems encountered in FMEA.
%
\fmmdglossSTATEEX
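%
To give a feel for the scale of the state explosion referred to above, a rough count can be made under some simplifying assumptions; the symbols and figures here are illustrative only and are not taken from EN298 or from any analysis described in this thesis.
%
Suppose a system has $c$ components, each with $m$ failure modes, and that a double simultaneous failure is a pair of failure modes drawn from two different components.
%
The number of such pairs is
\[
  P(c,m) = \frac{c(c-1)}{2}\, m^{2} .
\]
For $c=100$ and $m=3$ this gives $P = 4950 \times 9 = 44550$ double failure scenarios to examine.
%
If the same components could instead be examined in $20$ separate modules of $5$ components each, every module would contain only $P(5,3) = 10 \times 9 = 90$ pairs, or $1800$ in total at the lowest level.
%
This sketch ignores the further analysis needed when modules are later combined, but it indicates why de-composition is attractive.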
%
The author's reasoning was that if the problem were analysed in small modules, from the bottom-up following the FFT example, checking for all double failure scenarios could be applied.
%
Once these first modules (now called {\fgs}) were analysed, the symptoms of failure could be determined for them.
%
Using the symptoms of failure, these modules could be treated as components in their own right, or {\dcs}, and used to build higher level {\fgs}.
%
Higher and higher levels of {\fgs} could be built until a hierarchy representing a failure mode model for the complete system had been created.
%
Double simultaneous failure mode checking can be applied because the number of components in each {\fg} is typically small; state explosion problems are thus avoided.
% AF says `in the general case' here 12JAN2013
\fmmdglossSTATEEX
%
If double checking is applied all the way up the hierarchy, all possible double simultaneous failures in a system can be guaranteed to have been considered.
%
This means, as a fortunate by-product, that many multiple failures, as well as double failures, would be analysed.
Because failure modes are traceable from the base components to the top level (or system) failure modes, these relationships can be held in a traversable data structure.
%
If held in a traversable data structure, automated methods can be applied to search for all the combinations of multiple failure modes throughout the model being analysed.
%
Because of this, it will not always be necessary to apply double checking at all higher levels in the analysis hierarchy to achieve complete double failure coverage.
%
The points at which it is possible to relax double failure checking can be verified automatically by traversing the failure mode model.
%
\subsection{Initial direction: Application of Spider diagrams to FMEA.}
Because Euler/Spider Diagrams~\cite{howse:spider} could be used to model failure modes in components, it was thought that a diagrammatic notation would be more user friendly than using formal logic.
%
For an FMEA Spider diagram, contours represent failure modes, and the Spider diagram `existential~points' represent instances of failure modes.
%
Overlapping contours represent multiple failure modes.
%
By drawing a spider collecting existential points, a common failure symptom could be determined and from this a new diagram generated automatically to represent the {\dc}.
%
Each spider represented a derived failure mode.
The act of collecting common symptoms by drawing spiders meant that the analyst was forced to associate each component failure mode with exactly one symptom/derived~failure~mode.
%
These concepts were presented at the ``Euler~2004''~\cite{Clark200519} conference held at the University of Brighton.
%
This defined the concepts for modularising FMEA using the formal visual notations from Spider diagrams.
This led to work on rapidly calculating available zones in Euler diagrams~\cite{Clark_fastzone,Rodgers2013}.
%
The spider diagram notation was useful in defining the concepts and initial ideas, but a more traditional `spreadsheet' format has been used for the analysis stages of the new methodology.
%
Euler diagrams have been used later in the thesis to describe the containment relationships of derived components when building hierarchical analysis models with the modularised variant of FMEA that this thesis proposes and defends.
%
%
\section{Objectives of the thesis.}
The primary objective of the work performed for this thesis is to present a new modularised variant of FMEA which solves the problems of:
\begin{itemize}
\item State Explosion,
\item Multiple failure mode modelling,
\item Re-usability of pre-analysed modules,
\item Inclusion of software in failure mode modelling.
\end{itemize}
To support this, worked examples using the new methodology were created and the work published and presented to IET safety conferences.
%
The development of FMMD, starting with a critique of FMEA and a ``wish-list'' for a better methodology, was presented to the IET System safety conference in 2011~\cite{syssafe2011}.
%
FMEA currently cannot integrate software models into its hardware failure mode models~\cite{sfmea,modelsfmea,embedsfmea,sfmeainterface}, but
\fmmdglossCONTRACTPROG
FMMD can use the existing structure of functional software, in conjunction with contract programming, to model software;
this concept was presented to the IET System safety conference in 2012~\cite{syssafe2012}.
\paragraph{Overview of the thesis.}
Chapter~\ref{sec:chap2} examines the current state of FMEA based methodologies.
Chapter~\ref{sec:chap3} examines the benefits and drawbacks of these methodologies and proposes a detailed wish list for an ideal FMEA technique.
Chapter~\ref{sec:chap4} proposes Failure Mode Modular De-Composition (FMMD), a modularised variant of FMEA designed to address the points in the detailed wish list.
Chapter~\ref{sec:chap5} provides worked examples using selected electronic circuits.
Chapter~\ref{sec:chap6} gives two examples of integrated software and electronic systems analysed using FMMD.
Metrics and evaluation, along with an example showing double simultaneous failure analysis, are provided in Chapter~\ref{sec:chap7}, with a conclusion and further work in Chapter~\ref{sec:chap8}.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%