


%%% OUTLINE


\documentclass[twocolumn]{article}
%\documentclass[twocolumn,10pt]{report}
\usepackage{graphicx}
\usepackage{fancyhdr}
%\usepackage{wassysym}
\usepackage{tikz}
\usepackage{amsfonts,amsmath,amsthm}
\usetikzlibrary{shapes.gates.logic.US,trees,positioning,arrows}
%\input{../style}
\usepackage{ifthen}
\usepackage{lastpage}
\usetikzlibrary{shapes,snakes}
\newcommand{\tickYES}{\checkmark}
\newcommand{\fc}{fault~scenario}
\newcommand{\fcs}{fault~scenarios}
\date{}
%\renewcommand{\encodingdefault}{T1}
%\renewcommand{\rmdefault}{tnr}
%\newboolean{paper}
%\setboolean{paper}{true} % boolvar=true or false
\newcommand{\derivec}{{D}}
\newcommand{\ft}{\ensuremath{4\!\!\rightarrow\!\!20mA} }
\newcommand{\permil}{\ensuremath{{ }^0/_{00}}}
\newcommand{\oc}{\ensuremath{^{o}{C}}}
\newcommand{\adctw}{{${\mathcal{ADC}}_{12}$}}
\newcommand{\adcten}{{${\mathcal{ADC}}_{10}$}}
\newcommand{\ohms}[1]{\ensuremath{#1\Omega}}
\newcommand{\fm}{failure~mode}
\newcommand{\fms}{failure~modes}
\newcommand{\fg}{functional~grouping}
\newcommand{\FG}{\mathcal{G}}
\newcommand{\DC}{\mathcal{DC}}
\newcommand{\fgs}{functional~groupings}
\newcommand{\dc}{derived~component}
\newcommand{\dcs}{derived~components}
\newcommand{\bc}{base~component}
\newcommand{\FMMD}{ModularFMEA}
\newcommand{\bcs}{base~components}
\newcommand{\irl}{in real life}
\newcommand{\enc}{\ensuremath{\stackrel{enc}{\longrightarrow}}}
\newcommand{\pin}{\ensuremath{\stackrel{pi}{\longleftrightarrow}}}
%\newcommand{\pic}{\em pure~intersection~chain}
\newcommand{\pic}{\em pair-wise~intersection~chain}
\newcommand{\wrt}{\em with~respect~to}
\newcommand{\abslevel}{\ensuremath{\Psi}}
\newcommand{\fmmdgloss}{\glossary{name={FMMD},description={Failure Mode Modular De-Composition, a bottom-up methodology for incrementally building failure mode models, using a procedure taking functional groups of components and creating derived components representing them, and in turn using the derived components to create higher level functional groups, and so on, that are used to build a failure mode model of a system}}}
\newcommand{\fmodegloss}{\glossary{name={failure mode},description={The way in which a failure occurs. A component or sub-system may fail in a number of ways, and each of these is a
failure mode of the component or sub-system}}}
\newcommand{\fmeagloss}{\glossary{name={FMEA}, description={Failure Mode and Effects analysis (FMEA) is a process where each potential failure mode within a system, is analysed to determine system level failure modes, and  to then classify them {\wrt} perceived severity}}}
\newcommand{\frategloss}{\glossary{name={failure rate}, description={The number of failures within a population (of size N), divided by N over a given time interval}}}
\newcommand{\pecgloss}{\glossary{name={PEC},description={A Programmable Electronic controller, will typically consist of sensors and actuators interfaced electronically, with some firmware/software component in overall control}}}
\newcommand{\bcfm}{base~component~failure~mode}
\def\layersep{1.8cm}

\newboolean{pld}
\setboolean{pld}{false} % boolvar=true or false : draw analysis using propositional logic diagrams

\newboolean{dag}
\setboolean{dag}{true} % boolvar=true or false : draw analysis using directed acyclic graphs

% \setlength{\topmargin}{0in}
% \setlength{\headheight}{0in}
% \setlength{\headsep}{0in}
% \setlength{\textheight}{22cm}
% \setlength{\textwidth}{18cm}
% %\setlength{\textheight}{24.35cm}
% %\setlength{\textwidth}{20cm}
% \setlength{\oddsidemargin}{0in}
% \setlength{\evensidemargin}{0in}
% \setlength{\parindent}{0.0in}
% %\setlength{\parskip}{6pt}
% % \setlength{\parskip}{1cm plus4mm minus3mm}
% \setlength{\parskip}{0pt}
% \setlength{\parsep}{0pt}
% \setlength{\headsep}{0pt}
% \setlength{\topskip}{0pt}
% \setlength{\topmargin}{0pt}
% \setlength{\topsep}{0pt}
% \setlength{\partopsep}{0pt}
% \setlength{\itemsep}{1pt}
% \renewcommand\subsection{\@startsection
% {subsection}{2}{0mm}%
% {-\baslineskip}
% {0.5\baselineskip}
% {\normalfont\normalsize\itshape}}%
\linespread{1.0}

\begin{document}
%\pagestyle{fancy}
%\fancyhf{}
%\fancyhead[LO]{}
%\fancyhead[RE]{\leftmark}

%\cfoot{Page \thepage\ of \pageref{LastPage}}
%\rfoot{\today}
%\lhead{Developing a rigorous bottom-up modular static failure mode modelling methodology}
%\lhead{Developing a rigorous bottom-up modular static failure modelling methodology}
                                   % numbers at outer edges
\pagenumbering{arabic}                        % Arabic page numbers hereafter
\author{R.~Clark$^\star$, A.~Fish$^\dagger$, C.~Garrett$^\dagger$, J.~Howse$^\dagger$  \\
         $^\star${\em Energy Technology Control, UK. r.clark@energytechnologycontrol.com} \and $^\dagger${\em University of Brighton, UK}
}

%\title{Developing a rigorous bottom-up modular static failure mode modelling methodology}
\title{Failure Mode Effects Analysis (FMEA) for  Software/Hardware Hybrid Systems using a modular bottom-up hierarchical modelling methodology}
%\nodate
\maketitle


\paragraph{Keywords:} static failure mode modelling; safety-critical; software FMEA
%\small

\abstract{ % \em
%\input{abs}
%The certification process of safety critical products for European and
%other international standards often demand environmental stress,
%endurance and Electro Magnetic Compatibility (EMC) testing. Theoretical, or 'static testing',
%is often also required.
%
%Failure Mode Effects Analysis (FMEA), is a bottom-up technique that aims to assess the effect all
%component failure modes on a system.
%It is used both as a design tool (to determine weaknesses), and is a requirement of certification of safety critical products.
%FMEA has been successfully applied to mechanical, electrical and hybrid electro-mechanical systems.
%
%Work on software FMEA (SFMEA) is beginning, but
%at present no technique for SFMEA that
%integrates hardware and software models % known to the authors
%exists.
% %

%
%Failure modes in components in say a sensor, could be traced
%up through the electronics and then through the controlling software.
%
%Presently Failure Mode Effects Analysis (FMEA), stops at the glass ceiling of the computer program.
This paper takes, from the literature, new and emerging methodologies
for software FMEA, applies them to a simple example system, and then
reaches conclusions about the effectiveness and failure mode
coverage of the combined FMEA techniques.

This paper presents a worked example of FMEA applied to an
integrated electronics/software system, the industry standard
{\ft} signalling loop.
%
%FMEA methodologies trace from the  1940's and were designed to
%model simple electro-mechanical systems.
%
FMEA methodologies were originally designed in the 1940s to
model simple electro-mechanical systems.
%
Because the early systems analysed by FMEA were relatively simple,
modern FMEA methodologies follow this paradigm and
trace component failure modes to system level failures.
%
%This paper explores the historical reasons why FMEA is performed in the way it is currently and
%the new factors placing higher demands upon it.
%
Software generally sits on top of most modern safety critical control systems
and defines their most important system wide behaviour and communications.
%
Currently, standards that demand FMEA investigations for hardware (HFMEA) (e.g. EN298, EN61508)
do not specify it for software, but instead specify good practice,
review processes and language feature constraints.
%
This is a weakness.
%
Where HFMEA % scientifically
traces component {\fms}
to resultant system failures, software has, until recently, been left in a non-analytical
limbo of best practices and constraints.
Software FMEA has been proposed
in several forms.
%
However, SFMEA is always performed separately from HFMEA.
%
This paper seeks to examine the effectiveness of current and proposed SFMEA
techniques, by analysing a simple hybrid hardware/software system,
which is in common use and has mature field experience. %
%analysing the chosen example, which is well known and understood
%
Because the chosen example is well understood it is
%, this example is
useful
to compare the results from these FMEA methodologies with
the known failure mode behaviour.
%from years of field experience, and determining how well the HFMEA and SFMEA
%analysis reports model the failure mode behaviour.
% %
%If software and hardware integrated FMEA were possible, electro-mechanical-software hybrids could
%be modelled, and so we could consider `complete' failure mode models.
%
%Presently FMEA, stops at the glass ceiling of the computer program: FMMD seeks to address
%this, and offers additional  test efficiency benefits.
This paper is a condensed version of the PhD thesis entitled `Failure Mode Modular De-composition'~\cite{clark}. \today
}

%\today
\nocite{en298}
\nocite{en61508}


\section{Introduction}
{
%This paper describes a modular FMEA process that can be applied to software.
%This modular variant of FMEA is called Failure Mode Modular de-composition (FMMD).
%
%Because this process is based on failure modes of components,
%it can be applied to electrical and/or mechanical systems.
%
%The hierarchical structure of software is then examined,
%and definitions from contract programming are used
%to define failure modes and failure symptoms for
%software functions.
%
%With these definitions we can apply the FMMD modular form of FMEA
%to existing software\footnote{Existing software excluding recursive~\cite{misra}[16.2] code,
%and unstructured non-functional language.}.
}

\section{FMEA Background}

%What FMEA is, briefly variants...

Failure Mode Effects Analysis is the process of taking
component failure modes, %and by reasoning,
tracing their effects through a system
and determining what system level failure modes could be caused.
%
The certification process of safety critical products for European and
other international standards often demands environmental stress, magnetic susceptibility,
endurance and Electro Magnetic Compatibility (EMC) testing.
%
Theoretical, or `static~testing', is often also required.
%
Failure Mode Effects Analysis (FMEA)~\cite{iec60812} is a tool used
for static testing.
%
For many types of safety critical system in the European Union, product design testing and FMEA
are legally mandatory~\cite{en230,en298}.
%
%Its use is traditionally only applied to hardware (electrical and mechanical) systems.
%
%
FMEA has its roots in the previous century where simple electro-mechanical systems were the norm.
%
With surface mount technology and increasingly dense integrated circuitry, electronics generally
has much higher component counts and more complex components than those in use when FMEA
was designed.

% Several variants of FMEA exist,
% but the three in main use are:
% \begin{itemize}
%  \item Deisgn FMEA (DFMEA) is FMEA applied at the design or approvals stage~\cite{en298, en230}
% where the aim is to ensure that single component failures (at least) cannot
% cause unacceptable system level events~\cite{fmea},
%  \item Failure Mode Effect Criticality Analysis (FMECA)  is applied to determine the most potentially dangerous or damaging
% failure modes to fix, using FMEA in conjunction with severity and failure probability figures~\cite{fmeca,mil1991,fmd91},
%  \item Failure Mode Effects and Diagnostics Analysis, is FMEA peformed to
% determine a statistical level of safety.
% This is associated with Safety Integrity Levels (SIL)~\cite{en61508}~\cite{en61511} classification.
% \end{itemize}


\subsection{Reasoning distance.}
\label{reasoningdistance}
%\fmmdglossRD
Reasoning distance is the number of stages of logic and reasoning used
in {\fm} analysis to map a failure cause to its potential outcomes, counted
by the number of {\fm} to component checks made.
%
%The basic FMEA example in section~\ref{basicfmea}
%considered one {\fm} against some of  the components in the milli-volt reader.
%
To create an exhaustive FMEA report  every
known failure mode of every component
within the system would have to be examined against all its other components.
%
`Reasoning~distance', for one {\fm}, is defined as the number of components checked against it
to determine its system level symptom(s).
%
No current FMEA variant gives guidelines for the components that should
be included to analyse a {\fm} in a system.
%
Were a particular {\fm} examined against all the other components in a system
this would give us the maximum reasoning distance.
%
This is termed the exhaustive FMEA case for a single {\fm}.
%does not
% The exhaustive~reasoning~distance would be
% the sum of the number of failure modes, against all other components
% in that system.
Thus the exhaustive~reasoning~distance for a particular component
is the number of failure modes it has multiplied
by the number of remaining components
in the system.
%
The exhaustive reasoning~distance for a system is
the sum of these products for all the components it contains.
%
If the milli-volt reader had say 100 components, with three failure modes each, this
would give an exhaustive reasoning distance---for single failure analysis---of $3 \times 100 \times 99$.
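%
As an illustrative sketch (the per-component count $f_i$ is notation assumed here for illustration),
if component $i$ of the $N$ components in a system has $f_i$ failure modes, the exhaustive
single failure reasoning~distance for the system is
$$ (N-1) \sum_{i=1}^{N} f_i \; . $$
When every component has the same number $f$ of failure modes this reduces to $N.(N-1).f$.
For a hypothetical three component circuit whose components have 2, 3 and 4 failure modes,
it gives $(2+3+4) \times (3-1) = 18$ checks.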
%
The discussion on reasoning distance provides a metric to examine
the state explosion problems associated with forward search failure investigation
methodologies.
%
%\fmmdglossSTATEEX
%
It is apparent that the shorter the reasoning distance, the more precisely theoretical examination
can determine failure symptoms.
%
For instance for a very simple small circuit, a better understanding of failure effects is expected,
than for a very large system where there are more variables and potential {\fm} interactions.
%
%.... general concept... simple ideas about how complex a
%failure analysis is the more modules and components are involved
% cite for forward and backward search related to safety critical software
 %{sfmeaforwardbackward}
\subsection{FMEA and the  State Explosion Problem}
\label{sec:xfmea}
\paragraph{Problem of which components to check for a given {\bc} {\fm}.}
%\fmmdglossSTATEEX
%
FMEA for safety critical certification (i.e. for EN298 and EN61508)~\cite{en298,en61508}  has to be applied
to all known failure modes of all components within a system.
%
Each one of these, in a typical report, would be one line of a spreadsheet entry.
%
FMEA does not define or specify the scope of the investigation for each component failure mode.
%
For instance, should the signal path be followed, checking all components encountered along it, or should the scope be wider?
%
%If we wethe effect of a component {\fm} against all other components
%in a system, this could be said to be exhaustive analysis.

\paragraph{Exhaustive Single Failure FMEA.}
%\fmmdglossXFMEA
%
To perform exhaustive FMEA (XFMEA), every possible interaction
of a failure mode with all other components in a system must be examined.
%
In other words, all possible failure scenarios must be considered.
%
%to do this completely (all failure modes against all components).
This is represented in the equation below, %~\ref{eqn:fmea_state_exp},
where $N$ is the total number of components in the system, $RD_{single}$ is the reasoning~distance and
$f$ is the number of failure modes per component:
%
\begin{equation}
  \label{eqn:fmea_single}
  RD_{single} = N.(N-1).f  . % \\
  %(N^2 - N).f
\end{equation}
%
This means that $O(N^2)$ checks must be performed
to undertake XFMEA for single failures.
%
Even small systems have typically
100 components, and they typically have 3 or more failure modes each, which would give
$100 \times 99 \times 3 = 29,700 $ as a reasoning~distance.
%
%\fmmdglossSTATEEX
\paragraph{Exhaustive FMEA and double failure scenarios.}
%
%\paragraph{Exhaustive Double Failure FMEA}
When potential double failure
scenarios\footnote{Certain double failure scenarios are already legal
requirements: the European gas burner standard (EN298:2003~\cite{en298}) demands the checking of
double failure scenarios (for burner lock-out scenarios).}
%
(two components failing within a given time frame) are considered, the order becomes $O(N^3)$.
With $RD_{double}$ as the reasoning~distance for double failure scenarios:
\begin{equation}
  \label{eqn:fmea_double}
  RD_{double} = N.(N-1).(N-2).f  . % \\
  %(N^2 - N).f
\end{equation}
%
For a theoretical system with 100 components and a fixed 3 failure modes each, this gives a reasoning distance of
$100 \times 99 \times 98 \times 3 = 2,910,600$. % failure mode scenarios.
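%
As a worked check using only equations~\ref{eqn:fmea_single} and~\ref{eqn:fmea_double},
the ratio of the two reasoning~distances is
$$ \frac{RD_{double}}{RD_{single}} = \frac{N.(N-1).(N-2).f}{N.(N-1).f} = N-2 \; , $$
so for the 100 component example the double failure analysis requires 98 times as many checks
as the single failure analysis ($29,700 \times 98 = 2,910,600$).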
%
In practice there is an additional complication here, that of
the circuit topology changes that {\fms} can cause.

\paragraph{Reliance on experts for meaningful FMEA Analysis.}
Current FMEA methodologies cannot consider---for the reason of state explosion---an exhaustive approach.
%We define exhaustive FMEA ({\XFMEA}) as examining the effect of every component failure mode
%against the remaining components in the system under investigation.
%
%\fmmdglossSTATEEX
%
Because, for practical reasons, XFMEA cannot be performed for anything other than a trivial system,
reliance is placed upon  experts on the system under investigation
to perform a meaningful analysis.
%
These experts must use their judgement and experience to choose
sub-sets of the components in the system to check against each {\fm}.
%
Also, %In practise
these experts have to select the areas they see as most critical for detailed FMEA analysis:
it is usually impossible, given the time available,
to carry out a detailed level of analysis on all component {\fms}
for anything but a small hypothetical system.

% \subsection{Component Tolerance}
%
% Component tolerances may need considering when determining if a component has failed.
% Calculations for acceptable ranges to determine failure or acceptable conditions
% must be made where appropriate.
% %
% An example of component tolerance considered for FMEA
% is given in section~\ref{sec:resistortolerance}.

%\section{FMEA in current usage: Five variants}
\section{FMEA in current usage: Four variants}

%\paragraph{Five main Variants of FMEA}
\paragraph{Four main Variants of FMEA}
 \begin{itemize}
  %\item \textbf{PFMEA - Production}   Emphasis on cost reduction and product improvement;
    \item \textbf{FMECA - Criticality}  Emphasis on minimising the effect of critical systems failing~\cite{fmeca}; % Military/Space
    \item \textbf{FMEDA - Statistical Safety} Statistical analysis giving Safety Integrity Levels~\cite{en61508};
   \item \textbf{DFMEA - Design or Static/Theoretical}  Approval of safety critical systems using FMEA and single or double failure prevention~\cite{en298};%  EN298/EN230/UL1998
   \item \textbf{SFMEA - Software FMEA} Usage not enforced by most current standards~\cite{en298,en230,en61508}. %only used in highly critical systems at present.
\end{itemize}


\nocite{MILSTD1629short}

\subsection{FMEA and modularity.}
Because modern electronics has become more complex, the number
of basic components has risen dramatically.
To add to this, components used to fulfil common functions are often Integrated Circuits (ICs).
Typical examples include voltage regulators, op-amps, micro-controllers~\cite{pic18f2523}, memory modules and
protocol handlers~\cite{mcp2515}. To build any of these components from scratch would be very expensive and time consuming,
but these IC `components' have very high internal transistor counts, and each has its own unique
failure mode behaviour.
Modern electronics has already outgrown the paradigm of mapping basic component failure modes directly to
system failures.

The automotive industry, because of mass production, must make products that are very safe, but is
under financial pressure to keep those products affordable.
%
This leads to specialist firms producing modules, such as automatic braking systems,
that are assembled to make an automobile.
%
Performing failure analysis using the basic component single failure modes to
system failure mapping, would be very difficult: this would require expert knowledge
of the design behaviour and component types used in each module.
%
The EN61508 variant for automotive use, as defined in standard ISO~26262, is known as Automotive SIL (ASIL)~\cite{Kafka20122}.
%
Because of the modular approach  forced on automotive  designers
a process has been developed called `ASIL~de-composition'~\cite{6464473}.
%
This allows automotive designers to use pre-certified modules in their designs
and applies broad statistical guidelines to achieving particular safety levels by
use of redundancy and automated diagnostics etc.
%
The US military standard for FMECA~\cite{fmeca}, describes a very broad modularity regime, that
it terms `indenture' levels. Indenture levels are arranged from the top down
and identify finer and finer grained modules. For instance, an aircraft
may be the first indenture level, and the next may be an identifiable module such as
an altitude radar: within that finer grained modules may be identified until
the base components are listed. Note that this is a top down approach and
this can introduce errors into the reliability calculations~\cite{MILSTD1629short}.

It is interesting to compare the development of FMEA methodologies with software.
Software expanded in complexity faster than electronics,
and to cope with this software languages developed modularity (function call trees, classes and finally distributed processing mechanisms).

FMEA has, by necessity, started to include some modular features, but none yet
have defined mechanisms for ensuring that all failure modes
from a module must be considered in the analysis of the module(s)
that incorporate it.

Because FMEA is a bottom-up technique, applying a top-down analysis (as in FMECA's indenture levels)
cannot guarantee that all component failure modes are considered in the correct context.
%
A top down approach (such as FTA) can miss~\cite{faa}[Ch.~9] individual failure modes of components,
especially where there are non-obvious or unexpected  top-level failures.
%
In order to ensure that every failure mode is considered, a bottom-up approach
including the {\fms} of every {\bc} must be used.
%
Going back to the software analogy, the indenture levels of FMECA are similar to
a software call tree where the highest indenture levels would be its leaf functions.
%
There is no equivalent of the software `class'.
%
In the real world, however, there is: consider CANopen standard sensors, which are %~\footnote{CANopen sensors...}
modules connected by an industrial data bus~\cite{canspec, caninauto}.
%
These typically have not only electrical and mechanical
components, but also firmware and communication bus aspects.
%
These types of module combine hardware, electronics, software, communications
and distributed programming.
%
Current FMEA techniques struggle with software alone, and also fail to integrate the analysis of hardware and software
systems~\cite{sfmea, embedsfmea, modelsfmea, sfmeaa, sfmeainterface}.


%
\subsection{FMEA and software.}
In addition to increasing complexity in electronics, modern control systems nearly always have a significant software/firmware element,
and not being able to model software with current FMEA methodologies
is a cause for criticism~\cite{safeware}[Ch.12].
%
Similar difficulties in integrating mechanical and electronic/software
failure models are discussed in~\cite{SMR:SMR580}.
%
Currently, standards that demand FMEA for hardware (e.g. EN298~\cite{en298}, EN61508~\cite{en61508})
do not specify it for software, but instead specify recommended computer architectures, good software practice,
review processes and language feature constraints.
%


%
%Software FMEA techniques have been proposed


%FMMD is a modularisation of FMEA and can produce failure~mode models that can be used in
%all the above variants of FMEA.

\paragraph{Current work on Software FMEA}

SFMEA usually does not seek to integrate
hardware and software models, but to perform
FMEA on the software in isolation~\cite{procsfmea}.
%
Work has been performed using databases
to track the relationships between variables
and system failure modes~\cite{procsfmeadb}, to %work has been performed to
introduce automation into the FMEA process~\cite{appswfmea} and to provide code analysis
automation~\cite{modelsfmea}. Although the SFMEA and hardware FMEAs are performed separately,
some schools of thought aim for Fault Tree Analysis (FTA)~\cite{nasafta,nucfta} (top-down, deductive)
and FMEA (bottom-up, inductive)
to be performed on the same system to provide insight into the
software/hardware interface~\cite{embedsfmea}.
%
Although this
would give a better picture of the failure mode behaviour, it
is by no means a rigorous approach to tracing errors that may occur in hardware
through to the top (and therefore ultimately controlling) layer of software.

\subsection{Current FMEA techniques are not suitable for software}

The main FMEA methodologies are all based on the concept of taking
base component {\fms}, and translating them into system level events/failures~\cite{sfmea,sfmeaa}.
%
In a complicated system, mapping a component failure mode to a system level failure
will mean a long reasoning distance; that is to say the actions of the
failed component will have to be traced through
several sub-systems, gauging its effects with and on other components.
%
With software at the higher levels of these sub-systems,
we have yet another layer of complication.
%
%In order to integrate software, %in a meaningful way
%we need to re-think the
%FMEA concept of simply mapping a base component failure to a system level event.
%
SFMEA regards the variables used by a program as the equivalent of hardware components~\cite{procsfmea}.
The failure modes of these variables are that they could be erroneously over-written,
calculated incorrectly (due to a mistake by the programmer, or a fault in the micro-processor on which the program is running), or
corrupted by external influences such as
ionising radiation causing bits to be erroneously altered.


\section{FMEA deficiencies and `wishlist'}

%\subsection{FMEA - General Criticism}
A summary of deficiencies in current FMEA methodologies is listed below:
\begin{itemize}
   %\item FMEA type methodologies were designed for simple electro-mechanical systems of the 1940's to 1960's,
   \item State explosion - %impossible
   very difficult/time consuming to perform FMEA exhaustively, %rigorously
   \item Difficult to re-use previous analysis work,
   \item Very difficult to model simultaneous/multiple failures,
   \item Software and hardware models are separate (if the software is modelled at all) meaning the software interface may not be correctly modelled,
   %\item reasoning distance -- component failure to system level symptom process is undefined in regard
   %to the components to check against each given component {\fm},
   \item FMEA methodologies are undefined in regard to which components to check against given failure modes,
   %
   \item Distributed real time systems are very difficult to analyse with FMEA because they typically involve many hardware/software interfaces.
\end{itemize}

Traditional forms of FMEA are no longer % fit for purpose!
of meaningful use for complex modern systems, especially those incorporating programmatic elements.
They were designed to analyse simple electro-mechanical systems,
and even commonplace high component count analogue circuits (which are usually surface mount and therefore physically small) are
getting too complicated for meaningful analysis using FMEA.


%
From the above deficiencies, a wish list for a better FMEA is presented, stating the features that should exist
in an improved FMEA methodology:
\begin{itemize}
    \item Must be able to analyse hybrid software/hardware systems,
    \item avoid state explosion (i.e. XFMEA is impractical by hand~\cite{cbds}),
    \item encourage exhaustive checking within each module, %(total failure coverage within {\fgs} all interacting component and failure modes checked),
    \item traceable reasoning inherent in system failure models,% to aid repeatability and checking,
    \item re-usable i.e. it should be possible to re-use analysis~\cite{rudov2009language},
    \item possibility to analyse simultaneous/multiple failures,
    \item one to one mapping from {\bc} {\fms} to system level failures (see section~\ref{sec:onetoone}),
    \item modular --- i.e. usable in a distributed system.
  % \item
\end{itemize}


\section{Proposed Methodology: Failure Mode Modular De-composition (FMMD)}


\paragraph{A more-complete Failure Mode Model}
%
In order to obtain a more complete failure mode model of
a hybrid electronic/software system we need to analyse
the hardware, the software, the hardware the software runs on (i.e. the software's medium),
and the software/hardware interface.
%
HFMEA is a well established technique and needs no further description in this paper.
%
\section{Example for analysis} % : How can we apply FMEA}
%
For the purpose of example, we chose a simple common safety critical industrial circuit
that is nearly always used in conjunction with a programmatic element.
A common method for delivering a quantitative value in analogue electronics is
to supply a current signal to represent the value to be sent~\cite{aoe}[p.934].
Usually, $4mA$ represents a zero or starting value and $20mA$ represents the full scale,
and this is referred to as {\ft} signalling.
%
{\ft} has an electrical advantage as well because the current in an electronic loop is constant~\cite{aoe}[p.20].
Thus resistance in the wires between the source and the receiving end is not an issue
that can alter the accuracy of the signal.
%
This circuit has many advantages for safety. If the signal becomes disconnected
it reads an out of range $0mA$ at the receiving end. This is outside the {\ft} range,
and is therefore easy to detect as an error rather than an incorrect value.
%
Should the driving electronics go wrong at the source end, it will usually
supply far too little or far too much current, making an error condition easy to detect.
%
At the receiving end, one needs a resistor to convert the
current signal into a voltage that we can read with an ADC.%
%we only require one simple component to convert the


%BLOCK DIAGRAM HERE WITH FT CIRCUIT LOOP

\begin{figure}[h]
 \centering
 \includegraphics[width=230pt]{./ftcontext.png}
 % ftcontext.png: 767x385 pixel, 72dpi, 27.06x13.58 cm, bb=0 0 767 385
 \caption{Context Diagram for {\ft} loop}
 \label{fig:ftcontext}
\end{figure}


The diagram in figure~\ref{fig:ftcontext} shows some equipment which is sending a {\ft}
signal to a micro-controller system.
The signal is locally driven over a load resistor, and then read into the micro-controller via
an ADC and its multiplexer.
From the voltage detected at the ADC, we read the intended quantitative
value from the external equipment.

\subsection{Simple Software Example}


Consider a software function that reads a {\ft} input, and returns a value between 0 and 999 (i.e. per mil $\permil$)
representing the value intended by the current detected, with an additional error indication flag to indicate the validity
of the value returned.
%
This example straddles the hardware/software interface, but is not overly complex, which allows
the seamless failure modelling of FMMD to be demonstrated.
%
A complete
PID based temperature controller is modelled in~\cite{clark}[6.3].
%
Let us assume the {\ft} detection is via a \ohms{220} resistor, and that we read a voltage
from an ADC into the software.
Let us define any value outside the 4mA to 20mA range as an error condition.
%
We use Ohm's law~\cite{aoe}, $V=IR$, to determine the corresponding voltage range: $$0.004A \times \ohms{220} = 0.88V $$
and $$0.020A \times \ohms{220} = 4.4V \;.$$
%
Our acceptable voltage range is therefore
%
$$(V \ge  0.88) \wedge (V \le 4.4) \; .$$

This voltage range forms our input requirement.
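%
As an illustration under the same assumptions (the $25mA$ over-range figure is hypothetical),
a disconnected loop gives
$$0A \times \ohms{220} = 0V < 0.88V \; ,$$
while an over-driven loop at $25mA$ gives
$$0.025A \times \ohms{220} = 5.5V > 4.4V \; ,$$
so both conditions fall outside the input requirement and can be reported as errors.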
%
We can now examine a software function that performs a conversion from the voltage read to
a per~mil representation of the {\ft} input current.
%
For the purposes of this example the `C' programming language~\cite{DBLP:books/ph/KernighanR88} is
used\footnote{C coding examples use the MISRA~\cite{misra} and SIL-3 recommended language constraints~\cite{en61508}.}.
We initially  assume a function \textbf{read\_ADC} which returns a floating point %double precision
value representing the voltage read (see code sample in figure~\ref{fig:code_read_4_20_input}).
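
As a worked check on the scaling that such a function is expected to perform
(a sketch assuming the \ohms{220} load resistor and the 0 to 999 per~mil range given above;
the symbols $V$ and $p$ are introduced here only for illustration),
an in-range voltage $V$ maps to a per~mil value $p$ by
$$ p = \frac{V - 0.88}{4.4 - 0.88} \times 999 \; . $$
For example, a mid-range current of 12mA gives $V = 0.012A \times \ohms{220} = 2.64V$, so that
$$ p = \frac{2.64 - 0.88}{3.52} \times 999 = 0.5 \times 999 = 499.5 \; ,$$
which truncates to 499 when stored as an integer.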


%%{\vbox{
\begin{figure}[h+]

\footnotesize
\begin{verbatim}
/***********************************************/
/* read_4_20_input()                           */
/***********************************************/
/* Software function to read 4mA to 20mA input */
/* returns a value from 0-999 proportional     */
/* to the current input.                       */
/***********************************************/
int  read_4_20_input ( int * value ) {
  double input_volts;
  int error_flag;

  /* set ADC MUX with input to read from */
  input_volts = read_ADC(INPUT_4_20_mA);

  if ( input_volts < 0.88 || input_volts > 4.4 ) {
    error_flag = 1; /* Error flag set to TRUE */
  }
  else {
    *value = (int) ((input_volts - 0.88)
                    / (4.4 - 0.88) * 999.0);
    error_flag = 0; /* indicate current input in range */
  }
  /* ensure: value is proportional (0-999) to the
             4 to 20mA input                      */
  return error_flag;
}
\end{verbatim}
%}
%}

\caption{Software Function:  \textbf{read\_4\_20\_input}}
\label{fig:code_read_4_20_input}
%\label{fig:420i}
\end{figure}

We now look at the function called by \textbf{read\_4\_20\_input}, \textbf{read\_ADC}, which returns a
voltage for a given ADC channel.
%
This function
deals directly with the hardware in the micro-controller on which we are running the software.
%
Its job is to select the correct channel (ADC multiplexer) and then to initiate a
conversion by setting an ADC `go' bit (see code sample in figure~\ref{fig:code_read_ADC}).
%
It takes the raw ADC reading and converts it into a
floating point\footnote{the type `double' or `double precision' is a
standard C language floating point type~\cite{DBLP:books/ph/KernighanR88}.}
voltage value.
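
As a rough numerical illustration of this conversion
(assuming, as in the comments and scaling constant of \textbf{read\_ADC}, a 12~bit ADC with 4096 codes
and a 5V full-scale reference; the symbol $N$ for the raw ADC count is introduced here for illustration only),
$$ V = N \times \frac{5.0}{4096} \; , $$
so one count corresponds to about 1.22mV.
Under the same assumptions, the limits of the acceptable input range fall at approximately
$$ N_{\mathrm{4mA}} = \frac{0.88}{5.0} \times 4096 \approx 721 $$
and
$$ N_{\mathrm{20mA}} = \frac{4.4}{5.0} \times 4096 \approx 3604 \; .$$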


%{\vbox{
\begin{figure}[h+]

\footnotesize
\begin{verbatim}
/***********************************************/
/* read_ADC()                                  */
/***********************************************/
/* Software function to read voltage from a    */
/* specified ADC MUX channel                   */
/* Assume 10 ADC MUX channels 0..9             */
/* ADC_CHAN_RANGE = 9                          */
/* Assume ADC is 12 bit and ADCRANGE = 4096    */
/* returns voltage read as double precision    */
/***********************************************/
double  read_ADC( int  channel ) {
  int timeout = 0;
  double dval; /* voltage to return */

  /* return out of range result  */
  /* if invalid channel selected */
  if ( channel < 0 || channel > ADC_CHAN_RANGE )
     return -2.0;
  /* set the multiplexer to the desired channel */
  ADCMUX = channel;
  ADCGO = 1; /* initiate ADC conversion hardware */
  /* wait for ADC conversion with timeout */
  while ( ADCGO == 1 && timeout < 100 )
     timeout++;
  if ( timeout < 100 )
       dval = (double) ADCOUT * 5.0 / ADCRANGE;
  else
       dval = -1.0; /* indicate invalid reading */
  /* return voltage as a floating point value */
  /* ensure: value is voltage input to within 0.1% */
  return dval;
}
\end{verbatim}
\caption{Software Function: \textbf{read\_ADC}}
\label{fig:code_read_ADC}
\end{figure}
%}
%}


We now have a very simple software structure, a call tree, where {\em read\_4\_20\_input}
calls {\em read\_ADC}, which in turn interacts with the hardware/electronics.
%shown in figure~\ref{fig:ct1}.
%
% \begin{figure}[h]
%  \centering
%  \includegraphics[width=56pt]{./ct1.png}
%  % ct1.png: 151x224 pixel, 72dpi, 5.33x7.90 cm, bb=0 0 151 224
%  \caption{Call tree for software example}
%  \label{fig:ct1}
% \end{figure}
%
This software is above the hardware in the conceptual call tree: from a programmatic perspective, the
software is reading values from the `lower~level' electronics.
%
%FMEA is always a bottom-up process and so we must begin with this hardware.
%
The hardware is simply a load resistor, connected across an ADC input
pin on the micro-controller and ground.
%
We can identify the resistor and the ADC module of the micro-controller as
the base components in this design.
%
We now apply FMMD starting with the hardware.


\section{Failure Mode Effects Analysis}

Four emerging and current techniques are now used to
apply FMEA to the hardware, the software, the software medium and the software/hardware interface.

\subsection{Hardware FMEA}

The hardware FMEA requires that for each component we consider all failure modes
and the putative effect those failure modes would have on the system.
The electronic components in our {\ft} system are the load resistor,
the multiplexer and the analogue to digital converter.

{
\tiny
\begin{table}[h+]
\caption{Hardware FMEA {\ft}} % title of Table
\label{tbl:r420i}

\begin{tabular}{|| l   | c |   l ||} \hline
 \textbf{Failure}   &  \textbf{failure}     & \textbf{System}          \\
 \textbf{Scenario}  &  \textbf{effect}      &     \textbf{Failure}                             \\ \hline
               \hline
    $R$                      &  OPEN~\cite[Ann.A]{en298}     &      $HIGH$       \\
                                &           &    $READING$             \\ \hline

    $R$                      &  SHORT~\cite[Ann.A]{en298}     &      $LOW$       \\
                                &           &    $READING$             \\ \hline


    $MUX$                   &    read wrong                           &      $VAL\_ERROR$            \\
                             &    input~\cite[3-102]{fmd91}          &                \\ \hline


     $ADC$                  & ADC output                     &      $VAL\_ERROR$           \\
                               & erroneous~\cite[3-109]{fmd91}  &                  \\ \hline
\hline
\end{tabular}
\end{table}
}

The last two failures both lead to the system failure $VAL\_ERROR$.
They could also lead to a low or high reading, but we could only determine this
from knowledge of the software system's criteria for these.
%\clearpage
\subsection{Software FMEA - variables in place of components}

For software FMEA, we take the variables used by the system,
and examine what could happen if they are corrupted in various ways~\cite{procsfmea, embedsfmea}.
From the function  $read\_4\_20\_input()$ we have the variables $error\_flag$,
$input\_volts$ and $value$; from the function $read\_ADC()$, $timeout$, $ADCMUX$, $ADCGO$ and $dval$.
We must now determine putative system failure modes for these variables becoming corrupted; this is performed in table~\ref{tbl:sfmea}.


{
\tiny
\begin{table}[h+]
\caption{SFMEA {\ft}} % title of Table
\label{tbl:sfmea}

\begin{tabular}{|| l   | c |   l ||} \hline
 \textbf{Failure}   &  \textbf{failure}     & \textbf{System}          \\
 \textbf{Scenario}  &  \textbf{effect}      &   \textbf{Failure}                               \\ \hline
               \hline
    $error\_flag$               &   set FALSE        &  $VAL\_ERROR$    \\
                                &                   &            \\ \hline

   $error\_flag$               &    set TRUE        &   invalid       \\
                                &                  &   error flag             \\ \hline

    $input\_volts$              &  corrupted         & $VAL\_ERROR$      \\
                                &                    &          \\ \hline


    $value $                   &  corrupted           &   $VAL\_ERROR$                    \\
                               &                      &                 \\ \hline


     $timeout $                  &  corrupted                   &  $VAL\_ERROR$              \\
                                 &                              &                  \\ \hline


    $ADCMUX $                   &   corrupted                &   $VAL\_ERROR$             \\
                                &                            &                 \\ \hline


     $ADCGO $                  &    corrupted       &     $VAL\_ERROR$           \\
                               &                    &                  \\ \hline

    $dval $                   &    corrupted        &    $VAL\_ERROR$            \\
                             &                      &                 \\ \hline


\hline
\end{tabular}
\end{table}
}
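
Table~\ref{tbl:sfmea} treats each variable as simply `corrupted'.
As a minimal host-side sketch of how one such case might be explored
(the function name \textbf{convert\_volts}, and the idea of passing the voltage in as a parameter,
are assumptions made here for illustration and are not part of the example system),
the range check and scaling of \textbf{read\_4\_20\_input} can be isolated
so that a corrupted $input\_volts$ value can be injected directly:

{\footnotesize
\begin{verbatim}
#include <stddef.h>

/* Hypothetical host-side sketch, not part of  */
/* the example system: the range check and     */
/* scaling of read_4_20_input(), with the      */
/* voltage passed in so that a corrupted       */
/* input_volts value can be simulated.         */
/* Returns 0 if *value is valid, 1 on error.   */
int convert_volts(double input_volts,
                  int *value) {
  if (value == NULL)
    return 1;
  if (input_volts < 0.88 || input_volts > 4.4)
    return 1; /* out of range: flag an error */
  *value = (int) ((input_volts - 0.88)
                  / (4.4 - 0.88) * 999.0);
  return 0;   /* in range: value accepted */
}
\end{verbatim}
}

Under these assumptions, a corrupted value that falls outside 0.88V to 4.4V is caught by the range check,
whereas one that remains inside the range (for example 1.0V in place of 2.64V)
is returned as an apparently valid reading, which corresponds to the $VAL\_ERROR$ entry in the table.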
%\clearpage
\subsection{Software FMEA - failure modes of the medium ($\mu P$) of the software}

Microprocessors and micro-controllers have sets of known failure modes; these include RAM, ROM and
EEPROM failure\footnote{EEPROM failure is not applicable for this example.} and
oscillator clock timing faults.
An FMEA of this `software~medium' is given in table~\ref{tbl:sfmeaup}.


{
\tiny
\begin{table}[h+]
\caption{SFMEA of the software medium ($\mu P$) {\ft}} % title of Table
\label{tbl:sfmeaup}

\begin{tabular}{|| l   | c |   l ||} \hline
 \textbf{Failure}   &  \textbf{failure}     & \textbf{System}        \\
 \textbf{Scenario}  &  \textbf{effect}      & \textbf{Failure}       \\ \hline
               \hline
    $RAM$               &   variable        & All errors   \\
                        &   corruption      &   from table~\ref{tbl:sfmea}        \\ \hline

   $RAM$               &    program flow        &   process       \\
                       &                        &    halts / crashes        \\ \hline

     $OSC$		& stopped &      process   \\
                      &           &     halts    \\ \hline

  $OSC$		 & too               &  ADC      \\
                  &  fast            &  value errors      \\ \hline

  $OSC$		 & too            &   ADC      \\
                  &  slow         &  value errors      \\ \hline

     $ROM$		& program               &    All errors      \\
                       &   corruption          &  from table~\ref{tbl:sfmea}       \\ \hline

   $ROM$		& constant             &    All errors      \\
                      &  /data corruption     &   from table~\ref{tbl:sfmea}      \\ \hline

\hline
\end{tabular}
\end{table}
}

%\clearpage
\subsection{Software FMEA - The software/hardware interface}

As FMEA is applied separately to software and hardware,
the interface between them is an undefined factor.
Ozarin~\cite{sfmeainterface,procsfmea}  recommends that an FMEA report be written
to focus on the software/hardware interface.
The software/hardware interface has
specific problems common to many systems and configurations
and these are described in~\cite{sfmeainterface}.
%An interface FMEA is performed in table~\ref{hwswinterface}.
%
The hardware to software interface for the {\ft} example is handled
by the `C' function $read\_ADC()$
(see code sample in figure~\ref{fig:code_read_ADC}).
%
% An FMEA of the `software~medium' is given in table~\ref{tbl:sfmeaup}.
\paragraph{Timing and Synchronisation.}
The $ADCOUT$ register, from which the raw ADC value is read,
is an internal register used by the ADC and presented
as a readable memory location when the ADC
has finished updating it.
Reading it at the wrong time would
cause an invalid value to be read.
The synchronisation is performed by polling an $ADCGO$
bit, a flag mapped to memory by which  the ADC indicates that the data is ready.
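
The polling described above might be sketched as follows.
This is an illustration only: the function name \textbf{wait\_adc\_ready}, its parameters
and the convention that the flag reads 1 while a conversion is in progress are assumptions,
and real hardware would expose the flag as a memory mapped register rather than a pointer argument.

{\footnotesize
\begin{verbatim}
/* Hypothetical sketch: poll a 'go' flag until */
/* it clears, giving up after max_polls tries. */
/* Returns 0 if the flag cleared (data ready), */
/* 1 if it was still set when polling stopped. */
int wait_adc_ready(volatile const int *go_flag,
                   int max_polls) {
  int polls = 0;

  while (*go_flag == 1 && polls < max_polls)
    polls++;
  /* decide on the flag, not on the counter */
  return (*go_flag == 1) ? 1 : 0;
}
\end{verbatim}
}

Testing the flag itself after the loop, rather than the poll counter,
means that a conversion which completes on the final poll is still treated as valid.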

\paragraph{Interrupt Contention.}
Were an interrupt routine also to attempt to read from the ADC,
the $ADCMUX$ setting could be altered, causing the non-interrupt
routine to read from the wrong channel.

\paragraph{Data Formatting.}
The ADC may use a big-endian or little-endian integer
format. It may also right or left justify the bits in its value.
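
As a sketch of how such formatting differences might be handled in software
(the function name \textbf{adc\_count\_12}, the two 8~bit result registers and the justification flag
are all hypothetical, since the example does not specify a particular ADC),
the raw count could be assembled as follows:

{\footnotesize
\begin{verbatim}
#include <stdint.h>

/* Hypothetical sketch: combine two 8 bit      */
/* result registers into a 12 bit ADC count.   */
/* high, low: most and least significant bytes */
/* left_justified: non-zero if the 12 bits sit */
/* at the top of the 16 bit word.              */
uint16_t adc_count_12(uint8_t high, uint8_t low,
                      int left_justified) {
  uint16_t word;

  word = (uint16_t) (((uint16_t) high << 8)
                     | low);
  if (left_justified)
    word = (uint16_t) (word >> 4);
  return (uint16_t) (word & 0x0FFFu);
}
\end{verbatim}
}

With this sketch, a right justified register pair 0x0A, 0xBC and a left justified pair 0xAB, 0xC0
both yield the count 0xABC.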


\section{Conclusion}
%
This paper has picked a very simple example (the industry standard {\ft}
input circuit and software) to demonstrate
SFMEA and HFMEA methodologies used to describe a failure mode model.
%Even a modest system would be far too large to analyse in conference paper
%and this
%
%The {\dc} representing the {\ft} reader
%shows that by taking a
%modular approach for FMEA, i.e. FMMD, we can integrate
Our model is described by four FMEA reports, and these
model the system from several failure mode perspectives.
%
With traditional FMEA methods the reasoning~distance is large, because
it stretches from the component failure mode to the top (or system) level failure.
%
With these four  analysis reports
we do not have stages along the `reasoning~path' linking the failure modes from the
electronics to those in the software.
%Software is often written `defensively' but t
%Each {\fg} to {\dc} transition represents a
%reasoning stage.
%
%
%For this reason applying traditional FMEA to software stretches
%the reasoning distance even further.
%
In fact, many of these reasoning paths overlap, or even by-pass one another, so
it is very difficult to gauge cause and effect.
For instance, hardware failures are not analysed in the context of how they will
be handled (or missed) by the software.
%
System outputs commanded from software may not take into account particular
hardware limitations etc.

The interface FMEA does serve to provide a useful
check-list to ensure data and synchronisation conventions used by the hardware
and software are not mismatched. However, the fact that it is perceived as required
highlights the mismatches possible between the two types of analysis,
which could run deeper than the mere interface level.


While these techniques ensure that the software and hardware are
viewed and analysed from several perspectives, the result cannot be termed a homogeneous
failure mode model.
%  For instance
%  were the ADC to have a small value error, say adding
%  a small percentage onto the value, we would be unable to
%  detect this under the analysis conditions for this model, or
%  be able to pinpoint it.
%

% TODO: wishlist ticks and solved problems here.

{
\footnotesize
\bibliographystyle{plain}
\bibliography{../../vmgbibliography,../../mybib}
}
\today
%\today
\end{document}