Literature review started

Using the paper "Practical Assessment Research & Evaluation"
as a guide to structure
Robin Clark 2012-11-03 22:49:39 +00:00
parent 8d39f0c310
commit f0b4ecb0fc
3 changed files with 109 additions and 591 deletions


@ -380,6 +380,13 @@ year = {2012},
YEAR = "2005" YEAR = "2005"
} }
@BOOK{easw,
AUTHOR = "Nancy Leveson",
TITLE = "Engineering a Safer World ISBN: 978-0-262-01662-9",
PUBLISHER = "MIT Press",
YEAR = "2011"
}
@BOOK{scse, @BOOK{scse,
AUTHOR = "Fortescue, Swinerd, Stark", AUTHOR = "Fortescue, Swinerd, Stark",
TITLE = "Spacecraft Systems Engineering ISBN:978-0-470-75012-4", TITLE = "Spacecraft Systems Engineering ISBN:978-0-470-75012-4",


@ -1,26 +1,20 @@
\section{Copy dot tex} \section{Introduction}
MSc project Euler/Spider Diagram editor --- Euler/Spider Diagrams
could be used to model failure modes in components.
--- 2005 paper --- need for static analysis because of
high reliability of modern safety critical systems.
\section{Practical Experience: Safety Critical Product Approvals}
FMEA performed on selected areas perceived as critical
by test house.
Blanket measures, RAM ROM checks, EMC, electrical and environmental stress testing
\subsection{Practical limitations of testing for certification vs. rigorous approach}
State explosion problem considering a failure mode of a given component against
all other components in the system.
Impossible to perform double simultaneous failure analysis (as demanded by EN298~\cite{en298}).
sample text


@ -1,7 +1,7 @@
EN61508:6\cite{en61508}[B.6.6] The generic and statistical European Safety Standard, EN61508:6\cite{en61508}[B.6.6]
describes FMEA as: describes Failure Mode Effect Analysis (FMEA) as:
\begin{quotation} \begin{quotation}
"To analyse a system design, by examining all possible sources of failure "To analyse a system design, by examining all possible sources of failure
of a system's components and determining the effects of these failures of a system's components and determining the effects of these failures
@ -10,26 +10,34 @@ on the behaviour and safety of the system."
\section{Concepts} \section{Concepts}
Forward and backward searching... \paragraph{Forward and backward searches}
forward search starts with possible failure causes
and works out what could happen
backward search uses possible failures and works back down (and not necessarily to A forward search starts with possible failure causes
base components in a system) and uses logic and reasoning to determine system level outcomes.
Reasoning distance .... general concept... simple ideas about how complex a A backward search starts with system level events
failure analysis is the more modules and components are involved works back down (and not necessarily to
base components in a system) using decomposition
of the system and logic.
FMEA based methodologies are forward searches~\cite{Lutz:1997:RAU:590564.590572}, whereas top down
methodologies such as FTA~\cite{nucfta,nasafta} are backward searches.
\paragraph{Reasoning distance}
A reasoning distance is the number of stages of logic and reasoning
required to map a failure cause to its potential outcomes.
%.... general concept... simple ideas about how complex a
%failure analysis is the more modules and components are involved
% cite for forward and backward search related to safety critical software % cite for forward and backward search related to safety critical software
\cite{Lutz:1997:RAU:590564.590572} %{sfmeaforwardbackward} %{sfmeaforwardbackward}
\section{F.M.E.A.} \section{FMEA}
\subsection{FMEA} %\subsection{FMEA}
%\tableofcontents[currentsection] %\tableofcontents[currentsection]
FMEA is a broad term; it could mean anything from an informal check on how FMEA is a broad term; it could mean anything from an informal check on how
failures could affect some equipment in an initial brain-storming session failures could affect some equipment in an initial brain-storming session
in product design, to formal submissions as part of safety critical certification. in product design, to formal submission as part of safety critical certification.
% %
This chapter describes basic concepts of FMEA, uses a simple example to This chapter describes basic concepts of FMEA, uses a simple example to
demonstrate a single FMEA analysis stage, describes the four main variants of FMEA in use today demonstrate a single FMEA analysis stage, describes the four main variants of FMEA in use today
@ -143,9 +151,9 @@ approach in looking for system failures.
\subsection{The unacceptability of a single component failure causing a catastrophe} \subsection{The unacceptability of a single component failure causing a catastrophe}
FMEA, due to its inductive bottom-up approach, is very good FMEA, due to its inductive bottom-up approach, is very good
at finding potential component failures that could have catastrophic implications. at finding potential single component failures that could have catastrophic implications.
Used in the design phase of a project FMEA is an invaluable tool Used in the design phase of a project FMEA is an invaluable tool
for unearthing these type of failure scenario. for unearthing these failure scenarios.
It is less useful for determining catastrophic events for multiple It is less useful for determining catastrophic events for multiple
simultaneous\footnote{Multiple simultaneous failures are taken to mean failures that occur within the same detection period.} failures. simultaneous\footnote{Multiple simultaneous failures are taken to mean failures that occur within the same detection period.} failures.
@ -154,18 +162,18 @@ simultaneous\footnote{Multiple simultaneous failures are taken to mean failure t
Modern electronic components are generally very reliable, and the systems built from them Modern electronic components are generally very reliable, and the systems built from them
are thus very reliable too. Reliable field data on failures will therefore be sparse. are thus very reliable too. Reliable field data on failures will therefore be sparse.
Should we wish to prove a continuous demand system for say ${10}^{-7}$ failures\footnote{${10}^{-7}$ failures per hour of operation is the Should we wish to prove a continuous demand system for say ${10}^{-7}$ failures\footnote{${10}^{-7}$ failures per hour of operation is the
threshold for S.I.L. 3 reliability~\cite{en61508}.} threshold for S.I.L. 3 reliability~\cite{en61508}. Failure rates are normally measured per $10^9$ hours of operation
and are known as Failure in Time (FIT) values. The maximum FIT value for a SIL 3 system is therefore 100.}
per hour of operation, even with 1000 correctly monitored units in the field per hour of operation, even with 1000 correctly monitored units in the field
we could only expect one failure per ten thousand hours (roughly one failure every fourteen months). we could only expect one failure per ten thousand hours (roughly one failure every fourteen months).
It would be utterly impractical to get statistically significant data for equipment It would be utterly impractical to get statistically significant data for equipment
at these reliability levels. at these reliability levels.
However, we can use FMEA (more specifically the FMEDA variant, see section~\ref{sec:FMEDA}), working from known component failure rates, to obtain However, we can use FMEA (more specifically the FMEDA variant, see section~\ref{sec:FMEDA}),
working from known component failure rates, to obtain
statistical estimates of the equipment reliability. statistical estimates of the equipment reliability.
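As a hedged worked example of the arithmetic above, assume a constant failure rate and units that operate continuously; the symbols $\lambda$ and $N$ are introduced here purely for illustration.
A failure rate of $\lambda = {10}^{-7}$ per hour, expressed per ${10}^{9}$ hours, is
$$ {10}^{-7} \times {10}^{9} = 100 \; \mbox{FIT} \; .$$
For $N = 1000$ monitored units the expected number of failures per hour across all units is
$$ N \lambda = 1000 \times {10}^{-7} = {10}^{-4} \; ,$$
that is, one expected failure per ${10}^{4}$ hours.
Taking 8760 hours in a year this gives
$$ {10}^{-4} \times 8760 \approx 0.88 $$
expected failures per year, or roughly one failure every fourteen months.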
\subsection{Rigorous FMEA --- State Explosion Problem} \subsection{FMEA and the State Explosion Problem}
\paragraph{Rigorous Single Failure FMEA} \paragraph{Rigorous Single Failure FMEA}
@ -251,14 +259,8 @@ number.
Fixing problems with the highest RPN number Fixing problems with the highest RPN number
will return most cost benefit. will return most cost benefit.
% benign example of PFMEA in CARS - make something up. % benign example of PFMEA in CARS - make something up.
\subsection{PFMEA Example} \subsection{PFMEA Example}
\begin{table}[ht] \begin{table}[ht]
\caption{FMEA Calculations} % title of Table \caption{FMEA Calculations} % title of Table
%\centering % used for centering table %\centering % used for centering table
@ -268,97 +270,22 @@ will return most cost benefit.
relay 2 n/c & $1*10^{-5}$ & 98.0 & doorlocks fail & 0.00098 \\ \hline relay 2 n/c & $1*10^{-5}$ & 98.0 & doorlocks fail & 0.00098 \\ \hline
% rear end crash & $14.4*10^{-6}$ & 267,700 & fatal fire & 3.855 \\ % rear end crash & $14.4*10^{-6}$ & 267,700 & fatal fire & 3.855 \\
% ruptured f.tank & & & & \\ \hline % ruptured f.tank & & & & \\ \hline
\hline \hline
\end{tabular} \end{tabular}
\end{table} \end{table}
%Savings: 180 burn deaths, 180 serious burn injuries, 2,100 burned vehicles. Unit Cost: $200,000 per death, $67,000 per injury, $700 per vehicle.
%Total Benefit: 180 X ($200,000) + 180 X ($67,000) + $2,100 X ($700) = $49.5 million.
%COSTS
%Sales: 11 million cars, 1.5 million light trucks.
%Unit Cost: $11 per car, $11 per truck.
%Total Cost: 11,000,000 X ($11) + 1,500,000 X ($11) = $137 million.
%\subsection{Production FMEA : Example Ford Pinto : 1975}
\subsection{PFMEA Example: Ford Pinto: 1975}
\begin{figure}[h]
\centering
\includegraphics[width=300pt]{./CH2_FMEA/ad_ford_pinto_mpg_red_3_1975.jpg}
% ad_ford_pinto_mpg_red_3_1975.jpg: 720x933 pixel, 96dpi, 19.05x24.69 cm, bb=0 0 540 700
\caption{Ford Pinto Advert}
\label{fig:fordpintoad}
\end{figure}
\begin{figure}[h]
\centering
\includegraphics[width=300pt]{./CH2_FMEA/burntoutpinto.png}
% burntoutpinto.png: 376x250 pixel, 72dpi, 13.26x8.82 cm, bb=0 0 376 250
\caption{Burnt Out Pinto}
\label{fig:burntoutpinto}
\end{figure}
\begin{table}[ht]
\caption{FMEA Calculations} % title of Table
%\centering % used for centering table
\begin{tabular}{|| l | l | c | c | l ||} \hline
\textbf{Failure Mode} & \textbf{P} & \textbf{Cost} & \textbf{Symptom} & \textbf{RPN} \\ \hline \hline
relay 1 n/c & $1*10^{-5}$ & 38.0 & indicators fail & 0.00038 \\ \hline
relay 2 n/c & $1*10^{-5}$ & 98.0 & doorlocks fail & 0.00098 \\ \hline
rear end crash & $14.4*10^{-6}$ & 267,700 & fatal fire & 3.855 \\
ruptured f.tank & & & allow & \\ \hline
rear end crash & $1$ & $11$ & recall & 11.0 \\
ruptured f.tank & & & fix tank & \\ \hline
\hline
\end{tabular}
\end{table}
% don't think this is relevant for the thesis: http://www.youtube.com/watch?v=rcNeorjXMrE
\section{FMECA - Failure Modes Effects and Criticality Analysis} \section{FMECA - Failure Modes Effects and Criticality Analysis}
\subsection{ FMECA - Failure Modes Effects and Criticality Analysis}
% \begin{figure}
% \centering
\subsection{ FMECA - Failure Modes Effects and Criticallity Analysis} % %\includegraphics[width=100pt]{./military-aircraft-desktop-computer-wallpaper-missile-launch.jpg}
\begin{figure} % \includegraphics[width=300pt]{./CH2_FMEA/A10_thunderbolt.jpg}
\centering % % military-aircraft-desktop-computer-wallpaper-missile-launch.jpg: 1024x768 pixel, 300dpi, 8.67x6.50 cm, bb=0 0 246 184
%\includegraphics[width=100pt]{./military-aircraft-desktop-computer-wallpaper-missile-launch.jpg} % \caption{A10 Thunderbolt}
\includegraphics[width=300pt]{./CH2_FMEA/A10_thunderbolt.jpg} % \label{fig:f16missile}
% military-aircraft-desktop-computer-wallpaper-missile-launch.jpg: 1024x768 pixel, 300dpi, 8.67x6.50 cm, bb=0 0 246 184 % \end{figure}
\caption{A10 Thunderbolt}
\label{fig:f16missile}
\end{figure}
Emphasis on determining criticality of failure. Emphasis on determining criticality of failure.
Applies some Bayesian statistics (probabilities of component failures and those thereby causing given system level failures). Applies some Bayesian statistics (probabilities of component failures and those thereby causing given system level failures).
@ -538,7 +465,7 @@ by statistically determining how frequently it can fail dangerously.
\subsection{ FMEDA - Failure Modes Effects and Diagnostic Analysis} \subsection{ FMEDA - Failure Modes Effects and Diagnostic Analysis}
{
\begin{table}[ht] \begin{table}[ht]
\caption{FMEA Calculations} % title of Table \caption{FMEA Calculations} % title of Table
%\centering % used for centering table %\centering % used for centering table
@ -612,7 +539,36 @@ judged to be in critical sections of the product.
\section{Software FMEA (SFMEA)} \section{Literature Review}
%% FOCUS
The focus of this literature review is to establish the practice and applications
of FMEA, and to examine its strengths and weaknesses.
%% GOAL
Its
goal is to identify central issues and to criticise and assess the current
FMEA methodologies.
%% PERSPECTIVE
The perspective of the author, is as a practitioner of static failure mode analysis techniques
concerning approval of product
to European safety standards, both the prescriptive~\cite{en298,en230} and statistical~\cite{en61508}.
A second perspective is that of a software engineer trained to use formal methods.
Examining FMEA methodologies for mathematical properties, influenced by
formal methods applied to software, should provide an angle not traditionally considered.
%% COVERAGE
The literature reviewed, has been restricted to published books, European safety standards (as examples
of current safety measures applied), and traditional research, from journal and conference papers.
%% ORGANISATION
The review is organised by concept, that is, FMEA can be applied to hardware, software, software~interfacing and
to multiple failure scenarios etc. Methodologies related to FMEA are briefly covered for the sake of context.
%% AUDIENCE
% Well duh! PhD supervisors and examiners....
\subsection{Related Methodologies}
FTA --- HAZOP --- ALARP --- Event Tree Analysis --- bow tie concept
\subsection{Hardware FMEA (HFMEA)}
\subsection{Multiple Failure scenarios and FMEA}
\subsection{Software FMEA (SFMEA)}
\paragraph{Current work on Software FMEA} \paragraph{Current work on Software FMEA}
@ -635,7 +591,7 @@ would give a better picture of the failure mode behaviour, it
is by no means a rigorous approach to tracing errors that may occur in hardware is by no means a rigorous approach to tracing errors that may occur in hardware
through to the top (and therefore ultimately controlling) layer of software. through to the top (and therefore ultimately controlling) layer of software.
\subsection{Current FMEA techniques are not suitable for software} \paragraph{Current FMEA techniques are not suitable for software}
The main FMEA methodologies are all based on the concept of taking The main FMEA methodologies are all based on the concept of taking
base component {\fms}, and translating them into system level events/failures~\cite{sfmea,sfmeaa}. base component {\fms}, and translating them into system level events/failures~\cite{sfmea,sfmeaa}.
@ -659,468 +615,29 @@ external influences such as
ionising radiation causing bits to be erroneously altered. ionising radiation causing bits to be erroneously altered.
\paragraph{A more-complete Failure Mode Model}
% HFMEA
% SFMEA
% VARIABLE CURRUPTION
% MICRO PROCESSOR FAULTS
% INTERFACE ANALYSIS
%
% add them all together --- a load of bollocks, lots of impressive inches of reports that no one will be bothered to read....
%
In order to obtain a more complete failure mode model of
a hybrid electronic/software system we need to analyse
the hardware, the software, the hardware the software runs on (i.e. the software's medium),
and the software/hardware interface.
%
HFMEA is a well established technique and needs no further description in this paper.
\section{Example for analysis} % : How can we apply FMEA}
For the purpose of example, we chose a simple common safety critical industrial circuit
that is nearly always used in conjunction with a programmatic element.
A common method for delivering a quantitative value in analogue electronics is
to supply a current signal to represent the value to be sent~\cite{aoe}[p.934].
Usually, $4mA$ represents a zero or starting value and $20mA$ represents the full scale,
and this is referred to as {\ft} signalling.
%
{\ft} has an electrical advantage as well because the current in an electronic loop is constant~\cite{aoe}[p.20].
Thus resistance in the wires between the source and the receiving end is not an issue
that can alter the accuracy of the signal.
%
This circuit has many advantages for safety. If the signal becomes disconnected
it reads an out of range $0mA$ at the receiving end. This is outside the {\ft} range,
and is therefore easy to detect as an error rather than an incorrect value.
%
Should the driving electronics go wrong at the source end, it will usually
supply far too little or far too much current, making an error condition easy to detect.
%
At the receiving end, one needs a resistor to convert the
current signal into a voltage that we can read with an ADC.%
%we only require one simple component to convert the
%BLOCK DIAGRAM HERE WITH FT CIRCUIT LOOP
\begin{figure}[h]
\centering
\includegraphics[width=250pt]{./CH2_FMEA/ftcontext.png}
% ftcontext.png: 767x385 pixel, 72dpi, 27.06x13.58 cm, bb=0 0 767 385
\caption{Context Diagram for {\ft} loop}
\label{fig:ftcontext}
\end{figure}
The diagram in figure~\ref{fig:ftcontext} shows some equipment which is sending a {\ft}
signal to a micro-controller system.
The signal is locally driven over a load resistor, and then read into the micro-controller via
an ADC and its multiplexer.
From the voltage detected at the ADC, via the multiplexer, we read the intended quantitative
value from the external equipment.
\subsection{Simple Software Example}
Consider a software function that reads a {\ft} input, and returns a value between 0 and 999 (i.e. per mil $\permil$)
representing the value intended by the current detected, with an additional error indication flag to indicate the validity
of the value returned.
%
Let us assume the {\ft} detection is via a \ohms{220} resistor, and that we read a voltage
from an ADC into the software.
Let us define any value outside the 4mA to 20mA range as an error condition.
%
Using Ohm's law~\cite{aoe}, $V=IR$, we determine the voltage range: $$0.004A \times \ohms{220} = 0.88V $$
and $$0.020A \times \ohms{220} = 4.4V \;.$$
%
Our acceptable voltage range is therefore
%
$$(V \ge 0.88) \wedge (V \le 4.4) \; .$$
This voltage range forms our input requirement.
%
We can now examine a software function that performs a conversion from the voltage read to
a per~mil representation of the {\ft} input current.
%
For the purpose of example the `C' programming language~\cite{DBLP:books/ph/KernighanR88} is
used\footnote{ C coding examples use the Misra~\cite{misra} and SIL-3 recommended language constraints~\cite{en61508}.}.
We initially assume a function \textbf{read\_ADC} which returns a floating point %double precision
value representing the voltage read (see code sample in figure~\ref{fig:code_read_4_20_input}).
%%{\vbox{
\begin{figure}[h!]
\footnotesize
\begin{verbatim}
/***********************************************/
/* read_4_20_input()                           */
/***********************************************/
/* Software function to read 4mA to 20mA input */
/* returns a value from 0-999 proportional     */
/* to the current input.                       */
/***********************************************/
int read_4_20_input ( int * value ) {
    double input_volts;
    int error_flag;

    /* set ADC MUX with input to read from */
    input_volts = read_ADC(INPUT_4_20_mA);

    if ( input_volts < 0.88 || input_volts > 4.4 ) {
        error_flag = 1; /* Error flag set to TRUE */
    }
    else {
        /* scale 0.88V..4.4V linearly onto 0..999 */
        *value = (int) ((input_volts - 0.88) / ( 4.4 - 0.88 ) * 999.0);
        error_flag = 0; /* indicate current input in range */
    }
    /* ensure: value is proportional (0-999) to the
       4 to 20mA input */
    return error_flag;
}
\end{verbatim}
%}
%}
\caption{Software Function: \textbf{read\_4\_20\_input}}
\label{fig:code_read_4_20_input}
%\label{fig:420i}
\end{figure}
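The linear scaling at the heart of this conversion can be illustrated in isolation.
The following is a minimal sketch rather than the function from the figure: the name \textbf{volts\_to\_permil}, the constants \textbf{V\_MIN} and \textbf{V\_MAX}, and the choice to round to the nearest integer are assumptions made for illustration; only the 0.88V to 4.4V input range and the 0 to 999 output scale are taken from the text.

```c
#include <assert.h>
#include <stddef.h>

#define V_MIN 0.88 /* volts across 220 ohms at 4mA  */
#define V_MAX 4.40 /* volts across 220 ohms at 20mA */

/* Hypothetical helper: maps an in-range voltage onto 0..999.
   Returns 0 on success, or 1 if the voltage is outside the
   4mA to 20mA range (in which case *permil is left unchanged). */
int volts_to_permil(double volts, int *permil)
{
    double fraction;

    assert(permil != NULL);

    if (volts < V_MIN || volts > V_MAX) {
        return 1; /* error: current outside 4mA..20mA */
    }

    /* fraction of full scale: 0.0 at V_MIN and 1.0 at V_MAX */
    fraction = (volts - V_MIN) / (V_MAX - V_MIN);

    /* round to nearest so that V_MAX maps to 999 */
    *permil = (int)(fraction * 999.0 + 0.5);
    return 0;
}
```

With this scaling an 8mA input, which develops 1.76V across the load resistor, is one quarter of the way along the range and is reported as 250.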
We now look at the function called by \textbf{read\_4\_20\_input}, \textbf{read\_ADC}, which returns a
voltage for a given ADC channel.
%
This function
deals directly with the hardware in the micro-controller on which we are running the software.
%
Its job is to select the correct channel (ADC multiplexer) and then to initiate a
conversion by setting an ADC 'go' bit (see code sample in figure~\ref{fig:code_read_ADC}).
%
It takes the raw ADC reading and converts it into a
floating point\footnote{the type `double' or `double precision' is a
standard C language floating point type~\cite{DBLP:books/ph/KernighanR88}.}
voltage value.
%{\vbox{
\begin{figure}[h!]
\footnotesize
\begin{verbatim}
/***********************************************/
/* read_ADC()                                  */
/***********************************************/
/* Software function to read voltage from a    */
/* specified ADC MUX channel                   */
/* Assume 10 ADC MUX channels 0..9             */
/* ADC_CHAN_RANGE = 9                          */
/* Assume ADC is 12 bit and ADCRANGE = 4096    */
/* returns voltage read as double precision    */
/***********************************************/
double read_ADC( int channel ) {
    int timeout = 0;
    double dval; /* voltage to return */

    /* return out of range result */
    /* if invalid channel selected */
    if ( channel < 0 || channel > ADC_CHAN_RANGE )
        return -2.0;

    /* set the multiplexer to the desired channel */
    ADCMUX = channel;
    ADCGO = 1; /* initiate ADC conversion hardware */

    /* wait for ADC conversion with timeout:    */
    /* stop when the conversion completes or    */
    /* when the poll limit is reached           */
    while ( ADCGO == 1 && timeout < 100 )
        timeout++;

    if ( timeout < 100 )
        dval = (double) ADCOUT * 5.0 / ADCRANGE;
    else
        dval = -1.0; /* indicate invalid reading */

    /* return voltage as a floating point value */
    /* ensure: value is voltage input to within 0.1% */
    return dval;
}
\end{verbatim}
\caption{Software Function: \textbf{read\_ADC}}
\label{fig:code_read_ADC}
\end{figure}
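The bounded polling used while waiting for the conversion can be sketched independently of any particular micro-controller.
In the sketch below the hardware `go' bit is replaced by a function pointer so that the loop can be exercised off-target; \textbf{wait\_for\_conversion}, \textbf{sim\_adc\_busy} and \textbf{sim\_busy\_polls} are hypothetical names introduced for illustration and do not correspond to real registers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Callback reporting whether a conversion is still in progress. */
typedef bool (*busy_fn)(void);

/* Polls is_busy at most max_polls times.
   Returns true as soon as the conversion is seen to be complete,
   false if it is still busy after max_polls polls (timeout). */
bool wait_for_conversion(busy_fn is_busy, unsigned max_polls)
{
    unsigned polls;

    assert(is_busy != NULL);

    for (polls = 0; polls < max_polls; polls++) {
        if (!is_busy()) {
            return true; /* conversion finished in time */
        }
    }
    return false; /* timed out: reading must be treated as invalid */
}

/* Simulated ADC for off-target use: reports busy for the next
   sim_busy_polls calls, then reports complete. */
unsigned sim_busy_polls = 0;

bool sim_adc_busy(void)
{
    if (sim_busy_polls > 0) {
        sim_busy_polls--;
        return true;
    }
    return false;
}
```

The loop must terminate either when the conversion completes or when the poll limit is reached; a timeout is reported separately so that the caller can flag the reading as invalid rather than use a stale value.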
%}
%}
We now have a very simple software structure, a call tree, where {\em read\_4\_20\_input}
calls {\em read\_ADC}, which in turn interacts with the hardware/electronics.
%shown in figure~\ref{fig:ct1}.
%
% \begin{figure}[h]
% \centering
% \includegraphics[width=56pt]{./ct1.png}
% % ct1.png: 151x224 pixel, 72dpi, 5.33x7.90 cm, bb=0 0 151 224
% \caption{Call tree for software example}
% \label{fig:ct1}
% \end{figure}
%
This software is above the hardware in the conceptual call tree---from a programmatic perspective---%in software terms---the
software is reading values from the `lower~level' electronics.
%
%FMEA is always a bottom-up process and so we must begin with this hardware.
%
The hardware is simply a load resistor, connected across an ADC input
pin on the micro-controller and ground.
%
We can identify the resistor and the ADC module of the micro-controller as
the base components in this design.
%
We now apply FMMD starting with the hardware.
\section{Failure Mode Effects Analysis}
Four emerging and current techniques are now used to
apply FMEA to the hardware, the software, the software medium and the software/hardware interface.
\subsection{Hardware FMEA}
The hardware FMEA requires that for each component we consider all failure modes
and the putative effect those failure modes would have on the system.
The electronic components in our {\ft} system are the load resistor,
the multiplexer and the analogue to digital converter.
{
\tiny
\begin{table}[h!]
\caption{Hardware FMEA {\ft}} % title of Table
\label{tbl:r420i}
\begin{tabular}{|| l | c | l ||} \hline
\textbf{Failure} & \textbf{failure} & \textbf{System} \\
\textbf{Scenario} & \textbf{effect} & \textbf{Failure} \\ \hline
\hline
$R$ & OPEN~\cite{en298}[Ann.A] & $LOW$ \\
& & $READING$ \\ \hline
$R$ & SHORT~\cite{en298}[Ann.A] & $HIGH$ \\
& & $READING$ \\ \hline
$MUX$ & read wrong & $VAL\_ERROR$ \\
& input ~\cite{fmd91}[3-102] & \\ \hline
$ADC$ & ADC output & $VAL\_ERROR$ \\
& erroneous~\cite{fmd91}[3-109] & \\ \hline
\hline
\end{tabular}
\end{table}
}
The last two failures both lead to the system failure $VAL\_ERROR$.
They could also lead to a low or high reading, but we could only determine this
from knowledge of the software system's criteria for these readings.
\clearpage
\subsection{Software FMEA - variables in place of components}
For software FMEA, we take the variables used by the system,
and examine what could happen if they are corrupted in various ways~\cite{procsfmea, embedsfmea}.
From the function $read\_4\_20\_input()$ we have the variables $error\_flag$,
$input\_volts$ and $value$: from the function $read\_ADC()$, $timeout$, $ADCMUX$, $ADCGO$, $dval$.
We must now determine putative system failure modes for these variables becoming corrupted; this is performed in table~\ref{tbl:sfmea}.
{
\tiny
\begin{table}[h!]
\caption{SFMEA {\ft}} % title of Table
\label{tbl:sfmea}
\begin{tabular}{|| l | c | l ||} \hline
\textbf{Failure} & \textbf{failure} & \textbf{System} \\
\textbf{Scenario} & \textbf{effect} & \textbf{Failure} \\ \hline
\hline
$error\_flag$ & set FALSE & $VAL\_ERROR$ \\
& & \\ \hline
$error\_flag$ & set TRUE & invalid \\
& & error flag \\ \hline
$input\_volts$ & corrupted & $VAL\_ERROR$ \\
& & \\ \hline
$value $ & corrupted & $VAL\_ERROR$ \\
& & \\ \hline
$timeout $ & corrupted & $VAL\_ERROR$ \\
& & \\ \hline
$ADCMUX $ & corrupted & $VAL\_ERROR$ \\
& & \\ \hline
$ADCGO $ & corrupted & $VAL\_ERROR$ \\
& & \\ \hline
$dval $ & corrupted & $VAL\_ERROR$ \\
& & \\ \hline
\hline
\end{tabular}
\end{table}
}
\clearpage
\subsection{Software FMEA - failure modes of the medium ($\mu P$) of the software}
Microprocessors and microcontrollers have sets of known failure modes; these include RAM, ROM and
EEPROM failure\footnote{EEPROM failure is not applicable for this example.} and
oscillator clock timing faults.
{
\tiny
\begin{table}[h!]
\caption{SFMEA {\ft}} % title of Table
\label{tbl:sfmeaup}
\begin{tabular}{|| l | c | l ||} \hline
\textbf{Failure} & \textbf{failure} & \textbf{System} \\
\textbf{Scenario} & \textbf{effect} & \textbf{Failure} \\ \hline
\hline
$RAM$ & variable & All errors \\
& corruption & from table~\ref{tbl:sfmea} \\ \hline
$RAM$ & program flow & process \\
& & halts / crashes \\ \hline
$OSC$ & stopped & process \\
& & halts \\ \hline
$OSC$ & too & ADC \\
& fast & value errors \\ \hline
$OSC$ & too & ADC \\
& slow & value errors \\ \hline
$ROM$ & program & All errors \\
& corruption & from table~\ref{tbl:sfmea} \\ \hline
$ROM$ & constant & All errors \\
& /data corruption & from table~\ref{tbl:sfmea} \\ \hline
\hline
\end{tabular}
\end{table}
}
\clearpage
\subsection{Software FMEA - The software/hardware interface}
As FMEA is applied separately to software and hardware
the interface between them is an undefined factor.
Ozarin~\cite{sfmeainterface,procsfmea} recommends that an FMEA report be written
to focus on the software/hardware interface.
The software/hardware interface has
specific problems common to many systems and configurations
and these are described in~\cite{sfmeainterface}.
%An interface FMEA is performed in table~\ref{hwswinterface}.
%
The hardware to software interface for the {\ft} example is handled
by the 'C' function $read\_ADC()$
(see code sample in figure~\ref{fig:code_read_ADC}).
%
% An FMEA of the `software~medium' is given in table~\ref{tbl:sfmeaup}.
\paragraph{Timing and Synchronisation.}
The $ADCOUT$ register, where the raw ADC value is read
is an internal register used by the ADC and presented
as a readable memory location when the ADC
has finished updating it.
Reading it at the wrong time would
cause an invalid value to be read.
The synchronisation is performed by polling an $ADCGO$
bit, a flag mapped to memory by which the ADC indicates that the data is ready.
\paragraph{Interrupt Contention.}
Were an interrupt to also attempt to read from the ADC
the ADCMUX could be altered, causing the non-interrupt
routine to read from the wrong channel.
\paragraph{Data Formatting.}
The ADC may use a big-endian or little-endian integer
format. It may also right or left justify the bits in its value.
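The effect of these formatting conventions can be made concrete with a short sketch.
It assumes a 12 bit result presented in two 8 bit registers, either right justified (bits 11 to 0 of the 16 bit word) or left justified (bits 15 to 4); the function \textbf{adc\_raw\_to\_count} and the type \textbf{adc\_justify} are hypothetical names, and the caller is assumed to know which register holds the high byte.

```c
#include <assert.h>
#include <stdint.h>

/* Justification of the 12 bit result within the 16 bit register pair. */
typedef enum {
    ADC_RIGHT_JUSTIFIED, /* result in bits 11..0 */
    ADC_LEFT_JUSTIFIED   /* result in bits 15..4 */
} adc_justify;

/* Combines the high and low result bytes into a 12 bit count (0..4095).
   Passing the bytes explicitly as high and low makes the byte order
   a decision of the caller rather than of the host processor. */
uint16_t adc_raw_to_count(uint8_t high, uint8_t low, adc_justify justify)
{
    uint16_t word = (uint16_t)(((uint16_t)high << 8) | (uint16_t)low);

    if (justify == ADC_LEFT_JUSTIFIED) {
        word = (uint16_t)(word >> 4); /* discard the four unused low bits */
    }
    return (uint16_t)(word & 0x0FFFu); /* keep only the 12 result bits */
}
```

The same conversion result, 0xABC, therefore appears as the bytes 0x0A and 0xBC when right justified and as 0xAB and 0xC0 when left justified; software that assumes the wrong convention reads a value that is wrong by a factor of sixteen or has its bytes transposed.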
\subsection{SFMEA Conclusion}
%
This paper has picked a very simple example (the industry standard {\ft}
input circuit and software) to demonstrate
SFMEA and HFMEA methodologies used to describe a failure mode model.
%Even a modest system would be far too large to analyse in conference paper
%and this
%
%The {\dc} representing the {\ft} reader
%shows that by taking a
%modular approach for FMEA, i.e. FMMD, we can integrate
Our model is described by four FMEA reports; and these % we can model the failure mode behaviour from
model the system from several failure mode perspectives.
%
With traditional FMEA methods the reasoning~distance is large, because
it stretches from the component failure mode to the top---or---system level failure.
%
With these four analysis reports
we do not have stages along the `reasoning~path' linking the failure modes from the
electronics to those in the software.
%Software is often written `defensively' but t
%Each {\fg} to {\dc} transition represents a
%reasoning stage.
%
%
%For this reason applying traditional FMEA to software stretches
%the reasoning distance even further.
%
In fact, many of these reasoning paths overlap, or even by-pass one another, so
it is very difficult to gauge cause and effect.
For instance, hardware failures are not analysed in the context of how they will
be handled (or missed) by the software.
%
System outputs commanded from software may not take into account particular
hardware limitations etc.
The interface FMEA does serve to provide a useful
check-list to ensure data and synchronisation conventions used by the hardware
and software are not mismatched. However, the fact it is perceived as required %The fact its required
highlights the mismatches possible between the two types of analysis
which could run deeper than the mere interface level.
While these techniques ensure that the software and hardware are
viewed and analysed from several perspectives, the result cannot be termed a homogeneous
failure mode model.
% For instance
% were the ADC to have a small value error, say adding
% a small percentage onto the value, we would be unable to
% detect this under the analysis conditions for this model, or
% be able to pinpoint it.
% %
\section{Conclusion} \section{Conclusion}
\paragraph{Where FMEA is now}
FMEA useful tool for basic safety --- provides statistics on safety where field data impractical --- FMEA useful tool for basic safety --- provides statistics on safety where field data impractical ---
very good with single failure modes linked to top level events. very good with single failure modes linked to top level events.
FMEA has become part of the safety critical and safety certification industries. FMEA has become part of the safety critical and safety certification industries.
%
SFMEA is in its infancy, but there is a gap in current SFMEA is in its infancy, but there is a gap in current
certification for software, EN61508, recommends hardware redundancy architectures in conjunction certification for software, EN61508~\cite{en61508}, recommends hardware redundancy architectures in conjunction
with FMEDA for hardware: for software it recommends language constraints and quality procedures with FMEDA for hardware: for software it recommends language constraints and quality procedures
but no inductive fault finding technique. but no inductive fault finding technique.
FMEA has evolved from a cost saving exercise for mass produced items, through the incorporation of statistical techniques
(FMECA), to allowing for self diagnostic mitigation (FMEDA).
However, it is still based on a single component failure mapped to a system level failure.
All these FMEA based methodologies have the following shortcomings:
\begin{itemize}
\item it is impossible to integrate software and hardware models,
\item the state explosion problem is exacerbated by the increasing complexity and density of modern electronics,
\item it is impossible to consider all multiple component failure modes.
\end{itemize}