%%% OUTLINE
\documentclass[twocolumn]{article}
%\documentclass[twocolumn,10pt]{report}
\usepackage{graphicx}
\usepackage{fancyhdr}
%\usepackage{wassysym}
\usepackage{tikz}
\usepackage{amsfonts,amsmath,amsthm}
\usetikzlibrary{shapes.gates.logic.US,trees,positioning,arrows}
%\input{../style}
\usepackage{ifthen}
\usepackage{lastpage}
\usetikzlibrary{shapes,snakes}
\newcommand{\tickYES}{\checkmark}
\newcommand{\fc}{fault~scenario}
\newcommand{\fcs}{fault~scenarios}
\date{}
%\renewcommand{\encodingdefault}{T1}
%\renewcommand{\rmdefault}{tnr}
%\newboolean{paper}
%\setboolean{paper}{true} % boolvar=true or false
\newcommand{\derivec}{{D}}
\newcommand{\ft}{\ensuremath{4\!\!\rightarrow\!\!20mA} }
\newcommand{\permil}{\ensuremath{{ }^0/_{00}}}
\newcommand{\oc}{\ensuremath{^{o}{C}}}
\newcommand{\adctw}{{${\mathcal{ADC}}_{12}$}}
\newcommand{\adcten}{{${\mathcal{ADC}}_{10}$}}
\newcommand{\ohms}[1]{\ensuremath{#1\Omega}}
\newcommand{\fm}{failure~mode}
\newcommand{\fms}{failure~modes}
\newcommand{\fg}{functional~grouping}
\newcommand{\FG}{\mathcal{G}}
\newcommand{\DC}{\mathcal{DC}}
\newcommand{\fgs}{functional~groupings}
\newcommand{\dc}{derived~component}
\newcommand{\dcs}{derived~components}
\newcommand{\bc}{base~component}
\newcommand{\FMMD}{ModularFMEA}
\newcommand{\bcs}{base~components}
\newcommand{\irl}{in real life}
\newcommand{\enc}{\ensuremath{\stackrel{enc}{\longrightarrow}}}
\newcommand{\pin}{\ensuremath{\stackrel{pi}{\longleftrightarrow}}}
%\newcommand{\pic}{\em pure~intersection~chain}
\newcommand{\pic}{\em pair-wise~intersection~chain}
\newcommand{\wrt}{\em with~respect~to}
\newcommand{\abslevel}{\ensuremath{\Psi}}
\newcommand{\fmmdgloss}{\glossary{name={FMMD},description={Failure Mode Modular De-Composition, a bottom-up methodology for incrementally building failure mode models, using a procedure taking functional groups of components and creating derived components representing them, and in turn using the derived components to create higher level functional groups, and so on, that are used to build a failure mode model of a system}}}
\newcommand{\fmodegloss}{\glossary{name={failure mode},description={The way in which a failure occurs. A component or sub-system may fail in a number of ways, and each of these is a
failure mode of the component or sub-system}}}
\newcommand{\fmeagloss}{\glossary{name={FMEA}, description={Failure Mode and Effects Analysis (FMEA) is a process where each potential failure mode within a system is analysed to determine system level failure modes, and then to classify them {\wrt} perceived severity}}}
\newcommand{\frategloss}{\glossary{name={failure rate}, description={The number of failures within a population (of size N), divided by N over a given time interval}}}
\newcommand{\pecgloss}{\glossary{name={PEC},description={A Programmable Electronic Controller will typically consist of sensors and actuators interfaced electronically, with some firmware/software component in overall control}}}
\newcommand{\bcfm}{base~component~failure~mode}
\def\layersep{1.8cm}
\newboolean{pld}
\setboolean{pld}{false} % boolvar=true or false : draw analysis using propositional logic diagrams
\newboolean{dag}
\setboolean{dag}{true} % boolvar=true or false : draw analysis using directed acyclic graphs
\setlength{\topmargin}{0in}
\setlength{\headheight}{0in}
\setlength{\headsep}{0in}
\setlength{\textheight}{22cm}
\setlength{\textwidth}{18cm}
%\setlength{\textheight}{24.35cm}
%\setlength{\textwidth}{20cm}
\setlength{\oddsidemargin}{0in}
\setlength{\evensidemargin}{0in}
\setlength{\parindent}{0.0in}
%\setlength{\parskip}{6pt}
% \setlength{\parskip}{1cm plus4mm minus3mm}
\setlength{\parskip}{0pt}
\setlength{\parsep}{0pt}
\setlength{\headsep}{0pt}
\setlength{\topskip}{0pt}
\setlength{\topmargin}{0pt}
\setlength{\topsep}{0pt}
\setlength{\partopsep}{0pt}
\setlength{\itemsep}{1pt}
% \renewcommand\subsection{\@startsection
% {subsection}{2}{0mm}%
% {-\baslineskip}
% {0.5\baselineskip}
% {\normalfont\normalsize\itshape}}%
\linespread{0.953}
\begin{document}
%\pagestyle{fancy}
%\fancyhf{}
%\fancyhead[LO]{}
%\fancyhead[RE]{\leftmark}
%\cfoot{Page \thepage\ of \pageref{LastPage}}
%\rfoot{\today}
%\lhead{Developing a rigorous bottom-up modular static failure mode modelling methodology}
%\lhead{Developing a rigorous bottom-up modular static failure modelling methodology}
% numbers at outer edges
\pagenumbering{arabic} % Arabic page numbers hereafter
\author{R.~Clark$^\star$, A.~Fish$^\dagger$, C.~Garrett$^\dagger$, J.~Howse$^\dagger$ \\
$^\star${\em Energy Technology Control, UK. r.clark@energytechnologycontrol.com} \and $^\dagger${\em University of Brighton, UK}
}
%\title{Developing a rigorous bottom-up modular static failure mode modelling methodology}
\title{Applying Failure Mode Effects Analysis (FMEA) to Software/Hardware Hybrid Systems}
%\nodate
\maketitle
\paragraph{Keywords:} static failure mode modelling; safety-critical; software FMEA
%\small
\abstract{ % \em
%\input{abs}
%The certification process of safety critical products for European and
%other international standards often demand environmental stress,
%endurance and Electro Magnetic Compatibility (EMC) testing. Theoretical, or 'static testing',
%is often also required.
%
%Failure Mode Effects Analysis (FMEA), is a bottom-up technique that aims to assess the effect all
%component failure modes on a system.
%It is used both as a design tool (to determine weaknesses), and is a requirement of certification of safety critical products.
%FMEA has been successfully applied to mechanical, electrical and hybrid electro-mechanical systems.
%
%Work on software FMEA (SFMEA) is beginning, but
%at present no technique for SFMEA that
%integrates hardware and software models % known to the authors
%exists.
% %
%
%Failure modes in components in say a sensor, could be traced
%up through the electronics and then through the controlling software.
%
%Presently Failure Mode Effects Analysis (FMEA), stops at the glass ceiling of the computer program.
%
This paper presents a worked example of FMEA applied to an
integrated electronics/software system.
%
FMEA methodologies date from the 1940s and were designed to
model simple electro-mechanical systems.
%
Software generally sits on top of most modern safety critical control systems
and defines their most important system wide behaviour and communications.
%
Currently, standards that demand FMEA for hardware (HFMEA) (e.g. EN298, EN61508)
do not specify it for software, but instead specify good practice,
review processes and language feature constraints.
%
This is a weakness.
%
Where HFMEA % scientifically
traces component {\fms}
to resultant system failures, software has, until recently, been left in a non-analytical
limbo of best practices and constraints.
Software FMEA (SFMEA) has been proposed
in several forms. SFMEA is always performed separately from HFMEA.
%
This paper seeks to examine the effectiveness of current and proposed SFMEA
techniques by analysing the chosen example, which is well known and understood
from years of field experience, and determining how well the HFMEA and SFMEA
analysis reports model the failure mode behaviour.
% %
%If software and hardware integrated FMEA were possible, electro-mechanical-software hybrids could
%be modelled, and so we could consider `complete' failure mode models.
%
%Presently FMEA, stops at the glass ceiling of the computer program: FMMD seeks to address
%this, and offers additional test efficiency benefits.
}
%\today
\nocite{en298}
\nocite{en61508}
\section{Introduction}
{
%This paper describes a modular FMEA process that can be applied to software.
%This modular variant of FMEA is called Failure Mode Modular de-composition (FMMD).
%
%Because this process is based on failure modes of components,
%it can be applied to electrical and/or mechanical systems.
%
%The hierarchical structure of software is then examined,
%and definitions from contract programming are used
%to define failure modes and failure symptoms for
%software functions.
%
%With these definitions we can apply the FMMD modular form of FMEA
%to existing software\footnote{Existing software excluding recursive~\cite{misra}[16.2] code,
%and unstructured non-functional language.}.
}
\section{FMEA Background}
%What FMEA is, briefly variants...
Failure Mode Effects Analysis is the process of taking
component failure modes, %and by reasoning,
tracing their effects through a system
and determining what system level failure modes could be caused.
%
FMEA dates from the 1940s, when simple electro-mechanical systems were the norm.
Modern control systems nearly always have a significant software/firmware element,
and not being able to model software with current FMEA methodologies
is a cause for criticism~\cite{safeware}[Ch.12].
%Software FMEA techniques have been proposed
%Several variants of FMEA exist,
% traditional FMEA being associated with the manufacturing industry, with the aims of prioritising
% the failures to fix in order of cost.
%
% Deisgn FMEA (DFMEA) is FMEA applied at the design or approvals stage
% where the aim is to ensure that single component failures cannot
% cause unacceptable system level events.
%
% Failure Mode Effect Criticality Analysis (FMECA) is applied to determine the most potentially dangerous or damaging
% failure modes to fix.
%
%
% Failure Mode Effects and Diagnostics Analysis, is FMEA peformed to
% determine a statistical level of safety.
% This is associated with Safety Integrity Levels (SIL)~\cite{en61508}~\cite{en61511} classification.
%
%FMMD is a modularisation of FMEA and can produce failure~mode models that can be used in
%all the above variants of FMEA.
\paragraph{Current work on Software FMEA}
SFMEA usually does not seek to integrate
hardware and software models, but to perform
FMEA on the software in isolation~\cite{procsfmea}.
%
Work has been performed using databases
to track the relationships between variables
and system failure modes~\cite{procsfmeadb}, to %work has been performed to
introduce automation into the FMEA process~\cite{appswfmea} and to provide code analysis
automation~\cite{modelsfmea}. Although SFMEA and HFMEA are performed separately,
some schools of thought aim for Fault Tree Analysis (FTA)~\cite{nasafta,nucfta} (top-down, deductive)
and FMEA (bottom-up, inductive)
to be performed on the same system to provide insight into the
software/hardware interface~\cite{embedsfmea}.
%
Although this
would give a better picture of the failure mode behaviour, it
is by no means a rigorous approach to tracing errors that may occur in hardware
through to the top (and therefore ultimately controlling) layer of software.
\subsection{Current FMEA techniques are not suitable for software}
The main FMEA methodologies are all based on the concept of taking
base component {\fms}, and translating them into system level events/failures~\cite{sfmea,sfmeaa}.
%
In a complicated system, mapping a component failure mode to a system level failure
will mean a long reasoning distance; that is to say the actions of the
failed component will have to be traced through
several sub-systems, gauging its effects with other components.
%
With software at the higher levels of these sub-systems,
we have yet another layer of complication.
%
In order to integrate software, %in a meaningful way
we need to re-think the
FMEA concept of simply mapping a base component failure to a system level event.
%
SFMEA regards the variables used by the programs as the components.
These variables could become erroneously over-written,
be calculated incorrectly (due to a mistake by the programmer, or a fault in the micro-processor the program is running on), or
be erroneously altered by radiation changing individual bits.
\paragraph{A more-complete Failure Mode Model}
% HFMEA
% SFMEA
% VARIABLE CURRUPTION
% MICRO PROCESSOR FAULTS
% INTERFACE ANALYSIS
%
% add them all together --- a load of bollocks, lots of impressive inches of reports that no one will be bothered to read....
%
In order to obtain a more complete failure mode model of
a hybrid electronic/software system we need to analyse
the hardware, the software, the hardware the software runs on,
and the software/hardware interface.
%
HFMEA is a well established technique and needs no further description in this paper.
\section{Example for analysis} % : How can we apply FMEA}
For the purpose of example, we chose a simple common safety critical industrial circuit
that is nearly always used in conjunction with a programmatic element.
A common method for delivering a quantitative value in analogue electronics is
to supply a current signal to represent the value to be sent~\cite{aoe}[p.934].
Usually, $4mA$ represents a zero or starting value and $20mA$ represents the full scale,
and this is referred to as {\ft} signalling.
%
{\ft} signalling also has an electrical advantage: the current is the same at every point in a series loop~\cite{aoe}[p.20].
Thus resistance in the wires between the source and the receiving end
does not alter the accuracy of the signal.
%
This circuit has many advantages for safety. If the signal becomes disconnected
it reads an out of range $0mA$ at the receiving end. This is outside the {\ft} range,
and is therefore easy to detect as an error rather than an incorrect value.
%
Should the driving electronics go wrong at the source end, it will usually
supply far too little or far too much current, making an error condition easy to detect.
%
At the receiving end, one needs a resistor to convert the
current signal into a voltage that we can read with an ADC.%
%we only require one simple component to convert the
%BLOCK DIAGRAM HERE WITH FT CIRCUIT LOOP
\begin{figure}[h]
\centering
\includegraphics[width=200pt]{./ftcontext.png}
% ftcontext.png: 767x385 pixel, 72dpi, 27.06x13.58 cm, bb=0 0 767 385
\caption{Context Diagram for {\ft} loop}
\label{fig:ftcontext}
\end{figure}
The diagram in figure~\ref{fig:ftcontext} shows some equipment which is sending a {\ft}
signal to a micro-controller system.
The signal current is driven through a local load resistor, and the resulting voltage is read into the micro-controller via
an ADC and its multiplexer.
From the voltage detected at the ADC, the software can determine the intended quantitative
value from the external equipment.
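The independence of the reading from the wire resistance can be made explicit with a short worked equation. This is a sketch under stated assumptions: the symbols $I_{loop}$ (loop current), $R_L$ (load resistor), $R_w$ (total wire resistance) and $V_{src}$ (the voltage the source must develop) are introduced here for illustration only.
\begin{align*}
V_{ADC} &= I_{loop} R_L , \\
V_{src} &= I_{loop} ( R_L + R_w ) .
\end{align*}
$R_w$ appears only in the voltage that the source has to develop, not in the voltage presented to the ADC, so the reading is unaffected provided the source is able to supply $V_{src}$.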
\subsection{Simple Software Example}
Consider a software function that reads a {\ft} input, and returns a value between 0 and 999 (i.e. per mil $\permil$)
representing the current detected, together with an error indication flag.
%
Let us assume the {\ft} detection is via a \ohms{220} resistor, and that we read a voltage
from an ADC into the software.
Let us define any value outside the $4mA$ to $20mA$ range as an error condition.
%
Using Ohm's law~\cite{aoe}, $V=IR$, we determine the corresponding voltage range: $0.004A \times \ohms{220} = 0.88V$
and $0.020A \times \ohms{220} = 4.4V$.
%
Our acceptable voltage range is therefore
%
$(V \ge 0.88) \wedge (V \le 4.4) \; .$
This voltage range forms our input requirement.
%
We can now examine a software function that performs a conversion from the voltage read to
a per~mil representation of the {\ft} input current.
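As a sketch of the intended conversion, assuming a linear mapping of the acceptable voltage range onto $0 \ldots 999$ (the symbol $p$ for the per~mil result is introduced here for illustration):
\[ p = 999 \times \frac{V - 0.88}{4.4 - 0.88} \; . \]
For example, a mid-scale current of $12mA$ gives $V = 0.012A \times \ohms{220} = 2.64V$, and so $p = 999 \times 1.76 / 3.52 = 499.5$, which truncates to 499 when stored as an integer.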
%
For the purpose of example the `C' programming language~\cite{DBLP:books/ph/KernighanR88} is
used\footnote{C coding examples use the MISRA~\cite{misra} and SIL-3 recommended language constraints~\cite{en61508}.}.
We initially assume a function \textbf{read\_ADC} which returns a floating point %double precision
value representing the voltage read (see code sample in figure~\ref{fig:code_read_4_20_input}).
%%{\vbox{
\begin{figure}[h]
\tiny
\begin{verbatim}
/***********************************************/
/* read_4_20_input()                           */
/***********************************************/
/* Software function to read 4mA to 20mA input */
/* returns a value from 0-999 proportional     */
/* to the current input.                       */
/***********************************************/
int read_4_20_input ( int * value ) {
    double input_volts;
    int error_flag;
    /* set ADC MUX with input to read from */
    input_volts = read_ADC(INPUT_4_20_mA);
    if ( input_volts < 0.88 || input_volts > 4.4 ) {
        error_flag = 1; /* Error flag set to TRUE */
    }
    else {
        /* scale 0.88V..4.4V linearly onto 0..999 */
        *value = (int) ( (input_volts - 0.88) / ( 4.4 - 0.88 ) * 999.0 );
        error_flag = 0; /* indicate current input in range */
    }
    /* ensure: value is proportional (0-999) to the
       4 to 20mA input */
    return error_flag;
}
\end{verbatim}
%}
%}
\caption{Software Function: \textbf{read\_4\_20\_input}}
\label{fig:code_read_4_20_input}
%\label{fig:420i}
\end{figure}
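The range check and scaling can be exercised without hardware by separating them from the ADC read. The following is a minimal sketch, not the function analysed in this paper: \textbf{volts\_to\_permil} and the constant names are hypothetical, and the voltage is passed in as a parameter instead of being read from \textbf{read\_ADC}.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names: voltage limits for a 220 ohm load resistor */
#define V_MIN_4MA  0.88   /* 4 mA  * 220 ohm */
#define V_MAX_20MA 4.4    /* 20 mA * 220 ohm */

/* Converts a voltage to per mil (0-999) of the 4-20 mA span.
   Returns 1 (error) if the voltage is out of range, else 0.
   *value is only written when the voltage is in range. */
int volts_to_permil(double input_volts, int *value)
{
    assert(value != NULL);
    if (input_volts < V_MIN_4MA || input_volts > V_MAX_20MA) {
        return 1;
    }
    *value = (int)((input_volts - V_MIN_4MA) /
                   (V_MAX_20MA - V_MIN_4MA) * 999.0);
    return 0;
}
```

Called with $2.64V$ (the voltage for $12mA$) the sketch stores 499; called with $0V$ (a disconnected loop) it returns the error indication and leaves the stored value untouched.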
We now look at the function called by \textbf{read\_4\_20\_input}, \textbf{read\_ADC}, which returns a
voltage for a given ADC channel.
%
This function
deals directly with the hardware in the micro-controller on which we are running the software.
%
Its job is to select the correct channel (ADC multiplexer) and then to initiate a
conversion by setting an ADC `go' bit (see code sample in figure~\ref{fig:code_read_ADC}).
%
It takes the raw ADC reading and converts it into a
floating point\footnote{the type `double' or `double precision' is a
standard C language floating point type~\cite{DBLP:books/ph/KernighanR88}.}
voltage value.
%{\vbox{
\begin{figure}[h]
\tiny
\begin{verbatim}
/***********************************************/
/* read_ADC()                                  */
/***********************************************/
/* Software function to read voltage from a    */
/* specified ADC MUX channel                   */
/* Assume 10 ADC MUX channels 0..9             */
/* ADC_CHAN_RANGE = 9                          */
/* Assume ADC is 12 bit and ADCRANGE = 4096    */
/* returns voltage read as double precision    */
/***********************************************/
double read_ADC( int channel ) {
    int timeout = 0;
    double dval;
    /* return out of range result */
    /* if invalid channel selected */
    if ( channel < 0 || channel > ADC_CHAN_RANGE )
        return -2.0;
    /* set the multiplexer to the desired channel */
    ADCMUX = channel;
    ADCGO = 1; /* initiate ADC conversion hardware */
    /* wait for ADC conversion with timeout */
    while ( ADCGO == 1 && timeout < 100 )
        timeout++;
    if ( timeout < 100 )
        dval = (double) ADCOUT * 5.0 / ADCRANGE;
    else
        dval = -1.0; /* indicate invalid reading */
    /* return voltage as a floating point value */
    /* ensure: value is voltage input to within 0.1% */
    return dval;
}
\end{verbatim}
\caption{Software Function: \textbf{read\_ADC}}
\label{fig:code_read_ADC}
\end{figure}
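The two steps inside \textbf{read\_ADC}, a bounded wait for the conversion and the scaling of the raw count to a voltage, can be sketched without the micro-controller registers. This is a sketch under assumptions: the struct \textbf{sim\_adc} stands in for the hardware, and the 5V full scale and 12 bit range are taken from the comments in figure~\ref{fig:code_read_ADC}. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define ADC_RANGE      4096   /* 12 bit converter, as assumed in the text */
#define ADC_VREF       5.0    /* full scale voltage, as assumed in the text */
#define ADC_MAX_POLLS  100    /* polling limit before giving up */

/* Hypothetical stand-in for the ADC hardware registers. */
struct sim_adc {
    int polls_until_done;  /* polls needed before conversion completes */
    unsigned int out;      /* raw result, 0..ADC_RANGE-1 */
};

/* Scale a raw count to volts. */
double counts_to_volts(unsigned int counts)
{
    return (double)counts * ADC_VREF / ADC_RANGE;
}

/* Poll until the simulated conversion completes or the limit is hit.
   Returns the voltage, or -1.0 on timeout. */
double sim_read_adc(const struct sim_adc *adc)
{
    int polls = 0;
    assert(adc != NULL);
    while (polls < adc->polls_until_done && polls < ADC_MAX_POLLS) {
        polls++;
    }
    if (polls >= ADC_MAX_POLLS) {
        return -1.0;   /* conversion did not complete in time */
    }
    return counts_to_volts(adc->out);
}
```

A timeout result of $-1.0$ lies outside the $0.88V$ to $4.4V$ window, so the range check in the calling function treats a stalled conversion as an error without needing a separate status channel.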
%}
%}
We now have a very simple software structure, a call tree, where {\em read\_4\_20\_input}
calls {\em read\_ADC}, which in turn interacts with the hardware/electronics.
%shown in figure~\ref{fig:ct1}.
%
% \begin{figure}[h]
% \centering
% \includegraphics[width=56pt]{./ct1.png}
% % ct1.png: 151x224 pixel, 72dpi, 5.33x7.90 cm, bb=0 0 151 224
% \caption{Call tree for software example}
% \label{fig:ct1}
% \end{figure}
%
This software is above the hardware in the conceptual call tree: from a programmatic perspective, the
software is reading values from the `lower~level' electronics.
%
FMEA is always a bottom-up process and so we must begin with this hardware.
%
The hardware is simply a load resistor, connected between an ADC input
pin on the micro-controller and ground.
%
We can identify the resistor and the ADC module of the micro-controller as
the base components in this design.
%
We now apply FMEA, starting with the hardware.
\section{Hardware FMEA}
The hardware FMEA requires that for each component, we consider all failure modes,
and the putative effect those failure modes would have on the system.
The electronic components in our {\ft} system are the load resistor,
the multiplexer and the analogue to digital converter.
{
\tiny
\begin{table}[h]
\caption{Hardware FMEA {\ft}} % title of Table
\label{tbl:r420i}
\begin{tabular}{|| l | c | l ||} \hline
\textbf{Failure} & \textbf{failure} & \textbf{System Failure} \\
\textbf{Scenario} & \textbf{effect} & \\ \hline
\hline
$R$ & OPEN~\cite{en298}[Ann.A] & $HIGH$ \\
& & $READING$ \\ \hline
$R$ & SHORT~\cite{en298}[Ann.A] & $LOW$ \\
& & $READING$ \\ \hline
$MUX$ & read wrong & $VAL\_ERROR$ \\
& input~\cite{fmd91}[3-102] & \\ \hline
$ADC$ & ADC output & $VAL\_ERROR$ \\
& erroneous~\cite{fmd91}[3-109] & \\ \hline
\hline
\end{tabular}
\end{table}
}
The last two failures both lead to the system failure $VAL\_ERROR$.
They could also lead to a low or high reading, but we would only be able to determine this
from knowledge of the software system's criteria for these.
\section{Software FMEA - variables in place of components}
For software FMEA we take the variables used by the system,
and examine what could happen if they are corrupted in various ways~\cite{procsfmea, embedsfmea}.
From the function $read\_4\_20\_input()$ we have the variables $error\_flag$,
$input\_volts$ and $value$; from the function $read\_ADC()$ we have $timeout$, $ADCMUX$, $ADCGO$ and $dval$.
We must now determine putative system failure modes for the corruption of each of these variables.
{
\tiny
\begin{table}[h]
\caption{SFMEA {\ft}} % title of Table
\label{tbl:sfmea}
\begin{tabular}{|| l | c | l ||} \hline
\textbf{Failure} & \textbf{failure} & \textbf{System Failure} \\
\textbf{Scenario} & \textbf{effect} & \\ \hline
\hline
$error\_flag$ & set FALSE & $VAL\_ERROR$ \\
& & \\ \hline
$error\_flag$ & set TRUE & invalid \\
& & error flag \\ \hline
$input\_volts$ & corrupted & $VAL\_ERROR$ \\
& & \\ \hline
$value $ & corrupted & $VAL\_ERROR$ \\
& & \\ \hline
$timeout $ & corrupted & $VAL\_ERROR$ \\
& & \\ \hline
$ADCMUX $ & corrupted & $VAL\_ERROR$ \\
& & \\ \hline
$ADCGO $ & corrupted & $VAL\_ERROR$ \\
& & \\ \hline
$dval $ & corrupted & $VAL\_ERROR$ \\
& & \\ \hline
\hline
\end{tabular}
\end{table}
}
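The variable corruption considered in table~\ref{tbl:sfmea} can be explored with a small fault injection experiment. The sketch below is illustrative only: it flips a single bit of a copy of the voltage value and reports whether a range check would flag the result. The function names, and the choice of a single bit flip as the corruption model, are assumptions and are not taken from the SFMEA procedures cited.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Result of feeding a (possibly corrupted) voltage to the range check. */
enum outcome { FLAGGED, ACCEPTED };

/* Flip one bit (0..63) in the stored representation of a double.
   Assumes double is 64 bits wide. */
double flip_bit(double x, unsigned int bit)
{
    uint64_t raw;
    assert(bit < 64u);
    assert(sizeof raw == sizeof x);
    memcpy(&raw, &x, sizeof raw);
    raw ^= (uint64_t)1 << bit;
    memcpy(&x, &raw, sizeof x);
    return x;
}

/* Range check on the 0.88V..4.4V window.
   Written so that a NaN is flagged as well, which is an
   assumption of this sketch. */
enum outcome classify(double input_volts)
{
    if (!(input_volts >= 0.88 && input_volts <= 4.4)) {
        return FLAGGED;
    }
    return ACCEPTED;
}
```

Flipping the sign bit of $2.64$ yields $-2.64$, which the range check flags, whereas flipping the least significant mantissa bit leaves the value in range, so that corruption passes silently as a value error.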
\section{Software FMEA - failure modes of the medium ($\mu P$) of the software}
Microprocessors and microcontrollers have sets of known failure modes; these include RAM, ROM and
EEPROM failure\footnote{EEPROM failure is not applicable for this example.} and
oscillator clock timing faults~\cite{sfmeaauto}.
{
\tiny
\begin{table}[h]
\caption{SFMEA {\ft}: failure modes of the micro-processor} % title of Table
\label{tbl:sfmeaup}
\begin{tabular}{|| l | c | l ||} \hline
\textbf{Failure} & \textbf{failure} & \textbf{System Failure} \\
\textbf{Scenario} & \textbf{effect} & \\ \hline
\hline
$RAM$ & variable corruption & All errors \\
& & from table~\ref{tbl:sfmea} \\ \hline
$RAM$ & program flow & process \\
& & halts / crashes \\ \hline
$OSC$ & stopped & process \\
& & halts \\ \hline
$OSC$ & too & ADC \\
& fast & value errors \\ \hline
$OSC$ & too & ADC \\
& slow & value errors \\ \hline
$ROM$ & program & All errors \\
& corruption & from table~\ref{tbl:sfmea} \\ \hline
$ROM$ & constant & All errors \\
& /data corruption & from table~\ref{tbl:sfmea} \\ \hline
\hline
\end{tabular}
\end{table}
}
\section{Software FMEA - The software hardware interface}
As FMEA is applied separately to software and hardware
the interface between them is an undefined factor.
Ozarin~\cite{sfmeainterface} recommends that an FMEA report be written
to focus on the software/hardware interface.
The hardware to software interface for the {\ft} example is handled
by the `C' function $read\_ADC()$.
\section{Conclusion}
%
The FMMD method has been demonstrated using the industry standard {\ft}
input circuit and software.
%
The {\dc} representing the {\ft} reader
shows that by taking
%modular approach for FMEA, i.e. FMMD, we can integrate
four FMEA reports we can model the failure mode behaviour from
several perspectives, for
software and electrical systems.% models.
%
With this analysis
we have stages along the `reasoning~path' linking the failure modes from the
electronics to those in the software.
Each {\fg} to {\dc} transition represents a
reasoning stage.
%
%
With traditional FMEA methods the reasoning~distance is large, because
it stretches from the component failure mode to the top, or system level, failure.
%For this reason applying traditional FMEA to software stretches
%the reasoning distance even further.
%
In fact, because these reasoning paths overlap, or even by-pass one another,
it is very difficult to gauge cause and effect. For instance,
were the ADC to have a small value error, say adding
a small percentage onto the value, we would be unable to
detect this under the analysis conditions for this model, or
to pinpoint its cause.
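This can be illustrated with worked arithmetic. Assume, purely for illustration, a 2\% gain error in the ADC and a true loop current of $12mA$ (the symbols $V_{true}$ and $V_{read}$ are introduced here for the example):
\begin{align*}
V_{true} &= 0.012A \times \ohms{220} = 2.64V , \\
V_{read} &= 1.02 \times 2.64V = 2.6928V .
\end{align*}
$V_{read}$ still satisfies $(V \ge 0.88) \wedge (V \le 4.4)$, so the error flag is not set, and the per~mil result, computed as $999 \times (V - 0.88)/3.52$, becomes 514 in place of the correct 499.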
{
\footnotesize
\bibliographystyle{plain}
\bibliography{../../vmgbibliography,../../mybib}
}
%\today
\end{document}