working on the software_fmea paper.

robin 2012-04-09 15:33:33 +01:00
parent e5cad5d363
commit 043b766217
2 changed files with 59 additions and 23 deletions


@ -357,6 +357,14 @@ methodology",
year = "1994" year = "1994"
} }
@MISC{en61511,
author = "E N Standard",
title = "EN61511: Functional safety. Safety instrumented systems for the process industry sector. ",
howpublished = "British Standards Institution http://www.bsigroup.com/",
year = "2004"
}
@MISC{challenger, @MISC{challenger,
author = "U.S. Presidential Commission", author = "U.S. Presidential Commission",
title = "Report of the Space Shuttle Challenger Accident", title = "Report of the Space Shuttle Challenger Accident",


@ -138,14 +138,14 @@ component failure modes on a system.
It is used both as a design tool (to determine weakness), and is a requirement of certification of safety critical products. It is used both as a design tool (to determine weakness), and is a requirement of certification of safety critical products.
FMEA has been successfully applied to mechanical, electrical and hybrid electro-mechanical systems. FMEA has been successfully applied to mechanical, electrical and hybrid electro-mechanical systems.
Work on software FMEA is beginning~\cite{sfmea}~\cite{sfmeaa}, but Work on software FMEA is beginning, but
at present no technique for software FMEA that at present no technique for software FMEA that
integrates hardware and software models known to the authors exists. integrates hardware and software models known to the authors exists.
% %
Software generally, sits on top of most modern safety critical control systems Software generally, sits on top of most modern safety critical control systems
and defines its most important system wide behaviour and communications. and defines its most important system wide behaviour and communications.
Standards~\cite{en298}~\cite{en61508} that use FMEA Currently standards that demand FMEA for hardware (e.g. EN298, EN61508),
do not specify it for Software, but do specify, good practise, do not specify it for Software, but instead specify good practice,
review processes and language feature constraints. review processes and language feature constraints.
This is a weakness; where FMEA scientifically traces component {\fms} This is a weakness; where FMEA scientifically traces component {\fms}
@ -162,6 +162,9 @@ This paper presents an FMEA methodology which can be applied to software, and is
and integrate-able with FMEA performed on mechanical and electronic systems. and integrate-able with FMEA performed on mechanical and electronic systems.
} }
\nocite{en298}
\nocite{en61508}
\section{Introduction} \section{Introduction}
{ {
This paper describes a modular FMEA process that can be applied to software. This paper describes a modular FMEA process that can be applied to software.
@ -194,7 +197,7 @@ is a cause for criticism~\cite{easw}~\cite{safeware}~\cite{bfmea}.
Several variants of FMEA exist, Several variants of FMEA exist,
traditional FMEA being a associated with the manufacturing industry, with the aims of prioritising traditional FMEA being associated with the manufacturing industry, with the aims of prioritising
the failures to fix in order of cost. the failures to fix in order of cost.
Design FMEA (DFMEA) is FMEA applied at the design or approvals stage Design FMEA (DFMEA) is FMEA applied at the design or approvals stage
@ -215,7 +218,7 @@ all the above variants of FMEA.
\subsection{Current FMEA techniques are not suitable for software} \subsection{Current FMEA techniques are not suitable for software}
The main FMEA methodologies are all based on the concept of taking The main FMEA methodologies are all based on the concept of taking
base component {\fms}, and translating them into system level events/failures. base component {\fms}, and translating them into system level events/failures~\cite{sfmea}~\cite{sfmeaa}.
In a complicated system, mapping a component failure mode to a system level failure In a complicated system, mapping a component failure mode to a system level failure
will mean a long reasoning distance; that is to say the actions of the failed component will have to be traced through will mean a long reasoning distance; that is to say the actions of the failed component will have to be traced through
several sub-systems and the effects of other components on the way. several sub-systems and the effects of other components on the way.
@ -321,7 +324,7 @@ of the {\fg} that it was derived from.
% in a specific configuration. This specific configuration corresponds to % in a specific configuration. This specific configuration corresponds to
% a {\fg}. Our use of it as a building block corresponds to a {\dc}. % a {\fg}. Our use of it as a building block corresponds to a {\dc}.
We can use the symbol $\bowtie$ to represent the creation of a derived component We can use the symbol `$\bowtie$' to represent the creation of a derived component
from a {\fg}. We show an FMMD hierarchy in figure~\ref{fig:fmmdh}. from a {\fg}. We show an FMMD hierarchy in figure~\ref{fig:fmmdh}.
Using this diagram, we can follow the creation of the hierarchy in Using this diagram, we can follow the creation of the hierarchy in
a theoretical system. a theoretical system.
@ -356,7 +359,7 @@ With modular FMEA i.e. FMMD %(FMMD)
we have the concepts of failure~modes we have the concepts of failure~modes
of components, {\fgs} and symptoms of failure for a functional group. of components, {\fgs} and symptoms of failure for a functional group.
A programmatic function has similariies with a {\fg} as defined by the FMMD process. A programmatic function has similarities with a {\fg} as defined by the FMMD process.
% %
An FMMD {\fg} is placed into a hierarchy. An FMMD {\fg} is placed into a hierarchy.
A Software function is placed into a hierarchy, that of its call-tree. A Software function is placed into a hierarchy, that of its call-tree.
@ -370,7 +373,7 @@ are the failure modes of the software components (other functions it calls)
and the hardware it reads values from. and the hardware it reads values from.
Its outputs are the data it changes, or the hardware actions it performs. Its outputs are the data it changes, or the hardware actions it performs.
When we have analysed a software function, initially usin its input failure modes When we have analysed a software function, initially using its input failure modes
we can determine its symptoms of failure (how calling functions will see its failure mode behaviour). we can determine its symptoms of failure (how calling functions will see its failure mode behaviour).
We can thus apply the $\bowtie$ process to software functions, by viewing them in terms of their failure We can thus apply the $\bowtie$ process to software functions, by viewing them in terms of their failure
@ -390,7 +393,7 @@ and the subsequent hierarchy. With software already written, that hierarchy is f
\subsection{Software, a natural hierarchy} \subsection{Software, a natural hierarchy}
Software written for safety critical systems is usually constrained to Software written for safety critical systems is usually constrained to
be modular~\cite{en61508}[3]~\cite{misra}[cc] and non recursive~\cite{misra}[15.2]{iec61511}. be modular~\cite{en61508}[3] and non recursive~\cite{misra}[15.2].%{iec61511}.
Because of this we can assume a direct call tree. Functions call functions Because of this we can assume a direct call tree. Functions call functions
from the top down and eventually call the lowest level library or IO from the top down and eventually call the lowest level library or IO
functions that interact with hardware/electronics. functions that interact with hardware/electronics.
@ -398,7 +401,7 @@ functions that interact with hardware/electronics.
What is potentially difficult with a software function, is deciding what What is potentially difficult with a software function, is deciding what
are failure modes, and later what are failure symptoms. are failure modes, and later what are failure symptoms.
With electronic components, we can use literature to point us to suitable sets of With electronic components, we can use literature to point us to suitable sets of
{\fms}~\cite{en298}~\cite{fmd91}~\cite{mil1991}~\cite{en61508}. {\fms}~\cite{fmd91}~\cite{mil1991}~\cite{en298}.%~\cite{en61508}~\cite{en298}.
With software, only some library functions are well known and rigorously documented With software, only some library functions are well known and rigorously documented
enough to have the equivalent of known failure modes. enough to have the equivalent of known failure modes.
Most software is `bespoke'. We need a different strategy to Most software is `bespoke'. We need a different strategy to
@ -439,16 +442,16 @@ and to outputs (where they can be considered {failure symptoms} in FMMD terminol
For the purpose of example, we chose a simple common safety critical industrial circuit For the purpose of example, we chose a simple common safety critical industrial circuit
that is nearly always used in conjunction with a programmatic element. that is nearly always used in conjunction with a programmatic element.
A common method for delivering a quantitative value in analogue electronics is A common method for delivering a quantitative value in analogue electronics is
to supply a current signal to represent it~\cite{aoe}[p.849]. to supply a current signal to represent the value to be sent~\cite{aoe}[p.849].
Usually, 4mA represents a zero or starting value and 20mA represents the full scale, Usually, $4mA$ represents a zero or starting value and $20mA$ represents the full scale,
and this is referred to as {\ft} signalling. and this is referred to as {\ft} signalling.
% %
{\ft} has an electrical advantage as well, because the current in a loop is constant~\cite{aoe}[p.20], {\ft} has an electrical advantage as well, because the current in a loop is constant~\cite{aoe}[p.20],
resistance in the wires between the source and the receiving end is not an issue resistance in the wires between the source and the receiving end is not an issue
that can alter the accuracy of the signal. that can alter the accuracy of the signal.
% %
This circuit has many advantages for safety. If the signal becomes discontented This circuit has many advantages for safety. If the signal becomes disconnected
it reads an out of range 0mA at the receiving end. This is outside the {\ft} range, it reads an out of range $0mA$ at the receiving end. This is outside the {\ft} range,
and is therefore easy to detect as an error rather than an incorrect value. and is therefore easy to detect as an error rather than an incorrect value.
% %
Should the driving electronics go wrong at the source end, it will usually Should the driving electronics go wrong at the source end, it will usually
@ -469,12 +472,19 @@ current signal into a voltage that we can read with an ADC: the humble resistor!
\end{figure} \end{figure}
The diagram in figure~\ref{fig:ftcontext} shows some equipment which is sending a {\ft}
signal to a micro-controller system.
The signal is driven locally across a load resistor, and the resulting voltage is read into the micro-controller via
an ADC and its multiplexer.
From the voltage detected at the ADC, the micro-controller can determine the quantitative
value intended by the external equipment.
\subsection{Simple Software Example} \subsection{Simple Software Example}
Consider a function that reads a {\ft} input, and returns a value between 0 and 999 (i.e. per mil $\permil$) Consider a function that reads a {\ft} input, and returns a value between 0 and 999 (i.e. per mil $\permil$)
representing the current detected with an additional error indication flag. representing the current detected with an additional error indication flag.
%
Let us assume the {\ft} detection is via a \ohms{220} resistor, and that we read a voltage Let us assume the {\ft} detection is via a \ohms{220} resistor, and that we read a voltage
from an ADC into the software. from an ADC into the software.
Let us define any value outside the 4mA to 20mA range as an error condition. Let us define any value outside the 4mA to 20mA range as an error condition.
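The voltage limits follow from Ohm's law across the \ohms{220} resistor: $0.004 \times 220 = 0.88V$ and $0.020 \times 220 = 4.4V$. The scaling and range check described above can be sketched as follows. This is a minimal sketch and not the function shown in the paper's listing: the helper name \texttt{volts\_to\_permil}, the constant names, and the assumption that the per mil value scales linearly between the two limits are all illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed constants: a 220 ohm sense resistor, so by Ohm's law
   4mA gives 0.88V and 20mA gives 4.4V. */
#define SENSE_OHMS  220.0
#define VOLTS_MIN   (0.004 * SENSE_OHMS)  /* voltage at 4mA  */
#define VOLTS_MAX   (0.020 * SENSE_OHMS)  /* voltage at 20mA */

/* Hypothetical helper: scale a voltage to the range 0..999.
   Returns 0 on success, 1 (error flag) if the voltage lies
   outside the 4mA to 20mA range. */
int volts_to_permil(double input_volts, int *value)
{
    assert(value != NULL);

    if (input_volts < VOLTS_MIN || input_volts > VOLTS_MAX) {
        *value = 0;   /* no meaningful reading */
        return 1;     /* out of range: error condition */
    }

    /* Linear scaling (an assumption); the cast truncates toward zero. */
    *value = (int)((input_volts - VOLTS_MIN)
                   / (VOLTS_MAX - VOLTS_MIN) * 999.0);
    return 0;
}
```

A disconnected loop reads $0V$ and is therefore reported through the error flag rather than as a low but valid value.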
@ -501,9 +511,13 @@ value which holds the voltage read (see code sample in figure~\ref{fig:code_read
\footnotesize \footnotesize
\begin{verbatim} \begin{verbatim}
/***********************************************/
/* read_4_20_input() */
/***********************************************/
/* Software function to read 4mA to 20mA input */ /* Software function to read 4mA to 20mA input */
/* returns a value from 0-999 proportional */ /* returns a value from 0-999 proportional */
/* to the current input. */ /* to the current input. */
/***********************************************/
int read_4_20_input ( int * value ) { int read_4_20_input ( int * value ) {
double input_volts; double input_volts;
int error_flag; int error_flag;
@ -530,8 +544,9 @@ int read_4_20_input ( int * value ) {
\end{verbatim} \end{verbatim}
%} %}
%} %}
\label{fig:code_read_4_20_input}
\caption{Software Function: \textbf{read\_4\_20\_input}} \caption{Software Function: \textbf{read\_4\_20\_input}}
\label{fig:code_read_4_20_input}
%\label{fig:420i} %\label{fig:420i}
\end{figure} \end{figure}
@ -542,7 +557,7 @@ This function
deals directly with the hardware in the micro-controller that we are running the software on. deals directly with the hardware in the micro-controller that we are running the software on.
% %
Its job is to select the correct channel (ADC multiplexer) and then to initiate a Its job is to select the correct channel (ADC multiplexer) and then to initiate a
conversion by setting an ADC 'go' bit (see code sample in figure~\ref{code_read_ADC}). conversion by setting an ADC 'go' bit (see code sample in figure~\ref{fig:code_read_ADC}).
% %
It takes the raw ADC reading and converts it into a It takes the raw ADC reading and converts it into a
floating point\footnote{the type, `double' or `double precision', is a standard C language floating point type~\cite{kandr}.} floating point\footnote{the type, `double' or `double precision', is a standard C language floating point type~\cite{kandr}.}
@ -554,15 +569,19 @@ voltage value.
%{\vbox{ %{\vbox{
\begin{figure} \begin{figure}
\label{fig:code_read_ADC}
\footnotesize \footnotesize
\begin{verbatim} \begin{verbatim}
/***********************************************/
/* read_ADC() */
/***********************************************/
/* Software function to read voltage from a */ /* Software function to read voltage from a */
/* specified ADC MUX channel */ /* specified ADC MUX channel */
/* Assume 10 ADC MUX channels 0..9 */ /* Assume 10 ADC MUX channels 0..9 */
/* ADC_CHAN_RANGE = 9 */ /* ADC_CHAN_RANGE = 9 */
/* Assume ADC is 12 bit and ADCRANGE = 4096 */ /* Assume ADC is 12 bit and ADCRANGE = 4096 */
/* returns voltage read as double precision */ /* returns voltage read as double precision */
/***********************************************/
double read_ADC( int channel ) { double read_ADC( int channel ) {
int timeout = 0; int timeout = 0;
/* require: a) input channel from ADC to be /* require: a) input channel from ADC to be
@ -596,6 +615,7 @@ double read_ADC( int channel ) {
} }
\end{verbatim} \end{verbatim}
\caption{Software Function: \textbf{read\_ADC}} \caption{Software Function: \textbf{read\_ADC}}
\label{fig:code_read_ADC}
\end{figure} \end{figure}
%} %}
%} %}
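The conversion from a raw ADC count to a voltage, and the channel pre-condition, can be sketched in isolation from the hardware access. This is a minimal sketch under stated assumptions: \texttt{ADCRANGE} and \texttt{ADC\_CHAN\_RANGE} are taken from the listing's comments, while the reference voltage \texttt{ADC\_VREF} of $5V$, the helper names, and the choice to divide by \texttt{ADCRANGE} are assumptions made for illustration.

```c
#include <assert.h>

#define ADCRANGE        4096  /* 12 bit ADC, as stated in the listing  */
#define ADC_CHAN_RANGE  9     /* MUX channels 0..9, as in the listing  */
#define ADC_VREF        5.0   /* ASSUMED full scale voltage, not given */

/* Hypothetical helper: pre-condition on the MUX channel number.
   Returns 1 if the channel is within 0..ADC_CHAN_RANGE, else 0. */
int adc_channel_valid(int channel)
{
    return channel >= 0 && channel <= ADC_CHAN_RANGE;
}

/* Hypothetical helper: convert a raw ADC count to volts,
   assuming the count is proportional to the input voltage. */
double adc_counts_to_volts(int raw)
{
    assert(raw >= 0 && raw < ADCRANGE);
    return ((double)raw / ADCRANGE) * ADC_VREF;
}
```

Separating the pure conversion from the register access keeps each pre-condition visible as a candidate failure mode of the function.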
@ -787,7 +807,7 @@ This function sits on top of the $RADC$ {\dc} determined above.
We look at the pre-conditions for the function $read\_4\_20\_input$ $(RI)$, % which we can call $RI$ We look at the pre-conditions for the function $read\_4\_20\_input$ $(RI)$, % which we can call $RI$
to determine its {\fms}. to determine its {\fms}.
Its pre-condition is, {\em /* require: input from ADC to be between 0.88 and 4.4 volts */}. Its pre-condition is, {\em /* require: input from ADC to be between 0.88 and 4.4 volts */}.
We can call a violation of this the {\fm} VRNGE; %As this function has one pre-condition We can map this violation of the pre-condition, to the {\fm} VRNGE; %As this function has one pre-condition
we can state, we can state,
$$ fm(RI) = \{ VRNGE \} .$$ $$ fm(RI) = \{ VRNGE \} .$$
@ -871,10 +891,18 @@ as a hierarchical diagram, see figure~\ref{fig:hd}.
%\clearpage %\clearpage
\section{Conclusion} \section{Conclusion}
The derived component representing the {\ft} reader The {\dc} representing the {\ft} reader
in software shows that by taking a modular approach for FMEA, we can integrate in software shows that by taking a modular approach for FMEA, we can integrate
software and electro-mechanical FMEA models. software and electro-mechanical FMEA models.
With this analysis
we have a complete `reasoning~path' linking the failure modes from the
electronics to those in the software.
Each functional group to {\dc} transition represents a
reasoning stage.
With traditional FMEA methods the reasoning~distance is large, because
it stretches from the component failure mode to the top (or system) level failure.
For this reason applying traditional FMEA to software stretches
the reasoning distance even further.
We now have a {\dc} for a {\ft} input in software. We now have a {\dc} for a {\ft} input in software.
Typically, more than one such input could be present in a real-world system. Typically, more than one such input could be present in a real-world system.
@ -884,7 +912,7 @@ re-use the analysis for each {\ft} input in the system.
The unsolved symptoms, or unobservable errors, i.e. $VAL\_ERR$ could be addressed The unsolved symptoms, or unobservable errors, i.e. $VAL\_ERR$ could be addressed
by another software function to read other known signals by another software function to read other known signals
via the MUX (i.e. voltage references). This strategy would via the MUX (i.e. voltage references). This strategy would
detect ADC STUCK AT and MUX FAIL failure modes. detect ADC\_STUCK\_AT and MUX\_FAIL failure modes.
% %
Detailing this however, is beyond the scope %and page-count Detailing this however, is beyond the scope %and page-count
of this paper. of this paper.
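As an indication only, such a check could compare the readings from two reference channels against their known voltages. This is a minimal sketch under stated assumptions: the reference voltages, the tolerance and the function names are hypothetical, and the readings are assumed to have already been obtained through the MUX.

```c
/* ASSUMED values for illustration; a real design would take these
   from the reference components fitted to the board. */
#define REF_LOW_VOLTS   1.0
#define REF_HIGH_VOLTS  4.0
#define REF_TOLERANCE   0.05

/* Returns 1 if measured is within REF_TOLERANCE of expected. */
static int within(double measured, double expected)
{
    double diff = measured - expected;
    if (diff < 0.0) {
        diff = -diff;
    }
    return diff <= REF_TOLERANCE;
}

/* Hypothetical check: both reference channels must read their
   known voltages. An ADC stuck at one value, or a MUX that always
   selects the same channel, yields two equal readings, so at least
   one comparison fails. Returns 1 if both references read correctly. */
int references_ok(double low_reading, double high_reading)
{
    return within(low_reading, REF_LOW_VOLTS)
        && within(high_reading, REF_HIGH_VOLTS);
}
```

Two distinct reference voltages are used because a single reference could be matched by chance by a stuck ADC value.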
@ -897,7 +925,7 @@ of this paper.
\paragraph{Future work} \paragraph{Future work}
\begin{itemize} \begin{itemize}
\item \item A complete software/electrical/mechanical system analysed
\item \item
\item \item
\end{itemize} \end{itemize}