diff --git a/papers/software_fmea/mybib.bib b/papers/software_fmea/mybib.bib index d616858..0f37e11 100644 --- a/papers/software_fmea/mybib.bib +++ b/papers/software_fmea/mybib.bib @@ -357,6 +357,14 @@ methodology", year = "1994" } + +@MISC{en61511, + author = "E N Standard", + title = "EN61511: Functional safety. Safety instrumented systems for the process industry sector.", + howpublished = "British Standards Institution http://www.bsigroup.com/", + year = "2004" +} + @MISC{challenger, author = "U.S. Presidential Commission", title = "Report of the SpaceShuttle Challanger Accident", diff --git a/papers/software_fmea/software_fmea.tex b/papers/software_fmea/software_fmea.tex index 4787ff8..a89d13f 100644 --- a/papers/software_fmea/software_fmea.tex +++ b/papers/software_fmea/software_fmea.tex @@ -138,14 +138,14 @@ component failure modes on a system. It is used both as a design tool (to determine weakness), and is a requirement of certification of safety critical products. FMEA has been successfully applied to mechanical, electrical and hybrid electro-mechanical systems. -Work on software FMEA is beginning~\cite{sfmea}~\cite{sfmeaa}, but +Work on software FMEA is beginning, but at present no technique for software FMEA that integrates hardware and software models known to the authors exists. % Software generally, sits on top of most modern safety critical control systems and defines its most important system wide behaviour and communications. -Standards~\cite{en298}~\cite{en61508} that use FMEA -do not specify it for Software, but do specify, good practise, +Current standards that demand FMEA for hardware (e.g. EN298, EN61508) +do not specify it for software, but instead specify good practice, review processes and language feature constraints. 
This is a weakness; where FMEA scientifically traces component {\fms} @@ -162,6 +162,9 @@ This paper presents an FMEA methodology which can be applied to software, and is and integrate-able with FMEA performed on mechanical and electronic systems. } +\nocite{en298} +\nocite{en61508} + \section{Introduction} { This paper describes a modular FMEA process that can be applied to software. @@ -194,7 +197,7 @@ is a cause for criticism~\cite{easw}~\cite{safeware}~\cite{bfmea}. Several variants of FMEA exist, -traditional FMEA being a associated with the manufacturing industry, with the aims of prioritising +traditional FMEA being associated with the manufacturing industry, with the aims of prioritising the failures to fix in order of cost. Deisgn FMEA (DFMEA) is FMEA applied at the design or approvals stage @@ -215,7 +218,7 @@ all the above variants of FMEA. \subsection{Current FMEA techniques are not suitable for software} The main FMEA methodologies are all based on the concept of taking -base component {\fms}, and translating them into system level events/failures. +base component {\fms}, and translating them into system level events/failures~\cite{sfmea}~\cite{sfmeaa}. In a complicated system, mapping a component failure mode to a system level failure will mean a long reasoning distance; that is to say the actions of the failed component will have to be traced through several sub-systems and the effects of other components on the way. @@ -321,7 +324,7 @@ of the {\fg} that it was derived from. % in a specific configuration. This specific configuration corresponds to % a {\fg}. Our use of it as a building block corresponds to a {\dc}. -We can use the symbol $\bowtie$ to represent the creation of a derived component +We can use the symbol `$\bowtie$' to represent the creation of a derived component from a {\fg}. We show an FMMD hierarchy in figure~\ref{fig:fmmdh}. Using this diagram, we can follow the creation of the hierarchy in a theoretical system. 
@@ -356,7 +359,7 @@ With modular FMEA i.e. FMMD %(FMMD) we have the concepts of failure~modes of components, {\fgs} and symptoms of failure for a functional group. -A programmatic function has similariies with a {\fg} as defined by the FMMD process. +A programmatic function has similarities with a {\fg} as defined by the FMMD process. % An FMMD {\fg} is placed into a hierarchy. A Software function is placed into a hierarchy, that of its call-tree. @@ -370,7 +373,7 @@ are the failure modes of the software components (other functions it calls) and the hardware its reads values from. Its outputs are the data it changes, or the hardware actions it performs. -When we have analysed a software function, initially usin its input failure modes +When we have analysed a software function, initially using its input failure modes we can determine its symptoms of failure (how calling functions will see its failure mode behaviour). We can thus apply the $\bowtie$ process to software functions, by viewing them in terms of their failure @@ -390,7 +393,7 @@ and the subsequent hierarchy. With software already written, that hierarchy is f \subsection{Software, a natural hierarchy} Software written for safety critical systems is usually constrained to -be modular~\cite{en61508}[3]~\cite{misra}[cc] and non recursive~\cite{misra}[15.2]{iec61511}. +be modular~\cite[3]{en61508} and non-recursive~\cite[15.2]{misra}.%{iec61511}. Because of this we can assume a direct call tree. Functions call functions from the top down and eventually call the lowest level library or IO functions that interact with hardware/electronics. @@ -398,7 +401,7 @@ functions that interact with hardware/electronics. What is potentially difficult with a software function, is deciding what are failure modes, and later what a failure symptoms. With electronic components, we can use literature to point us to suitable sets of -{\fms}~\cite{en298}~\cite{fmd91}~\cite{mil1991}~\cite{en61508}. 
+{\fms}~\cite{fmd91}~\cite{mil1991}~\cite{en298}.%~\cite{en61508}~\cite{en298}. With software, only some library functions are well known and rigorously documented enough to have the equivalent of known failure modes. Most software is `bespoke'. We need a different strategy to @@ -439,16 +442,16 @@ and to outputs (where they can be considered {failure symptoms} in FMMD terminol For the purpose of example, we chose a simple common safety critical industrial circuit that is nearly always used in conjunction with a programmatic element. A common method for delivering a quantitative value in analogue electronics is -to supply a current signal to represent it~\cite{aoe}[p.849]. -Usually, 4mA represents a zero or starting value and 20mA represents the full scale, +to supply a current signal to represent the value to be sent~\cite[p.~849]{aoe}. +Usually, $4\,\mathrm{mA}$ represents a zero or starting value and $20\,\mathrm{mA}$ represents the full scale, and this is referred to as {\ft} signalling. % {\ft} has a an electrical advantage as well, because the current in a loop is constant~\cite{aoe}[p.20] resistance in the wires between the source and the receiving end is not an issue that can alter the accuracy of the signal. % -This circuit has many advantages for safety. If the signal becomes discontented -it reads an out of range 0mA at the receiving end. This is outside the {\ft} range, +This circuit has many advantages for safety. If the signal becomes disconnected +it reads as an out-of-range $0\,\mathrm{mA}$ at the receiving end. This is outside the {\ft} range, and is therefore easy to detect as an error rather than an incorrect value. % Should the driving electronics go wrong at the source end, it will usually @@ -469,12 +472,19 @@ current signal into a voltage that we can read with an ADC: the humble resistor! \end{figure} +The diagram in figure~\ref{fig:ftcontext} shows equipment sending a {\ft} 
+The signal is locally driven over a load resistor, and then read into the micro-controller via +an ADC and its multiplexer. +With the voltage detected at the ADC the multiplexer can read the intended quantitative +value from the external equipment. + \subsection{Simple Software Example} Consider a function that reads a {\ft} input, and returns a value between 0 and 999 (i.e. per mil $\permil$) representing the current detected with an additional error indication flag . - +% Let us assume the {\ft} detection is via a \ohms{220} resistor, and that we read a voltage from an ADC into the software. Let us define any value outside the 4mA to 20mA range as an error condition. @@ -501,9 +511,13 @@ value which holds the voltage read (see code sample in figure~\ref{fig:code_read \footnotesize \begin{verbatim} +/***********************************************/ +/* read_4_20_input() */ +/***********************************************/ /* Software function to read 4mA to 20mA input */ /* returns a value from 0-999 proportional */ /* to the current input. */ +/***********************************************/ int read_4_20_input ( int * value ) { double input_volts; int error_flag; @@ -530,8 +544,9 @@ int read_4_20_input ( int * value ) { \end{verbatim} %} %} -\label{fig:code_read_4_20_input} + \caption{Software Function: \textbf{read\_4\_20\_input}} +\label{fig:code_read_4_20_input} %\label{fig:420i} \end{figure} @@ -542,7 +557,7 @@ This function deals directly with the hardware in the micro-controller that we are running the software on. % Its job is to select the correct channel (ADC multiplexer) and then to initiate a -conversion by setting an ADC 'go' bit (see code sample in figure~\ref{code_read_ADC}). +conversion by setting an ADC 'go' bit (see code sample in figure~\ref{fig:code_read_ADC}). 
% It takes the raw ADC reading and converts it into a i floating point\footnote{the type, `double' or `double precision', is a standard C language floating point type~\cite{kandr}.} @@ -554,15 +569,19 @@ voltage value. %{\vbox{ \begin{figure} -\label{fig:code_read_ADC} + \footnotesize \begin{verbatim} +/***********************************************/ +/* read_ADC() */ +/***********************************************/ /* Software function to read voltage from a */ /* specified ADC MUX channel */ /* Assume 10 ADC MUX channels 0..9 */ /* ADC_CHAN_RANGE = 9 */ /* Assume ADC is 12 bit and ADCRANGE = 4096 */ /* returns voltage read as double precision */ +/***********************************************/ double read_ADC( int channel ) { int timeout = 0; /* require: a) input channel from ADC to be @@ -596,6 +615,7 @@ double read_ADC( int channel ) { } \end{verbatim} \caption{Software Function: \textbf{read\_ADC}} +\label{fig:code_read_ADC} \end{figure} %} %} @@ -787,7 +807,7 @@ This function sits on top of the $RADC$ {\dc} determined above. We look at the pre-conditions for the function $read\_4\_20\_input$ $(RI)$, % which we can call $RI$ to determine its {\fms}. Its pre-condition is, {\em /* require: input from ADC to be between 0.88 and 4.4 volts */}. -We can call a violation of this the {\fm} VRNGE; %As this function has one pre-condition +We can map a violation of this pre-condition to the {\fm} VRNGE; %As this function has one pre-condition we can state, $$ fm(RI) = \{ VRNGE \} .$$ @@ -871,10 +891,18 @@ as a hierarchical diagram, see figure~\ref{fig:hd}. %\clearpage \section{Conclusion} -The derived component representing the {\ft} reader +The {\dc} representing the {\ft} reader in software shows that by taking a modular approach for FMEA, we can integrate software and electro-mechanical FMEA models. - +With this analysis +we have a complete `reasoning~path' linking the failure modes from the +electronics to those in the software. 
+Each {\fg} to {\dc} transition represents a +reasoning stage. +With traditional FMEA methods the reasoning~distance is large, because +it stretches from the component failure mode directly to the top (system) level failure. +Applying traditional FMEA to software would stretch +the reasoning distance even further. We now have a {\dc} for a {\ft} input in software. Typically, more than one such input could be present in a real-world system. @@ -884,7 +912,7 @@ re-use the analysis for each {\ft} input in the system. The unsolved symptoms, or unobservable errors, i.e. $VAL\_ERR$ could be addressed by another software function to read other known signals via the MUX (i.e. voltage references). This strategy would -detect ADC STUCK AT and MUX FAIL failure modes. +detect ADC\_STUCK\_AT and MUX\_FAIL failure modes. % Detailing this however, is beyond the scope %and page-count of this paper. @@ -897,7 +925,7 @@ of this paper. \paragraph{Future work} \begin{itemize} -\item +\item A complete software/electrical/mechanical system analysed \item \item \end{itemize}
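The arithmetic behind `read_4_20_input` and `read_ADC` in the diff can be sketched in C. This is a minimal sketch under stated assumptions, not the paper's code. The 220 ohm load, the 4 mA to 20 mA range (0.88 V to 4.4 V) and the 12-bit `ADCRANGE = 4096` come from the text. The 5.0 V ADC reference voltage and the helper names `adc_counts_to_volts` and `volts_to_permil` are hypothetical assumptions made only for this illustration.

```c
/* Sketch of the 4-20 mA arithmetic described in the paper.
 * ASSUMPTIONS (not from the paper): ADC_VREF = 5.0 V, and the helper
 * names adc_counts_to_volts() / volts_to_permil() are hypothetical. */

#define ADC_RANGE  4096    /* 12-bit ADC, ADCRANGE in read_ADC()       */
#define ADC_VREF   5.0     /* ASSUMED ADC reference voltage             */
#define MIN_VOLTS  0.88    /*  4 mA across the 220 ohm load resistor    */
#define MAX_VOLTS  4.40    /* 20 mA across the 220 ohm load resistor    */

/* READ_VRNGE corresponds to the VRNGE failure mode in the text:
 * the pre-condition 0.88 V <= volts <= 4.4 V has been violated. */
enum { READ_OK = 0, READ_VRNGE = 1 };

/* Convert a raw ADC count (0..ADC_RANGE-1) into volts. */
double adc_counts_to_volts(int counts)
{
    return (double)counts * ADC_VREF / (double)ADC_RANGE;
}

/* Map the voltage across the load resistor onto 0..999 per mil.
 * Returns READ_VRNGE and leaves *value untouched if the voltage lies
 * outside MIN_VOLTS..MAX_VOLTS (e.g. 0 V from a disconnected loop). */
int volts_to_permil(double volts, int *value)
{
    if (volts < MIN_VOLTS || volts > MAX_VOLTS)
        return READ_VRNGE;
    /* linear scaling, rounded to the nearest per-mil step */
    *value = (int)((volts - MIN_VOLTS) / (MAX_VOLTS - MIN_VOLTS) * 999.0 + 0.5);
    return READ_OK;
}
```

For example, with the assumed 5.0 V reference, a raw count of 2163 converts to about 2.64 V. That corresponds to roughly 12 mA through the load and maps to 500 per mil. A disconnected loop reads 0 V and is rejected with `READ_VRNGE` rather than being reported as a zero reading. This is the safety property of {\ft} signalling that the paper describes.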