diff --git a/papers/software_fmea/mybib.bib b/papers/software_fmea/mybib.bib
index 58b59ba..3988053 100644
--- a/papers/software_fmea/mybib.bib
+++ b/papers/software_fmea/mybib.bib
@@ -145,6 +145,14 @@ methodology",
 YEAR = "2005"
 }
+
+@BOOK{easw,
+ AUTHOR = "Nancy Leveson",
+ TITLE = "Engineering a safer world ISBN: 978-0262016629",
+ PUBLISHER = "MIT Press",
+ YEAR = "2012"
+}
+
 @BOOK{bfmea,
 AUTHOR = "Robin E McDermot et al",
 TITLE = "The Basics of FMEA ISBN: 0-527-76320-9",
@@ -240,6 +248,12 @@ methodology",
 YEAR = "2002"
 }
+@BOOK{kandr,
+ AUTHOR = "Brian W. Kernighan and Dennis M. Ritchie",
+ TITLE = "The C Programming Language, 2nd edition",
+ PUBLISHER = "Prentice Hall",
+ YEAR = "1988"
+}
 @BOOK{probstat,
 AUTHOR = " M~R~Spiegel",
diff --git a/papers/software_fmea/software_fmea.tex b/papers/software_fmea/software_fmea.tex
index d673e48..44335c9 100644
--- a/papers/software_fmea/software_fmea.tex
+++ b/papers/software_fmea/software_fmea.tex
@@ -172,13 +172,19 @@ With these definitions we can apply FMEA to existing software\footnote{Existing
 software excluding recursive code, and unstructured non-functional languages}.
 }
-\section{FMEA Process}
+\section{FMEA Background}
 %What FMEA is, briefly variants...
 Failure Mode Effects Analysis is the process of taking
 component failure modes, and by reasoning, tracing their effects through a system
 and determining what system level failure modes could be caused.
+FMEA dates from the 1940s, when simple electro-mechanical systems were the norm.
+Modern control systems nearly always have a significant software/firmware element,
+and not being able to model software with current FMEA methodologies
+is a cause for criticism~\cite{easw}~\cite{safeware}~\cite{bfmea}.
+
+
 Several variants of FMEA exist,
 traditional FMEA being associated with the manufacturing industry, with the aims of prioritising
@@ -203,13 +209,17 @@ all the above variants of FMEA.
 \section{Modularising FMEA}
-In outline, in order to modularise FMEA, we must create small modules form the bottom-up.
+In outline, in order to modularise FMEA, we must create small modules from the bottom-up.
 We can do this by taking collections of base~components that
 perform (ideally) a simple and well defined task.
+%
 We can call these {\fgs}.
 We can then analyse the failure mode behaviour of a {\fg}
 using all the failure modes of all its components.
+%
 When we have its failure mode behaviour, or the symptoms of failure from the perspective of the {\fg},
 we now treat the {\fg} as a {\dc}; where the failure modes of the {\dc} are the symptoms of failure of the {\fg}.
+%
+%
 We can now use {\dcs} to build higher level {\fgs} until we have a complete hierarchical model
 of the failure mode behaviour of a system.
 An example of this process, applied to an inverting op-amp configuration, is given in~\cite{syssafe2011}.
@@ -297,12 +307,16 @@ of typical modern safety critical systems.
 With modular FMEA (FMMD) we have the concepts of failure~modes of components, {\fgs} and symptoms of failure for a functional group.
-A programmatic function is very similar to a functional group.
-It calls other functions, and uses data sources, which could be viewed as its `components'.
+A programmatic function is very similar to a functional group.
+It calls other functions, and uses data sources via hardware interaction, which could be viewed as its `components'.
 It has outputs which will be used by functions that may call it.
-
-However, we need to define a clear concept of failure modes of a function in order to
-map FMMD to software.
+However, we need to map the FMMD concepts of {\fms}, {\fgs} and {\dcs}
+to software functions.
+%
+%However, we need to map the FMMD concepts of {\fms}, {\fgs} and {\dcs}
+%to software functions.
+% failure modes of a function in order to
+%map FMMD to software.
 \subsection{Software, a natural hierarchy}
@@ -317,25 +331,30 @@ functions that interact with hardware/electronics.
 Contract programming is a discipline~\cite{dbcbe} for building software functions in a controlled
 and traceable way.
 Each function is subject to pre-conditions (constraints on its inputs),
-post-conditions (constraints on its outputs) and function wide invariants (rules).
+post-conditions (constraints on its outputs) and function wide invariants (rules).
-\paragraph{Mapping contract pre-condition violations to failure modes}
+\paragraph{Mapping contract `pre-condition' violations to failure modes}
 A precondition, or requirement for a contract software function
 defines the correct ranges of input conditions for the function
 to operate successfully.
 For a software function, a violation of a pre-condition is
-in effect a failure mode of one of its components.
+in effect a failure mode of `one of its components'.
-\paragraph{Mapping contract post-condition violations to symptoms}
+\paragraph{Mapping contract `post-condition' violations to symptoms}
 A post condition is a definition of correct behaviour by a function.
-This could be an action performed or an output value.
-
 A violated post condition is a symptom of failure of a function.
+Post conditions could be either actions performed (i.e. the state of hardware changed) or an output value of a function.
+
+\paragraph{Mapping contract `invariant' violations to symptoms and failure modes}
+
+Invariants in contract programming may apply to inputs to the function (where they can be considered {\fms} in FMMD terminology),
+and to outputs (where they can be considered {failure symptoms} in FMMD terminology).
+
 \subsection{Software FMEA}
@@ -343,18 +362,29 @@ A violated post condition is a symptom of failure of a function.
 Consider a function that reads a {\ft} input, and returns a value between 0 and 999 (i.e. per mil $\permil$)
-representing the current detected with an error indication flag .
+representing the current detected with an additional error indication flag.
 Let us assume the {\ft} detection is via a \ohms{220} resistor, and that we read a voltage
 from an ADC into the software.
 Let us define any value outside the 4mA to 20mA range as an error condition.
+%
 As a voltage, we use Ohm's law~\cite{aoe} to determine the voltage ranges: $V=IR$, $0.004A * \ohms{220} = 0.88V$
-and $0.020A * \ohms{220} = 4.4V$. Our acceptable voltage range is therefore $V >= 0.88 \wedge V<= 4.4$.
+and $0.020A * \ohms{220} = 4.4V$.
+%
+Our acceptable voltage range is therefore $V \geq 0.88 \wedge V \leq 4.4$.
 This voltage range forms our input requirement.
-We can now examine software function. For the purpose of example the `C' programming language is used.
-We assume a function {\em read\_ADC()} which returns a double precision
-value which holds the voltage read.
-{\vbox{
+%
+We can now examine a software function that performs a conversion from the voltage read to
+a per~mil representation of the {\ft} input current.
+%
+For the purpose of example the `C' programming language is used.
+We initially assume a function \textbf{read\_ADC} which returns a floating point %double precision
+value which holds the voltage read (see code sample in figure~\ref{fig:code_read_4_20_input}).
+
+
+%%{\vbox{
+\begin{figure}
+
 \footnotesize
 \begin{verbatim}
 /* Software function to read 4mA to 20mA input */
@@ -384,22 +414,28 @@ int read_4_20_input ( int * value ) {
 return error_flag;
 }
 \end{verbatim}
-}
-}
+%}
+%}
+\caption{Software Function: \textbf{read\_4\_20\_input}}
+\label{fig:code_read_4_20_input}
+\label{fig:420i}
+\end{figure}
-Note that the function above calls another, `read\_ADC', which returns a
+We now look at the function called by \textbf{read\_4\_20\_input}, \textbf{read\_ADC}, which returns a
 voltage for a given ADC channel.
 This function deals directly with the hardware in the micro-controller we are running the software on.
 Its job is to select the correct channel (ADC multiplexer) and then to initiate a
-conversion by setting an ADC 'go' bit.
+conversion by setting an ADC `go' bit (see code sample in figure~\ref{fig:code_read_ADC}).
+It takes the raw ADC reading and converts it into a floating point\footnote{The type, `double' or `double precision', is a standard C language floating point type~\cite{kandr}.}
+voltage value.
-
-
-{\vbox{
+%{\vbox{
+\begin{figure}
+\label{fig:code_read_ADC}
 \footnotesize
 \begin{verbatim}
 /* Software function to read voltage from a */
@@ -440,8 +476,10 @@ double read_ADC( int channel ) {
 return dval;
 }
 \end{verbatim}
-}
-}
+\caption{Software Function: \textbf{read\_ADC}}
+\end{figure}
+%}
+%}
 We now have a very simple software structure, a call tree, shown in figure~\ref{fig:ct1}.
@@ -477,15 +515,23 @@ $$ fm(R) = \{OPEN,SHORT\}. $$
 For the ADC we can determine the following failure modes:
-$$ fm(ADC) = \{ STUCKAT, MUXFAIL, LOWOUT, HIGHOUT \}. $$
+\begin{itemize}
+ \item STUCKAT --- The ADC outputs a constant value,
+ \item MUXFAIL --- The ADC cannot select its input channel correctly,
+ \item LOW --- The ADC output is always LOW, or zero ADC counts,
+ \item HIGH --- The ADC output is always HIGH, or maximum ADC counts.
+\end{itemize}
-With these failure modes, we can analyse our first functional group.
+We can use the function $fm$ to define the {\fms} of an ADC thus:
+$$ fm(ADC) = \{ STUCKAT, MUXFAIL, LOW, HIGH \}. $$
+
+With these failure modes, we can analyse our first functional group, see table~\ref{tbl:cmatv}.
 {
 \tiny
 \begin{table}[h+]
 \caption{CMATV: Failure Mode Effects Analysis} % title of Table
-\label{tbl:phs225amp}
+\label{tbl:cmatv}
 \begin{tabular}{|| l | c | l ||}
 \hline
 \textbf{Failure} & \textbf{failure} & \textbf{Symptom} \\
@@ -507,8 +553,8 @@ With these failure modes, we can analyse our first functional group.
 4: $ADC_{MUXFAIL}$ & ADC may read & $V\_ERR$ \\
 & wrong channel & \\ \hline
- 5: $ADC_{LOWOUT}$ & output low & $LOW$ \\
- 6: $ADC_{HIGHOUT}$ & output high & $HIGH$ \\ \hline
+ 5: $ADC_{LOW}$ & output low & $LOW$ \\
+ 6: $ADC_{HIGH}$ & output high & $HIGH$ \\ \hline
 \hline
@@ -561,7 +607,7 @@ We can now analyse this hardware/software combined {\fg}.
 \tiny
 \begin{table}[h+]
 \caption{RADC: Failure Mode Effects Analysis} % title of Table
-\label{tbl:phs225amp}
+\label{tbl:radc}
 \begin{tabular}{|| l | c | l ||}
 \hline
 \textbf{Failure} & \textbf{failure} & \textbf{Symptom} \\
@@ -630,7 +676,7 @@ We can now form a functional group with the {\dc} $RADC$ and the software compon
 \tiny
 \begin{table}[h+]
 \caption{Read\_4\_20: Failure Mode Effects Analysis} % title of Table
-\label{tbl:phs225amp}
+\label{tbl:r420i}
 \begin{tabular}{|| l | c | l ||}
 \hline
 \textbf{Failure} & \textbf{failure} & \textbf{Symptom} \\
@@ -705,12 +751,22 @@ The derived component representing the {\ft} reader
 in software shows that by taking a modular approach for FMEA, we can integrate
 software and electro-mechanical FMEA models.
+
+We now have a {\dc} for a {\ft} input in software.
+Typically, more than one such input could be present in a real-world system.
+Not only have we integrated electronics and software in an FMEA, we can also
+re-use the analysis for each {\ft} input in the system.
+
 The unsolved symptoms, or unobservable errors, i.e. $VAL\_ERR$, could be addressed
 by another software function to read other known signals via the MUX (i.e. voltage references).
 This strategy would detect ADC STUCK AT and MUX FAIL failure modes.
+%
+Detailing this, however, is beyond the scope %and page-count
+of this paper.
+
+
-Detailing this however, is beyond the scope and page-count of this paper.
 %Its solved. Hoooo-ray !!!!!!!!!!!!!!!!!!!!!!!!
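
The contract mapping introduced by this change (pre-condition violations treated as failure modes, post-condition violations treated as symptoms) can be illustrated with a minimal C sketch. This is not the function from the paper, whose body is not shown in the diff. The stub `read_ADC`, the `stub_voltage` variable, the limits `V_MIN` and `V_MAX`, and the linear scaling onto 0..999 are all assumptions made for illustration; only the 0.88 V to 4.4 V range and the function signatures come from the text.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed limits: 4 mA and 20 mA through a 220 ohm resistor (V = IR). */
#define V_MIN 0.88
#define V_MAX 4.40

/* Hypothetical stand-in for the hardware-facing read_ADC(); the real
 * function selects an ADC channel and starts a conversion. */
static double stub_voltage = 2.64;

static double read_ADC(int channel)
{
    (void)channel;            /* the stub ignores the channel number */
    return stub_voltage;
}

/* Returns an error flag (0 = OK, 1 = error) and writes 0..999 to *value. */
int read_4_20_input(int *value)
{
    int error_flag = 0;
    double v;

    assert(value != NULL);    /* caller contract: a valid output pointer */

    v = read_ADC(0);

    if (!(v >= V_MIN && v <= V_MAX)) {
        /* Pre-condition on the input violated: in FMMD terms, a failure
         * mode of one of this function's `components'. */
        error_flag = 1;
        *value = 0;
    } else {
        /* Assumed linear scaling of the valid range onto 0..999. */
        *value = (int)((v - V_MIN) / (V_MAX - V_MIN) * 999.0);
    }

    /* Post-condition: the result always lies within 0..999. A violation
     * here would be a symptom of failure of this function. */
    assert(*value >= 0 && *value <= 999);

    return error_flag;
}
```

In this sketch an out-of-range voltage is reported through the error flag rather than by aborting, so a calling function can treat the flag as one of the failure modes of this derived component.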