diff --git a/papers/fmea_software_hardware/software_fmea.tex b/papers/fmea_software_hardware/software_fmea.tex index 71eb2d4..64afd94 100644 --- a/papers/fmea_software_hardware/software_fmea.tex +++ b/papers/fmea_software_hardware/software_fmea.tex @@ -65,32 +65,32 @@ failure mode of the component or sub-system}}} \newboolean{dag} \setboolean{dag}{true} % boolvar=true or false : draw analysis using directed acylic graphs -\setlength{\topmargin}{0in} -\setlength{\headheight}{0in} -\setlength{\headsep}{0in} -\setlength{\textheight}{22cm} -\setlength{\textwidth}{18cm} -%\setlength{\textheight}{24.35cm} -%\setlength{\textwidth}{20cm} -\setlength{\oddsidemargin}{0in} -\setlength{\evensidemargin}{0in} -\setlength{\parindent}{0.0in} -%\setlength{\parskip}{6pt} -% \setlength{\parskip}{1cm plus4mm minus3mm} -\setlength{\parskip}{0pt} -\setlength{\parsep}{0pt} -\setlength{\headsep}{0pt} -\setlength{\topskip}{0pt} -\setlength{\topmargin}{0pt} -\setlength{\topsep}{0pt} -\setlength{\partopsep}{0pt} -\setlength{\itemsep}{1pt} +% \setlength{\topmargin}{0in} +% \setlength{\headheight}{0in} +% \setlength{\headsep}{0in} +% \setlength{\textheight}{22cm} +% \setlength{\textwidth}{18cm} +% %\setlength{\textheight}{24.35cm} +% %\setlength{\textwidth}{20cm} +% \setlength{\oddsidemargin}{0in} +% \setlength{\evensidemargin}{0in} +% \setlength{\parindent}{0.0in} +% %\setlength{\parskip}{6pt} +% % \setlength{\parskip}{1cm plus4mm minus3mm} +% \setlength{\parskip}{0pt} +% \setlength{\parsep}{0pt} +% \setlength{\headsep}{0pt} +% \setlength{\topskip}{0pt} +% \setlength{\topmargin}{0pt} +% \setlength{\topsep}{0pt} +% \setlength{\partopsep}{0pt} +% \setlength{\itemsep}{1pt} % \renewcommand\subsection{\@startsection % {subsection}{2}{0mm}% % {-\baslineskip} % {0.5\baselineskip} % {\normalfont\normalsize\itshape}}% -\linespread{0.953} +\linespread{1.0} \begin{document} %\pagestyle{fancy} @@ -144,13 +144,16 @@ failure mode of the component or sub-system}}} This paper presents a worked example of 
FMEA applied to an integrated electronics/software system. % -FMEA methodologies trace from the 1940's and were designed to +%FMEA methodologies trace from the 1940's and were designed to +%model simple electro-mechanical systems. +% +FMEA methodologies were originally designed in the 1940s to model simple electro-mechanical systems. % Software generally sits on top of most modern safety critical control systems and defines its most important system wide behaviour and communications. % -Currently standards that demand FMEA for hardware(HFMEA) (e.g. EN298, EN61508), +Currently, standards that demand FMEA investigations for hardware (HFMEA) (e.g. EN298, EN61508) do not specify it for software, but instead specify good practise, review processes and language feature constraints. % @@ -161,12 +164,22 @@ traces component {\fms} to resultant system failures, software until recently, has been left in a non-analytical limbo of best practises and constraints. Software FMEA has been proposed -in several forms. SFMEA is always performed separately from HFMEA. +in several forms. +% +However, SFMEA is always performed separately from HFMEA. % This paper seeks to examine the effectiveness of current and proposed SFMEA -techniques, by using a analysing the chosen example, which is well known and understood -from years of field experience, and determining how well the HFMEA and SFMEA -analysis reports model the failure mode behaviour. +techniques, by analysing a simple hybrid hardware/software system, +which is in common use and has mature field experience. % +%analysing the chosen example, which is well known and understood +% +Because the chosen example is well understood, it is +%, this example is +useful +to compare the results from these FMEA methodologies with +the known failure mode behaviour. +%from years of field experience, and determining how well the HFMEA and SFMEA +%analysis reports model the failure mode behaviour. 
% % %If software and hardware integrated FMEA were possible, electro-mechanical-software hybrids could %be modelled, and so we could consider `complete' failure mode models. @@ -205,7 +218,7 @@ component failure modes, %and by reasoning, tracing their effects through a system and determining what system level failure modes could be caused. % -FMEA dates from the 1940s where simple electro-mechanical systems were the norm. +FMEA has its roots in the previous century where simple electro-mechanical systems were the norm. Modern control systems nearly always have a significant software/firmware element, and not being able to model software with current FMEA methodologies is a cause for criticism~\cite{safeware}[Ch.12]. @@ -260,19 +273,20 @@ base component {\fms}, and translating them into system level events/failures~\c In a complicated system, mapping a component failure mode to a system level failure will mean a long reasoning distance; that is to say the actions of the failed component will have to be traced through -several sub-systems, gauging its effects with other components. +several sub-systems, gauging its effects with and on other components. % With software at the higher levels of these sub-systems, we have yet another layer of complication. % -In order to integrate software, %in a meaningful way -we need to re-think the -FMEA concept of simply mapping a base component failure to a system level event. +%In order to integrate software, %in a meaningful way +%we need to re-think the +%FMEA concept of simply mapping a base component failure to a system level event. % -SFMEA regards the components to be the variables used by the programs. -These variables could become erroneously over-written, -by calculated incorrectly (due to a mistake by the programmer, or a fault in the micro-processor it is running on, or -by radiation causing bits to be erroneously altered. 
+SFMEA regards the variables used by the programs as the equivalent of hardware components~\cite{procsfmea}. +The failure modes of these variables are that they could be erroneously over-written, +calculated incorrectly (due to a mistake by the programmer, or a fault in the micro-processor the program is running on), or +altered by external influences such as +ionising radiation corrupting bits. \paragraph{A more-complete Failure Mode Model} @@ -287,8 +301,8 @@ by radiation causing bits to be erroneously altered. % In order to obtain a more complete failure mode model of a hybrid electronic/software system we need to analyse -the hardware, the software, the hardware the software runs on, -and the software hardware interface. +the hardware, the software, the hardware the software runs on (i.e. the software's medium), +and the software/hardware interface. % HFMEA is a well established technique and needs no further description in this paper. @@ -301,7 +315,7 @@ to supply a current signal to represent the value to be sent~\cite{aoe}[p.934]. Usually, $4mA$ represents a zero or starting value and $20mA$ represents the full scale, and this is referred to as {\ft} signalling. % -{\ft} has an electrical advantage as well because the current in a loop is constant~\cite{aoe}[p.20]. +{\ft} has an electrical advantage as well because the current in an electronic loop is constant~\cite{aoe}[p.20]. Thus resistance in the wires between the source and the receiving end is not an issue that can alter the accuracy of the signal. % @@ -332,25 +346,26 @@ The diagram in figure~\ref{fig:ftcontext} shows some equipment which is sending signal to a micro-controller system. The signal is locally driven over a load resistor, and then read into the micro-controller via an ADC and its multiplexer. 
-With the voltage detected at the ADC the multiplexer can read the intended quantitative +With the voltage detected at the ADC, via the multiplexer, we read the intended quantitative value from the external equipment. \subsection{Simple Software Example} Consider a software function that reads a {\ft} input, and returns a value between 0 and 999 (i.e. per mil $\permil$) -representing the current detected with an additional error indication flag. +representing the value signalled by the detected current, with an additional error indication flag to indicate the validity +of the value returned. % Let us assume the {\ft} detection is via a \ohms{220} resistor, and that we read a voltage from an ADC into the software. Let us define any value outside the 4mA to 20mA range as an error condition. % As a voltage, we use ohms law~\cite{aoe} to determine the voltage ranges: $V=IR$, $0.004A * \ohms{220} = 0.88V$ -and $0.020A * \ohms{220} = 4.4V$. +and $$0.020A * \ohms{220} = 4.4V \;.$$ % Our acceptable voltage range is therefore % -$(V \ge 0.88) \wedge (V \le 4.4) \; .$ +$$(V \ge 0.88) \wedge (V \le 4.4) \; .$$ This voltage range forms our input requirement. % @@ -479,7 +494,7 @@ calls {\em read\_ADC}, which in turn interacts with the hardware/electronics. This software is above the hardware in the conceptual call tree---from a programmatic perspective---%in software terms---the software is reading values from the `lower~level' electronics. % -FMEA is always a bottom-up process and so we must begin with this hardware. +%FMEA is always a bottom-up process and so we must begin with this hardware. % The hardware is simply a load resistor, connected across an ADC input pin on the micro-controller and ground. @@ -504,8 +519,8 @@ the multiplexer and the analogue to digital converter. 
\label{tbl:r420i} \begin{tabular}{|| l | c | l ||} \hline - \textbf{Failure} & \textbf{failure} & \textbf{System Failure} \\ - \textbf{Scenario} & \textbf{effect} & \\ \hline + \textbf{Failure} & \textbf{failure} & \textbf{System} \\ + \textbf{Scenario} & \textbf{effect} & \textbf{Failure} \\ \hline \hline $R$ & OPEN~\cite{en298}[Ann.A] & $LOW$ \\ & & $READING$ \\ \hline @@ -537,7 +552,7 @@ For software FMEA we take the variables used by the system, and examine what could happen if they are corrupted in various ways~\cite{procsfmea, embedsfmea}. From the function $read\_4\_20\_input()$ we have the variables $error\_flag$, $input\_volts$ and $value$: from the function $read\_ADC()$, $timeout$, $ADCMUX$, $ADCGO$, $dval$. -We must now determine putative system failure modes for these variables becoming corrupted. +We must now determine putative system failure modes for these variables becoming corrupted; this is performed in table~\ref{tbl:sfmea}. { @@ -547,8 +562,8 @@ We must now determine putative system failure modes for these variables becoming \label{tbl:sfmea} \begin{tabular}{|| l | c | l ||} \hline - \textbf{Failure} & \textbf{failure} & \textbf{System Failure} \\ - \textbf{Scenario} & \textbf{effect} & \\ \hline + \textbf{Failure} & \textbf{failure} & \textbf{System} \\ + \textbf{Scenario} & \textbf{effect} & \textbf{Failure} \\ \hline \hline $error\_flag$ & set FALSE & $VAL\_ERROR$ \\ & & \\ \hline @@ -592,7 +607,7 @@ We must now determine putative system failure modes for these variables becoming Microprocessors/Microcontrollers have sets of known failure modes, these include RAM, ROM EEPROM failure\footnote{EEPROM failure is not applicable for this example.} and -oscillator clock timing~\cite{sfmeaauto}. +oscillator clock timing. @@ -603,11 +618,11 @@ oscillator clock timing~\cite{sfmeaauto}. 
\label{tbl:sfmeaup} \begin{tabular}{|| l | c | l ||} \hline - \textbf{Failure} & \textbf{failure} & \textbf{System Failure} \\ - \textbf{Scenario} & \textbf{effect} & \\ \hline + \textbf{Failure} & \textbf{failure} & \textbf{System} \\ + \textbf{Scenario} & \textbf{effect} & \textbf{Failure} \\ \hline \hline - $RAM$ & variable corruption & All errors \\ - & & from table~\ref{tbl:sfmea} \\ \hline + $RAM$ & variable & All errors \\ + & corruption & from table~\ref{tbl:sfmea} \\ \hline $RAM$ & program flow & process \\ & & halts / crashes \\ \hline @@ -632,51 +647,93 @@ oscillator clock timing~\cite{sfmeaauto}. \end{table} } - -\section{Software FMEA - The software hardware interface} +\clearpage +\section{Software FMEA - The software/hardware interface} As FMEA is applied separately to software and hardware the interface between them is an undefined factor. -Ozarin~\cite{sfmeainterface} recommends that an FMEA report be written +Ozarin~\cite{sfmeainterface,procsfmea} recommends that an FMEA report be written to focus on the software/hardware interface. - +The software/hardware interface has +specific problems common to many systems and configurations +and these are described in~\cite{sfmeainterface}. +%An interface FMEA is performed in table~\ref{hwswinterface}. +% The hardware to software interface for the {\ft} example is handled by the 'C' function $read\_ADC()$. +~\cite{sfmeaauto}. +% +% An FMEA of the `software~medium' is given in table~\ref{tbl:sfmeaup}. +\paragraph{Timing and Synchronisation.} +The $ADCOUT$ register, where the raw ADC value is read +is an internal register used by the ADC and presented +as a readable memory location when the ADC +has finished updating it. +Reading it at the wrong time would +cause an invalid value to be read. +The synchronisation is performed by polling an $ADCGO$ +bit, a flag mapped to memory by which the ADC indicates that the data is ready. 
+\paragraph{Interrupt Contention.} +Were an interrupt to also attempt to read from the ADC, +the ADCMUX could be altered, causing the non-interrupt +routine to read from the wrong channel. + +\paragraph{Data Formatting.} +The ADC may use a big-endian or little-endian integer +format. It may also right or left justify the bits in its value. \section{Conclusion} % -The FMMD method has been demonstrated using an the industry standard {\ft} -input circuit and software. +This paper has picked a very simple example (the industry standard {\ft} +input circuit and software) to demonstrate +SFMEA and HFMEA methodologies used to describe a failure mode model. +%Even a modest system would be far too large to analyse in conference paper +%and this % -The {\dc} representing the {\ft} reader -shows that by taking a +%The {\dc} representing the {\ft} reader +%shows that by taking a %modular approach for FMEA, i.e. FMMD, we can integrate -four FMEA reports we can model the failure mode behaviour from -several perspectives, for -software and electrical systems% models. -% -With this analysis -we have stages along the `reasoning~path' linking the failure modes from the -electronics to those in the software. -Each {\fg} to {\dc} transition represents a -reasoning stage. -% +Our model is described by four FMEA reports, and these % we can model the failure mode behaviour from +model the system from several perspectives. % With traditional FMEA methods the reasoning~distance is large, because it stretches from the component failure mode to the top---or---system level failure. +% +With these four analysis reports +we do not have stages along the `reasoning~path' linking the failure modes from the +electronics to those in the software. +%Software is often written `defensively' but t +%Each {\fg} to {\dc} transition represents a +%reasoning stage. +% +% %For this reason applying traditional FMEA to software stretches %the reasoning distance even further. 
% - In fact these reasoning paths overlap ---or even by-pass one another--- - it is very difficult to gauge cause and effect. For instance - were the ADC to have a small value error, say adding - a small percentage onto the value, we would be unable to - detect this under the analysis conditions for this model, or - be able to pinpoint it. - +In fact many of these reasoning paths overlap---or even by-pass one another---and +it is very difficult to gauge cause and effect. +For instance, hardware failures are not analysed in the context of how they will +be handled (or missed) by the software. +% +System outputs commanded by the software may not take into account particular +hardware limitations. + +The interface FMEA does serve to provide a useful +checklist to ensure conventions used by the hardware +and software are not mismatched. + +However, while these techniques ensure that the software and hardware are +viewed and analysed from several perspectives, the result cannot be termed a homogeneous +failure mode model. +% For instance +% were the ADC to have a small value error, say adding +% a small percentage onto the value, we would be unable to +% detect this under the analysis conditions for this model, or +% be able to pinpoint it. +% {