night night edit, but have not eaten yet (GEDDIT)

2012-07-30 20:27:13 +01:00
commit ec8d020d4e
parent 5db87d00d5
1 changed files with 136 additions and 79 deletions
--- a/papers/fmea_software_hardware/software_fmea.tex
+++ b/papers/fmea_software_hardware/software_fmea.tex
@@ -65,32 +65,32 @@ failure mode of the component or sub-system}}}
 \newboolean{dag}
 \setboolean{dag}{true} % boolvar=true or false : draw analysis using directed acyclic graphs

-\setlength{\topmargin}{0in}
-\setlength{\headheight}{0in}
-\setlength{\headsep}{0in}
-\setlength{\textheight}{22cm}
-\setlength{\textwidth}{18cm}
-%\setlength{\textheight}{24.35cm}
-%\setlength{\textwidth}{20cm}
-\setlength{\oddsidemargin}{0in}
-\setlength{\evensidemargin}{0in}
-\setlength{\parindent}{0.0in}
-%\setlength{\parskip}{6pt}
-% \setlength{\parskip}{1cm plus4mm minus3mm}
-\setlength{\parskip}{0pt}
-\setlength{\parsep}{0pt}
-\setlength{\headsep}{0pt}
-\setlength{\topskip}{0pt}
-\setlength{\topmargin}{0pt}
-\setlength{\topsep}{0pt}
-\setlength{\partopsep}{0pt}
-\setlength{\itemsep}{1pt}
+% \setlength{\topmargin}{0in}
+% \setlength{\headheight}{0in}
+% \setlength{\headsep}{0in}
+% \setlength{\textheight}{22cm}
+% \setlength{\textwidth}{18cm}
+% %\setlength{\textheight}{24.35cm}
+% %\setlength{\textwidth}{20cm}
+% \setlength{\oddsidemargin}{0in}
+% \setlength{\evensidemargin}{0in}
+% \setlength{\parindent}{0.0in}
+% %\setlength{\parskip}{6pt}
+% % \setlength{\parskip}{1cm plus4mm minus3mm}
+% \setlength{\parskip}{0pt}
+% \setlength{\parsep}{0pt}
+% \setlength{\headsep}{0pt}
+% \setlength{\topskip}{0pt}
+% \setlength{\topmargin}{0pt}
+% \setlength{\topsep}{0pt}
+% \setlength{\partopsep}{0pt}
+% \setlength{\itemsep}{1pt}
 % \renewcommand\subsection{\@startsection
 % {subsection}{2}{0mm}%
 % {-\baselineskip}
 % {0.5\baselineskip}
 % {\normalfont\normalsize\itshape}}%
-\linespread{0.953}
+\linespread{1.0}

 \begin{document}
 %\pagestyle{fancy}
@@ -144,13 +144,16 @@ failure mode of the component or sub-system}}}
 This paper presents a worked example of FMEA applied to an 
 integrated electronics/software system.
 %
-FMEA methodologies trace from the 1940's and were designed to
+%FMEA methodologies trace from the  1940's and were designed to
+%model simple electro-mechanical systems.
+%
+FMEA methodologies were originally designed in the 1940s to
 model simple electro-mechanical systems.
 %
 Software generally sits on top of most modern safety critical control systems
 and defines their most important system-wide behaviour and communications.
 %
-Currently standards  that demand FMEA for hardware(HFMEA) (e.g. EN298, EN61508),
+Current standards that demand FMEA investigations for hardware (HFMEA), e.g. EN298 and EN61508,
 do not specify it for software, but instead specify good practice,
 review processes and language feature constraints.
 %  
@@ -161,12 +164,22 @@ traces component {\fms}
 to resultant system failures, software has, until recently, been left in a non-analytical
 limbo of best practices and constraints. 
 Software FMEA has been proposed 
-in several forms. SFMEA is always performed separately from HFMEA.
+in several forms.
+%
+However, SFMEA is always performed separately from HFMEA.
 %
 This paper seeks to examine the effectiveness of current and proposed SFMEA
-techniques, by using a analysing the chosen example, which is well known and understood
-from years of field experience, and determining how well the HFMEA and SFMEA 
-analysis reports model the failure mode behaviour.
+techniques, by analysing a simple hybrid hardware/software system,
+which is in common use and has mature field experience. %
+%analysing the chosen example, which is well known and understood
+%
+Because the chosen example is well understood, it is
+%, this example is 
+useful
+for comparing the results from these FMEA methodologies with 
+the known failure mode behaviour.
+%from years of field experience, and determining how well the HFMEA and SFMEA 
+%analysis reports model the failure mode behaviour.
 % %
 %If software and hardware integrated FMEA were possible, electro-mechanical-software hybrids could
 %be modelled, and so we could consider `complete' failure mode models. 
@@ -205,7 +218,7 @@ component failure modes, %and by reasoning,
 tracing their effects through a system
 and determining what system level failure modes could be caused.
 %
-FMEA dates from the 1940s where simple electro-mechanical systems were the norm.
+FMEA has its roots in the 1940s, when simple electro-mechanical systems were the norm.
 Modern control systems nearly always have a significant software/firmware element,
 and not being able to model software with current FMEA methodologies 
 is a cause for criticism~\cite{safeware}[Ch.12].
@@ -260,19 +273,20 @@ base component {\fms}, and translating them into system level events/failures~\c
 In a complicated system, mapping a component failure mode to a system level failure
 will mean a long reasoning distance; that is to say the actions of the 
 failed component will have to be traced through
-several sub-systems, gauging its effects with other components. 
+several sub-systems, gauging its effects with and on other components. 
 %
 With software at the higher levels of these sub-systems,
 we have yet another layer of complication.
 %
-In order to integrate software, %in a meaningful way 
-we need to re-think the 
-FMEA concept of simply mapping a base component failure to a system level event.
+%In order to integrate software, %in a meaningful way 
+%we need to re-think the 
+%FMEA concept of simply mapping a base component failure to a system level event.
 %
-SFMEA regards the components to be the variables used by the programs.
-These variables could become erroneously over-written, 
-by calculated incorrectly (due to a mistake by the programmer, or a fault in the micro-processor it is running on, or
-by radiation causing bits to be erroneously altered.
+SFMEA regards the variables used by the programs as the equivalent of hardware components~\cite{procsfmea}.
+The failure modes of these variables are that they could be erroneously over-written, 
+calculated incorrectly (due to a mistake by the programmer, or a fault in the micro-processor the program is running on), or
+corrupted by external influences such as
+ionising radiation causing bits to be erroneously altered.
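
As an illustration of the last of these failure modes, the following C sketch inverts one bit of a stored reading and then applies a simple plausibility check. The helper names `sketch_flip_bit` and `sketch_in_per_mil_range`, and the 16-bit width, are assumptions made for illustration only; they are not part of the software analysed in this paper.

```c
#include <stdint.h>

/* Hypothetical helper: model a single bit upset by inverting one bit
 * (0 = least significant) of a 16-bit variable. */
uint16_t sketch_flip_bit(uint16_t value, unsigned bit)
{
    return (uint16_t)(value ^ (uint16_t)(1u << (bit & 15u)));
}

/* Hypothetical plausibility check: a per-mil reading must lie in 0..999. */
int sketch_in_per_mil_range(uint16_t value)
{
    return value <= 999u;
}
```

Flipping bit 9 of a reading of 500 gives 1012, which the range check rejects. Flipping bit 0 gives 501, which is a plausible reading and would pass the check unnoticed.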


 \paragraph{A more-complete Failure Mode Model}
@@ -287,8 +301,8 @@ by radiation causing bits to be erroneously altered.
 % 
 In order to obtain a more complete failure mode model of 
 a hybrid electronic/software system we need to analyse 
-the hardware, the software, the hardware the software runs on,
-and the software hardware interface.
+the hardware, the software, the hardware the software runs on (i.e. the software's medium),
+and the software/hardware interface.
 %
 HFMEA is a well established technique and needs no further description in this paper.

@@ -301,7 +315,7 @@ to supply a current signal to represent the value to be sent~\cite{aoe}[p.934].
 Usually, $4mA$ represents a zero or starting value and $20mA$ represents the full scale,
 and this is referred to as {\ft} signalling.
 %
-{\ft} has an electrical advantage as well because the current in a loop is constant~\cite{aoe}[p.20].
+{\ft} also has an electrical advantage, because the current is the same at every point in a series loop~\cite{aoe}[p.20].
 Thus resistance in the wires between the source and the receiving end is not an issue
 that can alter the accuracy of the signal.
 %
@@ -332,25 +346,26 @@ The diagram in figure~\ref{fig:ftcontext} shows some equipment which is sending
 signal to a micro-controller system.
 The signal is locally driven over a load resistor, and then read into the micro-controller via
 an ADC and its multiplexer.
-With the voltage detected at the ADC the multiplexer can read the intended quantitative
+From the voltage detected at the ADC we can read the intended quantitative
 value from the external equipment.

 \subsection{Simple Software Example}


 Consider a software function that reads a {\ft} input, and returns a value between 0 and 999 (i.e. per mil $\permil$)
-representing the current detected with an additional error indication flag.
+representing the value signalled by the detected current, with an additional error flag to indicate the validity
+of the value returned.
 %
 Let us assume the {\ft} detection is via a \ohms{220} resistor, and that we read a voltage
 from an ADC into the software.
 Let us define any value outside the 4mA to 20mA range as an error condition.
 %
 Using Ohm's law~\cite{aoe}, $V=IR$, we determine the voltage range: $0.004A \times \ohms{220} = 0.88V$
-and $0.020A * \ohms{220} = 4.4V$. 
+and $$0.020A \times \ohms{220} = 4.4V \;.$$
 %
 Our acceptable voltage range is therefore 
 %
-$(V \ge  0.88) \wedge (V \le 4.4) \; .$ 
+$$(V \ge  0.88) \wedge (V \le 4.4) \; .$$ 

 This voltage range forms our input requirement.
 %
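
The input requirement and the per-mil scaling can be sketched in C as follows. This is a minimal illustration rather than the function analysed in this paper: the name `sketch_scale_4_20`, the use of integer millivolts, and the truncating integer division are all assumptions.

```c
#include <stdint.h>

#define SKETCH_MV_MIN 880   /* 4 mA across 220 ohms, in millivolts  */
#define SKETCH_MV_MAX 4400  /* 20 mA across 220 ohms, in millivolts */

/* Hypothetical helper: returns 0 and writes a per-mil value (0..999)
 * when the input requirement is met; returns 1 (error) otherwise and
 * leaves *per_mil untouched. */
int sketch_scale_4_20(int32_t millivolts, int16_t *per_mil)
{
    if (millivolts < SKETCH_MV_MIN || millivolts > SKETCH_MV_MAX) {
        return 1;
    }
    /* Linear map of 880..4400 mV onto 0..999, truncating toward zero. */
    *per_mil = (int16_t)(((millivolts - SKETCH_MV_MIN) * 999)
                         / (SKETCH_MV_MAX - SKETCH_MV_MIN));
    return 0;
}
```

With these assumptions, 880 mV maps to 0, 4400 mV maps to 999, and any voltage outside the range is reported as an error instead of being scaled.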
@@ -479,7 +494,7 @@ calls {\em read\_ADC}, which in turn interacts with the hardware/electronics.
 This software is above the hardware in the conceptual call tree---from a programmatic perspective---%in software terms---the
 software is reading values from the `lower~level' electronics.
 %
-FMEA is always a bottom-up process and so we must begin with this hardware.
+%FMEA is always a bottom-up process and so we must begin with this hardware.
 %
 The hardware is simply a load resistor, connected across an ADC input
 pin on the micro-controller and ground.
@@ -504,8 +519,8 @@ the multiplexer and the analogue to digital converter.
 \label{tbl:r420i}

 \begin{tabular}{|| l   | c |   l ||} \hline
- \textbf{Failure}   &  \textbf{failure}     & \textbf{System Failure}          \\ 
- \textbf{Scenario}  &  \textbf{effect}      &                                  \\ \hline 
+ \textbf{Failure}   &  \textbf{Failure}     & \textbf{System}          \\ 
+ \textbf{Scenario}  &  \textbf{Effect}      &     \textbf{Failure}                             \\ \hline 
               \hline
    $R$                      &  OPEN~\cite{en298}[Ann.A]     &      $LOW$       \\  
                                &           &    $READING$             \\ \hline 
@@ -537,7 +552,7 @@ For software FMEA we take the variables used by the system,
 and examine what could happen if they are corrupted in various ways~\cite{procsfmea, embedsfmea}.
 From the function  $read\_4\_20\_input()$ we have the variables $error\_flag$,
 $input\_volts$ and $value$, and from the function $read\_ADC()$ we have $timeout$, $ADCMUX$, $ADCGO$ and $dval$.
-We must now determine putative system failure modes for these variables becoming corrupted.
+We must now determine putative system failure modes for these variables becoming corrupted; this is performed in table~\ref{tbl:sfmea}.


 {
@@ -547,8 +562,8 @@ We must now determine putative system failure modes for these variables becoming
 \label{tbl:sfmea}

 \begin{tabular}{|| l   | c |   l ||} \hline
- \textbf{Failure}   &  \textbf{failure}     & \textbf{System Failure}          \\ 
- \textbf{Scenario}  &  \textbf{effect}      &                                  \\ \hline 
+ \textbf{Failure}   &  \textbf{Failure}     & \textbf{System}          \\ 
+ \textbf{Scenario}  &  \textbf{Effect}      &   \textbf{Failure}                               \\ \hline 
               \hline
    $error\_flag$               &   set FALSE        &  $VAL\_ERROR$    \\  
                                &                   &            \\ \hline 
@@ -592,7 +607,7 @@ We must now determine putative system failure modes for these variables becoming

 Microprocessors/Microcontrollers have sets of known failure modes; these include RAM, ROM,
 EEPROM failure\footnote{EEPROM failure is not applicable for this example.} and 
-oscillator clock timing~\cite{sfmeaauto}.
+oscillator clock timing.



@@ -603,11 +618,11 @@ oscillator clock timing~\cite{sfmeaauto}.
 \label{tbl:sfmeaup}

 \begin{tabular}{|| l   | c |   l ||} \hline
- \textbf{Failure}   &  \textbf{failure}     & \textbf{System Failure}          \\ 
- \textbf{Scenario}  &  \textbf{effect}      &                                  \\ \hline 
+ \textbf{Failure}   &  \textbf{Failure}     & \textbf{System}        \\ 
+ \textbf{Scenario}  &  \textbf{Effect}      & \textbf{Failure}       \\ \hline 
               \hline
-    $RAM$               &   variable corruption        & All errors   \\  
-                        &                              &   from table~\ref{tbl:sfmea}        \\ \hline 
+    $RAM$               &   variable        & All errors   \\  
+                        &   corruption      &   from table~\ref{tbl:sfmea}        \\ \hline 
                                
   $RAM$               &    program flow        &   process       \\  
                       &                        &    halts / crashes        \\ \hline 
@@ -632,51 +647,93 @@ oscillator clock timing~\cite{sfmeaauto}.
 \end{table} 
 }

-
-\section{Software FMEA - The software hardware interface}
+\clearpage
+\section{Software FMEA - The software/hardware interface}

 As FMEA is applied separately to software and hardware,
 the interface between them is an undefined factor.
-Ozarin~\cite{sfmeainterface} recommends that an FMEA report be written
+Ozarin~\cite{sfmeainterface,procsfmea}  recommends that an FMEA report be written
 to focus on the software/hardware interface.
-
+The software/hardware interface has
+specific problems common to many systems and configurations
+and these are described in~\cite{sfmeainterface}.
+%An interface FMEA is performed in table~\ref{hwswinterface}.
+%
 The hardware to software interface for the {\ft} example is handled
 by the `C' function $read\_ADC()$
+\cite{sfmeaauto}.
+%
+% An FMEA of the `software~medium' is given in table~\ref{tbl:sfmeaup}.
+\paragraph{Timing and Synchronisation.}
+The $ADCOUT$ register, from which the raw ADC value is read,
+is an internal register used by the ADC and presented
+as a readable memory location when the ADC
+has finished updating it.
+Reading it at the wrong time would 
+cause an invalid value to be read.
+The synchronisation is performed by polling an $ADCGO$
+bit, a flag mapped to memory by which  the ADC indicates that the data is ready.
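
A bounded polling loop of this kind might look like the following C sketch. The register structure, the polarity of the flag (assumed here to read 0 once the data is ready), and the name `sketch_read_when_ready` are all hypothetical; a real device header would define its own register layout and semantics.

```c
#include <stdint.h>

/* Hypothetical register model, for illustration only. */
typedef struct {
    volatile uint8_t  adcgo;   /* assumed: 1 = conversion running, 0 = data ready */
    volatile uint16_t adcout;  /* raw conversion result */
} sketch_adc_regs;

/* Poll the ADCGO flag at most max_polls times. Returns 0 and stores the
 * raw value once data is ready; returns 1 if the ADC never signals
 * completion, in which case *raw is left untouched. */
int sketch_read_when_ready(const sketch_adc_regs *adc,
                           uint16_t max_polls, uint16_t *raw)
{
    uint16_t polls;

    for (polls = 0; polls < max_polls; polls++) {
        if (adc->adcgo == 0u) {
            *raw = adc->adcout;  /* only read ADCOUT once the flag allows it */
            return 0;
        }
    }
    return 1;  /* timeout: ADCOUT was never read */
}
```

Bounding the loop means that an ADC which never completes is reported as a timeout, rather than halting the process or returning a stale value.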

+\paragraph{Interrupt Contention.}
+Were an interrupt routine also to attempt to read from the ADC,
+$ADCMUX$ could be altered, causing the non-interrupt
+routine to read from the wrong channel.
+
+\paragraph{Data Formatting.}
+The ADC may use a big-endian or little-endian integer
+format. It may also right or left justify the bits in its value.
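
The justification issue can be made concrete with a short C sketch. It assumes a 10-bit ADC that exposes its result in two 8-bit registers; both that layout and the helper name `sketch_adc_raw10` are hypothetical and serve only to illustrate the convention mismatch.

```c
#include <stdint.h>

/* Hypothetical helper for an assumed 10-bit ADC with two 8-bit result
 * registers. Returns the conversion as a right-aligned value, 0..1023. */
uint16_t sketch_adc_raw10(uint8_t high, uint8_t low, int left_justified)
{
    if (left_justified) {
        /* result occupies high[7:0] and low[7:6] */
        return (uint16_t)(((uint16_t)high << 2) | (uint16_t)(low >> 6));
    }
    /* result occupies high[1:0] and low[7:0] */
    return (uint16_t)((((uint16_t)high & 0x03u) << 8) | low);
}
```

Under these assumptions the left-justified register pair 0xA9, 0x40 encodes 0x2A5. Decoding the same pair with the right-justified rule yields 0x140, a plausible but wrong reading of the kind an interface FMEA is meant to expose.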



 \section{Conclusion}
 %
-The FMMD method has been demonstrated using an the industry standard {\ft}
-input circuit and software.
+This paper has used a very simple example (the industry standard {\ft}
+input circuit and software) to demonstrate
+the SFMEA and HFMEA methodologies used to describe a failure mode model.
+%Even a modest system would be far too large to analyse in conference paper
+%and this 
 %
-The {\dc} representing the {\ft} reader
-shows that by taking a 
+%The {\dc} representing the {\ft} reader
+%shows that by taking a 
 %modular approach for FMEA, i.e. FMMD, we can integrate
-four FMEA reports we can model the failure mode behaviour from
-several perspectives, for
-software and electrical systems% models.
-%
-With this analysis
-we have stages along the `reasoning~path' linking the failure modes from the 
-electronics to those in the software.
-Each {\fg} to {\dc} transition represents a 
-reasoning stage.
-%
+Our model is described by four FMEA reports, and these % we can model the failure mode behaviour from
+model the system from several perspectives.
 %
 With traditional FMEA methods the reasoning~distance is large, because
 it stretches from the component failure mode to the top (or system) level failure.
+%
+With these four  analysis reports
+we do not have stages along the `reasoning~path' linking the failure modes from the 
+electronics to those in the software.
+%Software is often written `defensively' but t 
+%Each {\fg} to {\dc} transition represents a 
+%reasoning stage.
+%
+%
 %For this reason applying traditional FMEA to software stretches
 %the reasoning distance even further.
 %
- In fact these reasoning paths overlap ---or even by-pass one another---
- it is very difficult to gauge cause and effect. For instance
- were the ADC to have a small value error, say adding
- a small percentage onto the value, we would be unable to
- detect this under the analysis conditions for this model, or
- be able to pinpoint it.
- 
+In fact, because many of these reasoning paths overlap---or even by-pass one another---
+it is very difficult to gauge cause and effect.
+For instance, hardware failures are not analysed in the context of how they will
+be handled (or missed) by the software. 
+%
+System outputs commanded from software may not take into account particular
+hardware limitations.
+
+The interface FMEA does serve to provide a useful
+checklist to ensure conventions used by the hardware
+and software are not mismatched.
+
+However, while these techniques ensure that the software and hardware are
+viewed and analysed from several perspectives, the result cannot be termed a homogeneous
+failure mode model.
+%  For instance
+%  were the ADC to have a small value error, say adding
+%  a small percentage onto the value, we would be unable to
+%  detect this under the analysis conditions for this model, or
+%  be able to pinpoint it.
+%  


 {