diff --git a/papers/JOURNAL_fmea_sw_hw/sw_hw_fmea.tex b/papers/JOURNAL_fmea_sw_hw/sw_hw_fmea.tex index 794bc9a..3e3893f 100644 --- a/papers/JOURNAL_fmea_sw_hw/sw_hw_fmea.tex +++ b/papers/JOURNAL_fmea_sw_hw/sw_hw_fmea.tex @@ -131,10 +131,12 @@ It is used both as a design tool (to determine weaknesses), and is a requirement FMEA has been successfully applied to mechanical, electrical and hybrid electro-mechanical systems. -This paper discusses the benefits and drawback of current -FMEA techniques and then proposes a modular FMEA methodology, Failure Mode Modular De-Composition (FMMD)~\cite{clark} -that has the advantages of traceable failure modes through the model -hierarchy, increases test effeciency and has +This paper discusses the benefits and drawbacks of current +FMEA techniques and then proposes a modular FMEA methodology, +Failure Mode Modular De-Composition (FMMD)~\cite{clark} +that has the advantages modularity, traceable failure modes throughout the model +hierarchy, increases test efficiency. +and has the ability to model integrated hardware and software systems. % Work on software FMEA (SFMEA) is beginning, but @@ -153,9 +155,10 @@ the ability to model integrated hardware and software systems. % reaches conclusions about the effectiveness and failure mode % coverage of the combined FMEA techniques. -This paper presents a simple worked example of FMMD applied to an -integrated electronics/software system, the industry standard -{\ft} signalling loop. +This paper presents a small worked example of FMMD applied to an +integrated electronics/software system. +%, the industry standard +%{\ft} signalling loop. % } % abstract @@ -166,12 +169,14 @@ integrated electronics/software system, the industry standard %FMEA methodologies trace from the 1940's and were designed to %model simple electro-mechanical systems. 
% -FMEA methodologies were originally in the 1940's designed to +FMEA methodologies were originally designed in the 1940's to model simple electro-mechanical systems. % -Because the early systems analysed by FMEA were relatively simple, -modern FMEA methodologies follow this paradigm and -trace component failure modes to system level failures. +Because those early systems were relatively simple, +%modern FMEA methodologies follow this paradigm and +they traced component failure modes directly to system level failures. +There were no concepts of modularity and no inclusion of +software elements. % %This paper explores the historical reasons why FMEA is performed in the way it is currently and %the new factors placing higher demands upon it. @@ -180,9 +185,11 @@ Software generally sits on top of most modern safety critical control systems and defines its most important system wide behaviour and communications. % Currently standards that demand FMEA investigations for hardware(HFMEA) (e.g. EN298, EN61508), -do not specify it for software, but instead specify good practise, +do not specify it for software but instead, specify good practise, review processes and language feature constraints. % +Failure modes from low level hardware elements are not traced through into the software models. +% This is a weakness. % Where HFMEA % scientifically @@ -194,19 +201,19 @@ in several forms. % However, SFMEA is always performed separately from HFMEA. % -This paper seeks to examine the effectiveness of current and proposed SFMEA -techniques, by analysing a simple hybrid hardware/software system, -which is in common use and has mature field experience. % -%analysing the chosen example, which is well known and understood -% -Because the chosen example is well understood it is -%, this example is -useful -to compare the results from these FMEA methodologies with -the known failure mode behaviour. 
-%from years of field experience, and determining how well the HFMEA and SFMEA -%analysis reports model the failure mode behaviour. +% This paper seeks to examine the effectiveness of current and proposed SFMEA +% techniques, by analysing a simple hybrid hardware/software system, +% which is in common use and has mature field experience. % +% %analysing the chosen example, which is well known and understood % % +% Because the chosen example is well understood it is +% %, this example is +% useful +% to compare the results from these FMEA methodologies with +% the known failure mode behaviour. +% %from years of field experience, and determining how well the HFMEA and SFMEA +% %analysis reports model the failure mode behaviour. +% % % %If software and hardware integrated FMEA were possible, electro-mechanical-software hybrids could %be modelled, and so we could consider `complete' failure mode models. % @@ -286,9 +293,13 @@ was designed. \subsection{Reasoning distance.} \label{reasoningdistance} %\fmmdglossRD +To perform FMEA, the effects of a component failure mode are examined +with respect to other components in the system; and from this behaviour +a system level failure or effect is determined. +% Reasoning distance, is the number of stages of logic and reasoning used in {\fm} analysis to map a failure cause to its potential outcomes; counted -by the number of {\fm} to component checks made. +by the number of {\fm} to other component checks made. % %The basic FMEA example in section~\ref{basicfmea} %considered one {\fm} against some of the components in the milli-volt reader. @@ -296,9 +307,9 @@ by the number of {\fm} to component checks made. To create an exhaustive FMEA report every known failure mode of every component within the system would have to be examined against all its other components. -% -`Reasoning~distance', for one {\fm}, is defined as the number of components checked against it -to determine its system level symptom(s). 
+% % +% `Reasoning~distance', for one {\fm}, is defined as the number of components checked against it +% to determine its system level symptom(s). % No current FMEA variant gives guidelines for the components that should be included to analyse a {\fm} in a system. @@ -306,7 +317,7 @@ be included to analyse a {\fm} in a system. Were a particular {\fm} examined against all the other components in a system this would give us the maximum reasoning distance. % -This is termed the exhaustive FMEA case for a single {\fm}. +This is termed the exhaustive FMEA (XFMEA) case for a single {\fm}. %does not % The exhaustive~reasoning~distance would be % the sum of the number of failure modes, against all other components @@ -358,8 +369,8 @@ For instance should the signal path be followed, with all components encountere \paragraph{Exhaustive Single Failure FMEA.} %\fmmdglossXFMEA % -To perform exhaustive FMEA (XFMEA), every possible interaction -of a failure mode with all other components in a system must be examined. +To XFMEA, every possible interaction +of a failure mode with all other components in a system would have to be examined. % Or in other words, all possible failure scenarios considered. % @@ -411,7 +422,8 @@ Current FMEA methodologies cannot consider---for the reason of state explosion-- % %\fmmdglossSTATEEX % -Because for practical reasons, XFMEA cannot be performed for anything other than a trivial system, +%Because for practical reasons, +In practical terms XFMEA cannot be performed for anything other than a trivial system, reliance is placed upon experts on the system under investigation to perform a meaningful analysis. % @@ -457,19 +469,25 @@ Typical examples include voltage regulators, op-amps, micro-controllers~\cite{p protocol handlers~\cite{mcp2515}. 
To build any of these component from scratch would be very expensive and time consuming, but these IC `components' have very high internal transistor counts, and each have their own unique failure mode behaviour. -Thus modern electronics has already jumped the gun of the base component failure mode mapped to +Thus modern electronics has already become too large in scope to sensibly implement the base component failure mode directly mapped to a system failure paradigm. -The automotive industry, because of mass production, must make products that are very safe but - financial pressure keeps their products affordable. +The automotive industry, because of mass production, must make products that have high safety integrity %that are very safe but +% financial pressure keeps their products +but must also be affordable. % This leads to specialist firms producing modules, such as automatic braking systems, -that are assembled to make a automobile. +that are bought in and assembled to make a automobile. % Performing failure analysis using the basic component single failure modes to system failure mapping, would be very difficult: this would require expert knowledge of the design behaviour and component types used in each module. % +Because modern systems have become more complex and now include software elements modularity +of some form, has become necessary to break down the state explosion problems associated with FMEA. +% +Some modular techniques are starting to be used, and are described below. + \paragraph{Automotive SIL (ASIL) --- modularisation of FMEDA} % The EN61508 variant for automotive use, as defined in standard ISO~26262, is known as Automotive SIL (ASIL)~\cite{Kafka20122}. @@ -501,9 +519,10 @@ have defined mechanisms for ensuring that all failure modes from a module must be considered in the analysis of the module(s) that incorporate it. 
-Because FMEA is a bottom up technique, applying a top down analysis (as in FMECAs indenture levels) -cannot guarantee to consider all component failure modes in the correct context. -% +\paragraph{Top Down or Bottom-up?} +% Because FMEA is a bottom up technique, applying a top down analysis (as in FMECAs indenture levels) +% cannot guarantee to consider all component failure modes in the correct context. +% % A top down approach (such as FTA) can miss~\cite{faa}[Ch.~9] individual failure modes of components, especially where there are non-obvious or unexpected top-level failures. % @@ -589,12 +608,12 @@ we have yet another layer of complication. %we need to re-think the %FMEA concept of simply mapping a base component failure to a system level event. % -SFMEA regards, in place of hardware components, the variables used by the programs to be their equivalent~\cite{procsfmea}. -The failure modes of these variables, are that they could become erroneously over-written, -calculated incorrectly (due to a mistake by the programmer, or a fault in the micro-processor on which it is running), or -external influences such as -ionising radiation causing bits to be erroneously altered. - +% SFMEA regards, in place of hardware components, the variables used by the programs to be their equivalent~\cite{procsfmea}. +% The failure modes of these variables, are that they could become erroneously over-written, +% calculated incorrectly (due to a mistake by the programmer, or a fault in the micro-processor on which it is running), or +% external influences such as +% ionising radiation causing bits to be erroneously altered. +It is desirable to trace failure modes effects through the hardware and software interfaces. 
@@ -648,7 +667,8 @@ in an improved FMEA methodology, \label{fmmdproc} % %% One line -The idea is to modularise from the bottom-up, by choosing groups of components that +The basic concept of FMMD is to modularise FMEA from the bottom-up: b +y choosing groups of components that work together to perform a given function: the failure modes of the components are considered, and a failure mode behaviour for the group determined: this group can now be used as a component in its own right with a set of failure modes. @@ -783,356 +803,356 @@ applying FMMD means deciding on the members for {\fgs} and the subsequent hierar % % \section{Example for analysis} % : How can we apply FMEA} -% -For the purpose of example, a simple common safety critical industrial circuit has been chosen -that is nearly always used in conjunction with a programmatic element. -A common method for delivering a quantitative value in analogue electronics is -to supply a current signal to represent the value to be sent~\cite{aoe}[p.934]. -Usually, $4mA$ represents a zero or starting value and $20mA$ represents the full scale, -and this is referred to as {\ft} signalling. -% -{\ft} has an electrical advantage as well because the current in an electronic loop is constant~\cite{aoe}[p.20]. -Thus resistance in the wires between the source and the receiving end is not an issue -that can alter the accuracy of the signal. -% -This circuit has many advantages for safety. If the signal becomes disconnected -it reads an out of range $0mA$ at the receiving end. This is outside the {\ft} range, -and is therefore easy to detect as an error rather than an incorrect value. -% -Should the driving electronics go wrong at the source end, it will usually -supply far too little or far too much current, making an error condition easy to detect. 
-% -At the receiving end, one needs a resistor to convert the -current signal into a voltage that we can read with an ADC.% -%we only require one simple component to convert the - - -%BLOCK DIAGRAM HERE WITH FT CIRCUIT LOOP - -\begin{figure}[h] - \centering - \includegraphics[width=230pt]{./ftcontext.png} - % ftcontext.png: 767x385 pixel, 72dpi, 27.06x13.58 cm, bb=0 0 767 385 - \caption{Context Diagram for {\ft} loop} - \label{fig:ftcontext} -\end{figure} - - -The diagram in figure~\ref{fig:ftcontext} shows some equipment which is sending a {\ft} -signal to a micro-controller system. -The signal is locally driven over a load resistor, and then read into the micro-controller via -an ADC and its multiplexer. -With the voltage detected at the ADC the multiplexer we read the intended quantitative -value from the external equipment. +% % +% For the purpose of example, a simple common safety critical industrial circuit has been chosen +% that is nearly always used in conjunction with a programmatic element. +% A common method for delivering a quantitative value in analogue electronics is +% to supply a current signal to represent the value to be sent~\cite{aoe}[p.934]. +% Usually, $4mA$ represents a zero or starting value and $20mA$ represents the full scale, +% and this is referred to as {\ft} signalling. +% % +% {\ft} has an electrical advantage as well because the current in an electronic loop is constant~\cite{aoe}[p.20]. +% Thus resistance in the wires between the source and the receiving end is not an issue +% that can alter the accuracy of the signal. +% % +% This circuit has many advantages for safety. If the signal becomes disconnected +% it reads an out of range $0mA$ at the receiving end. This is outside the {\ft} range, +% and is therefore easy to detect as an error rather than an incorrect value. 
+% % +% Should the driving electronics go wrong at the source end, it will usually +% supply far too little or far too much current, making an error condition easy to detect. +% % +% At the receiving end, one needs a resistor to convert the +% current signal into a voltage that we can read with an ADC.% +% %we only require one simple component to convert the +% +% +% %BLOCK DIAGRAM HERE WITH FT CIRCUIT LOOP +% +% \begin{figure}[h] +% \centering +% \includegraphics[width=230pt]{./ftcontext.png} +% % ftcontext.png: 767x385 pixel, 72dpi, 27.06x13.58 cm, bb=0 0 767 385 +% \caption{Context Diagram for {\ft} loop} +% \label{fig:ftcontext} +% \end{figure} +% +% +% The diagram in figure~\ref{fig:ftcontext} shows some equipment which is sending a {\ft} +% signal to a micro-controller system. +% The signal is locally driven over a load resistor, and then read into the micro-controller via +% an ADC and its multiplexer. +% With the voltage detected at the ADC the multiplexer we read the intended quantitative +% value from the external equipment. \subsection{Simple Software Example} -Consider a software function that reads a {\ft} input, and returns a value between 0 and 999 (i.e. per mil $\permil$) -representing the value intended by the current detected, with an additional error indication flag to indicate the validity -of the value returned. -% -This example straddles the hardware software interface, but is not overly complex, which allows -the FMEA seamless failure modelling of FMMD to be demonstrated. -% -A complete -PID based temperature controller is modelled in~\cite{clark}[6.3]. -% -Let us assume the {\ft} detection is via a \ohms{220} resistor, and that we read a voltage -from an ADC into the software. -Let us define any value outside the 4mA to 20mA range as an error condition. 
-% -As a voltage, we use ohms law~\cite{aoe} to determine the voltage ranges: $V=IR$, $$0.004A * \ohms{220} = 0.88V $$ -and $$0.020A * \ohms{220} = 4.4V \;.$$ -% -Our acceptable voltage range is therefore -% -$$(V \ge 0.88) \wedge (V \le 4.4) \; .$$ - -This voltage range forms our input requirement. -% -We can now examine a software function that performs a conversion from the voltage read to -a per~mil representation of the {\ft} input current. -% -For the purpose of example the `C' programming language~\cite{DBLP:books/ph/KernighanR88} is -used\footnote{ C coding examples use the Misra~\cite{misra} and SIL-3 recommended language constraints~\cite{en61508}.}. -We initially assume a function \textbf{read\_ADC} which returns a floating point %double precision -value representing the voltage read (see code sample in figure~\ref{fig:code_read_4_20_input}). - - -%%{\vbox{ -\begin{figure}[h+] - -\footnotesize -\begin{verbatim} -/***********************************************/ -/* read_4_20_input() */ -/***********************************************/ -/* Software function to read 4mA to 20mA input */ -/* returns a value from 0-999 proportional */ -/* to the current input. 
*/ -/***********************************************/ -int read_4_20_input ( int * value ) { - double input_volts; - int error_flag; - - /* set ADC MUX with input to read from */ - input_volts = read_ADC(INPUT_4_20_mA); - - if ( input_volts < 0.88 || input_volts > 4.4 ) { - error_flag = 1; /* Error flag set to TRUE */ - } - else { - *value = (input_volts - 0.88) * ( 4.4 - 0.88 ) * 999.0; - error_flag = 0; /* indicate current input in range */ - } - /* ensure: value is proportional (0-999) to the - 4 to 20mA input */ - return error_flag; -} -\end{verbatim} -%} -%} - -\caption{Software Function: \textbf{read\_4\_20\_input}} -\label{fig:code_read_4_20_input} -%\label{fig:420i} -\end{figure} - -We now look at the function called by \textbf{read\_4\_20\_input}, \textbf{read\_ADC}, which returns a -voltage for a given ADC channel. -% -This function -deals directly with the hardware in the micro-controller on which we are running the software. -% -Its job is to select the correct channel (ADC multiplexer) and then to initiate a -conversion by setting an ADC 'go' bit (see code sample in figure~\ref{fig:code_read_ADC}). -% -It takes the raw ADC reading and converts it into a -floating point\footnote{the type `double' or `double precision' is a -standard C language floating point type~\cite{DBLP:books/ph/KernighanR88}.} -voltage value. - - +% Consider a software function that reads a {\ft} input, and returns a value between 0 and 999 (i.e. per mil $\permil$) +% representing the value intended by the current detected, with an additional error indication flag to indicate the validity +% of the value returned. +% % +% This example straddles the hardware software interface, but is not overly complex, which allows +% the FMEA seamless failure modelling of FMMD to be demonstrated. +% % +% A complete +% PID based temperature controller is modelled in~\cite{clark}[6.3]. 
+% % +% Let us assume the {\ft} detection is via a \ohms{220} resistor, and that we read a voltage +% from an ADC into the software. +% Let us define any value outside the 4mA to 20mA range as an error condition. +% % +% As a voltage, we use ohms law~\cite{aoe} to determine the voltage ranges: $V=IR$, $$0.004A * \ohms{220} = 0.88V $$ +% and $$0.020A * \ohms{220} = 4.4V \;.$$ +% % +% Our acceptable voltage range is therefore +% % +% $$(V \ge 0.88) \wedge (V \le 4.4) \; .$$ +% +% This voltage range forms our input requirement. +% % +% We can now examine a software function that performs a conversion from the voltage read to +% a per~mil representation of the {\ft} input current. +% % +% For the purpose of example the `C' programming language~\cite{DBLP:books/ph/KernighanR88} is +% used\footnote{ C coding examples use the Misra~\cite{misra} and SIL-3 recommended language constraints~\cite{en61508}.}. +% We initially assume a function \textbf{read\_ADC} which returns a floating point %double precision +% value representing the voltage read (see code sample in figure~\ref{fig:code_read_4_20_input}). +% +% +% %%{\vbox{ +% \begin{figure}[h+] +% +% \footnotesize +% \begin{verbatim} +% /***********************************************/ +% /* read_4_20_input() */ +% /***********************************************/ +% /* Software function to read 4mA to 20mA input */ +% /* returns a value from 0-999 proportional */ +% /* to the current input. 
*/ +% /***********************************************/ +% int read_4_20_input ( int * value ) { +% double input_volts; +% int error_flag; +% +% /* set ADC MUX with input to read from */ +% input_volts = read_ADC(INPUT_4_20_mA); +% +% if ( input_volts < 0.88 || input_volts > 4.4 ) { +% error_flag = 1; /* Error flag set to TRUE */ +% } +% else { +% *value = (input_volts - 0.88) * ( 4.4 - 0.88 ) * 999.0; +% error_flag = 0; /* indicate current input in range */ +% } +% /* ensure: value is proportional (0-999) to the +% 4 to 20mA input */ +% return error_flag; +% } +% \end{verbatim} +% %} +% %} +% \caption{Software Function: \textbf{read\_4\_20\_input}} +% \label{fig:code_read_4_20_input} +% %\label{fig:420i} +% \end{figure} +% +% We now look at the function called by \textbf{read\_4\_20\_input}, \textbf{read\_ADC}, which returns a +% voltage for a given ADC channel. +% % +% This function +% deals directly with the hardware in the micro-controller on which we are running the software. +% % +% Its job is to select the correct channel (ADC multiplexer) and then to initiate a +% conversion by setting an ADC 'go' bit (see code sample in figure~\ref{fig:code_read_ADC}). +% % +% It takes the raw ADC reading and converts it into a +% floating point\footnote{the type `double' or `double precision' is a +% standard C language floating point type~\cite{DBLP:books/ph/KernighanR88}.} +% voltage value. 
+% +% +% %{\vbox{ -\begin{figure}[h+] - -\footnotesize -\begin{verbatim} -/***********************************************/ -/* read_ADC() */ -/***********************************************/ -/* Software function to read voltage from a */ -/* specified ADC MUX channel */ -/* Assume 10 ADC MUX channels 0..9 */ -/* ADC_CHAN_RANGE = 9 */ -/* Assume ADC is 12 bit and ADCRANGE = 4096 */ -/* returns voltage read as double precision */ -/***********************************************/ -double read_ADC( int channel ) { - int timeout = 0; - - /* return out of range result */ - /* if invalid channel selected */ - if ( channel > ADC_CHAN_RANGE ) - return -2.0; - /* set the multiplexer to the desired channel */ - ADCMUX = channel; - ADCGO = 1; /* initiate ADC conversion hardware */ - /* wait for ADC conversion with timeout */ - while ( ADCGO == 1 || timeout < 100 ) - timeout++; - if ( timeout < 100 ) - dval = (double) ADCOUT * 5.0 / ADCRANGE; - else - dval = -1.0; /* indicate invalid reading */ - /* return voltage as a floating point value */ - /* ensure: value is voltage input to within 0.1% */ - return dval; -} -\end{verbatim} -\caption{Software Function: \textbf{read\_ADC}} -\label{fig:code_read_ADC} -\end{figure} -%} -%} - - -We now have a very simple software structure, a call tree, where {\em read\_4\_20\_input} -calls {\em read\_ADC}, which in turn interacts with the hardware/electronics. -%shown in figure~\ref{fig:ct1}. 
-% -% \begin{figure}[h] -% \centering -% \includegraphics[width=56pt]{./ct1.png} -% % ct1.png: 151x224 pixel, 72dpi, 5.33x7.90 cm, bb=0 0 151 224 -% \caption{Call tree for software example} -% \label{fig:ct1} +% \begin{figure}[h+] +% +% \footnotesize +% \begin{verbatim} +% /***********************************************/ +% /* read_ADC() */ +% /***********************************************/ +% /* Software function to read voltage from a */ +% /* specified ADC MUX channel */ +% /* Assume 10 ADC MUX channels 0..9 */ +% /* ADC_CHAN_RANGE = 9 */ +% /* Assume ADC is 12 bit and ADCRANGE = 4096 */ +% /* returns voltage read as double precision */ +% /***********************************************/ +% double read_ADC( int channel ) { +% int timeout = 0; +% +% /* return out of range result */ +% /* if invalid channel selected */ +% if ( channel > ADC_CHAN_RANGE ) +% return -2.0; +% /* set the multiplexer to the desired channel */ +% ADCMUX = channel; +% ADCGO = 1; /* initiate ADC conversion hardware */ +% /* wait for ADC conversion with timeout */ +% while ( ADCGO == 1 || timeout < 100 ) +% timeout++; +% if ( timeout < 100 ) +% dval = (double) ADCOUT * 5.0 / ADCRANGE; +% else +% dval = -1.0; /* indicate invalid reading */ +% /* return voltage as a floating point value */ +% /* ensure: value is voltage input to within 0.1% */ +% return dval; +% } +% \end{verbatim} +% \caption{Software Function: \textbf{read\_ADC}} +% \label{fig:code_read_ADC} % \end{figure} -% -This software is above the hardware in the conceptual call tree---from a programmatic perspective---%in software terms---the -software is reading values from the `lower~level' electronics. -% -%FMEA is always a bottom-up process and so we must begin with this hardware. -% -The hardware is simply a load resistor, connected across an ADC input -pin on the micro-controller and ground. -% -We can identify the resistor and the ADC module of the micro-controller as -the base components in this design. 
-% -We now apply FMMD starting with the hardware. +% %} +% %} - -\section{Failure Mode effects Analysis} - -Four emerging and current techniques are now used to -apply FMEA to the hardware, the software, the software medium and the software hardware insterface. - -\subsection{Hardware FMEA} - -The hardware FMEA requires that for each component we consider all failure modes -and the putative effect those failure modes would have on the system. -The electronic components in our {\ft} system are the load resistor, -the multiplexer and the analogue to digital converter. - -{ -\tiny -\begin{table}[h+] -\caption{Hardware FMEA {\ft}} % title of Table -\label{tbl:r420i} - -\begin{tabular}{|| l | c | l ||} \hline - \textbf{Failure} & \textbf{failure} & \textbf{System} \\ - \textbf{Scenario} & \textbf{effect} & \textbf{Failure} \\ \hline - \hline - $R$ & OPEN~\cite{en298}[Ann.A] & $LOW$ \\ - & & $READING$ \\ \hline - - $R$ & SHORT~\cite{en298}[Ann.A] & $HIGH$ \\ - & & $READING$ \\ \hline - - - - $MUX$ & read wrong & $VAL\_ERROR$ \\ - & input ~\cite{fmd91}[3-102] & \\ \hline - - - - $ADC$ & ADC output & $VAL\_ERROR$ \\ - & erronous ~\cite{fmd91}[3-109] & \\ \hline -\hline -\end{tabular} -\end{table} -} - -The last two failures both lead to the system failure of $VAL\_ERROR$ . -They could lead to low or high reading as well, but we would only be able to determine this -from knowledge of the software systems criteria for these. -%\clearpage -\subsection{Software FMEA - variables in place of components} - -For software FMEA, we take the variables used by the system, -and examine what could happen if they are corrupted in various ways~\cite{procsfmea, embedsfmea}. -From the function $read\_4\_20\_input()$ we have the variables $error\_flag$, -$input\_volts$ and $value$: from the function $read\_ADC()$, $timeout$, $ADCMUX$, $ADCGO$, $dval$. -We must now determine putative system failure modes for these variables becoming corrupted, this is performed in table~\ref{tbl:sfmea}. 
- - -{ -\tiny -\begin{table}[h+] -\caption{SFMEA {\ft}} % title of Table -\label{tbl:sfmea} - -\begin{tabular}{|| l | c | l ||} \hline - \textbf{Failure} & \textbf{failure} & \textbf{System} \\ - \textbf{Scenario} & \textbf{effect} & \textbf{Failure} \\ \hline - \hline - $error\_flag$ & set FALSE & $VAL\_ERROR$ \\ - & & \\ \hline - - $error\_flag$ & set TRUE & invalid \\ - & & error flag \\ \hline - - $input\_volts$ & corrupted & $VAL\_ERROR$ \\ - & & \\ \hline - - - $value $ & corrupted & $VAL\_ERROR$ \\ - & & \\ \hline - - - - $timeout $ & corrupted & $VAL\_ERROR$ \\ - & & \\ \hline - - - $ADCMUX $ & corrupted & $VAL\_ERROR$ \\ - & & \\ \hline - - - - $ADCGO $ & corrupted & $VAL\_ERROR$ \\ - & & \\ \hline - - $dval $ & corrupted & $VAL\_ERROR$ \\ - & & \\ \hline - - - - -\hline -\end{tabular} -\end{table} -} -%\clearpage -\subsection{Software FMEA - failure modes of the medium ($\mu P$) of the software} - -Microprocessors/Microcontrollers have sets of known failure modes, these include RAM, ROM -EEPROM failure\footnote{EEPROM failure is not applicable for this example.} and -oscillator clock timing - - - -{ -\tiny -\begin{table}[h+] -\caption{SFMEA {\ft}} % title of Table -\label{tbl:sfmeaup} - -\begin{tabular}{|| l | c | l ||} \hline - \textbf{Failure} & \textbf{failure} & \textbf{System} \\ - \textbf{Scenario} & \textbf{effect} & \textbf{Failure} \\ \hline - \hline - $RAM$ & variable & All errors \\ - & corruption & from table~\ref{tbl:sfmea} \\ \hline - - $RAM$ & program flow & process \\ - & & halts / crashes \\ \hline - - $OSC$ & stopped & process \\ - & & halts \\ \hline - - $OSC$ & too & ADC \\ - & fast & value errors \\ \hline - - $OSC$ & too & ADC \\ - & slow & value errors \\ \hline - - $ROM$ & program & All errors \\ - & corruption & from table~\ref{tbl:sfmea} \\ \hline - - $ROM$ & constant & All errors \\ - & /data corruption & from table~\ref{tbl:sfmea} \\ \hline - -\hline -\end{tabular} -\end{table} -} +% +% We now have a very simple software 
structure, a call tree, where {\em read\_4\_20\_input} +% calls {\em read\_ADC}, which in turn interacts with the hardware/electronics. +% %shown in figure~\ref{fig:ct1}. +% % +% % \begin{figure}[h] +% % \centering +% % \includegraphics[width=56pt]{./ct1.png} +% % % ct1.png: 151x224 pixel, 72dpi, 5.33x7.90 cm, bb=0 0 151 224 +% % \caption{Call tree for software example} +% % \label{fig:ct1} +% % \end{figure} +% % +% This software is above the hardware in the conceptual call tree---from a programmatic perspective---%in software terms---the +% software is reading values from the `lower~level' electronics. +% % +% %FMEA is always a bottom-up process and so we must begin with this hardware. +% % +% The hardware is simply a load resistor, connected across an ADC input +% pin on the micro-controller and ground. +% % +% We can identify the resistor and the ADC module of the micro-controller as +% the base components in this design. +% % +% We now apply FMMD starting with the hardware. +% +% +% \section{Failure Mode effects Analysis} +% +% Four emerging and current techniques are now used to +% apply FMEA to the hardware, the software, the software medium and the software hardware insterface. +% +% \subsection{Hardware FMEA} +% +% The hardware FMEA requires that for each component we consider all failure modes +% and the putative effect those failure modes would have on the system. +% The electronic components in our {\ft} system are the load resistor, +% the multiplexer and the analogue to digital converter. 
+% +% { +% \tiny +% \begin{table}[h+] +% \caption{Hardware FMEA {\ft}} % title of Table +% \label{tbl:r420i} +% +% \begin{tabular}{|| l | c | l ||} \hline +% \textbf{Failure} & \textbf{failure} & \textbf{System} \\ +% \textbf{Scenario} & \textbf{effect} & \textbf{Failure} \\ \hline +% \hline +% $R$ & OPEN~\cite{en298}[Ann.A] & $LOW$ \\ +% & & $READING$ \\ \hline +% +% $R$ & SHORT~\cite{en298}[Ann.A] & $HIGH$ \\ +% & & $READING$ \\ \hline +% +% +% +% $MUX$ & read wrong & $VAL\_ERROR$ \\ +% & input ~\cite{fmd91}[3-102] & \\ \hline +% +% +% +% $ADC$ & ADC output & $VAL\_ERROR$ \\ +% & erronous ~\cite{fmd91}[3-109] & \\ \hline +% \hline +% \end{tabular} +% \end{table} +% } +% +% The last two failures both lead to the system failure of $VAL\_ERROR$ . +% They could lead to low or high reading as well, but we would only be able to determine this +% from knowledge of the software systems criteria for these. +% %\clearpage +% \subsection{Software FMEA - variables in place of components} +% +% For software FMEA, we take the variables used by the system, +% and examine what could happen if they are corrupted in various ways~\cite{procsfmea, embedsfmea}. +% From the function $read\_4\_20\_input()$ we have the variables $error\_flag$, +% $input\_volts$ and $value$: from the function $read\_ADC()$, $timeout$, $ADCMUX$, $ADCGO$, $dval$. +% We must now determine putative system failure modes for these variables becoming corrupted, this is performed in table~\ref{tbl:sfmea}. 
+% +% +% { +% \tiny +% \begin{table}[h+] +% \caption{SFMEA {\ft}} % title of Table +% \label{tbl:sfmea} +% +% \begin{tabular}{|| l | c | l ||} \hline +% \textbf{Failure} & \textbf{failure} & \textbf{System} \\ +% \textbf{Scenario} & \textbf{effect} & \textbf{Failure} \\ \hline +% \hline +% $error\_flag$ & set FALSE & $VAL\_ERROR$ \\ +% & & \\ \hline +% +% $error\_flag$ & set TRUE & invalid \\ +% & & error flag \\ \hline +% +% $input\_volts$ & corrupted & $VAL\_ERROR$ \\ +% & & \\ \hline +% +% +% $value $ & corrupted & $VAL\_ERROR$ \\ +% & & \\ \hline +% +% +% +% $timeout $ & corrupted & $VAL\_ERROR$ \\ +% & & \\ \hline +% +% +% $ADCMUX $ & corrupted & $VAL\_ERROR$ \\ +% & & \\ \hline +% +% +% +% $ADCGO $ & corrupted & $VAL\_ERROR$ \\ +% & & \\ \hline +% +% $dval $ & corrupted & $VAL\_ERROR$ \\ +% & & \\ \hline +% +% +% +% +% \hline +% \end{tabular} +% \end{table} +% } +% %\clearpage +% \subsection{Software FMEA - failure modes of the medium ($\mu P$) of the software} +% +% Microprocessors/Microcontrollers have sets of known failure modes; these include RAM, ROM and +% EEPROM failure\footnote{EEPROM failure is not applicable for this example.}, and +% oscillator clock timing faults. +% +% +% +% { +% \tiny +% \begin{table}[h+] +% \caption{SFMEA {\ft}} % title of Table +% \label{tbl:sfmeaup} +% +% \begin{tabular}{|| l | c | l ||} \hline +% \textbf{Failure} & \textbf{failure} & \textbf{System} \\ +% \textbf{Scenario} & \textbf{effect} & \textbf{Failure} \\ \hline +% \hline +% $RAM$ & variable & All errors \\ +% & corruption & from table~\ref{tbl:sfmea} \\ \hline +% +% $RAM$ & program flow & process \\ +% & & halts / crashes \\ \hline +% +% $OSC$ & stopped & process \\ +% & & halts \\ \hline +% +% $OSC$ & too & ADC \\ +% & fast & value errors \\ \hline +% +% $OSC$ & too & ADC \\ +% & slow & value errors \\ \hline +% +% $ROM$ & program & All errors \\ +% & corruption & from table~\ref{tbl:sfmea} \\ \hline +% +% $ROM$ & constant & All errors \\ +% & /data corruption & from 
table~\ref{tbl:sfmea} \\ \hline +% +% \hline +% \end{tabular} +% \end{table} +%} %\clearpage \subsection{Software FMEA - The software/hardware interface} @@ -1174,58 +1194,59 @@ format. It may also right or left justify the bits in its value. \section{Conclusion} % -This paper has picked a very simple example (the industry standard {\ft} -input circuit and software) to demonstrate -SFMEA and HFMEA methodologies used to describe a failure mode model. -%Even a modest system would be far too large to analyse in conference paper -%and this -% -%The {\dc} representing the {\ft} reader -%shows that by taking a -%modular approach for FMEA, i.e. FMMD, we can integrate -Our model is described by four FMEA reports; and these % we can model the failure mode behaviour from -model the system from several failure mode perspectives. -% -With traditional FMEA methods the reasoning~distance is large, because -it stretches from the component failure mode to the top---or---system level failure. -% -With these four analysis reports -we do not have stages along the `reasoning~path' linking the failure modes from the -electronics to those in the software. -%Software is often written `defensively' but t -%Each {\fg} to {\dc} transition represents a -%reasoning stage. -% -% -%For this reason applying traditional FMEA to software stretches -%the reasoning distance even further. -% -In fact many these reasoning paths overlap---or even by-pass one another--- -it is very difficult to gauge cause and effect. -For instance, hardware failures are not analysed in the context of how they will -be handled (or missed) by the software. -% -System outputs commanded from software may not take into account particular -hardware limitations etc. - -The interface FMEA does serve to provide a useful -check-list to ensure data and synchronisation conventions used by the hardware -and software are not mismatched. 
However, the fact it is perceived as required -highlights the the miss-matches possible between the two types of analysis -which could run deeper than the mere interface level. - - -However, while these techniques ensure that the software and hardware is -viewed and analysed from several perspectives, it cannot be termed a homogeneous -failure mode model. -% For instance -% were the ADC to have a small value error, say adding -% a small percentage onto the value, we would be unable to -% detect this under the analysis conditions for this model, or -% be able to pinpoint it. -% - -Need wishlist ticks and solved problems here. +% This paper has picked a very simple example %(the industry standard {\ft} +% %input circuit and software) +% to demonstrate +% SFMEA and HFMEA methodologies used to describe a failure mode model. +% %Even a modest system would be far too large to analyse in conference paper +% %and this +% % +% %The {\dc} representing the {\ft} reader +% %shows that by taking a +% %modular approach for FMEA, i.e. FMMD, we can integrate +% Our model is described by four FMEA reports; and these % we can model the failure mode behaviour from +% model the system from several failure mode perspectives. +% % +% With traditional FMEA methods the reasoning~distance is large, because +% it stretches from the component failure mode to the top (or system) level failure. +% % +% With these four analysis reports +% we do not have stages along the `reasoning~path' linking the failure modes from the +% electronics to those in the software. +% %Software is often written `defensively' but t +% %Each {\fg} to {\dc} transition represents a +% %reasoning stage. +% % +% % +% %For this reason applying traditional FMEA to software stretches +% %the reasoning distance even further. +% % +% In fact many of these reasoning paths overlap---or even by-pass one another---and +% it is very difficult to gauge cause and effect. 
+% For instance, hardware failures are not analysed in the context of how they will +% be handled (or missed) by the software. +% % +% System outputs commanded from software may not take into account particular +% hardware limitations etc. +% +% The interface FMEA does serve to provide a useful +% check-list to ensure data and synchronisation conventions used by the hardware +% and software are not mismatched. However, the fact it is perceived as required +% highlights the mismatches possible between the two types of analysis +% which could run deeper than the mere interface level. +% +% +% However, while these techniques ensure that the software and hardware are +% viewed and analysed from several perspectives, the result cannot be termed a homogeneous +% failure mode model. +% % For instance +% % were the ADC to have a small value error, say adding +% % a small percentage onto the value, we would be unable to +% % detect this under the analysis conditions for this model, or +% % be able to pinpoint it. +% % +% +% Need wishlist ticks and solved problems here. { \footnotesize