From a8b864f083678a50b93a011ac0be58325d391fc3 Mon Sep 17 00:00:00 2001 From: Robin Clark Date: Sun, 17 Jun 2012 12:00:31 +0100 Subject: [PATCH] crammed this into 5 pages --- .../fmmd_software_hardware/software_fmmd.tex | 209 +++++++++--------- 1 file changed, 106 insertions(+), 103 deletions(-) diff --git a/papers/fmmd_software_hardware/software_fmmd.tex b/papers/fmmd_software_hardware/software_fmmd.tex index 7adbd06..97742aa 100644 --- a/papers/fmmd_software_hardware/software_fmmd.tex +++ b/papers/fmmd_software_hardware/software_fmmd.tex @@ -65,10 +65,10 @@ \newcommand{\ohms}[1]{\ensuremath{#1\Omega}} \newcommand{\fm}{failure~mode} \newcommand{\fms}{failure~modes} -\newcommand{\fg}{functional~group} +\newcommand{\fg}{functional~grouping} \newcommand{\FG}{\mathcal{G}} \newcommand{\DC}{\mathcal{DC}} -\newcommand{\fgs}{functional~groups} +\newcommand{\fgs}{functional~groupings} \newcommand{\dc}{derived~component} \newcommand{\dcs}{derived~components} \newcommand{\bc}{base~component} @@ -99,9 +99,9 @@ failure mode of the component or sub-system}}} \setlength{\topmargin}{0in} \setlength{\headheight}{0in} \setlength{\headsep}{0in} -\setlength{\textheight}{22cm} +%\setlength{\textheight}{22cm} \setlength{\textwidth}{18cm} -\setlength{\textheight}{24cm} +\setlength{\textheight}{24.5cm} %\setlength{\textwidth}{20cm} \setlength{\oddsidemargin}{0in} \setlength{\evensidemargin}{0in} @@ -115,7 +115,13 @@ failure mode of the component or sub-system}}} \setlength{\topmargin}{0pt} \setlength{\topsep}{0pt} \setlength{\partopsep}{0pt} -%\linespread{0.5} +\setlength{\itemsep}{1pt} +% \renewcommand\subsection{\@startsection +% {subsection}{2}{0mm}% +% {-\baselineskip} +% {0.5\baselineskip} +% {\normalfont\normalsize\itshape}} +\linespread{0.6} \begin{document} %\pagestyle{fancy} @@ -129,7 +135,7 @@ failure mode of the component or sub-system}}} %\lhead{Developing a rigorous bottom-up modular static failure modelling methodology} % numbers at outer edges \pagenumbering{arabic} % Arabic 
page numbers hereafter -\author{R.Clark$^\star$ \\ % , A.~Fish$^\dagger$ , C.~Garrett$^\dagger$, J.~Howse$^\dagger$ \\ +\author{R.~Clark$^\star$, A.~Fish$^\dagger$, C.~Garrett$^\dagger$, J.~Howse$^\dagger$ \\ $^\star${\em Energy Technology Control, UK. r.clark@energytechnologycontrol.com} \and $^\dagger${\em University of Brighton, UK} } @@ -181,7 +187,7 @@ can be applied to software, and is compatible and integrate-able with FMMD performed on mechanical and electronic systems. } -\today +%\today \nocite{en298} \nocite{en61508} @@ -199,7 +205,8 @@ to define failure modes and failure symptoms for software functions. % With these definitions we can apply the FMMD modular form of FMEA -to existing software\footnote{Existing software excluding recursive~\cite{misra}[16.2] code, and unstructured non-functional languages}. +to existing software\footnote{Existing software excluding recursive~\cite{misra}[16.2] code, +and unstructured non-functional languages}. } \section{FMEA Background} @@ -209,10 +216,11 @@ to existing software\footnote{Existing software excluding recursive~\cite{misra} Failure Mode effects Analysis is the process of taking component failure modes, and by reasoning, tracing their effects through a system and determining what system level failure modes could be caused. +% FMEA dates from the 1940s where simple electro-mechanical systems were the norm. Modern control systems nearly always have a significant software/firmware element, and not being able to model software with current FMEA methodologies -is a cause for criticism~\cite{easw}~\cite{safeware}~\cite{bfmea}. +is a cause for criticism~\cite{safeware}. %Several variants of FMEA exist, % traditional FMEA being associated with the manufacturing industry, with the aims of prioritising @@ -240,10 +248,10 @@ hardware and software models, but to perform FMEA on the software in isolation~\cite{procsfmea}. 
Some work has been performed using databases to track the relationships between variables -and system failure modes~\cite{procsfmeadb}, and work has been performed to +and system failure modes~\cite{procsfmeadb}; work has also been performed to introduce automation into the FMEA process~\cite{appswfmea} and code analysis automation~\cite{modelsfmea}. Although the SFMEA and hardware FMEAs are performed separately -some schools of thought aim for FTA~\cite{nasafta}~\cite{nucfta} (top down - deductive) and FMEA (bottom-up inductive) +some schools of thought aim for FTA~\cite{nasafta,nucfta} (top down - deductive) and FMEA (bottom-up inductive) to be performed on the same system to provide insight into the software hardware/interface~\cite{embedsfmea}. % @@ -255,15 +263,16 @@ through the top (and therefore ultimately controlling) layer of software. \subsection{Current FMEA techniques are not suitable for software} The main FMEA methodologies are all based on the concept of taking -base component {\fms}, and translating them into system level events/failures~\cite{sfmea}~\cite{sfmeaa}. +base component {\fms}, and translating them into system level events/failures~\cite{sfmea,sfmeaa}. In a complicated system, mapping a component failure mode to a system level failure will mean a long reasoning distance; that is to say the actions of the failed component will have to be traced through several sub-systems and the effects of other components on the way. % With software at the higher levels of these sub-systems we have yet another layer of complication. - -In order to integrate software, in a meaningful way we need to re-think the +% +In order to integrate software, %in a meaningful way +we need to re-think the FMEA concept of simply mapping a base component failure to a system level event. % One strategy would be to modularise FMEA. To break down the failure effect @@ -301,7 +310,6 @@ using all the failure modes of all its components. 
When we have its failure mode behaviour, or the symptoms of failure from the perspective of the {\fg}, we now treat the {\fg} as a {\dc}, where the failure modes of the {\dc} are the symptoms of failure of the {\fg}. % -% We can now use {\dcs} to build higher level {\fgs} until we have a complete hierarchical model of the failure mode behaviour of a system. An example of this process, applied to an inverting op-amp configuration is given in~\cite{syssafe2011}. @@ -386,7 +394,7 @@ derived component (which has the system---or top---level failure modes). \begin{figure} \centering - \includegraphics[width=200pt]{./fmmdh.png} + \includegraphics[width=150pt]{./fmmdh.png} % fmmdh.png: 365x405 pixel, 72dpi, 12.88x14.29 cm, bb=0 0 365 405 \caption{FMMD Hierarchy} \label{fig:fmmdh} @@ -428,7 +436,7 @@ We can thus apply the $\derivec$ function to software functions, by viewing them mode behaviour. To simplify things as well, software already fits into a hierarchy. For Electronics and Mechanical systems, although we may be guided by the original designers concepts of modularity and sub-systems in design, applying FMMD means deciding on the members for {\fgs} -and the subsequent hierarchy. With software already written, that hierarchy is fixed. +and the subsequent hierarchy. With software already written, that hierarchy is fixed/given. % map the FMMD concepts of {\fms}, {\fgs} and {\dcs} %to software functions. @@ -452,45 +460,43 @@ What is potentially difficult with a software function, is deciding what are its `failure~modes', and later what are its `failure~symptoms'. % With electronic components, we can use literature to point us to suitable sets of -{\fms}~\cite{fmd91}~\cite{mil1991}~\cite{en298}.%~\cite{en61508}~\cite{en298}. +{\fms}~\cite{fmd91,mil1991,en298}.%,en61508}.%~\cite{en298}. % With software, only some library functions are well known and rigorously documented enough to have the equivalent of known failure modes. Most software is `bespoke'. 
We need a different strategy to describe the failure mode behaviour of software functions. We can use definitions from contract programming to assist here. - +% \subsection{Contract programming description} - +% Contract programming is a discipline~\cite{dbcbe} for building software functions in a controlled and traceable way. Each function is subject to pre-conditions (constraints on its inputs), post-conditions (constraints on its outputs) and function wide invariants (rules). - - +% \paragraph{Mapping contract `pre-condition' violations to failure modes} - +% A precondition, or requirement for a contract software function defines the correct ranges of input conditions for the function to operate successfully. % For a software function, a violation of a pre-condition is in effect a failure mode of `one of its components'. - - +% \paragraph{Mapping contract `post-condition' violations to symptoms} - +% A post condition is a definition of correct behaviour by a function. A violated post condition is a symptom of failure of a function. Post conditions could be either actions performed (i.e. the state of hardware changed) or an output value of a function. - +% \paragraph{Mapping contract `invariant' violations to symptoms and failure modes} - +% Invariants in contract programming may apply to inputs to the function (where they can be considered {\fms} in FMMD terminology), and to outputs (where they can be considered {failure symptoms} in FMMD terminology). \subsection{Software FMMD} - +% For the purpose of example, we chose a simple common safety critical industrial circuit that is nearly always used in conjunction with a programmatic element. A common method for delivering a quantitative value in analogue electronics is @@ -509,15 +515,17 @@ and is therefore easy to detect as an error rather than an incorrect value. 
Should the driving electronics go wrong at the source end, it will usually supply far too little or far too much current, making an error condition easy to detect. % -At the receiving end, we only require one simple component to convert the -current signal into a voltage that we can read with an ADC: the humble resistor! +At the receiving end, we need a resistor to convert the +current signal into a voltage that we can read with an ADC.% +%we only require one simple component to convert the +%current signal into a voltage that we can read with an ADC: the humble resistor! %BLOCK DIAGRAM HERE WITH FT CIRCUIT LOOP \begin{figure}[h] \centering - \includegraphics[width=230pt]{./ftcontext.png} + \includegraphics[width=200pt]{./ftcontext.png} % ftcontext.png: 767x385 pixel, 72dpi, 27.06x13.58 cm, bb=0 0 767 385 \caption{Context Diagram for {\ft} loop} \label{fig:ftcontext} @@ -545,15 +553,16 @@ As a voltage, we use ohms law~\cite{aoe} to determine the voltage ranges: $V=IR$ and $0.020A * \ohms{220} = 4.4V$. % Our acceptable voltage range is therefore - -$$(V \ge 0.88) \wedge (V \le 4.4) \; .$$ +% +$(V \ge 0.88) \wedge (V \le 4.4) \; .$ This voltage range forms our input requirement. % We can now examine a software function that performs a conversion from the voltage read to a per~mil representation of the {\ft} input current. % -For the purpose of example the `C' programming language~\cite{kandr} is used\footnote{ C coding examples use the Misra~\cite{misra} and SIL 3 recomended language constraints~\cite{en61508}.}. +For the purpose of example the `C' programming language~\cite{DBLP:books/ph/KernighanR88} is +used\footnote{ C coding examples use the Misra~\cite{misra} and SIL 3 recommended language constraints~\cite{en61508}.}. We initially assume a function \textbf{read\_ADC} which returns a floating point %double precision value which represents the voltage read (see code sample in figure~\ref{fig:code_read_4_20_input}). 
@@ -661,16 +670,18 @@ double read_ADC( int channel ) { %} -We now have a very simple software structure, a call tree, shown in figure~\ref{fig:ct1}. - -\begin{figure}[h] - \centering - \includegraphics[width=100pt]{./ct1.png} - % ct1.png: 151x224 pixel, 72dpi, 5.33x7.90 cm, bb=0 0 151 224 - \caption{Call tree for software example} - \label{fig:ct1} -\end{figure} - +We now have a very simple software structure, a call tree, where {\em read\_4\_20\_input} +calls {\em read\_ADC}, which in turn interacts with the hardware/electronics. +%shown in figure~\ref{fig:ct1}. +% +% \begin{figure}[h] +% \centering +% \includegraphics[width=56pt]{./ct1.png} +% % ct1.png: 151x224 pixel, 72dpi, 5.33x7.90 cm, bb=0 0 151 224 +% \caption{Call tree for software example} +% \label{fig:ct1} +% \end{figure} +% This software is above the hardware in the conceptual call tree---from a programmatic perspective---%in software terms---the software is reading values from the `lower~level' electronics. % @@ -697,11 +708,11 @@ Our functional group, $G_1$ is thus the set of base components: $G_1 = \{R, ADC\ We now determine the {\fms} of all the components in $G_1$. For the resistor we can use a failure mode set from the literature~\cite{en298}. Where the function $fm$ returns a set of failure modes for a given component we can state: - -$$ fm(R) = \{OPEN,SHORT\}. $$ +% +$ fm(R) = \{OPEN,SHORT\}. $ \vbox{ For the ADC we can determine the following failure modes: - +% \begin{itemize} \item STUCKAT --- The ADC outputs a constant value, \item MUXFAIL --- The ADC cannot select its input channel correctly, @@ -710,8 +721,8 @@ For the ADC we can determine the following failure modes: \end{itemize} } We can use the function $fm$ to define the {\fms} of an ADC thus: -$$ fm(ADC) = \{ STUCKAT, MUXFAIL,LOW, HIGH \}. $$ - +$ fm(ADC) = \{ STUCKAT, MUXFAIL,LOW, HIGH \}. $ +% With these failure modes, we can analyse our first functional group, see table~\ref{tbl:cmatv}. 
{ @@ -758,10 +769,10 @@ We now collect the symptoms for the hardware functional group, $\{ HIGH , LOW, V We now create a {\dc} to represent this called $CMATV$. % We can express this using the `$\derivec$' function thus: -$$ CMATV = \; \derivec (G_1) .$$ +$ CMATV = \; \derivec (G_1) .$ % As its failure modes are the symptoms of failure from the functional group we can now state: -$$fm ( CMATV ) = \{ HIGH , LOW, V\_ERR \} .$$ +$fm ( CMATV ) = \{ HIGH , LOW, V\_ERR \} .$ \paragraph{Functional Group - Software - Read\_ADC - RADC} @@ -788,8 +799,7 @@ of this function, which we can call $V\_REF$. Taken as a component for use in FMEA/FMMD our function has two failure modes. We can therefore treat it as a generic component, $Read\_ADC$, by stating: - -$$ fm(Read\_ADC) = \{ CHAN\_NO, VREF \} $$ +$ fm(Read\_ADC) = \{ CHAN\_NO, VREF \} $ As we have a failure mode model for our function, we can now use it in conjunction with with the ADC hardware {\dc} CMATV, to form a {\fg} $G_2$, where $G_2 =\{ CMSTV, Read\_ADC \}$. @@ -843,10 +853,10 @@ for the function. This postcondition, {\em /* ensure: value is voltage input to within 0.1\% */ }, corresponds to $VV\_ERR$, and is already in the {\fm} set for this {\fg}. % -We can now create a {\dc} called $RADC$ thus: $$RADC = \; \derivec(G_2)$$ which has the following +We can now create a {\dc} called $RADC$ thus: $RADC = \; \derivec(G_2)$ which has the following {\fms}: - -$$ fm(RADC) = \{ VV\_ERR, HIGH, LOW \} .$$ +% +$ fm(RADC) = \{ VV\_ERR, HIGH, LOW \} .$ @@ -861,9 +871,9 @@ to determine its {\fms}. Its pre-condition is, {\em /* require: input from ADC to be between 0.88 and 4.4 volts */}. We can map this violation of the pre-condition, to the {\fm} VRNGE; %As this function has one pre-condition we can state, - -$$ fm(read\_4\_20\_input) = \{ VRNGE \} .$$ - +% +$ fm(read\_4\_20\_input) = \{ VRNGE \} .$ +% We can now form a functional group with the {\dc} $RADC$ and the software component $read\_4\_20\_input$, i.e. 
$G_3 = \{read\_4\_20\_input, RADC\} $. @@ -894,12 +904,7 @@ software component $read\_4\_20\_input$, i.e. $G_3 = \{read\_4\_20\_input, RADC\ 4: $RADC_{LOW}$ & ADC may read & $OUT\_OF\_$ \\ & wrong channel & $RANGE$ \\ \hline - \hline - - -\hline - \end{tabular} \end{table} } @@ -913,26 +918,22 @@ can fail. An $OUT\_OF\_RANGE$ will be flagged by the error flag variable. The $VAL\_ERR$ will mean that the value read is simply wrong. % We can finally make a {\dc} to represent a failure mode model for our function $read\_4\_20\_input$ thus: - -$$ R420I = \; \derivec(G_3) .$$ - +% +$ R420I = \; \derivec(G_3) .$ +% This new {\dc} has the following {\fms}: -$$fm(R420I) = \{OUT\_OF\_RANGE, VAL\_ERR\} .$$ - +$fm(R420I) = \{OUT\_OF\_RANGE, VAL\_ERR\} .$ % % Using the derived components, CMATV and VTPM we create % a new functional group. This % integrates FMEA's from software and eletronics % into the same failure mode model. - - - We can now represent the software/hardware FMMD analysis as a hierarchical diagram, see figure~\ref{fig:hd}. \begin{figure}[h] \centering - \includegraphics[width=200pt]{./hd.png} + \includegraphics[width=60pt]{./hd.png} % hd.png: 363x520 pixel, 72dpi, 12.81x18.34 cm, bb=0 0 363 520 \caption{FMMD hierarchy with hardware and software elements} \label{fig:hd} @@ -952,39 +953,39 @@ using the groups as intermediate stages: % \end{eqnarray*} %or, with a nested definition, -$$ \derivec \Big( \derivec \big( \derivec(R,ADC), read\_4\_20\_input \big), read\_4\_20\_input \Big). $$ - +$ \derivec \Big( \derivec \big( \derivec(R,ADC), Read\_ADC \big), read\_4\_20\_input \Big). $ +% This nested structure means that we have multiple traceable stages of failure mode reasoning in our analysis. Traditional FMEA would have only one stage of reasoning for each component failure mode. -\section{Heuristic Comments on {\ft} Input Circuit} - -Part of the design philosophy of a {\ft} loop, is that -if anything goes wrong, we should be able to detect it. 
-In fact unless all electrical elements in the loop -are in working order we will detect a failure in -the majority of cases. -\paragraph{Sending side of a {\ft} loop} -A current loop has to be actively maintained. If the sending side looses power, -the current will drop to zero, and thus be detectable as an error because it is below 4mA. -Should the sending circuitry fail, it is far more likely to drive too high or too low, rather than supply -an erroneous but in bounds ($4mA \ge \wedge \le 20mA$) value. -\paragraph{Receiving side of a {\ft} loop} -The most common fault is disconnection, and this is easily detected ($0mA\; \le \; 4mA$--out of bounds). -Other failure modes, such as the resistor going open or shorted -also immediately push the voltage signal out of bounds. -The software side of the interface, is easy to test, either as software modules -or as an integrated system (hand-held precision current sources are cheaply available). -\paragraph{What could go wrong---Production} -PCB construction contractors are well known for random polarity placement of diodes. -Less likely is that the resistor fitted will be an incorrect value, which could -lead to the range being incorrect. Were this the case, we would have to be very unlucky -and get a value very close to our chosen \ohms{220} for this to be a problem, and -in safety critical equipment, a production test rig would pick this up. -Worse perhaps, a resistor with poor temperature coefficient could be -erroneously chosen (this would be a cheaper component), and could contribute small errors. +% \section{Heuristic Comments on {\ft} Input Circuit} +% +% Part of the design philosophy of a {\ft} loop, is that +% if anything goes wrong, we should be able to detect it. +% In fact unless all electrical elements in the loop +% are in working order we will detect a failure in +% the majority of cases. +% \paragraph{Sending side of a {\ft} loop} +% A current loop has to be actively maintained. 
If the sending side looses power, +% the current will drop to zero, and thus be detectable as an error because it is below 4mA. +% Should the sending circuitry fail, it is far more likely to drive too high or too low, rather than supply +% an erroneous but in bounds ($4mA \ge \wedge \le 20mA$) value. +% \paragraph{Receiving side of a {\ft} loop} +% The most common fault is disconnection, and this is easily detected ($0mA\; \le \; 4mA$--out of bounds). +% Other failure modes, such as the resistor going open or shorted +% also immediately push the voltage signal out of bounds. +% The software side of the interface, is easy to test, either as software modules +% or as an integrated system (hand-held precision current sources are cheaply available). +% \paragraph{What could go wrong---Production} +% PCB construction contractors are well known for random polarity placement of diodes. +% Less likely is that the resistor fitted will be an incorrect value, which could +% lead to the range being incorrect. Were this the case, we would have to be very unlucky +% and get a value very close to our chosen \ohms{220} for this to be a problem, and +% in safety critical equipment, a production test rig would pick this up. +% Worse perhaps, a resistor with poor temperature coefficient could be +% erroneously chosen (this would be a cheaper component), and could contribute small errors. @@ -994,11 +995,13 @@ erroneously chosen (this would be a cheaper component), and could contribute sma The {\dc} representing the {\ft} reader in software shows that by taking a modular approach for FMEA, we can integrate software and electro-mechanical FMEA models. +% With this analysis we have a complete `reasoning~path' linking the failures modes from the electronics to those in the software. Each functional group to {\dc} transition represents a reasoning stage. 
+% With traditional FMEA methods the reasoning~distance is large, because it stretches from the component failure mode to the top---or---system level failure. For this reason applying traditional FMEA to software stretches @@ -1034,7 +1037,7 @@ Using FMMD we can determine an accurate failure model for the interface as well. % %\today % % { %\tiny % -\footnotesize +\tiny \bibliographystyle{plain} \bibliography{../../vmgbibliography,../../mybib} }