diff --git a/submission_thesis/CH5_Examples/context_calltree.dia b/submission_thesis/CH5_Examples/context_calltree.dia index 20e146c..33c74ae 100644 Binary files a/submission_thesis/CH5_Examples/context_calltree.dia and b/submission_thesis/CH5_Examples/context_calltree.dia differ diff --git a/submission_thesis/CH5_Examples/software.tex b/submission_thesis/CH5_Examples/software.tex index f22f307..c78da4b 100644 --- a/submission_thesis/CH5_Examples/software.tex +++ b/submission_thesis/CH5_Examples/software.tex @@ -5,7 +5,7 @@ \section{Software and Hardware Failure Mode Concepts} \label{sec:elecsw} -In this chapter we show that FMMD can be applied to software enabling us to build build complete failure models +In this chapter we show that FMMD can be applied to both software and electronics, enabling us to build complete failure models of typical modern safety critical systems. With modular FMEA i.e. FMMD %(FMMD) we have the concepts of failure~modes @@ -44,7 +44,7 @@ and the subsequent hierarchy. % With software already written, the hierarchies are given. % -To apply FMMD to software, we collect the elements used by a software function, along with the function its-self +To apply FMMD to software, we collect the elements used by a software function, along with the function itself to form a {\fg}. When we have analysed the failure mode behaviour of this {\fg} and have its failure mode symptoms, we can create a {\dc}. That {\dc} can be used by functions that call the function we have just analysed, until @@ -131,23 +131,23 @@ A post condition is a definition of correct behaviour of a function. A violated post condition is a symptom of failure, or derived failure mode, from a function. % Post conditions could be either actions performed (i.e. the state of hardware changed) or an output value of a function. -In pure contract programming, a violation of a pre-condition would not cause the function to be executed.
+In pure contract programming, a violation of a pre-condition would mean that the function +is \textbf{not} executed. % In implementation code, a pre-condition violation should cause an error to be generated, and thus a post condition to fail. % A function can fail for reasons other than the -a failure of one the variables/inputs or functions that it calls. -Variables can become corrupted, by radiation affecting RAM or -by another software function erroneously overwriting variables. +failure of one of the variables/inputs or functions that it calls. +Variables can become corrupted by radiation affecting RAM~\cite{5488118,5963919} or +by another software function erroneously overwriting variables~\cite{swseatbelt}. Current work on software FMEA generally focuses on mapping variable corruption to failure modes~\cite{procsfmea,procsfmeadb,sfmeaauto,sfmea}. However, errors other than variable corruption can occur, -for instance a microprocessor may have subtle bugs in its instruction set or +for instance a microprocessor may have subtle bugs in its instruction set, or incorrectly handled -interrupt contention could cause side effects in software. -For the failure mode model of any software function +interrupt contention could cause side effects in software. +For the failure mode model of any software function, we must consider all failure modes of post condition violations as well as those caused by `components'. @@ -227,7 +227,7 @@ We can now examine a software function that performs a conversion from the volta a per~mil representation of the {\ft} input current. % For the purpose of example the `C' programming language~\cite{DBLP:books/ph/KernighanR88} is used. -We initially assume a function \textbf{read\_ADC} which returns a floating point %double precision +We initially assume a function \textbf{read\_ADC} that returns a floating point %double precision value which represents the voltage read (see code sample in figure~\ref{fig:code_read_4_20_input}).
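The contract-programming behaviour described above (a violated pre-condition stops the body from running and is reported as an error) can be illustrated with a minimal C sketch of a voltage to per mil conversion. It is not the function in the thesis figure: the name `volts_to_permil`, the `conv_status` codes and the 220 ohm sense resistor (so that 4 mA and 20 mA give 0.88 V and 4.4 V) are all hypothetical assumptions.

```c
/* Minimal sketch, not the thesis code: converts the voltage returned by a
   read_ADC-style function into per mil of the 4..20 mA span, with
   contract-style pre- and post-condition checks.
   ASSUMPTION: a 220 ohm sense resistor, so 4 mA -> 0.88 V, 20 mA -> 4.4 V. */
#define R_SENSE_OHMS 220.0
#define V_4MA  (0.004 * R_SENSE_OHMS)
#define V_20MA (0.020 * R_SENSE_OHMS)

typedef enum {
    CONV_OK,         /* post condition holds                    */
    CONV_PRE_LOW,    /* require violated: current below 4 mA    */
    CONV_PRE_HIGH,   /* require violated: current above 20 mA   */
    CONV_POST_FAIL   /* ensure violated: result outside 0..1000 */
} conv_status;

conv_status volts_to_permil(double volts, double *permil)
{
    /* require: 4 mA <= current <= 20 mA; the negated tests also reject NaN.
       On violation the body is not executed and an error is returned. */
    if (!(volts >= V_4MA))
        return CONV_PRE_LOW;
    if (!(volts <= V_20MA))
        return CONV_PRE_HIGH;

    double result = 1000.0 * (volts - V_4MA) / (V_20MA - V_4MA);

    /* ensure: 0 <= result <= 1000. This should not fail in correct code;
       it is there to detect corruption of the computation or its data. */
    if (!(result >= 0.0 && result <= 1000.0))
        return CONV_POST_FAIL;

    *permil = result;
    return CONV_OK;
}
```

A caller receiving `CONV_PRE_LOW` or `CONV_PRE_HIGH` would propagate the error rather than use the value, so the pre-condition violation surfaces as a symptom of the calling function.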
@@ -446,13 +446,13 @@ With these failure modes, we can analyse our first functional group, see table~\ } -We now collect the symptoms for the hardware functional group, $\{ HIGH , LOW, V\_ERR \} $. -We now create a {\dc} to represent this called $CMATV$. +We now collect the symptoms for the hardware functional group, $\{ HIGH , LOW, V\_ERR \} $, and +create a {\dc}, called $CMATV$, to represent them. %We can express this using the `$\derivec$' function thus: %$$ CMATV = \; \derivec (G_1) .$$ -As its failure modes, are the symptoms of failure from the functional group we can now state: +As its failure modes are the symptoms of failure from the functional group, we state: $$fm ( CMATV ) = \{ HIGH , LOW, V\_ERR \} .$$ @@ -475,7 +475,7 @@ which we can call $ CHAN\_NO $. % The reference voltage for the ADC has a 0.1\% accuracy requirement. % -If the reference value is outside of this, it is also a {\fm} +If the reference value is outside this tolerance, it is also a {\fm} of this function, which we can call $V\_REF$. Taken as a component for use in FMEA/FMMD our function has @@ -484,10 +484,10 @@ by stating: $$ fm(Read\_ADC) = \{ CHAN\_NO, VREF \} $$ -As we have a failure mode model for our function, we can now use it in conjunction with +As we have a failure mode model for our function, we use it in conjunction with the ADC hardware {\dc} CMATV, to form a {\fg} $G_2$, where $G_2 =\{ CMSTV, Read\_ADC \}$. - -We now analyse this hardware/software combined {\fg}. +% +We analyse this hardware/software combined {\fg}.
-This postcondition, {\em /* ensure: value is voltage input to within 0.1\% */ }, +This postcondition, {\em /* ensure: value is voltage input to within 0.1\% */}, corresponds to $VV\_ERR$, and is already in the {\fm} set for this {\fg}. %We can now create a {\dc} called $RADC$ thus: $$RADC = \; \derivec(G_2)$$ which has the following @@ -724,21 +724,25 @@ addressed using the Proportional Integral differential (PID) algorithm~\cite{dco Traditionally this was performed in analogue electronics with trimmer potentiometers providing the P and I parameters. Since the introduction of micro-processors, it has been possible to -implement PID programmatic-ally. +implement PID pro-grammatically. An FMMD analysis of a PID temperature controller would mean an -analysis of a standalone system without being un-wieldingly large. -\paragraph{PID Temperature Control.} +analysis of a realistic standalone system without being it becoming an un-wieldingly large task. +\paragraph{The PID Temperature Control Algorithm.} PID control starts with a setpoint, or desired value for a process (here the temperature). It reads the process value and determines an error value for it. The aim of the PID controller is to minimise this error term, by setting an output value, which is fed back into the process (in this example the amount of power to supply the heater). The error value is integrated and multiplied by an I constant. A differential of the error value is calculated and multiplied by a D constant. -The error value its self is multiplied by a P constant, and all three of these are added +The error value itself is multiplied by a P constant, and all three of these are added to obtain the output required. +% +A mathematical description of PID with frequency domain modelling (La-Place transforms etc) +may be found in~\cite{dcods}[Ch.3.3]. 
+% \subsection{Design Stage: Implementation on a micro-controller.} When designing a computer program it is often useful to -produce a structured analysis `Yourdon' context diagram~\cite{Yourdon:1989:MSA:62004}, see figure~\ref{fig:context_diagram_PID}. +start with a structured analysis `Yourdon' context diagram~\cite{Yourdon:1989:MSA:62004}, see figure~\ref{fig:context_diagram_PID}. \begin{figure}[h]+ \centering @@ -747,16 +751,21 @@ produce a structured analysis `Yourdon' context diagram~\cite{Yourdon:1989:MSA:6 \caption{Yourdon Context Diagram for PID Temperature Controller.} \label{fig:context_diagram_PID} \end{figure} + +Using figure~\ref{fig:context_diagram_PID} we review the system in terms of its data flow, starting +with the data sources (the Pt100 inputs) and the data sinks (the heater output and the LED indicators). +% We have two voltage inputs (see section~\ref{sec:Pt100}) from the Pt100 temperature sensor. For the Pt100 sensor, we will need to read the voltages it outputs and for this -we will need an ADC and MUX. +we require an ADC and MUX. % For the output, we can use a Pulse Width Modulator (PWM) (this is a common module found on micro-controllers -allowing a variable power output~\cite{pwm}). PWM's ADC's and MUX's are commonly built into cheap micro-controllers~\cite{pic18f2523}. -We can now build more detail into the Yourdon diagram, with the afferent data flow coming through the MUX and ADC on the micro-controller, and the efferent +allowing a variable power output~\cite[p.~360]{aoe}). PWMs, ADCs and MUXs are commonly built into low-cost micro-controllers~\cite[Ch.~15]{pic18f2523}. +We refine the Yourdon diagram, with the afferent data flow coming through the MUX and ADC on the micro-controller, and the efferent channelled through a PWM module, %again built into the micro-controller, % -see figure~\ref{fig:context_diagram2_PID}. +adding more detail (see figure~\ref{fig:context_diagram2_PID}).
+ \begin{figure}[h]+ \centering \includegraphics[width=300pt]{./CH5_Examples/context_diagram2_PID.png} @@ -774,10 +783,12 @@ We refine the data flow within the software and thus define software functions. %this in terms of software functions. % We follow the data streams through the process, creating transform bubbles as required. -In all `bare~metal' software architectures, we need a rudimentary operating system, often referred to as the monitor. +In all `bare~metal'\footnote{`Bare~metal' is a term used to indicate a micro-processor +controlled system that does not use a traditional operating system.} +software architectures, we need a rudimentary operating system, often referred to as the `monitor'. % -We bare in mind that PID, because the algorithm depends heavily on integral calculus, is time sensitive -and we therefore need to call at precise intervals specific to its integration and differential coefficients. +We bear in mind that PID, because the algorithm depends heavily on integral calculus, is time sensitive, +and we therefore need to call it at precise intervals, as its integral and differential coefficients are only valid for a fixed sampling period. % Most micro-controllers feature several general purpose timers~\cite{pic18f2523}. We can use an internal timer in conjunction with the monitor function @@ -798,7 +809,8 @@ Using figure~\ref{fig:contextsoftware} we can now pick the transform bubble we want to be the `main' or controlling function in the software. This can be thought of as picking one bubble and holding it up. The other bubbles hang underneath forming the software call tree hierarchy, see figure~\ref{fig:context_calltree}. -From is clearly going to be the monitor function. +From examining the diagram, and in line with common embedded programming practice, +this is clearly the monitor function.
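The timer-plus-monitor arrangement described above can be sketched in C as a main-loop poll that performs one control cycle per timer tick. This is a minimal sketch under stated assumptions: `timer_isr` stands in for a real timer interrupt, and the stubs beneath the monitor (`get_temperature`, `pid_step`, `output_control`, `set_leds`) are hypothetical placeholders for the call tree, not code from this design.

```c
#include <stdbool.h>

/* ASSUMPTION: a hardware timer interrupt calls timer_isr() once per
   PID sampling period; here it is called by hand to simulate ticks. */
static volatile bool control_tick = false;
static unsigned control_cycles = 0;   /* completed control cycles */

void timer_isr(void) { control_tick = true; }

/* Hypothetical stubs standing in for the functions under monitor. */
static double get_temperature(void)         { return 21.0; }
static double pid_step(double error)        { return -10.0 * error; }
static void   output_control(double demand) { (void)demand; }
static void   set_leds(void)                { }

/* Called repeatedly from the main loop. Provided the loop polls faster
   than the tick rate, the PID runs once per sampling period. */
void monitor_poll(double target)
{
    if (!control_tick)
        return;                 /* not yet time for the next cycle */
    control_tick = false;

    double error = get_temperature() - target;  /* measured minus target */
    output_control(pid_step(error));
    set_leds();
    control_cycles++;
}
```

Keeping the timing decision in the monitor, rather than in the PID function, matches the call tree: the calling function sets the context in which the PID runs.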
\begin{figure}[h]+ \centering \includegraphics[width=300pt]{./CH5_Examples/context_calltree.png} @@ -819,11 +831,13 @@ With the set point error value the PID function will return function (i.e. the PID demand which will be returned to the monitor function). % -On returning to the monitor function, it will return the PID demand value. +%On returning to the monitor function, it will return the PID demand value. The PID demand value will be applied via the PWM. We now have a rudimentary closed loop control system incorporating both hardware and software. % -Using the Yourdon methodology we have the system design: we have all the components, i.e. hardware elements and software functions +By using the Yourdon methodology we have defined the programmatic design, or call tree. +% +We now have all the components, i.e. hardware elements and software functions that will be used in the temperature controller. We list these, and begin, from the bottom-up, to apply FMMD analysis. @@ -838,7 +852,7 @@ Identified electronic components: \item HEATER --- Heating element, essentially a resistor. \item Pt100 --- Pt100 Temperature sensor, as analysed in section~\ref{sec:Pt100}. \item PWM --- Internal micro controller pulse width modulation module - \item General Purpose I/O (GPIO) --- + \item General Purpose I/O (GPIO) --- I/O used to source LED current \item LEDs --- Indication LEDs via GPIO \item micro-controller --- the medium for running the software \end{itemize} @@ -876,10 +890,14 @@ $$ fm(Pt100) = \{ OUT\_OF\_RANGE \} $$ \paragraph{PWM} -The PWM, in use, is a hardware register written to with an integer value. -It then applies a mark space ratio proportional to that value providing -a means of applying varying amounts of power. When the PWM -action is halted the digital output pin associated with it will typically be held in a high or low state. +%The PWM, in use, is a hardware register written to with an integer value~\cite{pic182523}[Ch.15]. +From a programmatic perspective, a PWM output is a register to which software writes +an unsigned magnitude value~\cite[Ch.~15]{pic18f2523}.
+From a programmatic perspective a PWM output is a register that software writes +an unsigned magnitude value to~\cite{pic182523}[Ch.15]. +The PWM hardware module +applies this using a mark space ratio proportional to that value, providing +a means of varying the amount of power supplied. +When the PWM action is halted, or fails, the digital output pin associated with it, +will typically be held in a high or low state. We therefore state: $$ fm(PWM) = \{ HIGH, LOW \}.$$ @@ -905,14 +923,14 @@ With the call tree structure defined (see figure~\ref{fig:context_calltree}), we components from the bottom-up, starting with the afferent flow, the reading in of the temperature and its conversion to a PID calculated heater output demand. -\subsubsection{Afferent flow FMMD analysis , Pt100, temperature, set point error, PID output demand.} +\subsubsection{Afferent flow FMMD analysis, Pt100, temperature, set point error, PID output demand.} We start with the afferent flow from the Pt100. %with the software, and consider the hardware elements %used (if any) by each software function. -Starting at the bottom we form a {\fg} with +Starting at the bottom, we form a {\fg} with the function read\_ADC and the Pt100. -This gives us a {\dc} we shall call ReadPt100. - +This gives us a {\dc} which we call ReadPt100. +% { \tiny \begin{table}[h+] @@ -942,20 +960,24 @@ This gives us a {\dc} we shall call ReadPt100. FC4: $RADC_{LOW}$ & ADC may read & $VOLTAGE\_LOW$ \\ \hline + + FC5: post condition fails & software failure & $VAL\_ERR$ \\ + in function read\_ADC & & \\ \hline + \end{tabular} \end{table} } - +% The {\dc} Read\_Pt100 is a failure mode model of the Read\_ADC function and the Pt100 hardware, and has the following failure modes: $$ fm (Read\_Pt100) = \{ VOLTAGE\_HIGH, VAL\_ERR, VOLTAGE\_LOW \}. $$ -We can now move along in the afferent flow, and we come to the convert\_ADC\_to\_T function. -This will call Read\_ADC thwice, one for the high Pt100 value, again for the lower. 
% and once for to read a current sense. +We move along the afferent flow, and we come to the convert\_ADC\_to\_T function. +This will call Read\_ADC twice, one for the high Pt100 value, again for the lower. % and once for to read a current sense. We then, calculate the resistance of the Pt100 element, and with this---using a -polynomial or a lookup table~\cite{eutothermtables}---and calculate the temperature. +polynomial or a lookup table~\cite{eurothermtables}---and calculate the temperature. The pre-conditions for the function are that: \begin{itemize} % \item The current calculated is within pre-defined bounds i.e. Pt100\_current, @@ -968,7 +990,7 @@ Note that a temperature outside the pre-defined range will also cause these erro The postcondition is that it returns a temperature within a given tolerance to the temperature at the sensor. A failure of this post-condition can be termed temp\_incorrect. \clearpage -We now apply FMMD to the {\fg} formed by Read\_Pt100 and the function convert\_ADC\_to\_T. +We apply FMMD to the {\fg} formed by Read\_Pt100 and the function convert\_ADC\_to\_T. We can call the resulting {\dc} Get\_Temperature. { @@ -1012,6 +1034,10 @@ We can call the resulting {\dc} Get\_Temperature. & range error, but may also & \\ & cause us to read an & \\ & incorrect temperature & \\ \hline + + FC5: post condition fails & software failure & temp\_incorrect \\ + in function convert\_ADC\_to\_T & & \\ \hline + \hline \end{tabular} @@ -1019,16 +1045,16 @@ We can call the resulting {\dc} Get\_Temperature. } -We now collect the failure symptoms for the {\dc} Get\_Temperature and can state: +We collect the failure symptoms for the {\dc} Get\_Temperature and can state: $$fm(Get\_Temperature) = \{ Pt100\_out\_of\_range, temp\_incorrect \}$$ \clearpage Following the afferent flow further, we come to a function to determine the control error value. -The is simply the target temperature subtracted from the measured. 
+This is simply the target temperature subtracted from the measured. We thus form a {\fg} with our newly {\dc} Get\_Temperature and the function determine\_set\_point\_error. - +% The pre-condition for determine\_set\_point\_error is that the temperature read by it is accurate, and its post condition is to return the correct control error value. Most failure modes from a Pt100 are observable. @@ -1059,7 +1085,9 @@ an incorrect error value. FC2: $temp\_incorrect$ & pre-condition violated & IncorrectErrorValue \\ & unobservable & \\ & undetectable failure mode & \\ \hline - + + FC3: post condition fails & software failure & IncorrectErrorValue \\ + in function determine\_set\_point\_error & & \\ \hline \end{tabular} @@ -1082,13 +1110,18 @@ The post-condition is that it outputs correct control values. % RESP FOR TIMEING IS ON CALLING FUNCTION AND IS A SEPARATE ERROR- TGHINK ABOUT JITTER..... % and controll values..... Jitter might not matter, wrong int times would % controlling function provdes context of use. -Those familiar with the PID algorithm may here notice raise the point of calling frequency. -were this function to be called at an incorrect rate its output +Those familiar with the PID algorithm may realise that digital signal processing algorithms are sensitive to calling frequency. +Were this function to be called at an incorrect rate its output would be wrong (the differential and integral parameters would effectively have been changed). +% However this problem is a failure mode for the function calling it. +% The calling function sets the context for the PID algorithm (i.e. what it is used for). If this PID were to be used, say as some form of low pass filter, we could consider jitter -for instance. In a control environment with PID jitter would not be a significant factor. +for instance. +% +In a control environment with PID, jitter would not be a significant factor. 
+% This harks back to the context of use (see section~\ref{sec:subjectiveobjective}) discussion, the subjective being the context the {\dc} is used for/in, and the objective being the logic and process of the failure mode analysis. @@ -1116,7 +1149,10 @@ being the logic and process of the failure mode analysis. & unobservable & \\ & undetectable failure mode & \\ \hline - + + FC3: post condition fails & software failure & IncorrectControlErrorV \\ + in function PID & & \\ \hline + \end{tabular} \end{table} @@ -1149,8 +1185,8 @@ of the efferent flow. We apply FMMD analysis to this {\fg} in table~\ref{tbl:hea For the output\_control function, we have a pre-condition that the PWM module is configured and working, and has the correct clock frequency. A second pre-condition is that the heating element is connected and working. -The post condition is that is sets the correct value into the PWM register -to implement the PWM demand. +The post condition is that it writes the correct value to the PWM register +to implement the power output demand. { \tiny @@ -1174,16 +1210,19 @@ to implement the PWM demand.
FC2: $ PWM stuck LOW $ & pre-condition violated & HeaterOff \\ & PWM module not working & \\ \hline - FC3: $ output\_control$ wrong value & The software supplies the wrong & HeaterOutputIncorrect \\ - & value to the PWM register & \\ \hline + - - FC4: HEATER $SHORT$ & heating element resistor & HeaterOff \\ + FC3: HEATER $SHORT$ & heating element resistor & HeaterOff \\ & SHORT no heating effect & \\ \hline - FC5: HEATER $OPEN $ & heating element resistor & HeaterOff \\ - & OPEN no heating effect & \\ \hline + FC4: HEATER $OPEN $ & heating element resistor & HeaterOff \\ + & OPEN no heating effect & \\ \hline + + FC5: $ output\_control$ post & The software supplies the wrong & HeaterOutputIncorrect \\ + condition failure & value to the PWM register & \\ \hline + + \end{tabular} \end{table} @@ -1206,42 +1245,25 @@ $$fm(HeaterOutput) = \{ HeaterOnFull, HeaterOff, HeaterOutputIncorrect \}$$ - - - - - - - - - - - - - - - - - - - - - - - \subsubsection{Efferent flow: LED status LEDs} The status LEDS will be controlled by general purpose (GPIO) I/O pins. +% We could have say, three LEDS one flashing with a human readable mark space ratio representing the heater output, one flashing at a regular interval to indicate the processor is alive and another flashing at an interval related to the temperature, (to indicate if the temperature readings are within expected ranges). +% Each LED should flash in normal operation, and any LED being permanently on or off would indicate to the operator that an error had occurred. -The pre condition for this function is that the GPIO +% +The pre-condition for this function is that the GPIO is connected to working LEDS. -The post condition is that the function setLEDS, will supply correct indication by flashing the LEDs. +% +The post condition is that the function setLEDs will give a correct indication by flashing the LEDs. +% We form a {\fg} from the GPIO, the LEDs and the software function setLEDs.
+% We apply FMMD analysis to this {\fg} in table~\ref{tbl:ledoutput}. { @@ -1297,6 +1319,8 @@ We apply FMMD analysis to this {\fg} in table~\ref{tbl:ledoutput}. \end{figure} +Our {\dc} for the setLEDs function, GPIO and LEDs has the following failure modes: +$$ fm(LEDoutput) = \{FailureIndicated, IndicationError \} $$ \subsubsection{Final Analysis Stage: PID Temperature Controller} @@ -1347,7 +1371,7 @@ The post condition for the monitor function is that it implements the PID contro & observable error can be indicated & \\ \hline - FC2: PID IncorrectControlerrorV & undetectable/iunobservable & ControlFailure \\ + FC2: PID IncorrectControlErrorV & undetectable/unobservable & ControlFailure \\ & failure PID will not control properly & \\ \hline FC3: HeaterOutput & Heater will constantly & ControlFailureIndicated \\ @@ -1398,7 +1422,7 @@ The post condition for the monitor function is that it implements the PID contro We can now create a {\dc} for the standalone temperature controller, and give it the name TempController. It will have the following failure modes: -$$fm ( TempController ) = \{ ControlFailureIndicated, ControlFailure, KnownIndicationError, UnknownIndicationError \}$$ +$$fm ( TempController ) = \{ ControlFailureIndicated, ControlFailure, KnownIndicationError, UnknownIndicationError \}.$$ We can now represent this failure mode analysis as an Euler diagram, see figure~\ref{fig:euler_temp_controller}.
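Each stage above ends by collecting the symptoms of a {\fg} as the failure modes of a new {\dc}, which amounts to taking the union of the symptom column of its analysis table. A minimal C sketch of that step for the HeaterOutput group follows, representing symptoms as bit flags so that the union is a bitwise OR. The rows for PWM stuck LOW, HEATER SHORT, HEATER OPEN and the output\_control post condition failure are transcribed from the HeaterOutput table. The PWM stuck HIGH row giving HeaterOnFull is an assumption, inferred from HeaterOnFull appearing in fm(HeaterOutput). All type and function names are hypothetical.

```c
#include <stddef.h>

/* Symptoms of the HeaterOutput functional group as bit flags, so that a
   set union is a bitwise OR. */
enum {
    HEATER_ON_FULL          = 1u << 0,
    HEATER_OFF              = 1u << 1,
    HEATER_OUTPUT_INCORRECT = 1u << 2
};

/* One row of an FMMD analysis table: a failure cause and its symptom. */
typedef struct {
    const char *cause;
    unsigned    symptom;
} fmmd_row;

static const fmmd_row heater_output_rows[] = {
    { "PWM stuck HIGH",                        HEATER_ON_FULL },  /* ASSUMED row */
    { "PWM stuck LOW",                         HEATER_OFF },
    { "HEATER SHORT",                          HEATER_OFF },
    { "HEATER OPEN",                           HEATER_OFF },
    { "output_control post condition failure", HEATER_OUTPUT_INCORRECT },
};

/* fm(derived component) = union of the symptoms of every row. */
unsigned collect_symptoms(const fmmd_row *rows, size_t n)
{
    unsigned fm = 0;
    for (size_t i = 0; i < n; i++)
        fm |= rows[i].symptom;
    return fm;
}
```

Several distinct causes collapse onto the single symptom HeaterOff, which is why the derived component has fewer failure modes than its {\fg} has failure causes.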
@@ -1408,15 +1432,27 @@ We can now represent this failure mode analysis as an Euler diagram, see figure~ \centering \includegraphics[width=300pt]{./CH5_Examples/euler_temp_controller.png} % euler_temp_controller.png: 714x251 pixel, 72dpi, 25.19x8.85 cm, bb=0 0 714 251 - \caption{euler diagram of the temperature controller final anaysis stage, showing the hybrid software/hardware {\dcs} and the function at the head of the call tree `monitor'.} + \caption{Euler diagram of the temperature controller final analysis stage, showing the hybrid software/hardware {\dcs} and the function at the head of the call tree `monitor'.} \label{fig:euler_temp_controller} \end{figure} +\subsection{Conclusion: Standalone system, PID Temperature Controller} - - - - +The PID temperature control example above shows that complete hybrid software/electronic systems can be +modelled using FMMD. The analysis has revealed system level failure modes that are unhandled and some that are unobservable, +and the FMMD analysis identifies exactly which failure modes these are. For the failure modes caused +by electronics we can apply reliability statistics. +% +For software errors, we could, if necessary, add functions that provide self-checking. +We could follow EN61508 high reliability software measures such as +duplication of functions with checking functions arbitrating them (diverse programming~\cite[C.3.5]{en61508}). +% +We could, for instance, validate the processor clocking with an external watchdog and a simple +communications protocol. For PROM and RAM faults we can implement measures such as checksums +and RAM complement checking. +% +Using FMMD on these extra safety measures, we can verify that no single failure could lead to a +system failure, something impossible with current FMEA techniques.
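The RAM complement checking mentioned above can take the following form: a safety-relevant variable is stored together with its bitwise complement, and every read checks that the two copies still agree. This is a minimal C sketch with hypothetical names, not a prescribed EN61508 implementation. It detects any corruption that alters only one of the two copies, but not a fault that changes both consistently, so it complements rather than replaces the other measures listed.

```c
#include <stdbool.h>
#include <stdint.h>

/* A safety-relevant variable stored twice: once as-is, once inverted.
   Corrupting either copy alone breaks the complement relationship. */
typedef struct {
    uint16_t value;
    uint16_t complement;
} guarded_u16;

void guarded_write(volatile guarded_u16 *g, uint16_t v)
{
    g->value = v;
    g->complement = (uint16_t)~v;
}

/* Returns false, leaving *out untouched, if the copies disagree; the
   caller should treat that as a detected RAM fault. */
bool guarded_read(const volatile guarded_u16 *g, uint16_t *out)
{
    uint16_t v = g->value;
    if ((uint16_t)~v != g->complement)
        return false;
    *out = v;
    return true;
}
```

In FMMD terms, adding this check turns an otherwise unobservable variable corruption into a detectable failure mode of the function that reads the variable.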