JMC proofread.

Robin P. Clark 2012-12-03 19:15:47 +00:00
parent 85cb3f9c14
commit d2ea8b9c2c
2 changed files with 132 additions and 96 deletions


@ -5,7 +5,7 @@
\section{Software and Hardware Failure Mode Concepts}
\label{sec:elecsw}
In this chapter we show that FMMD can be applied to software enabling us to build build complete failure models
In this chapter we show that FMMD can be applied to both software and electronics enabling us to build complete failure models
of typical modern safety critical systems.
With modular FMEA i.e. FMMD %(FMMD)
we have the concepts of failure~modes
@ -44,7 +44,7 @@ and the subsequent hierarchy.
%
With software already written, the hierarchies are given.
%
To apply FMMD to software, we collect the elements used by a software function, along with the function its-self
To apply FMMD to software, we collect the elements used by a software function, along with the function itself
to form a {\fg}. When we have analysed the failure mode behaviour of this {\fg}
and have its failure mode symptoms, we can create a {\dc}. That {\dc} can be
used by functions that call the function we have just analysed, until
@ -131,23 +131,23 @@ A post condition is a definition of correct behaviour of a function.
A violated post condition is a symptom of failure, or derived failure mode, from a function.
%
Post conditions could be either actions performed (i.e. the state of hardware changed) or an output value of a function.
In pure contract programming, a violation of a pre-condition would not cause the function to
be executed.
In pure contract programming, a violation of a pre-condition would cause the function to
\textbf{not} be executed.
%
In implementation code, a pre-condition violation should cause
an error to be generated, and thus a post condition to fail.
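As a minimal sketch of this idea in `C', a pre-condition violation can be reported as an error so that the caller sees a failed post condition. The function scale\_per\_mil and the result codes are hypothetical and are used for illustration only; they are not part of the analysed code.

```c
/* Hypothetical result codes, for illustration only. */
typedef enum { RESULT_OK, RESULT_PRECONDITION_VIOLATED } result_t;

/* Hypothetical example: scale a voltage in the range [v_min, v_max]
 * to a per mil value (0..1000).
 * Pre-condition:  v_min < v_max and v_min <= v <= v_max.
 * Post condition: *per_mil holds a value in 0..1000. */
result_t scale_per_mil(double v, double v_min, double v_max, int *per_mil)
{
    /* Pre-condition check: the body is not executed on invalid input,
     * and the caller is told that the post condition cannot be met. */
    if (!(v_min < v_max) || v < v_min || v > v_max) {
        return RESULT_PRECONDITION_VIOLATED;
    }
    /* Round to the nearest per mil step. */
    *per_mil = (int)(((v - v_min) / (v_max - v_min)) * 1000.0 + 0.5);
    return RESULT_OK;
}
```

In this sketch the output variable is left untouched when the pre-condition fails, so a caller that ignores the returned code would use a stale value; that would be a failure mode of the calling function.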
%
A function can fail for reasons other than the
a failure of one the variables/inputs or functions that it calls.
Variables can become corrupted, by radiation affecting RAM or
by another software function erroneously overwriting variables.
failure of one of the variables/inputs or functions that it calls.
Variables can become corrupted, by radiation affecting RAM~\cite{5488118,5963919} or
by another software function erroneously overwriting variables~\cite{swseatbelt}.
Current work on software FMEA generally focuses on mapping
variable corruption to failure modes~\cite{procsfmea,procsfmeadb,sfmeaauto,sfmea}.
However, errors other than variable corruption can occur,
for instance a microprocessor may have subtle bugs in its instruction set or
for instance a microprocessor may have subtle bugs in its instruction set, or
incorrectly handled
interrupt contention could cause side effects in software.
For the failure mode model of any software function
interrupt contention which could cause side effects in software.
For the failure mode model of any software function,
we must consider all failure modes of post condition
violations as well as those caused by `components'.
@ -227,7 +227,7 @@ We can now examine a software function that performs a conversion from the volta
a per~mil representation of the {\ft} input current.
%
For the purpose of example the `C' programming language~\cite{DBLP:books/ph/KernighanR88} is used.
We initially assume a function \textbf{read\_ADC} which returns a floating point %double precision
We initially assume a function \textbf{read\_ADC} that returns a floating point %double precision
value which represents the voltage read (see code sample in figure~\ref{fig:code_read_4_20_input}).
@ -446,13 +446,13 @@ With these failure modes, we can analyse our first functional group, see table~\
}
We now collect the symptoms for the hardware functional group, $\{ HIGH , LOW, V\_ERR \} $.
We now create a {\dc} to represent this called $CMATV$.
We now collect the symptoms for the hardware functional group, $\{ HIGH , LOW, V\_ERR \} $, and
create a {\dc} to represent this, called $CMATV$.
%We can express this using the `$\derivec$' function thus:
%$$ CMATV = \; \derivec (G_1) .$$
As its failure modes, are the symptoms of failure from the functional group we can now state:
As its failure modes are the symptoms of failure from the functional group, we state:
$$fm ( CMATV ) = \{ HIGH , LOW, V\_ERR \} .$$
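The step of collecting symptoms can be sketched in `C' as gathering the unique entries of the symptom column of an analysis table. The type fmmd\_row\_t and the function collect\_symptoms are hypothetical names introduced for this illustration; the sketch assumes each table row pairs one failure cause with one symptom.

```c
#include <string.h>

#define MAX_SYMPTOMS 8

/* One row of an FMMD analysis table: a failure cause and its symptom. */
typedef struct {
    const char *failure_cause;
    const char *symptom;
} fmmd_row_t;

/* Collect the unique symptoms from the analysis rows of a functional group.
 * The resulting set forms the failure modes of the derived component.
 * Returns the number of symptoms written to out (at most MAX_SYMPTOMS). */
int collect_symptoms(const fmmd_row_t *rows, int n_rows,
                     const char *out[MAX_SYMPTOMS])
{
    int n_out = 0;
    for (int i = 0; i < n_rows; i++) {
        int seen = 0;
        for (int j = 0; j < n_out; j++) {
            if (strcmp(out[j], rows[i].symptom) == 0) {
                seen = 1;
                break;
            }
        }
        if (!seen && n_out < MAX_SYMPTOMS) {
            out[n_out++] = rows[i].symptom;
        }
    }
    return n_out;
}
```

Several failure causes may map to the same symptom, which is why the derived component typically has fewer failure modes than the functional group has failure causes.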
@ -475,7 +475,7 @@ which we can call $ CHAN\_NO $.
%
The reference voltage for the ADC has a 0.1\% accuracy requirement.
%
If the reference value is outside of this, it is also a {\fm}
If the reference value is outside this, it is also a {\fm}
of this function, which we can call $V\_REF$.
Taken as a component for use in FMEA/FMMD our function has
@ -484,10 +484,10 @@ by stating:
$$ fm(Read\_ADC) = \{ CHAN\_NO, V\_REF \} $$
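These two pre-conditions can be sketched as a check made before the ADC is read. The number of channels, the nominal reference voltage and the function name check\_read\_adc\_preconditions are assumptions made for this illustration, as is the idea that the reference voltage can be measured independently; only the 0.1\% tolerance comes from the requirement stated above.

```c
/* Hypothetical constants, for illustration only. */
#define ADC_NUM_CHANNELS 8        /* assumed number of MUX channels */
#define VREF_NOMINAL     5.0      /* assumed nominal reference, volts */
#define VREF_TOLERANCE   0.001    /* 0.1% accuracy requirement */

typedef enum { ADC_PRE_OK, ADC_FM_CHAN_NO, ADC_FM_V_REF } adc_precheck_t;

/* Map each violated pre-condition of the ADC read function
 * onto the corresponding failure mode. */
adc_precheck_t check_read_adc_preconditions(int channel, double vref_measured)
{
    if (channel < 0 || channel >= ADC_NUM_CHANNELS) {
        return ADC_FM_CHAN_NO;
    }
    double deviation = vref_measured - VREF_NOMINAL;
    if (deviation < 0.0) {
        deviation = -deviation;
    }
    if (deviation > VREF_NOMINAL * VREF_TOLERANCE) {
        return ADC_FM_V_REF;
    }
    return ADC_PRE_OK;
}
```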
As we have a failure mode model for our function, we can now use it in conjunction with
As we have a failure mode model for our function, we use it in conjunction with
the ADC hardware {\dc} CMATV, to form a {\fg} $G_2$, where $G_2 =\{ CMATV, Read\_ADC \}$.
We now analyse this hardware/software combined {\fg}.
%
We analyse this hardware/software combined {\fg}.
@ -724,21 +724,25 @@ addressed using the Proportional Integral differential (PID) algorithm~\cite{dco
Traditionally this was performed in analogue electronics
with trimmer potentiometers providing the P and I parameters.
Since the introduction of micro-processors, it has been possible to
implement PID programmatic-ally.
implement PID programmatically.
An FMMD analysis of a PID temperature controller would mean an
analysis of a standalone system without being un-wieldingly large.
\paragraph{PID Temperature Control.}
analysis of a realistic standalone system without it becoming an unwieldy task.
\paragraph{The PID Temperature Control Algorithm.}
PID control starts with a setpoint, or desired value for a process
(here the temperature). It reads the process value and determines an error value for it.
The aim of the PID controller is to minimise this error term, by setting an output value,
which is fed back into the process (in this example the amount of power to supply the heater).
The error value is integrated and multiplied by an I constant.
A differential of the error value is calculated and multiplied by a D constant.
The error value its self is multiplied by a P constant, and all three of these are added
The error value itself is multiplied by a P constant, and all three of these are added
to obtain the output required.
%
A mathematical description of PID with frequency domain modelling (Laplace transforms etc.)
may be found in~\cite{dcods}[Ch.3.3].
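The description above can be sketched as a single discrete-time step in `C'. The names pid\_state\_t and pid\_step are hypothetical, and the sign convention for the error (setpoint minus process value) and the use of a fixed interval dt between calls are assumptions of this sketch rather than details of the analysed controller.

```c
typedef struct {
    double kp, ki, kd;     /* P, I and D constants */
    double integral;       /* running integral of the error */
    double prev_error;     /* error seen on the previous call */
} pid_state_t;

/* One PID step. dt is the interval in seconds between calls
 * and is assumed to be constant and greater than zero. */
double pid_step(pid_state_t *s, double setpoint, double process_value, double dt)
{
    double error = setpoint - process_value;

    s->integral += error * dt;                       /* integral term input */
    double derivative = (error - s->prev_error) / dt; /* differential term input */
    s->prev_error = error;

    /* Sum of the three weighted terms gives the output demand. */
    return s->kp * error + s->ki * s->integral + s->kd * derivative;
}
```

Because dt appears in both the integral and the differential terms, calling this function at the wrong rate has the same effect as changing the I and D constants.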
%
\subsection{Design Stage: Implementation on a micro-controller.}
When designing a computer program it is often useful to
produce a structured analysis `Yourdon' context diagram~\cite{Yourdon:1989:MSA:62004}, see figure~\ref{fig:context_diagram_PID}.
start with a structured analysis `Yourdon' context diagram~\cite{Yourdon:1989:MSA:62004}, see figure~\ref{fig:context_diagram_PID}.
\begin{figure}[h]+
\centering
@ -747,16 +751,21 @@ produce a structured analysis `Yourdon' context diagram~\cite{Yourdon:1989:MSA:6
\caption{Yourdon Context Diagram for PID Temperature Controller.}
\label{fig:context_diagram_PID}
\end{figure}
Using figure~\ref{fig:context_diagram_PID} we review the system in terms of its data flow, starting
with the data sources (the Pt100 inputs) and the data sinks (the heater output and the LED indicators).
%
We have two voltage inputs (see section~\ref{sec:Pt100}) from the Pt100 temperature sensor.
For the Pt100 sensor, we will need to read the voltages it outputs and for this
we will need an ADC and MUX.
we will therefore require an ADC and MUX.
%
For the output, we can use a Pulse Width Modulator (PWM) (this is a common module found on micro-controllers
allowing a variable power output~\cite{pwm}). PWM's ADC's and MUX's are commonly built into cheap micro-controllers~\cite{pic18f2523}.
We can now build more detail into the Yourdon diagram, with the afferent data flow coming through the MUX and ADC on the micro-controller, and the efferent
allowing a variable power output~\cite{aoe}[p.360]). PWMs, ADCs and MUXs are commonly built into cheap micro-controllers~\cite{pic18f2523}[Ch.15].
We refine the Yourdon diagram, with the afferent data flow coming through the MUX and ADC on the micro-controller, and the efferent
channelled through a PWM module, %again built into the micro-controller,
%
see figure~\ref{fig:context_diagram2_PID}.
and add more detail, see figure~\ref{fig:context_diagram2_PID}.
\begin{figure}[h]+
\centering
\includegraphics[width=300pt]{./CH5_Examples/context_diagram2_PID.png}
@ -774,10 +783,12 @@ We refine the data flow within the software and thus define software functions.
%this in terms of software functions.
%
We follow the data streams through the process, creating transform bubbles as required.
In all `bare~metal' software architectures, we need a rudimentary operating system, often referred to as the monitor.
In all `bare~metal'\footnote{`Bare~metal' is a term used to indicate a micro-processor
controlled system that does not use a traditional operating system.}
software architectures, we need a rudimentary operating system, often referred to as the `monitor'.
%
We bare in mind that PID, because the algorithm depends heavily on integral calculus, is time sensitive
and we therefore need to call at precise intervals specific to its integration and differential coefficients.
We bear in mind that PID, because the algorithm depends heavily on integral calculus, is time sensitive
and we therefore need to call it at precise intervals determined by its proportional, integral and differential (PID) coefficients.
%
Most micro-controllers feature several general purpose timers~\cite{pic18f2523}.
We can use an internal timer in conjunction with the monitor function
@ -798,7 +809,8 @@ Using figure~\ref{fig:contextsoftware} we can now pick the transform bubble we
want to be the `main' or controlling function in the software.
This can be thought of as picking one bubble and holding it up. The other bubbles hang underneath
forming the software call tree hierarchy, see figure~\ref{fig:context_calltree}.
From is clearly going to be the monitor function.
From examining the diagram, and with common embedded programming practice,
this is clearly going to be the monitor function.
\begin{figure}[h]+
\centering
\includegraphics[width=300pt]{./CH5_Examples/context_calltree.png}
@ -819,11 +831,13 @@ With the set point error value the PID function will return
function (i.e. the PID
demand which will be returned to the monitor function).
%
On returning to the monitor function, it will return the PID demand value.
%On returning to the monitor function, it will return the PID demand value.
The PID demand value will be applied via the PWM.
We now have a rudimentary closed loop control system incorporating both hardware and software.
%
Using the Yourdon methodology we have the system design: we have all the components, i.e. hardware elements and software functions
By using the Yourdon methodology we have defined the programmatic design, or call tree.
%
We now have all the components, i.e. hardware elements and software functions
that will be used in the temperature controller.
We list these, and begin, from the bottom-up, to apply FMMD analysis.
@ -838,7 +852,7 @@ Identified electronic components:
\item HEATER --- Heating element, essentially a resistor.
\item Pt100 --- Pt100 Temperature sensor, as analysed in section~\ref{sec:Pt100}.
\item PWM --- Internal micro controller pulse width modulation module
\item General Purpose I/O (GPIO) ---
\item General Purpose I/O (GPIO) --- I/O used to source LED current
\item LEDs --- Indication LEDs via GPIO
\item micro-controller --- the medium for running the software
\end{itemize}
@ -876,10 +890,14 @@ $$ fm(Pt100) = \{ OUT\_OF\_RANGE \} $$
\paragraph{PWM}
The PWM, in use, is a hardware register written to with an integer value.
It then applies a mark space ratio proportional to that value providing
a means of applying varying amounts of power. When the PWM
action is halted the digital output pin associated with it will typically be held in a high or low state.
%The PWM, in use, is a hardware register written to with an integer value~\cite{pic182523}[Ch.15].
From a programmatic perspective a PWM output is a register that software writes
an unsigned magnitude value to~\cite{pic18f2523}[Ch.15].
The PWM hardware module
applies this using a mark space ratio proportional to that value, providing
a means of varying the amount of power supplied.
When the PWM action is halted, or fails, the digital output pin associated with it
will typically be held in a high or low state.
We therefore state:
$$ fm(PWM) = \{ HIGH, LOW \}.$$
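From the software side, writing to the PWM can be sketched as scaling a power demand to a register value. The 10-bit register width, the per~mil demand range and the function name pwm\_duty\_from\_demand are assumptions made for this illustration; the write to the actual hardware register is omitted.

```c
#include <stdint.h>

#define PWM_MAX_COUNT 1023u   /* assumed 10-bit duty register */

/* Convert a per mil power demand (0..1000) into a duty register value.
 * Demands outside the valid range are clamped. */
uint16_t pwm_duty_from_demand(int demand_per_mil)
{
    if (demand_per_mil < 0) {
        demand_per_mil = 0;
    }
    if (demand_per_mil > 1000) {
        demand_per_mil = 1000;
    }
    /* Integer scaling with rounding to the nearest count. */
    return (uint16_t)(((uint32_t)demand_per_mil * PWM_MAX_COUNT + 500u) / 1000u);
}
```

Clamping is one possible design choice; a stricter contract would treat an out-of-range demand as a pre-condition violation and report it to the caller.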
@ -909,10 +927,10 @@ to a PID calculated heater output demand.
We start with the afferent flow from the Pt100.
%with the software, and consider the hardware elements
%used (if any) by each software function.
Starting at the bottom we form a {\fg} with
Starting at the bottom, we form a {\fg} with
the function read\_ADC and the Pt100.
This gives us a {\dc} we shall call ReadPt100.
This gives us a {\dc} which we call Read\_Pt100.
%
{
\tiny
\begin{table}[h+]
@ -942,20 +960,24 @@ This gives us a {\dc} we shall call ReadPt100.
FC4: $RADC_{LOW}$ & ADC may read & $VOLTAGE\_LOW$ \\ \hline
FC5: post condition fails & software failure & $VAL\_ERR$ \\
in function read\_ADC & & \\ \hline
\end{tabular}
\end{table}
}
%
The {\dc} Read\_Pt100 is a failure mode model of the Read\_ADC function and the Pt100
hardware, and has the following failure modes:
$$ fm (Read\_Pt100) = \{ VOLTAGE\_HIGH, VAL\_ERR, VOLTAGE\_LOW \}. $$
We can now move along in the afferent flow, and we come to the convert\_ADC\_to\_T function.
This will call Read\_ADC thwice, one for the high Pt100 value, again for the lower. % and once for to read a current sense.
We move along the afferent flow, and we come to the convert\_ADC\_to\_T function.
This will call Read\_ADC twice, once for the high Pt100 value and again for the lower. % and once for to read a current sense.
We then calculate the resistance of the Pt100 element, and with this---using a
polynomial or a lookup table~\cite{eutothermtables}---and calculate the temperature.
polynomial or a lookup table~\cite{eurothermtables}---calculate the temperature.
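The lookup table approach can be sketched as linear interpolation between tabulated points. The function name pt100\_resistance\_to\_temp and the four-entry table are illustrative assumptions; the resistance values are standard Pt100 values rounded to two decimal places, and a practical table would be much finer.

```c
#define PT100_TABLE_LEN 4

/* Coarse illustrative table of temperature against Pt100 resistance. */
static const double pt100_temp_c[PT100_TABLE_LEN] = {   0.0,  100.0,  200.0,  300.0 };
static const double pt100_ohms[PT100_TABLE_LEN]   = { 100.00, 138.51, 175.86, 212.05 };

/* Convert a Pt100 resistance to a temperature by linear interpolation.
 * Returns 0 and writes *temp_c on success.
 * Returns -1 if the resistance is outside the table (out of range). */
int pt100_resistance_to_temp(double ohms, double *temp_c)
{
    if (ohms < pt100_ohms[0] || ohms > pt100_ohms[PT100_TABLE_LEN - 1]) {
        return -1;
    }
    for (int i = 1; i < PT100_TABLE_LEN; i++) {
        if (ohms <= pt100_ohms[i]) {
            double frac = (ohms - pt100_ohms[i - 1])
                        / (pt100_ohms[i] - pt100_ohms[i - 1]);
            *temp_c = pt100_temp_c[i - 1]
                    + frac * (pt100_temp_c[i] - pt100_temp_c[i - 1]);
            return 0;
        }
    }
    return -1; /* not reached for in-range input */
}
```

The out-of-range return corresponds to an observable error, whereas an interpolation error inside the valid range would surface only as an incorrect temperature.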
The pre-conditions for the function are that:
\begin{itemize}
% \item The current calculated is within pre-defined bounds i.e. Pt100\_current,
@ -968,7 +990,7 @@ Note that a temperature outside the pre-defined range will also cause these erro
The postcondition is that it returns a temperature within a given tolerance to the temperature at the sensor.
A failure of this post-condition can be termed temp\_incorrect.
\clearpage
We now apply FMMD to the {\fg} formed by Read\_Pt100 and the function convert\_ADC\_to\_T.
We apply FMMD to the {\fg} formed by Read\_Pt100 and the function convert\_ADC\_to\_T.
We can call the resulting {\dc} Get\_Temperature.
{
@ -1012,6 +1034,10 @@ We can call the resulting {\dc} Get\_Temperature.
& range error, but may also & \\
& cause us to read an & \\
& incorrect temperature & \\ \hline
FC5: post condition fails & software failure & temp\_incorrect \\
in function convert\_ADC\_to\_T & & \\ \hline
\hline
\end{tabular}
@ -1019,16 +1045,16 @@ We can call the resulting {\dc} Get\_Temperature.
}
We now collect the failure symptoms for the {\dc} Get\_Temperature and can state:
We collect the failure symptoms for the {\dc} Get\_Temperature and can state:
$$fm(Get\_Temperature) = \{ Pt100\_out\_of\_range, temp\_incorrect \}$$
\clearpage
Following the afferent flow further, we come to a function to determine the control error value.
The is simply the target temperature subtracted from the measured.
This is simply the target temperature subtracted from the measured.
We thus form a {\fg} with our newly created {\dc} Get\_Temperature
and the function determine\_set\_point\_error.
%
The pre-condition for determine\_set\_point\_error is that the temperature read by it
is accurate, and its post condition is to return the correct control error value.
Most failure modes from a Pt100 are observable.
@ -1060,6 +1086,8 @@ an incorrect error value.
& unobservable & \\
& undetectable failure mode & \\ \hline
FC3: post condition fails & software failure & IncorrectErrorValue \\
in function determine\_set\_point\_error & & \\ \hline
\end{tabular}
@ -1082,13 +1110,18 @@ The post-condition is that it outputs correct control values.
% RESP FOR TIMEING IS ON CALLING FUNCTION AND IS A SEPARATE ERROR- TGHINK ABOUT JITTER.....
% and controll values..... Jitter might not matter, wrong int times would
% controlling function provdes context of use.
Those familiar with the PID algorithm may here notice raise the point of calling frequency.
were this function to be called at an incorrect rate its output
Those familiar with the PID algorithm may realise that digital signal processing algorithms are sensitive to calling frequency.
Were this function to be called at an incorrect rate, its output
would be wrong (the differential and integral parameters would effectively have been changed).
%
However this problem is a failure mode for the function calling it.
%
The calling function sets the context for the PID algorithm (i.e. what it is used for).
If this PID were to be used, say as some form of low pass filter, we could consider jitter
for instance. In a control environment with PID jitter would not be a significant factor.
for instance.
%
In a control environment with PID, jitter would not be a significant factor.
%
This harks back to the context of use (see section~\ref{sec:subjectiveobjective}) discussion, the subjective
being the context the {\dc} is used for/in, and the objective
being the logic and process of the failure mode analysis.
@ -1117,6 +1150,9 @@ being the logic and process of the failure mode analysis.
& undetectable failure mode & \\ \hline
FC3: post condition fails & software failure & IncorrectControlErrorV \\
in function PID & & \\ \hline
\end{tabular}
\end{table}
@ -1149,8 +1185,8 @@ of the efferent flow. We apply FMMD analysis to this {\fg} in table~\ref{tbl:hea
For the output\_control function, we have a pre-condition that the PWM module is
configured and working, and has the correct clock frequency.
A second pre-condition is that the heating element is connected and working.
The post condition is that is sets the correct value into the PWM register
to implement the PWM demand.
The post condition is that it sets the correct value into the PWM register
to implement the power output demand.
{
\tiny
@ -1174,17 +1210,20 @@ to implement the PWM demand.
FC2: $ PWM stuck LOW $ & pre-condition violated & HeaterOff \\
& PWM module not working & \\ \hline
FC3: $ output\_control$ wrong value & The software supplies the wrong & HeaterOutputIncorrect \\
& value to the PWM register & \\ \hline
FC4: HEATER $SHORT$ & heating element resistor & HeaterOff \\
FC3: HEATER $SHORT$ & heating element resistor & HeaterOff \\
& SHORT no heating effect & \\ \hline
FC5: HEATER $OPEN $ & heating element resistor & HeaterOff \\
FC4: HEATER $OPEN $ & heating element resistor & HeaterOff \\
& OPEN no heating effect & \\ \hline
FC5: $ output\_control$ post & The software supplies the wrong & HeaterOutputIncorrect \\
condition failure & value to the PWM register & \\ \hline
\end{tabular}
\end{table}
}
@ -1206,42 +1245,25 @@ $$fm(HeaterOutput) = \{ HeaterOnFull, HeaterOff, HeaterOutputIncorrect \}$$
\subsubsection{Efferent flow: status LEDs}
The status LEDs will be controlled by general purpose I/O (GPIO) pins.
%
We could have, say, three LEDs: one flashing with a human readable mark
space ratio representing the heater output, one flashing at a regular interval to
indicate the processor is alive, and another flashing at an interval related to the temperature
(to indicate if the temperature readings are within expected ranges).
%
Each LED should flash in normal operation, and any LED being permanently on or off
would indicate to the operator that an error had occurred.
The pre condition for this function is that the GPIO
%
The pre-condition for this function is that the GPIO
is connected to working LEDs.
The post condition is that the function setLEDS, will supply correct indication by flashing the LEDs.
%
The post condition is that the function setLEDs will supply correct indication by flashing the LEDs.
%
We form a {\fg} from the GPIO, the LEDs and the software function setLEDs.
%
We apply FMMD analysis to this {\fg} in table~\ref{tbl:ledoutput}.
{
@ -1297,6 +1319,8 @@ We apply FMMD analysis to this {\fg} in table~\ref{tbl:ledoutput}.
\end{figure}
Our {\dc} for the setLEDs function, GPIO and LEDs has the following failure modes:
$$ fm(LEDoutput) = \{FailureIndicated, IndicationError \} $$
\subsubsection{Final Analysis Stage: PID Temperature Controller}
@ -1347,7 +1371,7 @@ The post condition for the monitor function is that it implements the PID contro
& observable error can be indicated & \\ \hline
FC2: PID IncorrectControlerrorV & undetectable/iunobservable & ControlFailure \\
FC2: PID IncorrectControlErrorV & undetectable/unobservable & ControlFailure \\
& failure PID will not control properly & \\ \hline
FC3: HeaterOutput & Heater will constantly & ControlFailureIndicated \\
@ -1398,7 +1422,7 @@ The post condition for the monitor function is that it implements the PID contro
We can now create a {\dc} for the standalone temperature controller, and give it the name TempController.
It will have the following failure modes:
$$fm ( TempController ) = \{ ControlFailureIndicated, ControlFailure, KnownIndicationError, UnknownIndicationError \}$$
$$fm ( TempController ) = \{ ControlFailureIndicated, ControlFailure, KnownIndicationError, UnknownIndicationError \}.$$
We can now represent this failure mode analysis as an Euler diagram, see figure~\ref{fig:euler_temp_controller}.
@ -1408,15 +1432,27 @@ We can now represent this failure mode analysis as an Euler diagram, see figure~
\centering
\includegraphics[width=300pt]{./CH5_Examples/euler_temp_controller.png}
% euler_temp_controller.png: 714x251 pixel, 72dpi, 25.19x8.85 cm, bb=0 0 714 251
\caption{euler diagram of the temperature controller final anaysis stage, showing the hybrid software/hardware {\dcs} and the function at the head of the call tree `monitor'.}
\caption{Euler diagram of the temperature controller final analysis stage, showing the hybrid software/hardware {\dcs} and the function at the head of the call tree `monitor'.}
\label{fig:euler_temp_controller}
\end{figure}
\subsection{Conclusion: Standalone system, PID Temperature Controller}
The PID temperature control example above shows that complete hybrid software/electronic systems can be
modelled using FMMD. The analysis has revealed system level failure modes that are unhandled and some that are unobservable,
but the FMMD analysis shows which failure modes they are. For the failure modes caused
by electronics we can apply reliability statistics.
%
For software errors, we could, if necessary, provide extra functions to provide self checking.
We could follow EN61508 high reliability software measures such as
duplication of functions with checking functions arbitrating them (diverse programming~\cite{en61508}[C.3.5]).
%
We could for instance validate the processor clocking with an external watchdog and a simple
communications protocol. For PROM and RAM faults we can implement measures such as checksums
and RAM complement checking.
%
Using FMMD on these extra safety measures we can ensure no single failure could lead to a
system failure, something impossible with current FMEA techniques.
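Two of the measures mentioned above, a checksum and a RAM complement check, can be sketched in `C' as follows. The function names checksum8 and ram\_complement\_check are hypothetical, the checksum is a simple additive one chosen for brevity, and the sketch assumes the tested RAM region is not in use by an interrupt routine while the check runs.

```c
#include <stddef.h>
#include <stdint.h>

/* Simple additive 8-bit checksum over a memory image (e.g. PROM contents).
 * The result would be compared against a value stored at build time. */
uint8_t checksum8(const uint8_t *data, size_t len)
{
    uint8_t sum = 0;
    for (size_t i = 0; i < len; i++) {
        sum = (uint8_t)(sum + data[i]);
    }
    return sum;
}

/* RAM complement check: for each byte, write its complement, read it back,
 * then restore the original value and read that back.
 * Returns 0 if every cell held both values, -1 at the first failure. */
int ram_complement_check(volatile uint8_t *ram, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        uint8_t original = ram[i];
        uint8_t inverted = (uint8_t)~original;

        ram[i] = inverted;
        if (ram[i] != inverted) {
            return -1;
        }
        ram[i] = original;
        if (ram[i] != original) {
            return -1;
        }
    }
    return 0;
}
```

These checking functions would themselves become components in the FMMD hierarchy, so that their own failure modes are analysed along with the functions they protect.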