Evening bash---mostly Chapter 2
conversion from presentation.
This commit is contained in:
parent
5e6f0206d0
commit
45929cb921
@ -734,7 +734,10 @@ hardware limitations etc.
|
||||
|
||||
The interface FMEA does serve to provide a useful
|
||||
check-list to ensure data and synchronisation conventions used by the hardware
|
||||
and software are not mismatched.
|
||||
and software are not mismatched. However, the fact it is perceived as required
|
||||
highlights the mismatches possible between the two types of analysis
|
||||
which could run deeper than the mere interface level.
|
||||
|
||||
|
||||
However, while these techniques ensure that the software and hardware are
viewed and analysed from several perspectives, the result cannot be termed a homogeneous
|
||||
|
@ -3,7 +3,7 @@
|
||||
#
|
||||
# Place all .dia files here as .png targets
|
||||
#
|
||||
DIA =
|
||||
DIA = ftcontext.png
|
||||
|
||||
|
||||
doc: $(DIA)
|
||||
|
@ -15,30 +15,36 @@ on the behaviour and safety of the system."
|
||||
%\tableofcontents[currentsection]
|
||||
|
||||
|
||||
FMEA is a broad term, and can mean anything from an informal check on
how failures could affect some equipment in an initial brain-storming session
in product design to formal submissions as part of safety critical certification
procedures.
|
||||
This chapter describes the basic concepts, uses a simple example to
|
||||
demonstrate an FMEA stage and then explores some concepts with which we can evaluate
|
||||
the effectiveness of FMEA.
|
||||
|
||||
|
||||
|
||||
\subsection{FMEA}
|
||||
This talk introduces Failure Mode Effects Analysis, and the different ways it is applied.
|
||||
These techniques are discussed, and then
|
||||
a refinement is proposed, which is essentially a modularisation of the FMEA process.
|
||||
% \subsection{FMEA}
|
||||
% This talk introduces Failure Mode Effects Analysis, and the different ways it is applied.
|
||||
% These techniques are discussed, and then
|
||||
% a refinement is proposed, which is essentially a modularisation of the FMEA process.
|
||||
% %
|
||||
%
|
||||
|
||||
\begin{itemize}
|
||||
\item Failure
|
||||
\item Mode
|
||||
\item Effects
|
||||
\item Analysis
|
||||
\end{itemize}
|
||||
|
||||
|
||||
|
||||
% % \begin{itemize}
|
||||
% \begin{itemize}
|
||||
% \item Failure
|
||||
% \item Mode
|
||||
% \item Effects
|
||||
% \item Analysis
|
||||
% \end{itemize}
|
||||
%
|
||||
%
|
||||
%
|
||||
% % % \begin{itemize}
|
||||
% % \item Failure
|
||||
% % \item Mode
|
||||
% % \item Effects
|
||||
% % \item Analysis
|
||||
% % \end{itemize}
|
||||
|
||||
|
||||
\subsection{FMEA basic concept}
|
||||
@ -53,6 +59,8 @@ a refinement is proposed, which is essentially a modularisation of the FMEA proc
|
||||
|
||||
|
||||
|
||||
FMEA is a procedure based on the low level components of a system, and an example
|
||||
analysis will serve to demonstrate it in practice.
|
||||
|
||||
\subsection{ FMEA Example: Milli-volt reader}
|
||||
Example: Let us consider a system, in this case a milli-volt reader, consisting
|
||||
@ -81,10 +89,6 @@ For the sake of example let us choose resistor R1 in the OP-AMP gain circuitry.
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
\subsection{FMEA Example: Milli-volt reader}
|
||||
% \begin{figure}
|
||||
% \centering
|
||||
@ -95,7 +99,7 @@ For the sake of example let us choose resistor R1 in the OP-AMP gain circuitry.
|
||||
\item \textbf{F - Failures of given component} The resistor (R1) could fail by going OPEN or SHORT (EN298 definition).
|
||||
\item \textbf{M - Failure Mode} Consider the component failure mode SHORT
|
||||
\item \textbf{E - Effects} This will drive the minus input LOW causing a HIGH OUTPUT/READING
|
||||
\item \textbf{A - Analysis} The reading will be out of normal range, and we will have an erroneous milli-volt reading
|
||||
\item \textbf{A - Analysis} The reading will be out of the normal range, and we will have an erroneous milli-volt reading
|
||||
\end{itemize}
|
||||
|
||||
|
||||
@ -112,20 +116,47 @@ Perhaps we should.... this would be a more rigorous and complete
|
||||
approach in looking for system failures.
|
||||
|
||||
|
||||
|
||||
\subsection{Rigorous FMEA - State Explosion}
|
||||
|
||||
\subsection{Rigorous Single Failure FMEA}
|
||||
Consider the analysis
|
||||
where we look at all the failure modes in a system, and then
|
||||
see how they can affect all other components within it.
|
||||
\section{Theoretical Concepts in FMEA}
|
||||
|
||||
|
||||
\subsection{The unacceptability of a single component failure causing a catastrophe}
|
||||
|
||||
FMEA, due to its inductive bottom-up approach, is very good
|
||||
at finding potential component failures that could have catastrophic implications.
|
||||
Used in the design phase of a project, FMEA is an invaluable tool
for unearthing these types of failure scenario.
|
||||
It is less useful for determining catastrophic events for multiple
|
||||
simultaneous\footnote{Multiple simultaneous failures are taken to mean failure that occur within the same detection period.} failures.
|
||||
|
||||
\subsection{Impracticality of Field Data for modern systems}
|
||||
|
||||
Modern electronic components are generally very reliable, and the systems built from them
are thus very reliable too. Reliable field data on failures will, therefore, be sparse.
|
||||
Should we wish to prove a continuous demand system for say ${10}^{-7}$ failures\footnote{${10}^{-7}$ failures per hour of operation is the
|
||||
threshold for S.I.L. 3 reliability~\cite{en61508}.}
|
||||
per hour of operation, even with 1000 correctly monitored units in the field
|
||||
we could only expect one failure per ten thousand hours (a little under one per year).
|
||||
It would be impractical to get statistically significant data for equipment
|
||||
at these reliability levels.
|
||||
However, we can use FMEA (more specifically the FMEDA variant, see section~\ref{sec:FMEDA}), working from known component failure rates, to obtain
|
||||
statistical estimates of the equipment reliability.
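As a rough worked version of this arithmetic (assuming a constant failure rate, and taking a year as 8760 hours), the expected failure rate across the whole monitored population of 1000 units is
\[
1000 \times 10^{-7}\,\mathrm{h}^{-1} = 10^{-4}\,\mathrm{h}^{-1},
\qquad
\frac{1}{10^{-4}\,\mathrm{h}^{-1}} = 10^{4}\,\mathrm{h} \approx 1.14~\mathrm{years} \; .
\]
That is roughly one observed failure every fourteen months, which is far too few events to support a statistical claim about the equipment.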
|
||||
|
||||
|
||||
\subsection{Rigorous Single Failure FMEA}
|
||||
We need to look at a large number of failure scenarios
|
||||
to do this completely (all failure modes against all components).
|
||||
\subsection{Rigorous FMEA --- State Explosion}
|
||||
|
||||
FMEA cannot consider---for practical reasons---a rigorous approach.
|
||||
It must be applied by experts in the system under investigation
|
||||
to be a meaningful analysis.
|
||||
|
||||
\paragraph{Rigorous Single Failure FMEA}
|
||||
|
||||
FMEA for a safety critical certification~\cite{en298,en61508} will have to be applied
|
||||
to all known failure modes of all components within a system.
|
||||
|
||||
To perform FMEA rigorously (i.e. to examine every possible interaction
of a failure mode with all other components in a system)
we would need to look at all possible failure scenarios.
|
||||
%to do this completely (all failure modes against all components).
|
||||
This is represented in the equation below. %~\ref{eqn:fmea_state_exp},
|
||||
where $N$ is the total number of components in the system, and
|
||||
$f$ is the number of failure modes per component.
|
||||
@ -138,23 +169,17 @@ $f$ is the number of failure modes per component.
|
||||
\end{equation}
|
||||
|
||||
|
||||
|
||||
|
||||
\subsection{Rigorous Single Failure FMEA}
|
||||
This would mean an order of $N^2$ number of checks to perform
|
||||
\paragraph{Rigorous Single Failure FMEA}
|
||||
This would mean $O(N^2)$ checks to perform
|
||||
to undertake a `rigorous~FMEA'. Even small systems have typically
|
||||
100 components, and they typically have 3 or more failure modes each.
|
||||
$100*99*3=29,700$.
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
\subsection{Rigorous Double Failure FMEA}
|
||||
For looking at potential double failure scenarios (two components
|
||||
failing within a given time frame) and the order becomes
|
||||
$N^3$.
|
||||
\paragraph{Rigorous Double Failure FMEA}
|
||||
When looking at potential double failure
scenarios\footnote{Certain double failure scenarios are already legal requirements: the European Gas burner standard (EN298:2003) demands the checking of
double failure scenarios (for burner lock-out scenarios).}
(two components failing within a given time frame) the order becomes $O(N^3)$.
|
||||
|
||||
\begin{equation}
|
||||
\label{eqn:fmea_double}
|
||||
@ -162,23 +187,24 @@ $N^3$.
|
||||
%(N^2 - N).f
|
||||
\end{equation}
|
||||
|
||||
$100*99*98*3=2,910,600$.
|
||||
|
||||
|
||||
.\\
|
||||
|
||||
The European Gas burner standard (EN298:2003), demands the checking of
|
||||
double failure scenarios (for burner lock-out scenarios).
|
||||
For our theoretical 100 components with 3 failure modes each example, this is
|
||||
$100*99*98*3=2,910,600$ failure mode scenarios.
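The two counts above can be reproduced with a short sketch. The function names are hypothetical, and the formulas $N(N-1)f$ and $N(N-1)(N-2)f$ are inferred from the worked figures in the text, not quoted from a standard.

```c
#include <stdint.h>

/* Hypothetical helper: number of checks when every failure mode is
 * examined against every other component, N * (N - 1) * f.
 * n = number of components, f = failure modes per component. */
uint64_t single_failure_checks(uint64_t n, uint64_t f)
{
    if (n < 2u) {
        return 0u;   /* no "other" component to check against */
    }
    return n * (n - 1u) * f;
}

/* Hypothetical helper: the same count for double failure scenarios,
 * N * (N - 1) * (N - 2) * f. */
uint64_t double_failure_checks(uint64_t n, uint64_t f)
{
    if (n < 3u) {
        return 0u;
    }
    return n * (n - 1u) * (n - 2u) * f;
}
```

For $N=100$ and $f=3$ these return 29,700 and 2,910,600 respectively, matching the figures in the text.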
|
||||
|
||||
|
||||
|
||||
|
||||
\subsection{Four main Variants of FMEA}
|
||||
|
||||
|
||||
|
||||
\section{FMEA in practice: Five variants}
|
||||
|
||||
\paragraph{Five main Variants of FMEA}
|
||||
\begin{itemize}
|
||||
\item \textbf{PFMEA - Production} Car Manufacture etc
|
||||
\item \textbf{FMECA - Criticality} Military/Space
|
||||
\item \textbf{FMEDA - Statistical safety} EN61508/IEC61508 Safety Integrity Levels
|
||||
\item \textbf{DFMEA - Design or static/theoretical} EN298/EN230/UL1998
|
||||
\item \textbf{SFMEA - Software FMEA --- only used in highly critical systems at present}
|
||||
\end{itemize}
|
||||
|
||||
|
||||
@ -417,6 +443,7 @@ programming languages and/or features.
|
||||
|
||||
|
||||
\subsection{ FMEDA - Failure Modes Effects and Diagnostic Analysis}
|
||||
\label{sec:FMEDA}
|
||||
\textbf{Failure Mode Classifications in FMEDA.}
|
||||
\begin{itemize}
|
||||
\item \textbf{Safe or Dangerous} Failure modes are classified SAFE or DANGEROUS
|
||||
@ -438,6 +465,7 @@ $ \sum \lambda_{SD}$, $\sum \lambda_{SU}$, $\sum \lambda_{DD}$, $\sum \lambda_{D
|
||||
|
||||
|
||||
\subsection{ FMEDA - Failure Modes Effects and Diagnostic Analysis}
|
||||
|
||||
\textbf{Diagnostic Coverage.}
|
||||
The diagnostic coverage is simply the ratio
|
||||
of the dangerous detected probabilities
|
||||
@ -558,3 +586,518 @@ judged to be in critical sections of the product.
|
||||
\item Often Meeting notes or minutes only. Unusual for detailed arguments to be documented.
|
||||
\end{itemize}
|
||||
|
||||
|
||||
|
||||
|
||||
\section{Software FMEA (SFMEA)}
|
||||
|
||||
\paragraph{Current work on Software FMEA}
|
||||
|
||||
SFMEA usually does not seek to integrate
|
||||
hardware and software models, but to perform
|
||||
FMEA on the software in isolation~\cite{procsfmea}.
|
||||
%
|
||||
Work has been performed using databases
|
||||
to track the relationships between variables
|
||||
and system failure modes~\cite{procsfmeadb}, to %work has been performed to
|
||||
introduce automation into the FMEA process~\cite{appswfmea} and to provide code analysis
|
||||
automation~\cite{modelsfmea}. Although the SFMEA and hardware FMEAs are performed separately,
|
||||
some schools of thought aim for Fault Tree Analysis (FTA)~\cite{nasafta,nucfta} (top down - deductive)
|
||||
and FMEA (bottom-up inductive)
|
||||
to be performed on the same system to provide insight into the
|
||||
software/hardware interface~\cite{embedsfmea}.
|
||||
%
|
||||
Although this
|
||||
would give a better picture of the failure mode behaviour, it
|
||||
is by no means a rigorous approach to tracing errors that may occur in hardware
|
||||
through to the top (and therefore ultimately controlling) layer of software.
|
||||
|
||||
\subsection{Current FMEA techniques are not suitable for software}
|
||||
|
||||
The main FMEA methodologies are all based on the concept of taking
|
||||
base component {\fms}, and translating them into system level events/failures~\cite{sfmea,sfmeaa}.
|
||||
%
|
||||
In a complicated system, mapping a component failure mode to a system level failure
|
||||
will mean a long reasoning distance; that is to say the actions of the
|
||||
failed component will have to be traced through
|
||||
several sub-systems, gauging its effects with and on other components.
|
||||
%
|
||||
With software at the higher levels of these sub-systems,
|
||||
we have yet another layer of complication.
|
||||
%
|
||||
%In order to integrate software, %in a meaningful way
|
||||
%we need to re-think the
|
||||
%FMEA concept of simply mapping a base component failure to a system level event.
|
||||
%
|
||||
SFMEA regards, in place of hardware components, the variables used by the programs to be their equivalent~\cite{procsfmea}.
|
||||
The failure modes of these variables are that they could become erroneously over-written,
calculated incorrectly (due to a mistake by the programmer, or a fault in the micro-processor on which the program is running), or
corrupted by external influences such as
ionising radiation causing bits to be erroneously altered.
|
||||
|
||||
|
||||
\paragraph{A more-complete Failure Mode Model}
|
||||
|
||||
% HFMEA
|
||||
% SFMEA
|
||||
% VARIABLE CURRUPTION
|
||||
% MICRO PROCESSOR FAULTS
|
||||
% INTERFACE ANALYSIS
|
||||
%
|
||||
% add them all together --- a load of bollocks, lots of impressive inches of reports that no one will be bothered to read....
|
||||
%
|
||||
In order to obtain a more complete failure mode model of
|
||||
a hybrid electronic/software system we need to analyse
|
||||
the hardware, the software, the hardware the software runs on (i.e. the software's medium),
|
||||
and the software/hardware interface.
|
||||
%
|
||||
HFMEA is a well established technique and needs no further description in this paper.
|
||||
|
||||
\section{Example for analysis} % : How can we apply FMEA}
|
||||
|
||||
For the purpose of example, we chose a simple common safety critical industrial circuit
|
||||
that is nearly always used in conjunction with a programmatic element.
|
||||
A common method for delivering a quantitative value in analogue electronics is
|
||||
to supply a current signal to represent the value to be sent~\cite[p.934]{aoe}.
|
||||
Usually, $4mA$ represents a zero or starting value and $20mA$ represents the full scale,
|
||||
and this is referred to as {\ft} signalling.
|
||||
%
|
||||
{\ft} has an electrical advantage as well because the current in an electronic loop is constant~\cite[p.20]{aoe}.
|
||||
Thus resistance in the wires between the source and the receiving end is not an issue
|
||||
that can alter the accuracy of the signal.
|
||||
%
|
||||
This circuit has many advantages for safety. If the signal becomes disconnected
|
||||
it reads an out of range $0mA$ at the receiving end. This is outside the {\ft} range,
|
||||
and is therefore easy to detect as an error rather than an incorrect value.
|
||||
%
|
||||
Should the driving electronics go wrong at the source end, it will usually
|
||||
supply far too little or far too much current, making an error condition easy to detect.
|
||||
%
|
||||
At the receiving end, one needs a resistor to convert the
|
||||
current signal into a voltage that we can read with an ADC.%
|
||||
%we only require one simple component to convert the
|
||||
|
||||
|
||||
%BLOCK DIAGRAM HERE WITH FT CIRCUIT LOOP
|
||||
|
||||
\begin{figure}[h]
|
||||
\centering
|
||||
\includegraphics[width=250pt]{./CH2_FMEA/ftcontext.png}
|
||||
% ftcontext.png: 767x385 pixel, 72dpi, 27.06x13.58 cm, bb=0 0 767 385
|
||||
\caption{Context Diagram for {\ft} loop}
|
||||
\label{fig:ftcontext}
|
||||
\end{figure}
|
||||
|
||||
|
||||
The diagram in figure~\ref{fig:ftcontext} shows some equipment which is sending a {\ft}
|
||||
signal to a micro-controller system.
|
||||
The signal is locally driven over a load resistor, and then read into the micro-controller via
|
||||
an ADC and its multiplexer.
|
||||
With the voltage detected at the ADC we can read the intended quantitative
value from the external equipment.
|
||||
|
||||
\subsection{Simple Software Example}
|
||||
|
||||
|
||||
Consider a software function that reads a {\ft} input, and returns a value between 0 and 999 (i.e. per mil $\permil$)
|
||||
representing the value intended by the current detected, with an additional error indication flag to indicate the validity
|
||||
of the value returned.
|
||||
%
|
||||
Let us assume the {\ft} detection is via a \ohms{220} resistor, and that we read a voltage
|
||||
from an ADC into the software.
|
||||
Let us define any value outside the 4mA to 20mA range as an error condition.
|
||||
%
|
||||
Using Ohm's law~\cite{aoe}, $V=IR$, we determine the voltage range: $$0.004A * \ohms{220} = 0.88V $$
|
||||
and $$0.020A * \ohms{220} = 4.4V \;.$$
|
||||
%
|
||||
Our acceptable voltage range is therefore
|
||||
%
|
||||
$$(V \ge 0.88) \wedge (V \le 4.4) \; .$$
|
||||
|
||||
This voltage range forms our input requirement.
|
||||
%
|
||||
We can now examine a software function that performs a conversion from the voltage read to
|
||||
a per~mil representation of the {\ft} input current.
|
||||
%
|
||||
For the purpose of example the `C' programming language~\cite{DBLP:books/ph/KernighanR88} is
|
||||
used\footnote{C coding examples use the MISRA~\cite{misra} and SIL-3 recommended language constraints~\cite{en61508}.}.
|
||||
We initially assume a function \textbf{read\_ADC} which returns a floating point %double precision
|
||||
value representing the voltage read (see code sample in figure~\ref{fig:code_read_4_20_input}).
|
||||
|
||||
|
||||
%%{\vbox{
|
||||
\begin{figure}[h]
|
||||
|
||||
\footnotesize
|
||||
\begin{verbatim}
|
||||
/***********************************************/
/* read_4_20_input()                           */
/***********************************************/
/* Software function to read 4mA to 20mA input */
/* returns a value from 0-999 proportional     */
/* to the current input.                       */
/***********************************************/
int read_4_20_input ( int * value ) {
    double input_volts;
    int error_flag;

    /* set ADC MUX with input to read from */
    input_volts = read_ADC(INPUT_4_20_mA);

    if ( input_volts < 0.88 || input_volts > 4.4 ) {
        error_flag = 1; /* Error flag set to TRUE */
    }
    else {
        /* scale 0.88V..4.4V linearly onto 0..999 */
        *value = (int) ( (input_volts - 0.88) / ( 4.4 - 0.88 ) * 999.0 );
        error_flag = 0; /* indicate current input in range */
    }
    /* ensure: value is proportional (0-999) to the
       4 to 20mA input */
    return error_flag;
}
|
||||
\end{verbatim}
|
||||
%}
|
||||
%}
|
||||
|
||||
\caption{Software Function: \textbf{read\_4\_20\_input}}
|
||||
\label{fig:code_read_4_20_input}
|
||||
%\label{fig:420i}
|
||||
\end{figure}
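The range check and scaling can also be exercised on a host machine without the ADC. The following is a minimal sketch under stated assumptions, not the implementation used in the analysis: the helper name \textbf{volts\_to\_per\_mil} and the constants V\_MIN and V\_MAX are introduced here for illustration, and the voltage is passed in as a parameter instead of being read from hardware.

```c
#include <stddef.h>

#define V_MIN 0.88   /* assumed: 4 mA  across a 220 ohm load resistor */
#define V_MAX 4.40   /* assumed: 20 mA across a 220 ohm load resistor */

/* Hypothetical hardware-independent helper.
 * Returns 0 and writes 0..999 to *value when input_volts is in range;
 * returns 1 (error) and leaves *value untouched otherwise. */
int volts_to_per_mil(double input_volts, int *value)
{
    if (value == NULL) {
        return 1;
    }
    if (input_volts < V_MIN || input_volts > V_MAX) {
        return 1;   /* outside the 4 mA to 20 mA window */
    }
    /* linear map of V_MIN..V_MAX onto 0..999 */
    *value = (int) ((input_volts - V_MIN) / (V_MAX - V_MIN) * 999.0);
    return 0;
}
```

A disconnected loop (0\,V) and an over-driven loop (for example 5\,V) both return the error indication, while the two end points of the window map to 0 and 999.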
|
||||
|
||||
We now look at the function called by \textbf{read\_4\_20\_input}, \textbf{read\_ADC}, which returns a
|
||||
voltage for a given ADC channel.
|
||||
%
|
||||
This function
|
||||
deals directly with the hardware in the micro-controller on which we are running the software.
|
||||
%
|
||||
Its job is to select the correct channel (ADC multiplexer) and then to initiate a
|
||||
conversion by setting an ADC `go' bit (see code sample in figure~\ref{fig:code_read_ADC}).
|
||||
%
|
||||
It takes the raw ADC reading and converts it into a
|
||||
floating point\footnote{the type `double' or `double precision' is a
|
||||
standard C language floating point type~\cite{DBLP:books/ph/KernighanR88}.}
|
||||
voltage value.
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
%{\vbox{
|
||||
\begin{figure}[h]
|
||||
|
||||
\footnotesize
|
||||
\begin{verbatim}
|
||||
/***********************************************/
/* read_ADC()                                  */
/***********************************************/
/* Software function to read voltage from a    */
/* specified ADC MUX channel                   */
/* Assume 10 ADC MUX channels 0..9             */
/* ADC_CHAN_RANGE = 9                          */
/* Assume ADC is 12 bit and ADCRANGE = 4096    */
/* returns voltage read as double precision    */
/***********************************************/
double read_ADC( int channel ) {
    int timeout = 0;
    double dval;

    /* return out of range result */
    /* if invalid channel selected */
    if ( channel < 0 || channel > ADC_CHAN_RANGE )
        return -2.0;
    /* set the multiplexer to the desired channel */
    ADCMUX = channel;
    ADCGO = 1; /* initiate ADC conversion hardware */
    /* wait for ADC conversion with timeout */
    while ( ADCGO == 1 && timeout < 100 )
        timeout++;
    if ( timeout < 100 )
        dval = (double) ADCOUT * 5.0 / ADCRANGE;
    else
        dval = -1.0; /* indicate invalid reading */
    /* return voltage as a floating point value */
    /* ensure: value is voltage input to within 0.1% */
    return dval;
}
|
||||
\end{verbatim}
|
||||
\caption{Software Function: \textbf{read\_ADC}}
|
||||
\label{fig:code_read_ADC}
|
||||
\end{figure}
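The two steps performed by this function, waiting on the `go' bit with a bounded number of polls and scaling the raw count to a voltage, can be sketched separately so that each can be checked without a micro-controller. This is only an illustration: the names \textbf{wait\_for\_adc} and \textbf{adc\_raw\_to\_volts}, the 5\,V reference and the poll limit of 100 are assumptions taken from, or modelled on, the comments in the figure, and the `go' flag is an ordinary variable standing in for the hardware register.

```c
#define ADCRANGE      4096   /* 12 bit ADC, as assumed in the figure   */
#define ADC_VREF      5.0    /* assumed full-scale reference voltage   */
#define ADC_MAX_POLLS 100    /* assumed poll budget before timing out  */

/* Poll a 'go' flag until it clears or the poll budget is spent.
 * Returns 0 if the conversion finished, 1 on timeout. */
int wait_for_adc(volatile const int *go_flag)
{
    int polls = 0;

    while (*go_flag == 1 && polls < ADC_MAX_POLLS) {
        polls++;
    }
    return (*go_flag == 1) ? 1 : 0;
}

/* Convert a raw ADC count to volts; -1.0 marks an impossible count. */
double adc_raw_to_volts(int raw)
{
    if (raw < 0 || raw >= ADCRANGE) {
        return -1.0;
    }
    return (double) raw * ADC_VREF / (double) ADCRANGE;
}
```

Note that the loop condition joins the two tests with a logical AND: polling stops as soon as either the conversion completes or the poll budget is exhausted.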
|
||||
%}
|
||||
%}
|
||||
|
||||
|
||||
We now have a very simple software structure, a call tree, where {\em read\_4\_20\_input}
|
||||
calls {\em read\_ADC}, which in turn interacts with the hardware/electronics.
|
||||
%shown in figure~\ref{fig:ct1}.
|
||||
%
|
||||
% \begin{figure}[h]
|
||||
% \centering
|
||||
% \includegraphics[width=56pt]{./ct1.png}
|
||||
% % ct1.png: 151x224 pixel, 72dpi, 5.33x7.90 cm, bb=0 0 151 224
|
||||
% \caption{Call tree for software example}
|
||||
% \label{fig:ct1}
|
||||
% \end{figure}
|
||||
%
|
||||
This software is above the hardware in the conceptual call tree---from a programmatic perspective---%in software terms---the
|
||||
software is reading values from the `lower~level' electronics.
|
||||
%
|
||||
%FMEA is always a bottom-up process and so we must begin with this hardware.
|
||||
%
|
||||
The hardware is simply a load resistor, connected across an ADC input
|
||||
pin on the micro-controller and ground.
|
||||
%
|
||||
We can identify the resistor and the ADC module of the micro-controller as
|
||||
the base components in this design.
|
||||
%
|
||||
We now apply FMMD starting with the hardware.
|
||||
|
||||
|
||||
\section{Failure Mode Effects Analysis}
|
||||
|
||||
Four emerging and current techniques are now used to
apply FMEA to the hardware, the software, the software medium and the software/hardware interface.
|
||||
|
||||
\subsection{Hardware FMEA}
|
||||
|
||||
The hardware FMEA requires that for each component we consider all failure modes
|
||||
and the putative effect those failure modes would have on the system.
|
||||
The electronic components in our {\ft} system are the load resistor,
|
||||
the multiplexer and the analogue to digital converter.
|
||||
|
||||
{
|
||||
\tiny
|
||||
\begin{table}[h]
|
||||
\caption{Hardware FMEA {\ft}} % title of Table
|
||||
\label{tbl:r420i}
|
||||
|
||||
\begin{tabular}{|| l | c | l ||} \hline
\textbf{Failure} & \textbf{failure} & \textbf{System} \\
\textbf{Scenario} & \textbf{effect} & \textbf{Failure} \\ \hline
\hline
$R$ & OPEN~\cite[Ann.A]{en298} & $LOW$ \\
 & & $READING$ \\ \hline
$R$ & SHORT~\cite[Ann.A]{en298} & $HIGH$ \\
 & & $READING$ \\ \hline
$MUX$ & read wrong & $VAL\_ERROR$ \\
 & input~\cite[3-102]{fmd91} & \\ \hline
$ADC$ & ADC output & $VAL\_ERROR$ \\
 & erroneous~\cite[3-109]{fmd91} & \\ \hline
\hline
\end{tabular}
|
||||
\end{table}
|
||||
}
|
||||
|
||||
The last two failures both lead to the system failure of $VAL\_ERROR$.
They could lead to a low or high reading as well, but we would only be able to determine this
from knowledge of the software system's criteria for these.
|
||||
\clearpage
|
||||
\subsection{Software FMEA - variables in place of components}
|
||||
|
||||
For software FMEA, we take the variables used by the system,
|
||||
and examine what could happen if they are corrupted in various ways~\cite{procsfmea, embedsfmea}.
|
||||
From the function $read\_4\_20\_input()$ we have the variables $error\_flag$,
|
||||
$input\_volts$ and $value$: from the function $read\_ADC()$, $timeout$, $ADCMUX$, $ADCGO$, $dval$.
|
||||
We must now determine putative system failure modes for these variables becoming corrupted; this is performed in table~\ref{tbl:sfmea}.
|
||||
|
||||
|
||||
{
|
||||
\tiny
|
||||
\begin{table}[h]
|
||||
\caption{SFMEA {\ft}} % title of Table
|
||||
\label{tbl:sfmea}
|
||||
|
||||
\begin{tabular}{|| l | c | l ||} \hline
\textbf{Failure} & \textbf{failure} & \textbf{System} \\
\textbf{Scenario} & \textbf{effect} & \textbf{Failure} \\ \hline
\hline
$error\_flag$ & set FALSE & $VAL\_ERROR$ \\
 & & \\ \hline
$error\_flag$ & set TRUE & invalid \\
 & & error flag \\ \hline
$input\_volts$ & corrupted & $VAL\_ERROR$ \\
 & & \\ \hline
$value$ & corrupted & $VAL\_ERROR$ \\
 & & \\ \hline
$timeout$ & corrupted & $VAL\_ERROR$ \\
 & & \\ \hline
$ADCMUX$ & corrupted & $VAL\_ERROR$ \\
 & & \\ \hline
$ADCGO$ & corrupted & $VAL\_ERROR$ \\
 & & \\ \hline
$dval$ & corrupted & $VAL\_ERROR$ \\
 & & \\ \hline
\hline
\end{tabular}
|
||||
\end{table}
|
||||
}
|
||||
\clearpage
|
||||
\subsection{Software FMEA - failure modes of the medium ($\mu P$) of the software}
|
||||
|
||||
Microprocessors/Microcontrollers have sets of known failure modes; these include RAM, ROM and
EEPROM failure\footnote{EEPROM failure is not applicable for this example.} and
oscillator clock timing faults.
|
||||
|
||||
|
||||
|
||||
{
|
||||
\tiny
|
||||
\begin{table}[h]
|
||||
\caption{SFMEA {\ft}} % title of Table
|
||||
\label{tbl:sfmeaup}
|
||||
|
||||
\begin{tabular}{|| l | c | l ||} \hline
\textbf{Failure} & \textbf{failure} & \textbf{System} \\
\textbf{Scenario} & \textbf{effect} & \textbf{Failure} \\ \hline
\hline
$RAM$ & variable & All errors \\
 & corruption & from table~\ref{tbl:sfmea} \\ \hline
$RAM$ & program flow & process \\
 & & halts / crashes \\ \hline
$OSC$ & stopped & process \\
 & & halts \\ \hline
$OSC$ & too & ADC \\
 & fast & value errors \\ \hline
$OSC$ & too & ADC \\
 & slow & value errors \\ \hline
$ROM$ & program & All errors \\
 & corruption & from table~\ref{tbl:sfmea} \\ \hline
$ROM$ & constant & All errors \\
 & /data corruption & from table~\ref{tbl:sfmea} \\ \hline
\hline
\end{tabular}
|
||||
\end{table}
|
||||
}
|
||||
|
||||
\clearpage
|
||||
\subsection{Software FMEA - The software/hardware interface}
|
||||
|
||||
As FMEA is applied separately to software and hardware
|
||||
the interface between them is an undefined factor.
|
||||
Ozarin~\cite{sfmeainterface,procsfmea} recommends that an FMEA report be written
|
||||
to focus on the software/hardware interface.
|
||||
The software/hardware interface has
|
||||
specific problems common to many systems and configurations
|
||||
and these are described in~\cite{sfmeainterface}.
|
||||
%An interface FMEA is performed in table~\ref{hwswinterface}.
|
||||
%
|
||||
The hardware to software interface for the {\ft} example is handled
|
||||
by the `C' function $read\_ADC()$
|
||||
(see code sample in figure~\ref{fig:code_read_ADC}).
|
||||
%
|
||||
% An FMEA of the `software~medium' is given in table~\ref{tbl:sfmeaup}.
|
||||
\paragraph{Timing and Synchronisation.}
|
||||
The $ADCOUT$ register, where the raw ADC value is read,
is an internal register used by the ADC and presented
|
||||
as a readable memory location when the ADC
|
||||
has finished updating it.
|
||||
Reading it at the wrong time would
|
||||
cause an invalid value to be read.
|
||||
The synchronisation is performed by polling an $ADCGO$
|
||||
bit, a flag mapped to memory by which the ADC indicates that the data is ready.
|
||||
|
||||
\paragraph{Interrupt Contention.}
|
||||
Were an interrupt routine also to attempt to read from the ADC,
the $ADCMUX$ could be altered, causing the non-interrupt
routine to read from the wrong channel.
|
||||
|
||||
\paragraph{Data Formatting.}
|
||||
The ADC may use a big-endian or little-endian integer
|
||||
format. It may also right or left justify the bits in its value.
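A justification mismatch of this kind is easy to illustrate. The sketch below assumes, purely for the sake of example, a 12 bit result presented in two 8 bit registers; the function name \textbf{adc\_assemble\_12bit} and the register layout are hypothetical and do not describe any particular micro-controller.

```c
#include <stdint.h>

/* Hypothetical helper: assemble a 12 bit ADC result from two 8 bit
 * result registers.
 *   right justified: high = bits 11..8 (low nibble), low = bits 7..0
 *   left justified:  high = bits 11..4, low = bits 3..0 (high nibble)
 * Passing the wrong justification yields a plausible but wrong value. */
uint16_t adc_assemble_12bit(uint8_t high, uint8_t low, int left_justified)
{
    uint16_t word = (uint16_t) (((uint16_t) high << 8) | (uint16_t) low);

    if (left_justified) {
        word = (uint16_t) (word >> 4);   /* drop the four padding bits */
    }
    return (uint16_t) (word & 0x0FFFu);  /* keep the 12 result bits    */
}
```

The same count, 0xABC, appears as the register pair (0x0A, 0xBC) when right justified and as (0xAB, 0xC0) when left justified; software that assumes the wrong convention will read a value that is in range but incorrect, which is exactly the kind of error an interface check-list is intended to catch.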
|
||||
|
||||
|
||||
|
||||
\subsection{SFMEA Conclusion}
|
||||
%
|
||||
This paper has picked a very simple example (the industry standard {\ft}
|
||||
input circuit and software) to demonstrate
|
||||
SFMEA and HFMEA methodologies used to describe a failure mode model.
|
||||
%Even a modest system would be far too large to analyse in conference paper
|
||||
%and this
|
||||
%
|
||||
%The {\dc} representing the {\ft} reader
|
||||
%shows that by taking a
|
||||
%modular approach for FMEA, i.e. FMMD, we can integrate
|
||||
Our model is described by four FMEA reports, and these % we can model the failure mode behaviour from
|
||||
model the system from several failure mode perspectives.
|
||||
%
|
||||
With traditional FMEA methods the reasoning~distance is large, because
|
||||
it stretches from the component failure mode to the top (or system) level failure.
|
||||
%
|
||||
With these four analysis reports
|
||||
we do not have stages along the `reasoning~path' linking the failure modes from the
|
||||
electronics to those in the software.
|
||||
%Software is often written `defensively' but t
|
||||
%Each {\fg} to {\dc} transition represents a
|
||||
%reasoning stage.
|
||||
%
|
||||
%
|
||||
%For this reason applying traditional FMEA to software stretches
|
||||
%the reasoning distance even further.
|
||||
%
|
||||
In fact, many of these reasoning paths overlap, or even by-pass one another, and
it is very difficult to gauge cause and effect.
|
||||
For instance, hardware failures are not analysed in the context of how they will
|
||||
be handled (or missed) by the software.
|
||||
%
|
||||
System outputs commanded from software may not take into account particular
|
||||
hardware limitations etc.
|
||||
|
||||
The interface FMEA does serve to provide a useful
|
||||
check-list to ensure data and synchronisation conventions used by the hardware
|
||||
and software are not mismatched. However, the fact it is perceived as required %The fact its required
|
||||
highlights the mismatches possible between the two types of analysis
|
||||
which could run deeper than the mere interface level.
|
||||
|
||||
However, while these techniques ensure that the software and hardware are
viewed and analysed from several perspectives, the result cannot be termed a homogeneous
|
||||
failure mode model.
|
||||
% For instance
|
||||
% were the ADC to have a small value error, say adding
|
||||
% a small percentage onto the value, we would be unable to
|
||||
% detect this under the analysis conditions for this model, or
|
||||
% be able to pinpoint it.
|
||||
%
|
||||
|
||||
|
||||
|
||||
\section{Conclusion}
|
||||
|
||||
FMEA is a useful tool for basic safety: it provides statistics on safety where field data is impractical, and is
very good with single failure modes linked to top level events.
|
||||
FMEA has become part of the safety critical and safety certification industries.
|
||||
|
||||
SFMEA is in its infancy, and there is a gap in current
certification for software. EN61508 recommends hardware redundancy architectures in conjunction
with FMEDA for hardware; for software it recommends language constraints and quality procedures
but no inductive fault finding technique.
|
BIN
submission_thesis/CH2_FMEA/ftcontext.dia
Normal file
Binary file not shown.