massive edit session in the afternoon after painting shelves.

Robin Clark 2015-03-29 19:32:20 +01:00
parent c3416b7fde
commit 74259c22c8


@ -159,7 +159,7 @@ the ability to model integrated hardware and software systems.
% reaches conclusions about the effectiveness and failure mode % reaches conclusions about the effectiveness and failure mode
% coverage of the combined FMEA techniques. % coverage of the combined FMEA techniques.
To demonstrate FMMDA a small, but complete embedded system To demonstrate FMMD a small, but complete embedded system
(including both software and hardware), (including both software and hardware),
worked example is presented to show FMMD applied to an worked example is presented to show FMMD applied to an
integrated electronics/software system. integrated electronics/software system.
@ -172,6 +172,46 @@ integrated electronics/software system.
\section{Introduction} \section{Introduction}
FMEA stands for Failure Mode Effects Analysis.
%
All components used to build a system can fail.
They may fail in more than one way.
The ways in which a component can fail, are known as its failure modes.
At its simplest FMEA means taking a failure mode of a component and predicting
what problems it may cause for the system it is part of.
%
One way that an electronic component such as a resistor can fail, for instance, is
to go open circuit. It could be because it was not soldered on properly and fell off,
it could have had an internal mechanical fault or it could have been burnt out by too much
electrical current. The cause does not matter. The fact that it can fail by going open circuit does.
%
This then is one of the failure modes of a resistor, $OPEN$.
%
For instance, an FMEA scenario could be a resistor in a system going $OPEN$. % circuit.
%
If the resistor was part of an amplifier in the circuit
it could be predicted say, that a particular reading,
as measured by the amplifier's output, would go outside of an expected
range.
%
The erroneous reading may cause the system to fail dangerously or may simply be detected and flagged
as a fault.
%
The description of the outcome is at the discretion of the Engineer
responsible for the FMEA report.
The central concept of FMEA is that if all component failures are known,
by analysing them the failure behaviour of a system can be determined.
%
This means looking at every component in the system, and for each of those components
examining all known failure modes in the context of the system that it is in.
%
Various handbooks and international standards list common components and
their known failure modes, often with accompanying statistics~\cite{en298, fmd91, mil1991}.
\subsection{Origins of FMEA techniques}
%FMEA methodologies trace from the 1940's and were designed to %FMEA methodologies trace from the 1940's and were designed to
%model simple electro-mechanical systems. %model simple electro-mechanical systems.
% %
@ -192,8 +232,8 @@ software elements.
Software generally sits on top of most modern safety critical control systems Software generally sits on top of most modern safety critical control systems
and defines its most important system wide behaviour and communications. and defines its most important system wide behaviour and communications.
% %
Currently standards that demand FMEA investigations for hardware(HFMEA) (e.g. EN298, EN61508), Currently standards that demand FMEA investigations for hardware FMEA (HFMEA) (e.g. EN298, EN61508),
do not specify it for software but instead essentially just specify good practise, do not specify FMEA for software but instead essentially just specify good practise,
i.e. review processes and language feature constraints. i.e. review processes and language feature constraints.
% %
That is to say FMEA has no formal framework for following That is to say FMEA has no formal framework for following
@ -204,13 +244,31 @@ This is a weakness.
% %
Where HFMEA % scientifically Where HFMEA % scientifically
traces component {\fms} traces component {\fms}
to resultant system failures, software until recently, has been left in a non-analytical to resultant system failures, the issue of software until recently, has been ignored.
limbo of best practises and constraints. Most %left in a non-analytical limbo
standards that mention software do not have methodologies
to apply FMEA, instead they prescribe best practises,
defensive programming strategies, redundancy and constraints~\cite{en61508}.
% %
Software FMEA has been proposed Software FMEA (SFMEA) has been proposed
in several forms~\cite{modelsfmea,sfmea,procsfmeadb,sfmeaauto}. in several forms~\cite{modelsfmea,sfmea,procsfmeadb,sfmeaauto}.
% %
However, SFMEA is always performed separately from HFMEA.
Some work has looked at the software/hardware interface~\cite{sfmeainterface}
but in general SFMEA is always performed separately from HFMEA.
%
What this means is that the FMEA analysis cannot guarantee to handle
all possible failures from hardware.
%
The hardware may even flag that it has self-detected some kind of failure, but
because the software and hardware analyses are separate
there is no way the analysis process can guarantee the hardware error will be made known to the software.
%
This means that some hardware failure modes are unexpected and therefore
un-handled by the software.
This means the system can exhibit unpredictable and possibly dangerous behaviour which
will not be picked up by the FMEA process.
%
% %
% This paper seeks to examine the effectiveness of current and proposed SFMEA % This paper seeks to examine the effectiveness of current and proposed SFMEA
% techniques, by analysing a simple hybrid hardware/software system, % techniques, by analysing a simple hybrid hardware/software system,
@ -329,20 +387,33 @@ Because of this a programmer is very unlikely to implement a simple bubble sort
if the number of elements $N$ is predicted to be large. if the number of elements $N$ is predicted to be large.
To evaluate FMEA techniques a metric is required. To evaluate FMEA techniques a metric is required.
%
When analysing a failure mode of a component, it is reasonable to When analysing a failure mode of a component, it is reasonable to
look at how the failure mode will affect the other components in the system and to put this then look at how the failure mode will affect the other components in the system and to put this then
into the context of the systems behaviour. into the context of the systems behaviour.
%
Components may fail in several ways. European standard EN298~\cite{en298} gives the possible Components may fail in several ways. European standard EN298~\cite{en298} gives the possible
failure modes for a resistor as $OPEN$ and $SHORT$ for instance. failure modes for a resistor as $OPEN$ and $SHORT$ for instance.
%
The term $f$ is defined as the number of component failure modes for a given component. The term $f$ is defined as the number of component failure modes for a given component.
A system will have $N$ number of components. A system will have $N$ number of components.
For each component then there will be $f \times (N-1)$ other components that could be affected by
it failing.
In the case of the resistor $f$ is two
~\footnote{A resistor is assigned two failure modes by the European Burner standard EN298~\cite{en298}
as long as some specific safety precautions involving voltage and power ratings are kept.} : $OPEN$ or $SHORT$.
$N$ is the number of components in the system.
In order to check this single resistor's failure modes then,
it must be checked twice, for the condition $OPEN$ and then for the condition $SHORT$, potentially
against all other components in the system $(N-1)$.
For each component then there will be $f (N-1)$ other components that could be affected by
it failing.
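%
As an illustrative sketch, assume a hypothetical circuit of $N = 10$ components (a figure chosen only for this example, not taken from any circuit in this paper).
For one resistor with $f = 2$, the maximum number of checks is then
\[
f (N-1) = 2 \times (10 - 1) = 18 .
\]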
%
By counting the number of of checks to make, i.e. failure mode against all other components in the system, By counting the number of of checks to make, i.e. failure mode against all other components in the system,
a metric for for evaluating FMEA is defined. a metric for evaluating the maximum number of checks that need to be performed for an FMEA is defined.
% %
This count of checks is defined as `reasoning~distance' ---or in other words is --- the number of stages of logic and reasoning used This count of checks is defined as `reasoning~distance' ---or in other words is --- the number of stages of logic and reasoning used
@ -380,10 +451,11 @@ in the system.
The exhaustive reasoning~distance for a system would be the The exhaustive reasoning~distance for a system would be the
the sum of these multiplications for all its components. % it contains. the sum of these multiplications for all its components. % it contains.
% %
If a small system were to have say 100 components, with three failure modes per component, this Take a hypothetical small system with say 100 components, with three failure modes per component,
would give an exhaustive reasoning distance. % ---for single failure analysis---of $3 \times 100 \times 99$. %this
That means to for each {\fm} of every component, $3$, a check would have to be made %would give an exhaustive reasoning distance for single failure analysis---of $3 \times 100 \times 99$.
against 99 other components. There are 100 components in this hypothetical example so that means to for each {\fm} of every component, i.e. $3$ checks, would have to be made
against 99 other components. There are 100 components in this hypothetical example
for single failure analysis this means $3 \times 100 \times 99$ checks. for single failure analysis this means $3 \times 100 \times 99$ checks.
% %
This concept of `reasoning~distance' provides a metric to examine This concept of `reasoning~distance' provides a metric to examine
@ -392,13 +464,14 @@ methodologies).
% %
%\fmmdglossSTATEEX %\fmmdglossSTATEEX
% %
A high reasoning distance, because it is a manual process performed by experperts, is both A high reasoning distance, because it is a manual process performed by experts, is both
expensive in terms of time and money. expensive in terms of time and money.
%
It is apparent also that the shorter the reasoning distance, the more precisely theoretical examination It is apparent also that the shorter the reasoning distance, the more precisely theoretical examination
can determine failure symptoms. A shorter reasoning distance therefore implies a higher quality of safety analysis. can determine failure symptoms. A shorter reasoning distance therefore implies a higher quality of safety analysis.
% %
For instance for a very simple small circuit, a better understanding of failure effects is expected,
than for a very large system where there are more variables and potential {\fm} interactions.
% %
%.... general concept... simple ideas about how complex a %.... general concept... simple ideas about how complex a
%failure analysis is the more modules and components are involved %failure analysis is the more modules and components are involved
@ -446,6 +519,24 @@ to undertake XFMEA for single failures.
%Even small systems have typically %Even small systems have typically
%100 components, and they typically have 3 or more failure modes each, which would give %100 components, and they typically have 3 or more failure modes each, which would give
The hypothetical example described above gives $100 \times 99 \times 3 = 29,700 $ as a reasoning~distance. The hypothetical example described above gives $100 \times 99 \times 3 = 29,700 $ as a reasoning~distance.
%%% SANITY CHECK.
%%%
When stating a general equation such as equation~\ref{eqn:fmea_single} it can be sanity checked
by thinking of common examples.
For instance a simple amplifier circuit with a handful of components
would have a low $RD_{single}$, i.e. a low count of failure mode against component checks.
%
From experience, with a simple amplifier circuit it is relatively easy to predict
how it would react to well defined component failure modes.
For a larger circuit the problems of tracing side effects of the failure mode through the circuit
mean that it is likely to be a far more complex task.
The order $O(N^2)$ for FMEA complexity, for single failures, therefore agrees with experience.
In general terms, for a very simple small circuit, a better understanding of failure effects is expected,
than for a very large system where there are more variables and potential {\fm} interactions.
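%
As a sketch of why the order is quadratic, assume that equation~\ref{eqn:fmea_single} has the form described above,
a sum over all components of $f (N-1)$, and assume for simplicity that every component has the same
number of failure modes $f$ (real components will differ):
\[
RD_{single} = \sum_{i=1}^{N} f (N-1) = f N (N-1) = f N^{2} - f N ,
\]
which for fixed $f$ grows as $O(N^2)$.
With $f = 3$ and $N = 100$ this gives $3 \times 100 \times 99 = 29,700$, matching the hypothetical example.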
% %
%\fmmdglossSTATEEX %\fmmdglossSTATEEX
\paragraph{Exhaustive FMEA and double failure scenarios.} \paragraph{Exhaustive FMEA and double failure scenarios.}
@ -469,6 +560,13 @@ $100 \times 99 \times 98 \times 9 = 8,731,800 $. % failure mode scenarios.
% %
In practise there is an additional complication; that of In practise there is an additional complication; that of
the circuit topology changes that {\fms} can cause. the circuit topology changes that {\fms} can cause.
Double failure analysis is usually only performed on sections
of a system considered most critical, and often in the context of redundancy.
%
For a combustion controller, it is stated~\cite{en298} that there must be two separate
fuel shut-off valves, that are controlled from different relays and wiring.
%
This is actually more an enforcement of redundancy than FMEA for `any~double~combination' of failure modes.
\paragraph{Reliance on experts for meaningful FMEA Analysis.} \paragraph{Reliance on experts for meaningful FMEA Analysis.}
Current FMEA methodologies cannot consider---for the reason of state explosion---an exhaustive approach. Current FMEA methodologies cannot consider---for the reason of state explosion---an exhaustive approach.
@ -501,50 +599,104 @@ system.
% An example of component tolerance considered for FMEA % An example of component tolerance considered for FMEA
% is given in section~\ref{sec:resistortolerance}. % is given in section~\ref{sec:resistortolerance}.
%\section{FMEA in current usage: Five variants} % %\section{FMEA in current usage: Five variants}
\section{FMEA in current usage: Four variants} % \section{FMEA in current usage: Four variants}
%
%\paragraph{Five main Variants of FMEA} % %\paragraph{Five main Variants of FMEA}
\paragraph{Four main Variants of FMEA} % \paragraph{Four main Variants of FMEA}
\begin{itemize} % \begin{itemize}
%\item \textbf{PFMEA - Production} Emphasis on cost reduction and product improvement; % %\item \textbf{PFMEA - Production} Emphasis on cost reduction and product improvement;
\item \textbf{FMECA - Criticality} Emphasis on minimising the effect of critical systems failing~\cite{fmeca}; % Military/Space % \item \textbf{FMECA - Criticality} Emphasis on minimising the effect of critical systems failing~\cite{fmeca}; % Military/Space
\item \textbf{FMEDA - Statistical Safety} Statistical analysis giving Safety Integrity Levels~\cite{en61508}; % \item \textbf{FMEDA - Statistical Safety} Statistical analysis giving Safety Integrity Levels~\cite{en61508};
\item \textbf{DFMEA - Design or Static/Theoretical} Approval of safety critical systems using FMEA and single or double failure prevention~\cite{en298};% EN298/EN230/UL1998 % \item \textbf{DFMEA - Design or Static/Theoretical} Approval of safety critical systems using FMEA and single or double failure prevention~\cite{en298};% EN298/EN230/UL1998
\item \textbf{SFMEA - Software FMEA} --- Usage not enforced by most current standards~\cite{en298,en230,en61508}. %only used in highly critical systems at present. % \item \textbf{SFMEA - Software FMEA} --- Usage not enforced by most current standards~\cite{en298,en230,en61508}. %only used in highly critical systems at present.
\end{itemize} % \end{itemize}
\nocite{MILSTD1629short} \nocite{MILSTD1629short}
\subsection{FMEA and modularity.} \section{FMEA and modularity.}
Because modern electronics has become more complex the number Because modern electronics has become more complex the number
of basic components has risen dramatically. of basic components in a typical safety critical system has risen dramatically.
To add to this components used to fulfil common functions are often Integrated Circuits (ICs).. %
%
To add to this components used to fulfil common functions are often Integrated Circuits (ICs).
%
Typical examples include voltage regulators, op-amps, micro-controllers~\cite{pic18f2523}, memory modules and Typical examples include voltage regulators, op-amps, micro-controllers~\cite{pic18f2523}, memory modules and
protocol handlers~\cite{mcp2515}. To build any of these component from scratch would be very expensive and time consuming, protocol handlers~\cite{mcp2515}. To build any of these component from scratch would be very expensive and time consuming,
but these IC `components' have very high internal transistor counts, and each have their own unique but these IC `components' have very high internal transistor counts, and each have their own unique
failure mode behaviour. failure mode behaviour.
%
Thus modern electronics has already become too large in scope to sensibly implement the base component failure mode directly mapped to Thus modern electronics has already become too large in scope to sensibly implement the base component failure mode directly mapped to
a system failure paradigm. a system failure paradigm.
\paragraph{Modularity --- breaking large systems into manageable blocks}
When faced with complex systems, a typical way to make them
manageable is to break them into sub-systems, and even sub-systems of sub-systems ad infinitum.
\paragraph{History of Modularisation in Software}
%
It is interesting to compare the development of FMEA methodologies with software.
%
Software faced a crisis in complexity in the 1960s where the architecture of
the dominant computer language FORTRAN~\cite{f77} became a limiting factor.
%
Programs written in FORTRAN became clumsy when they became large.
%
All variables were global.
%
A misspelled variable could cause chaos.
%
Also it was often difficult to pull a function
out of one program and place it in another if it used some of the global variables.
Newer computer languages were invented where modularity was encouraged.
Instead of FORTRAN's global scope for variables, individual functions in a newer language like `C'
started to have `local' variables. This meant that
a programmer could take a function from a `C' program and
use it in another one without complication.
%
Later languages implemented object orientation
which grouped functions and data together into modules called classes, where
even the internal local variables could be hidden from the
programmer using the class.
%
Software expanded in complexity faster than electronics,
and to cope with this software languages developed modularity (function call trees, classes and finally distributed processing mechanisms).
%
FMEA has, by necessity, started to include some modular features but none yet
have defined mechanisms for ensuring that all failure modes
from a module must be considered in the analysis of the module(s)
that incorporate it.
\paragraph{Modularisation in safety analysis in the automotive industry.}
The automotive industry, because of mass production, must make products that have high safety integrity %that are very safe but The automotive industry, because of mass production, must make products that have high safety integrity %that are very safe but
% financial pressure keeps their products % financial pressure keeps their products
but must also be affordable. but must also be affordable.
% %
This leads to specialist firms producing modules, such as automatic braking systems, This leads to specialist firms producing modules, such as automatic braking systems,
that are bought in and assembled to make a auto-mobile. that are bought in and assembled to make an auto-mobile.
% %
Performing failure analysis using the basic component single failure modes to Performing failure analysis using the basic component single failure modes to
system failure mapping, would be very difficult: this would require expert knowledge system failure mapping, would thus be very difficult: this would require expert knowledge
of the design behaviour and component types used in each module. of the design behaviour and component types used in each module.
%%
%Because modern systems have become more complex and now include software elements,
%modularity
%of some form (breaking the problem down into smaller sections),
%has become necessary to break down the state explosion problems associated with FMEA.
% %
Because modern systems have become more complex and now include software elements, Some modular FMEA techniques are starting to be used and specified, and are described below.
modularity
of some form (breaking the problem down into smaller sections),
has become necessary to break down the state explosion problems associated with FMEA.
%
Some modular techniques are starting to be formalised, and are described below.
\paragraph{Automotive SIL (ASIL) --- modularisation of FMEDA} \paragraph{Automotive SIL (ASIL) --- modularisation of FMEDA}
% %
@ -564,66 +716,55 @@ It does not introduce traceable {\fm} reasoning in its hierarchy.
% %
\paragraph{Indenture levels --- modularisation of FMECA} \paragraph{Indenture levels --- modularisation of FMECA}
%
The US military standard for FMECA~\cite{fmeca}, describes a very broad modularity regime, that The US military standard for FMECA~\cite{fmeca}, describes a very broad modularity regime, that
it terms `indenture' levels. Indenture levels are arranged from the top down it terms `indenture' levels.
and identify finer and finer grained modules. For instance, an aircraft %
Indenture levels are arranged from the top down
and identify finer and finer grained modules.
%
For instance, an aircraft
may be the first indenture level, and the next may be an identifiable module such as may be the first indenture level, and the next may be an identifiable module such as
an altitude radar: within that finer grained modules may be identified until an altitude radar: within that finer grained modules may be identified until
the base components are listed. Note that this is a top down approach to modularisation and the base components are listed.
this can introduce errors into the reliability calculations~\cite{MILSTD1629short}.
% %
\paragraph{Modularisation in Software} Note that this is a top down approach to modularisation and
this can introduce errors into the reliability calculations~\cite{MILSTD1629short}
and miss out some component failure modes.
% %
It is interesting to compare the development of FMEA methodologies with software.
Software expanded in complexity faster than electronics,
and to cope with this software languages developed modularity (function call trees, classes and finally distributed processing mechanisms).
%
FMEA has, by necessity, started to include some modular features but none yet
have defined mechanisms for ensuring that all failure modes
from a module must be considered in the analysis of the module(s)
that incorporate it.
\paragraph{Top Down or Bottom-up?} %
% Because FMEA is a bottom up technique, applying a top down analysis (as in FMECAs indenture levels) % \paragraph{Top Down or Bottom-up?}
% cannot guarantee to consider all component failure modes in the correct context. % % Because FMEA is a bottom up technique, applying a top down analysis (as in FMECAs indenture levels)
% % cannot guarantee to consider all component failure modes in the correct context.
% % %
% A top down approach (such as FTA) can miss~\cite{faa}[Ch.~9] individual failure modes of components,
% especially where there are non-obvious or unexpected top-level failures.
% % % %
A top down approach (such as FTA) can miss~\cite{faa}[Ch.~9] individual failure modes of components, % In order to ensure that every failure mode is considered, a bottom-up approach
especially where there are non-obvious or unexpected top-level failures. % including every base components {\fms} must be used.
% %
% Going back to the software analogy, the indenture levels of FMECA are similar to
% a software call tree where the highest indenture levels would be leaf functions.
% %
% There is no equivalent of the software `class'.
% %
% In the real world however there are.
% Off the shelf sensors can be purchased which communicate using standard protocols~\cite{Pfeiffer:2003:ENC:1199616}. % consider CANOpen standard sensors, these are%~\footnote{CANopen sensors...}
% %modules connected by an industrial data bus.
% %
% These not only typically have electrical and mechanical
% components, they have a firmware and communication bus aspects~\cite{canspec, caninauto}.
% %
% These type of modules combine hardware, electronics, software, communications
% and distributed programming.
% %
% Current FMEA techniques struggle with software alone, and also, fail to integrate the analysis of hardware and software
% systems~\cite{sfmea, embedsfmea, modelsfmea, sfmeaa}. %, sfmeainterface }.
% %
In order to ensure that every failure mode is considered, a bottom-up approach
including every base components {\fms} must be used.
%
Going back to the software analogy, the indenture levels of FMECA are similar to
a software call tree where the highest indenture levels would be leaf functions.
%
There is no equivalent of the software `class'.
%
In the real world however there are.
Off the shelf sensors can be purchased which communicate using standard protocols~\cite{Pfeiffer:2003:ENC:1199616}. % consider CANOpen standard sensors, these are%~\footnote{CANopen sensors...}
%modules connected by an industrial data bus.
%
These not only typically have electrical and mechanical
components, they have a firmware and communication bus aspects~\cite{canspec, caninauto}.
%
These type of modules combine hardware, electronics, software, communications
and distributed programming.
%
Current FMEA techniques struggle with software alone, and also, fail to integrate the analysis of hardware and software
systems~\cite{sfmea, embedsfmea, modelsfmea, sfmeaa}. %, sfmeainterface }.
% %
\subsection{FMEA and software.}
In addition to increasing complexity in electronics, modern control systems nearly always have a significant software/firmware element,
and not being able to model software with current FMEA methodologies
is a cause for criticism~\cite{safeware}[Ch.12].
%
Similar difficulties in integrating mechanical and electronic/software
failure models are discussed in~\cite{SMR:SMR580}.
%
Currently standards that demand FMEA for hardware (e.g. EN298~\cite{en298}, EN61508~\cite{en61508}),
do not specify it for software, but instead recommend computer-architectures, good software practise,
review processes and language feature constraints.
% %
@ -633,8 +774,23 @@ review processes and language feature constraints.
%FMMD is a modularisation of FMEA and can produce failure~mode models that can be used in %FMMD is a modularisation of FMEA and can produce failure~mode models that can be used in
%all the above variants of FMEA. %all the above variants of FMEA.
\subsection{The problem of Systems using software and FMEA}
\paragraph{Current work on Software FMEA} Software systems are becoming part of everyday life.
It is increasingly rare to find systems where there is not a computer
controlling some part of them.
All modern airliners are fly-by-wire. The throttle in a modern car is fly-by-wire.
Because software and hardware FMEAs are separate, tracing failure effects
from hardware into software, or even ensuring that all predicted
hardware failure modes have been handled in software is difficult.
%
This problem is recognised, and work has been undertaken to
begin to redress it.
\paragraph{Current work on Software FMEA.}
SFMEA usually does not seek to integrate SFMEA usually does not seek to integrate
hardware and software models, but to perform hardware and software models, but to perform
@ -663,7 +819,7 @@ The main FMEA methodologies are all based on the concept of taking
base component {\fms}, and translating them into system level events/failures~\cite{sfmea,sfmeaa}. base component {\fms}, and translating them into system level events/failures~\cite{sfmea,sfmeaa}.
% %
That is there is only one stage of reasoning between the low level component {\fm} and That is there is only one stage of reasoning between the low level component {\fm} and
the system level symptom of failure. the system level symptom of failure leaves ample room for error.
% %
In a complicated system, mapping a component failure mode to a system level failure In a complicated system, mapping a component failure mode to a system level failure
will mean a long reasoning distance; that is to say the actions of the will mean a long reasoning distance; that is to say the actions of the
@ -671,7 +827,7 @@ failed component will have to be traced through
several sub-systems, gauging its effects with and on other components. several sub-systems, gauging its effects with and on other components.
% %
With software at the higher levels of these sub-systems, With software at the higher levels of these sub-systems,
we have yet another layer of complication. there is yet another layer of complication.
% %
%In order to integrate software, %in a meaningful way %In order to integrate software, %in a meaningful way
%we need to re-think the %we need to re-think the
@ -720,12 +876,12 @@ getting too complicated for meaningful analysis using FMEA.
% has much higher component counts and more complex components than those in use when FMEA % has much higher component counts and more complex components than those in use when FMEA
% was designed. % was designed.
% %
From the above defeciencies, a wish list for a better FMEA is presented, stating the features that should exist From the above deficiencies, a wish list for a better FMEA is presented, stating the features that should exist
in an improved FMEA methodology, in an improved FMEA methodology,
\begin{itemize} \begin{itemize}
\item Must be able to analyse hybrid software/hardware systems, \item Must be able to analyse hybrid software/hardware systems,
\item avoid state explosion (i.e. XFMEA is impractical by hand~\cite{cbds}), \item avoid state explosion (i.e. XFMEA is impractical by hand~\cite{cbds}),
\item encourage exhaustive checking within each modular, %(total failure coverage within {\fgs} all interacting component and failure modes checked), \item encourage exhaustive checking within each module, %(total failure coverage within {\fgs} all interacting component and failure modes checked),
\item traceable reasoning inherent in system failure models,% to aid repeatability and checking, \item traceable reasoning inherent in system failure models,% to aid repeatability and checking,
\item re-usable i.e. it should be possible to re-use analysis, \item re-usable i.e. it should be possible to re-use analysis,
\item possibility to analyse simultaneous/multiple failures, \item possibility to analyse simultaneous/multiple failures,
@ -737,6 +893,61 @@ in an improved FMEA methodology,
\section{Proposed Methodology: Failure Mode Modular De-composition (FMMD)} \section{Proposed Methodology: Failure Mode Modular De-composition (FMMD)}
The basic concept behind FMMD is to modularise the problem from the bottom up.
FMEA cannot easily be modularised from the top-down, because
it has to deal with component failure modes.
It may seem a bit counter-intuitive, but this means that if FMEA is to be modularised
it must be done from the bottom up.
This may seem like a strange idea, but consider how an engineer would look
at an electronic circuit/schematic.
%
The Engineer might, for instance, trace an input signal
into some other components following a connection on the schematic.
%
The Engineer would then typically, following signal paths, try to figure out what
those components did.
%
For instance were it an amplifier, the engineer would
recognise the electronic configuration,
and maybe get a calculator out and calculate its gain
or some other feature, by looking at the other components connected to it.
%
This is a form of modularisation from the bottom-up.
%
The Engineer has identified a module, an input amplifier.
\paragraph{Broadly, FMMD is bottom-up modularisation of FMEA.}
Firstly, modules are identified (for instance common circuit configurations such as amplifiers or digital outputs) and
then failure mode analysis is performed on them.
%
By analysing this small group of components as a module,
the ways in which the module can fail can be listed.
%
This will give a set of symptoms of failure for the module.
%
When the lower levels have been analysed, modules can be brought
together to form larger modules, using the lower ones as though they were
components.
%
These modules can be brought together to form even larger modules.
Eventually there is one large module which represents the entire system.
Because the terms module and sub-system are quite general, and possibly over-used,
a new term has been used to take their place in FMMD.
%
This is the `functional~group'.
%
Quite simply, when identifying a group of components that perform a particular task,
the term `functional~group' describes it as a group that performs a function.
%
It also means that a functional~group can contain other functional~groups without
dragging along the semantic baggage that comes with the terms `module' and `sub-system'.
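%
As an illustrative sketch of this process, consider a hypothetical functional~group $G$
formed from two resistors, $R_1$ (connected to the supply) and $R_2$ (connected to ground),
arranged as a potential divider.
The component names, failure mode labels and symptoms used here are assumptions chosen
for illustration only.
Assume each resistor can fail $OPEN$ or $SHORT$:
$$ fm(R_1) = fm(R_2) = \{ OPEN, SHORT \} .$$
Each component failure mode in $G$ is examined in turn, and its effect on the divider output noted:
$$ R_1\;OPEN \mapsto LOW, \quad R_1\;SHORT \mapsto HIGH, \quad R_2\;OPEN \mapsto HIGH, \quad R_2\;SHORT \mapsto LOW .$$
The four component failure modes collapse into two symptoms, so the functional~group can be treated
as a single higher level component, here called $PD$, with its own failure modes:
$$ fm(PD) = \{ HIGH, LOW \} .$$
$PD$ could then be placed into a larger functional~group alongside other components, and the process repeated.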
\section{The proposed Methodology} \section{The proposed Methodology}
\label{fmmdproc} \label{fmmdproc}
@ -747,18 +958,27 @@ work together to perform a given function: the failure modes of the components
are analysed, and a failure mode behaviour for the group determined: this group are analysed, and a failure mode behaviour for the group determined: this group
can now be used as a component in its own right with a set of its own failure modes. can now be used as a component in its own right with a set of its own failure modes.
% %
% In essence, this methodology beginning with low level modules (or {\fgs}) In essence, this methodology begins with low level modules (or {\fgs})
% which are analysed and assigned a failure mode behaviour. which are analysed and assigned a failure mode behaviour.
% They are then considered as higher level components with They are then considered as higher level components with
% their own failure mode behaviour. These higher level components their own failure mode behaviour. These higher level components
% are then collected to form {\fgs} and so on until a hierarchy is built are then collected to form {\fgs} and so on until a hierarchy is built
% representing the entire system. representing the entire system.
% %
% Any new static failure mode methodology must ensure that it Any new static failure mode methodology must ensure that it
% represents all component failure modes and it therefore should be bottom-up, represents all component failure modes and it therefore should be bottom-up,
% starting with individual component failure modes. starting with individual component failure modes.
%
That way, all component failure modes must be considered.
%
If you modularise from the top down, it does not naturally follow that
bottom-level component failure modes would be handled/used.
%
Starting at the bottom means having to deal with each component failure mode from the beginning.
\paragraph{FMMD process.} \paragraph{FMMD process.}
To ensure all component failure modes are modelled and traceable through stages of analysis, the new methodology must be bottom-up. To ensure all component failure modes are modelled and traceable through stages of analysis, the new methodology must be bottom-up.
% %
%This seems essential to satisfy criterion 2. %This seems essential to satisfy criterion 2.
@ -824,7 +1044,7 @@ A practical example of a hardware FMEA performed both traditionally and using FM
software and hardware hybrid example is analysed in~\cite{syssafe2012} software and hardware hybrid example is analysed in~\cite{syssafe2012}
and examples of `reasoning~distance' efficiency savings can be found in~\cite{clark}[Ch.7]. and examples of `reasoning~distance' efficiency savings can be found in~\cite{clark}[Ch.7].
% %
\paragraph{Integrating software into the FMMD model} \paragraph{Integrating software into the FMMD model.}
% %
%With modular FMEA i.e. FMMD %(FMMD) %With modular FMEA i.e. FMMD %(FMMD)
%the concepts of failure~modes %the concepts of failure~modes
@ -881,7 +1101,7 @@ The software reads the temperature from the sensor and applies checks
to detect any failures. to detect any failures.
The software then applies a PID~\cite{dcods} algorithm to determine the length/modulation of the pulses applied to the heater. The software then applies a PID~\cite{dcods} algorithm to determine the length/modulation of the pulses applied to the heater.
yourdon context diagram here %yourdon context diagram here
@ -920,6 +1140,10 @@ as a complete example of an electronic/hardware hybrid analysed using FMMD. %wou
When designing a computer program it is often useful to When designing a computer program it is often useful to
start with a system overview. start with a system overview.
A structured analysis `Yourdon' context diagram~\cite{Yourdon:1989:MSA:62004} is presented below, see figure~\ref{fig:context_diagram_PID}. A structured analysis `Yourdon' context diagram~\cite{Yourdon:1989:MSA:62004} is presented below, see figure~\ref{fig:context_diagram_PID}.
A Yourdon context diagram shows an overview of a system, with the data inputs and data outputs.
The circle in the middle defines the processing applied to those inputs and outputs.
The context diagram can be later refined by introducing more circles with data paths between them.
% %
\begin{figure}[h] \begin{figure}[h]
\centering \centering
@ -932,7 +1156,7 @@ A structured analysis `Yourdon' context diagram~\cite{Yourdon:1989:MSA:62004} is
Using figure~\ref{fig:context_diagram_PID} the system in terms of its data flow is reviewed, starting Using figure~\ref{fig:context_diagram_PID} the system in terms of its data flow is reviewed, starting
with the data sources (the Pt100 temperature sensor inputs) and the data sinks (the heater output and the LED indicators). with the data sources (the Pt100 temperature sensor inputs) and the data sinks (the heater output and the LED indicators).
% %
There are two voltage inputs (see section~\ref{sec:Pt100}) from the Pt100 temperature sensor. There are two voltage inputs (see~\cite{clark}[5]) from the Pt100 temperature sensor.
% %
For the Pt100 sensor, the voltages it outputs are read and %for For the Pt100 sensor, the voltages it outputs are read and %for
this requires an ADC and MUX. this requires an ADC and MUX.
@ -987,7 +1211,9 @@ controlled system that does not use a traditional operating system. These are ge
coded in 'C' or assembly language and run immediately from power-up.} coded in 'C' or assembly language and run immediately from power-up.}
software architectures, a rudimentary operating system is required, often referred to as the `monitor'. software architectures, a rudimentary operating system is required, often referred to as the `monitor'.
% %
PID, because the algorithm depends heavily on integral calculus~\cite{dcods}[Ch.3.3] is time sensitive The `monitor' function calls the PID function at a regular and precise interval.
%
The PID function, because the algorithm depends heavily on integral calculus~\cite{dcods}[Ch.3.3] is time sensitive
and it is necessary to execute it at precise intervals determined by its proportional, integral and differential (PID) coefficients. and it is necessary to execute it at precise intervals determined by its proportional, integral and differential (PID) coefficients.
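%
To illustrate why the call interval matters, a textbook discrete-time form of the PID algorithm
can be written as below.
This is a sketch only: the symbols $K_p$, $K_i$, $K_d$ (the coefficients), $e_k$ (the error at sample $k$),
$u_k$ (the controller output) and $\Delta t$ (the sample interval) are assumptions introduced here,
and it is not necessarily the form implemented in the example firmware.
$$ u_k = K_p e_k + K_i \Delta t \sum_{j=0}^{k} e_j + K_d \frac{e_k - e_{k-1}}{\Delta t} .$$
The sample interval $\Delta t$ appears in both the integral and the differential terms.
If the function were called at irregular intervals while $\Delta t$ was assumed constant,
both of these terms would be scaled incorrectly.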
% %
Most micro-controllers feature several general purpose timers~\cite{pic18f2523}. Most micro-controllers feature several general purpose timers~\cite{pic18f2523}.
@ -997,7 +1223,7 @@ to call the PID algorithm at a regular and precise time interval. % specified in
% %
\paragraph{Data flow model to programmatic call tree.} \paragraph{Data flow model to programmatic call tree.}
The Yourdon methodology also gives guidance as to which software The Yourdon methodology also gives guidance as to which software
functions should be called to control the process, or in `C' terms be the main function. functions should be called to control a process, or in `C' terms be the main function.
% %
\begin{figure}[h] \begin{figure}[h]
\centering \centering
@ -1058,10 +1284,10 @@ These are listed, and from the bottom-up, FMMD analysis is begun.
To summarise from the design stage, To summarise from the design stage,
the electronic components identified thus far: the electronic components identified thus far:
\begin{itemize} \begin{itemize}
\item ADCMUX --- Electronics, analysed in previous example, \item ADCMUX --- Internal micro controller multiplexer and analogue to digital converter,
\item TIMER --- Internal micro controller timer, \item TIMER --- Internal micro controller timer,
\item HEATER --- Heating element, essentially a resistor, \item HEATER --- Heating element, essentially a resistor,
\item Pt100 --- Pt100 Temperature sensor, as analysed in section~\ref{sec:Pt100}, \item Pt100 --- Pt100 Temperature sensor,
\item PWM --- Internal micro controller pulse width modulation module, \item PWM --- Internal micro controller pulse width modulation module,
\item General Purpose I/O (GPIO) --- I/O used to drive LEDS, %. %source LED current \item General Purpose I/O (GPIO) --- I/O used to drive LEDS, %. %source LED current
\item LEDs --- Indication LEDs via GPIO, \item LEDs --- Indication LEDs via GPIO,
@ -1070,6 +1296,11 @@ the electronic components identified thus far:
% %
\subsection{Temperature Controller Hardware Elements FMMD.} \subsection{Temperature Controller Hardware Elements FMMD.}
% %
% TODO: need better references here for the
% sources of the failure modes of components.
\paragraph{ADCMUX and Read\_ADC.} \paragraph{ADCMUX and Read\_ADC.}
We re-use the {\dc} from section~\ref{readADC}. We re-use the {\dc} from section~\ref{readADC}.
$$ fm(RADC) = \{ VV\_ERR, HIGH, LOW \} .$$ $$ fm(RADC) = \{ VV\_ERR, HIGH, LOW \} .$$