commit 3e63538142 (parent 0aa4c9e924)
@ -23,8 +23,8 @@ This changed the target for the study slightly to encompass these three domains
|
||||
\section{Background}
|
||||
|
||||
I completed an MSc in Software Engineering in 2004 at Brighton University while working for
|
||||
an Engineering firm as a software Engineer.
|
||||
The firm make industrial burner controllers.
|
||||
an Engineering firm as a Software Engineer.
|
||||
The firm specialise in industrial burner controllers.
|
||||
Industrial burners are potentially very dangerous items of industrial plant.
|
||||
They are generally left running unattended for long periods.
|
||||
They are subject to stringent safety regulations and
|
||||
@ -35,15 +35,15 @@ One cannot merely comply with the standards.
|
||||
The product must be `certified' by an independent
|
||||
and
|
||||
`competent body' recognised under European law.
|
||||
The cerification involved stress testing with repeated operation cycles,
|
||||
over a specified a range of temperatures. Electrical stress testing with high voltage interference, and
|
||||
power supply voltage surges and dips. Electro static discharge testing, and
|
||||
The certification involved stress testing with repeated operation cycles,
|
||||
over a specified range of temperatures, electrical stress testing with high voltage interference, and
|
||||
power supply voltage surges and dips, electrostatic discharge testing, and
|
||||
EMC (electromagnetic compatibility) testing. A significant part
|
||||
of this process, however, was `static testing'. This involved looking at the design of the products,
|
||||
from the perspective of components failing, and the effect on safety this would have.
|
||||
Some of the static testing involved checking that the germane `EN' standards had
|
||||
been complied with. Failure Mode Effects Analysis (FMEA) was also applied. This involved
|
||||
looking in detail at critical sections of the product and proposing
|
||||
looking in detail at selected critical sections of the product and proposing
|
||||
component failure scenarios.
|
||||
For each failure scenario proposed, either a satisfactory
|
||||
answer was required, or a counter proposal to change the design to cope with
|
||||
@ -52,18 +52,18 @@ FMEA was time consuming, and being directed by
|
||||
experts undoubtedly ironed out many potential safety faults before the product saw
|
||||
light of day.
|
||||
However, it was quickly apparent that only a small proportion
|
||||
of copmponent~failure modes was considered. Also there was no formalism.
|
||||
of component~failure modes was considered. Also there was no formalism.
|
||||
The component~failure~modes investigated were not analysed within
|
||||
any rigorous or mathematically proven framework.
|
||||
|
||||
\subsection{ Blanket Risk Reduction Approach }
|
||||
|
||||
The suite of tests applied for a certified product amounts to a `blanket' approach.
|
||||
That is to say that by applying Electrical, repeated operations, and environmental
|
||||
That is to say that by applying electrical, repeated operations, and environmental
|
||||
stress testing it is hoped that the majority of latent faults are discovered.
|
||||
The FMEA and static testing only looked at the most obviously safety critical
|
||||
aspects, and a small minority of the total component base for a product.
|
||||
Systememic faults, or mistakes are missed by this form of static testing.
|
||||
Systemic faults, or mistakes, are missed by this form of static testing.
|
||||
|
||||
\subsection{Possibility of applying mathematical techniques to FMEA}
|
||||
|
||||
@ -73,7 +73,7 @@ and began thinking about how this could be done. One
|
||||
obvious factor was that a typical safety critical system could
|
||||
have more than 1000 component parts. Each component
|
||||
would typically have several failure modes.
|
||||
Trying to apply a rigourous methodology on an entire product
|
||||
Trying to apply a rigorous methodology on an entire product
|
||||
was going to be impractical. To do this with complete coverage
|
||||
each component failure mode would have to have been checked against
|
||||
the other thousand or so components for influence, and then
|
||||
@ -90,7 +90,7 @@ a set of system or top level faults or undesireable outcomes are defined.
|
||||
It then must break the system down into modules and
|
||||
decide which of these can contribute to a system level fault mode.
|
||||
Potentially, failure modes, be they from components or the interaction
|
||||
betweem modules can be missed. A disturbing example of this
|
||||
between modules, can be missed. A disturbing example of this
|
||||
is the loss of the NASA space shuttle Challenger in 1986, where the safety analysis missed the fault mode of an `O'
|
||||
ring. This was made even worse by the fact that the `O' ring had a specified temperature
|
||||
range where the probability of this fault occurring was dramatically raised when below
|
||||
@ -98,11 +98,9 @@ the temperature range. This was a known and documented feature of a safety criti
|
||||
and it was ignored in the safety analysis.
|
||||
|
||||
\paragraph{Bottom-up Approach}
|
||||
A bottom-up approach look impractical at first due to the shear number
|
||||
of component failure modes in a typical system. However
|
||||
were this bottom-up approach to be modular
|
||||
we can reduce the
|
||||
, and built into a hierachy
|
||||
A bottom-up approach looked impractical at first due to the sheer number
|
||||
of component failure modes in a typical system.
|
||||
However, were this bottom-up approach to be modular (reducing the order of cross checking) and to build a hierarchy
|
||||
of modules rising up until all components are covered, we
|
||||
can model an entire complex system.
|
||||
This is the core concept behind this study.
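
As a rough illustration of why modularisation helps (the symbols $N$, $K$ and $g$, and the
assumption of uniform grouping, are introduced only for this sketch and are not figures from the study),
suppose a system has $N$ components, each with $K$ failure modes.
Checking every failure mode against every other component requires
\begin{equation}
C_{flat} = K N (N-1)
\end{equation}
checks, which for $N = 1000$ and $K = 3$ is $2\,997\,000$.
If instead the components are collected into modules of $g$ items, and each module is
assumed to present about $K$ derived failure modes to the level above, then level $l$ of
the hierarchy holds roughly $N / g^{l}$ items and costs $K (N / g^{l}) (g-1)$ checks, giving
\begin{equation}
C_{mod} = \sum_{l \geq 0} K \frac{N}{g^{l}} (g-1) < K g N .
\end{equation}
With $g = 5$ this bound is $15\,000$ checks, around two hundred times fewer than the flat analysis.
The real saving depends on how many derived failure modes each module actually produces,
so these numbers should be read as an order of magnitude argument only.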
|
||||
@ -117,15 +115,15 @@ Also a hierarchy is formed when the top level errors are formed
|
||||
naturally from the lower levels of analysis.
|
||||
Unlike a top~down analysis, we cannot miss a top level fault condition.
|
||||
|
||||
\paragraph{Multi-disipline}. Most safety critical systems are composed of mechanical, electrical and
|
||||
computing elements. A tragic example of the mechanical and electircal elements
|
||||
interfacing to a computer~controller is found in the THERAC25 x-ray dosage machine.
|
||||
With no common notation to integrate the saftey analyis between the electricali/mechanical and computing
|
||||
domains synchronisation errors occurred that were in some cases fatal.
|
||||
\paragraph{Multi-discipline} Most safety critical systems are composed of mechanical, electrical and
|
||||
computing elements. A tragic example of the mechanical and electrical elements
|
||||
interfacing to a computer is found in the Therac-25 X-ray dosage machine.
|
||||
With no common notation to integrate the safety analysis between the electrical/mechanical and computing
|
||||
domains, synchronisation errors occurred that were in some cases fatal.
|
||||
|
||||
\paragraph{Requirements for a rigourous FMEA process}.
|
||||
\paragraph{Requirements for a rigorous FMEA process}
|
||||
It was determined that any process to apply
|
||||
FMEA in rigourous and complete (in terms of complete component coverage) had to be
|
||||
FMEA in a rigorous and complete way (in terms of component coverage) had to be
|
||||
a bottom~up process to eliminate the possibility of missing component failure modes.
|
||||
It also had to naturally converge to a failure model of the system.
|
||||
It had to take potentially thousands of component failure modes and simplify
|
||||
@ -137,7 +135,7 @@ a process of modularisation from the bottom~up.
|
||||
\begin{list}{$*$}{}
|
||||
\item The analysis process must be `bottom~up'
|
||||
\item The process must be modular and hierarchical
|
||||
\item The process must be multi-disipline and must be able to represent hardware, electronics and software
|
||||
\item The process must be multi-discipline and must be able to represent hardware, electronics and software
|
||||
\end{list}
|
||||
|
||||
\section{Safety Critical Systems}
|
||||
@ -172,16 +170,18 @@ EN61508 \cite{EN61508} (international standard IOC1508).
|
||||
|
||||
\paragraph{Deterministic safety Measures}
|
||||
The second philosophy, applied to application specific standards, is to investigate
|
||||
components ior sub-systems in the critical safety path and to look at component failure modes
|
||||
components or sub-systems in the critical safety path and to look at component failure modes
|
||||
and ensure that they cannot cause dangerous faults.
|
||||
With the application specific standards detail
|
||||
specific to the process are
|
||||
This philosophy is first mentioned in aircraft safety operation reseach WWII
|
||||
studies. Here potential single faults (usually mechanical) are traced to
|
||||
catastrophic failures
|
||||
|
||||
% \cite{boffin}.
|
||||
|
||||
%With the application specific standards detail
|
||||
%specific to the process are
|
||||
The simplest deterministic safety measure is to require that no single component failure
|
||||
mode can cause a dangerous error.
|
||||
This philosophy is first mentioned in aircraft safety operational research (WWII)
|
||||
studies. Here potential single faults (usually mechanical) were traced to
|
||||
catastrophic failures \cite{boffin}.
|
||||
EN298, the European gas burner standard, goes further than this
|
||||
and requires that no two single component faults may cause
|
||||
a dangerous condition.
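
One way to state these two requirements compactly is sketched below; the symbols
$F$, $D$ and $e(\cdot)$ are introduced here purely for illustration and are not taken from the standards.
Let $F$ be the set of component failure modes of the system, let $D$ be the set of
dangerous system conditions, and let $e(S)$ denote the system condition that results
when the failure modes in $S \subseteq F$ are all active.
The single failure requirement is then
\begin{equation}
\forall f \in F : \; e(\{f\}) \notin D ,
\end{equation}
and the double failure requirement described above for EN298 adds
\begin{equation}
\forall f_1, f_2 \in F, \; f_1 \neq f_2 : \; e(\{f_1, f_2\}) \notin D .
\end{equation}
The first condition needs $|F|$ cases to be examined and the second a further
$|F| (|F| - 1) / 2$, so the burden of double failure analysis grows quadratically
with the number of failure modes.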
|
||||
|
||||
|
||||
%
|
||||
@ -194,13 +194,13 @@ catastrophic failures
|
||||
|
||||
\subsection{Overview of regulation of safety Critical systems}
|
||||
|
||||
reference chapter dealing speciifically with this but given a quick overview.
|
||||
Reference the chapter dealing specifically with this, but give a quick overview here.
|
||||
\subsubsection{Overview system analysis philosophies }
|
||||
- General safety standards
|
||||
- specific safety standards
|
||||
|
||||
\subsubsection{Overview of current testing and certification}
|
||||
ref chapter speciifically on this but give an overview now
|
||||
Ref chapter specifically on this but give an overview now
|
||||
|
||||
A modern industrial burner has mechanical, electronic and software
|
||||
elements that are all safety critical. That is to say
|
||||
@ -234,7 +234,7 @@ unhandled failures could create dangerous faults.
|
||||
%
|
||||
\section{An Outline of the FMMD Technique}
|
||||
|
||||
The methodology takes a bottom up approach to
|
||||
The FMMD methodology takes a bottom-up approach to
|
||||
the design of an integrated system.
|
||||
%
|
||||
Each component is assigned a well defined set of failure modes.
|
||||
@ -243,7 +243,7 @@ perform simple well defined tasks.
|
||||
These functional groups are analysed with respect to the failure modes of the
|
||||
components.
|
||||
%
|
||||
The `functional group', after analysis, have its own set of derived
|
||||
The `functional group', after analysis, has its own set of derived
|
||||
failure modes.
|
||||
%
|
||||
The number of derived failure modes will be
|
||||
@ -272,10 +272,10 @@ A formal description of this process is dealt with in Chapter \ref{fmmddefinitio
|
||||
Automated systems, as opposed to manual ones, are now the norm
|
||||
in the home and in industry.
|
||||
%
|
||||
Automated systems have long been recognised as being more effecient and
|
||||
Automated systems have long been recognised as being more efficient and
|
||||
more accurate than a human operator, and the reason for automating a process
|
||||
is now more likely to be cost savings due to better efficiency
|
||||
than a human operator \ref{burnereffency}.
|
||||
than not paying a salary to a human operator \ref{burnereffency}.
|
||||
%
|
||||
For instance
|
||||
early automated systems were mechanical, with cams and levers simulating
|
||||
@ -285,11 +285,11 @@ A typical control function could be the
|
||||
fuel air mixture profile curves over the firing range.
|
||||
%
|
||||
Because fuels vary slightly in calorific value, and air density changes with the weather, no fixed tuning can be optimal.
|
||||
In fact for asethtic reasons (not wanting smoke to appear at the flue)
|
||||
In fact, for aesthetic reasons (not wanting smoke to appear at the flue)
|
||||
the tuning was often air rich, causing air to be heated and
|
||||
uneccessarily passed through the burner, leading to direct loss of energy.
|
||||
An automated system analysing the combustions gasses and automatically
|
||||
adjusting the fuel air mix can get the effeciencies very close to theoretical levels.
|
||||
unnecessarily passed through the burner, leading to direct loss of energy.
|
||||
An automated system analysing the combustion gases and automatically
|
||||
adjusting the fuel air mix can get the efficiencies very close to theoretical levels.
|
||||
|
||||
|
||||
As the automation takes over more and more functions from the human operator it also takes on more responsibility.
|
||||
@ -297,7 +297,7 @@ A classic example of an automated system failing, is the therac-25.
|
||||
This was an X-ray dosage machine that, due to software errors,
|
||||
caused the deaths of several patients and injured more during the 1980s.
|
||||
The Therac-25 was designed from a manual system, which had checks and interlocks,
|
||||
and was computerised. Software bugs were the primnary causes of the radiation
|
||||
and was subsequently computerised. Software bugs were the primary causes of the radiation
|
||||
overdoses.
|
||||
\cite{therac}
|
||||
Any new safety critical analysis methodology should
|
||||
@ -311,18 +311,18 @@ fault conditions are missed.
|
||||
|
||||
% http://en.wikipedia.org/wiki/Autopilot
|
||||
\paragraph{Importance of self checking}
|
||||
To take an example of an Autopilot, simple early autopilots, were (i.e. they
|
||||
prevented the aircraft staying from a compass bearing and kept it flying striaght and level).
|
||||
To take the example of an autopilot, simple early autopilots
|
||||
prevented the aircraft straying from a compass bearing and kept it flying straight and level.
|
||||
Were they to fail, the pilot would notice quite quickly
|
||||
and resume manual control of the bearing.
|
||||
|
||||
Modern autopilots control all aspects of flight including the engines, and take off and landing phases.
|
||||
Modern autopilots control all aspects of flight, including the engines and the take-off and landing phases.
|
||||
The automated system does not have the
|
||||
common sense of a human pilot either, if fed the wrong sensory information
|
||||
it could make horrendous mistakes. This means that simply reading sensors and applying control
|
||||
common sense of a human pilot either, and if fed the wrong sensory information
|
||||
could make horrendous mistakes. This means that simply reading sensors and applying control
|
||||
corrections cannot be enough.
|
||||
Checking for error conditions must also be incorporated.
|
||||
It could also develop an internal fault, and must be able to cope with this.
|
||||
Checking for error conditions must also be incorporated.
|
||||
It could also develop an internal fault, and must be able to recognise and cope with this.
|
||||
|
||||
|
||||
|
||||
@ -515,16 +515,15 @@ built representing the fault behaviour of a system.
|
||||
\item To create a user friendly formal common visual notation to represent fault modes
|
||||
in Software, Electronic and Mechanical sub-systems.
|
||||
\item To formally define this visual language in concrete and abstract domains.
|
||||
\item To prove that the derived~componets used to build the hierarchies
|
||||
\item To prove that the derived~components used to build the hierarchies
|
||||
provide traceable fault handling from component level to the
|
||||
highest abstract system `top level'.
|
||||
\item To formally define the hierarchies and the procedure for building them.
|
||||
\item To produce a software tool to aid in the drawing of diagrams and
|
||||
ensuring that all fault modes are addressed.
|
||||
\item to provide for determinisic and probablistic failure mode analysis processes
|
||||
\item To provide for deterministic and probabilistic failure mode analysis processes
|
||||
\item To allow the possibility of MTTF calculation for statistical
|
||||
reliability/safety calculations.
|
||||
\end{itemize}
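
As an indication of the kind of MTTF calculation intended, and assuming the common
simplification of constant failure rates (the symbols $\lambda_i$ and $\lambda_{sys}$ are
introduced here for illustration only), a system in which any one of $n$ component
failure modes causes a system failure has
\begin{equation}
\lambda_{sys} = \sum_{i=1}^{n} \lambda_i ,
\qquad
MTTF = \frac{1}{\lambda_{sys}} .
\end{equation}
For example, three failure modes each with $\lambda_i = 10^{-6}$ failures per hour give
$\lambda_{sys} = 3 \times 10^{-6}$ per hour and an MTTF of roughly $333\,000$ hours.
Redundant or fault tolerant architectures do not combine this simply, so this is a
sketch of the simplest case only.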
|
||||
|
||||
|
||||
% \end{document}
|
||||
|
@ -62,11 +62,18 @@ look at some of Nancys accident papaers.
|
||||
High level technique: look at processes with feedback loops and rules, and then the interfaces between them.
|
||||
|
||||
|
||||
\subsection{Deterministic Approach}
|
||||
\section{Deterministic Approach}
|
||||
\paragraph{NOT WRITTEN YET PLEASE IGNORE}
|
||||
No single component fault may lead to a dangerous condition.
|
||||
EN298, EN230, etc.
|
||||
|
||||
|
||||
\section{Statistical - tolerated failure frequencies}
|
||||
|
||||
European standard
|
||||
EN61508 takes a statistical approach.
|
||||
It sets out four Safety Integrity Levels (SILs).
|
||||
|
||||
\subsection{Bayes Theorem}
|
||||
\paragraph{NOT WRITTEN YET PLEASE IGNORE}
|
||||
\label{bayes}
|
||||
@ -75,7 +82,7 @@ probablistic approach - no direct causation paths to the higher~abstraction faul
|
||||
Often for instance a component in a module within a module within a module etc
|
||||
that has a probability of causing a SYSTEM level fault.
|
||||
|
||||
Used in FTA\cite{NASA}\cite{NUK}.
|
||||
Philosophy behind FTA\cite{NASA}\cite{NUK}.
|
||||
The idea being that probabilities can be assigned to components
|
||||
failing, causing system level errors.
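
For reference, Bayes' theorem in the form relevant here can be written as below, where
$C$ (a given component failure mode has occurred) and $S$ (a given system level fault is
observed) are event names chosen for this sketch:
\begin{equation}
P(C \mid S) = \frac{P(S \mid C) \, P(C)}{P(S)} .
\end{equation}
Here $P(C)$ is the prior probability of the component failure, $P(S \mid C)$ is the
probability that this failure leads to the system level fault, and $P(S)$ is the overall
probability of the system level fault from all causes.
The result $P(C \mid S)$ is the probability that the component failure was the cause,
given that the system level fault has been observed.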
|
||||
|
||||
|
@ -53,7 +53,7 @@
|
||||
\chapter{Thesis Scope}
|
||||
\input{introduction/introduction}
|
||||
|
||||
\chapter{Statistical Methods and Models}
|
||||
\chapter{Safety Critical Systems Analysis}
|
||||
\input{statistics/statistics}
|
||||
|
||||
\chapter{Survey of Safety Critical Analysis Methodologies and Tools Available}
|
||||
|