Robin 2010-06-25 19:55:37 +01:00
parent 0aa4c9e924
commit 3e63538142
3 changed files with 63 additions and 57 deletions

@@ -23,8 +23,8 @@ This changed the target for the study slightly to encompass these three domains
 \section{Background}
 I completed an MSc in Software engineering in 2004 at Brighton University while working for
-an Engineering firm as a software Engineer.
-The firm make industrial burner controllers.
+an Engineering firm as a Software Engineer.
+The firm specialise in industrial burner controllers.
 Industrial Burners are potentially very dangerous industrial plant.
 They are generally left running unattended for long periods.
 They are subject to stringent safety regulations and
@@ -35,15 +35,15 @@ One cannot merely comply with the standards.
 The product must be `certified' by an independent
 and
 `competent body' recognised under European law.
-The cerification involved stress testing with repeated operation cycles,
-over a specified a range of temperatures. Electrical stress testing with high voltage interference, and
-power supply voltage surges and dips. Electro static discharge testing, and
+The certification involved stress testing with repeated operation cycles,
+over a specified a range of temperatures, electrical stress testing with high voltage interference, and
+power supply voltage surges and dips, electro static discharge testing, and
 EMC (Electro Magnetic Compatibility). A significant part
 of this process however, was `static testing'. This involved looking at the design of the products,
 from the perspective of components failing, and the effect on safety this would have.
 Some of the static testing involved checking that the germane `EN' standards had
 been complied with. Failure Mode Effects Analysis (FMEA) was also applied. This involved
-looking in detail at critical sections of the product and proposing
+looking in detail at selected critical sections of the product and proposing
 component failure scenarios.
 For each failure scenario proposed either a satisfactory
 answer was required, or a counter proposal to change the design to cope with
@@ -52,18 +52,18 @@ FMEA was time consuming, and being directed by
 experts undoubtly ironed out many potential safety faults before the product saw
 light of day.
 However it was quickly apparent that only a small proportion
-of copmponent~failure modes was considered. Also there was no formalism.
+of component~failure modes was considered. Also there was no formalism.
 The component~failure~modes investigated were not analysed within
 any rigourous or mathematically proven framework.
 \subsection{ Blanket Risk Reduction Approach }
 The suite of tests applied for a certified product amount to a `blanket' approach.
-That is to say that by applying Electrical, repeated operations, and environmental
+That is to say that by applying electrical, repeated operations, and environmental
 stress testing it is hoped that the majority of latent faults are discovered.
 The FMEA and static testing only looked at the most obviously safety critical
 aspects, and a small minority of the total component base for a product.
-Systememic faults, or mistakes are missed by this form of static testing.
+Systemic faults, or mistakes are missed by this form of static testing.
 \subsection{Possibility of applying mathematical techniques to FMEA}
@@ -73,7 +73,7 @@ and began thinking about how this could be done. One
 obvious factor was that a typical safety critical system could
 have more than 1000 component parts. Each component
 would typically have several failure modes.
-Trying to apply a rigourous methodology on an entire product
+Trying to apply a rigorous methodology on an entire product
 was going to be impractical. To do this with complete coverage
 each component failure mode would have to have been checked against
 the other thousand or so components for influence, and then
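To give a sense of the scale implied by the hunk above, the number of checks for a flat, complete-coverage FMEA can be sketched in Python. This is a minimal illustration rather than part of the thesis: the function name flat_cross_checks is hypothetical, and the figure of three failure modes per component is an assumed stand-in for `several'.

```python
def flat_cross_checks(n_components: int, modes_per_component: int) -> int:
    """Number of checks if every failure mode of every component is examined
    for influence against every *other* component (flat, non-modular FMEA)."""
    return n_components * modes_per_component * (n_components - 1)


if __name__ == "__main__":
    # Hypothetical figures: 1000 components, each assumed to have 3 failure modes.
    print(flat_cross_checks(1000, 3))  # 2997000
```

Even under these modest assumptions a single pass needs roughly three million checks, and the count grows with the square of the number of components.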
@@ -90,7 +90,7 @@ a set of system or top level faults or undesireable outcomes are defined.
 It then must break the system down into modules and
 decide which of these can contribute to a system level fault mode.
 Potentially failure modes, be they from components or the interaction
-betweem modules can be missed. A disturbing example of this
+between modules can be missed. A disturbing example of this
 is the NASA space shuttle in 1986, which missed the fault mode of an O
 ring. This was made even worse, by the fact that the `O' ring had a specified temperature
 range where the probability of this fault occuring was dramatically raised when below
@@ -98,11 +98,9 @@ the temperature range. This was a known and documented feature of a safety criti
 and it was ignored in the safety analysis.
 \paragraph{Bottom-up Approach}
-A bottom-up approach look impractical at first due to the shear number
-of component failure modes in a typical system. However
-were this bottom-up approach to be modular
-we can reduce the
-, and built into a hierachy
+A bottom-up approach looked impractical at first due to the sheer number
+of component failure modes in a typical system.
+However were this bottom-up approach to be modular, (reducing the order of cross checking), and build a hierachy
 of modules rising up until all components are covered, we
 can model an entire complex system.
 This is the core concept behind this study.
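The claim that a modular bottom-up approach reduces the order of cross checking can be illustrated with a rough count. The sketch below assumes, purely for illustration, that components are partitioned into groups of at most group_size, that each failure mode is checked only against the other members of its own group, and that each analysed group becomes one derived component with the same number of failure modes at the next level up. The grouping rule and the name modular_cross_checks are hypothetical; the thesis does not prescribe group sizes.

```python
import math


def modular_cross_checks(n_components: int, modes: int, group_size: int) -> int:
    """Total checks when each failure mode is checked only against the other
    members of its own group, level by level, until a single derived component
    (the system) remains. Assumes every item, base or derived, has `modes`
    failure modes."""
    if group_size < 2:
        raise ValueError("group_size must be at least 2 for the hierarchy to shrink")
    total = 0
    n = n_components
    while n > 1:
        groups = math.ceil(n / group_size)
        for i in range(groups):
            # The last group may be smaller than group_size.
            members = min(group_size, n - i * group_size)
            total += members * modes * (members - 1)
        n = groups  # each analysed group becomes one derived component
    return total


if __name__ == "__main__":
    # Hypothetical figures: 1000 components, 3 failure modes each, groups of 10.
    print(modular_cross_checks(1000, 3, 10))  # 29970
```

With these assumptions the hierarchy has three levels and needs 29,970 checks, compared with 1000 x 3 x 999 = 2,997,000 if every failure mode were checked against every other component, because each check is confined to a small group.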
@@ -117,15 +115,15 @@ Also a hierarchy is formed when the top level errors are formed
 naturally from the lower levels of analysis.
 Unlike a top~down analysis, we cannot miss a top level fault condition.
-\paragraph{Multi-disipline}. Most safety critical systems are composed of mechanical, electrical and
-computing elements. A tragic example of the mechanical and electircal elements
-interfacing to a computer~controller is found in the THERAC25 x-ray dosage machine.
-With no common notation to integrate the saftey analyis between the electricali/mechanical and computing
-domains synchronisation errors occurred that were in some cases fatal.
-\paragraph{Requirements for a rigourous FMEA process}.
+\paragraph{Multi-disipline} Most safety critical systems are composed of mechanical, electrical and
+computing elements. A tragic example of the mechanical and electrical elements
+interfacing to a computer is found in the THERAC25 x-ray dosage machine.
+With no common notation to integrate the saftey analyis between the electrical/mechanical and computing
+domains, synchronisation errors occurred that were in some cases fatal.
+\paragraph{Requirements for a rigorous FMEA process}
 It was determined that any process to apply
-FMEA in rigourous and complete (in terms of complete component coverage) had to be
+FMEA in rigorous and complete (in terms of complete component coverage) had to be
 a bottom~up process to eliminate the possibility of missing component failure modes.
 It also had to naturally converge to a failure model of the system.
 It had to take potentially thousands of component failure modes and simplify
@@ -137,7 +135,7 @@ a process of modularisation from the bottom~up.
 \begin{list}{$*$}{}
 \item The analysis process must be `bottom~up'
 \item The process must be modular and hierarchical
-\item The process must be multi-disipline and must be able to represent hardware, electronics and software
+\item The process must be multi-dicipline and must be able to represent hardware, electronics and software
 \end{list}
 \section{Safety Critical Systems}
@@ -172,16 +170,18 @@ EN61508 \cite{EN61508} (international standard IOC1508).
 \paragraph{Deterministic safety Measures}
 The second philosophy, applied to application specific standards, is to investigate
-components ior sub-systems in the critical safety path and to look at component failure modes
+components for sub-systems in the critical safety path and to look at component failure modes
 and ensure that they cannot cause dangerous faults.
-With the application specific standards detail
-specific to the process are
-This philosophy is first mentioned in aircraft safety operation reseach WWII
-studies. Here potential single faults (usually mechanical) are traced to
-catastrophic failures
-% \cite{boffin}.
+%With the application specific standards detail
+%specific to the process are
+The simplest deterministic safety measure is to require that no single component failure
+mode can cause a dangerous error.
+This philosophy is first mentioned in aircraft safety operation reseach (WWII)
+studies. Here potential single faults (usually mechanical) were traced to
+catastrophic failures \cite{boffin}.
+EN298, the European Gas burner standard, goes further than this
+and requires that no two single component faults may cause
+a dangerous condition.
 %
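The difference between the single-fault criterion and the two-fault criterion attributed to EN298 in the hunk above can be shown by enumerating fault combinations. The sketch is illustrative only: the relay example, its failure modes and the valve_held_open predicate are hypothetical, and it does not reproduce any procedure from EN298 itself. It assumes a component can be in at most one failure mode at a time.

```python
from itertools import combinations


def dangerous_combinations(failure_modes, is_dangerous, max_faults):
    """Return every combination of 1..max_faults simultaneous failure modes,
    drawn from distinct components, for which is_dangerous(...) is True.
    failure_modes is a list of (component, mode) pairs."""
    found = []
    for k in range(1, max_faults + 1):
        for combo in combinations(failure_modes, k):
            if len({component for component, _ in combo}) < k:
                continue  # one component cannot fail in two modes at once
            if is_dangerous(frozenset(combo)):
                found.append(combo)
    return found


# Hypothetical example: two relay contacts in series switch a gas valve.
# The valve can only be held open dangerously if both contacts stick closed.
MODES = [("relay1", "stuck_closed"), ("relay1", "stuck_open"),
         ("relay2", "stuck_closed"), ("relay2", "stuck_open")]


def valve_held_open(faults):
    return {("relay1", "stuck_closed"), ("relay2", "stuck_closed")} <= faults


if __name__ == "__main__":
    print(dangerous_combinations(MODES, valve_held_open, 1))  # []
    print(dangerous_combinations(MODES, valve_held_open, 2))
    # [(('relay1', 'stuck_closed'), ('relay2', 'stuck_closed'))]
```

The toy design passes a single-fault check, but the two-fault enumeration flags the double failure, which is the kind of case a no-two-faults requirement is intended to expose. The number of combinations grows quickly with max_faults, which fits with restricting such checks to components in the critical safety path.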
@@ -194,13 +194,13 @@ catastrophic failures
 \subsection{Overview of regulation of safety Critical systems}
-reference chapter dealing speciifically with this but given a quick overview.
+Reference chapter dealing specifically with this but given a quick overview.
 \subsubsection{Overview system analysis philosophies }
 - General safety standards
 - specific safety standards
 \subsubsection{Overview of current testing and certification}
-ref chapter speciifically on this but give an overview now
+Ref chapter specifically on this but give an overview now
 A modern industrial burner has mechanical, electronic and software
 elements, that are all safety critical. That is to say
@@ -234,7 +234,7 @@ unhandled failures could create dangerous faults.
 %
 \section{An Outline of the FMMD Technique}
-The methodology takes a bottom up approach to
+The FMMD methodology takes a bottom up approach to
 the design of an integrated system.
 %
 Each component is assigned a well defined set of failure modes.
@@ -243,7 +243,7 @@ perform simple well defined tasks.
 These functional groups are analysed with respect to the failure modes of the
 components.
 %
-The `functional group', after analysis, have its own set of derived
+The `functional group', after analysis, has its own set of derived
 failure modes.
 %
 The number of derived failure modes will be
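The step described in these hunks, in which a functional group is analysed and replaced by a derived component with its own failure modes, can be sketched as follows. This is a minimal illustration under stated assumptions, not the formal definition of FMMD: the Component class, the effects table and the potential divider example (two resistors, each assumed to fail only OPEN or SHORT) are hypothetical. The function refuses to produce a derived component if any member failure mode has not been analysed, reflecting the aim that no failure mode is missed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    failure_modes: frozenset


def analyse_functional_group(name, members, effects):
    """Collapse a functional group into a derived component.

    effects maps (member name, member failure mode) to the failure mode of the
    group as a whole (its symptom). Every member failure mode must appear,
    otherwise a ValueError is raised rather than silently ignoring it."""
    missing = [(c.name, m) for c in members for m in sorted(c.failure_modes)
               if (c.name, m) not in effects]
    if missing:
        raise ValueError(f"unanalysed failure modes: {missing}")
    derived = frozenset(effects[(c.name, m)] for c in members for m in c.failure_modes)
    return Component(name, derived)


# Hypothetical potential divider: R1 from supply to output, R2 from output to ground.
R1 = Component("R1", frozenset({"OPEN", "SHORT"}))
R2 = Component("R2", frozenset({"OPEN", "SHORT"}))
EFFECTS = {
    ("R1", "OPEN"): "OUTPUT_LOW",    # output pulled to ground by R2
    ("R1", "SHORT"): "OUTPUT_HIGH",  # output tied to supply
    ("R2", "OPEN"): "OUTPUT_HIGH",   # output pulled to supply by R1
    ("R2", "SHORT"): "OUTPUT_LOW",   # output tied to ground
}

if __name__ == "__main__":
    divider = analyse_functional_group("PotentialDivider", [R1, R2], EFFECTS)
    print(sorted(divider.failure_modes))  # ['OUTPUT_HIGH', 'OUTPUT_LOW']
```

Here four component failure modes collapse to two derived failure modes, and the resulting PotentialDivider can itself be a member of a functional group at the next level up, which is how the hierarchy is built.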
@@ -272,10 +272,10 @@ A formal description of this process is dealt with in Chapter \ref{fmmddefinitio
 Automated systems, as opposed to manual ones are now the norm
 in the home and in industry.
 %
-Automated systems have long been recognised as being more effecient and
+Automated systems have long been recognised as being more efficient and
 more accurate than a human opperator, and the reason for automating a process
 can now be more likely to be cost savings due to better effeciency
-than a human operator \ref{burnereffency}.
+than a not paying a salary to a human operator \ref{burnereffency}.
 %
 For instance
 early automated systems were mechanical, with cams and levers simulating
@@ -285,11 +285,11 @@ A typical control function could be the
 fuel air mixture profile curves over a the firing range.
 %
 Because fuels vary slightly in calorific value, and air density changes with the weather, no optimal tuning can be optional.
-In fact for asethtic reasons (not wanting smoke to appear at the flue)
+In fact for asethetic reasons (not wanting smoke to appear at the flue)
 the tuning was often air rich, causing air to be heated and
-uneccessarily passed through the burner, leading to direct loss of energy.
-An automated system analysing the combustions gasses and automatically
-adjusting the fuel air mix can get the effeciencies very close to theoretical levels.
+unnecessarily passed through the burner, leading to direct loss of energy.
+An automated system analysing the combustion gasses and automatically
+adjusting the fuel air mix can get the efficiencies very close to theoretical levels.
 As the automation takes over more and more functions from the human operator it also takes on more responsibility.
@@ -297,7 +297,7 @@ A classic example of an automated system failing, is the therac-25.
 This was an X-ray dosage machine, that, due to software errors
 caused the deaths of several patients and injured more during the 1980's.
 The Therac-25 was a designed from a manual system, which had checks and interlocks,
-and was computerised. Software bugs were the primnary causes of the radiation
+and was subsequently computerised. Software bugs were the primary causes of the radiation
 overdoses.
 \cite{therac}
 Any new safety critical analysis methodology should
@@ -311,18 +311,18 @@ fault conditions are missed.
 % http://en.wikipedia.org/wiki/Autopilot
 \paragraph{Importance of self checking}
-To take an example of an Autopilot, simple early autopilots, were (i.e. they
-prevented the aircraft staying from a compass bearing and kept it flying striaght and level).
+To take an example of an Autopilot, simple early autopilots,
+prevented the aircraft staying from a compass bearing and kept it flying striaght and level.
 Were they to fail the pilot would notice quite quickly
 and resume manual control of the bearing.
-Modern autopilots control all aspects of flight including the engines, and take off and landing phases.
+Modern autopilots control all aspects of flight including the engines, take off and landing phases.
 The automated system does not have the
-common sense of a human pilot either, if fed the wrong sensory information
-it could make horrendous mistakes. This means that simply reading sensors and applying control
+common sense of a human pilot either and if fed the wrong sensory information
+could make horrendous mistakes. This means that simply reading sensors and applying control
 corrections cannot be enough.
 Checking for error conditions must also be incorporated.
-It could also develop an internal fault, and must be able to cope with this.
+It could also develop an internal fault, and must be able to recognise and cope with this.
@@ -515,16 +515,15 @@ built representing the fault behaviour of a system.
 \item To create a user friendly formal common visual notation to represent fault modes
 in Software, Electronic and Mechanical sub-systems.
 \item To formally define this visual language in concrete and abstract domains.
-\item To prove that the derived~componets used to build the hierarchies
+\item To prove that the derived~components used to build the hierarchies
 provide traceable fault handling from component level to the
 highest abstract system 'top level'.
 \item To formally define the hierarchies and procedure for bulding them.
 \item To produce a software tool to aid in the drawing of diagrams and
 ensuring that all fault modes are addressed.
-\item to provide for determinisic and probablistic failure mode analysis processes
+\item to provide for deterministic and probablistic failure mode analysis processes
 \item To allow the possiblility of MTTF calculation for statistical
 reliability/safety calculations.
 \end{itemize}
-% fucking cunt
+\end{document}
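The objectives above mention MTTF calculation. As a hedged sketch of the simplest case, assuming constant (exponentially distributed) failure rates and a series system in which any component failure fails the whole, the system failure rate is the sum of the component rates and MTTF is its reciprocal. The FIT figures (failures per 10^9 hours) in the example are invented for illustration, and the function names are hypothetical.

```python
def fit_to_per_hour(fit: float) -> float:
    """Convert a failure rate in FIT (failures per 10^9 device hours) to per hour."""
    return fit * 1e-9


def series_mttf_hours(rates_per_hour) -> float:
    """MTTF of a series system with constant component failure rates:
    lambda_sys = sum(lambda_i), MTTF = 1 / lambda_sys."""
    total = sum(rates_per_hour)
    if total <= 0:
        raise ValueError("total failure rate must be positive")
    return 1.0 / total


if __name__ == "__main__":
    # Hypothetical component failure rates in FIT.
    rates = [fit_to_per_hour(f) for f in (100, 400, 500)]
    mttf = series_mttf_hours(rates)
    print(f"{mttf:.3g} hours")  # 1e+06 hours
```

A combined rate of 1000 FIT corresponds to an MTTF of about 10^6 hours. Treating a design with redundancy as a series system underestimates its MTTF, and the figure says nothing about whether a failure is dangerous, which is where the failure mode analysis comes in.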

@@ -62,11 +62,18 @@ look at some of Nancys accident papaers.
 High level technique, look at processes with feed back loops and rules, and then interfaces wbetween them.
-\subsection{Deterministic Approach}
+\section{Deterministic Approach}
 \paragraph{NOT WRITTEN YET PLEASE IGNORE}
 No single component fault may lead to a dangerous condition.
 EN298 En230 etc
+\section{Statistical - tolerated failure frequencies}
+Euopean standard
+EN61508 takes a statistical approach.
+It sets out four Safety Integrity Levels (SIL)
 \subsection{Bayes Theorem}
 \paragraph{NOT WRITTEN YET PLEASE IGNORE}
 \label{bayes}
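The new section notes that EN61508 sets out four Safety Integrity Levels. For reference, the sketch below encodes the target bands for a safety function in high-demand or continuous mode, expressed as probability of a dangerous failure per hour (PFH). The band limits are those commonly tabulated from IEC 61508-1 (EN 61508-1) and should be checked against the published table; low-demand mode uses a different measure (average probability of failure on demand) with its own bands. The function name highest_sil_for_pfh is hypothetical.

```python
# Upper limits (exclusive) of the PFH target bands, best SIL first.
# SIL 4: 1e-9 <= PFH < 1e-8, SIL 3: 1e-8 <= PFH < 1e-7,
# SIL 2: 1e-7 <= PFH < 1e-6, SIL 1: 1e-6 <= PFH < 1e-5.
PFH_UPPER_LIMITS = [(4, 1e-8), (3, 1e-7), (2, 1e-6), (1, 1e-5)]


def highest_sil_for_pfh(pfh: float):
    """Return the highest SIL whose numeric PFH target is met,
    or None if the dangerous failure rate is too high even for SIL 1."""
    for sil, upper in PFH_UPPER_LIMITS:
        if pfh < upper:
            return sil
    return None


if __name__ == "__main__":
    for pfh in (5e-9, 5e-8, 2e-6, 1e-4):
        print(pfh, highest_sil_for_pfh(pfh))
```

Meeting the numeric target is only one part of claiming a SIL; the standard also imposes architectural and systematic requirements that a failure rate alone does not capture.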
@@ -75,7 +82,7 @@ probablistic approach - no direct causation paths to the higher~abstraction faul
 Often for instance a component in a module within a module within a module etc
 that has a probability of causing a SYSTEM level fault.
-Used in FTA\cite{NASA}\cite{NUK}.
+Philosophy behind FTA\cite{NASA}\cite{NUK}.
 The idea being that probabilities can be assigned to components
 failing, causing system level errors.
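The FTA idea in the hunk above, assigning probabilities to component failures and propagating them up to a system level fault, can be sketched with the two basic gates. The sketch assumes the basic events are statistically independent, which real fault trees often cannot assume; the event names and probabilities are hypothetical.

```python
from math import prod


def and_gate(*probs: float) -> float:
    """Output event occurs only if all independent input events occur."""
    return prod(probs)


def or_gate(*probs: float) -> float:
    """Output event occurs if any independent input event occurs."""
    return 1.0 - prod(1.0 - p for p in probs)


if __name__ == "__main__":
    # Hypothetical tree: TOP = (A AND B) OR C, where A and B sit inside a
    # sub-module and C is a single component fault.
    p_a, p_b, p_c = 0.01, 0.02, 0.001
    p_top = or_gate(and_gate(p_a, p_b), p_c)
    print(round(p_top, 7))  # 0.0011998
```

Nested modules map naturally onto nested gates, so a component deep in a hierarchy contributes to the top event through each gate above it.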

@@ -53,7 +53,7 @@
 \chapter{Thesis Scope}
 \input{introduction/introduction}
-\chapter{Statistical Methods and Models}
+\chapter{Safety Critical systems Analysis}
 \input{statistics/statistics}
 \chapter{Survey of Safety Critical Analysis Methodologies and Tools Available}