\ifthenelse {\boolean{paper}}
{
\abstract{This chapter reviews current methodologies
for the static analysis of safety critical systems
and examines the statistical justifications for their application.}
}
{}

\section{Introduction}

\section{Safety and Reliability}

\begin{itemize}
\item How these are different.
\item Safety is environmentally sensitive.
\end{itemize}

To quantify the difference between safety and reliability we
need to determine which system failure modes are dangerous and which are safe.
Were a burner controller to detect a problem with an air pressure switch
and refuse to start up (and raise an alarm), this would be a safe failure mode.
Were a burner controller to pump fuel into the combustion chamber
and then ignite it after a long delay\footnote{Most gas safety timeouts for detecting a flame under ignition conditions specify less than 3~seconds.},
there would be a clear risk of a dangerous explosion.
The picture is further complicated by the environment.
If the burner were placed in a remote building and operated
remotely, there would be minimal risk to life.
Were the burner located in a busy factory, surrounded by people,
the safety risk would be higher.

\begin{itemize}
\item How safety and reliability get confused.
\end{itemize}
A tale of two customers (for integrated boiler controls).

Customer 1: a brewery.
Impact of the boiler going down: delayed production, at some cost.

Customer 2: a nuclear power station.
Impact of the boiler going down: no CO$_2$ primary coolant available, possible reactor shutdown, possible use of emergency shutdown methods. The cost is very high.

For the brewery, safety is of the highest importance.
For the nuclear power station

\section{Terms and Concepts in \\ Safety Critical Engineering}

\subsection{Safety Relevant Data Object}
A Safety Relevant Data Object (SRDO)\cite{caninauto} is a data structure describing the status of
a particular feature or attribute of a safety critical system.
For instance, in a burner this could be a flame signal value, or in a nuclear power station
the measured neutron flux.
\subsection{Safety Relevant Object Validation Time}
Safety times can be given for SRDOs; these are termed Safety Relevant Object Validation Times (SROVTs)\cite{caninauto}. For instance, were
a flame to fail in operation in a gas burner,
standards state \cite{en298} that the gas may not continue to be fed into the
furnace for more than three seconds.
We can say that the SROVT for a flame signal in a gas burner is 3~seconds.
\subsection{Single and Double Failure Modes}
A safety critical system must self check within the relevant SROVTs.
On detecting a failure mode it must react appropriately.
Consider, though, the case where two failures occur within the
time windows of their SROVTs. We can term this a double simultaneous failure mode.
To take an extreme example, were the checking function or mechanism and the object under supervision
both to fail within the SROVT, it may be impossible to detect the failure.
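
As a rough illustration of how likely a double simultaneous failure might be, consider a sketch under assumptions that are not part of the text above: the two failures are independent, each occurs at a constant rate (written $\lambda_1$ and $\lambda_2$, hypothetical symbols), and both share a validation window of length $T$. The probability that both occur within the same window is then
\begin{equation}
P_{double} = \left(1 - e^{-\lambda_1 T}\right)\left(1 - e^{-\lambda_2 T}\right) \approx \lambda_1 \lambda_2 T^2
\quad \mbox{for } \lambda_1 T \ll 1 \mbox{ and } \lambda_2 T \ll 1 .
\end{equation}
Because the result scales with $T^2$, a short SROVT keeps this probability small. The independence assumption no longer holds if a common cause can affect both the checking mechanism and the object under supervision.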

\section{Interfacing}

Mechanical to electrical to software interfaces.

Most problems occur here (citations needed);
look at some of Nancy's accident papers.

\section{Current Methods for Safety Critical Analysis}

\section{STAMP}

A high level technique: look at processes with feedback loops and rules, and then at the interfaces between them.

\section{Deterministic Approach}
\paragraph{NOT WRITTEN YET PLEASE IGNORE}
No single component fault may lead to a dangerous condition.
EN298, EN230, etc.

\section{Statistical - tolerated failure frequencies}

The European standard
EN61508 takes a statistical approach.
It sets out four Safety Integrity Levels (SILs).
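
For reference, the target failure measures usually quoted for the four levels are tabulated here. The figures are reproduced from general knowledge of the IEC~61508 family rather than from the text above, and should be checked against the edition of the standard in use.
\begin{center}
\begin{tabular}{|c|c|c|}
\hline
SIL & Low demand mode & High demand or continuous mode \\
 & (average probability of & (probability of dangerous \\
 & failure on demand) & failure per hour) \\
\hline
1 & $\geq 10^{-2}$ to $< 10^{-1}$ & $\geq 10^{-6}$ to $< 10^{-5}$ \\
2 & $\geq 10^{-3}$ to $< 10^{-2}$ & $\geq 10^{-7}$ to $< 10^{-6}$ \\
3 & $\geq 10^{-4}$ to $< 10^{-3}$ & $\geq 10^{-8}$ to $< 10^{-7}$ \\
4 & $\geq 10^{-5}$ to $< 10^{-4}$ & $\geq 10^{-9}$ to $< 10^{-8}$ \\
\hline
\end{tabular}
\end{center}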

\subsection{Bayes' Theorem}
\paragraph{NOT WRITTEN YET PLEASE IGNORE}
\label{bayes}
Describe application: the likelihood of faults being the cause of symptoms.
A probabilistic approach: there are no direct causation paths to the higher~abstraction fault mode.
Often, for instance, it is a component in a module within a module within a module, etc.,
that has a probability of causing a SYSTEM level fault.
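
The relationship these notes point towards can be written down directly. As a sketch, let $F$ denote the event that a particular component fault is present and $S$ the event that a given system level symptom is observed (both symbols are illustrative assumptions, not notation used elsewhere in this chapter). Bayes' theorem gives
\begin{equation}
P(F \mid S) = \frac{P(S \mid F)\,P(F)}{P(S)} ,
\end{equation}
and, if the candidate faults $F_1, \ldots, F_n$ are assumed to be mutually exclusive and exhaustive, the denominator expands to
\begin{equation}
P(S) = \sum_{i=1}^{n} P(S \mid F_i)\,P(F_i) .
\end{equation}
In this reading, $P(F)$ would come from component failure statistics and $P(S \mid F)$ from an analysis of how the fault propagates to the system level.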

Philosophy behind FTA\cite{nasafta}\cite{nucfta}:
the idea is that probabilities can be assigned to components
failing and causing system level errors.

Problems: it is difficult to get reliable statistics
for the probability of causation because of small sample numbers.

By traversing down the tree, the FMMD approach can use known component failure figures
to get {\em accurate} probabilities and potential causes.
%$$ c1 \cap c2 \eq \emptyset | c1 \neq c2 \wedge c1,c2 \in C \wedge C \in U $$

%Thus if the failure~modes are pairwise mutually exclusive they qualify for inclusion into the
%unitary~state set family.

\subsection{Safety Integrity Level Analysis}
\paragraph{NOT WRITTEN YET PLEASE IGNORE}
\label{sil}
This technique looks at all components in the parts list
and asks what the effect of the component failing will be.
Note that particular failure modes of the component are not considered.
From the perspective of this analysis, the component can fail in any of its failure modes.
The analyst has to choose between four conditions:

\begin{itemize}
\item sd: a safe fault that is detected by an automated system
\item su: a safe fault that is not detected by an automated system
\item dd: a potentially dangerous fault that is detected by an automated system
\item du: a potentially dangerous fault that is not detected by an automated system
\end{itemize}
This is, in fact, almost how SIL analysis is done:
the base components are listed
and each failure result is classified as sd, su, dd or du.

A formula is then applied according to the system architecture (1oo1, 2oo3, 3oo3, etc.).
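
One quantity commonly derived from the four classifications is the safe failure fraction (SFF). The sketch below assumes that each classification has been given a total failure rate, written $\lambda_{sd}$, $\lambda_{su}$, $\lambda_{dd}$ and $\lambda_{du}$; these symbols, and the use of rates at all, are assumptions added for illustration and are not part of the notes above.
\begin{equation}
SFF = \frac{\lambda_{sd} + \lambda_{su} + \lambda_{dd}}{\lambda_{sd} + \lambda_{su} + \lambda_{dd} + \lambda_{du}}
\end{equation}
Only the dangerous undetected rate $\lambda_{du}$ is excluded from the numerator, so improving the automated detection of dangerous faults raises the SFF without changing the components themselves.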

What is not done is to assign a probability to each of these conditions; the SIL
analyst simply has to decide which one applies.
Another weakness is that it is very difficult to
extract meaningful statistics
for how likely the detection systems are to pick the fault up, or even to introduce a fault of their own.

\subsection{Tests of Hypotheses and Significance}
\paragraph{NOT WRITTEN YET PLEASE IGNORE}
Linked in with Bayes' theorem.
Accident analysis:
plane crashes, faults, etc.
In high reliability systems the faults are often logged (strange occurrences,
processors resetting). What are the common factors? P values?
For instance, very high voltage spikes can reset microcontrollers,
but how do you correlate that with unshielded suppressed contactors?

Maybe looking at the equipment and seeing if there is a 5\%
level of the error being caused?
That is, using it to search for these conditions?

This could be used to refine the SIL method (section~\ref{sil})
and give probabilities for the four conditions.