\ifthenelse {\boolean{paper}}
{
\abstract{
This paper looks at current methodologies for the static analysis of safety critical systems
and at the statistical justifications for their application.}
}
{
This chapter looks at the current state of safety critical systems
and provides background to concepts and standard practices.
It aims to bridge
}

\section{Introduction}

\section{Product}

\subsection{Life Cycle}

\subsection{Parts List}

Important document, used for quality inspection, production validation, etc.

\subsubsection{BOM}

\subsection{Components and Sub-systems}

How these can have failure modes.

\section{Safety and Reliability}

- How these are different.

- Safety is environmentally sensitive.

In order to quantify the difference between safety and reliability
we need to determine which system failure modes are dangerous and which are safe.
Were a burner controller to detect a problem with an air pressure switch
and refuse to start up (and raise an alarm), we can see that this is a safe failure mode.
Were a burner controller to pump fuel into the combustion chamber and then
ignite it after a long delay\footnote{Most gas safety timeouts for seeing a flame
under ignition conditions specify $<$ 3 seconds\cite{en298}}
we would have a clear risk of a dangerous explosion.

Here, the picture is further complicated by the environment.
If the burner were placed in a remote building and operated remotely,
there would be minimal risk to life.
Were the burner to be located in a busy factory, surrounded by people,
the safety risk would be higher.

- How safety and reliability get confused.

A tale of two customers (for integrated boiler controls).

Customer 1. Brewery.
Impact of the boiler going down: delayed production, some cost.

Customer 2. Nuclear power station.
Impact of the boiler going down: no CO$_2$ primary coolant available,
possible reactor shutdown, possible emergency shutdown methods. Cost very high.

For the Brewery, safety is of the highest importance.
For the Nuclear power station

\section{Terms and Concepts in \\ Safety Critical Engineering}

\subsection{Timing and Safety Checking}

\subsubsection{CANopen Timing Definitions}

CAN is a mainstream network and was internationally standardized (ISO 11898-1) in 1993.
CANopen is a protocol suite based on the hardware of the CANbus\cite{canspec}.
CANbus is a hardened differential serial communications bus and
resolves bus contention by non-destructive bitwise arbitration\footnote{Implemented
at the physical and data link layers using DOMINANT and PASSIVE (recessive) bits,
with self-monitoring and automatic back-off by the first node to transmit
a PASSIVE bit that reads back as DOMINANT on the bus.}.
It also has a 15 bit\cite{crcembedd} CRC built into the protocol, which is
guaranteed to detect six consecutive bit failures.
This makes it a very safe and robust messaging medium for safety critical systems.

CAN is a message based protocol, designed originally for automotive applications
but now also used in other areas such as industrial automation,
industrial burner controllers and medical equipment.

CANopen literature discusses some concepts based on the timing relevance
of given items of safety critical data.

\paragraph{Safety Relevant Data Object}
A Safety Relevant Data Object (SRDO)\cite{caninauto} is a data structure describing the status of
a particular feature or attribute of a safety critical system.
For instance, in a burner this could be a flame signal value,
or in a nuclear power station the measured neutron flux.

\paragraph{Safety Relevant Object Validation Time}
Safety times can be given for SRDOs; these are termed
Safety Relevant Object Validation Times (SROVTs)\cite{caninauto}.
For instance, were a flame to fail during operation of a gas burner,
the standard states~\cite{en298} that gas may not continue
to be fed into the furnace for more than three seconds.
We can say that the SROVT for a flame signal in a gas burner is 3 seconds.
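
As a rough sketch of what an SROVT implies for the checking mechanism, suppose
an SRDO is sampled every $T_{\mathrm{check}}$ seconds and the system needs at most
$T_{\mathrm{react}}$ seconds to reach a safe state once a bad value has been seen.
These two symbols, and $T_{\mathrm{SROVT}}$ for the validation time, are introduced here
for illustration only and are not taken from the CANopen literature.
In the worst case the failure occurs just after a sample has been taken,
so the design would need to satisfy
\begin{equation}
T_{\mathrm{check}} + T_{\mathrm{react}} \le T_{\mathrm{SROVT}} .
\end{equation}
For the flame signal, with $T_{\mathrm{SROVT}} = 3\,\mathrm{s}$, an assumed (hypothetical)
reaction time of $T_{\mathrm{react}} = 1\,\mathrm{s}$ would leave
$T_{\mathrm{check}} \le 2\,\mathrm{s}$ for the checking interval.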
\subsection{Single and Double Failure Modes}

A safety critical system must self check within the relevant SROVTs.
On detecting a failure mode it must react appropriately.
Consider, though, the case where two failures occur within overlapping time windows of their SROVTs.
We can term this a double simultaneous failure mode.
To take an extreme example, were the checking function/mechanism and the object under supervision
both to fail within the SROVT, it may be impossible to detect the failure.

\section{Interfacing}

Mech - elec - sw

Most problems occur here. Need citations: look at some of Nancy's accident papers.

\section{Current Methods for Safety Critical Analysis}

\section{STAMP}

High level technique: look at processes with feedback loops and rules, and then the interfaces between them.

\section{Deterministic Approach}
\paragraph{NOT WRITTEN YET PLEASE IGNORE}
No single component fault may lead to a dangerous condition.
EN298, EN230, etc.

\section{Statistical - tolerated failure frequencies}

European standard EN61508 takes a statistical approach.
It sets out four Safety Integrity Levels (SIL).

\subsection{Bayes Theorem}
\paragraph{NOT WRITTEN YET PLEASE IGNORE}
\label{bayes}
Describe application: likelihood of faults being the cause of symptoms;
probabilistic approach; no direct causation paths to the higher~abstraction fault mode.
Often, for instance, there is a component in a module within a module within a module etc.\ that
has a probability of causing a SYSTEM level fault.

Philosophy behind FTA\cite{nasafta}\cite{nucfta}.
The idea is that probabilities can be assigned to components failing and causing system level errors.

Problems: it is difficult to get reliable statistics for the probability to cause, because of small sample numbers...

The FMMD approach can, by traversing down the tree, use known component failure figures to
get {\em accurate} probabilities and potential causes.
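
As a sketch of how Bayes' theorem would apply here, assume (for illustration only) that a
system level symptom $S$ has candidate causes $c_1, \ldots, c_n$ that are mutually exclusive
and exhaustive. The probability that $c_i$ is the cause, given that $S$ has been observed, is then
\begin{equation}
P(c_i \mid S) = \frac{P(S \mid c_i)\,P(c_i)}{\sum_{j=1}^{n} P(S \mid c_j)\,P(c_j)} .
\end{equation}
Here $P(c_i)$ is the prior probability of component failure $c_i$, and $P(S \mid c_i)$ is
the probability that this failure produces the symptom. The small sample problem applies
in particular to estimates of $P(S \mid c_i)$.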
%$$ c1 \cap c2 = \emptyset | c1 \neq c2 \wedge c1,c2 \in C \wedge C \in U $$
%Thus if the failure~modes are pairwise mutually exclusive they qualify for inclusion into the
%unitary~state set family.

\subsection{Safety Integrity Level Analysis}
\paragraph{NOT WRITTEN YET PLEASE IGNORE}
\label{sil}
This technique looks at all components in the parts list
and asks what the effect of the component failing will be.
Note that particular failure modes of the component are not considered.
From the perspective of this analysis, the component can fail in any of its failure modes.
The analyst has to make a choice between four conditions:
\begin{itemize}
\item sd - A safe fault that is detected by an automated system
\item su - A safe fault that is undetected by an automated system
\item dd - A potentially dangerous fault that is detected by an automated system
\item du - A potentially dangerous fault that is not detected by an automated system
\end{itemize}

Actually, this is almost how SIL analysis is done: the base components are listed
and the result of each failing is classified as sd, su, dd or du.
A formula is then applied according to the system architecture (1oo1, 2oo3, 3oo3, etc.).

What is not done is to assign a probability to each of these conditions; the SIL analyst
simply has to decide which one applies.
Another fault in this is that it is very difficult to extract meaningful statistics for
how likely the detection systems are to pick the fault up, or even to introduce a fault of their own.

\subsection{Tests of Hypotheses and Significance}
\paragraph{NOT WRITTEN YET PLEASE IGNORE}
Linked in with Bayes theorem.
Accident analysis: plane crashes, faults, etc.

In high reliability systems the faults are often logged: strange occurrences, processors resetting.
What are the common factors? P values.
For instance, very high voltage spikes can reset micro controllers,
but how do you correlate that with unshielded suppressed contactors...
Maybe by looking at the equipment and seeing whether the error is being caused at the 5\% level?
That is, using it to search for these conditions?

Actually, this could be used to refine the SIL method (section~\ref{sil})
and give probabilities for the four conditions.
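
One way such a test could be framed is sketched here; the symbols $n$, $k$ and $p_0$ are
hypothetical and no real data is implied. Divide the fault log into $n$ observation windows
in which the suspected condition was present, and let $p_0$ be the probability of the fault
appearing in a window when the condition has no effect (estimated, for example, from windows
where the condition was absent). Under the null hypothesis that the condition has no effect,
and assuming the windows are independent, the number of windows containing the fault is
binomially distributed. If $k$ such windows are observed, the one-sided $p$-value is
\begin{equation}
p = \sum_{i=k}^{n} {n \choose i}\, p_0^{\,i}\,(1-p_0)^{\,n-i} ,
\end{equation}
and the condition would be flagged at the 5\% level when $p \le 0.05$.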