From a5dc8ccf13ac5712eed9980f735a98facbaa1196 Mon Sep 17 00:00:00 2001 From: Robin Clark Date: Sun, 6 Mar 2011 13:58:29 +0000 Subject: [PATCH] kile automated spelling blissssss --- introduction/introduction.tex | 94 ++++++++++++++++++----------------- 1 file changed, 48 insertions(+), 46 deletions(-) diff --git a/introduction/introduction.tex b/introduction/introduction.tex index 9ad54f6..0e710b6 100644 --- a/introduction/introduction.tex +++ b/introduction/introduction.tex @@ -31,7 +31,7 @@ applicable to industrial burner controllers\footnote{Burner Controllers cover th combustion, high pressure steam and hot water, mechanical control, electronics and embedded software.}. The methodology developed was designed to cope with both the deterministic\footnote{Deterministic failure mode analysis, traces failure mode effects at the SYSTEM level to lower level causes in components or sub-systems.} and probablistic approaches -\footnote{Probablistic failure mode analysis tries to determine the probability of given SYSTEM failure modes, and pfrom these +\footnote{Probabilistic failure mode analysis tries to determine the probability of given SYSTEM failure modes, and from these can determine an overall failure rate, in terms of probability of failure on demand, or failure in time (or Mean Time to Failure (MTTF).}. 
\glossary{name={safety critical},description={A safety critical system is one in which its failure may result in death or serious injury to humans, an environmental catastrophe or severe loss or damage}} \fmodegloss @@ -39,10 +39,10 @@ can determine an overall failure rate, in terms of probability of failure on dem \paragraph{Safety Critical Controllers, knowledge and culture sub-disiplines} The maturing of the application of the programmable electronic controller (PEC) -for a wide range safety critical applications, has led to a fragmentation of subdisiplines +for a wide range safety critical applications, has led to a fragmentation of sub-disciplines which speak imperfectly to one another. This is because -the main three engineering disiplines, Electrical, Software and Mechanical Engineering +the main three engineering disciplines, Electrical, Software and Mechanical Engineering produced equipment that was interfaced a a later time. Just as electronic circuitry becomes more integrated, and sub-domains of electrical engineering (analog and digital for instance) are commonly found along-side on the same chip, @@ -50,13 +50,13 @@ so modern PEC's are becoming more and more integrated and now typically encompa input from the three engineering disciplines\footnote{Consider an aircraft, this involves expert knowledge from Software, Electronic and Mechanical Engineering and requires a high degree of safety validation}. -Additional disiplines are defined by application area of the PEC. All of these sub-displines +Additional disciplines are defined by application area of the PEC. All of these sub-disciplines are in turn split into even finer units. -The practicioners of these fields tend to view a PEC in different ways. -Discoveries and culture in one field diffuse only slowly into the conciousness of a specialist in another. 
-Too often, one disipline's unproven assumptions or working methods, are treated as firm boundary conditions +The practitioners of these fields tend to view a PEC in different ways. +Discoveries and culture in one field diffuse only slowly into the consciousness of a specialist in another. +Too often, one discipline's unproven assumptions or working methods, are treated as firm boundary conditions for an overlapping field. -For failure mode analysis a common notation, across disiplines is a very desirable and potentially useful +For failure mode analysis a common notation, across disciplines is a very desirable and potentially useful tool. \paragraph{Safety Assessment/analysis of PEC's} @@ -81,14 +81,15 @@ component failure mode in a model, and to be able to represent mechanical, electrical and software components in a single failure mode model. \paragraph{Desirability of a common failure mode notation} -Having a common failure mode notation accross all disciplines in a project +Having a common failure mode notation across all disciplines in a project would allow all the specialists to prepare failure mode analysis and then bring them together to model the PEC. \paragraph{Visual form of the notation} -The visual notation developed was initially designed for electronic fault modelling. +The visual notation developed was initially designed for electronic fault modeling. This notation deals with failure modes of components using concepts derived from Euler and Spider diagrams. -However, as the notation dealt with generic failure modes, it was realised that it could be applied to mechanical and software domains as well. +However, as the notation dealt with generic failure modes, it was realised that it could be applied to +mechanical and software domains as well. This changed the target for the study slightly to encompass these three domains in a common notation. 
\paragraph{PEC's: Legal and Insurance Issues} @@ -97,10 +98,10 @@ There is also usually a differentiation between the manufacturers and the the plant operators. The manufacturers have to ensure -that the device is adaquately safe for use in its operational context. +that the device is adequately safe for use in its operational context. This usually means conforming to device specific standards~\footnote{in Europe, conformance to European Norms (EN) are legal requirements for specific types of controllers, and in the USA conformance to Underwriters Laboratories (UL) standards -are usually a mimimum requirement to take out insurance}, and offering training +are usually a minimum requirement to take out insurance}, and offering training of operators. Operators of safety critical plant are concerned with maintenance and legal obligations for @@ -133,14 +134,14 @@ looking in detail at selected critical sections of the product and proposing component failure scenarios. For each failure scenario proposed either a satisfactory answer was required, or a counter proposal to change the design to cope with -a theroretical component failure eventuality. +a theoretical component failure eventuality. FMEA was time consuming, and being directed by -experts undoubtly ironed out many potential safety faults before the product saw +experts undoubtedly ironed out many potential safety faults before the product saw light of day. However it was quickly apparent that only a small proportion of component~failure modes was considered. Also there was no formalism. The component~failure~modes investigated were not analysed within -any rigourous or mathematically proven framework. +any rigorous or mathematically proven framework. \subsection{ Blanket Risk Reduction Approach } @@ -154,7 +155,7 @@ Systemic faults, or mistakes are missed by this form of static testing. 
\subsection{Possibility of applying mathematical techniques to FMEA} My MSc project was a diagram editor for Constraint diagrams. -I wanted to apply constriant diagram techniques to FMEA +I wanted to apply constraint diagram techniques to FMEA and began thinking about how this could be done. One obvious factor was that a typical safety critical system could have more than 1000 component parts. Each component @@ -173,21 +174,21 @@ to perform some particular low-level function. \paragraph{Top down Approach} A top down approach has several potential problems. By its nature it means that at the start of the process -a set of system or top level faults or undesireable outcomes are defined. +a set of system or top level faults or undesirable outcomes are defined. It then must break the system down into modules and decide which of these can contribute to a system level fault mode. Potentially failure modes, be they from components or the interaction between modules can be missed. A disturbing example of this is the NASA space shuttle in 1986, which missed the fault mode of an O ring. This was made even worse, by the fact that the `O' ring had a specified temperature -range where the probability of this fault occuring was dramatically raised when below +range where the probability of this fault occurring was dramatically raised when below the temperature range. This was a known and documented feature of a safety critical component and it was ignored in the safety analysis. \paragraph{Bottom-up Approach} A bottom-up approach looked impractical at first due to the sheer number of component failure modes in a typical system. -However were this bottom-up approach to be modular, (reducing the order of cross checking), and build a hierachy +However were this bottom-up approach to be modular, (reducing the order of cross checking), and build a hierarchy of modules rising up until all components are covered, we can model an entire complex system. 
This is the core concept behind this study. @@ -216,7 +217,7 @@ repeated within a SYSTEM In general terms we can describe these circuitry sub-systems -as collections of components or smaller sub-systesm, that interact to perform a given function. +as collections of components or smaller sub-systems, that interact to perform a given function. We can call these collections {\fg}s. @@ -239,7 +240,7 @@ failures occur, not know and feed incorrect data into our system. % Figure \ref{fig:millivolt} shows a typical industrial circuit to measure and amplify millivolt signals. -It will detect a disconneted milli-volt source (the most common +It will detect a disconnected milli-volt source (the most common failure, and usually due to wiring faults) and some other internal component failures. It can however provide an incorrect (slightly low reading) if one of two resistors fail in particular ways. @@ -254,12 +255,12 @@ of detecting `undetected failures' in safety critical product design. \paragraph{Multi-disipline} Most safety critical systems are composed of mechanical, electrical and computing elements. A tragic example of the mechanical and electrical elements interfacing to a computer is found in the THERAC25 x-ray dosage machine. -With no common notation to integrate the saftey analyis between the electrical/mechanical and computing +With no common notation to integrate the safety analysis between the electrical/mechanical and computing domains, synchronisation errors occurred that were in some cases fatal. The interfacing between the hardware and software for the THERAC-25 was not considered in the design phase. Niel Story in the formal methods chapter of "safety critical computer systems" -describes the different formal languages suitable for hardward and software and +describes the different formal languages suitable for hardware and software and bemaons the fact that no single language is suitable for for such a broad range of tasks \cite{sccs}[pp. 287]. 
\paragraph{Requirements for a rigorous FMEA process} @@ -276,7 +277,7 @@ a process of modularisation from the bottom~up. \begin{list}{$*$}{} \item The analysis process must be `bottom~up' \item The process must be modular and hierarchical -\item The process must be multi-dicipline and must be able to represent hardware, electronics and software +\item The process must be multi-discipline and must be able to represent hardware, electronics and software \end{list} \section{Safety Critical Systems} @@ -300,7 +301,7 @@ it has the potential to become dangerous\cite{sccs}. %Medical electronics for automatically dispensing drugs or maintaining %life support are examples of systems that lives depend upon. -\subsection{Two approaches : Probablistic, and Deterministic} +\subsection{Two approaches : Probabilistic, and Deterministic} There are two main philosophies applied to safety critical systems certification. \paragraph{Probablistic safety Measures} @@ -348,7 +349,7 @@ Ref chapter specifically on this but give an overview now A modern industrial burner has mechanical, electronic and software elements, that are all safety critical. That is to say -unhandled failures could create dangerous faults. +unhandled failures could create dangerous faults. %To add to these problems %Operators are often under pressure to keep them running. An boiler supplying @@ -419,8 +420,8 @@ Automated systems, as opposed to manual ones are now the norm in the home and in industry. % Automated systems have long been recognised as being more efficient and -more accurate than a human opperator, and the reason for automating a process -can now be more likely to be cost savings due to better effeciency +more accurate than a human operator, and the reason for automating a process +can now be more likely to be cost savings due to better efficiency than a not paying a salary to a human operator \ref{burnereffency}. 
% For instance @@ -431,10 +432,10 @@ A typical control function could be the fuel air mixture profile curves over a the firing range. % Because fuels vary slightly in calorific value, and air density changes with the weather, no optimal tuning can be optional. -In fact for asethetic reasons (not wanting smoke to appear at the flue) +In fact for aesthetic reasons (not wanting smoke to appear at the flue) the tuning was often air rich, causing air to be heated and unnecessarily passed through the burner, leading to direct loss of energy. -An automated system analysing the combustion gasses and automatically +An automated system analysing the combustion gases and automatically adjusting the fuel air mix can get the efficiencies very close to theoretical levels. @@ -469,7 +470,8 @@ can make horrendous mistakes. This means that simply reading sensors and applyin corrections cannot be enough. Checking for error conditions must also be incorporated. Equipment can also develop an internal faults, and strategies -must be in-pcae to recognise and cope with them. +must be in-place to firstly recognise internal faults, +and then cope with them in the safest possible way. \begin{figure}[h] \centering @@ -523,9 +525,9 @@ detect several other failure modes of this circuit and a full analysis is given \paragraph{Self Checking} This introduces a level of self checking into the system. Admittedly this is the simplest failure mode scenario (that the -sensor is not wired correcly or has become disconnected). +sensor is not wired correctly or has become disconnected). % -This safety resisitor has a side effect, it also checks for internal errors +This safety resistor has a side effect, it also checks for internal errors that could occur in this circuit. Should the input resistor $R22$ go OPEN this would be detected. Should the gain resistors $R30$ or $R26$ go OPEN or SHORT a fault condition will be detected. 
@@ -533,7 +535,7 @@ Should the gain resistors $R30$ or $R26$ go OPEN or SHORT a fault condition will \paragraph{Not rigorous, but tested by time} This is a typical example of an industry standard circuit that has been thought through, and in practise works and detects most commonly encountered failure modes. -But it is not rogorous: it does not take into account every failure +But it is not rigorous: it does not take into account every failure mode of every component in it. However it does lead on to an important concept of three main states of a safety critical system. @@ -566,12 +568,12 @@ at the very least that single failures of hardware or software cannot create an unsafe condition in operational plant. Further to this a second fault introduced, must not cause an unsafe state, due -to the combation of both faults. +to the combination of both faults. \vskip 0.3cm This sounds like an entirely reasonable requirement. But to rigorously check the effect a particular component fault has on the system, we could check its effect on all other components. -Should a diode in the powersupply fail in a particular way, by perhaps +Should a diode in the power supply fail in a particular way, by perhaps introducing a ripple voltage, we should have to look at all components in the system to see how they will be affected. @@ -634,16 +636,16 @@ A technique of modularising, or breaking down the problem is clearly necessary. One question that anyone developing a safety critical analysis design tool could do well to answer, is how the methodology would cope with known previous disasters. -The Challenger disaster is a good example, and was well documented and invistigated. +The Challenger disaster is a good example, and was well documented and investigated. The problem lay in a seal that had an operating temperature range. On the day of the launch the temperature of this seal was out of range. A bottom up safety approach would have revealed this as a fault. 
-The FTA in use by NASA and the US Nuclear regulatory commisssion -allows for enviromental considerations such as temperature\cite{nasafta}\cite{nucfta}. +The FTA in use by NASA and the US Nuclear regulatory commission +allows for environmental considerations such as temperature\cite{nasafta}\cite{nucfta}. But because of the top down nature of the FTA technique, the safety designer must be aware of -the environemtnal constraints of all component parts in order to use this correctly. +the environmental constraints of all component parts in order to use this correctly. This element of FTA is discussed in \ref{surveysc} \subsection{Therac 25} @@ -665,7 +667,7 @@ excluded the software \cite{safeware}[App. A]. \subsection{Problems with Natural Language} -Written natural language desciptions can not only be ambiguous or easy to misinterpret, it +Written natural language descriptions can not only be ambiguous or easy to misinterpret, it is also not possible to apply mathematical checking to them. A mathematical model on the other hand can be checked for @@ -680,7 +682,7 @@ specifying systems, developed at Brighton and Kent university have been used and extended by this author to create a methodology for modelling complex safety critical systems, using diagrams. -This project uses a modified form of euler diagram used to represent propositional logic. +This project uses a modified form of Euler diagram used to represent propositional logic. %The propositional logic is used to analyse system components. @@ -693,7 +695,7 @@ failure modes. \subsection{Mechanical} Find refs \subsection{Software} -Software must run on a microprocessor/microcontroller, and these devices have a known set of failure modes. +Software must run on a microprocessor/micro-controller, and these devices have a known set of failure modes. The most common of these are RAM and ROM failures, but bugs in particular machine instructions can also exist. These can be checked for periodically. 
@@ -715,7 +717,7 @@ temperature being the most typical. Very often what happens to the system outsid \begin{itemize} \item To create a Bottom up FMEA technique that permits a connected hierarchy to be -built representing the fault behaviour of a system. +built representing the fault behavior of a system. \item To create a procedure where no component failure mode can be accidentally ignored. \item To create a user friendly formal common visual notation to represent fault modes in Software, Electronic and Mechanical sub-systems. @@ -727,7 +729,7 @@ highest abstract system 'top level'. \item To produce a software tool to aid in the drawing of diagrams and ensuring that all fault modes are addressed. \item to provide a data model that can be used as a source for deterministic and probablistic failure mode analysis reports. -\item To allow the possiblility of MTTF calculation for statistical +\item To allow the possibility of MTTF calculation for statistical reliability/safety calculations. \end{itemize}
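
Note on the last aim in the list above (MTTF calculation): a minimal sketch of how per-component failure rates might roll up into an MTTF figure. It assumes constant (exponential) failure rates and a series model, in which any single component failure counts as a system failure. Neither assumption is stated in the patched text, and the symbols $\lambda_i$, $\lambda_{\mathrm{sys}}$ and $N$ are illustrative names only.

```latex
% Sketch only: assumes constant failure rates and a series reliability model.
% \lambda_i is the failure rate of component i in failures per hour;
% N is the number of components. These symbols do not appear in the source.
% Requires \usepackage{amsmath} for the align environment.
\begin{align}
\lambda_{\mathrm{sys}} &= \sum_{i=1}^{N} \lambda_i
  && \text{(series model: rates add)} \\
\mathrm{MTTF} &= \frac{1}{\lambda_{\mathrm{sys}}}
  && \text{(hours, constant-rate assumption)} \\
\mathrm{FIT} &= \lambda_{\mathrm{sys}} \times 10^{9}
  && \text{(failures per } 10^{9} \text{ hours)}
\end{align}
% Worked example with made-up figures: two components rated at 50 FIT and
% 150 FIT give 200 FIT in total, so MTTF = 10^9 / 200 = 5 \times 10^6 hours.
```

Under these assumptions the bottom-up hierarchy would only need each module to report a summed rate upwards. Redundant or self-checking arrangements do not fit a pure series model and would need separate treatment.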