<!DOCTYPE HTML PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xml:lang="en" xmlns="http://www.w3.org/1999/xhtml"><head>
<title>Effective Risk Management and Quality Improvement by Application of FMEA and Complementary Techniques | ParagonRx</title>
<!-- base href="http://www.paragonrx.com/" -->
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<link rel="SHORTCUT ICON" href="http://www.paragonrx.com/favicon.ico">
<link rel="alternate" type="application/rss+xml" title="News from ParagonRx" href="http://www.paragonrx.com/feeds/ParagonRx-News.rss">
<meta name="keywords" content="Risk Management, Quality Improvement, FDA, REMS, FAA, FMEA, pharmaceutical, biopharmaceutical, biotechnology, White Paper">
<meta name="description" content="This paper provides an expert opinion of the use and effectiveness of Failure Modes and Effects Analysis (FMEA) for managing risks and improving quality.">
<link rel="stylesheet" type="text/css" media="all" href="Effective%20Risk%20Management%20and%20Quality%20Improvement%20by%20Application%20of%20FMEA%20and%20Complementary%20Techniques%20%7C%20ParagonRx_files/stylesheet_003.css">
<link rel="stylesheet" type="text/css" media="print,screen" href="Effective%20Risk%20Management%20and%20Quality%20Improvement%20by%20Application%20of%20FMEA%20and%20Complementary%20Techniques%20%7C%20ParagonRx_files/stylesheet.css">
<link rel="stylesheet" type="text/css" media="print,screen" href="Effective%20Risk%20Management%20and%20Quality%20Improvement%20by%20Application%20of%20FMEA%20and%20Complementary%20Techniques%20%7C%20ParagonRx_files/stylesheet_004.css">
<link rel="stylesheet" type="text/css" media="print,screen" href="Effective%20Risk%20Management%20and%20Quality%20Improvement%20by%20Application%20of%20FMEA%20and%20Complementary%20Techniques%20%7C%20ParagonRx_files/stylesheet_002.css">
<script src="Effective%20Risk%20Management%20and%20Quality%20Improvement%20by%20Application%20of%20FMEA%20and%20Complementary%20Techniques%20%7C%20ParagonRx_files/jquery.js" type="text/javascript"></script>
<script src="Effective%20Risk%20Management%20and%20Quality%20Improvement%20by%20Application%20of%20FMEA%20and%20Complementary%20Techniques%20%7C%20ParagonRx_files/custom_jscripts.js" type="text/javascript"></script>
</head><body id="effective-risk-management-and-quality-improvement">
<div id="wrapper">
<div id="header">
<h1><a title="ParagonRx® | Minimizing Risk and Optimizing Use of Medical Products" href="http://www.paragonrx.com/">ParagonRx<span style="vertical-align: super; font-size: 50%;">®</span> | Minimize Risk and Optimize Use</a></h1>
</div>
<div id="main_content_area" class="one_column">
<div class="inner">
<div id="content">
<h2 class="page_header">Effective Risk Management and Quality Improvement by Application of FMEA and Complementary Techniques</h2>
<div style="float: left; width: 50%; text-align: left;"><em>Benjamin A. Berman</em></div>
<div style="float: right; width: 50%; text-align: right;"><em>November 2003</em></div>
<br>
<h3>Introduction</h3>
<p>This paper provides my expert opinion of the use and effectiveness of
Failure Modes and Effects Analysis (FMEA) for managing risks and
improving quality in several industrial domains. I also consider and
evaluate several other analytical techniques as complementary extensions
of FMEA.<span style="vertical-align: super; font-size: 70%;"><a href="http://www.paragonrx.com/experience/white-papers/effective-risk-management-and-quality-improvement/#1">[1]</a></span></p>
<p>The opinions that I express in this paper are based on a thorough
review that I conducted of industry standards and procedures for risk
management, FMEA techniques, and FMEA applications in aviation and other
industries. I also base these opinions on my 25 years of experience in
transportation management and analysis, airline flight operations,
safety investigation management, safety research, and airline accident
investigation. I have ten years of experience on the staff of the U.S.
National Transportation Safety Board (NTSB), concluding my service there
as the Chief of the Major Investigations Division. In that position, I
managed the overall investigative effort for U.S. air carrier accidents
from the field investigation to the public board meeting and final
accident report. I also managed the U.S. Government's participation in
foreign aviation accidents. My previous NTSB experience included
management of flight operations, air traffic control, and meteorological
aspects of air carrier accident investigations; on-scene and follow-up
investigations of flight operations for several major accident
investigations including the USAir flight 427 Boeing 737 accident near
Pittsburgh and ValuJet flight 592 DC-9 accident in the Everglades; and
management of research programs on flight crew human factors and
regional air safety issues, both of which were adopted and published by
the NTSB. I am a pilot for a major U.S. air carrier, qualified in the
Boeing 737 and two other transport category aircraft types. I have
consulted with the National Aeronautics and Space Administration (NASA),
the World Bank, the European Bank for Reconstruction and Development,
the U.S. President's Aviation Safety Commission, and several airlines,
financial institutions, airport authorities, and other private entities
on safety and analytical matters. I received the A.B. degree <em>summa cum laude</em> in Economics from Harvard College and am a member of the Phi Beta Kappa Society.</p>
<h3>FMEA—Summary and Definition</h3>
<p>According to the Society of Automotive Engineers (SAE) International Aerospace Recommended Practice (ARP) 5580, <em>Recommended Failure Modes and Effects Analysis (FMEA) Practices for Non-Automobile Applications</em>,
FMEA is “a formal and systematic approach to identifying potential
system failure modes, their causes, and the effects of the failure mode
occurrence on the system operation…FMEA provides a basis for identifying
potential system failures and unacceptable failure effects that prevent
achieving design requirements from postulated failure modes…FMEA is
used in many system design analyses including assessing system safety,
planning system maintenance activities, defining provisions for fault
recovery, fault tolerance, and failure detection and isolation, and
identifying design modifications and corrective actions needed to
mitigate the effects of a failure on the system.”</p>
<p>The basic FMEA process involves examining each basic hardware,
software, personnel, or functional element of a system, identifying all
the ways in which that element can fail (failure modes), assessing the
effects of each failure mode upon the function of other elements of the
system and the entire system (failure effects), and then assessing the
criticality of the failure effects. Integral to the FMEA process is the
specification of corrective actions that will prevent critical failures
or restore critical functions.</p>
<p>FMEA typically uses a worksheet for analyzing data and documenting
the results. The worksheet proceeds, left to right, from the component
identification, to the associated failure modes, to the failure
effects at various levels of the system (including detectability of the
failure modes/effects), to their risk, reliability, or quality
consequences. The following is an example of an FMEA worksheet that was
prepared by the SAE for analysis of a fictitious aerospace application:</p>
<p><img src="Effective%20Risk%20Management%20and%20Quality%20Improvement%20by%20Application%20of%20FMEA%20and%20Complementary%20Techniques%20%7C%20ParagonRx_files/burman-1-sae-table.gif" alt="" height="840" width="484"></p>
<p style="padding-left: 30px;"><em>Source: SAE ARP926B, p. 32.</em></p>
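<p>As a minimal sketch, the left-to-right flow of a worksheet row can be modeled as a simple record. The field names and the sample entry below are illustrative assumptions; they are not the column headings of the SAE worksheet shown above.</p>

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class WorksheetRow:
    """One FMEA worksheet row, read left to right (field names are assumptions)."""
    component: str          # element being analyzed
    failure_mode: str       # one way in which the element can fail
    local_effect: str       # effect on the element or its neighbors
    system_effect: str      # effect on the entire system
    detection: str          # how the failure mode or effect can be detected
    severity: str           # qualitative severity category
    probability: str        # qualitative or quantitative likelihood
    corrective_action: str = ""
    assumptions: List[str] = field(default_factory=list)


# Fictitious entry, invented purely for illustration.
row = WorksheetRow(
    component="hydraulic pump",
    failure_mode="loss of output pressure",
    local_effect="actuator receives no hydraulic power",
    system_effect="reversion to backup system",
    detection="low-pressure warning light",
    severity="major",
    probability="improbable",
    corrective_action="add redundant pump",
    assumptions=["failure rate taken from hypothetical service history"],
)

print(row.component, "->", row.failure_mode, "->", row.system_effect)
```

<p>Keeping an explicit assumptions field next to each entry is one way to retain the rationale behind a worksheet row.</p>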
<p>The criticality, or level of risk, from a failure is a combination of
the severity of the effect and the probability of its occurrence. Under
FMEA, the severity is estimated qualitatively, with each effect assigned
to one of several categories ranging from none to catastrophic, and the
probability is assessed either qualitatively or quantitatively (the
latter if failure rate data are available from previous experience or
from laboratory or field experimentation). The severity and probability
assessments are combined into an overall assessment of the risk level of
the failure effect as being acceptable or unacceptable, along the lines
of the following graphic from Federal Aviation Administration (FAA)
guidance material:</p>
<p><img src="Effective%20Risk%20Management%20and%20Quality%20Improvement%20by%20Application%20of%20FMEA%20and%20Complementary%20Techniques%20%7C%20ParagonRx_files/burman-1-faa-chart.gif" alt="" height="345" width="497"></p>
<p style="padding-left: 30px;"><em>Source: FAA Advisory Circular 25.1309-1A, System Design and Analysis, p. 7</em></p>
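<p>The combination of severity and probability into an acceptable or unacceptable verdict can be sketched as a lookup of the highest tolerable probability for each severity category. The three categories and the numeric limits below are simplifying assumptions for illustration, not a reproduction of the FAA chart; only the inverse relationship (the more severe the effect, the lower the tolerable probability) is taken from the text.</p>

```python
# Highest tolerable probability for each severity category.
# Illustrative values only (assumption): the more severe the effect,
# the lower the probability that can be accepted.
MAX_PROBABILITY = {
    "minor": 1.0,
    "major": 1e-5,
    "catastrophic": 1e-9,
}


def risk_is_acceptable(severity, probability):
    """Return True when the estimated probability does not exceed the limit."""
    return probability <= MAX_PROBABILITY[severity]


print(risk_is_acceptable("minor", 1e-3))
print(risk_is_acceptable("catastrophic", 1e-7))
```

<p>With these assumed limits, a minor effect occurring once in a thousand exposures is acceptable, while a catastrophic effect at one in ten million is not.</p>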
<p>One aspect of the FMEA process that is often ignored in discussions
of the methodology (perhaps because it is not represented on the FMEA
worksheet) is the importance of documenting and retaining all
assumptions, including rationales for failure rates and effects
categorization that underlie the FMEA worksheet entries. This is
specifically cited by the SAE in its recommended standard ARP4761,
appendix G, section 3.2.1.</p>
<p>My review of FMEA utilization in aerospace and several other fields
suggests that the most common applications of FMEA are in product design
and manufacturing processes. FMEA has not typically been applied to the
post-manufacturing environment (such as product distribution and field
usage by providers, operators, maintainers, and customers); however,
post-manufacturing applications are not specifically excluded in FMEA
standards. In fact, in SAE ARP5580 section 6.1.1 (5), “failure
conditions caused by the operational and maintenance environment” are
specifically cited among the failure modes to be considered.</p>
<h3>Cross-industry acceptance and use of FMEA</h3>
<p>FMEA is firmly established as a risk analysis and risk management
methodology. Originating in the U.S. military during the 1940s and
supported by military specification beginning in 1949 (MIL-P-1629, <em>Procedures for Performing a Failure Mode, Effects, and Criticality Analysis</em>),
FMEA methods and applications were officially accepted as a recommended
practice for aerospace engineering by the SAE beginning in 1967 under
ARP926, <em>Fault/Failure Analysis Procedure</em>. FMEA had become a
standard part of the design process in the aerospace industry by the
1980s and has been in continuous use through the present. For example,
the Boeing Commercial Airplane Group relied upon FMEA to substantiate
the safety and reliability of design changes for two generations of the
Boeing 737 commercial airliner: the 737-300/400/500 series, first
produced in the mid-1980s, and the “next generation” 737-600/700/800/900
series, first produced in the late 1990s and early 2000s. I have
personally examined numerous FMEA documents and FMEA-based safety
analyses prepared by aircraft manufacturers for original and modified
transport-category aircraft designs (these FMEA applications are
proprietary to the manufacturers). In addition to these aviation
applications of FMEA, the late 1980s saw the application of FMEA to
design and manufacturing processes by a major U.S. automobile
manufacturer, and these practices were recognized by the automotive
industry under the auspices of the Automotive Industry Action Group
(AIAG) and the SAE (Surface Vehicle Recommended Practice J-1739, first
issued in 1994). Currently, FMEA is recognized by the SAE (ARP5580, <em>Recommended Failure Modes and Effects Analysis (FMEA) Practices for Non-Automobile Applications</em>), the FAA (Advisory Circular 25.1309-1A, <em>System Design and Analysis</em>), and the National Aeronautics and Space Administration (NPA 8715.3, <em>NASA Safety Manual</em>, and NSTS 22206, <em>Instructions for Preparation of FMEA and CIL</em>).
In a subsequent section of this paper, I will provide an example of a
successful government-sponsored (and therefore non-proprietary) aviation
industry application of FMEA that resulted in a significant improvement
in commercial air carrier flight safety.</p>
<p>FMEA has also been applied successfully in a wide range of other
domains. For example, FMEA is being used to analyze design and
maintenance issues in building structures (Anker Nielson, Ph.D., “Use of
FMEA, Failure Modes Effects Analysis on Moisture Problems in
Buildings,” <em>Building Physics 2002—6<sup>th</sup> Nordic Symposium</em>).
Also, engineers have applied FMEA to design and manufacturing processes
in the semiconductor industry (Steven Martin and Bedwyr Humphreys,
“FMEA Speeds Time to Market in Photonic IC Manufacturing”, <em>Compound Semiconductor</em>,
November 2002). The authors concluded, “The FMEA technique has been
successfully implemented at MetroPhotonics, aiding in the rapid
development and the successful launch of the SurePath product suite…Time
to market and development costs were greatly reduced through the
selection of optimum system alternatives (through FMEA), resulting in a
successful product launch within four months of concept” (Martin and
Humphreys, p. 69).</p>
<p>FMEA has become established as a standard methodology for risk
management in the healthcare industry. Under Joint Commission on
Accreditation of Healthcare Organizations (JCAHO) Standard LD.5.2,
adopted July 1, 2000, healthcare organizations are required to
proactively identify and manage potential risks to patient safety, using
FMEA and root cause analysis to analyze at least one high-risk process
annually. The U.S. Veterans Administration has developed and begun
implementation of an application of FMEA that the agency customized for
healthcare delivery (Joseph DeRosier, Erik Stalhandske, James P.
Bagian, and Tina Nudell, “Using Health Care Failure Mode and Effect
Analysis™: The VA National Center for Patient Safety's Prospective Risk
Analysis System,” <em>The Joint Commission Journal on Quality Improvement</em>,
Vol. 28, No. 5, May 2002). Private health care organizations (for
example, Kaiser Permanente) have begun to implement FMEA-based processes
(Kaiser Permanente, <em>Failure Modes and Effects Analysis Team Instruction Guide</em>,
March 2002). Although healthcare-related applications of FMEA have
considered some aspects of pharmaceutical delivery (for example,
Institute for Healthcare Improvement, “<em>Sample FMEA: Comparison of Five Medication Dispensing Scenarios</em>,”
2003), I am not aware that a comprehensive analysis of pharmaceutical
distribution, delivery, and use, treating all post-manufacture
activities as an integrated system, has been performed to date using
FMEA or any alternative, formal risk-management methodology.</p>
<h3>Advantages of FMEA</h3>
<p>I suggest that FMEA has several general advantages for organizations seeking to improve quality and safety:</p>
<p>First, FMEA is a structured process that promotes disciplined
elicitation of ideas about the kinds of failures that may occur, careful
analysis of specific risk/hazard areas, proper documentation of sources
and assumptions, and identification of interventions that manage risks
to an acceptable level. Regarding the ultimate goal of risk management,
in most applications the FMEA process requires intervention in each
identified adverse outcome until the residual level of risk is
acceptable.</p>
<p>Further, as a “bottom-up process” proceeding from the failure of an
individual component of a system to the effects on the entire system,
FMEA helps organizations identify unforeseen, undesired outcomes. Its
best applications are prospective, facilitating the control or
mitigation of adverse outcomes before they occur.</p>
<p>Also, FMEA explicitly considers the detectability of failure modes,
and thus it promotes consideration of failures that can remain latent;
that is, failures that have no immediate effect and (if they remain
undetected) are capable of resulting in adverse effects when combined
with subsequent failure modes or events (however, as is discussed below,
the basic FMEA methodology may need to be modified to fully address
latent failures).</p>
<h3>Limitations of FMEA</h3>
<p>SAE ARP5580 provides the following “cautions” for the application of FMEA:</p>
<ul>
<li>First, a FMEA traditionally considers only non-simultaneous failure
modes. Each failure mode is considered individually, assuming that all
other system components are performing as designed. Hence, a typical
FMEA provides limited insight into the following anomalous behaviors:
<ol style="padding-left: 30px;">
<li> the effects of multiple component failures on system functions, and</li>
<li> latent manifestations of defects such as timing, sequencing, etc.</li>
</ol>
</li>
<li>Second, the prioritization of the failure modes for corrective
actions is substantially subjective. Thus, care should be taken in
decision making when using any quantitative aspects of the numbers
presented in the analysis (SAE ARP5580, Section 3.3).</li>
</ul>
<p>I concur that the basic approach of FMEA is to consider single
failures and that a typical FMEA application handles multiple
(simultaneous/sequential) failures with difficulty (later in this paper,
I will suggest several extensions to FMEA that are capable of
addressing these issues).</p>
<p>Further, I suggest that the following additional general limitations exist for FMEA:</p>
<p>First, as FMEA has typically been applied in aerospace engineering,
designers are permitted to rely upon human performance (such as
interventions by pilots and mechanics) to mitigate the adverse effects
of hardware and software component or system failures. However, in doing
so, no consideration is given to imperfect human performance.
For example, FAA guidance for aircraft certification states, “If…a
potential failure condition can be alleviated or overcome…without
requiring exceptional pilot skill or strength, credit can be taken for
correct and appropriate action” (FAA AC 25.1309-1A, paragraph 11). The
assessment of “exceptional” skill or strength is subjective, and once a
specific human response to a failure mode is determined to require
unexceptional skill or strength, FMEA typically assumes that the human
will intervene reliably every time that the failure mode occurs. I
believe that this is an unrealistic assumption for human performance,
and as a common treatment of human performance in FMEAs it constitutes a
limitation of the typical FMEA methodology.</p>
<p>Also, as FMEA typically has been applied in design/process
applications, there is no inherent feedback to the FMEA process from the
actual failure modes and outcomes experienced in field use. However,
this feedback is not excluded by the FMEA process and the continuing
refinement of an FMEA through feedback has been explicitly recognized as
an important aspect of system safety analysis in some applications.</p>
<h3>Keys to successful application of FMEA</h3>
<p>I believe that several additional issues are important for obtaining satisfactory results from an FMEA.</p>
<p>First, while FMEA is a structured technique that provides a
comprehensive analysis, it is difficult (or impossible) to prospectively
identify all possible failure modes/adverse outcomes from a complex
component or functional element of a system. Because even the best FMEA
effort may leave some failure modes and effects undiscovered, after
completing an FMEA it is essential to avoid concluding that all risks
have been compensated for or controlled. This suggests that FMEA
analysts need to maintain an open and creative attitude about
identifying failure modes and assessing their effects and consequences.
It also establishes the rationale for obtaining, analyzing, and reacting
to feedback from field use and operations, and for treating the FMEA as
a “living document” that will be revisited and revised on a continuing
basis.</p>
<p>Further, while planning and performing an FMEA, it is essential to
understand the scope of the analysis and to choose a proper scope that
will allow the evaluation of all critical risks that can result from
failure modes. For example, many FMEAs are limited to design issues and
do not necessarily consider manufacturing variations or errors. An
FMEA of an aircraft part that includes several linkages may not consider
the cumulative effect (stack-up) of the manufacturing tolerances that
are allowed for each individual linkage as a possible contributor to
failure modes and effects. Even if the scope of the FMEA for this part
is enlarged to include manufacturing processes and therefore considers
tolerance stack-up, the analysis still may not consider the effects of
failure modes that remain downstream from the processes that have been
included within the analytical scope, such as improper maintenance or
use. When considering all of a product's failure modes and effects in
all environments, a still broader scope of analysis might reveal
additional factors that significantly affect safety and quality. For
example, consider a pharmaceutical product with an adverse side effect
that poses a risk to some users. One option for controlling the risks of
these side effects would be for the Food and Drug Administration (FDA)
to withdraw approval for the product. However, because the product also
has therapeutic value, withdrawal of the product may actually result in a
net reduction of patient health and safety, even considering the
adverse consequences of the side effects. The net therapeutic benefit of
the product relative to its side effects will not be identified by an
FMEA of its design, manufacturing, and use—unless the withdrawal of the
product is considered as a failure mode and the scope of analysis is
broadened to consider the net consequences of non-use.</p>
<p>In addition to considering downstream effects in scoping the
analysis, it is essential to recognize that the interventions selected
in an FMEA to mitigate an identified risk can also introduce their own
failure modes and effects having critical risks. Interventions should be
designed to “first, do no harm;” that is, they should introduce no new <em>uncorrected</em>
failure modes. This suggests that FMEA should be performed on each
intervention, as well. In some cases controlling the hazard from one
failure mode can increase the hazard from another, and this may require
consideration of multiple simultaneous or sequential failures as an
extension of FMEA.</p>
<p>Also, while interpreting the results of an FMEA, it is essential to
understand the derivation and limitations of the probability analysis
that is incorporated in the evaluation of the risks associated with
failure effects. The probability that a failure mode will occur can be
obtained from engineering, field, or registry data such as historic
component failure rates; the probability that a functional element or
complex component will fail can be estimated by combining the failure
rates of sub-assemblies or sub-systems. Failure rates may be obtained
from laboratory research if actual field data are unavailable. Lacking
both field and laboratory data, failure mode probabilities may be
estimated. The FMEA analyst's confidence in the results should depend on
the derivation of these probabilities. An additional probabilistic
element in some FMEA applications is the likelihood that an effect of
stated severity will follow from a failure mode. This element needs to
be estimated in a similar manner, with confidence in the results of the
analysis once again depending on the source of the probability
estimates. Another probabilistic element can enter FMEA when considering
interventions to control or mitigate an identified risk; here, the
probability that the intervention will successfully address the risk
needs to be estimated.</p>
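<p>The probabilistic elements described above can be sketched numerically. The example below assumes that sub-assembly failures are independent and that the failure of any one sub-assembly fails the element (a series arrangement); both are simplifying assumptions, and the function names and numbers are hypothetical.</p>

```python
import math


def series_failure_probability(sub_probabilities):
    """Probability that an element fails when any single sub-assembly
    failure fails the element, assuming independent failures."""
    return 1.0 - math.prod(1.0 - p for p in sub_probabilities)


def residual_effect_probability(p_mode, p_effect_given_mode, p_intervention_success):
    """Chain the three elements: the failure mode occurs, the effect of
    stated severity follows from it, and the intervention fails to address it."""
    return p_mode * p_effect_given_mode * (1.0 - p_intervention_success)


# Hypothetical inputs, invented for illustration only.
p_element = series_failure_probability([0.1, 0.2])
p_residual = residual_effect_probability(1e-4, 0.5, 0.9)
print(p_element, p_residual)
```

<p>Each input carries its own uncertainty, so the confidence placed in the result can be no better than the confidence in the least reliable estimate that feeds it.</p>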
<p>Failure and reliability rates are particularly difficult to estimate
when human performance is involved. The FAA states in its design
guidance material that “quantitative assessments of the probabilities of
crew error are not considered feasible” (FAA AC 25.1309-1A, paragraph
11); as I have already discussed, the FAA then turns at times to the
unrealistic assumption that humans perform with perfect reliability. In
other domains, performance by trained professionals has been estimated
as being satisfactory in 30-60 percent of exposures to a demanding task.
Although the reliability level of human performance is highly variable
depending on the nature of the task, environment, and individual, it is
probably best to assume that human performance in systems often may be
much less reliable than what is demanded of hardware and software
systems, and accordingly to plan compensations when humans may be
responsible for detecting primary failure modes or for intervening to
mitigate failure effects.</p>
<p>Review of FMEA applications in various industries suggests that there
is no standard definition for an acceptable level of risk. Based on the
high volume of operations with consequent risk exposure and the
public’s low tolerance for mishaps, commercial aviation design and
manufacturing is held to a stringent reliability criterion:
certification guidance requires that every failure having catastrophic
consequences must be demonstrated to be extremely improbable; the FAA
defines “extremely improbable failure conditions” as “those having a
probability on the order of 1 × 10<sup>-9</sup> or less” (FAA AC 25.1309-1A,
paragraph 10). In contrast, FMEA applications in other industrial
domains accept catastrophic outcomes with probabilities that may be
orders of magnitude more likely. An interesting criterion for aviation
design that incorporates both probability and severity factors
establishes that “in general, a failure condition resulting from a
single failure mode of a device cannot be accepted as being extremely
improbable” (FAA AC 25.1309-1A, paragraph 2-g). Thus, every failure mode
having catastrophic consequences, regardless of its estimated
likelihood, must be mitigated by a redundant system or a means of
reliably detecting the failure before it occurs (the FAA guidance does
suggest that “…in very unusual cases, however, experienced engineering
judgment may enable an assessment that such a failure mode is not a
practical possibility.”).</p>
<p>When considering the effectiveness of interventions in mitigating the
risks of failure effects, a significant implication of probability
analysis is the assumption of independent events. Normally, the
probability of two events both occurring is the probability of one event
multiplied by the probability of the other event. For example, consider
an aircraft component that FMEA determines to have an unacceptable
failure rate. To control this risk, designers require the mechanic to
check the component before each flight and also require the pilot to
recheck the component during the taxi-out checklist. If there is a 10
percent chance of the mechanic forgetting to check the component and
also a 10 percent chance of the pilot skipping the same item on the
checklist, the probability of the check being omitted by both persons is
only 1 in 100. In this manner, adequate reliability can be obtained
from two somewhat unreliable human performances by imposing multiple,
redundant interventions. However, this analysis assumes that the pilot
and mechanic events are independent, while in reality these events may
interact: a pilot who knows that the mechanic is supposed to be checking
the component may grow to rely on the mechanic and become less likely
to perform the re-check. As another example, consider a pharmaceutical
product that requires patients to receive periodic lab tests to detect
possible adverse side effects. Multiple, redundant interventions are
designed to ensure that patients receive the lab tests: doctors and
pharmacists are both instructed to track the due dates for the tests and
notify patients. However, if doctors become aware that pharmacists are
tracking the due dates, the doctors may become less likely to perform
this effort as well; the multiple interventions then collapse to a
single intervention and the redundancy is lost. Whenever the assumption
of independent events is violated and the likelihood of one event
becomes a function of another event, it is impossible to conclude that
the desired reliability will result from multiple interventions.
Therefore, interventions must be designed and implemented so as to
provide and preserve the independence of the events.</p>
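<p>The arithmetic in the pilot and mechanic example can be sketched as follows. The conditional-probability framing and the 50 percent figure for a pilot who has come to rely on the mechanic are assumptions chosen for illustration; only the 10 percent figures come from the example above.</p>
<pre>
```python
def both_checks_omitted(p_mechanic_omits, p_pilot_omits_given_mechanic_omits):
    """Probability that neither the mechanic nor the pilot checks the component."""
    return p_mechanic_omits * p_pilot_omits_given_mechanic_omits


# Independent events: the pilot's omission rate does not depend on the mechanic.
independent = both_checks_omitted(0.10, 0.10)

# Dependent events (hypothetical figure): the pilot relies on the mechanic
# and skips the re-check half of the time.
dependent = both_checks_omitted(0.10, 0.50)

print(f"independent: {independent:.3f}  dependent: {dependent:.3f}")
```
</pre>
<p>Under these assumed numbers the combined omission rate is five times higher than the independent case suggests, which is why the independence of redundant interventions has to be designed in rather than presumed.</p>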
<h3>Complementary analytical techniques</h3>
<p>In its <em>Safety Manual</em>, NASA states that “risk assessment
should use the simplest methods that adequately characterize the
probability and severity of undesired events.” The NASA manual further
states, “Qualitative methods that characterize hazards and failure modes
and effects should be used first…quantitative methods are to be used
when qualitative methods do not provide an adequate understanding of
failures, consequences, and events” (NASA NPG 8715.3).</p>
<p>A variety of analytical methods are available to apply to risk
management, in addition to FMEA. I will briefly define and discuss
several of these methods and indicate how they can be used to complement
FMEA and extend its applications into areas in which FMEA is otherwise
inherently limited.</p>
<p>I have described the FMEA method as a “bottom-up” approach that
attempts to identify failure effects (some of which may not yet have
occurred in actual use of the product) by starting with individual
component failures, imagining the ways the component can fail, and then
proceeding up the chain of the system to subsequent failures and
consequences. Further, I identified the bottom-up orientation of FMEA as
advantageous for a prospective, accident-prevention program.</p>
<p>Some alternative analytical methods are “top-down” in that they begin
with the ultimate system consequence or failure event and then proceed
down into the system to identify why the failure occurred. These methods
perform well as retrospective analyses; for example, investigations of
accidents or incidents that have already occurred. However, top-down
methods can also be useful in prospective analysis; for example, when
concerned about a severe consequence, recognizing that the primary FMEA
method may miss some failure effects, it may also be helpful to analyze
beginning with the consequence itself and to search creatively for other
sub-system functions or component failures that might bring about the
undesired result.</p>
<p>The SAE’s recommended standard for the general evaluation of aircraft safety (ARP4761, <em>Guidelines and Methods for Conducting the Safety Assessment Process on Civil Airborne Systems and Equipment</em>)
describes an over-arching “System Safety Assessment” (SSA) process. SSA
integrates FMEA and some of the following approaches, as required, to
thoroughly evaluate all of the failure modes, failure effects, and risks
of a system and show that the entire system (the aircraft) operates at
the required level of safety/reliability despite all anticipated failure
modes.</p>
<p><span style="text-decoration: underline;">Functional Hazard Analysis</span>
(FHA) is a top-down approach that is most often performed at the
beginning of a design effort, when the final specifications for a
product have not yet been settled but its basic functions are already
established. Using engineering judgment and knowledge from similar
efforts, analysts review the basic functions of a product or process and
suggest system-level hazardous outcomes for further analysis. This
method allows the safety/quality improvement process to begin early in
product development, at least at a level of broad generality.</p>
<p>Methods similar to FHA also can be applied retrospectively, after a
product is fielded. One successful application is Hazard Analysis and
Critical Control Points (HACCP), which is used in the food services industries
to evaluate the entire chain of food production and distribution,
identifying and controlling sources of food contamination. This
application seems amenable to the simpler FHA methodology rather than a
formal FMEA.</p>
<p><span style="text-decoration: underline;">Fault Tree Analysis</span>
(FTA) is a more formal top-down approach to identifying the causal links
between functional breakdowns and their antecedents in events or
failures of lower-level components. The FTA begins with the system-level
failure or consequence that the analysts want to understand. Proceeding
down through the system from the top-end level to the underlying
processes and components, the analysis results in a graphical
representation of the combinations of subsystem and component failures
that can result in the system event. The fault tree (so-named because it
resembles the root structure of a tree) uses standard notations of
Boolean logic to denote precursor or lower-level events that must occur
individually (“or-gate”) or in combination (“and-gate”) to bring about
the higher level event. In this manner, FTA directly incorporates
multiply caused (simultaneous or sequential) events. Further, when
failure rates are added to each component of the tree diagram, the
probabilities of each of the lower-level events can be added or
multiplied to estimate the probability of the ultimate system-level
event.</p>
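<p>The gate arithmetic described above can be sketched in a few lines. This is a minimal illustration that assumes all input events are independent; the tree, the event names, and the probabilities are hypothetical. The or-gate uses the exact expression for "at least one event occurs," which is close to the simple sum of the inputs when the probabilities are small.</p>
<pre>
```python
import math


def and_gate(probabilities):
    """All input events must occur (independent events assumed)."""
    return math.prod(probabilities)


def or_gate(probabilities):
    """At least one input event occurs (independent events assumed).

    For small probabilities this is close to the simple sum of the inputs.
    """
    return 1.0 - math.prod(1.0 - p for p in probabilities)


# Hypothetical tree: the top event occurs if both redundant pumps fail,
# or if the single controller fails.
p_both_pumps = and_gate([1e-3, 1e-3])
p_top_event = or_gate([p_both_pumps, 1e-5])
print(f"top event probability: {p_top_event:.3e}")
```
</pre>
<p>Nesting calls in this way mirrors the structure of the tree diagram: each gate consumes the probabilities of the events beneath it and passes its own result upward.</p>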
<p>The following is an example of FTA provided by the SAE:</p>
<p><img src="Effective%20Risk%20Management%20and%20Quality%20Improvement%20by%20Application%20of%20FMEA%20and%20Complementary%20Techniques%20%7C%20ParagonRx_files/burman-1-sae-diagram.gif" alt="" height="494" width="576"></p>
<p style="padding-left: 30px;"><em>Source: SAE ARP926B, p. 46.</em></p>
<p>As a top-down approach, FTA may identify one or more underlying
causes of the top-level event but omit others that might be identified
in the bottom-up FMEA. An additional limitation of FTA is that the
methodology (unlike FMEA) does not represent the severity of
consequences; hence, it is difficult to assess the risks of failure and
evaluate them with respect to the available countermeasures, without
also undertaking an FMEA.</p>
<p>Because it handles multiple failures, various multiple causations as
expressed through Boolean logic, and the associated probabilities rather
naturally, FTA also complements FMEA where the latter is limited. I
suggest that FTA notation and techniques should be applied selectively
to explore multiple failures and associated probabilities once these
factors have been identified in the basic FMEA. Another advantage of FTA
when used in combination with FMEA is the top-down check of the
bottom-up process that I have already described. FTA might be applied
selectively, once again, to confirm that FMEA has not omitted
catastrophic outcomes. I would consider selective application of FTA as a
complementary extension to the basic FMEA methodology. This is
explicitly recognized by the SAE in ARP926B.</p>
<p><span style="text-decoration: underline;">Probabilistic Risk Assessment</span>
(PRA) has been adopted by NASA as a formal methodology for analyzing “the
probability (or frequency) of occurrence of a consequence of interest,
and the magnitude of that consequence, including assessment and display
of uncertainties.” (Michael A. Greenfield, “Risk Management Tools,” NASA
Langley Research Center presentation, May 2, 2000). A key contribution
of PRA is that it considers, tracks, and documents the current state of
knowledge and certainty of the probabilities that are employed in basic
FMEA and other analyses. One significant limitation of PRA, as defined
by NASA, is that the methodology requires specific experience-based
failure rate data for the components and functions that are being
analyzed. As a result, I suggest that it may be difficult to apply
formal PRA to “softer” areas such as human performance in FMEA
interventions.</p>
<p><span style="text-decoration: underline;">Markov Analysis</span> (MA)
is a specialized probabilistic analysis especially well suited to
evaluating the failure effects and consequences of high-technology
systems that include self-monitoring, self-repairing and
self-reconfiguring functionalities. MA is capable of handling these
complex relationships between failure mode, effect, and consequence by
representing the relationship as a chain, each element in the chain in
an operational or non-operational state, and the movement between states
as a system of differential equations. I would suggest that MA is a
good methodology to employ as a complement to basic FMEA and FTA when
the nature of the components, environment, or operators requires it;
otherwise, in accordance with the principle of minimizing the complexity
of risk analysis, MA does not appear warranted in most applications.</p>
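<p>The simplest instance of the state-based model described above is a single self-repairing element with two states, operational and failed. The sketch below integrates the corresponding state equation with small time steps. The constant failure and repair rates, the function name, and the numbers are assumptions made for illustration; a real MA would involve many more states.</p>
<pre>
```python
def availability_over_time(failure_rate, repair_rate, hours, step=0.01):
    """Probability that the element is operational after `hours`.

    Two-state model: d(p_up)/dt = -failure_rate * p_up + repair_rate * p_down,
    integrated with the forward Euler method, starting fully operational.
    """
    p_up = 1.0
    steps = int(round(hours / step))
    for _ in range(steps):
        p_down = 1.0 - p_up
        d_up = -failure_rate * p_up + repair_rate * p_down
        p_up += d_up * step
    return p_up


# Hypothetical rates per hour, for illustration only.
p_operational = availability_over_time(failure_rate=0.01, repair_rate=0.5, hours=100.0)
print(f"probability operational after 100 hours: {p_operational:.4f}")
```
</pre>
<p>Over a long enough period the result settles at repair_rate / (failure_rate + repair_rate), the steady-state availability of the two-state model.</p>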
<p>To summarize these alternative methodologies, it is quite possible to
extend a basic FMEA into areas in which the FMEA method is limited,
including multiply caused events, simultaneous or sequential events, and
the estimation of probabilities of failure modes, effects, and
consequences (and our confidence in the estimated probabilities), by
applying selected aspects of FTA and PRA to the FMEA. I do not suggest
that complete, formal FTA and PRA need to be undertaken in every FMEA
application; rather, these methodologies should be drawn from as
required.</p>
<h3>Complementary field reporting and data analysis systems from aviation</h3>
<p>In a previous section, I mentioned the importance of feeding
information from the post-manufacturing user communities and processes
back into the FMEA to ensure that the consequences of failure modes that
arise only in product use (perhaps because they were rare events and
did not occur during design and testing) are recognized and compensated
for once they have been discovered. There are several fairly recent
developments in aviation industry reporting and analysis systems,
potentially useful for refining and refreshing an FMEA on a continuing
basis, that may also have applications in other industries.</p>
<p><span style="text-decoration: underline;">Aviation Safety Action Programs</span>
(ASAP) are cooperative reporting systems for persons active in
commercial aviation operations, including pilots, mechanics, and
aircraft dispatchers, to report the events that happen in daily line
operations. ASAP reports are non-jeopardy; in fact, if a person reports
an event to ASAP independently of enforcement action by the regulatory
authority (FAA) then the FAA will typically waive sanctions for any
regulatory violation related to the event. This waiver of sanctions
motivates personnel to report the information. ASAP reflects the
aviation system’s recognition that for human failings, obtaining the
information is often more important than punishing the transgressions,
most of which are inadvertent in any case. A key feature of the ASAP
program is the Event Review Team, comprising representatives from the
airline, the pilots’ association, and the FAA, which meets periodically
to review all submitted ASAP reports and act on the information in the
reports. ASAP is considered to be successful in revealing,
disseminating, and promoting resolution of adverse events in daily
flight operations that would otherwise remain unknown. ASAP applications
are increasingly popular in commercial aviation. These programs are
described in official FAA guidance (Advisory Circular 120-66B, <em>Aviation Safety Action Program</em>).</p>
<p>Whereas ASAP obtains information from the personnel in the aviation
system, Flight Operational Quality Assurance (FOQA) programs tap into the
volumes of parametric data generated during regular flight operations
and recorded continuously by on-board solid state recording equipment
(similar to, but usually distinct from, the crash-hardened Digital Flight
Data Recorders that are used in accident investigations). In FOQA, the
greatest challenges are handling mass data and then interpreting the
information. Initial applications of FOQA concentrated on identifying
events in which normal flight parameters (such as airspeed limitations,
g-loading, touchdown relative to target) were exceeded. The programs are
beginning to delve beyond exceedance monitoring to the consideration of
within-specification performance statistics, including both the means
and the distributions about them, which can then define the norms of the
industry. There is also a growing trend in FOQA programs to link the
information obtained from FOQA with information derived from ASAP about
the same events. This facilitates the combined analysis of “what”
happened (from FOQA) and “why” it happened (ASAP, to the extent that the
personnel involved in the event were aware of why they performed the
way that they did). A long-term NASA research program, the Aviation
Performance Measuring System, is encouraging the establishment of FOQA
programs at various U.S. airlines and enhancing data analysis along
these lines. Most of the major U.S. air carriers are generating and
collecting FOQA data on at least their more modern fleet types (these
aircraft are equipped with the required data busses). FOQA programs are
described in the Flight Safety Foundation’s <em>Flight Safety Digest</em>,
July-September 1998, “Aviation Safety: U.S. Efforts to Implement Flight
Operational Quality Assurance Programs.” Although analogous data may
not be available in other applications, FOQA demonstrates the value of
routine monitoring of the use of products in the field, including the
identification of product misuse (exceedances in FOQA) and the
characterization of norms for product use.</p>
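<p>The two kinds of FOQA analysis described above, exceedance monitoring and the characterization of norms, can be sketched together. The function name, the sample values, and the limit below are hypothetical and chosen only to illustrate the idea; they are not drawn from any airline's data.</p>
<pre>
```python
import statistics


def summarize_parameter(samples, limit):
    """Return the samples that exceed the limit, plus the mean and
    standard deviation of all samples (the within-specification norms)."""
    exceedances = [value for value in samples if value > limit]
    return {
        "exceedances": exceedances,
        "mean": statistics.mean(samples),
        "stdev": statistics.stdev(samples),
    }


# Hypothetical touchdown g-loadings and a hypothetical limit.
touchdown_g = [1.2, 1.3, 1.1, 1.9, 1.25]
summary = summarize_parameter(touchdown_g, limit=1.8)
print(summary["exceedances"], round(summary["mean"], 3), round(summary["stdev"], 3))
```
</pre>
<p>The exceedance list corresponds to the early FOQA applications, while the mean and spread correspond to the later interest in how operations are normally conducted within limits.</p>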
<p>The Continuing Analysis and Surveillance System (CASS) is an
aviation reporting and analysis system that concentrates on tracking
product failure modes, effects, and consequences in actual line
maintenance operations. CASS is one of the oldest data-driven quality
assurance programs, beginning in 1964 and tracing its history to
industry concerns about several maintenance-related air carrier
accidents during the 1950s. Air carriers are required to implement CASS
by Federal aviation regulations (14 CFR Part 121.373); interestingly,
CASS is the only safety management/quality assurance system that has
been specifically mandated by the FAA. CASS is defined by the FAA as a
“structured process to identify factors that could lead to an accident
or incident through collection and evaluation of information that can be
used as indicators of the degree of maintenance program effectiveness
and performance…accomplished through a closed-loop, continuous cycle of
surveillance, investigations, data collection and analysis, corrective
action, corrective action monitoring, and back to surveillance.” (FAA AC
120-16D, <em>Air Carrier Maintenance Programs</em>, and AC 120-79, <em>Developing and Implementing a Continuing Analysis and Surveillance System</em>).</p>
<p>Event reporting systems with many similarities to these aviation
systems are being developed and used in other industries, including
healthcare. I think that review of the characteristics and
implementation of ASAP, FOQA, and CASS may enhance similar systems in
alternative industries, particularly as these aviation systems are
applied in combination to obtain information that only the personnel in
the system can report, additional mass data about regular operations,
and specific product and personnel failures in the post-manufacturing
environment. Also, I suggest that information systems with these
characteristics can be effective feedback mechanisms for the ongoing
analysis of failure modes, effects, and consequences through FMEA.</p>
<h3>The Boeing 737 Flight Controls Engineering Test and Evaluation Board: a successful application of extended FMEA</h3>
<p>On September 8, 1994, USAir flight 427, a Boeing 737-300 airplane,
crashed while maneuvering to land at Pittsburgh International Airport,
Pittsburgh, Pennsylvania. All of the 132 persons aboard were killed, and
the airplane was destroyed. The accident occurred in clear weather with
light winds, during the hours of daylight. After a three-year
investigation, the National Transportation Safety Board (NTSB)
determined that the probable cause of this accident was “loss of control
of the airplane resulting from the movement of the rudder surface to
its blowdown limit…The rudder surface most likely deflected in a
direction opposite to that commanded by the pilots as a result of a jam
of the main rudder power control unit servo valve secondary slide to the
servo valve housing offset from its neutral position and overtravel of
the primary slide.” (National Transportation Safety Board, Uncontrolled
Descent and Collision With Terrain, USAir Flight 427, Boeing 737-300,
N513AU, Near Aliquippa, Pennsylvania, September 8, 1994, NTSB AAR-99/01,
adopted on 3/24/99).</p>
<p>Before this accident the rudder system of the 737 had been evaluated
by Boeing and the FAA, in full compliance with existing certification
requirements, using failure analysis (a less rigorous version of FMEA)
for the original design reviews performed during the 1960s and FMEA for
new-model reviews performed during the 1980s and 90s. Because the rudder
systems had not been completely redesigned in the new model 737s, the
FAA required only a very limited scope for the FMEAs conducted in the
80s and 90s. Despite these analyses and consistent with their limited
scope, the NTSB investigation determined that the airplane’s rudder
system was subject to several previously unidentified single-point
failures that could have catastrophic results. One or more of these
failure modes was most likely involved in the rudder system jam and
reversal, which led to the fatal accidents.</p>
<p>The NTSB issued numerous safety recommendations related to its
findings regarding the Boeing 737 rudder system and unusual attitude
recovery procedures for flight crews. In Safety Recommendation A-99-21,
the NTSB recommended to the FAA:</p>
<p style="padding-left: 30px;">Convene an engineering test and
evaluation board to conduct a failure analysis to identify potential
failure modes, a component and subsystem test to isolate particular
failure modes found during the failure analysis, and a full-scale
integrated systems test of the Boeing 737 rudder actuation and control
system to identify potential latent failures and validate operation of
the system without regard to minimum certification standards and
requirements in 14 Code of Federal Regulations Part 25. Participants in
the engineering test and evaluation board should include the Federal
Aviation Administration (FAA); National Transportation Safety Board
technical advisors; the Boeing Company; other appropriate manufacturers;
and experts from other government agencies, the aviation industry, and
academia. A test plan should be prepared that includes installation of
original and redesigned Boeing 737 main rudder power control units and
related equipment and exercises all potential factors that could
initiate anomalous behavior (such as thermal effects, fluid
contamination, maintenance errors, mechanical failure, system
compliance, and structural flexure). The engineering board’s work should
be completed by March 31, 2000 and published by the FAA.</p>
<p>In response to this recommendation, the Engineering Test and
Evaluation Board (ETEB) was convened in May 1999 and completed its work
in July 2000 with the issuance of a final report. (Federal Aviation
Administration, <em>737 Flight Controls Engineering Test and Evaluation Board Final Report</em>,
July 20, 2000.) The staff of the ETEB was detailed from the FAA, Boeing
(Commercial, Space, and Military Airplane divisions), Air Line Pilots
Association, Ford Motor Company, Air Transport Association, Interstate
Aviation Commission (Russia), NASA, and U.S. Navy.</p>
<p>According to the ETEB’s report, the group conducted:</p>
<ul>
<li>A failure analysis of the flight control system to identify potential failure modes;</li>
<li>Component and subsystem tests to isolate particular failure modes found during the failure analysis; and</li>
<li>Full-scale integrated systems tests, including ground and flight
testing, of the … 737 rudder actuation and control system to identify
potential latent failures and to validate the operation of the system
(ETEB Final Report, p. 2-3).</li>
</ul>
<p>The ETEB noted that normal certification procedures for aircraft and
components require consideration of the probabilities of a failure mode
or adverse effect. However, the ETEB chose to evaluate the severity of
failure mode consequences without regard to their probability of
occurrence. The ETEB’s rationale for this approach was that the Boeing
737 had experienced approximately four serious failures of its rudder
system in 100 million flight hours, two of which had resulted in fatal
accidents. Therefore, the failures under investigation were extremely
rare but of extremely adverse outcome. Consequently, it was considered
appropriate to treat any failure mode with the potential for
catastrophic consequences as of the highest risk level, regardless of
how unlikely the failure mode or effect. A related goal of this new
analysis was to “focus…on rare failures that may not have been
considered in the original certification requirements” because the
failures were considered extremely improbable (ETEB Final Report, p.
2-8). The ETEB described its analytical approach as follows:</p>
<p style="padding-left: 30px;">The ETEB conducted a comprehensive and
detailed failure modes and effects analysis (FMEA) for the complete
rudder control system…Preliminary hazard classifications were assigned
to each failure, based on the predicted severity and the ability of the
flight crew to maintain control of the airplane and conduct a safe
landing. For all failures classified as “catastrophic (Class I)” or
“hazardous (Class II),” the ETEB conducted failure simulations using a
detailed high-fidelity simulation of the rudder control system. In
addition, the ETEB conducted pilot-in-the-loop failure simulations using
a motion-base flight simulator. The purpose was to identify the impact
of the failures on the operation of the airplane following flight crew
actions. The hazard classifications of the failures were updated, based
on the combined results from these two simulation activities (ETEB Final
Report, p. 2-7).</p>
<p>These tests and simulations were used to verify and validate the
hazard levels that had preliminarily been assigned to the failure modes.
Because some failures and interventions had unexpected consequences in
the testing, the feedback from these verifications was extremely
important and influential in the final conclusions and recommendations
of the ETEB. This demonstrates how an FMEA that is open to feedback and
change, either from testing or field experience, can provide much better
results than a one-time evaluation.</p>
<p>The ETEB illustrated the verification and feedback built into the FMEA in the following figure from its final report:</p>
<p><img src="Effective%20Risk%20Management%20and%20Quality%20Improvement%20by%20Application%20of%20FMEA%20and%20Complementary%20Techniques%20%7C%20ParagonRx_files/burman-1-eteb-diagram.gif" alt="" height="416" width="575"></p>
<p style="padding-left: 30px;"><em>Source: ETEB Final Report, p. 2-6</em></p>
<p>The full range of hazard classifications followed standard FAA practice and was defined as follows by the ETEB:</p>
<p><img src="Effective%20Risk%20Management%20and%20Quality%20Improvement%20by%20Application%20of%20FMEA%20and%20Complementary%20Techniques%20%7C%20ParagonRx_files/burman-1-eteb-hazards-table.gif" alt=""></p>
<p style="padding-left: 30px;"><em>Source: ETEB Final Report, p. 3-3</em></p>
<p>The ETEB used a standard adaptation of the FMEA analysis form (see
table). It is interesting to note how the form explicitly recognized the
mitigating effects of flight crew actions in response to equipment
malfunctions (columns 5, 7, and 8).</p>
<p><img src="Effective%20Risk%20Management%20and%20Quality%20Improvement%20by%20Application%20of%20FMEA%20and%20Complementary%20Techniques%20%7C%20ParagonRx_files/burman-1-eteb-analysis-form.gif" alt="" height="584" width="576"></p>
<p style="padding-left: 30px;"><em>Source: ETEB Final Report, p. 3-2</em></p>
<p>Although the possibility of imperfect flight crew performance (a
realistic expectation for human intervention in a complex or stressful
situation) was not explicitly modeled on the FMEA worksheet, the ETEB
accomplished this important extension to the basic FMEA by validating
and revising assumptions about the reliability of flight crew
performance through its testing process. The ETEB found that flight
crews were not able to reliably intervene and mitigate the consequences
of rudder component failures in some operational circumstances, and
these revised expectations were entered into the final versions of the
FMEA worksheets.</p>
<p>The following figure provides an excerpt of an actual FMEA worksheet.
This worksheet includes a finding of catastrophic severity for a
failure effect that could not be mitigated:</p>
<p><img src="Effective%20Risk%20Management%20and%20Quality%20Improvement%20by%20Application%20of%20FMEA%20and%20Complementary%20Techniques%20%7C%20ParagonRx_files/burman-1-eteb-appendix.gif" alt="" height="559" width="689"></p>
<p style="padding-left: 30px;"><em>Source: ETEB Final Report, appendix A, p. 95</em></p>
<p>Another useful extension that the ETEB added to the basic FMEA was
the explicit consideration of latent (preexisting, undetected) failures
combined with active failures. Although FMEA is not considered to be
well-suited to the analysis of multiple failure modes, the ETEB was able
to readily analyze these sequential failure combinations by treating
the latent and active failures as a single combined failure mode for
subsequent evaluation of the failure effects and consequences. This
manual extension of the FMEA method was effective for linked pairs of
errors; I think that it would have been very complicated to use this
method to track and display triple or even more complicated failure
combinations, but these failure combinations were not required.</p>
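<p>The idea of treating each latent and active failure pair as one combined failure mode can be sketched as follows. This is only an illustration: the function name, the row layout, and the failure names are hypothetical and are not taken from the ETEB worksheets.</p>
<pre>
```python
from itertools import product


def combined_failure_modes(latent_failures, active_failures):
    """Pair every latent failure with every active failure and treat each
    pair as one combined failure mode (one worksheet row per pair)."""
    return [
        {"failure_mode": f"{latent} (latent) + {active} (active)"}
        for latent, active in product(latent_failures, active_failures)
    ]


# Hypothetical failure names, for illustration only.
rows = combined_failure_modes(
    ["failure A", "failure B"],
    ["jam X", "failure Y"],
)
for row in rows:
    print(row["failure_mode"])
```
</pre>
<p>The number of rows is the product of the two list lengths, so adding a third failure to each combination multiplies the worksheet again; this is one way to see why combinations beyond pairs become hard to track by hand.</p>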
<p>The table that follows (from ETEB Final Report, p. 3-40) provides a
sample of the new latent/active failure combinations that the ETEB was
able to identify and analyze using FMEA:</p>
<p><img src="Effective%20Risk%20Management%20and%20Quality%20Improvement%20by%20Application%20of%20FMEA%20and%20Complementary%20Techniques%20%7C%20ParagonRx_files/burman-1-eteb-failures-table.gif" alt="" height="510" width="571"></p>
<p>The FMEA undertaken by the ETEB was successful in identifying a large
number of previously unknown or unevaluated failure modes, several of
which had the potential to result in catastrophic consequences. The
following are excerpted from the results presented by the ETEB in its
final report:</p>
<p style="padding-left: 30px;">The [Boeing] 737 rudder control system is susceptible to a number of:</p>
<ul>
<li>Failures and jams that can cause uncommanded rudder motion;</li>
<li> Failures and jams that affect the operation of both the rudder main
and standby power control units (PCU), thereby defeating the
independence of the two systems; and</li>
<li> Latent failures.</li>
</ul>
<p>These failure modes are single failures, single jams, or latent failures in combination with a detectable failure or jam.</p>
<p>The rudder control system of the Initial and Classic Model 737s with
the modifications required by the applicable FAA [Airworthiness
Directives]…have:</p>
<ul>
<li>14 single failures and jams, and 12 latent failure combinations,
that have Class I failure effects in the takeoff and landing regimes.
These same failure modes have 4 Class I effects and 22 Class III (major)
effects in the rest of the flight envelope.</li>
<li>8 single failures and jams, and 11 latent failure combinations, that have Class II failure effects. (ETEB Final Report, p. 1-3)</li>
</ul>
<p>The ETEB drew strong conclusions about factors influencing the
efficacy of human interventions to mitigate rudder system failures:</p>
<p>The ETEB conducted 40 hours of pilot-in-the-loop rudder failure
simulations with 10 pilot and co-pilot flight crews from four airlines.</p>
<ul>
<li>In general, the flight crews found the existing Jammed or Restricted Rudder Emergency Procedure difficult to use.</li>
<li>The flight crews appeared to have received little training in the
use of the Jammed or Restricted Rudder Emergency Procedure or the
Uncommanded Yaw or Roll Emergency Procedure.</li>
<li>The lack of a clear and unambiguous display of rudder position made
it difficult for the crews to diagnose uncommanded rudder deflections
and take prompt corrective actions.</li>
<li>Uncommanded rudder hardover deflections during takeoff and landing
resulted in Class I failure effects [i.e., human intervention was not
reliably effective] (ETEB Final Report, p. 1-4).</li>
</ul>
<p>The ETEB’s investigation of latent failure effects using extended
FMEA methods resulted in a conclusion that “there are several latent
failures that, when combined with one additional single failure or jam,
result in Class I or Class II failure effects. There are insufficient
inspections for these latent failures” (ETEB Final Report, p. 1-5).</p>
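<p>The pairing of latent failures with one additional single failure or jam can be illustrated with a minimal Python sketch. The failure names, the <code>PAIR_CLASS</code> table, and the <code>severe_pairs</code> function are hypothetical placeholders and are not taken from the ETEB report; effect classes are encoded as integers, with 1 standing for Class I.</p>
<pre>
```python
from itertools import product

# Hypothetical placeholder names, not taken from the ETEB report.
LATENT = ["latent_A", "latent_B"]
ACTIVE = ["active_X", "active_Y", "active_Z"]

# Analyst-assigned effect class for each pair (1 = Class I, most severe).
# Assumption: pairs not listed have no effect worse than Class IV.
PAIR_CLASS = {
    ("latent_A", "active_X"): 1,
    ("latent_A", "active_Z"): 3,
    ("latent_B", "active_Y"): 2,
}


def severe_pairs(latent, active, pair_class, worst_allowed=2):
    """Return (latent, active, class) rows for pairs rated Class I or II."""
    found = []
    for pair in product(latent, active):
        effect = pair_class.get(pair, 4)
        if worst_allowed >= effect:
            found.append((pair[0], pair[1], effect))
    # Most severe combinations first.
    return sorted(found, key=lambda row: row[2])


flagged = severe_pairs(LATENT, ACTIVE, PAIR_CLASS)
```
</pre>
<p>In this sketch every latent failure is paired with every detectable failure, so a combination is missed only if the analyst omits one of its members from the input lists.</p>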
<p>As I have indicated throughout, no FMEA can be considered complete
unless it leads to the mitigation of the unacceptable risks that the
analysis identifies. The ETEB’s application of FMEA resulted in the
following recommendations for redesign of the rudder system:</p>
<p style="padding-left: 30px;">Modify the Boeing Model 737 rudder control system to ensure that:</p>
<ul>
<li>No single failure or single jam of the rudder control system will
cause uncommanded motion of the rudder surface that results in a Class I
failure effect;</li>
<li>No combination of failures or jams will result in a Class I failure
effect, except for those combinations that are shown to be extremely
improbable; and</li>
<li>No probable single failure or jam will have an effect worse than Class IV.<br>In
addition, The Boeing Company should consider providing a fail-safe
rudder control system design that provides protection from latent
failures that contribute to a Class I failure effect (ETEB Final Report,
p. 1-6).</li>
</ul>
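<p>The three recommended design criteria above can be read as a simple screening rule over an FMEA worksheet. The following Python sketch is one possible encoding under stated assumptions: the <code>FailureMode</code> record, its field names, and the likelihood labels are hypothetical, the first criterion is simplified to any single failure or jam with a Class I effect, and no numeric probability thresholds are implied.</p>
<pre>
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    name: str          # hypothetical worksheet label
    kind: str          # "single" or "combination"
    effect_class: int  # 1 = Class I (most severe) through 4 = Class IV
    likelihood: str    # "probable", "improbable", or "extremely_improbable"


def violations(fm):
    """Return the recommended design criteria that this failure mode breaks."""
    broken = []
    if fm.kind == "single" and fm.effect_class == 1:
        broken.append("single failure or jam with Class I effect")
    if (fm.kind == "combination" and fm.effect_class == 1
            and fm.likelihood != "extremely_improbable"):
        broken.append("Class I combination not shown extremely improbable")
    if (fm.kind == "single" and fm.likelihood == "probable"
            and 4 > fm.effect_class):
        broken.append("probable single failure worse than Class IV")
    return broken


# Illustrative entry only; not a finding from the ETEB report.
example = violations(FailureMode("jam_1", "single", 1, "probable"))
```
</pre>
<p>A failure mode that returns an empty list passes this screen; it does not follow that the mode is acceptable on other grounds.</p>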
<p>As a result of these recommendations (and the preceding accident
investigation causal findings and recommendations of the NTSB), the
Boeing 737 rudder system has been redesigned to provide reliable
redundancy, and a major hardware retrofit program is underway for the
entire fleet.</p>
<p>To mitigate risks pending completion of this fleet retrofit, the ETEB
also provided the following recommendations to improve the risk
mitigation value of human (pilot and mechanic) interventions following a
rudder system failure:</p>
<ul>
<li>Revise and simplify the current “Jammed or Restricted Rudder” emergency procedure.</li>
<li>Provide additional training to flight crews in the use of the
“Jammed or Restricted Rudder” emergency procedure and the related
“Uncommanded Yaw or Roll” emergency procedure.</li>
<li>Display rudder position to the flight crew.</li>
<li>Alert flight crews and maintenance crews to the signs of rudder
malfunctions, such as uncommanded pedal motion (ETEB Final Report, p.
1-6).</li>
</ul>
<p>These recommendations, targeted at improving human performance, have
been partially implemented by the aircraft manufacturer and the FAA from
2000 to the present. Despite the limitations that remain in human
interventions, it is most significant, I believe, that the result of the
FMEA performed by the ETEB was to render the designers’ expectations
for human performance, and the design’s reliance on human intervention,
much more consistent with realistic human capabilities and limitations.
This was a strong contributor to the accuracy and applicability of the
FMEA’s results and its ability to improve system safety.</p>
<p>In all, I believe that the ETEB process was a very successful example
of the application of FMEA extended with (1) top-down analysis (the
program began with foreknowledge that the end-level adverse event to
eliminate or mitigate was flight control malfunction leading to loss of
aircraft control), (2) consideration of multiple (latent) failures,
(3) realistic consideration of human performance during interventions,
and (4) feedback from external data sources to FMEA revision. In the
ETEB application, FMEA was not supplemented by data-driven analysis of
conditional probabilities; this was an appropriate, conservative
response to the extremely rare but extremely hazardous nature of the
environment and threats.</p>
<p>The ETEB’s work shows how the basic FMEA combined with complementary
extensions can form a comprehensive safety analysis that results in real
safety improvement. The excellent results of the ETEB program are
equally a testament, I think, to a strong effort to creatively re-think
the failure modes and effects for a system that had been thought to be
well understood and thoroughly time-tested by 100 million
hours of field use. This creativity and openness are necessary
ingredients for any successful analysis.</p>
<h3>Conclusions about FMEA</h3>
<p>Based on the foregoing review, I conclude the following about the Failure Modes and Effects Analysis methodology:</p>
<ul>
<li>FMEA is a sound methodology for basic, structured risk management and quality improvement analysis.</li>
<li>The ideal approach can be to use FMEA as the backbone of an analysis
that integrates complementary methods as required; for example, it may
be appropriate to apply elements of FTA or PRA to understand and explore
the proper scope of analysis, the significance of failure effects, and
the effectiveness of risk management interventions.</li>
<li>Thoughtful application of FMEA can identify when these extensions
are required, and FMEA can be used to integrate and document the results
of an extended analysis.</li>
<li>The limited reliability of humans in complex systems argues for
multiple, redundant, independent interventions when relying on humans to
detect failure modes or actively intervene to mitigate failure effects.</li>
<li>FMEA, as extended with appropriate top-down, probabilistic, and
feedback methods, is an excellent framework for risk management and
quality improvement in the post-design/post-manufacture (field
distribution, application, or user) environment, including the human
performance aspects of this environment.</li>
</ul>
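<p>The case for multiple, redundant, independent human interventions can be illustrated with a short calculation. This is a minimal sketch that assumes the interventions fail independently, so the chance that all of them miss is the product of the individual miss probabilities; the function name <code>all_miss_probability</code> and the value of 0.1 are illustrative assumptions, not figures from the analyses reviewed here.</p>
<pre>
```python
from math import prod


def all_miss_probability(miss_probabilities):
    """Chance that every intervention misses, assuming they are independent."""
    probabilities = list(miss_probabilities)
    for p in probabilities:
        if p > 1.0 or 0.0 > p:
            raise ValueError("each probability must lie between 0 and 1")
    return prod(probabilities)


# Illustrative values only: each check is assumed to miss 1 time in 10.
single_check = all_miss_probability([0.1])
three_checks = all_miss_probability([0.1, 0.1, 0.1])
```
</pre>
<p>With these assumed values, three independent checks reduce the chance of an undetected failure from about 1 in 10 to about 1 in 1,000. Where interventions share a common cause of error, the independence assumption does not hold and the real benefit is smaller.</p>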
<p>&nbsp;</p>
<p><a name="1"></a><span style="vertical-align: super; font-size: 70%;">[1]</span>
I acknowledge and thank ParagonRx, LLC for its support of my review of
risk-management methodologies and the writing of this paper. All
opinions expressed herein are my own and do not necessarily represent
the opinions, policies, and products of ParagonRx, LLC.</p>
<p>&nbsp;</p>
<p><a href="http://www.paragonrx.com/downloads/white_papers/Effective%20Risk%20Management%20and%20Quality%20Improvement%20by%20Application%20of%20FMEA%20and%20Complementary%20Techniques.pdf" target="_blank"><img src="Effective%20Risk%20Management%20and%20Quality%20Improvement%20by%20Application%20of%20FMEA%20and%20Complementary%20Techniques%20%7C%20ParagonRx_files/adobe_pdfdoc.gif" alt="" height="14" width="14"><em> Download</em></a></p>
</div>
<div id="sidebar">
</div>
</div>
<span class="content_foot">&nbsp;</span>
</div>
<div id="footer">
<div class="vcard">
<div>ParagonRx International, LLC&nbsp;&nbsp;&nbsp; <a class="email" href="mailto:info@paragonrx.com">info@paragonrx.com</a></div>
</div>
<p id="copyright">© 2009-2011 ParagonRx International, LLC.&nbsp; All Rights Reserved.</p>
</div>
</div>
<!-- CODE ADDED FOR POPUPS IN IE -->
<script type="text/javascript">
function JavascriptPopUpWindow(link,windowId,sizeX,sizeY,resizable,scrollbars)
{
var x=window.outerWidth;
var y=window.outerHeight;
if(document.all)
{
x=screen.width;
y=screen.height;
}
x=(x-sizeX)/2;
y=(y-sizeY)/2;
var newWindow=window.open(link,windowId,'width='+sizeX+',height='+sizeY+',screenX='+x+',screenY='+y+',left='+x+',top='+y+',resizable='+resizable+',scrollbars='+scrollbars);
if(newWindow)
newWindow.focus();
void(0);
}
</script>
</body></html>