
Extreme failure analysis: never again a repeat failure

Apply root-cause failure analysis to recurring reliability problems



K. Bloch, Flint Hills Resources, L.P., Rosemount, Minnesota
The ultimate purpose of this article is to significantly reduce the risk for catastrophic equipment
failures. Readers may believe that having been trained in root-cause failure analysis (RCFA) is
enough. Why, then, is some equipment allowed to repeatedly fail? Are low-consequence repeat
failures discretionary maintenance opportunities, or precursors to more serious reliability and
safety problems? What really constitutes effective RCFA? Let's consider real life experiences to
answer these questions.
For equipment failure analysis to be effective, our beliefs (and even the most reasonable of
assumptions) must align with the facts. Unfortunately, an extreme failure (an explosion, fire,
wreck or crash) often complicates matters by compromising much of the information that we
would normally use to determine an accident's cause. The issue with an extreme failure is that it leaves little physical evidence behind, yet its consequences are devastating. Indeed, the
consequences are so severe that it is unthinkable to take action without being certain that the
problem will be solved.
Determining causes with scant physical evidence. Without physical evidence it can be very
difficult to look at an effect and determine its cause. In contrast, predicting the effect of an
observed cause is a relatively simple task. For example, consider the simple mental
experiment1 shown in Fig. 1. First predict the outcome of a melting ice cube on hot concrete.
Then look at the photo under it and explain how the water stain got there. Note that you would
be mistaken to believe that an ice cube left behind this stain. In situations where conclusive
physical evidence has been compromised, it is sometimes easier to pass failures off as acts of
sabotage or conspiracy. Worse yet, events leaving behind no physical evidence are often
dismissed as an "act of God," and the case is closed.

Fig. 1 Melting ice cubes leave a stain on concrete, but
what left the other stain behind?

In reality, the evidence you need to solve the problem is most likely available but hidden from
plain sight. Therefore, identifying a probable cause involves knowing where to find this
evidence. Admittedly, resolving who or what left the water behind in Fig. 1 is hardly a matter of
great consequence, but in extreme failures the implications are infinitely higher. Moreover, since
there is usually low confidence in the physical evidence left behind by extreme failures, we must
turn our attention to their latent, or hidden causes.
Latent cause identification. Hidden but powerful forces within our organizations allow
incremental mistakes to negatively impact safety and reliability. We must identify these latent
causes to develop an action plan toward assured failure prevention. Latent cause identification
is simplified somewhat by recognizing that a specific sequence of events is shared between
many different extreme failures. The "extreme failure life cycle" shown in Fig. 2 represents the
relationship between a failure, a repeat failure and an extreme failure. Underlying maintenance
and design defects can usually be detected as the probable cause of many controversial
failures when this pattern is kept in mind.

Fig. 2 Extreme failure life cycle showing the
process a failure goes through to
become an extreme failure. Notice the
repeat failure's position.
Fact-based conclusions ultimately add more value than unproductive conspiracy and sabotage
theory debates. Assigning blame instead of confronting the latent cause is a certain prescription
for repeating the same problem. The extreme-failure life cycle indicates that when repeat
reliability events are disregarded they eventually become the catalyst for progressively more
serious and potentially highly dangerous equipment failures.
Repeat failures tell an important story. The role that a "repeat failure" plays in the life cycle of
an extreme failure is of great interest. In a "hindsight is 20/20" world, we often wish we had
acted differently after suffering the painful consequences of a decision under our control. Since
repeat failures are the likely intermediate step leading up to an extreme failure, they are also
reliable warning signals that precede many catastrophic equipment failures. Taking control over
repeat failures to consciously prevent a catastrophic accident reinforces the precept that we are
in charge of equipment reliability and not victims of its "unpredictable" behavior.
A repeat failure is simply defined as a recurring difficulty that prevents equipment from achieving its anticipated life expectancy. Repeat failures exist because we have perhaps
concluded that a particular failure mechanism is more economical to manage than to correct. If
allowed to persist, a repeat failure will eventually be perceived as a discretionary, low-risk
nuisance with no potential safety or environmental consequence. This defective risk
assessment approach is also known as "normalization of deviance" and must be resisted.2
Repeat failures build a reactive work order history in our maintenance management systems.
More often than not, the entries abound with useless information such as "bearing replaced"
when the entry "bearing failed due to oil starvation resulting from use of pressure-unbalanced
constant-level lubricator" would have added real value. Regardless, repeat failure work orders
tend to get buried under higher-priority items that represent a more immediate production
constraint. Repeat failures are often addressed only as time allows and without asking why the
failure occurred. Knowing why the failure occurred may require a failure analysis, and
performing a failure analysis on something viewed as a low-consequence risk takes time away
from addressing immediate production constraints that show up on the daily maintenance plan.
In truth, this highly reactive "reliability strategy" is the trademark of a repair-focused
organization. While such organizations might claim to be reliability-focused, they exhibit few, if any, of the requisite traits, or do so in name only.
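To make this concrete, the kind of work order history just described can be screened automatically for repeat offenders. The following is a minimal sketch only, not the author's method: the CSV layout (equipment, failure_mode and date columns) and the "three events within two years" trigger are assumptions chosen for illustration.

```python
import csv
from collections import defaultdict
from datetime import datetime, timedelta

def flag_repeat_failures(workorder_csv, window_days=730, min_events=3):
    """Group reactive work orders by equipment tag and failure mode, then
    flag any combination that recurs min_events times within window_days.
    Column names and thresholds are illustrative assumptions."""
    history = defaultdict(list)
    with open(workorder_csv, newline="") as f:
        for row in csv.DictReader(f):
            key = (row["equipment"], row["failure_mode"])
            history[key].append(datetime.strptime(row["date"], "%Y-%m-%d"))

    repeats = []
    for (tag, mode), dates in history.items():
        dates.sort()
        # Slide over the sorted dates looking for a cluster of min_events.
        for i in range(len(dates) - min_events + 1):
            if dates[i + min_events - 1] - dates[i] <= timedelta(days=window_days):
                repeats.append((tag, mode, len(dates)))
                break
    return repeats

# Hypothetical file name; each flagged item deserves the question,
# "Do I know why I'm working on this again?" If not, trigger an RCFA.
for tag, mode, count in flag_repeat_failures("reactive_workorders.csv"):
    print(f"{tag}: '{mode}' recorded {count} times - candidate for RCFA")
```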
Extreme failures. While we are obviously not condoning repeat failures, extreme failures are
much more offensive. Extreme failures are "extreme" in every sense of the word and are
differentiated as:
Being of, or having the potential for, the most extreme consequences
Leaving behind extremely little physical evidence to readily expose a probable cause
Being statistically extremely improbable.
Also, because precursor repeat failures leave their tracks in the maintenance management
system, extreme failures, in retrospect, always appear to be very predictable. Therefore, the
maintenance management system contains not only evidence critical for investigating an
extreme failure, but also reproof for not taking preventive action. The following examples
illustrate the relationship between repeat and extreme failures.
The Hindenburg disaster: an extreme failure. The Hindenburg disaster is one of the most
identifiable extreme failures in the history of modern machines. The circumstances behind this
failure still stir considerable controversy and debate, led by various conspiracy and sabotage
assertions that accompany most extreme failures. The purpose of examining it here is to
demonstrate how the pattern shown in Fig. 2 applies to all extreme failures no matter where
they occur. Only by associating the extreme failure with its adjunct repeat failure can we
determine a fact-based credible scenario that moves us away from accepting theories fueled by
speculation.
The Hindenburg airship was built with a lightweight metal airframe held rigid by a network of
0.125-in.-diameter steel bracing wires under tension. Its outer covering consisted of cotton linen
painted with a metallic cellulose acetate butyrate "dope" to repel water and reflect sunlight.
Sixteen inflatable bags were filled with 7 MMscf of hydrogen to lift the airship, since the
preferred medium (helium) was not available.
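For a sense of scale, the gross lift available from that much hydrogen can be estimated from the density difference between air and hydrogen. The figures below are approximate sea-level values and ignore temperature, gas purity and altitude, so the result is only an order-of-magnitude sketch.

```python
# Rough gross-lift estimate for 7 MMscf of hydrogen (order of magnitude only).
GAS_VOLUME_FT3 = 7_000_000   # hydrogen volume, ft^3
RHO_AIR_LB_FT3 = 0.0765      # approximate air density at sea level
RHO_H2_LB_FT3 = 0.0053       # approximate hydrogen density at sea level

gross_lift_lb = GAS_VOLUME_FT3 * (RHO_AIR_LB_FT3 - RHO_H2_LB_FT3)
print(f"Gross lift: roughly {gross_lift_lb:,.0f} lb "
      f"({gross_lift_lb / 2000:,.0f} short tons), before structure, fuel,"
      " ballast, payload and crew are subtracted")
```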
Like every machine, the Hindenburg had an operating envelope and violating its limits would
greatly increase the mechanical failure risk. Operating procedures were used to mitigate these
failure risks, and the Zeppelin Company's enviable safety record was evidence of an effective
training program. Top among these procedures were strict rules governing landing maneuvers
to avoid exceeding the bracing wires' 1,000-lb tensile force limit in the tail-to-fuselage section,
which absorbs the energy produced while turning the massive airship. Regardless, the
Hindenburg's maintenance records contain a history of bracing wire failures in the tail-to-
fuselage section.3
The Hindenburg's otherwise perfect transatlantic flight was spoiled by unexpected headwinds
that put it 12 hours behind schedule upon its arrival in Lakehurst, New Jersey. Eager to land the
ship without further delay, the captain ordered a risky sharp left turn after the wind suddenly
changed direction to quickly reorient the airship's nose back into the wind. This violated landing
procedures that required aborting the landing attempt if the wind shifted direction. Following the procedure was necessary to safely point the airship's nose back into the wind without exceeding the bracing wires' stress limit.
After making the sharp left turn, the captain noticed the Hindenburg suddenly becoming tail-
heavy. Since procedures also required landing the airship horizontally to avoid damaging the tail
fin, the captain released the remaining ton of water from the ship's rear ballast tanks (Fig. 3).
Several minutes later, the captain ordered six crewmen to the front of the airship to
counterbalance the tail section's continued downward slope. Next, he dropped the anchor ropes
from the airship's nose.

Fig. 3 Rear ballast tanks are emptied to avoid hitting
the ground after the Hindenburg unexpectedly
becomes tail-heavy during landing maneuvers.
On the ground, everything appeared normal. The ground crew grabbed the anchor ropes and
began walking the airship to the mooring mast. Before they were able to fasten the ropes to the
mast, however, a fire broke out in front of the top tail fin, where evidence of a hydrogen leak (tail-
heaviness) existed after the captain deviated from procedures by executing a sharp left turn
after the wind changed direction. The entire airship burned from the tail forward, destroying all
physical evidence within 32 seconds. Thirty-five of the 97 people on board were killed along
with one ground crew member.
In hindsight, knowing that a repeat failure is somehow involved makes it easy to understand that
a bracing wire probably broke upon exceeding its stress limit, just as expected. While this failure
had occurred previously, this time the broken wire penetrated a hydrogen bag and the
airship's outer skin, which set off a sequence of events that resulted in one of history's most
famous disasters. The repeat failure became extreme by an unlikely combination of contributing
factors:
A very tight schedule, made even tighter by strong headwinds during the flight
Procedure deviation
Hydrogen containment was lost
The failure occurred during a critical phase of the landing procedure
Light rain was falling, which made the anchor ropes capable of conducting an electrical
charge after becoming adequately moistened.
Some may wonder why the Zeppelin Company did not address the Hindenburg's design risk
with something more reliable than an administrative control procedure, like stress-resistant
materials in the vulnerable tail-to-fin section. But it is important to consider how the Zeppelin
Company's perfect safety record influenced its risk tolerance for bracing wire failures. In
hindsight, their maintenance records show that this repeat failure represented a discretionary
maintenance nuisance that could be managed with little inconvenience. Living with the failure
mechanism was, therefore, a more economical alternative. Would the choice to sacrifice a wire
in the interest of preserving the airship's remaining turnaround time have been considered
acceptable if the procedure deviation had not ended in an extreme failure? While the Zeppelin
Company's safety record was indicative of a reliability-focused organization, it was, in fact, guilty
of making decisions associated with a repair-focused organization.
Inherently safe technology advocates will argue that the use of hydrogen instead of helium is
what caused the accident, while minimizing the impact of maintenance practices that led to a
loss-of-containment scenario. Whether or not helium was available to Germany in the mid-1930s is not the issue here. In modern industry, similar substitutions are rarely practical, so we must operate responsibly with the hazards that remain. To illustrate, let's turn our attention to industries where OSHA's
Process Safety Management (PSM) Standard (29 CFR 1910.119) applies. The standard's
purpose is to achieve safe and continuous containment of hazardous substances inherent to the
manufacturing process.
Spent caustic tank explosion. Refineries use caustic (sodium hydroxide) to purify liquefied
petroleum gas (LPG). As the caustic reacts with LPG contaminants, its concentration
decreases. In other words, it becomes "spent."
To maintain the minimum caustic concentration needed to continue the reaction, spent caustic
must be periodically removed and replaced with an equal volume of fresh caustic. In one
refinery, spent caustic is batched into a 35,000-gallon intermediate cone-roof storage tank.
From there the caustic slowly drains to the waste treatment facility (Fig. 4). This disposal
strategy absorbs large slugs of spent caustic that would otherwise upset the biological treatment
system.

Fig. 4 A degassing vessel was installed to vent hydrocarbons from spent caustic before it enters the storage tank.
In 2004, a spent-caustic system hazard and operability (HAZOP) study concluded that operator
error could result in sending a large volume of LPG directly into the spent-caustic storage tank.
Upon entering the tank, the LPG would vaporize and release a propane vapor cloud into the
refinery. The history of fugitive vapor releases in refineries is not comforting; vapor releases
continue to be responsible for extensive equipment damage and fatalities upon ignition.
Therefore, a HAZOP action item was assigned to mitigate the risk for a vapor cloud release
from the atmospheric spent-caustic storage tank pressure relief system.
A degassing vessel was retrofitted in front of the spent-caustic storage tank and commissioned
on day 1 (actually in 2005). This system satisfied the HAZOP action item's purpose for
hydrocarbon removal from the spent caustic entering the tank. For most of the time the system
would operate in "fill" mode, where spent caustic from the upstream liquid/liquid LPG contact
process would stagnate in the degassing vessel while venting hydrocarbons into the refinery
flare header. After allowing sufficient time to pass, operators would perform a manual "dump"
procedure by opening the discharge valve under nitrogen pressure to drain the vessel's degassed
(vented) contents into the tank. Operators were expected to stand by the transfer valve during
this manual procedure, to verify that the liquid seal above the degassing vessel's discharge
nozzle inlet remained intact.
On day 529 (in 2007) the spent-caustic storage tank failed a leak detection and repair (LDAR)
test, with over 2,000 ppm hydrocarbon measured exiting the tank's atmospheric pressure relief
device (PRD). In compliance with refinery policy, a work order was issued to repair the leaking
PRD within 15 days of discovery. The repair involved tightening the bolts around the PRD to
stop the hydrocarbon leak.
After the repair, a second LDAR test was performed to confirm that the repair was successful so
that the work order could be closed. However, the LDAR test failed again with over 2,000 ppm
hydrocarbon being measured exiting the tank after the repair. In response, the results of the
failed repair attempt were logged in the maintenance management system and another repair
was scheduled. For the second repair, the PRD's sealing gasket was replaced.
The LDAR test failed again after the second repair attempt, with about 1,000 ppm hydrocarbon
detected leaking out of the tank. The maintenance management system was again updated with
the failure information, and a third repair attempt was scheduled. This repair was canceled,
however, because a final LDAR test conducted before executing the work showed zero ppm
hydrocarbon at the PRD.
On day 621 (2007), two contractors working near the tank both stopped work prematurely after a foul odor from an unidentified source invaded their work area.
Operators were advised of the situation and they immediately responded by investigating the
problem. However, the source for the release was not positively identified because the odor had
dissipated by the time they entered the process unit to investigate the complaints. The
contractors were allowed to resume working in the area and the odor did not return.
On day 628 (2007) the spent-caustic storage tank exploded suddenly and without warning
shortly after operators initiated the procedure to drain spent caustic from the degassing vessel
into the tank. Because the operator had left the valve to attend to another part of the process,
there were no injuries or fatalities. However, the accident was severe. It caused the tank to
become airborne, spread fire into the unit, and interrupted spent-caustic disposal operations.
The damage imposed by the accident (Fig. 5) compromised any physical evidence that would
expedite root-cause identification.

Fig. 5 Spent-caustic storage tank after explosion.

Only after the incident were the repetitive LDAR failures and odor complaints recognized as
warning signals that hydrocarbon was leaking through the degassing vessel into the tank.
In terms of the ignition triangle, this leakage satisfied the fuel requirement for an explosion. Although
50 years of reliable spent-caustic storage system operation had been experienced before the
accident, the refinery was faced with compelling evidence that elements of a repair-based
culture existed. This culture allowed three repeat failures (hydrocarbon vapor emission events)
without investigating why hydrocarbons were entering the tank after commissioning the
degassing vessel.

Fig. 6 Minimum nozzle submergence requirements (feet) to prevent vapor entrainment when draining liquid without a vortex breaker.8

In the post-accident investigation, it was proven that the spent-caustic interface level did not
drop below the degassing vessel's drain nozzle at the time of the accident. Therefore, attention
shifted to alternative scenarios that would explain how hydrocarbons could penetrate the
degassing vessel's liquid seal. By chasing down this thread, the investigation uncovered
evidence that an unintended design condition existed, which allowed flare gas and LPG in the
degassing vessel to contaminate the spent-caustic storage tank during the draining procedure.
Since the degassing vessel was draining without a vortex breaker, it would have to operate
according to the nozzle submergence requirements shown in Fig. 6 to avoid entraining
hydrocarbon vapor in spent caustic. Archived process data provided evidence that the
degassing vessel operated outside of these limits (Fig. 7). This means that hydrocarbon vapor
was passing into the tank every time a transfer was made. The investigation uncovered
additional systemic defects that explain how the failure became extreme. These conditions
produced an unlikely combination of contributing factors:
A procedure deviation that made it possible for operators to transfer spent caustic
without using nitrogen, which greatly increased the amount of hydrocarbon vapor in the
degassing vessel headspace
The formation of a pyrophoric iron sulfide ignition source on the internal tank roof
surface
Oxygen in the tank.
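The limits in Fig. 6 can be approximated from first principles. The sketch below is for illustration only: it uses a widely published Froude-number submergence criterion, S = D(1 + 2.3 Fr), which is not necessarily the basis of the Lieberman chart, and the nozzle size and flowrate in the example are hypothetical.

```python
import math

def min_submergence_ft(flow_gpm, nozzle_id_in):
    """Estimate the minimum liquid seal over a drain nozzle (no vortex
    breaker) to avoid vortexing and vapor entrainment, using the widely
    published criterion S = D * (1 + 2.3 * Fr), Fr = v / sqrt(g * D).
    Illustrative only; Fig. 6 may rest on a different correlation."""
    d_ft = nozzle_id_in / 12.0                      # nozzle inside diameter, ft
    area_ft2 = math.pi * d_ft ** 2 / 4.0            # nozzle flow area, ft^2
    velocity_fps = (flow_gpm / 448.83) / area_ft2   # gpm -> ft^3/s -> ft/s
    froude = velocity_fps / math.sqrt(32.174 * d_ft)
    return d_ft * (1.0 + 2.3 * froude)              # minimum submergence, ft

# Hypothetical example: a 4-in. drain nozzle passing 200 gpm of spent caustic.
print(f"Minimum liquid seal: about {min_submergence_ft(200.0, 4.0):.1f} ft")
```

It was precisely this kind of comparison, between archived operating data and the minimum submergence limits, that revealed vapor entrainment occurring on every transfer (Fig. 7).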
Both examples strongly reinforce repeat failures' involvement in extreme failures. In each case, a trustworthy and actionable cause emerges, based on evidence associated with a preceding repeat failure.
Recall, however, that the goal of a reliability-based organization is to recognize the warning
signals and take action before an extreme failure triggers an accident investigation. The final
example shows how this can be accomplished by taking appropriate intervention steps upon
detecting a repeat failure.
Extreme failure avoidance. A five-stage, barrel-type, hydrogen recycle centrifugal compressor
similar to the one shown in Fig. 8 is in service in a large midwestern refinery's platformer unit.
The compressor operates at 8,200 rpm and processes a recycle gas flow of about 97 MMscfd.
The suction gas is contaminated with ammonium chloride. This situation is conducive to
depositing salt on the rotor, which has been the presumed source for a series of recurring
vibration events over the compressor's 30-year history.

Fig. 8 Typical barrel compressor internal bundle
assembly after casing removal.

Fifteen months into a stable run after overhaul, the compressor tripped offline and coasted to a
stop without lubrication following an unintended shutdown of both lube-oil supply pumps. After a
warm restart, vibration appeared to be stable and in general very similar to conditions before the
trip. Stable operation was interrupted a month later when the outboard radial bearing vibration
suddenly jumped to 1.7 mils.
Vibration analysis indicated that subsynchronous vibration had developed due to a fluid
instability problem that produced an "oil whirl" pattern. Two months later, the vibration profile
deteriorated further into an "oil whip" pattern. This resulted in increasingly unstable and
unpredictable vibration spikes exceeding 2 mils.
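The practical distinction between the two patterns is that oil whirl typically tracks a near-constant fraction (roughly 0.40 to 0.48 times) of running speed, while oil whip locks onto the rotor's first critical speed and stays there as running speed changes. The screening sketch below illustrates that distinction only; the thresholds and sample readings are assumptions for illustration, not the refinery's diagnostic criteria.

```python
def classify_subsynchronous(samples, first_critical_cpm, tol=0.05):
    """Rough screen for trend data taken at several running speeds.

    samples: list of (rotor_speed_rpm, subsync_freq_cpm) pairs.
    Oil whirl: subsynchronous frequency tracks ~0.40-0.48x running speed.
    Oil whip: frequency stays locked near the first critical speed.
    Thresholds are illustrative assumptions only.
    """
    freqs = [f for _, f in samples]
    ratios = [f / s for s, f in samples]
    locked = all(abs(f - first_critical_cpm) / first_critical_cpm < tol
                 for f in freqs)
    tracking = (max(ratios) - min(ratios) < tol
                and 0.40 <= sum(ratios) / len(ratios) <= 0.48)
    if locked and not tracking:
        return "oil whip (locked near the first critical speed)"
    if tracking and not locked:
        return "oil whirl (tracks running speed)"
    return "indeterminate - review the cascade plot"

# Hypothetical readings resembling Fig. 9: the subsynchronous component
# stays near 3,000 cpm while speed changes, i.e., whip-like behavior.
print(classify_subsynchronous([(8200, 3010), (7900, 2990), (7600, 3000)],
                              first_critical_cpm=3000))
```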
Reducing the frequency and severity of the vibration spikes was possible only by operating the
compressor at speeds below 7,600 rpm. The speed curtailment resulted in a significant
platformer unit rate cut. The economics favored shutting down the unit to repair the compressor
rather than continuing to operate the machine below its normal running speed. The repair plan
was limited to replacing the inboard and outboard floating-ring oil seals and tilt-pad radial
bearings. These components were suspected to have been damaged by the accidental loss of
lube oil. The repair plan also provided a rationale for the type of vibration experienced soon
after, which indicated a fluid instability problem characterized by oil whip.
When the machine was opened for inspection, the maintenance staff was pleased to find radial-
bearing and floating-ring oil seal damage consistent with their diagnosis. The damaged
components were replaced and the compressor restarted. Unfortunately, the unstable
subsynchronous vibration component remained at speeds above 7,600 rpm upon the
compressor's return to service.
A second repair at considerable expense was scheduled in response to this unfortunate turn of
events. Since the compressor barrel was to be opened for inspection, a complete overhaul was
planned. A comprehensive vibration study was performed to narrow down the repair scope. An
investigation was launched to determine if a repeat failure could explain this machine's long
history of what appeared to be unrelated, but persistent unstable vibration events at high speed.
Although the compressor is armed with an eddy-current type noncontacting shaft vibration
monitoring and shutdown system, "unstable" and "high speed" are words that do not go well
together in reliability and safety-based organizations. Therefore, refinery staff wanted to
determine if rotor fouling and other discrete events were somehow related. The most recent of these events was the one in which replacing the damaged components did nothing to correct the unstable vibration.
The vibration study provided the evidence needed to determine the probable cause and, ultimately, to avoid a repeat failure. Fig. 9 shows how the subsynchronous component
adjusts to maintain a constant fractional relationship with the rotor speed. It is "locked-in" at a
rotating frequency of 3,000 cpm that corresponds to the rotor's first natural fundamental
frequency (critical speed). These characteristics apply to flexible rotors that operate above one
or more shaft critical speeds.4
The compressor maintenance file contains a history of unstable vibration events at speeds above 7,600 rpm. These events date back to 1985 and consistently appeared within 18 to 24 months after overhaul. References document similar cases involving the aerodynamic excitation of a rotor's first natural fundamental frequency.5 This condition may be experienced with flexible rotors, due to the gradual deterioration of damping properties associated with normal operation after compressor overhaul.6

Fig. 7 Actual degassing vessel operation compared
with minimum nozzle submergence
requirements shows vapor entrainment
occurring.


Fig. 9 Cascade plot showing a troublesome
subsynchronous vibration component "locked-
in" at 3,000 cpm along with expected
synchronous (1X) vibration.
Aerodynamic rotor instability was thus identified as the probable cause for the history of
compressor vibration events. This fact-based explanation gave management the confidence needed to approve the investigation team's long-term recommendation, i.e., to
address the inherent instability by either redesigning or replacing the compressor. Most
importantly, it interrupted an extreme failure's life cycle that might have resulted in unacceptable
consequences, no matter what their relative "improbability." Bottom line: Tolerating repeat
failures is inconsistent with reliability-focused thinking.
The science of warning signals. As these examples illustrate, rarely will an extreme failure
occur simply based on a single, isolated event. Rather, extreme failures are produced when an
existing repeat failure combines with other factors that are statistically unlikely to coexist. By
way of analogy, repeat failures keep reappearing like bars on a gambling casino slot machine.
Repeat failures are common, predictable events that independently represent low risk. But
when all the bars line up, there is a payout. When certain deviations line up with a repeat failure, the payout is negative: an extreme failure.
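The slot-machine analogy can be put in rough numbers. Assuming, purely for illustration, that the conditions are independent and present with the made-up daily probabilities below, the product shows how an individually common repeat failure produces an extreme failure only when several unlikely conditions coincide.

```python
# Hypothetical daily probabilities that each condition is present (made up
# for illustration; real values would come from plant and incident data).
conditions = {
    "tolerated repeat failure present": 0.30,
    "procedure deviation": 0.02,
    "critical operating phase": 0.05,
    "ignition or energy source available": 0.01,
}

p_all = 1.0
for probability in conditions.values():
    p_all *= probability

print(f"Chance all conditions coincide on a given day: {p_all:.1e}")
print(f"Roughly once every {1 / p_all / 365:.0f} years")
# Eliminating the repeat failure drives the product toward zero, which is
# the leverage behind resolving repeat failures rather than tolerating them.
```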
This is the basis for the "coupling" argument introduced by Charles Perrow in his classic Normal
Accidents text. Perrow's basic premise is that complex systems are uniquely suited for two or
more independent and innocuous conditions to combine at once to produce an unexpected
catastrophic event.7 This principle is best reflected in our compressor example, where a flexible
rotor (the latent cause) is no problem at all until it interacts with the contributing factors that align
within 18 to 24 months of normal operation. Likewise, the normal deterioration from start-of-run
conditions expected after 18 to 24 months would have little impact on a rigid rotor's
aerodynamic stability operating in this specific service.
The benefit of recognizing and controlling a repeat failure is that eliminating only one of the
coupling requirements can mitigate the risk for an extreme failure. For example, the accidents
suffered in the case of the Hindenburg and the spent-caustic storage tank could have been
prevented had the repeat failures (snapped bracing wires and hydrocarbon leakage,
respectively) been resolved. It is more rewarding to trigger an investigation that prevents an
accident rather than investigating the accident you could have prevented.
What can you do? Knowledge about the relationship between repeat failures and extreme
failures adds value in two ways. First, it becomes possible to locate the facts we need to filter
our beliefs, so that a credible probable cause can be identified when physical evidence has
been compromised. Second, it promotes confidence that we control process reliability and
safety and will not let it control us. By recognizing warning signals we can take deliberate
actions to prevent extreme failures before suffering unacceptable consequences.
Since failure and accident prevention are the reliability-based organization's trademark, here are
a few suggestions:
Recognize repeat failures. Check reactive work orders and challenge the ones that
pop up regularly. Ask yourself, "Do I know why I'm working on this again?" Perform an
RCFA if the answer is no.
Follow and enforce procedures. Shortcuts tend to introduce risks that procedures
mitigate. Follow procedure steps in order. Before deviating from a procedure, communicate openly when you think there may be a better way to execute it, or when the steps do not make sense or seem out of order.
Use good judgment. When changing conditions or circumstances interfere with the
plan, don't be afraid to enter a holding pattern or call time out. Stopping a job makes
more sense than executing it unsafely.
Operate a near-miss awareness, reporting and investigation program. Ask
employees to report things that don't look, sound or smell right. Follow up on employee
concerns about unresolved problems. Resolve the issue and communicate findings
back to them. Look for trends that indicate a bigger problem looming.
Develop and apply internal RCFA skills. Our biggest opportunity lies with correcting
small failures to avoid the bigger ones. Ultimately, no time will be saved unless RCFA is
performed.
Link RCFA triggers to repeat failures. Many organizations tier their RCFA
levels according to safety, environmental and economic thresholds. Reserve a category
for repeat failures and measure improvement (reduction) over time. The maintenance
staff will appreciate reducing the backlog and their frustration over experiencing the
same problems. You also benefit in knowing that you are systematically mitigating the
risk for an improbable, yet far too costly, extreme failure (PSM incident).
Communicate and incorporate lessons learned. Lessons obtained by investigating
repeat failures extend far beyond the equipment type on which they occur. They will
benefit different units, areas, sites and even industries. Maximizing value from a single
failure involves communicating lessons learned effectively throughout an organization.
Lessons learned can also be obtained from numerous outside sources, such
as the annual NPRA Safety Conference (www.npra.org), semiannual API/NPRA
Operating Practices Symposium (www.api.org), the AIChE Spring National Meeting
(www.aiche.org), and the US Chemical Safety Board (www.csb.gov).
Above all, remember that the machines we build perform and respond exactly as expected
under the conditions to which they are exposed. Rarely, if ever, is the cause for a failure out of
our control. Be convinced that answers and solutions will come to those who act on their
responsibility to explain unacceptable equipment performance.
LITERATURE CITED
1. Taleb, N. N., The Black Swan, Random House, New York, New York, p. 196 (ISBN 978-1-4000-6351-2), 2007.
2. Bloch, K. and S. Williams, "Normalize Deviance at Your Peril," Chemical Engineering, 111, No. 5, pp. 52-56, 2004.
3. "The Hindenburg Airship," Seconds From Disaster, Yavar Abbas, The National Geographic Channel, November 15, 2005.
4. Eisenmann, Sr., R., and R. Eisenmann, Jr., Machinery Malfunction Diagnosis and Correction, Prentice-Hall, Inc., Upper Saddle River, New Jersey, p. 436 (ISBN 0-13-240946-1, out of print), 1998.
5. Nicholas, J. C. and J. Kocur, "Rotordynamic Design of Centrifugal Compressors in Accordance with New API Stability Specifications," Proceedings of the Thirty-Fourth Turbomachinery Symposium, Turbomachinery Laboratory, Texas A&M University, College Station, Texas, pp. 25-34, 2005.
6. Eisenmann, op. cit., p. 436.
7. Perrow, C., Normal Accidents: Living With High-Risk Technologies, Princeton University Press, Princeton, New Jersey, p. 7 (ISBN 0-691-00412-9), 1999.
8. Lieberman, N., Troubleshooting Refinery Processes, PennWell Publishing Co., Tulsa, Oklahoma, p. 272, 1981.

The author


Kenneth Bloch is lead process reliability engineer at Flint Hills Resources' Pine
Bend Refinery in Rosemount, Minnesota. He is responsible for mitigating and
investigating process-governed failures on refinery assets. A Certified API 510
Inspector, Mr. Bloch publishes articles on equipment failure analysis, life cycle
extension, and reliability improvement in Hydrocarbon Processing and Chemical
Engineering magazines, and is a regular participant and speaker at the
semiannual API/NPRA Operating Practices Symposium and annual NPRA
National Safety Conference. He holds a BS degree (honors) from Lamar
University in Beaumont, Texas.
