A friend of mine who
formerly worked at NASA was talking about his volunteer work at his church,
which is to operate the video camera that records the pastor's sermon. He's going to ask them to buy a second
camera, and when I asked him why, he said, "Single-point failure. That camera goes out, we're up a creek
without a paddle."
The same concept that applies
to a situation as humble and non-life-threatening as recording sermons also
applies to highly complex systems such as Boeing's 737 Max 8, a new version of
the popular 737 aircraft that is in service around the world. New evidence from the Mar. 10, 2019 crash
of a 737 Max 8 outside Addis Ababa, Ethiopia in which 157 people died indicates
that a single-point failure may be responsible for both that disaster and a
similar crash of another 737 Max 8 on Oct. 28, 2018 in Indonesia that killed
189.
The single-point failure possibility
involves a new anti-stall system called MCAS, which Boeing installed on the Max
8 version of their 737s when the two engines were moved forward compared to
earlier versions. Because this move made
the aircraft more prone to stall, the MCAS system was intended to make the
plane handle more like older 737s, reducing the need for extensive pilot
retraining. But evidently, pilots were
not thoroughly informed that the new MCAS system was in place and activated
until the Ethiopian crash brought attention to the system.
The system works by monitoring
information from two sensors called angle-of-attack (AOA) sensors. These are small fins that stick out from the
side of the aircraft rather like wind vanes, and rotate to sense the direction
of local airflow with respect to the fuselage.
In a stall, the plane is tilted nose-up excessively with respect to the
direction of airflow. This makes the
sensor turn at an angle that the onboard computers use to figure out that it's
time to take over the controls from the pilot and push the nose down.
Normally, according to a
post on aviation.stackexchange.com, the onboard computer takes the output of
both AOA sensors into account, and if one indicates a stall and the other
doesn't, perhaps just a warning is issued to the pilot. But according to a New York Times report, the MCAS anti-stall system activates even if
only one of the two sensors says the nose is too high. If anything happens to make one of the
sensors give a false reading—a stray updraft from the backwash of a flight that
just took off, for example—the MCAS goes into action and pushes the nose down,
even if the takeoff is proceeding normally.
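To make the difference concrete, here is a minimal sketch in Python that contrasts the two triggering rules described above. It is not Boeing's actual MCAS logic, which I have not seen. The threshold values and the function names are hypothetical, chosen only to illustrate why acting on one sensor alone creates a single-point failure.

```python
# Illustrative sketch only: this is NOT Boeing's MCAS logic.
# All thresholds and function names below are hypothetical.

STALL_THRESHOLD_DEG = 14.0    # hypothetical angle of attack treated as "too high"
MAX_DISAGREEMENT_DEG = 5.0    # hypothetical limit on how far the two sensors may differ


def single_sensor_trigger(aoa_left_deg, aoa_right_deg):
    """Command nose-down if either sensor alone reports a high angle of attack."""
    return (aoa_left_deg > STALL_THRESHOLD_DEG
            or aoa_right_deg > STALL_THRESHOLD_DEG)


def cross_checked_trigger(aoa_left_deg, aoa_right_deg):
    """Command nose-down only if the sensors agree and both report a high angle."""
    sensors_agree = abs(aoa_left_deg - aoa_right_deg) <= MAX_DISAGREEMENT_DEG
    both_high = (aoa_left_deg > STALL_THRESHOLD_DEG
                 and aoa_right_deg > STALL_THRESHOLD_DEG)
    return sensors_agree and both_high


# Case 1: one faulty sensor reads 22 degrees, the healthy one reads 5 degrees.
print(single_sensor_trigger(22.0, 5.0))   # True: nose pushed down on bad data
print(cross_checked_trigger(22.0, 5.0))   # False: disagreement blocks the command

# Case 2: both sensors report a genuinely high angle of attack.
print(single_sensor_trigger(16.0, 17.0))  # True
print(cross_checked_trigger(16.0, 17.0))  # True
```

In the first case a single bad reading is enough to move the nose under the either-sensor rule, while the cross-checked rule withholds the command. In a real aircraft the disagreement would presumably also raise a warning to the pilots, which this sketch leaves out.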
The altitude records of both
737s involved in the crashes in Ethiopia and Indonesia show that the pilots
went on a desperate roller-coaster ride, executing climbs and descents every
half-minute or so three or four times before the final descent and crash. This is consistent with a struggle between
the MCAS and the pilots, although other causes could be responsible as
well. Following the Ethiopian crash,
China and many other countries grounded all 737 Max 8 and Max 9 planes, and
later last week, the U.S. followed suit.
Boeing says it is working on
a software upgrade for the planes involved, but it may not be available until
April, and so until then, millions of dollars' worth of aviation assets will be
out of service. But that's better than
having another 737 crash on takeoff.
It is too soon to draw
definite conclusions about the causes of these crashes. That has to wait for the analyses of black-box
records and other pertinent data. But
investigators have already found that the horizontal stabilizer in the
Ethiopian plane was set to push the nose down, which is not something you
normally do on takeoff. And the fact
that the MCAS can be triggered by only one AOA sensor is enough reason to take
measures such as grounding planes until a remedy can be developed and
installed.
Planes are designed by
people who work in organizations, and successful designs of safe planes emerge
from an exceedingly complex process involving thousands of designers,
technicians, supervisors, inspectors, regulators, and other interested
parties. Successful companies manage to
evolve with new young staff replacing retired engineers and managers while
maintaining the core principles and knowledge that are essential to making
planes safe. And one of those core
principles, so easy to understand that even I get it, is to avoid
single-point-failure situations whenever possible by installing backup systems
and procedures.
If what the Times reported is true, someone dropped
the ball with regard to the MCAS system's behavior in response to only one
erroneous sensor. It could take months
or years to figure out how this design error happened. But the lesson is one that has to be learned
if Boeing is to recover from this sequence of disasters, which it probably
will.
It's also possible that the
accidents involved pilot error in combination with a misbehaving MCAS. If the pilots didn't know that the MCAS was
even installed, or were unfamiliar with what flying the plane with an activated
MCAS is like, their actions with regard to it may have contributed to the
crashes. Part of the problem here is
that the MCAS rarely activates under typical flight conditions. Perhaps there is something about the
meteorological conditions at the two airports involved which gave rise to a
single AOA sensor error that persisted long enough to cause the accidents.
These and other speculations
will have to await the full accident reports, which may not be available for
months. But in the absence of more
knowledge, grounding the 737 Max 8 and 9 planes until the single-point-failure
problem with MCAS can be addressed and demonstrated to be fixed is the wisest
course.
Sources: I referred to a New York Times report that appeared on Mar. 15, 2019 at https://www.nytimes.com/2019/03/15/business/boeing-ethiopian-crash.html. I also consulted articles from vox.com at https://www.vox.com/2019/3/16/18268646/ethiopian-airlines-lion-air-boeing-737-similarities
and qz.com at https://qz.com/1574441/a-warning-signal-that-could-have-prevented-the-lion-air-crash-was-optional/. A description of how angle-of-attack sensors
work is available at https://aviation.stackexchange.com/questions/2317/how-does-an-alpha-aoa-vane-work.
Postscript: After I posted this blog, I received an informative
email from a reader who wishes to remain anonymous. He has given me permission to post it here,
as it sheds more light on the concept of single-point failure:
Dear Mr. Stephan,
I am writing to you in order to add some of my thoughts on your recent
post 'Are the 737 Max 8 Crashes Single-Point Failures?'
In determining a single point of failure it would seem necessary to
choose a boundary for the system or sub-system and also define what the goal of
the system is. In your example the goal is to record the pastor's sermon,
and the entire system consists of a video camera and an operator. So, in
this case the video camera is a single point of failure, and adding a second
camera provides redundancy.
In the case of the MCAS the overall goal is to maintain a safe flight
path. The failure of a single AOA sensor may cause the MCAS to make
inappropriate changes to the horizontal stabilizer trim. But there are
other subsystems that provide redundancy, most notably the stabilizer trim cut
out switches. The proper use of these switches is a memory item for both
pilot and co-pilot. So, in this broader context, the AOA sensor may not
be a single point of failure with regards to the goal of maintaining a safe
flight path.
This design was probably used as the failure to pitch down at the onset
of a stall at low level may result in a condition from which recovery is
impossible, while an erroneous change in horizontal stabilizer trim can be
corrected by timely intervention by the pilots.
And finally, I
would like to thank you for your blog. It provides detailed and nuanced
analyses of engineering problems that I have not found elsewhere.
(Name withheld)
Sources:
Boeing 737 technical
site: http://www.b737.org.uk/index.htm
Juan Browne's (a 777 pilot
and an airframe and powerplant mechanic) YouTube channel:
FAA Advisory Circulars:
Mentor Pilot (a 737 pilot
and line training captain) YouTube Channel: