Monday, April 08, 2019

Boeing Confirms Software At Fault In Ethiopian Crash


Last Thursday, Apr. 4, Ethiopian Transport Minister Dagmawit Moges released a preliminary report on the crash of an Ethiopian Airlines Boeing 737 Max 8 outside Addis Ababa last month, which killed all 157 people on board.  Cockpit voice recordings and data from the flight recorder make it very clear that, as Boeing CEO Dennis A. Muilenburg admitted regarding both this crash and that of an Indonesian Lion Air flight last fall, "it's apparent that in both flights the Maneuvering Characteristics Augmentation System, known as MCAS, activated in response to erroneous angle of attack information."  Boeing is currently scrambling to fix both that software problem and another, minor one uncovered recently, but as of now, no 737 Max 8s are flying in the U. S. or much of anywhere else.  And the FBI is reportedly investigating how Boeing certified the plane.

When we blogged about the Ethiopian crash three weeks ago, there were significant questions as to whether the MCAS alone was at fault, or whether pilot errors contributed to the crash.  But according to a summary published in the Washington Post, Minister Moges said that the pilots did everything recommended by the manufacturer to disable the MCAS, which was repeatedly trying to point the plane's nose downward in response to a single faulty angle-of-attack sensor output.  Their efforts proved futile, however, and the plane eventually keeled over into a 40-degree dive and crashed into the ground at more than 500 mph.

Our sympathy is with those who lost relatives and loved ones in both crashes.  Similar words were spoken by CEO Muilenburg, on whose head lies the ultimate responsibility for fixing these problems.  In doing so, he and his underlings will have to work out how to integrate control of life-critical systems smoothly when both humans and what amounts to artificial intelligence are involved.

This is not a new problem, but it has transformed so much over the years that it seems new. 

I once toured a museum near Lowell, Massachusetts, which preserved a good number of the original pieces of machinery used in one of the many water-powered textile mills that dotted the landscape in the early 1800s.  Attached to the main water turbine was a large, complicated system of gears, flywheels, springs, levers, and so on, which turned out to be the speed regulator for the mill.  As looms were cut in and out of the belt-and-shaft power distribution system, the load would vary, but it was important to keep the speed of the mill's shafts as constant as possible.  That complicated piece of machinery was a sophisticated control system that kept the wheels turning at a nearly constant rate, to within a few percent, despite wide variations in load.

I'm sure the thing malfunctioned from time to time, and when it did, a human operator would have to intervene, shutting it down if it started to run too fast, for example, or if continued operation endangered someone caught in a belt.  So humans have been learning to get along with autonomous machinery for almost two hundred years.

The difference now is that in transportation systems (autonomous cars, airplanes), timing is critical.  And because cars and planes travel into novel situations, not all of which can be anticipated by software engineers, conditions can arise that make it hard or impossible for the humans who are ultimately responsible for the safety of the craft to respond.  That increasingly seems to be what happened to Ethiopian Airlines Flight 302, as evidenced by the black-box data clearly showing that a single angle-of-attack sensor was transmitting flawed data.
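To make the single-sensor issue concrete, here is a minimal sketch of the kind of cross-check a flight-control function could apply before acting on angle-of-attack data.  This is emphatically not Boeing's MCAS code or its planned fix; the function, the disagreement threshold, and the readings are my own illustrative assumptions, written in Python for clarity.

    # Illustrative sketch only -- not Boeing's MCAS logic or its actual fix.
    # Assumes two independent angle-of-attack (AoA) sensors; the threshold and
    # readings below are hypothetical numbers.
    AOA_DISAGREE_LIMIT_DEG = 5.5   # assumed allowable disagreement between sensors

    def aoa_for_control(aoa_left_deg, aoa_right_deg):
        """Return an AoA value safe to act on, or None if the sensors disagree."""
        if abs(aoa_left_deg - aoa_right_deg) > AOA_DISAGREE_LIMIT_DEG:
            # Sensors disagree: refuse to command nose-down trim on this data
            # and leave the decision to the crew.
            return None
        return (aoa_left_deg + aoa_right_deg) / 2.0

    print(aoa_for_control(74.5, 15.3))   # None -- one sensor jammed, do not act
    print(aoa_for_control(15.1, 15.3))   # roughly 15.2 -- sensors agree

The point of the sketch is simply that a system acting on a single sensor has no way to tell a jammed vane from a real stall; any robust fix has to compare redundant inputs and degrade gracefully when they disagree.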

Such issues have happened numerous times with the limited number of autonomous cars that have been fielded in recent years.  We know of at least two fatalities associated with them, and there have probably been many more near-misses or non-fatal accidents as well. 

But even a severe car wreck can kill at most a few people.  Commercial airliners are in a different category altogether.  They are operated by (mostly) seasoned professionals who should be able to trust that if they follow the procedures recommended by the manufacturer (in this case, Boeing), they will be able to deal with almost any imaginable contingency, even something like a stray plastic bag jamming an angle-of-attack sensor (this is my imagination working, but something had to make it give an erroneous reading).  In the case of the Ethiopian crash, that implied promise was broken.  The pilots did what they were told would disable the MCAS, but the system did not disengage, with disastrous results.

It is unusual for a criminal investigation to be aimed at the civilian U. S. aircraft industry, whose safety record has been achieved under mostly cooperative conditions between the Federal Aviation Administration and the firms that make and fly the planes.  Obviously it is too soon to speculate about what, if anything, will turn up from such an investigation.  In teaching my engineering classes, I sometimes ask whether anyone has encountered on-the-job situations whose ethics could be questioned, and I have heard several stories about how inspection or test records were falsified in order to pass along defective products.  So such things do happen, but one hopes that in a firm with a reputation such as Boeing's, incidents of that kind are rare.

The marketplace has ways of punishing firms for bad behavior that are not always just, perhaps, but are nonetheless effective.  With the growth of Airbus, Boeing knows it has a formidable rival in commercial aircraft, and any airline with millions of dollars' worth of capital sitting idle on the ground while its 737 Max 8s wait for properly vetted software upgrades is bound to have second thoughts about going with Boeing the next time it needs some planes.  I would not want to be one of the software engineers or managers dealing with this problem, as the reputation of the company may hinge on the timeliness and effectiveness of the fixes they come up with.

Boeing has been reasonably transparent about this problem so far, and I hope they continue to be up-front and frank with customers, regulators, investigators, and the public about the progress they make toward fixing these software issues.  People have been learning to get along with smart machines for centuries now, and I am confident that engineers can overcome this issue as well.  But it will take a lot of work and continued vigilance to keep something like it from happening in the future.

Sources:  The Washington Post carried the story "Additional software problem detected in Boeing 737 Max flight control system, officials say," on Apr. 4 at https://www.washingtonpost.com/world/africa/ethiopia-says-pilots-performed-boeings-recommendations-to-stop-doomed-aircraft-from-diving-urges-review-of-737-max-flight-control-system/2019/04/04/3a125942-4fec-11e9-bdb7-44f948cc0605_story.html.  I also consulted a  Seattle Times article at https://www.seattletimes.com/business/boeing-aerospace/fbi-joining-criminal-investigation-into-certification-of-boeing-737-max/ and the original report from the Transport Ministry of Ethiopia, which the Washington Post currently has at https://www.washingtonpost.com/context/ethiopia-aircraft-accident-investigation-preliminary-report/?noteId=6375a995-4d9f-4543-bc1e-12666dfe2869&questionId=7ad6fc9d-5427-415d-b719-34ad0b3fecfd&utm_term=.55ff25187605.

Facts, Investigations, and Rumors: The Houston Tank-Farm Fire


NOTE:  Due to an oversight, this blog was not posted last week.  It was intended to appear on Apr. 1.  My apologies in case you missed it.---KDS

As regular readers of this blog know, most of what appears here is commentary on engineering-ethics-related news from other sources.  First-hand reporting is not my bag, if for no other reason than that I don't have time for it, and there aren't many sources willing to be called at 5 AM, which is usually around the time I'm writing.  But a week or so ago I received some information almost by chance, and it puts me in something of an ethical dilemma.  Do I write about something that wasn't intended for publication or not?  Well, with certain precautions, I've decided to go ahead.

Here are the known and widely publicized facts:  On Sunday, Mar. 17, a fire began at the Intercontinental Terminals Company (ITC) tank farm in Deer Park, an industrial suburb of Houston.  It quickly spread and at one point involved 11 of the 242 tanks at the facility.  Firefighters could do little more than spray foam on nearby tanks to keep the fire from spreading, and had to wait for the products in the burning tanks to burn away, which took several days.  These products included toluene, xylene, naphtha, and benzene, a known carcinogen.  The fire produced a huge black plume visible for miles and caused the closure of several local school districts for a day or two.  Authorities also temporarily closed a portion of the nearby Houston Ship Channel to shipping because of the fire.

Naturally, the fire is going to be investigated.  Although no one was killed or injured as an immediate result of the fire, millions of dollars' worth of chemicals and plant facilities were destroyed, and an unknown amount of toxic chemicals was released into the air, the ground, and the water nearby.  Anything this consequential is worth investigating because of the lessons that can be learned to avoid similar accidents in the future.

An independent agency, the U. S. Chemical Safety Board, announced last week that it was opening an investigation into the accident.  The board is recognized for its thorough and reliable mishap investigations, which can take months or even years before a well-researched report is issued.  Until such a report is available, the cause remains officially unknown, although the facts that will eventually wind up in the official report are presumably out there waiting to be uncovered.

And in the meantime, the last thing any company official is going to do is talk loosely about what they think might have happened.  This explains the relatively small amount of information that ITC released on its own during the fire, which burned off and on for nearly a week.  Lawyers flock to major accidents like—well, I was going to mention a species of bird, but we'll just let it go at that.  Already the Texas attorney general has announced that he's suing ITC for the pollution caused by the fire, and other suits will follow as night follows the day.  And the less fodder given by a company's officials to lawyers to use against them, the better, as far as the company is concerned.

So much for officialdom.  Now for the rumors and unconfirmed reports.  I did manage to find a reference in a minor Houston news outlet (the Houston Press) to the following report:  "Also Wednesday morning, the Houston Chronicle was quoting an unidentified worker as saying the fire may have been started when a tank overheated and a safety valve did not shut that down."  I was unable to locate the original Chronicle story, but (and here's my contribution to the mix) it fits in with what a friend of mine heard from his connections back in Deer Park, where he was raised and worked in the refining business for most of his career before retiring to my area.  For obvious reasons, he will remain anonymous here.

On the Friday after the fire began, he told me the following.  At a tank farm there are tanks, pipes, valves, and pumps to send the various products to nearby facilities or to transportation points such as loading locations for tank cars and tank trucks.  Some of these pumps are quite large, driven by motors rated at many horsepower and consuming many kilowatts of power.  If an order comes through to a technician to send a certain amount of product to a certain pipe, the appropriate pump is turned on first and then the valve is opened, because otherwise unforeseen back pressure or other issues might cause products to go the wrong way and get mixed up.

But it's vitally important, especially when a large, powerful pump is involved, to open the valve shortly after the pump is turned on.  If this isn't done, the energy that the pump's spinning impeller puts into the liquid has nowhere to go and turns into heat.  And the product—often a flammable one such as naphtha—can get hot enough to rise above its flash point, so that once the valve is finally opened and any air is present, the product can spontaneously catch fire.

Normally there are thermal cutout sensors that will detect when a pump's outlet overheats due to misguided operation such as this, and shut off the pump automatically.  But sensors sometimes fail.
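To illustrate what such a cutout does, here is a minimal sketch, again in Python, of the decision it has to make.  This is a hypothetical illustration, not ITC's actual control logic; real terminal automation runs on PLC and SCADA equipment, and the trip temperature here is an assumed number.

    # Hypothetical sketch of a thermal-cutout check on a transfer pump;
    # not ITC's actual control logic.  The trip point is an assumed value,
    # and a real setting would depend on the product being handled.
    OUTLET_TRIP_TEMP_C = 60.0

    def cutout_should_trip(pump_running, outlet_temp_c):
        """Return True if the pump should be shut down automatically.
        A pump left running against a closed valve (dead-headed) is the
        usual way the outlet gets this hot."""
        return pump_running and outlet_temp_c > OUTLET_TRIP_TEMP_C

    print(cutout_should_trip(True, 65.0))   # True  -- trip the pump
    print(cutout_should_trip(True, 30.0))   # False -- normal operation

If the sensor feeding that check has failed, of course, the cutout never trips, which is exactly the failure mode the rumor describes.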

What my friend heard was that someone turned on a pump in preparation for shipping some flammable product out of the plant, but due to paperwork or some other delay, the appropriate valve wasn't opened for some 17 hours.  That's plenty of time for a pump without a working thermal cutout to get its product way too hot.  And so sometime Sunday, the product caught fire, and the rest is very public history.
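For a rough sense of scale, here is a back-of-the-envelope estimate of how much a dead-headed pump could heat its product over 17 hours.  The pump power, trapped liquid mass, and specific heat are made-up but plausible numbers of my own, not figures from the incident or the investigation.

    # Back-of-the-envelope estimate only; all inputs except the 17-hour delay
    # are assumed values, not figures from the actual incident.
    pump_power_kw = 75.0        # assumed shaft power going into the trapped liquid
    hours = 17.0                # the delay mentioned in the rumor
    trapped_mass_kg = 20000.0   # assumed liquid churning in the pump and piping
    specific_heat_j_per_kg_k = 2000.0   # roughly right for light hydrocarbon liquids

    energy_j = pump_power_kw * 1000.0 * hours * 3600.0
    temp_rise_c = energy_j / (trapped_mass_kg * specific_heat_j_per_kg_k)

    print(f"Energy put into the liquid: {energy_j / 1e9:.1f} GJ")        # about 4.6 GJ
    print(f"Temperature rise, ignoring heat loss: {temp_rise_c:.0f} C")  # about 115 C

Even allowing generously for heat lost to the surroundings, a rise on that order is more than enough to carry a flammable product well past its flash point, which is the heart of the scenario my friend described.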

Distorted through a rumor mill and two news outlets, that more detailed story fits in with the unconfirmed and unattributed report that a "tank overheated" and a "safety valve" (read: thermal cutout) did not shut it down.  So strictly speaking, I'm not reporting a scoop here.  But it does sound like confirmation of another unattributed report.

Whatever the rumors say, we'll have to wait for the Chemical Safety Board to interview everyone concerned, compile the data they can obtain from SCADA (supervisory control and data acquisition) records, and examine anything else that's relevant to getting to the bottom of this accident officially.  But in the meantime, plant operators everywhere should pay extra attention to pumps, valves, and timing.

Sources:  Besides my friend, I consulted the report on the fire carried by Chemical and Engineering News at https://cen.acs.org/safety/industrial-safety/Houston-chemical-distribution-tank-farm/97/web/2019/03 and found the unattributed report of the overheated tank at https://www.houstonpress.com/news/itc-tank-fire-extinguished-but-some-school-remain-closed-11258138.