Monday, July 29, 2019

What Price Safety? The 737 Max 8 Saga Continues

In March and April, I blogged on the tragic and costly software problems plaguing Boeing's 737 MAX 8 jetliner.  Briefly, after two crashes in Indonesia and Ethiopia in which a total of 346 people died, evidence pointed to a flaw in the plane's MCAS flight-control software, and the U.S. Federal Aviation Administration (FAA) grounded the plane in March, after numerous other nations had already done the same.  In May, Boeing claimed that it had fixed the software problem, and since then Boeing and the FAA have been running extensive tests to verify that the problem has in fact been solved.  On June 3, Boeing CEO Dennis Muilenburg said that he expected the FAA to declare the plane flightworthy by the end of the year, but declined to give a specific timeline. 

In the meantime, all 387 existing MAX 8s are sitting on the ground instead of flying and generating revenue for the airlines that own them.  This has caused big headaches for both American Airlines and Southwest.  Southwest recently announced that it is terminating service to New Jersey's Newark Airport simply because it doesn't have enough planes, owing to the MAX 8 groundings.  And American's losses, largely due to the groundings, are running in the range of $400 million.

Most of the time, when software fails to do what it should, the consequences are fairly minor.  If one feature of some software on your laptop acts up, maybe you lose some work, or you get so turned off by the problem that you swear never to buy that software again.  But you remain healthy and nobody dies.

Then there's the whole issue of software security, and making sure malevolent attacks don't disable or otherwise inconvenience users.  Software companies are used to dealing with such things by now, and generally stay up to date with patches that prevent hackers from doing major damage, as long as the users install the patches.

These kinds of environments are what most software developers are used to working in.  The bigger the organization and the more critical the software, the more bureaucracy is involved, but that's not necessarily a bad thing.  I spoke with a software engineer many years ago who worked for a regional telecommunications company.  She told me that she'd spent most of the previous year changing exactly one line of code.  The reason it took so long was that a number of other engineers had to take that change, try it out in all sorts of other situations, and find out what its ramifications were and whether it would cause problems down the road. 

Telecom companies are rather shielded from competition, and so taking a year to change one line of code may be fairly routine, I don't know.  So maybe we shouldn't be that surprised if it now takes six more months for the FAA to make sure that the changes Boeing has made in its 737 MAX 8s will really make things better rather than worse. 

Thing is, the phone company didn't have to shut down and wait for my software engineer friend to finish her job.  But when software is intimately tied in with a multimillion-dollar piece of hardware that you can't use just a little of, and the software makes the whole thing unusable, it creates a spectacle that we haven't seen since the week or so after 9/11/2001, when all domestic U.S. flights were grounded.  And that period, plus the general fear of flying it engendered, hit the airlines with an economic punch that took them years to recover from.

Fortunately, the MAX 8 problem doesn't appear to have frightened people away from flying in general.  Because of the scarcity of seats, the airlines have been able to charge more, and so revenues at American and Southwest are actually up, despite the shortage of planes.  Nevertheless, Boeing has set aside nearly $5 billion in case it ends up having to pay its customers for loss of revenue, and lots of airlines around the world are going to think very hard before they place any more orders with Boeing.

Unlike mechanical failures, software failures are not simply a function of physics.  Software is so dynamic and so dependent on the exact conditions and history of its environment that it is virtually impossible to "prove" it won't fail under any circumstances, except in rare and rather academic cases.  Someday, I hope the whole history of this fiasco will come out, as it will be a fascinating study in how software engineering ethics failed in this instance, and it will hold lessons for how safety-critical software should not be written. 

The problem with such a story may be that it would be too hard for anybody except specialized software engineers to understand.  But then again, it may boil down to management problems, as so many ethical issues do.  Already there has been speculation that the FAA allowed Boeing to conduct too many of its own safety tests, basically just taking Boeing's word for it that everything was okay.  Only when we have enough details about how the problems happened and how they were fixed can we judge whether the FAA has been lax or negligent in this area.

In the meantime, software engineers everywhere except at Boeing can be glad that their work is not going under the microscope of an FAA inspection.  But there are plenty of other types of software that are life-critical:  for example, software for medical devices, automotive software, even the software that lets first responders communicate with each other.  A failure in any of these products can have life-threatening implications. 

So maybe the lesson here for software engineers is:  program as though your life depended on it.  If more programmers had that attitude, we'd all have much better software.  Maybe not so much of it, but that might not be a bad thing either.

Sources:  The report describing CEO Muilenburg's comments appeared on the CNBC website on June 3, 2019.  Reuters reported on Southwest leaving Newark.  I also referred to the Wikipedia articles "Boeing 737 MAX" and "Boeing 737 MAX groundings."
