Developer’s Confession: Why Philae Crashed

By Lori Cameron
Published 10/18/2018

Scientists made space exploration history on 12 November 2014 when, for the first time, a space probe landed on a comet.

The speeding target was Churyumov-Gerasimenko—a dirty cosmic snowball slightly bigger than Mount Fuji. The probe was the Philae lander—part of the European Space Agency’s Rosetta program.

“The launch occurred approximately 500 million kilometers from Earth, approximately 3 astronomical units from the sun, and 22.5 kilometers from the comet,” writes András Balázs, an embedded-hardware-and-software system engineer at the Wigner Research Centre for Physics.

However, the landing was anything but graceful. When Philae hit the comet, its anchoring mechanism failed, causing it to bounce and tumble for hours before crash landing in a crater 1.2 kilometers away from the site of first contact—plopped unceremoniously to one side with its foot sticking up in the air.

“Thanks mostly to the comet’s gravitational attraction, Philae still completed its touchdown,” says Balázs.

Staff at the European Space Operations Centre in Darmstadt, Germany, on November 12, 2014, during the landing of the Philae craft. Photograph: Handout/ESA via Getty Images

Balázs was a member of Philae’s development team and later received laurels from the International Academy of Astronautics for his teamwork.

Now he writes of the experience and some takeaways for colleagues in his article “A Comet Revisited: Lessons Learned from Philae’s Landing.”

Balázs recounts the Rosetta mission and the serious problems the Philae team encountered with the probe’s hardware and software as well as its mission operations control.

How a tiny landing probe collided with a giant comet

The Rosetta spacecraft, carrying the Philae lander (pictured below), journeyed across the solar system for 10 years to rendezvous with the comet Churyumov-Gerasimenko. Rosetta then orbited the comet, performing experiments and “mapping the comet’s shape and surface in detail never seen before,” says Balázs.

The Philae lander explored comet 67P Churyumov-Gerasimenko

After separating from Rosetta, Philae took seven hours to reach the surface of the comet. However, things went south when the anchoring mechanism, which should have tethered the lander to the comet, failed to deploy.

“The lander couldn’t attach itself to the comet, owing to unexpected, probably systematic failures in both parts of the dual-redundant anchoring subsystem and a malfunction of the non-redundant hold-down thruster,” Balázs writes.

In the image below, one of Philae’s three legs can be seen sticking up from behind a boulder after its crash landing, illustrating the difficulty Rosetta had in spotting the lander on the comet’s chaotic surface.

Philae landing
Photo credit: European Space Agency (ESA)

Below, the BBC’s The Sky at Night documentary, hosted by Chris Lintott and Maggie Aderin-Pocock, details the ESA’s mission to land the Philae module on the surface of a comet for the first time in history.

Below is a video clip of NBC News’s up-to-the-minute coverage of Philae’s comet landing in 2014.

Six lessons learned from the Philae landing

Balázs’ study of Philae marks a “thorough, honest analysis of what went wrong, what was done, and, importantly, what more could have been done,” say Software magazine editors Michiel van Genuchten and Les Hatton.

Equally important are the six lessons he gleaned for computer scientists.

“The software community could benefit from more such evaluations of the problems that so frequently occur in projects,” write van Genuchten and Hatton.

Lesson 1: No One Can Conquer Fate

Sometimes a measure added for fault tolerance introduces an error of its own.

“The irony of fate was that exactly at the final step of design, when we thought we had achieved perfect fault tolerance, we introduced a source of error that remained unnoticed for a long time. A cyclic redundancy check code was supposed to protect the received telecommands from misinterpretation. Nevertheless, the telecommand decoders sometimes (but not often) misinterpreted bit-serially transmitted cross-coupled telemetry packets (which were feedback noise caused by improper harnessing) as true emergency telecommands,” says Balázs.
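To illustrate why a cyclic redundancy check alone cannot rule out this kind of misinterpretation, here is a minimal sketch. The article does not say which CRC Philae's telecommand decoders used, so the CRC-16/CCITT-FALSE variant, the two-byte trailer, and the names `crc16_ccitt`, `frame`, and `is_valid` are all assumptions made for illustration. The sketch shows only the general point that a CRC verifies integrity, not origin.

```python
def crc16_ccitt(data: bytes, init: int = 0xFFFF) -> int:
    """Bitwise CRC-16/CCITT-FALSE (polynomial 0x1021). Assumed variant, not Philae's."""
    crc = init
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def frame(payload: bytes) -> bytes:
    """Append the CRC as a two-byte big-endian trailer (hypothetical frame layout)."""
    return payload + crc16_ccitt(payload).to_bytes(2, "big")


def is_valid(frame_bytes: bytes) -> bool:
    """Accept a frame if its trailer matches the CRC of its payload."""
    if len(frame_bytes) < 3:
        return False
    payload = frame_bytes[:-2]
    received = int.from_bytes(frame_bytes[-2:], "big")
    return crc16_ccitt(payload) == received
```

A frame with a flipped bit fails `is_valid`, which is the protection the designers wanted. However, any bit pattern whose trailer happens to match its payload passes, regardless of where it came from, so a decoder that relies on the CRC alone needs some other way to tell a genuine telecommand from stray data on the same line.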

Lesson 2: Being Prepared for the ‘Unbelievable’

In a narrow temperature range, one processor unit of the central onboard computer (CDMS) stayed unpowered after being switched on.

“During the cruise phase, we faced an astonishing incident. In a very narrow temperature range around the CDMS’s thermal equilibrium (–27 degrees C), one of the dual-redundant processor units remained unpowered after being turned on,” Balázs writes.

Fortunately, the other processor unit took over until a hardware-decoded emergency telecommand brought the first one back to life.

Lesson 3: When Even Redundancy Is Useless

Several unexpected events happened that forced the engineers to rework the lander’s anchoring strategy: an event-detection accelerometer became unusable, the touchdown sensor status cleared mysteriously after each evaluation attempt (not allowing subsequent evaluations to exclude any transient errors), and doubts arose over whether the anchoring algorithm would work.

Because they couldn’t access the lander’s hardware, they decided to revamp the software and uplink the new version.

“By implementing an ‘Or-Majority’ (Touchdown = A or MajorityOf(B, C, D)) voting scheme for four (A, B, C, D) touchdown event sources or paths, we made the detection of the touchdown event more robust against transmission errors and false alerts. We together with the anchor team also completely reworked the anchoring-control software algorithm, tested it on the ground, and uplinked it to the CDMS,” says Balázs.

“Even so, and even though Philae properly detected the touchdown event, the anchoring failed, which severely affected the rest of the mission,” he adds.
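The “Or-Majority” rule quoted above can be written as a short sketch. The function names `majority_of` and `touchdown` are hypothetical, and the article does not say which physical sensor or signal path corresponds to A, B, C, or D, so the inputs here are plain booleans.

```python
def majority_of(*votes: bool) -> bool:
    """True when strictly more than half of the votes are True."""
    return sum(votes) * 2 > len(votes)


def touchdown(a: bool, b: bool, c: bool, d: bool) -> bool:
    """Or-Majority vote: Touchdown = A or MajorityOf(B, C, D)."""
    return a or majority_of(b, c, d)
```

Under this rule, source A alone is enough to declare touchdown, while a single spurious signal from B, C, or D is not: `touchdown(False, True, False, False)` is `False`, but `touchdown(False, True, True, False)` is `True`.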

Lesson 4: Conflicts Can Exist between Safety and Science

To complete the scientific objective of the Rosetta team, Philae had to land safely on the comet, which would require a closer launch to preserve its battery supply. But to keep the Rosetta spacecraft safe, Philae had to be launched from farther away.

“To not violate Rosetta’s safety margins, the spacecraft ejected Philae relatively far from the comet, which resulted in a prolonged descent. The candidate scientific experiments for the descent required more energy than the secondary battery could provide. So, science won out over safety in terms of a redundant battery supply,” says Balázs.

Lesson 5: Being Unprepared for the Conceivable

The team experienced so many problems with the anchoring system during the cruise phase that they knew Philae would very likely end up tilted to one side instead of standing firmly on its feet.

But they did very little about it.

“No one appreciated that such a scenario was realistic but didn’t necessarily have immediate fatal consequences. The entire Philae team suffered from a sort of groupthink and didn’t take measures in advance to prepare both the operational ground segment and (in particular) Philae’s onboard system,” Balázs writes.

Lesson 6: The Revenge of Missed Opportunities

Because the Philae team had so many mishaps to deal with, they lost sight of the primary goal—to gather scientific data.

“After not uplinking in advance the set of telecommands, the team showed again that it hadn’t realized the urgent need to establish favorable conditions—including flown orbits of Rosetta—for commanding Philae to collect and downlink scientific data, instead of focusing on investigating how and why the telecommunication units were degrading. It was of the utmost importance to get a second set of science data for drawing conclusions on comet evolution,” Balázs writes.

Perhaps the mishaps couldn’t have been avoided. Still, the lessons learned will inform future missions.

“The Philae team exploited the onboard autonomy and flexibility in many respects, but not to the extent that would have been—in retrospect—purposeful. In the end, we might have been able to compensate for the relatively slow hardware degradation, at least to partly save the long-term-science phase. In hindsight, this proved a bridge too far for all of us,” says Balázs.

 

About Lori Cameron

Lori Cameron is a Senior Writer for the IEEE Computer Society and currently writes regular features for the CS website and publications. Contact her at l.cameron@computer.org.