An interview with Andreas Zeller, recipient of the 2026 IEEE Harlan D. Mills Award.
Andreas Zeller is a Professor of Computer Science at Saarland University and a faculty member of the CISPA Helmholtz Center for Information Security, whose foundational work in automated debugging and software testing has redefined the standards of software reliability and security.
We connected with Dr. Zeller to discuss the evolution of automated test generation, the educational shift toward interactive textbooks, and the future of program analysis in an era of increasingly complex software systems.
Your work on Delta Debugging—automatically isolating the causes of program failures—is foundational. For a junior developer today who spends hours in a debugger, what was the "aha!" moment that led you to realize this process could be mathematically formalized and automated?
Manual processes for simplifying inputs were well known and not exactly rocket science. The key discovery was the insight that one could start by trying to remove large chunks of the input, and then gradually make these removals smaller and smaller; this resulted in a very simple, yet very effective algorithm and a pretty universal automated process. When I first presented Delta Debugging in 1999, a researcher from France approached me and said: “I would never have thought that something so simple could be presented at a scientific conference.” But it took me a year to get it that simple!
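The chunk-removal idea Zeller describes can be sketched in a few lines. This is a minimal illustration, not his full ddmin algorithm (which also works with complements and adjusts granularity more carefully); the function names and the `fails` predicate are hypothetical:

```python
def reduce_input(inp, fails):
    """Shrink `inp` while the failure predicate `fails(inp)` stays True.

    Start with large chunks, then halve the chunk size each round --
    the core idea behind Delta Debugging's input minimization.
    """
    chunk = max(len(inp) // 2, 1)
    while chunk >= 1:
        i = 0
        while i < len(inp):
            candidate = inp[:i] + inp[i + chunk:]  # try removing one chunk
            if candidate and fails(candidate):
                inp = candidate   # failure persists: keep the smaller input
            else:
                i += chunk        # failure vanished: restore and move on
        chunk //= 2               # make removals smaller and smaller
    return inp
```

For example, if a parser crashes whenever the substring `"bug"` appears, `reduce_input("xxbugyy", lambda s: "bug" in s)` reduces the input down to the minimal failure-inducing `"bug"`.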
What was the most significant challenge you faced when transitioning from a specialized researcher to a faculty member?
As a faculty member, you need to excel in many disciplines: You research, you teach, you lead, you write, you code, you present, you organize, you do politics. I transitioned from a specialist to a generalist and had to learn lots of things very quickly. But it is precisely the variety of tasks and challenges that I love in my profession, and I try to excel at as many of them as I can.
If you could distill the mindset of a "master debugger" into one core habit for someone in their first year of software engineering, what would it be?
It is a bit sad that although debugging and maintenance take up to 50% of the development effort, the essential skills are hardly taught. But good debugging does not need sophisticated tools or deep training. In a nutshell: Proceed systematically, using the scientific method: Set up a hypothesis, design an experiment, observe, and refine or refute your hypothesis according to your observations. And be explicit about your hypotheses – telling them to a rubber duck helps a lot. Better yet, write down what you thought, tried, and observed, so you can come back to your notes the next morning with a fresh mind.
How has your perspective on software testing evolved now that "bugs" are not just functional errors, but potential security vulnerabilities?
Not all bugs matter, and many systems have a long trail of bugs that may never be fixed. Vulnerabilities, however, are bugs that do matter – they are critical, they are valuable, and they can leave a lot of damage if not fixed. Fortunately, we now have a great set of techniques to detect and prevent the most common vulnerabilities – run-time checks for potential buffer overflows or accessing uninitialized memory, for instance, are extremely effective for catching bugs. Yet, we again and again have to train and remind developers to actually use these techniques; and once the shallow bugs are found and fixed, the remaining ones are harder to detect.
You’ve spent time at Microsoft Research, ETH Zürich, and the University of Washington. Has your time in such different places geographically and institutionally changed the way you approach your work?
First, I learned that the best institutions in the world all have the same high standards for excellence in research and practice, and it is lots of fun to aspire to these. Second, working with industry and industrial research is something I found eye-opening – you quickly realize what it takes to make your research results applicable at scale, and how important it is that your approach be easy to integrate into existing products and processes. Many researchers in my field seem to think that if only everyone would adopt their approach, the world would be a better place. That may be true, but then you as a researcher also have to create bridges that enable a smooth transition.
With the release of The Fuzzing Book, you’ve made complex automated testing accessible. Why do you believe fuzzing has become such a critical tool in the modern CI/CD pipeline compared to traditional unit testing?
The nice thing about fuzzing is that it is rather easy to deploy – you do not need knowledge about the program under test or its context. It suffices to have a few sample inputs, and then the fuzzer will do its job, mutating these inputs and trying to find new failures. In a way, it is a “fire-and-forget” technique – you set it up, and then it just churns along for days or weeks. However, fuzzers cannot replace traditional testing. For one, fuzzers mostly focus on bugs in input processing. And then, fuzzers do not check functionality beyond generic crashes or hangs. Yet someone or something still has to check whether the program is doing the right thing. And then, we are back at some specification or a suite of traditional unit tests – with suitable assertions, of course.
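The mutate-and-retry loop described here can be illustrated with a toy fuzzer. This is a deliberately simple sketch in the spirit of The Fuzzing Book's introductory examples, not a coverage-guided fuzzer like those used in real CI/CD pipelines; the function names and mutation choices are assumptions for illustration:

```python
import random

def mutate(s):
    """Apply one random mutation: delete, insert, or bit-flip a character."""
    if not s:
        return chr(random.randrange(32, 127))
    pos = random.randrange(len(s))
    op = random.choice(["delete", "insert", "flip"])
    if op == "delete":
        return s[:pos] + s[pos + 1:]
    if op == "insert":
        return s[:pos] + chr(random.randrange(32, 127)) + s[pos:]
    # flip one bit of the character at `pos`
    return s[:pos] + chr(ord(s[pos]) ^ (1 << random.randrange(7))) + s[pos + 1:]

def fuzz(target, seeds, trials=10_000):
    """Feed mutated seed inputs to `target`; collect inputs that raise.

    Note the 'generic crashes only' limitation: we catch exceptions,
    but we cannot tell whether a non-crashing run was actually correct.
    """
    failures = []
    for _ in range(trials):
        inp = mutate(random.choice(seeds))
        try:
            target(inp)
        except Exception:
            failures.append(inp)
    return failures
```

Running `fuzz(int, ["12345"])`, for instance, quickly turns up mutated strings that `int()` cannot parse – illustrating both how cheaply a fuzzer finds input-processing bugs and why it says nothing about whether accepted inputs were handled correctly.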
You were a pioneer in mining software repositories. For developers today who have access to massive datasets like GitHub, what is a common mistake they make when trying to draw conclusions from historical commit data?
The most important fallacy is to use commit data to rate the performance of individuals or groups. In some of the projects we analyzed, we found that among all contributors, the project lead was the one whose code had to be fixed most often later. But that was not because the lead was a bad programmer – this was because they had taken on the tasks that were most risky, tasks that required expertise that no other team member had. Another example was an investigation we did in industry on a piece of code that apparently had to be fixed every other day. Was this code so wrong? It turned out that this code was communicating with hardware products from several vendors. Each of these vendors had made a few mistakes in their implementation, but they had already produced thousands of samples for which they could not change the code anymore. So it was easier to change the “software” piece such that it would work around the mistakes in hardware. In the commit history, though, it looked as if the blame was all on the software. As a consequence, I learned to always talk to the people in charge about the context of the findings and how the numbers should actually be interpreted. You learn a lot this way.
As a co-founder of several tech start-ups, what is one piece of "academic" wisdom that you found didn't actually work when applied to the fast-paced environment of a commercial startup?
During my studies, we learned a lot of “wisdom” about rigorous software development and how soon we would be able to create fully formally verified systems which, once built, we would never have to change because they would be correct by construction. Fortunately, my co-founders and I also collected lots of advice from start-up practitioners, and we were very eager to adopt a much more agile, customer-oriented development style.
You have supervised over 20 Ph.D. theses. When you are looking for a new researcher or student to join your lab, what non-technical trait do you value most?
I usually look for general problem-solving capabilities. I ask the candidates for the most difficult problem they have encountered so far, and how they solved it. If your students find creative solutions for hard problems on their own, then as their advisor, your main job will be to help them generalize both problems and solutions into a contribution that will stand the test of time. This still sounds challenging but is much more satisfying than taking them by the hand for every little problem they encounter.
As the industry moves toward AI-assisted coding (like GitHub Copilot), do you believe the role of formal program analysis will become more or less important for the next generation of engineers?
Even with ever more capable AI agents producing code for us, there will still be humans who take responsibility for the final product. What will they base their assessment on? Someone will have to check that the code is doing the right thing – and also specify what the right thing is. You may trust your AI agent on that, but in the end, it is your head that is at stake. Automated testing and analysis tools (for which, again, other humans take responsibility) will help us rest assured that everything in our code is (still) fine, and that we can safely and responsibly release code into production.
Andreas Zeller will receive his award at ICSE this year.