Beyond Boundaries: A Conversation with Premkumar Devanbu, Harlan D. Mills Award Recipient

IEEE Computer Society Team
Published 02/20/2024

With a rich academic background spanning a degree in Electrical Engineering from IIT Madras and a Ph.D. in Computer Science from Rutgers University, Devanbu launched into a career that made him a pioneer in the statistical modeling of source code. After decades of work at Bell Labs, he now serves as a Distinguished Research Professor at UC Davis. Throughout his career, he has been driven to improve the productivity, reliability, and quality of practical software systems. His contributions range from leading GENOA, a groundbreaking analysis tool for code, to research on the use of Merkle hash trees for secure data outsourcing. Additionally, his paper “Naturalness of Software,” dating back to ICSE 2012, showed how language models can effectively model source code. From pioneering insights into the repetitive structure of code to the development of statistical methods, Devanbu exemplifies the impact of transformative research.

In honor of his many achievements, he has received the IEEE Computer Society’s 2024 Harlan D. Mills Award for “…impactful contributions to the statistical modeling of source code and development practices, to improve software tools and processes.”

 

You have published several award-winning papers, congratulations! Tell us about the paper you are most proud of, and how it influenced the industry. How did it feel to receive the recognition that it did?


The paper I am most proud of is our paper on the “Naturalness of Software,” from ICSE 2012. The work on this paper began with a series of debates between me and my UC Davis colleague, Zhendong Su, over his earlier (FSE 2010) paper on the “Uniqueness of Source Code.” In that paper, he and his student Mark Gabel presented compelling evidence that source code is surprisingly repetitive: most short-to-medium-length token sequences of source code are non-unique (viz., they are repeated elsewhere in a large corpus); only very long token sequences are unique. These results made Zhendong, myself, and our post-docs Abram Hindle and Earl Barr wonder, sometime around the summer of 2011, what we could do with this finding. Could we exploit it to help programmers?

We organized a reading group on Natural Language Processing (NLP), and quickly realized that the statistical methods that had revolutionized NLP could be fruitfully applied to generative tasks in programming practice. Our basic insight was this: code isn’t just repetitive; it’s repetitive in a way that can be effectively captured by the statistical language models developed in the field of NLP! Abram trained some n-gram language models (state-of-the-art at the time) on a code corpus and showed that these models could effectively capture the repetitive structure of code, and fruitfully complement the code-completion tools built into a then-current IDE (Eclipse). This exciting result led us to formulate a vision of how generative statistical methods from NLP could create a revolution in programming tools. An initial paper was published at ICSE 2012; we gratefully received a large grant from the U.S. National Science Foundation and then published a longer vision paper in CACM in 2016. Concomitantly, the field of NLP was entirely transformed, first by the advent of sequence-to-sequence deep learning models; Transformers appeared later, and things just exploded in terms of generative statistical models for code. We saw the emergence of TabNine, and then Copilot; more recently, Google announced DIDACT. It’s fair to say that language models have utterly transformed coding practice.
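To make the core idea concrete, here is a minimal sketch (illustrative only, not the tooling from the ICSE 2012 paper) of how a token-level n-gram model exploits the repetitiveness of code: count trigrams over a tokenized corpus, then rank candidate next tokens for a given two-token context, which is essentially what a statistical code-completion engine does.

    # A toy, self-contained sketch: count trigrams over a tokenized code corpus,
    # then rank candidate next tokens for a two-token context by frequency.
    from collections import Counter, defaultdict

    def train_trigram_model(token_lists):
        """Count how often each token follows each two-token context."""
        model = defaultdict(Counter)
        for tokens in token_lists:
            for a, b, c in zip(tokens, tokens[1:], tokens[2:]):
                model[(a, b)][c] += 1
        return model

    def suggest(model, context, k=3):
        """Return the k most frequent next tokens for a (token, token) context."""
        return [tok for tok, _ in model[context].most_common(k)]

    # Toy "corpus": tokenized lines of code. A real setup would run a lexer
    # over millions of lines drawn from a large project or from GitHub.
    corpus = [
        "for i in range ( n ) :".split(),
        "for j in range ( len ( xs ) ) :".split(),
        "for i in range ( 10 ) :".split(),
    ]

    model = train_trigram_model(corpus)
    print(suggest(model, ("in", "range")))   # -> ['('] : repetition makes code predictable

N-gram models like this were the state of the art at the time; today’s Transformer-based models capture far longer-range regularities, but the underlying predictability of code that they exploit is the same.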

It’s been incredibly gratifying and humbling to observe the creative forces that have been unleashed. We feel very fortunate to have been so early on this. It’s also noteworthy that, during the fall of 2011, we actually contemplated patenting the idea of language models for code completion and code generation, and even discussed doing so with the IP folks at UC Davis. We didn’t; we concluded at the time that it would be better to let the idea develop organically, without any monopolistic IP protection. I’m convinced that was the right decision.




 

Speaking of awards, not only have these papers been recognized, they have also withstood the test of time. Can you share insights into the key factors that contributed to your work’s lasting impact? Where do you hope the technology will head?


Ultimately, in my view, success in research is about having great students and colleagues to interact with, and a whole lot of pure garden-variety luck. I think I’ve had more than my fair share of all of that. But, if you really want my personal opinion on this (for what it’s worth):

Good research is all about asking the right questions. The answers are secondary. If you ask the right questions, you’ll move the field forward. So even if there are no clear, decisive answers to start with, if one has a novel, interesting, and important question, one should keep pushing! When I think back on our work at Davis over the years, our best, most interesting work began with someone asking an interesting question. How can we exploit the repetitiveness of software? What if open-source software data used in observational studies are biased? Do open-source software teams organize themselves somehow? Do large language models repeat mistakes made by humans?

Secondly, software engineering is an applied discipline, where value to practitioners is just as important as theoretical elegance. Given any experimental or theoretical finding, one should also consider if/how it helps produce software faster-better-cheaper. I hasten to add that this is not another diatribe about “effect sizes”, which sadly have become yet another Ritualistic Rejection Formula in our conferences. It doesn’t matter if your innovation has only a small effect—if it is interesting and adds practical value (or has the potential to do so), it’s worth working on.

Your early work on GENOA sparked your passion for improving the productivity, reliability, and quality of software systems. Can you elaborate on how your experience with GENOA influenced your subsequent research directions?


GENOA (ICSE 1992) was essentially a nice little query language for ASTs, together with a query engine that could be retargeted to any existing parser that built an AST. When I built it, I was an AI Ph.D. student working in an AI research department at Bell Labs, which was focused on formal Knowledge Representation (KR). My boss at the time, Ron Brachman (one of the pioneers of formal KR), tasked me with exploring how to use KR for software engineering. GENOA was essentially designed to extract facts from source code and populate knowledge bases. As it turned out, software teams really didn’t want knowledge bases about their code; what they really wanted were coding-standards checkers to check their code. These, luckily, you could build quite easily with GENOA; furthermore, since GENOA was built around existing compilation tools, the resulting checkers were easy to incorporate into complex, pre-existing build systems, which was a huge win. It was deeply satisfying to see GENOA finding use in large software projects. My management let me move away from KR into building software tools. By this time, I’d met several colleagues at software engineering meetings (like Dewayne Perry, Laurie Dillon, Walter Tichy, Carlo Ghezzi, David Rosenblum, Alex Wolf, Elaine Weyuker, and Pamela Zave), and was very impressed with their scholarship, collegiality, and insights, and so I switched fields from formal AI to software engineering. In retrospect, it was a great decision!
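As a rough illustration of the kind of checker GENOA made easy to build (this sketch uses Python’s standard ast module, not GENOA itself, and a hypothetical coding standard), one can walk a parsed AST and flag violations, here bare except clauses:

    # Illustrative only: an AST-walking coding-standards checker in the spirit of
    # GENOA-built tools, implemented here with Python's standard ast module.
    import ast
    import textwrap

    SOURCE = textwrap.dedent("""
        def read_config(path):
            try:
                return open(path).read()
            except:              # violates the coding standard: bare 'except'
                return None
    """)

    class BareExceptChecker(ast.NodeVisitor):
        """Collect line numbers of 'except:' handlers with no exception type."""
        def __init__(self):
            self.violations = []

        def visit_ExceptHandler(self, node):
            if node.type is None:          # bare 'except:' clause
                self.violations.append(node.lineno)
            self.generic_visit(node)

    checker = BareExceptChecker()
    checker.visit(ast.parse(SOURCE))
    for line in checker.violations:
        print(f"line {line}: bare 'except:' violates the coding standard")

Because such checks hook into the same parse that the compiler toolchain already performs, they slot easily into existing build systems, which, as noted above, was a big part of GENOA’s practical appeal.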

 

Your recent work focuses on helping human developers make better use of language model outputs. Could you discuss the challenges and goals in this area, and how your research aims to enhance the safety and effectiveness of utilizing this technology?


Language models make mistakes. They can produce buggy code. One of our studies (done by Kevin Jesse, MSR 2022) evaluated language models over a corpus of “simple stupid bugs,” which consisted of single-line mistakes that human developers had made (and later fixed). We found that language models tended to reproduce human errors at a surprisingly high rate, despite being trained after developers had fixed these bugs. This led us to the question: when should programmers trust the code produced by language models, and when should they not? Can LLMs themselves give humans an indication of when their generated outputs can be trusted? This relates to the broader problem of joint human-AI decision making, and more specifically to the issue of whether LLMs are well calibrated: when they are more confident in their generated output, are they also more likely to be correct? While this issue has been explored in non-generative settings in software engineering (e.g., classification), there are complications in generative settings for code, since we have to carefully define what we mean by correctness.
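As a rough sketch of what “well calibrated” means here (illustrative only, not the actual study setup): bucket a model’s per-completion confidence scores, compare average confidence with observed correctness in each bucket, and summarize the gap as an expected calibration error.

    # Illustrative calibration check: does higher model confidence actually
    # correspond to a higher chance that the generated code is correct?
    import numpy as np

    def reliability_table(confidences, correct, n_bins=5):
        """Per confidence bin: (average confidence, observed accuracy, count)."""
        confidences = np.asarray(confidences, dtype=float)
        correct = np.asarray(correct, dtype=float)
        edges = np.linspace(0.0, 1.0, n_bins + 1)
        rows = []
        for i in range(n_bins):
            lo, hi = edges[i], edges[i + 1]
            mask = (confidences >= lo) & ((confidences < hi) | (i == n_bins - 1))
            if mask.any():
                rows.append((confidences[mask].mean(), correct[mask].mean(), int(mask.sum())))
        return rows

    def expected_calibration_error(confidences, correct, n_bins=5):
        """Count-weighted average gap between confidence and accuracy."""
        n = len(confidences)
        return sum(c / n * abs(conf - acc)
                   for conf, acc, c in reliability_table(confidences, correct, n_bins))

    # Hypothetical data: per-completion confidence (e.g., average token probability)
    # and whether that completion was judged correct (e.g., passed its tests).
    conf = [0.95, 0.90, 0.80, 0.75, 0.60, 0.55, 0.40, 0.30]
    ok   = [1,    1,    1,    0,    1,    0,    0,    0]
    print(round(expected_calibration_error(conf, ok), 3))

A well-calibrated model would show small gaps in every bin; large gaps suggest the raw confidence score should not be shown to developers as-is.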

While this is a difficult challenge, it’s an extremely worthwhile one: generative LLMs are going to be really important in software development, but they can and do make mistakes; so we have to find ways to help humans use their outputs effectively.

 

Your realization in 2017 that code is bimodal, allowing both algorithmic static analysis and statistical modeling, has led to innovative advances. How have bimodal approaches influenced or changed training, pre-training, and prompt engineering, particularly in applications like syntax error correction, code summarization, repair, and completion?


If programming languages are languages like any other, then utterances in a PL are programs. But unlike natural-language utterances, humans writing programs are aware of two entirely different audiences: the human programmers who maintain programs, and the computers which run them. These two audiences have very different requirements. For a computer, a given program has the same formally defined semantics, the first time and every time; the formal semantics also enables various kinds of algorithms, such as automated meaning-preserving transforms, that can be exploited by compilers, obfuscators, etc. A human reading a program, on the other hand, could be impatient, distracted, confused, or lacking in background knowledge; they might interpret the program differently at different times. This second, human-to-human channel induces developers to write code in a “natural” way that minimizes the chances of misunderstanding; developers prefer clear, well-contextualized variable names, standard forms (e.g., i = i + 1 rather than i = 1 + i), and standardized indentation and commenting conventions. These practices have little or no effect on the computer’s understanding of the code, but they help humans and, as a side effect, make the program text much more predictable and amenable to machine learning. We have done corpus and human-subject studies (with Casey Casalnuovo, Kenji Sagae, and Emily Morgan) showing evidence for this.

Bimodality refers to this dual “formal” and “natural” nature of code. The duality allows us to explore synergies: using formal manipulations to support machine learning, for example, and using “naturalness” to support better algorithms. For instance, we can rewrite “natural” code from GitHub into semantically equivalent but odd forms, and train a model to rewrite it back to the natural form. By doing this, we are teaching models both to understand code and to write it back in a form that humans prefer. We showed that models trained this way do better at several standard tasks.
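A minimal sketch of that training-data idea (illustrative only, not the actual pipeline, and using a single toy rewrite rule): apply a semantics-preserving transform to natural code and keep the original as the target the model must learn to recover.

    import re

    # Toy semantics-preserving rewrite: flip `x = x + y` into the unusual `x = y + x`.
    INCREMENT = re.compile(r"\b(\w+)\s*=\s*\1\s*\+\s*(\w+)")

    def denaturalize(line):
        """Rewrite `i = i + 1` into the semantically equal but odd `i = 1 + i`."""
        return INCREMENT.sub(lambda m: f"{m.group(1)} = {m.group(2)} + {m.group(1)}", line)

    def make_training_pairs(natural_lines):
        """Pair each rewritten ('unnatural') line with its original, natural form."""
        return [(denaturalize(line), line) for line in natural_lines
                if denaturalize(line) != line]

    corpus = ["i = i + 1", "total = total + price", "x = y + z"]
    for unnatural, natural in make_training_pairs(corpus):
        print(f"input: {unnatural!r:<26} target: {natural!r}")

The actual work used richer, compiler-checked transforms over real GitHub code; the point here is just the shape of the “unnatural to natural” training pair.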

 

Considering your successful career trajectory and diverse contributions, what advice would you give to early career professionals aiming to follow a similar career path?


Most importantly, I’ve been tremendously lucky with finding great students and colleagues. Besides the all-important aspects of good colleagues and luck, a few habits have helped me:

      • I try very hard to meet people from other fields, and talk to them: certainly fields close to software engineering, like PL and Formal Methods, but also ones where the connection is more tenuous: Statistical Physics, Bioinformatics, Natural Language Processing, Psycho-linguistics, Sociology, Science & Technology Studies, and Political Science. I’ve met fascinating people in all these fields. SE is an applied discipline. We. Got. Problems. The solutions could be anywhere, so why not talk to any/all interesting people who are willing to talk?

      • At conferences, I try to talk to as many students as want to talk to me. First of all, by talking to young people, you get exposed to a lot of new ways of thinking. Second, you meet (very early) the leaders of the future. Finally, students help you recruit other students (from their institutions) in the near future.

      • I try to absorb anything, any kind of scientific content, especially in an experimental/empirical vein, that seems interesting and that I can understand: psychology, economics, biology, medicine, sociology, and physics (I’m not big on pure maths). As I said: We. SE people. Got. Problems. Who knows where the solutions may lurk?

Throughout your career, what connections have been the most meaningful among the computing community, and how did you meet them? How did these connections play a role in your overall career?


I would say the most important, enduring, and fruitful connections have been with colleagues from slightly removed areas—who had access to different problems, solutions, and perspectives than I did. I’d like to mention a colleague in Security, Stuart Stubblebine, with whom I had a multi-year collaboration relating to data integrity. Another lasting collaboration was with Michael Gertz (a database researcher) and Chip Martel (an Algorithms person) with whom I teamed up to work on secure data structures that provided guarantees of correct query processing, even with untrusted compute servers. More recently, I had a wonderful decade-long collaboration with Vladimir Filkov, who was originally a bioinformatician, which led to a series of test-of-time award-winning papers in the area of open-source software data analytics. Finally, my collaboration with Zhendong Su, who has deep roots in Programming Languages research, led to the work on language modeling for code.

More About Premkumar Devanbu


Prem Devanbu holds a degree in EE from IIT Madras and a Ph.D. in CS from Rutgers University. After some decades at Bell Labs, he joined UC Davis, where he is now a Distinguished Research Professor of Computer Science. His early work on GENOA (a general-purpose analysis tool for code) led to an enduring passion for finding ways to improve the productivity, reliability, and quality of practical software systems. Between 2000 and 2005, he worked on the use of Merkle hash trees (aka “blockchains”) for secure data outsourcing. Starting around 2005, his work shifted to studying the copious amounts of time-series data available in open-source repositories: how can this data be used to help improve software tools and processes? Devanbu has published several test-of-time award-winning papers studying various aspects of the data science of software-related data, including developer social networks, data quality, and modeling challenges.

In 2012, Hindle, Barr, Su, and Devanbu published their “Naturalness of Software” work, introducing the notion that language models can effectively model source code. Subsequent work showed that discrete language models, customized for source code, worked for several tasks, including both code completion and code de-obfuscation. This work included a nested cache model, which could beat contemporaneous auto-regressive DNN models. Around 2017, working with Earl Barr and others, Devanbu realized that code is bimodal, allowing both algorithmic static analysis and statistical modeling. This has led to a line of work on bimodal approaches to training, pre-training, and prompt engineering, for applications such as syntax error correction, code summarization, code repair, and code completion. Most recently, Devanbu’s work has been directed at helping human developers make better, safer use of output from language models. Devanbu has won the ACM SIGSOFT Outstanding Research Award and the Alexander von Humboldt Research Prize. He is an ACM Fellow.