Leaving a Mark: A Conversation with Ming Li, W. Wallace McDowell Award Recipient

IEEE Computer Society Team
Published 03/21/2024

Ming Li, a University Professor at the University of Waterloo, has made major contributions to modern information theory and bioinformatics, leaving an impactful mark. Not only is he an IEEE Fellow, he has also been elected a Fellow of the Royal Society of Canada (RSC), the Association for Computing Machinery (ACM), and the International Society for Computational Biology (ISCB). Dr. Li has been at the forefront of innovation, collaborating with colleagues to systematically develop Kolmogorov complexity and its applications, forming one of the fundamental pillars of computer science. With Paul Vitanyi, he co-authored the book ‘An Introduction to Kolmogorov Complexity and Its Applications,’ which shaped a general understanding of the field and provided a foundation for researchers in areas including deep learning and large language models.

Dr. Li is a pioneer in bioinformatics and has changed the process of homology search. His series of cross-field publications in PNAS (2017), Nature Methods (2019), Nature Machine Intelligence (2020, 2021, and 2023), and Nature Communications (2022 and 2023) has solved challenges in neoantigen discovery and validation.

In honor of his many achievements, he has received the 2024 W. Wallace McDowell Award for “…pioneering and enduring contributions to modern information theory and bioinformatics.”

 

Creating new connections and collaborating are both key in the fast-paced world of technology. As a long-standing member and fellow of IEEE, can you share a meaningful connection that you’ve made while being a part of this community?


The most meaningful connections, other than those with my own students, are with Paul Vitanyi, Tao Jiang, and John Tromp. With Paul, through a lifelong collaboration, we wrote our book “An Introduction to Kolmogorov Complexity and Its Applications.” We spent 10 years writing the book and many more years revising it for newer editions (a total of four editions).




The extension of Kolmogorov complexity is ground-breaking. Could you share a specific application of Kolmogorov complexity that you find particularly intriguing or impactful, and how it has contributed to the field of computer science?


Human and animal cognition and k-shot learning may be formulated by information distance (conditioned on a pre-trained model), which boils down to compression. If we only depend on compression to learn, why do humans have better learning abilities? Where does our consciousness come from? One possible explanation is our ability to label data. Language, for example, may be seen as a tool for labeling data. The better a species can label the data, the better the species can learn. Does consciousness also come from proper labels of me, others, life, time, families, countries, and ideologies?
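To make the compression-based view concrete, here is a minimal sketch of the normalized compression distance (NCD), a practical approximation of information distance computed with an off-the-shelf compressor. The use of zlib and the toy strings below are illustrative assumptions for this article, not Dr. Li's own pipeline; in practice, a stronger compressor (or a pre-trained model used as a conditional compressor, as suggested above) gives a better approximation.

```python
import zlib

def compressed_size(data: bytes) -> int:
    """Length of the zlib-compressed data, a rough stand-in for Kolmogorov complexity."""
    return len(zlib.compress(data, 9))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: an approximation of normalized information distance."""
    cx, cy, cxy = compressed_size(x), compressed_size(y), compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

if __name__ == "__main__":
    # Toy example: the two similar sentences compress well together,
    # so their NCD is smaller than the distance to the unrelated string.
    s1 = b"the cat sat on the mat"
    s2 = b"the cat sat on the rug"
    s3 = b"quarterly revenue grew by seven percent"
    print(ncd(s1, s2), ncd(s1, s3))
```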

 

You have made several contributions to the fields of modern information theory and bioinformatics. Can you share another breakthrough discovery of yours that you’re most proud of?


One always remembers one of the first papers one wrote. In the early 1980s, the community was interested in the power of different types of Turing machines. It was known that one Turing machine tape can simulate two or more tapes in quadratic time; this was proved by my supervisor, Juris Hartmanis, and his colleagues in 1965. Then in 1985, Wolfgang Maass, Paul Vitanyi, and I proved tight lower bounds showing that such a simulation cannot be improved. But what about one tape simulating two pushdown stores? To my surprise, I obtained an n^{1.5} upper bound for nondeterministic Turing machines. This result also implied solutions to several other open questions. It was my first paper published at the IEEE FOCS conference.

Your book with Paul Vitanyi has left a lasting impact. What motivated you to co-author the book, and how has it influenced the next generation of researchers and practitioners in computer science?


In 1986 I met Paul Vitanyi at the STOC conference at Berkeley. We thought that, as far as computer science is concerned, a sequence has two aspects: 1) the amount of information it contains; and 2) the resources it consumes to process it. These form two theoretical pillars of computer science.

The second aspect was formulated as computational complexity by my supervisor, Juris Hartmanis, and his generation of computer scientists. The first aspect is well captured by Kolmogorov complexity, which at the time was still in its infancy. While touring the Napa Valley, Paul and I decided that we would make this a systematic science by writing a book about it: unifying its fundamental concepts, extending the theory, and especially showing how useful such a theory is by presenting a plethora of applications. Thirty-eight years have passed. We are happy to see that the theory is widely adopted in computer science, AI, philosophy, psychology, physics, mathematics, and biology, and is frequently discussed in places like Reddit and in popular magazines like Scientific American and New Scientist.

The incompressibility method is intriguing. Could you provide an example of a long-standing problem that you were able to solve using this method and any challenges you overcame throughout the process?


The idea of the incompressibility method is as follows: to analyze the average-case time complexity of an algorithm, one would have to analyze the time complexity for each input of a given length and then take the average. Doing this directly is usually not feasible. It would be better to analyze the algorithm on a single typical input whose time complexity matches that of the majority of inputs. But how do we find such an input? It turns out that such an input exists, but we cannot construct it. Most inputs have this property, though we cannot put our hands on any particular one. These inputs are exactly Kolmogorov random inputs, or inputs constructed from Kolmogorov random strings.

Let me give an example, informally. Shellsort was proposed by D. L. Shell in 1959, and its average-case analysis remained a challenging open problem. Consider a p-pass Shellsort algorithm. The input is a permutation of 1..n. There are n! ways of permuting them, hence we can fix a Kolmogorov random permutation (input) with Kolmogorov complexity at least log n!. Let m(i,k) be the number of steps the k-th element moves in the i-th pass. Then the time complexity T(n) is the sum of all the m(i,k)’s. But from these m(i,k)’s we can reconstruct the original input permutation, so the description of these m(i,k)’s must be at least log n! bits. From this we obtain the average-case lower bound T(n) = Ω(p·n^{1+1/p}). When p=1, this is BubbleSort, with its n^2 average-case lower bound.
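A hedged sketch of the counting step, written in standard notation and simplified relative to the published proof (constants, conditioning, and the self-delimiting encoding overhead are swept into the lower-order terms):

```latex
% Fix a permutation \pi of 1..n with K(\pi) \ge \log n! (such a \pi exists by counting).
% The displacements m(i,k) determine \pi, so they form a description of it:
\[
\sum_{i=1}^{p}\sum_{k=1}^{n} \log\bigl(m(i,k)+1\bigr) + O(\log n)
\;\ge\; K(\pi) \;\ge\; \log n! \;=\; n\log n - O(n).
\]
% By concavity of \log, the left-hand side is maximized when all pn displacements are
% equal, i.e. each m(i,k) \approx T(n)/(pn), which gives
\[
pn\,\log\frac{T(n)}{pn} \;\gtrsim\; n\log n
\quad\Longrightarrow\quad
T(n) \;=\; \Omega\!\bigl(p\,n^{1+1/p}\bigr).
\]
```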

More About Ming Li


As a University Professor at the University of Waterloo, Ming Li has made pioneering and enduring contributions to modern information theory and bioinformatics. Li won the Killam Prize and was elected a fellow of the RSC, ACM, IEEE, and ISCB.

Together with his colleagues, especially Bennett, Gacs, Vitanyi, and Jiang, Li has systematically developed Kolmogorov complexity and its applications, one of the fundamental pillars of computer science. He extended Kolmogorov complexity from one sequence to two (STOC’93), measuring not just the information within one sequence but also the information between two sequences. He has also developed the incompressibility method to solve several long-standing open problems in the average-case analysis of algorithms. His book with Paul Vitanyi shaped this new field and is regarded by many readers on Amazon.com as a classic, rated 5 stars. In today’s information society, his book has helped educate a generation of researchers and practitioners about what information is, and has provided a foundation, or spiritual guide, for many research fields including deep learning and large language models.
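For readers unfamiliar with that two-sequence extension, the standard formulation from the information distance line of work (the notation below is the conventional one, not taken from the interview itself) is:

```latex
\[
E(x,y) \;=\; \max\{K(x\mid y),\, K(y\mid x)\},
\qquad
\mathrm{NID}(x,y) \;=\; \frac{\max\{K(x\mid y),\, K(y\mid x)\}}{\max\{K(x),\, K(y)\}},
\]
% where K(\cdot) is Kolmogorov complexity; E is a universal metric up to additive
% logarithmic terms, and NID normalizes it to the interval [0,1].
```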

Li is a pioneer in bioinformatics. His work (with Blum, Jiang, Tromp, and Yannakakis) on the shortest common supersequence provided a fundamental background for shotgun sequencing and is described in detail in well-known computational biology books. The optimized spaced seeds developed by Li’s team (with Ma and Tromp) have changed the way we do homology search. In 2017, Nature Biotechnology raised the challenge of neoantigen discovery and validation using mass spectrometry. Li solved the problem with a series of cross-field publications in PNAS (2017), Nature Methods (2019), Nature Machine Intelligence (2020, 2021, and 2023), and Nature Communications (2022 and 2023). His pipeline is serving the community, taking a tumor sample all the way to immunogenic neoantigens. His PEAKS Online (Nature Communications, 2022) serves over 4,000 pharmaceutical and institutional users and has been used as a tool in over 5,000 papers.