Pages: pp. 4-6
A lovely volume graced my desk recently, at least as gracefully as a hardbound, 1,100-page book can. The Practical Handbook of Internet Computing,¹ edited by my predecessor as Internet Computing's editor in chief, Munindar Singh, is a collection of 57 chapters that attempt to define the space of, well, Internet computing.
The book is divided into six parts. It begins with "Applications," 11 examples of systems you can build over the Internet, such as voice-over-IP telephony, digital libraries, and collaboration environments. Eleven examples, of course, do not exhaust the list of possible Internet applications; devoting the entire 1,100 pages to applications at this level of detail wouldn't begin to exhaust the possibilities.
Starting with applications is a sign of an emerging trend. Traditional books in science, mathematics, and engineering often take a top-down approach. After hand waving at why the topic is important, they begin with a collection of principles, progress through an increasingly complex set of technologies, continue with techniques, and finally conclude with detailed examples of how the treasures related in the previous chapters might actually be applied. In computer science, this unfolds by starting with the data structures and algorithmic analysis of a domain, continuing with the algorithms applicable to that domain, and then following with a prescriptive description of the right way to do things. Only at the end of the book do we see the "real-world examples" meant to illustrate the high-level points of the rest of the text. A critical motif of standard computer science is that the techniques and technology illustrated in the first parts of the book are useful for far more than the particular applications at the end.
However, over the past decade, pressure has increased, especially from funding agencies, to focus on harnessing the power of information systems in the service of other fields — that is, "bioinformatics" rather than applying database or search technology to biology. A handbook that starts with applications is a signpost of this trend.
This isn't necessarily bad. Being application-centered has many virtues. Real applications are greatly informative to technology. The trick is to partner with application areas, rather than becoming subordinate, and to emerge with a coherent discipline rather than a collection of island tricks. It will be a great win if we can apply the search mechanisms invented in bioinformatics to logistics or animation, and a great loss if the mechanisms of bioinformatics turn out to be only about biology.
The book regresses to a more conventional historical development in the next section, "Enabling Technologies." The theme here is technologies developed elsewhere that have both become critical to the Internet and flourished because of it. Examples of such technologies include information retrieval, agents, and digital rights management. (I suspect that workers in these fields find them no more subordinate to Internet computing than computer scientists find their work in bioinformatics subordinate to actual biology.)
The "Information Management" section climbs up a semantic ladder, going from chapters on the syntactic (XML), through heterogeneous data techniques, and on to Web semantics. The book then descends to the high-level fluid dynamics of Internet plumbing in a section on "Systems and Utilities," touching on topics such as directory protocols, middleware, and caching. The concluding sections deal with "Engineering and Management" (primarily software engineering, various security issues, and network management) and a grab bag of "Systemic Matters" about social issues and governance.
The handbook's scope engenders two questions: first, is this a representative description of the field of Internet computing, and, more concretely, is Internet computing a field?
It's a solid collection. Most papers in the book are directly informative to the reader and succeed at describing their territory at a level at which the educated computer professional can come to understand the issues and approaches involved. However, the overall feeling is one of omission — 57 chapters barely scratch the surface of all the applications and technologies associated with the Internet. I can't help feeling an echo of the Alan Perlis aphorism, "If you have a procedure with 10 parameters, you probably missed some."
Science divides itself into disciplines, and scientific disciplines tend to divide themselves into subdisciplines. This division is sometimes based on the topic being studied. For example, biology studies "living things." Subdivision can also be based on the particular basic assumptions and techniques being applied. For example, statistics, operations research, and artificial intelligence all try to glean understanding from models of the world, but differ in their focus on how to construct and manipulate those models; economics, psychology, and sociology all try to explain why people do the things they do, but differ on the assumed primary motivations and interesting conclusions.
Ernest Rutherford famously observed, "All science is either physics or stamp collecting," contrasting data gathering (for example, collecting samples of species of beetles) with mathematically formulated theories. To my mind, this 57-chapter definition of Internet computing resembles a computational stamp collection — here are many of the exotic specimens you can find on the Internet.
"Biology is the study of living organisms" is a fine definition because the domain is both fairly well delimited and has a certain uniformity. All living things reproduce and decrease entropy, and mechanisms such as DNA, metabolism, and evolution find themselves applied consistently to the domain of living things.
Saying that Internet computing is "the study of anything having to do with the Internet" is clearly less satisfactory. It is a definition tied too closely to a particular moment in technology. When computing mechanisms have pervasively spread to every doorknob and plumbing fixture, computers have become as embedded in the natural fabric of economic existence as printed language, and interconnectivity is as omnipresent as radio reception, will it still be Internet computing? (Although you might counter, "When we've genetically engineered/nanotechnologied animals and devices with both inorganic and organic parts, will biologists still be the ones studying them?")
Of course, we can hardly expect to reduce Internet computing to something akin to Schrödinger equations. Rather, I'd prefer to follow Peter Denning's lead² and ask, "What are the underlying principles we use to define Internet computing?" From Denning's point of view, such principles are the fundamental stories we use to explain things. For computer science as a whole, he cites five guiding principles: computation (algorithms, complexity, and the like), communication (such as Shannon entropy and data transmission), coordination (human-computer and computer-computer interfaces), automation (artificial intelligence and machine learning), and recollection (storage hierarchies and search).
I'd really like to see a set of neatly crafted principles that define Internet computing. There will be considerable overlap with Denning's principles, just as there is considerable overlap between Internet computing and computer science as a whole. (One measure of this overlap was a small analysis I once performed. IC is one of about a dozen magazines published by the IEEE Computer Society. Like IC, these magazines generally tend to have themes for each issue. The year I counted, half of that year's themes could have been IC themes. On the other hand, half could not.)
At first glance, the most basic Internet computing principles will be about communication — the transfer of information over a distance. Thinking about information leads to the next few steps: the nature of things that might communicate (people, conceptually passive data repositories, active computational agents), protocols (the kinds of conversations that communicants might have), and content (not only the generic form of content, but also how to organize it for efficient retrieval). Internet computing, as a subdiscipline of computer science, thus emphasizes technologies for efficient and meaningful communication, rather than computation or the automation of computation.
In a future column, I'll explore this theme in greater depth, seeking to define more specific and principled boundaries for this emerging subdiscipline.
Junghoo Cho is an assistant professor in the Department of Computer Science at the University of California, Los Angeles. His main research interests are in the study of the evolution, management, retrieval, and mining of the World Wide Web. Cho has a BS in physics from Seoul National University and a PhD in computer science from Stanford University. He has published several research papers in international journals and conference proceedings and serves on program committees of several international conferences, including SIGMOD, Very Large Databases (VLDB), and World Wide Web. Cho has received both the NSF Career Award and the IBM Faculty Award. Contact him at email@example.com.
Andrew Tomkins is a senior research scientist at Yahoo Research. His interests lie in measurement, modeling, algorithms, and analytics for large heterogeneous datasets such as the World Wide Web. Prior to joining Yahoo, he spent eight years at IBM's Almaden Research Center, where he headed the information management principles group and served as chief scientist of the WebFountain project. Tomkins has a PhD in computer science from Carnegie Mellon University. He has published more than 50 technical papers, including two that won best paper awards at the World Wide Web conference. He also serves on various program committees and editorial boards. Contact him at firstname.lastname@example.org.