1541-4922/06/$31.00 © 2006 IEEE
Published by the IEEE Computer Society
Education: Integrating Parallel and Distributed Computing in Computer Science Curricula
Writing this column for DS Online's new Education department is like going back in time. In 1994-1995, Janusz Zalewski, Jerzy Waśniowski, and I published three papers related to computer science education [1-3].
We made suggestions regarding parallel and distributed computing's place in the computer science curriculum. However, our timing was unfortunate: a wave of bankruptcies, consolidations, and changing business objectives had just hit the industry (beginning in 1992), claiming, among others, BBN, Alliant, Kendall Square Research, Thinking Machines, and Convex. At that time, the first Beowulf cluster (http://beowulf.org) had just been created, and the idea of grid computing was still just that, an idea.
Much has changed since then. Cluster computing brought parallel computing to the masses. Grid computing has been one of the hottest research areas since Ian Foster and Carl Kesselman published The Grid: Blueprint for a New Computing Infrastructure in 1998 (Morgan Kaufmann). High-performance computing came back with a vengeance, spearheaded partly by the Advanced Simulation and Computing Program (http://www.llnl.gov/asci) and by international competition to build the world's fastest supercomputer (represented by the Earth Simulator, http://www.es.jamstec.go.jp/esc/eng). Most recently, parallel computing has truly entered the desktop, initially through the resurfacing of shared-memory parallel computers (in the form of dual- and quad-processor servers) and then through the introduction of multicore processors.
We can say that in today's computing landscape, there's no single-processor computing—"only" parallel and distributed computing. So, it's probably the right time to bring back ideas that we proposed 11 years ago and to suggest that computer science curricula must represent today's computing reality by focusing on parallel and distributed computing.
Obviously, that's easier said than done. The suggestion of introducing something new to the curriculum typically receives one of two responses:
• University departments create a new course and add it to the electives list. Depending on the faculty's availability and willingness, a department offers such a course from time to time.
• University departments can't find space for another course. This is especially the case when such a course doesn't have a champion—a person who ensures that the department will actually offer, for example, Parallel and Distributed Computing.
Recognizing this situation and believing that even the first possibility is inadequate, we approached the problem from a different perspective. We suggested that instead of introducing a new course, computer science departments should introduce elements of parallel and distributed computing in all appropriate courses across the curriculum. More specifically, we suggested that the curriculum should
• include parallel and distributed computing as early as possible, possibly in Computer Science I and II;
• teach parallel and distributed computing in core courses in a breadth-first manner; and
• introduce parallel and distributed computing based on software engineering principles.
We followed with specific examples of what material to introduce in which courses (Computer Science I, Computer Science II, Computer Architecture and Hardware, Programming Languages, Operating Systems, Data Structures and Algorithms, Software Engineering, and Senior Research Project).
These proposals seem too mild now, and a more radical approach is necessary.
The evolving world of computing
Returning to my earlier line of thought, all mainstream-computing processors (from desktop PCs and laptops up) will soon be multicore, and we'll have to take this fact as a reality and not an aberration. We'll find single-core processors primarily in cell phones or thinking refrigerators (embedded systems in general). Furthermore, multiple multicore processors will be combined to create ever more powerful servers (for example, as centers of provisioning of virtualized resources). As a side note, this also means that the central mainframe is back in a somewhat modified form. Because the industry is already working on quad-core processors, and quad-processor servers have existed for some time, it's easy to envision inexpensive servers consisting of at least 16 processing cores.
At the same time, the Internet has become so enmeshed with our lives that students should easily understand the vision of the Net's multiple available resources (which happens to be central to the grid metaphor). These students communicate daily with their friends using email and instant messaging and use the Internet to find various types of content. So, we must change how we teach computer science to reflect how the world of computing is and will be, rather than how it was. Therefore, from the start, we should
• treat parallel and distributed computing as a natural environment our students will find themselves in and
• treat a single thread of execution as a special case of the typical situation (where multiple threads execute concurrently and interact with each other).
With this in mind, I'll briefly sketch some ideas of what we can teach in early Computer Science courses. These ideas are limited to a few examples that illustrate the outlook that I advocate here and that show the direction of proposed changes. Detailed analysis of course content must obviously follow but is outside this column's scope.
Suggestions for a new curriculum
We should treat parallel and distributed computing as a natural situation and begin teaching it from the first Computer Science course. Even the very first program (which often writes "Hello, World") can be executed on two processors and can produce "Hello, World" twice. Furthermore, if the available computer has more processors, the program can be executed on an increasing number of them until each one writes its own greeting to the world. Obviously, how to do this depends on the language and environment used, and the specific techniques would go beyond the students' current knowledge. However, after a general introduction to multiprocessing, we could explain that the additional "technical details" just facilitate execution on multiple processors and that the students will understand what's happening later. They can grasp the concept and have fun executing the code on multiple processors without being overwhelmed by the situation.
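As a minimal sketch of such a first program, assuming Python and its standard concurrent.futures module (the column names no language or environment, so both are assumptions, as are the function names greet and hello_from_all), each worker writes its own greeting:

```python
import os
from concurrent.futures import ThreadPoolExecutor


def greet(worker_id: int) -> str:
    """Return the greeting produced by one worker."""
    return f"Hello, World from worker {worker_id}"


def hello_from_all(num_workers: int) -> list[str]:
    """Run greet once per worker and collect the greetings in worker order."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # map() hands one worker_id to each task and keeps the input order.
        return list(pool.map(greet, range(num_workers)))


if __name__ == "__main__":
    # os.cpu_count() can return None, so fall back to a single worker.
    for line in hello_from_all(os.cpu_count() or 1):
        print(line)
```

In CPython, the threads used here share one interpreter lock, so the sketch shows the structure of the exercise rather than a speedup. The same module's ProcessPoolExecutor offers the same map interface and runs the workers in separate processes, which the operating system can place on separate cores.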
An important concept that CS I introduces is the loop. Students could easily learn that, when a loop's iterations are independent of one another, different parts of the loop can execute on different processors. After completing this step and introducing vectors, it's easy to envision that students would be able to split operations performed on vectors into parts executed on separate processors.
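One way to picture this step, again as a sketch in Python with hypothetical helper names (chunk_bounds, partial_sum_of_squares, parallel_sum_of_squares), is to cut the loop's index range into contiguous slices, give each slice to one worker, and combine the partial results:

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_bounds(length: int, num_workers: int) -> list[tuple[int, int]]:
    """Split range(length) into num_workers contiguous [start, stop) slices."""
    base, extra = divmod(length, num_workers)
    bounds = []
    start = 0
    for worker in range(num_workers):
        # The first `extra` workers take one additional element each.
        stop = start + base + (1 if worker < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def partial_sum_of_squares(vector, start: int, stop: int):
    """The ordinary sequential loop, restricted to one slice of the vector."""
    total = 0
    for i in range(start, stop):
        total += vector[i] * vector[i]
    return total


def parallel_sum_of_squares(vector, num_workers: int):
    """Run one slice of the loop per worker, then add the partial sums."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        futures = [
            pool.submit(partial_sum_of_squares, vector, start, stop)
            for start, stop in chunk_bounds(len(vector), num_workers)
        ]
        return sum(future.result() for future in futures)


if __name__ == "__main__":
    print(parallel_sum_of_squares([1, 2, 3, 4, 5], num_workers=3))
```

Calling parallel_sum_of_squares with num_workers=1 runs the familiar single loop over the whole vector, so the sequential program appears as one case of the parallel one. The split is safe here only because each iteration reads its own element and the partial sums are combined after all workers finish.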
CS II typically introduces several algorithms that deal with various complex data structures. Some algorithms might be more amenable to parallelization than others, and students should explore this possibility. For instance, comparing linked-list-type and vector-type data objects in relation to parallelization could be useful. Also, in CS II, we can introduce techniques for coarse-grain parallelization. Here, students can concurrently execute different modules working on different tasks (in this way, we move away from data parallelism and show functional decomposition of a problem into tasks).
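The coarse-grain, functional decomposition described above can be sketched as follows. The three text-analysis tasks and the function name analyze are hypothetical, chosen only to show different modules working concurrently on different tasks rather than on different slices of the same data:

```python
from concurrent.futures import ThreadPoolExecutor


def count_words(text: str) -> int:
    """Task 1: count whitespace-separated words."""
    return len(text.split())


def longest_word(text: str) -> str:
    """Task 2: find the longest word (the first one wins a tie)."""
    return max(text.split(), key=len)


def count_vowels(text: str) -> int:
    """Task 3: count the vowels a, e, i, o, u."""
    return sum(1 for ch in text.lower() if ch in "aeiou")


def analyze(text: str) -> dict:
    """Submit each task as its own unit of work and gather the results."""
    tasks = {"words": count_words, "longest": longest_word, "vowels": count_vowels}
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        futures = {name: pool.submit(task, text) for name, task in tasks.items()}
        # result() waits for the corresponding task to finish.
        return {name: future.result() for name, future in futures.items()}


if __name__ == "__main__":
    print(analyze("parallel and distributed computing"))
```

Unlike the data-parallel case, each worker here runs different code. The tasks only read the shared text and never modify it, which is what lets them proceed without coordinating with one another.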
Obviously, we'd need to change the Computer Architecture course, emphasizing current architectures. When I taught Computer Architecture, I liked John Hennessy and David Patterson's textbook, Computer Architecture: A Quantitative Approach (Morgan Kaufmann, 2002); however, I don't know of any current textbooks that are appropriate for an undergraduate course focusing on today's multicore processor architectures. Maybe it's time to refocus and split Computer Architecture courses between what is taught in Computer Science and in Computer Engineering programs. But this is a subject for a different column.
By now, I hope it's obvious how we can extend the proposed program to other courses in a Computer Science curriculum. The proposed means of introducing parallel and distributed computing in early undergraduate Computer Science courses don't involve complicated problems and in-depth analysis. What matters most is that, from the start, we expect students to consider the world of computing as filled with multiprocessor machines and Internet resources that they can combine and use to solve computational problems. In other words, I propose an approach that will let us develop, from the beginning, the frame of mind that best represents the computational realities of the present and the future.
is an associate professor at the Warsaw School of Social Psychology's Computer Science Institute. Contact him at firstname.lastname@example.org.