Issue No. 02 - March/April (2006 vol. 8)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/MCSE.2006.31
Although the Internet is manmade, its exact structure is a mystery. Most of its component networks are owned and managed by private companies that keep their hardware specifications secret for commercial and security reasons. In this respect, the Internet—routers and the network hops between them—has much in common with other real-world networks, such as the World Wide Web. They, too, have formed organically, and their topologies and growth patterns remain elusive.
In the late 1990s, several groups of researchers published studies suggesting that the graphs of many such real-world networks are "heavy-tailed," meaning they have an unexpectedly large number of high-degree nodes. Following these studies, physicists at the University of Notre Dame, led by Albert-László Barabási, published two highly influential papers describing a growth mechanism that might explain this phenomenon, as well as a key feature of the networks it generates: an Achilles heel. Barabási's networks are held together by a few highly connected nodes through which most traffic must pass; thus, they're robust against random failures yet vulnerable to targeted attacks. Because the Internet graph is thought to be heavy-tailed, the Notre Dame researchers reasoned that it, too, might be "robust yet fragile."
A recent paper by John C. Doyle and colleagues challenges this view of the Internet and offers an alternative picture that they argue is more consistent with known data and technological constraints. "Thus far, we have worked with models that fail to reflect essential features of the Internet," says John W. Byers, a professor of computer science at Boston University. "Now, the picture is beginning to hang together a lot better."
The Doyle model, described in a paper that appeared in the US Proceedings of the National Academy of Sciences (PNAS) [1] in October, suggests that the Internet is robust against both random router failures and attacks that deliberately target routers. "If you wanted to attack the Internet, the last thing you would do is go out and smash routers," says Doyle, a professor of control and dynamical systems at the California Institute of Technology. "Router attacks are the one place where the Internet has no Achilles heel."
Power Laws and Preferential Attachment
Several papers published in the late 1990s used link analysis to elucidate the Web's structure. Inspired by this work, Barabási decided to explore the Web's connectivity and large-scale topological properties. With Réka Albert and Hawoong Jeong, Barabási built a Web crawler, similar to those employed by search engines, and used the collected data to approximate the probability that a given page will have k incoming links or k outgoing links. Because individuals act independently and follow their unique interests when choosing which sites to link their Web pages to, the researchers expected that the distribution of node degrees for the Web graph would be bell-shaped, with the fraction of nodes of degree d decreasing exponentially in d.
To their surprise, the data showed something quite different. The degree distribution tailed off not exponentially, but polynomially, following what scientists call a power law. Compared with an exponential distribution, a power-law distribution places far more Web pages at very high link counts; in other words, it has a heavy tail. The researchers published their results in a 1999 issue of Nature [2], at the same time that several other researchers published studies with the same conclusion [3, 4].
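To make the contrast concrete, the short Python sketch below compares how quickly the two kinds of tail shrink. The functional forms and the parameter values (`rate`, `exponent`) are illustrative assumptions, not figures taken from the studies mentioned above.

```python
import math

def exponential_tail(k: float, rate: float = 0.5) -> float:
    """Illustrative exponential tail: fraction of nodes with degree >= k."""
    return math.exp(-rate * k)

def power_law_tail(k: float, exponent: float = 2.0) -> float:
    """Illustrative power-law tail: fraction of nodes with degree >= k (k >= 1)."""
    return k ** -exponent

# Both `rate` and `exponent` are assumed values chosen only for illustration.
for k in (1, 10, 100, 1000):
    ratio = power_law_tail(k) / exponential_tail(k)
    print(f"k={k:>4}  exponential={exponential_tail(k):.2e}  "
          f"power law={power_law_tail(k):.2e}  ratio={ratio:.2e}")
```

Both tails fall as k grows, but under these assumed parameters the exponential tail is vanishingly small at k = 1000, while the power-law tail is still about one in a million.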
In a Science paper [5] that appeared later that year, Barabási and Albert listed a diverse array of real-world networks that appeared to have heavy-tailed degree distributions and suggested that this phenomenon can be explained by "growth" and "preferential attachment." As nodes are added to the network over time, they argued, there's a bias toward linking them to high-degree nodes. In the Web's case, for example, the researchers suggested that large-degree sites tend to grow at a faster rate than the small-degree ones because people creating Web pages are more likely to link to high-profile sites.
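The two ingredients, growth and preferential attachment, can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure from the Science paper: the starting graph, the parameter `m` (links added per new node), and the function name are all assumptions made for the example.

```python
import random
from collections import Counter

def grow_by_preferential_attachment(n: int, m: int = 2, seed: int = 0):
    """Return the edge list of a graph grown to n nodes.

    Each new node links to m distinct existing nodes, chosen with
    probability proportional to their current degree.
    """
    rng = random.Random(seed)
    # Assumed starting point: a small complete graph on m + 1 nodes.
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # Every node appears in `endpoints` once per incident edge, so a uniform
    # draw from this list is a degree-proportional draw over nodes.
    endpoints = [v for edge in edges for v in edge]
    for new_node in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for target in sorted(targets):
            edges.append((new_node, target))
            endpoints.extend((new_node, target))
    return edges

edges = grow_by_preferential_attachment(n=1000)
degree = Counter(v for edge in edges for v in edge)
mean_degree = 2 * len(edges) / len(degree)
print("mean degree:", round(mean_degree, 2))
print("five largest degrees:", sorted(degree.values(), reverse=True)[:5])
```

In a run of this sketch, the largest degrees should sit well above the mean, which is the "rich get richer" effect the paragraph describes.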
The discovery of power laws on the Web prompted an avalanche of research on the degree distributions of real-world network graphs. Studies showed the existence of power laws in collaboration networks of researchers, in networks of film stars linked by appearance in the same film, in the power grid, and in cellular and ecological processes. Significantly, a 1999 study [6] by the Faloutsos brothers used traceroute sampling data to argue that the Internet graphs at the router level and at the autonomous systems level (where the nodes are Internet service provider networks and the links are peering agreements) are heavy-tailed.
At the same time, researchers began to rediscover old power-law research by well-known figures such as Benoit Mandelbrot and George Kingsley Zipf. As it turned out, the mathematical basis of the preferential attachment model dates back to at least 1925, when it was used to explain a power law in the distribution of species among genera of plants [7].
In 2000, the three Notre Dame physicists joined forces again and published another Nature paper describing the robustness of preferential-attachment-generated random graphs against failure [8]. They showed that the connectivity of such graphs, which tend to have their high-degree nodes in their heavily trafficked cores, is maintained when randomly chosen nodes are removed but falls sharply when the highly connected hub nodes are removed. Citing the Faloutsos paper, the researchers concluded that the Internet might be vulnerable to targeted attacks.
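The kind of experiment described here can be approximated with a small simulation. The sketch below is an illustration under stated assumptions, not the Nature paper's method: it grows a degree-proportional attachment graph (the helper `grow_graph`, its parameters, and the 10 percent removal fraction are all hypothetical choices) and uses the size of the largest connected component as a simple stand-in for connectivity.

```python
import random
from collections import defaultdict, deque

def grow_graph(n, m=2, seed=0):
    """Adjacency sets for a graph grown by degree-proportional attachment."""
    rng = random.Random(seed)
    adj = defaultdict(set)
    endpoints = []  # each node listed once per incident edge

    def link(a, b):
        adj[a].add(b)
        adj[b].add(a)
        endpoints.extend((a, b))

    for i in range(m + 1):          # assumed seed graph: complete on m + 1 nodes
        for j in range(i):
            link(i, j)
    for new_node in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for target in sorted(targets):
            link(new_node, target)
    return adj

def largest_component(adj, removed):
    """Size of the largest connected component once `removed` nodes are deleted."""
    seen = set(removed)
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for neighbor in adj[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        best = max(best, size)
    return best

adj = grow_graph(2000)
k = len(adj) // 10  # delete 10% of the nodes in each experiment (assumed fraction)
random_victims = random.Random(1).sample(sorted(adj), k)
hub_victims = sorted(adj, key=lambda v: len(adj[v]), reverse=True)[:k]
after_random = largest_component(adj, random_victims)
after_attack = largest_component(adj, hub_victims)
print("largest component after random failures:", after_random)
print("largest component after hub removal:    ", after_attack)
```

Removing the same number of nodes leaves a noticeably smaller connected core when the highest-degree nodes are targeted, which is the "robust yet fragile" pattern in miniature.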
The Internet research community was "thunderstruck" by the implications of the Barabási and Faloutsos papers, says Byers. With colleagues, he, like other researchers, built an Internet "topology generator" based on the preferential attachment model, which was widely used to simulate the behavior of peer-to-peer networks and other large-scale network systems. "People wanted a model that reflected the power laws that were presumptive in the Internet," Byers says.
Within a year or two, however, some researchers began to question whether the preferential attachment model, which seemed so plausible when applied to the Web, made sense for the physical Internet. "People who were mathematically inclined and work on the Internet knew that [the preferential attachment model] wasn't the right picture," says Jon Kleinberg, a professor of computer science at Cornell University. Also, some network operators, when asked about the "Achilles heel" routers of their networks, replied that their networks didn't have any of the high-degree hub nodes Barabási posited.
An Alternative Model
In their October PNAS paper, Doyle and colleagues cast further doubt on the robust-yet-fragile picture of the router-level Internet and provided a different model that they say is more consistent with known data and economic constraints. The researchers note that within the large class of heavy-tailed graphs, most have the highly connected, highly trafficked, vulnerable hub nodes described by Barabási, yet a few have an altogether different structure. In these statistically rare graphs, the high-degree nodes aren't located in the heavily trafficked core, but at the low-traffic periphery.
To explain why this structure, highly unlikely in random graphs, is likely to reflect the actual Internet, the researchers came up with an alternative mechanism for generating a heavy-tailed Internet graph. The new mechanism follows a general framework for generating highly optimized tolerance, or HOT, heavy-tailed graphs, introduced by Doyle and Jean Carlson, of the University of California, Santa Barbara. In a 1999 paper [9], Carlson and Doyle pointed out that heavy-tailed degree distributions can arise in engineered networks in which the builders of the individual components try to optimize a common objective subject to a universal hazard or constraint. For example, the power-law distribution of the sizes of forest fires can be explained by the optimized placement of firebreaks.
In the Internet setting, Doyle and his colleagues assumed a graph design mechanism that attempts to maximize throughput (measured in bandwidth) subject to node constraints representing the limitations of router technology. The researchers asserted that, because of economic considerations, there's a limit to the total bandwidth a commercial router can support, forcing a trade-off between the number of links and each link's throughput. Thus, high-bandwidth routers must necessarily have fewer links, and high-degree routers must necessarily be at the Internet's periphery, where throughput is low. Graphs following this model have a much higher throughput than a preferential-attachment-generated graph, the researchers showed. "Graphs with a given degree distribution come in a bunch of different flavors," says Byers. "One likely flavor is to correlate high degree with high bandwidth, but that doesn't correlate with performance."
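The trade-off between link count and per-link throughput comes down to simple arithmetic. The sketch below assumes, purely for illustration, that a router's total capacity is shared equally among its links; the capacity figure and the function name are hypothetical and do not come from the PNAS paper.

```python
def max_per_link_bandwidth(total_capacity_gbps: float, degree: int) -> float:
    """Upper bound on each link's bandwidth if capacity is split evenly (an assumption)."""
    if degree < 1:
        raise ValueError("a router needs at least one link")
    return total_capacity_gbps / degree

ROUTER_CAPACITY_GBPS = 100.0  # hypothetical total capacity, for illustration only

for degree in (4, 16, 64, 256):
    per_link = max_per_link_bandwidth(ROUTER_CAPACITY_GBPS, degree)
    print(f"degree={degree:>3}  per-link bandwidth <= {per_link:.3f} Gbps")
```

Under this assumed budget, a router with a handful of links can carry fast core traffic on each one, while a router with hundreds of links can offer each only a thin slice, which suits it to aggregating low-bandwidth users at the periphery.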
To support their conclusions, Doyle and his coauthors demonstrated that the HOT model is consistent with Abilene, the backbone for the Internet2 academic network (one of the few networks for which detailed hardware specifications are publicly available) and with the data available on commercial networks.
The researchers also examined the effect of deleting nodes from their HOT model, measuring performance as the amount of original traffic that can be served by rerouting through the new network while preserving the original bandwidth constraints. They found that the network was robust not only against deletions of the low-degree core nodes, but also against deletions of the high-degree edge nodes. The loss of an edge node disconnects only low-bandwidth users, leaving the rest of the network intact.
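A toy topology shows why losing a high-degree edge node does so little damage. The sketch below is not the HOT model or the paper's traffic-based performance measure. It assumes a hypothetical layout (a ring of core routers, edge routers each attached to two neighboring core routers, and single-homed users) and uses plain connectivity as a stand-in for performance.

```python
from collections import defaultdict, deque

def build_toy_network(core=6, edge_per_core=2, users_per_edge=20):
    """Hypothetical layout: core ring, dual-homed edge routers, single-homed users."""
    adj = defaultdict(set)

    def link(a, b):
        adj[a].add(b)
        adj[b].add(a)

    for c in range(core):
        nxt = (c + 1) % core
        link(("core", c), ("core", nxt))          # low-degree core ring
        for e in range(edge_per_core):
            edge_router = ("edge", c, e)
            link(edge_router, ("core", c))        # two uplinks per edge router
            link(edge_router, ("core", nxt))
            for u in range(users_per_edge):       # many low-bandwidth users
                link(edge_router, ("user", c, e, u))
    return adj

def largest_component(adj, removed):
    """Size of the largest connected component once `removed` nodes are deleted."""
    seen = set(removed)
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for neighbor in adj[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        best = max(best, size)
    return best

adj = build_toy_network()
highest_degree_node = max(adj, key=lambda v: len(adj[v]))
print("nodes in total:", len(adj))
print("highest-degree node:", highest_degree_node, "degree", len(adj[highest_degree_node]))
print("largest component without it:", largest_component(adj, [highest_degree_node]))
print("largest component without one core router:", largest_component(adj, [("core", 0)]))
```

In this toy layout the highest-degree node is an edge router, and deleting it strands only its own 20 users; every other node stays connected. Deleting a core router strands no one, because each edge router has a second uplink.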
This new picture of the Internet will likely be reflected in the next generation of network models, says Byers, adding, "If I weren't on sabbatical, I'd be building a new generator."
The Doyle paper doesn't address the Internet's structure at the autonomous system level, in which the nodes are the service provider networks and the links are the peering agreements between them. For this network, which is widely believed to be heavy-tailed, the Barabási-Albert preferential attachment model seems more plausible to researchers.
Kleinberg, a theoretician, sees the new work as an important step toward clarifying a larger goal: classifying complex networks by their large-scale attributes. "The degree distribution is not everything," he says. "You also need to look at other invariants before saying what the graph looks like."
"Preferential attachment was important to articulate," Kleinberg adds. But, in general, "these graphs aren't growing with any one simple process, they are complicated processes and it shouldn't come as a shock when simple models fail to explain everything."