SEPTEMBER/OCTOBER 2004 (Vol. 8, No. 5) pp. 30-33
1089-7801/04/$31.00 © 2004 IEEE
Published by the IEEE Computer Society
Published by the IEEE Computer Society
Guest Editors' Introduction: Internet Measurement
|Where We Need to Go|
PDFs Require Adobe Acrobat
During the past 10 years, the Internet has become a vital component of international commerce, interpersonal communication, and technological development. Network measurement is critical to this new communication medium's continued viability. Researchers, service providers, and other members of the Internet community need to understand the Internet's growth characteristics and the limitations affecting the system, both globally and locally.
Network measurement was neglected in the Internet's early stages, taking lower priority to increasing the network's speed, capacity, and coverage. Recently, however, interest in network measurement has expanded, paving the way toward a growing understanding of the Internet's structure and behavior. Unfortunately, as the number of Internet users has grown, antisocial and even malicious behavior has also increased. Countering these and other scaling challenges will require substantially more investment in Internet measurement and data analysis. The four articles that follow provide an introduction to this vital research area.
In its early history, before 1995, the Internet primarily served a worldwide research community. The infrastructure that seeded the Internet was funded by US government agencies (DARPA and the National Science Foundation), which supported regional networks operated by organizations around the country.
Merit Network ( www.merit.edu), which operated the NSFnet backbone in its various forms, measured the backbone's traffic volumes and produced summary statistics through April 1995. (See www.cc.gatech.edu/gvu/stats/NSF/merit.html.) But these were primarily oriented toward short-term operational requirements or periodic simplistic traffic reports for funding agencies. As such, they weren't conducive to workload or performance characterization, much less network-dynamics modeling. As the NSFnet and attached regional infrastructures exploded in popularity among academic and commercial sectors, operators acutely focused on increasing link speeds and router/switch-traffic capacities, as well as expanding the topology to cover more of the world. Developers worked on improving protocols and inventing new ones to support emerging services. The evolutionary context of the infrastructure left little room for more than mild interest in network measurement.
In the mid 1990s, two events caused significant changes to the Internet. First, the NSF ended its funding for the US Internet backbone, implementing a strategic plan to transition from US-government to commercial funding and long-term sustainability of the still relentlessly growing Internet. Second, Tim Berners-Lee (then at CERN) developed the basic protocols underlying the Web and publicly released them and the accompanying software, thus making it possible for everyone to publish information (for free) and, eventually, offer services via the Web. Shortly thereafter, the National Center for Supercomputing Applications (NCSA) released, also free for noncommercial use, the Mosaic browser to support a more appealing graphical interface to Web content. In response, the Internet community quickly grew to many millions of users.
During the late 1990s, the Internet support community remained focused on operations — keeping networks running reliably and ensuring that infrastructural components would scale to cope with the increasing traffic volume and number of users. Network operators were generally interested in measurement but lacked the resources to pursue it themselves.
In the early 2000s, the dot-com bubble burst and Internet growth eased. With less money to invest in hardware, some providers became noticeably interested in understanding how their networks behaved — knowledge that could let them optimize physical resources, such as routers, switches, and leased lines.
For ordinary people, the Internet has become an integral part of everyday life; we now use it continually to find information, buy products, meet people, do our jobs, and play. As if these circumstances weren't sufficiently revolutionary, the pervasive adoption of mobile computing expectations and requirements is now prompting service providers to take a strong interest in more strategic measurement and charging schemes.
With its ever-growing user community, the Internet has gradually been forced over the past decade to deal with the "real world." Like chemical pollutants from industrial production processes, infrastructural pollution — such as viruses, worms, and spam traffic — has become significant in volume and impact on user productivity. Protective technologies such as firewalls and NAT gateways have changed the Internet's simple end-to-end connectivity model. Although these devices can effectively block some malignant packets, they do so by filtering packets according to access-control lists (ACLs), which can prevent many applications (those that require end-to-end connectivity) from working properly. However, the recent Witty worm clearly demonstrated that firewalls themselves can be vulnerable to devastating attacks. 1 (See www.caida.org/analysis/security/witty/.) Furthermore, vast portions of the Internet remain vulnerable to attack because many users (most residential users, for example) do not even use firewalls. Getting a handle on the impact of network pollution and attack traffic, not to mention developing techniques to minimize it, has motivated a deeper interest in measurement and a corresponding rise in research activity.
Collection, interpretation, and modeling of empirical Internet data remains challenging. The technologies and protocols involved in generating and delivering Internet traffic were designed for technical expediency, architectural clarity, and functionality, rather than for measurement and analysis. New developments often introduce specifications that are independent of their predecessors; technology developers often deploy them as rapidly as possible, without concerted systematic testing on the vast set of heterogeneous components encountered on the Internet. Indeed, it would be impossible to test certain behaviors against all possible combinations of equipment, software, and configuration. Furthermore, many who develop technologies and protocols contend that the Internet has evolved splendidly thus far without extensive measurement and modeling. Others believe that we should not begin measurement and modeling efforts until doing so proves cheaper than simply expanding the currently available bandwidth. To make matters harder, a variety of legal and privacy issues serve as active disincentives to measurement research and development activity. Nonetheless, every constituency of the Internet (providers, vendors, policymakers, and users) realizes that we need a better understanding of Internet structure and behavior, including the influence of various components and functionalities on macroscopic dynamics.
Floyd and Paxson's landmark paper provided several insights into why the Internet is hard to measure, and thus hard to simulate, making it resistent to modeling and predictive insight. 2
The first big challenge is that everything keeps changing. For example, HTTP traffic grew from zero in 1995 to more than 80 percent of the network traffic at many sites by the early 2000s. Yet, HTTP's proportion of total traffic is now dropping on most links, and peer-to-peer traffic is steadily rising as developers find more ways to use P2P technology.
The Internet's global scale also complicates measurement efforts, as does the fact that many aspects of traffic and behavior change from location to location. Thus, statistics gathered at one location often prove unrepresentative of the global Internet. Instead, we need to make measurements at many sites and correlate the results to derive a comprehensive view.
Finally, few Internet protocols and applications were designed to inherently support fine-grained measurement. Instead, researchers have had to find indirect ways to measure network phenomena. For example, traffic-flow measurements rely on data collected from packet headers as they pass across links; counting packets and bytes and classifying them into flows on the basis of values taken from the headers is easy but yields limited insight into higher-layer behavior. Measuring application performance generally remains a challenge, since applications differ as to how they transport application-specific data. For example, while effective tools exist for measuring Web server performance, such tools are often not effective at measuring performance of other applications.
We trust that the four theme articles in this issue will provide some insight into the nature of the young and colorful science of network measurement. Our goal here is to raise awareness and promote understanding of the need for measurement to improve the Internet's stability and performance as its relentless growth continues.
As network and CPU speeds increase, users expect their file-transfer rates to improve as well. Unfortunately, congested sections of paths through the Internet often render lower-than-expectedtransfer rates. To help diagnose such problems, researchers have developed tools to measure performance characteristics such as available network bandwidth. In our first article, "Estimating Available Bandwidth," Ribeiro, Riedi, and Baraniuk describe techniques for measuring available bandwidth using their open-source tool for spatio-temporal bandwidth estimation. Their further goal is to determine which links in the path are most congested over short time intervals. Such a tool could prove useful for network troubleshooting as well as for applications that need to adapt to network conditions.
Multicasting has remained a promising technology for a decade, offering an efficient way to achieve scalable many-to-many data delivery. In "Multicast Routing Instabilities," Rajvaidya and Almeroth briefly survey multicast technology R&D, analyze four years of multicast routing data, and evaluate overall multicast stability (the dynamics of group membership and routing systems) and reachability (of multicast sources and destinations). Understanding the causes of such instabilities has allowed the authors to improve multicast router configurations, bringing multicast a step closer to being an integral part of the Internet.
Because it strongly influences the way reliable transport protocols such as TCP behave, thepacket-loss rate is a fundamental characteristic of any Internet path. The simplest way to measure packet loss is to send probe packets and observe their behavior, but today's loss rates are often too low to obtain reliable measurements. In "Comparing Probe- and Router-Based Packet-Loss Measurement," Barford and Sommers examine probe-based and router-based packet-loss measurement techniques; their results show that probe-based measurements are much less reliable. Because researchers seldom have access to routers, especially those outside their own networks, router-based statistics are hard to obtain. Improvements in active probing methods that don't rely on intermediate router access would provide an effective solution to this problem.
We end this theme section with "Long-Range Dependence: Ten years of Internet Traffic Modeling," Karagiannis, Molle, and Faloutsos's critical retrospective of the past decade of Internet traffic modeling. Despite its theoretically revolutionary nature, the notion of long-range dependence (LRD) in traffic (that is, roughly, that packet interarrival times are influenced by those of earlier packets), has had limited practical impact. The complexities and inaccuracies inherent in LRD estimation have significantly constrained its utility. Furthermore, some backbone traffic samples, although self-similar, are also well-characterized by the simpler and better understood Poisson model (in contrast to LRD), even at subsecond time scales. This audacious article challenges the community to reevaluate current modeling assumptions and methodologies. As with the Internet's radical effect on society, it is safe to say that the revolution in Internet traffic modeling is not yet over.
Where We Need to Go
Real data from wide-area networks is useful not only to traffic researchers but also to many others, including those doing Internet traffic engineering, ISPs that must bill according to traffic volume, and network security researchers. Because measurement provides the only accurate data about the current status of network usage, R&D in this area is the only way to secure intelligent decisions about network provisioning with finite resources. Thus, Internet data collection helps both users and providers. Unfortunately, technologies do not exist to handle many basic measurement tasks. Moreover, no agencies or organizations are clearly responsible for the cost of developing measurement technologies and analyzing the resulting data to capture insights into the current infrastructure and longitudinally measure its evolution, much less track longitudinal data on its evolution.
The relative dearth of information about network topology and function led to the proliferation of many misconceptions over the previous two decades about the Internet's structure and performance. 3 Such misconceptions have led to inferences based upon incomplete or misleading information. Compounding matters is the fact that there are few independent repositories representing solid collections of Internet data. To address this issue, the NSF is currently funding the development of such a repository. (See www.caida.org/projects/trends/ for details of CAIDA's network data repository project.)
And yet, without a pervasive community commitment, cogent measurement and analysis of the structure and dynamics of the Internet infrastructure will remain a weak science and, as such, a weak basis for predicting traffic behavior or informing public policy. Internet measurement is still a discipline in its infancy, but it holds a vital key to the health of the current and future Internet.
Nevil Brownlee is an associate professor of computer science at the University of Auckland. After nearly four years researching Internet traffic behavior using higher-speed versions of NeTraMet at the Cooperative Association for Internet Data Analysis (CAIDA) in San Diego, he is working on research-and-education networking in New Zealand. Brownlee is cochair of the IETF's IP Flow Information Export (IPFIX) working group and former cochair of the Real-Time Traffic Flow Measurement (RTFM) working group. Contact him at firstname.lastname@example.org.
kc claffy is principal investigator for the Cooperative Association for Internet Data Analysis (CAIDA), based at the University of California's San Diego Supercomputer Center. Her research interests include data collection, analysis, and visualization of Internet workload, performance, topology, and routing behavior. She also works on engineering and traffic analysis requirements of the commercial Internet community, often requiring ISP cooperation in the face of commercialization and competition. Contact her at email@example.com.