Redondo Beach, California

Nov. 12, 2000 to Nov. 14, 2000

ISBN: 0-7695-0850-2

pp: 57

P. Raghavan, IBM Almaden Res. Center, San Jose, CA, USA

R. Kumar, IBM Almaden Res. Center, San Jose, CA, USA

D. Sivakumar, IBM Almaden Res. Center, San Jose, CA, USA

A. Tomkins, IBM Almaden Res. Center, San Jose, CA, USA

E. Upfal, IBM Almaden Res. Center, San Jose, CA, USA

ABSTRACT

The Web may be viewed as a directed graph in which each vertex is a static HTML Web page and each edge corresponds to a hyperlink from one Web page to another. We propose and analyze random graph models inspired by a series of empirical observations on the Web. Our graph models differ from the traditional $G_{n,p}$ models in two ways: (1) Independently chosen edges do not produce the statistics (degree distributions, clique multitudes) observed on the Web; thus, edges in our model are statistically dependent on each other. (2) Our model introduces new vertices into the graph as time evolves, capturing the fact that the Web changes over time. Our results are twofold: we show that graphs generated using our model exhibit the statistics observed on the Web graph, and that natural graph models proposed earlier do not exhibit them. This remains true even when these earlier models are generalized to account for the arrival of vertices over time. In particular, the sparse random graphs in our models exhibit properties that do not arise in far denser random graphs generated by Erdos-Renyi models.
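To make the two departures from $G_{n,p}$ (dependent edges, vertices arriving over time) concrete, here is a minimal Python sketch of a growing directed graph in which each new vertex partly copies the out-links of an existing vertex. It is a copying-style process written for illustration under stated assumptions, not a reproduction of the models analyzed in the paper. The function names `copying_graph`, `uniform_graph` and `max_in_degree`, and the parameters `d` (out-links per new vertex) and `alpha` (probability that a link is chosen fresh and uniformly at random rather than copied), are hypothetical. The baseline `uniform_graph` keeps vertex arrival but chooses every link target independently, in the spirit of an earlier model generalized to account for arrivals.

```python
import random
from collections import Counter


def copying_graph(n, d=3, alpha=0.2, seed=0):
    """Grow a directed graph one vertex at a time (hypothetical copying-style sketch).

    out[v] lists the d targets of vertex v's out-links. With probability alpha
    a link goes to a uniformly random earlier vertex; otherwise it copies the
    corresponding link of a randomly chosen "prototype" vertex, which makes
    edges statistically dependent on each other.
    """
    rng = random.Random(seed)
    # Seed vertex 0 with d self-loops so every vertex has exactly d out-links.
    out = [[0] * d]
    for v in range(1, n):
        prototype = rng.randrange(v)  # uniformly chosen existing vertex
        links = []
        for i in range(d):
            if rng.random() < alpha:
                links.append(rng.randrange(v))  # fresh, independent choice
            else:
                links.append(out[prototype][i])  # copied, hence dependent
        out.append(links)
    return out


def uniform_graph(n, d=3, seed=0):
    """Same arrival process, but every link target is chosen independently."""
    rng = random.Random(seed)
    out = [[0] * d]
    for v in range(1, n):
        out.append([rng.randrange(v) for _ in range(d)])
    return out


def max_in_degree(out):
    """Largest in-degree over all vertices (self-loops included)."""
    indeg = Counter(t for links in out for t in links)
    return max(indeg.values())


if __name__ == "__main__":
    n = 20000
    print("copying max in-degree:", max_in_degree(copying_graph(n)))
    print("uniform max in-degree:", max_in_degree(uniform_graph(n)))
```

Comparing the two maximum in-degrees for the same `n` is a quick way to see the effect of dependence: copied links tend to accumulate on vertices that are already heavily linked, so the copying graph usually shows a much heavier in-degree tail than the independent baseline. The exact figures depend on the seed and on `d` and `alpha`.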

INDEX TERMS

directed graphs; random processes; information resources; stochastic processes; stochastic models; Web graph; directed graph; static HTML Web page; hyperlink; random graph model; statistics; vertices; sparse random graphs; Erdos-Renyi models

