1521-9615/08/$25.00 © 2008 IEEE
Published by the IEEE Computer Society
A Googol of Information about Google
Timothy P. Chartier
Editor: Mario Belloni, mabelloni@davidson.edu
Send announcements to bortega@computer.org

Have you ever wondered about the mathematics and computer science behind Web pages' search engine rankings? For instance, how does Google determine the Web page rankings that correspond to your submitted queries? Amy Langville and Carl Meyer answer these and other questions in Google's PageRank and Beyond: The Science of Search Engine Rankings. Whether you're a mathematician, computational scientist, or just someone with a general interest in the science behind search engine results, this book should be on your desk or nightstand.
Given the broad spectrum of topics the book addresses, the authors anticipate their readers' varied interests and backgrounds. Researchers will learn the science behind search engines, while casual readers will enjoy the anecdotes scattered throughout the book. To this end, the authors supply navigation tips that emulate the nonlinear structure of hyperlinks, letting readers jump to the parts of the book that match their interests. For readers who want to delve into the book's mathematical proofs but last studied math from a dusty tome sitting on a shelf, the last chapter contains a thorough primer on linear algebra and the other mathematical ideas the book relies on.
The text concentrates largely on Google's PageRank algorithm, which the company's founders, Larry Page and Sergey Brin, developed while they were graduate students at Stanford University. Chapter 1 presents a history of information retrieval and, depending on the reader's age, either reminds readers of or introduces them to the state of information retrieval before the advent of Google and its revolutionary algorithm.
Google uses a model of search engine activity to order the Web pages that match a user's query. Its rankings combine a measure of each Web page's relevance to the query with a measure of the page's quality. The book concentrates largely on the second of these, that is, on how Google ranks Web pages by quality. Thus, if two pages are equally relevant to a query, the page deemed more popular, or of higher quality, will be ranked above the other.
The book carefully examines the PageRank algorithm as a Markov process in which a Web page's rank corresponds to the magnitude of its entry in the dominant eigenvector of the associated Markov transition matrix. Although undergraduate linear algebra classes cover eigenvector computation, readers will quickly discern that finding the PageRank vector is much more difficult than finding an eigenvector of the small matrices typically considered in the classroom. How do you store a matrix with billions of rows and columns? Which computational algorithms are most efficient for a problem of this size? How do you adapt modern numerical methods to a matrix of such large dimension? And given the Web's ever-changing nature, how do you efficiently update Web page rankings? Langville and Meyer address these questions with references to modern research and to the challenging open questions that remain in the field.
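To make the eigenvector computation concrete, here is a minimal sketch of PageRank by power iteration. It is not the book's Matlab code. It assumes the commonly cited form in which the ranking vector π satisfies π^T = π^T G, with G = αS + (1 - α)/n · ee^T, where S is the row-normalized hyperlink matrix and e is the all-ones vector. The damping factor α = 0.85, the uniform redistribution of rank from pages with no outlinks ("dangling" pages), and the hypothetical function name `pagerank` are all assumptions for illustration. Links are stored sparsely as a dict of outlink lists, so the dense n × n matrix is never formed, which hints at how the storage question is handled at Web scale.

```python
def pagerank(links, n, alpha=0.85, tol=1e-10, max_iter=1000, start=None):
    """Power iteration for a PageRank-style vector (illustrative sketch).

    links: dict mapping page index -> list of page indices it links to.
           Only existing hyperlinks are stored, never the full n x n matrix.
    alpha: damping factor (0.85 is an assumed, commonly cited value).
    start: optional previous ranking, reused as a warm start after the
           link graph changes (one simple way to speed up updating).
    """
    if start is not None:
        total = sum(start)
        x = [v / total for v in start]  # normalize the warm start
    else:
        x = [1.0 / n] * n  # uniform starting vector
    for _ in range(max_iter):
        new = [0.0] * n
        dangling = 0.0
        for i in range(n):
            out = links.get(i, [])
            if out:
                share = x[i] / len(out)  # rank split evenly over outlinks
                for j in out:
                    new[j] += alpha * share
            else:
                dangling += x[i]  # page with no outlinks
        # Teleportation term plus dangling rank spread uniformly (assumption).
        base = (1.0 - alpha) / n + alpha * dangling / n
        new = [v + base for v in new]
        diff = sum(abs(a - b) for a, b in zip(new, x))  # 1-norm change
        x = new
        if diff < tol:
            break
    return x


if __name__ == "__main__":
    # Toy four-page Web: page 3 links out but nothing links to it.
    links = {0: [1, 2], 1: [2], 2: [0], 3: [2]}
    ranks = pagerank(links, 4)
    print(sorted(range(4), key=lambda p: -ranks[p]))  # [2, 0, 1, 3]
```

In this toy graph, page 3 receives only the teleportation share (1 - 0.85)/4 = 0.0375, while page 2, which every other page links to, ranks first. Each iteration costs time proportional to the number of links rather than n², which is why power iteration remains attractive for very large, very sparse matrices.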
Although much of the book deals with the PageRank algorithm, the authors introduce alternative algorithms as well. In particular, readers can learn about Hyperlink-Induced Topic Search (HITS) and the Stochastic Approach for Link-Structure Analysis (Salsa), and can experiment with Matlab code for many of the algorithms.
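For contrast with PageRank, the sketch below illustrates the core HITS iteration. HITS gives each page two scores: an authority score, which is high when good hubs link to the page, and a hub score, which is high when the page links to good authorities. With L the adjacency matrix, the updates are a = L^T h and h = L a, so a tends toward the dominant eigenvector of L^T L. In practice HITS is usually run on a small query-dependent neighborhood graph; here it runs on a toy graph. The function name `hits` and the choice of 2-norm normalization are assumptions, and Salsa is not shown.

```python
import math


def hits(links, n, tol=1e-10, max_iter=1000):
    """Alternating hub/authority updates (illustrative HITS sketch).

    links: dict mapping page index -> list of page indices it links to.
    Returns (hubs, authorities), each scaled to unit 2-norm (assumption).
    """
    hubs = [1.0] * n
    auths = [1.0] * n
    for _ in range(max_iter):
        # Authority update a = L^T h: sum hub scores of pages linking in.
        new_auth = [0.0] * n
        for i, out in links.items():
            for j in out:
                new_auth[j] += hubs[i]
        norm = math.sqrt(sum(v * v for v in new_auth)) or 1.0
        new_auth = [v / norm for v in new_auth]
        # Hub update h = L a: sum authority scores of pages linked to.
        new_hub = [0.0] * n
        for i, out in links.items():
            new_hub[i] = sum(new_auth[j] for j in out)
        norm = math.sqrt(sum(v * v for v in new_hub)) or 1.0
        new_hub = [v / norm for v in new_hub]
        diff = sum(abs(a - b) for a, b in zip(new_auth, auths))
        diff += sum(abs(a - b) for a, b in zip(new_hub, hubs))
        auths, hubs = new_auth, new_hub
        if diff < tol:
            break
    return hubs, auths


if __name__ == "__main__":
    # Pages 0 and 1 both point to 2 and 3; page 2 also points to 3.
    h, a = hits({0: [2, 3], 1: [2, 3], 2: [3], 3: []}, 4)
    print("authorities:", [round(v, 3) for v in a])
    print("hubs:", [round(v, 3) for v in h])
```

Here page 3, which three pages link to, emerges as the strongest authority, and pages 0 and 1 tie as the strongest hubs. Unlike PageRank's single query-independent vector, HITS produces two scores per page.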
This book is an excellent resource for researchers because it collects a large body of work into one volume and supplies citations for further study. Teachers and students alike will see linear algebra, mathematical modeling, and computational science applied to real-world problems. Also, the book's many stories and asides about search engine analysis are a treasure in themselves. For instance, you can learn about the computing power used in Web indexing wars, how companies might exploit the PageRank algorithm to boost their Web page rankings, or how someone launched a "Google bomb" that returned the official White House biography of President George W. Bush as the top-ranked Web page when someone entered "miserable failure" into the search engine. Langville and Meyer weave together modern research on search engine analysis with humorous and interesting stories about the world of search engines. Along with the book's ample citations, readers have a googol (the number 10^100, after which Google was named, reflecting the company's goal of organizing the Web's information) of directions to follow, depending on their personal interests.
Timothy P. Chartier is an assistant professor of mathematics at Davidson College. His research areas are numerical partial differential equations and numerical linear algebra, which he pursues in collaboration with researchers at Lawrence Livermore and Los Alamos National Laboratories. After completing his doctoral work in applied mathematics at the University of Colorado at Boulder and finishing a Vertical Integration of Research and Education in the Mathematical Sciences (Vigre) postdoctoral position at the University of Washington, Chartier joined the faculty at Davidson. Contact him at tichartier@davidson.edu.