# Using Web Search Engines to Find and Refind Information

Robert G. Capra, Virginia Tech
Manuel A. Pérez-Quiñones, Virginia Tech

Pages: pp. 36-42

Abstract—To inform the design of next-generation Web search tools, researchers must better understand how users find, manage, and refind online information. Synthesizing results from one of their studies with related work, the authors propose a search engine use model based on prior task frequency and familiarity.

Search engines, which receive approximately 550 million requests per day [1], play a vital role in finding and filtering the vast amount of data available on the Web. However, despite the availability of more accurate and efficient search algorithms, even experienced users have trouble refinding and managing information found on the Web [2-5]. Further, users often do not know what data they will need to reaccess, a problem known as postvalue recall [6]. Even when they do anticipate the information's value, users often encounter difficulties with existing organizational tools such as bookmarks.

As University of Tampere researchers recently observed, a major drawback of general-purpose search engines as refinding tools is that "finding relevant information is often an iterative process" and "it can be almost impossible to remember the exact query that was used when a specific piece of information was found" [2]. Refinding information thus involves more than simply fine-tuning a search algorithm [3].

Early search engines were not optimized to access previously viewed information; for example, most order results by criteria independent of a user's browsing history. However, as the "Emerging Web Search Tools and Features" sidebar describes, many search engines and Web browsers now offer enhanced features, such as localized search results and customizable toolbars, to increase their usability and utility. Some of these new features are starting to focus on helping users manage and relocate information they find on the Web, for example by providing access to histories of prior searches.

To better inform the design of next-generation Web search tools, researchers have begun to examine users' finding and refinding behaviors as well as the limitations of existing search technologies [2-10]. As part of this effort, Virginia Tech's Center for Human-Computer Interaction (www.hci.vt.edu) is exploring how users find and refind online data and the factors that affect these processes. Synthesizing results from one of our laboratory studies with related work, we have developed a search engine use model based on how frequently users perform a given task as well as how familiar they are with that task.

## Finding versus Refinding

Finding and refinding present different user challenges. Finding information for the first time is often an exploratory activity. Users apply knowledge of the Web, intuition, and browsing and foraging strategies together with tools such as search engines to find the desired information. This process can involve substantial uncertainty: Is the information available? Where is it? What form is it in?

Users often have only partial information when initiating a search. They might know some words to input in a search engine query or know something about the Web site where they expect to locate the data. However, they may lack the "key" or "clue" to precisely locate the information using available tools.

In contrast, refinding is often a more focused process. Users know the information is out there because they have already seen it. The task therefore requires getting back to the information. While finding relies only on recognition (Is this the information I was looking for?), refinding relies on recognition as well as recall (Where did I see that?). Refinding thus involves some degree of certainty. For example, users might recall or recognize waypoints, the important or memorable Web sites along the path they took when they first found the information [11].

However, users still must deal with partial information; after all, if they remembered all the relevant details, they could directly access the desired data. Thus, context is critical to the refinding process: Users might recognize a key link or Web site because they see it in the same context as when they first found the information.

## Search Engine Use Study

During the fall of 2003, the Center for Human-Computer Interaction conducted a study of search engine use in finding and refinding information on the Web. The goal was to explore how users completed different types of directed tasks and to learn about their previous experience with similar tasks. In addition to exploring the differences between finding and refinding, we were particularly interested in how two factors affected search engine use:

• prior frequency, or how often participants did a task in their daily life, and
• prior familiarity, or how familiar participants were with the requested information's location.

### Study methodology

Seventeen Virginia Tech undergraduates participated in two sessions in a controlled laboratory setting that enabled us to compare the findings across the two sessions. In the first session, we gave the participants a set of 18 tasks that asked them to look for certain information on the Web; in the second session, about a week later, we asked the participants to refind the same or similar data.

Table 1 describes 12 of the tasks, which involved finding specific pieces of information such as a phone number, a sports score, or flight reservation information. The other tasks are beyond the scope of this article.

Table 1. Task descriptions in search engine use study.

Before each task, we asked the participants how often they did this type of task in their daily life and how familiar they were with the location of the requested information. They responded to the frequency question using an ordinal scale. We grouped the responses into the following levels: low = less often than once per month, medium = once or several times per month, and high = once per week or more. The familiarity question used a seven-point Likert-type scale, which we grouped into low (1-2), medium (3-5), and high (6-7).
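The grouping rules above can be expressed as two small functions. This is a minimal sketch: the article does not list the response options on the frequency scale, so the frequency function assumes, hypothetically, that each answer has been converted to an estimated number of occurrences per year, with 12 per year standing in for "once per month" and 52 per year for "once per week." The function names are illustrative.

```python
def bin_familiarity(rating: int) -> str:
    """Map a 1-7 Likert-type familiarity rating to the study's three levels."""
    if not 1 <= rating <= 7:
        raise ValueError(f"rating must be between 1 and 7, got {rating}")
    if rating <= 2:
        return "low"      # ratings 1-2
    if rating <= 5:
        return "medium"   # ratings 3-5
    return "high"         # ratings 6-7


def bin_frequency(times_per_year: float) -> str:
    """Map an estimated yearly task frequency to low/medium/high.

    Assumption (not from the article): answers were converted to yearly
    counts, with 12/year = once per month and 52/year = once per week.
    """
    if times_per_year < 12:
        return "low"      # less often than once per month
    if times_per_year < 52:
        return "medium"   # once or several times per month
    return "high"         # once per week or more
```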

We used screen capture and data-logging software to record participants' actions and the URLs they visited while completing the tasks. To track search engine use, we ran the log of URLs through a script that automatically counted the number of search engine result pages based on features of the URLs returned by major search engines.

The software counted all accesses to search result pages. If a participant used the back button to return to a search results page, the software counted the page a second time. We decided that this type of counting was appropriate for our analysis because the participant was explicitly referring back to the search results—that is, using them a second time.
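The article does not publish its URL-matching rules, so the following sketch only illustrates the approach under stated assumptions: the host, path, and query-parameter signatures in `SERP_SIGNATURES` are hypothetical examples, not the patterns the study's script used. Because every load of a result page counts, a Back-button revisit appears twice in the log and is counted twice.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical signatures: (host suffix, path prefix, query parameter that
# carries the search terms). Illustrative only, not the study's actual rules.
SERP_SIGNATURES = [
    ("google.com", "/search", "q"),
    ("yahoo.com", "/search", "p"),
    ("msn.com", "/results.aspx", "q"),
]


def is_search_results_page(url: str) -> bool:
    """Return True if the URL matches one of the result-page signatures."""
    parts = urlparse(url)
    host = parts.hostname or ""
    params = parse_qs(parts.query)
    for host_suffix, path_prefix, query_param in SERP_SIGNATURES:
        host_matches = host == host_suffix or host.endswith("." + host_suffix)
        if host_matches and parts.path.startswith(path_prefix) and query_param in params:
            return True
    return False


def count_search_pages(url_log: list[str]) -> int:
    """Count every load of a result page, including revisits via Back."""
    return sum(1 for url in url_log if is_search_results_page(url))
```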

To compare search engine data across tasks, we normalized the counts using a search ratio: the total number of search engine uses divided by the total number of pages (URLs) loaded for that task. We also recorded whether or not a search engine was used at all for each observed task attempt. We refer to this binary measure as the search used variable (1 = search engine used, 0 = search engine not used). The analyses presented here include only tasks that participants completed successfully.
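The search ratio and the search used variable follow directly from the definitions above. In this sketch, `search_ratio` and `search_used` are hypothetical helper names, and the inputs are the per-task counts produced by a log-processing step such as the one the study used.

```python
def search_ratio(search_page_loads: int, total_page_loads: int) -> float:
    """Search engine uses divided by all pages (URLs) loaded for the task."""
    if total_page_loads <= 0:
        raise ValueError("a task attempt must load at least one page")
    if not 0 <= search_page_loads <= total_page_loads:
        raise ValueError("search page loads must be between 0 and the total")
    return search_page_loads / total_page_loads


def search_used(search_page_loads: int) -> int:
    """Binary search used variable: 1 if any result page was loaded, else 0."""
    return 1 if search_page_loads > 0 else 0
```

For instance, a hypothetical task attempt that loads eight pages, two of them result pages (one reached again with the Back button), has a search ratio of 2/8 = 0.25 and a search used value of 1.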

### Overall search engine use

Participants used Web search engines to complete 38 percent of the tasks in the first session (68 of 179 tasks) and 32.5 percent of the tasks in the second session (54 of 166 tasks). A chi-square test ($\chi^2$ = 1.12, df = 1, α = 0.05) did not detect any effect of session on the search used variable. These results indicate that

• there was no significant difference between the two sessions in the number of tasks that included some use of a search engine, and
• participants carried out most tasks in both sessions without using search engines.
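The reported chi-square statistic can be checked from the counts given above. The article does not state whether a continuity correction was applied; the uncorrected Pearson statistic for the 2 × 2 table of session by search use, sketched below, gives approximately 1.12, matching the reported value, and falls well below 3.841, the critical value for df = 1 at α = 0.05.

```python
def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Rows: session 1, session 2. Columns: search used, search not used.
used_1, total_1 = 68, 179
used_2, total_2 = 54, 166
chi2 = chi_square_2x2(used_1, total_1 - used_1, used_2, total_2 - used_2)

CRITICAL_DF1_ALPHA_05 = 3.841  # chi-square critical value, df = 1, alpha = 0.05
print(f"chi2 = {chi2:.2f}, significant: {chi2 > CRITICAL_DF1_ALPHA_05}")
# prints: chi2 = 1.12, significant: False
```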

Our findings are consistent with other recent research. For example, MIT's Haystack Project (http://haystack.lcs.mit.edu), which is developing a system to manage semistructured data [12], observed that people use search engines only 39 percent of the time to access their personal information spaces [3]. Richard Boardman and Martina Sasse likewise reported a strong preference among users for browsing over search in managing files, e-mail, and bookmarks [4]. In addition, a study by the University of Washington's Keeping Found Things Found project (http://kftf.ischool.washington.edu) showed that, to manage information for reuse, users employ various methods including e-mailing URLs to themselves, saving Web pages to a local disk, and printing out Web pages [5].

### Prior task frequency and familiarity

Two analyses of variance showed that prior task frequency and familiarity significantly impacted search engine use (p = 0.01). As Figure 1a shows, tasks that participants performed with high frequency in their daily lives had a significantly lower mean search ratio than tasks with medium or low frequency. Similarly, as Figure 1b shows, tasks with high prior familiarity had a significantly lower mean search ratio than those with low prior familiarity. For both analyses, we conducted post hoc tests using the Tukey-Kramer adjustment for multiple comparisons to guard against Type I error.
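For readers less familiar with the test, a one-way analysis of variance compares the variation between group means with the variation within groups. The sketch below computes only the F statistic; in this study the groups would be the per-task search ratios at each frequency (or familiarity) level. The values used to check it are toy numbers, not study data. Obtaining a p value and the Tukey-Kramer post hoc comparisons requires the F and studentized range distributions, which a statistics library such as SciPy provides (for example, `scipy.stats.f_oneway` and `scipy.stats.tukey_hsd`).

```python
def one_way_anova_f(groups: list[list[float]]) -> tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    n, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / n
    group_means = [sum(g) / len(g) for g in groups]

    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group sum of squares: spread of values around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```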

Figure 1. Impact of prior task frequency and familiarity on search engine use. (a) Tasks with high frequency had a significantly lower mean search ratio than those with medium or low frequency. (b) Tasks with high prior familiarity had a significantly lower mean search ratio than those with low prior familiarity.

These results suggest that users develop techniques for accessing information they know they will need (information with recurring value) and that they are less likely to use a search engine as part of this process. For example, when asked to find local weather information, all the participants in our study knew a specific way to get to this data without using a search engine. The more familiar participants were with a task, or the more frequently they performed it in their daily life, the less they relied on search engines.

### Difference between sessions

Another analysis of variance did not detect any overall difference in the mean search ratio from the first session (0.11, n = 179) to the second session (0.10, n = 166). This suggests that participants who used a search engine to complete a task during the first session likely also used it during the second session, and that those who did not use a search engine during the first session probably did not use one during the second session either.

These results indicate that a single exposure to a task does not change search engine behavior and that people strongly prefer particular techniques for accessing certain types of information. Such patterns may change over time as users realize the value of the data or reaccess it more frequently, but they are fairly stable from day to day.

Our study revealed that the type of task had a major influence on search engine use. Known Web resources often have real-world counterparts such as phone books, newspapers, and dictionaries that people are familiar with using. Thus, for some types of tasks, people find it easier and more reassuring to consult a known Web resource than to conduct a full Web search. For example, almost all the participants in our study knew a URL for an online dictionary and accessed it directly to look up word definitions. Table 2 summarizes some of the Internet resources that study participants used to complete tasks and the real-world counterpart of each online source.

Table 2. Traditional and Internet information sources.

Participants typically used search engines when they did not know a source or when the task required them to find very specific or detailed information. Interestingly, some relied on search engines to find online versions of trusted traditional sources. For example, one participant searched for "white pages" and "yellow pages" to locate a source for a local business phone number rather than directly searching for the business name.

### Refinding moved links and Web pages

One task in our study that was not included in the previous analyses asked participants to find and read a news article during the first session and then try to refind the same article in the second session. In many cases, participants discovered that, during the intervening week, the link to the story had moved or been removed. For example, many participants read articles linked from the CNN homepage in the first session and found that the links were different when they returned for the second session; only two of the 17 participants reported successfully refinding the same article they had read on the first day.

This aspect of our study underscored just how difficult it can be to refind information on the Web. In some cases, reaccessing data can be impossible if the source has removed the information. Additional research is needed on tools that support refinding when the Web changes, such as the work of Jaime Teevan at MIT [13].

## Search Engine Use Model

Finding and refinding are intuitively different tasks, but prior task frequency and familiarity, as well as the nature of the task, all influence how people reaccess information on the Web. Based on the results of our study as well as other related research, we have developed a model of search engine use for directed tasks.

### Finding-refinding continuum

As Figure 2 shows, finding and refinding lie along a continuum from acquiring information for the first time to using specific strategies to carry out familiar tasks. Individual access patterns may vary little from one session to another but can change over time as the value of information becomes apparent or increases. At some point, users may adopt a more efficient approach that requires learning or memorization—for example, remembering a specific URL rather than tracing a path to it from a known source.

Figure 2. Finding and refinding continuum. Prior task frequency and familiarity influence finding and refinding behavior over time.

#### Quadrant I: low frequency, low familiarity.

The user is looking for information for the first time. The user may make use of information-foraging strategies [14] and rely on prior related experiences. For example, a user who knows how to search for airline reservations might be able to find hotel prices fairly easily without ever having reserved a hotel online. Even for new tasks, familiarity with related domains might move the user from quadrant I into one of the other quadrants.

#### Quadrant II: low frequency, high familiarity.

The user has some familiarity with the tasks but does not perform them frequently. These tasks might be well understood or important but performed only periodically. Examples include logging in to a bank account once or twice a month, occasionally looking up a phone number, or registering for courses on a university Web site at the start of each semester. Users might develop access patterns for these tasks but do not invest the time to create or learn significant shortcuts.

#### Quadrant III: high frequency, high familiarity.

The user performs the tasks repetitively. For example, some users check the weather frequently, perhaps multiple times in one day. In this quadrant, users are highly familiar with the task and strongly aware of the information's value. Thus, they adopt information-seeking behaviors that rely less on search engines and instead invest time or mental resources to create a streamlined or well-known access method. For example, they might commit a URL to memory, bookmark the page, routinely follow a specific sequence of links from a known starting point, or use their Web browser's autocompletion feature to access the information.

#### Quadrant IV: high frequency, low familiarity.

The user performs confusing or difficult tasks fairly often but is not yet proficient at them. Tasks in this quadrant are inherently difficult. One example is reaccessing a Web site with a long or complicated URL by manually typing in the address rather than bookmarking the site (which would move the task into quadrant III). Another example would be summarizing the previous day's headlines from various news sites, an inherently challenging task given the Web's dynamic nature. Quadrant IV probably contains fewer tasks than the other quadrants.
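Read together, the four quadrant descriptions form a 2 × 2 grid over prior task frequency and prior familiarity. The sketch below encodes that reading. It assumes, hypothetically, that each dimension has already been reduced to a yes/no judgment; the article does not say how the study's three-level ratings would map onto the model's two levels.

```python
from enum import Enum


class Quadrant(Enum):
    I = "low frequency, low familiarity"     # finding for the first time
    II = "low frequency, high familiarity"   # periodic, well-understood tasks
    III = "high frequency, high familiarity" # repetitive tasks with shortcuts
    IV = "high frequency, low familiarity"   # frequent but still difficult


def classify_task(frequent: bool, familiar: bool) -> Quadrant:
    """Place a task in the finding-refinding model's 2 x 2 grid."""
    if frequent:
        return Quadrant.III if familiar else Quadrant.IV
    return Quadrant.II if familiar else Quadrant.I
```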

#### User movement.

Many factors can cause a user to move around the quadrants for any given task. The arrow in Figure 2 indicates one possible progression: The user goes from doing a task for the first time to developing an access pattern once it becomes more familiar to establishing shortcuts when it becomes important enough to access frequently.

Movement from quadrant III to II or from quadrant II to I is also possible. Users who do not perform a task for a long time may forget shortcuts and access patterns. In addition, exposure to many similar stimuli can make recall more difficult [15]. For example, it can be hard to remember which of several similar search queries originally led to the desired information [2].
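The movements described here can be summarized as a small transition table. This is an illustrative sketch, not part of the published model: the event names are hypothetical labels for the prose descriptions (gaining familiarity, starting to perform a task frequently, creating a shortcut such as a bookmark for a quadrant IV task, and a long period of disuse), and any event not listed leaves the task where it is.

```python
# Hypothetical event labels for the movements described in the text.
TRANSITIONS = {
    ("I", "became_familiar"): "II",     # an access pattern develops
    ("II", "became_frequent"): "III",   # shortcuts are established
    ("IV", "created_shortcut"): "III",  # e.g., bookmarking a hard-to-type URL
    ("III", "long_disuse"): "II",       # shortcuts are forgotten
    ("II", "long_disuse"): "I",         # access patterns are forgotten
}


def move(quadrant: str, event: str) -> str:
    """Return the task's quadrant after an event; unlisted events change nothing."""
    return TRANSITIONS.get((quadrant, event), quadrant)


# The progression shown by the arrow in Figure 2, followed by a long gap.
quadrant = "I"
for event in ["became_familiar", "became_frequent", "long_disuse"]:
    quadrant = move(quadrant, event)
    print(event, "->", quadrant)
```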

### Tool design implications

Our results have several implications for the design of Web search tools. For example, users who are already proficient at tasks in quadrant III probably do not need help with them. However, if a user returns to a task previously in quadrant II or III after a period of not performing it, a refinding tool could help with remembering prior access patterns, possibly through a search history mechanism. For tasks in quadrant I, a user may benefit from a tool that could help distinguish between similar searches, recognize waypoints along a previous path, or identify a relevant bookmark from a long list. Understanding these quadrants' characteristics may help tool designers provide more tailored assistance with refinding.
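The design implications above can be read as a lookup from a task's quadrant to the kind of support a refinding tool might offer. The sketch below is a hypothetical encoding of this paragraph only; the function name is illustrative, and it returns `None` for situations the paragraph does not address, such as quadrant IV.

```python
from typing import Optional


def suggest_support(quadrant: str, returning_after_gap: bool = False) -> Optional[list[str]]:
    """Map a task's quadrant to the kinds of refinding support discussed in the text."""
    if quadrant in ("II", "III") and returning_after_gap:
        return ["search history to recall prior access patterns"]
    if quadrant == "III":
        return []  # proficient users probably need no help
    if quadrant == "I":
        return [
            "distinguish between similar past searches",
            "recognize waypoints along a previous path",
            "find a relevant bookmark in a long list",
        ]
    return None  # not covered by the design implications above
```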

## Conclusion

To enable the design and evaluation of more effective Web search tools, additional studies of users' finding and refinding behaviors are needed—both longitudinal studies to examine strategies and practices in actual use, and laboratory studies with controlled conditions to isolate specific factors that influence users' approaches. We believe that future tools must consider not just how users access data but also how these access patterns change over time: The tools must go beyond traditional keyword searching and leverage users' recognition and recall abilities.

The new search engine features described in the sidebar were either unavailable or early in their deployment at the time of our study and require further evaluation to fully understand their benefits and limitations. These features bring new tasks into the realm of what search engines can efficiently locate, and it will be interesting to see whether users with existing access strategies change their finding and refinding behaviors.

Our ongoing work [16] is part of a wider effort by researchers to learn how users use and reuse diverse types of electronic information, including e-mail, personal documents, address books, and electronic calendars [17]. The increasing reliance on mobile devices and the sharing of data among multiple computers have added to the complexity of the challenge. Many emerging technologies in addition to Web search tools provide more support for information management and filtering; the Semantic Web is a good example of this trend. However, much work remains to be done, and the field for technological innovation is fertile.

## Emerging Web Search Tools and Features

New Web search tools and Web browser features underscore the need for better support for filtering, managing, and refinding online data.

#### Localized search

This feature uses geographic information such as postal zip codes to return search results relevant to the user's location. For example, searching for "pizza 24060" on Google returns a listing of pizza restaurants in Blacksburg, Virginia. Users can provide their location as part of the search query or, in some cases, register their location with the search engine.
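The localized search example can be illustrated by separating a ZIP code from the topical part of a query. This sketch assumes US five-digit ZIP codes (optionally with a four-digit extension) and uses a hypothetical function name; it does not describe how Google or any other engine actually parses queries, and it ignores the registered-location case.

```python
import re
from typing import Optional

# US ZIP code: five digits, optionally followed by a hyphen and four digits.
ZIP_PATTERN = re.compile(r"\b(\d{5})(?:-\d{4})?\b")


def split_local_query(query: str) -> tuple[str, Optional[str]]:
    """Return (topical terms, five-digit ZIP code or None) for a search query."""
    match = ZIP_PATTERN.search(query)
    if match is None:
        return " ".join(query.split()), None
    # Remove the matched ZIP code and normalize the remaining whitespace.
    remainder = query[:match.start()] + " " + query[match.end():]
    return " ".join(remainder.split()), match.group(1)
```

With this sketch, `split_local_query("pizza 24060")` yields the terms `"pizza"` and the location `"24060"`, which a local listings database could then use.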

#### Multiple databases

Automatic searching of multiple databases such as dictionaries, yellow pages, and book and movie listings enables search engines to tailor results, which often appear in visually distinct areas of the results page based on the different databases searched.

For example, the A9.com search engine provides check boxes for issuing a single query over multiple databases, and it displays the results in newspaper-style columns. Thus, as Figure A shows, a search for the term "white castle" displays relevant Web sites, books, and movies, and, if applicable, the address and phone number of a local White Castle restaurant.
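A single query issued over several databases, with results grouped by source, can be sketched as follows. The in-memory `DATABASES` dictionary and its records are hypothetical placeholders for independent indexes, the `selected` argument plays the role of A9.com's check boxes, and matching is a simple case-insensitive test that every query term appears in a record.

```python
# Hypothetical stand-ins for separate databases; a real service would query
# independent indexes, possibly in parallel.
DATABASES: dict[str, list[str]] = {
    "web": ["White Castle restaurant home page", "Fast food history site"],
    "books": ["A history of the White Castle hamburger chain"],
    "movies": ["A comedy about a trip to White Castle"],
    "yellow_pages": ["White Castle (placeholder address and phone)"],
}


def multi_database_search(query: str, databases: dict[str, list[str]],
                          selected: list[str]) -> dict[str, list[str]]:
    """Run one query over each selected database; one result column per source."""
    terms = query.lower().split()
    columns: dict[str, list[str]] = {}
    for name in selected:
        records = databases.get(name, [])
        columns[name] = [r for r in records if all(t in r.lower() for t in terms)]
    return columns
```

Returning one list per database keeps the sources separate, which is what allows a results page to lay them out as visually distinct columns.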

Figure A. A9.com search engine. Users can issue a single query over multiple databases, and the search engine displays the results in newspaper-style columns.

#### Custom toolbars

Many search engine providers and third parties are releasing toolbars that integrate with Web browsers to provide a more personalized search experience. Toolbar features include highlighting search terms in returned pages, storing online bookmarks and browsing histories, saving queries for reuse, saving Web page snapshots on a server, and searching a personal set of saved Web pages.
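Saving queries for reuse addresses a refinding problem noted earlier: users struggle to remember the exact query that led to a page. The sketch below pairs each saved query with the pages the user opened from its results, so a later keyword lookup can recover both. The class and method names are hypothetical and do not describe any particular toolbar.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class SavedQuery:
    """A query the user issued and the result pages opened from it."""
    query: str
    opened_urls: list[str] = field(default_factory=list)
    issued_at: datetime = field(default_factory=datetime.now)


class QueryHistory:
    """Hypothetical toolbar-style store of past queries."""

    def __init__(self) -> None:
        self._entries: list[SavedQuery] = []

    def record(self, query: str, *opened_urls: str) -> SavedQuery:
        entry = SavedQuery(query, list(opened_urls))
        self._entries.append(entry)
        return entry

    def find(self, word: str) -> list[SavedQuery]:
        """Saved queries containing the word, most recent first."""
        word = word.lower()
        return [e for e in reversed(self._entries) if word in e.query.lower().split()]


history = QueryHistory()
history.record("cheap flights boston", "http://example.com/fares")
history.record("boston weather", "http://example.com/forecast")
for entry in history.find("boston"):
    print(entry.query, entry.opened_urls)
```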

#### Personal information search

Users can install tools that search for information stored on their computer. These tools create an index of the user's documents, including personal files, e-mail, Web pages, and instant message logs. Several have the ability to index text and metadata from Microsoft Office documents, PDF documents, and other file formats.
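A common way to build such an index is an inverted index that maps each term to the documents containing it. The minimal sketch below indexes plain text only; extracting text and metadata from Office or PDF files, which these tools support, is out of scope, and the document identifiers are hypothetical.

```python
import re
from collections import defaultdict

TOKEN_PATTERN = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric terms."""
    return TOKEN_PATTERN.findall(text.lower())


class InvertedIndex:
    """Maps each term to the set of document IDs that contain it."""

    def __init__(self) -> None:
        self._postings = defaultdict(set)

    def add(self, doc_id: str, text: str) -> None:
        for term in tokenize(text):
            self._postings[term].add(doc_id)

    def search(self, query: str) -> set[str]:
        """Return IDs of documents containing every query term (AND semantics)."""
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self._postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self._postings.get(term, set())
        return result


index = InvertedIndex()
index.add("mail-0412", "Flight reservation confirmation for Boston")
index.add("notes.txt", "Weather in Boston this weekend")
print(sorted(index.search("boston")))  # ['mail-0412', 'notes.txt']
print(index.search("boston flight"))   # {'mail-0412'}
```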

#### OS refinding support

Operating system vendors are also incorporating tools that support personal information search by indexing multiple data sources and providing access to them from a single, unified interface.

For example, Microsoft is investigating refinding in its Stuff I've Seen research project [1] and is designing the new WinFS file system to make finding and reusing information easier [2]. The latest version of Apple Computer's Mac OS X, Tiger (v10.4), includes Spotlight (www.apple.com/macosx/features/spotlight), a personal search tool that supports conducting searches based on temporal references such as "today" or "last week."
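Resolving a temporal reference such as "today" or "last week" to a date range, then filtering items by modification date, illustrates the idea behind this kind of search. The sketch is hypothetical and does not reflect Spotlight's implementation; in particular, it assumes "last week" means the previous Monday-to-Sunday calendar week rather than the last seven days.

```python
from datetime import date, timedelta


def resolve_temporal_reference(phrase: str, today: date) -> tuple[date, date]:
    """Return the inclusive (start, end) dates for a simple temporal phrase."""
    phrase = phrase.strip().lower()
    if phrase == "today":
        return today, today
    if phrase == "yesterday":
        day = today - timedelta(days=1)
        return day, day
    if phrase == "last week":
        # Assumption: the previous calendar week, Monday through Sunday.
        this_monday = today - timedelta(days=today.weekday())
        return this_monday - timedelta(days=7), this_monday - timedelta(days=1)
    raise ValueError(f"unsupported temporal reference: {phrase!r}")


def items_in_range(items: list[tuple[str, date]], start: date, end: date) -> list[str]:
    """Names of items whose modification date falls within [start, end]."""
    return [name for name, modified in items if start <= modified <= end]


files = [("trip-notes.txt", date(2005, 6, 8)), ("budget.xls", date(2005, 6, 15))]
start, end = resolve_temporal_reference("last week", today=date(2005, 6, 15))
print(items_in_range(files, start, end))  # ['trip-notes.txt']
```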

## Acknowledgments

We thank Miranda Capra for her support and assistance with data analysis, and Mary Pinney for her help running the study and for her work on a related research project. We also thank the reviewers and editors for their support and improvements to the article. This work was supported in part by the National Science Foundation under grant no. IIS-0049075.

## References

• 1. L. Grossman, "Search and Destroy: A Gang of Web-Search Companies Is Gunning for Google," Time, 22 Dec. 2003, pp. 46-50.
• 2. A. Aula, N. Jhaveri, and M. Käki, "Information Search and Reaccess Strategies of Experienced Web Users," Proc. 14th Int'l Conf. World Wide Web, ACM Press, 2005, pp. 583-592.
• 3. J. Teevan et al., "The Perfect Search Engine Is Not Enough: A Study of Orienteering Behavior in Directed Search," Proc. SIGCHI Conf. Human Factors in Computing Systems, ACM Press, 2004, pp. 415-422.
• 4. R. Boardman and M.A. Sasse, "Stuff Goes into the Computer and Doesn't Come Out: A Cross-Tool Study of Personal Information Management," Proc. SIGCHI Conf. Human Factors in Computing Systems, ACM Press, 2004, pp. 583-590.
• 5. H. Bruce, W. Jones, and S. Dumais, "Information Behaviour That Keeps Found Things Found," Information Research, vol. 10, no. 1, 2004; http://informationr.net/ir/10-1/paper207.html.
• 6. J. Wen, "Post-Valued Recall Web Pages: User Disorientation Hits the Big Time," IT & Society, vol. 1, no. 3, 2003, pp. 184-194.
• 7. A. Cockburn et al., "Improving Web Page Revisitation: Analysis, Design and Evaluation," IT & Society, vol. 1, no. 3, 2003, pp. 159-183.
• 8. R.G. Capra and M.A. Pérez-Quiñones, Re-Finding Found Things: An Exploratory Study of How Users Re-Find Information, tech. report cs.HC/0402001, Computer Science Dept., Virginia Tech, 2003; http://arxiv.org/ftp/cs/papers/0310/0310011.pdf.
• 9. L. Tauscher and S. Greenberg, "How People Revisit Web Pages: Empirical Findings and Implications for the Design of History Systems," Int'l J. Human-Computer Studies, vol. 47, no. 1, 1997, pp. 97-137.
• 10. Georgia Tech Graphics, Visualization, and Usability Center, 10th WWW User Survey, Georgia Tech Research Corp., 1998; www.gvu.gatech.edu/user_surveys/survey-1998-10.
• 11. P.P. Maglio and R. Barrett, "How to Build Modeling Agents to Support Web Searchers," Proc. 6th Int'l Conf. User Modeling, Springer, 1997, pp. 5-16.
• 12. D.R. Karger and D. Quan, "Haystack: A User Interface for Creating, Browsing, and Organizing Arbitrary Semistructured Information," CHI '04 Extended Abstracts Human Factors in Computing Systems, ACM Press, 2004, pp. 777-778.
• 13. J. Teevan, "How People Refind Information When the Web Changes," MIT AI memo 2004-12, June 2004; http://people.csail.mit.edu/teevan/work/publications/papers/aim04.pdf.
• 14. P. Pirolli and S. Card, "Information Foraging in Information Access Environments," Proc. SIGCHI Conf. Human Factors in Computing Systems, ACM Press, 1995, pp. 51-58.
• 15. J.R. Anderson, Cognitive Psychology and Its Implications, 5th ed., Worth, 1999.
• 16. R.G. Capra and M.A. Pérez-Quiñones, "Mobile Refinding of Web Information Using a Voice Interface: An Exploratory Study," to appear in Proc. 2nd Latin American Conf. Human-Computer Interaction, ACM Press, 2005.
• 17. W. Jones and H. Bruce, "A Report on the NSF-Sponsored Workshop on Personal Information Management," 2005; www.ischool.washington.edu/pim/final%20PIM%report.pdf.