Guest Editor's Introduction: Information Customization

Javed Mostafa, Indiana University, Bloomington

Pages: pp. 8-11

Increasing demands on our time and the critical connection between timely information and an organization's bottom line require that we reverse (at least partially) the conventional ways we find information. Instead of users investing significant effort to find the right information, the right information should find the users. Information customization (IC) systems attempt to accomplish this by automating many functions of today's information retrieval systems and by providing features that help users make optimal use of information.

Ideally, IC systems function proactively, continuously scanning appropriate resources, analyzing and comparing content, selecting relevant information, and presenting it as visualizations or in a pruned format. IC systems do not preclude users from self-directed information finding, such as browsing and searching. In fact, they often combine user-directed functions with proactive search functions to meet the user's demands.

To help establish the user's information needs, IC systems monitor user-system interactions or transactions. The end result of such tracking is an interest representation that may range from a complex user model to a simple keyword list. The representation is useful to the IC system chiefly because it can predict the relevance of new information. However, interest representations can also be used to reconfigure resources to improve search effectiveness.


Context-aware computing (CAC) is an emerging research area that offers interesting insights into how to detect more varied contextual clues related to a user's environment and how to use such clues to customize environmental conditions [1]. CAC is a broader area than IC because it deals with a wider variety of applications (for example, smart homes and smart weapons). When you consider the wide-ranging ways users interact with computers, it becomes apparent that beyond transaction data, many more contextual clues exist that IC systems can leverage to customize information. These clues include the type of application being used, the type of information resources used, the devices being used, the locations of use, and even the time of use. An initiative is under way at Indiana University to develop a context-aware ubiquitous and persistent information delivery (Cupid) system (see Figure 1).


Figure 1: The Cupid (Context-aware Ubiquitous & Persistent Information Delivery) system will acquire interest representations (the green dot) as a function of three key dimensions: topic, time, and location. Use of nontopical clues and the ability to customize information across different devices are some of Cupid's unique features.

The broader perspective offers a way to distinguish between personalization and customization functions. Personalization functions are a subset of customization functions. They are strictly based on information request, use, and demand patterns. Customization involves functions beyond personalization that consider factors such as location, time, and the device being used to identify, structure, and present relevant information.


Collecting contextual information that relates to users' interests and information needs is challenging for IC systems because users have little tolerance for the system intruding to collect this information. So, IC systems attempt to leverage implicit evidence of interest, such as specific page selection during Web traversal. They also use collaborative-filtering approaches, which collect evidence from user communities to predict relevance for individual users.
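To make the collaborative-filtering idea concrete, here is a minimal sketch, not drawn from any of the papers in this issue. It assumes implicit evidence has already been reduced to per-user scores (the names `ratings`, `cosine`, and `predict` are hypothetical) and predicts a user's interest in an unseen page as a similarity-weighted average of the community's evidence.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two sparse score dicts (item -> score)."""
    shared = set(u) & set(v)
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def predict(ratings, user, item):
    """Similarity-weighted average of other users' evidence for `item`."""
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or item not in theirs:
            continue
        sim = cosine(ratings[user], theirs)
        num += sim * theirs[item]
        den += abs(sim)
    return num / den if den else 0.0

# Hypothetical implicit evidence: 1.0 = page selected, 0.0 = shown but not selected.
ratings = {
    "ann": {"p1": 1.0, "p2": 1.0},
    "bob": {"p1": 1.0, "p2": 1.0, "p3": 1.0},
    "cat": {"p1": 1.0, "p4": 1.0, "p3": 0.0},
}
prediction = predict(ratings, "ann", "p3")
```

Because bob's history resembles ann's more closely than cat's does, bob's positive evidence for `p3` dominates the prediction.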

Another challenge involves interest representation itself. The representation strategy and technique must be robust and responsive. As a user's interests or demands change over time, the change must be captured quickly, and predictions must match the user's most recent interests and needs. A related issue is protecting the interest representation from abuse. Security and privacy are very important to IC systems but are rarely addressed.

The type of resources presents another set of barriers. Resources might be heterogeneous regarding content and format. Such heterogeneity makes it difficult to generate robust interest representation, match the representation to resources, and access information from the resources. Resources are seldom static; they cease to exist or their content evolves. Their dynamic nature increases interest representation's complexity.

In "Online Customized Index Synthesis in Commercial Web Sites," Mamata Jenamani, Pratap Mohapatra, and Sujoy Ghose focus on providing intelligent navigation aids to Web site visitors by dynamically generating links to related pages as the visitor browses the site. The authors model page navigation as a Markov process and assume that a user occupying a state (one or more Web pages belonging to an information category) transitions to another state with a certain probability. The authors treat choosing the next state to transition to as a Markov decision problem. They apply two basic strategies to establish subsequent states:

  • Most Accessed Pages, using page request frequency.
  • Company's Interest Pages, combining a priori rewards the site owner assigns to states with the particular page selection the user traverses. CIP applies a value-determination algorithm to calculate new links' utilities on the basis of the states' rewards and the current state, and the probability of traversing to the link from the current state.

Each state's utility value is continuously updated as a sum of the chosen state's reward and products of probabilities and utilities of states that link to it. Because all users' traversal patterns drive the interest representation (and in the case of the second strategy, combine with rewards based on the site policy), the computational load and the demands on individual users for representation generation are minimized. Comparative evaluation of the two main strategies showed that CIP is superior. Visitor's Interest Pages, a generalization of the CIP strategy that can handle multiple information categories, produced predictions that fit the expected patterns.
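The utility update can be illustrated with a textbook value-determination loop. This is a sketch under stated assumptions rather than the authors' algorithm: it sums over successor states, adds a `discount` factor to keep the iteration bounded, and uses invented state names, rewards, and probabilities.

```python
def value_determination(rewards, transitions, discount=0.9, sweeps=100):
    """Iterate U(s) = R(s) + discount * sum_t P(s -> t) * U(t).

    `discount` and `sweeps` are assumptions of this sketch.
    """
    utilities = dict(rewards)
    for _ in range(sweeps):
        utilities = {
            s: rewards[s] + discount * sum(
                p * utilities[t] for t, p in transitions.get(s, {}).items()
            )
            for s in rewards
        }
    return utilities

def rank_links(state, transitions, utilities):
    """Order candidate links by transition probability times target utility."""
    scored = {t: p * utilities[t] for t, p in transitions.get(state, {}).items()}
    return sorted(scored, key=scored.get, reverse=True)

# Hypothetical site: rewards set by the site owner, probabilities from access logs.
rewards = {"home": 0.0, "laptops": 1.0, "accessories": 5.0, "checkout": 10.0}
transitions = {
    "home": {"laptops": 0.7, "accessories": 0.3},
    "laptops": {"accessories": 0.4, "checkout": 0.6},
    "accessories": {"checkout": 1.0},
    "checkout": {},
}
utilities = value_determination(rewards, transitions)
suggested = rank_links("home", transitions, utilities)
```

Ranking by probability times utility lets a high-reward state outrank a frequently visited one, which is how a reward-based strategy can differ from one based on access frequency alone.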

In "Using Document Access Sequences to Recommend Customized Information," Travis Bauer and David Leake also track users' Web traversals but apply a different technique to generate page recommendations. For a system called Calvin, they use an unsupervised-learning-network algorithm to identify a set of key terms and weights that represent the user's interests. Calvin uses a three-layer network that associates each layer with a different term-identification property. Layer one mimics short-term memory and identifies a subset of words that occur frequently in the document stream. Layers two and three are designed as long-term memory. Layer two identifies terms that tend to occur frequently over a longer time interval; layer three identifies terms that generally occur frequently. Each unit in a layer has a triplet of values: a term, an activation factor representing the likelihood that the unit can be bound to a different term, and a priming factor representing the activation factor's rate of change. The number, values, connections, and update strategies of units ensure that terms compete for them to permit their selection according to the three layers' properties. Terms thus selected are treated as the interest representation that is used to generate queries for related documents. The network is updated continuously, so it can detect changes in interest over time.

Bauer and Leake argue that two popular strategies for term representation and weighting, term frequency-inverse document frequency (TFIDF) and latent semantic indexing (LSI), are unsuitable for designing personal information agents because agents might have a limited amount of information and must operate over shorter document streams. The authors conducted a simulation study using a transaction log of documents browsed by actual users on four specific topics and evaluated Calvin's performance on retrieving documents similar to those used to generate the context. Calvin outperformed representations based on TFIDF and LSI. However, the authors acknowledge that the evaluation was tuned to the four topics that formed the document domain's scope; their approach must be evaluated on a broader domain with different browsing session lengths to establish its generality.
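For readers unfamiliar with the TFIDF baseline, the following sketch shows one common formulation (relative term frequency times the logarithm of inverse document frequency). The function name and the tiny document stream are illustrative assumptions, and this is not the code used in the study. Note that the inverse document frequency is computed over the whole collection, so the weights depend on how many documents the agent has seen.

```python
from collections import Counter
from math import log

def tfidf(docs):
    """Return one {term: weight} dict per tokenized document."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    weighted = []
    for doc in docs:
        tf = Counter(doc)
        weighted.append(
            {t: (c / len(doc)) * log(n / df[t]) for t, c in tf.items()}
        )
    return weighted

# Hypothetical three-document browsing stream.
docs = [
    ["markov", "web", "page"],
    ["web", "agent"],
    ["web", "markov", "markov"],
]
weights = tfidf(docs)
```

In this stream, "web" appears in every document and therefore receives a weight of zero, while "markov" is weighted most heavily in the document that repeats it.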

In "Helping Online Customers Decide through Web Personalization," Sung Ho Ha applies a hybrid strategy for generating interest representation, combining users' personal transactions with peer-group-level transaction patterns for an e-commerce site. The system collects two types of transaction records: purchase patterns (buying behavior) and purchase-related interactions such as item selection (buying attitude). Three-element vectors (purchase recency, purchase frequency, and total purchase amount) represent each user's behavior data, and the system segments the data with a self-organizing map (SOM) clustering approach. Each user's attitude data is also collected as three-element vectors containing the frequency of list-to-detail, detail-to-cart, and list-to-cart transactions. (List refers to item lists presented in the site, cart refers to online shopping carts, and detail refers to expanded information on items presented on request.) The system also clusters these vectors using SOM. The system generates product recommendations on the basis of offline calculations of product-product and product-category association rules, where the association is valid if a certain proportion of purchase transactions reflect it (the system administrator sets the minimum). The system generates three classes of rules: mild, moderate, and strong. The mild class relies on all the individual user's purchase transactions, the moderate class uses all the peer groups' purchase transactions, and the strong class uses purchase transactions of users with similar attitudes in the behavior-based segment. So, an individual's interaction patterns drive the recommendations based on mild rules, while group-level interactions determine recommendations based on the other two classes. Ha presents evaluation data comparing the utility of recommendations produced using the three types of rules. He shows that the three types of recommendations impact consumer attitudes in significantly different ways and that a site with recommendations has a more positive impact on users than a site without recommendations.
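The validity criterion for association rules can be sketched as a minimum-support test over purchase transactions. The code below is an illustration under assumptions, not Ha's implementation: `product_rules`, the product names, and the decision to also report confidence are all hypothetical.

```python
from collections import Counter
from itertools import combinations

def product_rules(transactions, min_support):
    """Return (antecedent, consequent, support, confidence) tuples for
    product pairs found in at least `min_support` of all transactions."""
    n = len(transactions)
    item_counts = Counter(i for t in transactions for i in set(t))
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(set(t)), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support >= min_support:
            rules.append((a, b, support, count / item_counts[a]))
            rules.append((b, a, support, count / item_counts[b]))
    return rules

# Hypothetical purchase transactions; the administrator sets min_support.
transactions = [
    ["camera", "sd_card"],
    ["camera", "sd_card", "tripod"],
    ["camera"],
    ["tripod"],
]
rules = product_rules(transactions, min_support=0.5)
```

Under the description above, the mild, moderate, and strong classes would differ mainly in which pool of transactions is passed in: one user's purchases, a peer group's purchases, or those of users with similar attitudes within a behavior-based segment.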

In "Building Adaptive E-Catalog Communities Based on User Interaction Patterns," Hye-young Paik and Boualem Benatallah approach customization from the information resources perspective. Their article addresses the dynamic adaptation of online catalogs—accessible through e-commerce sites—based on how customers use the catalogs. The authors propose a catalog organization scheme that assigns a group of catalogs concentrating on similar product types to the same community. Catalog communities associate with each other as a tree: groups of related subcommunities are categorized and assigned to a supercommunity, and the supercommunities link to a root community called AllCatalog. Catalog communities belonging to the same supercommunity are considered peers, and their association strength is represented using a weight value (between 0 and 1). At the leaf level are the actual catalogs that vendors supply. The catalog community scheme facilitates a query that is directed toward a single catalog or applied to a set of catalogs simultaneously. A query submitted to a community can also be forwarded to peer communities on the basis of the peer association's degree of relevancy.
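One way to picture the community scheme is as a tree of nodes that also carry weighted peer links. The data structure below is a hypothetical sketch: the field names, the `route_query` helper, and the forwarding threshold are assumptions, and only the AllCatalog root and the 0-to-1 peer weights come from the article's description.

```python
from dataclasses import dataclass, field

@dataclass
class Community:
    name: str
    catalogs: list = field(default_factory=list)   # leaf catalogs supplied by vendors
    children: list = field(default_factory=list)   # subcommunities
    peers: dict = field(default_factory=dict)      # peer name -> weight in [0, 1]

def route_query(community, registry, min_weight):
    """Return the names of communities that should receive a query:
    the target itself plus peers whose weight meets `min_weight`."""
    ranked = sorted(community.peers.items(), key=lambda kv: -kv[1])
    forwarded = [p for p, w in ranked if w >= min_weight and p in registry]
    return [community.name] + forwarded

# Hypothetical catalog tree under the AllCatalog root.
laptops = Community("Laptops", catalogs=["vendor_a", "vendor_b"],
                    peers={"Accessories": 0.8, "Printers": 0.2})
accessories = Community("Accessories", catalogs=["vendor_c"],
                        peers={"Laptops": 0.8})
printers = Community("Printers", catalogs=["vendor_d"],
                     peers={"Laptops": 0.2})
computers = Community("Computers", children=[laptops, accessories, printers])
root = Community("AllCatalog", children=[computers])

registry = {c.name: c for c in (laptops, accessories, printers)}
recipients = route_query(laptops, registry, min_weight=0.5)
```

With a threshold of 0.5, a query sent to Laptops is also forwarded to Accessories but not to Printers.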

Paik and Benatallah convincingly argue that adapting the catalog scheme just described can improve navigation and access. They identify a set of interaction patterns, called interaction sequences, that are built from combinations of atomic interactions. The system continuously monitors pattern frequency and generates patterns' relevance estimates by considering patterns that reinforce each other and those that conflict. Identifying interaction sequences and their relevance is approximately equivalent to generating interest representations. Relevance estimates of interaction sequences are compared to a predefined threshold value; crossing it can trigger catalog scheme restructuring, such as merge, split, and move operations and updates to peer group association values. The authors evaluated their system by simulating navigation and access with a set of software agents and showed that catalog community reconfiguration can increase the likelihood that agents will find target information about products.

The final paper, "Adaptive Assistants for Customized E-Shopping," by Filippo Menczer, Alvaro E. Monge, and W. Nick Street, concentrates on customizing items retrieved and recommended on an e-commerce site. With this system, users can take on different personae; each persona is associated with a specific interest facet (for example, gadget geek versus movie freak). The system generates and updates individual interest representations for each persona. Interest representations, referred to as profiles, contain numerical categories and keywords for which the system maintains continuously updated values reflecting individual customers' interests. The profiles' update is unobtrusive, based on common actions performed on particular items retrieved or recommended. The system tracks five types of actions:

  • buy (strong positive feedback)
  • browse (weak positive feedback)
  • ignore—does not get to an item on the list (no change)
  • explicitly skip—bypasses an item even when presented (negative feedback)
  • remove (strong negative feedback)

These feedback values help update weights in corresponding features and keywords in the profiles using a simple additive formula that also incorporates a learning-rate parameter. The authors present evaluation data demonstrating that the system can, with minimal initial degradation in performance, acquire profiles whose performance correlates strongly with performance based on inferred interest from feedback. The article's particularly novel contribution is its system architecture incorporating privacy agents, an anonymizing server, and the shopping personae. This architecture deserves special attention because it is one of the few that addresses IC system privacy concerns.
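A simple additive update with a learning rate might look like the sketch below. The numeric feedback values, the default learning rate, and the function name are assumptions made for illustration; the article specifies only the five action types and the direction of their feedback.

```python
# Hypothetical feedback signal per action; only the signs follow the article.
FEEDBACK = {
    "buy": 1.0,      # strong positive
    "browse": 0.3,   # weak positive
    "ignore": 0.0,   # no change
    "skip": -0.3,    # negative
    "remove": -1.0,  # strong negative
}

def update_profile(profile, item_features, action, learning_rate=0.1):
    """Return a new profile with each feature weight of the item shifted by
    learning_rate * feedback(action)."""
    signal = FEEDBACK[action]
    updated = dict(profile)
    for feature in item_features:
        updated[feature] = updated.get(feature, 0.0) + learning_rate * signal
    return updated

# One persona's profile evolving over three observed actions.
profile = update_profile({}, ["sci-fi", "dvd"], "buy")
profile = update_profile(profile, ["dvd"], "remove")
profile = update_profile(profile, ["sci-fi"], "ignore")
```

Keeping one such profile per persona would let feedback gathered under one persona leave the others untouched.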


The five papers describe various approaches to tackling the challenges associated with IC. Although these approaches differ in actual technique and implementation, they share some common attributes. Clearly, the researchers generally favor unobtrusive means of collecting interest information. They leveraged implicit interest indicators such as users' navigation paths through the content and users' interaction patterns. They also avoided using representation techniques that require extensive batch or a priori knowledge engineering. Representations were generally inspired by machine-learning approaches employing "learn while performing" or unsupervised learning principles. Two papers, one by Paik and Benatallah and the other by Menczer, Monge, and Street, also described innovative techniques for dealing with challenges associated with resource heterogeneity.

Looking toward the future, we see new and exciting avenues opening up for delivering customized information, especially as wireless and location-aware information systems become more ubiquitous. These new devices, however, bring with them new challenges. We hope the articles in this issue will encourage new initiatives in the IC area and support its growth. We also hope that as the area matures further, IC researchers will not only succeed in tackling new challenges but will, in fact, influence the development of new information systems.


The two associate editors of this special issue, Snehasis Mukhopadhyay, associate professor of computer and information science at Indiana University-Purdue University, Indianapolis, and Wai Lam, associate professor of systems engineering and engineering management at the Chinese University of Hong Kong, have been extremely generous with their time and helpful in selecting the final set of papers. Special thanks to the anonymous reviewers for providing useful feedback. We are also deeply grateful to the IEEE staff for their support.


About the Authors

Javed Mostafa is the Victor H. Yngve Associate Professor of Information Science and Associate Professor of Informatics at Indiana University. He is the director of the Laboratory of Applied Informatics Research at Indiana University. His research interest is intelligent interfaces for information retrieval. He has a PhD in information science from the University of Texas at Austin. He is a member of the AAAS, the ACM, ASIST, and the IEEE. Contact him at Indiana Univ., 1320 E 10th St., Bloomington, IN 47405-3907.