Relevant Data Expansion for Learning Concept Drift from Sparsely Labeled Data
March 2005 (vol. 17, no. 3), pp. 401-412
Keeping track of changing interests is both a natural phenomenon and an interesting tracking problem, because interests can emerge and diminish over different time frames. Doing so from only a few feedback examples is an even more important and challenging problem, and one on which existing concept drift learning algorithms typically perform poorly. This paper presents a new computational Framework for Extending Incomplete Labeled Data Stream (FEILDS), which extends the capability of existing algorithms for learning concept drift from a few labeled data. The system transforms the original input stream into a new stream that can be conveniently tracked by the existing learning algorithms. The experimental results show that FEILDS can significantly improve the performance of the Multiple Three-Descriptor Representation (MTDR) algorithm, the Rocchio algorithm, and window-based concept drift learning algorithms when learning from a sparsely labeled data stream, relative to their performance without FEILDS.
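The abstract describes FEILDS as a wrapper that expands a sparsely labeled stream with relevant unlabeled data before passing it to an existing concept drift learner such as Rocchio. The sketch below is only a minimal illustration of that general idea, not the paper's actual algorithm: the names (expand_with_relevant, RocchioLearner), the cosine-similarity relevance test, and the threshold value are all assumptions introduced for illustration.

```python
# Hypothetical sketch of relevant-data expansion around a base learner.
# Assumptions (not from the paper): cosine similarity as the relevance
# test, a fixed threshold, and a Rocchio-style running-mean profile.
import numpy as np

def cosine(a, b):
    # Cosine similarity between two vectors; 0.0 if either is all zeros.
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

class RocchioLearner:
    """Minimal Rocchio-style profile: a running mean of positive examples."""
    def __init__(self, dim):
        self.profile = np.zeros(dim)
        self.n = 0

    def update(self, vec):
        self.n += 1
        self.profile += (vec - self.profile) / self.n

    def score(self, vec):
        return cosine(self.profile, vec)

def expand_with_relevant(labeled_vec, unlabeled_pool, threshold=0.6):
    """Return the labeled example plus unlabeled documents judged relevant to it."""
    return [labeled_vec] + [u for u in unlabeled_pool
                            if cosine(labeled_vec, u) >= threshold]

# Usage: a stream of (vector, label-or-None) items. Unlabeled items are
# pooled; each sparse labeled item is expanded with relevant pool members
# before updating the base learner.
rng = np.random.default_rng(0)
dim = 5
learner = RocchioLearner(dim)
pool = []
stream = [(rng.random(dim), None) for _ in range(20)] + [(rng.random(dim), 1)]
for vec, label in stream:
    if label is None:
        pool.append(vec)
    else:
        for v in expand_with_relevant(vec, pool):
            learner.update(v)
print("profile:", np.round(learner.profile, 3))
```

The design point the sketch tries to capture is that the base learner is left unchanged; only its input stream is enriched, which is how the abstract characterizes FEILDS's relationship to MTDR, Rocchio, and window-based learners.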

[1] J. Allan, “Incremental Relevance Feedback for Information Filtering,” Proc. 19th Int'l Conf. Research and Development in Information Retrieval, pp. 270-278, 1996.
[2] C. Apté, F. Damerau, and S.M. Weiss, “Automatic Learning of Decision Rules for Text Categorization,” ACM Trans. Information Systems, vol. 12, no. 3, pp. 233-251, 1994.
[3] M. Balabanović, “An Adaptive Web Page Recommendation Service,” Proc. First Int'l Conf. Autonomous Agents, pp. 378-385, 1997.
[4] P.L. Bartlett, S.B. David, and S.R. Kulkarni, “Learning Changing Concepts by Exploiting the Structure of Change,” Computational Learning Theory, pp. 131-139, 1996.
[5] D. Billsus and M. Pazzani, “A Personal News Agent that Talks, Learns, and Explains,” Proc. Third Int'l Conf. Autonomous Agents, pp. 268-275, 1999.
[6] C. Blake and C. Merz, UCI Repository of Machine Learning Databases, http://www.ics.uci.edu/~mlearn/MLRepository.html, Univ. of California, Irvine, Dept. of Information and Computer Sciences, 1998.
[7] A. Blum and S. Chawla, “Learning from Labeled and Unlabeled Data Using Graph Mincuts,” Proc. 18th Int'l Conf. Machine Learning, pp. 19-26, 2001.
[8] A. Blum and T. Mitchell, “Combining Labeled and Unlabeled Data with Co-Training,” Proc. 11th Ann. Conf. Computational Learning Theory, pp. 92-100, 1998.
[9] A. Blumer, A. Ehrenfeucht, D. Haussler, and M. Warmuth, “Learnability and the Vapnik-Chervonenkis Dimension,” J. ACM, vol. 36, no. 4, pp. 929-965, 1989.
[10] C.C. Chen, M.C. Chen, and Y. Sun, “PVA: A Self-Adaptive Personal View Agent,” J. Intelligent Information Systems, special issue on automated text categorization, vol. 18, nos. 2-3, pp. 173-194, 2002.
[11] L. Chen and K. Sycara, “WebMate: Personal Agent for Browsing and Searching,” Proc. Second Int'l Conf. on Autonomous Agents, pp. 132-139, 1998.
[12] A.P. Dempster, N.M. Laird, and D.B. Rubin, “Maximum Likelihood from Incomplete Data via the EM Algorithm,” J. Royal Statistical Soc., Series B, vol. 39, no. 1, pp. 1-38, 1977.
[13] D.M. Gabbay, C.J. Hogger, and J.A. Robinson, Handbook of Logic in Artificial Intelligence and Logic Programming: V4. Epistemic and Temporal Reasoning. New York: Oxford Univ. Press, 1995.
[14] M.B. Harries, C. Sammut, and K. Horn, “Extracting Hidden Context,” Machine Learning, vol. 32, no. 2, pp. 101-128, 1998.
[15] D.P. Helmbold and P.M. Long, “Tracking Drifting Concepts by Minimizing Disagreement,” Machine Learning, vol. 14, no. 1, pp. 27-45, 1994.
[16] D.A. Hull, “The TREC-7 Filtering Track: Description and Analysis,” NIST Special Publication 500-242: The Seventh Text Retrieval Conf. (TREC-7), E.M. Voorhees and D.K. Harman, eds., pp. 33-56, 1998.
[17] A.K. Jain and R.C. Dubes, Algorithms for Clustering Data. Englewood Cliffs, N.J.: Prentice Hall, 1988.
[18] B.J. Jansen, A. Spink, and T. Saracevic, “Real Life, Real Users and Real Needs: A Study and Analysis of Users Queries on the Web,” Information Processing and Management, vol. 36, no. 2, pp. 207-227, 2000.
[19] R. Klinkenberg, “Using Labeled and Unlabeled Data to Learn Drifting Concepts,” Proc. Int'l Joint Conf. Artificial Intelligence (IJCAI-01) Workshop Learning from Temporal and Spatial Data, http://www-ai.cs.uni-dortmund.de/DOKUMENTE/klinkenberg_2001a.pdf, 2001.
[20] R. Klinkenberg and T. Joachims, “Detecting Concept Drift with Support Vector Machines,” Proc. 17th Int'l Conf. Machine Learning, pp. 487-494, 2000.
[21] R. Klinkenberg, “Learning Drifting Concepts with Partial User Feedback,” Beiträge zum Treffen der GI-Fachgruppe 1.1.3 Maschinelles Lernen (FGML-99), P. Perner and V. Fink, eds., 1999.
[22] R. Klinkenberg and I. Renz, “Adaptive Information Filtering: Learning in the Presence of Concept Drifts,” Proc. AAAI Workshop Learning for Text Categorization, pp. 33-40, 1998.
[23] R. Kothari and V. Jain, “Learning from Labeled and Unlabeled Data,” Proc. 2002 Int'l Joint Conf. Neural Networks, pp. 2803-2808, 2002.
[24] K. Lang, “NewsWeeder: Learning to Filter Netnews,” Proc. 12th Int'l Conf. Machine Learning, pp. 331-339, 1995.
[25] D.D. Lewis and M. Ringuette, “A Comparison of Two Learning Algorithms for Text Categorization,” Proc. Third Ann. Symp. Document Analysis and Information Retrieval, pp. 81-93, 1994.
[26] T.M. Mitchell, Machine Learning. New York: McGraw-Hill, 1997.
[27] M. Mitra, A. Singhal, and C. Buckley, “Improving Automatic Query Expansion,” Proc. 21st Conf. Research and Development in Information Retrieval, pp. 206-214, 1998.
[28] A. Moukas and G. Zacharia, “Evolving a Multi-Agent Information Filtering Solution in AMALTHEA,” Proc. First Int'l Conf. Autonomous Agents, pp. 394-403, 1997.
[29] J.J. Rocchio, “Relevance Feedback in Information Retrieval,” The SMART Retrieval System: Experiments in Automatic Document Processing, pp. 313-323, 1971.
[30] G. Salton and M.J. McGill, Introduction to Modern Information Retrieval. McGraw-Hill, 1983.
[31] J.C. Schlimmer and R.H. Granger, “Beyond Incremental Processing: Tracking Concept Drift,” Proc. Fifth Nat'l Conf. Artificial Intelligence, pp. 502-507, 1986.
[32] G. Widmer and M. Kubat, “Learning in the Presence of Concept Drift and Hidden Contexts,” Machine Learning, vol. 23, no. 1, pp. 69-101, 1996.
[33] G. Widmer, “Tracking Context Changes through Meta-Learning,” Machine Learning, vol. 27, no. 3, pp. 259-286, 1997.
[34] D.H. Widyantoro, T.R. Ioerger, and J. Yen, “An Adaptive Algorithm for Learning Changes in User Interests,” Proc. Eighth Int'l Conf. Information and Knowledge Management, pp. 405-412, 1999.
[35] D.H. Widyantoro, T.R. Ioerger, and J. Yen, “Learning User Interest Dynamics with a Three-Descriptor Representation,” J. Am. Soc. Information Science, vol. 52, no. 3, pp. 212-225, 2001.
[36] D.H. Widyantoro, T.R. Ioerger, and J. Yen, “An Incremental Approach to Building a Cluster Hierarchy,” Proc. Second IEEE Int'l Conf. Data Mining, pp. 705-708, 2002.
[37] I.H. Witten, A. Moffat, and T.C. Bell, Managing Gigabytes: Compressing and Indexing Documents and Images. New York: Van Nostrand Reinhold, 1994.
[38] Y. Yang, J.G. Carbonell, R.D. Brown, T. Pierce, B.T. Archibald, and X. Liu, “Learning Approaches for Detecting and Tracking News Events,” IEEE Intelligent Systems, special issue on applications of intelligent information retrieval, vol. 14, no. 4, pp. 32-43, 1999.
[39] T. Zhang and F.J. Oles, “A Probability Analysis on the Value of Unlabeled Data for Classification Problems,” Proc. 17th Int'l Conf. Machine Learning, pp. 1191-1198, 2000.

Index Terms:
Concept learning, relevance feedback, information filtering.
Citation:
Dwi H. Widyantoro, John Yen, "Relevant Data Expansion for Learning Concept Drift from Sparsely Labeled Data," IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 3, pp. 401-412, March 2005, doi:10.1109/TKDE.2005.48