Relevant Data Expansion for Learning Concept Drift from Sparsely Labeled Data
March 2005 (vol. 17 no. 3)
pp. 401-412
Tracking changing interests is both a natural phenomenon and an interesting tracking problem because interests can emerge and diminish over different time frames. Doing so from only a few feedback examples is an even more important and challenging problem because existing concept drift learning algorithms typically perform poorly when labeled data are sparse. This paper presents a new computational Framework for Extending Incomplete Labeled Data Stream (FEILDS), which extends the capability of existing algorithms for learning concept drift from a few labeled data. The system transforms the original input stream into a new stream that existing learning algorithms can conveniently track. Experimental results reveal that FEILDS can significantly improve the performance of the Multiple Three-Descriptor Representation (MTDR) algorithm, the Rocchio algorithm, and window-based concept drift learning algorithms when learning from a sparsely labeled data stream, relative to their performance without FEILDS.
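To make the abstract's core idea concrete, the following is a minimal illustrative sketch, not the paper's actual FEILDS algorithm: one simple way to "extend" a sparsely labeled stream is to let unlabeled examples inherit the label of a sufficiently similar labeled example, so that downstream drift learners see more labeled data. The `expand_stream` function, the cosine-similarity criterion, and the `threshold` parameter are all assumptions for illustration only.

```python
# Illustrative sketch (NOT the paper's FEILDS algorithm): expand a sparsely
# labeled stream by letting unlabeled items inherit the label of the most
# similar labeled item, when cosine similarity exceeds a threshold.
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


def expand_stream(stream, threshold=0.8):
    """stream: list of (vector, label-or-None) pairs in arrival order.

    Returns a new stream in which each unlabeled item sufficiently similar
    to some labeled item inherits that item's label; others stay unlabeled.
    """
    labeled = [(v, y) for v, y in stream if y is not None]
    expanded = []
    for v, y in stream:
        if y is None and labeled:
            # Find the most similar labeled example seen in the stream.
            best_v, best_y = max(labeled, key=lambda t: cosine(v, t[0]))
            if cosine(v, best_v) >= threshold:
                y = best_y
        expanded.append((v, y))
    return expanded
```

For example, with `stream = [([1, 0], "relevant"), ([0.9, 0.1], None), ([0, 1], None)]`, the second item is close enough to the first to inherit `"relevant"`, while the orthogonal third item remains unlabeled. The actual FEILDS framework performs a more principled transformation of the input stream, but the sketch conveys why similarity-based expansion can help sparsely supervised drift learners.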

[1] J. Allan, “Incremental Relevance Feedback for Information Filtering,” Proc. 19th Int'l Conf. Research and Development in Information Retrieval, pp. 270-278, 1996.
[2] C. Apté, F. Damerau, and S.M. Weiss, “Automatic Learning of Decision Rules for Text Categorization,” ACM Trans. Information Systems, vol. 12, no. 3, pp. 233-251, 1994.
[3] M. Balabanović, “An Adaptive Web Page Recommendation Service,” Proc. First Int'l Conf. Autonomous Agents, pp. 378-385, 1997.
[4] P.L. Bartlett, S. Ben-David, and S.R. Kulkarni, “Learning Changing Concepts by Exploiting the Structure of Change,” Computational Learning Theory, pp. 131-139, 1996.
[5] D. Billsus and M. Pazzani, “A Personal News Agent that Talks, Learns, and Explains,” Proc. Third Int'l Conf. Autonomous Agents, pp. 268-275, 1999.
[6] C. Blake and C. Merz, UCI Repository of Machine Learning Databases, Univ. of California, Irvine, Dept. of Information and Computer Sciences, 1998.
[7] A. Blum and S. Chawla, “Learning from Labeled and Unlabeled Data Using Graph Mincuts,” Proc. 18th Int'l Conf. Machine Learning, pp. 19-26, 2001.
[8] A. Blum and T. Mitchell, “Combining Labeled and Unlabeled Data with Co-Training,” Proc. 11th Ann. Conf. Computational Learning Theory, pp. 92-100, 1998.
[9] A. Blumer, A. Ehrenfeucht, D. Haussler, and M. Warmuth, “Learnability and the Vapnik-Chervonenkis Dimension,” J. ACM, vol. 36, no. 4, pp. 929-965, 1989.
[10] C.C. Chen, M.C. Chen, and Y. Sun, “PVA: A Self-Adaptive Personal View Agent,” J. Intelligent Information Systems, special issue on automated text categorization, vol. 18, nos. 2-3, pp. 173-194, 2002.
[11] L. Chen and K. Sycara, “WebMate: Personal Agent for Browsing and Searching,” Proc. Second Int'l Conf. on Autonomous Agents, pp. 132-139, 1998.
[12] A.P. Dempster, N.M. Laird, and D.B. Rubin, “Maximum Likelihood from Incomplete Data via the EM Algorithm,” J. Royal Statistical Soc., Series B, vol. 39, no. 1, pp. 1-38, 1977.
[13] D.M. Gabbay, C.J. Hogger, and J.A. Robinson, Handbook of Logic in Artificial Intelligence and Logic Programming: V4. Epistemic and Temporal Reasoning. New York: Oxford Univ. Press, 1995.
[14] M.B. Harries, C. Sammut, and K. Horn, “Extracting Hidden Context,” Machine Learning, vol. 32, no. 2, pp. 101-128, 1998.
[15] D.P. Helmbold and P.M. Long, “Tracking Drifting Concepts by Minimizing Disagreement,” Machine Learning, vol. 14, no. 1, pp. 27-45, 1994.
[16] D.A. Hull, “The TREC-7 Filtering Track: Description and Analysis,” NIST Special Publication 500-242: The Seventh Text Retrieval Conf. (TREC-7), E.M. Voorhees and D.K. Harman, eds., pp. 33-56, 1998.
[17] A.K. Jain and R.C. Dubes, Algorithms for Clustering Data. Englewood Cliffs, N.J.: Prentice Hall, 1988.
[18] B.J. Jansen, A. Spink, and T. Saracevic, “Real Life, Real Users and Real Needs: A Study and Analysis of Users Queries on the Web,” Information Processing and Management, vol. 36, no. 2, pp. 207-227, 2000.
[19] R. Klinkenberg, “Using Labeled and Unlabeled Data to Learn Drifting Concepts,” Proc. Int'l Joint Conf. Artificial Intelligence (IJCAI-01) Workshop Learning from Temporal and Spatial Data, 2001.
[20] R. Klinkenberg and T. Joachims, “Detecting Concept Drift with Support Vector Machines,” Proc. 17th Int'l Conf. Machine Learning, pp. 487-494, 2000.
[21] R. Klinkenberg, “Learning Drifting Concepts with Partial User Feedback,” Beiträge zum Treffen der GI-Fachgruppe 1.1.3 Maschinelles Lernen (FGML-99), P. Perner and V. Fink, eds., 1999.
[22] R. Klinkenberg and I. Renz, “Adaptive Information Filtering: Learning in the Presence of Concept Drifts,” Proc. AAAI Workshop Learning for Text Categorization, pp. 33-40, 1998.
[23] R. Kothari and V. Jain, “Learning from Labeled and Unlabeled Data,” Proc. 2002 Int'l Joint Conf. Neural Networks, pp. 2803-2808, 2002.
[24] K. Lang, “News Weeder: Learning to Filter News,” Proc. 12th Int'l Conf. Machine Learning, pp. 331-339, 1995.
[25] D.D. Lewis and M. Ringuette, “A Comparison of Two Learning Algorithms for Text Categorization,” Proc. Third Ann. Symp. Document Analysis and Information Retrieval, pp. 81-93, 1994.
[26] T.M. Mitchell, Machine Learning. New York: McGraw-Hill, 1997.
[27] M. Mitra, A. Singhal, and C. Buckley, “Improving Automatic Query Expansion,” Proc. 21st Conf. Research and Development in Information Retrieval, pp. 206-214, 1998.
[28] A. Moukas and G. Zacharia, “Evolving a Multi-Agent Information Filtering Solution in AMALTHEA,” Proc. First Int'l Conf. Autonomous Agents, pp. 394-403, 1997.
[29] J.J. Rocchio, “Relevance Feedback in Information Retrieval,” The SMART Retrieval System: Experiments in Automatic Document Processing, pp. 313-323, 1971.
[30] G. Salton and M.J. McGill, Introduction to Modern Information Retrieval. McGraw-Hill, 1983.
[31] J.C. Schlimmer and R.H. Granger, “Beyond Incremental Processing: Tracking Concept Drift,” Proc. Fifth Nat'l Conf. Artificial Intelligence, pp. 502-507, 1986.
[32] G. Widmer and M. Kubat, “Learning in the Presence of Concept Drift and Hidden Contexts,” Machine Learning, vol. 23, no. 1, pp. 69-101, 1996.
[33] G. Widmer, “Tracking Context Changes through Meta-Learning,” Machine Learning, vol. 27, no. 3, pp. 259-286, 1997.
[34] D.H. Widyantoro, T.R. Ioerger, and J. Yen, “An Adaptive Algorithm for Learning Changes in User Interests,” Proc. Eighth Int'l Conf. Information and Knowledge Management, pp. 405-412, 1999.
[35] D.H. Widyantoro, T.R. Ioerger, and J. Yen, “Learning User Interest Dynamics with a Three-Descriptor Representation,” J. Am. Soc. Information Science, vol. 52, no. 3, pp. 212-225, 2001.
[36] D.H. Widyantoro, T.R. Ioerger, and J. Yen, “An Incremental Approach to Building a Cluster Hierarchy,” Proc. Second IEEE Int'l Conf. Data Mining, pp. 705-708, 2002.
[37] I.H. Witten, A. Moffat, and T.C. Bell, Managing Gigabytes: Compressing and Indexing Documents and Images. New York: Van Nostrand Reinhold, 1994.
[38] Y. Yang, J.G. Carbonell, R.D. Brown, T. Pierce, B.T. Archibald, and X. Liu, “Learning Approaches for Detecting and Tracking News Events,” IEEE Intelligent Systems, special issue on applications of intelligent information retrieval, vol. 14, no. 4, pp. 32-43, 1999.
[39] T. Zhang and F.J. Oles, “A Probability Analysis on the Value of Unlabeled Data for Classification Problems,” Proc. 17th Int'l Conf. Machine Learning, pp. 1191-1198, 2000.

Index Terms:
Concept learning, relevance feedback, information filtering.
Dwi H. Widyantoro, John Yen, "Relevant Data Expansion for Learning Concept Drift from Sparsely Labeled Data," IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 3, pp. 401-412, March 2005, doi:10.1109/TKDE.2005.48