Issue No. 12, Dec. 2012 (vol. 18)
pp. 2839-2848
F. Heimerl, Inst. for Visualization & Interactive Syst., Univ. Stuttgart, Stuttgart, Germany
S. Koch, Inst. for Visualization & Interactive Syst., Univ. Stuttgart, Stuttgart, Germany
H. Bosch, Inst. for Visualization & Interactive Syst., Univ. Stuttgart, Stuttgart, Germany
T. Ertl, Inst. for Visualization & Interactive Syst., Univ. Stuttgart, Stuttgart, Germany
ABSTRACT
Performing exhaustive searches over a large number of text documents can be tedious, since it is very hard to formulate search queries or define filter criteria that adequately capture an analyst's information need. Classification through machine learning has the potential to improve search and filter tasks for individual information needs that are either complex or very specific. Unfortunately, analysts who are knowledgeable in their field are typically not machine learning specialists, and most classification methods require a certain expertise in parametrization to achieve good results. Supervised machine learning algorithms, in contrast, rely on labeled data, which analysts can provide. However, the labeling effort can be very high, which shifts the problem from composing complex queries or defining accurate filters to another laborious task, and analysts must additionally judge the trained classifier's quality. We therefore compare three approaches for interactive classifier training in a user study. All of them are potential candidates for integration into a larger retrieval system, and they incorporate active learning to varying degrees in order to reduce the labeling effort and increase effectiveness. Two of them use interactive visualization to let users explore the status of the classifier in the context of the labeled documents and to judge its quality in iterative feedback loops. We see our work as a step towards introducing user-controlled classification methods, in addition to text search and filtering, to increase recall in analytics scenarios involving large corpora.
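The abstract does not state which query strategy the three approaches use. As a minimal illustration of how active learning can reduce labeling effort, the Python sketch below implements pool-based uncertainty sampling: retrain a classifier on the documents labeled so far, then ask the analyst to label the unlabeled document closest to the decision boundary. The bag-of-words features, the logistic-regression learner without an intercept term, and the function names `train`, `most_uncertain`, and `active_learning` are all assumptions for illustration, not the authors' implementation; the hypothetical `oracle` function stands in for the analyst.

```python
import math
from collections import Counter


def features(text):
    """Bag-of-words term counts (lowercased whitespace tokens; a simplifying assumption)."""
    return Counter(text.lower().split())


def score(weights, feats):
    """Linear decision value w·x; a positive sign means 'relevant'.
    No intercept term is used, to keep the sketch short."""
    return sum(weights.get(term, 0.0) * count for term, count in feats.items())


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(labeled, epochs=50, lr=0.1):
    """Fit logistic-regression weights with plain SGD.
    labeled: list of (text, label) pairs with label in {0, 1}."""
    weights = {}
    data = [(features(text), label) for text, label in labeled]
    for _ in range(epochs):
        for feats, label in data:
            grad = sigmoid(score(weights, feats)) - label  # d(log loss)/d(score)
            for term, count in feats.items():
                weights[term] = weights.get(term, 0.0) - lr * grad * count
    return weights


def most_uncertain(weights, pool):
    """Uncertainty sampling: index of the pool document closest to the decision boundary."""
    return min(range(len(pool)), key=lambda i: abs(score(weights, features(pool[i]))))


def active_learning(seed, pool, oracle, rounds):
    """Retrain, query the most uncertain document, have the oracle (the analyst) label it, repeat."""
    labeled, pool = list(seed), list(pool)
    for _ in range(min(rounds, len(pool))):
        weights = train(labeled)
        doc = pool.pop(most_uncertain(weights, pool))
        labeled.append((doc, oracle(doc)))
    return train(labeled), labeled


# Usage with a toy corpus and a hypothetical keyword-based oracle.
SPORT = {"goal", "striker", "match", "scores"}


def oracle(doc):
    return int(any(word in SPORT for word in doc.lower().split()))


seed = [("goal match striker", 1), ("election vote parliament", 0)]
pool = ["striker scores goal", "parliament passes vote",
        "match report striker goal", "vote count election"]
weights, labeled = active_learning(seed, pool, oracle, rounds=2)
```

Here the analyst labels only two of the four pool documents. Any learner that exposes a decision value could replace the logistic-regression step; a margin-based classifier such as a linear SVM is a common alternative for text.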
INDEX TERMS
text analysis, data visualisation, interactive systems, iterative methods, learning (artificial intelligence), pattern classification, query processing, text search, visual classifier training, text document retrieval, search queries, filter criteria, machine learning, classification methods, interactive classifier training, interactive visualization, labeled documents, iterative feedback loops, user controlled classification methods, human computer interaction, information retrieval, performance evaluation, visual analytics, training data, learning systems, classification, user evaluation, active learning
CITATION
F. Heimerl, S. Koch, H. Bosch, T. Ertl, "Visual Classifier Training for Text Document Retrieval", IEEE Transactions on Visualization & Computer Graphics, vol. 18, no. 12, pp. 2839-2848, Dec. 2012, doi:10.1109/TVCG.2012.277