Issue No.12 - Dec. (2011 vol.17)
pp: 2459-2468
José Gustavo Paiva , ICMC, University of São Paulo, Brazil
Laura Florian , ICMC, University of São Paulo, Brazil
Hélio Pedrini , University of Campinas, Brazil
Guilherme Telles , University of Campinas, Brazil
Rosane Minghim , ICMC, University of São Paulo, Brazil
ABSTRACT
Similarity trees, such as Neighbor Joining trees, are an alternative to multidimensional projections for the visual analysis of data represented in multidimensional spaces. They organize data objects on the visual plane by emphasizing their levels of similarity, and they are highly capable of detecting and separating groups and subgroups of objects. Besides this similarity-based hierarchical organization, their advantages include reduced point clutter, high precision, and a consistent view of the data set during focusing, which offers an intuitive way to view the general structure of the data set and to drill down to groups and subgroups of interest. Disadvantages of similarity trees based on neighbor joining strategies include their computational cost and the presence of virtual nodes that take up too much of the visual space. This paper presents a substantially improved version of the similarity tree technique, built on two procedures. The first is a strategy that replaces virtual nodes by promoting real leaf nodes to their place, which saves large portions of display space while maintaining the expressiveness and precision of the technique. The second is an implementation that significantly accelerates the algorithm, making it usable for larger data sets. We also illustrate the applicability of the technique in visual data mining, showing its advantages in supporting visual classification of data sets, with special attention to image classification. We demonstrate the capabilities of the tree for analysis and iterative manipulation, and we employ those capabilities to support evolution toward a satisfactory data organization and classification.
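For readers unfamiliar with the underlying tree, the following is a minimal sketch of the classical Neighbor Joining procedure of Saitou and Nei, written in Python. It is not the improved technique presented in the paper: it performs no leaf promotion and no acceleration. The function name `neighbor_joining`, the edge-list output, and the internal node names `v0`, `v1`, ... are assumptions made for illustration; those internal nodes correspond to what the abstract calls virtual nodes.

```python
from itertools import combinations


def neighbor_joining(dist, labels):
    """Classical Neighbor Joining (illustrative sketch, cubic time).

    dist   -- symmetric matrix of pairwise distances (list of lists)
    labels -- one name per data object, in matrix order
    Returns a list of edges (node_u, node_v, length). Internal
    ("virtual") nodes are given the hypothetical names v0, v1, ...
    """
    # Distances stored as a dict of dicts keyed by node name.
    d = {a: {b: dist[i][j] for j, b in enumerate(labels)}
         for i, a in enumerate(labels)}
    active = list(labels)
    edges = []
    next_id = 0

    while len(active) > 2:
        n = len(active)
        # Total distance from each active node to all other active nodes.
        r = {a: sum(d[a][b] for b in active) for a in active}
        # Join the pair minimizing Q(a, b) = (n - 2) d(a, b) - r(a) - r(b).
        a, b = min(
            combinations(active, 2),
            key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]],
        )
        v = f"v{next_id}"
        next_id += 1
        # Branch lengths from the new virtual node to the joined pair.
        len_a = 0.5 * d[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        len_b = d[a][b] - len_a
        edges.append((v, a, len_a))
        edges.append((v, b, len_b))
        # Distances from the new virtual node to every remaining node.
        d[v] = {v: 0.0}
        for c in active:
            if c not in (a, b):
                d[v][c] = d[c][v] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        active = [c for c in active if c not in (a, b)] + [v]

    # Connect the last two remaining nodes directly.
    a, b = active
    edges.append((a, b, d[a][b]))
    return edges
```

For n input objects, this sketch creates n - 2 virtual nodes and 2n - 3 edges, which illustrates why the abstract singles out virtual nodes as a cost in display space.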
INDEX TERMS
Similarity Trees, Multidimensional Projections, Image Classification.
CITATION
José Gustavo Paiva, Laura Florian, Hélio Pedrini, Guilherme Telles, Rosane Minghim, "Improved Similarity Trees and their Application to Visual Data Classification", IEEE Transactions on Visualization & Computer Graphics, vol.17, no. 12, pp. 2459-2468, Dec. 2011, doi:10.1109/TVCG.2011.212