Issue No. 12, Dec. 2012 (vol. 18)
pp. 2829-2838
ABSTRACT
Significant effort has been devoted to designing clustering algorithms that are responsive to user feedback or that incorporate prior domain knowledge in the form of constraints. However, users desire more expressive forms of interaction to influence clustering outcomes. In our experiences working with diverse application scientists, we have identified an interaction style, scatter/gather clustering, that helps users iteratively restructure clustering results to meet their expectations. As the names indicate, scatter and gather are dual primitives that describe whether clusters in a current segmentation should be broken up further or, alternatively, brought back together. By combining scatter and gather operations in a single step, we support very expressive dynamic restructurings of data. Scatter/gather clustering is implemented using a nonlinear optimization framework that achieves both locality of clusters and satisfaction of user-supplied constraints. We illustrate the use of our scatter/gather clustering approach in a visual analytic application to study baffle shapes in the bat biosonar (ears and nose) system. We demonstrate how domain experts are adept at supplying scatter/gather constraints, and how our framework incorporates these constraints effectively without requiring numerous instance-level constraints.
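The scatter and gather primitives described in the abstract can be illustrated with a small sketch. It does not reproduce the paper's nonlinear optimization framework. It only inspects two given clusterings of the same items and reports which old clusters were scattered (split across several new clusters) and which new clusters were gathered (formed from several old clusters). The function names and the toy labels are hypothetical, chosen for illustration.

```python
from collections import Counter


def contingency(old_labels, new_labels):
    """Count how many items move from each old cluster to each new cluster."""
    return Counter(zip(old_labels, new_labels))


def describe_restructuring(old_labels, new_labels):
    """Report scattered old clusters and gathered new clusters.

    Illustrative only: this reads off the contingency table between two
    clusterings; it is not the optimization used in the paper.
    """
    table = contingency(old_labels, new_labels)
    targets = {}  # old cluster -> set of new clusters it feeds
    sources = {}  # new cluster -> set of old clusters it draws from
    for old, new in table:
        targets.setdefault(old, set()).add(new)
        sources.setdefault(new, set()).add(old)
    scattered = {o: sorted(n) for o, n in targets.items() if len(n) > 1}
    gathered = {n: sorted(o) for n, o in sources.items() if len(o) > 1}
    return scattered, gathered


# Toy example (assumed data): cluster A is split, clusters B and C are merged.
old = ["A", "A", "A", "A", "B", "B", "C", "C"]
new = ["x", "x", "y", "y", "z", "z", "z", "z"]
scattered, gathered = describe_restructuring(old, new)
print(scattered)  # {'A': ['x', 'y']}
print(gathered)   # {'z': ['B', 'C']}
```

In this toy case a single restructuring step contains both a scatter (A into x and y) and a gather (B and C into z), which is the kind of combined operation the abstract refers to.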
INDEX TERMS
Clustering algorithms, Visual analytics, Optimization, Computer science, Linear programming, Algorithm design and analysis, Constrained clustering, Scatter/gather clustering, Alternative clustering
CITATION
M. Shahriar Hossain, Praveen Kumar Reddy Ojili, Cindy Grimm, Rolf Muller, Layne T. Watson, Naren Ramakrishnan, "Scatter/Gather Clustering: Flexibly Incorporating User Feedback to Steer Clustering Results", IEEE Transactions on Visualization & Computer Graphics, vol.18, no. 12, pp. 2829-2838, Dec. 2012, doi:10.1109/TVCG.2012.258