Issue No. 02, February 2012 (vol. 24)
pp. 309-325
Byron J. Gao, Texas State University, San Marcos
Obi L. Griffith, Lawrence Berkeley National Laboratory, Berkeley
Martin Ester, Simon Fraser University, Burnaby
Hui Xiong, Rutgers University, Newark
Qiang Zhao, Texas State University, San Marcos
Steven J.M. Jones, University of British Columbia and British Columbia Cancer Agency, Vancouver
INDEX TERMS
Order-preserving submatrix, OPSM, deep OPSM, deep pattern, subspace clustering, pattern-based clustering, sequential pattern mining, scalability, best effort, gene expression analysis, negative correlation, data mining.
CITATION
Byron J. Gao, Obi L. Griffith, Martin Ester, Hui Xiong, Qiang Zhao, Steven J.M. Jones, "On the Deep Order-Preserving Submatrix Problem: A Best Effort Approach", IEEE Transactions on Knowledge & Data Engineering, vol.24, no. 2, pp. 309-325, February 2012, doi:10.1109/TKDE.2010.244
REFERENCES
[1] C.C. Aggarwal, J.L. Wolf, P.S. Yu, C. Procopiuc, and J.S. Park, "Fast Algorithms for Projected Clustering," SIGMOD Record, vol. 28, no. 2, pp. 61-72, 1999.
[2] C.C. Aggarwal and P.S. Yu, "Finding Generalized Projected Clusters in High Dimensional Spaces," SIGMOD Record, vol. 29, no. 2, pp. 70-81, 2000.
[3] R. Agrawal, J. Gehrke, D. Gunopulos, and P. Raghavan, "Automatic Subspace Clustering of High Dimensional Data for Data Mining Applications," SIGMOD Record, vol. 27, no. 2, pp. 94-105, 1998.
[4] R. Agrawal and R. Srikant, "Mining Sequential Patterns," Proc. 11th Int'l Conf. Data Eng. (ICDE), 1995.
[5] R. Albert, "Scale-Free Networks in Cell Biology," J. Cell Science, vol. 118, pp. 4947-4957, 2005.
[6] M. Ashburner et al., "Gene Ontology: Tool for the Unification of Biology, the Gene Ontology Consortium," Nature Genetics, vol. 25, no. 1, pp. 25-29, 2000.
[7] T. Barrett et al., "NCBI GEO: Archive for High-Throughput Functional Genomic Data," Nucleic Acids Research, vol. 37, pp. D885-D890, 2009.
[8] A. Ben-Dor, B. Chor, R. Karp, and Z. Yakhini, "Discovering Local Structure in Gene Expression Data: The Order-Preserving Submatrix Problem," J. Computational Biology, vol. 10, nos. 3/4, pp. 373-384, 2003.
[9] Y. Cheng and G.M. Church, "Biclustering of Expression Data," Proc. Eighth Int'l Conf. Intelligent Systems for Molecular Biology (ISMB), 2000.
[10] L. Cheung, D.W. Cheung, B. Kao, K.Y. Yip, and M.K. Ng, "On Mining Micro-Array Data by Order-Preserving Submatrix," Int'l J. Bioinformatics Research and Applications, vol. 3, no. 1, pp. 42-64, 2007.
[11] W.G. Cochran, Sampling Techniques, third ed. John Wiley and Sons, 1977.
[12] A.C. Davison and D.V. Hinkley, Bootstrap Methods and Their Application. Cambridge Univ. Press, 1997.
[13] I. Dhillon, E. Marcotte, and U. Roshan, "Diametrical Clustering for Identifying Anti-Correlated Gene Clusters," Bioinformatics, vol. 19, no. 13, pp. 1612-1619, 2003.
[14] B. Efron, "Bootstrap Methods: Another Look at the Jackknife," Annals of Statistics, vol. 7, pp. 1-26, 1979.
[15] M. Ester, H. Kriegel, J. Sander, and X. Xu, "A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise," Proc. Second Int'l Conf. Knowledge Discovery and Data Mining (KDD), 1996.
[16] J.H. Friedman and J.J. Meulman, "Clustering Objects on Subsets of Attributes," J. Royal Statistical Soc., vol. 66, no. 4, pp. 815-849, 2004.
[17] B.J. Gao, O.L. Griffith, M. Ester, and S.J.M. Jones, "Discovering Significant OPSM Subspace Clusters in Massive Gene Expression Data," Proc. 12th ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining (KDD), 2006.
[18] R. Gentleman et al., "Bioconductor: Open Software Development for Computational Biology and Bioinformatics," Genome Biology, vol. 5, no. 10, p. R80, 2004.
[19] S. Goil, H. Nagesh, and A. Choudhary, "MAFIA: Efficient and Scalable Subspace Clustering for Very Large Data Sets," Technical Report CPDC-TR-9906-010, Northwestern Univ., 1999.
[20] A. Goldstrohm, A. Greenleaf, and M. Garcia-Blanco, "Co-Transcriptional Splicing of Pre-Messenger RNAs: Considerations for the Mechanism of Alternative Splicing," Gene, vol. 277, nos. 1/2, pp. 31-47, 2001.
[21] O. Griffith et al., "Assessment and Integration of Publicly Available SAGE, cDNA Microarray, and Oligonucleotide Microarray Expression Data for Global Coexpression Analyses," Genomics, vol. 86, pp. 476-488, 2005.
[22] S. Hanhijärvi, M. Ojala, N. Vuokko, K. Puolamäki, N. Tatti, and H. Mannila, "Tell Me Something I Don't Know: Randomization Strategies for Iterative Data Mining," Proc. 15th ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining (KDD), 2009.
[23] J. Hartigan, "Direct Clustering of a Data Matrix," J. Am. Statistical Assoc., vol. 67, no. 337, pp. 123-129, 1972.
[24] I. Hedenfalk et al., "Gene-Expression Profiles in Hereditary Breast Cancer," New England J. Medicine, vol. 344, no. 8, pp. 539-548, 2001.
[25] S. Ho Sui et al., "oPOSSUM: Identification of Over-Represented Transcription Factor Binding Sites in Co-Expressed Genes," Nucleic Acids Research, vol. 33, no. 10, pp. 3154-3164, 2005.
[26] J. Hubble et al., "Implementation of Genepattern within the Stanford Microarray Database," Nucleic Acids Research, vol. 37, pp. D898-D901, 2009.
[27] L. Jensen et al., "Arrayprospector: A Web Resource of Functional Associations Inferred from Microarray Expression Data," Nucleic Acids Research, vol. 32, pp. W445-W448, 2004.
[28] L. Jing, M.K. Ng, and J.Z. Huang, "An Entropy Weighting K-Means Algorithm for Subspace Clustering of High-Dimensional Sparse Data," IEEE Trans. Knowledge and Data Eng., vol. 19, no. 8, pp. 1026-1041, Aug. 2007.
[29] L. Kaufman and P.J. Rousseeuw, Finding Groups in Data: An Introduction to Cluster Analysis. John Wiley and Sons, 1990.
[30] H.-P. Kriegel, P. Kröger, M. Renz, and S. Wurst, "A Generic Framework for Efficient Subspace Clustering of High-Dimensional Data," Proc. IEEE Fifth Int'l Conf. Data Mining (ICDM), 2005.
[31] H.-P. Kriegel, P. Kröger, and A. Zimek, "Clustering High-Dimensional Data: A Survey on Subspace Clustering, Pattern-Based Clustering, and Correlation Clustering," ACM Trans. Knowledge Discovery Data, vol. 3, no. 1, pp. 1-58, 2009.
[32] J. Liu and W. Wang, "OP-Cluster: Clustering by Tendency in High Dimensional Space," Proc. IEEE Third Int'l Conf. Data Mining (ICDM), 2003.
[33] J. Liu, J. Yang, and W. Wang, "Biclustering in Gene Expression Data by Tendency," Proc. IEEE Computational Systems Bioinformatics Conf. (CSB), 2004.
[34] J.B. MacQueen, "Some Methods for Classification and Analysis of Multivariate Observations," Proc. Fifth Berkeley Symp. Math. Statistics and Probability, 1967.
[35] S. Madeira and A. Oliveira, "Biclustering Algorithms for Biological Data Analysis: A Survey," IEEE/ACM Trans. Computational Biology and Bioinformatics, vol. 1, no. 1, pp. 24-45, Jan.-Mar. 2004.
[36] T. Mielikäinen and H. Mannila, "The Pattern Ordering Problem," Proc. Seventh European Conf. Principles and Practice of Knowledge Discovery in Databases (PKDD), 2003.
[37] B. Mirkin, Mathematical Classification and Clustering. Kluwer Academic Publishers, 1996.
[38] B. Modrek and C. Lee, "A Genomic View of Alternative Splicing," Nature Genetics, vol. 30, no. 1, pp. 13-19, 2002.
[39] R.T. Ng and J. Han, "Efficient and Effective Clustering Methods for Spatial Data Mining," Proc. 20th Int'l Conf. Very Large Data Bases (VLDB), 1994.
[40] J. Pei, J. Han, B. Mortazavi-Asl, J. Wang, H. Pinto, Q. Chen, U. Dayal, and M.-C. Hsu, "Mining Sequential Patterns by Pattern-Growth: The Prefixspan Approach," IEEE Trans. Knowledge and Data Eng., vol. 16, no. 11, pp. 1424-1440, Nov. 2004.
[41] A. Prelic et al., "A Systematic Comparison and Evaluation of Biclustering Methods for Gene Expression Data," Bioinformatics, vol. 22, no. 9, pp. 1122-1129, 2006.
[42] C.M. Procopiuc, M. Jones, P.K. Agarwal, and T.M. Murali, "A Monte Carlo Algorithm for Fast Projective Clustering," Proc. ACM SIGMOD Int'l Conf. Management of Data (SIGMOD), 2002.
[43] J. Qian et al., "Beyond Synexpression Relationships: Local Clustering of Time-Shifted and Inverted Gene Expression Profiles Identifies New, Biologically Relevant Interactions," J. Molecular Biology, vol. 314, no. 5, pp. 1053-1066, 2001.
[44] C.P. Robert and G. Casella, Monte Carlo Statistical Methods. Springer, 2004.
[45] R. Srikant and R. Agrawal, "Mining Sequential Patterns: Generalizations and Performance Improvements," Proc. Fifth Int'l Conf. Extending Database Technology (EDBT), 1996.
[46] A. Su et al., "Large-Scale Analysis of the Human and Mouse Transcriptomes," Proc. Nat'l Academy of Sciences USA, vol. 99, no. 7, pp. 4465-4470, 2002.
[47] H. Wang, W. Wang, J. Yang, and P.S. Yu, "Clustering by Pattern Similarity in Large Data Sets," Proc. ACM SIGMOD Int'l Conf. Management of Data (SIGMOD), 2002.
[48] K.-G. Woo, J.-H. Lee, M.-H. Kim, and Y.-J. Lee, "Findit: A Fast and Intelligent Subspace Clustering Algorithm Using Dimension Voting," Information and Software Technology, vol. 46, no. 4, pp. 255-271, 2004.
[49] J. Yang, W. Wang, H. Wang, and P.S. Yu, "δ-Clusters: Capturing Subspace Correlation in a Large Data Set," Proc. 18th Int'l Conf. Data Eng. (ICDE), 2002.
[50] K.Y. Yip, D.W. Cheung, and M.K. Ng, "Harp: A Practical Projected Clustering Algorithm," IEEE Trans. Knowledge and Data Eng., vol. 16, no. 11, pp. 1387-1397, Nov. 2004.
[51] M.L. Yiu and N. Mamoulis, "Iterative Projected Clustering by Subspace Mining," IEEE Trans. Knowledge and Data Eng., vol. 17, no. 2, pp. 176-189, Feb. 2005.
[52] M.J. Zaki, "SPADE: An Efficient Algorithm for Mining Frequent Sequences," Machine Learning, vol. 42, nos. 1/2, pp. 31-60, 2001.
[53] B. Zeeberg et al., "High-Throughput GoMiner, an 'Industrial-Strength' Integrative Gene Ontology Tool for Interpretation of Multiple-Microarray Experiments, with Application to Studies of Common Variable Immune Deficiency (CVID)," BMC Bioinformatics, vol. 6, article 168, 2005.
[54] T. Zhang, R. Ramakrishnan, and M. Livny, "Birch: An Efficient Data Clustering Method for Very Large Databases," SIGMOD Record, vol. 25, no. 2, pp. 103-114, 1996.
[55] Y. Zhao, J. Xu Yu, G. Wang, L. Chen, B. Wang, and G. Yu, "Maximal Subspace Coregulated Gene Clustering," IEEE Trans. Knowledge and Data Eng., vol. 20, no. 1, pp. 83-98, Jan. 2008.