Issue No.12 - Dec. (2012 vol.24)
pp: 2218-2231
Qiong Fang , Hong Kong University of Science and Technology, Hong Kong
Wilfred Ng , Hong Kong University of Science and Technology, Hong Kong
Jianlin Feng , Sun Yat-Sen University, Guangzhou
Yuliang Li , Hong Kong University of Science and Technology, Hong Kong
ABSTRACT
Order-Preserving SubMatrices (OPSMs) are used to discover significant biological associations between genes and experimental conditions. Herein, we propose a new relaxed OPSM model based on linearity relaxation, called the Bucket OPSM (BOPSM) model. We develop an efficient method, ApriBopsm, to exhaustively mine BOPSM patterns. We further generalize the BOPSM model by incorporating a similarity relaxation strategy, which yields a generalized model called GeBOPSM, and we adopt a pattern-growing method called SeedGrowth to mine GeBOPSM patterns. Informally, SeedGrowth applies two different growing strategies, one on rows and one on columns, to expand a seed BOPSM into a maximal GeBOPSM pattern. We conduct a series of experiments on both synthetic and biological datasets to study the effectiveness of the proposed relaxed models and the efficiency of the corresponding mining methods. The BOPSM model is shown to capture the characteristics of noisy OPSM patterns and to be superior to its strict counterparts. ApriBopsm is also significantly more efficient than OPC-Tree, the state-of-the-art OPSM mining method. Compared with all current relaxed OPSM models, the GeBOPSM model yields the largest number of mined quality patterns.
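To make the linearity relaxation concrete, the sketch below checks whether a gene's expression row supports a given pattern. It assumes a bucket order is an ordered list of column groups ("buckets") where every value in an earlier bucket must be strictly smaller than every value in a later bucket, while values inside one bucket are left unordered; a strict OPSM column order is then the special case where every bucket holds a single column. The function names and the toy matrix are hypothetical illustrations, not the authors' ApriBopsm or SeedGrowth algorithms, and the code only tests support for one fixed pattern rather than mining patterns.

```python
from typing import List, Sequence

def supports_bucket_order(row: Sequence[float], buckets: Sequence[Sequence[int]]) -> bool:
    """Hypothetical helper: does `row` support the bucket order `buckets`?

    Assumed definition: for consecutive buckets B_i and B_{i+1}, every value in
    B_i is strictly smaller than every value in B_{i+1}. Checking consecutive
    pairs is enough, because the ordering then holds transitively for all pairs.
    """
    for earlier, later in zip(buckets, buckets[1:]):
        if max(row[c] for c in earlier) >= min(row[c] for c in later):
            return False
    return True

def supports_column_order(row: Sequence[float], columns: Sequence[int]) -> bool:
    """Strict OPSM support: a column order is a bucket order of singleton buckets."""
    return supports_bucket_order(row, [[c] for c in columns])

def supporting_rows(matrix: Sequence[Sequence[float]], buckets: Sequence[Sequence[int]]) -> List[int]:
    """Indices of the rows (genes) that support the given bucket order."""
    return [i for i, row in enumerate(matrix) if supports_bucket_order(row, buckets)]

# Toy expression matrix: rows are genes, columns are conditions (made-up values).
expression = [
    [1.0, 5.0, 2.1, 2.0],  # gene 0
    [0.5, 4.0, 1.9, 2.2],  # gene 1: columns 2 and 3 swapped relative to gene 0
    [3.0, 1.0, 2.0, 2.5],  # gene 2: does not follow the pattern
]

bucket_order = [[0], [2, 3], [1]]               # columns 2 and 3 share a bucket
strict_order = [[c] for c in [0, 3, 2, 1]]      # fully linear order

print(supporting_rows(expression, bucket_order))  # [0, 1]
print(supporting_rows(expression, strict_order))  # [0]
```

In this toy example the strict order misses gene 1 because of a small swap between columns 2 and 3, while the bucket order tolerates it; this illustrates, under the stated assumptions, why relaxing linearity can capture noisy OPSM patterns.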
INDEX TERMS
Biological system modeling, Gene expression, Data mining, Linearity, Data models, Itemsets, OPSM, Order-preserving submatrix, biclustering, bucket order, linearity relaxation, similarity relaxation
CITATION
Qiong Fang, Wilfred Ng, Jianlin Feng, Yuliang Li, "Mining Bucket Order-Preserving SubMatrices in Gene Expression Data", IEEE Transactions on Knowledge & Data Engineering, vol.24, no. 12, pp. 2218-2231, Dec. 2012, doi:10.1109/TKDE.2011.180