Issue No.11 - November (2011 vol.23)
pp: 1735-1747
Ruichu Cai, Guangdong University of Technology and South China University of Technology, Guangzhou
Anthony K.H. Tung, National University of Singapore, Singapore
Zhenjie Zhang, National University of Singapore, Singapore
Zhifeng Hao, Guangdong University of Technology, Guangzhou
Previous studies have shown association rules to be useful for classification over high-dimensional gene expression data. However, because of the nature of such data sets, millions of rules can often be derived, and many of them are covered by exactly the same set of training tuples and thus have exactly the same support and confidence. Ranking and selecting useful rules from such equivalent rule groups remains an interesting and unexplored problem. In this paper, we examine two interestingness measures for ranking rules within an equivalent rule group: Max-Subrule-Conf and Min-Subrule-Conf. Based on these measures, we design an incremental Apriori-like algorithm to select the more interesting rules from the lower bound rules of the group. Moreover, we present an improved classification model to fully exploit the potential of the selected rules. Empirical studies on five gene expression data sets show that our proposals improve both the efficiency and the effectiveness of rule extraction and classifier construction.
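To make the notion of an equivalent rule group concrete, the following is a minimal Python sketch on a toy data set. Everything in it is an assumption made for illustration: the rows, the item names (`g1` to `g4`), the helper functions, and the usual class-association-rule definitions of support and confidence. In particular, `subrule_confidences` reflects only one plausible reading of the measure names (the confidences of rules whose antecedents are proper subsets of a rule's antecedent); the abstract does not define Max-Subrule-Conf or Min-Subrule-Conf, so this should not be taken as the paper's definition or algorithm.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy training data: (set of discretized gene items, class label).
ROWS = [
    ({"g1", "g2", "g3"}, "cancer"),
    ({"g1", "g2", "g3"}, "cancer"),
    ({"g1", "g4"}, "normal"),
    ({"g2", "g4"}, "normal"),
]


def cover(antecedent):
    """Indices of the rows that contain every item of the antecedent."""
    return frozenset(i for i, (items, _) in enumerate(ROWS) if antecedent <= items)


def support_confidence(antecedent, label):
    """Support and confidence of the rule `antecedent -> label`."""
    covered = cover(antecedent)
    hits = sum(1 for i in covered if ROWS[i][1] == label)
    support = hits / len(ROWS)
    confidence = hits / len(covered) if covered else 0.0
    return support, confidence


def rule_groups(max_len=3):
    """Group antecedents by the exact set of rows that cover them."""
    items = sorted(set().union(*(row_items for row_items, _ in ROWS)))
    groups = defaultdict(list)
    for k in range(1, max_len + 1):
        for combo in combinations(items, k):
            covered = cover(frozenset(combo))
            if covered:  # skip antecedents that match no row
                groups[covered].append(combo)
    return groups


def subrule_confidences(antecedent, label):
    """Assumed reading: confidences of all rules whose antecedent is a
    non-empty proper subset of `antecedent`."""
    items = sorted(antecedent)
    return [
        support_confidence(frozenset(sub), label)[1]
        for k in range(1, len(items))
        for sub in combinations(items, k)
    ]


groups = rule_groups()
for antecedent in groups[frozenset({0, 1})]:
    sup, conf = support_confidence(frozenset(antecedent), "cancer")
    subs = subrule_confidences(antecedent, "cancer")
    # Rules without sub-rules (single-item antecedents) report None.
    print(antecedent, sup, conf, max(subs, default=None), min(subs, default=None))
```

In this toy data, five antecedents are covered by exactly rows 0 and 1, so all five rules share the same support and confidence for the class `cancer`. Their sub-rule confidences differ, however: the sub-rules of `(g1, g2)` and of `(g1, g3)` do not reach the same maximum. A difference of this kind is what a within-group ranking could exploit.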
Association rules, gene expression data, incremental mining framework, robust classification.
Ruichu Cai, Anthony K.H. Tung, Zhenjie Zhang, Zhifeng Hao, "What is Unequal among the Equals? Ranking Equivalent Rules from Gene Expression Data", IEEE Transactions on Knowledge & Data Engineering, vol.23, no. 11, pp. 1735-1747, November 2011, doi:10.1109/TKDE.2010.207