High Confidence Rule Mining for Microarray Analysis
October-December 2007 (vol. 4 no. 4)
pp. 611-623
We present an association rule mining method for mining high confidence rules, which describe interesting gene relationships, from microarray datasets. Microarray datasets typically contain an order of magnitude more genes than experiments, rendering many data mining methods impractical as they are optimised for sparse datasets. A new family of row-enumeration rule mining algorithms has emerged to facilitate mining in dense datasets. These algorithms rely on the support measure to prune infrequent relationships and so reduce the search space. A major shortcoming of this approach is that it prunes many potentially interesting rules with low support but high confidence. We propose a new row-enumeration rule mining method, MaxConf, to mine high confidence rules from microarray data. MaxConf is a support-free algorithm which directly uses the confidence measure to effectively prune the search space. Experiments on three microarray datasets show that MaxConf outperforms support-based rule mining with respect to scalability and rule extraction. Furthermore, detailed biological analyses demonstrate the effectiveness of our approach: the rules discovered by MaxConf are substantially more interesting and meaningful than those discovered by support-based methods.
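
The distinction between support and confidence that motivates the method can be illustrated with a small sketch. This is not the MaxConf algorithm; it only computes the two standard measures on a made-up toy dataset. The item names (A to E), the ten rows, and the 0.2 support threshold are all assumptions chosen for illustration, with each row standing for one experiment and each item for a discretised gene expression event.

```python
from typing import FrozenSet, List

Row = FrozenSet[str]


def support(rows: List[Row], items: Row) -> float:
    """Fraction of rows that contain every item in `items`."""
    return sum(1 for r in rows if items <= r) / len(rows)


def confidence(rows: List[Row], antecedent: Row, consequent: Row) -> float:
    """Fraction of rows containing the antecedent that also contain the consequent."""
    covered = [r for r in rows if antecedent <= r]
    if not covered:
        return 0.0
    return sum(1 for r in covered if consequent <= r) / len(covered)


# Hypothetical toy data: ten experiments, five gene events.
rows: List[Row] = [
    frozenset({"A", "B"}),
    frozenset({"C", "D"}),
    frozenset({"C", "D"}),
    frozenset({"C", "D"}),
    frozenset({"C"}),
    frozenset({"C"}),
    frozenset({"C"}),
    frozenset({"D"}),
    frozenset({"E"}),
    frozenset({"E"}),
]

MIN_SUPPORT = 0.2  # assumed threshold for a support-based miner

for ante, cons in [({"A"}, {"B"}), ({"C"}, {"D"})]:
    a, c = frozenset(ante), frozenset(cons)
    s = support(rows, a | c)
    conf = confidence(rows, a, c)
    kept = s >= MIN_SUPPORT
    print(f"{sorted(a)} -> {sorted(c)}: support={s:.2f}, "
          f"confidence={conf:.2f}, survives support pruning={kept}")
```

In this toy dataset the rule A -> B holds in every row where A occurs, yet it appears in only one of ten rows, so a support threshold of 0.2 discards it. The rule C -> D passes the threshold although it holds only half the time. Rules of the first kind are the ones a support-free, confidence-driven search aims to retain.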

Index Terms:
Data mining, association rules, high confidence rule mining, microarray analysis
Citation:
Tara McIntosh, Sanjay Chawla, "High Confidence Rule Mining for Microarray Analysis," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 4, no. 4, pp. 611-623, Oct.-Dec. 2007, doi:10.1109/tcbb.2007.1050