Issue No. 10, Oct. 2013 (vol. 25)
pp. 2314-2324
David K.Y. Chiu, University of Guelph, Guelph
Thomas W.H. Lui, University of Guelph, Guelph
ABSTRACT
In this research, we introduce a novel, complex associative pattern that is useful because it identifies the core associative structure in the data. We refer to it as a nested high-order pattern (NHOP). The pattern is more specific than associative patterns represented over multiple variables. It also generalizes sequential patterns, because its outcomes need not be contiguous. This paper outlines two search algorithms for its detection, the $r$-Tree and Best-$k$ algorithms. The method is then applied to the analysis of biomolecules using aligned sequence families. In the SH3 protein domain, a model for protein-protein interaction mediators, we identify functional groups (core and binding sites) in the three-dimensional structure as well as amino acid patterns that dominate certain species.
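The point that a pattern's outcomes need not be contiguous can be made concrete with a small sketch. It assumes, for illustration only, that a pattern is a set of (column, residue) outcomes over an aligned sequence family, and it scores the pattern with a simple standardized residual (o - e)/sqrt(e) against independence of the columns. The function names, the toy alignment, and the choice of statistic are all hypothetical; this is not the paper's $r$-Tree or Best-$k$ search, and not necessarily its significance test.

```python
from math import prod, sqrt
from typing import FrozenSet, List, Tuple

# Hypothetical representation: a pattern is a set of (column, residue) outcomes.
# The columns need not be adjacent, unlike a contiguous sequential motif.
Outcome = Tuple[int, str]


def matches(seq: str, pattern: FrozenSet[Outcome]) -> bool:
    """True if every (column, residue) outcome of the pattern occurs in seq."""
    return all(seq[col] == res for col, res in pattern)


def observed_count(alignment: List[str], pattern: FrozenSet[Outcome]) -> int:
    """Number of aligned sequences that contain the whole pattern."""
    return sum(1 for seq in alignment if matches(seq, pattern))


def expected_count(alignment: List[str], pattern: FrozenSet[Outcome]) -> float:
    """Expected count if the pattern's columns were mutually independent."""
    m = len(alignment)
    marginals = [sum(1 for s in alignment if s[col] == res) / m for col, res in pattern]
    return m * prod(marginals)


def standardized_residual(alignment: List[str], pattern: FrozenSet[Outcome]) -> float:
    """(o - e) / sqrt(e); large positive values suggest an associative pattern."""
    o = observed_count(alignment, pattern)
    e = expected_count(alignment, pattern)
    if e == 0:
        # Some outcome never occurs, so o is 0 as well; no evidence either way.
        return 0.0
    return (o - e) / sqrt(e)


# Toy alignment of six equal-length sequences (invented for illustration).
alignment = ["ACDLK", "AEDLK", "ACGLR", "GCDIK", "ACDLK", "AEGIR"]
# Outcomes at columns 0, 2 and 4: non-contiguous positions.
pattern = frozenset({(0, "A"), (2, "D"), (4, "K")})

print(observed_count(alignment, pattern))                   # 3
print(round(expected_count(alignment, pattern), 3))         # 2.222
print(round(standardized_residual(alignment, pattern), 3))  # 0.522
```

Here the pattern occurs in three sequences against roughly 2.2 expected under independence, so the small residual gives little evidence of association. A real analysis would still need a significance threshold and a search over candidate patterns, which is the role the paper assigns to its two search algorithms.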
INDEX TERMS
Algorithm design and analysis, Tin, Statistical analysis, Mutual information, Proteins, Educational institutions, Compounds, pattern analysis, bioinformatics, Classifier design and evaluation, data mining, granular computing
CITATION
David K.Y. Chiu, Thomas W.H. Lui, "NHOP: A Nested Associative Pattern for Analysis of Consensus Sequence Ensembles", IEEE Transactions on Knowledge & Data Engineering, vol. 25, no. 10, pp. 2314-2324, Oct. 2013, doi:10.1109/TKDE.2012.151