
ASCII Text
"Efficient and Accurate Discovery of Patterns in Sequence Data Sets," IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 8, pp. 1154-1168, August 2011.
BibTeX
@article{10.1109/TKDE.2011.69,
  author = {},
  title = {Efficient and Accurate Discovery of Patterns in Sequence Data Sets},
  journal = {IEEE Transactions on Knowledge and Data Engineering},
  volume = {23},
  number = {8},
  issn = {1041-4347},
  year = {2011},
  pages = {1154-1168},
  doi = {http://doi.ieeecomputersociety.org/10.1109/TKDE.2011.69},
  publisher = {IEEE Computer Society},
  address = {Los Alamitos, CA, USA},
}
RefWorks Procite/RefMan/Endnote
TY  - JOUR
JO  - IEEE Transactions on Knowledge and Data Engineering
TI  - Efficient and Accurate Discovery of Patterns in Sequence Data Sets
IS  - 8
SN  - 1041-4347
SP  - 1154
EP  - 1168
EPD - 1154-1168
PY  - 2011
KW  - trees (mathematics)
KW  - data mining
KW  - suffix-tree-based algorithm
KW  - sequence data sets
KW  - sequence mining algorithms
KW  - flexible and accurate motif detector
KW  - FLAME
KW  - Computational modeling
KW  - Fires
KW  - Data mining
KW  - Biological system modeling
KW  - Approximation algorithms
KW  - DNA
KW  - Proteins
KW  - suffix tree.
KW  - Motif
KW  - sequence mining
VL  - 23
JA  - IEEE Transactions on Knowledge and Data Engineering
ER  -
[1] M.O. Dayhoff, R.M. Schwartz, and B. Orcutt, "A Model for Evolutionary Changes in Proteins," Atlas of Protein Sequence and Structure, vol. 5, pp. 345-352, Nat'l Biomedical Research Foundation, 1978.
[2] S. Henikoff and J. Henikoff, "Amino Acid Substitution Matrices from Protein Blocks," Proc. Nat'l Academy of Sciences USA, vol. 89, no. 22, pp. 10915-10919, 1992.
[3] R. Agrawal and R. Srikant, "Fast Algorithms for Mining Association Rules," Proc. Int'l Conf. Very Large Data Bases (VLDB), pp. 487-499, 1994.
[4] R. Agrawal and R. Srikant, "Mining Sequential Patterns," Proc. 11th IEEE Int'l Conf. Data Eng. (ICDE), pp. 3-14, 1995.
[5] M.J. Zaki, "SPADE: An Efficient Algorithm for Mining Frequent Sequences," Machine Learning, vol. 42, nos. 1/2, pp. 31-60, 2001.
[6] J. Wang and J. Han, "BIDE: Efficient Mining of Frequent Closed Sequences," Proc. 20th IEEE Int'l Conf. Data Eng. (ICDE), pp. 79-90, 2004.
[7] X. Yan, J. Han, and R. Afshar, "CloSpan: Mining Closed Sequential Patterns in Large Datasets," Proc. SIAM Int'l Conf. Data Mining (SDM), 2003.
[8] J. Yang, W. Wang, P.S. Yu, and J. Han, "Mining Long Sequential Patterns in a Noisy Environment," Proc. ACM SIGMOD, pp. 406-417, 2002.
[9] S. Sinha and M. Tompa, "YMF: A Program for Discovery of Novel Transcription Factor Binding Sites by Statistical Overrepresentation," Nucleic Acids Research, vol. 31, no. 13, pp. 3586-3588, 2003.
[10] G. Pavesi, P. Mereghetti, G. Mauri, and G. Pesole, "Weeder Web: Discovery of Transcription Factor Binding Sites in a Set of Sequences from Co-Regulated Genes," Nucleic Acids Research, vol. 32, pp. W199-W203, 2004.
[11] E. Eskin and P.A. Pevzner, "Finding Composite Regulatory Patterns in DNA Sequences," Proc. 10th Int'l Conf. Intelligent Systems for Molecular Biology (ISMB), pp. S354-S363, 2002.
[12] J. Buhler and M. Tompa, "Finding Motifs Using Random Projections," J. Computational Biology, vol. 9, no. 2, pp. 225-242, 2002.
[13] G. Das, K.I. Lin, H. Mannila, G. Renganathan, and P. Smyth, "Rule Discovery from Time Series," Proc. Int'l Conf. Knowledge Discovery and Data Mining (KDD), pp. 16-22, 1998.
[14] S. Hoppner, "Discovery of Temporal Patterns - Learning Rules about the Qualitative Behaviour of Time Series," Proc. Fifth European Conf. Principles and Practice of Knowledge Discovery in Databases, pp. 192-203, 2001.
[15] P. Patel, E. Keogh, J. Lin, and S. Lonardi, "Mining Motifs in Massive Time Series Databases," Proc. IEEE Int'l Conf. Data Mining (ICDM), pp. 370-377, 2002.
[16] H. Wu, B. Salzberg, G.C. Sharp, S.B. Jiang, H. Shirato, and D. Kaeli, "Subsequence Matching on Structured Time Series Data," Proc. ACM SIGMOD, pp. 682-693, 2005.
[17] M.J. Zaki, "Sequence Mining in Categorical Domains: Incorporating Constraints," Proc. Ninth Int'l Conf. Information and Knowledge Management (CIKM), pp. 422-429, 2000.
[18] B.Y.C. Chiu, E.J. Keogh, and S. Lonardi, "Probabilistic Discovery of Time Series Motifs," Proc. Int'l Conf. Knowledge Discovery and Data Mining (KDD), pp. 493-498, 2003.
[19] W. Wang and J. Yang, Mining Sequential Patterns from Large Data Sets, vol. 28, Springer-Verlag, 2005.
[20] M. Das and H.K. Dai, "A Survey of DNA Motif Finding Algorithms," BMC Bioinformatics, vol. 8, pp. S21-S33, 2007.
[21] G.K. Sandve and F. Drabløs, "A Survey of Motif Discovery Methods in an Integrated Framework," Biology Direct, vol. 1, pp. 11-26, 2006.
[22] J. Pei, J. Han, B. Mortazavi-Asl, H. Pinto, Q. Chen, U. Dayal, and M. Hsu, "PrefixSpan: Mining Sequential Patterns by Prefix-Projected Growth," Proc. IEEE Int'l Conf. Data Eng. (ICDE), pp. 215-224, 2001.
[23] J. Pei, J. Han, and W. Wang, "Mining Sequential Patterns with Constraints in Large Databases," Proc. 11th Int'l Conf. Information and Knowledge Management (CIKM), pp. 18-25, 2002.
[24] A. Brazma, I. Jonassen, I. Eidhammer, and D. Gilbert, "Approaches to the Automatic Discovery of Patterns in Biosequences," J. Computational Biology, vol. 5, pp. 279-305, 1998.
[25] L. Marsan and M.F. Sagot, "Algorithms for Extracting Structured Motifs Using a Suffix Tree with Application to Promoter and Regulatory Site Consensus Identification," J. Computational Biology, vol. 7, nos. 3/4, pp. 345-360, 2000.
[26] F. Zhu, X. Yan, J. Han, and P.S. Yu, "Efficient Discovery of Frequent Approximate Sequential Patterns," Proc. Seventh IEEE Int'l Conf. Data Mining (ICDM), 2007.
[27] S. Rajasekaran, S. Balla, C.H. Huang, V. Thapar, M.R. Gryk, M.W. Maciejewski, and M.R. Schiller, "Exact Algorithms for Motif Search," Proc. Asia-Pacific Bioinformatics Conf. (APBC), pp. 239-248, 2005.
[28] J. Davila, S. Balla, and S. Rajasekaran, "Fast and Practical Algorithms for Planted (l, d) Motif Search," IEEE/ACM Trans. Computational Biology and Bioinformatics, vol. 4, no. 4, pp. 544-552, Oct.-Dec. 2007.
[29] J. Davila, S. Balla, and S. Rajasekaran, "Space and Time Efficient Algorithms for Planted Motif Search," Proc. Int'l Conf. Computational Science, pp. 822-829, 2006.
[30] S. Rajasekaran, S. Balla, and C.H. Huang, "Exact Algorithms for Planted Motif Challenge Problems," Proc. Asia-Pacific Bioinformatics Conf. (APBC), pp. 249-259, 2005.
[31] T.L. Bailey and C. Elkan, "Unsupervised Learning of Multiple Motifs in Biopolymers Using EM," Machine Learning, vol. 21, nos. 1/2, pp. 51-80, 1995.
[32] W. Thompson, E.C. Rouchka, and C.E. Lawrence, "Gibbs Recursive Sampler: Finding Transcription Factor Binding Sites," Nucleic Acids Research, vol. 31, no. 13, pp. 3580-3585, 2003.
[33] G. Narasimhan, C. Bu, Y. Gao, X. Wang, N. Xu, and K. Mathee, "Mining Protein Sequences for Motifs," J. Computational Biology, vol. 9, no. 5, pp. 707-720, 2002.
[34] I. Rigoutsos and A. Floratos, "Motif Discovery without Alignment or Enumeration (Extended Abstract)," Proc. Second Ann. Int'l Conf. Computational Molecular Biology (RECOMB), pp. 221-227, 1998.
[35] M. Tompa et al., "Assessing Computational Tools for the Discovery of Transcription Factor Binding Sites," Nature Biotechnology, vol. 23, pp. 137-144, 2005.
[36] A.W.C. Fu, E.J. Keogh, L.Y.H. Lau, and C.A. Ratanamahatana, "Scaling and Time Warping in Time Series Querying," Proc. Int'l Conf. Very Large Data Bases (VLDB), pp. 649-660, 2005.
[37] M. Vlachos, G. Kollios, and D. Gunopulos, "Discovering Similar Multidimensional Trajectories," Proc. 18th IEEE Int'l Conf. Data Eng. (ICDE), pp. 673-684, 2002.
[38] L. Chen, M. Tamer Ozsu, and V. Oria, "Robust and Fast Similarity Search for Moving Object Trajectories," Proc. ACM SIGMOD, pp. 491-502, 2005.
[39] Y. Zhu and D. Shasha, "Warping Indexes with Envelope Transforms for Query by Humming," Proc. ACM SIGMOD, pp. 181-192, 2003.
[40] A. Udechukwu, K. Barker, and R. Alhajj, "Discovering All Frequent Trends in Time Series," Proc. Winter Int'l Symp. Information and Comm. Technologies, vol. 58, pp. 1-6, 2004.
[41] Y. Zhang and M.J. Zaki, "SMOTIF: Efficient Structured Pattern and Profile Motif Search," Algorithms for Molecular Biology, vol. 1, pp. 22-45, 2006.
[42] G. Navarro and M. Raffinot, "Fast and Simple Character Classes and Bounded Gaps Pattern Matching, with Applications to Protein Searching," J. Computational Biology, vol. 10, no. 6, pp. 903-923, 2003.
[43] A. Policriti, N. Vitacolonna, M. Morgante, and A. Zuccolo, "Structured Motifs Search," Proc. Eighth Ann. Int'l Conf. Research in Computational Molecular Biology (RECOMB), pp. 133-139, 2004.
[44] A.M. Carvalho, A.T. Freitas, A.L. Oliveira, and M.F. Sagot, "Efficient Extraction of Structured Motifs Using Box-Links," Proc. Int'l Symp. String Processing and Information Retrieval (SPIRE), pp. 267-268, 2004.
[45] A.M. Carvalho, A.T. Freitas, A.L. Oliveira, and M.F. Sagot, "A Highly Scalable Algorithm for the Extraction of Cis-Regulatory Regions," Proc. Asia-Pacific Bioinformatics Conf. (APBC), pp. 273-282, 2005.
[46] A.M. Carvalho et al., "An Efficient Algorithm for the Identification of Structured Motifs in DNA Promoter Sequences," IEEE/ACM Trans. Computational Biology and Bioinformatics, vol. 3, no. 2, pp. 126-140, Apr.-June 2006.
[47] N. Pisanti, A.M. Carvalho, L. Marsan, and M.F. Sagot, "Risotto: Fast Extraction of Motifs with Mismatches," Proc. Seventh Latin Am. Theoretical Informatics Symp. (LATIN), pp. 757-768, 2006.
[48] Y. Zhang and M.J. Zaki, "EXMOTIF: Efficient Structured Motif Extraction," Algorithms for Molecular Biology, vol. 1, pp. 21-38, 2006.
[49] F. Fassetti, G. Greco, and G. Terracina, "Mining Loosely Structured Motifs from Biological Data," IEEE Trans. Knowledge and Data Eng., vol. 20, no. 11, pp. 1472-1489, Nov. 2008.
[50] D.S. Latchman, "Transcription Factors: An Overview," Int'l J. Biochemistry and Cell Biology, vol. 29, no. 12, pp. 1305-1312, 1997.
[51] I. Jonassen, J.F. Collins, and D.G. Higgins, "Finding Flexible Patterns in Unaligned Protein Sequences," Protein Science, vol. 4, no. 8, pp. 1587-1595, 1995.
[52] "Data Sets from Analysis of Financial Time Series," http://www.gsb.uchicago.edu/fac/ruey.tsay/ teachingfts/, 2010.
[53] R.S. Tsay, Analysis of Financial Time Series, first ed., Wiley-Interscience, Oct. 2001.
[54] P.A. Pevzner and S.H. Sze, "Combinatorial Approaches to Finding Subtle Signals in DNA Sequences," Proc. Eighth Int'l Conf. Intelligent Systems for Molecular Biology (ISMB), pp. 269-278, 2000.
[55] "cSPADE Source Code," http://www.cs.rpi.edu/zakisoftware/, 2010.
[56] "CloSpan Source Code," http:/illimine.cs.uiuc.edu/, 2011.
[57] "YMF Source Code," http://bio.cs.washington.edusoftware. html , 2010.
[58] "Weeder Source Code," http://www.pesolelab.it/Toolind.php, 2010.
[59] "Random Projections Source Code," http://www.cse.wustl.edu/jbuhlerpgt/, 2010.
[60] S. Tata, R.A. Hankins, and J.M. Patel, "Practical Suffix Tree Construction," Proc. Int'l Conf. Very Large Data Bases (VLDB), pp. 36-47, 2004.
[61] A. Gionis, P. Indyk, and R. Motwani, "Similarity Search in High Dimensions via Hashing," Proc. Int'l Conf. Very Large Data Bases (VLDB), pp. 518-529, 1999.
[62] T.L. Bailey and C. Elkan, "Fitting a Mixture Model by Expectation Maximization to Discover Motifs in Biopolymers," Proc. Second Int'l Conf. Intelligent Systems for Molecular Biology (ISMB), pp. 28-36, 1994.
[63] "TRANSFAC," http://www.generegulation.com/pub databases.html , 2011.