"Efficient and Accurate Discovery of Patterns in Sequence Data Sets," IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 8, pp. 1154-1168, August 2011.
BibTeX:
@article{10.1109/TKDE.2011.69,
  author = {},
  title = {Efficient and Accurate Discovery of Patterns in Sequence Data Sets},
  journal = {IEEE Transactions on Knowledge and Data Engineering},
  volume = {23},
  number = {8},
  issn = {1041-4347},
  year = {2011},
  pages = {1154-1168},
  doi = {http://doi.ieeecomputersociety.org/10.1109/TKDE.2011.69},
  publisher = {IEEE Computer Society},
  address = {Los Alamitos, CA, USA},
}
RefWorks, ProCite/RefMan/EndNote:
TY  - JOUR
JO  - IEEE Transactions on Knowledge and Data Engineering
TI  - Efficient and Accurate Discovery of Patterns in Sequence Data Sets
IS  - 8
SN  - 1041-4347
SP  - 1154
EP  - 1168
EPD - 1154-1168
PY  - 2011
KW  - trees (mathematics)
KW  - data mining
KW  - suffix-tree-based algorithm
KW  - sequence data sets
KW  - sequence mining algorithms
KW  - flexible and accurate motif detector
KW  - FLAME
KW  - Computational modeling
KW  - Fires
KW  - Data mining
KW  - Biological system modeling
KW  - Approximation algorithms
KW  - DNA
KW  - Proteins
KW  - suffix tree
KW  - Motif
KW  - sequence mining
VL  - 23
JA  - IEEE Transactions on Knowledge and Data Engineering
ER  -
[1] M.O. Dayhoff, R.M. Schwartz, and B. Orcutt, "A Model for Evolutionary Changes in Proteins," Atlas of Protein Sequence and Structure, vol. 5, pp. 345-352, Nat'l Biomedical Research Foundation, 1978.
[2] S. Henikoff and J. Henikoff, "Amino Acid Substitution Matrices from Protein Blocks," Proc. Nat'l Academy of Sciences USA, vol. 89, no. 22, pp. 10915-10919, 1992.
[3] R. Agrawal and R. Srikant, "Fast Algorithms for Mining Association Rules," Proc. Int'l Conf. Very Large Data Bases (VLDB), pp. 487-499, 1994.
[4] R. Agrawal and R. Srikant, "Mining Sequential Patterns," Proc. 11th IEEE Int'l Conf. Data Eng. (ICDE), pp. 3-14, 1995.
[5] M.J. Zaki, "SPADE: An Efficient Algorithm for Mining Frequent Sequences," Machine Learning, vol. 42, nos. 1/2, pp. 31-60, 2001.
[6] J. Wang and J. Han, "BIDE: Efficient Mining of Frequent Closed Sequences," Proc. 20th IEEE Int'l Conf. Data Eng. (ICDE), pp. 79-90, 2004.
[7] X. Yan, J. Han, and R. Afshar, "CloSpan: Mining Closed Sequential Patterns in Large Datasets," Proc. SIAM Int'l Conf. Data Mining (SDM), 2003.
[8] J. Yang, W. Wang, P.S. Yu, and J. Han, "Mining Long Sequential Patterns in a Noisy Environment," Proc. ACM SIGMOD, pp. 406-417, 2002.
[9] S. Sinha and M. Tompa, "YMF: A Program for Discovery of Novel Transcription Factor Binding Sites by Statistical Overrepresentation," Nucleic Acids Research, vol. 31, no. 13, pp. 3586-3588, 2003.
[10] G. Pavesi, P. Mereghetti, G. Mauri, and G. Pesole, "Weeder Web: Discovery of Transcription Factor Binding Sites in a Set of Sequences From Co-Regulated Genes," Nucleic Acids Research, vol. 32, pp. W199-W203, 2004.
[11] E. Eskin and P.A. Pevzner, "Finding Composite Regulatory Patterns in DNA Sequences," Proc. 10th Int'l Conf. Intelligent Systems for Molecular Biology (ISMB), pp. S354-S363, 2002.
[12] J. Buhler and M. Tompa, "Finding Motifs Using Random Projections," J. Computational Biology, vol. 9, no. 2, pp. 225-242, 2002.
[13] G. Das, K.-I. Lin, H. Mannila, G. Renganathan, and P. Smyth, "Rule Discovery from Time Series," Proc. Int'l Conf. Knowledge Discovery and Data Mining (KDD), pp. 16-22, 1998.
[14] S. Hoppner, "Discovery of Temporal Patterns—Learning Rules about the Qualitative Behaviour of Time Series," Proc. Fifth European Conf. Principles and Practice of Knowledge Discovery in Databases, pp. 192-203, 2001.
[15] P. Patel, E. Keogh, J. Lin, and S. Lonardi, "Mining Motifs in Massive Time Series Databases," Proc. IEEE Int'l Conf. Data Mining (ICDM), pp. 370-377, 2002.
[16] H. Wu, B. Salzberg, G.C. Sharp, S.B. Jiang, H. Shirato, and D. Kaeli, "Subsequence Matching on Structured Time Series Data," Proc. ACM SIGMOD, pp. 682-693, 2005.
[17] M.J. Zaki, "Sequence Mining in Categorical Domains: Incorporating Constraints," Proc. Ninth Int'l Conf. Information and Knowledge Management (CIKM), pp. 422-429, 2000.
[18] B.Y.-C. Chiu, E.J. Keogh, and S. Lonardi, "Probabilistic Discovery of Time Series Motifs," Proc. Int'l Conf. Knowledge Discovery and Data Mining (KDD), pp. 493-498, 2003.
[19] W. Wang and J. Yang, Mining Sequential Patterns from Large Data Sets, vol. 28, Springer-Verlag, 2005.
[20] M. Das and H.K. Dai, "A Survey of DNA Motif Finding Algorithms," BMC Bioinformatics, vol. 8, pp. S21-S33, 2007.
[21] G.K. Sandve and F. Drabløs, "A Survey of Motif Discovery Methods in an Integrated Framework," Biology Direct, vol. 1, pp. 11-26, 2006.
[22] J. Pei, J. Han, B. Mortazavi-Asl, H. Pinto, Q. Chen, U. Dayal, and M. Hsu, "PrefixSpan: Mining Sequential Patterns by Prefix-Projected Growth," Proc. IEEE Int'l Conf. Data Eng. (ICDE), pp. 215-224, 2001.
[23] J. Pei, J. Han, and W. Wang, "Mining Sequential Patterns with Constraints in Large Databases," Proc. 11th Int'l Conf. Information and Knowledge Management (CIKM), pp. 18-25, 2002.
[24] A. Brazma, I. Jonassen, I. Eidhammer, and D. Gilbert, "Approaches to the Automatic Discovery of Patterns in Biosequences," J. Computational Biology, vol. 5, pp. 279-305, 1998.
[25] L. Marsan and M.-F. Sagot, "Algorithms for Extracting Structured Motifs Using a Suffix Tree with Application to Promoter and Regulatory Site Consensus Identification," J. Computational Biology, vol. 7, nos. 3/4, pp. 345-360, 2000.
[26] F. Zhu, X. Yan, J. Han, and P.S. Yu, "Efficient Discovery of Frequent Approximate Sequential Patterns," Proc. Seventh IEEE Int'l Conf. Data Mining (ICDM), 2007.
[27] S. Rajasekaran, S. Balla, C.-H. Huang, V. Thapar, M.R. Gryk, M.W. Maciejewski, and M.R. Schiller, "Exact Algorithms for Motif Search," Proc. Asia-Pacific Bioinformatics Conf. (APBC), pp. 239-248, 2005.
[28] J. Davila, S. Balla, and S. Rajasekaran, "Fast and Practical Algorithms for Planted (l, d) Motif Search," IEEE/ACM Trans. Computational Biology and Bioinformatics, vol. 4, no. 4, pp. 544-552, Oct.-Dec. 2007.
[29] J. Davila, S. Balla, and S. Rajasekaran, "Space and Time Efficient Algorithms for Planted Motif Search," Proc. Int'l Conf. Computational Science, pp. 822-829, 2006.
[30] S. Rajasekaran, S. Balla, and C.-H. Huang, "Exact Algorithms for Planted Motif Challenge Problems," Proc. Asia-Pacific Bioinformatics Conf. (APBC), pp. 249-259, 2005.
[31] T.L. Bailey and C. Elkan, "Unsupervised Learning of Multiple Motifs in Biopolymers Using EM," Machine Learning, vol. 21, nos. 1/2, pp. 51-80, 1995.
[32] W. Thompson, E.C. Rouchka, and C.E. Lawrence, "Gibbs Recursive Sampler: Finding Transcription Factor Binding Sites," Nucleic Acids Research, vol. 31, no. 13, pp. 3580-3585, 2003.
[33] G. Narasimhan, C. Bu, Y. Gao, X. Wang, N. Xu, and K. Mathee, "Mining Protein Sequences for Motifs," J. Computational Biology, vol. 9, no. 5, pp. 707-720, 2002.
[34] I. Rigoutsos and A. Floratos, "Motif Discovery without Alignment or Enumeration (Extended Abstract)," Proc. Second Ann. Int'l Conf. Computational Molecular Biology (RECOMB), pp. 221-227, 1998.
[35] M. Tompa et al., "Assessing Computational Tools for the Discovery of Transcription Factor Binding Sites," Nature Biotechnology, vol. 23, pp. 137-144, 2005.
[36] A.W.-C. Fu, E.J. Keogh, L.Y.H. Lau, and C.A. Ratanamahatana, "Scaling and Time Warping in Time Series Querying," Proc. Int'l Conf. Very Large Data Bases (VLDB), pp. 649-660, 2005.
[37] M. Vlachos, G. Kollios, and D. Gunopulos, "Discovering Similar Multidimensional Trajectories," Proc. 18th IEEE Int'l Conf. Data Eng. (ICDE), pp. 673-684, 2002.
[38] L. Chen, M. Tamer Ozsu, and V. Oria, "Robust and Fast Similarity Search for Moving Object Trajectories," Proc. ACM SIGMOD, pp. 491-502, 2005.
[39] Y. Zhu and D. Shasha, "Warping Indexes with Envelope Transforms for Query by Humming," Proc. ACM SIGMOD, pp. 181-192, 2003.
[40] A. Udechukwu, K. Barker, and R. Alhajj, "Discovering all Frequent Trends in Time Series," Proc. Winter Int'l Symp. Information and Comm. Technologies, vol. 58, pp. 1-6, 2004.
[41] Y. Zhang and M.J. Zaki, "SMOTIF: Efficient Structured Pattern and Profile Motif Search," Algorithms for Molecular Biology, vol. 1, pp. 22-45, 2006.
[42] G. Navarro and M. Raffinot, "Fast and Simple Character Classes and Bounded Gaps Pattern Matching, with Applications to Protein Searching," J. Computational Biology, vol. 10, no. 6, pp. 903-923, 2003.
[43] A. Policriti, N. Vitacolonna, M. Morgante, and A. Zuccolo, "Structured Motifs Search," Proc. Eighth Ann. Int'l Conf. Research in Computational Molecular Biology (RECOMB), pp. 133-139, 2004.
[44] A.M. Carvalho, A.T. Freitas, A.L. Oliveira, and M.-F. Sagot, "Efficient Extraction of Structured Motifs Using Box-Links," Proc. Int'l Symp. String Processing and Information Retrieval (SPIRE), pp. 267-268, 2004.
[45] A.M. Carvalho, A.T. Freitas, A.L. Oliveira, and M.-F. Sagot, "A Highly Scalable Algorithm for the Extraction of Cis-Regulatory Regions," Proc. Asia-Pacific Bioinformatics Conf. (APBC), pp. 273-282, 2005.
[46] A.M. Carvalho et al., "An Efficient Algorithm for the Identification of Structured Motifs in DNA Promoter Sequences," IEEE/ACM Trans. Computational Biology and Bioinformatics, vol. 3, no. 2, pp. 126-140, Apr.-June 2006.
[47] N. Pisanti, A.M. Carvalho, L. Marsan, and M.-F. Sagot, "Risotto: Fast Extraction of Motifs with Mismatches," Proc. Seventh Latin Am. Theoretical Informatics Symp. (LATIN), pp. 757-768, 2006.
[48] Y. Zhang and M.J. Zaki, "EXMOTIF: Efficient Structured Motif Extraction," Algorithms for Molecular Biology, vol. 1, pp. 21-38, 2006.
[49] F. Fassetti, G. Greco, and G. Terracina, "Mining Loosely Structured Motifs from Biological Data," IEEE Trans. Knowledge and Data Eng., vol. 20, no. 11, pp. 1472-1489, Nov. 2008.
[50] D.S. Latchman, "Transcription Factors: An Overview," Int'l J. Biochemistry and Cell Biology, vol. 29, no. 12, pp. 1305-1312, 1997.
[51] I. Jonassen, J.F. Collins, and D.G. Higgins, "Finding Flexible Patterns in Unaligned Protein Sequences," Protein Science, vol. 4, no. 8, pp. 1587-1595, 1995.
[52] "Data Sets from Analysis of Financial Time Series," http://www.gsb.uchicago.edu/fac/ruey.tsay/teaching/fts/, 2010.
[53] R.S. Tsay, Analysis of Financial Time Series, first ed., Wiley-Interscience, Oct. 2001.
[54] P.A. Pevzner and S.-H. Sze, "Combinatorial Approaches to Finding Subtle Signals in DNA Sequences," Proc. Eighth Int'l Conf. Intelligent Systems for Molecular Biology (ISMB), pp. 269-278, 2000.
[55] "cSPADE Source Code," http://www.cs.rpi.edu/~zaki/software/, 2010.
[56] "CloSpan Source Code," http://illimine.cs.uiuc.edu/, 2011.
[57] "YMF Source Code," http://bio.cs.washington.edu/software.html, 2010.
[58] "Weeder Source Code," http://www.pesolelab.it/Toolind.php, 2010.
[59] "Random Projections Source Code," http://www.cse.wustl.edu/~jbuhler/pgt/, 2010.
[60] S. Tata, R.A. Hankins, and J.M. Patel, "Practical Suffix Tree Construction," Proc. Int'l Conf. Very Large Data Bases (VLDB), pp. 36-47, 2004.
[61] A. Gionis, P. Indyk, and R. Motwani, "Similarity Search in High Dimensions via Hashing," Proc. Int'l Conf. Very Large Data Bases (VLDB), pp. 518-529, 1999.
[62] T.L. Bailey and C. Elkan, "Fitting a Mixture Model by Expectation Maximization to Discover Motifs in Biopolymers," Proc. Second Int'l Conf. Intelligent Systems for Molecular Biology (ISMB), pp. 28-36, 1994.
[63] "TRANSFAC," http://www.gene-regulation.com/pub/databases.html, 2011.

