Fabio Fassetti, Gianluigi Greco, Giorgio Terracina, "Mining Loosely Structured Motifs from Biological Data," IEEE Transactions on Knowledge and Data Engineering, vol. 20, no. 11, pp. 1472-1489, November 2008.
DOI: 10.1109/TKDE.2008.65 (http://doi.ieeecomputersociety.org/10.1109/TKDE.2008.65)
ISSN: 1041-4347
Publisher: IEEE Computer Society, Los Alamitos, CA, USA
Keywords: Data mining; Bioinformatics (genome or protein) databases; Mining methods and algorithms

[1] R. Agrawal and R. Srikant, “Mining Sequential Patterns,” Proc. 11th Int'l Conf. Data Eng. (ICDE '95), pp. 3-14, 1995.
[2] N. Alon, Y. Matias, and M. Szegedy, “The Space Complexity of Approximating the Frequency Moments,” Proc. 28th ACM Symp. Theory of Computing (STOC '96), pp. 20-29, 1996.
[3] A. Apostolico and M. Crochemore, “String Pattern Matching for a Deluge Survival Kit,” Handbook of Massive Data Sets, J. Abello, P.M. Pardalos, and M.G.C. Resende, eds., Kluwer Academic, 2000.
[4] M.I. Arnone and E.H. Davidson, “The Hardwiring of Development: Organization and Function of Genomic Regulatory Systems,” Development, vol. 124, pp. 1851-1864, 1997.
[5] J. Ayres, J. Flannick, J. Gehrke, and T. Yiu, “Sequential Pattern Mining Using a Bitmap Representation,” Proc. Eighth ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining (KDD '02), pp. 429-435, 2002.
[6] B. Babcock, S. Babu, M. Datar, R. Motwani, and J. Widom, “Models and Issues in Data Stream Systems,” Proc. 21st ACM SIGACT-SIGMOD-SIGART Symp. Principles of Database Systems (PODS '02), pp. 1-16, 2002.
[7] R. Baeza-Yates and B. Ribeiro-Neto, Modern Information Retrieval. Addison Wesley Longman, 1999.
[8] T.L. Bailey and C. Elkan, “Unsupervised Learning of Multiple Motifs in Biopolymers Using Expectation Maximization,” Machine Learning, vol. 21, nos. 1-2, pp. 51-80, 1995.
[9] A. Bairoch, “PROSITE: A Dictionary of Protein Sites and Patterns,” Nucleic Acids Research, vol. 20, pp. 2013-2018, 1992.
[10] A. Brazma, I. Jonassen, I. Eidhammer, and D. Gilbert, “Approaches to the Automatic Discovery of Patterns in Biosequences,” J. Computational Biology, vol. 5, no. 2, pp. 277-304, 1998.
[11] A. Brazma, I. Jonassen, J. Vilo, and E. Ukkonen, “Predicting Gene Regulatory Elements in Silico on a Genomic Scale,” Genome Research, vol. 8, pp. 1202-1215, 1998.
[12] J. Buhler and M. Tompa, “Finding Motifs Using Random Projections,” Proc. Fifth Ann. Int'l Conf. Computational Biology (RECOMB '01), pp. 69-76, 2001.
[13] A.M. Carvalho, A.T. Freitas, A.L. Oliveira, and M.F. Sagot, “An Efficient Algorithm for the Identification of Structured Motifs in DNA Promoter Sequences,” IEEE/ACM Trans. Computational Biology and Bioinformatics, vol. 3, no. 2, pp. 126-140, Apr.-June 2006.
[14] J.M. Chen, N. Chuzhanova, P.D. Stenson, C. Ferec, and D.N. Cooper, “Meta-Analysis of Gross Insertions Causing Human Genetic Disease: Novel Mutational Mechanisms and the Role of Replication Slippage,” Human Mutation, vol. 25, no. 2, pp. 207-221, 2005.
[15] E. Cohen, M. Datar, S. Fujiwara, A. Gionis, P. Indyk, R. Motwani, J.D. Ullman, and C. Yang, “Finding Interesting Associations without Support Pruning,” IEEE Trans. Knowledge and Data Eng., vol. 13, no. 1, pp. 64-78, Jan./Feb. 2001.
[16] G. Cormode, M. Datar, P. Indyk, and S. Muthukrishnan, “Comparing Data Streams Using Hamming Norms (How to Zero In),” IEEE Trans. Knowledge and Data Eng., vol. 15, no. 3, pp. 529-540, May/June 2003.
[17] G. Cormode and S. Muthukrishnan, “An Improved Data Stream Summary: The Count-Min Sketch and Its Applications,” J. Algorithms, vol. 55, no. 1, pp. 58-75, 2005.
[18] A. Dobra, M. Garofalakis, J. Gehrke, and R. Rastogi, “Processing Complex Aggregate Queries over Data Streams,” Proc. ACM SIGMOD '02, pp. 61-72, 2002.
[19] I. Erill, M. Escribano, S. Campoy, and J. Barbé, “In Silico Analysis Reveals Substantial Variability in the Gene Contents of the Gamma Proteobacteria LexA-Regulon,” Bioinformatics, vol. 19, no. 17, pp. 2225-2236, 2003.
[20] E. Eskin and P.A. Pevzner, “Finding Composite Regulatory Patterns in DNA Sequences,” Proc. 10th Int'l Conf. Intelligent Systems for Molecular Biology (ISMB '02), pp. 354-363, 2002.
[21] M. Ester and X. Zhang, “A Top-Down Method for Mining Most-Specific Frequent Patterns in Biological Sequences,” Proc. SIAM Int'l Conf. Data Mining (SDM), 2004.
[22] P.B. Gibbons and Y. Matias, “Synopsis Data Structures for Massive Data Sets,” External Memory Algorithms, pp. 39-70, 1999.
[23] P.B. Gibbons and S. Tirthapura, “Estimating Simple Functions on the Union of Data Streams,” Proc. 13th ACM Symp. Parallel Algorithms and Architectures (SPAA '01), pp. 281-291, 2001.
[24] C.A. Gross, M. Lonetto, and R. Losick, “Bacterial Sigma Factors,” Transcriptional Regulation, vol. 1, pp. 129-176, 1992.
[25] D. GuhaThakurta and G.D. Stormo, “Identifying Target Sites for Cooperatively Binding Factors,” Bioinformatics, vol. 17, no. 7, pp. 608-621, 2001.
[26] D. Gusfield, Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology. Cambridge Univ. Press, 1997.
[27] J. van Helden, A.F. Rios, and J. Collado-Vides, “Discovering Regulatory Elements in Non-Coding Sequences by Analysis of Spaced Dyads,” Nucleic Acids Research, vol. 28, no. 8, pp. 1808-1818, 2000.
[28] G. Hertz and G. Stormo, “Identifying DNA and Protein Patterns with Statistically Significant Alignments of Multiple Sequences,” Bioinformatics, vol. 15, nos. 7-8, pp. 563-577, 1999.
[29] D.A. Hinds, L.L. Stuve, G.B. Nilsen, E. Halperin, E. Eskin, D.G. Ballinger, K.A. Frazer, and D.R. Cox, “Whole-Genome Patterns of Common DNA Variation in Three Human Populations,” Science, vol. 307, no. 5712, pp. 1072-1079, 2005.
[30] J.D. Hughes, P.W. Estep, S. Tavazoie, and G.M. Church, “Computational Identification of cis-Regulatory Elements Associated with Groups of Functionally Related Genes in Saccharomyces Cerevisiae,” J. Molecular Biology, vol. 296, no. 5, pp. 1205-1214, 2000.
[31] P. Indyk, N. Koudas, and S. Muthukrishnan, “Identifying Representative Trends in Massive Time Series Data Sets Using Sketches,” Proc. 26th Int'l Conf. Very Large Databases (VLDB '00), pp. 363-372, 2000.
[32] I. Jonassen, J.F. Collins, and D.G. Higgins, “Finding Flexible Patterns in Unaligned Protein Sequences,” Protein Science, vol. 4, pp. 1587-1595, 1995.
[33] U. Keich and P.A. Pevzner, “Finding Motifs in the Twilight Zone,” Proc. Sixth Ann. Int'l Conf. Research in Computational Molecular Biology (RECOMB '02), pp. 195-204, 2002.
[34] L. Li, Y. Liang, and R.L. Bass, “GAPWM: A Genetic Algorithm Method for Optimizing a Position Weight Matrix,” Bioinformatics, vol. 23, no. 10, pp. 1188-1194, 2007.
[35] H. Mannila, H. Toivonen, and A. Inkeri Verkamo, “Discovery of Frequent Episodes in Event Sequences,” Data Mining and Knowledge Discovery, vol. 1, no. 3, pp. 259-289, 1997.
[36] L. Marsan and M.F. Sagot, “Algorithms for Extracting Structured Motifs Using a Suffix Tree with Application to Promoter and Regulatory Site Consensus Identification,” J. Computational Biology, vol. 7, pp. 345-360, 2000.
[37] N.D. Mendes, A.C. Casimiro, P.M. Santos, I. Sá-Correia, A.L. Oliveira, and A.T. Freitas, “MUSA: A Parameter Free Algorithm for the Identification of Biologically Significant Motifs,” Bioinformatics, vol. 22, no. 24, pp. 2996-3002, 2006.
[38] G. Navarro, “A Guided Tour to Approximate String Matching,” ACM Computing Surveys, vol. 33, no. 1, pp. 31-88, 2001.
[39] A. Neuwald, J. Liu, and C. Lawrence, “Gibbs Motif Sampling: Detection of Bacterial Outer Membrane Repeats,” Protein Science, vol. 4, pp. 1618-1632, 1995.
[40] A.F. Neuwald and P. Green, “Detecting Patterns in Protein Sequences,” J. Molecular Biology, vol. 239, pp. 698-712, 1994.
[41] M. Osanai, H. Takahashi, K.K. Kojima, M. Hamada, and H. Fujiwara, “Essential Motifs in the 3' Untranslated Region Required for Retrotransposition and the Precise Start of Reverse Transcription in Non-Long-Terminal-Repeat Retrotransposon SART1,” Molecular and Cellular Biology, vol. 24, no. 19, pp. 7902-7913, 2004.
[42] G. Pavesi, G. Mauri, and G. Pesole, “In Silico Representation and Discovery of Transcription Factor Binding Sites,” Briefings in Bioinformatics, vol. 5, pp. 217-236, 2004.
[43] J. Pei, J. Han, B. Mortazavi-Asl, H. Pinto, Q. Chen, U. Dayal, and M. Hsu, “Prefixspan: Mining Sequential Patterns by Prefix-Projected Growth,” Proc. 17th Int'l Conf. Data Eng. (ICDE '01), pp. 215-224, 2001.
[44] S. Robin, J.-J. Daudin, H. Richard, M.-F. Sagot, and S. Schbath, “Occurrence Probability of Structured Motifs in Random Sequences,” J. Computational Biology, vol. 9, pp. 761-773, 2003.
[45] G.K. Sandve, O. Abul, V. Walseng, and F. Drabløs, “Improved Benchmarks for Computational Motif Discovery,” BMC Bioinformatics, vol. 8, no. 193, pp. 1-13, 2007.
[46] G.K. Sandve and F. Drabløs, “A Survey of Motif Discovery Methods in an Integrated Framework,” Biology Direct, vol. 1, no. 11, pp. 1-16, 2006.
[47] S. Sinha, “Composite Motifs in Promoter Regions of Genes: Models and Algorithms,” General Report, 2002.
[48] S. Sinha and M. Tompa, “YMF: A Program for Discovery of Novel Transcription Factor Binding Sites by Statistical Overrepresentation,” Nucleic Acids Research, vol. 31, no. 13, pp. 3586-3588, 2003.
[49] H.O. Smith, T.M. Annau, and S. Chandrasegaran, “Finding Sequence Motifs in Groups of Functionally Related Proteins,” Proc. Nat'l Academy of Sciences, pp. 826-830, 1990.
[50] R. Srikant and R. Agrawal, “Mining Sequential Patterns: Generalizations and Performance Improvements,” Proc. Fifth Int'l Conf. Extending Database Technology (EDBT '96), pp. 3-17, 1996.
[51] Z. Tu, S. Li, and C. Mao, “The Changing Tails of a Novel Short Interspersed Element in Aedes Aegypti: Genomic Evidence for Slippage Retrotransposition and the Relationship between 3' Tandem Repeats and the Poly(dA) Tail,” Genetics, vol. 168, no. 4, pp. 2037-2047, 2004.
[52] A. Vanet, L. Marsan, A. Labigne, and M.-F. Sagot, “Inferring Regulatory Elements from a Whole Genome. An Analysis of Helicobacter Pylori $\sigma^{80}$ Family of Promoter Signals,” J. Molecular Biology, vol. 297, pp. 335-353, 2000.
[53] A. Vanet, L. Marsan, and M.-F. Sagot, “Promoter Sequences and Algorithmical Methods for Identifying Them,” Research in Microbiology, vol. 150, no. 9, pp. 779-799, 1999.
[54] K. Wang, Y. Xu, and J. Xu Yu, “Scalable Sequential Pattern Mining for Biological Sequences,” Proc. ACM 13th Conf. Information and Knowledge Management (CIKM '04), pp. 178-187, 2004.
[55] T. Werner, “Models for Prediction and Recognition of Eukaryotic Promoters,” Mammalian Genome, vol. 10, no. 2, pp. 168-175, 1999.
[56] T. Werner, “The State of the Art of Mammalian Promoter Recognition,” Briefings in Bioinformatics, vol. 4, no. 1, pp. 22-30, 2003.
[57] M.J. Zaki, “Spade: An Efficient Algorithm for Mining Frequent Sequences,” Machine Learning, vol. 42, nos. 1-2, pp. 31-60, 2001.
[58] Y. Zhang and M.J. Zaki, “EXMOTIF: Efficient Structured Motif Extraction,” Algorithms for Molecular Biology, vol. 1, no. 1, rec. no. 21, 2006.
[59] J. Zhu and M. Zhang, “SCPD: A Promoter Database for the Yeast Saccharomyces Cerevisiae,” Bioinformatics, vol. 15, nos. 7-8, pp. 607-611, 1999.

