Issue No.06 - November/December (2011 vol.8)
pp: 1642-1652
Christoph Hafemeister, Max Planck Institute for Molecular Genetics, Ihnestr
Roland Krause, Max Planck Institute for Molecular Genetics, Ihnestr
Alexander Schliep, Rutgers, The State University of New Jersey, Piscataway
ABSTRACT
For designing oligonucleotide tiling arrays, popular current methods still rely on simple criteria such as Hamming distance or longest common factors, neglecting base stacking effects, which contribute strongly to binding energies. Consequently, probes are often prone to cross-hybridization, which reduces the signal-to-noise ratio and complicates downstream analysis. We propose the first computationally efficient method that uses hybridization energy to identify specific oligonucleotide probes. Our Cross-Hybridization Potential (CHP) is computed with a Nearest Neighbor Alignment, which efficiently estimates a lower bound for the Gibbs free energy of the duplex formed by two DNA sequences of bounded length. It is derived from our simplified reformulation of t-gap insertion-deletion-like metrics. The computations are accelerated by a filter that uses weighted ungapped q-grams to arrive at seeds. The computation of the CHP is implemented in our software OSProbes, available under the GPL, which computes sets of viable probe candidates. The user can choose a trade-off between running time and the quality of the selected probes. We obtain very favorable results in comparison with prior approaches with respect to specificity and sensitivity for cross-hybridization and genome coverage with high-specificity probes. The combination of OSProbes and our Tileomatic method, which computes optimal tiling paths from candidate sets, yields globally optimal tiling arrays, balancing probe distance, hybridization conditions, and uniqueness of hybridization.
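To make the simple criteria named in the abstract concrete, the following Python sketch computes the Hamming distance and the longest common factor of two sequences, and finds exact shared q-grams that could serve as seeds. It is illustrative only and is not the OSProbes implementation: the function names are hypothetical, the q-grams here are unweighted (the abstract does not specify the weighting), and both sequences are assumed to be given on the same strand, so no reverse complementing is done.

```python
from collections import defaultdict


def hamming_distance(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def longest_common_factor(a: str, b: str) -> int:
    """Length of the longest contiguous substring shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)  # common-suffix lengths for the previous row
    for x in a:
        cur = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def qgram_seeds(probe: str, target: str, q: int):
    """(probe_pos, target_pos) pairs at which an exact q-gram is shared.

    Assumption: plain unweighted q-grams, used only to illustrate seeding.
    """
    index = defaultdict(list)
    for i in range(len(target) - q + 1):
        index[target[i:i + q]].append(i)
    return [(i, j)
            for i in range(len(probe) - q + 1)
            for j in index.get(probe[i:i + q], [])]
```

Both distance-style criteria treat every matching base alike, regardless of its neighbors. That is the limitation the abstract points to: stacking between adjacent base pairs contributes to binding energy, so two candidates with the same Hamming distance can differ in how strongly they hybridize.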
INDEX TERMS
Biology and genetics, DNA microarrays, tiling arrays, oligonucleotide probes, cross hybridization.
CITATION
Christoph Hafemeister, Roland Krause, Alexander Schliep, "Selecting Oligonucleotide Probes for Whole-Genome Tiling Arrays with a Cross-Hybridization Potential", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol.8, no. 6, pp. 1642-1652, November/December 2011, doi:10.1109/TCBB.2011.39