Issue No.04 - October-December (2008 vol.5)

pp: 557-567

Alexander Schliep , MPI for Molecular Genetics, Berlin

Roland Krause , MPI for Molecular Genetics, Berlin

DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.50

ABSTRACT

The representation of a genome by oligonucleotide probes is a prerequisite for the analysis of many of its basic properties, such as transcription factor binding sites, chromosomal breakpoints, gene expression of known genes, and detection of novel genes, in particular those coding for small RNAs. An ideal representation would consist of a high-density set of oligonucleotides with similar melting temperatures that do not cross-hybridize with other regions of the genome and are equidistantly spaced. The implementation of such a design is typically called a tiling array or genome array. We formulate the minimal cost tiling path problem for the selection of oligonucleotides from a set of candidates. Computing the selection of probes requires multi-criterion optimization, which we cast into a shortest path problem. Standard algorithms running in linear time allow us to compute globally optimal tiling paths from millions of candidate oligonucleotides on a standard desktop computer for most problem variants. The solutions to this multi-criterion optimization are spatially adaptive to the problem instance. Our formulation easily incorporates experimental constraints with respect to specific regions of interest and trade-offs between hybridization parameters, probe quality, and tiling density.
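
To make the shortest path formulation concrete, the following is a minimal sketch rather than the authors' implementation. It assumes each candidate probe has a genomic start position and a precomputed quality cost (lower is better), that the cost of stepping from one selected probe to the next is the absolute deviation of their distance from a desired spacing, and that the path must begin at the first candidate and end at the last. The function name `min_cost_tiling_path`, its parameters, and the additive cost function are all hypothetical choices made for illustration. Because candidates sorted by position form a directed acyclic graph, one pass in position order finds a globally optimal path in time linear in the number of edges.

```python
from typing import List, Sequence, Tuple


def min_cost_tiling_path(
    positions: Sequence[int],
    probe_costs: Sequence[float],
    target_spacing: int,
    max_gap: int,
) -> Tuple[float, List[int]]:
    """Return (total cost, indices of selected candidates).

    Assumptions (illustrative, not taken from the paper):
    - positions are sorted in ascending order;
    - the path starts at the first and ends at the last candidate;
    - edge cost = |gap - target_spacing| + quality cost of the probe entered;
    - two consecutive selected probes may be at most max_gap apart.
    """
    n = len(positions)
    inf = float("inf")
    best = [inf] * n   # best[j]: cheapest path cost ending at candidate j
    prev = [-1] * n    # predecessor of j on that cheapest path
    best[0] = probe_costs[0]

    # Candidates sorted by position are already in topological order,
    # so a single forward pass relaxes every edge exactly once.
    for j in range(1, n):
        for i in range(j - 1, -1, -1):
            gap = positions[j] - positions[i]
            if gap > max_gap:
                break  # earlier candidates are even farther away
            if best[i] == inf:
                continue  # candidate i is itself unreachable
            cost = best[i] + abs(gap - target_spacing) + probe_costs[j]
            if cost < best[j]:
                best[j] = cost
                prev[j] = i

    if best[-1] == inf:
        raise ValueError("no tiling path satisfies the max_gap constraint")

    # Walk the predecessor links back from the last candidate.
    path = []
    k = n - 1
    while k != -1:
        path.append(k)
        k = prev[k]
    path.reverse()
    return best[-1], path
```

For example, with candidates at positions 0, 10, 20, 30, 40, quality costs 0, 5, 0, 5, 0, a target spacing of 20, and a maximum gap of 25, the sketch selects candidates 0, 2, and 4 at total cost 0. Raising the quality cost of a probe or changing the target spacing within a region shifts the selection locally, which is one way to read the abstract's remark that solutions adapt spatially to the problem instance.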

INDEX TERMS

Biology and genetics, Graph Theory

CITATION

Alexander Schliep, Roland Krause, "Efficient Algorithms for the Computational Design of Optimal Tiling Arrays",

*IEEE/ACM Transactions on Computational Biology and Bioinformatics*, vol. 5, no. 4, pp. 557-567, October-December 2008, doi:10.1109/TCBB.2008.50