The Community for Technology Leaders
RSS Icon
Subscribe
Issue No.02 - April-June (2008 vol.5)
pp: 281-290
ABSTRACT
Let A be a sequence of n real numbers, L1 and L2 be two integers such that L1 ≤ L2 , and R1 and R2 be two real numbers such that R1 ≤ R2. An interval of A is feasible if its length is between L1 and L2 and its average is between R1 and R2. In this paper, we study the following problems: finding all feasible intervals of A, counting all feasible intervals of A, finding a maximum cardinality set of non-overlapping feasible intervals of A, locating a longest feasible interval of A, and locating a shortest feasible interval of A. The problems are motivated from the problem of locating CpG islands in biomolecular sequences. In this paper, we firstly show that all the problems have an Ω(n log n)-time lower bound in the comparison model. Then, we use geometric approaches to design optimal algorithms for the problems. All the presented algorithms run in an on-line manner and use O(n) space.
INDEX TERMS
algorithms, data structures, analysis of algorithms, geometrical problems and computations
CITATION
Yong-Hsiang Hsieh, Chih-Chiang Yu, Biing-Feng Wang, "Optimal Algorithms for the Interval Location Problem with Range Constraints on Length and Average", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol.5, no. 2, pp. 281-290, April-June 2008, doi:10.1109/TCBB.2007.70217
REFERENCES
[1] A. Aho, J. Hopcroft, and J. Ullman, The Design and Analysis of Computer Algorithms. Addison-Wesley, 1974.
[2] L. Allison, “Longest Biased Interval and Longest Non-Negative Sum Interval,” Bioinformatics, vol. 19, no. 10, pp. 1294-1295, 2003.
[3] F. Antequera, “Structure, Function and Evolution of CpG Island Promoters,” Cellular and Molecular Life Sciences, vol. 60, no. 8, pp.1647-1658, 2003.
[4] F. Antequera and A. Bird, “Number of CpG Islands and Genes in Human and Mouse,” Proc. Nat'l Academy of Sciences, vol. 90, no. 24, pp. 11995-11999, 1993.
[5] R. Bayer, “Symmetric Binary B-Trees: Data Structure and Maintenance Algorithms,” Acta Informatica, vol. 1, pp. 290-306, 1972.
[6] R. Bayer and E.M. McCreight, “Organization and Maintenance of Large Ordered Indexes,” Acta Informatica, vol. 1, pp. 173-189, 1972.
[7] B. Chazelle, “A Functional Approach to Data Structures and Its Use in Multidimensional Searching,” SIAM J. Computing, vol. 17, no. 3, pp. 427-462, 1988.
[8] K.-Y. Chen and K.-M. Chao, “Optimal Algorithms for Locating the Longest and Shortest Segments Satisfying a Sum or an Average Constraint,” Information Processing Letters, vol. 96, no. 6, pp. 197-201, 2005.
[9] T.H. Cormen, C.E. Leiserson, R.L. Rivest, and C. Stein, Introduction to Algorithms, second ed. McGraw-Hill, 2001.
[10] R. Durbin, S. Eddy, A. Krogh, and G. Mitchison, Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge Univ. Press, 1998.
[11] M. Esteller, “CpG Island Hypermethylation and Tumor Suppressor Genes: A Booming Present, A Brighter Future,” Oncogene, vol. 21, no. 35, pp. 5427-5440, 2002.
[12] G.N. Frederickson and S. Rodger, “A New Approach to the Dynamic Maintenance of Maximal Points in a Plane,” Discrete and Computational Geometry, vol. 5, no. 4, pp. 365-374, 1990.
[13] T. Fukuda, Y. Morimoto, S. Morishita, and T. Tokuyama, “Interval Finding and Its Application to Data Mining,” Proc. Seventh Ann. Int'l Symp. Algorithms and Computation, pp. 55-64, 1996.
[14] M.H. Goldwasser, M.-Y. Kao, and H.-I. Lu, “Linear-Time Algorithms for Computing Maximum-Density Sequence Segments with Bioinformatics Applications,” J. Computer and System Sciences, vol. 70, no. 2, pp. 128-144, 2005.
[15] L.J. Guibas and R. Sedgewick, “A Dichromatic Framework for Balanced Trees,” Proc. 19th Ann. Symp. Foundations of Computer Science, pp. 8-21, 1978.
[16] S.-Y. Hsieh and T.-Y. Chou, “Finding a Weight-Constrained Maximum-Density Subtree in a Tree,” Proc. 16th Ann. Int'l Symp. Algorithms and Computation, pp. 944-953, 2005.
[17] X. Huang, “An Algorithm for Identifying Regions of a DNA Sequence that Satisfy a Content Requirement,” Computer Applications in the Biosciences, vol. 10, no. 3, pp. 219-225, 1994.
[18] I.P. Ioshikhes and M.Q. Zhang, “Large-Scale Human Promoter Mapping Using CpG Islands,” Nature Genetics, vol. 26, pp. 61-63, 2000.
[19] R. Janardan, “On the Dynamic Maintenance of Maximal Points in the Plane,” Information Processing Letters, vol. 40, no. 2, pp. 59-64, 1991.
[20] S. Kapoor, “Dynamic Maintenance of Maxima of 2-D Point Sets,” SIAM J. Computing, vol. 29, no. 6, pp. 1858-1877, 2000.
[21] S.K. Kim, “Linear-Time Algorithm for Finding a Maximum-Density Segment of a Sequence,” Information Processing Letters, vol. 86, no. 6, pp. 339-342, 2003.
[22] Y.-L. Lin, X. Huang, T. Jiang, and K.-M. Chao, “MAVG: Locating Non-Overlapping Maximum Average Segments in a Given Sequence,” Bioinformatics, vol. 19, no. 1, pp. 151-152, 2003.
[23] Y.-L. Lin, T. Jiang, and K.-M. Chao, “Efficient Algorithms for Locating the Length-Constrained Heaviest Segments with Applications to Biomolecular Sequences Analysis,” J. Computer and System Sciences, vol. 65, no. 3, pp. 570-586, 2002.
[24] R.-R. Lin, W.-H. Kuo, and K.-M. Chao, “Finding a Length-Constrained Maximum-Density Path in a Tree,” J. Combinatorial Optimization, vol. 9, no. 2, pp. 147-156, 2005.
[25] E.M. McCreight, “Priority Search Trees,” SIAM J. Computing, vol. 14, no. 2, pp. 257-276, 1985.
[26] A. Nekrutenko and W.-H. Li, “Assessment of Compositional Heterogeneity within and between Eukaryotic Genomes,” Genome Research, vol. 10, no. 12, pp. 1986-1995, 2000.
[27] M.H. Overmars and J. van Leeuwen, “Maintenance of Configurations in the Plane,” J. Computer and System Sciences, vol. 23, no. 2, pp. 166-204, 1981.
[28] F.P. Preparata and M.I. Shamos, Computational Geometry. Springer, 1985.
[29] L. Scotto and R.K. Assoian, “A GC-Rich Domain with Bifunctional Effects on mRNA and Protein Levels: Implications for Control of Transforming Growth Factor Beta 1 Expression,” Molecular and Cellular Biology, vol. 13, no. 6, pp. 3588-3597, 1993.
[30] L. Wang and Y. Xu, “SEGID: Identifying Interesting Segments in (Multiple) Sequence Alignments,” Bioinformatics, vol. 19, no. 2, pp.297-298, 2003.
6 ms
(Ver 2.0)

Marketing Automation Platform Marketing Automation Tool