Nonlinear Dimensionality Reduction by Isomap and MLEdim as Applied to Amino-Acid Distribution in Yeast ORFs
2008 7th Computer Information Systems and Industrial Management Applications, June 26-June 28, 2008. ISBN: 978-0-7695-3184-7
DOI: http://doi.ieeecomputersociety.org/10.1109/CISIM.2008.44
We consider the multivariate distribution of the amino acids coding for proteins in Open Reading Frames (ORFs). An appropriate statistical model of this distribution might shed some light on the interdependence of the 20 amino acids and contribute to the problem of verifying known ORFs (as of 3 April 2008, only 71.02% of known ORFs had been verified). From a graphical analysis of the data we deduce that the data cloud might be modelled by a curvilinear manifold of smaller dimension embedded in the larger, 20-dimensional space. To check this assumption we applied two nonlinear methods, Isomap (Tenenbaum et al., 2000) and MLEdim (Levina and Bickel, 2005), to the recorded data, which contain the frequencies of occurrence of the 20 amino acids in the ORFs found in the 7th yeast chromosome. These two methods are based on completely different principles, yet they gave similar results: the true 'intrinsic' dimension of the investigated data appears to be several dimensions smaller than originally supposed.
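To illustrate how the two estimators differ, here is a minimal NumPy sketch of both. It is not the author's code. It assumes the data form an n x 20 matrix of amino-acid frequencies with no duplicate rows. The function names and the default neighbourhood sizes (`k1=10`, `k2=20`, `n_neighbors=8`) are hypothetical illustrative choices, not the settings used in the paper. `mle_intrinsic_dim` follows the Levina-Bickel form m_k(x) = [(1/(k-1)) * sum_{j<k} log(T_k(x)/T_j(x))]^(-1), where T_j(x) is the distance from x to its j-th nearest neighbour. The local estimates are averaged over points and then over k = k1..k2. `isomap_residual_variance` builds a k-nearest-neighbour graph, computes graph geodesics and applies classical MDS. It then reports the residual variance 1 - R^2 between geodesic and embedded distances for each candidate dimension; the "elbow" in that curve suggests the intrinsic dimension.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distance matrix for the rows of X."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.sqrt(np.maximum(d2, 0.0))  # clip tiny negatives from rounding


def mle_intrinsic_dim(X, k1=10, k2=20):
    """Levina-Bickel MLE of intrinsic dimension, averaged over k = k1..k2.

    Assumes no duplicate rows (a zero neighbour distance would break the log).
    """
    D = pairwise_distances(X)
    np.fill_diagonal(D, np.inf)            # a point is not its own neighbour
    T = np.sort(D, axis=1)[:, :k2]         # T[:, j-1] = distance to j-th neighbour
    per_k = []
    for k in range(k1, k2 + 1):
        logs = np.log(T[:, [k - 1]] / T[:, :k - 1])  # log(T_k / T_j), j = 1..k-1
        m_x = (k - 1) / logs.sum(axis=1)              # local estimate at each point
        per_k.append(m_x.mean())                      # average over points
    return float(np.mean(per_k))                      # average over k


def isomap_residual_variance(X, n_neighbors=8, max_dim=5):
    """Isomap residual variance 1 - R^2 for embedding dimensions 1..max_dim."""
    n = X.shape[0]
    D = pairwise_distances(X)
    # Symmetrised k-NN graph; non-edges get infinite length.
    G = np.full((n, n), np.inf)
    idx = np.argsort(D, axis=1)[:, 1:n_neighbors + 1]  # column 0 is the point itself
    rows = np.repeat(np.arange(n), n_neighbors)
    G[rows, idx.ravel()] = D[rows, idx.ravel()]
    G = np.minimum(G, G.T)
    np.fill_diagonal(G, 0.0)
    # Graph geodesics by Floyd-Warshall (O(n^3): fine for a few hundred points).
    for m in range(n):
        G = np.minimum(G, G[:, [m]] + G[[m], :])
    if np.isinf(G).any():
        raise ValueError("neighbourhood graph is disconnected; increase n_neighbors")
    # Classical MDS on the geodesic distance matrix.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (G ** 2) @ J
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1]          # largest eigenvalues first
    vals, vecs = vals[order], vecs[:, order]
    iu = np.triu_indices(n, k=1)
    out = []
    for d in range(1, max_dim + 1):
        Y = vecs[:, :d] * np.sqrt(np.maximum(vals[:d], 0.0))
        r = np.corrcoef(G[iu], pairwise_distances(Y)[iu])[0, 1]
        out.append(1.0 - r ** 2)
    return out
```

As a usage example, you can sample a curved 2-D sheet and rotate it into 20 dimensions. `mle_intrinsic_dim` should return a value close to 2. The Isomap residual variance should drop sharply from dimension 1 to dimension 2 and stay flat afterwards. Floyd-Warshall is used only to keep the sketch dependency-free. Practical implementations usually run Dijkstra on a sparse graph instead, for example `scipy.sparse.csgraph.shortest_path`.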
Index Terms:
intrinsic dimension, reduction of dimensionality, genetic code, Open Reading Frames (ORFs) in yeast, Isomap, MLEdim estimator
Citation:
Anna Bartkowiak, "Nonlinear Dimensionality Reduction by Isomap and MLEdim as Applied to Amino-Acid Distribution in Yeast ORFs," cisim, pp.183-188, 2008 7th Computer Information Systems and Industrial Management Applications, 2008.