2008 7th Computer Information Systems and Industrial Management Applications
Intrinsic Dimensionality of Data and of their Representatives. A Case Study of Amino-Acid Distribution in ORFs
June 26-June 28
ISBN: 978-0-7695-3184-7
By the intrinsic dimensionality of a data set we mean the smallest number of base vectors that allow the set to be reconstructed. Data sets obtained nowadays are often very large and therefore computationally demanding. For this reason we look for representative data vectors (prototypes) that might give insight into the data and be used for a (preliminary) data analysis. Let D, of size n × d, denote the observed data set, and D1, of size M × d, the set of representatives of the data. Denote by r the number of base vectors spanning D, and by r1 the number of base vectors spanning D1. Our questions: 1) Are r and r1 equal? 2) Suppose we want to choose k and k1 base vectors approximating the sets D and D1, respectively, with a given accuracy. Are k and k1 equal? We answer these questions by considering the data set amino 569, which contains the frequency distribution of the twenty amino acids composing the ORFs in the 7th yeast chromosome. The answer to both questions is no.
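The two questions can be illustrated with a minimal sketch, which is not the authors' procedure. It assumes that r is taken as the numerical rank of the centered data matrix and that "accuracy" means the fraction of total variance captured by the leading singular directions. The data are synthetic, the sizes n and M are hypothetical (d = 20 mirrors the twenty amino acids), and plain k-means is swapped in for the Neural Gas and self-organizing map prototypes named in the index terms. The helper names `rank_and_k` and `kmeans_prototypes` are invented for this illustration.

```python
import numpy as np

def rank_and_k(X, accuracy=0.95, tol=1e-10):
    """Return (r, k) for the rows of X.

    r: numerical rank of the centered matrix (assumed reading of
       "number of base vectors spanning the set").
    k: smallest number of leading singular directions whose share of
       the total variance reaches `accuracy` (assumed accuracy measure).
    """
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)
    if s[0] == 0:
        return 0, 0
    r = int(np.sum(s > tol * s[0]))
    explained = np.cumsum(s**2) / np.sum(s**2)
    k = min(int(np.searchsorted(explained, accuracy)) + 1, len(s))
    return r, k

def kmeans_prototypes(X, M, n_iter=20, seed=0):
    """Plain Lloyd k-means, a stand-in for Neural Gas / SOM prototypes."""
    rng = np.random.default_rng(seed)
    P = X[rng.choice(len(X), size=M, replace=False)].copy()
    for _ in range(n_iter):
        # Squared distance from every data vector to every prototype.
        d2 = ((X[:, None, :] - P[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(M):
            members = X[labels == j]
            if len(members):          # keep the old prototype if a cluster empties
                P[j] = members.mean(axis=0)
    return P

# Synthetic stand-in for D: three dominant directions plus small noise.
rng = np.random.default_rng(0)
n, d, M = 500, 20, 10                 # hypothetical sizes
D = rng.normal(size=(n, 3)) @ rng.normal(size=(3, d))
D += 0.01 * rng.normal(size=(n, d))

D1 = kmeans_prototypes(D, M)          # representatives, size M x d
r, k = rank_and_k(D)
r1, k1 = rank_and_k(D1)
print(f"D : r = {r},  k = {k}")
print(f"D1: r1 = {r1}, k1 = {k1}")
```

Under these assumptions r1 can never exceed M - 1 once the prototypes are centered, so r and r1 must differ whenever M is small relative to the rank of D; whether k and k1 agree depends on the data and on the chosen accuracy.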
Index Terms:
Data reduction, Choice of representatives, Neural Gas, Self-organizing map, Amino-acid distribution in ORFs
Citation:
Anna Bartkowiak, Adam Szustalewicz, "Intrinsic Dimensionality of Data and of their Representatives. A Case Study of Amino-Acid Distribution in ORFs," cisim, pp.177-182, 2008 7th Computer Information Systems and Industrial Management Applications, 2008