The Depth Problem: Identifying the Most Representative Units in a Data Group
Jan.-Feb. 2013 (vol. 10 no. 1)
pp. 161-172
Itziar Irigoien, Dept. of Comput. Sci. & Artificial Intell., Univ. of the Basque Country, Donostia, Spain
Francesc Mestres, Dept. of Genetics, Univ. of Barcelona, Barcelona, Spain
Concepcion Arenas, Dept. of Stat., Univ. of Barcelona, Barcelona, Spain
This paper presents a solution to the problem of identifying the units in groups or clusters that have the greatest degree of centrality and that best characterize each group. This problem frequently arises in the classification of data such as tumor types, gene expression profiles, or general biomedical data. It is particularly important in the common situation where many units do not clearly belong to any cluster. Furthermore, in gene expression data classification, reliably identifying the most central units in a cluster makes it possible to recognize the most important samples in a particular pathological process. We propose a new depth function that allows us to identify central units. Because our approach is based on a measure of distance or dissimilarity between any pair of units, it can be applied to any kind of multivariate data (continuous, binary, or multiattribute). It is therefore valuable in many biomedical applications, which usually involve noncontinuous data such as clinical, pathological, or biological data sources. We validate the approach using artificial examples and apply it to empirical data. The results show that our statistical approach performs well.
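The abstract states that the depth function needs only pairwise dissimilarities, and the index terms mention geometric variability and a proximity function. As an illustration, here is a minimal sketch that assumes a depth of the form D(x) = 1 / (1 + φ²(x)/V). Here V is the geometric variability of the group, V = (1/(2n²)) Σᵢ Σⱼ δ²(xᵢ, xⱼ), and φ²(x) = (1/n) Σⱼ δ²(x, xⱼ) − V is the proximity of unit x to the group. This form is an assumption for illustration and may differ from the paper's exact definition. All function names are hypothetical, and the binary dissimilarity (square root of one minus the simple matching coefficient) is one possible choice among many.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical helper: build a full pairwise dissimilarity matrix.
def dissimilarity_matrix(units: Sequence, dist: Callable) -> List[List[float]]:
    return [[dist(a, b) for b in units] for a in units]

# Geometric variability V = (1 / (2 n^2)) * sum_i sum_j d_ij^2.
def geometric_variability(D: List[List[float]]) -> float:
    n = len(D)
    return sum(d * d for row in D for d in row) / (2 * n * n)

# Proximity of unit i to the group: phi^2 = (1/n) * sum_j d_ij^2 - V.
def proximity(i: int, D: List[List[float]], V: float) -> float:
    return sum(d * d for d in D[i]) / len(D) - V

# Assumed depth form: 1 / (1 + phi^2 / V); larger means more central.
def depth_scores(D: List[List[float]]) -> List[float]:
    V = geometric_variability(D)
    if V == 0.0:  # all units identical: every unit is equally central
        return [1.0] * len(D)
    return [1.0 / (1.0 + proximity(i, D, V) / V) for i in range(len(D))]

# One possible dissimilarity for binary vectors: sqrt(1 - simple matching).
def binary_dissimilarity(a: Sequence[int], b: Sequence[int]) -> float:
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return math.sqrt(1.0 - matches / len(a))

# Continuous data: four corners of a square plus its center.
points = [(0, 0), (2, 0), (0, 2), (2, 2), (1, 1)]
cont_scores = depth_scores(dissimilarity_matrix(points, math.dist))
print("most central point:", points[cont_scores.index(max(cont_scores))])

# Binary data: the last profile differs most from the others.
profiles = [(1, 1, 0, 0), (1, 1, 1, 0), (1, 1, 0, 1), (0, 0, 1, 1)]
bin_scores = depth_scores(dissimilarity_matrix(profiles, binary_dissimilarity))
print("least central profile:", profiles[bin_scores.index(min(bin_scores))])
```

Because the scores depend only on the dissimilarity matrix, swapping `math.dist` for a binary or mixed-attribute dissimilarity changes nothing else in the computation, which mirrors the generality the abstract describes. For Euclidean distances, φ²(x) reduces to the squared distance from x to the group mean, so the center of the square receives depth 1.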
Index Terms:
Kernel, Gaussian distribution, Gene expression, Bioinformatics, Computational biology, Covariance matrix, Context, gene expression data, Cluster analysis, data depth, depth function, central unit, geometric variability, proximity function
Citation:
Itziar Irigoien, Francesc Mestres, Concepcion Arenas, "The Depth Problem: Identifying the Most Representative Units in a Data Group," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 10, no. 1, pp. 161-172, Jan.-Feb. 2013, doi:10.1109/TCBB.2012.147