Issue No. 02 - April-June (2010 vol. 7)

ISSN: 1545-5963

pp: 375-381

DOI: http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.90

Giora Unger, Tel-Aviv University, Tel-Aviv

Benny Chor, Tel-Aviv University, Tel-Aviv

ABSTRACT

We study simple geometric properties of gene expression data sets in which samples are taken from two distinct classes (e.g., two types of cancer). Specifically, we investigate the problem of linear separability for pairs of genes. If a pair of genes exhibits linear separation with respect to the two classes, then the joint expression level of the two genes is strongly correlated with the phenomenon of the sample being taken from one class or the other. This may indicate an underlying molecular mechanism relating the two genes and the phenomenon (e.g., a specific cancer). We developed and implemented novel, efficient algorithmic tools for finding all pairs of genes that induce a linear separation of the two sample classes. These tools are based on computational geometric properties and were applied to 10 publicly available cancer data sets. For each data set, we computed the number of actual separating pairs and compared it to an upper bound on the number expected by chance and, empirically, to the numbers obtained by randomly shuffling the sample labels. Seven of these 10 data sets are highly separable. This phenomenon is statistically highly significant and very unlikely to occur at random. It is therefore reasonable to expect that it manifests a functional association between separating genes and the underlying phenotypic classes.
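The abstract does not spell out the authors' algorithm, so the following is only a brute-force sketch of the underlying question, assuming samples are rows, genes are columns, and labels are 0/1. It relies on the standard geometric fact that two finite point sets in the plane are strictly linearly separable exactly when their convex hulls are disjoint, and it tests every gene pair in O(genes²) time. All function names (`linearly_separable`, `separating_pairs`, `shuffled_pair_counts`) are hypothetical and not taken from the paper. The label-shuffling helper mirrors the empirical comparison described above, but it is not the authors' statistical procedure.

```python
import random
from itertools import combinations


def cross(o, a, b):
    """z-component of (a - o) x (b - o); > 0 means o -> a -> b turns left."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain: hull vertices in counter-clockwise order,
    collinear points dropped. Degenerate inputs yield 1 or 2 vertices."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def on_segment(p, q, r):
    """True if r is inside the bounding box of segment pq (used after a collinearity test)."""
    return (min(p[0], q[0]) <= r[0] <= max(p[0], q[0])
            and min(p[1], q[1]) <= r[1] <= max(p[1], q[1]))


def segments_intersect(p1, p2, q1, q2):
    """Closed-segment intersection test; also handles single-point 'segments'."""
    d1, d2 = cross(q1, q2, p1), cross(q1, q2, p2)
    d3, d4 = cross(p1, p2, q1), cross(p1, p2, q2)
    if d1 * d2 < 0 and d3 * d4 < 0:
        return True  # proper crossing
    return ((d1 == 0 and on_segment(q1, q2, p1)) or (d2 == 0 and on_segment(q1, q2, p2))
            or (d3 == 0 and on_segment(p1, p2, q1)) or (d4 == 0 and on_segment(p1, p2, q2)))


def edges(hull):
    """Boundary segments of a hull; a point or a segment is its own boundary."""
    if len(hull) <= 2:
        return [(hull[0], hull[-1])]
    return [(hull[i], hull[(i + 1) % len(hull)]) for i in range(len(hull))]


def inside_polygon(hull, p):
    """Point-in-convex-polygon test (boundary counts as inside); needs >= 3 CCW vertices."""
    n = len(hull)
    return n >= 3 and all(cross(hull[i], hull[(i + 1) % n], p) >= 0 for i in range(n))


def linearly_separable(class_a, class_b):
    """Strict linear separability of two finite planar point sets via hull disjointness."""
    if not class_a or not class_b:
        return True
    ha, hb = convex_hull(class_a), convex_hull(class_b)
    if any(segments_intersect(p1, p2, q1, q2) for p1, p2 in edges(ha) for q1, q2 in edges(hb)):
        return False
    # Boundaries are disjoint, so the hulls meet only if one contains the other.
    return not (inside_polygon(hb, ha[0]) or inside_polygon(ha, hb[0]))


def separating_pairs(expr, labels):
    """expr[s][g] = expression of gene g in sample s; labels[s] in {0, 1}.
    Returns every gene pair (g1, g2) whose 2-D projection separates the classes."""
    result = []
    for g1, g2 in combinations(range(len(expr[0])), 2):
        a = [(row[g1], row[g2]) for row, y in zip(expr, labels) if y == 0]
        b = [(row[g1], row[g2]) for row, y in zip(expr, labels) if y == 1]
        if linearly_separable(a, b):
            result.append((g1, g2))
    return result


def shuffled_pair_counts(expr, labels, rounds=10, seed=0):
    """Empirical baseline: separating-pair counts after random label permutations."""
    rng = random.Random(seed)
    counts = []
    for _ in range(rounds):
        perm = list(labels)
        rng.shuffle(perm)
        counts.append(len(separating_pairs(expr, perm)))
    return counts
```

On a toy matrix with rows `[0, 0, 0]`, `[1, 1, 1]`, `[0, 1, 1]`, `[1, 2, 0]` and labels `[0, 0, 1, 1]`, neither gene 0 nor gene 1 separates the classes on its own, but together they do, so `separating_pairs` returns `[(0, 1)]`. The hull-based test is exact but quadratic in the number of genes. It is meant to illustrate the definition of a separating pair, not to reproduce the efficiency of the paper's tools.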

INDEX TERMS

Gene expression analysis, DNA microarrays, diagnosis, linear separation.

CITATION

G. Unger and B. Chor, "Linear Separability of Gene Expression Data Sets," *IEEE/ACM Transactions on Computational Biology and Bioinformatics*, vol. 7, no. 2, pp. 375-381, 2010.

doi:10.1109/TCBB.2008.90
