13th IEEE International Conference on BioInformatics and BioEngineering (2003)
Mar. 10, 2003 to Mar. 12, 2003
Haixun Wang, IBM T.J. Watson
Wei Wang, University of North Carolina at Chapel Hill
Philip Yu, IBM T.J. Watson
Jiong Yang, University of Illinois at Urbana-Champaign
Microarrays are one of the latest breakthroughs in experimental molecular biology. They provide a powerful tool for monitoring the expression patterns of thousands of genes simultaneously and are already producing huge amounts of valuable data. The concept of a bicluster was introduced by Cheng and Church (2000) to capture the coherence of a subset of genes under a subset of conditions. They also designed a set of heuristic algorithms to find either one bicluster or a set of biclusters. These algorithms iterate over masking null values and previously discovered biclusters, coarse and fine node deletion, node addition, and the inclusion of inverted data. Such heuristics inevitably suffer from serious drawbacks. In particular, masking null values and discovered biclusters with random numbers may cause random interference, which in turn impairs the discovery of high-quality biclusters. To address this issue and to further accelerate the biclustering process, we generalize the bicluster model to incorporate null values and propose a probabilistic algorithm (FLOC) that can discover a set of k possibly overlapping biclusters simultaneously. Furthermore, this algorithm can easily be extended, at little additional cost, to support features that suit different requirements. An experimental study on the yeast gene expression data shows that FLOC can offer substantial improvements over the previously proposed algorithm.
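To make "coherence of a subset of genes under a subset of conditions" concrete, here is a minimal Python sketch of the mean squared residue H(I, J) that Cheng and Church use to score a bicluster. It assumes, for illustration only, that null entries (represented here as `None`) are simply skipped when computing the row, column, and overall means. This is not presented as the exact residue definition or constraint set used by FLOC; the function name `mean_squared_residue` and the `None` convention are our own hypothetical choices.

```python
from collections import defaultdict


def mean_squared_residue(matrix, rows, cols):
    """Mean squared residue of the bicluster (rows x cols) of `matrix`.

    Assumption (illustrative): entries equal to None are nulls and are
    excluded from every mean and from the final average.
    """
    # Specified (non-null) cells of the bicluster.
    cells = [(i, j) for i in rows for j in cols if matrix[i][j] is not None]
    if not cells:
        raise ValueError("bicluster has no specified entries")

    row_sum, row_cnt = defaultdict(float), defaultdict(int)
    col_sum, col_cnt = defaultdict(float), defaultdict(int)
    total = 0.0
    for i, j in cells:
        v = matrix[i][j]
        row_sum[i] += v
        row_cnt[i] += 1
        col_sum[j] += v
        col_cnt[j] += 1
        total += v

    overall_mean = total / len(cells)  # a_IJ over specified entries
    sq_sum = 0.0
    for i, j in cells:
        row_mean = row_sum[i] / row_cnt[i]  # a_iJ
        col_mean = col_sum[j] / col_cnt[j]  # a_Ij
        residue = matrix[i][j] - row_mean - col_mean + overall_mean
        sq_sum += residue * residue
    return sq_sum / len(cells)


if __name__ == "__main__":
    # A perfectly additive (coherent) submatrix has residue ~0.
    coherent = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
    print(mean_squared_residue(coherent, [0, 1, 2], [0, 1, 2]))
    # An incoherent 2x2 pattern; the null column is ignored.
    noisy = [[1, 0, None], [0, 1, None]]
    print(mean_squared_residue(noisy, [0, 1], [0, 1, 2]))  # 0.25
```

A lower H(I, J) means the selected genes rise and fall together across the selected conditions. Skipping nulls, rather than filling them with random numbers, is one simple way to avoid the random interference the abstract describes.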
Haixun Wang, Wei Wang, Philip Yu, Jiong Yang, "Enhanced Biclustering on Expression Data", 13th IEEE International Conference on BioInformatics and BioEngineering, pp. 321, 2003, doi:10.1109/BIBE.2003.1188969