On Finding Bicliques in Bipartite Graphs: a Novel Algorithm with Application to the Integration of Diverse Biological Data Types
Proceedings of the 41st Annual Hawaii International Conference on System Sciences (HICSS 2008)
Waikoloa, Big Island, Hawaii
Jan. 7, 2008 to Jan. 10, 2008
The integration of multiple genome-scale data sets is a major algorithmic challenge for modern systems biology. In such settings, bipartite graphs are often useful for representing relationships across pairs of heterogeneous data types, and these relationships are interpreted by enumerating maximal bicliques. Unfortunately, previously known algorithms for this task are highly inefficient and do not scale. This paper describes a fast, novel algorithm that finds all maximal bicliques in a bipartite graph. Unlike other techniques that have been proposed for this problem, the new method neither places undue restrictions on the input nor inflates the problem size. Efficiency is achieved by exploiting structure inherent in bipartite graphs, and by ensuring that biclique enumeration avoids duplication while pruning nonmaximal candidates. Experiments using gene expression data indicate that the new approach can be as much as two orders of magnitude faster than the best previous alternatives.
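To make the task concrete, the following is a minimal Python sketch of maximal biclique enumeration on a small bipartite graph. It is not the algorithm described in the paper: it is a simple closure-based baseline with exponential worst-case cost, included only to illustrate what "all maximal bicliques" means. The function name `maximal_bicliques`, the adjacency-dictionary input, and the gene/phenotype labels are assumptions made for illustration.

```python
from typing import Dict, FrozenSet, Hashable, Set, Tuple

Biclique = Tuple[FrozenSet[Hashable], FrozenSet[Hashable]]


def maximal_bicliques(adj: Dict[Hashable, Set[Hashable]]) -> Set[Biclique]:
    """Enumerate maximal bicliques (both sides non-empty) of a bipartite graph.

    `adj` maps each left vertex to the set of its right-side neighbors.
    Illustrative baseline only, not the paper's method.
    """
    # Collect every non-empty intersection of left-vertex neighborhoods.
    # Each such intersection is the right side of exactly one maximal
    # biclique, and storing frozensets in a set avoids duplicates.
    closed: Set[FrozenSet[Hashable]] = set()
    for nbrs in adj.values():
        n = frozenset(nbrs)
        if not n:
            continue  # isolated left vertices belong to no biclique
        closed |= {n} | {n & s for s in closed}

    result: Set[Biclique] = set()
    for right in closed:
        if not right:
            continue  # an empty right side is not a biclique
        # The left side is every vertex adjacent to all of `right`.
        left = frozenset(u for u, nbrs in adj.items() if right.issubset(nbrs))
        result.add((left, right))
    return result


if __name__ == "__main__":
    # Hypothetical toy data: genes on the left, phenotypes on the right.
    graph = {
        "g1": {"p1", "p2"},
        "g2": {"p1", "p2", "p3"},
        "g3": {"p2", "p3"},
    }
    for left, right in sorted(maximal_bicliques(graph), key=lambda b: sorted(b[0])):
        print(sorted(left), sorted(right))
```

On the toy graph this yields four maximal bicliques, for example genes g1, g2, g3 sharing phenotype p2. A baseline of this kind can enumerate exponentially many intermediate sets, which is the scaling problem that motivates more careful algorithms.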
M. A. Langston, E. J. Chesler and Y. Zhang, "On Finding Bicliques in Bipartite Graphs: a Novel Algorithm with Application to the Integration of Diverse Biological Data Types," Proceedings of the 41st Annual Hawaii International Conference on System Sciences (HICSS 2008), Waikoloa, Big Island, Hawaii, 2008, p. 473.