Falk Hüffner, TU Berlin, Berlin
Christian Komusiewicz, TU Berlin, Berlin
Adrian Liebtrau, Friedrich-Schiller-Universität Jena, Jena
Rolf Niedermeier, TU Berlin, Berlin
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TCBB.2013.177
A popular clustering algorithm for biological networks, proposed by Hartuv and Shamir [IPL 2000], identifies nonoverlapping highly connected components. We extend the approach taken by this algorithm by introducing the combinatorial optimization problem Highly Connected Deletion, which asks for the removal of as few edges as possible from a graph such that the resulting graph consists of highly connected components. We show that Highly Connected Deletion is NP-hard and provide a fixed-parameter algorithm and a kernelization. We propose exact and heuristic solution strategies based on polynomial-time data reduction rules and integer linear programming with column generation. The data reduction typically identifies 75% of the edges that are deleted in an optimal solution; the column generation method can then optimally solve protein interaction networks with up to 6,000 vertices and 13,500 edges in less than a day. Additionally, we present a new heuristic that finds more clusters than the method by Hartuv and Shamir.
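To make the problem statement concrete, the following minimal Python sketch checks whether a given set of edge deletions is a feasible solution to Highly Connected Deletion. It is an illustration only, not the authors' algorithm. It assumes the Hartuv and Shamir definition, under which a graph with n vertices is highly connected if its edge connectivity exceeds n/2, and it assumes that a single vertex counts as a trivially highly connected component. The function names (`components`, `edge_connectivity`, `is_highly_connected`, `is_valid_deletion`) are hypothetical, and the brute-force minimum cut is exponential, so it is only suitable for tiny graphs.

```python
from itertools import combinations


def components(vertices, edges):
    """Return the connected components as a list of vertex sets."""
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, comps = set(), []
    for start in vertices:
        if start in seen:
            continue
        comp, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for w in adj[u] - comp:
                comp.add(w)
                stack.append(w)
        seen |= comp
        comps.append(comp)
    return comps


def edge_connectivity(vertices, edges):
    """Brute-force minimum cut size (exponential; tiny graphs only)."""
    vs = sorted(vertices)
    first, rest = vs[0], vs[1:]
    best = len(edges)
    # Keep `first` on one side and try every proper subset containing it.
    for r in range(len(rest)):
        for extra in combinations(rest, r):
            side = {first, *extra}
            cut = sum((u in side) != (v in side) for u, v in edges)
            best = min(best, cut)
    return best


def is_highly_connected(vertices, edges):
    """Edge connectivity must exceed n/2 (Hartuv and Shamir definition)."""
    n = len(vertices)
    if n == 1:
        return True  # assumption: singletons are trivially highly connected
    return edge_connectivity(vertices, edges) > n / 2


def is_valid_deletion(vertices, edges, deleted):
    """Check that every component left after deleting `deleted` is highly connected."""
    removed = {frozenset(e) for e in deleted}
    kept = [e for e in edges if frozenset(e) not in removed]
    for comp in components(vertices, kept):
        comp_edges = [(u, v) for u, v in kept if u in comp and v in comp]
        if not is_highly_connected(comp, comp_edges):
            return False
    return True


# Toy instance: two triangles joined by a single bridge edge (2, 3).
V = range(6)
E = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(is_valid_deletion(V, E, [(2, 3)]))  # True: two triangles remain
print(is_valid_deletion(V, E, []))        # False: the bridge is a cut of size 1
```

In the toy instance, deleting the bridge leaves two triangles, each with edge connectivity 2 > 3/2, so one deletion suffices. The optimization problem asks for the smallest such deletion set; this sketch only verifies a candidate.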
Integer linear programming, Clustering, Protein interaction networks, NP-hard problem, Fixed-parameter algorithm
Falk Hüffner, Christian Komusiewicz, Adrian Liebtrau, Rolf Niedermeier, "Partitioning Biological Networks into Highly Connected Clusters with Maximum Edge Coverage", IEEE/ACM Transactions on Computational Biology and Bioinformatics, no. 1, pp. 1, PrePrints, doi:10.1109/TCBB.2013.177