Issue No. 10 - October 2010 (vol. 22)

pp: 1428-1443

Sergio Consoli, Brunel University, Uxbridge

Kenneth Darby-Dowman, Brunel University, Uxbridge, Middlesex

Gijs Geleijnse, Philips Research Eindhoven, Eindhoven

Jan Korst, Philips Research Eindhoven, Eindhoven

Steffen Pauws, Philips Research Eindhoven, Eindhoven

DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TKDE.2009.188

ABSTRACT

Given a set of objects and their pairwise distances, we wish to determine a visual representation of the data. We use the quartet paradigm to compute a hierarchy of clusters of the objects. The method is based on an NP-hard graph optimization problem called the Minimum Quartet Tree Cost problem. This paper presents and compares several heuristic approaches to approximate the optimal hierarchy. The performance of the algorithms is tested through extensive computational experiments and it is shown that the Reduced Variable Neighborhood Search heuristic is the most effective approach to the problem, obtaining high-quality solutions in short computational running times.
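
As a rough illustration of the quartet paradigm, the sketch below assumes the usual formulation of the Minimum Quartet Tree Cost problem, which the abstract itself does not spell out: any four objects u, v, w, x can be paired in three ways, and the cost of the pairing uv|wx is d(u,v) + d(w,x). The function names and the distance matrix are hypothetical. The code only evaluates quartet costs and the resulting lower and upper bounds on a tree's total cost; it is not one of the heuristics compared in the paper.

```python
from itertools import combinations


def quartet_costs(d, u, v, w, x):
    """Cost of each of the three pairings of the four objects u, v, w, x.

    Assumed cost model: the pairing uv|wx costs d[u][v] + d[w][x].
    """
    return {
        ((u, v), (w, x)): d[u][v] + d[w][x],
        ((u, w), (v, x)): d[u][w] + d[v][x],
        ((u, x), (v, w)): d[u][x] + d[v][w],
    }


def cost_bounds(d):
    """Sum the cheapest and the costliest pairing over all quartets.

    Under the assumed cost model, no tree can cost less than the first
    value or more than the second.
    """
    lower = upper = 0
    for quartet in combinations(range(len(d)), 4):
        costs = quartet_costs(d, *quartet).values()
        lower += min(costs)
        upper += max(costs)
    return lower, upper


# Hypothetical symmetric distance matrix: objects 0 and 1 are close,
# objects 2 and 3 are close, and the two pairs are far apart.
dist = [
    [0, 1, 4, 4],
    [1, 0, 4, 4],
    [4, 4, 0, 1],
    [4, 4, 1, 0],
]

costs = quartet_costs(dist, 0, 1, 2, 3)
best_pairing = min(costs, key=costs.get)
print(best_pairing, costs[best_pairing])
print(cost_bounds(dist))
```

With n objects there are n(n-1)(n-2)(n-3)/24 quartets, so evaluating even a single candidate tree this way becomes expensive quickly, which is consistent with the paper's focus on heuristics.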

INDEX TERMS

Clustering, heuristic methods, optimization, graphs and networks.

CITATION

Sergio Consoli, Kenneth Darby-Dowman, Gijs Geleijnse, Jan Korst, Steffen Pauws, "Heuristic Approaches for the Quartet Method of Hierarchical Clustering",

*IEEE Transactions on Knowledge & Data Engineering*, vol. 22, no. 10, pp. 1428-1443, October 2010, doi:10.1109/TKDE.2009.188