A Taxonomy of Similarity Mechanisms for Case-Based Reasoning
November 2009 (vol. 21 no. 11)
pp. 1532-1543
Pádraig Cunningham, University College Dublin, Dublin
Assessing the similarity between cases is a key aspect of the retrieval phase in case-based reasoning (CBR). In most CBR work, similarity is assessed from feature-value descriptions of cases, using similarity metrics defined over those feature values. Indeed, the feature-value representation might be said to be a defining part of the CBR worldview: it underpins the idea of a problem space in which cases are located relative to one another. Recently, a variety of similarity mechanisms have emerged that are not founded on this feature-space idea. Some of these have emerged in CBR research and others have arisen in other areas of data analysis. In particular, research on kernel-based learning is a rich source of novel similarity representations because of its emphasis on encoding domain knowledge in the kernel function. In this paper, we present a taxonomy that organizes these new similarity mechanisms and more established ones in a coherent framework.
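To make the conventional feature-value view concrete, the following is a minimal sketch of similarity-based retrieval over a small case base. It is an illustration only and is not taken from the paper: the names `similarity` and `retrieve`, the per-feature weights, and the assumption that every feature is numeric and pre-scaled to [0, 1] are all hypothetical choices made for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: a case is a mapping from feature name to a
# numeric value that has already been scaled to the range [0, 1].
Case = Dict[str, float]


def similarity(a: Case, b: Case, weights: Dict[str, float]) -> float:
    """Weighted average of per-feature similarities, each 1 - |a_f - b_f|."""
    total_weight = sum(weights.values())
    score = sum(w * (1.0 - abs(a[f] - b[f])) for f, w in weights.items())
    return score / total_weight


def retrieve(query: Case, case_base: List[Case],
             weights: Dict[str, float], k: int = 1) -> List[Tuple[float, Case]]:
    """Return the k cases most similar to the query, best first."""
    scored = [(similarity(query, c, weights), c) for c in case_base]
    # Sort on the score only, so that dictionaries are never compared.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Illustrative data (invented for this sketch).
case_base = [
    {"age": 0.2, "income": 0.9},
    {"age": 0.8, "income": 0.1},
]
weights = {"age": 1.0, "income": 1.0}
query = {"age": 0.25, "income": 0.8}
best_score, best_case = retrieve(query, case_base, weights, k=1)[0]
```

Here the cases sit in a shared problem space defined by their features, and retrieval reduces to ranking by a metric over that space. The mechanisms described above as not founded on the feature-space idea would replace `similarity` with a function that does not rely on such a fixed feature vector.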

Index Terms:
Machine learning, case-based reasoning, nearest neighbor classifiers.
Pádraig Cunningham, "A Taxonomy of Similarity Mechanisms for Case-Based Reasoning," IEEE Transactions on Knowledge and Data Engineering, vol. 21, no. 11, pp. 1532-1543, Nov. 2009, doi:10.1109/TKDE.2008.227