An Information Retrieval Approach for Automatically Constructing Software Libraries
August 1991 (vol. 17 no. 8)
pp. 800-813

This paper describes a technology for automatically assembling large software libraries that promote software reuse by helping users locate the components closest to their needs. The libraries are assembled from a set of unorganized components by using information retrieval techniques. Construction proceeds in two steps. First, attributes are automatically extracted from natural-language documentation by an indexing scheme based on the notions of lexical affinities and quantity of information. Second, a hierarchy for browsing is automatically generated by a clustering technique that draws only on the information provided by those attributes. Because the indexing scheme works on free text, tools following this approach can accept free-style natural-language queries.
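
The first step can be illustrated with a minimal sketch. It assumes that a lexical affinity is an unordered pair of words co-occurring within a small window, and that the quantity of information of a word is the negative base-2 logarithm of its relative frequency in the corpus. The window size, the stop-word list, the scoring formula, and every function name below are assumptions made for illustration and are not taken from the paper. The clustering step is not shown.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list and window size, chosen for illustration only.
STOP_WORDS = {"a", "an", "the", "of", "to", "in", "and", "or", "from", "one"}
WINDOW = 5


def tokenize(text):
    """Lowercase the text, keep alphabetic words, and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def lexical_affinities(tokens, window=WINDOW):
    """Count unordered pairs of distinct words at most `window` positions apart."""
    pairs = Counter()
    for i, w in enumerate(tokens):
        for u in tokens[i + 1 : i + 1 + window]:
            if u != w:
                pairs[tuple(sorted((w, u)))] += 1
    return pairs


def information(word, corpus_counts, corpus_total):
    """Quantity of information: -log2 of the word's relative corpus frequency."""
    return -math.log2(corpus_counts[word] / corpus_total)


def index_document(doc, corpus_counts, corpus_total, top_k=3):
    """Return the top_k pairs, scored by pair frequency times summed information."""
    pairs = lexical_affinities(tokenize(doc))
    scored = {
        pair: freq
        * (
            information(pair[0], corpus_counts, corpus_total)
            + information(pair[1], corpus_counts, corpus_total)
        )
        for pair, freq in pairs.items()
    }
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(scored, key=lambda p: (-scored[p], p))[:top_k]


if __name__ == "__main__":
    # Made-up documentation snippets standing in for component descriptions.
    docs = {
        "cp": "Copy a file to another file or to a directory.",
        "mv": "Move a file or rename a file in a directory.",
        "rm": "Remove a file or a directory from the file system.",
    }
    corpus = Counter(w for d in docs.values() for w in tokenize(d))
    total = sum(corpus.values())
    for name, text in docs.items():
        print(name, index_document(text, corpus, total))
```

In this sketch the highest-scoring pairs serve as the attributes of a component. A free-text query could be indexed with the same functions and matched against those attributes.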

Index Terms:
information retrieval approach; large software libraries; software reuse; attributes; natural language documentation; indexing scheme; lexical affinities; browsing; clustering technique; free-text indexing scheme; free-style natural language queries; automatic programming; information retrieval systems; natural languages; software reusability; subroutines
Y.S. Maarek, D.M. Berry, G.E. Kaiser, "An Information Retrieval Approach for Automatically Constructing Software Libraries," IEEE Transactions on Software Engineering, vol. 17, no. 8, pp. 800-813, Aug. 1991, doi:10.1109/32.83915