A Parallel Computing Approach to Creating Engineering Concept Spaces for Semantic Retrieval: The Illinois Digital Library Initiative Project
August 1996 (vol. 18 no. 8)
pp. 771-782

Abstract: This research presents preliminary results from the semantic retrieval research component of the Illinois Digital Library Initiative (DLI) project. Using a variation of automatic thesaurus generation techniques, which we refer to as the concept space approach, we aimed to create graphs of domain-specific concepts (terms) and their weighted co-occurrence relationships for all major engineering domains. Merging these concept spaces and providing traversal paths across them could help alleviate the vocabulary (difference) problem evident in large-scale information retrieval. We previously experimented with this technique in a smaller molecular biology domain (the Worm Community System, with a document collection of 10+ MB), with encouraging results.
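As a rough illustration of what a graph of terms with weighted co-occurrence relationships can look like, the following minimal Python sketch builds directed, weighted links from a handful of already-indexed documents. It is not the authors' algorithm: the function name `build_concept_space`, the toy documents, and the simple conditional-frequency weight are all assumptions chosen for clarity.

```python
from collections import Counter
from itertools import permutations


def build_concept_space(docs):
    """Build directed co-occurrence links from lists of index terms.

    docs: list of term lists, one per document (hypothetical input format).
    Returns {(a, b): weight}, where weight is the fraction of documents
    containing term a that also contain term b. This weighting is an
    assumption for illustration, not the weighting used in the paper.
    """
    doc_freq = Counter()   # number of documents containing each term
    pair_freq = Counter()  # number of documents containing both terms
    for terms in docs:
        unique = set(terms)
        doc_freq.update(unique)
        # Ordered pairs, so that links a -> b and b -> a get separate weights.
        pair_freq.update(permutations(sorted(unique), 2))
    return {(a, b): n / doc_freq[a] for (a, b), n in pair_freq.items()}


# Toy collection (made up for this example).
docs = [
    ["parallel computing", "information retrieval"],
    ["parallel computing", "supercomputer"],
    ["information retrieval", "thesaurus"],
    ["parallel computing", "information retrieval", "thesaurus"],
]
space = build_concept_space(docs)
```

The weights are asymmetric by design: in this toy data every document mentioning "supercomputer" also mentions "parallel computing", but only one in three documents does the reverse, so the two directions of that link carry different weights.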

To address the scalability issue in large-scale information retrieval and analysis for the current Illinois DLI project, we recently conducted experiments using the concept space approach on parallel supercomputers. Our test collection included 2+ GB of computer science and electrical engineering abstracts extracted from the INSPEC database. The concept space approach called for extensive textual and statistical analysis (a form of knowledge discovery) based on automatic indexing and co-occurrence analysis algorithms, both previously tested in the biology domain. Initial testing results using a 512-node CM-5 and a 16-processor SGI Power Challenge were promising. The Power Challenge was later selected to create a comprehensive computer engineering concept space of about 270,000 terms and 4,000,000+ links using 24.5 hours of CPU time. Our system evaluation involving 12 knowledgeable subjects revealed that the automatically created computer engineering concept space generated significantly higher concept recall than the human-generated INSPEC computer engineering thesaurus. However, the INSPEC thesaurus was more precise than the automatic concept space. Our current work mainly involves creating concept spaces for other major engineering domains and developing robust graph matching and traversal algorithms for cross-domain, concept-based retrieval. Future work will also include generating individualized concept spaces to assist user-specific, concept-based information retrieval.
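The recall and precision comparison reported above can be made concrete with a small sketch. The abstract does not define its measures, so this assumes the standard set-based definitions: recall is the share of relevant concepts that a source suggests, and precision is the share of suggested concepts that are relevant. The function name and all data below are hypothetical and are not results from the study.

```python
def concept_recall_precision(suggested, relevant):
    """Return (recall, precision) for a set of suggested concepts.

    Assumes standard set-based definitions; the paper's exact
    evaluation procedure may differ.
    """
    suggested, relevant = set(suggested), set(relevant)
    hits = suggested & relevant
    recall = len(hits) / len(relevant) if relevant else 0.0
    precision = len(hits) / len(suggested) if suggested else 0.0
    return recall, precision


# Made-up judgments for one query term (illustrative only).
relevant = {"a", "b", "c", "d"}
automatic = {"a", "b", "c", "x", "y", "z"}  # broad, noisier suggestions
manual = {"a", "b"}                         # narrow, cleaner suggestions

auto_scores = concept_recall_precision(automatic, relevant)
manual_scores = concept_recall_precision(manual, relevant)
```

With these invented sets, the broader source scores higher on recall and the narrower one higher on precision, which is the kind of trade-off the evaluation describes.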

Index Terms:
Semantic retrieval, concept space, concept association, parallel computing, digital library.
Citation:
Hsinchun Chen, Bruce Schatz, Tobun Ng, Joanne Martinez, Amy Kirchhoff, Chienting Lin, "A Parallel Computing Approach to Creating Engineering Concept Spaces for Semantic Retrieval: The Illinois Digital Library Initiative Project," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 18, no. 8, pp. 771-782, Aug. 1996, doi:10.1109/34.531798