Issue No.07 - July (2010 vol.22)
pp: 1028-1040
Payam Barnaghi , The University of Surrey, Guildford, Surrey
Wang Wei , University of Nottingham (Malaysia Campus), Semenyih
Probabilistic topic models were originally developed and utilized for document modeling and topic extraction in Information Retrieval. In this paper, we describe a new approach for automatic learning of terminological ontologies from a text corpus based on such models. In our approach, topic models are used as efficient dimension reduction techniques that capture the semantic relationships between words and topics and between topics and documents, interpreted in terms of probability distributions. We propose two algorithms for learning terminological ontologies using the principle of topic relationship and exploiting information theory with the learned probabilistic topic models. Experiments with different model parameters were conducted, and the learned ontology statements were evaluated by domain experts. We also compared the results of our method with two existing concept hierarchy learning methods on the same data set. The study shows that our method outperforms the other methods in terms of recall and precision. The precision of the learned ontology is sufficient for it to be deployed for browsing, navigation, and information search and retrieval in digital libraries.
Knowledge acquisition, ontology learning, ontology, probabilistic topic models.
Payam Barnaghi, Wang Wei, "Probabilistic Topic Models for Learning Terminological Ontologies", IEEE Transactions on Knowledge & Data Engineering, vol.22, no. 7, pp. 1028-1040, July 2010, doi:10.1109/TKDE.2009.122
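The abstract describes comparing terms through the probability distributions of a topic model using information theory. As a rough illustration only, and not the paper's two algorithms, the sketch below computes the Kullback-Leibler and Jensen-Shannon divergences between per-term topic distributions and picks the closest term. The `term_topics` table, its numbers, and the helper `most_similar` are hypothetical and invented for this example.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in bits; assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: a symmetric, bounded variant of KL."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical P(topic | term) over three topics (made-up numbers,
# standing in for distributions a trained topic model would provide).
term_topics = {
    "ontology": [0.70, 0.20, 0.10],
    "taxonomy": [0.65, 0.25, 0.10],
    "sampling": [0.05, 0.15, 0.80],
}


def most_similar(term, table):
    """Return the other term whose topic distribution is closest by JS divergence."""
    others = (t for t in table if t != term)
    return min(others, key=lambda t: js_divergence(table[term], table[t]))


print(most_similar("ontology", term_topics))
```

With base-2 logarithms the Jensen-Shannon divergence lies between 0 and 1, which makes it convenient for thresholding; whether a symmetric measure is appropriate depends on the relation being learned, since hierarchical relations such as broader/narrower are directional.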