2009 Ninth IEEE International Conference on Data Mining
Knowledge Discovery from Citation Networks
Miami, Florida
December 06-December 09
ISBN: 978-0-7695-3895-2
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ICDM.2009.137
Knowledge discovery from scientific articles has received increasing attention recently, since huge repositories have been made available by the development of the Internet and digital databases. In a corpus of scientific articles such as a digital library, documents are connected by citations, and each document plays two different roles in the corpus: the document itself and a citation of other documents. Existing topic models make little effort to differentiate these two roles. We believe that the topic distributions of these two roles are different and related in a certain way. In this paper we propose a Bernoulli Process Topic (BPT) model, which models the corpus at two levels: the document level and the citation level. In the BPT model, each document has two different representations in the latent topic space, one associated with each of its roles. Moreover, the multi-level hierarchical structure of the citation network is captured by a generative process involving a Bernoulli process. The distribution parameters of the BPT model are estimated by a variational approximation approach. In addition to conducting experimental evaluations on the document modeling task, we also apply the BPT model to a well-known scientific corpus to discover the latent topics. Comparisons against state-of-the-art methods demonstrate very promising performance.
Index Terms:
Unsupervised learning, latent models, text mining
Citation:
Zhen Guo, Zhongfei Zhang, Shenghuo Zhu, Yun Chi, Yihong Gong, "Knowledge Discovery from Citation Networks," ICDM, pp. 800-805, 2009 Ninth IEEE International Conference on Data Mining, 2009