2014 IEEE/ACM Joint Conference on Digital Libraries (JCDL) (2014)
London, United Kingdom
Sept. 8, 2014 to Sept. 12, 2014
ISBN: 978-1-4799-5569-5
pp: 117-126
Zhaohui Wu, Computer Science and Engineering, Pennsylvania State University, University Park, 16802, USA
Jian Wu, Information Sciences and Technology, Pennsylvania State University, University Park, 16802, USA
Madian Khabsa, Computer Science and Engineering, Pennsylvania State University, University Park, 16802, USA
Kyle Williams, Information Sciences and Technology, Pennsylvania State University, University Park, 16802, USA
Hung-Hsuan Chen, Computer Science and Engineering, Pennsylvania State University, University Park, 16802, USA
Wenyi Huang, Information Sciences and Technology, Pennsylvania State University, University Park, 16802, USA
Suppawong Tuarob, Computer Science and Engineering, Pennsylvania State University, University Park, 16802, USA
Sagnik Ray Choudhury, Information Sciences and Technology, Pennsylvania State University, University Park, 16802, USA
Alexander Ororbia, Information Sciences and Technology, Pennsylvania State University, University Park, 16802, USA
Prasenjit Mitra, Information Sciences and Technology, Pennsylvania State University, University Park, 16802, USA
C. Lee Giles, Information Sciences and Technology, Pennsylvania State University, University Park, 16802, USA
ABSTRACT
We introduce a big data platform that provides various services for harvesting scholarly information and enabling efficient scholarly applications. The core architecture of the platform is built on a secure private cloud. Data are crawled by a scholarly focused crawler that leverages a dynamic scheduler, processed by a MapReduce-based crawl-extraction-ingestion (CEI) workflow, and stored in distributed repositories and databases. Services such as scholarly data harvesting, information extraction, and user information and log data analytics are integrated into the platform and exposed through OAI and RESTful APIs. We also introduce a set of scholarly applications built on top of this platform, including citation recommendation and collaborator discovery.
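The abstract names a MapReduce-based crawl-extraction-ingestion (CEI) workflow but does not detail it. The Python sketch below is a rough illustration of how such a pipeline could be expressed. It has three stages:

- a map step (extraction),
- a shuffle (grouping by key),
- a reduce step (ingestion with deduplication).

Everything specific here is a hypothetical assumption and not the platform's actual implementation. That includes every function name, the title-based deduplication key, the toy "first line is the title" extractor and the in-memory repository.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical crawl output: (url, raw_text) pairs standing in for downloaded papers.
CrawledDoc = Tuple[str, str]


def extract_map(doc: CrawledDoc) -> List[Tuple[str, dict]]:
    """Map step: pull toy metadata out of one crawled document.

    Assumption: the first non-empty line is the title. A real extractor
    would parse PDFs and use trained models instead.
    """
    url, raw = doc
    lines = [ln.strip() for ln in raw.splitlines() if ln.strip()]
    if not lines:
        return []  # nothing extractable, so the document is dropped
    title = lines[0]
    key = " ".join(title.lower().split())  # normalized title used as a dedup key
    return [(key, {"url": url, "title": title, "n_lines": len(lines)})]


def shuffle(pairs: Iterable[Tuple[str, dict]]) -> Dict[str, List[dict]]:
    """Group intermediate records by key, as a MapReduce shuffle would."""
    groups: Dict[str, List[dict]] = defaultdict(list)
    for key, rec in pairs:
        groups[key].append(rec)
    return groups


def ingest_reduce(key: str, records: List[dict]) -> dict:
    """Reduce step: merge duplicate records into one repository entry.

    Assumption: keep the longest version and remember every source URL.
    """
    best = max(records, key=lambda r: r["n_lines"])
    return {"title": best["title"], "urls": sorted(r["url"] for r in records)}


def run_cei(crawled: Iterable[CrawledDoc]) -> Dict[str, dict]:
    """Run map, shuffle and reduce over crawled documents (hypothetical driver)."""
    mapped = (pair for doc in crawled for pair in extract_map(doc))
    return {k: ingest_reduce(k, recs) for k, recs in shuffle(mapped).items()}


# Usage: two copies of the same paper from different hosts, plus an empty download.
crawl = [
    ("http://a.example/p1.pdf", "Scholarly Big Data\nAbstract ..."),
    ("http://b.example/p1.pdf", "Scholarly big data\nAbstract ...\nBody"),
    ("http://c.example/empty.pdf", "   \n"),
]
repo = run_cei(crawl)
print(repo)
```

The design keeps each stage a pure function, so that in a real deployment the map and reduce steps could be distributed by a framework such as Hadoop. Here they run in a single process only for illustration.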
INDEX TERMS
Databases, Big data, Books, Servers, Cloud computing, Data mining, Crawlers, Big Data, Scholarly Big Data, Information Extraction
CITATION
Zhaohui Wu, Jian Wu, Madian Khabsa, Kyle Williams, Hung-Hsuan Chen, Wenyi Huang, Suppawong Tuarob, Sagnik Ray Choudhury, Alexander Ororbia, Prasenjit Mitra, C. Lee Giles, "Towards building a scholarly big data platform: Challenges, lessons and opportunities", 2014 IEEE/ACM Joint Conference on Digital Libraries (JCDL), pp. 117-126, 2014, doi:10.1109/JCDL.2014.6970157