2016 IEEE 2nd International Conference on Collaboration and Internet Computing (CIC) (2016)
Pittsburgh, Pennsylvania, United States
Nov. 1, 2016 to Nov. 3, 2016
ISBN: 978-1-5090-4607-2
pp: 373-380
ABSTRACT
The popularity of Online Social Networks (OSN) and the huge amount of information published in them have established them as one of the main data sources for a variety of research fields. However, designing a large-scale dataset collection campaign is a major problem for organizations and researchers who aim to address their research questions by analyzing this type of data. OSN platforms provide Application Programming Interfaces (APIs) to third-party developers, which enable them to retrieve and use this data in their applications. However, due to limitations imposed by the OSN, retrieving large-scale data through these APIs is challenging and time consuming, resulting in datasets that are either incomplete or outdated. It is practically impossible for an individual scientist or research group to follow an efficient dataset collection procedure and build a large sample in a short amount of time. In this paper we present a framework for efficient crowd crawling of OSN. Our framework is based on the use of multiple OSN accounts, which are engaged in an efficient distributed collection process that is able to circumvent the imposed limitations without violating the terms of use. We present an evaluation of the proposed solution and demonstrate its performance in terms of dataset completeness and timeliness for the case study of Twitter, one of the most popular platforms used in research.
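The idea of spreading a collection job over multiple accounts can be illustrated with a minimal sketch. This is not the framework from the paper: the function names, the round-robin assignment, and the quota of 15 requests per rate-limit window are assumptions chosen only to show why more accounts shorten the collection time when each account has a fixed per-window quota.

```python
import math
from collections import defaultdict


def windows_needed(num_requests, num_accounts, quota_per_window):
    """Number of rate-limit windows needed to issue all requests.

    Assumes every account has the same per-window quota (hypothetical value).
    """
    return math.ceil(num_requests / (num_accounts * quota_per_window))


def schedule(requests, accounts, quota_per_window):
    """Assign each request to a (window, account) slot.

    Requests are handed out round-robin, so no account is ever given
    more than quota_per_window requests in any single window.
    """
    plan = defaultdict(list)
    per_window = len(accounts) * quota_per_window  # total capacity per window
    for i, req in enumerate(requests):
        window = i // per_window
        account = accounts[(i % per_window) % len(accounts)]
        plan[(window, account)].append(req)
    return dict(plan)


# Illustrative numbers only: 100 requests, 15 requests per account per window.
requests = list(range(100))
accounts = ["account_a", "account_b", "account_c"]  # hypothetical account names
plan = schedule(requests, accounts, quota_per_window=15)

print(windows_needed(100, 1, 15))  # one account: 7 windows
print(windows_needed(100, 3, 15))  # three accounts: 3 windows
```

In this toy setting the number of windows falls roughly in proportion to the number of participating accounts, which is the intuition behind a crowd-based collection process. A real deployment would also have to handle per-endpoint limits, failures, and the platform's terms of use.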
INDEX TERMS
Data collection, Twitter, Facebook, Real-time systems, IP networks, Monitoring
CITATION

H. Efstathiades, D. Antoniades, G. Pallis and M. D. Dikaiakos, "Distributed Large-Scale Data Collection in Online Social Networks," 2016 IEEE 2nd International Conference on Collaboration and Internet Computing (CIC), Pittsburgh, Pennsylvania, United States, 2016, pp. 373-380.
doi:10.1109/CIC.2016.056