2016 International Conference on Big Data and Smart Computing (BigComp) (2016)
Hong Kong, China
Jan. 18, 2016 to Jan. 20, 2016
ISSN: 2375-9356
ISBN: 978-1-4673-8795-8
pp: 395-397
Jae-ho Shin, Dept. of Software, Gachon University, Seongnam-si, Korea
Gyoung-Don Joo, Dept. of Software, Gachon University, Seongnam-si, Korea
Chulyun Kim, Dept. of Software, Gachon University, Seongnam-si, Korea
ABSTRACT
The number of online market places has grown as online shopping has become more popular over the past couple of decades. During that time, the technologies used to build web sites have evolved as well, and AJAX is currently a representative technique for constructing dynamic web pages. Crawling is a basic tool for collecting information on the Internet, and traditional crawling techniques randomly choose and follow links represented by anchor tags in order to navigate the World Wide Web. However, applying a traditional crawler to gather information from a targeted, up-to-date online market place raises some critical problems. The first issue is that there are too many links, of which only a few are needed to reach every web page in the site. The second issue is that most links are implemented in JavaScript rather than as anchor tags, so traditional web crawlers cannot follow them. To overcome these issues, we propose a webpage crawling method that extracts only the necessary and sufficient links by adopting a crowdsourcing approach, and that follows JavaScript links by using navigation information represented as XPaths.
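As a rough illustration of the two issues named in the abstract, the sketch below contrasts anchor-based link discovery with selection by a recorded XPath. It is not the authors' implementation: the page fragment, the `goCategory` and `openAd` handlers, the `RECORDED_XPATHS` list (standing in for navigation information gathered from crowd workers), and both helper functions are hypothetical. It uses only Python's standard `xml.etree.ElementTree`, which supports a limited XPath subset and requires well-formed markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page fragment: the category links are driven by
# onclick handlers instead of <a href="...">, so an anchor-only crawler sees
# just the "About" link.
PAGE = """
<html>
  <body>
    <div id="menu">
      <span class="cat" onclick="goCategory(10)">Shoes</span>
      <span class="cat" onclick="goCategory(20)">Bags</span>
    </div>
    <div id="footer">
      <a href="/about">About</a>
      <span class="ad" onclick="openAd(7)">Sale</span>
    </div>
  </body>
</html>
"""

# Hypothetical navigation record: XPaths assumed to have been collected from
# crowd workers who clicked only the links that matter for navigation.
RECORDED_XPATHS = [".//div[@id='menu']/span[@class='cat']"]


def anchor_links(root):
    """Links a traditional crawler would follow: href values of <a> tags."""
    return [a.get("href") for a in root.iter("a") if a.get("href")]


def xpath_targets(root, xpaths):
    """Elements selected by the recorded XPaths, as (text, onclick) pairs."""
    targets = []
    for xp in xpaths:
        for el in root.findall(xp):
            targets.append((el.text, el.get("onclick")))
    return targets


root = ET.fromstring(PAGE)
print(anchor_links(root))                    # only the anchor-tag link
print(xpath_targets(root, RECORDED_XPATHS))  # the two category links
```

In this toy page the recorded XPath picks up the two category links and skips the advertisement link, while the anchor-only pass finds neither category. Actually following such links would additionally require executing the JavaScript handlers in a browser environment, which this sketch does not attempt.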
INDEX TERMS
Crawlers, Uniform resource locators, Web pages, Crowdsourcing, Data mining, Java, XML
CITATION

Jae-ho Shin, Gyoung-Don Joo and Chulyun Kim, "XPath based crawling method with crowdsourcing for targeted online market places," 2016 International Conference on Big Data and Smart Computing (BigComp), Hong Kong, China, 2016, pp. 395-397.
doi:10.1109/BIGCOMP.2016.7425956