2012 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery (2012)
Sanya, China
Oct. 10, 2012 to Oct. 12, 2012
ISBN: 978-1-4673-2624-7
pp: 44-52
ABSTRACT
The widespread use of the Internet provides a good environment for e-commerce. Studies of e-commerce network characteristics often focus on Taobao. So far, research based on Taobao has addressed the credit rating system, marketing strategy, analysis of seller characteristics, and so on. The purpose of all these studies is to analyze online marketing transactions in e-commerce. In this paper, we analyze the e-commerce network from the perspective of graph theory. Our contributions lie in two aspects: (1) We crawl the Taobao share-platform using the Scrapy crawling framework. After analyzing the format of Taobao's web pages in depth, we combined two sampling methods, breadth-first search (BFS) and Metropolis-Hastings random walk (MHRW), and ran the crawler on five PCs for 30 days. We also list some major problems encountered during crawling and give the solutions we finally adopted. In addition, we crawled the data of one type of seller in order to analyze the relationships between sellers and buyers. (2) We analyze the characteristics of user behavior on the Taobao share-platform based on the obtained dataset. We aim to find the relationships between sellers and buyers who are connected by items on the share-platform. Surprisingly, we find that some buyers use the share-platform as a tool to advertise items for sellers with high credit scores, while other buyers only help them support the platform.
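As an illustration of the MHRW sampling method named in the abstract, the following is a minimal sketch of a Metropolis-Hastings random walk over an undirected graph. It is not the authors' crawler: the function name `mhrw_sample`, the adjacency-dictionary representation, and the toy graph are assumptions made for this example only. At each step a neighbor is proposed uniformly at random and accepted with probability min(1, deg(current)/deg(candidate)), which counteracts the bias toward high-degree nodes that a plain random walk or BFS tends to show.

```python
import random


def mhrw_sample(graph, start, steps, seed=None):
    """Return the sequence of nodes visited by a Metropolis-Hastings random walk.

    graph: dict mapping each node to a list of its neighbors (undirected).
    start: node at which the walk begins.
    steps: number of proposal steps to perform.
    """
    rng = random.Random(seed)
    current = start
    visited = [current]
    for _ in range(steps):
        neighbors = graph[current]
        if not neighbors:
            # Isolated node: the walk cannot move, so it stays in place.
            visited.append(current)
            continue
        candidate = rng.choice(neighbors)
        # Accept the move with probability min(1, deg(current) / deg(candidate)).
        accept_prob = min(1.0, len(neighbors) / len(graph[candidate]))
        if rng.random() < accept_prob:
            current = candidate
        # A rejected proposal counts as another visit to the current node.
        visited.append(current)
    return visited


# Hypothetical toy graph: node "a" is a hub connected to every other node.
toy_graph = {
    "a": ["b", "c", "d"],
    "b": ["a", "c"],
    "c": ["a", "b"],
    "d": ["a"],
}

walk = mhrw_sample(toy_graph, start="d", steps=1000, seed=42)
```

In a real crawl, the neighbor lists would be obtained by fetching and parsing pages rather than read from an in-memory dictionary, and the repeated visits caused by rejected proposals are kept in the sample because they are what makes the visit frequencies approach uniform.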
INDEX TERMS
Sampling methods, Crawlers, Web pages, Marketing and sales, Social network services, Engines, Communities, user behavior, Taobao, bipartite graph, sampling method, MHRW, Scrapy
CITATION

J. Wang and Y. Guo, "Scrapy-Based Crawling and User-Behavior Characteristics Analysis on Taobao," 2012 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC), Sanya, China, 2012, pp. 44-52.
doi:10.1109/CyberC.2012.17