2016 International Conference on Advanced Computing and Applications (ACOMP) (2016)
Can Tho City, Vietnam
Nov. 23, 2016 to Nov. 25, 2016
Marketing research based on data collected from e-commerce websites carries a latent risk of receiving inaccurate data that have been modified before being returned, especially when the crawling is performed by third-party service providers. This risk is often overlooked in related work on web crawling systems. Avoiding it requires a verification phase in which the data are collected a second time for comparison. However, re-crawling all the data simply to verify them is costly, since it doubles the original cost. In this paper, we introduce an efficient approach that selects the data most likely to have been modified for subsequent re-crawling. This approach reduces the cost of verification while still ensuring the authenticity of the data. We then measure the efficiency of our scheme by testing its ability to detect fraudulent data in a dataset containing simulated modified data. Results show that our scheme considerably reduces the amount of data to be re-crawled while still covering most of the fraudulent data. For example, when applying our scheme to select data for re-crawling from a real-world e-commerce website, on a set in which fraudulent data make up 50 percent of the total, we only need to re-collect 50 percent of the data to detect up to 80 percent of the fraudulent data, which is clearly more efficient than randomly choosing the same amount of data to re-crawl. We conclude by discussing how the accuracy of the proposed model is measured.
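The abstract does not say how candidate records are scored, so the Python sketch below only illustrates the surrounding cross-checking workflow, under stated assumptions. It is not the authors' scoring model. Each crawled record is assumed to carry a hypothetical `suspicion` score (higher meaning more likely modified). The top fraction of records within a re-crawl budget is re-collected through a hypothetical `recrawl` callable, and a record is flagged when the two values disagree. For comparison, choosing a fraction q of N records uniformly at random catches, in expectation, qN · K/N = qK of the K fraudulent records. That is a recall of q, so a random 50 percent sample is expected to catch about 50 percent of the fraud, against the 80 percent reported above.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Record:
    record_id: str
    crawled_value: float  # value returned by the (possibly untrusted) crawler
    suspicion: float      # hypothetical score: higher = more likely to have been modified


def select_for_recrawl(records: list[Record], budget_fraction: float) -> list[Record]:
    """Return the top `budget_fraction` of records, ranked by suspicion score."""
    k = round(len(records) * budget_fraction)
    ranked = sorted(records, key=lambda r: r.suspicion, reverse=True)
    return ranked[:k]


def cross_check(selected: Iterable[Record], recrawl: Callable[[str], float]) -> set[str]:
    """Re-collect each selected record and flag it if the two values disagree."""
    return {r.record_id for r in selected if recrawl(r.record_id) != r.crawled_value}


def recall(detected: set[str], fraudulent: set[str]) -> float:
    """Fraction of the truly modified records that the cross-check caught."""
    return len(detected & fraudulent) / len(fraudulent) if fraudulent else 1.0


def expected_random_recall(budget_fraction: float) -> float:
    """Baseline: a uniformly random sample of fraction q catches q of the fraud on average."""
    return budget_fraction


# Toy data (invented for illustration, not from the paper): "b" and "c" were modified.
true_values = {"a": 10.0, "b": 20.0, "c": 30.0, "d": 40.0}
records = [
    Record("a", 10.0, 0.1),
    Record("b", 25.0, 0.9),
    Record("c", 33.0, 0.8),
    Record("d", 40.0, 0.7),
]
fraudulent = {"b", "c"}

selected = select_for_recrawl(records, budget_fraction=0.5)
detected = cross_check(selected, recrawl=true_values.__getitem__)
print(sorted(detected), recall(detected, fraudulent), expected_random_recall(0.5))
```

On this toy data, a 50 percent budget ranks `b` and `c` highest, so both modified records are caught. The whole benefit over the random baseline therefore depends on how well the (here hypothetical) suspicion score ranks the modified records.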
Data models, Authentication, Feature extraction, Market research, Urban areas, Servers
K. D. Tran, D. D. Ho, D. M. Pham, A. K. Vo and H. H. Nguyen, "A Cross-Checking Based Method for Fraudulent Detection on E-Commercial Crawling Data," 2016 International Conference on Advanced Computing and Applications (ACOMP), Can Tho City, Vietnam, 2016, pp. 32-39.