2015 IEEE/ACM 12th Working Conference on Mining Software Repositories (MSR) (2015)
May 16, 2015 to May 17, 2015
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/MSR.2015.19
William Martin , Dept. of Comput. Sci., Univ. Coll. London, London, UK
Mark Harman , Dept. of Comput. Sci., Univ. Coll. London, London, UK
Yue Jia , Dept. of Comput. Sci., Univ. Coll. London, London, UK
Federica Sarro , Dept. of Comput. Sci., Univ. Coll. London, London, UK
Yuanyuan Zhang , Dept. of Comput. Sci., Univ. Coll. London, London, UK
Many papers on App Store Mining are susceptible to the App Sampling Problem, which exists when only a subset of apps is studied, resulting in potential sampling bias. We introduce the App Sampling Problem, and study its effects on sets of user review data. We investigate the effects of sampling bias, and techniques for its amelioration in App Store Mining and Analysis, where sampling bias is often unavoidable. We mine 106,891 requests from 2,729,103 user reviews and investigate the properties of apps and reviews from three different partitions: the sets with fully complete review data, partially complete review data, and no review data at all. We find that app metrics such as price, rating, and download rank are significantly different between the three completeness levels. We show that correlation analysis can find trends in the data that prevail across the partitions, offering one possible approach to App Store Analysis in the presence of sampling bias.
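As a rough illustration of the idea in the abstract, the sketch below partitions apps by review completeness and checks whether a correlation keeps the same sign in every partition. It is not the authors' implementation: the completeness rule, the choice of Spearman rank correlation, the function names, and all data rows are assumptions made up for this example.

```python
from statistics import mean

def completeness(declared, mined):
    """Label an app by how much review data was collected (hypothetical rule)."""
    if mined == 0:
        return "none"
    return "full" if mined >= declared else "partial"

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out

def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(xs), ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Made-up rows: (declared review count, mined review count, price, rating).
apps = [
    (120, 120, 0.00, 4.5), (80, 80, 0.99, 4.1), (40, 40, 2.99, 3.8),
    (900, 300, 0.00, 4.4), (700, 250, 1.99, 4.0), (650, 100, 4.99, 3.2),
    (50, 0, 0.99, 3.9), (10, 0, 0.00, 4.2), (5, 0, 1.99, 3.5),
]

partitions = {"full": [], "partial": [], "none": []}
for declared, mined, price, rating in apps:
    partitions[completeness(declared, mined)].append((price, rating))

# Correlate price with rating separately inside each partition.
results = {
    name: spearman([p for p, _ in rows], [r for _, r in rows])
    for name, rows in partitions.items()
}

# A trend "prevails" here if the correlation has the same sign everywhere.
prevails = len({rho > 0 for rho in results.values()}) == 1
print(results, prevails)
```

Comparing signs across partitions is only one simple reading of "prevails"; a real analysis would also need significance testing and far more data per partition.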
Data mining, Measurement, Correlation, Market research, Google, Web pages, Computational modeling
W. Martin, M. Harman, Y. Jia, F. Sarro, and Y. Zhang, "The App Sampling Problem for App Store Mining," 2015 IEEE/ACM 12th Working Conference on Mining Software Repositories (MSR), Florence, Italy, 2015, pp. 123-133.