2015 IEEE/ACM 12th Working Conference on Mining Software Repositories (MSR) (2015)
Florence, Italy
May 16, 2015 to May 17, 2015
ISBN: 978-0-7695-5594-2
pp: 123-133
William Martin, Dept. of Comput. Sci., Univ. Coll. London, London, UK
Mark Harman, Dept. of Comput. Sci., Univ. Coll. London, London, UK
Yue Jia, Dept. of Comput. Sci., Univ. Coll. London, London, UK
Federica Sarro, Dept. of Comput. Sci., Univ. Coll. London, London, UK
Yuanyuan Zhang, Dept. of Comput. Sci., Univ. Coll. London, London, UK
ABSTRACT
Many papers on App Store Mining are susceptible to the App Sampling Problem, which exists when only a subset of apps is studied, resulting in potential sampling bias. We introduce the App Sampling Problem and study its effects on sets of user review data. We investigate the effects of sampling bias, and techniques for its amelioration, in App Store Mining and Analysis, where sampling bias is often unavoidable. We mine 106,891 requests from 2,729,103 user reviews and investigate the properties of apps and reviews from three different partitions: the sets with fully complete review data, partially complete review data, and no review data at all. We find that app metrics such as price, rating, and download rank are significantly different between the three completeness levels. We show that correlation analysis can find trends in the data that prevail across the partitions, offering one possible approach to App Store Analysis in the presence of sampling bias.
INDEX TERMS
Data mining, Measurement, Correlation, Market research, Google, Web pages, Computational modeling
CITATION

W. Martin, M. Harman, Y. Jia, F. Sarro and Y. Zhang, "The App Sampling Problem for App Store Mining," 2015 IEEE/ACM 12th Working Conference on Mining Software Repositories (MSR), Florence, Italy, 2015, pp. 123-133.
doi:10.1109/MSR.2015.19