Issue No. 09 - Sept. (2016 vol. 28)
Guoliang Li, Department of Computer Science, Tsinghua National Laboratory for Information Science and Technology (TNList), Tsinghua University, Beijing, China
Jiannan Wang, School of Computing Science, Simon Fraser University, Burnaby, Canada
Yudian Zheng, Department of Computer Science, University of Hong Kong, Pokfulam, Hong Kong
Michael J. Franklin, AMPLab, UC Berkeley, Berkeley, CA 94720
Many important data management and analytics tasks cannot be completely addressed by automated processes. These tasks, such as entity resolution, sentiment analysis, and image recognition, can be enhanced through the use of human cognitive ability. Crowdsourcing platforms are an effective way to harness the capabilities of people (i.e., the crowd) to apply human computation for such tasks. Thus, crowdsourced data management has become an area of increasing interest in research and industry. We identify three important problems in crowdsourced data management. (1) Quality Control: Workers may return noisy or incorrect results, so effective techniques are required to achieve high quality; (2) Cost Control: The crowd is not free, and cost control aims to reduce the monetary cost; (3) Latency Control: Human workers can be slow, particularly compared to automated computing time scales, so latency-control techniques are required. There has been significant work addressing these three factors for designing crowdsourced tasks, developing crowdsourced data manipulation operators, and optimizing plans consisting of multiple operators. In this paper, we survey and synthesize a wide spectrum of existing studies on crowdsourced data management. Based on this analysis, we then outline key factors that need to be considered to improve crowdsourced data management.
Crowdsourcing, Quality control, Pricing, Predictive models, Electronic mail, Data models, Computational modeling
Guoliang Li, Jiannan Wang, Yudian Zheng, Michael J. Franklin, "Crowdsourced Data Management: A Survey", IEEE Transactions on Knowledge & Data Engineering, vol. 28, no. 9, pp. 2296-2319, Sept. 2016, doi:10.1109/TKDE.2016.2535242