Issue No. 5 - Sept.-Oct. 2012 (vol. 16)
pp. 20-27
Jeroen B.P. Vuurens, The Hague University of Applied Sciences
Arjen P. de Vries, Centrum Wiskunde & Informatica
ABSTRACT
The performance of information retrieval (IR) systems is commonly evaluated using a test set with known relevance. Crowdsourcing is one method for learning which documents are relevant to each query in the test set. However, the quality of relevance judgments obtained through crowdsourcing can be questionable, because it relies on workers of unknown quality, possibly including spammers. To detect spammers, the authors' algorithm compares judgments between workers. They evaluate their approach by comparing the consistency of crowdsourced ground truth to that obtained from expert annotators, and conclude that crowdsourcing can match the quality obtained from expert judges.
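The abstract describes the approach only at a high level: comparing each worker's judgments with those of other workers who judged the same items, and treating persistently low agreement as a spam signal. The following Python sketch illustrates that general idea; it is not the authors' actual algorithm, and the data layout, the pairwise_agreement and flag_spammers helpers, and the 0.5 agreement threshold are all hypothetical.

def pairwise_agreement(judgments):
    """judgments maps worker -> {(query_id, doc_id): label}.
    Returns each worker's mean fraction of matching labels
    across the items shared with every co-judging worker."""
    agreement = {}
    for worker, labels in judgments.items():
        scores = []
        for other, other_labels in judgments.items():
            if other == worker:
                continue
            shared = labels.keys() & other_labels.keys()
            if not shared:
                continue
            matches = sum(labels[k] == other_labels[k] for k in shared)
            scores.append(matches / len(shared))
        agreement[worker] = sum(scores) / len(scores) if scores else None
    return agreement

def flag_spammers(judgments, threshold=0.5):
    """Flag workers whose mean agreement falls below a (hypothetical) threshold."""
    return {worker for worker, score in pairwise_agreement(judgments).items()
            if score is not None and score < threshold}

# Hypothetical example: three consistent workers and one who labels at odds with everyone.
judgments = {
    "w1": {("q1", "d1"): 1, ("q1", "d2"): 0, ("q2", "d3"): 1},
    "w2": {("q1", "d1"): 1, ("q1", "d2"): 0, ("q2", "d3"): 1},
    "w3": {("q1", "d1"): 1, ("q1", "d2"): 0},
    "w4": {("q1", "d1"): 0, ("q1", "d2"): 1, ("q2", "d3"): 0},
}
print(flag_spammers(judgments))  # {'w4'}

In this toy data, worker w4 disagrees with every co-worker on every shared item, so its mean agreement is zero and it is flagged, while the remaining workers stay above the threshold.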
INDEX TERMS
Accuracy, Internet, Conferences, Information retrieval, Unsolicited electronic mail, Detection algorithms, Reliability, spam, crowdsourcing, judgment, quality, relevance
CITATION
Jeroen B.P. Vuurens and Arjen P. de Vries, "Obtaining High-Quality Relevance Judgments Using Crowdsourcing," IEEE Internet Computing, vol. 16, no. 5, pp. 20-27, Sept.-Oct. 2012, doi:10.1109/MIC.2012.71
REFERENCES
1. E.M. Voorhees, "Variations in Relevance Judgments and the Measurement of Retrieval Effectiveness," Information Processing and Management, vol. 36, Elsevier, 2000, pp. 697–716.
2. O. Alonso, D.E. Rose, and B. Stewart, "Crowdsourcing for Relevance Evaluation," SIGIR Forum vol. 42, no. 2, 2008, pp. 9–15.
3. P.G. Ipeirotis, F. Provost, and J. Wang, "Quality Management on Amazon Mechanical Turk," Proc. ACM SIGKDD Workshop Human Computation, ACM, 2010, pp. 64–67.
4. R. Blanco et al., "Repeatable and Reliable Search System Evaluation Using Crowdsourcing," Proc. 34th Ann. Int'l ACM SIGIR Conf. Research and Development in Information Retrieval, ACM, 2011, pp. 923–932.
5. A. Kittur, E.H. Chi, and B. Suh, "Crowdsourcing User Studies with Mechanical Turk," Proc. 26th Ann. SIGCHI Conf. Human Factors in Computing Systems (CHI 08), ACM, 2008, pp. 453–456.
6. J.B.P. Vuurens, A.P. de Vries, and C. Eickhoff, "How Much Spam Can You Take? An Analysis of Crowdsourcing Results to Increase Accuracy," Proc. SIGIR Workshop Crowdsourcing for Information Retrieval (CIR 11), ACM, 2011, pp. 21–26.
7. J. Le et al., "Ensuring Quality in Crowdsourced Search Relevance Evaluation," Proc. SIGIR Workshop Crowdsourcing for Search Evaluation (CSE 10), ACM, 2010, pp. 17–20.
8. G. Kazai, J. Kamps, and N. Milic-Frayling, "Worker Types and Personality Traits in Crowdsourcing Relevance Labels," Proc. 20th ACM Conf. Information and Knowledge Management (CIKM 11), ACM, 2011, pp. 1941–1944.
9. D. Zhu and B. Carterette, "An Analysis of Assessor Behavior in Crowdsourced Preference Judgments," Proc. SIGIR Workshop Crowdsourcing for Search Evaluation (CSE 10), ACM, 2010, pp. 21–26.
10. C. Eickhoff and A.P. de Vries, "Increasing Cheat Robustness of Crowdsourcing Tasks," Advances in Information Retrieval, Springer, to appear, 2012.
11. G. Kazai, "In Search of Quality in Crowdsourcing for Search Engine Evaluation," Advances in Information Retrieval, Springer, 2011, pp. 165–176.
12. O. Dekel and O. Shamir, "Vox Populi: Collecting High-Quality Labels from a Crowd," Proc. 22nd Ann. Conf. Learning Theory (COLT 09), 2009.
13. J. Rzeszotarski and A. Kittur, "Instrumenting the Crowd: Using Implicit Behavioral Measures to Predict Task Performance," ACM Symp. User Interface Software and Technology, ACM, 2011, pp. 13–22.
14. J. Downs et al., "Are Your Participants Gaming the System? Screening Mechanical Turk Workers," Proc. 28th Int'l Conf. Human Factors in Computing Systems (CHI 10), ACM, 2010, pp. 2399–2402.
15. K. Krippendorff and M.A. Bock, The Content Analysis Reader, SAGE Publications, 2009.
16. M.A. Kouritzin et al., "On Detecting Fake Coin Flip Sequences," Markov Processes and Related Topics: A Festschrift for Thomas G. Kurtz, IMS Collections, vol. 4, 2008, pp. 107–122.
17. F. Scholer, A. Turpin, and M. Sanderson, "Quantifying Test Collection Quality Based on the Consistency of Relevance Judgments," Proc. 34th Ann. Int'l ACM SIGIR Conf. Research and Development in Information Retrieval, ACM, 2011, pp. 1063–1072.
18. V.S. Sheng, F. Provost, and P.G. Ipeirotis, "Get Another Label? Improving Data Quality and Data Mining Using Multiple, Noisy Labelers," Proc. 14th ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining (KDD 08), ACM, 2008, pp. 614–622.