Issue No.04 - April (2013 vol.35)
pp: 769-783
K. Audhkhasi, Electr. Eng. Dept., Univ. of Southern California, Los Angeles, CA, USA
S. Narayanan, Electr. Eng. Dept., Univ. of Southern California, Los Angeles, CA, USA
Researchers have shown that fusion of categorical labels from multiple experts - humans or machine classifiers - improves the accuracy and generalizability of the overall classification system. Simple plurality is a popular technique for performing this fusion, but it gives equal importance to labels from all experts, who may not be equally reliable or consistent across the dataset. Estimation of expert reliability without knowing the reference labels is, however, a challenging problem. Most previous works deal with these challenges by modeling expert reliability as constant over the entire data (feature) space. This paper presents a model based on the consideration that in dealing with real-world data, expert reliability is variable over the complete feature space but constant over local clusters of homogeneous instances. This model jointly learns a classifier and expert reliability parameters without assuming knowledge of the reference labels using the Expectation-Maximization (EM) algorithm. Classification experiments on simulated data, data from the UCI Machine Learning Repository, and two emotional speech classification datasets show the benefits of the proposed model. Using a metric based on the Jensen-Shannon divergence, we empirically show that the proposed model gives greater benefit for datasets where expert reliability is highly variable over the feature space.
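The abstract names the Jensen-Shannon divergence as the basis of its variability metric but does not say how the metric is built from it. As a minimal sketch of the divergence itself (not the paper's metric), the Python below computes it for two discrete distributions. The function names and the two example distributions are hypothetical, chosen only for illustration.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits.

    Terms with p[i] == 0 contribute nothing, by the usual convention.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions.

    Defined as the average KL divergence of p and q from their
    midpoint m = (p + q) / 2. With base-2 logarithms the value lies
    in [0, 1] and is symmetric in p and q.
    """
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical example: the label distribution one expert produces for
# the same true class in two different clusters of the feature space.
labels_in_cluster_a = [0.9, 0.1]
labels_in_cluster_b = [0.6, 0.4]
print(js_divergence(labels_in_cluster_a, labels_in_cluster_b))
```

A value near 0 would indicate that the expert behaves almost identically in both clusters, while larger values indicate behavior that varies across the feature space, which is the setting where the abstract reports the greatest benefit.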
Reliability, Speech, Humans, Labeling, Data models, Training, Analytical models, emotion recognition, Multiple diverse experts, label fusion, label reliability, expectation-maximization algorithm, human annotation
K. Audhkhasi, S. Narayanan, "A Globally-Variant Locally-Constant Model for Fusion of Labels from Multiple Diverse Experts without Using Reference Labels", IEEE Transactions on Pattern Analysis & Machine Intelligence, vol.35, no. 4, pp. 769-783, April 2013, doi:10.1109/TPAMI.2012.139