Issue No.01 - Jan. (2014 vol.26)
pp: 83-96
Ying Zhang, The University of New South Wales, Sydney
Wenjie Zhang, The University of New South Wales, Sydney
Jian Pei, Simon Fraser University, Burnaby
Xuemin Lin, The University of New South Wales, Sydney
Qianlu Lin, The University of New South Wales, Sydney
Aiping Li, National University of Defense Technology, ChangSha
ABSTRACT
In this paper, we tackle a novel problem of ranking multivalued objects, where an object has multiple instances in a multidimensional space and the number of instances per object is not fixed. Given an ad hoc scoring function that assigns a score to a multidimensional instance, we want to rank a set of multivalued objects. Unlike existing models for ranking uncertain and probabilistic data, which treat an object as a random variable and assume that its instances are mutually exclusive, we have to capture the coexistence of instances. To tackle the problem, we advocate the semantics of favoring widely preferred objects over majority votes, a semantics that is widely used in many elections and competitions. Technically, we borrow the idea from Borda Count (BC), a well-recognized method in consensus-based voting systems. However, Borda Count cannot handle multivalued objects of inconsistent cardinality, and it is costly for evaluating top-$k$ queries on large multidimensional data sets. To address these challenges, we extend and generalize Borda Count to quantile-based Borda Count, and develop efficient computational methods with comprehensive cost analysis. We present case studies on real data sets to demonstrate the effectiveness of the generalized Borda Count ranking, and use synthetic and real data sets to verify the efficiency of our computational methods.
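As a point of reference for the "widely preferred over majority" semantics, the following is a minimal sketch of the classical Borda Count only. It does not reproduce the quantile-based generalization, which the abstract names but does not define. The function name `borda_count` and the sample ballots are illustrative assumptions, not taken from the paper.

```python
from collections import defaultdict


def borda_count(ballots):
    """Classical Borda Count (illustrative sketch, not the paper's method).

    Each ballot is a complete ranking of the same candidates, best first.
    A candidate in position p of a ballot with n candidates earns n - 1 - p points.
    Returns (candidate, score) pairs sorted by descending score, ties by name.
    """
    scores = defaultdict(int)
    for ballot in ballots:
        n = len(ballot)
        for position, candidate in enumerate(ballot):
            scores[candidate] += n - 1 - position
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical ballots: three voters prefer A > B > C, two prefer B > C > A.
ballots = [["A", "B", "C"]] * 3 + [["B", "C", "A"]] * 2
ranking = borda_count(ballots)
print(ranking)  # [('B', 7), ('A', 6), ('C', 2)]
```

In this example A holds the majority of first-place votes (three of five), yet B ranks first under Borda Count because every voter places B first or second. Note that the classical form assumes every ballot ranks the same number of candidates, which is the limitation the abstract raises for objects of inconsistent cardinality.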
INDEX TERMS
Cities and towns, Biological system modeling, Indexes, Educational institutions, Probabilistic logic, Data models, Economics, Consensus-based ranking, Multivalued objects
CITATION
Ying Zhang, Wenjie Zhang, Jian Pei, Xuemin Lin, Qianlu Lin, Aiping Li, "Consensus-Based Ranking of Multivalued Objects: A Generalized Borda Count Approach", IEEE Transactions on Knowledge & Data Engineering, vol.26, no. 1, pp. 83-96, Jan. 2014, doi:10.1109/TKDE.2012.250