Issue No.10 - Oct. (2013 vol.25)
pp: 2177-2191
Wei-Shinn Ku , Auburn University, Auburn
Haiquan Chen , Valdosta State University, Valdosta
Haixun Wang , Microsoft Research Asia, Beijing
Min-Te Sun , National Central University, Taoyuan
ABSTRACT
The past few years have witnessed the emergence of an increasing number of applications for tracking and tracing based on radio frequency identification (RFID) technologies. However, raw RFID readings are usually of low quality and may contain numerous anomalies. An ideal solution for RFID data cleansing should address the following issues. First, in many applications, duplicate readings of the same object are very common. The solution should take advantage of the resulting data redundancy for data cleaning. Second, prior knowledge about the environment may help improve data quality, and a desired solution must be able to take into account such knowledge. Third, the solution should take advantage of physical constraints in target applications to elevate the accuracy of data cleansing. There are several existing RFID data cleansing techniques. However, none of them support all the aforementioned features. In this paper, we propose a Bayesian inference-based framework for cleaning RFID raw data. We first design an $n$-state detection model and formally prove that the three-state model can maximize the system performance. Then, we extend the $n$-state model to support two-dimensional RFID reader arrays and compute the likelihood efficiently. In addition, we devise a Metropolis-Hastings sampler with constraints, which incorporates constraint management to clean RFID data with high efficiency and accuracy. Moreover, to support real-time object monitoring, we present the streaming Bayesian inference method to cope with real-time RFID data streams. Finally, we evaluate the performance of our solutions through extensive experiments.
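As a rough illustration of the idea of a Metropolis-Hastings sampler with constraints, the following minimal Python sketch infers the grid cell of a single tagged object from binary readings of two readers. It is not the paper's algorithm: the grid, the read rates in `read_rate`, the uniform prior, the uniform proposal, and the names `mh_sample`, `allowed`, and `blocked` are all assumptions made for this example. The constraint is modeled simply as a set of cells the object cannot occupy, so any proposal that violates it is rejected.

```python
import math
import random

def read_rate(distance, major=1.0, minor=2.0):
    """Assumed three-state detection model: read probability by distance."""
    if distance <= major:
        return 0.9   # major detection region (assumed rate)
    if distance <= minor:
        return 0.4   # minor detection region (assumed rate)
    return 0.02      # outside the range: rare false positive (assumed rate)

def log_likelihood(cell, readers, readings):
    """Log-probability of the binary readings if the object sits at `cell`."""
    total = 0.0
    for (rx, ry), seen in zip(readers, readings):
        p = read_rate(math.hypot(cell[0] - rx, cell[1] - ry))
        total += math.log(p if seen else 1.0 - p)
    return total

def mh_sample(readers, readings, cells, allowed, steps=5000, seed=0):
    """Count visits per cell; counts approximate the posterior over allowed cells."""
    rng = random.Random(seed)
    current = rng.choice([c for c in cells if allowed(c)])
    current_ll = log_likelihood(current, readers, readings)
    counts = {}
    for _ in range(steps):
        proposal = rng.choice(cells)  # symmetric uniform proposal
        # A proposal that violates the constraint has zero posterior mass,
        # so it is always rejected and the chain stays where it is.
        if allowed(proposal):
            proposal_ll = log_likelihood(proposal, readers, readings)
            # Uniform prior and symmetric proposal: the acceptance
            # probability reduces to the likelihood ratio, capped at 1.
            if rng.random() < math.exp(min(0.0, proposal_ll - current_ll)):
                current, current_ll = proposal, proposal_ll
        counts[current] = counts.get(current, 0) + 1
    return counts

# Hypothetical setup: two readers on a 5 x 3 grid; only the first one saw the tag.
readers = [(0, 0), (4, 0)]
readings = [True, False]
cells = [(x, y) for x in range(5) for y in range(3)]
blocked = {(0, 0)}  # assumed physical constraint: this cell cannot hold the object
counts = mh_sample(readers, readings, cells, lambda c: c not in blocked)
print(max(counts, key=counts.get))  # most frequently visited cell
```

Because blocked cells are never accepted, every sample respects the constraint, and the visit counts concentrate on the allowed cells closest to the reader that reported the tag.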
INDEX TERMS
Radiofrequency identification, Redundancy, Bayesian methods, Equations, Mathematical model, Computational modeling, Accuracy, spatiotemporal databases, uncertainty, Data cleaning, probabilistic algorithms
CITATION
Wei-Shinn Ku, Haiquan Chen, Haixun Wang, Min-Te Sun, "A Bayesian Inference-Based Framework for RFID Data Cleansing", IEEE Transactions on Knowledge & Data Engineering, vol.25, no. 10, pp. 2177-2191, Oct. 2013, doi:10.1109/TKDE.2012.116