Issue No. 04 - Dec. (2018 vol. 4)
ISSN: 2332-7790
pp: 473-486
Weiwei Shi, Shanghai Jiao Tong University, Minhang Qu, China
Yongxin Zhu, Shanghai Jiao Tong University, Minhang Qu, China
Philip S. Yu, Department of Computer Science, University of Illinois at Chicago, Chicago, IL
Jiawei Zhang, Department of Computer Science, University of Illinois at Chicago, Chicago, IL
Tian Huang, University of Cambridge, Cambridge, United Kingdom
Chang Wang, Shanghai Jiao Tong University, Minhang Qu, China
Yufeng Chen, Shandong Power Supply Company of State Grid, Jinan, China
ABSTRACT
Data are being generated in greater volume than ever before across many areas. In practice, however, collected data frequently contain missing values, which makes it harder to extract the full value of these large-scale data sets. For multivariable time series in particular, most existing methods for predicting missing data are either infeasible or inefficient. In this paper, we address missing data prediction in multivariable time series using improved matrix factorization techniques. Our approaches are designed to exploit both the internal patterns of each time series and the information shared across time series from multiple sources. Based on this idea, we impose three different regularization terms on the objective functions of matrix factorization and build five corresponding models. Extensive experiments on real-world data sets and a synthetic data set demonstrate that the proposed approaches effectively improve the performance of missing data prediction in multivariable time series. We also demonstrate how to take advantage of the processing power of Apache Spark to perform missing data prediction in large-scale multivariable time series.
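The general idea of predicting missing entries by regularized matrix factorization can be illustrated with a minimal single-machine sketch in NumPy. This is not the authors' formulation and does not use Apache Spark: the abstract does not specify the three regularization terms, so the temporal smoothness penalty below, the function name `factorize_with_mask`, and all parameter values are assumptions chosen for illustration only.

```python
import numpy as np

def factorize_with_mask(X, mask, rank=2, lam=0.1, mu=0.1,
                        lr=0.005, steps=4000, seed=0):
    """Fit X ~ U @ V.T using only the observed entries.

    X    : (n_series, n_times) array; values where mask is False are ignored.
    mask : boolean array of the same shape, True where a value was observed.
    lam  : L2 penalty on both factors (illustrative value).
    mu   : temporal smoothness penalty on consecutive rows of V
           (an assumed regularizer, not taken from the paper).
    """
    rng = np.random.default_rng(seed)
    n_series, n_times = X.shape
    U = 0.1 * rng.standard_normal((n_series, rank))  # per-series factors
    V = 0.1 * rng.standard_normal((n_times, rank))   # per-time-step factors
    X0 = np.where(mask, X, 0.0)  # replace missing values (possibly NaN) by 0

    for _ in range(steps):
        # Residual restricted to observed entries.
        R = mask * (U @ V.T - X0)
        grad_U = R @ V + lam * U
        grad_V = R.T @ U + lam * V
        # Gradient of (mu / 2) * sum_t ||V[t+1] - V[t]||^2.
        D = np.diff(V, axis=0)
        grad_V[:-1] -= mu * D
        grad_V[1:] += mu * D
        # Plain gradient descent step.
        U -= lr * grad_U
        V -= lr * grad_V
    return U, V
```

Under these assumptions, missing values would be read off the reconstruction `U @ V.T` at the positions where `mask` is False. The smoothness term stands in for "internal patterns of each time series", while the shared factor matrix `V` is one simple way to pool information across series.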
INDEX TERMS
Time series analysis, Sensor phenomena and characterization, Big Data, Sparks, Correlation, Linear programming
CITATION

W. Shi et al., "Effective Prediction of Missing Data on Apache Spark over Multivariable Time Series," in IEEE Transactions on Big Data, vol. 4, no. 4, pp. 473-486, 2018.
doi:10.1109/TBDATA.2017.2719703