Machine Learning for the New York City Power Grid
February 2012 (vol. 34 no. 2)
pp. 328-345
C. Rudin, MIT Sloan Sch. of Manage., Massachusetts Inst. of Technol., Cambridge, MA, USA
D. Waltz, Center for Comput. Learning Syst., Columbia Univ., New York, NY, USA
R. N. Anderson, Center for Comput. Learning Syst., Columbia Univ., New York, NY, USA
A. Boulanger, Center for Comput. Learning Syst., Columbia Univ., New York, NY, USA
A. Salleb-Aouissi, Center for Comput. Learning Syst., Columbia Univ., New York, NY, USA
M. Chow, Consolidated Edison Co. of New York, New York, NY, USA
H. Dutta, Center for Comput. Learning Syst., Columbia Univ., New York, NY, USA
P. N. Gross, Center for Comput. Learning Syst., Columbia Univ., New York, NY, USA
B. Huang, Center for Comput. Learning Syst., Columbia Univ., New York, NY, USA
S. Ierome, Consolidated Edison Co. of New York, New York, NY, USA
D. F. Isaac, Consolidated Edison Co. of New York, New York, NY, USA
A. Kressner, Grid Connections, LLC, Consolidated Edison Co. of New York, New York, NY, USA
R. J. Passonneau, Center for Comput. Learning Syst., Columbia Univ., New York, NY, USA
A. Radeva, Center for Comput. Learning Syst., Columbia Univ., New York, NY, USA
L. Wu, Center for Comput. Learning Syst., Columbia Univ., New York, NY, USA
Power companies can benefit from the use of knowledge discovery methods and statistical machine learning for preventive maintenance. We introduce a general process for transforming historical electrical grid data into models that aim to predict the risk of failures for components and systems. These models can be used directly by power companies to assist with prioritization of maintenance and repair work. Specialized versions of this process are used to produce (1) feeder failure rankings, (2) cable, joint, terminator, and transformer rankings, (3) feeder Mean Time Between Failure (MTBF) estimates, and (4) manhole events vulnerability rankings. The process in its most general form can handle diverse, noisy sources that are historical (static), semi-real-time, or real-time; incorporates state-of-the-art machine learning algorithms for prioritization (supervised ranking or MTBF); and includes an evaluation of results via cross-validation and blind test. Above and beyond the ranked lists and MTBF estimates are business management interfaces that allow the prediction capability to be integrated directly into corporate planning and decision support; such interfaces rely on several important properties of our general modeling approach: that machine learning features are meaningful to domain experts, that the processing of data is transparent, and that prediction results are accurate enough to support sound decision making. We discuss the challenges in working with historical electrical grid data that were not designed for predictive purposes. The “rawness” of these data contrasts with the accuracy of the statistical models that can be obtained from the process; these models are sufficiently accurate to assist in maintaining New York City's electrical grid.
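
To make the two kinds of output named in the abstract concrete, the following is a minimal sketch, not the authors' method. It assumes a model has already assigned each component a risk score, and it evaluates the resulting ranking on a held-out (blind test) period using the area under the ROC curve, a common ranking metric. It also shows a naive MTBF estimate (observation time divided by failure count), which is far simpler than the regression-based estimates the paper describes. The function names `auc` and `naive_mtbf`, the feeder scores, and the failure labels are all hypothetical.

```python
from typing import Sequence


def auc(scores: Sequence[float], failed: Sequence[bool]) -> float:
    """Fraction of (failed, non-failed) pairs in which the failed
    component received the higher risk score; ties count as half."""
    pos = [s for s, f in zip(scores, failed) if f]
    neg = [s for s, f in zip(scores, failed) if not f]
    if not pos or not neg:
        raise ValueError("need at least one failed and one non-failed component")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def naive_mtbf(observed_days: float, n_failures: int) -> float:
    """Naive MTBF estimate: observation time divided by failure count.
    This is an illustrative simplification, not the paper's estimator."""
    if n_failures <= 0:
        raise ValueError("MTBF is undefined without at least one failure")
    return observed_days / n_failures


# Hypothetical blind-test data: risk scores assigned before the test
# period, and whether each feeder actually failed during it.
test_scores = [0.9, 0.8, 0.3, 0.1]
test_failed = [True, False, True, False]

ranking_quality = auc(test_scores, test_failed)
feeder_mtbf_days = naive_mtbf(observed_days=365.0, n_failures=5)
```

An AUC of 1.0 would mean every feeder that failed was ranked above every feeder that did not, while 0.5 corresponds to a random ordering.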

Index Terms:
statistical analysis, learning (artificial intelligence), power engineering computing, power grids, statistical models, New York City power grid, power companies, knowledge discovery methods, statistical machine learning, preventive maintenance, electrical grid data, feeder failure rankings, transformer rankings, feeder Mean Time Between Failure, MTBF, manhole events vulnerability rankings, decision making, Maintenance engineering, Power cables, Data models, Power grids, Machine learning, reliability, Applications of machine learning, electrical grid, smart grid, knowledge discovery, supervised ranking, computational sustainability
C. Rudin, D. Waltz, R. N. Anderson, A. Boulanger, A. Salleb-Aouissi, M. Chow, H. Dutta, P. N. Gross, B. Huang, S. Ierome, D. F. Isaac, A. Kressner, R. J. Passonneau, A. Radeva, L. Wu, "Machine Learning for the New York City Power Grid," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 2, pp. 328-345, Feb. 2012, doi:10.1109/TPAMI.2011.108