Issue No. 05 - Sept.-Oct. (2013 vol. 15)
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/MCSE.2013.107
Francis J. Alexander , Los Alamos National Laboratory
Machine learning, often referred to as statistical learning, computational learning, or pattern recognition, has come a long way since its humble origins in the 1930s with Ronald A. Fisher 1 and in the 1950s with Frank Rosenblatt’s linear perceptron. 2 Along the way, the rapid evolution of machine learning has been made possible by fundamental advances in statistics, probability theory, and functional analysis, as well as by the exponential increase in computing capability. Following some downs and ups in the late 1960s 3 and early 1970s, 4 the 1980s were a decade of promise with neural networks. Unfortunately, much of the hype surrounding neural networks ran up against the practical difficulty of training them, despite the significant advances in computing power. In the 1990s there was a rebirth in machine learning when several new avenues were opened up by the practical implementation of support vector machines and their efficient training. 5–7
The last decade has seen a burgeoning of applications of machine learning, in particular for complex learning problems. The field that began by trying simply to linearly separate points in a plane 7 has evolved into solving structured learning problems like drug discovery and experiment design for systems biology. 8 Other advances include learning when the training data is presented by an “adversary,” 9 learning with extra information beyond the training data, and learning with massive amounts of data. The last of these occurs in astronomy, speech, and text, and while it has great promise, it’s also computationally challenging. In this special issue of CiSE, we present some recent advances in machine learning and their applications to exciting new problem areas.
The article “Interactive Machine Learning in Data Exploitation” by Reid Porter and his colleagues describes how to make better and more efficient use of the massive amounts of data in science and engineering, especially when the quantity of data being collected far exceeds any individual human’s capacity to take in and make sense of it. What all too often happens now is that the analyst will simply throw away much of the data or store it without ever actually looking at it. Therefore, much of the data is never exploited, and the resources spent to obtain that data are wasted. The authors propose a remedy to this situation by striking a balance between what machines and humans will do.
Finding this balance is a challenge, because for many applications (human) domain experts simply don’t trust machines to replace them in their analysis of data. Recognizing this reality, Porter and his colleagues propose a new approach to interactive machine learning. The goal of interactive machine learning is to help scientists and engineers exploit more of their (often highly) specialized data in considerably less time. Interactive machine learning includes methods that let domain experts apply and guide machine-learning tools from within the deployed environment in which those tools are used. This is in marked contrast to the more conventional approach, where the tools are developed in one setting and used in another. The authors then demonstrate how materials scientists are using this approach to characterize microscopy images, as well as how image analysts can use the method for automated target recognition. They close with a description of the considerable potential of their framework for other important problem areas.
A different challenge in materials science is to use machine learning for the discovery and/or design of novel materials. Recently, materials scientists have imported some tools that have gained momentum in bioinformatics. In particular, supervised and unsupervised learning methods are being applied to high-dimensional data to predict and ultimately discover materials with desired properties. Srikant Srinivasan and Krishna Rajan describe recent work along these lines in their article “Revisiting Computational Thermodynamics through Machine Learning of High-Dimensional Data.”
In this article, one of the main obstacles the authors face is determining which features provide good predictive capacity for materials behavior. They focus their attention on variance-based methods to reduce the dimensionality of the space of physical features (for example, the atomic number, electrical resistivity, melting point, and crystal structure). In the end, they keep only those features that generate the largest variance in the database. Of these variance-based methods, principal components analysis is perhaps the best-known example. The authors also apply high-dimensional model representation to the area of materials discovery. The key message of the article is that machine learning offers a radically new approach to bridging standard continuum representations of matter based on thermodynamics and high-dimensional representations at the atomic scale.
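To make the variance-based idea concrete, here is a minimal sketch of ranking features by their loading on the first principal component. It is not the authors' procedure: the data rows, the function names, and the choice of power iteration to find the leading eigenvector are all illustrative assumptions, and the sketch assumes no feature column is constant.

```python
import math

def standardize(rows):
    """Center each column and scale it to unit sample variance.
    Assumes no column is constant (its standard deviation would be zero)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
            for j in range(d)]
    return [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]

def covariance(rows):
    """Sample covariance matrix of the columns (rows are observations)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(d)] for i in range(d)]

def leading_component(cov, iters=500):
    """Power iteration: returns (largest eigenvalue, unit eigenvector).
    The uneven start vector makes an exactly orthogonal start unlikely."""
    d = len(cov)
    v = [1.0 / (k + 1) for k in range(d)]
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    cv = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
    eigenvalue = sum(v[i] * cv[i] for i in range(d))
    return eigenvalue, v

def rank_features(rows):
    """Rank feature indices by absolute loading on the first principal
    component; also return the fraction of total variance it explains."""
    cov = covariance(standardize(rows))
    eigenvalue, v = leading_component(cov)
    total_variance = sum(cov[i][i] for i in range(len(cov)))
    order = sorted(range(len(v)), key=lambda j: abs(v[j]), reverse=True)
    return order, eigenvalue / total_variance

# Hypothetical toy data: columns 0 and 1 vary together, column 2 barely
# tracks them. These numbers are made up and are not real materials data.
toy_rows = [
    [1.0, 2.1, 0.5],
    [2.0, 3.9, 0.4],
    [3.0, 6.2, 0.6],
    [4.0, 7.8, 0.5],
    [5.0, 10.1, 0.4],
]
ranking, explained = rank_features(toy_rows)
```

In this toy case the two correlated columns dominate the first component and the third ranks last. A real analysis would typically examine several components and rely on a tested linear algebra library rather than hand-rolled power iteration.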
The final article, “Climate Informatics: Accelerating Discovering in Climate Science with Machine Learning,” by Claire Monteleoni and her colleagues, introduces machine learning to one of the most important and challenging scientific topics in the 21st century. We would like to predict the future climate given knowledge of the climate dynamics (that is, the governing physics equations), today’s climate, and the impacts of potential natural and anthropogenic driving forces. Currently, there are numerous models for climate dynamics. The goal of the emerging field of climate informatics is to introduce and develop tools from machine learning into climate science so that the field can benefit from significant observational data and complex models.
In this article, the authors describe this fledgling field and provide a discussion of global climate models and multimodel ensembles. They also apply novel online learning algorithms with the ultimate goal of making both real-time and future predictions. Their new method, called the Learn−α algorithm, systematically combines the predictions of individual models comprising the Coupled Model Intercomparison Project suite. Then, as an invitation for others to join in this exciting area of research, the authors present a short list of open problems in climate science for which machine learning might offer a solution.
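The idea of combining an ensemble's predictions online can be illustrated with a much simpler relative of the method mentioned above. The sketch below is not Learn−α: it is the classic fixed-share update with a hand-picked switching rate `alpha` and learning rate `eta`, under squared loss. The function name, the parameter values, and the toy numbers are illustrative assumptions.

```python
import math

def fixed_share_forecast(expert_preds, observations, eta=1.0, alpha=0.05):
    """Combine expert predictions online using the fixed-share update.

    expert_preds: one list of n expert predictions per round.
    observations: the value revealed at the end of each round.
    eta: learning rate on the squared loss (assumed value).
    alpha: probability mass shared among the other experts each round
           (assumed value; here it is fixed by hand, not learned).
    Returns (combined prediction per round, final weights).
    """
    n = len(expert_preds[0])
    weights = [1.0 / n] * n
    combined = []
    for preds, y in zip(expert_preds, observations):
        # Predict with the current weights before the observation is seen.
        combined.append(sum(w * p for w, p in zip(weights, preds)))
        # Loss update: down-weight each expert by its squared error.
        scaled = [w * math.exp(-eta * (p - y) ** 2)
                  for w, p in zip(weights, preds)]
        total = sum(scaled)
        posterior = [s / total for s in scaled]
        # Share update: each expert keeps (1 - alpha) of its weight and
        # receives an equal slice of the alpha mass given up by the others.
        if n > 1:
            weights = [(1 - alpha) * v + alpha * (1.0 - v) / (n - 1)
                       for v in posterior]
        else:
            weights = posterior
    return combined, weights

# Hypothetical toy run: three "models" with constant predictions, the first
# of which matches the observations exactly. These are made-up numbers.
toy_preds = [[10.0, 11.0, 8.0] for _ in range(50)]
toy_obs = [10.0] * 50
toy_combined, toy_weights = fixed_share_forecast(toy_preds, toy_obs)
```

The share step keeps every model's weight bounded away from zero, so the combination can recover if a different model becomes the best predictor later in the sequence. How strongly to share depends on how often the best model changes, which is why a hand-fixed `alpha` is only a starting point.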
In this special issue, we present some exciting new directions in machine learning. Each of the articles highlights a different application area and uses a different set of tools. The selection of topics is by no means exhaustive, but is meant to provide a taste of what is being done across the broad field of machine learning. I hope you enjoy the articles. If the past is any indication of the future evolution of machine learning, who knows where the field is headed in coming decades.
Francis J. Alexander is the deputy division leader of the Computer, Computational, and Statistical Sciences Division and the Information Science and Technology Institute leader at Los Alamos National Laboratory. His research interests include statistical mechanics, computational physics, and machine learning. Alexander has a PhD in physics from Rutgers, The State University of New Jersey. Contact him at firstname.lastname@example.org.