, Virginia Tech
Abstract—As data permeates all disciplines, the role of big data becomes increasingly critical. This special theme issue's articles examine big data technology trends that impact databases, algorithms, and applications.
Keywords—big data; data analytics; databases
If we apply the Gartner Hype Cycle for Emerging Technologies to big data, the technology has passed through the peak of inflated expectations and the trough of disillusionment, and is now moving steadily along the slope of enlightenment (www.gartner.com/technology/research/methodologies/hype-cycle.jsp). The datafication of our world means that big data permeates not only science and engineering, but also such diverse and creative disciplines as the arts and humanities.
For this special theme issue, we have assembled a group of five articles that represent modern trends in big data by examining specific technologies and developments spanning databases, algorithms, and applications.
One of the early technologies associated with the big data revolution is NoSQL. Even as this part of the hype cycle passed, there was little consensus about what NoSQL denotes, because the term has been used to conflate a large set of features. Jignesh M. Patel demystifies this issue in his article “Operational NoSQL Systems: What's New and What's Next?” In addition to providing clear definitions and delineations, the article identifies promising directions for future research.
In “Renaissance in Database Management: Navigating the Landscape of Candidate Systems,” Venkat Gudivada, Dhana Rao, and Vijay V. Raghavan expand these ideas to provide a unified perspective on modern big data system architectural choices. The authors also provide a glossary of modern database management terms and concepts, and include a very useful checklist of questions to be answered when selecting a big data system design.
In their forward-looking article “Cognitive Storage for Big Data,” Giovanni Cherubini, Jens Jelitto, and Vinodh Venkatesan propose the notion of a “cognitive data storage” system. Unlike traditional data storage systems, a cognitive storage system is attuned to metrics like data value, popularity, and obsolescence. To do this, a learning algorithm classifies data into different relevance classes and works in concert with a multitiered storage architecture to customize data placement. This proposal has implications for both scientific and business applications of big data.
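As a rough illustration of the classify-then-place idea summarized above, the following Python sketch uses a simple nearest-centroid classifier as a stand-in for the learning algorithm. The feature scales, training samples, relevance class names, and storage tier names are all assumptions invented for this example; they are not taken from the article, which may use a different learning method entirely.

```python
from math import dist
from statistics import mean

# Hypothetical training data: (value, popularity, obsolescence), each scaled
# to [0, 1], labeled with a relevance class. All numbers are invented.
TRAINING = [
    ((0.9, 0.8, 0.1), "high"),
    ((0.8, 0.9, 0.2), "high"),
    ((0.5, 0.4, 0.5), "medium"),
    ((0.6, 0.3, 0.4), "medium"),
    ((0.1, 0.1, 0.9), "low"),
    ((0.2, 0.0, 0.8), "low"),
]

# Hypothetical mapping from relevance class to a tier of a multitiered store.
TIER_FOR_CLASS = {"high": "flash", "medium": "disk", "low": "tape"}


def fit_centroids(samples):
    """Average the feature vectors of each relevance class."""
    by_class = {}
    for features, label in samples:
        by_class.setdefault(label, []).append(features)
    return {
        label: tuple(mean(column) for column in zip(*rows))
        for label, rows in by_class.items()
    }


def classify(features, centroids):
    """Return the relevance class whose centroid is nearest to the item."""
    return min(centroids, key=lambda label: dist(features, centroids[label]))


def place(features, centroids):
    """Choose a storage tier for one data item from its relevance class."""
    return TIER_FOR_CLASS[classify(features, centroids)]


centroids = fit_centroids(TRAINING)
print(place((0.85, 0.70, 0.15), centroids))  # valuable, frequently read item
print(place((0.15, 0.05, 0.95), centroids))  # stale, rarely read item
```

The classifier and the placement table are kept separate here so that either could be swapped out independently, for example retraining the classifier as access patterns change without touching the tier mapping.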
In “Nomadic Computing for Big Data Analytics,” Hsiang-Fu Yu, Cho-Jui Hsieh, Hyokun Yun, S.V.N. Vishwanathan, and Inderjit Dhillon abstract away the specifics of many machine-learning paradigms to define what they call a nomadic paradigm for big data analytics. They demonstrate how to organize parallel computing workloads more effectively than the prevalent MapReduce approach. This idea is illustrated using matrix completion and topic modeling, two widely used machine-learning tasks.
Finally, Asmaa Elbadrawy, Agoritsa Polyzou, Zhiyun Ren, Mackenzie Sweeney, George Karypis, and Huzefa Rangwala focus on big data analytics in the context of another technology that has been on its own hype cycle—namely, massive open online courses (MOOCs). In “Predicting Student Performance Using Personalized Analytics,” they posit a means for predicting student retention, in-class assessment, and grade outcomes. Their results come from both traditional and MOOC course offerings from the University of Minnesota and George Mason University, and a Stanford University MOOC.
Technology sage Mark Weiser wrote: “The most profound technologies are those that disappear. They weave themselves into the fabric of everyday life until they are indistinguishable from it.” For big data, it is easy to argue that the hype has subsided and that the technologies themselves are no longer big news precisely because they are so widely deployed. It is not a stretch to say that to be successful today, every researcher must become a data scientist!