
By Lori Cameron

Data analytics shows no sign of slowing down.

Big data expert Bernard Marr recently wrote in Forbes, “There’s absolutely no question that we will continue generating larger and larger volumes of data, especially considering that the number of handheld devices and Internet-connected devices is expected to grow exponentially.”

Data analytics takes enormous quantities of information about everything from traffic management and fraud detection to disease outbreaks and natural disasters and tries to make sense of it. Cities, disaster relief agencies, doctors, and businesses rely heavily on it. Moreover, the more critical the situation, the faster the analysis is needed, which is why edge devices that process data close to its source are so crucial.

Analyzing big data is possible due to “improved analytical capabilities, increased access to different data sources, and cheaper and improved computing power in the form of cloud computing,” write Newcastle University researcher Rajiv Ranjan and his colleagues in their article “Orchestrating BigData Analysis Workflows,” published in the May/June 2017 issue of IEEE Cloud Computing.

But there is a crisis.

Figure: Mapping of high-level workflow activities of a Real-Time Flood Modelling application to programming frameworks and cloud datacenter and/or edge resources. Workflow orchestration is a cross-cutting issue, as it spans all the layers (analysis activities, programming frameworks, and datacenters).

So far, no workflow requirements exist for big data performance models that can handle dynamic, time-sensitive data analysis. Customers demand real-time, highly accurate streaming analysis of workflow data so they can make rapid business decisions or respond to a disaster. Edge devices solve the latency problem, but when big data owners try to scale up or down, things get messy. Some high-profile platforms manage data just fine in data centers but run into trouble with edge devices.

“Platforms such as Apache YARN, Apache Mesos, Amazon IoT and Google Cloud Dataflow that can support script-based composition of heterogeneous analytic activities on cloud datacenter resources cannot deal with edge resources,” write Ranjan and his colleagues.

Furthermore, while current techniques allow dynamic reconfiguration of interactive multi-tier web applications, they do a lousy job of predicting data flow metrics.

“BigData workflows are fundamentally different from multi-tier web applications. To make dynamic reconfiguration in the execution of BigData workflow applications, their run-time resource requirements and data flow changes needs to be predicted including any possible failure occurrence,” the authors add.

So what is the solution?

Big data needs to up its game.

“The research community must aim to design new frameworks and novel platforms and techniques that enable decision making by allowing the orchestration of their execution in a seamless manner allowing dynamic resource reconfiguration at runtime,” the authors say.

“One of the myths is that BigData analysis is driven purely by the innovation of new data mining and machine learning algorithms. While innovation of new data mining and machine learning algorithms is critical, this is only one aspect of producing BigData analysis solutions,” say the authors.

In other words, the demand for real-time insights into data will be the norm within the next five years. As big data explodes, algorithm markets surge, and subsequent staffing shortages abound, data analysts will need to work hard to keep up.

The other authors of the study are Saurabh Garg of the University of Tasmania; Ali Reza Khoskbar of the Australian National University; Ellis Solaiman and Philip James of Newcastle University; and Dimitrios Georgakopoulos of Swinburne University of Technology.

 

Related research on big data analytics in the Computer Society Digital Library: