One of the most compelling aspects of the current big data gold rush is the race to develop tools that help data scientists process data faster and more easily. By 2014, according to Gartner, 30 percent of analytic applications will use predictive capabilities. Forecasting, targeting, fraud detection, customer churn, and price elasticity are some of the more useful applications.
It can currently take weeks or months to process large data sets. The dream is that better tools will shrink that processing time to days.
“Data science is too complex today. Things must get simpler or it won’t scale,” said Roger Barga, group program manager for Microsoft’s Azure Data Platform. “When I talk to data scientists, I ask them how long it takes to deal with a data set. It can take days or weeks. This will be a game changer for who can do it and how quickly they can do it.”
Barga told attendees at the recent Microsoft Research Faculty Summit, which focused on big data, that anyone building a tool to support data scientists must first understand their workflow. That workflow starts with defining the goal, then moves through collecting and managing data, building the model, evaluating and critiquing the model, and presenting the results, and ends with deploying the model.
“Enterprises say they need an end-to-end solution with support for collaboration, lineage tracking, archive for predictive models, and support for search and discovery,” he said. “There’s an incredible breadth of applications. If you can predict it, you can own it.”
There is also an accompanying explosion of educational programs in big data. According to Barga, there are now more than 100 data science programs, compared with fewer than a half-dozen several years ago. Typical courses include Introduction to Data Science, Hadoop, and Building Predictive Models.
“This is where the next generation of data scientists will come from,” he said, adding that “the barrier to entry here is low if you have access to the right tools.”
Hiring has changed as well. Five years ago, typically only companies with 500 or more employees had a data scientist on staff. Now, he said, “the third or fourth hire is a data scientist.”
Companies are also trying to take advantage of the interest in big data by offering data as an additional product line. For example, he said, Rolls-Royce has added sensors to its jet engines, which lets it offer airlines and others information on fuel usage and other flight characteristics.