Big Data Analytics: Outsource or In-House?
Dejan Milojicic
JUL 17, 2015 13:24 PM
by Dejan Milojicic, 2014 President of IEEE Computer Society, and Sr. Research Manager and Scientist, HP Labs
According to IDC, the 1.8 zettabytes – that’s 1.8 trillion gigabytes – of information created last year will grow by a factor of nine over the next five years. While storing massive amounts of data on big computers is not a new idea, what has changed is the need, and the expectation, to mine that data for decision support. That is what we call big data analytics, and experts widely agree that the ability to analyze big data will be the difference between success and failure in almost every type of business in the coming years.
Big data analytics has been, and will continue to be, made possible by three significant trends:
  • The growth of nonvolatile memory – replacing disks and DRAM
  • The introduction of photonics and improvements in interconnects – replacing copper cabling, thereby reducing space and power requirements
  • Advances in systems on a chip – also reducing power and footprint
So how do companies today make the leap to light speed and become big data analyzers? Do they go outside and hire data analysis consultants, or try to develop the capability in-house? The fundamental question must be “how business critical is the data?” If the data is essential to the company’s survival, it should be kept in-house. Other analytics can be outsourced. Note, however, that outsourcing suppliers shouldn’t just be warehousing the data. Whether it is business critical or supporting, the data must be analyzed.
We should take a step back and note that in analytics there are two classes of data: information that can be analyzed over time, and data that must be analyzed and mined in near real time. The first class is a backend operation that allows for deep dives into long-term analysis and business processing. This class of analysis was facilitated by technological developments like Hadoop and MapReduce, which make it possible to scale the processing of information by distributing it across a large number of commodity processors. These systems include built-in triple redundancy (Hadoop’s file system keeps three copies of each data block by default), which protects against data loss when a machine fails. They are well suited to running in the cloud.
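To make the MapReduce idea concrete, here is a minimal single-machine sketch of the pattern in plain Python, counting words across documents. It is not Hadoop code: the function names (map_phase, shuffle, reduce_phase, word_count) are illustrative assumptions, and a real framework would run the map and reduce steps in parallel on many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group values by key, the step a framework performs between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all the values emitted for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence (hypothetical helper, one process)."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())


if __name__ == "__main__":
    print(word_count(["big data", "big analytics"]))
```

Because each call to map_phase depends only on its own document, and each call to reduce_phase only on its own key, the work can in principle be spread across as many commodity processors as are available.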
Decision support, on the other hand, is a near-real-time operation similar to streaming. It manipulates far smaller amounts of data and is facilitated by open source technologies such as Storm, Spark, and Flink, which support real-time and stream processing. Analysis that used to take days was reduced to hours by faster processing, and is now expected in minutes.
So imagine a startup company that, let’s say, plans to consolidate all the most interesting news from around the world into a single newspaper. The masses of data they would scrape from news sources and social media would in fact be their “product.” This data would need to be analyzed in near real time, with alerts for certain key words and topics, to create the news publication. This function should – no, must – be performed in-house. This essential data must be close by, accessible, manipulable, and secure. At the same time, the data from the background IT that supports the website could be stored and analyzed in the cloud by an outsourced provider. Business criticality is the key.
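The keyword alerting in this example might look something like the following sketch, which scans incoming headlines one at a time and raises an alert whenever a watched word appears. The keyword set, the alerts function, and the sample headlines are all hypothetical. A production system would more likely run this logic inside a stream processor such as Storm, Spark, or Flink than in a plain Python loop.

```python
import re

# Hypothetical watch list; a real newsroom would maintain its own.
ALERT_KEYWORDS = {"earthquake", "election"}


def alerts(stream, keywords=ALERT_KEYWORDS):
    """Yield a (keyword, headline) pair for each watched word found in each item."""
    for headline in stream:
        # Lowercase and split into words so that matching ignores case and punctuation.
        words = set(re.findall(r"[a-z']+", headline.lower()))
        for keyword in sorted(words & keywords):
            yield keyword, headline


if __name__ == "__main__":
    incoming = [
        "Earthquake hits coast",
        "Markets calm",
        "Election results and earthquake relief",
    ]
    for keyword, headline in alerts(incoming):
        print(f"ALERT [{keyword}]: {headline}")
```

Because alerts is a generator, it handles each item as it arrives instead of waiting for a complete batch, which is the essential difference between this near-real-time class of analysis and the backend class described earlier.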