There is no doubt that Hadoop will continue to be a critical part of most organizations’ information architecture. Data is rapidly becoming a strategic asset to drive organizational decision-making, and organizations are increasingly using data to drive greater profitability, uncover opportunities, accelerate product and service innovation, and deliver exceptional customer experiences.
But let’s be clear: no organization is repeatably and sustainably delivering big data innovation by hand. The idea that organizations will tolerate the productivity hits and maintenance nightmares of hand-coded scripts is preposterous. Organizations that lead the pack in delivering repeatable and sustainable success recognize that automation and intelligence are critical to delivering well-managed and trustworthy analytics on Hadoop.
Guest article by Murthy Mathiprakasam, Principal Product Marketing Manager, Informatica
Big Data Should Mean Big Opportunity — not Big Liability
With the rapidly expanding volume, variety, and velocity of data, organizations struggle to quickly acquire, integrate, secure, and govern data assets to make better decisions. With big data locked up in silos or collected in diverse formats from unstructured sources such as web server logs, social media, and sensors, the complexity of ingesting that data causes extensive project delays. Additionally, the big data that is available is often incomplete, inconsistent, insecure, or ungoverned, which can lead to adverse outcomes.
So what has been the typical reaction to these expanding data assets and new data technologies? Usually, to throw specialized development resources at the problem. The challenge with this approach is that it offers no scale and no long-term repeatability. Throwing people at the problem is expensive and time-consuming, leaving business consumers waiting weeks to get useful information.
Technology diversity and complexity are silent killers of sustainable, well-managed data management practices: hand-built data integration code must be migrated or rewritten every time the underlying technologies change. Well-intended reactions like throwing people at the problem can inadvertently turn big data from a big opportunity into a big liability.
Additionally, with increased demands for data from the business, organizations struggle to balance the promise of greater autonomy for business analysts and data scientists with the enterprise requirements of security and governance. While promises of greater autonomy and agility seem attractive, organizations often worry about the technical debt created by data fragmentation and taxonomy proliferation. The demands for business autonomy and self-service can inadvertently seem at odds with the requirement to ensure control and visibility.
There is no doubt that a new approach to big data management is required, one that delivers trusted and timely information to business consumers via a process that is fast, repeatable, agile, collaborative, and empowering. Automation is the key to breaking the false tradeoff between agility/autonomy and security/governance.
Unlocking Big Value with Automation and Machine Learning
While the basic benefits of automation seem obvious, the real power of automation comes when it is driven by machine intelligence and statistical inference. The seemingly recursive use of big data analytical techniques on the big data itself unlocks opportunities that might never have been discovered through manual means.
The innovative concept here is to leverage an ontological universal metadata graph. In a world of ever-diversified datasets, context remains king. Rather than rely on manual approaches for discovering and prescribing data schemas, new machine learning techniques enable the underlying schemas and context of data to be inferred automatically. By using machine learning to intelligently understand data, relationships between datasets can be established in a metadata graph and accessed through a metadata catalog.
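To make the idea above concrete, here is a minimal, hypothetical sketch of the two steps the paragraph describes: inferring a dataset's schema from sample values, and proposing a relationship between two previously unconnected datasets based on value overlap. All dataset and column names are illustrative, and real metadata catalogs use far richer statistical and ML techniques than this toy heuristic.

```python
def infer_type(values):
    """Guess a column's type from sample string values."""
    def is_int(v):
        try:
            int(v)
            return True
        except ValueError:
            return False
    def is_float(v):
        try:
            float(v)
            return True
        except ValueError:
            return False
    if all(is_int(v) for v in values):
        return "integer"
    if all(is_float(v) for v in values):
        return "float"
    return "string"

def infer_schema(records):
    """Infer a {column: type} schema from a list of dict records."""
    return {col: infer_type([str(r[col]) for r in records])
            for col in records[0]}

def overlap(a, b):
    """Jaccard similarity of two value sets -- a crude signal that
    two columns may refer to the same entity."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

# Two "silos" with no declared relationship (illustrative data).
orders = [{"order_id": "1", "cust": "C10", "total": "19.99"},
          {"order_id": "2", "cust": "C11", "total": "5.00"}]
customers = [{"cust": "C10", "name": "Ada"},
             {"cust": "C11", "name": "Lin"}]

# The "catalog": inferred schemas, discovered without manual prescription.
catalog = {"orders": infer_schema(orders),
           "customers": infer_schema(customers)}

# An edge in the metadata graph: columns whose values overlap strongly.
score = overlap([r["cust"] for r in orders],
                [c["cust"] for c in customers])
edges = [("orders.cust", "customers.cust")] if score > 0.8 else []
```

In this sketch the inferred schemas and the discovered edge together form the metadata graph that a catalog would expose for search and governance.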
This universal metadata catalog breaks the false tradeoff between agility/autonomy and security/governance. The catalog can intelligently facilitate engagement between authorized business users and trusted data assets. Even as data becomes pervasive throughout the organization, the metadata catalog provides an index to all data assets, enabling greater and more autonomous access to distributed datasets. Meanwhile, it becomes a central source of truth that can automatically identify threats to security and governance and automatically remediate potential issues.
Suit Up to Repeatably Deliver Trustworthy Big Data Analytics
Automation, powered by machine learning, is the secret to repeatably and sustainably delivering trustworthy big data analytics. Using an automated and intelligent approach, organizations can finally scale their information architectures to handle any volume of data supply or data demand. Data is undoubtedly the fuel for competitive advantage in the 21st century — and intelligence-powered automation is the key to igniting it.
Let’s be clear — analytical leverage is not going to come from hiring huge armies of specialized development teams. Big data simply cannot require big teams. The benefits of scale and repeatability require an automated and intelligent approach that enables agility and autonomy, while ensuring security and governance. This is the approach that will enable organizations to repeatably and sustainably unlock big data opportunities for profit, protection, and power.