No Batteries Required - Home
Business Intelligence and Big Data - Part II
Ray Kahn
FEB 25, 2013 09:20 AM

In Part I, I discussed what Business Intelligence (BI) and Big Data are and why BI is important. In this part I will discuss the challenges IT departments face when designing a BI solution and working with Big Data.

This Isn’t Your Grandfather’s IT

Business Intelligence and Big Data require a whole host of technologies and resources that IT departments may lack, since most information systems are designed to deal with structured data that fits neatly into a conventional RDBMS. The challenges of working with Big Data (unstructured data, volume, speed and quality) necessitate the use of distributed parallel processing, NoSQL databases, data stream processing, data scientists and very large data repository systems like EMC Isilon.

The Apache Hadoop software library is an open source framework that allows for distributed parallel “processing of large data sets across clusters of computers using simple programming models”. Hadoop “provides a distributed filesystem that can store data across thousands of servers, and a means of running work (Map/Reduce jobs) across those machines, running the work near the data.” In other words, the Map/Reduce model breaks your Big Data into smaller data sets (key/value pairs) spread across clusters of computers and executes user-defined jobs (functions) on those sets. The end result of the Map/Reduce model is a smaller and typically different set of key/value pairs.
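To make the Map/Reduce description above concrete, here is a minimal single-machine sketch in plain Python. It is not Hadoop code and uses no Hadoop API; the function names (map_phase, shuffle, reduce_phase, run_job) and the word-count job are my own assumptions, chosen only to show how records become key/value pairs and how those pairs are reduced to a smaller set.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Map: emit one (key, value) pair per word in a line of text."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Group values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle and reduce in sequence on an in-memory list."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    # Hypothetical input; in a real cluster each record could live on a different node.
    logs = ["big data big value", "data beats opinion"]
    print(run_job(logs))
```

In a real cluster the map and reduce calls would run in parallel on the machines that hold the data; here they run one after another, so only the data flow is illustrated.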

Because the end result of the Map/Reduce model is still a very large set of key/value pairs, NoSQL database management systems are the ideal choice for querying these sets. NoSQL databases, unlike an RDBMS, are optimized for retrieval and append operations on key/value datasets and offer little other functionality. They are not concerned with relationships between elements either; they usually don’t use tables and do not use SQL for data mining. One example of a NoSQL database system is MongoDB, which is an open source project.
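The “retrieve and append” access pattern described above can be sketched with a toy in-memory store. This is an illustration only: the KeyValueStore class and its methods are hypothetical and do not represent the API of MongoDB or any other NoSQL product.

```python
from collections import defaultdict


class KeyValueStore:
    """Toy in-memory store supporting only append and lookup by key."""

    def __init__(self):
        self._data = defaultdict(list)

    def append(self, key, value):
        """Add a value under a key; no schema and no relations are involved."""
        self._data[key].append(value)

    def get(self, key):
        """Return a copy of all values stored under a key (empty if unknown)."""
        return list(self._data.get(key, []))


if __name__ == "__main__":
    store = KeyValueStore()
    # Hypothetical keys and values, e.g. page-view counts produced by a reduce step.
    store.append("page:/home", 3)
    store.append("page:/home", 5)
    print(store.get("page:/home"))
```

Note what is missing: there are no joins, no tables and no query language, which is the trade-off the paragraph above describes.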

S4 (Simple Scalable Streaming System) is an open source platform for developing applications that process continuous streams of data, and it can be integrated with Hadoop to create real time BI applications. (It is only one of the available frameworks; see Storm for additional information.) The Internet has altered our perception of what is an acceptable service delivery time. In this day and age, the speed at which services are delivered creates a competitive advantage for organizations. Speed becomes even more important in the context of BI: recall that BI is intended to provide a longer “decision window” to executives by providing relevant information quickly. Jobs that used to take days now take hours to complete, and as a result executives and managers can make informed decisions more quickly.
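The difference between stream processing and a batch job can be sketched in a few lines of plain Python. The example below does not use S4 or Storm; rolling_counts is a hypothetical helper, and it assumes events arrive as (timestamp, key) pairs in timestamp order. It updates its counts after every event instead of waiting for a complete data set.

```python
from collections import Counter, deque


def rolling_counts(events, window_seconds):
    """Yield (timestamp, counts) after each event, counting only events
    that fall inside the most recent window_seconds."""
    window = deque()
    counts = Counter()
    for timestamp, key in events:
        window.append((timestamp, key))
        counts[key] += 1
        # Evict events that have fallen out of the time window.
        while window and window[0][0] <= timestamp - window_seconds:
            _, old_key = window.popleft()
            counts[old_key] -= 1
            if counts[old_key] == 0:
                del counts[old_key]
        yield timestamp, dict(counts)


if __name__ == "__main__":
    # Hypothetical event stream: (seconds since start, event type).
    stream = [(0, "order"), (1, "order"), (5, "refund"), (12, "order")]
    for ts, snapshot in rolling_counts(stream, window_seconds=10):
        print(ts, snapshot)
```

Because the function is a generator, a dashboard could consume each snapshot as soon as it is produced, which is the “decision window” benefit described above.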

There is a really good article by DJ Patil about data scientists and their role in BI and Big Data processing. I’ll just summarize his points here (but highly recommend that you read it): data scientists have technical expertise in some scientific field, are extremely curious, use data to tell a story and think outside the box. They are ideal people to tackle the Big Data challenge since they have a passion for working with lots of data.

BI Best Practices

The vast majority of literature on BI focuses on its intangible benefits such as alignment of goals, process improvement and strategic advantage. For BI to achieve widespread acceptance within an organization, it must also demonstrate that it can provide tangible, short term benefits. But that is easier said than done. Since BI initiatives require large capital outlays, such efforts need a solid foundation before they are launched. There are steps that organizations can take to ensure a successful BI solution:

  • Use Good Data: As I mentioned earlier, data quality matters greatly and can make the difference between the success and failure of an organization's BI solution.
  • Educate: Train your employees on BI tools and the importance and meaning of data. Educate them on how analytics can improve decision making.
  • Evolve: BI projects are iterative and the right tools can only be developed over time, because the meaning of data changes from one situation to another.
  • Engage: BI is NOT an IT initiative; it is a collaboration between business and IT. Make sure you have the full support of managers and executives and that they understand the efforts and resources needed for a BI project.
  • Define: This is perhaps the most important part of any BI project: what problem are you trying to solve? Be clear from the outset.
  • Self-Service: Make your BI tools simple and easy to use. This should be a no-brainer, since IT will then spend less time addressing managers’ unanticipated needs.
  • Measure: Make sure performance metrics (KPIs) are clearly defined and in place to measure the effectiveness of your BI tools.

Conclusion

Business Intelligence gives business decision makers a visualization of the right information at the right time. In this fast paced and fast changing business climate, having time critical information has become even more important. Organizations will need to invest heavily in solutions that give them a strategic advantage over their competitors. Although BI is not a cure-all, done right it may be the tool that makes the difference between market dominance and playing catch-up.

Business Intelligence projects are not cheap. That’s why IT must make sure that executives at the highest levels are informed about the costs and understand the risks and benefits involved.

As always, document your work; it may save your career one day.

 

Ray Kahn

Director of Information Technology and Services

IEEE Computer Society
