Businesses have typically relied on their highly structured transactional data to gain insights about their business and make decisions. For example, they answer questions such as how many customers bought a particular product in the third quarter in the APAC region. Using this information, they determine the strategic activities that need to be undertaken. These span a broad set of decisions, including marketing activities, promotional activities, sales training, etc.
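To make the example question concrete, here is a minimal sketch of how it maps onto a star schema: one fact table of sales joined to product, date and customer dimension tables, then filtered and aggregated. It uses Python's built-in sqlite3 module as a stand-in for a real warehouse database. The table and column names (sales_fact, product_dim, date_dim, customer_dim) and the sample rows are hypothetical, chosen only for illustration.

```python
import sqlite3


def customers_who_bought(conn, product_name, year, quarter, region):
    """Count distinct customers who bought a product in a given quarter and region."""
    # Star-schema query: the fact table joins to each dimension by surrogate key,
    # and the business question becomes filters on dimension attributes.
    sql = """
        SELECT COUNT(DISTINCT f.customer_key)
        FROM sales_fact AS f
        JOIN product_dim  AS p ON f.product_key  = p.product_key
        JOIN date_dim     AS d ON f.date_key     = d.date_key
        JOIN customer_dim AS c ON f.customer_key = c.customer_key
        WHERE p.product_name = ? AND d.year = ? AND d.quarter = ? AND c.region = ?
    """
    (count,) = conn.execute(sql, (product_name, year, quarter, region)).fetchone()
    return count


def build_sample_warehouse():
    """Create a tiny in-memory star schema with hypothetical sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE product_dim  (product_key INTEGER PRIMARY KEY, product_name TEXT);
        CREATE TABLE date_dim     (date_key INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);
        CREATE TABLE customer_dim (customer_key INTEGER PRIMARY KEY, region TEXT);
        CREATE TABLE sales_fact   (product_key INTEGER, date_key INTEGER,
                                   customer_key INTEGER, quantity INTEGER);
    """)
    conn.executemany("INSERT INTO product_dim VALUES (?, ?)",
                     [(1, "Widget"), (2, "Gadget")])
    conn.executemany("INSERT INTO date_dim VALUES (?, ?, ?)",
                     [(20230815, 2023, 3), (20231120, 2023, 4)])
    conn.executemany("INSERT INTO customer_dim VALUES (?, ?)",
                     [(101, "APAC"), (102, "APAC"), (103, "EMEA")])
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)", [
        (1, 20230815, 101, 2),  # APAC customer, Q3, Widget
        (1, 20230815, 101, 1),  # same customer again: counted once
        (1, 20230815, 102, 5),  # second APAC customer, Q3, Widget
        (1, 20230815, 103, 3),  # EMEA customer: excluded by region
        (1, 20231120, 102, 1),  # Q4 sale: excluded by quarter
        (2, 20230815, 101, 4),  # Gadget: excluded by product
    ])
    return conn


if __name__ == "__main__":
    conn = build_sample_warehouse()
    print(customers_who_bought(conn, "Widget", 2023, 3, "APAC"))  # prints 2
```

The design choice worth noting is that the fact table holds only keys and measures, while every attribute the analyst filters on (product name, quarter, region) lives in a dimension table. That separation is what lets the same fact table answer many different business questions.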
To help business analysts make decisions, several technologies, processes, and human roles in the organization have to come together. All of this has been well established over the past years and will continue to be used for the foreseeable future.
I have used Ralph Kimball’s insights to help me better understand the various aspects of building a data warehouse. I found Dr. Kimball’s approach to this subject very intuitive and prescriptive for building a successful data warehouse implementation. Anyone who has worked on a data warehousing project knows that several groups or organizations are involved in establishing a BI solution. The key ones are:
- Data warehouse modeler (star schema)
- OLAP Relational Database warehouse implementer and DBA
- BI Modeler
- BI Application developer
- Report Builder
- Performance team for analyzing ad hoc queries and creating materialized views
- OLAP cube modeler and implementer
I think I have covered most of the roles. As you can see, setting up a data warehouse is a complex task. The business analyst provides the requirements, which are then translated into work items for the various groups. For the analyst to see the first report, all of the above groups need to align their deliverables.
Dr. Kimball covers the concepts, standardizes the terminology, and establishes a well-defined process and ground rules for setting up the ETL system. This is referred to as the 34 subsystems of ETL. While the name says ETL, it covers all the aspects of designing a data warehouse listed above. It is a truly comprehensive view, and walking through each subsystem gives deep insight into the requirements.
I have used this methodology to evaluate Oracle products in the structured analytics area, which include:
- Oracle GoldenGate
- Oracle Data Integrator
- Oracle Enterprise Data Quality
- Oracle OLAP database
- Oracle Analytic Workspace Manager
- Oracle BI (OBIEE)
While the above process works very well for structured data, more specifically relational data, when we look at the requirements for Big Data we need to carefully evaluate which aspects of the process apply and how to operationalize them.
Looking at the present and the future, the processes and technology that have matured over the years for structured data will continue to be used. I have seen scenarios where data warehouse models (star schemas) have been represented in Hive, and customers ETL their structured data into Hive and perform aggregations there. Oracle Data Integrator has the capabilities to support this scenario.
The promise of Big Data extends well beyond the realm of structured data and includes forms of data that are typically not considered by traditional BI systems. Also, unlike traditional BI, the agility required both in preparing and in analyzing data is much higher when working with Big Data. The following table lists some of the key differences between traditional BI systems and Big Data BI systems. In my upcoming posts I will delve into various aspects of Big Data analytics. Stay tuned.