Data management in the era of Big Data

I have spent a number of years working on platforms and frameworks related to data management, data warehousing, and business intelligence. These technologies enable businesses to develop and implement strategies for effective data management and data analytics, particularly for relational data.

Data management is a key aspect of any business. It includes providing capabilities for reliably gathering and sharing data, ensuring consistency across the enterprise, providing durability for proper governance, and, finally, securing data.

Typically, enterprise customers use databases that ensure ACID compliance for the data being collected. Whether in retail, financial services, or any other vertical, having a reliable, durable mechanism for capturing and persisting data is of paramount importance. The vast majority of enterprises use relational databases such as Oracle or MySQL. These are followed by a plethora of NoSQL databases that offer varying degrees of ACID compliance, such as Neo4j, CouchDB, and MongoDB.
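To make the ACID guarantee concrete, here is a minimal sketch using Python's built-in sqlite3 module (SQLite stands in for a production database; the `accounts` table and amounts are illustrative). A transfer either applies both updates or neither; a constraint violation rolls the whole transaction back.

```python
import sqlite3

# In-memory database standing in for a production OLTP store.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " id INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES (1, 100), (2, 50)")
conn.commit()

def transfer(conn, src, dst, amount):
    """Move funds between accounts; both updates land or neither does."""
    try:
        # The connection context manager opens a transaction, commits on
        # success, and rolls back if an exception escapes the block.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src))
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst))
        return True
    except sqlite3.IntegrityError:
        return False  # CHECK constraint fired; transaction rolled back

transfer(conn, 1, 2, 30)   # succeeds: balances become 70 / 80
transfer(conn, 1, 2, 500)  # violates CHECK (balance >= 0); rolled back
balances = dict(conn.execute("SELECT id, balance FROM accounts"))
```

The failed transfer leaves no partial update behind, which is exactly the atomicity property the text refers to.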

Enterprise-class databases come with capabilities for scaling the database, providing high availability, maintaining a physical standby, and taking backups.

Taking Oracle Database as an example, it provides RAC for horizontally scaling the number of user sessions and for providing highly available use of the database. Oracle Database also provides Data Guard for maintaining secondary standby databases (physical standbys) as alternative/supplementary repositories to the production “primary” databases. The standby databases are used during failover or switchover scenarios, seamlessly allowing users to continue working with minimal or no interruption. Oracle RMAN is used for taking scheduled backups.
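As a rough sketch of how two of these pieces are driven in practice (the connection details and the standby name `chicago` are assumptions for illustration, not taken from any particular environment):

```
-- RMAN: take a full backup of the database plus archived logs
RMAN> CONNECT TARGET /
RMAN> BACKUP DATABASE PLUS ARCHIVELOG;

-- Data Guard broker (DGMGRL): planned switchover to a standby
DGMGRL> CONNECT sys@primary
DGMGRL> SWITCHOVER TO 'chicago';
```

After the switchover completes, the former standby serves as the primary, which is what allows maintenance on the original primary with minimal interruption.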

As mentioned above, one of the key aspects of data management is making data available for analysis, enabling a better understanding of the business. This implies that the data businesses capture through their transaction systems will need to be analyzed.

A key requirement for analyzing the data is making the transactions captured in production databases available to a downstream database, also referred to as a logical standby, for downstream analysis.

Oracle GoldenGate enables replicating transactions in real time from production databases into a logical standby database for downstream analysis, with minimal impact on the production databases. Oracle GoldenGate can replicate between heterogeneous databases and supports a variety of commonly used databases.
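At a high level, GoldenGate is configured through parameter files: an Extract captures changes on the source into a trail, and a Replicat applies the trail on the target. The sketch below shows the general shape; the process names, credentials, trail path, and table names are all illustrative assumptions, and a real deployment needs additional settings (e.g., target table definitions) omitted here.

```
-- Extract parameter file (ext1.prm) on the source; names are illustrative
EXTRACT ext1
USERID ogguser, PASSWORD oggpwd
EXTTRAIL ./dirdat/lt
TABLE sales.orders;

-- Replicat parameter file (rep1.prm) on the target
REPLICAT rep1
USERID ogguser, PASSWORD oggpwd
MAP sales.orders, TARGET sales.orders;
```

The key design point is that capture reads the source's change records rather than querying tables, which is why the impact on the production database stays minimal.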

In the era of Big Data, where businesses are adopting technologies such as Hadoop, Cassandra, MongoDB, and other so-called NoSQL systems, management of data across these systems becomes important. For example, a business may capture transactions in its relational databases and want to mash up this data with other data on Hadoop.
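The mash-up described above can be sketched in a few lines of Python. This is purely illustrative: the record layouts, field names, and customers are made up, and the join is done in plain Python for clarity, where a real pipeline would run an equivalent join in Hive or Spark over data at scale.

```python
from collections import defaultdict

# Transactions as replicated from the relational system (illustrative rows).
transactions = [
    {"order_id": 101, "customer_id": "c1", "amount": 250.0},
    {"order_id": 102, "customer_id": "c2", "amount": 75.5},
]

# Clickstream events as they might sit on Hadoop (illustrative rows).
clicks = [
    {"customer_id": "c1", "page": "/laptops", "dwell_secs": 42},
    {"customer_id": "c1", "page": "/checkout", "dwell_secs": 12},
    {"customer_id": "c2", "page": "/phones", "dwell_secs": 30},
]

# Aggregate clicks per customer, then join onto the transactions --
# the kind of enrichment a Hive or Spark job would perform at scale.
clicks_per_customer = defaultdict(int)
for event in clicks:
    clicks_per_customer[event["customer_id"]] += 1

enriched = [
    {**txn, "click_count": clicks_per_customer[txn["customer_id"]]}
    for txn in transactions
]
```

Each enriched record now carries both the transactional fact (the order amount) and behavioral context (how active the customer was), which is the value of combining the two data sources.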

Since relational data is a key component of Big Data analysis, making it available on these systems is important. Building on the advantages of using Oracle GoldenGate, it can be extended to make transactions available on Big Data systems. I have written a number of howtos and blog posts on using GoldenGate to stream transactions into Big Data systems. They can be accessed at –

It would be great if you could share your use cases for using relational data with Big Data.


About atiru

Product Strategist and architect for harnessing value from data.
