Monthly Archives: August 2014

Data preparation – Normalization subsystem – Clustering Text using Fingerprinting

In this blog post I will examine the normalization subsystem, one of the subsystems I outlined in my earlier post, Data Preparation Sub-Systems. A key objective of this step is to ensure data consistency. For example, when working …
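The excerpt is cut off before the fingerprinting method itself, so what follows is only a minimal Python sketch of one common key-collision approach. It is modeled on the fingerprint keying method popularized by OpenRefine, which is an assumption about what the post covers. Each value is trimmed, lowercased, stripped of accents and punctuation, and split into tokens. The de-duplicated tokens are then sorted and rejoined, and values that share a fingerprint fall into the same cluster. The function names `fingerprint` and `cluster_by_fingerprint` are hypothetical.

```python
import re
import unicodedata
from collections import defaultdict
from typing import Dict, Iterable, List

# Hypothetical helpers: a sketch of fingerprint key collision,
# not the implementation described in the post.


def fingerprint(value: str) -> str:
    """Normalized key: trimmed, lowercased, accents and punctuation removed,
    tokens de-duplicated and sorted."""
    text = value.strip().lower()
    # Decompose accented characters and drop the combining marks (é -> e).
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Replace punctuation and symbols with spaces.
    text = re.sub(r"[^\w\s]", " ", text)
    # Unique tokens in sorted order make the key independent of word order.
    return " ".join(sorted(set(text.split())))


def cluster_by_fingerprint(values: Iterable[str]) -> Dict[str, List[str]]:
    """Group raw values whose fingerprints collide; keep only groups
    that contain more than one distinct spelling."""
    groups: Dict[str, set] = defaultdict(set)
    for v in values:
        groups[fingerprint(v)].add(v)
    return {key: sorted(vals) for key, vals in groups.items() if len(vals) > 1}


sample = ["Acme Corp.", "acme corp", "Corp Acme", "Globex", "Café Roma", "cafe roma"]
print(cluster_by_fingerprint(sample))
```

Because case, accents, punctuation, and token order are all ignored, "Corp Acme" lands in the same cluster as "Acme Corp.". The trade-off is that genuinely different values with the same token set are merged too, so clusters are best reviewed before their members are replaced with a single canonical form.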

Posted in Data Management and Analytics

Ensuring data consistency between cloud and on-premises

Enterprises today have greater flexibility in deciding whether investment in applications, platforms, and infrastructure should be a capital expenditure, an operational expenditure, or both. As a result, enterprises are increasingly adopting a mix of public cloud, private cloud, and on-premises strategy …

Posted in Data Management and Analytics

Graph Computation and Analytics

The graph APIs (Blueprints, Jena, SAIL) discussed in my post Manipulating Graphs are well suited to creating and updating graph databases (property graphs and RDF stores). At a higher level than the graph APIs, technologies such as Gremlin (or Cypher for Neo4j), which …

Posted in Topics related to Graph Databases and Compute, Linked Data (RDF)

Manipulating Graphs

In my earlier post on graph databases I discussed two types of graph models: the Property Graph model and the RDF model. In this post I will discuss the API standards available for manipulating these graph models. Property Graph API …
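The excerpt stops before the APIs themselves. Rather than reproduce Blueprints or any other standard, here is a hypothetical, minimal in-memory Python sketch of the kinds of operations such APIs expose for a property graph: adding vertices, adding labeled directed edges, attaching key/value properties to either, and following edges from a vertex. The class and method names (`PropertyGraph`, `add_vertex`, `add_edge`, `out_neighbors`) and the sample data are invented for illustration.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Any, Dict, List

# Hypothetical API for illustration only; not Blueprints or any other standard.


@dataclass
class Vertex:
    id: int
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Edge:
    id: int
    out_v: int  # id of the vertex the edge starts from
    in_v: int   # id of the vertex the edge points to
    label: str
    properties: Dict[str, Any] = field(default_factory=dict)


class PropertyGraph:
    """Vertices and labeled, directed edges, each carrying key/value properties."""

    def __init__(self) -> None:
        self._ids = count(1)  # shared id sequence for vertices and edges
        self.vertices: Dict[int, Vertex] = {}
        self.edges: Dict[int, Edge] = {}

    def add_vertex(self, **props: Any) -> Vertex:
        v = Vertex(next(self._ids), dict(props))
        self.vertices[v.id] = v
        return v

    def add_edge(self, out_v: Vertex, label: str, in_v: Vertex, **props: Any) -> Edge:
        e = Edge(next(self._ids), out_v.id, in_v.id, label, dict(props))
        self.edges[e.id] = e
        return e

    def out_neighbors(self, v: Vertex, label: str) -> List[Vertex]:
        """Vertices reached by following outgoing edges with the given label."""
        return [self.vertices[e.in_v] for e in self.edges.values()
                if e.out_v == v.id and e.label == label]


g = PropertyGraph()
alice = g.add_vertex(name="alice")
bob = g.add_vertex(name="bob")
g.add_edge(alice, "knows", bob, since=2010)
bob.properties["age"] = 30  # update a property in place
print([v.properties["name"] for v in g.out_neighbors(alice, "knows")])
```

A real graph database would index edges by vertex instead of scanning them all in `out_neighbors`; the linear scan just keeps the sketch short.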

Posted in Topics related to Graph Databases and Compute, Linked Data (RDF)

RDF Serialization and Triplestores

In my earlier post on graph databases I introduced RDF. Briefly, RDF is a language for expressing data models using statements expressed as triples. Each statement is composed of a subject, a predicate, and an object. RDF adds several important concepts …
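To make the triple structure concrete, here is a minimal Python sketch, not tied to any RDF library. It models each statement as a (subject, predicate, object) tuple, serializes a list of them to N-Triples (one of the standard line-based RDF serializations), and answers the wildcard pattern lookups a triplestore indexes. The example.org resource IRIs and the helper names `term_to_nt`, `to_ntriples`, and `match` are illustrative assumptions; the predicates come from the FOAF vocabulary. Blank nodes, language tags, and datatypes are left out for brevity.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple, Union


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class Literal:
    value: str  # plain string literal (no language tag or datatype)


Term = Union[IRI, Literal]
Triple = Tuple[IRI, IRI, Term]  # (subject, predicate, object)


def term_to_nt(term: Term) -> str:
    """Render one term in N-Triples syntax."""
    if isinstance(term, IRI):
        return f"<{term.value}>"
    # Backslash, double quote, LF and CR must be escaped inside a quoted literal.
    escaped = (term.value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def to_ntriples(triples: Iterable[Triple]) -> str:
    """One statement per line, each terminated by ' .'."""
    return "\n".join(
        f"{term_to_nt(s)} {term_to_nt(p)} {term_to_nt(o)} ." for s, p, o in triples
    )


def match(triples: Iterable[Triple], s: Optional[IRI] = None,
          p: Optional[IRI] = None, o: Optional[Term] = None) -> List[Triple]:
    """Pattern lookup; None acts as a wildcard for that position."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


# Example data (hypothetical example.org resources, FOAF predicates).
ALICE = IRI("http://example.org/alice")
BOB = IRI("http://example.org/bob")
NAME = IRI("http://xmlns.com/foaf/0.1/name")
KNOWS = IRI("http://xmlns.com/foaf/0.1/knows")

graph: List[Triple] = [
    (ALICE, NAME, Literal("Alice")),
    (ALICE, KNOWS, BOB),
    (BOB, NAME, Literal("Bob")),
]

print(to_ntriples(graph))
print(match(graph, p=NAME))
```

A production triplestore answers the same patterns from indexes over different orderings of subject, predicate, and object rather than by scanning a list, but the wildcard semantics are the same.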

Posted in Topics related to Graph Databases and Compute, Linked Data (RDF)

Data Preparation for Batch and Real-time data

In this blog post I will discuss the role of data preparation when working with batch or real-time data sets. Irrespective of whether the analysis happens in real time or in batch, some aspects of data preparation …

Posted in Data Management and Analytics

Data Preparation Platform for Big Data

Before discussing the data preparation platform for Big Data, let's look at some of the requirements. Since there is no a priori knowledge of the data content, the data preparation process is highly interactive and visual. Getting a …

Posted in Data Management and Analytics