Graph Computation and Analytics

The graph APIs (Blueprints, Jena, SAIL) discussed in my post Manipulating Graph are good for creating and updating graph databases (property graphs and RDF stores). One level above these APIs, a domain-specific language (DSL) such as Gremlin (or Cypher for Neo4j) can be used to build graph analytics applications. Several graph algorithms (e.g., ranking and centrality algorithms) can be implemented in Gremlin, which is built on the TinkerPop 2.x Blueprints API.
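As a language-neutral illustration of the kind of centrality algorithm mentioned above, here is degree centrality, one of the simplest centrality measures. It is written in plain Python rather than Gremlin syntax, and the edge list and vertex names are hypothetical. Each vertex's score is its number of distinct neighbors divided by n - 1, where n is the number of vertices.

```python
def degree_centrality(edges):
    """Return normalized degree centrality for an undirected edge list.

    Each vertex's score is its degree divided by (n - 1), where n is the
    number of distinct vertices. Duplicate edges and self-loops are ignored;
    a vertex that appears only in self-loops is omitted.
    """
    adjacency = {}
    for u, v in edges:
        if u == v:
            continue  # skip self-loops
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    n = len(adjacency)
    if n <= 1:
        return {vertex: 0.0 for vertex in adjacency}
    return {vertex: len(nbrs) / (n - 1) for vertex, nbrs in adjacency.items()}


# Hypothetical social graph: "carol" is connected to everyone else.
scores = degree_centrality(
    [("alice", "bob"), ("alice", "carol"), ("bob", "carol"), ("carol", "dave")]
)
```

A Gremlin traversal would compute the same quantity by walking each vertex's edges in the storage layer, which is exactly where performance becomes a concern on large graphs.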

That said, a DSL such as Gremlin operates directly on the graph storage layer, which can limit both functionality and performance, especially when working with large graphs. For such workloads we need a specialized graph processing system.

In this post I discuss three graph compute systems that I came across while evaluating graph computation engines for my project.

Green-Marl

Green-Marl is a domain-specific language (DSL) for graph data analysis that originated at Stanford. Green-Marl lets users describe their algorithms intuitively, while the compiler delivers the performance. Specifically, the compiler translates a Green-Marl program into an equivalent, parallelized, high-performance program written in a general-purpose language.

Currently, the compiler can produce parallel C++ code targeting multi-core/multi-socket shared-memory environments, and it can also generate Java code for a MapReduce-like framework, targeting distributed execution. The Green-Marl authors claim that their DSL is intuitive and concise, and that it improves productivity.
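To give a sense of the algorithms Green-Marl targets, the sketch below is PageRank written directly in a general-purpose language. It is plain sequential Python, not Green-Marl syntax and not the GM compiler's actual output. The per-vertex loop is the part a compiler like Green-Marl's would parallelize across cores. The damping factor of 0.85 and the fixed iteration count are conventional assumptions.

```python
def pagerank(out_links, damping=0.85, iterations=50):
    """Iterative PageRank over a dict mapping vertex -> list of out-neighbors.

    Every vertex (including pure sinks) must appear as a key. Dangling
    vertices (no out-links) spread their rank uniformly over all vertices,
    so the ranks always sum to 1.
    """
    vertices = list(out_links)
    n = len(vertices)
    rank = {v: 1.0 / n for v in vertices}
    for _ in range(iterations):
        # Rank held by dangling vertices is redistributed evenly.
        dangling = sum(rank[v] for v in vertices if not out_links[v])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {v: base for v in vertices}
        # Each vertex pushes an equal share of its rank to its out-neighbors;
        # this is the loop a parallelizing compiler would spread across cores.
        for v in vertices:
            targets = out_links[v]
            if targets:
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new_rank[t] += share
        rank = new_rank
    return rank


# Hypothetical graph: both "a" and "b" link to "c", and "c" links back to "a".
ranks = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
```

In this example "c" ends up with the highest rank and "b", which nothing links to, with the lowest.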

Green-Marl Process

  • Write the algorithm in the Green-Marl (GM) DSL
  • Compile it with the GM compiler
  • Invoke the C++ or Java code generated by GM
  • GM provides loaders for loading persisted graphs into memory for processing.
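Once loaded, an in-memory analytics engine needs a compact graph layout. A common choice is compressed sparse row (CSR): one array of per-vertex offsets and one flat array of neighbor ids. Whether GM's loaders use exactly this layout is not covered in this post, so treat the sketch below as an illustration of the general technique. The function name build_csr is hypothetical.

```python
def build_csr(num_vertices, edges):
    """Convert a directed edge list of (src, dst) integer pairs into CSR arrays.

    Vertex ids must be in range(num_vertices). Returns (offsets, neighbors):
    the out-neighbors of vertex v are neighbors[offsets[v]:offsets[v + 1]].
    """
    # Count the out-degree of each vertex (shifted by one for the prefix sum).
    offsets = [0] * (num_vertices + 1)
    for src, _ in edges:
        offsets[src + 1] += 1
    # Prefix sum turns degrees into start offsets.
    for v in range(num_vertices):
        offsets[v + 1] += offsets[v]
    # Scatter each edge into its slot, tracking the next free position per vertex.
    neighbors = [0] * len(edges)
    cursor = offsets[:-1]  # slicing makes a copy of the start offsets
    for src, dst in edges:
        neighbors[cursor[src]] = dst
        cursor[src] += 1
    return offsets, neighbors


offsets, neighbors = build_csr(3, [(0, 1), (0, 2), (1, 2), (2, 0)])
```

Because neighbors of a vertex are stored contiguously, a parallel loop over vertices reads memory sequentially, which is one reason this layout is popular for shared-memory graph analytics.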

GraphLab

GraphLab is a graph-parallel system that enables advanced analytics and machine learning on graphs. Graph-parallel systems (such as GraphLab and Pregel) address the drawbacks of data-parallel systems (e.g., Hadoop) when performing computations on a graph. They are specialized graph systems whose APIs capture complex graph dependencies and exploit the graph structure to reduce communication and facilitate parallel computation. As a result, graph-parallel systems reduce both the resources and the time required for graph analytics. The following table compares GraphLab with Hadoop for triangle counting on the Twitter graph (40 million users and 1.4 billion links; figures taken from a GraphLab presentation).

[Table (image not reproduced): GraphLab vs. Hadoop, triangle counting on Twitter]
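Triangle counting, the benchmark in the comparison above, counts sets of three mutually connected vertices. The sketch below uses a common sequential technique, not GraphLab's implementation. It ranks vertices by degree, keeps only edges that point toward higher-ranked vertices, and intersects those forward-neighbor sets so that each triangle is counted exactly once. A graph-parallel system distributes this per-vertex intersection work across machines.

```python
def count_triangles(edges):
    """Count triangles in an undirected graph given as (u, v) pairs.

    Each edge is oriented from the lower-ranked to the higher-ranked endpoint
    (rank = (degree, vertex id)), so every triangle is found exactly once.
    Vertex ids must be mutually comparable (e.g. all ints or all strings).
    """
    neighbors = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    def rank(x):
        return (len(neighbors[x]), x)

    # Keep only "forward" edges toward higher-ranked vertices.
    forward = {
        x: {y for y in nbrs if rank(y) > rank(x)} for x, nbrs in neighbors.items()
    }

    triangles = 0
    for x, higher in forward.items():
        for y in higher:
            # Every common forward neighbor z closes a triangle with rank x < y < z.
            triangles += len(higher & forward[y])
    return triangles
```

Orienting edges by degree also keeps the forward sets of high-degree vertices (common in social graphs like Twitter) small, which bounds the cost of each intersection.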

The GraphLab API supports loading a graph from a text file or from a previously saved binary graph file. The text file is typically produced by an ETL process that writes the data in a format GraphLab can read.
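As an illustration of such ETL output, the sketch below parses a whitespace-separated edge list: one "source target" pair per line, with "#" marking comment lines. This format is an assumption chosen for the example; the GraphLab documentation lists the formats its loaders actually accept. The function name load_edge_list is hypothetical.

```python
import io


def load_edge_list(lines):
    """Parse 'src dst' lines into (adjacency dict, edge count).

    Blank lines and lines starting with '#' are skipped; vertex ids are ints.
    Extra columns (e.g. edge weights) are ignored.
    """
    adjacency = {}
    edge_count = 0
    for line_no, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 2:
            raise ValueError(f"line {line_no}: expected 'src dst', got {raw!r}")
        src, dst = int(fields[0]), int(fields[1])
        adjacency.setdefault(src, []).append(dst)
        adjacency.setdefault(dst, [])  # make sure sink vertices appear too
        edge_count += 1
    return adjacency, edge_count


# Usage with an in-memory file; an open text file works the same way.
sample = io.StringIO("# src dst\n1 2\n1 3\n\n2 3\n")
graph, n_edges = load_edge_list(sample)
```

Keeping the loader strict (raising on malformed lines) makes ETL bugs surface at load time rather than as silently missing edges during the computation.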

GraphLab reduces the resources and time needed for graph computation by orders of magnitude compared with graph algorithms written on Hadoop. However, it does not address the bigger picture of the data processing pipeline, which also includes graph creation and post-processing.

GraphX

GraphX is the Apache Spark API that combines data-parallel and graph-parallel computation. GraphX addresses the bigger picture of data processing: graph creation, computation, and post-processing.

The goal of the GraphX project is to unify graph-parallel and data-parallel computation in one system with a single composable API. The GraphX API enables users to view data both as a graph and as collections (i.e., RDDs) without data movement or duplication. By incorporating recent advances in graph-parallel systems, GraphX is able to optimize the execution of graph operations.

[Figure (image not reproduced): data preparation]
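The sketch below mimics that end-to-end pipeline in plain Python to show the idea. It does not use the GraphX API, and the records, their "user,follows" layout, and the function names are hypothetical. The data-parallel stage builds vertex and edge collections from raw records. The graph-parallel stage then runs a Pregel-style connected-components computation in which each vertex repeatedly adopts the smallest label among its neighbors. Finally, the post-processing stage aggregates the result as an ordinary collection.

```python
def build_graph(records):
    """Data-parallel stage: parse 'user,follows' rows into vertex and edge collections."""
    pairs = []
    for r in records:
        parts = r.split(",")
        if len(parts) == 2 and all(parts):  # drop malformed rows
            pairs.append((parts[0], parts[1]))
    vertices = sorted({name for pair in pairs for name in pair})
    edges = [(a, b) for a, b in pairs if a != b]
    return vertices, edges


def connected_components(vertices, edges):
    """Graph-parallel stage: synchronous label propagation until no label changes."""
    label = {v: v for v in vertices}  # each vertex starts in its own component
    changed = True
    while changed:
        changed = False
        # "Messages": both endpoints of an edge send their current label to each other.
        incoming = {}
        for a, b in edges:
            incoming.setdefault(b, []).append(label[a])
            incoming.setdefault(a, []).append(label[b])
        # Vertex program: adopt the smallest label received if it beats the current one.
        for v, msgs in incoming.items():
            best = min(msgs)
            if best < label[v]:
                label[v] = best
                changed = True
    return label


def component_sizes(label):
    """Post-processing stage: aggregate the graph result as an ordinary collection."""
    sizes = {}
    for component in label.values():
        sizes[component] = sizes.get(component, 0) + 1
    return sizes


records = ["ann,bob", "bob,cat", "dan,eve", "malformed"]
vertices, edges = build_graph(records)
labels = connected_components(vertices, edges)
```

Here component_sizes(labels) groups ann, bob and cat into one component and dan and eve into another. In GraphX all three stages would run on the same cluster over RDDs, without exporting the graph to a separate system between steps.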

Share your favorite graph compute system and use case!

About atiru

Product Strategist and architect for harnessing value from data.
This entry was posted in Topics related to Graph Databases and Compute, Linked Data (RDF).