From both a consumer and a producer perspective, the decision to go to the cloud or not is an important and sometimes difficult one. The following are some points to consider when making the decision.
- Data locality – Where is the majority of the data that needs to be prepared being collected? High-speed networks, convenient, reliable and durable storage on the cloud, and strategies for copying data across the WAN are minimizing the effects of data locality. I came across a product called WANdisco, which enables bi-directional replication for HDFS and HBase between two data centers. If anyone reading this blog is using it, send me your experience and use case!
- Real-time decisions – If decisions need to be made in real time, then the preparation needs to happen close to where the data is gathered.
- Complementary Services – Typically, data preparation is part of a value chain that sits between sources of data and consumers of processed data such as BI systems, discovery systems, graph DBs, analytic applications, etc. The source data systems could include applications (CRM, HCM, SCM, etc.), application logs, and so on. You need to evaluate the optimal location for data preparation based on where the upstream and downstream applications in the value chain are located.
- Security – Cloud services provide good support for security: encryption, access control, data isolation. However, on-premises businesses that are primarily focused on security and strict data governance will need to run through their security checklist to determine whether moving data to the cloud for data preparation and other downstream services is the right approach.
- Business Reasons – Strategic decisions may demand moving to the cloud or staying on-premises. Global businesses may find moving to the cloud a strategic investment in the long run. Small and medium businesses may find the cloud an economical alternative and a faster route to market.
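To make the data locality point a little more concrete, here is a rough back-of-the-envelope sketch for estimating how long it would take to copy a data set across the WAN before preparing it. The function name `transfer_hours`, the `efficiency` factor (the fraction of the nominal link speed you actually get) and the example numbers are all my own assumptions for illustration, not measurements from any real setup.

```python
def transfer_hours(data_gb: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Estimate the hours needed to copy data_gb gigabytes over a WAN link.

    link_mbps is the nominal link speed in megabits per second.
    efficiency is an assumed fraction (0-1] of that speed actually achieved.
    """
    if data_gb < 0 or link_mbps <= 0 or not (0 < efficiency <= 1):
        raise ValueError("expected data_gb >= 0, link_mbps > 0, 0 < efficiency <= 1")

    total_bits = data_gb * 8 * 1000**3               # gigabytes -> bits (decimal units)
    effective_bps = link_mbps * 1000**2 * efficiency  # megabits/s -> usable bits/s
    return total_bits / effective_bps / 3600          # seconds -> hours


if __name__ == "__main__":
    # Hypothetical example: 1 TB of raw data over a 1 Gbps link at 80% efficiency.
    print(round(transfer_hours(1000, 1000), 2))  # prints 2.78
```

If the estimate comes out in hours and your data arrives daily, copying to the cloud is probably workable; if it comes out in days, preparing the data close to where it is collected starts to look more attractive.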
Where would you do your data preparation, and why?