The Modern Cloud Data Platform war — A Data Series, Case Study & Reference Architectures

Published in Data Arena · 3 min read · Jun 20, 2021

Case Study: Let us take a simple example: the Indonesia branch of Company X, which sources and ingests data from 3 regions (of the many). The company has a central Data Warehouse, and its BI and ML consumption layer spans multiple regions. It is an e-commerce firm (an Internet organization) that runs several businesses. The volume of business, number of users, types of applications, and sharing and use of data are discussed in the points below.

Massive data input: Data is sourced and ingested from hundreds of places, of which the 3 regions listed below load petabytes of data because they are central partner locations.

Data fluctuations: Imagine that data volume fluctuates over time. Between Jan and Jun this year, the volume fluctuated between 500 M and 900 M records. With the Indonesia data architecture relying heavily on the on-premise data platform, such fluctuations demand scalability (on-premise mostly means vertical scalability): the platform warrants provisioning for a minimum of 1200 M records and should be able to scale readily at any time. We are to architect using the Cloud. Let us explore and research different options from sourcing to consumption, with overarching Data Governance and Security, and build reference architectures with different combinations.

Each bar represents **100 M records.** Image created by the author
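To make the provisioning figures concrete, here is a minimal Python sketch of the headroom arithmetic. It uses only the numbers quoted above (500 M to 900 M observed, 1200 M provisioned); the function and variable names are hypothetical and chosen for illustration.

```python
def headroom(peak_records: int, provisioned_records: int) -> float:
    """Spare capacity expressed as a fraction of the observed peak."""
    if peak_records <= 0:
        raise ValueError("peak_records must be positive")
    return (provisioned_records - peak_records) / peak_records


# Figures taken from the case study (records, Jan to Jun).
OBSERVED_LOW = 500_000_000
OBSERVED_PEAK = 900_000_000
PROVISIONED = 1_200_000_000

swing = OBSERVED_PEAK / OBSERVED_LOW          # peak relative to the low point
spare = headroom(OBSERVED_PEAK, PROVISIONED)  # headroom above the peak
idle_at_low = 1 - OBSERVED_LOW / PROVISIONED  # unused share at the low point

print(f"Peak is {swing:.1f}x the low point")
print(f"Headroom above peak: {spare:.0%}")
print(f"Capacity idle at the low point: {idle_at_low:.0%}")
```

Under these numbers, a fixed on-premise footprint sized for 1200 M records sits more than half idle at the low point, which is the kind of gap that elastic scaling is meant to close.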

Massive loads of Data Sharing: Another case for the same firm is that it has to share large volumes of data with other organizations; say, every month-end it transfers 100 PB of data.
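As a rough sense of scale, the sketch below estimates the sustained network throughput a 100 PB transfer would need. The transfer windows (24 hours and 30 days) are assumptions made for illustration, since the case study does not state one. The function name is hypothetical, and 1 PB is taken as 10^15 bytes.

```python
def required_gbps(petabytes: float, window_hours: float) -> float:
    """Sustained throughput in Gbit/s needed to move the data in the window."""
    if window_hours <= 0:
        raise ValueError("window_hours must be positive")
    bits = petabytes * 1e15 * 8      # assumes decimal petabytes (10^15 bytes)
    seconds = window_hours * 3600
    return bits / seconds / 1e9


# 100 PB comes from the case study; both windows are assumed.
one_day = required_gbps(100, 24)
one_month = required_gbps(100, 30 * 24)

print(f"Over 24 hours: {one_day:,.0f} Gbit/s")
print(f"Over 30 days:  {one_month:,.0f} Gbit/s")
```

Even spread across a full month, the transfer needs roughly 309 Gbit/s sustained, which is why copying data in bulk becomes hard to justify at this volume.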

User Fluctuations: User access to the portal keeps fluctuating. At certain times there are defined patterns, such as festive seasons and Covid lockdowns; at other times access goes up on major e-commerce sale days, such as those of Amazon or local providers.

User fluctuations. Image created by the author

Machine Learning: Different types of Machine Learning algorithms run on these massive data sets, from recommendation engines to fraud detection.

ML & outcome Analytics. Image created by the author

Search: Several GBs of data are searched per hour. There are also fluctuations in search.

There are a few more issues and use cases that will be discussed as the series progresses.

What we will explore as part of this series:

The modern Cloud Data Platform stack will be explored as part of this series. Image created by the author

Summary: The Modern Cloud Data Platform is going strong, and Private Blockchain, along with Public Blockchain, is catching up big time. But what is the right solution for my firm? What is in it for the business? How would I ensure that spend on such technologies can be shown as a profit centre because it brings new avenues and opportunities to the business? There are loads of questions, and we shall discuss them all in this series of articles in the form of case studies and reference architectures. Refer to the next part of the article: DataBricks (Part 1).

Written by LAKSHMI VENKATESH