This article is part of a multi-part series, Data Technology Trends (parent article). Previous article: Data Technology Trend #3: Accelerated. Next article: Data Technology Trend #5: Democratized. Next part of this article: Data Technology Trend #4: Decentralized (part 2).
Decentralization of Data
Trend #4.1: Data Mesh
Move beyond a monolithic data lake to a distributed data mesh. Data Mesh is not a technology but a paradigm. Unlike traditional monolithic data infrastructures that handle all ETL in one central data lake, a data mesh supports distributed, domain-specific data consumers and views “data-as-a-product,” with each domain handling its own data pipelines. Underlying the data mesh is a standardized layer of observability and governance intended to keep data reliable and trustworthy. In practice, ETL is decentralized by domain, and a data warehouse (DWH) can be built on top of the domain outputs.
The data is divided into (1) Operational Data Plane — Running the business (2) Analytical Data Plane — Optimizing the business.
Four principles of Data Mesh:
1. Domain-Driven Data Ownership architecture
2. Federated computational governance
3. Self-service infrastructure as a platform
4. Data as a product
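To make the four principles a little more concrete, here is a minimal Python sketch. Everything in it (the `DataProduct` and `MeshPlatform` classes, the two policy functions, the sample products) is a hypothetical illustration and not the API of any real Data Mesh product: domains own and publish data products, governance rules are plain functions applied to every product, and a small self-service platform only accepts products that pass them.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class DataProduct:
    """A dataset that a domain team owns and publishes (principles 1 and 4)."""
    name: str
    domain: str
    owner: str
    schema: Dict[str, str]  # column name -> type name
    rows: List[dict] = field(default_factory=list)


# Federated computational governance (principle 2): global rules expressed
# as code. A policy returns an error message, or None when the product passes.
Policy = Callable[[DataProduct], Optional[str]]


def require_owner(product: DataProduct) -> Optional[str]:
    return None if product.owner else f"{product.name}: no owner assigned"


def require_schema(product: DataProduct) -> Optional[str]:
    return None if product.schema else f"{product.name}: schema is empty"


class MeshPlatform:
    """Self-service platform (principle 3): publish and discover products."""

    def __init__(self, policies: List[Policy]) -> None:
        self._policies = list(policies)
        self._catalog: Dict[Tuple[str, str], DataProduct] = {}

    def publish(self, product: DataProduct) -> List[str]:
        """Register the product if it passes every policy; return violations."""
        violations = []
        for policy in self._policies:
            message = policy(product)
            if message is not None:
                violations.append(message)
        if not violations:
            self._catalog[(product.domain, product.name)] = product
        return violations

    def discover(self, domain: Optional[str] = None) -> List[DataProduct]:
        """List published products, optionally filtered by domain."""
        return [p for (d, _), p in self._catalog.items() if domain in (None, d)]


platform = MeshPlatform([require_owner, require_schema])
orders = DataProduct("orders", "sales", "sales-team",
                     {"order_id": "str", "amount": "float"})
orphan = DataProduct("clicks", "marketing", "", {})

print(platform.publish(orders))  # no violations, so the product is listed
print(platform.publish(orphan))  # violations returned, product is rejected
print([p.name for p in platform.discover()])
```

The design choice worth noticing is that the governance rules live in one shared place while the products themselves stay with their domains, which is the "federated" part of the second principle.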
What is Data Mesh:
Data Mesh is a technology paradigm that originated at Thoughtworks, where Zhamak Dehghani introduced it. Rather than building a single source of data with a data lake or another monolithic architecture, which means introducing a massive change, Data Mesh is an architectural discipline that works with the existing disparate systems as they are and puts a layer on top of them to gather data for analytics.
Backend and frontend application stacks have the concept of microservices. On the data side there was no real equivalent: even though every company is now a data company, the underlying technology only adopted containers and a pseudo-microservices style of deployment. Data Mesh goes some way toward filling that gap. It gives you a clear demarcation of data and a fully distributed architecture, while still letting the domains interact so that you can gain meaningful insights from the data as a whole. It also helps you build “Data-as-a-Product” or “Data-as-a-Service”. Whether your organization needs a Data Mesh depends on the data sources, the data preparation involved, the size and variability of the data, whether you want to build a data product or service, and so on. Where it fits, data governance is built into the architecture, and building a self-service analytics platform becomes easier.
How does it work:
Sample technology: Trustgrid DataMesh Platform
Refer: Zhamak Dehghani’s “How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh”, published on martinfowler.com
What issue does it solve:
A Data Mesh platform is an intentionally designed distributed data architecture that operates under centralized governance and standardization for interoperability. It is not always possible to switch completely from the current decentralized architecture to a central, monolithic data lake. Data Mesh closes this gap by governing the distributed systems where they are.
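As a rough illustration of what "standardization for interoperability" buys you, the sketch below assumes two hypothetical domains (sales and support) that each keep their own data but agree on one shared identifier, `customer_id`. The field names and sample rows are invented for the example. Because the key is standardized, a consumer can build a cross-domain view without first copying everything into a central lake.

```python
from typing import Dict, List

# Each domain keeps and serves its own data (hypothetical sample rows).
sales_orders = [
    {"customer_id": "c1", "amount": 120.0},
    {"customer_id": "c2", "amount": 35.5},
    {"customer_id": "c1", "amount": 10.0},
]
support_tickets = [
    {"customer_id": "c1", "open_tickets": 2},
    {"customer_id": "c3", "open_tickets": 1},
]


def sum_by_customer(rows: List[dict], value_field: str) -> Dict[str, float]:
    """Aggregate one numeric field per customer_id, the agreed shared key."""
    totals: Dict[str, float] = {}
    for row in rows:
        key = row["customer_id"]
        totals[key] = totals.get(key, 0) + row[value_field]
    return totals


def customer_view(orders: List[dict], tickets: List[dict]) -> Dict[str, dict]:
    """Combine the two domain outputs into one analytical view."""
    spend = sum_by_customer(orders, "amount")
    open_tickets = sum_by_customer(tickets, "open_tickets")
    return {
        cid: {
            "total_spend": spend.get(cid, 0),
            "open_tickets": open_tickets.get(cid, 0),
        }
        for cid in sorted(set(spend) | set(open_tickets))
    }


view = customer_view(sales_orders, support_tickets)
for customer_id, summary in view.items():
    print(customer_id, summary)
```

If the two domains had used different customer identifiers, this join would need a mapping table first, which is exactly the kind of friction that shared standards are meant to remove.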
Trend #4.2: Data Fabric
What is Data Fabric:
Simply put, a data fabric is an integration platform that connects all data together with data management services, enabling companies to access and utilize data from multiple sources. With data fabric, companies can design, collaborate, transform, and manage data regardless of where it resides or is generated. According to NetApp, by simplifying and integrating data management across on-premises and cloud environments, data fabric allows companies to speed up digital transformation.
Data fabric architecture is a means of supporting “frictionless access and sharing of data in a distributed network environment.”
The ultimate goal is for the data fabric to act as a data exchange platform.
So what is a data fabric? According to this article, Data Fabric provides a catalog of consistent data services across private and public clouds. This explanation says enterprises have grappled with the problem of integrating their entire data sets into a single platform. A data fabric simply describes a comprehensive way to achieve that goal.
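One way to picture a "catalog of consistent data services" is a registry that exposes every source through the same read call, whatever sits behind it. The sketch below is an assumption-laden toy, not how any vendor implements it: `FabricCatalog`, `sqlite_reader`, and `json_reader` are hypothetical names, an in-memory SQLite database stands in for an on-premises relational system, and a JSON string stands in for a cloud document store.

```python
import json
import sqlite3
from typing import Callable, Dict, List

# A reader is any zero-argument function that returns rows as dictionaries.
Reader = Callable[[], List[dict]]


def sqlite_reader(conn: sqlite3.Connection, query: str) -> Reader:
    """Wrap a SQL query so it looks like every other source in the catalog."""
    def read() -> List[dict]:
        cursor = conn.execute(query)
        columns = [col[0] for col in cursor.description]
        return [dict(zip(columns, row)) for row in cursor.fetchall()]
    return read


def json_reader(payload: str) -> Reader:
    """Wrap a JSON array of documents (stand-in for a cloud document store)."""
    return lambda: json.loads(payload)


class FabricCatalog:
    """Hypothetical catalog: one consistent read() over heterogeneous sources."""

    def __init__(self) -> None:
        self._readers: Dict[str, Reader] = {}

    def register(self, name: str, reader: Reader) -> None:
        self._readers[name] = reader

    def names(self) -> List[str]:
        return sorted(self._readers)

    def read(self, name: str) -> List[dict]:
        return self._readers[name]()


# Stand-in for an on-premises relational database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id TEXT, region TEXT)")
conn.execute("INSERT INTO customers VALUES ('c1', 'EU'), ('c2', 'US')")

# Stand-in for documents held in a cloud store.
events_json = '[{"id": "c1", "event": "login"}]'

catalog = FabricCatalog()
catalog.register("onprem.customers",
                 sqlite_reader(conn, "SELECT id, region FROM customers ORDER BY id"))
catalog.register("cloud.events", json_reader(events_json))

for name in catalog.names():
    print(name, catalog.read(name))
```

The consumer never needs to know whether a dataset is relational or document-based, on-premises or in the cloud; it only needs the catalog name.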
How it works:
Several worlds collide here, and Data Fabric makes it possible to integrate them:
i. Cloud / On-prem
ii. Relational / Non-Relational
iii. Data & Analytics
1. Data Fabric is an architecture and a set of data services that provide consistent capabilities across a choice of endpoints spanning on-premises systems, multiple cloud environments, and different database technologies. It delivers a consistent, integrated hybrid cloud platform.
2. Data Lake vs Data Fabric
a. Data Lake is where data goes.
b. Data Fabric is the set of orchestration policies put on top of it. Simply put, it is a way to make sense of all the data across the organization by weaving it together.
3. Data Fabric is a concept, not a technology by itself. It becomes more important as data sources grow extremely diverse and disparate, which makes integrating data and eliminating silos imperative. Across the variety of data sources, cloud storage, and technologies (including but not limited to IoT), management, security, reliability, and consistency must be maintained.
4. Key features (as described by Talend for its platform)
§ Clean, compliant, and accessible data for everyone.
§ Talend provides a single, unified platform for data integration, data integrity, data governance, and real-time data delivery.
§ Connect any data source to any destination
§ Deploy in the cloud, on-premises, or in a hybrid configuration
§ Lower data integration costs
§ Reduce data governance compliance time
§ With Talend behind your data, you can make smarter decisions, drive innovation, and improve operations.
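The lake-versus-fabric distinction in point 2 can be sketched in a few lines: the stores are where the data goes, and the fabric is the set of policies applied on top, in the same way for every store. The example below is only an illustration under that reading. The `mask` and `governed` helpers and the sample rows are hypothetical, and a real fabric would cover far more than field masking.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]
Policy = Callable[[Row], Row]


def mask(*sensitive: str) -> Policy:
    """Build a policy that hides the named fields wherever they appear."""
    def apply(row: Row) -> Row:
        return {k: ("***" if k in sensitive else v) for k, v in row.items()}
    return apply


def governed(rows: Iterable[Row], policies: List[Policy]) -> List[Row]:
    """Apply every fabric-level policy to rows from any source."""
    result = []
    for row in rows:
        for policy in policies:
            row = policy(row)  # each policy returns a new row
        result.append(row)
    return result


# Where the data "goes": two stores with different shapes (both hypothetical).
lake_rows = [{"user": "c1", "email": "a@example.com", "clicks": 14}]
crm_rows = [{"user": "c1", "email": "a@example.com", "phone": "555-0100"}]

# What the fabric adds: one policy set, defined once, applied everywhere.
policies = [mask("email", "phone")]
lake_view = governed(lake_rows, policies)
crm_view = governed(crm_rows, policies)

print(lake_view)
print(crm_view)
```

The policy is defined once and does not care which store a row came from, while the underlying rows are left unchanged.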
Solutions: HPE Data Fabric, Talend, TIBCO, Altair Engineering
What problem does it solve:
Data Fabric weaves together all the data sources across the organization, whether they are on-premises or in the cloud.