Data Technology Trend #5: Democratized

LAKSHMI VENKATESH
Jun 13, 2021

This article is part of the multi-part series Data Technology Trends (parent article). Previous article: Data Technology Trend #4: Decentralized (part 1). Next article: Data Technology Trend #4: Decentralized (part 2).

Democratization of data begins with a strong DataOps function in the enterprise.

Trend #5.1: DataOps

CI/CD, the DevOps process, takes care of Develop -> Build -> Test -> Deploy (automated or not) -> Run. Once the application is deployed, it “runs” until the next change is deployed. There can be any number of minor and major releases in progress, yet the running application is not affected until a new release is actually deployed. Data is different: change is continuous and the goalpost keeps moving. From the moment data is created, changes to it (except for immutable data) keep happening. An efficient DataOps function is therefore imperative. Depending on the organization’s culture and size, DataOps can be as simple as a set of reports and monitoring tools. Bigger organizations and internet enterprises need a separate DataOps routine and a team built to own this function. Many firms take DevOps and/or DevSecOps seriously, but DataOps is left to the business and lives in a framework or in documentation rather than being treated as a real “function”. With increasing cloud adoption and the use of modern data platforms, DataOps is not optional but necessary.
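To make the contrast concrete, here is a minimal Python sketch of a check that runs on every incoming batch rather than once per release. The field names, the expected types, and the `check_batch` helper are hypothetical and chosen only for illustration; a real DataOps setup would likely use dedicated monitoring or validation tooling.

```python
from typing import Any

# Hypothetical expectations for an incoming record (illustration only).
EXPECTED_TYPES = {"order_id": int, "amount": float, "currency": str}


def check_batch(batch: list[dict[str, Any]]) -> list[str]:
    """Return a list of human-readable problems found in one batch of records."""
    problems = []
    for i, record in enumerate(batch):
        # Fields that the batch was expected to carry but does not.
        missing = EXPECTED_TYPES.keys() - record.keys()
        if missing:
            problems.append(f"record {i}: missing fields {sorted(missing)}")
        # Fields that are present but have an unexpected type.
        for field, expected in EXPECTED_TYPES.items():
            if field in record and not isinstance(record[field], expected):
                problems.append(
                    f"record {i}: {field} is {type(record[field]).__name__}, "
                    f"expected {expected.__name__}"
                )
    return problems


# The code is deployed once, but each new batch can break in new ways.
monday = [{"order_id": 1, "amount": 9.5, "currency": "USD"}]
tuesday = [{"order_id": 2, "amount": "9.5", "currency": "USD"}, {"order_id": 3}]

print(check_batch(monday))
print(check_batch(tuesday))
```

The code did not change between the two batches, yet only the second one fails. That is the sense in which the goalpost moves for data even when no release happens.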

Difference between DevOps and DataOps Process:

DataOps is the logical extension of DevOps:

Agile + DevOps + Lean Manufacturing = DataOps

While both DevOps and DataOps rank high in the value pipeline, DataOps is also core and essential to the innovation pipeline. Hence, DataOps is not just “DevOps for data”, as it is popularly called.

Source unknown.

Your customer success is a function of data and code.

So, to reiterate, what is DataOps? DataOps is a collaborative data management practice that aims to improve:

- Communication

- Integration

- Automation of data flows

A few interesting notes and developments in the area of DataOps:

1. DataKitchen: https://www.dataopsmanifesto.org/

Increasingly, the core purpose of gathering and storing data, building data warehouses, managing data, and building big data systems is to produce analytics that deliver “value”.

DataKitchen, through its first-hand experience of working with data across organizations, has come up with a better way to develop and deliver analytics that covers the entire data life cycle, and calls it “DataOps”.

The DataOps manifesto (linked above) sets out 18 principles.

The eight challenges of Data Analytics according to DataKitchen:

1. The goalpost keeps moving

2. Data lives in silos

3. Data formats are not optimized

4. Data errors

5. Bad data ruins good reports

6. Data pipeline maintenance never ends

7. Manual process fatigue

8. The trap of hope and heroism

Seven steps to DataOps:

1. Add data and logic tests

2. Use a Version control system

3. Branch and Merge

4. Use multiple environments

5. Reuse and containerize

6. Parameterize your processing

7. Work without fear or heroism
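A few of these steps (tests, multiple environments, parameterized processing) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `RunConfig`, `CONFIGS`, `transform`, and the two check functions are hypothetical names, and the tax calculation is a made-up stand-in for real business logic, not something taken from DataKitchen.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunConfig:
    """Parameters for one pipeline run; values differ per environment."""
    environment: str
    min_amount: float
    max_rows: int
    tax_rate: float = 0.1


# Hypothetical environments: same code, different parameters.
CONFIGS = {
    "dev": RunConfig(environment="dev", min_amount=0.0, max_rows=100),
    "prod": RunConfig(environment="prod", min_amount=1.0, max_rows=1_000_000),
}


def transform(rows: list[dict], config: RunConfig) -> list[dict]:
    """Keep rows at or above the configured amount and add a taxed amount."""
    kept = [r for r in rows if r["amount"] >= config.min_amount]
    return [
        {**r, "amount_with_tax": round(r["amount"] * (1 + config.tax_rate), 2)}
        for r in kept[: config.max_rows]
    ]


def check_input_data(rows: list[dict]) -> None:
    """Data test: validate the input before processing it."""
    assert rows, "input is empty"
    assert all(r["amount"] >= 0 for r in rows), "negative amount in input"


def check_transform_logic() -> None:
    """Logic test: validate the transform on a small, known input."""
    sample = [{"amount": 0.5}, {"amount": 10.0}]
    out = transform(sample, CONFIGS["prod"])
    assert out == [{"amount": 10.0, "amount_with_tax": 11.0}]


rows = [{"amount": 0.5}, {"amount": 10.0}]
check_input_data(rows)       # data test runs on every input
check_transform_logic()      # logic test runs on every code change
print(transform(rows, CONFIGS["dev"]))
```

Keeping the parameters in a config object rather than in the code means the same `transform` can be promoted from one environment to the next without edits, and both kinds of tests can live in version control alongside it.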

For details, refer to DataKitchen’s book on DataOps.

Another interesting read: https://www.dataops.rocks/

Trend #5.2: MLOps / AIOps

MLOps stands for Machine Learning Operations. AIOps, which is much broader, stands for Artificial Intelligence Operations. Even though machine learning and AI have matured, adoption lags, largely because of governance.

MLOps Vs. AIOps:

MLOps and AIOps are not the same thing.

AIOps:

Where disparate and hybrid infrastructure is involved, as in bigger organizations, AIOps helps optimize that infrastructure.

MLOps:

MLOps is about communication between data scientists and operations teams. It combines data scientists with services designed to automate ML pipelines and to draw more valuable insights from production systems. It gives data engineers, business analysts, and operations teams reproducibility, visibility, managed access control, and the computing resources to test, train, and deploy AI algorithms. The new chasm in the machine learning development process lies in the collaboration of four major disciplines: Data Science, Data Engineering, Software Engineering, and traditional DevOps. Each of these disciplines has its own level of operations and its own requirements, with different constraints and velocity.
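As a rough illustration of the reproducibility and visibility mentioned above, the following Python sketch ties a training result to a hash of its data and to its exact parameters. It uses only the standard library and does not represent the API of any MLOps product; `fingerprint`, `train`, and `run_experiment` are hypothetical helpers, and the one-weight linear model is a deliberately tiny stand-in for a real training job.

```python
import hashlib
import json


def fingerprint(obj) -> str:
    """Stable SHA-256 hash of any JSON-serializable object."""
    payload = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def train(data: list[tuple[float, float]], learning_rate: float, epochs: int) -> float:
    """Fit y = w * x by plain gradient descent; deterministic, so reruns match."""
    w = 0.0
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= learning_rate * grad
    return w


def run_experiment(data, params: dict) -> dict:
    """Train once and return a record tying the result to its exact inputs."""
    w = train(data, **params)
    mse = sum((w * x - y) ** 2 for x, y in data) / len(data)
    return {
        "data_hash": fingerprint(data),
        "params": params,
        "metrics": {"weight": w, "mse": mse},
    }


data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
params = {"learning_rate": 0.05, "epochs": 200}
record = run_experiment(data, params)
print(json.dumps(record, indent=2))
```

Because the record stores the data hash next to the parameters and metrics, anyone on the data science, engineering, or operations side can tell whether two runs used the same inputs before comparing their results.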

Another technology that belongs to this trend, and that is discussed under several other trends, is the Data Catalog. MLOps can be implemented effectively using Databricks MLflow.

Further references:

  1. https://www.rackspace.com/solve/aiops-and-mlops-not-same-thing

For other articles refer to luxananda.medium.com.

LAKSHMI VENKATESH

I learn by Writing; Data, AI, Cloud and Technology. All views expressed here are my own and do not represent the views of the firm I work for.