Bring DevOps To Data Science With MLOps

20 July 2024

1

This article was published as a part of the Data Science Blogathon.

MLOps is the intersection of Machine Learning, DevOps and Data Engineering.

Introduction:

As the data science and machine learning models are becoming more capable of solving complex business problems, it is evident that many businesses continue to invest in building their capabilities in this field to deliver business value to their users. This trend is also driven by the fact that managing resources such as huge data, infrastructure, compute power, pretrained models, etc have become very inexpensive and on-demand which has enabled teams to go from prototype to production in a very short time.

However, the businesses have also realized that the real challenge isn’t in building machine learning models but on the operational side of things, especially in production. How do we ensure the different pieces of functionalities built by various teams integrate seamlessly? How do make sure that the model in production is not drifting? How do we automatically validate the data from various sources and check for standards? How do we refresh models in production on the fly?

These and many more such questions are not new, and we have seen similar stuff on the typical web development over the years. There have been various methodologies and approaches developed and one such thing is DevOps which has to a large extent addressed these challenges and is the norm currently. So, is there a way to leverage its power in data science?. Let’s explore.

DevOps Lifecycle:

DevOps refers to a software development method and a collaborative way of developing and deploying software. It is a practice that allows a single team to manage the entire application development life cycle, that is, development, testing, deployment, operations. DevOps helps to establish cross-functional teams that share responsibility for maintaining the system that runs the software and prepares the software to run on that system with increased quality feedback and automation issues. At a high level, there are different stages of the DevOps lifecycle.

Here is how all the above components come together to create a seamless development, integration, testing and deployment.

Source: Edureka

Data Science Lifecycle:

Similar to typical Software Development Lifecycle (SDLC), most data science projects do have a process that outlines the major stages of execution. The lifecycle is not linear meaning each stage may undergo multiple iterations till we reach satisfying results which are acceptable technically and by the business. The various stages of the lifecycle can be summarized below.

Machine Learning Operations (MLOps):

Now, that we have a fair understanding of both DevOps and the Data science lifecycle, is there a way to leverage the powerful features of DevOps like automation, workflows, and test automation in data science projects?

Certainly, yes and that’s what we will explore in the next sections where we will bring these approaches together. In today’s world, this is called Machine Learning Operations (MLOps) and in short, it is the intersection of Machine Learning, DevOps, and Data Engineering. Let’s understand this better with an example.

Any Prerequisites?

To understand MLOps, you will need very basic knowledge of model building in python and a GitHub account. I will be using Visual Studio Code as editor; you can use any editor of your choice as long as you are comfortable with it.

Getting Started:

We will be using Kaggle’s South Africa Heart Disease dataset. Here is the data dictionary for reference. Our objective is to predict chd i.e., coronary heart disease (yes=1 or no=0). A simple binary classification model.

sbp: systolic blood pressure
tobacco: cumulative tobacco (kg)
ldl: low-density lipoprotein cholesterol
adiposity:
famhist: family history of heart disease (Present=1, Absent=0)
typea: type-A behavior
obesity
alcohol: current alcohol consumption
age: age at onset
chd: coronary heart disease (yes=1 or no=0)

Let’s load the dataset and take a quick look at data and some basic stats. We will also drop ‘famhist‘ variable for now and will experiment with it at a later stage

Python Code:

Bring DevOps To Data Science With MLOps

Introduction:

DevOps Lifecycle:

Data Science Lifecycle:

Machine Learning Operations (MLOps):

Any Prerequisites?

Getting Started:

Run Local AWS Cloud Stack using LocalStack on Linux

Learn Terraform Automation in 3 days using Video Courses

How To Expose Ansible AWX Service using Nginx Ingress

LEAVE A REPLY Cancel reply

Most Popular

Verizon will basically pay you to buy the new, awesome Barbie phone

8 Best VPNs for Apple TV in 2024: Fast & Secure by Penka Hristovska

Samsung offers free screen replacements for users still suffering green line issues

7 Best Free Antiviruses for Mac in 2024: Are They Any Good? by Katarina Glamoslija

Recent Comments

EDITOR PICKS

Verizon will basically pay you to buy the new, awesome Barbie phone

8 Best VPNs for Apple TV in 2024: Fast & Secure by Penka Hristovska

Samsung offers free screen replacements for users still suffering green line issues

POPULAR POSTS

Verizon will basically pay you to buy the new, awesome Barbie phone

8 Best VPNs for Apple TV in 2024: Fast & Secure by Penka Hristovska

Samsung offers free screen replacements for users still suffering green line issues

POPULAR CATEGORY

ABOUT US

FOLLOW US