MLOps - Continuous Training Pipeline by Sunse KwonMLOps - Continuous Training Pipeline by Sunse Kwon

MLOps - Continuous Training Pipeline

Sunse Kwon

Completed work

Data Engineer

Data Scientist

ML Engineer

scikit-learn

Artificial Intelligence

MLOps - Continuous Training Pipeline

This Project showcase Automation of ML workflow from scratch.

ETL pipeline from API to Master tables.

Feature Engineering Pipeline.

Automated ML model training pipeline.

Continuous Training: Transition staging model to production model, followed by triggering github action to deploy model to AWS Sagemaker Endpoint.

MLOps Level 1 - System Architecture

Check project repository

Technologies Used

Apache Airflow: Workflow orchestration

MLflow: Model tracking and model lifecycle management

AWS SageMaker: Model deployment

Docker: Containerization

GitHub Actions: CI/CD pipeline

Tutorial

Watch the step-by-step tutorial on YouTube: MLOps Level 1 - Continuous Training Pipeline Tutorial

Prerequisites

Python 3.12+

Docker and Docker Compose

AWS Account with SageMaker access

API credentials for weather data

Git

Project Structure

.
├── dags/              # Airflow DAGs for workflow orchestration
├── mlflow/            # MLflow tracking and model management
├── config/            # Configuration files
├── plugins/           # Custom Airflow plugins
├── images/            # Documentation images
└── requirements.txt   # Python dependencies

Pipeline Components

1. ETL Pipeline

Fetches weather data from API

Transforms and loads data into master tables

Handles data validation and error checking

2. Feature Engineering

Processes raw data into model features

Handles missing values and outliers

Creates time-based features

3. Model Training

Automated training pipeline

Model versioning with MLflow

Performance tracking and comparison

4. Model Deployment

GitHub Actions workflow for automated deployment: deploy-model-sagemaker

Monitoring and Maintenance

Monitor pipeline health through Airflow UI

Track model performance in MLflow

Set up alerts for pipeline failures

Regular model retraining based on date

Troubleshooting

Common issues and solutions:

Pipeline failures

Check Airflow logs

Verify API connectivity

Ensure AWS credentials are valid

Model deployment issues

Verify SageMaker endpoint status

Check model artifacts in MLflow

Review deployment logs

Reference

Input Data Example (Weather API Response)

{
    "response": {
        "header": {
            "resultCode": "00",
            "resultMsg": "NORMAL_SERVICE"
        },
        "body": {
            "dataType": "JSON",
            "items": {
                "item": [
                    {
                        "baseDate": "20250313",
                        "baseTime": "0000",
                        "category": "PTY",
                        "nx": 86,
                        "ny": 106,
                        "obsrValue": "0"
                    },
                    {
                        "baseDate": "20250313",
                        "baseTime": "0000",
                        "category": "RN1",
                        "nx": 86,
                        "ny": 106,
                        "obsrValue": "0"
                    },
                    {
                        "baseDate": "20250313",
                        "baseTime": "0000",
                        "category": "T1H",
                        "nx": 86,
                        "ny": 106,
                        "obsrValue": "6.4"
                    },
                ]
            },
            "pageNo": 1,
            "numOfRows": 10,
            "totalCount": 8
        }
    }
}

Data Model (ERD) - Master Tables

Check project repository

Like this project

Completed work

Posted May 6, 2025

- Feature Engineering Pipeline from API to Feature Table - Automated ML Training Pipeline - Continuous Deployment to AWS Sagemaker

Likes

Views

Timeline

Apr 3, 2025 - Apr 30, 2025