A production-ready MLOps implementation of the Titanic survival prediction model, demonstrating modern ML engineering practices including automated training, prediction pipelines, and model versioning.
This project implements an end-to-end MLOps pipeline for the classic Titanic survival prediction problem, featuring:
- Automated model retraining on data/code changes
- Automated prediction pipeline
- Model versioning and experiment tracking with MLflow
- Champion/Challenger model deployment strategy
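A minimal, hypothetical sketch of how the Champion/Challenger decision could be made against the MLflow model registry (the model name, metric, and selection rule are illustrative, not the actual `ARISA_DSML/resolve.py` logic):

```python
# Hypothetical Champion/Challenger resolution against the MLflow registry.
# Model name and metric are assumptions for illustration only.
from mlflow.tracking import MlflowClient

client = MlflowClient()

def pick_serving_version(model_name: str, metric: str = "f1_score") -> str:
    """Return the registered model version whose training run scored best."""
    versions = client.search_model_versions(f"name = '{model_name}'")
    scored = []
    for v in versions:
        run = client.get_run(v.run_id)
        if metric in run.data.metrics:
            scored.append((run.data.metrics[metric], int(v.version)))
    # Highest metric wins; ties go to the newest version (the challenger).
    best_metric, best_version = max(scored)
    return str(best_version)
```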
The system consists of two main pipelines:
- Training Pipeline: Automatically retrains the model when training data or code changes
- Prediction Pipeline: Generates new predictions when a model is updated
Project structure:

```
├── ARISA_DSML/      # Main package directory
│   ├── config.py    # Configuration and constants
│   ├── predict.py   # Prediction pipeline
│   ├── preproc.py   # Data preprocessing
│   ├── resolve.py   # Model resolution logic
│   └── train.py     # Training pipeline
├── data/            # Data directory
├── .mlflow/         # MLflow tracking
├── models/          # Model artifacts
├── notebooks/       # Development notebooks
└── reports/         # Generated analysis
```
Prerequisites:

- Python 3.11
- Kaggle account and API key
To set up the project:

- Clone the repository:

```bash
git clone <your-repo-url>
cd ARISA-MLOps
```

- Create and activate a virtual environment:

```bash
py -3.11 -m venv .venv
# Windows
.\.venv\Scripts\activate
# Mac/Linux
source .venv/bin/activate
```

- Install dependencies:

```bash
make requirements
```

- Set up Kaggle authentication by placing your `kaggle.json` in:
  - Windows: `C:\Users\USERNAME\.kaggle`
  - Mac/Linux: `/home/username/.config/kaggle`
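To confirm the credentials work before running the pipelines, a quick check along these lines can help (illustrative; the `data/raw` target path is an assumption, and the `make preprocess` target performs the real download):

```python
# Sanity check that the Kaggle API can read kaggle.json and reach the
# Titanic competition. The download path below is an assumption.
from kaggle.api.kaggle_api_extended import KaggleApi

api = KaggleApi()
api.authenticate()  # raises if kaggle.json is missing or invalid
api.competition_download_files("titanic", path="data/raw")
```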
For remote MLflow tracking and CI automation, also configure:

- AWS RDS (Metadata Store):
  - Create a PostgreSQL database
  - Configure public access
  - Note the connection details
- AWS S3 (Artifact Store):
  - Create a bucket
  - Configure appropriate access
- GitHub Secrets: add the following secrets to your repository:
  - `KAGGLE_KEY`
  - `WORKFLOW_PAT`
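Once RDS and S3 exist, local code and CI jobs only need to point MLflow at the tracking backend. A minimal sketch, assuming placeholder URIs and an experiment name that you should replace with your own:

```python
# Illustrative only: point MLflow at the remote tracking setup.
# Replace the placeholder host/credentials with your RDS and S3 details.
import mlflow

# Either a tracking-server URL...
mlflow.set_tracking_uri("http://<tracking-server-host>:5000")
# ...or the PostgreSQL backend store directly (artifacts then go to the
# S3 location configured for the experiment):
# mlflow.set_tracking_uri("postgresql://<user>:<password>@<rds-host>:5432/mlflow")

mlflow.set_experiment("titanic-survival")  # experiment name is an assumption

with mlflow.start_run(run_name="connectivity-check"):
    mlflow.log_param("smoke_test", True)
```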
The training pipeline automatically triggers when:
- Training data changes
- Model code changes
- Manual workflow dispatch
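Under the hood, the training step fits the model and logs parameters, metrics, and the fitted model to MLflow, which is what feeds the Champion/Challenger comparison. A simplified, hypothetical sketch (the estimator, paths, and registered model name are assumptions, not the exact `ARISA_DSML/train.py` code):

```python
# Simplified, hypothetical view of what a training run logs to MLflow.
import mlflow
import mlflow.sklearn
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("data/processed/train.csv")  # path is an assumption
X, y = df.drop(columns=["Survived"]), df["Survived"]
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

with mlflow.start_run():
    model = RandomForestClassifier(n_estimators=200, random_state=42)
    model.fit(X_tr, y_tr)
    mlflow.log_metric("f1_score", f1_score(y_val, model.predict(X_val)))
    # Registering the model makes it visible to the Champion/Challenger check.
    mlflow.sklearn.log_model(model, "model", registered_model_name="titanic-model")
```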
Run the training pipeline manually with:

```bash
make train
```

The prediction pipeline runs when:
- A new model is trained
- Prediction code changes
- Manual workflow dispatch
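The prediction step loads the currently promoted model from the registry and scores new data. A hypothetical sketch (the registered model name, column names, and file paths are assumptions):

```python
# Illustrative prediction step: load the registered model and score new data.
import mlflow.pyfunc
import pandas as pd

model_uri = "models:/titanic-model/latest"        # assumed registered name
model = mlflow.pyfunc.load_model(model_uri)

test_df = pd.read_csv("data/processed/test.csv")  # path is an assumption
preds = model.predict(test_df)

pd.DataFrame({"PassengerId": test_df["PassengerId"], "Survived": preds}).to_csv(
    "reports/predictions.csv", index=False        # output path is an assumption
)
```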
Run the prediction pipeline manually with:

```bash
make predict
```

For local development and testing:

```bash
# Download and preprocess data
make preprocess

# Train model
make train

# Generate predictions
make predict
```

Access the MLflow UI through your configured tracking server to:
- Compare experiments
- View model metrics
- Access model artifacts
- Monitor model versions
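The same comparisons can also be scripted against the tracking server, for example (experiment and metric names are assumptions):

```python
# Illustrative: list recent runs ordered by the tracked metric.
import mlflow

runs = mlflow.search_runs(
    experiment_names=["titanic-survival"],   # assumed experiment name
    order_by=["metrics.f1_score DESC"],
    max_results=5,
)
print(runs[["run_id", "metrics.f1_score", "start_time"]])
```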
To contribute:

- Fork the repository
- Create a feature branch
- Commit your changes
- Push to the branch
- Create a Pull Request
License: MIT

Author: Piotr Gryko

Acknowledgments:

- Original Titanic dataset from Kaggle
- MLOps architecture inspired by ml-ops.org