ARISA-MLOps: Titanic Survival Prediction

A production-ready MLOps implementation of the Titanic survival prediction model, demonstrating modern ML engineering practices including automated training, prediction pipelines, and model versioning.

Project Overview

This project implements an end-to-end MLOps pipeline for the classic Titanic survival prediction problem, featuring:

  • Automated model retraining on data/code changes
  • Automated prediction pipeline
  • Model versioning and experiment tracking with MLflow
  • Champion/Challenger model deployment strategy

Architecture

The system consists of two main pipelines:

  1. Training Pipeline: Automatically retrains the model when training data or code changes
  2. Prediction Pipeline: Generates new predictions when a model is updated
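
Promotion between the two pipelines follows the Champion/Challenger strategy: a newly trained challenger replaces the serving champion only if it scores at least as well. A minimal sketch of that resolution step (presumably the logic encapsulated in resolve.py), assuming MLflow registry aliases; the model name and metric name are illustrative:

from mlflow.tracking import MlflowClient
from mlflow.exceptions import MlflowException

client = MlflowClient()
MODEL_NAME = "titanic-survival"  # illustrative registered-model name

def val_accuracy(version):
    # Read the validation metric logged by the run that produced this version.
    return client.get_run(version.run_id).data.metrics["val_accuracy"]

challenger = client.get_model_version_by_alias(MODEL_NAME, "challenger")
try:
    champion = client.get_model_version_by_alias(MODEL_NAME, "champion")
except MlflowException:
    champion = None  # no champion yet: the first model is promoted by default

if champion is None or val_accuracy(challenger) >= val_accuracy(champion):
    client.set_registered_model_alias(MODEL_NAME, "champion", challenger.version)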

Project Structure

├── ARISA_DSML/         # Main package directory
│   ├── config.py       # Configuration and constants
│   ├── predict.py      # Prediction pipeline
│   ├── preproc.py      # Data preprocessing
│   ├── resolve.py      # Model resolution logic
│   └── train.py        # Training pipeline
├── data/               # Data directory
├── .mlflow/            # MLflow tracking
├── models/             # Model artifacts
├── notebooks/          # Development notebooks
└── reports/            # Generated analysis
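
config.py centralizes the paths and names the pipelines share. A hypothetical excerpt for orientation (the actual constants are defined in the repository):

from pathlib import Path

# Hypothetical excerpt; the real config.py defines the project's own values.
PROJ_ROOT = Path(__file__).resolve().parents[1]
DATA_DIR = PROJ_ROOT / "data"
MODELS_DIR = PROJ_ROOT / "models"
REPORTS_DIR = PROJ_ROOT / "reports"
MODEL_NAME = "titanic-survival"  # registered model name in MLflow
TARGET = "Survived"              # prediction target column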

Setup

Prerequisites

  • Python 3.11
  • Kaggle account and API key

Local Development

  1. Clone the repository:
git clone <your-repo-url>
cd ARISA-MLOps
  2. Create and activate a virtual environment:
# Windows
py -3.11 -m venv .venv
.\.venv\Scripts\activate
# Mac/Linux
python3.11 -m venv .venv
source .venv/bin/activate
  3. Install dependencies:
make requirements
  4. Set up Kaggle authentication by placing your kaggle.json in:
    • Windows: C:\Users\USERNAME\.kaggle
    • Mac/Linux: /home/username/.config/kaggle
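
As a quick sanity check that the credentials are picked up, a sketch using the official kaggle package; the download path is illustrative:

# Verify Kaggle authentication and pull the competition data.
from kaggle.api.kaggle_api_extended import KaggleApi

api = KaggleApi()
api.authenticate()  # reads kaggle.json from the locations above
api.competition_download_files("titanic", path="data/raw")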

Cloud Infrastructure Setup

  1. AWS RDS (Metadata Store):
    • Create a PostgreSQL database
    • Configure public access
    • Note the connection details
  2. AWS S3 (Artifact Store):
    • Create a bucket
    • Configure appropriate access
  3. GitHub Secrets: add the following secrets to your repository:
    • KAGGLE_KEY
    • WORKFLOW_PAT
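
With the database and bucket in place, the MLflow client can be pointed at them. A minimal sketch assuming direct database access as the backend store; the credentials, endpoint, experiment name, and bucket are placeholders:

import mlflow
from mlflow.exceptions import MlflowException

# Placeholder connection details -- substitute your RDS endpoint and bucket.
TRACKING_URI = "postgresql://mlflow_user:PASSWORD@YOUR-RDS-ENDPOINT:5432/mlflow"
ARTIFACT_ROOT = "s3://YOUR-ARTIFACT-BUCKET/mlflow"

mlflow.set_tracking_uri(TRACKING_URI)
try:
    # Create the experiment with the S3 artifact root on first use.
    mlflow.create_experiment("titanic-survival", artifact_location=ARTIFACT_ROOT)
except MlflowException:
    pass  # experiment already exists
mlflow.set_experiment("titanic-survival")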

Usage

Training Pipeline

The training pipeline automatically triggers when:

  • Training data changes
  • Model code changes
  • A workflow is dispatched manually

To run it locally:
make train
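
Internally, a training run logs metrics and registers the model with MLflow. A minimal sketch of the shape of such a run, assuming scikit-learn and a preprocessed, fully numeric training file; the paths, model choice, and names are illustrative rather than the repository's actual train.py:

# A sketch only: the real train.py follows the repository's own
# preprocessing and model choice.
import mlflow
import mlflow.sklearn
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("data/processed/train.csv")  # hypothetical preprocessed file
X, y = df.drop(columns=["Survived"]), df["Survived"]
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=42)

with mlflow.start_run():
    model = RandomForestClassifier(n_estimators=200, random_state=42)
    model.fit(X_tr, y_tr)
    acc = accuracy_score(y_val, model.predict(X_val))
    mlflow.log_metric("val_accuracy", acc)
    # Registering the model produces the new challenger candidate.
    mlflow.sklearn.log_model(model, "model", registered_model_name="titanic-survival")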

Prediction Pipeline

The prediction pipeline runs when:

  • A new model is trained
  • Prediction code changes
  • A workflow is dispatched manually

To run it locally:
make predict
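
A prediction run amounts to loading the current champion from the registry and scoring the test set. A minimal sketch, assuming MLflow registry aliases; the model name, alias, and file paths are illustrative:

import mlflow
import pandas as pd

# "titanic-survival" and the "champion" alias are assumptions; the test
# file is assumed to be preprocessed into the model's feature columns.
model = mlflow.pyfunc.load_model("models:/titanic-survival@champion")

test = pd.read_csv("data/processed/test.csv")
preds = model.predict(test)
pd.DataFrame({"PassengerId": test["PassengerId"], "Survived": preds}).to_csv(
    "reports/predictions.csv", index=False
)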

Local Development

For local development and testing:

# Download and preprocess data
make preprocess

# Train model
make train

# Generate predictions
make predict

MLflow Tracking

Access the MLflow UI through your configured tracking server to:

  • Compare experiments
  • View model metrics
  • Access model artifacts
  • Monitor model versions
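
The same comparisons are available programmatically, for example (experiment and metric names are illustrative):

import mlflow

# Top five runs by validation accuracy; names are illustrative.
runs = mlflow.search_runs(
    experiment_names=["titanic-survival"],
    order_by=["metrics.val_accuracy DESC"],
    max_results=5,
)
print(runs[["run_id", "metrics.val_accuracy"]])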

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Commit your changes
  4. Push to the branch
  5. Create a Pull Request

License

MIT

Contact

Piotr Gryko

Acknowledgments

  • Original Titanic dataset from Kaggle
  • MLOps architecture inspired by ml-ops.org
