This repository contains the course project for Math5470, a mathematics course exploring fundamental techniques in machine learning.
As part of the course requirements, students work individually or in small teams to complete a hands-on machine learning project. The project involves data analysis, model development, and result interpretation, with an emphasis on scientific reasoning rather than raw performance.
Our team chose the Kaggle M5 Forecasting – Accuracy competition.
Objective:
To predict daily unit sales of Walmart retail products over a 28-day horizon using hierarchical time series data from multiple stores across California, Texas, and Wisconsin.
Dataset Features:
- Item-level, department, and store-level data
- Explanatory variables such as prices, promotions, calendar events, and special days
Goal:
Improve forecast accuracy by combining traditional time-series approaches with modern machine learning techniques.
Competition link:
Kaggle M5 Forecasting – Accuracy
Clone this repository and create the Conda environment using the provided `.yml` file:

```bash
git clone <your-repo-link>
cd <your-repo-folder>
conda env create -f environment.yml
conda activate math
```

Download the M5 Forecasting dataset from Kaggle and place it in the following directory structure:
```text
Math5470/
|-- calendar.csv
|-- sales_train_validation.csv
|-- sell_prices.csv
|-- sample_submission.csv
```
Dataset download link: [🔗 Link]
Make sure the files are unzipped and located in the `Math5470/` folder before running any training or analysis scripts.
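Before running anything, it can help to verify that all four CSVs are in place. A minimal sketch (the folder name and file list follow the layout above; adjust `data_dir` if your path differs):

```python
from pathlib import Path

# Expected M5 input files, per the directory layout above.
EXPECTED_FILES = [
    "calendar.csv",
    "sales_train_validation.csv",
    "sell_prices.csv",
    "sample_submission.csv",
]

def missing_files(data_dir="Math5470"):
    """Return the expected files that are not present in data_dir."""
    root = Path(data_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]

if __name__ == "__main__":
    missing = missing_files()
    if missing:
        print("Missing data files:", ", ".join(missing))
    else:
        print("All M5 data files found.")
```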
Download the pretrained model or checkpoints (if available) and place them in the following directory:
```text
Math5470/
|-- model.lgb
|-- model_meta.json
```
Model download link: [🔗 Link]
Ensure that the model file name and path match the configuration in your training or inference scripts.
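The exact schema of `model_meta.json` depends on the training script; as an illustration only (the field names `features` and `best_iteration` below are assumptions, and the LightGBM booster itself would be loaded separately, e.g. via `lgb.Booster(model_file="Math5470/model.lgb")`), the metadata can be read like this:

```python
import json

def load_model_meta(path="Math5470/model_meta.json"):
    """Read model metadata from JSON.

    The 'features' and 'best_iteration' keys are illustrative assumptions,
    not necessarily the repo's actual schema.
    """
    with open(path) as f:
        meta = json.load(f)
    return meta.get("features", []), meta.get("best_iteration")
```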
Explore the dataset patterns and insights using our interactive EDA notebook:

```bash
jupyter notebook EDA.ipynb
```

The analysis includes sales trends, seasonal patterns, and feature correlations to guide model development.
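One quick check for the weekly seasonality the notebook examines is a trailing 7-day moving average of unit sales. A pure-Python sketch on synthetic data (the notebook itself works on the real series):

```python
def rolling_mean(series, window=7):
    """Trailing moving average; positions with fewer than `window` points get None."""
    out = []
    for i in range(len(series)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(series[i + 1 - window:i + 1]) / window)
    return out

# Synthetic daily sales with a weekend-like spike every 7th day.
sales = [10, 10, 10, 10, 10, 20, 30] * 4
smoothed = rolling_mean(sales)
```

On a series with period 7, every full window covers one whole cycle, so the smoothed values are constant; on real data, deviations from flatness reveal trend and event effects.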
Run the training script to start model training:

```bash
python train.py
```

Make sure the dataset and environment are properly set up before running this command. Training logs and checkpoints will be saved automatically in the designated output directory.
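M5-style gradient-boosting pipelines usually derive lag features from the sales history. A hedged sketch of a simple lag transform (the actual features built in `train.py` may differ):

```python
def lag_feature(series, lag):
    """Shift a series by `lag` steps so position t sees the value at t - lag.

    The first `lag` positions have no history and are filled with None.
    """
    n = len(series)
    return [None] * min(lag, n) + list(series[: max(n - lag, 0)])

# Example: a 28-day lag aligns each day with the same weekday four weeks earlier.
history = list(range(1, 8))      # 7 days of toy sales: 1..7
lag_2 = lag_feature(history, 2)  # [None, None, 1, 2, 3, 4, 5]
```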
After training, use the inference script to generate predictions:
```bash
python infer.py
```

Evaluate the model performance using the provided evaluation script:

```bash
python eval.py
```

We also provide scripts for training and evaluating an XGBoost model. You can follow them and write your own method. Feel free to try it!
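For context on scoring: the M5 Accuracy competition uses WRMSSE, a weighted average of per-series RMSSE values. A simplified per-series sketch (the official metric adds hierarchical aggregation and weights, which `eval.py` may or may not replicate):

```python
import math

def rmsse(train, actual, forecast):
    """Root mean squared scaled error for one series: forecast MSE over the
    horizon, scaled by the in-sample one-step naive-forecast MSE."""
    h = len(actual)
    num = sum((a - f) ** 2 for a, f in zip(actual, forecast)) / h
    den = sum((train[t] - train[t - 1]) ** 2 for t in range(1, len(train))) / (len(train) - 1)
    return math.sqrt(num / den)
```

A score of 1.0 means the forecast errs as much as a naive "repeat yesterday" forecast did in-sample, and the XGBoost scripts below can be compared on the same footing.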
```bash
python train_xgboost.py
python infer_xgboost.py
```

| Name | Contribution |
|---|---|
| Weizhen Bian | Performed initial data cleaning and feature extraction; implemented the main model, including training, inference, and evaluation; and contributed to writing and editing the final report. |
| Yiming Li | Conducted exploratory data analysis to identify sales patterns with key visualizations; implemented other models for comparison; aided in the ablation study; and contributed to writing and editing the final report. |
| Pengyu Chen | Conducted exploratory data analysis to identify sales patterns with key visualizations; supported data preprocessing, contributed to modeling via feature engineering, and aided in drafting the EDA section. |
| Jiahao Pan | Conducted exploratory data analysis to identify sales patterns with key visualizations; contributed to writing and editing the final report. |
| Boyi Kang | |