RizeComputerScience/DSMII-Credit-Card-Clustering

Credit Card Customer Clustering Tutorial

A hands-on tutorial demonstrating K-means clustering to segment credit card customers into actionable business groups using the UCI Credit Card Default dataset.

Overview

This tutorial walks through the complete process of customer segmentation using unsupervised machine learning. You'll learn to:

  • Transform raw transactional data into behavioral features
  • Apply K-means clustering to identify customer segments
  • Select optimal cluster numbers using the Elbow Method and Silhouette Score
  • Interpret clusters for business decision-making

Dataset

Source: UCI Credit Card Default dataset (Kaggle)

The dataset contains records for 30,000 credit card customers, each with 6 months of payment history and transaction data.

Prerequisites

  • Python 3.7 or higher
  • Jupyter Notebook or VS Code with Jupyter extension

Installation

1. Clone or Download

Clone or download this repository, then navigate to the project directory:

cd /path/to/Unit6-Practice

2. Install Dependencies

The tutorial uses the following Python packages:

  • pandas
  • numpy
  • matplotlib
  • seaborn
  • scikit-learn

Install all dependencies using the first code cell in the notebook, or run:

pip install pandas numpy matplotlib seaborn scikit-learn

3. Verify Data File

Ensure the dataset is located at:

data/UCI_Credit_Card.csv
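
A quick way to confirm the file is in place before opening the notebook is a short check like the one below. This is only a sketch: the helper name `check_data_file` is hypothetical and not part of the tutorial, and the path is the one given above, resolved relative to the current working directory.

```python
from pathlib import Path

# Path as stated in this README; adjust it if your copy lives elsewhere.
DATA_PATH = Path("data/UCI_Credit_Card.csv")


def check_data_file(path: Path = DATA_PATH) -> bool:
    """Return True if the path points to an existing, non-empty file."""
    return path.is_file() and path.stat().st_size > 0


if check_data_file():
    print(f"Found dataset at {DATA_PATH}")
else:
    print(f"Dataset not found at {DATA_PATH}; check your working directory.")
```

Run it from the project root so the relative path resolves correctly.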

How to Run

Option 1: Using Jupyter Notebook

  1. Launch Jupyter Notebook:

    jupyter notebook

  2. Open credit_groups.ipynb

  3. Run cells sequentially from top to bottom using Shift + Enter

Option 2: Using VS Code

  1. Open the project folder in VS Code

  2. Open credit_groups.ipynb

  3. Select a Python kernel when prompted

  4. Run cells sequentially using the play button or Shift + Enter

Tutorial Structure

Part 1: Data Loading and Feature Engineering

  • Load the UCI Credit Card dataset
  • Create three behavioral features using RFM methodology:
    • Recency: Most recent payment status
    • Frequency: Count of on-time payments over 6 months
    • Monetary: Average monthly payment amount
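
The three features above could be derived roughly as follows. This is a minimal sketch, not the notebook's actual code. It assumes the column names used in the Kaggle CSV (`PAY_0`, `PAY_2` to `PAY_6` for monthly repayment status, `PAY_AMT1` to `PAY_AMT6` for payment amounts) and assumes a status of 0 or below counts as on time. The function name `build_rfm` and the two-row sample are made up for illustration.

```python
import pandas as pd

# Assumed column names from the Kaggle CSV (there is no PAY_1 column).
STATUS_COLS = ["PAY_0", "PAY_2", "PAY_3", "PAY_4", "PAY_5", "PAY_6"]
AMOUNT_COLS = [f"PAY_AMT{i}" for i in range(1, 7)]


def build_rfm(df: pd.DataFrame) -> pd.DataFrame:
    """Build recency, frequency, and monetary features per customer."""
    rfm = pd.DataFrame(index=df.index)
    # Recency: most recent repayment status (PAY_0 is the latest month).
    rfm["recency"] = df["PAY_0"]
    # Frequency: months with status <= 0, treated here as "on time".
    rfm["frequency"] = (df[STATUS_COLS] <= 0).sum(axis=1)
    # Monetary: average monthly payment amount.
    rfm["monetary"] = df[AMOUNT_COLS].mean(axis=1)
    return rfm


# Two made-up customers, only to show the shape of the output.
sample = pd.DataFrame({
    "PAY_0": [0, 2], "PAY_2": [0, 2], "PAY_3": [0, 0],
    "PAY_4": [0, -1], "PAY_5": [0, 1], "PAY_6": [0, 0],
    "PAY_AMT1": [1000, 0], "PAY_AMT2": [1000, 0], "PAY_AMT3": [1000, 300],
    "PAY_AMT4": [1000, 0], "PAY_AMT5": [1000, 0], "PAY_AMT6": [1000, 300],
})
rfm = build_rfm(sample)
print(rfm)
```

On the real data, `sample` would be replaced by `pd.read_csv("data/UCI_Credit_Card.csv")`.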

Part 2: Data Preparation

  • Visualize feature distributions
  • Standardize features using StandardScaler
  • Prepare data for clustering
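
Standardization matters here because K-means relies on Euclidean distance, so a feature measured in currency units would otherwise dominate one measured in counts. The sketch below illustrates the scaling step on a small made-up feature table; the column names and values are assumptions for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Made-up feature table standing in for the engineered RFM features.
rfm = pd.DataFrame({
    "recency": [0, 2, -1, 1, 0],
    "frequency": [6, 3, 6, 4, 5],
    "monetary": [1000.0, 100.0, 5200.0, 750.0, 2300.0],
})

# Rescale each column to zero mean and unit variance.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(rfm)

print(np.round(X_scaled.mean(axis=0), 6))  # per-column means, approximately 0
print(np.round(X_scaled.std(axis=0), 6))   # per-column standard deviations, approximately 1
```

Keeping the fitted `scaler` around makes it possible to map cluster centers back to the original units later with `scaler.inverse_transform`.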

Part 3: Selecting Optimal K

  • Test K values from 2 to 10
  • Apply the Elbow Method (inertia analysis)
  • Calculate Silhouette Scores
  • Visualize both metrics to select optimal K
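
The K-selection loop described above can be sketched as follows. To stay self-contained, the example uses synthetic data from `make_blobs` in place of the scaled customer features; the helper name `evaluate_k` and the fixed `random_state` are illustrative assumptions, not taken from the notebook.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the standardized feature matrix.
X, _ = make_blobs(n_samples=300, centers=4, n_features=3, random_state=42)


def evaluate_k(X, k_values, random_state=42):
    """Fit K-means for each k and return {k: (inertia, silhouette)}."""
    results = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        labels = model.fit_predict(X)
        results[k] = (model.inertia_, silhouette_score(X, labels))
    return results


scores = evaluate_k(X, range(2, 11))
for k, (inertia, sil) in scores.items():
    print(f"k={k:2d}  inertia={inertia:10.1f}  silhouette={sil:.3f}")

# Highest silhouette is one candidate; compare it against the elbow in inertia.
best_k = max(scores, key=lambda k: scores[k][1])
print("Best k by silhouette:", best_k)
```

On the full 30,000-row dataset the silhouette computation is expensive because it depends on pairwise distances; passing `sample_size` to `silhouette_score` is one way to keep it tractable.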

Part 4: Final Clustering

  • Train final K-means model with selected K value
  • Assign cluster labels to all customers
  • Analyze cluster characteristics
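
A rough sketch of the final fit and cluster profiling is shown below. It uses randomly generated features rather than the real dataset, and `K = 5` is only a placeholder suggested by the five segments named in Part 5; the actual value should come from the Part 3 analysis.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Randomly generated stand-in for the engineered RFM features.
rng = np.random.default_rng(0)
rfm = pd.DataFrame({
    "recency": rng.integers(-2, 4, size=200),
    "frequency": rng.integers(0, 7, size=200),
    "monetary": rng.gamma(2.0, 1500.0, size=200),
})

K = 5  # placeholder; use the value selected in Part 3

# Fit on standardized features, then attach labels to the original table.
X_scaled = StandardScaler().fit_transform(rfm)
kmeans = KMeans(n_clusters=K, n_init=10, random_state=42)
rfm["cluster"] = kmeans.fit_predict(X_scaled)

# Profile each cluster in the original (unscaled) units.
profile = rfm.groupby("cluster").agg(
    customers=("recency", "size"),
    recency=("recency", "mean"),
    frequency=("frequency", "mean"),
    monetary=("monetary", "mean"),
)
print(profile.round(2))
```

Profiling in the original units rather than the scaled ones keeps the per-cluster averages readable when assigning business labels.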

Part 5: Business Interpretation

  • Visualize customer segments
  • Interpret clusters as business segments:
    • Champions: High payment amounts, always on time
    • Solid Performers: Consistent on-time payers
    • High Rollers: Large payments but inconsistent
    • At-Risk: Declining payment behavior
    • Problem Accounts: Frequent late payments, low amounts
