A hands-on tutorial demonstrating K-means clustering to segment credit card customers into actionable business groups using the UCI Credit Card Default dataset.
This tutorial walks through the complete process of customer segmentation using unsupervised machine learning. You'll learn to:
- Transform raw transactional data into behavioral features
- Apply K-means clustering to identify customer segments
- Select optimal cluster numbers using the Elbow Method and Silhouette Score
- Interpret clusters for business decision-making
Source: UCI Kaggle Credit Card Default Data
The dataset contains 30,000 credit card customers with 6 months of payment history and transaction data.
- Python 3.7 or higher
- Jupyter Notebook or VS Code with Jupyter extension
Download this repository or navigate to the project directory:

    cd /path/to/Unit6-Practice

The tutorial uses the following Python packages:
- pandas
- numpy
- matplotlib
- seaborn
- scikit-learn
Install all dependencies using the first code cell in the notebook, or run:

    pip install pandas numpy matplotlib seaborn scikit-learn

Ensure the dataset is located at:

    data/UCI_Credit_Card.csv

To run the tutorial in Jupyter Notebook:
- Launch Jupyter Notebook by running: jupyter notebook
- Open credit_groups.ipynb
- Run cells sequentially from top to bottom using Shift + Enter

To run the tutorial in VS Code:
- Open the project folder in VS Code
- Open credit_groups.ipynb
- Select a Python kernel when prompted
- Run cells sequentially using the play button or Shift + Enter

The notebook covers the following steps:
- Load the UCI Credit Card dataset
- Create three behavioral features using RFM methodology:
  - Recency: Most recent payment status
  - Frequency: Count of on-time payments over 6 months
  - Monetary: Average monthly payment amount
- Visualize feature distributions
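
The feature engineering steps above could be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the column names (`PAY_0`, `PAY_2` to `PAY_6`, `PAY_AMT1` to `PAY_AMT6`) follow the Kaggle CSV, the helper `build_rfm_features` is a hypothetical name, and treating a repayment status of 0 or below as "on time" is an assumption. The sample rows are hand-made so the sketch runs without the dataset.

```python
import pandas as pd

# Column names as they appear in the Kaggle CSV (PAY_0 is the most recent month).
PAY_STATUS_COLS = ["PAY_0", "PAY_2", "PAY_3", "PAY_4", "PAY_5", "PAY_6"]
PAY_AMOUNT_COLS = [f"PAY_AMT{i}" for i in range(1, 7)]


def build_rfm_features(df):
    """Hypothetical helper: derive three behavioral features per customer."""
    features = pd.DataFrame(index=df.index)
    # Recency: most recent repayment status.
    features["recency"] = df["PAY_0"]
    # Frequency: number of months (out of 6) with status <= 0.
    # Assumption: status <= 0 means the payment was on time.
    features["frequency"] = (df[PAY_STATUS_COLS] <= 0).sum(axis=1)
    # Monetary: average monthly payment amount.
    features["monetary"] = df[PAY_AMOUNT_COLS].mean(axis=1)
    return features


# Hand-made sample rows (not real data) so the sketch runs without the CSV.
sample = pd.DataFrame({
    "PAY_0": [0, 2, -1],
    "PAY_2": [0, 2, 1],
    "PAY_3": [0, -1, 1],
    "PAY_4": [0, -1, 1],
    "PAY_5": [0, -1, 1],
    "PAY_6": [0, -1, 1],
    "PAY_AMT1": [1000, 0, 60],
    "PAY_AMT2": [1000, 0, 60],
    "PAY_AMT3": [1000, 600, 60],
    "PAY_AMT4": [1000, 600, 60],
    "PAY_AMT5": [1000, 600, 60],
    "PAY_AMT6": [1000, 600, 60],
})

rfm = build_rfm_features(sample)
print(rfm)
```

With the real data, `sample` would be replaced by `pd.read_csv("data/UCI_Credit_Card.csv")`.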
- Standardize features using StandardScaler
- Prepare data for clustering
- Test K values from 2 to 10
- Apply the Elbow Method (inertia analysis)
- Calculate Silhouette Scores
- Visualize both metrics to select optimal K
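
As a rough sketch of the scaling and K-selection steps above, the loop below standardizes the features, fits K-means for each K from 2 to 10, and records inertia (for the Elbow Method) alongside the Silhouette Score. It runs on synthetic blobs from `make_blobs` rather than the real features, and the `n_init` and `random_state` values are illustrative assumptions, not settings taken from the notebook.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the three behavioral features (not real data).
X_raw, _ = make_blobs(n_samples=600, centers=4, n_features=3, random_state=42)

# Standardize so each feature has mean 0 and unit variance;
# K-means uses Euclidean distance, so feature scale matters.
X = StandardScaler().fit_transform(X_raw)

results = {}
for k in range(2, 11):
    model = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    # inertia_: within-cluster sum of squared distances (Elbow Method).
    # silhouette_score: mean silhouette coefficient, between -1 and 1.
    results[k] = (model.inertia_, silhouette_score(X, model.labels_))

for k, (inertia, sil) in results.items():
    print(f"K={k:2d}  inertia={inertia:10.2f}  silhouette={sil:.3f}")

# One simple heuristic: pick the K with the highest silhouette score,
# then check it against the elbow in the inertia curve.
best_k = max(results, key=lambda k: results[k][1])
print("Highest silhouette at K =", best_k)
```

On the full 30,000-row dataset the silhouette computation can be slow; `silhouette_score` accepts a `sample_size` argument to score a random subset instead.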
- Train final K-means model with selected K value
- Assign cluster labels to all customers
- Analyze cluster characteristics
- Visualize customer segments
- Interpret clusters as business segments:
  - Champions: High payment amounts, always on time
  - Solid Performers: Consistent on-time payers
  - High Rollers: Large payments but inconsistent
  - At-Risk: Declining payment behavior
  - Problem Accounts: Frequent late payments, low amounts
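
The final modeling and interpretation steps might look something like the sketch below: fit K-means once with the chosen K, attach the labels to each customer, and summarize each cluster by size and mean feature values. K=5 is assumed here only because five segments are listed above; the data is randomly generated, so the resulting clusters do not correspond to the named segments. Mapping a cluster to a label such as "Champions" remains a manual judgment made from the profile table.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Randomly generated stand-in for the RFM features (not real data).
rng = np.random.default_rng(0)
rfm = pd.DataFrame({
    "recency": rng.integers(-2, 4, size=500),
    "frequency": rng.integers(0, 7, size=500),
    "monetary": rng.gamma(shape=2.0, scale=2500.0, size=500),
})

# Scale, then fit the final model. K=5 is an assumption for illustration.
X = StandardScaler().fit_transform(rfm)
final_model = KMeans(n_clusters=5, n_init=10, random_state=42)
rfm["cluster"] = final_model.fit_predict(X)

# Profile each cluster in the original (unscaled) units so the
# numbers are readable when assigning business labels.
profile = rfm.groupby("cluster").agg(
    customers=("recency", "size"),
    recency=("recency", "mean"),
    frequency=("frequency", "mean"),
    monetary=("monetary", "mean"),
)
print(profile.round(2))
```

Reading the table row by row, a cluster with high `frequency` and high `monetary` would be a candidate for a label like Champions, while low `frequency` and low `monetary` would point toward Problem Accounts.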