This project implements a customer segmentation analysis for a UK-based online gift retailer using K-means clustering. The analysis segments customers based on their transaction history (2009-2011) using RFM (Recency, Frequency, Monetary) features to identify distinct customer groups and provide targeted marketing recommendations.
- Aggregate transaction data to customer-level RFM features
- Determine optimal number of clusters using elbow method and silhouette analysis
- Train K-means clustering model to segment customers
- Visualize and interpret customer segments
- Provide actionable business recommendations for each segment
Assignment-4/
├── README.md # Project documentation (this file)
├── starter_notebook.ipynb # Main Jupyter notebook with analysis
└── data/
└── online_retail_II.csv # Transaction dataset (2009-2011)
The online_retail_II.csv dataset contains transaction records with the following key fields:
- Invoice: Unique transaction identifier
- StockCode: Product code
- Description: Product name
- Quantity: Number of items purchased
- InvoiceDate: Transaction date and time
- Price: Unit price per item
- Customer ID: Unique customer identifier
- Country: Customer's country
-
Open the notebook:
- Open
starter_notebook.ipynbin VS Code
- Open
-
Install dependencies:
- Run the first code cell to install required packages:
pip install pandas matplotlib seaborn scikit-learn
- Run the first code cell to install required packages:
-
Run the analysis:
- Execute cells sequentially from top to bottom
- Fill in the TODO sections as you progress through the assignment
-
Install dependencies:
pip install pandas matplotlib seaborn scikit-learn
-
Launch Jupyter:
jupyter notebook starter_notebook.ipynb
Or:
jupyter lab
-
Execute the notebook:
- Run cells in order, completing the TODO sections
The analysis is organized into six main steps:
- Import required Python libraries
- Load the Online Retail II dataset
- Display basic dataset information
- Clean transaction data (remove missing IDs, returns, etc.)
- Calculate RFM metrics for each customer:
- Recency: Days since last purchase
- Frequency: Number of unique transactions
- Monetary: Total amount spent
- Standardize RFM features using StandardScaler
- Apply elbow method to test k values from 2 to 10
- Calculate silhouette scores for cluster quality validation
- Select optimal k value with written justification
- Train final K-means model with optimal k
- Add cluster labels to customer data
- Create scatter plot visualization (Frequency vs Monetary)
- Calculate and display cluster centers
- Analyze each customer segment's characteristics
- Provide descriptive names for segments
- Calculate detailed statistics for each segment
- Verify all code runs without errors
- Push to GitHub
- Submit repository link
After completing all steps:
git add .
git commit -m 'completed customer segmentation assignment'
git pushSubmit your GitHub repository link on the course platform.