A hands-on exploration of Principal Component Analysis (PCA) for dimensionality reduction using the Human Activity Recognition dataset.
This assignment demonstrates how to use PCA to compress high-dimensional sensor data (561 features) from wearable devices into a lower-dimensional space that can be visualized and analyzed more efficiently. You'll compare model performance and training time between the original dataset and PCA-transformed features.
Human Activity Recognition Using Smartphones
- Training set: 7,352 samples with 561 features
- Test set: 2,947 samples with 561 features
- Activities: 6 types (Walking, Walking Upstairs, Walking Downstairs, Sitting, Standing, Laying)
- Features: Accelerometer and gyroscope sensor readings
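A minimal sketch of separating the sensor features from the activity label once a CSV is loaded. The column names `Activity` and `subject` are assumptions about the CSV header, not something this README states, and `split_features_labels` is a hypothetical helper; the tiny demo frame stands in for the real files so the snippet runs on its own.

```python
import pandas as pd


def split_features_labels(df, label_col="Activity", drop_cols=("subject",)):
    """Separate sensor features from the activity label.

    `label_col` and `drop_cols` are assumed column names; adjust them
    to match the actual header of train.csv / test.csv.
    """
    # Only drop the extra columns that are actually present.
    extra = [c for c in drop_cols if c in df.columns]
    X = df.drop(columns=[label_col, *extra])
    y = df[label_col]
    return X, y


# Tiny stand-in frame so the sketch runs without the real dataset.
demo = pd.DataFrame(
    {
        "feat_a": [0.1, 0.2, 0.3],
        "feat_b": [1.0, 0.9, 0.8],
        "subject": [1, 1, 2],
        "Activity": ["WALKING", "SITTING", "LAYING"],
    }
)
X_demo, y_demo = split_features_labels(demo)
print(X_demo.shape, y_demo.nunique())
```

With the real data, the same helper would be called on `pd.read_csv("data/train.csv")`; if the column-name assumptions hold, the feature frame should have 7,352 rows and 561 columns.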
Assignment-5/
├── README.md                # This file
├── starter_notebook.ipynb   # Main Jupyter notebook with assignment tasks
└── data/
    ├── train.csv            # Training dataset
    └── test.csv             # Test dataset
Install the required Python packages:
pip install pandas matplotlib seaborn scikit-learn
- Open starter_notebook.ipynb in Jupyter Notebook or VS Code
- Run the cells sequentially from top to bottom
- Complete the TODO sections marked throughout the notebook
- Answer reflection questions in the markdown cells
- Load and explore the high-dimensional dataset
- Visualize data using 2D and 3D PCA
- Analyze explained variance to determine how many components to keep
- Compare model performance (accuracy, training time) between original and PCA features
- Reflect on when to use dimensionality reduction in production
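The comparison task above can be sketched roughly as follows. This is an illustration under stated assumptions, not the notebook's solution: it uses synthetic data from `make_classification` instead of the HAR files, logistic regression as an arbitrary classifier choice, and a 95% explained-variance target picked only for the example. `fit_and_score` is a hypothetical helper.

```python
import time

from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the sensor data (assumption: not the real dataset).
X, y = make_classification(
    n_samples=600,
    n_features=100,
    n_informative=15,
    n_redundant=60,
    n_classes=3,
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)


def fit_and_score(model):
    """Fit on the training split; return (test accuracy, fit time in seconds)."""
    start = time.perf_counter()
    model.fit(X_train, y_train)
    elapsed = time.perf_counter() - start
    accuracy = accuracy_score(y_test, model.predict(X_test))
    return accuracy, elapsed


# Same classifier in both pipelines; only the PCA step differs.
baseline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
reduced = make_pipeline(
    StandardScaler(),
    PCA(n_components=0.95),  # keep enough components for 95% of the variance
    LogisticRegression(max_iter=1000),
)

results = {}
for name, model in [("original", baseline), ("pca", reduced)]:
    results[name] = fit_and_score(model)
    accuracy, elapsed = results[name]
    print(f"{name}: accuracy={accuracy:.3f}, fit time={elapsed:.3f}s")

n_kept = reduced.named_steps["pca"].n_components_
print(f"PCA kept {n_kept} of {X.shape[1]} features")
```

Fitting the scaler and PCA inside a pipeline keeps the test split out of the fitting step, so the reported accuracy is not inflated by leakage. Timings from a single run are noisy, so treat them as a rough indication.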
- Understand the curse of dimensionality
- Apply PCA for dimensionality reduction
- Interpret explained variance and scree plots
- Balance model complexity with performance
- Visualize high-dimensional data in lower dimensions
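To make the explained-variance objective concrete, here is a small sketch of how the cumulative explained variance can be used to pick a component count. It runs on synthetic data (5 latent signals mixed into 50 correlated features, an assumption made purely for illustration), and the 0.95 threshold is an example value, not one prescribed by the assignment.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in: 5 latent signals mixed into 50 features, plus small noise.
latent = rng.normal(size=(500, 5))
mixing = rng.normal(size=(5, 50))
X = latent @ mixing + 0.1 * rng.normal(size=(500, 50))

# Standardize first so no single feature dominates because of its scale.
X_scaled = StandardScaler().fit_transform(X)

# Fit PCA with all components to inspect the full variance spectrum.
pca = PCA().fit(X_scaled)
cumulative = np.cumsum(pca.explained_variance_ratio_)

# Smallest number of components whose cumulative variance reaches the threshold.
threshold = 0.95
n_components = int(np.searchsorted(cumulative, threshold) + 1)
print(f"{n_components} components reach {threshold:.0%} of the variance")
```

Plotting `pca.explained_variance_ratio_` against the component index gives the scree plot, and plotting `cumulative` shows where the curve crosses the chosen threshold.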