This project implements Sentiment Analysis using BERT (Bidirectional Encoder Representations from Transformers). The goal is to classify text data into positive or negative sentiment, leveraging BERT's powerful ability to understand deep contextual relations in text.
The pipeline includes:

- Data loading and preprocessing with a BERT-specific tokenizer.
- Splitting data into training and validation sets.
- Fine-tuning a pre-trained BERT model with early stopping to prevent overfitting.
- Visualizing performance metrics (accuracy and loss).
- Evaluation by testing your own text.
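The early-stopping step mentioned above can be sketched as a simple patience counter on the validation loss. This is a minimal illustration of the idea, not the project's actual implementation; the class name and parameters here are hypothetical:

```python
class EarlyStopping:
    """Stop training when validation loss stops improving for `patience` epochs."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience      # epochs to wait without improvement
        self.min_delta = min_delta    # minimum change that counts as improvement
        self.best_loss = float("inf")
        self.counter = 0
        self.should_stop = False

    def step(self, val_loss):
        """Call once per epoch with the validation loss; returns True when training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.counter = 0
        else:
            self.counter += 1
            if self.counter >= self.patience:
                self.should_stop = True
        return self.should_stop
```

In a training loop you would check `step(val_loss)` after each epoch and break out of the loop once it returns True, restoring the best checkpoint saved so far.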
Clone the repository and set up a conda environment:

```bash
git clone https://github.com/anniemburu/Sentimental-Analysis-with-BERT
conda create -n myenv python=3.9
conda activate myenv
```

All dependencies are listed in requirements.txt. Install them with:

```bash
pip install -r requirements.txt
```

The processed dataset is expected at `datasets/processed/sentiment_data.csv`.
The data used in this project was sourced from Kaggle. You can modify `data_split` in `src/data/preprocessing.py` if you wish to use a different dataset.
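A train/validation split like the one `data_split` performs can be sketched as a seeded shuffle followed by a fractional cut. The signature and internals below are assumptions for illustration; the real function in `src/data/preprocessing.py` may differ:

```python
import random


def data_split(rows, train_frac=0.8, seed=42):
    """Shuffle `rows` deterministically and split into (train, validation) lists."""
    rng = random.Random(seed)       # seeded RNG so the split is reproducible
    shuffled = rows[:]              # copy to avoid mutating the caller's data
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]
```

Using a fixed seed keeps the split reproducible across runs, which matters when comparing training configurations.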
Run the training pipeline with:

```bash
python3 train.py
```

or

```bash
python3 -m train
```

This will:

- Train the BERT model on the training data.
- Validate it on the validation set.
- Save the trained model to `src/models/sentimental_model/`.
- Generate training performance plots at `src/results/model_performance.png`.

You can add extra parameters as defined in `src/utils/parser.py`.
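A parser like the one in `src/utils/parser.py` might be built with `argparse`. Only the `--autodownload` flag appears elsewhere in this README; the other flags and defaults below are hypothetical examples:

```python
import argparse


def build_parser():
    """Build a command-line parser for training options (illustrative flags only)."""
    p = argparse.ArgumentParser(description="BERT sentiment-analysis training")
    p.add_argument("--epochs", type=int, default=3, help="number of training epochs")
    p.add_argument("--batch-size", type=int, default=16, help="mini-batch size")
    p.add_argument("--lr", type=float, default=2e-5, help="learning rate")
    p.add_argument("--autodownload", action="store_true",
                   help="download the Kaggle dataset automatically")
    return p
```

Flags declared this way are then available as attributes, e.g. `args.batch_size`, after `parse_args()`.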
You can test your own text by running:

```bash
python3 evaluate.py
```

or

```bash
python3 -m evaluate
```

The data is sourced from Kaggle. You can either download it manually or automatically with:

```bash
python3 -m train --autodownload
```

Preprocessing: the data has been tokenized, padded to a fixed sequence length, and split into training and testing sets.
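The padding step described above can be illustrated in isolation. In the actual pipeline the BERT tokenizer produces the token ids; the helper below is a hypothetical sketch of the pad-to-fixed-length logic only:

```python
def pad_sequences(seqs, max_len, pad_id=0):
    """Truncate or right-pad each token-id sequence to exactly `max_len` ids."""
    padded = []
    for seq in seqs:
        seq = seq[:max_len]                              # truncate overly long inputs
        padded.append(seq + [pad_id] * (max_len - len(seq)))  # right-pad short ones
    return padded
```

Fixing the sequence length lets the model batch inputs as a single rectangular tensor; BERT additionally uses an attention mask so padded positions are ignored.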
```
├── config
│   └── vars.yml
├── datasets/
│   ├── processed/
│   │   └── sentiment_data.csv
│   └── raw/
│       └── sentimentdataset.csv
├── src/
│   ├── data/
│   │   ├── preprocessing.py
│   │   └── data_loader.py
│   ├── models/
│   │   └── sentimental_model/
│   ├── results/
│   │   └── model_performance.png
│   └── utils/
│       └── parser.py
├── evaluate.py
├── train.py
├── requirements.txt
└── README.md
```
TBA