LocationTrust is an advanced machine learning application that automatically detects policy violations and assesses the quality of Google location reviews. Built as a comprehensive solution for trustworthy review analysis, the system identifies:
- Advertisement violations: Reviews promoting other businesses or services
- Irrelevant content: Reviews unrelated to the location or experience
- Rant detection: Emotional outbursts from users who likely haven't visited the location
- Quality assessment: Overall trustworthiness scoring of reviews
- Interactive Web Interface: Streamlit-based dashboard with real-time processing
- Flexible Data Input: Support for various CSV formats with configurable column mapping
- Advanced NLP Processing: Powered by Hugging Face transformers and spaCy
- Comprehensive Evaluation: Detailed metrics, visualizations, and exportable results
- Scalable Architecture: Batch processing with concurrent execution for large datasets
LocationTrust/
├── app.py                          # Main Streamlit application
├── src/                            # Core modules
│   ├── data_processor.py           # Data preprocessing and feature extraction
│   ├── model_handler.py            # Hugging Face model integration
│   ├── policy_detector.py          # Policy violation detection logic
│   ├── evaluation.py               # Metrics calculation and evaluation
│   └── visualization.py            # Charts and dashboard components
├── utils/                          # Utility functions
│   ├── helpers.py                  # General helper functions
│   └── prompts.py                  # ML model prompts and templates
├── datasets/                       # Sample datasets
│   ├── reviews.csv                 # General review dataset
│   └── sepetcioglu_restaurant.csv  # Restaurant-specific dataset
└── requirements.txt                # Python dependencies
- Python 3.8+
- Hugging Face Account (for API access)
- Git (for cloning the repository)
git clone https://github.com/stinkray77/LocationTrust.git
cd LocationTrust
Install dependencies with pip:
pip install -r requirements.txt
Or with uv:
pip install uv
uv sync
- Create an account at huggingface.co
- Go to Settings > Access Tokens
- Create new token with Read permissions
- Copy the token for environment setup
Create a .env file or set the environment variable:
export HUGGINGFACE_API_KEY="your_token_here"
Add to your app's secrets in the Streamlit dashboard:
HUGGINGFACE_API_KEY = "your_token_here"
Then start the app:
streamlit run app.py
The app is deployed on Streamlit Cloud at: https://locationtrust.streamlit.app
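For the HUGGINGFACE_API_KEY setup described above, a minimal sketch of how a script could read the token and fail early when it is missing. The helper name `get_hf_api_key` is hypothetical, and how app.py actually loads the key is not shown in this document.

```python
import os


def get_hf_api_key(env=os.environ):
    """Return the Hugging Face token, or raise a clear error if it is missing.

    Hypothetical helper: the real loading logic in app.py may differ.
    """
    key = env.get("HUGGINGFACE_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "HUGGINGFACE_API_KEY is not set; export it or add it to your .env file"
        )
    return key
```

Passing a plain dictionary as `env` makes the check easy to exercise without touching the real environment.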
The repository includes sample datasets in the datasets/ folder:
- Launch the application following the setup instructions above
- Navigate to "Data Upload & Processing"
- Upload sample data: Use datasets/reviews.csv or datasets/sepetcioglu_restaurant.csv
- Configure column mapping:
  - Review Text: review_text or text
  - Rating: rating (optional)
  - Location: location or business_name (optional)
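The column mapping above can be sketched with the standard library alone. This is an illustration under assumptions, not the code in src/data_processor.py: the names `COLUMN_CANDIDATES`, `map_columns`, and `load_reviews` are hypothetical, and the sample row is made up.

```python
import csv
import io

# Candidate source column names for each standard field (from the list above).
COLUMN_CANDIDATES = {
    "review_text": ["review_text", "text"],
    "rating": ["rating"],
    "location": ["location", "business_name"],
}


def map_columns(fieldnames):
    """Return {standard_name: source_column} for the columns that are present."""
    mapping = {}
    for standard, candidates in COLUMN_CANDIDATES.items():
        for name in candidates:
            if name in fieldnames:
                mapping[standard] = name
                break
    if "review_text" not in mapping:
        raise ValueError("no review text column found")
    return mapping


def load_reviews(csv_text):
    """Parse CSV text and rename the recognised columns to the standard names."""
    reader = csv.DictReader(io.StringIO(csv_text))
    mapping = map_columns(reader.fieldnames or [])
    return [{std: row[src] for std, src in mapping.items()} for row in reader]


# Made-up sample: the rating column is absent, so it is simply left out.
sample = "text,business_name\nGreat kebab!,Sepetcioglu\n"
rows = load_reviews(sample)
```

Only the review text is treated as required here, matching the optional markers on rating and location.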
- Upload your CSV file with location reviews
- Map columns to standard format
- Preview processed data with extracted features
- Choose Model: Select from available Hugging Face models
  - Default: microsoft/DialoGPT-medium (lightweight)
  - Advanced: facebook/bart-large-mnli (higher accuracy)
- Set Parameters:
  - Max Length: 256 tokens (recommended)
  - Temperature: 0.3 (low randomness, for more consistent outputs)
  - Confidence Threshold: 0.7 (default)
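As a rough illustration of how the three parameters above might be carried around and how a confidence threshold is typically applied, here is a small sketch. `AnalysisConfig` and `flag_violations` are hypothetical names, the scores are made up, and whether the app uses `>=` or `>` at the threshold is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnalysisConfig:
    """Hypothetical container for the settings listed above."""
    max_length: int = 256
    temperature: float = 0.3
    confidence_threshold: float = 0.7


def flag_violations(scores, config=AnalysisConfig()):
    """Keep only the labels whose confidence meets the threshold (assumed inclusive)."""
    return {
        label: score
        for label, score in scores.items()
        if score >= config.confidence_threshold
    }


# Made-up confidence scores for a single review.
flagged = flag_violations({"advertisement": 0.91, "irrelevant": 0.42, "rant": 0.70})
```

Raising the threshold trades recall for precision: fewer reviews are flagged, but the flagged ones are more likely to be real violations.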
- Run Analysis: Process reviews through the ML pipeline
- Monitor Progress: Real-time progress tracking with batch processing
- Review Results: Detailed violation detection with confidence scores
- Performance Metrics: Precision, Recall, F1-Score for each violation type
- Interactive Visualizations:
  - Violation distribution charts
  - Confidence score histograms
  - Rating vs. trustworthiness scatter plots
- Detailed Analysis: Review-by-review breakdown with explanations
- Download Results: JSON export with full analysis
- Generate Report: Automated summary with key insights
- Save Configuration: Export model settings for reproducibility
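The JSON export and saved configuration mentioned above could be bundled along these lines. The schema is an assumption made for illustration (the app's actual export format is not documented here), and `export_results` and the sample values are hypothetical.

```python
import json


def export_results(results, config):
    """Serialise results together with the settings that produced them."""
    payload = {"config": config, "results": results}
    # ensure_ascii=False keeps non-English review text readable in the file.
    return json.dumps(payload, indent=2, ensure_ascii=False)


# Made-up example data.
exported = export_results(
    results=[
        {"review_text": "Visit my shop instead!", "violation": "advertisement", "confidence": 0.93}
    ],
    config={"model": "facebook/bart-large-mnli", "max_length": 256, "confidence_threshold": 0.7},
)
```

Storing the model settings next to the results is one simple way to make a run reproducible.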
Using the sample datasets, you should see:
Performance Metrics:
- Advertisement Detection: ~85-90% F1-Score
- Irrelevant Content: ~80-85% F1-Score
- Rant Detection: ~75-80% F1-Score
- Overall Accuracy: ~82-87%
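For reference, precision, recall, and F1 for a single violation type follow the standard definitions sketched below. This is a plain-Python illustration with made-up labels, not the implementation in src/evaluation.py.

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1 for binary labels (1 = violation)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Made-up labels: 2 true positives, 1 false positive, 1 false negative.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
p, r, f1 = precision_recall_f1(y_true, y_pred)
```

In this example precision, recall, and F1 all come out to 2/3.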
Key Insights:
- ~15-25% of reviews contain policy violations
- Advertisement violations most common in restaurant reviews
- Rant detection correlates with extreme ratings (1★ or 5★)
- Update Model List in src/model_handler.py:
  AVAILABLE_MODELS = [
      "your-model-name",
      # existing models...
  ]
- Configure Model Parameters for optimal performance
- Modify Rules in src/policy_detector.py
- Add New Violation Types with custom logic
- Update Evaluation Metrics in src/evaluation.py
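To make "add new violation types with custom logic" more concrete, here is one possible shape for a rule registry. Everything in it is hypothetical: the `RULES` registry, the `rule` decorator, the `contact_info` type, and the regular expressions are illustrative and do not reflect the actual structure of src/policy_detector.py.

```python
import re
from typing import Callable, Dict

# Hypothetical registry mapping a violation name to a check function.
RULES: Dict[str, Callable[[str], bool]] = {}


def rule(name):
    """Decorator that registers a check function under a violation name."""
    def register(fn):
        RULES[name] = fn
        return fn
    return register


@rule("advertisement")
def looks_like_ad(text: str) -> bool:
    # Illustrative heuristic: links or promotional wording.
    pattern = r"https?://|www\.|\bpromo code\b|\bdiscount\b"
    return bool(re.search(pattern, text, re.IGNORECASE))


@rule("contact_info")  # a new, made-up violation type
def has_phone_number(text: str) -> bool:
    # Illustrative heuristic: a 3-3-4 digit phone number pattern.
    return bool(re.search(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b", text))


def detect(text):
    """Return the sorted names of all rules that match the text."""
    return sorted(name for name, check in RULES.items() if check(text))
```

With this layout, adding a violation type means writing one decorated function; a new type would still need matching labels and metrics in the evaluation step.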
- Required Column: Review text (any column name)
- Optional Columns: Rating, location, date, user info
- Supported Formats: CSV files with UTF-8 encoding
Dependencies:
# Install missing spaCy model
python -m spacy download en_core_web_sm
# Update packages
pip install --upgrade -r requirements.txt
API Issues:
- Verify Hugging Face API key is valid
- Check model availability and permissions
- Monitor API rate limits
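One common way to cope with API rate limits is to retry with exponential backoff. The sketch below is generic: `RateLimitError` is a hypothetical placeholder for whatever exception the client in use raises, and `call_with_backoff` is not part of this project or of any Hugging Face library.

```python
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for the client's rate-limit exception."""


def call_with_backoff(fn, max_retries=4, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying on RateLimitError with delays of 1x, 2x, 4x, ... base_delay."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_retries:
                raise  # give up after the final retry
            sleep(base_delay * 2 ** attempt)
```

The `sleep` parameter is injectable so the retry schedule can be checked without actually waiting.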
Memory Issues:
- Reduce batch size in model configuration
- Use lighter models for large datasets
- Process data in smaller chunks
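The advice to reduce batch size and process data in smaller chunks can be sketched as follows. This is an assumption about how batching might be organised, not the project's code; `chunked`, `process_in_batches`, and the `analyze_batch` callback are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def process_in_batches(reviews, analyze_batch, batch_size=16, max_workers=4):
    """Run analyze_batch over small chunks concurrently, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in the same order as the submitted batches.
        batch_results = pool.map(analyze_batch, chunked(reviews, batch_size))
        return [result for batch in batch_results for result in batch]


# Toy stand-in for model inference: the "analysis" is just the text length.
lengths = process_in_batches(
    ["a", "bb", "ccc"],
    lambda batch: [len(review) for review in batch],
    batch_size=2,
)
```

A smaller `batch_size` lowers the amount of data each call handles at once, and lowering `max_workers` limits how many batches are in flight at the same time.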
- Issues: GitHub Issues
- Documentation: See inline code comments
- Community: Streamlit Community Forum
This project is open source and available under the MIT License.
Contributions are welcome! Please feel free to submit a Pull Request.
Built with ❤️ using Streamlit, Hugging Face, and modern NLP techniques