Skip to content

2AcademicVictims/LocationTrust

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

11 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

πŸ” LocationTrust - ML for Trustworthy Location Reviews

Streamlit App

πŸ“‹ Project Overview

LocationTrust is an advanced machine learning application that automatically detects policy violations and assesses the quality of Google location reviews. Built as a comprehensive solution for trustworthy review analysis, the system identifies:

  • Advertisement violations: Reviews promoting other businesses or services
  • Irrelevant content: Reviews unrelated to the location or experience
  • Rant detection: Emotional outbursts from users who likely haven't visited the location
  • Quality assessment: Overall trustworthiness scoring of reviews

🎯 Key Features

  • Interactive Web Interface: Streamlit-based dashboard with real-time processing
  • Flexible Data Input: Support for various CSV formats with configurable column mapping
  • Advanced NLP Processing: Powered by Hugging Face transformers and spaCy
  • Comprehensive Evaluation: Detailed metrics, visualizations, and exportable results
  • Scalable Architecture: Batch processing with concurrent execution for large datasets

πŸ—οΈ System Architecture

πŸ“ LocationTrust/
β”œβ”€β”€ 🎯 app.py                    # Main Streamlit application
β”œβ”€β”€ πŸ“¦ src/                      # Core modules
β”‚   β”œβ”€β”€ data_processor.py        # Data preprocessing and feature extraction
β”‚   β”œβ”€β”€ model_handler.py         # Hugging Face model integration
β”‚   β”œβ”€β”€ policy_detector.py       # Policy violation detection logic
β”‚   β”œβ”€β”€ evaluation.py            # Metrics calculation and evaluation
β”‚   └── visualization.py         # Charts and dashboard components
β”œβ”€β”€ πŸ› οΈ utils/                    # Utility functions
β”‚   β”œβ”€β”€ helpers.py               # General helper functions
β”‚   └── prompts.py               # ML model prompts and templates
β”œβ”€β”€ πŸ“Š datasets/                 # Sample datasets
β”‚   β”œβ”€β”€ reviews.csv              # General review dataset
β”‚   └── sepetcioglu_restaurant.csv # Restaurant-specific dataset
└── πŸ“‹ requirements.txt          # Python dependencies

πŸš€ Setup Instructions

Prerequisites

  • Python 3.8+
  • Hugging Face Account (for API access)
  • Git (for cloning the repository)

1. Clone the Repository

git clone https://github.com/stinkray77/LocationTrust.git
cd LocationTrust

2. Install Dependencies

Option A: Using pip

pip install -r requirements.txt

Option B: Using uv (recommended)

pip install uv
uv sync

3. Get Hugging Face API Key

  1. Create account at huggingface.co
  2. Go to Settings > Access Tokens
  3. Create new token with Read permissions
  4. Copy the token for environment setup

4. Environment Configuration

For Local Development:

Create a .env file or set environment variable:

export HUGGINGFACE_API_KEY="your_token_here"

For Streamlit Cloud:

Add to your app's secrets in the Streamlit dashboard:

HUGGINGFACE_API_KEY = "your_token_here"

5. Run the Application

Local Development:

streamlit run app.py

Production Deployment:

The app is deployed on Streamlit Cloud at: https://locationtrust.streamlit.app

πŸ“Š How to Reproduce Results

Using Sample Data

The repository includes sample datasets in the datasets/ folder:

  1. Launch the application following setup instructions above
  2. Navigate to "πŸ“Š Data Upload & Processing"
  3. Upload sample data: Use datasets/reviews.csv or datasets/sepetcioglu_restaurant.csv
  4. Configure column mapping:
    • Review Text: review_text or text
    • Rating: rating (optional)
    • Location: location or business_name (optional)

Step-by-Step Workflow

1. Data Processing

  • Upload your CSV file with location reviews
  • Map columns to standard format
  • Preview processed data with extracted features

2. Model Configuration

  • Choose Model: Select from available Hugging Face models
    • Default: microsoft/DialoGPT-medium (lightweight)
    • Advanced: facebook/bart-large-mnli (higher accuracy)
  • Set Parameters:
    • Max Length: 256 tokens (recommended)
    • Temperature: 0.3 (balanced creativity)
    • Confidence Threshold: 0.7 (default)

3. Policy Detection

  • Run Analysis: Process reviews through the ML pipeline
  • Monitor Progress: Real-time progress tracking with batch processing
  • Review Results: Detailed violation detection with confidence scores

4. Results & Evaluation

  • Performance Metrics: Precision, Recall, F1-Score for each violation type
  • Interactive Visualizations:
    • Violation distribution charts
    • Confidence score histograms
    • Rating vs. trustworthiness scatter plots
  • Detailed Analysis: Review-by-review breakdown with explanations

5. Export & Summary

  • Download Results: JSON export with full analysis
  • Generate Report: Automated summary with key insights
  • Save Configuration: Export model settings for reproducibility

Expected Results

Using the sample datasets, you should see:

Performance Metrics:

  • Advertisement Detection: ~85-90% F1-Score
  • Irrelevant Content: ~80-85% F1-Score
  • Rant Detection: ~75-80% F1-Score
  • Overall Accuracy: ~82-87%

Key Insights:

  • ~15-25% of reviews contain policy violations
  • Advertisement violations most common in restaurant reviews
  • Rant detection correlates with extreme ratings (1β˜… or 5β˜…)

πŸ”§ Customization

Adding New Models

  1. Update Model List in src/model_handler.py:
AVAILABLE_MODELS = [
    "your-model-name",
    # existing models...
]
  1. Configure Model Parameters for optimal performance

Custom Policy Detection

  1. Modify Rules in src/policy_detector.py
  2. Add New Violation Types with custom logic
  3. Update Evaluation Metrics in src/evaluation.py

Data Format Support

  • Required Column: Review text (any column name)
  • Optional Columns: Rating, location, date, user info
  • Supported Formats: CSV files with UTF-8 encoding

πŸ” Troubleshooting

Common Issues

Dependencies:

# Install missing spaCy model
python -m spacy download en_core_web_sm

# Update packages
pip install --upgrade -r requirements.txt

API Issues:

  • Verify Hugging Face API key is valid
  • Check model availability and permissions
  • Monitor API rate limits

Memory Issues:

  • Reduce batch size in model configuration
  • Use lighter models for large datasets
  • Process data in smaller chunks

Getting Help

  • Issues: GitHub Issues
  • Documentation: See inline code comments
  • Community: Streamlit Community Forum

πŸ“œ License

This project is open source and available under the MIT License.

🀝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.


Built with ❀️ using Streamlit, Hugging Face, and modern NLP techniques

Releases

No releases published

Packages

No packages published

Languages