A CLI tool to identify pull request outliers in GitHub repositories based on review time, size, and qualitative metrics. This tool helps engineering teams understand review patterns and identify unusual PRs that might need attention.
- Fetch & Store: Efficiently retrieve PR data from GitHub (handling API rate limits) and store it locally in a SQLite database.
- Outlier Detection: Detect statistical outliers using Z-score analysis on multiple metrics (review duration, size, comments, etc.).
- Analysis: Calculate per-PR features such as code churn, comment density, and review speed.
- Flexible Output: View results in the terminal as tables, or export them to JSON/CSV for further analysis.
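The Analysis features can be derived from a few fields of each PR. The sketch below is illustrative only: PRRecord and compute_features are hypothetical names, and the definitions used (churn = lines added + deleted, comment density = comments per 100 changed lines, review speed = changed lines per hour) are assumptions that may differ from the tool's actual formulas.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class PRRecord:
    # Hypothetical subset of PR fields; not the tool's real schema.
    additions: int
    deletions: int
    comments: int
    created_at: datetime
    merged_at: datetime


def compute_features(pr: PRRecord) -> dict[str, float]:
    """Derive illustrative review metrics for one PR (assumed definitions)."""
    # Code churn: total lines touched (added + deleted).
    churn = pr.additions + pr.deletions
    # Review duration in hours, from opening to merge.
    hours = (pr.merged_at - pr.created_at).total_seconds() / 3600
    # Comment density: comments per 100 changed lines (0 for an empty diff).
    density = 100 * pr.comments / churn if churn else 0.0
    # Review speed: changed lines per hour (0 if the duration is not positive).
    speed = churn / hours if hours > 0 else 0.0
    return {
        "churn": float(churn),
        "review_hours": hours,
        "comment_density": density,
        "lines_per_hour": speed,
    }


pr = PRRecord(additions=120, deletions=30, comments=6,
              created_at=datetime(2023, 5, 1, 9, 0),
              merged_at=datetime(2023, 5, 2, 9, 0))
print(compute_features(pr))
# {'churn': 150.0, 'review_hours': 24.0, 'comment_density': 4.0, 'lines_per_hour': 6.25}
```

Normalizing by churn keeps large and small PRs comparable; without it, big PRs would dominate the comment and duration metrics.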
Prerequisites:
- Python 3.12 or higher
- uv (recommended for dependency management)
Installation:
- Clone the repository:
git clone https://github.com/ghinks/review-classification.git
cd review-classification
- Install dependencies:
uv sync
To avoid hitting GitHub API rate limits, you must set a GitHub personal access token:
export GITHUB_TOKEN=your_token_here
The classify command fetches PR data from a repository and stores it locally.
# Fetch PRs (default start date is 30 days ago)
uv run review-classify classify owner/repo
# Fetch PRs within a specific date range
uv run review-classify classify owner/repo --start 2023-01-01 --end 2023-12-31
# Reset the local database before fetching
uv run review-classify classify owner/repo --reset-db
The detect-outliers command analyzes the stored data to find unusual PRs.
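A Z-score measures how many standard deviations a value lies from the mean, z = (x - mean) / std. The sketch below flags a PR when the absolute Z-score of any metric exceeds the threshold, which is one plausible reading of --threshold. The function name zscore_outliers, the data layout, the use of the population standard deviation, and the strict > comparison are all assumptions, not confirmed details of the tool.

```python
from statistics import mean, pstdev


def zscore_outliers(metrics: dict[str, list[float]], threshold: float = 2.0) -> set[int]:
    """Return indices of PRs whose |z| exceeds threshold on at least one metric.

    metrics maps a metric name to one value per PR; all lists share PR order.
    """
    flagged: set[int] = set()
    for values in metrics.values():
        mu = mean(values)
        sigma = pstdev(values)  # population standard deviation
        if sigma == 0:
            continue  # identical values: no PR stands out on this metric
        for i, x in enumerate(values):
            # z = (x - mu) / sigma; flag when |z| is above the threshold
            if abs(x - mu) / sigma > threshold:
                flagged.add(i)
    return flagged


# Hypothetical metrics for ten PRs (list index = PR position).
metrics: dict[str, list[float]] = {
    "review_hours": [2, 3, 2, 4, 3, 2, 3, 4, 2, 48],
    "lines_changed": [50, 40, 60, 45, 55, 50, 1000, 40, 60, 50],
}
print(sorted(zscore_outliers(metrics, threshold=2.0)))  # [6, 9]
print(sorted(zscore_outliers(metrics, threshold=3.0)))  # []
```

Raising the threshold from 2.0 to 3.0 drops both PRs here, partly because of sample size: with n values and a population standard deviation, no |z| can exceed sqrt(n - 1), which is exactly 3 for ten PRs. Under this sketch's assumptions, a 3.0 threshold therefore cannot flag anything in a ten-PR window.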
# Basic outlier detection (default threshold: 2.0)
uv run review-classify detect-outliers owner/repo
# Stricter detection (higher threshold, fewer PRs flagged)
uv run review-classify detect-outliers owner/repo --threshold 3.0
# Export results to JSON
uv run review-classify detect-outliers owner/repo --format json > outliers.json
Install dev dependencies:
uv sync --group dev
Run the test suite with pytest:
uv run pytest
This project uses ruff for linting and formatting and mypy for static type checking.
# Run pre-commit hooks on all files
uv run pre-commit run --all-files