feat: added auto duplicated issue and pr detector #402

Open
Abhinandankaushik wants to merge 7 commits into apache:main from Abhinandankaushik:main

@Abhinandankaushik

Summary

This PR adds an automated mechanism to detect and handle duplicate issues and pull requests in the repository through the CI/CD workflow.

Problem

Currently, duplicate issues and pull requests are frequently created, which leads to:

  • Redundant discussions
  • Repeated work
  • Increased maintenance overhead for maintainers

There is currently no automated system in place to proactively detect and flag such duplicates.

What this PR does

This PR introduces an automated duplicate detection mechanism that:

  • Scans newly created issues and pull requests for similarity with existing ones.
  • Compares them based on title, description, and labels.
  • Automatically applies a possible-duplicate label to suspected duplicates.
  • Posts a bot comment suggesting related existing issues with relevant links.
  • Provides configurable behavior to optionally close an issue if an exact match is found.
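
The comparison step above could look roughly like the following. This is a minimal sketch, not the script shipped in this PR: it assumes a standard-library string ratio (`difflib.SequenceMatcher`) for titles and descriptions and a Jaccard overlap for labels. The function names, the 0.6/0.3/0.1 weights, and the 0.75 threshold are all hypothetical.

```python
from difflib import SequenceMatcher


def text_similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two strings, ignoring case."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def label_overlap(a: set[str], b: set[str]) -> float:
    """Jaccard overlap of two label sets; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity(new: dict, old: dict) -> float:
    """Weighted score over title, description, and labels.

    The weights are illustrative assumptions, not values from the PR.
    """
    return (
        0.6 * text_similarity(new["title"], old["title"])
        + 0.3 * text_similarity(new["body"], old["body"])
        + 0.1 * label_overlap(set(new["labels"]), set(old["labels"]))
    )


def find_candidates(new: dict, existing: list[dict], threshold: float = 0.75):
    """Return (score, number) pairs at or above the threshold, best first."""
    scored = [(similarity(new, item), item["number"]) for item in existing]
    return sorted((pair for pair in scored if pair[0] >= threshold), reverse=True)
```

Each item is assumed to be a dict with `title`, `body`, `labels`, and `number` keys. Weighting the title most heavily reflects the guess that duplicate reports tend to share a title more often than a full description.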

Alternatives considered

The following alternatives were considered but rejected:

  • Manual review by maintainers → Does not scale and increases workload.
  • GitHub issue templates only → Helpful but does not prevent duplicate submissions.
  • Strict issue submission rules → Too restrictive and may discourage contributions.

Additional context

  • This feature is especially useful for large repositories with frequent contributions.
  • Implemented using GitHub Actions with existing issue similarity tools / custom scripts.
  • Similar functionality exists in repositories such as Kubernetes and TensorFlow.

Checklist

  • Implemented automated duplicate detection in CI/CD
  • Added labeling for suspected duplicates
  • Added bot comment with related issue links
  • Made auto-close behavior configurable

Fixed issue: #400

- Removed all .md doc files (DUPLICATE_DETECTION, HINDI_SUMMARY, etc.)
- Removed test scripts (test-local.sh, test-local.ps1)
- Simplified CONTRIBUTING.md duplicate detection section
- Kept only essential files for detection functionality
Copilot AI review requested due to automatic review settings February 7, 2026 20:00

Copilot AI left a comment

Pull request overview

This PR introduces a GitHub Actions-based automation to detect potentially duplicate issues and pull requests and then label/comment (and optionally close on “exact matches”) to reduce duplicate reports in the repo.
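
As an illustration of the label/comment/optional-close behavior described above, here is a minimal sketch of how a best-match score might be mapped to actions. Only the `possible-duplicate` label name comes from the PR description. The `DetectorConfig` fields, their defaults, and the `plan_actions` helper are hypothetical and may not match `.github/duplicate-detector-config.yml`.

```python
from dataclasses import dataclass


@dataclass
class DetectorConfig:
    # Field names and defaults are assumptions for illustration only.
    duplicate_threshold: float = 0.75  # score at which an item is flagged
    exact_threshold: float = 0.98      # score treated as an "exact match"
    label: str = "possible-duplicate"  # label name from the PR description
    auto_close: bool = False           # closing is opt-in


def plan_actions(best_score: float, config: DetectorConfig) -> list[str]:
    """Return the actions to take for the best similarity score found."""
    actions: list[str] = []
    if best_score >= config.duplicate_threshold:
        # Suspected duplicate: label it and point to the related items.
        actions += [f"label:{config.label}", "comment"]
        # Close only when enabled and the match is near-exact.
        if config.auto_close and best_score >= config.exact_threshold:
            actions.append("close")
    return actions
```

Keeping `auto_close` off by default means a false positive costs a label and a comment rather than a wrongly closed issue.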

Changes:

  • Added a new CI workflow plus a Python-based duplicate detection script and YAML configuration.
  • Added contributor documentation noting the duplicate detection behavior.
  • Added/updated various site/i18n content files and a .gitattributes file for line-ending normalization.

Reviewed changes

Copilot reviewed 11 out of 15 changed files in this pull request and generated 11 comments.

| File | Description |
| --- | --- |
| .github/workflows/duplicate-detector.yml | New workflow to run duplicate detection on issue/PR open/reopen events. |
| .github/scripts/detect-duplicates.py | Implements similarity-based duplicate detection and applies labels/comments (optional close). |
| .github/scripts/requirements.txt | Python dependencies for the workflow. |
| .github/duplicate-detector-config.yml | Configures thresholds, labels, and other behavior for the detector. |
| CONTRIBUTING.md | Documents that automated duplicate detection is used. |
| .gitattributes | Enforces consistent line endings and marks binaries. |
| i18n/en-US/docusaurus-theme-classic/navbar.json | Adds English i18n strings for navbar. |
| i18n/en-US/docusaurus-theme-classic/footer.json | Adds English i18n strings for footer. |
| i18n/en-US/docusaurus-plugin-content-docs/current.json | Adds docs i18n strings for the current version. |
| i18n/en-US/docusaurus-plugin-content-blog/options.json | Adds blog i18n strings. |
| i18n/en-US/docusaurus-plugin-content-blog/authors.yml | Adds blog author metadata. |
| i18n/en-US/docusaurus-plugin-content-docs/current/.keep | Adds placeholder file to keep directory in git. |
| src/pages/home/css/tailwind.css | Formatting/normalization changes. |
| src/hooks/useAOS.tsx | Formatting/normalization changes. |
| src/constants/index.ts | Formatting/normalization changes. |

Abhinandankaushik and others added 4 commits February 8, 2026 01:37
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Copilot AI left a comment

Pull request overview

Copilot reviewed 11 out of 15 changed files in this pull request and generated 5 comments.

